Key Takeaways
1. Define the Problem Before Diving into Data
Gutman and Goldmeier filter through much of the noise to break down complex data and statistical concepts we hear today into basic examples and analogies that stick.
Focus on the business problem. Before starting any data project, clearly define the problem you're trying to solve. Avoid getting caught up in the hype of new technologies or methodologies. Instead, focus on the business value and the impact of solving the problem.
Ask key questions. To ensure the problem is well-defined, ask:
- Why is this problem important?
- Who does this problem affect?
- What if we don't have the right data?
- When is the project over?
- What if we don't like the results?
Avoid methodology and deliverable focus. Be wary of projects that start with a specific technology or deliverable in mind. Instead, focus on the business problem and then determine the appropriate tools and methods.
2. Data is Encoded Information, Not Just Numbers
In demystifying these complex statistical topics, they have also created a common language that bridges the longstanding communication divide that has — until now — separated data work from business value.
Data vs. Information. Understand the difference between data and information. Data is information that has been encoded for collection and storage; information is the knowledge you recover when you decode and analyze it. Data is the raw material, and information is the result of analysis.
Data Types. Be familiar with different data types:
- Numeric (continuous and count)
- Categorical (ordered and unordered)
- Dates
Data Collection. Understand how data is collected (observational vs. experimental) and structured (structured vs. unstructured). This will help you assess its quality and limitations.
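As a minimal sketch of these data types, consider labeling the fields of a hypothetical dataset (the field names and categories below are illustrative, not from the book):

```python
# Classify hypothetical dataset fields by type.
# Field names here are made up for illustration.
field_types = {
    "daily_revenue": ("numeric", "continuous"),
    "units_sold":    ("numeric", "count"),
    "satisfaction":  ("categorical", "ordered"),    # e.g. low / medium / high
    "region":        ("categorical", "unordered"),
    "order_date":    ("date", None),
}

# Knowing each field's type tells you which summaries and charts apply.
numeric_fields = [f for f, (kind, _) in field_types.items() if kind == "numeric"]
print(numeric_fields)  # ['daily_revenue', 'units_sold']
```

Tagging fields this way up front makes it obvious, for example, that averaging `region` is meaningless while averaging `daily_revenue` is not.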
3. Statistical Thinking Requires Questioning Everything
Statistical thinking is a different way of thinking that is part detective, skeptical, and involves alternate takes on a problem.
Embrace skepticism. Develop a critical mindset and question the data and results you encounter. Don't take numbers at face value. Be especially skeptical of claims that align with your existing beliefs.
Understand variation. Recognize that there is variation in all things. Not every peak and valley needs an explanation. Differentiate between measurement variation and random variation.
Probability and Statistics. Use probability and statistics to manage uncertainty. Understand the difference between probability (reasoning from a known process down to the data it could generate) and statistics (reasoning from observed data back up to the underlying process).
4. Argue with the Data's Origin and Representativeness
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
Data Origin Story. Always ask about the origin of the data. Who collected it? How was it collected? Is it observational or experimental? This will help you assess its reliability and potential biases.
Representativeness. Ensure the data is representative of the population you care about. Ask: Is there sampling bias? How were outliers handled? What data am I not seeing? How were missing values treated?
Measurement. Can the data measure what you want it to measure? Be wary of proxy measures and indirect approximations.
5. Explore Data to Uncover Relationships and Opportunities
Gutman and Goldmeier offer practical advice for asking the right questions, challenging assumptions, and avoiding common pitfalls.
Embrace the exploratory mindset. Approach data analysis with curiosity and a willingness to iterate. Don't follow a rigid script. Be open to discovering new relationships and opportunities.
Ask guiding questions. As you explore the data, ask:
- Can the data answer the question?
- Did you discover any relationships?
- Did you find new opportunities in the data?
Use visualizations. Use histograms, box plots, bar charts, and scatter plots to explore the data and spot anomalies. Verify noteworthy correlations with visualizations.
6. Probabilities Quantify Uncertainty, Challenge Intuition
Many people’s notion of probability is so impoverished that it admits of only two values: 50-50 and 99%, tossup or essentially certain.
Probability vs. Intuition. Recognize that your intuition can play tricks on you. Don't underestimate variation, especially when dealing with small numbers.
Rules of the Game. Understand the basic rules of probability:
- Probabilities range from 0 to 1.
- The sum of all possible outcomes must equal 1.
- The chance of any two events happening together cannot be greater than either event happening by itself.
Conditional Probability. Know that all probabilities are conditional. Be careful assuming independence. Don't fall for the gambler's fallacy.
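The rules above, plus conditional probability, can be checked on the simplest possible example, a fair six-sided die (a sketch for illustration, not from the book):

```python
from fractions import Fraction

# A fair six-sided die: every outcome has probability 1/6.
outcomes = range(1, 7)
p = {o: Fraction(1, 6) for o in outcomes}

# Rule 1: every probability is between 0 and 1.
assert all(0 <= p[o] <= 1 for o in outcomes)
# Rule 2: the probabilities of all possible outcomes sum to 1.
assert sum(p.values()) == 1

even = {2, 4, 6}
low = {1, 2, 3}
p_even = sum(p[o] for o in even)   # 1/2
p_even_and_low = p[2]              # only the outcome 2 is both even and low
# Rule 3: P(A and B) can never exceed P(A) or P(B) alone.
assert p_even_and_low <= p_even

# Conditional probability: P(even | low) = P(even and low) / P(low)
p_low = sum(p[o] for o in low)
p_even_given_low = p_even_and_low / p_low
print(p_even_given_low)  # 1/3
```

Note how conditioning changes the answer: unconditionally, P(even) = 1/2, but once you know the roll was 3 or less, P(even) drops to 1/3.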
7. Challenge Statistics by Understanding Inference
The most clear, concise, and practical characterization of working in corporate analytics that I’ve seen.
Statistical Inference. Understand the process of statistical inference:
- Ask a meaningful question.
- Formulate a hypothesis test.
- Establish a significance level.
- Calculate a p-value.
- Calculate confidence intervals.
- Reject or fail to reject the null hypothesis.
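The steps above can be sketched end to end with a simulation-based test (the coin-flip numbers are made up; a textbook analysis would use the binomial distribution directly, but simulating the null hypothesis shows the logic of a p-value plainly):

```python
import random

# H0 (null hypothesis): the coin is fair (P(heads) = 0.5).
# Observed: 62 heads in 100 flips. Significance level alpha = 0.05.
random.seed(0)
observed_heads = 62
n_flips, n_sims, alpha = 100, 10_000, 0.05

# Simulate a world where H0 is true, many times, and count how often
# chance alone is at least as extreme as what we observed (two-sided).
extreme = 0
for _ in range(n_sims):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    if abs(heads - 50) >= abs(observed_heads - 50):
        extreme += 1

p_value = extreme / n_sims
print(f"p-value ~ {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The p-value here comes out around 0.02: if the coin really were fair, a result this lopsided would occur only about 2% of the time, so at the 0.05 level we reject the null hypothesis.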
Key Questions. Ask these questions to challenge the statistics:
- What is the context for these statistics?
- What is the sample size?
- What are you testing?
- What is the null hypothesis?
- What is the significance level?
- How many tests are you doing?
- Can I see the confidence intervals?
- Is this practically significant?
- Are you assuming causality?
Decision Errors. Balance decision errors (false positives and false negatives).
8. Unsupervised Learning Reveals Hidden Groups
Becoming a Data Head raises the level of education and knowledge in an industry desperate for clarity in thinking.
Unsupervised Learning. Understand the goal of unsupervised learning: to discover hidden patterns and groups in datasets without predefined labels.
Dimensionality Reduction. Learn about dimensionality reduction and principal component analysis (PCA). PCA creates composite features that capture the most variance in the data.
Clustering. Understand clustering and k-means clustering. K-means groups similar observations together based on a distance metric.
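A minimal k-means sketch makes the alternating steps concrete: assign each point to its nearest center, then move each center to the mean of its assigned points (the 1-D toy data is made up for illustration):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal 1-D k-means: alternate between assigning points to the
    nearest center and moving each center to its cluster's mean."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Two obvious groups in the data; k-means should find centers near 1 and 10.
data = [1.0, 1.2, 0.8, 9.8, 10.1, 10.3]
print(kmeans(data, k=2))
```

Real implementations (e.g. scikit-learn's `KMeans`) add multiple restarts and convergence checks, but the core loop is exactly this alternation.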
9. Regression Models Explain and Predict Relationships
Gutman and Goldmeier have written a book that is as useful for applied statisticians and data scientists as it is for business leaders and technical professionals.
Supervised Learning. Understand the goal of supervised learning: to find relationships in data with inputs and known outputs.
Regression Models. Learn about linear regression and its goal: to find the line of best fit that minimizes the sum of squared errors.
Multiple Regression. Extend linear regression to multiple features. Understand the importance of coefficients and p-values.
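For simple (one-feature) linear regression, the least-squares line has a closed form worth seeing once; a sketch with toy data:

```python
# Least-squares formulas for simple linear regression:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Toy data lying exactly on the line y = 2x + 1:
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

Multiple regression generalizes this to several features at once; the fitted coefficients then describe each feature's relationship with the output holding the other features fixed.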
10. Classification Models Predict Categories
THE book that business and technology leaders need to read to fully understand the potential, power, AND limitations of data science.
Classification Models. Understand the goal of classification models: to predict a categorical variable (label).
Logistic Regression. Learn about logistic regression and its ability to predict probabilities.
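The mechanism is the sigmoid function, which squashes a linear score (the log-odds) into a probability between 0 and 1. A sketch with a made-up model (the coefficients and the churn scenario are illustrative, not from the book):

```python
import math

def sigmoid(z):
    """Map a log-odds score z to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Hypothetical fitted model: log-odds of churn = -1.5 + 0.8 * support_tickets
for tickets in (0, 2, 5):
    p = sigmoid(-1.5 + 0.8 * tickets)
    print(f"{tickets} tickets -> P(churn) = {p:.2f}")
```

Because the output is a probability rather than a hard label, you can pick the decision threshold that matches your tolerance for false positives versus false negatives.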
Decision Trees. Understand decision trees and their ability to create a flowchart of rules.
Ensemble Methods. Learn about ensemble methods (random forests and gradient boosted trees) and their ability to improve prediction accuracy.
11. Text Analytics Transforms Words into Insights
Text Analytics. Understand the goal of text analytics: to extract useful insights from raw text.
Bag of Words. Learn about the bag-of-words model and its limitations.
N-grams. Understand N-grams and their ability to capture context.
Word Embeddings. Learn about word embeddings and their ability to represent words as vectors.
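The first two representations can be sketched in a few lines; the example sentence is chosen to show the bag-of-words limitation that bigrams partially fix:

```python
from collections import Counter

# Bag of words: count word occurrences and discard order entirely.
def bag_of_words(text):
    return Counter(text.lower().split())

# N-grams (here bigrams): keep runs of n adjacent words, preserving
# a little local context that a bag of words throws away.
def ngrams(text, n=2):
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

text = "not good at all"
print(bag_of_words(text))
print(ngrams(text))  # the bigram ('not', 'good') preserves the negation
```

A bag of words sees "not good at all" and "good" as sharing the key word `good`; the bigram `('not', 'good')` is what lets a model tell them apart.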
12. Deep Learning Mimics the Brain for Complex Tasks
What is keeping data science from reaching its true potential? It is not slow algorithms, lack of data, lack of computing power, or even lack of data scientists.
Neural Networks. Understand the basic structure of neural networks: neurons, activation functions, and layers.
Deep Learning. Learn about deep learning and its ability to automate feature engineering.
Convolutional Neural Networks. Understand convolutional neural networks and their application to image analysis.
Review Summary
Becoming a Data Head is highly praised for its accessible introduction to data science concepts. Readers appreciate its clear explanations of complex topics, making it valuable for both beginners and experienced professionals. The book covers a wide range of subjects, from basic statistics to machine learning and AI. Many reviewers found it helpful for understanding data-driven decision-making in business contexts. While some felt it was too basic, most agreed it provides a solid foundation for anyone looking to enhance their data literacy.