Becoming a Data Head

How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning
by Alex J. Gutman, 2021, 288 pages
4.24 (100+ ratings)

Key Takeaways

1. Define the Problem Before Diving into Data

Gutman and Goldmeier filter through much of the noise to break down complex data and statistical concepts we hear today into basic examples and analogies that stick.

Focus on the business problem. Before starting any data project, clearly define the problem you're trying to solve. Avoid getting caught up in the hype of new technologies or methodologies. Instead, focus on the business value and the impact of solving the problem.

Ask key questions. To ensure the problem is well-defined, ask:

  • Why is this problem important?
  • Who does this problem affect?
  • What if we don't have the right data?
  • When is the project over?
  • What if we don't like the results?

Avoid methodology and deliverable focus. Be wary of projects that start with a specific technology or deliverable in mind. Instead, focus on the business problem and then determine the appropriate tools and methods.

2. Data is Encoded Information, Not Just Numbers

In demystifying these complex statistical topics, they have also created a common language that bridges the longstanding communication divide that has — until now — separated data work from business value.

Data vs. Information. Understand the difference between data and information. Data is encoded information, while information is derived knowledge. Data is the raw material, and information is the result of analysis.

Data Types. Be familiar with different data types:

  • Numeric (continuous and count)
  • Categorical (ordered and unordered)
  • Dates

Data Collection. Understand how data is collected (observational vs. experimental) and structured (structured vs. unstructured). This will help you assess its quality and limitations.

3. Statistical Thinking Requires Questioning Everything

Statistical thinking is a different way of thinking that is part detective, skeptical, and involves alternate takes on a problem.

Embrace skepticism. Develop a critical mindset and question the data and results you encounter. Don't take numbers at face value. Be especially skeptical of claims that align with your existing beliefs.

Understand variation. Recognize that there is variation in all things. Not every peak and valley needs an explanation. Differentiate between measurement variation and random variation.

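Probability and Statistics. Use probability and statistics to manage uncertainty. Understand the difference between them: probability drills down, reasoning from a known population to the data you can expect to see, while statistics drills up, reasoning from observed data to the population it came from.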

4. Argue with the Data's Origin and Representativeness

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.

Data Origin Story. Always ask about the origin of the data. Who collected it? How was it collected? Is it observational or experimental? This will help you assess its reliability and potential biases.

Representativeness. Ensure the data is representative of the population you care about. Ask: Is there sampling bias? How were outliers handled? What data is missing from view? How were missing values dealt with?

Measurement. Can the data measure what you want it to measure? Be wary of proxy measures and indirect approximations.

5. Explore Data to Uncover Relationships and Opportunities

Gutman and Goldmeier offer practical advice for asking the right questions, challenging assumptions, and avoiding common pitfalls.

Embrace the exploratory mindset. Approach data analysis with curiosity and a willingness to iterate. Don't follow a rigid script. Be open to discovering new relationships and opportunities.

Ask guiding questions. As you explore the data, ask:

  • Can the data answer the question?
  • Did you discover any relationships?
  • Did you find new opportunities in the data?

Use visualizations. Use histograms, box plots, bar charts, and scatter plots to explore the data and spot anomalies. Verify noteworthy correlations with visualizations.

6. Probabilities Quantify Uncertainty, Challenge Intuition

Many people’s notion of probability is so impoverished that it admits of only two values: 50-50 and 99%, tossup or essentially certain.

Probability vs. Intuition. Recognize that your intuition can play tricks on you. Don't underestimate variation, especially when dealing with small numbers.
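
An exact calculation can make the small-numbers point concrete. The sketch below assumes a fair coin (an illustrative scenario, not an example taken from the book) and computes how often a sample shows at least 70% heads purely by chance, for a small and a larger sample.

```python
from math import comb

def prob_at_least(n, k, p=0.5):
    """Exact binomial probability of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance that a fair coin shows at least 70% heads, by sample size.
small = prob_at_least(10, 7)     # 10 flips, 7 or more heads
large = prob_at_least(100, 70)   # 100 flips, 70 or more heads

print(f"n=10:  {small:.4f}")
print(f"n=100: {large:.6f}")
```

With 10 flips, a lopsided result turns up about 17% of the time; with 100 flips it is far rarer. A striking percentage from a small sample is therefore weak evidence on its own.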

Rules of the Game. Understand the basic rules of probability:

  • Probabilities range from 0 to 1.
  • The probabilities of all possible outcomes must sum to 1.
  • The chance of any two events happening together cannot be greater than either event happening by itself.
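
The three rules above can be checked by brute force on a small sample space. The sketch below uses two fair six-sided dice; the dice and the two events are illustrative assumptions, not examples from the book.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered result of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
p_each = Fraction(1, len(outcomes))  # 36 equally likely outcomes

def prob(event):
    """Probability of an event, given as a function from outcome to bool."""
    return sum((p_each for o in outcomes if event(o)), Fraction(0))

def first_is_six(o):
    return o[0] == 6

def total_at_least_ten(o):
    return sum(o) >= 10

def both(o):
    return first_is_six(o) and total_at_least_ten(o)

p_a, p_b, p_ab = prob(first_is_six), prob(total_at_least_ten), prob(both)

# Rule 1: every probability lies between 0 and 1.
assert all(0 <= p <= 1 for p in (p_a, p_b, p_ab))
# Rule 2: the probabilities of all possible outcomes sum to 1.
assert prob(lambda o: True) == 1
# Rule 3: two events together are never more likely than either alone.
assert p_ab <= min(p_a, p_b)

print(p_a, p_b, p_ab)  # 1/6 1/6 1/12
```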

Conditional Probability. Know that all probabilities are conditional. Be careful assuming independence. Don't fall for the gambler's fallacy.
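
To see why independence matters, the following sketch enumerates two small sample spaces exactly. Both scenarios (four fair coin flips, two cards drawn without replacement) and the helper name `conditional` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

def conditional(outcomes, event, given):
    """P(event | given) over a list of equally likely outcomes."""
    matching = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in matching if event(o)), len(matching))

# Independent events: four fair coin flips.
flips = list(product("HT", repeat=4))
p_heads_after_tails = conditional(
    flips,
    event=lambda f: f[3] == "H",
    given=lambda f: f[:3] == ("T", "T", "T"),
)

# Dependent events: two cards drawn without replacement from a 52-card
# deck, where cards 0-3 stand for the four aces.
draws = list(permutations(range(52), 2))
p_ace_second = conditional(draws, lambda d: d[1] < 4, lambda d: True)
p_ace_after_ace = conditional(draws, lambda d: d[1] < 4, lambda d: d[0] < 4)

print(p_heads_after_tails, p_ace_second, p_ace_after_ace)  # 1/2 1/13 1/17
```

Three tails in a row leave the chance of heads at 1/2, which is why the gambler's fallacy is a fallacy. Drawing an ace, by contrast, does change the chance of the next card being an ace, so assuming independence there would be a mistake.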

7. Challenge Statistics by Understanding Inference

The most clear, concise, and practical characterization of working in corporate analytics that I’ve seen.

Statistical Inference. Understand the process of statistical inference:

  1. Ask a meaningful question.
  2. Formulate a hypothesis test.
  3. Establish a significance level.
  4. Calculate a p-value.
  5. Calculate confidence intervals.
  6. Reject or fail to reject the null hypothesis.
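
The six steps can be sketched end to end on a toy question. The data (65 heads in 100 flips), the choice of an exact binomial test, and the normal-approximation confidence interval are all assumptions made for illustration, not a method prescribed by the book.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p_value(successes, n, p_null):
    """Exact binomial p-value: total probability, under the null, of
    every outcome at most as likely as the one observed."""
    observed = binom_pmf(successes, n, p_null)
    pmfs = [binom_pmf(k, n, p_null) for k in range(n + 1)]
    # The small tolerance guards against floating-point ties.
    return sum(q for q in pmfs if q <= observed * (1 + 1e-9))

# Step 1: the question. Is this coin biased? (Hypothetical data.)
n, heads = 100, 65
# Step 2: hypothesis test. Null: p = 0.5. Alternative: p != 0.5.
p_null = 0.5
# Step 3: significance level, fixed before looking at the data.
alpha = 0.05
# Step 4: p-value.
p_value = two_sided_p_value(heads, n, p_null)
# Step 5: 95% confidence interval (normal approximation).
p_hat = heads / n
z = NormalDist().inv_cdf(1 - alpha / 2)
margin = z * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)
# Step 6: reject or fail to reject the null hypothesis.
reject_null = p_value < alpha

print(f"p-value = {p_value:.4f}, CI = ({ci[0]:.3f}, {ci[1]:.3f}), "
      f"reject null: {reject_null}")
```

Here the p-value falls below the significance level and the interval excludes 0.5, so the two views agree. Whether a bias of this size matters in practice is a separate question.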

Key Questions. Ask these questions to challenge the statistics:

  • What is the context for these statistics?
  • What is the sample size?
  • What are you testing?
  • What is the null hypothesis?
  • What is the significance level?
  • How many tests are you doing?
  • Can I see the confidence intervals?
  • Is this practically significant?
  • Are you assuming causality?

Decision Errors. Balance decision errors (false positives and false negatives).
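
The question "How many tests are you doing?" matters because false positives accumulate. The sketch below assumes independent tests where every null hypothesis is true, and computes the chance of at least one false positive. The Bonferroni adjustment shown at the end is a standard correction added here for illustration; the summary does not name it.

```python
def chance_of_any_false_positive(alpha, num_tests):
    """Probability of at least one false positive across independent
    tests, when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** num_tests

alpha = 0.05
for num_tests in (1, 5, 20):
    rate = chance_of_any_false_positive(alpha, num_tests)
    print(f"{num_tests:>2} tests: {rate:.4f}")

# Bonferroni: test each hypothesis at alpha / num_tests instead.
adjusted = chance_of_any_false_positive(alpha / 20, 20)
print(f"20 tests, Bonferroni-adjusted: {adjusted:.4f}")
```

At a 5% significance level, 20 tests carry roughly a 64% chance of at least one spurious "finding". The stricter threshold brings that back under 5%, at the cost of more false negatives, which is the balance the decision-errors point describes.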

8. Unsupervised Learning Reveals Hidden Groups

Becoming a Data Head raises the level of education and knowledge in an industry desperate for clarity in thinking.

Unsupervised Learning. Understand the goal of unsupervised learning: to discover hidden patterns and groups in datasets without predefined labels.

Dimensionality Reduction. Learn about dimensionality reduction and principal component analysis (PCA). PCA creates composite features that capture the most variance in the data.
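
For two features, the first principal component can be computed in closed form, which makes the idea of a composite feature easy to inspect. The sketch below is a minimal two-dimensional illustration with made-up data; the function name `pca_2d` is hypothetical, and real projects would normally use a library implementation.

```python
from math import atan2, cos, sin, sqrt

def pca_2d(points):
    """First principal component of 2-D data.

    Returns the unit direction of greatest variance and the share of
    total variance that direction captures."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix, in closed form.
    centre = (sxx + syy) / 2
    spread = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    largest, smallest = centre + spread, centre - spread
    # Angle of the eigenvector that belongs to the largest eigenvalue.
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    return (cos(theta), sin(theta)), largest / (largest + smallest)

# Two strongly correlated, made-up features.
data = [(2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1), (6.0, 5.9)]
direction, explained = pca_2d(data)
print(direction, explained)
```

Because the two features move together, a single composite feature (roughly an equal mix of both) captures nearly all of the variance, so the second dimension adds little.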

Clustering. Understand clustering and k-means clustering. K-means groups similar observations together based on a distance metric.
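
A minimal k-means sketch shows the two alternating steps: assign each observation to its nearest centroid, then move each centroid to the mean of its group. The data and starting centroids below are made up, and Euclidean distance is assumed as the distance metric.

```python
from math import dist

def k_means(points, centroids, max_iters=100):
    """Plain k-means with Euclidean distance and given starting centroids."""
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved, so stop
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, [points[0], points[3]])
print(centroids)
print(clusters)
```

The result depends on the starting centroids and on the number of groups chosen in advance, which is one reason clustering output deserves the same scrutiny as any other analysis.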

9. Regression Models Explain and Predict Relationships

Gutman and Goldmeier have written a book that is as useful for applied statisticians and data scientists as it is for business leaders and technical professionals.

Supervised Learning. Understand the goal of supervised learning: to find relationships in data with inputs and known outputs.

Regression Models. Learn about linear regression and its goal: to find the line of best fit that minimizes the sum of squared errors.
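
For a single input, the line of best fit has a closed-form solution. The sketch below fits it to made-up data and compares its sum of squared errors with that of a slightly different line; the function names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one input feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def sum_squared_errors(xs, ys, slope, intercept):
    """Total squared vertical distance between the line and the points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, slope, intercept)
other = sum_squared_errors(xs, ys, slope + 0.1, intercept)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # slope=1.99, intercept=0.05
print(f"SSE of fitted line: {best:.4f}, of a nearby line: {other:.4f}")
```

Any other line, such as the one with a slightly steeper slope, has a larger sum of squared errors on the same data.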

Multiple Regression. Extend linear regression to multiple features. Understand the importance of coefficients and p-values.

10. Classification Models Predict Categories

THE book that business and technology leaders need to read to fully understand the potential, power, AND limitations of data science.

Classification Models. Understand the goal of classification models: to predict a categorical variable (label).

Logistic Regression. Learn about logistic regression and its ability to predict probabilities.
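
Logistic regression turns a weighted sum of features into a probability by passing it through the logistic (sigmoid) function. In the sketch below the feature names, weights, and bias are hypothetical values chosen by hand, not fitted coefficients.

```python
from math import exp

def predict_probability(features, weights, bias):
    """Logistic regression prediction: sigmoid of a weighted sum."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 / (1 + exp(-score))

# Hypothetical model: features are [support_tickets, years_as_customer].
weights = [0.8, -0.5]
bias = -1.0

for customer in ([0, 4], [2, 1], [5, 0]):
    p = predict_probability(customer, weights, bias)
    print(customer, f"{p:.3f}")
```

The output is always between 0 and 1, and turning it into a yes/no label requires choosing a cutoff, which is a business decision as much as a statistical one.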

Decision Trees. Understand decision trees and their ability to create a flowchart of rules.
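
A fitted decision tree reads like nested if/else rules. The tree below is written by hand to show that structure; the field names and thresholds are hypothetical, whereas a real tree learns its splits from data.

```python
def predict_churn(customer):
    """A hand-written decision tree: each branch is one rule."""
    if customer["months_as_customer"] < 6:
        if customer["support_tickets"] >= 3:
            return "churn"
        return "stay"
    if customer["monthly_spend"] < 20:
        return "churn"
    return "stay"

new_and_unhappy = {"months_as_customer": 2, "support_tickets": 4,
                   "monthly_spend": 50}
loyal = {"months_as_customer": 36, "support_tickets": 1,
         "monthly_spend": 80}

print(predict_churn(new_and_unhappy))  # churn
print(predict_churn(loyal))            # stay
```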

Ensemble Methods. Learn about ensemble methods (random forests and gradient boosted trees) and their ability to improve prediction accuracy.
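
The core idea of an ensemble is that several weak models vote. The sketch below combines three hand-written one-rule models by majority vote. This is only an illustration of voting: actual random forests and gradient boosted trees learn their trees from data, and all names and thresholds here are hypothetical.

```python
from collections import Counter

def rule_tenure(c):
    return "churn" if c["months_as_customer"] < 6 else "stay"

def rule_tickets(c):
    return "churn" if c["support_tickets"] >= 3 else "stay"

def rule_spend(c):
    return "churn" if c["monthly_spend"] < 20 else "stay"

def majority_vote(customer, models):
    """Return the label predicted by the most models."""
    votes = Counter(model(customer) for model in models)
    return votes.most_common(1)[0][0]

models = [rule_tenure, rule_tickets, rule_spend]
customer = {"months_as_customer": 3, "support_tickets": 5,
            "monthly_spend": 60}

print([m(customer) for m in models])     # ['churn', 'churn', 'stay']
print(majority_vote(customer, models))   # churn
```

One rule disagrees, but the vote smooths over its mistake. The price is that the combined model is harder to explain than a single flowchart.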

11. Text Analytics Transforms Words into Insights

Text Analytics. Understand the goal of text analytics: to extract useful insights from raw text.

Bag of Words. Learn about the bag-of-words model and its limitations.

N-grams. Understand N-grams and their ability to capture context.
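
A short sketch shows both the main limitation of bag of words and what N-grams add. The two sentences are illustrative, and tokenizing on whitespace is a simplifying assumption.

```python
from collections import Counter

def bag_of_words(text):
    """Word counts, ignoring order."""
    return Counter(text.lower().split())

def n_grams(text, n=2):
    """Counts of every run of n consecutive words."""
    tokens = text.lower().split()
    return Counter(zip(*(tokens[i:] for i in range(n))))

a = "dog bites man"
b = "man bites dog"

print(bag_of_words(a) == bag_of_words(b))  # True
print(n_grams(a) == n_grams(b))            # False
print(n_grams(a))
```

The two sentences mean different things yet have identical bags of words, because word order is discarded. Bigrams keep some of that order and so tell them apart, at the cost of many more features.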

Word Embeddings. Learn about word embeddings and their ability to represent words as vectors.
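
Once words are vectors, similarity becomes a calculation. The three-dimensional vectors below are made up by hand purely for illustration; real embeddings are learned from large text collections and have many more dimensions.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

royal = cosine_similarity(embeddings["king"], embeddings["queen"])
fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(f"king~queen: {royal:.3f}, king~apple: {fruit:.3f}")
```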

12. Deep Learning Mimics the Brain for Complex Tasks

What is keeping data science from reaching its true potential? It is not slow algorithms, lack of data, lack of computing power, or even lack of data scientists.

Neural Networks. Understand the basic structure of neural networks: neurons, activation functions, and layers.
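
The three building blocks can be shown in a few lines: a neuron is a weighted sum passed through an activation function, and a layer is several neurons reading the same inputs. All weights below are made up by hand; training, which is how a network actually obtains its weights, is not shown.

```python
from math import exp

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1 / (1 + exp(-x))

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs, passed through an activation function."""
    return activation(bias + sum(w * x for w, x in zip(weights, inputs)))

def layer(inputs, weight_rows, biases, activation):
    """One neuron per row of weights, all reading the same inputs."""
    return [neuron(inputs, w, b, activation)
            for w, b in zip(weight_rows, biases)]

x = [1.0, 2.0]
hidden = layer(x, [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0], relu)
output = layer(hidden, [[1.0, -0.5]], [0.0], sigmoid)

print(hidden)  # [0.0, 2.0]
print(output)
```

Stacking more layers between input and output is what makes a network "deep".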

Deep Learning. Learn about deep learning and its ability to automate feature engineering.

Convolutional Neural Networks. Understand convolutional neural networks and their application to image analysis.
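
The operation that gives these networks their name slides a small grid of weights (a kernel) across the image. The sketch below applies a hand-picked vertical-edge kernel to a tiny made-up image. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is cross-correlation; in a real network the kernel values are learned rather than chosen.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j]
             for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]

# A 4x4 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1]] * 4

# Responds where brightness increases from left to right.
vertical_edge = [[-1, 1],
                 [-1, 1]]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)  # [0, 2, 0] on each of the three rows
```

The output is large only where the edge sits, so the kernel acts as a reusable edge detector that works anywhere in the image.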

Review Summary

4.24 out of 5
Average of 100+ ratings from Goodreads and Amazon.

Becoming a Data Head is highly praised for its accessible introduction to data science concepts. Readers appreciate its clear explanations of complex topics, making it valuable for both beginners and experienced professionals. The book covers a wide range of subjects, from basic statistics to machine learning and AI. Many reviewers found it helpful for understanding data-driven decision-making in business contexts. While some felt it was too basic, most agreed it provides a solid foundation for anyone looking to enhance their data literacy.

About the Author

Alex J. Gutman is a data scientist and author who co-wrote "Becoming a Data Head" with Jordan Goldmeier. The book aims to demystify data science concepts for a broad audience, including business professionals and those new to the field. Gutman's approach focuses on practical applications and real-world examples, helping readers understand how data can be used effectively in various contexts. His writing style is praised for its clarity and ability to explain complex topics in an accessible manner. Gutman's expertise in data science and his skill in communicating technical concepts to non-technical audiences are evident throughout the book.
