Comparing means is crucial in data science for understanding differences between groups or conditions. T-tests and ANOVA compare means directly, while chi-square tests extend the same hypothesis-testing logic to categorical data, together covering a wide range of data types and experimental designs.
These statistical methods help researchers make informed decisions about their hypotheses. By calculating test statistics and interpreting p-values, we can determine if observed differences are statistically significant or likely due to chance.
Statistical Tests for Comparing Means
T-tests for mean comparisons
- One-sample t-test compares sample mean to known population mean
- Null hypothesis assumes sample mean equals population mean
- Test statistic: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ measures deviation from expected value
- Two-sample t-test evaluates differences between independent groups
- Assumes equal variances (pooled) or unequal variances (Welch's)
- Null hypothesis posits equal population means
- Test statistic (pooled): $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2(\frac{1}{n_1} + \frac{1}{n_2})}}$, where $s_p^2$ is the pooled variance, quantifies group mean differences
- Paired t-test analyzes dependent samples or before-after studies
- Examines differences between paired observations
- Null hypothesis states mean difference between pairs is zero
- Test statistic: $t = \frac{\bar{d}}{s_d / \sqrt{n}}$ assesses paired differences
- Degrees of freedom vary by test type
- One-sample and paired tests use $n - 1$
- Two-sample uses $n_1 + n_2 - 2$ (equal variances) or the Welch–Satterthwaite approximation $\nu \approx \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)}$ (unequal variances)
- p-value interpretation guides decision-making
- Reject the null hypothesis when the p-value falls below the significance level α (commonly 0.05); see the sketch after this list
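As a concrete illustration, here is a minimal sketch of all three t-tests using scipy.stats; the sample arrays and the hypothesized population mean of 50 are invented for the example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# One-sample t-test: does the sample mean differ from a hypothesized mu of 50?
sample = rng.normal(loc=52, scale=10, size=30)
t_one, p_one = stats.ttest_1samp(sample, popmean=50)

# Two-sample t-test on independent groups.
# equal_var=True gives the pooled test; equal_var=False gives Welch's test.
group_a = rng.normal(loc=50, scale=10, size=40)
group_b = rng.normal(loc=55, scale=12, size=35)
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

# Paired t-test: before/after measurements on the same subjects.
before = rng.normal(loc=100, scale=15, size=25)
after = before + rng.normal(loc=-3, scale=5, size=25)
t_paired, p_paired = stats.ttest_rel(before, after)

for name, t, p in [("one-sample", t_one, p_one),
                   ("Welch two-sample", t_welch, p_welch),
                   ("paired", t_paired, p_paired)]:
    print(f"{name}: t = {t:.3f}, p = {p:.4f}")
```

Each call returns the test statistic and a two-sided p-value, which is then compared against α as described above.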
ANOVA for multiple group comparisons
- One-way ANOVA compares means across three or more independent groups
- Single categorical independent variable influences continuous dependent variable
- Null hypothesis assumes all group means are equal
- Requires normality, homogeneity of variances, and independence assumptions
- Two-way ANOVA examines effects of two categorical independent variables
- Analyzes main effects and interaction effects simultaneously
- Null hypotheses test for absence of main effects and interaction effect
- ANOVA calculations involve several components
- Calculate Total, Between-group, and Within-group Sums of Squares (SS)
- Determine degrees of freedom (df) for each source
- Compute Mean Square (MS) by dividing SS by df
- Derive F-statistic as ratio of MS(Between) to MS(Within)
- Post-hoc tests refine analysis after significant ANOVA results (see the sketch after this list)
- Tukey's HSD identifies specific group differences
- Bonferroni correction adjusts for multiple comparisons
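The sketch below runs a one-way ANOVA with scipy.stats.f_oneway and, if the result is significant, follows up with Tukey's HSD via scipy.stats.tukey_hsd (available in SciPy 1.8+). The three treatment groups are made-up data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Three independent groups (e.g., three treatments); data are illustrative.
g1 = rng.normal(20, 4, size=30)
g2 = rng.normal(23, 4, size=30)
g3 = rng.normal(21, 4, size=30)

# One-way ANOVA: H0 says all three population means are equal.
f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")

# If the ANOVA is significant, Tukey's HSD pinpoints which pairs differ.
if p_value < 0.05:
    res = stats.tukey_hsd(g1, g2, g3)
    print(res)  # pairwise mean differences with adjusted p-values
```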
Interpretation of ANOVA results
- ANOVA table presents key information for analysis
- Source of variation identifies factor effects
- Degrees of freedom (df) indicate sample size and complexity
- Sums of Squares (SS) quantify variability
- Mean Squares (MS) represent average variability
- F-statistic compares between-group to within-group variance
- p-value determines statistical significance
- F-test interpretation guides hypothesis testing
- Compare F-statistic to critical F-value from distribution table
- Reject null hypothesis when F exceeds F-critical or p-value falls below α
- Effect size measures quantify practical significance (η² is computed by hand in the sketch after this list)
- Eta-squared (η²) estimates proportion of variance explained
- Partial eta-squared (ηₚ²) accounts for other factors in design
- Interaction effects in two-way ANOVA require careful interpretation
- Analyze main effects when interaction is not significant
- Examine simple effects when significant interaction exists
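To make the ANOVA table concrete, the following sketch computes SS, df, MS, the F-statistic, and eta-squared by hand with NumPy, then reads the p-value off the F distribution; the three groups are arbitrary illustrative data:

```python
import numpy as np
from scipy import stats

# Illustrative data: three groups of observations.
groups = [np.array([4.0, 5.0, 6.0, 5.5]),
          np.array([6.5, 7.0, 8.0, 7.5]),
          np.array([5.0, 5.5, 6.5, 6.0])]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()
k = len(groups)            # number of groups
n = all_obs.size           # total number of observations

# Sums of squares: SS(Total) = SS(Between) + SS(Within).
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ((all_obs - grand_mean) ** 2).sum()

df_between, df_within = k - 1, n - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within

f_stat = ms_between / ms_within
p_value = stats.f.sf(f_stat, df_between, df_within)  # right-tail p-value
eta_squared = ss_between / ss_total                  # proportion of variance explained

print(f"F({df_between}, {df_within}) = {f_stat:.3f}, "
      f"p = {p_value:.4f}, eta^2 = {eta_squared:.3f}")
```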
Chi-square tests for categorical data
- Chi-square test for goodness of fit evaluates categorical variable distributions
- Compares observed frequencies to expected theoretical frequencies
- Null hypothesis assumes observed distribution matches expected distribution
- Test statistic: $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$ measures discrepancy
- Chi-square test for independence assesses relationships between categorical variables
- Analyzes contingency tables for association
- Null hypothesis posits no association between variables
- Test statistic: $\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$ quantifies deviations
- Degrees of freedom vary by test type
- Goodness of fit uses k - 1 (k = number of categories)
- Independence uses (r - 1)(c - 1) (r = rows, c = columns)
- Expected frequencies calculation differs between tests
- Goodness of fit derives expected counts from the hypothesized distribution
- Independence uses $E_{ij} = \frac{(\text{row}_i \text{ total}) \times (\text{column}_j \text{ total})}{\text{grand total}}$
- Chi-square tests assume independent observations and sufficient sample size
- Expected frequencies should be at least 5 in each cell
- Effect size measures assess strength of association (both tests and Cramér's V appear in the sketch after this list)
- Cramér's V applies to larger contingency tables
- Phi coefficient suits 2×2 tables specifically
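Both chi-square tests, along with a hand-rolled Cramér's V, can be sketched with scipy.stats; the observed counts below are invented for illustration:

```python
import numpy as np
from scipy import stats

# Goodness of fit: do observed die rolls match a fair-die expectation?
observed = np.array([18, 22, 16, 25, 20, 19])   # illustrative counts
expected = np.full(6, observed.sum() / 6)       # uniform under H0
chi2_gof, p_gof = stats.chisquare(observed, f_exp=expected)
print(f"goodness of fit: chi2 = {chi2_gof:.3f}, p = {p_gof:.4f}")

# Independence: 2x3 contingency table of two categorical variables.
table = np.array([[30, 15, 25],
                  [20, 25, 35]])
chi2_ind, p_ind, dof, expected_counts = stats.chi2_contingency(table)
print(f"independence: chi2 = {chi2_ind:.3f}, df = {dof}, p = {p_ind:.4f}")

# Cramer's V: effect size for an r x c table.
n = table.sum()
min_dim = min(table.shape) - 1
cramers_v = np.sqrt(chi2_ind / (n * min_dim))
print(f"Cramer's V = {cramers_v:.3f}")
```

Note that chi2_contingency computes the expected counts from the row and column totals exactly as in the $E_{ij}$ formula above, and returns them for inspection.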