Fiveable

📊Honors Statistics Unit 11 Review

11.3 Test of Independence

Written by the Fiveable Content Team • Last updated September 2025

The chi-square test of independence assesses whether two categorical variables are associated. Using a contingency table, we compare the observed frequencies with the frequencies we would expect if the variables were independent.

This statistical method is crucial for understanding relationships in categorical data. By following a step-by-step process, we can calculate test statistics, compare them to critical values, and draw conclusions about variable dependencies.

Test of Independence

Construction of contingency tables

  • Two-way frequency table displays relationship between two categorical variables
    • Rows represent one categorical variable (e.g., gender)
    • Columns represent the other categorical variable (e.g., preferred color)
    • Each cell contains observed frequency or count of intersection between row and column variables
  • Steps to construct contingency table:
    1. Identify two categorical variables of interest
    2. Determine levels or categories for each variable
    3. Create table with rows and columns representing levels of each variable
    4. Fill in cells with observed frequencies or counts for each combination of row and column categories (e.g., 25 males prefer blue)
  • Total of each row and column called marginal frequency
  • Grand total is sum of all observations in table
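
As a rough sketch of these steps, the snippet below tallies raw survey responses into a contingency table with marginal and grand totals. The records are hypothetical (built so that 25 males prefer blue, matching the example above), and `build_contingency_table` is an illustrative helper name, not part of any library.

```python
from collections import Counter

# Hypothetical survey records: one (gender, preferred color) pair per respondent.
records = (
    [("male", "blue")] * 25 + [("male", "green")] * 15 + [("male", "red")] * 10
    + [("female", "blue")] * 15 + [("female", "green")] * 20 + [("female", "red")] * 15
)

def build_contingency_table(records):
    """Return (row_levels, col_levels, table); table[i][j] is the observed count."""
    counts = Counter(records)
    row_levels = sorted({row for row, _ in records})   # levels of the row variable
    col_levels = sorted({col for _, col in records})   # levels of the column variable
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table

row_levels, col_levels, table = build_contingency_table(records)

# Marginal frequencies and grand total.
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

print("columns:", col_levels)
for level, row, total in zip(row_levels, table, row_totals):
    print(level, row, "row total:", total)
print("column totals:", col_totals, "grand total:", grand_total)
```

Levels are sorted alphabetically here only to make the row and column order predictable; any consistent ordering works.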

Calculation of chi-square test statistic

  • Test statistic for test of independence is compared against chi-square distribution, which it approximately follows when null hypothesis is true
    • Chi-square distribution is right-skewed distribution with degrees of freedom equal to $(r-1)(c-1)$, where $r$ is number of rows and $c$ is number of columns in contingency table
  • To calculate test statistic:
    1. Compute expected frequency for each cell in contingency table
      • Expected frequency $E_{ij} = \frac{(\text{row }i\text{ total}) \times (\text{column }j\text{ total})}{\text{grand total}}$
    2. Calculate chi-square test statistic using formula:
      • $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
      • $O_{ij}$ is observed frequency in $i$-th row and $j$-th column
      • $E_{ij}$ is expected frequency in $i$-th row and $j$-th column
  • Test statistic measures discrepancy between observed and expected frequencies
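    • Larger test statistic indicates greater difference between observed and expected values and therefore stronger evidence against independence; whether it is significant depends on degrees of freedom (test statistic of 12.5 with 3 degrees of freedom exceeds critical value of 7.81 at $\alpha = 0.05$)
    • Test statistic grows with sample size, so it does not by itself measure strength of relationship; effect size (for example, Cramér's V) can be calculated to quantify strength of relationship between variables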
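
The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the 2×3 table of counts is hypothetical (it reuses the "25 males prefer blue" cell from earlier), the function names are made up for this example, and Cramér's V is shown as one common effect-size measure, which the text above does not prescribe.

```python
import math

# Hypothetical observed counts (rows: gender, columns: preferred color).
observed = [
    [25, 15, 10],  # male:   blue, green, red
    [15, 20, 15],  # female: blue, green, red
]

def expected_frequencies(observed):
    """E_ij = (row i total) * (column j total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

def chi_square_statistic(observed, expected):
    """Sum of (O_ij - E_ij)^2 / E_ij over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

expected = expected_frequencies(observed)
chi2 = chi_square_statistic(observed, expected)

# Degrees of freedom: (r - 1)(c - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Cramér's V (assumed choice of effect size): sqrt(chi2 / (n * min(r - 1, c - 1))).
n = sum(sum(row) for row in observed)
k = min(len(observed), len(observed[0])) - 1
cramers_v = math.sqrt(chi2 / (n * k))

print("expected:", expected)
print("chi-square:", round(chi2, 3), "df:", df, "Cramér's V:", round(cramers_v, 3))
```

For this table every row total is 50, so both rows share the same expected counts, and the statistic is the sum of six cell contributions.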

Determination of factor independence

  • Test of independence used to determine if there is significant relationship between two categorical variables
  • Null hypothesis ($H_0$): Two categorical variables are independent
  • Alternative hypothesis ($H_1$): Two categorical variables are dependent
  • Steps to conduct test of independence:
    1. State null and alternative hypotheses
    2. Construct contingency table and calculate expected frequencies
    3. Calculate chi-square test statistic
    4. Determine degrees of freedom $(r-1)(c-1)$
    5. Choose significance level (commonly $\alpha = 0.05$)
    6. Find critical value from chi-square distribution table using degrees of freedom and significance level
    7. Compare test statistic to critical value or calculate p-value
      • If test statistic greater than critical value or p-value less than significance level, reject null hypothesis and conclude variables are dependent (with 3 degrees of freedom and $\alpha = 0.05$, test statistic of 15.2 > critical value of 7.81, reject $H_0$)
      • If test statistic less than critical value or p-value greater than significance level, fail to reject null hypothesis and conclude insufficient evidence to suggest dependence between variables (with 3 degrees of freedom and $\alpha = 0.05$, test statistic of 3.5 < critical value of 7.81, fail to reject $H_0$)
  • Sample size affects power of test: with larger samples, test is more likely to detect relationship that really exists
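
The steps above can be strung together as in the sketch below. It uses the critical-value route (step 6) with a small lookup of standard chi-square table values for $\alpha = 0.05$, rather than computing a p-value. The function name `independence_test` and both tables of counts are hypothetical, chosen only to show one "fail to reject" and one "reject" outcome.

```python
# Critical values at alpha = 0.05 from a standard chi-square table, keyed by df.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}

def independence_test(observed, critical_values=CRITICAL_VALUES_05):
    """Return (chi2, df, critical_value, reject_h0) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Steps 2-3: expected frequency per cell, then the chi-square statistic.
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e

    # Step 4: degrees of freedom (r - 1)(c - 1).
    df = (len(observed) - 1) * (len(observed[0]) - 1)

    # Steps 5-6: look up the critical value for the chosen significance level.
    if df not in critical_values:
        raise ValueError(f"no critical value stored for df={df}")
    critical = critical_values[df]

    # Step 7: reject H0 only if the statistic exceeds the critical value.
    return chi2, df, critical, chi2 > critical

# Hypothetical 2x3 table (gender by preferred color).
chi2, df, critical, reject = independence_test([[25, 15, 10], [15, 20, 15]])
print(round(chi2, 3), df, critical, "reject H0" if reject else "fail to reject H0")

# Hypothetical 2x2 table with a much stronger pattern.
chi2_b, df_b, critical_b, reject_b = independence_test([[30, 10], [10, 30]])
print(round(chi2_b, 3), df_b, critical_b, "reject H0" if reject_b else "fail to reject H0")
```

A lookup table keeps the example dependency-free and mirrors how the test is done by hand; statistical software would typically report a p-value instead.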

Additional Analysis

  • Post-hoc analysis can be conducted to identify specific categories contributing to significant results
  • Standardized residuals can be calculated to determine which cells in the contingency table contribute most to the chi-square statistic
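
As an illustration of this kind of follow-up, the sketch below computes Pearson residuals, $(O_{ij} - E_{ij}) / \sqrt{E_{ij}}$, which is one common form of standardized residual (other texts use an adjusted version, so treat the choice here as an assumption). The table of counts is hypothetical and `pearson_residuals` is an illustrative name. Squaring these residuals and summing them gives back the chi-square statistic, so cells with large absolute residuals are the ones contributing most.

```python
import math

# Hypothetical observed counts (rows: gender, columns: preferred color).
observed = [
    [25, 15, 10],  # male:   blue, green, red
    [15, 20, 15],  # female: blue, green, red
]

def pearson_residuals(observed):
    """Return (O - E) / sqrt(E) for every cell of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    residuals = []
    for i, row in enumerate(observed):
        res_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            res_row.append((o - e) / math.sqrt(e))
        residuals.append(res_row)
    return residuals

residuals = pearson_residuals(observed)

# The squared residuals add up to the chi-square statistic.
chi2_from_residuals = sum(r ** 2 for row in residuals for r in row)

for row in residuals:
    print([round(r, 3) for r in row])
print("sum of squared residuals:", round(chi2_from_residuals, 3))
```

A positive residual means the cell has more observations than independence would predict; a negative residual means fewer.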