Fiveable

Intro to Probability for Business Unit 13 Review

13.3 Contingency Tables Analysis

Written by the Fiveable Content Team • Last updated September 2025

Contingency tables are tools for analyzing relationships between categorical variables. They organize counts into rows and columns, which lets us calculate marginal and joint probabilities and assess how the categories of one variable relate to the categories of the other.

Assessing independence in a contingency table helps determine whether two variables are related. By comparing joint probabilities with the products of the corresponding marginal probabilities, we can spot patterns that suggest an association. The chi-square test then checks whether the observed association is statistically significant or could plausibly be due to chance, guiding our interpretation of the data.

Contingency Tables

Structure of contingency tables

  • Contingency tables summarize data to analyze relationships between two categorical variables
    • Rows represent categories of one variable (e.g., gender)
    • Columns represent categories of the other variable (e.g., major)
  • Cells contain frequencies or counts of observations for each combination of row and column categories (e.g., the number of male engineering students)
  • Tables include row totals, column totals, and the overall total of observations
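
As a rough illustration of this layout, the sketch below builds a small gender-by-major table and prints it with its margins. The counts are hypothetical and chosen only for the example; they do not come from real data.

```python
# Hypothetical counts: rows are gender, columns are major (illustrative only).
counts = {
    "Female": {"Business": 30, "Engineering": 20, "Nursing": 50},
    "Male":   {"Business": 40, "Engineering": 50, "Nursing": 10},
}
majors = ["Business", "Engineering", "Nursing"]

# Margins: one total per row, one per column, and the overall total.
row_totals = {g: sum(row.values()) for g, row in counts.items()}
col_totals = {m: sum(counts[g][m] for g in counts) for m in majors}
grand_total = sum(row_totals.values())

# Print the table with its row totals, column totals, and overall total.
print(f"{'':<8}" + "".join(f"{m:>13}" for m in majors) + f"{'Total':>8}")
for g, row in counts.items():
    cells = "".join(f"{row[m]:>13}" for m in majors)
    print(f"{g:<8}" + cells + f"{row_totals[g]:>8}")
print(f"{'Total':<8}" + "".join(f"{col_totals[m]:>13}" for m in majors)
      + f"{grand_total:>8}")
```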

Calculations in contingency tables

  • Calculate row totals by summing frequencies across each row
  • Calculate column totals by summing frequencies down each column
  • Marginal probabilities are probabilities of categories in a single variable, ignoring the other variable
    • Calculate row marginal probabilities by dividing row totals by the overall total
    • Calculate column marginal probabilities by dividing column totals by the overall total
  • Joint probabilities are probabilities of specific combinations of categories from both variables
    • Calculate by dividing cell frequencies by the overall total
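
The calculations above can be sketched in a few lines of Python. The counts are hypothetical (a gender-by-major table invented for illustration), and the variable names are not part of any standard API.

```python
# Hypothetical counts: rows are gender, columns are major (illustrative only).
counts = {
    "Female": {"Business": 30, "Engineering": 20, "Nursing": 50},
    "Male":   {"Business": 40, "Engineering": 50, "Nursing": 10},
}
majors = ["Business", "Engineering", "Nursing"]

# Totals: sum across each row, down each column, and over the whole table.
row_totals = {g: sum(row.values()) for g, row in counts.items()}
col_totals = {m: sum(counts[g][m] for g in counts) for m in majors}
grand_total = sum(row_totals.values())

# Marginal probabilities: a row or column total divided by the overall total.
row_marginal = {g: t / grand_total for g, t in row_totals.items()}
col_marginal = {m: t / grand_total for m, t in col_totals.items()}

# Joint probabilities: each cell count divided by the overall total.
joint = {(g, m): counts[g][m] / grand_total for g in counts for m in majors}

print("P(Female) =", row_marginal["Female"])
print("P(Business) =", col_marginal["Business"])
print("P(Female and Business) =", joint[("Female", "Business")])
```

The joint probabilities sum to 1 over all cells, and so do the marginal probabilities of each variable, which is a quick check that the table was entered correctly.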

Interpretation of probabilities

  • Marginal probabilities represent the proportion or likelihood of an observation in a specific category of one variable, regardless of the other variable
    • Marginal probability of a student being female, regardless of major
  • Joint probabilities represent the proportion or likelihood of an observation in a specific combination of categories from both variables
    • Joint probability of a student being female and majoring in business

Assessing Independence

Patterns in categorical variables

  • Compare each joint probability to the product of the corresponding marginal probabilities; under independence, $P(A \text{ and } B) = P(A) \times P(B)$ for every cell
    • Joint probabilities close to the product of marginal probabilities suggest independence
    • Joint probabilities that differ substantially from the product of marginal probabilities suggest association or dependency
  • Look for patterns or trends in the distribution of frequencies or probabilities across cells
    • Uneven distribution may indicate an association between variables (more females in nursing, more males in engineering)
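
One way to make this comparison concrete is to list, for every cell, the joint probability next to the product of its marginals. The sketch below does this for a hypothetical gender-by-major table; the counts are made up for illustration.

```python
# Hypothetical counts: rows are gender, columns are major (illustrative only).
counts = {
    "Female": {"Business": 30, "Engineering": 20, "Nursing": 50},
    "Male":   {"Business": 40, "Engineering": 50, "Nursing": 10},
}
majors = ["Business", "Engineering", "Nursing"]
n = sum(sum(row.values()) for row in counts.values())

# Marginal probabilities for each row category and each column category.
row_p = {g: sum(row.values()) / n for g, row in counts.items()}
col_p = {m: sum(counts[g][m] for g in counts) / n for m in majors}

# For each cell, compare the joint probability with the product of marginals.
diffs = {}
for g in counts:
    for m in majors:
        joint = counts[g][m] / n
        product = row_p[g] * col_p[m]
        diffs[(g, m)] = joint - product
        print(f"{g:<7}{m:<12} joint={joint:.3f}  "
              f"product={product:.3f}  diff={diffs[(g, m)]:+.3f}")
```

In this made-up table the Nursing and Engineering cells show the largest gaps, which is the kind of uneven pattern that hints at an association. How large a gap is too large to attribute to chance is a question for a formal test.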

Chi-square test for independence

  • Determines if there is a statistically significant association between two categorical variables
  • Compares observed frequencies in the contingency table to expected frequencies assuming independence
  • Calculate the chi-square test statistic:
    • $\chi^2 = \sum \frac{(O - E)^2}{E}$
    • $O$ = observed frequency, $E$ = expected frequency for each cell
  • Calculate expected frequency for each cell:
    • $E = \frac{(\text{row total}) \times (\text{column total})}{\text{overall total}}$
  • Under the null hypothesis of independence, the test statistic approximately follows a chi-square distribution with $(r-1)(c-1)$ degrees of freedom
    • $r$ = number of rows, $c$ = number of columns
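  • If the calculated chi-square test statistic exceeds the critical value at the chosen significance level, reject the null hypothesis of independence and conclude that there is a significant association between the variables
  • Otherwise, fail to reject the null hypothesis; the data do not provide sufficient evidence of an association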
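
The steps above can be sketched in plain Python. The observed counts are hypothetical, and the critical value 5.991 is the standard chi-square table entry for 2 degrees of freedom at the 0.05 significance level; a different table size or significance level would need a different critical value.

```python
# Hypothetical observed counts (illustrative only).
observed = [
    [30, 20, 50],  # Female: Business, Engineering, Nursing
    [40, 50, 10],  # Male:   Business, Engineering, Nursing
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / overall total.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over every cell.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (r - 1)(c - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumed critical value from a chi-square table: df = 2, alpha = 0.05.
CRITICAL_VALUE = 5.991
reject_independence = chi2 > CRITICAL_VALUE

print(f"chi-square = {chi2:.3f}, df = {df}")
print("Reject independence" if reject_independence else "Fail to reject independence")
```

If SciPy is available, `scipy.stats.chi2_contingency` performs the same calculation and also returns a p-value, which avoids looking up a critical value by hand.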