Fiveable

🎲 Intro to Statistics Unit 3 Review


3.4 Contingency Tables


Written by the Fiveable Content Team • Last updated September 2025

Contingency tables are powerful tools for analyzing relationships between categorical variables. They organize data into rows and columns, allowing us to see patterns and calculate probabilities easily.

Using these tables, we can determine joint, marginal, and conditional probabilities. This helps us understand how variables are related and whether they're independent, providing valuable insights into categorical data relationships.

Contingency Tables

Construction of contingency tables

  • Contingency tables summarize categorical data in a tabular format
    • Rows represent the levels of one variable, called the row variable (e.g., gender)
    • Columns represent the levels of another variable, called the column variable (e.g., preferred color)
  • Each cell contains the frequency or count of observations for that specific combination of row and column categories (number of females who prefer blue)
  • Contingency tables clearly display the relationship between two categorical variables (gender and color preference)
  • Marginal totals are sums of each row and column found in the table margins
    • Represent total frequency for each category of a single variable (total number of males, total number who prefer red)
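
As a rough illustration of the construction described above, the sketch below tallies a contingency table from raw observations. The survey data and all counts are hypothetical, invented only for this example; the names `observations`, `cells`, `row_totals`, and `col_totals` are likewise assumptions.

```python
from collections import Counter

# Hypothetical raw survey data: one (gender, color) pair per respondent.
observations = (
    [("female", "red")] * 20 + [("female", "blue")] * 30 + [("female", "green")] * 10
    + [("male", "red")] * 15 + [("male", "blue")] * 10 + [("male", "green")] * 15
)

# Cell counts: frequency of each (row category, column category) combination.
cells = Counter(observations)

# Marginal totals: count each category of one variable, ignoring the other.
row_totals = Counter(gender for gender, _ in observations)
col_totals = Counter(color for _, color in observations)
n = len(observations)

print(cells[("female", "blue")])  # 30
print(row_totals["male"])         # 40
print(col_totals["red"])          # 35
print(n)                          # 100
```

Each marginal total equals the sum of the cells in its row or column, so the row totals and the column totals both add up to the same grand total `n`.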

Probability calculations from contingency tables

  • Calculate probabilities using frequencies in a contingency table
  • Probability of an event is frequency of the event divided by total number of observations
    • $P(A) = \frac{n(A)}{n}$, where $n(A)$ is frequency of event $A$ and $n$ is total observations (probability of preferring green)
  • Joint probability is probability of two events occurring simultaneously
    • Calculated by dividing frequency in a specific cell by total observations
    • $P(A \cap B) = \frac{n(A \cap B)}{n}$, where $n(A \cap B)$ is frequency of events $A$ and $B$ together (probability of being female and preferring blue)
  • Marginal probability is probability of a single event, regardless of the other variable
    • Calculated by dividing marginal total of a row or column by total observations
    • $P(A) = \frac{n(A)}{n}$, where $n(A)$ is marginal total for event $A$ (probability of being male)
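
The joint and marginal formulas above can be sketched in a few lines. The counts are hypothetical, and the helper names (`joint`, `marginal_row`, `marginal_col`) are made up for this example.

```python
# Hypothetical counts: rows are gender, columns are preferred color.
table = {
    "female": {"red": 20, "blue": 30, "green": 10},
    "male":   {"red": 15, "blue": 10, "green": 15},
}

# Total number of observations n.
n = sum(sum(row.values()) for row in table.values())

def joint(row, col):
    """P(row and col): one cell's count divided by n."""
    return table[row][col] / n

def marginal_row(row):
    """P(row): the row's marginal total divided by n."""
    return sum(table[row].values()) / n

def marginal_col(col):
    """P(col): the column's marginal total divided by n."""
    return sum(r[col] for r in table.values()) / n

print(joint("female", "blue"))  # 0.3
print(marginal_row("male"))     # 0.4
print(marginal_col("green"))    # 0.25
```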

Conditional probabilities in contingency tables

  • Conditional probability is the probability of an event occurring given that another event is known to have occurred
    • Denoted as $P(A|B)$, read as "probability of $A$ given $B$" (probability of preferring red given the person is female)
  • To calculate conditional probability from a contingency table:
    1. Identify the row or column representing the given event (the condition)
    2. Use frequencies in that row or column to calculate probabilities
    3. $P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{n(A \cap B)}{n(B)}$, where $n(A \cap B)$ is frequency of events $A$ and $B$ together and $n(B)$ is marginal total for event $B$
  • Conditional probabilities help identify patterns or relationships between variables
    • Comparing $P(A|B)$ to $P(A)$ reveals if event $B$ affects likelihood of event $A$ (seeing if gender affects color preference)
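
The steps above might look like this in code, using hypothetical counts and exact fractions so the comparison between $P(A|B)$ and $P(A)$ is not blurred by rounding. The function names are assumptions for this sketch.

```python
from fractions import Fraction

# Hypothetical counts: rows are gender, columns are preferred color.
table = {
    "female": {"red": 20, "blue": 30, "green": 10},
    "male":   {"red": 15, "blue": 10, "green": 15},
}

def conditional_col_given_row(col, row):
    """P(col | row) = n(row and col) / n(row): restrict attention to one row."""
    return Fraction(table[row][col], sum(table[row].values()))

def marginal_col(col):
    """P(col) = column total / n."""
    n = sum(sum(r.values()) for r in table.values())
    return Fraction(sum(r[col] for r in table.values()), n)

p_red_given_female = conditional_col_given_row("red", "female")
p_red = marginal_col("red")

print(p_red_given_female)  # 1/3
print(p_red)               # 7/20
```

In this made-up data, $P(\text{red} \mid \text{female}) = 1/3$ is slightly lower than $P(\text{red}) = 7/20$, so knowing that the person is female shifts the probability of preferring red a little.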

Association and Independence

  • Association refers to the relationship between two categorical variables in a contingency table
  • Independence occurs when the probability of one event is not affected by the occurrence of another event
    • In a contingency table, variables are independent if the conditional probability equals the marginal probability, $P(A|B) = P(A)$, for every combination of row and column categories (equivalently, $P(A \cap B) = P(A) \cdot P(B)$)
  • Observed frequencies in each cell are compared with the frequencies expected under independence, (row total × column total) / $n$, to determine whether there is an association between variables
  • Degrees of freedom for a contingency table are calculated as (number of rows - 1) × (number of columns - 1)
    • Used in statistical tests to assess the significance of associations between variables
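
To make the independence idea concrete, here is a minimal sketch on hypothetical counts. It computes the cell counts that independence would predict, the degrees of freedom, and a chi-square statistic. The chi-square statistic is an assumed choice of test for this example, since the text above does not name a specific one.

```python
# Hypothetical counts: rows are gender, columns are preferred color.
table = {
    "female": {"red": 20, "blue": 30, "green": 10},
    "male":   {"red": 15, "blue": 10, "green": 15},
}

rows = list(table)
cols = list(table[rows[0]])
n = sum(table[r][c] for r in rows for c in cols)
row_totals = {r: sum(table[r].values()) for r in rows}
col_totals = {c: sum(table[r][c] for r in rows) for c in cols}

# Expected count in each cell if the two variables were independent:
# (row total * column total) / n.
expected = {r: {c: row_totals[r] * col_totals[c] / n for c in cols} for r in rows}

# Degrees of freedom: (number of rows - 1) * (number of columns - 1).
df = (len(rows) - 1) * (len(cols) - 1)

# Chi-square statistic (assumed choice of test): sum over all cells of
# (observed - expected)^2 / expected.
chi2 = sum(
    (table[r][c] - expected[r][c]) ** 2 / expected[r][c]
    for r in rows for c in cols
)

print(expected["female"]["red"])  # 21.0
print(df)                         # 2
print(round(chi2, 3))             # 8.036
```

If the variables were exactly independent in the sample, every observed count would equal its expected count and the statistic would be 0. Larger values indicate a bigger gap between the observed table and what independence predicts.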