🎳Intro to Econometrics Unit 6 Review

6.4 Sample selection bias

Written by the Fiveable Content Team • Last updated September 2025

Sample selection bias is a critical issue in econometrics that can lead to skewed results and faulty conclusions. It occurs when the sample used in a study isn't representative of the population, resulting in biased estimates and reduced external validity.

Detecting and correcting for sample selection bias is crucial for accurate analysis. Methods like comparing sample characteristics to the population, using the Heckman selection model, and applying techniques such as inverse probability weighting can help address this issue.

Types of sample selection bias

Sample selection bias occurs when the sample used in a study is not representative of the population of interest, leading to biased and inconsistent estimates
It arises from non-random selection or self-selection of individuals into or out of the sample, typically based on unobserved factors that are correlated with the unobserved determinants of the dependent variable
Common types include:
- Non-response bias: individuals who refuse to participate in a survey may differ systematically from those who do participate
- Incidental truncation: the outcome is observed only for part of the sample, depending on another variable (for example, wages are observed only for employed individuals)
- Self-selection bias: individuals self-select into treatment or control groups based on unobserved characteristics

Consequences of sample selection bias

Biased parameter estimates

Sample selection bias leads to biased and inconsistent estimates of the parameters of interest in a regression model
OLS applied to the selected sample is biased because, within that sample, the error term no longer has a conditional mean of zero given the regressors
The direction and magnitude of the bias depend on the nature of the selection process and the correlation between the unobserved factors affecting selection and the dependent variable
- For example, if high-ability individuals are more likely to self-select into a training program and also have higher earnings, the estimated effect of the training program on earnings will be upward biased
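The dependence of the bias on the selection process can be seen in a small simulation. The following is a minimal sketch, not a definitive design: the parameter values, the selection rule `x + v > 0`, and the error correlation `rho` are all illustrative assumptions. Because the selection error and the outcome error are positively correlated here, and individuals with low `x` are selected only when their selection error is large, the slope estimated on the selected sample comes out below the true value of 2.

```python
import random

random.seed(0)
n = 20000
beta0, beta1 = 1.0, 2.0  # true outcome parameters (illustrative values)
rho = 0.8                # assumed correlation between outcome and selection errors


def ols_slope(xs, ys):
    """Slope of a simple regression of ys on xs (with an intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


x_all, y_all, x_sel, y_sel = [], [], [], []
for _ in range(n):
    x = random.gauss(0, 1)
    v = random.gauss(0, 1)  # unobserved factor driving selection
    u = rho * v + (1 - rho**2) ** 0.5 * random.gauss(0, 1)  # outcome error, corr(u, v) = rho
    y = beta0 + beta1 * x + u
    x_all.append(x)
    y_all.append(y)
    if x + v > 0:  # the outcome is observed only for selected individuals
        x_sel.append(x)
        y_sel.append(y)

slope_full = ols_slope(x_all, y_all)      # uses everyone (infeasible in practice)
slope_selected = ols_slope(x_sel, y_sel)  # uses only the selected sample
print(f"true slope: {beta1}, full sample: {slope_full:.3f}, selected sample: {slope_selected:.3f}")
```

Setting `rho = 0` in this sketch removes the correlation between the two errors, and the selected-sample slope should then be close to the true value again.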

Incorrect inferences

Biased parameter estimates due to sample selection can lead to incorrect inferences and conclusions about the relationship between variables
Hypothesis tests and confidence intervals built on biased estimates do not have their nominal size or coverage, so the rates of Type I (false positive) and Type II (false negative) errors can differ substantially from what the researcher intends
The presence of sample selection bias can make it difficult to establish causal relationships between variables, as the observed association may be driven by unobserved factors rather than a true causal effect

Reduced external validity

Sample selection bias can limit the external validity or generalizability of the study's findings to the broader population of interest
If the sample is not representative of the population, the estimated relationships and effects may not hold for individuals outside the sample
This can be particularly problematic when the goal is to make policy recommendations or draw conclusions that are applicable to a wider population
- For instance, if a study on the effectiveness of a job training program only includes individuals who chose to participate, the results may not generalize to the population of all eligible individuals

Detecting sample selection bias

Comparing sample vs population

One way to detect sample selection bias is to compare the characteristics of the sample used in the study with those of the target population
Significant differences between the sample and population in terms of observable characteristics (such as demographics or socioeconomic status) may indicate the presence of selection bias, although similar observable characteristics do not rule out selection on unobserved factors
Statistical tests, such as t-tests or chi-square tests, can be used to assess whether the differences between the sample and population are statistically significant
- For example, if a survey on income has a significantly higher proportion of high-income respondents compared to the population, this may suggest non-response bias
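As a rough illustration of such a comparison, the sketch below runs a one-sample z-test for a proportion. The numbers are hypothetical: it assumes the population share of high-income households is known to be 20% (for instance from census data) and that 260 of 1,000 survey respondents are high-income. The function name `proportion_z_test` is introduced here for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: the sample proportion equals the population share p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical figures: 260 high-income respondents out of 1,000, population share 20%
z, p_value = proportion_z_test(successes=260, n=1000, p0=0.20)
print(f"z = {z:.2f}, p-value = {p_value:.2g}")
```

A small p-value here says only that the sample differs from the population on this observable characteristic; it does not by itself show how much the estimates of interest are biased.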

Using Heckman selection model

The Heckman selection model is a statistical method designed to detect and correct for sample selection bias in regression analysis
It involves estimating two equations: a selection equation that models the probability of an individual being included in the sample, and an outcome equation that models the relationship between the dependent variable and independent variables for the selected sample
The Heckman model includes an additional term, the inverse Mills ratio, in the outcome equation to account for the potential correlation between the unobserved factors affecting selection and the dependent variable
- A statistically significant coefficient on the inverse Mills ratio indicates the presence of sample selection bias
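The two equations and the role of the inverse Mills ratio can be written compactly. The notation below is an assumption introduced for illustration: $z_i$ are the regressors in the selection equation, $x_i$ those in the outcome equation, and the errors $(u_i, v_i)$ are taken to be bivariate normal with $\mathrm{Var}(v_i) = 1$, $\mathrm{Var}(u_i) = \sigma_u^2$, and correlation $\rho$.

```latex
\begin{align*}
s_i &= \mathbf{1}\{ z_i'\gamma + v_i > 0 \} && \text{(selection equation)} \\
y_i &= x_i'\beta + u_i, \quad \text{observed only if } s_i = 1 && \text{(outcome equation)} \\
E[\, y_i \mid x_i, z_i, s_i = 1 \,] &= x_i'\beta + \rho\,\sigma_u\,\lambda(z_i'\gamma),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
\end{align*}
```

Here $\lambda(\cdot)$ is the inverse Mills ratio, with $\phi$ and $\Phi$ the standard normal density and CDF. Under these assumptions the coefficient on $\lambda$ equals $\rho\,\sigma_u$, so it is zero exactly when the two errors are uncorrelated, which is why its significance is read as evidence of selection bias.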

Correcting for sample selection bias

Heckman two-step procedure

The Heckman two-step procedure is a method for correcting sample selection bias in regression analysis
In the first step, a probit model is estimated to predict the probability of an individual being included in the sample based on observed characteristics (the selection equation)
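In the second step, the inverse Mills ratio is computed from the fitted probit index (as the ratio of the standard normal density to the standard normal CDF, evaluated at that index) and included as an additional regressor in the outcome equation, which is then estimated on the selected sample using ordinary least squares (OLS)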
- The coefficient on the inverse Mills ratio captures the effect of sample selection bias, and the remaining coefficients provide consistent estimates of the parameters of interest
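The two steps can be sketched on simulated data as follows. This is a minimal illustration under stated assumptions, not a production implementation: the data-generating values, the variable `w` (a hypothetical regressor that affects selection but not the outcome), and the helper functions `solve`, `ols`, and `probit` are all introduced here, and only the Python standard library is used so that the example is self-contained. In applied work a statistics package would normally be used, and the second-step standard errors need a correction that this sketch does not compute.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def probit(Z, s, iterations=15):
    """Fit a probit model by Fisher scoring; returns the coefficient vector."""
    k = len(Z[0])
    gamma = [0.0] * k
    for _ in range(iterations):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for z, si in zip(Z, s):
            index = sum(g * zj for g, zj in zip(gamma, z))
            p = min(max(STD_NORMAL.cdf(index), 1e-12), 1 - 1e-12)
            d = STD_NORMAL.pdf(index)
            for i in range(k):
                score[i] += d * (si - p) / (p * (1 - p)) * z[i]
                for j in range(k):
                    info[i][j] += d * d / (p * (1 - p)) * z[i] * z[j]
        gamma = [g + step for g, step in zip(gamma, solve(info, score))]
    return gamma


# Simulated data (illustrative values): true outcome slope is 2, error correlation is 0.8
random.seed(1)
rho = 0.8
rows = []
for _ in range(5000):
    x, w, v = (random.gauss(0, 1) for _ in range(3))
    u = rho * v + (1 - rho**2) ** 0.5 * random.gauss(0, 1)
    selected = 0.5 + x + w + v > 0  # w enters selection only (exclusion restriction)
    rows.append((x, w, selected, 1.0 + 2.0 * x + u))

# Step 1: probit of the selection indicator on (1, x, w), using all observations
Z = [[1.0, x, w] for x, w, _, _ in rows]
s = [1.0 if sel else 0.0 for _, _, sel, _ in rows]
gamma_hat = probit(Z, s)

# Step 2: OLS of y on (1, x, inverse Mills ratio), using selected observations only
X_naive, X_heckman, y_selected = [], [], []
for (x, w, sel, y), z in zip(rows, Z):
    if sel:
        index = sum(g * zj for g, zj in zip(gamma_hat, z))
        mills = STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)
        X_naive.append([1.0, x])
        X_heckman.append([1.0, x, mills])
        y_selected.append(y)

b_naive = ols(X_naive, y_selected)
b_heckman = ols(X_heckman, y_selected)
print(f"naive OLS slope: {b_naive[1]:.3f}")
print(f"two-step slope: {b_heckman[1]:.3f}, coefficient on inverse Mills ratio: {b_heckman[2]:.3f}")
```

The variable `w` is included because, without some regressor that shifts selection but is excluded from the outcome equation, the inverse Mills ratio is identified only through its nonlinearity and tends to be highly collinear with the outcome regressors.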

Maximum likelihood estimation

Maximum likelihood estimation (MLE) is an alternative approach to correcting for sample selection bias
MLE involves specifying a joint distribution for the selection and outcome equations and estimating the parameters of both equations simultaneously by maximizing the likelihood function
When the joint distribution is correctly specified, MLE is more efficient than the two-step procedure and provides consistent estimates of the parameters, but it relies more heavily on the distributional assumptions and can be more computationally intensive
- MLE is often used when the error terms of the selection and outcome equations are believed to follow a specific joint distribution, such as a bivariate normal distribution
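For the bivariate normal case, the log-likelihood being maximized can be written out explicitly. The notation is an assumption introduced for illustration: $s_i$ indicates whether observation $i$ is selected, $z_i'\gamma$ is the selection index, $x_i'\beta$ is the outcome equation, $\sigma_u$ is the standard deviation of the outcome error, and $\rho$ is the correlation between the two errors (with the selection error normalized to unit variance).

```latex
\begin{align*}
\ln L(\beta, \gamma, \sigma_u, \rho)
  &= \sum_{i:\, s_i = 0} \ln\bigl[ 1 - \Phi(z_i'\gamma) \bigr] \\
  &\quad + \sum_{i:\, s_i = 1} \left\{
       \ln \phi\!\left( \frac{y_i - x_i'\beta}{\sigma_u} \right) - \ln \sigma_u
       + \ln \Phi\!\left( \frac{ z_i'\gamma + \rho\,(y_i - x_i'\beta)/\sigma_u }{ \sqrt{1 - \rho^2} } \right)
     \right\}
\end{align*}
```

The first sum is the probit contribution of the unselected observations; the second combines the normal density of the observed outcome with the probability of selection conditional on that outcome. Setting $\rho = 0$ separates the expression into an ordinary probit likelihood and an ordinary normal regression likelihood.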

Inverse probability weighting

Inverse probability weighting (IPW) is a method for correcting sample selection bias by reweighting the observed sample to make it representative of the population
IPW involves estimating the probability of each individual being included in the sample (the propensity score) based on observed characteristics and then weighting each observation by the inverse of its propensity score
Observations with a low probability of being selected receive higher weights, while observations with a high probability of being selected receive lower weights
- The reweighted sample mimics the distribution of the population, allowing for consistent estimation of the parameters of interest
IPW is appropriate when selection depends only on observable characteristics; unlike the Heckman approach, it does not require specifying a joint distribution for the selection and outcome equations
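A minimal sketch of the reweighting idea, using a simulated income survey: the group shares, income levels, and response rates are hypothetical, and the sketch assumes that the group indicator `high` is known for everyone contacted (respondents and non-respondents), so that response rates can be estimated within each group.

```python
import random
from collections import defaultdict

random.seed(2)

# Simulated survey frame (illustrative values): high-income people respond less often
people = []
for _ in range(20000):
    high = random.random() < 0.3
    income = 50 + (40 if high else 0) + random.gauss(0, 10)
    responded = random.random() < (0.4 if high else 0.8)
    people.append((high, income, responded))

# Estimate the propensity score as the observed response rate within each group
counts = defaultdict(lambda: [0, 0])  # group -> [respondents, total contacted]
for high, _, responded in people:
    counts[high][0] += responded
    counts[high][1] += 1
propensity = {group: r / total for group, (r, total) in counts.items()}

# Weight each respondent by the inverse of the estimated response probability
weighted_sum = weight_total = 0.0
respondent_incomes = []
for high, income, responded in people:
    if responded:
        weight = 1.0 / propensity[high]
        weighted_sum += weight * income
        weight_total += weight
        respondent_incomes.append(income)

true_mean = sum(income for _, income, _ in people) / len(people)
naive_mean = sum(respondent_incomes) / len(respondent_incomes)
ipw_mean = weighted_sum / weight_total
print(f"full frame: {true_mean:.2f}, respondents only: {naive_mean:.2f}, IPW: {ipw_mean:.2f}")
```

In this sketch the reweighting works because response depends only on the observed group. If response also depended on income within a group, the weights would not remove the bias.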

Examples of sample selection bias

Non-response bias in surveys

Non-response bias occurs when individuals who refuse to participate in a survey differ systematically from those who do participate
For example, in a survey on income, high-income individuals may be less likely to respond due to privacy concerns or time constraints
If non-respondents have systematically different incomes than respondents, the estimated average income from the survey will be biased (downward in this example, because high earners are under-represented among respondents)
- To correct for non-response bias, researchers can use methods such as weighting the sample based on observable characteristics or imputing missing values based on the characteristics of respondents

Incidental truncation in labor economics

Incidental truncation arises when the outcome is observed only for a subset of the sample, and membership in that subset depends on another variable
In labor economics, the standard example is the study of wage determinants: wages are observed only for individuals who are employed
If the factors that affect an individual's decision to work are correlated with the factors that affect their wage, the estimated wage equation will suffer from sample selection bias
- The Heckman selection model can be used to correct for incidental truncation by modeling the employment decision and the wage equation simultaneously

Self-selection bias in treatment effects

Self-selection bias occurs when individuals self-select into treatment or control groups based on unobserved characteristics that are correlated with the outcome of interest
For example, in a study on the effectiveness of a job training program, individuals who choose to participate may have higher motivation or ability than those who do not participate
If these unobserved characteristics are positively correlated with employment outcomes, the estimated effect of the training program will be upward biased
- To correct for self-selection bias, researchers can use methods such as instrumental variables (using a variable that affects participation but affects the outcome only through participation) or propensity score matching (matching treated and control individuals on observable characteristics, which removes the bias only to the extent that selection is driven by those observables)
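The upward bias and the instrumental-variables correction can both be seen in a small simulation. This is a sketch under stated assumptions: the true training effect is set to 2, `ability` is unobserved by the researcher, and `offer` is a hypothetical randomly assigned encouragement that raises participation but has no direct effect on earnings. The IV estimate is computed as the Wald ratio for a binary instrument.

```python
import random
from statistics import fmean

random.seed(3)
TRUE_EFFECT = 2.0  # assumed constant effect of training on earnings

data = []
for _ in range(40000):
    ability = random.gauss(0, 1)   # unobserved by the researcher
    offer = random.random() < 0.5  # hypothetical randomized instrument
    participates = -1.0 + 2.0 * offer + ability + random.gauss(0, 1) > 0
    earnings = 10.0 + TRUE_EFFECT * participates + 2.0 * ability + random.gauss(0, 1)
    data.append((offer, participates, earnings))


def group_mean(rows, key_index, value_index, key):
    """Mean of column value_index over rows whose column key_index equals key."""
    return fmean(float(r[value_index]) for r in rows if r[key_index] == key)


# Naive comparison: participants vs non-participants (contaminated by ability)
naive = group_mean(data, 1, 2, True) - group_mean(data, 1, 2, False)

# Wald estimator: effect of the offer on earnings divided by its effect on participation
reduced_form = group_mean(data, 0, 2, True) - group_mean(data, 0, 2, False)
first_stage = group_mean(data, 0, 1, True) - group_mean(data, 0, 1, False)
wald = reduced_form / first_stage

print(f"true effect: {TRUE_EFFECT}, naive: {naive:.2f}, IV (Wald): {wald:.2f}")
```

The naive difference overstates the effect because participants have higher average ability, while the Wald ratio uses only the variation in participation induced by the randomized offer. Propensity score matching is not shown here; it would not help in this design, since ability is unobserved.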
