🎳Intro to Econometrics Unit 9 Review

9.1 Endogeneity

Written by the Fiveable Content Team • Last updated September 2025

Endogeneity is a critical issue in econometrics that can lead to biased and inconsistent estimates. It occurs when explanatory variables are correlated with the error term, violating a key assumption of ordinary least squares (OLS) regression.

This topic explores the sources of endogeneity, including omitted variable bias, measurement error, and simultaneity. It also covers methods for detecting and addressing endogeneity, such as instrumental variables and fixed effects estimation, along with their limitations and challenges.

Sources of endogeneity

Endogeneity arises when the explanatory variable is correlated with the error term in a regression model, violating the assumption of exogeneity required for unbiased and consistent OLS estimates
Endogeneity can lead to biased and inconsistent estimates of the causal effect of the explanatory variable on the dependent variable, making it difficult to draw valid inferences about the relationship between the variables
Three main sources of endogeneity in econometric models include omitted variable bias, measurement error, and simultaneity bias

Omitted variable bias

Occurs when a variable that affects the dependent variable and is correlated with the explanatory variable is omitted from the regression model
The omitted variable becomes part of the error term, causing the explanatory variable to be correlated with the error term and leading to biased estimates
Examples of omitted variables include ability in wage regressions (correlated with education and wages) and advertising expenditure in demand estimation (correlated with price and quantity demanded)
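
The wage example can be made concrete with a small simulation. This is a minimal sketch under assumed, purely illustrative coefficients (a true return to education of 1.0 and an ability effect of 2.0); the variable names and the helper `ols` are hypothetical and not part of the original guide.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (all numbers are illustrative).
ability = rng.normal(size=n)                      # unobserved by the researcher
education = 0.8 * ability + rng.normal(size=n)    # education depends on ability
wage = 1.0 * education + 2.0 * ability + rng.normal(size=n)  # true return = 1.0

def ols(y, *regressors):
    """Return OLS coefficients for y on a constant plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

beta_short = ols(wage, education)[1]            # ability omitted
beta_long = ols(wage, education, ability)[1]    # ability included

print(f"education coefficient, ability omitted:  {beta_short:.3f}")
print(f"education coefficient, ability included: {beta_long:.3f}")
```

In this setup the short regression converges to 1.0 + 2.0 × Cov(education, ability) / Var(education) = 1.0 + 1.6 / 1.64, roughly 1.98, so the omitted variable nearly doubles the apparent return to education.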

Measurement error

Arises when the explanatory variable is measured with error, causing the observed values to differ from the true values
Classical measurement error in the explanatory variable (error that is uncorrelated with the true value and with the regression error) leads to attenuation bias, where the estimated coefficient is biased towards zero
Examples of measurement error include self-reported income (subject to recall bias and social desirability bias) and proxy variables used to measure unobservable concepts like ability or motivation
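
Attenuation bias can be illustrated with simulated data. The sketch below assumes classical measurement error with the same variance as the true regressor, so the reliability ratio Var(true) / (Var(true) + Var(error)) is 0.5; all numbers and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

x_true = rng.normal(scale=1.0, size=n)           # true regressor, variance 1
y = 2.0 * x_true + rng.normal(size=n)            # true slope = 2.0
x_obs = x_true + rng.normal(scale=1.0, size=n)   # classical error, variance 1

def slope(y, x):
    """OLS slope from a simple regression of y on x (with intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

reliability = 1.0 / (1.0 + 1.0)   # Var(x_true) / (Var(x_true) + Var(error))
slope_true = slope(y, x_true)     # regressor measured without error
slope_obs = slope(y, x_obs)       # regressor measured with error

print(f"slope using true x:      {slope_true:.3f}")
print(f"slope using mismeasured: {slope_obs:.3f}")
print(f"true slope x reliability: {2.0 * reliability:.3f}")
```

The slope on the mismeasured regressor shrinks toward the true slope multiplied by the reliability ratio, which is 1.0 here rather than 2.0.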

Simultaneity bias

Occurs when the explanatory variable is jointly determined with the dependent variable, creating a bidirectional causal relationship
Simultaneity bias arises from reverse causality, where the dependent variable also affects the explanatory variable
Examples of simultaneity bias include the relationship between price and quantity in supply and demand models (price affects quantity demanded, but quantity demanded also affects price) and the link between crime rates and police presence (higher crime rates lead to increased police presence, but increased police presence may also deter crime)
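
The supply and demand example can be sketched as a simulation. The market below is a hypothetical linear system with a demand slope of -1.0 and a supply slope of +1.0; the shock variances are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical linear market (slopes and shock sizes are illustrative):
#   demand: q = -1.0 * p + u_d
#   supply: q = +1.0 * p + u_s
u_d = rng.normal(scale=1.0, size=n)   # demand shocks
u_s = rng.normal(scale=2.0, size=n)   # supply shocks

# Solve the two equations for the equilibrium price and quantity.
p = (u_d - u_s) / 2.0
q = p + u_s

# Regressing equilibrium quantity on equilibrium price.
ols_slope = np.cov(p, q)[0, 1] / np.var(p, ddof=1)
print(f"OLS slope of q on p: {ols_slope:.3f}")
```

With these variances the OLS slope converges to (1 - 4) / (1 + 4) = -0.6, which is neither the demand slope (-1.0) nor the supply slope (+1.0) but a mixture determined by the relative size of the two shocks.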

Consequences of endogeneity

Biased OLS estimates

Endogeneity leads to biased OLS estimates, where the estimated coefficients systematically deviate from the true population parameters
The direction and magnitude of the bias depend on the nature of the endogeneity and the correlation between the explanatory variable and the error term
Biased estimates can lead to incorrect conclusions about the causal effect of the explanatory variable on the dependent variable

Inconsistent OLS estimates

Endogeneity also results in inconsistent OLS estimates, where the estimated coefficients do not converge to the true population parameters as the sample size increases
Inconsistent estimates do not provide reliable information about the true relationship between the variables, even with large sample sizes
Consistency is a property of the OLS estimator, not an assumption; endogeneity violates the exogeneity assumption on which that property depends, making the estimates unreliable for inference and policy-making
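
For a simple regression, the standard large-sample result makes the inconsistency explicit. The notation is an assumption introduced here for illustration: the model is $y = \beta_0 + \beta_1 x + u$, with $x$ the explanatory variable and $u$ the error term.

```latex
\operatorname*{plim}_{n \to \infty} \hat{\beta}_1^{\mathrm{OLS}}
  = \beta_1 + \frac{\operatorname{Cov}(x, u)}{\operatorname{Var}(x)}
```

The second term does not depend on the sample size, so it does not vanish as more data are collected; it is zero only when $\operatorname{Cov}(x, u) = 0$, that is, when $x$ is exogenous.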

Misleading inference

Endogeneity can lead to misleading inference about the statistical significance and magnitude of the estimated coefficients
Biased and inconsistent estimates may result in incorrect conclusions about the presence, direction, and strength of the causal relationship between the variables
Misleading inference can have serious consequences for policy decisions and resource allocation based on the flawed estimates

Detecting endogeneity

Theoretical considerations

Identifying potential sources of endogeneity based on economic theory and knowledge of the research context
Considering whether there are omitted variables, measurement errors, or simultaneous relationships that could lead to endogeneity in the model
Using theoretical arguments to justify the presence or absence of endogeneity in the specific research setting

Hausman specification test

A statistical test that compares the OLS estimates with alternative estimates that are consistent under the presence of endogeneity (e.g., IV estimates)
The null hypothesis is that the explanatory variable is exogenous, and the OLS estimates are consistent and efficient
Rejecting the null hypothesis suggests the presence of endogeneity and the need for alternative estimation methods

Durbin-Wu-Hausman test

An alternative version of the Hausman specification test that uses the residuals from the first-stage regression of the endogenous variable on the instruments as an additional regressor in the second stage
The null hypothesis is that the explanatory variable is exogenous, and the coefficient on the first-stage residuals is zero
Rejecting the null hypothesis indicates the presence of endogeneity and the need for alternative estimation methods
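
The regression-based version of the test can be sketched in two steps. This is a minimal illustration on simulated data with one endogenous regressor and one instrument; the data-generating process and the helper `ols` are assumptions, and the conventional standard errors used here are appropriate only for testing the null of exogeneity.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

z = rng.normal(size=n)                        # instrument
u = rng.normal(size=n)                        # structural error
x = 0.8 * z + 0.6 * u + rng.normal(size=n)    # endogenous: depends on u
y = 1.0 * x + u                               # true coefficient on x = 1.0

def ols(y, X):
    """Return OLS coefficients, residuals and conventional standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, resid, se

const = np.ones(n)

# Step 1: first-stage regression of x on the instrument; keep the residuals.
_, v_hat, _ = ols(x, np.column_stack([const, z]))

# Step 2: add the first-stage residuals to the structural equation.
beta, _, se = ols(y, np.column_stack([const, x, v_hat]))
t_stat = beta[2] / se[2]   # t-statistic on the first-stage residuals

print(f"coefficient on x:        {beta[1]:.3f}")
print(f"t-stat on residual term: {t_stat:.1f}")
```

A large t-statistic on the residual term leads to rejecting exogeneity. In this setup the coefficient on x in the augmented regression coincides with the 2SLS estimate.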

Addressing endogeneity

Instrumental variables (IV) approach

A method that uses one or more instrumental variables to isolate the exogenous variation in the endogenous explanatory variable
An instrumental variable is a variable that is correlated with the endogenous explanatory variable but uncorrelated with the error term
The IV approach estimates the causal effect of the explanatory variable on the dependent variable by using the exogenous variation in the instrumental variable

Two-stage least squares (2SLS)

A common estimation method for implementing the IV approach
In the first stage, the endogenous explanatory variable is regressed on the instrumental variables and other exogenous variables to obtain the predicted values
In the second stage, the dependent variable is regressed on the predicted values of the endogenous explanatory variable from the first stage and other exogenous variables
The 2SLS estimates are consistent in the presence of endogeneity, provided that the instrumental variables are valid; they are not unbiased in finite samples, and the bias can be substantial when instruments are weak
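
The two stages can be written out by hand on simulated data. This is a sketch under an assumed data-generating process with a true slope of 1.5; it recovers the point estimates only, since standard errors taken from a manual second stage are not valid and dedicated IV routines correct them.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

z = rng.normal(size=n)                        # instrument
u = rng.normal(size=n)                        # structural error
x = 0.7 * z + 0.5 * u + rng.normal(size=n)    # endogenous regressor
y = 2.0 + 1.5 * x + u                         # true slope = 1.5

const = np.ones(n)
Z = np.column_stack([const, z])               # instruments (incl. constant)
X = np.column_stack([const, x])               # original regressors

# Stage 1: regress x on the instruments and form predicted values.
gamma = np.linalg.lstsq(Z, x, rcond=None)[0]
x_hat = Z @ gamma

# Stage 2: regress y on the predicted values from stage 1.
X_hat = np.column_stack([const, x_hat])
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# OLS for comparison (inconsistent here because x is correlated with u).
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"OLS slope:  {beta_ols[1]:.3f}")
print(f"2SLS slope: {beta_2sls[1]:.3f}")
```

In this setup OLS converges to 1.5 + 0.5 / 1.74, roughly 1.79, while the 2SLS slope should lie close to the true value of 1.5.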

Fixed effects estimation

A method that controls for unobserved time-invariant factors that may be correlated with the explanatory variable and the dependent variable
Fixed effects estimation uses within-group variation (e.g., within individuals, firms, or states) to estimate the causal effect, eliminating the bias from time-invariant omitted variables
Examples include individual fixed effects in panel data models and state fixed effects in data with several observations per state (for example, individuals or years within states)
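
The within-group idea can be sketched by demeaning each variable within units. The panel below is simulated under illustrative assumptions (2,000 units, 5 periods, a true slope of 1.0, and a unit effect correlated with the regressor); `demean_by_group` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(5)
n_units, n_periods = 2_000, 5
n_obs = n_units * n_periods

alpha = rng.normal(size=n_units)                     # unobserved unit effect
unit = np.repeat(np.arange(n_units), n_periods)      # unit id for each row
x = 0.9 * alpha[unit] + rng.normal(size=n_obs)       # x correlated with alpha
y = 1.0 * x + 2.0 * alpha[unit] + rng.normal(size=n_obs)  # true slope = 1.0

def demean_by_group(v, groups, n_groups):
    """Subtract each group's mean from its observations."""
    sums = np.bincount(groups, weights=v, minlength=n_groups)
    counts = np.bincount(groups, minlength=n_groups)
    return v - (sums / counts)[groups]

x_w = demean_by_group(x, unit, n_units)
y_w = demean_by_group(y, unit, n_units)

beta_fe = (x_w @ y_w) / (x_w @ x_w)                      # within estimator
beta_pooled = np.cov(x, y)[0, 1] / np.var(x, ddof=1)     # pooled OLS slope

print(f"pooled OLS slope:    {beta_pooled:.3f}")
print(f"fixed effects slope: {beta_fe:.3f}")
```

Demeaning removes the time-invariant unit effect, so the within estimator should land near 1.0, whereas pooled OLS converges to roughly 1.99 in this setup.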

Difference-in-differences (DID)

A method that estimates the causal effect of a treatment by comparing the change in outcomes for the treatment group with the change in outcomes for a control group
DID controls for time-invariant unobserved factors and common time trends that may be correlated with the treatment and the outcome
The key assumption is that the treatment and control groups would have followed parallel trends in the absence of the treatment (parallel trends assumption)
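
The comparison of changes can be computed directly from four cell means, or equivalently from a regression with an interaction term. The data below are simulated under an assumed treatment effect of 2.0 with parallel trends built in; all coefficients are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 40_000

treated = rng.integers(0, 2, size=n)   # 1 = treatment group
post = rng.integers(0, 2, size=n)      # 1 = after the treatment date
effect = 2.0                           # assumed true treatment effect

# Group gap (3.0) and common time trend (1.5) are differenced out by DID.
y = 1.0 + 3.0 * treated + 1.5 * post + effect * treated * post + rng.normal(size=n)

def cell_mean(group, period):
    """Mean outcome for one group in one period."""
    return y[(treated == group) & (post == period)].mean()

did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Equivalent regression: the interaction coefficient is the DID estimate.
X = np.column_stack([np.ones(n), treated, post, treated * post])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"DID from cell means:     {did:.3f}")
print(f"interaction coefficient: {beta[3]:.3f}")
```

Because the regression is fully saturated in the two indicators, the interaction coefficient matches the cell-mean calculation up to floating-point error.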

Regression discontinuity design (RDD)

A method that estimates the causal effect of a treatment by comparing observations just above and below a cutoff point that determines treatment assignment
RDD exploits the discontinuity in treatment assignment at the cutoff point, assuming that observations near the cutoff are similar in unobserved characteristics
The key assumption is that the potential outcomes are continuous at the cutoff point (continuity assumption)
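
A sharp design can be sketched with a local linear regression on each side of the cutoff. The running variable, the bandwidth of 0.5, and the jump of 3.0 are all illustrative assumptions; in applied work the bandwidth is usually chosen by a data-driven procedure rather than fixed in advance.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
cutoff, bandwidth = 0.0, 0.5           # assumed values for illustration

running = rng.uniform(-1, 1, size=n)                 # running variable
treat = (running >= cutoff).astype(float)            # sharp assignment rule
y = 0.5 + 1.0 * running + 3.0 * treat + rng.normal(size=n)   # true jump = 3.0

# Keep only observations close to the cutoff.
near = np.abs(running - cutoff) <= bandwidth
r = running[near] - cutoff             # centered running variable
d = treat[near]

# Separate intercepts and slopes on each side of the cutoff.
X = np.column_stack([np.ones(r.size), d, r, d * r])
beta = np.linalg.lstsq(X, y[near], rcond=None)[0]
rdd_estimate = beta[1]                 # jump in the outcome at the cutoff

print(f"estimated discontinuity: {rdd_estimate:.3f}")
```

Centering the running variable at the cutoff makes the coefficient on the treatment indicator equal to the estimated jump at the threshold.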

Instrumental variables

Relevance condition

An instrumental variable must be correlated with the endogenous explanatory variable
The relevance condition ensures that the instrument provides sufficient exogenous variation in the explanatory variable to identify the causal effect
The strength of the correlation between the instrument and the endogenous explanatory variable determines the strength of the instrument

Exclusion restriction

An instrumental variable must be uncorrelated with the error term in the structural equation
The exclusion restriction implies that the instrument affects the dependent variable only through its effect on the endogenous explanatory variable
Violating the exclusion restriction makes the instrument invalid, and the resulting IV estimates are biased and inconsistent

Strength of instruments

The strength of an instrument refers to the magnitude of its correlation with the endogenous explanatory variable
Weak instruments are those that have a low correlation with the endogenous explanatory variable, leading to imprecise and potentially biased IV estimates
The strength of instruments can be assessed using the first-stage F-statistic, with a common rule of thumb being an F-statistic greater than 10 for a single endogenous regressor

Weak instruments problem

Weak instruments can leave IV estimates badly biased in finite samples, with the bias pulling toward the OLS estimate, and they make the usual standard errors and tests unreliable
Weak instruments also result in larger standard errors and wider confidence intervals, reducing the precision of the estimates
The weak instruments problem can be addressed by using stronger instruments, increasing the sample size, or employing alternative estimation methods (e.g., limited information maximum likelihood, LIML)

Evaluating IV estimates

First-stage F-statistic

A measure of the strength of the instrumental variables in the first stage of the 2SLS estimation
The first-stage F-statistic tests the joint significance of the excluded instruments in the first-stage regression
A large F-statistic (greater than 10 for a single endogenous regressor) indicates strong instruments, while a small F-statistic suggests weak instruments
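
The first-stage F-statistic on the excluded instruments can be computed by comparing the first stage with and without them. The sketch below uses simulated data with one strong and one nearly irrelevant instrument; the coefficients and the helper `first_stage_f` are assumptions, and the statistic is the homoskedastic version.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5_000

w = rng.normal(size=n)                 # included exogenous control
z_strong = rng.normal(size=n)          # instrument with a sizeable first stage
z_weak = rng.normal(size=n)            # instrument with almost no first stage
x_strong = 0.5 * z_strong + 0.3 * w + rng.normal(size=n)
x_weak = 0.01 * z_weak + 0.3 * w + rng.normal(size=n)

def first_stage_f(x, instruments, controls):
    """F-statistic for the joint significance of the excluded instruments."""
    n_obs = len(x)
    const = np.ones(n_obs)
    X_r = np.column_stack([const, controls])                # restricted
    X_u = np.column_stack([const, controls, instruments])   # unrestricted

    def ssr(X):
        resid = x - X @ np.linalg.lstsq(X, x, rcond=None)[0]
        return resid @ resid

    q = X_u.shape[1] - X_r.shape[1]    # number of excluded instruments
    return ((ssr(X_r) - ssr(X_u)) / q) / (ssr(X_u) / (n_obs - X_u.shape[1]))

f_strong = first_stage_f(x_strong, z_strong, w)
f_weak = first_stage_f(x_weak, z_weak, w)

print(f"first-stage F, strong instrument: {f_strong:.1f}")
print(f"first-stage F, weak instrument:   {f_weak:.1f}")
```

The strong instrument should produce an F-statistic far above the rule-of-thumb threshold of 10, while the weak one will typically fall below it.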

Sargan-Hansen overidentification test

A test for the validity of the instrumental variables when there are more instruments than endogenous regressors (overidentified model)
The null hypothesis is that all instruments are valid, i.e., uncorrelated with the error term and correctly excluded from the structural equation
Rejecting the null hypothesis suggests that at least one of the instruments is invalid and the IV estimates may be biased; failing to reject does not prove that the instruments are valid
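
The Sargan form of the test can be sketched as n times the R-squared from regressing the 2SLS residuals on all instruments. The example assumes two instruments and one endogenous regressor, so the statistic is compared with a chi-square distribution with one degree of freedom; the data and the helper `sargan` are hypothetical, and this version assumes homoskedastic errors.

```python
import math
import numpy as np

rng = np.random.default_rng(9)
n = 20_000

z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.6 * z1 + 0.6 * z2 + 0.5 * u + rng.normal(size=n)   # endogenous regressor

def sargan(y, x, instruments):
    """Sargan statistic and p-value for 2 instruments, 1 endogenous regressor."""
    n_obs = len(y)
    const = np.ones(n_obs)
    Z = np.column_stack([const, instruments])
    # 2SLS point estimates via the two stages.
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    beta = np.linalg.lstsq(np.column_stack([const, x_hat]), y, rcond=None)[0]
    resid = y - beta[0] - beta[1] * x          # residuals use actual x
    # Regress the 2SLS residuals on all instruments and take n * R-squared.
    fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    ssr = (resid - fitted) @ (resid - fitted)
    sst = (resid - resid.mean()) @ (resid - resid.mean())
    stat = max(n_obs * (1.0 - ssr / sst), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    return stat, p_value

Z_both = np.column_stack([z1, z2])
y_valid = 1.0 * x + u                   # both instruments excluded from y
y_invalid = 1.0 * x + 0.5 * z2 + u      # z2 affects y directly

stat_valid, p_valid = sargan(y_valid, x, Z_both)
stat_bad, p_bad = sargan(y_invalid, x, Z_both)

print(f"valid instruments:   stat={stat_valid:.2f}, p={p_valid:.3f}")
print(f"one invalid (z2):    stat={stat_bad:.2f}, p={p_bad:.3g}")
```

The test detects that the instruments disagree about the coefficient; it does not by itself reveal which instrument is at fault.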

Comparison with OLS estimates

Comparing the IV estimates with the OLS estimates can provide insights into the presence and direction of endogeneity bias
If the IV and OLS estimates are similar, it suggests that endogeneity may not be a significant problem in the model
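If the IV and OLS estimates differ substantially, this may indicate endogeneity bias in OLS, although the gap can also reflect weak or invalid instruments, or the fact that IV identifies a local effect for compliers rather than a population average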

Limitations of IV approach

Finding valid instruments

The main challenge in implementing the IV approach is finding valid instruments that satisfy both the relevance condition and the exclusion restriction
In many research settings, it can be difficult to identify variables that are correlated with the endogenous explanatory variable but uncorrelated with the error term
Using invalid or weak instruments can lead to biased and inconsistent IV estimates, undermining the purpose of the IV approach

Local average treatment effect (LATE)

IV estimates identify the local average treatment effect (LATE) for the subpopulation of compliers, i.e., those who respond to changes in the instrument
The LATE may differ from the average treatment effect (ATE) for the entire population, limiting the generalizability of the IV estimates
The interpretation of the LATE depends on the specific instrument used and the compliers' characteristics, which may not be representative of the population of interest
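
The gap between the LATE and the ATE can be illustrated with a binary instrument and a binary treatment. The population shares and treatment effects below are purely illustrative assumptions, and the setup rules out defiers (monotonicity).

```python
import numpy as np

rng = np.random.default_rng(10)
n = 200_000

# Hypothetical population: 50% compliers, 30% never-takers, 20% always-takers.
kind = rng.choice(["complier", "never", "always"], size=n, p=[0.5, 0.3, 0.2])
z = rng.integers(0, 2, size=n)         # randomly assigned binary instrument

# Treatment take-up: compliers follow z, the other types ignore it.
d = np.where(kind == "always", 1, np.where(kind == "never", 0, z))

# Treatment effects differ by type (illustrative numbers).
effect = np.where(kind == "complier", 1.0, np.where(kind == "never", 3.0, 5.0))
y = effect * d + rng.normal(size=n)

# Wald estimator: reduced form divided by first stage.
reduced_form = y[z == 1].mean() - y[z == 0].mean()
first_stage = d[z == 1].mean() - d[z == 0].mean()
wald = reduced_form / first_stage

ate = effect.mean()                    # population average treatment effect

print(f"IV (Wald) estimate: {wald:.3f}")
print(f"population ATE:     {ate:.3f}")
```

The Wald estimate recovers the compliers' effect of 1.0, while the population ATE in this setup is 0.5 × 1.0 + 0.3 × 3.0 + 0.2 × 5.0 = 2.4.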

External validity concerns

IV estimates may have limited external validity, as they are specific to the context, population, and instruments used in the study
The causal effect identified by the IV approach may not generalize to other settings, populations, or time periods
Assessing the external validity of IV estimates requires careful consideration of the similarities and differences between the study context and the target context for generalization
