Fiveable

Intro to Econometrics Unit 9 Review

9.1 Endogeneity

Written by the Fiveable Content Team • Last updated September 2025

Endogeneity is a critical issue in econometrics that can lead to biased and inconsistent estimates. It occurs when explanatory variables are correlated with the error term, violating a key assumption of ordinary least squares (OLS) regression.

This topic explores the sources of endogeneity, including omitted variable bias, measurement error, and simultaneity. It also covers methods for detecting and addressing endogeneity, such as instrumental variables and fixed effects estimation, along with their limitations and challenges.

Sources of endogeneity

  • Endogeneity arises when the explanatory variable is correlated with the error term in a regression model, violating the assumption of exogeneity required for unbiased and consistent OLS estimates
  • Endogeneity can lead to biased and inconsistent estimates of the causal effect of the explanatory variable on the dependent variable, making it difficult to draw valid inferences about the relationship between the variables
  • Three main sources of endogeneity in econometric models include omitted variable bias, measurement error, and simultaneity bias

Omitted variable bias

  • Occurs when a relevant variable that affects the dependent variable and is correlated with the explanatory variable is omitted from the regression model
  • The omitted variable becomes part of the error term, causing the explanatory variable to be correlated with the error term and leading to biased estimates
  • Examples of omitted variables include ability in wage regressions (correlated with education and wages) and advertising expenditure in demand estimation (correlated with price and quantity demanded)
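
The direction and size of the bias can be written out for the simplest case. The sketch below assumes a single omitted variable w; the symbols are illustrative and do not come from the guide itself.

```latex
\begin{aligned}
\text{True model:}\quad & y = \beta_0 + \beta_1 x + \beta_2 w + u,
  \qquad \operatorname{Cov}(x,u) = \operatorname{Cov}(w,u) = 0 \\
\text{Estimated model:}\quad & y = \alpha_0 + \alpha_1 x + e,
  \qquad e \text{ absorbs } \beta_2 w + u \\
\text{OLS limit:}\quad & \operatorname{plim}\,\hat{\alpha}_1
  = \beta_1 + \beta_2\,\frac{\operatorname{Cov}(x,w)}{\operatorname{Var}(x)}
\end{aligned}
```

The bias term vanishes only if the omitted variable has no effect on y (β₂ = 0) or is uncorrelated with x. In the wage example, if ability raises wages and is positively correlated with education, both factors are positive and OLS would overstate the return to education.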

Measurement error

  • Arises when the explanatory variable is measured with error, causing the observed values to differ from the true values
  • Classical measurement error in the explanatory variable (random noise unrelated to the true value and to the regression error) leads to attenuation bias, where the estimated coefficient is biased towards zero
  • Examples of measurement error include self-reported income (subject to recall bias and social desirability bias) and proxy variables used to measure unobservable concepts like ability or motivation
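
Attenuation bias is easy to see in a small simulation. The sketch below uses made-up parameters (a true slope of 2 and noise with the same variance as the true regressor), so the observed slope should shrink by a factor of about Var(x) / (Var(x) + Var(noise)) = 0.5. All names are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
n = 50_000
true_beta = 2.0

x_true = [rng.gauss(0, 1) for _ in range(n)]            # true regressor, variance 1
y = [true_beta * x + rng.gauss(0, 1) for x in x_true]   # outcome depends on the true value
x_obs = [x + rng.gauss(0, 1) for x in x_true]           # observed value = truth + noise (variance 1)

slope_true = ols_slope(x_true, y)   # close to 2.0
slope_obs = ols_slope(x_obs, y)     # close to 2.0 * 0.5 = 1.0

print(f"slope using true x:     {slope_true:.3f}")
print(f"slope using observed x: {slope_obs:.3f}")
```

The second slope is pulled towards zero even though the sample is large, which illustrates that the problem is one of inconsistency, not just small-sample noise.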

Simultaneity bias

  • Occurs when the explanatory variable is jointly determined with the dependent variable, creating a bidirectional causal relationship
  • Simultaneity bias arises from reverse causality, where the dependent variable also affects the explanatory variable
  • Examples of simultaneity bias include the relationship between price and quantity in supply and demand models (price affects quantity demanded, but quantity demanded also affects price) and the link between crime rates and police presence (higher crime rates lead to increased police presence, but increased police presence may also deter crime)
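
A two-equation system shows why joint determination makes a regressor endogenous. This is a stylized sketch with illustrative symbols, assuming the two structural errors are uncorrelated and αγ ≠ 1.

```latex
\begin{aligned}
y_1 &= \alpha\, y_2 + u_1, \qquad y_2 = \gamma\, y_1 + u_2,
  \qquad \operatorname{Cov}(u_1,u_2) = 0 \\
y_2 &= \gamma(\alpha\, y_2 + u_1) + u_2
  \;\Longrightarrow\; y_2 = \frac{\gamma\, u_1 + u_2}{1 - \alpha\gamma} \\
\operatorname{Cov}(y_2, u_1) &= \frac{\gamma\,\operatorname{Var}(u_1)}{1 - \alpha\gamma} \neq 0
  \quad \text{whenever } \gamma \neq 0
\end{aligned}
```

Whenever y₁ feeds back into y₂ (γ ≠ 0), the regressor y₂ contains part of the error u₁, so OLS applied to the first equation is inconsistent.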

Consequences of endogeneity

Biased OLS estimates

  • Endogeneity leads to biased OLS estimates: the expected value of the estimator differs from the true population parameter, so the estimates are systematically too high or too low
  • The direction and magnitude of the bias depend on the nature of the endogeneity and the correlation between the explanatory variable and the error term
  • Biased estimates can lead to incorrect conclusions about the causal effect of the explanatory variable on the dependent variable

Inconsistent OLS estimates

  • Endogeneity also results in inconsistent OLS estimates, where the estimated coefficients do not converge to the true population parameters as the sample size increases
  • Inconsistent estimates do not provide reliable information about the true relationship between the variables, even with large sample sizes
  • Endogeneity violates the exogeneity assumption that OLS needs for consistency, making the estimates unreliable for inference and policy-making

Misleading inference

  • Endogeneity can lead to misleading inference about the statistical significance and magnitude of the estimated coefficients
  • Biased and inconsistent estimates may result in incorrect conclusions about the presence, direction, and strength of the causal relationship between the variables
  • Misleading inference can have serious consequences for policy decisions and resource allocation based on the flawed estimates

Detecting endogeneity

Theoretical considerations

  • Identifying potential sources of endogeneity based on economic theory and knowledge of the research context
  • Considering whether there are omitted variables, measurement errors, or simultaneous relationships that could lead to endogeneity in the model
  • Using theoretical arguments to justify the presence or absence of endogeneity in the specific research setting

Hausman specification test

  • A statistical test that compares the OLS estimates with alternative estimates that are consistent under the presence of endogeneity (e.g., IV estimates)
  • The null hypothesis is that the explanatory variable is exogenous, and the OLS estimates are consistent and efficient
  • Rejecting the null hypothesis suggests the presence of endogeneity and the need for alternative estimation methods

Durbin-Wu-Hausman test

  • A regression-based version of the Hausman specification test: the residuals from the first-stage regression of the endogenous variable on the instruments (and other exogenous variables) are added as an extra regressor to the original equation, which is then estimated by OLS
  • The null hypothesis is that the explanatory variable is exogenous, and the coefficient on the first-stage residuals is zero
  • Rejecting the null hypothesis indicates the presence of endogeneity and the need for alternative estimation methods
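
The regression-based test can be sketched on simulated data. The example below is a minimal illustration, assuming one endogenous regressor, one instrument, no other controls, and homoskedastic errors; the data-generating values and variable names are invented for the example.

```python
import math
import random

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

rng = random.Random(1)
n = 5_000
z = [rng.gauss(0, 1) for _ in range(n)]    # instrument
c = [rng.gauss(0, 1) for _ in range(n)]    # unobserved confounder (makes x endogenous)
x = [0.8 * zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, c)]
y = [1.0 * xi + ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]

# Demeaning every variable removes the intercept from both regressions.
z_d, x_d, y_d = demean(z), demean(x), demean(y)

# Step 1: first-stage regression of x on z; keep the residuals.
pi_hat = dot(z_d, x_d) / dot(z_d, z_d)
v_hat = [xi - pi_hat * zi for xi, zi in zip(x_d, z_d)]

# Step 2: regress y on x and the first-stage residuals (2x2 normal equations).
sxx, svv, sxv = dot(x_d, x_d), dot(v_hat, v_hat), dot(x_d, v_hat)
sxy, svy = dot(x_d, y_d), dot(v_hat, y_d)
det = sxx * svv - sxv ** 2
b_x = (svv * sxy - sxv * svy) / det    # coefficient on x
b_v = (sxx * svy - sxv * sxy) / det    # coefficient on the residuals

# Step 3: conventional t-statistic for H0: coefficient on the residuals is zero.
resid = [yi - b_x * xi - b_v * vi for yi, xi, vi in zip(y_d, x_d, v_hat)]
sigma2 = dot(resid, resid) / (n - 3)   # intercept plus two slopes estimated
se_v = math.sqrt(sigma2 * sxx / det)
t_v = b_v / se_v

print(f"coefficient on residuals: {b_v:.3f}, t-statistic: {t_v:.1f}")
```

A large t-statistic in absolute value leads to rejecting exogeneity of x. Because x was built to share the confounder c with y, the test should reject clearly here.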

Addressing endogeneity

Instrumental variables (IV) approach

  • A method that uses one or more instrumental variables to isolate the exogenous variation in the endogenous explanatory variable
  • An instrumental variable is a variable that is correlated with the endogenous explanatory variable but uncorrelated with the error term
  • The IV approach estimates the causal effect of the explanatory variable on the dependent variable by using the exogenous variation in the instrumental variable

Two-stage least squares (2SLS)

  • A common estimation method for implementing the IV approach
  • In the first stage, the endogenous explanatory variable is regressed on the instrumental variables and other exogenous variables to obtain the predicted values
  • In the second stage, the dependent variable is regressed on the predicted values of the endogenous explanatory variable from the first stage and other exogenous variables
  • The 2SLS estimator is consistent in the presence of endogeneity, provided that the instrumental variables are valid, but it is biased in finite samples; the bias shrinks as the sample grows when the instruments are strong
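
The two stages can be carried out by hand in the simplest setting. The sketch below assumes one endogenous regressor, one instrument, and no other controls, with simulated data whose true coefficient is 2; every name and parameter is illustrative.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (divisor n; it cancels in every ratio below)."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

rng = random.Random(42)
n = 20_000
true_beta = 2.0

z = [rng.gauss(0, 1) for _ in range(n)]    # instrument: moves x, unrelated to c
c = [rng.gauss(0, 1) for _ in range(n)]    # unobserved confounder
x = [0.8 * zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, c)]
y = [true_beta * xi + ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]

# Naive OLS of y on x: picks up the confounder.
beta_ols = cov(x, y) / cov(x, x)

# Stage 1: regress x on z and form predicted values.
pi_hat = cov(z, x) / cov(z, z)
mx, mz = mean(x), mean(z)
x_hat = [mx + pi_hat * (zi - mz) for zi in z]

# Stage 2: regress y on the predicted values.
beta_2sls = cov(x_hat, y) / cov(x_hat, x_hat)

# With one instrument, 2SLS reduces to a ratio of covariances.
beta_iv = cov(z, y) / cov(z, x)

print(f"OLS:  {beta_ols:.3f}")
print(f"2SLS: {beta_2sls:.3f}  (ratio form: {beta_iv:.3f})")
```

Running the second stage by hand gives the right point estimate, but the standard errors from that regression are not valid because they ignore that the regressor was estimated. Dedicated IV routines correct for this.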

Fixed effects estimation

  • A method that controls for unobserved time-invariant factors that may be correlated with the explanatory variable and the dependent variable
  • Fixed effects estimation uses within-group variation (e.g., within individuals, firms, or states) to estimate the causal effect, eliminating the bias from time-invariant omitted variables
  • Examples include individual fixed effects in panel data models and state fixed effects in data with repeated observations for each state (for example, a state-by-year panel)
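
The within-group idea can be illustrated by demeaning each unit's data before running OLS. The sketch below simulates a hypothetical panel in which an unobserved unit effect drives both x and y; the true coefficient is set to 1.5 and all names are invented for the example.

```python
import random

def slope(xs, ys):
    """Slope from a simple regression of ys on xs (with an intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

rng = random.Random(7)
n_units, n_periods, true_beta = 500, 5, 1.5

panel = []  # rows of (unit id, x, y)
for unit in range(n_units):
    alpha = rng.gauss(0, 1)              # time-invariant, unobserved unit effect
    for _ in range(n_periods):
        x = alpha + rng.gauss(0, 1)      # x is correlated with alpha
        y = true_beta * x + 2.0 * alpha + rng.gauss(0, 1)
        panel.append((unit, x, y))

# Pooled OLS ignores the unit effect and is biased.
beta_pooled = slope([r[1] for r in panel], [r[2] for r in panel])

# Within transformation: subtract each unit's own mean from x and y.
sums = {}
for unit, x, y in panel:
    sx, sy, k = sums.get(unit, (0.0, 0.0, 0))
    sums[unit] = (sx + x, sy + y, k + 1)
unit_means = {u: (sx / k, sy / k) for u, (sx, sy, k) in sums.items()}

x_within = [x - unit_means[u][0] for u, x, _ in panel]
y_within = [y - unit_means[u][1] for u, _, y in panel]
beta_fe = slope(x_within, y_within)

print(f"pooled OLS:    {beta_pooled:.3f}")
print(f"fixed effects: {beta_fe:.3f}")
```

Demeaning wipes out anything constant within a unit, including alpha, so only within-unit variation in x identifies the coefficient. Variables that never change within a unit cannot be estimated this way.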

Difference-in-differences (DID)

  • A method that estimates the causal effect of a treatment by comparing the change in outcomes for the treatment group with the change in outcomes for a control group
  • DID controls for time-invariant unobserved factors and common time trends that may be correlated with the treatment and the outcome
  • The key assumption is that the treatment and control groups would have followed parallel trends in the absence of the treatment (parallel trends assumption)
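
In the basic two-group, two-period case, the DID estimate is just a difference of two differences in means. The numbers below are made up purely for illustration.

```python
from statistics import mean

# (group, period, outcome): hypothetical values, not real data
observations = [
    ("treated", "pre", 10.0), ("treated", "pre", 12.0),
    ("treated", "post", 15.0), ("treated", "post", 17.0),
    ("control", "pre", 8.0), ("control", "pre", 10.0),
    ("control", "post", 10.0), ("control", "post", 12.0),
]

def cell_mean(group, period):
    """Average outcome for one group in one period."""
    return mean(y for g, p, y in observations if g == group and p == period)

change_treated = cell_mean("treated", "post") - cell_mean("treated", "pre")  # 16 - 11 = 5
change_control = cell_mean("control", "post") - cell_mean("control", "pre")  # 11 - 9 = 2

# The control group's change stands in for what the treated group
# would have experienced without treatment (parallel trends).
did_estimate = change_treated - change_control

print(did_estimate)  # 3.0
```

The treated group started at a higher level than the control group, but that level difference drops out. Only the difference in changes is attributed to the treatment.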

Regression discontinuity design (RDD)

  • A method that estimates the causal effect of a treatment by comparing observations just above and below a cutoff point that determines treatment assignment
  • RDD exploits the discontinuity in treatment assignment at the cutoff point, assuming that observations near the cutoff are similar in unobserved characteristics
  • The key assumption is that the potential outcomes are continuous at the cutoff point (continuity assumption)
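
One common way to implement a sharp RDD is to fit a separate line on each side of the cutoff within a bandwidth and compare the two fitted values at the cutoff. The sketch below does this on simulated data with a true jump of 2; the bandwidth of 0.5 and all names are arbitrary choices for the example, not recommendations.

```python
import random

def fit_line(xs, ys):
    """Return (intercept, slope) of a simple OLS line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((a - mx) * (c - my) for a, c in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return my - b * mx, b

rng = random.Random(3)
n, cutoff, bandwidth, true_jump = 10_000, 0.0, 0.5, 2.0

running = [rng.uniform(-1, 1) for _ in range(n)]   # running (assignment) variable
treated = [r >= cutoff for r in running]           # sharp rule: treated at or above the cutoff
outcome = [1.0 + 0.5 * r + true_jump * d + rng.gauss(0, 0.5)
           for r, d in zip(running, treated)]

# Keep observations within the bandwidth and centre the running variable at the cutoff,
# so each fitted intercept is the predicted outcome exactly at the cutoff.
left = [(r - cutoff, y) for r, y in zip(running, outcome) if cutoff - bandwidth <= r < cutoff]
right = [(r - cutoff, y) for r, y in zip(running, outcome) if cutoff <= r <= cutoff + bandwidth]

a_left, _ = fit_line([r for r, _ in left], [y for _, y in left])
a_right, _ = fit_line([r for r, _ in right], [y for _, y in right])

rdd_estimate = a_right - a_left
print(f"estimated jump at the cutoff: {rdd_estimate:.3f}")
```

In applied work the bandwidth choice matters: a narrower window reduces bias from misspecifying the trend but uses fewer observations, so estimates are usually reported for several bandwidths.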

Instrumental variables

Relevance condition

  • An instrumental variable must be correlated with the endogenous explanatory variable
  • The relevance condition ensures that the instrument provides sufficient exogenous variation in the explanatory variable to identify the causal effect
  • The strength of the correlation between the instrument and the endogenous explanatory variable determines the strength of the instrument

Exclusion restriction

  • An instrumental variable must be uncorrelated with the error term in the structural equation
  • The exclusion restriction implies that the instrument affects the dependent variable only through its effect on the endogenous explanatory variable
  • Violating the exclusion restriction leads to invalid instruments and inconsistent IV estimates
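
Both conditions show up directly in the large-sample limit of the simple IV estimator. The sketch below assumes one regressor x, one instrument z, and the structural equation y = β₀ + β₁x + u; the notation is illustrative.

```latex
\operatorname{plim}\,\hat{\beta}_{1}^{IV}
  = \frac{\operatorname{Cov}(z,y)}{\operatorname{Cov}(z,x)}
  = \beta_1 + \frac{\operatorname{Cov}(z,u)}{\operatorname{Cov}(z,x)}
```

Relevance keeps the denominator away from zero, and instrument exogeneity makes the numerator of the last term zero. When Cov(z, x) is small, even a slight correlation between z and u is magnified, which is one reason weak instruments are a concern.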

Strength of instruments

  • The strength of an instrument refers to the magnitude of its correlation with the endogenous explanatory variable
  • Weak instruments are those that have a low correlation with the endogenous explanatory variable, leading to imprecise and potentially biased IV estimates
  • The strength of instruments can be assessed using the first-stage F-statistic, with a common rule of thumb being an F-statistic greater than 10 for a single endogenous regressor

Weak instruments problem

  • Weak instruments bias IV estimates towards the (inconsistent) OLS estimates in finite samples, and the problem can persist even when the sample is large
  • Weak instruments also result in larger standard errors and wider confidence intervals, reducing the precision of the estimates
  • The weak instruments problem can be addressed by using stronger instruments, increasing the sample size, or employing alternative estimation methods (e.g., limited information maximum likelihood, LIML)

Evaluating IV estimates

First-stage F-statistic

  • A measure of the strength of the instrumental variables in the first stage of the 2SLS estimation
  • The first-stage F-statistic tests the joint significance of the excluded instruments in the first-stage regression
  • As a rule of thumb, an F-statistic greater than 10 (for a single endogenous regressor) is taken to indicate sufficiently strong instruments, while a smaller F-statistic suggests weak instruments
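
With a single instrument and no other controls, the first-stage F-statistic is the squared t-statistic on the instrument. The sketch below computes it under homoskedastic errors for two simulated first stages, one with a sizeable coefficient on the instrument (0.5) and one with a tiny coefficient (0.02); the function name and parameters are hypothetical.

```python
import random

def first_stage_f(z, x):
    """F-statistic for H0: the coefficient on z is zero in a regression of x on z.
    With one instrument this equals the squared t-statistic (homoskedastic errors)."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    pi_hat = sum((a - mz) * (b - mx) for a, b in zip(z, x)) / szz
    rss = sum(((b - mx) - pi_hat * (a - mz)) ** 2 for a, b in zip(z, x))
    sigma2 = rss / (n - 2)               # residual variance
    return pi_hat ** 2 * szz / sigma2    # (pi_hat / se(pi_hat)) ** 2

rng = random.Random(11)
n = 1_000
z = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]

x_strong = [0.5 * zi + e for zi, e in zip(z, noise)]    # instrument explains a lot of x
x_weak = [0.02 * zi + e for zi, e in zip(z, noise)]     # instrument barely moves x

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)

print(f"strong instrument F: {f_strong:.1f}")
print(f"weak instrument F:   {f_weak:.1f}")
```

In a design like this the strong first stage should clear the threshold of 10 comfortably, while the weak one would typically fall well below it. With heteroskedastic or clustered data, a robust version of the statistic would be the more appropriate check.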

Sargan-Hansen overidentification test

  • A test for the validity of the instrumental variables when there are more instruments than endogenous regressors (overidentified model)
  • The null hypothesis is that all instruments are valid, i.e., uncorrelated with the error term and correctly excluded from the structural equation
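  • Rejecting the null hypothesis suggests that at least one of the instruments is invalid and the IV estimates may be inconsistent; failing to reject does not prove validity, because the test has little power when all instruments share the same flaw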
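
In its simplest form (the Sargan version, which assumes homoskedastic errors), the statistic can be computed from an auxiliary regression. The notation below is illustrative.

```latex
\begin{aligned}
&\text{1. Estimate the model by 2SLS and save the residuals } \hat{u}. \\
&\text{2. Regress } \hat{u} \text{ on all exogenous variables (instruments and exogenous regressors); record } R^2. \\
&\text{3. Under } H_0:\quad n R^2 \;\overset{a}{\sim}\; \chi^2_{q},
  \qquad q = (\text{number of instruments}) - (\text{number of endogenous regressors})
\end{aligned}
```

Here q is the number of overidentifying restrictions, so the test is unavailable when the model is exactly identified (q = 0). Hansen's J statistic plays the same role when heteroskedasticity-robust inference is needed.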

Comparison with OLS estimates

  • Comparing the IV estimates with the OLS estimates can provide insights into the presence and direction of endogeneity bias
  • If the IV and OLS estimates are similar, it suggests that endogeneity may not be a significant problem in the model
  • If the IV and OLS estimates differ substantially, it points to endogeneity bias in OLS, although weak or invalid instruments, or an IV estimate that applies only to a subpopulation, can also produce a gap

Limitations of IV approach

Finding valid instruments

  • The main challenge in implementing the IV approach is finding valid instruments that satisfy both the relevance condition and the exclusion restriction
  • In many research settings, it can be difficult to identify variables that are correlated with the endogenous explanatory variable but uncorrelated with the error term
  • Using invalid or weak instruments can lead to biased and inconsistent IV estimates, undermining the purpose of the IV approach

Local average treatment effect (LATE)

  • IV estimates identify the local average treatment effect (LATE) for the subpopulation of compliers, i.e., those who respond to changes in the instrument
  • The LATE may differ from the average treatment effect (ATE) for the entire population, limiting the generalizability of the IV estimates
  • The interpretation of the LATE depends on the specific instrument used and the compliers' characteristics, which may not be representative of the population of interest
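
For a binary instrument Z and a binary treatment D, the LATE result can be stated compactly. The sketch below assumes the instrument is as good as randomly assigned, satisfies the exclusion restriction, is relevant, and moves treatment in only one direction (monotonicity, meaning no defiers); Y(1) and Y(0) denote potential outcomes with and without treatment.

```latex
\operatorname{plim}\,\hat{\beta}^{IV}
  = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]}
  = E\big[\,Y(1) - Y(0) \mid \text{complier}\,\big]
```

The denominator is the share of compliers, so the ratio rescales the instrument's effect on the outcome by the fraction of people whose treatment status the instrument actually changes. Always-takers and never-takers contribute nothing to the estimate.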

External validity concerns

  • IV estimates may have limited external validity, as they are specific to the context, population, and instruments used in the study
  • The causal effect identified by the IV approach may not generalize to other settings, populations, or time periods
  • Assessing the external validity of IV estimates requires careful consideration of the similarities and differences between the study context and the target context for generalization