Instrumental variables are a powerful tool in econometrics for estimating causal relationships when endogeneity is present. This technique uses an external variable to isolate exogenous variation in the explanatory variable, allowing researchers to obtain consistent estimates of causal effects.
The instrumental variables approach relies on finding a variable that meets specific conditions: relevance, exclusion restriction, and exogeneity. By using methods like two-stage least squares, economists can leverage natural experiments, lagged variables, or geographical variations to estimate causal effects in various fields.
Instrumental variables overview
- Instrumental variables (IV) is an econometric technique used to estimate causal relationships when there are endogenous explanatory variables in a regression model
- The IV approach addresses endogeneity and yields consistent estimates of the causal effect of an explanatory variable on the dependent variable, provided a valid instrument is available
Endogeneity problem
- Endogeneity occurs when an explanatory variable is correlated with the error term in a regression model
- Sources of endogeneity include omitted variables, measurement error in the explanatory variable, and simultaneity (the dependent and explanatory variables are jointly determined)
- Endogeneity leads to biased and inconsistent estimates of the causal effect of the explanatory variable on the dependent variable
- Example: Estimating the effect of education on earnings, where unobserved ability is correlated with both education and earnings (omitted variable bias)
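The education example can be illustrated with a small simulation. This is a minimal sketch on synthetic data: the variable names, the coefficient values, and the assumed true return of 0.10 are all hypothetical choices made for illustration, not estimates from any real dataset.

```python
import numpy as np

def ols_slope(y, x):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

rng = np.random.default_rng(0)
n = 50_000
true_return = 0.10                       # assumed causal effect of one year of schooling

ability = rng.normal(size=n)             # unobserved by the researcher
education = 12 + 2.0 * ability + rng.normal(size=n)
log_wage = (1.0 + true_return * education
            + 0.5 * ability               # ability also raises wages directly
            + rng.normal(scale=0.5, size=n))

naive = ols_slope(log_wage, education)
# Omitted-variable bias in this design:
# Cov(education, 0.5 * ability) / Var(education) = 1.0 / 5.0 = 0.2
print(f"true return: {true_return:.2f}, OLS estimate: {naive:.3f}")
```

Because ability is left in the error term and is positively correlated with education, the OLS slope overstates the assumed return by roughly the amount given by the omitted-variable bias formula in the comment.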
Causal inference challenges
- Establishing causal relationships is crucial for policy analysis and decision-making in economics
- Randomized controlled trials (RCTs) are the gold standard for causal inference but are not always feasible or ethical
- Observational data often suffers from endogeneity issues, making it difficult to identify causal effects
- Instrumental variables provide a way to estimate causal effects using observational data when certain assumptions are met
Instrumental variables approach
- The instrumental variables approach relies on finding a variable (the instrument) that is correlated with the endogenous explanatory variable but uncorrelated with the error term
- The instrument affects the dependent variable only through its effect on the endogenous explanatory variable
- The IV approach estimates the causal effect of the explanatory variable on the dependent variable by using the variation in the explanatory variable that is driven by the instrument
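For a single endogenous regressor and a single instrument, the logic above can be written out in a few lines. The notation is an assumption introduced here for illustration: y is the dependent variable, x the endogenous explanatory variable, z the instrument, and u the error term.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + u_i, \qquad \operatorname{Cov}(x_i, u_i) \neq 0 \\
\operatorname{Cov}(z_i, y_i) &= \beta_1 \operatorname{Cov}(z_i, x_i) + \operatorname{Cov}(z_i, u_i) \\
\beta_1 &= \frac{\operatorname{Cov}(z_i, y_i)}{\operatorname{Cov}(z_i, x_i)}
\quad \text{if } \operatorname{Cov}(z_i, u_i) = 0 \text{ and } \operatorname{Cov}(z_i, x_i) \neq 0
\end{aligned}
```

The last line shows why both conditions are needed: the ratio is undefined without relevance, and it does not equal the causal effect without exogeneity.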
Relevance condition
- The relevance condition requires that the instrument is correlated with the endogenous explanatory variable
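- A weak correlation between the instrument and the endogenous explanatory variable causes weak instrument problems: 2SLS estimates become biased toward the OLS estimates in finite samples and standard errors become unreliable
- Unlike the other two conditions, relevance can be checked directly in the data using the first-stage regression of the two-stage least squares (2SLS) procedure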
- Example: Using the distance to the nearest college as an instrument for education, the instrument should be correlated with the individual's level of education
Exclusion restriction
- The exclusion restriction assumes that the instrument affects the dependent variable only through its effect on the endogenous explanatory variable
- The instrument should not have a direct effect on the dependent variable or be correlated with any omitted variables that affect the dependent variable
- Violations of the exclusion restriction can lead to biased estimates of the causal effect
- Example: The distance to the nearest college (instrument) should not directly affect an individual's earnings (dependent variable), except through its effect on education (endogenous explanatory variable)
Exogeneity assumption
- The exogeneity assumption requires that the instrument is uncorrelated with the error term in the regression model
- This assumption ensures that the variation in the endogenous explanatory variable captured by the instrument is exogenous and not related to unobserved factors affecting the dependent variable
- Violations of the exogeneity assumption make the IV estimator biased and inconsistent, and the problem is amplified when the instrument is weak
- Example: The distance to the nearest college (instrument) should not be correlated with unobserved factors (e.g., family background) that affect both education and earnings
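The consequence of a violated exogeneity assumption can be stated compactly for the one-instrument case. The symbols are assumptions for illustration: x is the endogenous variable, z the instrument, u the error term, and beta_1 the causal effect.

```latex
\operatorname{plim}\, \hat{\beta}_{1}^{IV}
  = \frac{\operatorname{Cov}(z, y)}{\operatorname{Cov}(z, x)}
  = \beta_1 + \frac{\operatorname{Cov}(z, u)}{\operatorname{Cov}(z, x)}
```

The second term vanishes only when the instrument is uncorrelated with the error term. When the instrument is weak, the denominator is small, so even a slight correlation between z and u can produce a large inconsistency.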
Types of instrumental variables
- Various types of instrumental variables can be used depending on the research question and available data
- The choice of instrument should be based on theoretical arguments and empirical evidence supporting the relevance and exclusion restriction assumptions
Natural experiments
- Natural experiments are events or policies that create exogenous variation in the endogenous explanatory variable
- These events or policies are often used as instruments because they are unlikely to be related to unobserved factors affecting the dependent variable
- Examples of natural experiments include policy changes, natural disasters, and birthplace or birth timing
- Example: Using the Vietnam War draft lottery as an instrument for military service to estimate the effect of military service on future earnings
Lagged variables
- Lagged values of the endogenous explanatory variable or other variables can sometimes be used as instruments
- The idea is that past values of a variable may be correlated with its current value but uncorrelated with the current error term
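- Lagged variables require time-series or panel data, and they are valid instruments only if the error term is not serially correlated; otherwise the lagged value is correlated with the current error term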
- Example: Using lagged advertising expenditure as an instrument for current advertising expenditure to estimate the effect of advertising on sales
Geographical variations
- Geographical variations in policies, infrastructure, or other factors can be used as instruments
- The idea is that these variations are often determined by historical or institutional factors that are exogenous to the individuals or firms being studied
- Geographical instruments are more likely to be valid in settings with cross-sectional or panel data
- Example: Using the presence of a land-grant college in a county as an instrument for individual education to estimate the effect of education on earnings
Two-stage least squares (2SLS)
- Two-stage least squares (2SLS) is a commonly used estimation procedure for instrumental variables regression
- 2SLS involves two regression stages that aim to isolate the exogenous variation in the endogenous explanatory variable and use it to estimate the causal effect on the dependent variable
First stage regression
- In the first stage regression, the endogenous explanatory variable is regressed on the instrument(s) and any other exogenous control variables
- The predicted values from this regression represent the exogenous variation in the endogenous explanatory variable that is driven by the instrument(s)
- The first stage regression tests the relevance condition and provides evidence on the strength of the instrument(s)
- Example: Regressing education on the distance to the nearest college and other control variables to obtain predicted values of education
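The first stage can be sketched with plain least squares. This is an illustrative example on synthetic data: the helper `first_stage`, the control `parent_educ`, and all coefficient values are hypothetical.

```python
import numpy as np

def first_stage(endog, instruments, controls=None):
    """Regress the endogenous variable on a constant, the instruments,
    and any exogenous controls. Returns (coefficients, fitted values)."""
    n = endog.shape[0]
    cols = [np.ones(n), instruments.reshape(n, -1)]
    if controls is not None:
        cols.append(controls.reshape(n, -1))
    W = np.column_stack(cols)
    coef = np.linalg.lstsq(W, endog, rcond=None)[0]
    return coef, W @ coef

rng = np.random.default_rng(1)
n = 20_000
ability = rng.normal(size=n)               # unobserved confounder
distance = rng.uniform(0, 10, size=n)      # instrument: distance to nearest college
parent_educ = rng.normal(size=n)           # exogenous control
education = (14 - 0.3 * distance + 0.5 * parent_educ
             + ability + rng.normal(size=n))

coef, educ_hat = first_stage(education, distance, parent_educ)
print("first-stage coefficient on distance:", round(coef[1], 3))
# The fitted values depend only on the instrument and the control,
# so they are (approximately) uncorrelated with unobserved ability.
print("corr(fitted education, ability):",
      round(np.corrcoef(educ_hat, ability)[0, 1], 3))
```

The fitted values are what the second stage uses in place of observed education.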
Second stage regression
- In the second stage regression, the dependent variable is regressed on the predicted values of the endogenous explanatory variable from the first stage and any other exogenous control variables
- The coefficient on the predicted values of the endogenous explanatory variable represents the causal effect of interest
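- The standard errors from a manually run second stage are incorrect because the residuals are computed with the predicted rather than the actual values of the endogenous variable; dedicated 2SLS routines apply the correct formula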
- Example: Regressing earnings on the predicted values of education from the first stage and other control variables to estimate the causal effect of education on earnings
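Both stages and the standard-error correction fit in a short function. The following is a minimal sketch that assumes homoskedastic errors; the function name `two_sls` and the simulated data are hypothetical, and an applied analysis would normally rely on an established econometrics package instead.

```python
import numpy as np

def two_sls(y, X, Z):
    """2SLS with homoskedastic standard errors.

    X: regressors (constant, endogenous variable, exogenous controls).
    Z: instruments (constant, excluded instruments, the same controls).
    """
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first stage
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]    # second stage
    resid = y - X @ beta            # residuals use the actual X, not X_hat
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X_hat.T @ X_hat)))
    return beta, se

rng = np.random.default_rng(2)
n = 50_000
true_return = 0.10                                     # assumed causal effect
ability = rng.normal(size=n)
distance = rng.uniform(0, 10, size=n)
education = 14 - 0.3 * distance + ability + rng.normal(size=n)
log_wage = (1.0 + true_return * education + 0.5 * ability
            + rng.normal(scale=0.5, size=n))

const = np.ones(n)
X = np.column_stack([const, education])
Z = np.column_stack([const, distance])

beta_iv, se_iv = two_sls(log_wage, X, Z)
beta_ols = np.linalg.lstsq(X, log_wage, rcond=None)[0]
print(f"OLS: {beta_ols[1]:.3f}  2SLS: {beta_iv[1]:.3f} (se {se_iv[1]:.4f})")
```

In this design OLS is pulled away from the assumed return by unobserved ability, while the 2SLS estimate stays close to it.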
Interpreting 2SLS estimates
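- With heterogeneous treatment effects, and under the additional monotonicity assumption (the instrument moves everyone's treatment in the same direction, if at all), the 2SLS estimate can be interpreted as the local average treatment effect (LATE) for the subpopulation of individuals whose treatment status is changed by the instrument (compliers)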
- The LATE may differ from the average treatment effect (ATE) for the entire population if the treatment effect is heterogeneous
- The interpretation of the 2SLS estimates depends on the validity of the instrument and the assumptions underlying the IV approach
- Example: The 2SLS estimate of the effect of education on earnings represents the average return to education for individuals whose education level was affected by their distance to the nearest college
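The difference between the LATE and the ATE can be seen in a simulation with a binary instrument and a binary treatment, where the IV estimate reduces to the Wald ratio. The population shares and effect sizes below are hypothetical values chosen only to make the contrast visible.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical population: 0 = never-taker, 1 = complier, 2 = always-taker
group = rng.choice([0, 1, 2], size=n, p=[0.25, 0.50, 0.25])
effect = np.array([0.5, 2.0, 5.0])[group]      # treatment effect differs by group

z = rng.integers(0, 2, size=n)                 # randomly assigned binary instrument
# Always-takers take the treatment regardless of z, never-takers never do,
# and compliers follow the instrument (monotonicity holds: no defiers).
d = np.where(group == 2, 1, np.where(group == 1, z, 0))
y = rng.normal(size=n) + effect * d

reduced_form = y[z == 1].mean() - y[z == 0].mean()
first_stage = d[z == 1].mean() - d[z == 0].mean()
wald = reduced_form / first_stage              # IV estimate with a binary instrument

ate = effect.mean()
print(f"Wald/IV estimate: {wald:.2f}, complier effect: 2.00, ATE: {ate:.2f}")
```

The IV estimate recovers the effect for compliers rather than the population average, because only compliers change their treatment status in response to the instrument.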
Validity tests for instruments
- Several tests can be used to assess the validity of instruments and the assumptions underlying the IV approach
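- These tests provide evidence on instrument relevance and, in overidentified models, on instrument exogeneity; the exclusion restriction cannot be tested directly in a just-identified model and must be defended with theoretical arguments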
Weak instruments problem
- Weak instruments are instruments that are only weakly correlated with the endogenous explanatory variable
- Weak instruments can lead to biased 2SLS estimates and unreliable inference
- The first-stage F-statistic and the Cragg-Donald Wald F-statistic can be used to test for weak instruments
- A rule of thumb is that the first-stage F-statistic should be greater than 10 for the instruments to be considered strong
- Example: Testing whether the distance to the nearest college is a strong instrument for education by examining the first-stage F-statistic
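The first-stage F-statistic compares the first-stage regression with and without the excluded instruments. A minimal sketch, assuming homoskedastic errors and using hypothetical simulated data:

```python
import numpy as np

def first_stage_f(endog, instruments, controls):
    """F-statistic for H0: all excluded-instrument coefficients are zero.
    `controls` must contain the constant column."""
    def rss(W):
        resid = endog - W @ np.linalg.lstsq(W, endog, rcond=None)[0]
        return resid @ resid
    unrestricted = np.column_stack([controls, instruments])
    q = unrestricted.shape[1] - controls.shape[1]      # number of instruments
    dof = endog.shape[0] - unrestricted.shape[1]
    rss_u = rss(unrestricted)
    return ((rss(controls) - rss_u) / q) / (rss_u / dof)

rng = np.random.default_rng(4)
n = 5_000
const = np.ones((n, 1))
distance = rng.uniform(0, 10, size=n)
ability = rng.normal(size=n)

# Same instrument, two hypothetical first-stage strengths
educ_strong = 14 - 0.3 * distance + ability + rng.normal(size=n)
educ_weak = 14 - 0.002 * distance + ability + rng.normal(size=n)

f_strong = first_stage_f(educ_strong, distance, const)
f_weak = first_stage_f(educ_weak, distance, const)
print(f"F (strong first stage): {f_strong:.1f}")
print(f"F (weak first stage):   {f_weak:.1f}")
```

Comparing each value with the rule-of-thumb threshold of 10 gives a quick screen for weak instruments; with a single instrument the F-statistic is the square of the first-stage t-statistic.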
Overidentifying restrictions test
- When there are more instruments than endogenous explanatory variables (overidentified model), the overidentifying restrictions test can be used to assess the validity of the instruments
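- The test checks whether the instruments are uncorrelated with the residuals from the 2SLS estimation of the structural equation
- Rejecting the null hypothesis suggests that at least one of the instruments is not valid; failing to reject does not prove validity, because the test assumes that at least as many instruments as endogenous variables are valid
- The Sargan test (which assumes homoskedastic errors) and the Hansen J test (which is robust to heteroskedasticity) are commonly used overidentifying restrictions tests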
- Example: Using the Sargan test to check the validity of multiple instruments (e.g., distance to the nearest college and local unemployment rate) for education
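One common way to compute the Sargan statistic is as n times the R-squared from regressing the 2SLS residuals on all instruments. The sketch below follows that recipe on synthetic data with two hypothetical instruments, `z1` and `z2`; the p-value helper uses the closed form for a chi-square with one degree of freedom, so it applies only to the case of exactly one overidentifying restriction.

```python
import math
import numpy as np

def sargan(y, X, Z):
    """Sargan statistic: n * R^2 from regressing 2SLS residuals on Z."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    u = y - X @ beta                                   # 2SLS residuals
    fitted = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]
    r2 = 1 - np.sum((u - fitted) ** 2) / np.sum((u - u.mean()) ** 2)
    return len(y) * r2, Z.shape[1] - X.shape[1]        # statistic, degrees of freedom

def chi2_sf_1df(stat):
    """P(chi-square with 1 df > stat); valid for one degree of freedom only."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2))

def simulate(direct_effect, seed):
    """direct_effect != 0 means z2 affects y directly (exclusion violated)."""
    rng = np.random.default_rng(seed)
    n = 20_000
    ability = rng.normal(size=n)
    z1, z2 = rng.normal(size=n), rng.normal(size=n)
    x = 0.5 * z1 + 0.5 * z2 + ability + rng.normal(size=n)
    y = (1.0 + 0.10 * x + direct_effect * z2 + 0.5 * ability
         + rng.normal(scale=0.5, size=n))
    const = np.ones(n)
    return y, np.column_stack([const, x]), np.column_stack([const, z1, z2])

stat_valid, df_valid = sargan(*simulate(direct_effect=0.0, seed=5))
stat_invalid, df_invalid = sargan(*simulate(direct_effect=0.3, seed=5))
p_valid, p_invalid = chi2_sf_1df(stat_valid), chi2_sf_1df(stat_invalid)
print(f"valid instruments:   stat={stat_valid:.2f}, p={p_valid:.3f}")
print(f"invalid instrument:  stat={stat_invalid:.2f}, p={p_invalid:.3g}")
```

When one instrument enters the outcome equation directly, the two instruments imply different estimates of the causal effect and the statistic becomes large. The test cannot say which instrument is at fault.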
Hausman test for endogeneity
- The Hausman test can be used to test for the presence of endogeneity in the explanatory variable
- The test compares the OLS and 2SLS estimates and checks whether their difference is statistically significant
- Rejecting the null hypothesis suggests that the explanatory variable is endogenous and that OLS is inconsistent; this conclusion is only as reliable as the instruments, because the test takes their validity as given
- The Hausman test can help justify the use of instrumental variables in a regression analysis
- Example: Using the Hausman test to check whether education is endogenous in the earnings regression and whether the IV approach is necessary
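A convenient regression-based version of this comparison (often called the Durbin-Wu-Hausman test) adds the first-stage residuals to the OLS regression and tests whether their coefficient is zero. The sketch below uses hypothetical simulated data and homoskedastic standard errors; it is one way to implement the idea, not the only one.

```python
import numpy as np

def ols(y, W):
    """OLS coefficients, homoskedastic standard errors, and residuals."""
    beta = np.linalg.lstsq(W, y, rcond=None)[0]
    resid = y - W @ beta
    sigma2 = resid @ resid / (W.shape[0] - W.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(W.T @ W)))
    return beta, se, resid

rng = np.random.default_rng(6)
n = 20_000
ability = rng.normal(size=n)                    # unobserved confounder
distance = rng.uniform(0, 10, size=n)           # instrument
education = 14 - 0.3 * distance + ability + rng.normal(size=n)
log_wage = 1.0 + 0.10 * education + 0.5 * ability + rng.normal(scale=0.5, size=n)

const = np.ones(n)
# Step 1: first-stage residuals capture the part of education
# that is not explained by the instrument.
_, _, v_hat = ols(education, np.column_stack([const, distance]))
# Step 2: add them to the outcome regression and test their coefficient.
beta, se, _ = ols(log_wage, np.column_stack([const, education, v_hat]))
t_stat = beta[2] / se[2]
print(f"t-statistic on first-stage residuals: {t_stat:.1f}")
print(f"coefficient on education in augmented regression: {beta[1]:.3f}")
```

A large t-statistic indicates that education is correlated with the error term. In this linear setup the coefficient on education in the augmented regression coincides with the 2SLS estimate.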
Applications of instrumental variables
- Instrumental variables have been widely used in various fields of economics to estimate causal effects and address endogeneity issues
- Some common applications include supply and demand estimation, policy evaluation, and labor economics
Supply and demand estimation
- The IV approach can be used to estimate the price elasticities of supply and demand when prices and quantities are jointly determined (simultaneity bias)
- The instrument for price must shift the other curve: cost shifters (e.g., input prices, weather shocks) move the supply curve and identify the demand elasticity, while demand shifters (e.g., income, population) move the demand curve and identify the supply elasticity
- Example: Using weather shocks as an instrument for crop prices to estimate the price elasticity of demand for agricultural products
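A simulated market shows how a supply-side shifter traces out the demand curve. This is a sketch under assumed values: prices and quantities are taken to be in logs so that slopes are elasticities, and the elasticities, intercepts, and the `weather` variable are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

demand_elasticity = -1.0      # assumed
supply_elasticity = 0.5       # assumed

weather = rng.normal(size=n)  # adverse weather shock: shifts supply only
u_d = rng.normal(size=n)      # unobserved demand shock
u_s = rng.normal(size=n)      # unobserved supply shock

# Demand: q = 10 + demand_elasticity * p + u_d
# Supply: q =  2 + supply_elasticity * p - weather + u_s
# Solving the two equations for the equilibrium price:
log_price = (8.0 + weather + u_d - u_s) / (supply_elasticity - demand_elasticity)
log_quantity = 10.0 + demand_elasticity * log_price + u_d

cov_pq = np.cov(log_price, log_quantity)
ols = cov_pq[0, 1] / cov_pq[0, 0]
iv = (np.cov(weather, log_quantity)[0, 1]
      / np.cov(weather, log_price)[0, 1])

print(f"OLS slope of quantity on price: {ols:.2f}")
print(f"IV slope using weather:         {iv:.2f}")
```

OLS mixes movements along both curves, so it recovers neither elasticity. Weather moves price only through the supply curve, so the IV slope recovers the assumed demand elasticity.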
Policy evaluation examples
- The IV approach can be used to evaluate the causal effects of policies or interventions when there are endogeneity issues (e.g., self-selection, omitted variables)
- Instruments can be based on policy changes, eligibility rules, or other exogenous factors that affect the treatment variable
- Example: Using the introduction of compulsory schooling laws as an instrument for education to estimate the causal effect of education on health outcomes
Limitations and criticisms
- The IV approach relies on strong assumptions (relevance, exclusion restriction, exogeneity) that may not always hold in practice, and only relevance can be checked directly in the data
- Finding valid instruments can be challenging and requires careful theoretical and empirical justification
- IV estimates are local average treatment effects (LATE) and may not generalize to the entire population
- The IV approach can be sensitive to specification choices and the choice of instruments
- Critics point out that an instrument that is weak, or that even slightly violates the exclusion restriction, can produce estimates that are more biased than OLS
Advanced topics in instrumental variables
- Several advanced topics in instrumental variables have been developed to address limitations and extend the applicability of the IV approach
- These topics include heterogeneous treatment effects, nonlinear models, and weak and many instruments
Heterogeneous treatment effects
- The IV approach can be extended to estimate heterogeneous treatment effects when the causal effect of the explanatory variable varies across individuals
- The marginal treatment effect (MTE) framework can be used to estimate treatment effect heterogeneity and construct policy-relevant parameters
- Example: Estimating the heterogeneous returns to education across individuals with different levels of unobserved ability using the MTE framework
Nonlinear models with instruments
- The IV approach can be adapted to estimate causal effects in nonlinear models (e.g., probit, logit, Poisson)
- Two-stage residual inclusion (2SRI) and control function approaches are commonly used for nonlinear models with endogenous explanatory variables
- Example: Using the control function approach to estimate the causal effect of health insurance on healthcare utilization in a count data model
Weak and many instruments
- Weak instruments and many instruments (relative to the sample size) can lead to biased IV estimates and unreliable inference
- Weak instrument robust inference methods (e.g., Anderson-Rubin test, conditional likelihood ratio test) have been developed to address the weak instruments problem
- With many weak instruments, the jackknife IV estimator (JIVE) or the limited information maximum likelihood (LIML) estimator can be used in place of 2SLS, since both are less biased than 2SLS in that setting
- Example: Using the LIML estimator to estimate the returns to education when there are many weak instruments based on interactions between birth year and birth quarter
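The Anderson-Rubin test mentioned above can be sketched for one endogenous regressor: for a hypothesized effect beta0, regress y - beta0 * x on the instruments and test whether they have any explanatory power. Inverting the test over a grid gives a confidence set that remains valid when instruments are weak. The data, grid, and function name are hypothetical, and the 3.84 cutoff is the large-sample chi-square(1) approximation for a single instrument at the 5% level.

```python
import numpy as np

def anderson_rubin_stat(y, x, Z, beta0):
    """F-type statistic for H0: beta = beta0, with Z an (n, m) instrument matrix."""
    n, m = Z.shape
    e = y - beta0 * x                       # outcome with the hypothesized effect removed
    W = np.column_stack([np.ones(n), Z])
    rss_u = np.sum((e - W @ np.linalg.lstsq(W, e, rcond=None)[0]) ** 2)
    rss_r = np.sum((e - e.mean()) ** 2)     # constant-only regression
    return ((rss_r - rss_u) / m) / (rss_u / (n - m - 1))

rng = np.random.default_rng(8)
n = 5_000
true_beta = 0.10                            # assumed causal effect
ability = rng.normal(size=n)
z = rng.normal(size=n)
x = 0.3 * z + ability + rng.normal(size=n)
y = 1.0 + true_beta * x + 0.5 * ability + rng.normal(scale=0.5, size=n)
Z = z.reshape(-1, 1)

ar_true = anderson_rubin_stat(y, x, Z, true_beta)
ar_far = anderson_rubin_stat(y, x, Z, true_beta + 1.0)

# Confidence set: all grid values of beta0 that the test does not reject
grid = np.linspace(-0.5, 0.7, 241)
stats = np.array([anderson_rubin_stat(y, x, Z, b) for b in grid])
accepted = grid[stats < 3.84]
print(f"AR stat at assumed true beta: {ar_true:.2f}, at beta + 1: {ar_far:.1f}")
print(f"approx. 95% AR confidence set: [{accepted.min():.3f}, {accepted.max():.3f}]")
```

Unlike a conventional 2SLS confidence interval, the set obtained this way widens automatically as the first stage weakens, and it can become unbounded when the instrument carries almost no information.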