🎳Intro to Econometrics Unit 11 Review

11.2 Logit and probit models

Written by the Fiveable Content Team • Last updated September 2025

Logit and probit models are essential tools for analyzing binary outcomes in econometrics. These models estimate the probability of an event occurring based on predictor variables, using different underlying probability distributions but often yielding similar results.

Both models let us estimate how each predictor changes the likelihood of an event, which makes them standard tools for interpreting binary data in economics and for informing decision-making and policy analysis.

Logit and probit models overview

Logit and probit models are used to model binary outcome variables in econometrics
These models estimate the probability of an event occurring based on a set of predictor variables
Logit and probit models differ in their underlying probability distributions but often produce similar results

Logit model

Logistic regression equation

The logistic regression equation is given by $\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k$
$p$ represents the probability of the event occurring
$\beta_0$ is the intercept term and $\beta_1, \ldots, \beta_k$ are the coefficients for the predictor variables $X_1, \ldots, X_k$
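
The equation above defines the log odds; to recover the probability itself, the log odds are passed through the inverse of the logit function. A minimal sketch, assuming made-up coefficient values (`beta_0`, `beta_1`) and a single predictor value `x_1` chosen purely for illustration:

```python
import math

def logit(p: float) -> float:
    """Log odds ln(p / (1 - p)) for a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z: float) -> float:
    """Map a log-odds value z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and predictor value (not estimated from data)
beta_0, beta_1 = -1.0, 0.5
x_1 = 2.0

log_odds = beta_0 + beta_1 * x_1   # -1.0 + 0.5 * 2.0 = 0.0
p = inverse_logit(log_odds)        # log odds of 0 correspond to p = 0.5
print(f"log odds = {log_odds}, probability = {p}")
```

Log odds of zero correspond to even odds, which is why the probability here is one half.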

Logit model interpretation

Coefficients in a logit model represent the change in the log odds of the event occurring for a one-unit increase in the predictor variable, holding other variables constant
A positive coefficient indicates that an increase in the predictor variable is associated with an increase in the probability of the event occurring
A negative coefficient suggests that an increase in the predictor variable is associated with a decrease in the probability of the event occurring

Odds ratios in logit models

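Odds ratios can be calculated by exponentiating the coefficients in a logit model, so the odds ratio for $X_k$ is $e^{\beta_k}$
An odds ratio greater than 1 indicates that the odds of the event are multiplied by a factor greater than 1 for each one-unit increase in the predictor variable, so the event becomes more likely
An odds ratio less than 1 indicates that the odds shrink for each one-unit increase in the predictor variable, so the event becomes less likely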
Odds ratios provide a more intuitive interpretation compared to the raw coefficients
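
As an illustration of the exponentiation step, the sketch below converts a few logit coefficients into odds ratios and percentage changes in the odds. The variable names and coefficient values are hypothetical, not taken from any fitted model:

```python
import math

# Hypothetical logit coefficients (illustrative values only)
coefficients = {"income": 0.40, "age": -0.25, "distance": 0.0}

# Odds ratio = exp(coefficient)
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, ratio in odds_ratios.items():
    pct_change = (ratio - 1.0) * 100.0
    print(f"{name}: odds ratio = {ratio:.3f}, odds change per unit = {pct_change:+.1f}%")
```

A coefficient of zero gives an odds ratio of exactly 1, meaning the predictor leaves the odds unchanged.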

Marginal effects for logit

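Marginal effects measure the change in the probability of the event occurring for a small change in a predictor variable, holding other variables constant; because the effect depends on where it is evaluated, it is commonly reported with the other variables at their means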
Marginal effects are calculated as $\frac{\partial p}{\partial X_k} = p(1-p)\beta_k$
Marginal effects provide a more practical interpretation of the impact of predictor variables on the probability of the event occurring
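
Because the formula contains $p(1-p)$, the same coefficient implies different marginal effects at different points. The sketch below evaluates the formula at three values of the linear index $\beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k$; the coefficient value `beta_k = 0.8` is an assumption made for illustration:

```python
import math

def logistic_cdf(z: float) -> float:
    """Probability implied by a logit index z."""
    return 1.0 / (1.0 + math.exp(-z))

def logit_marginal_effect(index: float, beta_k: float) -> float:
    """dp/dX_k = p * (1 - p) * beta_k, evaluated at the given index."""
    p = logistic_cdf(index)
    return p * (1.0 - p) * beta_k

beta_k = 0.8  # hypothetical coefficient on X_k

for index in (-2.0, 0.0, 2.0):
    effect = logit_marginal_effect(index, beta_k)
    print(f"index = {index:+.1f}, p = {logistic_cdf(index):.3f}, "
          f"marginal effect = {effect:.4f}")
```

The effect is largest at an index of zero, where $p = 0.5$ and the marginal effect equals $\beta_k / 4$, and it shrinks symmetrically as the probability approaches 0 or 1.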

Probit model

Normal CDF for probit

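The probit model assumes that the probability of the event occurring is the standard normal cumulative distribution function (CDF) evaluated at a linear combination of the predictors, $p = \Phi(\beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k)$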
The probit model equation is given by $\Phi^{-1}(p) = \beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k$, where $\Phi^{-1}$ is the inverse of the standard normal CDF
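
The probit equation can be evaluated in both directions with Python's standard library. This is a sketch under assumed coefficient values (`beta_0`, `beta_1`) and a single illustrative predictor value; `NormalDist` from the `statistics` module supplies $\Phi$ and $\Phi^{-1}$:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Hypothetical probit coefficients and predictor value
beta_0, beta_1 = -0.5, 0.25
x_1 = 2.0

index = beta_0 + beta_1 * x_1            # -0.5 + 0.25 * 2.0 = 0.0
p = std_normal.cdf(index)                # p = Phi(index)
recovered_index = std_normal.inv_cdf(p)  # Phi^{-1}(p) returns the index

print(f"index = {index}, probability = {p}, recovered index = {recovered_index}")
```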

Probit model interpretation

Coefficients in a probit model represent the change in the z-score (standard normal variable) for a one-unit increase in the predictor variable, holding other variables constant
A positive coefficient indicates that an increase in the predictor variable is associated with an increase in the probability of the event occurring
A negative coefficient suggests that an increase in the predictor variable is associated with a decrease in the probability of the event occurring

Marginal effects for probit

Marginal effects in a probit model are calculated as $\frac{\partial p}{\partial X_k} = \phi(\beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k)\beta_k$, where $\phi$ is the standard normal probability density function (PDF)
Marginal effects in a probit model have a similar interpretation to those in a logit model
They measure the change in the probability of the event occurring for a small change in a predictor variable, holding other variables constant, and are commonly evaluated with the other variables at their means
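
The probit formula can be evaluated at the mean of the predictor or at each observation and then averaged; the two summaries generally differ because $\phi$ is nonlinear. A small sketch, assuming hypothetical coefficients and a made-up sample of one predictor:

```python
from statistics import NormalDist, mean

std_normal = NormalDist()

# Hypothetical probit estimates and a small made-up sample
beta_0, beta_1 = 0.0, 0.5
x_values = [-2.0, -1.0, 0.0, 1.0, 2.0]

def probit_marginal_effect(x: float) -> float:
    """dp/dX_1 = phi(beta_0 + beta_1 * x) * beta_1."""
    return std_normal.pdf(beta_0 + beta_1 * x) * beta_1

# Marginal effect evaluated at the sample mean of the predictor
effect_at_mean = probit_marginal_effect(mean(x_values))
# Marginal effect computed for each observation, then averaged
average_effect = mean(probit_marginal_effect(x) for x in x_values)

print(f"effect at the mean = {effect_at_mean:.4f}")
print(f"average effect     = {average_effect:.4f}")
```

In this example the mean of the predictor puts the index at zero, where $\phi$ peaks, so the effect at the mean is larger than the average effect.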

Logit vs probit models

Logit and probit similarities

Both logit and probit models are used to model binary outcome variables
They estimate the probability of an event occurring based on a set of predictor variables
Logit and probit models often produce similar results and lead to the same conclusions

Logit and probit differences

The logit model links the predictors to the probability of the event occurring through the logistic CDF, while the probit model uses the standard normal CDF
Coefficients in a logit model are interpreted in terms of log odds, while coefficients in a probit model are interpreted in terms of z-scores, so logit coefficients are typically about 1.6 to 1.8 times as large as probit coefficients estimated on the same data
The logistic distribution has slightly heavier tails compared to the standard normal distribution
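
The difference in tails can be checked numerically. The sketch below compares upper-tail probabilities after rescaling the logistic distribution to unit variance (a standard logistic variable has variance $\pi^2/3$); the rescaling is a choice made here so that the comparison is like for like:

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

# Scale parameter that gives the logistic distribution a variance of 1
logistic_scale = math.sqrt(3.0) / math.pi

def normal_upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 1.0 - std_normal.cdf(z)

def logistic_upper_tail(z: float) -> float:
    """P(Z > z) for a logistic variable rescaled to unit variance."""
    return 1.0 / (1.0 + math.exp(z / logistic_scale))

for z in (3.0, 4.0):
    print(f"z = {z}: normal tail = {normal_upper_tail(z):.6f}, "
          f"logistic tail = {logistic_upper_tail(z):.6f}")
```

Far from the center, the logistic tail probability is several times larger than the normal one, which is where the two models are most likely to give different predictions.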

Maximum likelihood estimation

Log likelihood function

Logit and probit models are typically estimated using maximum likelihood estimation (MLE)
The log likelihood function for a logit or probit model is given by $\ln L = \sum_{i=1}^n \left[y_i \ln p_i + (1-y_i) \ln (1-p_i)\right]$, where $y_i$ is the observed binary outcome and $p_i$ is the predicted probability for observation $i$
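
The log likelihood can be computed directly from the observed outcomes and the predicted probabilities. The outcomes and probabilities below are invented for illustration; the second set of probabilities mimics an intercept-only model that predicts the sample mean for every observation:

```python
import math

def log_likelihood(y, p):
    """Sum of y_i * ln(p_i) + (1 - y_i) * ln(1 - p_i) over all observations."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
               for yi, pi in zip(y, p))

y = [1, 0, 1, 1, 0]                      # made-up binary outcomes
p_model = [0.9, 0.2, 0.8, 0.7, 0.1]      # hypothetical fitted probabilities
p_null = [sum(y) / len(y)] * len(y)      # intercept-only: sample mean of y

ll_model = log_likelihood(y, p_model)
ll_null = log_likelihood(y, p_null)
print(f"model log likelihood = {ll_model:.4f}")
print(f"null log likelihood  = {ll_null:.4f}")
```

The log likelihood is always negative or zero, and values closer to zero mean the predicted probabilities sit closer to the observed outcomes.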

MLE for logit and probit

MLE finds the values of the coefficients that maximize the log likelihood function
The optimization process involves iterative algorithms such as Newton-Raphson or Fisher scoring
Standard errors for the coefficients are the square roots of the diagonal elements of the inverse of the negative Hessian matrix, evaluated at the maximum likelihood estimates
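
The iteration and the standard-error calculation can be sketched for the simplest case, a logit model with an intercept and one predictor, where the 2x2 matrix inverse can be written out by hand. The data are invented and the function name `fit_logit_newton` is hypothetical; statistical software handles the general case with matrix routines:

```python
import math

def score_and_information(b0, b1, x, y):
    """Gradient of the log likelihood and the negative Hessian entries."""
    p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
    w = [pi * (1.0 - pi) for pi in p]
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    return g0, g1, h00, h01, h11

def fit_logit_newton(x, y, max_iter=25, tol=1e-10):
    """Newton-Raphson for a logit with an intercept and one predictor."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: inverse of the negative Hessian times the gradient
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse
    # negative Hessian at the final estimates
    g0, g1, h00, h01, h11 = score_and_information(b0, b1, x, y)
    det = h00 * h11 - h01 * h01
    se0, se1 = math.sqrt(h11 / det), math.sqrt(h00 / det)
    return b0, b1, se0, se1, (g0, g1)

# Made-up data with overlapping outcomes, so the estimates are finite
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0, 0, 1, 0, 1, 1, 0, 1]

b0, b1, se0, se1, score = fit_logit_newton(x, y)
print(f"beta_0 = {b0:.4f} (se {se0:.4f}), beta_1 = {b1:.4f} (se {se1:.4f})")
```

At the maximum the gradient is essentially zero, which is the condition the loop is driving toward.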

Model evaluation

Goodness of fit measures

Pseudo R-squared measures, such as McFadden's R-squared or Cox and Snell's R-squared, provide an indication of the model's fit
These measures compare the log likelihood of the fitted model to the log likelihood of a null model with only an intercept term
Higher values of pseudo R-squared suggest a better model fit, but they do not measure the share of variance explained as R-squared does in linear regression
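
Both measures named above are simple functions of the two log likelihoods. A sketch using the usual definitions, McFadden's $1 - \ln L_{model} / \ln L_{null}$ and Cox and Snell's $1 - \exp\left(\frac{2}{n}(\ln L_{null} - \ln L_{model})\right)$, with made-up log likelihood values and sample size:

```python
import math

def mcfadden_r2(ll_model: float, ll_null: float) -> float:
    """1 - lnL(model) / lnL(null)."""
    return 1.0 - ll_model / ll_null

def cox_snell_r2(ll_model: float, ll_null: float, n: int) -> float:
    """1 - exp((2 / n) * (lnL(null) - lnL(model)))."""
    return 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))

# Hypothetical log likelihoods and sample size
ll_model, ll_null, n = -60.0, -100.0, 200

mcfadden = mcfadden_r2(ll_model, ll_null)
cox_snell = cox_snell_r2(ll_model, ll_null, n)
print(f"McFadden = {mcfadden:.4f}, Cox and Snell = {cox_snell:.4f}")
```

The two measures give different numbers for the same model, so they should only be compared with values of the same type.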

Classification accuracy

Classification tables can be used to assess the accuracy of the model's predictions
The table compares the observed binary outcomes to the predicted outcomes based on a chosen probability threshold (e.g., 0.5)
Metrics such as sensitivity, specificity, and overall accuracy can be calculated from the classification table
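
A classification table and the metrics derived from it can be built in a few lines. The observed outcomes and predicted probabilities below are invented for illustration, and the 0.5 threshold follows the example in the text:

```python
# Made-up observed outcomes and hypothetical predicted probabilities
y_observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_predicted = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.3, 0.2, 0.2, 0.1]
threshold = 0.5

# Predict the event whenever the probability reaches the threshold
y_predicted = [1 if p >= threshold else 0 for p in p_predicted]
pairs = list(zip(y_observed, y_predicted))

tp = sum(1 for obs, pred in pairs if obs == 1 and pred == 1)  # true positives
fn = sum(1 for obs, pred in pairs if obs == 1 and pred == 0)  # false negatives
fp = sum(1 for obs, pred in pairs if obs == 0 and pred == 1)  # false positives
tn = sum(1 for obs, pred in pairs if obs == 0 and pred == 0)  # true negatives

sensitivity = tp / (tp + fn)          # share of actual events predicted correctly
specificity = tn / (tn + fp)          # share of actual non-events predicted correctly
accuracy = (tp + tn) / len(pairs)     # share of all predictions that are correct

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} accuracy={accuracy:.3f}")
```

Changing `threshold` trades sensitivity against specificity, which is why the threshold should be reported alongside the table.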

ROC curves and AUC

Receiver Operating Characteristic (ROC) curves plot the true positive rate (sensitivity) against the false positive rate (1-specificity) for various probability thresholds
The Area Under the ROC Curve (AUC) provides a summary measure of the model's discriminatory power
An AUC of 0.5 indicates a model with no discriminatory power, while an AUC of 1 indicates perfect discrimination
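
One way to compute the AUC without drawing the curve uses its rank interpretation: the AUC equals the share of (event, non-event) pairs in which the event received the higher predicted probability, with ties counted as one half. A sketch with made-up data; the function name `auc_by_pairs` is hypothetical:

```python
def auc_by_pairs(y_observed, scores):
    """AUC as the share of (event, non-event) pairs ranked correctly."""
    positives = [s for y, s in zip(y_observed, scores) if y == 1]
    negatives = [s for y, s in zip(y_observed, scores) if y == 0]
    wins = 0.0
    for s_pos in positives:
        for s_neg in negatives:
            if s_pos > s_neg:
                wins += 1.0      # pair ranked correctly
            elif s_pos == s_neg:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))

# Made-up observed outcomes and hypothetical predicted probabilities
y_observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_predicted = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.3, 0.2, 0.2, 0.1]

auc = auc_by_pairs(y_observed, p_predicted)
print(f"AUC = {auc:.4f}")
```

The pairwise loop is quadratic in the sample size, so it is suited to illustration rather than large datasets.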

Assumptions and diagnostics

Binary outcome variable

The dependent variable in a logit or probit model must be binary, taking on only two possible values (e.g., 0 and 1)
If the outcome variable has more than two categories, alternative models such as multinomial logit or ordered probit should be considered

Independence of observations

Observations in the dataset should be independent of each other
Violation of this assumption can lead to biased standard errors and incorrect inferences
Clustered standard errors can be used to account for dependence within groups or clusters

No perfect multicollinearity

Predictor variables should not be perfectly correlated with each other
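Perfect multicollinearity makes it impossible to estimate the affected coefficients separately, while strong but imperfect multicollinearity leads to unstable coefficient estimates and inflated standard errors
Variance Inflation Factors (VIFs) can be used to detect strong multicollinearity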
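
In general a VIF is $1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the other predictors. With only two predictors, $R_j^2$ is the squared correlation between them, which keeps the sketch below short. The data and the function name `vif_two_predictors` are illustrative assumptions:

```python
import math

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - r^2) when the model has exactly two predictors."""
    n = len(x1)
    mean1, mean2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    s11 = sum((a - mean1) ** 2 for a in x1)
    s22 = sum((b - mean2) ** 2 for b in x2)
    r_squared = s12 * s12 / (s11 * s22)
    if r_squared >= 1.0:
        return math.inf  # perfect collinearity: the VIF is unbounded
    return 1.0 / (1.0 - r_squared)

# Made-up predictor values
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

vif = vif_two_predictors(x1, x2)
print(f"VIF = {vif:.3f}")
```

When one predictor is an exact multiple of the other, the squared correlation is 1 and the function returns infinity, reflecting that the coefficients cannot be separated.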

Large sample size

Logit and probit models rely on large sample asymptotic properties for valid inferences
A general rule of thumb is to have at least 10 events per predictor variable in the model
Small sample sizes can lead to biased coefficient estimates and unreliable standard errors

Interpreting coefficients

Sign and significance

The sign of a coefficient indicates the direction of the relationship between the predictor variable and the probability of the event occurring
A positive coefficient suggests a positive relationship, while a negative coefficient suggests a negative relationship
Statistical significance of coefficients can be assessed using Wald tests or likelihood ratio tests
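
Both tests reduce to short calculations once the estimates are available. The sketch below uses hypothetical inputs; the likelihood ratio function covers only the case of a single restriction, where the chi-square distribution with one degree of freedom can be evaluated through the standard normal CDF:

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

def wald_test(coefficient: float, standard_error: float):
    """Two-sided z test of the hypothesis that the coefficient is zero."""
    z = coefficient / standard_error
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return z, p_value

def lr_test_one_restriction(ll_unrestricted: float, ll_restricted: float):
    """Likelihood ratio test for one restriction (chi-square, 1 df)."""
    lr_stat = 2.0 * (ll_unrestricted - ll_restricted)
    # A chi-square(1) variable is a squared standard normal, so
    # P(chi2 > lr) = 2 * (1 - Phi(sqrt(lr)))
    p_value = 2.0 * (1.0 - std_normal.cdf(math.sqrt(max(lr_stat, 0.0))))
    return lr_stat, p_value

# Hypothetical coefficient, standard error, and log likelihoods
z, p_wald = wald_test(0.98, 0.5)
lr_stat, p_lr = lr_test_one_restriction(-60.0, -61.9208)

print(f"Wald: z = {z:.2f}, p = {p_wald:.4f}")
print(f"LR:   statistic = {lr_stat:.4f}, p = {p_lr:.4f}")
```

Testing several coefficients jointly requires a chi-square distribution with more degrees of freedom, which this shortcut does not cover.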

Comparing coefficient magnitudes

Coefficients in logit and probit models are not directly comparable across variables because the predictors are measured on different scales
Standardized coefficients or marginal effects can be used to compare the relative importance of predictor variables
Standardized coefficients are obtained by standardizing the predictor variables before fitting the model
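
Standardizing a predictor rescales its coefficient by the predictor's standard deviation without changing the fitted index. The sketch below illustrates this with made-up data and hypothetical coefficients, using the population standard deviation (the sample version works the same way as long as it is used consistently):

```python
from statistics import mean, pstdev

# Made-up predictor values and hypothetical coefficients on the original scale
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
beta_0, beta_1 = -1.5, 0.3

x_mean, x_sd = mean(x), pstdev(x)
z = [(xi - x_mean) / x_sd for xi in x]   # standardized predictor

# Coefficients implied for the standardized predictor
std_beta_1 = beta_1 * x_sd               # effect of a one-standard-deviation change
std_beta_0 = beta_0 + beta_1 * x_mean    # intercept absorbs the shift by the mean

original_index = [beta_0 + beta_1 * xi for xi in x]
standardized_index = [std_beta_0 + std_beta_1 * zi for zi in z]

print(f"sd of x = {x_sd}, standardized coefficient = {std_beta_1}")
```

Both parameterizations produce the same index for every observation, so the predicted probabilities are unchanged; only the unit in which the coefficient is expressed differs.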

Reporting results

Tables of coefficients

Results from logit and probit models are typically reported in tables
The table should include the estimated coefficients, standard errors, z-values (or t-values), and p-values for each predictor variable
Odds ratios or marginal effects can also be reported to facilitate interpretation

Plots of predicted probabilities

Plots of predicted probabilities can be used to visualize the relationship between predictor variables and the probability of the event occurring
These plots can be created by varying one predictor variable while holding other variables at their means or specific values
Confidence intervals can be added to the plots to show the uncertainty around the predicted probabilities
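
The numbers behind such a plot can be generated by evaluating the model over a grid of predictor values. The sketch below assumes a logit model with one predictor, hypothetical estimates, and hypothetical variance and covariance values; the interval is built on the linear index and then transformed to the probability scale, which is one common approach rather than the only one:

```python
import math

def logistic_cdf(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logit estimates and variance-covariance entries
beta_0, beta_1 = -2.0, 0.5
var_b0, var_b1, cov_b0_b1 = 0.25, 0.01, -0.04
z_crit = 1.96  # approximate 95% critical value of the standard normal

rows = []
for x in range(0, 9):
    index = beta_0 + beta_1 * x
    # Standard error of the linear index beta_0 + beta_1 * x
    se_index = math.sqrt(var_b0 + 2.0 * x * cov_b0_b1 + x * x * var_b1)
    lower = logistic_cdf(index - z_crit * se_index)
    upper = logistic_cdf(index + z_crit * se_index)
    rows.append((x, lower, logistic_cdf(index), upper))

for x, lower, p, upper in rows:
    print(f"x = {x}: p = {p:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Each row holds a predictor value, the lower bound, the predicted probability, and the upper bound, which can be passed to any plotting tool. Transforming the interval from the index keeps the bounds between 0 and 1.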
