Public Policy Analysis Unit 11 Review

11.2 Regression Analysis and Modeling

Written by the Fiveable Content Team • Last updated September 2025

Regression analysis is a powerful tool in policy analysis, helping us understand relationships between variables. It allows us to predict outcomes and measure the impact of different factors on policy issues. From simple linear models to complex logistic regressions, these techniques offer valuable insights.

Understanding variables, coefficients, and model evaluation metrics is crucial for interpreting regression results. We'll look at common issues like multicollinearity and heteroscedasticity, and learn how to address them to ensure our analyses are accurate and reliable.

Regression Models

Linear Regression

  • Simple linear regression models the relationship between two variables using a straight line
  • Assumes a linear relationship exists between the dependent variable and a single independent variable
  • Equation takes the form $y = \beta_0 + \beta_1x + \varepsilon$, where $y$ is the dependent variable, $x$ is the independent variable, $\beta_0$ is the y-intercept, $\beta_1$ is the slope, and $\varepsilon$ is the error term
  • Can predict values of the dependent variable based on the independent variable (housing prices based on square footage)
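
A minimal sketch of fitting such a model in Python with statsmodels; the housing data here are synthetic and purely illustrative:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
sqft = rng.uniform(800, 3000, size=100)                        # independent variable x
price = 50_000 + 120 * sqft + rng.normal(0, 20_000, size=100)  # y = b0 + b1*x + noise

X = sm.add_constant(sqft)              # prepends a column of 1s for the intercept
model = sm.OLS(price, X).fit()         # ordinary least squares fit
print(model.params)                    # estimates of [beta_0, beta_1]
print(model.predict([[1.0, 1500.0]]))  # predicted price for a 1,500 sq ft house
```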

Multiple Regression

  • Extends linear regression to include multiple independent variables
  • Models the relationship between the dependent variable and two or more independent variables
  • Equation takes the form $y = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_nx_n + \varepsilon$, where $y$ is the dependent variable, $x_1, x_2, ..., x_n$ are the independent variables, $\beta_0$ is the y-intercept, $\beta_1, \beta_2, ..., \beta_n$ are the coefficients, and $\varepsilon$ is the error term
  • Allows for more complex relationships and interactions between variables to be captured (predicting salary based on education level, years of experience, and job title)
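
The same workflow extends directly to several predictors. A sketch using synthetic salary data (the variable names and coefficients are illustrative assumptions, not real estimates):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
education = rng.integers(12, 21, size=n)    # years of schooling
experience = rng.uniform(0, 30, size=n)     # years of experience
salary = (20_000 + 2_500 * education + 1_200 * experience
          + rng.normal(0, 5_000, size=n))   # y with noise

X = sm.add_constant(np.column_stack([education, experience]))
fit = sm.OLS(salary, X).fit()
print(fit.summary())   # coefficients, standard errors, R-squared, and more
```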

Logistic Regression

  • Used when the dependent variable is binary or categorical (pass/fail, yes/no)
  • Models the probability of an event occurring based on one or more independent variables
  • Employs a logistic function to transform the output to a probability between 0 and 1
  • Equation takes the form $\ln(\frac{p}{1-p}) = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_nx_n$, where $p$ is the probability of the event occurring, $x_1, x_2, ..., x_n$ are the independent variables, and $\beta_0, \beta_1, \beta_2, ..., \beta_n$ are the coefficients
  • Can be used for classification problems (predicting whether a customer will churn based on their demographics and behavior)
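
A hedged sketch of a logistic fit, again with statsmodels; the binary outcome here (program enrollment driven by income and age) is simulated purely for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
income = rng.normal(50, 15, size=n)       # household income in $1,000s
age = rng.uniform(18, 80, size=n)
log_odds = -4 + 0.05 * income + 0.03 * age
p = 1 / (1 + np.exp(-log_odds))           # logistic function maps to (0, 1)
enrolled = rng.binomial(1, p)             # observed binary outcome

X = sm.add_constant(np.column_stack([income, age]))
fit = sm.Logit(enrolled, X).fit()
print(fit.params)           # coefficients on the log-odds scale
print(np.exp(fit.params))   # the same coefficients as odds ratios
```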

Variables and Coefficients

Dependent and Independent Variables

  • Dependent variable is the outcome or response variable that is being predicted or explained by the model (house price, test score)
  • Independent variables, also known as predictor or explanatory variables, are the factors used to predict or explain the dependent variable (square footage, hours studied)
  • Choice of dependent and independent variables depends on the research question and the hypothesized relationships between variables

Coefficients

  • Coefficients represent the change in the dependent variable associated with a one-unit change in the corresponding independent variable, holding all other variables constant
  • Interpretation depends on the scale and units of the variables involved
  • In linear regression, the coefficient of the independent variable is the slope of the line (a coefficient of 1.5 means that for every one-unit increase in the independent variable, the dependent variable increases by 1.5 units on average)
  • In logistic regression, coefficients are on the log-odds scale and are interpreted through odds ratios: the odds ratio is $e^{\beta}$, so a coefficient of 0.7 corresponds to an odds ratio of about 2.0, meaning a one-unit increase in the independent variable roughly doubles the odds of the event occurring (see the check below)
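
A one-line check of that conversion (0.7 is just the illustrative coefficient from above):

```python
import numpy as np

beta = 0.7
print(np.exp(beta))   # ~2.01: the odds roughly double per one-unit increase
```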

Model Evaluation Metrics

R-squared

  • R-squared, or the coefficient of determination, measures the proportion of variance in the dependent variable that is explained by the independent variables in the model
  • Ranges from 0 to 1, with higher values indicating a better fit
  • Calculated as the explained variance divided by the total variance, or equivalently $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the total sum of squares
  • Adjusted R-squared accounts for the number of independent variables in the model and penalizes the addition of irrelevant variables
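
To make the formulas concrete, a small sketch computing $R^2$ and adjusted $R^2$ by hand; the observed and predicted arrays and the predictor count $k$ are hypothetical:

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])       # observed values
y_hat = np.array([2.8, 5.3, 6.9, 9.2, 10.8])   # model predictions
k = 1                                          # one predictor in this toy example
n = len(y)

ss_res = np.sum((y - y_hat) ** 2)       # sum of squared residuals
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # penalizes extra predictors
print(r2, adj_r2)
```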

Residuals

  • Residuals are the differences between the observed values of the dependent variable and the predicted values from the regression model
  • Calculated as $e_i = y_i - \hat{y}_i$, where $e_i$ is the residual for observation $i$, $y_i$ is the observed value, and $\hat{y}_i$ is the predicted value
  • Used to assess the assumptions of the model (normality, homoscedasticity, linearity) and identify outliers or influential observations
  • Plotting residuals against predicted values or independent variables can reveal patterns or deviations from assumptions (a funnel-shaped plot may indicate heteroscedasticity)
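
A sketch of such a residual plot, using synthetic data deliberately generated so the noise grows with $x$ (which should produce the funnel shape described above):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=150)
y = 2 + 3 * x + rng.normal(0, 1 + 0.3 * x, size=150)  # noise grows with x

fit = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(fit.fittedvalues, fit.resid, alpha=0.5)   # e_i vs. y-hat_i
plt.axhline(0, color="gray", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("A funnel shape suggests heteroscedasticity")
plt.show()
```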

Common Issues

Multicollinearity

  • Multicollinearity occurs when two or more independent variables in a multiple regression model are highly correlated with each other
  • Can lead to unstable and unreliable coefficient estimates, as it becomes difficult to separate the individual effects of the correlated variables
  • Detected through correlation matrices, variance inflation factors (VIFs), or condition indices
  • Addressed by removing one of the correlated variables, combining them into a single variable, or using regularization techniques like ridge regression or lasso regression
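
A sketch of VIF-based detection with statsmodels' `variance_inflation_factor`, on synthetic data where one predictor nearly duplicates another; a common rule of thumb flags VIFs above 5 or 10:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly a copy of x1: collinear
x3 = rng.normal(size=n)                   # independent of the others

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i in range(1, X.shape[1]):            # skip the constant column
    print(f"x{i}: VIF = {variance_inflation_factor(X, i):.1f}")
```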

Heteroscedasticity

  • Heteroscedasticity refers to the situation where the variance of the residuals is not constant across the range of the independent variables
  • Violates the assumption of homoscedasticity in linear regression, which states that the variance of the residuals should be constant
  • Does not bias the coefficient estimates themselves, but it does bias the standard errors, leading to incorrect inferences about the significance of the coefficients
  • Detected through visual inspection of residual plots (residuals vs. fitted values) or formal tests like the Breusch-Pagan test or White's test
  • Addressed by using robust standard errors, weighted least squares, or transforming the variables to stabilize the variance (taking the logarithm of the dependent variable)
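
A sketch combining detection and correction, using statsmodels' Breusch-Pagan test and HC3 robust standard errors on synthetic heteroscedastic data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=200)
y = 1 + 2 * x + rng.normal(0, 0.5 + 0.5 * x, size=200)   # non-constant variance

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
print(f"Breusch-Pagan p-value: {lm_pvalue:.4f}")   # small p suggests heteroscedasticity

robust = sm.OLS(y, X).fit(cov_type="HC3")          # refit with robust standard errors
print(robust.bse)                                  # corrected standard errors
```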