🥖 Linear Modeling Theory
Unit 2 Review

2.3 Measures of Model Fit: R-squared and Adjusted R-squared

Written by the Fiveable Content Team • Last updated September 2025
Measures of model fit help us gauge how well our linear regression model explains the data. R-squared tells us what proportion of the variation in the dependent variable our model accounts for, ranging from 0 to 1.

While R-squared is useful, it has limitations. Enter adjusted R-squared, which penalizes adding unnecessary variables. This helps us avoid overfitting and compare models with different numbers of predictors more accurately.

Coefficient of determination (R-squared)

Definition and interpretation

  • R-squared is a statistical measure representing the proportion of variance in the dependent variable predictable from the independent variable(s) in a linear regression model
  • Ranges from 0 to 1, with higher values indicating a better fit of the model to the data
    • An R-squared of 1 means the model explains all the variability of the response data around its mean
  • Interpreted as the percentage of variation in the dependent variable that is explained by the independent variable(s) in the model
  • Also known as the coefficient of determination, commonly used to assess the goodness of fit of a linear regression model
  • Formula for R-squared: $R^2 = 1 - \frac{SSR}{SST}$, where SSR is the sum of squared residuals and SST is the total sum of squares
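For example, with hypothetical values SSR = 25 and SST = 100, the formula gives $R^2 = 1 - \frac{25}{100} = 0.75$, meaning the model explains 75% of the variation in the dependent variable.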

Importance and usage

  • R-squared provides a quantitative measure of how well the linear regression model fits the observed data
  • Helps evaluate the strength of the relationship between the dependent and independent variables
  • Allows comparison of different models to determine which one better explains the variability in the data
  • Widely used in various fields (economics, social sciences, engineering) to assess the explanatory power of linear regression models

Calculating R-squared

Required components

  • To calculate R-squared, you need the sum of squared residuals (SSR) and the total sum of squares (SST) from the linear regression model
  • SSR is the sum of the squared differences between the predicted values and the actual values of the dependent variable
    • Represents the amount of variation in the dependent variable not explained by the model
  • SST is the sum of the squared differences between the actual values of the dependent variable and its mean
    • Represents the total variation in the dependent variable
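In symbols, writing $y_i$ for the observed values, $\hat{y}_i$ for the model's predicted values, and $\bar{y}$ for the mean of the dependent variable, the descriptions above correspond to the standard definitions:

$SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$

R-squared then follows as $R^2 = 1 - \frac{SSR}{SST}$.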

Calculation methods

  • Once you have SSR and SST, use the formula $R^2 = 1 - \frac{SSR}{SST}$ to calculate R-squared
  • Alternatively, most statistical software packages and programming languages (SPSS, R, Python) provide functions that compute R-squared directly for a fitted linear regression model
    • Example in R: summary(lm_model)$r.squared returns the R-squared value for the linear model lm_model
    • Example in Python with scikit-learn: from sklearn.metrics import r2_score; r2_score(y_true, y_pred) calculates R-squared from the true values (y_true) and predicted values (y_pred)
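As a concrete illustration, here is a minimal Python sketch that fits a simple linear regression on toy data and computes R-squared both from the formula and with scikit-learn's r2_score (the data and variable names are hypothetical):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    # Toy data: a roughly linear relationship with some noise
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(50, 1))
    y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 1.5, size=50)

    # Fit the linear regression and get predictions
    model = LinearRegression().fit(X, y)
    y_pred = model.predict(X)

    # R-squared from the formula: 1 - SSR / SST
    ssr = np.sum((y - y_pred) ** 2)    # sum of squared residuals
    sst = np.sum((y - y.mean()) ** 2)  # total sum of squares
    r2_manual = 1 - ssr / sst

    # R-squared from scikit-learn; should match the manual value
    r2_library = r2_score(y, y_pred)

    print(f"manual R^2:  {r2_manual:.4f}")
    print(f"sklearn R^2: {r2_library:.4f}")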

R-squared limitations vs adjusted R-squared

Limitations of R-squared

  • R-squared never decreases, and typically increases, as more independent variables are added to the model, even if those variables have no meaningful relationship with the dependent variable
    • This can lead to the inclusion of irrelevant variables and overfitting
  • Does not indicate whether the independent variables are statistically significant or if the model is appropriate for the data
    • Only measures the goodness of fit without considering the model's validity
  • Does not consider the number of independent variables in the model, potentially leading to overfitting if too many variables are included
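The sketch below (toy data, hypothetical names) makes the first limitation concrete: adding a column of pure noise cannot lower R-squared, even though the new variable carries no information about the response:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(42)
    n = 60
    x1 = rng.uniform(0, 10, size=n)
    y = 3.0 * x1 + rng.normal(0, 2.0, size=n)

    # Model 1: the single meaningful predictor
    X1 = x1.reshape(-1, 1)
    r2_one = LinearRegression().fit(X1, y).score(X1, y)

    # Model 2: same predictor plus a column of pure noise
    noise = rng.normal(size=n)
    X2 = np.column_stack([x1, noise])
    r2_two = LinearRegression().fit(X2, y).score(X2, y)

    # R-squared for the larger model is always >= the smaller model's
    print(f"R^2 with 1 predictor:  {r2_one:.4f}")
    print(f"R^2 with noise added:  {r2_two:.4f}")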

Adjusted R-squared as an alternative

  • Adjusted R-squared addresses the limitations of R-squared by adjusting for the number of independent variables in the model
  • Penalizes the addition of unnecessary independent variables, providing a more reliable measure of the model's goodness of fit
  • Particularly useful when comparing models with different numbers of independent variables
    • Helps determine if adding more variables truly improves the model's explanatory power

Adjusted R-squared interpretation

Calculation and formula

  • Adjusted R-squared is calculated using the formula $R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$, where n is the number of observations and k is the number of independent variables in the model
  • The adjusted R-squared value will always be less than or equal to the R-squared value
    • Decreases when the number of independent variables increases without a corresponding improvement in the model's fit
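As a quick sketch, the formula translates directly into a small Python helper (the function name adjusted_r2 is ours, not a library function):

    def adjusted_r2(r2: float, n: int, k: int) -> float:
        """Adjusted R-squared: 1 - (1 - R^2)(n - 1) / (n - k - 1)."""
        return 1 - (1 - r2) * (n - 1) / (n - k - 1)

    # Hypothetical example: R^2 = 0.75 from n = 50 observations, k = 3 predictors
    print(adjusted_r2(0.75, n=50, k=3))  # about 0.7337, slightly below 0.75

Note how the penalty grows with k: for a fixed R-squared, each additional predictor shrinks the adjusted value further.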

Interpretation and comparison

  • The interpretation of adjusted R-squared is similar to R-squared
    • Represents the proportion of variance in the dependent variable predictable from the independent variable(s), adjusted for the number of variables in the model
  • A higher adjusted R-squared value indicates a better fit of the model to the data, considering the number of independent variables used
  • When comparing models with different numbers of independent variables, adjusted R-squared is a more appropriate measure than R-squared
    • Helps identify the model that strikes a balance between explanatory power and parsimony (using fewer variables)
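As an illustration of such a comparison, statsmodels reports both measures for a fitted model; the sketch below uses hypothetical toy data and variable names:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    n = 80
    x1 = rng.uniform(0, 5, size=n)
    noise = rng.normal(size=n)  # an irrelevant predictor
    y = 1.5 * x1 + rng.normal(0, 1.0, size=n)

    # Model A: one meaningful predictor
    fit_a = sm.OLS(y, sm.add_constant(x1)).fit()

    # Model B: meaningful predictor plus noise
    fit_b = sm.OLS(y, sm.add_constant(np.column_stack([x1, noise]))).fit()

    # R-squared goes up (or stays equal) for Model B, but the
    # adjusted R-squared typically drops, favoring the simpler model
    print(f"Model A: R^2={fit_a.rsquared:.4f}, adj={fit_a.rsquared_adj:.4f}")
    print(f"Model B: R^2={fit_b.rsquared:.4f}, adj={fit_b.rsquared_adj:.4f}")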