Fiveable

๐ŸŽฒData, Inference, and Decisions Unit 7 Review

QR code for Data, Inference, and Decisions practice questions

7.4 Coefficient of determination and model evaluation

๐ŸŽฒData, Inference, and Decisions
Unit 7 Review

7.4 Coefficient of determination and model evaluation

Written by the Fiveable Content Team โ€ข Last updated September 2025
Written by the Fiveable Content Team โ€ข Last updated September 2025
๐ŸŽฒData, Inference, and Decisions
Unit & Topic Study Guides

Linear regression models help us understand relationships between variables. The coefficient of determination (Rยฒ) tells us how well our model fits the data. It measures the proportion of variance explained by the model.

While Rยฒ is useful, it has limitations. We'll explore adjusted Rยฒ and residual analysis to get a more complete picture of model fit. These tools help us evaluate and improve our regression models.

Coefficient of Determination for Model Fit

Understanding R-squared

  • Coefficient of determination (Rยฒ) measures proportion of variance in dependent variable predictable from independent variable(s) in regression model
  • Rยฒ ranges from 0 to 1
    • 0 indicates model explains no variability in data
    • 1 indicates perfect prediction
  • Calculate Rยฒ using formula R2=1โˆ’Sumย ofย Squaredย ResidualsTotalย Sumย ofย SquaresRยฒ = 1 - \frac{\text{Sum of Squared Residuals}}{\text{Total Sum of Squares}}
  • Interpret Rยฒ as percentage of variation in response variable explained by model
  • Higher Rยฒ value suggests better model fit to data
    • Does not imply causation or model appropriateness
  • Use Rยฒ to compare explanatory power of different regression models on same dataset

Applications and Considerations

  • Rยฒ increases as more predictors added to multiple regression model
    • May not reflect actual improvement in predictive power
  • Utilize Rยฒ for various purposes
    • Assessing model performance (weather prediction models)
    • Comparing different statistical models (economic forecasting)
    • Evaluating goodness of fit in scientific research (drug efficacy studies)
  • Consider Rยฒ alongside other model evaluation metrics
    • Mean squared error (MSE)
    • Root mean squared error (RMSE)
    • Akaike Information Criterion (AIC)

Limitations of R-squared

Potential Misinterpretations

  • Rยฒ artificially inflates by adding more variables to model
    • Does not guarantee improved predictive power
  • Rยฒ fails to indicate
    • Coefficient bias
    • Correct regression model usage
    • Cause-and-effect relationships between variables
  • Rยฒ exhibits sensitivity to outliers and influential points
    • Can significantly affect value
  • Rยฒ limitations in various scenarios
    • Overfitting in small sample sizes
    • Misleading in non-linear relationships
    • Inadequate for comparing models with different dependent variables

Adjusted R-squared

  • Adjusted Rยฒ modifies Rยฒ to account for number of predictors in model
  • Calculate adjusted Rยฒ using formula Adjustedย Rยฒ=1โˆ’(1โˆ’R2)(nโˆ’1)nโˆ’kโˆ’1\text{Adjusted Rยฒ} = 1 - \frac{(1 - Rยฒ)(n - 1)}{n - k - 1}
    • n represents sample size
    • k represents number of predictors
  • Adjusted Rยฒ penalizes addition of unnecessary predictors
    • Provides more accurate measure of model fit in multiple regression
  • Adjusted Rยฒ can decrease when irrelevant predictors added to model
    • Useful for comparing models with different numbers of predictors
  • Examples of adjusted Rยฒ applications
    • Selecting optimal number of variables in stepwise regression
    • Evaluating feature importance in machine learning models

Residual Analysis for Model Fit

Residual Plot Interpretation

  • Residual analysis examines differences between observed and predicted values (residuals) to assess model adequacy
  • Residual plot displays residuals on y-axis against fitted values or predictor variables on x-axis
  • Well-fitting model characteristics
    • Residuals randomly scattered around zero
    • No discernible pattern in residual plot
  • Assess homoscedasticity by checking for constant variance in residuals across predictor variable levels
  • Evaluate normality of residuals using
    • Q-Q plot (should approximate straight line)
    • Histogram of residuals (should resemble normal distribution)
  • Identify outliers and influential points through residual analysis
    • Use measures like standardized residuals, Cook's distance, or leverage statistics
  • Detect non-linearity in variable relationships
    • Indicates linear model may be inappropriate
  • Examples of residual patterns and interpretations
    • Funnel shape: heteroscedasticity
    • U-shape: non-linear relationship
    • Clustering: presence of subgroups in data

Advanced Residual Analysis Techniques

  • Durbin-Watson test detects autocorrelation in residuals
    • Violates independence assumption in linear regression
  • Partial residual plots assess individual predictor effects while controlling for other variables
  • Standardized residual plots help identify outliers and influential observations
  • LOESS smoothing on residual plots reveals subtle patterns and trends
  • Residual analysis applications in various fields
    • Financial modeling (stock price prediction)
    • Environmental science (pollution level forecasting)
    • Medical research (drug effectiveness studies)