Fiveable

Foundations of Data Science Unit 12 Review


12.2 Regression Metrics


Written by the Fiveable Content Team • Last updated September 2025

Regression metrics are essential tools for evaluating model performance in data science. Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) measure prediction accuracy, while R-squared assesses the model's explanatory power.

Understanding these metrics helps data scientists choose the best model for their specific problem. MSE and RMSE are sensitive to outliers, MAE provides a consistent error scale, and R-squared quantifies overall fit. Selecting the right metric depends on data characteristics and analysis goals.

Understanding Regression Metrics

Calculation of MSE and RMSE

  • Mean Squared Error (MSE) quantifies average squared differences between predicted and actual values
    • $MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ measures prediction accuracy
    • Lower values indicate better model performance (house price predictions)
    • Amplifies large errors due to squaring, sensitive to outliers
  • Root Mean Squared Error (RMSE) is the square root of MSE and reports error in the original units
    • $RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$ facilitates interpretation
    • Expressed in same units as target variable (°C for temperature predictions)
    • More intuitive than MSE for comparing model performance
  • Calculation process:
    1. Compute differences between predicted and actual values
    2. Square the differences
    3. Calculate average (MSE) or take square root of average (RMSE)
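
The three steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation; the function names `mse` and `rmse` and the house-price figures (in thousands of dollars) are made up for the example.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Steps 1 and 2: compute each difference, then square it
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]
    # Step 3: average the squared differences
    return sum(squared_errors) / len(squared_errors)


def rmse(y_true, y_pred):
    """Square root of MSE, expressed in the units of the target."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical house prices in thousands of dollars (illustrative data only)
actual = [200.0, 250.0, 300.0, 350.0]
predicted = [210.0, 240.0, 310.0, 330.0]

house_mse = mse(actual, predicted)    # squared thousands of dollars
house_rmse = rmse(actual, predicted)  # thousands of dollars
print(house_mse, round(house_rmse, 2))
```

The single 20-unit miss contributes 400 of the 700 total squared error, which is the amplification of large errors described above. The RMSE of about 13.2 reads directly as "roughly 13 thousand dollars off", whereas the MSE of 175 is in squared units.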

Application of MAE

  • Mean Absolute Error (MAE) averages absolute differences between predicted and actual values
    • $MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$ provides linear scale of errors
    • Lower values indicate better model performance (stock price forecasts)
    • Less affected by outliers compared to MSE/RMSE
  • MAE characteristics:
    • Expressed in same units as target variable (km for distance predictions)
    • Provides consistent error scale across all predictions
    • Easier to interpret than MSE for non-technical stakeholders
  • Model evaluation applications:
    • Preferred when a few large errors should not dominate the evaluation (retail sales predictions)
    • Useful when the penalty should grow linearly with error size rather than quadratically (currency exchange rate forecasts)
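
To see how MAE reacts to an outlier compared with RMSE, here is a small sketch in plain Python. The helper names and the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def mae(y_true, y_pred):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the mean squared difference."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative data: every prediction is off by exactly 1
actual = [10.0, 12.0, 14.0, 16.0, 18.0]
clean_pred = [11.0, 11.0, 15.0, 15.0, 19.0]
# Same predictions, except the last one now misses by 20
outlier_pred = [11.0, 11.0, 15.0, 15.0, 38.0]

mae_clean = mae(actual, clean_pred)
rmse_clean = rmse(actual, clean_pred)
mae_outlier = mae(actual, outlier_pred)
rmse_outlier = rmse(actual, outlier_pred)

print(mae_clean, rmse_clean)
print(round(mae_outlier, 2), round(rmse_outlier, 2))
```

With uniform errors the two metrics agree (both equal 1). After the single large miss, MAE rises to 4.8 while RMSE rises to about 9, so the outlier weighs roughly twice as heavily in RMSE for this data.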

R-squared concept and limitations

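  • R-squared (Coefficient of Determination) measures proportion of variance in the target explained by the independent variables
    • $R^2 = 1 - \frac{SSR}{SST}$ quantifies model's explanatory power, where $SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is the sum of squared residuals and $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares
    • Typically ranges from 0 to 1, with 1 indicating perfect fit; can be negative when the model fits worse than always predicting the mean (for example on held-out data)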
    • Higher values suggest better model performance (weather pattern predictions)
  • R-squared interpretation:
    • Represents goodness of fit of a model
    • Indicates how well the model captures data variability
  • R-squared limitations:
    • Doesn't reveal whether predictions are systematically biased or whether the model is overfit
    • Never decreases when variables are added to a least-squares model, even irrelevant ones
    • Gives no information about error size in the target's units
    • Potentially misleading for non-linear relationships (economic growth models)
  • Adjusted R-squared accounts for number of predictors:
    • Addresses issue of R-squared increasing with additional variables
    • Penalizes unnecessary complexity in the model
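
The sketch below computes R-squared from its definition (one minus the sum of squared residuals over the total sum of squares) and applies the commonly used adjusted R-squared formula $1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where $n$ is the number of observations and $p$ the number of predictors. The adjusted formula is the standard textbook form and is not spelled out in this guide; the function names and data are hypothetical.

```python
def r_squared(y_true, y_pred):
    """1 - (sum of squared residuals) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when the target is constant")
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R-squared for the number of predictors p, given n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Illustrative data only
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
good_pred = [1.1, 1.9, 3.2, 3.8, 5.0]  # close to the actual values
bad_pred = [5.0, 4.0, 3.0, 2.0, 1.0]   # worse than predicting the mean

r2_good = r_squared(actual, good_pred)
r2_bad = r_squared(actual, bad_pred)

# Same R-squared, different numbers of predictors (assumed values for p)
adj_one_predictor = adjusted_r_squared(r2_good, n=5, p=1)
adj_three_predictors = adjusted_r_squared(r2_good, n=5, p=3)

print(round(r2_good, 4), r2_bad)
print(round(adj_one_predictor, 4), round(adj_three_predictors, 4))
```

The reversed predictions produce an R-squared of -3, showing that the value can fall below zero when a model does worse than the mean. For the same raw R-squared of 0.99, the adjusted value drops from about 0.987 with one predictor to 0.96 with three, reflecting the complexity penalty.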

Comparison of regression metrics

  • MSE vs RMSE:
    • Both sensitive to outliers, penalize large errors
    • RMSE more interpretable due to same units as target variable (cm for height predictions)
    • Use when larger errors particularly undesirable (medical diagnosis models)
  • MAE vs MSE/RMSE:
    • MAE less affected by outliers, provides consistent error scale
    • MAE preferred when outliers should not dominate the evaluation (customer satisfaction scores)
    • Use MAE when errors are of similar size, or when outliers are present but should not be heavily penalized
  • R-squared vs error-based metrics:
    • R-squared focuses on explained variance, overall model fit
    • Error metrics directly measure prediction accuracy
    • Use R-squared to compare fit across models of the same target, error metrics to report how far off predictions are
  • Metric selection considerations:
    • Data characteristics and outlier presence (financial time series)
    • Interpretability needs (presenting results to non-technical audience)
    • Analysis goals (prediction accuracy for sales forecasts vs model fit for scientific research)
    • Nature of the problem (linear vs non-linear relationships)
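
The difference between R-squared and the error-based metrics can be illustrated with two made-up datasets that share exactly the same residuals but differ in how spread out the target is. This is a sketch under those assumptions; the `summarize` helper is hypothetical and not part of any library.

```python
import math


def summarize(y_true, y_pred):
    """Return MAE, RMSE, and R-squared for one set of predictions."""
    n = len(y_true)
    residuals = [y - y_hat for y, y_hat in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return {
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,
    }


# Identical residuals applied to a narrow and a wide target range
residuals = [1.0, -1.0, 1.0, -1.0]
narrow_actual = [10.0, 11.0, 12.0, 13.0]
wide_actual = [10.0, 20.0, 30.0, 40.0]

narrow_pred = [y - r for y, r in zip(narrow_actual, residuals)]
wide_pred = [y - r for y, r in zip(wide_actual, residuals)]

narrow = summarize(narrow_actual, narrow_pred)
wide = summarize(wide_actual, wide_pred)

print(narrow)
print(wide)
```

Both datasets have MAE and RMSE of 1, yet R-squared is about 0.2 for the narrow target and about 0.992 for the wide one. The predictions are equally far off in absolute terms; only the share of target variance they explain differs, which is why the two kinds of metric answer different questions.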