Goodness-of-fit measures are crucial for assessing the performance of Generalized Linear Models (GLMs) in logistic and Poisson regression. These tools help us evaluate how well our models explain the data and make predictions, guiding us in model selection and improvement.
From deviance to ROC curves, these measures offer insights into model fit, predictive accuracy, and potential issues like overdispersion. Understanding these concepts is key to building reliable GLMs and interpreting their results with confidence.
Deviance for Model Fit
Measuring Lack of Fit
- Deviance measures the lack of fit between a model and the observed data, with smaller values indicating better fit
- The deviance is defined as $-2$ times the log of the likelihood ratio of the fitted model to a saturated model, i.e., a model with one parameter per observation that reproduces the data perfectly, as formalized below
- For GLMs, the deviance is used as a generalization of the residual sum of squares from linear regression models
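To make this concrete, write $\ell_{\text{fitted}}$ and $\ell_{\text{saturated}}$ for the log-likelihoods of the fitted and saturated models. The deviance is then

$$D = -2\,\big[\ell_{\text{fitted}} - \ell_{\text{saturated}}\big] = 2\,\big[\ell_{\text{saturated}} - \ell_{\text{fitted}}\big]$$

so a model that reproduces the data exactly has $D = 0$, and poorer fits give larger values.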
Comparing Null and Residual Deviances
- The null deviance represents the lack of fit of the null model (intercept-only), while the residual deviance represents the lack of fit of the proposed model
- Comparing the null and residual deviances helps assess the improvement in fit provided by the explanatory variables in the model
- A substantial reduction in deviance from the null to the residual model indicates that the explanatory variables are contributing to the model's explanatory power (e.g., a reduction from 100 to 50)
- If the residual deviance is close to the null deviance, it suggests that the explanatory variables are not providing much additional information beyond the intercept (e.g., a reduction from 100 to 95)
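As a minimal sketch of reading both quantities off a fitted model, the following assumes Python with statsmodels and uses simulated Poisson data purely for illustration:

```python
import numpy as np
import statsmodels.api as sm

# Simulate Poisson counts that depend on one predictor
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = rng.poisson(np.exp(0.5 + 0.8 * x))

# Fit the proposed model (intercept + predictor)
X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

print(f"Null deviance:     {fit.null_deviance:.1f}")  # intercept-only lack of fit
print(f"Residual deviance: {fit.deviance:.1f}")       # proposed-model lack of fit
```

A large drop from the null to the residual deviance here reflects the predictor's genuine effect on the simulated counts.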
Likelihood Ratio Test for Models
Comparing Nested Models
- The likelihood ratio test (LRT) is used to compare the fit of two nested models, where one model is a special case of the other
- The LRT statistic is calculated as the difference in deviances between the null (reduced) and alternative (full) models, and it asymptotically follows a chi-square distribution under the null hypothesis
- The degrees of freedom for the LRT are equal to the difference in the number of parameters between the two models being compared
- A significant LRT indicates that the additional parameters in the full model meaningfully improve the fit over the reduced model, as the sketch below demonstrates
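The following is a minimal sketch of the test, assuming statsmodels and SciPy; the two simulated Poisson models are nested because the reduced design matrix is a subset of the full one:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 300))
y = rng.poisson(np.exp(0.3 + 0.6 * x1 + 0.4 * x2))

# Reduced model uses x1 only; full model adds x2
fit_reduced = sm.GLM(y, sm.add_constant(x1), family=sm.families.Poisson()).fit()
fit_full = sm.GLM(y, sm.add_constant(np.column_stack([x1, x2])),
                  family=sm.families.Poisson()).fit()

# LRT statistic = difference in deviances; df = difference in parameter counts
lrt = fit_reduced.deviance - fit_full.deviance
df = int(fit_full.df_model - fit_reduced.df_model)
print(f"LRT = {lrt:.2f}, df = {df}, p = {chi2.sf(lrt, df):.4f}")
```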
Applications of LRT
- The LRT can be used for variable selection, testing the significance of individual predictors, and assessing the overall fit of the model
- For example, when comparing a model with a single predictor to a larger model that adds further predictors, a significant LRT suggests that the additional predictors improve the model's fit
- LRT can also be used to test the significance of interaction terms by comparing models with and without the interaction
- In the context of model building, LRT can be employed in a stepwise fashion to sequentially add or remove predictors based on their contribution to the model's fit
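For instance, the interaction test in the list above might look like the sketch below, assuming the statsmodels formula API and an illustrative simulated logistic dataset:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import chi2

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=400), "x2": rng.normal(size=400)})
logit = -0.2 + 0.8 * df.x1 + 0.5 * df.x2 + 0.6 * df.x1 * df.x2
p = 1 / (1 + np.exp(-logit))
df["y"] = rng.binomial(1, p.to_numpy())

# Main-effects model vs. the same model plus the x1:x2 interaction
m0 = smf.glm("y ~ x1 + x2", data=df, family=sm.families.Binomial()).fit()
m1 = smf.glm("y ~ x1 * x2", data=df, family=sm.families.Binomial()).fit()

lrt = m0.deviance - m1.deviance  # one extra parameter, so df = 1
print(f"LRT = {lrt:.2f}, p = {chi2.sf(lrt, 1):.4f}")
```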
Predictive Accuracy of GLMs
Pseudo-R-Squared Measures
- Pseudo-R-squared measures, such as McFadden's, Cox and Snell's, and Nagelkerke's, provide an indication of the proportion of variance explained by the model
- These measures are based on the likelihood ratio between the null and fitted models, but they do not have the same interpretation as the R-squared in linear regression
- Higher values of pseudo-R-squared suggest better model fit, but they should be interpreted with caution and used in conjunction with other model diagnostics
- Example: A McFadden's pseudo-R-squared of 0.3 means the fitted model improves the log-likelihood by roughly 30% relative to the null model; this is often loosely read as explaining about 30% of the variation in the response, though it is not a variance proportion in the linear-regression sense
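A minimal sketch of the calculation, assuming statsmodels and simulated logistic data: McFadden's measure is one minus the ratio of the fitted and null log-likelihoods.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 1.2 * x))))

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()
fit_null = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Binomial()).fit()

# McFadden's pseudo-R-squared: 1 - (fitted log-likelihood / null log-likelihood)
mcfadden = 1 - fit.llf / fit_null.llf
print(f"McFadden's pseudo-R^2: {mcfadden:.3f}")
```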
ROC Curves and AUC
- Receiver Operating Characteristic (ROC) curves are used to assess the predictive accuracy of binary response GLMs, such as logistic regression
- ROC curves plot the true positive rate (sensitivity) against the false positive rate (1 − specificity) for various classification thresholds
- The area under the ROC curve (AUC) is a summary measure of the model's discriminatory power, with higher values indicating better predictive accuracy
- Example: An AUC of 0.8 suggests that the model has a good ability to discriminate between the two classes, while an AUC of 0.5 indicates that the model's predictions are no better than random guessing
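As an illustration, the sketch below fits a logistic model with scikit-learn (an assumed setup; any fitted binary GLM that outputs probabilities would do) and computes the ROC curve and AUC:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2))
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1]))))

model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X)[:, 1]  # predicted P(y = 1)

fpr, tpr, thresholds = roc_curve(y, probs)  # points tracing the ROC curve
print(f"AUC: {roc_auc_score(y, probs):.3f}")
```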
Cross-Validation Techniques
- Cross-validation techniques, such as k-fold or leave-one-out, can be used to assess the model's predictive performance on unseen data and detect overfitting
- In k-fold cross-validation, the data is divided into k subsets, and the model is trained on k-1 subsets and validated on the remaining subset, with this process repeated k times
- Leave-one-out cross-validation is a special case of k-fold cross-validation where k equals the number of observations
- Cross-validation helps to provide a more robust estimate of the model's predictive accuracy and can help identify if the model is overfitting to the training data
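A brief sketch of 5-fold cross-validation with scikit-learn (assumed here for convenience), scoring each held-out fold by AUC:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))

# Each of the 5 folds is held out once while the model trains on the rest
scores = cross_val_score(LogisticRegression(), X, y, cv=5, scoring="roc_auc")
print(f"Per-fold AUC: {np.round(scores, 3)}, mean = {scores.mean():.3f}")
```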
Overdispersion in GLMs
Understanding Overdispersion
- Overdispersion occurs when the observed variance in the response variable is greater than the variance assumed by the GLM, violating the model's assumptions
- In the context of Poisson regression, which assumes the variance of the response equals its mean, overdispersion means the observed variance exceeds the mean, and the model has no parameter to absorb the excess
- Overdispersion can lead to underestimated standard errors, inflated test statistics, and incorrect inferences about the significance of predictors
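To see the phenomenon numerically, the sketch below (NumPy only, with illustrative parameter choices) draws Poisson counts, where variance ≈ mean, alongside negative binomial counts with the same mean but inflated variance:

```python
import numpy as np

rng = np.random.default_rng(5)

poisson_counts = rng.poisson(lam=5, size=10_000)
# Negative binomial with n=2, p=2/7 has mean n(1-p)/p = 5 but variance 17.5
overdispersed = rng.negative_binomial(n=2, p=2 / 7, size=10_000)

print(f"Poisson:       mean = {poisson_counts.mean():.2f}, var = {poisson_counts.var():.2f}")
print(f"Overdispersed: mean = {overdispersed.mean():.2f}, var = {overdispersed.var():.2f}")
```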
Diagnosing Overdispersion
- Diagnostic plots, such as the residuals vs. fitted values or scale-location plots, can help identify the presence of overdispersion
- In the presence of overdispersion, the residuals may exhibit a fan-shaped pattern, with increasing spread as the fitted values increase
- The scale-location plot, which plots the square root of the absolute residuals against the fitted values, can also reveal non-constant variance
- Formal tests for overdispersion include the dispersion test, which compares the residual deviance to the degrees of freedom, and the Pearson chi-square test
- A dispersion test statistic significantly greater than 1 indicates the presence of overdispersion
- The Pearson chi-square statistic divided by the residual degrees of freedom plays the same role: a ratio well above 1 suggests overdispersion (both ratios appear in the sketch below)
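A compact sketch of both ratio checks, assuming statsmodels; the simulated counts get extra multiplicative noise so a plain Poisson fit is deliberately overdispersed:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=300)
# Extra noise in the rate makes the counts more variable than Poisson allows
y = rng.poisson(np.exp(0.5 + 0.6 * x + rng.normal(scale=0.7, size=300)))

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()

# Both ratios should be near 1 for a well-specified Poisson model
print(f"Deviance / df:     {fit.deviance / fit.df_resid:.2f}")
print(f"Pearson chi2 / df: {fit.pearson_chi2 / fit.df_resid:.2f}")
```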
Addressing Overdispersion
- To address overdispersion, alternative models can be used, such as the negative binomial regression or quasi-Poisson models, which allow for more flexible variance structures
- Negative binomial regression includes an additional dispersion parameter and models the variance as a quadratic function of the mean, $\text{Var}(Y) = \mu + \alpha\mu^2$, allowing it to absorb overdispersion
- Quasi-Poisson models introduce a dispersion parameter $\phi$ that scales the variance in proportion to the mean, $\text{Var}(Y) = \phi\mu$, inflating standard errors to reflect the extra variability
- Including additional explanatory variables or interaction terms in the model may also help capture the excess variability and reduce overdispersion
- By incorporating more relevant information into the model, the unexplained variability may be reduced, leading to a better fit and less overdispersion
- Example: In a study of the number of doctor visits, if a Poisson regression model exhibits overdispersion, using a negative binomial regression or including additional explanatory variables (e.g., age, chronic conditions) may help account for the excess variability and improve the model's fit
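The two remedies above might be fit as in this sketch, assuming statsmodels and reusing the overdispersed simulated counts; `scale="X2"` requests the Pearson-based dispersion estimate that yields quasi-Poisson standard errors:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=300)
y = rng.poisson(np.exp(0.5 + 0.6 * x + rng.normal(scale=0.7, size=300)))
X = sm.add_constant(x)

# Quasi-Poisson: same coefficients as Poisson, but standard errors scaled
# by the Pearson-based dispersion estimate
quasi = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")

# Negative binomial: the dispersion parameter alpha is estimated from the data
negbin = sm.NegativeBinomial(y, X).fit(disp=False)

print("Quasi-Poisson SEs:", quasi.bse)   # inflated relative to plain Poisson
print("NB estimates:", negbin.params)    # last entry is the estimated alpha
```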