Fiveable

🥖 Linear Modeling Theory Unit 8 Review

8.2 Information Criteria: AIC and BIC

Written by the Fiveable Content Team • Last updated September 2025

Model selection is crucial in linear modeling, and information criteria like AIC and BIC help us choose the best model. These tools balance model fit and complexity, preventing overfitting while ensuring our model captures important patterns in the data.

AIC and BIC differ in how they penalize complexity, with BIC favoring simpler models, especially with large sample sizes. Understanding these criteria helps us make informed decisions about which model to use for our specific needs.

Akaike vs Bayesian Information Criteria

Information Criteria for Model Selection

  • AIC and BIC compare and select models based on a balance between model fit and complexity
  • AIC estimates the relative amount of information lost by a given model, considering both the goodness of fit and the number of parameters in the model
    • Defined as $\mathrm{AIC} = 2k - 2\ln(L)$, where $k$ is the number of parameters and $L$ is the maximized value of the likelihood function
  • BIC introduces a stronger penalty term for the number of parameters compared to AIC
    • Defined as $\mathrm{BIC} = k\ln(n) - 2\ln(L)$, where $k$ is the number of parameters, $n$ is the sample size, and $L$ is the maximized value of the likelihood function
  • Both AIC and BIC identify the model that best explains the data with the least complexity, following the principle of parsimony (Occam's razor)
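
The two formulas above translate directly into small helper functions. This is a minimal sketch; the function names and the example numbers (3 parameters, 100 observations, log-likelihood of -250) are illustrative, not from the text:

```python
import math

def aic(k, log_l):
    """Akaike Information Criterion: 2k - 2*ln(L), given k parameters
    and the maximized log-likelihood ln(L)."""
    return 2 * k - 2 * log_l

def bic(k, n, log_l):
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L); the penalty
    k*ln(n) grows with sample size, unlike AIC's fixed 2k."""
    return k * math.log(n) - 2 * log_l

# Illustrative example: 3 parameters, 100 observations, log-likelihood -250
print(aic(3, -250.0))       # 2*3 - 2*(-250) = 506.0
print(bic(3, 100, -250.0))  # 3*ln(100) + 500 ≈ 513.82
```

Note that both criteria share the $-2\ln(L)$ fit term; they differ only in the penalty added to it.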

Assumptions and Considerations

  • Unlike likelihood-ratio tests, AIC and BIC do not require the candidate models to be nested; non-nested models can also be compared
    • Comparisons are only meaningful when all candidate models are fitted to the same response data, with likelihoods computed on the same scale
  • Information criteria provide a quantitative approach to model selection but should be used in conjunction with subject matter knowledge and practical considerations when choosing the best model for a given problem
  • Other factors to consider include interpretability, computational efficiency, and the specific goals of the modeling task (prediction, inference, etc.)

Calculating AIC and BIC

Steps to Calculate AIC

  1. Determine the number of parameters ($k$) in the model, counting every estimated quantity (including the error variance in Gaussian models)
  2. Compute the maximized likelihood ($L$) of the model given the data
  3. Substitute these values into the AIC formula: $\mathrm{AIC} = 2k - 2\ln(L)$

Steps to Calculate BIC

  1. Determine the number of parameters ($k$) in the model
  2. Determine the sample size ($n$)
  3. Compute the maximized likelihood ($L$) of the model given the data
  4. Substitute these values into the BIC formula: $\mathrm{BIC} = k\ln(n) - 2\ln(L)$
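
For an ordinary least-squares fit with Gaussian errors, the maximized log-likelihood can be recovered from the residual sum of squares, so the steps above can be carried out with NumPy alone. A sketch under that assumption; the simulated data and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = np.linspace(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

# Fit y = b0 + b1*x by least squares
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta) ** 2)

# Gaussian log-likelihood at the MLE, where sigma^2 = RSS/n
log_l = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)

# Count parameters (intercept, slope, error variance) and substitute
k = 3
aic = 2 * k - 2 * log_l
bic = k * np.log(n) - 2 * log_l
print(f"AIC = {aic:.2f}, BIC = {bic:.2f}")
```

Because $\ln(50) > 2$, BIC's penalty exceeds AIC's here, so the BIC value comes out larger for the same fit.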

Model Comparison using AIC and BIC

  • Compare the AIC or BIC values of competing models
    • The model with the lowest AIC or BIC value is considered the best model among the candidates
  • When comparing models using AIC or BIC, ensure that:
    • The models are fitted to the same dataset
    • The likelihood functions are computed using the same approach
  • Differences in AIC or BIC values between models can be used to assess the strength of evidence in favor of one model over another
    • A difference of less than 2 indicates the models are essentially indistinguishable; 2-6 suggests moderate evidence, 6-10 strong evidence, and >10 very strong evidence for the lower-scoring model
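
The comparison workflow can be sketched with NumPy: fit a straight line and a quadratic to the same dataset, compute each model's AIC with the same Gaussian likelihood, and read off the difference. The helper function and data are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = np.linspace(-3, 3, n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)  # data with a linear trend

def gaussian_aic(y, fitted, k):
    """AIC for a least-squares fit with Gaussian errors."""
    n = len(y)
    rss = np.sum((y - fitted) ** 2)
    log_l = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    return 2 * k - 2 * log_l

# Candidate models fitted to the same dataset, same likelihood form
lin = np.polyval(np.polyfit(x, y, 1), x)   # 2 coefficients + error variance
quad = np.polyval(np.polyfit(x, y, 2), x)  # 3 coefficients + error variance
aic_lin = gaussian_aic(y, lin, 3)
aic_quad = gaussian_aic(y, quad, 4)

delta = aic_quad - aic_lin  # a positive value favors the simpler linear model
print(f"delta AIC = {delta:.2f}")
```

The quadratic always fits at least as well (its RSS is no larger), but the extra parameter must buy enough likelihood to overcome the penalty before AIC prefers it.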

Model Complexity vs Goodness of Fit

Balancing Fit and Complexity

  • Information criteria balance the goodness of fit and the complexity of the model to avoid overfitting and promote model parsimony
  • As the number of parameters in a model increases, the model becomes more complex and may fit the data better
    • However, overly complex models may overfit the data, leading to poor generalization performance
  • AIC and BIC introduce penalty terms for the number of parameters in the model
    • These penalties discourage the selection of overly complex models unless the improvement in fit is substantial enough to justify the increased complexity

Underfitting and Overfitting

  • The trade-off between model complexity and goodness of fit is crucial to strike a balance between underfitting (too simple) and overfitting (too complex) models
  • Underfitting occurs when a model is too simple to capture the underlying patterns in the data
    • Underfit models have high bias and low variance
  • Overfitting occurs when a model is too complex and fits the noise in the data rather than the underlying patterns
    • Overfit models have low bias but high variance and may not generalize well to new data
  • Information criteria help identify models that strike a balance between underfitting and overfitting by penalizing complexity
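
The underfitting/overfitting trade-off can be made concrete by fitting polynomials of increasing degree to data with a known low-order trend: the residual sum of squares falls monotonically with degree, while AIC stops rewarding the extra parameters once the real structure is captured. A sketch; the simulated data and range of degrees are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
x = np.linspace(0, 1, n)
y = 1.0 + 3.0 * x + rng.normal(scale=0.5, size=n)  # true trend is degree 1

results = []
for degree in range(1, 7):
    fitted = np.polyval(np.polyfit(x, y, degree), x)
    rss = np.sum((y - fitted) ** 2)
    log_l = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    k = degree + 2  # polynomial coefficients + error variance
    results.append((degree, rss, 2 * k - 2 * log_l))

for degree, rss, aic in results:
    print(f"degree={degree}  RSS={rss:7.3f}  AIC={aic:8.2f}")
```

RSS can only decrease as the degree rises (each model nests the previous one), so raw fit alone would always pick the most complex candidate; the penalty term is what lets the criterion stop.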

Comparing AIC and BIC

Differences in Penalty Terms

  • AIC and BIC differ in the way they penalize model complexity
    • AIC has a fixed penalty term ($2k$), while BIC's penalty term ($k\ln(n)$) grows with the sample size
  • BIC tends to favor simpler models compared to AIC, especially when the sample size is large
    • This is because the penalty term in BIC grows with the sample size, making it more conservative in selecting complex models
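
The difference between the two penalties is easy to see numerically: AIC's penalty is $2k$ regardless of $n$, while BIC's is $k\ln(n)$, which overtakes $2k$ once $n > e^2 \approx 7.39$. A small sketch (the parameter count is illustrative):

```python
import math

k = 4  # illustrative parameter count
for n in (5, 10, 100, 10_000):
    aic_penalty = 2 * k
    bic_penalty = k * math.log(n)
    print(f"n={n:>6}  AIC penalty={aic_penalty}  BIC penalty={bic_penalty:.2f}")
# For every n above e^2 ~ 7.39, BIC's penalty exceeds AIC's, so BIC demands
# a larger likelihood gain before accepting an extra parameter.
```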

Asymptotic Properties

  • AIC is asymptotically efficient, meaning it selects the model that minimizes the mean squared error of prediction as the sample size approaches infinity
    • However, it can favor overly complex models when the sample size is small
  • BIC is consistent, meaning it selects the true model with probability approaching 1 as the sample size approaches infinity, assuming the true model is among the candidate models
    • BIC tends to select simpler models than AIC, especially when the sample size is large

Advantages and Limitations

  • AIC advantages:
    • Asymptotically efficient: selects the model that minimizes prediction error as the sample size grows
    • Less likely than BIC to underfit, since its penalty is milder
  • AIC limitations:
    • Can favor overly complex models, especially in small samples
    • Not consistent: it does not converge on the true model even as the sample size increases
  • BIC advantages:
    • Consistent in selecting the true model as the sample size increases (assuming the true model is among the candidates)
    • Tends to select simpler models than AIC, reducing the risk of overfitting
  • BIC limitations:
    • May select models that underfit the data, especially when the sample size is small
    • Assumes that the true model is among the candidate models, which may not always be the case