Fiveable

🥖 Linear Modeling Theory Unit 8 Review

8.2 Information Criteria: AIC and BIC

Written by the Fiveable Content Team • Last updated September 2025

Model selection is crucial in linear modeling, and information criteria like AIC and BIC help us choose the best model. These tools balance model fit and complexity, preventing overfitting while ensuring our model captures important patterns in the data.

AIC and BIC differ in how they penalize complexity, with BIC favoring simpler models, especially with large sample sizes. Understanding these criteria helps us make informed decisions about which model to use for our specific needs.

Akaike vs Bayesian Information Criteria

Information Criteria for Model Selection

  • AIC and BIC compare and select models based on a balance between model fit and complexity
  • AIC estimates the relative amount of information lost by a given model, considering both the goodness of fit and the number of parameters in the model
    • Defined as $\mathrm{AIC} = 2k - 2\ln(L)$, where $k$ is the number of parameters and $L$ is the maximized value of the likelihood function
  • BIC introduces a stronger penalty term for the number of parameters compared to AIC
    • Defined as $\mathrm{BIC} = k\ln(n) - 2\ln(L)$, where $k$ is the number of parameters, $n$ is the sample size, and $L$ is the maximized value of the likelihood function
  • Both AIC and BIC identify the model that best explains the data with the least complexity, following the principle of parsimony (Occam's razor)
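
The two formulas above translate directly into small helper functions. This is a minimal sketch; the function names and the example numbers (3 parameters, 100 observations, log-likelihood of -250) are illustrative, not from the text:

```python
import math

def aic(k, log_l):
    """Akaike Information Criterion: 2k - 2*ln(L), given k parameters
    and the maximized log-likelihood ln(L)."""
    return 2 * k - 2 * log_l

def bic(k, n, log_l):
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L); the penalty
    k*ln(n) grows with sample size, unlike AIC's fixed 2k."""
    return k * math.log(n) - 2 * log_l

# Illustrative example: 3 parameters, 100 observations, log-likelihood -250
print(aic(3, -250.0))       # 2*3 - 2*(-250) = 506.0
print(bic(3, 100, -250.0))  # 3*ln(100) + 500 ≈ 513.82
```

Note that both criteria share the $-2\ln(L)$ fit term; they differ only in the penalty added to it.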

Assumptions and Considerations

  • Unlike likelihood-ratio tests, AIC and BIC do not require the candidate models to be nested; non-nested models can also be compared
    • Comparisons are only meaningful when all candidate models are fitted to the same response data, with likelihoods computed on the same scale
  • Information criteria provide a quantitative approach to model selection but should be used in conjunction with subject matter knowledge and practical considerations when choosing the best model for a given problem
  • Other factors to consider include interpretability, computational efficiency, and the specific goals of the modeling task (prediction, inference, etc.)

Calculating AIC and BIC

Steps to Calculate AIC

  1. Determine the number of parameters ($k$) in the model, counting every estimated quantity (including the error variance in Gaussian models)
  2. Compute the maximized likelihood ($L$) of the model given the data
  3. Substitute these values into the AIC formula: $\mathrm{AIC} = 2k - 2\ln(L)$

Steps to Calculate BIC

  1. Determine the number of parameters ($k$) in the model
  2. Determine the sample size ($n$)
  3. Compute the maximized likelihood ($L$) of the model given the data
  4. Substitute these values into the BIC formula: $\mathrm{BIC} = k\ln(n) - 2\ln(L)$
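
For an ordinary least-squares fit with Gaussian errors, the maximized log-likelihood can be recovered from the residual sum of squares, so the steps above can be carried out with NumPy alone. A sketch under that assumption; the simulated data and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = np.linspace(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

# Fit y = b0 + b1*x by least squares
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta) ** 2)

# Gaussian log-likelihood at the MLE, where sigma^2 = RSS/n
log_l = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)

# Count parameters (intercept, slope, error variance) and substitute
k = 3
aic = 2 * k - 2 * log_l
bic = k * np.log(n) - 2 * log_l
print(f"AIC = {aic:.2f}, BIC = {bic:.2f}")
```

Because $\ln(50) > 2$, BIC's penalty exceeds AIC's here, so the BIC value comes out larger for the same fit.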

Model Comparison using AIC and BIC

  • Compare the AIC or BIC values of competing models
    • The model with the lowest AIC or BIC value is considered the best model among the candidates
  • When comparing models using AIC or BIC, ensure that:
    • The models are fitted to the same dataset
    • The likelihood functions are computed using the same approach
  • Differences in AIC or BIC values between models can be used to assess the strength of evidence in favor of one model over another
    • A difference of less than 2 indicates the models are essentially indistinguishable; 2-6 suggests moderate evidence, 6-10 strong evidence, and >10 very strong evidence for the lower-scoring model
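
The comparison workflow can be sketched with NumPy: fit a straight line and a quadratic to the same dataset, compute each model's AIC with the same Gaussian likelihood, and read off the difference. The helper function and data are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = np.linspace(-3, 3, n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)  # data with a linear trend

def gaussian_aic(y, fitted, k):
    """AIC for a least-squares fit with Gaussian errors."""
    n = len(y)
    rss = np.sum((y - fitted) ** 2)
    log_l = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    return 2 * k - 2 * log_l

# Candidate models fitted to the same dataset, same likelihood form
lin = np.polyval(np.polyfit(x, y, 1), x)   # 2 coefficients + error variance
quad = np.polyval(np.polyfit(x, y, 2), x)  # 3 coefficients + error variance
aic_lin = gaussian_aic(y, lin, 3)
aic_quad = gaussian_aic(y, quad, 4)

delta = aic_quad - aic_lin  # a positive value favors the simpler linear model
print(f"delta AIC = {delta:.2f}")
```

The quadratic always fits at least as well (its RSS is no larger), but the extra parameter must buy enough likelihood to overcome the penalty before AIC prefers it.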

Model Complexity vs Goodness of Fit

Balancing Fit and Complexity

  • Information criteria balance the goodness of fit and the complexity of the model to avoid overfitting and promote model parsimony
  • As the number of parameters in a model increases, the model becomes more complex and may fit the data better
    • However, overly complex models may overfit the data, leading to poor generalization performance
  • AIC and BIC introduce penalty terms for the number of parameters in the model
    • These penalties discourage the selection of overly complex models unless the improvement in fit is substantial enough to justify the increased complexity

Underfitting and Overfitting

  • The trade-off between model complexity and goodness of fit is crucial to strike a balance between underfitting (too simple) and overfitting (too complex) models
  • Underfitting occurs when a model is too simple to capture the underlying patterns in the data
    • Underfit models have high bias and low variance
  • Overfitting occurs when a model is too complex and fits the noise in the data rather than the underlying patterns
    • Overfit models have low bias but high variance and may not generalize well to new data
  • Information criteria help identify models that strike a balance between underfitting and overfitting by penalizing complexity
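
The underfitting/overfitting trade-off can be made concrete by fitting polynomials of increasing degree to data with a known low-order trend: the residual sum of squares falls monotonically with degree, while AIC stops rewarding the extra parameters once the real structure is captured. A sketch; the simulated data and range of degrees are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
x = np.linspace(0, 1, n)
y = 1.0 + 3.0 * x + rng.normal(scale=0.5, size=n)  # true trend is degree 1

results = []
for degree in range(1, 7):
    fitted = np.polyval(np.polyfit(x, y, degree), x)
    rss = np.sum((y - fitted) ** 2)
    log_l = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    k = degree + 2  # polynomial coefficients + error variance
    results.append((degree, rss, 2 * k - 2 * log_l))

for degree, rss, aic in results:
    print(f"degree={degree}  RSS={rss:7.3f}  AIC={aic:8.2f}")
```

RSS can only decrease as the degree rises (each model nests the previous one), so raw fit alone would always pick the most complex candidate; the penalty term is what lets the criterion stop.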

Comparing AIC and BIC

Differences in Penalty Terms

  • AIC and BIC differ in the way they penalize model complexity
    • AIC has a fixed penalty term ($2k$), while BIC's penalty term ($k\ln(n)$) grows with the sample size
  • BIC tends to favor simpler models compared to AIC, especially when the sample size is large
    • This is because the penalty term in BIC grows with the sample size, making it more conservative in selecting complex models
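
The difference between the two penalties is easy to see numerically: AIC's penalty is $2k$ regardless of $n$, while BIC's is $k\ln(n)$, which overtakes $2k$ once $n > e^2 \approx 7.39$. A small sketch (the parameter count is illustrative):

```python
import math

k = 4  # illustrative parameter count
for n in (5, 10, 100, 10_000):
    aic_penalty = 2 * k
    bic_penalty = k * math.log(n)
    print(f"n={n:>6}  AIC penalty={aic_penalty}  BIC penalty={bic_penalty:.2f}")
# For every n above e^2 ~ 7.39, BIC's penalty exceeds AIC's, so BIC demands
# a larger likelihood gain before accepting an extra parameter.
```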

Asymptotic Properties

  • AIC is asymptotically efficient, meaning it selects the model that minimizes the mean squared error of prediction as the sample size approaches infinity
    • However, it can favor overly complex models when the sample size is small
  • BIC is consistent, meaning it selects the true model with probability approaching 1 as the sample size approaches infinity, assuming the true model is among the candidate models
    • BIC tends to select simpler models than AIC, especially when the sample size is large

Advantages and Limitations

  • AIC advantages:
    • Asymptotically efficient: selects the model that minimizes prediction error as the sample size grows
    • Less likely than BIC to underfit, since its penalty is milder
  • AIC limitations:
    • Can favor overly complex models, especially in small samples
    • Not consistent: it does not converge on the true model even as the sample size increases
  • BIC advantages:
    • Consistent in selecting the true model as the sample size increases (assuming the true model is among the candidates)
    • Tends to select simpler models than AIC, reducing the risk of overfitting
  • BIC limitations:
    • May select models that underfit the data, especially when the sample size is small
    • Assumes that the true model is among the candidate models, which may not always be the case