Fiveable

Statistical Prediction Unit 13 Review

13.3 Bayesian Model Averaging and Ensemble Diversity

Written by the Fiveable Content Team • Last updated September 2025

Bayesian Model Averaging tackles model uncertainty by combining predictions from multiple models. It weights each model by its posterior probability, so the final prediction reflects all plausible models instead of relying on a single one.

Ensemble diversity measures how much base learners in an ensemble disagree. Higher diversity often leads to better performance. Measures like the Kappa statistic and Q-statistic quantify pairwise agreement, while correlation-based measures assess overall ensemble diversity.

Bayesian Model Averaging

Posterior Probability and Model Uncertainty

  • Bayesian Model Averaging (BMA) addresses model uncertainty by combining predictions from multiple models
    • Weights each model's predictions by its posterior probability ($p(y|x,D) = \sum_i p(y|x,M_i,D)P(M_i|D)$)
    • Posterior probability $P(M_i|D)$ is the probability that model $M_i$ is the true one among the candidates, given the observed data $D$
  • Model uncertainty refers to the uncertainty in selecting the best model from a set of candidate models
    • Arises when multiple models fit the data reasonably well but make different predictions
  • BMA accounts for model uncertainty by considering the predictions of all plausible models
    • Avoids relying on a single model that may not capture the full complexity of the data
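
The weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not a full BMA implementation: the three point predictions and their posterior model probabilities are made-up numbers, and `bma_combine` is a hypothetical helper name. It also reports the weighted spread of the models around the averaged prediction, which is one simple way to see model uncertainty.

```python
def bma_combine(predictions, posterior_probs):
    """Combine per-model point predictions using posterior model probabilities.

    Returns the BMA mean and the between-model variance. The within-model
    predictive variance is ignored here to keep the sketch short.
    """
    total = sum(posterior_probs)
    weights = [p / total for p in posterior_probs]  # renormalize defensively
    mean = sum(w * y for w, y in zip(weights, predictions))
    between_var = sum(w * (y - mean) ** 2 for w, y in zip(weights, predictions))
    return mean, between_var


# Hypothetical example: three candidate models predicting the same quantity.
model_predictions = [2.0, 3.0, 10.0]
posterior_probs = [0.6, 0.3, 0.1]  # assumed P(M_i | D), already computed

bma_mean, between_model_var = bma_combine(model_predictions, posterior_probs)
print(f"BMA prediction: {bma_mean:.2f}")
print(f"Between-model variance: {between_model_var:.2f}")
```

The third model disagrees strongly with the other two but carries little posterior weight, so it pulls the averaged prediction only slightly while still inflating the between-model variance.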

Marginal Likelihood and Occam's Razor

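  • Marginal likelihood (model evidence) $P(D|M_i)$ measures how well a model fits the data while implicitly penalizing model complexity
    • Calculated by integrating the likelihood function over the prior distribution of model parameters ($P(D|M_i) = \int P(D|\theta_i, M_i)P(\theta_i|M_i)\,d\theta_i$)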
    • Higher marginal likelihood indicates a better balance between model fit and complexity
  • Occam's razor principle favors simpler models over complex ones when both explain the data equally well
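    • BMA incorporates Occam's razor through the marginal likelihood: a flexible model spreads its prior over many possible datasets, so a simpler model that fits equally well receives a higher posterior probability
    • Posterior model probability combines the marginal likelihood with the model prior ($P(M_i|D) \propto P(D|M_i)P(M_i)$), which discourages overfitting and encourages parsimony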
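
A small worked example may make the Occam effect concrete. The sketch below compares two hypothetical models of a coin: `M0` fixes the heads probability at 0.5 (no free parameters), while `M1` puts a uniform prior on it, an assumption chosen because its marginal likelihood has a closed form (a Beta function). Equal model priors are also an assumption.

```python
import math


def log_evidence_fair(n_heads, n_tails):
    # M0: theta is fixed at 0.5, so the evidence is just the likelihood.
    return (n_heads + n_tails) * math.log(0.5)


def log_evidence_uniform(n_heads, n_tails):
    # M1: theta ~ Uniform(0, 1). Integrating theta^h * (1 - theta)^t over the
    # prior gives the Beta function B(h + 1, t + 1).
    return (math.lgamma(n_heads + 1) + math.lgamma(n_tails + 1)
            - math.lgamma(n_heads + n_tails + 2))


def posterior_model_probs(log_evidences, priors):
    # P(M_i | D) is proportional to P(D | M_i) * P(M_i); normalize in log space.
    log_post = [le + math.log(p) for le, p in zip(log_evidences, priors)]
    top = max(log_post)
    unnorm = [math.exp(v - top) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def compare(n_heads, n_tails):
    log_ev = [log_evidence_fair(n_heads, n_tails),
              log_evidence_uniform(n_heads, n_tails)]
    return posterior_model_probs(log_ev, priors=[0.5, 0.5])


balanced = compare(5, 5)  # data consistent with the simple model
skewed = compare(9, 1)    # data the simple model explains poorly

print("5 heads, 5 tails -> P(M0|D), P(M1|D):", [round(p, 3) for p in balanced])
print("9 heads, 1 tail  -> P(M0|D), P(M1|D):", [round(p, 3) for p in skewed])
```

With balanced data both models can explain the flips, and the simpler fixed-coin model ends up with the larger posterior probability. With skewed data the extra flexibility of `M1` pays for itself and the ranking flips.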

Ensemble Diversity Measures

Pairwise Diversity Measures

  • Ensemble diversity measures quantify the degree of disagreement among base learners in an ensemble
    • Higher diversity often leads to better ensemble performance
    • Diversity ensures that base learners make different errors and complement each other
  • Kappa statistic measures pairwise agreement between two classifiers while correcting for chance agreement
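    • Ranges from -1 to 1: 1 is complete agreement, 0 is agreement no better than chance, and negative values indicate agreement below chance
    • Lower kappa values indicate higher diversity ($\kappa = \frac{p_o - p_e}{1 - p_e}$, where $p_o$ is the observed agreement rate and $p_e$ is the agreement expected by chance)
  • Q-statistic measures pairwise similarity between two classifiers based on which samples each one classifies correctly
    • Ranges from -1 to 1: positive when the classifiers tend to be right and wrong on the same samples, negative when they err on different samples, and 0 when they are statistically independent
    • Lower Q-statistic values indicate higher diversity ($Q_{ij} = \frac{N^{11}N^{00} - N^{01}N^{10}}{N^{11}N^{00} + N^{01}N^{10}}$, where $N^{ab}$ counts samples on which classifier $i$ is correct ($a=1$) or wrong ($a=0$) and classifier $j$ is correct ($b=1$) or wrong ($b=0$))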
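
Both pairwise measures can be computed directly from their formulas. The following is a rough sketch using only the standard library; the labels are toy data invented for illustration, and the function names are hypothetical. Kappa is computed from the two classifiers' predicted labels, while the Q-statistic is computed from whether each classifier was correct on each sample.

```python
import math


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences."""
    n = len(labels_a)
    classes = set(labels_a) | set(labels_b)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each classifier's marginal label rates.
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in classes)
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1


def q_statistic(correct_i, correct_j):
    """Yule's Q from per-sample correctness flags of two classifiers."""
    n11 = sum(a and b for a, b in zip(correct_i, correct_j))
    n00 = sum((not a) and (not b) for a, b in zip(correct_i, correct_j))
    n10 = sum(a and (not b) for a, b in zip(correct_i, correct_j))
    n01 = sum((not a) and b for a, b in zip(correct_i, correct_j))
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        return math.nan  # Q is undefined for this contingency table
    return (n11 * n00 - n01 * n10) / denom


# Toy data (hypothetical): true labels and two classifiers' predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
pred_a = [1, 0, 1, 0, 0, 1, 1, 0]
pred_b = [1, 1, 1, 1, 0, 0, 0, 0]

correct_a = [p == t for p, t in zip(pred_a, y_true)]
correct_b = [p == t for p, t in zip(pred_b, y_true)]

kappa_ab = cohen_kappa(pred_a, pred_b)
q_ab = q_statistic(correct_a, correct_b)
print("kappa:", kappa_ab)
print("Q:", q_ab)
```

In this toy case the two classifiers agree on half of the samples, which is exactly what chance would produce, and they never misclassify the same sample, so both measures point to a highly diverse pair.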

Correlation-based Diversity and Bias-Variance-Covariance Decomposition

  • Correlation-based diversity measures the average pairwise correlation between base learners' predictions
    • Lower correlation indicates higher diversity
    • Can be calculated using Pearson's correlation coefficient or rank correlation measures (Spearman's rho, Kendall's tau)
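  • Bias-variance-covariance decomposition breaks down the squared error of an averaged ensemble of $M$ base learners into three components ($\overline{\text{bias}}^2 + \frac{1}{M}\overline{\text{var}} + \left(1 - \frac{1}{M}\right)\overline{\text{covar}}$)
    • Bias: the average difference between the base learners' expected predictions and the true value
    • Variance: the average variance of the individual base learners' predictions
    • Covariance: the average pairwise covariance between different base learners' predictions
  • Ideal ensemble has low bias, low variance, and low (or even negative) covariance
    • Averaging shrinks the variance term by a factor of $1/M$, so the covariance term dominates as the ensemble grows
    • Low covariance means base learners make different errors, which is how diversity reduces ensemble error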
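
The two ideas in this section can be checked numerically. The sketch below simulates two hypothetical ensembles whose members have the same bias and the same individual variance but share different amounts of noise, then computes the mean pairwise Pearson correlation and the terms of the decomposition for a simple averaging ensemble under squared error. The simulation setup (Gaussian noise, five learners, a constant target) is an assumption made purely for illustration.

```python
import math
import random
from itertools import combinations


def covariance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def pearson(x, y):
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


def columns(preds):
    # preds[t][m] is learner m's prediction in trial t; return one list per learner.
    return [list(col) for col in zip(*preds)]


def mean_pairwise_correlation(preds):
    cols = columns(preds)
    pairs = list(combinations(range(len(cols)), 2))
    return sum(pearson(cols[a], cols[b]) for a, b in pairs) / len(pairs)


def bias_variance_covariance(preds, target):
    cols = columns(preds)
    n_trials, m = len(preds), len(cols)
    avg_bias = sum(sum(c) / n_trials - target for c in cols) / m
    avg_var = sum(covariance(c, c) for c in cols) / m
    pairs = list(combinations(range(m), 2))
    avg_cov = sum(covariance(cols[a], cols[b]) for a, b in pairs) / len(pairs)
    ens_mse = sum((sum(row) / m - target) ** 2 for row in preds) / n_trials
    return avg_bias, avg_var, avg_cov, ens_mse


def simulate(shared_sd, n_trials=2000, n_learners=5, target=1.0, seed=0):
    # Each learner = target + fixed bias + noise shared by all + its own noise.
    # own_sd is chosen so every learner has total noise variance 1.
    own_sd = math.sqrt(1.0 - shared_sd ** 2)
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trials):
        shared = rng.gauss(0.0, shared_sd)
        preds.append([target + 0.2 + shared + rng.gauss(0.0, own_sd)
                      for _ in range(n_learners)])
    return preds


TARGET = 1.0
results = {}
for name, shared_sd in [("correlated", 0.9), ("diverse", 0.3)]:
    preds = simulate(shared_sd, target=TARGET)
    bias, var, cov, mse = bias_variance_covariance(preds, TARGET)
    m = len(preds[0])
    # Identity being checked: mse = bias^2 + var / M + (1 - 1/M) * cov
    recomposed = bias ** 2 + var / m + (1 - 1 / m) * cov
    results[name] = (mean_pairwise_correlation(preds), mse, recomposed)
    print(f"{name}: mean corr={results[name][0]:.2f}, "
          f"ensemble MSE={mse:.3f}, recomposed={recomposed:.3f}")
```

Because the individual learners are equally accurate in both runs, any gap in ensemble error comes from the covariance term, which is the part of the error that averaging cannot remove.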