Expectation and variance are fundamental concepts in Bayesian statistics, providing tools to analyze random variables and quantify uncertainty. These measures help us understand the average behavior and spread of probability distributions, forming the basis for parameter estimation and prediction in Bayesian inference.
Expectation calculates the average outcome, while variance measures the spread around that average. Together, they enable us to characterize distributions, make informed decisions, and update our beliefs as we gather new data. These concepts are essential for understanding the core principles of Bayesian analysis and their practical applications.
Definition of expectation
- Expectation quantifies the average outcome of a random variable in probability theory and statistics
- Plays a crucial role in Bayesian statistics for estimating parameters and making predictions based on prior knowledge and observed data
Probability-weighted average
- Calculates the sum of all possible values multiplied by their respective probabilities
- Expressed mathematically as $E[X] = \sum_{x} x \, P(X = x)$ for discrete random variables
- Represents the center of mass of a probability distribution
- Used to determine the long-run average outcome of repeated experiments
Discrete vs continuous cases
- Discrete case involves summing over finite or countably infinite possible values
- Continuous case requires integration over the entire range of the random variable
- Continuous expectation formula: $E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$, where $f(x)$ is the probability density function (see the numerical sketch after this list)
- Both cases yield a single value representing the average outcome
- Provides a foundation for comparing different probability distributions
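To make both formulas concrete, here is a minimal sketch; the fair die and the Exponential rate of 2 are arbitrary illustrative choices, not anything fixed by the notes above:

```python
import numpy as np
from scipy import integrate

# Discrete case: fair six-sided die, E[X] = sum over x of x * P(X = x)
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)
print(np.sum(values * probs))                # 3.5

# Continuous case: Exponential(rate = 2), E[X] = integral of x * f(x) dx
rate = 2.0
pdf = lambda x: rate * np.exp(-rate * x)
mean, _ = integrate.quad(lambda x: x * pdf(x), 0, np.inf)
print(mean)                                  # 0.5, i.e. 1 / rate
```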
Properties of expectation
- Expectation serves as a fundamental tool in probability theory and Bayesian statistics
- Enables the analysis of random variables' behavior and relationships between multiple variables
Linearity of expectation
- States that the expectation of a sum equals the sum of individual expectations
- Expressed as $E[aX + bY] = a\,E[X] + b\,E[Y]$ for constants $a$ and $b$ and random variables $X$ and $Y$ (illustrated after this list)
- Holds true even when random variables are dependent
- Simplifies calculations involving complex combinations of random variables
- Applies to both discrete and continuous cases
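A quick simulation can illustrate that linearity does not require independence; the construction of $Y$ from $X$ below, and the constants, are purely illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.normal(loc=1.0, scale=2.0, size=n)
y = x**2 + rng.normal(size=n)                # Y is built from X, so the two are strongly dependent
a, b = 3.0, -2.0

lhs = np.mean(a * x + b * y)                 # E[aX + bY] estimated directly
rhs = a * np.mean(x) + b * np.mean(y)        # a E[X] + b E[Y]
print(lhs, rhs)                              # agree up to Monte Carlo error
```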
Expectation of constants
- Expectation of a constant equals the constant itself: $E[c] = c$
- Allows for easy incorporation of fixed values in expectation calculations
- Useful when working with linear combinations of random variables and constants
- Simplifies expressions involving both random variables and deterministic values
Expectation of functions
- Calculates the average value of a function applied to a random variable
- Expressed as $E[g(X)] = \sum_{x} g(x) \, P(X = x)$ for discrete cases
- Continuous case formula: $E[g(X)] = \int_{-\infty}^{\infty} g(x) \, f(x) \, dx$ (a small discrete example follows this list)
- Enables analysis of transformed random variables
- Useful for deriving moments and other statistical properties
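For instance, applying the discrete formula with $g(x) = x^2$ to a fair die (an arbitrary choice for illustration) also shows that $E[g(X)]$ generally differs from $g(E[X])$:

```python
import numpy as np

values = np.arange(1, 7)              # fair six-sided die
probs = np.full(6, 1 / 6)

e_x = np.sum(values * probs)          # E[X] = 3.5
e_x2 = np.sum(values**2 * probs)      # E[g(X)] with g(x) = x^2, about 15.17
print(e_x2, e_x**2)                   # E[X^2] != (E[X])^2 = 12.25
```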
Definition of variance
- Variance measures the spread or dispersion of a random variable around its expected value
- Plays a crucial role in Bayesian statistics for quantifying uncertainty and assessing the reliability of estimates
Measure of spread
- Quantifies the average squared deviation from the mean
- Expressed mathematically as $\mathrm{Var}(X) = E[(X - \mu)^2]$, where $\mu$ is the expected value of $X$
- Provides insight into the variability and concentration of probability mass
- Larger variance indicates greater spread and more uncertainty
- Useful for comparing the dispersion of different probability distributions
Relationship to expectation
- Variance can be computed using expectations: $\mathrm{Var}(X) = E[X^2] - (E[X])^2$ (both forms are checked numerically after this list)
- Demonstrates the connection between second moment and first moment (mean)
- Allows for alternative calculation methods when direct computation is challenging
- Highlights the importance of both the average value and squared values in determining spread
- Provides a foundation for understanding higher-order moments and distribution shapes
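Continuing the die example from earlier (still just an illustration), both routes to the variance give the same answer:

```python
import numpy as np

values = np.arange(1, 7)
probs = np.full(6, 1 / 6)

mu = np.sum(values * probs)                        # E[X] = 3.5
var_def = np.sum((values - mu)**2 * probs)         # E[(X - mu)^2]
var_moments = np.sum(values**2 * probs) - mu**2    # E[X^2] - (E[X])^2
print(var_def, var_moments)                        # both about 2.917
```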
Properties of variance
- Variance properties enable efficient analysis and manipulation of random variables in Bayesian statistics
- Facilitate the study of uncertainty propagation and error estimation in statistical models
Non-negativity
- Variance is always non-negative: $\mathrm{Var}(X) \geq 0$
- Equals zero only for constants or degenerate random variables
- Reflects the fact that spread is measured as squared deviations
- Ensures consistency in interpreting variance across different distributions
- Provides a lower bound for uncertainty in statistical estimates
Variance of constants
- Variance of a constant is always zero: $\mathrm{Var}(c) = 0$
- Indicates that constants have no uncertainty or variability
- Useful when working with combinations of random variables and fixed values
- Simplifies variance calculations for expressions involving constants
Variance of linear transformations
- For constants $a$ and $b$, and random variable $X$: $\mathrm{Var}(aX + b) = a^2 \, \mathrm{Var}(X)$ (see the simulation after this list)
- Demonstrates how scaling affects the spread of a distribution
- Shows that adding constants does not change the variance
- Enables analysis of how linear transformations impact uncertainty
- Useful in standardizing random variables and creating z-scores
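A small simulation (the normal distribution and the constants are arbitrary choices for this sketch) confirms the scaling rule and shows the z-score construction:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=3.0, size=1_000_000)
a, b = 2.0, 10.0

print(np.var(a * x + b), a**2 * np.var(x))   # approximately equal; adding b changes nothing

z = (x - np.mean(x)) / np.std(x)             # z-score, a linear transformation of X
print(np.mean(z), np.var(z))                 # approximately 0 and 1
```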
Covariance and correlation
- Covariance and correlation measure the relationship between two random variables
- Essential concepts in Bayesian statistics for understanding dependencies and joint distributions
Definition of covariance
- Measures the joint variability of two random variables
- Expressed as $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$
- Positive covariance indicates variables tend to move together
- Negative covariance suggests inverse relationship
- Zero covariance implies no linear relationship (but does not rule out non-linear dependencies)
Correlation coefficient
- Normalized measure of linear dependence between two random variables
- Defined as $\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$, where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$
- Ranges from -1 to 1, with -1 indicating perfect negative correlation and 1 perfect positive correlation
- Value of 0 suggests no linear correlation
- Unitless measure, allowing comparison of relationships between different variable pairs
Properties of correlation
- Symmetric: $\rho(X, Y) = \rho(Y, X)$
- Magnitude unchanged by linear transformations: $|\rho(aX + b, cY + d)| = |\rho(X, Y)|$ for constants $a, b, c, d$ with $a, c \neq 0$; the sign flips when $ac < 0$ (see the sketch after this list)
- Absolute value never exceeds 1: $|\rho(X, Y)| \leq 1$
- Correlation of 1 or -1 implies perfect linear relationship
- Independent variables have zero correlation (but zero correlation does not imply independence)
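The sketch below estimates covariance and correlation from simulated data and checks the behavior under linear transformations; the construction of $Y$ and the transformation constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

x = rng.normal(size=n)
y = 0.6 * x + 0.8 * rng.normal(size=n)            # correlated with x by construction, rho = 0.6

print(np.cov(x, y)[0, 1])                         # sample covariance, about 0.6
print(np.corrcoef(x, y)[0, 1])                    # sample correlation, about 0.6

# Magnitude is preserved under linear transformations; the sign follows the sign of a*c
print(np.corrcoef(3 * x + 1, -2 * y + 5)[0, 1])   # about -0.6
```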
Expectation in Bayesian inference
- Expectation plays a central role in Bayesian inference for parameter estimation and prediction
- Allows incorporation of prior knowledge and updating beliefs based on observed data
Prior expectation
- Represents the average value of a parameter before observing data
- Calculated using the prior distribution: $E[\theta] = \int \theta \, p(\theta) \, d\theta$
- Encapsulates initial beliefs or expert knowledge about the parameter
- Serves as a starting point for Bayesian updating
- Influences posterior estimates, especially with limited data
Posterior expectation
- Average value of a parameter after incorporating observed data
- Computed using the posterior distribution: $E[\theta \mid \text{data}] = \int \theta \, p(\theta \mid \text{data}) \, d\theta$ (a conjugate example follows this list)
- Combines prior knowledge with information from the likelihood
- Often used as a point estimate for the parameter of interest
- Represents updated beliefs about the parameter given the evidence
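As a concrete conjugate sketch (the Beta(2, 2) prior and the 7-out-of-10 data are made-up numbers), a Beta prior on a success probability combined with binomial data gives a Beta posterior whose mean is available in closed form:

```python
from scipy import stats

alpha, beta = 2.0, 2.0        # prior Beta(2, 2), prior mean 0.5
k, n = 7, 10                  # observed data: 7 successes in 10 trials

post = stats.beta(alpha + k, beta + n - k)    # posterior is Beta(9, 5) by conjugacy

prior_mean = alpha / (alpha + beta)
post_mean = post.mean()                       # (alpha + k) / (alpha + beta + n) = 9/14
print(prior_mean, post_mean)                  # 0.5 -> about 0.643, pulled toward the data
```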
Predictive expectation
- Average value of future observations based on current knowledge
- Calculated using the predictive distribution: $E[\tilde{y} \mid \text{data}] = \int \tilde{y} \, p(\tilde{y} \mid \text{data}) \, d\tilde{y}$
- Accounts for both parameter uncertainty and inherent randomness
- Useful for making predictions and assessing model performance
- Provides a single summary of expected future outcomes
Variance in Bayesian inference
- Variance quantifies uncertainty in Bayesian inference for parameters and predictions
- Crucial for assessing the reliability and precision of Bayesian estimates
Prior variance
- Measures the spread of the prior distribution for a parameter
- Calculated as $\mathrm{Var}(\theta) = \int (\theta - E[\theta])^2 \, p(\theta) \, d\theta$
- Reflects the initial uncertainty about the parameter before observing data
- Larger prior variance indicates less informative prior knowledge
- Influences the weight given to prior information in posterior calculations
Posterior variance
- Quantifies the remaining uncertainty about a parameter after observing data
- Computed using the posterior distribution: $\mathrm{Var}(\theta \mid \text{data}) = \int (\theta - E[\theta \mid \text{data}])^2 \, p(\theta \mid \text{data}) \, d\theta$ (compared with the prior variance in the sketch after this list)
- Generally smaller than prior variance due to information gained from data
- Used to construct credible intervals for parameter estimates
- Provides a measure of estimation precision in Bayesian analysis
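Continuing the same made-up Beta-Binomial example, the posterior variance comes out smaller than the prior variance, and a credible interval can be read directly off the posterior:

```python
from scipy import stats

alpha, beta = 2.0, 2.0
k, n = 7, 10

prior = stats.beta(alpha, beta)
post = stats.beta(alpha + k, beta + n - k)

print(prior.var(), post.var())      # 0.05 vs about 0.015: uncertainty shrinks after seeing data
print(post.interval(0.95))          # 95% equal-tailed credible interval for the parameter
```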
Predictive variance
- Represents the uncertainty in future observations
- Calculated using the predictive distribution: $\mathrm{Var}(\tilde{y} \mid \text{data}) = \int (\tilde{y} - E[\tilde{y} \mid \text{data}])^2 \, p(\tilde{y} \mid \text{data}) \, d\tilde{y}$
- Accounts for both parameter uncertainty and inherent randomness in the data
- Useful for constructing prediction intervals
- Helps assess the reliability of model predictions
Moment-generating functions
- Moment-generating functions (MGFs) provide a powerful tool for analyzing probability distributions
- Play a significant role in Bayesian statistics for deriving distribution properties and making inferences
Definition and properties
- MGF of a random variable $X$ defined as $M_X(t) = E[e^{tX}]$
- Exists only when $E[e^{tX}]$ is finite for all $t$ in an open interval containing zero (not every distribution has an MGF)
- Uniquely determines the distribution if it exists
- MGF of sum of independent random variables is the product of their individual MGFs
- Useful for proving limit theorems and characterizing distributions
Relationship to expectation
- The $k$th moment is obtained by differentiating the MGF $k$ times and evaluating at $t = 0$: $E[X^k] = M_X^{(k)}(0)$
- Expected value: $E[X] = M_X'(0)$
- Allows for easy computation of moments without direct integration
- Facilitates the derivation of expectations for transformed random variables
- Useful in Bayesian analysis for calculating expectations of complex functions
Relationship to variance
- Variance can be computed from the first and second derivatives of the MGF: $\mathrm{Var}(X) = M_X''(0) - \left(M_X'(0)\right)^2$ (see the symbolic sketch after this list)
- Provides an alternative method for calculating variance
- Useful when direct computation of variance is challenging
- Enables analysis of how transformations affect the spread of distributions
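As a symbolic sketch, the mean and variance of an Exponential distribution with rate $\lambda$ can be recovered from its MGF, $M_X(t) = \lambda / (\lambda - t)$ for $t < \lambda$; the choice of distribution is just for illustration:

```python
import sympy as sp

t, lam = sp.symbols("t lambda", positive=True)
M = lam / (lam - t)                           # MGF of Exponential(rate = lambda)

mean = sp.diff(M, t, 1).subs(t, 0)            # E[X] = M'(0)
second_moment = sp.diff(M, t, 2).subs(t, 0)   # E[X^2] = M''(0)
variance = sp.simplify(second_moment - mean**2)

print(mean)       # 1/lambda
print(variance)   # 1/lambda**2
```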
Law of total expectation
- Fundamental theorem in probability theory with important applications in Bayesian statistics
- Allows decomposition of expectations based on conditional probabilities
Conditional expectation
- Expected value of a random variable given that another variable takes a specific value
- Denoted as $E[Y \mid X]$, a function of the conditioning variable $X$ (with $E[Y \mid X = x]$ for a specific value $x$)
- Useful for analyzing relationships between variables in Bayesian models
- Provides insight into how one variable influences the average behavior of another
- Forms the basis for many Bayesian prediction and estimation techniques
Tower property
- States that $E[Y] = E\!\left[E[Y \mid X]\right]$ (checked by simulation after this list)
- Also known as the law of iterated expectations
- Allows computation of unconditional expectations using conditional expectations
- Simplifies complex expectation calculations by breaking them into steps
- Crucial in Bayesian inference for marginalizing over nuisance parameters
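A hierarchical simulation makes the tower property tangible; the particular hierarchy (X ~ Gamma(2, 1) and Y | X ~ Poisson(X), so E[X] = 2) is an arbitrary choice for this sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

x = rng.gamma(shape=2.0, scale=1.0, size=n)   # X ~ Gamma(2, 1), so E[X] = 2
y = rng.poisson(lam=x)                        # Y | X ~ Poisson(X), so E[Y | X] = X

print(np.mean(y))    # E[Y] estimated directly, about 2
print(np.mean(x))    # E[E[Y | X]] = E[X], also about 2
```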
Law of total variance
- Extends the law of total expectation to variance calculations
- Important tool in Bayesian statistics for analyzing and decomposing uncertainty
Conditional variance
- Measures the variability of a random variable given the value of another variable
- Denoted as $\mathrm{Var}(Y \mid X)$, a function of the conditioning variable $X$
- Quantifies the remaining uncertainty in Y after knowing X
- Useful for assessing the predictive power of one variable on another
- Often used in hierarchical Bayesian models to analyze multi-level variability
Decomposition of variance
- States that $\mathrm{Var}(Y) = E[\mathrm{Var}(Y \mid X)] + \mathrm{Var}(E[Y \mid X])$ (verified numerically after this list)
- Separates total variance into expected conditional variance and variance of conditional expectation
- First term represents average unexplained variance
- Second term quantifies variability explained by the conditioning variable
- Provides insight into sources of uncertainty in Bayesian models
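Reusing the same illustrative Gamma-Poisson hierarchy as above, both conditional pieces are known in closed form ($\mathrm{Var}(Y \mid X) = X$ and $E[Y \mid X] = X$), so the decomposition can be checked by simulation:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

x = rng.gamma(shape=2.0, scale=1.0, size=n)   # X ~ Gamma(2, 1)
y = rng.poisson(lam=x)                        # Y | X ~ Poisson(X)

total_var = np.var(y)                         # Var(Y) estimated directly
within = np.mean(x)                           # E[Var(Y | X)] = E[X], the unexplained part
between = np.var(x)                           # Var(E[Y | X]) = Var(X), the explained part
print(total_var, within + between)            # both about 4 (= 2 + 2)
```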
Applications in Bayesian analysis
- Expectation and variance concepts form the foundation for various Bayesian analysis techniques
- Enable sophisticated statistical inference and decision-making under uncertainty
Parameter estimation
- Uses posterior expectation as point estimate for parameters
- Employs posterior variance to quantify estimation uncertainty
- Allows for incorporation of prior knowledge in the estimation process
- Facilitates the construction of credible intervals for parameters
- Enables comparison of different estimation methods (MAP, median, mean)
Hypothesis testing
- Utilizes Bayes factors to compare competing hypotheses
- Employs posterior probabilities to assess the plausibility of hypotheses
- Allows for continuous updating of beliefs as new evidence becomes available
- Provides a natural framework for model comparison and selection
- Enables decision-making based on expected losses or utilities
Decision theory
- Uses expected utility to guide optimal decision-making
- Incorporates both parameter uncertainty and consequences of actions
- Allows for formal treatment of risk and loss functions
- Facilitates the design of experiments to maximize information gain
- Enables adaptive strategies that update decisions as new data is observed
Computational methods
- Computational techniques play a crucial role in applying expectation and variance concepts in Bayesian statistics
- Enable analysis of complex models and high-dimensional problems
Monte Carlo estimation
- Approximates expectations using random sampling
- Estimates $E[g(X)]$ as $\frac{1}{N} \sum_{i=1}^{N} g(X_i)$, where $X_i$ are samples from the distribution of $X$ (see the sketch after this list)
- Allows for estimation of complex integrals and high-dimensional problems
- Provides unbiased estimates whose standard error shrinks as $1/\sqrt{N}$
- Forms the basis for many advanced Bayesian computation techniques (MCMC)
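Here is a minimal sketch estimating $E[g(X)]$ with $g(x) = x^2$ for a standard normal $X$ (true value 1); the target function and sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

samples = rng.normal(size=n)
g = samples**2                                # g(X) = X^2, so E[g(X)] = 1 for a standard normal

estimate = np.mean(g)                         # (1/N) * sum of g(X_i)
std_error = np.std(g, ddof=1) / np.sqrt(n)    # Monte Carlo standard error, shrinks like 1/sqrt(N)
print(estimate, "+/-", std_error)
```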
Importance sampling
- Improves Monte Carlo estimation efficiency for rare events or difficult-to-sample distributions
- Uses an alternative proposal distribution $q(x)$ to estimate $E_p[g(X)] = E_q\!\left[g(X) \, \frac{p(X)}{q(X)}\right] \approx \frac{1}{N} \sum_{i=1}^{N} g(X_i) \, \frac{p(X_i)}{q(X_i)}$ with $X_i \sim q$ (see the tail-probability sketch after this list)
- Allows sampling from a simpler distribution while still estimating properties of the target distribution
- Reduces variance of estimates compared to naive Monte Carlo in many cases
- Crucial for estimating normalizing constants and marginal likelihoods in Bayesian models
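A classic illustration is estimating the small tail probability $P(X > 4)$ for a standard normal: naive Monte Carlo almost never lands in the tail, while sampling from a proposal shifted into the tail and reweighting by $p(x)/q(x)$ gives a far more stable estimate. The shifted-normal proposal below is an assumption made for this sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 100_000
threshold = 4.0

# Naive Monte Carlo: almost no samples exceed the threshold
naive_est = np.mean(rng.normal(size=n) > threshold)

# Importance sampling: propose from N(threshold, 1) and reweight by p(x) / q(x)
x_is = rng.normal(loc=threshold, size=n)
weights = stats.norm.pdf(x_is) / stats.norm.pdf(x_is, loc=threshold)
is_est = np.mean((x_is > threshold) * weights)

print(naive_est, is_est, stats.norm.sf(threshold))   # exact value is about 3.17e-05
```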