Statistical Inference Unit 3 Review

3.3 Covariance and Correlation

Written by the Fiveable Content Team • Last updated September 2025
Covariance and correlation are key tools for understanding relationships between variables. Covariance measures joint variability, while correlation standardizes this measure for easier interpretation. Both help quantify linear associations in data.

These measures have important applications but also limitations. They only capture linear relationships, can be affected by outliers, and don't imply causation. Understanding these nuances is crucial for proper statistical analysis and interpretation.

Measures of Association

Covariance calculation and interpretation

  • Covariance quantifies the joint variability of two random variables and indicates the direction of their linear relationship
  • Calculate using the formula $Cov(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]$ or the equivalent shortcut $Cov(X,Y) = E[XY] - E[X]E[Y]$ (a sketch follows this list)
  • Positive covariance suggests variables move in same direction (stock prices and company profits)
  • Negative covariance indicates variables move oppositely (temperature and heating costs)
  • Zero covariance points to no linear relationship (shoe size and test scores)
  • Units expressed as product of the two variables' units (m·kg for height and weight)
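
A minimal NumPy sketch, using made-up height/weight numbers, showing that the definition form and the shortcut form give the same value; `np.cov` with `ddof=0` computes the same population covariance:

```python
import numpy as np

# Made-up paired data: heights (m) and weights (kg)
x = np.array([1.60, 1.68, 1.75, 1.82, 1.90])
y = np.array([55.0, 62.0, 70.0, 78.0, 88.0])

# Definition form: Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)]
cov_def = np.mean((x - x.mean()) * (y - y.mean()))

# Shortcut form: Cov(X, Y) = E[XY] - E[X]E[Y]
cov_short = np.mean(x * y) - x.mean() * y.mean()

print(cov_def, cov_short)          # identical values, in m·kg
print(np.cov(x, y, ddof=0)[0, 1])  # NumPy's population covariance agrees
```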

Properties of covariance

  • Symmetry property states $Cov(X,Y) = Cov(Y,X)$
  • Linearity allows $Cov(aX + b, Y) = aCov(X,Y)$ for constants a and b
  • Variance emerges as special case where $Cov(X,X) = Var(X)$
  • Independent variables yield zero covariance, but the reverse is not always true (uncorrelated does not imply independent)
  • Covariance matrix summarizes pairwise relationships among multiple variables (gene expression data); see the sketch after this list
  • Scale dependence limits interpretability: rescaling either variable rescales the covariance, which motivates the standardized correlation coefficient below
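
A quick numerical check of these properties on simulated Gaussian data (the variables and the helper `cov` are illustrative, not part of any library):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)  # y partly driven by x

def cov(a, b):
    """Population covariance E[(A - mu_A)(B - mu_B)]."""
    return np.mean((a - a.mean()) * (b - b.mean()))

# Symmetry: Cov(X, Y) = Cov(Y, X)
assert np.isclose(cov(x, y), cov(y, x))

# Linearity: Cov(aX + b, Y) = a * Cov(X, Y)
a, b = 3.0, 7.0
assert np.isclose(cov(a * x + b, y), a * cov(x, y))

# Special case: Cov(X, X) = Var(X)
assert np.isclose(cov(x, x), np.var(x))

# Covariance matrix summarizing several variables at once
data = np.vstack([x, y, x + y])
print(np.cov(data, ddof=0))  # symmetric 3x3 matrix; diagonal holds variances
```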

Correlation coefficient basics

  • Correlation standardizes covariance, offering a scale-independent measure of linear relationship
  • Pearson's correlation coefficient computed as $\rho_{X,Y} = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}$ (a sketch follows this list)
  • Values range from -1 to 1, indicating strength and direction of the linear relationship
  • Perfect positive correlation (1) means an exact increasing linear relationship; values near 1 show variables rising together (height and weight)
  • Perfect negative correlation (-1) means an exact decreasing linear relationship; values near -1 indicate a strong inverse association (temperature and heating bill)
  • Zero correlation suggests no linear relationship (shoe size and intelligence)
  • Sample correlation estimates population parameter from observed data
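
A sketch of the formula on simulated data (slope and noise level chosen arbitrarily); the manual computation matches NumPy's `np.corrcoef`:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(size=10_000)  # linear signal plus noise

# rho = Cov(X, Y) / (sigma_X * sigma_Y)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
r = cov_xy / (x.std() * y.std())

print(r)                        # about 2/sqrt(5) ~= 0.894 for this setup
print(np.corrcoef(x, y)[0, 1])  # library result agrees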

Limitations of correlation

  • Only captures linear relationships, missing more complex patterns (a sine wave can yield near-zero correlation despite strong dependence)
  • Outliers can significantly distort correlation value
  • Fails to detect non-monotonic relationships (U-shaped curves)
  • Correlation ≠ causation (ice cream sales and crime rates)
  • Spurious correlations arise from coincidental data patterns (stork populations and birth rates)
  • Alternative measures include Spearman's rank correlation and Kendall's tau for monotonic but non-linear relationships (see the sketch after this list)
  • Range restriction can artificially lower correlation (GPAs computed only among already-admitted students)
  • Measurement error introduces noise reducing observed correlation
  • Scatterplots are crucial for visualizing the relationship before interpreting a correlation coefficient
  • Ecological fallacy warns against applying group-level correlations to individuals (country wealth vs individual income)
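
A short sketch of two of these points on simulated data (all numbers illustrative, assuming SciPy is available): Pearson's r is blind to a U-shaped relationship, while Spearman's rank correlation fully captures a monotonic but non-linear one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=2_000)

# U-shaped (non-monotonic) dependence: strong, but invisible to Pearson
y_quad = x**2 + rng.normal(scale=0.1, size=x.size)
r_quad, _ = stats.pearsonr(x, y_quad)
print(r_quad)  # near 0 despite a near-deterministic relationship

# Monotonic but non-linear dependence: Spearman's rank correlation sees it
y_mono = np.exp(x)
r_pearson, _ = stats.pearsonr(x, y_mono)
rho_spearman, _ = stats.spearmanr(x, y_mono)
print(r_pearson)     # noticeably below 1
print(rho_spearman)  # 1.0: the ranks match perfectly
```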