Fiveable

🎲 Intro to Probability Unit 11 Review

11.3 Properties of correlation

Written by the Fiveable Content Team • Last updated September 2025

Correlation is a crucial concept in probability, measuring the strength and direction of linear relationships between variables. It's bounded between -1 and 1, with 0 indicating no linear relationship. Understanding correlation's properties helps interpret data relationships accurately.

Correlation has interesting properties like symmetry and invariance under changes of location and positive scale. However, it has limitations too. It doesn't imply causation, misses nonlinear relationships, and can be affected by outliers. Knowing these nuances is key to proper statistical analysis.

Correlation Properties

Range and Interpretation

  • Correlation coefficients always fall between -1 and 1, inclusive
    • -1 signifies a perfect negative linear relationship
    • 0 indicates no linear relationship
    • 1 represents a perfect positive linear relationship
  • Measures strength and direction of linear relationships between two variables
  • Typically denoted as ρ (rho) for population correlation or r for sample correlation
  • Square of correlation coefficient (r²) shows proportion of variance in one variable explained by linear relationship with other variable
    • Example: r² of 0.64 means 64% of variance in Y explained by X
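
The points above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name pearson_r and the study-hours data are made up for illustration and are not part of the guide.

```python
import numpy as np

def pearson_r(x, y):
    """Sample Pearson correlation computed directly from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

# Hypothetical data: hours studied and test score for six students
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 60, 68, 71]

r = pearson_r(hours, score)
r_squared = r ** 2  # share of variance in score linked linearly to hours

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

Exactly linear data such as pearson_r([1, 2, 3], [2, 4, 6]) should return 1 up to rounding, and reversing the second list should return -1.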

Symmetry and Invariance

  • Exhibits symmetry: correlation between X and Y equals correlation between Y and X
  • Remains invariant under positive linear transformations of variables
    • Rescaling by a positive factor or adding constants to either/both variables does not affect correlation; multiplying one variable by a negative factor flips the sign but keeps the magnitude
    • Example: Correlation between height in inches and weight in pounds same as correlation between height in centimeters and weight in kilograms
  • Sensitive to outliers, which can significantly influence strength and direction of relationship
    • Example: A few extreme data points in a scatterplot can dramatically alter the correlation coefficient
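
A small simulation can illustrate symmetry, invariance under unit changes, and sensitivity to outliers. This is a sketch under stated assumptions: the height and weight numbers are simulated, not real measurements, and the helper corr is a hypothetical convenience wrapper around NumPy's corrcoef.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Simulated (not real) heights in inches and weights in pounds
height_in = rng.normal(68, 3, size=200)
weight_lb = 4.5 * height_in - 150 + rng.normal(0, 15, size=200)

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

r_xy = corr(height_in, weight_lb)
r_yx = corr(weight_lb, height_in)                          # symmetry
r_metric = corr(height_in * 2.54, weight_lb * 0.45359237)  # cm and kg
r_flipped = corr(-height_in, weight_lb)                    # negative factor flips the sign

# Append a single extreme point that runs against the overall trend
height_out = np.append(height_in, 120.0)
weight_out = np.append(weight_lb, 50.0)
r_outlier = corr(height_out, weight_out)

print(f"r(X,Y) = {r_xy:.3f}, r(Y,X) = {r_yx:.3f}, metric units = {r_metric:.3f}")
print(f"sign-flipped X = {r_flipped:.3f}, with one outlier = {r_outlier:.3f}")
```

In this setup the first three values agree, the fourth has the opposite sign, and the single added point pulls the coefficient well below its original value.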

Correlation and Independence

Relationship Between Correlation and Independence

  • Zero correlation does not necessarily imply independence between random variables
  • Independence of random variables always results in zero correlation
  • Non-zero correlation always indicates dependence between random variables
  • For jointly (bivariate) normal distributions, zero correlation is equivalent to independence (special case)
  • Absence of linear correlation does not rule out other forms of dependence
    • Example: Y = X² with X distributed symmetrically around 0 has zero linear correlation but strong nonlinear relationship
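
The Y = X² example can be verified directly. This sketch assumes NumPy and uses made-up integer grids; it also shows that the zero-correlation result depends on X being symmetric around 0.

```python
import numpy as np

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# X symmetric around 0: Y is fully determined by X, yet r is zero
x_sym = np.arange(-5, 6, dtype=float)
y_sym = x_sym ** 2
r_sym = corr(x_sym, y_sym)

# X restricted to non-negative values: same function, strong positive r
x_pos = np.arange(0, 11, dtype=float)
y_pos = x_pos ** 2
r_pos = corr(x_pos, y_pos)

print(f"symmetric X: r = {r_sym:.3f}; non-negative X: r = {r_pos:.3f}")
```

In both cases Y is a deterministic function of X, so the variables are as dependent as they can be; only the linear summary differs.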

Practical Considerations

  • Correlation measures only linear relationships while independence considers all possible relationships
  • Very low correlation values (close to zero) often interpreted as practical independence
    • Requires caution in interpretation
    • Example: Correlation of 0.05 between shoe size and test scores might be treated as practically unrelated, though this alone does not establish independence
  • In real-world data analysis, weak correlations (|r| < 0.3) often treated as negligible
    • Context-dependent interpretation necessary

Correlation Limitations

Nonlinear Relationships and Causality

  • Fails to capture nonlinear patterns or complex associations between variables
    • Example: Sine wave relationship observed over many full cycles shows correlation near zero despite clear pattern
  • Zero correlation does not mean no relationship, only absence of linear relationship
  • Does not imply causation: strong correlation does not indicate one variable causes changes in other
    • Example: Ice cream sales and crime rates may correlate due to shared influence of temperature
  • Spurious correlations can occur when two variables are correlated due to influence of unmeasured third variable
    • Example: Correlation between number of pirates and global temperature (both decreasing over time)
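
The ice cream and crime example can be mimicked with a toy simulation in which temperature drives both series and neither affects the other. All numbers below are invented for illustration. Removing a fitted line in temperature from each series is one simple way to account for the shared driver, offered here as a sketch rather than a full causal analysis.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable
n = 500

# Made-up model: temperature influences both series independently
temperature = rng.normal(20, 8, size=n)
ice_cream = 10 * temperature + rng.normal(0, 30, size=n)
crime = 2 * temperature + rng.normal(0, 8, size=n)

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def residuals(y, x):
    """Part of y left over after removing a least-squares line in x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

r_raw = corr(ice_cream, crime)
r_adjusted = corr(residuals(ice_cream, temperature),
                  residuals(crime, temperature))

print(f"raw r = {r_raw:.2f}, after removing temperature r = {r_adjusted:.2f}")
```

In this toy setup the raw correlation is strong while the adjusted one is close to zero, which is what a shared third variable would produce.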

Statistical and Methodological Issues

  • Presence of outliers or influential points can distort correlation coefficient
    • Can lead to misleading conclusions about relationship between variables
  • Not invariant under nonlinear monotonic transformations of data
    • Can change strength, and in some cases even sign, of correlation
    • Example: Log transformation of positively skewed data may alter correlation with another variable
  • Only measures strength of linear relationships
    • Misses important nonlinear patterns
    • Example: U-shaped relationship between age and happiness shows near-zero correlation
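
The effect of a monotonic but nonlinear transformation can be seen with a short deterministic example. The data are made up: y grows exponentially in x, so the raw relationship is strongly skewed while log(y) is exactly linear in x.

```python
import numpy as np

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

x = np.arange(1, 11, dtype=float)
y = np.exp(x)  # positively skewed, increasing in x

r_raw = corr(x, y)          # linear measure applied to a curved relationship
r_log = corr(x, np.log(y))  # after the log transform the relationship is linear

print(f"r(x, y) = {r_raw:.3f}, r(x, log y) = {r_log:.3f}")
```

The ordering of the y values is unchanged by the log, yet the coefficient moves from a moderate value to essentially 1, which is why a transformation should be reported alongside any correlation computed after it.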

Population vs Sample Correlation

Definitions and Calculations

  • Population correlation (ρ) describes true relationship between variables in entire population
  • Sample correlation (r) is estimated from a subset of the population and is subject to sampling variability
  • Sample correlation formula involves standardizing variables and taking average product
  • Population correlation defined using expected values and standard deviations
  • Fisher z-transformation makes sampling distribution of correlation coefficients approximately normal
    • Used for constructing confidence intervals and hypothesis testing
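
For reference, the definitions described above can be written out as follows. The notation is an assumption made here, not taken from the guide: μ and σ are population means and standard deviations, x̄ and ȳ are sample means, and s_x and s_y are sample standard deviations computed with n - 1 in the denominator. The standard error shown for z is the usual approximation for roughly bivariate normal data.

```latex
% Population correlation: expected product of centered variables, scaled by the SDs
\[
\rho_{X,Y} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X \sigma_Y}
           = \frac{E\big[(X-\mu_X)(Y-\mu_Y)\big]}{\sigma_X \sigma_Y}
\]

% Sample correlation: average product of standardized values
\[
r = \frac{1}{n-1}\sum_{i=1}^{n}
      \left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)
  = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \,\sum_{i=1}^{n}(y_i-\bar{y})^2}}
\]

% Fisher z-transformation and its approximate standard error
\[
z = \frac{1}{2}\ln\frac{1+r}{1-r} = \operatorname{artanh}(r),
\qquad
\operatorname{SE}(z) \approx \frac{1}{\sqrt{n-3}}
\]
```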

Statistical Properties and Considerations

  • Sample correlation slightly biased for small sample sizes
    • Tends to underestimate absolute value of population correlation
    • Small samples also give noisier estimates
    • Example: Sample of 10 data points likely to produce less accurate estimate than sample of 100
  • Confidence intervals constructed for sample correlations estimate range of plausible population correlation values
  • As sample size increases, sample correlation converges to population correlation
    • Assumes random sampling and absence of systematic biases
  • Sample correlation used to estimate unknown population correlation
    • Example: Studying correlation between study time and test scores in a class of 30 students to infer relationship for all students
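
A confidence interval for the population correlation can be sketched with the Fisher z-transformation using only the Python standard library. The function name fisher_ci and the value r = 0.5 are hypothetical, and the approximation assumes roughly bivariate normal data from a random sample with n greater than 3.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval for the population correlation.

    Uses the Fisher z-transformation; assumes roughly bivariate normal
    data and a random sample of size n > 3.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Build the interval on the z scale, then map back to the r scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical result: r = 0.5 observed in a class of 30 students
low, high = fisher_ci(0.5, 30)
print(f"95% CI for rho: ({low:.2f}, {high:.2f})")

# The same r from different sample sizes: larger n gives a narrower interval
width_10 = fisher_ci(0.5, 10)[1] - fisher_ci(0.5, 10)[0]
width_100 = fisher_ci(0.5, 100)[1] - fisher_ci(0.5, 100)[0]
```

The interval is not symmetric around r because the back-transformation is nonlinear, and it narrows as the sample size grows, consistent with the convergence point above.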