Fiveable

Probability and Statistics Unit 10 Review

10.2 Spearman rank correlation

Written by the Fiveable Content Team • Last updated September 2025

Spearman rank correlation measures the strength and direction of association between two ranked variables. It's a non-parametric method that assesses monotonic relationships, making it useful for ordinal data or when assumptions for Pearson correlation aren't met.

The coefficient ranges from -1 to +1, with values closer to these extremes indicating stronger relationships. It's less sensitive to outliers than Pearson correlation and can detect non-linear monotonic relationships, making it versatile for various fields of study.

Definition of Spearman rank correlation

  • Spearman rank correlation is a non-parametric measure of the strength and direction of association between two ranked variables
  • Assesses the monotonic relationship between two variables, where the variables tend to change together but not necessarily at a constant rate
  • Calculates a correlation coefficient, denoted by the Greek letter ρ (rho) or r_s, which ranges from -1 to +1

Assumptions for Spearman rank correlation

  • The data must be at least ordinal, meaning that the variables can be ranked in a meaningful order
  • The relationship between the two variables should be monotonic, either increasing or decreasing consistently
  • The observations must be paired and come from the same population
  • There are no specific assumptions about the distribution of the data or the presence of outliers

Calculating Spearman rank correlation coefficient

Ranking data values

  • Assign ranks to each observation within each variable separately, starting with 1 for the smallest value
  • If there are tied values, assign the average rank to each tied observation
  • As a check, the ranks for each variable should sum to n(n + 1)/2, where n is the number of pairs, so the two rank sums are equal
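The ranking step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `average_ranks` and the sample data are assumptions made for the example.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Average of the 1-based positions start+1 .. end+1.
        shared = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


scores = [10, 20, 20, 40]       # hypothetical data with one tie
ranks = average_ranks(scores)
print(ranks)                     # [1.0, 2.5, 2.5, 4.0]
print(sum(ranks))                # 10.0, equal to n(n + 1)/2 for n = 4
```

Because tied observations share the average of the positions they occupy, the rank sum is unchanged by ties, which is why the sum check still works.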

Differences between ranks

  • Calculate the difference between the ranks (d_i) for each pair of observations
  • Square each difference to obtain d_i²
  • Sum the squared differences to obtain Σd_i²

Formula for Spearman rank correlation

  • The Spearman rank correlation coefficient is calculated using the following formula: $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$
  • Where:
    • r_s is the Spearman rank correlation coefficient
    • d_i is the difference between the ranks of the ith pair of observations
    • n is the number of pairs of observations
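As a worked illustration of the formula, the sketch below ranks two small made-up variables (hours studied and exam score, both hypothetical), sums the squared rank differences, and plugs the result into the formula. It assumes there are no tied values; the helper names are chosen for this example only.

```python
def ranks_no_ties(values):
    """1-based ranks for data without ties (smallest value gets rank 1)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_no_ties(x, y):
    """Spearman coefficient via 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    rank_x, rank_y = ranks_no_ties(x), ranks_no_ties(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


hours = [2, 4, 5, 7, 9]        # hypothetical data
score = [50, 65, 60, 80, 90]   # hypothetical data

# Ranks of score are [1, 3, 2, 4, 5], so the differences are 0, -1, 1, 0, 0
# and the sum of squared differences is 2.
print(round(spearman_no_ties(hours, score), 3))  # 0.9
```

Here n = 5 and the sum of squared differences is 2, so r_s = 1 - 12/120 = 0.9.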

Interpreting Spearman rank correlation

Strength of monotonic relationship

  • The absolute value of the Spearman rank correlation coefficient indicates the strength of the monotonic relationship between the two variables
  • A value close to 1 suggests a strong monotonic relationship, while a value close to 0 indicates a weak or no monotonic relationship

Positive vs negative correlation

  • A positive Spearman rank correlation coefficient (0 < r_s ≤ 1) indicates a monotonically increasing relationship, where both variables tend to increase together
  • A negative Spearman rank correlation coefficient (-1 ≤ r_s < 0) indicates a monotonically decreasing relationship, where one variable tends to decrease as the other increases

No correlation

  • A Spearman rank correlation coefficient close to 0 suggests no monotonic relationship between the variables
  • However, this does not necessarily imply that there is no relationship at all, as there could be a non-monotonic relationship
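A small sketch can make this concrete. Below, y is an exact function of x (y = x²), yet the coefficient is 0 because the relationship is U-shaped rather than monotonic. The data and helper names are assumptions for illustration; the coefficient is computed as the Pearson correlation of average ranks, which also handles the tied y values.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values get the average of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]   # perfectly determined by x, but U-shaped

print(average_ranks(y))  # [4.5, 2.5, 1.0, 2.5, 4.5]
print(spearman(x, y))    # 0.0
```

The decreasing half and the increasing half cancel out in the ranks, so a coefficient near 0 here says nothing about how strongly y depends on x.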

Hypothesis testing with Spearman rank correlation

Null and alternative hypotheses

  • The null hypothesis (H0) states that there is no monotonic relationship between the two variables in the population
  • The alternative hypothesis (Ha) states that there is a monotonic relationship between the two variables in the population

Test statistic and p-value

  • The test statistic for the Spearman rank correlation is the sample correlation coefficient (r_s)
  • The p-value is the probability of obtaining a sample correlation coefficient at least as extreme as the observed value, assuming the null hypothesis is true

Significance level and decision rule

  • Choose a significance level (α) for the hypothesis test (common choices are 0.01, 0.05, or 0.10)
  • If the p-value is less than the chosen significance level, reject the null hypothesis in favor of the alternative hypothesis
  • If the p-value is greater than or equal to the significance level, fail to reject the null hypothesis
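The text does not say how the p-value is obtained. One possible approach for very small samples without ties is an exact permutation test: under the null hypothesis every ordering of one variable's ranks is equally likely, so the p-value is the share of orderings whose coefficient is at least as extreme as the observed one. The sketch below assumes that approach; the function names, the ranks, and the choice α = 0.05 are illustrative.

```python
from itertools import permutations


def spearman_from_ranks(rank_x, rank_y):
    """Spearman coefficient from untied ranks via the sum of squared differences."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def exact_two_sided_p_value(rank_x, rank_y):
    """Share of all orderings of rank_y with |r_s| at least the observed |r_s|."""
    r_obs = spearman_from_ranks(rank_x, rank_y)
    tol = 1e-12  # guards against floating-point rounding in the comparison
    results = [spearman_from_ranks(rank_x, perm) for perm in permutations(rank_y)]
    extreme = sum(1 for r in results if abs(r) >= abs(r_obs) - tol)
    return r_obs, extreme / len(results)


rank_x = [1, 2, 3, 4, 5]   # hypothetical ranks
rank_y = [1, 3, 2, 4, 5]   # hypothetical ranks

r_s, p_value = exact_two_sided_p_value(rank_x, rank_y)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(r_s, 3), round(p_value, 4), decision)  # 0.9 0.0833 fail to reject H0
```

With n = 5 there are only 120 orderings, and 10 of them are at least as extreme as r_s = 0.9, so even this large coefficient is not significant at α = 0.05. Enumerating all orderings becomes impractical quickly as n grows, which is why tables or approximations are normally used for larger samples.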

Comparing Spearman and Pearson correlation

Similarities in interpretation

  • Both Spearman and Pearson correlation coefficients range from -1 to +1
  • The sign of the correlation coefficient indicates the direction of the relationship (positive or negative)
  • The absolute value of the correlation coefficient indicates the strength of the relationship

Differences in assumptions

  • Pearson correlation assumes a linear relationship between the variables and requires interval or ratio data
  • Spearman rank correlation assumes a monotonic relationship and requires at least ordinal data
  • The usual significance test for Pearson correlation is parametric and assumes approximately normally distributed data, while Spearman rank correlation is non-parametric and does not make assumptions about the distribution

Robustness to outliers

  • Spearman rank correlation is less sensitive to outliers than Pearson correlation because it is based on ranks rather than actual values
  • Outliers can have a significant impact on the Pearson correlation coefficient, potentially leading to misleading results
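To illustrate the difference, the sketch below computes both coefficients on made-up data in which the last y value is an extreme outlier that still preserves the ordering. The helper names and data are assumptions for this example; Spearman's coefficient is computed as the Pearson correlation of the ranks.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values get the average of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 4, 5, 100]   # the last value is an outlier

print(round(pearson(x, y), 2))    # roughly 0.68
print(round(spearman(x, y), 2))   # 1.0
```

The outlier is simply the largest value, so its rank is 6 no matter how far out it lies. The ranks, and therefore Spearman's coefficient, are unaffected, while Pearson's coefficient drops well below 1.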

Applications of Spearman rank correlation

Ordinal or ranked data

  • Spearman rank correlation is particularly useful when dealing with ordinal or ranked data, such as survey responses on a Likert scale (strongly disagree to strongly agree)
  • It can be applied to data that does not meet the assumptions of Pearson correlation, such as non-normally distributed data or data with outliers

Non-linear monotonic relationships

  • Spearman rank correlation can detect monotonic relationships that are not necessarily linear
  • This makes it suitable for assessing relationships between variables that have a consistent increasing or decreasing trend, even if the rate of change is not constant

Examples in various fields

  • In psychology, Spearman rank correlation can be used to study the relationship between participants' rankings of different stimuli (for example, preferences for various colors)
  • In environmental science, it can be used to assess the relationship between the ranks of different pollutants in various locations (for example, air quality rankings in different cities)
  • In finance, Spearman rank correlation can be used to analyze the relationship between the performance rankings of different stocks or investment funds

Limitations of Spearman rank correlation

Sensitivity to tied ranks

  • When there are many tied values in the data, the shortcut formula based on the sum of squared rank differences is no longer exact
  • Tied values are assigned the average rank; in that case the coefficient should be computed as the Pearson correlation of the ranks, and heavy ties reduce how much ordering information the ranks carry
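The effect of ties can be seen by comparing the shortcut formula based on squared rank differences with the Pearson correlation computed on average ranks, which is the general definition of Spearman's coefficient. The sketch below uses made-up data with heavy ties; all names are assumptions for illustration.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values get the average of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def spearman_shortcut(x, y):
    """Shortcut formula 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); exact only without ties."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def spearman_tie_aware(x, y):
    """Pearson correlation of the average ranks; valid with or without ties."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    sxx = sum((a - mean_x) ** 2 for a in rank_x)
    syy = sum((b - mean_y) ** 2 for b in rank_y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [1, 1, 1, 2, 2]   # hypothetical data with heavy ties

print(round(spearman_shortcut(x, y), 3))    # 0.875
print(round(spearman_tie_aware(x, y), 3))   # 0.866
```

The gap is small here, but it grows with the proportion of tied values, so the rank-based Pearson computation is the safer default whenever ties are present.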

Inability to detect non-monotonic relationships

  • Spearman rank correlation only assesses monotonic relationships and cannot detect non-monotonic relationships, such as U-shaped or inverted U-shaped relationships
  • If the relationship between the variables is non-monotonic, the Spearman rank correlation coefficient may not accurately represent the true nature of the relationship

Interpretation challenges with small samples

  • With small sample sizes, the Spearman rank correlation coefficient is an imprecise estimate, so even a large value can arise by chance
  • Small samples may not provide a reliable representation of the population, leading to increased uncertainty in the results
  • It is important to consider the sample size and the potential impact of sampling variability when interpreting Spearman rank correlation results