Experimental Design Unit 12 Review

12.2 Multiple comparisons and post-hoc tests

Written by the Fiveable Content Team • Last updated September 2025

When many statistical tests are run, the chance of at least one false positive grows with the number of tests. Multiple comparison corrections control this risk by adjusting the significance level used for each test (or, equivalently, the p-values). These methods keep the overall error rate within an acceptable limit, which protects the integrity of research findings.

Post-hoc tests come into play after finding significant effects in analyses like ANOVA. They allow for pairwise comparisons between group means, helping researchers pinpoint specific differences. Various post-hoc tests exist, each with unique strengths for different research scenarios.

Multiple Comparison Corrections

Controlling False Positives

  • Family-wise error rate (FWER) represents the probability of making at least one Type I error (false positive) among all hypotheses tested
  • Without correction, FWER increases with the number of hypotheses tested: if each of $m$ independent tests of true null hypotheses uses level $\alpha$, then $\text{FWER} = 1 - (1 - \alpha)^m$, which is already about 0.40 for $\alpha = 0.05$ and $m = 10$
  • Multiple comparison corrections aim to control the FWER by adjusting the significance level (α) for each individual hypothesis test
  • Controlling FWER keeps the overall Type I error rate at or below the desired level (usually 0.05) across all comparisons
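
Assume $m$ independent tests in which every null hypothesis is true. The FWER is then $1 - (1 - \alpha)^m$. The minimal Python sketch below evaluates that expression for a few values of $m$. It also shows the FWER when each test instead uses the Bonferroni level $\alpha/m$. The function name `fwer_independent` is our own, and the independence assumption is what makes the closed form exact.

```python
def fwer_independent(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent
    tests of true null hypotheses, each run at significance level alpha.
    (Assumes independence; with dependent tests this is only a reference point.)"""
    return 1 - (1 - alpha) ** m


alpha = 0.05
for m in (1, 5, 10, 20):
    uncorrected = fwer_independent(alpha, m)    # every test at alpha
    bonferroni = fwer_independent(alpha / m, m)  # every test at alpha / m
    print(f"m={m:2d}  uncorrected={uncorrected:.3f}  bonferroni={bonferroni:.3f}")
```

At $m = 10$ the uncorrected FWER is about 0.401, while the Bonferroni column stays just under 0.05 for every $m$.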

Bonferroni and Holm-Bonferroni Corrections

  • Bonferroni correction is a simple and conservative method for controlling FWER
    • Divides the desired overall significance level (α) by the number of hypotheses tested (m) to obtain the adjusted significance level for each individual test: $\alpha_{adjusted} = \frac{\alpha}{m}$
    • Ensures that the FWER is controlled at the desired level, but may be overly conservative, leading to reduced statistical power (increased Type II error rate)
  • Holm-Bonferroni method is a step-down procedure that improves upon the Bonferroni correction
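    • Orders the p-values from smallest to largest and compares each p-value to a sequentially adjusted significance level: $\alpha_{adjusted, i} = \frac{\alpha}{m - i + 1}$, where $i$ is the rank of the p-value
    • Rejects hypotheses in that order until the first p-value exceeds its threshold; that hypothesis and all remaining ones (with larger p-values) are not rejected
    • Rejects every hypothesis that Bonferroni rejects, because the smallest p-value faces the same $\frac{\alpha}{m}$ threshold and later thresholds are larger. It is therefore never less powerful than Bonferroni, and like Bonferroni it controls FWER under any dependence between the tests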
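
The two rules can be compared side by side in a minimal Python sketch. The function names `bonferroni` and `holm` are our own. Holm sorts the p-values, tests the $i$-th smallest against $\frac{\alpha}{m - i + 1}$, and stops at the first one that fails. For real analyses, statsmodels' `multipletests` implements both procedures (methods `'bonferroni'` and `'holm'`).

```python
from typing import List


def bonferroni(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i when p_i <= alpha / m (one fixed threshold for every test)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down procedure; decisions are returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])  # indices, smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha / (m - rank + 1):
            reject[idx] = True
        else:
            break  # first failure: this and all larger p-values are retained
    return reject


p = [0.01, 0.02, 0.03, 0.005]
print(bonferroni(p))  # [True, False, False, True]
print(holm(p))        # [True, True, True, True]
```

In this example, Bonferroni's single threshold of 0.0125 rejects only the first and last hypotheses. Holm tests the sorted p-values (0.005, 0.01, 0.02, 0.03) against 0.0125, 0.0167, 0.025 and 0.05, so it rejects all four.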

False Discovery Rate

  • False discovery rate (FDR) is an alternative approach to multiple comparison corrections that controls the expected proportion of false positives among all significant results
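  • FDR control is less conservative than FWER control: it tolerates some false positives among the discoveries in exchange for more power (fewer Type II errors), which is especially useful when many hypotheses are tested
  • Benjamini-Hochberg procedure is a popular method for controlling FDR; it keeps the FDR at or below α when the tests are independent (or positively dependent)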
    • Orders the p-values from smallest to largest and compares each p-value to a sequentially adjusted threshold: $\frac{i}{m} \times \alpha$, where $i$ is the rank of the p-value and $m$ is the total number of hypotheses tested
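    • Finds the largest rank $k$ whose p-value satisfies $p_{(k)} \le \frac{k}{m} \times \alpha$ and declares that hypothesis and all hypotheses with smaller p-values significant, even if some of those smaller p-values missed their own thresholds (a step-up procedure)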
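
The Benjamini-Hochberg procedure can be sketched in a few lines of Python. The function name is our own. The sketch finds the largest rank $k$ with $p_{(k)} \le \frac{k}{m}\alpha$ and then rejects the $k$ smallest p-values. statsmodels' `multipletests` provides the same procedure as method `'fdr_bh'`.

```python
from typing import List


def benjamini_hochberg(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure; decisions are in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])  # indices, smallest p first
    # Largest rank k whose p-value is at or below (k / m) * alpha (0 if none)
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values, including any that missed their own threshold
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


print(benjamini_hochberg([0.01, 0.03, 0.028, 0.5]))  # [True, True, True, False]
```

Here 0.028 misses its own threshold ($\frac{2}{4} \times 0.05 = 0.025$). It is still rejected because a larger p-value, 0.03, passes its threshold of 0.0375. This is the step-up behavior that separates Benjamini-Hochberg from Holm's step-down rule.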

Post-hoc Tests

Pairwise Comparisons

  • Post-hoc tests are typically used to make pairwise comparisons between group means after a significant overall (omnibus) effect has been found in an ANOVA
  • Pairwise comparisons involve testing the differences between all possible pairs of group means; with $k$ groups there are $\frac{k(k-1)}{2}$ pairs, so 5 groups already yield 10 comparisons
  • Multiple comparison corrections are often applied to control the FWER or FDR when conducting pairwise comparisons
  • Common post-hoc tests for pairwise comparisons include Tukey's HSD test, Scheffé's test, and Dunnett's test (which compares each treatment only with a control)
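
Because the number of pairs grows quadratically with the number of groups, the per-comparison level shrinks quickly under a simple correction. The Python sketch below lists the pairs and the Bonferroni per-comparison level. The group labels and the function name `pairwise_plan` are hypothetical, and Bonferroni is used only as the simplest baseline. The tests described next (such as Tukey's HSD) are designed specifically for this family of comparisons.

```python
from itertools import combinations


def pairwise_plan(groups, alpha=0.05):
    """List every unordered pair of groups and the Bonferroni per-comparison level."""
    pairs = list(combinations(groups, 2))  # k * (k - 1) / 2 unordered pairs
    if not pairs:
        raise ValueError("need at least two groups to compare")
    return pairs, alpha / len(pairs)


# Hypothetical four-arm study: 6 pairs, each tested at 0.05 / 6
pairs, per_test_alpha = pairwise_plan(["placebo", "low", "medium", "high"])
print(len(pairs), round(per_test_alpha, 4))  # 6 0.0083
```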

Tukey's HSD and Scheffé's Tests

  • Tukey's Honest Significant Difference (HSD) test is a widely used post-hoc test for pairwise comparisons
    • Computes a critical value based on the studentized range distribution, which depends on the number of groups and the degrees of freedom for the error term
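    • Controls the FWER for all pairwise comparisons (exactly when group sizes are equal; the Tukey-Kramer adjustment for unequal sizes is slightly conservative) and is generally more powerful than the Bonferroni correction for this task, with the advantage growing as the number of groups increases
  • Scheffé's test is another post-hoc test that can be used for pairwise comparisons and complex contrasts
    • Uses the F-distribution to compute a critical value and, for pairwise comparisons, is more conservative than Tukey's HSD test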
    • Offers simultaneous confidence intervals for all possible contrasts, making it flexible for testing any linear combination of means
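
The two critical values can be compared for a single pairwise comparison of groups $i$ and $j$. Let the groups have sample sizes $n_i$ and $n_j$, and let there be $k$ groups with error mean square $MSE$ on $df_E = N - k$ degrees of freedom, where $N$ is the total sample size. The standard textbook forms are written out below as a sketch; the Tukey line is the Tukey-Kramer form, which covers unequal sizes. Here $q_{\alpha;\,k,\,df_E}$ is the upper-$\alpha$ quantile of the studentized range distribution and $F_{\alpha;\,k-1,\,df_E}$ that of the F-distribution. A difference is declared significant when:

```latex
\begin{aligned}
\text{Tukey (HSD):}\quad & |\bar{y}_i - \bar{y}_j| > \frac{q_{\alpha;\,k,\,df_E}}{\sqrt{2}}\,\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} \\
\text{Scheff\'e:}\quad & |\bar{y}_i - \bar{y}_j| > \sqrt{(k-1)\,F_{\alpha;\,k-1,\,df_E}}\,\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}
\end{aligned}
```

With equal group sizes $n$, the Tukey bound simplifies to $q_{\alpha;\,k,\,df_E}\sqrt{MSE/n}$. For a general contrast $\psi = \sum_i c_i \mu_i$ with $\sum_i c_i = 0$, Scheffé replaces the pairwise standard error with $\sqrt{MSE \sum_i c_i^2 / n_i}$, which is why it covers every linear combination of means. For pairwise comparisons with $k \ge 3$, the Scheffé multiplier $\sqrt{(k-1)F}$ is larger than Tukey's $q/\sqrt{2}$, which is what makes Scheffé the more conservative choice there.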

Dunnett's Test

  • Dunnett's test is a specialized post-hoc test used when comparing several treatment groups to a single control group
  • Computes a critical value from Dunnett's distribution (a multivariate t distribution), which accounts for the correlation among the comparisons that arises because every comparison shares the same control group mean
  • Controls the FWER for the comparisons between each treatment group and the control group
  • Useful in experiments where the main interest lies in comparing treatments to a control (e.g., drug trials comparing different doses to a placebo)
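
The correlation point can be made concrete with a Monte Carlo sketch. The sketch simulates data in which every null hypothesis is true and computes the two-sided t statistics comparing each treatment with the shared control, using the pooled ANOVA error mean square. It then takes the upper-$\alpha$ quantile of the largest $|t|$. The function name and the defaults (3 treatments, 10 normal observations per group) are illustrative assumptions. Published tables and software compute Dunnett's critical values from the multivariate t distribution rather than by simulation; SciPy 1.11+ also offers `scipy.stats.dunnett` for running the test itself.

```python
import random
import statistics


def simulate_dunnett_critical(n_treat=3, n_per_group=10, alpha=0.05,
                              n_sims=10000, seed=1):
    """Monte Carlo estimate of the two-sided Dunnett critical value.

    Group 0 is the control; groups 1..n_treat are treatments. All data come
    from the same normal distribution, so every null hypothesis is true.
    """
    rng = random.Random(seed)
    n_groups = n_treat + 1
    df_error = n_groups * (n_per_group - 1)  # ANOVA error degrees of freedom
    max_abs_t = []
    for _ in range(n_sims):
        samples = [[rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
                   for _ in range(n_groups)]
        means = [statistics.fmean(s) for s in samples]
        # Pooled within-group variance: the ANOVA error mean square (MSE)
        sse = sum(sum((x - mu) ** 2 for x in s) for s, mu in zip(samples, means))
        mse = sse / df_error
        se_diff = (mse * 2 / n_per_group) ** 0.5  # SE of (treatment mean - control mean)
        # Every statistic reuses means[0], which is what makes them correlated
        t_stats = [abs(means[i] - means[0]) / se_diff for i in range(1, n_groups)]
        max_abs_t.append(max(t_stats))
    max_abs_t.sort()
    # Empirical (1 - alpha) quantile of the largest |t| across the comparisons
    return max_abs_t[min(int((1 - alpha) * n_sims), n_sims - 1)]


crit = simulate_dunnett_critical()
print(round(crit, 2))
```

With these defaults the estimate should land near the tabulated two-sided value of roughly 2.45 for 3 treatments and 36 error degrees of freedom, shifting slightly with the seed and the number of simulations. A treatment whose $|t|$ exceeds this value is declared different from the control while the FWER across all treatment-versus-control comparisons stays at $\alpha$.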