📊Experimental Design Unit 12 Review

12.2 Multiple comparisons and post-hoc tests

Written by the Fiveable Content Team • Last updated September 2025

Running many statistical tests on the same data increases the chance that at least one produces a false positive. Multiple comparison corrections control this risk by adjusting the significance level (or, equivalently, the p-values) of the individual tests. These methods keep the overall error rate within acceptable limits, which protects the integrity of research findings.

Post-hoc tests come into play after finding significant effects in analyses like ANOVA. They allow for pairwise comparisons between group means, helping researchers pinpoint specific differences. Various post-hoc tests exist, each with unique strengths for different research scenarios.

Multiple Comparison Corrections

Controlling False Positives

Family-wise error rate (FWER) represents the probability of making at least one Type I error (false positive) among all hypotheses tested
FWER increases as the number of hypotheses tested increases, leading to a higher chance of obtaining false positives
Multiple comparison corrections aim to control the FWER by adjusting the significance level (α) for each individual hypothesis test
Controlling FWER ensures that the overall Type I error rate is maintained at the desired level (usually 0.05) across all comparisons
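
To see how quickly the FWER grows, consider the special case in which all $m$ tests are independent and every null hypothesis is true, so that $FWER = 1 - (1 - \alpha)^m$. The sketch below simply evaluates that formula; the independence assumption and the function name `familywise_error_rate` are illustrative choices, not part of the guide.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests,
    each run at level alpha, assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


# Uncorrected testing at alpha = 0.05 for a growing number of tests
for m in (1, 5, 10, 20):
    print(f"m = {m:2d}  FWER = {familywise_error_rate(0.05, m):.3f}")
```

With correlated tests the exact value differs, but the qualitative pattern (more tests, more risk) is the same.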

Bonferroni and Holm-Bonferroni Corrections

Bonferroni correction is a simple and conservative method for controlling FWER
- Divides the desired overall significance level (α) by the number of hypotheses tested (m) to obtain the adjusted significance level for each individual test: $\alpha_{adjusted} = \frac{\alpha}{m}$
- Ensures that the FWER is controlled at the desired level, but may be overly conservative, leading to reduced statistical power (increased Type II error rate)
Holm-Bonferroni method is a step-down procedure that improves upon the Bonferroni correction
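- Orders the p-values from smallest to largest and compares the p-value of rank $i$ to a sequentially adjusted significance level: $\alpha_{adjusted, i} = \frac{\alpha}{m - i + 1}$
- Rejects hypotheses in that order and stops at the first p-value that exceeds its threshold; that hypothesis and all remaining ones are not rejected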
- Offers more power than the Bonferroni correction while still controlling FWER
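
The two procedures can be compared side by side in a few lines of plain Python. This is a minimal sketch, assuming a list of raw p-values as input; the function names `bonferroni` and `holm` and the example p-values are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Step-down Holm-Bonferroni: test p-values from smallest to largest
    against alpha / (m - i + 1) and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])  # indices by ascending p
    reject = [False] * m
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= alpha / (m - rank + 1):
            reject[j] = True
        else:
            break  # this and all larger p-values are not rejected
    return reject


p = [0.010, 0.040, 0.020, 0.005]  # hypothetical p-values
print(bonferroni(p))  # [True, False, False, True]
print(holm(p))        # [True, True, True, True]
```

In this example Bonferroni holds every test to 0.0125, while Holm relaxes the threshold after each rejection (0.0125, 0.0167, 0.025, 0.05), which is where its extra power comes from.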

False Discovery Rate

False discovery rate (FDR) is an alternative approach to multiple comparison corrections that controls the expected proportion of false positives among all significant results
FDR control is less conservative than FWER control: it tolerates a small proportion of false positives among the significant results in exchange for greater statistical power (fewer Type II errors)
Benjamini-Hochberg procedure is a popular method for controlling FDR
- Orders the p-values from smallest to largest and compares each p-value to a sequentially adjusted threshold: $\frac{i}{m} \times \alpha$, where $i$ is the rank of the p-value and $m$ is the total number of hypotheses tested
- Identifies the largest p-value that satisfies the condition and declares that hypothesis and all hypotheses with smaller p-values significant, even if some of those smaller p-values exceed their own thresholds
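
The step-up logic is easy to get wrong by hand, so here is a minimal sketch of the procedure in plain Python. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up Benjamini-Hochberg: find the largest rank i with
    p_(i) <= (i / m) * alpha and reject every hypothesis up to that rank."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])  # indices by ascending p
    cutoff_rank = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= rank / m * alpha:
            cutoff_rank = rank  # keep the largest rank that passes
    reject = [False] * m
    for j in order[:cutoff_rank]:
        reject[j] = True
    return reject


p = [0.035, 0.001, 0.20, 0.028, 0.025]  # hypothetical p-values
print(benjamini_hochberg(p))  # [True, True, False, True, True]
```

Here the second-smallest p-value (0.025) exceeds its own threshold of 0.02, yet it is still declared significant because a larger p-value (0.035, rank 4, threshold 0.04) passes.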

Post-hoc Tests

Pairwise Comparisons

Post-hoc tests are used to make pairwise comparisons between group means after a significant overall effect has been found in an ANOVA
Pairwise comparisons involve testing the differences between all possible pairs of group means
Multiple comparison corrections are often applied to control the FWER or FDR when conducting pairwise comparisons
Common post-hoc tests for pairwise comparisons include Tukey's HSD test, Scheffe's test, and Dunnett's test
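
The number of pairwise comparisons grows quickly with the number of groups: $k$ groups give $\frac{k(k-1)}{2}$ pairs. The short sketch below enumerates the pairs for four hypothetical group labels and shows the per-test level a Bonferroni correction would impose.

```python
from itertools import combinations

groups = ["A", "B", "C", "D"]          # hypothetical group labels
pairs = list(combinations(groups, 2))  # every unordered pair of groups
m = len(pairs)                         # equals k * (k - 1) / 2
per_test_alpha = 0.05 / m              # Bonferroni-adjusted level per pair

print(pairs)
print(f"{m} comparisons, per-test alpha = {per_test_alpha:.4f}")
```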

Tukey's HSD and Scheffe's Tests

Tukey's Honest Significant Difference (HSD) test is a widely used post-hoc test for pairwise comparisons
- Computes a critical value based on the studentized range distribution, which depends on the number of groups and the degrees of freedom for the error term
- Controls the FWER across all pairwise comparisons and is generally more powerful than the Bonferroni correction for that purpose, with the advantage growing as the number of groups (and therefore pairs) increases
Scheffe's test is another post-hoc test that can be used for pairwise comparisons and complex contrasts
- Uses the F-distribution to compute a critical value and is more conservative than Tukey's HSD test when only pairwise comparisons are of interest
- Offers simultaneous confidence intervals for all possible contrasts, making it flexible for testing any linear combination of means
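
For reference, the two decision rules can be written out as follows. This is a sketch in standard textbook notation, none of which appears in the guide itself: $k$ is the number of groups, $N$ the total sample size, $n$ the common group size (Tukey's formula as written assumes equal group sizes), $MS_{error}$ the error mean square from the ANOVA, $q$ the critical value of the studentized range distribution, and $c_i$ the contrast coefficients with $\sum_i c_i = 0$.

Tukey's HSD declares groups $i$ and $j$ different when

$$|\bar{y}_i - \bar{y}_j| > q_{\alpha,\,k,\,N-k} \sqrt{\frac{MS_{error}}{n}}$$

Scheffe's test declares a contrast $\hat{\psi} = \sum_i c_i \bar{y}_i$ significant when

$$\frac{\hat{\psi}^2}{MS_{error} \sum_i \frac{c_i^2}{n_i}} > (k - 1)\, F_{\alpha,\,k-1,\,N-k}$$

The factor $(k - 1)$ on the right-hand side is what protects every possible contrast at once, and it is also why the test is conservative when only pairwise differences are examined.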

Dunnett's Test

Dunnett's test is a specialized post-hoc test used when comparing several treatment groups to a single control group
- Computes a critical value from Dunnett's distribution, which accounts for the correlation between the comparisons (they all share the same control group)
- Controls the FWER for the comparisons between each treatment group and the control group
- Useful in experiments where the main interest lies in comparing treatments to a control (e.g., drug trials comparing different doses to a placebo)
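
Restricting attention to treatment-versus-control comparisons shrinks the family of tests. The sketch below uses hypothetical arm names from a drug trial like the one mentioned above and counts the comparisons Dunnett's test covers against the number of all possible pairs.

```python
control = "placebo"
treatments = ["low dose", "medium dose", "high dose"]  # hypothetical arms

# Dunnett's test covers only treatment-vs-control comparisons
comparisons = [(t, control) for t in treatments]

k = len(treatments) + 1       # total number of groups, control included
all_pairs = k * (k - 1) // 2  # comparisons an all-pairs test would cover

print(f"{len(comparisons)} comparisons to control vs {all_pairs} possible pairs")
```

A smaller family of comparisons means a less severe adjustment for each one.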
