Fisher's exact test and McNemar's test are key tools for analyzing categorical data in biology. These tests help researchers assess associations between variables and changes in proportions, especially when dealing with small sample sizes or paired data.
Understanding these tests is crucial for interpreting results in biological studies. Fisher's exact test suits small samples of independent observations, while McNemar's test is designed for paired data such as before-after measurements on the same subjects.
Fisher's exact test for small samples
When to use Fisher's exact test
- Analyzes contingency tables, typically 2x2 tables, when the sample size is small or the expected frequencies in any cell are less than 5
- Based on the hypergeometric distribution and calculates the exact probability of observing the given data or more extreme data, assuming the null hypothesis of no association between the variables is true
- Particularly useful when dealing with rare events (genetic disorders) or when the assumptions of the chi-square test, such as the minimum expected cell frequency, are not met due to small sample sizes
- Developed by Ronald Fisher in the 1930s as an alternative to the chi-square test for small samples or sparse data (ecological studies with limited observations)
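The expected-count rule of thumb in the first bullet can be checked directly. Below is a minimal Python sketch. It assumes the table is given as cell counts a, b (first row) and c, d (second row). The function names `expected_counts` and `prefer_fisher` are hypothetical, and the threshold of 5 is the conventional cutoff rather than a hard rule.

```python
def expected_counts(a: int, b: int, c: int, d: int) -> list[list[float]]:
    """Expected cell counts under independence: row total * column total / n."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return [[r * k / n for k in col_totals] for r in row_totals]


def prefer_fisher(a: int, b: int, c: int, d: int, threshold: float = 5.0) -> bool:
    """Rule of thumb: prefer Fisher's exact test if any expected count is below threshold."""
    return any(e < threshold for row in expected_counts(a, b, c, d) for e in row)


# Genetic-variant example table used later in this section: 8, 2 / 12, 28
print(expected_counts(8, 2, 12, 28))  # [[4.0, 6.0], [16.0, 24.0]]
print(prefer_fisher(8, 2, 12, 28))    # True: one expected count (4.0) is below 5
```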
Hypergeometric distribution and exact probability
- Fisher's exact test uses the hypergeometric distribution to calculate the exact probability of observing the given data or more extreme data
- Assumes the row and column totals are fixed and the null hypothesis of no association between the variables is true
- The probability of a specific table configuration is calculated using the formula $p = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$, where a, b, c, and d are the cell frequencies and n = a + b + c + d is the total sample size
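- Summing the probabilities of the observed table and all tables with the same row and column totals that deviate further from independence in the same direction yields the one-tailed p-value. The standard two-tailed p-value sums the probabilities of all tables with those totals whose probability is no greater than that of the observed table; doubling the one-tailed value (capped at 1) is an alternative convention and can give a different answer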
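As a check on the formula, the sketch below computes a table's probability with the factorial expression and enumerates every table that shares the same margins. Under the null hypothesis these probabilities form a hypergeometric distribution and sum to 1. The function names are illustrative, not taken from any library.

```python
from math import factorial


def table_probability(a: int, b: int, c: int, d: int) -> float:
    """Hypergeometric probability of the table [[a, b], [c, d]] given its margins."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(a) * factorial(b) * factorial(c)
                   * factorial(d) * factorial(n))
    return numerator / denominator  # exact integer arithmetic, one rounding at the end


def tables_with_margins(r1: int, r2: int, c1: int):
    """Yield every (a, b, c, d) with row totals r1, r2 and first-column total c1."""
    # a is restricted so that all four cells stay non-negative
    for a in range(max(0, c1 - r2), min(r1, c1) + 1):
        yield a, r1 - a, c1 - a, r2 - c1 + a


# Margins of the table 8, 2 / 12, 28: rows 10 and 40, first column 20
dist = {t[0]: table_probability(*t) for t in tables_with_margins(10, 40, 20)}
print(sum(dist.values()))  # approximately 1.0
print(dist[8])             # probability of the observed table, about 0.00533
```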
Fisher's exact test for 2x2 tables
Arranging data and calculating probabilities
- Arrange the data in a 2x2 contingency table with two categorical variables, each having two levels (treatment vs. control and success vs. failure)
- Calculate the probability of the observed table and of each more extreme table using the hypergeometric formula $p = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$
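- Sum the probabilities of the observed table and all more extreme tables in the same direction to obtain the one-tailed p-value; for a two-tailed test, sum the probabilities of all tables with the same margins that are no more probable than the observed table (doubling the one-tailed value is a less common alternative)
- Compare the p-value to the chosen significance level (commonly 0.05) to determine whether the association between the variables is statistically significant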
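The steps above can be sketched in Python with only the standard library. This is a minimal illustration, not a replacement for a statistics package. It assumes the counts are laid out as a, b in the first row and c, d in the second, treats larger values of a as the "greater" direction, and uses the two-tailed rule of summing all tables no more probable than the observed one (the convention documented for R's `fisher.test` and SciPy's `scipy.stats.fisher_exact`). The function name `fisher_exact_2x2` is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int, alpha: float = 0.05) -> dict:
    """One- and two-tailed Fisher's exact p-values for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    # Hypergeometric probability of each feasible value of the top-left cell
    lo, hi = max(0, c1 - r2), min(r1, c1)
    probs = {x: comb(r1, x) * comb(r2, c1 - x) / comb(n, c1) for x in range(lo, hi + 1)}
    p_obs = probs[a]

    greater = sum(p for x, p in probs.items() if x >= a)  # upper one-tailed
    less = sum(p for x, p in probs.items() if x <= a)     # lower one-tailed
    # Two-tailed: every table no more probable than the observed one; the small
    # relative tolerance keeps tables with tied probabilities despite rounding.
    two_sided = min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))
    doubled = min(1.0, 2 * min(greater, less))            # alternative convention

    return {"greater": greater, "less": less, "two_sided": two_sided,
            "doubled": doubled, "significant": two_sided < alpha}


result = fisher_exact_2x2(8, 2, 12, 28)
print(result["greater"], result["two_sided"], result["doubled"])  # about 0.0058, 0.0088, 0.0117
print(result["significant"])  # True
```

The two-tailed rule and the doubling rule agree for symmetric distributions but can differ noticeably when the margins are unbalanced, which is why the function returns both.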
Example of Fisher's exact test
- Suppose a study investigates the association between a rare genetic variant and a disease in a small sample of 50 individuals
- The 2x2 contingency table shows 8 individuals with the variant and disease, 2 with the variant but no disease, 12 without the variant but with the disease, and 28 without the variant and disease
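- Summing the hypergeometric probabilities of the observed table and the two more extreme tables in the same direction (9 and 10 variant carriers with the disease) yields a one-tailed p-value of about 0.0058
- The two-tailed p-value, which also adds the only table in the opposite direction that is no more probable than the observed one (no carriers with the disease), is about 0.0088 (doubling the one-tailed value would give about 0.0117); either way p < 0.05, indicating a significant association between the genetic variant and the disease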
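For reference, the hypergeometric arithmetic for this table is worked out below. The row totals are 10 carriers and 40 non-carriers, the column totals are 20 with the disease and 30 without, and k denotes the number of carriers with the disease.

```latex
\begin{aligned}
P(k) &= \frac{\binom{20}{k}\binom{30}{10-k}}{\binom{50}{10}}, \qquad \binom{50}{10} = 10\,272\,278\,170 \\
P(8) &= \frac{125\,970 \times 435}{10\,272\,278\,170} = \frac{54\,796\,950}{10\,272\,278\,170} \approx 0.00533 \\
P(9) &= \frac{167\,960 \times 30}{10\,272\,278\,170} \approx 0.00049, \qquad
P(10) = \frac{184\,756}{10\,272\,278\,170} \approx 0.000018 \\
P(0) &= \frac{30\,045\,015}{10\,272\,278\,170} \approx 0.00292 \quad (\text{the only table below } k = 8 \text{ with } P(k) \le P(8)) \\
p_{\text{one-tailed}} &= P(8) + P(9) + P(10) = \frac{60\,020\,506}{10\,272\,278\,170} \approx 0.0058 \\
p_{\text{two-tailed}} &= P(0) + P(8) + P(9) + P(10) = \frac{90\,065\,521}{10\,272\,278\,170} \approx 0.0088
\end{aligned}
```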
McNemar's test for paired data
Applying McNemar's test
- McNemar's test is a non-parametric statistical test used to compare paired proportions or analyze changes in proportions for dichotomous variables in a before-after design or matched-pair samples
- Introduced by Quinn McNemar in 1947 as a modification of the chi-square test for paired data
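- Arrange the data in a 2x2 contingency table whose rows give each subject's category at the first time point (or condition) and whose columns give the category at the second, so that each cell counts pairs rather than individual measurements. The concordant cells (same category both times) carry no information about change; the discordant cells, conventionally labelled b and c, count subjects who changed in each direction
- The null hypothesis is that changes in either direction are equally likely (the two discordant cell probabilities are equal), which is equivalent to the marginal proportions being the same at both time points or under both conditions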
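Building the paired table means counting pairs, not individuals at each time point. The sketch below assumes each subject's data arrives as a hypothetical (before, after) pair of booleans, with True meaning the positive category. As a labelling convention assumed here, b counts positive-then-negative pairs and c counts negative-then-positive pairs; the function name `mcnemar_table` and the data are hypothetical.

```python
from collections import Counter


def mcnemar_table(pairs):
    """Cross-tabulate paired binary outcomes into (a, b, c, d).

    Each item in `pairs` is (before, after); True means the positive category.
    a: positive -> positive    b: positive -> negative
    c: negative -> positive    d: negative -> negative
    """
    counts = Counter((bool(before), bool(after)) for before, after in pairs)
    # Counter returns 0 for combinations that never occur
    return (counts[(True, True)], counts[(True, False)],
            counts[(False, True)], counts[(False, False)])


# Hypothetical data: 30 subjects measured before and after treatment
pairs = ([(True, True)] * 5 + [(True, False)] * 3
         + [(False, True)] * 9 + [(False, False)] * 13)
print(mcnemar_table(pairs))  # (5, 3, 9, 13)
```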
Calculating the test statistic and interpreting results
- Calculate the test statistic using the formula $\chi^2 = \frac{(b - c)^2}{b + c}$, where b and c are the frequencies of subjects who changed categories in each direction
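- Under the null hypothesis, the test statistic approximately follows a chi-square distribution with 1 degree of freedom; the approximation is poor when b + c is small (a common rule of thumb is b + c < 25), in which case an exact binomial version of the test is used instead
- Compare the calculated test statistic to the critical value of the chi-square distribution with 1 degree of freedom at the chosen significance level (3.841 for α = 0.05) to determine whether the change in proportions is statistically significant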
- A significant result (test statistic > critical value) indicates that the proportions of subjects falling into each category differ between the two time points or conditions
- The direction of the change can be determined by comparing the frequencies of subjects who changed categories in each direction (b and c in the test statistic formula)
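The calculation can be sketched as follows. The function names are hypothetical. The chi-square p-value uses the identity that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so no statistics library is needed. The sketch also shows two common variants: Edwards' continuity correction, which uses (|b - c| - 1)² in the numerator, and the exact binomial test, usually preferred when b + c is small.

```python
from math import comb, erfc, sqrt

CRITICAL_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05 (rounded)


def mcnemar_chi2(b: int, c: int, correction: bool = False) -> tuple[float, float]:
    """McNemar statistic and p-value from the discordant counts b and c."""
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    diff = abs(b - c)
    if correction:
        diff = max(diff - 1, 0)  # Edwards' continuity correction, clamped at 0
    stat = diff ** 2 / (b + c)
    # P(chi-square_1 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))
    return stat, erfc(sqrt(stat / 2))


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided p-value: under H0, b follows Binomial(b + c, 0.5)."""
    n, k = b + c, min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Discordant counts from a hypothetical study: b = 3, c = 9
stat, p = mcnemar_chi2(3, 9)
print(stat, stat > CRITICAL_05)       # 3.0 False
print(round(p, 4))                    # 0.0833
print(round(mcnemar_exact(3, 9), 4))  # 0.146
```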
Interpreting Fisher's and McNemar's tests
Interpreting Fisher's exact test results
- A significant result (p-value < significance level) indicates evidence of an association between the two categorical variables in the 2x2 contingency table
- The direction of the association can be determined by comparing the observed frequencies to the expected frequencies under the null hypothesis of no association, or equivalently by checking whether the sample odds ratio ad/bc is greater or less than 1
- A non-significant result suggests insufficient evidence to conclude that an association exists between the variables, but it does not necessarily prove that no association exists
- In biological research, Fisher's exact test can be used to assess the effectiveness of treatments (comparing treatment response rates) or compare the prevalence of conditions between groups (genetic variants and disease risk)
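Comparing observed with expected counts reduces to a single sign check: for the top-left cell, observed minus expected equals (ad - bc)/n, which has the same sign as log(ad/bc). The sketch below uses a hypothetical function name and exact arithmetic via `fractions.Fraction`, and returns both quantities for the genetic-variant example.

```python
from fractions import Fraction


def association_direction(a: int, b: int, c: int, d: int):
    """Return (observed - expected for cell a, sample odds ratio ad/bc)."""
    n = a + b + c + d
    expected_a = Fraction((a + b) * (a + c), n)
    excess = a - expected_a  # algebraically equal to (a*d - b*c) / n
    # The sample odds ratio is infinite or undefined when b*c == 0
    odds_ratio = Fraction(a * d, b * c) if b * c else None
    return excess, odds_ratio


excess, odds_ratio = association_direction(8, 2, 12, 28)
print(excess, odds_ratio, float(odds_ratio))  # 4 28/3 9.333333333333334
```

A positive excess (equivalently, an odds ratio above 1) means the disease is more common among carriers than independence would predict.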
Interpreting McNemar's test results
- A non-significant result suggests insufficient evidence to conclude that the proportions have changed significantly over time or between conditions
- In biological research, McNemar's test can be used to analyze changes in biological characteristics over time (disease status before and after treatment) or compare paired samples (tumor vs. adjacent normal tissue)
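The direction and size of a paired change can be read from the marginal proportions. The sketch assumes a labelling convention in which a and d are the concordant cells, b counts positive-then-negative pairs and c counts negative-then-positive pairs; the function name and counts are hypothetical.

```python
def marginal_change(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Proportion positive before, proportion positive after, and their difference.

    Assumed cell layout: a = positive both times, b = positive -> negative,
    c = negative -> positive, d = negative both times.
    """
    n = a + b + c + d
    before = (a + b) / n  # positive at the first time point
    after = (a + c) / n   # positive at the second time point
    return before, after, after - before  # the difference equals (c - b) / n


# Hypothetical paired counts: a = 5, b = 3, c = 9, d = 13
before, after, change = marginal_change(5, 3, 9, 13)
print(round(before, 3), round(after, 3), round(change, 3))  # 0.267 0.467 0.2
```

Because c > b here, the proportion positive rose; whether that rise is statistically significant is what the test statistic decides.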
Limitations and considerations
- Fisher's exact test assumes independence between observations, while McNemar's test requires paired data
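- Researchers should consider these limitations when interpreting the results: applying Fisher's exact test to paired data, or McNemar's test to independent samples, violates these assumptions and can give misleading p-values. Fisher's exact test also tends to be conservative (its actual type I error rate is at or below the nominal level) because its test statistic is discrete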
- Both tests are useful for analyzing categorical data in biological research, but the choice between them depends on the study design and the nature of the data (independent vs. paired observations)
- It is essential to report an effect size alongside the p-value to give a fuller picture of the results: the odds ratio for Fisher's exact test, and for McNemar's test the matched-pairs odds ratio b/c or the difference between the paired proportions, (b - c)/n, ideally each with a confidence interval
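Effect sizes and confidence intervals can be computed alongside the p-values. The sketch below uses log-scale Wald intervals (Woolf's method for the unpaired odds ratio), a common textbook choice but not the only one. The function names are hypothetical, every cell involved must be non-zero, and the matched-pairs ratio b/c assumes b and c are the two discordant counts (use c/b for the opposite orientation). R's `fisher.test` reports a conditional maximum-likelihood odds ratio instead, which will differ somewhat from the sample odds ratio computed here.

```python
from math import exp, log, sqrt

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wald_log_ci(estimate: float, se: float, z: float = Z_95) -> tuple[float, float]:
    """Interval exp(log(estimate) +/- z * se) on the ratio scale."""
    return exp(log(estimate) - z * se), exp(log(estimate) + z * se)


def odds_ratio_ci(a: int, b: int, c: int, d: int):
    """Sample odds ratio ad/bc with Woolf's interval (all four cells > 0)."""
    estimate = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return estimate, wald_log_ci(estimate, se)


def matched_odds_ratio_ci(b: int, c: int):
    """Matched-pairs odds ratio b/c with a log-scale Wald interval (b, c > 0)."""
    estimate = b / c
    se = sqrt(1 / b + 1 / c)
    return estimate, wald_log_ci(estimate, se)


# Genetic-variant example (8, 2, 12, 28) and hypothetical discordant counts b = 3, c = 9
print(odds_ratio_ci(8, 2, 12, 28))  # estimate 28/3, about 9.33
print(matched_odds_ratio_ci(3, 9))  # estimate 1/3
```

For the hypothetical paired counts the matched interval includes 1, in line with a McNemar p-value above 0.05 for the same counts, whereas the unpaired interval for the variant example lies entirely above 1.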