The goodness-of-fit test helps determine if sample data follows a specific probability distribution. It uses a chi-square statistic to compare observed frequencies with expected frequencies based on the hypothesized distribution.
Interpreting the test results involves comparing the calculated test statistic to a critical value. If the statistic exceeds the critical value, we reject the null hypothesis, suggesting the data doesn't fit the specified distribution.
Goodness-of-Fit Test
Goodness-of-fit test for distributions
- Determines whether sample data comes from a population with a specific probability distribution (e.g., normal, binomial, Poisson)
- Null hypothesis ($H_0$) states the data follows the specified distribution
- Alternative hypothesis ($H_a$) states the data does not follow the specified distribution
- Steps to perform the test:
  - State the hypotheses
  - Calculate the expected frequency for each category based on the hypothesized distribution
  - Calculate the test statistic from the observed and expected frequencies
  - Determine the degrees of freedom and the critical value from the chi-square distribution
  - Compare the test statistic to the critical value and decide whether to reject or fail to reject $H_0$
  - Calculate the p-value to assess the strength of evidence against $H_0$
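As a worked illustration of these steps, consider a hypothetical test of whether a six-sided die is fair, using invented counts from 60 rolls. Under $H_0$ each face has probability $1/6$, so every expected frequency is $60 \cdot \tfrac{1}{6} = 10$, and no parameters are estimated from the data ($m = 0$):

$$
\begin{aligned}
O &= (8,\ 9,\ 12,\ 11,\ 6,\ 14), \qquad E_i = 10 \text{ for all } i \\
\chi^2 &= \frac{(8-10)^2 + (9-10)^2 + (12-10)^2 + (11-10)^2 + (6-10)^2 + (14-10)^2}{10} = \frac{4+1+4+1+16+16}{10} = 4.2 \\
df &= k - 1 - m = 6 - 1 - 0 = 5, \qquad \chi^2_{0.05,\,5} = 11.070 \\
4.2 &\le 11.070 \;\Rightarrow\; \text{fail to reject } H_0 \quad (p \approx 0.52)
\end{aligned}
$$

With these invented counts there is insufficient evidence that the die is unfair.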
Test statistic calculation
- The chi-square formula gives the test statistic:
  - $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
    - $\chi^2$ is the test statistic
    - $O_i$ is the observed frequency for category $i$
    - $E_i$ is the expected frequency for category $i$
    - $k$ is the number of categories
- Expected frequencies are found by multiplying the total sample size $n$ by the probability $p_i$ that the hypothesized distribution assigns to each category: $E_i = n\,p_i$
- Degrees of freedom for the test:
  - $df = k - 1 - m$
    - $k$ is the number of categories
    - $m$ is the number of parameters estimated from the data (e.g., $m = 1$ if a Poisson mean is estimated from the sample)
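The expected-frequency, test-statistic, and degrees-of-freedom calculations above can be sketched in Python. This is a minimal illustration under stated assumptions: the hypothesized distribution is Poisson with $\lambda = 1$ fixed in advance (so $m = 0$), the counts for 100 intervals are invented, and the helper names (`poisson_category_probs`, `expected_frequencies`, `chi_square_statistic`, `degrees_of_freedom`) are hypothetical. The last category lumps together all values of 3 or more so that the category probabilities sum to 1.

```python
import math

def poisson_category_probs(lam, k):
    """Probabilities for X = 0, 1, ..., k-2 plus a final lumped category X >= k-1."""
    probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(k - 1)]
    probs.append(1.0 - sum(probs))  # tail category absorbs the remaining probability
    return probs

def expected_frequencies(n, probs):
    """E_i = n * p_i for each category."""
    return [n * p for p in probs]

def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all k categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def degrees_of_freedom(k, m):
    """df = k - 1 - m, where m parameters were estimated from the data."""
    return k - 1 - m

# Hypothetical data: number of intervals with 0, 1, 2, and 3+ events (n = 100).
observed = [35, 38, 18, 9]
n = sum(observed)
probs = poisson_category_probs(lam=1.0, k=len(observed))
expected = expected_frequencies(n, probs)
stat = chi_square_statistic(observed, expected)
df = degrees_of_freedom(k=len(observed), m=0)  # lambda fixed in advance, so m = 0

print([round(e, 2) for e in expected])  # [36.79, 36.79, 18.39, 8.03]
print(f"chi-square = {stat:.3f}, df = {df}")  # chi-square = 0.252, df = 3
```

If $\lambda$ were instead estimated from the same data (for example, as the sample mean), $m$ would be 1 and $df = 4 - 1 - 1 = 2$. SciPy's `scipy.stats.chisquare(f_obs, f_exp, ddof=...)` computes the same statistic, with `ddof` playing the role of $m$.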
Interpretation of chi-square results
- The goodness-of-fit test is right-tailed: large discrepancies between $O_i$ and $E_i$ make $\chi^2$ large, so only the upper tail counts as evidence against $H_0$
- $H_0$ is rejected if the test statistic is greater than the critical value
- The critical value comes from the chi-square distribution table at the calculated degrees of freedom and the chosen significance level (typically $\alpha = 0.05$)
- Equivalently, reject $H_0$ when the p-value is less than $\alpha$
- If the test statistic is greater than the critical value:
  - Reject $H_0$
  - There is sufficient evidence that the data does not follow the specified distribution
- If the test statistic is less than or equal to the critical value:
  - Fail to reject $H_0$
  - There is insufficient evidence to conclude that the data does not follow the specified distribution
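The critical-value lookup and p-value can be reproduced without a printed table. The sketch below is an illustration, not a library API: `chi2_sf`, `chi2_critical`, and `decide` are hypothetical helpers that use only Python's standard library, computing the chi-square right-tail probability from the series for the regularized incomplete gamma function and inverting it by bisection. The inputs (statistic 4.2, df = 5) are hypothetical values. In practice, `scipy.stats.chi2.sf` and `scipy.stats.chi2.ppf` provide the same quantities.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), via the regularized lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))  # P(a, z)
    return min(1.0, max(0.0, 1.0 - lower))

def chi2_critical(alpha, df):
    """Right-tail critical value c with P(X > c) = alpha, found by bisection."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def decide(statistic, df, alpha=0.05):
    """Apply the right-tailed decision rule and report the p-value."""
    critical = chi2_critical(alpha, df)
    p_value = chi2_sf(statistic, df)
    decision = "reject H0" if statistic > critical else "fail to reject H0"
    return critical, p_value, decision

critical, p_value, decision = decide(4.2, df=5)
print(f"critical = {critical:.3f}, p-value = {p_value:.3f}, {decision}")
```

The computed critical value for $\alpha = 0.05$ and $df = 5$ agrees with the tabulated 11.070, and the statistic 4.2 falls below it, so the rule fails to reject $H_0$; the p-value (about 0.52) exceeds $\alpha$, giving the same decision.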
Additional Considerations
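- The chi-square goodness-of-fit test is often classed as nonparametric: it makes no assumption (such as normality) about the form of the population; the distribution named in $H_0$ is the hypothesis being tested, not an assumption of the method
- Designed for categorical (count) data; continuous measurements must first be grouped into classes
- The same chi-square statistic extends to contingency tables, in tests of independence and homogeneity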
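For the contingency-table extension, a minimal sketch of the test of independence, using hypothetical 2 × 2 counts and a hypothetical helper name (`independence_test_statistic`). Each expected count is (row total × column total) / $n$, the statistic has the same $\sum (O - E)^2 / E$ form as the goodness-of-fit test, and the degrees of freedom are $(r - 1)(c - 1)$ for $r$ rows and $c$ columns.

```python
def independence_test_statistic(table):
    """Chi-square statistic and df for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows = groups, columns = outcome categories.
table = [[20, 30],
         [30, 20]]
stat, df = independence_test_statistic(table)
print(f"chi-square = {stat:.2f}, df = {df}")  # chi-square = 4.00, df = 1
```

This sketch omits the continuity correction that some software applies to 2 × 2 tables (for example, `scipy.stats.chi2_contingency` does so by default), so results from such tools can differ slightly.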