The chi-square test of independence determines whether two categorical variables are related. It compares the observed frequencies in a contingency table with the frequencies expected if the variables were independent.
This method is widely used to analyze relationships in categorical data. Following a step-by-step process, we calculate a test statistic, compare it to a critical value (or compute a p-value), and decide whether the variables are dependent.
Test of Independence
Construction of contingency tables
- A contingency table (two-way frequency table) displays the relationship between two categorical variables
- Rows represent one categorical variable (e.g., gender)
- Columns represent the other categorical variable (e.g., preferred color)
- Each cell contains the observed frequency: the count of observations falling in that combination of row and column categories
- Steps to construct a contingency table:
  - Identify the two categorical variables of interest
  - Determine the levels (categories) of each variable
  - Create a table whose rows and columns represent the levels of each variable
  - Fill in each cell with the observed frequency for that combination of row and column categories (e.g., 25 males prefer blue)
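- The total of each row or column is called a marginal frequency (marginal total)
- The grand total is the sum of all observations in the table (equivalently, the sum of the row totals or of the column totals)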
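The construction steps can be sketched in Python. This is a minimal illustration, assuming the raw data arrive as (row category, column category) pairs; the function name `contingency_table` and the survey counts below are hypothetical.

```python
from collections import Counter


def contingency_table(pairs):
    """Cross-tabulate (row_category, column_category) pairs.

    Returns (rows, cols, table, row_totals, col_totals, grand_total),
    where table[i][j] is the observed count for rows[i] and cols[j].
    """
    pairs = list(pairs)                   # accept any iterable, e.g. a generator
    counts = Counter(pairs)               # observed frequency of each combination
    rows = sorted({r for r, _ in pairs})  # levels of the row variable
    cols = sorted({c for _, c in pairs})  # levels of the column variable
    table = [[counts[(r, c)] for c in cols] for r in rows]  # missing combos count as 0
    row_totals = [sum(row) for row in table]        # marginal frequencies (rows)
    col_totals = [sum(col) for col in zip(*table)]  # marginal frequencies (columns)
    grand_total = sum(row_totals)                   # total number of observations
    return rows, cols, table, row_totals, col_totals, grand_total


# Hypothetical survey: gender vs. preferred color
survey = ([("male", "blue")] * 25 + [("male", "red")] * 15
          + [("female", "blue")] * 20 + [("female", "red")] * 30)
rows, cols, table, row_totals, col_totals, n = contingency_table(survey)
print(rows, cols)                 # ['female', 'male'] ['blue', 'red']
print(table)                      # [[20, 30], [25, 15]]
print(row_totals, col_totals, n)  # [50, 40] [45, 45] 90
```

Sorting the category levels is an arbitrary choice that keeps the row and column order reproducible; any fixed order works.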
Calculation of chi-square test statistic
- The test statistic is computed from the observed and expected frequencies; under the null hypothesis of independence it approximately follows a chi-square distribution
- The chi-square distribution is right-skewed; for this test its degrees of freedom are $(r-1)(c-1)$, where $r$ is the number of rows and $c$ is the number of columns in the contingency table
- To calculate the test statistic:
  - Compute the expected frequency for each cell in the contingency table
    - Expected frequency $E_{ij} = \frac{(\text{row }i\text{ total}) \times (\text{column }j\text{ total})}{\text{grand total}}$
  - Calculate the chi-square test statistic using the formula:
    - $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
    - $O_{ij}$ is the observed frequency in the $i$-th row and $j$-th column
    - $E_{ij}$ is the expected frequency in the $i$-th row and $j$-th column
- Test statistic measures discrepancy between observed and expected frequencies
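- A larger test statistic indicates a greater difference between observed and expected values and therefore stronger evidence against independence; whether a given value is significant depends on the degrees of freedom (e.g., 12.5 exceeds the $\alpha = 0.05$ critical value of 7.81 for 3 degrees of freedom but not the critical value of 12.59 for 6 degrees of freedom)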
- Because the statistic grows with sample size, it does not by itself measure the strength of the relationship; an effect size such as Cramér's V quantifies that strength
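The calculation can be sketched in Python. This is a minimal illustration rather than a library routine: the function names are hypothetical, the 2 × 2 counts are made up, and Cramér's V, $V = \sqrt{\chi^2 / (n(k-1))}$ with $k = \min(r, c)$, is assumed as the effect-size measure because the notes do not specify one.

```python
import math


def expected_frequencies(table):
    """E_ij = (row i total) * (column j total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Return (chi2, df) for an r x c table of observed counts O_ij."""
    expected = expected_frequencies(table)
    # Sum (O_ij - E_ij)^2 / E_ij over every cell
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return chi2, df


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (k - 1))), k = min(r, c); assumed effect size."""
    chi2, _ = chi_square_statistic(table)
    n = sum(map(sum, table))
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical 2 x 2 table: rows = male, female; columns = blue, red
observed = [[25, 15],
            [20, 30]]
print(expected_frequencies(observed))  # [[20.0, 20.0], [25.0, 25.0]]
print(chi_square_statistic(observed))  # (4.5, 1)
print(round(cramers_v(observed), 3))   # 0.224
```

With 1 degree of freedom, the statistic of 4.5 exceeds the $\alpha = 0.05$ critical value of about 3.84, while $V \approx 0.22$ summarizes the strength of the association on a 0-to-1 scale.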
Determination of factor independence
- The test of independence determines whether there is a statistically significant relationship between two categorical variables
- Null hypothesis ($H_0$): Two categorical variables are independent
- Alternative hypothesis ($H_1$): Two categorical variables are dependent
- Steps to conduct a test of independence:
  - State the null and alternative hypotheses
  - Construct the contingency table and calculate the expected frequencies
  - Calculate the chi-square test statistic
  - Determine the degrees of freedom, $(r-1)(c-1)$
  - Choose a significance level (e.g., $\alpha = 0.05$)
  - Find the critical value from a chi-square distribution table using the degrees of freedom and significance level
  - Compare the test statistic to the critical value, or calculate the p-value
  - If the test statistic is greater than the critical value (equivalently, the p-value is less than the significance level), reject the null hypothesis and conclude the variables are dependent (e.g., with 3 degrees of freedom and $\alpha = 0.05$ the critical value is 7.81; a test statistic of 15.2 > 7.81, so reject $H_0$)
  - If the test statistic is less than or equal to the critical value (equivalently, the p-value is greater than or equal to the significance level), fail to reject the null hypothesis and conclude there is insufficient evidence of dependence between the variables (e.g., 3.5 < 7.81, so fail to reject $H_0$)
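- Sample size affects the power of the test: larger samples make it easier to detect a real relationship, while with very large samples even a weak association can be statistically significant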
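The decision rule can be sketched without external libraries. The helper names `chi2_sf`, `chi2_critical`, and `decide` are hypothetical: `chi2_sf` evaluates the closed-form chi-square upper-tail probability for integer degrees of freedom, and `chi2_critical` inverts it by bisection in place of a printed table. In practice a statistics library (for example SciPy's `scipy.stats.chi2`) would normally supply these values.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: the df = 1 tail, erfc(sqrt(x/2)), plus one correction term per extra 2 df
    s = math.sqrt(x)
    total = math.erfc(s / math.sqrt(2))
    term = math.sqrt(2 / math.pi) * math.exp(-x / 2) * s
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def chi2_critical(alpha, df):
    """Critical value c with P(X > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains c
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:  # tail still too heavy: c lies above mid
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def decide(statistic, df, alpha=0.05):
    """Return (critical value, p-value, reject H0?) for a chi-square statistic."""
    critical = chi2_critical(alpha, df)
    p_value = chi2_sf(statistic, df)
    return critical, p_value, statistic > critical


# The examples from the notes, assuming df = 3 and alpha = 0.05
for stat in (15.2, 3.5):
    critical, p, reject = decide(stat, 3)
    print(f"chi2 = {stat}: critical = {critical:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Because the tail probability decreases as the statistic grows, "statistic > critical value" and "p-value < $\alpha$" always lead to the same decision.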
Additional Analysis
- Post-hoc analysis can identify which specific categories contribute to a significant result
- Standardized residuals show which cells in the contingency table contribute most to the chi-square statistic; for example, the Pearson residuals $(O_{ij} - E_{ij})/\sqrt{E_{ij}}$ have squares that sum to $\chi^2$
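Textbooks differ on what "standardized residual" means: some use the Pearson residual, others the adjusted standardized residual, which divides by an estimate of its own standard error. This sketch computes both for the same made-up 2 × 2 table; the function name `residuals` is hypothetical. A common rule of thumb treats cells whose adjusted residual exceeds about 2 in absolute value as departing notably from independence.

```python
import math


def residuals(table):
    """Return (pearson, adjusted) residual tables for an r x c table of counts.

    pearson[i][j]  = (O_ij - E_ij) / sqrt(E_ij); squares sum to the chi-square statistic.
    adjusted[i][j] = (O_ij - E_ij) / sqrt(E_ij * (1 - row_i/n) * (1 - col_j/n)),
                     approximately standard normal under H0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson, adjusted = [], []
    for i, obs_row in enumerate(table):
        p_row, a_row = [], []
        for j, o in enumerate(obs_row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency E_ij
            p_row.append((o - e) / math.sqrt(e))
            a_row.append((o - e) / math.sqrt(
                e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)))
        pearson.append(p_row)
        adjusted.append(a_row)
    return pearson, adjusted


observed = [[25, 15],   # hypothetical: male   (blue, red)
            [20, 30]]   # hypothetical: female (blue, red)
pearson, adjusted = residuals(observed)
print([[round(r, 2) for r in row] for row in pearson])   # [[1.12, -1.12], [-1.0, 1.0]]
print([[round(r, 2) for r in row] for row in adjusted])  # [[2.12, -2.12], [-2.12, 2.12]]
print(sum(r * r for row in pearson for r in row))        # chi-square statistic (about 4.5)
```

In this 2 × 2 example every adjusted residual has magnitude $\sqrt{4.5} \approx 2.12$; in larger tables the residuals usually differ from cell to cell, which is what makes them useful for locating the source of a significant result.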