When studying a population, we often rely on sample proportions. Across repeated samples, these proportions form a sampling distribution, which becomes approximately normal as sample size grows. This distribution's center matches the population proportion, while its spread is measured by the standard error.
Understanding the sampling distribution of proportions is crucial for making inferences about populations. It allows us to calculate confidence intervals and determine necessary sample sizes for accurate estimates. This knowledge is fundamental for statistical analysis in various fields.
Sampling Distribution of the Proportion
Sampling distribution of proportion
- Probability distribution of sample proportions obtained from repeated sampling of a population
- Describes variability and behavior of sample proportions from different samples of the same size
- Shape approaches a normal distribution as sample size increases, according to Central Limit Theorem
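- Approximation is reasonable when the sample is expected to contain at least 10 successes and 10 failures ($np \geq 10$ and $n(1-p) \geq 10$, a common rule of thumb) and population is at least 10 times larger than sample, so that draws made without replacement are nearly independent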
- Center equals the population proportion ($p$)
- Sample proportion ($\hat{p}$) is an unbiased estimator of population proportion, so mean of sample proportions ($\mu_{\hat{p}}$) equals $p$
- Spread measured by standard deviation, also known as standard error of the proportion ($\sigma_{\hat{p}}$)
- Standard error decreases as sample size increases, indicating larger samples provide more precise estimates of population proportion
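The center and spread described above can be checked with a small simulation. The following is a minimal sketch using only the Python standard library; the function name `simulate_sample_proportions` and the values `p=0.3`, `n=200`, and `num_samples=2000` are illustrative assumptions, not part of the original material.

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample's proportion.

    Each observation is a success with probability p (hypothetical population).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


# Illustrative values (assumptions): p = 0.3, samples of size 200.
props = simulate_sample_proportions(p=0.3, n=200, num_samples=2000)

sim_mean = statistics.mean(props)      # should land close to p
sim_sd = statistics.pstdev(props)      # should land close to sqrt(p(1-p)/n)
theoretical_se = (0.3 * 0.7 / 200) ** 0.5

print(f"mean of sample proportions: {sim_mean:.4f}")
print(f"simulated spread:           {sim_sd:.4f}")
print(f"theoretical standard error: {theoretical_se:.4f}")
```

With these settings the simulated mean should sit near 0.3 and the simulated spread near the theoretical standard error, though the exact figures depend on the seed.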
Standard error calculation
- Standard error of the proportion ($\sigma_{\hat{p}}$) calculated using formula: $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
- $p$ is population proportion
- $n$ is sample size
- Inversely proportional to the square root of sample size ($n$)
- As sample size increases, standard error decreases; quadrupling the sample size halves the standard error
- Affected by population proportion ($p$)
- Standard error is largest at $p = 0.5$, where $p(1-p)$ reaches its maximum of 0.25, and shrinks as $p$ moves toward 0 or 1, assuming constant sample size
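The formula and both relationships above can be illustrated with a few lines of Python. This is a minimal sketch; the helper name `standard_error` and the example inputs are assumptions chosen for illustration.

```python
import math


def standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p * (1 - p) / n)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1 - p) / n)


# Effect of the population proportion at a fixed sample size of 100.
se_half = standard_error(0.5, 100)   # about 0.05
se_edge = standard_error(0.1, 100)   # about 0.03, smaller because p is far from 0.5

# Effect of sample size at a fixed proportion of 0.5.
se_4n = standard_error(0.5, 400)     # about 0.025, half of se_half

print(se_half, se_edge, se_4n)
```

Going from $n = 100$ to $n = 400$ cuts the standard error in half, which is why extra precision gets progressively more expensive in sample size.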
Confidence intervals for proportions
- Range of values likely to contain true population proportion with specified level of confidence
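- Constructed using formula: $\hat{p} \pm z_{\alpha/2} \cdot \hat{\sigma}_{\hat{p}}$
- $\hat{p}$ is sample proportion
- $z_{\alpha/2}$ is critical value from standard normal distribution corresponding to desired confidence level (for example, about 1.96 for 95% confidence)
- $\hat{\sigma}_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ is estimated standard error of the proportion, with $\hat{p}$ substituted for $p$ because population proportion is unknown in practice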
- Interpreted as: "We are $100(1-\alpha)$% confident that the true population proportion falls within the calculated interval"
- 95% confidence interval means if we repeatedly sample population and construct intervals, about 95% would contain true population proportion
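The interval construction can be sketched in Python as follows. Because the population proportion is unknown in practice, this sketch plugs $\hat{p}$ into the standard error formula. The function name `proportion_confidence_interval` and the survey numbers (120 successes in 400 observations) are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    alpha = 1 - confidence
    # Critical value z_{alpha/2} from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error estimated from the sample (p_hat stands in for p).
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * se
    return p_hat - margin, p_hat + margin


# Hypothetical survey: 120 of 400 respondents answer "yes".
low, high = proportion_confidence_interval(120, 400)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

For this hypothetical sample the interval runs from roughly 0.255 to 0.345. Raising `confidence` widens the interval, and a larger `n` narrows it.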
Sample size determination
- Minimum sample size required depends on desired level of confidence, margin of error, and estimate of population proportion
- Calculated using formula: $n = \frac{z_{\alpha/2}^2 \cdot \hat{p}(1-\hat{p})}{E^2}$, with result rounded up to the next whole number
- $z_{\alpha/2}$ is critical value from standard normal distribution corresponding to desired confidence level
- $\hat{p}$ is estimate of population proportion (often 0.5 if no prior information available)
- $E$ is desired margin of error
- If calculated sample size is more than 5% of population size, use finite population correction factor to adjust: $n_{\text{adjusted}} = \frac{n}{1+\frac{n-1}{N}}$
- $N$ is population size
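The sample size calculation, including the finite population adjustment, can be sketched as below. The function name `required_sample_size`, its parameter names, and the example inputs (a 3% margin of error and a population of 5,000) are assumptions for illustration. The result is rounded up, since a fractional observation cannot be collected.

```python
import math
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence=0.95,
                         p_estimate=0.5, population_size=None):
    """Minimum sample size for estimating a proportion within margin_of_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * p_estimate * (1 - p_estimate) / margin_of_error ** 2
    # Apply the finite population correction only when the unadjusted
    # sample size exceeds 5% of the population.
    if population_size is not None and n > 0.05 * population_size:
        n = n / (1 + (n - 1) / population_size)
    return math.ceil(n)


# 95% confidence, 3% margin of error, no prior estimate (p_estimate = 0.5).
n_large = required_sample_size(0.03)
# Same requirements for a hypothetical population of 5,000.
n_small = required_sample_size(0.03, population_size=5000)
print(n_large, n_small)
```

With these inputs the unadjusted requirement is 1068 and the adjusted requirement is 880. Using 0.5 as the estimate gives the largest value of $\hat{p}(1-\hat{p})$, so it yields the most conservative sample size.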