The Central Limit Theorem is a cornerstone of statistics. It tells us that when we take big enough random samples, their means will follow an approximately normal distribution, no matter what the original data looks like (as long as the population has a finite variance). This is super helpful for making sense of messy real-world data.
Using pocket change as an example, we can see this theorem in action. By taking lots of samples and plotting their means, we end up with a nice, bell-shaped curve. This lets us make predictions and draw conclusions about the population, even when our data starts out all over the place.
Central Limit Theorem (Pocket Change)
Simulate the distribution of sample means using pocket change data to demonstrate the Central Limit Theorem
- Central Limit Theorem states sampling distribution of mean will be approximately normal, regardless of shape of population distribution, when sample size is sufficiently large ($n \geq 30$ is a common rule of thumb; heavily skewed populations may need larger $n$)
- Applies to pocket change data, where population distribution may be skewed or non-normal (coins, bills)
- Simulate distribution of sample means using pocket change data:
- Collect large number of samples (100 or more) of specific sample size ($n = 30$) from population of pocket change
- Calculate mean of each sample
- Plot distribution of sample means in histogram
- Resulting distribution of sample means will be approximately normal, even if original population distribution of pocket change is not normal (skewed towards lower values)
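- This is related to, but distinct from, the law of large numbers, which says the sample mean converges to the population mean as sample size increases; the Central Limit Theorem describes the shape of the distribution of sample means around that population mean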
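The simulation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is modeled as an exponential distribution with a mean of 2.00 dollars (a stand-in for right-skewed pocket change, not real data), and the names `draw_pocket_change`, `simulate_sample_means`, and `text_histogram` are hypothetical helpers. Only the standard library is used, so the histogram is printed as text rather than plotted.

```python
import random
import statistics

def draw_pocket_change(rng):
    # Assumed population model: exponential with mean 2.00 dollars,
    # rounded to cents. Right-skewed: many small amounts, few large ones.
    return round(rng.expovariate(1 / 2.0), 2)

def simulate_sample_means(num_samples=1000, n=30, seed=0):
    # Draw num_samples samples of size n and return the mean of each one.
    rng = random.Random(seed)
    return [
        statistics.fmean(draw_pocket_change(rng) for _ in range(n))
        for _ in range(num_samples)
    ]

def text_histogram(values, bins=10, width=40):
    # Bucket the values into equal-width bins and draw one bar per bin.
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    return "\n".join(
        f"{lo + i * step:6.2f} | " + "#" * round(width * c / peak)
        for i, c in enumerate(counts)
    )

sample_means = simulate_sample_means()
print(text_histogram(sample_means))
print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("sd of sample means:  ", round(statistics.stdev(sample_means), 3))
```

Under these assumptions the bars should look roughly bell-shaped and centered near 2.00, even though individual pocket-change amounts are strongly skewed.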
Sample size effects on sampling distribution
- Shape:
- As sample size increases, sampling distribution of mean becomes more normal, even if population distribution is not normal
- Smaller sample sizes may result in sampling distribution that is less normal, especially if population distribution is heavily skewed (many pennies, few large bills)
- Center:
- Mean of sampling distribution of mean is equal to population mean ($\mu_{\bar{x}} = \mu$), regardless of sample size
- Changing sample size therefore affects shape and spread of sampling distribution, not its center
- Spread:
- Standard deviation of sampling distribution of mean, also known as standard error, is equal to $\frac{\sigma}{\sqrt{n}}$
- As sample size increases, standard error decreases, resulting in narrower sampling distribution (less variability); smaller sample sizes give wider sampling distribution (more variability)
- Because of the square root, quadrupling sample size only halves standard error
- Sampling error, the difference between the sample statistic and population parameter, tends to be smaller with larger sample sizes
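The effects of sample size on shape, center, and spread can be checked numerically. The sketch below again assumes an exponential population with mean 2.00 dollars (for which the standard deviation also equals 2.00); `summarize` and `skewness` are hypothetical helper names, and the sample sizes 5, 30, and 100 are arbitrary choices for comparison.

```python
import math
import random
import statistics

POP_MEAN = 2.0  # assumed population mean (dollars)
POP_SD = 2.0    # an exponential distribution has sd equal to its mean

def sample_mean(rng, n):
    # Mean of one random sample of size n from the assumed population.
    return statistics.fmean(rng.expovariate(1 / POP_MEAN) for _ in range(n))

def skewness(values):
    # Average cubed z-score; close to 0 for a symmetric distribution.
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def summarize(n, num_samples=2000, seed=1):
    rng = random.Random(seed)
    means = [sample_mean(rng, n) for _ in range(num_samples)]
    return {
        "n": n,
        "center": statistics.fmean(means),    # should stay near POP_MEAN
        "spread": statistics.stdev(means),    # should track theory_se
        "theory_se": POP_SD / math.sqrt(n),   # sigma / sqrt(n)
        "skew": skewness(means),              # should shrink toward 0
    }

results = [summarize(n) for n in (5, 30, 100)]
for row in results:
    print(
        f"n={row['n']:>3}  center={row['center']:.3f}  "
        f"spread={row['spread']:.3f}  sigma/sqrt(n)={row['theory_se']:.3f}  "
        f"skew={row['skew']:.3f}"
    )
```

In this setup the center should stay close to 2.00 for every $n$, while the spread follows $\frac{\sigma}{\sqrt{n}}$ and the skewness moves toward zero as $n$ grows.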
Probability calculations with Central Limit Theorem
- Calculate probabilities using Central Limit Theorem:
- Determine population mean and standard deviation of pocket change data
- Calculate standard error using formula $\frac{\sigma}{\sqrt{n}}$, where $\sigma$ is population standard deviation and $n$ is sample size
- Use standard normal distribution (z-distribution) to calculate probabilities, with z-score formula: $z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}$, where $\bar{x}$ is sample mean, $\mu$ is population mean, $\sigma$ is population standard deviation, and $n$ is sample size
- Make inferences about population parameters:
- Construct confidence intervals for population mean using sample mean, standard error, and appropriate z-score or t-score (depending on sample size and whether population standard deviation is known)
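- Formula for confidence interval is: $\bar{x} \pm z \times \frac{\sigma}{\sqrt{n}}$ when $\sigma$ is known, or $\bar{x} \pm t \times \frac{s}{\sqrt{n}}$ (with $n - 1$ degrees of freedom) when $\sigma$ is unknown and estimated by sample standard deviation $s$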
- Conduct hypothesis tests about population mean using sample mean, standard error, and appropriate test statistic (z-score or t-score)
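- Test statistic is calculated using formula: $z = \frac{\bar{x} - \mu_0}{\frac{\sigma}{\sqrt{n}}}$ when $\sigma$ is known, or $t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}}$ when $\sigma$ is estimated by sample standard deviation $s$, where $\mu_0$ is hypothesized population mean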
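A short worked example may help tie these formulas together. Every number here is hypothetical: a population mean of 2.00 dollars, a known population standard deviation of 2.00 dollars, a sample of $n = 36$, and an observed sample mean of 2.50 dollars. Because $\sigma$ is treated as known, the sketch uses z-scores throughout via `statistics.NormalDist` from the Python standard library; with an unknown $\sigma$ one would swap in $s$ and a t critical value instead.

```python
import math
from statistics import NormalDist

# Hypothetical inputs (not real pocket-change data).
mu = 2.00      # assumed population mean
sigma = 2.00   # assumed known population standard deviation
n = 36         # sample size
x_bar = 2.50   # observed sample mean

std_normal = NormalDist()  # standard normal: mean 0, sd 1

# Standard error of the mean: sigma / sqrt(n).
se = sigma / math.sqrt(n)

# Probability that a sample mean exceeds x_bar, using the z-score formula.
z = (x_bar - mu) / se
p_above = 1 - std_normal.cdf(z)

# 95% confidence interval for the population mean: x_bar +/- z* times se.
z_crit = std_normal.inv_cdf(0.975)
ci = (x_bar - z_crit * se, x_bar + z_crit * se)

# Two-sided z-test of H0: mu = mu0.
mu0 = 2.00
z_stat = (x_bar - mu0) / se
p_value = 2 * (1 - std_normal.cdf(abs(z_stat)))

print(f"standard error       = {se:.4f}")
print(f"z                    = {z:.2f}")
print(f"P(sample mean > {x_bar}) = {p_above:.4f}")
print(f"95% CI               = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"two-sided p-value    = {p_value:.4f}")
```

With these inputs the z-score is 1.5, so a sample mean above 2.50 dollars has a probability of about 0.067, and the two-sided p-value of about 0.134 would not reject $\mu_0 = 2.00$ at the 5% level.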
Statistical Inference and Probability Distributions
- Statistical inference involves drawing conclusions about a population based on sample data
- The Central Limit Theorem is crucial for statistical inference, as it justifies treating the sampling distribution of the mean as approximately normal even when the population itself is not normal
- A random variable is a variable whose value is determined by the outcome of a random process (e.g., the amount of money in a randomly selected person's pocket)
- The probability distribution of a random variable describes the likelihood of different outcomes occurring