The normal distribution is a crucial concept in probability theory, shaping our understanding of data spread. It's characterized by its symmetrical bell curve, with two key parameters: the mean (μ) and standard deviation (σ). These determine the curve's center and spread, respectively.
Standardizing normal variables transforms them into a standard normal distribution with a mean of 0 and standard deviation of 1. This process, along with the z-table, allows us to calculate probabilities for various scenarios, making the normal distribution a powerful tool in statistical analysis.
Normal distribution properties
Characteristics and parameters
- Normal distribution, also known as Gaussian distribution, exhibits symmetry around its mean
- Two parameters characterize the normal distribution
- Mean (μ) determines the center of the distribution
- Standard deviation (σ) measures the spread of the distribution
- Probability density function (PDF) for a normal distribution follows the equation:
  $$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
- Bell-shaped curve represents the normal distribution
- Highest point occurs at the mean
- Probability decreases symmetrically as values move away from the mean
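As a rough illustration of the density described above, the sketch below evaluates the normal PDF from its standard formula, f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)), and checks that the curve peaks at the mean and is symmetric around it. The function name `normal_pdf` and the values μ = 100 and σ = 15 are assumptions chosen for the example, not part of the text.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Illustrative parameters (assumed for this example only).
mu, sigma = 100.0, 15.0

peak = normal_pdf(mu, mu, sigma)           # density at the mean
left = normal_pdf(mu - 10.0, mu, sigma)    # 10 units below the mean
right = normal_pdf(mu + 10.0, mu, sigma)   # 10 units above the mean

print(f"peak={peak:.5f} left={left:.5f} right={right:.5f}")
```

The two off-center densities should match each other and both should be smaller than the density at the mean.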
Distribution properties
- Empirical rule (68-95-99.7 rule) describes data distribution
- About 68% of data falls within one standard deviation of the mean
- About 95% of data falls within two standard deviations
- About 99.7% of data falls within three standard deviations
- Unimodal distribution with mode, median, and mean equal and centered
- Total area under the normal distribution curve always equals 1
- Represents the sum of probabilities for all possible outcomes
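One way to check these properties numerically is to integrate the density directly. The sketch below applies a simple trapezoid rule to `statistics.NormalDist.pdf` from the Python standard library; the helper `trapezoid_area`, the step count, and the parameters μ = 50 and σ = 8 are assumptions made for illustration. The areas within one, two, and three standard deviations should come out near 0.6827, 0.9545, and 0.9973, and the area over a very wide interval should be close to 1.

```python
from statistics import NormalDist


def trapezoid_area(dist: NormalDist, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the area under dist.pdf between a and b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * h)
    return total * h


# Hypothetical parameters; the proportions do not depend on them.
dist = NormalDist(mu=50.0, sigma=8.0)

# Area within k standard deviations of the mean, for k = 1, 2, 3.
areas = {
    k: trapezoid_area(dist, dist.mean - k * dist.stdev, dist.mean + k * dist.stdev)
    for k in (1, 2, 3)
}

# Area over mean +/- 10 standard deviations, a stand-in for the whole real line.
total_area = trapezoid_area(dist, dist.mean - 10 * dist.stdev, dist.mean + 10 * dist.stdev)

for k, area in areas.items():
    print(f"within {k} sd: {area:.4f}")
print(f"total area: {total_area:.6f}")
```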
Standardizing normal variables
Standardization process
- Standardization converts a normal random variable X to a standard normal variable Z
- Resulting Z has a mean of 0 and standard deviation of 1
- Formula for standardization:
  $$Z = \frac{X - \mu}{\sigma}$$
- X represents the original value
- μ represents the mean
- σ represents the standard deviation
- Standard normal distribution (z-distribution) results from standardization
- Special case of normal distribution with μ = 0 and σ = 1
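The standardization step can be sketched in a few lines: subtract the mean, then divide by the standard deviation. The function name `standardize` and the sample values (μ = 100, σ = 15) are hypothetical and used only for illustration.

```python
def standardize(x: float, mu: float, sigma: float) -> float:
    """Convert a value x from a normal(mu, sigma) variable to its z-score."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical distribution parameters and observations.
mu, sigma = 100.0, 15.0
values = [70.0, 100.0, 130.0]

z_scores = [standardize(x, mu, sigma) for x in values]
print(z_scores)  # the value at the mean maps to 0; 130 lies two standard deviations above
```

A z-score reads as "how many standard deviations above or below the mean", which is what makes values from different normal distributions comparable.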
Using the standard normal table
- Z-table (standard normal table) provides cumulative probabilities for the standard normal distribution
- Steps to find probabilities using the z-table:
- Standardize the given value
- Locate the corresponding probability in the table
- Z-table typically gives area to the left of a given z-score
- Can be used to find areas to the right or between two z-scores through calculations
- Interpolation may be necessary for z-scores falling between provided table values
- Example: For z-score 1.234, interpolate between values for 1.23 and 1.24
Probabilities for normal distributions
Calculating probabilities
- Determine probabilities for general normal distributions by standardizing values and using the z-table
- Cumulative distribution function (CDF) gives the probability that X ≤ x for a random variable X
- Find probabilities between two values by calculating the difference between their CDFs
- For symmetric intervals around the mean, use the formula:
  $$P(\mu - k\sigma \le X \le \mu + k\sigma) = 2\Phi(k) - 1$$
- Φ represents the standard normal CDF
- Example: Probability of X falling within 2σ of the mean is 2Φ(2) - 1 ≈ 0.9545
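A minimal sketch of these calculations, assuming Python's `statistics.NormalDist` as a stand-in for the z-table. The name `PHI`, the three helper functions, and the parameters μ = 100 and σ = 15 are illustrative assumptions. Each helper standardizes its inputs first and then evaluates the standard normal CDF.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a <= X <= b) as the difference of two standardized CDF values."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return PHI(z_b) - PHI(z_a)


def prob_above(x: float, mu: float, sigma: float) -> float:
    """P(X > x), the complement of the left-tail area."""
    return 1.0 - PHI((x - mu) / sigma)


def prob_within_k_sigma(k: float) -> float:
    """P(|X - mu| <= k * sigma), using the symmetry of the normal curve."""
    return 2.0 * PHI(k) - 1.0


# Hypothetical parameters for the example.
mu, sigma = 100.0, 15.0

print(round(prob_between(85.0, 130.0, mu, sigma), 4))
print(round(prob_above(130.0, mu, sigma), 4))
print(round(prob_within_k_sigma(2.0), 4))
```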
Determining quantiles
- Calculate quantiles (percentiles, quartiles) using inverse standardization and the z-table
- Formula for finding a quantile:
  $$X = \mu + Z\sigma$$
- Z represents the z-score corresponding to the desired percentile
- Interquartile range (IQR) for a normal distribution approximately equals 1.35σ (more precisely, 1.349σ)
- Useful for identifying potential outliers
- Example: In a normal distribution with σ = 10, IQR ≈ 13.5
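The quantile calculation can be sketched by reversing standardization: look up the z-score for the desired percentile, then scale by the standard deviation and shift by the mean. Here `statistics.NormalDist.inv_cdf` plays the role of a reverse z-table lookup. The function name `normal_quantile` and the mean of 50 are assumptions; σ = 10 follows the example above.

```python
from statistics import NormalDist


def normal_quantile(p: float, mu: float, sigma: float) -> float:
    """Value x such that P(X <= x) = p for a normal(mu, sigma) variable."""
    z = NormalDist().inv_cdf(p)  # z-score for the desired percentile
    return mu + z * sigma        # inverse standardization


mu, sigma = 50.0, 10.0  # mu is a hypothetical choice; the IQR does not depend on it

q1 = normal_quantile(0.25, mu, sigma)
q3 = normal_quantile(0.75, mu, sigma)
iqr = q3 - q1  # about 1.349 * sigma

print(f"Q1={q1:.2f} Q3={q3:.2f} IQR={iqr:.2f}")
```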
Normal approximation of binomial distributions
Conditions for approximation
- Normal distribution approximates binomial distribution when:
- Sample size n is large
- Probability p is not too close to 0 or 1
- Rule of thumb for using normal approximation
- Both np and n(1-p) should be ≥ 5 or 10, depending on desired accuracy
- Example: For n = 100 and p = 0.3, np = 30 and n(1-p) = 70, satisfying the condition
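The rule of thumb is easy to express as a small check. The function name `normal_approx_ok` and its `threshold` parameter are hypothetical; the default of 5 and the stricter value of 10 correspond to the two cutoffs mentioned above.

```python
def normal_approx_ok(n: int, p: float, threshold: float = 5.0) -> bool:
    """Return True when both np and n(1-p) meet the chosen threshold."""
    if n <= 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need n > 0 and 0 <= p <= 1")
    return n * p >= threshold and n * (1.0 - p) >= threshold


print(normal_approx_ok(100, 0.3))                 # np = 30, n(1-p) = 70
print(normal_approx_ok(20, 0.1))                  # np = 2 is too small
print(normal_approx_ok(30, 0.3, threshold=10.0))  # np = 9 fails the stricter cutoff
```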
Applying the approximation
- Approximating normal distribution parameters:
- Mean = np
- Standard deviation = √(np(1-p))
- Apply continuity correction when using normal approximation
- Add 0.5 to, or subtract 0.5 from, the value of interest
- Direction depends on the probability being calculated: for P(X ≤ k) use k + 0.5, and for P(X ≥ k) use k - 0.5
- Accuracy improves as n increases and p approaches 0.5
- Useful for large n values where direct binomial calculation becomes computationally intensive
- Recognize limitations and use exact binomial probabilities when high precision is required
- Example: Medical studies often require exact probabilities rather than approximations
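To see how the approximation and the continuity correction behave, the sketch below compares an exact binomial cumulative probability with its normal approximation, with and without the 0.5 adjustment. The function names and the choice of n = 100, p = 0.3, k = 25 are assumptions for illustration; the exact value is summed directly with `math.comb`.

```python
import math
from statistics import NormalDist


def binom_cdf_exact(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def binom_cdf_normal(k: int, n: int, p: float, continuity: bool = True) -> float:
    """Normal approximation to P(X <= k), optionally with continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    x = k + 0.5 if continuity else float(k)  # "less than or equal" adds 0.5
    return NormalDist(mu, sigma).cdf(x)


n, p, k = 100, 0.3, 25  # hypothetical scenario

exact = binom_cdf_exact(k, n, p)
approx_corrected = binom_cdf_normal(k, n, p)
approx_uncorrected = binom_cdf_normal(k, n, p, continuity=False)

print(f"exact={exact:.4f}")
print(f"normal with correction={approx_corrected:.4f}")
print(f"normal without correction={approx_uncorrected:.4f}")
```

In this scenario the corrected approximation should land noticeably closer to the exact value than the uncorrected one, which is the practical reason for applying the correction.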