The normal distribution is a fundamental concept in probability and statistics. It's characterized by its symmetric, bell-shaped curve and is defined by two parameters: the mean and standard deviation. This distribution is crucial for understanding many natural phenomena and forms the basis for numerous statistical techniques.
Normal distributions have several key properties, including symmetry and the 68-95-99.7 rule. The probability density function and cumulative distribution function are essential mathematical tools for working with normal distributions. The standard normal distribution, with a mean of 0 and standard deviation of 1, is particularly useful for standardizing data and making comparisons.
Definition of normal distribution
- Continuous probability distribution that is symmetric and bell-shaped, with the mean, median, and mode all equal
- Approximately describes many natural phenomena, such as heights, weights, and IQ scores
- Defined by two parameters: the mean ($\mu$) and standard deviation ($\sigma$)
Properties of normal distribution
Symmetry of normal distribution
- Normal distribution is symmetric about the mean
- 50% of the data falls below the mean and 50% falls above the mean
- Skewness, a measure of asymmetry, is zero for a normal distribution
Mean, median, mode of normal distribution
- In a normal distribution, the mean, median, and mode are all equal
- Mean represents the average value of the data
- Median is the middle value when data is arranged in order
- Mode is the most frequently occurring value
Standard deviation of normal distribution
- Measures the spread or dispersion of data from the mean
- Approximately 68% of data falls within one standard deviation of the mean
- Approximately 95% of data falls within two standard deviations of the mean
- Approximately 99.7% of data falls within three standard deviations of the mean
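The 68-95-99.7 rule can be verified numerically. A minimal sketch using SciPy, where `norm.cdf` is the standard normal CDF (the result holds for any $\mu$ and $\sigma$ after standardizing):

```python
from scipy.stats import norm

# P(mu - k*sigma <= X <= mu + k*sigma) equals Phi(k) - Phi(-k)
# for every normal distribution, so the standard normal suffices.
for k in (1, 2, 3):
    prob = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} sd: {prob:.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```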
Probability density function
Formula for probability density function
- $f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2}$
- $\mu$ is the mean
- $\sigma$ is the standard deviation
- $\pi \approx 3.14159$
- $e \approx 2.71828$
Characteristics of probability density function
- Gives the relative likelihood of a continuous random variable taking on a specific value
- Area under the curve between two points represents the probability of the variable falling within that range
- Total area under the curve is equal to 1
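The density formula can be implemented directly from the definition above. A minimal sketch, with the test point ($x = 1.5$, $\mu = 1$, $\sigma = 2$) chosen arbitrarily and checked against SciPy:

```python
import math
from scipy.stats import norm

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x, straight from the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

print(normal_pdf(1.5, mu=1.0, sigma=2.0))  # 0.19333...
print(norm.pdf(1.5, loc=1.0, scale=2.0))   # same value from SciPy
```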
Cumulative distribution function
Definition of cumulative distribution function
- Gives the probability that a random variable $X$ takes a value less than or equal to $x$
- Denoted as $F(x) = P(X \leq x)$
- Obtained by integrating the probability density function from $-\infty$ to $x$
Properties of cumulative distribution function
- Non-decreasing function, i.e., $F(a) \leq F(b)$ if $a \leq b$
- Ranges from 0 to 1
- $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$
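Since the CDF is the integral of the PDF, numerical integration should reproduce SciPy's closed-form `norm.cdf`. A sketch with an arbitrary evaluation point:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

x = 1.0
area, _ = quad(norm.pdf, -np.inf, x)  # integrate the density up to x
print(area)                           # ~0.8413
print(norm.cdf(x))                    # 0.8413..., closed form
```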
Standard normal distribution
Definition of standard normal distribution
- Normal distribution with a mean of 0 and a standard deviation of 1
- Denoted as $Z \sim N(0, 1)$
- Any normal distribution can be transformed into a standard normal distribution using z-scores
Z-scores in standard normal distribution
- Measures the number of standard deviations a data point is from the mean
- Calculated as $z = \frac{x - \mu}{\sigma}$
- $x$ is the data point
- $\mu$ is the mean
- $\sigma$ is the standard deviation
- Allows for comparison of data points from different normal distributions
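A small sketch of z-scores in action; the exam-score numbers are hypothetical, chosen only to show how standardization makes values from different normal distributions comparable:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical: a score of 85 in a class with mean 75 and sd 5,
# versus a score of 90 in a class with mean 80 and sd 10.
print(z_score(85, 75, 5))   # 2.0 -> the more exceptional result
print(z_score(90, 80, 10))  # 1.0
```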
Applications of normal distribution
Normal approximation to binomial distribution
- Binomial distribution can be approximated by a normal distribution with mean $np$ and standard deviation $\sqrt{np(1-p)}$ when certain conditions are met
- Sample size is large ($n \geq 30$)
- Success probability is not too close to 0 or 1 ($np \geq 5$ and $n(1-p) \geq 5$)
- Simplifies calculations for binomial probabilities; a continuity correction of $\pm 0.5$ improves the approximation, as in the sketch below
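A sketch of the approximation with arbitrary parameters ($n = 100$, $p = 0.4$); the $+0.5$ is the continuity correction for approximating a discrete distribution with a continuous one:

```python
import math
from scipy.stats import binom, norm

n, p = 100, 0.4
mu, sigma = n * p, math.sqrt(n * p * (1 - p))  # parameters of the approximating normal

# P(X <= 45): exact binomial vs. normal approximation
exact = binom.cdf(45, n, p)
approx = norm.cdf((45 + 0.5 - mu) / sigma)
print(exact, approx)  # both ~0.869
```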
Confidence intervals using normal distribution
- Used to estimate population parameters based on sample data
- For large samples, confidence intervals for the mean can be constructed using the normal distribution
- Example: 95% confidence interval for the mean is $\bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}}$, where $\bar{x}$ is the sample mean, $\sigma$ is the population standard deviation, and $n$ is the sample size
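A sketch of this interval; the sample figures are hypothetical, and `norm.ppf` supplies the critical value (1.96 for 95% confidence):

```python
import math
from scipy.stats import norm

def normal_ci(xbar, sigma, n, confidence=0.95):
    """CI for the mean when the population sd is known."""
    z = norm.ppf(1 - (1 - confidence) / 2)  # 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical sample: mean 50, known sigma 8, n = 64
print(normal_ci(50, 8, 64))  # (48.04, 51.96)
```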
Hypothesis testing with normal distribution
- Used to test claims about population parameters based on sample data
- For large samples, the normal distribution can be used to calculate test statistics and p-values
- Example: Z-test for a population mean with known standard deviation
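A sketch of a two-sided Z-test under the same assumptions; the sample figures are again hypothetical:

```python
import math
from scipy.stats import norm

def z_test(xbar, mu0, sigma, n):
    """Two-sided Z-test of H0: mu = mu0 with known population sd."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return z, p_value

# Hypothetical: sample mean 52 from n = 36, testing mu0 = 50 with sigma = 6
print(z_test(52, 50, 6, 36))  # z = 2.0, p ~ 0.0455
```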
Assessing normality
Graphical methods for assessing normality
- Histogram: Should be approximately bell-shaped and symmetric
- Normal probability plot (Q-Q plot): Data points should fall close to a straight line
- Box plot: Should be roughly symmetric, with few or no outliers
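A sketch producing the first two diagnostics on simulated data (the distribution parameters and seed are arbitrary); `scipy.stats.probplot` draws the normal Q-Q plot:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=10, scale=2, size=200)  # simulated, truly normal data

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.hist(data, bins=20)                      # should look roughly bell-shaped
ax1.set_title("Histogram")
stats.probplot(data, dist="norm", plot=ax2)  # points should hug the line
plt.show()
```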
Quantitative methods for assessing normality
- Shapiro-Wilk test: Null hypothesis is that the data is normally distributed
- P-value > 0.05 fails to reject normality (it does not prove the data is normal)
- Kolmogorov-Smirnov test: Compares the empirical distribution function to the theoretical normal distribution function
- P-value > 0.05 fails to reject normality
- Skewness and excess kurtosis: Measures of asymmetry and tail heaviness, respectively
- Values close to 0 suggest normality (the normal distribution has skewness 0 and excess kurtosis 0, i.e., kurtosis 3)
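A sketch running these checks on simulated data (seed and sample size arbitrary). Note that `scipy.stats.kurtosis` reports excess kurtosis by default, and that the plain KS test assumes the parameters were not estimated from the data (estimating them strictly calls for the Lilliefors variant):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(size=200)  # simulated normal data

stat, p = stats.shapiro(data)
print(f"Shapiro-Wilk p = {p:.3f}")  # large p: fail to reject normality

# Standardize before the KS test so the reference N(0, 1) is fully specified
z = (data - data.mean()) / data.std(ddof=1)
stat, p = stats.kstest(z, "norm")
print(f"Kolmogorov-Smirnov p = {p:.3f}")

print(stats.skew(data), stats.kurtosis(data))  # both near 0 for normal data
```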
Transforming data to normal distribution
Box-Cox transformation
- Family of power transformations that can help to normalize skewed data
- Defined as: $y^{(\lambda)} = \begin{cases} \frac{y^\lambda - 1}{\lambda}, & \lambda \neq 0 \ \log(y), & \lambda = 0 \end{cases}$
- $y$ is the original data, which must be positive
- $\lambda$ is the transformation parameter
- Optimal $\lambda$ can be found using maximum likelihood estimation
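A sketch using SciPy's `boxcox`, which estimates $\lambda$ by maximum likelihood; the lognormal test data is arbitrary (and conveniently positive, as Box-Cox requires):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=0.8, size=500)  # right-skewed, positive

transformed, lam = stats.boxcox(skewed)  # returns data and the MLE of lambda
print(f"optimal lambda ~ {lam:.3f}")     # near 0, since log normalizes lognormal data
print(stats.skew(skewed), stats.skew(transformed))  # skewness drops toward 0
```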
Other transformations for normality
- Square root transformation: $\sqrt{y}$, useful for count data with Poisson distribution
- Logarithmic transformation: $\log(y)$, useful for right-skewed data
- Reciprocal transformation: $\frac{1}{y}$, a stronger option for severely right-skewed data (it reverses the order of values; $-\frac{1}{y}$ preserves order)
Relationship to other distributions
Normal distribution vs t-distribution
- T-distribution has heavier tails than the normal distribution
- Used when the sample size is small ($n < 30$) and the population standard deviation is unknown
- Converges to the normal distribution as the degrees of freedom increase
Normal distribution vs chi-square distribution
- Chi-square distribution is right-skewed and non-negative
- Used in hypothesis testing and confidence intervals for variance
- Obtained by summing the squares of $k$ independent standard normal variables, yielding a chi-square distribution with $k$ degrees of freedom, as the simulation below illustrates
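A quick simulation of that construction (sample size and degrees of freedom arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k = 5  # degrees of freedom
# Sum of squares of k independent standard normals ~ chi-square(k)
samples = (rng.standard_normal((100_000, k)) ** 2).sum(axis=1)
print(samples.mean())           # ~5, matching the chi-square mean of k
print(stats.chi2(df=k).mean())  # 5.0
```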
Normal distribution vs F-distribution
- F-distribution is right-skewed and non-negative
- Used in hypothesis testing and confidence intervals for the ratio of two variances
- Obtained as the ratio of two independent chi-square variables, each divided by its degrees of freedom
Limitations of normal distribution
Situations where normal distribution is inappropriate
- Data with extreme outliers or heavy tails
- Strongly skewed data
- Discrete or categorical data
Alternatives to normal distribution
- Student's t-distribution: For small sample sizes with unknown population standard deviation
- Poisson distribution: For count data with rare events
- Binomial distribution: For binary data with a fixed number of trials
- Exponential distribution: For modeling waiting times or time-to-event data