📊 Probability and Statistics Unit 3 Review

3.5 Normal distribution
Written by the Fiveable Content Team • Last updated September 2025
The normal distribution is a fundamental concept in probability and statistics. It's characterized by its symmetric, bell-shaped curve and is defined by two parameters: the mean and standard deviation. This distribution is crucial for understanding many natural phenomena and forms the basis for numerous statistical techniques.

Normal distributions have several key properties, including symmetry and the 68-95-99.7 rule. The probability density function and cumulative distribution function are essential mathematical tools for working with normal distributions. The standard normal distribution, with a mean of 0 and standard deviation of 1, is particularly useful for standardizing data and making comparisons.

Definition of normal distribution

  • Continuous probability distribution that is symmetric and bell-shaped, with the mean, median, and mode all equal
  • Describes many natural phenomena such as heights, weights, and IQ scores
  • Defined by two parameters: the mean ($\mu$) and standard deviation ($\sigma$)

Properties of normal distribution

Symmetry of normal distribution

  • Normal distribution is symmetric about the mean
  • 50% of the data falls below the mean and 50% falls above the mean
  • Skewness, a measure of asymmetry, is zero for a normal distribution

Mean, median, mode of normal distribution

  • In a normal distribution, the mean, median, and mode are all equal
  • Mean represents the average value of the data
  • Median is the middle value when data is arranged in order
  • Mode is the most frequently occurring value

Standard deviation of normal distribution

  • Measures the spread or dispersion of data from the mean
  • Approximately 68% of data falls within one standard deviation of the mean
  • Approximately 95% of data falls within two standard deviations of the mean
  • Approximately 99.7% of data falls within three standard deviations of the mean
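
The 68-95-99.7 rule can be checked directly from the normal CDF. The sketch below uses Python's standard-library `statistics.NormalDist`; the rule is stated in units of standard deviations, so the standard normal suffices for any $\mu$ and $\sigma$:

```python
from statistics import NormalDist

# Standard normal; the probabilities below hold for any mean/sigma
# because the rule is stated in units of standard deviations.
dist = NormalDist(mu=0, sigma=1)

def within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma)."""
    return dist.cdf(k) - dist.cdf(-k)

print(round(within(1), 4))  # 0.6827
print(round(within(2), 4))  # 0.9545
print(round(within(3), 4))  # 0.9973
```

The exact probabilities (68.27%, 95.45%, 99.73%) show that "68-95-99.7" is itself a rounded rule of thumb.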

Probability density function

Formula for probability density function

  • $f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2}$
    • $\mu$ is the mean
    • $\sigma$ is the standard deviation
    • $\pi \approx 3.14159$
    • $e \approx 2.71828$

Characteristics of probability density function

  • Gives the relative likelihood of a continuous random variable taking on a specific value
  • Area under the curve between two points represents the probability of the variable falling within that range
  • Total area under the curve is equal to 1
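
A quick numerical sketch of these properties: the hand-written density formula above matches the library implementation, and a midpoint Riemann sum over $\pm 6\sigma$ approximates the total area under the curve. The mean and standard deviation here are arbitrary example values:

```python
import math
from statistics import NormalDist

mu, sigma = 10.0, 2.0  # arbitrary example parameters
dist = NormalDist(mu, sigma)

def pdf(x):
    """The normal density written out from the formula above."""
    return (1 / (sigma * math.sqrt(2 * math.pi))) * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# The hand-written formula agrees with the stdlib implementation
assert abs(pdf(mu) - dist.pdf(mu)) < 1e-12

# Midpoint Riemann sum over +/- 6 sigma: total area should be ~1
n, lo, hi = 100_000, mu - 6 * sigma, mu + 6 * sigma
dx = (hi - lo) / n
area = sum(pdf(lo + (i + 0.5) * dx) for i in range(n)) * dx
print(round(area, 6))  # ~1.0
```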

Cumulative distribution function

Definition of cumulative distribution function

  • Gives the probability that a random variable $X$ takes a value less than or equal to $x$
  • Denoted as $F(x) = P(X \leq x)$
  • Obtained by integrating the probability density function from $-\infty$ to $x$

Properties of cumulative distribution function

  • Non-decreasing function, i.e., $F(a) \leq F(b)$ if $a \leq b$
  • Ranges from 0 to 1
  • $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$
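
These properties are easy to verify numerically with the standard-library normal CDF:

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)

# F(x) = P(X <= x); by symmetry, half the probability lies below the mean
print(dist.cdf(0))  # 0.5

# Non-decreasing: F(a) <= F(b) whenever a <= b
assert dist.cdf(-1) <= dist.cdf(1)

# Tails approach 0 and 1
print(round(dist.cdf(-10), 6))  # 0.0
print(round(dist.cdf(10), 6))   # 1.0
```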

Standard normal distribution

Definition of standard normal distribution

  • Normal distribution with a mean of 0 and a standard deviation of 1
  • Denoted as $Z \sim N(0, 1)$
  • Any normal distribution can be transformed into a standard normal distribution using z-scores

Z-scores in standard normal distribution

  • Measures the number of standard deviations a data point is from the mean
  • Calculated as $z = \frac{x - \mu}{\sigma}$
    • $x$ is the data point
    • $\mu$ is the mean
    • $\sigma$ is the standard deviation
  • Allows for comparison of data points from different normal distributions
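
A short sketch of that comparison, using hypothetical test scores (the means and standard deviations below are made up for illustration):

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical example: a score of 85 on test A (mean 70, sd 10)
# versus a score of 80 on test B (mean 60, sd 8)
z_a = z_score(85, 70, 10)  # 1.5
z_b = z_score(80, 60, 8)   # 2.5

# Converting to percentiles via the standard normal CDF makes
# the two scores directly comparable
std = NormalDist(0, 1)
print(round(std.cdf(z_a), 4))  # 0.9332
print(round(std.cdf(z_b), 4))  # 0.9938
```

Although 85 is the higher raw score, the 80 on test B is more exceptional relative to its own distribution.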

Applications of normal distribution

Normal approximation to binomial distribution

  • Binomial distribution can be approximated by a normal distribution when certain conditions are met
    • Sample size is large ($n \geq 30$)
    • Success probability is not too close to 0 or 1 ($np \geq 5$ and $n(1-p) \geq 5$)
  • Simplifies calculations for binomial probabilities
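
The quality of the approximation can be checked against the exact binomial sum. A sketch with hypothetical values $n = 100$, $p = 0.5$ (which satisfy the conditions above), including the usual continuity correction:

```python
import math
from statistics import NormalDist

# Hypothetical example: n = 100 trials, success probability p = 0.5
n, p = 100, 0.5
mu = n * p                        # binomial mean
sigma = math.sqrt(n * p * (1 - p))  # binomial standard deviation

# Exact binomial P(X <= 55)
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(56))

# Normal approximation with continuity correction: P(X <= 55) ~ P(Y <= 55.5)
approx = NormalDist(mu, sigma).cdf(55.5)

print(round(exact, 4), round(approx, 4))
```

With the continuity correction, the two values agree to about three decimal places here.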

Confidence intervals using normal distribution

  • Used to estimate population parameters based on sample data
  • For large samples, confidence intervals for the mean can be constructed using the normal distribution
  • Example: 95% confidence interval for the mean is $\bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}}$, where $\bar{x}$ is the sample mean, $\sigma$ is the population standard deviation, and $n$ is the sample size
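
The interval in the example can be computed directly; the sample summary values below are hypothetical, and 1.96 falls out as the 97.5th percentile of the standard normal:

```python
import math
from statistics import NormalDist

# Hypothetical sample summary with known population sigma
x_bar, sigma, n = 52.3, 8.0, 64

# Two-sided 95% interval: z is the 97.5th percentile of N(0, 1)
z = NormalDist(0, 1).inv_cdf(0.975)
margin = z * sigma / math.sqrt(n)
lo, hi = x_bar - margin, x_bar + margin

print(round(z, 3))                    # ~1.96
print(round(lo, 2), round(hi, 2))     # the 95% confidence interval
```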

Hypothesis testing with normal distribution

  • Used to test claims about population parameters based on sample data
  • For large samples, the normal distribution can be used to calculate test statistics and p-values
  • Example: Z-test for a population mean with known standard deviation
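
A minimal sketch of that Z-test, using hypothetical numbers (null mean 50, known $\sigma = 8$, sample of 64 with mean 52.6):

```python
import math
from statistics import NormalDist

# Hypothetical one-sample z-test: H0: mu = 50 vs H1: mu != 50,
# with known population standard deviation
mu0, sigma, n, x_bar = 50.0, 8.0, 64, 52.6

z = (x_bar - mu0) / (sigma / math.sqrt(n))        # test statistic
p_value = 2 * (1 - NormalDist(0, 1).cdf(abs(z)))  # two-sided p-value

print(round(z, 2))        # 2.6
print(round(p_value, 4))  # ~0.0093, so H0 is rejected at the 5% level
```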

Assessing normality

Graphical methods for assessing normality

  • Histogram: Should be approximately bell-shaped and symmetric
  • Normal probability plot (Q-Q plot): Data points should fall close to a straight line
  • Box plot: Should be roughly symmetric with few or no outliers

Quantitative methods for assessing normality

  • Shapiro-Wilk test: Null hypothesis is that the data is normally distributed
    • P-value > 0.05 means normality is not rejected
  • Kolmogorov-Smirnov test: Compares the empirical distribution function to the theoretical normal distribution function
    • P-value > 0.05 means normality is not rejected
  • Skewness and kurtosis: Measures of asymmetry and tail thickness, respectively
    • Values close to 0 suggest normality
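
Sample skewness and excess kurtosis can be computed without any statistics package. The sketch below (using simulated data rather than the formal Shapiro-Wilk or Kolmogorov-Smirnov tests, which need a library such as SciPy) shows both measures near 0 for normal data and a clearly positive skewness for exponential data:

```python
import math
import random
from statistics import mean

def skewness(data):
    """Sample skewness: ~0 for symmetric data."""
    m, n = mean(data), len(data)
    s = math.sqrt(sum((x - m) ** 2 for x in data) / n)
    return sum(((x - m) / s) ** 3 for x in data) / n

def excess_kurtosis(data):
    """Excess kurtosis: ~0 for normal data (tail thickness relative to normal)."""
    m, n = mean(data), len(data)
    s = math.sqrt(sum((x - m) ** 2 for x in data) / n)
    return sum(((x - m) / s) ** 4 for x in data) / n - 3

random.seed(42)
normal_sample = [random.gauss(0, 1) for _ in range(10_000)]
skewed_sample = [random.expovariate(1) for _ in range(10_000)]

print(round(skewness(normal_sample), 2))  # close to 0
print(round(skewness(skewed_sample), 2))  # clearly positive (~2)
```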

Transforming data to normal distribution

Box-Cox transformation

  • Family of power transformations that can help to normalize skewed data
  • Defined as: $y^{(\lambda)} = \begin{cases} \frac{y^\lambda - 1}{\lambda}, & \lambda \neq 0 \\ \log(y), & \lambda = 0 \end{cases}$
    • $y$ is the original data
    • $\lambda$ is the transformation parameter
  • Optimal $\lambda$ can be found using maximum likelihood estimation
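
The transform itself is simple to write out for a single positive value (in practice, `scipy.stats.boxcox` applies it to a whole array and estimates the optimal $\lambda$ by maximum likelihood). A minimal sketch of the piecewise definition above:

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a single positive value y for parameter lambda."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

# lambda = 1 just shifts the data; lambda = 0 is the log transform;
# lambda = 0.5 behaves like a (rescaled) square-root transform
print(box_cox(5.0, 1.0))            # 4.0
print(round(box_cox(5.0, 0.0), 4))  # 1.6094 (= log 5)
print(round(box_cox(5.0, 0.5), 4))  # 2.4721 (= 2(sqrt(5) - 1))
```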

Other transformations for normality

  • Square root transformation: $\sqrt{y}$, useful for count data with Poisson distribution
  • Logarithmic transformation: $\log(y)$, useful for right-skewed data
  • Reciprocal transformation: $\frac{1}{y}$, useful for severely right-skewed positive data (note that it reverses the order of the values)

Relationship to other distributions

Normal distribution vs t-distribution

  • T-distribution has heavier tails than the normal distribution
  • Used when the sample size is small ($n < 30$) and the population standard deviation is unknown
  • Converges to the normal distribution as the degrees of freedom increase

Normal distribution vs chi-square distribution

  • Chi-square distribution is right-skewed and non-negative
  • Used in hypothesis testing and confidence intervals for variance
  • Obtained by summing the squares of independent standard normal variables

Normal distribution vs F-distribution

  • F-distribution is right-skewed and non-negative
  • Used in hypothesis testing and confidence intervals for the ratio of two variances
  • Obtained as the ratio of two independent chi-square variables, each divided by its degrees of freedom

Limitations of normal distribution

Situations where normal distribution is inappropriate

  • Data with extreme outliers or heavy tails
  • Strongly skewed data
  • Discrete or categorical data

Alternatives to normal distribution

  • Student's t-distribution: For small sample sizes with unknown population standard deviation
  • Poisson distribution: For count data with rare events
  • Binomial distribution: For binary data with a fixed number of trials
  • Exponential distribution: For modeling waiting times or time-to-event data