Probability distributions are the backbone of statistical analysis, helping us model real-world scenarios. This section dives into discrete and continuous distributions, exploring their unique characteristics and applications in various fields.
We'll cover common distributions like Binomial, Poisson, Normal, and Exponential, learning how to calculate probabilities and expected values. Understanding these concepts is crucial for making informed decisions based on data in practical situations.
Discrete vs Continuous Variables
Defining Random Variables
- Random variables assign numerical values to outcomes of random experiments
- Categorized as either discrete or continuous
- Discrete random variables take on a countable set of distinct values (often, but not necessarily, integers)
- Continuous random variables take on any value within a given range (real numbers)
- Examples of discrete variables include number of customers in a store, the face shown on a die roll, or number of heads in a series of coin flips
- Examples of continuous variables include height, weight, or time elapsed
Probability Distribution Functions
- Discrete random variables described by probability mass function (PMF)
- Continuous random variables described by probability density function (PDF)
- Cumulative distribution functions (CDFs) exist for both types
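- CDFs for discrete variables are step functions that jump at each possible value
- CDFs for continuous variables are continuous curves with no jumps (not necessarily smooth; the Uniform CDF, for example, has corners)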
- PMF example for rolling a fair six-sided die: P(X = k) = 1/6 for k = 1, 2, 3, 4, 5, 6
- PDF example for standard normal distribution: f(x) = (1 / √(2π)) e^(-(x^2)/2)
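A minimal sketch of the PMF, CDF and PDF ideas above, using only the Python standard library. The helper names `die_pmf`, `die_cdf` and `std_normal_pdf` are hypothetical illustrations. The CDF is built by accumulating the PMF, which makes its step shape visible.

```python
import math
from fractions import Fraction

def die_pmf(k):
    """PMF of a fair six-sided die: 1/6 on each face, 0 elsewhere."""
    return Fraction(1, 6) if k in range(1, 7) else Fraction(0)

def die_cdf(x):
    """CDF F(x) = P(X <= x): sums the PMF over faces <= x, so it jumps at 1..6."""
    return sum((die_pmf(k) for k in range(1, 7) if k <= x), Fraction(0))

def std_normal_pdf(x):
    """Standard normal density f(x) = exp(-x^2/2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# The CDF is flat between faces and jumps by 1/6 at each face.
print([str(die_cdf(x)) for x in (0.5, 1, 1.5, 2, 6)])  # ['0', '1/6', '1/6', '1/3', '1']
print(round(std_normal_pdf(0), 4))  # 0.3989
```

Exact fractions are used for the die so the step heights print as 1/6, 1/3 and so on rather than rounded decimals.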
Calculations and Properties
- Expected value calculations differ between discrete and continuous variables
- Discrete expected value: E(X) = Σ(x P(X = x))
- Continuous expected value: E(X) = ∫(x f(x) dx)
- Variance likewise uses a sum in the discrete case and an integral in the continuous case
- Discrete variance: Var(X) = Σ((x - E(X))^2 P(X = x))
- Continuous variance: Var(X) = ∫((x - E(X))^2 f(x) dx)
- Probabilities for discrete variables found by summing individual point probabilities
- Probabilities for continuous variables found by integrating the PDF over intervals (any single point has probability 0)
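The sum-versus-integral contrast can be checked numerically. This sketch computes the fair die's mean and variance with exact sums, then approximates the integrals for a Uniform(0, 1) variable with a midpoint Riemann sum. The midpoint rule and the helper `integrate` are an assumed numerical stand-in for exact integration, not a method from the text.

```python
from fractions import Fraction

# Discrete case: E(X) = sum x P(X = x), Var(X) = sum (x - E(X))^2 P(X = x)
faces = range(1, 7)
p = Fraction(1, 6)
mean_die = sum(x * p for x in faces)
var_die = sum((x - mean_die) ** 2 * p for x in faces)
print(mean_die, var_die)  # 7/2 35/12

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def uniform01_pdf(x):
    """f(x) = 1 on [0, 1]; only called on points inside the interval here."""
    return 1.0

# Continuous case: E(X) = integral x f(x) dx, Var(X) = integral (x - E(X))^2 f(x) dx
mean_u = integrate(lambda x: x * uniform01_pdf(x), 0.0, 1.0)
var_u = integrate(lambda x: (x - mean_u) ** 2 * uniform01_pdf(x), 0.0, 1.0)
print(round(mean_u, 6), round(var_u, 6))  # 0.5 0.083333
```

The continuous results match the exact values E(X) = 1/2 and Var(X) = 1/12 up to the discretization error of the midpoint rule.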
Common Discrete Distributions
Basic Discrete Distributions
- Bernoulli distribution models a single trial with a binary outcome (success or failure)
- Characterized by probability of success (p)
- PMF: P(X = x) = p^x (1-p)^(1-x) for x = 0 or 1
- Example: Flipping a coin (heads = success, tails = failure)
- Binomial distribution extends Bernoulli to n independent trials
- Defined by parameters n (number of trials) and p (probability of success)
- PMF: P(X = k) = C(n,k) * p^k * (1-p)^(n-k)
- Example: Number of heads in 10 coin flips
- Geometric distribution models number of trials until first success
- Defined by probability of success p
- PMF: P(X = k) = (1-p)^(k-1) p for k = 1, 2, 3, ...
- Example: Number of times rolling a die until getting a 6
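The three PMFs above translate directly into code. This is a sketch using Python's `math.comb` (Python 3.8+) for C(n, k); the function names are hypothetical, and each returns 0 outside the distribution's support.

```python
import math

def bernoulli_pmf(x, p):
    """P(X = x) = p^x (1-p)^(1-x) for x in {0, 1}."""
    if x not in (0, 1):
        return 0.0
    return p ** x * (1 - p) ** (1 - x)

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1-p)^(n-k) for k = 0..n."""
    if not 0 <= k <= n:
        return 0.0
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(X = k) = (1-p)^(k-1) p: first success on trial k = 1, 2, ..."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p

print(bernoulli_pmf(1, 0.5))             # 0.5  (heads on one fair flip)
print(binomial_pmf(5, 10, 0.5))          # 0.24609375  (exactly 5 heads in 10 flips)
print(round(geometric_pmf(3, 1 / 6), 4))  # 0.1157  (first 6 on the third roll)
```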
Advanced Discrete Distributions
- Poisson distribution models number of events in fixed interval
- Characterized by rate parameter λ (average number of events)
- PMF: P(X = k) = (e^(-λ) λ^k) / k!
- Example: Number of customers arriving at a store in one hour
- Negative Binomial distribution generalizes Geometric distribution
- Models number of failures before r successes occur
- PMF: P(X = k) = C(k+r-1, r-1) * p^r * (1-p)^k
- Example: Number of non-defective items inspected before finding 5 defective ones
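A sketch of the Poisson and Negative Binomial PMFs in standard-library Python. The Poisson version works in log space with `math.lgamma` (an implementation choice, assumed here, to avoid overflow in λ^k and k! for large k). The rate of 4 customers per hour and the defect probability of 0.1 are hypothetical values.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = e^(-lam) lam^k / k!, evaluated in log space (requires lam > 0)."""
    if k < 0:
        return 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def neg_binomial_pmf(k, r, p):
    """P(X = k) = C(k+r-1, r-1) p^r (1-p)^k: k failures before the r-th success."""
    if k < 0:
        return 0.0
    return math.comb(k + r - 1, r - 1) * p ** r * (1 - p) ** k

# Hypothetical: 4 arrivals per hour on average; P(exactly 4 arrivals in an hour)
print(round(poisson_pmf(4, 4.0), 4))  # 0.1954
# Hypothetical: each item defective with probability 0.1 ("success" = defective);
# P(20 non-defective items are inspected before the 5th defective one)
print(neg_binomial_pmf(20, 5, 0.1))
```

With r = 1 the Negative Binomial PMF becomes (1-p)^k p, the number of failures before the first success, which is the Geometric distribution shifted to start at 0.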
Applications and Calculations
- Quality control uses Binomial distribution (number of defects in a batch)
- Rare event modeling employs Poisson distribution (number of earthquakes in a year)
- Reliability testing utilizes Geometric distribution (number of trials until component failure)
- Expected value for Binomial: E(X) = np
- Variance for Binomial: Var(X) = np(1-p)
- Expected value for Poisson: E(X) = λ
- Variance for Poisson: Var(X) = λ
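The closed-form moments above can be sanity-checked by plugging each PMF into the discrete definitions E(X) = Σ x P(X = x) and Var(X) = Σ (x - E(X))^2 P(X = x). The parameter values n = 10, p = 0.3 and λ = 4 are illustrative assumptions, and the infinite Poisson support is truncated at k = 199, where the remaining tail is negligible.

```python
import math

def moments(pmf, support):
    """Mean and variance of a discrete distribution from its PMF over a finite support."""
    mean = sum(k * pmf(k) for k in support)
    var = sum((k - mean) ** 2 * pmf(k) for k in support)
    return mean, var

n, p, lam = 10, 0.3, 4.0  # illustrative parameter values, not from the text

def binom_pmf(k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def pois_pmf(k):
    # log-space evaluation avoids overflow in lam**k and k! for large k
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

print(moments(binom_pmf, range(n + 1)))  # approximately (3.0, 2.1) = (np, np(1-p))
print(moments(pois_pmf, range(200)))     # approximately (4.0, 4.0) = (lam, lam)
```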
Common Continuous Distributions
Fundamental Continuous Distributions
- Uniform distribution assigns constant probability density over a finite interval
- Defined by minimum (a) and maximum (b) values
- PDF: f(x) = 1 / (b - a) for a ≤ x ≤ b
- Example: Selecting a random number between 0 and 1
- Normal (Gaussian) distribution characterized by bell-shaped curve
- Defined by mean (μ) and standard deviation (σ) parameters
- PDF: f(x) = (1 / (σ√(2π))) e^(-((x-μ)^2) / (2σ^2))
- Example: Heights of adult males in a population
- Exponential distribution models time between events in Poisson process
- Characterized by rate parameter λ
- PDF: f(x) = λe^(-λx) for x ≥ 0
- Example: Time between customer arrivals at a bank
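A standard-library sketch of the three PDFs above together with their CDFs. The function names are hypothetical. The Uniform and Exponential CDFs have closed forms; the Normal CDF has no elementary closed form, so it is written here via the error function, Φ(z) = (1 + erf(z / √2)) / 2. The height figures (mean 175 cm, SD 7 cm) are hypothetical.

```python
import math

def uniform_pdf(x, a, b):
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """F(x) = (x - a) / (b - a), clipped to [0, 1]."""
    return min(max((x - a) / (b - a), 0.0), 1.0)

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    # Expressed through the error function from the math module.
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def exponential_pdf(x, lam):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exponential_cdf(x, lam):
    """F(x) = 1 - e^(-lam x) for x >= 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

print(uniform_cdf(0.25, 0, 1))              # 0.25
print(normal_cdf(175, 175, 7))              # 0.5 (half the mass lies below the mean)
print(round(exponential_cdf(2, 0.5), 4))    # 0.6321
```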
Specialized Continuous Distributions
- Gamma distribution generalizes Exponential distribution
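- Useful for modeling waiting times (for integer k, the waiting time until the k-th event in a Poisson process with mean time θ between events)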
- Defined by shape (k) and scale (θ) parameters
- PDF: f(x) = (x^(k-1) * e^(-x/θ)) / (θ^k * Γ(k))
- Example: Total rainfall in a month
- t-distribution resembles Normal but has heavier tails
- Often used in hypothesis testing with small sample sizes
- PDF involves complex formula with degrees of freedom parameter
- Example: Estimating population mean with small sample size
- Chi-square distribution with k degrees of freedom is the distribution of the sum of squares of k independent standard normal variables
- Commonly used in goodness-of-fit tests
- PDF: f(x) = (x^((k/2)-1) * e^(-x/2)) / (2^(k/2) * Γ(k/2))
- Example: Testing independence in contingency tables
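The sketch below codes the Gamma and chi-square PDFs as written above, plus the standard Student's t density f(t) = Γ((ν+1)/2) / (√(νπ) Γ(ν/2)) (1 + t²/ν)^(-(ν+1)/2), to make three relationships concrete: chi-square with k degrees of freedom equals Gamma with shape k/2 and scale 2, Gamma with shape 1 is an Exponential, and the t density approaches the standard normal as ν grows. Function names are hypothetical; densities are evaluated only for x > 0 as a simplification.

```python
import math

def gamma_pdf(x, k, theta):
    """f(x) = x^(k-1) e^(-x/theta) / (theta^k Gamma(k)); sketch returns 0 for x <= 0."""
    if x <= 0:
        return 0.0
    return x ** (k - 1) * math.exp(-x / theta) / (theta ** k * math.gamma(k))

def chi2_pdf(x, k):
    """f(x) = x^(k/2 - 1) e^(-x/2) / (2^(k/2) Gamma(k/2)); sketch returns 0 for x <= 0."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def t_pdf(t, nu):
    """Student's t density with nu degrees of freedom (log-gamma for numerical stability)."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1 + t * t / nu) ** (-(nu + 1) / 2)

# Chi-square with k = 5 degrees of freedom vs Gamma(shape 2.5, scale 2): same value
print(gamma_pdf(3.0, 2.5, 2.0), chi2_pdf(3.0, 5))
# Gamma(shape 1, scale 2) vs Exponential with rate 1/2 at x = 1
print(gamma_pdf(1.0, 1, 2.0), 0.5 * math.exp(-0.5))
# t peak rises toward the standard normal peak 1/sqrt(2*pi) as nu grows
print(t_pdf(0, 3), t_pdf(0, 1000), 1 / math.sqrt(2 * math.pi))
```

The lower peak of the t density at small ν is the other side of its heavier tails: the same total probability is spread further from the center.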
Advanced Distributions and Applications
- F-distribution represents the ratio of two independent chi-square variables, each divided by its degrees of freedom
- Frequently applied in analysis of variance (ANOVA)
- PDF involves complex formula with two degrees of freedom parameters
- Example: Comparing variances of two populations
- Weibull distribution models product life and failure rates
- Characterized by shape (k) and scale (λ) parameters
- PDF: f(x) = (k/λ) * (x/λ)^(k-1) * e^(-(x/λ)^k)
- Example: Modeling wind speed distributions
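A sketch of the Weibull PDF and its closed-form CDF, F(x) = 1 - e^(-(x/λ)^k). It checks numerically that integrating the PDF from 0 to x reproduces the CDF, and that k = 1 reduces the Weibull to an Exponential with rate 1/λ. The shape 2 and scale 8 (read as wind speed in m/s) are hypothetical, and the midpoint-rule helper `integrate` is an assumed numerical stand-in. The sketch assumes k ≥ 1 if the PDF is evaluated exactly at x = 0.

```python
import math

def weibull_pdf(x, k, lam):
    """f(x) = (k/lam) (x/lam)^(k-1) e^(-(x/lam)^k) for x >= 0 (shape k, scale lam)."""
    if x < 0:
        return 0.0
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def weibull_cdf(x, k, lam):
    """F(x) = 1 - e^(-(x/lam)^k) for x >= 0."""
    return 1 - math.exp(-((x / lam) ** k)) if x >= 0 else 0.0

def integrate(g, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

k, lam = 2.0, 8.0  # hypothetical shape and scale, e.g. wind speed in m/s
# Area under the PDF up to 10 vs the closed-form CDF at 10: nearly identical
print(integrate(lambda x: weibull_pdf(x, k, lam), 0, 10), weibull_cdf(10, k, lam))
# With k = 1 the Weibull reduces to an Exponential with rate 1/lam
print(weibull_pdf(3.0, 1, 8.0), (1 / 8.0) * math.exp(-3.0 / 8.0))
```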
Probability Calculations for Distributions
Discrete Probability Calculations
- Calculate probabilities for discrete distributions by summing PMF values over the outcomes of interest
- Use PMF for single point probabilities: P(X = k)
- Use CDF for cumulative probabilities: P(X ≤ k)
- Example: Probability of exactly 3 heads in 5 coin flips (Binomial)
- P(X = 3) = C(5,3) * (0.5)^3 * (0.5)^2 = 0.3125
- Example: Probability of at most 2 events in Poisson distribution with ฮป = 3
- P(X ≤ 2) = e^(-3) (1 + 3 + (3^2/2!)) ≈ 0.4232
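Both worked examples can be reproduced in a few lines of standard-library Python: a single PMF value for the Binomial case, and a sum of PMF values (a CDF value) for the Poisson case.

```python
import math

# Exactly 3 heads in 5 fair coin flips: Binomial(n=5, p=0.5), one PMF value
p_binom = math.comb(5, 3) * 0.5 ** 3 * 0.5 ** 2
print(p_binom)  # 0.3125

# At most 2 events with lam = 3: Poisson CDF = PMF summed over k = 0, 1, 2
lam = 3
p_pois = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(3))
print(round(p_pois, 4))  # 0.4232
```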
Continuous Probability Calculations
- Calculate probabilities for continuous distributions by integrating PDF
- Use CDF for probabilities of intervals: P(a ≤ X ≤ b) = F(b) - F(a)
- Z-score transformation standardizes normal distributions
- Z = (X - μ) / σ
- Allows easy probability calculations using standard normal tables
- Example: Probability of a value between 1 and 2 in Uniform(0,3) distribution
- P(1 โค X โค 2) = (2 - 1) / (3 - 0) = 1/3
- Example: Probability of value within one standard deviation of mean in Normal distribution
- P(μ - σ ≤ X ≤ μ + σ) ≈ 0.6827 (using z-score and standard normal table)
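A sketch of both interval calculations using P(a ≤ X ≤ b) = F(b) - F(a). In place of a printed standard normal table, it assumes the error-function form Φ(z) = (1 + erf(z / √2)) / 2 from Python's `math` module. The values μ = 100 and σ = 15 are arbitrary; after standardizing, the endpoints are always z = -1 and z = 1.

```python
import math

def std_normal_cdf(z):
    """Phi(z), the standard normal CDF, written with the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def uniform_cdf(x, a, b):
    """F(x) = (x - a) / (b - a), clipped to [0, 1]."""
    return min(max((x - a) / (b - a), 0.0), 1.0)

# Uniform(0, 3): P(1 <= X <= 2) = F(2) - F(1)
p_unif = uniform_cdf(2, 0, 3) - uniform_cdf(1, 0, 3)

# Normal(mu, sigma): standardize both endpoints with Z = (X - mu) / sigma
mu, sigma = 100.0, 15.0  # illustrative values; the answer does not depend on them
z_lo = ((mu - sigma) - mu) / sigma
z_hi = ((mu + sigma) - mu) / sigma
p_one_sd = std_normal_cdf(z_hi) - std_normal_cdf(z_lo)

print(round(p_unif, 4), round(p_one_sd, 4))  # 0.3333 0.6827
```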
Quantile Calculations and Advanced Techniques
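- Calculate quantiles by inverting the CDF: the p-th quantile is the value x where F(x) = p
- Median found at 50th percentile, quartiles at 25th and 75th percentiles
- For discrete distributions the CDF jumps and may never equal p exactly, so the p-th quantile is taken as the smallest value x with F(x) ≥ p (some conventions for sample quantiles interpolate between adjacent values instead)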
- Example: Finding median of Exponential distribution with ฮป = 0.5
- Median = -ln(0.5) / λ ≈ 1.3863
- Statistical software provides built-in functions for probability and quantile calculations
- R functions (pnorm, qnorm for Normal distribution)
- Python scipy.stats module (norm.cdf, norm.ppf for Normal distribution)
- Understand relationships between distributions for appropriate calculation methods
- t-distribution approaches Normal as degrees of freedom increase
- Chi-square distribution related to Normal through sum of squares
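A sketch of quantile calculation in standard-library Python. For the Exponential, inverting F(x) = 1 - e^(-λx) gives Q(p) = -ln(1 - p) / λ, which reproduces the median above. For a discrete distribution, the hypothetical helper `discrete_quantile` returns the smallest support value whose running CDF reaches p (floating-point rounding in the running sum can matter in edge cases). The last lines swap in Python's built-in `statistics.NormalDist` (Python 3.8+), whose `cdf` and `inv_cdf` methods play the role of R's pnorm/qnorm or scipy's norm.cdf/norm.ppf; it is an analogue, not the R or SciPy API itself.

```python
import math
from statistics import NormalDist

def exponential_quantile(p, lam):
    """Invert F(x) = 1 - e^(-lam x): Q(p) = -ln(1 - p) / lam."""
    return -math.log(1 - p) / lam

def discrete_quantile(p, pmf, support):
    """Smallest x in the (sorted) support with F(x) >= p."""
    cumulative = 0.0
    for x in support:
        cumulative += pmf(x)
        if cumulative >= p:
            return x
    raise ValueError("p exceeds the total probability of the given support")

def binom_5_half(k):
    """Binomial(n=5, p=0.5) PMF, used as a small discrete example."""
    return math.comb(5, k) * 0.5 ** 5

print(round(exponential_quantile(0.5, 0.5), 4))       # 1.3863
print(discrete_quantile(0.5, binom_5_half, range(6)))  # 2

# Standard-library analogue of pnorm/qnorm (R) and norm.cdf/norm.ppf (scipy)
std = NormalDist(mu=0.0, sigma=1.0)
print(round(std.cdf(1.96), 4), round(std.inv_cdf(0.975), 4))  # 0.975 1.96
```

For Binomial(5, 0.5), F(1) = 6/32 falls short of 0.5 while F(2) = 16/32 reaches it exactly, so 2 is the median under the smallest-x convention.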