Fiveable

🎲 Data, Inference, and Decisions Unit 2 Review

2.3 Discrete and continuous probability distributions

Written by the Fiveable Content Team • Last updated September 2025

Probability distributions are the backbone of statistical analysis, helping us model real-world scenarios. This section dives into discrete and continuous distributions, exploring their unique characteristics and applications in various fields.

We'll cover common distributions like Binomial, Poisson, Normal, and Exponential, learning how to calculate probabilities and expected values. Understanding these concepts is crucial for making informed decisions based on data in practical situations.

Discrete vs Continuous Variables

Defining Random Variables

  • Random variables assign numerical values to outcomes of random experiments
  • Categorized as either discrete or continuous
  • Discrete random variables take on countable distinct values (whole numbers or integers)
  • Continuous random variables take on any value within a given range (real numbers)
  • Examples of discrete variables include number of customers in a store, dice rolls, or coin flips
  • Examples of continuous variables include height, weight, or time elapsed

Probability Distribution Functions

  • Discrete random variables described by probability mass function (PMF)
  • Continuous random variables described by probability density function (PDF)
  • Cumulative distribution functions (CDFs) exist for both types
  • CDFs for discrete variables have step-like appearance
  • CDFs for continuous variables are continuous curves with no jumps
  • PMF example for rolling a fair six-sided die: P(X = k) = 1/6 for k = 1, 2, 3, 4, 5, 6
  • PDF example for standard normal distribution: f(x) = (1 / √(2π)) e^(-(x^2)/2)
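
The die example above can be sketched in a few lines of Python to show why a discrete CDF looks like a staircase. This is only an illustration using the standard library; the helper names die_pmf and die_cdf are made up for this example.

```python
from fractions import Fraction

def die_pmf(k):
    """P(X = k) for a fair six-sided die."""
    return Fraction(1, 6) if k in range(1, 7) else Fraction(0)

def die_cdf(x):
    """P(X <= x): adds up the PMF over all faces not exceeding x."""
    return sum(die_pmf(k) for k in range(1, 7) if k <= x)

# The CDF is flat between faces and jumps by 1/6 at each face.
for x in (0.5, 1, 1.5, 3, 6):
    print(x, die_cdf(x))
```

Evaluating the CDF at 1 and at 1.5 gives the same value, which is the flat part of a step.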

Calculations and Properties

  • Expected value calculations differ between discrete and continuous variables
  • Discrete expected value: E(X) = Σ(x P(X = x))
  • Continuous expected value: E(X) = ∫(x f(x) dx)
  • Variance calculations differ in the same way: a sum for discrete variables, an integral for continuous ones
  • Discrete variance: Var(X) = Σ((x - E(X))^2 P(X = x))
  • Continuous variance: Var(X) = ∫((x - E(X))^2 f(x) dx)
  • Probabilities for discrete variables found by summing individual point probabilities
  • Probabilities for continuous variables found by integrating over intervals
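
To make the sum-versus-integral contrast concrete, here is a small sketch that computes the mean and variance of a fair die exactly and of a Uniform(0, 3) variable numerically. The midpoint-rule helper integrate and the choice of distributions are assumptions made for illustration only.

```python
from fractions import Fraction

# Discrete case: fair die, P(X = x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mean_die = sum(x * p for x, p in pmf.items())
var_die = sum((x - mean_die) ** 2 * p for x, p in pmf.items())

# Continuous case: Uniform(0, 3) with density f(x) = 1/3 on [0, 3].
# The integrals are approximated by a midpoint Riemann sum.
def integrate(g, a, b, n=10_000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: 1 / 3
mean_unif = integrate(lambda x: x * f(x), 0, 3)
var_unif = integrate(lambda x: (x - mean_unif) ** 2 * f(x), 0, 3)

print(mean_die, var_die)      # exact fractions: 7/2 and 35/12
print(mean_unif, var_unif)    # numerical approximations near 1.5 and 0.75
```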

Common Discrete Distributions

Basic Discrete Distributions

  • Bernoulli distribution models binary outcomes with single trial
    • Characterized by probability of success (p)
    • PMF: P(X = x) = p^x (1-p)^(1-x) for x = 0 or 1
    • Example: Flipping a coin (heads = success, tails = failure)
  • Binomial distribution extends Bernoulli to n independent trials
    • Defined by parameters n (number of trials) and p (probability of success)
    • PMF: P(X = k) = C(n,k) * p^k * (1-p)^(n-k)
    • Example: Number of heads in 10 coin flips
  • Geometric distribution models number of trials until first success
    • Defined by probability of success p
    • PMF: P(X = k) = (1-p)^(k-1) p for k = 1, 2, 3, ...
    • Example: Number of times rolling a die until getting a 6
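
The three PMFs above translate almost directly into code. The following is a minimal sketch using only the standard library; the function names are hypothetical and no input validation is attempted.

```python
from math import comb

def bernoulli_pmf(x, p):
    return p ** x * (1 - p) ** (1 - x)     # x is 0 or 1

def binomial_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    return (1 - p) ** (k - 1) * p          # k = 1, 2, 3, ...

print(bernoulli_pmf(1, 0.5))               # heads on a fair coin
print(binomial_pmf(5, 10, 0.5))            # exactly 5 heads in 10 flips
print(geometric_pmf(3, 1 / 6))             # first 6 appears on the third roll
```

A Binomial with n = 1 gives the same probabilities as the Bernoulli, which is one quick way to sanity-check the formulas.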

Advanced Discrete Distributions

  • Poisson distribution models number of events in fixed interval
    • Characterized by rate parameter λ (average number of events)
    • PMF: P(X = k) = (e^(-λ) λ^k) / k!
    • Example: Number of customers arriving at a store in one hour
  • Negative Binomial distribution generalizes Geometric distribution
    • Models number of failures before r successes occur
    • PMF: P(X = k) = C(k+r-1, r-1) * p^r * (1-p)^k
    • Example: Number of non-defective items inspected before finding 5 defective ones
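
As a rough check on the statement that the Negative Binomial generalizes the Geometric, the sketch below codes both PMFs from this section and compares them for r = 1. The function names and the value p = 0.2 are illustrative assumptions.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

def neg_binomial_pmf(k, r, p):
    """P(k failures before the r-th success)."""
    return comb(k + r - 1, r - 1) * p ** r * (1 - p) ** k

# With r = 1 the Negative Binomial counts failures before the first
# success, so it matches the Geometric PMF evaluated at k + 1 trials.
p = 0.2
for k in range(4):
    geometric = (1 - p) ** k * p           # (1-p)^((k+1)-1) * p
    print(k, neg_binomial_pmf(k, 1, p), geometric)

print(poisson_pmf(2, 3.0))                 # two arrivals when lam = 3
```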

Applications and Calculations

  • Quality control uses Binomial distribution (number of defects in a batch)
  • Rare event modeling employs Poisson distribution (number of earthquakes in a year)
  • Reliability testing utilizes Geometric distribution (number of trials until component failure)
  • Expected value for Binomial: E(X) = np
  • Variance for Binomial: Var(X) = np(1-p)
  • Expected value for Poisson: E(X) = λ
  • Variance for Poisson: Var(X) = λ
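
The closed-form moments above can be checked numerically by summing over the PMF. This is a sketch under assumed parameter values (n = 20, p = 0.3, λ = 4), with a hypothetical helper named moments.

```python
from math import comb, exp, factorial

def moments(pmf, support):
    """Mean and variance computed directly from a PMF."""
    mean = sum(k * pmf(k) for k in support)
    var = sum((k - mean) ** 2 * pmf(k) for k in support)
    return mean, var

n, p, lam = 20, 0.3, 4.0                   # illustrative values only
binom = lambda k: comb(n, k) * p ** k * (1 - p) ** (n - k)
poisson = lambda k: exp(-lam) * lam ** k / factorial(k)

binom_mean, binom_var = moments(binom, range(n + 1))
# The Poisson support is infinite; truncating at 100 terms leaves a
# negligible tail for lam = 4.
pois_mean, pois_var = moments(poisson, range(100))

print(binom_mean, n * p)                   # both close to 6.0
print(binom_var, n * p * (1 - p))          # both close to 4.2
print(pois_mean, pois_var)                 # both close to 4.0
```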

Common Continuous Distributions

Fundamental Continuous Distributions

  • Uniform distribution represents constant probability density over a finite interval
    • Defined by minimum (a) and maximum (b) values
    • PDF: f(x) = 1 / (b - a) for a ≤ x ≤ b
    • Example: Selecting a random number between 0 and 1
  • Normal (Gaussian) distribution characterized by bell-shaped curve
    • Defined by mean (μ) and standard deviation (σ) parameters
    • PDF: f(x) = (1 / (σ√(2π))) e^(-((x-μ)^2) / (2σ^2))
    • Example: Heights of adult males in a population
  • Exponential distribution models time between events in Poisson process
    • Characterized by rate parameter λ
    • PDF: f(x) = λe^(-λx) for x ≥ 0
    • Example: Time between customer arrivals at a bank
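
The densities above can be written as plain functions, and the link between the Exponential and the Poisson process can be checked numerically: the chance of waiting longer than t for the next event should equal the Poisson probability of zero events in an interval of length t. The sketch below assumes λ = 2 and t = 0.5, and uses a simple midpoint-rule integrator that is not part of the original material.

```python
from math import exp, sqrt, pi

def uniform_pdf(x, a, b):
    return 1 / (b - a) if a <= x <= b else 0.0

def normal_pdf(x, mu, sigma):
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def exponential_pdf(x, lam):
    return lam * exp(-lam * x) if x >= 0 else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint Riemann sum, good enough for a smooth density."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

lam, t = 2.0, 0.5
# P(wait > t) = 1 - integral of the Exponential density from 0 to t.
wait_longer = 1 - integrate(lambda x: exponential_pdf(x, lam), 0, t)
# Poisson PMF at k = 0 with mean lam * t.
zero_events = exp(-lam * t)
print(wait_longer, zero_events)
```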

Specialized Continuous Distributions

  • Gamma distribution generalizes Exponential distribution
    • Useful for modeling waiting times
    • Defined by shape (k) and scale (θ) parameters
    • PDF: f(x) = (x^(k-1) * e^(-x/θ)) / (θ^k * Γ(k))
    • Example: Total rainfall in a month
  • t-distribution resembles Normal but has heavier tails
    • Often used in hypothesis testing with small sample sizes
    • PDF: f(x) = Γ((ν+1)/2) / (√(νπ) Γ(ν/2)) * (1 + x^2/ν)^(-(ν+1)/2), with ν degrees of freedom
    • Example: Estimating population mean with small sample size
  • Chi-square distribution derived from sum of squared standard normal variables
    • Commonly used in goodness-of-fit tests
    • PDF: f(x) = (x^((k/2)-1) * e^(-x/2)) / (2^(k/2) * Γ(k/2))
    • Example: Testing independence in contingency tables
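
Two relationships among these densities are easy to verify in code: a Gamma with shape 1 is an Exponential, and a Chi-square with k degrees of freedom is a Gamma with shape k/2 and scale 2. The sketch below uses math.gamma for Γ; the function names and the evaluation point x = 1.7 are arbitrary choices for illustration.

```python
from math import exp, gamma

def gamma_pdf(x, k, theta):
    return x ** (k - 1) * exp(-x / theta) / (theta ** k * gamma(k))

def chi_square_pdf(x, k):
    return x ** (k / 2 - 1) * exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))

x = 1.7                                     # arbitrary evaluation point
# Shape k = 1 and scale theta = 1/lam recovers the Exponential density.
lam = 0.5
print(gamma_pdf(x, 1, 1 / lam), lam * exp(-lam * x))
# Chi-square with k degrees of freedom is Gamma(shape k/2, scale 2).
print(chi_square_pdf(x, 4), gamma_pdf(x, 2, 2))
```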

Advanced Distributions and Applications

  • F-distribution represents ratio of two independent chi-square variables, each divided by its degrees of freedom
    • Frequently applied in analysis of variance (ANOVA)
    • PDF involves complex formula with two degrees of freedom parameters
    • Example: Comparing variances of two populations
  • Weibull distribution models product life and failure rates
    • Characterized by shape (k) and scale (λ) parameters
    • PDF: f(x) = (k/λ) * (x/λ)^(k-1) * e^(-(x/λ)^k) for x ≥ 0
    • Example: Modeling wind speed distributions
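
As a small illustration of the Weibull density, the sketch below codes the formula and checks one special case: with shape k = 1 it reduces to an Exponential density with rate 1/λ. The function name and the scale value 2.0 are assumptions for this example.

```python
from math import exp

def weibull_pdf(x, k, lam):
    """Weibull density with shape k and scale lam, for x >= 0."""
    if x < 0:
        return 0.0
    return (k / lam) * (x / lam) ** (k - 1) * exp(-((x / lam) ** k))

# With shape k = 1 the density reduces to an Exponential with rate 1/lam.
scale = 2.0
for x in (0.5, 1.0, 3.0):
    print(x, weibull_pdf(x, 1, scale), (1 / scale) * exp(-x / scale))
```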

Probability Calculations for Distributions

Discrete Probability Calculations

  • Calculate probabilities for discrete distributions by summing individual outcomes
  • Use PMF for single point probabilities: P(X = k)
  • Use CDF for cumulative probabilities: P(X ≤ k)
  • Example: Probability of exactly 3 heads in 5 coin flips (Binomial)
    • P(X = 3) = C(5,3) * (0.5)^3 * (0.5)^2 = 0.3125
  • Example: Probability of at most 2 events in Poisson distribution with λ = 3
    • P(X ≤ 2) = e^(-3) (1 + 3 + (3^2/2!)) ≈ 0.4232
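
Both worked examples above can be reproduced with a few lines of standard-library Python. This is only a sketch of the arithmetic; the variable names are made up for illustration.

```python
from math import comb, exp, factorial

# Exactly 3 heads in 5 fair coin flips (Binomial PMF).
p_three_heads = comb(5, 3) * 0.5 ** 3 * 0.5 ** 2

# At most 2 events when lam = 3 (Poisson CDF as a sum of PMF terms).
lam = 3
p_at_most_two = sum(exp(-lam) * lam ** k / factorial(k) for k in range(3))

print(p_three_heads)                 # 0.3125
print(round(p_at_most_two, 4))       # 0.4232
```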

Continuous Probability Calculations

  • Calculate probabilities for continuous distributions by integrating PDF
  • Use CDF for probabilities of intervals: P(a ≤ X ≤ b) = F(b) - F(a)
  • Z-score transformation standardizes normal distributions
    • Z = (X - μ) / σ
    • Allows easy probability calculations using standard normal tables
  • Example: Probability of a value between 1 and 2 in Uniform(0,3) distribution
    • P(1 โ‰ค X โ‰ค 2) = (2 - 1) / (3 - 0) = 1/3
  • Example: Probability of value within one standard deviation of mean in Normal distribution
    • P(μ - σ ≤ X ≤ μ + σ) ≈ 0.6827 (using z-score and standard normal table)
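
The interval calculations above can be sketched with Python's statistics.NormalDist standing in for a standard normal table. The values μ = 170 and σ = 8 are placeholders chosen for illustration; any mean and standard deviation give the same one-sigma probability.

```python
from statistics import NormalDist

# Uniform(0, 3): the CDF is F(x) = (x - a) / (b - a) on [a, b].
a, b = 0, 3
uniform_cdf = lambda x: (x - a) / (b - a)
p_uniform = uniform_cdf(2) - uniform_cdf(1)

# Normal: standardize the endpoints, then use the standard normal CDF.
mu, sigma = 170, 8                    # illustrative values only
z_low = ((mu - sigma) - mu) / sigma   # -1.0
z_high = ((mu + sigma) - mu) / sigma  # +1.0
std_normal = NormalDist()             # mean 0, standard deviation 1
p_normal = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(p_uniform)                      # about 0.3333
print(round(p_normal, 4))             # 0.6827
```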

Quantile Calculations and Advanced Techniques

  • Calculate quantiles by inverting CDF; for continuous distributions, solve F(x) = q for x
  • Median found at 50th percentile, quartiles at 25th and 75th percentiles
  • For discrete distributions the CDF is a step function, so the q-quantile is usually taken as the smallest value x with F(x) ≥ q
  • Example: Finding median of Exponential distribution with λ = 0.5
    • Median = -ln(0.5) / λ ≈ 1.3863
  • Statistical software provides built-in functions for probability and quantile calculations
    • R functions (pnorm, qnorm for Normal distribution)
    • Python scipy.stats module (norm.cdf, norm.ppf for Normal distribution)
  • Understand relationships between distributions for appropriate calculation methods
    • t-distribution approaches Normal as degrees of freedom increase
    • Chi-square distribution related to Normal through sum of squares
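
Quantile calculations like the ones above can be sketched with the standard library alone. Here statistics.NormalDist.inv_cdf plays the role that qnorm and norm.ppf play in R and SciPy. The helper names are hypothetical, and the discrete example assumes the convention of taking the smallest value whose CDF reaches q.

```python
from math import log
from statistics import NormalDist

# Exponential: solve F(x) = 1 - e^(-lam*x) = q for x.
def exponential_quantile(q, lam):
    return -log(1 - q) / lam

median = exponential_quantile(0.5, 0.5)
print(round(median, 4))                       # 1.3863

# Normal: the inverse CDF maps a probability back to a z-value.
z = NormalDist()
print(round(z.inv_cdf(0.975), 2))             # 1.96
print(round(z.cdf(z.inv_cdf(0.25)), 6))       # 0.25 (round trip)

# Discrete: smallest face k of a fair die with P(X <= k) >= q.
def die_quantile(q):
    return next(k for k in range(1, 7) if k / 6 >= q)

print(die_quantile(0.5))                      # 3
```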