Fiveable

🎲 Data Science Statistics Unit 4 Review

4.3 Hypergeometric and Negative Binomial Distributions

Written by the Fiveable Content Team • Last updated September 2025

Hypergeometric and negative binomial distributions are two core discrete probability distributions. The hypergeometric counts successes in a sample drawn without replacement from a finite population; the negative binomial counts the failures (or, equivalently, the trials) needed to reach a fixed number of successes.

These distributions build on the binomial and geometric distributions: the hypergeometric is the without-replacement counterpart of the binomial, and the negative binomial generalizes the geometric from the first success to the rth. Both are used in quality control, epidemiology, and other applied fields.

Hypergeometric Distribution

Sampling Without Replacement and Combinatorial Notation

  • Hypergeometric distribution models probability of k successes in n draws without replacement from a finite population
  • Sampling without replacement alters probability of success with each draw
  • Uses combinatorial notation to calculate number of ways to select items from a population
  • Probability mass function expressed as $P(X=k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$
  • N represents total population size, K denotes number of success states in population
  • n indicates sample size, k signifies number of observed successes in sample
  • Applies to scenarios with fixed population size and known number of success states
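
The PMF above can be evaluated directly from binomial coefficients. The following is a minimal sketch using only Python's standard library; the function name `hypergeom_pmf` and the example numbers are illustrative assumptions, not part of the original guide.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k successes in n draws without replacement
    from a population of N items that contains K successes."""
    # Outside the support the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Assumed example: 10 items, 4 of them successes, draw 3, ask for exactly 1.
print(hypergeom_pmf(1, N=10, K=4, n=3))  # 4 * 15 / 120 = 0.5
```

Summing the function over k = 0, ..., n should give 1, which is a quick sanity check on any chosen N, K, and n.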

Applications in Quality Control

  • Widely used in manufacturing for lot acceptance sampling
  • Determines probability of accepting or rejecting a batch based on sample inspection
  • Helps optimize sampling plans to balance cost and quality assurance
  • Used in defect detection (identifying number of defective items in a production run)
  • Assists in inventory management (estimating number of specific items in a warehouse)
  • Employed in auditing (determining number of errors in financial records)
  • Useful for ecological studies (estimating animal population sizes through capture-recapture methods)
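
As a sketch of lot acceptance sampling, the probability of accepting a batch is the hypergeometric probability that the sample contains at most c defective items. The plan below (lot size, defect count, sample size, and acceptance number c) is a hypothetical example chosen for illustration, and the helper names are assumptions.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k defectives in a sample of n from a lot of N with K defectives)."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def acceptance_probability(N, K, n, c):
    """Probability that the sample contains at most c defectives (lot accepted)."""
    return sum(hypergeom_pmf(k, N, K, n) for k in range(c + 1))

# Hypothetical plan: lot of 100 with 5 defectives, inspect 10,
# accept only if no defective item is found.
p_accept = acceptance_probability(N=100, K=5, n=10, c=0)
print(round(p_accept, 4))  # roughly 0.58
```

Raising the sample size n or lowering the acceptance number c makes the plan stricter, which is the cost versus assurance trade-off mentioned in the list above.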

Negative Binomial Distribution

Modeling Number of Failures Before rth Success

  • Negative binomial distribution describes probability of observing x failures before rth success
  • Extends concept of geometric distribution which models trials until first success
  • Probability mass function given by $P(X=x) = \binom{x+r-1}{x}p^r(1-p)^x$
  • p represents probability of success on each trial
  • r denotes number of successes desired
  • x indicates number of failures observed before rth success
  • Assumes independent trials with constant probability of success
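
A minimal sketch of this PMF in standard-library Python follows. The function name `neg_binom_pmf` and the example values are assumptions for illustration; the parameterization matches the one used here (x counts failures before the rth success).

```python
from math import comb

def neg_binom_pmf(x: int, r: int, p: float) -> float:
    """P(X = x): x failures before the r-th success,
    with success probability p on each independent trial."""
    if x < 0:
        return 0.0
    return comb(x + r - 1, x) * p**r * (1 - p)**x

print(neg_binom_pmf(1, r=2, p=0.5))  # C(2,1) * 0.25 * 0.5 = 0.25
print(neg_binom_pmf(2, r=1, p=0.5))  # r = 1 is the geometric case: 0.5 * 0.25 = 0.125
```

With r = 1 the binomial coefficient is always 1, so the formula reduces to the geometric distribution counting failures before the first success.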

Comparison with Binomial Distribution

  • Binomial distribution focuses on number of successes in fixed number of trials
  • Negative binomial distribution models number of failures before a fixed number of successes r (the total number of trials is then x + r)
  • Binomial has fixed number of trials, negative binomial has random number of trials
  • Both assume independent trials with the same constant success probability p (unlike the hypergeometric, where the success probability changes with each draw)
  • Binomial uses n choose k notation, negative binomial uses x+r-1 choose x
  • Negative binomial can be seen as a mixture of Poisson distributions whose rate parameter follows a gamma distribution
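
The two distributions describe the same sequence of trials from different angles: the rth success arrives with at most x failures exactly when the first x + r trials contain at least r successes. The sketch below checks this identity numerically; the helper names and the values of r, p, and x are assumptions chosen for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def neg_binom_pmf(x, r, p):
    """P(exactly x failures before the r-th success)."""
    return comb(x + r - 1, x) * p**r * (1 - p)**x

def neg_binom_cdf(x, r, p):
    """P(at most x failures before the r-th success)."""
    return sum(neg_binom_pmf(j, r, p) for j in range(x + 1))

def binom_tail(r, n, p):
    """P(at least r successes in n trials)."""
    return sum(binom_pmf(k, n, p) for k in range(r, n + 1))

r, p, x = 3, 0.3, 5  # assumed example values
lhs = neg_binom_cdf(x, r, p)
rhs = binom_tail(r, x + r, p)
print(lhs, rhs)  # the two probabilities agree up to floating-point error
```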

Applications in Epidemiology and Beyond

  • Models disease outbreaks (number of susceptible individuals infected before outbreak ends)
  • Used in insurance for modeling number of claims until certain total is reached
  • Applies to marketing (number of sales calls until quota is met)
  • Utilized in reliability engineering (number of failures before system replacement)
  • Helps in project management (tasks completed before milestone achieved)
  • Employed in sports analytics (at-bats before hitting home run)
  • Useful in customer behavior analysis (purchases before customer becomes loyal)
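
To make one of these applications concrete, the sketch below simulates the sales-call scenario: calls are made until a quota of r sales is met, and the number of unsuccessful calls is recorded. The quota, the per-call success probability, the seed, and the function name are all hypothetical choices for illustration.

```python
import random

def failures_before_rth_success(r, p, rng):
    """Run independent trials until the r-th success; return the failure count."""
    successes = 0
    failures = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

rng = random.Random(0)  # fixed seed so runs are repeatable
r, p = 3, 0.2           # hypothetical: quota of 3 sales, 20% chance per call
samples = [failures_before_rth_success(r, p, rng) for _ in range(20_000)]

mean_failures = sum(samples) / len(samples)
theoretical_mean = r * (1 - p) / p
print(mean_failures, theoretical_mean)  # simulated mean vs. r(1-p)/p
```

The simulated average should land close to the theoretical value r(1-p)/p, with the gap shrinking as the number of simulated runs grows.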

Properties of Hypergeometric and Negative Binomial Distributions

Probability Mass Functions and Expected Values

  • Hypergeometric PMF: $P(X=k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$
  • Negative binomial PMF: $P(X=x) = \binom{x+r-1}{x}p^r(1-p)^x$
  • Expected value for hypergeometric: $E(X) = n\frac{K}{N}$
  • Expected value for negative binomial: $E(X) = \frac{r(1-p)}{p}$
  • Both distributions are discrete and supported on non-negative integers (hypergeometric: max(0, n-(N-K)) ≤ k ≤ min(n, K); negative binomial: x = 0, 1, 2, ...)
  • Hypergeometric expectation proportional to sample size and success proportion
  • Negative binomial expectation inversely related to success probability
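
The expected-value formulas can be checked by summing k times the PMF over the support. This is a minimal sketch with assumed parameter values; the negative binomial sum is truncated at 500 terms, which is an approximation that is adequate here because the tail probabilities are negligible for the chosen p.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def neg_binom_pmf(x, r, p):
    return comb(x + r - 1, x) * p**r * (1 - p)**x

# Hypergeometric: assumed population of 50 with 10 successes, sample of 8.
N, K, n = 50, 10, 8
mean_h = sum(k * hypergeom_pmf(k, N, K, n) for k in range(n + 1))
print(mean_h, n * K / N)            # PMF-based mean vs. nK/N

# Negative binomial: assumed r = 4 successes with p = 0.4.
r, p = 4, 0.4
mean_nb = sum(x * neg_binom_pmf(x, r, p) for x in range(500))  # truncated sum
print(mean_nb, r * (1 - p) / p)     # PMF-based mean vs. r(1-p)/p
```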

Variance and Higher Moments

  • Variance for hypergeometric: $Var(X) = n\frac{K}{N}\frac{N-K}{N}\frac{N-n}{N-1}$
  • Variance for negative binomial: $Var(X) = \frac{r(1-p)}{p^2}$
  • Hypergeometric variance affected by finite population correction factor
  • Negative binomial variance is greater than its mean whenever 0 < p < 1, since Var(X) = E(X)/p (overdispersion)
  • Skewness and kurtosis can be derived for both distributions
  • Moment generating functions useful for deriving higher moments
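  • Both distributions are approximately normal in suitable large-sample regimes (for example, the negative binomial as r grows, since it is a sum of r independent geometric counts)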
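
The variance formulas above can be explored with a few lines of Python. The sketch below, with assumed parameter values and illustrative function names, shows that the hypergeometric variance equals the binomial variance np(1-p) with p = K/N multiplied by the finite population correction (N-n)/(N-1), and that the negative binomial variance-to-mean ratio equals 1/p.

```python
def hypergeom_var(N, K, n):
    """n * (K/N) * ((N-K)/N) * ((N-n)/(N-1))"""
    return n * (K / N) * ((N - K) / N) * ((N - n) / (N - 1))

def binom_var(n, p):
    """Variance of a binomial count: n * p * (1 - p)"""
    return n * p * (1 - p)

def neg_binom_mean(r, p):
    return r * (1 - p) / p

def neg_binom_var(r, p):
    return r * (1 - p) / p**2

# Assumed example: population of 50 with 10 successes, sample of 8.
N, K, n = 50, 10, 8
fpc = (N - n) / (N - 1)  # finite population correction factor
print(hypergeom_var(N, K, n), binom_var(n, K / N) * fpc)  # same quantity, two routes

# Assumed example: r = 4, p = 0.4.
ratio = neg_binom_var(4, 0.4) / neg_binom_mean(4, 0.4)
print(ratio, 1 / 0.4)  # variance-to-mean ratio vs. 1/p
```

Because the correction factor is below 1 whenever n > 1, sampling without replacement yields a smaller variance than the corresponding binomial model.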