The hypergeometric distribution models experiments with a fixed number of trials and sampling without replacement. It's used when drawing from a population split into two groups, where each draw changes the odds for subsequent draws.
Calculations involve combinatorics to determine probabilities of specific outcomes. Unlike the binomial distribution, which assumes independent trials, the hypergeometric distribution accounts for the changing composition of the population after each draw.
Hypergeometric Distribution
Characteristics of hypergeometric experiments
- The number of trials (the sample size) is fixed in advance
- The population is divided into two distinct groups
  - The group of interest contains the items counted as successes (red marbles in a jar)
  - The second group contains the remaining items (blue marbles in a jar)
- Sampling is without replacement, so each item can be selected at most once
- The probability of success on the next draw, given the earlier draws, changes as the population composition changes (drawing marbles from a jar without replacing them)
- Trials are not independent, since the outcome of one trial affects the probabilities of subsequent trials
  - Selecting an item from one group leaves fewer items in that group for later draws (drawing a red marble lowers the chance that the next marble is red)
  - Group sizes and probabilities are updated after each draw (fewer red marbles remain after drawing one)
- The population is finite, with fixed group sizes; this is what makes each draw change the composition of what remains, so the hypergeometric distribution applies
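The changing conditional probabilities can be traced draw by draw. The following is a minimal Python sketch, assuming a hypothetical jar of 5 red and 5 blue marbles; the helper `sequence_probability` and its `"R"`/`"B"` encoding are made up for illustration and are not from any library. Exact fractions keep the arithmetic readable.

```python
from fractions import Fraction


def sequence_probability(red: int, blue: int, draws: str) -> tuple[list[Fraction], Fraction]:
    """Hypothetical helper: draw marbles one at a time without replacement.

    `draws` is an ordered string such as "RRB" (R = red, B = blue).
    Returns the probability of each draw given the draws before it, and
    the probability of the whole ordered sequence (their product).
    If a draw is impossible, it stops there and the total is 0.
    """
    steps: list[Fraction] = []
    total = Fraction(1)
    for colour in draws:
        remaining = red + blue
        if remaining == 0:
            raise ValueError("more draws than marbles in the jar")
        if colour == "R":
            p = Fraction(red, remaining)
            red -= 1  # the drawn red marble is not put back
        elif colour == "B":
            p = Fraction(blue, remaining)
            blue -= 1  # the drawn blue marble is not put back
        else:
            raise ValueError(f"unknown colour {colour!r}")
        steps.append(p)
        total *= p
        if p == 0:
            break  # impossible draw, so the whole sequence has probability 0
    return steps, total


# Hypothetical jar: 5 red and 5 blue marbles; draw red, red, then blue.
steps, total = sequence_probability(5, 5, "RRB")
print([str(p) for p in steps], total)  # ['1/2', '4/9', '5/8'] 5/36
```

Each ordering with two red marbles and one blue (RRB, RBR, BRR) uses the same numerators and denominators in a different order, so each has probability 5/36. Their total, 15/36 = 5/12, matches the hypergeometric formula with $N = 10$, $K = 5$, $n = 3$, $k = 2$: ${5 \choose 2}{5 \choose 1} / {10 \choose 3} = 50/120 = 5/12$.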
Hypergeometric distribution calculations
- Hypergeometric distribution formula: $P(X = k) = \frac{{K \choose k}{N-K \choose n-k}}{{N \choose n}}$
  - $N$: total population size (sum of both group sizes)
  - $K$: size of the group of interest
  - $n$: number of items sampled (fixed sample size)
  - $k$: number of successes (items from the group of interest) in the sample
- Calculate the probability of exactly $k$ successes using the formula; only values with $\max(0, n-(N-K)) \le k \le \min(n, K)$ are possible, and every other value of $k$ has probability 0
  - Numerator: ${K \choose k}{N-K \choose n-k}$ is the number of ways to select $k$ items from the group of interest and $n-k$ items from the second group (choosing 2 red marbles and 3 blue marbles from a jar)
  - Denominator: ${N \choose n}$ is the total number of ways to select $n$ items from the entire population (choosing 5 marbles from a jar)
- Sum the probabilities of individual outcomes to find the probability of a range of successes (probability of drawing 2 to 4 red marbles: $P(2 \le X \le 4) = P(X=2) + P(X=3) + P(X=4)$)
- The calculation rests on combinatorics: every sample of $n$ items is equally likely, so the probability is the number of favourable samples divided by the number of all possible samples
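The formula and the range sum translate directly into code. Here is a minimal sketch using Python's standard `math.comb` and `fractions.Fraction`; the function names `hypergeom_pmf` and `hypergeom_range` are hypothetical, and the jar of 6 red and 4 blue marbles is an assumed example, since the notes do not give the jar's contents.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> Fraction:
    """P(X = k) for a sample of n items drawn without replacement from a
    population of N items, K of which belong to the group of interest."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    # Outside this range the sample would need more items from a group
    # than that group contains, so the probability is 0.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return Fraction(0)
    # Favourable samples divided by all equally likely samples of size n.
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))


def hypergeom_range(lo: int, hi: int, N: int, K: int, n: int) -> Fraction:
    """P(lo <= X <= hi), found by summing the individual probabilities."""
    return sum((hypergeom_pmf(k, N, K, n) for k in range(lo, hi + 1)), Fraction(0))


# Hypothetical jar: 6 red (group of interest) and 4 blue marbles; draw 5.
print(hypergeom_pmf(2, N=10, K=6, n=5))       # 5/21
print(hypergeom_range(2, 4, N=10, K=6, n=5))  # 20/21
```

Here $P(X = 2) = 15 \cdot 4 / 252 = 5/21$ and $P(2 \le X \le 4) = (60 + 120 + 60)/252 = 20/21$. Returning exact fractions rather than floats is a design choice that makes the results easy to check by hand; `float(...)` converts them when a decimal is wanted.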
Hypergeometric vs binomial distributions
- Binomial distribution assumes independent trials with the same probability of success; sampling with replacement is one way to meet this assumption
  - Fixed number of trials, each with two possible outcomes (success or failure)
  - Probability of success is the same for each trial (flipping a fair coin)
- Hypergeometric distribution involves sampling without replacement, making trials dependent on each other
  - Fixed sample size drawn from a population divided into two groups (drawing cards from a deck)
  - Probability of success varies based on the changing composition of the population (fewer aces remain after drawing one)
- Impact of sampling without replacement on the hypergeometric distribution
  - The probability of success on the next draw changes as items are removed from the population
  - Group sizes and probabilities are updated after each draw (fewer hearts remain after drawing one)
  - Lack of independence between trials affects the probability calculations (drawing a heart on the first draw lowers the chance that the second card is also a heart)
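The difference between the two models shows up clearly with a deck of cards. This sketch assumes a standard 52-card deck with 4 aces and a 5-card hand (the hand size is an illustrative choice). With replacement, every draw has the same $4/52 = 1/13$ chance of an ace, so the number of aces is binomial; without replacement it is hypergeometric. The helpers `hypergeom_pmf` and `binom_pmf` are hypothetical names defined here, not library functions.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> Fraction:
    """Exactly k successes in n draws WITHOUT replacement (assumes 0 <= k <= n)."""
    # math.comb returns 0 when asked for more items than a group holds.
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))


def binom_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """Exactly k successes in n independent draws WITH replacement."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Standard deck: N = 52 cards, K = 4 aces; hypothetical hand of n = 5 cards.
N, K, n = 52, 4, 5
p = Fraction(K, N)  # chance of an ace on any single draw with replacement

h = hypergeom_pmf(1, N, K, n)  # exactly one ace, cards not returned
b = binom_pmf(1, n, p)         # exactly one ace, each card returned
print(f"{float(h):.4f} {float(b):.4f}")  # 0.2995 0.2792
```

The two probabilities for exactly one ace differ only because of whether each drawn card is returned to the deck before the next draw.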
Additional Concepts
- Conditional probability is built into the hypergeometric distribution: the probability of success on each draw, given the previous draws, depends on their outcomes
- The hypergeometric distribution is a discrete probability distribution, meaning it deals with countable, distinct outcomes
- When the population is divided into more than two groups, the distribution extends to the multivariate hypergeometric distribution, which gives the probability of drawing a given number of items from each group
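With $m$ groups of sizes $K_1, \dots, K_m$ (so $N = \sum_i K_i$) and a sample containing $k_i$ items from group $i$ (so $n = \sum_i k_i$), the same counting argument gives $P(X_1 = k_1, \dots, X_m = k_m) = \frac{\prod_{i=1}^{m} {K_i \choose k_i}}{{N \choose n}}$. The notation $m$, $K_i$, $k_i$ extends the two-group formula and is introduced here for illustration. Below is a minimal Python sketch; the function name `multivariate_hypergeom_pmf` and the three-colour jar (5 red, 4 blue, 3 green) are hypothetical. The last lines also illustrate that, as a discrete distribution, its probabilities over every possible outcome add up to exactly 1.

```python
from fractions import Fraction
from math import comb, prod


def multivariate_hypergeom_pmf(group_sizes: list[int], counts: list[int]) -> Fraction:
    """Hypothetical helper: probability of drawing exactly counts[i] items from
    group i when sum(counts) items are drawn without replacement."""
    if len(group_sizes) != len(counts):
        raise ValueError("need one count per group")
    if any(k < 0 or k > K for k, K in zip(counts, group_sizes)):
        return Fraction(0)  # impossible draw
    N = sum(group_sizes)  # total population size
    n = sum(counts)       # sample size
    # Ways to pick counts[i] items from each group, over all samples of size n.
    ways = prod(comb(K, k) for K, k in zip(group_sizes, counts))
    return Fraction(ways, comb(N, n))


# Hypothetical jar: 5 red, 4 blue, 3 green; draw 4 marbles.
print(multivariate_hypergeom_pmf([5, 4, 3], [2, 1, 1]))  # 8/33
# With only two groups this reduces to the ordinary hypergeometric formula.
print(multivariate_hypergeom_pmf([6, 4], [2, 3]))  # 5/21

# Discrete distribution: probabilities over all possible count vectors sum to 1.
total = sum(
    multivariate_hypergeom_pmf([5, 4, 3], [a, b, 4 - a - b])
    for a in range(5)
    for b in range(5 - a)
)
print(total)  # 1
```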