📊Probability and Statistics Unit 6 Review

6.1 Simple random sampling

Written by the Fiveable Content Team • Last updated September 2025

Simple random sampling is a fundamental technique in probability and statistics. It involves selecting a subset of individuals from a population so that every member has an equal chance of being chosen and, more precisely, every possible subset of the chosen size is equally likely. This guards against selection bias and yields samples that are representative on average, making it a cornerstone of statistical research.

While simple random sampling offers advantages like ease of implementation and unbiased selection, it also has limitations. It may not capture population diversity in heterogeneous groups and can be inefficient for large populations. Understanding these pros and cons is crucial for effective statistical analysis and research design.

Definition of simple random sampling

Simple random sampling (SRS) is a method of selecting a subset of individuals from a population
In SRS, every possible sample of size $n$ is equally likely, so each member of the population has an equal chance of being included in the sample
SRS is a probability sampling technique, meaning it relies on randomization to select the sample

Advantages of simple random sampling

SRS is a straightforward and intuitive sampling method that is easy to understand and implement
When properly conducted, SRS minimizes bias in the selection process, so the sample is representative of the population on average, although any single sample can still differ from it by chance

Ease of sampling

SRS does not require prior knowledge of the population's characteristics or subgroups
Sampling can be performed using readily available tools, such as a random number generator applied to a numbered list of the population
The simplicity of SRS makes it an attractive choice for researchers with limited resources or expertise

Minimization of bias

By giving each unit an equal probability of selection, SRS reduces the potential for bias in the sampling process
Randomization helps to ensure that the sample is representative of the population, without favoring any particular subgroups
Minimizing bias is crucial for obtaining accurate and reliable estimates from the sample data

Disadvantages of simple random sampling

While SRS has several advantages, it also has some limitations that researchers should be aware of
These disadvantages can affect the representativeness and efficiency of the sampling process

Lack of representativeness

SRS does not guarantee that all relevant subgroups within the population will be adequately represented in the sample
If the population is heterogeneous, with distinct subgroups, SRS may result in a sample that does not capture this diversity
Stratified sampling techniques can be used to address this issue by ensuring representation of important subgroups

Inefficiency with large populations

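As the population size increases, the required sample size barely changes, but the practical cost of SRS rises because a complete sampling frame must be built and every selected unit must be reached individually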
Sampling from a large population can be time-consuming and costly, especially if the population is geographically dispersed
Cluster sampling or multi-stage sampling may be more efficient alternatives for large populations

Sampling frame in simple random sampling

The sampling frame is a list or database that contains all the units in the population from which the sample will be drawn
In SRS, the sampling frame should be complete, up-to-date, and free from duplicates or ineligible units
Examples of sampling frames include voter registration lists, customer databases, or school enrollment records
The quality of the sampling frame directly impacts the representativeness and accuracy of the sample

Probability of selection in simple random sampling

In SRS, each unit in the population has an equal probability of being selected for the sample
This equal probability of selection is a defining characteristic of SRS and contributes to its unbiased nature

Equal probability for all units

If the population size is $N$ and the sample size is $n$, then the probability of selection for each unit is $\frac{n}{N}$
For example, if a population has 1000 units and a sample of 100 is selected, each unit has a probability of selection of $\frac{100}{1000} = 0.1$
Equal probability of selection ensures that no unit is favored or disadvantaged in the sampling process
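
The equal-probability claim can be checked empirically. The following is a minimal Python sketch, not part of the original guide: it repeatedly draws samples without replacement from a hypothetical population of 20 numbered units and records how often one particular unit is included. The population size, sample size, seed, and function name are illustrative assumptions.

```python
import random

def inclusion_frequency(N, n, unit, trials, seed=0):
    """Estimate how often `unit` lands in an SRS of size n from range(N)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # random.sample draws n distinct items, i.e. without replacement
        if unit in rng.sample(range(N), n):
            hits += 1
    return hits / trials

# Hypothetical population of 20 units, samples of size 5
freq = inclusion_frequency(N=20, n=5, unit=7, trials=20_000)
print(f"observed {freq:.3f}, theoretical {5 / 20:.3f}")
```

The observed frequency should land close to $\frac{n}{N} = 0.25$, and the same holds for any other choice of `unit`.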

Calculation of selection probability

The selection probability can be calculated using the formula $P(\text{selection}) = \frac{n}{N}$
This formula assumes sampling without replacement, meaning that once a unit is selected, it is not returned to the population
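If sampling with replacement is used, each unit has probability $\frac{1}{N}$ on every draw, and the probability that a given unit appears at least once in the sample is $1 - (1 - \frac{1}{N})^n$, which is slightly less than $\frac{n}{N}$ when $n > 1$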
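
Why the without-replacement probability equals $\frac{n}{N}$ can be seen by counting samples. As a short worked derivation, assuming all $\binom{N}{n}$ possible samples are equally likely: a given unit belongs to $\binom{N-1}{n-1}$ of them, because the remaining $n - 1$ places are filled from the other $N - 1$ units. Therefore

$$P(\text{selection}) = \frac{\binom{N-1}{n-1}}{\binom{N}{n}} = \frac{(N-1)!}{(n-1)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{N!} = \frac{n}{N}$$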

Sampling with vs without replacement

In SRS, sampling can be performed either with or without replacement
Sampling with replacement means that after a unit is selected, it is returned to the population and can be selected again
Sampling without replacement means that once a unit is selected, it is removed from the population and cannot be selected again
Sampling without replacement is more common in practice, as it avoids the possibility of selecting the same unit multiple times
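
The difference between the two schemes can be illustrated with Python's standard `random` module. This is a sketch under assumed values (a hypothetical population of 10 numbered units, an arbitrary seed, and illustrative helper names), not a prescribed procedure.

```python
import random

rng = random.Random(1)            # fixed seed, chosen arbitrarily
population = list(range(1, 11))   # hypothetical units numbered 1..10

# Without replacement: random.sample never repeats a unit
without_repl = rng.sample(population, 5)

# With replacement: random.choices draws independently, so repeats can occur
with_repl = rng.choices(population, k=5)

def inclusion_prob_without(N, n):
    """Chance a given unit is in a sample of size n drawn without replacement."""
    return n / N

def inclusion_prob_with(N, n):
    """1 minus the chance of being missed on all n independent draws."""
    return 1 - (1 - 1 / N) ** n

print(without_repl, with_repl)
print(inclusion_prob_without(1000, 100))          # 0.1
print(round(inclusion_prob_with(1000, 100), 4))   # 0.0952
```

With replacement, some draws are "spent" on repeats, which is why the chance of a given unit appearing at least once is a little lower than $\frac{n}{N}$.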

Sample size determination

Determining the appropriate sample size is a crucial step in SRS, as it directly impacts the precision and reliability of the estimates
Several factors should be considered when determining the sample size, including the desired precision, confidence level, and population size

Desired precision and confidence level

The desired precision refers to the acceptable margin of error in the estimates, usually expressed as a percentage (e.g., ±5%)
The confidence level is the long-run proportion of intervals, built by the same procedure from repeated samples, that would contain the true population parameter (e.g., 95% confidence level)
A smaller margin of error or a higher confidence level will require a larger sample size

Population size considerations

The population size also plays a role in determining the sample size, especially when the population is small or the sampling fraction ($\frac{n}{N}$) is large
As the population size increases, the impact of population size on the required sample size diminishes
For large populations, the sample size is primarily determined by the desired precision and confidence level
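
The guide does not give a sample size formula, so the sketch below rests on an assumption: it uses the common normal-approximation formula for estimating a proportion, $n_0 = \frac{z^2 p(1-p)}{e^2}$ with the worst case $p = 0.5$, followed by the usual finite-population adjustment $n = \frac{n_0}{1 + (n_0 - 1)/N}$. The function and parameter names are hypothetical.

```python
import math

def sample_size_for_proportion(margin, z=1.96, p=0.5, population=None):
    """Approximate sample size for a proportion at the given margin of error.

    z=1.96 corresponds to roughly 95% confidence; p=0.5 is the most
    conservative guess. Both defaults are assumptions of this sketch.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite-population adjustment: matters only when n0 is not small
        # relative to the population size
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

for N in (1_000, 100_000, 10_000_000):
    print(N, sample_size_for_proportion(0.05, population=N))
print("no population limit:", sample_size_for_proportion(0.05))
```

For a ±5% margin the result is 278 when the population is 1,000 and levels off at about 385 as the population grows, which is the diminishing effect of population size described above.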

Selecting a simple random sample

Once the sampling frame and sample size have been determined, the next step is to select the units for the sample
The standard method uses a random number generator; systematic selection from a list is a common practical substitute that approximates SRS

Use of random number generators

Random number generators can be used to select a sample by assigning a unique number to each unit in the sampling frame
The random number generator then selects a set of numbers, and the corresponding units are included in the sample
This method ensures that each unit has an equal probability of selection and minimizes the potential for human bias
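
The numbering-and-drawing procedure can be sketched in a few lines of Python. The frame contents, the seed, and the function name below are hypothetical placeholders; `random.sample` from the standard library does the drawing without replacement.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, all subsets equally likely."""
    rng = random.Random(seed)   # pass a seed to make the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame of 50 labelled units
frame = [f"unit-{i:03d}" for i in range(1, 51)]
sample = simple_random_sample(frame, 5, seed=42)
print(sample)
```

Recording the seed alongside the frame lets someone else reproduce and audit the selection later.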

Systematic selection from a list

Systematic selection involves choosing every $k$-th unit from a list, where $k$ is the sampling interval ($k = \frac{N}{n}$)
A random starting point is selected between 1 and $k$, and then every $k$-th unit is selected from the list
This method is simple to implement and can be used when a random number generator is not available
However, only $k$ different samples are possible under systematic selection, so it is not a true SRS, and it may introduce bias if there is a hidden pattern in the list that coincides with the sampling interval
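
The every-$k$-th-unit procedure can be sketched as follows. This minimal Python version assumes that $N$ is an exact multiple of $n$ so that $k = \frac{N}{n}$ is a whole number; the frame and function name are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start among the first k units, then every k-th unit."""
    N = len(frame)
    k = N // n                    # sampling interval (assumes N is a multiple of n)
    rng = random.Random(seed)
    start = rng.randrange(k)      # 0-based index, i.e. position 1..k in the list
    return frame[start::k][:n]

frame = list(range(1, 101))       # hypothetical list of 100 numbered units
picked = systematic_sample(frame, 10, seed=3)
print(picked)
```

Whatever the seed, the selected units are always exactly $k$ positions apart, which is why a periodic pattern in the list with the same period can distort the sample.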

Estimation using simple random sampling

Once the sample has been selected and the data collected, the next step is to use the sample data to estimate population parameters
SRS allows for the estimation of various population parameters, such as means and totals, along with their associated confidence intervals

Population mean and total estimation

The sample mean ($\bar{x}$) is used to estimate the population mean ($\mu$), and the population total ($\tau$) is estimated by scaling the sample mean up by the population size
The population mean is estimated using the formula $\bar{x} = \frac{\sum x}{n}$, where $\sum x$ is the sum of the sample values and $n$ is the sample size
The population total is estimated using the formula $\hat{\tau} = N \bar{x}$, where $N$ is the population size
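
As a small worked example of the two formulas, the Python sketch below uses made-up sample values and an assumed population size of 200; the function name is hypothetical.

```python
from statistics import mean

def estimate_mean_and_total(sample_values, population_size):
    """Return (x_bar, tau_hat) where tau_hat = N * x_bar."""
    x_bar = mean(sample_values)
    tau_hat = population_size * x_bar
    return x_bar, tau_hat

values = [12, 15, 9, 14, 10]      # made-up measurements from 5 sampled units
x_bar, tau_hat = estimate_mean_and_total(values, population_size=200)
print(x_bar, tau_hat)             # 12 2400
```

The raw sample sum here is 60, which describes only the 5 sampled units; multiplying the mean by $N$ is what scales the estimate to the whole population.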

Confidence intervals for estimates

Confidence intervals provide a range of plausible values for the population parameter, based on the sample data and the desired confidence level
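For the population mean, the large-sample confidence interval is calculated as $\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$, where $z_{\alpha/2}$ is the critical value for the desired confidence level and $s$ is the sample standard deviation; for small samples from an approximately normal population, the critical value is taken from the $t$ distribution with $n - 1$ degrees of freedom instead
For the population total, the confidence interval is calculated as $\hat{\tau} \pm z_{\alpha/2} N \frac{s}{\sqrt{n}}$, which is the interval for the mean multiplied by $N$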
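
The interval formulas can be turned into a short Python sketch using only the standard library. The data are simulated and the population size of 2,000 is an assumption made for illustration; the sketch uses the normal critical value, so it suits reasonably large samples and ignores any finite population correction.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def mean_ci(values, confidence=0.95):
    """Large-sample interval: x_bar +/- z * s / sqrt(n)."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)             # sample standard deviation (divisor n - 1)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def total_ci(values, population_size, confidence=0.95):
    """Interval for the total: the interval for the mean scaled by N."""
    lo, hi = mean_ci(values, confidence)
    return population_size * lo, population_size * hi

rng = random.Random(0)
values = [rng.gauss(50, 10) for _ in range(50)]   # simulated sample, n = 50
print(mean_ci(values))
print(total_ci(values, population_size=2_000))
```

Raising `confidence` widens both intervals, while a larger sample narrows them in proportion to $\frac{1}{\sqrt{n}}$.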

Variance of estimates in simple random sampling

The variance of the sample estimates is an important measure of their precision and reliability
SRS allows for the calculation of the variance of the sample mean and the population total estimate

Variance of sample mean

When sampling with replacement, or when the sample is a small fraction of the population, the variance of the sample mean is given by $\text{Var}(\bar{x}) = \frac{\sigma^2}{n}$, where $\sigma^2$ is the population variance
In practice, the population variance is usually unknown, so the sample variance $s^2$ is used as an estimate, giving $\text{Var}(\bar{x}) \approx \frac{s^2}{n}$
A larger sample size will result in a smaller variance, indicating greater precision in the estimate

Variance of population total estimate

When the sampling fraction is small, the variance of the population total estimate is given by $\text{Var}(\hat{\tau}) = N^2 \frac{\sigma^2}{n}$
Again, the sample variance $s^2$ is used as an estimate of the population variance, giving $\text{Var}(\hat{\tau}) \approx N^2 \frac{s^2}{n}$
The variance of the population total estimate is influenced by both the sample size and the population size

Finite population correction factor

The finite population correction factor (fpc) is an adjustment applied to the variance of sample estimates when the sampling fraction ($\frac{n}{N}$) is large
The fpc accounts for the fact that when a significant portion of the population is sampled, there is less uncertainty in the estimates

When to apply correction factor

The fpc is typically applied when the sampling fraction exceeds 5% ($\frac{n}{N} > 0.05$)
If the sampling fraction is small, the fpc has a negligible impact on the variance and can be omitted
The decision to apply the fpc depends on the specific context and the desired level of precision

Impact on variance of estimates

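When the fpc is applied, the variance of the sample mean becomes $\text{Var}(\bar{x}) = \frac{\sigma^2}{n} (1 - \frac{n}{N})$, where $\sigma^2$ is the population variance computed with divisor $N - 1$ (with divisor $N$, the factor is $\frac{N - n}{N - 1}$ instead, which is nearly the same for large $N$)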
Similarly, the variance of the population total estimate becomes $\text{Var}(\hat{\tau}) = N^2 \frac{\sigma^2}{n} (1 - \frac{n}{N})$
The fpc reduces the variance of the estimates, reflecting the increased precision due to the larger sampling fraction
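
The size of the correction is easy to see numerically. The following Python sketch plugs an assumed sample variance of 25 and a sample size of 100 into the formulas above for several population sizes; the function names and figures are illustrative only.

```python
def variance_of_mean(s2, n, N=None):
    """Estimated Var(x_bar) = s2 / n, times (1 - n/N) when N is supplied."""
    var = s2 / n
    if N is not None:
        var *= 1 - n / N          # finite population correction
    return var

def variance_of_total(s2, n, N):
    """Estimated Var(tau_hat) = N^2 * Var(x_bar), with the fpc applied."""
    return N ** 2 * variance_of_mean(s2, n, N)

s2, n = 25.0, 100                 # assumed sample variance and sample size
print("no fpc:", variance_of_mean(s2, n))
for N in (200, 1_000, 1_000_000):
    print(N, n / N, variance_of_mean(s2, n, N))
```

With $N = 200$ the sampling fraction is 50% and the variance is halved, whereas with $N = 1{,}000{,}000$ the correction changes the result by only 0.01%.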

Limitations of simple random sampling

While SRS is a widely used and unbiased sampling method, it has some limitations that researchers should be aware of
These limitations can impact the representativeness of the sample and the efficiency of the sampling process

Lack of stratification

SRS does not inherently account for the heterogeneity of the population, which can lead to underrepresentation of important subgroups
If the population consists of distinct subgroups with varying characteristics, SRS may not ensure adequate representation of these subgroups in the sample
Stratified sampling can be used to address this limitation by dividing the population into homogeneous subgroups and sampling from each stratum separately

Challenges with large or inaccessible populations

SRS can be challenging to implement when the population is large or geographically dispersed, as it may be difficult to obtain a complete and accurate sampling frame
In some cases, certain units in the population may be hard to reach or unwilling to participate, leading to non-response bias
Cluster sampling or multi-stage sampling can be used to overcome these challenges by sampling clusters of units instead of individual units, reducing the need for a complete sampling frame
