📈Theoretical Statistics Unit 11 Review

11.2 Stratified sampling

Written by the Fiveable Content Team • Last updated September 2025

Stratified sampling is a probability sampling technique that divides a population into subgroups (strata) and then samples from each one. This method guarantees representation of key subgroups and can increase the precision of population estimates. It's particularly useful when studying diverse populations or when certain subgroups are of special interest.

The process involves defining strata, allocating sample sizes, and selecting samples within each stratum. Different allocation methods, such as proportional or optimal allocation, can be used depending on study goals. Stratified sampling offers advantages in precision and representation, but requires careful planning and consideration of potential biases.

Definition of stratified sampling

Divides population into non-overlapping subgroups (strata) based on specific characteristics
Selects samples independently from each stratum using probability sampling methods
Combines stratum samples to form overall sample representative of entire population

Purpose and advantages

Increases precision of population estimates by reducing sampling error
Ensures representation of important subgroups that might be missed in simple random sampling
Allows for separate analysis of individual strata to compare group differences

Population stratification process

Identifying strata characteristics

Selects variables strongly related to the study's outcome of interest
Considers demographic factors (age, gender, income) or geographic regions
Ensures mutually exclusive and collectively exhaustive strata
Aims for homogeneity within strata and heterogeneity between strata

Determining optimal strata number

Balances increased precision with added complexity and cost
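Uses statistical rules such as the Dalenius-Hodges cumulative square root frequency method to place stratum boundaries once the number of strata is chosen
Considers practical constraints (budget, time, available resources)
Often settles on a small number of strata (roughly 3 to 6), since precision gains tend to diminish as more strata are added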
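
As one illustration of the cumulative square root frequency method named above, the sketch below bins a stratification variable, accumulates $\sqrt{f}$ over the bins, and cuts that cumulative scale into equal parts. The function name `cum_sqrt_f_boundaries`, the `n_bins` default, and the toy data are assumptions made for illustration. The rule places boundaries for a number of strata chosen in advance; it does not choose that number.

```python
import math

def cum_sqrt_f_boundaries(values, n_strata, n_bins=20):
    """Approximate stratum boundaries via the cumulative sqrt(f) rule.

    Assumes max(values) > min(values) so the bin width is positive.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - lo) / width), n_bins - 1)  # maximum goes in the last bin
        counts[k] += 1

    # Running total of sqrt(frequency) across the bins.
    cum, running = [], 0.0
    for f in counts:
        running += math.sqrt(f)
        cum.append(running)

    boundaries = []
    for j in range(1, n_strata):
        target = j * cum[-1] / n_strata
        # Bin whose cumulative sqrt(f) is closest to the target cut point.
        k = min(range(n_bins), key=lambda i: abs(cum[i] - target))
        boundaries.append(lo + (k + 1) * width)  # upper edge of that bin
    return boundaries

# Hypothetical stratification variable: the integers 0..99.
print(cum_sqrt_f_boundaries(list(range(100)), n_strata=2, n_bins=10))
```

With skewed data the boundaries would typically crowd toward the dense part of the distribution, which is the behaviour the rule is designed to produce.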

Sample allocation methods

Proportional allocation

Allocates sample size to each stratum proportional to stratum size in population
Calculation: $n_h = n \times (N_h / N)$ where $n_h$ is stratum sample size, $n$ is total sample size, $N_h$ is stratum population size, and $N$ is total population size
Maintains population proportions in the sample
Simple to implement and understand
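
A minimal sketch of the proportional allocation formula, assuming stratum sizes are known exactly. The largest-remainder rounding step is an added assumption (the formula itself gives real-valued $n_h$), and the function name and example sizes are hypothetical.

```python
def proportional_allocation(stratum_sizes, n):
    """Return integer n_h close to n * N_h / N that sum exactly to n."""
    N = sum(stratum_sizes)
    exact = [n * N_h / N for N_h in stratum_sizes]
    alloc = [int(x) for x in exact]  # floor each real-valued share

    # Give any leftover units to the strata with the largest fractional parts.
    leftover = n - sum(alloc)
    order = sorted(range(len(exact)), key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in order[:leftover]:
        alloc[h] += 1
    return alloc

# Hypothetical population of 10,000 split into three strata.
print(proportional_allocation([5000, 3000, 2000], 200))  # [100, 60, 40]
```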

Optimal allocation

Allocates sample size based on stratum size, variability, and per-unit sampling cost
Aims to minimize overall sampling variance for a fixed total cost
Requires knowledge, or at least prior estimates, of within-stratum standard deviations
Formula: $n_h = n \times \frac{N_h S_h / \sqrt{c_h}}{\sum_{i=1}^{L} N_i S_i / \sqrt{c_i}}$ where $S_h$ is the standard deviation of the variable of interest in stratum $h$, $c_h$ is the cost of sampling one unit in stratum $h$, and $L$ is the number of strata

Neyman allocation

Special case of optimal allocation when cost per unit is constant across strata
Allocates larger samples to strata with higher variability or larger sizes
Calculation: $n_h = n \times \frac{N_h S_h}{\sum_{i=1}^{L} N_i S_i}$ (the optimal allocation formula with all $c_h$ equal)
Provides minimum variance for a fixed total sample size
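
The sketch below computes Neyman allocation from stratum sizes and standard deviations. The optional `unit_costs` argument is an assumption added for illustration: it weights each stratum by $1/\sqrt{c_h}$, the usual textbook form of cost-based optimal allocation, and with equal costs it reduces to the Neyman formula above. The results are real-valued and are not capped at $N_h$, so rounding and capping are left to the reader.

```python
import math

def neyman_allocation(stratum_sizes, stratum_sds, n, unit_costs=None):
    """Real-valued n_h proportional to N_h * S_h / sqrt(c_h)."""
    if unit_costs is None:
        unit_costs = [1.0] * len(stratum_sizes)  # equal costs: plain Neyman
    scores = [
        N_h * S_h / math.sqrt(c_h)
        for N_h, S_h, c_h in zip(stratum_sizes, stratum_sds, unit_costs)
    ]
    total = sum(scores)
    return [n * s / total for s in scores]

# Hypothetical strata: the smallest stratum is the most variable.
alloc = neyman_allocation([5000, 3000, 2000], [10, 20, 40], 200)
print([round(x, 1) for x in alloc])  # [52.6, 63.2, 84.2]
```

In this toy case the smallest stratum receives the largest share because its standard deviation is four times that of the largest stratum.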

Stratified random sampling procedure

Within-stratum sampling techniques

Employs simple random sampling within each stratum
Uses systematic sampling for ordered lists within strata
Applies probability proportional to size (PPS) sampling when units within a stratum differ widely in size and unequal selection probabilities are wanted
Ensures independence between samples from different strata
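
A minimal sketch of simple random sampling without replacement within each stratum, using only the Python standard library. The frame layout (a dict mapping a stratum label to a list of unit identifiers) and the function name are assumptions for illustration.

```python
import random

def stratified_sample(units_by_stratum, n_by_stratum, seed=None):
    """Draw a simple random sample without replacement in every stratum."""
    rng = random.Random(seed)  # seeded generator for reproducible draws
    sample = {}
    for h, units in units_by_stratum.items():
        # Each stratum is sampled on its own, so one stratum's selections
        # never influence another's.
        sample[h] = rng.sample(units, n_by_stratum[h])
    return sample

# Hypothetical frame: 100 urban units and 50 rural units.
frame = {"urban": list(range(100)), "rural": list(range(100, 150))}
drawn = stratified_sample(frame, {"urban": 10, "rural": 5}, seed=1)
print({h: len(ids) for h, ids in drawn.items()})  # {'urban': 10, 'rural': 5}
```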

Sample size determination

Considers desired precision level (margin of error)
Accounts for expected response rate and budget constraints
Uses power analysis for hypothesis testing scenarios
Applies the finite population correction when the sample is a sizable fraction of the population
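
These considerations can be combined in a rough calculation. The sketch below uses the simple random sampling formula for estimating a mean as a starting point, which is a simplifying assumption: an effective stratified design would usually need no more than this. The normal quantile, the function name, and the example inputs are likewise assumptions.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sd, margin, N, confidence=0.95, response_rate=1.0):
    """Approximate sample size for a mean with a given margin of error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z * sd / margin) ** 2          # size ignoring the finite population
    n = n0 / (1 + n0 / N)                # finite population correction
    return math.ceil(n / response_rate)  # inflate for expected non-response

# Hypothetical inputs: SD of 15, margin of error 2, population of 10,000.
print(sample_size_for_mean(15, 2, 10_000))                     # 212
print(sample_size_for_mean(15, 2, 10_000, response_rate=0.8))  # 265
```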

Statistical properties

Variance estimation

Calculates within-stratum variances separately
Combines stratum variances using appropriate weighting
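Formula for the estimated variance of the stratified mean: $V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^2}{n_h}$ where $W_h = N_h / N$ is the stratum weight and $s_h^2$ is the sample variance in stratum $h$; the finite population correction $(1 - n_h / N_h)$ can be dropped when sampling fractions are small
Typically yields a smaller variance than simple random sampling of the same size when strata are internally homogeneous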
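
A short sketch of the variance calculation, assuming simple random sampling within each stratum and at least two observations per stratum. The finite population correction $(1 - n_h / N_h)$ is exposed as an option; turning it off gives the form suited to negligible sampling fractions. The function name and toy data are hypothetical.

```python
from statistics import variance

def stratified_variance(samples, stratum_sizes, use_fpc=True):
    """Estimate the variance of the stratified mean from per-stratum samples."""
    N = sum(stratum_sizes.values())
    total = 0.0
    for h, y in samples.items():
        W_h = stratum_sizes[h] / N          # stratum weight N_h / N
        n_h = len(y)
        fpc = 1 - n_h / stratum_sizes[h] if use_fpc else 1.0
        total += W_h ** 2 * fpc * variance(y) / n_h  # variance() is s_h^2
    return total

# Toy data: two strata with population sizes 60 and 40.
samples = {"a": [10, 12, 14], "b": [20, 24]}
sizes = {"a": 60, "b": 40}
print(round(stratified_variance(samples, sizes), 4))
print(round(stratified_variance(samples, sizes, use_fpc=False), 4))
```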

Precision vs simple random sampling

Offers increased precision when strata are homogeneous
Reduces standard error of estimates for given sample size
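Quantifies improvement using the design effect (DEFF): the ratio of the estimator's variance under the stratified design to its variance under simple random sampling of the same size, with values below 1 indicating a gain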
Achieves greater efficiency in population parameter estimation
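
To make the comparison concrete, the sketch below computes the design effect for proportional allocation from a fully known toy population. It relies on the fact that, with a common sampling fraction in every stratum, the ratio of the two variances simplifies to $\sum_h W_h S_h^2 / S^2$. The population values and function name are hypothetical, and in practice the variances would be estimated rather than known.

```python
from statistics import variance

def design_effect_proportional(population_by_stratum):
    """DEFF of the stratified mean (proportional allocation) versus SRS."""
    all_values = [y for ys in population_by_stratum.values() for y in ys]
    N = len(all_values)
    overall = variance(all_values)  # S^2 across the whole population
    # Weighted average of the within-stratum variances S_h^2.
    within = sum(len(ys) / N * variance(ys) for ys in population_by_stratum.values())
    return within / overall

# Toy population: strata are tight internally and far apart from each other.
population = {"low": [1, 2, 3], "high": [11, 12, 13]}
print(round(design_effect_proportional(population), 4))
```

A value far below 1, as in this toy case, reflects strata that are homogeneous within and very different from one another.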

Bias considerations

Selection bias in strata

Occurs when strata are not properly defined or identified
Results from incomplete or inaccurate sampling frames within strata
Mitigated by careful stratification variable selection and frame development
Requires thorough understanding of population characteristics

Non-response bias effects

Varies across strata due to different response rates
Impacts representativeness of final sample
Addressed through weighting adjustments or imputation techniques
Requires analysis of non-response patterns within each stratum
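
One common weighting adjustment can be sketched as follows: each respondent's design weight $N_h / n_h$ is multiplied by the inverse of the stratum's response rate. This assumes non-respondents resemble respondents within the same stratum, which is a strong assumption that should be checked. The function name and counts are hypothetical.

```python
def nonresponse_adjusted_weights(stratum_sizes, sampled, responded):
    """Per-respondent weight in each stratum after a response-rate adjustment."""
    weights = {}
    for h in stratum_sizes:
        base = stratum_sizes[h] / sampled[h]    # design weight N_h / n_h
        adjustment = sampled[h] / responded[h]  # inverse of the response rate
        weights[h] = base * adjustment
    return weights

# Hypothetical survey: stratum "a" has a 50% response rate, "b" has 80%.
w = nonresponse_adjusted_weights(
    {"a": 600, "b": 400}, {"a": 60, "b": 40}, {"a": 30, "b": 32}
)
print(w)  # {'a': 20.0, 'b': 12.5}
```

After the adjustment, the respondents' weights in each stratum again sum to the stratum population size.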

Stratified sampling applications

Market research examples

Customer satisfaction surveys stratified by product lines
Brand awareness studies stratified by geographic regions
Consumer behavior analysis stratified by age groups and income levels

Environmental studies cases

Water quality assessment stratified by river sections
Air pollution monitoring stratified by urban vs rural areas
Wildlife population estimates stratified by habitat types

Limitations and challenges

Stratum boundary issues

Difficulty in defining clear boundaries between strata
Overlapping characteristics leading to ambiguous stratum assignment
Potential for misclassification of population units
Requires careful consideration of stratification variables and cutoff points

Small stratum problems

Insufficient sample sizes in some strata for reliable estimates
Increased variability of estimates for small strata
Potential need for collapsing or combining small strata
Trade-off between maintaining stratum identity and achieving adequate precision

Analysis of stratified data

Weighted estimators

Uses stratum weights to calculate population estimates
Formula for stratified mean: $\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h$ where $W_h$ is the stratum weight and $\bar{y}_h$ is the sample mean in stratum $h$
Applies weights in regression analysis and other statistical procedures
Ensures proper representation of population structure in final estimates
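
A minimal sketch of the stratified mean, with the population total obtained by scaling the mean by $N$. The function names and numbers are hypothetical.

```python
def stratified_mean(stratum_means, stratum_sizes):
    """Weighted mean: sum over strata of W_h * ybar_h, with W_h = N_h / N."""
    N = sum(stratum_sizes.values())
    return sum(stratum_sizes[h] / N * stratum_means[h] for h in stratum_means)

def stratified_total(stratum_means, stratum_sizes):
    """Estimated population total: N times the stratified mean."""
    return sum(stratum_sizes.values()) * stratified_mean(stratum_means, stratum_sizes)

# Hypothetical sample means from two strata of sizes 60 and 40.
means = {"a": 12.0, "b": 22.0}
sizes = {"a": 60, "b": 40}
print(round(stratified_mean(means, sizes), 6))
print(round(stratified_total(means, sizes), 6))
```

If the two strata had been sampled equally, the unweighted average of the stratum means would be 17, while the weighted estimate is 16 because stratum "a" makes up 60% of the population.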

Confidence interval construction

Accounts for stratified design in interval calculations
Uses stratified variance estimates for more accurate intervals
Formula: $CI = \bar{y}_{st} \pm t_{\alpha/2, df} \times \sqrt{V(\bar{y}_{st})}$ where $t_{\alpha/2, df}$ is the t-value for the desired confidence level; a common approximation takes $df = n - L$
Typically provides narrower intervals than simple random sampling when stratification is effective
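
The interval can be sketched with the standard library alone. Note the swap: this version uses a normal quantile in place of the t-value in the formula above, an approximation that is reasonable only when the degrees of freedom are large. The function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def stratified_ci(estimate, variance_estimate, confidence=0.95):
    """Normal-approximation confidence interval for a stratified mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # stands in for the t-value
    half_width = z * sqrt(variance_estimate)
    return estimate - half_width, estimate + half_width

# Hypothetical stratified mean and its estimated variance.
low, high = stratified_ci(16.0, 1.064)
print(round(low, 2), round(high, 2))
```

With few observations per stratum, a t-quantile from a statistics library would give a wider and more appropriate interval.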

Stratified sampling vs other methods

Cluster sampling comparison

Stratified sampling selects units from every stratum, whereas cluster sampling selects only some clusters and takes those clusters whole
Stratified sampling is generally more precise than cluster sampling for the same sample size
Cluster sampling more cost-effective for geographically dispersed populations
Stratified sampling requires more information about population characteristics

Multistage sampling differences

Stratified sampling involves one stage of selection within strata
Multistage sampling uses multiple levels of sampling units
Stratified sampling offers more control over sample composition
Multistage sampling more suitable for complex, hierarchical populations

Software tools for stratified sampling

Statistical packages (R, SAS, SPSS) with built-in stratified sampling functions
Specialized survey software (Qualtrics, SurveyMonkey) for online stratified surveys
GIS tools (ArcGIS, QGIS) for spatial stratification in environmental studies
General-purpose programming languages (Python, MATLAB) for custom or complex sampling designs

Ethical considerations in stratification

Potential for reinforcing stereotypes or discrimination through stratification variables
Privacy concerns when using sensitive characteristics for stratification
Balancing representativeness with individual rights and protections
Ensuring transparency in reporting stratification methods and limitations
