Randomized experiments are a powerful tool for determining cause and effect. By randomly assigning participants to treatment and control groups, researchers can balance confounding factors in expectation and draw unbiased conclusions about the impact of interventions.
This section covers the fundamentals of randomized experiments, including design, analysis, and interpretation. We'll explore methods for estimating treatment effects, handling noncompliance and missing data, and conducting subgroup analyses to uncover heterogeneous effects across different populations.
Fundamentals of randomized experiments
- Randomized experiments are a powerful tool for drawing causal inferences by randomly assigning units to treatment and control groups
- Randomization ensures that treatment and control groups are comparable on average, balancing confounding factors in expectation
- Key components of randomized experiments include defining the population of interest, specifying the treatment, choosing an appropriate sample size, and randomly assigning units to treatment and control conditions
Benefits of randomization
- Randomization balances both observed and unobserved confounders between treatment and control groups on average
- Enables unbiased estimation of causal effects by creating comparable groups
- Provides a foundation for statistical inference, allowing researchers to assess the uncertainty associated with treatment effect estimates
- Enhances the credibility and interpretability of research findings by minimizing the potential for bias and confounding
Estimating causal effects
Average treatment effect (ATE)
- The ATE measures the average difference between the potential outcomes under treatment and under control
- Defined as $ATE = E[Y(1)] - E[Y(0)]$, where $Y(1)$ and $Y(0)$ represent potential outcomes under treatment and control conditions
- Provides an overall measure of the effectiveness of the treatment across the entire population
- Can be estimated using the difference in sample means or regression analysis
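Randomization makes the treated and untreated units exchangeable, so the difference in observed group means recovers the ATE even though each unit reveals only one potential outcome. A minimal sketch with simulated (hypothetical) potential outcomes, where the true ATE is fixed at 2:

```python
import random

random.seed(0)

n = 10_000
# Hypothetical potential outcomes: Y(0) is a noisy baseline, Y(1) = Y(0) + 2
y0 = [random.gauss(10, 3) for _ in range(n)]
y1 = [v + 2.0 for v in y0]          # true ATE = 2 for every unit

# Random assignment: each unit flips a fair coin
treated = [random.random() < 0.5 for _ in range(n)]

# Only one potential outcome per unit is ever observed
y_obs_t = [y1[i] for i in range(n) if treated[i]]
y_obs_c = [y0[i] for i in range(n) if not treated[i]]

ate_hat = sum(y_obs_t) / len(y_obs_t) - sum(y_obs_c) / len(y_obs_c)
print(round(ate_hat, 2))  # close to the true ATE of 2
```

The estimate fluctuates around 2 across randomizations; its spread shrinks as the sample grows.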
Intention-to-treat (ITT) analysis
- ITT analysis estimates the effect of treatment assignment, regardless of whether units actually received or adhered to the assigned treatment
- Preserves the benefits of randomization by analyzing units based on their assigned treatment, even if there is noncompliance or missing data
- Typically yields an attenuated (conservative) estimate of the effect of treatment receipt, because noncompliers dilute the contrast between the assigned groups
- Useful for assessing the effectiveness of a treatment in real-world settings where perfect compliance may not be achievable
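A small simulation can illustrate the ITT contrast under hypothetical one-sided noncompliance, where only about 70% of assigned units take up the treatment; the ITT estimate is attenuated toward zero relative to the per-unit effect of receipt:

```python
import random

random.seed(1)

n = 20_000
z = [random.random() < 0.5 for _ in range(n)]             # random assignment
# One-sided noncompliance: only ~70% of assigned units take up the treatment
d = [z[i] and (random.random() < 0.7) for i in range(n)]  # actual receipt
# Receiving the treatment raises the outcome by 1
y = [(1.0 if d[i] else 0.0) + random.gauss(0, 1) for i in range(n)]

# ITT: compare outcomes by *assignment*, preserving the randomization
itt = (sum(y[i] for i in range(n) if z[i]) / sum(z)
       - sum(y[i] for i in range(n) if not z[i]) / (n - sum(z)))
print(round(itt, 2))  # ≈ 0.7 = take-up rate × per-unit effect
```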
Validity in randomized experiments
Internal vs external validity
- Internal validity refers to the extent to which a study can establish a causal relationship between the treatment and the outcome within the study sample
- Randomization strengthens internal validity by balancing confounders in expectation and ensuring comparability between treatment and control groups
- External validity concerns the generalizability of the study findings to broader populations and settings beyond the study sample
- Randomized experiments may have high internal validity but limited external validity if the sample is not representative of the target population or if the study conditions differ from real-world settings
Analyzing completely randomized designs
Difference in means
- In a completely randomized design, units are randomly assigned to treatment and control groups without stratification or matching
- The simplest method for estimating the ATE is to calculate the difference in sample means between the treatment and control groups: $\hat{ATE} = \bar{Y}_1 - \bar{Y}_0$
- The difference in means is an unbiased estimator of the ATE over repeated randomizations; any single randomization can still produce chance imbalance, especially in small samples
- Confidence intervals and hypothesis tests can be constructed using the standard error of the difference in means
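The estimate and a 95% confidence interval can be computed directly from the two groups. A sketch on small, made-up outcome data, using the conservative Neyman variance estimate $s_1^2/n_1 + s_0^2/n_0$:

```python
import statistics as st

# Hypothetical outcomes from a completely randomized design
y1 = [12.1, 9.8, 11.4, 10.9, 12.7, 11.1, 10.4, 12.0]  # treatment group
y0 = [9.9, 10.2, 8.7, 9.5, 10.8, 9.1, 10.0, 9.4]      # control group

ate_hat = st.mean(y1) - st.mean(y0)
# Neyman (conservative) variance estimate: s1^2/n1 + s0^2/n0
se = (st.variance(y1) / len(y1) + st.variance(y0) / len(y0)) ** 0.5
ci = (ate_hat - 1.96 * se, ate_hat + 1.96 * se)
print(round(ate_hat, 2), [round(v, 2) for v in ci])
```

With these numbers the point estimate is 1.6; the normal-approximation interval is reasonable here only as an illustration, since the groups are tiny.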
Regression analysis
- Regression analysis can be used to estimate the ATE while controlling for covariates and improving precision
- The basic regression model for a completely randomized design is: $Y_i = \beta_0 + \beta_1 T_i + \epsilon_i$, where $T_i$ is the treatment indicator and $\beta_1$ represents the ATE
- Including relevant covariates in the regression model can reduce the residual variance and increase the precision of the treatment effect estimate
- Regression analysis also allows for the estimation of heterogeneous treatment effects by including interaction terms between the treatment indicator and covariates
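A sketch of the precision gain from covariate adjustment, on simulated (hypothetical) data with true ATE 2, comparing OLS of $Y$ on $[1, T]$ with OLS on $[1, T, X]$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
x = rng.normal(size=n)                   # pre-treatment covariate
t = rng.integers(0, 2, size=n)           # random assignment
y = 1.0 + 2.0 * t + 3.0 * x + rng.normal(size=n)   # true ATE = 2

# OLS via least squares: short model [1, T] vs long model [1, T, X]
X_short = np.column_stack([np.ones(n), t])
X_long = np.column_stack([np.ones(n), t, x])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)
print(b_short[1], b_long[1])  # both ≈ 2; the adjusted estimate is more precise
```

Both coefficients on $T$ are unbiased for the ATE; adjusting for $X$ removes most of the residual variance, so the adjusted estimate sits much closer to 2 in any given sample.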
Analyzing stratified and matched designs
Conditional average treatment effect
- Stratified and matched designs involve grouping units into strata or pairs based on observed covariates before randomization
- The conditional average treatment effect (CATE) measures the average treatment effect within each stratum or pair
- Estimating the CATE involves calculating the difference in means or using regression analysis within each stratum or pair
- The overall ATE can be obtained by averaging the CATEs across all strata or pairs, weighted by the proportion of units in each stratum or pair
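The steps above can be sketched directly: estimate a difference in means within each stratum, then average the CATEs with weights proportional to stratum size. The data below are made up for illustration:

```python
# Hypothetical stratified experiment: randomization done separately within strata
data = [
    # (stratum, treated, outcome)
    ("low", 1, 5.2), ("low", 0, 4.1), ("low", 1, 5.0), ("low", 0, 3.9),
    ("high", 1, 9.8), ("high", 0, 7.1), ("high", 1, 10.2), ("high", 0, 7.3),
]

ate = 0.0
for s in {stratum for stratum, _, _ in data}:
    t_out = [y for st, t, y in data if st == s and t == 1]
    c_out = [y for st, t, y in data if st == s and t == 0]
    cate = sum(t_out) / len(t_out) - sum(c_out) / len(c_out)
    weight = sum(1 for st, _, _ in data if st == s) / len(data)
    ate += weight * cate              # size-weighted average of stratum effects
    print(s, round(cate, 2))
print("ATE:", round(ate, 2))
```

Here the "low" stratum effect is 1.1, the "high" stratum effect is 2.8, and the size-weighted ATE is 1.95.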
Regression with strata indicators
- Regression analysis can be extended to account for stratification or matching by including strata or pair indicators as covariates
- The regression model for a stratified design is: $Y_i = \beta_0 + \beta_1 T_i + \sum_{s=1}^{S-1} \gamma_s I_{is} + \epsilon_i$, where $I_{is}$ are indicators for each stratum $s$
- The coefficient $\beta_1$ represents the pooled CATE across all strata, assuming a constant treatment effect
- Interacting the treatment indicator with the strata indicators allows for the estimation of stratum-specific treatment effects
Noncompliance in experiments
One-sided vs two-sided noncompliance
- One-sided noncompliance occurs when some units assigned to the treatment group do not receive the treatment, but all units assigned to the control group remain untreated
- Two-sided noncompliance arises when some units in both the treatment and control groups do not adhere to their assigned conditions
- Noncompliance can bias the estimation of treatment effects if it is related to potential outcomes
- ITT analysis provides a conservative estimate of the treatment effect in the presence of noncompliance
Complier average causal effect (CACE)
- The CACE, also known as the local average treatment effect (LATE), measures the average treatment effect among the subgroup of compliers who adhere to their assigned treatment
- Compliers are units that would receive the treatment if assigned to the treatment group and would not receive the treatment if assigned to the control group
- The CACE is defined as: $CACE = E[Y(1) - Y(0) | D(1) = 1, D(0) = 0]$, where $D(1)$ and $D(0)$ represent potential treatment statuses under assignment to treatment and control
- Estimating the CACE requires additional assumptions, such as the exclusion restriction and monotonicity
Instrumental variables estimation
- Instrumental variables (IV) estimation can be used to estimate the CACE in the presence of noncompliance
- The randomized treatment assignment serves as an instrument that affects the outcome only through its effect on treatment receipt
- The IV estimator for the CACE is the ratio of the ITT effect on the outcome to the ITT effect on treatment receipt: $\hat{CACE} = \frac{\hat{ITT}_Y}{\hat{ITT}_D}$
- IV estimation relies on the assumptions of random assignment, exclusion restriction, monotonicity, and the stable unit treatment value assumption (SUTVA)
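The Wald ratio $\hat{ITT}_Y / \hat{ITT}_D$ can be computed in a few lines. A sketch on simulated (hypothetical) data with one-sided noncompliance, 60% compliers, and a true complier effect of 2:

```python
import random

random.seed(2)

n = 50_000
# Hypothetical one-sided noncompliance: 60% of units are compliers
complier = [random.random() < 0.6 for _ in range(n)]
z = [random.random() < 0.5 for _ in range(n)]        # assignment (the instrument)
d = [z[i] and complier[i] for i in range(n)]         # actual receipt
# Receiving the treatment raises the outcome by 2
y = [(2.0 if d[i] else 0.0) + random.gauss(0, 1) for i in range(n)]

itt_y = (sum(y[i] for i in range(n) if z[i]) / sum(z)
         - sum(y[i] for i in range(n) if not z[i]) / (n - sum(z)))
itt_d = sum(d) / sum(z)        # one-sided: control receipt rate is exactly zero
cace_hat = itt_y / itt_d       # Wald / IV estimator
print(round(cace_hat, 2))  # ≈ 2, the effect among compliers
```

The ITT on the outcome is roughly 1.2 (take-up rate × effect), and dividing by the take-up rate of roughly 0.6 rescales it back to the complier effect.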
Dealing with missing data
Inverse probability weighting
- Inverse probability weighting (IPW) is a method for handling missing data in randomized experiments
- IPW assigns weights to the observed units based on the inverse of their probability of being observed, giving more weight to units with characteristics similar to those of the missing units
- The probability of being observed is typically estimated using a logistic regression model that includes treatment assignment and relevant covariates
- IPW creates a pseudo-population in which the reweighted observed units are representative of the full sample, allowing for unbiased estimation of treatment effects when the missingness model is correct
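A minimal IPW sketch on simulated (hypothetical) data. For simplicity the observation probability is estimated as a proportion within each (assignment, covariate) cell, which is equivalent to a fully saturated logistic model; with continuous covariates one would fit a logistic regression instead. The true ATE is 1, and treated units with $x = 1$ are selectively missing:

```python
import random

random.seed(3)

n = 40_000
x = [random.random() < 0.5 for _ in range(n)]   # observed baseline covariate
t = [random.random() < 0.5 for _ in range(n)]   # random assignment
y = [3.0 * x[i] + (1.0 if t[i] else 0.0) + random.gauss(0, 1) for i in range(n)]
# MAR missingness: treated units with x = 1 are observed far less often
obs = [random.random() < (0.4 if (t[i] and x[i]) else 0.9) for i in range(n)]

def ipw_mean(arm):
    """Weighted (Hajek) mean of y in one arm, weighting by 1 / P(observed | x, arm)."""
    num = den = 0.0
    for xv in (False, True):
        cell = [i for i in range(n) if t[i] == arm and x[i] == xv]
        p_obs = sum(obs[i] for i in cell) / len(cell)  # estimated within the cell
        for i in cell:
            if obs[i]:
                num += y[i] / p_obs
                den += 1.0 / p_obs
    return num / den

ate_hat = ipw_mean(True) - ipw_mean(False)
print(round(ate_hat, 2))  # ≈ 1 despite the selective missingness
```

A naive complete-case comparison would be biased downward here, because the high-outcome treated cell is underrepresented among observed units; the inverse weights restore its share.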
Multiple imputation
- Multiple imputation (MI) is another approach for dealing with missing data in randomized experiments
- MI involves creating multiple plausible imputed datasets, where missing values are replaced with draws from a posterior predictive distribution based on the observed data
- Each imputed dataset is analyzed separately, and the results are combined using Rubin's rules to obtain overall estimates and standard errors
- MI accounts for the uncertainty associated with missing data by incorporating the variability across the imputed datasets
- MI assumes that the data are missing at random (MAR), meaning that the probability of missingness depends only on observed variables
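A simplified MI sketch on simulated (hypothetical) data, estimating the mean of a partially missing outcome under MAR. Note this is an "improper" shortcut for brevity: a full implementation would also redraw the imputation-model parameters from their posterior for each imputed dataset:

```python
import random
import statistics as st

random.seed(4)

n = 2_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 * x[i] + random.gauss(0, 1) for i in range(n)]
# MAR missingness: y is missing more often when x is large
miss = [random.random() < (0.5 if x[i] > 0 else 0.1) for i in range(n)]

obs = [i for i in range(n) if not miss[i]]
# Imputation model y ~ x, fit by least squares on the observed cases
mx = st.mean(x[i] for i in obs)
my = st.mean(y[i] for i in obs)
slope = (sum((x[i] - mx) * (y[i] - my) for i in obs)
         / sum((x[i] - mx) ** 2 for i in obs))
resid_sd = st.stdev(y[i] - my - slope * (x[i] - mx) for i in obs)

M = 20
points, withins = [], []
for _ in range(M):
    # Replace each missing y with a draw from the predictive distribution given x
    y_imp = [y[i] if not miss[i]
             else my + slope * (x[i] - mx) + random.gauss(0, resid_sd)
             for i in range(n)]
    points.append(st.mean(y_imp))            # estimate of E[Y] in this dataset
    withins.append(st.variance(y_imp) / n)   # its within-imputation variance

q_bar = st.mean(points)                                           # pooled estimate
total_var = st.mean(withins) + (1 + 1 / M) * st.variance(points)  # Rubin's rules
print(round(q_bar, 2), round(total_var ** 0.5, 3))
```

A complete-case mean would be biased downward here (observed cases skew toward low $x$, hence low $y$); the imputations borrow the $y \sim x$ relationship to correct it, and the total variance adds the between-imputation spread to the within-imputation uncertainty.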
Subgroup analysis and heterogeneity
Interaction terms in regression
- Subgroup analysis involves examining treatment effects within specific subgroups of the population defined by baseline characteristics
- Interaction terms in regression models can be used to estimate and test for heterogeneous treatment effects across subgroups
- The regression model with an interaction term is: $Y_i = \beta_0 + \beta_1 T_i + \beta_2 X_i + \beta_3 T_i X_i + \epsilon_i$, where $X_i$ is a subgroup indicator and $\beta_3$ represents the difference in treatment effects between subgroups
- A significant interaction term suggests that the treatment effect varies across subgroups, indicating treatment effect heterogeneity
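The interaction model above can be sketched on simulated (hypothetical) data, where the treatment effect is 1.0 in subgroup $X = 0$ and 2.5 in subgroup $X = 1$, so the true interaction coefficient is 1.5:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8_000
x = rng.integers(0, 2, size=n)   # subgroup indicator
t = rng.integers(0, 2, size=n)   # random assignment
# Effect is 1.0 when x = 0 and 1.0 + 1.5 = 2.5 when x = 1
y = 0.5 + 1.0 * t + 0.8 * x + 1.5 * t * x + rng.normal(size=n)

# OLS on [1, T, X, T*X]; the last coefficient is the subgroup difference in effects
X = np.column_stack([np.ones(n), t, x, t * x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta.round(2))  # beta[3] ≈ 1.5
```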
Dangers of post-hoc subgroup analysis
- Post-hoc subgroup analysis, where subgroups are defined after the data have been collected and analyzed, can lead to false positive findings and overinterpretation
- Multiple testing across numerous subgroups increases the risk of finding spurious subgroup effects by chance alone
- Post-hoc subgroup analysis should be treated as exploratory and interpreted with caution, as it lacks the same level of credibility as pre-specified subgroup analysis
- Replication of subgroup findings in independent studies is important to establish the robustness and generalizability of treatment effect heterogeneity
Statistical power and sample size
Minimum detectable effect size
- Statistical power is the probability that a test correctly rejects the null hypothesis when a true treatment effect of a given magnitude exists
- The minimum detectable effect size (MDES) is the smallest true treatment effect that a study can detect with a specified level of power and significance
- The MDES depends on factors such as the sample size, the variance of the outcome, the level of significance (Type I error rate), and the desired power (1 - Type II error rate)
- Larger sample sizes, smaller outcome variances, and higher levels of power all contribute to a smaller MDES, meaning the study can detect smaller treatment effects
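For a two-arm difference-in-means test, the standard closed form is $MDES = (z_{1-\alpha/2} + z_{1-\beta}) \cdot \sigma\sqrt{2/n}$ with $n$ units per arm. A small sketch using only the standard library:

```python
from statistics import NormalDist

def mdes(n_per_arm, sigma, alpha=0.05, power=0.8):
    """Minimum detectable effect for a two-arm difference-in-means test."""
    z = NormalDist().inv_cdf
    se = sigma * (2 / n_per_arm) ** 0.5       # SE of the difference in means
    return (z(1 - alpha / 2) + z(power)) * se

# Quadrupling n halves the SE, and therefore halves the MDES
print(round(mdes(100, sigma=1.0), 3))   # ≈ 0.396
print(round(mdes(400, sigma=1.0), 3))   # ≈ 0.198
```

With $\alpha = 0.05$ and 80% power the multiplier is $1.96 + 0.84 \approx 2.8$, a useful back-of-the-envelope constant for power calculations.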
Factors influencing power
- Sample size is a key determinant of statistical power, with larger sample sizes providing greater power to detect treatment effects
- The variance of the outcome variable affects power, with smaller variances leading to greater power
- The level of significance (Type I error rate) and the desired power (1 - Type II error rate) influence the MDES and the required sample size
- The use of covariates in the analysis can increase power by reducing the residual variance and explaining some of the variability in the outcome
- Stratified and matched designs can improve power by reducing the variance of the treatment effect estimator compared to completely randomized designs
Practical considerations
Ethics and informed consent
- Randomized experiments must adhere to ethical principles, including respect for persons, beneficence, and justice
- Informed consent is a critical component of ethical research, ensuring that participants understand the study procedures, risks, and benefits and voluntarily agree to participate
- Equipoise, the genuine uncertainty about the relative benefits and risks of the treatments being compared, is necessary for the ethical justification of randomization
- Special considerations may apply when conducting experiments with vulnerable populations or in settings where the benefits of participation are limited
Generalizability of results
- The generalizability of randomized experiment results depends on the representativeness of the study sample and the similarity of the study conditions to real-world settings
- Inclusion and exclusion criteria, as well as the recruitment and sampling methods, can affect the external validity of the study findings
- The setting, population, and context in which the experiment is conducted may limit the generalizability of the results to other settings and populations
- Replication of findings across different settings, populations, and implementation conditions can enhance the generalizability and robustness of the study conclusions
- Careful consideration of the target population and the factors that may influence the treatment effect is important for designing experiments with high external validity