Fiveable

📊 Causal Inference Unit 5 Review

5.3 Inverse probability weighting

Written by the Fiveable Content Team • Last updated September 2025
Inverse probability weighting is a powerful tool in causal inference. It helps estimate treatment effects in observational studies by creating a balanced pseudo-population. By weighting observations based on their likelihood of receiving treatment, IPW mimics randomized experiments.

IPW relies on key assumptions like positivity and exchangeability. Propensity scores are used to calculate weights, which are then applied in outcome models. Assessing covariate balance and understanding IPW's limitations are crucial for proper implementation and interpretation of results.

Overview of inverse probability weighting

  • Inverse probability weighting (IPW) is a statistical technique used in causal inference to estimate the average treatment effect (ATE) or average treatment effect on the treated (ATT) from observational data
  • IPW aims to create a pseudo-population where treatment assignment is independent of confounding variables, allowing for unbiased estimation of causal effects
  • The key idea behind IPW is to weight each observation by the inverse of the probability of receiving the treatment actually received, given the observed covariates
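As a toy illustration of this key idea (hypothetical propensity scores, plain NumPy), weighting each subject by the inverse probability of the treatment they actually received equalizes the total weight carried by the two groups:

```python
import numpy as np

# Four subjects with hypothetical treatment probabilities
e = np.array([0.8, 0.8, 0.2, 0.2])   # P(A=1 | X) for each subject
A = np.array([1, 0, 1, 0])           # treatment actually received

# Weight = inverse probability of the treatment actually received
w = A / e + (1 - A) / (1 - e)
print(w)  # [1.25 5.   5.   1.25]
```

Each group's weights sum to 6.25, so treated and untreated subjects carry equal total weight in the pseudo-population even though treatment probabilities differed sharply.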

Motivation for weighting approach

  • In observational studies, treatment assignment is often influenced by confounding variables, leading to biased estimates of causal effects when using traditional regression methods
  • IPW addresses confounding by creating a weighted sample where the distribution of confounding variables is balanced between treatment groups
  • By weighting observations based on the propensity score (probability of treatment given covariates), IPW mimics a randomized experiment where treatment assignment is independent of confounders

Assumptions of IPW

Positivity assumption

  • The positivity assumption requires that every individual has a non-zero probability of receiving each level of the treatment, given their observed covariates
  • Formally, $P(A=a|X=x) > 0$ for all treatment levels $a$ and all covariate values $x$ in the support of $X$
  • Violations of the positivity assumption can occur when there are regions of the covariate space where no individuals receive a particular treatment level, leading to extreme weights and unstable estimates
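A simple empirical diagnostic for near-violations of positivity is to count propensity scores falling outside a "safe" interval and to inspect the largest implied weight. A sketch on simulated data (the data-generating process and the 0.05 cutoff are illustrative choices, not a standard):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=2000)
# Hypothetical strong confounding pushes many propensity scores
# toward 0 or 1, threatening positivity
e_true = 1 / (1 + np.exp(-3.0 * X))
A = rng.binomial(1, e_true)

# Flag scores outside [eps, 1 - eps] and check the largest weight
eps = 0.05
n_extreme = np.sum((e_true < eps) | (e_true > 1 - eps))
max_w = np.max(A / e_true + (1 - A) / (1 - e_true))
print(f"{n_extreme} of {len(X)} scores outside [{eps}, {1 - eps}]")
print("largest unstabilized weight:", round(max_w, 1))
```

Many flagged scores or a handful of enormous weights are warning signs that a few observations will dominate the weighted analysis.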

Exchangeability assumption

  • The exchangeability assumption, also known as the no unmeasured confounders assumption, states that treatment assignment is independent of potential outcomes given the observed covariates
  • Formally, $Y(a) \perp A | X$ for all values of $a$, where $Y(a)$ denotes the potential outcome under treatment level $a$
  • This assumption implies that all variables that influence both treatment assignment and the outcome are measured and included in the propensity score model
  • Violations of the exchangeability assumption can lead to biased estimates of causal effects, as there may be unobserved confounders that are not accounted for in the weighting process

Estimating weights

Propensity score models

  • The propensity score is the probability of receiving treatment given the observed covariates, denoted as $e(X) = P(A=1|X)$ for a binary treatment
  • Propensity scores are typically estimated using logistic regression, modeling treatment assignment as a function of the observed confounders
  • The choice of variables to include in the propensity score model is crucial, as omitting important confounders can lead to biased estimates, while including too many variables can lead to overfitting and reduced efficiency
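To make the fitting step concrete, here is a minimal propensity score model estimated by Newton-Raphson in plain NumPy on simulated data (in practice one would use a library such as statsmodels or scikit-learn; the coefficients and confounders are invented for the sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))                      # two observed confounders
logit_e = 0.5 + 1.0 * X[:, 0] - 1.0 * X[:, 1]    # true treatment model
A = rng.binomial(1, 1 / (1 + np.exp(-logit_e)))

# Logistic regression of A on X by Newton-Raphson
Z = np.column_stack([np.ones(n), X])             # add intercept
beta = np.zeros(Z.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-Z @ beta))
    beta += np.linalg.solve((Z.T * (p * (1 - p))) @ Z, Z.T @ (A - p))

e_hat = 1 / (1 + np.exp(-Z @ beta))              # estimated propensity scores
print(beta.round(2))                             # ≈ [0.5, 1.0, -1.0]
```

With the treatment model correctly specified, the fitted coefficients recover the true values up to sampling error, and `e_hat` supplies the scores used to build the weights.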

Stabilized vs unstabilized weights

  • Unstabilized weights are calculated as the inverse of the propensity score for treated individuals and the inverse of one minus the propensity score for untreated individuals: $w_i = \frac{A_i}{e(X_i)} + \frac{1-A_i}{1-e(X_i)}$
  • Stabilized weights include the marginal probability of treatment in the numerator, which helps to reduce the variability of the weights and improve efficiency: $sw_i = \frac{A_i P(A=1)}{e(X_i)} + \frac{(1-A_i) P(A=0)}{1-e(X_i)}$
  • Stabilized weights are generally preferred, as they have better statistical properties and are less sensitive to extreme propensity scores
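The difference between the two formulas is easy to see numerically. A sketch on simulated data with known propensity scores (the data-generating process is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-(0.3 + 1.5 * X)))   # propensity scores (known here)
A = rng.binomial(1, e)

# Unstabilized: inverse probability of the treatment received
w_unstab = A / e + (1 - A) / (1 - e)

# Stabilized: marginal treatment probability in the numerator
p1 = A.mean()
w_stab = A * p1 / e + (1 - A) * (1 - p1) / (1 - e)

print("unstabilized: mean %.2f, max %.1f" % (w_unstab.mean(), w_unstab.max()))
print("stabilized:   mean %.2f, max %.1f" % (w_stab.mean(), w_stab.max()))
```

Stabilized weights average roughly one and have a much smaller maximum, which is exactly the variance-reduction property the bullet points describe.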

Fitting outcome models

Weighted regression

  • After estimating the inverse probability weights, causal effects can be estimated by fitting a weighted regression model, where each observation is weighted by its corresponding IPW
  • For binary outcomes, a weighted logistic regression can be used, while for continuous outcomes, a weighted linear regression is appropriate
  • The weighted regression model should include the treatment variable and any additional confounders that were not sufficiently balanced by the weighting process
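The weighted regression step can be sketched in closed form with NumPy (simulated data with a known propensity model and a true ATE of 2; a real analysis would use a library routine such as statsmodels' WLS):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-1.2 * X))               # propensity depends on X
A = rng.binomial(1, e)
Y = 2.0 * A + 3.0 * X + rng.normal(size=n)   # true ATE = 2

w = A / e + (1 - A) / (1 - e)                # unstabilized IP weights

# Weighted least squares of Y on (1, A): beta = (D'WD)^{-1} D'Wy
D = np.column_stack([np.ones(n), A])
beta = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * Y))
ipw_est = beta[1]
naive = Y[A == 1].mean() - Y[A == 0].mean()
print("IPW estimate of ATE:", round(ipw_est, 2))   # ≈ 2
print("naive difference:   ", round(naive, 2))     # confounded, too large
```

The unweighted comparison is badly biased by the confounder $X$, while the weighted regression recovers the true effect.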

Estimating causal effects

  • The coefficient of the treatment variable in the weighted regression model provides an estimate of the average treatment effect (ATE) in the pseudo-population created by the IPW
  • For binary treatments, the ATE represents the difference in the expected outcome between the treated and untreated groups in the weighted sample
  • Confidence intervals for the ATE can be obtained using robust standard errors that account for the weights and the potential misspecification of the propensity score model
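One simple way to get an interval that reflects the weighting is a nonparametric bootstrap. A sketch on simulated data (weights are held fixed across replicates here; refitting the propensity score model inside each replicate would be more faithful but longer):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))                     # propensity scores (known here)
A = rng.binomial(1, e)
Y = 2.0 * A + 3.0 * X + rng.normal(size=n)   # true ATE = 2
w = A / e + (1 - A) / (1 - e)

def ipw_ate(A, Y, w):
    # Weighted difference in mean outcomes between treatment groups
    return (np.average(Y[A == 1], weights=w[A == 1])
            - np.average(Y[A == 0], weights=w[A == 0]))

ate = ipw_ate(A, Y, w)
boots = [ipw_ate(A[i], Y[i], w[i])
         for i in (rng.integers(0, n, n) for _ in range(500))]
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"ATE estimate {ate:.2f}, 95% bootstrap CI ({lo:.2f}, {hi:.2f})")
```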

Assessing covariate balance

Standardized mean differences

  • After applying the inverse probability weights, it is important to assess the balance of the confounding variables between the treatment groups in the weighted sample
  • Standardized mean differences (SMDs) can be used to compare the means of continuous confounders between the treatment groups, with values close to zero indicating good balance
  • For binary confounders, SMDs can be calculated using the prevalence of the confounder in each treatment group
  • As a rule of thumb, SMDs below 0.1 are considered indicative of adequate balance
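A weighted SMD is straightforward to compute by hand. A sketch on simulated data with one confounder (the data-generating process is invented; the pooled-standard-deviation denominator is one common convention):

```python
import numpy as np

def weighted_smd(x, A, w):
    """Weighted standardized mean difference for one covariate."""
    m1 = np.average(x[A == 1], weights=w[A == 1])
    m0 = np.average(x[A == 0], weights=w[A == 0])
    v1 = np.average((x[A == 1] - m1) ** 2, weights=w[A == 1])
    v0 = np.average((x[A == 0] - m0) ** 2, weights=w[A == 0])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2)

rng = np.random.default_rng(5)
n = 20000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))        # treatment probability rises with x
A = rng.binomial(1, e)
w = A / e + (1 - A) / (1 - e)

smd_before = weighted_smd(x, A, np.ones(n))
smd_after = weighted_smd(x, A, w)
print("SMD before weighting:", round(smd_before, 3))  # large imbalance
print("SMD after weighting: ", round(smd_after, 3))   # near zero
```

The weighting drives the SMD from well above the 0.1 rule of thumb to approximately zero.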

Variance ratios

  • In addition to comparing means, it is also important to assess the balance of the variances of the confounders between the treatment groups in the weighted sample
  • Variance ratios (VRs) can be calculated by dividing the variance of a confounder in the treated group by its variance in the untreated group
  • VRs close to one indicate good balance, while values far from one suggest that the weighting process may not have adequately balanced the confounder
  • If substantial imbalances remain after weighting, it may be necessary to modify the propensity score model or consider alternative methods for estimating causal effects
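The variance ratio check can be sketched the same way (simulated data; weighted variances computed by hand in NumPy):

```python
import numpy as np

def weighted_var(x, w):
    m = np.average(x, weights=w)
    return np.average((x - m) ** 2, weights=w)

rng = np.random.default_rng(6)
n = 20000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))
A = rng.binomial(1, e)
w = A / e + (1 - A) / (1 - e)

# Variance of the confounder in the treated vs untreated weighted samples
vr = weighted_var(x[A == 1], w[A == 1]) / weighted_var(x[A == 0], w[A == 0])
print("variance ratio after weighting:", round(vr, 2))  # close to 1
```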

Comparison to other methods

IPW vs matching

  • Like IPW, matching methods aim to create balanced treatment groups by pairing treated and untreated individuals with similar values of the confounding variables
  • Matching can be done using propensity scores (propensity score matching) or by directly matching on the confounders (covariate matching)
  • Compared to IPW, matching may be more intuitive and easier to communicate, but it can be more sensitive to the choice of matching algorithm and may discard a substantial portion of the data

IPW vs stratification

  • Stratification involves dividing the sample into subgroups (strata) based on the values of the confounding variables and estimating causal effects within each stratum
  • Propensity score stratification is a common approach, where individuals are stratified based on their estimated propensity scores
  • Compared to IPW, stratification may be more robust to model misspecification, but it can be less efficient and may not fully remove confounding if the strata are too coarse

Limitations of IPW

Sensitivity to model misspecification

  • The performance of IPW relies heavily on the correct specification of the propensity score model
  • If important confounders are omitted from the model or if the functional form of the relationship between the confounders and treatment assignment is misspecified, the resulting weights may not adequately balance the confounders, leading to biased estimates of causal effects
  • Sensitivity analyses can be conducted to assess the robustness of the results to potential model misspecification, such as varying the set of confounders included in the propensity score model or using different functional forms

Instability with extreme weights

  • In some cases, the estimated propensity scores may be very close to zero or one, resulting in extremely large weights for some observations
  • These extreme weights can lead to unstable estimates of causal effects and inflated standard errors
  • To mitigate this issue, weight truncation or trimming can be employed, where weights above a certain threshold are set to a maximum value (truncation) or observations with extreme weights are removed from the analysis (trimming)
  • However, truncation and trimming may introduce bias and should be used with caution, as they may alter the target population and the interpretation of the causal effect
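Both remedies are one-liners in practice. A sketch (simulated scores; the 99th-percentile cap and the [0.05, 0.95] trimming window are common but arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10000
# Some propensity scores land very close to 0 or 1
e = 1 / (1 + np.exp(-rng.normal(scale=2.0, size=n)))
A = rng.binomial(1, e)
w = A / e + (1 - A) / (1 - e)

# Truncation: cap weights at the 99th percentile
cap = np.percentile(w, 99)
w_trunc = np.minimum(w, cap)

# Trimming: drop observations with scores outside [0.05, 0.95]
keep = (e > 0.05) & (e < 0.95)

print("max weight:          ", round(w.max(), 1))
print("max truncated weight:", round(w_trunc.max(), 1))
print("trimmed sample size: ", keep.sum(), "of", n)
```

Note that trimming changes who is in the analysis, which is why the bullet above warns that it can alter the target population.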

Applications of IPW

Time-varying treatments

  • IPW can be extended to handle time-varying treatments, where individuals may receive different levels of treatment over time
  • In this setting, weights are calculated based on the probability of an individual receiving their observed treatment history up to each time point, given their covariate history
  • Marginal structural models (MSMs) are often used in conjunction with IPW for time-varying treatments to estimate the causal effect of treatment trajectories on outcomes
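The cumulative-weight construction can be sketched directly: each subject's weight is the product over time of the inverse probability of the treatment actually received at each visit (the treatment probabilities below are hypothetical stand-ins for model-based estimates conditional on covariate and treatment history):

```python
import numpy as np

rng = np.random.default_rng(8)
n, T = 1000, 3
# Hypothetical estimated P(A_t = 1 | history) at each of T time points
e_t = rng.uniform(0.2, 0.8, size=(n, T))
A_t = rng.binomial(1, e_t)

# Probability of the treatment actually received at each time point,
# then the weight is 1 over the product across time
p_obs = np.where(A_t == 1, e_t, 1 - e_t)
w = 1.0 / p_obs.prod(axis=1)
print("weight range:", round(w.min(), 2), "to", round(w.max(), 2))
```

Because weights multiply across visits, they grow quickly with follow-up length, which is one reason stabilized versions are near-universal in this setting.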

Marginal structural models

  • MSMs are a class of models that use IPW to estimate the causal effect of a time-varying treatment on an outcome, while accounting for time-varying confounders that may be affected by prior treatment
  • MSMs model the marginal expectation of the potential outcomes as a function of the treatment history, without conditioning on the time-varying confounders
  • The weights used in MSMs are typically calculated using the inverse probability of treatment weighting (IPTW) approach, which accounts for both the probability of treatment at each time point and the probability of censoring or loss to follow-up

Simulation studies of IPW

Evaluating bias reduction

  • Simulation studies can be used to assess the performance of IPW in reducing bias due to confounding in a controlled setting
  • By generating data with known causal structures and comparing the estimated causal effects to the true values, researchers can evaluate the ability of IPW to recover unbiased estimates under various scenarios
  • Simulation studies can also be used to investigate the impact of violations of the positivity and exchangeability assumptions on the performance of IPW

Comparing to other estimators

  • Simulation studies can also be used to compare the performance of IPW to other causal inference methods, such as matching, stratification, or g-computation
  • By evaluating the bias, variance, and mean squared error of the estimators under different data-generating scenarios, researchers can gain insights into the relative strengths and weaknesses of each approach
  • The results of these simulation studies can inform the choice of method for a given research question and data structure

Extensions of IPW

Doubly robust estimation

  • Doubly robust (DR) estimation combines IPW with an outcome model to provide unbiased estimates of causal effects even if either the propensity score model or the outcome model is misspecified (but not both)
  • DR estimators typically involve fitting a weighted outcome model, where the weights are a function of the propensity score and the outcome model predictions
  • The DR property provides a safeguard against model misspecification and can improve the efficiency of the causal effect estimates compared to IPW alone
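One common doubly robust construction is the augmented IPW (AIPW) estimator, which adds an IPW correction to outcome-model predictions. A sketch on simulated data (true propensity known, outcome models fit by ordinary least squares within each arm; all quantities here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 20000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))                     # true propensity (known here)
A = rng.binomial(1, e)
Y = 2.0 * A + 3.0 * X + rng.normal(size=n)   # true ATE = 2

# Outcome models m1, m0: linear regressions within each treatment arm
D = np.column_stack([np.ones(n), X])
b1, *_ = np.linalg.lstsq(D[A == 1], Y[A == 1], rcond=None)
b0, *_ = np.linalg.lstsq(D[A == 0], Y[A == 0], rcond=None)
m1, m0 = D @ b1, D @ b0

# AIPW: outcome-model predictions plus IPW-corrected residuals
ate = np.mean(m1 - m0
              + A * (Y - m1) / e
              - (1 - A) * (Y - m0) / (1 - e))
print("AIPW estimate of ATE:", round(ate, 2))  # ≈ 2
```

If either the outcome models or the propensity scores were misspecified (but not both), the estimator would remain consistent, which is the double-robustness property described above.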

Targeted maximum likelihood

  • Targeted maximum likelihood estimation (TMLE) is a doubly robust method that combines IPW with a targeted outcome model to optimize the bias-variance tradeoff in causal effect estimation
  • TMLE involves fitting an initial outcome model, updating the model using a targeting step that incorporates the propensity score, and obtaining the final causal effect estimate by averaging the targeted predictions
  • TMLE has been shown to have desirable statistical properties, including double robustness, asymptotic efficiency, and optimal bias-variance tradeoff, making it a promising approach for causal inference in observational studies
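The three steps above can be sketched for a binary outcome in plain NumPy (simulated data; the logistic fits are done by Newton-Raphson rather than a TMLE library such as `tmle3` or `zepid`, and the data-generating values are invented):

```python
import numpy as np

def expit(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(10)
n = 20000
X = rng.normal(size=n)
e = expit(X)                                 # true propensity (known here)
A = rng.binomial(1, e)
Y = rng.binomial(1, expit(-1 + A + X))       # binary outcome

# Step 1: initial outcome model Q(A, X) by logistic Newton-Raphson
Z = np.column_stack([np.ones(n), A, X])
beta = np.zeros(3)
for _ in range(25):
    p = expit(Z @ beta)
    beta += np.linalg.solve((Z.T * (p * (1 - p))) @ Z, Z.T @ (Y - p))
Q = expit(Z @ beta)
Q1 = expit(np.column_stack([np.ones(n), np.ones(n), X]) @ beta)
Q0 = expit(np.column_stack([np.ones(n), np.zeros(n), X]) @ beta)

# Step 2: targeting -- one-parameter fluctuation along the "clever
# covariate" H, with logit(Q) as a fixed offset
H = A / e - (1 - A) / (1 - e)
off = np.log(Q / (1 - Q))
eps = 0.0
for _ in range(25):
    p = expit(off + eps * H)
    eps += np.sum(H * (Y - p)) / np.sum(H**2 * p * (1 - p))

# Step 3: average the targeted predictions to get the ATE
Q1s = expit(np.log(Q1 / (1 - Q1)) + eps * (1 / e))
Q0s = expit(np.log(Q0 / (1 - Q0)) + eps * (-1 / (1 - e)))
ate = np.mean(Q1s - Q0s)
print("TMLE estimate of ATE:", round(ate, 3))
```

Because the initial outcome model is correctly specified here, the fluctuation parameter ends up near zero; the targeting step matters most when the initial fit is imperfect.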