Generalized estimating equations (GEE) are a powerful tool for analyzing longitudinal and clustered data. They extend generalized linear models to account for correlated observations, focusing on population-averaged effects rather than subject-specific ones.
GEE offers flexibility in handling various outcome types and, under certain assumptions, missing data. It provides consistent coefficient estimates even when the correlation structure is misspecified, and it is computationally efficient for large datasets. However, it does not capture subject-specific effects and cannot include random effects.
Generalized Estimating Equations
Overview and Applications
- Generalized estimating equations (GEE) extend generalized linear models (GLMs) to account for the correlation between observations in longitudinal or clustered data
- GEE estimates the average response over the population (population-averaged effects) rather than subject-specific effects
- GEE is used when the primary interest lies in the marginal expectation of the response variable, while accounting for the correlation structure within clusters or subjects
- Applicable to a wide range of data types, including continuous, binary, count, and categorical outcomes
- Can handle missing data when the data are missing completely at random (MCAR); under the weaker missing at random (MAR) assumption, extensions such as weighted GEE are needed for unbiased estimation
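The workflow above can be sketched end to end with statsmodels. This is a minimal sketch on simulated data, not a full analysis: the column names (subject, time, treatment, bp) and the simulated effect sizes are hypothetical, chosen to mirror the blood pressure example used later in these notes.

```python
# Minimal GEE sketch on simulated longitudinal data (hypothetical columns:
# subject, time, treatment, bp). Requires numpy, pandas, and statsmodels.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subjects, n_times = 50, 4
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subjects), n_times),
    "time": np.tile(np.arange(n_times), n_subjects),
    "treatment": np.repeat(rng.integers(0, 2, n_subjects), n_times),
})
# Subject-level random shift induces within-subject correlation.
subject_shift = np.repeat(rng.normal(0, 5, n_subjects), n_times)
df["bp"] = (120 - 2.5 * df["treatment"] + 0.5 * df["time"]
            + subject_shift + rng.normal(0, 3, len(df)))

# Population-averaged (marginal) model: Gaussian family with identity link,
# exchangeable working correlation within subject.
model = smf.gee("bp ~ time * treatment", groups="subject", data=df,
                family=sm.families.Gaussian(),
                cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print(result.summary())
```

Later sketches in these notes reuse `df`, `model`, and `result` from this block.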
Advantages and Limitations
- GEE provides consistent estimates of regression coefficients even if the correlation structure is misspecified, as long as the mean structure is correctly specified
- Computationally efficient and can handle large datasets with many clusters or subjects
- Allows for the use of robust standard errors, which are valid even if the correlation structure is misspecified
- However, GEE does not provide estimates of subject-specific effects, as it focuses on population-averaged effects
- May be inefficient when cluster sizes are highly variable, and the robust (sandwich) standard errors can be unreliable when the number of clusters is small
- Standard GEE assumes that missing data are MCAR; when data are only MAR, unweighted GEE can give biased estimates (weighted GEE is one remedy)
- Does not allow for the inclusion of random effects, which may be necessary to capture subject-specific variability
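To see the role of the robust standard errors mentioned in this list, the fit can be repeated with the model-based ("naive") covariance and the two sets of standard errors compared. This continues the simulated sketch above; the cov_type names follow the statsmodels GEE API, where "robust" (the sandwich estimator) is the default.

```python
# Robust (sandwich) versus naive (model-based) standard errors, reusing the
# `model` object from the earlier sketch.
robust_fit = model.fit(cov_type="robust")  # default: valid even if the working correlation is wrong
naive_fit = model.fit(cov_type="naive")    # trusts the working correlation matrix

print(pd.DataFrame({
    "coef": robust_fit.params,
    "robust_se": robust_fit.bse,
    "naive_se": naive_fit.bse,
}))
```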
GEE vs Other Methods
Comparison with Mixed Effects Models
- GEE estimates population-averaged (marginal) effects, while mixed effects models estimate subject-specific (conditional) effects; the two coincide for linear (identity-link) models but generally differ under nonlinear links such as the logit
- Mixed effects models include random effects to capture subject-specific variability, while GEE does not
- GEE is more robust to misspecification of the correlation structure, while mixed effects models rely on correctly specifying the random effects structure
- GEE is computationally more efficient than mixed effects models, especially for large datasets with many clusters or subjects
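As a rough illustration of the contrast, a linear mixed model with a random intercept per subject can be fit next to the GEE model from the earlier sketch. With an identity link the population-averaged and subject-specific fixed effects are directly comparable; under a nonlinear link such as the logit they would generally differ.

```python
# Subject-specific counterpart: linear mixed model with a random intercept
# per subject, reusing `df` and `result` from the earlier sketch.
mixed = smf.mixedlm("bp ~ time * treatment", data=df, groups=df["subject"]).fit()

# Identity link: the GEE (marginal) and mixed-model (conditional) fixed effects
# estimate comparable quantities here.
print(pd.DataFrame({"gee": result.params, "mixed": mixed.fe_params}))
```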
Comparison with Repeated Measures ANOVA
- GEE can handle a wider range of data types (continuous, binary, count, categorical) compared to repeated measures ANOVA, which is limited to continuous, approximately normal outcomes
- GEE allows for the inclusion of time-varying covariates, while repeated measures ANOVA assumes that covariates are constant over time
- GEE can handle missing data under MCAR (and under MAR with weighted extensions), while repeated measures ANOVA typically requires complete cases or relies on imputation methods
- Repeated measures ANOVA is sensitive to violations of its sphericity assumption, while GEE is robust to misspecification of the working correlation structure
Marginal Models with GEE
Specifying the Mean Structure
- Marginal models specify the mean structure of the response variable as a function of covariates, while accounting for the correlation structure within clusters or subjects
- The mean structure is typically specified using a link function, such as the identity link for continuous outcomes, the logit link for binary outcomes, or the log link for count outcomes
- Example: In a study of blood pressure over time, the mean structure could be specified as a linear function of time, treatment group, and their interaction using an identity link
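In statsmodels the link is set through the family object passed to the GEE call; the defaults below correspond to the identity, logit, and log links listed above. A sketch, continuing the simulated blood pressure data:

```python
# Family objects and their default (canonical) links.
gaussian_fam = sm.families.Gaussian()  # identity link: continuous outcomes
binomial_fam = sm.families.Binomial()  # logit link: binary outcomes
poisson_fam = sm.families.Poisson()    # log link: count outcomes

# Blood pressure mean structure: linear in time, treatment, and their interaction.
bp_model = smf.gee("bp ~ time * treatment", groups="subject", data=df,
                   family=gaussian_fam,
                   cov_struct=sm.cov_struct.Exchangeable())
```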
Specifying the Correlation Structure
- The correlation structure is specified using a working correlation matrix, which can be independent, exchangeable, autoregressive, or unstructured
- Independent: Assumes no correlation between observations within a cluster or subject
- Exchangeable: Assumes a constant correlation between any two observations within a cluster or subject
- Autoregressive (e.g., AR(1)): Assumes that the correlation between observations decays as the time lag between them increases, typically as a power of the lag-one correlation
- Unstructured: Allows for a distinct correlation between any two observations within a cluster or subject
- The choice of the working correlation matrix should be based on the nature of the data and the underlying biological or social processes
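These working correlation choices map onto classes in statsmodels' cov_struct module (a sketch; class names as in recent statsmodels releases, reusing the simulated data from the earlier sketch).

```python
# Working correlation structures, passed via cov_struct= in smf.gee / sm.GEE.
independent = sm.cov_struct.Independence()   # no within-subject correlation
exchangeable = sm.cov_struct.Exchangeable()  # one common within-subject correlation
ar1 = sm.cov_struct.Autoregressive()         # correlation decays with time lag
unstructured = sm.cov_struct.Unstructured()  # separate correlation per pair of time points

# Refit the blood pressure model under an AR(1) working correlation; the
# `time` argument gives the observation order within each subject.
ar_result = smf.gee("bp ~ time * treatment", groups="subject", data=df,
                    time=df["time"], family=sm.families.Gaussian(),
                    cov_struct=ar1).fit()
print(ar_result.summary())
```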
Estimating Regression Coefficients
- The regression coefficients are estimated using quasi-likelihood methods, which involve solving a set of estimating equations that are based on the mean structure and the working correlation matrix
- The sandwich variance estimator is used to obtain robust standard errors for the regression coefficients, which are valid even if the working correlation matrix is misspecified
- Example: In the blood pressure study, the regression coefficients would represent the average change in blood pressure for a one-unit change in time, treatment group, or their interaction
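For reference, the estimating equations and the sandwich (robust) variance estimator described above are commonly written as follows, with K clusters, response vector Y_i and mean mu_i(beta) for cluster i, D_i = d(mu_i)/d(beta), diagonal variance-function matrix A_i, working correlation R_i(alpha), and dispersion phi (standard GEE notation):

```latex
U(\beta) = \sum_{i=1}^{K} D_i^{\top} V_i^{-1}\,\{Y_i - \mu_i(\beta)\} = 0,
\qquad
V_i = \phi\, A_i^{1/2} R_i(\alpha)\, A_i^{1/2}

\widehat{\mathrm{Cov}}(\hat\beta) =
M_0^{-1}\left[\sum_{i=1}^{K} D_i^{\top} V_i^{-1}
(Y_i-\hat\mu_i)(Y_i-\hat\mu_i)^{\top} V_i^{-1} D_i\right] M_0^{-1},
\qquad
M_0 = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} D_i
```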
Interpreting GEE Results
Interpreting Regression Coefficients
- The regression coefficients in GEE represent the population-averaged change in the mean response, on the link scale, for a one-unit change in the corresponding covariate, holding all other covariates constant
- For continuous outcomes, the coefficients directly represent the change in the mean response
- For binary outcomes, the exponentiated coefficients (odds ratios) represent the multiplicative change in the odds of the response
- For count outcomes, the exponentiated coefficients (rate ratios) represent the multiplicative change in the rate of the response
- Example: In the blood pressure study, a coefficient of -2.5 for the treatment group would indicate that, on average, the treatment group has a 2.5 mmHg lower mean blood pressure than the control group, holding time constant (with a time-by-treatment interaction in the model, this main effect applies at the reference time)
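Continuing the simulated sketch, exponentiation applies under non-identity links; the binary indicator below (hypertension, defined from the simulated bp values) is purely hypothetical and is used only to show population-averaged odds ratios.

```python
# Hypothetical binary outcome for illustration: indicator of elevated bp.
df["hypertension"] = (df["bp"] > 125).astype(int)

logit_fit = smf.gee("hypertension ~ time * treatment", groups="subject", data=df,
                    family=sm.families.Binomial(),
                    cov_struct=sm.cov_struct.Exchangeable()).fit()

# Exponentiated coefficients and confidence limits are population-averaged odds
# ratios; the same transformation gives rate ratios under a Poisson (log) family.
odds_ratios = np.exp(logit_fit.params).rename("OR")
or_ci = np.exp(logit_fit.conf_int())
print(pd.concat([odds_ratios, or_ci], axis=1))
```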
Assessing Model Fit and Diagnostics
- The quasi-likelihood information criterion (QIC) can be used to compare the fit of different marginal models, with lower values indicating better fit
- QIC is an extension of the Akaike information criterion (AIC) for GEE models
- Example: Comparing QIC values for models with different mean structures or working correlation matrices can help select the most appropriate model
- Residual plots and other diagnostic tools can be used to assess the adequacy of the mean structure and the correlation structure, and to identify outliers or influential observations
- Residual plots can reveal patterns or trends that suggest misspecification of the mean structure or the presence of outliers
- Influence diagnostics, such as Cook's distance or leverage, can identify observations that have a disproportionate impact on the estimated coefficients
- Example: A residual plot showing a clear non-linear trend would suggest that the mean structure should be modified to include non-linear terms or transformations of the covariates
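A sketch of these model-comparison and diagnostic steps, continuing the simulated example; GEEResults.qic() is assumed to be available (it is in recent statsmodels releases), and matplotlib is used for the residual plot.

```python
import matplotlib.pyplot as plt

# Compare working correlation structures by QIC (lower is better).
for cov in (sm.cov_struct.Independence(),
            sm.cov_struct.Exchangeable(),
            sm.cov_struct.Autoregressive()):
    fit = smf.gee("bp ~ time * treatment", groups="subject", data=df,
                  time=df["time"], family=sm.families.Gaussian(),
                  cov_struct=cov).fit()
    print(type(cov).__name__, fit.qic())

# Residuals versus fitted values: trends suggest a misspecified mean structure;
# isolated extreme points are candidates for outlier or influence checks.
plt.scatter(result.fittedvalues, result.resid, s=10)
plt.axhline(0, color="grey")
plt.xlabel("Fitted mean blood pressure")
plt.ylabel("Residual")
plt.show()
```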