Fiveable

📊 Actuarial Mathematics Unit 11 Review


11.2 Survival analysis and Cox proportional hazards

Written by the Fiveable Content Team • Last updated September 2025

Survival analysis is a crucial tool in actuarial science, focusing on time-to-event data like death, disability, or policy lapse. It provides methods to model and predict these events, helping actuaries make informed decisions about pricing, reserving, and risk management.

Key concepts include time origin and scale, censoring and truncation, and survival and hazard functions. The Cox proportional hazards model is a popular semi-parametric approach, while parametric models like Weibull and Gompertz offer more specific functional forms for survival or hazard functions.

Survival analysis fundamentals

  • Survival analysis is a branch of statistics that studies time-to-event data, focusing on the analysis of the time until an event of interest occurs
  • It plays a crucial role in actuarial science by providing tools to model and predict the occurrence of events such as death, disability, or policy lapse
  • Key concepts in survival analysis include time origin and scale, censoring and truncation, and survival and hazard functions

Time origin and scale

  • Time origin refers to the starting point for measuring the time-to-event, which can be birth, policy inception, or diagnosis of a disease
  • Time scale is the unit of measurement for the time-to-event, commonly measured in years, months, or days
  • The choice of time origin and scale depends on the specific context and research question

Censoring and truncation

  • Censoring occurs when the exact time-to-event is not observed; the most common form is right-censoring, which arises when the event has not occurred by the end of the study or when the individual is lost to follow-up
  • Truncation arises from the sampling design: under left-truncation (delayed entry), an individual is observed only if the event has not occurred before entry into the study, while under right-truncation, only individuals whose event has already occurred by a certain time are included in the sample
  • Proper handling of censoring and truncation is essential to avoid bias in survival analysis

Survival and hazard functions

  • The survival function, denoted as $S(t)$, represents the probability that an individual survives beyond time $t$
  • The hazard function, denoted as $h(t)$, represents the instantaneous risk of experiencing the event at time $t$, given that the individual has survived up to that point
  • The survival and hazard functions are related through the equation: $h(t) = -\frac{d}{dt} \log S(t)$
  • Estimating and interpreting these functions is fundamental to survival analysis
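
The relation $h(t) = -\frac{d}{dt} \log S(t)$ can be verified numerically. The sketch below is a toy example with an assumed exponential survival function (rate $\lambda = 0.5$ chosen arbitrarily): differentiating $\log S(t)$ by central differences recovers the constant hazard.

```python
import math

# Toy check of h(t) = -d/dt log S(t) for an exponential lifetime
# with an arbitrary rate lam = 0.5.
lam = 0.5

def S(t):
    """Survival function of an exponential lifetime."""
    return math.exp(-lam * t)

def hazard_numeric(t, eps=1e-6):
    """Central-difference approximation of -d/dt log S(t)."""
    return -(math.log(S(t + eps)) - math.log(S(t - eps))) / (2 * eps)

for t in [1.0, 5.0, 10.0]:
    print(round(hazard_numeric(t), 6))  # constant hazard, equal to lam
```

For any other parametric model, the same numerical check can be used to confirm that a proposed survival function and hazard function are consistent with each other.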

Nonparametric estimation

  • Nonparametric methods estimate the survival and hazard functions without assuming a specific parametric form for the underlying distribution
  • These methods are useful when the shape of the survival function is unknown or when the assumptions of parametric models are not met
  • Common nonparametric estimators include the Kaplan-Meier estimator and the Nelson-Aalen estimator

Kaplan-Meier estimator

  • The Kaplan-Meier estimator, also known as the product-limit estimator, is a nonparametric method for estimating the survival function from censored data
  • It is calculated as the product of the conditional probabilities of surviving each event time, given survival up to that point
  • The Kaplan-Meier estimator is a step function that jumps at each observed event time and provides a useful visual representation of the survival experience
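
A minimal Kaplan-Meier sketch on an invented toy dataset (the times and event indicators below are illustrative only; `event = 0` marks a right-censored observation):

```python
# Toy data: (time, event); event = 1 is an observed death,
# event = 0 is a right-censored observation.
data = [(2, 1), (3, 1), (3, 0), (5, 1), (8, 0), (9, 1)]

def kaplan_meier(data):
    """Return [(t, S_hat(t))] at each distinct observed event time."""
    times = sorted({t for t, e in data if e == 1})
    s, result = 1.0, []
    for t in times:
        n = sum(1 for ti, _ in data if ti >= t)             # at risk just before t
        d = sum(1 for ti, e in data if ti == t and e == 1)  # events at t
        s *= 1 - d / n                                      # product-limit step
        result.append((t, s))
    return result

for t, s in kaplan_meier(data):
    print(t, round(s, 4))
```

The estimate stays flat between event times and drops by the factor $1 - d_i/n_i$ at each one, which is exactly the step-function behavior described above.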

Nelson-Aalen estimator

  • The Nelson-Aalen estimator is a nonparametric method for estimating the cumulative hazard function, denoted as $H(t)$
  • It is calculated as the cumulative sum, over the observed event times, of the number of events at each time divided by the number of individuals at risk: $\hat{H}(t) = \sum_{t_i \leq t} d_i / n_i$
  • The Nelson-Aalen estimator is particularly useful for assessing the shape of the hazard over time; for example, a plot of $\hat{H}(t)$ against $t$ that is roughly linear suggests a constant hazard
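
A minimal Nelson-Aalen sketch on an invented toy dataset, accumulating at each observed event time the ratio of events to individuals at risk:

```python
# Toy data: (time, event); event = 0 marks a right-censored observation.
data = [(2, 1), (3, 1), (3, 0), (5, 1), (8, 0), (9, 1)]

def nelson_aalen(data):
    """Return [(t, H_hat(t))], the cumulative hazard estimate."""
    times = sorted({t for t, e in data if e == 1})
    h, result = 0.0, []
    for t in times:
        n = sum(1 for ti, _ in data if ti >= t)             # at risk just before t
        d = sum(1 for ti, e in data if ti == t and e == 1)  # events at t
        h += d / n                                          # increment d_i / n_i
        result.append((t, h))
    return result

for t, h in nelson_aalen(data):
    print(t, round(h, 4))
```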

Confidence intervals and bands

  • Confidence intervals and bands provide a measure of uncertainty around the estimated survival or cumulative hazard functions
  • Pointwise confidence intervals are calculated at specific time points and indicate the range of plausible values for the true function at those points
  • Simultaneous confidence bands cover the entire range of the estimated function with a specified probability and are wider than pointwise intervals
  • These measures help assess the precision and reliability of the nonparametric estimates

Parametric survival models

  • Parametric survival models assume a specific functional form for the survival or hazard function, characterized by a set of parameters
  • These models provide a concise summary of the survival experience and allow for extrapolation beyond the observed data
  • Common parametric distributions used in survival analysis include the exponential, Weibull, and Gompertz distributions

Exponential distribution

  • The exponential distribution is the simplest parametric survival model, characterized by a constant hazard rate $\lambda$
  • The survival function for the exponential distribution is given by $S(t) = \exp(-\lambda t)$
  • The exponential distribution is memoryless, meaning that the probability of surviving an additional time $t$ does not depend on the time already survived
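
The memoryless property states that $S(s + t)/S(s) = S(t)$ for any $s, t \geq 0$. A quick numerical check with an arbitrary rate $\lambda = 0.3$:

```python
import math

lam = 0.3  # arbitrary illustrative rate

def S(t):
    """Exponential survival function S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

# Memorylessness: P(T > s + t | T > s) = S(s + t) / S(s) = S(t)
s, t = 4.0, 2.0
conditional = S(s + t) / S(s)
print(round(conditional, 6), round(S(t), 6))  # the two values agree
```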

Weibull distribution

  • The Weibull distribution is a flexible parametric model that allows for increasing, decreasing, or constant hazard rates
  • It is characterized by two parameters: the shape parameter $\alpha$ and the scale parameter $\lambda$
  • The survival function for the Weibull distribution is given by $S(t) = \exp(-(\lambda t)^{\alpha})$
  • When $\alpha = 1$, the Weibull distribution reduces to the exponential distribution
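
The reduction to the exponential case at $\alpha = 1$ is easy to confirm numerically (the parameter values below are arbitrary):

```python
import math

def weibull_S(t, alpha, lam):
    """Weibull survival function S(t) = exp(-(lam * t) ** alpha)."""
    return math.exp(-(lam * t) ** alpha)

# With alpha = 1 the Weibull survival function equals the exponential
# survival function with the same rate lam.
lam = 0.4
for t in [0.5, 1.0, 3.0]:
    print(round(weibull_S(t, 1.0, lam), 6), round(math.exp(-lam * t), 6))
```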

Gompertz distribution

  • The Gompertz distribution is commonly used to model mortality in actuarial science, particularly for older ages
  • It is characterized by an exponentially increasing hazard rate, with parameters $\alpha$ and $\beta$
  • The survival function for the Gompertz distribution is given by $S(t) = \exp(\frac{\alpha}{\beta}(1 - \exp(\beta t)))$
  • The Gompertz distribution is often used in conjunction with the Makeham term to capture age-independent mortality
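
Since $S(t) = \exp(-\int_0^t h(u)\,du)$ and the Gompertz hazard is $h(t) = \alpha e^{\beta t}$, the closed form above can be cross-checked against numerical integration of the hazard. A sketch with hypothetical parameter values (not a calibrated mortality basis):

```python
import math

alpha, beta = 0.001, 0.1  # hypothetical Gompertz parameters

def hazard(t):
    """Gompertz hazard h(t) = alpha * exp(beta * t)."""
    return alpha * math.exp(beta * t)

def S_closed(t):
    """Closed-form Gompertz survival: exp((alpha/beta) * (1 - exp(beta * t)))."""
    return math.exp((alpha / beta) * (1 - math.exp(beta * t)))

def S_numeric(t, n=10000):
    """exp(-integral_0^t h(u) du) via the trapezoid rule."""
    h = t / n
    integral = 0.5 * (hazard(0) + hazard(t)) + sum(hazard(i * h) for i in range(1, n))
    return math.exp(-integral * h)

print(round(S_closed(50), 6), round(S_numeric(50), 6))  # the two agree
```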

Accelerated failure time models

  • Accelerated failure time (AFT) models are a class of parametric survival models that relate the survival time to covariates through a linear equation
  • The logarithm of the survival time is modeled as a linear function of the covariates, with an error term following a specific distribution (e.g., Weibull, log-normal, log-logistic)
  • AFT models provide an alternative to the Cox proportional hazards model when the proportional hazards assumption is not met
  • The interpretation of AFT model coefficients is in terms of the acceleration factor, which represents the change in the time scale due to a unit change in the covariate

Cox proportional hazards model

  • The Cox proportional hazards (PH) model is a semi-parametric regression model for analyzing the effect of covariates on the hazard function
  • It assumes that the hazard ratio between two individuals with different covariate values is constant over time
  • The Cox PH model is widely used in survival analysis due to its flexibility and ability to handle censored data

Model specification and assumptions

  • The Cox PH model specifies the hazard function as the product of a baseline hazard $h_0(t)$ and an exponential term involving the covariates: $h(t|X) = h_0(t) \exp(\beta^T X)$
  • The baseline hazard is left unspecified, making the model semi-parametric
  • The key assumption of the Cox PH model is the proportional hazards assumption, which states that the hazard ratio between two individuals is constant over time

Partial likelihood estimation

  • The parameters of the Cox PH model are estimated using the partial likelihood method, which focuses on the order of the event times rather than their exact values
  • The partial likelihood is constructed by considering the probability of observing each event at its respective time, conditional on the set of individuals at risk
  • The parameter estimates are obtained by maximizing the partial likelihood, and the estimated coefficients are interpreted as log-hazard ratios
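
A minimal sketch of partial likelihood estimation for a single covariate, on an invented toy dataset with no tied event times. The maximizer is found here by ternary search, which is valid because the partial log-likelihood is concave (a real fit would use Newton-type iterations):

```python
import math

# Toy data: (time, event, x) with a single covariate x; no tied event times.
data = [(1, 1, 1.0), (2, 1, 0.0), (4, 0, 1.0), (5, 1, 0.0), (7, 1, 1.0)]

def partial_loglik(beta):
    """Cox partial log-likelihood: sum over events of beta*x_i - log(risk-set sum)."""
    ll = 0.0
    for t_i, e_i, x_i in data:
        if e_i == 1:
            risk = sum(math.exp(beta * x_j) for t_j, _, x_j in data if t_j >= t_i)
            ll += beta * x_i - math.log(risk)
    return ll

# Maximize by ternary search on [-5, 5] (illustration only).
lo, hi = -5.0, 5.0
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if partial_loglik(m1) < partial_loglik(m2):
        lo = m1
    else:
        hi = m2
beta_hat = (lo + hi) / 2
print("beta:", round(beta_hat, 3), "hazard ratio:", round(math.exp(beta_hat), 3))
```

Note that only the ordering of the event times and the composition of each risk set enter the computation; the baseline hazard never appears.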

Hazard ratio interpretation

  • The hazard ratio (HR) is the exponential of the coefficient in the Cox PH model and represents the relative change in the hazard function for a unit increase in the corresponding covariate
  • An HR greater than 1 indicates an increased risk, while an HR less than 1 indicates a decreased risk
  • The HR is assumed to be constant over time, reflecting the proportional hazards assumption

Tied event times

  • Tied event times occur when multiple individuals experience the event at the same recorded time
  • Handling ties is important for the accurate estimation of the Cox PH model parameters
  • Common methods for handling ties include the Breslow approximation, Efron approximation, and exact partial likelihood

Time-dependent covariates

  • Time-dependent covariates are variables whose values change over time, such as age or cumulative exposure
  • The Cox PH model can be extended to incorporate time-dependent covariates by allowing the covariate values to vary with time in the linear predictor
  • Including time-dependent covariates relaxes the proportional hazards assumption and enables the modeling of more complex hazard patterns

Model diagnostics and assessment

  • Model diagnostics and assessment are crucial steps in evaluating the adequacy and validity of survival models
  • These techniques help identify potential violations of model assumptions, assess the model's fit to the data, and compare different models
  • Common diagnostic tools include checking the proportional hazards assumption, residual analysis, goodness-of-fit tests, and model selection criteria

Proportional hazards assumption

  • Verifying the proportional hazards (PH) assumption is essential when using the Cox PH model
  • Graphical methods, such as plotting the scaled Schoenfeld residuals against time or the log-minus-log survival curves, can reveal departures from the PH assumption
  • Statistical tests, such as the Grambsch-Therneau test, provide formal assessments of the PH assumption for each covariate

Residual analysis

  • Residuals are the differences between the observed and expected outcomes based on the fitted model
  • In survival analysis, common types of residuals include Cox-Snell, deviance, and martingale residuals
  • Plotting residuals against time or covariates can reveal patterns or outliers that suggest model misspecification or influential observations

Goodness-of-fit tests

  • Goodness-of-fit tests assess how well the model fits the observed data
  • The likelihood ratio test compares nested models and evaluates the significance of additional covariates
  • The Wald test assesses the significance of individual coefficients in the model
  • The score test is used to test the inclusion of new covariates without refitting the model

Model selection criteria

  • Model selection criteria help choose the best model among a set of candidate models
  • The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) balance the model's fit with its complexity by penalizing the number of parameters
  • Lower values of AIC and BIC indicate better model performance
  • Cross-validation techniques, such as k-fold or leave-one-out, assess the model's predictive performance on unseen data
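
Both criteria are simple functions of a fitted model's maximized log-likelihood; the log-likelihood values below are invented purely for illustration:

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k log n - 2 log L."""
    return k * math.log(n) - 2 * loglik

# Hypothetical fits: model A (2 parameters) vs model B (4 parameters), n = 100.
# Model B fits slightly better but pays a complexity penalty.
print(aic(-210.0, 2), aic(-208.5, 4))          # lower is better
print(bic(-210.0, 2, 100), bic(-208.5, 4, 100))
```

In this made-up comparison the penalty outweighs the fit improvement, so the simpler model is preferred under both criteria; BIC, with its $\log n$ penalty, punishes the extra parameters more heavily.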

Competing risks and multistate models

  • Competing risks occur when an individual is at risk of experiencing multiple mutually exclusive events, such as different causes of death
  • Multistate models extend survival analysis to situations where individuals can transition between different states, such as healthy, diseased, and dead
  • These models provide a framework for analyzing more complex event histories and estimating cause-specific quantities

Cause-specific hazards

  • In the presence of competing risks, the cause-specific hazard function represents the instantaneous risk of experiencing a specific event type, given that no event has occurred up to that point
  • The cause-specific hazard functions are estimated separately for each event type using standard survival analysis methods, treating other event types as censored
  • The overall hazard function is the sum of the cause-specific hazard functions

Cumulative incidence functions

  • The cumulative incidence function (CIF) represents the probability of experiencing a specific event type by a given time, accounting for the presence of competing risks
  • The CIF is estimated using the Aalen-Johansen estimator, which generalizes the Kaplan-Meier estimator to the competing risks setting
  • The CIFs for different event types do not individually approach 1; their sum equals the all-cause cumulative incidence, $1 - S(t)$, where $S(t)$ is the overall survival function
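
A minimal Aalen-Johansen sketch on an invented competing-risks dataset (cause 0 = censored, causes 1 and 2 = competing event types). Each increment weights the cause-specific event proportion $d_k/n$ by the all-cause Kaplan-Meier survival just before the event time:

```python
# Toy data: (time, cause); cause 0 = censored, causes 1 and 2 = event types.
data = [(1, 1), (2, 2), (3, 0), (4, 1), (6, 2), (7, 1)]

def cumulative_incidence(data, cause):
    """Aalen-Johansen CIF: sum of S(t-) * d_cause / n over event times."""
    times = sorted({t for t, c in data if c != 0})
    s_prev, cif, result = 1.0, 0.0, []
    for t in times:
        n = sum(1 for ti, _ in data if ti >= t)                 # at risk
        d_all = sum(1 for ti, c in data if ti == t and c != 0)  # any event
        d_k = sum(1 for ti, c in data if ti == t and c == cause)
        cif += s_prev * d_k / n
        s_prev *= 1 - d_all / n                                 # all-cause KM step
        result.append((t, cif))
    return result

for t, p in cumulative_incidence(data, 1):
    print(t, round(p, 4))
```

In this toy dataset the last subject at risk experiences an event, so the overall survival reaches 0 and the two CIFs sum to 1 at the final event time; more generally they sum to $1 - S(t)$.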

Multistate Markov models

  • Multistate Markov models describe the transitions between different states over time, assuming that the future state depends only on the current state (Markov property)
  • The transition intensities between states are estimated using methods such as the Aalen-Johansen estimator or maximum likelihood estimation
  • Multistate models allow for the estimation of state occupation probabilities, transition probabilities, and sojourn times
  • These models are particularly useful in actuarial applications, such as disability insurance and long-term care insurance

Recurrent event analysis

  • Recurrent event analysis deals with situations where an individual can experience the same event multiple times, such as repeated hospitalizations or insurance claims
  • These models account for the dependence between events within an individual and provide insights into the event process over time
  • Common approaches to recurrent event analysis include intensity functions, variance-corrected models, and frailty models

Intensity functions

  • The intensity function represents the instantaneous risk of experiencing an event at a specific time, given the event history up to that point
  • The Andersen-Gill model extends the Cox PH model to recurrent events by treating each event as a separate observation and allowing for time-dependent covariates
  • The Prentice-Williams-Peterson model stratifies the baseline hazard by the event number, accommodating event-specific baseline hazards

Variance-corrected models

  • Variance-corrected models account for the dependence between events within an individual by adjusting the standard errors of the parameter estimates
  • The robust variance estimator, also known as the sandwich estimator, provides consistent estimates of the standard errors in the presence of within-individual correlation
  • Variance-corrected models maintain the interpretation of the hazard ratios while ensuring valid inference

Frailty models

  • Frailty models introduce a random effect term, called frailty, to capture unobserved heterogeneity among individuals in their susceptibility to events
  • The frailty term is assumed to follow a specific distribution, such as gamma or log-normal, and induces dependence between events within an individual
  • Frailty models allow for the estimation of individual-specific hazard functions and provide insights into the sources of variability in the event process

Applications in actuarial science

  • Survival analysis has numerous applications in actuarial science, where the modeling and prediction of time-to-event outcomes are crucial for pricing, reserving, and risk management
  • Some key areas where survival analysis is applied in actuarial science include life insurance, pension plans, health insurance, and credit risk modeling

Life insurance pricing and reserving

  • Survival analysis is used to model mortality rates and estimate life expectancies for pricing life insurance products
  • Parametric survival models, such as the Gompertz-Makeham model, are commonly used to fit mortality data and generate life tables
  • Survival models help determine the appropriate level of reserves required to meet future policy obligations
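
As a pricing-oriented illustration, the complete expectation of life $\mathring{e}_0 = \int_0^\infty S(t)\,dt$ can be computed from an assumed Gompertz-Makeham hazard $\mu(t) = A + \alpha e^{\beta t}$. All parameter values below are hypothetical, not a calibrated mortality basis:

```python
import math

# Hypothetical Gompertz-Makeham parameters: A is the age-independent
# (Makeham) term, alpha and beta drive the Gompertz component.
A, alpha, beta = 0.0005, 0.00003, 0.10

def survival(t):
    """S(t) = exp(-A*t - (alpha/beta) * (exp(beta*t) - 1))."""
    return math.exp(-A * t - (alpha / beta) * (math.exp(beta * t) - 1))

def life_expectancy(max_age=120, step=0.01):
    """Complete expectation of life e_0 = integral of S(t) dt (trapezoid rule)."""
    ages = [i * step for i in range(int(max_age / step) + 1)]
    return step * (sum(survival(a) for a in ages)
                   - 0.5 * (survival(0) + survival(max_age)))

print(round(life_expectancy(), 2))  # expected years of life at age 0
```

The same integral evaluated from a later starting age gives the expectation of future lifetime used in annuity and life insurance premium calculations.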

Pension plan valuation

  • Survival analysis is employed to model the longevity risk in pension plans, considering factors such as age, gender, and socioeconomic status
  • Multistate models are used to capture the transitions between active, retired, and deceased states for pension plan members
  • Accurate modeling of survival probabilities is essential for estimating pension liabilities and designing sustainable pension schemes

Health insurance claims modeling

  • Survival analysis is applied to model the incidence and duration of health insurance claims, such as hospitalizations or disability events
  • Recurrent event models are used to analyze the frequency and timing of repeated claims for an individual policyholder
  • Survival models help insurers set appropriate premiums, manage claims reserves, and design efficient health insurance products

Credit risk modeling

  • Survival analysis is used in credit risk modeling to estimate the probability of default and the timing of credit events, such as loan defaults or bond failures
  • The Cox PH model is commonly employed to assess the impact of borrower characteristics and macroeconomic factors on the hazard of default
  • Competing risks models are used to distinguish between different types of credit events, such as default, prepayment, and maturity
  • Survival models contribute to the development of credit scoring systems and the calculation of credit risk measures, such as expected loss and value-at-risk