Fiveable

📊 Actuarial Mathematics Unit 11 Review


11.2 Survival analysis and Cox proportional hazards

Written by the Fiveable Content Team • Last updated September 2025

Survival analysis is a crucial tool in actuarial science, focusing on time-to-event data like death, disability, or policy lapse. It provides methods to model and predict these events, helping actuaries make informed decisions about pricing, reserving, and risk management.

Key concepts include time origin and scale, censoring and truncation, and survival and hazard functions. The Cox proportional hazards model is a popular semi-parametric approach, while parametric models like Weibull and Gompertz offer more specific functional forms for survival or hazard functions.

Survival analysis fundamentals

  • Survival analysis is a branch of statistics that studies time-to-event data, focusing on the analysis of the time until an event of interest occurs
  • It plays a crucial role in actuarial science by providing tools to model and predict the occurrence of events such as death, disability, or policy lapse
  • Key concepts in survival analysis include time origin and scale, censoring and truncation, and survival and hazard functions

Time origin and scale

  • Time origin refers to the starting point for measuring the time-to-event, which can be birth, policy inception, or diagnosis of a disease
  • Time scale is the unit of measurement for the time-to-event, commonly measured in years, months, or days
  • The choice of time origin and scale depends on the specific context and research question

Censoring and truncation

  • Censoring occurs when the exact time-to-event is not observed; the most common form is right-censoring, which arises when the event has not occurred by the end of the study or when the individual is lost to follow-up
  • Truncation arises from the sampling design: under left-truncation (delayed entry), an individual is observed only if the event has not occurred before entry into the study, while under right-truncation, only individuals whose event has already occurred by a certain time are included in the sample
  • Proper handling of censoring and truncation is essential to avoid bias in survival analysis

Survival and hazard functions

  • The survival function, denoted as $S(t)$, represents the probability that an individual survives beyond time $t$
  • The hazard function, denoted as $h(t)$, represents the instantaneous risk of experiencing the event at time $t$, given that the individual has survived up to that point
  • The survival and hazard functions are related through the equation: $h(t) = -\frac{d}{dt} \log S(t)$
  • Estimating and interpreting these functions is fundamental to survival analysis
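
The relation $h(t) = -\frac{d}{dt} \log S(t)$ can be verified numerically. The sketch below is a toy example with an assumed exponential survival function (rate $\lambda = 0.5$ chosen arbitrarily): differentiating $\log S(t)$ by central differences recovers the constant hazard.

```python
import math

# Toy check of h(t) = -d/dt log S(t) for an exponential lifetime
# with an arbitrary rate lam = 0.5.
lam = 0.5

def S(t):
    """Survival function of an exponential lifetime."""
    return math.exp(-lam * t)

def hazard_numeric(t, eps=1e-6):
    """Central-difference approximation of -d/dt log S(t)."""
    return -(math.log(S(t + eps)) - math.log(S(t - eps))) / (2 * eps)

for t in [1.0, 5.0, 10.0]:
    print(round(hazard_numeric(t), 6))  # constant hazard, equal to lam
```

For any other parametric model, the same numerical check can be used to confirm that a proposed survival function and hazard function are consistent with each other.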

Nonparametric estimation

  • Nonparametric methods estimate the survival and hazard functions without assuming a specific parametric form for the underlying distribution
  • These methods are useful when the shape of the survival function is unknown or when the assumptions of parametric models are not met
  • Common nonparametric estimators include the Kaplan-Meier estimator and the Nelson-Aalen estimator

Kaplan-Meier estimator

  • The Kaplan-Meier estimator, also known as the product-limit estimator, is a nonparametric method for estimating the survival function from censored data
  • It is calculated as the product of the conditional probabilities of surviving each event time, given survival up to that point
  • The Kaplan-Meier estimator is a step function that jumps at each observed event time and provides a useful visual representation of the survival experience
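
A minimal Kaplan-Meier sketch on an invented toy dataset (the times and event indicators below are illustrative only; `event = 0` marks a right-censored observation):

```python
# Toy data: (time, event); event = 1 is an observed death,
# event = 0 is a right-censored observation.
data = [(2, 1), (3, 1), (3, 0), (5, 1), (8, 0), (9, 1)]

def kaplan_meier(data):
    """Return [(t, S_hat(t))] at each distinct observed event time."""
    times = sorted({t for t, e in data if e == 1})
    s, result = 1.0, []
    for t in times:
        n = sum(1 for ti, _ in data if ti >= t)             # at risk just before t
        d = sum(1 for ti, e in data if ti == t and e == 1)  # events at t
        s *= 1 - d / n                                      # product-limit step
        result.append((t, s))
    return result

for t, s in kaplan_meier(data):
    print(t, round(s, 4))
```

The estimate stays flat between event times and drops by the factor $1 - d_i/n_i$ at each one, which is exactly the step-function behavior described above.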

Nelson-Aalen estimator

  • The Nelson-Aalen estimator is a nonparametric method for estimating the cumulative hazard function, denoted as $H(t)$
  • It is calculated as the cumulative sum, over the observed event times, of the number of events at each time divided by the number of individuals at risk: $\hat{H}(t) = \sum_{t_i \leq t} d_i / n_i$
  • The Nelson-Aalen estimator is particularly useful for assessing the shape of the hazard over time; for example, a plot of $\hat{H}(t)$ against $t$ that is roughly linear suggests a constant hazard
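
A minimal Nelson-Aalen sketch on an invented toy dataset, accumulating at each observed event time the ratio of events to individuals at risk:

```python
# Toy data: (time, event); event = 0 marks a right-censored observation.
data = [(2, 1), (3, 1), (3, 0), (5, 1), (8, 0), (9, 1)]

def nelson_aalen(data):
    """Return [(t, H_hat(t))], the cumulative hazard estimate."""
    times = sorted({t for t, e in data if e == 1})
    h, result = 0.0, []
    for t in times:
        n = sum(1 for ti, _ in data if ti >= t)             # at risk just before t
        d = sum(1 for ti, e in data if ti == t and e == 1)  # events at t
        h += d / n                                          # increment d_i / n_i
        result.append((t, h))
    return result

for t, h in nelson_aalen(data):
    print(t, round(h, 4))
```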

Confidence intervals and bands

  • Confidence intervals and bands provide a measure of uncertainty around the estimated survival or cumulative hazard functions
  • Pointwise confidence intervals are calculated at specific time points and indicate the range of plausible values for the true function at those points
  • Simultaneous confidence bands cover the entire range of the estimated function with a specified probability and are wider than pointwise intervals
  • These measures help assess the precision and reliability of the nonparametric estimates

Parametric survival models

  • Parametric survival models assume a specific functional form for the survival or hazard function, characterized by a set of parameters
  • These models provide a concise summary of the survival experience and allow for extrapolation beyond the observed data
  • Common parametric distributions used in survival analysis include the exponential, Weibull, and Gompertz distributions

Exponential distribution

  • The exponential distribution is the simplest parametric survival model, characterized by a constant hazard rate $\lambda$
  • The survival function for the exponential distribution is given by $S(t) = \exp(-\lambda t)$
  • The exponential distribution is memoryless, meaning that the probability of surviving an additional time $t$ does not depend on the time already survived
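
The memoryless property states that $S(s + t)/S(s) = S(t)$ for any $s, t \geq 0$. A quick numerical check with an arbitrary rate $\lambda = 0.3$:

```python
import math

lam = 0.3  # arbitrary illustrative rate

def S(t):
    """Exponential survival function S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

# Memorylessness: P(T > s + t | T > s) = S(s + t) / S(s) = S(t)
s, t = 4.0, 2.0
conditional = S(s + t) / S(s)
print(round(conditional, 6), round(S(t), 6))  # the two values agree
```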

Weibull distribution

  • The Weibull distribution is a flexible parametric model that allows for increasing, decreasing, or constant hazard rates
  • It is characterized by two parameters: the shape parameter $\alpha$ and the scale parameter $\lambda$
  • The survival function for the Weibull distribution is given by $S(t) = \exp(-(\lambda t)^{\alpha})$
  • When $\alpha = 1$, the Weibull distribution reduces to the exponential distribution
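
The reduction to the exponential case at $\alpha = 1$ is easy to confirm numerically (the parameter values below are arbitrary):

```python
import math

def weibull_S(t, alpha, lam):
    """Weibull survival function S(t) = exp(-(lam * t) ** alpha)."""
    return math.exp(-(lam * t) ** alpha)

# With alpha = 1 the Weibull survival function equals the exponential
# survival function with the same rate lam.
lam = 0.4
for t in [0.5, 1.0, 3.0]:
    print(round(weibull_S(t, 1.0, lam), 6), round(math.exp(-lam * t), 6))
```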

Gompertz distribution

  • The Gompertz distribution is commonly used to model mortality in actuarial science, particularly for older ages
  • It is characterized by an exponentially increasing hazard rate, with parameters $\alpha$ and $\beta$
  • The survival function for the Gompertz distribution is given by $S(t) = \exp(\frac{\alpha}{\beta}(1 - \exp(\beta t)))$
  • The Gompertz distribution is often used in conjunction with the Makeham term to capture age-independent mortality
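
Since $S(t) = \exp(-\int_0^t h(u)\,du)$ and the Gompertz hazard is $h(t) = \alpha e^{\beta t}$, the closed form above can be cross-checked against numerical integration of the hazard. A sketch with hypothetical parameter values (not a calibrated mortality basis):

```python
import math

alpha, beta = 0.001, 0.1  # hypothetical Gompertz parameters

def hazard(t):
    """Gompertz hazard h(t) = alpha * exp(beta * t)."""
    return alpha * math.exp(beta * t)

def S_closed(t):
    """Closed-form Gompertz survival: exp((alpha/beta) * (1 - exp(beta * t)))."""
    return math.exp((alpha / beta) * (1 - math.exp(beta * t)))

def S_numeric(t, n=10000):
    """exp(-integral_0^t h(u) du) via the trapezoid rule."""
    h = t / n
    integral = 0.5 * (hazard(0) + hazard(t)) + sum(hazard(i * h) for i in range(1, n))
    return math.exp(-integral * h)

print(round(S_closed(50), 6), round(S_numeric(50), 6))  # the two agree
```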

Accelerated failure time models

  • Accelerated failure time (AFT) models are a class of parametric survival models that relate the survival time to covariates through a linear equation
  • The logarithm of the survival time is modeled as a linear function of the covariates, with an error term following a specific distribution (e.g., Weibull, log-normal, log-logistic)
  • AFT models provide an alternative to the Cox proportional hazards model when the proportional hazards assumption is not met
  • The interpretation of AFT model coefficients is in terms of the acceleration factor, which represents the change in the time scale due to a unit change in the covariate

Cox proportional hazards model

  • The Cox proportional hazards (PH) model is a semi-parametric regression model for analyzing the effect of covariates on the hazard function
  • It assumes that the hazard ratio between two individuals with different covariate values is constant over time
  • The Cox PH model is widely used in survival analysis due to its flexibility and ability to handle censored data

Model specification and assumptions

  • The Cox PH model specifies the hazard function as the product of a baseline hazard $h_0(t)$ and an exponential term involving the covariates: $h(t|X) = h_0(t) \exp(\beta^T X)$
  • The baseline hazard is left unspecified, making the model semi-parametric
  • The key assumption of the Cox PH model is the proportional hazards assumption, which states that the hazard ratio between two individuals is constant over time

Partial likelihood estimation

  • The parameters of the Cox PH model are estimated using the partial likelihood method, which focuses on the order of the event times rather than their exact values
  • The partial likelihood is constructed by considering the probability of observing each event at its respective time, conditional on the set of individuals at risk
  • The parameter estimates are obtained by maximizing the partial likelihood, and the estimated coefficients are interpreted as log-hazard ratios
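
A minimal sketch of partial likelihood estimation for a single covariate, on an invented toy dataset with no tied event times. The maximizer is found here by ternary search, which is valid because the partial log-likelihood is concave (a real fit would use Newton-type iterations):

```python
import math

# Toy data: (time, event, x) with a single covariate x; no tied event times.
data = [(1, 1, 1.0), (2, 1, 0.0), (4, 0, 1.0), (5, 1, 0.0), (7, 1, 1.0)]

def partial_loglik(beta):
    """Cox partial log-likelihood: sum over events of beta*x_i - log(risk-set sum)."""
    ll = 0.0
    for t_i, e_i, x_i in data:
        if e_i == 1:
            risk = sum(math.exp(beta * x_j) for t_j, _, x_j in data if t_j >= t_i)
            ll += beta * x_i - math.log(risk)
    return ll

# Maximize by ternary search on [-5, 5] (illustration only).
lo, hi = -5.0, 5.0
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if partial_loglik(m1) < partial_loglik(m2):
        lo = m1
    else:
        hi = m2
beta_hat = (lo + hi) / 2
print("beta:", round(beta_hat, 3), "hazard ratio:", round(math.exp(beta_hat), 3))
```

Note that only the ordering of the event times and the composition of each risk set enter the computation; the baseline hazard never appears.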

Hazard ratio interpretation

  • The hazard ratio (HR) is the exponential of the coefficient in the Cox PH model and represents the relative change in the hazard function for a unit increase in the corresponding covariate
  • An HR greater than 1 indicates an increased risk, while an HR less than 1 indicates a decreased risk
  • The HR is assumed to be constant over time, reflecting the proportional hazards assumption

Tied event times

  • Tied event times occur when multiple individuals experience the event at the same recorded time
  • Handling ties is important for the accurate estimation of the Cox PH model parameters
  • Common methods for handling ties include the Breslow approximation, Efron approximation, and exact partial likelihood

Time-dependent covariates

  • Time-dependent covariates are variables whose values change over time, such as age or cumulative exposure
  • The Cox PH model can be extended to incorporate time-dependent covariates by allowing the covariate values to vary with time in the linear predictor
  • Including time-dependent covariates relaxes the proportional hazards assumption and enables the modeling of more complex hazard patterns

Model diagnostics and assessment

  • Model diagnostics and assessment are crucial steps in evaluating the adequacy and validity of survival models
  • These techniques help identify potential violations of model assumptions, assess the model's fit to the data, and compare different models
  • Common diagnostic tools include checking the proportional hazards assumption, residual analysis, goodness-of-fit tests, and model selection criteria

Proportional hazards assumption

  • Verifying the proportional hazards (PH) assumption is essential when using the Cox PH model
  • Graphical methods, such as plotting the scaled Schoenfeld residuals against time or the log-minus-log survival curves, can reveal departures from the PH assumption
  • Statistical tests, such as the Grambsch-Therneau test, provide formal assessments of the PH assumption for each covariate

Residual analysis

  • Residuals are the differences between the observed and expected outcomes based on the fitted model
  • In survival analysis, common types of residuals include Cox-Snell, deviance, and martingale residuals
  • Plotting residuals against time or covariates can reveal patterns or outliers that suggest model misspecification or influential observations

Goodness-of-fit tests

  • Goodness-of-fit tests assess how well the model fits the observed data
  • The likelihood ratio test compares nested models and evaluates the significance of additional covariates
  • The Wald test assesses the significance of individual coefficients in the model
  • The score test is used to test the inclusion of new covariates without refitting the model

Model selection criteria

  • Model selection criteria help choose the best model among a set of candidate models
  • The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) balance the model's fit with its complexity by penalizing the number of parameters
  • Lower values of AIC and BIC indicate better model performance
  • Cross-validation techniques, such as k-fold or leave-one-out, assess the model's predictive performance on unseen data
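
Both criteria are simple functions of a fitted model's maximized log-likelihood; the log-likelihood values below are invented purely for illustration:

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k log n - 2 log L."""
    return k * math.log(n) - 2 * loglik

# Hypothetical fits: model A (2 parameters) vs model B (4 parameters), n = 100.
# Model B fits slightly better but pays a complexity penalty.
print(aic(-210.0, 2), aic(-208.5, 4))          # lower is better
print(bic(-210.0, 2, 100), bic(-208.5, 4, 100))
```

In this made-up comparison the penalty outweighs the fit improvement, so the simpler model is preferred under both criteria; BIC, with its $\log n$ penalty, punishes the extra parameters more heavily.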

Competing risks and multistate models

  • Competing risks occur when an individual is at risk of experiencing multiple mutually exclusive events, such as different causes of death
  • Multistate models extend survival analysis to situations where individuals can transition between different states, such as healthy, diseased, and dead
  • These models provide a framework for analyzing more complex event histories and estimating cause-specific quantities

Cause-specific hazards

  • In the presence of competing risks, the cause-specific hazard function represents the instantaneous risk of experiencing a specific event type, given that no event has occurred up to that point
  • The cause-specific hazard functions are estimated separately for each event type using standard survival analysis methods, treating other event types as censored
  • The overall hazard function is the sum of the cause-specific hazard functions

Cumulative incidence functions

  • The cumulative incidence function (CIF) represents the probability of experiencing a specific event type by a given time, accounting for the presence of competing risks
  • The CIF is estimated using the Aalen-Johansen estimator, which generalizes the Kaplan-Meier estimator to the competing risks setting
  • The CIFs for different event types do not individually approach 1; their sum equals the all-cause cumulative incidence, $1 - S(t)$, where $S(t)$ is the overall survival function
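
A minimal Aalen-Johansen sketch on an invented competing-risks dataset (cause 0 = censored, causes 1 and 2 = competing event types). Each increment weights the cause-specific event proportion $d_k/n$ by the all-cause Kaplan-Meier survival just before the event time:

```python
# Toy data: (time, cause); cause 0 = censored, causes 1 and 2 = event types.
data = [(1, 1), (2, 2), (3, 0), (4, 1), (6, 2), (7, 1)]

def cumulative_incidence(data, cause):
    """Aalen-Johansen CIF: sum of S(t-) * d_cause / n over event times."""
    times = sorted({t for t, c in data if c != 0})
    s_prev, cif, result = 1.0, 0.0, []
    for t in times:
        n = sum(1 for ti, _ in data if ti >= t)                 # at risk
        d_all = sum(1 for ti, c in data if ti == t and c != 0)  # any event
        d_k = sum(1 for ti, c in data if ti == t and c == cause)
        cif += s_prev * d_k / n
        s_prev *= 1 - d_all / n                                 # all-cause KM step
        result.append((t, cif))
    return result

for t, p in cumulative_incidence(data, 1):
    print(t, round(p, 4))
```

In this toy dataset the last subject at risk experiences an event, so the overall survival reaches 0 and the two CIFs sum to 1 at the final event time; more generally they sum to $1 - S(t)$.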

Multistate Markov models

  • Multistate Markov models describe the transitions between different states over time, assuming that the future state depends only on the current state (Markov property)
  • The transition intensities between states are estimated using methods such as the Aalen-Johansen estimator or maximum likelihood estimation
  • Multistate models allow for the estimation of state occupation probabilities, transition probabilities, and sojourn times
  • These models are particularly useful in actuarial applications, such as disability insurance and long-term care insurance

Recurrent event analysis

  • Recurrent event analysis deals with situations where an individual can experience the same event multiple times, such as repeated hospitalizations or insurance claims
  • These models account for the dependence between events within an individual and provide insights into the event process over time
  • Common approaches to recurrent event analysis include intensity functions, variance-corrected models, and frailty models

Intensity functions

  • The intensity function represents the instantaneous risk of experiencing an event at a specific time, given the event history up to that point
  • The Andersen-Gill model extends the Cox PH model to recurrent events by treating each event as a separate observation and allowing for time-dependent covariates
  • The Prentice-Williams-Peterson model stratifies the baseline hazard by the event number, accommodating event-specific baseline hazards

Variance-corrected models

  • Variance-corrected models account for the dependence between events within an individual by adjusting the standard errors of the parameter estimates
  • The robust variance estimator, also known as the sandwich estimator, provides consistent estimates of the standard errors in the presence of within-individual correlation
  • Variance-corrected models maintain the interpretation of the hazard ratios while ensuring valid inference

Frailty models

  • Frailty models introduce a random effect term, called frailty, to capture unobserved heterogeneity among individuals in their susceptibility to events
  • The frailty term is assumed to follow a specific distribution, such as gamma or log-normal, and induces dependence between events within an individual
  • Frailty models allow for the estimation of individual-specific hazard functions and provide insights into the sources of variability in the event process

Applications in actuarial science

  • Survival analysis has numerous applications in actuarial science, where the modeling and prediction of time-to-event outcomes are crucial for pricing, reserving, and risk management
  • Some key areas where survival analysis is applied in actuarial science include life insurance, pension plans, health insurance, and credit risk modeling

Life insurance pricing and reserving

  • Survival analysis is used to model mortality rates and estimate life expectancies for pricing life insurance products
  • Parametric survival models, such as the Gompertz-Makeham model, are commonly used to fit mortality data and generate life tables
  • Survival models help determine the appropriate level of reserves required to meet future policy obligations
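
As a pricing-oriented illustration, the complete expectation of life $\mathring{e}_0 = \int_0^\infty S(t)\,dt$ can be computed from an assumed Gompertz-Makeham hazard $\mu(t) = A + \alpha e^{\beta t}$. All parameter values below are hypothetical, not a calibrated mortality basis:

```python
import math

# Hypothetical Gompertz-Makeham parameters: A is the age-independent
# (Makeham) term, alpha and beta drive the Gompertz component.
A, alpha, beta = 0.0005, 0.00003, 0.10

def survival(t):
    """S(t) = exp(-A*t - (alpha/beta) * (exp(beta*t) - 1))."""
    return math.exp(-A * t - (alpha / beta) * (math.exp(beta * t) - 1))

def life_expectancy(max_age=120, step=0.01):
    """Complete expectation of life e_0 = integral of S(t) dt (trapezoid rule)."""
    ages = [i * step for i in range(int(max_age / step) + 1)]
    return step * (sum(survival(a) for a in ages)
                   - 0.5 * (survival(0) + survival(max_age)))

print(round(life_expectancy(), 2))  # expected years of life at age 0
```

The same integral evaluated from a later starting age gives the expectation of future lifetime used in annuity and life insurance premium calculations.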

Pension plan valuation

  • Survival analysis is employed to model the longevity risk in pension plans, considering factors such as age, gender, and socioeconomic status
  • Multistate models are used to capture the transitions between active, retired, and deceased states for pension plan members
  • Accurate modeling of survival probabilities is essential for estimating pension liabilities and designing sustainable pension schemes

Health insurance claims modeling

  • Survival analysis is applied to model the incidence and duration of health insurance claims, such as hospitalizations or disability events
  • Recurrent event models are used to analyze the frequency and timing of repeated claims for an individual policyholder
  • Survival models help insurers set appropriate premiums, manage claims reserves, and design efficient health insurance products

Credit risk modeling

  • Survival analysis is used in credit risk modeling to estimate the probability of default and the timing of credit events, such as loan defaults or bond failures
  • The Cox PH model is commonly employed to assess the impact of borrower characteristics and macroeconomic factors on the hazard of default
  • Competing risks models are used to distinguish between different types of credit events, such as default, prepayment, and maturity
  • Survival models contribute to the development of credit scoring systems and the calculation of credit risk measures, such as expected loss and value-at-risk