📊 Actuarial Mathematics Unit 11 Review

11.1 Generalized linear models and regression analysis

Written by the Fiveable Content Team • Last updated September 2025
Generalized linear models expand on traditional linear regression, allowing for analysis of non-normal data common in insurance and finance. They combine a response variable, linear predictor, link function, and variance function to model complex relationships between variables.

GLMs are crucial for actuaries in risk assessment and pricing. By accommodating various data distributions, they provide flexible tools for modeling claim frequency, severity, and other key metrics in insurance and financial applications.

Fundamentals of generalized linear models

  • Generalized linear models (GLMs) extend the concept of linear regression to accommodate a wider range of response variable distributions, making them essential tools in actuarial modeling and risk assessment
  • GLMs allow for the analysis of non-normal data, such as count data, binary outcomes, and continuous positive data, which are common in insurance and financial applications

Components of GLMs

  • Response variable: The dependent variable in a GLM, which follows a distribution from the exponential family
  • Linear predictor: A linear combination of the explanatory variables, denoted as $\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p$
  • Link function: A function that relates the expected value of the response variable to the linear predictor, allowing for non-linear relationships between the predictors and the response
  • Variance function: Describes the relationship between the mean and variance of the response variable, which depends on the chosen distribution
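For example, a claim-frequency model specifies all four components at once:

$$Y_i \sim \text{Poisson}(\mu_i), \qquad \eta_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}, \qquad \log(\mu_i) = \eta_i, \qquad \text{Var}(Y_i) = \mu_i$$

Here the log link keeps the fitted mean positive, and the variance function $V(\mu) = \mu$ is dictated by the Poisson distribution.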

Exponential family of distributions

  • The exponential family includes a wide range of distributions, such as Normal, Poisson, Binomial, Gamma, and Inverse Gaussian
  • Distributions in the exponential family have a common form for their probability density or mass function, given by $f(y; \theta, \phi) = \exp\left(\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right)$ (the Poisson case is worked out after this list)
    • $\theta$: natural parameter
    • $\phi$: dispersion parameter
    • $a(\cdot)$, $b(\cdot)$, and $c(\cdot)$: functions specific to each distribution
  • The link function $g(\cdot)$ relates the expected value of the response variable $\mu = \mathbb{E}(Y)$ to the linear predictor $\eta$, such that $g(\mu) = \eta$
  • Common link functions include:
    • Identity link: $g(\mu) = \mu$ (used in linear regression)
    • Log link: $g(\mu) = \log(\mu)$ (used in Poisson regression)
    • Logit link: $g(\mu) = \log\left(\frac{\mu}{1-\mu}\right)$ (used in logistic regression)
    • Inverse link: $g(\mu) = \frac{1}{\mu}$ (used in Gamma regression)
  • The choice of link function depends on the distribution of the response variable and the desired interpretation of the model coefficients
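To make the exponential-family notation concrete, the Poisson probability mass function can be rewritten in the common form above:

$$P(Y = y) = \frac{e^{-\lambda}\lambda^y}{y!} = \exp\left(y\log\lambda - \lambda - \log y!\right)$$

so that $\theta = \log\lambda$, $b(\theta) = e^\theta$, $a(\phi) = 1$, and $c(y, \phi) = -\log y!$. The canonical link is the one that sets $g(\mu) = \theta$; since $\mu = \lambda$ here, this recovers the log link used in Poisson regression.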

Maximum likelihood estimation in GLMs

  • Maximum likelihood estimation (MLE) is a method for estimating the parameters of a GLM by maximizing the likelihood function, which measures the probability of observing the data given the model parameters
  • MLE is a fundamental concept in actuarial science, as it allows for the estimation of risk parameters and the assessment of model fit

Log-likelihood function

  • The log-likelihood function is the natural logarithm of the likelihood function, given by $\ell(\boldsymbol{\beta}) = \sum_{i=1}^n \log f(y_i; \theta_i, \phi)$, where $\boldsymbol{\beta}$ is the vector of regression coefficients
  • Maximizing the log-likelihood function is equivalent to maximizing the likelihood function, but it is often more convenient to work with the log-likelihood due to its additive properties

Fisher scoring algorithm

  • The Fisher scoring algorithm is an iterative method for finding the maximum likelihood estimates of the regression coefficients in a GLM
  • The algorithm updates the estimates at each iteration using the Fisher information matrix, which is the expected value of the negative Hessian matrix of the log-likelihood function
  • The Fisher scoring algorithm is often more stable than the Newton-Raphson method because it replaces the observed Hessian with its expectation, which helps when the likelihood surface is not well-behaved

Iteratively reweighted least squares

  • Iteratively reweighted least squares (IRLS) is an alternative formulation of the Fisher scoring algorithm that is commonly used in statistical software packages
  • IRLS recasts the GLM estimation problem as a weighted least squares problem, where the weights are updated at each iteration based on the current estimates of the regression coefficients and the variance function
  • The IRLS algorithm converges to the same maximum likelihood estimates as the Fisher scoring algorithm, but it provides additional insights into the structure of the GLM and the role of the variance function
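A minimal IRLS sketch for Poisson regression with the log link, written in plain numpy; the function name and simulated data are illustrative, not taken from any particular package:

```python
import numpy as np

def irls_poisson(X, y, tol=1e-8, max_iter=100):
    """Fit a Poisson GLM with log link via iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta                 # current linear predictor
        mu = np.exp(eta)               # inverse of the log link
        W = mu                         # IRLS weights; for the canonical log link, W = mu
        z = eta + (y - mu) / mu        # working response
        XtW = X.T * W                  # broadcasts W across columns: X' W
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Illustrative check on simulated data: estimates should approach (0.5, 0.3).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(1000), rng.normal(size=1000)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])))
print(irls_poisson(X, y))
```

Because the log link is canonical for the Poisson family, the weights reduce to $W = \mu$ and each iteration is an ordinary weighted least squares solve.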

Model selection and validation

  • Model selection and validation are crucial steps in the development of GLMs, as they help to identify the most appropriate model structure and assess its performance on new data
  • Actuaries use various model selection criteria and validation techniques to balance model complexity and predictive accuracy, ensuring that the chosen model is suitable for pricing, reserving, and risk management purposes

Deviance and likelihood ratio tests

  • Deviance is a measure of the goodness of fit of a GLM, defined as twice the difference between the log-likelihood of the saturated model (a model with a separate parameter for each observation) and the log-likelihood of the fitted model
  • Likelihood ratio tests compare the deviance of two nested models (where one model is a special case of the other) to assess whether the more complex model provides a significantly better fit to the data
  • Under the null hypothesis that the simpler model is adequate, the likelihood ratio test statistic asymptotically follows a chi-squared distribution with degrees of freedom equal to the difference in the number of parameters between the two models
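A small helper shows the mechanics, assuming the two log-likelihoods have already been extracted from the fitted models (the function name is illustrative):

```python
from scipy.stats import chi2

def likelihood_ratio_test(ll_simple, ll_complex, df):
    """Likelihood ratio test for two nested GLMs fitted by maximum likelihood.

    ll_simple / ll_complex: log-likelihoods of the restricted and full models.
    df: difference in the number of estimated parameters.
    """
    stat = 2.0 * (ll_complex - ll_simple)  # equivalently, the deviance difference
    return stat, chi2.sf(stat, df)         # upper-tail chi-squared p-value
```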

Akaike information criterion (AIC)

  • The Akaike information criterion (AIC) is a model selection criterion that balances the goodness of fit of a model with its complexity, penalizing models with a larger number of parameters
  • AIC is defined as $\text{AIC} = -2\ell(\hat{\boldsymbol{\beta}}) + 2p$, where $\ell(\hat{\boldsymbol{\beta}})$ is the log-likelihood of the fitted model and $p$ is the number of parameters
  • Models with lower AIC values are preferred, as they provide a better trade-off between fit and complexity

Bayesian information criterion (BIC)

  • The Bayesian information criterion (BIC), also known as the Schwarz criterion, is another model selection criterion that penalizes model complexity more heavily than AIC
  • BIC is defined as $\text{BIC} = -2\ell(\hat{\boldsymbol{\beta}}) + p\log(n)$, where $n$ is the sample size
  • Like AIC, models with lower BIC values are preferred, but BIC tends to favor simpler models than AIC, particularly when the sample size is large
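Both criteria are simple to compute once the maximized log-likelihood is available; a minimal sketch:

```python
import numpy as np

def aic(loglik, p):
    """Akaike information criterion: -2*loglik + 2p."""
    return -2.0 * loglik + 2.0 * p

def bic(loglik, p, n):
    """Bayesian information criterion: -2*loglik + p*log(n)."""
    return -2.0 * loglik + p * np.log(n)
```

Note that the BIC penalty $p\log(n)$ exceeds the AIC penalty $2p$ whenever $n > e^2 \approx 7.4$, which is why BIC favors simpler models in all but the smallest samples.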

Residual analysis and diagnostics

  • Residual analysis involves examining the differences between the observed response values and the fitted values from the GLM to assess model assumptions and identify potential outliers or influential observations
  • Common diagnostic plots for GLMs include:
    • Residuals vs. fitted values plot: Checks for non-linearity, heteroscedasticity, and outliers
    • Normal Q-Q plot of residuals: Assesses whether the standardized (deviance) residuals are approximately normal; in a GLM the raw responses need not be normal, but deviance residuals from an adequate model should be roughly so
    • Scale-location plot: Examines the relationship between the absolute residuals and the fitted values to detect heteroscedasticity
    • Cook's distance plot: Identifies influential observations that may have a disproportionate impact on the model estimates
  • Residual analysis helps actuaries to refine their models, detect violations of assumptions, and improve the reliability of their predictions
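As a concrete example of one such diagnostic, deviance residuals for a Poisson GLM can be computed directly from the observed counts and fitted means; a numpy sketch (using the convention $0 \log 0 = 0$):

```python
import numpy as np

def poisson_deviance_residuals(y, mu):
    """Deviance residuals sign(y - mu) * sqrt(d_i) for a Poisson GLM."""
    # Unit deviance d_i = 2*(y*log(y/mu) - (y - mu)), with y*log(y/mu) = 0 when y = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(y > 0, y * np.log(y / mu), 0.0)
    d = 2.0 * (term - (y - mu))
    return np.sign(y - mu) * np.sqrt(np.maximum(d, 0.0))  # guard tiny negative d
```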

Poisson regression for count data

  • Poisson regression is a type of GLM used to model count data, where the response variable represents the number of events occurring in a fixed interval of time or space
  • In actuarial applications, Poisson regression is often used to model claim frequency, the number of accidents, or the number of policy renewals

Poisson distribution and assumptions

  • The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval, assuming a constant average rate of occurrence
  • The probability mass function of the Poisson distribution is given by $P(Y = k) = \frac{e^{-\lambda}\lambda^k}{k!}$, where $\lambda$ is the average rate of occurrence
  • Poisson regression assumes that:
    • The response variable follows a Poisson distribution
    • The logarithm of the expected value of the response variable is linearly related to the predictors
    • The events occur independently of each other

Log-linear models and interpretation

  • In Poisson regression, the link function is the natural logarithm, resulting in a log-linear model: $\log(\mathbb{E}(Y)) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p$
  • The coefficients in a Poisson regression model can be interpreted as the change in the log of the expected count for a one-unit increase in the corresponding predictor, holding all other predictors constant
  • To obtain the multiplicative effect of a predictor on the expected count, we can exponentiate the coefficient: $\exp(\beta_j)$ represents the ratio of the expected count for a one-unit increase in $x_j$, holding all other predictors constant
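A sketch of such a fit using statsmodels, with simulated illustrative data (the predictor name and true coefficients are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

# Simulated, illustrative claim-count data with one rating variable.
rng = np.random.default_rng(1)
driver_age = rng.uniform(18, 80, size=500)
X = sm.add_constant(driver_age)                        # prepends an intercept column
counts = rng.poisson(np.exp(0.2 - 0.01 * driver_age))  # true rate decreases with age

result = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
print(result.params)          # coefficients on the log scale
print(np.exp(result.params))  # rate ratios: multiplicative effects on E(Y)
```

The exponentiated slope is the rate ratio: a fitted value of about $\exp(-0.01) \approx 0.99$ would mean each additional year of driver age multiplies the expected claim count by roughly 0.99.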

Overdispersion and quasi-Poisson models

  • Overdispersion occurs when the variance of the response variable is greater than its mean, violating the equidispersion assumption of the Poisson distribution
  • Overdispersion can lead to underestimated standard errors and incorrect inferences about the significance of predictors
  • Quasi-Poisson models address overdispersion by introducing a dispersion parameter $\phi$ that scales the variance of the response variable: $\text{Var}(Y) = \phi\mathbb{E}(Y)$
  • The quasi-Poisson model retains the same mean structure as the Poisson model but adjusts the standard errors and inference procedures to account for overdispersion
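A common moment-based check estimates the dispersion parameter from Pearson residuals; continuing the hypothetical Poisson fit above:

```python
# Pearson estimate of the dispersion parameter for the fitted model:
# phi_hat = sum((y - mu_hat)^2 / mu_hat) / (n - p); values well above 1 flag overdispersion.
mu_hat = result.predict(X)
phi_hat = np.sum((counts - mu_hat) ** 2 / mu_hat) / (len(counts) - X.shape[1])
print(phi_hat)
```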

Logistic regression for binary outcomes

  • Logistic regression is a type of GLM used to model binary or categorical response variables, where the outcome of interest is the probability of an event occurring
  • In actuarial applications, logistic regression is often used to model the probability of a claim being filed, the likelihood of a policyholder renewing their coverage, or the risk of default on a loan
  • In logistic regression, the link function is the logit function, which is the natural logarithm of the odds: $\text{logit}(p) = \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p$, where $p$ is the probability of the event occurring
  • The coefficients in a logistic regression model can be interpreted as the change in the log odds of the event for a one-unit increase in the corresponding predictor, holding all other predictors constant
  • The exponential of a coefficient, $\exp(\beta_j)$, represents the odds ratio for a one-unit increase in $x_j$, holding all other predictors constant
    • An odds ratio greater than 1 indicates an increased likelihood of the event
    • An odds ratio less than 1 indicates a decreased likelihood of the event
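A minimal logistic regression sketch with statsmodels, using simulated illustrative data (the variable names are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

# Simulated, illustrative data: probability of filing a claim vs. prior claim count.
rng = np.random.default_rng(2)
prior_claims = rng.poisson(0.5, size=1000)
X = sm.add_constant(prior_claims)
p_true = 1.0 / (1.0 + np.exp(-(-2.0 + 0.8 * prior_claims)))
filed = rng.binomial(1, p_true)

logit_result = sm.GLM(filed, X, family=sm.families.Binomial()).fit()
print(np.exp(logit_result.params))  # odds ratios; values > 1 raise the odds of a claim
```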

Interpretation of coefficients

  • To interpret the coefficients in a logistic regression model, it is often helpful to convert the log odds to probabilities using the inverse logit function: $p = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p)}$
  • The change in the probability of the event for a one-unit increase in a predictor depends on the values of the other predictors, due to the non-linear nature of the logit function
  • To assess the impact of a predictor on the probability scale, it is common to compute average marginal effects or predict probabilities at representative values of the predictors
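Continuing the hypothetical logistic fit above, the fitted log odds can be converted to a probability at a representative predictor value:

```python
# Predicted probability for a policyholder with 2 prior claims, via the inverse logit.
eta = logit_result.params[0] + logit_result.params[1] * 2
print(np.exp(eta) / (1.0 + np.exp(eta)))
```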

Receiver operating characteristic (ROC) curves

  • Receiver operating characteristic (ROC) curves are a graphical tool for evaluating the performance of a binary classifier, such as a logistic regression model
  • An ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) for different classification thresholds
  • The area under the ROC curve (AUC) is a summary measure of the model's discriminatory power, with values closer to 1 indicating better performance
  • Actuaries use ROC curves and AUC to compare different logistic regression models, assess the trade-off between sensitivity and specificity, and select optimal classification thresholds based on business objectives
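Scikit-learn provides both quantities directly; a sketch continuing the logistic fit above:

```python
from sklearn.metrics import roc_auc_score, roc_curve

p_pred = logit_result.predict(X)                 # fitted claim probabilities
fpr, tpr, thresholds = roc_curve(filed, p_pred)  # one (FPR, TPR) point per threshold
print(roc_auc_score(filed, p_pred))              # AUC; closer to 1 is better
```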

Gamma regression for continuous positive data

  • Gamma regression is a type of GLM used to model continuous, positive response variables that exhibit a right-skewed distribution
  • In actuarial applications, Gamma regression is often used to model claim severity, loss amounts, or insurance premiums

Gamma distribution and assumptions

  • The Gamma distribution is a continuous probability distribution that describes the waiting time until a specified number of events occur in a Poisson process
  • The probability density function of the Gamma distribution is given by $f(y; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}y^{\alpha-1}e^{-\beta y}$, where $\alpha$ is the shape parameter and $\beta$ is the rate parameter
  • Gamma regression assumes that:
    • The response variable follows a Gamma distribution
    • The reciprocal of the expected value of the response variable is linearly related to the predictors
    • The variance of the response variable is proportional to the square of its mean
  • In Gamma regression, the canonical link is the inverse link, resulting in the model $\frac{1}{\mathbb{E}(Y)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p$
  • Under the inverse link, each coefficient represents the change in the reciprocal of the expected value for a one-unit increase in the corresponding predictor, holding all other predictors constant, so a positive coefficient decreases the expected value
  • Because effects under the inverse link are not multiplicative, Gamma regression is often fitted with a log link in practice; in that case $\exp(\beta_j)$ gives the ratio of expected values for a one-unit increase in $x_j$, holding all other predictors constant (a sketch follows this list)
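A sketch of a Gamma severity fit with statsmodels, on simulated illustrative data; statsmodels' Gamma family defaults to the canonical inverse link, matching the model above:

```python
import numpy as np
import statsmodels.api as sm

# Simulated, illustrative severity data with an inverse-link mean structure.
rng = np.random.default_rng(3)
vehicle_value = rng.uniform(5, 50, size=800)      # in thousands
X = sm.add_constant(vehicle_value)
mu = 1.0 / (0.05 + 0.01 * vehicle_value)          # 1/E(Y) linear in the predictor
losses = rng.gamma(shape=2.0, scale=mu / 2.0)     # mean mu, variance mu^2 / 2

gamma_result = sm.GLM(losses, X, family=sm.families.Gamma()).fit()
print(gamma_result.params)  # estimates of (0.05, 0.01) on the inverse scale
```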

Applications in insurance modeling

  • Gamma regression is widely used in insurance modeling, particularly for pricing and reserving purposes
  • Examples of applications include:
    • Modeling the severity of auto insurance claims, with predictors such as driver age, vehicle type, and accident history
    • Estimating the average cost per claim for a health insurance portfolio, based on policyholder characteristics and medical conditions
    • Predicting the loss amount for property insurance policies, considering factors such as property value, location, and construction type
  • By accurately modeling claim severity and loss amounts, actuaries can develop more precise pricing models, set adequate reserves, and manage risk exposure for insurance companies

Tweedie regression for compound Poisson-Gamma data

  • Tweedie regression is a type of GLM that combines the properties of Poisson and Gamma distributions to model continuous, non-negative data with a mass at zero
  • In actuarial applications, Tweedie regression is often used to model aggregate losses, where some observations have zero losses and others have positive loss amounts

Tweedie distribution and properties

  • The Tweedie distribution is a family of probability distributions that includes the Poisson, Gamma, and Gaussian distributions as special cases
  • The Tweedie distribution is characterized by a power variance function, where the variance is proportional to the mean raised to a power $p$: $\text{Var}(Y) = \phi\mathbb{E}(Y)^p$
  • The value of $p$ determines the specific distribution within the Tweedie family:
    • $p = 0$: Normal distribution
    • $p = 1$: Poisson distribution
    • $1 < p < 2$: Compound Poisson-Gamma distribution
    • $p = 2$: Gamma distribution
    • $p = 3$: Inverse Gaussian distribution

Power variance function and the power parameter

  • In Tweedie regression, the link function is the log link, and the variance function is the power variance function: $\log(\mathbb{E}(Y)) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p$ and $\text{Var}(Y) = \phi\mathbb{E}(Y)^p$ (here the exponent $p$ is the Tweedie power parameter, not the number of predictors)
  • The power parameter $p$ is estimated along with the regression coefficients and the dispersion parameter $\phi$ using maximum likelihood estimation
  • The choice of $p$ affects the interpretation of the model coefficients and the handling of zero observations in the data
  • Statistical software packages often provide tools for estimating the optimal value of $p$ based on the data and for assessing the goodness of fit of Tweedie regression models
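Scikit-learn's TweedieRegressor is one such implementation; a sketch on simulated compound Poisson-Gamma data (power=1.5 is an illustrative starting value, and in practice $p$ would be tuned or profiled):

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

# Simulated, illustrative aggregate losses: many zeros plus skewed positive amounts.
rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(2000, 2))
n_claims = rng.poisson(np.exp(-1.5 + X[:, 0]))  # most policies have 0 claims
total_loss = np.array([rng.gamma(2.0, 1.0, size=k).sum() for k in n_claims])

# 1 < power < 2 selects the compound Poisson-Gamma member of the family;
# link='log' gives the multiplicative structure typical of ratemaking models.
model = TweedieRegressor(power=1.5, alpha=0.0, link='log', max_iter=1000)
model.fit(X, total_loss)
print(model.intercept_, model.coef_)
```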

Applications in actuarial science

  • Tweedie regression is particularly useful in actuarial science for modeling aggregate losses, which exhibit a mixture of zero and positive values
  • Examples of applications include:
    • Modeling the total claim amount for a portfolio of insurance policies, where some policyholders have no claims and others have varying claim amounts
    • Estimating the pure premium (expected loss per unit of exposure) for a block of business, capturing claim frequency and severity in a single model