🥖Linear Modeling Theory Unit 16 Review

16.4 Lasso and Elastic Net Regularization

Written by the Fiveable Content Team • Last updated September 2025
Lasso and elastic net regularization are powerful tools for tackling multicollinearity in linear regression. They build on ridge regression by not only shrinking coefficients but also performing variable selection. This helps create simpler, more interpretable models.

These techniques offer a balance between model complexity and accuracy. Lasso can produce sparse models by setting some coefficients to zero, while elastic net combines lasso and ridge penalties. This flexibility makes them valuable for handling various types of data and modeling challenges.

Lasso Regularization for Variable Selection

Lasso Penalty and Coefficient Shrinkage

  • Lasso (Least Absolute Shrinkage and Selection Operator) is a regularization technique that performs both variable selection and coefficient shrinkage simultaneously in linear regression models
  • Lasso adds a penalty term to the ordinary least squares (OLS) objective function: the sum of the absolute values of the coefficients multiplied by a tuning parameter (λ); the full penalized objective is written out after this list
    • The tuning parameter (λ) controls the strength of the regularization. As λ increases, more coefficients are shrunk towards zero, effectively performing variable selection
    • The optimal value of the tuning parameter (λ) is typically selected using cross-validation techniques, such as k-fold cross-validation or leave-one-out cross-validation
  • The Lasso estimator is not invariant under scaling of the predictors, so it is important to standardize the variables before applying Lasso regularization to ensure fair penalization across variables with different scales
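In the notation used throughout this guide (response $y_i$, predictors $x_{ij}$, coefficients $\beta_j$), the Lasso estimator minimizes the penalized residual sum of squares:

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$$

Setting λ = 0 recovers the OLS fit; larger values of λ shrink more coefficients all the way to zero.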

Sparse Models and Variable Selection

  • Lasso produces sparse models by setting some of the coefficients exactly to zero, effectively removing the corresponding variables from the model (illustrated in the short sketch after this list)
    • This variable selection property is particularly useful when dealing with high-dimensional datasets with many predictors (p >> n) or when seeking a parsimonious model
  • Lasso regularization helps prevent overfitting and improves the model's interpretability by retaining only a subset of the most relevant variables
    • By removing irrelevant or redundant variables, Lasso can enhance the model's generalization ability and reduce the risk of making predictions based on noise or spurious correlations
  • The sparsity induced by Lasso can also aid in feature selection and dimensionality reduction, especially when the true underlying model is sparse (i.e., only a few variables have non-zero coefficients)
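As a quick illustration, here is a minimal sketch using scikit-learn on synthetic data; the sample size, number of predictors, true coefficients, and penalty value are all arbitrary choices for demonstration:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -2.0, 1.5]            # only the first 3 predictors matter
y = X @ true_beta + rng.normal(scale=1.0, size=n)

X_std = StandardScaler().fit_transform(X)    # standardize before penalizing

ols = LinearRegression().fit(X_std, y)
lasso = Lasso(alpha=0.1).fit(X_std, y)       # alpha is the penalty strength (lambda in the text)

print("nonzero OLS coefficients:  ", np.sum(ols.coef_ != 0))
print("nonzero Lasso coefficients:", np.sum(lasso.coef_ != 0))
```

With data like this, OLS typically returns all 20 coefficients as nonzero, while Lasso keeps only a handful, roughly matching the three truly relevant predictors.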

Lasso vs Ridge Regression

Regularization Penalties

  • Both Lasso and ridge regression are regularization techniques used to address multicollinearity and improve the stability and interpretability of linear regression models
  • The main difference between Lasso and ridge regression lies in the type of penalty term added to the ordinary least squares (OLS) objective function:
    • Lasso uses the L1 penalty: the sum of the absolute values of the coefficients multiplied by the tuning parameter (λ), i.e. $\lambda \sum_{j=1}^{p} |\beta_j|$
    • Ridge regression uses the L2 penalty: the sum of the squared values of the coefficients multiplied by the tuning parameter (λ), i.e. $\lambda \sum_{j=1}^{p} \beta_j^2$
  • The choice between Lasso and ridge regression depends on the specific problem and the desired properties of the model, such as sparsity, interpretability, and predictive performance
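For comparison with the Lasso objective written out earlier, the ridge objective replaces the absolute-value penalty with a squared one:

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}$$

The L1 penalty has corners at zero, which is what lets Lasso set coefficients exactly to zero; the smooth L2 penalty only shrinks them towards zero.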

Variable Selection and Coefficient Shrinkage

  • Lasso has the property of performing variable selection by setting some coefficients exactly to zero, effectively removing the corresponding variables from the model
    • Lasso tends to produce sparse models with a subset of the most relevant variables
  • In contrast, ridge regression shrinks the coefficients towards zero but does not set them exactly to zero
    • Ridge regression keeps all the variables in the model with shrunken coefficients
  • When the number of predictors exceeds the number of observations (p > n), or when predictors are highly correlated, Lasso may arbitrarily select one variable from a group of correlated variables; ridge regression instead tends to shrink the coefficients of correlated variables towards one another (compare the two in the sketch after this list)
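The contrast can be seen in a small synthetic sketch; the data-generating setup and the penalty values below are arbitrary demonstration choices, not recommended settings:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
n = 200
z = rng.normal(size=n)
x1 = z + 0.01 * rng.normal(size=n)   # two nearly identical (highly correlated) predictors
x2 = z + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 2.0 * z + rng.normal(scale=0.5, size=n)

print("Lasso coefficients:", Lasso(alpha=0.1).fit(X, y).coef_)
print("Ridge coefficients:", Ridge(alpha=1.0).fit(X, y).coef_)
```

With two nearly identical predictors, Lasso typically places the whole effect on one of them and zeroes the other, while ridge splits the effect roughly evenly between the two.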

Elastic Net Regularization

Combining Lasso and Ridge Penalties

  • Elastic net regularization is a linear combination of the Lasso (L1) and ridge (L2) penalties, combining their strengths to overcome some of their individual limitations
  • The elastic net penalty is controlled by two tuning parameters (the combined objective is written out after this list):
    • α, which controls the mixing proportion between the Lasso and ridge penalties. α = 1 corresponds to the Lasso penalty, α = 0 corresponds to the ridge penalty, and 0 < α < 1 represents a combination of both penalties
    • λ, which controls the overall strength of the regularization
  • As with Lasso and ridge regression, the optimal values of the tuning parameters (α and λ) in elastic net regularization are typically selected using cross-validation techniques
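One common parameterization, used with minor variations by both glmnet and scikit-learn, combines the two penalties as:

$$\min_{\beta} \; \frac{1}{2n}\,\|y - X\beta\|_2^2 + \lambda \left[ \alpha \sum_{j=1}^{p} |\beta_j| + \frac{1-\alpha}{2} \sum_{j=1}^{p} \beta_j^2 \right]$$

Here α controls the mix between the L1 and L2 penalties and λ controls the overall amount of shrinkage; α = 1 reduces this to a Lasso fit and α = 0 to a ridge fit.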

Handling Correlated Predictors

  • Elastic net regularization encourages a grouping effect, where strongly correlated predictors tend to be included or excluded together in the model (illustrated in the sketch after this list)
    • This property is beneficial when dealing with datasets containing groups of correlated variables, as it can select or exclude the entire group rather than arbitrarily choosing one variable
  • The elastic net penalty is particularly useful when a dataset contains many correlated predictors, since it addresses the limitations of both Lasso (which may arbitrarily select one variable from a group of correlated variables) and ridge regression (which does not perform variable selection)
  • Elastic net regularization provides a flexible framework for balancing between the sparsity of Lasso and the stability of ridge regression, depending on the choice of the mixing proportion (α)
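A similar synthetic sketch (mirroring the Lasso-versus-ridge example above, with equally arbitrary settings) illustrates the grouping effect:

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(2)
n = 200
z = rng.normal(size=n)
x1 = z + 0.01 * rng.normal(size=n)   # a pair of strongly correlated predictors
x2 = z + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 2.0 * z + rng.normal(scale=0.5, size=n)

print("Lasso:      ", Lasso(alpha=0.1).fit(X, y).coef_)
print("Elastic net:", ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y).coef_)
```

Elastic net typically assigns similar nonzero weights to both correlated predictors instead of arbitrarily dropping one of them.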

Applying Lasso and Elastic Net Techniques

Using Statistical Software

  • To apply Lasso and elastic net regularization, popular statistical software packages such as R, Python (with scikit-learn), and MATLAB can be used
  • In R, the glmnet package provides functions for fitting Lasso, ridge, and elastic net regularized linear models using efficient algorithms
    • The glmnet() function is used to fit the regularized models, specifying the family (e.g., "gaussian" for linear regression), alpha (mixing proportion), and lambda (regularization strength) parameters
    • The cv.glmnet() function performs cross-validation to select the optimal values of the tuning parameters
  • Python's scikit-learn library offers the Lasso, Ridge, and ElasticNet classes for applying these regularization techniques to linear regression models (a short cross-validated example follows this list)
    • The alpha parameter in scikit-learn corresponds to the regularization strength (λ), and the l1_ratio parameter in ElasticNet corresponds to the mixing proportion (α)
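A minimal sketch of the scikit-learn workflow, using a synthetic dataset; the grid of l1_ratio values and the number of cross-validation folds are illustrative choices, not recommendations:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, ElasticNetCV
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)        # put predictors on a common scale

# LassoCV searches a grid of penalty strengths (its alpha plays the role of lambda)
lasso = LassoCV(cv=5).fit(X, y)
print("Lasso: alpha =", lasso.alpha_,
      "| nonzero coefficients:", np.sum(lasso.coef_ != 0))

# ElasticNetCV also searches over l1_ratio (the mixing proportion alpha in the text)
enet = ElasticNetCV(cv=5, l1_ratio=[0.1, 0.5, 0.9, 1.0]).fit(X, y)
print("Elastic net: alpha =", enet.alpha_, "| l1_ratio =", enet.l1_ratio_,
      "| nonzero coefficients:", np.sum(enet.coef_ != 0))
```

LassoCV and ElasticNetCV choose the penalty strength (and, for elastic net, the mixing proportion) by cross-validated prediction error, which plays the same role as cv.glmnet() in R.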

Interpreting the Results

  • Interpreting the results of Lasso and elastic net regularization involves examining the coefficients of the selected variables and their corresponding regularization paths
    • The regularization path shows how the coefficients of the variables change as the regularization strength (λ) varies; variables with non-zero coefficients at the chosen λ are considered selected by the model (see the path sketch after this list)
    • The optimal value of λ is typically chosen based on cross-validation, considering metrics such as mean squared error (MSE) or mean absolute error (MAE)
  • The selected variables and their coefficients provide insights into the most important predictors for the response variable and their effect sizes
  • It is important to assess the model's performance on a separate test set or using cross-validation to evaluate its generalization ability and avoid overfitting
  • Regularized models should be compared with unregularized models (e.g., ordinary least squares) to assess the benefits of regularization in terms of model simplicity, interpretability, and predictive performance
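One way to inspect a regularization path programmatically is scikit-learn's lasso_path function; this minimal sketch (synthetic data, an arbitrary grid size) prints how many variables remain in the model as λ varies:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=150, n_features=10, n_informative=3,
                       noise=2.0, random_state=0)
X = StandardScaler().fit_transform(X)

# coefs has shape (n_features, n_alphas): one column of coefficients per lambda value
alphas, coefs, _ = lasso_path(X, y, n_alphas=50)

for a, beta in zip(alphas[::10], coefs.T[::10]):
    print(f"lambda = {a:8.3f} | variables selected: {np.sum(beta != 0)}")
```

Plotting the rows of coefs against alphas gives the familiar path plot: at large λ almost every coefficient is zero, and variables enter the model one by one as λ decreases.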