🎲 Data Science Statistics Unit 17 Review

17.2 Regularization Techniques (Lasso, Ridge)

Written by the Fiveable Content Team • Last updated September 2025
Regularization techniques like Lasso and Ridge help prevent overfitting in statistical models. By adding penalties to the loss function, these methods shrink coefficients, promoting simpler models that generalize better to new data.

L1 (Lasso) and L2 (Ridge) regularization differ in their effects. Lasso can drive coefficients to exactly zero, aiding feature selection, while Ridge shrinks coefficients toward zero without eliminating them. Both techniques trade a small amount of bias for lower variance, which typically improves predictions on new data.

Regularization Techniques

L1 and L2 Regularization

  • L1 regularization (Lasso) adds the absolute values of the coefficients to the loss function
    • Promotes sparsity by driving some coefficients to exactly zero
    • Useful for feature selection
    • Mathematically expressed as $\text{Loss} + \lambda \sum_{i=1}^{n} |\beta_i|$
  • L2 regularization (Ridge) adds the squared magnitudes of the coefficients to the loss function
    • Shrinks coefficients towards zero but rarely to exactly zero
    • Effective for handling multicollinearity
    • Mathematically expressed as $\text{Loss} + \lambda \sum_{i=1}^{n} \beta_i^2$
  • Regularization parameter (λ) controls the strength of regularization
    • Larger λ values increase the regularization effect
    • Smaller λ values decrease the regularization effect
    • Optimal λ often determined through cross-validation (see the sketch after this list)
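A minimal sketch of these two penalties in practice, assuming scikit-learn and synthetic data: `LassoCV` and `RidgeCV` choose λ (exposed as `alpha`) by cross-validation, and the Lasso fit typically zeroes out several coefficients while the Ridge fit shrinks them without eliminating any.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 100 samples, 20 features, only 5 truly informative
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# LassoCV and RidgeCV pick the regularization strength (alpha, i.e. λ)
# by cross-validation; features are standardized first so the penalty
# treats all coefficients on the same scale.
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0)).fit(X, y)
ridge = make_pipeline(StandardScaler(),
                      RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5)).fit(X, y)

print("Lasso: chosen alpha =", lasso[-1].alpha_,
      "| zero coefficients:", np.sum(lasso[-1].coef_ == 0))
print("Ridge: chosen alpha =", ridge[-1].alpha_,
      "| zero coefficients:", np.sum(ridge[-1].coef_ == 0))
```

Standardizing before fitting matters here because both penalties act on coefficient magnitudes, which depend on the scale of each feature.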

Advanced Regularization Methods

  • Elastic Net combines L1 and L2 regularization
    • Balances feature selection and coefficient shrinkage
    • Mathematically expressed as $\text{Loss} + \lambda_1 \sum_{i=1}^{n} |\beta_i| + \lambda_2 \sum_{i=1}^{n} \beta_i^2$
    • Useful when dealing with correlated predictors (see the sketch after this list)
  • Shrinkage reduces magnitude of model coefficients
    • Helps prevent overfitting by constraining model complexity
    • L1 and L2 regularization both induce shrinkage
  • Sparsity refers to models with few non-zero coefficients
    • L1 regularization promotes sparsity
    • Leads to simpler, more interpretable models
    • Useful in high-dimensional settings (gene expression data)
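A hedged sketch of Elastic Net on a synthetic high-dimensional problem, assuming scikit-learn's `ElasticNetCV`: it tunes both the L1/L2 mix (`l1_ratio`) and the overall strength (`alpha`) by cross-validation, and the resulting model is usually sparse while still spreading weight across correlated predictors.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# High-dimensional setting: more features (200) than samples (80),
# with groups of correlated predictors (low effective rank).
X, y = make_regression(n_samples=80, n_features=200, n_informative=10,
                       effective_rank=20, noise=5.0, random_state=0)

# ElasticNetCV searches over the L1/L2 mix (l1_ratio) and alpha jointly.
enet = make_pipeline(
    StandardScaler(),
    ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 0.99], cv=5,
                 max_iter=5000, random_state=0),
).fit(X, y)

model = enet[-1]
print("Chosen l1_ratio:", model.l1_ratio_)
print("Chosen alpha:", model.alpha_)
print("Non-zero coefficients:", np.sum(model.coef_ != 0), "of", X.shape[1])
```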

Model Evaluation

Understanding Model Fit

  • Overfitting occurs when model learns noise in training data
    • Results in poor generalization to new, unseen data
    • Characterized by low training error but high test error
    • Can be addressed through regularization or by increasing training data (see the sketch after this list)
  • Underfitting happens when model is too simple to capture underlying patterns
    • Results in poor performance on both training and test data
    • Characterized by high bias
    • Can be addressed by increasing model complexity or adding features
  • Bias-variance tradeoff balances model simplicity and complexity
    • Bias measures systematic error due to model assumptions
    • Variance measures model sensitivity to fluctuations in training data
    • Total error = $\text{Bias}^2 + \text{Variance} + \text{Irreducible Error}$
    • Optimal model minimizes total error
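To make the overfitting discussion concrete, here is a small illustration (assuming scikit-learn and a synthetic noisy sine curve, not any particular real dataset): a degree-15 polynomial fit by ordinary least squares reaches a very low training error but a much higher test error, while the same features with an L2 penalty generalize better.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(60, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)  # noisy sine curve

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

def errors(model):
    """Return (training MSE, test MSE) for a fitted copy of the model."""
    model.fit(X_train, y_train)
    return (mean_squared_error(y_train, model.predict(X_train)),
            mean_squared_error(y_test, model.predict(X_test)))

# Degree-15 polynomial with ordinary least squares: low training error,
# much higher test error (overfitting).
ols = make_pipeline(PolynomialFeatures(15), StandardScaler(), LinearRegression())
# Same features with an L2 penalty: training error rises slightly,
# test error usually drops (better generalization).
ridge = make_pipeline(PolynomialFeatures(15), StandardScaler(), Ridge(alpha=1.0))

print("OLS   train/test MSE:", errors(ols))
print("Ridge train/test MSE:", errors(ridge))
```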

Cross-validation Techniques

  • Cross-validation assesses model performance on unseen data
    • Helps detect overfitting and estimate generalization error (see the sketch after this list)
    • K-fold cross-validation divides data into K subsets
      • Train on K-1 subsets, validate on remaining subset
      • Repeat K times, rotating validation set
    • Leave-one-out cross-validation uses single observation for validation
      • Computationally expensive but useful for small datasets
    • Stratified cross-validation maintains class proportions in each fold
      • Useful for imbalanced datasets
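A brief sketch of K-fold versus stratified K-fold cross-validation with scikit-learn on a synthetic imbalanced classification problem; the exact scores depend on the random seed, but stratification keeps the class balance roughly constant across folds.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

# Imbalanced binary classification problem (about 10% positives).
X, y = make_classification(n_samples=300, n_features=10, weights=[0.9, 0.1],
                           random_state=0)

model = LogisticRegression(max_iter=1000)

# Plain K-fold: folds may end up with very different class proportions.
kf_scores = cross_val_score(model, X, y,
                            cv=KFold(n_splits=5, shuffle=True, random_state=0))
# Stratified K-fold: each fold keeps roughly the original class balance.
skf_scores = cross_val_score(model, X, y,
                             cv=StratifiedKFold(n_splits=5, shuffle=True,
                                                random_state=0))

print("K-fold accuracy:           ", kf_scores.round(3), "mean:", kf_scores.mean().round(3))
print("Stratified K-fold accuracy:", skf_scores.round(3), "mean:", skf_scores.mean().round(3))
```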

Feature Selection and Regression

Feature Selection Techniques

  • Feature selection identifies most relevant predictors
    • Improves model interpretability and reduces overfitting
    • Can be performed using wrapper, filter, or embedded methods (see the sketch after this list)
  • Wrapper methods use model performance to select features
    • Forward selection starts with no features, adds one at a time
    • Backward elimination starts with all features, removes one at a time
    • Recursive feature elimination iteratively removes least important features
  • Filter methods use statistical measures to select features
    • Correlation-based selection chooses features highly correlated with target
    • Mutual information quantifies dependency between feature and target
    • Variance threshold removes features with low variance
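A short sketch contrasting a wrapper method (recursive feature elimination) with a filter method (mutual information), assuming scikit-learn and synthetic regression data; the selected feature indices are illustrative only.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=15, n_informative=5,
                       noise=5.0, random_state=0)

# Wrapper method: recursive feature elimination with a linear model,
# keeping the 5 features the model ranks as most important.
rfe = RFE(LinearRegression(), n_features_to_select=5).fit(X, y)
print("RFE-selected features:        ", list(rfe.get_support(indices=True)))

# Filter method: rank features by mutual information with the target
# and keep the top 5, without fitting a predictive model at all.
mi = SelectKBest(mutual_info_regression, k=5).fit(X, y)
print("Mutual-info-selected features:", list(mi.get_support(indices=True)))
```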

Regularized Linear Regression

  • Regularized linear regression incorporates penalties into model fitting
    • Lasso regression uses L1 regularization
    • Ridge regression uses L2 regularization
    • Elastic Net combines L1 and L2 regularization
  • Coefficient paths visualize how coefficients change with regularization strength (see the sketch after this list)
    • X-axis represents the regularization parameter (λ)
    • Y-axis shows coefficient values
    • Lasso paths can reach exactly zero, indicating feature elimination
    • Ridge paths asymptotically approach zero but never reach it
  • Regularized regression models implemented in various libraries
    • Scikit-learn provides Lasso, Ridge, and ElasticNet classes
    • Statsmodels offers OLS with regularization options
    • Regularization strength typically tuned using cross-validation
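Rather than plotting full coefficient paths, a quick sketch (assuming scikit-learn and synthetic data) that makes the same point numerically: as the regularization strength grows, Lasso drives more coefficients to exactly zero, while Ridge coefficients shrink but stay non-zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=8, n_informative=3,
                       noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)

# Refit Lasso and Ridge over a grid of regularization strengths and
# count how many coefficients are exactly zero at each value.
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    lasso_zeros = np.sum(Lasso(alpha=alpha, max_iter=10000).fit(X, y).coef_ == 0)
    ridge_zeros = np.sum(Ridge(alpha=alpha).fit(X, y).coef_ == 0)
    print(f"alpha={alpha:>6}: Lasso zero coefs={lasso_zeros}, Ridge zero coefs={ridge_zeros}")
```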