Statistical Prediction Unit 6 Review

6.3 Local Regression and Smoothing Techniques

Written by the Fiveable Content Team • Last updated September 2025
Local regression and smoothing techniques are powerful tools for modeling non-linear relationships in data. These methods fit models to subsets of data points, allowing for flexible and adaptive curve fitting without specifying a global functional form.

LOESS, LOWESS, and local polynomial regression are popular local regression methods. They use weighted least squares to fit polynomials to nearby points. Kernel smoothing and nearest neighbor methods offer non-parametric approaches to estimating smooth curves from noisy data.

Local Regression Methods

LOESS and LOWESS

  • LOESS (Locally Estimated Scatterplot Smoothing) fits a polynomial regression model to a subset of the data near each point of interest
    • Uses a weighted least squares approach, giving more weight to nearby points and less weight to distant points
    • Degree of the local polynomial can be specified, typically linear or quadratic
    • Robust variants down-weight observations with large residuals through an iterative re-fitting step, reducing the influence of outliers
  • LOWESS (Locally Weighted Scatterplot Smoothing) is an earlier, simpler variant of LOESS, typically fitting local linear models to a single predictor
    • Weights are assigned using a tri-cube weight function based on the distance from the point of interest
    • Less computationally intensive than LOESS because of the simpler local fits
    • Both methods are useful for exploring and visualizing non-linear relationships in data (scatterplot smoothing); a minimal code sketch follows this list
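As a concrete illustration, the sketch below smooths a noisy sine curve with the lowess function from statsmodels (assuming numpy and statsmodels are installed; the synthetic data are made up for the example). The frac argument is the span, i.e., the fraction of the data used in each local fit.

```python
# A minimal LOWESS sketch on synthetic data.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))                  # predictor
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)    # noisy response

# frac = fraction of points used in each local fit (the span);
# larger frac -> smoother curve, smaller frac -> more local detail
smoothed = lowess(y, x, frac=0.3)                     # (n, 2) array of (x, fitted y)
x_fit, y_fit = smoothed[:, 0], smoothed[:, 1]
```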

Local Polynomial Regression

  • Local polynomial regression fits a polynomial model to a subset of the data around each point of interest
    • Polynomial degree can be specified (linear, quadratic, cubic, etc.)
    • Higher degree polynomials can capture more complex local patterns but may overfit
    • Lower degree polynomials (linear or quadratic) are more stable and less prone to overfitting
  • Weighted least squares is used to estimate the coefficients of the local polynomial model
    • Observations closer to the point of interest receive higher weights
    • Weight function (kernel) determines the shape of the weights (Gaussian, Epanechnikov, tri-cube)
    • Bandwidth parameter controls the size of the local neighborhood and the smoothness of the fit (larger bandwidth = smoother fit, smaller bandwidth = more local detail)
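To make the weighted least squares step concrete, here is a from-scratch sketch of degree-1 (local linear) regression with a tri-cube kernel and a fixed bandwidth h; the function names, bandwidth, and data are illustrative choices, not a reference implementation.

```python
# Local linear regression via weighted least squares with a tri-cube kernel.
import numpy as np

def tricube(u):
    """Tri-cube weights: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = np.abs(u)
    return np.where(u < 1, (1 - u**3) ** 3, 0.0)

def local_linear(x, y, x0, h):
    """Fit a weighted straight line around x0 and return its value at x0."""
    w = tricube((x - x0) / h)                          # distance-based weights
    X = np.column_stack([np.ones_like(x), x - x0])     # design matrix centered at x0
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # (X'WX)^{-1} X'Wy
    return beta[0]                                     # intercept = fitted value at x0

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
fit = np.array([local_linear(x, y, x0, h=1.0) for x0 in x])
```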

Smoothing Techniques

Kernel Smoothing

  • Kernel smoothing is a non-parametric technique for estimating a smooth curve from noisy data
    • Estimates the value at each point by taking a weighted average of nearby observations
    • Weight function (kernel) determines the shape of the weights (Gaussian, Epanechnikov, tri-cube)
    • Bandwidth parameter controls the size of the neighborhood and the smoothness of the estimate (larger bandwidth = smoother estimate, smaller bandwidth = more local detail)
  • Kernel regression is a form of kernel smoothing used for non-parametric regression
    • Estimates the conditional expectation of the response variable given the predictor variables
    • Can capture non-linear relationships without specifying a parametric form
    • Sensitive to the choice of kernel and bandwidth
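One standard kernel regression estimator is the Nadaraya-Watson weighted average; the sketch below implements it with a Gaussian kernel (the bandwidth h and the toy data are illustrative assumptions).

```python
# Nadaraya-Watson kernel regression with a Gaussian kernel.
import numpy as np

def nw_estimate(x, y, x0, h):
    """Weighted average of y, with weights decaying with distance from x0."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)             # Gaussian kernel weights
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 300)
y = np.cos(x) + rng.normal(scale=0.2, size=x.size)

grid = np.linspace(0, 10, 101)
fit = np.array([nw_estimate(x, y, g, h=0.5) for g in grid])
```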

Bandwidth Selection

  • Bandwidth selection is crucial for the performance of local regression and smoothing methods
    • Bandwidth too large: oversmoothing, important features may be missed
    • Bandwidth too small: undersmoothing, overfitting to noise
  • Cross-validation is commonly used for bandwidth selection
    • Leave-one-out cross-validation (LOOCV): fit the model leaving out each observation and evaluate the prediction error
    • k-fold cross-validation: divide the data into k folds, fit the model on k-1 folds and evaluate on the held-out fold
    • Bandwidth with the lowest average prediction error is selected (a LOOCV sketch follows this list)
  • Plug-in methods and rule-of-thumb formulas, such as Silverman's rule of thumb for kernel density estimation, are also used for bandwidth selection
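Below is a minimal leave-one-out cross-validation sketch that selects a bandwidth for a Gaussian-kernel Nadaraya-Watson smoother over a small grid; the grid, data, and smoother are illustrative, and for large datasets k-fold cross-validation or plug-in rules are cheaper.

```python
# LOOCV bandwidth selection for a Gaussian-kernel Nadaraya-Watson smoother.
import numpy as np

def nw_estimate(x, y, x0, h):
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)             # Gaussian kernel weights
    return np.sum(w * y) / np.sum(w)

def loocv_error(x, y, h):
    """Mean squared error when each point is predicted from all the others."""
    errs = np.empty(len(x))
    for i in range(len(x)):
        mask = np.arange(len(x)) != i                  # leave observation i out
        errs[i] = (y[i] - nw_estimate(x[mask], y[mask], x[i], h)) ** 2
    return errs.mean()

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 200)
y = np.cos(x) + rng.normal(scale=0.2, size=x.size)

bandwidths = [0.1, 0.2, 0.5, 1.0, 2.0]
scores = [loocv_error(x, y, h) for h in bandwidths]
best_h = bandwidths[int(np.argmin(scores))]            # lowest average error wins
```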

Nearest Neighbor Methods

  • Nearest neighbor methods use the k-nearest neighbors to estimate the value at a point of interest
    • k-nearest neighbor regression (k-NN regression): estimate the response variable by averaging the values of the k-nearest neighbors
    • k-nearest neighbor classification (k-NN classification): assign the majority class label among the k-nearest neighbors
  • Choice of k determines the smoothness of the estimate
    • Smaller k: more local, less smooth, may overfit
    • Larger k: more global, smoother, may underfit
  • The choice of distance metric used to determine the nearest neighbors (Euclidean, Manhattan, Mahalanobis) also affects which points count as neighbors
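As an example, k-NN regression on a single predictor takes only a few lines with scikit-learn (assuming it is installed; the data and the choice k = 10 are illustrative).

```python
# k-NN regression sketch with scikit-learn.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=(200, 1))                  # one predictor, shape (n, 1)
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

# n_neighbors = k: smaller k -> more local and noisier, larger k -> smoother
knn = KNeighborsRegressor(n_neighbors=10, metric="euclidean")
knn.fit(X, y)

X_grid = np.linspace(0, 10, 101).reshape(-1, 1)
y_hat = knn.predict(X_grid)                            # mean of the 10 nearest responses
```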

Challenges in Local Methods

Curse of Dimensionality

  • Curse of dimensionality refers to the problem of increasing data sparsity as the number of dimensions (features) increases
    • As the number of dimensions increases, the volume of the space grows exponentially
    • Data becomes increasingly sparse in high-dimensional spaces
    • Local methods struggle in high dimensions due to the sparsity of data
  • Nearest neighbor methods are particularly affected by the curse of dimensionality
    • As dimensions increase, the distance to the nearest neighbor grows, making the estimates less reliable
    • Requires a large number of observations to maintain a sufficient density of points in high-dimensional spaces
  • Dimensionality reduction techniques (PCA, t-SNE, UMAP) can be used to mitigate the curse of dimensionality
    • Project the high-dimensional data onto a lower-dimensional space while preserving important structure
    • Local methods can be applied in the reduced-dimensional space
  • Feature selection and regularization can also help mitigate the curse of dimensionality by reducing the effective number of dimensions
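A quick numeric illustration of this sparsity: holding the sample size fixed and spreading points uniformly in the unit cube, the average distance to the nearest neighbor grows rapidly with the dimension (a toy simulation; all settings are illustrative).

```python
# Toy simulation: average nearest-neighbor distance grows with dimension.
import numpy as np

rng = np.random.default_rng(5)
n = 200                                                # fixed sample size

for d in [1, 2, 5, 10, 50]:
    X = rng.uniform(size=(n, d))                       # n points in the d-dim unit cube
    diff = X[:, None, :] - X[None, :, :]               # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))           # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)                     # ignore self-distances
    print(d, dist.min(axis=1).mean())                  # mean nearest-neighbor distance
```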