Least squares estimation is a powerful statistical method for fitting linear regression models. It finds the best-fitting line by minimizing the sum of squared differences between observed and predicted values. Under standard assumptions, this approach provides unbiased, efficient, and consistent estimates of the model parameters.
The method relies on key assumptions such as linearity, independence of errors, constant error variance, and (for exact inference) normality of errors. It supports hypothesis testing, confidence intervals, and prediction. Although least squares is sensitive to outliers and multicollinearity, alternatives such as robust regression and regularization techniques can address these limitations.
Definition of least squares estimation
- Least squares estimation is a statistical method used to estimate the parameters of a linear regression model
- Aims to find the values of the parameters that minimize the sum of the squared differences between the observed values and the predicted values
- Commonly used in regression analysis to fit a line or curve to a set of data points
Principles of least squares estimation
- The goal is to find the best-fitting line or curve that minimizes the discrepancies between the observed data and the model predictions
- Assumes that the errors or residuals (differences between observed and predicted values) have a mean of zero and constant variance; normality of the errors is typically assumed in addition when exact tests and intervals are required
- Provides a closed-form solution for estimating the parameters of a linear regression model
Minimizing sum of squared residuals
- The objective is to minimize the sum of the squared residuals, where residuals are the differences between the observed values and the predicted values from the model
- Squaring the residuals ensures that positive and negative residuals do not cancel each other out and gives more weight to larger residuals
- The least squares estimates are obtained by finding the values of the parameters that minimize this sum of squared residuals
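Written out for the simple linear model $y = \beta_0 + \beta_1 x + \epsilon$ (introduced below), the quantity being minimized is

$$S(\beta_0, \beta_1) = \sum_{i=1}^n \left(y_i - \beta_0 - \beta_1 x_i\right)^2$$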
Derivation of least squares estimators
- The least squares estimators are derived by solving a set of normal equations that result from setting the partial derivatives of the sum of squared residuals with respect to each parameter equal to zero
- The solution to these normal equations provides the least squares estimates of the parameters
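As a brief illustration in the simple linear case, setting the partial derivatives of $S(\beta_0, \beta_1)$ to zero gives the two normal equations

$$\frac{\partial S}{\partial \beta_0} = -2\sum_{i=1}^n \left(y_i - \beta_0 - \beta_1 x_i\right) = 0, \qquad \frac{\partial S}{\partial \beta_1} = -2\sum_{i=1}^n x_i\left(y_i - \beta_0 - \beta_1 x_i\right) = 0$$

Solving these two equations simultaneously yields the estimators given below.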
For simple linear regression
- In simple linear regression, there is only one independent variable (predictor) and the model is represented by the equation $y = \beta_0 + \beta_1x + \epsilon$
- The least squares estimators for the intercept ($\beta_0$) and slope ($\beta_1$) can be derived using the formulas:
- $\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$
- $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
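As a minimal sketch (using made-up data and NumPy, neither of which comes from the text above), these formulas translate directly into code:

```python
import numpy as np

# Hypothetical sample data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# Slope: sum of cross-deviations over sum of squared x-deviations
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# Intercept: forces the fitted line through (x_bar, y_bar)
beta0_hat = y_bar - beta1_hat * x_bar

print(beta0_hat, beta1_hat)  # np.polyfit(x, y, 1) should agree (slope listed first)
```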
For multiple linear regression
- In multiple linear regression, there are multiple independent variables (predictors) and the model is represented by the equation $y = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_px_p + \epsilon$
- The least squares estimators for the parameters can be obtained using matrix algebra by solving the normal equations $(\mathbf{X}^T\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{y}$, where $\mathbf{X}$ is the design matrix containing the predictor variables and $\mathbf{y}$ is the vector of observed response values
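A hedged sketch of the matrix form, with simulated data standing in for real observations:

```python
import numpy as np

# Hypothetical data: n = 6 observations, p = 2 predictors (illustrative only)
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(6, 2))
y = 1.0 + X_raw @ np.array([2.0, -0.5]) + rng.normal(scale=0.1, size=6)

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(len(y)), X_raw])

# Solve the normal equations (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same problem and is numerically more stable
beta_hat_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
```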
Properties of least squares estimators
- The least squares estimators possess several desirable statistical properties when the assumptions of the linear regression model are satisfied
Unbiasedness
- The least squares estimators are unbiased, meaning that their expected values are equal to the true values of the parameters
- On average, the least squares estimates will be centered around the true parameter values
Efficiency
- The least squares estimators are the most efficient among all unbiased linear estimators
- They have the smallest variance among all linear unbiased estimators, making them the best linear unbiased estimators (BLUE) under the Gauss-Markov conditions
Consistency
- As the sample size increases, the least squares estimators converge in probability to the true parameter values
- With a large enough sample, the least squares estimates will be close to the true values of the parameters
Assumptions of least squares estimation
- The validity and optimality of the least squares estimators rely on several assumptions about the linear regression model
Linearity
- The relationship between the dependent variable and the independent variables is assumed to be linear
- The model can be represented by a linear equation with additive error terms
Independence
- The observations or errors are assumed to be independently distributed
- There should be no correlation or dependence between the residuals
Homoscedasticity
- The variance of the errors is assumed to be constant across all levels of the independent variables
- The spread of the residuals should be consistent throughout the range of the predictors
Normality
- The errors or residuals are assumed to follow a normal distribution with a mean of zero
- This assumption is needed for exact hypothesis tests and confidence intervals; the least squares estimates themselves can be computed without it
Interpretation of least squares estimates
- The least squares estimates provide information about the relationship between the dependent variable and the independent variables
Slope coefficients
- The slope coefficients ($\beta_1, \beta_2, ..., \beta_p$) represent the change in the dependent variable for a one-unit increase in the corresponding independent variable, holding all other variables constant
- They indicate the magnitude and direction of the effect of each predictor on the response variable
Intercept term
- The intercept term ($\beta_0$) represents the expected value of the dependent variable when all independent variables are zero
- It is the point where the regression line intersects the y-axis
Assessing goodness of fit
- The goodness of fit measures how well the least squares model fits the observed data
Coefficient of determination (R-squared)
- R-squared is the proportion of the variance in the dependent variable that is explained by the independent variables in the model
- It ranges from 0 to 1, with higher values indicating a better fit
- Calculated as the ratio of the explained sum of squares (ESS) to the total sum of squares (TSS): $R^2 = \frac{ESS}{TSS}$
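A short sketch of the calculation, reusing the hypothetical simple-regression data from earlier:

```python
import numpy as np

# Hypothetical data and least squares fit (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
y_hat = beta0 + beta1 * x

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
rss = np.sum((y - y_hat) ** 2)         # residual sum of squares

r_squared = ess / tss                  # equals 1 - rss / tss when an intercept is included
print(r_squared)
```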
Adjusted R-squared
- Adjusted R-squared is a modified version of R-squared that takes into account the number of predictors in the model
- It penalizes the addition of unnecessary predictors and provides a more conservative measure of model fit
- Useful for comparing models with different numbers of predictors
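One commonly used form of the adjustment, for $n$ observations and $p$ predictors, is

$$R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$

so adding a predictor only raises $R^2_{adj}$ if it improves the fit enough to offset the lost degree of freedom.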
Hypothesis testing with least squares estimates
- Hypothesis testing allows us to assess the statistical significance of the least squares estimates and the overall model
T-tests for individual coefficients
- T-tests are used to test the significance of individual regression coefficients
- The null hypothesis is that the coefficient is equal to zero ($H_0: \beta_j = 0$), indicating no significant relationship between the predictor and the response
- The test statistic is calculated as $t = \frac{\hat{\beta}_j - 0}{SE(\hat{\beta}_j)}$, where $SE(\hat{\beta}_j)$ is the standard error of the coefficient estimate
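A minimal sketch of the calculation with made-up data (standard regression software reports the same quantities automatically):

```python
import numpy as np
from scipy import stats

# Hypothetical simple regression fit via the normal equations (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
n, k = X.shape                              # k = p + 1 parameters

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2 = resid @ resid / (n - k)            # mean square error
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

t_stats = beta_hat / se                     # tests H0: beta_j = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - k)
print(t_stats, p_values)
```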
F-test for overall model significance
- The F-test is used to assess the overall significance of the regression model
- The null hypothesis is that all regression coefficients (except the intercept) are simultaneously equal to zero ($H_0: \beta_1 = \beta_2 = ... = \beta_p = 0$)
- The test statistic is calculated as $F = \frac{MSR}{MSE}$, where $MSR$ is the mean square regression and $MSE$ is the mean square error
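A corresponding sketch, reusing the same hypothetical data:

```python
import numpy as np
from scipy import stats

# Hypothetical data from the t-test sketch above (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
X = np.column_stack([np.ones_like(x), x])
n, k = X.shape
p = k - 1                                   # number of predictors

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

msr = np.sum((y_hat - y.mean()) ** 2) / p        # mean square regression = ESS / p
mse = np.sum((y - y_hat) ** 2) / (n - p - 1)     # mean square error = RSS / (n - p - 1)

F = msr / mse
p_value = stats.f.sf(F, p, n - p - 1)
print(F, p_value)
```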
Confidence intervals for least squares estimates
- Confidence intervals provide a range of plausible values for the true regression coefficients
- They are constructed using the least squares estimates and their standard errors
- A 95% confidence interval for a coefficient $\beta_j$ is given by $\hat{\beta}_j \pm t_{\alpha/2,\, n-p-1} \cdot SE(\hat{\beta}_j)$, where $t_{\alpha/2,\, n-p-1}$ is the critical value from the t-distribution with $n-p-1$ degrees of freedom
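A hedged sketch of the interval construction, again with made-up data:

```python
import numpy as np
from scipy import stats

# Hypothetical fit (illustrative only); setup matches the t-test sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
X = np.column_stack([np.ones_like(x), x])
n, k = X.shape

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2 = resid @ resid / (n - k)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

t_crit = stats.t.ppf(0.975, df=n - k)       # 95% two-sided critical value
lower = beta_hat - t_crit * se
upper = beta_hat + t_crit * se
print(np.column_stack([lower, upper]))      # one row per coefficient
```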
Prediction using least squares models
- Least squares models can be used to make predictions for new observations based on the estimated regression equation
Confidence intervals for predictions
- Confidence intervals for predictions provide a range of plausible values for the mean response at a given set of predictor values
- They take into account the uncertainty in the estimated regression coefficients and the variability of the data
Prediction intervals
- Prediction intervals provide a range of plausible values for an individual future observation at a given set of predictor values
- They are wider than confidence intervals because they account for both the uncertainty in the estimated coefficients and the inherent variability of individual observations
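The contrast between the two intervals can be sketched for a single new predictor value (hypothetical data again; the standard errors below are the standard ones for a least squares fit):

```python
import numpy as np
from scipy import stats

# Hypothetical simple linear regression fit (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
X = np.column_stack([np.ones_like(x), x])
n, k = X.shape

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2 = resid @ resid / (n - k)
XtX_inv = np.linalg.inv(X.T @ X)
t_crit = stats.t.ppf(0.975, df=n - k)

x0 = np.array([1.0, 3.5])                   # new point: intercept term plus x = 3.5
y0_hat = x0 @ beta_hat

se_mean = np.sqrt(sigma2 * (x0 @ XtX_inv @ x0))      # uncertainty in the mean response
se_pred = np.sqrt(sigma2 * (1 + x0 @ XtX_inv @ x0))  # adds individual-observation variability

conf_int = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)
pred_int = (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)
print(conf_int, pred_int)                   # the prediction interval is wider
```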
Limitations of least squares estimation
- Least squares estimation has some limitations and potential issues that should be considered
Sensitivity to outliers
- Least squares estimates can be heavily influenced by outliers or extreme observations
- Outliers can pull the regression line towards them, distorting the estimates and reducing the model's robustness
Multicollinearity
- Multicollinearity occurs when there is a high correlation among the independent variables in the model
- It can lead to unstable and unreliable estimates of the regression coefficients
- Multicollinearity can make it difficult to interpret the individual effects of the predictors on the response variable
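One standard diagnostic for this, included here only as an illustration, is the variance inflation factor (VIF); the sketch below uses simulated, nearly collinear predictors and a purely illustrative helper function:

```python
import numpy as np

# Hypothetical predictors; x2 is nearly collinear with x1 (illustrative only)
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.05, size=50)
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1 / (1 - R^2) from regressing column j on the remaining columns."""
    y_j = X[:, j]
    X_others = np.column_stack([np.ones(len(y_j)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(X_others, y_j, rcond=None)
    resid = y_j - X_others @ beta
    r2 = 1 - resid @ resid / np.sum((y_j - y_j.mean()) ** 2)
    return 1 / (1 - r2)

print([vif(X, j) for j in range(X.shape[1])])   # large values flag multicollinearity
```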
Alternatives to least squares estimation
- There are alternative estimation methods that can be used when the assumptions of least squares estimation are violated or when dealing with specific challenges
Robust regression methods
- Robust regression methods, such as M-estimation and least absolute deviation (LAD) regression, are less sensitive to outliers compared to least squares estimation
- They minimize different loss functions that are less affected by extreme observations
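A hedged sketch using one of the M-estimation approaches mentioned above (scikit-learn's Huber loss implementation; the data are simulated and the contrast with ordinary least squares is only illustrative):

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

# Hypothetical data with one gross outlier injected (illustrative only)
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=40)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=40)
y[0] += 30.0                                 # the outlier

X = x.reshape(-1, 1)
ols = LinearRegression().fit(X, y)           # ordinary least squares
huber = HuberRegressor().fit(X, y)           # M-estimation with a Huber loss

# The robust fit should be pulled toward the outlier much less than the OLS fit
print(ols.coef_, huber.coef_)
```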
Ridge regression
- Ridge regression is a regularization technique used when multicollinearity is present
- It adds a penalty term to the least squares objective function, shrinking the coefficient estimates towards zero
- The penalty term is controlled by a tuning parameter, which balances the trade-off between fitting the data and reducing the complexity of the model
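A minimal sketch with scikit-learn and simulated, correlated predictors (the choice of `alpha` here is arbitrary and would normally be tuned, e.g. by cross-validation):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical data with two nearly collinear predictors (illustrative only)
rng = np.random.default_rng(3)
x1 = rng.normal(size=60)
x2 = x1 + rng.normal(scale=0.1, size=60)
X = np.column_stack([x1, x2])
y = 3.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=60)

# alpha is the tuning parameter controlling the strength of the L2 penalty
ridge = Ridge(alpha=1.0).fit(X, y)
print(ridge.intercept_, ridge.coef_)         # coefficient estimates shrunk toward zero
```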
Lasso regression
- Lasso (Least Absolute Shrinkage and Selection Operator) regression is another regularization technique that can handle multicollinearity and perform variable selection
- It adds an L1 penalty term to the least squares objective function, which can shrink some coefficient estimates exactly to zero
- Lasso regression can effectively identify and exclude irrelevant predictors from the model
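A comparable sketch for the lasso, again with simulated data and an arbitrary penalty strength:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical data where only the first of five predictors matters (illustrative only)
rng = np.random.default_rng(4)
X = rng.normal(size=(80, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=80)

# alpha controls the strength of the L1 penalty; larger values zero out more coefficients
lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)                           # irrelevant coefficients driven to (or near) zero
```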