Least squares approximations are a powerful tool for finding the best fit between data and models. They minimize the sum of squared differences between observed and predicted values, using inner products to define distances in vector spaces.
This method connects to inner product spaces and orthogonality through the normal equations, which encode the orthogonality principle: the error vector must be perpendicular to the subspace of approximating functions. That principle forms the basis for solving least squares problems.
Least Squares Problems with Inner Products
Fundamentals of Least Squares Method
- Least squares method minimizes sum of squared differences between observed and predicted values
- Inner products define distance between vectors in a vector space (via the induced norm)
  - Essential for formulating least squares problems
- Normal equations derived using orthogonality principle
  - Error vector must be orthogonal to subspace of approximating functions
- Gram matrix composed of inner products of basis vectors
  - Appears as the coefficient matrix of the normal equations
- Coefficient vector obtained by solving a linear system of equations
  - Involves Gram matrix and vector of inner products between target vector and basis vectors
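A minimal NumPy sketch of this setup, using a small made-up design matrix `A` and target vector `y`: the Gram matrix $A^T A$ and the right-hand side $A^T y$ are assembled from inner products, and the coefficient vector solves the resulting normal equations.

```python
import numpy as np

# Illustrative data: columns of A are the basis vectors spanning the
# approximating subspace; y is the target vector.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.1, 1.9, 3.2, 3.9])

# Gram matrix of inner products between basis vectors, and the vector of
# inner products between each basis vector and the target.
G = A.T @ A                # G[i, j] = <a_i, a_j>
b = A.T @ y                # b[i]    = <a_i, y>

# Coefficient vector from the normal equations G c = b.
c = np.linalg.solve(G, b)

# Best approximation and its error vector, which is orthogonal to the subspace.
y_hat = A @ c
residual = y - y_hat
print("coefficients:", c)
print("A^T residual (should be ~0):", A.T @ residual)
```

With the standard dot product the Gram matrix is simply $A^T A$; a different inner product would change how `G` and `b` are assembled, but not the structure of the solve.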
Function Approximation and Continuous Least Squares
- Function approximation typically defines inner product as integral over domain of interest
  - Leads to continuous version of least squares problem
- Continuous least squares approximates functions rather than discrete data points
- Inner product for functions often defined as $\langle f,g \rangle = \int_a^b f(x)g(x)dx$
- Basis functions chosen based on problem characteristics (polynomials, trigonometric functions)
- Gram-Schmidt process can orthogonalize basis functions for improved numerical stability
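A sketch of the continuous case, assuming the interval $[0, 1]$, the target $f(x) = e^x$, and the monomial basis $\{1, x, x^2\}$ (all illustrative choices): every Gram-matrix entry and right-hand-side entry is an integral inner product, evaluated here numerically with `scipy.integrate.quad`.

```python
import numpy as np
from scipy.integrate import quad

# Target function and polynomial basis on [0, 1] (illustrative choices).
f = np.exp
basis = [lambda x: 1.0, lambda x: x, lambda x: x**2]
a, b = 0.0, 1.0

n = len(basis)
G = np.empty((n, n))   # Gram matrix of basis-function inner products
r = np.empty(n)        # inner products of the target with each basis function

for i in range(n):
    r[i] = quad(lambda x: f(x) * basis[i](x), a, b)[0]
    for j in range(n):
        G[i, j] = quad(lambda x: basis[i](x) * basis[j](x), a, b)[0]

# Coefficients of the continuous least squares approximation.
c = np.linalg.solve(G, r)
print("quadratic least squares approximation of exp on [0, 1]:", c)
```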
Best Approximating Vectors in Subspaces
Orthogonal Projection and Error Analysis
- Best approximating vector is the orthogonal projection of the target vector onto the given subspace
- Error vector (difference between target vector and approximation) orthogonal to subspace
- Gram-Schmidt process constructs orthonormal basis for subspace
  - Simplifies computation of best approximation
- Coefficients of best approximation obtained by solving normal equations
- Projection itself always unique; coefficient vector unique when design matrix columns linearly independent
- Residual Sum of Squares (RSS) quantifies approximation quality
  - Minimized by best approximating vector
  - RSS calculated as $RSS = \sum_{i=1}^n (y_i - \hat{y}_i)^2$, where $y_i$ are observed values and $\hat{y}_i$ predicted values
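A short sketch of projection and error analysis, with a hypothetical target vector `y` and a two-dimensional subspace of $\mathbb{R}^4$: an orthonormal basis is obtained via QR (a numerically stable stand-in for classical Gram-Schmidt), the projection coefficients are plain inner products, and the RSS is the squared length of the error vector.

```python
import numpy as np

# Hypothetical target vector and spanning set for a 2-D subspace of R^4.
y = np.array([2.0, 1.0, 4.0, 3.0])
V = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])

# Orthonormal basis Q for the subspace (QR performs the orthogonalization).
Q, _ = np.linalg.qr(V)

# Best approximation = orthogonal projection of y onto span(Q);
# with an orthonormal basis the coefficients are simple inner products.
coeffs = Q.T @ y
y_hat = Q @ coeffs

# Error vector is orthogonal to the subspace; RSS is its squared length.
error = y - y_hat
rss = float(error @ error)
print("projection:", y_hat)
print("Q^T error (should be ~0):", Q.T @ error)
print("RSS:", rss)
```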
Applications to Overdetermined Systems
- Overdetermined systems have more equations than unknowns
- Best approximation provides compromise solution minimizing overall error
- Least squares method finds solution minimizing sum of squared residuals
- Applications include data fitting, parameter estimation in scientific models
- Example: fitting a linear model to experimental data with measurement errors
- Overdetermined systems often arise in real-world scenarios with noisy or redundant data
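A sketch of the overdetermined case, using synthetic data with a made-up slope, intercept, and noise level: fifty noisy observations of a line give fifty equations in two unknowns, and `np.linalg.lstsq` returns the compromise solution that minimizes the sum of squared residuals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic experimental data: a true line plus measurement noise
# (slope 2.5, intercept 1.0, and noise level 0.5 are illustrative values).
x = np.linspace(0.0, 10.0, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Overdetermined system: 50 equations, 2 unknowns (intercept and slope).
A = np.column_stack([np.ones_like(x), x])

# Least squares solution minimizes the sum of squared residuals.
coef, rss, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print("estimated intercept, slope:", coef)
print("residual sum of squares:", rss)
```

Appending the column of ones lets the intercept be estimated alongside the slope within the same overdetermined system.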
Solving Least Squares with Orthogonal Projections
Direct Methods and Matrix Decompositions
- Orthogonal projection matrix $P = A(A^T A)^{-1}A^T$ computes best approximation directly (when A has full column rank)
- QR decomposition of design matrix A solves least squares problems efficiently (see the sketch after this list)
  - Improves numerical stability compared to normal equations
- Normal equations $(A^T A)x = A^T b$ solved using various methods
  - Cholesky decomposition for symmetric positive definite systems
- Pseudoinverse (Moore-Penrose inverse) provides general solution to least squares problems
  - Useful for rank-deficient cases
- Condition number of design matrix affects numerical stability of solutions
  - Considered when choosing solution method
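A sketch comparing three direct routes on a hypothetical random system: QR, Cholesky applied to the normal equations, and the Moore-Penrose pseudoinverse. The three solutions agree here because the assumed matrix is well conditioned with full column rank.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Hypothetical overdetermined system A x ~ b.
rng = np.random.default_rng(1)
A = rng.normal(size=(100, 3))
b = rng.normal(size=100)

# 1) QR route: solve R x = Q^T b, avoiding formation of A^T A.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# 2) Normal equations route: Cholesky factorization of the SPD matrix A^T A.
x_chol = cho_solve(cho_factor(A.T @ A), A.T @ b)

# 3) Pseudoinverse route: also handles rank-deficient A.
x_pinv = np.linalg.pinv(A) @ b

print(np.allclose(x_qr, x_chol), np.allclose(x_qr, x_pinv))
print("condition number of A:", np.linalg.cond(A))
```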
Iterative and Regularization Techniques
- Iterative methods employed for large-scale least squares problems
  - Conjugate gradient applied to the normal equations is one example of an iterative approach
- Regularization techniques address ill-conditioned least squares problems
  - Tikhonov regularization adds penalty term to objective function (see the sketch after this list)
  - L1 regularization (Lasso) promotes sparsity in solution
  - Truncated Singular Value Decomposition (TSVD) handles ill-posed problems
- Cross-validation techniques help select optimal regularization parameters
- Iterative refinement improves accuracy of computed solutions
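A sketch of Tikhonov regularization, assuming a nearly collinear design matrix and an arbitrarily chosen penalty $\lambda = 0.1$ (in practice selected by cross-validation): the penalized problem is solved once as an augmented ordinary least squares system and once iteratively with SciPy's LSQR, whose `damp` argument applies the same penalty.

```python
import numpy as np
from scipy.sparse.linalg import lsqr

# Hypothetical ill-conditioned design matrix (two nearly collinear columns).
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 200)
A = np.column_stack([t, t + 1e-6 * rng.normal(size=t.size), np.ones_like(t)])
b = A @ np.array([1.0, -1.0, 0.5]) + 0.01 * rng.normal(size=t.size)

lam = 0.1  # Tikhonov / damping parameter (assumed; normally chosen by cross-validation)

# Tikhonov regularization as an augmented ordinary least squares problem:
# minimize ||A x - b||^2 + lam^2 ||x||^2  ==  least squares on [A; lam*I], [b; 0].
A_aug = np.vstack([A, lam * np.eye(A.shape[1])])
b_aug = np.concatenate([b, np.zeros(A.shape[1])])
x_tik = np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

# The same damped problem solved iteratively with LSQR (suited to large sparse A).
x_lsqr = lsqr(A, b, damp=lam)[0]

print("augmented solve:   ", x_tik)
print("LSQR with damping: ", x_lsqr)  # both solve the same penalized problem
```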
Geometric Interpretation of Least Squares Approximations
Euclidean Geometry of Least Squares
- Least squares solution represents point in subspace closest to target vector (Euclidean distance)
- Error vector perpendicular to subspace of approximating functions
  - Forms right angle with best approximation vector
- Orthogonal projection of target vector onto subspace minimizes error vector length
- Geometric interpretation extends to higher dimensions
  - Subspace may be a plane or higher-dimensional hyperplane through the origin; nonlinear models generalize the picture to curved manifolds
- Function approximation finds function in subspace best fitting data points (least squares sense)
- Coefficient of determination (R-squared) interpreted geometrically
  - Squared cosine of angle between target vector and its projection onto subspace (for the uncentered form; with an intercept, vectors are centered first)
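A small numerical check of this geometric reading, with a hypothetical target vector and subspace: the squared cosine of the angle between the target and its projection equals the uncentered ratio $\|\hat{y}\|^2 / \|y\|^2$.

```python
import numpy as np

# Hypothetical target vector and 2-D subspace of R^4.
y = np.array([3.0, 1.0, 2.0, 4.0])
A = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.5],
              [1.0, 3.5]])

# Orthogonal projection of y onto the column space of A.
Q, _ = np.linalg.qr(A)
y_hat = Q @ (Q.T @ y)

# Squared cosine of the angle between y and its projection equals the
# uncentered R-squared ||y_hat||^2 / ||y||^2.
cos_theta = (y @ y_hat) / (np.linalg.norm(y) * np.linalg.norm(y_hat))
r2_uncentered = np.linalg.norm(y_hat) ** 2 / np.linalg.norm(y) ** 2
print(cos_theta ** 2, r2_uncentered)  # the two values agree
```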
Advanced Geometric Concepts in Regression Analysis
- Leverage in regression analysis has geometric interpretation
  - Diagonal entries of the hat matrix; measure influence of individual data points on least squares solution
- Hat matrix in linear regression represents orthogonal projection onto column space of design matrix (see the sketch after this list)
- Mahalanobis distance measures distance between point and distribution center, accounting for the covariance structure
  - Used in outlier detection and multivariate analysis
- Principal Component Analysis (PCA) provides geometric interpretation of data variance
  - Identifies directions of maximum variation in data
- Geometric interpretation of ridge regression involves shrinking coefficient vector towards origin
- Concept of statistical distance relates to geometric notions in hypothesis testing and confidence regions
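A sketch of the hat matrix and leverages for a hypothetical one-predictor design with an intercept: the hat matrix is symmetric and idempotent, as an orthogonal projection must be, and its diagonal entries (the leverages) sum to the number of fitted parameters.

```python
import numpy as np

# Hypothetical design matrix with an intercept column and one predictor.
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 10.0, size=20)
X = np.column_stack([np.ones_like(x), x])

# Hat matrix: orthogonal projection onto the column space of X.
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Projection properties: symmetric and idempotent.
print(np.allclose(H, H.T), np.allclose(H @ H, H))

# Leverages are the diagonal entries; they sum to the number of parameters.
leverage = np.diag(H)
print("sum of leverages:", leverage.sum())  # ~2 for intercept + slope
```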