Linear regression is a powerful statistical tool for understanding relationships between variables. It helps us predict one variable based on another, using a simple equation that captures their connection. This method is crucial for business decisions, from sales forecasting to understanding customer behavior.
The key components of linear regression include the slope, y-intercept, and error term. By interpreting these elements and assessing the model's fit through R-squared values, we can gauge how well our predictions match reality and make informed business choices.
Components and Interpretation of Simple Linear Regression
Components of linear regression
- Simple linear regression model expressed as $y = \beta_0 + \beta_1x + \epsilon$
- $y$ dependent variable (response variable) being predicted or explained
- $x$ independent variable (explanatory variable) used to predict or explain changes in $y$
- $\beta_0$ y-intercept, value of $y$ when $x$ equals zero
- $\beta_1$ slope, change in $y$ for a one-unit increase in $x$
- $\epsilon$ random error term, accounts for variability in $y$ not explained by linear relationship with $x$
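To make the roles of the components concrete, the following is a minimal sketch that generates data from the model. The function name `simulate`, the parameter values, and the choice of normally distributed errors are illustrative assumptions, not part of the model definition above.

```python
import random

def simulate(beta0, beta1, xs, noise_sd, seed=0):
    """Generate y = beta0 + beta1 * x + epsilon for each x.

    Assumption for illustration: epsilon is drawn from a normal
    distribution with mean 0 and standard deviation noise_sd.
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0, noise_sd) for x in xs]

xs = [0, 1, 2, 3, 4]

# With no error term, every point lies exactly on the line y = 100 + 50x.
ys_no_noise = simulate(100, 50, xs, noise_sd=0)

# With an error term, points scatter around the same line.
ys_noisy = simulate(100, 50, xs, noise_sd=10)

print(ys_no_noise)
print(ys_noisy)
```

Setting `noise_sd=0` removes $\epsilon$, so $y$ is fully determined by $x$; any positive value adds variability in $y$ that the linear relationship does not explain.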
Interpretation of slope vs y-intercept
- Slope ($\beta_1$) change in dependent variable ($y$) for one-unit increase in independent variable ($x$)
- Interpretation depends on context and units of variables
- Sales ($y$, in dollars) and advertising expenditure ($x$, in thousands of dollars), slope of 50 means each additional \$1,000 of advertising is associated with a \$50 increase in predicted sales
- Y-intercept ($\beta_0$) value of dependent variable ($y$) when independent variable ($x$) equals zero
- Interpretation depends on context and whether $x = 0$ is meaningful
- Number of employees ($x$), $\beta_0$ might not have practical interpretation, as company cannot have zero employees
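As a short worked illustration of why units matter, suppose (as an assumption, since the units are not fixed by the model) that advertising $x$ is recorded in thousands of dollars and the slope is 50. Re-expressing advertising in single dollars as $x' = 1000x$ gives the same line with a rescaled slope:

$$
y = \beta_0 + \beta_1 x = \beta_0 + \frac{\beta_1}{1000}\,x', \qquad \frac{\beta_1}{1000} = \frac{50}{1000} = 0.05
$$

The slope changes from 50 per thousand dollars to 0.05 per dollar, while the y-intercept $\beta_0$ is unchanged because $x = 0$ and $x' = 0$ describe the same situation.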
Equation and Prediction in Simple Linear Regression
Equation of regression models
- Least squares method estimates slope ($\beta_1$) and y-intercept ($\beta_0$) from data points
- Calculate slope: $\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$
- $x_i$ and $y_i$ individual data points
- $\bar{x}$ and $\bar{y}$ means of $x$ and $y$
- $n$ number of data points
- Calculate y-intercept: $\beta_0 = \bar{y} - \beta_1\bar{x}$
- Substitute estimated slope and y-intercept into simple linear regression model equation: $\hat{y} = \beta_0 + \beta_1x$
- $\hat{y}$ predicted value of dependent variable
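The slope and y-intercept formulas above can be sketched directly in code. This is a minimal illustration in plain Python; the function name `fit_simple_linear_regression` and the small data set are made up for the example.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) estimated by the least squares formulas."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length, at least 2")

    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Numerator and denominator of the slope formula.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")

    beta1 = sxy / sxx               # slope
    beta0 = y_bar - beta1 * x_bar   # y-intercept
    return beta0, beta1

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

beta0, beta1 = fit_simple_linear_regression(xs, ys)
print(f"y_hat = {beta0:.2f} + {beta1:.2f}x")
```

For this data $\bar{x} = 3$ and $\bar{y} = 4$, the numerator of the slope formula is 6 and the denominator is 10, so the slope is 0.6 and the y-intercept is $4 - 0.6(3) = 2.2$.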
Predictions from regression equations
- Use estimated simple linear regression model equation $\hat{y} = \beta_0 + \beta_1x$ to predict value of dependent variable ($\hat{y}$) for given value of independent variable ($x$)
- Substitute given value of $x$ into equation
- Calculate predicted value of $\hat{y}$
- Example: with estimated regression equation $\hat{y} = 100 + 50x$ and $x = 2$, predicted value is $\hat{y} = 100 + 50(2) = 200$
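The substitution step can be sketched as a small function. The name `predict` is hypothetical; the coefficients 100 and 50 are taken from the example equation above.

```python
def predict(beta0, beta1, x):
    """Return the predicted value y_hat = beta0 + beta1 * x."""
    return beta0 + beta1 * x

# Coefficients from the example equation y_hat = 100 + 50x.
y_hat = predict(100, 50, 2)
print(y_hat)

# The same function applied to several x values at once.
predictions = [predict(100, 50, x) for x in [0, 1, 2, 3]]
print(predictions)
```

At $x = 0$ the prediction equals the y-intercept, and each one-unit step in $x$ raises the prediction by the slope.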
Goodness of Fit in Simple Linear Regression
Goodness of fit assessment
- Assess goodness of fit using coefficient of determination (R-squared)
- R-squared proportion of variance in dependent variable ($y$) predictable from independent variable ($x$)
- Formula: $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
- $SSR$ sum of squares regression (explained variation)
- $SSE$ sum of squares error (unexplained variation)
- $SST$ total sum of squares (total variation)
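The three sums of squares and R-squared can be computed as in the following minimal sketch. The function name `sums_of_squares` and the data are hypothetical; the coefficients 2.2 and 0.6 are the least squares estimates for this particular data set.

```python
def sums_of_squares(xs, ys, beta0, beta1):
    """Return (SSR, SSE, SST) for the fitted line y_hat = beta0 + beta1 * x."""
    y_bar = sum(ys) / len(ys)
    y_hats = [beta0 + beta1 * x for x in xs]

    ssr = sum((y_hat - y_bar) ** 2 for y_hat in y_hats)        # explained
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total
    return ssr, sse, sst

# Hypothetical data with its least squares coefficients.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ssr, sse, sst = sums_of_squares(xs, ys, beta0=2.2, beta1=0.6)

r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst
print(round(ssr, 4), round(sse, 4), round(sst, 4))
print(round(r2_from_ssr, 4), round(r2_from_sse, 4))
```

Here SSR is 3.6, SSE is 2.4, and SST is 6, so both forms of the formula give an R-squared of 0.6. The two forms agree because the coefficients are the least squares estimates, for which $SST = SSR + SSE$; with arbitrary coefficients that decomposition need not hold.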
Meaning of R-squared values
- R-squared ranges from 0 to 1, higher values indicate better fit, lower values indicate poorer fit
- R-squared of 0 none of variance in $y$ explained by $x$
- R-squared of 1 all of variance in $y$ explained by $x$
- R-squared of 0.75 means 75% of variance in dependent variable explained by independent variable, 25% unexplained