Fiveable

📊 Honors Statistics Unit 12 Review

12.2 The Regression Equation

Written by the Fiveable Content Team • Last updated September 2025

Regression equations help us understand relationships between variables. They use data to find the best-fitting line, showing how one thing changes when another does. This powerful tool lets us predict outcomes and analyze trends.

The equation has two key parts: slope and y-intercept. The slope shows how much the predicted y changes when x increases by one unit. The y-intercept is where the line crosses the y-axis, that is, the predicted y when x is 0. Together, they describe the linear pattern in the data.

The Regression Equation

Least-squares regression line calculation

  • Least-squares regression line is the line of best fit: it minimizes the sum of squared residuals
    • Each residual is the signed vertical distance from a data point to the regression line, observed minus predicted ($y_i - \hat{y}_i$)
  • Regression equation in form $\hat{y} = b_0 + b_1x$
    • $\hat{y}$ represents predicted value of y for given x
    • $b_0$ represents y-intercept, predicted value of y when x is 0 (where the line crosses the y-axis)
    • $b_1$ represents slope, predicted change in y for one-unit increase in x (rise over run)
  • Calculate slope ($b_1$) using formula: $b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$
    • $x_i$ and $y_i$ represent the x- and y-values of the i-th data point (coordinates)
    • $\bar{x}$ and $\bar{y}$ represent means of x and y, respectively (averages)
  • Calculate y-intercept ($b_0$) using formula: $b_0 = \bar{y} - b_1\bar{x}$
    • Substitute slope ($b_1$) and means ($\bar{x}$ and $\bar{y}$) into equation
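
The two formulas above can be applied directly in a few lines of code. The following is a minimal sketch in plain Python; the function name `least_squares_line` and the five data points are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n  # mean of x
    y_bar = sum(ys) / n  # mean of y

    # Numerator of the slope formula: sum of (x_i - x_bar)(y_i - y_bar)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator of the slope formula: sum of (x_i - x_bar)^2
    sxx = sum((x - x_bar) ** 2 for x in xs)

    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # y-intercept
    return b0, b1


# Hypothetical data, not taken from any real study
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(xs, ys)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")  # y-hat = 2.20 + 0.60x
```

Checking by hand: $\bar{x} = 3$, $\bar{y} = 4$, the numerator is 6 and the denominator is 10, so $b_1 = 0.6$ and $b_0 = 4 - 0.6(3) = 2.2$.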

Interpretation of slope and y-intercept

  • Slope ($b_1$) represents predicted change in response variable (y) for one-unit increase in predictor variable (x)
    • Positive slope indicates positive linear relationship between x and y (direct)
    • Negative slope indicates negative linear relationship between x and y (inverse)
  • Y-intercept ($b_0$) represents predicted value of response variable (y) when predictor variable (x) is 0
    • Y-intercept may not have meaningful interpretation if x cannot realistically be 0 or if x = 0 lies far outside the observed data (extrapolation)
  • Interpret slope and y-intercept using context of data and units of variables
    • Slope units: units of y per unit of x (e.g., mph per year)
    • Y-intercept units: same as units of y (e.g., starting salary in dollars)
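
To make the interpretation concrete, here is a small sketch using a made-up salary model. The coefficients `B0` and `B1` and the function `predict_salary` are purely illustrative assumptions, not values from any dataset.

```python
# Hypothetical fitted line: salary-hat = 40000 + 2500 * years_of_experience
B0 = 40_000.0  # y-intercept in dollars (assumed for illustration)
B1 = 2_500.0   # slope in dollars per year of experience (assumed)


def predict_salary(years):
    """Predicted salary in dollars for a given number of years of experience."""
    return B0 + B1 * years


# Y-intercept: predicted salary at 0 years of experience (the starting salary)
print(predict_salary(0))                      # 40000.0

# Slope: the prediction rises by B1 dollars for each additional year
print(predict_salary(6) - predict_salary(5))  # 2500.0
```

In context, the slope would be read as "each additional year of experience is associated with a predicted salary increase of $2,500," and the y-intercept as "the predicted starting salary is $40,000."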

Strength of linear relationships

  • Correlation coefficient (r) measures strength and direction of linear relationship between two variables
    • r ranges from -1 to 1
      1. r = 1 indicates perfect positive linear relationship (straight line increasing)
      2. r = -1 indicates perfect negative linear relationship (straight line decreasing)
      3. r = 0 indicates no linear relationship (points show no linear trend, though a nonlinear pattern may still exist)
    • Stronger linear relationship as r approaches -1 or 1 (tighter clustering around line)
    • Calculate r using formula: $r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$
  • Coefficient of determination ($r^2$) represents proportion of variation in response variable (y) explained by predictor variable (x)
    • $r^2$ ranges from 0 to 1
      1. $r^2 = 1$ indicates all variation in y explained by x (perfect fit)
      2. $r^2 = 0$ indicates none of variation in y explained by x (no linear relationship)
    • Calculate $r^2$ by squaring correlation coefficient (r)
    • $r^2$ often expressed as percentage (e.g., $r^2 = 0.50$ means 50% of variation in y explained)
  • Visualize relationship between variables using a scatter plot
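
The correlation formula above translates almost term by term into code. This is a minimal sketch in plain Python; the function name `correlation` and the small dataset are hypothetical.

```python
import math


def correlation(xs, ys):
    """Return the correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Numerator: sum of (x_i - x_bar)(y_i - y_bar)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sums of squared deviations for x and y
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)

    return sxy / (math.sqrt(sxx) * math.sqrt(syy))


# Hypothetical data, not taken from any real study
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_squared = r ** 2  # coefficient of determination

print(f"r = {r:.3f}")            # r = 0.775
print(f"r^2 = {r_squared:.3f}")  # r^2 = 0.600
```

For this data, r is positive and moderately strong, and $r^2 = 0.60$ would be read as "about 60% of the variation in y is explained by the linear relationship with x."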

Assessing model fit and reliability

  • Examine residuals for patterns or trends to assess model assumptions
    • Check for homoscedasticity (constant variance of residuals across predictor values)
  • Identify potential outliers that may influence regression results
  • Standard error of estimate measures typical size of residuals (how far observed values fall from predicted values): $s = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n - 2}}$
  • Calculate confidence intervals for slope and intercept to assess precision of estimates
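
The checks above can be sketched in code as follows. This is a rough illustration under stated assumptions, not a full diagnostic routine: the function name `fit_diagnostics` and the data are hypothetical, the critical value `t_star` must be looked up in a t-table for $n - 2$ degrees of freedom, and only the slope interval is computed (using the standard formula $b_1 \pm t^* \cdot s / \sqrt{\sum (x_i - \bar{x})^2}$).

```python
import math


def fit_diagnostics(xs, ys, t_star):
    """Return (residuals, s, slope_ci) for a least-squares line.

    t_star is the t critical value for n - 2 degrees of freedom,
    supplied by the caller (for example, from a t-table).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Residual = observed y minus predicted y
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

    # Standard error of estimate: sqrt(SSE / (n - 2))
    sse = sum(e ** 2 for e in residuals)
    s = math.sqrt(sse / (n - 2))

    # Standard error of the slope and the resulting confidence interval
    se_b1 = s / math.sqrt(sxx)
    margin = t_star * se_b1
    return residuals, s, (b1 - margin, b1 + margin)


# Hypothetical data; 3.182 is the 95% t critical value for df = 5 - 2 = 3
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
residuals, s, slope_ci = fit_diagnostics(xs, ys, t_star=3.182)

print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(f"s = {s:.3f}")                    # s = 0.894
print(f"95% CI for slope: ({slope_ci[0]:.3f}, {slope_ci[1]:.3f})")
```

With only five points the slope interval is wide and includes 0, which is a reminder that a small sample gives an imprecise slope estimate. Plotting the residuals against x is the usual way to look for patterns or non-constant spread.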