11.6: Residuals and Least-Squares Property

Consider the weekly data for the number of positive results versus COVID tests during the pandemic. A regression line drawn on the scatter plot shows a linear trend between the variables.

Whether this regression line is the best-fit line is determined using residuals: the vertical distances of the original data points from the predicted values on the regression line.

For example, for the data point with coordinates (820, 48), the predicted value can be found by substituting 820 for x in the regression equation.

The difference between the observed and predicted values gives the residual value. Similarly, residuals for the remaining data points are also calculated.
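As a minimal sketch of this calculation, the Python snippet below computes the predicted value and the residual for the point (820, 48). The slope and intercept are hypothetical placeholders, since the fitted regression equation is not stated in the text; only the data point comes from the example above.

```python
# Hypothetical regression coefficients, assumed for illustration only.
# The text does not state the actual fitted equation.
SLOPE = 0.05
INTERCEPT = 5.0


def predict(x, slope=SLOPE, intercept=INTERCEPT):
    """Return the predicted value (y-hat) on the regression line for x."""
    return intercept + slope * x


def residual(x, y_observed, slope=SLOPE, intercept=INTERCEPT):
    """Return the residual: observed y minus predicted y."""
    return y_observed - predict(x, slope, intercept)


x, y = 820, 48            # data point from the text
y_hat = predict(x)        # value on the (assumed) regression line at x = 820
e = residual(x, y)        # positive if the point lies above the line
print(f"predicted = {y_hat:.2f}, residual = {e:.2f}")
```

With a different fitted equation the numbers change, but the procedure stays the same: substitute x, then subtract the prediction from the observed value.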

The squared residuals can be visualized as squares drawn on the plot, each with a side equal to the vertical distance between a data point and the regression line.

The sum of the areas of all these squares must be a minimum for the regression line to be the best-fit line. This is called the least-squares property.

For any other straight line, the sum of the areas is larger, so that line cannot be considered the best-fit line.
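The least-squares property can be checked numerically. The sketch below fits a line with the standard closed-form least-squares formulas and then compares its sum of squared residuals with that of a few shifted lines. The data values are invented for illustration and are not the data set shown in the video; the function names are likewise hypothetical.

```python
# Hypothetical weekly data (tests performed, positive results),
# invented for illustration only.
tests = [500, 620, 700, 820, 900]
positives = [30, 35, 43, 48, 55]


def sse(slope, intercept, xs, ys):
    """Sum of squared residuals: the total area of the squares."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares line (closed form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = least_squares_line(tests, positives)
best = sse(slope, intercept, tests, positives)

# Nudge the slope or the intercept: each alternative line has a larger sum.
shifts = [(0.01, 0.0), (-0.01, 0.0), (0.0, 2.0), (0.0, -2.0)]
for d_slope, d_intercept in shifts:
    other = sse(slope + d_slope, intercept + d_intercept, tests, positives)
    print(f"shifted line: {other:.2f}   least-squares line: {best:.2f}")
```

Only four alternative lines are tried here, so this is a spot check rather than a proof, but any other shift of the slope or intercept gives the same outcome.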

A residual measures the signed vertical distance between the actual value of y and the estimated value of y; in other words, between the observed data point and the predicted point on the line. It is calculated as the observed value minus the predicted value: e = y − ŷ.

If the observed data point lies above the line, the residual is positive, and the line underestimates the actual data value for y. If the observed data point lies below the line, the residual is negative, and the line overestimates the actual data value for y.
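The sign rule can be expressed as a short Python sketch. The function name and the observed and predicted values below are hypothetical, chosen only to show one case of each kind.

```python
def describe_residual(y_observed, y_predicted):
    """Classify a point relative to the line from the sign of its residual."""
    e = y_observed - y_predicted
    if e > 0:
        return e, "point above line: the line underestimates y"
    if e < 0:
        return e, "point below line: the line overestimates y"
    return e, "point on line: the prediction is exact"


# Hypothetical (observed, predicted) pairs, for illustration only.
for y_obs, y_hat in [(48, 46), (40, 43), (50, 50)]:
    e, note = describe_residual(y_obs, y_hat)
    print(f"residual = {e:+d} -> {note}")
```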

The process of fitting the best-fit line is called linear regression. It rests on the assumption that the data are scattered about a straight line. The criterion for the best-fit line is that the sum of the squared errors (SSE) is minimized, that is, made as small as possible. Any other line you might choose would have a higher SSE than the best-fit line. This best-fit line is called the least-squares regression line.
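For reference, the criterion can be written out symbolically. The notation below is an assumption, not taken from the text: the line is written as ŷ = a + bx, with n data points (x_i, y_i) and sample means x̄ and ȳ. The second line gives the standard closed-form values of a and b that minimize the SSE.

```latex
\mathrm{SSE}(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2

b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
```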

For the regression line, the squared residuals can be visualized as squares drawn on the plot, each with a side equal to the vertical distance between a data point and the line. The sum of the areas of all these squares must be a minimum for the regression line to be the best-fit line. This is called the least-squares property.

This text is adapted from OpenStax, Introductory Statistics, Section 12.3 The Regression Equation.