11.6: Residuals and Least-Squares Property
A residual is the signed vertical distance between the actual value of y and the estimated value of y, that is, the observed value minus the predicted value. In other words, it measures the vertical distance between the actual data point and the predicted point on the line.
If the observed data point lies above the line, the residual is positive, and the line underestimates the actual data value for y. If the observed data point lies below the line, the residual is negative, and the line overestimates the actual data value for y.
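As a small illustration of the sign convention, the sketch below computes residuals as observed y minus predicted y. The three data points and the line with intercept 1.0 and slope 2.0 are made-up values chosen for the example, not data from the text.

```python
# Hypothetical observations and a hypothetical line y_hat = 1.0 + 2.0 * x
# (illustrative values only, not taken from the text).
xs = [1.0, 2.0, 3.0]
ys = [3.5, 4.0, 7.0]
intercept, slope = 1.0, 2.0


def residuals(xs, ys, intercept, slope):
    """Return observed y minus predicted y for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


res = residuals(xs, ys, intercept, slope)

for x, y, e in zip(xs, ys, res):
    # Positive residual: point above the line (line underestimates y).
    # Negative residual: point below the line (line overestimates y).
    position = "above" if e > 0 else "below" if e < 0 else "on"
    print(f"x={x}, y={y}, residual={e:+.2f} (point lies {position} the line)")
```

With these values the residuals are 0.5, -1.0, and 0.0: the first point lies above the line, the second below it, and the third exactly on it.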
The process of fitting the best-fit line is called linear regression. The idea behind finding the best-fit line is based on the assumption that the data are scattered about a straight line. The criterion for the best-fit line is that the sum of the squared errors (SSE), that is, the sum of the squared residuals, is minimized (made as small as possible). Any other line you might choose would have a higher SSE than the best-fit line. This best-fit line is called the least-squares regression line.
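The minimization can be checked numerically. The sketch below uses the standard closed-form formulas for the least-squares slope and intercept, which are not stated in this section, and compares the resulting SSE with the SSE of a few nearby lines. The data set and the function names sse and least_squares_line are illustrative assumptions.

```python
def sse(xs, ys, intercept, slope):
    """Sum of squared residuals for the line y_hat = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (intercept, slope) from the standard closed-form formulas."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical data, chosen only to illustrate the property.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = least_squares_line(xs, ys)
best = sse(xs, ys, a, b)
print(f"least-squares line: y_hat = {a:.2f} + {b:.2f} x, SSE = {best:.2f}")

# Shift the intercept and/or slope slightly; each alternative line
# should have a larger SSE than the least-squares line.
shifts = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1), (0.1, -0.1)]
others = [sse(xs, ys, a + da, b + db) for da, db in shifts]
for (da, db), value in zip(shifts, others):
    print(f"intercept {a + da:.2f}, slope {b + db:.2f}: SSE = {value:.2f}")
```

For this data the fitted line has an intercept of about 2.2 and a slope of about 0.6, with an SSE of about 2.4. Every shifted line gives a larger SSE. Checking a handful of alternatives illustrates the property but does not prove it.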
For the regression line, each squared residual can be visualized as a square whose side is the vertical segment joining the data point to the line. The sum of the areas of all these squares must be a minimum for the regression line to be the best-fit line. This is called the least-squares property.
This text is adapted from OpenStax, Introductory Statistics, Section 12.3, The Regression Equation.