or
total sum of squares = residual sum of squares + regression sum of squares
where
$$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$$
The residual sum of squares is the quantity that the least squares method minimizes.
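The decomposition can be checked numerically. The following is a minimal sketch, not the text's own procedure: the function names `fit_line` and `sums_of_squares` and the five data points are hypothetical, chosen only for illustration, and the slope and intercept use the standard closed-form least squares estimates.

```python
def fit_line(x, y):
    """Return the least squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sums_of_squares(x, y):
    """Return (total, residual, regression) sums of squares for the fitted line."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    total = sum((yi - y_bar) ** 2 for yi in y)
    residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    regression = sum((fi - y_bar) ** 2 for fi in fitted)
    return total, residual, regression


# Made-up illustrative data, not taken from the text.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

total, residual, regression = sums_of_squares(x, y)
# The two printed values should agree up to floating-point rounding.
print(total, residual + regression)
```

The identity holds only when the line is the least squares fit with an intercept; for any other line the cross-product term does not vanish and the two sides differ.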
The more the regression line tells us about the relationship exhibited by the data, the smaller the contribution of the residual sum of squares and the larger the contribution of the regression sum of squares. This suggests two equivalent summaries of goodness of fit, each with a different interpretation. The ratio of the regression sum of squares to the total sum of squares is called the coefficient of determination and is labeled $R^2$:
$$R^2 = \frac{\text{regression sum of squares}}{\text{total sum of squares}} = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$
By the sum of squares decomposition, $0 \le R^2 \le 1$, and the better the fit, the larger $R^2$. A measure of the variation of the data about the regression line is given by
$$s_{y \cdot x}^2 = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}$$
where $n$ is the number of observations.
The square root of $s_{y \cdot x}^2$ (the root mean square) measures the average deviation of the $Y_i$ from the regression line in the units of $Y$. The smaller $s_{y \cdot x}$, the better the fit.
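Both summaries can be computed from the same fit. The sketch below is illustrative only: `goodness_of_fit` is a hypothetical helper name, the data are made up, and the residual sum of squares is divided by n - 2, which is the usual convention for a line with an estimated intercept and slope and is assumed here rather than taken from the text.

```python
import math


def goodness_of_fit(x, y):
    """Return (r_squared, s_yx) for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Closed-form least squares estimates of slope and intercept.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    fitted = [b0 + b1 * xi for xi in x]
    total = sum((yi - y_bar) ** 2 for yi in y)
    residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    regression = sum((fi - y_bar) ** 2 for fi in fitted)

    # Coefficient of determination: share of total variation explained.
    r_squared = regression / total
    # Root mean square about the line, in the units of y (assumed n - 2 divisor).
    s_yx = math.sqrt(residual / (n - 2))
    return r_squared, s_yx


# Made-up illustrative data, not taken from the text.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_squared, s_yx = goodness_of_fit(x, y)
print("R^2 =", r_squared)
print("s_y.x =", s_yx)
```

Because the total sum of squares splits into the residual and regression parts, `r_squared` equals one minus the ratio of the residual to the total sum of squares. It is unit-free, whereas `s_yx` is reported in the units of Y.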
Interpreting $R^2$ and $s_{y \cdot x}$ as measures of the strength of the true relationship between $Y$ and $X$ requires adherence to the assumptions above, just as the appropriateness of $\hat{b}_0$ and $\hat{b}_1$ did. An adequate and representative sample of data is equally important.