
Simple Linear Regression - Summary Statistics

With least squares methodology, summary statistics can be calculated that indicate how well a given least squares regression line fits the data. These summary statistics are best understood through a sum of squares decomposition. For n observations, the total variation in Y about its own average $\bar{Y}$ can be split into a component explained by the regression line and an unexplained (residual) component. That is,

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$

or

total sum of squares = residual sum of squares + regression sum of squares

where

$$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$$

The residual sum of squares is what least squares methodology minimizes.
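The decomposition can be checked numerically. The following is a minimal sketch in plain Python, not part of the original text: the function name `sum_of_squares` and the five data points are made up for illustration, and the slope and intercept use the standard closed-form least squares estimates.

```python
def sum_of_squares(x, y):
    """Return (total, residual, regression) sums of squares for a least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Closed-form least squares estimates of the slope and intercept.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    # Fitted values on the regression line.
    y_hat = [b0 + b1 * xi for xi in x]

    total = sum((yi - y_bar) ** 2 for yi in y)
    residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    regression = sum((fi - y_bar) ** 2 for fi in y_hat)
    return total, residual, regression


# Illustrative data only (hypothetical values, not from the text).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

total, residual, regression = sum_of_squares(x, y)
print(total, residual, regression)
```

For these values the three sums come out to about 6.0, 2.4, and 3.6, so the residual and regression pieces add up to the total, up to floating-point rounding.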

The more the regression line tells us about the relationship exhibited by the data, the smaller the contribution of the residual sum of squares and the larger the contribution of the regression sum of squares. This suggests two equivalent summaries of goodness of fit, with different interpretations. The ratio of the regression sum of squares to the total sum of squares is called the coefficient of determination and is labeled $R^2$:

$$R^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

By the sum of squares decomposition, $0 \le R^2 \le 1$, and the better the fit, the larger $R^2$. With n observations, a measure of the variation of the data about the regression line is given by

$$s_{y.x}^2 = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}$$

The square root of $s_{y.x}^2$, written $s_{y.x}$ (the root mean square), measures the average deviation of the $Y_i$ from the regression line in the units of Y. The smaller $s_{y.x}$, the better the fit.
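Both summaries can be computed directly from the sums of squares. The sketch below is illustrative rather than definitive: the function name `fit_summary` and the data are hypothetical, and it assumes the usual n - 2 divisor for a line with two estimated coefficients.

```python
import math


def fit_summary(x, y):
    """Return (R^2, s_yx) for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Closed-form least squares estimates of the slope and intercept.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    total = sum((yi - y_bar) ** 2 for yi in y)
    residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    regression = sum((fi - y_bar) ** 2 for fi in y_hat)

    # Coefficient of determination: share of total variation explained.
    r_squared = regression / total
    # Assumption: residual variance uses the n - 2 divisor.
    s_yx = math.sqrt(residual / (n - 2))
    return r_squared, s_yx


# Illustrative data only (hypothetical values, not from the text).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_squared, s_yx = fit_summary(x, y)
print(r_squared, s_yx)
```

Here the coefficient of determination is about 0.6 and the deviation measure is about 0.89, in the same units as y. A data set with the same coefficient of determination but a larger spread in y would give a larger deviation measure, which is why the two summaries are read differently.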

Interpreting $R^2$ and $s_{y.x}$ as measures of the strength of the true relationship between Y and X requires that the assumptions above hold, just as the appropriateness of $\hat{b}_0$ and $\hat{b}_1$ did. An adequate and representative sample of data is equally important.

