
Simple Linear Regression - Assumptions

For least squares fits to be appropriate, certain assumptions must hold:

  1. The true relationship between Y and X must be linear. That is, if b0 and b1 were known, then Y_i = b0 + b1 X_i + e_i, where the e_i (unknown errors) must follow 2., 3., and 4. below.
  2. The mean of each e_i must be 0.
  3. The e_i must not depend upon one another.
  4. The spread of the e_i at each X_i must be the same (variance [e_i] = constant).
  5. The X_i can be considered fixed. That is, the X_i generate the Y_i.
  6. The e_i are normally distributed.

Given 1., 2. is of little practical importance, because a nonzero mean of the errors would simply be absorbed into the intercept b0. 5. can also often be safely ignored. 1., 3., and 4. can be checked, approximately, by various graphs of the data and the residuals, ehat_i.

A graph of Y versus X will provide some idea of the appropriateness of 1. If 3. is true, a graph of ehat versus X will show no trends, while if 3. and 4. both hold, the points on the graph of ehat versus X will fall within a horizontal band centered at ehat = 0. 6. can be checked by a slightly more complex plot of ehat, such as a normal probability plot.

Under 6. (together with 1. through 4.), least squares fits become uniformly best: bhat0 and bhat1 have the smallest variance among all unbiased estimates. If 6. does not hold, bhat0 and bhat1 may still be quite good. However, a more appropriate fit may be obtained by minimizing the sum of absolute distances (or something similar) rather than the sum of squared distances.
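
The residual checks described above can also be approximated numerically. The following is a minimal sketch, not a definitive procedure: the data are made up for illustration, and the helper names fit_line, residuals, and spread_ratio are hypothetical, not taken from any statistical package. It fits the line by least squares, computes the residuals, and compares the residual spread in the lower and upper halves of X as a crude stand-in for looking for a horizontal band.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Return least squares estimates (bhat0, bhat1) for y = b0 + b1*x + e."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    bhat1 = sxy / sxx
    bhat0 = ybar - bhat1 * xbar
    return bhat0, bhat1


def residuals(x, y, bhat0, bhat1):
    """Return the residuals ehat_i = y_i - (bhat0 + bhat1 * x_i)."""
    return [yi - (bhat0 + bhat1 * xi) for xi, yi in zip(x, y)]


def spread_ratio(x, ehat):
    """Crude check of assumption 4: residual spread for the larger X values
    divided by residual spread for the smaller X values. A ratio far from 1
    suggests the spread is not constant (assumed rule of thumb)."""
    pairs = sorted(zip(x, ehat))
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[half:]]
    return pstdev(high) / pstdev(low)


# Made-up illustrative data, roughly linear.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

bhat0, bhat1 = fit_line(x, y)
ehat = residuals(x, y, bhat0, bhat1)

print("bhat0, bhat1:", bhat0, bhat1)
print("sum of residuals:", sum(ehat))
print("spread ratio (upper half / lower half):", spread_ratio(x, ehat))
```

With an intercept in the model, the least squares residuals sum to zero (up to rounding) whatever the data, so the residuals themselves cannot reveal a nonzero error mean. A numeric summary such as the spread ratio is only a rough substitute for the graph of ehat versus X.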

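The actual residual values obtained from a least squares fit have certain properties that confound the interpretation of residual graphs: even when 3. and 4. hold for the errors, the residuals do not all have the same variance and are not independent of one another. Many statistical packages make available studentized residuals, which rescale each residual by its own estimated standard deviation and should always be used instead of plain residuals.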
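
As a rough illustration of studentized residuals, the sketch below computes one common version (often called internally studentized residuals) for a simple linear regression. It assumes the usual formulas: leverage h_i = 1/n + (x_i - xbar)^2 / Sxx, error variance estimate s^2 = SSE / (n - 2), and studentized residual ehat_i / (s * sqrt(1 - h_i)). The function name studentized_residuals and the data are hypothetical, and a given statistical package may define its studentized residuals slightly differently.

```python
from math import sqrt


def studentized_residuals(x, y):
    """Return (studentized residuals, leverages) for a least squares line.

    Uses the internally studentized form ehat_i / (s * sqrt(1 - h_i));
    this is an assumed definition, and packages may use a variant.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

    # Least squares estimates and plain residuals.
    bhat1 = sxy / sxx
    bhat0 = ybar - bhat1 * xbar
    ehat = [yi - (bhat0 + bhat1 * xi) for xi, yi in zip(x, y)]

    # Estimate of the error variance, using n - 2 degrees of freedom.
    s2 = sum(e * e for e in ehat) / (n - 2)

    # Leverage of each point: large when x_i is far from xbar.
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

    stud = [e / sqrt(s2 * (1 - h)) for e, h in zip(ehat, lev)]
    return stud, lev


# Made-up data with one X value far from the rest.
x = [1, 2, 3, 4, 5, 20]
y = [2.0, 4.1, 5.9, 8.2, 9.9, 35.0]

stud, lev = studentized_residuals(x, y)
for xi, hi, ri in zip(x, lev, stud):
    print(xi, round(hi, 3), round(ri, 3))
```

The point with the extreme X value has by far the largest leverage, so its plain residual has a much smaller variance than the others. Dividing by sqrt(1 - h_i) puts all the residuals on a comparable scale before they are plotted.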

