
Simple Linear Regression - Least Squares Fits

Such an approach may be adequate in certain circumstances but does leave room for improvement. For example, this procedure leaves one without a quantitative measure of how well the estimated equation describes either the data collected or the actual relationship. Further, two people will likely have completely different ideas of what a good "hand-fit" to the data looks like. Each person would, therefore, arrive at a different equation for predicting DBH from DS.

An alternative commonly suggested for estimating b0 and b1 is to apply least squares methodology to the data collected. To understand the basis of this methodology consider figure 2.

Here our objective is to find estimates of b0 and b1 such that the resulting equation best describes the linear relationship between Y and X as suggested by the five data points. Least squares methodology says to choose estimates bhat0 and bhat1 such that the corresponding line is one where the sum of the squared vertical distances between the points and the line is minimized. As we will see, for given data, there exists only one such line. Further, given certain assumptions, least squares fits have certain desirable properties and allow one to quantitatively describe how well the equation fits the data.

A naive (and approximate) solution to finding bhat0 and bhat1 would be to draw a (very large) number of lines through a set of data, calculate the sum of squared vertical distances from the points to each line, and choose the line corresponding to the minimum. Fortunately, the problem can be solved more easily using mathematics. Note, from figure 2, that given the least squares line, each Y observation, Yi, can be written as

Yi = bhat0 + bhat1 Xi + ehati

where ehati is the vertical distance from the least squares line to the observation (ehati is positive for observations above the line and negative for those below). ehati is known as the residual. Residuals estimate the distance between an observed Y and the true (unknown) line (the error ei). A least squares line is obtained when the sum of squared vertical differences is a minimum; that is, it is given by the (bhat0, bhat1) pair which minimizes

Σ ehati^2 = Σ (Yi - bhat0 - bhat1 Xi)^2

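where n is the number of observations (all summations are for i=1, ..., n unless otherwise noted). Applying calculus (setting the partial derivatives of the sum of squares with respect to bhat0 and bhat1 equal to zero) gives the equations

bhat1 = Σ (Xi - xbar)(Yi - ybar) / Σ (Xi - xbar)^2

bhat0 = ybar - bhat1 xbar

where ybar and xbar are the arithmetic averages of the Yi and Xi. bhat0 and bhat1 provide the least squares fit to the data.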
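
As a rough illustration, the Python sketch below computes bhat0 and bhat1 from the sample averages using the standard closed-form least squares expressions, then compares the result with the naive search over candidate lines described earlier. The five data points and the function names are invented for this example; they are not the DBH/DS data.

```python
def least_squares_fit(x, y):
    """Return (bhat0, bhat1) for the line y = bhat0 + bhat1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    xbar = sum(x) / n  # arithmetic average of the Xi
    ybar = sum(y) / n  # arithmetic average of the Yi
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    bhat1 = sxy / sxx
    bhat0 = ybar - bhat1 * xbar
    return bhat0, bhat1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared vertical distances from the points to the line."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: five made-up (X, Y) observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

bhat0, bhat1 = least_squares_fit(x, y)
sse = sum_squared_residuals(x, y, bhat0, bhat1)

# Naive alternative: try a coarse grid of candidate lines and keep the best.
# Intercepts run from -1.00 to 1.00 and slopes from 1.00 to 3.00 in steps of 0.05.
grid_best = min(
    sum_squared_residuals(x, y, c0 / 100, c1 / 100)
    for c0 in range(-100, 101, 5)
    for c1 in range(100, 301, 5)
)

print(f"bhat0 = {bhat0:.4f}, bhat1 = {bhat1:.4f}")
print(f"least squares SSE = {sse:.4f}, best SSE on grid = {grid_best:.4f}")
```

No candidate on the grid should give a smaller sum of squares than the closed-form line, which is the sense in which the search is only approximate: it can match the least squares line only if that line happens to lie on the grid.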

