Probability & Statistics (ST370) Chapter 11 Notes from
The Cartoon Guide to Statistics
Authored by Larry Gonick and Woollcott Smith

Regression analysis attempts to fit a line to a data set. It uses the x axis for the independent variable and the y axis for the dependent variable. A linear regression line has the form y = a + bx, where a is the y-intercept and b is the slope.

Keep in mind that a bar over a symbol denotes a mean (i.e., x bar is the mean of the x values). To find the linear regression line, first compute each point's deviation from the x mean, (x - xbar), and its deviation from the y mean, (y - ybar). Multiply the two deviations for each point and add up these products to get the numerator (top) of b. Add up the squared x deviations, (x - xbar)^2, to get the denominator (bottom) of b. In short, b = sum[(x - xbar)(y - ybar)] / sum[(x - xbar)^2]. We then find a from a = ybar - b * xbar. Substituting these two values back into y = a + bx gives the linear regression line.
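As a minimal sketch of the steps above, here is a Python function that computes b and a from paired lists of x and y values. The function name least_squares_fit and the sample data are hypothetical, chosen only for illustration.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n  # mean of the x values
    y_bar = sum(ys) / n  # mean of the y values
    # Numerator of b: sum over points of the product (x - x_bar)(y - y_bar)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator of b: sum of squared x deviations (x - x_bar)^2
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = sxy / sxx
    a = y_bar - b * x_bar  # intercept from a = y_bar - b * x_bar
    return a, b


xs = [1, 2, 3, 4, 5]  # hypothetical sample data
ys = [2, 4, 5, 4, 5]
a, b = least_squares_fit(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")  # prints: y = 2.20 + 0.60x
```

For this made-up data, xbar = 3 and ybar = 4, the deviation products sum to 6 and the squared x deviations sum to 10, so b = 0.6 and a = 4 - 0.6 * 3 = 2.2 (the raw floating-point value of a may show a tiny rounding error before formatting).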

To measure the error, use the squared residuals (y - yhat)^2. For each point, take the actual y value and subtract the "hat" value yhat, which is the value predicted by the regression line; then square each difference and add them all up to get the total error. Dividing this sum by the sum of the (y - ybar)^2 values gives the proportion of the variation in y that the line leaves unexplained. One minus this proportion is known as R^2: R^2 = 1 - sum[(y - yhat)^2] / sum[(y - ybar)^2]. The closer R^2 is to 1, the better the fit. The correlation coefficient r is the square root of R^2, given the same sign as b in y = a + bx. If the sign is positive, the line goes up to the right; if it is negative, the line goes down to the right.
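The error and R^2 calculation can be sketched in Python as follows, assuming a and b already come from a least-squares fit (the relationships R^2 = 1 - SSE/SST and r = ±sqrt(R^2) rely on that). The function name goodness_of_fit is hypothetical, and the data and line reuse made-up values (xs = [1, 2, 3, 4, 5], ys = [2, 4, 5, 4, 5], whose least-squares line is y = 2.2 + 0.6x).

```python
import math


def goodness_of_fit(xs, ys, a, b):
    """Return (sse, r_squared, r) for the line y = a + b*x (hypothetical helper)."""
    n = len(ys)
    y_bar = sum(ys) / n
    # Error: sum of squared residuals, actual y minus predicted y_hat = a + b*x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean: sum of (y - y_bar)^2
    sst = sum((y - y_bar) ** 2 for y in ys)
    if sst == 0:
        raise ValueError("all y values are equal; R^2 is undefined")
    r_squared = 1 - sse / sst
    # Clamp tiny negative values caused by rounding before taking the root
    r = math.copysign(math.sqrt(max(r_squared, 0.0)), b)  # r takes the sign of b
    return sse, r_squared, r


xs = [1, 2, 3, 4, 5]  # hypothetical sample data
ys = [2, 4, 5, 4, 5]
sse, r_squared, r = goodness_of_fit(xs, ys, a=2.2, b=0.6)
print(f"SSE = {sse:.3f}, R^2 = {r_squared:.3f}, r = {r:.3f}")
# prints: SSE = 2.400, R^2 = 0.600, r = 0.775
```

Here the squared residuals sum to 2.4 and the squared deviations from ybar = 4 sum to 6, so the unexplained proportion is 0.4, R^2 = 0.6, and r is the positive root because b is positive.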

If a plot of the residuals (y - yhat) against x shows a pattern, such as a curve, instead of random scatter, a straight line may be the wrong model and you might need to resort to a non-linear analysis of the problem.
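A rough way to see such a pattern without drawing a graph is to list the residuals in order of x. This hypothetical Python sketch fits a line to made-up data that actually follow y = x^2 and prints the residuals; the function name residuals and the data are assumptions for illustration, and the printed list only stands in for a real residual plot.

```python
def residuals(xs, ys):
    """Fit y = a + b*x by least squares and return the residuals y - y_hat."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data that follow the curve y = x^2, not a straight line
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]
res = residuals(xs, ys)
for x, e in zip(xs, res):
    # One line per point, in x order, as a crude text "residual plot"
    print(f"x={x}  residual={e:+.1f}")
```

The fitted line here is y = -2 + 4x, and the residuals come out as +2.0, -1.0, -2.0, -1.0, +2.0: positive at both ends and negative in the middle. That systematic U shape, rather than random scatter, is the kind of pattern that signals a curved relationship.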


Produced by Jason John Schwarz