Probability & Statistics (ST370) Chapter 11 Notes from
The Cartoon Guide to Statistics
Authored by Larry Gonick and Woollcott Smith
Regression analysis attempts to fit a line to a data set. It uses
the X axis for the independent variable and the Y axis for the dependent
variable. The form of linear regression is y=a+bx.
One thing to keep in mind is that we use the bar to represent the mean
(i.e. X bar means the mean of the X values). To find the linear regression
line, first we calculate the difference of each point's x value from
the x mean (x-xbar). Then we find the difference of each point's y value
from the y mean (y-ybar). For each point we multiply the two differences
together, and we sum these products to get the numerator (top) of b.
(Summing the differences first would not work, since the differences
from a mean always add up to zero.) We use the sum of the (x-xbar)^2
values to get the denominator (bottom) of b. We then find a in the
formula above as a=ybar - b * xbar. Taking these two values and
substituting them back into y=a+bx gives us the linear regression line.
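The calculation of a and b can be sketched in a few lines of Python. This is only an illustration: the function name fit_line and the five data points are made up for the example and do not come from the book.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n  # mean of the x values
    y_bar = sum(ys) / n  # mean of the y values
    # Numerator of b: sum over points of (x - xbar) * (y - ybar)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator of b: sum over points of (x - xbar)^2
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Made-up sample data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print("a =", round(a, 4), "b =", round(b, 4))
```

For this sample the slope works out to 6/10 and the intercept to 4 - 0.6 * 3, so the fitted line is roughly y = 2.2 + 0.6x.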
You use (y-yhat)^2 to find the error. For each point, take the actual
y and subtract the "hat" value, which is the value predicted by
the regression line, square the result, and then sum the squares to get
the amount of error. The proportion of error is this number divided by
the sum of (y-ybar)^2 over all of the points. One minus this value
is known as R^2. The closer to 1 this value is, the better the fit.
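As a rough sketch, R^2 can be computed as follows. The function name r_squared and the sample data are made up for illustration, and the values a = 2.2 and b = 0.6 are assumed to be the least-squares fit for that sample.

```python
def r_squared(xs, ys, a, b):
    """Return R^2 for the line y = a + b*x on the given points."""
    y_bar = sum(ys) / len(ys)
    # Sum of squared errors: actual y minus the predicted "hat" value
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: actual y minus the mean of y
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# Made-up sample data and its assumed fitted line y = 2.2 + 0.6x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys, a=2.2, b=0.6)
print("R^2 =", round(r2, 4))
```

Here the squared errors add up to about 2.4 and the total sum of squares is 6, so R^2 comes out near 0.6.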
The correlation coefficient is the square root of R^2 with the sign
of b from y=a+bx. If the sign is positive then the line goes up and
to the right; if it is negative then it goes down and to the right.
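A small sketch of that rule, checked against the direct formula for the correlation coefficient (the sum of (x-xbar)(y-ybar) divided by the square root of the product of the two sums of squares). The function names and sample data are made up for illustration, and R^2 = 0.6 with b = 0.6 is assumed to be the fit for that sample.

```python
import math


def correlation_from_fit(r2, b):
    """Square root of R^2, carrying the sign of the slope b."""
    return math.copysign(math.sqrt(r2), b)


def correlation_direct(xs, ys):
    """Correlation coefficient computed straight from the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up sample data; R^2 = 0.6 and b = 0.6 are assumed fit results
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_fit = correlation_from_fit(0.6, 0.6)
r_direct = correlation_direct(xs, ys)
print(round(r_fit, 4), round(r_direct, 4))
```

Both routes should give the same number for a simple one-variable regression; a negative slope would flip the sign of the first result.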
If you plot your residuals and there is a pattern, then you might
need to resort to non-linear analysis of the problem.
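To see what a pattern in the residuals can look like, here is a minimal sketch that fits a straight line to made-up data that actually follows y = x^2. The function name residuals and the data are assumptions for illustration only.

```python
def residuals(xs, ys):
    """Fit y = a + b*x by least squares and return y - yhat per point."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Made-up curved data: y = x^2, which a straight line cannot follow
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
res = residuals(xs, ys)
print(res)
```

The residuals are positive at both ends and negative in the middle, a U shape rather than a random scatter, which is the kind of pattern that suggests a straight line is the wrong model.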