Title: Regression
1Regression
- Using Correlation To Make Predictions
2Making a prediction
The goal is to obtain the predicted value of y
from a known value of x and a known correlation r.
Note what happens for positive and negative values
of r, for high and low values of r, and for
near-zero values of r.
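A minimal sketch of this idea, assuming the standard z-score form of the prediction (predicted z-score of y = r times the z-score of x), which the slide text does not spell out. The function name `predict_from_r` and all summary statistics in the usage loop are hypothetical.

```python
def predict_from_r(x, mean_x, sd_x, mean_y, sd_y, r):
    """Predict y from x using only the correlation r (z-score form)."""
    z_x = (x - mean_x) / sd_x        # standardize the known x
    z_y_hat = r * z_x                # predicted z-score of y
    return mean_y + z_y_hat * sd_y   # convert back to the units of y


# Hypothetical summary statistics, for illustration only.
for r in (0.9, 0.1, 0.0, -0.9):
    y_hat = predict_from_r(x=120, mean_x=100, sd_x=10,
                           mean_y=50, sd_y=5, r=r)
    print(f"r = {r:+.1f} -> predicted y = {y_hat:.1f}")
```

With r near zero the prediction stays at the mean of y; a negative r moves the prediction to the opposite side of the mean from x.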
3Graph of y 5 3 x
4y-Intercept and Slope
For a linear equation y = a + bx, the constant a
is the y-intercept and the constant b is the
slope. x and y are related variables.
5Straight-line graphs of three linear equations
Y = a + bX, where a = y-intercept and b = slope (rise/run)
6Graphical Interpretation of Slope
The straight-line graph of the linear equation
y = a + bx slopes upward if b > 0, slopes downward
if b < 0, and is horizontal if b = 0.
7Graphical interpretation of slope
8Four data points
9Scatter plot
10Two possible straight-line fits to the data
points
11Determining how well the data points are fit
by Line A vs. Line B
12Least-Squares Criterion
The straight line that best fits a set of data
points is the one having the smallest possible
sum of squared errors. Recall that the sum of
squared errors is the numerator of the error variance.
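A minimal sketch of applying the least-squares criterion to two candidate lines. The data points and the coefficients of the two lines are hypothetical stand-ins, since the deck's own figures are not transcribed.

```python
def sum_squared_errors(points, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in points)


# Hypothetical data points and candidate lines, for illustration only.
points = [(1, 1), (1, 2), (2, 2), (4, 6)]
sse_a = sum_squared_errors(points, a=0.50, b=1.25)    # "Line A"
sse_b = sum_squared_errors(points, a=-0.25, b=1.50)   # "Line B"

# The line with the smaller sum of squared errors fits better.
better = "Line A" if sse_a < sse_b else "Line B"
print(f"SSE(A) = {sse_a}, SSE(B) = {sse_b}, better fit: {better}")
```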
13Regression Line and Regression Equation
Regression line: The straight line that best
fits a set of data points according to the
least-squares criterion.
Regression equation: The equation of the regression line.
14The best-fit line minimizes the squared vertical
distances between the actual data and the predicted values
15Residual, e, of a data point
16Notation Used in Regression and Correlation
We define SSx, SP (also written SSP) and SSy by
SSx = Σx² - (Σx)²/n
SP = Σxy - (Σx)(Σy)/n
SSy = Σy² - (Σy)²/n
17Regression Equation
The regression equation for a set of n data
points is y = a + bx, where b = SP/SSx and
a = (1/n)(Σy - bΣx).
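A minimal sketch of computing the regression equation from raw data, using the formulas b = SP/SSx and a = (1/n)(Σy - bΣx) that the worked examples later in this deck rely on. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n                # SSx
    sp = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n   # SP
    b = sp / ss_x                  # slope
    a = (sum_y - b * sum_x) / n    # y-intercept
    return a, b


# Hypothetical data, for illustration only.
xs = [1, 1, 2, 4]
ys = [1, 2, 2, 6]
a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
print("prediction at x = 3:", a + b * 3)
```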
18The relationship between b and r
- b = r(sy/sx), where sx and sy are the standard
deviations of x and y
- That is, the regression slope is just the
correlation coefficient scaled up to the right
size for the variables x and y.
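A short numerical check of this scaling, assuming the standard relationship b = r(sy/sx) with sx and sy the sample standard deviations. The function name `slope_and_r` and the data are hypothetical.

```python
import math


def slope_and_r(xs, ys):
    """Return (b, r, s_x, s_y) computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sp / ss_x                       # regression slope
    r = sp / math.sqrt(ss_x * ss_y)     # correlation coefficient
    s_x = math.sqrt(ss_x / (n - 1))     # sample standard deviations
    s_y = math.sqrt(ss_y / (n - 1))
    return b, r, s_x, s_y


# Hypothetical data, for illustration only.
b, r, s_x, s_y = slope_and_r([1, 1, 2, 4], [1, 2, 2, 6])
print(f"b = {b:.4f}, r * s_y / s_x = {r * s_y / s_x:.4f}")
```

The two printed values agree, which is the sense in which the slope is r rescaled into the units of x and y.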
20Criterion for Finding a Regression Line
Before finding a regression line for a set of
data points, draw a scatter diagram. If the data
points do not appear to be scattered about a
straight line, do not determine a regression line.
21Linear regression requires linear data
(a) Data points scattered about a curve
(b) Inappropriate straight line fit to the data
Higher-order regression equations exist but are
outside the scope of this course
22Uniform Variance
Math Proficiency By Grade
23Assumptions for Regression Inferences
24Table for obtaining the three sums of squares for
the used car data
25Regression line and data points for used car data
What is a fair asking price for a 2.5-year-old
car?
Since the price unit is 100s, the best
prediction is 17,271.
26Extrapolation in the used car example
27Sums of Squares in Regression
Total sum of squares, SST: The variation in the
observed values of the response variable
Regression sum of squares, SSR: The variation in
the observed values of the response variable that
is explained by the regression
Error sum of squares, SSE: The variation in the
observed values of the response variable that is
not explained by the regression
28Regression Identity
The total sum of squares equals the regression
sum of squares plus the error sum of squares. In
symbols, SST = SSR + SSE.
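A minimal sketch that checks the regression identity numerically for a least-squares line. The function name `sums_of_squares` and the data are hypothetical.

```python
def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSE) for the least-squares line through the data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    preds = [a + b * x for x in xs]
    sst = sum((y - mean_y) ** 2 for y in ys)               # total variation
    ssr = sum((p - mean_y) ** 2 for p in preds)            # explained
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))     # unexplained
    return sst, ssr, sse


# Hypothetical data, for illustration only.
sst, ssr, sse = sums_of_squares([1, 1, 2, 4], [1, 2, 2, 6])
print(f"SST = {sst:.2f}, SSR + SSE = {ssr + sse:.2f}")
```

The identity holds for the least-squares line specifically; for an arbitrary line the two sides generally differ.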
29Graphical portrayal of regression for used cars
y = a + bx
30What sort of things could regression be used for?
- In any instance where a known correlation exists,
regression can be used to predict a new score.
Examples
- If you knew that there was a past correlation
between the amount of study time and the grade
on an exam, you could make a good prediction
about the grade before it happened.
- If you knew that certain features of a stock
correlate with its price, you could use regression
to predict the price before it happens.
31Regression Example Low Correlation
- Find the regression equation for predicting
height based on knowledge of weight. The
existing data are for 10 male stats students.
32X Y
33X Y XY X2 Y2
34X Y XY X2 Y2
35X Y XY X2 Y2
Σ 2,082 721 151,325 465,844 52,147
- SSx = Σx² - (Σx)²/n = 465,844 - 433,472.4 = 32,371.6, or about 32,372
- SP = Σxy - ΣxΣy/n = 151,325 - 150,112.2 = 1,212.8, or about 1,213
- b = SP/SSx, so b = 1,213/32,372 = 0.0375
- a = (1/n)(Σy - bΣx), so a = 0.1(721 - 78.0) = 64.3
- So, Y = 0.0375x + 64.3
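The arithmetic for this example can be re-run directly from the column totals (n = 10, ΣX = 2,082, ΣY = 721, ΣXY = 151,325, ΣX² = 465,844). This is a sketch with a hypothetical function name; it keeps full precision for b before computing a, so its output can differ from hand-rounded figures.

```python
def line_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (SSx, SP, b, a) from the column totals of a regression table."""
    ss_x = sum_x2 - sum_x ** 2 / n       # SSx
    sp = sum_xy - sum_x * sum_y / n      # SP
    b = sp / ss_x                        # slope
    a = (sum_y - b * sum_x) / n          # y-intercept
    return ss_x, sp, b, a


# Totals for the height (Y) versus weight (X) example.
ss_x, sp, b, a = line_from_sums(n=10, sum_x=2082, sum_y=721,
                                sum_xy=151325, sum_x2=465844)
print(f"SSx = {ss_x:.1f}, SP = {sp:.1f}, b = {b:.4f}, a = {a:.2f}")
```

A quick sanity check on any fitted line: it must pass through the point of means, here (208.2, 72.1).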
37Regression Example High Correlation
- Find the regression equation for predicting
probability of a teenage suicide attempt based on
weekly heroin usage.
38X Y XY X2 Y2
1 0.2 0.2 1 0.04
1 0.31 0.31 1 0.0961
1 0.18 0.18 1 0.0324
2 0.27 0.54 4 0.0729
2 0.38 0.76 4 0.1444
2 0.46 0.92 4 0.2116
3 0.9 2.7 9 0.81
3 0.58 1.74 9 0.3364
3 0.45 1.35 9 0.2025
4 0.84 3.36 16 0.7056
4 0.74 2.96 16 0.5476
4 0.68 2.72 16 0.4624
5 0.85 4.25 25 0.7225
5 0.78 3.9 25 0.6084
5 0.73 3.65 25 0.5329
6 0.88 5.28 36 0.7744
6 0.82 4.92 36 0.6724
6 0.78 4.68 36 0.6084
7 0.92 6.44 49 0.8464
7 0.85 5.95 49 0.7225
7 0.91 6.37 49 0.8281
Σ 84 13.51 63.18 420 9.9779
41X Y XY X2 Y2
Σ 84 13.51 63.18 420 9.9779
n = 21
SSx = Σx² - (Σx)²/n = 420 - 336 = 84
SP = Σxy - ΣxΣy/n = 63.18 - 54.04 = 9.14
b = SP/SSx, so b = 9.14/84 = 0.109
a = (1/n)(Σy - bΣx), so a = (1/21)(13.51 - 9.156) = 0.207
So, Y = 0.109x + 0.207
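As a cross-check, the same quantities can be recomputed from the 21 data rows of the table rather than from the totals. This is a sketch with hypothetical variable names; it also computes the correlation r with the usual formula r = SP / sqrt(SSx · SSy), which the slide itself does not show.

```python
import math

# y values grouped by x, copied from the table for this example.
ys_by_x = {
    1: [0.20, 0.31, 0.18],
    2: [0.27, 0.38, 0.46],
    3: [0.90, 0.58, 0.45],
    4: [0.84, 0.74, 0.68],
    5: [0.85, 0.78, 0.73],
    6: [0.88, 0.82, 0.78],
    7: [0.92, 0.85, 0.91],
}
pairs = [(x, y) for x, ys in ys_by_x.items() for y in ys]

n = len(pairs)
sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)
ss_x = sum(x * x for x, _ in pairs) - sum_x ** 2 / n     # SSx
ss_y = sum(y * y for _, y in pairs) - sum_y ** 2 / n     # SSy
sp = sum(x * y for x, y in pairs) - sum_x * sum_y / n    # SP

b = sp / ss_x                        # slope
a = (sum_y - b * sum_x) / n          # y-intercept
r = sp / math.sqrt(ss_x * ss_y)      # correlation coefficient

print(f"n = {n}, SSx = {ss_x:.2f}, SP = {sp:.2f}")
print(f"b = {b:.3f}, a = {a:.3f}, r = {r:.3f}")
```

Because b is not rounded before a is computed, the intercept can differ from the hand calculation in the third decimal place.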
42Why Is It Called Regression?
- For low correlations, the predicted value is
close to the mean
- For zero correlations, the prediction is the mean
- Only for perfect correlations (R² = 1.0) do the
predicted scores show as much variation as the
actual scores
- Since perfect correlations are rare, we say that
the predicted scores show regression towards the
mean
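A minimal sketch of this shrinkage: for a least-squares line, the standard deviation of the predicted scores equals |r| times the standard deviation of the actual scores, so predictions vary less than the data whenever |r| < 1. The function name `prediction_spread` and the data are hypothetical.

```python
import math
import statistics


def prediction_spread(xs, ys):
    """Return (r, sd of actual y, sd of predicted y) for a least-squares fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sp / ss_x
    a = mean_y - b * mean_x
    preds = [a + b * x for x in xs]          # predicted scores
    r = sp / math.sqrt(ss_x * ss_y)
    return r, statistics.pstdev(ys), statistics.pstdev(preds)


# Hypothetical data, for illustration only.
r, sd_y, sd_pred = prediction_spread([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])
print(f"r = {r:.3f}, sd(actual) = {sd_y:.3f}, sd(predicted) = {sd_pred:.3f}")
```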