Regression - PowerPoint PPT Presentation

Description:

... correlation coefficient scaled up to the right size for the variables x and y. ... Anthony Greene. ... SSx = Σx^2 - (Σx)^2/n = 465,844 - 433,472.4 = 32,372 ...

Provided by: anthony91

Transcript and Presenter's Notes


1
Regression
  • Using Correlation To Make Predictions

2
Making a prediction
The predicted value of y is obtained from a
known value of x and a known correlation r. Note
what happens for positive and negative values of
r, for high and low values of r, and for
near-zero values of r.
3
Graph of y = 5 + 3x
4
y-Intercept and Slope
For a linear equation y = a + bx, the constant a
is the y-intercept and the constant b is the
slope. x and y are related variables.
5
Straight-line graphs of three linear equations
Y = a + bX, where a = y-intercept and b = slope (rise/run)
6
Graphical Interpretation of Slope
The straight-line graph of the linear equation
y = a + bx slopes upward if b > 0, slopes downward
if b < 0, and is horizontal if b = 0.
7
Graphical interpretation of slope
8
Four data points
9
Scatter plot
10
Two possible straight-line fits to the data
points
11
Determining how well the data points are fit
by Line A vs. Line B
12
Least-Squares Criterion
The straight line that best fits a set of data
points is the one having the smallest possible
sum of squared errors. Recall that the sum of
squared errors is the numerator of the error
variance.
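As a minimal sketch of the criterion, the snippet below compares two candidate lines by their sum of squared errors. The four points and the two lines are illustrative assumptions, not data taken from the slides.

```python
# Illustrative data and candidate lines (assumptions, not from the slides).
points = [(1, 1), (1, 2), (2, 2), (4, 6)]


def sum_squared_errors(intercept, slope, data):
    """Sum of squared vertical distances between each y and the line's prediction."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in data)


sse_a = sum_squared_errors(0.50, 1.25, points)   # Line A: y = 0.50 + 1.25x
sse_b = sum_squared_errors(-0.25, 1.50, points)  # Line B: y = -0.25 + 1.50x

# The least-squares criterion prefers whichever line has the smaller sum.
better = "Line A" if sse_a < sse_b else "Line B"
print(sse_a, sse_b, better)
```

Only the vertical errors enter the sum, so the comparison depends on how far each observed y lies from the line's prediction at the same x.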
13
Regression Line and Regression Equation
Regression line: The straight line that best
fits a set of data points according to the
least-squares criterion.
Regression equation: The equation of the
regression line.
14
The best-fit line minimizes the sum of squared
vertical distances between the actual data and
the predicted values
15
Residual, e, of a data point
16
Notation Used in Regression and Correlation
We define SSx, SP and SSy by
SSx = Σx^2 - (Σx)^2/n
SP = Σxy - (Σx)(Σy)/n
SSy = Σy^2 - (Σy)^2/n
17
Regression Equation
The regression equation for a set of n data
points is y = a + bx, where b = SP/SSx and
a = (1/n)(Σy - bΣx).
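The slope and intercept can be computed directly from the column sums, using b = SP/SSx and a = (1/n)(Σy - bΣx) as in the worked examples later in the deck. The sketch below is one way to write that; the function name `fit_line` and the demo points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n                # SSx
    sp = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n   # SP
    b = sp / ss_x
    a = (sum_y - b * sum_x) / n
    return a, b


# Illustrative points that lie exactly on y = 2 + 3x (an assumption for the demo).
a, b = fit_line([0, 1, 2, 3], [2, 5, 8, 11])
print(a, b)
```

Because the demo points fall exactly on a line, the fit should recover that line's intercept and slope.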
18
The relationship between b and r
  • b = r × sqrt(SSy/SSx)
  • That is, the regression slope is just the
    correlation coefficient scaled up to the right
    size for the variables x and y.
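The scaling can be checked numerically with the standard identity b = r × sqrt(SSy/SSx), where r = SP/sqrt(SSx × SSy). The sketch below uses the column totals transcribed on slide 35 of this deck; the variable names are illustrative.

```python
import math

# Column totals as transcribed on slide 35 (n = 10 students).
n = 10
sum_x, sum_y = 2082, 721
sum_xy, sum_x2, sum_y2 = 151325, 465844, 52147

ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
sp = sum_xy - sum_x * sum_y / n

r = sp / math.sqrt(ss_x * ss_y)        # correlation coefficient
b_direct = sp / ss_x                   # slope from the sums of squares
b_from_r = r * math.sqrt(ss_y / ss_x)  # slope as a rescaled correlation

print(round(r, 3), round(b_direct, 4), round(b_from_r, 4))
```

The two slope values agree up to floating-point rounding, which is the sense in which b is r rescaled from standard units back to the units of x and y.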

19
(No Transcript)
20
Criterion for Finding a Regression Line
Before finding a regression line for a set of
data points, draw a scatter diagram. If the data
points do not appear to be scattered about a
straight line, do not determine a regression line.
21
Linear regression requires linear data
(a) Data points scattered about a curve
(b) Inappropriate straight-line fit to the data
Higher-order regression equations exist but are
outside the scope of this course
22
Uniform Variance
Math Proficiency By Grade
23
Assumptions for Regression Inferences
24
Table for obtaining the three sums of squares for
the used car data
25
Regression line and data points for used car data
What is a fair asking price for a 2.5-year-old
car?
Since the price is in units of 100, the best
prediction is 17,271
26
Extrapolation in the used car example
27
Sums of Squares in Regression
Total sum of squares, SST: The variation in the
observed values of the response variable
Regression sum of squares, SSR: The variation in
the observed values of the response variable
that is explained by the regression
Error sum of squares, SSE: The variation in the
observed values of the response variable that is
not explained by the regression
28
Regression Identity
The total sum of squares equals the regression
sum of squares plus the error sum of squares. In
symbols, SST = SSR + SSE.
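A short numerical check of the identity SST = SSR + SSE, offered as a sketch. The five data points are an illustrative assumption, not data from the slides.

```python
# Illustrative data (an assumption, not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept.
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_x = sum((x - mean_x) ** 2 for x in xs)
b = sp / ss_x
a = mean_y - b * mean_x
predicted = [a + b * x for x in xs]

sst = sum((y - mean_y) ** 2 for y in ys)                # total variation
ssr = sum((p - mean_y) ** 2 for p in predicted)         # explained by the line
sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # left unexplained

print(sst, ssr, sse, ssr + sse)
```

The identity holds for the least-squares line specifically; for an arbitrary line the cross term does not vanish and the two parts need not add up to SST.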
29
Graphical portrayal of regression for used cars
y = a + bx
30
What sort of things could regression be used for?
  • In any instance where a known correlation exists,
    regression can be used to predict a new score.
    Examples:
  • If you knew that there was a past correlation
    between the amount of study time and the grade
    on an exam, you could predict the grade before
    the exam took place.
  • If you knew that certain features of a stock
    correlate with its price, you could use
    regression to predict the price in advance.

31
Regression Example: Low Correlation
  • Find the regression equation for predicting
    height based on knowledge of weight. The
    existing data are for 10 male stats students.

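32
(Table: columns X, Y)
33
(Table: columns X, Y, XY, X^2, Y^2)
34
(Table: columns X, Y, XY, X^2, Y^2, with a Σ row for the column totals)
35
        X      Y       XY      X^2     Y^2
Σ   2,082    721  151,325  465,844  52,147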
  • SSx = Σx^2 - (Σx)^2/n = 465,844 - 433,472.4
    = 32,371.6 ≈ 32,372
  • SP = Σxy - (Σx)(Σy)/n = 151,325 - 150,112.2
    = 1,212.8 ≈ 1,213
  • b = SP/SSx, so b = 1,213/32,372 = 0.0375
  • a = (1/n)(Σy - bΣx), so a = 0.1(721 - 78.0) = 64.3
  • So, Y = 0.0375x + 64.3


36
  • Y = 0.0375x + 64.3
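The arithmetic on slide 35 can be re-run from the column totals and then used for prediction. This is a sketch that carries full precision instead of rounding intermediate values; the helper name `predict_height` is hypothetical, and the units are whatever the original data used.

```python
# Column totals from slide 35: X is weight, Y is height, n = 10 students.
n = 10
sum_x, sum_y, sum_xy, sum_x2 = 2082, 721, 151325, 465844

ss_x = sum_x2 - sum_x ** 2 / n
sp = sum_xy - sum_x * sum_y / n
b = sp / ss_x
a = (sum_y - b * sum_x) / n


def predict_height(weight):
    """Predicted height for a given weight, in the same units as the data."""
    return a + b * weight


print(round(ss_x, 1), round(sp, 1), round(b, 4), round(a, 2))
print(round(predict_height(sum_x / n), 2))  # prediction at the mean weight
```

A useful sanity check on any fitted line is that the prediction at the mean of x equals the mean of y, because the least-squares line always passes through the point of means.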

37
Regression Example: High Correlation
  • Find the regression equation for predicting
    probability of a teenage suicide attempt based on
    weekly heroin usage.

38
X Y XY X^2 Y^2
1 0.2 0.2 1 0.04
1 0.31 0.31 1 0.0961
1 0.18 0.18 1 0.0324
2 0.27 0.54 4 0.0729
2 0.38 0.76 4 0.1444
2 0.46 0.92 4 0.2116
3 0.9 2.7 9 0.81
3 0.58 1.74 9 0.3364
3 0.45 1.35 9 0.2025
4 0.84 3.36 16 0.7056
4 0.74 2.96 16 0.5476
4 0.68 2.72 16 0.4624
5 0.85 4.25 25 0.7225
5 0.78 3.9 25 0.6084
5 0.73 3.65 25 0.5329
6 0.88 5.28 36 0.7744
6 0.82 4.92 36 0.6724
6 0.78 4.68 36 0.6084
7 0.92 6.44 49 0.8464
7 0.85 5.95 49 0.7225
7 0.91 6.37 49 0.8281
Σ 84 13.51 63.18 420 9.9779
41
X Y XY X^2 Y^2
Σ 84 13.51 63.18 420 9.9779
n = 21
SSx = Σx^2 - (Σx)^2/n = 420 - 336 = 84
SP = Σxy - (Σx)(Σy)/n = 63.18 - 54.04 = 9.14
b = SP/SSx, so b = 9.14/84 = 0.109
a = (1/n)(Σy - bΣx), so a = (1/21)(13.51 - 9.156) = 0.207
So, Y = 0.109x + 0.207
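The same calculation can be reproduced from the raw table on slide 38. The sketch below rebuilds the column totals and then applies the slide's formulas; variable names are illustrative.

```python
# (x, y) pairs transcribed from the table on slide 38.
data = [
    (1, 0.20), (1, 0.31), (1, 0.18),
    (2, 0.27), (2, 0.38), (2, 0.46),
    (3, 0.90), (3, 0.58), (3, 0.45),
    (4, 0.84), (4, 0.74), (4, 0.68),
    (5, 0.85), (5, 0.78), (5, 0.73),
    (6, 0.88), (6, 0.82), (6, 0.78),
    (7, 0.92), (7, 0.85), (7, 0.91),
]

n = len(data)
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_xy = sum(x * y for x, y in data)
sum_x2 = sum(x * x for x, _ in data)

ss_x = sum_x2 - sum_x ** 2 / n
sp = sum_xy - sum_x * sum_y / n
b = sp / ss_x
a = (sum_y - b * sum_x) / n

print(n, sum_x, round(sum_y, 2), round(sum_xy, 2), sum_x2)
print(round(b, 3), round(a, 3))
```

Because the sketch carries the unrounded slope into the intercept, its value of a can differ from the slide's in the third decimal place, where the slide uses b rounded to 0.109.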

42
Why Is It Called Regression?
  • For low correlations, the predicted value is
    close to the mean
  • For zero correlations the prediction is the mean
  • Only for perfect correlations (r^2 = 1.0) do the
    predicted scores show as much variation as the
    actual scores
  • Since perfect correlations are rare, we say that
    the predicted scores show regression towards the
    mean
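In standard-score form, the simple regression prediction is z_y(predicted) = r × z_x, which makes the pull toward the mean easy to see. That standardized form is a standard result and is not written out on the slide; the function name below is hypothetical.

```python
def predicted_z(r, z_x):
    """Predicted z-score of y for a given z-score of x: r * z_x."""
    return r * z_x


z_x = 2.0  # a score two standard deviations above the mean of x
for r in (0.0, 0.3, 0.9, 1.0):
    # Unless |r| is 1, the prediction sits closer to the mean (z = 0) than z_x does.
    print(r, predicted_z(r, z_x))
```

With r = 0 the prediction is the mean itself, and only with a perfect correlation is the predicted score as extreme as the observed x.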