Chapters 10 and 11: Using Regression to Predict

Slides: 42
Provided by: EvanB2
Learn more at: http://www.math.unt.edu

1
Chapters 10 and 11 Using Regression to Predict
  • Math 1680

2
Overview
  • Predicting Values
  • The Regression Line
  • The RMS Error
  • The Regression Effect
  • A Second Regression Line
  • Summary

3
Predicting Values
  • We have previously seen that a pair of data sets,
    X and Y, can be characterized by their
    five-statistic summary
  • µX, the average value in X
  • SDX, the standard deviation of X
  • µY, the average value in Y
  • SDY, the standard deviation of Y
  • r, the correlation coefficient
  • Often, we want to predict a y-value given a
    particular x-value
  • We want to use only the five-statistic summary to
    make the prediction

4
Predicting Values
  • Suppose we have the following five-statistic
    summary for the height (X) and weight (Y) of men
    in the US
  • µX = 70 inches, SDX = 3 inches
  • µY = 162 lbs, SDY = 30 lbs
  • r = 0.47
  • If you had to guess what the weight of any man
    would be, what is your best bet?

5
Predicting Values
  • Suppose we have the following five-statistic
    summary for the height (X) and weight (Y) of men
    in the US
  • µX = 70 inches, SDX = 3 inches
  • µY = 162 lbs, SDY = 30 lbs
  • r = 0.47
  • Suppose you know the man is 1 SD above average
  • Would your best guess for his weight be 1 SD
    above average?

6
  • The SD line is the dashed line running through
    the scatter plot
  • If we guessed 1 SD above average weight, where
    would we be on the plot?
  • What would a better guess be?

7
The Regression Line
  • Suppose we have the following five-statistic
    summary for the height (X) and weight (Y) of men
    in the US
  • µX = 70 inches, SDX = 3 inches
  • µY = 162 lbs, SDY = 30 lbs
  • r = 0.47
  • It turns out that the correlation coefficient
    determines the best guess
  • For every SD we move in X, we should move r SDs
    in Y

8
The Regression Line
  • The regression line from X to Y
  • Runs through the point of averages
  • Has a slope of r times the slope of the SD line
  • The regression line predicts the average value
    for y within the narrowed-down range specified by
    a given x
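
As a rough sketch, the two properties above are enough to write the line in slope-intercept form. The function name `regression_line` and its parameter names are made up for illustration; the numbers are the height and weight summary used in these slides.

```python
def regression_line(mean_x, sd_x, mean_y, sd_y, r):
    """Return (slope, intercept) of the regression line from X to Y."""
    # The SD line changes by one SD of Y for each SD of X.
    sd_ratio = sd_y / sd_x
    # The regression line changes by only r SDs of Y for each SD of X.
    slope = r * sd_ratio
    # The line passes through the point of averages (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = regression_line(70, 3, 162, 30, 0.47)
print(f"slope = {slope:.2f} lbs per inch, intercept = {intercept:.1f} lbs")
```

With r = 0.47 the regression line climbs about 4.7 lbs per inch, less than half as steeply as the SD line's 10 lbs per inch.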

9
The Regression Line
  • The formula for the regression line from X to Y
    is (y - µY)/SDY = r · (x - µX)/SDX
  • Or, alternately, zY = r · zX
  • When is the regression line the same as the SD
    line?

When r = 1 or r = -1
10
  • The regression line is the solid line running
    through the scatter plot
  • If we looked at heights 1 SD above the average,
    the regression line runs through the point 0.47
    SDs above average in weight

11
The Regression Line
  • Suppose we have the following five-statistic
    summary for the height (X) and weight (Y) of men
    in the US
  • µX = 70 inches, SDX = 3 inches
  • µY = 162 lbs, SDY = 30 lbs
  • r = 0.47
  • What is the average weight of all the men who are
    73 inches tall?
  • For a man 73 inches tall, what weight should we
    predict?

176.1 lbs
12
The Regression Line
  • Suppose we have the following five-statistic
    summary for the height (X) and weight (Y) of men
    in the US
  • µX = 70 inches, SDX = 3 inches
  • µY = 162 lbs, SDY = 30 lbs
  • r = 0.47
  • What is the average weight of all the men who are
    64 inches tall?
  • For a man 64 inches tall, what weight should we
    predict?

133.8 lbs
13
The Regression Line
  • To use the regression line from X to Y
  • Standardize the given x-value to get zX
  • Use the regression equation to go from X to Y
  • zY = r · zX
  • Unstandardize zY to get y
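
The three steps above can be sketched in Python as follows. The function name `predict_y` and its parameter names are hypothetical, chosen only for this illustration; the inputs are the height and weight summary from the slides.

```python
def predict_y(x, mean_x, sd_x, mean_y, sd_y, r):
    """Predict y from x with the regression line from X to Y."""
    z_x = (x - mean_x) / sd_x      # standardize x
    z_y = r * z_x                  # regression equation: zY = r * zX
    return mean_y + z_y * sd_y     # unstandardize zY


# Height (X, inches) and weight (Y, lbs) summary from the slides.
for height in (73, 64):
    weight = predict_y(height, 70, 3, 162, 30, 0.47)
    print(f"{height} inches -> {weight:.1f} lbs")
```

For heights given in feet and inches, convert to inches first (for example, 6 ft 4 in is 76 inches) before standardizing.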

14
The Regression Line
  • Suppose we have the following five-statistic
    summary for the height (X) and weight (Y) of men
    in the US
  • µX = 70 inches, SDX = 3 inches
  • µY = 162 lbs, SDY = 30 lbs
  • r = 0.47
  • Predict the weight of a man who is 6'4" (76
    inches) tall

190.2 lbs
15
The Regression Line
  • Suppose we have the following five-statistic
    summary for the height (X) and weight (Y) of men
    in the US
  • µX = 70 inches, SDX = 3 inches
  • µY = 162 lbs, SDY = 30 lbs
  • r = 0.47
  • Predict the weight of a man who is 5'6" (66
    inches) tall

143.2 lbs
16
The Regression Line
  • Important notes about the regression line from X
    to Y
  • It predicts the average value for y given an x
    value
  • If the scatter plot is football shaped, this
    prediction will be above about half of the sample
    and below the other half
  • This is because the variables are approximately
    normal
  • The slope of the regression line will always be

17
The RMS Error
  • Recall that an average alone did not uniquely
    describe a data set
  • A spread measure was needed
  • Since the regression method only gives us an
    average value as its prediction, we can't really
    tell by this alone how good a guess it is

18
  • The prediction given by the regression line for a
    height of 73 inches is at (73 in, 176 lbs)
  • How much does the heaviest 73-inch-tall man weigh?
  • How much does the lightest 73-inch-tall man weigh?

19
The RMS Error
  • If we are given a specific man to predict, we are
    likely to be a little off with the regression
    prediction
  • You can think of the prediction error as being
    the vertical distance from the point to the
    regression line
  • That is, error = actual - predicted
  • If we want to get a good sense of what the
    typical error for a given x-value is, we can find
    the RMS of all the errors for all the points
  • This value is called the RMS error for the
    regression line
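
A minimal sketch of that calculation, done directly from the points rather than from the summary statistics. The data values are invented purely for illustration and are not from the slides.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical (height, weight) pairs, invented purely for illustration.
xs = [64, 66, 68, 70, 72, 74, 76]
ys = [135, 150, 148, 165, 160, 185, 191]

mx, my = mean(xs), mean(ys)
sdx, sdy = pstdev(xs), pstdev(ys)

# Correlation: average product of the standardized values.
r = mean(((x - mx) / sdx) * ((y - my) / sdy) for x, y in zip(xs, ys))

# Error for each point: actual y minus the regression prediction.
errors = [y - (my + r * (x - mx) / sdx * sdy) for x, y in zip(xs, ys)]

# Root of the mean of the squared errors.
rms_error = sqrt(mean(e * e for e in errors))
print(f"r = {r:.3f}, RMS error = {rms_error:.2f} lbs")
```

The SDs here are computed as population SDs (`pstdev`), an assumption made to match the way r is computed as an average of products.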

20
The RMS Error
  • The RMS error is to the regression line what the
    SD is to the average
  • The RMS error measures the spread around a
    prediction from the regression line
  • Recall we are generally assuming the data sets
    are approximately normal
  • About 68% of the points on a scatter plot will
    fall within the strip that runs from one RMS
    error below to one RMS error above the regression
    line
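
The 68% figure can be checked with a small simulation. This is only a sketch under stated assumptions: the points are generated in standard units from normal noise, the seed and sample size are arbitrary, and r = 0.47 is borrowed from the height and weight example. By construction the errors around the line have spread sqrt(1 - r^2), which plays the role of the RMS error here.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable
r = 0.47        # correlation from the height/weight example
n = 20_000      # number of simulated points (arbitrary choice)

spread = sqrt(1 - r * r)  # size of the errors around the line, in standard units

inside = 0
for _ in range(n):
    # Standard units: zY = r * zX + noise has correlation r with zX.
    z_x = random.gauss(0, 1)
    z_y = r * z_x + spread * random.gauss(0, 1)
    # Count the point if it lies within one RMS error of the line.
    if abs(z_y - r * z_x) <= spread:
        inside += 1

fraction_inside = inside / n
print(f"fraction within one RMS error: {fraction_inside:.3f}")
```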

21
The RMS Error
22
The RMS Error
  • The RMS error for regression from X to Y (denoted
    R) can be calculated from the five-statistic
    summary by R = √(1 - r²) · SDY
  • What units would R have?
  • What happens when r gets close to 0?
  • What happens when r gets close to 1 or -1?
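
One way to explore these questions is to evaluate the standard shortcut formula, R = √(1 - r²) · SDY, for a few values of r. The function name `rms_error` is made up for this sketch; SDY = 30 lbs comes from the height and weight example.

```python
from math import sqrt


def rms_error(r, sd_y):
    """RMS error for the regression line from X to Y."""
    return sqrt(1 - r * r) * sd_y


for r in (0, 0.47, 0.9, 1, -1):
    print(f"r = {r:5}: R = {rms_error(r, 30):.1f} lbs")
```

R carries the units of Y. At r = 0 it equals SDY, so knowing x does not help the prediction at all, and at r = 1 or -1 it drops to zero because every point lies on the line.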

23
The RMS Error
  • The RMS error allows us to give a range around
    our prediction
  • If the scatter plot is football-shaped, the RMS
    error is roughly constant across the entire range
    of the data set
  • The vertical spread around one part is about the
    same as the vertical spread around other parts

24
The RMS Error
  • Suppose we have the following five-statistic
    summary for the height (X) and weight (Y) of men
    in the US
  • µX = 70 inches, SDX = 3 inches
  • µY = 162 lbs, SDY = 30 lbs
  • r = 0.47
  • Predict and give the RMS error for the weight of
    a man who is 6'2" (74 inches) tall

180.8 ± 26.5 lbs
25
The RMS Error
  • Suppose we have the following five-statistic
    summary for the height (X) and weight (Y) of men
    in the US
  • µX = 70 inches, SDX = 3 inches
  • µY = 162 lbs, SDY = 30 lbs
  • r = 0.47
  • Predict and give the RMS error for the weight of
    a man who is 5'4" (64 inches) tall

133.8 ± 26.5 lbs
26
The Regression Effect
  • A preschool program attempts to boost students'
    IQ scores
  • The children are tested when they enter the
    program (pretest)
  • The children are retested when they leave the
    program (post-test)

27
The Regression Effect
  • On both occasions, the average IQ score was 100,
    with an SD of 15
  • Also, students with below-average IQs on the
    pretest had scores that went up on the average by
    5 points
  • Students with above-average scores on the pretest
    had their scores drop by an average of 5 points

28
The Regression Effect
  • Does the program equalize intelligence?

No. If the program really equalized
intelligence, then the SD for the post-test
results should be smaller than that of the
pre-test results. This is an example of the
regression effect.
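
A small simulation illustrates the point. It is only a sketch under stated assumptions: the test-retest correlation of 0.6, the seed, and the sample size are hypothetical and do not come from the slides. No simulated child is treated in any way, yet the below-average group rises and the above-average group falls while the SD stays at about 15.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the run is repeatable
r = 0.6         # hypothetical pretest/post-test correlation (not from the slides)
n = 20_000      # number of simulated children (arbitrary choice)

pre, post = [], []
for _ in range(n):
    # Both scores have mean 100 and SD 15; the program has no effect.
    z_pre = random.gauss(0, 1)
    z_post = r * z_pre + sqrt(1 - r * r) * random.gauss(0, 1)
    pre.append(100 + 15 * z_pre)
    post.append(100 + 15 * z_post)

# Average change for children below and above 100 on the pretest.
low_gain = mean(b - a for a, b in zip(pre, post) if a < 100)
high_gain = mean(b - a for a, b in zip(pre, post) if a > 100)

print(f"SD pretest = {pstdev(pre):.1f}, SD post-test = {pstdev(post):.1f}")
print(f"average change, below-average pretest: {low_gain:+.1f}")
print(f"average change, above-average pretest: {high_gain:+.1f}")
```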
29
The Regression Effect
  • The regression effect is a byproduct of the fact
    that predictions from a regression line are
    average values
  • Some of the people who did very well on the
    pre-test may simply have had a good test day
  • Their scores shouldn't necessarily be as high on
    the post-test as they were on the pretest
  • Similarly, some of the people who did poorly on
    the pre-test may simply have had a bad test day
  • Their scores shouldn't necessarily be as low on
    the post-test as they were on the pretest

30
The Regression Effect
  • Sometimes researchers mistake the regression
    effect for some important underlying cause in the
    study (regression fallacy)
  • Tall fathers tend to have tall sons who are
    slightly shorter than the father
  • There is no biological cause for this reduction
  • It is strictly statistical

31
The Regression Effect
  • As part of their training, air force pilots make
    practice landings with instructors, and are rated
    on performance
  • The instructors discuss the ratings with the
    pilots after each landing
  • Statistical analysis shows that pilots who make
    poor landings the first time tend to do better
    the second time
  • Conversely, pilots who make good landings the
    first time tend to do worse the second time

32
The Regression Effect
  • The conclusion is that criticism helps the pilots
    while praise makes them do worse
  • As a result, instructors were ordered to
    criticize all landings, good or bad
  • Was this warranted by the facts?

No. This is an example of the regression fallacy.
33
The Regression Effect
  • An instructor gives a midterm
  • She asks the students who score 20 points below
    average to see her regularly during her office
    hours for special tutoring
  • They all score at class average or above on the
    final
  • Can this improvement be attributed to the
    regression effect? Why/why not?

No. If it were only the regression effect, most
of the students would still have scored below
average. The fact that everyone in the tutoring
group scored at or above average indicates that
the tutoring had a real effect.
34
A Second Regression Line
  • The focus so far has been on the regression line
    from X to Y
  • Note, however, that there is also a regression
    line from Y to X
  • What would the difference between the two lines
    be?

The regression line from X to Y is given by
zY = r · zX, while the regression line from Y to X
is given by zX = r · zY
35
A Second Regression Line
  • A study of 1,000 families gives the following
  • The husbands' average height was 68 inches with
    an SD of 2.7 inches
  • The wives' average height was 63 inches with an
    SD of 2.5 inches
  • The correlation between them was 0.25
  • Predict and give the RMS error for the husband's
    height when his wife's height is 68 inches

69.35 inches, give or take 2.61 inches
36
A Second Regression Line
  • A study of 1,000 families gives the following
  • The husbands' average height was 68 inches with
    an SD of 2.7 inches
  • The wives' average height was 63 inches with an
    SD of 2.5 inches
  • The correlation between them was 0.25
  • Predict and give the RMS error for the wife's
    height when her husband's height is 69.35 inches

63.31 inches, give or take 2.42 inches
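
The two predictions above can be reproduced with one helper that works in either direction. This is a sketch only: the function name `predict` and its parameter names are hypothetical, and the RMS error is taken as √(1 - r²) times the SD of the variable being predicted.

```python
from math import sqrt


def predict(value, mean_from, sd_from, mean_to, sd_to, r):
    """Predict one variable from the other; return (prediction, RMS error)."""
    z = (value - mean_from) / sd_from          # standardize the known value
    prediction = mean_to + r * z * sd_to       # move r SDs in the predicted variable
    rms = sqrt(1 - r * r) * sd_to              # spread around that prediction
    return prediction, rms


# Study summary: husbands average 68 in (SD 2.7), wives 63 in (SD 2.5), r = 0.25.
husband, rms_h = predict(68, 63, 2.5, 68, 2.7, 0.25)       # from the wife's height
wife, rms_w = predict(husband, 68, 2.7, 63, 2.5, 0.25)     # back from the husband's height

print(f"husband: {husband:.2f} in, give or take {rms_h:.2f} in")
print(f"wife:    {wife:.2f} in, give or take {rms_w:.2f} in")
```

Predicting back from the husband's height does not return the original 68 inches for the wife, because each direction uses its own regression line.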
37
A Second Regression Line
Regression Line from X to Y
38
A Second Regression Line
Regression Line from X to Y
39
A Second Regression Line
Regression Line from X to Y
40
Summary
  • When trying to make predictions from a
    football-shaped plot, a good predictor is the
    average value for one variable within a
    restricted range in the other
  • The regression line runs through all of these
    averages
  • For every SD moved in the independent variable,
    the regression line predicts a move of r SDs in
    the dependent variable
  • The prediction from the regression line is likely
    to be off by the RMS error
  • The RMS error can be calculated as
    R = √(1 - r²) · SDY

41
Summary
  • The regression effect is purely statistical
  • It does not reflect a significant underlying
    trend in the data
  • There are two regression lines for a scatter plot
  • Which one to use depends on which variable you
    are predicting