# Sections 2.3 and 2.4 - PowerPoint PPT Presentation


Slides: 21
Provided by: math53
Transcript and Presenter's Notes


1
Sections 2.3 and 2.4
• Regression Lines

2
Review of Lines
• The equation $y = a + bx$ describes a line.
• $a$ is the y-intercept, the value of $y$ that goes with $x = 0$.
• $b$ is the slope of the line.
• $a + b$ is the value of $y$ that goes with $x = 1$.
• $a + 2b$ is the value of $y$ that goes with $x = 2$.
• When the value of $x$ goes up by one, the value of $y$ goes up by $b$.
• To plot a line, first plot two points on it, then draw the line through them.
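
To make these bullets concrete, here is a minimal Python sketch that evaluates $y = a + bx$ at $x = 0, 1, 2$. The intercept $a = 2$ and slope $b = 3$ are hypothetical values chosen only for illustration; the outputs are $a$, $a + b$, and $a + 2b$, and each one-unit step in $x$ raises $y$ by $b$.

```python
def line(a: float, b: float, x: float) -> float:
    """Return the value of y on the line y = a + b*x."""
    return a + b * x


# Hypothetical intercept and slope, chosen only for illustration.
a, b = 2.0, 3.0

ys = [line(a, b, x) for x in (0, 1, 2)]
print(ys)  # [2.0, 5.0, 8.0]: a, a + b, a + 2b

# Each one-unit step in x changes y by the slope b.
steps = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]
print(steps)  # [3.0, 3.0]
```

Any two of these points are enough to plot the line.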

3
Used Honda Civics
4
Can mileage be used to predict price?
• Sketch a line.
• Estimate the slope.
• Estimate the y-intercept.
• Give the equation of the line.
• Estimate the price of a car with 100,000 miles.

5
Regression Lines
• Can we agree on the best line?
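• The least-squares regression line is the line that fits the data best, in the following sense.
• Goodness of fit is measured by the sum of the squared vertical distances from the data points to the line. The smaller this sum, the better the fit; the least-squares line makes it as small as possible.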
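• The regression line is given by $\hat{y} = a + bx$, where $b = r\,\frac{s_y}{s_x}$ and $a = \bar{y} - b\,\bar{x}$, with $r$ the correlation, $s_x$ and $s_y$ the standard deviations, and $\bar{x}$ and $\bar{y}$ the means of $x$ and $y$.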
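
The least-squares criterion can be checked numerically. The Python sketch below uses a small hypothetical data set (not the Civic data) and computes the sum of squared vertical distances for the fitted line and for another nearby line. The helper names `least_squares` and `sse` are illustrative, not from any library. The slope is computed as $\sum (x-\bar{x})(y-\bar{y}) / \sum (x-\bar{x})^2$, which is algebraically the same as $r\,s_y/s_x$.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def sse(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data (not the Civic data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
print(round(a, 6), round(b, 6))         # 2.2 0.6
print(round(sse(xs, ys, a, b), 6))      # 2.4
print(round(sse(xs, ys, 2.0, 0.7), 6))  # 2.55, a worse fit
```

Any other choice of intercept and slope gives a sum at least as large as 2.4 for these points.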

6
Back to Civics
• Use the formulas to find the least-squares regression line. The descriptive statistics for our data are as follows:
• Describe what the slope and y-intercept mean in
the vocabulary of the problem.
• Does the y-intercept make sense in the context of
the data?
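
When only summary statistics are available, the slope and intercept follow from $b = r\,s_y/s_x$ and $a = \bar{y} - b\bar{x}$. A minimal Python sketch, using hypothetical numbers rather than the Civic statistics; `line_from_summary` and `correlation` are illustrative helper names:

```python
import math
import statistics


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def line_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Least-squares intercept and slope from summary statistics:
    b = r * s_y / s_x and a = y_bar - b * x_bar."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data; in practice the five summary numbers come from a table.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = line_from_summary(
    statistics.mean(xs), statistics.mean(ys),
    statistics.stdev(xs), statistics.stdev(ys),
    correlation(xs, ys),
)
print(round(a, 6), round(b, 6))  # 2.2 0.6
```

Only the ratio $s_y/s_x$ enters the slope, so it does not matter whether the standard deviations divide by $n$ or $n - 1$, as long as both use the same convention.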

7
Extrapolation
• Extrapolation is the use of a regression line for prediction far outside the range of values of the explanatory variable used to obtain the line. Such predictions are often not accurate.
• Usually in regression, $x = 0$ is outside the range of the data, so the y-intercept will not make sense.
• Extrapolation would also occur if we tried to
predict the price of a used Civic with 300,000
miles.
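
One way to see why this extrapolation is risky: plug 300,000 miles into the full-data equation from the Influential Observations slides, PRICE = 15589.6 - 0.0697811 MILEAGE. The Python sketch below (function and variable names are illustrative) shows that the predicted price is negative, and that the line reaches a price of zero at a little over 223,000 miles.

```python
def predicted_price(mileage: float) -> float:
    """Full-data least-squares line from the slides:
    PRICE = 15589.6 - 0.0697811 * MILEAGE."""
    return 15589.6 - 0.0697811 * mileage


# 300,000 miles is far outside the mileages used to fit the line.
print(round(predicted_price(300_000), 2))  # -5344.73, a negative price

# Mileage at which the line predicts a price of zero.
zero_price_mileage = 15589.6 / 0.0697811
print(round(zero_price_mileage))  # a little over 223,000
```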

8
How successful is the regression in explaining
the response?
• $r^2$ is the proportion of the variation in the response variable that is explained by the regression line.

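Stemplots of observed and predicted prices (leaf unit = 100):

| Stem | Observed prices | Predicted prices |
|---|---|---|
| 7 | 3 | 2 |
| 8 | | |
| 9 | 9 | |
| 10 | | 1 |
| 11 | 99 | |
| 12 | 9 | 25 |
| 13 | 9 | 00 |

Observed prices: mean 11379, S = 2347. Predicted prices: mean 11379, S = 2295.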
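
For a least-squares line, the predicted values have the same mean as the observed values (11379 in both stemplots), and $r^2$ equals the variance of the predicted values divided by the variance of the observed values, which is the same as $1 - \text{SSE}/\text{SST}$. A short Python sketch applies this to the two standard deviations on the slide, giving about 0.956 (subject to rounding in the reported values), and checks the equivalence on a small hypothetical data set:

```python
import statistics


def r_squared(ys, fitted):
    """r^2 = 1 - SSE/SST: share of the variation in y explained by the line."""
    y_bar = statistics.mean(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# Standard deviations of observed and predicted prices, as shown on the slide.
s_observed, s_predicted = 2347, 2295
print(round((s_predicted / s_observed) ** 2, 3))  # 0.956

# Check on hypothetical data whose least-squares line is y = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
fitted = [2.2 + 0.6 * x for x in xs]
print(round(r_squared(ys, fitted), 6))                                  # 0.6
print(round(statistics.variance(fitted) / statistics.variance(ys), 6))  # 0.6
```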
9
Residuals
• A residual is the vertical distance from a data point to the line: positive if the point lies above the line, negative if it lies below.
• Find the residual for the car with 43903 miles.
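
The residual for this car is its observed price minus the price the line predicts. A minimal Python sketch, assuming the full-data equation PRICE = 15589.6 - 0.0697811 MILEAGE from the Influential Observations slides; the observed price of that car comes from the data set, so it is left as an argument (function names are illustrative):

```python
def predicted_price(mileage: float) -> float:
    """Full-data line: PRICE = 15589.6 - 0.0697811 * MILEAGE."""
    return 15589.6 - 0.0697811 * mileage


def residual(observed_price: float, mileage: float) -> float:
    """Observed minus predicted: positive above the line, negative below."""
    return observed_price - predicted_price(mileage)


print(round(predicted_price(43903), 2))  # 12526.0
# residual(observed_price, 43903) then gives the answer, where observed_price
# is that car's price from the data set.
```

A car at 43903 miles priced above about 12526 has a positive residual; one priced below it has a negative residual.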

10
How well does line fit data?
• Look at a residual plot.
• residual = observed y - predicted y, that is, $y - \hat{y}$.
• Plot the explanatory variable on the x-axis and the residuals on the y-axis.
• If the regression line captured the overall pattern, then there should be no pattern remaining in the residual plot.
• How well does the least squares regression line
fit the relationship between mileage and price?
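
To see what a leftover pattern looks like, the Python sketch below fits a least-squares line to hypothetical, clearly curved data ($y = x^2$) and prints the residuals. The fitted line is flat, and the residuals run positive, negative, negative, negative, positive: the curve the line missed. Least-squares residuals always sum to zero, so it is the pattern, not the average, that signals a poor fit.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residuals(xs, ys):
    """Residuals y - y_hat from the least-squares line."""
    a, b = least_squares(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical curved data: a straight line cannot capture y = x**2.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

res = residuals(xs, ys)
print(res)       # [2.0, -1.0, -2.0, -1.0, 2.0]
print(sum(res))  # 0.0: least-squares residuals sum to zero

# A text "residual plot": one row per x value, '+' above the line, '-' below.
for x, e in zip(xs, res):
    mark = "+" if e > 0 else "-" if e < 0 else "0"
    print(f"{x:>3} {mark}")
```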

11
Outliers
• An outlier is a data point that falls outside the overall pattern of the data. Outliers are often most obvious in the residual plot.
• Are there any outliers in the previous example?
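
The slides judge outliers by eye from the residual plot. As a hedged numeric companion, the Python sketch below flags points whose residual is more than two residual standard errors, $s = \sqrt{\text{SSE}/(n-2)}$, away from zero. The cutoff of 2 is a common rule of thumb assumed here, not a rule from these slides, and the data are hypothetical.

```python
import math


def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def flag_outliers(xs, ys, cutoff=2.0):
    """Indices of points whose residual exceeds `cutoff` residual standard errors.

    The cutoff of 2 is an assumed rule of thumb, not a rule from the slides.
    """
    a, b = least_squares(xs, ys)
    res = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in res) / (len(xs) - 2))  # residual standard error
    return [i for i, e in enumerate(res) if abs(e) > cutoff * s]


# Hypothetical data: y = 2x except at x = 5, which sits far above the pattern.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 6, 8, 20, 12, 14, 16]
print(flag_outliers(xs, ys))  # [4], the point (5, 20)
```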

12
Influential Observations
• An observation is influential if removing it
markedly changes the calculated regression line.
• Example on the web.
• Points that are separated from the rest of the data in the x direction of a scatterplot tend to be influential. (Such points may not be outliers as defined on the previous slide.)
• In the previous example, which point may be
influential?
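
One way to find influential points is to refit the line with each point left out and see how much the slope moves. In the hypothetical Python example below, the point at $x = 10$ is separated from the others in the x direction. Removing it changes the slope from 0.4 to 1, far more than removing any other point, yet its residual (about -0.4) is smaller than that of the point at $x = 4$ (about 1.0), so it would not stand out as an outlier.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical data: the last point is far from the others in the x direction.
xs = [1, 2, 3, 4, 10]
ys = [1, 2, 3, 4, 5]

a_all, b_all = least_squares(xs, ys)
res_all = [y - (a_all + b_all * x) for x, y in zip(xs, ys)]

# Refit with each point left out and record how far the slope moves.
slope_change = []
for i in range(len(xs)):
    _, b_i = least_squares(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    slope_change.append(abs(b_i - b_all))

most_influential = max(range(len(xs)), key=lambda i: slope_change[i])
print(round(b_all, 6))                 # 0.4
print(most_influential)                # 4, the point (10, 5)
print([round(e, 2) for e in res_all])  # [-0.8, -0.2, 0.4, 1.0, -0.4]
```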

13
Influential Observations, Continued
• The equations with and without this point are PRICE = 15589.6 - 0.0697811 MILEAGE and PRICE = 15990.5 - 0.0786652 MILEAGE, respectively.
• How far apart are these?
• We would predict the price of a car with 30,000 miles to be 13496 (with the point) vs. 13631 (without it).
• We would predict the price of a car with 70,000 miles to be 10705 vs. 10484.
• Sketch both lines.
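
The predictions on this slide can be reproduced directly from the two equations. The Python sketch below (function names are illustrative) rounds to whole dollars and also finds where the two lines cross, roughly 45,000 miles; the farther a car's mileage is from that crossing, the more the two lines disagree.

```python
def price_with(mileage: float) -> float:
    """Line fitted with the influential point included."""
    return 15589.6 - 0.0697811 * mileage


def price_without(mileage: float) -> float:
    """Line fitted with the influential point removed."""
    return 15990.5 - 0.0786652 * mileage


for miles in (30_000, 70_000):
    print(miles, round(price_with(miles)), round(price_without(miles)))
# 30000 13496 13631
# 70000 10705 10484

# Mileage where the two lines give the same prediction.
crossing = (15990.5 - 15589.6) / (0.0786652 - 0.0697811)
print(round(crossing))  # roughly 45,000
```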

14
Summary
• Get the equation of the regression line.
• If the data are given, use a calculator or Minitab.
• If summary statistics are given, use the formulas.
• Interpret the slope and y-intercept in the context of the problem.
• $r^2$ gives the proportion of variation in the response variable that is explained by the explanatory variable via the line.
• Residual plots are used to check for outliers and for any pattern that the line does not pick up.
• Outliers vs. influential data points.

15
Year as an Explanatory Variable
16
Residual Plot
17
18
Example. SAT Math vs SAT Verbal
• States' average SAT math scores against states' average SAT verbal scores.
• $\widehat{\text{verbal}} = 34.2 + 0.940\,\text{math}$
• $r = 0.970$ ($r^2 = 0.941$).
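
A short Python sketch of using this line. The math average of 500 is a hypothetical input, not a value from the slides. The slope says that each additional point of state-average SAT math is associated with an increase of 0.940 point in predicted state-average SAT verbal, and squaring $r$ recovers the reported $r^2$.

```python
def predicted_verbal(math_avg: float) -> float:
    """State-average verbal score predicted from state-average math score."""
    return 34.2 + 0.940 * math_avg


r = 0.970
print(round(r ** 2, 3))                 # 0.941, matching the slide
print(round(predicted_verbal(500), 1))  # 504.2 for a hypothetical math average of 500
```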

19
Example. SAT Math vs SAT Verbal
• Describe what the slope and y-intercept mean in
the vocabulary of the problem.
• How well does the line fit the data?
• How successful is the regression in explaining the response?
• How well would this line do at predicting individual verbal scores?

20
Words of Caution
• Extrapolation
• Using Averaged Data
• Lurking Variables
• Association Does Not Imply Causation
• Example. Among elementary-school-aged children there is a strong correlation between shoe size and reading aptitude. What explains this correlation?
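
A lurking variable can produce a strong correlation between two variables that do not affect each other. In the hypothetical Python simulation below, age is assumed to drive both shoe size and reading score, and the reading score is computed without using shoe size at all; the two still come out strongly correlated. The numbers are invented for illustration only.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical children aged 6 to 11. Age drives both measurements;
# the small fixed "wiggles" stand in for child-to-child variation.
ages = [6, 7, 8, 9, 10, 11]
shoe_size = [0.9 * age + w for age, w in zip(ages, [0.2, -0.1, 0.1, -0.2, 0.0, 0.1])]
reading = [10 * age + w for age, w in zip(ages, [2, -3, 1, 0, -2, 3])]  # shoe size not used

print(round(correlation(shoe_size, reading), 3))  # strongly positive
```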