Sections 2.3 and 2.4

- Regression Lines

Review of Lines

- The equation y = a + bx describes a line.
- a is the y-intercept, or the value of y that goes with x = 0.
- b is the slope of the line.
- a + b is the value of y that goes with x = 1.
- a + 2b is the value of y that goes with x = 2.
- When the value of x goes up by one, the value of y goes up by b.
- Plot a line by first plotting two points.
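The relationships in this review can be checked numerically. The following is a minimal sketch; the values a = 2 and b = 3 are hypothetical and chosen only for illustration.

```python
def line(a, b, x):
    """Return the value of y on the line y = a + b*x."""
    return a + b * x

# Hypothetical intercept and slope, not taken from the slides.
a, b = 2.0, 3.0

y0 = line(a, b, 0)  # equals a, the y-intercept
y1 = line(a, b, 1)  # equals a + b
y2 = line(a, b, 2)  # equals a + 2b

print(y0, y1, y2)
print(y2 - y1 == b)  # a one-unit step in x changes y by b
```

Two of these points, for example (0, y0) and (1, y1), are enough to plot the line.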

Used Honda Civics

Can mileage be used to predict price?

- Sketch line.
- Estimate slope.
- Estimate y-intercept.
- Give equation of line.
- Estimate price of car with 100,000 miles.

Regression Lines

- Can we agree on the best line?
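- The least squares regression line is the line that best fits.
- Goodness of fit is determined by calculating the sum of all squared vertical distances from the data points to the line. The smaller this sum, the better the fit.
- The regression line is given by y = a + bx, where b = r(s_y / s_x) and a = ȳ - b x̄. Here r is the correlation, s_x and s_y are the standard deviations, and x̄ and ȳ are the means of x and y.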
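As a rough illustration of the least squares idea, the sketch below computes the slope and intercept from summary statistics using the standard formulas b = r(s_y / s_x) and a = ȳ - b x̄, then confirms that nudging the line makes the sum of squared vertical distances larger. The data set and the function names `least_squares` and `sse` are hypothetical, not from the slides.

```python
from statistics import mean, stdev

def least_squares(xs, ys):
    """Intercept a, slope b, and correlation r from summary statistics."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    # Sample correlation r.
    r = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b = r * sy / sx
    a = ybar - b * xbar
    return a, b, r

def sse(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, used only to exercise the formulas.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b, r = least_squares(xs, ys)
best = sse(xs, ys, a, b)
worse_slope = sse(xs, ys, a, b + 0.1)      # same intercept, different slope
worse_intercept = sse(xs, ys, a + 0.1, b)  # same slope, different intercept

print(a, b, r)
print(best, worse_slope, worse_intercept)
```

Any other line tried in place of the fitted one should give a larger sum than `best`.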

Back to Civics

- Use the formulas for finding the least squares regression line. The descriptive statistics for our data are as follows.
- Describe what the slope and y-intercept mean in the vocabulary of the problem.
- Does the y-intercept make sense in the context of the data?

Extrapolation

- Extrapolation: using the regression line for prediction far outside the range of values of the explanatory variable used to obtain the line. Such predictions are often not accurate.
- Usually in regression, x = 0 is outside the range of the data, so the y-intercept will not make sense.
- Extrapolation would also occur if we tried to predict the price of a used Civic with 300,000 miles.
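To see why the 300,000-mile prediction is suspect, the sketch below plugs that mileage into the fitted equation PRICE = 15589.6 - 0.0697811 MILEAGE quoted on the Influential Observations slide. Treating that equation as the fit to the full Civic data is an assumption, and the function name `predicted_price` is hypothetical.

```python
# Coefficients quoted on the "Influential Observations, Continued" slide.
# Assumption: this is the line fitted to the full data set.
INTERCEPT = 15589.6
SLOPE = -0.0697811

def predicted_price(mileage):
    """Predicted price in dollars from the fitted line."""
    return INTERCEPT + SLOPE * mileage

price_300k = predicted_price(300_000)
print(round(price_300k, 2))
```

The result is a negative price, which no real car can have, so the line cannot be trusted this far beyond the mileages in the data.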

How successful is the regression in explaining the response?

- r² is the proportion of the variation in the response variable that is explained by the regression line.

Observed Prices (Leaf Unit = 100)

  1     7  3
  1     8
  2     9  9
  2    10
 (2)   11  99
  2    12  9
  1    13  9

Mean = 11379, S = 2347

Predicted Prices (Leaf Unit = 100)

  1     7  2
  1     8
  1     9
  2    10  1
  2    11
 (2)   12  25
  2    13  00

Mean = 11379, S = 2295
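For a least squares line with an intercept, the variance of the predicted values divided by the variance of the observed values equals r². The sketch below applies that fact to the two standard deviations reported with the stem-and-leaf displays (2347 for observed prices, 2295 for predicted prices). Because those values are rounded, the result is only approximate.

```python
# Standard deviations reported with the stem-and-leaf displays.
s_observed = 2347.0   # observed prices
s_predicted = 2295.0  # predicted prices

# Ratio of variances = proportion of variation explained by the line.
r_squared = (s_predicted / s_observed) ** 2
print(round(r_squared, 3))
```

This gives roughly 0.956, meaning that about 96% of the variation in price is explained by the regression on mileage.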

Residuals

- The residual is the vertical distance from the line (positive if above the line, negative if below the line).
- Find the residual for the car with 43903 miles.
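The residual calculation can be sketched as follows. The coefficients come from the equation PRICE = 15589.6 - 0.0697811 MILEAGE on the Influential Observations slide, assumed here to be the full-data fit. The observed price for this car is not given in the text, so `observed_price` below is a hypothetical placeholder and should be replaced with the actual data value.

```python
INTERCEPT = 15589.6   # from the slide equation (assumed full-data fit)
SLOPE = -0.0697811

def predicted_price(mileage):
    """Predicted price in dollars from the fitted line."""
    return INTERCEPT + SLOPE * mileage

def residual(observed, mileage):
    """Observed price minus predicted price."""
    return observed - predicted_price(mileage)

predicted = predicted_price(43903)

# Hypothetical placeholder: the car's real observed price is not in the text.
observed_price = 13000.0
res = residual(observed_price, 43903)

print(round(predicted, 2), round(res, 2))
```

A positive residual means the car sits above the line (priced higher than predicted), and a negative one means it sits below.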

How well does line fit data?

- Look at a residual plot.
- residual = observed y - predicted y
- Plot the explanatory variable on the x-axis and the residuals on the y-axis.
- If the regression captured the overall pattern, then there should be no pattern remaining in the residual plot.
- How well does the least squares regression line fit the relationship between mileage and price?
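As an illustration of a pattern left behind in the residuals, the sketch below fits a straight line to deliberately curved, hypothetical data (y = x²) and lists the (x, residual) pairs that would be plotted. The data and the function name `fit_line` are assumptions made for this example only.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for y = a + b*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

# Hypothetical curved data: a straight line is the wrong model here.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# These pairs are the points of the residual plot.
for x, res in zip(xs, residuals):
    print(x, res)
```

The residuals are positive at both ends and negative in the middle. That U shape is the kind of leftover pattern that signals the line missed part of the relationship.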

Outliers

- An outlier is a data point that falls outside the overall pattern. Outliers are most obvious from looking at the residual plot.
- Are there any outliers in the previous example?

Influential Observations

- An observation is influential if removing it markedly changes the calculated regression line.
- Example on Web
- Points that are separated from the data in the x direction of a scatterplot tend to be influential. (These points may not be outliers as defined on the previous slide.)
- In the previous example, which point may be influential?

Influential Observations, Continued

- The equations with and without this point are
  PRICE = 15589.6 - 0.0697811 MILEAGE
  PRICE = 15990.5 - 0.0786652 MILEAGE
- How far apart are these?
- We would predict the price of a car with 30,000 miles to be 13496 vs 13631.
- We would predict the price of a car with 70,000 miles to be 10705 vs 10484.
- Sketch line.
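The comparison between the two fitted lines can be reproduced with a short sketch. It assumes the first equation on the slide is the fit with the influential point and the second is the fit without it; the function names are hypothetical.

```python
def price_with_point(mileage):
    """Fit assumed to include the influential point."""
    return 15589.6 - 0.0697811 * mileage

def price_without_point(mileage):
    """Fit assumed to exclude the influential point."""
    return 15990.5 - 0.0786652 * mileage

predictions = {}
for mileage in (30_000, 70_000):
    with_pt = price_with_point(mileage)
    without_pt = price_without_point(mileage)
    predictions[mileage] = (round(with_pt), round(without_pt))
    print(mileage, round(with_pt), round(without_pt), round(without_pt - with_pt))
```

The rounded predictions match the slide's figures. The gap between the two lines is a couple of hundred dollars or less at these mileages and changes sign between 30,000 and 70,000 miles.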

Summary

- Get equation of regression line.
- If data is given, use calculator or Minitab.
- If summary statistics are given, use formula.
- Interpret slope and y-intercept in context of problem.
- r² gives the proportion of variation in the response variable that is explained by the explanatory variable via the line.
- Residual plots are used to check for outliers and pattern that is not picked up by the line.
- Outliers vs influential data points.

Year as an Explanatory Variable

Residual Plot

Fitting Quadratic

Example. SAT Math vs SAT Verbal

- States' average SAT math scores against states' average SAT verbal scores.
- verbal = 34.2 + 0.940 math
- r = 0.970 (r² = 0.941).
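A small sketch of how the SAT line and its r² might be used follows. It assumes the operator lost from the slide's equation is a plus sign, so that verbal = 34.2 + 0.940 math. The state average of 500 on math is a hypothetical input, not a value from the slides.

```python
INTERCEPT = 34.2
SLOPE = 0.940   # assumed sign: verbal = 34.2 + 0.940 * math
R = 0.970

def predicted_verbal(math_average):
    """Predicted state average verbal score from state average math score."""
    return INTERCEPT + SLOPE * math_average

r_squared = R ** 2
example = predicted_verbal(500)  # hypothetical state average math score

print(round(r_squared, 3))
print(round(example, 1))
```

Squaring r = 0.970 reproduces the slide's r² of 0.941 after rounding. Note that the line describes state averages, which vary much less than individual students' scores.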

Example. SAT Math vs SAT Verbal

- Describe what the slope and y-intercept mean in the vocabulary of the problem.
- How well does the line fit the data?
- How successful is the regression in explaining the response?
- How well would this line do to predict individual verbal scores?

Words of Caution

- Extrapolation
- Using Averaged Data
- Lurking Variables
- Association Does Not Imply Causation
- Example. Among elementary-aged children there is a strong correlation between shoe size and reading aptitude. What explains this correlation?