Chapter 8 Linear regression - PowerPoint PPT Presentation

1 / 28

About This Presentation

Title:

Chapter 8 Linear regression

Description:

Chapter 8 Linear regression Math2200 Scatterplot One double Whopper contains 53 grams of protein, 65 grams of fat and 1020 calories. So two double would contain ... – PowerPoint PPT presentation

Number of Views:202

Avg rating:3.0/5.0

Slides: 29

Provided by: nanl3

Category:

more less

Transcript and Presenter's Notes

Title: Chapter 8 Linear regression

1
Chapter 8 Linear regression

Math2200

2
Scatterplot

One double Whopper contains 53 grams of protein,
65 grams of fat and 1020 calories. So two double
would contain enough calories for a day.
fat versus protein for 30 items on the Burger
King menu

3
The linear model

Parameters
Intercept
Slope
Model is NOT perfect!
Predicted value
Residual Observed predicted
Overestimate when residuallt0
Underestimate when residualgt0

4
The linear model (cont.)

We write our model as
This model says that our predictions from our
model follow a straight line.
The data values will scatter closely around it,
if the model is a good fit .

5
How did I get the line?

Best fit line
Minimize residuals overall
The line of best fit is the line for which the
sum of the squared residuals is smallest. The
least squares line minimizes

6
How do I get the line? (cont.)

The regression line
Slope in units of y per unit of x
Intercept in units of y

7
Interpreting the regression line

Slope
Increasing 1 unit in x ? increasing units in
y
In particular, moving one standard deviation away
from the mean in x moves us r standard deviations
away from the mean in y.
Intercept predicted value of y when x0
Predicted value at x

8
Fat Versus Protein An Example

The regression line for the Burger King data fits
the data well
The equation is
The predicted fat content for a BK Broiler
chicken sandwich is
6.8 0.97(30) 35.9 grams of fat.

9
How Big Can Predicted Values Get?

r cannot be bigger than 1 (in absolute value), so
each predicted y tends to be closer to its mean
(in standard deviations) than its corresponding x
was.
This property of the linear model is called
regression to the mean the line is called the
regression line.

10
Sir Francis Galton
11
Residuals Revisited

The residuals are the part of the data that
hasnt been modeled.
Data Model Residual
or (equivalently)
Residual Data Model
Or, in symbols,

12
Residuals Revisited (cont.)

When a regression model is appropriate, nothing
interesting should be left behind.
Residual plot should have no pattern, no bend, no
outlier.
Residual against x
Residual against predicted value
The spread of the residual plot should be the
same throughout.

13
Residuals Revisited (cont.)

If the residuals show no interesting pattern in
the residual plot, we use standard deviation of
the residuals to measure how much the points
spread around the regression line. The standard
deviation of residuals is given by
We need the Equal Variance Assumption for the
standard deviation of residuals. The associated
condition to check is the Does the Plot Thicken?
Condition.

14
Residual plot of regression stopping distance on
car speed
15
How well does the linear model fit?

The variation in the residuals is the key to
assessing how well the model fits.
Total fat (y) sd 16.4g
Residual sd 9.2g
less variation
How much of the variation
is accounted for by the model?
How much is left in the residuals?

16
Variation

The squared correlation gives the fraction
of the datas variation explained by the model.
We can view as the percentage of
variability in y that is NOT explained by the
regression line, or the variability that has been
left in the residuals
For the BK model, r2 0.832 0.69, so 31 of
the variability in total fat has been left in the
residuals.

17
R2The Variation Accounted For

0 no variance explained
1 all variance explained by the model
How big should R2 be to conclude the model fit
the data well?
R2 is always between 0 and 100. What makes a
good R2 value depends on the kind of data you
are analyzing and on what you want to do with it.

18
Check the following conditions

The two variables are both quantitative
The relationship is linear (straight enough)
Scatterplot
Residual plot
No outliers Are there very large residuals?
Scatterplot
Residual plot
Equal variance all residuals should share the
same spread
Residual plot Does the Plot Thicken?

19
Summary

Whether the linear model is appropriate?
Residual plot
How well does the model fit?
R2

20
Reality Check Is the Regression Reasonable?

Statistics dont come out of nowhere. They are
based on data.
The results of a statistical analysis should
reinforce your common sense, not fly in its face.
If the results are surprising, then either youve
learned something new about the world or your
analysis is wrong.
When you perform a regression, think about the
coefficients and ask yourself whether they make
sense.

21
What Can Go Wrong?

Dont fit a straight line to a nonlinear
relationship.
Beware extraordinary points (y-values that stand
off from the linear pattern or extreme x-values).
Dont extrapolate beyond the datathe linear
model may no longer hold outside of the range of
the data.
Dont infer that x causes y just because there is
a good linear model for their relationshipassocia
tion is not causation.
Dont choose a model based on R2 alone.

22
What have we learned?

When the relationship between two quantitative
variables is fairly straight, a linear model can
help summarize that relationship.
The regression line doesnt pass through all the
points, but it is the best compromise in the
sense that it has the smallest sum of squared
residuals.

23
What have we learned? (cont.)

The correlation tells us several things about the
regression
The slope of the line is based on the
correlation, adjusted for the units of x and y.
For each SD in x that we are away from the x
mean, we expect to be r SDs in y away from the y
mean.
Since r is always between -1 and 1, each
predicted y is fewer SDs away from its mean than
the corresponding x was (regression to the mean).
R2 gives us the fraction of the response
accounted for by the regression model.

24
What have we learned? (cont.)

The residuals also reveal how well the model
works.
If a plot of the residuals against predicted
values shows a pattern, we should re-examine the
data to see why.
The standard deviation of the residuals
quantifies the amount of scatter around the line.

25
What have we learned? (cont.)

The linear model makes no sense unless the Linear
Relationship Assumption is satisfied.
Also, we need to check the Straight Enough
Condition and Outlier Condition with a
scatterplot.
For the standard deviation of the residuals, we
must make the Equal Variance Assumption. We
check it by looking at both the original
scatterplot and the residual plot for Does the
Plot Thicken? Condition.

26
TI-83

Enter data as lists first
Press STAT
Then move the cursor to CALC
Press 4 (LinReg(axb)) or Press 8 (LinReg(abx))
Then put the list names for which you want to do
regression, e.g., L1, L2
Press ENTER
To see
Set DIAGNOSTICS ON
2ND 0 (CATALOG), move the cursor down to
DiagnosticsOn
Press ENTER (You will see DONE)
Now repeat the above operations for linear
regression, you will see the correlation
coefficient and