Chapter 4 Describing Relationships Between Variables
Transcript and Presenter's Notes

Title: Chapter 4 Describing Relationships Between Variables


1
Chapter 4 Describing Relationships Between
Variables
  • 4.1 Fitting least squares lines
  • 4.2 Fitting Curves and Surfaces

2
4.1 Fitting Least Squares Lines - Abstract
  • Least squares line
  • How to find the least squares line
  • Interpretation
  • Prediction
  • Extrapolation
  • Linear Fit
  • Correlation: strength and direction
  • Coefficient of determination
  • Residual plot: check for random scatter
  • Normal probability plot of residuals: check for a
    straight line
  • Regression Cautions

3
4.1 Fitting a Least Squares Line
  • Describe a relationship between two variables x
    and y.
  • We will find the best linear fit of y versus x:
    $y \approx \beta_0 + \beta_1 x$.
  • $\beta_0$ and $\beta_1$ are unknown parameters.
  • Goal: find estimates $b_0$ and $b_1$ for the
    parameters $\beta_0$ and $\beta_1$.

4
Example 4.1
  • Eight batches of plastic are made, and from each
    batch one test item is molded and its hardness y
    is measured at time x. The following are the 8
    measurements:

5
Example 4.1
  • Scatterplot: Is a linear relationship
    appropriate?
  • How do we find an equation for this line?

By looking at this scatterplot, we see that there
appears to be a strong, positive, linear
relationship between x and y.
6
Least Squares Principle
  • We will fit a line given by $\hat{y} = b_0 + b_1 x$, where
    b0 and b1 are estimates for the parameters
    $\beta_0$ and $\beta_1$.
  • Note that a straight line will not pass perfectly
    through every one of our data points.
  • Thus, if we plug a data value xi into the
    equation $\hat{y} = b_0 + b_1 x$, the value
    $\hat{y}_i$ we get will not be exactly our data
    value yi.

[Figure: a data point (xi, yi) and the fitted line, with fitted value $\hat{y}_i = b_0 + b_1 x_i$]
7
Least Squares Principle
  • We need to minimize the squared distances between the
    actual data values, yi, and the fitted values given by our
    equation, $\hat{y}_i = b_0 + b_1 x_i$.
  • Thus, we wish to minimize
    $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.

8
Least Squares Principle
  • How do we find estimates for $\beta_0$ and $\beta_1$?
  • Use calculus.
  • Plugging $\hat{y}_i = b_0 + b_1 x_i$ into the sum of squares
    yields
    $S(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$.
  • How to minimize:
  • Take partial derivatives with respect to b0 and b1.
  • Set the derivatives equal to zero.

9
Normal Equations
  • Taking partial derivatives with respect to b0 and
    b1 and setting them equal to 0 yields what are
    known as the Normal Equations:
    $\sum_{i=1}^{n} y_i = n b_0 + b_1 \sum_{i=1}^{n} x_i$
    $\sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2$

10
Least Squares Estimates
  • Solving these equations (details omitted) for b0
    and b1 yields the following formulas, implemented
    in the sketch below:
    $b_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
    $b_0 = \bar{y} - b_1 \bar{x}$
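To make the formulas concrete, here is a minimal Python sketch; the data values are hypothetical, for illustration only, not the measurements from Example 4.1:

```python
import numpy as np

def least_squares_line(x, y):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    x_bar, y_bar = x.mean(), y.mean()
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b0, b1 = least_squares_line(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```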

11
Example 4.2
  • Continued from example 4.1.
  • Find the least squares estimates given the data.
  • The formulas yield

b0 = 153.060
b1 = 2.433
12
Interpretation
  • b1: for every 1 unit increase in the x
    variable, the y variable increases, on average,
    by the value of b1.
  • Only true for a linear model.
  • b0: on average, the value of y when x is equal to
    0.
  • Not always meaningful.
  • Example: GPA vs. ACT score, b0 = -5.7.

13
Example 4.3
  • Continued from example 4.1.
  • Eight batches of plastic are made, and from each
    batch one test item is molded and its hardness y
    is measured at time x.
  • b1 = 2.433 means that for every 1 unit increase in
    time, the hardness increases, on average, by
    2.433.
  • b0 = 153.060 means that when no time has passed, the
    hardness is 153.060.

14
Prediction
  • We can predict y with the least squares line.
  • Simply insert a value of x into the least squares
    equation to obtain a predicted value of y.
  • What is the predicted hardness for time x = 24?
    (worked out below)
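Working this out with the estimates from Example 4.2:

$\hat{y} = 153.060 + 2.433(24) = 153.060 + 58.392 = 211.452$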

15
Extrapolation
  • Extrapolation is using a value of x beyond the
    range of our actual x observations to
    find a predicted $\hat{y}$.
  • Predicted values should not be used when
    extrapolating beyond the data set.
  • Why? Because we do not know the behavior of y beyond
    the range of our x values.
  • Example: What is the predicted hardness for time
    x = 110?

16
Linear Fit
  • We have a fitted line, but does it fit well?
  • To check the fit, we use:
  • Correlation
  • Coefficient of determination
  • Residual plots

17
Correlation
  • Correlation quantifies the strength of the linear
    relationship between y and x.
  • r will always lie between -1 and 1.
  • r close to 0 indicates a weak linear
    relationship.
  • r close to either -1 or 1 indicates a strong
    linear relationship.
  • The sign of r indicates whether the relationship is
    positive or negative.
  • So a positive value of r tells us that y is
    increasing linearly in x, and a negative value of
    r tells us that y is decreasing linearly in x
    (see the sketch below).
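A minimal sketch of computing r in Python, reusing the hypothetical data from the earlier sketch:

```python
import numpy as np

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Sample correlation r is the off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.3f}")
```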

18
Coefficient of Determination
  • Coefficient of determination (R2): the fraction of
    raw variation in y accounted for by the fitted
    equation (computed in the sketch below).
  • Unlike r, it quantifies the fit of other types
    of relationships (not just linear).
  • The value of R2 will always lie between 0 and 1:
  • Values closer to 0 indicate a bad fit of our
    model.
  • Values closer to 1 indicate a good fit of our
    model.
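A minimal sketch of computing R2 as 1 minus the ratio of residual variation to total variation, again with the hypothetical data:

```python
import numpy as np

# Hypothetical data and least squares fit for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"R^2 = {r_squared:.3f}")
```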

19
Example 4.6
  • Continued from example 4.1.
  • From r we can tell that there is a strong,
    positive, linear relationship (the linear model
    fits well).
  • From R2 we can tell that our model fits well.
  • R2 = r2 only with a linear model.

20
Residuals
  • We hope that the fitted values, $\hat{y}_i$, will look
    like our data, yi,
  • except for small fluctuations explainable only as
    random variation.
  • To assess this, we look at what are called
    residuals:
    $e_i = y_i - \hat{y}_i$

21
Residuals
  • When we are fitting a least squares line, we are
    minimizing the sum of squared residuals.
  • These residuals should be patternless
    (randomly scattered),
  • as indicated by a cloud of points scattered above
    and below 0 in plots of
  • the residuals against x
  • the residuals against the fitted values $\hat{y}$
    (a plotting sketch follows below)
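A minimal Python sketch of computing residuals and plotting them against x, with the same hypothetical data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data and least squares fit for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Residuals: observed minus fitted
residuals = y - (b0 + b1 * x)

# A patternless cloud around the zero line suggests a good fit
plt.scatter(x, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("x")
plt.ylabel("residual")
plt.show()
```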

22
Residuals
  • To use residuals to check the fit, we need to
    check their pattern.
  • We now look at some different residual plots.
  • First, we look at a plot that shows what we want to
    see from residual plots, namely patternless scatter.
  • Then we explore some problems/patterns that may be
    identified through residual plots.

23
Residual Plot 1
[Figure: actual data and residual plot]
The residuals are randomly scattered around
0. Thus, the residual plot shows a good fit (a linear
model is appropriate).
24
Residual Plot 2
[Figure: actual data and residual plot]
The residual plot shows a distinct curved
pattern. Thus, a linear model is not appropriate.
The data are probably better described by a
quadratic model.
25
Residual Plot 3
[Figure: actual data and residual plot]
The residual plot shows a cone-shaped
pattern: there is more spread for larger fitted
values. The researcher may want to investigate
the data collection process.
26
Residual Plot 4
  • Residuals vs. the time order of the observations:
  • As time increases, the residuals increase.
  • This pattern suggests that some variable changing
    in time is acting on y and has not been accounted
    for in fitting the model.
  • After seeing a residual plot with this pattern,
    the researcher may want to inspect the process
    from which the data were obtained.
  • Example: instrument drift could produce a
    pattern like this.

[Figure: ordered residual plot (residuals vs. time order)]
27
Normal Prob. Plot for Residuals
  • If we really have random variation, we hope:
  • Residuals are centered at zero.
  • Residuals are scattered evenly above and below zero.
  • Most residuals are close to zero, with fewer
    residuals appearing as we move further from
    zero.
  • A histogram of the residuals should look like the
    following: a bell shape centered at zero.

28
Normal Prob. Plot for Residuals (continued)
  • A normal probability plot can be used to check
    whether or not a set of residuals comes from a
    bell-shaped distribution.
  • An S-shape in a normal probability plot means
    that we have skewed residuals,
  • whereas a straight line indicates a bell shape
    (see the sketch below).
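A minimal sketch of a normal probability plot using scipy; the residuals here are synthetic, for illustration only:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Synthetic residuals for illustration only
rng = np.random.default_rng(0)
residuals = rng.normal(loc=0.0, scale=1.0, size=50)

# Points close to a straight line suggest bell-shaped (normal) residuals
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()
```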

29
Example 4.7
  • Continued from example 4.1.

30
Example 4.7
[Figure: residual plot and normal probability plot]
  • The residual plot shows random scatter around 0.
  • The normal probability plot follows a straight line.
  • Conclusion: the linear model fits well.

31
Linear Regression Cautions
  • r measures only linear relationships. There
    could be a very good nonlinear model but a small
    r.
  • Correlation does not imply causation.
  • An example from Wikipedia: Since the 1950s, both
    the atmospheric CO2 level and crime levels have
    increased sharply. Thus, we would expect a large
    correlation between crime and CO2 levels.
    However, we would not assume that atmospheric CO2
    causes crime.
  • Both R2 and r can be drastically affected by a
    few unusual data points.
  • Example on page 137.

32
Summary of 4.1
  • Least squares line
  • How to find it
  • Interpretation
  • Prediction
  • Extrapolation
  • Linear Fit
  • Correlation: strength and direction
  • Coefficient of determination
  • Residual plot: check for random scatter
  • Normal probability plot of residuals: check for a
    straight line
  • Regression Cautions

33
4.2 Fitting Curves and Surfaces - Abstract
  • Curve fitting
  • Surface fitting
  • Interpretation
  • More on model fitting

34
4.2 Fitting Curves and Surfaces
  • Use least squares.
  • Computation and interpretation become more
    complicated.
  • Curve fitting:
  • A natural generalization of the linear equation
    is the polynomial equation
    $y \approx \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k$
  • Computation of the estimates $b_0, b_1, \ldots, b_k$
    is done by computer (see the sketch below).
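A minimal curve-fitting sketch using numpy's polynomial least squares; the data are hypothetical:

```python
import numpy as np

# Hypothetical data with visible curvature, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 3.1, 5.8, 10.2, 16.5, 24.9])

# Fit a quadratic (degree-2 polynomial) by least squares;
# polyfit returns coefficients from highest degree to lowest
b2, b1, b0 = np.polyfit(x, y, deg=2)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")

# Predicted value at x = 3.5
y_hat = b0 + b1 * 3.5 + b2 * 3.5**2
print(f"y-hat(3.5) = {y_hat:.3f}")
```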

35
Surface Fitting
  • In surface fitting we have more than one predictor
    variable (the x's) with our response (y):
    $y \approx \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$
  • Again, computation of the estimates $b_0, b_1, \ldots, b_k$
    is done by computer (see the sketch below).
  • Example: we want to predict brick strength (y)
    given a level of temperature (x1) and humidity
    (x2).
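A minimal surface-fitting sketch with two predictors, solved as an ordinary least squares problem in numpy; all data values are hypothetical, not from the brick-strength example:

```python
import numpy as np

# Hypothetical temperature (x1), humidity (x2), and strength (y)
x1 = np.array([100.0, 120.0, 140.0, 160.0, 180.0, 200.0])
x2 = np.array([30.0, 45.0, 40.0, 55.0, 50.0, 65.0])
y = np.array([12.1, 13.8, 15.2, 16.9, 18.3, 19.7])

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares estimates b0, b1, b2
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("b0, b1, b2 =", b)
```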

36
Interpretation
  • Given $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$, the
    interpretation is as follows:
  • b0 represents, on average, the value of y when x1 = 0
    and x2 = 0.
  • b1 represents, on average, the increase/decrease in y
    for every one unit increase in x1, holding
    x2 constant.
  • b2 represents, on average, the increase/decrease in y
    for every one unit increase in x2, holding
    x1 constant.
  • Note: these statements are general.
  • You will need to state them within the context of
    the problem.

37
Residual Plots
  • Computed the same way as before:
  • Normal probability plot of the residuals
  • Residual plot against each x
  • Residual plot against the fitted values
  • Use a computer due to the computational intensity.

38
More on Model Fit
  • It is often wise to check multiple forms of model
    fit.
  • Each assessment may only be painting half the
    picture.
  • The most common combination:
  • R2
  • Residual plot

39
Example 4.8
  • Trying to predict stopping distance (ft) given
    the current speed (mph).

[Figure: scatterplot of stopping distance vs. speed]
40
Example 4.8
[Figure: residual plot]
  • Although the data seemed linear, and the R2 was
    extremely high, the residual plot shows a
    distinct curved pattern.
  • Thus, the fit could be improved upon.
  • Use a quadratic model instead of a linear one.

41
Example 4.9
  • Predicting win percentage based on rebounds/game
    for NBA teams.
  • Residual plot: there is random scatter around 0.
  • A linear model seems to fit well.

42
Example 4.9
  • Although the residual plot indicates a good fit,
    R2 = 0.2014, which is very low.
  • From the scatterplot, we notice that the data are
    somewhat linear, but a very weak relationship
    exists (thus the low R2).

43
Summary of 4.2
  • Curve fitting
  • Surface fitting
  • Interpretation
  • More on model fitting