Week 5 objectives

Slides: 37
Provided by: University520

1
Week 5 objectives
  • The simple linear regression model
  • Standard errors
  • The meaning of R-squared
  • Accuracy of the estimates of slope and
    intercept
  • Predictions and accuracy of predictions
  • Diagnostic plots
  • Multiple regression models
  • Prediction accuracy in multiple regression

2
1. Review questions about fitting straight lines
  • Suppose that a scatterplot shows a reasonably
    strong, linear association between x and y
    variables. It is then natural to represent that
    linear association by a straight line.
  • The fitted line is called a regression line.

3
How to fit a straight line?
Least squares (LS) minimizes the sum of squared
residuals, which are the vertical (i.e. y-direction)
distances from the line to the points.
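A minimal sketch of the least squares calculation, assuming the usual closed-form solution for a single x variable. The data values and the function names fit_line and sum_squared_residuals are made up for illustration and are not from the lecture.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared vertical residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data (not from the lecture).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
# Shifting the fitted line should never reduce the sum of squares.
worse = sum_squared_residuals(xs, ys, a + 0.1, b)
print(a, b, best, worse)
```

Any other line, such as the shifted one above, gives a larger sum of squared residuals than the least squares line.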
4
Example of a fitted line: price vs. age of houses
in Adelaide
How do we interpret slope and intercept?
5
What is the simple linear regression model?
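In fitting a line by least squares, in effect an
underlying model is proposed:
$y = \beta_0 + \beta_1 x + \varepsilon$
Here, $\beta_0$ is the true intercept and $\beta_1$ is the true
slope; $\varepsilon$ is a random error term.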
6
What is the relationship between the true line
and the fitted line?
The true slope and intercept are $\beta_1$ and $\beta_0$. These
parameters are estimated by $\hat{\beta}_1$ and $\hat{\beta}_0$. How accurate are
these estimates, and how accurate is the
prediction $\hat{y}$ given an $x$ value?
7
2. What is standard error?
  • The standard error of any estimate or predicted
    value is an estimate of its standard deviation.
  • For the present work, we use standard error
    simply as a general indication of accuracy
  • Minitab can provide standard errors
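For readers without Minitab, the standard errors can be sketched directly. This assumes the usual textbook formulas for simple linear regression; the data and the function name regression_standard_errors are hypothetical and not part of the lecture.

```python
import math


def regression_standard_errors(xs, ys):
    """Least-squares fit plus textbook standard errors for one x variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # s estimates the standard deviation of the errors (n - 2 degrees of freedom).
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    return {
        "intercept": intercept,
        "slope": slope,
        "s": s,
        "se_slope": s / math.sqrt(sxx),
        "se_intercept": s * math.sqrt(1 / n + x_bar ** 2 / sxx),
    }


# Made-up illustrative data (not from the lecture).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
result = regression_standard_errors(xs, ys)
print(result)
```

The intercept's standard error grows with the distance between x = 0 and the mean of the x values, which is why an intercept far from the data tends to be estimated poorly.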

8
Lecture Example 1
Selling price of houses (y) and LGA (local
government area) valuation (x). Units are thousands of dollars.
9
3. What is R-squared and what does it mean?
  • R-squared is the coefficient of determination. It
    measures the proportion of the variance among the
    original y observations that is explained by
    the linear regression on x.
  • So the closer R-squared is to 100%, the better
    the regression model fits the data.
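As a rough illustration of "proportion of variance explained", R-squared can be computed as 1 minus (residual sum of squares / total sum of squares). The data and the fitted line below are made up for the sketch, and the function name r_squared is an assumption.

```python
def r_squared(ys, fitted):
    """Proportion of the variation in ys explained by the fitted values."""
    y_bar = sum(ys) / len(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained variation
    return 1 - sse / sst


# Made-up illustrative data (not from the lecture).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
# Least-squares line for these points: intercept 0.05, slope 1.99.
fitted = [0.05 + 1.99 * x for x in xs]

r2 = r_squared(ys, fitted)
print(f"R-squared = {100 * r2:.1f}%")
```

A perfect fit gives 100%, while predicting the mean of y for every point gives 0%.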

10
Example of least squares fit
How accurate is this fit?
11
How good is the linear regression model?
Interpretations using R-squared
12
4. How accurate will be the estimation of slope
and intercept?
  • High values of R-squared indicate a generally
    good fit of the data to the regression line
  • But specific measures of the accuracy of
    estimates are provided by standard errors, which
    can be found under the StDev column of the
    regression output.

13
Lecture Example 1 continued
Selling price of houses (y) and LGA (local
government area) valuation (x). Units are thousands of dollars.
Intercept = 5.59, slope = 1.285. How accurate
are these values?
14
Lecture example 1 continued: accuracy of
estimates of intercept and slope
  • R-squared is 96%, which suggests the model's
    overall fit is excellent.
  • The estimate of slope seems to be quite precise:
    the estimate is 1.28 with a standard error of
    0.107, less than a tenth of the estimate.
  • The estimate of intercept seems to contain a
    great deal of uncertainty: the estimate
    is 5.59 and the standard error is 17.4, about
    three times the estimate.
  • But the intercept measures the sale price of a
    house whose LGA valuation is 0! This unrealistic
    situation is well outside the range of existing
    LGA values, so it is not surprising for the
    estimate to be inaccurate.
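One informal way to compare the two estimates is the ratio of each estimate to its standard error, which is what the T column of the Minitab output in Lecture exercise 1 reports. The sketch below uses the numbers quoted on this slide; the function name t_ratio is an assumption.

```python
def t_ratio(estimate, std_error):
    """Estimate divided by its standard error; larger magnitude means more precise."""
    return estimate / std_error


slope_t = t_ratio(1.28, 0.107)      # slope estimate and standard error from the slide
intercept_t = t_ratio(5.59, 17.4)   # intercept estimate and standard error from the slide

print(f"slope: {slope_t:.1f}, intercept: {intercept_t:.2f}")
```

The slope is roughly twelve standard errors away from zero, while the intercept is well under one standard error from zero, which matches the comments above.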

15
Lecture exercise 1
  • How good is the regression model?
  • How accurate are the estimates of the intercept
    and slope?
  • Use the Minitab output on the next slide for the
    relationship between weekly profit and traffic
    volume for a fast food outlet

16
Lecture exercise 1 continued
Regression Analysis: weekly profit versus traffic volume

The regression equation is
weekly profit = 3.45 + 0.143 traffic volume

Predictor       Coef   SE Coef      T      P
Constant      3.4544    0.7396   4.67  0.005
traffic      0.14254   0.02936   4.85  0.005

S = 0.6897   R-Sq = 82.5%
17
Lecture exercise 1 solution
  • R-squared is 82.5%, which suggests the model's
    overall fit is very good.
  • The estimate of slope seems to be reasonably precise:
    the estimate is 0.14254 with a standard error of
    0.02936 (T = 4.85).
  • The estimate of intercept has a larger standard
    error in absolute terms: the estimate is 3.4544 and the
    standard error is 0.7396. Relative to the estimate,
    though, it is about as precise as the slope (T = 4.67).
  • Note that the intercept measures the weekly
    profit of an outlet with no traffic, which is not
    a particularly realistic situation, but it is not
    too far outside the range of existing traffic
    volume.

18
5. Predictions and their accuracy
Lecture Example 2: regression model from
Lecture example 1. The fitted equation is
y = 5.6 + 1.29x (rounded). How do we predict the selling
price of a house when its LGA valuation is 140,000?
Answer: using the unrounded coefficients,
5.59 + 1.285 × 140 = 185.5 (thousand dollars)
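The prediction is just the fitted line evaluated at the new x value. A small sketch, assuming the unrounded coefficients quoted in Lecture Example 1 (intercept 5.59, slope 1.285) and x in thousands; the function name predict is an assumption.

```python
def predict(intercept, slope, x):
    """Evaluate the fitted line at x."""
    return intercept + slope * x


# LGA valuation of 140,000 is x = 140 because units are thousands.
price = predict(5.59, 1.285, 140)
# The rounded equation 5.6 + 1.29x gives a slightly different answer.
price_rounded = predict(5.6, 1.29, 140)

print(f"{price:.1f} vs {price_rounded:.1f} (thousands)")
```

Rounding the coefficients before predicting shifts the answer by several hundred, so it is safer to carry the unrounded values through the calculation.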
19
So, when are predictions from a regression line
likely to be accurate?
  • When R-squared indicates that the quality of fit
    to the linear regression model is good
  • When we use either interpolation, or
    extrapolation close to the existing range of
    x-values
  • The standard error of predictions measures the
    accuracy of predictions and gives the definitive
    answer.
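For reference, the standard textbook expression for the standard error of a fitted value at a new point in simple linear regression is given below. The symbols are assumptions, since they do not appear in the slides: $x_0$ is the new x value, $s$ is the residual standard deviation (S in Minitab output), and $n$ is the number of observations.

```latex
\mathrm{SE}(\hat{y}_0) \;=\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
```

The second term grows as $x_0$ moves away from $\bar{x}$, which is why extrapolation far from the existing x-values gives less accurate predictions than interpolation.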

20
Lecture example 2 continued: accuracy of the
prediction of 185,500
Is this prediction an interpolation or an
extrapolation?
  • R-squared of 96% indicates that the quality of
    fit to the linear regression model is good

21
How to find the standard error of the prediction?
  • In performing regression you chose Stat >
    Regression > Regression, then entered sale price
    under Response and LGA under Predictors.
  • Also select Options, and enter 140 in the box
    Prediction intervals for new observations.

22
Lecture example 2 continued
23
Lecture exercise 2
  • Based on the fitted regression model, what is the
    average weekly profit for a fast food outlet with
    traffic volume of 35?
  • Discuss the accuracy of that prediction.

Regression Analysis: weekly profit versus traffic volume

The regression equation is
weekly profit = 3.45 + 0.143 traffic volume

Predictor       Coef   SE Coef      T      P
Constant      3.4544    0.7396   4.67  0.005
traffic      0.14254   0.02936   4.85  0.005

S = 0.6897   R-Sq = 82.5%

Predicted Values for New Observations
   Fit   SE Fit
 8.443    0.425
24
Lecture exercise 2 solution
  • Average weekly profit for an outlet with traffic
    volume of 35 is 8.443 (in thousands), i.e. 8,443.
  • Accuracy of prediction is reasonably high.
    Reasons:
  • The R-squared value (82.5%) indicates that the
    overall fit is very good
  • The prediction was obtained using interpolation
    rather than extrapolation
  • The standard error of prediction is 0.425 (i.e. 425),
    which is reasonably small compared to the fitted value.
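The fitted value in the Minitab output can be checked by hand, and the standard error turned into a rough range. The sketch assumes the informal "fit plus or minus 2 standard errors" rule of thumb; an exact interval would use a t quantile, which Minitab reports separately.

```python
intercept = 3.4544   # Constant, from the Minitab output
slope = 0.14254      # traffic coefficient, from the Minitab output
se_fit = 0.425       # SE Fit, from the Minitab output

fit = intercept + slope * 35          # traffic volume of 35
rough_low = fit - 2 * se_fit          # rough rule of thumb, not an exact interval
rough_high = fit + 2 * se_fit

print(f"fit = {fit:.3f}, rough range = ({rough_low:.2f}, {rough_high:.2f})")
```

The standard error is about 5% of the fitted value, which supports calling the prediction reasonably accurate.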

25
6. Robustness questions in fitting lines
  • Outliers can have an effect on the fitted line.
  • Points of high leverage (far-flung values of the
    explanatory variable x) can also be very influential,
    especially for estimating slope.
  • Some residual plots are useful; see the text,
    pages 97-98.
  • The next two slides show plots which identify
    three possible outliers. The data are in
    restrnt.mtp, and Sales is regressed upon newcap,
    value and seats.
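Leverage can be made concrete for the single-x case. The sketch below assumes the standard leverage formula for simple linear regression and one common rule of thumb (flag points above 2p/n, with p = 2 parameters); other cut-offs are also in use. The x values are made up and are not from restrnt.mtp.

```python
def leverages(xs):
    """Leverage of each point in a simple linear regression on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


# Made-up x values with one far-flung point (not from the lecture data).
xs = [1.0, 2.0, 3.0, 4.0, 20.0]
h = leverages(xs)

# Rule-of-thumb threshold 2p/n with p = 2 (intercept and slope); an assumption.
threshold = 2 * 2 / len(xs)
flagged = [i for i, value in enumerate(h) if value > threshold]
print(h, flagged)
```

Only the far-flung x value is flagged; it would pull the fitted slope towards itself far more than any of the other points.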

26
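(Plot identifying possible outliers in the restrnt.mtp regression; image not transcribed)
27
(Plot identifying possible outliers in the restrnt.mtp regression; image not transcribed)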
28
7. Multiple linear regression: add more
explanatory variables to the model
Extend the model in Example (4.7.1) to
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
where $x_1$ is traffic volume (1000/day), $x_2$ is seating
capacity of the outlet, and $y$ is weekly profit
('000). An extra explanatory variable is
$x_2$. See the textbook for details.
29
Example of multiple regression
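$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i$
where $y_1, \dots, y_n$ are scores given by tasters to n
different randomly assigned food portions. There
are three additives A, B, C being investigated.
$x_{1i}$, $x_{2i}$ and $x_{3i}$ are the amounts of A, B and
C, respectively, in the ith portion. $y_i$ is the score
given to the ith portion. What do the "slopes"
$\beta_1$, $\beta_2$ and $\beta_3$ measure?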
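In a multiple regression each slope measures the change in the expected response for a one-unit increase in that explanatory variable, with the other variables held fixed. The sketch below fits such a model by solving the normal equations in plain Python. The data are made up (generated exactly from y = 1 + 2 x1 + 3 x2 so the answer is known), and the function names solve and fit_multiple are assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_multiple(rows, ys):
    """rows holds [x1, x2, ...] per observation; returns [b0, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]   # leading 1 gives the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
ys = [1, 3, 4, 6, 8]
coefficients = fit_multiple(rows, ys)
print(coefficients)
```

Because the made-up data contain no noise, the fit recovers the intercept 1 and slopes 2 and 3 up to rounding error. Real data would give estimates with standard errors, as in the Minitab output.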
30
Multiple linear regression output: standard
errors are under the StDev column
31
8. Prediction accuracy in Multiple Regression
  • Follow the same procedure as for simple linear
    regression
  • In Minitab, several explanatory variable names
    have been entered in the Predictors box
  • Select Options, and in the Prediction intervals
    for new observations box, enter the designated
    explanatory variable values in the same order

32
Example
Effect of four explanatory variables on the
assessed quality of 38 wines
33
Results of multiple regression
34
Comments on standard errors
  • The standard errors for the coefficients of
    intercept, Aroma, Body, and Oakiness are not
    small.
  • The possible reason for the large standard error
    of the intercept is that the intercept estimates
    the average quality of a wine whose explanatory
    variable values are all zero, which is well
    outside the range of recorded values shown in the
    descriptive statistics for the explanatory variables.

35
Can we predict and give a standard error for the
average quality of a wine with the
characteristics aroma = 5.5, body = 4.6,
flavour = 5.0 and oakiness = 4.5?
  • The explanatory variable names entered under
    Predictors were aroma, body, flavour and
    oakiness, in that order
  • So under Options, enter the values 5.5 4.6 5.0
    4.5 in that order in the box Prediction intervals
    for new observations
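The point about ordering can be sketched in code: the new values must line up with the coefficients in the order the predictors were entered. The coefficients below are PLACEHOLDERS for illustration only and are not the fitted wine coefficients, which are not in this transcript. The function name predict is also an assumption.

```python
def predict(coefficients, values):
    """coefficients = [b0, b1, ..., bk]; values = [x1, ..., xk] in the same order."""
    if len(values) != len(coefficients) - 1:
        raise ValueError("need exactly one value per predictor")
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], values))


# Order in which the predictors were entered.
predictors = ["aroma", "body", "flavour", "oakiness"]
# New wine, deliberately written in a different order.
new_wine = {"oakiness": 4.5, "flavour": 5.0, "aroma": 5.5, "body": 4.6}
# Reorder the values to match the predictor order before predicting.
values = [new_wine[name] for name in predictors]

# PLACEHOLDER coefficients (intercept first); NOT the lecture's wine results.
placeholder_coefficients = [1.0, 0.5, 0.0, 1.0, -0.5]
example = predict(placeholder_coefficients, values)
print(values, example)
```

Entering the values in a different order would silently pair each value with the wrong coefficient, so the prediction would be wrong without any error message.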

36
The predicted average quality is 12.9, with
standard error 0.25, a reasonably accurate
estimate