Title: Week 5 objectives
Slide 1: Week 5 objectives
- The simple linear regression model
- Standard errors
- The meaning of R-squared
- Accuracy of the estimation of slope and intercept
- Predictions and accuracy of predictions
- Diagnostic plots
- Multiple regression models
- Prediction accuracy in multiple regression
Slide 2: 1. Review questions about fitting straight lines
- Suppose that a scatterplot shows a reasonably strong, linear association between x and y variables. It is then natural to represent that linear association by a straight line.
- The fitted line is called a regression line.
Slide 3: How to fit a straight line?
Least squares (LS) minimizes the sum of squares of the residuals, which are the vertical (i.e. y-direction) distances from the line to the points.
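As a rough illustration of the least squares idea, the sketch below fits a line with the standard closed-form formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The function names `fit_line` and `sum_sq_residuals` and the data values are hypothetical, chosen only for illustration; they are not from the lecture.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_sq_residuals(x, y, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
sse_fit = sum_sq_residuals(x, y, b0, b1)
# Any other line, e.g. the same slope with a shifted intercept, does worse.
sse_other = sum_sq_residuals(x, y, b0 + 0.1, b1)
print(b0, b1, sse_fit, sse_other)
```

Comparing `sse_fit` with `sse_other` shows the defining property: moving the line away from the least squares solution increases the sum of squared residuals.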
Slide 4: Example of fitted line: Price vs. age of houses in Adelaide
How do we interpret slope and intercept?
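Slide 5: What is the simple linear regression model?
In fitting a line by least squares, in effect an underlying model is proposed: \( y = \beta_0 + \beta_1 x + \varepsilon \). Here, \( \beta_0 \) is the true intercept, \( \beta_1 \) is the true slope and \( \varepsilon \) is a random error term.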
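Slide 6: What is the relationship between the true line and the fitted line?
The true slope and intercept are \( \beta_1 \) and \( \beta_0 \). These parameters are estimated by \( \hat{\beta}_1 \) and \( \hat{\beta}_0 \). How accurate are these estimates, and how accurate is the prediction \( \hat{y} \) given an \( x \) value?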
Slide 7: 2. What is standard error?
- The standard error of any estimate or predicted value is an estimate of its standard deviation.
- For the present work, we use standard error simply as a general indication of accuracy.
- Minitab can provide standard errors.
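For reference, the usual textbook expressions for the standard errors in simple linear regression are sketched below. The slides do not state which formulas Minitab uses, so the notation here (\( s \), \( S_{xx} \), \( \hat{\beta}_0 \), \( \hat{\beta}_1 \)) is an assumption made for illustration.

```latex
\[
s^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
\]
\[
\mathrm{SE}(\hat{\beta}_1) = \frac{s}{\sqrt{S_{xx}}},
\qquad
\mathrm{SE}(\hat{\beta}_0) = s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}
\]
```

The second expression suggests why an intercept can be poorly estimated when the x values lie far from zero: the term \( \bar{x}^2 / S_{xx} \) becomes large.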
Slide 8: Lecture Example 1
Selling price of houses (y) and LGA (local government area valuation, x). Units are $'000.
Slide 9: 3. What is R² and what does it mean?
- R² is the coefficient of determination. It measures the proportion of the variance among the original y observations that is explained by the linear regression upon x.
- So the closer R² is to 100%, the better the regression model fits the data.
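The definition above can be sketched as one minus the ratio of unexplained to total variation. The function name `r_squared` and the observed and fitted values below are hypothetical, for illustration only.

```python
def r_squared(y, fitted):
    """Proportion of the variation in y explained by the fitted values."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    return 1 - sse / sst


# Hypothetical observations and fitted values, for illustration only.
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted = [2.04, 4.03, 6.02, 8.01, 10.00]

r2 = r_squared(y, fitted)
print(f"R-squared = {100 * r2:.1f}%")
```

Minitab reports the same quantity as a percentage (R-Sq), which is why the slides speak of values close to 100.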
Slide 10: Example of least squares fit
How accurate is this fit?
Slide 11: How good is the linear regression model?
Interpretations using R-squared
Slide 12: 4. How accurate will be the estimation of slope and intercept?
- High values of R-squared indicate a generally good fit of the data to the regression line.
- But specific measures of the accuracy of estimates are provided by standard errors, which can be found under the StDev column of the regression output (labelled SE Coef in the Minitab output shown in this lecture).
Slide 13: Lecture Example 1 continued
Selling price of houses (y) and LGA (local government area valuation, x). Units are $'000.
Intercept = 5.59, slope = 1.285. How accurate are these values?
Slide 14: Lecture example 1 continued: accuracy of estimates of intercept and slope
- R-squared is 96%, which suggests the overall fit of the model is excellent.
- The estimate of slope seems to be quite precise: the estimate is 1.28 with a standard error of 0.107.
- The estimate of intercept seems to contain a great deal of uncertainty, because the estimate is 5.59 and the standard error is 17.4.
- But the intercept measures the sale price of a house whose LGA valuation is 0! This unrealistic situation is well outside the range of existing LGA values, so it is not surprising for the estimate to be inaccurate.
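One rough way to compare the two estimates is the size of each estimate relative to its standard error. The sketch below uses only the figures quoted on the slides (intercept 5.59 with standard error 17.4, slope 1.285 with standard error 0.107); the variable names are illustrative.

```python
# Figures quoted in Lecture Example 1: (estimate, standard error).
estimates = {
    "intercept": (5.59, 17.4),
    "slope": (1.285, 0.107),
}

# Ratio of estimate to standard error: large means precise, small means uncertain.
ratios = {name: coef / se for name, (coef, se) in estimates.items()}

for name, ratio in ratios.items():
    print(f"{name}: estimate is {ratio:.2f} standard errors from zero")
```

The slope is roughly twelve standard errors from zero, while the intercept is well under one, which matches the verbal assessment on the slide.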
Slide 15: Lecture exercise 1
- How good is the regression model?
- How accurate are the estimates of the intercept and slope?
- Use the Minitab output on the next slide for the relationship between weekly profit and traffic volume for a fast food outlet.
Slide 16: Lecture exercise 1 continued
Regression Analysis: weekly profit versus traffic volume

The regression equation is
weekly profit = 3.45 + 0.143 traffic volume

Predictor      Coef  SE Coef     T      P
Constant     3.4544   0.7396  4.67  0.005
traffic     0.14254  0.02936  4.85  0.005

S = 0.6897   R-Sq = 82.5%
Slide 17: Lecture exercise 1 solution
- R-squared is 82.5%, which suggests the overall fit of the model is very good.
- The estimate of slope seems to be reasonably precise: the estimate is 0.14254 with a standard error of 0.02936 (T = 4.85).
- The estimate of intercept is slightly less precise in relative terms: the estimate is 3.4544 and the standard error is 0.7396 (T = 4.67).
- Note that the intercept measures the weekly profit of an outlet with no traffic, which is not a particularly realistic situation, but it is not too far outside the range of existing traffic volumes.
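When reading the Minitab output for this exercise, it may help to note that the T column is the coefficient divided by its standard error. The sketch below checks this against the figures printed in the output; the dictionary layout and variable names are illustrative only.

```python
# Rows from the Minitab output: (Coef, SE Coef, T as printed).
output = {
    "Constant": (3.4544, 0.7396, 4.67),
    "traffic": (0.14254, 0.02936, 4.85),
}

# Recompute T = Coef / SE Coef, rounded to two decimals as in the output.
recomputed_t = {name: round(coef / se, 2) for name, (coef, se, _) in output.items()}

for name, (_, _, printed_t) in output.items():
    print(f"{name}: printed T = {printed_t}, recomputed T = {recomputed_t[name]}")
```

Both ratios are close to 5, so in relative terms the two estimates have similar precision.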
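Slide 18: 5. Predictions and their accuracy
Lecture Example 2
Regression model from Lecture example 1. The fitted equation is \( \hat{y} = 5.6 + 1.29x \).
How to predict the selling price of a house when its LGA valuation is $140,000?
Answer: 5.59 + 1.285 × 140 = 185.5 ($'000), using the unrounded slope 1.285 from Lecture Example 1. That is, about $185,500.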
Slide 19: So, when are predictions from a regression line likely to be accurate?
- When R-squared indicates that the quality of fit to the linear regression model is good.
- When we use either interpolation, or extrapolation close to the existing range of x-values.
- The standard error of predictions measures the accuracy of predictions and gives the definitive answer.
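The link between extrapolation and accuracy can be seen in the usual textbook formula for the standard error of a fitted value at \( x_0 \) in simple linear regression. This is a sketch assuming that Minitab's "SE Fit" refers to the standard error of the estimated mean response; \( s \) denotes the residual standard deviation and \( S_{xx} \) the sum of squared deviations of x, neither of which is named in the slides.

```latex
\[
\mathrm{SE}(\hat{y}_0) = s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}},
\qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
\]
```

The term \( (x_0 - \bar{x})^2 \) grows as \( x_0 \) moves away from the centre of the observed x values, so predictions far outside the data range carry larger standard errors.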
Slide 20: Lecture example 2 continued: accuracy of the prediction of about $185,500
Interpolation or extrapolation for this prediction?
- R-squared of 96% indicates that the quality of fit to the linear regression model is good.
Slide 21: How to find the standard error of the prediction?
- In performing regression you chose Stat > Regression > Regression, then entered sale price under Response and LGA under Predictors.
- Also select Options, and enter 140 in the box Prediction intervals for new observations.
Slide 22: Lecture example 2 continued
Slide 23: Lecture exercise 2
- Based on the fitted regression model, what is the average weekly profit for a fast food outlet with traffic volume of 35?
- Discuss the accuracy of that prediction.

Regression Analysis: weekly profit versus traffic volume

The regression equation is
weekly profit = 3.45 + 0.143 traffic volume

Predictor      Coef  SE Coef     T      P
Constant     3.4544   0.7396  4.67  0.005
traffic     0.14254  0.02936  4.85  0.005

S = 0.6897   R-Sq = 82.5%

Predicted Values for New Observations
  Fit  SE Fit
8.443   0.425
Slide 24: Lecture exercise 2 solution
- Average weekly profit for an outlet with traffic volume of 35 is $8,443.
- Accuracy of prediction is reasonably high. Reasons:
  - The R-squared value indicates that the overall fit is very good.
  - The prediction was obtained using interpolation rather than extrapolation.
  - The standard error of prediction is $425, which is reasonably small compared to the fitted value.
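The fitted value in the Minitab output can be reproduced from the printed coefficients, and the standard error can be expressed as a fraction of the fit. This is a minimal sketch using only numbers from the output; the variable names are illustrative, and profit is taken to be in $'000 as stated later in the lecture.

```python
# Coefficients and SE Fit as printed in the Minitab output.
intercept = 3.4544
slope = 0.14254
se_fit = 0.425

traffic = 35  # traffic volume for the new outlet

fit = intercept + slope * traffic  # predicted average weekly profit, $'000
relative_se = se_fit / fit         # standard error as a fraction of the fit

print(f"Fit = {fit:.3f} ($'000), SE Fit is {100 * relative_se:.0f}% of the fit")
```

A standard error of about 5% of the fitted value is what the solution means by "reasonably small".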
Slide 25: 6. Robustness questions in fitting lines
- Outliers can have an effect, but also
- points of high leverage (far-flung values of the explanatory variable x) can be very influential, especially for estimating slope.
- Some residual plots are useful; see the text, pages 97-98.
- The next two slides show plots which identify three possible outliers. The data are in restrnt.mtp, and Sales is regressed upon newcap, value and seats.
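To make "high leverage" concrete, the sketch below computes the usual leverage value for each point in a simple (one-variable) regression, h = 1/n + (x - mean)² / Sxx. The function name `leverages` and the x values are hypothetical and are not taken from restrnt.mtp, which is a multiple regression and would need the matrix version of this formula.

```python
def leverages(x):
    """Leverage of each observation in a simple linear regression on x."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Hypothetical x values with one far-flung point, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 20.0]
h = leverages(x)

for xi, hi in zip(x, h):
    print(f"x = {xi:5.1f}  leverage = {hi:.3f}")
```

The far-flung point has leverage close to 1 (the maximum possible), so the fitted slope is determined almost entirely by where that single point lies.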
Slide 26: (No transcript)
Slide 27: (No transcript)
Slide 28: 7. Multiple linear regression: add more explanatory variables to the model
Extend the model in Example (4.7.1) to \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \), where
- \( x_1 \) is traffic volume (1000/day)
- \( x_2 \) is seating capacity of the outlet
- \( y \) is weekly profit ($'000)
An extra explanatory variable is \( x_2 \). See the textbook for details.
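Slide 29: Example of multiple regression
\( y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i, \quad i = 1, \dots, n \)
where \( y_1, \dots, y_n \) are scores given by tasters to n different randomly assigned food portions. There are three additives A, B, C being investigated. \( x_{i1} \), \( x_{i2} \) and \( x_{i3} \) are the amounts of A, B and C, respectively, in the ith portion; \( y_i \) is the score given to the ith portion. What do the "slopes" \( \beta_1 \), \( \beta_2 \) and \( \beta_3 \) measure?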
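In a multiple regression, each slope measures the change in the expected response for a one-unit increase in that explanatory variable, with the other variables held fixed. The sketch below fits such a model by solving the least squares normal equations in plain Python. The function names, the additive amounts and the coefficients used to construct the scores are all hypothetical; the scores are built without noise so that the fit recovers the chosen coefficients exactly.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multiple(rows, y):
    """Least squares fit; returns [intercept, slope_1, ..., slope_k]."""
    design = [[1.0] + list(r) for r in rows]  # column of ones for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical amounts of additives A, B, C in six portions.
amounts = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]]
# Hypothetical scores built exactly from 5 + 2A - 1B + 0.5C (no noise).
scores = [5 + 2 * a - 1 * b + 0.5 * c for a, b, c in amounts]

coefs = fit_multiple(amounts, scores)
print(coefs)  # intercept, then the slopes for A, B and C
```

With these made-up scores, the slope for A is 2: adding one unit of A raises the expected score by 2 when B and C are unchanged.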
Slide 30: Multiple Linear Regression output: standard errors are under the StDev column
Slide 31: 8. Prediction accuracy in Multiple Regression
- Follow the same procedure as for simple linear regression.
- In Minitab, several explanatory variable names have been entered in the Predictors box.
- Select Options, and in the Prediction intervals for new observations box, enter the designated explanatory variable values in the same order.
Slide 32: Example
Effect of four explanatory variables on the assessed quality of 38 wines
Slide 33: Results of multiple regression
Slide 34: Comments on standard errors
- The standard errors for the coefficients of intercept, Aroma, Body, and Oakiness are not small.
- The possible reason for the large standard error of the intercept is that the intercept estimates the average quality of a wine whose explanatory variable values are all zero, which is well outside the range of recorded values shown in the Descriptive Statistics for the explanatory variables.
Slide 35: Can we predict, and give a standard error for, the average quality of a wine with the characteristics aroma = 5.5, body = 4.6, flavour = 5.0 and oakiness = 4.5?
- The explanatory variable names entered under Predictors were aroma, body, flavour and oakiness, in that order.
- So under Options, enter the values 5.5 4.6 5.0 4.5 in that order in the box Prediction intervals for new observations.
Slide 36: The predicted average quality is 12.9, with standard error 0.25, a reasonably accurate estimate.