Regression - PowerPoint PPT Presentation

Slides: 26
Provided by: jphil
Transcript and Presenter's Notes

1
Regression
  • Several Explanatory Variables

2
Example: Scottish hill races data. These data
are made available in R as
> library(MASS)
> data(hills)
They give record times (minutes) in 1984 of 35
Scottish hill races, against distance (miles) and
total height climbed (feet).
3
(No Transcript)
4
We regard time as the response variable, and seek
to model how its conditional distribution depends
on the explanatory variables distance and climb.
5
The R code pairs(hills) produces the plots shown.
6
These show that the response variable time has a
strong positive association with each of the
explanatory variables distance and climb, with
the dependence on distance being the stronger.
However, the two explanatory variables distance
and climb also have a strong positive association
with each other, and this complicates the
modelling.
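The associations read off the scatterplot matrix can also be put in numbers. The following is a minimal sketch, assuming the standard MASS copy of the data with columns dist, climb and time; the name hills.cor is chosen here for illustration and does not appear in the slides.

```r
library(MASS)   # provides the hills data frame
data(hills)     # columns: dist (miles), climb (feet), time (minutes)

# Scatterplot matrix, as on the slide
pairs(hills)

# Pairwise Pearson correlations quantify what the plots show
# (hills.cor is an illustrative name, not from the slides)
hills.cor <- cor(hills)
print(round(hills.cor, 2))
```

A large off-diagonal entry for dist and climb is the numerical counterpart of the complication noted above: the two explanatory variables carry overlapping information.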
7
Preliminary analysis of the data suggests that
the observation (number 18) corresponding to
Knock Hill is almost certainly in error - the
time is much too great for the given distance and
climb, and it may have been misrecorded by 1
hour. We therefore omit Knock Hill from the
analysis. (The plot and identify commands can be
used to pick out this observation.)
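One way the plot and identify step might look is sketched below. The choice of plotting time against distance and the data frame name hills2 are assumptions made here for illustration; the slides do not show the exact commands used.

```r
library(MASS)
data(hills)

# Time against distance: a misrecorded time sits well above the general trend
plot(hills$dist, hills$time,
     xlab = "distance (miles)", ylab = "time (minutes)")

# identify() waits for mouse clicks, so only call it in an interactive session
if (interactive()) {
  identify(hills$dist, hills$time, labels = rownames(hills))
}

# Drop Knock Hill by name (hills2 is an illustrative name, not from the slides)
hills2 <- hills[rownames(hills) != "Knock Hill", ]
nrow(hills2)
```

Removing the row by name rather than by index keeps the code correct even if the rows are reordered.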
8
On physical grounds we attempt to find a model
with zero intercept. We consider first a linear
model (Model 1) involving both the explanatory
variables distance and climb:
$$\text{time} = a \times \text{distance} + b \times \text{climb} + e$$
9
(No Transcript)
10
The fitted model is
$$\text{time} = 5.47 \times \text{distance} + 0.0106 \times \text{climb} + e$$
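A zero-intercept fit of this form can be obtained with lm. This is a sketch under the assumption that Knock Hill has been removed as described earlier; hills2 is an illustrative name, while hills.model.1 is the object name the slides themselves use.

```r
library(MASS)
data(hills)

# Omit Knock Hill (hills2 is an illustrative name, not from the slides)
hills2 <- hills[rownames(hills) != "Knock Hill", ]

# "0 +" removes the intercept: time = a * dist + b * climb + error
hills.model.1 <- lm(time ~ 0 + dist + climb, data = hills2)

# Coefficient table with standard errors, t values and significance stars
summary(hills.model.1)
```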
11
The three stars attached to each coefficient
estimate (significance at the 0.1% level in R's
summary output) show that distance and climb are
both important explanatory variables. (This can
be confirmed by noting the very much poorer fits
obtained if either of these variables is omitted.)
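The comparison with the reduced models might be sketched as follows. The object names full, dist.only, climb.only and fit.sigma are chosen here for illustration, and using the residual standard error as the yardstick of fit is an assumption; the slides do not say which measure was used.

```r
library(MASS)
data(hills)
hills2 <- hills[rownames(hills) != "Knock Hill", ]  # illustrative name

# Zero-intercept fits with both variables and with each one alone
full       <- lm(time ~ 0 + dist + climb, data = hills2)
dist.only  <- lm(time ~ 0 + dist,         data = hills2)
climb.only <- lm(time ~ 0 + climb,        data = hills2)

# Residual standard error of each fit; a larger value means a poorer fit
fits <- list(full = full, dist.only = dist.only, climb.only = climb.only)
fit.sigma <- sapply(fits, function(m) summary(m)$sigma)
print(fit.sigma)
```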
12
> plot(hills.model.1)
produces the plots shown.
13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
The pattern of residuals leads us to suspect
that there may be some nonlinear dependence on
climb and/or distance. This would be physically
quite natural. It seems reasonable here to
introduce quadratic terms as a first attempt to
model any nonlinearity.
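A direct way to look for such nonlinearity is to plot the Model 1 residuals against each explanatory variable in turn. The sketch below assumes the zero-intercept fit with Knock Hill omitted; hills2 and res1 are illustrative names, and these particular plots are not necessarily the ones shown on the slides.

```r
library(MASS)
data(hills)
hills2 <- hills[rownames(hills) != "Knock Hill", ]  # illustrative name
hills.model.1 <- lm(time ~ 0 + dist + climb, data = hills2)

res1 <- resid(hills.model.1)

# Curvature in either panel hints at a nonlinear dependence on that variable
op <- par(mfrow = c(1, 2))
plot(hills2$dist, res1, xlab = "distance (miles)", ylab = "Model 1 residual")
abline(h = 0, lty = 2)
plot(hills2$climb, res1, xlab = "climb (feet)", ylab = "Model 1 residual")
abline(h = 0, lty = 2)
par(op)
```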
17
We consider now the (quite elaborate) model
(Model 2)
$$\text{time} = a_0 \times \text{distance} + b_0 \times \text{distance}^2 + c_0 \times \text{climb} + d_0 \times \text{climb}^2 + e$$
18
(No Transcript)
19
The fitted model is now
$$\text{time} = 5.62 \times \text{distance} + 0.0323 \times \text{distance}^2 + 0.000262 \times \text{climb} + 0.00000180 \times \text{climb}^2 + e$$
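A quadratic model of this form can be fitted by wrapping the squared terms in I(). This is a sketch assuming the same reduced data as before; hills2 is an illustrative name and hills.model.2 is assumed by analogy with the slides' hills.model.1.

```r
library(MASS)
data(hills)
hills2 <- hills[rownames(hills) != "Knock Hill", ]  # illustrative name

# I() protects ^ so that it means arithmetic squaring inside the formula
hills.model.2 <- lm(time ~ 0 + dist + I(dist^2) + climb + I(climb^2),
                    data = hills2)

# Coefficient table with significance stars for each of the four terms
summary(hills.model.2)
```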

20
The analysis, most notably the significance stars
attached to the estimate of the coefficient of
$\text{climb}^2$, shows that there is indeed
evidence of nonlinearity in the dependence on
climb, and (given also physical considerations)
quite possibly in the dependence on distance.
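Since Model 1 is nested within Model 2, the two quadratic terms can also be assessed jointly with an F test. This is a complementary check, not something the slides report; hills2, hills.model.2 and cmp are illustrative names.

```r
library(MASS)
data(hills)
hills2 <- hills[rownames(hills) != "Knock Hill", ]  # illustrative name

hills.model.1 <- lm(time ~ 0 + dist + climb, data = hills2)
hills.model.2 <- lm(time ~ 0 + dist + I(dist^2) + climb + I(climb^2),
                    data = hills2)

# F test of the two added quadratic terms taken together
cmp <- anova(hills.model.1, hills.model.2)
print(cmp)
```

A small p-value in the second row would indicate that the quadratic terms, taken together, improve on the linear model.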
21
(No Transcript)
22
(No Transcript)
23
The pattern of residuals is now more randomly
spread, indicating a better model than the first
one.
24
Finally, the residuals of model 1 can be plotted
against those of model 2.
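Such a residual-against-residual plot might be produced as follows. Labelling the three largest Model 1 residuals is an assumption made here for illustration; the slides do not say how the labelled points were chosen, and the names hills2, r1, r2 and big are not from the slides.

```r
library(MASS)
data(hills)
hills2 <- hills[rownames(hills) != "Knock Hill", ]  # illustrative name

hills.model.1 <- lm(time ~ 0 + dist + climb, data = hills2)
hills.model.2 <- lm(time ~ 0 + dist + I(dist^2) + climb + I(climb^2),
                    data = hills2)

r1 <- resid(hills.model.1)
r2 <- resid(hills.model.2)

# Points far from the dashed line y = x changed most between the models
plot(r2, r1, xlab = "Model 2 residual", ylab = "Model 1 residual")
abline(0, 1, lty = 2)

# Label the three largest absolute Model 1 residuals (an illustrative choice)
big <- order(abs(r1), decreasing = TRUE)[1:3]
text(r2[big], r1[big], labels = names(r1)[big], pos = 3, cex = 0.7)
```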
25
This suggests that Model 2 is a considerable
improvement, at least insofar as it reduces the
large residuals associated with the 3 labelled
observations. The observations corresponding to
Bens of Jura and Lairig Ghru remain moderately
influential.
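Influence can be examined numerically as well as graphically. The sketch below uses Cook's distance, which is one common influence measure; the slides do not state which measure underlies the remark above, so this choice, and the names hills2 and cd, are assumptions.

```r
library(MASS)
data(hills)
hills2 <- hills[rownames(hills) != "Knock Hill", ]  # illustrative name

hills.model.2 <- lm(time ~ 0 + dist + I(dist^2) + climb + I(climb^2),
                    data = hills2)

# Cook's distance: how much the fitted values move if a race is deleted
cd <- cooks.distance(hills.model.2)

# The races with the largest values are the most influential
print(head(sort(cd, decreasing = TRUE), 5))
```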