Worked Example - PowerPoint PPT Presentation

About This Presentation
Title:

Worked Example

Slides: 52
Provided by: jphil

Transcript and Presenter's Notes



1
Worked Example
  • Using R

3
> plot(y ~ x)
9
> plot(epsilon1 ~ x)
This is a plot of residuals against the
explanatory variable, x.
10
> plot(epsilon1 ~ yhat)
This is a plot of residuals against the fitted
values, yhat.
11
Both graphs show the same thing: the residuals
follow a random pattern. Note: since the fitted
equation is approximately $y = x$, both graphs are
extremely similar in this case.
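The slides do not show the data or the commands that produced `epsilon1` and `yhat`. A minimal sketch, assuming hypothetical simulated data whose true relation is y = x, and assuming `epsilon1` and `yhat` are the residuals and fitted values of a simple linear fit, here given the hypothetical name `model1`:

```r
# Hypothetical data: the worked example's own x and y are not given on the slides
set.seed(1)
x <- 1:20
y <- x + rnorm(20, sd = 1)      # assumed true relation y = x plus noise

model1 <- lm(y ~ x)             # least squares fit (hypothetical name)
epsilon1 <- resid(model1)       # residuals
yhat <- fitted(model1)          # fitted values

# Residuals against x and against the fitted values, side by side
par(mfrow = c(1, 2))
plot(epsilon1 ~ x, main = "Residuals vs x")
abline(h = 0, lty = 2)
plot(epsilon1 ~ yhat, main = "Residuals vs fitted")
abline(h = 0, lty = 2)
```

In simple linear regression `yhat` is a linear function of `x`, so with a positive slope the two panels always have the same shape; with a slope near 1 and an intercept near 0 even the axis values nearly coincide.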
17
Model Diagnostics: Residuals and Influence
18
Consider again the problem of fitting the
model $y_i = f(x_i) + e_i,\ i = 1, \dots, n$. Assume
again a single continuous response variable y.
The explanatory variable x may be either a
single variable or a vector of variables. How do
we assess the quality of a given fit f?
19
While summary statistics are helpful, they are
not sufficient. Good diagnostics are typically
based on case analysis, i.e. an examination of
each observation in turn in relation to the
fitting procedure. This leads to an examination
of residuals and influence.
20
Residuals
The residuals should be thought of as what is
left of the values of the response variable after
the fit has been subtracted. Ideally they should
show no further dependence (especially no further
location dependence) on x.
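The definition of a residual as the response minus the fit can be checked numerically. A minimal sketch using R's built-in `cars` data, chosen here purely for illustration:

```r
# Built-in 'cars' data: stopping distance (dist) against speed
fit <- lm(dist ~ speed, data = cars)

# Residuals are what is left of the response after the fit is subtracted
manual_resid <- cars$dist - fitted(fit)

all.equal(unname(manual_resid), unname(resid(fit)))
# [1] TRUE
```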
21
In general this should be investigated
graphically by plotting residuals against the
explanatory variable(s) x. For linear models, we
frequently compromise by plotting residuals
against fitted values.
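With more than one explanatory variable, plotting against each x needs one graph per variable, whereas the plot against fitted values is a single graph. A sketch assuming R's built-in `mtcars` data and a hypothetical model `mpg ~ wt + hp`:

```r
# Hypothetical two-variable linear model on the built-in 'mtcars' data
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
e <- resid(fit2)

# One residual plot per explanatory variable, plus the single fitted-value plot
par(mfrow = c(1, 3))
plot(mtcars$wt, e, xlab = "wt", ylab = "residual")
abline(h = 0, lty = 2)
plot(mtcars$hp, e, xlab = "hp", ylab = "residual")
abline(h = 0, lty = 2)
plot(fitted(fit2), e, xlab = "fitted value", ylab = "residual")
abline(h = 0, lty = 2)
```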
22
In particular the residuals provide information
about:
  • whether the best relation has been fitted
  • the relative merits of different fits
  • mild, but non-random, departures from the
    hypothesised fit
  • the magnitude of the residual variation
23
  • the identification of outliers
  • possible further dependence on x, other than
    through location, of the conditional
    distribution of y given x, in particular
    heterogeneity of spread of the residuals.
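Heterogeneity of spread usually shows up as a fan shape in a plot of residuals against fitted values. The sketch below uses hypothetical simulated data whose error standard deviation grows with x, and adds a rough numerical check, the correlation between absolute residuals and fitted values, which is an informal device chosen here rather than a formal test:

```r
# Hypothetical data whose spread grows with x (heteroscedastic errors)
set.seed(42)
x <- runif(200, 1, 10)
y <- 2 + 3 * x + rnorm(200, sd = 0.5 * x)

fit <- lm(y ~ x)
e <- resid(fit)

# Informal numerical check: do larger fitted values go with larger |residuals|?
spread_trend <- cor(abs(e), fitted(fit))
spread_trend

# Graphical check: a fan shape here indicates heterogeneity of spread
plot(fitted(fit), e, xlab = "fitted value", ylab = "residual")
abline(h = 0, lty = 2)
```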
24
Example: Anscombe's Artificial Data
The R data frame anscombe is made available by
> data(anscombe)
This contains 4 artificial datasets, each of 11
observations of a continuous response variable y
and a continuous explanatory variable x. The data
are now plotted along with the least squares
linear model fitted to each dataset.
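A minimal sketch of these plots, assuming the column names `x1` to `x4` and `y1` to `y4` of R's built-in `anscombe` data frame; the list name `models` is illustrative:

```r
data(anscombe)

# One least squares fit y_k ~ x_k per dataset (k = 1, ..., 4)
models <- lapply(1:4, function(k) {
  lm(as.formula(paste0("y", k, " ~ x", k)), data = anscombe)
})

# Each dataset with its fitted line
par(mfrow = c(2, 2))
for (k in 1:4) {
  plot(anscombe[[paste0("x", k)]], anscombe[[paste0("y", k)]],
       xlab = paste0("x", k), ylab = paste0("y", k),
       main = paste("Dataset", k))
  abline(models[[k]])
}
```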
25
All the usual summary statistics related to the
classical analyses of the fitted models are
essentially identical across the 4 datasets (they
agree to about three significant figures). This
includes the coefficients a and b, their standard
errors and confidence intervals, and also the
residual standard errors and correlation
coefficients.
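The near-equality of these summaries can be checked directly. A minimal sketch that fits `y_k ~ x_k` for each dataset with `lm()` and collects the intercept a, slope b, residual standard error and correlation; the names `stats` and `fit` are illustrative:

```r
data(anscombe)

# Rows: datasets 1-4; columns: intercept a, slope b, residual SE, correlation r
stats <- t(sapply(1:4, function(k) {
  fit <- lm(as.formula(paste0("y", k, " ~ x", k)), data = anscombe)
  c(a = unname(coef(fit)[1]),
    b = unname(coef(fit)[2]),
    sigma = summary(fit)$sigma,
    r = cor(anscombe[[paste0("x", k)]], anscombe[[paste0("y", k)]]))
}))
round(stats, 3)
```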


31
Consideration of the residuals shows that very
different judgements should be made about the
appropriateness of the fitted model to each of
the 4 cases. A full discussion is given by
Weisberg (1985, pp. 107-108).
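A minimal sketch of the residual check for all four fits, using the built-in `anscombe` data and plotting residuals against fitted values, the compromise described earlier for linear models; the list name `fits` is illustrative:

```r
data(anscombe)

# Residuals against fitted values for each of the four least squares fits
fits <- vector("list", 4)
par(mfrow = c(2, 2))
for (k in 1:4) {
  fits[[k]] <- lm(as.formula(paste0("y", k, " ~ x", k)), data = anscombe)
  plot(fitted(fits[[k]]), resid(fits[[k]]),
       xlab = "fitted value", ylab = "residual", main = paste("Dataset", k))
  abline(h = 0, lty = 2)
}
```

Dataset 2 shows a curved residual pattern and dataset 3 a single outlying case. In dataset 4 the observation with x4 = 19 has leverage 1, so the fitted line passes through it and its residual is zero even though it alone determines the slope.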
40
Influence
Influence measures the extent to which a fit is
affected by individual observations. A possible
formal definition is the following: the influence
of any observation is a measure of the difference
between the fit and the fit that would be
obtained if that observation were omitted.
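This definition can be computed directly by refitting without each observation in turn. A sketch on Anscombe's third dataset, assuming, as one concrete choice, that the difference between the two fits is measured as the sum of squared changes in the n fitted values; `influence_loo` is a hypothetical name:

```r
data(anscombe)
fit <- lm(y3 ~ x3, data = anscombe)
n <- nrow(anscombe)

# For each case i: refit without it, predict at all n cases, and sum the
# squared differences from the full-data fitted values
influence_loo <- sapply(seq_len(n), function(i) {
  fit_i <- lm(y3 ~ x3, data = anscombe[-i, ])
  yhat_i <- predict(fit_i, newdata = anscombe)
  sum((fitted(fit) - yhat_i)^2)
})
round(influence_loo, 3)
```

Dividing each entry by p s², where p is the number of coefficients and s² the residual mean square, gives exactly the values returned by R's `cooks.distance()`.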
41
Obviously observations with large influences
require more careful checking. Especially for
linear models, influence is often measured by
Cook's distance.
42
Cook's Distance Formula
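For a linear model with p parameters fitted by least squares to n observations, the standard definition of Cook's distance for case i is:

```latex
D_i = \frac{\sum_{j=1}^{n} \left( \hat{y}_j - \hat{y}_{j(i)} \right)^2}{p \, s^2}
    = \frac{e_i^2}{p \, s^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}
```

Here $\hat{y}_{j(i)}$ is the fitted value for case $j$ when case $i$ is omitted from the fit, $e_i = y_i - \hat{y}_i$ is the residual, $s^2$ is the residual mean square, and $h_{ii}$ is the $i$th diagonal element of the hat matrix $H = X(X^{\top}X)^{-1}X^{\top}$.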
43
As a rule of thumb, observations for which $D_i > 1$
make a noticeable difference to the parameter
estimates, and should be examined carefully for
the appropriateness of their use in fitting the
model. An observation with a large residual will
often also have a large influence. However, an
observation with an unusual value of its
explanatory variable(s) can pull the fit towards it
and have a large influence despite a small
residual.
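Cook's distance factors into a residual part and a leverage part, which makes the last point concrete. A sketch on Anscombe's third dataset using the standard R functions `hatvalues()` and `rstandard()`, relying on the identity $D_i = \frac{r_i^2}{p} \cdot \frac{h_{ii}}{1 - h_{ii}}$ with $r_i$ the standardized residual:

```r
data(anscombe)
fit <- lm(y3 ~ x3, data = anscombe)
p <- length(coef(fit))        # number of parameters (2 here)

h <- hatvalues(fit)           # leverages h_ii
r <- rstandard(fit)           # standardized residuals e_i / (s * sqrt(1 - h_ii))

# Cook's distance as (residual part) x (leverage part)
D <- r^2 / p * h / (1 - h)

# Rule of thumb: flag cases with D_i > 1
which(D > 1)
```

A case with leverage $h_{ii}$ near 0 keeps $D_i$ small even when its residual is large, while $h_{ii}$ near 1 can make $D_i$ large even when its residual is small.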
44
Example: Anscombe's third data set. The last
graph produced by the plot function shows that
observation number 3 has an unusually large
value of Cook's distance, $D_3 = 1.39$.
> plot(model3)
produces the diagnostic plots for this fit.
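A minimal sketch for computing the values behind this plot. In current R releases the default panels of `plot()` for an `lm` fit are `which = c(1, 2, 3, 5)`, so the index plot of Cook's distance is requested explicitly with `which = 4`. Fitting with `data = anscombe` is an assumption, since the slides do not show how `model3` was created:

```r
data(anscombe)
model3 <- lm(y3 ~ x3, data = anscombe)

# Cook's distance for every observation; observation 3 stands out
d3 <- cooks.distance(model3)
round(d3, 2)

# Draw only the Cook's distance panel of the diagnostic plots
plot(model3, which = 4)
```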
49
We now refit the data omitting this
observation.
> x5 <- x3[-3]
> y5 <- y3[-3]
> model5 <- lm(y5 ~ x5)
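A self-contained version of the refit, assuming `x3` and `y3` are taken from the `anscombe` data frame, with the coefficients of both fits side by side; the matrix name `cmp` is illustrative:

```r
data(anscombe)
x3 <- anscombe$x3
y3 <- anscombe$y3

model3 <- lm(y3 ~ x3)                 # all 11 observations
x5 <- x3[-3]                          # drop observation 3
y5 <- y3[-3]
model5 <- lm(y5 ~ x5)

# Intercepts and slopes with and without observation 3
cmp <- rbind(all = coef(model3), without_3 = coef(model5))
colnames(cmp) <- c("intercept", "slope")
round(cmp, 3)
```

Without observation 3 the remaining ten points lie almost exactly on a line, so the slope drops from about 0.50 to about 0.35.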