The Examination of Residuals - PowerPoint PPT Presentation
1
The Examination of Residuals
2
  • The residuals are defined as the n differences
    ei = yi − ŷi, i = 1, 2, …, n, where ŷi is the
    fitted value of the ith observation.
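As a quick illustration (the data here are hypothetical), the residuals from a least-squares line can be computed directly:

```python
import numpy as np

# Hypothetical data: y observed, y_hat fitted from a least-squares line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, intercept = np.polyfit(x, y, 1)   # fit y = b0 + b1*x
y_hat = intercept + slope * x            # fitted values
e = y - y_hat                            # the n residuals e_i = y_i - y_hat_i

# For a least-squares fit with an intercept, the residuals sum to ~0
print(np.round(e, 3))
```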

3
  • Many of the statistical procedures used in linear
    and nonlinear regression analysis are based on
    certain assumptions about the random departures
    from the proposed model.
  • Namely, the random departures are assumed
  • i) to have zero mean,
  • ii) to have a constant variance, σ²,
  • iii) to be independent, and
  • iv) to follow a normal distribution.

4
  • Thus, if the fitted model is correct,
  • the residuals should exhibit tendencies that
    confirm the above assumptions, or at least
    should not contradict them.

5
  • The principal ways of plotting the residuals ei
    are

1. Overall.
2. In time sequence, if the order is known.
3. Against the fitted values.
4. Against the independent variables xij, for
each value of j.
In addition to these basic plots, the residuals
should also be plotted
5. In any way that is sensible for the particular
problem under consideration.
6
Overall Plot
  • The residuals can be plotted in an overall plot
    in several ways.

7
1. The scatter plot.
2. The histogram.
3. The box-whisker plot.
4. The kernel density plot
5. A normal plot or a half-normal plot on
standard probability paper.
8
  • The standard statistical tests for Normality
    are

1. The Kolmogorov-Smirnov test.
2. The Chi-square goodness of fit test
9
  • The Kolmogorov-Smirnov test
  • The Kolmogorov-Smirnov test uses the empirical
    cumulative distribution function as a tool for
    testing the goodness of fit of a distribution.
  • The empirical distribution function is defined
    below for n random observations

Fn(x) = the proportion of observations in the
sample that are less than or equal to x.
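A minimal sketch of Fn, applied to a few hypothetical residuals:

```python
import numpy as np

def ecdf(sample, x):
    """F_n(x): the proportion of the n observations that are <= x."""
    sample = np.asarray(sample)
    return float(np.mean(sample <= x))

e = [0.3, -1.2, 0.8, -0.1, 1.5]   # hypothetical residuals
print(ecdf(e, 0.0))   # 2 of the 5 residuals are <= 0 -> 0.4
```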
10
  • Let F0(x) denote the hypothesized cumulative
    distribution function of the population (a Normal
    population if we were testing normality)

If F0(x) truly represented the distribution of
observations in the population, then Fn(x) would be
close to F0(x) for all values of x.
11
  • The Kolmogorov-Smirnov test statistic is

Dn = max over x of |Fn(x) − F0(x)|,
the maximum distance between Fn(x) and F0(x).
  • If F0(x) does not provide a good fit to the
    distribution of the observations, Dn will be
    large.
  • Critical values for Dn are given in many texts.
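In practice Dn and an approximate p-value can be obtained from scipy. A sketch with hypothetical residuals; note that estimating the Normal parameters from the residuals themselves strictly calls for a corrected test (Lilliefors), so the p-value below is only indicative:

```python
import numpy as np
from scipy import stats

e = np.array([-1.6, -0.9, -0.4, -0.1, 0.2, 0.5, 0.8, 1.4])  # hypothetical residuals

# D_n = max |F_n(x) - F_0(x)| against a hypothesized N(0, s) distribution,
# with s estimated from the residuals
d_n, p_value = stats.kstest(e, "norm", args=(0.0, e.std(ddof=1)))
print(d_n, p_value)
```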

12
  • The Chi-square goodness of fit test
  • The Chi-square test uses the histogram as a tool
    for testing the goodness of fit of a
    distribution.
  • Let fi denote the observed frequency in each of
    the class intervals of the histogram.
  • Let Ei denote the expected number of observations
    in each class interval under the hypothesized
    distribution.

13
  • The hypothesized distribution is rejected if the
    statistic
  • χ² = Σ (fi − Ei)² / Ei
  • is large (greater than the critical value from
    the chi-square distribution with m − 1 degrees of
    freedom,
  • where m = the number of class intervals used for
    constructing the histogram).
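A sketch of the computation with hypothetical class frequencies, testing against a standard Normal (the class edges and counts are invented for illustration):

```python
import numpy as np
from scipy import stats

# Observed frequencies f_i in m = 5 histogram class intervals (hypothetical)
f = np.array([8, 22, 37, 24, 9])
n = f.sum()

# Expected counts E_i under a hypothesized N(0,1), using matching class edges
edges = np.array([-np.inf, -1.5, -0.5, 0.5, 1.5, np.inf])
p = np.diff(stats.norm.cdf(edges))   # probability of each class interval
E = n * p

chi2 = np.sum((f - E) ** 2 / E)
crit = stats.chi2.ppf(0.95, df=len(f) - 1)   # m - 1 degrees of freedom
print(chi2 > crit)   # reject the hypothesized distribution if True
```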

14
  • Note.

In the above tests it is assumed that the
residuals are independent with a common variance
of σ².
This is not completely accurate, for this reason:
although the theoretical random errors ei are all
assumed to be independent with the same variance
σ², the residuals are not independent and they
also do not have the same variance.
15
  • They will however be approximately independent
    with common variance if the sample size is large
    relative to the number of parameters in the model.

It is important to keep this in mind when judging
residuals when the number of observations is
close to the number of parameters in the model.
16
  • Time Sequence Plot

The residuals should exhibit a pattern of
independence.
If the data were collected over time, there is a
strong possibility that the random departures
from the model are autocorrelated.
17
  • Namely the random departures for observations
    that were taken at neighbouring points in time
    are autocorrelated.

This autocorrelation can sometimes be seen in a
time sequence plot.
The following three graphs show a sequence of
residuals that are respectively i) positively
autocorrelated , ii) independent and iii)
negatively autocorrelated.
18
i) Positively auto-correlated residuals
19
ii) Independent residuals
20
iii) Negatively auto-correlated residuals
21
  • There are several statistics and statistical
    tests that can also pick out autocorrelation
    amongst the residuals. The most common are

i) The Durbin Watson statistic
ii) The autocorrelation function
iii) The runs test
22
  • The Durbin Watson statistic

The Durbin-Watson statistic, which is used
frequently to detect serial correlation, is
defined by the following formula:
D = Σ i=2..n (ei − ei-1)² / Σ i=1..n ei²
If the residuals are serially correlated, the
differences ei − ei-1 will be stochastically
small. Hence a small value of the Durbin-Watson
statistic will indicate positive autocorrelation.
Large values of the Durbin-Watson statistic, on
the other hand, will indicate negative
autocorrelation. Critical values for this
statistic can be found in many statistical
textbooks.
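The formula is straightforward to compute directly. The two residual sequences below are invented to show the two extremes:

```python
import numpy as np

def durbin_watson(e):
    """D = sum_{i=2}^n (e_i - e_{i-1})^2 / sum_i e_i^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Positively autocorrelated residuals change slowly -> small D (near 0)
slow = [1.0, 0.9, 0.7, 0.4, -0.1, -0.5, -0.8, -1.0]
# Alternating residuals change rapidly -> large D (near 4)
fast = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(round(durbin_watson(slow), 3), durbin_watson(fast))  # 0.156 3.5
```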
23
  • The autocorrelation function

The autocorrelation function at lag k is defined
by
rk = Σ i=1..n−k ei ei+k / Σ i=1..n ei²
This statistic measures the correlation between
residuals that occur a distance k apart in time.
One would expect that residuals that are close in
time are more correlated than residuals that are
separated by a greater distance in time. If the
residuals are independent, then rk should be close
to zero for all values of k. A plot of rk versus k
can be very revealing with respect to the
independence of the residuals. Some typical
patterns of the autocorrelation function are
given below.
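A direct implementation of rk (the alternating residual sequence is invented to show strong negative lag-1 autocorrelation):

```python
import numpy as np

def autocorr(e, k):
    """r_k = sum_{i=1}^{n-k} e_i * e_{i+k} / sum_i e_i^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(e[:-k] * e[k:]) / np.sum(e ** 2)

e = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
print(round(autocorr(e, 1), 3))  # alternating signs -> strongly negative r_1
```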
24
(No Transcript)
25
  • Some typical patterns of the autocorrelation
    function are given below

Autocorrelation pattern for independent
residuals
26
  • Various Autocorrelation patterns for serially
    correlated residuals

27
(No Transcript)
28
  • The runs test

This test uses the fact that the residuals will
oscillate about zero at a normal rate if the
random departures are independent.
If the residuals oscillate slowly about zero,
this is an indication that there is a positive
autocorrelation amongst the residuals.
If the residuals oscillate at a frequent rate
about zero, this is an indication that there is a
negative autocorrelation amongst the residuals.
29
  • In the runs test, one observes the time
    sequence of the signs of the residuals,

e.g. + + − − − + + −
and counts the number of runs (i.e. the number of
periods in which the residuals keep the same sign).
This count should be low if the residuals are
positively correlated and high if they are
negatively correlated.
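Counting runs amounts to counting sign changes (the residual sequence is hypothetical; ties at exactly zero are ignored here for simplicity):

```python
import numpy as np

def count_runs(e):
    """Number of runs: maximal stretches of residuals with the same sign."""
    signs = np.sign(e)
    # a new run starts wherever the sign changes
    return 1 + int(np.sum(signs[1:] != signs[:-1]))

e = [0.5, 0.3, -0.2, -0.7, -0.1, 0.4, 0.9, -0.3]   # signs: + + - - - + + -
print(count_runs(e))  # 4 runs
```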
30
  • Plot Against fitted values and the Predictor
    Variables Xij

If we "step back" from this diagram, and the
residuals behave in a manner consistent with the
assumptions of the model, we obtain the impression
of a horizontal "band" of residuals, which can be
represented by the diagram below.
31
  • Individual observations lying considerably
    outside of this band indicate that the
    observation may be an outlier.

An outlier is an observation that is not
following the normal pattern of the other
observations.
Such an observation can have a considerable
effect on the estimation of the parameters of a
model.
Sometimes the outlier has occurred because of a
typographical error. If this is the case and it
is detected, then a correction can be made.
If the outlier occurs for other (and more
natural) reasons it may be appropriate to
construct a model that incorporates the
occurrence of outliers.
32
  • If our "step back" view of the residuals
    resembled any of those shown below, we should
    conclude that the assumptions about the model are
    incorrect. Each pattern may indicate that a
    different assumption may have to be made to
    explain the abnormal residual pattern.

(Figures: residual patterns a) and b))
33
  • Pattern a) indicates that the variance of the
    random departures is not constant (homogeneous)
    but increases as the value along the horizontal
    axis increases (time, or one of the independent
    variables).

This indicates that a weighted least squares
analysis should be used.
The second pattern, b), indicates that the mean
value of the residuals is not zero.
This is usually because the model (linear or
nonlinear) has not been correctly specified:
linear or quadratic terms that should have been
included in the model have been omitted.
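A weighted least squares fit can be sketched directly from the weighted normal equations. The data and the 1/x² weighting below are hypothetical; the appropriate weights depend on how the variance actually grows:

```python
import numpy as np

# Hypothetical data whose spread grows with x (pattern a)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.2, 3.8, 6.5, 7.1, 11.0, 10.5])

# Weight each observation by 1/variance; here we assume sd proportional
# to x, so w_i = 1/x_i^2 (an illustrative choice, not from the slides)
w = 1.0 / x ** 2

X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
W = np.diag(w)
# Weighted normal equations: (X'WX) b = X'Wy
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(np.round(beta, 3))   # [intercept, slope]
```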
34
Example: Analysis of Residuals
  • Motor Vehicle Data
  • Dependent: mpg
  • Independent: engine size, horsepower, and weight

35
  • When a linear model was fit and the residuals
    examined graphically, the following plot resulted

36
The pattern that we are looking for is
37
  • The pattern that was found is

This indicates a nonlinear relationship.
This can be handled by adding polynomial terms
(quadratic, cubic, quartic, etc.) of the
independent variables or by transforming the
dependent variable.
38
  • Performing the log transformation on the
    dependent variable (mpg) results in the following
    residual plot

There still remains some nonlinearity.
39
The log transformation
40
The Box-Cox transformations
λ = 2
λ = 1
λ = 0
λ = −1
41
  • The log (λ = 0) transformation was not totally
    successful - try moving further down the
    staircase of the family of transformations
  • (λ = −0.5)
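The Box-Cox family itself is easy to implement (a sketch; the data values are hypothetical):

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox family: (y^lam - 1)/lam for lam != 0, log(y) for lam = 0."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

y = np.array([1.0, 2.0, 4.0, 8.0])
print(np.round(box_cox(y, 0), 3))   # natural logs
print(np.round(box_cox(y, 1), 3))   # y - 1: a shift, no change of shape
```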

42
  Try moving a bit further down the staircase of
    the family of transformations (λ = −1.0)

43
  • The results after deleting the outlier are given
    below

44
  • This corresponds to the model

45
  • Checking normality with a P-P plot

46
Example
  • Non-Linear Regression

47
  • In this example we are measuring the amount of a
    compound in the soil
  • 7 days after application
  • 14 days after application
  • 21 days after application
  • 28 days after application
  • 42 days after application
  • 56 days after application
  • 70 days after application
  • 84 days after application

48
  • This is carried out at two test plot locations
  • Craik
  • Tilson

6 measurements per location are made each time
49
The data
50
Graph
51
  • The Model: exponential decay with nonzero
    asymptote (curve labels a and c shown on slide)
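The slides name the model only as "exponential decay with nonzero asymptote"; a common parameterization of that form is assumed below, fit with scipy's nonlinear least squares on synthetic, noise-free data at the deck's measurement days:

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed model form: amount = c + (a - c) * exp(-k * t),
# where a = value at t=0 and c = the nonzero asymptote
def decay(t, a, c, k):
    return c + (a - c) * np.exp(-k * t)

t = np.array([7, 14, 21, 28, 42, 56, 70, 84], dtype=float)  # days after application
data = decay(t, a=10.0, c=2.0, k=0.05)   # synthetic, noise-free "measurements"

# Starting values play the role of the trial-and-error Excel values
a_hat, c_hat, k_hat = curve_fit(decay, t, data, p0=[8.0, 1.0, 0.1])[0]
print(round(a_hat, 3), round(c_hat, 3), round(k_hat, 3))  # recovers 10.0 2.0 0.05
```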
52
Some starting values of the parameters, found by
trial and error in Excel
53
Nonlinear least squares iteration by SPSS (Craik)
54
ANOVA Table (Craik)
Parameter Estimates (Craik)
55
Testing hypotheses: similar to linear regression.
Caution: this statistic has only an approximate F
distribution, holding when the sample size is large
56
  • Example: Suppose we want to test
  • H0: c = 0 against HA: c ≠ 0

Complete model
Reduced model
57
ANOVA Table (Complete model)
ANOVA Table (Reduced model)
58
The Test
59
Use of Dummy Variables
  • Nonlinear Regression

60
  • The Model

61
The data file
62
Nonlinear least squares iteration by SPSS
63
ANOVA Table
Parameter Estimates
64
Testing hypotheses:
Suppose we want to test H0: Δa = a1 − a2 = 0 and
Δk = k1 − k2 = 0
65
  • The Reduced Model

66
ANOVA Table
Parameter Estimates
67
The F Test
Thus we accept the null hypothesis that the
reduced model is correct.
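The full-versus-reduced comparison can be sketched as an extra sum of squares F test. The SSE and degrees-of-freedom values below are hypothetical stand-ins for the numbers in the slide's ANOVA tables:

```python
import numpy as np
from scipy import stats

def extra_ss_f_test(sse_reduced, df_reduced, sse_full, df_full, alpha=0.05):
    """Approximate F test comparing a reduced (H0) model to the full model."""
    num = (sse_reduced - sse_full) / (df_reduced - df_full)
    den = sse_full / df_full
    f = num / den
    crit = stats.f.ppf(1 - alpha, df_reduced - df_full, df_full)
    return f, crit, f > crit

# Hypothetical SSEs and error dfs (the slide's actual numbers are in its tables)
f_stat, crit, reject = extra_ss_f_test(sse_reduced=120.0, df_reduced=94,
                                       sse_full=110.0, df_full=90)
print(round(f_stat, 3), reject)
```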
68
(No Transcript)
69
Factorial Experiments
  • Analysis of Variance
  • Experimental Design

70
  • Dependent variable: Y
  • k categorical independent variables: A, B, C, …
    (the Factors)
  • Let
  • a = the number of categories of A
  • b = the number of categories of B
  • c = the number of categories of C
  • etc.

71
The Completely Randomized Design
  • We form the set of all treatment combinations:
    the set of all combinations of the k factors
  • Total number of treatment combinations:
  • t = a·b·c·…
  • In the completely randomized design, n
    experimental units (test animals, test plots,
    etc.) are randomly assigned to each treatment
    combination.
  • Total number of experimental units: N = nt = n·a·b·c·…

72
The treatment combinations can be thought of as
arranged in a k-dimensional rectangular block
(figure: an a × b grid with rows 1, …, a for
factor A and columns 1, …, b for factor B)
73
(figure: a three-dimensional block with axes A, B, and C)
74
  • The Completely Randomized Design is called
    balanced if the number of observations per
    treatment combination is equal.
  • If the number of observations per treatment
    combination is unequal, the design is called
    unbalanced (resulting in a mathematically more
    complex analysis and computations).
  • If for some of the treatment combinations there
    are no observations, the design is called
    incomplete (some of the parameters - main
    effects and interactions - cannot be estimated).

75
Example
  • In this example we are examining the effect of
  • the level of protein A (High or Low) and
  • the source of protein B (Beef, Cereal, or Pork)
    on weight gains (grams) in rats.

We have n = 10 test animals randomly assigned to
each of k = 6 diets
76
The k = 6 diets are the 6 = 3×2 Level-Source
combinations
  • High - Beef
  • High - Cereal
  • High - Pork
  • Low - Beef
  • Low - Cereal
  • Low - Pork

77
Table: Gains in weight (grams) for rats under six
diets differing in level of protein (High or
Low) and source of protein (Beef, Cereal, or
Pork)

Level of Protein     High Protein              Low Protein
Source of Protein    Beef   Cereal  Pork       Beef   Cereal  Pork
Diet                 1      2       3          4      5       6
                     73     98      94         90     107     49
                     102    74      79         76     95      82
                     118    56      96         90     97      73
                     104    111     98         64     80      86
                     81     95      102        86     98      81
                     107    88      102        51     74      97
                     100    82      108        72     74      106
                     87     77      91         90     67      70
                     117    86      120        95     89      61
                     111    92      105        78     58      82
Mean                 100.0  85.9    99.5       79.2   83.9    78.7
Std. Dev.            15.14  15.02   10.92      13.89  15.71   16.55
78
Example Four factor experiment
  • Four factors are studied for their effect on Y
    (luster of paint film). The four factors are

1) Film Thickness (1 or 2 mils)
2) Drying conditions (Regular or Special)
3) Length of wash (20, 30, 40, or 60 minutes), and
4) Temperature of wash (92°C or 100°C)
Two observations of film luster (Y) are taken for
each treatment combination
79
  • The data is tabulated below

                  Regular Dry              Special Dry
Minutes           92°C       100°C        92°C       100°C
1-mil Thickness
20                3.4  3.4   19.6  14.5   2.1  3.8   17.2  13.4
30                4.1  4.1   17.5  17.0   4.0  4.6   13.5  14.3
40                4.9  4.2   17.6  15.2   5.1  3.3   16.0  17.8
60                5.0  4.9   20.9  17.1   8.3  4.3   17.5  13.9
2-mil Thickness
20                5.5  3.7   26.6  29.5   4.5  4.5   25.6  22.5
30                5.7  6.1   31.6  30.2   5.9  5.9   29.2  29.8
40                5.5  5.6   30.5  30.2   5.5  5.8   32.6  27.4
60                7.2  6.0   31.4  29.6   8.0  9.9   33.5  29.5

80
Notation
  • Let the single observations be denoted by a
    single letter and a number of subscripts:
  • yijk…l
  • The number of subscripts is equal to
  • (the number of factors) + 1
  • 1st subscript = level of first factor
  • 2nd subscript = level of 2nd factor
  • The last subscript denotes different observations
    on the same treatment combination

81
Notation for Means
  • When averaging over one or several subscripts we
    put a bar above the letter and replace those
    subscripts by a dot (•)
  • Example
  • ȳ241•
82
Profile of a Factor
  • Plot of observation means vs. levels of the
    factor.
  • The levels of the other factors may be held
    constant, or we may average over the other levels

83
Summary Table

                     Source of Protein
Level of Protein   Beef     Cereal   Pork     Overall
High               100.00   85.90    99.50    95.13
Low                 79.20   83.90    78.70    80.60
Overall             89.60   84.90    89.10    87.87
84
Profiles of Weight Gain for Source and Level of
Protein
85
Profiles of Weight Gain for Source and Level of
Protein
86
Effects in a Factorial Experiment
87
  • Mean
  • 87.867
  •  

88
  • Main Effects for Factor A (Source of Protein)
  • Beef Cereal Pork
  • 1.733 -2.967 1.233

89
  • Main Effects for Factor B (Level of Protein)
  • High Low
  • 7.267 -7.267
  •  

90
  • AB Interaction Effects
  • Source of Protein
  • Beef Cereal Pork
  • Level High 3.133 -6.267 3.133
  • of Protein Low -3.133 6.267 -3.133
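The effects above can be reproduced from the summary-table cell means:

```python
import numpy as np

# Cell means from the summary table
# (rows: High/Low protein; cols: Beef/Cereal/Pork)
cell = np.array([[100.00, 85.90, 99.50],
                 [ 79.20, 83.90, 78.70]])

grand = cell.mean()                       # overall mean
a_eff = cell.mean(axis=0) - grand         # main effects of A (Source of Protein)
b_eff = cell.mean(axis=1) - grand         # main effects of B (Level of Protein)
# AB interaction: cell mean - grand mean - row effect - column effect
ab_eff = cell - grand - np.add.outer(b_eff, a_eff)

print(round(grand, 3))       # 87.867
print(np.round(a_eff, 3))    # [ 1.733 -2.967  1.233]
print(np.round(b_eff, 3))    # [ 7.267 -7.267]
print(np.round(ab_eff, 3))
```

Note that the interaction effects sum to zero across each row and column, matching the model constraints.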

91
(No Transcript)
92
Example 2
  • Paint Luster Experiment

93
(No Transcript)
94
Table Means and Cell Frequencies
95
Means and Frequencies for the AB Interaction
(Temp - Drying)
96
Profiles showing Temp-Dry Interaction
97
Means and Frequencies for the AD Interaction
(Temp- Thickness)
98
Profiles showing Temp-Thickness Interaction
99
The Main Effect of C (Length)
100
(No Transcript)
101
Additive Factors
(figure: parallel profiles of factor A at the levels of B)
102
Interacting Factors
(figure: non-parallel profiles of factor A at the levels of B)
103
Models for Factorial Experiments
  • Single Factor:
  • yij = μ + αi + εij,  i = 1,2, …, a; j = 1,2, …, n
  • Two Factor:
  • yijk = μ + αi + βj + (αβ)ij + εijk
  •  
  • i = 1,2, …, a; j = 1,2, …, b; k = 1,2, …, n

104
  • Three Factor:
  • yijkl = μ + αi + βj + (αβ)ij + γk + (αγ)ik +
    (βγ)jk + (αβγ)ijk + εijkl
  • = μ + αi + βj + γk + (αβ)ij + (αγ)ik + (βγ)jk +
    (αβγ)ijk + εijkl
  •  
  • i = 1,2, …, a; j = 1,2, …, b; k = 1,2, …, c;
    l = 1,2, …, n

105
  • Four Factor:
  • yijklm = μ + αi + βj + (αβ)ij + γk + (αγ)ik +
    (βγ)jk + (αβγ)ijk + δl + (αδ)il + (βδ)jl + (αβδ)ijl +
    (γδ)kl + (αγδ)ikl + (βγδ)jkl + (αβγδ)ijkl +
    εijklm
  • = μ + αi + βj + γk + δl + (αβ)ij + (αγ)ik + (βγ)jk +
    (αδ)il + (βδ)jl + (γδ)kl + (αβγ)ijk + (αβδ)ijl +
    (αγδ)ikl + (βγδ)jkl + (αβγδ)ijkl + εijklm
  •  
  • i = 1,2, …, a; j = 1,2, …, b; k = 1,2, …, c;
    l = 1,2, …, d; m = 1,2, …, n
  • where 0 = Σ αi = Σ βj = Σ (αβ)ij = Σ γk = Σ (αγ)ik =
    Σ (βγ)jk = Σ (αβγ)ijk = Σ δl = Σ (αδ)il = Σ (βδ)jl =
    Σ (αβδ)ijl = Σ (γδ)kl = Σ (αγδ)ikl = Σ (βγδ)jkl =
    Σ (αβγδ)ijkl
  • and Σ denotes the summation over any of the
    subscripts.

106
Estimation of Main Effects and Interactions
  • Estimator of the main effect of a factor:

Mean at level i of the factor − overall mean
  • Estimator of a k-factor interaction effect at a
    combination of levels of the k factors:

Mean at the combination of levels of the k
factors − the sum of all means at combinations of
(k−1) of those factor levels + the sum of all means
at combinations of (k−2) of those factor levels −
etc.
107
Example
  • The main effect of factor B at level j in a four
    factor (A, B, C and D) experiment is estimated by
  • ȳ•j••• − ȳ•••••
  • The two-factor interaction effect between factors
    B and C when B is at level j and C is at level k
    is estimated by
  • ȳ•jk•• − ȳ•j••• − ȳ••k•• + ȳ•••••

108
  • The three-factor interaction effect between
    factors B, C and D when B is at level j, C is at
    level k and D is at level l is estimated by
  • ȳ•jkl• − ȳ•jk•• − ȳ•j•l• − ȳ••kl• + ȳ•j••• +
    ȳ••k•• + ȳ•••l• − ȳ•••••
  • Finally, the four-factor interaction effect
    between factors A, B, C and D when A is at level
    i, B is at level j, C is at level k and D is at
    level l is estimated by the analogous alternating
    sum over all lower-order combinations of the four
    levels

109
  • Definition
  • A factor is said to not affect the response if
    the profile of the factor is horizontal for all
    combinations of levels of the other factors:
  • no change in the response when you change the
    levels of the factor (true for all combinations
    of levels of the other factors)
  • Otherwise the factor is said to affect the
    response

110
  • Definition
  • Two (or more) factors are said to interact if
    changes in the response when you change the level
    of one factor depend on the level(s) of the other
    factor(s).
  • Profiles of the factor for different levels of
    the other factor(s) are not parallel
  • Otherwise the factors are said to be additive.
  • Profiles of the factor for different levels of
    the other factor(s) are parallel.

111
  • If two (or more) factors interact, each factor
    affects the response.
  • If two (or more) factors are additive, it still
    remains to be determined if the factors affect
    the response
  • In factorial experiments we are interested in
    determining
  • which factors affect the response and
  • which groups of factors interact.

112
  • The testing in factorial experiments
  • Test first the higher order interactions.
  • If an interaction is present there is no need to
    test lower order interactions or main effects
    involving those factors: all factors in the
    interaction affect the response and they interact.
  • The testing continues with lower order
    interactions and main effects for factors which
    have not yet been determined to affect the
    response.

119
Anova Table entries
  • Sum of squares = (sum of the squared interaction
    (or main) effects being tested) × (product of the
    sample size and the numbers of levels of the
    factors not included in the interaction)
  • Degrees of freedom: df = product of (number of
    levels − 1) of the factors included in the
    interaction.

120
  • Mean
  • 87.867
  •  

121
  • Main Effects for Factor A (Source of Protein)
  • Beef Cereal Pork
  • 1.733 -2.967 1.233

122
  • Main Effects for Factor B (Level of Protein)
  • High Low
  • 7.267 -7.267
  •  

123
  • AB Interaction Effects
  • Source of Protein
  • Beef Cereal Pork
  • Level High 3.133 -6.267 3.133
  • of Protein Low -3.133 6.267 -3.133
