Topic 9: Remedies - PowerPoint PPT Presentation

Provided by: georgep56
Transcript and Presenter's Notes

1
Topic 9 Remedies
2
Outline
  • Review diagnostics for residuals
  • Discuss remedies
  • Nonlinear relationship
  • Nonconstant variance
  • Nonnormal distribution
  • Outliers

3
Diagnostics for residuals
  • Look at residuals to find serious violations of
    the model assumptions
  • nonlinear relationship
  • nonconstant variance
  • nonnormal errors
  • presence of outliers
  • a strongly skewed distribution

4
Recommendations for checking assumptions
  • Plot Y vs X (is it a linear relationship?)
  • Look at distribution of residuals
  • Plot residuals vs X, time, or any other potential
    explanatory variable
  • Use the i=sm option in the symbol statement to get
    smoothed curves

5
Plots of Residuals
  • Plot residuals vs
  • Time (order)
  • Explanatory variables
  • Look for
  • nonrandom patterns
  • outliers (unusual observations)

6
Residuals vs Order
  • Pattern in plot suggests dependent errors / lack
    of independence
  • Pattern usually a linear or quadratic trend
    and/or cyclical
  • If you are interested read NKNW pp 104-105

7
Tests for normality
  • H0: data are an i.i.d. sample from a normal
    population
  • Ha: data are not an i.i.d. sample from a normal
    population
  • NKNW (p 111) suggest a correlation test that
    requires a table look-up
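
The correlation test mentioned above can be sketched in a few lines. This is a minimal Python illustration, not the course's SAS code: it correlates the ordered residuals with approximate expected normal scores, using the plotting positions (k - 0.375)/(n + 0.25). The function names are hypothetical, and the critical value still has to come from the table look-up.

```python
from math import sqrt
from statistics import NormalDist, mean


def normal_scores(n):
    """Approximate expected standard normal order statistics for a sample of size n."""
    nd = NormalDist()
    return [nd.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / sqrt(saa * sbb)


def normality_correlation(residuals):
    """Correlation between ordered residuals and their expected normal scores."""
    ordered = sorted(residuals)
    return pearson_r(ordered, normal_scores(len(ordered)))
```

A correlation close to 1 is consistent with normal errors; a value below the tabled critical value for the sample size is evidence against normality. Multiplying the scores by sqrt(MSE) would not change the correlation, so that factor is omitted here.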

8
Tests for normality
  • We have several choices for a significance
    testing procedure
  • Proc univariate with the normal option provides
    four tests
  • proc univariate normal;
  • Shapiro-Wilk is a common choice

9
Test                  Statistic        p Value
Shapiro-Wilk          W      0.978     Pr < W       0.8626
Kolmogorov-Smirnov    D      0.095     Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.033     Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq   0.207     Pr > A-Sq   >0.2500
10
Other tests for model assumptions
  • Durbin-Watson test for serially correlated errors
    (NKNW p 110)
  • Modified Levene test for homogeneity of variance
    (NKNW p 112-114)
  • Breusch-Pagan test for homogeneity of variance
    (NKNW p 115)
  • For SAS commands see nknw110.sas
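
As a rough illustration of the first test in this list, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The Python sketch below only computes the statistic (the function name is hypothetical); the decision bounds come from tables and are not reproduced here.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den
```

Values near 2 are consistent with uncorrelated errors, values well below 2 suggest positive serial correlation, and values well above 2 suggest negative serial correlation.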

11
Plots vs significance test
  • Plots are more likely to suggest a remedy
  • Significance test results are very dependent on
    the sample size; with sufficiently large samples
    we can reject most null hypotheses

12
Lack of fit
  • When we have repeat observations at different
    values of X, we can do a significance test for
    nonlinearity
  • Browse through NKNW 3.7
  • We will do details when we get to NKNW 17.9, p
    742
  • Basic idea is to compare two models
  • Gplot with a smooth is a better (i.e., simpler)
    approach
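
The "compare two models" idea can be sketched as follows, assuming the usual formulation: the reduced model is the straight line, and the full model fits a separate mean at each distinct X. This Python sketch uses a hypothetical function name and is not the SAS used in the course.

```python
from collections import defaultdict


def lack_of_fit_f(x, y):
    """Return (F, df_lack_of_fit, df_pure_error) for a straight-line fit.

    Requires repeat observations at some X values and at least 3 distinct X values.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    # Error sum of squares for the reduced (linear) model.
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

    # Pure error: variation of Y around its own group mean at each X.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    sspe = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        sspe += sum((yi - m) ** 2 for yi in ys)

    c = len(groups)          # number of distinct X values
    sslf = sse - sspe        # lack-of-fit sum of squares
    f = (sslf / (c - 2)) / (sspe / (n - c))
    return f, c - 2, n - c
```

A large F relative to the F distribution with the returned degrees of freedom suggests that the linear model does not describe the group means well.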

13
Nonlinear relationships
  • We can model many nonlinear relationships with
    linear models, some have several explanatory
    variables (i.e., multiple linear regression)
  • $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$ (quadratic)
  • $Y = \beta_0 + \beta_1 \log(X) + \varepsilon$

14
Nonlinear Relationships
  • Sometimes can transform a nonlinear equation into
    a linear equation
  • Consider $Y = \beta_0 \exp(\beta_1 X) + \varepsilon$
  • Can form linear model using log
  • $\log(Y) = \log(\beta_0) + \beta_1 X + \varepsilon$
  • Note that we have changed our assumption about
    the error
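
A minimal Python sketch of this linearization, with hypothetical function names: regress log(Y) on X by ordinary least squares, then back-transform the intercept. Fitting on the log scale treats the error as additive in log(Y), which is the changed error assumption noted above.

```python
from math import exp, log


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def fit_exponential(x, y):
    """Fit Y = b0 * exp(b1 * X) by regressing log(Y) on X; requires Y > 0."""
    log_b0, b1 = fit_line(x, [log(v) for v in y])
    return exp(log_b0), b1
```

For noise-free data generated from Y = 2 exp(0.5 X), the function recovers b0 = 2 and b1 = 0.5 up to rounding.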

15
Nonlinear Relationship
  • We can perform a nonlinear regression analysis
  • NKNW Chapter 13
  • SAS PROC NLIN

16
Nonconstant variance
  • Sometimes we model the way in which the error
    variance changes
  • for example, it may be linearly related to X
  • We can then use a weighted analysis
  • NKNW 10.1
  • Use a weight statement in PROC REG
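
A weighted analysis for simple linear regression can be sketched as below. This is a Python illustration with a hypothetical function name, not PROC REG itself; it assumes the weights are chosen inversely proportional to the error variance, for example w = 1/X if the variance is assumed proportional to X.

```python
def weighted_fit(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1).

    Minimizes sum of w_i * (y_i - b0 - b1 * x_i) ** 2.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1
```

With equal weights this reduces to ordinary least squares; unequal weights give less influence to observations with larger assumed variance.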

17
Nonnormal errors
  • Transformations often help
  • Use a procedure that allows different
    distributions for the error term
  • SAS PROC GENMOD

18
GENMOD
  • Possible distributions of Y
  • Binomial (Y/N or percentage data)
  • Poisson (Count data)
  • Gamma (exponential)
  • Inverse Gaussian
  • Negative binomial
  • Multinomial
  • Specify a link function for E(Y)

19
Ladder of Reexpression (transformations)
Transformation is $x^p$
[Figure: ladder with rungs at p = 1.5, 1.0, 0.5, 0.0, -0.5, -1.0]
20
Circle of Transformations
[Figure: circle on X and Y axes; the four arcs are labeled "X up, Y up", "X down, Y up", "X up, Y down", and "X down, Y down"]
21
Box-Cox Transformations
  • Also called power transformations
  • These transformations adjust for nonnormality and
    nonconstant variance
  • $Y' = Y^\lambda$ or $Y' = (Y^\lambda - 1)/\lambda$
  • In the second form, the limit as $\lambda$ approaches
    zero is the (natural) log

22
Important Special Cases
  • $\lambda = 1$, $Y' = Y^1$, no transformation
  • $\lambda = .5$, $Y' = Y^{1/2}$, square root
  • $\lambda = -.5$, $Y' = Y^{-1/2}$, one over square root
  • $\lambda = -1$, $Y' = Y^{-1} = 1/Y$, inverse
  • $\lambda = 0$, $Y' = \log(Y)$, (natural) log of Y
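
The second Box-Cox form, (Y^λ - 1)/λ with the log as its limiting case, can be sketched as a small Python function. The name `box_cox` and the tolerance `eps` are illustrative choices; the tolerance mirrors the 1E-8 check used in the SAS code later in the deck.

```python
from math import log


def box_cox(y, lam, eps=1e-8):
    """Box-Cox transform of a positive value y: (y**lam - 1)/lam, or log(y) at lam = 0."""
    if abs(lam) < eps:
        return log(y)
    return (y ** lam - 1) / lam
```

In this form the special cases differ from the plain powers only by a shift and a scale: at λ = .5 the result is 2(√Y - 1), and at λ = -1 it is 1 - 1/Y.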

23
Box-Cox Details
  • We can estimate $\lambda$ by including it as a parameter
    in a nonlinear model
  • $Y^\lambda = \beta_0 + \beta_1 X + \varepsilon$
  • and using the method of maximum likelihood
  • Details are in NKNW p 132-133
  • SAS code is in nknw132.sas

24
Box-Cox Solution
  • Standardized transformed Y is
  • $K_1 (Y^\lambda - 1)$ if $\lambda \neq 0$
  • $K_2 \log(Y)$ if $\lambda = 0$
  • where $K_2 = (\prod_i Y_i)^{1/n}$ (the geometric mean)
  • and $K_1 = 1 / (\lambda K_2^{\lambda - 1})$
  • Run regressions with X as explanatory variable
  • estimated $\lambda$ minimizes SSE
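
The procedure on this slide can be sketched in Python as a grid search. This is an illustration only, not the course's SAS program: it uses the age and plasma values from the data step in this deck, the same grid of λ from -1.0 to 1.0 in steps of 0.1, and hypothetical function names.

```python
from math import exp, log

AGE = [0] * 5 + [1] * 5 + [2] * 5 + [3] * 5 + [4] * 5
PLASMA = [
    13.44, 12.84, 11.91, 20.09, 15.60,
    10.11, 11.38, 10.28, 8.96, 8.59,
    9.83, 9.00, 8.65, 7.85, 8.88,
    7.94, 6.01, 5.14, 6.90, 6.77,
    4.86, 5.10, 5.67, 5.75, 6.23,
]


def sse_linear(x, y):
    """Error sum of squares from the least squares line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


def box_cox_sse(x, y, lam):
    """SSE of the regression of the standardized Box-Cox transform of y on x."""
    k2 = exp(sum(log(v) for v in y) / len(y))   # geometric mean of y
    if abs(lam) < 1e-8:
        z = [k2 * log(v) for v in y]
    else:
        k1 = 1 / (lam * k2 ** (lam - 1))
        z = [k1 * (v ** lam - 1) for v in y]
    return sse_linear(x, z)


# Grid of lambda values from -1.0 to 1.0 by 0.1, built from integers to avoid drift.
grid = [i / 10 for i in range(-10, 11)]
sse_by_lambda = {lam: box_cox_sse(AGE, PLASMA, lam) for lam in grid}
best_lambda = min(sse_by_lambda, key=sse_by_lambda.get)
```

The standardization by K1 and K2 is what makes the SSE values comparable across λ. The SAS output listed in this deck reports its smallest SSE, 30.5596, at λ = -0.5.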

25
data a1; input age plasma @@; cards;
0 13.44 0 12.84 0 11.91 0 20.09 0 15.60
1 10.11 1 11.38 1 10.28 1  8.96 1  8.59
2  9.83 2  9.00 2  8.65 2  7.85 2  8.88
3  7.94 3  6.01 3  5.14 3  6.90 3  6.77
4  4.86 4  5.10 4  5.67 4  5.75 4  6.23
;
26
(No Transcript)
27
The first part of the program gets the geometric
mean
data a2; set a1; lplasma=log(plasma);
proc univariate data=a2 noprint; var lplasma;
  output out=a3 mean=meanl;   /* mean of log(plasma) */
28
data a4; set a2; if _n_ eq 1 then set a3;
  keep age yl l;
  k2=exp(meanl);                 /* geometric mean of plasma */
  do l = -1.0 to 1.0 by .1;
    k1=1/(l*k2**(l-1));
    yl=k1*(plasma**l -1);
    if abs(l) lt 1E-8 then yl=k2*log(plasma);   /* lambda = 0 case */
    output;
  end;
29
proc sort data=a4 out=a4; by l;
proc reg data=a4 noprint outest=a5; model yl=age;
  by l;                          /* one regression per lambda */
data a5; set a5; n=25; p=2;
  sse=(n-p)*(_rmse_)**2;
proc print data=a5; var l sse;
30
Obs      l       sse
  1   -1.0   33.9089
  2   -0.9   32.7044
  3   -0.8   31.7645
  4   -0.7   31.0907
  5   -0.6   30.6868
  6   -0.5   30.5596
  7   -0.4   30.7186
  8   -0.3   31.1763
  9   -0.2   31.9487
 10   -0.1   33.0552
31
symbol1 v=none i=join;
proc gplot data=a5; plot sse*l;
run;
32
(No Transcript)
33
data a1; set a1;
  tplasma = plasma**(-.5);
  tage = (age+.5)**(-.5);
symbol1 v=circle i=sm50;
proc gplot; plot tplasma*age;
proc sort; by tage;
proc gplot; plot tplasma*tage;
run;
34
(No Transcript)
35
(No Transcript)
36
Box Cox Procedure
There is a fairly new procedure that will find
the Box-Cox transformation
proc transreg data=a1;
  model boxcox(plasma)=identity(age);
run;
37
Transformation Information for BoxCox(plasma)

Lambda    R-Square    Log Like
 -2.50      0.76      -17.0444
 -2.00      0.80      -12.3665
 -1.50      0.83       -8.1127
 -1.00      0.86       -4.8523
 -0.50      0.87       -3.5523 <
  0.00      0.85       -5.0754
  0.50      0.82       -9.2925
  1.00      0.75      -15.2625
  1.50      0.67      -22.1378
  2.00      0.59      -29.4720
  2.50      0.50      -37.0844

< - Best Lambda
* - Confidence Interval
+ - Convenient Lambda
38
Background Reading
  • Sections 3.4 - 3.7 describe significance tests
    for assumptions (read it if you are interested).
  • Box-Cox transformation is in nknw132.sas
  • Read sections 4.1, 4.2, 4.4, 4.5, and 4.6