Title: Topic 18: Remedies
Topic 18: Remedies
Outline
- Regression diagnostics
- Remedial Measures
- Weighted regression
- Ridge Regression
- Robust Regression
- Bootstrapping
- Validation
- Qualitative predictor
- Piecewise Linear Model
Regression Diagnostics: Summary
- Check normality of the residuals with a normal quantile plot
- Plot the residuals versus predicted values, versus each of the Xs, and (when appropriate) versus time
- Examine the partial regression plots
- If there appears to be a curvilinear pattern, generate the graphics version with a smooth
Regression Diagnostics: Summary
- Examine
  - the studentized deleted residuals (RSTUDENT in the output)
  - the hat matrix diagonals
  - DFFITS, Cook's D, and the DFBETAS
- Check observations that are extreme on these measures relative to the other observations
Regression Diagnostics: Summary
- Examine the tolerance for each X
- If there are variables with low tolerance, you need to do some model building
  - Recode variables
  - Variable selection
Remedial measures
- Weighted least squares
- Ridge regression
- Robust regression
- Nonparametric regression
- Bootstrapping
Maximum Likelihood
Weighted regression
- Maximization of L with respect to the βs
- Equivalent to minimization of a weighted sum of squared residuals (written out below)
- Weight of each observation: wi = 1/σi²
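For the simple linear model with independent Yi ~ N(β0 + β1Xi, σi²), the likelihood referred to on the Maximum Likelihood slide and the criterion minimized here are, written out:

L(\beta_0, \beta_1) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\!\left( -\frac{(Y_i - \beta_0 - \beta_1 X_i)^2}{2\sigma_i^2} \right)

Maximizing L over the βs is equivalent to minimizing

Q_w = \sum_{i=1}^{n} w_i (Y_i - \beta_0 - \beta_1 X_i)^2, \qquad w_i = \frac{1}{\sigma_i^2}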
Weighted least squares
- Least squares problem is to minimize the sum of wi times the squared residual for case i (similar to MLE)
- Computations are easy: use the weight statement in proc reg
- bw = (X′WX)⁻¹(X′WY), where W is a diagonal matrix of the weights
- The problem in many cases is to determine the weights
Determination of weights
- Find a relationship between the absolute residual and another variable and use this as a model for the standard deviation
- Similarly for the squared residual and the variance
- Use grouped data or approximately grouped data to estimate the variance
Determination of weights
- With a model for the standard deviation or the variance, we can approximate the optimal weights
- Optimal weights are proportional to the inverse of the variance (in symbols below)
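With the absolute-residual approach, the estimated weights are

\hat{w}_i = \frac{1}{\hat{s}_i^{\,2}}

where ŝi is the fitted value for case i from regressing the absolute residuals |ei| on the predictor; with the squared-residual approach, the fitted variance replaces ŝi².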
NKNW Example
- NKNW p 406
- Y is diastolic blood pressure
- X is age
- n = 54 healthy adult women aged 20 to 60 years old
Get the data and check it

data a1;
  infile '../data/ch10ta01.dat';
  input age diast;
proc print data=a1;
run;
Plot the relationship

symbol1 v=circle i=sm70;
proc gplot data=a1;
  plot diast*age;
run;
Diastolic bp vs age
Run the regression

proc reg data=a1;
  model diast=age;
  output out=a2 r=resid;
run;
Regression output

Source   DF   F Value   Pr > F
Model     1     35.79   <.0001
Error    52
Total    53
Regression output

Root MSE          8.14   R-Square   0.40
Dependent Mean    79.1   Adj R-Sq   0.39
Coeff Var         10.2
Regression output

Variable    DF   Estimate   Std Error   t Value   Pr > |t|
Intercept    1       56.1        3.9      14.06     <.0001
age          1       0.58       0.09       5.98     <.0001
Use the output data set to get the absolute and squared residuals

data a2; set a2;
  absr = abs(resid);
  sqrr = resid*resid;
run;
Do the plots with a smooth

proc gplot data=a2;
  plot (resid absr sqrr)*age;
run;
Residuals vs age
Absolute value of the residuals vs age
Squared residuals vs age
Predict the standard deviation (absolute value of the residual)

proc reg data=a2;
  model absr=age;
  output out=a3 p=shat;
run;

Note that a3 has the predicted standard deviations (shat).
Compute the weights

data a3; set a3;
  wt = 1/(shat*shat);
run;
Regression with weights

proc reg data=a3;
  model diast=age / clb;
  weight wt;
run;
Output

Source   DF   F Value   Pr > F
Model     1     56.64   <.0001
Error    52
Total    53
Output

Root MSE           1.21302   R-Square   0.5214
Dependent Mean    73.55134   Adj R-Sq   0.5122
Coeff Var          1.64921
Output

Variable    Estimate   Std Error   t Value   Pr > |t|
Intercept       55.5         2.5     22.04     <.0001
age             0.59        0.07      7.53     <.0001
Ridge regression
- Similar to a very old idea in numerical analysis
- If (X′X) is difficult to invert (near singular), then approximate by inverting (X′X + kI)
- Estimators of the coefficients are biased but more stable
- For some value of k, the ridge regression estimator has a smaller mean square error than the ordinary least squares estimator
- Interesting, but has not turned out to be a useful method in practice
- Ridge (k) is available as an option in proc reg (sketch below)
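A minimal sketch of how a ridge trace might be requested (the data set a1 and predictors x1-x3 here are hypothetical; in current SAS the RIDGE= list is given on the proc reg statement together with OUTEST=):

proc reg data=a1 outest=ridge_est outvif ridge=0 to 0.10 by 0.01;
  model y = x1 x2 x3;   * coefficients and VIFs are saved for each value of k;
run;
proc print data=ridge_est;
run;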
Robust regression
- Basic idea is to have a procedure that is not sensitive to outliers
- Alternatives to least squares minimize
  - the sum of absolute values of the residuals
  - the median of the squares of the residuals
- Do weighted regression with weights based on residuals, and iterate (sketch below)
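A minimal sketch using proc robustreg (a newer SAS procedure, not used elsewhere in these notes), whose M-estimation option carries out exactly this kind of iteratively reweighted fit; the blood pressure data set a1 is assumed:

proc robustreg data=a1 method=m;
  model diast = age;   * Huber-type M-estimation via iteratively reweighted least squares;
run;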
Nonparametric regression
- Several versions
- We have used i=sm70
- Interesting theory
- All versions have some smoothing parameter, similar to the 70 in i=sm70 (sketch below)
- Confidence intervals and significance tests not fully developed
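A minimal sketch with proc loess, another nonparametric smoother not used in the original example; smooth= plays the role of the smoothing parameter:

proc loess data=a1;
  model diast = age / smooth=0.5;   * larger values of smooth= give a smoother fit;
run;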
Bootstrap
- Very important theoretical development that has had a major impact on applied statistics
- Based on simulation
- Sample with replacement from the data or residuals and get the distribution of the quantity of interest (sketch below)
- CI usually based on quantiles of the sampling distribution
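A minimal sketch of a case-resampling bootstrap for the age slope in the blood pressure example (the number of replicates and the seed are arbitrary choices):

proc surveyselect data=a1 out=boot method=urs samprate=1
                  reps=1000 outhits seed=4321;
run;
proc reg data=boot outest=est noprint;
  model diast = age;
  by replicate;            * one fit per bootstrap sample;
run;
proc univariate data=est noprint;
  var age;                 * bootstrap distribution of the slope;
  output out=ci pctlpts=2.5 97.5 pctlpre=slope_;
run;
proc print data=ci;        * 95 percent percentile interval;
run;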
Model validation
- Three approaches to checking the validity of the model
  - Collect new data: does it fit the model?
  - Compare with theory, other data, simulation
  - Use some of the data for the basic analysis and some for the validity check (sketch below)
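A minimal sketch of the data-splitting idea for the blood pressure example (a hypothetical 50/50 random split; cases with a missing response are left out of the fit but still receive predicted values):

data both;
  set a1;
  if ranuni(123) < 0.5 then diast_fit = diast;   * training half;
  else diast_fit = .;                            * validation half, held out;
run;
proc reg data=both;
  model diast_fit = age;
  output out=pred p=yhat;
run;
data check;
  set pred;
  if diast_fit = . then sqerr = (diast - yhat)**2;   * held-out cases only;
run;
proc means data=check mean;   * mean squared prediction error;
  var sqerr;
run;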
One qualitative explanatory variable
- Indicator (or dummy) variables have the value 0 when the quality is absent and 1 when the quality is present
- Examples include
  - Gender as an explanatory variable
  - Placebo versus control
Binary predictor
- X1 has values 0 and 1 corresponding to two different groups
- X2 is a continuous variable
- Y = β0 + β1X1 + β2X2 + β3X1X2 + ε
- For X1 = 0: Y = β0 + β2X2 + ε
- For X1 = 1: Y = (β0 + β1) + (β2 + β3)X2 + ε
Binary predictor
- H0: β1 = β3 = 0 tests the hypothesis that the lines are the same (sketch below)
- H0: β1 = 0 tests equal intercepts
- H0: β3 = 0 tests equal slopes
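A minimal sketch of fitting and testing this model in proc reg (the data set b1 and the variable names y, x1, x2 are hypothetical):

data b2; set b1;
  x1x2 = x1*x2;                        * interaction term;
run;
proc reg data=b2;
  model y = x1 x2 x1x2;
  samelines: test x1 = 0, x1x2 = 0;    * H0: beta1 = beta3 = 0 (identical lines);
  sameslope: test x1x2 = 0;            * H0: beta3 = 0 (equal slopes);
run;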
More models
- If a categorical (qualitative) variable has k possible values, we need k-1 indicator variables (sketch below)
- These can be defined in many different ways; we will do this in Chapter 16
- We also can have several categorical explanatory variables, interactions, etc.
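A minimal sketch of one such coding for a hypothetical three-level variable group (levels 1, 2, 3), using level 1 as the reference group:

data c2; set c1;
  g2 = (group = 2);   * 1 for group 2, else 0;
  g3 = (group = 3);   * 1 for group 3, else 0;
run;
proc reg data=c2;
  model y = g2 g3 x;  * x is a continuous predictor;
run;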
Piecewise Linear Model
- At some (known) point or points, the slope of the relationship changes
- Consider NKNW p 476 (n = 8)
- Y is unit cost
- X1 is lot size
- The slope is allowed to change at X1 = 500
Plot the data
Model
- Our model has
  - An intercept
  - A coefficient for lot size (the slope)
  - An additional explanatory variable that will add a constant to the slope whenever lot size is greater than 500 (written out below)
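Written out (the variable cslope defined in the next data step is (X1 - 500)+, i.e. X1 - 500 when lot size exceeds 500 and 0 otherwise):

Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 (X_{i1} - 500)_{+} + \varepsilon_i

so the slope is β1 for lot sizes up to 500 and β1 + β2 beyond 500.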
Data step

data a1; set a1;
  if lotsize le 500 then cslope = 0;
  if lotsize gt 500 then cslope = lotsize - 500;
run;
Check the data

Obs    cost   lotsize   cslope
  1    2.57       650      150
  2    4.40       340        0
  3    4.52       400        0
  4    1.39       800      300
  5    4.75       300        0
  6    3.55       570       70
  7    2.49       720      220
  8    3.77       480        0
Run the regression

proc reg data=a1;
  model cost=lotsize cslope;
  output out=a2 p=costhat;
run;
Output

Source   DF   F Value   Pr > F
Model     2     79.06   0.0002
Error     5
Total     7
Output

Root MSE         0.24494   R-Square   0.9693
Dependent Mean   3.43000   Adj R-Sq   0.9571
Coeff Var        7.14106
Output

Variable    Estimate   t Value   Pr > |t|
Intercept    5.89545      9.76     0.0002
lotsize     -0.00395     -2.65     0.0454
cslope      -0.00389     -1.69     0.1528
Plot data with fit

symbol1 v=circle i=none c=black;
symbol2 v=none i=join c=black;
proc sort data=a2; by lotsize;
proc gplot data=a2;
  plot (cost costhat)*lotsize / overlay;
run;
The plot
Background Reading
- We used programs NKNW406.sas, NKNW459.sas, and
NKNW476.sas to generate the output for today.