Title: Diagnostics
1Diagnostics
- Checking Assumptions and Bad Data
2Questions
- What is the linearity assumption? How can you tell if it seems met?
- What is homoscedasticity (heteroscedasticity)? How can you tell if it's a problem?
- What is an outlier?
- What is leverage?
- What is a residual?
- How can you use residuals in assuring that the regression model is a good representation of the data?
- Why consider a standardized residual?
- What is a studentized residual?
3Linear Model
- Linear relations between X and Y
- Normal distribution of error of prediction
- Homoscedasticity (homogeneity of error in Y
across levels of X)
4Good-Looking Graph
No apparent departures from line.
5Same Data, Different Graph
No systematic relations between X and residuals.
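The residual check described here can be sketched in Python. This is a minimal illustration, assuming NumPy is available and using the seven (X, Y) pairs from the Example slide; the variable names are illustrative only.

```python
import numpy as np

# Seven (X, Y) pairs taken from the Example slide.
x = np.array([2, 3, 3, 4, 4, 5, 8], dtype=float)
y = np.array([2, 3, 1, 1, 3, 2, 8], dtype=float)

# Least-squares line; for degree 1, polyfit returns [slope, intercept].
b, a = np.polyfit(x, y, 1)
resid = y - (a + b * x)

# List residuals in order of X so a bend or a fan shape is easy to spot.
for xi, ei in sorted(zip(x, resid)):
    print(f"X = {xi:.0f}   residual = {ei:+.4f}")

# With an intercept in the model, least-squares residuals average zero and
# have zero linear correlation with X by construction, so the visual check
# is for curvature or changing spread rather than a straight-line trend.
mean_resid = resid.mean()
corr_x_resid = np.corrcoef(x, resid)[0, 1]
print(f"mean residual = {mean_resid:.2e}, r(X, residual) = {corr_x_resid:.2e}")
```

In practice the same residuals would usually be plotted against X (or against the predicted values) rather than printed.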
6Problem with Linearity
7Problem with Heteroscedasticity
Common problem when Y
8Outliers
Outlier = pathological point
9Review
- What is the linearity assumption? How can you tell if it seems met?
- What is homoscedasticity (heteroscedasticity)? How can you tell if it's a problem?
- What is an outlier?
10Residuals
- Zresid (standardized residual)
- Look for large values (some say |z| > 2)
- Studentized residual (Student Residual)
The studentized residual takes into account the distance of the point's X from the mean of X. The farther X is from the mean, the smaller the standard error of the residual and the larger the studentized residual. Look for large values.
Also consider the studentized deleted residual (RStudent).
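The three residuals named above can be computed side by side. The sketch below is one common set of definitions, assumed here rather than taken from the slides: Zresid divides the raw residual by the standard error of estimate, the studentized residual also divides by sqrt(1 - h) where h is the leverage, and RStudent re-estimates the standard error with the case deleted. The data are the seven pairs from the Example slide.

```python
import numpy as np

x = np.array([2, 3, 3, 4, 4, 5, 8], dtype=float)
y = np.array([2, 3, 1, 1, 3, 2, 8], dtype=float)
n = len(x)

b, a = np.polyfit(x, y, 1)          # slope, intercept
e = y - (a + b * x)                 # raw residuals
sse = np.sum(e ** 2)

# Standard error of estimate (n - 2 df for one predictor plus intercept).
s = np.sqrt(sse / (n - 2))

# Leverage for simple regression: 1/N + (x - mean)^2 / sum of squares of X.
dev = x - x.mean()
h = 1 / n + dev ** 2 / np.sum(dev ** 2)

zresid = e / s                              # standardized residual
student = e / (s * np.sqrt(1 - h))          # studentized residual

# Deleted standard error: remove case i's contribution, lose one more df.
s_del = np.sqrt((sse - e ** 2 / (1 - h)) / (n - 3))
rstudent = e / (s_del * np.sqrt(1 - h))     # studentized deleted residual

for row in zip(y, zresid, student, rstudent):
    print("Y = {:.0f}  Zresid = {:+.3f}  Student = {:+.3f}  RStudent = {:+.4f}".format(*row))
```

For the last case (Y = 8), Zresid is well under 1 while RStudent is above 2.7, which shows why the leverage adjustment matters for points far from the mean of X.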
11Influence Analysis
- Leverage
- Leverage is an index of the importance of an observation to a regression analysis.
- Function of X only
- Large deviations from the mean of X are influential
- Maximum is 1; minimum is 1/N
- Average value is (k+1)/N, where k is the number of IVs
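The leverage properties listed above can be checked numerically. This sketch assumes the usual definition of leverage as the diagonal of the hat matrix H = X (X'X)^-1 X', with a column of ones for the intercept; the helper name `leverage` is hypothetical, and the X values are those from the Example slide.

```python
import numpy as np

def leverage(ivs):
    """Diagonal of the hat matrix for a model with an intercept plus the given IVs."""
    ivs = np.asarray(ivs, dtype=float)
    if ivs.ndim == 1:
        ivs = ivs[:, None]                      # single IV: make it a column
    X = np.column_stack([np.ones(len(ivs)), ivs])
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return np.diag(H)

x = np.array([2, 3, 3, 4, 4, 5, 8], dtype=float)
n, k = len(x), 1                                # N cases, k IVs

h = leverage(x)
print("leverage:", np.round(h, 4))
print("mean leverage:", h.mean(), " (k+1)/N:", (k + 1) / n)
print("min:", h.min(), " 1/N:", 1 / n, " max:", h.max())
```

Note that Y never enters the calculation: leverage depends on X only, and the case with X = 8 has by far the largest value.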
12Influence Analysis (2)
- DFBETA and standardized DFBETA
- Change in slope or intercept resulting when you delete the ith person.
- Allows for influence of both X and Y
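The deletion idea behind DFBETA can be sketched directly: refit the line once per case with that case left out, and record how much the intercept and slope move. The sign convention (full-sample estimate minus deleted estimate) is an assumption, as is the helper name `fit_line`; the data are from the Example slide. The values here are raw, unstandardized changes.

```python
import numpy as np

x = np.array([2, 3, 3, 4, 4, 5, 8], dtype=float)
y = np.array([2, 3, 1, 1, 3, 2, 8], dtype=float)

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope

a_full, b_full = fit_line(x, y)

dfbeta = []
for i in range(len(x)):
    keep = np.arange(len(x)) != i              # drop case i only
    a_i, b_i = fit_line(x[keep], y[keep])
    dfbeta.append((a_full - a_i, b_full - b_i))
dfbeta = np.array(dfbeta)                      # columns: intercept, slope

for i, (da, db) in enumerate(dfbeta, start=1):
    print(f"case {i}: change in a = {da:+.4f}, change in b = {db:+.4f}")
```

Deleting the last case moves the slope by about 1.01, which is the entire fitted slope: without that case the line is flat.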
13Example
r = .82, r² = .67, p < .05.
        X      Y
        2      2
        3      3
        3      1
        4      1
        4      3
        5      2
        8      8
M    4.14   2.86
SX = 1.95, SY = 2.41
b = 1.01, a = -1.34
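The summary statistics for this example can be reproduced from the seven data pairs. The sketch below assumes sample standard deviations (N - 1 in the denominator) and the textbook relations b = r(SY/SX) and a = MY - b(MX). The t test at the end is an added check on the reported p value, using the two-tailed .05 critical value for 5 degrees of freedom (about 2.571).

```python
import numpy as np

x = np.array([2, 3, 3, 4, 4, 5, 8], dtype=float)
y = np.array([2, 3, 1, 1, 3, 2, 8], dtype=float)
n = len(x)

mx, my = x.mean(), y.mean()
sx, sy = x.std(ddof=1), y.std(ddof=1)     # sample SDs
r = np.corrcoef(x, y)[0, 1]

b = r * sy / sx                           # slope
a = my - b * mx                           # intercept

# t test of r with n - 2 degrees of freedom.
t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)

print(f"MX = {mx:.2f}, MY = {my:.2f}, SX = {sx:.2f}, SY = {sy:.2f}")
print(f"r = {r:.2f}, r squared = {r ** 2:.2f}, t({n - 2}) = {t:.2f}")
print(f"b = {b:.4f}, a = {a:.4f}")        # slide reports b = 1.01, a = -1.34
```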
14Example (2)
Y   Pred      Resid     Student Residual   RStudent   DFBETA a   DFBETA b
2    .6875    1.3125     1.072              1.0923      .7577     -.6044
3   1.7       1.3         .962               .9526      .3943     -.2546
1   1.7       -.7        -.518              -.476      -.1970      .1272
1   2.7125   -1.7125    -1.224             -1.3086     -.2524      .0423
3   2.7125     .2875      .206               .1846      .0356     -.006
2   3.725    -1.725     -1.256             -1.3584      .0198     -.2681
8   6.7625    1.2375     1.803              2.7249    -3.5303     4.8407
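The size of the tabled DFBETA values suggests they are in standardized form; that is an inference from the numbers, not something the slide states. The sketch below computes standardized DFBETA without refitting, under the common definition assumed here: the change in each coefficient on deleting case i, divided by the deleted standard error times the square root of the matching diagonal element of (X'X)^-1.

```python
import numpy as np

x = np.array([2, 3, 3, 4, 4, 5, 8], dtype=float)
y = np.array([2, 3, 1, 1, 3, 2, 8], dtype=float)
n = len(x)

X = np.column_stack([np.ones(n), x])          # columns: intercept, slope
p = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta

# Leverage h_i = x_i' (X'X)^-1 x_i for each row x_i of X.
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

# Standard error of estimate with case i deleted.
s_del = np.sqrt((e @ e - e ** 2 / (1 - h)) / (n - 1 - p))

# Raw change in coefficients on deleting case i: (X'X)^-1 x_i e_i / (1 - h_i).
dfbeta = (X @ XtX_inv) * (e / (1 - h))[:, None]

# Standardize each column by s_del times sqrt of the diagonal of (X'X)^-1.
dfbetas = dfbeta / (s_del[:, None] * np.sqrt(np.diag(XtX_inv)))

for i, (da, db) in enumerate(dfbetas, start=1):
    print(f"case {i}: standardized DFBETA a = {da:+.4f}, b = {db:+.4f}")
```

Under these assumptions the first case comes out at roughly .7577 for the intercept and -.6044 for the slope.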
15Remedies
- Fit Curves if needed.
- Note heteroscedasticity for applied problems.
- Investigate all outliers. You may or may not delete them, depending on what you find. Report your actions.
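One way to report what was done with an outlier is to show the fit with and without it. The sketch below does this for the example data, treating the (8, 8) case as the flagged point; that choice, and the helper name `summarize`, are illustrative assumptions rather than a rule from the slides.

```python
import numpy as np

def summarize(x, y):
    """Return N, intercept, slope, and correlation for a simple regression."""
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return {"N": len(x), "a": intercept, "b": slope, "r": r}

x = np.array([2, 3, 3, 4, 4, 5, 8], dtype=float)
y = np.array([2, 3, 1, 1, 3, 2, 8], dtype=float)

flagged = 6                                   # zero-based index of the (8, 8) case
keep = np.arange(len(x)) != flagged

full = summarize(x, y)
reduced = summarize(x[keep], y[keep])

for label, fit in (("all cases", full), ("flagged case removed", reduced)):
    print(f"{label}: N = {fit['N']}, a = {fit['a']:.2f}, "
          f"b = {fit['b']:.2f}, r = {fit['r']:.2f}")
```

Here the whole relation rests on one case: with it removed, the slope and the correlation both drop to zero. Reporting both fits lets the reader judge the decision.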
16Review
- What is leverage?
- What is a residual?
- How can you use residuals in assuring that the regression model is a good representation of the data?
- Why consider a standardized residual?
- What is a studentized residual?