Statistical, Practical, and Design Issues in Analysis with Missing Data


1
Statistical, Practical, and Design Issues in
Analysis with Missing Data

John Graham
The Methodology Center
Dept. of Biobehavioral Health
Penn State University
SRCD, Atlanta, April 9, 2005

2
Recent Missing Data Work

Collins, L. M., Schafer, J. L., & Kam, C. M.
(2001). A comparison of inclusive and
restrictive strategies in modern missing data
procedures. Psychological Methods, 6, 330-351.
Little & Rubin (2002). 2nd Edition
Schafer, J. L., & Graham, J. W. (2002). Missing
data: Our view of the state of the art.
Psychological Methods, 7, 147-177.
Graham, J. W., Cumsille, P. E., & Elek-Fisk, E.
(2003). Methods for handling missing data. In
J. A. Schinka & W. F. Velicer (Eds.), Research
Methods in Psychology (pp. 87-114). Volume 2 of
Handbook of Psychology (I. B. Weiner,
Editor-in-Chief). New York: John Wiley & Sons.

3
Recent Planned Missingness Papers

Graham, J. W., Taylor, B. J., & Cumsille, P. E.
(2001). Planned missing data designs in analysis
of change. In L. Collins & A. Sayer (Eds.), New
methods for the analysis of change (pp.
335-353). Washington, DC: American Psychological
Association.
Graham, J. W., Taylor, B. J., Olchowski, A. E., &
Cumsille, P. E. (2005). Planned missing data
designs in psychological research. Submitted for
publication.

4
Other Recent Papers

Graham, J. W. (2003). Adding missing-data-
relevant variables to FIML-based structural
equation models. Structural Equation Modeling,
10, 80-100.
Graham, J. W., & Schafer, J. L. (1999). On the
performance of multiple imputation for
multivariate data with small sample size. In R.
Hoyle (Ed.), Statistical Strategies for Small
Sample Research (pp. 1-29). Thousand Oaks, CA:
Sage.

5
Problem with Missing Data

Analysis procedures were designed for complete
data. . .

6
Solution 1

Design new procedures
Missing Data + Parameter Estimation in One Step
Full Information Maximum Likelihood (FIML): SEM
and Other Latent Variable Programs (Amos, Mx,
LISREL, Mplus, LTA)

7
Solution 2

Missing data: Multiple Imputation (MI)
Two Steps
Step 1: Replace Missing Values with Plausible
Values
Step 2: Analyze Data as if there were No Missing
Data

8
FAQ

Aren't you somehow helping yourself with
imputation? . . .

9
NO. Missing data imputation . . .

does NOT give you something for nothing
DOES let you make use of all data you have
. . .

10
FAQ

Is the imputed value what the person would have
given?

11
NO. When we impute a value . . .

We do not impute for the sake of the value itself
We impute to preserve important characteristics
of the whole data set
. . .

12
We want . . .

unbiased parameter estimation
e.g., b-weights
Good estimate of variability
e.g., standard errors
best statistical power

13
MCAR(Missing Completely At Random)

MCAR 1: Cause of missingness is a completely random
process (like a coin flip)
MCAR 2: Cause is uncorrelated with variables of
interest
Example: parents move
No bias if cause is omitted

14
MAR (Missing At Random)

Missingness may be related to measured variables
But no residual relationship with unmeasured
variables
Example: reading speed
No bias if you control for measured variables
(conditionally missing at random)

15
MNAR (Missing Not At Random)

Even after controlling for measured variables ...
Residual relationship with unmeasured variables
Example: drug use is the reason for absence
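
The three mechanisms on slides 13 to 15 can be contrasted with a small simulation. This is only a sketch under stated assumptions, not part of the talk: the variable names, the 0.7 slope, and the 20%/80% missingness rates are all made up for illustration, and "MNAR" is modeled as missingness that depends on the unobserved value of y itself. The measured variable x is always observed, and the true mean of y is 0.

```python
import math
import random


def simulate(mechanism, n=20000, seed=1):
    """Return (complete-case mean of y, regression-adjusted mean of y).

    x is always observed; y is the variable with missing values.
    The true mean of y is 0 by construction.
    """
    rng = random.Random(seed)
    xs, pairs = [], []  # every x value; (x, y) pairs where y is observed
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 0.7 * x + math.sqrt(1 - 0.7 ** 2) * rng.gauss(0.0, 1.0)
        if mechanism == "MCAR":    # coin flip, unrelated to anything
            p_missing = 0.5
        elif mechanism == "MAR":   # depends only on the measured x
            p_missing = 0.8 if x > 0 else 0.2
        elif mechanism == "MNAR":  # depends on the unobserved y itself
            p_missing = 0.8 if y > 0 else 0.2
        else:
            raise ValueError(mechanism)
        xs.append(x)
        if rng.random() >= p_missing:
            pairs.append((x, y))

    # Complete-case estimate: ignore x and average the observed y values.
    cc_mean = sum(y for _, y in pairs) / len(pairs)

    # Adjusted estimate: regress y on x among complete cases, then
    # average the predictions over everyone (x is observed for all).
    mx = sum(x for x, _ in pairs) / len(pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - cc_mean) for x, y in pairs)
    slope = sxy / sxx
    intercept = cc_mean - slope * mx
    adj_mean = sum(intercept + slope * x for x in xs) / n
    return cc_mean, adj_mean


results = {m: simulate(m) for m in ("MCAR", "MAR", "MNAR")}
for name, (cc, adj) in results.items():
    print(f"{name}: complete-case mean = {cc:+.3f}, adjusted mean = {adj:+.3f}")
```

In this setup the complete-case mean should sit near 0 under MCAR, drift away from 0 under MAR until x is controlled for, and stay biased under MNAR even after the adjustment.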

16
MNAR Causes

The recommended methods assume data are MAR
Should these methods be used even when
assumptions are not met?
. . .

17
YES! These Methods Work!

Suggested methods work better than old methods
Multiple causes of missingness
Only small part of missingness may be MNAR
Suggested methods usually work very well

18
Practical Issues

How much difference does it make?
How easy is the "sell"?
Which is better: FIML or MI?
"Auxiliary" Variables (Collins, Schafer, & Kam,
2001; Graham, 2003)
Small sample size (Graham & Schafer, 1999)
Too many variables
Automation

19
Some Practical Issues

20
Practical Issues: Biggest problems in multiple
imputation

How do I write my data out of SPSS?
How can I use MI with ANOVA?
How do I use MI with SPSS, STATA, SUDAAN, EQS,
Mplus?
Is there a less tedious way?

21
Practical Issues: How Easy is the "Sell"?

Sell is getting easier all the time
Pendulum starting to swing other way

22
Practical Issues: Which is better -- MI or FIML?

MI is more generally applicable than FIML
In theory (in long run), they are equivalent
As practiced, they are sometimes a bit different

23
Recent Research (Collins, Schafer, & Kam, 2001)
Shows ...

Include auxiliary variables in model
highly correlated with variables of interest
minimize loss of power
reduce bias (MNAR)

24
FIML versus MI (revisited)

In long run, FIML and MI equivalent
AS PRACTICED, MI is better
MI makes it easier to include auxiliary variables
(Collins, Schafer, & Kam, 2001)
but models are available to allow auxiliary
variables with FIML
see Graham (2003, Structural Equation Modeling)

25
FIML versus MI (revisited)

In long run, FIML and MI equivalent
AS PRACTICED, FIML may be better
FIML estimates sometimes MAY have more power
but if you set m (the number of imputations) high
enough (40 imputations?)
power is equivalent

26
Practical Issues: Small Sample Size

Graham & Schafer (1999)
large regression model
N = 50
50% missingness
Multiple Imputation (NORM) worked fine

27
Practical Issues: Too Many Variables

Longitudinal research
10 variables x 5 waves = 50 variables
1,325 parameters
10 latent variables x 4 manifest variables x
5 waves = 200 variables
20,300 parameters
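
Both parameter counts on this slide are consistent with an unrestricted model of k means, k variances, and k(k-1)/2 covariances. That reading is inferred from the arithmetic rather than stated in the slides; the function name below is made up for illustration.

```python
def n_parameters(k):
    """Count means, variances, and covariances for k variables."""
    means = k
    variances = k
    covariances = k * (k - 1) // 2
    return means + variances + covariances


print(n_parameters(10 * 5))      # 10 variables x 5 waves -> 1325
print(n_parameters(10 * 4 * 5))  # 10 latent x 4 manifest x 5 waves -> 20300
```

The count grows roughly with the square of the number of variables, which is why item-level imputation in long longitudinal studies becomes unwieldy so quickly.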

28
One Solution (borrow Steve Peck's slide): Impute
Scales Rather Than Items

Create scales based on partial data
Require 100% of items if Alpha < .60
Require 80% of items if Alpha = .70
Require 67% of items if Alpha = .80
Require 50% of items if Alpha = .90
Good solution for regression analyses
Not good solution for SEM

29
One Solution (borrow Steve Peck's slide): Impute
Scales Rather Than Items

Create scales based on partial data
          Items
Alpha     Required
_________ ________
< .60     100%
.70       80%
.80       67%
.90       50%
Good solution for regression analyses
Not good solution for SEM
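
A minimal sketch of the partial-data scale rule, assuming the scale score is the mean of the answered items and that 67% means two of every three items. The slide only lists alpha values of .60, .70, .80, and .90, so the handling of in-between alphas here (use the row for the highest listed alpha that has been reached) is an assumption, as are the function names.

```python
def required_fraction(alpha):
    """Fraction of items that must be answered, following the table above."""
    if alpha >= 0.90:
        return 0.50
    if alpha >= 0.80:
        return 2 / 3
    if alpha >= 0.70:
        return 0.80
    return 1.0  # below .70 (assumed): every item is required


def scale_score(responses, alpha):
    """Mean of answered items, or None if too few items were answered.

    Missing responses are represented by None.
    """
    if not responses:
        return None
    answered = [r for r in responses if r is not None]
    # Small tolerance so that 2 of 3 items satisfies the two-thirds rule.
    if len(answered) / len(responses) + 1e-9 < required_fraction(alpha):
        return None
    return sum(answered) / len(answered)


print(scale_score([4, None, 5], alpha=0.80))  # 2 of 3 answered: scored
print(scale_score([4, None, 5], alpha=0.70))  # needs 80%: left missing
```

Scales that come back as None would then be imputed at the scale level instead of imputing each item.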

30
Too Many Variables: A Partial Solution

Normally cannot impute separately
assumes r = 0 between sets of variables
What if r = 0 is true?
(1) Separate variables so they are maximally
uncorrelated (principal components)
(2) Use factor scores to represent unused
variables

31
Practical Issues: Biggest Problems in Multiple
Imputation

NOT the theory
NOT imputing the data
NOT combining the results from m = 20 analyses
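
Combining the results from the m analyses is usually done with Rubin's rules; the slides do not name them, so treat this as a sketch of that standard pooling step and not necessarily the presenter's own procedure. The function name and the example numbers are made up.

```python
import math


def pool(estimates, std_errors):
    """Pool one parameter across m imputed data sets (Rubin's rules).

    estimates:  the m point estimates (e.g., a b-weight from each analysis)
    std_errors: the m standard errors of those estimates
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least two imputations")
    q_bar = sum(estimates) / m                         # pooled estimate
    within = sum(se ** 2 for se in std_errors) / m     # average sampling variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between             # total variance
    return q_bar, math.sqrt(total)


# Hypothetical b-weights and standard errors from m = 5 imputed data sets.
b_weights = [0.42, 0.45, 0.39, 0.44, 0.40]
b_ses = [0.10, 0.11, 0.10, 0.12, 0.10]
estimate, se = pool(b_weights, b_ses)
print(f"pooled b = {estimate:.3f}, pooled SE = {se:.3f}")
```

The between-imputation term is what keeps the pooled standard error honest about the uncertainty due to missing data.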

32
Practical Issues: Biggest problems in multiple
imputation

How do I write my data out of SPSS?
How can I use MI with ANOVA?
How do I use MI with SPSS, STATA, SUDAAN, EQS,
Mplus?
Is there a less tedious way?

33
Now That We Have Good Solutions for Missing Data
Analysis ...

Consider PLANNED missing data designs

34
Planned Missingness for Growth Modeling

see Graham, J. W., Taylor, B. J., & Cumsille, P.
E. (2001). Planned missing data designs in
analysis of change. In L. Collins & A. Sayer
(Eds.), New methods for the analysis of change
(pp. 335-353). Washington, DC: American
Psychological Association.

35
Design 1: all combinations of 1 time missing (17%
missing)

1 1 1 1 1 57 0 0 0 0 0
1 1 1 1 0 57 1 1 1 1 1
1 1 1 0 1 57 1 1 1 1 1
1 1 0 1 1 57 1 1 1 1 1
1 0 1 1 1 57 1 1 1 1 1
0 1 1 1 1 57 1 1 1 1 1
          ___
      N = 342

36
Design 3: all combinations of 2 times missing
(36% missing)

1 1 1 1 1 31 0 0 0 0 0
1 1 1 0 0 31 0 0 0 0 0
1 1 0 1 0 31 0 0 0 0 0
1 0 1 1 0 31 0 0 0 0 0
0 1 1 1 0 31 1 1 1 1 1
1 1 0 0 1 31 1 1 1 1 1
1 0 1 0 1 31 1 1 1 1 1
0 1 1 0 1 31 1 1 1 1 1
1 0 0 1 1 31 1 1 1 1 1
0 1 0 1 1 31 1 1 1 1 1
0 0 1 1 1 31 1 1 1 1 1
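
The missingness patterns in Designs 1 and 3 and their overall missing percentages can be reproduced with a short sketch. It assumes five waves, one complete-data pattern plus every pattern with the stated number of waves missing, and equal N per pattern (as in the tables above); the function names are made up.

```python
from itertools import combinations


def design(n_waves, n_missing):
    """Complete pattern plus every pattern with n_missing waves missing.

    Each pattern is a tuple with 1 = measured and 0 = missing by design.
    """
    patterns = [tuple([1] * n_waves)]
    for missing in combinations(range(n_waves), n_missing):
        patterns.append(tuple(0 if w in missing else 1 for w in range(n_waves)))
    return patterns


def percent_missing(patterns):
    """Percent of cells missing, assuming equal N in every pattern."""
    cells = sum(len(p) for p in patterns)
    observed = sum(sum(p) for p in patterns)
    return 100 * (cells - observed) / cells


for label, k in (("Design 1", 1), ("Design 3", 2)):
    d = design(5, k)
    print(f"{label}: {len(d)} patterns, {percent_missing(d):.1f}% missing")
```

With 57 subjects in each of 6 patterns and 31 in each of 11 patterns, the two designs use nearly the same total N (342 and 341).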

37
SE for b-wt Predicting Slope

[Figure not preserved. Labels: Data Points; Missing Data Designs; Complete Cases Designs; annotations "same money" and "same SE".]

38
3-Form Design

Student Received Item Set?
----------------------------
         X    A    B    C
Form 1   yes  yes  yes  NO
Form 2   yes  yes  NO   yes
Form 3   yes  NO   yes  yes
Form 4   yes  yes  yes  yes

39
3-Form Design

Item Sets    X    A    B    C   total
            34   33   33   33    133

form         X    A    B    C   total
1           34   33   33    0    100
2           34   33    0   33    100
3           34    0   33   33    100
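
The item counts above can be checked with a small sketch of the 3-form layout: every form carries set X plus two of the three sets A, B, C. The dictionary and function names are made up for illustration.

```python
from itertools import combinations

# Number of items in each item set, as given on the slide.
ITEM_SETS = {"X": 34, "A": 33, "B": 33, "C": 33}

# Item sets administered on each form (Form 1 omits C, Form 2 omits B,
# Form 3 omits A).
FORMS = {1: ("X", "A", "B"), 2: ("X", "A", "C"), 3: ("X", "B", "C")}


def items_on_form(form):
    """Number of items a respondent answers on the given form."""
    return sum(ITEM_SETS[s] for s in FORMS[form])


def forms_with_both(set_1, set_2):
    """Forms on which both item sets are asked of the same respondent."""
    return [f for f, sets in FORMS.items() if set_1 in sets and set_2 in sets]


total_items = sum(ITEM_SETS.values())
for form in FORMS:
    print(f"Form {form}: {items_on_form(form)} of {total_items} items")
for pair in combinations(ITEM_SETS, 2):
    print(pair, "asked together on forms", forms_with_both(*pair))
```

Every pair of item sets is answered together on at least one form, so every correlation among the 133 items remains estimable even though each respondent sees only 100 of them.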

40
3-Form Design: Item Order

Form 1: X A B
Form 2: X C A
Form 3: X B C

41
3-Form Design: Item Order

Form 1: X A B C
Form 2: X C A B
Form 3: X B C A

42
3-Form Design: Item Order

Form 1: X A B C
Form 2: X C A B
Form 3: X B C A
Could pay some subjects to complete extra
questions

43
3-Form Design: Item Order

Form 1: X A B C
Form 2: X C A B
Form 3: X B C A
Give questions as shown, measure reasons for
non-completion
poor reading
low motivation
"Managed" missingness

44
Expensive Measures II

[Figure not preserved. Labels: Larger N, Less Expensive; Smaller N, More Expensive; r = .30]

45
Example Study

r = -.30 (smoking and health)
Self-report Smoking
two items
Biochemical Smoking Measures
Expired Air CO
Saliva Cotinine

46
Example Study

$15,050 for Measuring Smoking
Self-Reports: $7.30 per subject
CO / Cotinine: $16.78 per subject

self-reports        bio-chem
 625 x 7.30    +    625 x 16.78    =  15,050
1200 x 7.30    +    375 x 16.78    ≈  15,050
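
The budget arithmetic for the two designs can be checked directly. This sketch treats the figures as US dollars (an assumption) and works in whole cents to avoid rounding error; the constant and function names are made up.

```python
BUDGET_CENTS = 15_050 * 100   # total budget for measuring smoking
SELF_REPORT_CENTS = 730       # self-report cost per subject
BIOCHEM_CENTS = 1678          # CO / cotinine cost per subject


def total_cost_cents(n_self_report, n_biochem):
    """Cost of giving self-reports to n_self_report subjects and
    biochemical measures to n_biochem of them."""
    return n_self_report * SELF_REPORT_CENTS + n_biochem * BIOCHEM_CENTS


def max_biochem(n_self_report):
    """Largest biochemical subsample that still fits within the budget."""
    remaining = BUDGET_CENTS - n_self_report * SELF_REPORT_CENTS
    return max(remaining, 0) // BIOCHEM_CENTS


for n_self, n_bio in [(625, 625), (1200, 375)]:
    cost = total_cost_cents(n_self, n_bio) / 100
    print(f"{n_self} self-reports + {n_bio} bio-chem = {cost:,.2f}")
```

The complete-data design (625 subjects with both measures) lands exactly on the budget. The planned-missingness design (1,200 self-reports, 375 with the biochemical measures) comes to 15,052.50, so the slide's 15,050 is a rounded figure for that row.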

47
[Figure not preserved. Labels: A, B, C, D, E]

48

Cheap Measures are Biased
A: loadings .50, .70 (Cheap, Expensive)
B: loadings .70, .70 (Cheap, Expensive)
C: loadings .70, .50 (Cheap, Expensive)
D: loadings .50, .50 (Cheap, Expensive)
Cheap Measures Not Biased
E: loadings .50, .70 (Cheap, Expensive)

49
Research Examples

Smoking Research
less expensive: Self-Reports
more expensive: CO and Saliva Cotinine
Alcohol Research
less expensive: Brief Self-reports
more expensive: Time Line Follow Back

50
Research Examples

Blood Vessel Health (relevant for cardiovascular
health)
Ultrasound: Flow-mediated dilation
$150 per subject
BP approximation:
pulse wave contour analysis
$15 per subject

51
Research Examples

Nutrition Research
less expensive: Brief Nutrition Survey
more expensive: Extensive 24-hr Recall
Survey Research I
less expensive: Brief Mail Survey
more expensive: Extensive Face-to-Face
Interview

52
Research Examples

Exercise Research: Physical Conditioning
less expensive: Self-report survey
more expensive: VO2-max
Survey Research II
less expensive: Retrospective reports
more expensive: Prospective reports

the end

54
Recent Papers

Schafer, J. L., & Graham, J. W. (2002). Missing
data: Our view of the state of the art.
Psychological Methods, 7, 147-177.
Graham, J. W., Cumsille, P. E., & Elek-Fisk, E.
(2003). Methods for handling missing data. In
J. A. Schinka & W. F. Velicer (Eds.), Research
Methods in Psychology (pp. 87-114). Volume 2 of
Handbook of Psychology (I. B. Weiner,
Editor-in-Chief). New York: John Wiley & Sons.
Graham, J. W., Taylor, B. J., Olchowski, A. E., &
Cumsille, P. E. (2005). Planned missing data
designs in psychological research. Manuscript
submitted for publication.
http://mcgee.hhdev.psu.edu/publication_resources
email: jgraham_at_psu.edu
