Statistical, Practical, and Design Issues in Analysis with Missing Data - PowerPoint PPT Presentation

1
Statistical, Practical, and Design Issues in
Analysis with Missing Data
  • John Graham
  • The Methodology Center
  • Dept. of Biobehavioral Health
  • Penn State University
  • SRCD, Atlanta, April 9, 2005

2
Recent Missing Data Work
  • Collins, L. M., Schafer, J. L., & Kam, C. M.
    (2001). A comparison of inclusive and
    restrictive strategies in modern missing data
    procedures. Psychological Methods, 6, 330-351.
  • Little & Rubin (2002). 2nd Edition
  • Schafer, J. L., & Graham, J. W. (2002). Missing
    data: Our view of the state of the art.
    Psychological Methods, 7, 147-177.
  • Graham, J. W., Cumsille, P. E., & Elek-Fisk, E.
    (2003). Methods for handling missing data. In
    J. A. Schinka & W. F. Velicer (Eds.), Research
    Methods in Psychology (pp. 87-114). Volume 2 of
    Handbook of Psychology (I. B. Weiner,
    Editor-in-Chief). New York: John Wiley & Sons.

3
Recent Planned Missingness Papers
  • Graham, J. W., Taylor, B. J., & Cumsille, P. E.
    (2001). Planned missing data designs in analysis
    of change. In L. Collins & A. Sayer (Eds.), New
    methods for the analysis of change (pp.
    335-353). Washington, DC: American Psychological
    Association.
  • Graham, J. W., Taylor, B. J., Olchowski, A. E., &
    Cumsille, P. E. (2005). Planned missing data
    designs in psychological research. Submitted for
    publication.

4
Other Recent Papers
  • Graham, J. W. (2003). Adding missing-data
    relevant variables to FIML-based structural
    equation models. Structural Equation Modeling,
    10, 80-100.
  • Graham, J. W., & Schafer, J. L. (1999). On the
    performance of multiple imputation for
    multivariate data with small sample size. In R.
    Hoyle (Ed.), Statistical Strategies for Small
    Sample Research (pp. 1-29). Thousand Oaks, CA:
    Sage.

5
Problem with Missing Data
  • Analysis procedures were designed for complete
    data. . .

6
Solution 1
  • Design new procedures
  • Missing Data & Parameter Estimation in One Step
  • Full Information Maximum Likelihood (FIML): SEM
    and Other Latent Variable Programs (Amos, Mx,
    LISREL, Mplus, LTA)

7
Solution 2
  • Missing data: Multiple Imputation (MI)
  • Two Steps
  • Step 1: Replace Missing Values with Plausible
    Values
  • Step 2: Analyze Data as if there were No Missing
    Data
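The two steps can be sketched in a few lines. This is a minimal illustration, not the procedure used by NORM or any other package: the helper names (`fit_line`, `impute_datasets`, `analyze`), the simulated data, and the bootstrap-plus-noise regression imputation are all assumptions made for the example.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual SD for y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, math.sqrt(sse / (len(xs) - 2))

def impute_datasets(x, y, m=20, seed=0):
    """Step 1: return m completed copies of y (None marks a missing value)."""
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    completed = []
    for _ in range(m):
        # Refit on a bootstrap sample of the observed cases so uncertainty
        # about the regression line carries into the imputed values.
        boot = rng.choices(observed, k=len(observed))
        b0, b1, sd = fit_line([p[0] for p in boot], [p[1] for p in boot])
        completed.append([
            yi if yi is not None else b0 + b1 * xi + rng.gauss(0.0, sd)
            for xi, yi in zip(x, y)
        ])
    return completed

def analyze(y_completed):
    """Step 2: an ordinary complete-data analysis (here, just the mean)."""
    return statistics.fmean(y_completed)

# Hypothetical data: y depends on x, and about 30% of y is missing.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.5 * xi + rng.gauss(0, 0.8) for xi in x]
y = [None if rng.random() < 0.3 else yi for yi in y]

datasets = impute_datasets(x, y, m=20)
estimates = [analyze(d) for d in datasets]
```

Each completed data set gives a slightly different estimate. That spread is what carries the uncertainty due to missingness into the final standard error.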

8
FAQ
  • Aren't you somehow helping yourself with
    imputation? . . .

9
NO. Missing data imputation . . .
  • does NOT give you something for nothing
  • DOES let you make use of all data you have
  • . . .

10
FAQ
  • Is the imputed value what the person would have
    given?

11
NO. When we impute a value . . .
  • We do not impute for the sake of the value itself
  • We impute to preserve important characteristics
    of the whole data set
  • . . .

12
We want . . .
  • unbiased parameter estimation
  • e.g., b-weights
  • Good estimate of variability
  • e.g., standard errors
  • best statistical power

13
MCAR (Missing Completely At Random)
  • MCAR 1: Cause of missingness completely random
    process (like coin flip)
  • MCAR 2: Cause uncorrelated with variables of
    interest
  • Example: parents move
  • No bias if cause omitted

14
MAR (Missing At Random)
  • Missingness may be related to measured variables
  • But no residual relationship with unmeasured
    variables
  • Example: reading speed
  • No bias if you control for measured variables
    (conditionally missing at random)

15
MNAR (Missing Not At Random)
  • Even after controlling for measured variables ...
  • Residual relationship with unmeasured variables
  • Example: drug use is the reason for absence
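A small simulation can make the three mechanisms concrete. This is a sketch under assumed conditions: a measured variable `x` (think of reading speed), an outcome `y` with true mean 0, and illustrative missingness probabilities. None of the numbers come from the presentation.

```python
import math
import random
import statistics

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def adjusted_mean(x_all, pairs):
    """Mean of y implied by regressing y on x among observed cases,
    then averaging the predictions over every case's x."""
    xs, ys = zip(*pairs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((a - mx) * (b - my) for a, b in pairs)
             / sum((a - mx) ** 2 for a in xs))
    return my + slope * (statistics.fmean(x_all) - mx)

rng = random.Random(2005)
n = 50_000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + rng.gauss(0, math.sqrt(0.75)) for xi in x]  # true mean of y is 0

# Probability that y is missing under each mechanism (about 50% overall).
mechanisms = {
    "MCAR": lambda xi, yi: 0.5,               # coin flip
    "MAR": lambda xi, yi: sigmoid(2 * xi),    # depends only on measured x
    "MNAR": lambda xi, yi: sigmoid(2 * yi),   # depends on the unseen y itself
}

results = {}
for name, p_missing in mechanisms.items():
    observed = [(xi, yi) for xi, yi in zip(x, y)
                if rng.random() >= p_missing(xi, yi)]
    results[name] = {
        "complete_case_mean": statistics.fmean(yi for _, yi in observed),
        "adjusted_mean": adjusted_mean(x, observed),
    }

for name, r in results.items():
    print(name, round(r["complete_case_mean"], 2), round(r["adjusted_mean"], 2))
```

Under MCAR the complete-case mean stays near 0. Under MAR it is biased until the measured cause `x` is controlled for. Under MNAR a bias remains even after that adjustment.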

16
MNAR Causes
  • The recommended methods assume data are MAR
  • Should these methods be used even when
    assumptions not met?
  • . . .

17
YES! These Methods Work!
  • Suggested methods work better than old methods
  • Multiple causes of missingness
  • Only small part of missingness may be MNAR
  • Suggested methods usually work very well

18
Practical Issues
  • How much difference does it make?
  • How easy is the "sell"?
  • Which is better: FIML or MI?
  • "Auxiliary" Variables (Collins, Schafer, & Kam,
    2001; Graham, 2003)
  • Small sample size (Graham & Schafer, 1999)
  • Too many variables
  • Automation

19
Some Practical Issues
20
Practical Issues: Biggest problems in multiple
imputation
  • How do I write my data out of SPSS?
  • How can I use MI with ANOVA?
  • How do I use MI with SPSS, STATA, SUDAAN, EQS,
    Mplus?
  • Is there a less tedious way?

21
Practical Issues: How Easy is the "Sell"?
  • Sell is getting easier all the time
  • Pendulum starting to swing other way

22
Practical Issues: Which is better, MI or FIML?
  • MI is more generally applicable than FIML
  • In theory (in long run), they are equivalent
  • As practiced, they are sometimes a bit different

23
Recent Research (Collins, Schafer, & Kam, 2001)
Shows ...
  • Include auxiliary variables in model
  • highly correlated with variables of interest
  • minimize loss of power
  • reduce bias (MNAR)

24
FIML versus MI (revisited)
  • In long run, FIML and MI equivalent
  • AS PRACTICED, MI is better
  • MI makes it easier to include auxiliary variables
    (Collins, Schafer, & Kam, 2001)
  • but models are available to allow auxiliary
    variables with FIML
  • see Graham (2003, Structural Equation Modeling)

25
FIML versus MI (revisited)
  • In long run, FIML and MI equivalent
  • AS PRACTICED, FIML may be better
  • FIML estimates sometimes MAY have more power
  • but if you set m (number of imputations) high
    enough (40 imputations?)
  • power is equivalent

26
Practical Issues: Small Sample Size
  • Graham & Schafer (1999)
  • large regression model
  • N = 50
  • 50% missingness
  • Multiple Imputation (NORM) worked fine

27
Practical Issues: Too Many Variables
  • Longitudinal research
  • 10 variables x 5 waves = 50 variables
  • 1,325 parameters
  • 10 latent variables x 4 manifest variables x
    5 waves = 200 variables
  • 20,300 parameters
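Both parameter counts are consistent with an unrestricted model that estimates every mean, variance, and covariance: for k variables that is k + k + k(k-1)/2 = k(k+3)/2. The function name below is a hypothetical helper for checking the arithmetic.

```python
def n_saturated_parameters(k):
    """Means + variances + covariances for k variables: k + k + k(k-1)/2."""
    return k * (k + 3) // 2

print(n_saturated_parameters(10 * 5))      # 50 variables  -> 1325
print(n_saturated_parameters(10 * 4 * 5))  # 200 variables -> 20300
```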

28
One Solution (borrow Steve Peck's slide): Impute
Scales Rather Than Items
  • Create scales based on partial data
  • Require 100% of items if Alpha = .60
  • Require 80% of items if Alpha = .70
  • Require 67% of items if Alpha = .80
  • Require 50% of items if Alpha = .90
  • Good solution for regression analyses
  • Not good solution for SEM
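The scale-scoring rule can be sketched as follows. The slide gives thresholds only at four alpha values, so treating them as step cutoffs for in-between alphas is an assumption, as are the function names and the reading of "67%" as two thirds of the items.

```python
def required_fraction(alpha):
    """Minimum share of items that must be answered, by scale reliability.
    Step cutoffs between the listed alpha values are an assumption."""
    if alpha >= 0.90:
        return 0.50
    if alpha >= 0.80:
        return 2 / 3      # the slide's "67%"
    if alpha >= 0.70:
        return 0.80
    return 1.00

def scale_score(items, alpha):
    """Mean of the answered items, or None if too few were answered.
    Missing item responses are represented by None."""
    answered = [v for v in items if v is not None]
    if not items or len(answered) / len(items) < required_fraction(alpha):
        return None
    return sum(answered) / len(answered)

print(scale_score([3, None, 4, 5], alpha=0.80))  # 3 of 4 answered: scored
print(scale_score([3, None, 4, 5], alpha=0.70))  # needs 80%: not scored
```

Scales that come back as None are the ones left for imputation at the scale level.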

29
One Solution (borrow Steve Peck's slide): Impute
Scales Rather Than Items
  • Create scales based on partial data
  •   Alpha      Items Required
  •   _____      ______________
  •   < .60      100%
  •   .70        80%
  •   .80        67%
  •   .90        50%
  • Good solution for regression analyses
  • Not good solution for SEM

30
Too Many Variables: A Partial Solution
  • Normally cannot impute separately
  • assumes r = 0 between sets of variables
  • What if r = 0 is true?
  • (1) Separate variables so they are maximally
    uncorrelated (principal components)
  • (2) Use factor scores to represent unused
    variables

31
Practical Issues: Biggest Problems in Multiple
Imputation
  • NOT the theory
  • NOT imputing the data
  • NOT combining the results from m = 20 analyses
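Combining the m sets of results is usually done with Rubin's rules: average the m estimates, then add the average squared standard error (within-imputation variance) to the between-imputation variance inflated by (1 + 1/m). The sketch below uses a hypothetical function name and made-up inputs, and omits the degrees-of-freedom calculation.

```python
import math
import statistics

def pool(estimates, squared_ses):
    """Rubin's rules for one parameter across m imputed data sets.
    Returns the pooled estimate and its standard error."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(squared_ses)     # average sampling variance
    between = statistics.variance(estimates)   # variance across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)

# Made-up b-weights and squared SEs from m = 3 analyses.
estimate, se = pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```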

32
Practical Issues: Biggest problems in multiple
imputation
  • How do I write my data out of SPSS?
  • How can I use MI with ANOVA?
  • How do I use MI with SPSS, STATA, SUDAAN, EQS,
    Mplus?
  • Is there a less tedious way?

33
Now That We Have Good Solutions for Missing Data
Analysis ...
  • Consider PLANNED missing data designs

34
Planned Missingness for Growth Modeling
  • see Graham, J. W., Taylor, B. J., & Cumsille, P.
    E. (2001). Planned missing data designs in
    analysis of change. In L. Collins & A. Sayer
    (Eds.), New methods for the analysis of change
    (pp. 335-353). Washington, DC: American
    Psychological Association.

35
Design 1: all combinations of 1 time missing (17%
missing)
  • 1 1 1 1 1   57   0 0 0 0 0
  • 1 1 1 1 0   57   1 1 1 1 1
  • 1 1 1 0 1   57   1 1 1 1 1
  • 1 1 0 1 1   57   1 1 1 1 1
  • 1 0 1 1 1   57   1 1 1 1 1
  • 0 1 1 1 1   57   1 1 1 1 1
  •            ___
  •         N = 342

36
Design 3: all combinations of 2 times missing
(36% missing)
  • 1 1 1 1 1 31 0 0 0 0 0
  • 1 1 1 0 0 31 0 0 0 0 0
  • 1 1 0 1 0 31 0 0 0 0 0
  • 1 0 1 1 0 31 0 0 0 0 0
  • 0 1 1 1 0 31 1 1 1 1 1
  • 1 1 0 0 1 31 1 1 1 1 1
  • 1 0 1 0 1 31 1 1 1 1 1
  • 0 1 1 0 1 31 1 1 1 1 1
  • 1 0 0 1 1 31 1 1 1 1 1
  • 0 1 0 1 1 31 1 1 1 1 1
  • 0 0 1 1 1 31 1 1 1 1 1
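The missingness percentages quoted for these designs follow from the pattern tables. The sketch below assumes five waves, one complete pattern plus every pattern with the stated number of waves dropped, and equal n per pattern. The function names are hypothetical.

```python
from itertools import combinations

def planned_patterns(n_waves, n_missing):
    """The complete pattern plus every pattern with n_missing waves dropped
    (1 = measured at that wave, 0 = missing by design)."""
    patterns = [tuple([1] * n_waves)]
    for dropped in combinations(range(n_waves), n_missing):
        patterns.append(tuple(0 if w in dropped else 1 for w in range(n_waves)))
    return patterns

def percent_missing(patterns):
    """Percent of person-by-wave cells missing, assuming equal n per pattern."""
    cells = sum(len(p) for p in patterns)
    missing = sum(p.count(0) for p in patterns)
    return 100 * missing / cells

one_missing = planned_patterns(5, 1)   # 6 patterns, as in Design 1
two_missing = planned_patterns(5, 2)   # 11 patterns, as in Design 3
print(len(one_missing), round(percent_missing(one_missing)))
print(len(two_missing), round(percent_missing(two_missing)))
```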

37
[Figure: "SE for b-wt Predicting Slope" by "Data Points", comparing Missing Data Designs with Complete Cases Designs; annotations mark "same money" and "same SE"]
38
3-Form Design
  • Student Received Item Set?
  • ----------------------------
  • X A B C
  • Form 1 yes yes yes NO
  • Form 2 yes yes NO yes
  • Form 3 yes NO yes yes
  • Form 4 yes yes yes yes

39
3-Form Design
  • Item Sets   X    A    B    C    total
  •             34   33   33   33   133
  • form        X    A    B    C    total
  • 1           34   33   33   0    100
  • 2           34   33   0    33   100
  • 3           34   0    33   33   100
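The item counts in the table can be checked directly: each form carries 100 of the 133 items, and every pair of item sets appears together on at least one form, which is what lets all correlations be estimated. The variable names below are hypothetical.

```python
from itertools import combinations

ITEM_SETS = {"X": 34, "A": 33, "B": 33, "C": 33}   # items per set
FORMS = {1: "XAB", 2: "XAC", 3: "XBC"}             # sets included on each form

items_per_form = {form: sum(ITEM_SETS[s] for s in sets)
                  for form, sets in FORMS.items()}
bank_size = sum(ITEM_SETS.values())

# Number of forms on which each pair of item sets appears together.
pair_coverage = {
    pair: sum(1 for sets in FORMS.values() if pair[0] in sets and pair[1] in sets)
    for pair in combinations(ITEM_SETS, 2)
}

print(items_per_form, bank_size)
print(pair_coverage)
```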

40
3-Form Design Item Order
  • Form 1: X A B
  • Form 2: X C A
  • Form 3: X B C

41
3-Form Design Item Order
  • Form 1: X A B C
  • Form 2: X C A B
  • Form 3: X B C A

42
3-Form Design Item Order
  • Form 1: X A B C
  • Form 2: X C A B
  • Form 3: X B C A
  • Could pay some subjects to complete extra
    questions

43
3-Form Design Item Order
  • Form 1: X A B C
  • Form 2: X C A B
  • Form 3: X B C A
  • Give questions as shown, measure reasons for
    non-completion
  • poor reading
  • low motivation
  • "Managed" missingness

44
Expensive Measures II
  • Larger N, Less Expensive
  • r = .30
  • Smaller N, More Expensive
45
Example Study
  • r = -.30 (smoking and health)
  • Self-report Smoking
  • two items
  • Biochemical Smoking Measures
  • Expired Air CO
  • Saliva Cotinine

46
Example Study
  • $15,050 for Measuring Smoking
  • Self-Reports: $7.30 per subject
  • CO / Cotinine: $16.78 per subject
  •   self-reports        bio-chem
  •   625 x $7.30    +    625 x $16.78  =  $15,050
  •   1200 x $7.30   +    375 x $16.78  ≈  $15,050
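The two allocations can be compared with a short calculation. It assumes the per-subject costs are in dollars and simply add, and the function name is hypothetical. The second design comes to 15,052.50, so it matches the budget only approximately.

```python
COST_SELF_REPORT = 7.30   # per subject
COST_BIOCHEM = 16.78      # per subject (CO / cotinine)

def total_cost(n_self_report, n_biochem):
    """Total measurement cost for a given allocation of subjects."""
    return n_self_report * COST_SELF_REPORT + n_biochem * COST_BIOCHEM

complete_design = total_cost(625, 625)    # everyone gets both measures
planned_missing = total_cost(1200, 375)   # biochemical measures on a subsample
print(round(complete_design, 2), round(planned_missing, 2))
```

For about the same money, the second allocation nearly doubles the number of subjects with self-report data.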

47
[Figure labels, in order of appearance: D, C, A, B, E]
48
  • Cheap Measures are Biased
  • A: loadings .50, .70 (Cheap, Expensive)
  • B: loadings .70, .70 (Cheap, Expensive)
  • C: loadings .70, .50 (Cheap, Expensive)
  • D: loadings .50, .50 (Cheap, Expensive)
  • Cheap Measures Not Biased
  • E: loadings .50, .70 (Cheap, Expensive)

49
Research Examples
  • Smoking Research
  • less expensive: Self-Reports
  • more expensive: CO and Saliva Cotinine
  • Alcohol Research
  • less expensive: Brief Self-reports
  • more expensive: Time Line Follow Back

50
Research Examples
  • Blood Vessel Health (relevant for cardiovascular
    health)
  • Ultrasound: Flow-mediated dilation
  • $150 per subject
  • BP approximation
  • pulse wave contour analysis
  • $15 per subject

51
Research Examples
  • Nutrition Research
  • less expensive: Brief Nutrition Survey
  • more expensive: Extensive 24-hr Recall
  • Survey Research I
  • less expensive: Brief Mail Survey
  • more expensive: Extensive Face-to-Face
    Interview

52
Research Examples
  • Exercise Research: Physical Conditioning
  • less expensive: Self-report survey
  • more expensive: VO2-max
  • Survey Research II
  • less expensive: Retrospective reports
  • more expensive: Prospective reports

53
  • the end

54
Recent Papers
  • Schafer, J. L., & Graham, J. W. (2002). Missing
    data: Our view of the state of the art.
    Psychological Methods, 7, 147-177.
  • Graham, J. W., Cumsille, P. E., & Elek-Fisk, E.
    (2003). Methods for handling missing data. In
    J. A. Schinka & W. F. Velicer (Eds.), Research
    Methods in Psychology (pp. 87-114). Volume 2 of
    Handbook of Psychology (I. B. Weiner,
    Editor-in-Chief). New York: John Wiley & Sons.
  • Graham, J. W., Taylor, B. J., Olchowski, A. E., &
    Cumsille, P. E. (2005). Planned missing data
    designs in psychological research. Manuscript
    submitted for publication.
  • http://mcgee.hhdev.psu.edu/publication_resources
  • email: jgraham_at_psu.edu