Filling Holes in Your Data: Multiple Imputation in Education Research

1
Filling Holes in Your Data: Multiple Imputation
in Education Research
  • Paul T. von Hippel
  • Harvard Graduate School of Education
  • Larsen G-06
  • Wednesday, April 22, 1-2:30 pm

2
I. Background
II. New Results
3
I. Background
4
Education Data…
5
…Are Full of Holes
6
Listwise Deletion
  • aka Case deletion, Complete-case analysis

7
Impute the Mean
8
Regression Imputation
  • Impute the conditional mean

9
Random Regression Imputation
  • Conditional mean + random variation
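
The three single-imputation methods on the preceding slides can be compared in a minimal sketch. The simulated data, the variable names `x` and `y`, and the 40% missingness rate are assumptions made for illustration; they are not from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)   # hypothetical data-generating model

missing = rng.random(n) < 0.4            # Y missing completely at random (assumed)
obs = ~missing

# Impute the mean: every hole gets the observed mean of Y.
y_mean = y.copy()
y_mean[missing] = y[obs].mean()

# Regression imputation: every hole gets the conditional mean of Y given X.
slope, intercept = np.polyfit(x[obs], y[obs], 1)
y_reg = y.copy()
y_reg[missing] = intercept + slope * x[missing]

# Random regression imputation: conditional mean + random variation.
resid_sd = np.std(y[obs] - (intercept + slope * x[obs]), ddof=2)
y_rand = y.copy()
y_rand[missing] = (intercept + slope * x[missing]
                   + rng.normal(scale=resid_sd, size=missing.sum()))

for name, v in [("complete", y), ("mean", y_mean),
                ("regression", y_reg), ("random regression", y_rand)]:
    print(f"{name:>18}: variance of Y = {np.var(v):.2f}")
```

In this setup the first two methods shrink the variance of Y, while adding random variation keeps it close to the complete-data value.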

10
Add Extra Regressors
  • Graduation rate. Sector (public vs private).

11
Multiple Imputation (Rubin 1987)
  • Steps
  • 1. Replication
  • 2. Imputation
  • 3. Analysis
  • 4. Recombination
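
The four steps can be sketched end to end as follows. This is an illustrative sketch, not the presenter's code: the data are simulated, the imputation model is a simple regression of X on Y, and parameter uncertainty is approximated by bootstrapping the complete cases (an assumption; software such as SAS PROC MI draws parameters differently). The recombination step uses Rubin's (1987) rules.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 500, 10
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)   # hypothetical model; true slope is 0.5
x_obs = x.copy()
x_obs[rng.random(n) < 0.3] = np.nan      # punch holes in X (assumed MCAR)
miss = np.isnan(x_obs)
complete_rows = np.flatnonzero(~miss)

def ols_slope(x, y):
    """Return the slope of y on x and its estimated sampling variance."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], cov[1, 1]

estimates, variances = [], []
for m in range(M):
    x_imp = x_obs.copy()                                   # 1. replication
    # Bootstrap the complete cases so each replicate uses different parameters.
    idx = rng.choice(complete_rows, size=complete_rows.size, replace=True)
    b, a = np.polyfit(y[idx], x_obs[idx], 1)               # regress X on Y
    sd = np.std(x_obs[idx] - (a + b * y[idx]), ddof=2)
    x_imp[miss] = a + b * y[miss] + rng.normal(scale=sd, size=miss.sum())  # 2. imputation
    est, var = ols_slope(x_imp, y)                         # 3. analysis
    estimates.append(est)
    variances.append(var)

# 4. recombination (Rubin's rules)
q_bar = np.mean(estimates)           # pooled point estimate
W = np.mean(variances)               # within-imputation variance
B = np.var(estimates, ddof=1)        # between-imputation variance
T = W + (1 + 1 / M) * B              # total variance
print(f"slope = {q_bar:.3f}, SE = {np.sqrt(T):.3f}")
```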

12
1-2. Replication & Imputation
13
1-2. Replication & Imputation
  • The imputed variable is not the original
    variable.
  • They just have similar statistical properties.

14
3-4. Analysis & Recombination
  • How many imputations?
  • Large γ (fraction of missing information) → large M
  • Often enough (Rubin 1987; von Hippel 2005)
  • M = 3 to 10
  • Surely enough (Bodner 2008)
  • M 100 γ 31
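
The link between the fraction of missing information and the number of imputations can be made concrete with Rubin's (1987) approximation for the efficiency of an estimate based on M imputations relative to one based on infinitely many. The symbol γ here is read as the fraction of missing information, which is an interpretation of the slide's notation.

```latex
\mathrm{RE}(M, \gamma) \approx \left(1 + \frac{\gamma}{M}\right)^{-1}
% Worked example with \gamma = 0.3:
%   M = 5:   (1 + 0.3/5)^{-1}  = 1/1.06 \approx 0.943
%   M = 10:  (1 + 0.3/10)^{-1} = 1/1.03 \approx 0.971
```

By this measure a handful of imputations already recovers most of the efficiency of the point estimate, which is the usual argument for M = 3 to 10; larger values of γ push the required M upward.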

15
Models used for imputation
  • SAS's MI procedure
  • Multivariate normal
  • Normal
  • Linear
  • Stata's ice command (Royston 2006)
  • Alternating regression
  • Logistic, Poisson, normal
  • Other models (not widely implemented)
  • R's Mixed and Pan (Schafer 1997)
  • Resampling
  • Non-normal models (He & Raghunathan 2008)

16
II. New results
A. Non-normality
1. Discrete variables (Horton et al. 2003; Allison 2005)
2. Skew (von Hippel 2008)
B. Missing Y (von Hippel 2007)
C. Nonlinearity
1. Interactions (von Hippel 2009; Allison 2002)
2. Curves (von Hippel 2009)
Theme: Your data can look wrong, so long as your estimates are right.
17
IIA. Non-normality
18
IIA1. Rounding discrete variables
  • Common advice
  • Impute dummy as normal
  • Round normal imputations to 0 and 1

(Horton, Lipsitz & Parzen 2003; Allison 2005;
Bollen & Barb 1981)
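
A small simulation illustrates the kind of rounding bias that Horton, Lipsitz & Parzen (2003) describe. The setup is an assumption made for illustration: a dummy variable with true proportion 0.2, half of its values missing completely at random, imputed as though normal using the observed mean and standard deviation, then rounded at 0.5.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200_000, 0.2
d = (rng.random(n) < p).astype(float)    # hypothetical dummy variable
miss = rng.random(n) < 0.5               # half the values are missing (assumed MCAR)

# Impute the dummy as though it were normal.
mu, sd = d[~miss].mean(), d[~miss].std(ddof=1)
unrounded = d.copy()
unrounded[miss] = rng.normal(mu, sd, size=miss.sum())

# Common advice: round the normal imputations to 0 and 1.
rounded = unrounded.copy()
rounded[miss] = (unrounded[miss] >= 0.5).astype(float)

print(f"true proportion:     {p:.3f}")
print(f"unrounded estimate:  {unrounded.mean():.3f}")
print(f"rounded estimate:    {rounded.mean():.3f}")
```

The unrounded imputations include impossible values such as negative numbers, yet their mean stays close to the true proportion; rounding makes the data look plausible and shifts the estimate.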
19
IIA2. Rounding skewed variables
  • Skewed variable.
  • Impute as though normal.
  • Truncate implausible values.

(von Hippel 2008)
20
IIA2. Transforming skewed variables
(von Hippel 2008)
21
IIA. Non-normality: Summary
  • Best: impute using a model that fits
  • Often OK: impute as though normal
  • Bad: trying to make data normal; editing imputations
  • Principle: imputed data ≠ original data; imputed estimates = original estimates

22
IIB. Missing Y
23
IIB. Missing Y
  • Missing Ys are useless for regression
  • But cases with missing Ys have information about
    X
  • (Little 1992)

(von Hippel 2007)
24
IIB. Missing Y: Multiple Imputation, then
Deletion (MID)
(von Hippel 2007)
  • Steps
  • 1. Replication
  • 2. Imputation
  • 2½. Deletion of cases with imputed Y
  • 3. Analysis
  • 4. Recombination
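
The mechanics of the extra deletion step can be sketched as follows. This is a simplified illustration, not von Hippel's implementation: the data are simulated, no case is missing both X and Y, and each imputation model is fit to a bootstrap sample of the complete cases (an assumption standing in for a full multivariate normal imputation model). Only point estimates are recombined here.

```python
import numpy as np

rng = np.random.default_rng(3)
n, M = 1000, 20
x_full = rng.normal(size=n)
y_full = 1.0 + 0.5 * x_full + rng.normal(size=n)   # hypothetical model; true slope 0.5

# Punch holes so that no case is missing both X and Y (assumed for simplicity).
u = rng.random(n)
x = np.where(u < 0.2, np.nan, x_full)
y = np.where(u > 0.8, np.nan, y_full)
x_miss, y_miss = np.isnan(x), np.isnan(y)
rows = np.flatnonzero(~x_miss & ~y_miss)           # complete cases

def draw_fit(target, predictor, rows, rng):
    """Fit target on predictor in a bootstrap sample of the complete cases."""
    idx = rng.choice(rows, size=rows.size, replace=True)
    slope, intercept = np.polyfit(predictor[idx], target[idx], 1)
    sd = np.std(target[idx] - (intercept + slope * predictor[idx]), ddof=2)
    return intercept, slope, sd

mi_est, mid_est = [], []
for _ in range(M):
    xi, yi = x.copy(), y.copy()                                   # 1. replication
    a, b, s = draw_fit(x, y, rows, rng)                           # 2. imputation of X
    xi[x_miss] = a + b * y[x_miss] + rng.normal(scale=s, size=x_miss.sum())
    a, b, s = draw_fit(y, x, rows, rng)                           #    imputation of Y
    yi[y_miss] = a + b * x[y_miss] + rng.normal(scale=s, size=y_miss.sum())
    mi_est.append(np.polyfit(xi, yi, 1)[0])                       # ordinary MI analysis
    keep = ~y_miss                                                # 2½. delete imputed Ys
    mid_est.append(np.polyfit(xi[keep], yi[keep], 1)[0])          # 3. analysis

mi_slope, mid_slope = np.mean(mi_est), np.mean(mid_est)           # 4. recombination
print(f"MI slope = {mi_slope:.3f}, MID slope = {mid_slope:.3f}")
```

Cases with imputed X stay in the analysis; only cases whose Y was imputed are dropped before the regression is run.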

25
IIC. Non-linearity
26
IIC1. Nonlinearity: Interactions (Allison 2002;
von Hippel 2009)
  • Complete data: Y regressed on X, D, and DX

27
Impute, then Interact?
  • Impute (X,Y,D) as though linear (no interaction).
  • Then regress Y on X, D, and DX.

28
Stratify, then Impute (Allison 2002; von Hippel
2009)
  • 2 strata: public schools and private schools.
  • Impute (X,Y) as linear within each stratum.
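
A sketch comparing the two strategies from this slide and the previous one. Everything here is assumed for illustration: the simulated data, a true interaction coefficient of 2, half of X missing completely at random, and a single random regression imputation in place of full multiple imputation. The helper names `fill` and `interaction_coef` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
d = (rng.random(n) < 0.5).astype(float)       # hypothetical sector dummy (1 = private)
x_full = rng.normal(size=n)
y = 1.0 + x_full + d + 2.0 * d * x_full + rng.normal(size=n)   # true interaction = 2
miss = rng.random(n) < 0.5
x = np.where(miss, np.nan, x_full)

def fill(x, Z, fit_rows, fill_rows, rng):
    """Random regression imputation of x from Z: fit on fit_rows, fill fill_rows."""
    A = np.column_stack([np.ones(len(x)), Z])
    beta, *_ = np.linalg.lstsq(A[fit_rows], x[fit_rows], rcond=None)
    sd = (x[fit_rows] - A[fit_rows] @ beta).std(ddof=A.shape[1])
    out = x.copy()
    out[fill_rows] = A[fill_rows] @ beta + rng.normal(scale=sd, size=fill_rows.sum())
    return out

def interaction_coef(x, y, d):
    """Coefficient on DX when Y is regressed on X, D, and DX."""
    A = np.column_stack([np.ones(len(x)), x, d, d * x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[3]

# Impute, then interact: one linear imputation model with no interaction.
x_pooled = fill(x, np.column_stack([y, d]), ~miss, miss, rng)
pooled_coef = interaction_coef(x_pooled, y, d)

# Stratify, then impute: a separate linear imputation model in each stratum.
x_strat = x
for s in (0.0, 1.0):
    in_s = d == s
    x_strat = fill(x_strat, y, ~miss & in_s, miss & in_s, rng)
strat_coef = interaction_coef(x_strat, y, d)

print(f"impute, then interact: {pooled_coef:.2f}")
print(f"stratify, then impute: {strat_coef:.2f}")
```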

29
Interact, then Impute!
  • Impute the interaction, like any other variable.
  • Then regress on the imputed interaction.
  • (Allison 2002; von Hippel 2009)

30
IIC2. Nonlinearity: Curves (von Hippel 2009)
  • Complete data: Y regressed on X and X²

31
Impute, then Square?
  • Impute (X,Y) as though linear (with other
    variables).
  • Then regress Y on X and X²?

32
Square, then Impute (von Hippel 2009)
  • Impute the square like any other variable.
  • Then use the imputed square in regression.
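
The contrast between the two orderings can be sketched with simulated data. The assumptions are loud ones: a true coefficient of 1 on X², half of X missing completely at random, Y fully observed, and a single random regression imputation with fixed parameters rather than proper multiple imputation.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
x_full = rng.normal(size=n)
y = 1.0 + x_full + x_full**2 + rng.normal(size=n)   # true coefficient on X^2 is 1
miss = rng.random(n) < 0.5
obs = ~miss
x = np.where(miss, np.nan, x_full)

A = np.column_stack([np.ones(n), y])    # imputation predictors: intercept and Y

def quad_coef(x, x2, y):
    """Coefficient on the squared term when Y is regressed on X and X^2."""
    B = np.column_stack([np.ones(len(x)), x, x2])
    beta, *_ = np.linalg.lstsq(B, y, rcond=None)
    return beta[2]

# Impute, then square: impute X linearly from Y, then square the imputed X.
beta, *_ = np.linalg.lstsq(A[obs], x[obs], rcond=None)
sd = (x[obs] - A[obs] @ beta).std(ddof=2)
x_pass = x.copy()
x_pass[miss] = A[miss] @ beta + rng.normal(scale=sd, size=miss.sum())
b_passive = quad_coef(x_pass, x_pass**2, y)

# Square, then impute: treat X and X^2 as two variables and impute them jointly.
T = np.column_stack([x, x**2])
Beta, *_ = np.linalg.lstsq(A[obs], T[obs], rcond=None)
resid_cov = np.cov(T[obs] - A[obs] @ Beta, rowvar=False)
T_imp = T.copy()
T_imp[miss] = A[miss] @ Beta + rng.multivariate_normal(
    np.zeros(2), resid_cov, size=miss.sum())
b_transform = quad_coef(T_imp[:, 0], T_imp[:, 1], y)

print(f"impute, then square: {b_passive:.2f}")
print(f"square, then impute: {b_transform:.2f}")
```

In the second approach the imputed "square" is generally not the square of the imputed X and can even be negative, so the data look wrong while the regression estimate stays close to the true value.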

33
IIC. Nonlinearity: Summary
  • Transform, then impute. (von Hippel 2009)
  • 1. Calculate transformation (square, interaction,
    etc.).
  • 2. Impute like any other variable.
  • 3. Use imputed transformation in analysis.
  • Principle: imputed data ≠ real data; imputed statistics = real statistics

34
Conclusions
  • Plausible estimates more important than plausible
    data
  • Normal imputations are versatile, but messy
  • Future research and software
  • Resampling (approximate Bayesian bootstrap)
  • Alternatives to imputation: full-information maximum likelihood estimation
  • Data quality

35
References
  • Allison, P. (2001). Missing Data. Thousand Oaks,
    CA: Sage.
  • Allison, P. (2005). Imputation of Categorical
    Variables with PROC MI. SAS Users Group
    International conference, Philadelphia, PA, April
    10-13.
  • Barnard, J., and Rubin, D. B. (1999). Small-Sample
    Degrees of Freedom With Multiple Imputation.
    Biometrika 86(4), 948-55.
  • He, Y., and Raghunathan, T. E. (2006).
    Tukey's gh Distribution for Multiple
    Imputation. The American Statistician 60(3),
    251-256.
  • Horton, N. J., Lipsitz, S. P., and Parzen, M. (2003).
    A potential for bias when rounding in multiple
    imputation. The American Statistician 57(4),
    229-232.
  • Horton, N. J., and Kleinman, K. P. (2007). Much Ado
    About Nothing: A Comparison of Missing Data
    Methods and Software to Fit Incomplete Data
    Regression Models.
  • Little, R. J. A. (1992). Regression with Missing X's:
    A Review. Journal of the American Statistical
    Association 87(420), 1227-1237.
  • Little, R. J. A., and Rubin, D. B. (2002). Statistical
    Analysis with Missing Data.
  • Kim, J. K. (2004). Finite Sample Properties of
    Multiple Imputation Estimators. The Annals of
    Statistics 32(2), 766-783.
  • Meng, X. L. (1995). Multiple Imputation
    Inferences With Uncongenial Sources of Input.
    Statistical Science 10, 538-73.
  • Rubin, D. B. (1987). Multiple Imputation for
    Nonresponse in Surveys. New York: Wiley.
  • Schafer, J. L. (1997). Analysis of Incomplete
    Multivariate Data. New York: Chapman & Hall.
  • von Hippel, P. T. (2004). Biases in SPSS 12.0
    Missing Value Analysis. The American
    Statistician 58(2), 160-164.
  • von Hippel, P. T. (2005). How Many Imputations Are
    Needed? A Comment on Hershberger and Fisher
    (2003). Structural Equation Modeling 12(2),
    334-335.
  • von Hippel, P. T. (2007). Regression with Missing
    Y's. Sociological Methodology.
  • von Hippel, P. T. (2008). Imputing skewed
    variables. Under review.
  • von Hippel, P. T. (2009). How to impute squares and
    interactions. Sociological Methodology, in
    press.

36
Assumption: Ignorable missingness
  • Missing at random (MAR), noninformative
  • The missing values are like the observed values
    in similar cases.

37
Full information maximum likelihood (ML)
  • Suppose Y has a missing value
  • Estimate the distribution of possible Y values
  • In MI: impute 3-10 values from this distribution
  • In ML: integrate across the full distribution of
    possible Y values
  • Like MI with an infinite number of imputations
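
The contrast between the two approaches can be written out in general notation. The symbols below (θ for the parameters, Y_obs and Y_mis for the observed and missing values, Q̂ for the analysis estimate) are introduced here for illustration and are not taken from the slides.

```latex
% ML: integrate the complete-data density over the missing values
L(\theta \mid Y_{\mathrm{obs}})
  = \int f(Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid \theta)\, dY_{\mathrm{mis}}

% MI: average the analysis over M draws of the missing values
\bar{Q}_M = \frac{1}{M} \sum_{m=1}^{M}
  \hat{Q}\bigl(Y_{\mathrm{obs}}, Y_{\mathrm{mis}}^{(m)}\bigr)
```

Read this way, MI replaces the integral with a Monte Carlo average over a small number of draws, which is why ML can be described loosely as MI with an infinite number of imputations.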

38
ML in AMOS