Missing data - PowerPoint PPT Presentation

Slides: 9
Provided by: Gold102

Transcript and Presenter's Notes

Title: Missing data


1
Missing data issues and extensions
  • For multilevel data we need to impute missing
    data for variables defined at higher levels
  • We need to have a valid procedure for discrete
    variables
  • Useful to include sampling weights
  • Can we deal with partially missing data?

2
Consider the imputation stage with a set of
multivariate responses
  • We illustrate first with a simple model where the
    response joint distribution is MVN and there are
    responses at 2 levels
  • To illustrate how such a model is specified,
    consider repeated measures of children's heights
    (level 1), with the child's adult height at level 2.

3
Child heights and adult height

Child height is modelled as a cubic polynomial with
intercept and slope random at level 2; both are correlated
with the adult height random effect, giving a 3-variate
normal.
This allows us to model level 1 and level 2 variables
with missing data jointly (see Goldstein and
Kounali, JRSSA, 2009).
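
The slide's equations did not survive in this transcript. One way to write a model matching the verbal description is sketched below; the symbols (t for the measurement occasion, beta and gamma for fixed coefficients, u for level 2 random effects, e for the level 1 residual) are notation assumed here, not taken from the slides.

```latex
\begin{aligned}
y^{(1)}_{ij} &= \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3
               + u_{0j} + u_{1j} t_{ij} + e_{ij}, \\
y^{(2)}_{j}  &= \gamma_0 + u_{2j}, \\
e_{ij} &\sim N(0, \sigma_e^2), \qquad
(u_{0j}, u_{1j}, u_{2j})^{\top} \sim N_3(0, \Omega_u).
\end{aligned}
```

Here the first line is the level 1 child height response for occasion i of child j, the second is the level 2 adult height response, and the off-diagonal elements of the 3 x 3 covariance matrix carry the correlation between growth and adult height.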
4
  • Results

Thus, data missing at either level 1 or level 2 are
imputed via the MCMC algorithm.
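
As a rough illustration of what an imputation draw looks like, the sketch below fills in one missing component of a bivariate normal pair given the observed component and the current parameter values. It is a single-level simplification with hypothetical names (`impute_missing_component` and its arguments); it is not the MLwiN or REALCOM algorithm, which also conditions on the multilevel random effects.

```python
import math
import random


def impute_missing_component(observed, mean_obs, mean_mis,
                             sd_obs, sd_mis, corr, rng):
    """Draw the missing member of a bivariate normal pair.

    Uses the standard conditional normal result: given the observed value,
    the missing value is normal with a shifted mean and reduced variance.
    All parameter values are taken as the current MCMC values (assumed given).
    """
    cond_mean = mean_mis + corr * (sd_mis / sd_obs) * (observed - mean_obs)
    cond_sd = sd_mis * math.sqrt(1.0 - corr ** 2)
    return rng.gauss(cond_mean, cond_sd)


# Illustrative values only: an observed childhood height and a missing adult height.
rng = random.Random(42)
draw = impute_missing_component(observed=130.0, mean_obs=125.0, mean_mis=170.0,
                                sd_obs=6.0, sd_mis=7.0, corr=0.7, rng=rng)
```

Repeating such a draw at each iteration, with the parameters themselves updated in between, is what propagates the uncertainty about the missing value into the completed data sets.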
5
Mixed response types
  • For ordered or unordered categorical data we can
    specify corresponding latent normal
    distributions.
  • For an ordered response we can consider a probit
    threshold model s.t.
  • the cumulative probability of being in one of the
    categories 1, ..., s is
  • and the associated latent normal model is
  • For a p-category unordered response we can
    define a latent (p-1)-variate normal
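
The formulas for the ordered case were lost from this transcript. A standard way of writing a probit threshold model and its latent normal form is given below as a sketch; the notation (thresholds alpha, linear predictor X beta, latent response y*) is assumed here and may differ from the original slide.

```latex
\gamma_s = \Pr(y \le s)
         = \int_{-\infty}^{\alpha_s - X\beta} \phi(t)\,dt
         = \Phi(\alpha_s - X\beta),
\qquad
y^{*} = X\beta + e, \quad e \sim N(0, 1),
```

with category s observed when \alpha_{s-1} < y^{*} \le \alpha_s, taking the lowest threshold as minus infinity and the highest as plus infinity.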

We can define MCMC steps to sample, from the observed
categorical responses, an underlying normal or
MVN. Note that these are further conditioned on
the remaining set of (correlated) normal
variables. For details see Goldstein, H., Carpenter, J.,
Kenward, M. and Levin, K. (2009). Multilevel models with
multivariate mixed response types. Statistical Modelling
(to appear).
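
A minimal sketch of such a step for an ordered response, assuming a probit threshold model with unit residual variance: given the observed category, the latent value is drawn from a normal distribution truncated to the interval between the neighbouring thresholds. The function name and arguments are hypothetical, and the conditioning on the other correlated normal variables mentioned above is omitted for brevity.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def sample_latent(category, thresholds, linear_predictor, rng):
    """Draw a latent normal value consistent with an observed ordered category.

    category: integer in 1..len(thresholds)+1
    thresholds: increasing cut points separating adjacent categories
    linear_predictor: fixed-part value (assumed given at this iteration)
    """
    cuts = [float("-inf")] + list(thresholds) + [float("inf")]
    lower = cuts[category - 1] - linear_predictor
    upper = cuts[category] - linear_predictor
    p_lo = 0.0 if lower == float("-inf") else STD_NORMAL.cdf(lower)
    p_hi = 1.0 if upper == float("inf") else STD_NORMAL.cdf(upper)
    # Inverse-cdf sampling restricted to (p_lo, p_hi); clamp away from 0 and 1
    # because the inverse cdf is undefined at the endpoints.
    u = p_lo + (p_hi - p_lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return linear_predictor + STD_NORMAL.inv_cdf(u)


rng = random.Random(7)
latent = sample_latent(category=2, thresholds=[-0.5, 0.7],
                       linear_predictor=0.2, rng=rng)
```

Inverse-cdf sampling is used here only because it needs nothing beyond the standard library; any truncated normal sampler would serve the same purpose.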
6
Imputation
  • So now with any mixture of categorical and normal
    variables at any level, we sample, for each MCMC
    iteration, a MVN set of variables including
    imputed values.
  • Thus imputation is standard and the reverse
    transformation is used to obtain imputed
    variables on the categorical scales.
  • For non-normal continuous data we can use e.g. a
    Box-Cox normalising transformation to sample a
    latent normal. Further extensions for Poisson and
    other discrete distributions are also available.
  • Release 2.10 of MLwiN has a link to REALCOM that
    allows these extensions.
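
The two reverse transformations mentioned on this slide can be sketched as follows, assuming a known Box-Cox parameter and known thresholds (both would be estimated in practice). The helper names are hypothetical and this is not the REALCOM implementation.

```python
import math
from bisect import bisect_left


def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Map a value on the latent (transformed) scale back to the original scale."""
    if lam == 0:
        return math.exp(z)
    base = lam * z + 1.0
    if base <= 0:
        raise ValueError("value is outside the range of the transform")
    return base ** (1.0 / lam)


def latent_to_category(latent, thresholds):
    """Return category s (1-based) such that alpha_(s-1) < latent <= alpha_s."""
    return bisect_left(thresholds, latent) + 1


# An imputed latent value is converted back to the observed scale.
imputed_category = latent_to_category(0.1, [-0.5, 0.7])
imputed_original = inverse_box_cox(box_cox(3.7, 0.5), 0.5)
```

The imputed values are drawn on the normal scale and only converted at the end, so the imputation model itself never has to handle the categorical or skewed scale directly.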

7
Partially observed (coarsened) data
  • Where we have a prior (estimated) probability
    distribution (PD) for a missing discrete (or
    continuous) variable value we simply insert an
    extra MCMC step that accepts the standard MI
    value with a probability that is just the
    probability given by the PD. A corresponding step
    is used for normal data.
  • This thus uses all of the data efficiently. No
    data are discarded so long as it is possible to
    assign a PD.
  • Applications in record matching, rating scales
    with uncertain responses etc.
  • Several completed data sets are produced and
    combined as in standard MI

8
Sampling weights - briefly
  • Consider a 2-level model
  • Write level 2 weights as
  • Level 1 weights for j-th level 2 unit as
  • Final level 1 weights
  • We use as the level 1 random part
    explanatory variable instead of the constant 1
  • This will be used for imputation and for the model
    of interest (MOI)

Ongoing work to incorporate this into
MLwiN-REALCOM
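
The weight formulas on this slide were lost in the transcript. The sketch below assumes one common scaling: level 2 weights standardised to average 1 across level 2 units, level 1 weights standardised to average 1 within each level 2 unit, composite weights rescaled to average 1 overall, and the inverse square root of the composite weight used as the level 1 random part explanatory variable. All of these choices, and the function name `composite_weights`, are assumptions for illustration rather than the presenter's definitions.

```python
from collections import defaultdict


def composite_weights(level2_weights, level1_weights):
    """Return (final level 1 weights, level 1 random part variable).

    level2_weights: {j: raw weight of level 2 unit j}
    level1_weights: {(i, j): raw weight of level 1 unit i within unit j}
    """
    n_units = len(level2_weights)
    total2 = sum(level2_weights.values())
    w2 = {j: n_units * w / total2 for j, w in level2_weights.items()}

    sums, counts = defaultdict(float), defaultdict(int)
    for (_, j), w in level1_weights.items():
        sums[j] += w
        counts[j] += 1
    w1 = {(i, j): counts[j] * w / sums[j]
          for (i, j), w in level1_weights.items()}

    # Composite weight: product of the two standardised weights, rescaled
    # so that the composite weights sum to the number of level 1 units.
    raw = {key: w1[key] * w2[key[1]] for key in w1}
    scale = len(raw) / sum(raw.values())
    final = {key: scale * v for key, v in raw.items()}

    # Assumed replacement for the constant 1 in the level 1 random part.
    z = {key: v ** -0.5 for key, v in final.items()}
    return final, z


final, z = composite_weights({1: 1.0, 2: 3.0},
                             {(1, 1): 1.0, (2, 1): 1.0, (1, 2): 2.0, (2, 2): 2.0})
```

With equal raw weights every composite weight is 1 and the explanatory variable reduces to the usual constant, so the unweighted model is recovered as a special case.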