Analysing partially observed datasets by embedding in multilevel models

1
Analysing partially observed datasets by
embedding in multilevel models
  • James Carpenter (1), Michael Spratt (2), Jonathan
    Sterne (2), Mike Kenward (1)
  • (1) Medical Statistics Unit, London School of
    Hygiene & Tropical Medicine
  • (2) Department of Social Medicine, University of
    Bristol

2
Overview
  • Motivating example from ALSPAC study
  • Brief review of issues raised by missing data
  • Handling missing data by embedding the analysis
    in a multilevel model
  • Simulated example
  • Analysis of ALSPAC blood pressure data
  • Discussion
  • References.

3
Blood Pressure in ALSPAC
  • ALSPAC: Avon Longitudinal Study of Parents and
    Children
  • General interest in the extent to which adverse
    circumstances in early life have a long term
    effect
  • Here, take an illustrative example: after
    adjusting for systolic blood pressure at 7 years,
    sex, teenage pregnancy and mother's educational
    level, is systolic blood pressure at 9 years
    still affected by defective housing at birth?

4
Blood Pressure in ALSPAC
  • Social deprivation variables fully observed
  • Some missing blood pressure readings at both 7
    and 9 years
  • 10.5% missing at age 7
  • 15.5% missing at age 9

5
Recap on missing data
  • When we have missing observations, we need to
    consider the reason they are unseen: the
    missingness mechanism
  • We say they are Missing Completely At Random
    (MCAR) when each individual's missingness
    mechanism is independent of anything we wish to
    draw inference about
  • We say they are Missing At Random (MAR) when,
    given an individual's observed data, their
    missingness mechanism does not depend on their
    unseen values
  • We say they are Missing Not At Random (MNAR) if
    the missingness mechanism is neither of the
    above.
  • Note that, strictly, different individuals can
    have different mechanisms within the MAR class

6
Options for analysis
  • Use complete cases
  • If data are not MCAR, results will be biased; if
    they are MCAR, we still lose power
  • Assume unseen data are MAR, and then either:
  • Use multiple imputation, or weighting, OR
  • Embed the analysis in a multilevel model which
    takes care of the imputation step

7
Is blood pressure MCAR?
  • Logistic regression of the chance of observing
    9-year blood pressure on the other predictors
  • Data are not MCAR
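
The check on this slide can be sketched as follows. This is a minimal illustration, not the authors' analysis: the data are invented, a single binary predictor stands in for the ALSPAC covariates, and the helper fit_logistic and all numeric values are assumptions made for the example. A slope clearly different from zero indicates that the chance of observing the response depends on the predictor, so the data are not MCAR.

```python
import math
import random

def fit_logistic(xs, rs, iters=25):
    """Newton-Raphson fit of logit P(r = 1) = a + b * x for one covariate."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, r in zip(xs, rs):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += r - p          # score for the intercept
            g1 += (r - p) * x    # score for the slope
            h00 += w             # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    se_b = math.sqrt(h00 / det)  # from the information at the last iteration
    return a, b, se_b

# Hypothetical data: x is a fully observed binary predictor, r = 1 when the
# 9-year reading is observed. The true log-odds (1.5, -0.8) are invented.
rng = random.Random(7)
xs = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(5000)]
rs = [1.0 if rng.random() < 1.0 / (1.0 + math.exp(-(1.5 - 0.8 * x))) else 0.0
      for x in xs]

intercept, slope, se_slope = fit_logistic(xs, rs)
print(f"log-odds ratio for observing y: {slope:.2f} (SE {se_slope:.2f})")
```

In practice the same regression would include all the fully observed predictors and would be fitted with a standard package rather than a hand-written routine.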

8
MAR analysis by embedding model of interest in a
multilevel model
  • When data are MAR, we get valid inference by
    regressing the partially observed data on the
    fully observed data
  • This is essentially what multiple imputation does
  • Therefore we can avoid MI if we can set up a
    model where:
  • partially observed data are responses, and
  • the model parameterisation includes the
    coefficients we are interested in

9
When can we do this?
  • We can do this when we wish to regress a
    partially observed response at some follow-up
    time on, say, birth data
  • We bring in intermediate data as additional
    responses
  • A property of the normal distribution means we
    can also do this if we wish to adjust for a
    partially observed variable provided we set up
    the model for the mean and variance correctly
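
The property of the normal distribution referred to here is presumably the linear form of the conditional mean. A sketch under assumed notation: z is the partially observed variable we wish to adjust for and x is a single fully observed covariate (as in the simulated example); the symbols alpha, beta and sigma are introduced here for illustration and are not taken from the slides.

```latex
\begin{aligned}
\begin{pmatrix} z_i \\ y_i \end{pmatrix}
 &\sim N\!\left(
   \begin{pmatrix} \alpha_0 + \alpha_1 x_i \\ \beta_0 + \beta_1 x_i \end{pmatrix},
   \begin{pmatrix} \sigma_{zz} & \sigma_{zy} \\ \sigma_{zy} & \sigma_{yy} \end{pmatrix}
 \right), \\
E[y_i \mid z_i, x_i]
 &= \beta_0 + \beta_1 x_i
    + \frac{\sigma_{zy}}{\sigma_{zz}}\,(z_i - \alpha_0 - \alpha_1 x_i) \\
 &= \Bigl(\beta_0 - \tfrac{\sigma_{zy}}{\sigma_{zz}}\,\alpha_0\Bigr)
    + \Bigl(\beta_1 - \tfrac{\sigma_{zy}}{\sigma_{zz}}\,\alpha_1\Bigr)\, x_i
    + \tfrac{\sigma_{zy}}{\sigma_{zz}}\, z_i .
\end{aligned}
```

Under this sketch the z-adjusted coefficient of x is a function of both the mean parameters and the covariance parameters of the joint model, which is why both parts of the model must be set up correctly.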

10
Illustration
  • Response y and covariate x both partially
    observed. Additional fully observed categorical
    variable (two levels)
  • Model
  • Exact numerical equivalence if there are no
    missing data; consistent otherwise.
  • Assumption of multivariate normality required
    for many MI packages too.
  • Theoretical arguments in paper.

11
Simulated example
  • y is outcome variable, normally distributed
  • z is normal variable correlated with y and also
    predictive of missingness of y
  • z can be thought of as a baseline, or earlier
    measurement

12
Other covariates
  • Generate binary x1 and x2 predictive of y
  • Make some values of y and z MAR
  • Estimate the association of y with x1, adjusted
    for x2 and z

13
Missing data mechanism
  • Missingness in y artificially created, with
    missingness mechanism dependent on x1, x2 and z
  • Missingness in z created, with missingness
    mechanism dependent on y, x1 and x2
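
The simulation design on the last three slides can be sketched as follows. All coefficients, the sample size and the seed are invented for illustration, and the small ols helper is a stand-in for a regression routine; none of it is taken from the authors' simulation. The sketch compares the x1 coefficient from the original data with the complete-case estimate after values are made missing.

```python
import math
import random

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def ols(rows, ys):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(2024)
n = 20000
full_rows, full_y, cc_rows, cc_y = [], [], [], []
for _ in range(n):
    x1 = 1.0 if rng.random() < 0.5 else 0.0
    x2 = 1.0 if rng.random() < 0.5 else 0.0
    z = 0.5 * x1 + 0.5 * x2 + rng.gauss(0.0, 1.0)             # baseline
    y = 1.0 * x1 + 0.5 * x2 + 0.6 * z + rng.gauss(0.0, 1.0)   # outcome
    row = [1.0, x1, x2, z]
    full_rows.append(row)
    full_y.append(y)
    # Missingness in y depends on x1, x2 and z (assumed coefficients).
    y_missing = rng.random() < expit(-1.5 + 0.5 * x1 + 0.5 * x2 + 0.5 * z)
    # Missingness in z depends on y, x1 and x2 (assumed coefficients).
    z_missing = rng.random() < expit(-2.0 + 1.0 * y + 0.5 * x1 + 0.5 * x2)
    if not (y_missing or z_missing):
        cc_rows.append(row)
        cc_y.append(y)

full_est = ols(full_rows, full_y)[1]   # x1 coefficient, original data
cc_est = ols(cc_rows, cc_y)[1]         # x1 coefficient, complete cases
print(f"complete cases: {len(cc_rows)} of {n}")
print(f"x1 coefficient: full data {full_est:.3f}, complete cases {cc_est:.3f}")
```

In this particular configuration, discarding cases according to the value of y is expected to pull the complete-case coefficient away from the value used to generate the data (1.0). The direction and size of the bias depend on the assumed mechanism and need not match the ALSPAC results.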

14
Simulated Examples
  • 5 analyses carried out:
  • Analysis using original data
  • Then, after observations made missing:
  • Analysis of remaining data (complete cases)
  • MI analysis using Stata inorm [2]
  • Analysis using Stata ice [4]
  • Analysis by embedding in a multilevel model (SAS
    proc MIXED, MLwiN)
15
Results

16
Analysis of ALSPAC Blood Pressure Example
  • Blood pressure example analysed in similar way to
    the simulations
  • Y: systolic blood pressure (sbp) at 9 years
  • X1: defective housing (main predictor of
    interest)
  • X2: gender
  • X3: teenage pregnancy
  • X4: basic educational attainment
  • Z: sbp at 7 years

17
Missing data assumption
  • For each individual, their missing blood pressure
    (either at 7 or 9 years) is MAR given their
    observed covariates:
  • blood pressure,
  • defective housing,
  • gender,
  • teenage pregnancy and
  • mother's educational level

18
Results
  • Effect of defective housing at birth, adjusting
    for blood pressure at 7, gender, and teenage
    pregnancy

19
Discussion: blood pressure analysis
  • We've seen that the unseen data are not MCAR; our
    results assume the unseen data are MAR
  • Results consistent with an ongoing effect of
    defective housing at birth on blood pressure
  • Complete case analysis overestimates the effect,
    and also loses information (by ignoring
    information from partially observed cases)
  • Multiple imputation methods agree with the
    multilevel model approach, in line with theory

20
Discussion: implications
  • Illustrated how missing data in a single-level
    analysis can be handled by embedding in a
    multilevel model
  • Key assumption is multivariate normality
  • MI is a stochastic estimation process so cannot
    be more efficient than direct estimation, which
    also has the advantage of easy repeatability
  • Modelling can be done in your favourite package:
    MLwiN, Stata, SAS proc MIXED
  • However, within Stata xtmixed, we currently
    cannot define the residual correlation matrix as
    unstructured
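
On the efficiency point above, one way to see the extra cost of MI is through the standard combining rules (Rubin's rules), which the slides do not spell out. With m imputed datasets giving estimates and variance estimates for a parameter theta, the symbols below are introduced here for illustration:

```latex
\begin{aligned}
\bar{\theta} &= \frac{1}{m} \sum_{k=1}^{m} \hat{\theta}_k, \qquad
W = \frac{1}{m} \sum_{k=1}^{m} \widehat{\operatorname{Var}}(\hat{\theta}_k), \qquad
B = \frac{1}{m-1} \sum_{k=1}^{m} \bigl(\hat{\theta}_k - \bar{\theta}\bigr)^2, \\
T &= W + \Bigl(1 + \frac{1}{m}\Bigr) B .
\end{aligned}
```

The B/m term is Monte Carlo variance from using a finite number of imputations. It vanishes only as m grows, and it also means that repeating the analysis with a different random seed gives a slightly different answer, whereas direct estimation does not.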

21
References
  • [1] J. R. Carpenter and M. G. Kenward. Missing
    data in randomised controlled trials: a practical
    guide. Birmingham: NCCRM (2007).
  • [2] R. J. A. Little and D. B. Rubin. Statistical
    analysis with missing data (2nd edition). Wiley
    (2002).
  • [3] J. L. Schafer. Analysis of incomplete
    multivariate data. CRC Press (1997).
  • [4] S. van Buuren, H. C. Boshuizen and D. L.
    Knook. Multiple imputation of missing blood
    pressure covariates in survival analysis.
    Statistics in Medicine 18, 681-694 (1999).

23
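Appendix: SAS code for first example

    proc mixed data=from_stata_x1;
      title 'Reg condit on baseline introduction of mytreat to enforce similar means for treat at bl';
      /* id = subject, time = 1 (z) or 2 (y), mytreat = 3-level mean indicator */
      class id time mytreat;
      /* one mean per mytreat level, no intercept; Kenward-Roger degrees of freedom */
      model newy = mytreat / s htype=2 noint ddfm=kr;
      /* unstructured covariance between the two responses within each subject */
      repeated time / subject=id type=un;
      /* pairwise differences between the mytreat level means */
      lsmeans mytreat / diff;
    run;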



  • Data in long format: newy is z when time = 1 and
    is y when time = 2
  • In order to adjust for z (i.e. baseline), a
    3-level categorical variable mytreat is defined:
  • mytreat = 0 if time is 1
  • mytreat = 1 if time is 2 and x is 0
  • mytreat = 2 if time is 2 and x is 1
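
The long-format layout and the mytreat coding described above can be sketched as follows. The function to_long, the field names of the wide records and the example values are hypothetical. Rows with a missing response are dropped here on the assumption that the fitting routine would exclude them anyway.

```python
def to_long(records):
    """Stack wide records (id, x, z, y) into long rows, one per observed response."""
    long_rows = []
    for rec in records:
        for time, value in ((1, rec["z"]), (2, rec["y"])):
            if value is None:
                continue  # unseen response: no row for this occasion
            # mytreat: 0 at time 1; at time 2, 1 if x is 0 and 2 if x is 1
            mytreat = 0 if time == 1 else 1 + rec["x"]
            long_rows.append(
                {"id": rec["id"], "time": time, "mytreat": mytreat, "newy": value}
            )
    return long_rows

# Invented wide-format records; None marks a missing value.
wide = [
    {"id": 1, "x": 0, "z": 100.0, "y": 104.0},
    {"id": 2, "x": 1, "z": None, "y": 110.0},
    {"id": 3, "x": 1, "z": 98.0, "y": None},
]
long_rows = to_long(wide)
for row in long_rows:
    print(row)
```

Subjects with only one of z and y observed still contribute a row, which is how the partially observed cases enter the analysis.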