Missing Data and random effect modelling
1
Lecture 20
  • Missing Data and random effect modelling

2
Lecture Contents
  • What is missing data?
  • Simple ad-hoc methods.
  • Types of missing data. (MCAR, MAR, MNAR)
  • Principled methods.
  • Multiple imputation.
  • Methods that respect the random effect structure.
  • Thanks to James Carpenter (LSHTM) for many
    slides!!

3
Dealing with missing data
  • Why is this necessary?
  • Missing data are common.
  • However, they are usually inadequately handled in
    both epidemiological and experimental research.
  • For example, Wood et al. (2004) reviewed 71
    recently published BMJ, JAMA, Lancet and NEJM
    papers.
  • 89% had partly missing outcome data.
  • In 37 trials with repeated outcome measures, 46%
    performed complete case analysis.
  • Only 21% reported a sensitivity analysis.

4
What do we mean by missing data?
  • Missing data are observations that we intended
    to make but did not. For example, an
    individual may only respond to certain questions
    in a survey, or may not respond at all to a
    particular wave of a longitudinal survey. In the
    presence of missing data, our goal remains making
    inferences that apply to the population targeted
    by the complete sample, i.e. the goal remains what
    it would have been had we seen the complete data.
  • However, both making inferences and performing
    the analysis are now more complex. We will see we
    need to make assumptions in order to draw
    inferences, and then use an appropriate
    computational approach for the analysis.
  • We will avoid adopting computationally simple
    solutions (such as just analysing complete data
    or carrying forward the last observation in a
    longitudinal study) which generally lead to
    misleading inferences.

5
What are missing data?
  • In practice the data consist of (a) the
    observations actually made (where '?' denotes a
    missing observation)
  • and (b) the pattern of missing values

(a) Observed data:

  Unit  Var1  Var2  Var3  Var4  Var5  Var6  Var7
   1     1     2    3.4   4.5    ?    10    1.2
   2     1     3     ?     ?     B    12     ?
   3     2     ?    2.6    ?     C    15     0

(b) Pattern of missing values (1 = observed, 0 = missing):

  Unit  Var1  Var2  Var3  Var4  Var5  Var6  Var7
   1     1     1     1     1     0     1     1
   2     1     1     0     0     1     1     0
   3     1     0     1     0     1     1     1
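  • As an illustrative sketch (not from the original
    slides), the pattern in table (b) can be computed
    directly from the data with pandas:

    import numpy as np
    import pandas as pd

    # A fragment of table (a); np.nan marks the '?' entries.
    df = pd.DataFrame({"var3": [3.4, np.nan, 2.6],
                       "var5": [np.nan, "B", "C"]},
                      index=[1, 2, 3])

    # Missingness indicator, as in table (b): 1 = observed, 0 = missing.
    r = df.notna().astype(int)
    print(r)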
6
Inferential Framework
  • When it comes to analysis, whether we adopt a
    frequentist or a Bayesian approach, the
    likelihood is central.
  • In these slides, for convenience, we discuss
    issues from a frequentist perspective, although
    often we use appropriate Bayesian computational
    strategies to approximate frequentist analyses.

7
Classical Approach
  • The actual sampling process involves the
    'selection' of the missing values, as well as the
    units. So to complete the process of inference in
    a justifiable way we need to take this into
    account.

8
Bayesian Framework
  • Posterior Belief ∝ Prior Belief × Likelihood.
  • Here
  • The likelihood is a measure of comparative
    support for different models given the data. It
    requires a model for the observed data, and as
    with classical inference this must involve
    aspects of the way in which the missing data have
    been selected (i.e. the missingness mechanism).

9
What do we mean by valid inference when we have
missing data?
  • We have already noted that missing data are
    observations we intended to make but did not.
    Thus, the sampling process now involves both the
    selection of the units, AND ALSO the process by
    which observations become missing - the
    missingness mechanism.
  • It follows that for valid inference, we need to
    take account of the missingness mechanism.
  • By valid inference in a frequentist framework we
    mean that the quantities we calculate from the
    data have the usual properties. In other words,
    estimators are consistent, confidence intervals
    attain nominal coverage, p-values are correct
    under the null hypothesis, and so on.

10
Assumptions
  • We distinguish between item and unit nonresponse
    (missingness). For item missingness, values can
    be missing on response (i.e. outcome) variables
    and/or on explanatory (i.e. design/covariate/
    exposure/confounder) variables.
  • Missing data can affect properties of estimators
    (for example, means, percentages, percentiles,
    variances, ratios, regression parameters and so
    on). Missing data can also affect inferences,
    i.e. the properties of tests and confidence
    intervals, and Bayesian posterior distributions.
  • A critical determinant of these effects is the
    way in which the probability of an observation
    being missing (the missingness mechanism) depends
    on other variables (measured or not) and on its
    own value.
  • In contrast with the sampling process, which is
    usually known, the missingness mechanism is
    usually unknown.

11
Assumptions
  • The data alone cannot usually definitively tell
    us the sampling process.
  • Likewise, the missingness pattern, and its
    relationship to the observations, cannot
    definitively identify the missingness mechanism.
  • The additional assumptions needed to allow the
    observed data to be the basis of inferences that
    would have been available from the complete data
    can usually be expressed in terms of either
  • 1. the relationship between selection of missing
    observations and the values they would have
    taken, or
  • 2. the statistical behaviour of the unseen data.
  • These additional assumptions are not subject to
    assessment from the data under analysis; their
    plausibility cannot be definitively determined
    from the data at hand.

12
Assumptions
  • The issues surrounding the analysis of data sets
    with missing values therefore centre on
    assumptions. We have to
  • 1. decide which assumptions are reasonable and
    sensible in any given setting -
    contextual/subject matter information will be
    central to this
  • 2. ensure that the assumptions are transparent
  • 3. explore the sensitivity of
    inferences/conclusions to the assumptions, and
  • 4. understand which assumptions are associated
    with particular analyses.

13
Getting computation out of the way
  • The above implies it is sensible to use
    approaches that make weak assumptions, and to
    seek computational strategies to implement them.
    However, often computationally simple strategies
    are adopted, which make strong assumptions, which
    are subsequently hard to justify.
  • Classic examples are completers analysis (i.e.
    only including units with fully observed data in
    the analysis) and last observation carried
    forward. The latter is sometimes advocated in
    longitudinal studies, and replaces a unit's
    unseen observations at a particular wave with
    their last observed values, irrespective of the
    time that has elapsed between the two waves.
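  • As a minimal illustrative sketch (the long-format
    columns id, wave and y are hypothetical), last
    observation carried forward is a one-liner in
    pandas, which helps explain why it is so often,
    and so uncritically, used:

    import numpy as np
    import pandas as pd

    # One row per subject per wave; np.nan marks unseen observations.
    df = pd.DataFrame({"id":   [1, 1, 1, 2, 2, 2],
                       "wave": [1, 2, 3, 1, 2, 3],
                       "y":    [10.0, np.nan, np.nan, 8.0, 9.0, np.nan]})

    # LOCF: carry each subject's last observed value forward,
    # irrespective of the time elapsed between waves.
    df["y_locf"] = df.sort_values(["id", "wave"]).groupby("id")["y"].ffill()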

14
Conclusions (1)
  • Missing data introduce an element of ambiguity
    into statistical analysis, which is different
    from the traditional sampling imprecision. While
    sampling imprecision can be reduced by increasing
    the sample size, this will usually only increase
    the number of missing observations! As discussed
    in the preceding sections, the issues surrounding
    the analysis of incomplete datasets turn out to
    centre on assumptions and computation.
  • The assumptions concern the relationship between
    the reason for the missing data (i.e. the
    process, or mechanism, by which the data become
    missing) and the observations themselves (both
    observed and unobserved).
  • Unlike, say, in regression, where we can use the
    residuals to check the assumption of normality,
    these assumptions cannot be verified from the
    data at hand.
  • Sensitivity analysis, where we explore how our
    conclusions change as we change the assumptions,
    therefore has a central role in the analysis of
    missing data.

15
Simple, ad-hoc methods and their shortcomings
  • In contrast to principled methods, these usually
    create a single 'complete' dataset, which is
    analysed as if it were the fully observed data.
  • Unless certain, fairly strong, assumptions are
    true, the answers are invalid.
  • We briefly review the following methods
  • Analysis of completers only.
  • Imputation of simple mean.
  • Imputation of regression mean.
  • Creating an extra category.

16
Completers analysis
  • The data below have one missing observation on
    variable 2, for unit 10.
  • Completers analysis deletes all units with
    incomplete data from the analysis (here unit 10).

  Unit  Variable 1  Variable 2
   1       3.4         5.67
   2       3.9         4.81
   3       2.6         4.93
   4       1.9         6.21
   5       2.2         6.83
   6       3.3         5.61
   7       1.7         5.45
   8       2.4         4.94
   9       2.8         5.73
  10       3.6           ?
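  • A minimal sketch of this in pandas (illustrative,
    not from the slides): dropna() deletes every unit
    with any missing value.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"v1": [3.4, 3.9, 2.6, 3.6],
                       "v2": [5.67, 4.81, 4.93, np.nan]})

    # Completers analysis: the unit with missing v2 is removed entirely.
    completers = df.dropna()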
17
What's wrong with completers analysis?
  • It is inefficient.
  • It is problematic in regression when covariate
    values are missing and models with several sets
    of explanatory variables need to be compared.
    Either we keep changing the size of the data set,
    as we add/remove explanatory variables with
    missing observations, or we use the (potentially
    very small, and unrepresentative) subset of the
    data with no missing values.
  • When the missing observations are not a
    completely random selection of the data, a
    completers analysis will give biased estimates
    and invalid inferences.

18
Simple mean imputation
  • We replace missing data with the arithmetic
    average of the observed data for that variable.
    In the table of 10 cases this will be 5.58.
  • Why not?
  • This approach is clearly inappropriate for
    categorical variables.
  • It does not lead to proper estimates of measures
    of association or regression coefficients.
    Rather, associations tend to be diluted.
  • In addition, variances will be wrongly estimated
    (typically under estimated) if the imputed values
    are treated as real. Thus inferences will be
    wrong too.
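  • A short illustrative sketch using the table of 10
    cases: filling with the observed mean, and noting
    how the variance shrinks when the imputed value
    is treated as real.

    import numpy as np
    import pandas as pd

    v2 = pd.Series([5.67, 4.81, 4.93, 6.21, 6.83,
                    5.61, 5.45, 4.94, 5.73, np.nan])

    filled = v2.fillna(v2.mean())   # imputes 5.58 for unit 10
    # The imputed value sits exactly at the observed mean, adding
    # no spread, so the variance is understated if treated as real.
    print(v2.var(), filled.var())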

19
Regression mean imputation
  • Here, we use the completers to calculate the
    regression of the incomplete variable on the
    other complete variables. Then, we substitute the
    predicted mean for each unit with a missing
    value. In this way we use information from the
    joint distribution of the variables to make the
    imputation.
  • To perform regression imputation, we first
    regress variable 2 on variable 1 (note, it
    doesn't matter which of these is the 'response'
    in the model of interest). In our example, we use
    simple linear regression
  • V2 = a + ß V1 + e.
  • Using units 1-9, we find that a = 6.56 and
    ß = -0.366, so the regression relationship is
  • Expected value of V2 = 6.56 - 0.366 V1.
  • For unit 10, this gives
  • 6.56 - 0.366 x 3.6 = 5.24.
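  • A short numpy sketch reproducing this calculation
    (illustrative; polyfit returns the least-squares
    slope and intercept):

    import numpy as np

    v1 = np.array([3.4, 3.9, 2.6, 1.9, 2.2, 3.3, 1.7, 2.4, 2.8])
    v2 = np.array([5.67, 4.81, 4.93, 6.21, 6.83, 5.61, 5.45, 4.94, 5.73])

    beta, alpha = np.polyfit(v1, v2, 1)   # slope ~ -0.366, intercept ~ 6.56
    print(alpha + beta * 3.6)             # predicted mean for unit 10 ~ 5.24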

20
Regression mean imputation Why/Why Not?
  • Regression mean imputation can generate unbiased
    estimates of means, associations and regression
    coefficients in a much wider range of settings
    than simple mean imputation.
  • However, one important problem remains. The
    variability of the imputations is too small, so
    the estimated precision of regression
    coefficients will be wrong and inferences will be
    misleading.

21
Creating an extra category
  • When a categorical variable has missing values it
    is common practice to add an extra 'missing
    value' category. In the example below, the
    missing values, denoted '?' have been given the
    category 3.

  Unit  Variable 1  Variable 2
   1       3.4          1
   2       3.9          1
   3       2.6          1
   4       1.9          1
   5       2.2        ? -> 3
   6       3.3          2
   7       1.7          2
   8       2.4          2
   9       2.8        ? -> 3
  10       3.6        ? -> 3
22
Creating an extra category
  • This is bad practice because
  • the impact of this strategy depends on how
    missing values are divided among the real
    categories, and how the probability of a value
    being missing depends on other variables
  • very dissimilar classes can be lumped into one
    group
  • severe bias can arise, in any direction, and
  • when used to stratify for adjustment (or correct
    for confounding) the completed categorical
    variable will not do its job properly.

23
Some notation
  • The data: we denote the data we intended to
    collect by Y, and we partition this into
  • Y = (Yo, Ym),
  • where Yo is observed and Ym is missing. Note that
    some variables in Y may be outcomes/responses,
    some may be explanatory variables/covariates.
    Depending on the context these may all refer to
    one unit, or to an entire dataset.
  • Missing value indicator: corresponding to every
    observation Y, there is a missing value indicator
    R, defined as
  • R = 1 if Y is observed, R = 0 otherwise.

24
Missing value mechanism
  • The key question for analyses with missing data
    is, under what circumstances, if any, do the
    analyses we would perform if the data set were
    fully observed lead to valid answers? As before,
    'valid' means that effects and their SEs are
    consistently estimated, tests have the correct
    size, and so on, so inferences are correct.
  • The answer depends on the missing value
    mechanism.
  • This is the probability that a set of values are
    missing given the values taken by the observed
    and missing observations, which we denote by
  • Pr(R | Yo, Ym).

25
Examples of missing value mechanisms
  • 1. The chance of non-response to questions about
    income usually depends on the person's income.
  • 2. Someone may not be at home for an interview
    because they are at work.
  • 3. The chance of a subject leaving a clinical
    trial may depend on their response to treatment.
  • 4. A subject may be removed from a trial if their
    condition is insufficiently controlled.

26
Missing Completely at Random (MCAR)
  • Suppose the probability of an observation being
    missing does not depend on observed or unobserved
    measurements. In mathematical terms, we write
    this as
  • Pr(R | Yo, Ym) = Pr(R)
  • Then we say that the observation is Missing
    Completely At Random, which is often abbreviated
    to MCAR. Note that in a sample survey setting
    MCAR is sometimes called uniform non-response.
  • If data are MCAR, then consistent results with
    missing data can be obtained by performing the
    analyses we would have used had there been no
    missing data, although there will generally be
    some loss of information. In practice this means
    that, under MCAR, the analysis of only those
    units with complete data gives valid inferences.
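  • A small simulation sketch of the claim above
    (illustrative, with made-up numbers): under MCAR
    the complete-case mean is still consistent, just
    less precise.

    import numpy as np

    rng = np.random.default_rng(1)
    y = rng.normal(5, 2, 50_000)         # complete data, true mean 5

    # MCAR: every value has the same 30% chance of being missing,
    # regardless of any observed or unobserved quantity.
    observed = rng.uniform(size=y.size) > 0.3

    print(y.mean(), y[observed].mean())  # both close to 5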

27
Missing At Random (MAR)
  • After considering MCAR, a second question
    naturally arises. That is, what are the most
    general conditions under which a valid analysis
    can be done using only the observed data, and no
    information about the missing value mechanism,
    Pr(R | Yo, Ym)? The answer to this is when, given
    the observed data, the missingness mechanism does
    not depend on the unobserved data.
    Mathematically,
  • Pr(R | Yo, Ym) = Pr(R | Yo).
  • This is termed Missing At Random, abbreviated
    MAR.
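  • A simulation sketch of a MAR mechanism
    (illustrative, with made-up numbers): missingness
    in Y depends only on the fully observed X, so the
    complete-case mean of Y is biased, but an
    analysis that uses X recovers it.

    import numpy as np

    rng = np.random.default_rng(42)
    n = 100_000
    x = rng.normal(0, 1, n)              # always observed
    y = 2 * x + rng.normal(0, 1, n)      # true mean of Y is 0

    # MAR: the chance Y is missing depends only on the observed X.
    p_miss = 1 / (1 + np.exp(-2 * x))    # larger X -> more often missing
    observed = rng.uniform(size=n) > p_miss

    print(y[observed].mean())            # well below 0: completers biased

    # A regression on X fitted to the completers is valid under MAR:
    b, a = np.polyfit(x[observed], y[observed], 1)
    print(np.where(observed, y, a + b * x).mean())   # close to 0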

28
Missing Not At Random (MNAR)
  • When neither MCAR nor MAR hold, we say the data
    are Missing Not At Random, abbreviated MNAR. In
    the likelihood setting (see end of previous
    section) the missingness mechanism is termed
    non-ignorable.
  • What this means is
  • Even accounting for all the available observed
    information, the reason for observations being
    missing still depends on the unseen observations
    themselves.
  • To obtain valid inference, a joint model of both
    Y and R is required (that is, a joint model of
    the data and the missingness mechanism).

29
MNAR (continued)
  • Unfortunately
  • We cannot tell from the data at hand whether the
    missing observations are MAR or MNAR (although we
    can distinguish between MCAR and MAR).
  • In the MNAR setting it is very rare to know the
    appropriate model for the missingness mechanism.
  • Hence the central role of sensitivity analysis:
    we must explore how our inferences vary under
    assumptions of MAR, MNAR, and under various
    models. Unfortunately, this is often easier said
    than done, especially under the time and
    budgetary constraints of many applied projects.

30
Principled methods
  • These all have the following in common
  • No attempt is made to replace a missing value
    directly, i.e. we do not pretend to 'know' the
    missing values.
  • Rather, available information (from the observed
    data and other contextual considerations) is
    combined with assumptions not dependent on the
    observed data.
  • This is used to
  • either generate statistical information about
    each missing value, e.g. distributional
    information: given what we have observed, the
    missing observation has a normal distribution
    with mean a and variance b, where the
    parameters can be estimated from the data.
  • and/or generate information about the missing
    value mechanism.

31
Principled methods
  • The great range of ways in which these can be
    done leads to the plethora of approaches to
    missing values. Here are some broad classes of
    approach
  • Wholly model based methods.
  • Simple stochastic imputation.
  • Multiple stochastic imputation.
  • Weighted methods. (not covered here)

32
Wholly model based methods
  • A full statistical model is written down for the
    complete data.
  • Analysis (whether frequentist or Bayesian) is
    based on the likelihood.
  • Assumptions must be made about the missing data
    mechanism
  • If it is assumed MCAR or MAR, no explicit model
    is needed for it.
  • Otherwise this model must be included in the
    overall formulation.
  • Such likelihood analyses require some form of
    integration (averaging) over the missing data.
    Depending on the setting this can be done
    implicitly or explicitly, directly or indirectly,
    analytically or numerically. The statistical
    information on the missing data is contained in
    the model. Examples of this would be the use of
    linear mixed models under MAR in SAS PROC MIXED
    or MLwiN.
  • We will examine this in the practical.
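  • A rough Python analogue of that idea (a sketch
    only; the file gcse.csv and its columns school, y
    and x are hypothetical, and statsmodels stands in
    for PROC MIXED or MLwiN):

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical long-format data: one row per pupil per component.
    df = pd.read_csv("gcse.csv")
    obs = df.dropna(subset=["y"])   # rows with an observed response

    # Likelihood-based random intercept model fitted to the observed
    # rows; under MAR no explicit model of missingness is needed.
    fit = smf.mixedlm("y ~ x", data=obs, groups=obs["school"]).fit()
    print(fit.summary())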

33
Simple stochastic imputation
  • Instead of replacing a value with a mean, a
    random draw is made from some suitable
    distribution.
  • Provided the distribution is chosen
    appropriately, consistent estimators can be
    obtained from methods that would work with the
    whole data set.
  • Very important in the large survey setting where
    draws are made from units with complete data that
    are 'similar' to the one with missing values
    (donors).
  • There are many variations on this hot-deck
    approach.
  • Implicitly they use non-parametric estimates of
    the distribution of the missing data, and so
    typically need very large samples.
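  • A minimal hot-deck sketch (illustrative; a real
    hot-deck would first match donors to recipients
    on observed covariates, which is omitted here):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(7)
    v2 = pd.Series([5.67, np.nan, 4.93, np.nan, 6.83])

    donors = v2.dropna().to_numpy()   # observed values act as donors
    missing = v2.isna()
    v2[missing] = rng.choice(donors, size=missing.sum(), replace=True)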

34
Simple stochastic imputation
  • Although the resulting estimators can behave
    well, for precision (and inference) account must
    be taken of the source of the imputations (i.e.
    there is no 'extra' data). This implies that the
    usual complete data estimators of precision can't
    be used. Thus, for each particular class of
    estimator (e.g. mean, ratio, percentile) each
    type of imputation has an associated variance
    estimator that may be design based (i.e. using
    the sampling structure of the survey) or model
    based, or model assisted (i.e. using some
    additional modelling assumptions). These variance
    estimators can be very complicated and are not
    convenient for generalization.

35
Multiple (stochastic) imputation
  • This is very similar to the single stochastic
    imputation method, except there are many ways in
    which draws can be made (e.g. hot-deck
    non-parametric, model based). The crucial
    difference is that, instead of completing the
    data once, the imputation process is repeated a
    small number of times (typically 5-10). Provided
    the draws are done properly, variance estimation
    (and hence constructing valid inferences) is much
    more straightforward.
  • The observed variability among the estimates from
    each imputed data set is used in modifying the
    complete data estimates of precision. In this
    way, valid inferences are obtained under missing
    at random.
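  • The combination rules (Rubin's rules) are simple
    enough to sketch; this is an illustrative
    implementation with made-up numbers, assuming
    each imputed dataset has already been analysed.

    import numpy as np

    def rubin_combine(estimates, variances):
        """Rubin's rules: combine results from m imputed datasets."""
        q = np.asarray(estimates, dtype=float)  # m point estimates
        u = np.asarray(variances, dtype=float)  # their squared SEs
        m = len(q)
        q_bar = q.mean()               # combined point estimate
        w = u.mean()                   # within-imputation variance
        b = q.var(ddof=1)              # between-imputation variance
        t = w + (1 + 1 / m) * b        # total variance of q_bar
        return q_bar, t

    # e.g. five estimates of a coefficient and their variances
    print(rubin_combine([0.52, 0.48, 0.55, 0.50, 0.49],
                        [0.010, 0.012, 0.011, 0.009, 0.010]))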

36
Why do multiple imputation?
  • One of the main problems with the single
    stochastic imputation methods is the need for
    developing appropriate variance formulae for each
    different setting. Multiple imputation attempts
    to provide a procedure that can get the
    appropriate measures of precision relatively
    simply in (almost) any setting.
  • It was developed by Rubin in a survey setting
    (where it feels very natural) but has more
    recently been applied much more widely.

37
Missing Data and Random effects models
  • In the practical we will consider two approaches
  • Model based MCMC estimation of a multivariate
    response model.
  • Generating multiple imputations from this model
    (using MCMC) that can then be used to fit further
    models using any estimation method.

38
Information on practical
  • The practical introduces multivariate Normal
    (MVN) models in MLwiN using MCMC.
  • Two education datasets.
  • Firstly, a dataset with two responses that are
    components of GCSE science exams, for which we
    consider model based approaches.
  • Secondly, a dataset with six responses from
    Hungary, for which we consider multiple
    imputation.

39
Other approaches to missing data
  • IGLS estimation of MVN models is available in
    MLwiN. Here the algorithm treats the MVN model as
    a special case of a univariate Normal model and
    so there are no overheads for missing data
    (assuming MAR).
  • WinBUGS has great flexibility with missing data.
    The MLwiN -> WinBUGS interface will allow you to
    do the same model based approach as in the
    practical.
  • It can, however, also be used to incorporate
    imputation models as part of the model.

40
Plug for www.missingdata.org.uk
  • James Carpenter has developed MLwiN macros that
    perform multiple imputation using MCMC.
  • These build around the MCMC features in the
    practical but run an imputation model independent
    of the actual model of interest.
  • See www.missingdata.org.uk for further details
    including variants of these slides and WinBUGS
    practicals.