The use of centred parameterisations and Markov chain Monte Carlo MCMC estimation to fit multilevel - PowerPoint PPT Presentation

1 / 37
About This Presentation
Title:

The use of centred parameterisations and Markov chain Monte Carlo MCMC estimation to fit multilevel

Description:

... of mammary gland of dairy cows, usually caused by a bacterial ... years x 52 dairy farms 103 farm years 8,710 cow lactation periods following ... – PowerPoint PPT presentation

Number of Views:44
Avg rating:3.0/5.0
Slides: 38
Provided by: Bro78
Category:

less

Transcript and Presenter's Notes

Title: The use of centred parameterisations and Markov chain Monte Carlo MCMC estimation to fit multilevel


1
The use of centred parameterisations and Markov
chain Monte Carlo (MCMC)estimation to fit
multilevel discrete-time survival models
  • William Browne, Mousa Golalizadeh, Martin Green
    and Fiona Steele
  • Universities of Bristol and Nottingham
  • Thanks to ESRC for supporting this work

2
Summary
  • Multilevel discrete-time survival analysis.
  • Hierarchical centering.
  • Application 1 Mastitis incidence in dairy
    cattle.
  • Orthogonal polynomials.
  • Application 2 Contraceptive discontinuation in
    Indonesia.
  • Orthogonal predictors and parameter expansion.

3
Multilevel discrete-time survival analysis
  • Used to model the durations until the occurrence
    of events e.g. length of time to death.
  • Different from standard regression due to
    right-censoring and time varying covariates.
  • In many applications events may be repeatable and
    the outcome is duration of continuous exposure to
    the risk of an event.
  • In our applications cows may suffer mastitis more
    than once and women can initiate and discontinue
    use of contraceptives several times.
  • We begin with two levels of data episodes nested
    within individuals.
  • With a discrete time approach we expand the data
    to create a lower level time interval, which
    represents a fixed period of time and is nested
    within episode.

4
Data Expansion
  • Suppose we have time intervals (e.g. days, weeks)
    indexed by t 1,...,K where K is the maximum
    duration of an episode.
  • Let tij denote the number of intervals for which
    individual i is observed in episode j.
  • Before modelling we now need to expand the data
    for each episode ij to obtain tij records.
  • We have ytij0 for t1,,tij-1 and the response
    for t tij is 1 if an episode ends in an event
    and 0 for censored episodes.
  • Example Individual observed for 7 months and has
    event after 4 months
  • Response vector (0,0,0,1,0,0,0),
  • Time intervals (1,2,3,4,1,2,3)

5
Modelling
  • Having restructured the data we can analyse the
    event occurrence indicator ytij using standard
    methods for clustered binary response data.
  • Here we model the probability of an event
    occurrence ptij as a function of duration (zt)
    and covariates xtij with uj being an individual
    specific random effect representing unobserved
    time-invariant individual characteristics.

6
Hierarchical centering
  • A reparameterisation technique described
    initially in Gelfand, Sahu and Carlin (1995).
  • The aim of reparameterisation techniques is to
    replace original parameters in a model with new
    parameters that are less correlated with each
    other in the joint posterior distribution.
  • The model is then fitted using the new
    parameterisation and the MCMC chains produced can
    then be transformed back to the initial
    parameters.
  • We will illustrate on an example of a simple
    random effects logistic regression model using a
    contraceptive use dataset from Bangladesh. The
    dataset has two levels women nested within
    districts with as a 0/1 response the use of
    contraceptives.

7
MCMC algorithm for random effect logistic
regression models
Consider the following simple model
A standard MCMC algorithm (as used in MLwiN
(Rasbash et al. (2000), Browne (2003)) is Step 1
Update using random walk Metropolis
sampling. Step 2 Update using
random walk Metropolis sampling. Step 3 Update
from its inverse Gamma full conditional
using Gibbs Sampling
8
Hierarchical centered formulation
We can reparameterise by replacing the residuals
uj with random effects uj ß0uj. Here the uj
are (hierarchically) centred around ß0.
This formulation allows Gibbs sampling to be used
for the fixed effect ß0 in addition to the random
effects variance. It is interesting to consider
how an MCMC algorithm for this centered
formulation can be expressed in terms of the
original parameterisation.
9
MCMC Algorithm
  • At iteration t1
  • Step 1 Update ß0 from its normal conditional
    distribution
  • Adjust
  • to keep uj fixed.
  • The other steps are unchanged by
    reparameterisation as the random walk Metropolis
    sampling for uj is equivalent to the same step
    for uj and the final step is unchanged.

10
Results for Bangladeshi model
  • Below we see results for a run of 50,000
    iterations using each formulation. The effective
    sample size (ESS) numbers show the improvement of
    the hierarchical centred formulation.

An additional benefit is in terms of speed as the
hierarchical centred formulation takes 66s in
MLwiN as opposed to 101s for the standard
formulation.
11
Further model for the Bangladeshi dataset
  • We can add additional predictors to the model and
    center any predictors that are defined at the
    district level.
  • In this case we have two woman level predictors
    age and urban and two district level predictors
    percentage illiterate and percentage who pray
    every day.

12
Results
  • Here we see results for runs of 50,000 iterations
    for each method.

Here the HC formulation takes 123s as opposed to
220s for the standard method. Note the increase
in ESS only for the fixed effects involved in the
centering.
13
Why should hierarchical centering be useful for
discrete time survival models ?
  • As we have seen discrete time survival models are
    a special case of random effects logistic
    regression models.
  • Expansion results in many higher level predictors
    that can be (hierarchically) centered around.
  • Hierarchical centering should improve mixing but
    also speed up estimation.
  • Will be particularly useful when we have many
    level 1 units per level 2 unit and large level 2
    variance, where the speed up will be greatest
    see application 1.
  • Not always useful but other potential solutions
    see application 2.

14
Application 1 Mastitis incidence in dairy cattle
  • Mastitis inflammation of mammary gland of dairy
    cows, usually caused by a bacterial infection.
    Infections arising in dry (non-lactating) period
    often result in clinical mastitis in early
    lactation.
  • Green et al. (2007) use multilevel survival
    models to investigate how cow, farm and
    management factors in the dry period correlate
    with mastitis incidence.
  • Data 2 years x 52 dairy farms 103 farm years
    8,710 cow lactation periods following dry
    periods expands to 256,382 records in total!

15
Mastitis model
  • The Model has many predictors.
  • The duration terms zt consists of an intercept
    plus polynomials in log time to order 3.
  • The 4 levels are farm, farm year, cow dry period
    and weekly obs. although no random effects occur
    at the dry period level.
  • The model can easily be converted to a
    hierarchical centered formulation by centering
    around the 10 farm-year level fixed effects

16
Mastitis predictors
  • Predictors at dry period level (6) are
  • Parity of cow (5 categories 4 dummies)
  • Dummies for Somatic cell count high before drying
    off, and whether two SCC readings were available.
  • At the farm level (9)
  • One dummy for cows remaining standing for 30
    minutes after dry cow treatment,
  • Two dummy variables to indicate whether cubicle
    bedding is disinfected in the early dry period
    and whether this is not applicable due to the
    system used.
  • Two dummy variables to indicate if transition cow
    cubicles are bedded at least once daily and
    whether there are in fact transition cow
    cubicles.
  • A dummy variable as to whether the cubicle
    bedding is disinfected in the transition dry
    period
  • Three dummy variables to indicate whether the
    transition cubicle feed and loaf area is scraped
    daily, more often than daily or doesn't exist.

17
Mastitis results
  • The model was run for 50,000 iterations after an
    adapting period and a further burnin of 500
    iterations in MLwiN.
  • Due to the number of parameters in model we will
    in the table that follows simply give effective
    sample size (ESS) values although it should also
    be noted that parameter values for the
    non-centred formulation were rather variable due
    to the small ESS
  • The estimation took 19 hours for the non-centred
    formulation and 11½ hours for the centred
    formulation.

18
Effective sample sizes
19
Orthogonal polynomials
  • It is noticeable that the parameters a1 -a3 have
    small ESS. These correspond to the predictor
    polynomials in log duration and the three
    correlations between the predictors are -0.61,
    0.79 and -0.90 respectively.
  • We can without altering the rest of the analysis
    replace this group of predictors via a
    reparameterisation that makes the predictors less
    correlated and in fact we choose to make them
    orthogonal i.e.
  • To do this we can keep the first predictor and
    then add the rest in one at a time solving the
    above at each stage.
  • This results in predictors z
  • logt, logt20.94logt, logt3 1.50logt2-2.05logt

20
Results of orthogonal polynomials
  • Fitting this model will give coefficient a that
    can be easily converted back to the coefficients
    for the original predictors
  • a1 a10.94 a 2-2.05 a 3, a2 a21.50 a 3, ,
    a3 a3
  • Effective sample size comparison

21
Hierarchical centering in WinBUGS
  • WinBUGS uses different MCMC algorithms for random
    effect logistic regression models the
    multivariate MH approach of Gamerman (1997) is
    used.
  • This is much slower than simple random walk
    Metropolis but produces less correlated chains.
  • We ran both formulations in WinBUGS for 15,000
    iterations (as opposed to 50,000 in MLwiN) after
    a 500 burnin.
  • The ESS were similar to the results from MLwiN
    for both formulations however WinBUGS took 65
    hours and 41.5 hours respectively - 3-4 times
    longer than MLwiN!

22
Application 2 Contraceptive discontinuation in
Indonesia
  • Steele et al. (2004) use multilevel multistate
    models to study transitions in and out of
    contraceptive use in Indonesia. Here we consider
    a simplification of their model which considers
    only the transition from use to non-use, commonly
    referred to as contraceptive discontinuation.
  • The data come from the 1997 Indonesia Demographic
    and Health Survey. Contraceptive use histories
    were collected retrospectively for the six-year
    period prior to the survey, and include
    information on the month and year of starting and
    ceasing use, the method used, and the reason for
    discontinuation.
  • The analysis is based on 17,833 episodes of
    contraceptive use for 12,594 women, where an
    episode is defined as a continuous period of
    using the same method of contraception.
  • Restructuring the data to discrete-time format
    with monthly time intervals leads to 365,205
    records. To reduce the size, monthly intervals
    are grouped into six-month intervals and a
    binomial response is defined with denominator
    equal to the number of months of use within a
    six-month interval. Aggregation of intervals
    leads to a dataset with 68,515 records.

23
Model
  • We here have intervals nested within episodes
    nested within women.

Here we include discrete categories for the
duration terms zt a piecewise constant hazard
with categories representing 6-11 months, 12-23
months, 24-35 months and gt35 months with a base
category of 0-5 months.
24
Predictors
  • At the episode level we have
  • Age categorized as lt25,25-34 and 34-49.
  • Contraceptive method categorized as
    pill/injectable, Norplant/IUD, other modern and
    traditional.
  • At the woman level we have
  • Education (3 categories).
  • Type of region of residence (urban/rural).
  • Socioeconomic status (low, medium or high).

25
Results
The table below gives the effective sample sizes
based on runs of 250,000 iterations.
Here the hierarchical centered formulation does
really badly. This is because the cluster
variance s2u is very small estimates of 0.041
and 0.022 for the two methods
26
A closer look at the residuals
  • It is well known that hierarchical centering
    works best if the cluster level variance is
    substantial.
  • Here we see that both the variance is small and
    the distribution of the residuals is not very
    normal.
  • This is due to a few women who discontinue usage
    very quickly and often, whilst many women never
    discontinue!

Std residuals
Normal scores
27
Simple logistic regression
  • We will consider first removing the random
    effects from the model (due to their small
    variance) which will result in a simple logistic
    regression model.
  • It will then be impossible to perform
    hierarchical centering however we will consider
    an extension to the orthogonalisation performed
    in the first application.
  • Note that Hills and Smith (1992) talk about using
    orthogonal parameterisations and Roberts and
    Gilks give it one sentence in MCMC in Practice.
    Here we consider it in combination with the
    simple (single site) random walk Metropolis
    sampler where reduction of correlation in the
    posterior is perhaps most important.

28
Orthogonal parameterisation
  • For simplicity assume we have all predictors in
    one matrix P and that we can write ztaxtijß as
    ptij? where ?(a,ß).
  • Step 1 Number the predictors in some ordering
    1,,N.
  • Step 2 Take each predictor in turn and replace
    it with a predictor that is orthogonal to all the
    already considered predictors.
  • For predictor pk.
  • Note this requires solving k-1 equations in k-1
    unknown w parameters.
  • A different orthogonal set of predictors results
    from each ordering.

29
Orthogonal parameterisation
  • The second step of the algorithm produces both a
    set of orthogonal predictors that span the same
    space as the original predictors and a group of w
    coefficients that can be combined to form a lower
    diagonal matrix W.
  • We can fit this model and recover the
    coefficients for the original predictors by
    pre-multiplication by WT.
  • It is worth noting here that we use improper
    Uniform priors for the coefficients and if we
    used proper priors we would need to also
    calculate the Jacobian for the reparameterisation
    to ensure the same priors are used.
  • We ordered the predictors in what follows so that
    the level 2 predictors were last before
    performing reparameterisation.

30
Results
  • The following is based on 50,000 iterations

Here we see almost universal benefit of the
orthogonal parameterisation with virtually zero
time costs and very little programming!
31
Parameter expansion
  • We havent here solved the problem of the small
    random effect variance.
  • In this case one possibility is to consider the
    technique of Parameter Expansion (Lui, Rubin and
    Wu 1998, Lui and Wu 1999).
  • This has also been considered for random effect
    models by Van Dyk and Meng (2001), Browne(2004)
    and Gelman et al. (2007).
  • Basically we embed the desired model in a larger
    un-identified model in which mixing is good and
    from which can be recovered the desired
    identified model.
  • In random effects models this involves
    multiplying all the random effects by a parameter
    to be estimated.

32
Fitted model
  • Combining orthogonalisation and parameter
    expansion we have

We ran this model using WinBUGS and only 25,000
iterations following a burnin of 500 iterations
which took 34 hours compared to 23½ for 250k in
MLwiN without parameter expansion. The results
are given overleaf.
33
Results for full model
  • Here we compare simply using the orthogonal
    approach in MLwiN for 250k with both orthogonal
    predictors and parameter expansion in WinBUGS for
    25k. Note this takes 1.5 times as long

34
Trace plots of variance chain
Before Parameter Expansion
Here we see the far greater mixing of the
variance chain after parameter expansion. It is
worth noting that parameter expansion uses a
different prior for s2u and results in an
estimate of 0.059 (0.048) as opposed to 0.008
(0.006) without and earlier estimates of
0.041(0.026) and 0.022 (0.018) before
orthogonalisation. Note however that all
estimates bar parameter expansion are based on
very low ESS!
After Parameter Expansion
35
Discussion
  • We began by showing how hierarchical centering
    can be used in multilevel discrete time survival
    models.
  • We contrasted two examples were it did very well
    and were it didnt.
  • The main factors determining its success are
    size of the higher level variance and relative
    numbers of level 1/level 2 units.
  • We demonstrated other methods orthogonalisation
    of predictors and parameter expansion which are
    also useful.
  • An interesting area of further research is
    choosing the order for orthogonalisation i.e.
    which set of orthogonal predictors to use.
  • Other research building on hierarchical centering
    includes Papaspiliopoulus et al. (2003,2007)
    showing how to construct partially non-centred
    parameterisations.

36
References
  • Browne, W.J. (2003). MCMC Estimation in MLwiN.
    London Institute of Education, University of
    London
  • Browne, W.J. (2004). An illustration of the use
    of reparameterisation methods for improving MCMC
    efficiency in crossed random effect models
    Multilevel Modelling Newsletter 16 (1) 13-25
  • Gamerman D. (1997) Sampling from the posterior
    distribution in generalized linear mixed models.
    Statistics and Computing. 7, 57--68.
  • Gelfand, A.E., Sahu, S.K. and Carlin, B.P. (1995)
    Efficient parameterisations for normal linear
    mixed models. Biometrika 83, 479--488
  • Gelman, A., Huang, Z., van Dyk, D., and
    Boscardin, W.J. (2007). Using redundant
    parameterizations to fit hierarchical models.
    Journal of Computational and Graphical Statistics
    (to appear).
  • Green, M.J., Bradley, A.J., Medley, G.F. and
    Browne, W.J. (2007) Cow, Farm and Management
    Factors during the Dry Period that determine the
    rate of Clinical Mastitis after calving. Journal
    of Dairy Science 90 3764--3776.
  • Hills, S.E. and Smith, A.F.M. (1992)
    Parameterization Issues in Bayesian Inference. In
    Bayesian Statistics 4, (J M Bernardo, J O Berger,
    A P Dawid, and A F M Smith, eds), Oxford
    University Press, UK, pp. 227--246.

37
References cont.
  • Liu, C., Rubin, D.B., and Wu, Y.N. (1998)
    Parameter expansion to accelerate EM The PX-EM
    algorithm. Biometrika 85 (4) 755-770.
  • Liu, J.S., Wu, Y.N. (1999) Parameter Expansion
    for Data Augmentation. Journal Of The American
    Statistical Association 94 1264-1274
  • Papaspiliopoulos, O, Roberts, G.O. and Skold, M.
    (2003) Non-centred Parameterisations for
    Hierarchical Models and Data Augmentation. In
    Bayesian Statistics 7, (J M Bernardo, M J
    Bayarri, J O Berger, A P Dawid, D Heckerman, A F
    M Smith and M West, eds), Oxford University
    Press, UK, pp. 307--32
  • Papaspiliopoulos, O, Roberts, G.O. and Skold, M.
    (2007) A General Framework for the
    Parametrization of Hierarchical Models.
    Statistical Science 22, 59--73.
  • Rasbash, J., Browne, W.J., Healy, M, Cameron, B
    and Charlton, C. (2000). The MLwiN software
    package version 1.10. London Institute of
    Education, University of London.
  • Steele, F., Goldstein, H. and Browne, W.J.
    (2004). A general multilevel multistate competing
    risks model for event history data, with an
    application to a study of contraceptive use
    dynamics. Statistical Modelling 4 145--159
  • Van Dyk, D.A., and Meng, X-L. (2001) The Art of
    Data Augmentation. Journal of Computational and
    Graphical Statistics. 10, 1--50.
Write a Comment
User Comments (0)
About PowerShow.com