Title: The use of centred parameterisations and Markov chain Monte Carlo MCMC estimation to fit multilevel
1The use of centred parameterisations and Markov
chain Monte Carlo (MCMC)estimation to fit
multilevel discrete-time survival models
- William Browne, Mousa Golalizadeh, Martin Green
and Fiona Steele - Universities of Bristol and Nottingham
- Thanks to ESRC for supporting this work
2Summary
- Multilevel discrete-time survival analysis.
- Hierarchical centering.
- Application 1 Mastitis incidence in dairy
cattle. - Orthogonal polynomials.
- Application 2 Contraceptive discontinuation in
Indonesia. - Orthogonal predictors and parameter expansion.
3Multilevel discrete-time survival analysis
- Used to model the durations until the occurrence
of events e.g. length of time to death. - Different from standard regression due to
right-censoring and time varying covariates. - In many applications events may be repeatable and
the outcome is duration of continuous exposure to
the risk of an event. - In our applications cows may suffer mastitis more
than once and women can initiate and discontinue
use of contraceptives several times. - We begin with two levels of data episodes nested
within individuals. - With a discrete time approach we expand the data
to create a lower level time interval, which
represents a fixed period of time and is nested
within episode.
4Data Expansion
- Suppose we have time intervals (e.g. days, weeks)
indexed by t 1,...,K where K is the maximum
duration of an episode. - Let tij denote the number of intervals for which
individual i is observed in episode j. - Before modelling we now need to expand the data
for each episode ij to obtain tij records. - We have ytij0 for t1,,tij-1 and the response
for t tij is 1 if an episode ends in an event
and 0 for censored episodes. - Example Individual observed for 7 months and has
event after 4 months - Response vector (0,0,0,1,0,0,0),
- Time intervals (1,2,3,4,1,2,3)
5Modelling
- Having restructured the data we can analyse the
event occurrence indicator ytij using standard
methods for clustered binary response data. - Here we model the probability of an event
occurrence ptij as a function of duration (zt)
and covariates xtij with uj being an individual
specific random effect representing unobserved
time-invariant individual characteristics.
6Hierarchical centering
- A reparameterisation technique described
initially in Gelfand, Sahu and Carlin (1995). - The aim of reparameterisation techniques is to
replace original parameters in a model with new
parameters that are less correlated with each
other in the joint posterior distribution. - The model is then fitted using the new
parameterisation and the MCMC chains produced can
then be transformed back to the initial
parameters. - We will illustrate on an example of a simple
random effects logistic regression model using a
contraceptive use dataset from Bangladesh. The
dataset has two levels women nested within
districts with as a 0/1 response the use of
contraceptives.
7MCMC algorithm for random effect logistic
regression models
Consider the following simple model
A standard MCMC algorithm (as used in MLwiN
(Rasbash et al. (2000), Browne (2003)) is Step 1
Update using random walk Metropolis
sampling. Step 2 Update using
random walk Metropolis sampling. Step 3 Update
from its inverse Gamma full conditional
using Gibbs Sampling
8Hierarchical centered formulation
We can reparameterise by replacing the residuals
uj with random effects uj ß0uj. Here the uj
are (hierarchically) centred around ß0.
This formulation allows Gibbs sampling to be used
for the fixed effect ß0 in addition to the random
effects variance. It is interesting to consider
how an MCMC algorithm for this centered
formulation can be expressed in terms of the
original parameterisation.
9MCMC Algorithm
- At iteration t1
- Step 1 Update ß0 from its normal conditional
distribution - Adjust
- to keep uj fixed.
- The other steps are unchanged by
reparameterisation as the random walk Metropolis
sampling for uj is equivalent to the same step
for uj and the final step is unchanged.
10Results for Bangladeshi model
- Below we see results for a run of 50,000
iterations using each formulation. The effective
sample size (ESS) numbers show the improvement of
the hierarchical centred formulation.
An additional benefit is in terms of speed as the
hierarchical centred formulation takes 66s in
MLwiN as opposed to 101s for the standard
formulation.
11Further model for the Bangladeshi dataset
- We can add additional predictors to the model and
center any predictors that are defined at the
district level. - In this case we have two woman level predictors
age and urban and two district level predictors
percentage illiterate and percentage who pray
every day.
12Results
- Here we see results for runs of 50,000 iterations
for each method.
Here the HC formulation takes 123s as opposed to
220s for the standard method. Note the increase
in ESS only for the fixed effects involved in the
centering.
13Why should hierarchical centering be useful for
discrete time survival models ?
- As we have seen discrete time survival models are
a special case of random effects logistic
regression models. - Expansion results in many higher level predictors
that can be (hierarchically) centered around. - Hierarchical centering should improve mixing but
also speed up estimation. - Will be particularly useful when we have many
level 1 units per level 2 unit and large level 2
variance, where the speed up will be greatest
see application 1. - Not always useful but other potential solutions
see application 2.
14Application 1 Mastitis incidence in dairy cattle
- Mastitis inflammation of mammary gland of dairy
cows, usually caused by a bacterial infection.
Infections arising in dry (non-lactating) period
often result in clinical mastitis in early
lactation. - Green et al. (2007) use multilevel survival
models to investigate how cow, farm and
management factors in the dry period correlate
with mastitis incidence. - Data 2 years x 52 dairy farms 103 farm years
8,710 cow lactation periods following dry
periods expands to 256,382 records in total!
15Mastitis model
- The Model has many predictors.
- The duration terms zt consists of an intercept
plus polynomials in log time to order 3. - The 4 levels are farm, farm year, cow dry period
and weekly obs. although no random effects occur
at the dry period level. - The model can easily be converted to a
hierarchical centered formulation by centering
around the 10 farm-year level fixed effects
16Mastitis predictors
- Predictors at dry period level (6) are
- Parity of cow (5 categories 4 dummies)
- Dummies for Somatic cell count high before drying
off, and whether two SCC readings were available. - At the farm level (9)
- One dummy for cows remaining standing for 30
minutes after dry cow treatment, - Two dummy variables to indicate whether cubicle
bedding is disinfected in the early dry period
and whether this is not applicable due to the
system used. - Two dummy variables to indicate if transition cow
cubicles are bedded at least once daily and
whether there are in fact transition cow
cubicles. - A dummy variable as to whether the cubicle
bedding is disinfected in the transition dry
period - Three dummy variables to indicate whether the
transition cubicle feed and loaf area is scraped
daily, more often than daily or doesn't exist.
17Mastitis results
- The model was run for 50,000 iterations after an
adapting period and a further burnin of 500
iterations in MLwiN. - Due to the number of parameters in model we will
in the table that follows simply give effective
sample size (ESS) values although it should also
be noted that parameter values for the
non-centred formulation were rather variable due
to the small ESS - The estimation took 19 hours for the non-centred
formulation and 11½ hours for the centred
formulation.
18Effective sample sizes
19Orthogonal polynomials
- It is noticeable that the parameters a1 -a3 have
small ESS. These correspond to the predictor
polynomials in log duration and the three
correlations between the predictors are -0.61,
0.79 and -0.90 respectively. - We can without altering the rest of the analysis
replace this group of predictors via a
reparameterisation that makes the predictors less
correlated and in fact we choose to make them
orthogonal i.e. - To do this we can keep the first predictor and
then add the rest in one at a time solving the
above at each stage. - This results in predictors z
- logt, logt20.94logt, logt3 1.50logt2-2.05logt
20Results of orthogonal polynomials
- Fitting this model will give coefficient a that
can be easily converted back to the coefficients
for the original predictors - a1 a10.94 a 2-2.05 a 3, a2 a21.50 a 3, ,
a3 a3 - Effective sample size comparison
21Hierarchical centering in WinBUGS
- WinBUGS uses different MCMC algorithms for random
effect logistic regression models the
multivariate MH approach of Gamerman (1997) is
used. - This is much slower than simple random walk
Metropolis but produces less correlated chains. - We ran both formulations in WinBUGS for 15,000
iterations (as opposed to 50,000 in MLwiN) after
a 500 burnin. - The ESS were similar to the results from MLwiN
for both formulations however WinBUGS took 65
hours and 41.5 hours respectively - 3-4 times
longer than MLwiN!
22Application 2 Contraceptive discontinuation in
Indonesia
- Steele et al. (2004) use multilevel multistate
models to study transitions in and out of
contraceptive use in Indonesia. Here we consider
a simplification of their model which considers
only the transition from use to non-use, commonly
referred to as contraceptive discontinuation. - The data come from the 1997 Indonesia Demographic
and Health Survey. Contraceptive use histories
were collected retrospectively for the six-year
period prior to the survey, and include
information on the month and year of starting and
ceasing use, the method used, and the reason for
discontinuation. - The analysis is based on 17,833 episodes of
contraceptive use for 12,594 women, where an
episode is defined as a continuous period of
using the same method of contraception. - Restructuring the data to discrete-time format
with monthly time intervals leads to 365,205
records. To reduce the size, monthly intervals
are grouped into six-month intervals and a
binomial response is defined with denominator
equal to the number of months of use within a
six-month interval. Aggregation of intervals
leads to a dataset with 68,515 records.
23Model
- We here have intervals nested within episodes
nested within women.
Here we include discrete categories for the
duration terms zt a piecewise constant hazard
with categories representing 6-11 months, 12-23
months, 24-35 months and gt35 months with a base
category of 0-5 months.
24Predictors
- At the episode level we have
- Age categorized as lt25,25-34 and 34-49.
- Contraceptive method categorized as
pill/injectable, Norplant/IUD, other modern and
traditional. - At the woman level we have
- Education (3 categories).
- Type of region of residence (urban/rural).
- Socioeconomic status (low, medium or high).
25Results
The table below gives the effective sample sizes
based on runs of 250,000 iterations.
Here the hierarchical centered formulation does
really badly. This is because the cluster
variance s2u is very small estimates of 0.041
and 0.022 for the two methods
26A closer look at the residuals
- It is well known that hierarchical centering
works best if the cluster level variance is
substantial. - Here we see that both the variance is small and
the distribution of the residuals is not very
normal. - This is due to a few women who discontinue usage
very quickly and often, whilst many women never
discontinue!
Std residuals
Normal scores
27Simple logistic regression
- We will consider first removing the random
effects from the model (due to their small
variance) which will result in a simple logistic
regression model. - It will then be impossible to perform
hierarchical centering however we will consider
an extension to the orthogonalisation performed
in the first application. - Note that Hills and Smith (1992) talk about using
orthogonal parameterisations and Roberts and
Gilks give it one sentence in MCMC in Practice.
Here we consider it in combination with the
simple (single site) random walk Metropolis
sampler where reduction of correlation in the
posterior is perhaps most important.
28Orthogonal parameterisation
- For simplicity assume we have all predictors in
one matrix P and that we can write ztaxtijß as
ptij? where ?(a,ß). - Step 1 Number the predictors in some ordering
1,,N. - Step 2 Take each predictor in turn and replace
it with a predictor that is orthogonal to all the
already considered predictors. - For predictor pk.
- Note this requires solving k-1 equations in k-1
unknown w parameters. - A different orthogonal set of predictors results
from each ordering.
29Orthogonal parameterisation
- The second step of the algorithm produces both a
set of orthogonal predictors that span the same
space as the original predictors and a group of w
coefficients that can be combined to form a lower
diagonal matrix W. - We can fit this model and recover the
coefficients for the original predictors by
pre-multiplication by WT. - It is worth noting here that we use improper
Uniform priors for the coefficients and if we
used proper priors we would need to also
calculate the Jacobian for the reparameterisation
to ensure the same priors are used. - We ordered the predictors in what follows so that
the level 2 predictors were last before
performing reparameterisation.
30Results
- The following is based on 50,000 iterations
Here we see almost universal benefit of the
orthogonal parameterisation with virtually zero
time costs and very little programming!
31Parameter expansion
- We havent here solved the problem of the small
random effect variance. - In this case one possibility is to consider the
technique of Parameter Expansion (Lui, Rubin and
Wu 1998, Lui and Wu 1999). - This has also been considered for random effect
models by Van Dyk and Meng (2001), Browne(2004)
and Gelman et al. (2007). - Basically we embed the desired model in a larger
un-identified model in which mixing is good and
from which can be recovered the desired
identified model. - In random effects models this involves
multiplying all the random effects by a parameter
to be estimated.
32Fitted model
- Combining orthogonalisation and parameter
expansion we have
We ran this model using WinBUGS and only 25,000
iterations following a burnin of 500 iterations
which took 34 hours compared to 23½ for 250k in
MLwiN without parameter expansion. The results
are given overleaf.
33Results for full model
- Here we compare simply using the orthogonal
approach in MLwiN for 250k with both orthogonal
predictors and parameter expansion in WinBUGS for
25k. Note this takes 1.5 times as long
34Trace plots of variance chain
Before Parameter Expansion
Here we see the far greater mixing of the
variance chain after parameter expansion. It is
worth noting that parameter expansion uses a
different prior for s2u and results in an
estimate of 0.059 (0.048) as opposed to 0.008
(0.006) without and earlier estimates of
0.041(0.026) and 0.022 (0.018) before
orthogonalisation. Note however that all
estimates bar parameter expansion are based on
very low ESS!
After Parameter Expansion
35Discussion
- We began by showing how hierarchical centering
can be used in multilevel discrete time survival
models. - We contrasted two examples were it did very well
and were it didnt. - The main factors determining its success are
size of the higher level variance and relative
numbers of level 1/level 2 units. - We demonstrated other methods orthogonalisation
of predictors and parameter expansion which are
also useful. - An interesting area of further research is
choosing the order for orthogonalisation i.e.
which set of orthogonal predictors to use. - Other research building on hierarchical centering
includes Papaspiliopoulus et al. (2003,2007)
showing how to construct partially non-centred
parameterisations.
36References
- Browne, W.J. (2003). MCMC Estimation in MLwiN.
London Institute of Education, University of
London - Browne, W.J. (2004). An illustration of the use
of reparameterisation methods for improving MCMC
efficiency in crossed random effect models
Multilevel Modelling Newsletter 16 (1) 13-25 - Gamerman D. (1997) Sampling from the posterior
distribution in generalized linear mixed models.
Statistics and Computing. 7, 57--68. - Gelfand, A.E., Sahu, S.K. and Carlin, B.P. (1995)
Efficient parameterisations for normal linear
mixed models. Biometrika 83, 479--488 - Gelman, A., Huang, Z., van Dyk, D., and
Boscardin, W.J. (2007). Using redundant
parameterizations to fit hierarchical models.
Journal of Computational and Graphical Statistics
(to appear). - Green, M.J., Bradley, A.J., Medley, G.F. and
Browne, W.J. (2007) Cow, Farm and Management
Factors during the Dry Period that determine the
rate of Clinical Mastitis after calving. Journal
of Dairy Science 90 3764--3776. - Hills, S.E. and Smith, A.F.M. (1992)
Parameterization Issues in Bayesian Inference. In
Bayesian Statistics 4, (J M Bernardo, J O Berger,
A P Dawid, and A F M Smith, eds), Oxford
University Press, UK, pp. 227--246.
37References cont.
- Liu, C., Rubin, D.B., and Wu, Y.N. (1998)
Parameter expansion to accelerate EM The PX-EM
algorithm. Biometrika 85 (4) 755-770. - Liu, J.S., Wu, Y.N. (1999) Parameter Expansion
for Data Augmentation. Journal Of The American
Statistical Association 94 1264-1274 - Papaspiliopoulos, O, Roberts, G.O. and Skold, M.
(2003) Non-centred Parameterisations for
Hierarchical Models and Data Augmentation. In
Bayesian Statistics 7, (J M Bernardo, M J
Bayarri, J O Berger, A P Dawid, D Heckerman, A F
M Smith and M West, eds), Oxford University
Press, UK, pp. 307--32 - Papaspiliopoulos, O, Roberts, G.O. and Skold, M.
(2007) A General Framework for the
Parametrization of Hierarchical Models.
Statistical Science 22, 59--73. - Rasbash, J., Browne, W.J., Healy, M, Cameron, B
and Charlton, C. (2000). The MLwiN software
package version 1.10. London Institute of
Education, University of London. - Steele, F., Goldstein, H. and Browne, W.J.
(2004). A general multilevel multistate competing
risks model for event history data, with an
application to a study of contraceptive use
dynamics. Statistical Modelling 4 145--159 - Van Dyk, D.A., and Meng, X-L. (2001) The Art of
Data Augmentation. Journal of Computational and
Graphical Statistics. 10, 1--50.