MCMC for multilevel logistic regressions in MLwiN
1
Lecture 14
  • MCMC for multilevel logistic regressions in MLwiN

2
Lecture Contents
  • Recap of MCMC from day 3
  • Recap of logistic regression from days 1 and 4
  • Metropolis-Hastings sampling
  • Reunion Island dataset (2 and 3 level)
  • VPC for binomial models
  • Method Comparison on Rodriguez/Goldman examples

3
MCMC Methods (recap)
  • Goal: to sample from the joint posterior distribution.
  • Problem: for complex models this involves multidimensional integration.
  • Solution: it may be possible to sample from the conditional posterior distributions instead.
  • It can be shown that after convergence such a sampling approach generates dependent samples from the joint posterior distribution.

4
Gibbs Sampling (recap)
  • When we can sample directly from the conditional posterior distributions, such an algorithm is known as Gibbs sampling.
  • This proceeds as follows for the variance components example:
  • Firstly, give all unknown parameters starting values.
  • Next, loop through the following steps:

5
Gibbs Sampling for VC model
  • Sample in turn from the conditional posterior distributions p(β | y, u, σ²_e), then each p(u_j | y, β, σ²_u, σ²_e), then p(1/σ²_u | u) and p(1/σ²_e | y, β, u).

These steps are then repeated, with the generated values from this loop replacing the starting values. The chain of values produced by this procedure is known as a Markov chain. Note that β is generated as a block while each u_j is updated individually.
6
Algorithm Summary
  • Repeat the following four steps:
  • 1. Generate β from its (Multivariate) Normal conditional distribution.
  • 2. Generate each u_j from its Normal conditional distribution.
  • 3. Generate 1/σ²_u from its Gamma conditional distribution.
  • 4. Generate 1/σ²_e from its Gamma conditional distribution.
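A minimal sketch of these four steps in Python/NumPy, for a model y_ij = β0 + u_j + e_ij with a flat prior on β0 and Gamma(ε, ε) priors on both precisions; the simulated data and all names here are illustrative, not MLwiN's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: y[i] is a continuous response, g[i] its level-2 unit.
J, n_per = 20, 30
g = np.repeat(np.arange(J), n_per)          # group index for each observation
y = 0.5 + rng.normal(0, 1, J)[g] + rng.normal(0, 2, J * n_per)
N = len(y)

eps = 0.001                     # Gamma(eps, eps) priors on both precisions
beta0, u = 0.0, np.zeros(J)     # starting values
tau_u = tau_e = 1.0             # precisions 1/sigma_u^2 and 1/sigma_e^2

for it in range(5000):
    # 1. beta0 from its Normal conditional (flat prior assumed).
    resid = y - u[g]
    beta0 = rng.normal(resid.mean(), np.sqrt(1 / (N * tau_e)))
    # 2. Each u_j from its Normal conditional.
    for j in range(J):
        yj = y[g == j] - beta0
        prec = tau_u + len(yj) * tau_e
        u[j] = rng.normal(tau_e * yj.sum() / prec, np.sqrt(1 / prec))
    # 3. 1/sigma_u^2 from its Gamma conditional (numpy uses scale = 1/rate).
    tau_u = rng.gamma(eps + J / 2, 1 / (eps + (u ** 2).sum() / 2))
    # 4. 1/sigma_e^2 from its Gamma conditional.
    e = y - beta0 - u[g]
    tau_e = rng.gamma(eps + N / 2, 1 / (eps + (e ** 2).sum() / 2))
```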

7
Logistic regression model
  • A standard Bayesian logistic regression model (e.g. for the rat tumour example) can be written as shown below.
  • Both MLwiN and WinBUGS can fit this model, but can we write out the conditional posterior distributions and use Gibbs sampling?
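The slide's original equation image is not reproduced in the transcript; a plausible reconstruction, with a generic covariate x_i and the flat priors MLwiN uses by default, is:

```latex
\begin{aligned}
y_i &\sim \mathrm{Binomial}(n_i, p_i)\\
\operatorname{logit}(p_i) &= \beta_0 + \beta_1 x_i\\
p(\beta_0) &\propto 1, \qquad p(\beta_1) \propto 1
\end{aligned}
```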

8
Conditional distribution for β0
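Given the model above, the full conditional for β0 is proportional to the likelihood times its (flat) prior; reconstructing the lost equation:

```latex
p(\beta_0 \mid y, \beta_1) \propto
  \prod_i p_i^{\,y_i} (1 - p_i)^{\,n_i - y_i},
\qquad \operatorname{logit}(p_i) = \beta_0 + \beta_1 x_i
```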
This distribution is not a standard distribution and so we cannot simply simulate from a standard random number generator. However, both WinBUGS and MLwiN can fit this model using MCMC. In this lecture we describe how MLwiN does this, before considering WinBUGS in the next lecture.
9
Metropolis Hastings (MH) sampling
  • An alternative, and more general, way to construct an MCMC sampler.
  • A form of generalised rejection sampling (see later) where values are drawn from approximate distributions and corrected so that, asymptotically, they behave as random observations from the target distribution.
  • MH sampling algorithms sequentially draw candidate observations from a proposal distribution, conditional on the current observations, thus inducing a Markov chain.

10
General MH algorithm step
  • Let us focus on a single parameter θ and its posterior distribution p(θ | Y).
  • Now at iteration t, θ takes value θ_t and we generate a new value θ* from a proposal distribution q.
  • Then we accept this new value and set θ_{t+1} = θ* with acceptance probability α(θ*, θ_t); otherwise we set θ_{t+1} = θ_t.
  • The acceptance probability is given below.
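This is the standard Metropolis-Hastings ratio:

```latex
\alpha(\theta^*, \theta_t) = \min\!\left(1,\;
  \frac{p(\theta^* \mid Y)\, q(\theta_t \mid \theta^*)}
       {p(\theta_t \mid Y)\, q(\theta^* \mid \theta_t)}\right)
```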

11
Choosing a proposal distribution
  • Remarkably, the proposal distribution can have almost any form.
  • There are some (silly) exceptions, e.g. a proposal that has point mass at one value, but provided the proposal allows the chain to explore the whole posterior (an irreducible chain) we are OK.
  • Three special cases of the Metropolis-Hastings algorithm are:
  • Random walk Metropolis sampling.
  • Independence sampling.
  • Gibbs sampling!

12
Pure Metropolis Sampling
  • The general MH sampling algorithm is due to Hastings (1970); it is a generalisation of pure Metropolis sampling (Metropolis et al., 1953).
  • This special case arises when the proposal distribution is symmetric, which reduces the acceptance probability as shown below.
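In symbols, the symmetry condition and the reduced acceptance probability are:

```latex
q(\theta^* \mid \theta_t) = q(\theta_t \mid \theta^*)
\quad\Longrightarrow\quad
\alpha(\theta^*, \theta_t) = \min\!\left(1,\;
  \frac{p(\theta^* \mid Y)}{p(\theta_t \mid Y)}\right)
```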

13
Random Walk Metropolis
  • This is an example of pure Metropolis sampling.
  • Here q(θ1 | θ2) = q(θ1 − θ2), a function only of the distance between the two values.
  • Typical examples of random walk proposals are Normal distributions centred on the current value of the parameter, i.e. q(θ1 | θ2) = N(θ2, s²), where s² is the (fixed) proposal variance that can be tuned to give particular acceptance rates.
  • This is the method used within MLwiN (a sketch follows).
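A minimal sketch of a univariate random walk Metropolis sampler (illustrative Python, not MLwiN's code); log_post is any function returning the log posterior up to an additive constant:

```python
import numpy as np

def rw_metropolis(log_post, theta0, s, n_iter, rng=None):
    """Univariate random walk Metropolis with a N(theta_t, s^2) proposal."""
    rng = rng or np.random.default_rng()
    theta = theta0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        cand = rng.normal(theta, s)             # candidate from N(theta, s^2)
        # Symmetric proposal: accept with prob min(1, p(cand|Y)/p(theta|Y)).
        if np.log(rng.uniform()) < log_post(cand) - log_post(theta):
            theta = cand
        chain[t] = theta
    return chain

# Example: sample from a N(2, 1) "posterior" (density known up to a constant).
chain = rw_metropolis(lambda x: -0.5 * (x - 2) ** 2, 0.0, s=2.0, n_iter=5000)
```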

14
Independence Sampler
  • The independence sampler is so called because each proposal is independent of the current parameter value, i.e. q(θ1 | θ2) = q(θ1).
  • This leads to the acceptance probability given below.
  • Example proposal distributions could be a Normal distribution centred on the ML estimate with inflated variance. The independence sampler can sometimes work very well but equally can work very badly!
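With q(θ* | θ_t) = q(θ*), the proposal density no longer cancels from the MH ratio:

```latex
\alpha(\theta^*, \theta_t) = \min\!\left(1,\;
  \frac{p(\theta^* \mid Y)\, q(\theta_t)}
       {p(\theta_t \mid Y)\, q(\theta^*)}\right)
```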

15
Gibbs Sampling
  • The Gibbs sampler that we have already studied is a special case of the MH algorithm. The proposal distribution is the full conditional distribution, which leads to an acceptance rate of 1, as shown below.
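Substituting the full conditional as the proposal, q(θ1 | θ2) = p(θ1 | Y), the ratio cancels exactly:

```latex
\alpha(\theta^*, \theta_t)
  = \min\!\left(1,\;
    \frac{p(\theta^* \mid Y)\, p(\theta_t \mid Y)}
         {p(\theta_t \mid Y)\, p(\theta^* \mid Y)}\right)
  = \min(1, 1) = 1
```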

16
MH Sampling in MLwiN
  • MLwiN actually uses a hybrid method.
  • Gibbs sampling steps are used for variance
    parameters.
  • MH steps are used for fixed effects and
    residuals.
  • Univariate Normal proposal distributions are
    used.
  • For the proposal standard deviation, a scaled IGLS standard error is used initially (multiplied by 5.8 on the variance scale).
  • However, an adaptive method is used to tune these proposal distributions prior to the burn-in period.

17
MH Sampling for Normal model
  • For the education dataset we can illustrate MH
    Sampling on the VC model by modifying steps 1 and
    2.
  • Repeat the following four steps:
  • 1. Generate each β_i by Univariate Normal MH Sampling.
  • 2. Generate each u_j by Univariate Normal MH Sampling.
  • 3. Generate 1/σ²_u from its Gamma conditional distribution.
  • 4. Generate 1/σ²_e from its Gamma conditional distribution.

18
MH Sampling in MLwiN
  • Here we see how to change the estimation method. Note that for Binomial responses the method will change automatically and Gibbs sampling will not be available.

19
Trajectories plot
  • Here we see MH sampling for β.

20
Adaptive Method (ad hoc)
  • One way of finding a good proposal distribution is to choose a distribution that gives a particular acceptance rate. It has been shown that a good acceptance rate is often around 50%.
  • MLwiN has incorporated an adaptive method that uses this fact to construct univariate Normal proposals with an acceptance rate of approximately 50%.
  • Method:
  • Before the burn-in we have an adaptation period where the sampler improves the proposal distribution. The adaptive method requires a desired acceptance rate, e.g. 50%, and a tolerance, e.g. 10%, resulting in an acceptable range of (40%, 60%).

21
Adaptive method algorithm
  • Run the sampler for consecutive batches of 100 iterations. Compare the number accepted, N, with the desired acceptance rate, R, and adjust the proposal SD accordingly (widening it if too many proposals are accepted, narrowing it if too few).
  • Repeat this procedure until 3 consecutive values of N lie within the acceptable range, and then mark this parameter. When all the parameters are marked the adaptation period is over.
  • N.B. Proposal SDs are still modified after being marked, until the adaptation period is over.
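The slide does not give MLwiN's actual SD-updating formula, so the sketch below uses a hypothetical multiplicative rule purely to illustrate the batching-and-marking logic; sample_step is an assumed callable that performs one MH update with the given proposal SD and reports whether the candidate was accepted:

```python
import numpy as np

def adapt_proposal_sd(sample_step, sd0=1.0, rate=50, tol=10):
    """Tune a proposal SD in batches of 100 iterations until 3 consecutive
    batches have an acceptance count within rate +/- tol."""
    sd, in_row = sd0, 0
    while in_row < 3:
        accepted = sum(sample_step(sd) for _ in range(100))
        in_row = in_row + 1 if abs(accepted - rate) <= tol else 0
        # Hypothetical rule (not MLwiN's formula): widen the proposal when
        # accepting too often, narrow it when accepting too rarely.
        sd *= np.exp((accepted - rate) / 100)
    return sd
```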

22
Example of Adaptive period
  • Adaptive method used on parameter β0:

  Iterations   Proposal SD   Accepted (per 100)   Runs in range
         0        1.0                -                  -
       100        0.505              1                  0
       200        0.263              4                  0
       300        0.138              5                  0
       400        0.074              7                  0
       500        0.046             19                  0
       600        0.032             29                  0
       700        0.031             48                  1
       800        0.026             40                  2
       900        0.024             46                  3
      1000        0.021             51                  3
      1500        0.022             48                  3
23
Comparison of Gibbs vs MH
  • MLwiN also has a multivariate Normal (MVN) MH algorithm. A comparison of effective sample sizes (ESS) for a run of 5,000 iterations of the VC model follows:

  Parameter   Gibbs    MH     MVN MH
  β0            216     33       59
  β1           4413    973      303
  σ²_u         2821   2140     2919
  σ²_e         4712   4895     4728
24
2-level Reunion Island dataset
  • We have already studied this dataset with a continuous response.
  • Here we consider a subset with only the 1st lactation for each cow, resulting in 2 levels: cows nested within herds.
  • The (binary) response of interest is fscr: whether the first service results in a conception.
  • There are two predictors: ai (whether insemination was natural or artificial) and heifer (the age of the cow).

25
MCMC algorithm
  • Our MLwiN algorithm has 3 steps (sketched in code below):
  • 1. Generate each β_i by Univariate Normal MH Sampling.
  • 2. Generate each u_j by Univariate Normal MH Sampling.
  • 3. Generate 1/σ²_u from its Gamma conditional distribution.
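A compact illustrative sketch of this hybrid scheme (Python, not MLwiN's implementation), for logit(p_ij) = Xβ + u_j with flat priors on β, a Gamma(ε, ε) prior on 1/σ²_u, and proposal SDs s_beta and s_u assumed already tuned by the adaptive method:

```python
import numpy as np

def mh_gibbs_logistic(y, X, g, n_iter, s_beta=0.1, s_u=0.5, eps=0.001,
                      rng=None):
    """MH-within-Gibbs sketch for logit(p_ij) = X beta + u_j."""
    rng = rng or np.random.default_rng()
    n, k = X.shape
    J = g.max() + 1
    beta, u, tau_u = np.zeros(k), np.zeros(J), 1.0
    out = np.empty((n_iter, k + 1))

    def loglik(eta, idx=slice(None)):
        # Bernoulli log-likelihood on the linear-predictor scale.
        return (y[idx] * eta - np.log1p(np.exp(eta))).sum()

    for t in range(n_iter):
        eta = X @ beta + u[g]
        # 1. Univariate Normal random walk MH step for each fixed effect.
        for p in range(k):
            cand = beta.copy()
            cand[p] += rng.normal(0, s_beta)
            eta_c = X @ cand + u[g]
            if np.log(rng.uniform()) < loglik(eta_c) - loglik(eta):
                beta, eta = cand, eta_c
        # 2. Univariate Normal random walk MH step for each residual u_j
        #    (likelihood for herd j plus the N(0, 1/tau_u) prior ratio).
        for j in range(J):
            m = g == j
            new = u[j] + rng.normal(0, s_u)
            logr = (loglik(X[m] @ beta + new, m)
                    - loglik(X[m] @ beta + u[j], m)
                    - 0.5 * tau_u * (new ** 2 - u[j] ** 2))
            if np.log(rng.uniform()) < logr:
                u[j] = new
        # 3. Gibbs step: 1/sigma_u^2 from its Gamma conditional.
        tau_u = rng.gamma(eps + J / 2, 1 / (eps + (u ** 2).sum() / 2))
        out[t] = np.append(beta, 1 / tau_u)
    return out
```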

26
MLwiN Demo
  • The 2-level model is set up and run in IGLS, giving the following starting values.

27
Trajectories for 5k iterations
  • Here we see some poor mixing, particularly for the variance.

28
DIC for binomial models
  • We can use the DIC to check whether we need random effects and whether to include heifer in the model. Note we only ran each model for 5,000 iterations.

  Model               pD      DIC
  VC + ai            18.37   2087.15 (2086.48 after 50k)
  LR + ai             2.16   2095.70
  VC + ai + heifer   22.16   2087.43

  (VC = variance components model with herd random effects; LR = single-level logistic regression.)
29
VC Model after 50k iterations
  • Here is a trace for the herd-level variance after 50k iterations, which suggests we need to run even longer!

30
VPC for Binomial models
  • The VPC is harder to calculate for Binomial models, as the level 1 variance is part of the Binomial distribution and hence related to the mean, and on a different scale to the higher level variances.
  • Goldstein et al. (2002) propose 4 methods:
  • Use a Taylor series approximation.
  • Use a simulation-based approach (sketched below).
  • Switch to a Normal response model.
  • Use the latent variable approach in Snijders and Bosker.
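For instance, the simulation-based approach can be sketched as follows, assuming point estimates beta0 (intercept) and sigma2_u (level-2 variance) from the fitted model; the function name and arguments are illustrative:

```python
import numpy as np

def binomial_vpc_sim(beta0, sigma2_u, n_sim=500_000, rng=None):
    """Simulation-based VPC for a 2-level binary logit model:
    level-2 variance of p over total (level-1 + level-2) variance."""
    rng = rng or np.random.default_rng()
    u = rng.normal(0, np.sqrt(sigma2_u), n_sim)   # simulated level-2 effects
    p = 1 / (1 + np.exp(-(beta0 + u)))            # probabilities on p-scale
    var_2 = p.var()                               # between-cluster variance
    var_1 = (p * (1 - p)).mean()                  # mean Bernoulli variance
    return var_2 / (var_2 + var_1)
```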

31
VPC for Binomial models
  • Snijders and Bosker (1999) suggest the following:
  • The variance of a standard logistic distribution is π²/3, and so the level 1 variance should be replaced by this value, giving VPC = σ²_u / (σ²_u + π²/3).
  • In the Reunion Island example this gives VPC ≈ 0.0263.
  • In other words, 2.63% of the variation is at the herd level. The fact that there isn't a huge level 2 variance may in part explain the poor mixing of the MCMC algorithm.

32
3-level Reunion Island dataset
  • We will now fit the same models to the 3-level dataset. After 5,000 iterations we have the following rather worrying results.

33
Running for longer
  • We ran the chains for 100k iterations after a burn-in of 5k and thinned the chains by a factor of 10. The results are still a little worrying.

34
Potential solutions
  • In the last 2 slides we have seen some bad mixing behaviour for some parameters.
  • The solution of running for longer seems to work, but we need to run for a very long time!
  • In the next lecture we will look at WinBUGS methods for this model and also a reparameterisation of the model known as hierarchical centering.
  • For this model, MCMC looks extremely computationally intensive if we want reliable estimates. In what follows we look at an example where MCMC is clearly useful.

35
The Guatemalan Child Health dataset.
  • This consists of a subsample of 2,449 respondents from the 1987 National Survey of Maternal and Child Health, with a 3-level structure of births within mothers within communities.
  • The subsample consists of all women from the
    chosen communities who had some form of prenatal
    care during pregnancy. The response variable is
    whether this prenatal care was modern (physician
    or trained nurse) or not.
  • Rodriguez and Goldman (1995) use the structure of this dataset to consider how well quasi-likelihood methods compare with ignoring the multilevel structure and fitting a standard logistic regression.
  • They perform this by constructing simulated
    datasets based on the original structure but with
    known true values for the fixed effects and
    variance parameters.
  • They consider the MQL method and show that the
    estimates of the fixed effects produced by MQL
    are worse than the estimates produced by standard
    logistic regression disregarding the multilevel
    structure!

36
The Guatemalan Child Health dataset.
  • Goldstein and Rasbash (1996) consider the same problem but use the PQL method. They show that the results produced by PQL 2nd order estimation are far better than for MQL, but still biased.
  • The model in this situation is shown below.
  • In this formulation i, j and k index the level 1, 2 and 3 units respectively.
  • The variables x1, x2 and x3 are composite scales at each level, because the original model contained many covariates at each level.
  • Browne (1998) considered the hybrid Metropolis-Gibbs method in MLwiN and two possible variance priors (Γ⁻¹(ε, ε) and Uniform).
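A reconstruction of the lost model equation, consistent with the description above and the parameters reported in the tables on the following two slides:

```latex
\begin{aligned}
y_{ijk} &\sim \mathrm{Bernoulli}(p_{ijk})\\
\operatorname{logit}(p_{ijk}) &= \beta_0 + \beta_1 x_{1ijk} + \beta_2 x_{2jk}
  + \beta_3 x_{3k} + u_{jk} + v_k\\
u_{jk} &\sim \mathrm{N}(0, \sigma^2_u), \qquad v_k \sim \mathrm{N}(0, \sigma^2_v)
\end{aligned}
```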

37
Simulation Results
  • The following gives point estimates (MCSE) for the 4 methods over 500 simulated datasets.

  Parameter (True)   MQL (1st)     PQL (2nd)     Gamma         Uniform
  β0 (0.65)          0.474 (0.01)  0.612 (0.01)  0.638 (0.01)  0.655 (0.01)
  β1 (1.00)          0.741 (0.01)  0.945 (0.01)  0.991 (0.01)  1.015 (0.01)
  β2 (1.00)          0.753 (0.01)  0.958 (0.01)  1.006 (0.01)  1.031 (0.01)
  β3 (1.00)          0.727 (0.01)  0.942 (0.01)  0.982 (0.01)  1.007 (0.01)
  σ²_v (1.00)        0.550 (0.01)  0.888 (0.01)  1.023 (0.01)  1.108 (0.01)
  σ²_u (1.00)        0.026 (0.01)  0.568 (0.01)  0.964 (0.02)  1.130 (0.02)
38
Simulation Results
  • The following gives interval coverage probabilities (90%/95% nominal) for the 4 methods over 500 simulated datasets.

  Parameter (True)   MQL (1st)   PQL (2nd)   Gamma       Uniform
  β0 (0.65)          67.6/76.8   86.2/92.0   86.8/93.2   88.6/93.6
  β1 (1.00)          56.2/68.6   90.4/96.2   92.8/96.4   92.2/96.4
  β2 (1.00)          13.2/17.6   84.6/90.8   88.4/92.6   88.6/92.8
  β3 (1.00)          59.0/69.6   85.2/89.8   86.2/92.2   88.6/93.6
  σ²_v (1.00)        0.6/2.4     70.2/77.6   89.4/94.4   87.8/92.2
  σ²_u (1.00)        0.0/0.0     21.2/26.8   84.2/88.6   88.0/93.0
39
Summary of simulations
  • The Bayesian approach yields excellent bias and coverage results.
  • For the fixed effects, MQL performs badly but the other 3 methods all do well.
  • For the random effects, MQL and PQL both perform badly but MCMC with both priors is much better.
  • Note that this is an extreme scenario, with few level 1 units per level 2 unit yet high level 2 variance; in other examples MQL/PQL will not be so bad.

40
Introduction to Practical
  • In the practical you will be let loose on MLwiN with two datasets:
  • A dataset on contraceptive use in Bangladesh.
  • A veterinary epidemiology dataset on pneumonia in pigs.