Using complex random effect models in epidemiology and ecology
1
Using complex random effect models in
epidemiology and ecology
  • Dr William J. Browne
  • School of Mathematical Sciences
  • University of Nottingham

2
Outline
  • Background to my research, random effect and
    multilevel models and MCMC estimation.
  • Random effect models for complex data structures
    including artificial insemination and Danish
    chicken examples.
  • Multivariate random effect models and great tit
    nesting behaviour.
  • Efficient MCMC algorithms.
  • Conclusions and future work.

3
Background
  • 1995-1998 PhD in Statistics, University of
    Bath.
  • Applying MCMC methods to multilevel models.
  • 1998-2003 Postdoctoral research positions at
    the Centre for Multilevel Modelling at the
    Institute of Education, London.
  • 2003-2006 Lecturer in Statistics at University
    of Nottingham.
  • 2006-2007 Associate professor of Statistics at
    University of Nottingham.
  • 2007- Professor in Biostatistics, University of
    Bristol.
  • Research interests:
  • Multilevel modelling, complex random effect
    modelling, applied statistics, Bayesian
    statistics and MCMC estimation.

4
Random effect models
  • Models that account for the underlying structure
    in the dataset.
  • Originally developed for nested structures
    (multilevel models), for example in education,
    pupils nested within schools.
  • An extension of linear modelling with the
    inclusion of random effects.
  • A typical 2-level model is
    y_ij = β0 + β1 x_ij + u_j + e_ij,
    where u_j ~ N(0, σ²_u) and e_ij ~ N(0, σ²_e).
  • Here i might index pupils and j index schools.
  • Alternatively in another example i might index
    cows and j index herds.
  • The important thing is that the model and
    statistical methods used are the same!

5
Estimation Methods for Multilevel Models
  • Due to additional random effects no simple matrix
    formulae exist for finding estimates in
    multilevel models.
  • Two alternative approaches exist:
  • Iterative algorithms e.g. IGLS, RIGLS, that
    alternate between estimating fixed and random
    effects until convergence. Can produce ML and
    REML estimates.
  • Simulation-based Bayesian methods e.g. MCMC that
    attempt to draw samples from the posterior
    distribution of the model.
  • One possible computer program to use for
    multilevel models which incorporates both
    approaches is MLwiN.

6
MLwiN
  • Software package designed specifically for
    fitting multilevel models.
  • Developed by a team led by Harvey Goldstein and
    Jon Rasbash at the Institute of Education in
    London over the past 15 years or so. Earlier
    incarnations: ML2, ML3, MLN.
  • Originally contained classical estimation
    methods (IGLS) for fitting models.
  • MLwiN launched in 1998 also included MCMC
    estimation.
  • My role in the team was as developer of the MCMC
    functionality in MLwiN in my time at Bath and
    during 4.5 years at the IOE.
  • Note MLwiN core team relocated to Bristol in
    2005.

7
MCMC Algorithm
  • Consider the 2-level normal response model
    y_ij = β0 + β1 x_ij + u_j + e_ij,
    u_j ~ N(0, σ²_u), e_ij ~ N(0, σ²_e).
  • MCMC algorithms usually work in a Bayesian
    framework and so we need to add prior
    distributions for the unknown parameters.
  • Here there are 4 sets of unknown parameters:
    the fixed effects β, the random effects u_j,
    and the variances σ²_u and σ²_e.
  • We will add prior distributions, typically a
    flat prior p(β) ∝ 1 for the fixed effects and
    diffuse inverse-gamma priors for the variances.

8
MCMC Algorithm (2)
  • One possible MCMC algorithm for this model then
    involves simulating in turn from the 4 sets of
    conditional distributions. Such an algorithm is
    known as Gibbs Sampling. MLwiN uses Gibbs
    sampling for all normal response models.
  • Firstly we set starting values for each group of
    unknown parameters,
  • Then sample from the following conditional
    distributions, firstly
  • To get .

9
MCMC Algorithm (3)
  • We next sample from p(u | y, β(1), σ²_u(0), σ²_e(0))
  • to get u(1), then from p(σ²_u | y, β(1), u(1), σ²_e(0))
  • to get σ²_u(1), then finally from
    p(σ²_e | y, β(1), u(1), σ²_u(1))
  • to get σ²_e(1). We have then updated all of the
    unknowns in the model. The process is then simply
    repeated many times, each time using the
    previously generated parameter values to generate
    the next set.
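This cycle can be sketched in code. Below is a minimal, illustrative Gibbs sampler for a simple 2-level random intercept model (intercept-only, simulated data, diffuse inverse-gamma priors on the variances) — a teaching sketch under those assumptions, not MLwiN's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a small 2-level dataset: pupils (i) within schools (j)
J, n_per = 10, 20                            # 10 schools, 20 pupils each
true_u = rng.normal(0.0, 1.0, J)             # school random effects, sd 1
school = np.repeat(np.arange(J), n_per)
y = 5.0 + true_u[school] + rng.normal(0.0, 2.0, J * n_per)
N = len(y)

# Starting values for each group of unknowns
beta, u = 0.0, np.zeros(J)
sig2_u, sig2_e = 1.0, 1.0
a, b = 0.001, 0.001                          # diffuse inverse-gamma hyperparameters

samples = []
for it in range(2000):
    # Step 1: beta | rest (flat prior)
    resid = y - u[school]
    beta = rng.normal(resid.mean(), np.sqrt(sig2_e / N))
    # Step 2: u_j | rest
    for j in range(J):
        yj = y[school == j] - beta
        prec = len(yj) / sig2_e + 1.0 / sig2_u
        u[j] = rng.normal(yj.sum() / sig2_e / prec, np.sqrt(1.0 / prec))
    # Step 3: sig2_u | rest (inverse gamma drawn as 1 / gamma)
    sig2_u = 1.0 / rng.gamma(a + J / 2.0, 1.0 / (b + 0.5 * np.sum(u ** 2)))
    # Step 4: sig2_e | rest
    e = y - beta - u[school]
    sig2_e = 1.0 / rng.gamma(a + N / 2.0, 1.0 / (b + 0.5 * np.sum(e ** 2)))
    samples.append((beta, sig2_u, sig2_e))

samples = np.array(samples)
```

After a burn-in, the columns of `samples` give chains for β, σ²_u and σ²_e whose means recover the simulated values.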

10
Burn-in and estimates
  • Burn-in: It is general practice to throw away the
    first n values to allow the Markov chain to
    approach its equilibrium distribution, namely the
    joint posterior distribution of interest. These
    iterations are known as the burn-in.
  • Finding Estimates: We continue generating values
    at the end of the burn-in for another m
    iterations. These m values are then averaged to
    give point estimates of the parameter of
    interest. Posterior standard deviations and other
    summary measures can also be obtained from the
    chains.
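In code, the burn-in and summary step amounts to discarding the first n draws and summarising the rest; here with simulated draws standing in for an MCMC chain.

```python
import numpy as np

rng = np.random.default_rng(0)
# A stand-in for an MCMC chain for one parameter: 5000 draws
chain = rng.normal(2.0, 0.5, 5000)

burn_in = 500                     # discard the first n values
kept = chain[burn_in:]            # remaining m = 4500 draws

point_estimate = kept.mean()      # posterior mean
post_sd = kept.std(ddof=1)        # posterior standard deviation
lower, upper = np.percentile(kept, [2.5, 97.5])   # 95% interval
```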

11
So why use MCMC?
  • Often gives better (in terms of bias) estimates
    for non-normal responses (see Browne and Draper,
    2006).
  • Gives full posterior distribution so interval
    estimates for derived quantities are easy to
    produce.
  • Can easily be extended to more complex problems
    as we will see next.
  • Potential downside 1: Prior distributions are
    required for all unknown parameters.
  • Potential downside 2: MCMC estimation is much
    slower than the IGLS algorithm.
  • For more information see my book, MCMC Estimation
    in MLwiN (Browne, 2003).

12
Extension 1: Cross-classified models
For example, schools by neighbourhoods. Schools
will draw pupils from many different
neighbourhoods and the pupils of a neighbourhood
will go to several schools. No pure hierarchy can
be found and pupils are said to be contained
within a cross-classification of schools by
neighbourhoods

nbhd 1 nbhd 2 nbhd 3
School 1 xx x
School 2 x x
School 3 xx x
School 4 x xxx
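The defining feature of a cross-classification — neither classification nests within the other — can be checked mechanically. A small sketch with hypothetical pupil data (the IDs are invented, not taken from the table above):

```python
import numpy as np

# Hypothetical pupil-level data: each pupil has a school and a neighbourhood.
school = np.array([1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4])
nbhd   = np.array([1, 1, 2, 1, 3, 1, 1, 3, 2, 3, 3, 3])

# Schools would be nested in neighbourhoods only if every school's pupils
# came from a single neighbourhood; here they do not, so the structure is
# cross-classified rather than hierarchical.
nested = all(len(set(nbhd[school == s])) == 1 for s in set(school))
```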
 
13
Notation
With hierarchical models we use a subscript
notation that has one subscript per level and
nesting is implied reading from the left. For
example, subscript pattern ijk denotes the ith
level 1 unit within the jth level 2 unit within
the kth level 3 unit. If models become
cross-classified we use the term classification
instead of level. With notation that has one
subscript per classification and that also
captures the relationships between
classifications, the notation can become very
cumbersome. We propose an alternative notation,
introduced in Browne et al. (2001), that has only
a single subscript no matter how many
classifications are in the model.
14
Single subscript notation
We write the model as
y_i = β0 + u(2)_nbhd(i) + u(3)_school(i) + e_i   (1)
where classification 2 is neighbourhood and
classification 3 is school. Classification 1
always corresponds to the classification at which
the response measurements are made, in this case
pupils. For a given pupil i, the subscript
functions nbhd(i) and school(i) return that
pupil's own neighbourhood and school, so equation
(1) picks out one neighbourhood effect and one
school effect for each pupil.
15
Classification diagrams
In the single subscript notation we lose
information about the relationship (crossed or
nested) between classifications. A useful way of
conveying this information is with the
classification diagram, which has one node per
classification; nodes linked by arrows have a
nested relationship and unlinked nodes have a
crossed relationship.
[Two classification diagrams with nodes School, Neighbourhood and Pupil]
Cross-classified structure: pupils from a school
come from many neighbourhoods and pupils from a
neighbourhood attend several schools.
Nested structure: schools are contained within
neighbourhoods.
16
Example: Artificial insemination by donor
1901 women, 279 donors, 1328 donations, 12100
ovulatory cycles; the response is whether
conception occurs in a given cycle.
In terms of a unit diagram
Or a classification diagram
17
Model for artificial insemination data
We can write the model as
Results
Note: cross-classified models can be fitted in
IGLS but are far easier to fit using MCMC
estimation.
18
Extension 2: Multiple membership models
  • When level 1 units are members of more than one
    higher level unit we describe a model for such
    data as a multiple membership model.
  • For example,
  •  Pupils change schools/classes and each
    school/class has an effect on pupil outcomes.
  • Patients are seen by more than one nurse during
    the course of their treatment.

 
19
Notation
Note that nurse(i) now indexes the set of nurses
that treat patient i and w(2)_i,j is a weighting
factor relating patient i to nurse j. For
example, with four patients and three nurses, we
may have the following weights:
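As an illustration, here is a hypothetical weight matrix of this form for four patients and three nurses (the values are invented, not those from the slide), together with the multiple membership contribution Σ_j w(2)_i,j u_j to each patient's linear predictor:

```python
import numpy as np

# Hypothetical weight matrix: rows = 4 patients, columns = 3 nurses.
# Each row sums to 1, spreading a patient's exposure over the nurses seen.
W = np.array([
    [1.0, 0.0, 0.0],   # patient 1 seen only by nurse 1
    [0.5, 0.5, 0.0],   # patient 2 split between nurses 1 and 2
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
])

u = np.array([0.2, -0.1, 0.3])   # nurse random effects (illustrative values)

# Multiple membership contribution to each patient's linear predictor:
# sum over j of w_ij * u_j
contrib = W @ u
```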
20
Classification diagrams for multiple membership
relationships
Double arrows indicate a multiple membership
relationship between classifications.
We can mix multiple membership, crossed and
hierarchical structures in a single model.
21
Example involving nesting, crossing and multiple
membership: Danish chickens
Production hierarchy: 10,127 child flocks in 725
houses in 304 farms.
Breeding hierarchy: 10,127 child flocks from 200
parent flocks.
As a unit diagram
As a classification diagram
22
Model and results
The response is cases of salmonella. Note:
multiple membership models can be fitted in IGLS,
and this model/dataset represents roughly the most
complex model that the method can handle. Such
models are far easier to fit using MCMC
estimation.
23
Random effect modelling of great tit nesting
behaviour
  • An extension of cross-classified models to
    multivariate responses.
  • Collaborative research with Richard Pettifor
    (Institute of Zoology, London), and Robin
    McCleery and Ben Sheldon (University of Oxford).

24
Wytham woods great tit dataset
  • A longitudinal study of great tits nesting in
    Wytham Woods, Oxfordshire.
  • 6 responses: 3 continuous, 3 binary.
  • Continuous: clutch size, lay date and mean
    nestling mass.
  • Binary: nest success, male and female survival.
  • Data: 4165 nesting attempts over a period of 34
    years.
  • There are 4 higher-level classifications of the
    data: female parent, male parent, nestbox and
    year.

25
Data background
The data structure can be summarised as follows.
Note: there is very little information on each
individual male and female bird, but we can get
some estimates of variability via a random
effects model.
26
Diagrammatic representation of the dataset.
27
Univariate cross-classified random effect
modelling
  • For each of the 6 responses we will firstly fit a
    univariate model: normal responses for the
    continuous variables and probit regression for
    the binary variables. For example, using the
    notation of Browne et al. (2001) and letting the
    response y_i be clutch size, the model has one
    random effect per higher classification:
    y_i = β0 + u(2)_year(i) + u(3)_nestbox(i)
        + u(4)_male(i) + u(5)_female(i) + e_i.

28
Estimation
  • We use MCMC estimation in MLwiN and choose
    diffuse priors for all parameters.
  • We run 3 MCMC chains from different starting
    points for 250k iterations each (500k for binary
    responses) and use the Gelman-Rubin diagnostic to
    decide burn-in length.
  • We compared results with the equivalent classical
    model using the Genstat software package and got
    broadly similar results.
  • We fit all four higher classifications and do not
    consider model comparison.
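The Gelman-Rubin diagnostic compares between-chain and within-chain variability; values near 1 indicate the chains agree and the burn-in is adequate. A minimal sketch of the basic (non-split) statistic, tested on three well-mixed simulated chains:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for one parameter,
    given an (m chains x n iterations) array of post-burn-in draws."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)           # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(1)
# Three chains targeting the same distribution -> R-hat close to 1
chains = rng.normal(0.0, 1.0, size=(3, 2000))
rhat = gelman_rubin(chains)
```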

29
Clutch Size
Here we see that the average clutch size is just
below 9 eggs with large variability between
female birds and some variability between years.
Male birds and nest boxes have less impact.
30
Lay Date (days after April 1st)
Here we see that the mean lay date is around the
end of April/beginning of May. The biggest driver
of lay date is the year which is probably
indicating weather differences. There is some
variability due to female birds but little impact
of nest box and male bird.
31
Nestling Mass
Here the response is the average mass of the
chicks in a brood at 10 days old. Note here lots
of the variability is unexplained and both
parents are equally important.
32
Human example
Helena Jayne Browne, born 22nd May 2006, birth
weight 8lb 0oz.
Sarah Victoria Browne, born 20th July 2004, birth
weight 6lb 6oz.
Father's birth weight 9lb 13oz, mother's birth
weight 6lb 8oz.
33
Nest Success
Here we define nest success as at least one of the
ringed nestlings being captured in later years.
The value 0.01 for β corresponds to around a 50%
success rate. Most of the variability is explained
by the binomial assumption, with the bulk of the
over-dispersion due to yearly differences.
34
Male Survival
Here male survival is defined as being observed
breeding in later years. The average probability
is 0.334 and there is very little over-dispersion
with differences between years being the main
factor. Note the actual response is being
observed breeding in later years and so the real
probability is higher!
35
Female survival
Here female survival is defined as being observed
breeding in later years. The average probability
is 0.381 and again there isn't much
over-dispersion, with differences between
nestboxes and years being the main factors.
36
Multivariate modelling of the great tit dataset
  • We now wish to combine the six univariate models
    into one big model that will also account for the
    correlations between the responses.
  • We choose a MV Normal model and use latent
    variables (Chib and Greenberg, 1998) for the 3
    binary responses that take positive values if the
    response is 1 and negative values if the response
    is 0.
  • We are then left with a 6-vector for each
    observation consisting of the 3 continuous
    responses and 3 latent variables. The latent
    variables are estimated as an additional step in
    the MCMC algorithm and for identifiability the
    elements of the level 1 variance matrix that
    correspond to their variances are constrained to
    equal 1.
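The latent-variable step can be sketched as follows: given current linear predictors, each z_i is drawn from a normal with variance 1 (the identifiability constraint) truncated to the half-line dictated by the observed binary response. This illustrative version uses simple rejection sampling, which is adequate when |mu| is moderate; the mu values are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_latent(y, mu, rng, max_tries=1000):
    """Chib-Greenberg style latent draw for a probit response:
    z_i ~ N(mu_i, 1) truncated to z_i > 0 if y_i = 1, z_i < 0 if y_i = 0.
    Simple rejection sampling, fine for moderate |mu_i|."""
    z = np.empty_like(mu)
    for i, (yi, mi) in enumerate(zip(y, mu)):
        for _ in range(max_tries):
            draw = rng.normal(mi, 1.0)
            if (yi == 1 and draw > 0) or (yi == 0 and draw < 0):
                z[i] = draw
                break
    return z

y = np.array([1, 0, 1, 1, 0])                 # observed binary responses
mu = np.array([0.5, -0.3, 0.0, 1.2, -1.0])    # current linear predictors
z = sample_latent(y, mu, rng)
```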

37
Multivariate Model
Here the vector valued response is decomposed
into a mean vector plus random effects for each
classification.
Inverse Wishart priors are used for each of the
classification variance matrices. The values are
based on considering overall variability in each
response and assuming an equal split for the 5
classifications.
38
Use of the multivariate model
  • The multivariate model was fitted using an MCMC
    algorithm programmed into the MLwiN package which
    consists of Gibbs sampling steps for all
    parameters apart from the level 1 variance matrix
    which requires Metropolis sampling (see Browne
    2006).
  • The multivariate model will give variance
    estimates in line with the 6 univariate models.
  • In addition the covariances/correlations at each
    level can be assessed to look at how correlations
    are partitioned.

39
Partitioning of covariances
40
Correlations from a 1-level model
  • If we ignore the structure of the data and
    consider it as 4165 independent observations we
    get the following correlations

CS LD NM NS MS
LD -0.30 X X X X
NM -0.09 -0.06 X X X
NS 0.20 -0.22 0.16 X X
MS 0.02 -0.02 0.04 0.07 X
FS -0.02 -0.02 0.06 0.11 0.21
Note: correlations in bold are statistically
significant, i.e. the 95% credible interval
doesn't contain 0.
41
Correlations in full model
CS LD NM NS MS
LD N, F, O -0.30 X X X X
NM F, O -0.09 F, O -0.06 X X X
NS Y, F 0.20 N, F, O -0.22 O 0.16 X X
MS - 0.02 - -0.02 - 0.04 Y 0.07 X
FS F, O -0.02 F, O -0.02 - 0.06 Y, F 0.11 Y, O 0.21
Key: blue = +ve, red = -ve; Y = year, N = nestbox,
F = female, O = observation.
42
Pairs of antagonistic covariances at different
classifications
  • There are 3 pairs of antagonistic correlations,
    i.e. correlations with different signs at
    different classifications:
  • LD & NM: Female 0.20, Observation -0.19.
  • Interpretation: Females who generally lay late
    lay heavier chicks, but the later a particular
    bird lays, the lighter its chicks.
  • CS & FS: Female 0.48, Observation -0.20.
  • Interpretation: Birds that lay larger clutches
    are more likely to survive, but a particular bird
    has less chance of surviving if it lays more
    eggs.
  • LD & FS: Female -0.67, Observation 0.11.
  • Interpretation: Birds that lay early are more
    likely to survive, but for a particular bird the
    later they lay the better!

43
Prior Sensitivity
  • Our choice of variance prior assumes a priori:
  • No correlation between the 6 traits.
  • Variance for each trait is split equally between
    the 5 classifications.
  • We compared this approach with a more Bayesian
    approach by splitting the data into 2 halves
  • In the first 17 years (1964-1980) there were
    1,116 observations whilst in the second 17 years
    (1981-1997) there were 3,049 observations
  • We therefore used estimates from the first 17
    years of the data to give a prior for the second
    17 years and compared this prior with our earlier
    prior.

44
Correlations for 2 priors
CS LD NM NS MS
LD 1. N, F, O 2. N, F, O (N, F, O) X X X X
NM 1. F, O 2. F, O (F, O) 1. O 2. O (F, O) X X X
NS 1. Y, F 2. Y, F (Y, F) 1. Y, F, O 2. N, F, O (N, F, O) 1. O 2. O (O) X X
MS - - - 1. M 2. M, O - - - - 1. Y 2. Y (Y) X
FS 1. F, O 2. F, O (F, O) 1. F, O 2. F, O (F, O) - - - 1. Y, F 2. Y, F (Y, F) 1. Y, O 2. Y, O (Y, O)
Key: blue = +ve, red = -ve; 1, 2 = results under
priors 1 and 2, with full-data results in
brackets; Y = year, N = nestbox, M = male,
F = female, O = observation.
45
MCMC efficiency for clutch size response
  • The MCMC algorithm used in the univariate
    analysis of clutch size was a simple 10-step
    Gibbs sampling algorithm.
  • The same Gibbs sampling algorithm can be used in
    both the MLwiN and WinBUGS software packages and
    we ran both for 50,000 iterations.
  • To compare methods for each parameter we can look
    at the effective sample sizes (ESS) which give an
    estimate of how many independent samples we
    have for each parameter as opposed to 50,000
    dependent samples.
  • ESS = number of iterations / κ, where
    κ = 1 + 2 Σk ρk is the autocorrelation time and
    ρk is the lag-k autocorrelation of the chain.
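A rough ESS computation along these lines, tested on a simulated AR(1) chain with lag-1 correlation 0.9, for which the theoretical ESS is about n(1-φ)/(1+φ), a small fraction of the 50,000 draws:

```python
import numpy as np

def effective_sample_size(chain, max_lag=200):
    """ESS = n / (1 + 2 * sum of autocorrelations), summing positive
    lags until the autocorrelation first drops below zero (a common
    simple truncation rule)."""
    n = len(chain)
    x = chain - chain.mean()
    var = x @ x / n
    kappa = 1.0
    for k in range(1, max_lag):
        rho = (x[:-k] @ x[k:]) / (n * var)
        if rho < 0:
            break
        kappa += 2.0 * rho
    return n / kappa

rng = np.random.default_rng(3)
n, phi = 50000, 0.9
e = rng.normal(0.0, 1.0, n)
chain = np.empty(n)
chain[0] = e[0]
for t in range(1, n):          # AR(1): strongly autocorrelated chain
    chain[t] = phi * chain[t - 1] + e[t]
ess = effective_sample_size(chain)
```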

46
Effective Sample sizes
The effective sample sizes are similar for both
packages. Note that MLwiN is 5 times quicker!!
We will now consider methods that will improve
the ESS values for particular parameters. We will
firstly consider the fixed effect parameter.
47
Trace and autocorrelation plots for fixed effect
using standard Gibbs sampling algorithm
48
Hierarchical Centering
This method was devised by Gelfand et al. (1995)
for use in nested models. Basically (where
feasible) parameters are moved up the hierarchy
in a model reformulation. For example
y_ij = β0 + u_j + e_ij, u_j ~ N(0, σ²_u)
is equivalent to
y_ij = u*_j + e_ij, u*_j ~ N(β0, σ²_u).
The motivation here is that we remove the strong
negative correlation between the fixed and random
effects by reformulation.
49
Hierarchical Centering
In our cross-classified model we have 4 possible
hierarchies up which we can move parameters. We
have chosen to move the fixed effect up the year
hierarchy as its variance had the biggest ESS,
although this choice is rather arbitrary.
The ESS for the fixed effect increases over
50-fold from 602 to 35,063, while for the year
level variance we have a smaller improvement from
29,604 to 34,626. Note this formulation also runs
faster: 1864s vs 2601s (in WinBUGS).
50
Trace and autocorrelation plots for fixed effect
using hierarchical centering formulation
51
Parameter Expansion
  • We next consider the variances and in particular
    the between-male bird variance.
  • When the posterior distribution of a variance
    parameter has some mass near zero this can hamper
    the mixing of the chains for both the variance
    parameter and the associated random effects.
  • The pictures over the page illustrate such poor
    mixing.
  • One solution is parameter expansion (Liu et al.
    1998).
  • In this method we add an extra parameter to the
    model to improve mixing.

52
Trace plots for between males variance and a
sample male effect using standard Gibbs sampling
algorithm
53
Autocorrelation plot for male variance and a
sample male effect using standard Gibbs sampling
algorithm
54
Parameter Expansion
In our example we use parameter expansion for all
4 hierarchies. Note the α parameters have an
impact on both the random effects and their
variance: each random effect u_j is rewritten as
α v_j with v_j ~ N(0, σ²_v).
The original parameters can be found by
u_j = α v_j and σ²_u = α² σ²_v.
Note the models are not identical as we now have
different prior distributions for the variances.
55
Parameter Expansion
  • For the between-males variance we have roughly a
    20-fold increase in ESS, from 33 to 600.
  • The parameter expanded model has different prior
    distributions for the variances, although these
    priors are still diffuse.
  • It should be noted that the point and interval
    estimate of the level 2 variance has changed from
    0.034 (0.002, 0.126) to 0.064 (0.000, 0.172).
  • Parameter expansion is computationally slower:
    3662s vs 2601s for our example.

56
Trace plots for between males variance and a
sample male effect using parameter expansion.
57
Autocorrelation plot for male variance and a
sample male effect using parameter expansion.
58
Combining the two methods
Hierarchical centering and parameter expansion
can easily be combined in the same model. Here we
perform centering on the year classification and
parameter expansion on the other 3 hierarchies.
59
Effective Sample sizes
As we can see below the effective sample sizes
for all parameters are improved for this
formulation while running time remains
approximately the same.
60
Conclusions
  • In this talk we have considered using complex
    random effects models in three application areas.
  • For the bird ecology example we have seen how
    these models can be used to partition both
    variability and correlation between various
    classifications to identify interesting
    relationships.
  • We then investigated hierarchical centering and
    parameter expansion for a model for one of our
    responses. These are both useful methods for
    improving mixing when using MCMC.
  • Both methods are simple to implement in the
    WinBUGS package and can be easily combined to
    produce an efficient MCMC algorithm.

61
Further Work
  • Incorporating hierarchical centering and
    parameter expansion in MLwiN.
  • Investigating their use in conjunction with the
    Metropolis-Hastings algorithm.
  • Investigate block-updating methods e.g. the
    structured MCMC algorithm.
  • Extending the methods to our original
    multivariate response problem.

62
References
  • Browne, W.J. (2002). MCMC Estimation in MLwiN.
    London: Institute of Education, University of
    London.
  • Browne, W.J. (2004). An illustration of the use
    of reparameterisation methods for improving MCMC
    efficiency in crossed random effect models.
    Multilevel Modelling Newsletter 16 (1): 13-25.
  • Browne, W.J. (2006). MCMC estimation of
    constrained variance matrices with applications
    in multilevel modelling. Computational Statistics
    and Data Analysis 50: 1655-1677.
  • Browne, W.J. and Draper, D. (2006). A comparison
    of Bayesian and likelihood methods for fitting
    multilevel models (with discussion). Bayesian
    Analysis 1: 473-550.
  • Browne, W.J., Goldstein, H. and Rasbash, J.
    (2001). Multiple membership multiple
    classification (MMMC) models. Statistical
    Modelling 1: 103-124.
  • Browne, W.J., McCleery, R.H., Sheldon, B.C. and
    Pettifor, R.A. (2006). Using cross-classified
    multivariate mixed response models with
    application to the reproductive success of great
    tits (Parus major). Statistical Modelling (to
    appear).
  • Chib, S. and Greenberg, E. (1998). Analysis of
    multivariate probit models. Biometrika 85:
    347-361.
  • Gelfand, A.E., Sahu, S.K. and Carlin, B.P.
    (1995). Efficient parametrizations for normal
    linear mixed models. Biometrika 82 (3): 479-488.
  • Kass, R.E., Carlin, B.P., Gelman, A. and Neal, R.
    (1998). Markov chain Monte Carlo in practice: a
    roundtable discussion. American Statistician 52:
    93-100.
  • Liu, C., Rubin, D.B. and Wu, Y.N. (1998).
    Parameter expansion to accelerate EM: the PX-EM
    algorithm. Biometrika 85 (4): 755-770.
  • Rasbash, J., Browne, W.J., Goldstein, H., Yang,
    M., Plewis, I., Healy, M., Woodhouse, G., Draper,
    D., Langford, I. and Lewis, T. (2000). A User's
    Guide to MLwiN, Version 2.1. London: Institute of
    Education, University of London.