Model Comparison using MCMC and further models


Slides: 32
Provided by: mat135
Title: Model Comparison using MCMC and further models

Lecture 9
  • Model Comparison using MCMC and further models

Lecture Contents
  • Model comparison
  • DIC diagnostic
  • Random slopes regression model
  • Priors for variance matrices
  • MLwiN RSR demonstration
  • Other predictor variables
  • DIC in WinBUGS

Bayesian Model Comparison
  • In Bayesian statistics model comparison is a
    thorny issue!!
  • In MLwiN we used to suggest running IGLS for
    model selection then MCMC on your chosen model.
  • Why is it a thorny issue?
  • The posterior $f(\theta \mid Y)$ does not allow criticism of
    the model in light of the observed data nor
    comparison amongst models.
  • It is f(Y) that can be used to assess model
  • Regardless of the model, f(Y) is a density over
    the space of observables which can be compared
    with what was actually observed.

Bayes Factors
  • If we observe $Y_{obs}$ and have 2 models $M_1$ and $M_2$
    then the Bayes Factor is
    $BF = f(Y_{obs} \mid M_1) / f(Y_{obs} \mid M_2)$
  • This provides the relative weight of evidence for
    model M1 compared to model M2.
  • Rough calibration of the Bayes factor has been proposed.
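
As a minimal sketch of a Bayes factor that can be computed in closed form, consider k successes in n binomial trials. The two models here are illustrative assumptions, not taken from the lecture: M1 fixes the success probability at 0.5, while M2 gives it a proper Beta(a, b) prior, so both marginal likelihoods f(Y | M) exist. The function names are hypothetical.

```python
import math


def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_beta(p, q):
    """Log of the Beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def log_marginal_fixed(k, n, theta=0.5):
    """log f(Y | M1): binomial likelihood with theta fixed, nothing to integrate."""
    return log_choose(n, k) + k * math.log(theta) + (n - k) * math.log(1 - theta)


def log_marginal_beta(k, n, a=1.0, b=1.0):
    """log f(Y | M2): binomial likelihood integrated over a Beta(a, b) prior."""
    return log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b)


def bayes_factor(k, n):
    """Bayes factor of M1 against M2; values above 1 favour M1."""
    return math.exp(log_marginal_fixed(k, n) - log_marginal_beta(k, n))
```

With a uniform Beta(1, 1) prior the marginal likelihood under M2 is 1/(n + 1) for every k, so the Bayes factor reduces to (n + 1) C(n, k) / 2^n. This kind of closed form is rarely available for the multilevel models discussed in the rest of the lecture.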

Problems with Bayes Factor
  • 1. When the prior is vague, $f(\theta)$ is improper.
  • This implies that even though $f(\theta \mid Y)$ may be
    proper, $f(Y)$ is improper so Bayes Factors cannot
    be used!
  • 2. Computation of the Bayes factor itself
    requires high-dimensional integration.
  • 3. Lindley's paradox: the data point to rejection
    but the prior is diffuse, so the denominator of the
    Bayes factor is much smaller than the numerator and
    too much weight is given to parsimonious models.
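
Point 3 can be illustrated numerically. The sketch below assumes a setting that is not in the lecture: Normal data with known variance, M1: mu = 0 and M2: mu ~ N(0, tau2), with the sample mean as the data summary. The function names are hypothetical. Both marginal densities of the sample mean are Normal, so the Bayes factor is a ratio of two Normal densities.

```python
import math


def normal_pdf(x, var):
    """Density at x of a Normal distribution with mean 0 and variance var."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2 * math.pi * var)


def bayes_factor_null(ybar, n, sigma2=1.0, tau2=1.0):
    """Bayes factor of M1 (mu = 0) against M2 (mu ~ N(0, tau2)).

    Under M1 the sample mean is N(0, sigma2 / n); under M2 it is
    N(0, sigma2 / n + tau2) after integrating mu out.
    """
    se2 = sigma2 / n
    return normal_pdf(ybar, se2) / normal_pdf(ybar, se2 + tau2)


# Same data (sample mean 0.25 from n = 100, i.e. 2.5 standard errors from 0),
# increasingly diffuse priors on mu under M2.
for tau2 in (1.0, 100.0, 10000.0):
    print(tau2, bayes_factor_null(0.25, 100, tau2=tau2))
```

As tau2 grows the marginal density under M2 shrinks towards zero, so the Bayes factor moves in favour of the simpler model M1 even though the data sit 2.5 standard errors away from its value of mu.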

Other related ideas
  • Prior predictive distributions $f(Y)$.
  • Cross-validation predictive distributions
    $f(y_r \mid Y_{(r)})$.
  • Posterior predictive distributions $f(Y \mid Y_{obs})$.
  • Model uncertainty where the model is itself a
    parameter to be estimated.
  • Bayesian model averaging.
  • Reversible jump MCMC.

Model Comparison for random effect models
  • As we will typically use diffuse priors, Bayes
    factors are not an option here.
  • The methods listed previously are possibilities
    but not built into software packages.
  • The Deviance Information Criterion (DIC) is one
    possibility but is it a saviour for Bayesian
    model choice or a white elephant?

DIC Spiegelhalter et al. (2002)
  • Plus points
  • Discussion paper proposing it written by leading
    figures in Bayesian modelling.
  • Available in both MLwiN and WinBUGS for standard
  • Minus points
  • The paper was given a very mixed reception at the
    RSS when it was discussed!

  • A natural way to compare models is to use a
    criterion based on a trade-off between the fit of
    the data to the model and the corresponding
    complexity of the model.
  • DIC does this in a Bayesian way.
  • DIC = goodness of fit + complexity.
  • Fit is measured by the deviance
    $D(\theta) = -2 \log L(Y \mid \theta)$.
  • Complexity is measured by an estimate of the
    effective number of parameters defined as
    $p_D = \bar{D} - D(\bar{\theta})$,
  • i.e. posterior mean deviance minus the deviance
    evaluated at the posterior mean of the parameters.

DIC (continued)
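  • The DIC is then defined analogously to AIC as
    $DIC = D(\bar{\theta}) + 2 p_D = \bar{D} + p_D$,
    where $\bar{D}$ is the posterior mean deviance,
    $D(\bar{\theta})$ the deviance at the posterior mean and
    $p_D$ the effective number of parameters.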
  • Models with smaller DIC are better supported by
    the data.
  • DIC can be monitored in WinBUGS from
    Inference/DIC menu.
  • DIC is available in MLwiN under the Model/MCMC

Education dataset
  • We can fit a simple (Bayesian) linear regression
    in MLwiN
  • The DIC output is as follows

Note: $p_D \approx 3$, the actual number of parameters.

Variance components model
  • Here we consider the random intercepts model from
    earlier practicals

This is the parallel lines model.

Change in DIC
Here we see the clear improvement in fitting
random effects for school. Note that the
effective number of parameters is 60 compared
with 68 actual parameters in the dataset due to
random rather than fixed school effects.

Random slopes model (crossing lines)

Fitting an RSR in a Bayesian Framework
  • The basic random slopes regression model is as
  • To this model we need to add priors for
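
The model equation is not preserved in this transcript. As a sketch, the standard two-level random slopes regression with one predictor can be written as follows, using notation assumed to match the symbols in the Gibbs sampling slide later in the lecture ($\beta$, $u_j$, $\Omega_u$, $\sigma_e^2$):

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}, \\
u_j = (u_{0j}, u_{1j})^T &\sim \mathrm{MVN}(0, \Omega_u), \qquad
e_{ij} \sim N(0, \sigma_e^2).
\end{aligned}
```

Under this form the unknowns requiring priors are the fixed effects $\beta$, the level 2 variance matrix $\Omega_u$ and the level 1 variance $\sigma_e^2$.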

Wishart priors
  • For a $(k \times k)$ variance matrix parameter in a Normal
    likelihood the conjugate prior is the inverse
    Wishart distribution with parameters $\nu$ and $S$
  • This distribution looks complex but is simply a
    multivariate generalisation of the inverse Gamma
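
To make the link with the inverse Gamma concrete, here is the inverse Wishart density up to a normalising constant, in one common parameterisation (software may parameterise it differently):

```latex
p(\Omega \mid \nu, S) \;\propto\; |\Omega|^{-(\nu + k + 1)/2}
  \exp\!\left\{ -\tfrac{1}{2}\, \mathrm{tr}\!\left( S\, \Omega^{-1} \right) \right\}
```

Setting $k = 1$ and writing $\Omega = \omega$ gives $p(\omega) \propto \omega^{-(\nu/2 + 1)} \exp\{-S / (2\omega)\}$, which is the kernel of an inverse Gamma distribution with shape $\nu/2$ and scale $S/2$.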

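Wishart prior for $\Omega_u^{-1}$
  • In MLwiN we use an inverse Wishart prior for the
    variance matrix $\Omega_u$, which is equivalent to a
    Wishart prior for the precision matrix $\Omega_u^{-1}$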
  • Note this is a (weakly informative) prior as the
    first parameter represents the prior sample size
    and is set to the smallest feasible value. Browne
    and Draper have looked at alternative Wishart
    priors as well as a Uniform prior and performed

Gibbs Sampling algorithm for RSR model
  • Repeat the following four steps
  • 1. Generate $\beta$ from its (Multivariate) Normal
    conditional distribution.
  • 2. Generate each $u_j$ from its (Multivariate)
    Normal conditional distribution.
  • 3. Generate $\Omega_u^{-1}$ from its Wishart conditional
    distribution.
  • 4. Generate $1/\sigma_e^2$ from its Gamma conditional
    distribution.
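
The four steps above can be sketched in plain Python for a model with one predictor, so that every matrix is 2x2. This is an illustrative sketch under stated assumptions, not MLwiN's implementation: flat priors on the fixed effects, a Wishart(nu, S^-1) prior on the level 2 precision matrix, a Gamma(a, b) prior on the level 1 precision, and hypothetical default values for nu, S, a and b. `groups` is assumed to be a list of (xs, ys) pairs, one per school.

```python
import math
import random


def inv2(m):
    """Invert a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def mvn2(mean, cov, rng):
    """Draw from a bivariate Normal via a hand-written Cholesky factor."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return [mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2]


def wishart2(df, scale, rng):
    """Draw from Wishart(df, scale), integer df, as a sum of outer products."""
    w = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(df):
        z = mvn2([0.0, 0.0], scale, rng)
        for r in range(2):
            for c in range(2):
                w[r][c] += z[r] * z[c]
    return w


def cross(xs, rs):
    """Return Z'Z and Z'r for design rows (1, x) and residuals r."""
    sx = sum(xs)
    ztz = [[len(xs), sx], [sx, sum(x * x for x in xs)]]
    return ztz, [sum(rs), sum(x * r for x, r in zip(xs, rs))]


def gibbs_rsr(groups, n_iter=2000, nu=2, S=((0.1, 0.0), (0.0, 0.1)),
              a=0.001, b=0.001, seed=1):
    """Gibbs sampler for y = b0 + b1*x + u0j + u1j*x + e; returns (b0, b1, sigma2) draws."""
    rng = random.Random(seed)
    n_obs = sum(len(xs) for xs, _ in groups)
    all_x = [x for xs, _ in groups for x in xs]
    beta, sigma2 = [0.0, 0.0], 1.0
    u = [[0.0, 0.0] for _ in groups]
    prec_u = [[1.0, 0.0], [0.0, 1.0]]
    draws = []
    for _ in range(n_iter):
        # Step 1: fixed effects from their bivariate Normal conditional.
        res = [y - u[j][0] - u[j][1] * x
               for j, (xs, ys) in enumerate(groups) for x, y in zip(xs, ys)]
        xtx, xtr = cross(all_x, res)
        v = inv2(xtx)
        beta = mvn2(matvec(v, xtr), [[sigma2 * e for e in row] for row in v], rng)
        # Step 2: each school's (intercept, slope) residual pair.
        for j, (xs, ys) in enumerate(groups):
            res = [y - beta[0] - beta[1] * x for x, y in zip(xs, ys)]
            ztz, ztr = cross(xs, res)
            d = inv2([[ztz[r][c] / sigma2 + prec_u[r][c] for c in range(2)]
                      for r in range(2)])
            u[j] = mvn2(matvec(d, [t / sigma2 for t in ztr]), d, rng)
        # Step 3: level 2 precision matrix from its Wishart conditional.
        ss = [[S[r][c] + sum(uj[r] * uj[c] for uj in u) for c in range(2)]
              for r in range(2)]
        prec_u = wishart2(len(groups) + nu, inv2(ss), rng)
        # Step 4: level 1 precision from its Gamma conditional
        # (random.gammavariate takes a scale, hence the reciprocal of the rate).
        sse = sum((y - beta[0] - u[j][0] - (beta[1] + u[j][1]) * x) ** 2
                  for j, (xs, ys) in enumerate(groups) for x, y in zip(xs, ys))
        sigma2 = 1.0 / rng.gammavariate(a + n_obs / 2, 1.0 / (b + sse / 2))
        draws.append((beta[0], beta[1], sigma2))
    return draws
```

The Wishart draw relies on the fact that a sum of df outer products of N(0, V) vectors is Wishart(df, V), which keeps the sketch free of external libraries at the cost of requiring an integer nu. Discard an initial burn-in portion of `draws` before summarising.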

Bayesian RSR Model for education dataset
Note IGLS estimates used in prior. Variance
(posterior mean) estimates bigger than IGLS

DIC for RSR model
As with the frequentist approach the random
slopes model is an improvement over the random
intercepts model. The additional 65 random
parameters only add 32 effective parameters.

Trajectories for the RSR model

MCMC Diagnostics for $\Omega_{u00}$

Predictions for the RSR model with highlighted
  • Here the top and bottom school are highlighted

Residuals for the RSR
  • Individually
  • and pairwise

Uniform Priors
  • Here the level 2 variance estimates increase as
    in Browne and Draper (2000)
  • Browne and Draper found that the Wishart priors
    were preferable although the use of the IGLS
    estimate is not strictly Bayesian as we are using
    the data twice!

Other predictors in the education dataset
  • This dataset has other predictors such as gender
    and school gender that can be considered in the
  • In the next slide we see the equations window for
    a model with these added which has DIC 9189.26 a
    reduction of over 25 on the earlier RSR model

RSR gender effects

WinBUGS RSR gender
    model
    {
      # Level 1 definition
      for (i in 1:N) {
        normexam[i] ~ dnorm(mu[i], tau)
        mu[i] <- beta[1] * cons[i]
               + beta[2] * standlrt[i]
               + beta[3] * girl[i]
               + beta[4] * boysch[i]
               + beta[5] * girlsch[i]
               + u2[school[i], 1] * cons[i]
               + u2[school[i], 2] * standlrt[i]
      }
      # Higher level definitions
      for (j in 1:n2) {
        u2[j, 1:2] ~ dmnorm(zero2[1:2], tau.u2[1:2, 1:2])
      }
      # Priors for fixed effects
      for (k in 1:5) { beta[k] ~ dflat() }
      # ... (priors for tau and tau.u2 follow in the full model)
    }

Here we see the WinBUGS code for our last
model. Notice how MVN and Wishart distributions
are specified in WinBUGS.
  • In WinBUGS DIC is available from the Inference/DIC menu.
  • The DIC is set after the burn-in and then the DIC
    button is pressed after running, giving
  • Dbar = post.mean of -2logL; Dhat = -2LogL at
    post.mean of stochastic nodes

              Dbar      Dhat      pD      DIC
    normexam  9098.590  9007.960  90.631  9189.220
    total     9098.590  9007.960  90.631  9189.220
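
As a quick check, the reported values are consistent with the DIC definitions, up to rounding in the last digit:

```latex
\begin{aligned}
p_D &= \bar{D} - \hat{D} = 9098.590 - 9007.960 = 90.630 \approx 90.631, \\
DIC &= \bar{D} + p_D = 9098.590 + 90.631 = 9189.221 \approx 9189.220.
\end{aligned}
```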

Parameter estimates in WinBUGS
  • Note that WinBUGS gives similar estimates to MLwiN
    for the model. Note also that for the fixed effects
    $\beta$, WinBUGS indexes from 1 while MLwiN indexes
    from 0.

    node             mean     sd       (2.5%, 97.5%)
    beta[1]          -0.19    0.05236  (-0.2996, -0.08937)
    beta[2]          0.5539   0.01971  (0.5145, 0.5911)
    beta[3]          0.1687   0.03399  (0.1019, 0.2349)
    beta[4]          0.1775   0.1008   (-0.02581, 0.3781)
    beta[5]          0.175    0.08212  (0.004677, 0.3318)
    sigma2           0.5511   0.0125   (0.5272, 0.576)
    sigma2.u2[1,1]   0.08777  0.01885  (0.05745, 0.1305)
    sigma2.u2[1,2]   0.02141  0.00719  (0.009262, 0.0372)
    sigma2.u2[2,2]   0.01545  0.00460  (0.008271, 0.02603)

Next Practical
  • The next practical is free ranging
  • You can follow the MLwiN chapter on RSR models
    that is given.
  • You can try out RSR models in WinBUGS.
  • You can try out fitting random effect models to
    the orthodont dataset using MCMC.
  • You can try out DIC on other models.