# Model Comparison using MCMC and further models


1
Lecture 9
• Model Comparison using MCMC and further models

2
Lecture Contents
• Model comparison
• DIC diagnostic
• Random slopes regression model
• Priors for variance matrices
• MLwiN RSR demonstration
• Other predictor variables
• DIC in WinBUGS

3
Bayesian Model Comparison
• In Bayesian statistics model comparison is a
thorny issue!
• In MLwiN we used to suggest running IGLS for
model selection and then MCMC on your chosen model.
• Why is it a thorny issue?
• The posterior f(θ|Y) does not allow criticism of
the model in light of the observed data, nor
comparison amongst models.
• It is the marginal likelihood f(Y) that can be
used to assess model performance.
• Regardless of the model, f(Y) is a density over
the space of observables which can be compared
with what was actually observed.

4
Bayes Factors
• If we observe YOBS and have 2 models M1 and M2
then the Bayes Factor is
• This provides the relative weight of evidence for
model M1 compared to model M2.
• Rough calibration of the Bates factor has been
proposed
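As a concrete illustration of the definition above (a hypothetical coin-toss example, not from the lecture), the marginal likelihoods of two simple models can be computed in closed form and compared:

```python
import math

# Hypothetical data: 6 successes in 10 Bernoulli trials.
n, k = 10, 6

# M1: p fixed at 0.5 -> marginal likelihood is just the binomial pmf.
m1 = math.comb(n, k) * 0.5 ** n

# M2: p ~ Uniform(0,1) -> integrating the binomial likelihood gives
# C(n,k) * Beta(k+1, n-k+1) = 1 / (n + 1), a standard closed form.
m2 = 1.0 / (n + 1)

bf12 = m1 / m2   # relative weight of evidence for M1 over M2
print(round(bf12, 2))  # about 2.26: mild support for the fair coin
```

On the usual calibration scales a Bayes factor this close to 1 is "barely worth mentioning" as evidence either way.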

5
Problems with Bayes Factor
• 1. When prior is vague -gt f(?) is improper
• This implies that even though f(? Y) may be
proper, f(Y) is improper so Bayes Factors cannot
be used!
• 2. Computation of the Bayes factor itself
requires high-dimensional integration.
• 3. Lindleys paradox data points to rejection
but prior is diffuse so denominator of Bayes
factor much smaller than numerator and too much
weight given to parsimonious models.
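Point 3 can be illustrated numerically. In this sketch (invented numbers, chosen for illustration) a result that is borderline significant in frequentist terms yields a Bayes factor that favours the null once the alternative's prior is made very diffuse:

```python
import math

def norm_pdf(x, mean, var):
    # Density of a Normal(mean, var) evaluated at x
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

# Borderline-significant data: ybar is 1.96 standard errors from zero.
n, sigma = 100, 1.0
se2 = sigma ** 2 / n
ybar = 1.96 * math.sqrt(se2)

# M0: theta = 0 exactly.  M1: theta ~ N(0, tau^2) with a very diffuse tau.
tau2 = 10.0 ** 2
m0 = norm_pdf(ybar, 0.0, se2)          # marginal likelihood under the point null
m1 = norm_pdf(ybar, 0.0, tau2 + se2)   # diffuse prior inflates the marginal variance

bf01 = m0 / m1
print(round(bf01, 1))  # about 14.7: the Bayes factor favours the null
```

Despite a frequentist p-value of about 0.05 against the null, the diffuse prior makes f(Y|M1) tiny, so the Bayes factor strongly supports the parsimonious model; making tau2 larger makes the effect arbitrarily extreme.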

6
Other related ideas
• Prior predictive distributions f(Y).
• Cross-validation predictive distributions
• F(yrY(r)).
• Posterior predictive distributions f(YYobs).
• Model uncertainty where the model is itself a
parameter to be estimated.
• Bayesian model averaging.
• Reversible jump MCMC.

7
Model Comparison for random effect models
• As we will typically use diffuse priors, Bayes
factors are not an option here.
• The methods listed previously are possibilities
but not built into software packages.
• The Deviance Information Criterion (DIC) is one
possibility but is it a saviour for Bayesian
model choice or a white elephant?

8
DIC: Spiegelhalter et al. (2002)
• Plus points
• Discussion paper proposing it written by leading
figures in Bayesian modelling.
• Available in both MLwiN and WinBUGS for standard
models.
• Minus points
• The paper was given a very mixed reception at the
discussion meeting.

9
DIC
• A natural way to compare models is to use a
criterion based on a trade-off between the fit of
the data to the model and the corresponding
complexity of the model.
• DIC does this in a Bayesian way.
• DIC = goodness of fit + complexity.
• Fit is measured by the deviance D(θ) = −2 log f(Y|θ).
• Complexity is measured by an estimate of the
effective number of parameters, defined as
pD = Dbar − D(θbar),
i.e. the posterior mean deviance minus the deviance
evaluated at the posterior mean of the parameters.
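A minimal sketch of the pD calculation, using a toy normal-mean model whose posterior is available in closed form (the model and numbers are assumptions for illustration, not the education data):

```python
import random
import math

random.seed(1)

# Toy data: y_i ~ N(theta, 1) with a flat prior on theta, so the
# posterior is theta | y ~ N(ybar, 1/n) exactly.
n = 50
y = [random.gauss(0.5, 1.0) for _ in range(n)]
ybar = sum(y) / n

def deviance(theta):
    # -2 log likelihood for the normal model with known unit variance
    return sum((yi - theta) ** 2 for yi in y) + n * math.log(2 * math.pi)

# Draw from the exact posterior instead of running MCMC (a shortcut
# that keeps the sketch self-contained).
draws = [random.gauss(ybar, math.sqrt(1.0 / n)) for _ in range(20000)]

dbar = sum(deviance(t) for t in draws) / len(draws)   # posterior mean deviance
dhat = deviance(sum(draws) / len(draws))              # deviance at posterior mean
pd = dbar - dhat                                      # effective number of parameters
dic = dbar + pd
print(round(pd, 2))   # close to 1: the model has one free parameter
```

The point of the toy example is that pD recovers the true parameter count when nothing is shrunk; in hierarchical models the random effects are shrunk towards zero, so pD falls below the nominal count, as the later slides show.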

10
DIC (continued)
• The DIC is then defined analogously to AIC as
DIC = Dbar + pD = D(θbar) + 2pD.
• Models with smaller DIC are better supported by
the data.
• DIC can be monitored in WinBUGS from the Inference menu.
• DIC is available in MLwiN under the Model/MCMC menu.

11
Education dataset
• We can fit a simple (Bayesian) linear regression
in MLwiN
• The DIC output is as follows

→ Note pD ≈ 3, the actual number of parameters
(intercept, slope and level 1 variance).
12
Variance components model
• Here we consider the random intercepts model from
earlier practicals

This is the parallel lines model
13
Change in DIC
Here we see the clear improvement from fitting
random effects for school. Note that the
effective number of parameters is 60, compared
with 68 actual parameters in the model, due to
random rather than fixed school effects.
14
Random slopes model (crossing lines)
15
Fitting an RSR in a Bayesian Framework
• The basic random slopes regression model is as
follows:
y_ij = β0 + β1 x_ij + u_0j + u_1j x_ij + e_ij,
u_j = (u_0j, u_1j)' ~ MVN(0, Ω_u), e_ij ~ N(0, σ_e²).
• To this model we need to add priors for
β, Ω_u and σ_e².

16
Wishart priors
• For a (k×k) variance matrix parameter in a Normal
likelihood the conjugate prior is the inverse
Wishart distribution with parameters ν and S.
• This distribution looks complex but is simply a
multivariate generalisation of the inverse Gamma
distribution.
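The density itself appeared as a figure on the original slide; the standard inverse Wishart form, reconstructed using the slide's parameters ν and S for a k×k variance matrix Σ, is:

```latex
p(\Sigma \mid \nu, S) \;\propto\;
  |\Sigma|^{-(\nu + k + 1)/2}
  \exp\!\left(-\tfrac{1}{2}\operatorname{tr}\!\left(S\,\Sigma^{-1}\right)\right)
```

For k = 1 this reduces to an inverse Gamma density, matching the remark above.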

17
Wishart prior for Ω_u⁻¹
• In MLwiN we use a Wishart prior for
the precision matrix Ω_u⁻¹.
• Note this is a (weakly informative) prior, as the
first parameter represents the prior sample size
and is set to the smallest feasible value. Browne
and Draper have looked at alternative Wishart
priors as well as a Uniform prior and performed
simulations.

18
Gibbs Sampling algorithm for RSR model
• Repeat the following four steps:
• 1. Generate β from its (Multivariate) Normal
conditional distribution.
• 2. Generate each u_j from its (Multivariate)
Normal conditional distribution.
• 3. Generate Ω_u⁻¹ from its Wishart conditional
distribution.
• 4. Generate 1/σ_e² from its Gamma conditional
distribution.
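The four steps above can be sketched for a simplified scalar analogue: a random-intercepts model with univariate conditionals on invented data, rather than the full RSR with its multivariate Normal and Wishart draws. This is a toy sketch, not MLwiN's implementation:

```python
import random
import math

random.seed(42)

# --- Simulate toy grouped data (assumed, not the education dataset) ---
J, n_j = 10, 20                       # groups and observations per group
beta_true, sd_u, sd_e = 2.0, 1.0, 1.0
u_true = [random.gauss(0, sd_u) for _ in range(J)]
y = [[beta_true + u_true[j] + random.gauss(0, sd_e) for _ in range(n_j)]
     for j in range(J)]
N = J * n_j

# --- Gibbs sampler: flat prior on beta, Gamma(a, b) on both precisions ---
a, b = 0.001, 0.001
beta, u = 0.0, [0.0] * J
tau_u, tau_e = 1.0, 1.0
chain_beta = []

for it in range(2000):
    # 1. beta | rest ~ Normal (flat prior, so centred on the mean residual)
    resid_mean = sum(y[j][i] - u[j] for j in range(J) for i in range(n_j)) / N
    beta = random.gauss(resid_mean, math.sqrt(1.0 / (N * tau_e)))
    # 2. each u_j | rest ~ Normal (precision-weighted shrinkage)
    for j in range(J):
        prec = n_j * tau_e + tau_u
        mean = tau_e * sum(y[j][i] - beta for i in range(n_j)) / prec
        u[j] = random.gauss(mean, math.sqrt(1.0 / prec))
    # 3. tau_u | rest ~ Gamma (gammavariate takes shape and SCALE = 1/rate)
    tau_u = random.gammavariate(a + J / 2.0,
                                1.0 / (b + 0.5 * sum(uj * uj for uj in u)))
    # 4. tau_e | rest ~ Gamma
    ss = sum((y[j][i] - beta - u[j]) ** 2 for j in range(J) for i in range(n_j))
    tau_e = random.gammavariate(a + N / 2.0, 1.0 / (b + 0.5 * ss))
    if it >= 500:                     # discard burn-in
        chain_beta.append(beta)

post_mean_beta = sum(chain_beta) / len(chain_beta)
print(round(post_mean_beta, 2))       # near the true value of 2.0
```

In the real RSR model, step 2 becomes a bivariate Normal draw per school and step 3 a Wishart draw for the 2×2 precision matrix, but the structure of the loop is identical.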

19
Bayesian RSR Model for education dataset
Note that the IGLS estimates are used in the prior.
The variance (posterior mean) estimates are bigger
than the IGLS estimates.
20
DIC for RSR model
As with the frequentist approach, the random
slopes model is an improvement over the random
intercepts model. The additional 65 random
parameters add only 32 effective parameters.
21
Trajectories for the RSR model
22
MCMC Diagnostics for Ω_u00
23
Predictions for the RSR model with highlighted
data
• Here the top and bottom schools are highlighted

24
Residuals for the RSR
• Individually
• and pairwise

25
Uniform Priors
• Here the level 2 variance estimates increase as
in Browne and Draper (2000)
• Browne and Draper found that the Wishart priors
were preferable although the use of the IGLS
estimate is not strictly Bayesian as we are using
the data twice!

26
Other predictors in the education dataset
• This dataset has other predictors such as gender
and school gender that can be considered in the
practical.
• In the next slide we see the Equations window for
a model with these added, which has DIC = 9189.26, a
reduction of over 25 on the earlier RSR model.

27
RSR gender effects
28
WinBUGS RSR gender
```
model {
  # Level 1 definition
  for (i in 1:N) {
    normexam[i] ~ dnorm(mu[i], tau)
    mu[i] <- beta[1] * cons[i]
           + beta[2] * standlrt[i]
           + beta[3] * girl[i]
           + beta[4] * boysch[i]
           + beta[5] * girlsch[i]
           + u2[school[i], 1] * cons[i]
           + u2[school[i], 2] * standlrt[i]
  }
  # Higher level definitions
  for (j in 1:n2) {
    u2[j, 1:2] ~ dmnorm(zero2[1:2], tau.u2[1:2, 1:2])
  }
  # Priors for fixed effects
  for (k in 1:5) { beta[k] ~ dflat() }
  # (priors for tau and tau.u2 truncated in the transcript)
}
```

→ Here we see the WinBUGS code for our last
model. Notice how the MVN and Wishart distributions
are specified in WinBUGS.
29
DIC in WinBUGS
• In WinBUGS DIC is available from the Inference menu.
• The DIC monitor is set after the burn-in and then
the DIC button is pressed after running, giving:

```
            Dbar      Dhat      pD       DIC
normexam  9098.590  9007.960  90.631  9189.220
total     9098.590  9007.960  90.631  9189.220
```

• Dbar = posterior mean of −2logL; Dhat = −2logL at
the posterior mean of the stochastic nodes.
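The reported figures are internally consistent: pD = Dbar − Dhat and DIC = Dbar + pD, which a quick check confirms up to rounding:

```python
# Sanity-check the WinBUGS DIC output above using the table's numbers.
dbar, dhat = 9098.590, 9007.960
pd = dbar - dhat     # effective number of parameters
dic = dbar + pd      # deviance information criterion
print(round(pd, 3), round(dic, 3))  # 90.63 9189.22, matching the table
```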

30
Parameter estimates in WinBUGS
• Note that here we see that WinBUGS gives similar
estimates to MLwiN for the model. Note that for
the fixed effects β, WinBUGS indexes from 1
while MLwiN indexes from 0.

```
node            mean      sd       (2.5%, 97.5%)
beta[1]        -0.19     0.05236  (-0.2996, -0.08937)
beta[2]         0.5539   0.01971  (0.5145, 0.5911)
beta[3]         0.1687   0.03399  (0.1019, 0.2349)
beta[4]         0.1775   0.1008   (-0.02581, 0.3781)
beta[5]         0.175    0.08212  (0.004677, 0.3318)
sigma2          0.5511   0.0125   (0.5272, 0.576)
sigma2.u2[1,1]  0.08777  0.01885  (0.05745, 0.1305)
sigma2.u2[1,2]  0.02141  0.00719  (0.009262, 0.0372)
sigma2.u2[2,2]  0.01545  0.00460  (0.008271, 0.02603)
```

31
Next Practical
• The next practical is free-ranging.
• You can follow the MLwiN chapter on RSR models
that is given.
• You can try out RSR models in WinBUGS.
• You can try out fitting random effect models to
the orthodont dataset using MCMC.
• You can try out DIC on other models.