Model Comparison using MCMC and further models

1
Lecture 9
  • Model Comparison using MCMC and further models

2
Lecture Contents
  • Model comparison
  • DIC diagnostic
  • Random slopes regression model
  • Priors for variance matrices
  • MLwiN RSR demonstration
  • Other predictor variables
  • DIC in WinBUGS

3
Bayesian Model Comparison
  • In Bayesian statistics model comparison is a
    thorny issue!
  • In MLwiN we used to suggest running IGLS for
    model selection, then MCMC on your chosen model.
  • Why is it a thorny issue?
  • The posterior f(θ|Y) does not allow criticism of
    the model in light of the observed data, nor
    comparison among models.
  • It is f(Y) that can be used to assess model
    performance.
  • Regardless of the model, f(Y) is a density over
    the space of observables which can be compared
    with what was actually observed.

4
Bayes Factors
  • If we observe YOBS and have 2 models M1 and M2
    then the Bayes Factor is
    B12 = f(YOBS | M1) / f(YOBS | M2).
  • This provides the relative weight of evidence for
    model M1 compared to model M2.
  • A rough calibration of the Bayes factor has been
    proposed.
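The marginal likelihoods entering the Bayes factor are integrals over each model's parameters (the standard definition; the formula itself appeared only as an image on the slide):

```latex
B_{12} = \frac{f(Y_{\mathrm{obs}} \mid M_1)}{f(Y_{\mathrm{obs}} \mid M_2)},
\qquad
f(Y \mid M_i) = \int f(Y \mid \theta_i, M_i)\, f(\theta_i \mid M_i)\, d\theta_i .
```

This integral over θi is what makes computing the Bayes factor a high-dimensional integration problem, as noted on the next slide.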

5
Problems with Bayes Factor
  • 1. When the prior is vague, f(θ) is improper.
  • This implies that even though f(θ|Y) may be
    proper, f(Y) is improper, so Bayes Factors cannot
    be used!
  • 2. Computation of the Bayes factor itself
    requires high-dimensional integration.
  • 3. Lindley's paradox: the data point to rejection
    but the prior is diffuse, so the denominator of
    the Bayes factor is much smaller than the
    numerator and too much weight is given to
    parsimonious models.

6
Other related ideas
  • Prior predictive distributions f(Y).
  • Cross-validation predictive distributions
    f(yr | Y(r)).
  • Posterior predictive distributions f(Y | Yobs).
  • Model uncertainty, where the model is itself a
    parameter to be estimated.
  • Bayesian model averaging.
  • Reversible jump MCMC.

7
Model Comparison for random effect models
  • As we will typically use diffuse priors, Bayes
    factors are not an option here.
  • The methods listed previously are possibilities
    but not built into software packages.
  • The Deviance Information Criterion (DIC) is one
    possibility but is it a saviour for Bayesian
    model choice or a white elephant?

8
DIC (Spiegelhalter et al., 2002)
  • Plus points:
  • The discussion paper proposing it was written by
    leading figures in Bayesian modelling.
  • Available in both MLwiN and WinBUGS for standard
    models.
  • Minus points:
  • The paper was given a very mixed reception at the
    RSS when it was discussed!

9
DIC
  • A natural way to compare models is to use a
    criterion based on a trade-off between the fit of
    the data to the model and the corresponding
    complexity of the model.
  • DIC does this in a Bayesian way.
  • DIC = goodness of fit + complexity.
  • Fit is measured by the deviance
    D(θ) = -2 log f(Y|θ).
  • Complexity is measured by an estimate of the
    effective number of parameters, defined as
    pD = Dbar - Dhat,
  • i.e. the posterior mean deviance minus the
    deviance evaluated at the posterior mean of the
    parameters.

10
DIC (continued)
  • The DIC is then defined analogously to AIC as
    DIC = Dbar + pD = Dhat + 2pD.
  • Models with smaller DIC are better supported by
    the data.
  • DIC can be monitored in WinBUGS from the
    Inference/DIC menu.
  • DIC is available in MLwiN under the Model/MCMC
    menu.
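As a sketch of how these quantities relate, the following toy example (simulated data for a single-mean Normal model; the values are illustrative and not from the lecture's dataset) computes Dbar, Dhat, pD and DIC from posterior draws:

```python
import math
import random

random.seed(1)

# Toy data: y_i ~ N(theta, 1) with theta unknown and variance known
n = 100
y = [random.gauss(0.5, 1.0) for _ in range(n)]
ybar = sum(y) / n

def deviance(theta):
    # D(theta) = -2 log-likelihood for N(theta, 1) data
    return sum((yi - theta) ** 2 + math.log(2 * math.pi) for yi in y)

# With a flat prior the posterior is N(ybar, 1/n); draw from it directly
draws = [random.gauss(ybar, 1 / math.sqrt(n)) for _ in range(5000)]

dbar = sum(deviance(t) for t in draws) / len(draws)  # posterior mean deviance
dhat = deviance(sum(draws) / len(draws))             # deviance at posterior mean
pd = dbar - dhat                                     # effective no. of parameters
dic = dbar + pd                                      # equivalently dhat + 2*pd

print(round(pd, 2))  # close to 1, the single free parameter
```

In a real analysis the draws would come from an MCMC run rather than directly from the known posterior, but the Dbar/Dhat bookkeeping is the same.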

11
Education dataset
  • We can fit a simple (Bayesian) linear regression
    in MLwiN.
  • The DIC output is as follows:

Note that pD ≈ 3, the actual number of parameters
12
Variance components model
  • Here we consider the random intercepts model from
    earlier practicals:
    y_ij = β0 + β1 x_ij + u_j + e_ij,
    u_j ~ N(0, σu²), e_ij ~ N(0, σe²).

This is the parallel lines model
13
Change in DIC
Here we see the clear improvement from fitting
random effects for school. Note that the
effective number of parameters is about 60,
compared with 68 actual parameters in the model,
because the school effects are random rather
than fixed.
14
Random slopes model (crossing lines)
15
Fitting an RSR in a Bayesian Framework
  • The basic random slopes regression model is as
    follows:
    y_ij = β0 + β1 x_ij + u0j + u1j x_ij + e_ij,
    (u0j, u1j)' ~ N(0, Ωu), e_ij ~ N(0, σe²).
  • To this model we need to add priors for β, Ωu
    and σe².

16
Wishart priors
  • For a (k x k) variance matrix parameter in a
    Normal likelihood the conjugate prior is the
    inverse Wishart distribution with parameters ν
    and S.
  • This distribution looks complex but is simply a
    multivariate generalisation of the inverse Gamma
    distribution.
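The density itself appeared only as an image on the slide; as a reconstruction (using one common parameterisation; conventions for ν and S vary between texts), the inverse Wishart density for a k x k variance matrix V is:

```latex
p(V \mid \nu, S) \;\propto\; |V|^{-(\nu + k + 1)/2}
  \exp\!\left(-\tfrac{1}{2}\,\operatorname{tr}\!\left(S V^{-1}\right)\right)
```

For k = 1 with V = σ² and S = s this reduces to an inverse Gamma(ν/2, s/2) distribution, which is the sense in which it generalises the inverse Gamma.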

17
Wishart prior for Ωu^-1
  • In MLwiN we use a Wishart prior for the
    precision matrix Ωu^-1.
  • Note this is a (weakly informative) prior, as the
    first parameter represents the prior sample size
    and is set to the smallest feasible value. Browne
    and Draper have looked at alternative Wishart
    priors as well as a Uniform prior and performed
    simulations.

18
Gibbs Sampling algorithm for RSR model
  • Repeat the following four steps:
  • 1. Generate β from its (Multivariate) Normal
    conditional distribution.
  • 2. Generate each uj from its (Multivariate)
    Normal conditional distribution.
  • 3. Generate Ωu^-1 from its Wishart conditional
    distribution.
  • 4. Generate 1/σe² from its Gamma conditional
    distribution.
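The four steps above can be sketched in a scalar analogue: a random intercepts model, where the Wishart conditional for the level-2 precision reduces to a Gamma conditional. The data are simulated and all names and hyperparameter values are illustrative assumptions, not MLwiN's actual implementation:

```python
import math
import random

random.seed(2)

# Simulated two-level data: y_ij = beta + u_j + e_ij (illustrative values)
J, nj = 30, 20                        # number of groups and group size
true_beta, sd_u, sd_e = 1.0, 0.5, 1.0
u_true = [random.gauss(0, sd_u) for _ in range(J)]
y = [[true_beta + u_true[j] + random.gauss(0, sd_e) for _ in range(nj)]
     for j in range(J)]
N = J * nj

# Starting values; vague Gamma(eps, eps) priors on both precisions
beta, u = 0.0, [0.0] * J
tau_u, tau_e, eps = 1.0, 1.0, 0.001
betas, su2s = [], []

for it in range(1000):
    # 1. beta | rest ~ Normal (flat prior on beta)
    resid_mean = sum(y[j][i] - u[j] for j in range(J) for i in range(nj)) / N
    beta = random.gauss(resid_mean, 1 / math.sqrt(tau_e * N))
    # 2. each u_j | rest ~ Normal
    for j in range(J):
        prec = nj * tau_e + tau_u
        mean = tau_e * sum(y[j][i] - beta for i in range(nj)) / prec
        u[j] = random.gauss(mean, 1 / math.sqrt(prec))
    # 3. level-2 precision | rest ~ Gamma
    #    (the scalar analogue of the Wishart step for a 1x1 matrix)
    tau_u = random.gammavariate(eps + J / 2,
                                1 / (eps + sum(uj * uj for uj in u) / 2))
    # 4. level-1 precision | rest ~ Gamma
    sse = sum((y[j][i] - beta - u[j]) ** 2
              for j in range(J) for i in range(nj))
    tau_e = random.gammavariate(eps + N / 2, 1 / (eps + sse / 2))
    if it >= 200:                     # store draws after burn-in
        betas.append(beta)
        su2s.append(1 / tau_u)

print(round(sum(betas) / len(betas), 2), round(sum(su2s) / len(su2s), 2))
```

The posterior means recovered should be close to the simulated values (beta near 1, level-2 variance near 0.25); the multivariate version replaces step 3's Gamma draw with a Wishart draw for Ωu^-1.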

19
Bayesian RSR Model for education dataset
Note that the IGLS estimates are used in the
prior. The variance (posterior mean) estimates
are bigger than the IGLS estimates.
20
DIC for RSR model
As with the frequentist approach, the random
slopes model is an improvement over the random
intercepts model. The additional 65 random
parameters add only 32 effective parameters.
21
Trajectories for the RSR model
22
MCMC Diagnostics for Ωu00
23
Predictions for the RSR model with highlighted
data
  • Here the top and bottom schools are highlighted

24
Residuals for the RSR
  • Individually
  • and pairwise

25
Uniform Priors
  • Here the level 2 variance estimates increase, as
    in Browne and Draper (2000).
  • Browne and Draper found that the Wishart priors
    were preferable, although the use of the IGLS
    estimate is not strictly Bayesian, as we are
    using the data twice!

26
Other predictors in the education dataset
  • This dataset has other predictors, such as gender
    and school gender, that can be considered in the
    practical.
  • In the next slide we see the Equations window for
    a model with these added, which has DIC 9189.26,
    a reduction of over 25 on the earlier RSR model.

27
RSR gender effects
28
WinBUGS RSR gender
  model {
    # Level 1 definition
    for (i in 1:N) {
      normexam[i] ~ dnorm(mu[i], tau)
      mu[i] <- beta[1] * cons[i] +
               beta[2] * standlrt[i] +
               beta[3] * girl[i] +
               beta[4] * boysch[i] +
               beta[5] * girlsch[i] +
               u2[school[i], 1] * cons[i] +
               u2[school[i], 2] * standlrt[i]
    }
    # Higher level definitions
    for (j in 1:n2) {
      u2[j, 1:2] ~ dmnorm(zero2[1:2], tau.u2[1:2, 1:2])
    }
    # Priors for fixed effects
    for (k in 1:5) { beta[k] ~ dflat() }
    # (priors for tau and tau.u2 follow on the slide)
  }

Here we see the WinBUGS code for our last
model. Notice how MVN and Wishart distributions
are specified in WinBUGS.
29
DIC in WinBUGS
  • In WinBUGS DIC is available from the Inference
    menu.
  • DIC monitoring is set after the burn-in, and the
    DIC button is pressed after running, giving:
  • Dbar = post.mean of -2logL; Dhat = -2logL at
    post.mean of stochastic nodes
  •          Dbar      Dhat      pD      DIC
  • Normexam 9098.590  9007.960  90.631  9189.220
  • total    9098.590  9007.960  90.631  9189.220
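The pieces of this output can be checked directly, since pD = Dbar - Dhat and DIC = Dbar + pD:

```python
# Check of the WinBUGS output above: pD = Dbar - Dhat, DIC = Dbar + pD.
dbar, dhat = 9098.590, 9007.960
pd = dbar - dhat           # WinBUGS reports 90.631 (rounding)
dic = dbar + pd            # equivalently 2*dbar - dhat
print(round(pd, 2), round(dic, 2))  # prints 90.63 9189.22
```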

30
Parameter estimates in WinBUGS
  • Note that here WinBUGS gives similar estimates
    to MLwiN for the model. Note that for the fixed
    effects β, WinBUGS indexes from 1 while MLwiN
    indexes from 0.
  • node            mean      sd        (2.5%, 97.5%)
  • beta[1]        -0.19      0.05236   (-0.2996, -0.08937)
  • beta[2]         0.5539    0.01971   (0.5145, 0.5911)
  • beta[3]         0.1687    0.03399   (0.1019, 0.2349)
  • beta[4]         0.1775    0.1008    (-0.02581, 0.3781)
  • beta[5]         0.175     0.08212   (0.004677, 0.3318)
  • sigma2          0.5511    0.0125    (0.5272, 0.576)
  • sigma2.u2[1,1]  0.08777   0.01885   (0.05745, 0.1305)
  • sigma2.u2[1,2]  0.02141   0.00719   (0.009262, 0.0372)
  • sigma2.u2[2,2]  0.01545   0.00460   (0.008271, 0.02603)

31
Next Practical
  • The next practical is free-ranging.
  • You can follow the MLwiN chapter on RSR models
    that is provided.
  • You can try out RSR models in WinBUGS.
  • You can try out fitting random effect models to
    the orthodont dataset using MCMC.
  • You can try out DIC on other models.