Introduction to Monte Carlo Markov chain MCMC methods - PowerPoint PPT Presentation




About This Presentation



Lecture 3. Introduction to Monte Carlo Markov chain (MCMC) methods. Lecture Contents ... can be assessed by the Monte Carlo standard error (MCSE) for each parameter. ...

Number of Views:1007
Avg rating:3.0/5.0
Slides: 28
Provided by: mat135


Transcript and Presenter's Notes

Title: Introduction to Monte Carlo Markov chain MCMC methods

Lecture 3
  • Introduction to Markov chain Monte Carlo (MCMC) methods

Lecture Contents
  • How does WinBUGS fit a regression?
  • Gibbs Sampling
  • Convergence and burn-in
  • How many iterations?
  • Logistic regression example

Linear regression
  • Let us revisit our simple linear regression model
  • To this model we added prior distributions in the
    last lecture
  • Ideally we would sample from the joint posterior
    distribution

Linear Regression ctd.
  • In this case we can sample from the joint
    posterior as described in the last lecture
  • However this is not the case for all models and
    so we will now describe other simulation-based
    methods that can be used.
  • These methods come from a family of methods
    called Markov chain Monte Carlo (MCMC) methods
    and here we will focus on a method called Gibbs
    sampling.

MCMC Methods
  • Goal: To sample from the joint posterior
    distribution.
  • Problem: For complex models this involves
    multidimensional integration.
  • Solution: It may be possible instead to sample
    from the conditional posterior distributions.
  • It can be shown that after convergence such a
    sampling approach generates dependent samples
    from the joint posterior distribution.

Gibbs Sampling
  • When we can sample directly from the conditional
    posterior distributions then such an algorithm is
    known as Gibbs Sampling.
  • This proceeds as follows for the linear
    regression example
  • First, give all unknown parameters starting
    values.
  • Next, loop through the following steps:

Gibbs Sampling ctd.
  • Sample each parameter in turn from its
    conditional posterior distribution given the
    current values of all the others: first β0, then
    β1, then 1/σ².

These steps are then repeated, with the
generated values from each loop replacing the
starting values. The chain of values produced by
this procedure is known as a Markov chain, and
it is hoped that this chain converges to its
equilibrium distribution, which is the joint
posterior distribution.
Calculating the conditional distributions
  • In order for the algorithm to work we need to
    sample from the conditional posterior
    distributions.
  • If these distributions have standard forms then
    it is easy to draw random samples from them.
  • Mathematically we write down the full posterior
    and assume all parameters are constants apart
    from the parameter of interest.
  • We then try to match the resulting formulae to a
    standard distribution.
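
As an illustration of this recipe, take the simple regression model y_i = β0 + β1 x_i + e_i with e_i ~ N(0, σ²) and, purely for this sketch, a flat prior on β0 (the lecture's actual priors may differ). Treating β1 and σ² as constants:

```latex
p(\beta_0 \mid \beta_1, \sigma^2, y)
  \propto \prod_{i=1}^{n} \exp\!\left(-\frac{(y_i-\beta_0-\beta_1 x_i)^2}{2\sigma^2}\right)
  \propto \exp\!\left(-\frac{n(\beta_0-m)^2}{2\sigma^2}\right),
\qquad m=\frac{1}{n}\sum_{i=1}^{n}(y_i-\beta_1 x_i),
```

so the kernel matches a Normal(m, σ²/n) distribution, which we can sample from directly.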

Matching distributional forms
  • If a parameter θ follows a Normal(µ, σ²)
    distribution then we can write
    p(θ) ∝ exp(−(θ − µ)² / (2σ²))
  • Similarly if θ follows a Gamma(α, β) distribution
    then we can write
    p(θ) ∝ θ^(α−1) exp(−βθ)

Step 1: β0
Step 2: β1
Step 3: 1/σ²
Algorithm Summary
  • Repeat the following three steps
  • 1. Generate β0 from its Normal conditional
  • 2. Generate β1 from its Normal conditional
  • 3. Generate 1/σ² from its Gamma conditional
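
The three-step loop above can be sketched in code. This is a minimal NumPy implementation assuming flat priors on β0 and β1 and a diffuse Gamma(0.001, 0.001) prior on 1/σ²; the lecture's exact priors, and WinBUGS's internal algorithm, may differ.

```python
import numpy as np

def gibbs_linear_regression(x, y, n_iter=5000, seed=0):
    """Gibbs sampler for y_i = b0 + b1*x_i + e_i, e_i ~ N(0, sigma^2).

    Assumes (hypothetically) flat priors on b0, b1 and a diffuse
    Gamma(0.001, 0.001) prior on the precision tau = 1/sigma^2, so all
    three full conditionals have standard (Normal/Gamma) forms.
    Returns an (n_iter, 3) array of draws of (b0, b1, tau).
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    a_eps = b_eps = 0.001            # diffuse Gamma hyperparameters
    b0, b1, tau = 0.0, 0.0, 1.0     # arbitrary starting values
    samples = np.empty((n_iter, 3))
    for t in range(n_iter):
        # Step 1: b0 | b1, tau, y ~ Normal(mean of partial residuals, 1/(n*tau))
        b0 = rng.normal(np.mean(y - b1 * x), 1.0 / np.sqrt(n * tau))
        # Step 2: b1 | b0, tau, y ~ Normal(LS slope of residuals, 1/(Sxx*tau))
        sxx = np.sum(x * x)
        b1 = rng.normal(np.sum(x * (y - b0)) / sxx, 1.0 / np.sqrt(sxx * tau))
        # Step 3: tau | b0, b1, y ~ Gamma(a + n/2, rate = b + SS/2)
        resid = y - b0 - b1 * x
        tau = rng.gamma(a_eps + n / 2, 1.0 / (b_eps + 0.5 * np.sum(resid**2)))
        samples[t] = (b0, b1, tau)
    return samples
```

Note that `numpy.random.Generator.gamma` takes a scale parameter, so the Gamma rate is inverted before sampling.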

Convergence and burn-in
  • Two questions immediately spring to mind:
  • We start from arbitrary starting values so when
    can we safely say that our samples are from the
    correct distribution?
  • After this point how long should we run the chain
    for and store values?

Checking Convergence
  • This is the researcher's responsibility!
  • Convergence is to a target distribution (the
    required posterior), not to a single value as in
    ML methods.
  • Once convergence has been reached, samples should
    look like a random scatter about a stable mean.

  • In the accompanying trace plot, convergence
    occurs at around 100 iterations.

Checking convergence 2
  • One approach (in WinBUGS) is to run many long
    chains with widely differing starting values.
  • WinBUGS also has the Brooks-Gelman-Rubin
    diagnostic, which is based on the ratio of
    between-chain to within-chain variance (as in an
    ANOVA). This diagnostic should converge to 1.0
    at convergence.
  • MLwiN has other diagnostics that we will cover
    later.
Demo of multiple chains in WinBUGS
  • Here we transfer to the computer for a
    demonstration with the regression example of
    multiple chains (also mention node info)

Demo of multiple chains in WinBUGS
  • The average 80% interval within chains (blue) and
    the pooled 80% interval across chains (green)
    converge to stable values.
  • The ratio of pooled to average interval width
    (red) converges to 1.

Convergence in more complex models
  • Convergence in linear regression is (almost)
    immediate.
  • Here is an example of slower convergence:

How many iterations after convergence?
  • After convergence, further iterations are needed
    to obtain samples for posterior inference.
  • More iterations give more accurate posterior
    estimates.
  • MCMC chains are dependent samples and so the
    dependence or autocorrelation in the chain will
    influence how many iterations we need.
  • Accuracy of the posterior estimates can be
    assessed by the Monte Carlo standard error (MCSE)
    for each parameter.
  • Methods for calculating the MCSE are given in
    later lectures.
Inference using posterior samples from MCMC runs
  • A powerful feature of MCMC and the Bayesian
    approach is that all inference is based on the
    joint posterior distribution.
  • We can therefore address a wide range of
    substantive questions by appropriate summaries of
    the posterior.
  • Typically we report either the mean or median of
    the posterior samples for each parameter of
    interest as a point estimate.
  • The 2.5% and 97.5% percentiles of the posterior
    sample for each parameter give a 95% posterior
    credible interval (an interval within which the
    parameter lies with probability 0.95).
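
These summaries are simple operations on the vector of posterior draws, for example:

```python
import numpy as np

def posterior_summary(samples):
    """Point estimates and 95% credible interval from posterior draws."""
    samples = np.asarray(samples, dtype=float)
    lo, hi = np.percentile(samples, [2.5, 97.5])   # 95% credible interval
    return {"mean": samples.mean(),
            "median": np.median(samples),
            "95% CI": (lo, hi)}
```

Unlike a frequentist confidence interval, this interval has the direct interpretation stated on the slide: given the model, the parameter lies inside it with probability 0.95.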

Derived Quantities
  • Once we have a sample from the posterior we can
    answer lots of questions simply by investigating
    this sample.
  • Examples:
  • What is the probability that θ > 0?
  • What is the probability that θ1 > θ2?
  • What is a 95% interval for θ1/(θ1 + θ2)?
  • See later for examples of these sorts of derived
    quantities.
Logistic regression example
  • In the practical that follows we will look at the
    following dataset of rat tumours and fit a
    logistic regression model to it

Logistic regression model
  • A standard Bayesian logistic regression model for
    these data can be written as follows
  • WinBUGS can fit this model but can we write out
    the conditional posterior distributions and use
    Gibbs Sampling?

Conditional distribution for ß0
This distribution is not a standard distribution,
and so we cannot simply simulate from it with a
standard random number generator. However, both
WinBUGS and MLwiN can fit this model using
MCMC. We will not see how until day 5.
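
To see why the conditional is nonstandard, write down the full conditional as before. A sketch, assuming y_i ~ Binomial(n_i, p_i) with logit(p_i) = β0 + β1 x_i and some prior p(β0); the practical's exact linear predictor may differ:

```latex
p(\beta_0 \mid \beta_1, y) \;\propto\; p(\beta_0)\,
\prod_{i=1}^{N} \frac{\exp\{y_i(\beta_0+\beta_1 x_i)\}}
                     {\bigl(1+\exp(\beta_0+\beta_1 x_i)\bigr)^{n_i}}
```

No standard distributional family has this kernel as a function of β0, so there is nothing to match and direct Gibbs sampling is not available.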
Hints for the next practical
  • In the next practical you will be creating
    WinBUGS code for a logistic regression model.
  • In this practical you get less help, so I would
    suggest looking at the Seeds example in the
    WinBUGS examples. The Seeds example is more
    complicated than what you require, but it will be
    helpful for showing the necessary WinBUGS
    statements.