Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review, by Mary Kathryn Cowles and Bradley P. Carlin. Presented by Yuting Qi, 12/01/2006.

Markov Chain Monte Carlo Convergence Diagnostics
A Comparative Review
  • By Mary Kathryn Cowles and Bradley P. Carlin
  • Presented by Yuting Qi
  • 12/01/2006

  • MCMC Convergence Diagnostics
  • Introduce four methods in detail
  • Focus on
  • Prescriptive summary
  • Underlying theoretical basis
  • Advantages and disadvantages
  • Comparative results

1. Gelman and Rubin (1992) 1/4
  • What ?
  • Based on a normal-theory approximation to exact
    Bayesian posterior inference
  • Focuses on applied inference for Bayesian posterior
    distributions in real problems, which often tend
    toward normality after transformations and
    marginalization
  • Two major steps
  • Create an overdispersed estimate of the target
    distribution and use it to start several
    independent sequences.
  • Analyze the multiple sequences to form a
    distributional estimate of what is known about
    the target random variable given the simulations
    so far. The distributional estimate is a Student's t
    distribution for each scalar quantity of interest.
  • Convergence
  • Convergence is monitored by estimating the factor
    by which the scale parameter of that t distribution
    might shrink if sampling were continued indefinitely.

1. Gelman and Rubin (1992) 2/4
  • How ?
  • Step 1: Creating a starting distribution
  • Locate the high-density regions of the target
    distribution of x and find its K modes.
  • Approximate the high-density regions by a Gaussian
    mixture model (GMM).
  • Form an overdispersed distribution by first
    drawing from the GMM and then dividing each
    draw by a positive random scale factor, which
    yields a mixture of t distributions.
  • Sharpen the overdispersed approximation by
    downweighting regions that have relatively low
    density, for example through importance resampling.
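
The "draw from the GMM, then divide" step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the choice of 4 degrees of freedom, and the use of sqrt(chi-square/df) as the positive divisor applied to the deviation from each mode are assumptions made here. The importance-resampling step is not shown.

```python
import numpy as np

def draw_overdispersed_starts(modes, covs, weights, n_draws, df=4, rng=None):
    """Draw starting points from a mixture of multivariate t distributions.

    Each draw picks a mixture component, samples a Gaussian deviation around
    that component's mode, and divides the deviation by sqrt(chi2_df / df),
    which gives the component heavier (t-like) tails than the Gaussian.
    """
    rng = np.random.default_rng(rng)
    modes = np.asarray(modes, dtype=float)      # shape (K, d)
    covs = np.asarray(covs, dtype=float)        # shape (K, d, d)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()           # normalize mixture weights

    components = rng.choice(len(weights), size=n_draws, p=weights)
    d = modes.shape[1]
    starts = np.empty((n_draws, d))
    for i, k in enumerate(components):
        z = rng.multivariate_normal(np.zeros(d), covs[k])
        divisor = np.sqrt(rng.chisquare(df) / df)   # positive random scalar
        starts[i] = modes[k] + z / divisor
    return starts

# Hypothetical two-mode target in two dimensions.
starts = draw_overdispersed_starts(
    modes=[[0.0, 0.0], [5.0, 5.0]],
    covs=[np.eye(2), np.eye(2)],
    weights=[0.5, 0.5],
    n_draws=200,
    rng=0,
)
```

Each row of `starts` could then seed one of the independent sequences used in Step 2.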

1. Gelman and Rubin (1992) 3/4
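  • Step 2: Re-estimating the target distribution
  • Independently simulate m sequences of length 2n,
    each started from the overdispersed distribution,
    and discard the first n iterations.
  • For each scalar parameter of interest x, estimate
    the following quantities from the last n iterations
    of the m sequences
  • $B/n$: the variance between the m sequence means,
    $B/n = \frac{1}{m-1}\sum_{i=1}^{m}(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2$
  • $W$: the average of the m within-sequence variances
  • Estimate of the target mean: $\hat{\mu} = \bar{x}_{\cdot\cdot}$, the mean of the mn samples
  • Estimate of the target variance (unbiased under stationarity):
    $\hat{\sigma}^2 = \frac{n-1}{n} W + \frac{1}{n} B$
  • Estimate the posterior of the target quantity as
    a t distribution (considering the variability of the
    estimates $\hat{\mu}$ and $\hat{\sigma}^2$) with center $\hat{\mu}$ and
    scale $\sqrt{\hat{V}}$, where $\hat{V} = \hat{\sigma}^2 + B/(mn)$.
  • Monitor convergence by the shrink factor
    $\sqrt{\hat{R}} = \sqrt{(\hat{V}/W)\,\mathrm{df}/(\mathrm{df}-2)}$,
    where df is the degrees of freedom of the t distribution;
    once it is near 1 for all scalars, collect the
    post-burn-in samples.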
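
The quantities in Step 2 can be computed in a few lines. The sketch below assumes the retained draws for one scalar are stored as an (m, n) array, and it omits the df/(df-2) correction, so it returns sqrt(V/W) rather than the fully corrected shrink factor. The function name is made up for illustration.

```python
import numpy as np

def shrink_factor(chains):
    """Approximate Gelman-Rubin shrink factor for one scalar quantity.

    chains: array of shape (m, n) holding the last n iterations of m sequences.
    Returns sqrt(V_hat / W); the df/(df - 2) correction is omitted.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    seq_means = chains.mean(axis=1)
    B_over_n = seq_means.var(ddof=1)             # variance between sequence means
    W = chains.var(axis=1, ddof=1).mean()        # average within-sequence variance
    sigma2_hat = (n - 1) / n * W + B_over_n      # estimate of the target variance
    V_hat = sigma2_hat + B_over_n / m            # adds uncertainty about the mean
    return np.sqrt(V_hat / W)

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                        # sequences that agree
stuck = mixed + np.array([[0.0], [3.0], [6.0], [9.0]])    # sequences stuck apart
print(shrink_factor(mixed), shrink_factor(stuck))
```

Sequences that agree give a value close to 1, while sequences that sit in different regions give a value well above 1.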

1. Gelman and Rubin (1992) 4/4
  • Comments
  • As the shrink factor approaches 1, the within-sequence
    variance dominates the between-sequence variance:
    all sequences have escaped the influence of their
    starting points and traversed the whole target distribution.
  • Quantitative.
  • Criticisms
  • Relies on the user's ability to find a starting
    distribution
  • Relies on the normal approximation for diagnosing
    convergence to the true posterior.
  • Inefficient: requires multiple sequences and discards a
    large number of early iterations.

2. Geweke (1992) 1/3
  • What ?
  • Uses methods from spectral analysis to assess
    convergence when the intent is to estimate the
    mean $E[g(\theta)]$ of some function $g(\theta)$ of interest.
  • Collect $G(j) = g(\theta^{(j)})$ after each iteration j
  • Treat $\{G(j)\}_{j=1}^{p}$ as a time series and compute
    its spectral density $S_G(\omega)$.
  • Use the numerical standard error (NSE) and relative
    numerical efficiency (RNE) to monitor convergence
  • Assumption
  • The MCMC process and the function $g(\theta)$
    jointly imply the existence of a spectrum, and
    the existence of a spectral density with no
    discontinuities at frequency 0.

2. Geweke (1992) 2/3
  • How ?
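  • Estimate $E[g(\theta)]$ from p iterations
  • Estimator: $\bar{g}_p = \frac{1}{p}\sum_{j=1}^{p} G(j)$
  • Asymptotic variance: $S_G(0)/p$
  • Determine preliminary iterations
  • Given the sequence $\{G(j)\}_{j=1}^{p}$, compare the mean
    $\bar{g}_p^A$ of an early segment of $p_A$ draws with the mean
    $\bar{g}_p^B$ of a late segment of $p_B$ draws; if $G(j)$ is
    stationary, as $p \to \infty$,
    $(\bar{g}_p^A - \bar{g}_p^B) / \sqrt{\hat{S}_G^A(0)/p_A + \hat{S}_G^B(0)/p_B} \to N(0, 1)$
  • Determine sufficient iterations
  • Numerical standard error: $\mathrm{NSE} = \sqrt{\hat{S}_G(0)/p}$
  • Relative numerical efficiency:
    $\mathrm{RNE} = \widehat{\mathrm{var}}(g(\theta)) / \hat{S}_G(0)$,
    indicating the number of draws that would be required
    to produce the same numerical accuracy if the
    draws had been made from an iid sample drawn
    directly from the posterior distribution.
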
2. Geweke (1992) 3/3
  • Comments
  • Addresses the issues of both bias and variance.
  • Is univariate.
  • Requires only a single sampler chain.
  • Disadvantages
  • Is sensitive to the choice of spectral window.
  • Does not specify a procedure for applying the
    diagnostic; this is left to the subjective choice of
    the user.
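
A rough sketch of the NSE, the RNE and the early-versus-late comparison for a single chain is given below. Several choices here are assumptions rather than part of the slides: the spectral density at frequency 0 is estimated with a Bartlett (triangular) lag window, the truncation lag defaults to sqrt(p), and the early and late segments are the first 10% and last 50% of the draws. The sensitivity to the spectral window noted above applies directly to these choices.

```python
import numpy as np

def spectral_density_at_zero(g, max_lag=None):
    """Bartlett-window estimate of S_G(0), the long-run variance of the series."""
    g = np.asarray(g, dtype=float)
    p = g.size
    if max_lag is None:
        max_lag = int(np.sqrt(p))               # assumed truncation rule
    centered = g - g.mean()
    s0 = centered @ centered / p                # lag-0 autocovariance
    for k in range(1, max_lag + 1):
        gamma_k = centered[k:] @ centered[:-k] / p
        s0 += 2.0 * (1.0 - k / (max_lag + 1)) * gamma_k
    return s0

def nse(g):
    """Numerical standard error of the sample mean of g."""
    return np.sqrt(spectral_density_at_zero(g) / len(g))

def rne(g):
    """Relative numerical efficiency: iid variance over long-run variance."""
    return np.var(g) / spectral_density_at_zero(g)

def geweke_z(g, first=0.1, last=0.5):
    """Standardized difference between early-segment and late-segment means."""
    g = np.asarray(g, dtype=float)
    a = g[: int(first * g.size)]
    b = g[g.size - int(last * g.size):]
    return (a.mean() - b.mean()) / np.sqrt(nse(a) ** 2 + nse(b) ** 2)

rng = np.random.default_rng(1)
iid = rng.normal(size=5000)                     # independent draws
eps = rng.normal(size=5000)
ar = np.zeros(5000)                             # strongly autocorrelated chain
for t in range(1, 5000):
    ar[t] = 0.9 * ar[t - 1] + eps[t]
trend = iid + np.linspace(0.0, 5.0, 5000)       # chain that is still drifting
print(rne(iid), rne(ar), geweke_z(trend))
```

An RNE well below 1 signals that many more draws are needed than under iid sampling, and a large absolute z value signals that the early draws should be treated as preliminary iterations.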

3. Ritter and Tanner (1992) 1/3
  • The Gibbs Stopper
  • Converts the output of the Gibbs sampler to a
    sample from the exact distribution.
  • Assign a weight $w = q(X)/g_i(X)$ to the d-dimensional vector X
    drawn at the current iteration i
  • q is a function proportional to the joint posterior density
  • $g_i$ is the current Gibbs sampler approximation to the joint density.
  • Assess the convergence
  • If the current approximation to the joint
    distribution is close to the true one, then the
    distribution of the weights will be degenerate
    about a constant.

3. Ritter and Tanner (1992) 2/3
  • Compute $g_i$
  • Let $K(X', X)$ be the probability of moving from X' (at
    iteration i) to X at iteration i+1.
  • The joint distribution of the samples obtained at
    iteration i+1 is
  • $g_{i+1}(X) = \int K(X', X)\, g_i(X')\, dX'$
  • The integral can be approximated by the Monte
    Carlo method
  • $g_{i+1}(X) \approx \frac{1}{m}\sum_{j=1}^{m} K(X_j, X)$
  • $X_1, \ldots, X_m$ are samples drawn at
    iteration i.

3. Ritter and Tanner (1992) 3/3
  • Comments
  • Assess distributional convergence
  • Disadvantages
  • Applicable only with the Gibbs sampler
  • Coding is problem-specific
  • Computation of weights can be time-intensive
  • If the full conditionals are not standard
    distributions, their normalizing constants must be estimated
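
To make the weights concrete, the sketch below applies the idea to a toy target chosen for this illustration: a standard bivariate normal with correlation rho, whose full conditionals are normal. It runs m parallel Gibbs chains from deliberately overdispersed starts, approximates the density of the next iterate by averaging the transition kernel over the current states, and tracks the coefficient of variation of the weights q/g. The target, the function names and all tuning values are assumptions.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def gibbs_stopper_cv(rho=0.9, m=500, n_iter=30, start_scale=10.0, seed=0):
    """Coefficient of variation of the weights w = q / g at each iteration."""
    rng = np.random.default_rng(seed)
    var = 1.0 - rho ** 2                        # conditional variance
    X = rng.normal(scale=start_scale, size=(m, 2))
    cv = []
    for _ in range(n_iter):
        prev = X
        # One Gibbs sweep for every chain: x1 | x2, then x2 | x1.
        x1 = rho * prev[:, 1] + np.sqrt(var) * rng.normal(size=m)
        x2 = rho * x1 + np.sqrt(var) * rng.normal(size=m)
        X = np.column_stack([x1, x2])
        # K(X_j, X_k) = p(x1_k | x2_j) * p(x2_k | x1_k); average over j.
        move_x1 = normal_pdf(x1[:, None], rho * prev[None, :, 1], var)
        g = move_x1.mean(axis=1) * normal_pdf(x2, rho * x1, var)
        # q is proportional to the joint target density.
        q = np.exp(-0.5 * (x1 ** 2 - 2.0 * rho * x1 * x2 + x2 ** 2) / var)
        w = q / g
        cv.append(w.std() / w.mean())
    return np.array(cv)

cv = gibbs_stopper_cv()
print(cv[0], cv[-1])
```

Early on the weights vary widely because the chains are still spread far beyond the target; as the sampler settles, the weights concentrate around a constant and their coefficient of variation drops.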

4. Zellner and Min (1995) 1/3
  • Gibbs Sampler Convergence Criteria (GSC2)
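  • Aims to determine whether the Gibbs sampler not
    only has converged, but also has converged to a
    correct result.
  • Divide the model parameters into two parts, $\alpha$ and $\beta$
  • Derive analytical forms for the joint posterior (up to a
    normalizing constant) and for the conditional posteriors
    $p(\alpha \mid \beta, y)$ and $p(\beta \mid \alpha, y)$
  • Three convergence criteria
  • Assume $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$ are two points
    in the parameter space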

4. Zellner and Min (1995) 2/3
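  • 1. The anchored ratio convergence criterion (ARC2)
  • Calculate the ratio of the joint posterior at the two
    points analytically, and estimate the same ratio from
    the Gibbs output through each factorization of the joint.
  • If the Gibbs sampler output is satisfactory,
    the two estimates will be close to the analytical value.
  • 2. The difference convergence criterion (DC2)
  • Since $p(\alpha \mid y)\,p(\beta \mid \alpha, y) = p(\beta \mid y)\,p(\alpha \mid \beta, y)$,
    the Gibbs-based estimates of the two sides should agree.
  • If their difference $\to 0$, then satisfactory
  • 3. The ratio convergence criterion (RC2)
  • If the estimated ratio $\to 1$, then satisfactory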

4. Zellner and Min (1995) 3/3
  • Comments
  • Quantitative
  • Requires only a single sampler chain
  • Coding is problem-specific and analytical work is
    required
  • Disadvantage
  • Application is limited when the factorization
    cannot be achieved.
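
The difference criterion can be illustrated on a toy problem in which both conditionals are known in closed form. The sketch below writes the two parameter blocks as alpha and beta (names assumed here), takes a standard bivariate normal with correlation rho as a stand-in posterior, and estimates each marginal by averaging the conditional density over the Gibbs draws. The function names and the evaluation point are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def gibbs_bivariate_normal(rho, n_iter, start, seed=0):
    """Single-chain Gibbs sampler for a standard bivariate normal."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho ** 2)
    alpha, beta = start
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        alpha = rho * beta + sd * rng.normal()
        beta = rho * alpha + sd * rng.normal()
        draws[t] = alpha, beta
    return draws

def factorization_estimates(draws, point, rho):
    """Estimate p(alpha)p(beta|alpha) and p(beta)p(alpha|beta) at one point."""
    var = 1.0 - rho ** 2
    a1, b1 = point
    p_hat_alpha = normal_pdf(a1, rho * draws[:, 1], var).mean()  # marginal of alpha
    p_hat_beta = normal_pdf(b1, rho * draws[:, 0], var).mean()   # marginal of beta
    lhs = p_hat_alpha * normal_pdf(b1, rho * a1, var)
    rhs = p_hat_beta * normal_pdf(a1, rho * b1, var)
    return lhs, rhs

rho = 0.5
draws = gibbs_bivariate_normal(rho, 5000, start=(0.0, 0.0))[500:]
lhs, rhs = factorization_estimates(draws, point=(0.3, -0.2), rho=rho)
print(lhs - rhs)    # difference criterion: should be close to 0
```

Because the joint density is known here, `lhs` and `rhs` can also be compared with its exact value at the chosen point, which is the spirit of checking convergence to a correct result.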

Comparative results 1/5
  • Trivariate Normal with high correlations
  • Run the samplers for relatively few iterations to
    test whether these methods detect convergence failure
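
A test bed of this kind can be sketched as below. The covariance matrix is a hypothetical equicorrelated one, not the matrix used in the review, and the function names are made up; the point is only to show how slowly a Gibbs sampler moves when the three components are highly correlated.

```python
import numpy as np

def equicorrelated(rho, d=3):
    """Covariance matrix with unit variances and common correlation rho."""
    return (1.0 - rho) * np.eye(d) + rho * np.ones((d, d))

def gibbs_mvn(cov, n_iter, start, seed=0):
    """Component-wise Gibbs sampler for a zero-mean multivariate normal."""
    rng = np.random.default_rng(seed)
    Q = np.linalg.inv(cov)                      # precision matrix
    x = np.array(start, dtype=float)
    draws = np.empty((n_iter, x.size))
    for t in range(n_iter):
        for i in range(x.size):
            # x_i | rest is normal with mean -sum_{j != i} Q_ij x_j / Q_ii
            # and variance 1 / Q_ii.
            others = Q[i] @ x - Q[i, i] * x[i]
            x[i] = -others / Q[i, i] + rng.normal() / np.sqrt(Q[i, i])
        draws[t] = x
    return draws

# Short run with high correlations, started far from the mode.
short_run = gibbs_mvn(equicorrelated(0.99), 50, [10.0, 10.0, 10.0], seed=1)
# Longer run with moderate correlations for comparison.
long_run = gibbs_mvn(equicorrelated(0.5), 20000, [0.0, 0.0, 0.0], seed=1)
print(short_run.mean(axis=0), long_run.mean(axis=0))
```

After 50 sweeps the highly correlated chain is still far from the true mean of zero, which is the kind of convergence failure a diagnostic should flag.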

Comparative results 2/5
  • 1. Gelman and Rubin: shrink factors (→ 1)
  • 2. Geweke: NSE (→ 0)

Comparative results 3/5
  • 3. Ritter and Tanner: Gibbs Stopper (weights w → constant)

Comparative results 4/5
  • 4. Zellner and Min: difference convergence criterion
    (→ 0)

Comparative results 5/5
  • Remarks
  • Geweke's diagnostic appears to be premature
  • Gelman and Rubin's method may be consistent with
    the facts; however, it depends on the choice of starting points
  • The results of the other methods are difficult to
    interpret
Summary, Discussion, and Recommendation
  • Be cautious when using these diagnostics
  • Use a variety of diagnostic tools rather than any
    single one
  • Learn as much as possible about the target
    density before applying an MCMC algorithm