# Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review - PowerPoint PPT Presentation

### Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review, by Mary Kathryn Cowles and Bradley P. Carlin. Presented by Yuting Qi, 12/01/2006.

1
Markov Chain Monte Carlo Convergence Diagnostics
A Comparative Review
• By Mary Kathryn Cowles and Bradley P. Carlin
• Presented by Yuting Qi
• 12/01/2006

2
OUTLINE
• MCMC Convergence Diagnostics
• Introduce four methods in detail
• Focus on
• Prescriptive summary
• Underlying theoretical basis
• Comparative results

3
1. Gelman and Rubin (1992) 1/4
• What?
• Based on a normal-theory approximation to exact
Bayesian posterior inference.
• Focuses on applied inference for Bayesian posterior
distributions in real problems, which often tend
toward normality after transformation and
marginalization.
• Two major steps
• Create an overdispersed estimate of the target
distribution and use it to start several
independent sequences.
• Analyze the multiple sequences to form a
distributional estimate of what is known about
the target random variable given the simulations so far.
The distributional estimate is a Student's t
distribution for each scalar quantity of interest.
• Convergence
• Convergence is monitored by estimating the factor
by which the scale parameter of that t distribution
might shrink if sampling were continued indefinitely.

4
1. Gelman and Rubin (1992) 2/4
• How?
• Step 1: Create a starting distribution
• Locate the high-density regions of the target
distribution of x and find its K modes.
• Approximate the high-density regions by a
Gaussian mixture model (GMM).
• Form an overdispersed distribution by first
drawing from the GMM and then dividing each
sample by a positive random number, which results
in a mixture of t distributions.
• Sharpen the overdispersed approximation by
downweighting regions that have relatively low
density, for example through importance resampling.
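The drawing step of Step 1 can be sketched in one dimension as follows. This is a minimal illustration, not Gelman and Rubin's procedure in full: the mode locations, scales, weights, and degrees of freedom `df` are hypothetical inputs, the positive divisor is assumed to be a scaled chi-square variate applied to the deviation from the chosen mode (which turns each normal component into a t component), and the importance-resampling refinement is omitted.

```python
import numpy as np

def draw_overdispersed_starts(modes, scales, weights, m, df=4, seed=0):
    """Draw m starting points from a mixture of t distributions.

    modes, scales, weights describe a (hypothetical) Gaussian mixture
    fitted to the high-density regions of the target.
    """
    rng = np.random.default_rng(seed)
    modes = np.asarray(modes, dtype=float)
    scales = np.asarray(scales, dtype=float)
    # Pick a mixture component for every starting point.
    comp = rng.choice(len(weights), size=m, p=weights)
    # Normal deviation around the chosen mode (a draw from the GMM).
    z = rng.normal(size=m) * scales[comp]
    # Positive random divisor: sqrt(chi2_df / df) fattens the tails.
    divisor = np.sqrt(rng.chisquare(df, size=m) / df)
    return modes[comp] + z / divisor

# Ten starting points for ten independent sequences.
starts = draw_overdispersed_starts([-2.0, 3.0], [1.0, 0.5], [0.5, 0.5], m=10)
```

Each returned value would seed one of the independent sequences used in Step 2.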

5
1. Gelman and Rubin (1992) 3/4
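• Step 2: Re-estimate the target distribution
• Independently simulate m sequences of length 2n, with starting points drawn from the overdispersed distribution, and discard the first n iterations of each.
• For each scalar parameter of interest x, estimate the following quantities from the last n iterations of the m sequences, where $\bar{x}_{i\cdot}$ and $s_i^2$ are the mean and variance of sequence i and $\bar{x}_{\cdot\cdot}$ is the mean of all mn samples
• B/n: the variance between the means of the m sequences, $B/n = \frac{1}{m-1}\sum_{i=1}^{m}(\bar{x}_{i\cdot}-\bar{x}_{\cdot\cdot})^2$
• W: the average of the m within-sequence variances, $W = \frac{1}{m}\sum_{i=1}^{m} s_i^2$
• Estimate of target mean: $\hat{\mu} = \bar{x}_{\cdot\cdot}$, the mean of the mn samples
• Estimate of target variance (unbiased): $\hat{\sigma}^2 = \frac{n-1}{n}W + \frac{B}{n}$
• Estimate the posterior of the target distribution as a t distribution (accounting for the sampling variability of the estimates $\hat{\mu}$ and $\hat{\sigma}^2$) with center $\hat{\mu}$ and scale $\sqrt{\hat{V}}$, where $\hat{V} = \hat{\sigma}^2 + \frac{B}{mn}$.
• Monitor convergence by the shrink factor $\sqrt{\hat{R}} = \sqrt{\frac{\hat{V}}{W}\cdot\frac{df}{df-2}}$, where df is the degrees of freedom of the t distribution. Once it is near 1 for all scalar quantities of interest, collect the samples that follow the burn-in.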
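The shrink factor of Step 2 can be computed from the retained halves of the sequences as sketched below. The sketch assumes the standard Gelman and Rubin quantities (between-sequence variance B, within-sequence variance W, pooled estimate V) and, for brevity, omits the degrees-of-freedom correction factor df/(df - 2), so it slightly understates the published statistic. The function name and array layout are illustrative choices.

```python
import numpy as np

def shrink_factor(chains):
    """Gelman-Rubin shrink factor for one scalar quantity.

    chains: array of shape (m, n) holding the last n iterations of each
    of the m sequences. The df/(df - 2) correction is omitted here.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    seq_means = chains.mean(axis=1)
    B = n * seq_means.var(ddof=1)             # between-sequence variance
    W = chains.var(axis=1, ddof=1).mean()     # average within-sequence variance
    sigma2_hat = (n - 1) / n * W + B / n      # estimate of the target variance
    V_hat = sigma2_hat + B / (m * n)          # squared scale of the t estimate
    return np.sqrt(V_hat / W)

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                      # sequences that agree
stuck = mixed + np.array([[0.0], [0.0], [5.0], [5.0]])  # two sequences offset
r_mixed = shrink_factor(mixed)
r_stuck = shrink_factor(stuck)
```

Sequences that sample the same distribution give a value close to 1, while sequences stuck around different locations give a value well above 1.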

6
1. Gelman and Rubin (1992) 4/4
• The shrink factor approaches 1 when the within-sequence
variance dominates the between-sequence variance, meaning
all sequences have escaped the influence of their starting
points and traversed the whole target distribution.
• Quantitative.
• Criticisms
• Relies on the user's ability to find a suitable
starting distribution.
• Relies on a normal approximation for diagnosing
convergence to the true posterior.
• Inefficient: it runs multiple sequences and discards a
large number of early iterations.

7
2. Geweke (1992) 1/3
• What?
• Uses methods from spectral analysis to assess convergence when the intent is to estimate the mean $E[g(\theta)]$ of some function $g(\theta)$ of interest.
• Collect $g(\theta^{(j)})$ after each iteration.
• Treat $\{g(\theta^{(j)})\}_{j=1}^{p}$ as a time series and compute its spectral density $S_G(\omega)$.
• Use the numerical standard error (NSE) and relative numerical efficiency (RNE) to monitor convergence.
• Assumption
• The MCMC process and the function $g(\theta)$ jointly imply the existence of a spectrum, and of a spectral density with no discontinuities at frequency 0.

8
2. Geweke (1992) 2/3
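• How?
• Estimate $E[g(\theta)]$ from p iterations
• Estimator: $\bar{g}_p = \frac{1}{p}\sum_{j=1}^{p} g(\theta^{(j)})$
• Asymptotic variance: $S_G(0)/p$
• Determine preliminary iterations
• Given the sequence $\{G(j)\}_{j=1}^{p}$ with $G(j) = g(\theta^{(j)})$, compare the mean $\bar{g}_A$ of the first $p_A$ values with the mean $\bar{g}_B$ of the last $p_B$ values. If $G(j)$ is stationary, then as $p \to \infty$ with $p_A/p$ and $p_B/p$ fixed, $\frac{\bar{g}_A - \bar{g}_B}{\sqrt{\hat{S}_G^A(0)/p_A + \hat{S}_G^B(0)/p_B}} \to N(0, 1)$ in distribution.
• Determine sufficient iterations
• Numerical standard error: $\mathrm{NSE} = \sqrt{\hat{S}_G(0)/p}$
• Relative numerical efficiency: $\mathrm{RNE} = \widehat{\mathrm{var}}(g)/\hat{S}_G(0)$. RNE × p indicates the number of draws that would be required to produce the same numerical accuracy if they were made directly from the posterior distribution.

9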
2. Geweke (1992) 3/3
• Addresses the issues of both bias and variance.
• Is univariate.
• Requires a single sampler chain.
• Is sensitive to the choice of spectral window.
• Does not specify a procedure for applying the
diagnostic; this is left to the subjective choice of
the user.
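A minimal sketch of the Geweke quantities for a single chain of values g(θ) follows. Two choices here are assumptions rather than part of the slides: the spectral density at frequency 0 is estimated with a Bartlett (triangular) lag window truncated at the square root of the series length, and the comparison uses the first 10% and last 50% of the chain. As the slides note, the result is sensitive to the window, so other choices will change the numbers.

```python
import numpy as np

def spectral_density_at_zero(x):
    """Bartlett-window estimate of S_G(0), scaled so var(mean) ~ S_G(0)/p."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    max_lag = int(np.sqrt(n))            # assumed truncation point
    s0 = xc @ xc / n                     # lag-0 autocovariance
    for k in range(1, max_lag + 1):
        gamma_k = xc[:-k] @ xc[k:] / n   # lag-k autocovariance
        s0 += 2.0 * (1.0 - k / (max_lag + 1)) * gamma_k
    return s0

def geweke_z(x, first=0.1, last=0.5):
    """Z-score comparing the mean of an early window with a late window."""
    x = np.asarray(x, dtype=float)
    a = x[: int(first * x.size)]
    b = x[x.size - int(last * x.size):]
    var_a = spectral_density_at_zero(a) / a.size
    var_b = spectral_density_at_zero(b) / b.size
    return (a.mean() - b.mean()) / np.sqrt(var_a + var_b)

def nse_and_rne(x):
    """Numerical standard error and relative numerical efficiency."""
    x = np.asarray(x, dtype=float)
    s0 = spectral_density_at_zero(x)
    return np.sqrt(s0 / x.size), x.var(ddof=1) / s0

rng = np.random.default_rng(1)
stationary = rng.normal(size=5000)
drifting = stationary.copy()
drifting[:1000] += 3.0                   # early iterations far from the rest
z_ok, z_bad = geweke_z(stationary), geweke_z(drifting)
nse, rne = nse_and_rne(stationary)
```

A z-score of moderate size is consistent with stationarity; a large absolute value suggests that the early part of the chain should be discarded.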

10
3. Ritter and Tanner (1992) 1/3
• The Gibbs Stopper
• Converts the output of the Gibbs sampler to a sample from the exact distribution.
• Assign a weight $w = q(X)/g_i(X)$ to the d-dimensional vector X drawn at the current iteration
• q is a function proportional to the joint distribution
• $g_i$ is the current Gibbs sampler approximation to the joint distribution.
• Assess the convergence
• If the current approximation to the joint distribution is close to the true one, then the distribution of the weights will be degenerate about a constant.

11
3. Ritter and Tanner (1992) 2/3
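• Compute $g_i$
• Let $K(X', X)$ be the probability of moving from $X'$ (at iteration i) to $X$ at iteration i+1. For the Gibbs sampler it is the product of the full conditional densities used in one sweep.
• The joint distribution of the samples obtained at iteration i+1 is
• $g_{i+1}(X) = \int K(X', X)\, g_i(X')\, dX'$
• The integral can be approximated by the Monte Carlo method
• $g_{i+1}(X) \approx \frac{1}{m}\sum_{j=1}^{m} K(X_j, X)$
• $X_1, \dots, X_m$ are samples drawn at iteration i.

12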
3. Ritter and Tanner (1992) 3/3
• Assesses distributional convergence
• Applicable only with the Gibbs sampler
• Coding is problem-specific
• Computation of weights can be time-intensive
• If full conditionals are not standard
distributions, we must estimate the normalizing
constants.
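The Gibbs Stopper weights can be illustrated on a toy target. The sketch below assumes a standard bivariate normal with correlation `rho` (so both full conditionals are normal with known constants), runs m parallel chains from deliberately poor starting points, and approximates the current Gibbs density by the Monte Carlo average of transition kernels over the previous iteration's draws. The target, the starting region, and the use of the coefficient of variation as a summary of how degenerate the weights are belong to this illustration, not to Ritter and Tanner's paper.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def gibbs_stopper_cv(rho=0.5, m=500, n_iter=20, seed=0):
    """Coefficient of variation of the weights w = q(X)/g(X) per iteration."""
    rng = np.random.default_rng(seed)
    var = 1.0 - rho ** 2
    x = rng.uniform(5.0, 10.0, size=(m, 2))   # poor starting points
    cvs = []
    for _ in range(n_iter):
        prev = x
        x1 = rng.normal(rho * prev[:, 1], np.sqrt(var))   # x1 | x2
        x2 = rng.normal(rho * x1, np.sqrt(var))           # x2 | x1
        x = np.column_stack([x1, x2])
        # K(prev_j, x_a): kernel from every previous draw j to every new draw a.
        k = (normal_pdf(x[:, None, 0], rho * prev[None, :, 1], var)
             * normal_pdf(x[:, None, 1], rho * x[:, None, 0], var))
        g = k.mean(axis=1)                                # Monte Carlo estimate of g
        q = np.exp(-(x1 ** 2 - 2 * rho * x1 * x2 + x2 ** 2) / (2 * var))
        w = q / g
        cvs.append(w.std() / w.mean())
    return np.array(cvs)

cvs = gibbs_stopper_cv()
```

Early iterations give widely scattered weights; as the chains approach the target, the weights concentrate around a constant and their coefficient of variation falls toward 0.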

13
4. Zellner and Min (1995) 1/3
• Gibbs Sampler Convergence Criteria (GSC2)
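• Aim to determine not only whether the Gibbs sampler has converged, but also whether it has converged to a correct result.
• Divide the model parameters into two parts, $\alpha$ and $\beta$
• Derive analytical forms for the conditionals $p(\alpha \mid \beta, y)$ and $p(\beta \mid \alpha, y)$, and for the joint posterior up to a constant, $p(\alpha, \beta \mid y) \propto \pi(\alpha, \beta)\, l(\alpha, \beta \mid y)$ (prior × likelihood)
• Three convergence criteria
• Assume $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$ are two points in the parameter space

14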
4. Zellner and Min (1995) 2/3
• 1. The anchored ratio convergence criterion
(ARC2)
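• For the two parameter blocks $\alpha$ and $\beta$, calculate $\hat{\eta}_\alpha = \frac{\hat{p}(\alpha_1 \mid y)\, p(\beta_1 \mid \alpha_1, y)}{\hat{p}(\alpha_2 \mid y)\, p(\beta_2 \mid \alpha_2, y)}$ and $\hat{\eta}_\beta = \frac{\hat{p}(\beta_1 \mid y)\, p(\alpha_1 \mid \beta_1, y)}{\hat{p}(\beta_2 \mid y)\, p(\alpha_2 \mid \beta_2, y)}$, where $\hat{p}$ denotes a marginal posterior density estimated from the Gibbs output
• If the Gibbs sampler output is satisfactory,
then
• $\hat{\eta}_\alpha$ and $\hat{\eta}_\beta$ will be close to $\eta = \frac{\pi(\alpha_1, \beta_1)\, l(\alpha_1, \beta_1 \mid y)}{\pi(\alpha_2, \beta_2)\, l(\alpha_2, \beta_2 \mid y)}$, the ratio of prior × likelihood at the two points.
• 2. The difference convergence criterion (DC2)
• Since $p(\alpha \mid y)\, p(\beta \mid \alpha, y) = p(\beta \mid y)\, p(\alpha \mid \beta, y)$
• If $\hat{p}(\alpha \mid y)\, p(\beta \mid \alpha, y) - \hat{p}(\beta \mid y)\, p(\alpha \mid \beta, y) \to 0$, then satisfactory
• 3. The ratio convergence criterion (RC2)
• If $\frac{\hat{p}(\alpha \mid y)\, p(\beta \mid \alpha, y)}{\hat{p}(\beta \mid y)\, p(\alpha \mid \beta, y)} \to 1$, then satisfactory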

15
4. Zellner and Min (1995) 3/3
• Quantitative
• Requires a single sampler chain
• Coding is problem-specific and analytical work is
needed
• Application is limited when the factorization
cannot be achieved.
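The difference and ratio criteria can be illustrated on a toy problem in which both conditionals are known exactly. The sketch below assumes a standard bivariate normal target with correlation `rho` standing in for the posterior of the two parameter blocks, and estimates each marginal density by averaging the corresponding conditional density over the Gibbs draws. The target, the evaluation point, and the averaging estimator are assumptions of this illustration.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def dc2_and_rc2(rho=0.5, n=5000, burn=500, point=(0.5, -0.3), seed=0):
    """Difference and ratio criteria at one point of the parameter space."""
    rng = np.random.default_rng(seed)
    var = 1.0 - rho ** 2
    a, b = 5.0, 5.0                        # arbitrary starting values
    a_draws, b_draws = np.empty(n), np.empty(n)
    for i in range(burn + n):
        a = rng.normal(rho * b, np.sqrt(var))    # alpha | beta
        b = rng.normal(rho * a, np.sqrt(var))    # beta | alpha
        if i >= burn:
            a_draws[i - burn], b_draws[i - burn] = a, b
    a1, b1 = point
    # Marginal density estimates: average the conditionals over the draws.
    p_a = normal_pdf(a1, rho * b_draws, var).mean()
    p_b = normal_pdf(b1, rho * a_draws, var).mean()
    lhs = p_a * normal_pdf(b1, rho * a1, var)    # p^(alpha) p(beta | alpha)
    rhs = p_b * normal_pdf(a1, rho * b1, var)    # p^(beta) p(alpha | beta)
    return lhs - rhs, lhs / rhs

difference, ratio = dc2_and_rc2()
```

With satisfactory Gibbs output the difference is close to 0 and the ratio is close to 1; both factorizations estimate the same joint density at the chosen point.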

16
Comparative results 1/5
• Trivariate normal with high correlations
• Run the samplers for relatively few iterations to
test whether these methods detect convergence failure or
ambiguity.
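A test problem of this kind can be sketched as a Gibbs sampler on a trivariate normal. The slides do not give the covariance matrix, so the common correlation of 0.9, the unit variances, and the starting point used below are hypothetical values chosen only to produce a slowly mixing sampler.

```python
import numpy as np

def gibbs_trivariate_normal(rho=0.9, n_iter=200, start=(10.0, 10.0, 10.0), seed=0):
    """Gibbs sampler for N(0, cov), with unit variances and correlation rho."""
    rng = np.random.default_rng(seed)
    cov = (1.0 - rho) * np.eye(3) + rho * np.ones((3, 3))
    prec = np.linalg.inv(cov)              # precision matrix
    x = np.array(start, dtype=float)
    draws = np.empty((n_iter, 3))
    for t in range(n_iter):
        for k in range(3):
            others = [j for j in range(3) if j != k]
            # Full conditional of x[k] given the other two coordinates.
            cond_mean = -prec[k, others] @ x[others] / prec[k, k]
            cond_sd = np.sqrt(1.0 / prec[k, k])
            x[k] = rng.normal(cond_mean, cond_sd)
        draws[t] = x
    return draws

short_run = gibbs_trivariate_normal(n_iter=5)               # too few iterations
long_run = gibbs_trivariate_normal(n_iter=20000, seed=1)
long_corr = np.corrcoef(long_run[1000:].T)
```

The short run is still dominated by its starting point, which is the situation the diagnostics are meant to flag, while the long run reproduces the target correlations.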

17
Comparative results 2/5
• 1. Gelman and Rubin shrink factors (→ 1)
• 2. Geweke NSE (→ 0)

18
Comparative results 3/5
• Ritter and Tanner Gibbs Stopper (weights w →
constant)

19
Comparative results 4/5
• Zellner and Min difference convergence criterion
(difference → 0)

20
Comparative results 5/5
• Remarks
• Geweke's diagnostic appears to signal convergence prematurely.
• Gelman and Rubin's method appears consistent with
the actual behavior of the chains; however, choosing
the starting points is critical.
• The results of the other methods are difficult to
interpret.

21
Summary, Discussion, and Recommendation
• Be cautious when using these diagnostics
• Use a variety of diagnostic tools rather than any
single one
• Learn as much as possible about the target
density before applying an MCMC algorithm