Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review - PowerPoint PPT Presentation

Description:

Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review By Mary Kathryn Cowles and Bradley P. Carlin Presented by Yuting Qi 12/01/2006 – PowerPoint PPT presentation

Provided by: Yut85
Learn more at: http://ece.duke.edu

1
Markov Chain Monte Carlo Convergence Diagnostics
A Comparative Review
  • By Mary Kathryn Cowles and Bradley P. Carlin
  • Presented by Yuting Qi
  • 12/01/2006

2
OUTLINE
  • MCMC Convergence Diagnostics
  • Introduce four methods in detail
  • Focus on: prescriptive summary, underlying theoretical
    basis, advantages and disadvantages
  • Comparative results

3
1. Gelman and Rubin (1992) 1/4
  • What ?
  • Based on a normal-theory approximation to exact
    Bayesian posterior inference
  • Focus on applied inference for Bayesian posterior
    distributions in real problems, which often tend
    toward normality after transformations and
    marginalization.
  • Two major steps
  • Create an overdispersed estimate of the target
    distribution and use it to start several
    independent sequences.
  • Analyze the multiple sequences to form a
    distributional estimate of what is known about
    the target random variable given the simulations so far.
    The distributional estimate is a Student's t
    distribution for each scalar quantity of interest.
  • Convergence
  • Convergence is monitored by estimating the factor
    by which the scale parameter might shrink for
    infinite sampling.

4
1. Gelman and Rubin (1992) 2/4
  • How ?
  • Step 1: Creating a starting distribution
  • Locate the high-density regions of the target
    distribution of x and find the K modes.
  • Approximate the high-density regions by a Gaussian
    mixture model (GMM).
  • Form an overdispersed distribution by first
    drawing from the GMM and then dividing each
    sample by a positive random number, which results in a
    mixture of t distributions.
  • Sharpen the overdispersed approximation by
    downweighting regions that have relatively low
    density, for example through importance resampling.
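
The overdispersion step can be sketched in one dimension as follows. This is only an illustration under stated assumptions: the function name, the mode/scale/weight inputs, and the scaled chi-square divisor with df=4 are hypothetical choices, since the slide says only that each draw is divided by a positive number. Here the deviation from the chosen mode is what gets divided, which turns each normal component into a t component.

```python
import numpy as np

def draw_overdispersed_starts(modes, scales, weights, m, df=4, rng=None):
    """Draw m starting points from a mixture of t distributions (1-D sketch).

    modes, scales, weights describe an assumed Gaussian mixture fitted to the
    high-density regions; df is an assumed degrees-of-freedom setting.
    """
    rng = np.random.default_rng() if rng is None else rng
    modes = np.asarray(modes, dtype=float)
    scales = np.asarray(scales, dtype=float)
    weights = np.asarray(weights, dtype=float)

    k = rng.choice(len(modes), size=m, p=weights)       # pick a mixture component
    z = rng.normal(size=m)                              # standard normal deviation
    divisor = np.sqrt(rng.chisquare(df, size=m) / df)   # positive random scalar
    # Normal deviation divided by sqrt(chi2/df) is t-distributed with df
    # degrees of freedom, so the result has heavier tails than the GMM.
    return modes[k] + scales[k] * z / divisor

starts = draw_overdispersed_starts(
    modes=[-2.0, 3.0], scales=[0.5, 1.0], weights=[0.3, 0.7],
    m=5, rng=np.random.default_rng(1),
)
print(starts)
```

Each of the m values would seed one independent sequence. The importance-resampling refinement mentioned in the last bullet is not included in this sketch.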

5
1. Gelman and Rubin (1992) 3/4
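  • Step 2: Re-estimating the target distribution
  • Independently simulate $m$ sequences of length $2n$,
    each started from the overdispersed distribution, and discard
    the first $n$ iterations.
  • For each scalar parameter of interest $x$, estimate
    the following quantities from the last $n$ iterations
    of the $m$ sequences:
  • $B/n$: the variance between the means of the $m$
    sequences, $B/n = \frac{1}{m-1}\sum_{i=1}^{m}(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2$
  • $W$: the average of the $m$ within-sequence
    variances, $W = \frac{1}{m}\sum_{i=1}^{m} s_i^2$
  • Estimate of the target mean: $\hat\mu = \bar{x}_{\cdot\cdot}$, the mean of the $mn$ samples
  • Estimate of the target variance (unbiased): $\hat\sigma^2 = \frac{n-1}{n} W + \frac{1}{n} B$
  • Estimate the posterior of the target distribution as
    a t distribution (accounting for the variability of the
    estimates $\hat\mu$ and $\hat\sigma^2$) with center $\hat\mu$ and
    scale $\sqrt{\hat V}$, where $\hat V = \hat\sigma^2 + B/(mn)$.
  • Monitor convergence through the shrink factor
    $\sqrt{\hat R} = \sqrt{(\hat V / W)\,\mathrm{df}/(\mathrm{df}-2)}$, where df is the
    degrees of freedom of the t distribution; once it is near 1 for all
    scalars of interest, collect the subsequent draws as samples
    from the target distribution.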
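
A minimal Python (NumPy) sketch of the shrink-factor computation for one scalar quantity follows. It assumes the chains are already stored as an (m, n) array holding the last n iterations of each sequence, and it omits the df/(df - 2) correction for simplicity, so it is an approximation to the diagnostic rather than the authors' exact procedure. The function name and the toy data are illustrative.

```python
import numpy as np

def shrink_factor(chains):
    """Approximate sqrt(R_hat) for an (m, n) array of retained draws.

    Simplification: the df / (df - 2) factor of the t approximation is
    left out, which matters little when n is large.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    b_over_n = chains.mean(axis=1).var(ddof=1)   # variance between the m sequence means
    w = chains.var(axis=1, ddof=1).mean()        # average within-sequence variance
    sigma2_hat = (n - 1) / n * w + b_over_n      # estimate of the target variance
    v_hat = sigma2_hat + b_over_n / m            # squared scale of the t approximation
    return np.sqrt(v_hat / w)

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                       # four chains on the same target
stuck = mixed + np.array([[0.0], [2.0], [4.0], [6.0]])   # chains stuck near different points
print(shrink_factor(mixed), shrink_factor(stuck))
```

The first value should be close to 1 and the second well above 1, because the shifted chains have a large between-sequence variance relative to the within-sequence variance.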

6
1. Gelman and Rubin (1992) 4/4
  • Comments
  • The shrink factor $\sqrt{\hat R}$ approaching 1 means that the
    within-sequence variance dominates the between-sequence
    variance: all sequences have escaped the influence of their
    starting points and have traversed the whole target distribution.
  • Quantitative.
  • Criticisms
  • Relies on the user's ability to find a starting
    distribution.
  • Relies on a normal approximation for diagnosing
    convergence to the true posterior.
  • Inefficient: runs multiple sequences and discards a
    large number of early iterations.

7
2. Geweke (1992) 1/3
  • What ?
  • Use methods from spectral analysis to assess
    convergence; the intent is to estimate the
    mean $E[g(\theta)]$ of some function $g(\theta)$ of interest.
  • Collect $g(\theta^{(j)})$ after each iteration $j$.
  • Treat $\{g(\theta^{(j)})\}_{j=1,\dots,p}$ as a time series and compute
    its spectral density $S_G(\omega)$.
  • Use the numerical standard error (NSE) and relative
    numerical efficiency (RNE) to monitor
    convergence.
  • Assumption
  • The MCMC process and the function $g(\theta)$
    jointly imply the existence of a spectrum, and
    the existence of a spectral density with no
    discontinuities at frequency 0.

8
2. Geweke (1992) 2/3
  • How ?
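  • Estimate $E[g(\theta)]$ from $p$ iterations
  • Estimator: $\bar{g}_p = \frac{1}{p}\sum_{j=1}^{p} g(\theta^{(j)})$
  • Asymptotic variance: $S_G(0)/p$
  • Determine preliminary iterations
  • Given the sequence $\{G(j)\}_{j=1,\dots,p}$ with $G(j) = g(\theta^{(j)})$, if $\{G(j)\}$ is
    stationary, then as $p \to \infty$,
    $(\bar{g}^A - \bar{g}^B) / \sqrt{\hat{S}_G^A(0)/p_A + \hat{S}_G^B(0)/p_B} \to N(0, 1)$,
    where $A$ denotes the first $p_A$ and $B$ the last $p_B$ iterations
  • Determine sufficient iterations
  • Numerical standard error: $\mathrm{NSE} = \sqrt{\hat{S}_G(0)/p}$
  • Relative numerical efficiency: $\mathrm{RNE} = \widehat{\mathrm{var}}(g) / \hat{S}_G(0)$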

  • RNE indicates the number of draws that would be required
    to produce the same numerical accuracy if the
    draws had been made from an iid sample drawn
    directly from the posterior distribution.
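
The quantities above can be sketched in Python (NumPy) as follows. Several choices here are assumptions rather than part of the slides: the spectral density at frequency 0 is estimated with a Bartlett (triangular) lag window truncated at about sqrt(p) lags and scaled so that it equals the long-run variance, the early and late segments are the first 10% and last 50% of the chain, and the test chain is an artificial AR(1) series. The function names are illustrative.

```python
import numpy as np

def spectral_density_at_zero(x, max_lag=None):
    """Bartlett-window estimate of S(0), scaled as the long-run variance."""
    x = np.asarray(x, dtype=float)
    p = x.size
    if max_lag is None:
        max_lag = int(np.floor(np.sqrt(p)))   # assumed truncation rule
    xc = x - x.mean()
    s0 = xc @ xc / p                          # lag-0 autocovariance
    for k in range(1, max_lag + 1):
        gamma_k = xc[:-k] @ xc[k:] / p        # lag-k autocovariance
        s0 += 2.0 * (1.0 - k / (max_lag + 1)) * gamma_k
    return s0

def geweke_summary(g, first=0.1, last=0.5):
    """Return (z, NSE, RNE) for a 1-D series of g(theta) values."""
    g = np.asarray(g, dtype=float)
    p = g.size
    a = g[: int(first * p)]                   # early segment A
    b = g[p - int(last * p):]                 # late segment B
    z = (a.mean() - b.mean()) / np.sqrt(
        spectral_density_at_zero(a) / a.size
        + spectral_density_at_zero(b) / b.size
    )
    s0 = spectral_density_at_zero(g)
    nse = np.sqrt(s0 / p)                     # numerical standard error
    rne = g.var(ddof=1) / s0                  # relative numerical efficiency
    return z, nse, rne

# Artificial, positively autocorrelated chain (AR(1) with coefficient 0.9).
rng = np.random.default_rng(0)
p = 5000
chain = np.empty(p)
chain[0] = 0.0
for j in range(1, p):
    chain[j] = 0.9 * chain[j - 1] + rng.normal()

z, nse, rne = geweke_summary(chain)
print(z, nse, rne)
```

For a positively autocorrelated chain the RNE falls below 1, reflecting that the p correlated draws carry less information than p iid draws. A different window or truncation point can change the numbers noticeably.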
9
2. Geweke (1992) 3/3
  • Comments
  • Addresses the issues of both bias and variance.
  • Is univariate.
  • Requires only a single sampler chain.
  • Disadvantages
  • Is sensitive to the choice of spectral window.
  • Does not specify a procedure for applying the
    diagnostic; this is left to the subjective choice of
    the user.

10
3. Ritter and Tanner (1992) 1/3
  • The Gibbs Stopper
  • Convert the output of the Gibbs sampler to a
    sample from the exact distribution.
  • Assign a weight $w$ to the $d$-dimensional vector $X$
    drawn at the current iteration: $w = q(X) / g_i(X)$
  • $q$ is a function proportional to the joint
    distribution
  • $g_i$ is the current Gibbs sampler approximation to the
    joint distribution.
  • Assess the convergence
  • If the current approximation to the joint
    distribution is close to the true one, then the
    distribution of the weights will be degenerate
    about a constant.

11
3. Ritter and Tanner (1992) 2/3
  • Compute $g_i$
  • Let $K(X', X)$ be the probability of moving from $X'$ (at
    iteration $i$) to $X$ at iteration $i+1$.
  • The joint distribution of the samples obtained at
    iteration $i+1$ is
  • $g_{i+1}(X) = \int K(X', X)\, g_i(X')\, dX'$
  • The integration can be approximated by the Monte
    Carlo method
  • $g_{i+1}(X) \approx \frac{1}{m} \sum_{j=1}^{m} K(X_j, X)$
  • $X_1, \dots, X_m$ are samples drawn at
    iteration $i$.
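
A toy Python (NumPy) sketch of the Gibbs stopper weights follows. The target is an assumed bivariate normal with correlation 0.9, chosen only because its full conditionals and transition kernel are available in closed form; the function names, the 500 parallel chains, and the starting region are likewise illustrative assumptions.

```python
import numpy as np

RHO = 0.9            # assumed correlation of the toy bivariate normal target
S2 = 1.0 - RHO**2    # conditional variance of each coordinate

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def q(x):
    """Function proportional to the joint density (normalizing constant dropped)."""
    return np.exp(-(x[:, 0] ** 2 - 2 * RHO * x[:, 0] * x[:, 1] + x[:, 1] ** 2) / (2 * S2))

def gibbs_step(x, rng):
    """One Gibbs sweep applied to every row of x (m parallel chains)."""
    x1 = RHO * x[:, 1] + np.sqrt(S2) * rng.normal(size=len(x))
    x2 = RHO * x1 + np.sqrt(S2) * rng.normal(size=len(x))
    return np.column_stack([x1, x2])

def stopper_weights(x_prev, x_new):
    """w = q(X) / g(X), with g estimated by averaging the kernel K(X_j, X)."""
    # k[r, j] = K(x_prev[j], x_new[r]): product of the two full conditionals.
    k1 = normal_pdf(x_new[:, 0][:, None], RHO * x_prev[:, 1][None, :], S2)
    k2 = normal_pdf(x_new[:, 1], RHO * x_new[:, 0], S2)[:, None]
    g_next = (k1 * k2).mean(axis=1)
    return q(x_new) / g_next

rng = np.random.default_rng(0)
m = 500
x = rng.normal(5.0, 0.1, size=(m, 2))   # deliberately poor starting region
cv = []                                  # coefficient of variation of the weights
for i in range(50):
    x_new = gibbs_step(x, rng)
    w = stopper_weights(x, x_new)
    cv.append(w.std() / w.mean())
    x = x_new

print(cv[0], cv[-1], np.median(w))
```

Early on the weights vary over orders of magnitude; after many iterations they cluster around a constant, which for this toy target is the dropped normalizing constant 2π·sqrt(1 − ρ²).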
12
3. Ritter and Tanner (1992) 3/3
  • Comments
  • Assess distributional convergence
  • Disadvantages
  • Applicable only with the Gibbs sampler
  • Coding is problem-specific
  • Computation of weights can be time-intensive
  • If full conditionals are not standard
    distributions, we must estimate the normalizing
    constants.

13
4. Zellner and Min (1995) 1/3
  • Gibbs Sampler Convergence Criteria (GSC2)
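  • Aim to determine not only whether the Gibbs sampler
    has converged, but also whether it has converged to the
    correct result.
  • Divide the model parameters into two parts, $\alpha$ and $\beta$
  • Derive analytical forms for the conditional posteriors
    $p(\alpha \mid \beta, y)$ and $p(\beta \mid \alpha, y)$ given the data $y$, and for the
    joint posterior up to a constant,
    $p(\alpha, \beta \mid y) \propto \pi(\alpha, \beta)\, l(y \mid \alpha, \beta)$ (prior times likelihood)
  • Three convergence criteria
  • Assume $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$ are two points
    in the parameter space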
14
4. Zellner and Min (1995) 2/3
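  • 1. The anchored ratio convergence criterion
    (ARC2)
  • Calculate $\theta = \frac{\pi(\alpha_1, \beta_1)\, l(y \mid \alpha_1, \beta_1)}{\pi(\alpha_2, \beta_2)\, l(y \mid \alpha_2, \beta_2)}$
  • If the Gibbs sampler output is satisfactory,
    then
  • $\hat\theta_\alpha = \frac{\hat p(\alpha_1)\, p(\beta_1 \mid \alpha_1)}{\hat p(\alpha_2)\, p(\beta_2 \mid \alpha_2)}$ and $\hat\theta_\beta = \frac{\hat p(\beta_1)\, p(\alpha_1 \mid \beta_1)}{\hat p(\beta_2)\, p(\alpha_2 \mid \beta_2)}$ will be close to $\theta$
    (conditioning on the data $y$ is suppressed; $\hat p$ denotes a marginal
    estimated from the Gibbs output).
  • 2. The difference convergence criterion (DC2)
  • Since $p(\alpha)\, p(\beta \mid \alpha) = p(\beta)\, p(\alpha \mid \beta) = p(\alpha, \beta)$
  • If $\hat\eta = \hat p(\alpha_1)\, p(\beta_1 \mid \alpha_1) - \hat p(\beta_1)\, p(\alpha_1 \mid \beta_1) \to 0$, then satisfactory
  • 3. The ratio convergence criterion (RC2)
  • If $\hat\theta_\alpha / \hat\theta_\beta \to 1$, then satisfactory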
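
The difference criterion can be illustrated with a small Python (NumPy) sketch. The setup is an assumption made for the example: the two parameter blocks are the coordinates of a bivariate normal with correlation 0.9, so both conditionals are known exactly, and the marginals are estimated by averaging the conditional densities over the Gibbs draws. The function names and the evaluation point are hypothetical.

```python
import numpy as np

RHO = 0.9            # assumed correlation of the toy bivariate normal target
S2 = 1.0 - RHO**2    # conditional variance of each block

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def run_gibbs(n, rng, start=(0.0, 0.0)):
    """Single Gibbs chain; column 0 holds alpha draws, column 1 holds beta draws."""
    draws = np.empty((n, 2))
    a, b = start
    for j in range(n):
        a = RHO * b + np.sqrt(S2) * rng.normal()   # alpha | beta
        b = RHO * a + np.sqrt(S2) * rng.normal()   # beta | alpha
        draws[j] = a, b
    return draws

def difference_criterion(draws, a1, b1):
    """eta_hat = p_hat(a1) p(b1 | a1) - p_hat(b1) p(a1 | b1)."""
    # Marginals estimated by averaging the known conditionals over the draws.
    p_a = normal_pdf(a1, RHO * draws[:, 1], S2).mean()
    p_b = normal_pdf(b1, RHO * draws[:, 0], S2).mean()
    return p_a * normal_pdf(b1, RHO * a1, S2) - p_b * normal_pdf(a1, RHO * b1, S2)

draws = run_gibbs(20000, np.random.default_rng(0))
eta_hat = difference_criterion(draws, a1=0.5, b1=0.3)
print(eta_hat)
```

A value near 0 is consistent with the two factorizations of the joint density agreeing. One caveat of this toy setup: if the chain sits far from the evaluation point, both terms are tiny and the difference is also near 0, so the choice of point matters.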

15
4. Zellner and Min (1995) 3/3
  • Comments
  • Quantitative
  • Require a single sampler chain
  • Coding is problem-specific and analytical work is
    needed
  • Disadvantage
  • Application is limited when the factorization
    cannot be achieved.

16
Comparative results 1/5
  • Trivariate normal with high correlations
  • Run the samplers for relatively few iterations to
    test whether these methods detect convergence failure or
    ambiguity.
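
A Gibbs sampler for a highly correlated trivariate normal can be sketched in Python (NumPy) as below. The covariance matrix (unit variances, all correlations 0.9), the zero mean, the starting point, and the run length are assumptions for illustration; the values used in the reviewed paper are not given in the slides.

```python
import numpy as np

def gibbs_mvn(cov, n_iter, start, rng):
    """Gibbs sampler for a zero-mean multivariate normal with covariance cov.

    Each coordinate is drawn from its full conditional, which is read off
    the precision matrix Q = inv(cov):
        x_i | x_rest ~ N(-(1/Q_ii) * sum_{j != i} Q_ij x_j, 1/Q_ii)
    """
    Q = np.linalg.inv(np.asarray(cov, dtype=float))
    x = np.array(start, dtype=float)
    d = x.size
    out = np.empty((n_iter, d))
    for t in range(n_iter):
        for i in range(d):
            cond_var = 1.0 / Q[i, i]
            cond_mean = -cond_var * (Q[i] @ x - Q[i, i] * x[i])
            x[i] = cond_mean + np.sqrt(cond_var) * rng.normal()
        out[t] = x
    return out

# Assumed test case: unit variances with all pairwise correlations 0.9.
cov = np.full((3, 3), 0.9) + 0.1 * np.eye(3)

# A deliberately short run from a remote start, as in the comparison setting.
short_run = gibbs_mvn(cov, 50, start=[10.0, 10.0, 10.0], rng=np.random.default_rng(0))
print(short_run.mean(axis=0))
```

With correlations this high, successive draws are strongly dependent, so a short run from a remote start is a reasonable stress test for the diagnostics compared on the following slides.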

17
Comparative results 2/5
  • 1. Gelman and Rubin: shrink factors ($\to 1$)
  • 2. Geweke: NSE ($\to 0$)

18
Comparative results 3/5
  • Ritter and Tanner: Gibbs stopper (weights $w \to$
    constant)

19
Comparative results 4/5
  • Zellner and Min: difference convergence criterion
    (difference $\to 0$)

20
Comparative results 5/5
  • Remarks
  • Geweke's diagnostic appears to signal convergence
    prematurely.
  • Gelman and Rubin's method appears consistent with
    the true state of the chains; however, choosing the
    starting points is critical.
  • The results of the other methods are difficult to
    interpret.

21
Summary, Discussion, and Recommendation
  • Be cautious when using these diagnostics
  • Use a variety of diagnostic tools rather than any
    single one
  • Learn as much as possible about the target
    density before applying an MCMC algorithm