# Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models

1
Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models
• Omiros Papaspiliopoulos and Gareth O. Roberts

Presented by Yuting Qi, ECE Dept., Duke Univ., 10/06/06
2
Overview
• DP hierarchical models
• Two Gibbs samplers
• Pólya urn (Escobar, West, 94, 95)
• Blocked Gibbs sampler (Ishwaran 00)
• Retrospective sampling
• MCMC for DP with Retrospective sampling
• Performance
• Conclusions

3
DP Mixture Models (1)
• DP mixture models (DMMs)
• Assume Yi is drawn from a parametric distribution with parameters Xi and θ.
• All Xi share one common prior P; some Xi may take the same value.
• Prior distribution: P ~ DP(α, Hθ).
• Property of the Pólya urn scheme
• Marginalizing out P, Xi | X1, ..., X(i-1) ~ (α Hθ + Σ_{j<i} δXj) / (α + i - 1).
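The Pólya urn predictive can be simulated directly without ever representing P. A minimal sketch in standard Python (the function name and the N(0, 1) base measure are illustrative choices, not from the slides):

```python
import random

def polya_urn_sample(n, alpha, base_draw):
    """Draw X1..Xn with P marginalized out: Xi is a fresh draw from the base
    measure with probability alpha/(alpha + i - 1), otherwise it repeats one
    of the earlier values uniformly (so ties get the urn's size-biased weight)."""
    xs = []
    for i in range(n):
        if random.random() < alpha / (alpha + i):
            xs.append(base_draw())        # new atom from the base measure
        else:
            xs.append(random.choice(xs))  # reuse an existing value
    return xs

random.seed(0)
draws = polya_urn_sample(100, alpha=1.0, base_draw=lambda: random.gauss(0.0, 1.0))
print(len(set(draws)))  # number of distinct values (clusters) among the 100 draws
```

The clustering behaviour on this slide falls out immediately: repeated values appear because `random.choice` reuses earlier draws with the urn's weights.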

4
DP Mixture Models (2)
• Explicit form of DP (Sethuraman 94): P = Σ_{j=1..∞} pj δZj, with p1 = V1, pj = Vj (1-V1)...(1-V(j-1)), Vj ~ Beta(1, α), Zj ~ Hθ.
• Relationship
• α large ⇒ Vj ~ Beta(1, α) tends to be small ⇒ small pj, many sticks of short length ⇒ P consists of an infinite number of Zj with small pj ⇒ P → Hθ.
• α → 0 ⇒ Vj ~ Beta(1, α) tends to be large ⇒ a few large sticks ⇒ P has a large mass on a small subset of Zj ⇒ most Xi share the same value.
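The stick-breaking construction above is easy to simulate; this sketch stops once the unbroken remainder of the stick drops below a tolerance (the tolerance and function name are choices made here, not part of the slides):

```python
import random

def stick_breaking(alpha, eps=1e-6):
    """Sethuraman's construction: pj = Vj * prod_{l<j}(1 - Vl),
    Vj ~ Beta(1, alpha). Stop once the remaining stick length is below eps."""
    weights, remaining = [], 1.0
    while remaining > eps:
        v = random.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights

random.seed(1)
p_small = stick_breaking(0.1)   # small alpha: a few long sticks
p_large = stick_breaking(10.0)  # large alpha: many short sticks
print(len(p_small), len(p_large))
```

Running this makes the slide's "relationship" concrete: the number of sticks needed to cover almost all the mass grows roughly like α · log(1/eps).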
5
West's Gibbs Sampler (1)
• Estimation of the joint posterior
• Sample from the full conditional distributions
• Hθ,0 is the posterior obtained by updating the prior Hθ via the likelihood function.

6
West's Gibbs Sampler (2)
• Gibbs sampling scheme
• Sampling Xi is equivalent to sampling the indicator Ki (Ki = k means Xi takes the value Xk): given the old Ki, i = 1..n, and Xk, k = 1..K, generate a new Ki from its posterior.
• For Ki = 0, draw a new Xi from Hθ,0.
• For Ki > 0, generate a new set of Xk according to their posteriors.
• Drawbacks
• Converges slowly.
• Difficult to implement when Hθ and the likelihood are not conjugate.

7
Blocked Gibbs Sampler (1)
• Stick-breaking representation
• Estimation of joint posterior
• Update P in each Gibbs iteration
• No Xi involved
• Must truncate at a finite level K

8
Blocked Gibbs Sampler (2)
• Sampling scheme
• Sample Zj
• For those j occupied by some Xi, sample Zj from its conditional posterior.
• For those j not occupied by any Xi, sample Zj from the base prior Hθ.
• Sample K from its conditional posterior.
• Sample p from its conditional posterior.

pk,j is the posterior of pk updated by the likelihood.
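The three steps of the truncated blocked Gibbs sampler can be sketched for a normal mixture with known component variance. Everything below (the conjugate normal base measure, the parameter values, and the function name) is an illustrative assumption, not the authors' implementation:

```python
import math
import random

def blocked_gibbs(y, alpha=1.0, trunc=20, iters=200, sigma=0.5,
                  mu0=0.0, tau0=2.0, seed=0):
    """Sketch of a truncated blocked Gibbs sampler:
    yi ~ N(Z_{Ki}, sigma^2), Zj ~ N(mu0, tau0^2), sticks truncated at `trunc`."""
    rng = random.Random(seed)
    n = len(y)
    Z = [rng.gauss(mu0, tau0) for _ in range(trunc)]
    K = [rng.randrange(trunc) for _ in range(n)]
    for _ in range(iters):
        # 1. sample atoms Zj from their conditional posteriors
        for j in range(trunc):
            members = [y[i] for i in range(n) if K[i] == j]
            if members:  # occupied component: conjugate normal update
                prec = 1.0 / tau0**2 + len(members) / sigma**2
                mean = (mu0 / tau0**2 + sum(members) / sigma**2) / prec
                Z[j] = rng.gauss(mean, prec ** -0.5)
            else:        # unoccupied: redraw from the base prior
                Z[j] = rng.gauss(mu0, tau0)
        # 2. sample stick weights p from their posterior Beta(1 + nj, alpha + n_{>j})
        counts = [K.count(j) for j in range(trunc)]
        p, remaining = [], 1.0
        for j in range(trunc - 1):
            v = rng.betavariate(1 + counts[j], alpha + sum(counts[j + 1:]))
            p.append(remaining * v)
            remaining *= 1 - v
        p.append(remaining)
        # 3. sample labels Ki with weight proportional to pj * N(yi | Zj, sigma^2)
        for i in range(n):
            w = [p[j] * math.exp(-0.5 * ((y[i] - Z[j]) / sigma) ** 2)
                 for j in range(trunc)]
            t, u, acc = sum(w), rng.random(), 0.0
            for j in range(trunc):
                acc += w[j] / t
                if u <= acc:
                    K[i] = j
                    break
            else:
                K[i] = trunc - 1  # guard against float round-off
    return Z, p, K

random.seed(2)
y = [random.gauss(-1, 0.5) for _ in range(30)] + [random.gauss(1, 0.5) for _ in range(30)]
Z, p, K = blocked_gibbs(y, iters=50)
print(sorted(set(K)))  # labels actually occupied after the last sweep
```

Note that the truncation level `trunc` must be fixed in advance; the retrospective idea on the next slides removes exactly this limitation.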
9
Retrospective Sampling (1)
• Retrospective sampling
• In the blocked Gibbs sampler we are given the pj, sample Ki, and set Xi = Z_{Ki}; sampling the infinitely many pairs (pj, Zj) up front is not feasible.
• To sample Ki, first generate Ui from Uniform(0, 1), then set Ki = j iff p1 + ... + p(j-1) < Ui ≤ p1 + ... + pj.
• Retrospective sampling exchanges the order of sampling Ui and sampling the pairs (pj, Zj).
• If, for a given Ui, more pj are needed than we currently have, simulate pairs (pj, Zj) retrospectively until p1 + ... + pj ≥ Ui is satisfied.

10
Retrospective Sampling (2)
• Algorithm
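The retrospective search described on the previous slide can be sketched as follows. This shows only the prior inverse-CDF step with lazily grown sticks; the function name and the N(0, 1) base draw are illustrative, and the full sampler's label update uses Metropolis-Hastings, as slide 13 notes:

```python
import random

def retrospective_label(u, sticks, alpha, base_draw, rng=random):
    """Return K = j iff p1+...+p(j-1) < u <= p1+...+pj, extending the list
    of pairs (pj, Zj) retrospectively whenever u lies beyond the stick mass
    simulated so far."""
    cum, j = 0.0, 0
    while True:
        if j == len(sticks):
            # not enough mass yet: break one more stick retrospectively
            remaining = 1.0 - sum(p for p, _ in sticks)
            v = rng.betavariate(1.0, alpha)
            sticks.append((remaining * v, base_draw()))
        cum += sticks[j][0]
        if u <= cum:
            return j
        j += 1

random.seed(3)
sticks = []  # (pj, Zj) pairs, grown only as far as the Ui demand
labels = [retrospective_label(random.random(), sticks, alpha=2.0,
                              base_draw=lambda: random.gauss(0.0, 1.0))
          for _ in range(50)]
print(len(sticks), max(labels))
```

The point of the exchange of order is visible here: only finitely many (pj, Zj) pairs are ever instantiated, namely as many as the realized Ui require, with no truncation error.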

11
MCMC for DMM (1)
• MCMC with retrospective sampling
• Notation

12
MCMC for DMM (2)
• Sampling scheme
• Sample Zj
• Sample p from its conditional posterior
• Sample K using retrospective sampling

13
MCMC for DMM (3)
• Sampling K
• Uses Metropolis-Hastings steps.
• When updating Ki, the sampler proposes a move from k to k(i, j).
• The distribution for generating the proposed j is

Mi is a constant that controls the probability of proposing a j greater than the current maximum label max(k).
14
MCMC for DMM (4)
• Algorithm

15
Performance
• lepto data set (unimodal): 0.67 N(0, 1) + 0.33 N(0.3, 0.25²)
• bimod data set (bimodal): 0.5 N(-1, 0.5²) + 0.5 N(1, 0.5²)

Autocorrelation time: a standard way to measure the speed of convergence, i.e. how well the algorithm explores the high-dimensional model space.
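Autocorrelation time can be estimated from a scalar summary of the chain by summing sample autocorrelations until they turn non-positive. This is one common truncation heuristic, a generic estimator rather than the paper's exact procedure:

```python
import random

def autocorrelation_time(chain, max_lag=None):
    """Integrated autocorrelation time tau = 1 + 2 * sum_k rho_k, truncated
    at the first non-positive autocorrelation estimate."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    if var == 0.0:
        return 1.0
    tau = 1.0
    for k in range(1, max_lag or n // 2):
        rho = sum((chain[i] - mean) * (chain[i + k] - mean)
                  for i in range(n - k)) / (n * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return tau

random.seed(4)
iid = [random.gauss(0.0, 1.0) for _ in range(2000)]
ar = [0.0]
for _ in range(1999):
    ar.append(0.9 * ar[-1] + random.gauss(0.0, 1.0))  # AR(1); true tau = 19
tau_iid, tau_ar = autocorrelation_time(iid), autocorrelation_time(ar)
print(tau_iid, tau_ar)
```

A well-mixing sampler behaves like the near-independent chain (tau close to 1); a slowly converging one behaves like the strongly autocorrelated AR(1) chain (large tau).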
16
Performance
17