Approximate Inference: Markov Chain Monte Carlo

Transcript and Presenter's Notes

1
Approximate Inference Markov Chain Monte Carlo
  • by
  • Gary Holness

2
Outline
  • Summarize 50 years of work!!
  • Touch on ideas of the theory
  • Motivate language with classical stats
  • Bayesian improvement
  • Sampling techniques
  • Mention Markov Chain
  • Metropolis-Hastings framework
    - general
    - simulated annealing, Gibbs sampling

3
Distributions govern data
  • A distribution is a law f(x; θ)
  • x = (x1, …, xn), θ = (θ1, …, θn), e.g. θ = (μ, σ)
  • Indexed by parameters
  • Knowing the parameters, we can make statements about yet
    unseen events

4
Point estimation
[Figure: likelihood L(θ) plotted over the parameter space θ]
  • An estimator of θ is a statistic T = t(X1, …, Xn) that can
    substitute for θ
  • Consider the pdf as a function of θ over X: the likelihood
    L(θ; X1, …, Xn) given our random sample
  • θ̂ = argmaxθ L(θ; X1, …, Xn) is the MLE

5
Bayesian improvement
  • Point estimation gives a single value
  • Treat the unknown quantity as a random variable
  • Have a generator f(x|θ) of evidence
  • Since θ is an r.v., write the pdf as a conditional f(x|θ)
  • Choose a prior distribution p(θ) for the r.v.
  • Using Bayes' rule, compute the posterior p(θ|x) ∝ f(x|θ) p(θ)

6
Bayes contd
  • Given the posterior for θ we can talk about
    - expectations of arbitrary functions g(θ)
    - the mode (MAP estimate): θMAP = argmaxθ p(θ|x)
  • Unsure about θ? Give p(θ) high variance, or use an
    improper prior
  • Posterior not well known? Conjugacy: choose the
    prior so the posterior is in the same family

7
What else is hard?
  • The proportionality constant: we only know p(θ|x)
    up to a constant; in large discrete spaces,
    marginalizing is bad, O(a^n)
  • In a Bayes net:
  • Suppose we have the posterior for x, p(x|e)
  • Computing E_x[g(x)] means integrating/summing
    over a large space
  • OK, use sampling, but it can be hard to generate
    samples from the posterior

8
Monte Carlo (MC)
  • Need cheaper numerical integration
  • MC techniques were originally developed for particle physics
  • Use sampling to approximate: let Y = f(X), and let
    X1, …, Xn be a random sample with Xi ~ p(x|e)
    (random sample = i.i.d.)
  • Still, generating a random sample from the support of
    p(x|e) is hard:
    - sample where p(x|e) has mass
    - sample from scratch every time

9
What if it's hard to generate samples?
  • Propose another distribution from which it is easier to
    generate samples
  • Compute the estimator for g(x) using samples from the
    proposal, reweighted to reflect the target distribution
  • The degree to which the target and proposal agree on the
    mass of the support gives the importance weight

10
Proposal Dist Example: Importance Sampling
  • We want to estimate a function g(x)
  • Let f(x) be the posterior distribution
  • Let q(x) be the proposal distribution
  • Sample X1, …, Xn where Xi ~ q(x), and estimate
    E_f[g(X)] ≈ (1/n) Σi [f(Xi)/q(Xi)] g(Xi)
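A minimal sketch of this weighted estimator, with an assumed target f = N(0, 1), a wider assumed proposal q = N(0, 2²), and g(x) = x², so the true value E_f[g(X)] is 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target density f: standard normal N(0, 1).
def f(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

# Proposal density q: wider normal N(0, 2^2), easy to sample from.
def q(x):
    return np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))

def importance_estimate(g, n):
    x = rng.normal(0.0, 2.0, size=n)   # draw from the proposal q
    w = f(x) / q(x)                    # importance weights f(x)/q(x)
    return np.mean(w * g(x))           # weighted estimate of E_f[g(X)]
```

Choosing q with heavier tails than f keeps the weights bounded; the reverse choice makes the estimator's variance blow up.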

11
Sampling themes
  • Numerical Integration
  • Representative sample from support
  • Approximation for inference
  • Difficult distributions
  • Proposal distribution
  • Account for discrepancies where p and q put mass
  • Talk in terms of improvements

12
MC direct sampling
  • Sample from the priors and conditionals in topological order
  • Sample from the posterior
  • Compute the probability of an event by counting samples
    (exact in the limit)
  • It is hard to sample from the posterior directly

13
MC rejection sampling
  • Better estimation
  • Sample from the priors/conditionals, P, given in the
    Bayes net
  • Reject samples that disagree with the evidence
    (direct filtering)
  • Or sample from a simpler proposal distribution q

14
MC rejection sampling
  • P(x) is the target distribution (the Bayes net's)
  • Compare the mass assigned to a sample x by P and Q:
    accept x with probability P(x) / (M Q(x)), where M bounds P/Q
  • Compute the posterior from the accepted samples
  • Problem: choosing the right Q; a Q very
    different from P ⇒ reject many samples
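Rejection sampling can be sketched on an assumed toy target, the Beta(2, 2) density p(x) = 6x(1−x) on [0, 1], with a uniform proposal and envelope constant M = 1.5 (the maximum of p):

```python
import numpy as np

rng = np.random.default_rng(0)

def p(x):
    # Target: Beta(2, 2) density, p(x) = 6 x (1 - x) on [0, 1]
    return 6.0 * x * (1.0 - x)

M = 1.5  # envelope: p(x) <= M * q(x) with q = Uniform(0, 1)

def rejection_sample(n_proposals):
    x = rng.uniform(0.0, 1.0, size=n_proposals)  # draw from proposal q
    u = rng.uniform(0.0, 1.0, size=n_proposals)
    return x[u < p(x) / M]                       # accept with prob p(x)/(M q(x))
```

The acceptance rate is 1/M, which is exactly the slide's warning: the more Q disagrees with P, the larger M must be and the more samples get thrown away.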

15
MC Weighted Sampling
  • Generate samples consistent with the evidence
  • P(Rain | Sprinkler = true, WetGrass = true)
  • Fix the evidence variables

16
Weighted sampling contd
  • Sample the non-evidence variables (using the conditional
    tables)
  • Count sampled events matching the query
  • Each sample's credibility is modulated by the likelihood
    of the evidence
  • Only incorporates the evidence variables that influence
    the particular sample

17
MC Weighted sampling
  • Each sampled event gets a weight: the product of the
    evidence variables' conditional probabilities given
    their parents
  • Compute the posterior by a weighted sum of the sampled
    events
  • Credibility ∝ contribution to the probability mass
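A sketch of likelihood weighting for the sprinkler query from slide 15, P(Rain | Sprinkler = true, WetGrass = true). The CPT values below are the usual textbook numbers and are assumed here, not given on the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sprinkler-network CPTs: Cloudy -> {Sprinkler, Rain} -> WetGrass
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                    # P(Sprinkler=t | Cloudy)
P_R = {True: 0.8, False: 0.2}                    # P(Rain=t | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.9,   # P(WetGrass=t | S, R)
       (False, True): 0.9, (False, False): 0.0}

def likelihood_weighting(n):
    # Estimate P(Rain=t | Sprinkler=t, WetGrass=t).
    total, rain_total = 0.0, 0.0
    for _ in range(n):
        c = rng.uniform() < P_C       # sample non-evidence var Cloudy
        w = P_S[c]                    # evidence Sprinkler=t: weight by its CPT row
        r = rng.uniform() < P_R[c]    # sample non-evidence var Rain
        w *= P_W[(True, r)]           # evidence WetGrass=t: weight again
        total += w
        if r:
            rain_total += w
    return rain_total / total
```

The weight is exactly the product over evidence variables of P(e | parents), matching the slide; with these CPTs the exact posterior is about 0.32.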

18
MCMC
  • Weighted-sampling MC performance degrades with an
    increasing number of evidence variables
  • In large spaces, individual samples carry small weight
  • Weighted sampling still generates samples from scratch
  • Why not generate a sample by a small modification to a
    known event?

19
Markov Chain
  • Given an n-dimensional state space
  • Random vector x = (x1, …, xn)
  • x(t) = x at time step t
  • x(t) transitions to x(t+1) with probability
    P(x(t+1) | x(t), …, x(1)) = T(x(t+1) | x(t))
  • A homogeneous chain is determined by the state x and a
    fixed transition kernel T (rows sum to 1)

20
Markov chain
  • Irreducible: the transition graph is connected
  • Aperiodic: not trapped in cycles
  • Detailed balance: p(x(i)) T(x(i−1) | x(i)) =
    p(x(i−1)) T(x(i) | x(i−1))
  • Detailed balance ⇒ a stationary distribution exists
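The detailed-balance-implies-stationarity claim can be checked numerically on a toy 3-state chain (the target probabilities and uniform proposal below are assumed for illustration), using a Metropolis-style kernel:

```python
import numpy as np

# Assumed toy target distribution over 3 states.
p = np.array([0.2, 0.3, 0.5])

# Build a Metropolis kernel from a uniform proposal over the states:
# propose j with prob 1/n, accept with prob min(1, p_j / p_i).
n = len(p)
T = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            T[i, j] = (1.0 / n) * min(1.0, p[j] / p[i])
    T[i, i] = 1.0 - T[i].sum()   # rejection mass stays at state i
```

Off-diagonal flows are p_i T_ij = (1/n) min(p_i, p_j), which is symmetric in i and j, so detailed balance holds by construction and p is stationary for T.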

21
Metropolis-Hastings
  • Treat the target distribution as the stationary
    distribution
  • Build the transition kernel so that it satisfies
    detailed balance
  • While sampling from an easier proposal distribution

22
MCMC Metropolis-Hastings
  • Have an invariant distribution p(x)
  • Proposal distribution q(x′ | x)
  • Sample x′ given x from q(x′ | x)
  • The chain transitions to state x′ with acceptance
    probability
  • A(x, x′) = min(1, [p(x′) q(x | x′)] / [p(x) q(x′ | x)])

23
Metropolis-Hastings
  • Our transition kernel becomes T(x′ | x) = q(x′ | x) A(x, x′)
    for x′ ≠ x, with the leftover rejection mass staying at x

24
Metropolis-Hastings
  • Initialize x(0)
  • For i = 0 to N−1:
    - sample u ~ Unif(0,1)
    - sample x′ ~ q(x′ | x(i))
    - if u < A(x(i), x′): x(i+1) = x′ // transition
      else: x(i+1) = x(i) // stay in current state

25
General MCMC example (Jordan)
  • q(x′ | x) = N(x(i), 100)
  • p(x) ∝ 0.3 exp(−0.2 x²) + 0.7 exp(−0.2 (x − 10)²)
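This bimodal example can be run with a short sketch; since the N(x, 100) proposal is symmetric, the MH ratio reduces to the plain Metropolis acceptance p(x′)/p(x). (The mixture puts 30% of its mass near 0 and 70% near 10, so the target mean is 7.)

```python
import numpy as np

rng = np.random.default_rng(1)

def p(x):
    # Unnormalized bimodal target from the slide.
    return 0.3 * np.exp(-0.2 * x**2) + 0.7 * np.exp(-0.2 * (x - 10)**2)

def metropolis_hastings(n_steps, x0=0.0, proposal_std=10.0):
    x = x0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        x_prop = rng.normal(x, proposal_std)   # symmetric proposal q(x'|x)
        # symmetric q cancels in the MH ratio -> Metropolis acceptance
        if rng.uniform() < min(1.0, p(x_prop) / p(x)):
            x = x_prop
        samples[i] = x
    return samples
```

The large proposal standard deviation (10) lets the walk jump between the two modes, which is why this chain mixes well on this target.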

26
Metropolis-Hastings
  • Random walk through the state space (jumps)
  • Can simulate multiple chains in parallel
  • Much hinges on the proposal distribution q
  • Want to visit the regions of state space where p(x) puts
    mass
  • Modes of p(x) are visited while the acceptance probability
    A(x, x′) stays high
  • Then the chain mixes well

27
Simulated Annealing: Global Optimization
  • Given p(x), we want to find its global maximum
  • This computes the MAP estimate if p is a posterior
  • Simulate a Markov chain whose target is p(x)

28
Simulated Annealing
  • Plain search is inefficient: few samples come from
    p(x)'s mode
  • Simulate a non-homogeneous chain (the transition kernel
    changes)
  • The invariant distribution p_i(x) ∝ p(x)^(1/T_i) changes
    with the temperature T_i
  • As T_i → 0, it concentrates on the global maxima

29
MCMC Simulated Annealing
  • Initialize x(0) and temperature T0
  • For i = 0 to N−1:
    - sample u ~ Unif(0,1)
    - sample x′ ~ q(x′ | x(i))
    - if u < A(x(i), x′), computed against the tempered
      target p(x)^(1/T_i): x(i+1) = x′
      else: x(i+1) = x(i)
    - set T_{i+1} according to the temperature schedule
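The loop above can be sketched on the bimodal target from slide 25, whose global maximum sits near x = 10. The geometric cooling schedule, starting temperature, and proposal width are assumed choices for illustration (not the logarithmic schedule the convergence result uses):

```python
import numpy as np

rng = np.random.default_rng(2)

def p(x):
    # Bimodal target from slide 25; global maximum is near x = 10.
    return 0.3 * np.exp(-0.2 * x**2) + 0.7 * np.exp(-0.2 * (x - 10)**2)

def simulated_annealing(n_steps, x0=0.0, t0=10.0, cooling=0.999):
    x, t = x0, t0
    best = x
    for _ in range(n_steps):
        x_prop = rng.normal(x, 3.0)                 # symmetric proposal
        # Metropolis acceptance against the tempered target p(x)^(1/T).
        ratio = (p(x_prop) / p(x)) ** (1.0 / t)
        if rng.uniform() < min(1.0, ratio):
            x = x_prop
        if p(x) > p(best):
            best = x                                # track the best state seen
        t *= cooling                                # assumed geometric schedule
    return best
```

At high temperature the tempered target is nearly flat, so the walk explores freely; as T shrinks, acceptance concentrates the chain on the taller mode.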

30
MCMC-SA
  • Must choose an appropriate schedule
  • And an appropriate proposal distribution
  • Shown to converge for T_i = C (ln(i + T0))^(−1) [Geman
    1984, Van Laarhoven 1987]
  • C and T0 are problem dependent
  • What if the state space is really huge?

31
Blocking
  • Large state space: the state vector comprises many
    components (high dimension)
  • Some components can be correlated
  • Sample components one at a time
  • Each block has its own transition kernel governing
    how it is changed
  • Blocks can group correlated components
  • Can use blocks to get better mixing

32
Markov chain w/cycle of Kernels
  • Sample from the joint distribution X = (X1, …, Xn)
  • with probability P(x1, …, xn)
  • Build the chain from kernels Tk for k = 1…n:
    Tk(x(i+1) | x(i)) = P(x(i+1)_k | x_−k)
  • x_−k = {x_j : j ≠ k}
  • Using kernel Tk, change component k by sampling
  • Apply Tk for k = 1…n in sequence (or at random)

33
Cycle of kernels
  • T(x(i+1) | x(i)) is the composition of the per-component
    kernels Tj
  • T = T1 T2 … Tn
  • Sample from a proposal distribution for each
    component, q(x′_k | x(i))
  • The acceptance probability is still the MH ratio, applied
    per component
  • Play with blocking/cycling to improve mixing

34
Gibbs Sampling
  • Initialize x(0)
  • For i = 0 to N−1:
    - sample x1(i+1) ~ P(x1 | x2(i), …, xn(i))
    - sample x2(i+1) ~ P(x2 | x1(i+1), x3(i), …, xn(i))
    - …
    - sample xn(i+1) ~ P(xn | x1(i+1), …, xn−1(i+1))
  • By choosing q = the full conditional P, A = 1: every
    proposal is accepted
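The component-at-a-time loop above can be sketched on an assumed toy case, a standard bivariate normal with correlation ρ = 0.8, whose full conditionals have the closed form x1 | x2 ~ N(ρ x2, 1 − ρ²) (and symmetrically):

```python
import numpy as np

rng = np.random.default_rng(3)

def gibbs_bivariate_normal(n_steps, rho=0.8):
    # Gibbs sampler for a standard bivariate normal with correlation rho.
    # Full conditionals: x1 | x2 ~ N(rho*x2, 1 - rho^2), and symmetrically.
    x1, x2 = 0.0, 0.0
    sd = np.sqrt(1.0 - rho**2)
    samples = np.empty((n_steps, 2))
    for i in range(n_steps):
        x1 = rng.normal(rho * x2, sd)   # sample component 1 from its full conditional
        x2 = rng.normal(rho * x1, sd)   # sample component 2, conditioning on the new x1
        samples[i] = (x1, x2)
    return samples
```

Every draw is accepted (A = 1), but successive sweeps are correlated; the higher ρ is, the slower the mixing, which is exactly the motivation for blocking correlated components.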

35
Gibbs sampling in Bayes net
  • Don't sample from the full component-at-a-time
    conditional
  • Use the structure of the Bayes net
  • Condition on the Markov blanket
  • e.g. conditioning G on A, B, C, I, L reduces to
    conditioning on Mb(G)

36
Summary
  • Large problem? Approximate by sampling
  • Difficult distribution? Use a proposal
  • Markov chain: sampling by re-use/walk
  • Detailed balance ⇒ the stationary target distribution
    exists
  • Sample and test against the acceptance probability
  • Want good mixing
  • Many, many, many MCMC variations out there

37
Thank you for listening! How can MCMC make all
your dreams come true?