Approximate Inference: Markov Chain Monte Carlo

Transcript and Presenter's Notes

1
Approximate Inference Markov Chain Monte Carlo
  • by
  • Gary Holness

2
Outline
  • Summarize 50 years of work!!
  • Touch on ideas of the theory
  • Motivate language with classical stats
  • Bayesian improvement
  • Sampling techniques
  • Mention Markov Chain
  • Metropolis-Hastings framework
    - general
    - simulated annealing, Gibbs sampling

3
Distributions govern data
  • A distribution is a law f(x; θ)
  • x = (x1, …, xn), θ = (θ1, …, θn), e.g. θ = (μ, σ)
  • Indexed by parameters
  • Knowing the parameters, we can make statements about yet
    unseen events

4
Point estimation
[Figure: likelihood L(θ) plotted over the parameter space θ]
  • An estimator of θ is a statistic T = t(X1, …, Xn) that can
    substitute for θ
  • Consider the pdf as a function of θ over X: the likelihood
    L(θ; X1, …, Xn) given our random sample
  • θ̂ = argmaxθ L(θ; X1, …, Xn) is the MLE

5
Bayesian improvement
  • Point estimation gives a single value
  • Treat the unknown quantity as a random variable
  • Have a generator f(x|θ) of evidence
  • Since θ is an r.v., write the pdf as a conditional f(x|θ)
  • Choose a prior distribution p(θ) for the r.v.
  • Using Bayes' rule, compute the posterior p(θ|x) ∝ f(x|θ) p(θ)

6
Bayes contd
  • Given the posterior for θ we can talk about
    - expectations of arbitrary functions g(θ)
    - the mode (MAP estimate): θMAP = argmaxθ p(θ|x)
  • Unsure about θ? Give p(θ) high variance, or use an
    improper prior
  • Posterior not well known? Conjugacy: choose the
    prior so the posterior is in the same family

7
What else is hard?
  • The proportionality constant: we only know p(θ|x)
    up to a constant; in large discrete spaces,
    marginalizing is bad, O(a^n)
  • In a Bayes net:
  • Suppose we have the posterior for x, p(x|e)
  • Computing E_x[g(x)] means integrating/summing
    over a large space
  • OK, use sampling, but it can be hard to generate
    samples from the posterior

8
Monte Carlo (MC)
  • Need cheaper numerical integration
  • MC techniques were originally developed for particle physics
  • Use sampling to approximate: let Y = f(X), and let
    X1, …, Xn be a random sample with Xi ~ p(x|e)
    (random sample = i.i.d.)
  • Still, generating a random sample from the support of
    p(x|e) is hard:
    - sample where p(x|e) has mass
    - sample from scratch every time

9
What if it's hard to generate samples?
  • Propose another distribution from which it is easier to
    generate samples
  • Compute the estimator for g(x) using samples from the
    proposal, reweighted to reflect the target distribution
  • The degree to which the target and proposal agree on the
    mass of the support gives the importance weight

10
Proposal Dist Example: Importance Sampling
  • We want to estimate a function g(x)
  • Let f(x) be the posterior distribution
  • Let q(x) be the proposal distribution
  • Sample X1, …, Xn where Xi ~ q(x), and estimate
    E_f[g(X)] ≈ (1/n) Σi [f(Xi)/q(Xi)] g(Xi)
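A minimal sketch of this weighted estimator, with an assumed target f = N(0, 1), a wider assumed proposal q = N(0, 2²), and g(x) = x², so the true value E_f[g(X)] is 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target density f: standard normal N(0, 1).
def f(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

# Proposal density q: wider normal N(0, 2^2), easy to sample from.
def q(x):
    return np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))

def importance_estimate(g, n):
    x = rng.normal(0.0, 2.0, size=n)   # draw from the proposal q
    w = f(x) / q(x)                    # importance weights f(x)/q(x)
    return np.mean(w * g(x))           # weighted estimate of E_f[g(X)]
```

Choosing q with heavier tails than f keeps the weights bounded; the reverse choice makes the estimator's variance blow up.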

11
Sampling themes
  • Numerical Integration
  • Representative sample from support
  • Approximation for inference
  • Difficult distributions
  • Proposal distribution
  • Account for discrepancies where p and q put mass
  • Talk in terms of improvements

12
MC direct sampling
  • Sample from the priors and conditionals in topological order
  • Sample from the posterior
  • Compute the probability of an event by counting samples
    (exact in the limit)
  • It is hard to sample from the posterior directly

13
MC rejection sampling
  • Better estimation
  • Sample from the priors/conditionals, P, given in the
    Bayes net
  • Reject samples that disagree with the evidence
    (direct filtering)
  • Or sample from a simpler proposal distribution q

14
MC rejection sampling
  • P(x) is the target distribution (the Bayes net's)
  • Compare the mass assigned to a sample x by P and Q:
    accept x with probability P(x) / (M Q(x)), where M bounds P/Q
  • Compute the posterior from the accepted samples
  • Problem: choosing the right Q; a Q very
    different from P ⇒ reject many samples
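Rejection sampling can be sketched on an assumed toy target, the Beta(2, 2) density p(x) = 6x(1−x) on [0, 1], with a uniform proposal and envelope constant M = 1.5 (the maximum of p):

```python
import numpy as np

rng = np.random.default_rng(0)

def p(x):
    # Target: Beta(2, 2) density, p(x) = 6 x (1 - x) on [0, 1]
    return 6.0 * x * (1.0 - x)

M = 1.5  # envelope: p(x) <= M * q(x) with q = Uniform(0, 1)

def rejection_sample(n_proposals):
    x = rng.uniform(0.0, 1.0, size=n_proposals)  # draw from proposal q
    u = rng.uniform(0.0, 1.0, size=n_proposals)
    return x[u < p(x) / M]                       # accept with prob p(x)/(M q(x))
```

The acceptance rate is 1/M, which is exactly the slide's warning: the more Q disagrees with P, the larger M must be and the more samples get thrown away.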

15
MC Weighted Sampling
  • Generate samples consistent with the evidence
  • P(Rain | Sprinkler = true, WetGrass = true)
  • Fix the evidence variables

16
Weighted sampling contd
  • Sample the non-evidence variables (using the conditional
    tables)
  • Count sampled events matching the query
  • Each sample's credibility is modulated by the likelihood
    of the evidence
  • Only incorporates the evidence variables that influence
    the particular sample

17
MC Weighted sampling
  • Each sampled event gets a weight: the product of the
    evidence variables' conditional probabilities given
    their parents
  • Compute the posterior by a weighted sum of the sampled
    events
  • Credibility ∝ contribution to the probability mass
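A sketch of likelihood weighting for the sprinkler query from slide 15, P(Rain | Sprinkler = true, WetGrass = true). The CPT values below are the usual textbook numbers and are assumed here, not given on the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sprinkler-network CPTs: Cloudy -> {Sprinkler, Rain} -> WetGrass
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                    # P(Sprinkler=t | Cloudy)
P_R = {True: 0.8, False: 0.2}                    # P(Rain=t | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.9,   # P(WetGrass=t | S, R)
       (False, True): 0.9, (False, False): 0.0}

def likelihood_weighting(n):
    # Estimate P(Rain=t | Sprinkler=t, WetGrass=t).
    total, rain_total = 0.0, 0.0
    for _ in range(n):
        c = rng.uniform() < P_C       # sample non-evidence var Cloudy
        w = P_S[c]                    # evidence Sprinkler=t: weight by its CPT row
        r = rng.uniform() < P_R[c]    # sample non-evidence var Rain
        w *= P_W[(True, r)]           # evidence WetGrass=t: weight again
        total += w
        if r:
            rain_total += w
    return rain_total / total
```

The weight is exactly the product over evidence variables of P(e | parents), matching the slide; with these CPTs the exact posterior is about 0.32.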

18
MCMC
  • Weighted-sampling MC performance degrades with an
    increasing number of evidence variables
  • In large spaces, individual samples carry small weight
  • Weighted sampling still generates samples from scratch
  • Why not generate a sample by a small modification to a
    known event?

19
Markov Chain
  • Given an n-dimensional state space
  • Random vector x = (x1, …, xn)
  • x(t) = x at time step t
  • x(t) transitions to x(t+1) with probability
    P(x(t+1) | x(t), …, x(1)) = T(x(t+1) | x(t))
  • A homogeneous chain is determined by the state x and a
    fixed transition kernel T (rows sum to 1)

20
Markov chain
  • Irreducible: the transition graph is connected
  • Aperiodic: not trapped in cycles
  • Detailed balance: p(x(i)) T(x(i−1) | x(i)) =
    p(x(i−1)) T(x(i) | x(i−1))
  • Detailed balance ⇒ a stationary distribution exists
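The detailed-balance-implies-stationarity claim can be checked numerically on a toy 3-state chain (the target probabilities and uniform proposal below are assumed for illustration), using a Metropolis-style kernel:

```python
import numpy as np

# Assumed toy target distribution over 3 states.
p = np.array([0.2, 0.3, 0.5])

# Build a Metropolis kernel from a uniform proposal over the states:
# propose j with prob 1/n, accept with prob min(1, p_j / p_i).
n = len(p)
T = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            T[i, j] = (1.0 / n) * min(1.0, p[j] / p[i])
    T[i, i] = 1.0 - T[i].sum()   # rejection mass stays at state i
```

Off-diagonal flows are p_i T_ij = (1/n) min(p_i, p_j), which is symmetric in i and j, so detailed balance holds by construction and p is stationary for T.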

21
Metropolis-Hastings
  • Treat the target distribution as the stationary
    distribution
  • Build the transition kernel so that it satisfies
    detailed balance
  • While sampling from an easier proposal distribution

22
MCMC Metropolis-Hastings
  • Have an invariant distribution p(x)
  • Proposal distribution q(x′ | x)
  • Sample x′ given x from q(x′ | x)
  • The chain transitions to state x′ with acceptance
    probability
  • A(x, x′) = min(1, [p(x′) q(x | x′)] / [p(x) q(x′ | x)])

23
Metropolis-Hastings
  • Our transition kernel becomes T(x′ | x) = q(x′ | x) A(x, x′)
    for x′ ≠ x, with the leftover rejection mass staying at x

24
Metropolis-Hastings
  • Initialize x(0)
  • For i = 0 to N−1:
    - sample u ~ Unif(0,1)
    - sample x′ ~ q(x′ | x(i))
    - if u < A(x(i), x′): x(i+1) = x′ // transition
      else: x(i+1) = x(i) // stay in current state

25
General MCMC example (Jordan)
  • q(x′ | x) = N(x(i), 100)
  • p(x) ∝ 0.3 exp(−0.2 x²) + 0.7 exp(−0.2 (x − 10)²)
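This bimodal example can be run with a short sketch; since the N(x, 100) proposal is symmetric, the MH ratio reduces to the plain Metropolis acceptance p(x′)/p(x). (The mixture puts 30% of its mass near 0 and 70% near 10, so the target mean is 7.)

```python
import numpy as np

rng = np.random.default_rng(1)

def p(x):
    # Unnormalized bimodal target from the slide.
    return 0.3 * np.exp(-0.2 * x**2) + 0.7 * np.exp(-0.2 * (x - 10)**2)

def metropolis_hastings(n_steps, x0=0.0, proposal_std=10.0):
    x = x0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        x_prop = rng.normal(x, proposal_std)   # symmetric proposal q(x'|x)
        # symmetric q cancels in the MH ratio -> Metropolis acceptance
        if rng.uniform() < min(1.0, p(x_prop) / p(x)):
            x = x_prop
        samples[i] = x
    return samples
```

The large proposal standard deviation (10) lets the walk jump between the two modes, which is why this chain mixes well on this target.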

26
Metropolis-Hastings
  • Random walk through the state space (jumps)
  • Can simulate multiple chains in parallel
  • Much hinges on the proposal distribution q
  • Want to visit the regions of state space where p(x) puts
    mass
  • Modes of p(x) are visited while the acceptance probability
    A(x, x′) stays high
  • Then the chain mixes well

27
Simulated Annealing: Global Optimization
  • Given p(x), we want to find its global maximum
  • This computes the MAP estimate if p is a posterior
  • Simulate a Markov chain whose target is p(x)

28
Simulated Annealing
  • Plain search is inefficient: few samples come from
    p(x)'s mode
  • Simulate a non-homogeneous chain (the transition kernel
    changes)
  • The invariant distribution p_i(x) ∝ p(x)^(1/T_i) changes
    with the temperature T_i
  • As T_i → 0, it concentrates on the global maxima

29
MCMC Simulated Annealing
  • Initialize x(0) and temperature T0
  • For i = 0 to N−1:
    - sample u ~ Unif(0,1)
    - sample x′ ~ q(x′ | x(i))
    - if u < A(x(i), x′), computed against the tempered
      target p(x)^(1/T_i): x(i+1) = x′
      else: x(i+1) = x(i)
    - set T_{i+1} according to the temperature schedule
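The loop above can be sketched on the bimodal target from slide 25, whose global maximum sits near x = 10. The geometric cooling schedule, starting temperature, and proposal width are assumed choices for illustration (not the logarithmic schedule the convergence result uses):

```python
import numpy as np

rng = np.random.default_rng(2)

def p(x):
    # Bimodal target from slide 25; global maximum is near x = 10.
    return 0.3 * np.exp(-0.2 * x**2) + 0.7 * np.exp(-0.2 * (x - 10)**2)

def simulated_annealing(n_steps, x0=0.0, t0=10.0, cooling=0.999):
    x, t = x0, t0
    best = x
    for _ in range(n_steps):
        x_prop = rng.normal(x, 3.0)                 # symmetric proposal
        # Metropolis acceptance against the tempered target p(x)^(1/T).
        ratio = (p(x_prop) / p(x)) ** (1.0 / t)
        if rng.uniform() < min(1.0, ratio):
            x = x_prop
        if p(x) > p(best):
            best = x                                # track the best state seen
        t *= cooling                                # assumed geometric schedule
    return best
```

At high temperature the tempered target is nearly flat, so the walk explores freely; as T shrinks, acceptance concentrates the chain on the taller mode.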

30
MCMC-SA
  • Must choose an appropriate schedule
  • And an appropriate proposal distribution
  • Shown to converge for T_i = C (ln(i + T0))^(−1) [Geman
    1984, Van Laarhoven 1987]
  • C and T0 are problem dependent
  • What if the state space is really huge?

31
Blocking
  • Large state space: the state vector comprises many
    components (high dimension)
  • Some components can be correlated
  • Sample components one at a time
  • Each block has its own transition kernel governing
    how it is changed
  • Blocks can group correlated components
  • Can use blocks to get better mixing

32
Markov chain w/cycle of Kernels
  • Sample from the joint distribution X = (X1, …, Xn)
  • with probability P(x1, …, xn)
  • Build the chain from kernels Tk for k = 1…n:
    Tk(x(i+1) | x(i)) = P(x(i+1)_k | x_−k)
  • x_−k = {x_j : j ≠ k}
  • Using kernel Tk, change component k by sampling
  • Apply Tk for k = 1…n in sequence (or at random)

33
Cycle of kernels
  • T(x(i+1) | x(i)) is the composition of the per-component
    kernels Tj
  • T = T1 T2 … Tn
  • Sample from a proposal distribution for each
    component, q(x′_k | x(i))
  • The acceptance probability is still the MH ratio, applied
    per component
  • Play with blocking/cycling to improve mixing

34
Gibbs Sampling
  • Initialize x(0)
  • For i = 0 to N−1:
    - sample x1(i+1) ~ P(x1 | x2(i), …, xn(i))
    - sample x2(i+1) ~ P(x2 | x1(i+1), x3(i), …, xn(i))
    - …
    - sample xn(i+1) ~ P(xn | x1(i+1), …, xn−1(i+1))
  • By choosing q = the full conditional P, A = 1: every
    proposal is accepted
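The component-at-a-time loop above can be sketched on an assumed toy case, a standard bivariate normal with correlation ρ = 0.8, whose full conditionals have the closed form x1 | x2 ~ N(ρ x2, 1 − ρ²) (and symmetrically):

```python
import numpy as np

rng = np.random.default_rng(3)

def gibbs_bivariate_normal(n_steps, rho=0.8):
    # Gibbs sampler for a standard bivariate normal with correlation rho.
    # Full conditionals: x1 | x2 ~ N(rho*x2, 1 - rho^2), and symmetrically.
    x1, x2 = 0.0, 0.0
    sd = np.sqrt(1.0 - rho**2)
    samples = np.empty((n_steps, 2))
    for i in range(n_steps):
        x1 = rng.normal(rho * x2, sd)   # sample component 1 from its full conditional
        x2 = rng.normal(rho * x1, sd)   # sample component 2, conditioning on the new x1
        samples[i] = (x1, x2)
    return samples
```

Every draw is accepted (A = 1), but successive sweeps are correlated; the higher ρ is, the slower the mixing, which is exactly the motivation for blocking correlated components.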

35
Gibbs sampling in Bayes net
  • Don't sample from the full component-at-a-time
    conditional
  • Use the structure of the Bayes net
  • Condition on the Markov blanket
  • e.g. conditioning G on A, B, C, I, L reduces to
    conditioning on Mb(G)

36
Summary
  • Large problem? Approximate by sampling
  • Difficult distribution? Use a proposal
  • Markov chain: sampling by re-use/walk
  • Detailed balance ⇒ the stationary target distribution
    exists
  • Sample and test against the acceptance probability
  • Want good mixing
  • Many, many, many MCMC variations out there

37
Thank you for listening! How can MCMC make all
your dreams come true?