Inference V: MCMC Methods

Slides: 22

Provided by: NirFri5

Transcript and Presenter's Notes

1
Inference V: MCMC Methods
2
Stochastic Sampling

In the previous class, we examined methods that use independent samples to estimate P(X = x | e)
Problem: It is difficult to sample from P(X_1, ..., X_n | e)
We had to use likelihood weighting to reweight our samples
This introduced bias in estimation
In some cases, such as when the evidence is on leaves, these methods are inefficient

3
MCMC Methods

We are going to discuss sampling methods that are based on Markov chains:
Markov Chain Monte Carlo (MCMC) methods
Key ideas:
The sampling process is a Markov chain
The next sample depends on the previous one
These methods can approximate any posterior distribution
We start by reviewing key ideas from the theory of Markov chains

4
Markov Chains

Suppose X_1, X_2, ... take values in some set
W.l.o.g., these values are 1, 2, ...
A Markov chain is a process that corresponds to the network X_1 -> X_2 -> X_3 -> ...
To quantify the chain, we need to specify:
Initial probability P(X_1)
Transition probability P(X_{t+1} | X_t)
A Markov chain has a stationary transition probability if P(X_{t+1} | X_t) is the same for all times t

5
Irreducible Chains

A state j is accessible from state i if there is an n such that P(X_n = j | X_1 = i) > 0
There is a positive probability of reaching j from i after some number of steps
A chain is irreducible if every state is
accessible from every state
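
The accessibility test above can be sketched as a graph search over transitions with positive probability. This is a minimal illustration, assuming the chain is given as a list of rows of a transition matrix; the names `reachable` and `is_irreducible` and the two example matrices are hypothetical, not from the slides.

```python
def reachable(T, i):
    """Return the set of states accessible from state i.

    T[s][j] is the transition probability from s to j. State i counts as
    accessible from itself, matching the case n = 1 in the definition.
    """
    seen = {i}
    stack = [i]
    while stack:
        s = stack.pop()
        for j, p in enumerate(T[s]):
            if p > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def is_irreducible(T):
    """A chain is irreducible if every state is accessible from every state."""
    n = len(T)
    return all(len(reachable(T, i)) == n for i in range(n))


# Hypothetical examples: a 3-state cycle with self-loops, and a chain
# whose state 0 is absorbing (nothing else is accessible from it).
T_irreducible = [[0.5, 0.5, 0.0],
                 [0.0, 0.5, 0.5],
                 [0.5, 0.0, 0.5]]
T_reducible = [[1.0, 0.0],
               [0.5, 0.5]]

print(is_irreducible(T_irreducible))
print(is_irreducible(T_reducible))
```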

6
Ergodic Chains

A state i is positively recurrent if the expected time to get back to state i, after being in state i, is finite
If X has a finite number of states, then it suffices that i is accessible from itself
A chain is ergodic if it is irreducible and every
state is positively recurrent

7
(A)periodic Chains

A state i is periodic if there is an integer d > 1 such that P(X_n = i | X_1 = i) = 0 when n is not divisible by d
A chain is aperiodic if it contains no periodic
state
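
One way to make periodicity concrete is to compute the period of a state as the greatest common divisor of the step counts at which a return has positive probability. The sketch below counts transitions (steps) rather than time indices, and only looks at returns within 3 * (number of states) steps; that horizon is an assumption that should be enough for a finite chain. The function name `period` and the example matrices are hypothetical.

```python
from math import gcd


def period(T, i):
    """GCD of the step counts k at which the chain can return to state i.

    Returns 1 for an aperiodic state and 0 if no return was found within
    the search horizon (assumed here to be 3 * number of states).
    """
    n = len(T)
    current = {i}          # states reachable in exactly k steps
    g = 0
    for k in range(1, 3 * n + 1):
        current = {j for s in current for j, p in enumerate(T[s]) if p > 0}
        if i in current:
            g = gcd(g, k)  # gcd(0, k) == k, so the first return sets g
    return g


# Hypothetical examples.
T_flip = [[0.0, 1.0],
          [1.0, 0.0]]      # deterministic alternation: returns only at even steps
T_lazy = [[0.5, 0.5],
          [0.5, 0.5]]      # self-loops allow a return at any step
T_cycle = [[0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0],
           [1.0, 0.0, 0.0]]

print(period(T_flip, 0), period(T_lazy, 0), period(T_cycle, 0))
```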

8
Stationary Probabilities

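Theorem:
If a chain is ergodic and aperiodic, then the limit lim_{n→∞} P(X_n = j | X_1 = i) exists, and does not depend on i
Moreover, let P*(X = j) = lim_{n→∞} P(X_n = j | X_1 = i); then P*(X) is the unique probability satisfying
P*(X = j) = Σ_i P(X_{t+1} = j | X_t = i) P*(X = i)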

9
Stationary Probabilities

The probability P*(X) is the stationary probability of the process
Regardless of the starting point, the process
will converge to this probability
The rate of convergence depends on properties of
the transition probability
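
The convergence claim can be checked numerically on a small example by pushing two different starting distributions through the transition probability many times. This is only a sketch: the two-state matrix `T` is a hypothetical example (its stationary probability works out to (5/6, 1/6)), and the helper names are illustrative.

```python
def step(dist, T):
    """Advance a distribution over states by one transition of the chain."""
    n = len(T)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]


def run(T, start, n_steps=200):
    """Apply `step` repeatedly, starting from the distribution `start`."""
    dist = list(start)
    for _ in range(n_steps):
        dist = step(dist, T)
    return dist


# Hypothetical two-state chain.
T = [[0.9, 0.1],
     [0.5, 0.5]]

pi_from_0 = run(T, [1.0, 0.0])   # start surely in state 0
pi_from_1 = run(T, [0.0, 1.0])   # start surely in state 1

# Both runs should agree, and one more step should leave the result unchanged.
print(pi_from_0, pi_from_1, step(pi_from_0, T))
```

For this matrix the second eigenvalue is 0.4, so the distance to the limit shrinks by that factor at every step; a chain with a second eigenvalue close to 1 would converge far more slowly.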

10
Sampling from the stationary probability

This theory suggests how to sample from the stationary probability:
Set X_1 = i, for some random/arbitrary i
For t = 1, 2, ..., n-1
Sample a value x_{t+1} for X_{t+1} from P(X_{t+1} | X_t = x_t)
Return x_n
If n is large enough, then this is a sample from the stationary probability P*(X)
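
The procedure above can be sketched in a few lines. The transition matrix `T`, the choice of 50 steps per sample, and the fixed seed are assumptions made for illustration; the function names are hypothetical.

```python
import random


def sample_next(T, state, rng):
    """Draw the next state from the row of T that belongs to `state`."""
    return rng.choices(range(len(T)), weights=T[state])[0]


def sample_from_chain(T, start, n_steps, rng):
    """Simulate the chain for n_steps transitions and return the final state."""
    state = start
    for _ in range(n_steps):
        state = sample_next(T, state, rng)
    return state


# Hypothetical two-state chain whose stationary probability is (5/6, 1/6).
T = [[0.9, 0.1],
     [0.5, 0.5]]

rng = random.Random(0)           # fixed seed so the run is repeatable
samples = [sample_from_chain(T, start=1, n_steps=50, rng=rng)
           for _ in range(5000)]
freq_state_0 = samples.count(0) / len(samples)

# The empirical frequency of state 0 should be close to 5/6.
print(freq_state_0)
```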

11
Designing Markov Chains

How do we construct the right chain to sample
from?
Ensuring aperiodicity and irreducibility is usually easy
The problem is ensuring the desired stationary probability

12
Designing Markov Chains

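Key tool:
If the transition probability satisfies
P(X_{t+1} = j | X_t = i) / P(X_{t+1} = i | X_t = j) = Q(X = j) / Q(X = i) whenever P(X_{t+1} = j | X_t = i) > 0
then P*(X) = Q(X)
This gives a local criterion for checking that the chain will have the right stationary distribution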
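
A local check of this kind is easy to run on a small transition matrix. The sketch below tests the cross-multiplied form of the condition, Q(i) T[i][j] = Q(j) T[j][i] for every pair of states, which avoids dividing by zero, and separately confirms that Q is left unchanged by one transition. The matrix, the candidate distributions, and the function names are hypothetical.

```python
def satisfies_local_criterion(T, Q, tol=1e-12):
    """Check Q[i] * T[i][j] == Q[j] * T[j][i] for all pairs of states."""
    n = len(T)
    return all(abs(Q[i] * T[i][j] - Q[j] * T[j][i]) <= tol
               for i in range(n) for j in range(n))


def is_stationary(T, Q, tol=1e-12):
    """Check that Q is unchanged by one transition of the chain."""
    n = len(T)
    return all(abs(sum(Q[i] * T[i][j] for i in range(n)) - Q[j]) <= tol
               for j in range(n))


# Hypothetical two-state chain and two candidate distributions.
T = [[0.9, 0.1],
     [0.5, 0.5]]
Q_good = [5 / 6, 1 / 6]
Q_bad = [0.5, 0.5]

print(satisfies_local_criterion(T, Q_good), is_stationary(T, Q_good))
print(satisfies_local_criterion(T, Q_bad), is_stationary(T, Q_bad))
```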

13
MCMC Methods

We can use these results to sample from P(X_1, ..., X_n | e)
Idea:
Construct an ergodic, aperiodic Markov chain such that its stationary probability satisfies P*(X_1, ..., X_n) = P(X_1, ..., X_n | e)
Simulate the chain for n steps to get a sample

14
MCMC Methods

Notes:
The Markov chain variable Y takes as values assignments to all variables that are consistent with the evidence
For simplicity, we will denote such a state using the vector of variables

15
Gibbs Sampler

One of the simplest MCMC methods
At each transition, change the state of just one X_i
We can describe the transition probability as a stochastic procedure:
Input: a state x_1, ..., x_n
Choose i at random (using uniform probability)
Sample x'_i from P(X_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n, e)
Let x'_j = x_j for all j ≠ i
Return x'_1, ..., x'_n
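
The transition procedure can be sketched for a toy target. Here the posterior is stood in for by a hypothetical table `joint` over two binary variables, so the conditional for one variable is obtained by reading off the two table entries that agree with the current values of the others. The table, the seed, and the number of steps are assumptions for illustration.

```python
import random
from collections import Counter

# Hypothetical target standing in for P(X_1, X_2 | e); entries are weights
# and happen to sum to 1 here.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def gibbs_step(x, rng):
    """One Gibbs transition: resample one uniformly chosen variable."""
    i = rng.randrange(len(x))
    # Unnormalized conditional of X_i given all other variables.
    weights = [joint[x[:i] + (v,) + x[i + 1:]] for v in (0, 1)]
    new_value = rng.choices((0, 1), weights=weights)[0]
    return x[:i] + (new_value,) + x[i + 1:]


rng = random.Random(0)           # fixed seed so the run is repeatable
state = (0, 0)
counts = Counter()
n_steps = 40000
for _ in range(n_steps):
    state = gibbs_step(state, rng)
    counts[state] += 1

# Visit frequencies should be close to the target table.
freqs = {s: counts[s] / n_steps for s in joint}
print(freqs)
```

No burn-in is discarded in this sketch; with this many steps the effect of the starting state on the frequencies is small.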

16
Correctness of Gibbs Sampler

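By the chain rule:
P(x_1, ..., x_{i-1}, x_i, x_{i+1}, ..., x_n | e) = P(x_1, ..., x_{i-1}, x_{i+1}, ..., x_n | e) P(x_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n, e)
Thus, we get
P(x_1, ..., x_{i-1}, x_i, x_{i+1}, ..., x_n | e) / P(x_1, ..., x_{i-1}, x'_i, x_{i+1}, ..., x_n | e) = P(x_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n, e) / P(x'_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n, e)
Since we choose i from the same distribution at each stage, this procedure satisfies the ratio criterion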

17
Gibbs Sampling for Bayesian Network

Why is the Gibbs sampler easy in BNs?
Recall that the Markov blanket of a variable
separates it from the other variables in the
network
P(X_i | X_1, ..., X_{i-1}, X_{i+1}, ..., X_n) = P(X_i | Mb_i)
This property allows us to use local computations
to perform sampling in each transition

18
Gibbs Sampling in Bayesian Networks

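How do we evaluate P(X_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n)?
Let Y_1, ..., Y_k be the children of X_i
By definition of Mb_i, the parents of Y_j are in Mb_i ∪ {X_i}
It is easy to show that
P(x_i | Mb_i) ∝ P(x_i | Pa_i) ∏_{j=1..k} P(y_j | pa_{Y_j}), normalized over the values of X_i
where Pa_i denotes the parents of X_i and pa_{Y_j} the values of the parents of Y_j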
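
The local computation can be illustrated on a tiny network. The conditional of a variable given its Markov blanket is proportional to its own conditional probability given its parents, times the conditional probability of each child given that child's parents. The sketch assumes a hypothetical chain-structured network A -> B -> C over binary variables with made-up tables, and compares the local result for B against a brute-force computation from the full joint.

```python
# Hypothetical conditional probability tables for A -> B -> C.
p_a = [0.6, 0.4]                                  # p_a[a]
p_b_given_a = {0: [0.7, 0.3], 1: [0.2, 0.8]}      # p_b_given_a[a][b]
p_c_given_b = {0: [0.9, 0.1], 1: [0.4, 0.6]}      # p_c_given_b[b][c]


def blanket_conditional_b(a, c):
    """P(B | a, c) using only B's own table and the table of its child C."""
    unnorm = [p_b_given_a[a][b] * p_c_given_b[b][c] for b in (0, 1)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def brute_force_conditional_b(a, c):
    """The same conditional, computed from the full joint for comparison."""
    joint = [p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c] for b in (0, 1)]
    z = sum(joint)
    return [v / z for v in joint]


local = blanket_conditional_b(a=0, c=1)
full = brute_force_conditional_b(a=0, c=1)
print(local, full)
```

The factor P(a) cancels in the normalization, which is why the local computation never needs to touch tables outside the Markov blanket.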

19
Sampling Strategy

How do we collect the samples?
Strategy I:
Run the chain M times, each run for N steps
Each run starts from a different starting point
Return the last state in each run

[Figure: M chains]
20
Sampling Strategy

Strategy II:
Run one chain for a long time
After some burn-in period, sample points every fixed number of steps

[Figure: burn-in, followed by M samples from one chain]
21
Comparing Strategies

Strategy I:
Better chance of covering the space of points, especially if the chain is slow to reach stationarity
Have to perform burn-in steps for each chain
Strategy II:
Perform burn-in only once
Samples might be correlated (although only weakly)
Hybrid strategy:
Run several chains, and take a few samples from each
Combines benefits of both strategies
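
The two collection strategies can be compared side by side on a toy chain. In the sketch below, the two-state transition matrix, the run lengths, the burn-in length, and the spacing between kept samples (`thin`) are all assumptions chosen for illustration; the function names are hypothetical.

```python
import random

# Hypothetical two-state chain whose stationary probability is (5/6, 1/6).
T = [[0.9, 0.1],
     [0.5, 0.5]]


def next_state(state, rng):
    """Draw the next state from the row of T that belongs to `state`."""
    return rng.choices((0, 1), weights=T[state])[0]


def strategy_one(m_chains, n_steps, rng):
    """Strategy I: M separate runs of N steps; keep the last state of each."""
    samples = []
    for _ in range(m_chains):
        state = rng.randrange(2)         # a fresh random starting point
        for _ in range(n_steps):
            state = next_state(state, rng)
        samples.append(state)
    return samples


def strategy_two(m_samples, burn_in, thin, rng):
    """Strategy II: one long run; drop the burn-in, keep every thin-th state."""
    state = rng.randrange(2)
    for _ in range(burn_in):
        state = next_state(state, rng)
    samples = []
    for _ in range(m_samples):
        for _ in range(thin):
            state = next_state(state, rng)
        samples.append(state)
    return samples


rng = random.Random(0)                   # fixed seed so the run is repeatable
samples_one = strategy_one(m_chains=2000, n_steps=30, rng=rng)
samples_two = strategy_two(m_samples=2000, burn_in=100, thin=5, rng=rng)

freq_one = samples_one.count(0) / len(samples_one)
freq_two = samples_two.count(0) / len(samples_two)
print(freq_one, freq_two)
```

With these settings Strategy I spends 60,000 transitions to obtain 2,000 samples, while Strategy II spends 10,100, which is the cost difference the comparison above points at. A hybrid would call `strategy_two` several times with a smaller `m_samples` and pool the results.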