Title: An Introduction to MCMC for Machine Learning (Markov Chain Monte Carlo)
1. An Introduction to MCMC for Machine Learning
(Markov Chain Monte Carlo)
Young Ki Baik
Computer Vision Lab., SNU
2. References
- An Introduction to MCMC for Machine Learning
  - Andrieu et al. (Machine Learning, 2003)
- Introduction to Monte Carlo Methods
  - David MacKay
- Markov Chain Monte Carlo for Computer Vision
  - Zhu, Dellaert and Tu (a tutorial at ICCV 2005)
  - http://civs.stat.ucla.edu/MCMC/MCMC_tutorial.htm
- Various MCMC presentations on the web
3. Contents
- MCMC
- Metropolis-Hastings algorithm
- Mixtures and cycles of MCMC kernels
- Auxiliary variable samplers
- Adaptive MCMC
- Other applications of MCMC
- Convergence problems and tricks of MCMC
- Remaining problems
- Conclusion
4. MCMC
- Problems with plain Monte Carlo (MC)
  - Assembling the entire distribution for MC is usually hard:
    - Complicated energy landscapes
    - High-dimensional systems
    - Extraordinarily difficult normalization
- Solution: MCMC
  - Build up the distribution from a Markov chain.
  - Choose local transition probabilities that generate the distribution of interest (ensure detailed balance).
  - Each random variable is chosen based on the previous variable in the chain.
  - Walk along the Markov chain until convergence is reached.
- Result: normalization is not required, and calculations are local.
5. MCMC
- What is a Markov chain?
  - A Markov chain is a mathematical model for a stochastic system that generates random variables X1, X2, ..., Xt.
  - The distribution of the next random variable depends only on the current random variable.
  - Under suitable conditions, the chain settles into a stationary probability distribution.
6. MCMC
- What is Markov Chain Monte Carlo?
  - MCMC is a general-purpose technique for generating fair samples from a probability distribution in a high-dimensional space, using random numbers (dice) drawn from a uniform distribution over a certain range.
  - (Figure: Markov chain states, driven by independent trials of the dice.)
7. MCMC
- MCMC as a general-purpose computing technique
  - Task 1: Simulation. Draw fair (typical) samples from a probability distribution that governs a system.
  - Task 2: Integration. Compute expectations and integrals in very high dimensions.
  - Task 3: Optimization, with an annealing scheme.
  - Task 4: Learning. Unsupervised learning with hidden variables (simulated from the posterior), or MLE learning of parameters, needs simulations as well.
8. MCMC
- Some notation
  - The stochastic process {x_i} is called a Markov chain if
    p(x_i | x_{i-1}, ..., x_1) = T(x_i | x_{i-1}).
  - The chain is homogeneous if T remains invariant for all i, with
    sum over x_i of T(x_i | x_{i-1}) = 1 for any i.
  - The chain then depends solely on its current state and a fixed transition matrix.
9. MCMC
- Example
  - Transition graph for a Markov chain with three states (s = 3), with edge probabilities 1, 0.1, 0.9, 0.6, and 0.4.
  - Transition matrix (read off the graph):
    T = [[0, 1, 0], [0, 0.1, 0.9], [0.6, 0.4, 0]]
  - For any initial state distribution mu(x_1), the product mu(x_1) T^t converges to the same stationary distribution p(x) as t grows.
  - This stability result plays a fundamental role in MCMC.
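The stability result above can be checked numerically. A minimal sketch, assuming the transition matrix reconstructed from the slide's edge labels (1, 0.1, 0.9, 0.6, 0.4):

```python
import numpy as np

# Three-state transition matrix; the entries are an assumption
# reconstructed from the transition graph's edge labels.
T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.1, 0.9],
              [0.6, 0.4, 0.0]])

# An arbitrary initial distribution over the three states.
mu = np.array([0.5, 0.2, 0.3])

# Repeated application of T drives mu toward the invariant
# distribution, regardless of the starting point.
for _ in range(100):
    mu = mu @ T

print(mu)  # approximately (0.221, 0.410, 0.369)
```

Each row of T sums to 1, so T is a valid stochastic matrix; the loop simply computes mu T^100.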
10. MCMC
- Convergence properties
  - For any starting point, the chain will converge to the invariant distribution p(x), as long as T is a stochastic transition matrix that obeys the following properties:
  - 1) Irreducibility
    - Every state must be (eventually) reachable from every other state.
  - 2) Aperiodicity
    - This stops the chain from oscillating between different states in a periodic way.
  - 3) Reversibility (detailed balance)
    - p(x) T(x' | x) = p(x') T(x | x'); this keeps the system in its stationary distribution.
  - In the discrete case T is a transition matrix; in the continuous case it is a kernel (proposal distribution).
11. MCMC
- Eigen-analysis
  - From spectral theory, p(x) is the left eigenvector of the matrix T with corresponding eigenvalue 1: p T = p. The largest eigenvalue lambda_1 is always 1, and its left eigenvector is the stationary distribution.
  - The second-largest eigenvalue determines the rate of convergence of the chain, and should be as small as possible.
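The eigen-analysis can be illustrated directly with NumPy, again using the assumed three-state matrix from the earlier example:

```python
import numpy as np

# Assumed three-state transition matrix from the earlier example.
T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.1, 0.9],
              [0.6, 0.4, 0.0]])

# Left eigenvectors of T are right eigenvectors of T transposed.
eigvals, eigvecs = np.linalg.eig(T.T)

# The eigenvector for eigenvalue 1 is the stationary distribution,
# once normalized to sum to one.
idx = int(np.argmin(np.abs(eigvals - 1.0)))
p = np.real(eigvecs[:, idx])
p = p / p.sum()

# The modulus of the second-largest eigenvalue governs how fast the
# chain forgets its starting point.
moduli = sorted(np.abs(eigvals), reverse=True)
print(p, moduli[1])
```

Since the second-largest eigenvalue modulus here is strictly below 1, repeated application of T contracts any initial distribution onto p.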
12. Metropolis-Hastings algorithm
- The MH algorithm
  - The most popular MCMC method
  - Invariant distribution p(x)
  - Proposal distribution q(x* | x)
  - Candidate value x*
  - Acceptance probability
    A(x, x*) = min{1, [p(x*) q(x | x*)] / [p(x) q(x* | x)]}
  - Kernel K_MH(x' | x)
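A minimal random-walk MH sketch in Python; the bimodal target below is an assumption chosen for illustration, and the symmetric Gaussian proposal makes the q ratio cancel in A(x, x*):

```python
import math
import random

def p_unnorm(x):
    # Assumed bimodal target, known only up to its normalizing constant:
    # p(x) proportional to 0.3 exp(-0.2 x^2) + 0.7 exp(-0.2 (x - 10)^2)
    return 0.3 * math.exp(-0.2 * x * x) + 0.7 * math.exp(-0.2 * (x - 10.0) ** 2)

def metropolis_hastings(n_samples, sigma=10.0, x0=0.0, seed=1):
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        x_star = rng.gauss(x, sigma)   # candidate from q(x* | x)
        # Symmetric proposal: q(x | x*) / q(x* | x) = 1, so the
        # acceptance probability reduces to min(1, p(x*) / p(x)).
        if rng.random() < min(1.0, p_unnorm(x_star) / p_unnorm(x)):
            x = x_star                 # accept; otherwise keep x
        samples.append(x)
    return samples

samples = metropolis_hastings(5000)
```

Because only ratios of p appear, the intractable normalizing constant is never needed, which is the whole point of the method.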
13. Metropolis-Hastings algorithm
- Results of running the MH algorithm
  - (Figure: target distribution p(x) and proposal distribution q(x* | x).)
14. Metropolis-Hastings algorithm
- Different choices of the proposal standard deviation sigma
  - MH requires careful design of the proposal distribution.
  - If sigma is too narrow, only one mode of p(x) might be visited.
  - If sigma is too wide, the rejection rate can be high.
  - If all the modes are visited while the acceptance probability is high, the chain is said to mix well.
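The effect of the proposal width can be demonstrated empirically. A small self-contained experiment, using an assumed bimodal target, that estimates the acceptance rate for a narrow, a moderate, and a very wide proposal:

```python
import math
import random

def p_unnorm(x):
    # Assumed bimodal target used for illustration.
    return 0.3 * math.exp(-0.2 * x * x) + 0.7 * math.exp(-0.2 * (x - 10.0) ** 2)

def acceptance_rate(sigma, n=5000, seed=1):
    # Random-walk MH with proposal standard deviation sigma;
    # returns the fraction of accepted candidates.
    rng = random.Random(seed)
    x, accepted = 0.0, 0
    for _ in range(n):
        x_star = rng.gauss(x, sigma)
        if rng.random() < min(1.0, p_unnorm(x_star) / p_unnorm(x)):
            x = x_star
            accepted += 1
    return accepted / n

for sigma in (0.1, 10.0, 100.0):
    print(sigma, acceptance_rate(sigma))
```

A narrow proposal accepts almost everything but explores slowly and may never leave one mode; a very wide one mostly lands in low-density regions and is rejected, which is the trade-off the slide describes.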
15. Mixtures and cycles of MCMC kernels
- Mixture and cycle
  - It is possible to combine several samplers into mixtures and cycles of the individual samplers.
  - If the transition kernels K1 and K2 each have invariant distribution p(x), then the cycle hybrid kernel K1 K2 and the mixture hybrid kernel v K1 + (1 - v) K2, for 0 < v < 1, are also transition kernels with invariant distribution p(x).
16. Mixtures and cycles of MCMC kernels
- Mixtures of kernels
  - Incorporate global proposals to explore vast regions of the state space and local proposals to discover finer details of the target distribution.
  - -> suited to target distributions with many narrow peaks
  - (e.g. the reversible-jump MCMC algorithm)
17. Mixtures and cycles of MCMC kernels
- Cycles of kernels
  - Split a multivariate state vector into components (blocks)
    -> each block can be updated separately.
  - -> block highly correlated variables together
  - (e.g. the Gibbs sampling algorithm)
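A Gibbs sampler is exactly such a cycle of kernels: each component is updated in turn from its full conditional. A sketch for an assumed standard bivariate Gaussian with correlation rho, whose conditionals are known in closed form:

```python
import random

def gibbs_bivariate_normal(n, rho=0.8, seed=1):
    # Cycle of two kernels for a standard bivariate Gaussian:
    #   x | y ~ N(rho * y, 1 - rho^2)
    #   y | x ~ N(rho * x, 1 - rho^2)
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    s = (1.0 - rho * rho) ** 0.5   # conditional standard deviation
    samples = []
    for _ in range(n):
        x = rng.gauss(rho * y, s)  # update block 1 given block 2
        y = rng.gauss(rho * x, s)  # update block 2 given block 1
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(20000)
```

Every conditional update is accepted with probability 1, so no acceptance test is needed. The slide's caveat applies here: when rho is close to 1, the coordinate-wise cycle mixes slowly, which is why highly correlated variables are better updated as one block.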
18. Auxiliary variable samplers
- Auxiliary variables
  - It is often easier to sample from an augmented distribution p(x, u), where u is an auxiliary variable.
  - Marginal samples of x can be obtained by sampling pairs (x, u) and ignoring the u component.
  - Hybrid Monte Carlo (HMC)
    - Uses gradient information in the proposal.
  - Slice sampling
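Slice sampling makes the auxiliary-variable idea concrete: u is drawn uniformly under the density curve, and given u the new x is drawn uniformly from the "slice" {x : p(x) > u}. A univariate sketch with stepping-out and shrinkage, using an assumed unnormalized Gaussian target:

```python
import math
import random

def p_unnorm(x):
    # Assumed target: an unnormalized standard Gaussian.
    return math.exp(-0.5 * x * x)

def slice_sample(p, x0, n, w=1.0, seed=1):
    rng = random.Random(seed)
    x = x0
    out = []
    for _ in range(n):
        # Auxiliary variable: u ~ Uniform(0, p(x)) defines the slice.
        u = rng.uniform(0.0, p(x))
        # Step out: grow an interval of width w until it brackets the slice.
        left = x - rng.uniform(0.0, w)
        right = left + w
        while p(left) > u:
            left -= w
        while p(right) > u:
            right += w
        # Shrink: sample uniformly in the interval, narrowing it on rejection.
        while True:
            x_new = rng.uniform(left, right)
            if p(x_new) > u:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        out.append(x)  # keep x, discard u: a marginal sample from p
    return out

samples = slice_sample(p_unnorm, 0.0, 3000)
```

As the slide says, discarding u leaves marginal samples of x; no proposal standard deviation has to be tuned, only the step width w.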
19. Adaptive MCMC
- Adaptive selection of the proposal distribution
  - The variance of the proposal distribution is important.
  - Goal: automate the process of choosing the proposal distribution as much as possible.
- Problem
  - Adaptation can disturb the stationary distribution.
  - Gelfand and Sahu (1994): the stationary distribution is disturbed despite the fact that each participating kernel has the same stationary distribution.
- Avoidance
  - Carry out adaptation only for an initial fixed number of steps.
  - Parallel chains
  - And so on
  - -> inefficient; much more research is required.
20. Other applications of MCMC
- Simulated annealing for global optimization
  - To find the global maximum of p(x)
- Monte Carlo EM
  - To compute a fast approximation for the E-step
- Sequential Monte Carlo methods and particle filters
  - To carry out on-line approximation of probability distributions using samples
  - -> using parallel sampling
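The simulated-annealing idea can be sketched by running MH on the tempered target p(x)^(1/T_i) with a decreasing temperature T_i, so the chain concentrates on the global maximum. Everything below (target, cooling schedule, proposal width) is an assumed illustration:

```python
import math
import random

def p_unnorm(x):
    # Assumed bimodal objective; its global maximum is near x = 10.
    return 0.3 * math.exp(-0.2 * x * x) + 0.7 * math.exp(-0.2 * (x - 10.0) ** 2)

def anneal(n_steps=3000, seed=1):
    rng = random.Random(seed)
    x, best = 0.0, 0.0
    for i in range(n_steps):
        temp = max(0.01, 1.0 - i / n_steps)   # linear cooling schedule
        x_star = rng.gauss(x, 5.0)            # random-walk proposal
        r = p_unnorm(x_star) / p_unnorm(x)
        # Tempered MH acceptance min(1, r)^(1/temp), written so that
        # large ratios are never raised to a huge power.
        accept_p = 1.0 if r >= 1.0 else r ** (1.0 / temp)
        if rng.random() < accept_p:
            x = x_star
        if p_unnorm(x) > p_unnorm(best):
            best = x                          # track the best point seen
    return best

best = anneal()
```

Early on (high temperature) the chain moves freely between modes; as the temperature falls, downhill moves become increasingly unlikely and the chain settles near the global maximum.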
21. Convergence problems and tricks of MCMC
- Convergence problem
  - Determining the length of the Markov chain is a difficult task.
- Tricks
  - Initialization (to reduce starting biases)
    - Discard an initial set of samples (burn-in).
    - Set the initial sample value manually.
  - Markov chain tests
    - Apply several graphical and statistical tests to assess whether the chain has stabilized.
    - -> These do not provide entirely satisfactory diagnostics.
  - Further study of the convergence problem is needed.
22. Remaining problems
- Large-dimensional models
  - The combination of sampling algorithms with either gradient optimization or exact methods.
- Massive data sets
  - A few solutions based on importance sampling have been proposed.
- Many and varied applications
  - -> But there is still great room for innovation in this area.
23. Conclusion
- MCMC
  - The Markov Chain Monte Carlo methods cover a variety of different fields and applications.
  - There are great opportunities for combining existing sub-optimal algorithms with MCMC in many machine learning problems.
  - Some areas already benefiting from sampling methods include: tracking, restoration, segmentation, probabilistic graphical models, classification, data association for localization, and classical mixture models.