1
An Introduction to MCMC for Machine Learning
(Markov Chain Monte Carlo)
Young Ki Baik
Computer Vision Lab. SNU
2
References
  • Andrieu et al., "An Introduction to MCMC for Machine Learning", Machine Learning, 2003
  • David MacKay, "Introduction to Monte Carlo Methods"
  • Zhu, Dellaert and Tu, "Markov Chain Monte Carlo for Computer Vision" (tutorial at ICCV 2005)
  • http://civs.stat.ucla.edu/MCMC/MCMC_tutorial.htm
  • Various MCMC presentations on the web

3
Contents
  • MCMC
  • Metropolis-Hastings algorithm
  • Mixtures and cycles of MCMC kernels
  • Auxiliary variable samplers
  • Adaptive MCMC
  • Other applications of MCMC
  • Convergence problems and tricks of MCMC
  • Remaining problems
  • Conclusion

4
MCMC
  • Problems with plain Monte Carlo (MC)
  • Assembling the entire distribution for MC is usually hard:
  • Complicated energy landscapes
  • High-dimensional systems
  • Extraordinarily difficult normalization
  • Solution: MCMC
  • Build up the distribution from a Markov chain
  • Choose local transition probabilities that generate the distribution of interest (ensure detailed balance)
  • Each random variable is chosen based on the previous variable in the chain
  • Walk along the Markov chain until convergence is reached
  • Result: normalization is not required and all calculations are local

5
MCMC
  • What is a Markov chain?
  • A Markov chain is a mathematical model for a stochastic system that generates random variables X1, X2, ..., Xt, where the distribution of the next random variable depends only on the current one: p(X_{t+1} | X_1, ..., X_t) = p(X_{t+1} | X_t).
  • In the long run, the chain settles into a stationary probability distribution.

6
MCMC
  • What is Markov Chain Monte Carlo?
  • MCMC is a general-purpose technique for generating fair samples from a probability distribution in a high-dimensional space, using random numbers (dice) drawn from a uniform distribution over a certain range.

[Figure: a sequence of Markov chain states generated from independent trials of the dice.]
7
MCMC
  • MCMC as a general-purpose computing technique
  • Task 1: Simulation. Draw fair (typical) samples from a probability distribution which governs a system.
  • Task 2: Integration/computation in very high dimensions, e.g. to compute expectations E_p[f(x)] = \int f(x) p(x) dx.
  • Task 3: Optimization with an annealing scheme.
  • Task 4: Learning. Unsupervised learning with hidden variables (simulated from the posterior), or MLE learning of parameters θ, where evaluating p(x | θ) needs simulations as well.

8
MCMC
  • Some notation
  • The stochastic process {x_i} is called a Markov chain if p(x_i | x_{i-1}, ..., x_1) = T(x_i | x_{i-1}).
  • The chain is homogeneous if T remains invariant for all i, with \sum_{x_i} T(x_i | x_{i-1}) = 1 for any i.
  • The evolution of the chain thus depends solely on the current state and a fixed transition matrix.

9
MCMC
  • Example
  • Transition graph for a Markov chain with three states (s = 3); the edge labels 1, 0.1, 0.9, 0.6 and 0.4 are the transition probabilities.
  • Transition matrix (row i gives the probabilities of moving out of state i):
    T = [[0, 1, 0], [0, 0.1, 0.9], [0.6, 0.4, 0]]
  • For any initial distribution over the states, iterating the chain drives the marginal distribution of the state toward the invariant distribution p(x).
  • This stability result plays a fundamental role in MCMC.
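This convergence can be checked numerically by repeatedly applying the transition matrix to an arbitrary starting distribution; a minimal sketch, assuming the transition matrix reconstructed above (NumPy used for illustration):

```python
import numpy as np

# Transition matrix reconstructed from the slide's three-state example
# (each row sums to 1 and gives the outgoing probabilities of one state).
T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.1, 0.9],
              [0.6, 0.4, 0.0]])

mu = np.array([1.0, 0.0, 0.0])   # arbitrary initial distribution over the states

# Repeatedly applying T drives mu toward the invariant distribution p(x).
for _ in range(50):
    mu = mu @ T

print(np.round(mu, 3))   # the same vector is reached from any starting point
```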
10
MCMC
  • Convergence properties
  • For any starting point, the chain converges to the invariant distribution p(x), as long as T is a stochastic transition matrix that obeys the following properties:
  • 1) Irreducibility: every state must be (eventually) reachable from every other state.
  • 2) Aperiodicity: this stops the chain from oscillating between different states in a deterministic cycle.
  • 3) Reversibility (detailed balance): p(x) T(x' | x) = p(x') T(x | x'). Summing both sides over x shows that p(x) is indeed left invariant by T.
  • In the discrete case T is a transition matrix; in the continuous case it is a transition kernel (proposal distribution) K(x' | x).
11
MCMC
  • Eigen-analysis
  • From spectral theory, the stationary distribution p(x) is the left eigenvector of the matrix T with corresponding eigenvalue 1 (the largest eigenvalue is always 1).
  • The second-largest eigenvalue determines the rate of convergence of the chain and should be as small as possible.
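This eigen-analysis can be reproduced for the three-state example above; a minimal sketch, assuming NumPy and the same reconstructed matrix T as before:

```python
import numpy as np

T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.1, 0.9],
              [0.6, 0.4, 0.0]])

# Left eigenvectors of T are right eigenvectors of T transposed.
eigvals, eigvecs = np.linalg.eig(T.T)

# Pick the eigenvector whose eigenvalue is (numerically) 1 and normalize it
# into a probability distribution: this is the stationary distribution p(x).
idx = np.argmin(np.abs(eigvals - 1.0))
p = np.real(eigvecs[:, idx])
p = p / p.sum()

print(np.round(np.abs(eigvals), 3))   # the largest eigenvalue magnitude equals 1
print(np.round(p, 3))                 # stationary distribution p(x)
```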
12
Metropolis-Hastings algorithm
  • The MH algorithm
  • The most popular MCMC method
  • Invariant distribution: p(x)
  • Proposal distribution: q(x* | x)
  • Candidate value: x*
  • Acceptance probability: A(x, x*) = min{ 1, [p(x*) q(x | x*)] / [p(x) q(x* | x)] }
  • Kernel: K(x' | x) = q(x' | x) A(x, x') + δ_x(x') (1 − ∫ q(x* | x) A(x, x*) dx*)
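A minimal Metropolis-Hastings sketch in Python. The bimodal target density and the Gaussian random-walk proposal below are illustrative assumptions; since the proposal is symmetric, the q terms in A(x, x*) cancel.

```python
import numpy as np

def p_unnorm(x):
    """Unnormalized target density p(x) (an assumed bimodal example)."""
    return 0.3 * np.exp(-0.2 * x**2) + 0.7 * np.exp(-0.2 * (x - 10.0)**2)

def metropolis_hastings(n_samples, proposal_std=10.0, x0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        # Propose a candidate x* from a Gaussian random walk q(x* | x).
        x_star = x + proposal_std * rng.standard_normal()
        # Symmetric proposal, so A(x, x*) = min(1, p(x*) / p(x)).
        if rng.random() < min(1.0, p_unnorm(x_star) / p_unnorm(x)):
            x = x_star            # accept the candidate
        samples[i] = x            # on rejection the chain stays at x
    return samples

samples = metropolis_hastings(5000)
```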

13
Metropolis-Hastings algorithm
  • Results of running the MH algorithm
  • [Figure: histograms of the MH samples approximating the target distribution p(x), generated with a Gaussian proposal distribution q(x* | x).]
14
Metropolis-Hastings algorithm
  • Different choices of the proposal standard deviation σ*
  • MH requires careful design of the proposal distribution.
  • If σ* is too narrow, only one mode of p(x) might be visited.
  • If σ* is too wide, the rejection rate can be high.
  • If all the modes are visited while the acceptance probability is high, the chain is said to mix well.
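The effect of σ* can be explored with the hypothetical sampler sketched earlier by varying its proposal_std parameter and comparing acceptance rates and the range of visited states:

```python
import numpy as np

for sigma in (0.1, 10.0, 100.0):
    s = metropolis_hastings(5000, proposal_std=sigma)
    accept_rate = np.mean(np.diff(s) != 0.0)   # fraction of moves that changed state
    print(sigma, round(float(accept_rate), 2), round(s.min(), 1), round(s.max(), 1))
```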

15
Mixtures and cycles of MCMC kernels
  • Mixtures and cycles
  • It is possible to combine several samplers into mixtures and cycles of the individual samplers.
  • If the transition kernels K1 and K2 each leave p(x) invariant, then the cycle hybrid kernel K1 K2 and the mixture hybrid kernel ν K1 + (1 − ν) K2 (for 0 ≤ ν ≤ 1) are also transition kernels with invariant distribution p(x).

16
Mixtures and cycles of MCMC kernels
  • Mixtures of kernels
  • Incorporate global proposals to explore vast regions of the state space and local proposals to discover finer details of the target distribution.
  • Useful for target distributions with many narrow peaks (e.g. the reversible jump MCMC algorithm); a sketch follows below.
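A minimal sketch of a mixture of kernels: with probability nu a wide (global) Gaussian proposal is used, otherwise a narrow (local) one. Both component kernels are symmetric-proposal MH kernels that leave p(x) invariant, so the mixture does too. The function name and parameters are illustrative assumptions.

```python
import numpy as np

def mixture_kernel_step(x, p_unnorm, rng, nu=0.1, global_std=50.0, local_std=0.5):
    """One MH step using a mixture of a global and a local Gaussian proposal."""
    std = global_std if rng.random() < nu else local_std
    x_star = x + std * rng.standard_normal()
    # Both proposals are symmetric, so the acceptance ratio is p(x*) / p(x).
    if rng.random() < min(1.0, p_unnorm(x_star) / p_unnorm(x)):
        return x_star
    return x
```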

17
Mixtures and cycles of MCMC kernels
  • Cycles of kernels
  • Split a multivariate state vector into components (blocks) that can be updated separately.
  • Blocking highly correlated variables improves mixing; the Gibbs sampling algorithm is the classic example (a sketch follows below).
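A minimal Gibbs-sampling sketch, a cycle of two kernels that each update one component of the state. The bivariate Gaussian target with correlation rho is an assumption chosen because its full conditionals are available in closed form.

```python
import numpy as np

def gibbs_bivariate_normal(n_samples, rho=0.8, seed=0):
    """Cycle over the two components, sampling each from its full conditional."""
    rng = np.random.default_rng(seed)
    x1, x2 = 0.0, 0.0
    cond_std = np.sqrt(1.0 - rho**2)
    out = np.empty((n_samples, 2))
    for i in range(n_samples):
        x1 = rho * x2 + cond_std * rng.standard_normal()   # sample from p(x1 | x2)
        x2 = rho * x1 + cond_std * rng.standard_normal()   # sample from p(x2 | x1)
        out[i] = (x1, x2)
    return out

samples = gibbs_bivariate_normal(5000)
```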

18
Auxiliary variable samplers
  • Auxiliary variables
  • It is often easier to sample from an augmented distribution p(x, u), where u is an auxiliary variable.
  • Marginal samples of x are obtained by sampling (x, u) jointly and simply ignoring the u samples.
  • Hybrid Monte Carlo (HMC): uses gradient information about the target.
  • Slice sampling (a sketch follows below).
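A minimal slice-sampling sketch: the auxiliary variable u is a uniform height under the unnormalized density, and x is then drawn uniformly from the slice {x : p(x) > u}. The fixed bracket [lo, hi] is an assumption made to keep the sketch short; a full implementation would use a stepping-out/shrinkage procedure instead.

```python
import numpy as np

def slice_sample(p_unnorm, x0, n_samples, lo=-10.0, hi=20.0, seed=0):
    """Slice sampling with an auxiliary height variable and a fixed bracket."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        # Auxiliary variable: uniform height under the (unnormalized) density.
        u = rng.uniform(0.0, p_unnorm(x))
        # Draw x uniformly from the slice {x : p(x) > u} by rejection.
        while True:
            x_new = rng.uniform(lo, hi)
            if p_unnorm(x_new) > u:
                x = x_new
                break
        samples[i] = x
    return samples
```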

19
Adaptive MCMC
  • Adaptive selection of the proposal distribution
  • The variance of the proposal distribution is important.
  • Goal: automate the process of choosing the proposal distribution as much as possible.
  • Problem
  • Adaptive MCMC can disturb the stationary distribution.
  • Gelfand and Sahu (1994): the stationary distribution is disturbed even though each participating kernel has the same stationary distribution.
  • Avoidance
  • Carry out adaptation only during an initial, fixed number of steps.
  • Run parallel chains.
  • And so on.
  • These remedies are inefficient; much more research is required.

20
Other applications of MCMC
  • Simulated annealing method for global optimization
  • To find the global maximum of p(x) (a sketch follows after this list).
  • Monte Carlo EM
  • To find a fast approximation for the E-step.
  • Sequential Monte Carlo methods and particle filters
  • To carry out on-line approximation of probability distributions using samples, typically with many samplers run in parallel.
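A minimal simulated-annealing sketch built on a Metropolis step: the chain targets p(x)^(1/T) with a temperature T that decreases toward 0, so the samples concentrate near the global maximum of p(x). The linear cooling schedule and parameter names are illustrative assumptions.

```python
import numpy as np

def simulated_annealing(p_unnorm, x0=0.0, n_steps=5000, proposal_std=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, best = x0, x0
    for i in range(n_steps):
        temp = max(1e-3, 1.0 - i / n_steps)            # simple linear cooling schedule
        x_star = x + proposal_std * rng.standard_normal()
        ratio = p_unnorm(x_star) / p_unnorm(x)
        # Metropolis acceptance for the tempered target p(x)^(1/temp).
        if ratio >= 1.0 or rng.random() < ratio ** (1.0 / temp):
            x = x_star
        if p_unnorm(x) > p_unnorm(best):
            best = x                                   # track the best state visited
    return best
```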

21
Convergence problems and tricks of MCMC
  • Convergence problem
  • Determining the length of the Markov chain is a difficult task.
  • Tricks
  • For the starting-bias problem:
  • Discard an initial set of samples (burn-in).
  • Set the initial sample value manually.
  • Markov chain tests
  • Apply several graphical and statistical tests to assess whether the chain has stabilized.
  • These do not provide entirely satisfactory diagnostics.
  • Convergence assessment remains an active research topic (a burn-in sketch follows below).
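A minimal sketch of the burn-in trick together with a crude stability check, reusing the hypothetical metropolis_hastings sampler from the MH slide; proper diagnostics would compare several independent chains rather than two halves of one chain.

```python
import numpy as np

samples = metropolis_hastings(20000)       # sampler sketch from the MH slide
burn_in = 2000
kept = samples[burn_in:]                   # discard the initial set of samples

# Crude check: the two halves of the retained chain should agree if it has stabilized.
first_half, second_half = np.array_split(kept, 2)
print(round(float(first_half.mean()), 2), round(float(second_half.mean()), 2))
```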

22
Remaining problems
  • Large-dimensional models
  • Combining sampling algorithms with either gradient-based or exact optimization.
  • Massive data sets
  • A few solutions based on importance sampling have been proposed.
  • Many and varied applications
  • There is still great room for innovation in this area.

23
Conclusion
  • MCMC
  • Markov Chain Monte Carlo methods cover a variety of different fields and applications.
  • There are great opportunities for combining existing sub-optimal algorithms with MCMC in many machine learning problems.
  • Areas already benefiting from sampling methods include: tracking, restoration, segmentation, probabilistic graphical models, classification, data association for localization, and classical mixture models.