
Markov Chain Monte Carlo and Gibbs Sampling

- Vasileios Hatzivassiloglou
- University of Texas at Dallas

Markov chains

- A subtype of random walks (not necessarily uniform) where the entire memory of the system is contained in the current state
- Described by a transition matrix P, where p_ij is the probability of going from state i to state j
- Very useful for describing stochastic discrete systems
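As a small sketch of this definition (the nucleotide states and probabilities below are invented for illustration), note that simulating the chain only ever consults the current state's row of P:

```python
import random

# Hypothetical transition matrix over the states A, C, G, T;
# row X gives the distribution of the next state given state X.
STATES = ["A", "C", "G", "T"]
P = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.1, "C": 0.5, "G": 0.3, "T": 0.1},
    "G": {"A": 0.3, "C": 0.1, "G": 0.4, "T": 0.2},
    "T": {"A": 0.2, "C": 0.2, "G": 0.1, "T": 0.5},
}

def simulate(start, steps, rng=random.random):
    """Walk the chain: the only memory is the current state."""
    state, path = start, [start]
    for _ in range(steps):
        r, acc = rng(), 0.0
        for nxt in STATES:
            acc += P[state][nxt]
            if r < acc:
                state = nxt
                break
        path.append(state)
    return path
```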

Markov chain example

[State diagram: a four-state chain over the nucleotides A, C, G, T, with a transition probability p_XY labeling each edge from state X to state Y (sixteen in all, p_AA through p_TT, including self-loops).]

Example application of Markov chains

- Markov chains model dependencies across time or position
- The model assigns a probability to each sequence of observed data, and can be used to measure how likely an observed sequence is under the model
- The GeneMark algorithm uses 5th-order Markov chains (why?) to find genes (distinguishing them from other regions in the DNA)
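For instance, a first-order chain scores a sequence as P(s_1) · ∏ P(s_i | s_{i−1}). The sketch below uses invented probabilities and first order for brevity; GeneMark itself conditions each base on the previous five:

```python
import math

# Illustrative (made-up) initial and transition probabilities.
INIT = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
TRANS = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.1, "C": 0.5, "G": 0.3, "T": 0.1},
    "G": {"A": 0.3, "C": 0.1, "G": 0.4, "T": 0.2},
    "T": {"A": 0.2, "C": 0.2, "G": 0.1, "T": 0.5},
}

def log_prob(seq):
    """Log-probability of seq under the chain:
    log P(s1) + sum of log P(s_i | s_{i-1})."""
    lp = math.log(INIT[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        lp += math.log(TRANS[prev][cur])
    return lp
```

Comparing such a score under a "gene" chain against the score under a "background" chain is the kind of decision GeneMark-style methods make.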

Markov Chains Stationary Distribution

- Under general assumptions (irreducibility and aperiodicity), Markov chains have a stationary distribution π, the limit of P^k as k goes to infinity
- An irreducible MC can reach any state from any other state
- An aperiodic MC does not return to any state only at fixed multiples of some period greater than 1
- Such a Markov chain is called ergodic
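This limit can be sketched with power iteration on a small made-up chain: repeatedly multiplying a starting distribution by P converges to π, which satisfies π = πP:

```python
# Example ergodic two-state chain (values invented for illustration).
P = [[0.9, 0.1],
     [0.5, 0.5]]

def stationary(P, iters=200):
    """Approximate pi by iterating pi <- pi * P from the
    uniform distribution; for an ergodic chain this converges."""
    pi = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(len(P)))
              for j in range(len(P))]
    return pi
```

For this P the exact answer is π = (5/6, 1/6), and the iterate matches it to high precision.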

Markov Chain Monte Carlo

- Used as a means to guide the selection of samples
- We want the stationary distribution of the Markov chain to be the distribution we are sampling from
- In many cases, this is relatively easy to do

Gibbs sampling

- A special case of MCMC used when the conditional distributions of individual variables can be calculated and sampled from easily, while the joint distribution is hard to sample from directly
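A classic textbook illustration (not from these slides) is the standard bivariate normal with correlation ρ: the joint is awkward to sample directly with only univariate tools, but each conditional is a simple univariate normal, x | y ~ N(ρy, 1 − ρ²) and symmetrically for y | x. A minimal sketch:

```python
import random

def gibbs_bivariate_normal(rho, n, burn=500, seed=0):
    """Gibbs sampling: alternately redraw each variable from its
    conditional given the current value of the other."""
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x = y = 0.0
    samples = []
    for t in range(burn + n):
        x = rng.gauss(rho * y, sd)   # draw x from p(x | y)
        y = rng.gauss(rho * x, sd)   # draw y from p(y | x)
        if t >= burn:                # discard burn-in samples
            samples.append((x, y))
    return samples
```

The empirical correlation of the resulting samples approaches ρ, showing that the chain's stationary distribution is the target joint.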

Gibbs sampling in our problem

- Start with a single candidate S = S1, ..., Sk, where each Si is chosen randomly and uniformly from input sequence i
- Calculate A and D(A||B) for S
- Choose one member of S randomly to remove
- Choose an alternative (from the corresponding sequence) with probability proportional to the corresponding D(A||B)
- Repeat until D(A||B) converges
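The loop above can be sketched as follows. This is a simplification: it weights each alternative by its probability under the profile A built from the remaining substrings (with pseudocounts), rather than by the full relative entropy D(A||B) against a background B that the slides use; function and variable names are mine:

```python
import random

ALPHA = "ACGT"

def profile(strings, pseudo=1.0):
    """Column-wise frequency profile A of the chosen substrings,
    with pseudocounts so no probability is zero."""
    cols = []
    for j in range(len(strings[0])):
        counts = dict.fromkeys(ALPHA, pseudo)
        for s in strings:
            counts[s[j]] += 1
        total = sum(counts.values())
        cols.append({a: c / total for a, c in counts.items()})
    return cols

def score(sub, A):
    """Probability of one substring under profile A."""
    p = 1.0
    for j, c in enumerate(sub):
        p *= A[j][c]
    return p

def gibbs_motif(seqs, n, iters=500, seed=0):
    rng = random.Random(seed)
    # Start: pick each S_i uniformly at random from sequence i.
    starts = [rng.randrange(len(s) - n + 1) for s in seqs]
    for _ in range(iters):
        i = rng.randrange(len(seqs))          # remove one member of S
        rest = [s[p:p + n]
                for j, (s, p) in enumerate(zip(seqs, starts)) if j != i]
        A = profile(rest)
        # Weight each of the m - n + 1 alternatives in sequence i.
        weights = [score(seqs[i][p:p + n], A)
                   for p in range(len(seqs[i]) - n + 1)]
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Because the new start is drawn proportionally to its weight rather than greedily, the sampler can pick a worse string, or even the one just removed.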

Exploring alternative strings

- When we replace a string from sequence i
- We examine in turn each of the m − n + 1 strings that that sequence could offer
- For each such string, we add it temporarily to S and calculate the new A and D(A||B)
- Then we assign to each string S_ij (j varies across these strings) a probability proportional to the corresponding D(A||B)
- We may pick a worse string, or even the same string we just removed

Gibbs sampler convergence

- Return the best S seen across all iterations (it may not be the last one)
- Stop after a fixed number of iterations, or when D(A||B) does not change very much
- The solution is sensitive to the starting S, so we typically run the algorithm several (thousand) times from different starting points

Complexity of Gibbs sampler

- Construct the initial S and calculate A and D(A||B) in O(kn) time
- Each iterative step takes O(n) time to remove a string and recalculate D(A||B), and O(mn) time to calculate the probabilities of the m − n + 1 alternatives
- Total time is O(mnd), where d is the number of iteration steps (dm ≫ k), multiplied by the number of random restarts

Why Gibbs sampling works

- Retains elements of the greedy approach
- Weighting by relative entropy makes it likely to move towards locally better solutions
- Allows locally bad moves with a small probability, to escape local maxima

Variations in Gibbs sampling

- Discard substrings non-uniformly (weighted by relative entropy, analogous to the subsequent selection of the new string)
- Use simulated annealing to reduce the chance of making a bad move (and gradually ensure convergence)

Annealing

- Annealing is a process in metallurgy for improving metals by increasing crystal size and reducing defects
- The process works by heating the metal and then cooling it in a controlled way, which lets the atoms go through a series of states with gradually lower internal energy
- The goal is to have the metal settle in a configuration with lower internal energy than the original

Simulated Annealing

- Simulated annealing (SA) adopts an energy function equal to the function we want to minimize
- Transitions between neighboring states are accepted with probability specified by a function f of the energy gain ΔE = E_new − E_old and the temperature T
- f(ΔE, T) > 0 even if ΔE > 0, but as T → 0, f(ΔE, T) → 1 if ΔE < 0 (or to cΔE for some constant c) and f(ΔE, T) → 0 for ΔE > 0

Simulated Annealing

- Original f (from the Metropolis–Hastings algorithm)
- T controls acceptance of locally bad solutions
- The annealing schedule is a process for gradually reducing T so that eventually only good moves are accepted
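The standard Metropolis acceptance rule, f(ΔE, T) = min(1, e^(−ΔE/T)), has exactly the limiting behavior described above: downhill moves are always accepted, and uphill moves become vanishingly likely as T → 0. A minimal sketch with a geometric annealing schedule (the slides do not specify a particular schedule; this one, and the toy problem, are assumptions):

```python
import math
import random

def metropolis_accept(delta_e, T, rng):
    """Metropolis form of f: accept downhill moves (delta_e <= 0)
    always; accept uphill moves with probability exp(-delta_e / T)."""
    return delta_e <= 0 or (T > 0 and rng.random() < math.exp(-delta_e / T))

def anneal(energy, neighbor, x0, T0=1.0, alpha=0.99, iters=2000, seed=0):
    """Minimal SA loop with a geometric schedule T <- alpha * T."""
    rng = random.Random(seed)
    x, T = x0, T0
    best, best_e = x0, energy(x0)
    for _ in range(iters):
        y = neighbor(x, rng)
        if metropolis_accept(energy(y) - energy(x), T, rng):
            x = y
        if energy(x) < best_e:
            best, best_e = x, energy(x)   # remember the best state seen
        T *= alpha
    return best
```

For example, minimizing the energy (x − 3)² with Gaussian neighbor moves settles near x = 3 as T shrinks.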

Special cases

- If T is always zero, simulated annealing reduces to greedy local optimization
- If T is constant but non-zero, simulated annealing reduces to the process we described for Gibbs sampling (choosing solutions randomly with probability proportional to their improvement)