Lecture 8: Hidden Markov Models (HMMs)

Transcript and Presenter's Notes

1
Lecture 8: Hidden Markov Models (HMMs)
Prepared by
  • Michael Gutkin
  • Shlomi Haba

Originally presented at Yaakov Stein's DSPCSP
Seminar, spring 2002.
Modified by Benny Chor, also using some slides by
Nir Friedman (Hebrew Univ.), for the
Computational Genomics Course, Tel-Aviv Univ.,
Dec. 2002.
2
Outline
  • Discrete Markov Models
  • Hidden Markov Models
  • Three major questions
  • Q1. Computing the probability of a given
    observation.
  • A1. The Forward-Backward (Baum-Welch) DP
    algorithm.
  • Q2. Computing the most probable state sequence,
    given an observation.
  • A2. The Viterbi DP algorithm.
  • Q3. Given an observation, learn the best model.
  • A3. Expectation Maximization (EM), a heuristic.

3
Markov Models
  • A discrete (finite) system
  • N distinct states.
  • Begins (at time t=1) in some initial state.
  • At each time step (t=1,2,...) the system moves
    from the current state to the next state (possibly
    the same as the current state) according to the
    transition probabilities associated with the
    current state.
  • This kind of system is called a Discrete Markov
    Model.

4
Discrete Markov Model
  • Example: a Discrete Markov Model with 5 states
  • Each of the aij represents the probability of
    moving from state i to state j
  • The aij are given in a matrix A = {aij}
  • The probability of starting in a given state i is
    πi. The vector π represents these
    start probabilities (see the note below).
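A minimal note on the quantities just introduced, in the standard notation (the slide's 5-state diagram itself is not reproduced in this transcript):

```latex
% Standard definitions assumed throughout these slides:
% a_{ij} - transition probability, \pi_i - start probability.
\[
  a_{ij} = P(q_{t+1} = j \mid q_t = i), \qquad
  \pi_i = P(q_1 = i),
\]
\[
  \sum_{j=1}^{N} a_{ij} = 1 \;\; \text{for every state } i, \qquad
  \sum_{i=1}^{N} \pi_i = 1 .
\]
% Markov property: the next state depends only on the current one,
% P(q_{t+1} \mid q_t, q_{t-1}, \ldots, q_1) = P(q_{t+1} \mid q_t).
```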

5
Types of Models
  • Ergodic model
  • Strongly connected - there is a directed
    path with positive transition probabilities
    from each state i to each state j
    (but not necessarily a complete directed graph)

6
Types of Models (cont.)
  • Left-to-Right (LR) model
  • Index of state non-decreasing with time

7
Discrete Markov Model - Example
  • States: Rainy=1, Cloudy=2, Sunny=3
  • Matrix A of transition probabilities
  • Problem: given that the weather on day 1 (t=1)
    is sunny (3), what is the probability of the
    observation sequence O?

8
Discrete Markov Model Example (cont.)
  • The answer is obtained by multiplying the
    transition probabilities along the observed weather
    sequence, starting from the given sunny state
    (see the sketch below).
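The matrix entries and the specific observation sequence from the original slide did not survive this transcript, so the sketch below uses assumed values (the classic Rabiner weather example) purely to illustrate the calculation; only the method, not the numbers, is taken from the slide.

```python
# Hedged illustration of the slide's calculation. The transition matrix and
# the observed weather sequence are ASSUMED (Rabiner's classic example);
# the actual numbers on the slide are not in this transcript.
import numpy as np

# States (0-based): Rainy=0, Cloudy=1, Sunny=2
A = np.array([[0.4, 0.3, 0.3],   # from Rainy
              [0.2, 0.6, 0.2],   # from Cloudy
              [0.1, 0.1, 0.8]])  # from Sunny

# Assumed observation: Sunny on day 1 (given), then Sunny, Sunny, Rainy,
# Rainy, Sunny, Cloudy, Sunny
O = [2, 2, 2, 0, 0, 2, 1, 2]

# P(O | M, q1 = Sunny) is the product of transition probabilities along O
prob = 1.0
for prev, curr in zip(O[:-1], O[1:]):
    prob *= A[prev, curr]
print(prob)  # 0.8*0.8*0.1*0.4*0.3*0.1*0.2 = 1.536e-4
```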

9
Hidden Markov Models (probabilistic
finite state automata)
  • Often we face scenarios where the states cannot be
    directly observed.
  • We need an extension: Hidden Markov Models

aij are state transition probabilities. bik are
observation (output) probabilities.
Observed phenomenon
b11 + b12 + b13 + b14 = 1, b21 + b22 + b23 + b24 = 1, etc.
10
Example: The Dishonest Casino
Actually, what is hidden in this model?
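What is hidden is which die is currently in use; only the rolled numbers are observed. A minimal sketch of such a model follows, with switching and loaded-die probabilities assumed for illustration (the standard textbook values, not necessarily those on the slide).

```python
# Hedged sketch of a dishonest-casino HMM: the hidden state is the die in use
# (Fair or Loaded); the observation is the number rolled. All probabilities
# below are ASSUMED illustrative values, not taken from the slide.
states = ["Fair", "Loaded"]
rolls = [1, 2, 3, 4, 5, 6]

pi = {"Fair": 0.5, "Loaded": 0.5}                    # start probabilities
A = {"Fair":   {"Fair": 0.95, "Loaded": 0.05},       # transition probabilities
     "Loaded": {"Fair": 0.10, "Loaded": 0.90}}
B = {"Fair":   {r: 1.0 / 6.0 for r in rolls},        # emission probabilities
     "Loaded": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}
```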
11
Biological Example CpG islands
  • In the human genome, CpG dinucleotides are
    relatively rare
  • CpG pairs undergo a process called methylation
    that modifies the C nucleotide
  • A methylated C can (with relatively high
    probability) mutate to a T
  • Promoter regions are CpG rich
  • These regions are not methylated, and thus mutate
    less often
  • These are called CpG islands

12
CpG Islands
  • We construct two Markov chains: one for CpG-
    rich regions, one for CpG-poor regions.
  • Using observations from 60K nucleotides, we get
    two models, "+" and "-" (see the note below).
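A hedged note on how the two chains are typically used for discrimination (the trained "+" and "-" transition matrices themselves are not reproduced in this transcript):

```latex
% Log-odds score of a window x = x_1 ... x_L under the two chains:
\[
  S(x) \;=\; \log \frac{P(x \mid \text{model}^{+})}{P(x \mid \text{model}^{-})}
       \;=\; \sum_{t=2}^{L} \log \frac{a^{+}_{x_{t-1} x_t}}{a^{-}_{x_{t-1} x_t}} ,
\]
% a positive score suggests that x lies inside a CpG island.
```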

13
HMMs Question I
  • Given an observation sequence O = O1 O2 O3 ... OT,
    and a model M = {A, B, π}, how do we
    efficiently compute P(O|M), the probability that
    the given model M produces the observation O in a
    run of length T?
  • This probability can be viewed as a measure of the
    quality of the model M. Viewed this way, it
    enables discrimination/selection among
    alternative models.

14
HMM Question II (Harder)
  • Given an observation sequence O = O1 O2 O3 ... OT,
    and a model M = {A, B, π}, how do we
    efficiently compute the most probable sequence(s)
    of states, Q?
  • That is, the sequence of states Q = Q1 Q2 Q3 ... QT
    which maximizes P(O|Q,M), the probability
    that the given model M produces the given
    observation O when it goes through the specific
    sequence of states Q.
  • Recall that given a model M, a sequence of
    observations O, and a sequence of states Q, we
    can efficiently compute P(O|Q,M) (we should watch
    out for numeric underflows).

15
HMM Question III (Hardest)
  • Given an observation sequence O = O1 O2 O3 ... OT,
    and a class of models, each of the form M = {A, B, π},
    which specific model "best" explains the
    observations?
  • A solution to Question I enables the efficient
    computation of P(O|M) (the probability that a
    specific model M produces the observation O).
  • Question III can be viewed as a learning problem:
    we want to use the sequence of observations
    in order to train an HMM and learn the optimal
    underlying model parameters (transition and output
    probabilities).

16
HMM Recognition (question I)
  • For a given model M = {A, B, π} and a given
    state sequence Q1 Q2 Q3 ... QT, the probability of an
    observation sequence O1 O2 O3 ... OT is
    P(O|Q,M) = bQ1O1 bQ2O2 bQ3O3 ... bQTOT
  • For a given hidden Markov model M = {A, B, π},
    the probability of the state sequence Q1 Q2 Q3 ... QT
    is (the initial probability of Q1 is taken to be πQ1)
    P(Q|M) = πQ1 aQ1Q2 aQ2Q3 aQ3Q4 ... aQT-1QT
  • So, for a given hidden Markov model M,
    the probability of an observation sequence O1 O2 O3 ... OT
    is obtained by summing over all possible state
    sequences

17
HMM Recognition (cont.)
  • P(O|M) = ΣQ P(O|Q,M) P(Q|M)
           = ΣQ πQ1 bQ1O1 aQ1Q2 bQ2O2 aQ2Q3 bQ3O3 ...
  • This requires summing over exponentially many paths
  • But it can be made more efficient
    (a brute-force sketch of the naive sum is given below)
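To make the exponential sum concrete, here is a hedged brute-force sketch that literally enumerates every state sequence; the function name and array layout are mine, not from the slides, and it is only feasible for tiny models.

```python
# Hedged brute-force evaluation of P(O|M): enumerate every state sequence Q.
# Exponential in T; shown only to motivate the forward-backward algorithm.
from itertools import product

def brute_force_likelihood(A, B, pi, O):
    """A: NxN transition probs, B: NxK emission probs (nested lists or arrays),
    pi: length-N start probs, O: list of observation indices."""
    N = len(pi)
    total = 0.0
    for Q in product(range(N), repeat=len(O)):       # all N**T state sequences
        p = pi[Q[0]] * B[Q[0]][O[0]]
        for t in range(1, len(O)):
            p *= A[Q[t - 1]][Q[t]] * B[Q[t]][O[t]]
        total += p
    return total
```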

18
HMM Recognition (cont.)
  • Why isn't it efficient? It takes O(2T·Q^T) operations
  • For a given state sequence of length T we have
    about 2T calculations:
    P(Q|M) = πQ1 aQ1Q2 aQ2Q3 aQ3Q4 ... aQT-1QT
    P(O|Q) = bQ1O1 bQ2O2 bQ3O3 ... bQTOT
  • There are Q^T possible state sequences
  • So, if Q=5 and T=100, then the algorithm
    requires 2 · 100 · 5^100 ≈ 1.6 · 10^72 computations
  • We can use the forward-backward (F-B) algorithm

19
The F-B Algorithm
  • Some definitions
  • 1. Legal final state - a state at which a path
    through the model may end.
  • 2. α (alpha) - a forward-going probability
  • 3. β (beta) - a backward-going probability
    (the standard definitions are sketched below)
  • 4. a(j|i) = aij ;  b(O|i) = biO
  • 5. O_1^t - the observations O1 O2 ... Ot at times 1,2,...,t
    (O1 at t=1, O2 at t=2, etc.)
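The slide's own formulas for α and β did not survive the transcript; the standard definitions they almost certainly refer to are:

```latex
\[
  \alpha_t(i) \;=\; P(O_1 O_2 \cdots O_t,\; q_t = i \mid M), \qquad
  \beta_t(i)  \;=\; P(O_{t+1} O_{t+2} \cdots O_T \mid q_t = i,\; M).
\]
```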
20
The F-B Algorithm (cont.)
  • α can be recursively calculated
  • Stopping condition
  • Moving from state i to state j
  • But we can enter state j from all other states
    (a sketch of the recursion is given below)
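A sketch of the recursion referred to above (the standard forward recursion; the slide's own equations are missing from this transcript):

```latex
% Initialization, then the recursion: to reach state j at time t+1 we can
% come from any state i, hence the sum over i.
\[
  \alpha_1(j) = \pi_j \, b_{j O_1}, \qquad
  \alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \Big] b_{j O_{t+1}},
  \quad 1 \le t \le T-1 .
\]
```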

21
The F-B Algorithm (cont.)
  • Now we can work sequentially
  • And at time t=T we get what we wanted
    (see the sketch below)
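The quantity obtained at t = T (again the standard termination step, sketched here because the slide's formula is missing):

```latex
% If only "legal final states" are allowed (definition 1 above), the sum is
% restricted to those states.
\[
  P(O \mid M) \;=\; \sum_{i=1}^{N} \alpha_T(i).
\]
```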

22
The F-B Algorithm (cont.)
  • The full algorithm

Run Demo
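Since the full algorithm appears on the slide only as a figure/demo, here is a hedged Python sketch of the forward pass under the standard definitions above; the function name and array layout are my own.

```python
# Hedged sketch of the forward pass ("any path" likelihood).
import numpy as np

def forward(A, B, pi, O):
    """Return P(O|M) and the alpha table.
    A: NxN transition matrix, B: NxK emission matrix,
    pi: length-N start probabilities (NumPy arrays),
    O: list of observation indices."""
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                       # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]   # recursion: sum over previous states
    return alpha[-1].sum(), alpha                    # termination: sum over final states
```

In practice the α values are rescaled at each step (or computed in log space) to avoid underflow, which is the same numerical issue the Viterbi slides address with negative logarithms.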
23
The F-B Algorithm (cont.)
  • The likelihood is measured using any sequence of
    states of length T
  • This is known as the "Any Path" method
  • We can also score an HMM by the probability generated
    using the best possible sequence of states
  • We'll refer to this method as the "Best Path"
    method

24
Most Probable States Sequence (ques. II)
  • Idea:
  • If we know the value of Qi, then the most
    probable sequence on i+1,...,n does not depend on
    observations before time i
  • Let Vl(i) be the probability of the best sequence
    Q1,...,Qi such that Qi = l
    (the resulting recursion is sketched below)
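A sketch of the recursion that follows from this definition (the standard Viterbi recursion; the slide's own formula is not in the transcript):

```latex
\[
  V_l(1) = \pi_l\, b_{l O_1}, \qquad
  V_l(i+1) = b_{l O_{i+1}} \cdot \max_{k} \big[ V_k(i)\, a_{kl} \big],
\]
% with a back-pointer recording the maximizing k at each step, so the best
% state sequence can be recovered by tracing back from the end.
```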

25
Viterbi Algorithm
  • A DP problem
  • Grid:
  • X - frame index, t (time)
  • Q - state index, i
  • Constraints:
  • Every path must advance in time by one, and only
    one, time step for each path segment
  • Final grid points on any path must be of the form
    (T, if), where if is a legal final state in the
    model

26
Viterbi Algorithm (cont.)
  • Cost:
  • Node (t,i) - the probability of emitting the
    observation y(t) in state i, i.e. bi,y(t)
  • Transition from (t-1,i) to (t,j) - the
    probability of changing state from i to j, i.e. aij
  • The total cost associated with a path is given
    by the product of the costs (type B)
  • Initial transition cost: a0i = πi
  • Goal:
  • The best path will be the one of maximum cost

27
Viterbi Algorithm (cont.)
  • We can use the trick of taking negative
    logarithms
  • Multiplications of probabilities are expensive
    and numerically problematic (underflow)
  • Sums of negative logarithms are numerically stable
    and simpler
  • The problem is turned into a minimal-cost path
    search (see the sketch below)
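A hedged Python sketch of Viterbi computed in log space (equivalent to the slide's minimal-cost search over negative logarithms); all names are mine, not from the slides.

```python
# Hedged sketch of Viterbi decoding in log space: products of probabilities
# become sums, and maximizing the sum of logs is the same as minimizing the
# sum of negative logs (the slide's minimal-cost path search).
import numpy as np

def viterbi(A, B, pi, O):
    """Return the most probable state sequence for observations O.
    A: NxN transitions, B: NxK emissions, pi: length-N start probs (NumPy)."""
    N, T = len(pi), len(O)
    with np.errstate(divide="ignore"):               # log(0) -> -inf is fine here
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    cost = np.zeros((T, N))                          # cost[t, i]: best log-prob ending in state i
    back = np.zeros((T, N), dtype=int)               # back-pointers
    cost[0] = logpi + logB[:, O[0]]
    for t in range(1, T):
        scores = cost[t - 1][:, None] + logA         # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        cost[t] = scores.max(axis=0) + logB[:, O[t]]
    # Trace back from the best final state
    path = [int(cost[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```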

28
Viterbi Algorithm (cont.)

29
HMM EM Training
  • Training uses the Baum-Welch algorithm
  • It is an EM algorithm:
  • Estimate - approximate the result
  • Maximize - and, if needed, re-estimate
  • The estimation step is based on the DP
    algorithms above (F-B, Viterbi)

30
HMM EM Training (cont.)
  • Initializing:
  • Begin with an arbitrary model M
  • Estimate:
  • Evaluate the likelihood P(O|M)
  • Along the way, keep track of some tallies
  • Recalculate the matrices A and B, e.g.
    aij = (number of transitions from i to j) /
          (number of transitions exiting state i)
    (the standard re-estimation formulas are sketched below)
  • Maximize:
  • If P(O|M') > P(O|M) + ε, re-estimate with M = M'
  • Use several initial models to find a favorable
    local maximum of P(O|M)
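The tallies above stand for expected counts; a hedged sketch of the standard Baum-Welch re-estimation formulas, written with the forward/backward variables defined earlier (the slide itself shows only the aij ratio):

```latex
\[
  \xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_{j O_{t+1}}\, \beta_{t+1}(j)}{P(O \mid M)},
  \qquad
  \gamma_t(i) = \sum_{j} \xi_t(i,j),
\]
\[
  \bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)},
  \qquad
  \bar{b}_{ik} = \frac{\sum_{t:\,O_t = k} \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)},
  \qquad
  \bar{\pi}_i = \gamma_1(i).
\]
```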
31
HMM Training (cont.)
  • Why a local maximum?

32
Auxiliary
Physiology
Model
33
Auxiliary cont.
Articulation
34
Auxiliary cont.
Spectrogram