1
Hidden Markov Models (HMM): Rabiner's Paper
  • Markoviana Reading Group
  • Computer Science & Engineering Dept.
  • Arizona State University

2
Stationary and Non-stationary
  • Stationary process: its statistical properties do
    not vary with time
  • Non-stationary process: its signal properties
    vary over time

3
HMM Example - Casino Coin
[Figure: a two-state HMM. States: Fair (F) and Unfair (U). State transition probabilities: F→F 0.9, F→U 0.1, U→F 0.2, U→U 0.8. Symbol emission probabilities over the observation symbols H and T: Fair emits H and T with 0.5 each; Unfair emits H with 0.3 and T with 0.7.]
Observation Sequence
HTHHTTHHHTHTHTHHTHHHHHHTHTHH
State Sequence
FFFFFFUUUFFFFFFUUUUUUUFFFFFF
Motivation: given a sequence of Hs and Ts, can you
tell at what times the casino cheated?
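The following is a minimal sketch in Python/NumPy of the casino model's parameters λ = (A, B, π); the state and symbol ordering, and the uniform initial distribution, are illustrative assumptions rather than part of the slide.

  import numpy as np

  # States: 0 = Fair, 1 = Unfair; observation symbols: 0 = H, 1 = T.
  A = np.array([[0.9, 0.1],    # Fair   -> Fair, Unfair
                [0.2, 0.8]])   # Unfair -> Fair, Unfair
  B = np.array([[0.5, 0.5],    # Fair emits H, T
                [0.3, 0.7]])   # Unfair emits H, T
  pi = np.array([0.5, 0.5])    # initial distribution (assumed uniform; not on the slide)

  # The slide's observation sequence, mapped to symbol indices.
  obs = [0 if c == 'H' else 1 for c in "HTHHTTHHHTHTHTHHTHHHHHHTHTHH"]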
4
Properties of an HMM
  • First-order Markov process
  • The state q_t depends only on q_{t-1}
  • Time is discrete

5
Elements of an HMM
  • N, the number of states
  • M, the number of observation symbols
  • States S_1, S_2, ..., S_N
  • Observation symbols O_1, O_2, ..., O_M
  • λ, the probability distributions: λ = (A, B, π),
    i.e. the transition probabilities a_ij, emission
    probabilities b_j(k), and initial probabilities π_i

6
HMM Basic Problems
  • Given an observation sequence O = O_1 O_2 O_3 ... O_T and λ,
    find P(O|λ)
  • Forward algorithm / backward algorithm
  • Given O = O_1 O_2 O_3 ... O_T and λ, find the most likely state
    sequence Q = q_1 q_2 ... q_T
  • Viterbi algorithm
  • Given O = O_1 O_2 O_3 ... O_T and λ, re-estimate λ so that
    P(O|λ) is higher than it is now
  • Baum-Welch re-estimation

7
Forward Algorithm Illustration
α_t(i) is the probability of observing the partial
sequence O_1 O_2 O_3 ... O_t and being in state S_i at time t.
8
Forward Algorithm Illustration (contd)
α_t(i) is the probability of observing the partial
sequence O_1 O_2 O_3 ... O_t and being in state S_i at time t.
[Trellis: one row per state S_1 ... S_N, one column per observation O_1 ... O_T. Column O_1 holds α_1(j) = π_j b_j(O_1); column O_2 holds α_2(j) = [Σ_i α_1(i) a_ij] b_j(O_2); and so on. The total of the last column gives the solution P(O|λ).]
9
Forward Algorithm
  • Definition: α_t(i) = P(O_1 O_2 ... O_t, q_t = S_i | λ)
  • Initialization: α_1(i) = π_i b_i(O_1), 1 ≤ i ≤ N
  • Induction: α_{t+1}(j) = [Σ_{i=1}^N α_t(i) a_ij] b_j(O_{t+1})
  • Problem 1 answer: P(O|λ) = Σ_{i=1}^N α_T(i)
α_t(i) is the probability of observing the partial
sequence O_1 O_2 ... O_t and being in state S_i at time t.
Complexity: O(N²T)
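A minimal NumPy sketch of this recursion, assuming the (A, B, pi) layout from the casino example above (function and variable names are illustrative):

  import numpy as np

  def forward(A, B, pi, obs):
      # alpha[t, i] corresponds to alpha_{t+1}(i) on the slides (0-indexed time).
      N, T = A.shape[0], len(obs)
      alpha = np.zeros((T, N))
      alpha[0] = pi * B[:, obs[0]]                      # initialization
      for t in range(1, T):                             # induction: O(N^2) per step
          alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
      return alpha, alpha[-1].sum()                     # P(O|lambda) = sum_i alpha_T(i)

For the casino model above, forward(A, B, pi, obs)[1] gives P(O|λ) in O(N²T) operations, as the slide states.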
10
Backward Algorithm Illustration
β_t(i) is the probability of observing the partial
sequence O_{t+1} O_{t+2} O_{t+3} ... O_T given that the state at time t is S_i.
11
Backward Algorithm
  • Definition: β_t(i) = P(O_{t+1} O_{t+2} ... O_T | q_t = S_i, λ)
  • Initialization: β_T(i) = 1, 1 ≤ i ≤ N
  • Induction: β_t(i) = Σ_{j=1}^N a_ij b_j(O_{t+1}) β_{t+1}(j), t = T-1, ..., 1
β_t(i) is the probability of observing the partial
sequence O_{t+1} ... O_T given that the state at time t is S_i.
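The matching backward pass, under the same assumed layout:

  import numpy as np

  def backward(A, B, obs):
      # beta[t, i]: probability of the observations after time t, given state S_i.
      N, T = A.shape[0], len(obs)
      beta = np.zeros((T, N))
      beta[-1] = 1.0                                    # initialization: beta_T(i) = 1
      for t in range(T - 2, -1, -1):                    # induction, backwards in time
          beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
      return beta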
12
Q2 Optimality Criterion 1
  • Maximize the expected number of individually
    correct states
  • Definition: γ_t(i) = P(q_t = S_i | O, λ) = α_t(i) β_t(i) / P(O|λ)
  • Normalization: P(O|λ) = Σ_{i=1}^N α_t(i) β_t(i), so that Σ_i γ_t(i) = 1
  • Problem 2 answer: q_t = argmax_{1 ≤ i ≤ N} γ_t(i), for
    1 ≤ t ≤ T (a sketch follows below)
  • γ_t(i) is the probability of being in state S_i at
    time t given the observation sequence O and the
    model λ.
  • Problem: if some a_ij = 0, the "optimal" state
    sequence may not even be a valid state sequence.
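A sketch of this criterion (often called posterior decoding); the forward and backward passes are inlined so the function is self-contained, and all names are illustrative:

  import numpy as np

  def posterior_decode(A, B, pi, obs):
      # Choose argmax_i gamma_t(i) at each t: maximizes the expected
      # number of individually correct states.
      N, T = A.shape[0], len(obs)
      alpha, beta = np.zeros((T, N)), np.ones((T, N))
      alpha[0] = pi * B[:, obs[0]]
      for t in range(1, T):
          alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
      for t in range(T - 2, -1, -1):
          beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
      gamma = alpha * beta
      gamma /= gamma.sum(axis=1, keepdims=True)         # each row sums to P(O|lambda)
      return gamma.argmax(axis=1)                       # best individual state per t

On the casino data this returns a Fair/Unfair label per flip, but, as the slide warns, the labels need not form a path with nonzero transition probability.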

13
Q2 Optimality Criterion 2
  • Find the single best state sequence (path),
    i.e. maximize P(Q|O,λ).
  • Definition: δ_t(i) = max over q_1 q_2 ... q_{t-1} of
    P(q_1 q_2 ... q_{t-1}, q_t = S_i, O_1 O_2 ... O_t | λ)

δ_t(i) is the highest probability of a single state path
that accounts for the partial observation sequence O_1 O_2 O_3 ... O_t
and ends in state S_i.
14
Viterbi Algorithm
  • Initialization: δ_1(i) = π_i b_i(O_1); ψ_1(i) = 0
  • Recursion: δ_t(j) = max_i [δ_{t-1}(i) a_ij] b_j(O_t); ψ_t(j) = argmax_i [δ_{t-1}(i) a_ij]
  • Termination: P* = max_i δ_T(i); q*_T = argmax_i δ_T(i); then trace back q*_t = ψ_{t+1}(q*_{t+1})
  • The major difference from the forward algorithm: maximization instead of sum
15
Viterbi Algorithm Illustration
δ_t(i) is the highest probability of a single state path
that accounts for the partial observation sequence O_1 O_2 O_3 ... O_t
and ends in state S_i.
[Trellis: one row per state S_1 ... S_N, one column per observation O_1 ... O_T. Column O_1 holds δ_1(j) = π_j b_j(O_1); column O_2 holds δ_2(j) = [max_i δ_1(i) a_ij] b_j(O_2); and so on. The maximum of the last column indicates where the traceback starts.]
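A Viterbi sketch with an explicit backpointer table for the traceback (names illustrative, same layout as before):

  import numpy as np

  def viterbi(A, B, pi, obs):
      # delta[t, j]: highest probability of any single path ending in S_j at t.
      N, T = A.shape[0], len(obs)
      delta = np.zeros((T, N))
      psi = np.zeros((T, N), dtype=int)                 # backpointers
      delta[0] = pi * B[:, obs[0]]                      # initialization
      for t in range(1, T):
          scores = delta[t - 1][:, None] * A            # delta_{t-1}(i) * a_ij
          psi[t] = scores.argmax(axis=0)                # best predecessor of each j
          delta[t] = scores.max(axis=0) * B[:, obs[t]]  # max instead of sum
      path = np.zeros(T, dtype=int)
      path[-1] = delta[-1].argmax()                     # traceback starts at the last column's max
      for t in range(T - 2, -1, -1):
          path[t] = psi[t + 1, path[t + 1]]
      return path, delta[-1].max()                      # best path and P*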
16
Relations with DBN
  • Forward function: α_{t+1}(j) = [Σ_i α_t(i) a_ij] b_j(O_{t+1})
  • Backward function: β_t(i) = Σ_j a_ij b_j(O_{t+1}) β_{t+1}(j), with β_T(i) = 1
  • Viterbi algorithm: δ_{t+1}(j) = [max_i δ_t(i) a_ij] b_j(O_{t+1})

[Figure: each recursion drawn as message passing between the time-t and time-(t+1) slices of the unrolled network.]
17
Some more definitions
γ_t(i) is the probability of being in state S_i at
time t, given O and λ:
  γ_t(i) = α_t(i) β_t(i) / P(O|λ)
ξ_t(i,j) is the probability of being in state S_i
at time t and in state S_j at time t+1, given O and λ:
  ξ_t(i,j) = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / P(O|λ), with γ_t(i) = Σ_{j=1}^N ξ_t(i,j)
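Given alpha and beta arrays from the forward and backward sketches above, both quantities can be computed in one vectorized step (the broadcasting layout is an implementation choice, not from the slides):

  import numpy as np

  def gamma_xi(A, B, alpha, beta, obs):
      # xi[t, i, j] = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P(O|lambda)
      obs = np.asarray(obs)
      P = alpha[-1].sum()                               # P(O|lambda)
      xi = (alpha[:-1, :, None] * A[None, :, :] *
            (B[:, obs[1:]].T * beta[1:])[:, None, :]) / P
      gamma = xi.sum(axis=2)                            # gamma_t(i) = sum_j xi_t(i,j)
      return gamma, xi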
18
Baum-Welch Re-estimation
  • Expectation-Maximization (EM) algorithm
  • Expectation: Σ_{t=1}^{T-1} γ_t(i) is the expected number
    of transitions made from S_i; Σ_{t=1}^{T-1} ξ_t(i,j) is the
    expected number of transitions from S_i to S_j.

19
Baum-Welch Re-estimation (contd)
  • Maximization, the re-estimation formulas (sketched below):
  • π̄_i = γ_1(i), the expected frequency in state S_i at time t = 1
  • ā_ij = Σ_{t=1}^{T-1} ξ_t(i,j) / Σ_{t=1}^{T-1} γ_t(i)
  • b̄_j(k) = Σ_{t: O_t = symbol k} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
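A sketch of one full re-estimation step; it recomputes γ and ξ from unscaled α and β, so it is only suitable for short sequences, and the function name is illustrative:

  import numpy as np

  def reestimate(A, B, pi, obs, alpha, beta):
      # One Baum-Welch update of (pi, A, B) from expected counts.
      obs = np.asarray(obs)
      P = alpha[-1].sum()
      gamma = alpha * beta / P                          # T x N
      xi = (alpha[:-1, :, None] * A[None, :, :] *
            (B[:, obs[1:]].T * beta[1:])[:, None, :]) / P
      pi_new = gamma[0]                                 # expected frequency in S_i at t = 1
      A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
      B_new = np.zeros_like(B)
      for k in range(B.shape[1]):                       # expected emissions of symbol k
          B_new[:, k] = gamma[obs == k].sum(axis=0)
      B_new /= gamma.sum(axis=0)[:, None]
      return pi_new, A_new, B_new

Each row of the returned A_new and B_new sums to 1, matching the note on the next slide.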

20
Notes on the Re-estimation
  • If the model does not change, it has reached a
    local maximum.
  • Depending on the model, many local maxima can
    exist.
  • The re-estimated probabilities will sum to 1.

21
Implementation issues
  • Scaling
  • Multiple observation sequences
  • Initial parameter estimation
  • Missing data
  • Choice of model size and type

22
Scaling
  • Scaling coefficient calculation: c_t = 1 / Σ_{i=1}^N α_t(i)
  • Recursion to calculate the scaled forward variable:
    α̂_t(i) = c_t α_t(i), so that Σ_i α̂_t(i) = 1

23
Scaling (contd)
  • β̂_t(i) calculation: the backward variables are scaled
    with the same coefficients, β̂_t(i) = c_t β_t(i)
  • Desired condition: the scaled variables can replace α_t(i)
    and β_t(i) in the re-estimation formulas without changing the result
  • Note that Σ_i β̂_t(i) = 1 is not true!

24
Scaling (contd)
With scaling, P(O|λ) itself underflows, but the log-likelihood is still available: log P(O|λ) = -Σ_{t=1}^T log c_t
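A scaled forward pass as a sketch; it also yields the log-likelihood directly:

  import numpy as np

  def forward_scaled(A, B, pi, obs):
      # Scale alpha at each step by c_t = 1 / sum_i alpha_t(i) to avoid underflow.
      N, T = A.shape[0], len(obs)
      alpha = np.zeros((T, N))
      c = np.zeros(T)
      alpha[0] = pi * B[:, obs[0]]
      c[0] = 1.0 / alpha[0].sum()
      alpha[0] *= c[0]                                  # scaled column sums to 1
      for t in range(1, T):
          alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
          c[t] = 1.0 / alpha[t].sum()
          alpha[t] *= c[t]
      log_P = -np.log(c).sum()                          # log P(O|lambda) = -sum_t log c_t
      return alpha, c, log_P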
25
Maximum log-likelihood
  • Initialization: φ_1(i) = log π_i + log b_i(O_1)
  • Recursion: φ_t(j) = max_i [φ_{t-1}(i) + log a_ij] + log b_j(O_t)
  • Termination: log P* = max_i φ_T(i) (see the sketch below)
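The same underflow problem can be avoided in Viterbi by working with sums of logs instead of products; a sketch (it assumes strictly positive parameters, otherwise add a small floor before taking logs):

  import numpy as np

  def viterbi_log(A, B, pi, obs):
      # phi_t(j) = max_i [phi_{t-1}(i) + log a_ij] + log b_j(O_t)
      N, T = A.shape[0], len(obs)
      logA = np.log(A)
      phi = np.zeros((T, N))
      psi = np.zeros((T, N), dtype=int)
      phi[0] = np.log(pi) + np.log(B[:, obs[0]])        # initialization
      for t in range(1, T):                             # recursion: additions only
          scores = phi[t - 1][:, None] + logA
          psi[t] = scores.argmax(axis=0)
          phi[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
      path = np.zeros(T, dtype=int)
      path[-1] = phi[-1].argmax()                       # termination: log P* = max_i phi_T(i)
      for t in range(T - 2, -1, -1):
          path[t] = psi[t + 1, path[t + 1]]
      return path, phi[-1].max()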

26
Multiple observation sequences
  • Problem with re-estimation: the formulas above assume a
    single training sequence; with K sequences, maximize
    P(O|λ) = Π_{k=1}^K P(O^(k)|λ) by summing the γ and ξ
    counts over all sequences before dividing.

27
Initial estimates of parameters
  • For π and A,
  • Random or uniform initial estimates are sufficient
  • For B (discrete symbol probabilities),
  • A good initial estimate is needed

28
Insufficient training data
  • Solutions:
  • Increase the size of the training data
  • Reduce the size of the model
  • Interpolate parameters using another model

29
References
  • L. Rabiner. A Tutorial on Hidden Markov Models
    and Selected Applications in Speech Recognition.
    Proceedings of the IEEE, 1989.
  • S. Russell, P. Norvig. Probabilistic Reasoning
    over Time. Artificial Intelligence: A Modern
    Approach, Ch. 15, 2002 (draft).
  • V. Borkar, K. Deshmukh, S. Sarawagi. Automatic
    Segmentation of Text into Structured Records.
    ACM SIGMOD, 2001.
  • T. Scheffer, C. Decomain, S. Wrobel. Active Hidden
    Markov Models for Information Extraction.
    Proceedings of the International Symposium on
    Intelligent Data Analysis, 2001.
  • S. Ray, M. Craven. Representing Sentence Structure
    in Hidden Markov Models for Information
    Extraction. Proceedings of the 17th International
    Joint Conference on Artificial Intelligence, 2001.