Transcript and Presenter's Notes

Title: CSE 552


1
  • CSE 552
  • Hidden Markov Models for Speech Recognition
  • Spring, 2004
  • Oregon Health & Science University
  • OGI School of Science & Engineering
  • John-Paul Hosom
  • Lecture Notes for May 5
  • Gamma, Xi, and the Forward-Backward Algorithm

2
Review α and β
  • Define variable α, which has the meaning of "the probability of observations o1 through ot and being in state i at time t, given our HMM"

Compute α and P(O | λ) with the following procedure (initialization, induction, and termination; the standard equations are given below).
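For reference, the usual Rabiner-style forward-procedure equations (the slide's own equations are images and are not reproduced here; this is the standard form, with N states):

$$\alpha_t(i) = P(o_1 \cdots o_t,\; q_t = i \mid \lambda)$$

Initialization: $\alpha_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N$

Induction: $\alpha_{t+1}(j) = \Big[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\Big]\, b_j(o_{t+1})$

Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$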
3
Review α and β
  • In the same way that we defined α, we can define β
  • Define variable β, which has the meaning of "the probability of observations ot+1 through oT, given that we're in state i at time t, and given our HMM"

Compute β with the following procedure, where the initialization value of 1 is chosen arbitrarily (but won't affect the results); the induction equation is given below.
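The corresponding standard backward-procedure equations (reconstructed in the same notation; the slide shows them as images):

$$\beta_t(i) = P(o_{t+1} \cdots o_T \mid q_t = i,\, \lambda)$$

Initialization: $\beta_T(i) = 1, \quad 1 \le i \le N$

Induction: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \qquad t = T-1, \dots, 1$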
4
Forward Procedure Algorithm Example
  • Example: the word "hi", with states h and ay

[Figure: two-state HMM for "hi"; the transition probabilities (a_h,h = 0.3, a_h,ay = 0.7, a_ay,ay = 0.4, a_ay,h = 0.0) and the observation probabilities b_i(o_t) used below are read off the diagram.]

  • observed features: o1 = 0.8, o2 = 0.8, o3 = 0.2

α1(h) = 0.55        α1(ay) = 0.0
α2(h) = (0.55·0.3 + 0.0·0.0) · 0.55 = 0.09075
α2(ay) = (0.55·0.7 + 0.0·0.4) · 0.15 = 0.05775
α3(h) = (0.09075·0.3 + 0.05775·0.0) · 0.20 = 0.0054
α3(ay) = (0.09075·0.7 + 0.05775·0.4) · 0.65 = 0.0563
Σi α3(i) = 0.0617
5
Backward Procedure Algorithm Example
  • What are all the β values?

β3(h) = 1.0        β3(ay) = 1.0
β2(h) = 0.3·0.20·1.0 + 0.7·0.65·1.0 = 0.515
β2(ay) = 0.0·0.20·1.0 + 0.4·0.65·1.0 = 0.260
β1(h) = 0.3·0.55·0.515 + 0.7·0.15·0.260 = 0.1123
β1(ay) = 0.0·0.55·0.515 + 0.4·0.15·0.260 = 0.0156
β0(·) = 1.0·0.55·0.1123 + 0.0·0.15·0.0156 = 0.0618
β0(·) ≈ Σi α3(i) = P(O | λ)
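The following Python sketch (not part of the original slides) reproduces the α and β computations above for the two-state h/ay example; the π, a, and b values are read off the worked numbers.

```python
# Minimal forward-backward sketch for the two-state "hi" example.
pi = {'h': 1.0, 'ay': 0.0}
a  = {('h', 'h'): 0.3, ('h', 'ay'): 0.7, ('ay', 'h'): 0.0, ('ay', 'ay'): 0.4}
b  = {'h': [0.55, 0.55, 0.20], 'ay': [0.15, 0.15, 0.65]}   # b[i][t-1] = b_i(o_t)
states, T = ['h', 'ay'], 3

# Forward pass (0-based index t): alpha[t][i] = P(o_1 .. o_{t+1}, state i at time t+1)
alpha = [{i: pi[i] * b[i][0] for i in states}]
for t in range(1, T):
    alpha.append({j: sum(alpha[t - 1][i] * a[(i, j)] for i in states) * b[j][t]
                  for j in states})

# Backward pass: beta[t][i] = P(o_{t+2} .. o_T | state i at time t+1)
beta = [dict() for _ in range(T)]
beta[T - 1] = {i: 1.0 for i in states}
for t in range(T - 2, -1, -1):
    beta[t] = {i: sum(a[(i, j)] * b[j][t + 1] * beta[t + 1][j] for j in states)
               for i in states}

p_fwd = sum(alpha[T - 1][i] for i in states)                 # ~0.0617
p_bwd = sum(pi[i] * b[i][0] * beta[0][i] for i in states)    # same value (beta_0)
print(p_fwd, p_bwd)
```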
6
Probability of Gamma
  • Now we can define γ, the probability of being in state i at time t, given an observation sequence and HMM:

γt(i) = P(qt = i | O, λ) = αt(i) βt(i) / P(O | λ)

also P(O | λ) = Σj αt(j) βt(j), so

γt(i) = αt(i) βt(i) / Σj αt(j) βt(j)
7
Probability of Gamma Illustration
Illustration: what is the probability of being in state 2 at time 2?

[Figure: three-state trellis over times t = 1, 2, 3, showing the observation probabilities bi(ot) for each state and the transition probabilities aij into and out of state 2.]
8
Gamma Example
  • Given this 3-state HMM and set of 4 observations, what is the probability of being in state A at time 2?

[Figure: three-state HMM with states A, B, and C; the diagram gives the initial-state, transition, and observation probabilities used in this example.]

O = 0.2, 0.3, 0.4, 0.5
9
Gamma Example
1. Compute forward probabilities up to time 2
10
Gamma Example
2. Compute backward probabilities for times 4, 3,
2
11
Gamma Example
3. Compute γ (a short Python sketch of this step follows)
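An illustrative sketch of step 3, assuming alpha/beta tables laid out as in the earlier forward-backward sketch (the table layout is an assumption, not from the slides):

```python
# Sketch: gamma_t(i) = alpha_t(i) * beta_t(i) / P(O | lambda),
# using alpha[t][i] and beta[t][i] dicts from the forward and backward passes.
def gamma(alpha, beta, t, states):
    num = {i: alpha[t][i] * beta[t][i] for i in states}
    total = sum(num.values())          # equals P(O | lambda) for any t
    return {i: num[i] / total for i in states}
```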
12
Xi
  • We can define one more variable: ξ is the probability of being in state i at time t and in state j at time t+1, given the observations and HMM
  • We can specify ξ as follows (the standard formula is given below)
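The standard formula (reconstructed; the slide's own equation is an image):

$$\xi_t(i,j) = P(q_t = i,\; q_{t+1} = j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$$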

13
Xi Diagram
  • This diagram illustrates ξ

[Figure: three-state trellis over times t-1, t, t+1, t+2; the highlighted arc from state 1 at time t to state 2 at time t+1 is weighted by a12 b2(o at t+1), with α2(1) summarizing the paths entering the arc and β3(2) summarizing the paths leaving it.]
14
Xi Example 1
  • Given the same HMM and observations as before,
    what is ξ2(A,B)?

15
Xi Example 2
  • Given this 3-state HMM and set of 4
    observations, what is the expected number of
    transitions from B to C?

[Figure: the same three-state HMM (states A, B, C) as in the Gamma example.]

O = 0.2, 0.3, 0.4, 0.5
16
Xi Example 2
17
Xi
  • We can also specify γ in terms of ξ
  • and finally, the expected transition counts follow directly (the standard relations are given below)
  • But why do we care?
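The standard relations referred to above (reconstructed in the usual notation; the slide's equations are images):

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j)$$

$$\sum_{t=1}^{T-1} \gamma_t(i) = \text{expected number of transitions out of state } i, \qquad \sum_{t=1}^{T-1} \xi_t(i,j) = \text{expected number of transitions from state } i \text{ to state } j$$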

18
How Do We Improve Estimates of HMM Parameters?
  • With the Expectation-Maximization algorithm,
    also known as the Baum-Welch method
  • In this case, we can use the following
    re-estimation formulae
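The standard Baum-Welch re-estimates of the initial-state and transition probabilities (reconstructed; the slide shows these as images) are:

$$\bar{\pi}_i = \gamma_1(i), \qquad \bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$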

19
How Do We Improve Estimates of HMM Parameters?
  • For discrete HMMs (the standard symbol re-estimate is given below)
  • After computing the new model parameters, we maximize by substituting the new parameter values in place of the old ones, and then repeat.
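For discrete observation symbols v_k, the standard re-estimate of the output probabilities (reconstructed, not copied from the slide image) is:

$$\bar{b}_j(k) = \frac{\sum_{t\,:\,o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$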

20
How Do We Improve Estimates of HMM Parameters?
  • For continuous HMMs

(j = state, k = mixture component)
p(being in state j with mixture component k)
p(being in state j)
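In the usual Gaussian-mixture notation (c_jk = mixture weights, M = number of components; reconstructed standard forms, not copied from the slide), the quantities labeled above are:

$$\gamma_t(j,k) = \left[\frac{\alpha_t(j)\,\beta_t(j)}{\sum_{j'=1}^{N} \alpha_t(j')\,\beta_t(j')}\right]\left[\frac{c_{jk}\,\mathcal{N}(o_t;\, \mu_{jk}, \Sigma_{jk})}{\sum_{m=1}^{M} c_{jm}\,\mathcal{N}(o_t;\, \mu_{jm}, \Sigma_{jm})}\right], \qquad \bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T}\sum_{k=1}^{M} \gamma_t(j,k)}$$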
21
How Do We Improve Estimates of HMM Parameters?
  • For continuous HMMs

expected value of ot, based on the existing λ
expected value of the diagonal of the covariance matrix, based on the existing λ
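The corresponding standard mean and (diagonal) covariance updates, both weighted by the γ_t(j,k) computed from the existing λ, are:

$$\bar{\mu}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, o_t}{\sum_{t=1}^{T} \gamma_t(j,k)}, \qquad \bar{\Sigma}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\,(o_t - \mu_{jk})(o_t - \mu_{jk})^{\mathsf T}}{\sum_{t=1}^{T} \gamma_t(j,k)}$$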
22
How Do We Improve Estimates of HMM Parameters?
  • EM, as applied to HMMs, is called the Baum-Welch method; it is also called the forward-backward algorithm
  • This process is guaranteed to converge monotonically to a maximum-likelihood estimate.
  • There may be many local maxima; we can't guarantee that the process will reach the globally best result.

23
Multiple Training Files
So far, we've implicitly assumed a single set of observations for training. Most systems are trained on multiple sets of observations (files). This makes it necessary to use accumulators (a Python sketch follows).

Initialize:
  for each file: compute initial state boundaries (e.g. a flat start) and add the information to the accumulators
  compute the average and standard deviation

Update, for each iteration:
  reset the accumulators
  for each file: add the information to the accumulators
  compute the average and standard deviation; update the estimates
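A minimal Python sketch of the accumulator idea; the class name and the single-scalar-Gaussian-per-state assumption are illustrative, not from the slides.

```python
import math

# Sketch of the per-parameter accumulator pattern described above.
class GaussianAccumulator:
    def __init__(self):
        self.weight = 0.0     # total state-occupancy weight
        self.sum = 0.0        # weighted sum of observations
        self.sum_sq = 0.0     # weighted sum of squared observations

    def reset(self):
        self.__init__()

    def add(self, weight, obs):
        self.weight += weight
        self.sum += weight * obs
        self.sum_sq += weight * obs * obs

    def estimate(self):
        mean = self.sum / self.weight
        std = math.sqrt(max(self.sum_sq / self.weight - mean * mean, 0.0))
        return mean, std

# Per iteration: reset the accumulators, add every file's statistics,
# then update the model parameters once from the accumulated totals.
```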
24
Viterbi Search Project Notes
  • Assume that any state can follow any other state; this will greatly simplify the implementation.
  • Also assume that this is a whole-word recognizer, and that each word is recognized with a separate execution of the program. This will also greatly simplify the implementation.
  • Print out both the score for the utterance and the most likely state sequence from t = 1 to T (a sketch of the search follows below).
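A minimal log-domain Viterbi sketch under the assumptions above (any state may follow any other; one word model per program run). The names log_pi, log_a, and log_b are illustrative and not part of the assignment.

```python
# Sketch of a log-domain Viterbi search for one utterance.
def viterbi(obs, states, log_pi, log_a, log_b):
    """Return (best log score, most likely state sequence) for one utterance."""
    T = len(obs)
    delta = [{i: log_pi[i] + log_b(i, obs[0]) for i in states}]  # best path scores
    psi = [{}]                                                   # backpointers
    for t in range(1, T):
        delta.append({})
        psi.append({})
        for j in states:
            best_i = max(states, key=lambda i: delta[t - 1][i] + log_a[(i, j)])
            psi[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] + log_a[(best_i, j)] + log_b(j, obs[t])
    last = max(states, key=lambda i: delta[T - 1][i])
    path = [last]
    for t in range(T - 1, 0, -1):      # backtrace from the best final state
        path.append(psi[t][path[-1]])
    path.reverse()
    return delta[T - 1][last], path
```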

25
Viterbi Search Project Notes
  • Does the Normal p.d.f. return probabilities?

Techniques from multivariate calculus must be used to show that the density integrates to 1 over its domain (Devore, p. 138).

Examples:
  ot = 2.0, μ = 4.0, σ = 5.0  →  N = 0.07365
  ot = 3.9, μ = 4.0, σ = 0.2  →  N = 1.76032

Conclusion: when σ is small and ot is near μ, N(ot, μ, σ) exceeds 1, so it yields likelihoods (densities) rather than probabilities.
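For reference, the univariate Gaussian density behind these numbers (standard formula) is

$$\mathcal{N}(o_t;\, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, \exp\!\left(-\frac{(o_t - \mu)^2}{2\sigma^2}\right)$$

It integrates to 1 over o_t, but its value at any single point is a density; with σ = 0.2 the leading factor 1/(σ√(2π)) ≈ 1.99 already exceeds 1.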