Chap 8. Hidden Markov Model - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
Chap 8. Hidden Markov Model
  • The HMM has been the most powerful statistical method for speech recognition.
  • If we treat speech recognition as a template (pattern) matching problem, the most difficult part is the time alignment and normalization of the speech signal.
  • Dynamic Time Warping (DTW) solves this alignment by dynamic programming.

(Figure: a test template aligned against reference templates)
2
DTW
  • Let the test and reference templates be sequences of feature vectors.
  • The global pattern dissimilarity measure accumulates the local frame-to-frame distances along a warping path.
  • Find the optimal (minimum-distortion) warping path.

3
  • DTW → a Viterbi-style search in a trellis (see the sketch below).
  • Complexity is proportional to the number of trellis nodes evaluated.
  • Search constraints in DTW restrict the allowable warping paths.
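To make the DTW recursion concrete, here is a minimal sketch in Python/NumPy. The function name dtw, the squared-Euclidean local distance, and the (i-1, j), (i, j-1), (i-1, j-1) step pattern are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def dtw(test, ref):
    """Dynamic time warping between two feature-vector sequences.

    test: (T, D) array, ref: (R, D) array.
    Returns the accumulated (global) dissimilarity of the optimal path.
    """
    T, R = len(test), len(ref)
    # Local distance: squared Euclidean between frame pairs (an assumption).
    d = np.array([[np.sum((test[i] - ref[j]) ** 2) for j in range(R)]
                  for i in range(T)])

    # Accumulated distance D(i, j) with the common step pattern
    # (i-1, j), (i, j-1), (i-1, j-1).
    D = np.full((T, R), np.inf)
    D[0, 0] = d[0, 0]
    for i in range(T):
        for j in range(R):
            if i == 0 and j == 0:
                continue
            best_prev = min(D[i - 1, j] if i > 0 else np.inf,
                            D[i, j - 1] if j > 0 else np.inf,
                            D[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            D[i, j] = d[i, j] + best_prev
    return D[-1, -1]

# Example: compare a test template against two reference templates.
rng = np.random.default_rng(0)
test = rng.normal(size=(20, 13))
refs = [rng.normal(size=(25, 13)), rng.normal(size=(18, 13))]
scores = [dtw(test, r) for r in refs]
print("best reference:", int(np.argmin(scores)))
```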

4
Markov chain
  • If X is a first-order Markov chain, i.e. P(X_t = j | X_{t-1} = i, X_{t-2}, ...) = P(X_t = j | X_{t-1} = i) = a_ij,
  • then P(X_1, X_2, ..., X_T) = P(X_1) ∏_{t=2}^{T} P(X_t | X_{t-1}).

5
  • A Markov chain is a probabilistic finite state machine.
  • Example
  • Find the probability of a specified state sequence: multiply the initial probability by the transition probabilities along the sequence.

Average state duration in state i: 1 / (1 - a_ii), the geometric duration implied by the self-loop probability.
6
  • Example 1
  • Example 2
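A small sketch, assuming an invented 3-state transition matrix, of the two quantities discussed above: the probability of a specified state sequence under a first-order Markov chain, and the average state duration implied by the self-loop probability.

```python
import numpy as np

# Hypothetical 3-state Markov chain: initial distribution and transition matrix.
pi = np.array([0.6, 0.3, 0.1])
A = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])

def sequence_prob(states):
    """P(X_1, ..., X_T) = P(X_1) * prod_t P(X_t | X_{t-1})."""
    p = pi[states[0]]
    for prev, cur in zip(states[:-1], states[1:]):
        p *= A[prev, cur]
    return p

print(sequence_prob([0, 0, 1, 1, 2]))   # probability of one specified sequence

# Average (expected) duration in state i is geometric: 1 / (1 - a_ii).
print(1.0 / (1.0 - np.diag(A)))
```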

7
Hidden Markov Model
  • In a Markov chain, each state corresponds to a deterministic observation.
  • In an HMM, the observation is a probabilistic function of the state.
  • Example

8
  • Notations in an HMM λ = (A, B, π): N states; transition probabilities A = {a_ij}; observation probabilities B = {b_j(o)}; initial state distribution π. A small concrete instance is sketched below.
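To make the notation concrete, a discrete-observation HMM λ = (A, B, π) can be held as three NumPy arrays. The small values below are invented purely for illustration and are reused in the later sketches.

```python
import numpy as np

N, M = 3, 4                      # N states, M discrete observation symbols
pi = np.array([1.0, 0.0, 0.0])   # initial state distribution (left-to-right start)
A = np.array([[0.6, 0.4, 0.0],   # a_ij = P(q_{t+1} = j | q_t = i), left-to-right
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.5, 0.3, 0.1, 0.1],   # b_j(k) = P(o_t = k | q_t = j)
              [0.1, 0.4, 0.4, 0.1],
              [0.1, 0.1, 0.3, 0.5]])
O = np.array([0, 1, 1, 2, 3])    # an observation (symbol index) sequence
```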

9
  • Using an HMM as a model of speech production:
  • Signals within the same state share the same statistics, e.g. a phone in speech.
  • The state machine captures the template structure of speech.
  • Three problems for HMMs:
  • (1) The evaluation problem (Recognition): given known models and an observation sequence O, find P(O | λ).
  • (2) The decoding problem (Segmentation): given known models and an observation sequence O, find the most likely state sequence.
  • (3) The learning problem (Training): find the model parameters λ = (A, B, π) that maximize P(O | λ).

10
  • Problem 1
  • The evaluation problem (Recognition)
  • How many computations are needed? Direct enumeration over all state sequences costs on the order of N^T operations.
  • Is there a more efficient way to find the likelihood?
  • → The forward-backward method!

11
  • Forward algorithm: α_t(j) = [Σ_i α_{t-1}(i) a_ij] b_j(o_t), with P(O | λ) = Σ_j α_T(j); a sketch follows.
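A minimal sketch of the forward recursion for a discrete-observation HMM; the toy π, A, B, O values are the invented ones from the notation sketch.

```python
import numpy as np

def forward(pi, A, B, O):
    """alpha[t, j] = P(o_1..o_t, q_t = j | lambda); returns alpha and P(O | lambda)."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                      # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]  # induction
    return alpha, alpha[-1].sum()                   # termination

# Invented toy model for illustration.
pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.5, 0.3, 0.1, 0.1],
              [0.1, 0.4, 0.4, 0.1],
              [0.1, 0.1, 0.3, 0.5]])
O = np.array([0, 1, 1, 2, 3])
alpha, likelihood = forward(pi, A, B, O)
print(likelihood)   # P(O | lambda), computed in O(N^2 T) instead of O(N^T)
```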

12
  • Problem 2
  • The decoding problem (Segmentation)

(Figure: trellis showing the survivor path kept at each node)
13
  • Keep one survivor path at each trellis node (Viterbi decoding, sketched below).
  • Complexity ∝ (number of states in the trellis) × (number of time frames).
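A sketch of the Viterbi decoder that keeps exactly one survivor per trellis node and backtracks to recover the segmentation; the toy model values are the same invented ones used above.

```python
import numpy as np

def viterbi(pi, A, B, O):
    """Most likely state sequence; complexity ~ (#states)^2 * (#frames)."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))           # best partial-path score ending in each state
    psi = np.zeros((T, N), dtype=int)  # back-pointer to the survivor's predecessor
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A         # all predecessor -> state scores
        psi[t] = scores.argmax(axis=0)             # keep one survivor per node
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    # Backtrack along the surviving path.
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1, q[t + 1]]
    return q, delta[-1].max()

pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.5, 0.3, 0.1, 0.1],
              [0.1, 0.4, 0.4, 0.1],
              [0.1, 0.1, 0.3, 0.5]])
O = np.array([0, 1, 1, 2, 3])
states, score = viterbi(pi, A, B, O)
print(states)   # segmentation: which state produced each frame
```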

14
  • Problem 3
  • The learning problem (Training)
  • Estimate the HMM parameters with the Baum-Welch algorithm.

15
  • Find the probability of a transition from state i to state j at time t: ξ_t(i, j) = P(q_t = i, q_{t+1} = j | O, λ).
  • State occupancy probability: γ_t(i) = P(q_t = i | O, λ) = Σ_j ξ_t(i, j).

16
  • Parameter estimation using the EM algorithm: re-estimate π, A, and B from the accumulated γ and ξ statistics.

17
  • The re-estimated A and B must satisfy the stochastic constraints Σ_j a_ij = 1 and Σ_k b_j(k) = 1 (enforced by the normalizations in the sketch below).
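A sketch of one Baum-Welch (EM) iteration for a discrete-observation HMM: compute ξ and γ from the forward and backward variables, then re-estimate π, A, and B; normalizing each row enforces the stochastic constraints by construction. The toy model and observation sequence are invented; a practical implementation would also apply the per-frame scaling discussed on the normalization slide near the end.

```python
import numpy as np

def forward(pi, A, B, O):
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    return alpha

def backward(pi, A, B, O):
    T, N = len(O), len(pi)
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(pi, A, B, O):
    T, N = len(O), len(pi)
    M = B.shape[1]
    alpha, beta = forward(pi, A, B, O), backward(pi, A, B, O)
    likelihood = alpha[-1].sum()

    # xi[t, i, j] = P(q_t = i, q_{t+1} = j | O, lambda)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * B[:, O[1:]].T[:, None, :] * beta[1:, None, :]) / likelihood
    # gamma[t, i] = P(q_t = i | O, lambda)
    gamma = alpha * beta / likelihood

    # Re-estimation; row normalization enforces the stochastic constraints.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros((N, M))
    for k in range(M):
        new_B[:, k] = gamma[O == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, likelihood

# Invented toy model and data for illustration.
pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
B = np.array([[0.5, 0.3, 0.1, 0.1],
              [0.1, 0.4, 0.4, 0.1],
              [0.1, 0.1, 0.3, 0.5]])
O = np.array([0, 1, 1, 2, 3, 3, 2])
for _ in range(5):
    pi, A, B, lik = baum_welch_step(pi, A, B, O)
print(lik)   # the likelihood is non-decreasing across EM iterations
```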

18
  • Multiple training sequences: accumulate the γ and ξ statistics over all sequences before re-estimating.
  • Multiple models: the same re-estimation procedure is applied to each model.

19
Continuous HMM
  • The observation probability is a continuous mixture-Gaussian pdf (a sketch of evaluating it follows).
  • The re-estimation is similar to the discrete case, with γ split among the mixture components.
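A minimal sketch of evaluating the continuous observation probability b_j(o) as a Gaussian mixture, using scipy.stats.multivariate_normal; the mixture weights, means, and covariances are invented for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_obs_prob(o, weights, means, covs):
    """b_j(o) = sum_m c_{jm} N(o; mu_{jm}, Sigma_{jm}) for one state j."""
    return sum(c * multivariate_normal.pdf(o, mean=mu, cov=cov)
               for c, mu, cov in zip(weights, means, covs))

# Invented 2-component mixture in a 2-dimensional feature space.
weights = np.array([0.4, 0.6])                  # c_{jm}, summing to 1
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]

o = np.array([1.0, 0.5])                        # one observation (feature) vector
print(mixture_obs_prob(o, weights, means, covs))
```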

20
  • The re-estimation formulas for the mixture weights, means, and covariances follow,
  • where γ_t(j, k) denotes the probability of being in state j at time t with mixture component k accounting for o_t.

21
Semicontinuous HMM (SCHMM)
  • The continuous observation pdf is a mixture Gaussian.
  • The same set of mixture components is shared by all states, but each state has its own weights.
  • Parameter estimation is therefore more accurate (fewer free parameters per state).
  • Observation pdf
  • Parameter estimation

22
ASR for continuous speech
  • One-state (one-pass) algorithm
  • Only one survivor is kept in each frame.
  • → Frame-synchronous search.

23
  • Algorithm for one-state recognizer

24
Segmental K-means
  • If we replace the state occupancy probability with a delta function (a hard assignment),
  • and use forced alignment (Viterbi decoding) to find the state sequence,
  • → the segmental K-means algorithm (sketched below).
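A sketch of one segmental K-means iteration under simplifying assumptions (single-Gaussian, unit-variance observation models): Viterbi forced alignment gives a hard, delta-function state assignment for every frame, and each state's mean is then re-estimated only from its own frames. All names and example values are illustrative.

```python
import numpy as np

def segmental_kmeans_step(X, means, A, pi):
    """One segmental K-means iteration with single-Gaussian (unit-variance) states.

    X: (T, D) feature frames. Hard-assign frames to states by Viterbi
    (forced alignment), then re-estimate each state's mean from its frames.
    """
    T, D = X.shape
    N = len(means)
    # Log observation scores: log N(x; mu_j, I) up to an additive constant.
    logb = -0.5 * ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    logA, logpi = np.log(A + 1e-300), np.log(pi + 1e-300)

    # Viterbi forced alignment (delta-function state occupancy).
    delta = np.full((T, N), -np.inf)
    psi = np.zeros((T, N), dtype=int)
    delta[0] = logpi + logb[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logb[t]
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1, q[t + 1]]

    # Re-estimate each state's mean only from the frames assigned to it.
    new_means = np.array([X[q == j].mean(axis=0) if np.any(q == j) else means[j]
                          for j in range(N)])
    return new_means, q

# Invented example data and a left-to-right topology.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
means = rng.normal(size=(3, 2))
A = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])
pi = np.array([1.0, 0.0, 0.0])
means, q = segmental_kmeans_step(X, means, A, pi)
print(q)   # the hard (forced-alignment) state sequence
```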

25
Implementation of HMM
  • HMM topologies: left-to-right, one step forward only,
  • or allowing a one-state skip;
  • word graph.
  • Design parameters of HMM models:
  • (1) How many words need to be recognized?
  • (2) How much training data is available?
  • (3) Speaker dependent or independent?
  • (4) Sub-word models need less training data!
  • (5) Unit selection: phones for an alphabetic language like English; initials/finals for Mandarin.
  • (6) Number of states per sub-word unit: 3 states for a phone-level unit; 3/5 states for initials/finals.

26
  • Initial HMM model:
  • (1) Hand labeling (down to the word or phone level).
  • (2) Uniform segmentation (within the word/phone level).
  • (3) Use an existing HMM model for forced alignment, i.e. solve the decoding problem to find the segmentation.
  • Number of mixtures in a CHMM:
  • (1) The inside (training-set) recognition rate increases as the number of mixtures increases (risk of overfitting).
  • (2) The number of states should depend on the amount of training data.
  • (3) Degenerate cases arise if there are too few training samples: combine similar models/states, e.g. zh and z sharing the same models (model/state tying).
  • (4) Gender-dependent models.
  • (5) Left/right context-dependent models (diphone/triphone models).

27
Normalization in Baum-Welch algorithm
  • In the Baum-Welch algorithm, the forward and backward variables may underflow (or overflow) numerically on long observation sequences.
  • Normalization: scale α_t and β_t by a common per-frame factor c_t and recover log P(O | λ) from the scaling factors (see the sketch below).
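A sketch of the scaled forward pass: α_t is normalized to sum to one at every frame, the scaling factors c_t are kept, and log P(O | λ) is recovered as Σ_t log c_t, so nothing underflows even on long sequences. The toy model values are the same invented ones used earlier.

```python
import numpy as np

def forward_scaled(pi, A, B, O):
    """Scaled forward pass: returns normalized alpha_hat and log P(O | lambda)."""
    T, N = len(O), len(pi)
    alpha_hat = np.zeros((T, N))
    log_likelihood = 0.0
    alpha = pi * B[:, O[0]]
    for t in range(T):
        if t > 0:
            alpha = (alpha_hat[t - 1] @ A) * B[:, O[t]]
        c = alpha.sum()                  # scaling factor for frame t
        alpha_hat[t] = alpha / c         # normalized so it sums to 1
        log_likelihood += np.log(c)      # accumulate log P instead of the raw product
    return alpha_hat, log_likelihood

pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.5, 0.3, 0.1, 0.1],
              [0.1, 0.4, 0.4, 0.1],
              [0.1, 0.1, 0.3, 0.5]])
O = np.array([0, 1, 1, 2, 3] * 100)     # a long sequence: unscaled alpha would underflow
print(forward_scaled(pi, A, B, O)[1])
```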

28
Duration model in HMM
  • In fact, in an HMM the observation probability B is the most important term. The transition probability A is often almost ignored, because the dynamic range of b(·) is usually much larger than that of a.
  • The effect of the transition probability A can instead be captured by a duration model.
  • Multiply a Gamma duration pdf into the state-transition score.
  • The log Γ(·) term can be evaluated numerically.
  • The parameters of the Gamma pdf can be estimated from the observed state durations (a sketch follows).
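A sketch of one way to obtain the Gamma duration pdf parameters: a method-of-moments fit from the mean and variance of observed state durations (this particular fitting choice is an assumption, not stated on the slide). scipy.special.gammaln supplies the log Γ(·) term numerically.

```python
import numpy as np
from scipy.special import gammaln

def fit_gamma_duration(durations):
    """Method-of-moments fit of a Gamma pdf to observed state durations."""
    mean, var = np.mean(durations), np.var(durations)
    shape = mean ** 2 / var          # k
    scale = var / mean               # theta
    return shape, scale

def log_gamma_duration(d, shape, scale):
    """log of the Gamma pdf, usable as an additive duration score in decoding."""
    return ((shape - 1) * np.log(d) - d / scale
            - shape * np.log(scale) - gammaln(shape))

# Invented example: durations (in frames) spent in one state during training.
durations = np.array([4, 6, 5, 7, 3, 5, 6, 8, 4, 5])
shape, scale = fit_gamma_duration(durations)
print(shape, scale, log_gamma_duration(5, shape, scale))
```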