1
HMM - Part 2
  • The EM algorithm
  • Continuous density HMM

2
The EM Algorithm
  • EM = Expectation Maximization
  • Why EM?
  • Simple optimization algorithms for likelihood
    functions rely on intermediate variables, called
    latent data; for HMM, the state sequence is the
    latent data
  • Direct access to the data necessary to estimate
    the parameters is impossible or difficult; for
    HMM, it is almost impossible to estimate (A, B,
    π) without considering the state sequence
  • Two Major Steps
  • E step: computes an expectation of the likelihood
    by including the latent variables as if they were
    observed
  • M step: computes the maximum likelihood estimates
    of the parameters by maximizing the expected
    likelihood found in the E step
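As a rough sketch (not from the slides), the two steps fit a generic loop; e_step and m_step below are hypothetical callbacks standing in for the model-specific computations:

  def run_em(e_step, m_step, model, data, max_iters=100, tol=1e-6):
      prev_loglik = float("-inf")
      for _ in range(max_iters):
          # E step: expected statistics of the latent data under the
          # current model, plus the current log-likelihood
          stats, loglik = e_step(model, data)
          # M step: re-estimate parameters from the expected statistics
          model = m_step(stats)
          if loglik - prev_loglik < tol:   # likelihood never decreases
              break
          prev_loglik = loglik
      return model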

3
Three Steps for EM
  • Step 1. Draw a lower bound
  • Use Jensen's inequality
  • Step 2. Find the best lower bound → the auxiliary
    function
  • Let the lower bound touch the objective function
    at the current guess
  • Step 3. Maximize the auxiliary function
  • Obtain the new guess
  • Go to Step 2 until convergence

[Minka 1998]
4
Form an Initial Guess of λ = (A, B, π)
(Figure: the objective function with the current guess marked)
5
Step 1. Draw a Lower Bound
(Figure: the objective function and a lower bound function)
6
Step 2. Find the Best Lower Bound
(Figure: the objective function and the lower bound function touching at the current guess)
7
Step 3. Maximize the Auxiliary Function
(Figure: the objective function and the auxiliary function)
8
Update the Model
(Figure: the objective function with the updated model)
9
Step 2. Find the Best Lower Bound
(Figure: the objective function and the new auxiliary function)
10
Step 3. Maximize the Auxiliary Function
(Figure: the objective function with the next updated model)
11
Step 1. Draw a Lower Bound (contd)
Objective function: log P(O|λ)
If f is a concave function and X is a random variable,
then E[f(X)] ≤ f(E[X])
Apply Jensen's inequality
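In the notation used below (q(Q) is an arbitrary distribution over state sequences, introduced here for illustration), the resulting lower bound can be sketched as:

  \log P(O \mid \lambda)
    = \log \sum_{Q} q(Q)\,\frac{P(O,Q \mid \lambda)}{q(Q)}
    \;\ge\; \sum_{Q} q(Q)\,\log \frac{P(O,Q \mid \lambda)}{q(Q)}

The inequality holds because log is concave; equality is the "touching" condition used in Step 2.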
12
Step 2. Find the Best Lower Bound (contd)
  • Find the distribution over the latent state
    sequences that makes the lower bound function
    touch the objective function at the current guess

13
Step 2. Find the Best Lower Bound (contd)
Set it to zero
14
Step 2. Find the Best Lower Bound (contd)
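Setting the derivative to zero (a standard reconstruction, in the q(Q) notation from above) gives the best lower bound: the bound touches the objective function at the current guess when q(Q) is the posterior of the state sequence,

  q^{*}(Q) \;=\; \frac{P(O,Q \mid \lambda)}{\sum_{Q'} P(O,Q' \mid \lambda)}
          \;=\; P(Q \mid O, \lambda)

With this choice, the lower bound equals log P(O|λ) at the current guess.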
15
EM for HMM Training
  • Basic idea
  • Assume we have λ and the probability that each Q
    occurred in the generation of O
  • i.e., we have in fact observed a complete
    data pair (O,Q) with frequency proportional to
    the probability P(O,Q|λ)
  • We then find a new model λ̄ that maximizes the
    expected log-likelihood of the complete data
  • It can be guaranteed that P(O|λ̄) ≥ P(O|λ)
  • EM can discover parameters of model λ to maximize
    the log-likelihood of the incomplete data,
    log P(O|λ), by iteratively maximizing the
    expectation of the log-likelihood of the complete
    data, log P(O,Q|λ)

16
Solution to Problem 3 - The EM Algorithm
  • The auxiliary function
  • where P(O,Q|λ) and log P(O,Q|λ̄)
    can be expressed as shown below
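In the usual Baum-Welch notation (a reconstruction; λ is the current model and λ̄ the new one), the auxiliary function and the complete-data likelihood terms are:

  Q(\lambda, \bar{\lambda}) = \sum_{Q} P(O,Q \mid \lambda)\,\log P(O,Q \mid \bar{\lambda})

  P(O,Q \mid \lambda) = \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)

  \log P(O,Q \mid \bar{\lambda}) = \log \bar{\pi}_{q_1}
    + \sum_{t=2}^{T} \log \bar{a}_{q_{t-1} q_t}
    + \sum_{t=1}^{T} \log \bar{b}_{q_t}(o_t)

(Some texts normalize the sum over Q by P(O|λ); that changes Q only by a constant factor and does not affect the maximization.)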

17
Solution to Problem 3 - The EM Algorithm (contd)
  • The auxiliary function can be rewritten as
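Substituting the decomposition above and collecting terms gives the standard three-term form (again a reconstruction in common notation):

  Q(\lambda, \bar{\lambda})
    = \sum_{i=1}^{N} P(O, q_1 = i \mid \lambda)\,\log \bar{\pi}_i
    + \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=1}^{T-1} P(O, q_t = i, q_{t+1} = j \mid \lambda)\,\log \bar{a}_{ij}
    + \sum_{j=1}^{N} \sum_{t=1}^{T} P(O, q_t = j \mid \lambda)\,\log \bar{b}_j(o_t)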

18
Solution to Problem 3 - The EM Algorithm (contd)
  • The auxiliary function is separated into three
    independent terms, corresponding respectively to
    the initial probabilities π, the transition
    probabilities aij, and the observation
    probabilities bj(·)
  • Maximization of the auxiliary function can be
    done by maximizing the individual terms
    separately subject to probability constraints
  • All these terms have the form sketched below
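A sketch of that common form (wj and yj are generic non-negative weights and probabilities; the notation is an assumption, not the slide's own):

  F(y) = \sum_{j=1}^{N} w_j \log y_j
  \qquad \text{subject to } \sum_{j=1}^{N} y_j = 1,\; y_j \ge 0

Here π̄i, āij, and b̄j(k) each play the role of yj in one of the three terms.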

19
Solution to Problem 3 - The EM Algorithm (contd)
  • Proof: apply a Lagrange multiplier

Constraint: Σj yj = 1
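Sketched out (a reconstruction in the generic wj / yj notation from above):

  \mathcal{L} = \sum_{j} w_j \log y_j + \epsilon \Bigl( \sum_{j} y_j - 1 \Bigr),
  \qquad
  \frac{\partial \mathcal{L}}{\partial y_j} = \frac{w_j}{y_j} + \epsilon = 0
  \;\Rightarrow\;
  y_j = -\frac{w_j}{\epsilon} = \frac{w_j}{\sum_{i} w_i}

Summing the stationarity condition over j and using Σj yj = 1 gives ε = −Σi wi, which yields the last equality.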
20
Solution to Problem 3 - The EM Algorithm (contd)
21
Solution to Problem 3 - The EM Algorithm (contd)
22
Solution to Problem 3 - The EM Algorithm (contd)
23
Solution to Problem 3 - The EM Algorithm (contd)
  • The new model parameter set λ̄ = (Ā, B̄, π̄)
    can be expressed as shown below
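These are the standard Baum-Welch re-estimation formulas for the discrete case (a reconstruction; γt(i) = P(qt = i | O, λ) and ξt(i,j) = P(qt = i, qt+1 = j | O, λ) denote the usual state and transition posteriors):

  \bar{\pi}_i = \gamma_1(i)
  \qquad
  \bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}
  \qquad
  \bar{b}_j(k) = \frac{\sum_{t:\,o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}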

24
Discrete vs. Continuous Density HMMs
  • Two major types of HMMs according to the
    observations
  • Discrete and finite observation
  • The observations that all distinct states
    generate are finite in number, i.e., V = {v1, v2,
    v3, …, vM}, vk ∈ R^L
  • In this case, the observation probability
    distribution in state j, B = {bj(k)}, is defined as
    bj(k) = P(ot = vk | qt = j), 1 ≤ k ≤ M, 1 ≤ j ≤ N
    (ot: observation at time t, qt: state at time t)
  • ⇒ bj(k) consists of only M probability values
  • Continuous and infinite observation
  • The observations that all distinct states
    generate are infinite and continuous, i.e., V =
    {v : v ∈ R^L}
  • In this case, the observation probability
    distribution in state j, B = {bj(v)}, is defined as
    bj(v) = f(ot = v | qt = j), 1 ≤ j ≤ N
    (ot: observation at time t, qt: state at time t)
  • ⇒ bj(v) is a continuous probability density
    function (pdf) and is often a mixture of
    multivariate Gaussian (normal) distributions
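A minimal sketch of the contrast (hypothetical helpers assuming numpy; B is the N×M table for the discrete case, while weights/means/covs hold per-state mixture parameters for the continuous case):

  import numpy as np

  def b_discrete(B, j, k):
      # Discrete HMM: b_j(k) is a table lookup, only M values per state
      return B[j, k]

  def b_continuous(weights, means, covs, j, v):
      # CDHMM: b_j(v) is a Gaussian-mixture density evaluated at vector v
      L = len(v)
      density = 0.0
      for c, mu, sigma in zip(weights[j], means[j], covs[j]):
          diff = v - mu
          norm = np.sqrt((2 * np.pi) ** L * np.linalg.det(sigma))
          density += c * np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm
      return density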

25
Gaussian Distribution
  • A continuous random variable X is said to have a
    Gaussian distribution with mean µ and variance
    σ² (σ > 0) if X has a continuous pdf in the
    following form
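That pdf is the familiar univariate Gaussian density:

  f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}
           \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right),
  \qquad -\infty < x < \infty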

26
Multivariate Gaussian Distribution
  • If X = (X1, X2, X3, …, XL) is an L-dimensional
    random vector with a multivariate Gaussian
    distribution with mean vector µ and covariance
    matrix Σ, then the pdf can be expressed as
  • If X1, X2, X3, …, XL are independent random
    variables, the covariance matrix is reduced to a
    diagonal matrix, i.e.,
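The multivariate density referred to above is:

  f_X(x) = \frac{1}{(2\pi)^{L/2}\,|\Sigma|^{1/2}}
           \exp\!\left( -\tfrac{1}{2}\,(x-\mu)^{\top} \Sigma^{-1} (x-\mu) \right)

With a diagonal Σ = diag(σ1², …, σL²), this factors into a product of L univariate Gaussian densities.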

27
Multivariate Mixture Gaussian Distribution
  • An L-dimensional random vector X = (X1, X2, X3, …, XL)
    has a multivariate mixture Gaussian distribution if
    its pdf is a weighted sum of multivariate Gaussian
    densities, as shown below
  • In CDHMM, bj(v) is a continuous probability
    density function (pdf) and is often a mixture of
    multivariate Gaussian distributions
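The mixture density takes the form (with N(·; µk, Σk) denoting a multivariate Gaussian pdf):

  f_X(v) = \sum_{k=1}^{M} c_k\, N(v;\, \mu_k, \Sigma_k),
  \qquad \sum_{k=1}^{M} c_k = 1,\; c_k \ge 0

In a CDHMM, each state j has its own weights, means, and covariances, so bj(v) = Σk cjk N(v; µjk, Σjk).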

28
Solution to Problem 3 - The Segmental K-means
Algorithm
  • Assume that we have a training set of
    observations and an initial estimate of model
    parameters
  • Step 1: Segment the training data
  • The set of training observation sequences is
    segmented into states, based on the current
    model, by the Viterbi algorithm
  • Step 2: Re-estimate the model parameters
  • Step 3: Evaluate the model. If the difference
    between the new and current model scores exceeds
    a threshold, go back to Step 1; otherwise, return
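A minimal sketch of this loop (viterbi_segment, reestimate, and score are hypothetical callbacks standing in for the Viterbi alignment, parameter re-estimation, and model-scoring steps):

  def segmental_kmeans(model, sequences, viterbi_segment, reestimate, score,
                       threshold=1e-3, max_iters=50):
      current = score(model, sequences)
      for _ in range(max_iters):
          # Step 1: segment each training sequence into states (Viterbi)
          alignments = [viterbi_segment(model, obs) for obs in sequences]
          # Step 2: re-estimate the model parameters from the segments
          model = reestimate(sequences, alignments)
          # Step 3: keep iterating while the model score still improves
          new = score(model, sequences)
          if new - current <= threshold:
              break
          current = new
      return model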

29
Solution to Problem 3 - The Segmental K-means
Algorithm (contd)
  • 3 states and 4 Gaussian mixtures per state

(Figure: observations O1 … ON are aligned to states s1, s2, s3 over
time frames 1 … N; the frames assigned to a state are then clustered
by K-means, splitting the global mean into cluster means, giving that
state's four mixture parameters, e.g. (µ11, Σ11, c11), (µ12, Σ12, c12),
(µ13, Σ13, c13), (µ14, Σ14, c14) for state 1)
30
Solution to Problem 3 - The Intuitive View
(CDHMM)
  • Define a new variable γt(j,k)
  • probability of being in state j at time t with
    the k-th mixture component accounting for ot
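In the usual forward/backward notation (a reconstruction; αt(j) and βt(j) are the forward and backward variables), the definition is:

  \gamma_t(j,k)
    = \left[ \frac{\alpha_t(j)\,\beta_t(j)}{\sum_{i=1}^{N} \alpha_t(i)\,\beta_t(i)} \right]
      \left[ \frac{c_{jk}\, N(o_t;\, \mu_{jk}, \Sigma_{jk})}
                  {\sum_{m=1}^{M} c_{jm}\, N(o_t;\, \mu_{jm}, \Sigma_{jm})} \right]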

31
Solution to Problem 3 - The Intuitive View
(CDHMM) (contd)
  • Re-estimation formulae for the mixture weights
    cjk, mean vectors µjk, and covariance matrices
    Σjk are given below
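The standard re-estimation formulas (a reconstruction in the γt(j,k) notation above) are:

  \bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j,m)},
  \qquad
  \bar{\mu}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, o_t}{\sum_{t=1}^{T} \gamma_t(j,k)},
  \qquad
  \bar{\Sigma}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\,(o_t - \bar{\mu}_{jk})(o_t - \bar{\mu}_{jk})^{\top}}{\sum_{t=1}^{T} \gamma_t(j,k)}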

32
A Simple Example
The Forward/Backward Procedure
(Figure: a trellis with two states S1 and S2 over time frames 1, 2, 3
and observations o1, o2, o3)
33
A Simple Example (contd)
State sequences q: 1 1 1, 1 1 2, …
Total: 8 paths
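As an illustration of this example, here is a sketch in Python that computes P(O|λ) both by brute-force summation over all 8 state paths and by the forward recursion; the model numbers are hypothetical, chosen only to make the script runnable:

  import itertools
  import numpy as np

  # Hypothetical 2-state, 3-symbol model (illustrative values only)
  pi = np.array([0.6, 0.4])              # initial probabilities
  A = np.array([[0.7, 0.3],              # a_ij = P(q_{t+1}=j | q_t=i)
                [0.4, 0.6]])
  B = np.array([[0.5, 0.4, 0.1],         # b_j(k) = P(o_t=v_k | q_t=j)
                [0.1, 0.3, 0.6]])
  O = [0, 2, 1]                          # observations o1, o2, o3

  # Brute force: sum P(O, q | lambda) over all 2^3 = 8 state paths
  brute = 0.0
  for q in itertools.product(range(2), repeat=len(O)):
      p = pi[q[0]] * B[q[0], O[0]]
      for t in range(1, len(O)):
          p *= A[q[t - 1], q[t]] * B[q[t], O[t]]
      brute += p

  # Forward procedure: alpha_t(j) = P(o_1..o_t, q_t = j | lambda)
  alpha = pi * B[:, O[0]]
  for t in range(1, len(O)):
      alpha = (alpha @ A) * B[:, O[t]]

  print(brute, alpha.sum())              # both equal P(O | lambda)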
34
A Simple Example (contd)