1
A Survey of Large Margin Hidden Markov Model
  • Xinwei Li, Hui Jiang
  • York University

2
Reference Papers
  • Xinwei Li, M.S. thesis, Sep. 2005: Large Margin
    HMMs for Speech Recognition
  • Xinwei Li, ICASSP 2005: Large Margin HMMs for
    Speech Recognition
  • Chaojun Liu, ICASSP 2005: Discriminative
    Training of CDHMMs for Maximum Relative
    Separation Margin
  • Xinwei Li, ASRU 2005: A Constrained Joint
    Optimization Method for LME
  • Hui Jiang, IEEE Trans. Speech and Audio
    Processing, 2006: Large Margin HMMs for Speech
    Recognition
  • Jinyu Li, ICSLP 2006: Soft Margin Estimation of
    HMM Parameters

3
Outline
  • Large Margin HMMs
  • Analysis of Margin in CDHMM
  • Optimization methods for Large Margin HMMs
    estimation
  • Soft Margin Estimation for HMM

4
Large Margin HMMs for ASR
  • In ASR, given any speech utterance X, a speech
    recognizer will choose the word W as output based
    on the plug-in MAP decision rule as follows
  • For a speech utterance Xi, assuming its true word
    identity is Wi, the multiclass separation margin
    for Xi is defined as

Discriminant function
Ω denotes the set of all possible words
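
The equations on this slide are not preserved in the transcript; a reconstruction following the cited LME papers is

    \hat{W} = \arg\max_{W \in \Omega} \; p(W)\, p(X \mid \lambda_W)

    d(X_i) = F(X_i \mid \lambda_{W_i}) - \max_{W \in \Omega,\, W \neq W_i} F(X_i \mid \lambda_W)

where F(X \mid \lambda_W) = \log [\, p(W)\, p(X \mid \lambda_W) \,] is the discriminant function of word W.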
5
Large Margin HMMs for ASR
  • According to statistical learning theory (Vapnik),
    the generalization error rate of a classifier on
    new test sets is theoretically bounded by a
    quantity related to its margin
  • Motivated by the large margin principle, even when
    all utterances in the training set have positive
    margins, we may still want to maximize the minimum
    margin to build an HMM-based large margin
    classifier for ASR

6
Large Margin HMMs for ASR
  • Given a set of training data D = {X1, X2, ..., XT},
    we usually know the true word identities of all
    utterances in D, denoted as L = {W1, W2, ..., WT}
  • First, from all utterances in D, we need to
    identify a subset of utterances S as follows
  • We call S the support vector set; each utterance
    in S is called a support token, which has a
    relatively small positive margin among all
    utterances in the training set D

where ε > 0 is a preset positive number
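
The defining equation for S is lost in the transcript; following the cited papers it has the form

    S = \{\, X_i \mid X_i \in D,\; 0 \le d(X_i) \le \epsilon \,\}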
7
Large Margin HMMs for ASR
  • This idea leads to estimating the HMM models Λ
    based on the criterion of maximizing the minimum
    margin of all support tokens, which is called
    large margin estimation (LME) of HMMs
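
Spelled out, the LME criterion (a reconstruction in the notation above) is

    \tilde{\Lambda} = \arg\max_{\Lambda} \; \min_{X_i \in S} d(X_i)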

8
Analysis of Margin in CDHMM
  • Adopting the Viterbi method to approximate the
    summation over all state sequences with the single
    optimal Viterbi path, the discriminant function
    can be expressed as
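
Assuming Gaussian mixture state observation densities (consistent with the CDHMM setting, though the slide's own equation is lost), the Viterbi-approximated discriminant function is roughly

    F(X_i \mid \lambda_W) \approx \log p(W) + \max_{s,\, l} \sum_t \left[ \log a_{s_{t-1} s_t} + \log c_{s_t l_t} + \log \mathcal{N}(x_{it} \mid \mu_{s_t l_t}, \Sigma_{s_t l_t}) \right]

where s is the optimal state sequence and l the corresponding mixture component sequence.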

9
Analysis of Margin in CDHMM
  • Here, we only consider estimating the mean vectors

In this case, the discriminant function can be
represented as a summation of quadratic terms
related to the mean values of the CDHMMs
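
With diagonal covariances and only the means treated as free parameters, a sketch of the resulting form is

    F(X_i \mid \lambda_W) \approx C - \frac{1}{2} \sum_t \sum_d \frac{(x_{itd} - \mu_{s_t l_t d})^2}{\sigma_{s_t l_t d}^2}

where C collects the transition probabilities, mixture weights, and Gaussian normalization terms, none of which depend on the means.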
10
Analysis of Margin in CDHMM
  • As a result, the decision margin can be
    represented as a standard diagonal quadratic form
  • Thus, for each feature vector xit, we can divide
    all of its dimensions into two parts

we can see that each feature dimension
contributes to the decision margin separately
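
Concretely, under the same assumptions the margin against the best competing word is a dimension-wise sum of the form

    d(X_i) = \sum_t \sum_d \left[ \frac{(x_{itd} - \nu_{td})^2}{2 \varsigma_{td}^2} - \frac{(x_{itd} - \mu_{td})^2}{2 \sigma_{td}^2} \right] + \text{const}

where (\mu, \sigma) are the Gaussian parameters along the true model's Viterbi path and (\nu, \varsigma) those of the best competitor. Dimensions with \sigma_{td} = \varsigma_{td} contribute terms linear in x_{itd}; the remaining dimensions contribute quadratic terms, giving the two-part division above.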
11
Analysis of Margin in CDHMM
  • After some mathematical manipulation, the margin
    decomposes into a linear function and a quadratic
    function of the feature components
12
Analysis of Margin in CDHMM
(slides 12-14: derivation figures and equations not preserved in the transcript)
15
Optimization methods for LM HMM estimation
  • An iterative localized optimization method
  • A constrained joint optimization method
  • A semidefinite programming method

16
Iterative localized optimization
  • In order to increase the margin without bound
    while keeping the margins positive for all
    samples, both models must be moved together
  • If we keep one of the models fixed, the other
    model cannot be moved very far under the
    constraint that all samples must have positive
    margins; otherwise the margin for some tokens
    will become negative
  • Instead of optimizing the parameters of all models
    at the same time, only one selected model is
    adjusted in each optimization step
  • The process then iterates to update another model
    until the optimal margin is achieved

17
Iterative localized optimization
  • How to select the target model in each step?
  • The model should be relevant to the support token
    with the minimum margin
  • The minimax optimization can be re-formulated as
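
Substituting the definition of the margin, a reconstruction of the re-formulated problem is

    \tilde{\Lambda} = \arg\max_{\Lambda} \; \min_{X_i \in S,\; W_j \neq W_i} \left[ F(X_i \mid \lambda_{W_i}) - F(X_i \mid \lambda_{W_j}) \right]

so each step picks the pair (X_i, W_j) achieving the inner minimum and updates one of the two models involved.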

18
Iterative localized optimization
  • Approximated by a summation of exponential functions
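
That is, the non-smooth minimum is replaced by a log-sum-exp (softmin) approximation, \min_i d_i \approx -\frac{1}{\eta} \log \sum_i e^{-\eta d_i}. The toy sketch below is hypothetical, not from the papers: two 1-D unit-variance Gaussian models stand in for HMMs, the support-token selection and positivity constraints are omitted, and all names and settings are invented for illustration. It shows a localized update driven by the smoothed minimum margin:

    import numpy as np

    # Toy setup: two 1-D Gaussian "models" with unit variance.
    # The margin of sample x with true class y is the log-likelihood
    # difference between its own model and the competing model.
    mu = np.array([-1.0, 1.0])             # class means (trainable)
    X = np.array([-1.2, -0.4, 0.3, 1.5])   # training samples
    y = np.array([0, 0, 1, 1])             # true class labels

    def margins(mu):
        d_true = -0.5 * (X - mu[y]) ** 2       # log-lik. under true model (+const)
        d_comp = -0.5 * (X - mu[1 - y]) ** 2   # log-lik. under competing model
        return d_true - d_comp

    eta, lr = 10.0, 0.05
    for step in range(200):
        d = margins(mu)
        # Softmin weights: gradient of -(1/eta)*log(sum_i exp(-eta*d_i));
        # shifted by d.min() for numerical stability.
        w = np.exp(-eta * (d - d.min()))
        w /= w.sum()
        # Localized step: only the model tied to the min-margin token moves.
        k = y[np.argmin(d)]
        grad = 0.0
        for i in range(len(X)):
            if y[i] == k:                      # k is sample i's true model
                grad += w[i] * (X[i] - mu[k])
            else:                              # k is sample i's competitor
                grad += w[i] * -(X[i] - mu[k])
        mu[k] += lr * grad                     # gradient ascent on softmin margin

    print(mu, margins(mu).min())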

19
Iterative localized optimization
20
Constrained Joint optimization
  • Introduce some constraints to make the
    optimization problem bounded
  • In this way, the optimization can be performed
    jointly with respect to all model parameters

21
Constrained Joint optimization
  • In order to bound the margin contribution from
    the linear part
  • In order to bound the margin contribution from
    the quadratic part
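
The specific constraint expressions are not preserved in the transcript; a generic locality-style sketch (an assumption, not the papers' exact constraints) bounds how far each Gaussian mean may move from its initial estimate:

    \sum_d \frac{(\mu_{kd} - \mu_{kd}^{(0)})^2}{\sigma_{kd}^2} \le R^2 \quad \text{for every Gaussian } k

which keeps both the linear and the quadratic margin contributions finite.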

22
Constrained Joint optimization
  • Reformulate the large margin estimation as the
    following constrained minimax optimization
    problem
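
In the notation above, a reconstruction reads

    \min_{\Lambda} \; \max_{X_i \in S} \; \big[ -d(X_i) \big] \quad \text{subject to the bounding constraints of the previous slide}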

23
Constrained Joint optimization
  • The constrained minimization problem can be
    transformed into an unconstrained minimization
    problem
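
A typical penalty-based transformation (a sketch; the papers' exact penalty terms may differ) folds each constraint g_j(\Lambda) \le 0 into the objective:

    \min_{\Lambda} \; \max_{X_i \in S} \big[ -d(X_i) \big] + \kappa \sum_j \max\big(0,\, g_j(\Lambda)\big)

with a penalty weight \kappa > 0.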

24
Constrained Joint optimization
25
Soft Margin estimation
  • Model separation measure and frame selection
  • SME objective function and sample selection

26
Soft Margin estimation
  • Difference between SME and LME
  • LME neglects the misclassified samples.
    Consequently, LME often needs a very good
    preliminary estimate from the training set
  • SME works on all the training data, both the
    correctly classified and the misclassified samples
  • SME, however, must first choose a margin ρ
    heuristically
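
For reference, the SME objective from the cited ICSLP 06 paper has the general form (a reconstruction; λ trades margin size against empirical risk over the N training samples)

    \min_{\Lambda} \; \frac{\lambda}{\rho} + \frac{1}{N} \sum_{i=1}^{N} \ell\big(\rho - d(X_i)\big)

where \ell is a hinge-style loss that vanishes for samples whose margin exceeds ρ, so misclassified samples and correctly classified samples inside the margin both contribute.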