1
Ch 5b: Discriminative Training (temporal model)
  • 14.2.2002 Ilkka Aho

2
Abbreviations
  • MCE: Minimum Classification Error
  • MMI: Maximum Mutual Information
  • STLVQ: Shift-Tolerant Learning Vector Quantization
  • TDNN: Time-Delay Neural Network
  • HMM: Hidden Markov Model
  • DP: Dynamic Programming
  • DTW: Dynamic Time Warping
  • GPD: Generalized Probabilistic Descent
  • PBMEC: Prototype-Based Minimum Error Classifier

3
Basics
  • Prototype-based methods use class representatives
    (a sample or an average of samples) to classify
    new patterns (a minimal sketch follows this list)
  • The MCE framework is used for discriminative
    training (also MMI is possible)
  • A central concern is the design or learning of
    prototypes that will yield good classification
    performance
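
The nearest-prototype rule above can be sketched in a few lines of
Python; the data and names here are illustrative, not from the slides.

```python
import numpy as np

def classify(x, prototypes, labels):
    """Assign x the label of the closest reference vector
    (squared Euclidean distance)."""
    dists = np.sum((prototypes - x) ** 2, axis=1)
    return labels[np.argmin(dists)]

# Illustrative prototypes: one reference vector per class.
prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = np.array([0, 1])
print(classify(np.array([0.2, 0.1]), prototypes, labels))  # -> 0
```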

4
STLVQ for Speech Recognition
  • LVQ algorithm in its basic form is a method for
    static pattern recognition
  • STLVQ handles a stream of dynamically varying
    patterns (fig. 1.)
  • STLVQ is much simpler than the TDNN model, but
    has yielded very good results on the same phoneme
    recognition tasks (a rough sketch follows below)
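
Figure 1 is not reproduced in this transcript, so the following is only
a plausible sketch of the shift-tolerant idea: LVQ distances are
accumulated over every position of a window sliding across the frame
sequence, so the decision does not depend on the exact alignment of the
token. The window length and the scoring here are assumptions, not
details from the slides.

```python
import numpy as np

def stlvq_score(frames, prototypes, win=7):
    """Accumulate nearest-prototype distances for every position of a
    window sliding over the frame sequence (shift tolerance).
    frames: (T, d); prototypes[k]: (m_k, win * d) references for class k."""
    scores = np.zeros(len(prototypes))
    for t in range(len(frames) - win + 1):
        x = frames[t:t + win].ravel()  # one window as a single vector
        for k, protos in enumerate(prototypes):
            scores[k] += np.min(np.sum((protos - x) ** 2, axis=1))
    return scores  # the class with the smallest accumulated score wins

rng = np.random.default_rng(0)
frames = rng.normal(size=(20, 3))                             # 20 frames
prototypes = [rng.normal(size=(4, 7 * 3)) for _ in range(2)]  # 2 classes
print(stlvq_score(frames, prototypes).argmin())
```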

5
Figure 1. STLVQ system architecture.
6
Limitations and Strengths of STLVQ
  • STLVQ assumes only a single phoneme as an input
    token
  • Training and testing datasets are obtained from
    manually labeled speech databases
  • How to extend the phoneme recognition to word or
    sentence recognition?
  • LVQ is applied locally

7
Expanding the Scope of LVQ for Speech Recognition
  • Representation of longer speech sequences such as
    entire utterances
  • Global optimization
  • Application to continuous speech recognition
  • A need for some kind of time warping or
    normalization
  • How to merge the discriminative power of LVQ with
    the sequential modeling abilities of HMMs?
  • Two methods: LVQ-HMM (fig. 2.) and HMM-LVQ
    (fig. 3.)

8
Figure 2. LVQ-HMM architecture.
9
Figure 3. HMM-LVQ architecture.
10
MCE Interpretation of LVQ
  • A prototype-based implementation of the MCE
    framework
  • The LVQ classification rule is based on the
    Euclidean distance between a pattern vector and
    each category's reference vectors
  • The category of the nearest reference vector is
    given as the classification decision
  • Figures 4, 5 and 6 demonstrate the smoothness of
    the MCE loss (see the sketch after this list)
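
A minimal sketch of the two losses compared in figures 4-6: the ideal
zero-one loss and its sigmoidal MCE smoothing, with the slope α playing
the role varied between figures 5 and 6. The misclassification measure
d is taken to be positive for errors and negative for correct
decisions.

```python
import numpy as np

def zero_one_loss(d):
    """Ideal loss: 1 for a misclassification (d > 0), else 0."""
    return (d > 0).astype(float)

def mce_loss(d, alpha=1.0):
    """Sigmoidal MCE loss: a smooth, differentiable approximation
    of the zero-one loss; alpha controls how sharp it is."""
    return 1.0 / (1.0 + np.exp(-alpha * d))

d = np.linspace(-5, 5, 11)
print(zero_one_loss(d))
print(mce_loss(d, alpha=0.1))  # very smooth, as in fig. 5
print(mce_loss(d, alpha=1.0))  # sharper, as in fig. 6
```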

11
Figure 4. Average empirical loss measured over 10
samples from a one-dimensional, two-class
classification problem. The ideal zero-one loss is
used in calculating the overall loss.
12
Figure 5. Now a sigmoidal MCE loss, α = 0.1, is
used in calculating the overall loss.
13
Figure 6. The same situation as in figure 5,
except α = 1.0 now.
14
Prototype-based Methods Using DP
  • DP is used to find the path through a grid of
    local matches between prototype and test sample
    frames that has the best overall score
  • When calculating the distance between the input
    utterance and a reference utterance, it is more
    practical to use the top path or the top few
    paths than every single possible DP path
  • Nonlinear compression and stretching of
    prototypes
  • DTW is a specific application of DP techniques
    to speech processing (a compact sketch follows
    below)
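
A compact sketch of the DP recursion behind DTW, assuming Euclidean
local frame distances and the common three-way step pattern (the slides
do not fix these choices):

```python
import numpy as np

def dtw_distance(a, b):
    """Best-path alignment cost between frame sequences a (T1, d)
    and b (T2, d) using the standard DP recursion."""
    T1, T2 = len(a), len(b)
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local match
            # Diagonal step, or repeat a frame of either sequence
            # (the nonlinear compression/stretching).
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[T1, T2]

a = np.array([[0.0], [1.0], [2.0]])
b = np.array([[0.0], [0.0], [1.0], [2.0]])
print(dtw_distance(a, b))  # -> 0.0 (b is a stretched copy of a)
```

The (i, j) grid here is exactly the grid of local matches mentioned
above; reading off the final cell corresponds to keeping only the best
path rather than enumerating every possible one.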

15
MCE-Trained Prototypes and DTW
  • The idea is to define the MCE loss in terms of a
    discriminant function that reflects the structure
    of a straightforward DTW-based recognizer
  • The loss function has to be continuous and
    differentiable so that a gradient-based
    optimization technique (for example GPD) can be
    used to minimize the overall loss
  • The loss function also has to reflect
    classification performance (a standard
    formulation is sketched after this list)
  • Good results in the Bell Labs E-set task and in
    phoneme recognition tasks
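
A standard way to meet both requirements, as commonly written in the
MCE/GPD literature (this general form is an assumption; the slides do
not spell it out): take the discriminant of category k to be the
negated DTW distance to its prototype r_k, compare it against a
smoothed best competitor, and pass the result through a sigmoid.

```latex
g_k(X) = -\,\mathrm{DTW}(X, r_k)

d_k(X) = -g_k(X)
         + \frac{1}{\eta}\log\Bigl[\frac{1}{K-1}\sum_{j \neq k} e^{\eta\, g_j(X)}\Bigr]

\ell\bigl(d_k(X)\bigr) = \frac{1}{1 + e^{-\alpha\, d_k(X)}}
```

As η grows, the competitor term approaches the single best rival score;
ℓ is continuous and differentiable, and it rises through 1/2 exactly
where the decision flips from correct to incorrect, so minimizing it
reflects classification performance.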

16
PBMEC
  • PBMEC models prototypes at a finer grain than
    MCE-trained DTW
  • PBMEC prototypes are modeled within phonetic or
    subphonetic states
  • Word models are formed by connecting different
    states together
  • Multi-state PBMEC (fig. 7.)
  • The discriminant function for a category is
    defined as the final accumulated score of the
    best DP path for that category (fig. 8.)
  • The MCE-GPD update rule for PBMEC pulls the
    nearest reference vectors for the correct
    category closer to the input and pushes the
    nearest reference vectors for the incorrect
    category away (see the sketch after this list)
  • MCE-GPD in the context of speech recognition
    using phoneme models (fig. 9.)
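
The push/pull character of that update rule, reduced to the two nearest
reference vectors; the step size and the sigmoid-derivative weighting
are illustrative, and the real PBMEC update differentiates through the
full DP alignment.

```python
import numpy as np

def sigmoid(d):
    return 1.0 / (1.0 + np.exp(-d))

def mce_gpd_step(x, r_correct, r_incorrect, d, alpha=1.0, eps=0.05):
    """One GPD step on the sigmoidal MCE loss: its derivative weights
    the update, pulling the correct category's nearest reference
    vector toward x and pushing the nearest incorrect one away."""
    w = alpha * sigmoid(alpha * d) * (1.0 - sigmoid(alpha * d))
    r_correct = r_correct + eps * w * (x - r_correct)         # pull closer
    r_incorrect = r_incorrect - eps * w * (x - r_incorrect)   # push away
    return r_correct, r_incorrect
```

The weight w peaks near the decision boundary (d ≈ 0), so tokens that
are nearly misclassified drive the largest updates, which is what makes
the training discriminative.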

17
Figure 7. Multi-state PBMEC architecture.
18
Figure 8. Final DP score.
19
Figure 9. DP segmentations for the words aida
and taira.
20
HMM design based on MCE
  • The prototype-like nature of HMMs
  • The MCE framework can be applied to HMMs in much
    the same way as in the case of the PBMEC model
  • HMM state likelihood and discriminant function
  • MCE misclassification measure and loss
  • Calculating the MCE gradient for HMMs (a sketch
    of the Viterbi discriminant follows below)
  • There are a very large number of applications of
    MCE-trained HMMs
  • Some of the best context-independent results have
    been reported for the Texas Instruments-Massachusetts
    Institute of Technology (TIMIT) database
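
A rough sketch of the HMM discriminant function mentioned above, taken
(as is common) to be the Viterbi best-path log-likelihood; the array
shapes and names are assumptions for illustration. The MCE
misclassification measure and loss are then built from these g_k
exactly as for PBMEC.

```python
import numpy as np

def viterbi_discriminant(log_b, log_A, log_pi):
    """g_k(X): log-likelihood of the best state path through HMM k.
    log_b: (T, S) per-frame state log-likelihoods,
    log_A: (S, S) log transition matrix, log_pi: (S,) log initial probs."""
    T, S = log_b.shape
    delta = log_pi + log_b[0]
    for t in range(1, T):
        # Best predecessor for each state, then emit frame t.
        delta = np.max(delta[:, None] + log_A, axis=0) + log_b[t]
    return float(np.max(delta))

# Trivial check: a single-state HMM just sums the frame log-likelihoods.
print(viterbi_discriminant(np.log([[0.5], [0.5]]),
                           np.log([[1.0]]), np.log([1.0])))
```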

21
Homework Question
Explain the main differences between the following
methods in speech recognition:
  • STLVQ
  • Prototype-based DP (DTW technique)
  • HMM design based on MCE