Digit Recognition Using the SPEECHDAT Corpus


1
Robust Recognition of Digits and Natural Numbers
Frederico Rodrigues and Isabel Trancoso
INESC/IST, 2000
2
Summary
  • Problem overview
  • Baseline system
  • Extensions to the baseline system
  • Conclusions and future work

3
The Problem
4
Corpus Description
  • Multilingual telephone speech corpus
  • SPEECHDAT(M): 1000 speakers
  • SPEECHDAT(II): 4000 speakers
  • Orthographically transcribed, including noise events

5
Noise events
  • spk: speaker-related noises
  • sta: stationary noises
  • int: intermittent noises

6
(No Transcript)
7
Train and Test Set Definition
  • Selection procedure
    • Age, gender and region distributions are approximately equal in both train and test sets
  • SPEECHDAT(II)
    • Fixed 500-speaker evaluation set
    • Additional 300-speaker development set
  • SPEECHDAT(M)
    • 200-speaker evaluation set
  • Overall train/test ratio of 80/20
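The slide only states that age, gender and region distributions are kept approximately equal between the sets. One way to get that property is a speaker-level stratified split, sketched below under stated assumptions: the speaker record fields (`age_band`, `gender`, `region`) and the function name `stratified_speaker_split` are hypothetical, and this is not presented as the authors' actual selection procedure.

```python
import random
from collections import defaultdict


def stratified_speaker_split(speakers, test_fraction=0.2, seed=0):
    """Split speakers (not utterances) into train/test sets.

    Each (age_band, gender, region) stratum contributes roughly
    test_fraction of its speakers to the test set, so the marginal
    distributions stay similar. Field names are hypothetical.
    """
    strata = defaultdict(list)
    for spk in speakers:
        key = (spk["age_band"], spk["gender"], spk["region"])
        strata[key].append(spk["id"])

    rng = random.Random(seed)
    train, test = [], []
    for key in sorted(strata):
        ids = sorted(strata[key])
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)  # tiny strata may send 0 speakers to test
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test
```

Splitting by speaker rather than by utterance keeps every test speaker unseen during training, which is what a speaker-independent evaluation set requires.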

8
Sub-corpus Used
  • I1 - Isolated digit strings
  • B1 - Sequences of 10 digits
  • N - Natural numbers

9
Feature Extraction
  • MFCC (Mel Frequency Cepstral Coefficients)
  • 14 cepstra, 14 Δ cepstra, energy and Δ energy
  • Speech signal band-limited between 200 and 3800 Hz
  • Hamming window of 25 ms, shifted every 10 ms
  • Cepstral Mean Subtraction (CMS)
    • Simple but effective technique for channel and speaker normalization
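A minimal NumPy sketch of the framing, delta and CMS steps listed above, assuming 8 kHz telephone audio and a standard ±2-frame regression formula for the deltas. The mel filterbank, the 200 to 3800 Hz band limit and the DCT that produce the 14 cepstra are not shown: the cepstra and log energy are taken as given inputs. All function names are hypothetical.

```python
import numpy as np


def frame_signal(x, sample_rate=8000, win_ms=25.0, hop_ms=10.0):
    """Cut a 1-D signal into overlapping Hamming-windowed frames."""
    win = int(round(sample_rate * win_ms / 1000))  # 200 samples at 8 kHz
    hop = int(round(sample_rate * hop_ms / 1000))  # 80 samples at 8 kHz
    if len(x) < win:
        raise ValueError("signal shorter than one analysis window")
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(win)


def deltas(feats, N=2):
    """Regression deltas over +/-N frames; edge frames are replicated."""
    T = len(feats)
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feats, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom


def cepstral_mean_subtraction(cepstra):
    """Remove the per-utterance mean of each cepstral coefficient."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)


def assemble_features(cepstra, log_energy):
    """14 cepstra + 14 delta cepstra + energy + delta energy = 30 dims/frame."""
    c = cepstral_mean_subtraction(cepstra)
    e = log_energy[:, None]
    return np.hstack([c, deltas(c), e, deltas(e)])
```

CMS works as a channel normalization because a fixed linear channel becomes an additive constant in the cepstral domain, which subtracting the utterance mean removes. Deltas are unaffected by that constant offset.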

10
Acoustic Modeling
  • Left-right continuous density HMMs
  • Word models for each digit. No skips.
  • Silence and filler models with forward and
    backward skips
  • Gender dependent models

HMM: Hidden Markov Model
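The two topologies can be expressed as allowed state-to-state offsets. The sketch below builds uniformly initialised transition matrices with an extra non-emitting exit column. The state counts (5 for a digit, 3 for a filler) are hypothetical, and reading "forward and backward skips" as offsets {-1, 0, +1, +2} is an assumption, since the exact filler topology appears only in a figure.

```python
import numpy as np


def transition_matrix(n_states, offsets):
    """Uniformly initialised transitions for an n_states HMM.

    Column n_states is a non-emitting exit state. A move i -> i + k is
    allowed for each k in offsets that lands inside [0, n_states];
    every row is normalised to sum to 1.
    """
    A = np.zeros((n_states, n_states + 1))
    for i in range(n_states):
        for k in offsets:
            j = i + k
            if 0 <= j <= n_states:
                A[i, j] = 1.0
        A[i] /= A[i].sum()
    return A


# Digit word model: strict left-right, self-loop or next state, no skips.
digit_A = transition_matrix(5, offsets=(0, 1))

# Filler/silence model (assumed reading): back one state, stay, advance, or skip one.
filler_A = transition_matrix(3, offsets=(-1, 0, 1, 2))
```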
11
Model Topology
Filler and silence model topology
12
Baseline System - Isolated Digits
  • Select isolated-digit utterances with no noise marks
  • Initialize HMM parameters with the global mean and variance of the training data
  • Embedded Baum-Welch re-estimation
  • Evaluate performance with Viterbi decoding
    • Grammar allowing one digit with initial and final silence
    • Grammar allowing one digit and any number of fillers or silences
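A compact sketch of two steps above: the "flat start" initialisation from the global mean and variance, and log-domain Viterbi scoring used to pick the best-scoring digit model. It assumes one diagonal Gaussian per state and a left-right model that must end in its last state. The silence/filler grammar is not modelled, and all function names are hypothetical.

```python
import numpy as np


def flat_start(train_frames, n_states):
    """Initialise every state with the global mean and variance of the data."""
    data = np.vstack(train_frames)  # list of (T_i, D) feature arrays
    mu = np.tile(data.mean(axis=0), (n_states, 1))
    var = np.tile(data.var(axis=0), (n_states, 1))
    return mu, var


def log_gauss(x, mu, var):
    """Log-likelihood of each frame under each diagonal-Gaussian state: (T, S)."""
    diff = x[:, None, :] - mu[None, :, :]
    return -0.5 * (np.log(2 * np.pi * var)[None] + diff ** 2 / var[None]).sum(axis=2)


def viterbi(log_b, log_A, log_pi):
    """Best state path and its log score, forced to end in the last state."""
    T, S = log_b.shape
    delta = log_pi + log_b[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A  # rows: from-state, columns: to-state
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_b[t]
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(delta[S - 1])


def recognise(x, models):
    """models: word -> (mu, var, log_A, log_pi); return the best-scoring word."""
    scores = {word: viterbi(log_gauss(x, mu, var), log_A, log_pi)[1]
              for word, (mu, var, log_A, log_pi) in models.items()}
    return max(scores, key=scores.get)
```

Working in log probabilities avoids numeric underflow on long utterances; forbidden transitions become `-inf` and are never selected.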

13
Baseline System - Isolated Digits
14
Baseline System - Isolated Digits
  • Increase the number of Gaussian mixture components per state, up to 3, for the digit models
  • Include files with noise marks
  • Repeat the re-estimation/evaluation process
  • Increase the number of Gaussian mixture components per state, up to 3, for the filler and digit models
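The slides do not say how mixture components are added. A common heuristic, similar to the splitting rule described in the HTK Book, copies the heaviest component, halves its weight and moves the two means apart by ±0.2 standard deviations, re-estimating after each split. The sketch below assumes that heuristic; `reestimate` is a hypothetical hook standing in for Baum-Welch passes.

```python
import numpy as np


def split_heaviest(weights, means, variances, perturb=0.2):
    """Add one component to a diagonal GMM by splitting its heaviest component."""
    k = int(np.argmax(weights))
    offset = perturb * np.sqrt(variances[k])  # perturb by a fraction of the std. dev.
    new_w = np.append(weights, weights[k] / 2)
    new_w[k] /= 2
    new_mu = np.vstack([means, means[k] + offset])
    new_mu[k] = means[k] - offset
    new_var = np.vstack([variances, variances[k]])
    return new_w, new_mu, new_var


def grow_mixtures(weights, means, variances, target,
                  reestimate=lambda w, m, v: (w, m, v)):
    """Split one component at a time up to target, re-estimating after each split."""
    while len(weights) < target:
        weights, means, variances = split_heaviest(weights, means, variances)
        weights, means, variances = reestimate(weights, means, variances)
    return weights, means, variances
```

Growing the mixture one component at a time, with re-estimation in between, lets each new component settle before the next split, which tends to be more stable than initialising all components at once.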

15
Connected vs Isolated Digits
Example: the number 3 1 2 6
  • Said as isolated digits: t r e S u d o j S s 6 j S
  • Said as connected digits: t r e z u d o j S _ 6 j S
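The difference between the two transcriptions can be reproduced with two cross-word rules, sketched below. They are derived only from this one example and are illustrative, not a phonological rule set from the authors: (1) a word-final S becomes z before a vowel-initial word; (2) a word-initial s after a word-final S is written as `_`, on the assumption that `_` marks the merged /s/. The vowel set, `VOWELS` and `connect_words` are hypothetical.

```python
# SAMPA-style vowel symbols (assumed set); nasal vowels such as "u~"
# start with a vowel symbol, so only the first character is checked.
VOWELS = {"a", "6", "e", "E", "i", "o", "O", "u", "@"}


def connect_words(words):
    """Join per-word phone lists, applying two illustrative cross-word rules."""
    out = []
    for i, word in enumerate(words):
        phones = list(word)
        nxt = words[i + 1] if i + 1 < len(words) else None
        # Rule 2: word-final S followed by word-initial s -> s written as "_".
        if i > 0 and words[i - 1][-1] == "S" and phones[0] == "s":
            phones[0] = "_"
        # Rule 1: word-final S before a vowel-initial word -> z.
        if nxt and phones[-1] == "S" and nxt[0][0] in VOWELS:
            phones[-1] = "z"
        out.extend(phones)
    return " ".join(out)


digits = [["t", "r", "e", "S"], ["u"], ["d", "o", "j", "S"], ["s", "6", "j", "S"]]
isolated = " ".join(p for w in digits for p in w)
connected = connect_words(digits)
```

Rules like these are one reason isolated-digit models make only a bootstrap for connected speech: the acoustic form of a digit depends on its neighbours.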
16
Baseline System - Connected Digits
  • Use the best isolated-digit models as bootstrap models
  • Repeat the re-estimation/evaluation process
  • Gradually increase the number of Gaussian mixture components per state, up to 5, for the digit models

17
Baseline System - Results
18
Extension to the Baseline System
  • New approach to modelling fillers
  • Same training/evaluation process
  • Train the 9 filler and silence models with no skips
  • Build a single filler model by concatenating all filler and silence models

19
New Filler Model Architecture
20
Results With New Filler Model
21
Natural Numbers
  • Phone models with 3 states and no skips
  • Larger vocabulary size
  • May be adapted to other tasks
  • Phones initialized from models already trained
    for a directory assistance task
  • Digits are still modeled by word models
  • Grammar for natural numbers ranging from zero to
    hundreds of millions
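To give a feel for what such a grammar must cover, the sketch below enumerates European Portuguese readings for 0 to 999 only, using masculine forms and treating the connector "e" as an ordinary vocabulary word. It is an assumption-laden illustration, not the authors' grammar: the thousands ("mil") and millions ("milhões") levels needed to reach hundreds of millions have extra connector rules and are not shown. `to_words` and the word tables are hypothetical names.

```python
UNITS = ["zero", "um", "dois", "três", "quatro", "cinco", "seis", "sete",
         "oito", "nove", "dez", "onze", "doze", "treze", "catorze", "quinze",
         "dezasseis", "dezassete", "dezoito", "dezanove"]
TENS = {2: "vinte", 3: "trinta", 4: "quarenta", 5: "cinquenta",
        6: "sessenta", 7: "setenta", 8: "oitenta", 9: "noventa"}
HUNDREDS = {1: "cento", 2: "duzentos", 3: "trezentos", 4: "quatrocentos",
            5: "quinhentos", 6: "seiscentos", 7: "setecentos",
            8: "oitocentos", 9: "novecentos"}


def below_100(n):
    """Words for 0..99; tens and units are joined by 'e'."""
    if n < 20:
        return UNITS[n]
    t, u = divmod(n, 10)
    return TENS[t] if u == 0 else f"{TENS[t]} e {UNITS[u]}"


def to_words(n):
    """European Portuguese reading of 0 <= n <= 999 (masculine forms)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    if n < 100:
        return below_100(n)
    h, rest = divmod(n, 100)
    if rest == 0:
        return "cem" if h == 1 else HUNDREDS[h]  # exactly 100 is "cem"
    return f"{HUNDREDS[h]} e {below_100(rest)}"


vocab = {w for n in range(1000) for w in to_words(n).split()}
```

Even this restricted range shows why phone models help: a handful of word types combine into many numbers, so the grammar, rather than the word models, carries most of the structure.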

22
Natural Numbers Example
Number 25
  • Hypothesis 1: vinte e cinco ("twenty and five")
  • Hypothesis 2: vinte cinco ("twenty five")
But "vinte cinco" could also be read as the sequence of two natural numbers, 20 and 5.
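The ambiguity can be made concrete with a small parser that returns every way to read a word string as a sequence of numbers from 0 to 99, accepting both "vinte e cinco" and the reduced "vinte cinco" as 25. The word tables and function names are hypothetical, and the 0 to 99 limit is a simplification.

```python
UNITS = {"um": 1, "dois": 2, "três": 3, "quatro": 4, "cinco": 5,
         "seis": 6, "sete": 7, "oito": 8, "nove": 9}
TENS = {"vinte": 20, "trinta": 30, "quarenta": 40, "cinquenta": 50,
        "sessenta": 60, "setenta": 70, "oitenta": 80, "noventa": 90}
OTHER = {"zero": 0, "dez": 10, "onze": 11, "doze": 12, "treze": 13,
         "catorze": 14, "quinze": 15, "dezasseis": 16, "dezassete": 17,
         "dezoito": 18, "dezanove": 19}


def numbers_at(words, i):
    """Yield (value, next_index) for each number 0..99 starting at words[i]."""
    w = words[i]
    if w in UNITS:
        yield UNITS[w], i + 1
    elif w in OTHER:
        yield OTHER[w], i + 1
    elif w in TENS:
        yield TENS[w], i + 1
        # Full form with the connector: "vinte e cinco".
        if i + 2 < len(words) and words[i + 1] == "e" and words[i + 2] in UNITS:
            yield TENS[w] + UNITS[words[i + 2]], i + 3
        # Reduced form without the connector: "vinte cinco".
        if i + 1 < len(words) and words[i + 1] in UNITS:
            yield TENS[w] + UNITS[words[i + 1]], i + 2


def parses(words, i=0):
    """All ways to read words[i:] as a sequence of numbers."""
    if i == len(words):
        return [()]
    result = []
    for value, j in numbers_at(words, i):
        result.extend((value,) + rest for rest in parses(words, j))
    return result
```

With the reduced form allowed, "vinte cinco" yields both (25,) and (20, 5), while "vinte e cinco" yields only (25,). A recogniser has to resolve the former from context or task constraints.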
23
Natural Numbers - Results
24
Sample Application
25
Conclusions and Future Work
  • Explicitly modeling fillers is a difficult task
  • The improved filler model decreases the error rate by up to 50%
  • Develop context-dependent models
  • Address vowel reduction and co-articulation problems
  • Results may be improved through the use of discriminative training techniques