Automatic Speaker Recognition: Recent Progress, Current Applications, and Future Trends

1
  • Seminar
  • Speech Recognition Projects
  • E.M. Bakker
  • LIACS Media Lab
  • Leiden University

2
Project Outline
  • Implementation
  • Project Modules
  • Speech Database
  • Speech Signal Analysis
  • Hidden Markov Models Training
  • Language Models Training
  • Recognition Algorithms
  • Evaluation

3
Implementation
  • A Safe C++ Programming Style
  • Not to be used in C++
  • Syntax and Programming Style
  • Conventions
  • Basic Design Rules
  • Program Services
  • Memory Services
  • Diagnostics
  • Important Topics
  • Portability
  • Testing
  • Reliability

4
Implementation: A Safe C++ Programming Style
  • Features to be avoided, or not to be used in C++
  • C inherited features
  • if(c=0), ?:, goto, break, continue, union,
    struct, bit-wise, ( !), int, short, double,
    unsigned, ++, --, explicit constant numbers,
    cast, variable argument lists
  • Preprocessor features
  • macros for constants, macros for functions,
    pragma, compiler/platform specific directives
  • Object Oriented
  • global data, global non-member functions, public
    data, friend, overloading operators_at_, ,...
  • Memory and pointer-related
  • pointers, new, delete, malloc, free(), pointers
    to functions, ->, -> ., const char, NULL, type
    ref - t, type count, type count, type
    count, type (count), type (count)
  • printf, scanf, assembly language, object passed
    by val and temporary objects

5
Implementation: Syntax and Programming Style
  • Programs in plain English
  • Meaningful names
  • One statement per line
  • const for data and methods whenever possible
  • variables local whenever possible
  • private/protected data members whenever possible
  • do not use confusing syntax like
  • if (a)
  • for (I=0;I<4;)
  • always use default in switch-statement
  • use assert in all the critical points
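A few of these rules can be illustrated with a minimal sketch. The enumeration WindowType and the function Window_Name are hypothetical names invented for this illustration; they are not taken from the seminar code. The sketch shows const parameters, one statement per line, a default branch in the switch, and an assert at the critical point.

```cpp
#include <cassert>
#include <string>

// Hypothetical enumeration, used only for this illustration.
enum class WindowType { Rectangular, Hamming, Hanning };

// Returns a printable name for a window type.
// Style points shown: const parameter, one statement per line,
// a default branch in the switch, and an assert before returning.
std::string Window_Name(const WindowType window_type)
{
    std::string name;
    switch (window_type)
    {
        case WindowType::Rectangular:
            name = "rectangular";
            break;
        case WindowType::Hamming:
            name = "hamming";
            break;
        case WindowType::Hanning:
            name = "hanning";
            break;
        default:
            // Reached only if an out-of-range value is passed in.
            name = "unknown";
            break;
    }
    // Critical point: the caller must never receive an empty name.
    assert(!name.empty());
    return name;
}
```

The break statements are needed here only to terminate the switch branches.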

6
Implementation: Conventions
  • Functions and methods: My_Example_Function()
  • Variables: my_example_var
  • Classes: MyExampleClass
  • Constants: MY_EXAMPLE_CONSTANT
  • In general meaningful names, except for indices
  • Comment
  • file-description
  • version history (bugs, new functionality)
  • user information (user guide)
  • implementation information (reference guide)
  • code comment

7
Implementation: Basic Design Rules
  • Project modularity achieved through classes.
  • Structure the program by Classes only (only
    methods are allowed, no separate functions)
  • Project is decomposed into modules with as little
    cross-dependence as possible
  • One module per class
  • Classes should have minimal interfaces
  • Modules should have minimal dependencies
  • Implementation issues hidden from clients
    (information hiding)
  • Inheritance should be extensively used
  • Advantages
  • Improved readability
  • Reduced maintenance work
  • Improved robustness

8
Implementation: Program Services
  • Safe memory management
  • memory service
  • dynamic memory management: C++ without pointers
  • Diagnostics
  • decide which data must be checked when, and
    define the actions
  • File management, user interfaces
  • User program configuration management
  • Text data management
  • Mathematical data management

9
Implementation: Memory Services and Diagnostics
  • Memory Services
  • Diagnostics

10
Some Important Topics
  • Portability
  • portability and defined options in files
    compatib.h, defopt.h, Boolean.h
  • Testing
  • test routines and version history
  • Reliability
  • readability
  • maintainability

11
RES General Specification
  • RES (Recognition Experimental System) is an
    HMM-based experimental tool for continuous
    multispeaker speech recognition. The system works
    on recorded speech files and basically includes
  • the batch modules for acoustic model
    initialization and training
  • grammar models training
  • phoneme/word recognition
  • performance evaluation.
  • RES is state of the art in speaker-independent
    phonetic recognition
  • with 69.2% correct on all TIMIT test data using
    context-independent phonetic models.
  • It yields 87.83% correct in speaker-independent
    word recognition on ATIS using context-independent
    phonetic models not optimally tuned on this
    database.

12
RES General Specification
  • How to build an ASR system for a different
    language?
  • we need many segmented speech recordings to feed
    the training programs and get good HMM models of
    our voices.
  • use a freeware program like Snack 1.4 (search on
    the Internet) to prepare the data.
  • search for a Dutch multispeaker phonetic database.
  • Design and feed the right language-model.
  • Speech samples to train and test the RES system?
  • You can download speech samples from the
    Linguistic Data Consortium (LDC) after you have
    obtained a user account.

14
General Specification
  • Required C++ custom libraries
  • none
  • Portability
  • Linux
  • Windows 3.x, Windows 95, NT
  • DOS with DjGpp
  • Compilers
  • MS Visual C++ >4.0
  • DjGpp version 2.8.1 or
  • GNU Linux Gpp version 2.8.1 or newer

15
Speech Database
  • Speech data retrieval
  • Speech files
  • NIST1A (ATIS x, TIMIT),
  • MS WAV
  • custom, by adding software drivers
  • Label File
  • ATIS
  • TIMIT
  • various subsets, custom label alphabets included
    in a file, custom label handling by supplying a
    driver.
  • Other options
  • overlap
  • window length
  • file buffering
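How the overlap and window length options interact can be sketched as follows. This is a minimal illustration under the assumption that both options are given in samples and that incomplete trailing frames are dropped; the function name Split_Into_Frames is hypothetical and not part of RES.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cuts a sampled signal into frames of window_length samples, where
// consecutive frames share `overlap` samples. A trailing part that is
// shorter than window_length is discarded (an assumption of this sketch).
std::vector<std::vector<double> > Split_Into_Frames(
    const std::vector<double>& samples,
    const std::size_t window_length,
    const std::size_t overlap)
{
    assert(window_length > 0);
    assert(overlap < window_length);
    // Distance between the starts of two consecutive frames.
    const std::size_t step = window_length - overlap;
    std::vector<std::vector<double> > frames;
    for (std::size_t start = 0; start + window_length <= samples.size(); start += step)
    {
        const std::vector<double>::const_iterator first =
            samples.begin() + static_cast<std::ptrdiff_t>(start);
        const std::vector<double>::const_iterator last =
            first + static_cast<std::ptrdiff_t>(window_length);
        frames.push_back(std::vector<double>(first, last));
    }
    return frames;
}
```

With 10 samples, a window length of 4 and an overlap of 2, the frames start at samples 0, 2, 4 and 6.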

16
Speech Signal Analysis
  • Feature Extraction
  • Signal processing
  • Any concatenation of processing blocks is
    allowed. Each block performs a class of
    processing and the actual processing is specified
    by the options.
  • Available processing blocks
  • Preemphasis_and_Hamming
  • Mean_Subtraction
  • FFT
  • MFCC with Log/non Log Energy
  • differences of any order
  • Other blocks can be added by supplying proper
    drivers.
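The idea of concatenating processing blocks can be sketched with three small functions that each map a frame to a frame. This is only an illustration: the function names mirror the block names on the slide but the signatures are assumptions, the pre-emphasis coefficient is left to the caller, and the mean is subtracted per frame on the samples, which may differ from what the RES Mean_Subtraction block actually does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pre-emphasis: y[n] = x[n] - alpha * x[n-1]; the first sample is kept.
std::vector<double> Preemphasis(const std::vector<double>& frame, const double alpha)
{
    std::vector<double> out(frame);
    for (std::size_t n = 1; n < frame.size(); ++n)
    {
        out[n] = frame[n] - alpha * frame[n - 1];
    }
    return out;
}

// Hamming window: w[n] = 0.54 - 0.46 * cos(2 * pi * n / (N - 1)).
std::vector<double> Hamming(const std::vector<double>& frame)
{
    const double pi = 3.14159265358979323846;
    const std::size_t length = frame.size();
    std::vector<double> out(frame);
    if (length > 1)
    {
        for (std::size_t n = 0; n < length; ++n)
        {
            const double phase =
                2.0 * pi * static_cast<double>(n) / static_cast<double>(length - 1);
            out[n] = frame[n] * (0.54 - 0.46 * std::cos(phase));
        }
    }
    return out;
}

// Subtracts the frame mean from every sample of the frame.
std::vector<double> Mean_Subtraction(const std::vector<double>& frame)
{
    std::vector<double> out(frame);
    if (!frame.empty())
    {
        double sum = 0.0;
        for (std::size_t n = 0; n < frame.size(); ++n)
        {
            sum += frame[n];
        }
        const double mean = sum / static_cast<double>(frame.size());
        for (std::size_t n = 0; n < frame.size(); ++n)
        {
            out[n] = frame[n] - mean;
        }
    }
    return out;
}
```

Because every block has the same input and output type, a chain such as Hamming(Preemphasis(Mean_Subtraction(frame), 0.97)) can be assembled in any order; the value 0.97 is a commonly used coefficient, not one taken from the slides.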

17
Hidden Markov Models
  • HMM model Initialization
  • HMM topology
  • 4 predefined types with configurable number of
    states.
  • Acoustic Units
  • as allowed by the available database
  • emission densities
  • Untied Gaussian mixtures
  • full or diagonal covariance matrix
  • number of mixtures configurable for each acoustic
    unit
  • Initialization method
  • maximum distortion splitting on segmented database
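For the diagonal-covariance case, the emission density of one state can be sketched as a log-domain Gaussian mixture. The function names and the argument layout are assumptions made for this illustration, not the RES interface; the full-covariance case and the initialization by splitting are not covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> Row;
typedef std::vector<Row> Matrix;

// log N(x; mean, diag(variance)) for one mixture component.
double Log_Gaussian_Diagonal(const Row& x, const Row& mean, const Row& variance)
{
    assert(x.size() == mean.size());
    assert(x.size() == variance.size());
    const double two_pi = 6.28318530717958647692;
    double log_density = 0.0;
    for (std::size_t d = 0; d < x.size(); ++d)
    {
        assert(variance[d] > 0.0);
        const double diff = x[d] - mean[d];
        log_density -= 0.5 * (std::log(two_pi * variance[d]) + diff * diff / variance[d]);
    }
    return log_density;
}

// log sum_k weights[k] * N(x; means[k], diag(variances[k])),
// computed with the log-sum-exp trick to avoid underflow.
double Log_Mixture_Density(
    const Row& x,
    const Row& weights,
    const Matrix& means,
    const Matrix& variances)
{
    assert(!weights.empty());
    assert(weights.size() == means.size());
    assert(weights.size() == variances.size());
    Row terms(weights.size(), 0.0);
    for (std::size_t k = 0; k < weights.size(); ++k)
    {
        assert(weights[k] > 0.0);
        terms[k] = std::log(weights[k]) + Log_Gaussian_Diagonal(x, means[k], variances[k]);
    }
    const double max_term = *std::max_element(terms.begin(), terms.end());
    double sum = 0.0;
    for (std::size_t k = 0; k < terms.size(); ++k)
    {
        sum += std::exp(terms[k] - max_term);
    }
    return max_term + std::log(sum);
}
```

Since the mixtures are untied, each acoustic unit would hold its own weights, means and variances, and the number of components can differ per unit.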

18
Hidden Markov Models Training
  • Training algorithm
  • Single and simultaneous model re-estimation
    (Baum-Welch).
  • parameter re-estimation selectable by
    configuration.

19
Language Models
  • Language Model
  • unigram and bigram on words and phonemes
  • Smoothing techniques
  • Good-Turing, non-linear and linear interpolation
    model
  • Word clustering: minimum mean square error on
    transition probability
  • Perplexity
  • word and phoneme based computation
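A bigram model with linear interpolation and its perplexity can be sketched as below. The class name BigramModel, the single fixed interpolation weight lambda, and the maximum-likelihood estimates are assumptions of this illustration; Good-Turing smoothing, non-linear interpolation and word clustering are not shown. Units may be words or phonemes, since the model only sees strings.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class: bigram estimate interpolated with a unigram estimate.
class BigramModel
{
public:
    explicit BigramModel(const double lambda) : lambda_(lambda), total_units_(0)
    {
        assert(lambda >= 0.0);
        assert(lambda <= 1.0);
    }

    void Train(const std::vector<std::string>& units)
    {
        for (std::size_t i = 0; i < units.size(); ++i)
        {
            unigram_count_[units[i]] += 1;
            total_units_ += 1;
            if (i + 1 < units.size())
            {
                bigram_count_[std::make_pair(units[i], units[i + 1])] += 1;
                history_count_[units[i]] += 1;
            }
        }
    }

    // P(unit | previous) = lambda * c(previous, unit) / c(previous)
    //                    + (1 - lambda) * c(unit) / N
    double Probability(const std::string& previous, const std::string& unit) const
    {
        assert(total_units_ > 0);
        const double unigram = Count(unigram_count_, unit) / static_cast<double>(total_units_);
        const double history = Count(history_count_, previous);
        double bigram = 0.0;
        const BigramMap::const_iterator it = bigram_count_.find(std::make_pair(previous, unit));
        if (history > 0.0 && it != bigram_count_.end())
        {
            bigram = static_cast<double>(it->second) / history;
        }
        return lambda_ * bigram + (1.0 - lambda_) * unigram;
    }

    // Perplexity = exp(-(1 / M) * sum_i log P(u_i | u_{i-1})) over M transitions.
    // Units never seen in training get probability zero and trip the assert.
    double Perplexity(const std::vector<std::string>& units) const
    {
        assert(units.size() > 1);
        double log_sum = 0.0;
        for (std::size_t i = 1; i < units.size(); ++i)
        {
            const double p = Probability(units[i - 1], units[i]);
            assert(p > 0.0);
            log_sum += std::log(p);
        }
        return std::exp(-log_sum / static_cast<double>(units.size() - 1));
    }

private:
    typedef std::map<std::string, std::size_t> UnigramMap;
    typedef std::map<std::pair<std::string, std::string>, std::size_t> BigramMap;

    static double Count(const UnigramMap& counts, const std::string& key)
    {
        const UnigramMap::const_iterator it = counts.find(key);
        double result = 0.0;
        if (it != counts.end())
        {
            result = static_cast<double>(it->second);
        }
        return result;
    }

    double lambda_;
    std::size_t total_units_;
    UnigramMap unigram_count_;
    UnigramMap history_count_;
    BigramMap bigram_count_;
};
```

A lower perplexity on held-out text indicates that the model predicts the next unit better.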

20
Recognition
  • Recognition
  • Recognition Unit
  • acoustic units, words
  • Algorithm Type
  • Viterbi with Beam search and Window search
    pruning strategies
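Viterbi decoding with beam pruning can be sketched in the log domain as follows. The function name Viterbi_Beam and the matrix layout (log_transition[from][to], log_emission[frame][state]) are assumptions for this illustration; it decodes a single HMM over states rather than a network of acoustic units or words, and the window search strategy is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<double> Row;
typedef std::vector<Row> Matrix;

// Returns the best state sequence. After each frame, states whose score
// falls more than beam_width below the best score are pruned.
std::vector<std::size_t> Viterbi_Beam(
    const Row& log_initial,
    const Matrix& log_transition,
    const Matrix& log_emission,
    const double beam_width)
{
    const double minus_infinity = -std::numeric_limits<double>::infinity();
    const std::size_t frames = log_emission.size();
    const std::size_t states = log_initial.size();
    assert(frames > 0);
    assert(states > 0);
    assert(log_transition.size() == states);
    assert(beam_width >= 0.0);

    Matrix score(frames, Row(states, minus_infinity));
    std::vector<std::vector<std::size_t> > back(frames, std::vector<std::size_t>(states, 0));

    for (std::size_t t = 0; t < frames; ++t)
    {
        assert(log_emission[t].size() == states);
        for (std::size_t to = 0; to < states; ++to)
        {
            if (t == 0)
            {
                score[t][to] = log_initial[to];
            }
            else
            {
                for (std::size_t from = 0; from < states; ++from)
                {
                    // Pruned predecessors hold minus infinity and are ignored.
                    const double candidate = score[t - 1][from] + log_transition[from][to];
                    if (score[t - 1][from] != minus_infinity && candidate > score[t][to])
                    {
                        score[t][to] = candidate;
                        back[t][to] = from;
                    }
                }
            }
            score[t][to] += log_emission[t][to];
        }
        // Beam pruning relative to the best hypothesis of this frame.
        const double best = *std::max_element(score[t].begin(), score[t].end());
        for (std::size_t s = 0; s < states; ++s)
        {
            if (score[t][s] < best - beam_width)
            {
                score[t][s] = minus_infinity;
            }
        }
    }

    // Backtrace from the best final state.
    std::vector<std::size_t> path(frames, 0);
    const Row& last = score[frames - 1];
    path[frames - 1] =
        static_cast<std::size_t>(std::max_element(last.begin(), last.end()) - last.begin());
    for (std::size_t t = frames - 1; t > 0; --t)
    {
        path[t - 1] = back[t][path[t]];
    }
    return path;
}
```

A narrow beam saves computation but can discard the path that would have won later; a very large beam_width reduces this to plain Viterbi.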

21
Evaluation
  • Evaluation: Wagner-Fischer algorithm
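The Wagner-Fischer algorithm computes the minimum number of substitutions, deletions and insertions needed to turn the reference transcription into the recognized one. The sketch below works on sequences of labels (words or phonemes) with unit costs; the function name Edit_Distance is hypothetical, and the backtrace that separates the three error types, which a percent-correct score would need, is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimum edit distance between two label sequences, unit costs.
std::size_t Edit_Distance(
    const std::vector<std::string>& reference,
    const std::vector<std::string>& hypothesis)
{
    const std::size_t rows = reference.size() + 1;
    const std::size_t cols = hypothesis.size() + 1;
    std::vector<std::vector<std::size_t> > distance(rows, std::vector<std::size_t>(cols, 0));

    // Turning a prefix into the empty sequence costs one deletion per label.
    for (std::size_t i = 0; i < rows; ++i)
    {
        distance[i][0] = i;
    }
    // Building a prefix from the empty sequence costs one insertion per label.
    for (std::size_t j = 0; j < cols; ++j)
    {
        distance[0][j] = j;
    }

    for (std::size_t i = 1; i < rows; ++i)
    {
        for (std::size_t j = 1; j < cols; ++j)
        {
            std::size_t substitution = distance[i - 1][j - 1];
            if (reference[i - 1] != hypothesis[j - 1])
            {
                substitution += 1;
            }
            const std::size_t deletion = distance[i - 1][j] + 1;
            const std::size_t insertion = distance[i][j - 1] + 1;
            distance[i][j] = std::min(substitution, std::min(deletion, insertion));
        }
    }
    return distance[rows - 1][cols - 1];
}
```

For example, the reference "show me flights" against the hypothesis "show the flights please" has distance 2: one substitution and one insertion.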

22
Projects
  • 1. Dutch Speech Corpus Database Interface (2
    groups)
  • in an early phase some example classes should be
    available, like counting, etc.
  • maybe use tools like Praat (for phonetic
    labeling of wav files), etc.
  • 2. Signal Analysis and Feature Extraction (2
    groups)
  • 3. HMM Initialization and HMM Dutch Phonetic
    Training (2 groups)
  • 4. Dutch Language Model and Word Class Training
    (2 groups)
  • in an early phase some examples should be
    available
  • 5. Recognition (2 groups)
  • Evaluation (All)

23
Project Designs
  • The design of the project should contain the
    following
  • The implementation goals
  • The underlying technique and theory
  • A functional description of the starting-code and
    tools
  • The design of new code and functionality
  • Implementation goals and a time-scheme
  • NB: if it is considered difficult to achieve all
    the goals within the current time-frame, team up
    with the other team
  • Interfacing
  • Define the module-interfaces
  • Define the time-path for the essential
    module-inputs
  • Define a realistic time-path for the
    (partial-)outputs of the module.