Juicer: A weighted finite-state transducer speech decoder

1
Juicer: A weighted finite-state transducer speech decoder
  • D. Moore (1), J. Dines (1),
  • M. Magimai Doss (1), J. Vepa (1),
  • O. Cheng (1) and T. Hain (2)
  • (1) IDIAP Research Institute
  • (2) Department of Computer Science, University of
    Sheffield

2
Overview
  • The speech decoding problem
  • Why develop another decoder?
  • WFST theory and practice
  • What is Juicer?
  • Benchmarking experiments
  • The future of Juicer

3
The speech decoding problem
  • Given a recording and models of speech and
    language, generate a text transcription of what
    was said

(Figure: the recording and the models are fed to the decoder, which outputs the transcription "She had your dark suit.")
4
The speech decoding problem
  • Or (formulation shown only as a figure on the original slide)

5
The speech decoding problem
  • Or (formulation shown only as a figure on the original slide)
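
For reference (this equation is not preserved from the original slides, which presented their formulations as figures), the decoding problem is conventionally stated as finding the most probable word sequence W given the acoustic observations X:

    \hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} P(X \mid W)\, P(W)

where P(X | W) comes from the acoustic and pronunciation models and P(W) from the language model.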

6
The speech decoding problem
  • ASR system building blocks
  • Grammar
  • N-gram language model
  • Lexical knowledge
  • pronunciation dictionary
  • Phonetic knowledge
  • context dependency
  • phonological rules
  • Acoustic knowledge
  • state distributions

Naive combination of these knowledge sources
leads to a large, inefficient representation of
the search space
7
The speech decoding problem
  • The main issue in decoding is carrying out an
    efficient search of the space defined by the
    knowledge sources
  • Two ways we can do this:
  • Avoid performing redundant search
  • Don't pursue unpromising hypotheses
  • An additional issue is the flexibility of the decoder

8
Why develop another decoder?
  • Need for a state-of-the-art speech decoder that is
    also suitable for ongoing research
  • At present, such software is not freely available
    to the research community
  • Open-source development and distribution
    framework

9
WFST theory and practice
  • Maps sequences of input symbols to sequences of
    output symbols
  • Each transition (input:output label pair) has an associated weight
  • In the example (transducer figure not preserved):
  • Input sequence I = a b c d maps to output
    sequence O = X Y Z W, with the path weight a
    function of all transition weights associated
    with that path, f(0.1, 0.2, 0.5, 0.1); see the
    sketch below
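
Below is a minimal, self-contained Python sketch (my own illustration, not part of the original presentation) of such a transducer as a transition table. It assumes the tropical semiring, in which weights accumulate by addition, so this path would score 0.1 + 0.2 + 0.5 + 0.1 = 0.9; the states, labels and weights mirror the example above.

# Minimal WFST sketch: each state maps an input label to (output label, weight, next state).
transitions = {
    0: {"a": ("X", 0.1, 1)},
    1: {"b": ("Y", 0.2, 2)},
    2: {"c": ("Z", 0.5, 3)},
    3: {"d": ("W", 0.1, 4)},
}
final_states = {4}

def transduce(input_seq, start=0):
    """Map an input sequence to (output sequence, path weight); None if rejected."""
    state, output, weight = start, [], 0.0   # tropical semiring: weights add along a path
    for sym in input_seq:
        arcs = transitions.get(state, {})
        if sym not in arcs:
            return None                      # no matching arc: the sequence is rejected
        out, w, nxt = arcs[sym]
        output.append(out)
        weight += w
        state = nxt
    return (output, weight) if state in final_states else None

print(transduce(["a", "b", "c", "d"]))       # -> (['X', 'Y', 'Z', 'W'], ~0.9)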

10
WFST theory and practice: WFST operations
  • Composition
  • Combination of transducers
  • Determinisation
  • Only one transition per input label
  • Minimisation
  • Least number of states and transitions
  • Weight pushing to aid in minimisation

11
WFST theory and practice: Composition
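The worked example on this slide was a figure that is not preserved in the transcript. As a stand-in, here is a small Python sketch (my own, with illustrative data) of composition for two epsilon-free transducers in the tropical semiring: an arc i:m in the first and an arc m:o in the second combine into an arc i:o whose weight is the sum of the two.

# Each FST is (arcs, start, finals); arcs: state -> [(in, out, weight, next), ...].
def compose(fst1, fst2):
    arcs1, start1, finals1 = fst1
    arcs2, start2, finals2 = fst2
    arcs, finals = {}, set()
    stack, seen = [(start1, start2)], {(start1, start2)}
    while stack:
        s1, s2 = stack.pop()
        if s1 in finals1 and s2 in finals2:
            finals.add((s1, s2))
        out_arcs = arcs.setdefault((s1, s2), [])
        for i1, o1, w1, n1 in arcs1.get(s1, []):
            for i2, o2, w2, n2 in arcs2.get(s2, []):
                if o1 == i2:                          # middle labels must match
                    out_arcs.append((i1, o2, w1 + w2, (n1, n2)))
                    if (n1, n2) not in seen:
                        seen.add((n1, n2))
                        stack.append((n1, n2))
    return arcs, (start1, start2), finals

# Toy example: A maps a->X then b->Y; B maps X->P and Y->Q.  Their composition
# therefore maps a->P then b->Q, each arc weighted 0.5 + 1.0 = 1.5.
A = ({0: [("a", "X", 0.5, 1)], 1: [("b", "Y", 0.5, 2)]}, 0, {2})
B = ({0: [("X", "P", 1.0, 1)], 1: [("Y", "Q", 1.0, 2)]}, 0, {2})
print(compose(A, B))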
12
WFST theory and practice: Determinisation
13
WFST theory and practice: Weight pushing and minimisation
14
WFST theory and practice: WFST and speech decoding
  • ASR system building blocks
  • Grammar
  • Lexical knowledge
  • Phonetic knowledge
  • Acoustic knowledge
  • Each of these knowledge sources has a WFST
    representation (see the note below)
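
As background (my own note, not text from the slides): in the standard WFST approach these knowledge sources are composed into a single recognition network, conventionally written

    N = H \circ C \circ L \circ G

where H is the HMM state transducer, C the context-dependency transducer, L the pronunciation lexicon and G the grammar or language model. As noted on the decoder slide below, Juicer keeps the state-to-phone transducer separate and incorporates it at run time rather than composing it statically.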

15
WFST theory and practice: WFST and speech decoding
  • Requires some special considerations
  • The lexicon and grammar composition cannot be
    determinised directly
  • Nor can the context-dependency transducer
  • where G, L and C are the WFSTs for the grammar,
    lexicon and context dependency (the slide's
    accompanying formula is not preserved; see the
    sketch below)
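
A common remedy in the WFST literature, and plausibly what the missing formula expressed, is to insert auxiliary disambiguation symbols so that L \circ G becomes determinisable, and then to build the static network roughly as

    N = \min(\det(C \circ \det(L \circ G)))

with the auxiliary symbols removed (or replaced by epsilon) afterwards. This is a sketch of the usual recipe, not necessarily Juicer's exact construction.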

16
WFST theory and practice: WFST and speech decoding
  • Pros
  • Flexibility
  • Simple decoder architecture
  • Optimised search space
  • Cons
  • Transducer size
  • Knowledge sources are fixed during composition
  • Knowledge sources must be expressible as WFSTs

17
What is Juicer?
  • A time-synchronous Viterbi decoder
  • Tools for WFST construction
  • An interface to 3rd-party FSM tools

18
What is Juicer? Decoder
  • Pruning
  • beam search and histogram pruning (see the
    sketch at the end of this slide)
  • 1-best output
  • word and model timing information
  • Lattice generation
  • phone level lattice output
  • State-to-phone transducer is not optimised
  • incorporated at run time
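
As an illustration of this style of pruning (a generic sketch of time-synchronous Viterbi search, not Juicer's actual code; the parameter names are mine), at each frame the decoder keeps only hypotheses scoring within a beam of the current best and optionally caps the number of active states:

import math

def prune(active, beam=10.0, max_active=1000):
    """Beam plus histogram pruning over {state: cost}; lower cost is better."""
    if not active:
        return active
    best = min(active.values())
    # Beam pruning: drop hypotheses worse than the best by more than 'beam'.
    kept = {s: c for s, c in active.items() if c <= best + beam}
    # Histogram pruning: if still too many, keep only the 'max_active' best.
    if len(kept) > max_active:
        kept = dict(sorted(kept.items(), key=lambda kv: kv[1])[:max_active])
    return kept

def viterbi_step(active, arcs, acoustic_cost, beam=10.0, max_active=1000):
    """One time-synchronous step: extend, add the frame's acoustic cost, prune."""
    new_active = {}
    for state, cost in active.items():
        for next_state, arc_weight in arcs.get(state, []):
            total = cost + arc_weight + acoustic_cost(next_state)
            if total < new_active.get(next_state, math.inf):
                new_active[next_state] = total
    return prune(new_active, beam, max_active)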

19
What is Juicer? WFST tools
  • gramgen
  • word-loop, word pair, N-gram language models
  • lexgen
  • multiple pronunciations
  • cdgen
  • monophone, word-internal n-phone, cross-word
    triphone
  • HTK CDHMM and hybrid HMM/ANN model support
  • build-wfst
  • composition, determinisation and minimisation
    using 3rd-party tools (AT&T, MIT)

20
Benchmarking experiments
  • Experiments were conducted in order to
  • Compare with existing state-of-the-art decoders
  • Assess the current capabilities and limitations
    of the decoder
  • Guide future development and research directions

21
Benchmarking experiments: 20k Wall Street Journal task
  • Equivalent performance on wide beam settings
  • HDecode wins out on narrow beam-widths
  • Only part of the story

22
Benchmarking experiments: but what's the catch?
  • Composition of large static networks:
  • can be practically infeasible due to memory limitations
  • is slow
  • and may not always be necessary

23
Benchmarking experiments: AMI Meeting Room Recogniser
  • Decoding for the NIST Rich Transcription
    evaluations
  • Juicer uses pruned LMs
  • Good trade-off between RTF and WER

(Figure: RTF vs. WER trade-off plot with the chosen operating point marked; not preserved)
24
The future of Juicer
  • Further benchmarking
  • Testing against HDecode
  • Trade-off between pruned LMs and performance
  • Added capabilities
  • On-the-fly network expansion
  • Word lattice generation
  • Support for MLLR transforms, feature transforms
  • Distribution and support
  • Currently only available to AMI and IM2 partners

25
Summary
  • Questions?
  • Today I have presented
  • WFST theory and practice
  • The Juicer tools and decoder
  • Preliminary experiments
  • but more importantly
  • We hope to have generated interest in Juicer