Juicer: A weighted finite-state transducer speech decoder

1
Juicer: A weighted finite-state transducer speech decoder

D. Moore¹, J. Dines¹,
M. Magimai Doss¹, J. Vepa¹,
O. Cheng¹ and T. Hain²
¹ IDIAP Research Institute
² Department of Computer Science, University of Sheffield

2
Overview

The speech decoding problem
Why develop another decoder?
WFST theory and practice
What is Juicer?
Benchmarking experiments
The future of Juicer

3
The speech decoding problem

Given a recording and models of speech and language, generate a text transcription of what was said

[Figure: the Decoder combines the recording with the Models and outputs the transcription "She had your dark suit."]
4
The speech decoding problem

5
The speech decoding problem

6
The speech decoding problem

ASR system building blocks
Grammar: N-gram language model
Lexical knowledge: pronunciation dictionary
Phonetic knowledge: context dependency, phonological rules
Acoustic knowledge: state distributions

Naive combination of these knowledge sources leads to a large, inefficient representation of the search space
7
The speech decoding problem

The main issue in decoding is carrying out an efficient search of the space defined by the knowledge sources
There are two ways to do this:
Avoid performing redundant search
Don't pursue unpromising hypotheses
An additional issue: the flexibility of the decoder

8
Why develop another decoder?

Need for a state-of-the-art speech decoder that is also suitable for ongoing research
At present, such software is not freely available to the research community
Open-source development and distribution framework

9
WFST theory and practice

Maps sequences of input symbols to sequences of output symbols
Each transition carries an input:output label pair and an associated weight
In the example:
Input sequence I = a b c d maps to output sequence O = X Y Z W, with the path weight a function of all transition weights along that path, f(0.1, 0.2, 0.5, 0.1)
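The example can be reproduced with a minimal sketch. It assumes that the transducer in the figure is a simple chain a:X/0.1, b:Y/0.2, c:Z/0.5, d:W/0.1, and that f is addition, as in the tropical semiring where weights are negative log probabilities. In the probability semiring f would be multiplication. The dictionary layout and the name `transduce` are illustrative assumptions, not Juicer's internal representation.

```python
from typing import Dict, List, Set, Tuple

# (state, input label) -> (output label, weight, next state).
# Assumes a deterministic, epsilon-free transducer to keep the sketch short.
Transitions = Dict[Tuple[int, str], Tuple[str, float, int]]


def transduce(arcs: Transitions, start: int, finals: Set[int],
              inputs: List[str]) -> Tuple[List[str], float]:
    """Return the output sequence and path weight for an input sequence.

    The path weight uses f = addition, as in the tropical semiring.
    """
    state, outputs, weight = start, [], 0.0
    for symbol in inputs:
        if (state, symbol) not in arcs:
            raise ValueError(f"no transition for {symbol!r} from state {state}")
        out, w, state = arcs[(state, symbol)]
        outputs.append(out)
        weight += w  # accumulate f(...) one transition at a time
    if state not in finals:
        raise ValueError("input does not end in a final state")
    return outputs, weight


# Assumed chain structure for the example: a:X/0.1, b:Y/0.2, c:Z/0.5, d:W/0.1
arcs: Transitions = {
    (0, "a"): ("X", 0.1, 1),
    (1, "b"): ("Y", 0.2, 2),
    (2, "c"): ("Z", 0.5, 3),
    (3, "d"): ("W", 0.1, 4),
}
outputs, weight = transduce(arcs, 0, {4}, ["a", "b", "c", "d"])
print(outputs, round(weight, 3))  # ['X', 'Y', 'Z', 'W'] 0.9
```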

10
WFST theory and practice: WFST operations

Composition: combination of transducers
Determinisation: at most one transition per input label leaving each state
Minimisation: least number of states and transitions
Weight pushing to aid in minimisation

11
WFST theory and practice: Composition
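Composition can be sketched for epsilon-free transducers in the tropical semiring, where weights add along a path. Each state of the result is a pair of states from the two inputs. An arc is created wherever an output label of the first transducer matches an input label of the second. Handling epsilon labels would need an extra composition filter, which is omitted here. The `WFST` class and `compose` function are hypothetical and are not the API of the AT&T or MIT FSM tools.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# (source, input label, output label, weight, destination); tropical weights add.
Arc = Tuple[int, str, str, float, int]


class WFST:
    def __init__(self, start: int, finals: Set[int], arcs: List[Arc]):
        self.start, self.finals, self.arcs = start, finals, arcs

    def out_arcs(self, state: int) -> List[Arc]:
        return [a for a in self.arcs if a[0] == state]


def compose(t1: WFST, t2: WFST) -> WFST:
    """Epsilon-free composition: t1's output labels are matched to t2's input labels."""
    index: Dict[Tuple[int, int], int] = {(t1.start, t2.start): 0}
    queue = deque([(t1.start, t2.start)])
    arcs: List[Arc] = []
    finals: Set[int] = set()
    while queue:
        q1, q2 = queue.popleft()
        src = index[(q1, q2)]
        if q1 in t1.finals and q2 in t2.finals:
            finals.add(src)
        for _, i1, o1, w1, d1 in t1.out_arcs(q1):
            for _, i2, o2, w2, d2 in t2.out_arcs(q2):
                if o1 != i2:
                    continue  # labels must match for the two paths to combine
                if (d1, d2) not in index:
                    index[(d1, d2)] = len(index)
                    queue.append((d1, d2))
                arcs.append((src, i1, o2, w1 + w2, index[(d1, d2)]))
    return WFST(0, finals, arcs)


# Toy example: t1 maps a->b, c->d; t2 maps b->X, d->Y on a single looping state.
t1 = WFST(0, {2}, [(0, "a", "b", 0.5, 1), (1, "c", "d", 1.0, 2)])
t2 = WFST(0, {0}, [(0, "b", "X", 0.2, 0), (0, "d", "Y", 0.3, 0)])
composed = compose(t1, t2)
for src, i, o, w, dst in composed.arcs:
    print(src, f"{i}:{o}/{round(w, 3)}", dst)
# 0 a:X/0.7 1
# 1 c:Y/1.3 2
```

Only the pairs of states reachable from the start pair are built. This is why composition of large knowledge sources can still be practical when many label combinations never match.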
12
WFST theory and practice: Determinisation
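Weighted determinisation can be sketched as a subset construction. Each new state records, for every original state it covers, a residual weight that is still owed. The sketch assumes an epsilon-free weighted acceptor (single labels) in the tropical semiring and an acyclic input. On cyclic inputs, weighted determinisation only terminates under extra conditions such as the twins property. The function name `determinize` is hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

# (source, label, weight, destination) for a weighted acceptor; tropical weights add.
Arc = Tuple[int, str, float, int]


def determinize(start: int, finals: Dict[int, float], arcs: List[Arc]):
    """Weighted subset construction (tropical semiring) for epsilon-free acceptors.

    Each new state is a set of (original state, residual weight) pairs. Guaranteed
    to terminate for acyclic inputs; cyclic inputs need e.g. the twins property.
    """
    out = defaultdict(list)
    for src, label, w, dst in arcs:
        out[src].append((label, w, dst))

    init = frozenset({(start, 0.0)})
    index = {init: 0}
    queue = deque([init])
    det_arcs: List[Arc] = []
    det_finals: Dict[int, float] = {}
    while queue:
        subset = queue.popleft()
        src = index[subset]
        final_weights = [r + finals[q] for q, r in subset if q in finals]
        if final_weights:
            det_finals[src] = min(final_weights)
        # Collect all transitions leaving the subset, grouped by label.
        by_label: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
        for q, r in subset:
            for label, w, dst in out[q]:
                by_label[label].append((r + w, dst))
        for label in sorted(by_label):
            cands = by_label[label]
            w_min = min(w for w, _ in cands)  # weight placed on the single new arc
            residuals: Dict[int, float] = {}
            for w, dst in cands:
                # Weight each destination still owes; rounded so that
                # numerically equal subsets hash to the same key.
                residuals[dst] = min(residuals.get(dst, float("inf")), round(w - w_min, 9))
            nxt = frozenset(residuals.items())
            if nxt not in index:
                index[nxt] = len(index)
                queue.append(nxt)
            det_arcs.append((src, label, w_min, index[nxt]))
    return 0, det_finals, det_arcs


# Non-deterministic input: two 'a' arcs leave state 0.
arcs = [(0, "a", 1.0, 1), (0, "a", 2.0, 2), (1, "b", 3.0, 3), (2, "b", 1.0, 3)]
start, det_finals, det_arcs = determinize(0, {3: 0.0}, arcs)
print(det_arcs)    # [(0, 'a', 1.0, 1), (1, 'b', 2.0, 2)]
print(det_finals)  # {2: 0.0}
```

The single remaining path a b has weight 1.0 + 2.0 = 3.0. This equals the best of the two original paths, min(1 + 3, 2 + 1). The search space shrinks, but the best score for each label sequence stays the same.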
13
WFST theory and practice: Weight pushing and minimisation
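Weight pushing moves weight towards the start of the network so that equivalent states become identical, which lets minimisation merge them. A minimal sketch, assuming an acyclic acceptor in the tropical semiring: compute the shortest distance d(q) from each state to a final state, then reweight every arc as w + d(dst) - d(src). The function name `push_weights` is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# (source, label, weight, destination); tropical weights add.
Arc = Tuple[int, str, float, int]


def push_weights(start: int, finals: Dict[int, float], arcs: List[Arc]):
    """Push weights towards the initial state (tropical semiring, acyclic input only).

    d(q) is the shortest distance from q to a final state. Each arc is reweighted
    to w + d(dst) - d(src), and d(start) becomes the initial weight, so every
    successful path keeps its total weight.
    """
    out = defaultdict(list)
    for src, _, w, dst in arcs:
        out[src].append((w, dst))

    dist: Dict[int, float] = {}

    def d(q: int) -> float:
        # Memoised recursion; assumes the automaton has no cycles.
        if q not in dist:
            best = finals.get(q, math.inf)
            for w, dst in out[q]:
                best = min(best, w + d(dst))
            dist[q] = best
        return dist[q]

    # Arcs into states that cannot reach a final state are dropped.
    pushed = [(s, label, w + d(t) - d(s), t)
              for s, label, w, t in arcs if math.isfinite(d(t))]
    pushed_finals = {q: f - d(q) for q, f in finals.items()}
    return d(start), pushed_finals, pushed


arcs = [(0, "a", 1.0, 1), (1, "c", 4.0, 3), (0, "b", 3.0, 2), (2, "c", 1.0, 3)]
initial, pushed_finals, pushed = push_weights(0, {3: 0.0}, arcs)
print(initial)  # 4.0
print(pushed)   # [(0, 'a', 1.0, 1), (1, 'c', 0.0, 3), (0, 'b', 0.0, 2), (2, 'c', 0.0, 3)]
```

Before pushing, the two c arcs into state 3 carry 4.0 and 1.0, so states 1 and 2 differ. After pushing, both carry 0.0, so a minimisation step can merge states 1 and 2.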
14
WFST theory and practice: WFST and speech decoding

ASR system building blocks
Grammar
Lexical knowledge
Phonetic knowledge
Acoustic knowledge

Each of these knowledge sources has a WFST representation
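The following sketch shows how the lexical knowledge (a pronunciation dictionary) might become a transducer L that maps phones to words. It assumes that the word label is emitted on the first phone of each pronunciation and epsilon on the rest. Every pronunciation variant gets its own path, and the last phone returns to state 0, which is both start and final state, so words can follow one another. The names `build_lexicon` and `<eps>` and the phone symbols are illustrative assumptions, not lexgen's actual output format.

```python
from typing import Dict, List, Tuple

EPS = "<eps>"
# (source, input phone, output word or EPS, weight, destination)
Arc = Tuple[int, str, str, float, int]


def build_lexicon(prons: Dict[str, List[List[str]]]) -> List[Arc]:
    """Build a lexicon transducer L (phones in, words out) as a closure over words.

    State 0 is both the start and the final state. The word label is emitted on
    the first phone of each pronunciation and EPS on the remaining phones.
    """
    arcs: List[Arc] = []
    next_state = 1
    for word, variants in prons.items():
        for phones in variants:  # one path per pronunciation variant
            src = 0
            for k, phone in enumerate(phones):
                last = k == len(phones) - 1
                dst = 0 if last else next_state
                if not last:
                    next_state += 1
                arcs.append((src, phone, word if k == 0 else EPS, 0.0, dst))
                src = dst
    return arcs


# "either" has two pronunciations, so it gets two separate paths.
lex = build_lexicon({
    "she": [["sh", "iy"]],
    "either": [["iy", "dh", "er"], ["ay", "dh", "er"]],
})
print(len(lex))  # 8
```

Weights are all 0.0 here. A lexicon with pronunciation probabilities could instead put -log p on the first arc of each variant.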

15
WFST theory and practice: WFST and speech decoding

Requires some special considerations:
The composition of the lexicon and grammar, L∘G, cannot in general be determinised
Nor can the context-dependency transducer C
Here L, G and C are the WFSTs for the lexicon, grammar and context dependency, respectively
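The WFST speech-recognition literature has a standard recipe for both problems. It is shown here only as an illustration of that recipe, not as Juicer's exact construction. Auxiliary disambiguation symbols are added to the lexicon, giving L̃, so that homophones and pronunciations that are prefixes of other pronunciations no longer block determinisation. The context-dependency transducer is extended to pass these symbols through, giving C̃. The symbols are removed after optimisation:

```latex
N = \pi_{\epsilon}\left(\min\left(\det\left(\tilde{C} \circ \det\left(\tilde{L} \circ G\right)\right)\right)\right)
```

Here ∘ is composition, det and min are determinisation and minimisation, and π_ε replaces the auxiliary symbols with ε.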

16
WFST theory and practice: WFST and speech decoding

Pros
Flexibility
Simple decoder architecture
Optimised search space
Cons
Transducer size
Knowledge sources are fixed during composition
WFST-only knowledge sources

17
What is Juicer?

A time-synchronous Viterbi decoder
Tools for WFST construction
An interface to 3rd-party FSM tools
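A time-synchronous Viterbi decoder processes the input one frame at a time and keeps only the best hypothesis (token) per network state. It can be sketched as token passing over a composed network. The sketch assumes an epsilon-free WFST whose input labels are acoustic units with precomputed per-frame costs (negative log-likelihoods). The toy "yes"/"no" network, its costs and the name `viterbi_decode` are invented for illustration and do not reflect Juicer's internals. No pruning is applied.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# (source, input label, output word or "" for none, weight, destination)
Arc = Tuple[int, str, str, float, int]


def viterbi_decode(arcs: List[Arc], start: int, finals: Dict[int, float],
                   frame_costs: List[Dict[str, float]]) -> Tuple[float, List[str]]:
    """Time-synchronous Viterbi search (token passing) over an epsilon-free WFST.

    At every frame each active token crosses each outgoing arc once, adding the arc
    weight and the acoustic cost of the arc's input label; tokens that reach the
    same state are recombined by keeping only the cheapest one.
    """
    out = defaultdict(list)
    for src, i, o, w, dst in arcs:
        out[src].append((i, o, w, dst))

    tokens: Dict[int, Tuple[float, List[str]]] = {start: (0.0, [])}
    for costs in frame_costs:  # one step per frame: the search is time-synchronous
        new_tokens: Dict[int, Tuple[float, List[str]]] = {}
        for state, (cost, words) in tokens.items():
            for i, o, w, dst in out[state]:
                if i not in costs:
                    continue
                c = cost + w + costs[i]
                if dst not in new_tokens or c < new_tokens[dst][0]:
                    new_tokens[dst] = (c, words + [o] if o else words)
        tokens = new_tokens

    # Best token that ends in a final state, including the final weight.
    return min(((c + finals[s], ws) for s, (c, ws) in tokens.items() if s in finals),
               default=(math.inf, []))


# Hypothetical "yes"/"no" network: self-loops let each phone span several frames.
arcs = [
    (0, "y", "yes", 0.0, 1), (1, "y", "", 0.0, 1), (1, "s", "", 0.0, 2), (2, "s", "", 0.0, 2),
    (0, "n", "no", 0.0, 3), (3, "n", "", 0.0, 3), (3, "o", "", 0.0, 4), (4, "o", "", 0.0, 4),
]
# Made-up per-frame acoustic costs (negative log-likelihoods) for three frames.
frames = [
    {"y": 0.1, "s": 2.0, "n": 1.5, "o": 2.0},
    {"y": 0.3, "s": 0.4, "n": 1.8, "o": 1.0},
    {"y": 2.0, "s": 0.2, "n": 2.5, "o": 0.9},
]
cost, words = viterbi_decode(arcs, 0, {2: 0.0, 4: 0.0}, frames)
print(words, round(cost, 2))  # ['yes'] 0.6
```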

18
What is Juicer? Decoder

Pruning: beam search and histogram pruning
1-best output: word and model timing information
Lattice generation: phone-level lattice output
The state-to-phone transducer is not optimised; it is incorporated at run time
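The two pruning methods can be sketched as a filter applied to the active tokens after each frame. Beam pruning drops any token whose cost is more than a fixed margin worse than the best token. Histogram pruning then caps the number of survivors. Costs are assumed to be negative log scores, so lower is better. The parameter names `beam` and `max_active` are hypothetical and are not Juicer's actual options.

```python
import heapq
from typing import Dict


def prune(tokens: Dict[int, float], beam: float, max_active: int) -> Dict[int, float]:
    """Apply beam pruning, then histogram pruning, to one frame's active tokens.

    tokens maps WFST state -> accumulated cost (lower is better).
    """
    if not tokens:
        return {}
    best = min(tokens.values())
    # Beam pruning: keep tokens within `beam` of the best cost.
    survivors = {s: c for s, c in tokens.items() if c <= best + beam}
    # Histogram pruning: keep at most `max_active` of the cheapest survivors.
    if len(survivors) > max_active:
        kept = heapq.nsmallest(max_active, survivors.items(), key=lambda item: item[1])
        survivors = dict(kept)
    return survivors


pruned = prune({1: 10.0, 2: 10.5, 3: 12.0, 4: 25.0, 5: 11.0}, beam=5.0, max_active=3)
print(pruned)  # {1: 10.0, 2: 10.5, 5: 11.0}
```

The beam adapts to the score range of each frame. The histogram limit bounds memory and time when many hypotheses score similarly.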

19
What is Juicer? WFST tools

gramgen: word-loop, word-pair and N-gram language models
lexgen: multiple pronunciations
cdgen: monophone, word-internal n-phone and cross-word triphone models; HTK CDHMM and hybrid HMM/ANN model support
build-wfst: composition, determinisation and minimisation using 3rd-party tools (AT&T, MIT)
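The difference between word-internal and cross-word context-dependent models can be illustrated by expanding a phone sequence into triphone names in HTK's left-centre+right notation. With word-internal context, the phones at word edges become biphones. With cross-word context, neighbouring words supply the missing context. This sketch only produces model names. A real context-dependency transducer also maps them onto (tied) HMM states and handles silence, which is omitted here. The function names and the phone set are assumptions, not cdgen's implementation.

```python
from typing import List


def _name(phones: List[str], k: int) -> str:
    """HTK-style name for phone k: left-centre+right, dropping missing context."""
    left = phones[k - 1] + "-" if k > 0 else ""
    right = "+" + phones[k + 1] if k + 1 < len(phones) else ""
    return left + phones[k] + right


def triphones(words: List[List[str]], cross_word: bool) -> List[str]:
    """Expand word pronunciations into context-dependent model names.

    With cross_word=False, context stops at word boundaries (word-internal);
    with cross_word=True, context spans the boundaries between words.
    """
    if cross_word:
        phones = [p for word in words for p in word]
        return [_name(phones, k) for k in range(len(phones))]
    return [_name(word, k) for word in words for k in range(len(word))]


she_had = [["sh", "iy"], ["hh", "ae", "d"]]
print(triphones(she_had, cross_word=False))
# ['sh+iy', 'sh-iy', 'hh+ae', 'hh-ae+d', 'ae-d']
print(triphones(she_had, cross_word=True))
# ['sh+iy', 'sh-iy+hh', 'iy-hh+ae', 'hh-ae+d', 'ae-d']
```

With cross-word context, the model used for a word's last phone depends on the next word. This is one reason cross-word context dependency makes the search network larger and is convenient to compile into a WFST ahead of time.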

20
Benchmarking experiments

Experiments were conducted to:
Compare Juicer with existing state-of-the-art decoders
Assess the current capabilities and limitations of the decoder
Guide future development and research directions

21
Benchmarking experiments: 20k Wall Street Journal task

Equivalent performance at wide beam settings
HDecode wins out at narrow beam widths
This is only part of the story

22
Benchmarking experiments: but what's the catch?

Composition of large static networks:
Is practically infeasible due to memory limitations
Is slow
And may not always be necessary

23
Benchmarking experiments: AMI Meeting Room Recogniser

Decoding for the NIST Rich Transcription evaluations
Juicer uses pruned LMs
Good trade-off between real-time factor (RTF) and word error rate (WER)

[Figure: RTF/WER trade-off, with the chosen operating point marked]
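Both axes of this trade-off are standard metrics. WER is (substitutions + deletions + insertions) divided by the number of reference words, taken from a minimum edit-distance alignment. RTF is processing time divided by audio duration, so a value below 1.0 is faster than real time. The sketch below computes both. The function names and example figures are made up for illustration and are not results from these experiments. Official scoring tools also apply text normalisation, which is omitted here.

```python
from typing import List


def word_error_rate(ref: List[str], hyp: List[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(ref), via edit distance."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    return decode_seconds / audio_seconds


print(word_error_rate("she had your dark suit".split(), "she had a dark suit".split()))  # 0.2
print(real_time_factor(decode_seconds=90.0, audio_seconds=60.0))  # 1.5
```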
24
The future of Juicer

Further benchmarking: testing against HDecode; the trade-off between pruned LMs and performance
Added capabilities: on-the-fly network expansion; word lattice generation; support for MLLR transforms and feature transforms
Distribution and support: currently only available to AMI and IM2 partners

25
Summary

Questions?

Today I have presented:
WFST theory and practice
The Juicer tools and decoder
Preliminary experiments
But, more importantly:
We hope to have generated interest in Juicer
