Progress Presentation of Sphinx 3.6 2005 Q2

About This Presentation

Slides: 45

Provided by: Arthu61

Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

1
Progress Presentation of Sphinx 3.6 (2005 Q2)

Arthur Chan
Carnegie Mellon University
Jun 7, 2005

2
This talk

Purpose of this talk
A working progress report on various aspects of the development
A briefing on s3.generic
The codebase has existed only on my hard disk since Mar 28, 2005
It includes a bunch of gentle changes, but it is still significantly different from the current s3.5
Development is regarded as incomplete
Allows developers to have a mutual understanding of the code and its potential effects on future development

3
Outline of this talk (26 pages)

Review of changes of Sphinx 3.5 from Jan to April 1st
Mainly on GMM Computation (2 pages)
S3.generic (22 pages)
High Priority Items
New search architecture (7 pages)
Development of the new search using word-conditioned tree copies (7 pages)
Manipulation of LMs (1 page)
Other Items
Gentle re-factoring and minor changes (5 pages)
Progress on documentation (2 pages)
Discussion (2 pages)
On future plan of Sphinx 3 and SphinxTrain (1 page)

4
Review of work on GMM Computation in 3.X (X < 6)
5
Review of GMM Computation

Completed in Q1 2005 in conjunction with the development of the ICSI speed-up setup
Includes
Absolute discounting of CIGMMs
Usage of the best Gaussian index (BGI)
Usage of adaptive CIGMMs (ACIGMMs)
Details
www-2.cs.cmu.edu/archan/presentation/SphinxLunch20050310.ppt (Sphinx Lunch Presentation)
"On Improvements of CI-based GMM Selection", Eurospeech 2005
Already exists in the repository
tag SPHINX3_5_1_RCI_IRII

6
Last impression on GMM Computation

Internal comments on GMM computation were mixed
Speed gain is starting to reach a limit (30% relative instead of 80% relative)
Speed gain is also no longer the focus; accuracy is becoming the more important concern
Some other signs
AlexR's facial expressions
? (when talking about GMM computation)
? (when talking about future development of search)
Jack
zzzzzzzzz (literally fell asleep, not his default behavior)

7
Progress of GMM Computation

Still being worked on secretly
Details to be disclosed later

8
Design of Search Architecture in Sphinx 3.6
9
Development of new search

Why a new search in Sphinx 3?
Search in S3.X (X < 6) (Ravi's method)
An unconventional way to take care of the segmentation problem of using a tree lexicon
Gave a nice memory/speed/accuracy trade-off when it was first written
Downside
Not an exact bigram search
Techniques in the literature couldn't be easily applied
We will be able to apply 5-10 existing or new techniques if the conventional way is used

10
Design of the new search architecture

Motivation
The risk of replacing the old search is high
The old search is an interesting one; it would be a waste to just replace it
Re-factoring was first done to allow Ravi's method and the new search to co-exist
Implemented by so-called "C classes"
A struct with both internal variables and methods
A function-pointer implementation
Uses concepts similar to the implementation in feat.c
Similar to how C++ handles classes internally
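
The "C class" idea above can be illustrated with a minimal sketch. Every name here (toy_srch_t, toy_srch_select, the two toy implementations and their mode numbers) is hypothetical and chosen for illustration only; none of it is taken from srch.c or feat.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "C class": data members plus methods stored as function
 * pointers. None of these names come from the Sphinx sources. */
typedef struct toy_srch_s {
    const char *name;    /* label of the selected implementation */
    int n_frames_seen;   /* internal state shared by all implementations */
    int (*init)(struct toy_srch_s *s);          /* "method": reset state */
    int (*process_frame)(struct toy_srch_s *s); /* "method": handle one frame */
} toy_srch_t;

static int common_init(toy_srch_t *s) { s->n_frames_seen = 0; return 0; }

/* Implementation A: pretends to decode by counting every frame. */
static int impl_a_frame(toy_srch_t *s) { s->n_frames_seen += 1; return 0; }

/* Implementation B: a no-op that leaves the state untouched. */
static int impl_b_frame(toy_srch_t *s) { (void)s; return 0; }

/* "Constructor": the implementation is chosen purely by setting function
 * pointers. Returns 0 on success and -1 for an unknown (toy) mode. */
int toy_srch_select(toy_srch_t *s, int mode)
{
    assert(s != NULL);
    s->init = common_init;
    switch (mode) {
    case 0: s->name = "impl_a"; s->process_frame = impl_a_frame; break;
    case 1: s->name = "impl_b"; s->process_frame = impl_b_frame; break;
    default: return -1;
    }
    return s->init(s);
}

/* Callers use one interface, whichever implementation was selected.
 * Returns the number of frames the implementation counted, or -1. */
int toy_srch_run(toy_srch_t *s, int n_frames)
{
    int t;
    for (t = 0; t < n_frames; t++)
        if (s->process_frame(s) != 0)
            return -1;
    return s->n_frames_seen;
}
```

Calling toy_srch_select(&s, 0) and then toy_srch_run(&s, 5) goes through the same calls as with mode 1; only the function pointers differ, which is the co-existence property the refactoring was after.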

11
Separation of Mechanism and Implementation

Search Mechanism Module (srch.c)
- Provides Atomic Search Operations (ASOs) in the form of function pointers
- Only implements one mechanism
- ASOs can be configured by just setting the values of function pointers
- A single interface for applications

Search Implementation Modules (srch_????.c)
- There can be multiple of them
- Responsible for details such as handling of the graph and knowledge sources
- Possibilities: (A) decoding with different implementations; (B) operations that have the concept of search, including alignment, phoneme recognition or keyword spotting
12
Advantages

A cheap way to get polymorphism
When the flow of the search needs to change
E.g. batch mode or live mode
Only the search mechanism module needs to be re-implemented
When the details of the search need to change
One can choose to rewrite the whole search or just part of the implementation
No need for complete replacement

13
What does the search mechanism module actually do? - A flow chart (simplified version)

[Flow chart, two stages; senone computation passes scores to search]
Senone Computation
Compute Approx. GMM Score (CI senone) (1st approximation; information for pruning GMM)
Select Active CD Senone
Compute Detail GMM Score (CD senone)
Search
Compute Detail HMM Score (CD)
Propagate Graph (Phone-Level)
Rescoring at Word End using High-Level KS (e.g. LM)
Propagate Graph (Word-Level)
14
Different Search Implementations

3 modes are currently implemented
Mode 4
Ravi's search for 3.X (X < 6) (Completion 100%)
Mode 5
Word-conditioned tree copy search (Completion 10%)
Mode 1369
Debug mode of the search mechanism module
No decoding will be done; only text is output to indicate the flow of the search
Reserved modes (not implemented yet)
Mode 0 - Forced alignment
Mode 1 - Phoneme recognition
Mode 2 - Graph Search with FSM
Mode 3 - Flat Lexicon Search

15
Architecture Diagram

[Diagram]
Applications: decode (Batch-mode Decoder); livepretend, livedecode (Live-mode Decoder)
Both decoders go through the Search Mechanism
Search implementations: Implementation of Ravi's Search (Mode 4), Implementation of 3.6 Search (Mode 5), Implementation of Search Debugging (Mode 1369)
Resources: GMM, Trees, Dict, LM, Fast GMM struct, Beam struct
16
Search anatomy in debug mode

SEARCH DEBUG MODE UTT BEGIN
SEARCH DEBUG APPROXIMATE COMPUTATION AT TIME 0
SEARCH DEBUG SELECT ACTIVE GMM
SEARCH DEBUG DETAIL COMPUTATION AT TIME 0
SEARCH DEBUG COMPUTE HEURISTIC
SEARCH DEBUG HMM COMPUTE LV 2
SEARCH DEBUG HMM PROPAGATE GRAPH (PHONEME) LV 2
SEARCH DEBUG RESCORING AT LV2
SEARCH DEBUG HMM PROPAGATE GRAPH (WORD) LV 2
SEARCH DEBUG SHIFT ONE CACHE FRAME
SEARCH DEBUG APPROXIMATE COMPUTATION AT TIME 1
SEARCH DEBUG FRAME WINDUP
SEARCH DEBUG SELECT ACTIVE GMM
SEARCH DEBUG DETAIL COMPUTATION AT TIME 1
SEARCH DEBUG COMPUTE HEURISTIC
SEARCH DEBUG HMM COMPUTE LV 2
SEARCH DEBUG HMM PROPAGATE GRAPH (PHONEME) LV 2
SEARCH DEBUG RESCORING AT LV2
SEARCH DEBUG HMM PROPAGATE GRAPH (WORD) LV 2
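
A trace like the one above is what a fixed per-frame loop produces when each atomic search operation only logs its name. The following is a simplified sketch under that assumption. All identifiers (toy_asos_t, toy_decode_utt, the dbg_* functions) and the message format are hypothetical, and the look-ahead, heuristic and frame wind-up steps visible in the real trace are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_TRACE_MAX 2048

/* Hypothetical bundle of atomic search operations (ASOs). */
typedef struct toy_asos_s toy_asos_t;
typedef void (*toy_aso_fn)(toy_asos_t *s, int t);

struct toy_asos_s {
    char trace[TOY_TRACE_MAX];  /* filled only by the debug implementation */
    toy_aso_fn approx_gmm;      /* CI senone scores, used for pruning */
    toy_aso_fn select_active;   /* choose the active CD senones */
    toy_aso_fn detail_gmm;      /* CD senone scores */
    toy_aso_fn hmm_compute;     /* HMM scores */
    toy_aso_fn propagate_phone; /* phone-level graph propagation */
    toy_aso_fn rescore;         /* word-end rescoring, e.g. with the LM */
    toy_aso_fn propagate_word;  /* word-level graph propagation */
};

/* Append "<msg> (t=<t>)\n" to the trace; output is truncated when full. */
static void toy_trace(toy_asos_t *s, const char *msg, int t)
{
    size_t used = strlen(s->trace);
    snprintf(s->trace + used, TOY_TRACE_MAX - used, "%s (t=%d)\n", msg, t);
}

/* Debug implementation: no decoding, each operation just logs itself. */
static void dbg_approx(toy_asos_t *s, int t)  { toy_trace(s, "APPROX GMM", t); }
static void dbg_select(toy_asos_t *s, int t)  { toy_trace(s, "SELECT ACTIVE", t); }
static void dbg_detail(toy_asos_t *s, int t)  { toy_trace(s, "DETAIL GMM", t); }
static void dbg_hmm(toy_asos_t *s, int t)     { toy_trace(s, "HMM COMPUTE", t); }
static void dbg_phone(toy_asos_t *s, int t)   { toy_trace(s, "PROPAGATE PHONE", t); }
static void dbg_rescore(toy_asos_t *s, int t) { toy_trace(s, "RESCORE", t); }
static void dbg_word(toy_asos_t *s, int t)    { toy_trace(s, "PROPAGATE WORD", t); }

/* Configure the operations by setting function pointers. */
void toy_debug_init(toy_asos_t *s)
{
    s->trace[0] = '\0';
    s->approx_gmm = dbg_approx;
    s->select_active = dbg_select;
    s->detail_gmm = dbg_detail;
    s->hmm_compute = dbg_hmm;
    s->propagate_phone = dbg_phone;
    s->rescore = dbg_rescore;
    s->propagate_word = dbg_word;
}

/* The mechanism: one fixed order of operations per frame. */
void toy_decode_utt(toy_asos_t *s, int n_frames)
{
    int t;
    assert(s != NULL);
    for (t = 0; t < n_frames; t++) {
        s->approx_gmm(s, t);
        s->select_active(s, t);
        s->detail_gmm(s, t);
        s->hmm_compute(s, t);
        s->propagate_phone(s, t);
        s->rescore(s, t);
        s->propagate_word(s, t);
    }
}
```

Swapping in a real implementation would mean assigning different functions in an init routine; toy_decode_utt itself would not change.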

17
Discussion

Why not use a graph as the parent data structure?
Say, inherit a tree or a bi-tree from a graph?
This sounds like a way that could unify different methods.

18
Discussion (cont.)

My answer
Because of legacy,
most recognizers actually use many special methods, each optimizing the speed of search in a different way
A generic graph search may not be able to represent these methods sufficiently
That's why a lot of graph approaches turn out to be slower than their tree equivalents
It could require a lot of effort
to make a generic graph search as fast as the legacy system

19
Development Progress of Search Mode 5: A word-conditioned tree copies search
20
Flat Lexicon and Tree Lexicon - Unigram Search
[Diagram: Word 1 with P(w1); Word 2 with P(w2)]
- A tree lexicon with a single tree copy will produce the same result as a flat lexicon
- Only difference: in a flat lexicon, uw could be applied at both word begin and word end; in a tree lexicon, uw could be applied only at the word end
21
Flat Lexicon and Tree Lexicon - Bigram Search
[Diagram: flat-lexicon and tree-lexicon bigram search over Word 1 and Word 2; labels: phones ph1, ph2, ph3; transitions P(w1|w1), P(w2|w1), P(w1|w2), P(w2|w2)]
- The two searches are not equivalent because the tree search doesn't consider the possibilities of P(w2|w1) or P(w2|w2)
- If the max is taken at the word end, then the Word Segmentation Error will occur (another term: Delayed Bigram)
22
Flat Lexicon and Tree Lexicon - Bigram Search (cont.)
[Diagram: tree copies for Word 1 and Word 2; labels: P(w1), P(w2), P(w1|w1), P(w2|w1), P(w1|w2), P(w2|w2)]
- Need to maintain copies of the tree representing the states in which word 1 and word 2 were entered
23
Flat Lexicon and Tree Lexicon - Bigram Search (cont.)

Intriguing economics of the tree lexicon
Going from a flat lexicon to a tree lexicon gives a 3-4 times reduction of the state space
Expansion of tree copies requires N times the state space, where N is the number of words (e.g. N = 100 to 65k)
So, why did it become a textbook answer?
When the search space is dynamically expanded with pruning, it will be significantly smaller (from the literature, usually only 10-50 times)
Multiple techniques can reduce this number further
Usage of back-off nodes
Usage of tail-sharing
Usage of sub-tree dominance
No need to expand the whole tree
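
The arithmetic behind this trade-off can be made concrete with a small sketch. The flat-lexicon state count passed in is an arbitrary illustrative input, not a measurement, and the "10-50 times" figure is read here as the number of tree copies alive after pruning, which is an assumption about the slide's wording. The names toy_state_counts_t and toy_state_counts are hypothetical.

```c
#include <assert.h>

/* State counts for the three configurations discussed on the slide. */
typedef struct {
    unsigned long long tree;       /* a single lexical tree */
    unsigned long long all_copies; /* one tree copy per predecessor word */
    unsigned long long pruned;     /* only the copies alive after pruning */
} toy_state_counts_t;

/* flat_states:   states in the flat lexicon (illustrative input)
 * reduction:     flat-to-tree reduction factor (3-4 on the slide)
 * n_words:       vocabulary size N
 * active_copies: tree copies that survive pruning (assumed meaning of
 *                the "10-50 times" figure) */
toy_state_counts_t toy_state_counts(unsigned long long flat_states,
                                    unsigned reduction,
                                    unsigned n_words,
                                    unsigned active_copies)
{
    toy_state_counts_t c;
    assert(reduction > 0);
    c.tree = flat_states / reduction;
    c.all_copies = c.tree * n_words;
    c.pruned = c.tree * active_copies;
    return c;
}
```

For example, toy_state_counts(900000, 3, 65000, 50) gives 300,000 states for one tree, 19.5 billion if every word had its own copy, and 15 million when only 50 copies are alive, which is the gap that makes the textbook approach workable.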

24
Important note: how did Ravi solve it then?

This is the black magic of Ravi
Magic 1: Instead of using word tree copies
Transitions into lextrees are staggered across time
Multiple trees are allocated
At alternate times, an alternate lextree is entered
Later, the -epl (entries per lextree) parameter was introduced; it makes one lextree be entered for a block of frames before switching to the next
More word segmentations (start times) survive
Magic 2: Full LM rescoring at the leaf node
The backtrack pointer table could provide the complete history
The full LM will be used to rescore the history
Magic 3: Composite triphones
Details omitted
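
Magic 1 can be sketched as a pure function of the frame index. This is a simplification that follows the slide's wording (a block of frames per lextree); the actual decoder may count entries rather than frames, and the function name and signature below are hypothetical.

```c
#include <assert.h>

/* Pick which lextree copy receives word-entry transitions at `frame`.
 *
 * n_lextree: number of lextree copies that were allocated
 * epl:       block length before switching to the next copy (stands in
 *            for the -epl parameter; treated as a frame count here,
 *            which is an assumption)
 *
 * With epl == 1 the copies simply alternate frame by frame. */
int toy_lextree_for_frame(int frame, int n_lextree, int epl)
{
    assert(frame >= 0 && n_lextree > 0 && epl > 0);
    return (frame / epl) % n_lextree;
}
```

With 3 trees and epl = 1, frames 0, 1, 2, 3 enter trees 0, 1, 2, 0. With 2 trees and epl = 3, frames 0-2 enter tree 0 and frames 3-5 enter tree 1. Because entries at different times land in different trees, hypotheses with different start times are not forced to compete inside one tree.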

25
Current Status of the Development of Mode 5 in 3.6

It is still incomplete
Though check-in is necessary to keep the branches from becoming too separate
Prototype 1, DP is completed
But it used a lot of memory (50x tree copies), tested in a very simple case
No tree deletion
No control when the number of trees exceeds the max (just reallocate)
Still keeps the full LM rescoring feature of Ravi's search (it will be useful someday)
Expect to have 10 prototypes before actual shipping

26
Relationship between Mode 4 and 5

They share the code for GMM computation
So speed-up techniques in 3.X (X = 4 to 6) could be applied to mode 5 as well
Mode 4 and Mode 5 still use the same lexical tree data structure
Major difference
Handling differs when entering new trees
Mode 4 enters a tree by looking at the time index
Mode 5 enters a tree depending on the word copy
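
To contrast with time-indexed entry, the sketch below shows one way a word-conditioned lookup could work: the tree copy is found (or created) by the predecessor word id. It mirrors the first prototype's behavior of never deleting copies and simply reallocating when the table is full. All names (toy_tree_copies_t, toy_enter_tree, and so on) are hypothetical, and a real copy would hold a lextree rather than just a word id.

```c
#include <assert.h>
#include <stdlib.h>

/* Table of active tree copies, one per predecessor word seen so far. */
typedef struct {
    int *pred_wid; /* predecessor word id each copy is conditioned on */
    int n_copies;  /* copies in use */
    int capacity;  /* allocated slots */
} toy_tree_copies_t;

/* Returns 0 on success, -1 on allocation failure. */
int toy_copies_init(toy_tree_copies_t *c, int capacity)
{
    assert(c != NULL && capacity > 0);
    c->pred_wid = malloc((size_t)capacity * sizeof(int));
    c->n_copies = 0;
    c->capacity = c->pred_wid ? capacity : 0;
    return c->pred_wid ? 0 : -1;
}

/* Mode-5-style entry: choose the copy by predecessor word, creating it
 * if needed. Returns the copy index, or -1 on allocation failure. */
int toy_enter_tree(toy_tree_copies_t *c, int pred_wid)
{
    int i;
    for (i = 0; i < c->n_copies; i++)
        if (c->pred_wid[i] == pred_wid)
            return i; /* copy for this word history already exists */

    if (c->n_copies == c->capacity) {
        /* No limit on the number of copies: just grow the table. */
        int new_cap = c->capacity * 2;
        int *p = realloc(c->pred_wid, (size_t)new_cap * sizeof(int));
        if (p == NULL)
            return -1;
        c->pred_wid = p;
        c->capacity = new_cap;
    }
    c->pred_wid[c->n_copies] = pred_wid;
    return c->n_copies++;
}

void toy_copies_free(toy_tree_copies_t *c)
{
    free(c->pred_wid);
    c->pred_wid = NULL;
    c->n_copies = c->capacity = 0;
}
```

Entering with the same predecessor word twice returns the same copy, so hypotheses with an identical one-word history share a tree while different histories stay separate. The linear scan is chosen for brevity, not speed.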

27
Discussion

There is a lot of potential in the work on search
Could we combine the search philosophies of mode 4 and mode 5?
How could we reduce the memory used in mode 5?
Tree copies for bigram and beyond?
Expect a lot of fun in the next 3 months.

28
Manipulation of LMs
29
LM Manipulation

CALO and LISTEN show that
Dynamic addition and deletion of LMs is very important
New features are implemented (not tested thoroughly) for
Refactoring the LM code such that an array of LMs (lmset_t) is always assumed to exist
Reading LMs in text format
In mode 4, deletion and addition of LMs
Expected problem in the future
Changes in a high-level knowledge source such as the LM will also change the search graph
This makes handling quite tricky
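
The interaction between LM changes and the search graph can be illustrated with a toy LM set. This is not the lmset_t interface from the Sphinx sources: toy_lmset_t, its functions and the graph_stale flag are hypothetical, and LMs are reduced to names.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_LM 8
#define TOY_NAME_LEN 32

/* Toy stand-in for an array of LMs that always exists, even when empty. */
typedef struct {
    char name[TOY_MAX_LM][TOY_NAME_LEN];
    int n_lm;
    int graph_stale; /* set when the search graph no longer matches the LMs */
} toy_lmset_t;

void toy_lmset_init(toy_lmset_t *s) { s->n_lm = 0; s->graph_stale = 0; }

/* Returns the index of the named LM, or -1 if it is not in the set. */
int toy_lmset_find(const toy_lmset_t *s, const char *name)
{
    int i;
    for (i = 0; i < s->n_lm; i++)
        if (strcmp(s->name[i], name) == 0)
            return i;
    return -1;
}

/* Adds an LM; returns its index, or -1 if full, too long or duplicate. */
int toy_lmset_add(toy_lmset_t *s, const char *name)
{
    assert(s != NULL && name != NULL);
    if (s->n_lm == TOY_MAX_LM || strlen(name) >= TOY_NAME_LEN ||
        toy_lmset_find(s, name) >= 0)
        return -1;
    strcpy(s->name[s->n_lm], name);
    s->graph_stale = 1; /* the search graph must be updated */
    return s->n_lm++;
}

/* Deletes an LM by name; returns 0 on success, -1 if not found. */
int toy_lmset_delete(toy_lmset_t *s, const char *name)
{
    int i = toy_lmset_find(s, name);
    if (i < 0)
        return -1;
    for (; i + 1 < s->n_lm; i++) /* close the gap */
        strcpy(s->name[i], s->name[i + 1]);
    s->n_lm--;
    s->graph_stale = 1; /* the search graph must be updated */
    return 0;
}
```

The flag only records that the graph is out of date. What the search has to rebuild when it is set is exactly the tricky part the slide refers to, and the sketch does not attempt it.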

30
Some other gentle re-factoring
31
Other re-factoring that affects us

Did it because
Push from projects
Push from implementation of mode 5
Important ones
1, kb and kbcore
2, Physical file structure of libs3decoder
3, refactoring across dag/astar/decode_anytopo
4, synchronization of command line

32
kb and kbcore

Changes motivated by the new search
kb and kbcore take care of mode initialization
srch will point its resources to the kb
Initialization of graph structures is now the responsibility of the search implementation modules
Implemented and tested
Consistent style of module reporting
Added arguments for reporting in every module

33
Physical file structure of libs3decoder

libs3decoder is starting to be overcrowded
Now divided into eight libraries (tested)
libs3decoder/libam (gmm, hmm, optimized computation)
libs3decoder/libcep_feat (feature, d-coeff, agc, cmn)
libs3decoder/libcommon (util, misc)
libs3decoder/libdict (dict, dict2pid, wid)
libs3decoder/liblm (lm, lmclass)
libs3decoder/libsearch (srch, srch_impl)
libs3decoder/libep (endptr, classify)
libs3decoder/libAPI (ld_decode_API, utt)
Not very orthogonal yet
E.g. libam and liblm are inter-dependent

34
libs3decoder Before/After

Before (3.5), a single directory:
agc, approx_cont_mgau, ascr, bio, cb2mllr_io, classify, cmn, cmn_prior, cont_mgau, corpus, dict2pid, dict, endptr, fast_algo_struct, feat, fe, fe_interface, fe_sigproc, fillpen, flat_fwd, gs, hmm, interp, kb, kbcore, lextree, live_decode_API, live_decode_args, lm, lmclass, logs3, mdef, misc, mllr, ms_gauden, ms_mllr, ms_senone, subvq, tmat, utt, vector, vithist, wid

After, by library:
am: adaptor, approx_cont_mgau, gs, hmm, interp, mdef, mllr, ms_gauden, ms_mllr, ms_senone, cb2mllr_io (not there yet)
search: ascr, dag (new), flat_fwd, gmm_wrap (new), kb, kbcore, lextree, vithist, srch (new), srch_debug (new), srch_time_switch_tree (Mode 4), srch_word_switch_tree (Mode 5)
cep_feat: agc, cmn, cmn_prior, feat, fe, fe_interface, fe_sigproc
lm: lm, lmclass, fillpen
ep: classify, endptr
dict: dict, dict2pid, wid
common: bio, corpus, logs3, misc, stat (new), vector
API: utt, live_decode_api, live_decode_args
35
Refactoring across dag/astar/decode_anytopo

The three have a lot in common
So some fat needs to be cut
A standalone library, dag.c, is created
E.g.
dag_link and dag_update_link are shared
dag_search and dag_load are still not easy to share
dag and the 2nd-stage search of decode_anytopo may still not be equivalent
Need more testing

36
Synchronization of command line arguments

Clean up has been done for
decode
align
allphone
dag
astar
decode_anytopo
Use
-wip for the insertion penalty
-lw, not -langw
-mean, not -meanfn
This should be stable in 3.6

37
Progress in Documentation
38
Doxygen-style documentation

Fixed a lot of bugs in the doxygen documents during development
Close to completion
Instead of
int fun(int a, /* a is a variable */
        int b) /* b is a variable */
It should be
int fun(int a, /**< a is a variable */
        int b  /**< b is a variable */
        )

39
Status of Hieroglyphs Draft 1

It looks like a book now
Less crappy
The crappy parts are consistent
Another 3 chapters are completed
On software installation (Chapter 4)
On the front end of Sphinx (Chapter 6)
FAQs of using Sphinx (Appendix B)
The number of chapters has increased by 2 (from 12 to 14; finished chapters from 6 to 9)
Still 5 chapters to go!

40
Status of Hieroglyphs Draft 1

Other chapters
Chapter I: License and use of Sphinx, SphinxTrain and CMU LM Toolkit (1st draft, 4th Rev)
Chapter II: Introduction to Sphinx, SphinxTrain and CMU LM Toolkit (1st draft, 2nd Rev)
Chapter IX: Search Structure and Speed-up of Sphinx's recognizers (1st draft, 2nd Rev)
Chapter X: Speaker adaptation using Sphinx (1st draft, 3rd Rev)
Chapter XI: Development using Sphinx (1st draft, 2nd Rev)
Appendix A.2: Full SphinxTrain Command Line Information (1st draft, 2nd Rev)
Writing quality
Still low
Starting to have logic and look like English
The 1st draft will be completed in the summer (hopefully)

41
Final note on ST and S3

Our plan for SphinxTrain and sphinx3
Separation into libraries/applications is our main goal
Before that, merging ST into S3 will be a good step
Refactoring libs3decoder will be a good step towards merging
Do it slowly
Arthur Chan is not allowed to check in more than 4 executables a month to sphinx 3
This should allow us to balance short-term and long-term goals

42
Sphinx development in general