3rd Progress Meeting For Sphinx 3.6 Development

1
3rd Progress Meeting For Sphinx 3.6 Development
  • Arthur Chan,
  • David Huggins-Daines,
  • Yitao Sun
  • Carnegie Mellon University
  • Jan 25, 2006

2
This meeting
  • 3rd Progress report on 3.6 development (40 pages)
  • Agenda
  • What happened in Fall 2005? (4 slides)
  • Progress of Sphinx Development in Fall 2005 (17
    slides)
  • Summary of Progress in 2005 (10 slides)
  • Discussion: Should we create one release
    candidate? (1 slide)

3
What happened in FALL 2005?
4
What happened in Fall 2005?
  • Major Events in Sphinx Development
  • We began participating in GALE in Oct 2005
  • Conformance of the recognizers (Sphinx 3 and
    Sphinx 4) became an issue
  • Lack of advanced acoustic modeling techniques
    became very glaring
  • Sphinx 3 and 4 have gone through bug fixes.
  • CALO effort is now split in two
  • Off-line recognizer requires major improvement in
    LM and AM.
  • AM issue is shared with GALE
  • On-line recognizer (CALO jargon: Smartnote)
  • Now has new LM and AM
  • Requires significant development work

5
Time distribution (Estimated)
  • Arthur
  • 50% on GALE, 20% on CALO, 30% on Sphinx
  • Dave
  • 65% on CALO, 30% on PocketSphinx, 5% on Sphinx
  • Yitao
  • 90% on CALO, 10% on Sphinx

6
The Two Funded Projects
  • Upside
  • They point to issues that need to be solved
  • Need significant reprioritization of tasks
  • Balance of effort on the 2 projects is now
    achieved
  • Downside
  • Code development of Sphinx becomes a slower
    process
  • Also, we haven't released s3 for a while
  • -> Should we release the code now?
  • Tired students and staff can be found everywhere

7
Progress of Sphinx 3.6 in FALL 2005
8
Overview
  • Work on second-stage
  • Merging of bestpath search in the 2-nd stage of
    tree search
  • IBM lattice generation
  • word confidence estimation
  • Behavior changes and bug fixes
  • Treatment of acoustic scores
  • Assertion in vithist.c
  • Attempts in search algorithm improvements
  • Mode 3: Flat lexicon decoding
  • Mode 4: Tree lexicon decoding
  • Sphinx on Mandarin and coded language.
  • New tools: conf, dp

9
Work Schedule
  • Sep 1 to Oct 1
  • Implementation of triphones in flat lexicon
    decoder
  • Oct 1 to Nov 1
  • Implementation of triphones on tree lexicon
    decoder (incomplete)
  • Nov 1 to Dec 8
  • IBM lattice generation
  • Confidence score generation
  • Fixed issues in scores
  • Dec 8 to Jan 3: Concept of vacation was tried
  • Jan 3 to now
  • Fixed bugs, prepared release.

10
Second-stage Processing
  • Best-path search can now be specified in decode
  • Implementation requires write back. (urgh.)
  • Recognizer can now generate lattices in IBM format
  • Word is attached to the link
  • Sphinx format attaches the word to the
    node.
  • Scores are normalized with best senone scores
  • Rong's confidence-based routine is now in Sphinx
  • conf
  • Goodies: uses Sphinx's logs3 routine ->
    significantly reduces alpha-beta score mismatch.
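
The difference between the two lattice conventions above (word on the node versus word on the link) can be illustrated with a minimal sketch. It is not the decoder's conversion code and does not reproduce either file format: the function name, the `END` state, and the choice to move each node's word onto its outgoing links are all assumptions made for illustration.

```python
def node_to_link_lattice(node_words, edges, final_state="END"):
    """Move word labels from nodes onto links.

    node_words: dict mapping node id -> word (word-on-node lattice).
    edges: list of (src, dst) pairs between node ids.
    Returns a list of (src, dst, word) links (word-on-link lattice).

    Assumed convention: each link carries the word of its source node,
    and nodes without successors get one extra link into a synthetic
    final state so that their word is not lost.
    """
    has_successor = {src for src, _ in edges}
    links = [(src, dst, node_words[src]) for src, dst in edges]
    for node, word in node_words.items():
        if node not in has_successor:
            links.append((node, final_state, word))
    return links


# Toy lattice with two competing words between <s> and </s>.
toy_nodes = {0: "<s>", 1: "hello", 2: "yellow", 3: "</s>"}
toy_edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
toy_links = node_to_link_lattice(toy_nodes, toy_edges)
```

Every path through the resulting links spells the same word sequence as the corresponding path through the original nodes, which is the property a format conversion needs to preserve.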

11
Second-stage Processing (cont.)
  • Further work
  • Best-path generation doesn't conform to the past 3.5
    behavior
  • -> Bugs caused by 3.6 development
  • Also, the best path is not always in the lattice
  • -> Legacy bug
  • Confidence-based method
  • The lattice-based method can currently only be
    used off-line
  • 10% of the data still has alpha-beta mismatch
  • Consensus network generation needs special focus
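
One way to read "alpha-beta mismatch" is that the total lattice score computed by the forward pass disagrees with the one computed by the backward pass. The sketch below shows that consistency check on a toy lattice. It is an illustration only: it uses natural logarithms rather than the integer log base of the logs3 routine, and the function names and the lattice representation are assumptions, not the code in conf.

```python
import math

NEG_INF = float("-inf")


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def forward_backward(num_nodes, links):
    """links: list of (src, dst, log_score) with src < dst.

    Nodes are assumed to be numbered in topological order, with node 0
    as the start and node num_nodes - 1 as the end.
    """
    alpha = [NEG_INF] * num_nodes
    beta = [NEG_INF] * num_nodes
    alpha[0] = 0.0
    beta[-1] = 0.0
    # Forward pass: visit links by increasing source node.
    for src, dst, score in sorted(links, key=lambda link: link[0]):
        alpha[dst] = log_add(alpha[dst], alpha[src] + score)
    # Backward pass: visit links by decreasing source node.
    for src, dst, score in sorted(links, key=lambda link: link[0], reverse=True):
        beta[src] = log_add(beta[src], score + beta[dst])
    return alpha, beta


def link_posteriors(num_nodes, links):
    """Posterior probability of each link given the whole lattice."""
    alpha, beta = forward_backward(num_nodes, links)
    total = alpha[-1]
    return [math.exp(alpha[s] + w + beta[d] - total) for s, d, w in links]


toy_links = [
    (0, 1, math.log(0.6)),
    (0, 2, math.log(0.4)),
    (1, 3, 0.0),
    (2, 3, 0.0),
]
alpha, beta = forward_backward(4, toy_links)
mismatch = abs(alpha[-1] - beta[0])  # should be close to zero
```

Using the same log-add routine in both passes keeps `mismatch` at rounding-error level, which is the kind of agreement the shared log routine is meant to give.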

12
Scores we see (Change 1)
  • Tree search now truly generates un-normalized
    scores.
  • Previously they were normalized by the ending frame only
  • Caused by a bug introduced in mid-2005
  • All 1st-stage searches use the same score logging
    functions
  • Includes align, allphone, decode_anytopo, decode
  • matchseg_write, match_write are the current
    versions
  • log_ is still used but will soon be totally
    replaced

13
Scores we see (Change 2)
  • Multi-stream GMM computation (ms_gauden)
  • By default, it won't quantize log pdf to 8 bits
    now
  • Single-stream GMM computation
  • Vectors with zero means and variances are removed
    (-remove_zero_var_gau)
  • Scores and performance will change
  • Testing resource has changed.
  • (Evandro grins at this point)

14
Scores we see (Change 3)
  • Sphinx now supports generation of different
    hypseg format (-hypseg_fmt)
  • SPHINX 2-format
  • SPHINX 3-format
  • ctm format
  • Always require more processing, but it is better
    than nothing.
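
As a rough illustration of what a ctm-style segmentation looks like, the sketch below turns word segments with frame boundaries into lines in the common NIST CTM layout (file, channel, start time, duration, word). It is not the decoder's output routine: the function name, the 100 frames per second rate, the fixed channel "1", and the input tuple layout are all assumptions.

```python
def segments_to_ctm(utt_id, segments, frames_per_sec=100, channel="1"):
    """Format (word, start_frame, end_frame) segments as CTM lines.

    End frames are treated as inclusive, so a segment covering frames
    0..49 lasts 50 frames.
    """
    lines = []
    for word, start_frame, end_frame in segments:
        start = start_frame / frames_per_sec
        duration = (end_frame - start_frame + 1) / frames_per_sec
        lines.append(f"{utt_id} {channel} {start:.2f} {duration:.2f} {word}")
    return lines


ctm_lines = segments_to_ctm("utt1", [("hello", 0, 49), ("world", 50, 119)])
```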

15
Scores: a summary
  • Unnormalized (true) acoustic and language scores
    are generated (-hypsegscore_unscale) by
  • 1st-stage search and
  • Best-path search right after the 1st stage
  • Normalized acoustic scores are generated by
  • Lattice generation
  • If developers want true scores in the
    lattice
  • They can get the best senone scores from the
    decoder (bestsenscrdir) and do their own
    processing
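
The "own processing" mentioned above can be sketched as follows, assuming that normalization means subtracting the best senone score of each frame from the acoustic score, so that un-normalizing a segment means adding those per-frame values back. The function names and the in-memory list of best scores are hypothetical; the on-disk format written under bestsenscrdir is not reproduced here.

```python
def normalize_segment(true_score, best_senone_scores, start_frame, end_frame):
    """Subtract the per-frame best senone scores over an inclusive frame span."""
    return true_score - sum(best_senone_scores[start_frame:end_frame + 1])


def unnormalize_segment(norm_score, best_senone_scores, start_frame, end_frame):
    """Recover the un-normalized score by adding the best scores back."""
    return norm_score + sum(best_senone_scores[start_frame:end_frame + 1])


# Toy integer log scores, one best senone score per frame.
best_scores = [-120, -98, -143, -110, -105]
true_word_score = -700                      # word spanning frames 1..3
norm_word_score = normalize_segment(true_word_score, best_scores, 1, 3)
recovered = unnormalize_segment(norm_word_score, best_scores, 1, 3)
```

Because the scores are integers in this toy example, the round trip is exact.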

16
Other important bug fixes
  • Bug in vithist.c
  • Caused an assertion failure and stopped the recognizer
  • Now fixed; it returns an error message to the
    search abstraction routine.

17
Attempts in search algorithm improvements (Mode 3)
  • Flat-lexicon decoder
  • Search implementation is completed
  • decode can now use flat-lexicon decoding
  • -op_mode 3
  • Decoder revamping is completed
  • Mode 2 (FST)
  • Mode 3 (Flat-lexicon)
  • Mode 4 (Ravi's Tree-Lexicon)
  • Mode 5 (Arthur's Tree-Lexicon)
  • decode_anytopo is still there for backward
    compatibility
  • decode_anytopo decodes in mode 3

18
No Further Re-factoring
  • Avoid re-factoring before next check-in
  • Align and allphone have different input/output
    file formats
  • It doesn't make sense to stuff them into a single
    executable.
  • Using an XML configuration and control file is
    an option
  • But it takes too much time to implement

19
Algorithmic Work - Flat Lexicon Decoder
  • Full triphone completed in flat-lexicon decoding
  • 2.5% relative improvement in accuracy
  • But requires 100xRT (urgh)
  • Useful for debugging
  • Also considered full trigram implementation
  • Would result in another 5-10 times slowdown
  • Conclusion
  • Flat lexicon search has come to its limit

20
Algorithmic Work - Tree Lexicon Decoder
  • Current full triphone implementation
  • Has flaws in score propagation
  • Tree copies
  • -> No time to do it at all; Q4's workload nearly
    killed AC
  • Benchmarking results
  • GALE results
  • Full Lexicon Tree Lexicon
  • CALO/Communicator results
  • Tree Lexicon is 5% relative poorer.
  • Conclusion
  • Half a year on search is expected to give us
    another 5%

21
Conclusion on Search
  • Need to seriously consider
  • Is working on search a good idea?
  • In both CALO/GALE, gains come from
  • SAT and cross adaptation
  • Second-stage processing
  • Confusion network
  • Confidence annotation
  • First-stage SD -> Second-stage SA
  • VTLN
  • also only gives 5% rel.
  • but it only takes 5 days to implement

22
Sphinx on Different Text Encodings
  • There is already non-CMU work for
  • Spanish
  • French
  • Big question mark
  • Could it work on other encodings?

23
Sphinx on Mandarin (gb2312)
24
Sphinx on Mandarin (cont.)
  • Thanks to Ravi
  • Bugs we fixed to get it through
  • 1236322 libutil\str2words special character bug
  • 1236166 special character wasn't supported
  • This should give us a fairly good foundation to
    start on most languages

25
Summary of Sphinx in Fall 2005
  • We have done something
  • Strong focus on search research doesn't seem to
    get us far.
  • Fire to fight on the modeling side
  • Sounds like the time to check in and move on

26
Progress of Sphinx 3.X (From X=5 to X=6)
27
Progress of Sphinx 3.X (From X=5 to X=6)
  • New Features (4 slides)
  • Items that are significant
  • Gentle, mild and simple re-factoring and its
    consequence (4 slides)
  • Documentation (1 slide)
  • Regression testing (1 slide)
  • Pruned Features ?

28
New Features (Search)
  • Speed
  • Further enhancement of CIGMMS
  • BBI tree implementation (by Dave, in SphinxTrain)
  • Search
  • FST search
  • Full triphone implementation in decode_anytopo
  • Separation of search abstraction/implementation
    in 3.X

29
New Features (Adaptation)
  • Adaptation
  • Multiple classes for MLLR (by Dave)
  • MAP adaptation (by Dave, in SphinxTrain)

30
New Features (Others)
  • New executables
  • lm_convert
  • lm3g2dmp
  • dp
  • If Evandro asks, "Why do we need dp in Sphinx 3?"
  • Say this: "I don't know, we found the executable
    at ./s3/src/misc/dp.c"
  • conf
  • Off-line word-level confidence annotation program
  • Mismatch dict-LM
  • Unmatched entries can be automatically generated
    (-lts_mismatch)
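
The dictionary/LM mismatch itself is easy to detect. The sketch below only finds LM words that have no dictionary entry; it does not generate pronunciations and is not what -lts_mismatch runs internally. The function names, the ignored tokens, and the `WORD(2)` alternate-pronunciation handling are assumptions about a Sphinx-style dictionary.

```python
import re


def dictionary_words(dict_lines):
    """Collect base words from lines like 'HELLO(2) HH EH L OW'."""
    words = set()
    for line in dict_lines:
        fields = line.split()
        if fields:
            # Strip an alternate-pronunciation marker such as "(2)".
            words.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return words


def lm_words_missing_from_dict(lm_vocab, dict_lines,
                               ignore=("<s>", "</s>", "<UNK>")):
    """Return the sorted LM words that have no dictionary entry."""
    known = dictionary_words(dict_lines)
    return sorted(w for w in set(lm_vocab) if w not in known and w not in ignore)


toy_dict = ["HELLO HH AH L OW", "HELLO(2) HH EH L OW", "WORLD W ER L D"]
toy_vocab = ["<s>", "HELLO", "WORLD", "SPHINX", "</s>"]
missing = lm_words_missing_from_dict(toy_vocab, toy_dict)
```

The words in `missing` are the ones that would need generated pronunciations before the LM and the dictionary agree.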

31
Gentle, mild and simple re-factoring (GMM computation)
  • GMM computation is now shared among
  • decode, decode_anytopo, align, allphone
  • So e.g.
  • decode_anytopo could use fast GMM computation
  • decode could use SCHMM

32
Gentle, mild and simple re-factoring (Search)
  • Its consequence in search programming
  • FST, Flat, Tree search now share the same
    interface (decode)
  • Just like Sphinx 2 and 4
  • Writing a new search won't mean replacing an existing search
  • 2nd stage now works for decode
  • Alright, not for FST search

33
Gentle, mild and simple re-factoring (Others)
  • Score output is now rationalized
  • Several bugs causing seg faults are
    eliminated
  • Vithist.c bugs
  • Class-based LM is now working correctly
  • Command-line arguments among applications are now
    synchronized and re-factored

34
Documentation/Tutorial
  • Hieroglyph
  • Now writing 2nd draft
  • Doxygen documentation
  • (by Evandro) Tutorial now works
  • archive_s3
  • Sphinx 2
  • Sphinx 3
  • Sphinx 4

35
Regression Testing
  • Our weakest link
  • Now daily
  • Standard regression test is done
  • Performance check on Communicator/TIDIGITS/TI46
  • Doxygen documentation will be made and tested
  • make check now has 50 tests (3.5: 11)
  • fairly robust to careless mistakes

36
Expected Trimmed Features
  • Search
  • Mode 0 alignment
  • (?) Mode 1 allphone
  • Mode 5 word tree copies
  • If full triphone in Ravi's tree search can't
    be done quickly, trim it as well
  • (?) Yitao's PCFG rescoring

37
Conclusion of Sphinx 3.X (From X=5 to X=6)
  • We have done something
  • Development last year
  • has enriched the code
  • Niceified a lot of things internal to the code
  • There are hiccups in our development
  • Not perfect
  • Well, compare this with NASDAQ.

38
Discussion: What should we do now?
  • Option 1, keep on working without release
  • Option 2, merge the crazy branch with the trunk
    without release
  • Option 3, merge the crazy branch with the trunk
    and create release-candidate Sphinx 3.6 RCI

39
End