Sphinx 3.4 Development Progress Report in February

Slides: 19
Provided by: Arthu61
Learn more at: http://www.cs.cmu.edu

1
Sphinx 3.4 Development Progress Report in February
  • Arthur Chan, Jahanzeb Sherwani
  • Carnegie Mellon University
  • Mar 1, 2004

2
This Presentation
  • S3.4 Development Progress
  • Speed-up
  • Language Model facilities
  • CALO and S3.5 Development
  • Which features should be there to make CALO
    better?
  • Schedule for next three months

3
Review of Last Month Progress
  • Last month
  • Wrote a speed-up version of s3.
  • Completed some coding of s3.4 speed-up task.
  • This month
  • Backbone of the s3.4 speed-up functionality
    completed and tested.
  • Basic LM facilities completed and smoke-tested.

4
Current System Specifications (without Gaussian
Selection)
5
Speed-up Facilities in s3.3
  • Search
  • Lexicon Structure: Tree
  • Pruning: Standard
  • Heuristic Search Speed-up: Not implemented
  • GMM Computation
  • Frame-Level: Not implemented
  • Senone-Level: Not implemented
  • Gaussian-Level: SVQ-based GMM Selection,
    sub-vectors constrained to 3
  • Component-Level: SVQ code removed
6
Speed-up Facilities in s3.4
  • Search
  • Lexicon Structure: Tree
  • Pruning: (New) Improved Word-end Pruning
  • Heuristic Search Speed-up: (New) Phoneme
    Look-ahead
  • GMM Computation
  • Frame-Level: (New) Naïve Down-Sampling, (New)
    Conditional Down-Sampling
  • Senone-Level: (New) CI-based GMM Selection
  • Gaussian-Level: (New) VQ-based GMM Selection,
    (New) Unconstrained no. of sub-vectors in
    SVQ-based GMM Selection
  • Component-Level: (New) SVQ code enabled
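
The frame-level "Naïve Down-Sampling" entry above can be illustrated with a minimal Python sketch. It assumes the technique means running the full GMM scoring only on every N-th frame and reusing the most recent scores for the frames in between. The function and parameter names (score_with_downsampling, score_frame, ratio) are hypothetical and are not taken from the Sphinx source. The conditional variant, which presumably decides per frame whether to skip, is not shown.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]
Scores = List[float]


def score_with_downsampling(
    frames: Sequence[Frame],
    score_frame: Callable[[Frame], Scores],
    ratio: int = 2,
) -> List[Scores]:
    """Run the full (expensive) scoring only on every `ratio`-th frame.

    Skipped frames reuse the scores of the most recent fully scored frame.
    This is an illustrative assumption, not the Sphinx implementation.
    """
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    all_scores: List[Scores] = []
    last: Scores = []
    for t, frame in enumerate(frames):
        if t % ratio == 0:
            last = score_frame(frame)  # expensive path: full scoring
        all_scores.append(list(last))  # cheap path: copy the latest scores
    return all_scores


# Toy usage: a stand-in scorer that records how often it is called.
calls = []


def toy_scorer(frame: Frame) -> Scores:
    calls.append(frame)
    return [-sum(x * x for x in frame)]


toy_frames = [[0.0], [0.1], [1.0], [1.1], [2.0]]
toy_scores = score_with_downsampling(toy_frames, toy_scorer, ratio=2)
```

With ratio=2 the toy scorer runs on 3 of the 5 frames, so the scoring cost is roughly halved at the price of stale scores on the skipped frames.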
7
S3.4 Speed Performance in Communicator Task
8
Issues in Speed Optimization
  • Implementation Issues
  • Beams applied to GMM computation make many
    techniques hard to implement.
  • Some facilities were hardwired for specific
    purposes.
  • Performance Issues
  • Each technique reduced computation by 40-50%
    with <5% degradation.
  • However, the gains didn't add up.
  • Reduction in computation has a ceiling
    (usually 75-80% reduction is the max.)
  • Overhead is huge in some techniques.
  • E.g. VQ-based Gaussian Selection takes 0.25xRT
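
One way to see why individual gains may not add up is a back-of-the-envelope model, sketched below in Python. It rests on two assumptions that are not stated on the slide: each technique removes a fraction of whatever computation is still left after the previous ones, and each technique may add a fixed overhead measured in xRT. The function names and the baseline figure are hypothetical.

```python
from typing import Iterable


def combined_reduction(reductions: Iterable[float]) -> float:
    """Total fraction of computation removed when each technique acts
    only on the work left over by the previous ones (an assumption)."""
    remaining = 1.0
    for r in reductions:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reduction must be in [0, 1]")
        remaining *= 1.0 - r
    return 1.0 - remaining


def real_time_factor(
    baseline_xrt: float,
    reductions: Iterable[float],
    overheads_xrt: Iterable[float] = (),
) -> float:
    """Estimated xRT after applying the techniques plus their fixed overheads."""
    return baseline_xrt * (1.0 - combined_reduction(reductions)) + sum(overheads_xrt)


# Two hypothetical techniques, each removing 50% of the remaining work.
total = combined_reduction([0.5, 0.5])
# Hypothetical 2.0xRT baseline, with one technique costing 0.25xRT overhead.
rtf = real_time_factor(2.0, [0.5, 0.5], [0.25])
```

Under these assumptions two 50% techniques remove 75% of the work rather than 100%, and a 0.25xRT overhead turns the resulting 0.5xRT into 0.75xRT. The slide does not say this is the actual cause of the observed ceiling.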

9
Language Model Facilities
  • S3.3 accepts only a single LM, without classes,
    in binary format.
  • So far, S3.4 is able to accept multiple
    class-based LMs in binary format.
  • One major code modification
  • Affects 6-7 files.
  • Caveats
  • Not a perfect implementation.
  • Text format is not yet supported. Backward
    compatibility is an issue.
  • Lack of test cases. Only lightly smoke-tested.
  • One more week of work needed
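
For readers unfamiliar with class-based LMs, the usual idea is that the n-gram is defined over class tokens and each word then carries a probability within its class: P(w | h) = P(class(w) | h) * P(w | class(w)). The Python sketch below shows this for a bigram. All tables and names are toy, hypothetical data, and the sketch makes no claim about how s3.4 stores or looks up its LMs. Back-off is omitted, so an unseen bigram raises KeyError.

```python
import math
from typing import Dict, Tuple

# Hypothetical toy data: words that belong to a class, and their
# probabilities inside that class.
class_of: Dict[str, str] = {"pittsburgh": "[city]", "boston": "[city]"}
word_given_class: Dict[Tuple[str, str], float] = {
    ("pittsburgh", "[city]"): 0.6,
    ("boston", "[city]"): 0.4,
}
# Bigram probabilities over tokens, where a token is a word or a class tag.
bigram: Dict[Tuple[str, str], float] = {
    ("fly", "to"): 0.8,
    ("to", "[city]"): 0.5,
}


def class_bigram_logprob(prev: str, word: str) -> float:
    """log P(word | prev) = log P(token | prev_token) + log P(word | class)."""
    token = class_of.get(word, word)
    prev_token = class_of.get(prev, prev)
    p_token = bigram[(prev_token, token)]  # no back-off in this sketch
    # Words outside any class are their own token, so membership is 1.
    p_member = word_given_class[(word, token)] if word in class_of else 1.0
    return math.log(p_token) + math.log(p_member)


lp_city = class_bigram_logprob("to", "pittsburgh")
lp_plain = class_bigram_logprob("fly", "to")
```

Here "to pittsburgh" scores 0.5 * 0.6 = 0.3, while "fly to" uses the plain bigram value 0.8 because "to" belongs to no class.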

10
Problems with s3.4 (valid for Feb 29th, 2004)
  • Only accepts DMP files.
  • The text-format reader in Sphinx 2 is very complex.
  • Straight conversion is not clean.
  • LMs are all loaded into memory.
  • We can work on this.
  • Lexical trees are all built at the beginning.
  • We tried to avoid the overhead of rebuilding the
    tree for every utterance.
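
The point about building lexical trees once at start-up can be illustrated with a small Python sketch of a phoneme prefix tree. It assumes a lexical tree here means a trie in which words sharing a phoneme prefix share nodes. The class, function names, and toy lexicon are hypothetical and do not reflect the Sphinx data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexNode:
    """One node of a phoneme prefix tree."""
    children: Dict[str, "LexNode"] = field(default_factory=dict)
    words: List[str] = field(default_factory=list)  # words ending at this node


def build_lexical_tree(lexicon: Dict[str, List[str]]) -> LexNode:
    """Build the tree once; words with a common phoneme prefix share nodes."""
    root = LexNode()
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            node = node.children.setdefault(ph, LexNode())
        node.words.append(word)
    return root


def count_nodes(node: LexNode) -> int:
    """Count this node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Hypothetical toy lexicon with made-up pronunciations.
toy_lexicon = {
    "start": ["S", "T", "AA", "R", "T"],
    "star": ["S", "T", "AA", "R"],
    "stop": ["S", "T", "AA", "P"],
}
tree = build_lexical_tree(toy_lexicon)  # done once, reused for every utterance
n_nodes = count_nodes(tree)
```

The three toy words contain 13 phonemes in total but need only 6 non-root nodes, which is the sharing that makes a tree lexicon attractive. Building it once trades start-up time and memory for lower per-utterance cost.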

11
Summary of Sphinx 3.4 Development
  • Derivative of s3.3
  • With Speed Optimization
  • Better LM facilities
  • Algorithmic Optimization is 90% complete
  • Still need to improve overhead performance.
    Tree-based GMM selection is desirable.
  • Improvement for individual techniques.
  • Got through the major hurdle of multiple LMs and
    class-based LMs.
  • Need more time to make it more stable.
  • Expected internal release: March 8, 2004

12
Sphinx 3.4 and CALO
  • Which pieces are missing?
  • Sphinx 3.4's decoding is still not streamlined ->
    Continuous Listening is not yet enabled.
  • Sphinx's speed may still not be ideal.
  • From s3 to s3.3, 10% degradation.
  • Sphinx 3.4 doesn't learn from data yet.

13
Sphinx 3.5: What should we do in the next 3 months?
  • Expected release time (May-June)
  • Interfaces
  • Streamlined front-end and decoding
  • (?) PortAudio-based audio routine.
  • Speed/Accuracy
  • Improved lexical tree search
  • Machine optimization of Gaussian computation.
  • Combination of multiple recognizers
  • Learning
  • Acoustic Model adaptation
  • (?) Language Model adaptation
  • (In Phoenix) Better semantic parsing
  • Resource Acquisition and Load Balancing

14
Highlight I: Speed/Accuracy
  • Improved lexical tree search
  • The current implementation uses a single lexical
    tree.
  • May be desirable to create tree copies.
  • Machine Optimization of Gaussian Computation
  • SIMD (Single Instruction, Multiple Data)
  • Requires help from assembly language experts.
    (Jason/Thomas)
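
To make concrete what "Gaussian computation" refers to, here is a minimal Python sketch of the log-likelihood of one diagonal-covariance Gaussian. It assumes diagonal covariances, which the slides do not state, and the function name is hypothetical. The per-dimension loop is the kind of arithmetic that SIMD instructions can process several elements at a time. The sketch does not show any actual Sphinx or assembly code.

```python
import math
from typing import Sequence


def diag_gaussian_loglik(
    x: Sequence[float], mean: Sequence[float], var: Sequence[float]
) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    if not (len(x) == len(mean) == len(var)):
        raise ValueError("x, mean and var must have the same length")
    # Normalization term: depends only on the variances, so a decoder
    # could precompute it once per Gaussian.
    log_norm = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
    # Variance-weighted squared distance: the hot loop, one
    # subtract-square-scale per feature dimension.
    dist = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return log_norm - 0.5 * dist


ll_at_mean = diag_gaussian_loglik([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
ll_offset = diag_gaussian_loglik([1.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```

Because every dimension goes through the same operations independently, the distance loop maps naturally onto vector instructions.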

15
Highlight II: Multiple Recognizer Combination and
Resource Acquisition
  • Research by Rong suggests that combining multiple
    recognizers can improve accuracy.
  • Speed worsens by 100% if we run two recognizers.
  • An interesting solution
  • Computation can be shared by other machines in
    the meeting.
  • Inspired by routing implementation.
  • A very natural solution in a meeting scenario
    because usually only one person will be speaking.
  • Challenges: Bandwidth and Load Balancing

16
Highlight III
  • Learning
  • Acoustic Model
  • Maximum Likelihood Linear Regression (MLLR)
  • Jahanzeb will be responsible for this.
  • (?) Language Model
  • How?
  • Cache-based LM?
  • (?) Improved Robust Parsing
  • Better parsing based on previous command history
  • Phoenix's source code is not easy to trace.
  • Thomas Harris's implementation may be a good
    place to start.
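
For the MLLR item above: in its common mean-adaptation form, MLLR replaces each Gaussian mean mu with A * mu + b, where the matrix A and bias b are shared across many Gaussians and estimated from adaptation data. The Python sketch below shows only the application of a given transform. The estimation step is omitted, and the function name and numbers are hypothetical.

```python
from typing import List, Sequence


def mllr_adapt_mean(
    mean: Sequence[float],
    A: Sequence[Sequence[float]],
    b: Sequence[float],
) -> List[float]:
    """Return the adapted mean A * mean + b (the transform is assumed given)."""
    if len(A) != len(b) or any(len(row) != len(mean) for row in A):
        raise ValueError("shape mismatch between A, b and mean")
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(A, b)
    ]


# Hypothetical 2-dimensional transform and mean.
A_toy = [[1.0, 0.0], [0.0, 2.0]]
b_toy = [0.5, -1.0]
adapted = mllr_adapt_mean([1.0, 2.0], A_toy, b_toy)
```

Since one transform is shared by many Gaussians, a small amount of speaker data can shift a whole acoustic model, which is what makes the approach attractive for learning from limited data.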

17
Arthur and Jahanzeb's Proposed Schedule
18
Cont.