CS 378 Natural Language Processing. Speech Processing: Present, Past and Future

About This Presentation
Slides: 16
Provided by: ingedeb

Transcript and Presenter's Notes

1
CS 378 Natural Language Processing
Speech Processing: Present, Past and Future
  • Inge M. R. De Bleecker
  • Department of Linguistics
  • inge_at_mail.utexas.edu
  • October 14, 2003

2
Main Types of NLP Applications
  • Text processing: information retrieval and search
    engines, information extraction, text
    summarization, machine translation,
    question-answering
  • Speech processing: speech recognition (ASR), both
    over-the-telephone (OTT) and dictation systems
    (desktop); speaker verification; text-to-speech
    (TTS)

3
Overview
  • Speech industry history (late 80s to present)
  • Practical overview of current applications and
    their future directions
  • Speech recognition accuracy
  • Text-to-speech accuracy
  • Usability and design
  • Application building tools
  • Working in the speech industry

4
History: Late Eighties
  • Sentiment: OTT ASR finally ready for commercial
    applications.
  • Technology: OTT speaker-independent discrete
    digits/yes-no apps. Word-based language models.
    TTS mostly used for numbers, if used at all.
    Pre-recorded strings much more common.
  • Applications: simple in structure and
    functionality. OTT: banking, e.g. ask for
    account balance. Desktop: first dictation
    systems, medical applications.
  • Companies: few. Small research-oriented
    companies, or research arms of big companies.
    E.g. Dragon Systems, VPC, VCS, Kurzweil, BBN,
    AT&T.

5
History: Early Nineties
  • Sentiment: credibility and usability of apps
    grow. Multilingual developments.
  • Technology: OTT speaker-independent (SI)
    continuous digits/yes-no/command-word apps.
    Move to phoneme-based language models.
  • Applications: OTT: still simple, system-directed
    dialog (vs. user-directed, mixed-initiative).
    Desktop: more dictation systems, command and
    control systems (user-directed).
  • Companies: more companies pop up. Most grow out
    of research communities.

6
History: Mid to Late Nineties
  • Technology: maturing of the technologies used.
  • Companies:
      • overall growth
      • dirty politics (L&H)
      • mergers and buyouts start (still ongoing today)

7
History: Late Nineties to Present
  • Technology: maturing of technology continues
      • better recognition accuracy
      • unrestricted ASR input (natural speech)
      • move to more sophisticated dialog systems (see
        next slide)
      • tool standardization
  • Applications: wider use of apps. More attention
    to usability, dialog design, etc.

8
Dialog System Architecture
ASR
Parser
Reasoning
Output Generation
TTS
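
The five components on this slide can be read as stages that each user turn passes through in order. The sketch below is only an illustration of that flow, not the architecture of any particular product: the `Turn` record, the keyword-based parser, the hard-coded balance, and the "[TTS]" prefix are all hypothetical stand-ins, and the "audio" is simply a string of words.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Turn:
    """State handed from stage to stage for one user utterance (hypothetical)."""
    audio: str                                  # stand-in for audio: the spoken words
    text: str = ""                              # filled in by ASR
    intent: str = ""                            # filled in by the parser
    facts: Dict[str, str] = field(default_factory=dict)  # filled in by reasoning
    reply: str = ""                             # filled in by output generation
    spoken: str = ""                            # filled in by TTS


def asr(turn: Turn) -> Turn:
    # Toy recognizer: pretends recognition is perfect and just normalizes text.
    turn.text = turn.audio.lower().strip()
    return turn


def parser(turn: Turn) -> Turn:
    # Toy parser: a single keyword rule instead of a real grammar.
    turn.intent = "get_balance" if "balance" in turn.text else "unknown"
    return turn


def reasoning(turn: Turn) -> Turn:
    # Toy back end: a hard-coded balance stands in for a database lookup.
    if turn.intent == "get_balance":
        turn.facts["balance"] = "120"
    return turn


def output_generation(turn: Turn) -> Turn:
    if "balance" in turn.facts:
        turn.reply = f"Your balance is {turn.facts['balance']} dollars."
    else:
        turn.reply = "Sorry, I did not understand."
    return turn


def tts(turn: Turn) -> Turn:
    # Toy synthesizer: marks the reply as spoken output.
    turn.spoken = "[TTS] " + turn.reply
    return turn


# Stage order follows the slide: ASR, Parser, Reasoning, Output Generation, TTS.
PIPELINE: List[Callable[[Turn], Turn]] = [asr, parser, reasoning, output_generation, tts]


def run_turn(audio: str) -> Turn:
    turn = Turn(audio=audio)
    for stage in PIPELINE:
        turn = stage(turn)
    return turn
```

Calling `run_turn("What is my balance?")` walks one utterance through all five stages; each stage only reads what earlier stages wrote, which is what makes the components replaceable independently.
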
9
Speech Recognition Accuracy
  • Present: reasonable accuracy on natural speech.
    Most systems still use a grammar to help the
    recognizer. Grammars are written in VoiceXML or a
    vendor-specific language and are not very
    sophisticated from a linguistics point of view.
    Some systems are (theoretically) purely
    statistical, e.g. Nuance's Accuroute.
  • Future: need to add more linguistic principles to
    current statistical methods. Make signal
    processing more robust, encourage reusability.
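
To make "use a grammar to help the recognizer" concrete, here is a minimal sketch in which a small grammar decides which recognition hypotheses are acceptable. It is a simplification: real recognizers typically apply the grammar during the search itself rather than filtering a finished list, and the grammar contents, hypothesis scores, and function names here are all invented for illustration.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

# Hypothetical toy grammar of the form: <command> <account>
COMMANDS = ["check", "transfer from"]
ACCOUNTS = ["checking", "savings"]


def expand_grammar() -> Set[str]:
    """Enumerate every phrase the toy grammar accepts."""
    return {f"{command} {account}" for command, account in product(COMMANDS, ACCOUNTS)}


def pick_in_grammar(nbest: List[Tuple[str, float]], grammar: Set[str]) -> Optional[str]:
    """Return the highest-scoring hypothesis the grammar accepts, or None."""
    for text, _score in sorted(nbest, key=lambda hyp: hyp[1], reverse=True):
        if text in grammar:
            return text
    return None


# Invented n-best list: (hypothesis, acoustic score). The top-scoring
# hypothesis is acoustically plausible but not a phrase the app expects.
NBEST = [("czech savings", 0.70), ("check savings", 0.60), ("check shavings", 0.50)]
BEST = pick_in_grammar(NBEST, expand_grammar())
```

Here `BEST` ends up as "check savings": the grammar rules out the better-scoring but out-of-domain "czech savings". The same mechanism is also why grammar-based systems reject anything the designer did not anticipate.
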

10
TTS Accuracy
  • Present: getting better all the time. During the
    last few years, additional research in prosody
    and intonation has paid off: more natural-sounding
    speech. Also deals with abbreviations, etc.
    Current TTS can be used to patch up real
    speech. E.g. AT&T, Scansoft (Speechworks).
  • Future: probably never a complete substitute for
    pre-recorded strings.

11
Usability: Dialog Design
  • Present: dialog design (VUI) is becoming more
    sophisticated through
      • use of natural speech input
      • mixed-initiative dialogs (more complicated for
        novice users)
      • chatty applications which provide gracious ways
        of dealing with low accuracy: confirmations and
        errors, fall-back to system-directed dialog
      • use of persona, e.g. Bell Canada's Emily
  • Future
      • Continued improvements in dialog design are
        necessary (e.g. usability studies).
      • Dialog design is easier with current (and future)
        tools, but still an art!
      • It is (too) easy to design bad speech
        applications
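
The confirmation and fall-back strategies mentioned on this slide are often driven by the recognizer's confidence in its result. The sketch below shows one possible decision rule; the threshold values, the error limit, and the action names are assumptions chosen for illustration, not values from the presentation or from any particular platform.

```python
def next_action(confidence: float,
                consecutive_errors: int,
                accept_threshold: float = 0.8,   # assumed value
                confirm_threshold: float = 0.5,  # assumed value
                max_errors: int = 2) -> str:     # assumed value
    """Decide how the dialog should react to one recognition result.

    Returns one of (hypothetical action names):
      "fallback_system_directed": too many errors in a row, so switch to
          narrow, system-directed prompts
      "accept": confidence is high enough to proceed without asking
      "confirm": medium confidence, so ask the caller to confirm
      "reprompt": low confidence, so treat as an error and ask again
    """
    if consecutive_errors >= max_errors:
        return "fallback_system_directed"
    if confidence >= accept_threshold:
        return "accept"
    if confidence >= confirm_threshold:
        return "confirm"
    return "reprompt"
```

For example, `next_action(0.6, 0)` asks for confirmation, while `next_action(0.9, 2)` falls back to system-directed dialog even though the last result was confident, on the assumption that a caller who has hit repeated errors needs more guidance.
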

12
Usability: Other Issues
  • Present
      • Natural language generation (NLG) is not
        receiving much attention
      • Reasoning components are very limited
  • Future
      • NLG needs to adapt to the user, conform more to
        human speech patterns
      • multimodal applications
      • multilingual systems
      • use of e.g. ontologies in reasoning components

13
Application Building Tools
  • Present
      • Standardization: VoiceXML and VoiceXML platforms
        (alternative: SALT)
      • Many platform companies: VoiceGenie, Bevocal,
        Audium
      • Also companies developing tools for platforms:
        Aptera
      • VoiceXML
          • World of VoiceXML: comprehensive site on all
            things VoiceXML
          • Free developer resources, e.g. Bevocal
          • Small companies can have a VoiceXML app
            hosted by a platform company
          • Big companies: in-house platforms
            (telco-industry grade equipment), quite costly
  • Future
      • Development of better tools that make it harder
        to build bad applications!

14
Speech Apps: State of the Art
  • Conclusion
      • ASR and TTS are usable in real-world applications
        right now.
      • To develop better applications, we need to
        improve accuracy, usability, etc., or
      • think about some radically different approaches
        to the current problems! (-> the age-old
        argument)

15
Working in the Speech Industry
Working for:
  • A speech recognition/text-to-speech company: a CS
    undergraduate can work on software development
    of tools, deployments. With the addition of some
    linguistics classes: dialog designer, QA of
    deployments.
  • A VoiceXML platform company: general software
    development.
  • A tools company: general software development.
  • A consulting (services) company: dialog design,
    deployments.
Or: get a Ph.D. in EE and become a speech
scientist who develops the next-generation speech
recognizer.