CS 378 Natural Language Processing - Speech Processing: Present, Past and Future

About This Presentation

Slides: 16

Provided by: ingedeb

Learn more at: https://www.cs.utexas.edu

Transcript and Presenter's Notes

1
CS 378 Natural Language Processing
Speech Processing: Present, Past and Future

Inge M. R. De Bleecker
Department of Linguistics
inge_at_mail.utexas.edu
October 14, 2003

2
Main Types of NLP Applications

Text processing: information retrieval and search engines, information extraction, text summarization, machine translation, question answering
Speech processing: speech recognition (ASR), both over-the-telephone (OTT) and dictation systems (desktop); speaker verification; text-to-speech (TTS)

3
Overview

Speech industry history (late 80s to present)
Practical overview of current applications and their future directions
Speech recognition accuracy
Text-to-speech accuracy
Usability and design
Application building tools
Working in the speech industry

4
History: Late Eighties

Sentiment: OTT ASR is finally ready for commercial applications.
Technology: OTT speaker-independent discrete digits/yes-no apps. Word-based language models. TTS mostly used for numbers, if used at all; pre-recorded strings much more common.
Applications: simple in structure and functionality. OTT: banking, e.g. asking for an account balance. Desktop: first dictation systems, medical applications.
Companies: few. Small research-oriented companies, or research arms of big companies, e.g. Dragon Systems, VPC, VCS, Kurzweil, BBN, AT&T, ...
5
History: Early Nineties

Sentiment: credibility and usability of apps grow. Multilingual developments.
Technology: OTT speaker-independent (SI) continuous digits/yes-no/command-word apps. Move to phoneme-based language models.
Applications: OTT: still simple, system-directed dialog (vs. user-directed or mixed-initiative). Desktop: more dictation systems, command-and-control systems (user-directed).
Companies: more companies pop up. Most grow out of research communities.
6
History: Mid to Late Nineties

Technology: maturing of the technologies used.
Companies:
  overall growth
  dirty politics (L&H)
  mergers and buyouts start (still ongoing today)

7
History: Late Nineties to Present

Technology: maturing of technology continues
  better recognition accuracy
  unrestricted ASR input (natural speech)
  move to more sophisticated dialog systems (see next slide)
  tool standardization
Applications: wider use of apps. More attention to usability, dialog design, etc.

8
Dialog System Architecture
ASR
Parser
Reasoning
Output Generation
TTS
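
The five components listed on this slide can be read as a pipeline in which each stage feeds the next. The sketch below is only an illustration under that assumption: every function name, the Intent type, and the toy banking rules are hypothetical, and the ASR and TTS stages are stubbed with plain text instead of audio.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    name: str
    slots: dict


def asr(audio: str) -> str:
    # Stub: a real recognizer decodes audio; here the "audio" is already text.
    return audio.lower().strip()


def parse(text: str) -> Intent:
    # Toy keyword parser standing in for the Parser component.
    if "balance" in text:
        account = "savings" if "savings" in text else "checking"
        return Intent("get_balance", {"account": account})
    return Intent("unknown", {})


def reason(intent: Intent, balances: dict) -> dict:
    # Toy reasoning step: look up what the user asked for.
    if intent.name == "get_balance":
        account = intent.slots["account"]
        return {"act": "inform_balance", "account": account, "amount": balances[account]}
    return {"act": "reprompt"}


def generate(decision: dict) -> str:
    # Template-based output generation.
    if decision["act"] == "inform_balance":
        return f"Your {decision['account']} balance is {decision['amount']:.2f} dollars."
    return "Sorry, I did not understand. Please say that again."


def tts(text: str) -> str:
    # Stub: a real system would synthesize audio; we return a tagged string.
    return f"<speech>{text}</speech>"


def run_turn(audio: str, balances: dict) -> str:
    # One dialog turn: ASR -> Parser -> Reasoning -> Output Generation -> TTS.
    return tts(generate(reason(parse(asr(audio)), balances)))


if __name__ == "__main__":
    print(run_turn("What is my savings balance", {"savings": 120.5, "checking": 40.0}))
```

Keeping each stage behind a small function boundary means a stub can later be swapped for a real recognizer or synthesizer without touching the other stages.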
9
Speech Recognition Accuracy

Present: reasonable accuracy on natural speech. Most systems still use a grammar to help the recognizer. Grammars are written in VoiceXML or a vendor-specific language and are not very sophisticated from a linguistics point of view. Some systems are (theoretically) purely statistical, e.g. Nuance's Accuroute.
Future: need to add more linguistic principles to current statistical methods. Make signal processing more robust, encourage reusability.
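
To illustrate how a grammar can help a recognizer, here is a minimal sketch that filters a list of scored hypotheses through a small grammar. It rests on several assumptions: the grammar is written as a Python regular expression rather than in VoiceXML or a vendor language, the hypotheses and scores are invented, and the name pick_in_grammar is hypothetical. Production recognizers usually apply the grammar during decoding rather than as a filter afterwards.

```python
import re

# Hypothetical grammar for a banking prompt: an optional carrier phrase
# followed by one of two account types.
GRAMMAR = re.compile(r"^(?:what is |check )?(?:my )?(checking|savings) balance$")


def pick_in_grammar(nbest):
    """Return the best-scoring hypothesis that the grammar accepts, or None."""
    for text, _score in sorted(nbest, key=lambda h: h[1], reverse=True):
        if GRAMMAR.match(text):
            return text
    return None


if __name__ == "__main__":
    nbest = [
        ("check my shaving balance", 0.41),  # top-scoring, but out of grammar
        ("check my savings balance", 0.38),
        ("chuck my savings ballots", 0.12),
    ]
    print(pick_in_grammar(nbest))
```

In this toy case the grammar rejects the top-scoring but nonsensical hypothesis and the second one is returned, which is the sense in which a grammar "helps" the recognizer.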

10
TTS Accuracy

Present: getting better all the time. During the last few years, additional research in prosody and intonation has paid off: more natural-sounding speech. Also deals with abbreviations, etc. Current TTS can be used to patch up real speech, e.g. AT&T, Scansoft (Speechworks).
Future: probably never a complete substitute for pre-recorded strings.

11
Usability: Dialog Design

Present:
  Dialog design (VUI) is becoming more sophisticated through:
    use of natural speech input
    mixed-initiative dialogs (more complicated for novice users)
    chatty applications which provide graceful ways of dealing with low accuracy: confirmations and errors, fall-back to system-directed dialog, use of persona, e.g. Bell Canada's Emily
Future:
  Continued improvements in dialog design are necessary (e.g. usability studies).
  Dialog design is easier with current (and future) tools, but still an art!
  It is (too) easy to design bad speech applications.
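
The strategies for dealing with low accuracy mentioned on this slide (confirmations, error handling, fall-back to system-directed dialog) are often driven by the recognizer's confidence score. The following is a minimal sketch of such a policy; the function name next_action, the action labels, and all threshold values are assumptions chosen for illustration, not values from the presentation.

```python
def next_action(confidence, failures, accept_at=0.8, confirm_at=0.5, max_failures=2):
    """Choose a dialog move from recognizer confidence and prior failed turns."""
    if failures >= max_failures:
        # Too many misrecognitions: stop accepting open input, ask narrow questions.
        return "fallback_system_directed"
    if confidence >= accept_at:
        return "accept"    # confident enough to proceed without confirming
    if confidence >= confirm_at:
        return "confirm"   # e.g. "Did you say savings?"
    return "reprompt"      # too uncertain to guess; ask again


if __name__ == "__main__":
    for conf, fails in [(0.92, 0), (0.63, 0), (0.20, 1), (0.95, 2)]:
        print(conf, fails, next_action(conf, fails))
```

Checking the failure count first means that even a high-confidence result does not return a struggling caller to open-ended prompts, which is one way to read the fall-back idea.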

12
Usability: Other Issues

Present:
  Natural language generation (NLG) is not receiving much attention
  Reasoning components are very limited
Future:
  NLG needs to adapt to the user and conform more to human speech patterns
  multimodal applications
  multilingual systems
  use of e.g. ontologies in reasoning components, ...

13
Application Building Tools

Present:
  Standardization: VoiceXML and VoiceXML platforms (alternative: SALT)
  Many platform companies: VoiceGenie, Bevocal, Audium, ...
  Also companies developing tools for platforms: Aptera
  VoiceXML: World of VoiceXML, a comprehensive site on all things VoiceXML
  Free developer resources, e.g. Bevocal
  Small companies can have a VoiceXML app hosted by a platform company
  Big companies: in-house platforms (telco-industry-grade equipment), quite costly
Future:
  Development of better tools that make it harder to build bad applications!

14
Speech Apps: State-of-the-Art

Conclusion
ASR and TTS are usable in real-world applications right now.
To develop better applications, we need to improve accuracy, usability, etc., or think about some radically different approaches to the current problems! (-> the age-old argument)

15
Working in the Speech Industry

Working for:
  A speech recognition/text-to-speech company: a CS undergraduate can work on software development of tools and deployments. With the addition of some linguistics classes: dialog designer, QA of deployments, ...
  A VoiceXML platform company: general software development, ...
  A tools company: general software development, ...
  A consulting (services) company: dialog design, deployments.
Or: get a Ph.D. in EE and become a speech scientist who develops the next-generation speech recognizer.
