VISIONS, TECHNOLOGY, AND BUSINESS OF TALKING MACHINES



Slides: 29

Provided by: IBMU288

Learn more at: http://www1.cs.columbia.edu


Transcript and Presenter's Notes


1
VISIONS, TECHNOLOGY, AND BUSINESS OF TALKING
MACHINES

Roberto Pieraccini,
CTO, Tell-Eureka Corporation
535 West 34th Street
New York, NY 10001
1 646 792 2744
roberto_at_telleureka.com
http://www.telleureka.com

2
The vision
3
Recreating the Speech Chain
4
The technology
5
Talking Machines: First Steps into Spoken
Language Technology
Homer Dudley, Bell Labs (1939)
6
Speech Recognition: the Early Years

1952: Automatic Digit Recognition (AUDREY)
Davis, Biddulph, Balashek (Bell Laboratories)

7
1960s: Speech Processing and Digital Computers

AD/DA converters and digital computers start
appearing in the labs

James Flanagan, Bell Laboratories
8
The Illusion of Segmentation... or...
Why Speech Recognition is so Difficult
(user: Roberto (attribute: telephone-num
value: 7360474))
9
The Illusion of Segmentation... or...
Why Speech Recognition is so Difficult
(user: Roberto (attribute: telephone-num
value: 7360474))
[Figure: alternating "errors" and "rules" labels, repeated four times]
10
1969: Whither Speech Recognition?

General purpose speech recognition seems far
away. Special-purpose speech recognition is
severely limited. It would seem appropriate for
people to ask themselves why they are working in
the field and what they can expect to accomplish.
It would be too simple to say that work in
speech recognition is carried out simply because
one can get money for it. That is a necessary but
not sufficient condition. We are safe in asserting
that speech recognition is attractive to money.
The attraction is perhaps similar to the
attraction of schemes for turning water into
gasoline, extracting gold from the sea, curing
cancer, or going to the moon. One doesn't attract
thoughtlessly given dollars by means of schemes
for cutting the cost of soap by 10%. To sell
suckers, one uses deceit and offers glamour.
Most recognizers behave, not like scientists,
but like mad inventors or untrustworthy
engineers. The typical recognizer gets it into
his head that he can solve the problem. The
basis for this is either individual inspiration
(the mad inventor source of knowledge) or
acceptance of untested rules, schemes, or
information (the untrustworthy engineer
approach).
The Journal of the Acoustical Society of America,
June 1969

11
1971-1976: The ARPA SUR project

In spite of the anti-speech recognition campaign
headed by the Pierce Commission, ARPA launches
into a 5-year program on Speech Understanding
Research.
REQUIREMENTS: 1000-word vocabulary,
90% understanding rate, near real time on a 100
MIPS machine
4 systems built by the end of the program:
SDC (24%)
BBN's HWIM (44%)
CMU's Hearsay II (74%)
CMU's HARPY (95% -- 80 times real time!)
HARPY was based on an engineering approach:
search on a network representing all the possible
utterances
Lack of a scientific evaluation approach
Speech Understanding: too early for its time
The project was not extended.

LESSON LEARNED: Hand-built knowledge does not
scale up. Need for a global optimization criterion.
Raj Reddy -- CMU
12
Vintage Speech Recognition
13
1970s: Dynamic Time Warping
The Brute Force of the Engineering Approach
T.K. Vyntsyuk (1969); H. Sakoe, S. Chiba (1970)
TEMPLATE (WORD 7)
UNKNOWN WORD
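
As a minimal sketch of the template-matching idea behind dynamic time warping, the Python below aligns an unknown word against stored word templates and picks the closest one. The one-dimensional feature values, the absolute-difference local distance, and the names `dtw_distance` and `recognize` are illustrative assumptions, not taken from the slides or from the cited papers.

```python
import math

def dtw_distance(template, unknown):
    """Cumulative cost of the best monotonic alignment of two sequences."""
    n, m = len(template), len(unknown)
    # cost[i][j] = best cost of aligning template[:i] with unknown[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local distance between two 1-D feature values.
            local = abs(template[i - 1] - unknown[j - 1])
            # Allow a frame to be stretched, compressed, or matched one-to-one.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]

def recognize(templates, unknown):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(templates[label], unknown))

# Hypothetical templates: one short feature sequence per vocabulary word.
templates = {
    "seven": [1.0, 3.0, 4.0, 3.0, 1.0],
    "two": [4.0, 2.0, 1.0, 1.0, 2.0],
}
# A time-stretched rendition of "seven" (some frames repeated).
unknown = [1.0, 1.0, 3.0, 4.0, 4.0, 3.0, 1.0]
print(recognize(templates, unknown))
```

Every template is compared exhaustively against the unknown word, which is the "brute force" in the slide title: the cost grows with the vocabulary size and with the product of the two sequence lengths.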
14
1980s -- The Statistical Approach

Based on work on Hidden Markov Models done by
Leonard Baum at IDA, Princeton in the late 1960s
Purely statistical approach pursued by Fred
Jelinek and Jim Baker at IBM T.J. Watson Research
Foundations of modern speech recognition engines

Jim Baker

No Data Like More Data
Whenever I fire a linguist, our system
performance improves (1988)
Some of my best friends are linguists (2004)

15
1980-1990: The statistical approach becomes
ubiquitous

Lawrence Rabiner, A Tutorial on Hidden Markov
Models and Selected Applications in Speech
Recognition, Proceedings of the IEEE, Vol. 77, No.
2, February 1989.
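
To make the hidden Markov model idea concrete, here is a small Viterbi decoding sketch in Python: given a sequence of observations, it finds the most likely sequence of hidden states. The two-state "sil"/"speech" model, its probabilities, and the "low"/"high" observation symbols are invented for illustration and do not come from the slides; real recognizers use far larger models over acoustic feature vectors.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log probability, path) of the most likely state sequence.

    Works in the log domain; assumes all probabilities used are non-zero.
    """
    # best[s] = (log score of the best path ending in state s, that path)
    best = {
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
        for s in states
    }
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the predecessor that maximizes the score of reaching s.
            prev = max(states, key=lambda p: best[p][0] + math.log(trans_p[p][s]))
            score = (best[prev][0] + math.log(trans_p[prev][s])
                     + math.log(emit_p[s][obs]))
            step[s] = (score, best[prev][1] + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])

# Hypothetical toy model: silence vs. speech, observed through frame energy.
states = ["sil", "speech"]
start_p = {"sil": 0.8, "speech": 0.2}
trans_p = {
    "sil": {"sil": 0.7, "speech": 0.3},
    "speech": {"sil": 0.2, "speech": 0.8},
}
emit_p = {
    "sil": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}

log_prob, path = viterbi(["low", "high", "high", "low"],
                         states, start_p, trans_p, emit_p)
print(path)
```

All the numbers in such a model are estimated from data rather than written by hand, which is the contrast with the rule-based systems of the earlier slides.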

16
1980s-1990s: The Power of Evaluation
SPOKEN DIALOG INDUSTRY
SPEECHWORKS
NUANCE
Pros and Cons of DARPA programs:
+ Continuous incremental improvement
- Loss of bio-diversity
17
The business of speech
18
Voice User Interface (VUI) Design: the Quantum
Leap in Dialog Systems

1995 -- The WildFire Effect
Change of perspective: from technology driven to
user centered
Research: natural language, free form
Commercial: task completion and usability
Persona: the personality of the application (TTS
vs. recording)
Speech recognition accuracy is important, but
success is determined by the VUI.
The importance of a repeatable, streamlined,
teachable development process

19
The Speech Application Lifecycle
20
Voice User Interface Design
Get Amount | Interaction Module

PROMPTS
Type | Wording | Source
Initial | Please say the amount you would like to transfer from your | get_amount_I_1.wav
Initial | <origin-account> | TTS
Initial | to your | get_amount_I_2.wav
Initial | <destination-account> | TTS
Initial | in dollars and cents. | get_amount_I_3.wav
Retry 1 | Please say the amount you would like to transfer from your | get_amount_I_1.wav
Retry 1 | <origin-account> | TTS
Retry 1 | to your | get_amount_I_2.wav
Retry 1 | <destination-account> | TTS
Retry 1 | in dollars and cents. | get_amount_I_3.wav
Retry 2 | Please say the amount you would like to have transferred, like one hundred dollars and fifty cents. | get_amount_R_2_1.wav
Timeout 1 | I'm sorry, I didn't hear you. | get_amount_T_1_1.wav
Timeout 1 | Please say the amount you would like to transfer from your | get_amount_I_1.wav
Timeout 1 | <origin-account> | TTS
Timeout 1 | to your | get_amount_I_2.wav
Timeout 1 | <destination-account> | TTS
Timeout 2 | I didn't hear you this time either. Please say the amount you would like to have transferred, like one hundred dollars and fifty cents. | get_amount_T_2_1.wav
Help | Please say how much do you wish to transfer. You can say the amount in dollars and cents, like, for instance, one hundred dollars and fifty cents. | get_amount_H.wav

ACTIONS
Condition | Action
if amount greater than amount in <origin-account> | Go to "Play Wrong Amount Message"
else | Go to "Play Confirmation"
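
The design table above can be read as data plus a small branching rule. The Python below is one hypothetical way to encode it, a sketch rather than the format used by any actual VUI tool: the prompt file names and the two action targets come from the table, while the names `PROMPTS`, `render`, and `next_step`, the `tts:` prefix, and the reading of "amount in <origin-account>" as the account balance are assumptions made for illustration.

```python
# Prompt segments transcribed from the table. "TTS" marks a slot that is
# rendered by text-to-speech instead of a recorded file.
INITIAL = [
    ("get_amount_I_1.wav", None),
    ("TTS", "origin-account"),
    ("get_amount_I_2.wav", None),
    ("TTS", "destination-account"),
    ("get_amount_I_3.wav", None),
]

PROMPTS = {
    ("initial", 1): INITIAL,
    ("retry", 1): INITIAL,
    ("retry", 2): [("get_amount_R_2_1.wav", None)],
    # The table lists Timeout 1 without the final get_amount_I_3.wav segment.
    ("timeout", 1): [("get_amount_T_1_1.wav", None)] + INITIAL[:4],
    ("timeout", 2): [("get_amount_T_2_1.wav", None)],
    ("help", 1): [("get_amount_H.wav", None)],
}

def render(event, attempt, slots):
    """Expand one prompt sequence into a playlist of recordings and TTS items."""
    playlist = []
    for source, slot in PROMPTS[(event, attempt)]:
        # Hypothetical convention: TTS items are tagged with a "tts:" prefix.
        playlist.append(f"tts:{slots[slot]}" if source == "TTS" else source)
    return playlist

def next_step(amount, origin_balance):
    """ACTIONS table: branch on whether the amount exceeds the origin balance."""
    if amount > origin_balance:
        return "Play Wrong Amount Message"
    return "Play Confirmation"

slots = {"origin-account": "checking", "destination-account": "savings"}
print(render("initial", 1, slots))
print(next_step(amount=150.50, origin_balance=100.00))
```

Keeping prompts in a table-like structure separate from the branching logic mirrors the split between the PROMPTS and ACTIONS sections of the design document, so wording can be revised without touching the call flow.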
21
Speech Science: Tuning for performance
22
Speech Science: Tuning for performance
name | utt | sub-err | fa-err | fr-err | rej | OOV | fa-oov
WaitPowerBothUp-2 | 17 | 5.88 | 0 | 0 | 5.88 | 5.88 | 0
WaitHowMuchSnow | 17 | 5.88 | 11.76 | 5.88 | 23.53 | 29.41 | 40
MissingOneChannel | 22 | 4.55 | 0 | 0 | 9.09 | 9.09 | 0
WPAllChannels | 23 | 4.35 | 0 | 4.35 | 8.7 | 4.35 | 0
PictureBack | 27 | 3.7 | 3.7 | 3.7 | 7.41 | 7.41 | 50
WaitFindInputSource | 29 | 3.45 | 0 | 0 | 13.79 | 13.79 | 0
PictureProb | 33 | 3.03 | 12.12 | 0 | 0 | 12.12 | 100
DM
ACTION
Utt: number of utterances
Sub-err: percent of in-voc utterances wrongly recognized
Fa-err: percent of utterances wrongly accepted
Fr-err: percent of utterances wrongly rejected
Rej: total percent of all utterances rejected
OOV: percent of out-voc utterances
Fa-oov: percent of out-voc utterances wrongly accepted

Prioritize grammars that need improvement
Use transcriptions to improve grammars
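
The measures defined above can be computed from transcribed utterances. The sketch below is a hedged reconstruction in Python: the `Utterance` record, the function name `tuning_metrics`, and the sample counts are assumptions. The choice of denominators (all utterances for every measure except fa-oov, which is taken over out-of-vocabulary utterances) is inferred from the figures in the table, not stated in the slides.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    in_vocab: bool   # the transcription is covered by the grammar
    accepted: bool   # the recognizer returned a result instead of rejecting
    correct: bool    # the returned result matches the transcription

def tuning_metrics(utts):
    """Compute the per-grammar tuning measures as percentages."""
    def pct(count, total):
        return round(100.0 * count / total, 2) if total else 0.0

    n = len(utts)
    oov = [u for u in utts if not u.in_vocab]
    return {
        "utt": n,
        "sub-err": pct(sum(u.in_vocab and u.accepted and not u.correct
                           for u in utts), n),
        "fa-err": pct(sum(u.accepted for u in oov), n),
        "fr-err": pct(sum(u.in_vocab and not u.accepted for u in utts), n),
        "rej": pct(sum(not u.accepted for u in utts), n),
        "OOV": pct(len(oov), n),
        "fa-oov": pct(sum(u.accepted for u in oov), len(oov)),
    }

# Hypothetical counts chosen to reproduce the WaitHowMuchSnow row:
# 12 in-vocabulary utterances (10 correct, 1 substituted, 1 rejected)
# and 5 out-of-vocabulary utterances (2 accepted, 3 rejected).
sample = (
    [Utterance(True, True, True)] * 10
    + [Utterance(True, True, False)]
    + [Utterance(True, False, False)]
    + [Utterance(False, True, False)] * 2
    + [Utterance(False, False, False)] * 3
)
metrics = tuning_metrics(sample)
print(metrics)
```

Sorting grammars by a measure such as fa-err or rej is one simple way to decide which to tune first; which measure matters most depends on the application.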

23
The Architectural Evolution of Spoken Dialog
Timeline: 1994, 1998, 2000, 2005
Native Code
Proprietary IVR Systems
Standard Clients (VoiceXML)
Standard Application servers
24
The Voice Web
SCXML?
EMMA?
Web Server
Telephony Platform
Voice Browser
Internet
ASR
TTS
VoiceXML /SALT
Telephone
CCXML
25
The Evolution of the Interface
and the Research-Industry Chasm
Natural Language
Research Systems a-la DARPA Communicator
Directed Dialog
Timeline: 1994, 1996, 1998, 2000, 2002, 2004, 2006
26
The evolution of the market and the industry

600 to 1,000M revenue
> 8000 apps worldwide

HOSTING
APPLICATION DEVELOPERS PROFESSIONAL SERVICES
TOOLS: AUTHORING, TUNING, PREPACKAGED APPLICATIONS
New evolving standards guarantee interoperability
of engines and platforms.
PLATFORM INTEGRATORS: IVR, VoiceXML, CTI,
TECHNOLOGY VENDORS: SPEECH RECOGNITION, TTS
27
Third generation dialog systems
1st Generation: INFORMATIONAL
2nd Generation: TRANSACTIONAL
3rd Generation: PROBLEM SOLVING
BANKING
CUSTOMER CARE
PACKAGE TRACKING
STOCK TRADING
TECHNICAL SUPPORT
FLIGHT STATUS
FLIGHT/TRAIN RESERVATION
LOW
MEDIUM
HIGH
COMPLEXITY
28
2005 -- Spoken Dialog goes to Saturday Night Live
