VISIONS, TECHNOLOGY, AND BUSINESS OF TALKING MACHINES

Transcript and Presenter's Notes

1
VISIONS, TECHNOLOGY, AND BUSINESS OF TALKING MACHINES
  • Roberto Pieraccini,
  • CTO, Tell-Eureka Corporation
  • 535 West 34th Street
  • New York, NY 10001
  • 1 646 792 2744
  • roberto_at_telleureka.com
  • http://www.telleureka.com

2
The vision
3
Recreating the Speech Chain
4
The technology
5
Talking Machines: First Steps into Spoken Language Technology
Homer Dudley, Bell Labs (1939)
6
Speech Recognition: the Early Years
  • 1952: Automatic Digit Recognition (AUDREY)
  • Davis, Biddulph, Balashek (Bell Laboratories)

7
1960s: Speech Processing and Digital Computers
  • AD/DA converters and digital computers start
    appearing in the labs

James Flanagan, Bell Laboratories
8
The Illusion of Segmentation... or... Why Speech Recognition is so Difficult
(user:Roberto (attribute:telephone-num value:7360474))
9
The Illusion of Segmentation... or... Why Speech Recognition is so Difficult
(user:Roberto (attribute:telephone-num value:7360474))
[Figure labels, repeated four times: errors, rules]
10
1969: Whither Speech Recognition?
  • General purpose speech recognition seems far
    away. Special-purpose speech recognition is
    severely limited. It would seem appropriate for
    people to ask themselves why they are working in
    the field and what they can expect to accomplish.
  • It would be too simple to say that work in
    speech recognition is carried out simply because
    one can get money for it. That is a necessary but
    not sufficient condition. We are safe in asserting
    that speech recognition is attractive to money.
    The attraction is perhaps similar to the
    attraction of schemes for turning water into
    gasoline, extracting gold from the sea, curing
    cancer, or going to the moon. One doesn't attract
    thoughtlessly given dollars by means of schemes
    for cutting the cost of soap by 10%. To sell
    suckers, one uses deceit and offers glamour.
  • Most recognizers behave, not like scientists,
    but like mad inventors or untrustworthy
    engineers. The typical recognizer gets it into
    his head that he can solve the problem. The
    basis for this is either individual inspiration
    (the mad inventor source of knowledge) or
    acceptance of untested rules, schemes, or
    information (the untrustworthy engineer
    approach).
  • The Journal of the Acoustical Society of America,
    June 1969

11
1971-1976: The ARPA SUR project
  • In spite of the anti-speech recognition campaign
    headed by the Pierce Commission, ARPA launches
    into a 5 year program on Speech Understanding
    Research
  • REQUIREMENTS: 1000 word vocabulary, 90%
    understanding rate, near real time on a 100
    MIPS machine
  • 4 Systems built by the end of the program
  • SDC (24%)
  • BBN's HWIM (44%)
  • CMU's Hearsay II (74%)
  • CMU's HARPY (95%, at 80 times real time!)
  • HARPY was based on an engineering approach:
  • search on a network representing all the possible
    utterances
  • Lack of a scientific evaluation approach
  • Speech Understanding: too early for its time.
    The project was not extended.

LESSON LEARNED: Hand-built knowledge does not
scale up. Need of a global optimization criterion.
Raj Reddy -- CMU
12
Vintage Speech Recognition
13
1970s: Dynamic Time Warping. The Brute Force of
the Engineering Approach
T.K. Vyntsyuk (1969); H. Sakoe, S. Chiba (1970)
[Figure labels: TEMPLATE (WORD 7), UNKNOWN WORD]
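
The template-matching idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the formulation used in the cited papers: it assumes one scalar feature per frame, absolute difference as the local distance, and an unconstrained step pattern (match, stretch, or compress by one frame). The names dtw_distance and recognize, and the toy templates, are hypothetical.

```python
import math


def dtw_distance(template, unknown):
    """Cumulative cost of the best time alignment between two sequences."""
    n, m = len(template), len(unknown)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] = best cost of aligning the first i template frames
    # with the first j frames of the unknown word.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(template[i - 1] - unknown[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # template frame repeated against same unknown frame
                cost[i][j - 1],      # unknown frame repeated against same template frame
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def recognize(templates, unknown):
    """Return the label of the stored template closest to the unknown word."""
    return min(templates, key=lambda word: dtw_distance(templates[word], unknown))


# Toy data (made up): the unknown word is a time-stretched copy of "seven".
templates = {"seven": [1, 3, 4, 3, 1], "two": [5, 5, 1, 1]}
unknown = [1, 1, 3, 4, 4, 3, 1]
best = recognize(templates, unknown)
```

The search fills an n-by-m grid for every stored template, which is the "brute force" of the slide title: cost grows with vocabulary size and utterance length.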
14
1980s -- The Statistical Approach
  • Based on work on Hidden Markov Models done by
    Leonard Baum at IDA, Princeton in the late 1960s
  • Purely statistical approach pursued by Fred
    Jelinek and Jim Baker at IBM T.J. Watson Research
  • Foundations of modern speech recognition engines

Jim Baker
  • No Data Like More Data
  • Whenever I fire a linguist, our system
    performance improves (1988)
  • Some of my best friends are linguists (2004)

15
1980-1990: The statistical approach becomes ubiquitous
  • Lawrence Rabiner, "A Tutorial on Hidden Markov
    Models and Selected Applications in Speech
    Recognition," Proceedings of the IEEE, Vol. 77, No.
    2, February 1989.

16
1980s-1990s: The Power of Evaluation
SPOKEN DIALOG INDUSTRY
SPEECHWORKS
NUANCE
Pros and Cons of DARPA programs:
+ Continuous incremental improvement
- Loss of bio-diversity
17
The business of speech
18
Voice User Interface (VUI) Design: the Quantum
Leap in Dialog Systems
  • 1995 -- The WildFire Effect
  • Change of perspective: from technology driven to
    user centered
  • Research: natural language, free form
  • Commercial: task completion and usability
  • Persona: the personality of the application (TTS
    vs. Recording)
  • Speech recognition accuracy is important, but
    success is determined by the VUI.
  • The importance of a repeatable, streamlined,
    teachable, development process

19
The Speech Application Lifecycle
20
Voice User Interface Design
Interaction Module: Get Amount

PROMPTS
Type | Wording | Source
Initial | Please say the amount you would like to transfer from your | get_amount_I_1.wav
Initial | <origin-account> | TTS
Initial | to your | get_amount_I_2.wav
Initial | <destination-account> | TTS
Initial | in dollars and cents. | get_amount_I_3.wav
Retry 1 | Please say the amount you would like to transfer from your | get_amount_I_1.wav
Retry 1 | <origin-account> | TTS
Retry 1 | to your | get_amount_I_2.wav
Retry 1 | <destination-account> | TTS
Retry 1 | in dollars and cents. | get_amount_I_3.wav
Retry 2 | Please say the amount you would like to have transferred, like one hundred dollars and fifty cents. | get_amount_R_2_1.wav
Timeout 1 | I'm sorry, I didn't hear you. | get_amount_T_1_1.wav
Timeout 1 | Please say the amount you would like to transfer from your | get_amount_I_1.wav
Timeout 1 | <origin-account> | TTS
Timeout 1 | to your | get_amount_I_2.wav
Timeout 1 | <destination-account> | TTS
Timeout 2 | I didn't hear you this time either. Please say the amount you would like to have transferred, like one hundred dollars and fifty cents. | get_amount_T_2_1.wav
Help | Please say how much do you wish to transfer. You can say the amount in dollars and cents, like, for instance, one hundred dollars and fifty cents. | get_amount_H.wav

ACTIONS
CONDITION | ACTION
if amount greater than amount in <origin-account> | Go to "Play Wrong Amount Message"
else | Go to "Play Confirmation"
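
As a rough illustration of how a specification like the one above might map to code, the sketch below assembles the Initial prompt from recorded segments and TTS slots, then applies the two ACTIONS rules. The file names, wordings, and target names come from the slide. The Segment class, the render and next_step functions, the use of integer cents, and the sample account names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    source: str  # recorded file name from the spec, or "TTS"
    text: str    # wording for recordings; slot name for TTS segments


# Initial prompt, in the order given in the PROMPTS table.
INITIAL = [
    Segment("get_amount_I_1.wav", "Please say the amount you would like to transfer from your"),
    Segment("TTS", "origin-account"),
    Segment("get_amount_I_2.wav", "to your"),
    Segment("TTS", "destination-account"),
    Segment("get_amount_I_3.wav", "in dollars and cents."),
]


def render(segments, slots):
    """Return the full wording a caller would hear, with TTS slots filled in."""
    return " ".join(slots[s.text] if s.source == "TTS" else s.text for s in segments)


def next_step(amount_cents, origin_balance_cents):
    """Apply the ACTIONS table: pick the next dialog state by name."""
    if amount_cents > origin_balance_cents:
        return "Play Wrong Amount Message"
    return "Play Confirmation"


# Hypothetical slot values for one call.
slots = {"origin-account": "checking account", "destination-account": "savings account"}
prompt_text = render(INITIAL, slots)
```

Keeping the prompts as data rather than code is one way to let a VUI designer change wording or recordings without touching the dialog logic.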
21
Speech Science: Tuning for performance
22
Speech Science: Tuning for performance
DM ACTION | utt | sub-err | fa-err | fr-err | rej | OOV | fa-oov
WaitPowerBothUp-2 | 17 | 5.88 | 0 | 0 | 5.88 | 5.88 | 0
WaitHowMuchSnow | 17 | 5.88 | 11.76 | 5.88 | 23.53 | 29.41 | 40
MissingOneChannel | 22 | 4.55 | 0 | 0 | 9.09 | 9.09 | 0
WPAllChannels | 23 | 4.35 | 0 | 4.35 | 8.7 | 4.35 | 0
PictureBack | 27 | 3.7 | 3.7 | 3.7 | 7.41 | 7.41 | 50
WaitFindInputSource | 29 | 3.45 | 0 | 0 | 13.79 | 13.79 | 0
PictureProb | 33 | 3.03 | 12.12 | 0 | 0 | 12.12 | 100
Utt: number of utterances
Sub-err: percent of in-voc utterances wrongly recognized
Fa-err: percent of utterances wrongly accepted
Fr-err: percent of utterances wrongly rejected
Rej: total percent of all utterances rejected
OOV: percent of out-voc utterances
Fa-oov: percent of out-voc utterances wrongly accepted
  • Prioritize grammars that need improvement
  • Use transcriptions to improve grammars
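
The per-state figures in a report like this can be computed from transcribed utterances. The sketch below is one possible reading of the legend, not the tool used in the talk. It assumes every percentage except Fa-oov is taken over all utterances of the state, and that Fa-oov is taken over out-of-vocabulary utterances only; that choice is inferred from the table values (for example 5.88 = 1/17) and may not match the original definitions. The Utterance class and the sample counts are hypothetical, chosen only to be consistent with the WaitHowMuchSnow row.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    in_vocab: bool  # transcription is covered by the grammar
    accepted: bool  # recognizer result passed the rejection threshold
    correct: bool   # accepted result matches the transcription


def pct(count, total):
    """Percentage rounded to two decimals; 0.0 when the denominator is empty."""
    return round(100.0 * count / total, 2) if total else 0.0


def tuning_metrics(utts):
    n = len(utts)
    oov = [u for u in utts if not u.in_vocab]
    return {
        "utt": n,
        "sub-err": pct(sum(u.in_vocab and u.accepted and not u.correct for u in utts), n),
        "fa-err": pct(sum(u.accepted for u in oov), n),
        "fr-err": pct(sum(u.in_vocab and not u.accepted for u in utts), n),
        "rej": pct(sum(not u.accepted for u in utts), n),
        "OOV": pct(len(oov), n),
        "fa-oov": pct(sum(u.accepted for u in oov), len(oov)),
    }


# Made-up counts: 17 utterances, 5 of them out of vocabulary.
sample = (
    [Utterance(True, True, True)] * 10     # in-voc, correctly recognized
    + [Utterance(True, True, False)]       # in-voc, substitution error
    + [Utterance(True, False, False)]      # in-voc, wrongly rejected
    + [Utterance(False, True, False)] * 2  # out-of-voc, wrongly accepted
    + [Utterance(False, False, False)] * 3 # out-of-voc, correctly rejected
)
metrics = tuning_metrics(sample)
```

Sorting states by these rates is one simple way to prioritize which grammars to revisit first.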

23
The Architectural Evolution of Spoken Dialog
[Timeline figure. Years: 1994, 1998, 2000, 2005. Stages: Native Code; Proprietary IVR Systems; Standard Clients (VoiceXML); Standard Application servers]
24
The Voice Web
[Architecture diagram. Labels: Telephone; Telephony Platform; Voice Browser; ASR; TTS; VoiceXML / SALT; CCXML; Internet; Web Server; SCXML?; EMMA?]
25
The Evolution of the Interface and the Research-Industry Chasm
[Figure: time axis from 1994 to 2006 in two-year steps. Labels: Natural Language; Research Systems a-la DARPA Communicator; Directed Dialog]
26
The evolution of the market and the industry
  • 600 to 1,000M revenue
  • > 8000 apps worldwide

HOSTING
APPLICATION DEVELOPERS: PROFESSIONAL SERVICES
TOOLS: AUTHORING, TUNING, PREPACKAGED APPLICATIONS
PLATFORM INTEGRATORS: IVR, VoiceXML, CTI
TECHNOLOGY VENDORS: SPEECH RECOGNITION, TTS
New evolving standards guarantee interoperability
of engines and platforms.
27
Third generation dialog systems
1st Generation: INFORMATIONAL
2nd Generation: TRANSACTIONAL
3rd Generation: PROBLEM SOLVING
[Figure: applications placed along a COMPLEXITY axis (LOW, MEDIUM, HIGH). Labels: BANKING; CUSTOMER CARE; PACKAGE TRACKING; STOCK TRADING; TECHNICAL SUPPORT; FLIGHT STATUS; FLIGHT/TRAIN RESERVATION]
28
2005 -- Spoken Dialog goes to Saturday Night Live