1
Detecting Misunderstandings in the CMU
Communicator Spoken Dialog System
  • Presented by Dan Bohus
  • Joint work with Paul Carpenter, Chun Jin, Daniel Wilson, Rong Zhang, and Alex Rudnicky
  • Carnegie Mellon University, 2002

2
What's a Spoken Dialog System?
  • A human talking to a computer
  • Taking turns in a goal-oriented dialog

3
Why Spoken Language Interfaces?
  • Speech advantages and problems:
  • + Speech is the natural communication modality for humans
  • + Can easily express fairly complex structures
  • + Works well in hands- or eyes-busy situations
  • - Serial channel
  • - Still an unreliable channel

4
Sample Spoken Dialog Systems
  • Interactive Voice Response (IVR) systems
  • Information access systems:
  • air-travel planning (Communicator)
  • weather info over the phone (Jupiter)
  • e-mail access over the phone (ELVIS)
  • UA baggage claims (Simon)
  • Other systems: guidance, personal assistants, taskable agents, etc.

5
A Look Under the Hood
  • S: Where are you flying from?
  • U: from London to Paris and then on to Toronto
  • D: from London to Paris on then on to go on to
  • SI: depart_location = London, arrive_location = Paris
  • SO: query depart_time
  • NL: And when do you want to leave?
  • S: And when do you want to leave?

6
Roadmap
  • Intro to Spoken Dialog Systems
  • → The Problem: Misunderstandings
  • A Learning Solution
  • Experiments and Results
  • Conclusion

7
Speech Recognition
  • Speech recognition is the main driver behind the development of SDS.
  • But it is problematic:
  • input signal quality
  • accents, non-native speakers
  • spoken language disfluencies: stutters, false starts, /mm/, /um/
  • typical word error rates: 20-30%

8
Recognition Errors' Impact
  • Errors propagate to upper levels:
  • they can compromise the parse → non-understandings
  • they can propagate to the Dialog Control level → misunderstandings, which have a higher cost if acted upon

9
Misunderstandings in Action!
  • S: Are you a registered user?
  • U: No I'm not. No [heard: NO I'M NOT NO]
  • S: What is your full name?
  • U: Adam Schumacher [heard: ATHENS IN AKRON]
  • S: There is an Athens in Georgia and in Greece. Which destination did you want?
  • U: Neither [heard: NEITHER]
  • S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
  • U: Georgia [heard: GEORGIA]
  • S: A flight from Athens... Where do you want to go?
  • U: Start over [heard: START OVER]
  • S: Do you really want to start over?
  • U: Yes [heard: YES]
  • S: What city are you leaving from?
  • U: Hamilton, Ontario [heard: HILTON ONTARIO]
  • S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
  • U: Hamilton [heard: HILTON]
  • S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
  • U: Toronto [heard: TORONTO]

10
Addressing the Problem
  • Wait for SR technology to reach better performance, or
  • → increase the robustness of systems when faced with poor recognition:
  • detect misunderstandings
  • use recovery techniques

11
Problem Formulation
  • Given an input utterance and the current state of the system, detect whether the utterance was correctly perceived by the system or not (the confidence annotation problem).

12
Roadmap
  • Intro to Spoken Dialog Systems
  • The Problem: Detecting Misunderstandings
  • → A Learning Solution
  • Experiments and Results
  • Conclusion

13
A Classification Task
  • Cast the problem as a classification task
  • Heuristic approach:
  • the Garble rule previously used in Communicator
  • Data-driven (learning) approach

14
A Data-Driven Approach
  • Machine learning approach:
  • learn to classify from a labeled training corpus
  • use the learned classifier to classify new instances (see the sketch below)

[Diagram: learn mode: feature vectors plus GOOD/BAD labels train the classifier; run mode: feature vectors go into the trained classifier, which outputs GOOD/BAD]
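A minimal sketch of the learn/run setup in the diagram above, assuming numeric feature vectors extracted from the system logs and binary GOOD/BAD labels from the transcribed corpus. The scikit-learn decision tree stands in for whichever of the classifiers compared later is used, and the function names are illustrative.

from sklearn.tree import DecisionTreeClassifier

def train_detector(train_features, train_labels):
    # Learn mode: fit a classifier on the labeled training corpus.
    clf = DecisionTreeClassifier()
    clf.fit(train_features, train_labels)  # labels are "GOOD" / "BAD"
    return clf

def classify_utterance(clf, features):
    # Run mode: label a new utterance as GOOD (understood) or BAD.
    return clf.predict([features])[0]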
15
Ingredients
  • Three ingredients are needed for a machine learning approach:
  • collect a corpus of labeled data to use for training
  • identify a set of relevant features
  • choose a classification technique

16
Roadmap
  • Intro to Spoken Dialog Systems
  • The Problem: Misunderstandings
  • A Learning Solution
  • → Training corpus
  • Features
  • Classification techniques
  • Experiments and Results
  • Conclusion

17
Corpus: Sources
  • Collected 2 months of sessions (October and November 1999)
  • About 300 sessions
  • Both developer and outsider calls
  • Eliminated conversations with < 5 turns:
  • developers calling to check if the system is on-line
  • wrong-number calls

18
Corpus: Structure
  • The logs:
  • generated automatically by various system modules
  • serve as a source of features for classification (they also contain the decoded utterances)
  • The transcripts (the actual utterances):
  • produced and double-checked by a human annotator
  • provide the basis for labeling

19
Corpus: Labeling
  • Labeling was done at the concept level.
  • Four possible labels:
  • OK: the concept is okay
  • RBAD: recognition is bad
  • PBAD: parse is bad
  • OOD: out of domain
  • Aggregate utterance labels were generated automatically (see the sketch below).
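A minimal sketch of the automatic aggregation step, under the assumption (consistent with the binary corpus described later) that an utterance counts as GOOD only if every concept in it is labeled OK; the function name is illustrative.

def aggregate_utterance_label(concept_labels):
    # Any RBAD / PBAD / OOD concept makes the whole utterance BAD.
    return "GOOD" if all(label == "OK" for label in concept_labels) else "BAD"

# Example: aggregate_utterance_label(["OK", "RBAD"]) -> "BAD"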

20
Corpus: Sample Labeling
  • Only 6% of the utterances actually contained mixed-type concept labels!

21
Corpus: Summary
  • Started with 2 months of dialog sessions
  • Eliminated short, ill-formed sessions
  • Transcribed the corpus
  • Labeled it at the concept level
  • Discarded mixed-label utterances
  • Result: 4550 binary-labeled utterances across 311 dialogs

22
Features: Sources
  • Traditionally, features are extracted from the speech recognition layer [Chase].
  • In an SDS, there are at least 2 other orthogonal knowledge sources:
  • the Parser
  • the Dialog Manager

[Diagram: features drawn from three system layers (Speech, Parsing, Dialog)]
23
Features: Speech Recognition Level
  • WordNumber (11)
  • UnconfidentPerc: percentage of unconfident words (9)
  • this feature already captures other decoder-level features (both features are sketched below)
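An illustrative computation of the two speech-level features, assuming the decoder emits one confidence score per word; the 0.5 threshold for calling a word unconfident is an assumption, not a value from the talk.

def speech_features(words, confidences, threshold=0.5):
    # WordNumber: length of the decoded hypothesis.
    word_number = len(words)
    # UnconfidentPerc: fraction of words below the confidence threshold.
    unconfident = sum(1 for c in confidences if c < threshold)
    unconfident_perc = unconfident / word_number if word_number else 0.0
    return {"WordNumber": word_number, "UnconfidentPerc": unconfident_perc}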

24
Features: Parser Level
  • UncoveredPerc: percentage of words not covered by the parse (36)
  • GapNumber: number of unparsed fragments (3)
  • FragmentationScore: number of transitions between parsed and unparsed fragments (5)
  • Garble: flag computed by a heuristic rule based on parse coverage and fragmentation (the coverage-based features are sketched below)
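A sketch of the three coverage-based features, assuming each word of the hypothesis is flagged as covered (True) or not covered (False) by the parse; the names and representation are illustrative.

def parse_coverage_features(covered):
    # `covered` is a list of booleans, one per word in the hypothesis.
    n = len(covered)
    uncovered_perc = covered.count(False) / n if n else 0.0
    # GapNumber: maximal runs of consecutive uncovered words.
    gap_number = sum(1 for i, c in enumerate(covered)
                     if not c and (i == 0 or covered[i - 1]))
    # FragmentationScore: parsed/unparsed boundary transitions.
    fragmentation = sum(1 for i in range(1, n) if covered[i] != covered[i - 1])
    return {"UncoveredPerc": uncovered_perc,
            "GapNumber": gap_number,
            "FragmentationScore": fragmentation}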

25
Features: Parser Level (2)
  • ConceptBigram: bigram concept model score
  • P(c1 ... cn) ≈ P(cn | cn-1) · P(cn-1 | cn-2) · ... · P(c2 | c1) · P(c1)
  • probabilities trained from a corpus (see the sketch below)
  • ConceptNumber (4)
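A minimal sketch of the bigram score, assuming unigram and bigram concept probabilities already estimated from a corpus; the `unigram` and `bigram` dictionaries are placeholders, and smoothing/backoff are omitted.

import math

def concept_bigram_score(concepts, unigram, bigram):
    # Log-probability of the decoded concept sequence under the model.
    if not concepts:
        return 0.0
    logp = math.log(unigram[concepts[0]])        # P(c1)
    for prev, cur in zip(concepts, concepts[1:]):
        logp += math.log(bigram[(prev, cur)])    # P(c_i | c_{i-1})
    return logp  # higher = more plausible concept sequence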

26
Features: Dialog Manager Level
  • DialogState: the current state of the DM
  • StateDuration: for how many turns the DM has remained in the same state
  • TurnNumber: how many turns have elapsed since the beginning of the session
  • ExpectedConcepts: indicates whether the decoded concepts match the expectations of the DM (sketched below)
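An illustrative check for the ExpectedConcepts flag, assuming the dialog manager exposes the set of concepts it expects in its current state; this representation is an assumption, not the Communicator API.

def expected_concepts(decoded_concepts, state_expectations):
    # 1 if at least one decoded concept is one the DM is waiting for.
    return int(any(c in state_expectations for c in decoded_concepts))

# Example: expected_concepts({"depart_location"}, {"depart_location", "depart_time"}) -> 1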

27
Features: Summary
  • 12 features from 3 levels of the system:
  • Speech level: WordNumber, UnconfidentPerc
  • Parsing level: UncoveredPerc, FragmentationScore, GapNumber, Garble, ConceptBigram, ConceptNumber
  • Dialog management level: DialogState, StateDuration, TurnNumber, ExpectedConcepts

28
Classification Techniques
  • Bayesian Networks
  • Boosting
  • Decision Tree
  • Artificial Neural Networks
  • Support Vector Machine
  • Naïve Bayes

29
Roadmap
  • Intro to Spoken Dialog Systems
  • The Problem: Detecting Misunderstandings
  • A Learning Approach
  • Training corpus
  • Features
  • Classification techniques
  • → Experiments and Results
  • Conclusion

30
Experimental Setup
  • Performance metric: classification error rate
  • 2 performance baselines:
  • random baseline: 32.84%
  • heuristic baseline: 25.69%
  • Used a 10-fold cross-validation process (sketched below) to:
  • build confidence intervals for the error rates
  • do statistical analysis of the differences in performance exhibited by the classifiers
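A minimal sketch of the evaluation protocol, assuming a feature matrix X and GOOD/BAD labels y; the normal-approximation 95% interval is illustrative, since the talk does not specify how the confidence intervals were built.

import numpy as np
from sklearn.model_selection import cross_val_score

def cv_error_rate(clf, X, y):
    # 10-fold cross-validation; convert per-fold accuracy to error rate.
    errors = 1.0 - cross_val_score(clf, X, y, cv=10)
    mean = errors.mean()
    half_width = 1.96 * errors.std(ddof=1) / np.sqrt(len(errors))
    return mean, (mean - half_width, mean + half_width)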

31
Results: Individual Features
32
Results: Classifiers
33
An In-Depth Look at Error Rates
  • FP: false acceptance (a misunderstood utterance is accepted)
  • FN: false rejection (a correctly understood utterance is rejected)
  • Error Rate = FP + FN (both expressed as fractions of all utterances)
  • CDR = TN / (TN + FP) = 1 - (FP / N_BAD), the fraction of truly bad utterances correctly detected (see the sketch below)
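The two formulas above, computed from raw confusion counts; here BAD is the class to detect, so TN counts bad utterances correctly rejected. A rough sketch:

def error_metrics(tp, fp, tn, fn):
    # tp/tn: correctly accepted GOOD / correctly rejected BAD utterances;
    # fp: BAD accepted (false acceptance); fn: GOOD rejected (false rejection).
    n = tp + fp + tn + fn
    error_rate = (fp + fn) / n        # Error Rate = FP + FN (as fractions)
    n_bad = tn + fp                   # all truly BAD utterances
    cdr = tn / n_bad                  # = 1 - FP / N_BAD
    return error_rate, cdr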

34
Results: Classifiers (cont'd)
35
Conclusion
  • Spoken dialog system performance is strongly impaired by misunderstandings
  • Increase the robustness of systems when faced with poor recognition:
  • detect misunderstandings
  • use recovery techniques

36
Conclusion (cont'd)
  • Cast misunderstanding detection as a data-driven classification task:
  • a labeled training corpus
  • 12 features from 3 levels of the system
  • empirically compared 6 classification techniques
  • The resulting data-driven misunderstanding detector:
  • significantly improves over the previous heuristic classifier
  • correctly detects 74% of the misunderstandings

37
Future Work
  • Detect misunderstandings:
  • improve performance by adding new features
  • identify the source of the error
  • Use recovery techniques:
  • incorporate the confidence score into the dialog management process

38
Pointers
  • "Is This Conversation On Track?", P. Carpenter, C. Jin, D. Wilson, R. Zhang, D. Bohus, A. Rudnicky, Eurospeech 2001, Aalborg, Denmark
  • CMU Communicator: 1-412-268-1084
  • www.cs.cmu.edu/dbohus/SDS