1
Detecting Misunderstandings in the CMU
Communicator Spoken Dialog System
  • Presented by Dan Bohus
  • Joint work with Paul Carpenter, Chun Jin, Daniel Wilson, Rong Zhang, and Alex Rudnicky
  • Carnegie Mellon University, 2002

2
What's a Spoken Dialog System?
  • A human talking to a computer
  • Taking turns in a goal-oriented dialog

3
Why Spoken Language Interfaces?
  • Speech advantages and problems:
  • + Speech is the natural communication modality for humans
  • + Can easily express fairly complex structures
  • + Works well in hands- or eyes-busy situations
  • - Serial channel
  • - Still an unreliable channel

4
Sample Spoken Dialog Systems
  • Interactive Voice Response (IVR) systems
  • Information access systems:
  • air-travel planning (Communicator)
  • weather info over the phone (Jupiter)
  • e-mail access over the phone (ELVIS)
  • UA baggage claims (Simon)
  • Other systems: guidance, personal assistants, taskable agents, etc.

5
A Look Under the Hood
  • S: Where are you flying from?
  • U: from London to Paris and then on to Toronto
  • D: from London to Paris on then on to go on to
  • SI: depart_location = London, arrive_location = Paris
  • SO: query depart_time
  • NL: And when do you want to leave?
  • S: And when do you want to leave?

6
Roadmap
  • Intro to Spoken Dialog Systems
  • → The Problem: Misunderstandings
  • A Learning Solution
  • Experiments and Results
  • Conclusion

7
Speech Recognition
  • Speech recognition is the main driver behind the development of SDS.
  • But it is problematic:
  • input signal quality
  • accents, non-native speakers
  • spoken language disfluencies: stutters, false starts, /mm/, /um/
  • typical word error rates: 20-30%

8
Recognition Errors' Impact
  • Errors propagate to upper levels:
  • they can compromise the parse → non-understandings
  • they can propagate to the Dialog Control level → misunderstandings, which have a higher cost if acted upon

9
Misunderstandings in Action!
  • S: Are you a registered user?
  • U: No I'm not. No [heard: NO I'M NOT NO]
  • S: What is your full name?
  • U: Adam Schumacher [heard: ATHENS IN AKRON]
  • S: There is an Athens in Georgia and in Greece. Which destination did you want?
  • U: Neither [heard: NEITHER]
  • S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
  • U: Georgia [heard: GEORGIA]
  • S: A flight from Athens... Where do you want to go?
  • U: Start over [heard: START OVER]
  • S: Do you really want to start over?
  • U: Yes [heard: YES]
  • S: What city are you leaving from?
  • U: Hamilton, Ontario [heard: HILTON ONTARIO]
  • S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
  • U: Hamilton [heard: HILTON]
  • S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
  • U: Toronto [heard: TORONTO]

10
Addressing the Problem
  • Wait for SR technology to reach better performance, or
  • → increase the robustness of systems when faced with poor recognition:
  • detect misunderstandings
  • use recovery techniques

11
Problem Formulation
  • Given an input utterance and the current state of the system, detect whether the utterance was correctly perceived by the system or not (the confidence annotation problem).

12
Roadmap
  • Intro to Spoken Dialog Systems
  • The Problem: Detecting Misunderstandings
  • → A Learning Solution
  • Experiments and Results
  • Conclusion

13
A Classification Task
  • Cast the problem as a classification task
  • Heuristic approach:
  • the Garble rule previously used in Communicator
  • Data-driven (learning) approach

14
A Data-Driven Approach
  • Machine learning approach:
  • learn to classify from a labeled training corpus
  • use the learned classifier to classify new instances (see the sketch below)

[Diagram: learn mode: feature vectors plus GOOD/BAD labels train the classifier; run mode: feature vectors go into the trained classifier, which outputs GOOD/BAD]
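A minimal sketch of the learn/run setup in the diagram above, assuming numeric feature vectors extracted from the system logs and binary GOOD/BAD labels from the transcribed corpus. The scikit-learn decision tree stands in for whichever of the classifiers compared later is used, and the function names are illustrative.

from sklearn.tree import DecisionTreeClassifier

def train_detector(train_features, train_labels):
    # Learn mode: fit a classifier on the labeled training corpus.
    clf = DecisionTreeClassifier()
    clf.fit(train_features, train_labels)  # labels are "GOOD" / "BAD"
    return clf

def classify_utterance(clf, features):
    # Run mode: label a new utterance as GOOD (understood) or BAD.
    return clf.predict([features])[0]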
15
Ingredients
  • Three ingredients are needed for a machine learning approach:
  • collect a corpus of labeled data to use for training
  • identify a set of relevant features
  • choose a classification technique

16
Roadmap
  • Intro to Spoken Dialog Systems
  • The Problem: Misunderstandings
  • A Learning Solution
  • → Training corpus
  • Features
  • Classification techniques
  • Experiments and Results
  • Conclusion

17
Corpus: Sources
  • Collected 2 months of sessions (October and November 1999)
  • About 300 sessions
  • Both developer and outsider calls
  • Eliminated conversations with < 5 turns:
  • developers calling to check if the system is on-line
  • wrong-number calls

18
Corpus: Structure
  • The logs:
  • generated automatically by various system modules
  • serve as a source of features for classification (they also contain the decoded utterances)
  • The transcripts (the actual utterances):
  • produced and double-checked by a human annotator
  • provide the basis for labeling

19
Corpus: Labeling
  • Labeling was done at the concept level.
  • Four possible labels:
  • OK: the concept is okay
  • RBAD: recognition is bad
  • PBAD: parse is bad
  • OOD: out of domain
  • Aggregate utterance labels were generated automatically (see the sketch below).
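A minimal sketch of the automatic aggregation step, under the assumption (consistent with the binary corpus described later) that an utterance counts as GOOD only if every concept in it is labeled OK; the function name is illustrative.

def aggregate_utterance_label(concept_labels):
    # Any RBAD / PBAD / OOD concept makes the whole utterance BAD.
    return "GOOD" if all(label == "OK" for label in concept_labels) else "BAD"

# Example: aggregate_utterance_label(["OK", "RBAD"]) -> "BAD"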

20
Corpus: Sample Labeling
  • Only 6% of the utterances actually contained mixed-type concept labels!

21
Corpus: Summary
  • Started with 2 months of dialog sessions
  • Eliminated short, ill-formed sessions
  • Transcribed the corpus
  • Labeled it at the concept level
  • Discarded mixed-label utterances
  • Result: 4550 binary-labeled utterances across 311 dialogs

22
Features: Sources
  • Traditionally, features are extracted from the speech recognition layer [Chase].
  • In an SDS, there are at least 2 other orthogonal knowledge sources:
  • the Parser
  • the Dialog Manager

[Diagram: features drawn from three system layers (Speech, Parsing, Dialog)]
23
Features: Speech Recognition Level
  • WordNumber (11)
  • UnconfidentPerc: percentage of unconfident words (9)
  • this feature already captures other decoder-level features (both features are sketched below)
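An illustrative computation of the two speech-level features, assuming the decoder emits one confidence score per word; the 0.5 threshold for calling a word unconfident is an assumption, not a value from the talk.

def speech_features(words, confidences, threshold=0.5):
    # WordNumber: length of the decoded hypothesis.
    word_number = len(words)
    # UnconfidentPerc: fraction of words below the confidence threshold.
    unconfident = sum(1 for c in confidences if c < threshold)
    unconfident_perc = unconfident / word_number if word_number else 0.0
    return {"WordNumber": word_number, "UnconfidentPerc": unconfident_perc}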

24
Features: Parser Level
  • UncoveredPerc: percentage of words not covered by the parse (36)
  • GapNumber: number of unparsed fragments (3)
  • FragmentationScore: number of transitions between parsed and unparsed fragments (5)
  • Garble: flag computed by a heuristic rule based on parse coverage and fragmentation (the coverage-based features are sketched below)
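A sketch of the three coverage-based features, assuming each word of the hypothesis is flagged as covered (True) or not covered (False) by the parse; the names and representation are illustrative.

def parse_coverage_features(covered):
    # `covered` is a list of booleans, one per word in the hypothesis.
    n = len(covered)
    uncovered_perc = covered.count(False) / n if n else 0.0
    # GapNumber: maximal runs of consecutive uncovered words.
    gap_number = sum(1 for i, c in enumerate(covered)
                     if not c and (i == 0 or covered[i - 1]))
    # FragmentationScore: parsed/unparsed boundary transitions.
    fragmentation = sum(1 for i in range(1, n) if covered[i] != covered[i - 1])
    return {"UncoveredPerc": uncovered_perc,
            "GapNumber": gap_number,
            "FragmentationScore": fragmentation}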

25
Features: Parser Level (2)
  • ConceptBigram: bigram concept model score
  • P(c1 ... cn) ≈ P(cn | cn-1) · P(cn-1 | cn-2) · ... · P(c2 | c1) · P(c1)
  • probabilities trained from a corpus (see the sketch below)
  • ConceptNumber (4)
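A minimal sketch of the bigram score, assuming unigram and bigram concept probabilities already estimated from a corpus; the `unigram` and `bigram` dictionaries are placeholders, and smoothing/backoff are omitted.

import math

def concept_bigram_score(concepts, unigram, bigram):
    # Log-probability of the decoded concept sequence under the model.
    if not concepts:
        return 0.0
    logp = math.log(unigram[concepts[0]])        # P(c1)
    for prev, cur in zip(concepts, concepts[1:]):
        logp += math.log(bigram[(prev, cur)])    # P(c_i | c_{i-1})
    return logp  # higher = more plausible concept sequence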

26
Features: Dialog Manager Level
  • DialogState: the current state of the DM
  • StateDuration: for how many turns the DM has remained in the same state
  • TurnNumber: how many turns have elapsed since the beginning of the session
  • ExpectedConcepts: indicates whether the decoded concepts match the expectations of the DM (sketched below)
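An illustrative check for the ExpectedConcepts flag, assuming the dialog manager exposes the set of concepts it expects in its current state; this representation is an assumption, not the Communicator API.

def expected_concepts(decoded_concepts, state_expectations):
    # 1 if at least one decoded concept is one the DM is waiting for.
    return int(any(c in state_expectations for c in decoded_concepts))

# Example: expected_concepts({"depart_location"}, {"depart_location", "depart_time"}) -> 1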

27
Features: Summary
  • 12 features from 3 levels of the system:
  • Speech level: WordNumber, UnconfidentPerc
  • Parsing level: UncoveredPerc, FragmentationScore, GapNumber, Garble, ConceptBigram, ConceptNumber
  • Dialog management level: DialogState, StateDuration, TurnNumber, ExpectedConcepts

28
Classification Techniques
  • Bayesian Networks
  • Boosting
  • Decision Tree
  • Artificial Neural Networks
  • Support Vector Machine
  • Naïve Bayes

29
Roadmap
  • Intro to Spoken Dialog Systems
  • The Problem: Detecting Misunderstandings
  • A Learning Approach
  • Training corpus
  • Features
  • Classification techniques
  • → Experiments and Results
  • Conclusion

30
Experimental Setup
  • Performance metric: classification error rate
  • 2 performance baselines:
  • random baseline: 32.84%
  • heuristic baseline: 25.69%
  • Used a 10-fold cross-validation process (sketched below) to:
  • build confidence intervals for the error rates
  • do statistical analysis of the differences in performance exhibited by the classifiers
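A minimal sketch of the evaluation protocol, assuming a feature matrix X and GOOD/BAD labels y; the normal-approximation 95% interval is illustrative, since the talk does not specify how the confidence intervals were built.

import numpy as np
from sklearn.model_selection import cross_val_score

def cv_error_rate(clf, X, y):
    # 10-fold cross-validation; convert per-fold accuracy to error rate.
    errors = 1.0 - cross_val_score(clf, X, y, cv=10)
    mean = errors.mean()
    half_width = 1.96 * errors.std(ddof=1) / np.sqrt(len(errors))
    return mean, (mean - half_width, mean + half_width)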

31
Results: Individual Features
32
Results: Classifiers
33
An In-Depth Look at Error Rates
  • FP: false acceptance (a misunderstood utterance is accepted)
  • FN: false rejection (a correctly understood utterance is rejected)
  • Error Rate = FP + FN (both expressed as fractions of all utterances)
  • CDR = TN / (TN + FP) = 1 - (FP / N_BAD), the fraction of truly bad utterances correctly detected (see the sketch below)
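The two formulas above, computed from raw confusion counts; here BAD is the class to detect, so TN counts bad utterances correctly rejected. A rough sketch:

def error_metrics(tp, fp, tn, fn):
    # tp/tn: correctly accepted GOOD / correctly rejected BAD utterances;
    # fp: BAD accepted (false acceptance); fn: GOOD rejected (false rejection).
    n = tp + fp + tn + fn
    error_rate = (fp + fn) / n        # Error Rate = FP + FN (as fractions)
    n_bad = tn + fp                   # all truly BAD utterances
    cdr = tn / n_bad                  # = 1 - FP / N_BAD
    return error_rate, cdr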

34
Results: Classifiers (cont'd)
35
Conclusion
  • Spoken dialog system performance is strongly impaired by misunderstandings
  • Increase the robustness of systems when faced with poor recognition:
  • detect misunderstandings
  • use recovery techniques

36
Conclusion (cont'd)
  • Cast misunderstanding detection as a data-driven classification task:
  • a labeled training corpus
  • 12 features from 3 levels of the system
  • empirically compared 6 classification techniques
  • The resulting data-driven misunderstanding detector:
  • significantly improves over the previous heuristic classifier
  • correctly detects 74% of the misunderstandings

37
Future Work
  • Detect misunderstandings:
  • improve performance by adding new features
  • identify the source of the error
  • Use recovery techniques:
  • incorporate the confidence score into the dialog management process

38
Pointers
  • "Is This Conversation On Track?", P. Carpenter, C. Jin, D. Wilson, R. Zhang, D. Bohus, A. Rudnicky, Eurospeech 2001, Aalborg, Denmark
  • CMU Communicator: 1-412-268-1084
  • www.cs.cmu.edu/dbohus/SDS