1
Spoken Dialogue Systems
  • Julia Hirschberg
  • CS 4706

2
Issues
  • Error avoidance
  • Error detection
  • From the system side: how likely is it that the
    system made an error?
  • From the user side: what cues does the user
    provide to indicate an error?
  • Error handling: what can the system do when it
    thinks an error has occurred?
  • Evaluation: how do you know what needs fixing
    most?

3
Avoiding misunderstandings
  • By imitating human performance
  • Timing and grounding (Clark '03)

4
Recognizing Problematic Dialogues
  • Hastie et al., "What's the Trouble?", ACL 2002.

5
Recognizing Problematic Utterances (Hirschberg et
al. '99-)
  • Collect corpus from interactive voice response
    system
  • Identify speaker turns:
  • incorrectly recognized
  • where speakers were first aware of the error
  • that correct misrecognitions
  • Identify prosodic features of turns in each
    category and compare to other turns
  • Use machine learning techniques to train a
    classifier to make these distinctions
    automatically (sketch below)
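
The last step can be made concrete. Below is a
minimal sketch of that pipeline using a
scikit-learn decision tree as a stand-in for the
rule learner used in the original experiments; the
prosodic feature names and all values are
illustrative assumptions, not the actual corpus.

# Classify turns as misrecognized vs. correctly recognized from
# prosodic features; every name and number here is made up.
from sklearn.tree import DecisionTreeClassifier

# One row per turn: [f0_max_hz, f0_mean_hz, rms_max, duration_sec,
#                    speaking_rate, prior_pause_sec]
X = [
    [310.0, 220.0, 0.71, 2.4, 3.2, 0.45],  # e.g. hyperarticulated turn
    [185.0, 150.0, 0.40, 1.1, 5.0, 0.10],
    [295.0, 205.0, 0.66, 2.9, 3.0, 0.50],
    [175.0, 142.0, 0.38, 0.9, 5.4, 0.08],
]
y = [1, 0, 1, 0]  # 1 = misrecognized, 0 = correctly recognized

# A shallow tree keeps the learned distinctions readable
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict([[300.0, 210.0, 0.68, 2.5, 3.1, 0.40]]))  # -> [1]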

6
Turn Types
TOOT: Hi. This is AT&T Amtrak Schedule System.
This is TOOT. How may I help you?
User: Hello. I would like trains from Philadelphia
to New York leaving on Sunday at ten thirty in the
evening. [misrecognition]
TOOT: Which city do you want to go to?
User: New York. [aware site, correction]
7
Results
  • Reduced error in predicting misrecognized turns
    to 8.64%
  • Error in predicting awares (12%)
  • Error in predicting corrections (18-21%)

8
Evidence from Human Performance
  • Users provide explicit positive and negative
    feedback
  • Corpus-based vs. laboratory experiments: do
    these tell us different things?
  • Bell & Gustafson '00
  • What do we learn from this?
  • What functions does feedback serve?
  • Krahmer et al.:
  • "go on" and "go back" signals in grounding
    situations (implicit/explicit verification)

9
  • Positive cues: short turns, unmarked word order,
    confirmation, answers, no corrections or
    repetitions, new info
  • Negative cues: long turns, marked word order,
    disconfirmation, no answer, corrections,
    repetitions, no new info
  • Hypotheses supported, but:
  • Can these cues be identified automatically?
    (see the sketch below)
  • How might they affect the design of SDS?
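
To make the first question concrete: two of the
cues above (turn length, repetition of the prior
turn) can be read off a transcript directly. A
minimal sketch, where the thresholds are
illustrative assumptions and the prosodic and
word-order cues would need richer input:

# Flag surface features associated with negative feedback; the
# threshold values are illustrative assumptions.
def negative_cues(prev_turn, turn, long_turn_words=12, repeat_overlap=0.6):
    words = turn.lower().split()
    prev = set(prev_turn.lower().split())
    overlap = sum(w in prev for w in words) / max(len(words), 1)
    return {
        "long_turn": len(words) >= long_turn_words,
        "repetition": overlap >= repeat_overlap,
    }

print(negative_cues("I want trains to New York",
                    "no I want trains to New York"))
# -> {'long_turn': False, 'repetition': True}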

10
Error Handling Strategies
  • Goldberg et al. '03: how should systems best
    inform the user that they don't understand?
  • System rephrasing vs. repetition vs. statement
    of not understanding
  • Apologies
  • What behaviors might these produce?
  • Hyperarticulation
  • User frustration
  • User repetition or rephrasing

11
  • What lessons do we learn?
  • What produces least frustration?
  • Best recognized input?

12
Evaluating Dialogue Systems
  • PARADISE framework (Walker et al. '00)
  • Performance of a dialogue system is affected
    both by what gets accomplished by the user and
    the dialogue agent and by how it gets
    accomplished

(Diagram: maximize Task Success while minimizing
Costs, where Costs comprise Efficiency Measures
and Qualitative Measures)
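
In the PARADISE papers this tradeoff is a single
performance function:

  Performance = alpha * N(kappa) - sum_i w_i * N(c_i)

where kappa is the task-success measure, the c_i
are cost measures (efficiency and qualitative),
N is a Z-score normalization so the terms are
comparable across measures, and the weights alpha
and w_i are estimated by regressing User
Satisfaction on the normalized measures (slide 15).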
13
Task Success
  • Task goals seen as an Attribute-Value Matrix
    (AVM)
  • ELVIS e-mail retrieval task (Walker et al. '97)
  • "Find the time and place of your meeting with
    Kim."

Attribute            Value
Selection Criterion  Kim or Meeting
Time                 10:30 a.m.
Place                2D516

  • Task success defined by the match between the
    AVM values at the end of the dialogue and the
    true values for the AVM (kappa sketch below)
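
PARADISE scores that match with the kappa
coefficient, computed over a confusion matrix of
attribute values so that agreement expected by
chance is discounted. A minimal sketch in Python,
with made-up counts:

def kappa(M):
    # M[i][j] = number of dialogues where the true value was i and
    # the final AVM held value j; counts are made up for illustration
    T = sum(sum(row) for row in M)
    p_a = sum(M[i][i] for i in range(len(M))) / T       # observed agreement
    rows = [sum(row) / T for row in M]                  # true-value marginals
    cols = [sum(M[i][j] for i in range(len(M))) / T
            for j in range(len(M))]                     # reported marginals
    p_e = sum(r * c for r, c in zip(rows, cols))        # chance agreement
    return (p_a - p_e) / (1 - p_e)

M = [[30, 3, 2],    # one attribute with three possible values,
     [4, 28, 3],    # tallied over 100 dialogues
     [2, 3, 25]]
print(round(kappa(M), 3))  # -> 0.744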

14
Metrics
  • Efficiency of the Interaction: User Turns,
    System Turns, Elapsed Time
  • Quality of the Interaction: ASR rejections,
    Time Out Prompts, Help Requests, Barge-Ins,
    Mean Recognition Score (concept accuracy),
    Cancellation Requests
  • User Satisfaction
  • Task Success: perceived completion, information
    extracted

15
Experimental Procedures
  • Subjects given specified tasks
  • Spoken dialogues recorded
  • Cost factors, states, dialog acts automatically
    logged; ASR accuracy, barge-in hand-labeled
  • Users specify task solution via web page
  • Users complete User Satisfaction surveys
  • Use multiple linear regression to model User
    Satisfaction as a function of Task Success and
    Costs; test for significant predictive factors
    (sketch below)
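
A minimal sketch of the regression step with
numpy; all numbers are made up, and each measure
is assumed already normalized to a Z score as in
the PARADISE studies:

# User Satisfaction ~ Task Success + Costs by least squares
import numpy as np

# columns: COMP (perceived completion), MRS (mean recognition
# score), ET (elapsed time); one row per dialogue
X = np.array([[ 1.0,  0.8, -0.2],
              [-0.5, -1.1,  1.3],
              [ 0.7,  0.4, -0.6],
              [-1.2, -0.1,  0.9],
              [ 0.3,  0.6, -0.4]])
us = np.array([33.0, 18.0, 29.0, 20.0, 27.0])  # summed survey scores

X1 = np.column_stack([np.ones(len(us)), X])    # add an intercept column
coef, *_ = np.linalg.lstsq(X1, us, rcond=None)
print("intercept, COMP, MRS, ET weights:", coef)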

16
User Satisfaction: Sum of Many Measures
  • Was Annie easy to understand in this
    conversation? (TTS Performance)
  • In this conversation, did Annie understand what
    you said? (ASR Performance)
  • In this conversation, was it easy to find the
    message you wanted? (Task Ease)
  • Was the pace of interaction with Annie
    appropriate in this conversation? (Interaction
    Pace)
  • In this conversation, did you know what you could
    say at each point of the dialog? (User Expertise)
  • How often was Annie sluggish and slow to reply to
    you in this conversation? (System Response)
  • Did Annie work the way you expected her to in
    this conversation? (Expected Behavior)
  • From your current experience with using Annie to
    get your email, do you think you'd use Annie
    regularly to access your mail when you are away
    from your desk? (Future Use)

17
Performance Functions from Three Systems
  • ELVIS: User Sat. = .21*COMP + .47*MRS - .15*ET
  • TOOT: User Sat. = .35*COMP + .45*MRS - .14*ET
  • ANNIE: User Sat. = .33*COMP + .25*MRS + .33*Help
  • COMP: user perception of task completion (task
    success)
  • MRS: mean recognition accuracy (cost)
  • ET: elapsed time (cost)
  • Help: help requests (cost)
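
For instance, with hypothetical normalized values
COMP = 1.2, MRS = 0.8, and ET = -0.5, the ELVIS
function above predicts User Sat. = .21(1.2) +
.47(0.8) - .15(-0.5), or about .70.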

18
Performance Model
  • Perceived task completion and mean recognition
    score are consistently significant predictors of
    User Satisfaction
  • Performance model useful for system development
  • Making predictions about system modifications
  • Distinguishing good dialogues from bad
    dialogues
  • But can we also tell on-line when a dialogue is
    going wrong?

19
Next Week
  • Speech summarization and data mining