Exploring adaptation in human-computer dialogs

1
Exploring adaptation in human-computer dialogs
  • Svetlana Stoyanchev
  • Advisor Amanda Stent
  • Thesis Proposal
  • February 8, 2008

2
Outline
  • Dialog examples
  • Responsive and directive adaptation in dialog
    systems
  • Analysis of system errors
  • Summary of experiments
  • My system building work
  • Proposed experiments
  • Schedule

3
Have you tried the new free 411 voice interfaces?
  • System is too quick to make a selection
  • When there are a number of matching choices, the
    system should ask the user to narrow them down
  • User rephrases the category: 'Turkish restaurant' ->
    'restaurant'
  • System loses important information about the
    category

4
An example from Switchboard human-human corpus
  • A: but uh see that our whole system is built on
    on owing and borrowing
  • B: that's just
  • A: uh true but uh uh uh without it people
    wouldn't be able to own automobiles or they
    wouldn't be able to own a house
  • B: i don't have a master charge thank you
    [laughter]
  • A: uh but i still go right back to what i said
    when is the last time you had uh fifteen thousand
    dollars all at one time to go out and buy an
    automobile
  • B: right but that's the problem see as our system
    shouldn't be based on owing and borrowing and all
    that

Psycholinguistic Research on Adaptation
  • Garrod & Anderson 1987: production and comprehension
    in dialog become tightly coupled
  • Brennan & Clark 1996: while there is great variability
    across conversations, there is less variability within
  • Brennan 1998 finds that speakers form conceptual
    pacts with particular addressees by using consistent
    terminology
  • Branigan 2004 finds effects of priming in syntactic
    structure in dialog utterances
Owing and borrowing / getting in debt / borrowing
and owing / etc.
5
Example communication with a non-adaptive dialog
system
8
Examples of responsive adaptation
  • If a user hyper-articulates, ASR switches to an
    acoustic model trained on hyper-articulated
    data (Soltau and Waibel 2000)
  • Adapting to a particular user's accent (Humphries
    and Woodland 97)
  • Adapting the language model to the state of the
    system (most systems)
9
Example communication with the Let's Go dialog
system
10
Examples of responsive adaptation
S: Which topic is the most important to you?
U: instructor
S: Was the instructor good, average, or excellent?
Here the system could have used "teacher" or "lecturer"
  • In a grammar-based NLU, adjust rule probabilities
    based on statistics from the user's past
    utterances
  • Adapting the syntax of an utterance based on a
    user's preference (Walker, Stent 2004)
  • Adapting to the user's knowledge level and whether
    the user is in a hurry (Komatani, 2005)
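
The rule-probability adjustment mentioned above could be sketched as follows. This is only an illustration: the rule names, the prior values, and the `weight` parameter are assumptions, not taken from any of the cited systems.

```python
from collections import Counter

def adapt_rule_probs(prior, observed_rules, weight=1.0):
    """Mix prior rule probabilities with counts of rules this user has used.

    prior: dict mapping rule name -> prior probability (sums to 1).
    observed_rules: rule names seen in the user's past utterances.
    weight: how strongly one observation counts (hypothetical parameter).
    """
    counts = Counter(observed_rules)
    # Only rules that exist in the grammar contribute to the new total.
    total = sum(prior.values()) + weight * sum(counts[r] for r in prior)
    return {r: (p + weight * counts[r]) / total for r, p in prior.items()}

# Hypothetical grammar rules for the time concept.
prior = {"TIME -> HOUR oclock": 0.5, "TIME -> HOUR am_pm": 0.3, "TIME -> HOUR": 0.2}
history = ["TIME -> HOUR am_pm", "TIME -> HOUR am_pm", "TIME -> HOUR"]
adapted = adapt_rule_probs(prior, history)
```

After two "am/pm" utterances the adapted grammar prefers that rule, while the probabilities still sum to one.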
11
Directive adaptation
System guides a user into using grammar and
vocabulary that is best understood by the system.

Example system confirmation prompt: "Traveling
from A to B, is this correct?"
Possible user responses to correct the error:
  • Adaptive: "No, traveling from A to C"; "No, from
    A to C"
  • Non-adaptive: "No, I will fly from A to C";
    "Arriving at C"
12
How to evaluate impact of adaptation
  • Speech Recognition Performance
  • Dialog length (for some task-oriented dialogs)
  • User satisfaction: subjective user survey
  • How quickly the user recovers from errors in a dialog

13
System errors
  • In my experiments I look at places where system
    errors occur because:
  • In my directive adaptation experiments, they give
    me an opportunity
  • to prime a user in a rejection prompt and to
    detect the reaction in response to varying prompts
  • to see the difference between the user's utterance
    before and after the correction prompt
  • In my responsive adaptation experiment, the goal
    is to minimize the time to recover from errors
    in dialog

14
Analysis of system errors
  • Description of errors
  • How do we find them
  • How long do they go on
  • What are their causes
  • Response to errors
  • System
  • User

15
Analysis of system errors in Communicator corpus
  • Domain: air travel, travel assistance
  • Contains ~3000 dialogs with 9 systems
  • Each user calls one system at least 5 times
  • System utterances are labeled
  • One type of error is a system non-understanding.
    It is marked by slu_reject
  • Total number of system rejections: 4118

16
Communicator annotations
17
Example from Communicator corpus
18
Analysis of system rejections
  • Prob(reject next | reject) = 39%
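
A statistic of this kind, the chance that a rejection is immediately followed by another rejection, could be computed from labeled system turns roughly as below. The `slu_reject` label comes from the corpus description; the other labels and the toy dialogs are made up for illustration.

```python
def prob_reject_after_reject(dialogs, reject_label="slu_reject"):
    """Estimate P(next system turn is a reject | current system turn is a reject).

    dialogs: list of dialogs, each a list of system-turn labels in order.
    """
    rejects_with_successor = 0
    followed_by_reject = 0
    for turns in dialogs:
        # Walk over adjacent pairs of system turns within one dialog.
        for current, following in zip(turns, turns[1:]):
            if current == reject_label:
                rejects_with_successor += 1
                if following == reject_label:
                    followed_by_reject += 1
    if rejects_with_successor == 0:
        return 0.0
    return followed_by_reject / rejects_with_successor

# Toy data, not taken from the Communicator corpus.
toy_dialogs = [
    ["request_info", "slu_reject", "slu_reject", "explicit_confirm"],
    ["slu_reject", "request_info", "slu_reject"],
]
estimate = prob_reject_after_reject(toy_dialogs)
```

A reject that ends a dialog has no successor and is left out of the denominator in this sketch.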

19
Hypothesizing about causes of rejections
  • Errors due to out-of-grammar utterances
  • User attempts to take the initiative by asking a
    question
  • User initiates a correction

21
System's actions on non-understandings (dialog
acts)
22
Change in system's dialog act after first reject
  • Example of OMIT
  • Before:
  • <imp conf, depart-arrive date>,
  • <req info, depart-arrive time>
  • After:
  • <req info, depart-arrive time>
  • Example of CHANGE
  • Before:
  • <exp-conf, depart-arrive time>
  • After:
  • <exp-conf, orig-city>
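
One way to label such before/after pairs automatically is sketched below. The tuple representation and the extra "repeat" label are assumptions made for this example; only OMIT and CHANGE appear on the slide.

```python
def classify_change(before, after):
    """Label how the system's dialog acts changed after a first reject.

    before, after: lists of (dialog_act, concept) tuples.
    Returns "repeat", "omit", or "change".
    """
    before_set, after_set = set(before), set(after)
    if after_set == before_set:
        return "repeat"   # same acts again (label assumed for this sketch)
    if after_set and after_set < before_set:
        return "omit"     # some acts dropped, the rest kept unchanged
    return "change"       # at least one act or concept is new

omit_example = classify_change(
    [("imp_conf", "depart-arrive date"), ("req_info", "depart-arrive time")],
    [("req_info", "depart-arrive time")],
)
change_example = classify_change(
    [("exp_conf", "depart-arrive time")],
    [("exp_conf", "orig-city")],
)
```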

23
Change in system's dialog act after first reject
  • What do users do when they encounter a system error?

24
Partner model
Users build a partner model of a system: a
user's perception of the system's knowledge
and capabilities
User thinks: "Great! I can say anything!"
S: Hello, how can I help you?
U: I want to fly from New York to London
S: I am sorry I did not understand, where are you
leaving from?
U: Leaving from New York
User thinks: "Maybe I have to specify the cities
separately"
S: What time would you like to leave?
U: at ten o'clock
S: I am sorry, I did not understand, please specify
the time of your departure
U: ten a m
User thinks: "It did not understand 10 o'clock. I
should try to rephrase"
25
Examples of users' paraphrases
  • Observation on rejections: users rephrase, trying
    to guess what the system recognizes
  • Study users' behavior at the level of a single
    concept
  • how users vary their choice of the form of
    concepts
  • do prompts affect users' choices

29
Motivation
(Diagram with two scales: poor to better recognition,
and limited to natural interaction)
  • Ideal
  • Speech Graffiti (S. Tomko): users can use a limited
    set of keywords and concepts
  • Using adaptation
  • Free speech (Let's Go)
30
Computer science studies on adaptation
  • Church 2000 introduced a method for measuring
    lexical adaptation in text.
  • A. Dubey, P. Sturt, and F. Keller 2006 use
    Church's measures and detect adaptation both
    between and within speakers.
  • D. Reitter, F. Keller, and J. Moore 2006,
    "Computational modeling of structural priming in
    dialogue", show rapid degradation of the priming
    effect in a dialog over time.
  • D. Reitter and J. Moore 2007 show that lexical
    adaptation positively correlates with task
    success in the human-human task-oriented Map Task
    corpus.
  • S. Stenchikova and A. Stent 2007 create a new
    technique for measuring adaptation between
    dialogs and compare partner-specific and recency
    adaptation.
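
A rough sketch in the spirit of Church's measure: split each document into a first half (history) and a second half (test), then compare the prior chance of a word appearing in the test half with the chance given that it already appeared in the history half. The halving scheme and the toy documents are assumptions for illustration; the published method may differ in its details.

```python
def adaptation(docs, word):
    """Return (prior, adapted) probability estimates for `word`.

    prior:   fraction of documents with the word in the second half.
    adapted: same fraction, restricted to documents that also have
             the word in the first half.
    docs: list of token lists.
    """
    in_history = in_test = in_both = 0
    for tokens in docs:
        middle = len(tokens) // 2
        history, test = set(tokens[:middle]), set(tokens[middle:])
        seen_before, seen_after = word in history, word in test
        in_history += seen_before
        in_test += seen_after
        in_both += seen_before and seen_after
    prior = in_test / len(docs) if docs else 0.0
    adapted = in_both / in_history if in_history else 0.0
    return prior, adapted

# Toy documents, invented for this example.
docs = [
    "the debt grows when owing and borrowing debt".split(),
    "a house is a house not a car".split(),
    "debt is bad but cars are useful".split(),
    "we talk about weather and rain today".split(),
]
prior, adapted = adaptation(docs, "debt")
```

An adapted estimate well above the prior suggests that a word, once used, tends to be used again.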

31
Summary of experiments
33
RavenCalendar
  • Built at Stony Brook
  • Provides voice interface for manipulating a
    calendar
  • Built using distributed Olympus architecture
  • Ability to replace components
  • Application is suited for long-term users

RavenCalendar: A Multimodal Dialog System for
Managing a Personal Calendar. S. Stenchikova, B.
Mucha, S. Hoffman, A. Stent. NAACL HLT
Demonstration Program, pages 15-16, Rochester,
New York, USA, April 2007
34
Rate-a-Course
  • Survey system for evaluating courses
  • Built at Stony Brook
  • Uses VoiceXML
  • Ran in-lab experiments on 48 subjects

Dialog Systems for Surveys: the Rate-a-Course
System. A. Stent, S. Stenchikova, and M. Marge.
Proceedings of the 1st IEEE/ACL Workshop on
Spoken Language Technology (SLT 2006).
35
Let's Go System
  • Provides local bus information to people
  • Developed and deployed at CMU
  • Has a constant pool of real users
  • We received permission to run the proposed
    experiments on the system

36
Question Answering
  • Retrieves answers to natural language questions
  • Experiments are performed with speech interface
    to the system

Name-Aware Speech Recognition for Interactive
Question Answering. S. Stenchikova, D.
Hakkani-Tur, and G. Tur. ICASSP 2008
QASR: Question Answering Using Semantic Roles for
Speech Interface. S. Stenchikova, D. Hakkani-Tur,
and G. Tur. Proceedings of ICSLP-Interspeech 2006,
Pittsburgh, PA
38
Summary of experiments
39
Proposed experiment
  • Match Natural Language Understanding and Natural
    Language Generation on the form of the time concept
  • The time concept appears in the majority of systems
    and has multiple realizations
  • Explore non-understanding prompt strategies and
    the power of directive prompts.

40
Experimental questions
  • How do users form models of the system?
  • Can prompts be helpful in assisting users to
    build a user model matching reality?
  • Explore the effect of variation in the form of
    concepts in the system's non-understanding prompt

41
Prompt variation experiment
  • System grammar is limited to understanding a
    particular format, e.g. "X o'clock"
  • Method
  • See whether the user's next utterance will use the
    system's format
  • How long will it take the user to guess the
    recognized format

42
Experiment
  • Variable 1: system's utterance at
    non-understanding/explicit confirmation
  • Generic: "I did not understand, please repeat"
  • Specific: "Did you say <another time in format Y>?"
  • Variable 2: system's ASR grammar for time
  • Specific
  • A. Hour pm
  • B. Hour o'clock
  • C. Hour
  • Flexible
  • D. All of the above
  • Dependent variable: user's grammar
  • Measure whether the user's grammar matches the
    system's prompt
  • How long it takes the user to figure out the
    ASR's grammar
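
The two measurements could be scored along the following lines. The regular expressions are a simplified stand-in for formats A, B, and C (they accept any single word as the hour), and the function names are hypothetical.

```python
import re

# Simplified patterns for the three time formats; checked in this order.
FORMATS = {
    "A": re.compile(r"^(?:at )?\w+ (?:a ?m|p ?m)$"),   # "ten pm", "ten a m"
    "B": re.compile(r"^(?:at )?\w+ o'?clock$"),        # "ten o'clock"
    "C": re.compile(r"^(?:at )?\w+$"),                 # "ten"
}

def time_format(utterance):
    """Return "A", "B", "C", or None for a transcribed time expression."""
    text = utterance.lower().strip()
    for name, pattern in FORMATS.items():
        if pattern.match(text):
            return name
    return None

def turns_until_match(user_turns, system_format):
    """Number of user turns needed to produce the system's format, else None."""
    for turn_number, utterance in enumerate(user_turns, start=1):
        if time_format(utterance) == system_format:
            return turn_number
    return None

attempts = ["at ten o'clock", "ten", "ten a m"]
needed = turns_until_match(attempts, "A")
```

Here the user needs three attempts to reach format A, which is the "how long" measure for that dialog.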

43
Systems experiment 1
X is the default or the user's preferred form:
either the user's original utterance or the most
frequent form in general. Y and Z are different
from X.
Open questions: how to choose X, Y, and Z
Optional
44
Follow up experiment
Vary prompts on the system to check if there is
an effect on the users (maybe try more
variations).
Hypothesis: even if the NLU is not biased, users
follow the system's prompt
45
Complication
  • System may need to say a time in a generic prompt
    condition when doing implicit confirmation or
    presenting a result
  • Solution: throw away dialogs where confirmation
    happens and the time concept is specified
  • If using a real system for an experiment, the
    system may really misunderstand users even if they
    speak using the format understood by the system,
    which may confuse them
  • The number of unforced errors will have to be
    taken into account when analyzing the data.

46
Systems
Option 1
  • I plan to perform this experiment on Let's Go
    unless NLU limitations impair the system's
    performance significantly
  • Pros: short conversations, no overhead for more
    users (can throw away dialogs where confirmation
    happens and the concept is specified)
  • Model additional conditions in the same
    experiment
  • Specifying date
  • Relative day (today, tomorrow)
  • Weekday
  • December first
  • First of December
  • December one
  • Variation in verb
  • Add/new/create
  • Remove/delete
  • Change/modify

Option 2 (back-up)
47
Implications of the study
  • I limit the scope of the test for this
    experiment, but I hope it generalizes. Using
    directive prompts can be a powerful tool for
    guiding users into using particular syntax and
    lexicon.
  • If I can reliably predict the form of a concept,
    it may be possible to do concept spotting in ASR.
  • Concept spotting refers to identifying concepts
    instead of full utterances at the ASR stage.
  • If I find that prompts affect users' form of
    concept, this will support an argument for using
    a shared grammar in NLG and NLU.

48
Summary of experiments
49
Proposed experiment on responsive adaptation in
ASR
Method: two-pass ASR, classifying whether a
user's utterance is a correction
50
Motivation
  • Examples of correction utterances from
    COMMUNICATOR where a system requests a date

S: Leaving Chicago on what date?
S: OK, from Zurich to Denver. What date will you
be traveling?
51
Related work
  • Characterizing and Predicting Corrections in
    Spoken Dialogue Systems. D. Litman, J.
    Hirschberg, and M. Swerts. Computational
    Linguistics, 2006
  • Describes a machine learning experiment on
    predicting corrections
  • Features
  • Prosodic features (frequency, duration, etc.)
  • ASR features (recognized string, grammar used)
  • System features (confirmation strategy,
    initiative)
  • History features (features of turn-1 and turn-2,
    prior mentions of keywords like cancel, lengths
    of prior turns)
  • Result: best feature set Prosodic + ASR + SYS +
    POS + History (previous turn); error rate 15.72%
    (F-measure .72, .89)

52
Method training stage
  • Data for ASR training: transcribed and annotated
    user utterances from past communication.
  • Hypothesis: splitting user utterances into
    corrections and non-corrections may benefit ASR
    performance
  • Classify user utterances from the training data as
    correction or non-correction
  • Use unsupervised clustering using lexical and
    history features
  • Use unsupervised clustering using prosodic
    features
  • Use rules learned by Litman et al. (may need to
    be adjusted)
  • Train 2 models for each dialog system state:
    one on utterances classified as corrections and
    one on the rest of the utterances
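
The data split in the last step could look like the sketch below. The keyword rule stands in for the clustering or learned rules proposed above, unigram counts stand in for a real language model, and the state names and utterances are invented.

```python
from collections import Counter, defaultdict

def looks_like_correction(utterance):
    """Placeholder classifier (assumption): treat a leading negation as a correction."""
    words = utterance.lower().split()
    return bool(words) and words[0] in {"no", "not", "wrong"}

def build_unigram_models(samples, is_correction=looks_like_correction):
    """Collect unigram counts per (dialog state, correction / non-correction).

    samples: iterable of (dialog_state, transcribed_utterance) pairs.
    """
    models = defaultdict(Counter)
    for state, utterance in samples:
        label = "correction" if is_correction(utterance) else "non-correction"
        models[(state, label)].update(utterance.lower().split())
    return models

# Toy transcripts, invented for this example.
samples = [
    ("ask_date", "december first"),
    ("ask_date", "no december first"),
    ("ask_date", "tomorrow"),
    ("ask_city", "no from chicago"),
]
models = build_unigram_models(samples)
```

Each (state, label) key then holds the word counts from which a state-specific language model could be estimated.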

53
Method runtime
  • Run 2 ASR models per dialog state in parallel
  • 1) Language model trained on non-corrections
  • 2) Language model trained on corrections
  • Predict the probability of an utterance being a
    correction
  • When the probability of correction is > threshold,
    use ASR output from model 2, else from model 1
  • Evaluate ASR performance
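
The selection step could be written as below. The default threshold of 0.5 and the example hypotheses are assumptions; in practice the threshold would be tuned on held-out data.

```python
def select_hypothesis(hyp_non_correction, hyp_correction, p_correction, threshold=0.5):
    """Pick the ASR output from the model matching the predicted utterance type.

    hyp_non_correction: output of the model trained on non-corrections (model 1).
    hyp_correction: output of the model trained on corrections (model 2).
    p_correction: predicted probability that the utterance is a correction.
    """
    if p_correction > threshold:
        return hyp_correction
    return hyp_non_correction

# Invented recognizer outputs for one utterance.
chosen = select_hypothesis("december first", "no december first", p_correction=0.8)
```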

54
Summary of experiments
55
Schedule
56
List of publications
  • Published work
  • Name-Aware Speech Recognition for Interactive
    Question Answering. S. Stenchikova, D.
    Hakkani-Tur, and G. Tur. ICASSP 2008
  • Measuring Adaptation Between Dialogs. S.
    Stenchikova, A. Stent. SIGDIAL, 2007
  • Dialog Systems for Surveys: the Rate-a-Course
    System. A. Stent, S. Stenchikova, and M. Marge.
    Proceedings of the 1st IEEE/ACL Workshop on
    Spoken Language Technology (SLT 2006).
  • QASR: Question Answering Using Semantic Roles for
    Speech Interface. S. Stenchikova, D. Hakkani-Tur,
    and G. Tur. Proceedings of ICSLP-Interspeech 2006,
    Pittsburgh, PA
  • Demo Papers
  • RavenCalendar: A Multimodal Dialog System for
    Managing a Personal Calendar. S. Stenchikova, B.
    Mucha, S. Hoffman, A. Stent. NAACL HLT
    Demonstration Program, pages 15-16, Rochester,
    New York, USA, April 2007
  • QASR: Spoken Question Answering Using Semantic
    Role Labeling. S. Stenchikova, D. Hakkani-Tur, G.
    Tur. ASRU-2005, 9th biannual IEEE workshop on
    Automatic Speech Recognition and Understanding,
    Cancun, Mexico, December 2005
  • Planned publications
  • Exploring Directive Adaptation in Spoken Dialog
    Systems (submitted to the Student Workshop at ACL
    2008)
  • Analysis of rejections in Communicator (to submit
    to SIGDIAL 2008, March 14)
  • Adaptation in the Rate-a-Course system experiment
    (to submit a short paper to HLT 2008, March 14)
  • Responsive adaptation ASR experiments on the Let's
    Go corpus (to submit to GOTAL, April 4)
  • Directive adaptation experiment (Fall 2008)

57
  • Thank you
  • Questions? Comments?

58
Definitions
  • Adaptation: the process of bringing one thing
    into correspondence with another; implies a
    modification according to changing circumstances
    (Merriam-Webster)
  • Convergence: when dialog participants change
    their language use to be more similar to each
    other over time
  • Priming: a process that influences linguistic
    decision-making. An instance of priming occurs
    when a syntactic structure or lexical item giving
    evidence of a linguistic choice (prime)
    influences the recipient to make the same
    decision, i.e. re-use the structure, at a later
    choice-point (Reitter)
  • I differentiate between Directive and Responsive
    adaptation
  • Priming: 'Activating parts of particular
    representations or associations in memory just
    before carrying out an action or task. It is
    considered to be one of the manifestations of
    implicit memory. A property of priming is that
    the remembered item is remembered best in the
    form in which it was originally encountered.'
    (Wikipedia)

59
Proposed corpus study
  • What are users' strategies in rephrasing?
  • Try to detect a pattern
  • Does the form of a concept used in prompts affect
    users' choices?
  • Hypothesis: some forms of a concept are recognized
    correctly more frequently than others
  • Hypothesis: through trial and error, the user finds
    the optimal (most frequently recognized) phrasing.

60
Variation in date-time confirmations in
Communicator
61
Rate-a-course experiment
Aspects of the courses
Initiative condition: system / mixed / user
Adaptive condition: adaptive / non-adaptive
63
Rate-a-course experiment: comparing adaptation
Compare using inference on proportions
  • System Adaptive vs. System Non-adaptive: no
    significant difference
  • System initiative vs. User initiative vs. Mixed
    initiative: samples are too small
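
One standard form of inference on proportions is the pooled two-proportion z-test, sketched below. The slides do not say which test was used, so this choice is an assumption, and the counts in the example are invented rather than taken from the experiment.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Assumes both groups are non-empty and the pooled proportion
    is strictly between 0 and 1.
    """
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented counts: users who reused the system's term in each condition.
z, p_value = two_proportion_z(14, 24, 11, 24)
```

With groups this small, even a visible gap between the two proportions gives a p-value far above 0.05, in line with the "samples are too small" remark.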
64
Rate-a-course experiments: detecting adaptation
Hypothesis: users are more likely to follow the
system's lexical choice
Problem: unbalanced data. Cannot draw a conclusion
65
QA ASR experiment
66
Evaluation
  • 3 speakers read 40 questions
  • Set 3: 40 questions with a named entity
  • Set 4: 40 questions without a named entity
67
Evaluation Result
Cheating model (contains questions)