Error Handling in the RavenClaw Dialog Management Framework

Dan Bohus, Alexander I. Rudnicky
Computer Science Department, Carnegie Mellon University
( infrastructure architecture )
( current research )

3. belief updating
Bohus and Rudnicky, "Constructing Accurate Beliefs in Spoken Dialog Systems", in ASRU-2005

1. RavenClaw dialog management
  • RavenClaw: dialog management framework for
    complex, task-oriented domains
  • Dialog Task Specification (DTS): hierarchical
    plan which captures the domain-specific
    dialog control logic
  • Dialog Engine: executes a given DTS
  • platform for research:
      • error handling [this poster]
      • multi-participant dialog (Thomas Harris)
      • turn-taking (Antoine Raux)
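
The split between a hierarchical task specification and an engine that executes it can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Agent` class, the `run_dts` function, and the simulated user answers are hypothetical and are not the RavenClaw API. The node names loosely follow the RoomLine labels that appear elsewhere on the poster.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Agent:
    """One node of a hypothetical Dialog Task Specification (DTS) tree."""
    name: str
    children: List["Agent"] = field(default_factory=list)
    # Leaf behaviour: receives the shared concept store, returns a log line.
    action: Optional[Callable[[Dict[str, str]], str]] = None


def run_dts(root: Agent, concepts: Dict[str, str]) -> List[str]:
    """Toy engine: walk the plan depth-first using an explicit stack."""
    trace: List[str] = []
    stack = [root]
    while stack:
        agent = stack.pop()
        if agent.action is not None:
            trace.append(f"{agent.name}: {agent.action(concepts)}")
        # Push children in reverse so the leftmost child runs first.
        stack.extend(reversed(agent.children))
    return trace


# Hypothetical task tree; the lambdas simulate user answers (assumption).
room_line = Agent("RoomLine", [
    Agent("Welcome", action=lambda c: "Welcome to RoomLine."),
    Agent("GetQuery", [
        Agent("GetDate", action=lambda c: c.setdefault("date", "Friday")),
        Agent("GetStartTime",
              action=lambda c: c.setdefault("start_time", "2 p.m.")),
    ]),
    Agent("DoQuery",
          action=lambda c: f"searching rooms for {c['date']} at {c['start_time']}"),
])

trace = run_dts(room_line, {})
for line in trace:
    print(line)
```

The task tree holds only domain logic, while the traversal lives in one generic function; a real engine would also have to handle user input, focus shifts, and error handling, none of which this sketch attempts.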

systems built
  • information access: Let's Go! Bus Information,
    RoomLine
  • guidance through procedures: LARRI, IPA
  • taskable agent: Vera
  • command-and-control: TeamTalk

demo: RoomLine
  • conference room reservations
  • live schedules for 13 rooms in 2 buildings
    on campus
  • size, location, a/v equipment
  • recognition: Sphinx-2, 3-gram
  • parsing: Phoenix
  • synthesis: Cepstral Theta

belief updating (continued)
  • problem: confidence scores provide an initial
    assessment of the reliability of the
    information obtained from the user. However, a
    system should leverage information available
    in subsequent user responses in order to
    update and improve the accuracy of its beliefs.
  • goal: bridge confidence annotation and
    correction detection in a unified framework
    for belief updating in task-oriented spoken
    dialog systems
  • approach:
      • machine learning (generalized linear models)
      • integrate features from multiple knowledge
        sources in the system
      • work with compressed beliefs
          • top hypothesis + other: ASRU-2005 paper
          • k hypotheses + other: work in progress

sample problem
S: where would you like to fly from?
U: [Boston/0.45; Austin/0.30]
S: sorry, did you say you wanted to fly from Boston?
U: [No/0.37; Aspen/0.7]
Updated belief? [Boston/?; Austin/?; Aspen/?]

results: the data-driven approach significantly
outperforms common heuristics
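
A minimal sketch of belief updating for the top hypothesis, using a logistic model (one kind of generalized linear model). The feature set and the weights below are assumptions chosen by hand for illustration; in the approach described above the model would be trained on labeled dialog data, and the poster does not give its features or coefficients.

```python
import math

# Hypothetical weights, hand-picked for illustration only (not learned,
# and not taken from the poster or the ASRU-2005 paper).
WEIGHTS = {
    "bias": -0.5,
    "initial_conf": 2.0,  # confidence of the top hypothesis before confirmation
    "yes_conf": 3.0,      # recognizer confidence that the user said "yes"
    "no_conf": -4.0,      # recognizer confidence that the user said "no"
}


def update_top_belief(initial_conf: float,
                      yes_conf: float = 0.0,
                      no_conf: float = 0.0) -> float:
    """Return an updated confidence in the top hypothesis, in (0, 1)."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["initial_conf"] * initial_conf
         + WEIGHTS["yes_conf"] * yes_conf
         + WEIGHTS["no_conf"] * no_conf)
    return 1.0 / (1.0 + math.exp(-z))


# Sample problem: Boston/0.45, then the user answers the confirmation.
after_no = update_top_belief(0.45, no_conf=0.37)
after_yes = update_top_belief(0.45, yes_conf=0.37)
print(f"after 'no':  {after_no:.2f}")
print(f"after 'yes': {after_yes:.2f}")
```

With these assumed weights, a "no" answer lowers the belief in Boston and a "yes" raises it; how much it moves depends entirely on the weights, which is why they are meant to be learned from data.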
[Figure: error rates for initial, heuristic, and proposed belief updates]

2. RavenClaw error handling architecture
  • goal: task-independent, adaptive and scalable
    error handling architecture
  • approach:
      • error handling strategies and error handling
        decision process are decoupled from the
        dialog task → reusability, uniformity,
        plug-and-play strategies, lessens development
        effort
      • error handling decision process implemented in a
        distributed fashion
          • local concept error handling decision process
            (handles potential misunderstandings)
          • local request error handling decision process
            (handles non-understandings)
          • currently implemented as POMDPs (for concepts)
            and MDPs (for request agents)
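
A minimal sketch of how strategies and local decision processes might be kept separate from the dialog task. The function names, thresholds, and rule-based policies are all assumptions: simple threshold rules stand in for the POMDP and MDP decision processes mentioned above, and the prompts reuse wording that appears elsewhere on the poster.

```python
from typing import Callable, Dict, Optional

# Task-independent strategy library (hypothetical registry). New strategies
# can be added here without touching the dialog task.
STRATEGIES: Dict[str, Callable[[str], str]] = {
    "ExplicitConfirm": lambda value: f"Did you say you wanted {value}?",
    "ImplicitConfirm": lambda value: f"{value} ... for what time?",
    "AskRepeat": lambda value: "Can you please repeat that?",
    "MoveOn": lambda value: "Sorry, I didn't catch that. Let's move on.",
}


def concept_policy(confidence: float) -> Optional[str]:
    """Stand-in for a local concept decision process (thresholds assumed)."""
    if confidence < 0.5:
        return "ExplicitConfirm"
    if confidence < 0.9:
        return "ImplicitConfirm"
    return None  # confident enough: accept without confirmation


def request_policy(consecutive_failures: int) -> str:
    """Stand-in for a local request decision process (rule assumed)."""
    return "AskRepeat" if consecutive_failures < 2 else "MoveOn"


def handle_turn(value: str, confidence: float, understood: bool,
                consecutive_failures: int = 0) -> Optional[str]:
    """Pick a strategy locally and return its prompt, or None if no action."""
    if understood:
        name = concept_policy(confidence)
    else:
        name = request_policy(consecutive_failures)
    return STRATEGIES[name](value) if name else None


print(handle_turn("a room on Friday", 0.4, understood=True))
print(handle_turn("", 0.0, understood=False))
```

Because the policies only return strategy names, either one could be swapped for a learned model without changing the strategy library or the task.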

[Figure legend: updates following explicit confirmation; updates following implicit confirmation; oracle]
misunderstanding recovery strategies
  • Explicit Confirmation: Did you say you wanted a
    room on Friday?
  • Implicit Confirmation: a room on Friday ... for
    what time?

4. learning policies for recovering from non-understandings
Bohus and Rudnicky, "Sorry, I Didn't Catch That! An Investigation of Non-understanding Errors and Recovery Strategies", in SIGdial-2005
  • question: can dialog performance be improved by
    using a better, more informed policy for
    engaging non-understanding recovery strategies?
  • approach: a between-groups experiment
      • control group: system chooses a
        non-understanding strategy randomly
        (i.e. in an uninformed fashion)
      • wizard group: a human wizard chooses which
        strategy should be used whenever a
        non-understanding happens
      • 23 participants in each condition
      • first-time users, balanced by gender x native
        language
      • each attempted a maximum of 10 scenario-based
        interactions
      • evaluated global dialog performance (task
        success) and various local non-understanding
        recovery performance metrics (see side panel)

results: the wizard policy outperforms the uninformed
recovery policy on a number of global and local
metrics

[Figure: results for wizard policy vs. uninformed policy; diagram label: RoomLine]

non-understanding recovery strategies
  • AskRepeat
      • Can you please repeat that?
  • AskRephrase
      • Could you please try to rephrase that?
  • Reprompt
      • Would you like a small or a large room?
  • DetailedReprompt
      • Sorry, I'm not sure I understood you correctly.
        Right now I need to know if you would prefer
        a small or a large room.
  • Notify
      • Sorry, I didn't catch that.
  • Yield
      • Ø
  • MoveOn
      • Sorry, I didn't catch that. One choice would
        be Wean Hall 7220. Would you like a
        reservation for this room?
  • YouCanSay
      • Sorry, I didn't catch that. Right now I'm
        trying to find out if you would prefer a small
        room or a large one. You can say "I want a
        small room" or "I want a large room". If the
        size of the room doesn't matter to you, just
        say "I don't care".
  • TerseYouCanSay
  • Full-Help
      • Sorry, I didn't catch that. So far I found 5
        rooms matching your constraints. Right now I'm
        trying to find out if you would prefer a small
        room or a large one. You can say "I want a
        small room" or "I want a large room". If the
        size of the room doesn't matter to you, just
        say "I don't care".

[Figure: natives vs. non-natives; panels: avg. task success rate, avg. recovery WER, avg. recovery concept utility, avg. recovery efficiency]
[Diagram labels: iWelcome, GetQuery, x DoQuery, DiscussResults; concepts date, start_time, end_time; rGetDate, rGetStartTime, rGetEndTime; concept error handling MDP; request error handling MDP]
predicting likelihood of success
  • question: can we learn a better policy from
    data?
  • decision theoretic approach
      • learn to predict likelihood of success for each
        strategy
          • use features available at runtime
          • stepwise logistic regression (good class
            posterior probabilities)
      • compute expected utility for each strategy
      • choose strategy with maximum expected utility
  • preliminary results promising, a new experiment
    needed for validation
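
The decision-theoretic steps above can be sketched as follows. The per-strategy logistic models, the two runtime features, and the utility values are all hypothetical placeholders; the poster does not list the actual features, coefficients, or utilities.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical per-strategy logistic models: (bias, weights). The two assumed
# features are [recognition confidence, number of previous non-understandings].
MODELS: Dict[str, Tuple[float, List[float]]] = {
    "AskRepeat": (-0.2, [1.0, -0.8]),
    "MoveOn": (0.3, [0.1, 0.2]),
    "YouCanSay": (0.0, [0.5, 0.6]),
}

# Assumed utilities of a successful and a failed recovery, per strategy.
UTILITY: Dict[str, Tuple[float, float]] = {name: (1.0, -0.5) for name in MODELS}


def p_success(name: str, features: List[float]) -> float:
    """Predicted probability that the strategy recovers the dialog."""
    bias, weights = MODELS[name]
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def expected_utility(name: str, features: List[float]) -> float:
    p = p_success(name, features)
    u_success, u_failure = UTILITY[name]
    return p * u_success + (1.0 - p) * u_failure


def choose_strategy(features: List[float]) -> str:
    """Pick the strategy with maximum expected utility."""
    return max(MODELS, key=lambda name: expected_utility(name, features))


print(choose_strategy([0.2, 2.0]))  # low confidence, repeated failures
print(choose_strategy([1.0, 0.0]))  # high confidence, first failure
```

Since the utilities here are identical across strategies, the choice reduces to the highest predicted success probability; strategy-specific utilities (for example, penalizing longer prompts) would change that.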

for 5 of 10 strategies, models perform better than
a majority baseline, on both soft and hard error

strategy          majority baseline error   cross-validation error
Reprompt          49.2                      32.8
YouCanSay         48.6                      34.3
TerseYouCanSay    43.5                      32.6
MoveOn            35.6                      30.0
DetailedReprompt  37.7                      34.4

[Diagram labels: dialog task specification; dialog engine; error handling decision process; error handling strategies; ExplConf(start_time); GetStartTime; GetQuery; RoomLine. Expectation agenda entries: start_time, date, end_time, time, location, network (with_network → true, without_network → false)]
[Diagram labels: dialog stack; expectation agenda]

System: For when do you need the room?
User: Let's try two to four p.m.
Parse: time(two) end_time(four)
System: Did you say you wanted the room starting at two p.m.?

  • error handling strategies are implemented as
    library dialog agents
  • new strategies can be plugged in as they are
    developed

5. rejection threshold optimization
Bohus and Rudnicky, "A Principled Approach for Rejection Threshold Optimization in Spoken Dialog Systems", to be presented at Interspeech
  • data-driven approach for tuning state-specific
    rejection thresholds in a spoken dialog system

6. transfer of confidence annotators across domains
work in progress, in collaboration with Antoine Raux
  • migrate (adapt) a confidence annotator trained
    with data from domain A to domain B, without
    any labeled data in the new domain (B)
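
One simple way to tune state-specific rejection thresholds from data is a cost-minimizing sweep per dialog state. This is a generic sketch and not the method of the Interspeech paper cited above; the cost values, state names, and labeled samples are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed costs of the two error types (hypothetical values).
COST_FALSE_ACCEPT = 2.0  # accepting an incorrectly understood utterance
COST_FALSE_REJECT = 1.0  # rejecting a correctly understood utterance

Sample = Tuple[float, bool]  # (confidence score, was the understanding correct?)


def best_threshold(samples: List[Sample]) -> float:
    """Sweep a grid of thresholds and return the one with the lowest cost."""
    grid = [i / 20 for i in range(21)]

    def cost(threshold: float) -> float:
        total = 0.0
        for confidence, correct in samples:
            accepted = confidence >= threshold
            if accepted and not correct:
                total += COST_FALSE_ACCEPT
            elif not accepted and correct:
                total += COST_FALSE_REJECT
        return total

    return min(grid, key=cost)


def tune_thresholds(per_state: Dict[str, List[Sample]]) -> Dict[str, float]:
    """Tune one rejection threshold per dialog state."""
    return {state: best_threshold(samples) for state, samples in per_state.items()}


# Hypothetical labeled data for two dialog states.
data = {
    "GetDate": [(0.2, False), (0.4, False), (0.6, True), (0.9, True)],
    "YesNo": [(0.1, True), (0.3, True), (0.5, False)],
}
thresholds = tune_thresholds(data)
print(thresholds)
```

The two states end up with different thresholds because their confidence scores separate correct from incorrect inputs differently, which is the motivation for tuning per state rather than globally.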