1
A Tale of Two Dialogue Models
CISC882 Natural Language Processing Final Project

By Sachin Kamboj

2
The Two Models

Missing Axiom Theory Model
Circuit Fix-It
Ronnie W. Smith, D. Richard Hipp, Alan W.
Biermann. An architecture for voice dialog
systems based on Prolog-style theorem proving,
Computational Linguistics, Volume 21, Number 3,
September 1995.
Multimodal Speech and Gesture Models
Oviatt et al. Designing the user interface for
multimodal speech and pen-based gesture
applications: State-of-the-art systems and future
research directions, Human-Computer Interaction,
2000, vol. 15(4), pp. 263-322.

3
Missing Axiom Theory Model
4
Circuit Fix-It: Aims/Target Behavior

The system should be able to handle:
Convergence to a goal
Subdialogues and effective movement between them
An accounting of user knowledge and abilities
Change of initiative
Expectation of user input

5
Circuit Fix-It: Sample Dialogs

Computer: Put the knob to one zero.
User: Okay.
Computer: What is the voltage between connector one two one and connector three four?
User: I do not know.
Computer: Locate the voltmeter.
User: Done.
Computer: Add a wire between the minus com hole on the voltmeter and the connector one two one.
User: How do I do that?
Computer: Connect the end of the black wire with the small plug to the minus com hole on the voltmeter.
User: Done.
Computer: Connect the end of the black wire with the large plug to connector one two one.
User: Which knob?
Computer: Near the top boundary and in the center is a green region. Now, in the middle right of this green region is the control knob.
User: Okay.
Computer: Put the knob to one zero.

6
Circuit Fix-It: Prolog-Style Theorem Proving

Based on the Missing Axiom Theory.
The system is built around a theorem prover, and the
role of language is to supply the missing axioms.
The goal is stated as a Prolog-style axiom to be proven.
The system tries to prove the axiom in a top-down
fashion.
If the proof succeeds using internally available
knowledge, the dialog terminates without any
interaction with the user.
If the proof fails, the system tries to find the
missing axiom by engaging in a dialog.
observeposition(sw1,X) ← find(sw1), reportposition(sw1,X)
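
The proof-then-ask behavior described on this slide can be illustrated with a small Python sketch. This is not the Circuit Fix-It implementation: the rule table `RULES`, the function `prove`, and the `ask_user` callback are hypothetical names, and the rule is simplified to a propositional form of the slide's example (no variables or unification).

```python
from typing import Callable, Dict, List, Set

# Hypothetical knowledge base: each goal maps to alternative rule bodies.
# The single rule mirrors the slide's example in propositional form.
RULES: Dict[str, List[List[str]]] = {
    "observe_position(sw1)": [["find(sw1)", "report_position(sw1)"]],
}


def prove(goal: str, facts: Set[str], ask_user: Callable[[str], bool]) -> bool:
    """Try to prove `goal` top-down; ask the user only for missing axioms."""
    if goal in facts:
        return True  # already known, no dialog needed
    for body in RULES.get(goal, []):
        # all() stops at the first subgoal that cannot be proven
        if all(prove(sub, facts, ask_user) for sub in body):
            facts.add(goal)
            return True
    # No rule concludes this goal: treat it as a missing axiom and
    # let the dialog (here, a callback) try to supply it.
    if goal not in RULES and ask_user(goal):
        facts.add(goal)
        return True
    return False


if __name__ == "__main__":
    asked: List[str] = []

    def ask(goal: str) -> bool:
        asked.append(goal)
        return True  # assume the user confirms every request

    print(prove("observe_position(sw1)", set(), ask))
    print(asked)
```

When both subgoals are already in `facts`, the callback is never invoked, which corresponds to the case where the dialog terminates without any interaction with the user.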

7
Circuit Fix-It: Implementing the Subdialog Feature

One of the requirements of the system is to allow
subdialogs.
Because the system engages in conversation only to
supply missing axioms, each subdialog corresponds to a
separate proof.
Hence the system cannot follow a simple
depth-first policy to complete a proof.
Instead, to switch between subdialogs, the system
must allow any proof to be frozen and
control to be transferred to a different proof.
Partially completed proofs have to be maintained
in memory.
Freezing of proofs is handled through an
Interruptible Prolog Simulator (IPSIM).
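
To make the idea of freezing and resuming proofs concrete, here is a minimal Python sketch that uses generators as suspended proofs. It is only an analogy and does not reproduce IPSIM; the `proof` generator, the `Dialog` class, and the convention that the reply "done" means success are all assumptions made for illustration.

```python
from typing import Dict, Generator, List, Optional


def proof(goal: str, steps: List[str]) -> Generator[str, str, bool]:
    """A proof that pauses at every step that needs a user response."""
    for step in steps:
        reply = yield f"{goal}: {step}"  # frozen here until resumed
        if reply != "done":
            return False
    return True


class Dialog:
    """Keeps partially completed proofs in memory, keyed by goal."""

    def __init__(self) -> None:
        self.frozen: Dict[str, Generator[str, str, bool]] = {}

    def _advance(self, goal: str, reply: Optional[str]) -> str:
        gen = self.frozen[goal]
        try:
            return gen.send(reply)  # run until the next pause
        except StopIteration as stop:
            del self.frozen[goal]  # proof finished, drop it
            return f"{goal}: {'proved' if stop.value else 'failed'}"

    def start(self, goal: str, steps: List[str]) -> str:
        self.frozen[goal] = proof(goal, steps)
        return self._advance(goal, None)  # send(None) starts a generator

    def resume(self, goal: str, reply: str) -> str:
        return self._advance(goal, reply)


if __name__ == "__main__":
    d = Dialog()
    print(d.start("set_knob", ["locate knob", "turn knob"]))
    print(d.start("measure", ["locate voltmeter"]))  # switch subdialogs
    print(d.resume("set_knob", "done"))              # return to the first
```

Each generator keeps its own position, so control can move to another subdialog and come back later without redoing earlier steps.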

8
Circuit Fix-It: Accounting for User Knowledge

The system should know what the user is capable
of doing.
Requests should match the abilities of the
user.
Abilities of different users will vary:
novice users will know how to adjust a knob but
may not know how to take a voltmeter reading.
The system uses a user model to determine what
can be expected of the user.
The user's capabilities are specified in the form
of Prolog-style rules, for example:
If the input describes some physical state, then
conclude that the user knows how to observe that
physical state. In addition, if the physical state
is a property, then infer that the user knows how
to locate the object that has that property.
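
The inference rule quoted above can be sketched in Python as a small update to a set of known abilities. The representation is an assumption for illustration: abilities are `(skill, target)` pairs, and the names `update_user_model` and `can_request` do not come from the original system.

```python
from typing import Set, Tuple

# Hypothetical representation: (skill, target),
# e.g. ("observe", "position(sw1)") or ("locate", "sw1").
Ability = Tuple[str, str]


def update_user_model(model: Set[Ability], state: str, obj: str,
                      is_property: bool) -> Set[Ability]:
    """Apply the rule to one user input that describes a physical state."""
    updated = set(model)  # leave the caller's model untouched
    # The user described the state, so they know how to observe it.
    updated.add(("observe", f"{state}({obj})"))
    # If the state is a property of an object, they can locate that object.
    if is_property:
        updated.add(("locate", obj))
    return updated


def can_request(model: Set[Ability], skill: str, target: str) -> bool:
    """Only ask the user to do what the model says they are able to do."""
    return (skill, target) in model


if __name__ == "__main__":
    model = update_user_model(set(), "position", "sw1", is_property=True)
    print(can_request(model, "locate", "sw1"))
    print(can_request(model, "observe", "voltage(wire1)"))
```

A request that fails `can_request` would be a candidate for a clarification subdialog rather than a direct instruction.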

9
Circuit Fix-It: Mechanisms for Obtaining Variable
Initiative

Variable initiative plays a role in selecting the
next subdialog to be entered.
The system implements four levels of initiative:
Directive Mode: unless the user needs
clarification, the system selects its response
according to its next goal.
Suggestive Mode: the system selects its
response according to its next goal but
allows interruptions to subdialogs about related
goals.
Declarative Mode: the user has dialog control,
but the system is free to mention relevant facts.
Passive Mode: the user has complete control. The
system provides information only in direct
response to the user's questions.
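
One way to picture how the four modes affect subdialog selection is the following Python sketch. It is a simplification under stated assumptions: the `Mode` enum, the `next_topic` function, and the boolean flags `is_clarification` and `is_related` are hypothetical, and the sketch ignores the volunteering of facts that distinguishes declarative from passive mode.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    DIRECTIVE = 1
    SUGGESTIVE = 2
    DECLARATIVE = 3
    PASSIVE = 4


def next_topic(mode: Mode, system_goal: str, user_topic: Optional[str], *,
               is_clarification: bool = False,
               is_related: bool = False) -> Optional[str]:
    """Pick the next subdialog topic, or None if the system should wait."""
    system_led = mode in (Mode.DIRECTIVE, Mode.SUGGESTIVE)
    if user_topic is None:
        # With no user request, only system-led modes push their own goal.
        return system_goal if system_led else None
    if mode is Mode.DIRECTIVE:
        # Only a clarification request can divert the system.
        return user_topic if is_clarification else system_goal
    if mode is Mode.SUGGESTIVE:
        # Interruptions are allowed when they concern a related goal.
        return user_topic if (is_clarification or is_related) else system_goal
    # Declarative and passive: the user controls the dialog.
    return user_topic


if __name__ == "__main__":
    print(next_topic(Mode.DIRECTIVE, "set_knob", "voltmeter"))
    print(next_topic(Mode.SUGGESTIVE, "set_knob", "voltmeter", is_related=True))
```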

10
Circuit Fix-It: Implementation and Uses of
Expectation

If the computer produces an utterance that is an
attempt to have a specific task step S performed,
there are expectations for any of the following
types of responses:
A statement about missing or uncertain
background knowledge necessary for the
accomplishment of S.
A statement about a subgoal of S.
A statement about the underlying purpose of S.
A statement about ancestor task steps of which
the accomplishment of S is a part.
A statement indicating the accomplishment of S.
Expectations serve two purposes:
Detecting and correcting errors.
Indicating shifts between
subdialogs.

11
Circuit Fix-It: Implementation and Uses of
Expectation

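The system computes the expectations and the cost
of each expectation.
The system also computes a set of meanings (or
semantic representations) of the user's utterance, each with
a corresponding cost.
The system combines the two costs:
$C = \beta\mu + (1 - \beta)E$
(here $\mu$ is taken to be the cost of a meaning and $E$ the cost of the expectation it matches, with $\beta$ weighting the two).
The meaning with the smallest total cost is
selected as the output of the parser.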
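
As a worked illustration of the selection step, the Python sketch below assumes the combined cost is a weighted sum of a meaning cost and an expectation cost, with a weight `beta` between 0 and 1. The function name `select_meaning`, the tuple layout, and the example costs are invented for illustration and are not taken from the system.

```python
from typing import List, Tuple

# Hypothetical candidate: (meaning, meaning_cost, expectation_cost)
Candidate = Tuple[str, float, float]


def select_meaning(candidates: List[Candidate],
                   beta: float) -> Tuple[str, float]:
    """Return the meaning with the smallest combined cost, and that cost."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta is assumed to lie in [0, 1]")
    if not candidates:
        raise ValueError("at least one candidate meaning is required")
    scored = [
        (meaning, beta * mu + (1.0 - beta) * e)  # weighted sum of both costs
        for meaning, mu, e in candidates
    ]
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    candidates = [
        ("done(set_knob)", 0.2, 0.1),     # slightly worse parse, expected reply
        ("ask(where, knob)", 0.1, 0.9),   # better parse, unexpected reply
    ]
    print(select_meaning(candidates, beta=0.5)[0])
    print(select_meaning(candidates, beta=1.0)[0])
```

With equal weighting the expected reply wins despite its higher parse cost; with `beta = 1.0` expectations are ignored and the cheaper parse is chosen.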

12
Circuit Fix-It: Implementation and Uses of
Expectation

An important side effect of matching meanings
with expectations is the ability to interpret an
utterance whose content alone does not specify its
meaning:
The reference of pronouns
Computer: Turn the switch up.
User: Where is it?
The meaning of short answers
Computer: Turn the switch up.
User: Okay.
Maintaining dialog coherence

13
Basic Algorithm

ZmodSubdialog(Goal)
  Create subdialog data structures
  While there are rules available which may achieve Goal
    Grab next available rule R from knowledge; unify with Goal
    If R trivially satisfies Goal, return with success
    If R is vocalize(X) then
      Execute verbal output X (mode)
      Record expectation
      Receive response (mode)
      Record implicit and explicit meanings for response
      Transfer control depending on which expected response was received
        Success response: Return with success
        Negative response: No action
        Confused response: Modify rule for clarification; prioritize for execution
        Interrupt: Match response to expected response of another subdialog;
          go to that subdialog (mode)
    If R is a general rule then
      Store its antecedents
      While there are more antecedents to process

14
Multimodal Speech and Gesture Interface Models
15
Introduction

What are multimodal interfaces?
Humans perceive the world through senses:
ears (hearing), eyes (sight), nose (smell), skin
(touch) and tongue (taste).
Communication through one sense is known as a
mode.
Computers may process information through modes
as well:
keyboards, microphones, mice, etc.
Multimodal interfaces try to combine two or more
different modes of communicating.
Slide borrowed from a talk on Multimodal
Interfaces by Joe Caloza

16
Advantages

Combining modalities allows more powerfully
expressive and transparent information-seeking
dialogues.
Different modalities provide complementary
capabilities:
Users prefer speech input for functions like
describing objects and events and for issuing
commands.
Pen input is preferred for conveying symbols,
signs and gestures and for pointing at and selecting
visible objects.
Multimodal pen/voice interaction can result in
10% faster task completion time, 36% fewer
task-critical content errors, 50% fewer
spontaneous disfluencies, and shorter and more
simplified linguistic constructions.
This corresponds to a 90-100% user preference to
interact multimodally.

17
Advantages (2)

Able to support superior error-handling compared
with unimodal recognition interfaces
User-centric reasons
Users will select the input mode that they judge
to be less error prone for a particular lexical
context
Users' language is simplified when interacting
multimodally
Users have a strong tendency to switch modes
after system errors
System-centric reasons
A well-designed multimodal architecture can
support mutual disambiguation of input signals

18
Advantages (3)

Allow users to exercise selection and control
over how they interact with the computer
Hence can accommodate a broader range of users
A visually impaired user may prefer speech input
and TTS output
A user with a hearing impairment, strong accent,
or a cold may prefer pen input
Multimodal interfaces are particularly suitable
for supporting mobile tasks such as communication
and personal navigation

19
Types of Multimodal Architecture

Can be subdivided into two main types
Early Fusion
Integrate signals at the feature level
Based on Hidden Markov Models and Temporal Neural
Networks
The recognition process in one mode influences
the course of recognition in the other
Used for closely coupled and synchronized
modalities (e.g. speech and lip movement)
Systems tend not to apply or generalize as well
if the modes differ substantially in the
information content or time scale characteristics
Require a large amount of training data to build
the system.

20
Types of Multimodal Architecture

Late Fusion
Integrates information at a semantic level.
Uses individual recognizers trained on unimodal
data.
Systems based on semantic fusion can be scaled up
more easily, whether in the number of input modes or
the size and type of the vocabulary sets.
Requires an architecture that supports
fine-grained time stamping of at least the
beginning and end of each input signal.
The time stamps are needed to determine whether two signals are part of
a multimodal construction or should
be interpreted as unimodal commands.
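
The role of time stamps in late fusion can be sketched with a simple temporal test in Python. This is an assumed heuristic, not the rule used by any particular system: the `Signal` class, the `is_multimodal` function, and the `max_lag` threshold of one second are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    mode: str      # e.g. "speech" or "pen"
    content: str   # output of that mode's own recognizer
    start: float   # time stamp of signal onset, in seconds
    end: float     # time stamp of signal offset, in seconds


def is_multimodal(a: Signal, b: Signal, max_lag: float = 1.0) -> bool:
    """Decide whether two signals belong to one multimodal construction.

    They are combined if they come from different modes and either
    overlap in time or are separated by at most `max_lag` seconds.
    """
    if a.mode == b.mode:
        return False
    # Negative when the intervals overlap, otherwise the silent gap.
    gap = max(a.start, b.start) - min(a.end, b.end)
    return gap <= max_lag


if __name__ == "__main__":
    speech = Signal("speech", "put that there", 0.0, 1.2)
    point = Signal("pen", "point(120, 45)", 0.8, 0.9)
    late_tap = Signal("pen", "tap(10, 10)", 6.0, 6.1)
    print(is_multimodal(speech, point))
    print(is_multimodal(speech, late_tap))
```

Signals that fail the test would each be passed on for interpretation as separate unimodal commands.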

21
Multimodal Architecture
[Diagram: multimodal architecture. Inputs: Speech; Pen, Glove, Laser. Processing components: Speech Recognition, NLP, Gesture Recognition, Gesture Understanding, Context Management, Multimodal Integration, Dialogue Manager, Response Planning, Application Invocation and Coordination. Outputs: Graphics, VR, TTS. Applications: App1, App2, App3.]
22
Applications

OGI QuickSet System
Enables a user to create and position entities on
a map with speech, pen-based gestures and direct
manipulation.
These entities are then used to initialize and run a
simulation.
IBM's Human-Centric Word Processor
Combines natural language understanding with
pen-based pointing and selection gestures.
Used to correct, manipulate and format text after
it has been entered.
Boeing's Virtual Reality Aircraft Maintenance
Training Prototype
Used for assessing the maintainability of new
aircraft designs and training mechanics in
maintenance procedures using VR.

23
Applications

Meditor: Multimode Text Editor
Combines a keyboard, a Braille terminal, a French
text-to-speech synthesiser, and a speech
recognition system.
Allows blind people to perform simple document
editing tasks.
MATCH: Multimodal Access to City Help
A multimodal portable device, created by AT&T, that accepts speech
and pen gestures.
Allows mobile users to access restaurant and
subway information for New York City.

24
Conclusion

Multimodal systems are useful for a wide variety
of applications.
They provide increased robustness, ease of use
and flexibility.
They make computers accessible to a wider
and more diverse range of users.
However, the area still needs considerable research,
and many challenges remain to be overcome.

26
Questions?