1 A Tale of Two Dialogue Models
CISC882 Natural Language Processing Final Project
By Sachin Kamboj
2 The Two Models
Missing Axiom Theory Model
Circuit Fix-It
Ronnie W. Smith, D. Richard Hipp, and Alan W. Biermann. An architecture for voice dialog systems based on Prolog-style theorem proving. Computational Linguistics, Volume 21, Number 3, September 1995.
Multimodal Speech and Gesture Models
Oviatt et al. Designing the user interface for multimodal speech and pen-based gesture applications: State-of-the-art systems and future research directions. Human Computer Interaction, 2000, vol. 15(4), pp. 263-322.
3 Missing Axiom Theory Model
4 Circuit Fix-It: Aims/Target Behavior
The system should be able to handle:
Convergence to a goal
Subdialogues and effective movement between them
An accounting of user knowledge and abilities
Change of initiative
Expectation of user input
5 Circuit Fix-It: Sample Dialogs
Computer: Put the knob to one zero.
User: Okay.
Computer: What is the voltage between connector one two one and connector three four?
User: I do not know.
Computer: Locate the voltmeter.
User: Done.
Computer: Add a wire between the minus com hole on the voltmeter and the connector one two one.
User: How do I do that?
Computer: Connect the end of the black wire with the small plug to the minus com hole on the voltmeter.
User: Done.
Computer: Connect the end of the black wire with the large plug to connector one two one.
User: Which knob?
Computer: Near the top boundary and in the center is a green region. Now, in the middle right of this green region is the control knob.
User: Okay.
Computer: Put the knob to one zero.
6 Circuit Fix-It: Prolog-Style Theorem Proving
Based on the Missing Axiom Theory
The system is built around a theorem prover, and the role of language is to supply the missing axioms.
The goal is stated as a Prolog axiom to be proven.
The system tries to prove the axiom in a top-down fashion.
If the proof succeeds using internally available knowledge, the dialog terminates without any interaction with the user.
If the proof fails, the system tries to find the missing axiom by engaging in a dialog.
observeposition(sw1, X) ← find(sw1), reportposition(sw1, X)
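The idea of asking the user only when a proof gets stuck can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual prover: the `RULES` table, the string encoding of goals, and the `ask_user` callback are all assumptions made for the example, and variables and unification are left out.

```python
# Hypothetical knowledge base: goal -> list of alternative rule bodies.
# Goals are plain strings here; the real system uses Prolog terms.
RULES = {
    "observeposition(sw1)": [["find(sw1)", "reportposition(sw1)"]],
}


def prove(goal, facts, ask_user):
    """Top-down proof of `goal`.

    A goal that is neither a known fact nor the head of a rule is treated
    as a missing axiom, and the user is asked to supply it.
    """
    if goal in facts:
        return True
    for body in RULES.get(goal, []):
        if all(prove(sub, facts, ask_user) for sub in body):
            facts.add(goal)
            return True
    if goal not in RULES and ask_user(goal):
        facts.add(goal)  # the user's reply supplies the missing axiom
        return True
    return False


if __name__ == "__main__":
    asked = []

    def always_yes(goal):
        asked.append(goal)
        return True

    print(prove("observeposition(sw1)", set(), always_yes), asked)
```

When every subgoal is already in `facts`, the proof completes without calling `ask_user` at all, which mirrors the case where the dialog terminates without user interaction.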
7 Circuit Fix-It: Implementing the Subdialog Feature
One of the requirements of the system is to allow subdialogs.
As the system engages in conversation only to prove missing axioms, each subdialog involves a separate proof.
Hence the system cannot follow a simple depth-first policy to complete a proof.
Instead, to switch between subdialogs, the system should allow the freezing of any proof and the transfer of control to a different proof.
Partially completed proofs have to be maintained in memory.
Freezing of proofs is handled by an Interruptible Prolog Simulator (IPSIM).
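Freezing one proof and switching to another can be illustrated with Python generators, which suspend at each `yield` and keep their state in memory until resumed. This is only an analogy for what IPSIM provides, not its design; the `proof` and `Scheduler` names, the step lists, and the `"done"` reply convention are assumptions for the sketch.

```python
def proof(name, steps):
    """A proof as a generator: it yields each step that needs a user reply
    and stays frozen at that point until it is resumed."""
    for step in steps:
        reply = yield f"{name}: {step}"
        if reply != "done":
            return False
    return True


class Scheduler:
    """Keeps partially completed proofs in memory and resumes any of them."""

    def __init__(self):
        self.frozen = {}  # subdialog name -> suspended generator

    def _step(self, name, value):
        gen = self.frozen[name]
        try:
            return gen.send(value)  # run until the next question
        except StopIteration as stop:
            del self.frozen[name]   # proof finished; drop it
            return stop.value

    def start(self, name, steps):
        self.frozen[name] = proof(name, steps)
        return self._step(name, None)

    def resume(self, name, reply):
        return self._step(name, reply)


if __name__ == "__main__":
    s = Scheduler()
    print(s.start("voltage", ["locate voltmeter", "attach wire"]))
    print(s.start("knob", ["find knob"]))   # "voltage" stays frozen
    print(s.resume("knob", "done"))
    print(s.resume("voltage", "done"))
```

Here the "voltage" proof is left suspended while the "knob" subdialog runs to completion, then picked up exactly where it stopped.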
8 Circuit Fix-It: Accounting for User Knowledge
The system should know what the user is capable of doing.
The requests should match the abilities of the user.
Abilities of different users will vary.
Novice users will know how to adjust a knob but may not know how to take a voltmeter reading.
The system uses a user model to determine what can be expected of the user.
The user's capabilities are specified in the form of Prolog-style rules.
Example rule: if the input describes some physical state, then conclude that the user knows how to observe that physical state. In addition, if the physical state is a property, then infer that the user knows how to locate the object that has that property.
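The user-model rule above can be written out as a small update function. The sketch below is illustrative only: the dictionary encoding of an utterance, its keys (`type`, `state`, `is_property`, `object`), and the tuple facts stored in the model are assumptions, not the system's representation.

```python
def update_user_model(model, utterance):
    """Apply the rule: a reported physical state implies the user can observe
    it; if that state is a property of an object, the user can also locate
    the object."""
    if utterance.get("type") == "physical_state":
        model.add(("can_observe", utterance["state"]))
        if utterance.get("is_property"):
            model.add(("can_locate", utterance["object"]))
    return model


if __name__ == "__main__":
    said = {
        "type": "physical_state",
        "state": "position(sw1, up)",
        "is_property": True,
        "object": "sw1",
    }
    print(sorted(update_user_model(set(), said)))
```

A dialog controller could consult such a model before choosing between a short request and a step-by-step explanation.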
9 Circuit Fix-It: Mechanisms for Obtaining Variable Initiative
Variable initiative plays a role in selecting the next subdialog to be entered.
The system implements four levels of initiative:
Directive Mode: unless the user needs clarification, the system selects its response according to the next goal.
Suggestive Mode: the system selects its response depending on the next goal but allows interruptions to subdialogs about related goals.
Declarative Mode: the user has dialog control, but the system is free to mention relevant facts.
Passive Mode: the user has complete control. The system provides information only in direct response to the user's questions.
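One way to read the four initiative levels is as a policy for whose topic becomes the next subdialog. The sketch below is an interpretation under stated assumptions, not the system's mechanism: the `Mode` enum, the `choose_subdialog` and `may_volunteer_facts` functions, and the `related` and `clarification` flags are hypothetical names introduced for illustration.

```python
from enum import Enum


class Mode(Enum):
    DIRECTIVE = 1
    SUGGESTIVE = 2
    DECLARATIVE = 3
    PASSIVE = 4


def choose_subdialog(mode, system_goal, user_topic=None,
                     related=False, clarification=False):
    """Pick the next subdialog given the initiative level."""
    if mode is Mode.DIRECTIVE:
        # The system's goal wins unless the user asks for clarification.
        return user_topic if (clarification and user_topic) else system_goal
    if mode is Mode.SUGGESTIVE:
        # Interruptions are followed only when they concern a related goal.
        return user_topic if (user_topic and related) else system_goal
    # Declarative and passive: the user controls the dialog.
    return user_topic


def may_volunteer_facts(mode):
    """Only passive mode restricts the system to direct answers."""
    return mode is not Mode.PASSIVE


if __name__ == "__main__":
    print(choose_subdialog(Mode.SUGGESTIVE, "set_knob", "measure_voltage",
                           related=True))
```

In declarative or passive mode with no `user_topic`, the function returns `None`, meaning the system waits for the user.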
10 Circuit Fix-It: Implementation and Uses of Expectation
If the computer produces an utterance that is an attempt to have a specific task step S performed, there are expectations for any of the following types of responses:
A statement about the missing or uncertain background knowledge necessary for the accomplishment of S.
A statement about a subgoal of S.
A statement about the underlying purpose for S.
A statement about ancestor task steps of which accomplishment of S is a part.
A statement indicating the accomplishment of S.
Expectations serve two purposes:
The detection and correction of errors
The indication of shifts between subdialogs
11 Circuit Fix-It: Implementation and Uses of Expectation
The system computes the expectations and the cost of each expectation.
The system also computes a set of meanings, or semantic representations, of user utterances, each with a corresponding cost.
The system combines the two costs:
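$C = \beta\mu + (1 - \beta)E$, with $\mu$ the cost of the meaning, $E$ the cost of the expectation, and $\beta$ a weighting factor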
The meaning with the smallest total cost is selected as the output of the parser
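The selection step can be sketched as follows, assuming the combined cost is the weighted sum C = βµ + (1 − β)E of a meaning cost µ and an expectation cost E. The function name `select_meaning`, the tuple layout of the candidates, and the numeric costs are illustrative assumptions.

```python
def select_meaning(candidates, beta=0.5):
    """Return the meaning with the smallest combined cost.

    candidates: list of (meaning, mu, e) where mu is the cost of the parse
    and e is the cost of the expectation it was matched against.
    """
    def total_cost(candidate):
        _, mu, e = candidate
        return beta * mu + (1 - beta) * e

    return min(candidates, key=total_cost)[0]


if __name__ == "__main__":
    candidates = [
        ("switch_is_up", 0.2, 0.9),    # good parse, unexpected in context
        ("switch_is_down", 0.4, 0.1),  # weaker parse, strongly expected
    ]
    print(select_meaning(candidates, beta=0.5))
```

With `beta` close to 1 the parser trusts the acoustic and syntactic evidence; with `beta` close to 0 the dialog context dominates.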
12 Circuit Fix-It: Implementation and Uses of Expectation
An important side effect of matching meanings with expectations is the ability to interpret an utterance whose content does not specify its meaning.
The reference of pronouns
Turn the switch up
Where is it?
The meaning of short answers
Turn the switch up
Okay
Maintaining dialog coherence
13 Basic Algorithm
ZmodSubdialog(Goal)
  Create subdialog data structures
  While there are rules available which may achieve Goal
    Grab next available rule R from knowledge; unify with Goal
    If R trivially satisfies Goal, return with success
    If R is vocalize(X) then
      Execute verbal output X (mode)
      Record expectation
      Receive response (mode)
      Record implicit and explicit meanings for response
      Transfer control depending on which expected response was received
        Success response: Return with success
        Negative response: No action
        Confused response: Modify rule for clarification; prioritize for execution
        Interrupt: Match response to expected response of another subdialog
          Go to that subdialog (mode)
    If R is a general rule then
      Store its antecedents
      While there are more antecedents to process
14 Multimodal Speech and Gesture Interface Models
15 Introduction
What are multimodal interfaces?
Humans perceive the world through senses:
Ears (hearing), eyes (sight), nose (smell), skin (touch), and tongue (taste)
Communication through one sense is known as a mode.
Computers may process information through modes as well:
Keyboards, microphones, mice, etc.
Multimodal interfaces try to combine two or more different modes of communicating.
(Borrowed from a talk on Multimodal Interfaces by Joe Caloza.)
16 Advantages
Combining modalities allows more powerfully expressive and transparent information-seeking dialogues.
Different modalities provide complementary capabilities:
Users prefer speech input for functions like describing objects and events and for issuing commands.
Pen input is preferred for conveying symbols, signs, and gestures and for pointing at and selecting visible objects.
Multimodal pen/voice interaction can result in 10% faster task completion time, 36% fewer task-critical content errors, 50% fewer spontaneous disfluencies, and shorter and simpler linguistic constructions.
This corresponds to a 90-100% user preference to interact multimodally.
17 Advantages (2)
Able to support superior error handling compared with unimodal recognition interfaces
User-centric reasons:
Users will select the input mode that they judge to be less error prone for a particular lexical context.
Users' language is simplified when interacting multimodally.
Users have a strong tendency to switch modes after system errors.
System-centric reasons:
A well-designed multimodal architecture can support mutual disambiguation of input signals.
18 Advantages (3)
Allow users to exercise selection and control over how they interact with the computer
Hence can accommodate a broader range of users:
A visually impaired user may prefer speech input and TTS output.
A user with a hearing impairment, a strong accent, or a cold may prefer pen input.
Multimodal interfaces are particularly suitable for supporting mobile tasks such as communication and personal navigation.
19 Types of Multimodal Architecture
Can be subdivided into two main types:
Early Fusion
Integrates signals at the feature level
Based on Hidden Markov Models and Temporal Neural Networks
The recognition process in one mode influences the course of recognition in the other.
Used for closely coupled and synchronized modalities, e.g., speech and lip movement
Systems tend not to apply or generalize as well if the modes differ substantially in information content or time-scale characteristics.
Requires a large amount of training data to build the system
20 Types of Multimodal Architecture
Late Fusion
Integrate information at a semantic level
Use individual recognizers trained using unimodal data
Systems based on semantic fusion can be scaled up more easily, whether in the number of input modes or in the size and type of the vocabulary sets.
Requires an architecture that supports fine-grained time stamping of at least the beginning and end of each input signal
The time stamps are needed to determine whether two signals are part of a multimodal construction or should be interpreted as separate unimodal commands.
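The role of the time stamps can be illustrated with a simple temporal test. This is a sketch under assumptions, not the method of any particular system: the `Signal` type, the `is_multimodal` function, and the `max_lag` threshold of one second are hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    mode: str     # e.g. "speech" or "pen"
    start: float  # seconds
    end: float    # seconds


def is_multimodal(a, b, max_lag=1.0):
    """True when two signals from different modes overlap in time or are
    separated by at most `max_lag` seconds."""
    if a.mode == b.mode:
        return False
    # Negative gap means the two intervals overlap.
    gap = max(a.start, b.start) - min(a.end, b.end)
    return gap <= max_lag


if __name__ == "__main__":
    speech = Signal("speech", 0.0, 2.0)
    print(is_multimodal(speech, Signal("pen", 1.5, 1.8)))
    print(is_multimodal(speech, Signal("pen", 5.0, 5.2)))
```

Pairs that pass the test would be handed to semantic fusion as one construction; pairs that fail would be interpreted as independent unimodal commands.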
Enables a user to create and position entities on a map with speech, pen-based gestures, and direct manipulation.
These entities are then used to initialize and run a simulation.
IBM's Human-Centric Word Processor
Combines natural language understanding with pen-based pointing and selection gestures.
Used to correct, manipulate, and format text after it has been entered.
Boeing's Virtual Reality Aircraft Maintenance Training Prototype
Used for assessing the maintainability of new aircraft designs and for training mechanics in maintenance procedures using VR.
23 Applications
Meditor: Multimode Text Editor
Combines a keyboard, a Braille terminal, a French text-to-speech synthesiser, and a speech recognition system.
Allows blind people to perform simple document editing tasks.
MATCH
Multimodal Access to City Help
A multimodal portable device, created by AT&T, that accepts speech and pen gestures
Allows mobile users to access restaurant and subway information for New York City
24 Conclusion
Multimodal systems are useful for a wide variety of applications
They provide increased robustness, ease of use, and flexibility.
They make computers accessible to a wider and more diverse range of users.
However, the area still needs a lot of research, and many challenges remain to be overcome.