1
A Tale of Two Dialogue Models
CISC882 Natural Language Processing Final Project
  • By Sachin Kamboj

2
The Two Models
  • Missing Axiom Theory Model
  • Circuit Fix-It
  • Ronnie W. Smith, D. Richard Hipp, Alan W.
    Biermann, "An architecture for voice dialog
    systems based on Prolog-style theorem proving,"
    Computational Linguistics, Volume 21, Number 3,
    September 1995.
  • Multimodal Speech and Gesture Models
  • Oviatt et al., "Designing the user interface for
    multimodal speech and pen-based gesture
    applications: State-of-the-art systems and future
    research directions," Human-Computer Interaction,
    2000, vol. 15(4), pp. 263-322.

3
Missing Axiom Theory Model
4
Circuit Fix-It: Aims/Target Behavior
  • The system should be able to handle:
  • Convergence to a goal
  • Subdialogues and effective movement between them
  • An accounting of user knowledge and abilities
  • Change of initiative
  • Expectation of user input

5
Circuit Fix-It: Sample Dialogs
  • Computer: Put the knob to one zero.
  • User: Okay.
  • Computer: What is the voltage between connector
    one two one and connector three four?
  • User: I do not know.
  • Computer: Locate the voltmeter.
  • User: Done.
  • Computer: Add a wire between the minus com hole
    on the voltmeter and connector one two one.
  • User: How do I do that?
  • Computer: Connect the end of the black wire
    with the small plug to the minus com hole on
    the voltmeter.
  • User: Done.
  • Computer: Connect the end of the black wire with
    the large plug to connector one two one.
  • User: Which knob?
  • Computer: Near the top boundary and in the
    center is a green region. Now, in the middle
    right of this green region is the control knob.
  • User: Okay.
  • Computer: Put the knob to one zero.

6
Circuit Fix-It: Prolog-Style Theorem Proving
  • Based on the Missing Axiom Theory
  • The system is built around a theorem prover; the
    role of language is to supply the missing axioms.
  • The goal is stated as a Prolog axiom to be proven.
  • The system tries to prove the axiom in a top-down
    fashion.
  • If the proof succeeds using internally available
    knowledge, the dialog terminates without any
    interaction with the user.
  • If the proof fails, the system tries to find the
    missing axiom by engaging in a dialog.
  • Example rule: observeposition(sw1,X) ←
    find(sw1), reportposition(sw1,X)
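The top-down proof loop can be sketched in Python. This is a propositional simplification under stated assumptions, not the Circuit Fix-It implementation: goals are plain strings (there is no unification of X), the knowledge base maps a goal to alternative rule bodies, and `ask_user` is a hypothetical callback standing in for the subdialog that supplies a missing axiom.

```python
from typing import Callable, Dict, List, Set

# Hypothetical representation: goal -> list of alternative rule bodies (subgoal lists).
Rules = Dict[str, List[List[str]]]


def prove(goal: str, rules: Rules, facts: Set[str],
          ask_user: Callable[[str], bool]) -> bool:
    """Prove `goal` top-down; fall back to dialog for axioms the system lacks."""
    if goal in facts:                    # provable from internal knowledge
        return True
    for body in rules.get(goal, []):     # try each rule whose head is `goal`
        if all(prove(sub, rules, facts, ask_user) for sub in body):
            facts.add(goal)
            return True
    if goal not in rules and ask_user(goal):
        facts.add(goal)                  # missing axiom supplied by the user
        return True
    return False


if __name__ == "__main__":
    rules = {"observeposition(sw1,X)": [["find(sw1)", "reportposition(sw1,X)"]]}

    def ask(goal: str) -> bool:
        print("ask user about:", goal)   # stands in for a spoken subdialog
        return True

    print(prove("observeposition(sw1,X)", rules, set(), ask))
```

If `facts` already contains both subgoals, the proof succeeds without ever calling `ask_user`, which corresponds to the case where the dialog terminates with no interaction.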

7
Circuit Fix-It: Implementing the Subdialog Feature
  • One of the requirements of the system is to allow
    subdialogs.
  • Because the system engages in conversation only
    to prove missing axioms, each subdialog involves a
    separate proof.
  • Hence the system cannot follow a simple
    depth-first policy to complete a proof.
  • Instead, to switch between subdialogs, the system
    should allow any proof to be frozen and control to
    be transferred to a different proof.
  • Partially completed proofs have to be maintained
    in memory.
  • Freezing of proofs is handled by an
    Interruptible Prolog Simulator (IPSIM).
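IPSIM itself is not reproduced here. As a swapped-in technique, Python generators can illustrate the idea of freezing a proof: each proof pauses at the point where it needs an answer, stays in memory, and can be resumed later while other proofs run in between. The goal names and the `ProofManager` class are hypothetical.

```python
from typing import Dict, Generator, List, Optional

# A frozen proof (hypothetical stand-in for an IPSIM proof): yields a question,
# receives the user's answer via send(), and finally returns True/False.
Proof = Generator[str, bool, bool]


def proof(subgoals: List[str]) -> Proof:
    for sub in subgoals:
        answer = yield f"Is {sub} true?"   # the proof is frozen at this point
        if not answer:
            return False
    return True


class ProofManager:
    """Keeps partially completed proofs in memory and switches between them."""

    def __init__(self) -> None:
        self.frozen: Dict[str, Proof] = {}
        self.results: Dict[str, bool] = {}

    def start(self, goal: str, subgoals: List[str]) -> Optional[str]:
        self.frozen[goal] = proof(subgoals)
        return self._step(goal, None)

    def resume(self, goal: str, answer: bool) -> Optional[str]:
        return self._step(goal, answer)

    def _step(self, goal: str, answer: Optional[bool]) -> Optional[str]:
        gen = self.frozen[goal]
        try:
            # The first activation must use next(); later ones pass the answer in.
            return next(gen) if answer is None else gen.send(answer)
        except StopIteration as done:      # proof finished; record its result
            del self.frozen[goal]
            self.results[goal] = done.value
            return None


if __name__ == "__main__":
    m = ProofManager()
    print(m.start("measure_voltage", ["knob_set", "voltmeter_connected"]))
    print(m.start("locate_voltmeter", ["voltmeter_found"]))  # switch subdialog
    print(m.resume("locate_voltmeter", True))                # finishes: None
    print(m.resume("measure_voltage", True))                 # back to first proof
```

The first proof stays suspended while the second one runs to completion, which is the behavior a simple depth-first prover cannot provide.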

8
Circuit Fix-It: Accounting for User Knowledge
  • The system should know what the user is capable
    of doing.
  • Its requests should match the abilities of the
    user.
  • Abilities of different users will vary.
  • For example, novice users will know how to adjust
    a knob but may not know how to take a voltmeter
    reading.
  • The system uses a user model to determine what
    can be expected of the user.
  • The user's capabilities are specified in the form
    of Prolog-style rules, for example:
  • If the input describes some physical state, then
    conclude that the user knows how to observe the
    physical state. In addition, if the physical state
    is a property, then infer that the user knows how
    to locate the object that has that property.
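The two inference rules above can be sketched as updates to a user model. The string encoding of abilities (`observe(...)`, `locate(...)`) and the `plan_request` helper are hypothetical; they only illustrate how a request can be matched to what the user is believed to be able to do.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class UserModel:
    # Abilities the system currently believes the user has (hypothetical encoding).
    abilities: Set[str] = field(default_factory=set)

    def observe_input(self, state: str, obj: Optional[str] = None) -> None:
        """Apply the two rules after the user describes a physical `state`.

        `obj` is the object the state is a property of, or None otherwise."""
        self.abilities.add(f"observe({state})")   # rule 1: user can observe it
        if obj is not None:
            self.abilities.add(f"locate({obj})")  # rule 2: user can find the object

    def can(self, ability: str) -> bool:
        return ability in self.abilities


def plan_request(model: UserModel, obj: str, state: str) -> List[str]:
    """Hypothetical helper: add a 'locate' step only if the user may need it."""
    steps = []
    if not model.can(f"locate({obj})"):
        steps.append(f"locate({obj})")
    steps.append(f"observe({state})")
    return steps


if __name__ == "__main__":
    m = UserModel()
    print(plan_request(m, "sw1", "position(sw1)"))  # includes a locate step
    m.observe_input("position(sw1,up)", obj="sw1")  # user: "The switch is up."
    print(plan_request(m, "sw1", "position(sw1)"))  # locate step no longer needed
```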

9
Circuit Fix-It: Mechanisms for Obtaining Variable Initiative
  • Variable initiative plays a role in selecting the
    next subdialog to be entered.
  • The system implements four levels of initiative:
  • Directive Mode: unless the user needs
    clarification, the system selects its response
    according to the next goal.
  • Suggestive Mode: the system selects its response
    according to the next goal but allows
    interruptions to subdialogs about related goals.
  • Declarative Mode: the user has dialog control,
    but the system is free to mention relevant facts.
  • Passive Mode: the user has complete control. The
    system provides information only in direct
    response to the user's questions.
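The four modes can be read as a policy for choosing the next subdialog. The sketch below is a hypothetical simplification: the parameters `user_goal`, `related` and `needs_clarification` are assumptions, and the declarative mode's freedom to mention relevant facts is not modeled.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    DIRECTIVE = "directive"
    SUGGESTIVE = "suggestive"
    DECLARATIVE = "declarative"
    PASSIVE = "passive"


def select_goal(mode: Mode, system_goal: str, user_goal: Optional[str],
                related: bool = False, needs_clarification: bool = False) -> Optional[str]:
    """Pick the goal whose subdialog is entered next (hypothetical policy).

    user_goal: what the user's last utterance asked for, or None.
    related: whether user_goal is related to system_goal.
    Returns None when the system should wait for the user."""
    if mode is Mode.DIRECTIVE:
        # System keeps its own goal unless the user needs clarification.
        if needs_clarification and user_goal is not None:
            return user_goal
        return system_goal
    if mode is Mode.SUGGESTIVE:
        # System keeps its goal but accepts interruptions about related goals.
        if user_goal is not None and related:
            return user_goal
        return system_goal
    # DECLARATIVE and PASSIVE: the user controls the dialog. In declarative mode
    # the system may also volunteer relevant facts (not modeled here).
    return user_goal
```

For example, `select_goal(Mode.SUGGESTIVE, "measure_voltage", "check_battery", related=True)` follows the user's related goal, whereas the same call in directive mode stays with `"measure_voltage"`.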

10
Circuit Fix-It: Implementation and Uses of Expectation
  • If the computer produces an utterance that is an
    attempt to have a specific task step S performed,
    it expects any of the following types of
    response:
  • A statement about missing or uncertain
    background knowledge necessary for the
    accomplishment of S.
  • A statement about a subgoal of S.
  • A statement about the underlying purpose of S.
  • A statement about ancestor task steps of which
    the accomplishment of S is a part.
  • A statement indicating the accomplishment of S.
  • Expectations serve two purposes:
  • Detecting and correcting errors
  • Indicating shifts between subdialogs
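The five expectation types can be generated mechanically from a task tree. In this sketch the task tree (`children`, `parent` dictionaries) and the step names are hypothetical, and `signals_shift` is a simplifying assumption that a response about a subgoal or an ancestor step moves the dialog to another subdialog.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Kind(Enum):
    BACKGROUND = "missing or uncertain background knowledge for S"
    SUBGOAL = "a subgoal of S"
    PURPOSE = "the underlying purpose of S"
    ANCESTOR = "an ancestor task step of S"
    DONE = "accomplishment of S"


def expectations(step: str, children: Dict[str, List[str]],
                 parent: Dict[str, str]) -> List[Tuple[Kind, str]]:
    """List the (kind, topic) responses expected after requesting task step `step`."""
    exp = [(Kind.DONE, step), (Kind.BACKGROUND, step), (Kind.PURPOSE, step)]
    exp += [(Kind.SUBGOAL, c) for c in children.get(step, [])]
    p = parent.get(step)
    while p is not None:                 # walk up the (hypothetical) task tree
        exp.append((Kind.ANCESTOR, p))
        p = parent.get(p)
    return exp


def signals_shift(kind: Kind) -> bool:
    """Simplification: answers about subgoals or ancestors move to another subdialog."""
    return kind in (Kind.SUBGOAL, Kind.ANCESTOR)


if __name__ == "__main__":
    children = {"set_knob": ["locate_knob"]}
    parent = {"set_knob": "measure_voltage", "measure_voltage": "fix_circuit"}
    for kind, topic in expectations("set_knob", children, parent):
        print(kind.name, topic, signals_shift(kind))
```

A user reply such as "Which knob?" would match the `SUBGOAL` expectation for `locate_knob`, signalling entry into a locating subdialog.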

11
Circuit Fix-It: Implementation and Uses of Expectation
  • The system computes the expectations and the cost
    of each expectation.
  • The system also computes a set of meanings (or
    semantic representations) of the user utterance,
    each with a corresponding cost.
  • The system combines the two costs:
  • $C = \beta\mu + (1 - \beta)E$, where $\mu$ is the cost
    of a meaning, $E$ is the cost of the expectation it
    matches, and $\beta$ weights the two.
  • The meaning with the smallest total cost $C$ is
    selected as the output of the parser.
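The cost combination can be sketched directly from the formula for $C$. The value `beta = 0.5` and the penalty `unexpected_cost = 10.0` for meanings that match no expectation are arbitrary placeholders, not values from Smith et al.; meanings are represented as plain strings for illustration.

```python
from typing import Dict, List, Tuple


def select_meaning(candidates: List[Tuple[str, float]],
                   expectation_cost: Dict[str, float],
                   beta: float = 0.5,
                   unexpected_cost: float = 10.0) -> Tuple[str, float]:
    """Return the meaning minimizing C = beta*mu + (1 - beta)*E.

    beta and unexpected_cost are placeholder values chosen for illustration."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    best = None
    for meaning, mu in candidates:                          # mu: parser cost
        e = expectation_cost.get(meaning, unexpected_cost)  # E: expectation cost
        c = beta * mu + (1 - beta) * e
        if best is None or c < best[1]:
            best = (meaning, c)
    if best is None:
        raise ValueError("no candidate meanings")
    return best


if __name__ == "__main__":
    parses = [("turn(switch,up)", 2.0), ("turn(switch,off)", 1.5)]
    expected = {"turn(switch,up)": 1.0}
    print(select_meaning(parses, expected))  # ('turn(switch,up)', 1.5)
```

Here the expected meaning wins (total 1.5 against 5.75) even though the parser alone slightly preferred the other reading, which is how expectations help correct recognition errors.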

12
Circuit Fix-It: Implementation and Uses of Expectation
  • An important side effect of matching meanings
    with expectations is the ability to interpret an
    utterance whose content does not by itself
    specify its meaning:
  • Resolving the reference of pronouns
    Computer: Turn the switch up.
    User: Where is it?
  • Interpreting short answers
    Computer: Turn the switch up.
    User: Okay.
  • Maintaining dialog coherence
13
Basic Algorithm
  • ZmodSubdialog(Goal)
      • Create subdialog data structures
      • While there are rules available which may
        achieve Goal:
          • Grab next available rule R from knowledge;
            unify with Goal
          • If R trivially satisfies Goal, return with
            success
          • If R is vocalize(X) then
              • Execute verbal output X (mode)
              • Record expectation
              • Receive response (mode)
              • Record implicit and explicit meanings
                for response
              • Transfer control depending on which
                expected response was received:
                  • Success response: return with success
                  • Negative response: no action
                  • Confused response: modify rule for
                    clarification; prioritize for execution
                  • Interrupt: match response to expected
                    response of another subdialog; go to
                    that subdialog (mode)
          • If R is a general rule then
              • Store its antecedents
              • While there are more antecedents to process
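The algorithm can be sketched in Python under heavy assumptions. The rule encoding is hypothetical, responses arrive already classified as "success", "negative", "confused" or "interrupt:<goal>" through a hypothetical `respond` callback, and because the slide stops at the antecedent loop, proving each antecedent recursively is an assumption rather than the paper's procedure.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical rule encoding (not the paper's):
#   ("fact", None)          trivially satisfies the goal
#   ("vocalize", text)      speak `text`, then act on the user's response
#   ("general", subgoals)   prove each antecedent in turn
Rule = Tuple[str, Any]
Respond = Callable[[str, str], str]   # (utterance, mode) -> response class


def zmod_subdialog(goal: str, kb: Dict[str, List[Rule]],
                   respond: Respond, mode: str = "directive") -> str:
    """Return "success", "failure", or "interrupt:<other goal>"."""
    agenda = list(kb.get(goal, []))           # rules which may achieve Goal
    while agenda:
        kind, payload = agenda.pop(0)         # grab next available rule R
        if kind == "fact":
            return "success"
        if kind == "vocalize":
            reply = respond(payload, mode)    # say X, receive classified response
            if reply == "success":
                return "success"
            if reply == "confused":
                # Modify the rule for clarification and prioritize it.
                agenda.insert(0, ("vocalize", payload + " (clarified)"))
            elif reply.startswith("interrupt:"):
                return reply                  # caller enters that subdialog
            # "negative": no action, fall through to the next rule
        elif kind == "general":
            outcome = "success"
            for antecedent in payload:        # assumed: prove antecedents recursively
                outcome = zmod_subdialog(antecedent, kb, respond, mode)
                if outcome != "success":
                    break
            if outcome == "success" or outcome.startswith("interrupt:"):
                return outcome
    return "failure"


if __name__ == "__main__":
    replies = ["confused", "success"]         # scripted user for the demo

    def respond(text: str, mode: str) -> str:
        print("Computer:", text)
        return replies.pop(0)

    kb = {"set_knob": [("vocalize", "Put the knob to one zero.")]}
    print(zmod_subdialog("set_knob", kb, respond))
```

Returning the interrupt string to the caller, rather than jumping directly, keeps the sketch simple; a fuller version would resume the matching frozen subdialog.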

14
Multimodal Speech and Gesture Interface Models
15
Introduction
  • What are multimodal interfaces?
  • Humans perceive the world through senses
  • Ears (hearing), Eyes (sight), Nose (smell), Skin
    (touch) and Tongue (taste)
  • Communication through one sense is known as a
    mode
  • Computers may process information through modes
    as well:
  • Keyboards, microphones, mice, etc.
  • Multimodal interfaces try to combine two or more
    different modes of communication.
  • Slide borrowed from a talk on Multimodal
    Interfaces by Joe Caloza

16
Advantages
  • Combining modalities allows more powerfully
    expressive and transparent information-seeking
    dialogues.
  • Different modalities provide complementary
    capabilities:
  • Users prefer speech input for describing objects
    and events and for issuing commands.
  • Pen input is preferred for conveying symbols,
    signs and gestures and for pointing at and
    selecting visible objects.
  • Multimodal pen/voice interaction can result in
    10% faster task completion time, 36% fewer
    task-critical content errors, 50% fewer
    spontaneous disfluencies, and shorter, more
    simplified linguistic constructions.
  • These advantages correspond to a 90-100% user
    preference to interact multimodally.

17
Advantages (2)
  • Multimodal interfaces can support superior error
    handling compared with unimodal recognition
    interfaces.
  • User-centric reasons:
  • Users will select the input mode that they judge
    to be less error-prone for a particular lexical
    context.
  • Users' language is simplified when interacting
    multimodally.
  • Users have a strong tendency to switch modes
    after system errors.
  • System-centric reasons:
  • A well-designed multimodal architecture can
    support mutual disambiguation of input signals.
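Mutual disambiguation can be illustrated by combining the n-best lists of two recognizers and keeping only semantically compatible pairs. Everything here is a hypothetical illustration: the command names, the probabilities, and the choice to multiply scores (which assumes the recognizers are independent).

```python
from typing import List, Optional, Tuple


def fuse_nbest(speech: List[Tuple[str, str, float]],
               gesture: List[Tuple[str, float]]) -> Optional[Tuple[str, str, float]]:
    """Pick the jointly best (command, gesture) pair.

    speech:  (command, gesture type the command needs, probability)
    gesture: (recognized gesture type, probability)
    Scores are multiplied, which assumes the recognizers are independent."""
    best = None
    for command, needs, p_s in speech:
        for shape, p_g in gesture:
            if needs != shape:            # semantically incompatible pair
                continue
            score = p_s * p_g
            if best is None or score > best[2]:
                best = (command, shape, score)
    return best


if __name__ == "__main__":
    speech = [("create_landing_zone", "point", 0.55), ("create_barbed_wire", "line", 0.45)]
    gesture = [("line", 0.7), ("point", 0.3)]
    print(fuse_nbest(speech, gesture))
```

The speech recognizer's top choice is overridden because the gesture evidence strongly favors a line, so the second speech hypothesis wins jointly (about 0.315 against 0.165).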

18
Advantages (3)
  • They allow users to exercise selection and control
    over how they interact with the computer.
  • Hence they can accommodate a broader range of users:
  • A visually impaired user may prefer speech input
    and TTS output
  • A user with a hearing impairment, strong accent,
    or a cold may prefer pen input
  • Multimodal interfaces are particularly suitable
    for supporting mobile tasks such as communication
    and personal navigation

19
Types of Multimodal Architecture
  • Multimodal architectures can be subdivided into
    two main types:
  • Early fusion:
  • Integrates signals at the feature level
  • Based on Hidden Markov Models and temporal neural
    networks
  • The recognition process in one mode influences
    the course of recognition in the other
  • Used for closely coupled and synchronized
    modalities (e.g., speech and lip movements)
  • Such systems tend not to apply or generalize as
    well if the modes differ substantially in
    information content or time-scale characteristics
  • Requires a large amount of training data to build
    the system
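The feature-level step of early fusion can be sketched as concatenating time-aligned feature vectors from two synchronized streams. This is only the fusion step; the joint vectors would then feed a single recognizer such as an HMM (not shown). The nearest-earlier-frame alignment and integer frame rates are assumptions for illustration.

```python
from typing import List

Frame = List[float]


def early_fuse(audio: List[Frame], audio_rate: int,
               visual: List[Frame], visual_rate: int) -> List[Frame]:
    """Concatenate time-aligned feature vectors from two synchronized streams.

    Rates are frames per second (integers). For each audio frame, the visual
    frame covering the same instant is reused as needed (assumed alignment)."""
    if not visual:
        raise ValueError("visual stream is empty")
    fused = []
    for i, a in enumerate(audio):
        j = min(i * visual_rate // audio_rate, len(visual) - 1)
        fused.append(a + visual[j])       # one joint vector per audio frame
    return fused


if __name__ == "__main__":
    audio = [[0.1], [0.2], [0.3], [0.4]]   # 100 frames/s
    lips = [[1.0], [2.0]]                  # 50 frames/s
    print(early_fuse(audio, 100, lips, 50))
```

Because the joint vector mixes both modes from the start, the recognizer must be trained on synchronized multimodal data, which is one reason early fusion needs so much training data.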

20
Types of Multimodal Architecture
  • Late fusion:
  • Integrates information at a semantic level
  • Uses individual recognizers trained on unimodal
    data
  • Systems based on semantic fusion can be scaled up
    more easily, whether in the number of input modes
    or in the size and type of the vocabulary sets
  • Requires an architecture that supports
    fine-grained time stamping of at least the
    beginning and end of each input signal
  • Time stamps are needed to determine whether two
    signals are part of a multimodal construction or
    should be interpreted as unimodal commands.
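The role of time stamps in late fusion can be sketched as grouping speech and pen signals whose intervals overlap or lie close together. The `Signal` record, the greedy first-match pairing, and the 1-second `max_gap` are hypothetical placeholders, not values from Oviatt et al.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Signal:
    mode: str       # "speech" or "pen"
    start: float    # time stamps in seconds
    end: float
    meaning: str    # output of the unimodal recognizer


def group_signals(signals: List[Signal], max_gap: float = 1.0) -> List[Tuple[str, ...]]:
    """Pair speech with a pen signal close in time; leftovers stay unimodal.

    max_gap is a hypothetical threshold; pairing is greedy (first match wins)."""
    speech = [s for s in signals if s.mode == "speech"]
    pen = [s for s in signals if s.mode == "pen"]
    used: Set[int] = set()
    result: List[Tuple[str, ...]] = []
    for sp in speech:
        partner = None
        for i, g in enumerate(pen):
            # Gap between the two intervals; negative when they overlap.
            gap = max(sp.start, g.start) - min(sp.end, g.end)
            if i not in used and gap <= max_gap:
                partner = i
                break
        if partner is None:
            result.append(("unimodal", sp.meaning))
        else:
            used.add(partner)
            result.append(("multimodal", sp.meaning, pen[partner].meaning))
    result += [("unimodal", g.meaning) for i, g in enumerate(pen) if i not in used]
    return result


if __name__ == "__main__":
    signals = [Signal("pen", 0.0, 0.8, "circle(bridge)"),
               Signal("speech", 1.2, 2.0, "move this here"),
               Signal("speech", 10.0, 10.5, "zoom out")]
    print(group_signals(signals))
```

Without the start and end stamps, the system could not tell that "move this here" goes with the circled bridge while "zoom out" is a command on its own.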

21
Multimodal Architecture
[Figure: multimodal architecture diagram. Inputs: speech; pen, glove, laser. Components: speech recognition, NLP, gesture recognition, gesture understanding, context management, multimodal integration, dialogue manager, response planning, application invocation and coordination (App1, App2, App3). Outputs: graphics, VR, TTS.]
22
Applications
  • OGI QuickSet System
  • Enables a user to create and position entities on
    a map with speech, pen-based gestures and direct
    manipulation.
  • These entities are then used to initialize/run a
    simulation
  • IBM's Human-Centric Word Processor
  • Combines natural language understanding with
    pen-based pointing and selection gestures.
  • Used to correct, manipulate and format text after
    it has been entered
  • Boeing's Virtual Reality Aircraft Maintenance
    Training Prototype
  • Used for assessing the maintainability of new
    aircraft designs and for training mechanics in
    maintenance procedures using VR

23
Applications
  • Meditor: Multimodal Text Editor
  • Combines a keyboard, a Braille terminal, a French
    text-to-speech synthesiser, and a speech
    recognition system.
  • Allows blind people to perform simple document
    editing tasks.
  • MATCH (Multimodal Access To City Help)
  • A multimodal portable device created by AT&T that
    accepts speech and pen gestures
  • Allows mobile users to access restaurant and
    subway information for New York City

24
Conclusion
  • Multimodal systems are useful for a wide variety
    of applications
  • They provide increased robustness, ease of use
    and flexibility.
  • They make computers accessible to a wider and
    more diverse range of users.
  • However, the area still needs much research, and
    many challenges remain to be overcome.

26
Questions?