A Framework For Developing Conversational User Interfaces

About This Presentation

Slides: 40

Provided by: eugenewe

Learn more at: https://cs.nyu.edu

Transcript and Presenter's Notes

1
A Framework For Developing Conversational User
Interfaces

James Glass, Eugene Weinstein, Scott Cyphers,
Joseph Polifroni
MIT Computer Science and Artificial Intelligence
Laboratory, Cambridge, MA, USA

Grace Chung
Corporation for National Research Initiatives, Reston, VA, USA

Mikio Nakano
NTT Corporation, Atsugi, Japan
2
Conversational User Interfaces
Speech
Human
Computer
3
Types of Conversational Interfaces

Conversational systems differ in the degree to
which the human or the computer controls the conversation
(initiative)

Directed Dialogue
Free Form Dialogue
Mixed Initiative Dialogue
4
Conversational Interfaces

Can understand verbal input

Speech recognition
Language understanding (in context)

Can engage in dialogue with a user during the
interaction

Dialogue management

Can verbalize response

Language generation
Speech synthesis

[Diagram labels: Audio, Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Back End, Language Generation, Speech Synthesis]
5
The Problem With Conversational Interfaces

Advanced conversational systems are out there
Both user and computer can take initiative
Goal: conversational skill of the system should
approach that of a human operator
But:
These systems are built by experts
Huge learning curve for novices, and
Tremendous iterative effort required even from
experts
For this reason:
Most advanced conversational systems remain in
research labs
e.g. Jupiter weather information system
(1-888-573-TALK), Zue et al., IEEE Trans. SAP,
8(1), 2000
However, we have seen limited commercial
deployment
e.g. AT&T's "How May I Help You?", Gorin et al.,
Speech Communication, 23, 1997

6
Simplifying Conversational System Creation

Goal: make it easier for both expert and novice
developers to create conversational interfaces
But still use advanced human language
technologies
Strategy: simplify the configuration process
Automatically configure technology components
based on examples
Allow specification through a web interface or a
unified configuration file

[Diagram labels: Web Interface, Configuration File, SpeechBuilder, Configuration Engine]
7
Configuring a Conversational Interface: Knowledge
Representation

First, define example sentences for in-domain
actions

Then, define the important concepts present in
the actions (attributes)
Concept values make up recognizer vocabulary!
Examples of attributes automatically matched to
attribute classes
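
The matching step on this slide can be sketched as follows. This is a minimal illustration, not SpeechBuilder's implementation: the names `ATTRIBUTES`, `ACTIONS`, `generalize`, and `vocabulary` are hypothetical, and the sample values are borrowed from the restaurant examples elsewhere in this deck. Attribute values found in an example sentence are replaced by their attribute class, and the vocabulary is collected from all example sentences and attribute values.

```python
import re

# Hypothetical domain definition; names and values are illustrative only.
ATTRIBUTES = {
    "cuisine": ["chinese", "taiwanese"],
    "city": ["washington", "boston", "arlington"],
}
ACTIONS = {
    "request_name": [
        "i would like a restaurant",
        "can you show me a chinese restaurant in arlington",
    ],
}


def generalize(sentence, attributes):
    """Replace each attribute value in the sentence with its class name."""
    for cls, values in attributes.items():
        for value in values:
            pattern = rf"\b{re.escape(value)}\b"
            sentence = re.sub(pattern, f"<{cls}>", sentence)
    return sentence


def vocabulary(actions, attributes):
    """Collect every word used in example sentences and attribute values."""
    words = set()
    for sentences in actions.values():
        for sentence in sentences:
            words.update(sentence.split())
    for values in attributes.values():
        for value in values:
            words.update(value.split())
    return words


templates = {
    action: [generalize(s, ATTRIBUTES) for s in sentences]
    for action, sentences in ACTIONS.items()
}
vocab = vocabulary(ACTIONS, ATTRIBUTES)
```

Because the example sentence is generalized to a class-level template, a value such as "washington" that never appears in an example sentence is still covered.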

8
Starting with a Database Table

Provide database table to configure speech
interface

Only some columns are used to access entries
(e.g., Name)
Values of those columns become values for domain
concepts
Default action sentences are automatically
generated
But, every table cell can potentially be an
answer to a question
All column names become values of one concept,
property
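
A rough sketch of how a table could drive the configuration described on this slide. Everything here is an assumption made for illustration: the function names, the `request_property` action, the sentence wording, and the reading that all column names feed a single `property` concept. The sample row reuses the directory example that appears later in the deck.

```python
# Hypothetical table; column names and rows are illustrative only.
COLUMNS = ["name", "phone", "email"]
ROWS = [
    {"name": "Jim Glass", "phone": "3-1640", "email": "glass_at_mit.edu"},
]


def configure_from_table(columns, rows, access_columns):
    """Derive concepts and default action sentences from a table."""
    concepts = {}
    # Values of the access columns become values of domain concepts.
    for col in access_columns:
        concepts[col] = sorted({row[col] for row in rows})
    # All column names become values of one concept, "property".
    concepts["property"] = list(columns)
    # Default action sentences ask for a property of an entry.
    actions = {
        "request_property": [
            f"what is the <property> for <{col}>" for col in access_columns
        ]
    }
    return concepts, actions


def answer(rows, access_column, key, prop):
    """Look up one cell: any table cell can be the answer to a question."""
    for row in rows:
        if row[access_column] == key:
            return row[prop]
    return None


concepts, actions = configure_from_table(COLUMNS, ROWS, ["name"])
```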

9
Dialogue Management

Generic Dialogue Manager (Polifroni & Chung,
ICSLP 2002)

Plan system responses
Regularize common concepts
Summarize database results

[Diagram labels: Audio, Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Back End, Language Generation, Speech Synthesis; Generic Dialogue Manager; domains: Hotels, Air Travel, Sports, Weather]
10
Context Resolution
Input query: "Show me restaurants in Cambridge."
Resolve deixis: "What does this one serve?"
Resolve pronouns: "What is their phone number?"
Inherit predicates: "Are there any on Main Street?"
Incorporate fragments: "What about Massachusetts Ave?"
Fill in default values: "Give me directions from MIT."
11
Human Language Technology Details

Approach: use the same technologies as deployed in
our mainstream, more complex systems
Speech Recognizer (Glass, Computer Speech and
Language, 2003)
Trained on 100 hours of mostly telephone speech
Word pronunciations supplied by large dictionary,
generated by rule, or provided by developer
Natural Language Understanding (Seneff,
Computational Linguistics, 1992)
Hierarchical sentence grammar used to parse
sentence hypothesis
Back off to concept spotting when no full parse
is made
Language Generation (Baptist & Seneff, ICSLP 2000)
Used for SQL (DB query) generation, paraphrasing,
URL-encoding of the meaning representation, and responses

12
Web-based Interface
Defining Actions and Concepts (Attributes)
13
Web-based Interface: Viewing Sentences
Examining how sentences are reduced to an action
and a set of attribute-value pairs
14
Web-based Interface: Response Generation
Customizing system responses
15
Web-based Interface: Editing Pronunciations
Modifying system-generated pronunciations for the
vocabulary
16
Web-based Interface: Context Resolution
Context Resolution configured through Masking and
Inheritance of concepts
17
Voice Configuration File: An Alternative to the
Web Interface

Entire domain can be specified in single
configuration file
Allows for automated generation of conversational
systems

<actions>
<request_name>
i would like a restaurant
can you (show|give) me a Chinese restaurant in Arlington
</actions>
<attributes>
<cuisine>
Chinese|Taiwanese
<city>
Washington Boston Arlington
</attributes>
<discourse>
name masks(city cuisine neighborhood)
</discourse>
<constraints>
<request_name>
(city|neighborhood) prompt_for_city
</constraints>
18
Deployment

SpeechBuilder functional for the past three years
Some example domains
Office appliance control
Laboratory directory (auto-attendant)
Restaurant query system
Has been used by MIT researchers (experts) as
well as novice developers at our sponsor
companies
Used in technology transfer workshop for
pervasive computing project (Oxygen)
SpeechBuilder has been used as an educational
tool
Computational linguistics class at Georgetown
University
Summer class at Johns Hopkins University
Youngest SpeechBuilder developer: 9 years old

19
Japanese SpeechBuilder

Created in collaboration with NTT
Challenge: segmentation (no spaces between words)

20
Example Domain

A hotel application using the generic dialogue
manager
Compiled via SpeechBuilder using constraints
shown previously
Other generic functionality is automatically
included
Illustrated technical issues

Soliciting necessary information from user

Interpreting fragments correctly in context

Canonicalizing relative dates

Ordering and summarizing results of query to
content provider

Resolving superlatives/updating discourse context

Interpreting pronouns in context

Returning and speaking specific properties

Repeating previous replies

21
Another Example Domain: Object Manipulation System

Stock SpeechBuilder domain for spoken dialogue
Custom back-end connected to stereo camera and
person tracking algorithm (Demirdjian, WOMOT 2003)

22
Ongoing and Future Work

Incorporate speech synthesis
Allow use of concatenative speech synthesizer (Yi
et al, ICSLP 2000) in SpeechBuilder
Allow use of multiple modalities
Provide functionality to incorporate multimodal
input into systems
Improve dialogue management tools and modules
Improve ability of SpeechBuilder systems to use
more sophisticated dialogue strategies
Provide additional generic semantic concepts for
use in domains
Allow system refinement by unsupervised learning
Use confidence scores to improve domain language
model (Nakano & Hazen, Eurospeech 2003)
Allow system modification in real-time
Need ability to re-train recognizer during
runtime (Schalkwyk et al, Eurospeech 2003)

23
Thank You! For more information:

http://www.sls.csail.mit.edu/
Email us! ecoder_at_mit.edu
Jupiter weather information system
1-617-258-0300 (outside USA)
1-888-573-TALK (USA toll-free)
Mercury flight information system
1-617-258-6040 (outside USA)
1-877-MIT-TALK (USA toll-free)
Pegasus flight status system
1-617-258-0301 (outside USA)
1-877-LCS-TALK (USA toll-free)

24
THE END
25

Utility for rapid prototyping of speech-based
interfaces
Used to create demonstrations for NTT CS Labs
open house
Prototypes were developed with a few days of
effort
Three papers submitted for publishing

26
Human Language Technologies

Only some columns are used to access entries
(e.g., Name)
Values of those columns become values for domain
concepts
Default action sentences are automatically
generated
But, every table cell can potentially be an
answer to a question
Names of non-access columns become a concept

27
To Configure Response Generation

For each concept present in the domain, define
how queries about that concept should be answered

<telephone> The telephone for :name is :phone

Define some prompts for generic events, e.g.
welcome and goodbye

<welcome> Welcome to the auto-attendant
<no_data> Sorry, there was no data matching your
request.
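
Response templates of this kind can be filled with a small substitution routine. The sketch below assumes placeholders are written as a colon followed by a concept name (for example `:name`). The `TEMPLATES` dictionary and the `generate` and `respond` helpers are hypothetical names used for illustration, not part of SpeechBuilder.

```python
import re

# Hypothetical templates keyed by concept or generic event.
TEMPLATES = {
    "telephone": "The telephone for :name is :phone",
    "welcome": "Welcome to the auto-attendant",
    "no_data": "Sorry, there was no data matching your request.",
}


def generate(key, values=None):
    """Fill :placeholders in the template for `key` from `values`."""
    values = values or {}

    def substitute(match):
        name = match.group(1)
        # Leave unknown placeholders untouched rather than guessing.
        return str(values.get(name, match.group(0)))

    return re.sub(r":(\w+)", substitute, TEMPLATES[key])


def respond(concept, record):
    """Answer a query about `concept`, or fall back to the no_data prompt."""
    if record is None:
        return generate("no_data")
    return generate(concept, record)
```

For example, `respond("telephone", {"name": "Jim Glass", "phone": "3-1640"})` fills both placeholders, while `respond("telephone", None)` returns the generic no-data prompt.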
28
Conversational User Interfaces: Input Side
Speech
Find me a flight to Boston on Tuesday
action=flights to_city=Boston day=Tuesday
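
To make the input side concrete, here is a deliberately naive sketch that reduces the example utterance to an action and attribute-value pairs by keyword spotting. It stands in for the real recognizer and grammar-based understanding, which are far more capable. The `CONCEPTS` and `ACTION_CUES` tables and the `spot` function are hypothetical.

```python
import re

# Hypothetical concept lexicon and action cue words.
CONCEPTS = {
    "to_city": ["boston", "washington"],
    "day": ["monday", "tuesday"],
}
ACTION_CUES = {"flights": ["flight", "flights", "fly"]}


def spot(utterance):
    """Return a flat frame of key-value pairs spotted in the utterance."""
    words = re.findall(r"[a-z']+", utterance.lower())
    frame = {}
    # Pick the action whose cue words appear in the utterance.
    for action, cues in ACTION_CUES.items():
        if any(word in cues for word in words):
            frame["action"] = action
    # Record every concept value that appears as a word.
    for concept, values in CONCEPTS.items():
        for word in words:
            if word in values:
                frame[concept] = word.capitalize()
    return frame


frame = spot("Find me a flight to Boston on Tuesday")
```

A real system would also need to tell a destination from an origin, which plain keyword spotting cannot do. That is one reason the framework uses a sentence grammar first and falls back to concept spotting only when no full parse is found.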
29
Conversational User Interfaces: Output Side
Meaning
flight_num=55 airline=Delta origin=LGA dest=BOS
Text
Delta flight, number fifty five from La Guardia
to Boston
[Diagram labels: DB, Action, Meaning, Generation, Text, Synthesis, Speech]
30
Conversational User Interfaces: The Whole Picture.
Or Is It?
[Diagram labels: Speech, Text, Meaning, Action; Synthesis, Generation]
31
The Missing Pieces: Context and Dialogue

Context Resolution

Dialogue Management

32
Conversational User Interfaces: The Whole Picture
[Diagram labels: Speech, Text, Meaning, Action; Understanding, Generation, Synthesis; Context Resolution, Dialogue Management]
33
The Problem With Conversational Interfaces

Complex conversational systems are out there
Both user and computer can take initiative
Goal: conversational skill of the system should
approach that of a human operator
But:
These systems are built by experts
Huge learning curve for novices, and
Tremendous iterative effort required even from
experts
For this reason:
Most advanced conversational systems remain in
research labs
e.g. Jupiter weather information system
(1-888-573-TALK), Zue et al., IEEE Trans. SAP,
8(1), 2000
However, we have seen limited commercial
deployment
e.g. AT&T's "How May I Help You?", Gorin et al.,
Speech Communication, 23, 1997

34
Configuring Response Generation

For each concept present in the domain, define
how queries about that concept should be answered
Configure some generic prompts for summarizing
long results
Define some prompts for generic events, e.g.
welcome

35
Configuring Context Resolution

Context Resolution (discourse) configured through
Masking and Inheritance of concepts
Inheritance configures how actions remember
concepts, e.g.
User: What is the phone number for Jim Glass?
System: Jim Glass phone number is 3-1640
User: What about his email address?
System: Jim Glass email address is
glass_at_mit.edu
Name concept is inherited
Masking configures how certain concepts block
other concepts, even in the presence of
inheritance, e.g.
User: Do you have any restaurants in Boston?
System: In Boston, I have the following
User: What about in Times Square?
System: In Times Square, New York, I have
City concept is masked by Neighborhood concept

Name is inherited
City is masked
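
The two dialogues above can be reproduced with a small merge routine. This is a sketch under stated assumptions, not SpeechBuilder's discourse component: the `INHERITED` set, the `MASKS` table, and the `resolve` function are hypothetical, and frames are modeled as plain dictionaries.

```python
# Hypothetical discourse configuration; not SpeechBuilder's actual format.
INHERITED = {"name", "city", "cuisine"}  # concepts remembered across turns
MASKS = {"neighborhood": {"city"}}       # a new neighborhood blocks the old city


def resolve(context, frame):
    """Merge the previous context into the current frame."""
    resolved = dict(frame)
    # Collect the concepts blocked by anything mentioned in this turn.
    blocked = set()
    for concept in frame:
        blocked |= MASKS.get(concept, set())
    # Inherit remembered concepts unless restated or masked.
    for concept, value in context.items():
        if concept in INHERITED and concept not in resolved and concept not in blocked:
            resolved[concept] = value
    return resolved


# Inheritance: the name carries over to the follow-up question.
turn1 = resolve({}, {"name": "Jim Glass", "property": "phone"})
turn2 = resolve(turn1, {"property": "email"})

# Masking: mentioning a neighborhood drops the previously mentioned city.
turn3 = resolve({"city": "Boston"}, {"neighborhood": "Times Square"})
```

Keeping masking separate from inheritance lets a concept be remembered by default and still be dropped when a more specific concept arrives.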
36
Voice Configuration File

Developers can also use the Voice Configuration
(VCFG) file format to configure SpeechBuilder
domains

37
Dialogue Management

Generic Dialogue Manager (Polifroni & Chung,
ICSLP 2002)

Plan system responses
Regularize common concepts
Summarize database results

[Diagram labels: Audio, Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Database, Language Generation, Speech Synthesis; Generic Dialogue Manager; domains: Hotels, Air Travel, Sports, Weather]
38
Deployment

SpeechBuilder functional for the past three years
Some example domains
Office appliance control
Laboratory directory (auto-attendant)
Restaurant query system
Has been used by MIT researchers (experts) as
well as novice developers at our partner
companies
SpeechBuilder has been used by students in
Computational linguistics class at Georgetown
University
Summer class at Johns Hopkins University
Technology transfer workshop for pervasive
computing project (Oxygen)
In collaboration with NTT, we have developed a
Japanese version of SpeechBuilder. Japanese
domains:
Bus timetable system
Weather information system

39
Configuring a Speech Interface with
SpeechBuilder: Knowledge Representation

First define some concepts present in the domain
(attributes)
Concept values make up recognizer vocabulary!

Then, define examples of things to do with the
concepts (actions)
Examples of attributes automatically matched to
attribute classes
