A Framework For Developing Conversational User Interfaces

Provided by: eugenewe
Learn more at: https://cs.nyu.edu
1
A Framework For Developing Conversational User
Interfaces
  • James Glass, Eugene Weinstein, Scott Cyphers,
    Joseph Polifroni
    MIT Computer Science and Artificial Intelligence
    Laboratory, Cambridge, MA, USA
  • Grace Chung
    Corporation for National Research Initiatives,
    Reston, VA, USA
  • Mikio Nakano
    NTT Corporation, Atsugi, Japan
2
Conversational User Interfaces
[Diagram: Speech exchanged between Human and Computer]
3
Types of Conversational Interfaces
  • Conversational systems differ in the degree to
    which the human or the computer controls the
    conversation (initiative)

Directed Dialogue
Free Form Dialogue
Mixed Initiative Dialogue
4
Conversational Interfaces
  • Can understand verbal input
      • Speech recognition
      • Language understanding (in context)
  • Can engage in dialogue with a user during the
    interaction
  • Can verbalize response
      • Language generation
      • Speech synthesis

[Diagram: Audio, Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Back End, Language Generation, Speech Synthesis]
5
The Problem With Conversational Interfaces
  • Advanced conversational systems are out there
      • Both user and computer can take initiative
      • Goal: conversational skill of system should
        approach that of a human operator
  • But
      • These systems are built by experts
      • Huge learning curve for novices, and
      • Tremendous iterative effort required even from
        experts
  • For this reason
      • Most advanced conversational systems remain in
        research labs
      • e.g. Jupiter weather info system
        (1-888-573-TALK) (Zue et al., IEEE Trans. SAP,
        8(1), 2000)
  • However, we have seen limited commercial
    deployment
      • e.g. AT&T's How May I Help You (Gorin et al.,
        Speech Communication, 23, 1997)

6
Simplifying Conversational System Creation
  • Goal: make it easier for both expert and novice
    developers to create conversational interfaces
      • But still use advanced human language
        technologies
  • Strategy: simplify the configuration process
      • Automatically configure technology components
        based on examples
      • Allow specification through web interface or
        unified configuration file

[Diagram: SpeechBuilder, Configuration Engine, Web Interface, Configuration File]
7
Configuring a Conversational Interface: Knowledge
Representation
  • First, define example sentences for in-domain
    actions
  • Then, define the important concepts present in
    the actions (attributes)
  • Concept values make up recognizer vocabulary!
  • Examples of attributes automatically matched to
    attribute classes

8
Starting with a Database Table
  • Provide database table to configure speech
    interface
  • Only some columns are used to access entries
    (e.g., Name)
  • Values of those columns become values for domain
    concepts
  • Default action sentences are automatically
    generated
  • But, every table cell can potentially be an
    answer to a question
  • The names of all columns become values of a
    single concept, "property"
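
The table-to-concepts step described on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not SpeechBuilder's actual code: the function names build_concepts and default_sentences, the sentence wording, and the second table row are all hypothetical.

```python
# Hypothetical directory table. The first row reuses values quoted elsewhere
# in this talk; the second row is invented for illustration.
ROWS = [
    {"name": "Jim Glass", "phone": "3-1640", "email": "glass_at_mit.edu"},
    {"name": "Pat Example", "phone": "3-0000", "email": "pat_at_example.org"},
]


def build_concepts(rows, access_columns):
    """Turn a table into concept vocabularies (assumed representation)."""
    # Values of the access columns become values of domain concepts.
    concepts = {
        col: sorted({row[col] for row in rows}) for col in access_columns
    }
    # Column names become the values of one "property" concept.
    concepts["property"] = list(rows[0].keys()) if rows else []
    return concepts


def default_sentences(concepts, access_columns):
    """Generate one default question per (access column, other property)."""
    sentences = []
    for col in access_columns:
        example = concepts[col][0]
        for prop in concepts["property"]:
            if prop != col:
                sentences.append(f"what is the {prop} for {example}")
    return sentences


concepts = build_concepts(ROWS, ["name"])
for sentence in default_sentences(concepts, ["name"]):
    print(sentence)
# what is the phone for Jim Glass
# what is the email for Jim Glass
```

Any table cell can then be returned as an answer by looking up the row through the access column and the column through the property value.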

9
Dialogue Management
  • Generic Dialogue Manager (Polifroni & Chung,
    ICSLP 2002)
      • Plan system responses
      • Regularize common concepts
      • Summarize database results

[Diagram: Generic Dialogue Manager with domains Hotels, Air Travel, Sports, Weather; architecture components Audio, Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Back End, Language Generation, Speech Synthesis]
10
Context Resolution
  • Input query: "Show me restaurants in Cambridge."
  • Resolve deixis: "What does this one serve?"
  • Resolve pronouns: "What is their phone number?"
  • Inherit predicates: "Are there any on Main Street?"
  • Incorporate fragments: "What about Massachusetts Ave?"
  • Fill in default values: "Give me directions from MIT."
11
Human Language Technology Details
  • Approach: use the same technologies as deployed in
    our mainstream, more complex systems
  • Speech Recognizer (Glass, Computer Speech and
    Language, 2003)
      • Trained on 100 hours of mostly telephone speech
      • Word pronunciations supplied by a large
        dictionary, generated by rule, or provided by
        the developer
  • Natural Language Understanding (Seneff,
    Computational Linguistics, 1992)
      • Hierarchical sentence grammar used to parse
        sentence hypotheses
      • Back off to concept spotting when no full parse
        is found
  • Language Generation (Baptist & Seneff, ICSLP 2000)
      • Used for SQL (DB query) generation,
        paraphrasing, URL-encoding of the meaning
        representation, and responses

12
Web-based Interface
Defining Actions and Concepts (Attributes)
13
Web-based Interface: Viewing Sentences
Examining how sentences are reduced to an action
and a set of attribute-value pairs
14
Web-based Interface: Response Generation
Customizing system responses
15
Web-based Interface: Editing Pronunciations
Modifying system-generated pronunciations for the
vocabulary
16
Web-based Interface: Context Resolution
Context resolution is configured through masking and
inheritance of concepts
17
Voice Configuration File: An Alternative to the
Web Interface
  • Entire domain can be specified in a single
    configuration file
  • Allows for automated generation of conversational
    systems

<actions>
<request_name>
  i would like a restaurant
  can you (show|give) me a Chinese restaurant in Arlington
</actions>
<attributes>
<cuisine> Chinese Taiwanese
<city> Washington Boston Arlington
</attributes>
<discourse>
name masks(city cuisine neighborhood)
</discourse>
<constraints>
<request_name> (city|neighborhood) prompt_for_city
</constraints>
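
As a rough illustration of how one example sentence with alternatives could stand for several training sentences, the sketch below expands parenthesized groups. It assumes the group in the example lists alternatives separated by "|" (the separator does not survive in this transcript), and the function name expand is hypothetical, not part of SpeechBuilder.

```python
import itertools
import re


def expand(pattern):
    """Expand every "(a|b)" group in pattern into plain sentences."""
    # re.split with one capture group alternates literal text (even
    # positions) with the contents of each group (odd positions).
    parts = re.split(r"\(([^()]*)\)", pattern)
    choices = [
        part.split("|") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


for sentence in expand("can you (show|give) me a Chinese restaurant in Arlington"):
    print(sentence)
# can you show me a Chinese restaurant in Arlington
# can you give me a Chinese restaurant in Arlington
```

Nested groups are not handled; a sentence without any group is returned unchanged.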
18
Deployment
  • SpeechBuilder has been functional for the past
    three years
  • Some example domains
      • Office appliance control
      • Laboratory directory (auto-attendant)
      • Restaurant query system
  • Has been used by MIT researchers (experts) as
    well as novice developers at our sponsor
    companies
  • Used in technology transfer workshop for
    pervasive computing project (Oxygen)
  • SpeechBuilder has been used as an educational
    tool
      • Computational linguistics class at Georgetown
        University
      • Summer class at Johns Hopkins University
      • Youngest SpeechBuilder developer: 9 years old

19
Japanese SpeechBuilder
  • Created in collaboration with NTT
  • Challenge: segmentation (no spaces between words)
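
To make the segmentation problem concrete, here is a minimal sketch of greedy longest-match segmentation against a known vocabulary. This is a textbook technique shown purely as an illustration; the slides do not say how the Japanese SpeechBuilder segments text, and the function name and the tiny vocabulary are assumptions.

```python
def segment(text, vocabulary):
    """Split unspaced text into words by greedy longest match."""
    max_len = max((len(word) for word in vocabulary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in vocabulary:
                tokens.append(candidate)
                i += length
                break
        else:
            # Unknown character: emit it on its own and move on.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical vocabulary: Tokyo, a possessive particle, weather, and a
# single character that is also a prefix of "weather".
VOCAB = {"東京", "の", "天気", "天"}
print(segment("東京の天気", VOCAB))  # ['東京', 'の', '天気']
```

Greedy matching can pick the wrong split when a long word overlaps the start of the next one, which is one reason segmentation is a real challenge.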

20
Example Domain
  • A hotel application using the generic dialogue
    manager
  • Compiled via SpeechBuilder using constraints
    shown previously
  • Other generic functionality is automatically
    included
  • Illustrated technical issues
      • Soliciting necessary information from user
      • Interpreting fragments correctly in context
      • Canonicalizing relative dates
      • Ordering and summarizing results of query to
        content provider
      • Resolving superlatives/updating discourse context
      • Interpreting pronouns in context
      • Returning and speaking specific properties
      • Repeating previous replies

21
Another Example Domain: Object Manipulation System
  • Stock SpeechBuilder domain for spoken dialogue
  • Custom back-end connected to stereo camera and
    person tracking algorithm (Demirdjian, WOMOT 2003)

22
Ongoing and Future Work
  • Incorporate speech synthesis
      • Allow use of concatenative speech synthesizer (Yi
        et al., ICSLP 2000) in SpeechBuilder
  • Allow use of multiple modalities
      • Provide functionality to incorporate multimodal
        input into systems
  • Improve dialogue management tools and modules
      • Improve ability of SpeechBuilder systems to use
        more sophisticated dialogue strategies
      • Provide additional generic semantic concepts for
        use in domains
  • Allow system refinement by unsupervised learning
      • Use confidence scores to improve domain language
        model (Nakano & Hazen, Eurospeech 2003)
  • Allow system modification in real time
      • Need ability to re-train recognizer during
        runtime (Schalkwyk et al., Eurospeech 2003)

23
Thank You! For more information
  • http://www.sls.csail.mit.edu/
  • Email us! ecoder_at_mit.edu
  • Jupiter weather information system
      • 1-617-258-0300 (outside USA)
      • 1-888-573-TALK (USA toll-free)
  • Mercury flight information system
      • 1-617-258-6040 (outside USA)
      • 1-877-MIT-TALK (USA toll-free)
  • Pegasus flight status system
      • 1-617-258-0301 (outside USA)
      • 1-877-LCS-TALK (USA toll-free)

24
THE END
25
  • Utility for rapid prototyping of speech-based
    interfaces
  • Used to create demonstrations for NTT CS Labs
    open house
  • Prototypes were developed with a few days of
    effort
  • Three papers submitted for publishing

26
Human Language Technologies
  • Only some columns are used to access entries
    (e.g., Name)
  • Values of those columns become values for domain
    concepts
  • Default action sentences are automatically
    generated
  • But, every table cell can potentially be an
    answer to a question
  • Names of non-access columns become a concept

27
To Configure Response Generation
  • For each concept present in the domain, define
    how queries about that concept should be answered

<telephone> The telephone for :name is :phone
  • Define some prompts for generic events, e.g.
    welcome and goodbye

<welcome> Welcome to the auto-attendant
<no_data> Sorry, there was no data matching your request.
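
A response template of this kind could be filled in roughly as follows. This sketch assumes placeholders are written with a leading colon, as in ":name"; the TEMPLATES dictionary and the render function are hypothetical names, not SpeechBuilder's generation component.

```python
import re

# Templates keyed by concept or generic event, taken from this slide.
TEMPLATES = {
    "telephone": "The telephone for :name is :phone",
    "welcome": "Welcome to the auto-attendant",
    "no_data": "Sorry, there was no data matching your request.",
}


def render(template_name, values=None):
    """Replace each ":key" placeholder with its value, if one is given."""
    values = values or {}

    def substitute(match):
        key = match.group(1)
        # Leave the placeholder untouched when no value is available.
        return str(values.get(key, match.group(0)))

    return re.sub(r":(\w+)", substitute, TEMPLATES[template_name])


print(render("telephone", {"name": "Jim Glass", "phone": "3-1640"}))
# The telephone for Jim Glass is 3-1640
print(render("welcome"))
# Welcome to the auto-attendant
```

Leaving unfilled placeholders visible makes a missing database value easy to spot during testing; a deployed system would more likely fall back to the no_data prompt.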
28
Conversational User Interfaces: Input Side
Speech: "Find me a flight to Boston on Tuesday"
action=flights to_city=Boston day=Tuesday
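
The reduction of a sentence to an action plus attribute-value pairs can be illustrated with a simple keyword and concept spotter. This is a deliberately simplified sketch in the spirit of the concept-spotting fallback mentioned earlier, not the hierarchical grammar the system uses; the vocabularies and the function name are assumptions.

```python
import re

# Hypothetical concept vocabularies and action keywords.
CONCEPTS = {
    "to_city": ["Boston", "Arlington", "Washington"],
    "day": ["Monday", "Tuesday"],
}
ACTION_KEYWORDS = {"flights": ["flight", "flights", "fly"]}


def reduce_sentence(sentence):
    """Spot an action and concept values among the words of a sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    frame = {}
    for action, keywords in ACTION_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            frame["action"] = action
            break
    for concept, values in CONCEPTS.items():
        for value in values:
            if value.lower() in words:
                frame[concept] = value
                break
    return frame


print(reduce_sentence("Find me a flight to Boston on Tuesday"))
# {'action': 'flights', 'to_city': 'Boston', 'day': 'Tuesday'}
```

A spotter like this ignores word order, so it cannot tell an origin city from a destination; that distinction is what a sentence grammar adds.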
29
Conversational User Interfaces: Output Side
[Diagram: Action, DB, Meaning, Generation, Text, Synthesis, Speech]
Meaning: flight_num=55 airline=Delta origin=LGA dest=BOS
Text: "Delta flight, number fifty five from La Guardia
to Boston"
30
Conversational User Interfaces: The Whole Picture.
Or Is It?
[Diagram: Speech, Text, Meaning, Action, Generation, Synthesis, Speech]
31
The Missing Pieces: Context and Dialogue
  • Context Resolution
  • Dialogue Management

32
Conversational User Interfaces: The Whole Picture
[Diagram: Speech, Text, Understanding, Meaning, Context Resolution and Dialogue Management, Action, Meaning, Generation, Synthesis, Speech]
33
The Problem With Conversational Interfaces
  • Complex conversational systems are out there
      • Both user and computer can take initiative
      • Goal: conversational skill of system should
        approach that of a human operator
  • But
      • These systems are built by experts
      • Huge learning curve for novices, and
      • Tremendous iterative effort required even from
        experts
  • For this reason
      • Most advanced conversational systems remain in
        research labs
      • e.g. Jupiter weather info system
        (1-888-573-TALK) (Zue et al., IEEE Trans. SAP,
        8(1), 2000)
  • However, we have seen limited commercial
    deployment
      • e.g. AT&T's How May I Help You (Gorin et al.,
        Speech Communication, 23, 1997)

34
Configuring Response Generation
  • For each concept present in the domain, define
    how queries about that concept should be answered
  • Configure some generic prompts for summarizing
    long results
  • Define some prompts for generic events, e.g.
    welcome

35
Configuring Context Resolution
  • Context resolution (discourse) is configured
    through masking and inheritance of concepts
  • Inheritance configures how actions remember
    concepts, e.g.
      • User: What is the phone number for Jim Glass?
      • System: Jim Glass phone number is 3-1640
      • User: What about his email address?
      • System: Jim Glass email address is
        glass_at_mit.edu
      • Name concept is inherited
  • Masking configures how certain concepts block
    other concepts, even in the presence of
    inheritance, e.g.
      • User: Do you have any restaurants in Boston?
      • System: In Boston, I have the following
      • User: What about in Times Square?
      • System: In Times Square, New York, I have
      • City concept is masked by Neighborhood concept
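
The two dialogues on this slide can be reproduced with a small sketch of inheritance and masking over attribute-value frames. The frame layout, the action names, and the resolve function are assumptions made for illustration; this is not the actual context resolution component.

```python
def resolve(previous, current, inherits, masks):
    """Merge the current frame with context from the previous one.

    inherits: concepts carried over when the current frame lacks them.
    masks: maps a concept to the set of concepts it blocks.
    """
    # Collect every concept blocked by something the user just said.
    blocked = set()
    for concept in current:
        blocked |= masks.get(concept, set())

    resolved = dict(current)
    for concept, value in previous.items():
        if concept in inherits and concept not in current and concept not in blocked:
            resolved[concept] = value
    return resolved


INHERITS = {"name", "city"}
MASKS = {"neighborhood": {"city"}}  # Neighborhood masks City

# "What about his email address?" after asking for Jim Glass's phone.
print(resolve({"action": "request_phone", "name": "Jim Glass"},
              {"action": "request_email"}, INHERITS, MASKS))
# {'action': 'request_email', 'name': 'Jim Glass'}

# "What about in Times Square?" after asking about Boston.
print(resolve({"action": "list_restaurants", "city": "Boston"},
              {"action": "list_restaurants", "neighborhood": "Times Square"},
              INHERITS, MASKS))
# {'action': 'list_restaurants', 'neighborhood': 'Times Square'}
```

In the second dialogue the masked City is dropped rather than inherited, which lets the back end supply New York from the neighborhood instead of keeping Boston.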
36
Voice Configuration File
  • Developers can also use the Voice Configuration
    (VCFG) file format to configure SpeechBuilder
    domains

<actions>
<request_name>
  i would like a restaurant
  can you (show|give) me a Chinese restaurant in Arlington
</actions>
<attributes>
<cuisine> Chinese Taiwanese
<city> Washington Boston Arlington
</attributes>
<discourse>
name masks(city cuisine neighborhood)
</discourse>
<constraints>
<request_name> (city|neighborhood) prompt_for_city
</constraints>
37
Dialogue Management
  • Generic Dialogue Manager (Polifroni & Chung,
    ICSLP 2002)
      • Plan system responses
      • Regularize common concepts
      • Summarize database results

[Diagram: Generic Dialogue Manager with domains Hotels, Air Travel, Sports, Weather; architecture components Audio, Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Database, Language Generation, Speech Synthesis]
38
Deployment
  • SpeechBuilder has been functional for the past
    three years
  • Some example domains
      • Office appliance control
      • Laboratory directory (auto-attendant)
      • Restaurant query system
  • Has been used by MIT researchers (experts) as
    well as novice developers at our partner
    companies
  • SpeechBuilder has been used by students in
      • Computational linguistics class at Georgetown
        University
      • Summer class at Johns Hopkins University
      • Technology transfer workshop for pervasive
        computing project (Oxygen)
  • In collaboration with NTT, we have developed a
    Japanese version of SpeechBuilder. Japanese
    domains:
      • Bus timetable system
      • Weather information system

39
Configuring a Speech Interface with
SpeechBuilder: Knowledge Representation
  • First define some concepts present in the domain
    (attributes)
  • Concept values make up recognizer vocabulary!
  • Then, define examples of things to do with the
    concepts (actions)
  • Examples of attributes automatically matched to
    attribute classes