Stochastic Language Generation for Spoken Dialog Systems

1
Stochastic Language Generation for Spoken Dialog
Systems
  • Alice Oh
  • aliceo@cs.cmu.edu
  • 24 October 2009

2
Communicator Project
  • A spoken dialog system in which users engage in a
    telephone conversation with the system using
    natural language to solve a complex travel
    reservation task
  • Components
  • Sphinx-II speech recognizer
  • Phoenix semantic parser
  • Domain agents
  • Agenda-based dialog manager
  • Stochastic natural language generator
  • Festival domain-dependent Text-to-Speech
  • Want to know more? Call toll-free at
    1-877-CMU-PLAN

3
Natural Language Generation
  • Natural Language Understanding (NLU)
  • Words → semantic (syntactic) representation
  • Natural Language Generation (NLG)
  • Semantic (syntactic) representation → words
  • There has been active research in the machine
    translation and automatic summarization (e.g.,
    stock quotes, weather, medical information)
    communities.
  • NLG is often divided into three steps
  • Text planning
  • Content planning
  • Sentence realization

4
Current Approaches
  • Traditional (rule-based) NLG
  • hand-crafted generation grammar rules and other
    knowledge
  • input: a very richly specified set of semantic
    and syntactic features
  • Example (from the Nitrogen demo website,
    http://www.isi.edu/natural-language/projects/nitrogen/)
  • (h / possible<latent
      :domain (h2 / obligatory<necessary
        :domain (e / eat,take in
          :agent you
          :patient (c / poulet))))
  • "You may have to eat chicken."
  • Template-based NLG
  • simple to build
  • input: a dialog act and/or a set of slot-value
    pairs

5
Problem Statement
  • To build a generation engine for a dialog system
    that can combine the advantages, as well as
    overcome the difficulties, of the two current
    approaches (template-based generation, and
    traditional, linguistic NLG)

6
Our Approach
  • We designed a corpus-driven stochastic generation
    engine that takes advantage of the
    characteristics of task-oriented conversational
    systems:
  • Spoken language is often grammatically imperfect
  • Spoken utterances are much shorter in length
  • Task-oriented dialogs are very focused and to the
    point (no flowery speech)
  • There are well-defined subtopics within the task,
    so the language can be selectively modeled

7
Stochastic NLG overview
  • Language model: an n-gram language model built
    from a corpus of travel reservation dialogs
  • Generation: given an utterance class, randomly
    generate a set of candidate utterances based on
    the LM distributions
  • Scoring: based on a set of rules, score the
    candidates and pick the best one
  • Slot filling: substitute slots in the utterance
    with the appropriate values from the input frame

8
Stochastic NLG can also be thought of as a way to
automatically build templates from a corpus
  • If you set n equal to a large enough number, most
    utterances generated by LM-NLG will be exact
    duplicates of the utterances in the corpus.

9
Stochastic NLG Language Model
  • Human-Human dialogs in travel reservations
  • (Leah, ATIS/American Express dialogs)
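
A class-conditional trigram model of the kind described here can be
sketched as follows. This is a minimal sketch, not the CMU
implementation: the toy corpus, the whitespace tokenization, and the
`{slot}` token notation are assumptions.

```python
from collections import defaultdict

def build_trigram_counts(utterances):
    """Count trigrams over whitespace-tokenized utterances, padded with
    begin/end markers so generation knows where to start and stop."""
    counts = defaultdict(lambda: defaultdict(int))
    for utt in utterances:
        tokens = ["<s>", "<s>"] + utt.split() + ["</s>"]
        for i in range(len(tokens) - 2):
            counts[(tokens[i], tokens[i + 1])][tokens[i + 2]] += 1
    return counts

# One LM per utterance class (toy stand-in for the tagged
# human-human travel corpus; slot values appear as {slot} tokens)
corpus = {
    "query_depart_time": [
        "what time would you like to leave ?",
        "what time do you want to depart {depart_city} ?",
    ],
    "inform_flight": [
        "there is a flight at {depart_time} .",
    ],
}
models = {cls: build_trigram_counts(utts) for cls, utts in corpus.items()}
```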

10
Tags
  • Utterance classes (29)
  • query_arrive_city, query_arrive_time,
    query_arrive_time, query_confirm,
    query_depart_date, query_depart_time,
    query_pay_by_card, query_preferred_airport,
    query_return_date, query_return_time
  • hotel_car_info, hotel_hotel_chain,
    hotel_hotel_info, hotel_need_car,
    hotel_need_hotel, hotel_where
  • inform_airport, inform_confirm_utterance,
    inform_epilogue, inform_flight,
    inform_flight_another, inform_flight_earlier,
    inform_flight_earliest, inform_flight_later,
    inform_flight_latest, inform_not_avail,
    inform_num_flights, inform_price
  • other
  • Attributes (24)
  • airline, am, arrive_airport, arrive_city,
    arrive_date, arrive_time, car_company, car_price,
    connect_airline, connect_airport, connect_city,
    depart_airport, depart_city, depart_date,
    depart_time, depart_tod
  • flight_num, hotel, hotel_city, hotel_price, name,
    num_flights, pm, price

11
Tagging
  • CMU corpus tagged manually
  • SRI corpus tagged semi-automatically using
    trigram language models built from CMU corpus
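
The semi-automatic tagging step can be approximated by scoring each
untagged utterance under every class's trigram LM and taking the
argmax. A sketch only: the add-alpha smoothing and the vocabulary-size
constant are assumptions, since the slides do not say how the CMU
trigram models were smoothed.

```python
import math
from collections import defaultdict

def build_trigram_counts(utterances):
    """Trigram counts over padded, whitespace-tokenized utterances."""
    counts = defaultdict(lambda: defaultdict(int))
    for utt in utterances:
        tokens = ["<s>", "<s>"] + utt.split() + ["</s>"]
        for i in range(len(tokens) - 2):
            counts[(tokens[i], tokens[i + 1])][tokens[i + 2]] += 1
    return counts

def log_prob(utterance, counts, alpha=0.1, vocab=1000):
    """Add-alpha smoothed trigram log-probability (assumed smoothing)."""
    tokens = ["<s>", "<s>"] + utterance.split() + ["</s>"]
    lp = 0.0
    for i in range(len(tokens) - 2):
        nxt = counts.get((tokens[i], tokens[i + 1]), {})
        total = sum(nxt.values())
        lp += math.log((nxt.get(tokens[i + 2], 0) + alpha) /
                       (total + alpha * vocab))
    return lp

def auto_tag(utterance, class_models):
    """Assign the utterance class whose LM scores the utterance highest."""
    return max(class_models,
               key=lambda c: log_prob(utterance, class_models[c]))

models = {
    "query_depart_time": build_trigram_counts(
        ["what time would you like to leave ?"]),
    "inform_flight": build_trigram_counts(
        ["there is a flight at {depart_time} ."]),
}
```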

12
Stochastic NLG Generation
  • Given an utterance class, randomly generates a
    set of candidate utterances based on the LM
    distributions
  • Generation stops when an utterance has a penalty
    score of 0 or the maximum number of iterations
    (50) has been reached
  • Average time: 75 msec for Communicator dialogs
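
The sampling step above can be sketched as a random walk over the
trigram distributions, from the begin markers until the end marker.
The helper names and the `{slot}` token notation are assumptions; the
real engine conditions the LM on the utterance class and wraps this in
the 50-iteration candidate loop.

```python
import random
from collections import defaultdict

def build_trigram_counts(utterances):
    """Trigram counts over padded, whitespace-tokenized utterances."""
    counts = defaultdict(lambda: defaultdict(int))
    for utt in utterances:
        tokens = ["<s>", "<s>"] + utt.split() + ["</s>"]
        for i in range(len(tokens) - 2):
            counts[(tokens[i], tokens[i + 1])][tokens[i + 2]] += 1
    return counts

def sample_utterance(counts, max_len=30):
    """Walk the trigram distributions from <s> <s> until </s>,
    sampling each next word in proportion to its count."""
    h1, h2, words = "<s>", "<s>", []
    for _ in range(max_len):
        nxt = counts.get((h1, h2))
        if not nxt:
            break
        choices, weights = zip(*nxt.items())
        word = random.choices(choices, weights=weights)[0]
        if word == "</s>":
            break
        words.append(word)
        h1, h2 = h2, word
    return " ".join(words)

# The engine would call this up to 50 times, stopping early when the
# scoring rules assign a candidate a penalty of 0.
lm = build_trigram_counts(["what time would you like to leave ?"])
```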

13
Stochastic NLG Scoring
  • Assign various penalty scores for
  • unusual length of utterance (thresholds for
    too-long and too-short)
  • slot in the generated utterance with an invalid
    (or no) value in the input frame
  • a new and required attribute in the input
    frame that's missing from the generated utterance
  • repeated slots in the generated utterance
  • Pick the utterance with the lowest penalty (or
    stop generating at an utterance with 0 penalty)
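
The penalty rules might look like the sketch below. The slides give
only the rule categories, so the weights, the length thresholds, and
the `{slot}` token notation are all assumptions.

```python
def penalty(utterance, frame, required, min_len=3, max_len=20):
    """Score a candidate against the four rule categories.
    Lower is better; 0 means the candidate is acceptable as-is."""
    words = utterance.split()
    slots = [w[1:-1] for w in words
             if w.startswith("{") and w.endswith("}")]
    score = 0
    # unusual length of utterance (too long or too short)
    if not (min_len <= len(words) <= max_len):
        score += 1
    # slot in the utterance with no (or an invalid) value in the frame
    score += 2 * sum(1 for s in slots if frame.get(s) is None)
    # required attribute in the frame missing from the utterance
    score += 2 * sum(1 for a in required if a not in slots)
    # repeated slots in the generated utterance
    if len(slots) != len(set(slots)):
        score += 1
    return score

ok = penalty("what time on {depart_date} would you like to depart ?",
             {"depart_date": "Tuesday"}, required=["depart_date"])
# ok == 0: right length, slot has a value, required attribute present
```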

14
Stochastic NLG Slot Filling
  • Substitute slots in the utterance with the
    appropriate values in the input frame
  • Example
  • What time do you need to arrive in {arrive_city}?
  • What time do you need to arrive in New York?
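
Slot filling is a straightforward substitution; a minimal sketch,
assuming slots are written `{slot}` in the candidate utterance:

```python
import re

def fill_slots(template, frame):
    """Replace each {slot} token with its value from the input frame;
    unknown slots are left in place for the scorer to penalize."""
    return re.sub(r"\{(\w+)\}",
                  lambda m: str(frame.get(m.group(1), m.group(0))),
                  template)

filled = fill_slots("What time do you need to arrive in {arrive_city}?",
                    {"arrive_city": "New York"})
# filled == "What time do you need to arrive in New York?"
```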

15
Examples
  • Corpus
  • What time do you want to depart {depart_city}?
  • What time on {depart_date} would you like to
    depart?
  • What time would you like to leave?
  • What time do you want to depart on {depart_date}?
  • Output (different from corpus)
  • What time would you like to depart?
  • What time on {depart_date} would you like to
    depart {depart_city}?
  • What time on {depart_date} would you like to
    depart on {depart_date}?

16
Rejected Examples
  • Not enough info
  • There is a flight at {depart_time} {ampm}.
  • Contains attributes not specified in the frame
  • I have an {airline} flight at {depart_time}
    {ampm} from {depart_city} arriving at
    {arrive_time} {ampm} with a stop-over in
    {connect_city} at {connect_airport}.
  • Scoring to get the best utterance is important!

17
Evaluation
  • User satisfaction questionnaire
  • Comparative evaluation
  • two systems with different NLG
  • a human reading the output, to factor out TTS
    effects
  • compare task completion as well as user
    satisfaction
  • Batch-mode generation, with output evaluated by a
    human grader

18
Preliminary Evaluation
  • Batch-mode generation using two systems,
    comparative evaluation of output by human
    subjects
  • User Preferences (49 utterances total)
  • Weak preference for Stochastic NLG (p = 0.18)

  subject    stochastic   templates   difference
  1              41            8           33
  2              34           15           19
  3              17           32          -15
  4              32           17           15
  5              30           17           13
  6              27           19            8
  7               8           41          -33
  average        27        21.29         5.71
19
Evaluation
  • Must be able to evaluate generation independent
    of the rest of the dialog system
  • Comparative evaluation using dialog transcripts
  • need 15-20 subjects
  • 8-10 dialogs, with system output generated in
    batch mode by two different engines
  • Evaluation of human travel agent utterances
  • Do users rate them well?
  • Is it good enough to model human utterances?

20
Stochastic NLG Advantages
  • corpus-driven
  • easy to build (minimal knowledge engineering)
  • fast prototyping
  • minimal input (speech act, slot values)
  • natural output
  • leverages data-collecting/tagging effort

21
Stochastic NLG Shortcomings
  • What might sound natural (imperfect grammar,
    intentional omission of words, etc.) for a human
    speaker may sound awkward (or wrong) for the
    system.
  • It is difficult to define utterance boundaries
    and utterance classes. Some utterances in the
    corpus may be a conjunction of more than one
    utterance class.
  • Factors other than the utterance class may affect
    the words (e.g., discourse history).
  • Some sophistication built into traditional NLG
    engines is not available (e.g., aggregation,
    anaphorization).

22
Future Work
  • How big of a corpus do we need?
  • How much of it needs manual tagging?
  • How does the n in n-gram affect the output?
  • What happens to output when two different human
    speakers are modeled in one model?
  • Can we replace scoring with a search algorithm?