CS 904: Natural Language Processing Spoken Dialogue Systems - PowerPoint PPT Presentation

1
CS 904: Natural Language Processing
Spoken Dialogue Systems
  • L. Venkata Subramaniam
  • April 2, 2002

2
What is a Spoken Dialogue System?
  • A system that allows a user to speak his queries
    in natural language and receive useful responses
    from it.
  • Spoken dialogue systems provide an interface
    between the user and a computer-based application
    that permits spoken interaction with the
    application in a relatively natural manner.

3
Issues in Dialogue Systems
  • The system needs to participate actively to maintain
    a natural, smooth-flowing dialogue even in the
    event of recognition and interpretation errors.
  • Use acknowledgements to verify understanding.
  • Recognize when something is not understood and
    generate clarification subdialogues.

4
Spoken and Written Dialogues
  • Spoken querying is prone to:
    • Recognition errors
    • Grammatical errors in speech
    • Unclear sentence boundaries
    • Omissions and word fragments
    • Inversions
    • Interjections
    • Speech repairs
  • Written querying does not pose many of these
    problems.

5
Interactive Voice Response (IVR) Systems and
Dialogue Systems
  • IVR systems
    • Provide an interface between users and computer
      databases over telephone lines.
    • Employ a touch-tone (DTMF) user interface.
    • Newer systems allow simple voice commands.
  • Spoken dialogue systems
    • Permit the user to interact with the application
      by speech in a relatively natural manner.

6
Controlled Speech and Spontaneous Speech
  • Controlled speech has a limited task vocabulary and
    grammar.
  • Spontaneous speech has:
    • A high out-of-vocabulary rate
    • A higher recognition error rate
    • High grammatical variation
    • Unclear sentence boundaries, omissions,
      inversions, word fragments, interjections,
      restarts and speech repairs

7
System Performance Issues
  • The system must run in near real time.
  • The user should need minimal training and should
    not be constrained in what he can say.
  • The dialogue should result in something that can
    be independently evaluated.

8
Performance Assessment
  • Confusion matrices for key words
  • Number of dialogue turns
  • Rate of correction/repair turns
  • Time to completion
  • Transaction success rate
  • Quality of the final solution

Because a dialogue has no single correct answer, its
accuracy is difficult to measure in the way it is
measured in, for instance, speech recognition.
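Several of the measures above can be computed directly from logged dialogues. The sketch below assumes a hypothetical log format: each turn is annotated with whether it is a correction/repair turn, and each dialogue records its duration and whether the transaction succeeded. Confusion matrices and the quality of the final solution are not covered, since they need reference transcripts or task-specific judgements.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Turn:
    speaker: str             # "user" or "system"
    is_repair: bool = False  # hypothetical annotation: turn repairs or corrects an earlier error


@dataclass
class Dialogue:
    turns: List[Turn]
    duration_s: float  # time to completion, in seconds
    succeeded: bool    # whether the transaction was completed correctly


def assess(dialogues: List[Dialogue]) -> Dict[str, float]:
    """Corpus-level averages for a few of the measures listed on the slide."""
    n = len(dialogues)
    total_turns = sum(len(d.turns) for d in dialogues)
    repair_turns = sum(t.is_repair for d in dialogues for t in d.turns)
    return {
        "mean_turns": total_turns / n,
        "repair_rate": repair_turns / total_turns,
        "mean_time_s": sum(d.duration_s for d in dialogues) / n,
        "success_rate": sum(d.succeeded for d in dialogues) / n,
    }


logs = [
    Dialogue([Turn("system"), Turn("user"), Turn("system"), Turn("user", is_repair=True)],
             duration_s=42.0, succeeded=True),
    Dialogue([Turn("system"), Turn("user"), Turn("system"), Turn("user")],
             duration_s=30.0, succeeded=False),
]
print(assess(logs))
# {'mean_turns': 4.0, 'repair_rate': 0.125, 'mean_time_s': 36.0, 'success_rate': 0.5}
```

Real evaluations usually report these figures together. A system can reach a high success rate with many repair turns, and that often indicates poor recognition rather than good dialogue design.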
9
Dialogue Complexity and Example Systems
10
Levels of Sophistication in a Dialogue System
  • Touch-tone replacement
    • System prompt: "For checking information, press
      or say one."
    • Caller response: "One."
  • Directed dialogue
    • System prompt: "Would you like checking account
      information or rate information?"
    • Caller response: "Checking," "checking account,"
      or "rates."
  • Natural language
    • System prompt: "What transaction would you like
      to perform?"
    • Caller response: "Transfer Rs. 500 from checking
      to savings."

11
Levels of Complexity in Dialogue Management
  • Strict policy
    • The user can only specify information relating to
      the current goal/subgoal.
    • Context is easier to determine.
  • Free policy
    • Handles unintended requests or requests that
      deviate from the task.
    • Context is more difficult to determine.
    • Can lead to confusion/errors.

12
Initiative
  • System-initiative: the system always has control;
    the user only responds to system questions.
  • User-initiative: the user always has control; the
    system passively answers user questions.
  • Mixed-initiative: control switches between the
    system and the user using fixed rules.
  • Variable-initiative: control switches between the
    system and the user dynamically, based on
    participant roles, dialogue history, etc.

13
Dialogue and Task Complexity
  • Practical dialogue: the dialogue is focussed on
    accomplishing a concrete task.

14
Finite-State Dialogue Modeling: Long-Distance Dialing
  • The system asks a series of questions that the user
    answers: "What number would you like to call?",
    "Is this a Delhi number?"
  • Initiative is always with the system.
  • Context is fixed by the question being asked.
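A finite-state dialogue can be written as a table of states. Each state holds one system prompt and a rule that maps the user's answer to the next state. The states, prompts and transition rules below are hypothetical and only loosely follow the long-distance dialing example. Because the system always asks and the current state fixes how an answer is interpreted, initiative and context stay with the system.

```python
from typing import Callable, Dict, List, Tuple

State = str

# Hypothetical states and transitions for a long-distance dialing task.
# Each state maps to (system prompt, function from the user's answer to the next state).
FSM: Dict[State, Tuple[str, Callable[[str], State]]] = {
    "ask_number": ("What number would you like to call?", lambda answer: "ask_delhi"),
    "ask_delhi": ("Is this a Delhi number?",
                  lambda answer: "confirm" if answer.strip().lower() == "yes" else "ask_city"),
    "ask_city": ("Which city are you calling?", lambda answer: "confirm"),
    "confirm": ("Shall I place the call now?",
                lambda answer: "done" if answer.strip().lower() == "yes" else "ask_number"),
}


def run_dialogue(answers: List[str]) -> Tuple[List[Tuple[str, str]], State]:
    """Drive the FSM with scripted user answers; return the transcript and the final state."""
    state: State = "ask_number"
    transcript: List[Tuple[str, str]] = []
    for answer in answers:
        if state == "done":
            break
        prompt, next_state = FSM[state]
        transcript.append((prompt, answer))
        state = next_state(answer)
    return transcript, state


log, final = run_dialogue(["2345 6789", "no", "Mumbai", "yes"])
for prompt, answer in log:
    print(f"S: {prompt}\nU: {answer}")
print(final)  # done
```

The design is simple and robust, because each answer only has to be understood relative to one question. The price is rigidity: a user who volunteers extra information, such as the city in the first answer, cannot have it used.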

15
Frame-Based Dialogue Modeling
  • The system interprets the speech to acquire enough
    information to perform a specific action.
  • There is a single context that remains fixed for
    the system.
  • The problem is cast as form filling, where the
    form specifies all the information relevant to an
    action.
  • Monitor the form for completion.
  • Extract relevant elements from user utterances.
  • Use empty slots as triggers for questions to the
    user.
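Form filling can be sketched with a dictionary of slots. Keyword spotting fills whatever slots an utterance mentions, and the first empty slot decides the next question. The slot names, lexicons and questions below are hypothetical, chosen to match a train arrival/departure enquiry. A deployed system would take them from the application's dictionary and grammar.

```python
import re
from typing import Dict, Optional

Frame = Dict[str, Optional[str]]

# Hypothetical lexicons for a train arrival/departure frame.
TRAINS = ["bangalore rajdhani", "howrah rajdhani"]
STATIONS = ["hazrat nizamuddin", "new delhi"]
# Slot order doubles as the order in which the system asks about missing information.
QUESTIONS = {
    "train": "Which train are you asking about?",
    "station": "At which station?",
    "event": "Do you want the arrival or the departure time?",
}


def new_frame() -> Frame:
    return {slot: None for slot in QUESTIONS}


def extract(utterance: str, frame: Frame) -> None:
    """Fill whichever slots can be spotted in the utterance; leave the others untouched."""
    text = utterance.lower()
    for train in TRAINS:
        if train in text:
            frame["train"] = train
    for station in STATIONS:
        if station in text:
            frame["station"] = station
    if re.search(r"\b(leave|leaves|depart\w*)\b", text):
        frame["event"] = "departure"
    elif re.search(r"\b(arrive\w*|reach\w*)\b", text):
        frame["event"] = "arrival"


def next_question(frame: Frame) -> Optional[str]:
    """An empty slot triggers a question; None means the form is complete."""
    for slot, question in QUESTIONS.items():
        if frame[slot] is None:
            return question
    return None


frame = new_frame()
extract("When does the Bangalore Rajdhani leave?", frame)
print(next_question(frame))  # At which station?
extract("Hazrat Nizamuddin", frame)
print(next_question(frame))  # None
```

Unlike the finite-state approach, the user may fill several slots in one utterance and in any order. The system only asks about what is still missing.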

16
Frame-Based: Train Arrival/Departure Info
  • "When does the Bangalore Rajdhani leave Hazrat
    Nizamuddin?"
  • Initiative is with the user.
  • Context is fixed to train arrival/departure
    information.

18
Sets of Contexts: Banking Transaction
  • A set of contexts, each represented using the
    frame-based approach.
  • Initiative is with the user.
  • Context is fixed by the question asked by the
    user.

19
Issues with Multiple Contexts
  • The system should recognize when the context
    switches.
  • Effect changes/corrections: the user may want to
    change the fixed-deposit duration set earlier,
    based on new information about interest rates
    obtained from the system.
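The two issues above can be illustrated with one frame per context, trigger phrases that switch the active context, and slot assignments that overwrite earlier values as corrections. Everything in the sketch below is hypothetical: the class, the context names, the trigger phrases and the regular expressions. The handling of "that duration" is a crude stand-in for real reference resolution: it simply reuses the most recently mentioned duration.

```python
import re
from typing import Dict, Optional


class BankingDialogue:
    """Toy multi-context manager: one frame per context, plus context-switch detection.
    Context names, trigger phrases and slot patterns are all hypothetical."""

    TRIGGERS = {"fixed deposit": "fixed_deposit", "interest rate": "interest_rate"}

    def __init__(self) -> None:
        self.frames: Dict[str, Dict[str, Optional[str]]] = {
            "fixed_deposit": {"amount": None, "duration": None},
            "interest_rate": {"duration": None},
        }
        self.active = "fixed_deposit"
        self.last_duration: Optional[str] = None  # most recently mentioned duration

    def hear(self, utterance: str) -> str:
        text = utterance.lower()
        # 1. Detect a context switch from trigger phrases; "make it" returns to the deposit.
        for phrase, context in self.TRIGGERS.items():
            if phrase in text:
                self.active = context
        if "make it" in text:
            self.active = "fixed_deposit"
        frame = self.frames[self.active]
        # 2. Fill slots in the active frame; overwriting a filled slot acts as a correction.
        amount = re.search(r"(\d+)\s*rupees", text)
        duration = re.search(r"(\d+)\s*months?", text)
        if amount and "amount" in frame:
            frame["amount"] = amount.group(1)
        if duration:
            self.last_duration = duration.group(1) + " months"
            frame["duration"] = self.last_duration
        elif "that duration" in text and self.last_duration:
            frame["duration"] = self.last_duration  # naive reference resolution
        return self.active


d = BankingDialogue()
for u in ["I would like to open a fixed deposit account.",
          "Make it for 8000 Rupees.",
          "What is the interest rate for 3 months?",
          "Oh good then make it for that duration."]:
    print(d.hear(u), d.frames[d.active])
```

The duration is first set in the interest-rate context and then carried back into the fixed-deposit frame. A later "make it for 6 months" overwrites it, which is the kind of change/correction described above.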

20
Complex Dialogue Modeling
  • Plan (task) based model: the dialogue involves
    interactively constructing a plan (e.g. a kitchen
    design consultant, or planning a rescue from an
    island).
  • Agent-based model: involves planning as well as
    executing and monitoring operations in a
    dynamically changing world (e.g. emergency rescue
    coordination).

21
Dialogue System Model
22
Spoken Dialogue System
[Figure: block diagram of a spoken dialogue system
with the components User, Speech Recognition,
Semantic Interpretation, Discourse Interpretation,
Dialogue Management, Response Generation and
Speech Synthesis]
23
Parts of the Spoken Dialogue System
  • Signal processing: convert the audio wave into a
    sequence of feature vectors.
  • Speech recognition: decode the sequence of feature
    vectors into a sequence of words.
  • Semantic interpretation: determine the meaning of
    the words.
  • Discourse interpretation: understand what the user
    intends by interpreting utterances in context.
  • Dialogue management: determine system goals in
    response to user utterances, based on user
    intention.
  • Speech synthesis: generate synthetic speech as a
    response.

24
Dialogue System Interfaces
  • The dialogue manager interacts with the user to
    collect enough information to query the knowledge
    source and give a useful reply.

25
Robust Interpretation of Speech in the Presence of
Recognition Errors
  • Statistical error correction
  • Robust syntactic and semantic parsing
  • Use of context (discourse)

26
Statistical Error Correction
  • Corrects the errors made by the speech
    recognition unit.
  • Given an observed word sequence O from the speech
    recognizer, it finds the most likely original
    word sequence S.
  • That is, it finds the S that maximizes
    P(O|S)·P(S); by Bayes' rule this is equivalent to
    maximizing P(S|O), since P(O) is fixed.
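The argmax over S of P(O|S)·P(S) can be found with a Viterbi search when the recognizer's errors are modelled as word-for-word substitutions. The channel table P(observed | intended), the bigram language model and all the probabilities below are invented for illustration. Insertions and deletions, which a real error-correction model would also need, are not handled.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical channel model: CHANNEL[observed][intended] = P(observed | intended).
# Words not listed are assumed to be recognized correctly (probability 1).
CHANNEL: Dict[str, Dict[str, float]] = {
    "to": {"to": 0.8, "two": 0.3, "too": 0.2},
}
# Hypothetical bigram language model P(word | previous word); "<s>" marks the start.
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "book"): 0.1,
    ("book", "to"): 0.1, ("book", "two"): 0.2, ("book", "too"): 0.01,
    ("to", "tickets"): 0.01, ("two", "tickets"): 0.5, ("too", "tickets"): 0.01,
}
FLOOR = 1e-6  # crude probability for unseen bigrams


def log_p(prev: str, word: str) -> float:
    return math.log(BIGRAM.get((prev, word), FLOOR))


def correct(observed: List[str]) -> List[str]:
    """Viterbi search for the S maximizing log P(O|S) + log P(S) under the toy models."""
    # best[s] = (log score of the best path ending in intended word s, that path)
    best: Dict[str, Tuple[float, List[str]]] = {"<s>": (0.0, [])}
    for o in observed:
        candidates = CHANNEL.get(o, {o: 1.0})
        new_best: Dict[str, Tuple[float, List[str]]] = {}
        for s, p_obs in candidates.items():
            new_best[s] = max(
                (score + log_p(prev, s) + math.log(p_obs), path + [s])
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]


print(correct(["book", "to", "tickets"]))  # ['book', 'two', 'tickets']
```

The channel model alone prefers "to". The language model overrides it because "two tickets" is far more probable than "to tickets".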

27
Robust Parsing
  • Spoken language is not sentence-based.
  • A speaker performs a sequence of speech acts: "OK,
    let's do that then. Open a new account for me."
    contains an acknowledgement ("OK"), an acceptance
    ("let's do that") and a request ("Open a new
    account").
  • "Where is Lagaan playing in South Delhi?" Here the
    parser should do concept extraction/keyword
    spotting:
    Utterance type: where-question
    Movie: Lagaan
    Town: South Delhi
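Concept extraction by keyword spotting looks only for known concepts and ignores everything else. This makes it tolerant of fillers, fragments and ungrammatical word order. The lexicons and the rule "the first wh-word sets the utterance type" in the sketch below are hypothetical simplifications.

```python
import re
from typing import Dict

# Hypothetical domain lexicons; a real system would load these from the application dictionary.
MOVIES = ["Lagaan", "Dil Chahta Hai"]
TOWNS = ["South Delhi", "North Delhi", "Noida"]
WH_WORDS = ["where", "when", "what", "which", "how"]


def spot_concepts(utterance: str) -> Dict[str, str]:
    """Extract concepts by keyword spotting, skipping fillers, repairs and other noise."""
    text = utterance.lower()
    concepts: Dict[str, str] = {}
    for word in re.findall(r"[a-z']+", text):
        if word in WH_WORDS:
            concepts["utterance_type"] = f"{word}-question"
            break
    for movie in MOVIES:
        if movie.lower() in text:
            concepts["movie"] = movie
    for town in TOWNS:
        if town.lower() in text:
            concepts["town"] = town
    return concepts


print(spot_concepts("Where is Lagaan playing in South Delhi?"))
# {'utterance_type': 'where-question', 'movie': 'Lagaan', 'town': 'South Delhi'}
```

A disfluent version such as "uh where is um Lagaan uh playing in South Delhi" yields the same concepts, which a full-sentence grammar would typically reject. Plain substring matching can misfire on words that contain a lexicon entry, so real systems usually match on token boundaries.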

28
Discourse Interpretation
  • Maintains the system's idea of the state of the
    discourse.
  • Omitted words (or phrases) and pronominal
    references are filled in using common sense and
    discourse information.

29
Reference Resolution
  • Domain Knowledge (banking transaction)
  • Discourse Knowledge
  • World Knowledge
  • U: I would like to open a fixed deposit account.
  • S: For what amount?
  • U: Make it for 8000 Rupees.
  • S: For what duration?
  • U: What is the interest rate for 3 months?
  • S: Six percent.
  • U: Oh good, then make it for that duration.

30
Utterance Types
  • Possible user utterances are tagged as one of
    many types.
  • Examples

31
Utterance Type Detection
  • Words and word grammar: pick the utterance type
    that is most likely given the word string.
  • Discourse grammar: pick the utterance type that
    is most likely given the surrounding utterance
    types.
  • Prosodic information: use pitch contour, energy,
    SNR and speaking rate to choose the utterance type.

32
Implementation of Utterance Type Detection
  • The discourse structure of a conversation is
    modeled using an HMM, where the individual dialogue
    acts are observations emanating from the HMM
    states.
  • Constraints on the likely sequence of dialogue
    acts are modeled via a dialogue act n-gram.
  • The statistical dialogue grammar is combined with
    word n-grams, decision trees, and neural networks
    modeling the idiosyncratic lexical and prosodic
    manifestations of each dialogue act.
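The combination described above can be sketched as Viterbi decoding over dialogue acts. Each utterance's act is a hidden state, a dialogue-act bigram scores the transitions, and a per-utterance likelihood P(utterance | act) stands in for the word n-gram, decision-tree or neural-network models. The three act labels, the bigram values and the likelihoods are hypothetical. In practice the likelihoods would come from trained lexical and prosodic models.

```python
import math
from typing import Dict, List, Tuple

ACTS = ["question", "answer", "statement"]

# Hypothetical dialogue-act bigram P(act | previous act); "<s>" marks the conversation start.
DA_BIGRAM: Dict[str, Dict[str, float]] = {
    "<s>":       {"question": 0.5, "answer": 0.1, "statement": 0.4},
    "question":  {"question": 0.1, "answer": 0.7, "statement": 0.2},
    "answer":    {"question": 0.4, "answer": 0.1, "statement": 0.5},
    "statement": {"question": 0.4, "answer": 0.1, "statement": 0.5},
}


def viterbi(likelihoods: List[Dict[str, float]]) -> List[str]:
    """Most likely act sequence; likelihoods[t][act] = P(utterance t | act),
    supplied by the caller (e.g. from word n-grams or a prosodic classifier)."""
    # best[act] = (log probability of the best path ending in act, that path)
    best: Dict[str, Tuple[float, List[str]]] = {"<s>": (0.0, [])}
    for lik in likelihoods:
        best = {
            act: max(
                (score + math.log(DA_BIGRAM[prev][act]) + math.log(lik[act]), path + [act])
                for prev, (score, path) in best.items()
            )
            for act in ACTS
        }
    return max(best.values())[1]


# "Is the account open?" then "Yes.": lexically, "Yes." fits answer and statement equally.
utterances = [
    {"question": 0.8, "answer": 0.05, "statement": 0.15},
    {"question": 0.05, "answer": 0.5, "statement": 0.5},
]
print(viterbi(utterances))  # ['question', 'answer']
```

The second utterance is ambiguous on lexical evidence alone. The dialogue grammar resolves it, because an answer is much more likely than a statement right after a question.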

33
Training Set
  • Typical transaction transcripts for the
    application form the training set.
  • Dialogues between the system and the user are
    recorded and transcribed.
  • Each sentence from the user (utterance) is
    hand-classified. Classes include yes-no question,
    yes-answer, etc.

34
Dialogue Management
  • Determine system goals in response to specific
    user utterances in carrying out the intent of the
    user
  • Interpretation of user input in context.
  • Maintenance of discourse context.
  • Planning the content of the system responses.
  • Managing problem solving and planning.
  • Interface between user and system knowledge base.

35
Response Generation
  • Generate natural language utterances to achieve
    specific tasks.
  • Content selection: determine what to say.
  • Utterance realization: determine how to say it.
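The two steps can be kept separate in code. Content selection picks the facts relevant to the user's current goal, and realization turns them into words. The sketch uses simple templates for realization, one common and deliberately minimal choice. The goals, record fields and templates are hypothetical, with values echoing the fixed-deposit example.

```python
from typing import Dict, List

# Hypothetical back-end record for a fixed-deposit product.
RECORD = {"duration_months": 3, "rate_percent": 6, "min_amount": 1000}

# Content selection: which facts are relevant to each (hypothetical) user goal.
RELEVANT: Dict[str, List[str]] = {
    "ask_rate": ["duration_months", "rate_percent"],
    "ask_minimum": ["min_amount"],
}
# Utterance realization: one surface template per goal.
TEMPLATES = {
    "ask_rate": "The interest rate for {duration_months} months is {rate_percent} percent.",
    "ask_minimum": "The minimum deposit is {min_amount} rupees.",
}


def select_content(goal: str, record: Dict[str, int]) -> Dict[str, int]:
    """Decide what to say: keep only the facts the current goal needs."""
    return {key: record[key] for key in RELEVANT[goal]}


def realize(goal: str, content: Dict[str, int]) -> str:
    """Decide how to say it: fill the goal's template with the selected facts."""
    return TEMPLATES[goal].format(**content)


print(realize("ask_rate", select_content("ask_rate", RECORD)))
# The interest rate for 3 months is 6 percent.
```

Templates are easy to control but repetitive. Because the two stages are separate, the realizer can be replaced by a grammar-based generator later without touching content selection.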

36
Application Specific Needs
  • Dictionary
  • Domain concepts
  • Grammar
  • Dialog objects

37
Spoken Dialogue Systems
38
Dialogue Based Systems
  • IBM:
    http://www.software.ibm.com/speech/overview/business/demo.html
  • Nuance: http://www.nuance.com/demos/demos.html
  • MIT Spoken Language Systems Laboratory:
    http://www.sls.lcs.mit.edu/sls/whatwedo/applications.html
  • SpeechWorks: http://www.speechworks.com/demos/index.cfm
  • AT&T: http://www.research.att.com/algor/hmihy/
  • CMU Communicator:
    http://fife.speech.cs.cmu.edu/Communicator/

39
References
40
References
  • Tutorial on Spoken Dialogue Systems,
    http://www.colloquial.com/carp/Publications/acl99.ppt
  • Tutorial on IVR,
    http://www.iec.org/online/tutorials/speech_enabled/
  • 1997 Summer Workshop at CLSP/JHU, Discourse
    Language Modeling Project,
    http://www.colorado.edu/ling/jurafsky/ws97/
  • James F. Allen, Donna K. Byron, Myroslava
    Dzikovska, George Ferguson, Lucian Galescu and
    Amanda Stent, "Towards Conversational
    Human-Computer Interaction," AI Magazine, 2001.
  • V. Zue, S. Seneff, J. Glass, J. Polifroni, C.
    Pao, T. Hazen and L. Hetherington, "JUPITER: A
    Telephone-Based Conversational Interface for
    Weather Information," IEEE Trans. on Speech and
    Audio Processing, 8(1), 2000.

41
References (Cont.)
  • J. F. Allen, B. W. Miller, E. K. Ringger and T.
    Sikorski, "Robust understanding in a dialogue
    system," Proc. 34th Annual Meeting of the
    Association for Computational Linguistics, June
    1996.