1
Voice XML
  • Team 1
  • Matt Ganis, Jonathan Hill, Henry Wong
  • Anne I. Mannette-Wright

2
Agenda
  • History of Voice Applications and VoiceXML
  • Related Voice Type Languages
  • Advantages of VoiceXML
  • Architecture of VoiceXML
  • Paper 1
  • Paper 2
  • Paper 3
  • Demonstration
  • VoiceXML 2.0
  • Differences between VoiceXML 1.0 and 2.0
  • The Future: VoiceXML 2.1

3
History of Voice Applications
  • Voice technologies emerged in the 1990s:
  • Automatic Speech Recognition (ASR)
  • Small-vocabulary speech recognition problems
    were solved
  • Text-to-Speech (TTS) systems
  • Can generate speech responses on the fly
  • Interactive Voice Response (IVR) applications

4
History of Voice Applications
  • IVRs became programmable, but programmable IVRs
    were:
  • Difficult to program (call scripting was often
    vendor-specific), so each vendor had to reinvent
    the wheel
  • Hard to port: the proprietary nature of IVRs
    prevented easy movement of an application from
    one IVR to another

5
History of VoiceXML
  • 1995: AT&T started work on Phone Markup Language
    (PML)
  • Oct. 1998: Motorola developed VoxML (Voice Markup
    Language)
  • Feb. 1999: IBM developed SpeechML
  • Mar. 1999: the VoiceXML Forum was formed by IBM,
    AT&T, Lucent, and Motorola
  • Its mission was to design a standard dialog
    design language that developers could use to
    build conversational applications
  • Mar. 2000: the VoiceXML Forum released VoiceXML
    1.0 to the general public
  • May 2000: VoiceXML 1.0 was submitted to the W3C
    and published as a W3C Note

6
W3C Speech Interface Framework
From McGlashan, Scott, "VoiceXML 2.0 from the
Inside," retrieved from
www.voicexmlreview.org/Dec2001/features/inside.html

7
Related Voice Type Languages
  • Related to VoiceXML:
  • Grammar XML (grXML)
  • The XML form of the W3C Speech Recognition
    Grammar Specification; provides speech grammars
    used by speech recognition engines
  • Speech Synthesis Markup Language (SSML)
  • The SSML specification is based upon the JSpeech
    Markup Language (JSML) and JSpeech Grammar Format
    (JSGF) specifications, which are owned by Sun
  • Became a W3C Recommendation at Version 1.0 in
    September 2004
  • Provides a standardized way of specifying how
    text is rendered as speech, with tags for
    pronunciation, tone, inflection, etc.
  • Often embedded in VoiceXML scripts to drive
    interactive telephony systems
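Here is a minimal sketch of how SSML and grXML can appear inside a VoiceXML document. It assumes a VoiceXML 2.0 platform, because VoiceXML 1.0 did not mandate either format. The form name, prompt wording, and drink list are illustrative only. How each SSML element actually sounds depends on the platform's synthesizer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="drinks">
    <field name="drink">
      <prompt>
        <!-- SSML elements used inside a VoiceXML 2.0 prompt -->
        Welcome to the <emphasis>coffee bar</emphasis>.
        <break time="500ms"/>
        <prosody rate="slow">What would you like to drink?</prosody>
      </prompt>
      <!-- Inline grammar in the XML form of the Speech Recognition Grammar Specification (grXML) -->
      <grammar version="1.0" root="drink" mode="voice">
        <rule id="drink" scope="public">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
            <item>orange juice</item>
            <item>milk</item>
          </one-of>
        </rule>
      </grammar>
    </field>
    <block>
      <!-- With no semantic tags, the field's value is the recognized words -->
      <prompt>You asked for <value expr="drink"/>.</prompt>
    </block>
  </form>
</vxml>
```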

8
Related Voice Type Languages
  • Related to VoiceXML (continued):
  • Call Control XML (CCXML)
  • W3C markup language (Version 1.0) for
    controlling telephone calls and telephony
    equipment
  • Performs tasks such as setting up conference
    calls and transferring incoming calls
  • Works hand-in-hand with VoiceXML
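The sketch below shows one way CCXML and VoiceXML can divide the work. The CCXML document answers an incoming call, hands the caller to a VoiceXML dialog, and hangs up when that dialog ends. It follows the event names of the W3C CCXML 1.0 specification; earlier drafts named events differently. The dialog URI welcome.vxml is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <eventprocessor>
    <!-- An incoming call is ringing: answer it -->
    <transition event="connection.alerting">
      <accept/>
    </transition>
    <!-- Call answered: start a VoiceXML dialog on this connection (hypothetical URI) -->
    <transition event="connection.connected">
      <dialogstart src="'welcome.vxml'"/>
    </transition>
    <!-- The VoiceXML dialog has finished: hang up -->
    <transition event="dialog.exit">
      <disconnect/>
    </transition>
    <!-- The call is gone: end this CCXML session -->
    <transition event="connection.disconnected">
      <exit/>
    </transition>
  </eventprocessor>
</ccxml>
```

CCXML reacts to call-level events, while the VoiceXML dialog it starts carries on the actual conversation.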

9
Architecture of VoiceXML
From http://www.w3.org/TR/voicexml/, Voice
eXtensible Markup Language (VoiceXML) version 1.0

10
Advantages of VoiceXML
  • VoiceXML is a markup language that:
  • Minimizes client/server interactions by
    specifying multiple interactions per document
  • Shields application authors from low-level and
    platform-specific details
  • Separates user interaction code (in VoiceXML)
    from service logic (e.g., CGI scripts)
  • Promotes service portability across
    implementation platforms. VoiceXML is a common
    language for content providers, tool providers,
    and platform providers.
  • Is easy to use for simple interactions, and yet
    provides language features to support complex
    dialogs.
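This sketch illustrates the first and third points. It assumes a VoiceXML 2.0 browser and a hypothetical server-side script at /cgi-bin/balance.cgi. The document collects two values from the caller with no server round trip between them, then sends both to the service logic in a single request.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="balance">
    <!-- Both interactions are handled by the voice browser without contacting the server -->
    <field name="account" type="digits">
      <prompt>Please say or key in your account number.</prompt>
    </field>
    <field name="pin" type="digits">
      <prompt>Now enter your PIN.</prompt>
    </field>
    <!-- One request carries the collected values to the (hypothetical) service logic -->
    <block>
      <submit next="/cgi-bin/balance.cgi" namelist="account pin" method="post"/>
    </block>
  </form>
</vxml>
```

The VoiceXML document only manages the conversation. Looking up the balance is left entirely to the script, which would reply with another VoiceXML document.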

11
Paper 1
  • "VoiceXML for Web-based Distributed
    Conversational Applications," by Bruce Lucas
  • Presents an introduction to VoiceXML
  • Compares VoiceXML with HTML
  • Describes VoiceXML's support for natural dialogue

12
Paper 1
  • VoiceXML is an XML application, which brings
    the following benefits:
  • Existing tools for creating, transforming, and
    parsing XML documents can be reused and easily
    retooled
  • VoiceXML can make use of other complementary
    XML-based standards, for example the Java Speech
    Markup Language for speech synthesis
  • A form is VoiceXML's basic dialogue unit. A form:
  • Contains a set of inputs (fields)
  • Specifies what to do with the fields after the
    data is collected
  • A field includes a prompt and a specification of
    what the user is allowed to say

13
Paper 1 - VoiceXML Code Example
    <?xml version="1.0"?>
    <vxml version="1.0">
      <menu>
        <!-- <enumerate/> reads out the list of choices -->
        <prompt>Say one of: <enumerate/></prompt>
        <choice next="http://www.sports.example/sports.vxml">
          Sports scores
        </choice>
        <choice next="http://www.weather.example/weather.vxml">
          Weather information
        </choice>
        <!-- "#login" jumps to the form below in this same document -->
        <choice next="#login">
          Log in
        </choice>
      </menu>
      <form id="login">
        <field name="phone_number" type="phone">
          <prompt>Please say your complete phone number</prompt>
        </field>
      </form>
    </vxml>

14
Paper 1
  • VoiceXML includes built-in field types for common
    inputs, including number, digits, phone, date, and
    time, and also supports user-specified fields
    defined by grammars

    <form>
      <field name="drink">
        <prompt>What would you like to drink?</prompt>
        <!-- Inline grammar: the caller may say any one of these alternatives -->
        <grammar>
          coffee | tea | orange juice | milk | nothing
        </grammar>
      </field>
      <field name="sandwich">
        <prompt>What sandwich would you like?</prompt>
        <!-- Grammar kept in an external file -->
        <grammar src="sandwiches.gram"/>
      </field>
      <block>
        <!-- Hand off to the server-side order servlet -->
        <submit next="/servlet/order"/>
      </block>
    </form>
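The next sketch stays close to Lucas's example but uses only built-in field types, so no grammar has to be written. It assumes a VoiceXML 2.0 platform that supports these built-in types. The form, the field names, and the /servlet/reserve URI are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="reservation">
    <!-- Each type attribute selects a platform-supplied grammar -->
    <field name="guests" type="number">
      <prompt>How many people are in your party?</prompt>
    </field>
    <field name="day" type="date">
      <prompt>For what date?</prompt>
    </field>
    <field name="hour" type="time">
      <prompt>At what time?</prompt>
    </field>
    <field name="callback" type="phone">
      <prompt>What number can we call you back on?</prompt>
    </field>
    <field name="confirm" type="boolean">
      <prompt>Shall I book it?</prompt>
    </field>
    <block>
      <!-- A boolean field holds true or false -->
      <if cond="confirm">
        <submit next="/servlet/reserve" namelist="guests day hour callback"/>
      <else/>
        <prompt>Okay, nothing was booked.</prompt>
      </if>
    </block>
  </form>
</vxml>
```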

15
Paper 1 - The Distributed Model
From Lucas, Bruce, "VoiceXML for Web-Based
Distributed Conversational Applications,"
Communications of the ACM, Vol. 43, No. 9,
September 2000.
  • VoiceXML provides support for advanced features
    such as:
  • Local validation and processing
  • Audio playback and recording
  • Context-specific and tapered help, and reusable
    subdialogues
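This is a minimal sketch of local validation and tapered help, assuming VoiceXML 2.0. The range check in `<filled>` runs in the voice browser, with no server round trip. The `count` attribute makes each successive help request produce a more detailed message. The ticket scenario and the 1 to 10 limit are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="tickets">
    <!-- Activate the universal "help" grammar so that saying "help" throws a help event -->
    <property name="universals" value="help"/>
    <field name="quantity" type="number">
      <prompt>How many tickets would you like?</prompt>
      <!-- Tapered help: the message grows more detailed each time help is requested -->
      <help count="1">Please say a number.</help>
      <help count="2">Say a number from one to ten, for example, four.</help>
      <noinput>Sorry, I did not hear you. <reprompt/></noinput>
      <!-- Local validation: checked by the voice browser, no server round trip -->
      <filled>
        <if cond="quantity &lt; 1 || quantity &gt; 10">
          <prompt>You can order from one to ten tickets.</prompt>
          <!-- Clearing the field makes the browser ask for it again -->
          <clear namelist="quantity"/>
        </if>
      </filled>
    </field>
    <block>
      <prompt>You asked for <value expr="quantity"/> tickets.</prompt>
    </block>
  </form>
</vxml>
```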

16
Paper 1 - VoiceXML Compared with HTML
  • An HTML document is a single unit, specified by a
    URI and presented to the user all at once
  • A VoiceXML document contains a number of dialogue
    units (menus or forms) presented sequentially
  • HTML has no markup to identify distinct dialogue
    units
  • A VoiceXML document is structured to reflect the
    sequential nature of the voice medium
  • An HTML document is like one single dialogue
  • A VoiceXML document needs separate dialogue
    elements so that they can be presented one at a
    time
  • VoiceXML has application logic for sequencing
    among dialogue units
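To illustrate sequencing among dialogue units, here is a sketch in VoiceXML 2.0 syntax with invented content. One document holds a menu and three forms. The browser starts with the first dialog and moves between the others through document-local URIs such as #news.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Dialog 1: presented first because it comes first in the document -->
  <menu id="main">
    <prompt>Say news, weather, or goodbye.</prompt>
    <choice next="#news">news</choice>
    <choice next="#weather">weather</choice>
    <choice next="#bye">goodbye</choice>
  </menu>
  <!-- Dialog 2: plays its content, then returns to the menu -->
  <form id="news">
    <block>
      <prompt>Here are today's headlines.</prompt>
      <goto next="#main"/>
    </block>
  </form>
  <!-- Dialog 3 -->
  <form id="weather">
    <block>
      <prompt>Expect sunshine this afternoon.</prompt>
      <goto next="#main"/>
    </block>
  </form>
  <!-- Dialog 4: ends the session -->
  <form id="bye">
    <block>
      <prompt>Goodbye.</prompt>
      <exit/>
    </block>
  </form>
</vxml>
```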

17
Paper 1 - Support for Natural Dialogue
  • VoiceXML supports directed and mixed-initiative
    dialogues
  • Directed dialogues: the computer directs the
    conversation at each step by prompting the user
    for the next piece of information
  • Example: C: On what date do you wish to fly?
  • H: May 6th
  • Mixed-initiative dialogues: each participant
    can take the initiative in leading the
    conversation. VoiceXML does this by allowing
    input grammars to be specified at the form level
  • C: How can I help you?
  • H: I'd like to fly from New York on May 8th
  • C: Where would you like to fly to?
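The flight dialogue above can be sketched as a mixed-initiative VoiceXML 2.0 form. The sketch makes three assumptions:
- a hypothetical form-level grammar, flight.grxml, that returns slots named origin, destination, and travel_date;
- a hypothetical city grammar, city.grxml;
- a hypothetical server script, /cgi-bin/flights.

Suppose the caller answers the opening prompt with "from New York on May 8th". The form-level grammar fills origin and travel_date. The form interpretation algorithm then visits the only empty field and asks where the caller wants to fly to.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="flight">
    <!-- Form-level grammar (hypothetical): one utterance may fill several fields -->
    <grammar src="flight.grxml" type="application/srgs+xml"/>
    <!-- Open-ended opening prompt for mixed initiative -->
    <initial name="start">
      <prompt>How can I help you?</prompt>
      <nomatch count="1">Please say something like: from New York to Boston on May 8th.</nomatch>
    </initial>
    <!-- Directed fallbacks: visited only if still empty -->
    <field name="origin">
      <prompt>Where are you flying from?</prompt>
      <grammar src="city.grxml" type="application/srgs+xml"/>
    </field>
    <field name="destination">
      <prompt>Where would you like to fly to?</prompt>
      <grammar src="city.grxml" type="application/srgs+xml"/>
    </field>
    <field name="travel_date" type="date">
      <prompt>On what date do you wish to fly?</prompt>
    </field>
    <!-- Runs once every field has a value -->
    <filled mode="all">
      <submit next="/cgi-bin/flights" namelist="origin destination travel_date"/>
    </filled>
  </form>
</vxml>
```

If the caller answers the opening prompt with no match, the `<initial>` item repeats. If the caller gives no useful answer at all, the fields are asked one by one as a directed dialogue.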

18
Paper 2
  • Concepts of Programming by Voice
  • Motivated by the need to program without typing,
    thereby preventing repetitive stress injuries
    (RSI), a common injury among those who spend long
    hours typing
  • Voice-activated software for the disabled is a
    prime motivator in development
  • The paper proposes a system that creates an
    environment for voice-activated programming

19
Paper 2
  • The cost of such software has fallen
    dramatically:
  • $7,500 in 1998
  • $100 in 2005
  • Products include:
  • Dragon NaturallySpeaking
  • IBM ViaVoice
  • Lernout & Hauspie Voice Xpress

20
Paper 2
  • The authors developed a generator called
    VocalGenerator using Dragon NaturallySpeaking
    with MS Visual C++
  • Input: a context-free grammar (compatible with
    most programming languages)
  • Output: an environment in which a program can be
    written by voice input alone, using
    syntax-directed voice recognition
  • Allows for better recognition and selection of
    sections of code

21
Paper 2
  • Evaluation of the product:
  • Programming is faster using a syntax-directed
    voice recognition system than a natural-language
    DVR
  • A programmer suffering from repetitive stress
    injuries will be able to program at a speed
    sufficient to maintain competitive employment

22
Paper 3
  • Paper 3 focuses on v-commerce through a survey
    of VoiceXML applications for business
    communication
  • Looks at the inherent risks in human-to-human
    communication and the challenges these pose to
    human-to-computer communication
  • Examines speech recognition
  • Seeks to leverage the global predominance of
    telephone usage

23
Paper 3
  • Uses the W3C Voice Browser Working Group
    design criteria, including:
  • Consistency
  • Interoperability
  • Generality
  • Internationalization
  • Generalization and Readability
  • Implementation

24
Paper 3
  • Looks at the potential for a voice-activated Web
    interface
  • Looks at a transactional communication model
    with six phases:
  • Sender has an idea
  • Sender transforms the idea into a message
  • Sender transmits a message
  • Receiver gets the message
  • Receiver interprets the message
  • Receiver reacts and sends feedback

25
Paper 3
  • Challenges include:
  • Unproven business models
  • Business process change requirements
  • Channel conflicts
  • Technology hurdles
  • Legal issues
  • Security and privacy

26
Paper 3
  • Conclusions:
  • Speech is natural, flexible, and efficient
  • Voice technology will improve
  • Voice recognition capabilities will improve
  • The intersection of voice recognition, telecom,
    and Web technologies may create a large market
    for products that take advantage of it

27
Demo
  • Using Tellme Studio (http://studio.tellme.com)
  • Tellme Studio provides resources to build and
    test your own Internet-powered "phone sites"
    with nothing but a Web browser and an ordinary
    telephone, in the following ways:
  • Type VoiceXML directly into an area called the
    Scratchpad, then call the Studio phone number to
    preview the code
  • Publish the VoiceXML and audio files on a
    publicly accessible Web server, point Studio at
    the URL of your application's "home page", and
    once again call the Studio phone number to
    preview the application
  • Browse and leverage an extensive library of
    sample code, grammars, audio, and VoiceXML
    documentation
  • Participate in the Voice Web development
    community through open newsgroups

28
Demo (Continued)
  • This demo, Drink Recipes I, uses one of the
    prebuilt VoiceXML scripts available from the
    Tellme Studio Code Library
  • This version of Drink Recipes:
  • asks the caller for a drink name
  • in response, plays back the drink's ingredients
    list and mixing instructions
  • demonstrates the use of large grammars and how to
    create data-driven applications

29
VoiceXML 2.0
From McGlashan, Scott, "VoiceXML 2.0 from the
Inside," retrieved from
http://www.voicexmlreview.org/Dec2001/features/inside.html

30
Differences Between VoiceXML 1.0 and 2.0
  • VoiceXML 2.0 differs from 1.0 in three areas:
  • Interoperability
  • Functional Completeness
  • Clarity

31
VoiceXML 2.0
  • Interoperability: VoiceXML 2.0 requires the
    following formats, so developers can expect
    their applications to run on any VoiceXML
    platform conforming to the VoiceXML 2.0
    specification:
  • Input: the XML format of the Speech Recognition
    Grammar Specification, for speech and DTMF input.
    VoiceXML 1.0 did not require any particular
    speech grammar format
  • Output: Speech Synthesis Markup Language (SSML)
    is used for text-to-speech and audio output.
    VoiceXML 1.0 did not use SSML, and its speech
    markup elements are not supported in VoiceXML
    2.0

32
VoiceXML 2.0
  • Interoperability (continued):
  • Protocol: the HTTP protocol for fetching
    documents and resources is required. VoiceXML
    1.0 did not require support for HTTP
  • Audio: audio formats recommended for support in
    VoiceXML 1.0 are now required in VoiceXML 2.0

33
VoiceXML 2.0
  • Functional completeness: VoiceXML 2.0 adds new
    elements, attributes, and variables so that key
    aspects of the cycle of generating system output,
    interpreting user input, and transitioning from
    one dialog to another are fully described
  • NOTE: VoiceXML 1.0 contained gaps, for example
    in defining when prompts were played to the user
  • Some of the new or enhanced elements, variables,
    and features include:
  • The application.lastresult$ variable provides
    information about the last recognition in the
    application
  • The <log> element generates a debug message
  • The <throw> and <catch> elements are enhanced to
    provide more information
  • The <audio> element is enhanced with an expr
    attribute
  • <menu> is enhanced with an accept attribute
  • Enhanced support for greater control over
    universal grammars
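This small sketch exercises several of the listed VoiceXML 2.0 additions together:
- a `<menu>` with accept="approximate";
- a `<log>` message built from application.lastresult$;
- an `<audio>` element whose source is computed through `expr`.

The audio file names are hypothetical. Where the `<log>` output is written depends on the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- accept="approximate": the caller may say only part of a choice phrase, e.g. "weather" -->
  <menu id="main" accept="approximate">
    <prompt>Say sports scores or weather forecast.</prompt>
    <choice next="#sports">sports scores</choice>
    <choice next="#weather">weather forecast</choice>
  </menu>
  <form id="sports">
    <var name="clip" expr="'sports_today.wav'"/>
    <block>
      <!-- <log> writes a debugging message; it is not played to the caller -->
      <log>Heard: <value expr="application.lastresult$.utterance"/>,
        confidence <value expr="application.lastresult$.confidence"/></log>
      <!-- The audio source is taken from an ECMAScript expression at run time;
           the text is played instead if the file cannot be fetched -->
      <prompt><audio expr="clip">Sports scores are not available right now.</audio></prompt>
    </block>
  </form>
  <form id="weather">
    <block>
      <prompt><audio expr="'weather_today.wav'">The forecast is not available right now.</audio></prompt>
    </block>
  </form>
</vxml>
```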

34
VoiceXML 2.0
  • Clarity: VoiceXML 2.0 provides a clear
    description and interpretation of ALL elements
    (and their attributes), how they interact with
    one another, and their expected behavior
  • NOTE: VoiceXML 1.0 contains omissions and
    contradictions in this respect
  • Some clarifications include:
  • Subdialogs: the <subdialog> description is
    clarified
  • Root and leaf documents: explicitly defined
  • Prompt queueing and input collection: the
    relationship between the two is clarified
  • The relationship between VoiceXML 2.0 and
    ECMAScript variables is clarified
  • Conformance requirements for VoiceXML documents
    and VoiceXML processors are clarified
  • VoiceXML 2.0 is aligned with the Speech Grammar
    and Speech Synthesis specifications
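Here is a minimal VoiceXML 2.0 sketch of the subdialog mechanism: a reusable yes/no question implemented as a subdialog.
- `<param>` passes the question in and initializes the subdialog's form-level variable of the same name.
- `<return>` sends the answer back to the caller.

The dialog names and wording are illustrative. Placing both forms in one document is a simplification; a reusable subdialog would usually live in its own document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <!-- Call the reusable dialog; its returned variables appear as properties of "result" -->
    <subdialog name="result" src="#confirm_yes_no">
      <param name="question" expr="'Do you want to hear your balance?'"/>
      <filled>
        <if cond="result.answer">
          <prompt>Please hold while we look up your balance.</prompt>
        <else/>
          <prompt>Goodbye.</prompt>
        </if>
      </filled>
    </subdialog>
  </form>

  <!-- Reusable subdialog: asks a yes/no question and returns the answer -->
  <form id="confirm_yes_no">
    <!-- Initialized from the caller's <param> of the same name -->
    <var name="question"/>
    <field name="answer" type="boolean">
      <prompt><value expr="question"/></prompt>
      <filled>
        <return namelist="answer"/>
      </filled>
    </field>
  </form>
</vxml>
```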

35
VoiceXML 2.1
  • VoiceXML 2.1 was released by the W3C as a
    Candidate Recommendation on June 13, 2005
  • VoiceXML 2.1 proposes eight enhancements to
    VoiceXML 2.0:
  • Referencing grammars dynamically
  • Referencing scripts dynamically
  • Using <mark> to detect barge-in during prompt
    playback
  • Using <data> to fetch XML without requiring a
    dialog transition
  • Concatenating prompts dynamically using
    <foreach>
  • Recording user utterances while attempting
    recognition
  • Adding namelist to <disconnect>
  • Adding type to <transfer>
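This sketch shows two of the VoiceXML 2.1 additions working together. It assumes a 2.1 platform and a hypothetical headlines.xml whose root element contains `<headline>` children holding plain text.
1. `<data>` fetches the XML into a DOM that ECMAScript can read, without leaving the form.
2. A short script copies the headline text into an array.
3. `<foreach>` builds a single prompt from that array.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="headlines">
    <!-- Fetch XML into the variable "feed" without a dialog transition (hypothetical URI) -->
    <data name="feed" src="headlines.xml"/>
    <script>
      <![CDATA[
        // Copy the text of each <headline> element into an ECMAScript array
        var items = [];
        var nodes = feed.getElementsByTagName("headline");
        for (var i = 0; i < nodes.length; i++) {
          items.push(nodes.item(i).firstChild.data);
        }
      ]]>
    </script>
    <block>
      <!-- <foreach> concatenates one spoken item per array element -->
      <prompt>
        <foreach item="h" array="items">
          <value expr="h"/> <break time="300ms"/>
        </foreach>
      </prompt>
    </block>
  </form>
</vxml>
```

In VoiceXML 2.0, the same result would typically require a server round trip that returns a freshly generated document.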

36
References
  • Ali, Sanwar; Albohali, Mohamed; Wibowo, Kustim,
    "VoiceXML for Business Applications: A Survey,"
    First Annual ABIT Conference, May 3-5, 2001,
    Pittsburgh, Pennsylvania.
  • Arnold, Stephen A.; Mark, Leo; Goldthwaite, John,
    "Programming by Voice, VocalProgramming,"
    ASSETS '00, November 13-15, 2000, Arlington,
    Virginia.
  • Lucas, Bruce, "VoiceXML for Web-based Distributed
    Conversational Applications," Communications of
    the ACM, Vol. 43, No. 9, September 2000,
    pp. 53-57.
  • "Voice eXtensible Markup Language (VoiceXML)
    Version 1.0," http://www.w3.org/TR/voicexml/
  • "Voice eXtensible Markup Language (VoiceXML)
    Version 2.0," http://www.w3.org/TR/voicexml/
  • "Voice eXtensible Markup Language (VoiceXML)
    Version 2.1," http://www.w3.org/TR/voicexml/
  • https://studio.tellme.com/vxml2/ovw/migrating21.html
  • http://www.voicexmlreview.org/Dec2001/features/inside-full.html
  • McGlashan, Scott, "VoiceXML 2.0 from the Inside,"
    retrieved from
    www.voicexmlreview.org/Dec2001/features/inside.html