Transcript and Presenter's Notes

Title: Voice Browsers


1
Voice Browsers
General Magic Demo
  • Making the Web accessible to more of us, more of
    the time.

SDBI November 2001, Shani Shalgi
2
What is a Voice Browser?
  • Expanding access to the Web
  • Will allow any telephone to be used to access
    appropriately designed Web-based services
  • Server-based
  • Voice portals

3
What is a Voice Browser?
  • Interaction via key pads, spoken commands,
    listening to prerecorded speech, synthetic speech
    and music.
  • An advantage for people with visual impairments
  • Web access while keeping hands and eyes free for
    other things (e.g. driving).

4
What is a Voice Browser?
  • Mobile Web
  • Naturalistic dialogs with Web-based services.

5
Motivation
  • Far more people today have access to a telephone
    than have access to a computer with an Internet
    connection.
  • Many of us have already or soon will have a
    mobile phone within reach wherever we go.

6
Motivation
  • Easy to use - for people with no knowledge or
    fear of computers.
  • Voice interaction can escape the physical
    limitations on keypads and displays as mobile
    devices become ever smaller.

7
Motivation
  • Many companies offer services over the phone
    via menus traversed using the phone's keypad.
    Voice Browsers are the next generation of call
    centers, which will become Voice Web portals to
    the company's services and related websites,
    whether accessed via the telephone network or via
    the Internet.

8
Motivation
  • Disadvantages to existing methods
  • WAP (Cellular phones, Palm Pilots)
  • Small screens
  • Access Speed
  • Limited or fragmented availability
  • Awkward input
  • Price
  • Lack of user habit

9
Differences Between Graphical and Voice Browsing
  • Graphical browsing is more passive due to the
    persistence of the visual information
  • Voice browsing is more active since the user has
    to issue commands.
  • Graphical Browsers are client-based, whereas
    Voice Browsers are server-based.

10
Possible Applications
  • Accessing business information
  • The corporate "front desk" which asks callers
    who or what they want
  • Automated telephone ordering services
  • Support desks
  • Order tracking
  • Airline arrival and departure information
  • Cinema and theater booking services
  • Home banking services

11
Possible Applications (2)
  • Accessing public information
  • Community information such as weather, traffic
    conditions, school closures, directions and
    events
  • Local, national and international news
  • National and international stock market
    information
  • Business and e-commerce transactions

12
Possible Applications (3)
  • Accessing personal information
  • Voice mail
  • Calendars, address and telephone lists
  • Personal horoscope
  • Personal newsletter
  • To-do lists, shopping lists, and calorie counters

13
Advancing Towards Voice
  • Until now, speech recognition and synthesis
    technologies had to be handcrafted into
    applications.
  • Voice Browsers aim to build these voice
    technologies directly into Web servers.
  • This demands transformation of Web content into
    formats better suited to the needs of voice
    browsing or authoring content directly for voice
    browsers.

14
  • The World Wide Web Consortium (W3C) develops
    interoperable technologies (specifications,
    guidelines, software, and tools) to lead the Web
    to its full potential as a forum for information,
    commerce, communication, and collective
    understanding.

15
W3C Speech Interface Framework
  • Pronunciation Lexicon
  • Call Control
  • Voice Browser Interoperation
  • VoiceXML
  • Speech Synthesis
  • Speech Recognition
  • DTMF Grammars
  • Speech Grammars
  • Stochastic (N-Gram) Language Models
  • Semantic Interpretation

16
VoiceXML
  • VoiceXML is a dialog markup language designed for
    telephony applications, where users are
    restricted to voice and DTMF (touch tone) input.

[Diagram: a graphical browser fetches text.html over
the Internet from a Web server, while a voice browser
fetches text.vxml from the same server]
17
Speech Synthesis
  • The specification defines a markup language for
    prompting users via a combination of prerecorded
    speech, synthetic speech and music. You can
    select voice characteristics (name, gender and
    age) and the speed, volume, pitch, and emphasis.
    There is also provision for overriding the
    synthesis engine's default pronunciation.

18
Speech Recognition
[Diagram: the user's speech input is constrained by
speech grammars or stochastic language models, and
touch-tone input by DTMF grammars; both feed semantic
interpretation]
19
DTMF Grammars
  • Touch tone input is often used as an alternative
    to speech recognition.
  • Especially useful in noisy conditions or when the
    social context makes it awkward to speak.
  • The W3C DTMF grammar format allows authors to
    specify the expected sequence of digits and to
    bind them to the appropriate results (a sketch of
    the idea follows below).
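
The W3C DTMF grammar format itself is XML markup; what
follows is only a minimal Python sketch of the underlying
idea of binding expected digit sequences to results. The
digit sequences and result names are illustrative
assumptions, not taken from the specification.

    # Sketch: bind expected DTMF digit sequences to results.
    # The sequences and result names are illustrative only.
    DTMF_BINDINGS = {
        "1": "driving_directions",
        "2": "latest_news",
        "911": "operator",
    }

    def match_dtmf(digits: str):
        """Return the bound result for an exact digit sequence, or None."""
        return DTMF_BINDINGS.get(digits)

    if __name__ == "__main__":
        print(match_dtmf("2"))    # latest_news
        print(match_dtmf("42"))   # None -> out-of-grammar input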

20
Speech Grammars
  • In most cases, user prompts are very carefully
    designed to encourage the user to answer in a
    form that matches context-free grammar rules.
  • Speech Grammars allow authors to specify rules
    covering the sequences of words that users are
    expected to say in particular contexts. These
    contextual clues allow the recognition engine to
    focus on likely utterances, improving the chances
    of a correct match.

21
Stochastic (N-Gram) Language Models
  • In some applications it is appropriate to use
    open-ended prompts ("How can I help?"). In these
    cases, context-free grammars are of little use.
  • The solution is to use a stochastic language
    model. Such models specify the probability that
    one word occurs following certain others. The
    probabilities are computed from a collection of
    utterances collected from many users.

22
Semantic Interpretation
  • The recognition process matches an utterance to a
    speech grammar, building a parse tree as a
    byproduct.
  • There are two approaches to harvesting semantic
    results from the parse tree
  • 1. Annotating grammar rules with semantic
    interpretation tags (ECMAScript).
  • 2. Representing the result in XML.

23
Semantic Interpretation - Example
  • For example (1st approach), the user utterance
    "I would like a medium coca cola and a large
    pizza with pepperoni and mushrooms."
  • could be converted to the following semantic
    result (a data-structure sketch follows below):
    drink:
      beverage: "coke"
      drinksize: "medium"
    pizza:
      pizzasize: "large"
      topping: "pepperoni", "mushrooms"
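
Whichever of the two approaches is used, the harvested
result is a nested structure. Here is a rough Python
sketch of that structure; the field names simply mirror
the slide and are not mandated by any W3C format.

    # Sketch: the semantic result for the order above, as a nested structure.
    # Field names mirror the slide; they are not a standard schema.
    semantic_result = {
        "drink": {"beverage": "coke", "drinksize": "medium"},
        "pizza": {"pizzasize": "large", "topping": ["pepperoni", "mushrooms"]},
    }

    def summarize(order: dict) -> str:
        """Turn the structured result back into a confirmation prompt."""
        drink, pizza = order["drink"], order["pizza"]
        return (f"One {drink['drinksize']} {drink['beverage']} and a "
                f"{pizza['pizzasize']} pizza with "
                f"{' and '.join(pizza['topping'])}.")

    if __name__ == "__main__":
        print(summarize(semantic_result))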

24
Pronunciation Lexicon
  • Application developers sometimes need the ability
    to tune speech engines, whether for synthesis or
    recognition.
  • W3C is developing a markup language for an open
    portable specification of pronunciation
    information using a standard phonetic alphabet.
  • The most commonly needed pronunciations are for
    proper nouns such as surnames or business names.

25
Call Control
  • Fine-grained control of speech (signal
    processing) resources and telephony resources in
    a VoiceXML telephony platform.
  • Will enable application developers to use markup
    to perform call screening, whisper call waiting,
    call transfer, and more.
  • Can be used to transfer a user from one voice
    browser to another on a completely different
    machine.

26
Voice Browser Interoperation
  • Mechanisms to transfer application state, such as
    a session identifier, along with the user's audio
    connections.
  • The user could start with a visual interaction on
    a cell phone and follow a link to switch to a
    VoiceXML application.
  • The ability to transfer a session identifier
    makes it possible for the Voice Browser
    application to pick up user preferences and other
    data entered into the visual application.

27
Voice Browser Interoperation (2)
  • Finally, the user could transfer from a VoiceXML
    application to a customer service agent.
  • The agent needs the ability to use their console
    to view information about the customer, as
    collected during the preceding VoiceXML
    application. The ability to transfer a session
    identifier can be used to retrieve this
    information from the customer database.

28
Voice Style Sheets?
  • Some extensions are proposed to HTML 4.0 and CSS2
    to support voice browsing
  • Prerecorded content is likely to include music
    and different speakers. These effects can be
    reproduced to some extent via the aural style
    sheets features in CSS2.

29
Voice Style Sheets!
Authors want control over how the document is
rendered. Aural style sheets (part of CSS2)
provide a basis for controlling a range of
features:
  • Volume
  • Rate
  • Pitch
  • Direction
  • Spelling out text letter by letter
  • Speech fonts (male/female, adult/child etc.)
  • Inserted text before and after element content
  • Sound effects and music

30
How Does It Work?
  • How do I connect?
  • Do I speak to the browser or does the browser
    speak to me?
  • What is seen on the screen?
  • How do I enter input?

31
Problems
  • How does the browser understand what I say?
  • How can I tell it what I want?
  • What if it doesn't understand?

32
Overview on Speech Technologies
  • Speech Synthesis
  • Text to Speech
  • Speech Recognition
  • Speech Grammars
  • Stochastic n-gram models
  • Semantic Interpretation

33
What is Speech Synthesis?
  • Generating machine voice by arranging phonemes
    (k, ch, sh, etc.) into words.
  • There are several algorithms for performing
    Speech Synthesis. The choice depends on the task
    they're used for.

34
How is Speech Synthesis Performed?
  • The easiest way is to just record the voice of a
    person speaking the desired phrases.
  • This is useful if only a restricted volume of
    phrases and sentences is used, e.g. schedule
    information of incoming flights. The quality
    depends on the way recording is done.

35
How is Speech Synthesis Performed?
  • Another option is to record a large database of
    words.
  • Requires large memory storage
  • Limited vocabulary
  • No prosodic information
  • More sophisticated but worse in quality are
    Text-To-Speech algorithms.

36
How is Speech Synthesis Performed? Text To Speech
  • Text-To-Speech algorithms split the speech into
    smaller pieces. The smaller the units, the less
    they are in number, but the quality also
    decreases.
  • An often used unit is the phoneme, the smallest
    linguistic unit. Depending on the language used,
    there are about 35-50 phonemes in western
    European languages, i.e. we need only 35-50
    single recordings.
  • Example: "february twenty fifth" →
    f eh b r ax r iy   t w eh n t iy   f ih f th

37
Text To Speech
  • The problem is that combining them into fluent
    speech requires smooth transitions between the
    elements. The intelligibility is therefore lower,
    but the memory required is small.
  • A solution is using diphones. Instead of
    splitting at the transitions, the cut is done at
    the center of the phonemes, leaving the
    transitions themselves intact.

38
Text To Speech
  • This means there are now approximately 1600
    recordings needed (40 × 40).
  • The longer the units become, the more elements
    there are, but the quality increases along with
    the memory required.

39
Text To Speech
  • Other units which are widely used are
    half-syllables, syllables, words, or combinations
    of them, e.g. word stems and inflectional
    endings.
  • TTS is dictionary-driven. The larger the
    dictionary resident in the browser is, the better
    the quality.
  • For unknown words, it falls back on rules for
    regular pronunciation (a lookup sketch follows
    below).
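
A minimal sketch of the dictionary-driven lookup
described above, with a crude rule fallback for unknown
words. The tiny lexicon and the letter-to-phone rule are
illustrative assumptions, not a real TTS front end.

    # Sketch: dictionary-driven grapheme-to-phoneme lookup with a rule fallback.
    # The lexicon entries and the naive fallback rule are illustrative only.
    LEXICON = {
        "february": "f eh b r ax r iy",
        "twenty":   "t w eh n t iy",
        "fifth":    "f ih f th",
    }

    # Naive letter-to-sound fallback: map each letter to a stand-in phone.
    LETTER_TO_PHONE = {c: c for c in "bcdfghjklmnpqrstvwxyz"}
    LETTER_TO_PHONE.update({"a": "ae", "e": "eh", "i": "ih", "o": "ow", "u": "ah"})

    def pronounce(word: str) -> str:
        """Look the word up in the lexicon; fall back to letter-by-letter rules."""
        word = word.lower()
        if word in LEXICON:
            return LEXICON[word]
        return " ".join(LETTER_TO_PHONE.get(c, c) for c in word)

    if __name__ == "__main__":
        print(pronounce("february"))   # found in the lexicon
        print(pronounce("shalgi"))     # unknown word -> rule fallback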

40
Text To Speech
  • Vocabulary is unlimited!!!
  • But what about the prosodic information?
  • Pronunciation depends on the context in which a
    word occurs. Limited linguistic analysis is
    needed.
  • How can I help?
  • Help is on the way!

41
Text To Speech
  • Another example
  • I have read the first chapter.
  • I will read some more after lunch.
  • For these cases, and in the cases of irregular
    words and name pronunciation, authors need a way
    to provide supplementary TTS information and to
    indicate when it applies.

42
Text To Speech
  • But specialized representations for phonemic and
    prosodic information can be off-putting for
    non-specialist users.
  • For this reason it is common to see simplified
    ways to write down pronunciation; for instance,
    the word "station" can be defined as
  • station = stay-shun

43
Text To Speech
  • This approach encourages users to add
    pronunciation information, leading to an increase
    in the quality of spoken documents, compared to
    more complex and harder to learn approaches.
  • This is where W3C comes in
  • Providing a specification to enable consistent
    control (generating, authoring, processing) of
    voice output by speech synthesizers for varying
    speech content, for use in voice browsing and in
    other contexts.

44
Overview on Speech Technologies
  • Speech Synthesis
  • Text to Speech
  • Speech Recognition
  • Speech Grammars
  • Stochastic n-gram models
  • Semantic Interpretation

45-48
Speech Recognition
[Slides 45-48: illustrative figures only]
49
Speech Recognition
  • Automatic speech recognition is the process by
    which a computer maps an acoustic speech signal
    to text.
  • Speech is first digitized and then matched
    against a dictionary of coded waveforms. The
    matches are converted into text.

50
Speech Recognition
  • Types of voice recognition applications
  • Command systems recognize a few hundred words and
    eliminate using the mouse or keyboard for
    repetitive commands.
  • Discrete voice recognition systems are used for
    dictation, but require a pause between each word.
  • Continuous voice recognition understands natural
    speech without pauses and is the most process
    intensive.

51
Speech Recognition
  • A speaker dependent system is developed to
    operate for a single speaker.
  • These systems are usually easier to develop,
    cheaper to buy and more accurate, but not as
    flexible as speaker adaptive or speaker
    independent systems.

52
Speech Recognition
  • A speaker independent system is developed to
    operate for any speaker of a particular type
    (e.g. American English).
  • These systems are the most difficult to develop,
    most expensive and accuracy is lower than speaker
    dependent systems. However, they are more
    flexible.

53
Speech Recognition
  • A speaker adaptive system is developed to adapt
    its operation to the characteristics of new
    speakers. Its difficulty lies somewhere between
    speaker independent and speaker dependent
    systems.

54
Speech Recognition
  • Speech recognition technologies today are highly
    advanced.
  • There is a huge gap between the ability to
    recognize speech and the ability to interpret
    speech.

55
How is Speech Recognition Performed?
  • Speech recognition technology involves complex
    statistical models that characterize the
    properties of sounds, taking into account factors
    such as male vs. female voices, accents, speaking
    rate, background noise, etc.
  • The process of speech recognition includes five
    stages (a rough sketch follows below):
    1. Capture and digital sampling
    2. Spectral representation and analysis
    3. Segmentation
    4. Phonetic modeling
    5. Search and match
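
A toy sketch of the capture, spectral-analysis, and
search-and-match stages on synthetic data. The
"dictionary of coded waveforms" here is just a pair of
random templates, so this only illustrates the matching
idea, not a production recognizer.

    import numpy as np

    # Frame the signal, take magnitude-spectrum features, and match against
    # stored templates. The templates and the nearest-template "search and
    # match" step are toy stand-ins.
    RATE = 8000          # telephone-quality sampling rate
    FRAME = 256          # samples per analysis frame

    def features(signal: np.ndarray) -> np.ndarray:
        """Spectral representation: magnitude FFT of fixed-size frames, averaged."""
        n_frames = len(signal) // FRAME
        frames = signal[: n_frames * FRAME].reshape(n_frames, FRAME)
        return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

    def recognize(signal: np.ndarray, templates: dict) -> str:
        """Search and match: return the label of the closest stored template."""
        feat = features(signal)
        return min(templates, key=lambda w: np.linalg.norm(feat - templates[w]))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Toy "dictionary of coded waveforms": one random utterance per word.
        words = {w: features(rng.standard_normal(RATE)) for w in ("yes", "no")}
        utterance = rng.standard_normal(RATE)     # capture + digital sampling
        print(recognize(utterance, words))        # prints "yes" or "no"
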
56
How is Speech Recognition Performed?
  • Speech Grammars
  • HMM (Hidden Markov Modelling)
  • DTW (Dynamic Time Warping)
  • NNs (Neural Networks)
  • Expert systems
  • Combinations of techniques.
  • HMM-based systems are currently the most
    commonly used and most successful approach.

57
Speech Grammars
  • The grammar allows a speech application to
    indicate to a recognizer what it should listen
    for, specifically
  • Words that may be spoken,
  • Patterns in which those words may occur,
  • Language of the spoken words.

58
Speech Grammars
  • In simple speech recognition/speech understanding
    systems, the expected input sentences are often
    modeled by a strict grammar (such as a CFG).
  • In this case, the user is only allowed to utter
    those sentences that are explicitly covered by
    the grammar.
  • Good for menus, form filling, ordering services,
    etc.

59
Speech Grammars
  • Experience shows that a context-free grammar with
    reasonable complexity can never foresee all the
    different sentence patterns users come up with
    in spontaneous speech input.
  • This approach is therefore not sufficient for
    robust speech recognition/ understanding tasks or
    free text input applications such as dictation.

60
For Example
  • Possible answers to a question may be "Yes" or
    "No," but it could also be any other word used
    for a negative or positive response. It could be
    "Ya," "you betch'ya," "sure," "of course" and
    many other expressions. It is necessary to feed
    the speech recognition engine with likely
    utterances representing the desired response.

61
Speech Grammars
  • What is done?
  • Beta and Pilot versions
  • Upgrade versions

62
Speech Grammars - Example
  • [Example grammar markup; only the word "very"
    survives in this transcript]

63
Speech Grammars - Example
  • [Example grammar continued; surviving words:
    "very", "big", "pizza", "with", "and" - a sketch
    of such a grammar follows below]
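
The surviving words suggest a small pizza-ordering
grammar. Below is a Python sketch of the kind of
constrained phrase matching such a grammar gives the
recognizer; the word lists and the single sentence
pattern are illustrative guesses, not the original
markup.

    import re

    # Sketch of a small constrained grammar for pizza orders.
    # The word lists and the single sentence pattern are illustrative.
    SIZE    = r"(?:very big|big|medium|small)"
    TOPPING = r"(?:pepperoni|mushrooms|olives)"
    PATTERN = re.compile(
        rf"i would like a (?P<size>{SIZE}) pizza"
        rf" with (?P<t1>{TOPPING})(?: and (?P<t2>{TOPPING}))?$"
    )

    def parse(utterance: str):
        """Return the matched slots, or None if the utterance is out of grammar."""
        m = PATTERN.match(utterance.lower())
        if not m:
            return None
        toppings = [t for t in (m.group("t1"), m.group("t2")) if t]
        return {"pizzasize": m.group("size"), "topping": toppings}

    if __name__ == "__main__":
        print(parse("I would like a very big pizza with pepperoni and mushrooms"))
        print(parse("Give me some pizza"))   # None -> not covered by the grammar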

64
Hidden Markov Model
  • Notations:
  • T = observation sequence length
  • O = o_1, o_2, ..., o_T = observation sequence
  • N = number of states (we either know or guess)
  • Q = {q_1, ..., q_N} = finite set of possible
    states
  • M = number of possible observations
  • V = {v_1, v_2, ..., v_M} = finite set of possible
    observations
  • X_t = state at time t (state variable)

65
Hidden Markov Model
  • Distributional parameters:
  • A = {a_ij} where a_ij = P(X_{t+1} = q_j | X_t = q_i)
    (transition probabilities)
  • B = {b_i(k)} where b_i(k) = P(O_t = v_k | X_t = q_i)
    (observation probabilities)
  • π_i = P(X_0 = q_i) (initial state distribution)

66
Hidden Markov Model
  • Definitions:
  • A Hidden Markov Model (HMM) is a five-tuple
    (Q, V, A, B, π).
  • Let λ = (A, B, π) denote the parameters for a
    given HMM with fixed Q and V.

67
Hidden Markov Model
  • Problems:
  • 1. Find P(O | λ), the probability of the
    observations given the model (a sketch of this
    computation follows below).
  • 2. Find the most likely state trajectory
    X = x_1, x_2, ..., x_T given the model and
    observations (find X so that P(O, X | λ) is
    maximized).
  • 3. Adjust the λ parameters to maximize P(O | λ).
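
Problem 1 is classically solved with the forward
algorithm. Here is a small numpy sketch, with a toy
two-state model whose probabilities are made up purely
for illustration.

    import numpy as np

    # Forward algorithm: compute P(O | lambda) for a discrete HMM.
    # The two-state model below is a toy example; its probabilities are made up.
    A  = np.array([[0.7, 0.3],     # transition probabilities a_ij
                   [0.4, 0.6]])
    B  = np.array([[0.9, 0.1],     # observation probabilities b_i(k)
                   [0.2, 0.8]])
    pi = np.array([0.6, 0.4])      # initial state distribution

    def forward(observations):
        """Return P(O | lambda) by summing over all state trajectories."""
        alpha = pi * B[:, observations[0]]      # alpha_1(i) = pi_i * b_i(o_1)
        for o in observations[1:]:
            # alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o)
            alpha = (alpha @ A) * B[:, o]
        return alpha.sum()

    if __name__ == "__main__":
        O = [0, 1, 1, 0]    # an observation sequence over V = {v_0, v_1}
        print(forward(O))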

68
Language Models
  • A Language model is a probability distribution
    over word sequences, e.g.
  • P(And nothing but the truth) ≈ 0.001
  • P(And nuts sing on the roof) ≈ 0

69
The Equation
Notation: W' = argmax_W P(O | W) P(W)
70
The N-Gram (Markovian) Language Model
  • Hard to compute P(W), e.g.
  • P(And nothing but the truth)
  • Step 1: Decompose the probability -
  • P(And nothing but the truth) =
    P(And) × P(nothing | And) ×
    P(but | And nothing) × P(the | And nothing but) ×
    P(truth | And nothing but the)

71
The Trigram Approximation
  • Assume each word depends only on the previous two
    words (three words total: tri means three, gram
    means writing)
  • P(the | whole truth and nothing but) ≈
    P(the | nothing but)
  • P(truth | whole truth and nothing but the) ≈
    P(truth | but the)

72
N-Gram - The Markovian Model
  • The Markovian state machine is an automaton
    with statistical weights
  • A state represents a phoneme, diphone or word.
  • We do not include all options, but only those
    which are related to the context or subject.
  • We calculate all probable paths from beginning to
    end of phrase/word and return the one with the
    maximum probability.

73
Back to Trigrams
  • How do we find the probabilities?
  • Get real text, and start counting! (A counting
    sketch follows below.)
  • P(the | nothing but) ≈
    Count(nothing but the) / Count(nothing but)
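
A minimal sketch of that counting, estimating
P(w3 | w1 w2) from raw text; the toy corpus is made up
for illustration.

    from collections import Counter

    # Estimate trigram probabilities by counting, exactly as described above:
    # P(w3 | w1 w2) = Count(w1 w2 w3) / Count(w1 w2).
    def train(text: str):
        words = text.lower().split()
        trigrams = Counter(zip(words, words[1:], words[2:]))
        bigrams  = Counter(zip(words, words[1:]))
        return trigrams, bigrams

    def prob(w3, w1, w2, trigrams, bigrams):
        """P(w3 | w1 w2), or 0.0 if the bigram was never seen."""
        if bigrams[(w1, w2)] == 0:
            return 0.0
        return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]

    if __name__ == "__main__":
        # Toy corpus, made up for illustration.
        corpus = "the truth and nothing but the truth and nothing but the whole truth"
        tri, bi = train(corpus)
        # Count(nothing but the) / Count(nothing but)
        print(prob("the", "nothing", "but", tri, bi))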

74
N-grams
  • Why stop at 3-grams?
  • If P(z | r s t u v w x y) ≈ P(z | x y) is good,
    then P(z | r s t u v w x y) ≈ P(z | v w x y) is
    better!
  • 4-grams and 5-grams start to become expensive...

75
The N-Gram (Markovian) Language Model - Summary
  • N-Gram language models are used in large
    vocabulary speech recognition systems to provide
    the recognizer with an a-priori likelihood P(W)
    of a given word sequence W.
  • The N-Gram language model is usually derived from
    large training texts that share the same language
    characteristics as expected input.

76
Combining Speech Grammars and N-Gram Models
  • Using an N-Gram model in the recognizer and a CFG
    in a (separate) understanding component
  • Integrating special N-Gram rules at various
    levels in a CFG to allow for flexible input in
    specific contexts
  • Using a CFG to model the structure of phrases
    (e.g. numeric expressions) that are incorporated
    in a higher-level N-Gram model (class N-Grams; a
    sketch follows below)
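
One simple way to realize the class-N-Gram idea is to
collapse every phrase the sub-grammar recognizes (here,
numeric expressions) into a single class token before
counting. The regex standing in for the CFG and the toy
sentences are illustrative assumptions.

    import re
    from collections import Counter

    # Sketch of "class N-Grams": phrases recognized by a sub-grammar (here, a
    # regex standing in for a CFG over numeric expressions) are replaced by one
    # class token, and the N-Gram model is trained on the rewritten word stream.
    NUMERIC = re.compile(r"\b\d+(?:\s+\d+)*\b")

    def tokenize_with_classes(text: str):
        return NUMERIC.sub("<NUMBER>", text.lower()).split()

    def bigram_counts(sentences):
        counts = Counter()
        for s in sentences:
            words = tokenize_with_classes(s)
            counts.update(zip(words, words[1:]))
        return counts

    if __name__ == "__main__":
        # Toy training sentences, made up for illustration.
        data = ["flight 214 to Boston", "flight 7 to Paris"]
        print(bigram_counts(data))   # both sentences share ("flight", "<NUMBER>")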

77
Overview on Speech Technologies
  • Speech Synthesis
  • Text to Speech
  • Speech Recognition
  • Speech Grammars
  • Stochastic n-gram models
  • Semantic Interpretation

78
Semantic Interpretation
  • We have recognized the phrases and words, what
    now?
  • Problems
  • What does the user mean?
  • We have the right keywords, but the phrase is
    meaningless or unclear.

79
Semantic Interpretation
  • As stated before, the technologies of speech
    recognition exceed those of interpretation.
  • Most interpreters are based on key words.
  • Sometimes this is not good enough!

80
Back To Voice Browsers
  • Making the Web accessible to more of us, more
    of the time.
  • Personal Browser Demo
  • Now we'll talk about VoiceXML, navigation and
    various problems

81
VoiceXML - Example 1
  • Hello World!
    <vxml version="1.0">
      <form>
        <block>Hello World!</block>
      </form>
    </vxml>
  • The top-level element is <vxml>, which is mainly
    a container for dialogs. There are two types of
    dialogs: forms and menus. Forms present
    information and gather input; menus offer choices
    of what to do next.

82
VoiceXML - Example 1
  • Hello World!
  • This example has a single form, which contains a
    block that synthesizes and presents "Hello
    World!" to the user. Since the form does not
    specify a successor dialog, the conversation ends.

83
VoiceXML - Example 2
  • Our second example asks the user for a choice of
    drink and then submits it to a server script,
    with markup along these lines:
    <vxml version="1.0">
      <form>
        <field name="drink">
          <prompt>Would you like coffee, tea, milk,
            or nothing?</prompt>
          <grammar type="application/grammar+xml"/>
        </field>
        <block>
          <submit next="drink2.asp"/>
        </block>
      </form>
    </vxml>

84
VoiceXML - Example 2
  • A sample interaction is:
  • C (computer): Would you like coffee, tea, milk,
    or nothing?
  • H (human): Orange juice.
  • C: I did not understand what you said. (a
    platform-specific default message)
  • C: Would you like coffee, tea, milk, or nothing?
  • H: Tea
  • C: (continues in document drink2.asp)

85
VoiceXML - Architectural Model
[Architecture diagram: Web server, VoiceXML
interpreter context, VoiceXML interpreter,
implementation platform]
VoiceXML interpreter context may listen for a
special escape phrase that takes the user to a
high-level personal assistant, or for escape
phrases that alter user preferences like volume
or text-to-speech characteristics.
The implementation platform generates events in
response to user actions (e.g. spoken or
character input received, disconnect) and system
events (e.g. timer expiration).
86
Scope of VoiceXML
  • Output of synthesized speech (TTS)
  • Output of audio files.
  • Recognition of spoken input.
  • Recognition of DTMF input.
  • Recording of spoken input.
  • Control of dialog flow.
  • Telephony features such as call transfer and
    disconnect.

The language provides means for collecting
character and/or spoken input, assigning
the input to document-defined request variables,
and making decisions that affect the
interpretation of documents written in the
language. A document may be linked to other
documents through Universal Resource Identifiers
(URIs).
87
VoiceXML
  • VoiceXML is intended to be analogous to
    graphical surfing.
  • There are limitations.
  • Excellent for menu applications.
  • Awkward for open dialog applications.
  • There are other languages: VoXML, omniviewXML

88
Navigation
  • The user might be able to speak the word "follow"
    when she hears a hypertext link she wishes to
    follow.
  • The user could also interrupt the browser to
    request a short list of the relevant links.

89
Navigation example
  • User: Links?
  • Browser: The links are:
    1. company info
    2. latest news
    3. placing an order
    4. search for product details
    Please say the number now.
  • User: 2
  • Browser: Retrieving latest news...
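
A minimal sketch of the browser-side bookkeeping behind
this dialogue: number the links on the current page and
resolve a spoken (or keyed) number back to a target. The
link list mirrors the example above; the URLs and
function names are illustrative.

    # Sketch: number the current page's links and resolve the user's choice.
    # The link list mirrors the example dialogue; URLs are illustrative.
    LINKS = [
        ("company info", "about.html"),
        ("latest news", "news.html"),
        ("placing an order", "order.html"),
        ("search for product details", "search.html"),
    ]

    def list_links(links):
        """Build the prompt the browser reads out."""
        lines = [f"{i}  {title}" for i, (title, _) in enumerate(links, start=1)]
        return "The links are:\n" + "\n".join(lines) + "\nPlease say the number now."

    def follow(links, spoken: str):
        """Resolve a spoken number; return the target URL or None if out of range."""
        if not spoken.strip().isdigit():
            return None
        n = int(spoken)
        return links[n - 1][1] if 1 <= n <= len(links) else None

    if __name__ == "__main__":
        print(list_links(LINKS))
        print(follow(LINKS, "2"))    # news.html
        print(follow(LINKS, "7"))    # None -> out-of-range selection error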

90
Navigation through Headings
  • Another command could be used to request a list
    of the document's headings. This would allow
    users to browse an outline form of the document
    as a means to get to the section that interests
    them.

91
Navigation to Specific URLs
  • Graphical browsers allow entering a desired URL
    in the browser window.
  • How is this supported in Voice Browsers?
  • Think: What problems do you anticipate?
  • Will we be able to transfer from any voice portal
    to any other?
  • How do we know where to go?

92
How Slow / Fast ?
  • If voice browsers are meant to replace human
    operator dialog, they must be fast in response.
  • Speech Recognition / Interpretation / Synthesis
    depend on implementation
  • When a user requests a certain document, several
    related documents can be downloaded for easier
    access.

93
Friendly vs. Annoying
  • How friendly do you want the service to be?
  • Friendly is sometimes time consuming.
  • What percentage of the time does the user talk
    and what percentage of the time is he listening?
  • What parameters can I control?

94
Voice and Graphics
  • Can I access the Voice Browser through my
    computer?
  • Some sites are authored only for voice.
  • Some will be for both. This leads to more
    difficulties which must be dealt with.

95
Inserted text
  • When a hypertext link is spoken by a speech
    synthesizer, the author may wish to insert text
    before and after the link's caption, to guide the
    user's response.
  • For example, the link caption
  • Driving instructions
  • may be offered by the voice browser using the
    following words:
  • For driving instructions, press 1

96
Inserted text
  • The words "For and "Press 1" were added to the
    text embedded in the anchor element.
  • On first glance it looks as if this 'wrapper'
    text should be left for the voice browser to
    generate, but on further examination you can
    easily find problems with this approach.

97
Inserted text
  • For example, the inserted text for the following
    element cannot be "For":
  • Leave us a message
  • We need to say:
  • To leave us a message, press 5

98
Inserted text
  • The CSS2 draft specification includes the means
    to provide "generated text" before and after
    element content.
  • For example:
    <a style='cue-before: "To";
              cue-after: ", press 5"'
       href="LeaveMessage.html">Leave us a message</a>

99
Handling Errors and Ambiguities
  • Users might easily enter unexpected or ambiguous
    input, or just pause, providing no input at all.
  • Some examples of errors which might generate
    events:
  • When presented with a numbered list of links, the
    user enters a number that is outside the range
    presented.
  • The phrase uttered by the user matches more than
    one template rule.

100
Handling Errors and Ambiguities
  • The phrase or sound uttered doesn't match a known
    command.
  • The user loses track and the browser needs to
    time out and offer assistance.
  • Ums and errs.
  • Authors will have control over the browser's
    response to selection errors and timeouts.
  • Other errors might be dealt with by the browser
    or platform.

101
Some Nice Demos
  • Email assistant demo
  • Bank service demo (cough, ambiguity)
  • Financial Center Demo (ums)
  • Telectronics Demo

102
Who has implemented VoiceXML interpreters?
  • BeVocal Café
  • General Magic
  • HeyAnita's FreeSpeech Developer Network
  • IBM Voice Server SDK Beta Program based on
    VoiceXML Version 1.0
  • Motorola's Mobile Application Development Toolkit
    (MADK)

103
Who has implemented VoiceXML interpreters?
  • Nuance Developer Network
  • Open VXI VoiceXML interpreter
  • PIPEBEACH's speechWeb
  • Telera's DeVXchange
  • Tellme Studio
  • VoiceGenie