Technologies for speech applications


1
Chapter 3
  • Technologies for speech applications

2
Figure 3.1 Speech Technologies
3
(No Transcript)
4
3.2 Touchtone recognition
  • Caller responds to hierarchies of voice menus by
    pressing buttons on the telephone keypad
  • Potential problems
    • Lost in space (callers lose track of where they
      are in the menu hierarchy)
    • Time-consuming menus
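
The menu hierarchy described on this slide can be pictured as a tree that the caller walks with key presses. The following is only a minimal sketch: the `MENU` tree, its prompts, and the use of `*` to return to the top menu are hypothetical and not taken from the slides.

```python
# Hypothetical menu tree: each node has a prompt and children keyed by keypad digit.
MENU = {
    "prompt": "Main menu. Press 1 for accounts, 2 for support.",
    "children": {
        "1": {
            "prompt": "Accounts. Press 1 for balance, 2 for transfers.",
            "children": {
                "1": {"prompt": "Balance.", "children": {}},
                "2": {"prompt": "Transfers.", "children": {}},
            },
        },
        "2": {"prompt": "Support.", "children": {}},
    },
}


def navigate(menu, keys):
    """Follow a sequence of key presses and return the prompts the caller hears.

    '*' jumps back to the top menu (one possible remedy for a lost caller).
    Any key that is not offered at the current node replays the current prompt.
    """
    node = menu
    heard = [node["prompt"]]
    for key in keys:
        if key == "*":
            node = menu
        elif key in node["children"]:
            node = node["children"][key]
        heard.append(node["prompt"])
    return heard
```

Each extra level adds one more prompt the caller must sit through, which is the time-consuming menu problem in miniature: `navigate(MENU, "11")` plays three prompts before the caller reaches the balance.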

5
Speech recognition
  • Potential problems
  • Understandability
  • Time-consuming dialogs
  • Users may interrupt prompts by barge-in

6
Table 3.1 Speech Recognition Engines
 
7
For voice portals
  • Continuous speech
  • Speaker independent
  • Switch vocabularies
  • Spontaneous speech
  • Multi-threaded

8
Figure 3.4 Speech Recognition
9
Phoneme identification
  • Use an acoustic model to transform extracted
    features into sequences of phonemes
  • Approaches
    • Neural networks
    • Hidden Markov models

10
Word identification
  • Use the words in the language model to convert
    sequences of phonemes to words
  • Two approaches
    • Grammars
    • N-grams
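
To make the N-gram approach concrete, here is a minimal bigram (N = 2) sketch. The three-sentence training corpus is invented for illustration, and the model uses raw counts with no smoothing, which a real language model would not do.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows the previous one ('<s>' marks the start)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts


def bigram_prob(counts, prev, word):
    """Estimate P(word | prev) from raw counts (no smoothing)."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0


def score(counts, sentence):
    """Probability of a whole word sequence under the bigram model."""
    words = ["<s>"] + sentence.split()
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(counts, prev, word)
    return p


# Invented toy corpus.
CORPUS = ["recognize speech", "recognize speech now", "wreck a nice beach"]
COUNTS = train_bigrams(CORPUS)
```

With this corpus the model prefers "recognize speech" over the acoustically similar "wreck a nice beach". A grammar, by contrast, would simply accept or reject a word sequence rather than rank it.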

11
Figure 3.6 Language Model Creation
12
Developer's responsibility
  • Acoustic model
  • Lexicon
  • Language model

13
3.4 Voice identification
  • General techniques for identifying people
  • Something you know
  • Something you have
  • Something about you

14
Figure 3.2 Speaker registration, identification,
and authentication
15
Voice ID technologies are appealing
  • Are unobtrusive
  • Are location independent
  • Require no special equipment
  • Replace passwords

16
Why voice technologies fail
  • Siblings with similar voice profiles
  • Teenage male voice break
  • Colds, sore throats, sore lips, etc.
  • Variety of microphones
  • Tape recordings

17
Measuring accuracy of speech ID systems
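
The transcript gives only the title of this slide. One common way to measure the accuracy of a speaker verification system is to report the false acceptance rate (impostors accepted) and the false rejection rate (genuine speakers rejected) at a chosen score threshold. The sketch below assumes that framing; the trial scores are invented.

```python
def error_rates(trials, threshold):
    """Return (false_acceptance_rate, false_rejection_rate).

    trials: list of (match_score, is_genuine) pairs. A trial is accepted
    when its score is at or above the threshold.
    """
    genuine_scores = [score for score, is_genuine in trials if is_genuine]
    impostor_scores = [score for score, is_genuine in trials if not is_genuine]
    false_accepts = sum(1 for score in impostor_scores if score >= threshold)
    false_rejects = sum(1 for score in genuine_scores if score < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


# Invented trial data: three genuine attempts and four impostor attempts.
TRIALS = [
    (0.9, True), (0.8, True), (0.4, True),
    (0.7, False), (0.3, False), (0.2, False), (0.1, False),
]

far, frr = error_rates(TRIALS, threshold=0.5)
```

Raising the threshold lowers false acceptances at the cost of more false rejections, so the two rates are usually reported together.
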
18
3.5 Language identification
  • Explicit selection: the caller speaks the name of
    his or her preferred national language
    <prompt>
      <speak xml:lang="en-us">For service in English, say English</speak>
      <speak xml:lang="fr">pour service en français, dites français</speak>
    </prompt>
  • Implicit selection: a default language is used
    unless the caller overrides the default with an
    explicit selection
    <prompt>
      <speak xml:lang="fr">Bienvenue a portal français</speak>
      <speak xml:lang="en-us">For service in English, say English</speak>
    </prompt>
  • Calling area
  • Caller profile
  • One number per language
  • Language recognition technology
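
The selection strategies above can be combined into one decision. This is a sketch under assumptions: the priority order (explicit spoken choice, then caller profile, then the portal default) and the `LANGUAGE_NAMES` table are hypothetical, not something the slides specify.

```python
# Hypothetical mapping from a spoken language name to a language tag.
LANGUAGE_NAMES = {"english": "en-us", "français": "fr", "francais": "fr"}


def choose_language(utterance=None, profile_lang=None, default="en-us"):
    """Pick the dialog language.

    Assumed priority: explicit spoken selection, then the caller profile,
    then the portal default (implicit selection).
    """
    if utterance:
        lang = LANGUAGE_NAMES.get(utterance.strip().lower())
        if lang:
            return lang
    return profile_lang or default
```

For a French portal, `choose_language(None, default="fr")` stays in French, while a caller who says "English" is switched to `en-us`.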

19
3.6 Word spotting
  • Attention word
  • User signals that he/she will speak
  • Switching contexts
  • Suspend current activity and begin a new activity
  • Extract critical words
  • Security
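
As a rough illustration of attention words and context switching, the sketch below spots words in a text transcript. Real word spotting operates on audio; the text input, the attention word "computer", and the context names are all assumptions made to keep the example short.

```python
import re

ATTENTION_WORD = "computer"                   # hypothetical attention word
CONTEXT_WORDS = {"mail", "calendar", "news"}  # hypothetical contexts


def spot(transcript):
    """Return the context to switch to, 'current' to stay, or None if ignored.

    Speech is ignored until the attention word appears. After it, the first
    known context word wins and all other words are skipped.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    if ATTENTION_WORD not in words:
        return None
    following = words[words.index(ATTENTION_WORD) + 1:]
    for word in following:
        if word in CONTEXT_WORDS:
            return word
    return "current"
```

For example, `spot("um computer please switch to my calendar")` extracts only the critical word and yields `"calendar"`.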

20
3.7 Language understanding
  • Knowledge representation techniques
    • Parse trees
    • Form templates
    • Semantic net
  • Approaches for creating knowledge representation
    • Parser
    • Semantic attachments
    • General natural language understanding algorithm
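
A form template with semantic attachments can be sketched as a set of patterns, each paired with a small function that fills slots when the pattern matches. The travel form, its slot names, and the regular expressions below are hypothetical stand-ins for a real parser.

```python
import re

DAYS = "monday|tuesday|wednesday|thursday|friday|saturday|sunday"

# Each rule pairs a pattern with its semantic attachment: a function that
# turns the match into slot values for the form template.
RULES = [
    (re.compile(r"\bfrom (\w+)"), lambda m: {"origin": m.group(1)}),
    (re.compile(r"\bto (\w+)"), lambda m: {"destination": m.group(1)}),
    (re.compile(r"\bon (" + DAYS + r")\b"), lambda m: {"day": m.group(1)}),
]


def understand(utterance):
    """Fill a form template from an utterance; unfilled slots stay None."""
    form = {"origin": None, "destination": None, "day": None}
    text = utterance.lower()
    for pattern, attach in RULES:
        match = pattern.search(text)
        if match:
            form.update(attach(match))
    return form
```

The filled form is the knowledge representation that later stages (query generation, dialog management) can work from. These patterns are deliberately naive: a phrase such as "I want to fly to Denver" would fill the destination with "fly", which is one reason real systems use a parser.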

21
3.8 Classification
  • Uses
  • Classify documents into categories and topics
  • Categorize graphical objects
  • Replace menu hierarchies in voice applications
  • Navigating in large web sites
  • Locating a chapter or section of a large document
  • Locating a Web site about a specific topic
  • Locating descriptions of goods or services in a
    large on-line database
  • Example
  • "How may I help you?" from AT&T
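
Replacing a menu hierarchy with classification means routing a free-form answer to an open question directly to a destination. The keyword-overlap classifier below is a deliberately simple sketch; the route names and keyword sets are invented, and it does not describe how any deployed system works.

```python
import re

# Hypothetical destinations and the keywords that suggest them.
ROUTES = {
    "billing": {"bill", "charge", "payment", "invoice"},
    "repair": {"broken", "repair", "static", "outage"},
    "sales": {"buy", "order", "upgrade", "plan"},
}


def classify(utterance, routes=ROUTES, fallback="operator"):
    """Route an utterance to the destination sharing the most keywords with it."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    scores = {name: len(words & keywords) for name, keywords in routes.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else fallback
```

A caller who says "There is a charge on my bill I do not understand" reaches billing in one turn instead of stepping through several menu levels.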

22
3.9 Dialog management
  • Human-driven conversational dialogs: the
    person repeatedly asks a question or speaks a
    command and the computer responds.
  • Application-driven conversational dialogs: the
    application repeatedly asks questions to solicit
    answers and instructions from a caller.
  • Mixed-initiative dialogs: human-driven and
    computer-driven dialogs are combined. The caller
    and computer take turns driving the
    conversations.
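
The difference between application-driven and mixed-initiative dialogs shows up in a simple form-filling loop. In this sketch the field names and prompts are hypothetical, and each caller turn is assumed to be already recognized and reduced to a dictionary of field values.

```python
FIELDS = ["origin", "destination"]
PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where are you going?",
}


def next_prompt(form):
    """Application-driven step: ask about the first missing field, if any."""
    for field in FIELDS:
        if form[field] is None:
            return PROMPTS[field]
    return None


def run_dialog(caller_turns):
    """Run the dialog and return (filled form, prompts that were spoken).

    A caller turn may answer more than was asked (mixed initiative), in
    which case the application skips the prompts it no longer needs.
    """
    form = {field: None for field in FIELDS}
    prompts_spoken = []
    for answer in caller_turns:
        prompt = next_prompt(form)
        if prompt is None:
            break  # the form is complete
        prompts_spoken.append(prompt)
        for field, value in answer.items():
            if field in form:
                form[field] = value
    return form, prompts_spoken
```

A caller who answers one question at a time hears both prompts. A caller who says both cities in the first turn hears only one, because the application lets the caller take the initiative.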

23
Figure 3.9 Voice Interpreter and Voice Browser
24
3.10 NL processing
  • Machine translation
    • Word replacement
    • Phrase translation
    • Full national language translation
  • Query generation
    • Generate SQL query from knowledge representation
  • Summarization
    • Generate English summary from knowledge
      representation
  • Generation
    • Prerecorded sentences
    • Templates
    • Reversible parse tree
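
Query generation and template-based generation can both start from the same knowledge representation. The sketch below assumes a simple frame with `table`, `fields`, and `where` keys; that frame layout, the table name, and the column names are hypothetical.

```python
def to_sql(frame):
    """Build a parameterized SELECT statement from a frame.

    Values are returned separately as parameters rather than pasted into
    the SQL string, so caller input is never interpreted as SQL.
    """
    sql = "SELECT " + ", ".join(frame["fields"]) + " FROM " + frame["table"]
    params = []
    conditions = []
    for column, value in frame.get("where", {}).items():
        conditions.append(column + " = ?")
        params.append(value)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


def describe(frame, template="Flights from {origin} to {destination}"):
    """Template-based generation: fill a sentence template from the frame."""
    return template.format(**frame["where"])


# Hypothetical frame produced by the language understanding stage.
FRAME = {
    "table": "flights",
    "fields": ["flight_no", "departs"],
    "where": {"origin": "BOS", "destination": "DEN"},
}

query, parameters = to_sql(FRAME)
```

The `?` placeholders follow the parameter style used by Python's `sqlite3` module; other database drivers use different placeholder syntax.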

25
Figure 3.7 Speech synthesis
26
Figure 3.8 Concatenative Speech
27
Table 3.2 Concatenative vs parameter-based
speech synthesis
28
3.12 Music synthesis
  • Uses
    • Branding
    • Set the mood
    • Signal the caller
    • Fundamental part of the dialog
  • Approaches
    • Prerecord
    • Synthesize on the fly (MIDI)

29
3.13 Tools
  • VoiceXML interpreters
  • Specification tools
  • Call flow tools
  • Menu generators
  • Form and field generators
  • Rehearsal tools
  • Logging tools
  • Performance measurement summary tools
  • System performance tools

30
3.14 Related Technologies
  • Distributed speech recognition
  • Noise mitigation
  • Noise reduction and cancellation algorithms
  • Feature extraction: perform on client
  • Signal processing algorithms that extract
    essential features from acoustic data
  • Multimodal user interfaces
  • WML and WAP
  • Conversey tags
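
In distributed speech recognition, the client reduces raw audio to compact features before sending anything to the server. The sketch below uses two very simple per-frame features, log energy and zero-crossing count, as stand-ins. Real front ends typically compute richer features such as cepstral coefficients; the frame sizes and sample values here are arbitrary.

```python
import math


def frame_features(samples, frame_len=4, hop=4):
    """Split samples into frames and return (log_energy, zero_crossings) per frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log(energy + 1e-10)  # small offset avoids log(0)
        zero_crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((log_energy, zero_crossings))
    return features


# Arbitrary toy signal: one silent frame followed by one oscillating frame.
SIGNAL = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0]
FEATURES = frame_features(SIGNAL)
```

Only the feature tuples need to cross the network, which is far less data than the raw samples.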

31
3.15 Key concepts
  • Many technologies may be useful
  • Voice identification
    • Speaker registration
    • Speaker identification
    • Speaker verification
  • Speech recognition
    • Requires acoustic models, lexicons, language
      models, and grammars
  • Speech synthesis
    • Synthesis (during development)
    • Prerecorded (for production)
  • Dialog management
    • VoiceXML browser