Technologies for speech applications

About This Presentation

Slides: 32

Provided by: unkn1098

Transcript and Presenter's Notes

1
Chapter 3

Technologies for speech applications

2
Figure 3.1. Speech Technologies
3
(No Transcript)
4
3.2 Touchtone recognition

Caller responds to hierarchies of voice menus by
pressing buttons on the telephone keypad
Potential problems
Lost in space
Time-consuming menus
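
Both problems can be seen in a minimal sketch of a touchtone menu tree. The menu contents and the function names below are hypothetical, chosen only for illustration: the caller must remember where each key sequence leads, and the number of key presses grows with the depth of the hierarchy.

```python
# Hypothetical menu tree: each key is a keypad digit; each value is either
# a nested menu (dict) or the name of a service (str).
MENU = {
    "1": {"1": "checking balance", "2": "savings balance"},
    "2": {"1": "report lost card",
          "2": {"1": "order checks", "2": "order statement"}},
    "0": "operator",
}


def navigate(menu, keys):
    """Follow a sequence of key presses; return the service reached, or None."""
    node = menu
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None  # invalid key, or a key pressed after reaching a service
        node = node[key]
    # A dict here means the caller is still inside a menu, not at a service.
    return node if isinstance(node, str) else None


def max_depth(menu):
    """Number of key presses needed to reach the deepest service."""
    if isinstance(menu, str):
        return 0
    return 1 + max(max_depth(child) for child in menu.values())
```

Here navigate(MENU, "221") reaches "order checks" only after three prompts have been played, which is the time cost the slide refers to.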

5
Speech recognition

Potential problems
Understandability
Time-consuming dialogs
Users may interrupt prompts by barge-in

6
Table 3.1 Speech Recognition Engines

7
For voice portals

Continuous speech
Speaker independent
Switch vocabularies
Spontaneous speech
Multi-threaded

8
Figure 3.4 Speech Recognition
9
Phoneme identification

Use acoustic model to transform extracted
features to sequences of phonemes
Approaches
Neural networks
Hidden Markov models
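
As an illustration of the hidden Markov model approach, the sketch below runs Viterbi decoding over a toy model. The three phoneme states, the quantized feature labels f1 to f3, and all probabilities are invented for the example; a real acoustic model is trained from speech data and is far larger.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state (phoneme) sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


# Toy model (all values are assumptions made for this example).
STATES = ["k", "ae", "t"]
START = {"k": 0.8, "ae": 0.1, "t": 0.1}
TRANS = {
    "k": {"k": 0.3, "ae": 0.6, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.3, "t": 0.6},
    "t": {"k": 0.1, "ae": 0.1, "t": 0.8},
}
EMIT = {
    "k": {"f1": 0.8, "f2": 0.1, "f3": 0.1},
    "ae": {"f1": 0.1, "f2": 0.8, "f3": 0.1},
    "t": {"f1": 0.1, "f2": 0.1, "f3": 0.8},
}

phonemes = viterbi(["f1", "f2", "f3"], STATES, START, TRANS, EMIT)
```

With these numbers the decoded sequence is k, ae, t.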

10
Word identification

Use words in language model to convert sequences
of phonemes to words
Two approaches
Grammars
N-grams
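
The N-gram approach can be sketched with a bigram model that ranks competing word hypotheses. The training sentences, the add-one smoothing, and the function names are assumptions made for this example, not something taken from the slides.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count bigrams and their left contexts over a list of sentences."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(words[1:])            # everything that can be predicted
        context_counts.update(words[:-1])  # everything that can precede a word
        bigram_counts.update(zip(words, words[1:]))
    return context_counts, bigram_counts, vocab


def score(sentence, model):
    """Add-one smoothed bigram probability of a sentence."""
    context_counts, bigram_counts, vocab = model
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))
    return prob


def best_hypothesis(candidates, model):
    """Pick the candidate word sequence the language model finds most likely."""
    return max(candidates, key=lambda c: score(c, model))


model = train_bigrams([
    "recognize speech today",
    "please recognize speech",
    "we recognize speech",
])
choice = best_hypothesis(["wreck a nice beach", "recognize speech"], model)
```

Both candidates sound alike, but only "recognize speech" contains word pairs seen in training, so the model prefers it.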

11
Figure 3.6 Language Model Creation
12
Developer's responsibilities

Acoustic model
Lexicon
Language model

13
3.4 Voice identification

General techniques for identifying people
Something you know
Something you have
Something about you

14
Figure 3.2 Speaker registration, identification,
and authentication
15
Voice id technologies are appealing

Are unobtrusive
Are location independent
Require no special equipment
Replace passwords

16
Why voice technologies fail

Siblings with similar voice profiles
Teenage male voice break
Colds, sore throats, sore lips, etc.
Variety of microphones
Tape recordings

17
Measuring accuracy of speech id systems
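
The body of this slide is not in the transcript. One common way to measure the accuracy of a speaker identification system, shown here as an assumption rather than as the slide's content, is to count false acceptances and false rejections at a given score threshold. The trial scores below are made up.

```python
def error_rates(trials, threshold):
    """Return (false acceptance rate, false rejection rate).

    trials: list of (score, is_genuine) pairs; a caller is accepted when
    score >= threshold. Assumes at least one genuine and one impostor trial.
    """
    genuine = [score for score, is_genuine in trials if is_genuine]
    impostor = [score for score, is_genuine in trials if not is_genuine]
    false_rejects = sum(1 for score in genuine if score < threshold)
    false_accepts = sum(1 for score in impostor if score >= threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)


# Hypothetical verification scores: three genuine callers, four impostors.
TRIALS = [
    (0.9, True), (0.8, True), (0.4, True),
    (0.6, False), (0.3, False), (0.2, False), (0.1, False),
]
far, frr = error_rates(TRIALS, threshold=0.5)
```

Raising the threshold lowers the false acceptance rate and raises the false rejection rate, so the two must be traded off against each other.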
18
3.5 Language identification

Explicit selection: the caller speaks the name of
his or her preferred national language
<prompt>
  <speak xml:lang="en-us">For service in English, say English</speak>
  <speak xml:lang="fr">pour service en français, dites français</speak>
</prompt>
Implicit selection: a default language is used
unless the caller overrides the default with an
explicit selection
<prompt>
  <speak xml:lang="fr">Bienvenue a portal français</speak>
  <speak xml:lang="en-us">For service in English, say English</speak>
</prompt>
Calling area
Caller profile
One number per language.
Language recognition technology

19
3.6 Word spotting

Attention word
User signals that he/she will speak
Switching contexts
Suspend current activity and begin a new activity
Extract critical words
Security
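
A rough sketch of the first and third uses follows. It works on text that has already been recognized, which is a simplification made for the example; real word spotters operate on the audio stream. The function names and sample utterances are hypothetical.

```python
import string


def tokenize(utterance):
    """Lowercase the utterance and strip punctuation from each word."""
    words = (w.strip(string.punctuation) for w in utterance.lower().split())
    return [w for w in words if w]


def spot(utterance, keywords):
    """Extract critical words: return (position, word) for each keyword found."""
    wanted = {k.lower() for k in keywords}
    return [(i, w) for i, w in enumerate(tokenize(utterance)) if w in wanted]


def command_after(utterance, attention_word):
    """Return the words that follow the attention word, or None if it is absent."""
    words = tokenize(utterance)
    attention_word = attention_word.lower()
    if attention_word not in words:
        return None  # the user has not signaled that he or she will speak
    return " ".join(words[words.index(attention_word) + 1:])
```

For example, command_after("So anyway, computer, check my balance", "computer") ignores the chatter before the attention word and returns only the command.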

20
3.7 Language understanding

Knowledge representation techniques
Parse trees
Form templates
Semantic net
Approaches for creating knowledge representation
Parser
Semantic attachments
General natural language understanding algorithm
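
The form template representation and the semantic attachment approach can be illustrated together. In this sketch each recognized word carries an attachment that names a slot and a value; the pizza-ordering vocabulary and slot names are invented for the example.

```python
# Hypothetical semantic attachments: word -> (slot, value).
ATTACHMENTS = {
    "one": ("quantity", 1),
    "two": ("quantity", 2),
    "small": ("size", "small"),
    "large": ("size", "large"),
    "pepperoni": ("topping", "pepperoni"),
    "mushroom": ("topping", "mushroom"),
}
SLOTS = ("quantity", "size", "topping")


def fill_form(utterance, attachments=ATTACHMENTS, slots=SLOTS):
    """Build a form template from the words that carry semantic attachments."""
    form = {slot: None for slot in slots}
    for word in utterance.lower().split():
        if word in attachments:
            slot, value = attachments[word]
            form[slot] = value
    return form


def missing_slots(form):
    """Slots that the caller has not yet supplied."""
    return [slot for slot, value in form.items() if value is None]
```

The filled form is the knowledge representation that later stages such as dialog management or query generation can work from.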

21
3.8 Classification

Uses
Classify documents into categories and topics
Categorize graphical objects
Replace menu hierarchies in voice applications
Navigating in large web sites
Locating a chapter or section of a large document
Locating a Web site about a specific topic
Locating descriptions of goods or services in a
large on-line database
Example
"How may I help you?" from AT&T
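
Replacing a menu hierarchy with classification can be sketched as follows. This is a simple keyword-overlap classifier written for illustration, not the method used by the AT&T system; the categories and keywords are hypothetical.

```python
# Hypothetical categories and the keywords that suggest each one.
CATEGORIES = {
    "billing": {"bill", "charge", "charges", "payment", "invoice"},
    "repair": {"broken", "repair", "static", "outage", "dead"},
    "sales": {"buy", "order", "upgrade", "new", "plan"},
}


def classify(utterance, categories=CATEGORIES, fallback="operator"):
    """Route an open-ended request to the category sharing the most keywords."""
    for mark in "?.,!":
        utterance = utterance.replace(mark, " ")
    words = set(utterance.lower().split())
    scores = {name: len(words & keywords) for name, keywords in categories.items()}
    best = max(scores, key=scores.get)
    # With no keyword match at all, hand the caller to a person.
    return best if scores[best] > 0 else fallback
```

A caller who answers the opening question with "There is a charge on my bill I don't understand" is routed to billing in one step instead of walking through several menus.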

22
3.9 Dialog management

Human-driven conversational dialogs: the
person repeatedly asks a question or speaks a
command and the computer responds.
Application-driven conversational
dialogs: the application repeatedly asks questions
to solicit answers and instructions from a
caller.
Mixed-initiative dialogs: human-driven and
application-driven dialogs are combined. The caller
and computer take turns driving the
conversation.
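
The difference between these styles can be shown with a small simulation. The sketch assumes that each caller turn has already been understood as a set of slot values; the travel-booking slots, prompts, and function name are made up for the example.

```python
# Hypothetical prompts, one per slot, asked in this order.
PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where are you going?",
    "date": "What day are you traveling?",
}


def run_dialog(caller_turns, prompts=PROMPTS):
    """Ask for the first empty slot until the form is full or the caller stops.

    caller_turns: list of dicts, each holding the slots understood from one
    caller turn. One slot per turn models an application-driven dialog;
    several slots in one turn models a caller taking the initiative.
    """
    form = {slot: None for slot in prompts}
    asked = []
    turns = iter(caller_turns)
    while any(value is None for value in form.values()):
        slot = next(s for s, value in form.items() if value is None)
        asked.append(prompts[slot])
        understood = next(turns, None)
        if understood is None:
            break  # the caller hung up or said nothing more
        for name, value in understood.items():
            if name in form:
                form[name] = value
    return form, asked
```

When the caller answers the first question with both an origin and a destination, the application skips the second prompt, which is the turn-taking that mixed initiative allows.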

23
Figure 3.9 Voice Interpreter and voice Browser
24
3.10 NL processing

Machine translation
Word replacement
Phrase translation
Full national language translation
Query generation
Generate SQL query from knowledge representation
Summarization
Generate English summary from knowledge
representation
Generation
Prerecorded sentences
Templates
Reversible parse tree
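
Query generation and template-based generation can both start from the same knowledge representation. The sketch below assumes that representation is a simple form of slot values; the table name, column names, and sentence template are hypothetical.

```python
def to_sql(table, form):
    """Build a parameterized SELECT from the filled slots of a form.

    Table and column names are assumed to come from the application,
    not from the caller; only the values are passed as parameters.
    """
    filled = {column: value for column, value in form.items() if value is not None}
    where = " AND ".join(f"{column} = ?" for column in filled)
    sql = f"SELECT * FROM {table}" + (f" WHERE {where}" if where else "")
    return sql, list(filled.values())


def generate(template, form):
    """Template-based generation: fill a sentence template from the form."""
    return template.format(**form)


FORM = {"origin": "Boston", "destination": "Denver", "date": None}
query, params = to_sql("flights", FORM)
sentence = generate("Here are the flights from {origin} to {destination}.", FORM)
```

The unfilled date slot is simply left out of the WHERE clause, and the generated sentence can be handed to speech synthesis.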

25
Figure 3.7 Speech synthesis
26
Figure 3.8 Concatenative Speech
27
Table 3.2 Concatenative vs parameter-based
speech synthesis
28
3.12 Music synthesis