Pronouncing Words in TTS Systems

1
Pronouncing Words in TTS Systems
  • Julia Hirschberg
  • CS 4706

2
Today
  • Motivation
  • Improve TTS intelligibility and naturalness
  • An application: language learning
  • Challenges for automatic word pronunciation
  • Standard methods
  • Pronunciation by rule
  • Pronouncing dictionaries
  • Innovative solutions
  • Pronunciation by language origin
  • Pronunciation by rhyming analogy
  • Expanding the lexicon using Active Learning
    techniques

3
Motivation
  • Intelligibility
  • Naturalness
  • Applications to language learning
  • Unlimited vocabulary
  • Type a word or phrase and hear it spoken in your
    target language
  • To imitate
  • To learn to recognize

4
Converting Text to Phonemes
  • Pronouncing numbers in different contexts
  • Identifying proper names
  • Expanding abbreviations and acronyms correctly

5
Numbers
  • Pronouncing numbers in different contexts
  • In 1996 she sold 1995 shares and deposited $42 in
    her 401(k).
  • The number is 212-555-1210.
  • That cc is Visa 4444-3607-5959, expiration
    2/07.
  • Conventions
  • Years
  • Money
  • Phone numbers
  • Money amounts
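
The conventions above can be illustrated with a minimal sketch in which the same digits are read differently depending on the convention chosen. The function names (as_year, as_cardinal, as_digits) and the small word tables are assumptions made for illustration only, not part of any particular TTS system; deciding which convention applies in context is the hard part and is not shown.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n):
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def as_cardinal(n):
    """Spell out 0..9999 as a quantity (e.g. a number of shares)."""
    parts = []
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    if thousands:
        parts.append(f"{ONES[thousands]} thousand")
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest or not parts:
        parts.append(two_digits(rest))
    return " ".join(parts)


def as_year(n):
    """Read a four-digit year as two halves.

    Simplified: years such as 2000-2009 need special cases not handled here.
    """
    high, low = divmod(n, 100)
    if low == 0:
        return f"{two_digits(high)} hundred"
    if low < 10:
        return f"{two_digits(high)} oh {ONES[low]}"
    return f"{two_digits(high)} {two_digits(low)}"


def as_digits(text):
    """Read every digit separately, as in a phone number."""
    return " ".join(ONES[int(ch)] for ch in text if ch.isdigit())


print(as_year(1996))              # year convention
print(as_cardinal(1995))          # quantity convention
print(as_digits("212-555-1210"))  # phone-number convention
```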

6
Cultural Dependence
  • Russia
  • Article 3 of the rules attached to the Moscow
    Telephone Network Subscribers Directory, 1916
  • Numbers over a hundred are to be pronounced
    as follows: 1.23 "one twenty three", 9.72 "nine
    seventy two", 70.09 "seventy zero nine". In numbers
    over 10,000 every figure of a hundred should
    be pronounced separately, for example,
    1.20.48 "one twenty forty eight", 2.08.35 "two zero
    eight thirty five", 3.35.29 "three thirty five
    twenty nine", 4.49.52 "four forty nine fifty two",
    5.15.86 "five fifteen eighty six" etc., not "one
    hundred and twenty forty eight", "two hundred and
    eight thirty five" etc.
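
The 1916 Moscow rule quoted above amounts to reading each dot-separated group on its own, with a leading zero spoken as "zero". A minimal sketch follows, using English words as in the translated examples; the function names are hypothetical.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def read_group(group):
    """Read one group of one or two digits."""
    if len(group) == 2 and group[0] == "0":
        # A leading zero is spoken explicitly: "09" -> "zero nine".
        return f"zero {ONES[int(group[1])]}"
    n = int(group)
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def read_moscow_number(number):
    """Read a dot-separated phone number group by group."""
    return " ".join(read_group(group) for group in number.split("."))


for example in ["1.23", "70.09", "1.20.48", "2.08.35", "5.15.86"]:
    print(example, "->", read_moscow_number(example))
```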

7
  • In France
  • A French phone number is 10 digits given in
    series of two
  • 01-43-48-12-85
  • "Zéro un, quarante-trois, quarante-huit, douze,
    quatre-vingt-cinq".
  • Numbers in addresses are always pronounced as a
    full number
  • Chambre 823, 240 rue de Rivoli
  • Chambre huit-cent-vingt-trois. Deux-cent-quarante,
    rue de Rivoli

8
Pronouncing Words
  • Part-of-speech
  • use, close, dove, multiply, coax
  • Origin
  • shoe (ME shoo), phoenix (Gr)
  • mole, attaches, resume
  • Morphological analysis
  • ferryboat, ferryboats
  • Popemobile
  • Letter-to-sound correspondences
  • oo, th, qu, e (beet, bet, bite, weigh, ...)

9
  • Conventions for numbers and symbols: c, evalu8,
    cu, tsp
  • Genre: email, chat, recipe, want ad, software
    license

10
Word Sense Ambiguity and Pronunciation
  • Homographs
  • bass/bass
  • Nice/nice
  • desert/desert
  • Homograph disambiguation
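
When part of speech separates the two readings (use, close, desert), one simple approach is a table keyed on the word and its tag. The sketch below assumes an upstream tagger supplies the tag; the table name, the tag labels, and the ARPAbet-style phoneme strings are illustrative assumptions. Sense-based pairs such as bass/bass would need a sense classifier instead, which is not shown.

```python
# Pronunciations keyed by (word, part of speech); phoneme strings are
# approximate and for illustration only.
HOMOGRAPHS = {
    ("use", "NOUN"): "y uw s",
    ("use", "VERB"): "y uw z",
    ("close", "ADJ"): "k l ow s",
    ("close", "VERB"): "k l ow z",
    ("desert", "NOUN"): "d eh z er t",
    ("desert", "VERB"): "d ih z er t",
}

# Reading used when the tag is missing or unknown (an arbitrary choice here).
DEFAULTS = {
    "use": "y uw z",
    "close": "k l ow z",
    "desert": "d eh z er t",
}


def pronounce_homograph(word, pos):
    """Return the reading for (word, pos), or a default reading, or None."""
    word = word.lower()
    if (word, pos) in HOMOGRAPHS:
        return HOMOGRAPHS[(word, pos)]
    return DEFAULTS.get(word)


print(pronounce_homograph("use", "NOUN"))
print(pronounce_homograph("use", "VERB"))
```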

11
Letter-to-Sound Rules
  • E.g.
  • I / _Ce -> /ai/ (rise)
  • Else I -> /ih/ (rip)
  • Advantages
  • Pronounces anything, seen or not
  • Disadvantages
  • Must be built by hand
  • How to encode all the exceptions
  • E.g. ripen/risen/riser/river
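
A minimal sketch of the single rule above (the letter i before a consonant plus e gives /ai/, otherwise /ih/), run on the exception words. The function name is hypothetical. The rule gets ripen and riser right but wrongly assigns /ai/ to risen and river, which is the kind of exception a hand-built rule set has to encode.

```python
VOWELS = set("aeiou")


def pronounce_i(word, pos):
    """Return the phoneme for the letter 'i' at index pos.

    Rule: i -> /ai/ when followed by one consonant and then 'e';
    otherwise i -> /ih/.
    """
    following = word[pos + 1: pos + 3]
    if (len(following) == 2
            and following[0] not in VOWELS
            and following[1] == "e"):
        return "ai"
    return "ih"


for word in ["rise", "rip", "ripen", "risen", "riser", "river"]:
    print(word, pronounce_i(word, word.index("i")))
```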

12
  • Proper names
  • Nice, Ramirez, Ribeiro, Rise, Infiniti
  • Solutions
  • More complex rules
  • Exceptions dictionary
  • Consulted first
  • But how to handle morphological analysis?
  • Rise's hat

13
Dictionary-based Approaches
  • Rely on very large dictionary
  • Disadvantages
  • Hand labor to create entries
  • Redundancy
  • Cat, cats, cat's, cats'
  • Out-of-vocabulary items
  • Proper names: covering all U.K. surnames would
    require >5,000,000 entries
  • New words: fax, email, mudd, ...
  • Technical terms: liposuction, anova, bernaise
  • Foreign borrowings: frappe, ciao, louche

14
  • Solutions
  • Morphological preprocessing before dictionary
    look-up
  • Fall back to letter-to-sound (L2S) rules if there
    is no dictionary hit
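
The two solutions above can be combined into one look-up routine, sketched below under simplifying assumptions: the lexicon is a toy, only the -s, -'s and -s' endings are stripped, and letter_to_sound is a placeholder standing in for real L2S rules. All names are hypothetical.

```python
# Toy lexicon; phoneme strings are approximate and for illustration only.
LEXICON = {
    "cat": "k ae t",
    "ferryboat": "f eh r iy b ow t",
    "rise": "r ay z",
}

SIBILANTS = {"s", "z", "sh", "zh", "ch", "jh"}
VOICELESS = {"p", "t", "k", "f", "th"}


def add_s_ending(stem_phones):
    """Attach the plural/possessive ending: /ih z/, /s/ or /z/."""
    last = stem_phones.split()[-1]
    if last in SIBILANTS:
        return stem_phones + " ih z"
    if last in VOICELESS:
        return stem_phones + " s"
    return stem_phones + " z"


def letter_to_sound(word):
    """Placeholder fallback: one symbol per letter, not real L2S rules."""
    return " ".join(word)


def pronounce(word):
    word = word.lower()
    # 1. Direct dictionary hit.
    if word in LEXICON:
        return LEXICON[word]
    # 2. Morphological preprocessing: strip an ending and retry the stem.
    for suffix in ("'s", "s'", "s"):
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in LEXICON:
            return add_s_ending(LEXICON[stem])
    # 3. No hit at all: fall back to letter-to-sound rules.
    return letter_to_sound(word)


for word in ["cat", "cats", "cat's", "cats'", "Rise's", "fax"]:
    print(word, "->", pronounce(word))
```

Storing only the stem removes the redundancy of listing cat, cats, cat's and cats' separately, at the cost of needing the ending rule.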

15
More Innovative Approaches
  • Pronouncing OOV words
  • Handling proper names
  • Inferring country of origin: Takashita, Leroy,
    Kirov, Lima, Infiniti
  • Pronunciation by analogy
  • Analog/dialog
  • Risible/visible
  • Proper names: Alifano/Califano
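
A very reduced sketch of rhyming analogy: find the known word that shares the longest spelling suffix with the unknown word and reuse its phonemes for that suffix. It assumes each lexicon entry is already aligned one phoneme string per letter (empty for silent letters), which real systems must compute; the lexicon contents, phoneme strings, and function names are illustrative assumptions, not the method of any specific system.

```python
# Each entry has exactly one phoneme string per letter ("" = silent letter).
ALIGNED_LEXICON = {
    "califano": ["k", "ae", "l", "ih", "f", "aa", "n", "ow"],
    "visible": ["v", "ih", "z", "ax", "b", "ax l", ""],
}


def shared_suffix_len(a, b):
    """Number of letters that a and b share at the end."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def pronounce_by_analogy(word, lexicon, prefix_fallback):
    """Borrow the phonemes of the best-rhyming known word.

    prefix_fallback(letter) pronounces letters not covered by the rhyme.
    Returns None when no known word shares any suffix.
    """
    word = word.lower()
    best = max(lexicon, key=lambda known: shared_suffix_len(word, known))
    n = shared_suffix_len(word, best)
    if n == 0:
        return None
    prefix = word[: len(word) - n]
    phones = [prefix_fallback(ch) for ch in prefix] + lexicon[best][-n:]
    return " ".join(p for p in phones if p)


# Stand-in fallback: use the letter itself as its phoneme.
print(pronounce_by_analogy("Alifano", ALIGNED_LEXICON, lambda ch: ch))
print(pronounce_by_analogy("risible", ALIGNED_LEXICON, lambda ch: ch))
```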

16
Bootstrapping Phonetic Lexicons (Maskey et al. '04)
  • For some languages, online pronouncing lexicons
    exist, but not for others, e.g. Nepali
  • How to minimize effort in creating lexicons?
  • Idea
  • Given a native speaker and a large amount of
    online text in the language
  • Native speaker builds small lexicon by hand for
    seed set of N most common words in text, e.g.
  • is /ihzh/
  • the /dhax/

17
  • Derive L2S rules from lexicon automatically, e.g.
  • is -> /ihzh/
  • the -> /dhax/
  • Loop: Choose the next N most common words from
    the text and use the lexicon's L2S rules to
    predict pronunciations, e.g.
  • telephone -> /telaxfown/
  • He -> /hax/?
  • Rise -> /rihzhax/?
  • Assign a confidence score to each prediction by
    comparing each word to all words in lexicon
  • If is -> /ihzh/ is in the lexicon and no other
    orthographically similar words are pronounced
    differently, the new rule his -> /hihzh/ scores high
  • For low-confidence pronunciations, Active
    Learning step:
  • Inspect and calculate error rate
  • Hand correct errors and add all to lexicon

18
  • Build a new set of L2S rules from augmented
    lexicon
  • Iterate from Loop until performance stabilizes
  • Results
  • English
  • 94% success on test set after 23 iterations, 16K
    entry lexicon
  • Performance comparable to CMUDict at 1/7 the
    size
  • German
  • 90% accuracy after 13 iterations, 28K lexicon
  • Nepali

19
  • 94.6% accuracy after 16 iterations, 5K lexicon
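
The bootstrapping loop described on slides 16 to 18 can be sketched as follows. This is a simplified outline, not the authors' implementation: train_l2s and ask_speaker are hypothetical callables supplied by the caller (rule induction and the native speaker's corrections respectively), and "performance stabilizes" is approximated by the error rate changing by less than tol between rounds.

```python
def bootstrap_lexicon(words_by_frequency, seed_lexicon, train_l2s,
                      ask_speaker, batch_size, threshold=0.9,
                      tol=0.01, max_rounds=50):
    """Grow a pronouncing lexicon with a native speaker in the loop.

    words_by_frequency: words sorted from most to least common.
    train_l2s(lexicon): returns predict(word) -> (pronunciation, confidence).
    ask_speaker(word, guess): returns the correct pronunciation.
    Returns the final lexicon and the per-round error rates.
    """
    lexicon = dict(seed_lexicon)
    pending = [w for w in words_by_frequency if w not in lexicon]
    error_rates = []
    for _ in range(max_rounds):
        if not pending:
            break
        batch, pending = pending[:batch_size], pending[batch_size:]
        # Rebuild the L2S rules from the current (augmented) lexicon.
        predict = train_l2s(lexicon)
        checked = errors = 0
        for word in batch:
            guess, confidence = predict(word)
            if confidence < threshold:
                # Active Learning step: only low-confidence predictions
                # are shown to the speaker and corrected by hand.
                corrected = ask_speaker(word, guess)
                checked += 1
                errors += corrected != guess
                guess = corrected
            lexicon[word] = guess
        error_rates.append(errors / checked if checked else 0.0)
        # Stop once the error rate has stabilized between rounds.
        if (len(error_rates) >= 2
                and abs(error_rates[-1] - error_rates[-2]) < tol):
            break
    return lexicon, error_rates
```

Sending only low-confidence items to the speaker is what keeps the hand-labeling effort small relative to the size of the final lexicon.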

20
Improving Pronunciation Dictionary Coverage
(Fackrell and Skut '04)
  • Idea: Many proper names have more than one
    spelling (e.g. More/Moore, Smith/Smythe)
  • Find a mapping between OOV (Out of Vocabulary)
    spellings and alternate spellings: a fuzzy
    match
  • Identify spelling alternations that are
    pronunciation-neutral in an existing lexicon to
    produce rewrite rules for OOV words
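
A minimal sketch of the application step only: given some pronunciation-neutral rewrite rules, generate alternate spellings of an OOV name and look them up. The three rules here are hand-picked assumptions chosen to cover the More/Moore and Smith/Smythe examples; the approach described above derives such rules from an existing lexicon, which is not shown. The lexicon and phoneme strings are likewise illustrative.

```python
# Hypothetical pronunciation-neutral rewrites: (spelling, alternate spelling).
REWRITE_RULES = [("oo", "o"), ("y", "i"), ("the", "th")]

# Toy lexicon; phoneme strings are approximate.
LEXICON = {"smith": "s m ih th", "more": "m ao r"}


def candidate_spellings(word, rules):
    """All spellings reachable by applying the rules in any order.

    Each rule application replaces every occurrence of the pattern.
    """
    seen = {word}
    frontier = [word]
    while frontier:
        current = frontier.pop()
        for old, new in rules:
            rewritten = current.replace(old, new)
            if rewritten not in seen:
                seen.add(rewritten)
                frontier.append(rewritten)
    seen.discard(word)
    return seen


def pronounce_oov(word, lexicon, rules):
    """Look the word up directly, then via its alternate spellings."""
    word = word.lower()
    if word in lexicon:
        return lexicon[word]
    for candidate in sorted(candidate_spellings(word, rules)):
        if candidate in lexicon:
            return lexicon[candidate]
    return None


print(pronounce_oov("Smythe", LEXICON, REWRITE_RULES))
print(pronounce_oov("Moore", LEXICON, REWRITE_RULES))
```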

21
How do current systems do on pronunciation?
  • Loquendo (temporarily unavailable)
  • CEPSTRAL
  • AT&T Naturally Speaking

22
Next Class
Accenting and information status