Phonology from a computational point of view

1
Phonology from a computational point of view
  • Phonemes, dialects, letter-to-sound conversion
  • March 2001

2
Phonology
  • The study of the sound patterns of languages.
  • We will extend this to include the letter
    patterns of languages.

3
Syntax
Information Retrieval
Morphology: catch + PAST
Spelling: caught
Phonemic representation: K AO1 T
Sound

4
Why study phonology in this course?
  • Text-to-speech (TTS) applications include a
    component which converts spelled words to
    sequences of phonemes (= sound representations).
  • E.g., sight → S AY1 T
  • John → JH AA1 N

5
Keep separate:
  • Spelling (= orthography)
  • Detailed description of pronunciation
  • Abstract description of pronunciation, called
    phonemic representation

6
  • Agenda
  • Phonology: the set of phonemes and their
    realizations as phones.
  • The phonemes are reasonably constant across a
    language.
  • The phones vary a lot within a speaker and across
    speakers.
  • Some of that variation is extremely rule-governed
    and must be understood. Example: the English flap
    (in butter).

7
  1. In addition to the phonemes: syllable structure,
    and
  2. Prosody. Today: stress levels 0, 1, 2
  3. Texts: discussion of spelling errors, as a
    lead-in to Viterbi-ing the Minimum Edit Distance
  4. Letter to sound (LTS)

8
  • All speakers have a set of several dozen basic
    pronunciation units (phonemes) to which they do
    not add (and from which they do not delete) during
    their adult lifetimes. There are 39 phonemes in
    American English.
  • This phonemic inventory is not completely fixed
    and stable across the United States, but it is
    much more fixed and stable than is the
    pronunciation of these phonemes.

9
How is that possible?
  • I'm from New York; the vowel that I have in cat
    is very different from the vowel in a south
    Chicago native's cat. But the phonemes are the
    same: they correspond across thousands of words.

10
Phonemic inventory
  • In computational circles, the phonemic inventory
    is described in DARPAbet.
  • Some words from the CMU dictionary:
  • THE DH AH0
  • THE(2) DH AH1
  • THE(3) DH IY0
  • THEA TH IY1 AH0
  • THEALL TH IY1 L
  • THEANO TH IY1 N OW0
  • THEATER TH IY1 AH0 T ER0
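
Entries like these are easy to read into a program. The following is a minimal sketch, assuming each line has the layout "WORD(n) PH1 PH2 ..." seen above, where "(n)" marks an alternative pronunciation; the names SAMPLE, ENTRY and parse_entries are made up for this illustration and are not part of the CMU distribution.

```python
import re

# Sample lines copied from the slide; the line layout is an assumption.
SAMPLE = """\
THE DH AH0
THE(2) DH AH1
THE(3) DH IY0
THEA TH IY1 AH0
THEALL TH IY1 L
THEANO TH IY1 N OW0
THEATER TH IY1 AH0 T ER0
"""

# A word, an optional "(n)" variant marker, then the phoneme symbols.
ENTRY = re.compile(r"^(?P<word>[^\s(]+)(?:\((?P<variant>\d+)\))?\s+(?P<phones>.+)$")

def parse_entries(text):
    """Map each word to a list of pronunciations (each a list of phoneme symbols)."""
    lexicon = {}
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # skip blank or malformed lines
        lexicon.setdefault(match["word"], []).append(match["phones"].split())
    return lexicon

lexicon = parse_entries(SAMPLE)
print(lexicon["THE"])      # the three listed pronunciations of THE
print(lexicon["THEATER"])
```

Grouping the variants under one key keeps all pronunciations of a word together, which is what a TTS front end would need in order to choose among them.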

11
DARPAbet
  • AA odd AA D
  • AE at AE T
  • AH hut HH AH T
  • AO ought AO T
  • AW cow K AW
  • AY hide HH AY D

12
15 Vowels
  • AA odd AA D
  • AE at AE T
  • AH hut HH AH T
  • AO ought AO T
  • AW cow K AW
  • AY hide HH AY D
  • EH Ed EH D
  • ER hurt HH ER T
  • EY ate EY T
  • IH it IH T
  • IY eat IY T
  • OW oat OW T
  • OY toy T OY
  • UH hood HH UH D
  • UW two T UW

13
24 Consonants
  • Z zee Z IY
  • ZH seizure S IY ZH ER
  • HH he HH IY
  • CH cheese CH IY Z
  • JH gee JH IY
  • L lee L IY
  • M me M IY
  • N knee N IY
  • NG ping P IH NG
  • R read R IY D
  • W we W IY
  • Y yield Y IY L D
  • B be B IY
  • D dee D IY
  • G green G R IY N
  • P pee P IY
  • T tea T IY
  • K key K IY
  • S sea S IY
  • SH she SH IY
  • F fee F IY
  • V vee V IY
  • DH thee DH IY
  • TH theta TH EY T AH
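
The two lists above (15 vowels, 24 consonants) can be used to sanity-check a transcription. This is a small sketch, assuming that a trailing digit 0, 1 or 2 on a vowel is its stress level; the function names are invented for the example.

```python
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}
CONSONANTS = {"B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M", "N",
              "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH"}

def split_stress(symbol):
    """Split 'AO1' into ('AO', 1); symbols without a stress digit give (symbol, None)."""
    if symbol[-1] in "012":
        return symbol[:-1], int(symbol[-1])
    return symbol, None

def check_transcription(phones):
    """Return the symbols that do not belong to the 39-phoneme inventory."""
    unknown = []
    for symbol in phones:
        base, stress = split_stress(symbol)
        if stress is not None:
            valid = base in VOWELS  # only vowels may carry a stress digit
        else:
            valid = base in VOWELS or base in CONSONANTS
        if not valid:
            unknown.append(symbol)
    return unknown

print(len(VOWELS) + len(CONSONANTS))          # 39
print(check_transcription("K AO1 T".split())) # []
print(check_transcription(["J", "AA1", "N"])) # ['J'] is not an inventory symbol
```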

14
Moby system: http://www.dcs.shef.ac.uk/research/ilash/Moby/
  • /&/ sounds like the "a" in "dab"
  • /(@)/ sounds like the "a" in "air"
  • /A/ sounds like the "a" in "far"
  • /eI/ sounds like the "a" in "day"
  • /@/ sounds like the "a" in "ado"
  • or the glide "e" in "system" (diphthong schwa)
  • /-/ sounds like the "ir" glide in "tire"
  • or the "dl" glide in "handle"
  • or the "den" glide in "sodden" (diphthong little
    schwa)
  • /Oi/ sounds like the "oi" in "oil"
  • /A/ sounds like the "o" in "bob"
  • /AU/ sounds like the "ow" in "how"
  • /O/ sounds like the "o" in "dog"

15
Some sources of dictionaries, including CMU's
  • ftp://svr-ftp.eng.cam.ac.uk/pub/pub/pub/comp.speech/dictionaries

16
The tremendous variety of actual pronunciations
that native speakers can blissfully ignore is
staggering
  • But speech recognition systems need to be trained
    on this, just as people are in their youth.

17
Varieties of sounds in everyone's speech
  • Most phonemes have several different
    pronunciations (called their allophones),
    determined by nearby sounds, most usually by the
    following sound.
  • The most striking instance of such variation is
    in the realization of the phoneme /T/ in American
    English.

18
  • We'll return to the flap after the syllable.

19
The syllable
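[Tree diagram: a syllable (σ) divides into onset and rhyme, and the rhyme into nucleus and coda. Example: h e l p, with onset h, nucleus e, coda l p.]
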
20
Flap (D) in American English
  • We find the flap of water (waDer) under these
    conditions, strictly inside a word:

21
But across words
  • Word-initial t never flaps, regardless of
    stresses before or after: eat my tomato, see
    Topeka...
  • Word-final t followed by a vowel-initial word
    normally does flap, regardless of stresses before
    or after: at all, sit on it...
  • But in the words to, tonight, today, tomorrow,
    the to- acts as if it were linked to the
    preceding word: go Do bed.

22
Generalization
  • English permits phonemes to belong simultaneously
    to two syllables (= be ambisyllabic) under
    certain conditions.
  • Ambisyllabic t's convert to flaps.
  • Generally speaking:

23
[Diagram: two syllables (σ σ), each with an onset and a rhyme, over B AH1 T ER0.]
This is where we get a flap in American English.

24
  • Within a word:
  • A C becomes part of the syllable that follows it,
    as its onset ("maximize syllable onset").

25
...within a word
[Diagram: a syllable (σ) over C V.]

26
This also applies across words in English and
in many languages, but not (e.g.) in German.
[Diagram: a syllable (σ) over V C.]


27
Within a word, ambisyllabification before an
unstressed vowel
e.g., atom
[Diagram: two syllables (σ σ) over V C V, with stress marked on the vowels; the second vowel is unstressed (-stress).]

28
But not across word boundaries:
we don't say "my Domato" for "my tomato".

29
/T/ as flap inside words
                        following stressed             following unstressed
preceding stressed      no flap: Beethoven, attar      flap: matter, cattle
preceding unstressed    no flap: return, Mattel        optional: sanity
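
The four cells of this table can be written as a small function. What follows is only a sketch: it assumes DARPAbet transcriptions with stress digits, treats stress levels 1 and 2 alike as "stressed", and looks only at a T that sits directly between two vowels. The transcriptions in the usage lines are assumed for illustration, not quoted from a dictionary.

```python
def stress_of(symbol):
    """Return the stress digit of a vowel symbol such as 'AE1', or None for consonants."""
    return int(symbol[-1]) if symbol[-1].isdigit() else None

def classify_t(phones):
    """Return (index, label) for every T that sits directly between two vowels."""
    results = []
    for i in range(1, len(phones) - 1):
        if phones[i] != "T":
            continue
        before, after = stress_of(phones[i - 1]), stress_of(phones[i + 1])
        if before is None or after is None:
            continue  # not flanked by vowels; the table does not cover this case
        if after > 0:
            label = "no flap"      # following vowel stressed
        elif before > 0:
            label = "flap"         # stressed before, unstressed after
        else:
            label = "optional"     # unstressed on both sides
        results.append((i, label))
    return results

# Assumed transcriptions, for illustration only.
print(classify_t("M AE1 T ER0".split()))          # matter
print(classify_t("B EY1 T OW2 V AH0 N".split()))  # Beethoven
print(classify_t("R IH0 T ER1 N".split()))        # return
print(classify_t("S AE1 N AH0 T IY0".split()))    # sanity
```
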
30
/T/ as flap at word-edge
  • If a word ends in a /t/ and the next word starts
    with a vowel, a flap is normal:
  • at D all, What D is your name?, etc.
  • If a word ends in a vowel and the next word
    starts with a /t/, there is never a flap, unless
    the second word starts with the prefix to-!
  • the t tomato, the t topology of; but
  • go D to the moon, go D tomorrow

31
Most computational devices avoid worrying about
these issues
  • by (always) treating phonemes in the context of
    their left- and right-hand neighbors.
  • Need to produce an AE? Find out what neighbors it
    needs to be produced next to. HH AE T? Find an AE
    that was produced after an HH and before a T.
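
A rough sketch of this neighbor-based lookup, assuming each stored unit is keyed by (left neighbor, phoneme, right neighbor) and that "#" marks a word edge. The inventory contents and file names are hypothetical placeholders.

```python
def context_units(phones, boundary="#"):
    """Return (left, phone, right) triples, padding the edges with a boundary symbol."""
    padded = [boundary] + list(phones) + [boundary]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]

def find_unit(inventory, left, phone, right):
    """Look up a unit recorded in exactly this context, or None if there is none."""
    return inventory.get((left, phone, right))

# Hypothetical inventory: context triple -> name of a recorded snippet.
inventory = {
    ("HH", "AE", "T"): "ae_after_hh_before_t.wav",
    ("K", "AE", "T"): "ae_after_k_before_t.wav",
}

for left, phone, right in context_units(["HH", "AE", "T"]):
    print((left, phone, right), find_unit(inventory, left, phone, right))
```

Keying on both neighbors is what lets such a system pick up rule-governed variants like the flap without stating the rule.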

32
Variation in pronunciation is largely
geographical, but it is also related to class,
race, and gender
  • William Labov is the master analyst of this
    material, and many papers are available at his
    web site:
  • http://www.ling.upenn.edu/labov/home.html
  • See especially his
  • http://www.ling.upenn.edu/phono_atlas/ICSLP4.html
    Dialect Diversity in North America

33
Ongoing changes in American English pronunciation
  • 1. Loss of difference between AA (cot) and AO
    (caught). See also hot dog (HH AA T D AO G).
  • Some speakers produce these vowels differently (I
    do). Others do not.
  • Labov's group has produced the following map:

34
AA / AO distinction/collapse
35
Distinction between vowels IH and EH before n
  • ink-pen versus baby-pin
  • distinction lost in the South.

36
in/en distinction (pin/pen)
37
Variation in AE phoneme (hat)
  • A very wide range of American speakers do NOT
    have the same vowels in sand and sang.
  • The vowels in cat and sang are the same, but in
    sand the vowel is much higher.
  • However, in the Northern Cities shift, all AE is
    pronounced like the last two syllables of idea;
    this is prevalent right here in the south Chicago
    area.

39
Sound-letter relationships
  • LTS: letter to sound, or
  • Phoneme-grapheme relationships.
  • In most languages, this is simple.
  • But in English and in French, it's very messy.
  • Why? Because the spelling system in both is based
    on how the language used to be pronounced, and
    the pronunciation has since changed.

40
Other languages
  • In most other languages, spelling reflects
    current pronunciation much more accurately.
  • Stress: most languages don't mark which syllable
    is stressed. In some languages, there are simple
    principles that tell us which syllable is
    stressed, but when there are no such principles
    (e.g. English, Russian), then you need to build
    word-lists with the stress indicated.

41
Letter to sound for English
  • Letter >> phoneme for speech synthesis
  • Phoneme >> letter for speech recognition

42
Challenges to Letter-to-Sound
  • There are always new words being found, and most
    of them are new proper names (people, places,
    products, companies, etc.)

43
Damper, Marchand, Adamson and Gustafson 1998
Testing Letter to Sound
  • Third ESCA/COCOSDA Workshop on SPEECH SYNTHESIS
    November 1998
  • They contest Liberman and Church's statement in
    1991:
  • "We will describe algorithms for pronunciation of
    English words ... that reduce the error rate to only
    a few tenths of a percent for ordinary text,
    about two orders of magnitude better than the
    word error rates of 15% or so that were common a
    decade ago."
  • They write,
  • "In this paper, we have shown that automatic
    pronunciation of novel words is not a solved
    problem in TTS synthesis. The best that can be
    done is about 70% words correct using PbA
    [Pronunciation by Analogy] ... traditional
    rules ... perform very badly, much worse than
    pronunciation by analogy and other data-driven
    approaches."

44
Damper et al.
  • Compare 4 approaches:
  • Hand-written phonological rules
  • Pronunciation by analogy (based on Dedina and
    Nusbaum 1991)
  • Neural networks (based on Sejnowski and
    Rosenberg's NETtalk)
  • Information theory-based approach (Nearest
    neighbor)

45
How to evaluate LTS?
  • Systems typically use
  • a large dictionary
  • a set of exceptional words
  • a backoff strategy for words that slip through
    the first 2 steps.
  • Is it fair to test the backoff strategy on words
    in the first two sets, then?
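
The three-step arrangement can be sketched as a lookup chain. The dictionary, the exception list and the one-phoneme-per-letter backoff below are toy stand-ins invented for this example; no real system is being described.

```python
def pronounce(word, dictionary, exceptions, backoff):
    """Try the dictionary, then the exception list, then the backoff strategy."""
    key = word.upper()
    if key in dictionary:
        return dictionary[key], "dictionary"
    if key in exceptions:
        return exceptions[key], "exceptions"
    return backoff(key), "backoff"

# Toy data, invented for illustration only.
DICTIONARY = {"CAUGHT": ["K", "AO1", "T"], "SIGHT": ["S", "AY1", "T"]}
EXCEPTIONS = {"JOHN": ["JH", "AA1", "N"]}
LETTER_GUESS = {"B": "B", "A": "AE", "T": "T", "S": "S", "I": "IH", "N": "N"}

def naive_backoff(word):
    """Hypothetical backoff: guess one phoneme per letter, skipping unknown letters."""
    return [LETTER_GUESS[ch] for ch in word if ch in LETTER_GUESS]

for w in ["caught", "John", "bat"]:
    print(w, pronounce(w, DICTIONARY, EXCEPTIONS, naive_backoff))
```

Only "bat" ever reaches the backoff here, which is the point of the question above: words in the first two sets never exercise the backoff when the system is used normally.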

46
  • Damper et al. propose:
  • Test on a single, entire, large dictionary
  • Strict scoring, not frequency-weighted, giving
    credit only for full-word correct
  • A standardized phoneme output set should be
    employed

47
Evaluation
  • In reality, different descriptions of English use
    different sets of phonemes (e.g., is stress
    marked on the vowels? British versus American)
  • Issues in testing data-driven methods, because
    the performance of a data-driven method is
    tightly linked to the data it was trained on.

48
Data-driven method
Data
Learning method
Letter-to-sound conversion system
49
  • In theory, you should never test a data-driven
    method on data that it was trained on.
  • In theory, if you want to test the performance of
    the method on the whole dictionary, you can train
    the system on the whole dictionary less one word,
    and then test it on that word and do all of that
    each time for each word.
  • But that takes too long! And we're also
    interested in the relationship between training
    corpus size and total performance.

50
Damper et al.'s work-around
  • For various values of N (up to half the size of
    the dictionary):
  • Take two random samples of the dictionary, each
    of size N. Train on one set, test on the other.
  • N = 100, 500, 1000, 2000, 5000 and 8,140.
  • Dictionary is of size 16,280.
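
A minimal sketch of this sampling scheme, assuming the two samples are meant to be non-overlapping (the slide does not say so explicitly, but N never exceeds half the dictionary). The word list here is a placeholder, and disjoint_samples is a name made up for the example.

```python
import random

def disjoint_samples(words, n, seed=0):
    """Draw two non-overlapping random samples of size n from a word list."""
    if 2 * n > len(words):
        raise ValueError("n may be at most half the dictionary size")
    rng = random.Random(seed)          # fixed seed so a run can be repeated
    drawn = rng.sample(words, 2 * n)   # 2n distinct entries, in random order
    return drawn[:n], drawn[n:]

# Placeholder dictionary of the size given on the slide.
words = [f"word{i}" for i in range(16280)]

for n in [100, 500, 1000, 2000, 5000, 8140]:
    train, test = disjoint_samples(words, n)
    print(n, len(train), len(test), len(set(train) & set(test)))
```

At N = 8,140 the two samples together use up the whole dictionary, so every word is either trained on or tested on, never both.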

51
Results: hand-written rules
  • Elovitz et al.: hand-written rules for this
    purpose. 25.7% of words were entirely correct.
    Length errors (especially due to geminate
    consonants), /g/-/j/ confusions and vowel
    substitutions abound. Extensive efforts were
    made to make sure that this low figure was not an
    error!

52
Pronunciation by analogy
  • Begin with a (hand-made) alignment of letters to
    sounds. For every observed string of letters,
    gather the set of phonemes that it can be
    associated with, and store in data-structure
    along with their frequency.
  • For the test word, find all ways of dividing the
    word up into pieces that are present in the data
    structure. Weight the resulting analyses by (1)
    how many subpieces are involved, and (2)
    frequencies of the subpieces, and choose the
    best.
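
The two steps can be illustrated with a much-simplified sketch. It assumes a one-phoneme-slot-per-letter alignment with "-" for silent letters, splits the test word into non-overlapping pieces, prefers fewer pieces, and breaks ties by the product of piece frequencies. Published PbA systems are more elaborate than this; the training pairs are toy data invented for the example.

```python
from collections import Counter, defaultdict

def build_table(aligned):
    """Map every letter substring to a Counter of the phoneme strings aligned with it."""
    table = defaultdict(Counter)
    for letters, phones in aligned:
        assert len(letters) == len(phones), "one phoneme slot per letter (assumed)"
        for i in range(len(letters)):
            for j in range(i + 1, len(letters) + 1):
                table[letters[i:j]][tuple(phones[i:j])] += 1
    return table

def pronounce_by_analogy(word, table):
    """Pick the split with the fewest pieces, breaking ties by frequency product."""
    # best[k] = (piece_count, -frequency_product, phonemes) for word[:k], or None
    best = [None] * (len(word) + 1)
    best[0] = (0, -1, ())
    for k in range(1, len(word) + 1):
        for j in range(k):
            piece = word[j:k]
            if best[j] is None or piece not in table:
                continue
            phones, freq = table[piece].most_common(1)[0]
            pieces, neg_product, prefix = best[j]
            candidate = (pieces + 1, neg_product * freq, prefix + phones)
            if best[k] is None or candidate[:2] < best[k][:2]:
                best[k] = candidate
    if best[len(word)] is None:
        return None  # some letter sequence was never seen in training
    return [p for p in best[len(word)][2] if p != "-"]

# Toy hand-made alignments ("-" marks a letter with no sound of its own).
ALIGNED = [
    ("cat", ["K", "AE", "T"]),
    ("hat", ["HH", "AE", "T"]),
    ("chat", ["CH", "-", "AE", "T"]),
    ("sight", ["S", "AY", "-", "-", "T"]),
    ("sit", ["S", "IH", "T"]),
]
table = build_table(ALIGNED)
print(pronounce_by_analogy("sat", table))   # built from the pieces "s" + "at"
```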

53
Results: PbA, neural net
  • PbA: 71.8% correct.
  • Neural net: 54.4%, when trained on the whole
    dictionary

54
Information-Gain trees
  • IB1-IG: 57.4% correct
  • This approach is a variant on decision-tree
    learning (an important paradigm in machine
    learning).

55
  • In simplest terms, a decision-tree approach
    studies a problem like "What phoneme realizes
    this letter in this context?" by looking at all
    relevant examples in the data, considering
    all context data (what precedes, what follows,
    etc.), and deciding, first, which factor gives
    the most information.
  • Measure the uncertainty: first, the uncertainty of
    how this t should be pronounced.
  • Then measure the uncertainty if you know what the
    following letter is.
  • Measuring uncertainty:

56
Entropy as measure of uncertainty
  • Set of possibilities for realizing t:
  • T 64%
  • TH 36%
  • calculate
  • $0.64 \log_2(0.64) + 0.36 \log_2(0.36)$
  • and multiply by $-1$: $0.94268$

57
  • realization of t
  • if following letter is h (36%):
  • T .02
  • TH .98
  • Entropy $= -1 \times (0.02 \log_2 0.02 + 0.98 \log_2 0.98)$
  • $= 0.14144$ (base 2 logs!)
  • if following letter is anything else (64%):
  • T 1.00
  • TH .00
  • Entropy $= -1 \times (1 \log_2 1 + 0 \log_2 0) = 0$ (taking $0 \log_2 0 = 0$)
  • Total entropy now $= 0.36 \times 0.14144 + 0.64 \times 0$
  • $= 0.05092$: a huge decrease from 0.94268!
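
The arithmetic on these two slides can be checked in a few lines. This sketch uses the probabilities from the slides and the usual convention that a zero-probability outcome contributes nothing to the entropy; the variable names are invented for the example.

```python
from math import log2

def entropy(probabilities):
    """Shannon entropy in bits; outcomes with p == 0 contribute nothing."""
    return sum(-p * log2(p) for p in probabilities if p > 0)

before = entropy([0.64, 0.36])        # uncertainty about t with no context
after_h = entropy([0.02, 0.98])       # uncertainty when the next letter is h
after_other = entropy([1.0, 0.0])     # uncertainty when it is anything else

# Weight each case by how often it occurs (36% and 64%).
expected_after = 0.36 * after_h + 0.64 * after_other
gain = before - expected_after

print(round(before, 5))          # 0.94268
print(round(after_h, 5))         # 0.14144
print(round(expected_after, 5))  # 0.05092
print(round(gain, 5))
```

The difference between the two entropies is the information gained by knowing the following letter, which is the quantity a decision-tree learner compares across candidate context features.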

58
Information gain and LTS
  • The idea is to use this method of testing to
    automatically determine which aspects of a
    letter's neighborhood are most revealing in
    determining how that letter should be realized in
    that word.
  • But: only 57.4% fully correct results in this
    experiment.

59
Bottom line
  • Still a lot of work to be done both in getting
    results and testing how well various methods
    work.

60
Minimal Edit Distance
  • A first look at Viterbi in action

61
  • What's the best way to line up two different
    strings? To answer that question, we have to make
    some specifications.
  • One (p. 53ff in textbook, Section 5.6) could be
    that perfect alignments are free, while a
    deletion (non-alignment) costs 1 and a
    substitution costs 2.

62
E X E C U T I O N
I N T E N T I O N
These are free and there are no reduced fares
for any kind of partial match for the others.
63
Cost: 3 substitutions + 2 hangings = 8 (3 × 2 + 2 × 1)
E X E C U T I O N
I N T E N T I O N
64
Cost: 1 substitution + 6 hangings = 8 (1 × 2 + 6 × 1)
Same cost: that's how we've set up the problem.
E X E C U T I O N
I N T E N T I O N
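
The cost of a given alignment can be checked mechanically. In this sketch "*" marks a hanging (unaligned) letter, and the two alignments are reconstructions that fit the counts stated on the slides (3 substitutions + 2 hangings, and 1 substitution + 6 hangings); the original drawings did not survive, so they are assumptions.

```python
GAP = "*"

def alignment_cost(top, bottom, hanging_cost=1, substitution_cost=2):
    """Sum the cost of an alignment given as two equal-length strings with gaps."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    cost = 0
    for a, b in zip(top, bottom):
        if a == GAP and b == GAP:
            raise ValueError("a column cannot be a gap on both sides")
        if a == GAP or b == GAP:
            cost += hanging_cost        # letter left hanging
        elif a != b:
            cost += substitution_cost   # mismatched pair
        # identical letters are free
    return cost

# Assumed alignment with 3 substitutions and 2 hangings.
print(alignment_cost("INTE*NTION", "*EXECUTION"))      # 8
# Assumed alignment with 1 substitution and 6 hangings.
print(alignment_cost("INTE***NTION", "***EXECUTION"))  # 8
```
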
65
N   9   8   9  10  11  12  11  10   9   8
O   8   7   8   9  10  11  10   9   8   9
I   7   6   7   8   9  10   9   8   9  10
T   6   5   6   7   8   9   8   9  10  11
N   5   4   5   6   7   8   9  10  11  10
E   4   3   4   5   6   7   8   9  10   9
T   3   4   5   6   7   8   7   8   9   8
N   2   3   4   5   6   7   8   7   8   7
I   1   2   3   4   5   6   7   6   7   8
#   0   1   2   3   4   5   6   7   8   9
    #   E   X   E   C   U   T   I   O   N
66
  • The chart tells us something about how we walk
    through it, but (the book's not clear on this)
    we also have to keep track on a memo-pad of what
    the best path was that got us to each box.
  • We need to find a path that only goes Right, Up,
    or Both (Up and Right) and leads us to the best
    final box.
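
The chart and the memo-pad can be sketched together. This assumes the costs given earlier (hanging letter 1, substitution 2, match 0), with INTENTION running up the rows and EXECUTION across the columns; when several moves tie, one is picked in a fixed, arbitrary order. The function name and the "*" gap marker are choices made for this example.

```python
def min_edit_alignment(source, target, hanging_cost=1, substitution_cost=2):
    """Fill the distance chart with back-pointers and recover one best alignment."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    back = [[None] * cols for _ in range(rows)]   # the "memo-pad"
    for i in range(1, rows):
        dist[i][0], back[i][0] = i * hanging_cost, "up"
    for j in range(1, cols):
        dist[0][j], back[0][j] = j * hanging_cost, "right"
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            options = [
                (dist[i - 1][j - 1] + (0 if same else substitution_cost), "both"),
                (dist[i - 1][j] + hanging_cost, "up"),
                (dist[i][j - 1] + hanging_cost, "right"),
            ]
            dist[i][j], back[i][j] = min(options, key=lambda option: option[0])
    # Walk the back-pointers from the final box to the origin.
    i, j, top, bottom = len(source), len(target), [], []
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "both":
            top.append(source[i - 1]); bottom.append(target[j - 1]); i -= 1; j -= 1
        elif move == "up":
            top.append(source[i - 1]); bottom.append("*"); i -= 1
        else:
            top.append("*"); bottom.append(target[j - 1]); j -= 1
    return dist, "".join(reversed(top)), "".join(reversed(bottom))

dist, top, bottom = min_edit_alignment("INTENTION", "EXECUTION")
print(dist[-1][-1])   # 8
print(top)
print(bottom)
```

Storing one back-pointer per box is enough to recover a best path; storing all tied moves would recover every best path.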

67
  • We can arbitrarily choose one of the best ways to
    get to a box in this case, because the problem at
    hand doesn't set different costs depending on the
    row-transitions. But very frequently such costs
    must be borne in mind.

68
N   9   8   9  10  11  12  11  10   9   8
O   8   7   8   9  10  11  10   9   8   9
I   7   6   7   8   9  10   9   8   9  10
T   6   5   6   7   8   9   8   9  10  11
N   5   4   5   6   7   8   9  10  11  10
E   4   3   4   5   6   7   8   9  10   9
T   3   4   5   6   7   8   7   8   9   8
N   2   3   4   5   6   7   8   7   8   7
I   1   2   3   4   5   6   7   6   7   8
#   0   1   2   3   4   5   6   7   8   9
    #   E   X   E   C   U   T   I   O   N