Phonology from a computational point of view

Slides: 69

Provided by: Prefer861

1
Phonology from a computational point of view

Phonemes, dialects, letter-to-sound conversion
March 2001

2
Phonology

The study of the sound patterns of languages.
We will extend this to include the letter
patterns of languages.

3
Syntax
Information Retrieval
Morphology: catch + PAST
Spelling: caught
Phonemic representation: K AO1 T
Sound

4
Why study phonology in this course?

Text to speech (TTS) applications include a
component which converts spelled words to
sequences of phonemes (= sound representations).
E.g., sight → S AY1 T
John → JH AA1 N

5
Keep separate

Spelling (= orthography)
Detailed description of pronunciation
Abstract description of pronunciation called
phonemic representation

6
Agenda

Phonology: the set of phonemes and their realizations as
phones.
The phonemes are reasonably constant across a
language.
The phones vary a lot within a speaker and across
speakers.
Some of that variation is extremely rule-governed
and must be understood. Example: the English flap
(in butter).

7
In addition to the phonemes: syllable structure
and prosody. Today: stress levels 0, 1, 2.
The text's discussion of spelling errors, as a
lead-in to Viterbi-ing the Minimum Edit Distance.
Letter to sound (LTS).

8
All speakers have a set of several dozen basic
pronunciation units (phonemes) to which they do
not add (and from which they do not delete) during
their adult lifetimes. There are 39 phonemes in
American English.
This phonemic inventory is not completely fixed
and stable across the United States, but it is
much more fixed and stable than is the
pronunciation of these phonemes.

9
How is that possible?

I'm from New York: the vowel that I have in cat
is very different from the vowel in a south
Chicago native's cat, but the phonemes are the
same: they correspond across thousands of words.

10
Phonemic inventory

In computational circles, the phonemic inventory
is described in DARPAbet.
Some words from the CMU dictionary:
THE DH AH0
THE(2) DH AH1
THE(3) DH IY0
THEA TH IY1 AH0
THEALL TH IY1 L
THEANO TH IY1 N OW0
THEATER TH IY1 AH0 T ER0
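
A minimal Python sketch of how entries in this format might be read: each line holds a word, an optional variant number in parentheses, and a list of phonemes whose vowels carry a stress digit. The function names, and the choice to number the unmarked entry as variant 1, are assumptions made for illustration; they are not part of the CMU distribution.

```python
import re

def parse_entry(line):
    """Split one dictionary line into (word, variant, phonemes)."""
    head, *phones = line.split()
    match = re.fullmatch(r"(.+?)(?:\((\d+)\))?", head)
    word = match.group(1)
    # Assumption: an entry without "(n)" is treated as variant 1.
    variant = int(match.group(2)) if match.group(2) else 1
    return word, variant, phones

def split_stress(phone):
    """Separate a phoneme symbol from its stress digit, if it has one."""
    if phone[-1].isdigit():
        return phone[:-1], int(phone[-1])
    return phone, None

entries = [
    "THE DH AH0",
    "THE(2) DH AH1",
    "THE(3) DH IY0",
    "THEATER TH IY1 AH0 T ER0",
]

# Collect every listed pronunciation under its spelling.
lexicon = {}
for entry in entries:
    word, variant, phones = parse_entry(entry)
    lexicon.setdefault(word, []).append(phones)
```

Keeping all variants under one spelling reflects the point that one written word (THE) can have several phonemic representations.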

11
DARPAbet

AA odd AA D
AE at AE T
AH hut HH AH T
AO ought AO T
AW cow K AW
AY hide HH AY D

12
15 Vowels

AA odd AA D
AE at AE T
AH hut HH AH T
AO ought AO T
AW cow K AW
AY hide HH AY D
EH Ed EH D
ER hurt HH ER T

EY ate EY T
IH it IH T
IY eat IY T
OW oat OW T
OY toy T OY
UH hood HH UH D
UW two T UW

13
24 Consonants

Z zee Z IY
ZH seizure S IY ZH ER
HH he HH IY
CH cheese CH IY Z
JH gee JH IY
L lee L IY
M me M IY
N knee N IY
NG ping P IH NG
R read R IY D
W we W IY
Y yield Y IY L D

B be B IY
D dee D IY
G green G R IY N
P pee P IY
T tea T IY
K key K IY
S sea S IY
SH she SH IY
F fee F IY
V vee V IY
DH thee DH IY
TH theta TH EY T AH
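
As a small illustration, the 15 vowels and 24 consonants listed above can be held as sets and used to check that a transcription only uses symbols from the inventory. This is a sketch; the helper name `unknown_symbols` is hypothetical.

```python
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}
CONSONANTS = {
    "B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M", "N",
    "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH",
}
INVENTORY = VOWELS | CONSONANTS

def unknown_symbols(phones):
    """Return the symbols that are not in the inventory, ignoring stress digits."""
    return [p for p in phones if p.rstrip("012") not in INVENTORY]

print(len(INVENTORY))                        # 39
print(unknown_symbols("K AO1 T".split()))    # []
print(unknown_symbols("J AA1 N".split()))    # ['J']
```

A check like this catches a stray symbol such as J, which is not in the set (the affricate is written JH).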

14
Moby system: http://www.dcs.shef.ac.uk/research/ilash/Moby/

/&/ sounds like the "a" in "dab"
/(@)/ sounds like the "a" in "air"
/A/ sounds like the "a" in "far"
/eI/ sounds like the "a" in "day"
/@/ sounds like the "a" in "ado"
or the glide "e" in "system" (diphthong schwa)
/-/ sounds like the "ir" glide in "tire"
or the "dl" glide in "handle"
or the "den" glide in "sodden" (diphthong little
schwa)
/Oi/ sounds like the "oi" in "oil"
/A/ sounds like the "o" in "bob"
/AU/ sounds like the "ow" in "how"
/O/ sounds like the "o" in "dog"

15
Some sources of dictionaries, including CMU's

ftp://svr-ftp.eng.cam.ac.uk/pub/pub/pub/comp.speech/dictionaries

16
The tremendous variety of actual pronunciations
that native speakers can blissfully ignore is
staggering

But speech recognition systems need to be trained
on this, just as people are in their youth.

17
Varieties of sounds in everyone's speech

Most phonemes have several different
pronunciations (called their allophones),
determined by nearby sounds, most usually by the
following sound.
The most striking instance of such variation is
in the realization of the phoneme /T/ in American
English.

18
We'll return to the flap after the syllable.

19
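The syllable
[Diagram: a syllable (σ) branches into onset and rhyme; the rhyme branches into nucleus and coda.]
Example: help = onset h, nucleus e, coda l p
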
20
Flap (D) in American English

We find the flap of water (waDer) under these
conditions, strictly inside a word:

21
But across words

Word-initial t never flaps, regardless of
stresses before or after: eat my tomato, see
Topeka...
Word-final t followed by a vowel-initial word
normally does flap, regardless of stresses before
or after: at all, sit on it...
But in the words to, tonight, today, tomorrow,
the to acts as if it were linked to the
preceding word: go to bed is pronounced "go Do bed".

22
Generalization

English permits phonemes to belong simultaneously
to two syllables (= be ambisyllabic) under
certain conditions.
Ambisyllabic t's convert to flaps.
Generally speaking

23
[Diagram: two syllables (σ σ), each with onset and rhyme]
B AH1 T ER
This is where we get a flap in American English

24

Within a word
C becomes part of syllable with a following onset
("maximize syllable onset")

25
...within a word
[Diagram: σ; C V]

26
This also applies across words, in English and
in many languages, but not (e.g.) in German
[Diagram: σ; V C]

27
Within a word, ambisyllabification before an
unstressed vowel
e.g., atom
[Diagram: σ σ; V C V; labels: -stress, +stress]

28
But not across word boundaries:
we don't say my tomato as "my Domato"

29
/T/ as flap inside words

                      following stressed           following unstressed
preceding stressed    no flap: Beethoven, attar    flap: matter, cattle
preceding unstressed  no flap: return, Mattel      optional: sanity
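
The four cells of this table can be written as a small lookup on the stress digits of the neighboring vowels. The sketch below assumes DARPAbet transcriptions with stress digits, treats secondary stress (2) as stressed, and only handles a T that sits directly between two vowels inside a word; everything else is reported as outside the table. The function names and the hand-entered transcriptions are illustrative assumptions.

```python
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}

def is_vowel(phone):
    return phone.rstrip("012") in VOWELS

def is_stressed(phone):
    # Assumption: both primary (1) and secondary (2) stress count as stressed.
    return phone[-1] in "12"

def flap_status(phones, i):
    """Classify the T at position i according to the word-internal table."""
    if phones[i] != "T":
        raise ValueError("position i must hold a T")
    if i == 0 or i == len(phones) - 1:
        return "outside table"      # word-edge T is a separate case
    before, after = phones[i - 1], phones[i + 1]
    if not (is_vowel(before) and is_vowel(after)):
        return "outside table"      # simplification: intervocalic T only
    if is_stressed(after):
        return "no flap"
    return "flap" if is_stressed(before) else "optional"

print(flap_status("M AE1 T ER0".split(), 2))        # flap
print(flap_status("R IH0 T ER1 N".split(), 2))      # no flap
print(flap_status("S AE1 N AH0 T IY0".split(), 4))  # optional
```

Restricting the check to intervocalic T keeps the sketch aligned with the table; it does not attempt cases such as T after R or N.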
30
/T/ as flap at word-edge

If a word ends in a /t/ and the next word starts
with a vowel, a flap is normal:
at [D] all, What [D] is your name?, etc.
If a word ends in a vowel and the next word
starts with a /t/, there is never a flap, unless
the second word starts with the prefix to-!
the [t] tomato, the [t] topology of ..., but
go [D] to the moon, go [D] tomorrow

31
Most computational devices avoid worrying about
these issues

by (always) treating phonemes in the context of
their left- and right-hand neighbors.
Need to produce an AE? Find out what neighbors it
needs to be produced next to. H AE T? Find an AE
that was produced after an H and before a T.
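
A sketch of the idea of looking up a phoneme by its neighbors: each phoneme is turned into a (left, phone, right) context, and a stored unit is fetched for exactly that context. The boundary marker "#", the store `units`, and the unit name "ae_017" are hypothetical, chosen only for illustration.

```python
def triphones(phones, boundary="#"):
    """List (left, phone, right) contexts; '#' marks a word edge (an assumption)."""
    padded = [boundary, *phones, boundary]
    return [tuple(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]

# Hypothetical store of recorded units, keyed by their original context.
units = {("HH", "AE", "T"): "ae_017"}

def find_unit(left, phone, right, store=units):
    """Return the unit recorded in this context, or None if there is none."""
    return store.get((left, phone, right))

print(triphones(["HH", "AE", "T"]))
# [('#', 'HH', 'AE'), ('HH', 'AE', 'T'), ('AE', 'T', '#')]
print(find_unit("HH", "AE", "T"))   # ae_017
```

Because the context is part of the key, allophonic variation such as the flap never has to be stated as a rule; it is carried by whichever recorded unit matches.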

32
Variation in pronunciation is largely
geographical, but it is also related to class,
race, and gender

William Labov is the master analyst of this
material, and many papers are available at his
web site:
http://www.ling.upenn.edu/labov/home.html
See especially his "Dialect Diversity in North America":
http://www.ling.upenn.edu/phono_atlas/ICSLP4.html

33
Ongoing changes in American English pronunciation

1. Loss of difference between AA (cot) and AO
(caught). See also hot dog (h AA t d AO g).
Some speakers produce these vowels differently (I
do). Others do not.
Labov's group has produced the following map:

34
AA / AO distinction/collapse
35
Distinction between vowels IH and EH before n

ink-pen versus baby-pin
distinction lost in the South.

36
in/en distinction (pin/pen)
37
Variation in AE phoneme (hat)

A very wide range of American speakers do NOT
have the same vowels in sand and sang.
The vowels in cat and sang are the same, but in
sand the vowel is much higher.
However, in the Northern Cities shift, all AE is
pronounced like the last two syllables of idea;
this is prevalent right here in the south Chicago
area.

39
Sound-letter relationships

LTS: Letter to sound, or
phoneme-grapheme relationships.
In most languages, this is simple.
But in English and in French, it's very messy.
Why? Because the spelling system in both is based
on how the language used to be pronounced, and
the pronunciation has since changed.

40
Other languages

In most other languages, spelling reflects
current pronunciation much more accurately.
Stress: most languages don't mark which syllable
is stressed. In some languages, there are simple
principles that tell us which syllable is
stressed, but when there are no such principles
(e.g. English, Russian), you need to build
word-lists with the stress indicated.

41
Letter to sound for English

Letter >> phoneme for speech synthesis
Phoneme >> letter for speech recognition

42
Challenges to Letter-to-Sound

There are always new words being found, and most
of them are new proper names (people, places,
products, companies, etc.)

43
Damper, Marchand, Adamson and Gustafson 1998:
Testing Letter to Sound

Third ESCA/COCOSDA Workshop on SPEECH SYNTHESIS,
November 1998
They contest Liberman and Church's statement in
1991:
"We will describe algorithms for pronunciation of
English words ... that reduce the error rate to only
a few tenths of a percent for ordinary text,
about two orders of magnitude better than the
word error rates of 15% or so that were common a
decade ago."
They write:
"In this paper, we have shown that automatic
pronunciation of novel words is not a solved
problem in TTS synthesis. The best that can be
done is about 70% words correct using PbA
[Pronunciation by Analogy] ... traditional
rules ... perform very badly, much worse than
pronunciation by analogy and other data-driven
approaches."

44
Damper et al.

Compare 4 approaches:
Hand-written phonological rules
Pronunciation by analogy (based on Dedina and
Nusbaum 1991)
Neural networks (based on Sejnowski and
Rosenberg's NETtalk)
Information theory-based approach (Nearest
neighbor)

45
How to evaluate LTS?

Systems typically use
a large dictionary
a set of exceptional words
a backoff strategy for words that slip through
the first 2 steps.
Is it fair to test the backoff strategy on words
in the first two sets, then?
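
The three-stage arrangement can be sketched as a lookup chain that also reports which stage answered, which is the information needed to decide what the backoff strategy is actually being tested on. All names here are hypothetical, and `toy_rules` is a deliberately crude stand-in for a real backoff component.

```python
def pronounce(word, dictionary, exceptions, backoff):
    """Try the dictionary, then the exceptions list, then the backoff rules."""
    key = word.upper()
    if key in dictionary:
        return dictionary[key], "dictionary"
    if key in exceptions:
        return exceptions[key], "exceptions"
    return backoff(key), "backoff"

def toy_rules(word):
    """Toy backoff: one phoneme per letter, for illustration only."""
    table = {"A": "AE", "B": "B", "I": "IH", "N": "N", "S": "S", "T": "T"}
    return [table[ch] for ch in word if ch in table]

dictionary = {"SIGHT": ["S", "AY1", "T"]}
exceptions = {}

print(pronounce("sight", dictionary, exceptions, toy_rules))
# (['S', 'AY1', 'T'], 'dictionary')
print(pronounce("bat", dictionary, exceptions, toy_rules))
# (['B', 'AE', 'T'], 'backoff')
```

In a deployed system the backoff only ever sees words that the first two stages missed, which is why evaluating it on dictionary words measures something different from its real workload.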

46
Damper et al. propose:
Test on a single, entire, large dictionary
Strict scoring, not frequency-weighted, giving
credit only for full-word correct
A standardized phoneme output set should be
employed
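
Strict, unweighted scoring of this kind might look like the sketch below: a word counts only if its whole phoneme string matches, and every dictionary word counts once regardless of how frequent it is in text. The function name and the tiny example data are assumptions for illustration.

```python
def word_accuracy(predicted, reference):
    """Fraction of reference words whose full phoneme string is reproduced."""
    if not reference:
        raise ValueError("reference dictionary is empty")
    correct = sum(
        1 for word, phones in reference.items()
        if predicted.get(word) == phones
    )
    return correct / len(reference)

reference = {
    "CAUGHT": ["K", "AO", "T"],
    "SIGHT": ["S", "AY", "T"],
}
predicted = {
    "CAUGHT": ["K", "AO", "T"],     # fully correct
    "SIGHT": ["S", "IH", "T"],      # one wrong vowel: no credit at all
}
print(word_accuracy(predicted, reference))   # 0.5
```

Comparing phoneme lists for exact equality only makes sense if both sides use the same phoneme set, which is the reason for the standardized output set.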

47
Evaluation

In reality, different descriptions of English use
different sets of phonemes (e.g., is stress
marked on the vowels? British versus American)
Issues in testing data-driven methods, because
the performance of a data-driven method is
tightly linked to the data it was trained on.

48
Data-driven method
Data → Learning method → Letter-to-sound conversion system
49

In theory, you should never test a data-driven
method on data that it was trained on.
In theory, if you want to test the performance of
the method on the whole dictionary, you can train
the system on the whole dictionary less one word,
and then test it on that word and do all of that
each time for each word.
But that takes too long! And we're also
interested in the relationship between training
corpus size and total performance.

50
Damper et al.'s work-around

For various values of N (up to half the size of
the dictionary):
Take two random samples of the dictionary, each
of size N. Train on one set, test on the other.
N = 100, 500, 1000, 2000, 5000 and 8,140.
The dictionary is of size 16,280.
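
A sketch of the sampling step, assuming the two samples are meant to be disjoint (which is possible because N never exceeds half the dictionary). The function name, the fixed seed, and the placeholder word list are assumptions; the sizes are the ones listed above.

```python
import random

def train_test_samples(words, n, seed=0):
    """Draw two non-overlapping random samples of size n from the dictionary."""
    if 2 * n > len(words):
        raise ValueError("n must be at most half the dictionary size")
    rng = random.Random(seed)
    picked = rng.sample(list(words), 2 * n)   # 2n distinct words
    return picked[:n], picked[n:]

# Placeholder dictionary of the stated size; real entries would be words.
dictionary_words = [f"word{i}" for i in range(16280)]

for n in (100, 500, 1000, 2000, 5000, 8140):
    train, test = train_test_samples(dictionary_words, n)
    # Train on `train`, score on `test`, and record accuracy against n.
```

Drawing 2N distinct words and splitting them guarantees that no test word was seen in training, while still letting performance be plotted against training-set size.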

51
Results: hand-written rules

Elovitz et al.: hand-written rules for this
purpose. 25.7% of words were entirely correct.
Length errors (especially due to geminate
consonants), /g/-/j/ confusions and vowel
substitutions abound. Extensive efforts were
made to make sure that this low figure was not an
error!

52
Pronunciation by analogy

Begin with a (hand-made) alignment of letters to
sounds. For every observed string of letters,
gather the set of phoneme strings that it can be
associated with, and store them in a data structure
along with their frequencies.
For the test word, find all ways of dividing the
word up into pieces that are present in the data
structure. Weight the resulting analyses by (1)
how many subpieces are involved, and (2)
frequencies of the subpieces, and choose the
best.

53
Results: PbA, neural net

PbA: 71.8% correct.
Neural net: 54.4%, when trained on the whole
dictionary

54
Information-Gain trees

IB1-IG: 57.4% correct
This approach is a variant on decision-tree
learning (an important paradigm in machine
learning).

55
In simplest terms, a decision-tree approach
studies a problem like "What phoneme realizes
this letter in this context?" by looking at all
relevant examples in the data, considering
all context data (what precedes, what follows,
etc.), and deciding, first, which factor gives
the most information.
Measure the uncertainty: first, the uncertainty of
how this t should be pronounced.
Then measure the uncertainty if you know what the
following letter is.
Measuring uncertainty:

56
Entropy as measure of uncertainty

Set of possibilities for realizing t:
T 64%
TH 36%
Calculate the entropy (base 2 logs!):
$H = -(0.64 \log_2 0.64 + 0.36 \log_2 0.36) = 0.94268$

57
Realization of t
If the following letter is h (36% of cases):
T .02
TH .98
$H = -(0.02 \log_2 0.02 + 0.98 \log_2 0.98) = 0.14144$
If the following letter is anything else (64% of cases):
T 1.00
TH .00
$H = -(1 \log_2 1 + 0 \log_2 0) = 0$ (taking $0 \log_2 0 = 0$)
Total entropy now: $0.36 \times 0.14144 + 0.64 \times 0 = 0.05092$,
a huge decrease from 0.94268!
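
The arithmetic above can be checked with a few lines of Python. This is a sketch: the function names are illustrative, and terms with probability 0 are skipped, following the usual convention that 0 log 0 counts as 0.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

def conditional_entropy(branches):
    """Weighted average entropy over (weight, distribution) pairs."""
    return sum(weight * entropy(dist) for weight, dist in branches)

# Uncertainty about t before looking at the next letter.
before = entropy([0.64, 0.36])

# Uncertainty once the next letter is known: h (36%) or anything else (64%).
after = conditional_entropy([
    (0.36, [0.02, 0.98]),
    (0.64, [1.00, 0.00]),
])

print(round(before, 5))           # 0.94268
print(round(after, 5))            # 0.05092
print(round(before - after, 5))   # information gain from the next letter
```

The difference `before - after` is the information gain; a decision-tree learner would compute it for each context feature and split first on the feature with the largest gain.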

58
Information gain and LTS

The idea is to use this method of testing to
automatically determine which aspects of a
letter's neighborhood are most revealing in
determining how that letter should be realized in
that word.
But: only 57.4% fully correct results in this
experiment.

59
Bottom line

Still a lot of work to be done both in getting
results and testing how well various methods
work.

60
Minimum Edit Distance

A first look at Viterbi in action

61
What's the best way to line up two different
strings? To answer that question, we have to make
some specifications.
One (p. 53ff in textbook, Section 5.6) could be
that perfect alignments are free, while a
deletion (non-alignment) costs 1 and a
substitution costs 2.

62
E X E C U T I O N
I N T E N T I O N
These are free and there are no reduced fares
for any kind of partial match for the others.
63
Cost: 3 substitutions (2 each) + 2 hangings (1 each) = 8
E X E C U T I O N
I N T E N T I O N
64
Cost: 1 substitution (2) + 6 hangings (1 each) = 8
Same cost: that's how we've set up the problem.
E X E C U T I O N
I N T E N T I O N
65
N  9  8  9 10 11 12 11 10  9  8
O  8  7  8  9 10 11 10  9  8  9
I  7  6  7  8  9 10  9  8  9 10
T  6  5  6  7  8  9  8  9 10 11
N  5  4  5  6  7  8  9 10 11 10
E  4  3  4  5  6  7  8  9 10  9
T  3  4  5  6  7  8  7  8  9  8
N  2  3  4  5  6  7  8  7  8  7
I  1  2  3  4  5  6  7  6  7  8
#  0  1  2  3  4  5  6  7  8  9
   #  E  X  E  C  U  T  I  O  N
66

The chart tells us something about how we walk
through it, but (the book's not clear on this)
we also have to keep track, on a memo-pad, of what
the best path was that got us to each box.
We need to find a path that only goes Right, Up,
or Both (Up and Right) and leads us to the best
final box.

67
We can arbitrarily choose one of the best ways to
get to a box in this case, because the problem at
hand doesn't set different costs depending on the
row-transitions. But very frequently such costs
must be borne in mind.
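
A sketch of the chart-filling procedure with the memo-pad made explicit: alongside each box's cost, the move that produced it (Right, Up, or Both) is recorded, and the best path is read off by walking back from the final box. Costs follow the specification given earlier (1 for a non-alignment, 2 for a substitution, 0 for a match). The function names are illustrative, and ties are broken arbitrarily by preferring Both, then Up, then Right.

```python
def edit_table(source, target, indel=1, sub=2):
    """Fill the cost chart and a parallel chart of backpointers."""
    rows, cols = len(target) + 1, len(source) + 1
    dist = [[0] * cols for _ in range(rows)]
    back = [[None] * cols for _ in range(rows)]   # the "memo-pad"
    for j in range(1, cols):
        dist[0][j], back[0][j] = j * indel, "right"
    for i in range(1, rows):
        dist[i][0], back[i][0] = i * indel, "up"
    for i in range(1, rows):
        for j in range(1, cols):
            step = 0 if source[j - 1] == target[i - 1] else sub
            options = [
                (dist[i - 1][j - 1] + step, "both"),
                (dist[i - 1][j] + indel, "up"),
                (dist[i][j - 1] + indel, "right"),
            ]
            # min() keeps the first cheapest option: an arbitrary tie-break.
            dist[i][j], back[i][j] = min(options, key=lambda o: o[0])
    return dist, back

def best_path(back):
    """Walk the backpointers from the final box to the origin."""
    i, j = len(back) - 1, len(back[0]) - 1
    moves = []
    while back[i][j] is not None:
        move = back[i][j]
        moves.append(move)
        if move in ("both", "up"):
            i -= 1
        if move in ("both", "right"):
            j -= 1
    return moves[::-1]

dist, back = edit_table("EXECUTION", "INTENTION")
print(dist[-1][-1])      # 8
print(best_path(back))   # one of the equally cheap paths
```

Here row 0 of `dist` corresponds to the bottom row of the chart, so the chart as drawn is `dist` read from its last row to its first.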

68
N  9  8  9 10 11 12 11 10  9  8
O  8  7  8  9 10 11 10  9  8  9
I  7  6  7  8  9 10  9  8  9 10
T  6  5  6  7  8  9  8  9 10 11
N  5  4  5  6  7  8  9 10 11 10
E  4  3  4  5  6  7  8  9 10  9
T  3  4  5  6  7  8  7  8  9  8
N  2  3  4  5  6  7  8  7  8  7
I  1  2  3  4  5  6  7  6  7  8
#  0  1  2  3  4  5  6  7  8  9
   #  E  X  E  C  U  T  I  O  N
