Phonetics: Speech production and perception

Slides: 53

Provided by: Harv55

Learn more at: http://cs.sou.edu

1
Introduction

Phonetics: Speech production and perception
Phonology: Study of sound combinations
Orthography: Writing systems
We'll talk about each area and how it impacts Natural Language Processing

2
Phonetics
Study of speech production and perception

Phone: Any of the set of sounds that humans can articulate
Phoneme: Distinct family of phones in a language
Languages utilize 15 to 40 phonemes
Note: Too few distinct sounds for a language's vocabulary
Ears are tuned to hear a language's distinct phonemes
Languages are easy to speak and still be understood
Infer the phoneme set: find words differing in only one sound
Allophone: Variant realization of a phoneme
Can be separate phonemes in another language
Segment: All phones, phonemes, and allophones
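
The "find words differing in only one sound" test can be sketched in code. The snippet below is a minimal illustration, assuming a tiny hand-written lexicon; the words, the ARPABET-style transcriptions, and the function names are chosen for the example and do not come from the slides.

```python
from itertools import combinations

# Hypothetical toy lexicon: word -> list of phone symbols (illustrative only).
LEXICON = {
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "bit": ["b", "ih", "t"],
    "bid": ["b", "ih", "d"],
    "spat": ["s", "p", "ae", "t"],
}


def differs_in_one_sound(a, b):
    """True if two equal-length transcriptions differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def minimal_pairs(lexicon):
    """Return (word1, word2, sound1, sound2) for every minimal pair."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if differs_in_one_sound(p1, p2):
            s1, s2 = next((x, y) for x, y in zip(p1, p2) if x != y)
            pairs.append((w1, w2, s1, s2))
    return pairs


PAIRS = minimal_pairs(LEXICON)
for w1, w2, s1, s2 in PAIRS:
    print(f"{w1} / {w2}: {s1} and {s2} contrast")
```

Each pair found this way is evidence that the two contrasting sounds belong to different phonemes in the language.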

3
Overview of the Noisy Channel
The Noisy Channel

Computational Linguistics
Replace the ear with a microphone
Replace the brain with a computer algorithm

4
Production

We have a complete but approximate model of how speech is produced
We cannot accurately predict the audio signal corresponding to given articulatory positions
The best synthesis methods, for now, use concatenation-based algorithms to create computerized speech.
Model: Pulmonic egressive air-stream from the source (glottis) through the vocal tract, operating as a source-filter system.
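
The source-filter idea can be illustrated with a very small sketch: an impulse train stands in for the glottal source, and a cascade of two-pole resonators stands in for the vocal tract. This is only a toy under stated assumptions (the F0, resonance frequencies, bandwidths, and sample rate are made-up illustrative values), not the concatenation-based synthesis the slide mentions.

```python
import math


def impulse_train(f0_hz, duration_s, sample_rate):
    """Idealized voiced source: one unit impulse per pitch period."""
    n = int(duration_s * sample_rate)
    period = max(1, round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator approximating a single vocal-tract resonance."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius < 1
    theta = 2.0 * math.pi * center_hz / sample_rate      # pole angle
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize(f0_hz, resonances, duration_s=0.05, sample_rate=8000):
    """Pass the source through each resonance in cascade (source then filter)."""
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for center_hz, bandwidth_hz in resonances:
        signal = resonator(signal, center_hz, bandwidth_hz, sample_rate)
    return signal


# Assumed, illustrative values: F0 = 120 Hz, resonances near 700 and 1100 Hz.
samples = synthesize(120, [(700, 100), (1100, 120)])
```

Changing the F0 argument changes the pitch of the result, while changing the resonance list changes its vowel-like quality, which is the separation the source-filter model describes.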

5
Vocal Source

The speaker alters the tension of the vocal folds
If the folds are open, speech is unvoiced, resembling background noise
If the folds are stretched close, speech is voiced
Air pressure builds and the vocal folds blow open, releasing the pressure; elasticity then causes the vocal folds to fall back
Average fundamental frequency (F0): 60 Hz to 300 Hz
Speakers control vocal tension, which alters F0 and the perceived pitch
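
As a small worked example, the duration of one open/close cycle follows directly from F0 (period = 1 / F0), and the harmonics of a voiced sound sit at integer multiples of F0. The function names below are hypothetical helpers written for this illustration.

```python
def pitch_period_ms(f0_hz):
    """Duration of one open/close cycle of the vocal folds, in milliseconds."""
    if f0_hz <= 0:
        raise ValueError("F0 must be positive")
    return 1000.0 / f0_hz


def harmonics(f0_hz, count):
    """First `count` harmonics of F0 (integer multiples of F0), in Hz."""
    return [f0_hz * k for k in range(1, count + 1)]


# The 60 Hz and 300 Hz endpoints come from the slide; 120 Hz is an extra sample point.
for f0 in (60, 120, 300):
    print(f0, "Hz:", round(pitch_period_ms(f0), 2), "ms per cycle,",
          "harmonics", harmonics(f0, 4))
```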

[Figure: glottal cycle with the open phase, closed phase, and period marked]
6
Formants

Definition: resonances of the vocal tract that emphasize particular harmonics of F0
F1, F2, F3, etc.
Add timbre to voiced sounds
Vowels have distinct harmonic patterns
Vocal articulators change which harmonics are emphasized by shifting the formant frequencies
There are complex relationships between formants, dependent on vocal musculature
Harmonics spread further apart as the pitch goes higher

7
Formant Speaker Variance
8
Vowel Formants
[Figure: vowel formant plot with regions labeled u, o, e, uh, eh, ih, ah, aw, ae]
9
Vocal Tract
Note: The velum is the soft palate; the epiglottis protects the vocal cords
10
Another look at the vocal tract
11
Different Voices

Falsetto: The vocal cords are stretched and become thin, causing high frequency
Creaky: Only the front vocal folds vibrate, giving a low frequency
Breathy: The vocal cords vibrate, but air is escaping through the glottis
Each person tends to consistently use particular
phonation patterns. This makes the voice uniquely
theirs.

12
Vowels
No restriction of the vocal tract, articulators
alter the formants

Diphthong: Syllabics which show a marked glide from one vowel to another, usually a steady vowel plus a glide
Nasalized: Some air flows through the nasal cavity
Rounding: Shape of the lips
Tense: Sounds that are more extreme (further from the schwa) and tend to have the tongue body higher
Relaxed: Sounds closer to schwa (tonally neutral)
Tongue position: Front to back, high to low

13
Vowel Characteristics

Demo of vowel positions in the English language: http://faculty.washington.edu/dillon/PhonResources/vowels.html

Vowel  Word    F1 (Hz)  F2 (Hz)
iy     feel    300      2300
ih     fill    360      2100
ae     gas     750      1750
aa     father  680      1100
ah     cut     720      1240
ao     dog     600      900
ax     comply  720      1240
eh     pet     570      1970
ow     tone    600      900
uh     good    380      950
uw     tool    300      940
(The high, low, front, back, round, and tense columns are omitted because their +/- marks did not survive in this transcript.)
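
One way to see how F1 and F2 separate the vowels is a nearest-neighbor lookup over the table values. This is a hedged sketch only: real vowel recognition has to cope with speaker variance, and the function name and plain Euclidean distance are assumptions made for the illustration.

```python
# F1/F2 values (Hz) copied from the vowel table on this slide.
VOWEL_FORMANTS = {
    "iy": (300, 2300), "ih": (360, 2100), "ae": (750, 1750),
    "aa": (680, 1100), "ah": (720, 1240), "ao": (600, 900),
    "ax": (720, 1240), "eh": (570, 1970), "ow": (600, 900),
    "uh": (380, 950), "uw": (300, 940),
}


def nearest_vowels(f1, f2):
    """Return the vowel label(s) whose (F1, F2) pair is closest to the input.

    Several labels are returned when table entries tie, as ah/ax and ao/ow do.
    """
    def dist2(label):
        v1, v2 = VOWEL_FORMANTS[label]
        return (f1 - v1) ** 2 + (f2 - v2) ** 2

    best = min(dist2(label) for label in VOWEL_FORMANTS)
    return [label for label in VOWEL_FORMANTS if dist2(label) == best]


print(nearest_vowels(310, 2250))
print(nearest_vowels(720, 1240))
```

The ties in the second call show that F1 and F2 alone do not distinguish every vowel in the table; duration, stress, and higher formants carry the remaining information.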
14
Consonants

Significant obstruction in the nasal or oral
cavities
Occur in pairs or triplets and can be voiced or
unvoiced
Sonorant: Continuous voicing
Unvoiced: Less energy
Plosive: Period of silence and then a sudden energy burst
Lateral, semi-vowels, retroflex: Partial air flow block
Fricatives, affricates: Turbulence in the waveform

15
Manner of Articulation

Voiced: The vocal cords are vibrating; Unvoiced: The vocal cords don't vibrate
Obstruent: Noise-like sounds
Fricative: Air flow is not completely shut off
Affricate: A sequence of a stop followed by a fricative
Sibilant: A consonant characterized by a hissing sound (like s or sh)
Trill: A rapid vibration of one speech organ against another (Spanish r)
Aspiration: Burst of air following a stop
Stop: Air flow is cut off
Ejective: The airstream and the glottis are closed and suddenly released (/p/)
Plosive: Stop (voiced or unvoiced) followed by a sudden release
Flap: A single, quick touch of the tongue (t in water)
Nasality: Lowering the soft palate allows air to flow through the nose
Glides: Vowel-like; syllable position makes them short and unstressed (w, y)
On-glide: Glide before a vowel; off-glide: glide after a vowel
Approximant (semi-vowels): The active articulator approaches the passive articulator, but doesn't totally shut off the air flow (l and r)
Laterality: The air flow proceeds around the side of the tongue

16
Place of the Articulation
Articulation: Shaping the speech sounds

Bilabial: The two lips (p, b, and m)
Labio-dental: Lower lip and the upper teeth (v)
Dental: Upper teeth and tongue tip or blade (thing)
Alveolar: Alveolar ridge and tongue tip or blade (d, n, s)
Post-alveolar: Area just behind the alveolar ridge and tongue tip or blade (jug /dʒ/, ship /ʃ/, chip /tʃ/, vision /ʒ/)
Retroflex: Tongue curled and back (rolling r)
Palatal: Tongue body touches the hard palate (j)
Velar: Tongue body touches the soft palate (k, g, and /ŋ/ as in thing)
Glottal: Larynx (uh-uh, voiced h)

17
English Consonants
Type              Phones                        Mechanism
Plosive           b, p, d, t, g, k              Close oral cavity
Nasal             m, n, ng                      Open nasal cavity
Fricative         v, f, z, s, dh, th, zh, sh    Turbulent
Affricate         jh, ch                        Stop + turbulent
Retroflex liquid  r                             Tongue high and curled
Lateral liquid    l                             Side airstreams
Glide             w, y                          Vowel-like
18
Consonant Place and Manner
Manner              Labial  Labio-dental  Dental  Alveolar  Palatal  Velar  Glottal
Plosive             p b                           t d                k g    ?
Nasal               m                             n                  ng
Fricative                   f v           th dh   s z       sh zh           h
Retroflex sonorant                                r
Lateral sonorant                                  l
Glide               w                                       y
19
Example word
20
Speech Production Analysis

Plate attached to roof of mouth measuring contact
Collar around the neck measuring glottis
vibrations
Measure air flow from mouth and nose
Three-dimensional images using MRI
Note: The IPA was designed before the above technologies existed. It was devised by linguists looking into someone's mouth or feeling how sounds are made.

21
Perception

Some perceptual components are understood, but
knowledge concerning the entire human perception
model is rudimentary
Understood Components
The inner ear works as a filter bank
Sounds are perceived on a logarithmic scale
Some sounds will mask others

22
The Inner Ear

Two sensory organs are located in the inner ear.
The vestibule is the organ of equilibrium.
The cochlea is the organ of hearing.

23
Basilar Membrane
Note: Basilar membrane shown unrolled

Thin elastic fibers stretched across the cochlea
Short, narrow, stiff, and closely packed near the
oval window
Long, wider, flexible, and sparse near the end of
the cochlea
The membrane connects to a ligament at its end.
Separates two liquid filled tubes that run along
the cochlea
The fluids are very different chemically and
carry the pressure waves
A leakage between the two tubes causes a hearing
breakdown
Provides a base for sensory hair cells
The hair cells above the resonating region fire
more profusely
The fibers vibrate like the strings of a musical
instrument.

24
Place Theory
Decomposing the sound spectrum

Georg von Békésy's Nobel Prize discovery
High frequencies excite the narrow, stiff part near the oval window (the base)
Low frequencies excite the wide, flexible part by the apex
Auditory nerve input
Hair cells on the basilar membrane fire near the vibrations
The auditory nerve receives frequency-coded neural signals
A large frequency range is possible because the basilar membrane's stiffness varies exponentially along its length

Demo at http://www.blackwellpublishing.com/matthews/ear.html
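
The exponential place-to-frequency relationship can be made concrete with the Greenwood function, a commonly used empirical fit for the human cochlea. It is not part of the slides; it is brought in here only as an illustration, with the usual human constants (165.4, 2.1, 0.88) and position measured as a fraction of the membrane length from the apex.

```python
def greenwood_frequency_hz(x):
    """Characteristic frequency at relative position x on the basilar membrane.

    x = 0.0 is the apex (wide, flexible end); x = 1.0 is the base near the
    oval window (narrow, stiff end). Constants are Greenwood's human fit.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be between 0 and 1")
    return 165.4 * (10 ** (2.1 * x) - 0.88)


for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"position {x:.2f} -> {greenwood_frequency_hz(x):8.0f} Hz")
```

Equal steps along the membrane multiply the frequency by a roughly constant factor, which is how a short membrane covers the range from about 20 Hz to about 20 kHz.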
25
Hair Cells

The hair cells are in rows along the basilar
membrane.
Individual hair cells have multiple strands or
stereocilia.
The sensitive hair cells have many tiny
stereocilia which form a conical bundle in the
resting state
Pressure variations cause the stereocilia to
dance wildly and send electrical impulses to
the brain.

26
Firing of Hair Cells

There is a voltage difference across the cell
The stereocilia project into the endolymph fluid (60 mV)
The perilymph fluid surrounds the membrane of the hair cells (-70 mV)
When the hair cells move:
The potential difference increases
The cells fire

27
Speech Perception

We don't perceive speech linearly
The cochlea has rows of hair cells. Each row acts
as a frequency filter.
The frequency filters overlap

From early place theory experiments
28
Absolute Hearing Threshold

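The hearing threshold varies at different frequencies.
An empirical formula approximates the SPL threshold (f in Hz, result in dB SPL):
$$\mathrm{SPL}(f) = 3.65\left(\frac{f}{1000}\right)^{-0.8} - 6.5\,e^{-0.6\left(\frac{f}{1000} - 3.3\right)^{2}} + 10^{-3}\left(\frac{f}{1000}\right)^{4}$$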
The table measures the threshold for men (M) and
women (W) ages 20 through 60
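
The empirical threshold formula on this slide can be evaluated directly. The sketch below assumes the usual reading of the flattened formula (exponents -0.8, 2, and 4, and a plus sign before the last term) and keeps the slide's 3.65 coefficient; the function name is hypothetical.

```python
import math


def hearing_threshold_db_spl(f_hz):
    """Approximate absolute hearing threshold in dB SPL at frequency f_hz."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    k = f_hz / 1000.0  # frequency in kHz
    return (3.65 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)


for f in (100, 1000, 3300, 10000):
    print(f"{f:6d} Hz -> {hearing_threshold_db_spl(f):6.1f} dB SPL")
```

The curve is lowest around 3 to 4 kHz, where the ear is most sensitive, and rises steeply toward both very low and very high frequencies.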

29
Sound Threshold Measurements
30
Intensity and Neural Response

Auditory response is a function of intensity
The response saturates at a maximum intensity
level

From CMU Robust Speech Group
31
Bark and Mel Scales
32
Comparison of Frequency Perception Scales

Blue: Bark scale
Red: Mel scale
Green: ERB scale

Equivalent Rectangular Bandwidth (ERB) is an
unrealistic but simple rectangular approximation
to model the filters in the cochlea
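
For reference, the three scales are usually computed with closed-form approximations. The sketch below uses widely cited ones (the 2595 * log10 Mel formula, Zwicker's arctangent Bark formula, and the Glasberg and Moore ERB bandwidth); the slides do not say which variants were plotted, so treat these choices as assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Mel scale: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz):
    """Bark scale (critical-band rate), Zwicker-style approximation."""
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))


def erb_bandwidth_hz(f_hz):
    """Equivalent rectangular bandwidth of the auditory filter centered at f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


for f in (100, 500, 1000, 4000):
    print(f, round(hz_to_mel(f), 1), round(hz_to_bark(f), 2),
          round(erb_bandwidth_hz(f), 1))
```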
33
Masking

Masking is a phenomenon in which perception of
one sound is obscured by the presence of another
sound
Masking occurs in both the time and frequency
domains
Time: One tone occurs shortly before another tone
Frequency: One tone is near the frequency of another
Experiment (most involve single sine waves)
Fix one sound at a frequency and intensity
Vary a second sine wave's intensity
When is the second sound heard?
Amplification of perception
Tones below the threshold of hearing can be
perceived if they occur simultaneously and the
total energy within a frequency band exceeds the
threshold.

34
Masking Patterns

A narrow band of noise at 410 Hz
Note the asymmetrical pattern

From CMU Robust Speech Group
35
Time Domain Masking

Noise will mask a tone if
The noise is sufficiently loud
The delay is short
Intensity of the noise needs to increase with the
delay length
There are two types of masking
Forward: Noise masks a tone that follows
Backward: A tone is masked by noise that follows
Delays
Beyond 100 to 200 ms, no forward masking occurs
Beyond 20 ms, no backward masking occurs.
Training can reduce or eliminate the perceived
backward masking.
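
The timing limits above can be written as a small rule. This sketch only checks whether the delay falls inside the window where masking can occur; it deliberately ignores the loudness requirement, treats each sound as a single instant, and uses the upper end (200 ms) of the forward range. All names are hypothetical.

```python
FORWARD_LIMIT_MS = 200   # upper end of the 100 to 200 ms range on the slide
BACKWARD_LIMIT_MS = 20   # backward masking limit on the slide


def masking_type(noise_time_ms, tone_time_ms):
    """Classify the possible masking of a tone by noise from their timing.

    Returns "simultaneous", "forward", "backward", or None when the delay is
    too long for either kind of time-domain masking.
    """
    delay = tone_time_ms - noise_time_ms
    if delay == 0:
        return "simultaneous"
    if 0 < delay <= FORWARD_LIMIT_MS:
        return "forward"      # noise first, tone follows
    if -BACKWARD_LIMIT_MS <= delay < 0:
        return "backward"     # tone first, noise follows
    return None


print(masking_type(0, 50))    # tone 50 ms after the noise
print(masking_type(10, 0))    # tone 10 ms before the noise
print(masking_type(0, 250))   # too late for forward masking
```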

36
Phonology

Study of sound combinations
Rule based
A finite state grammar can represent valid sound
combinations in a language
Unfortunately, these rules are language-specific
Statistics based
Most other areas of Natural Language Processing are trending toward statistics-based methods
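
A rule-based finite state grammar for sound combinations can be sketched as a transition table. The toy below is an assumption-laden illustration: the phone classes, the states, and the rules (an optional "s", an optional stop, one vowel, then an optional "s" and stop) are invented for the example and are far simpler than real English phonotactics.

```python
# Hypothetical toy inventory (illustrative only).
VOWELS = {"ae", "ih", "aa"}
STOPS = {"p", "t", "k"}


def symbol_class(phone):
    """Map a phone to the input class used by the automaton."""
    if phone in VOWELS:
        return "V"
    if phone in STOPS:
        return "STOP"
    if phone == "s":
        return "S"
    return None


# (state, input class) -> next state
TRANSITIONS = {
    ("start", "S"): "onset_s",
    ("start", "STOP"): "onset_end",
    ("start", "V"): "nucleus",
    ("onset_s", "STOP"): "onset_end",   # s + stop cluster, e.g. "st"
    ("onset_s", "V"): "nucleus",
    ("onset_end", "V"): "nucleus",
    ("nucleus", "S"): "coda_s",
    ("nucleus", "STOP"): "coda_end",
    ("coda_s", "STOP"): "coda_end",     # final cluster, e.g. "st"
}
ACCEPTING = {"nucleus", "coda_s", "coda_end"}


def is_valid_syllable(phones):
    """Run the automaton over a phone sequence; accept only in a final state."""
    state = "start"
    for phone in phones:
        state = TRANSITIONS.get((state, symbol_class(phone)))
        if state is None:
            return False
    return state in ACCEPTING


print(is_valid_syllable(["s", "t", "ae", "p"]))  # onset "st", vowel, coda "p"
print(is_valid_syllable(["t", "s", "ae"]))       # "ts" onset is not allowed here
```

Because the table encodes one language's permitted clusters, a different language needs a different table, which is the language-specific drawback noted above.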

37
Syllables

Organizational phonological unit
Vowel between two consonants
Ambiguous positioning of consonants into
syllables
Tree structured representation
Basic unit of prosody
Lexical stress: Inherent property of a word
Sentential stress: Speaker's choice to emphasize or clarify

38
Representing Stress

There have been unsuccessful attempts to
automatically assign stress to phonemes
Notations for representing stress
IPA (International Phonetic Alphabet) has a
diacritic symbol for stress
Numeric representation
0 = reduced, 1 = normal, 2 = stressed
Relative
Reduced (R) or Stressed (S)
No notation means undistinguished

39
Phonological Grammars

SPE: The Sound Pattern of English
13 features for 8192 combinations
Complete descriptive grammar
Recent research
Trend towards context-sensitive descriptions
Little thought concerning computational feasibility
It's unlikely that listeners apply thousands of rules to perceive speech

40
Morphology

How phonemes combine to make words
Important for speech synthesis
Example: singular to plural
Run to runs: z sound (voiced)
Hit to hits: s sound (unvoiced)
Devise sets of rules of pronunciation
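
The plural example can be written as a pronunciation rule keyed on the final sound of the word. This is a minimal sketch with ARPABET-style symbols; the phone sets are illustrative rather than exhaustive, and the extra "ih z" case after sibilants is an addition beyond the slide's two examples.

```python
# Assumed, non-exhaustive phone sets for the illustration.
UNVOICED = {"p", "t", "k", "f", "th"}
SIBILANTS = {"s", "z", "sh", "zh", "ch", "jh"}


def plural_suffix(final_phone):
    """Choose the plural ending from the last sound of the singular form."""
    if final_phone in SIBILANTS:
        return ["ih", "z"]   # addition beyond the slide, e.g. "bus" to "buses"
    if final_phone in UNVOICED:
        return ["s"]         # unvoiced ending, e.g. "hit" to "hits"
    return ["z"]             # voiced ending, e.g. "run" to "runs"


def pluralize(phones):
    """Append the plural suffix to a phone sequence."""
    if not phones:
        raise ValueError("empty pronunciation")
    return list(phones) + plural_suffix(phones[-1])


print(pluralize(["r", "ah", "n"]))    # run
print(pluralize(["hh", "ih", "t"]))   # hit
```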

41
Orthography Writing Systems

Diacritics: Accent marks
Prosody: Stress, loudness, pitch, tone, intonation, and length
Written symbolic representation of speech
Wide: Symbol set representing a speech message
Narrow: Symbol set representing a speech signal
English-based phonetic transcriptions: ARPABET, TIMIT
IPA: International Phonetic Alphabet
International standard; an attempt at a narrow transcription
Intent: Represent all sounds of known languages
Disadvantages
Misses articulator interrelationships
Multiple realizations of the same sound
Non-linearity of speech; articulators are always moving

42
Narrow transcription Difficulties

Realizations are points in continuous space, not
discrete
Sounds take characteristics of adjacent sounds
(assimilation)
Sounds that are combinations of two
(co-articulation)
Articulator targets are often not reached
Diphthongs combine different phonemes
Adding (epenthesis) or deleting (elision)
Missing word, phrase boundaries, endings
Many tonal variations during speech
Varied vowel durations
Common knowledge, familiar background leads to
more sloppy speech with additional
non-linearities.

43
Written English

Spellings are not consistent with regard to
sounds
Same spelling, different sounds: low vs. cow
Different spelling, same sounds: cow, bough
Pronunciations of written languages evolve over time
If current written English were phonetically accurate:
It would only apply to a single dialect
It would be wrong as soon as the population
altered its speech patterns

44
George Bernard Shaw's System
His goal: Replace the Latin alphabet with one that is phonetically accurate
Result: It didn't work. Language phonetics are not static, and the population was not willing to switch to a new writing system
45
Pitman Shorthand
46
ARPABET English-based phonetic system

Phone  Example   Phone  Example   Phone  Example
iy     beat      b      bet       p      pet
ih     bit       ch     chet      r      rat
eh     bet       d      debt      s      set
ah     but       f      fat       sh     shoe
ae     bat       g      get       t      ten
ao     bought    hh     hat       th     thick
ow     boat      hy     high      dh     that
uh     book      jh     jet       dx     butter
ey     bait      k      kick      v      vet
er     bert      l      let       w      wet
ay     buy       m      met       wh     which
oy     boy       em     bottom
axr    dinner    n      net       y      yet
aw     down      en     button    z      zoo
ax     about     ng     sing      zh     measure
ix     roses     eng    washing
aa     cot       -      silence

47
The International Phonetic Alphabet
48
IPA Vowels
Caution: English tongue positions don't exactly match the chart. For example, "father" in English does not have the tongue position as far back as the IPA vowel chart shows.
49
IPA Diacritics
50
IPA Tones and Word Accents
51
IPA Supra-segmental Symbols
52
Newer Technologies

VoiceXML
Framework for integrating human/machine dialogues
W3C (World Wide Web Consortium) standard
Input: audio files or human speech
Output: synthesized speech
Script interpreted by voice browsers
SSML (Speech Synthesis Markup Language)
XML-based technology to standardize manipulation of synthesized speech
Others
SABLE (1998 consortium)
SAPI (Microsoft Speech API)
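
To give a feel for SSML, the sketch below builds a small SSML fragment with the Python standard library. The `speak`, `prosody`, `break`, and `emphasis` elements are part of SSML; the helper name, its parameters, and the simplified root element (no namespace or language attribute) are assumptions made to keep the example short.

```python
import xml.etree.ElementTree as ET


def build_ssml(text, rate="slow", pause_ms=500, emphasized=None):
    """Return a simplified SSML string: spoken text, a pause, optional emphasis."""
    speak = ET.Element("speak", {"version": "1.0"})
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})
    prosody.text = text
    ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    if emphasized:
        emphasis = ET.SubElement(speak, "emphasis", {"level": "strong"})
        emphasis.text = emphasized
    return ET.tostring(speak, encoding="unicode")


print(build_ssml("Phonetics is the study of speech sounds.",
                 emphasized="Phonology comes next."))
```

A synthesizer that accepts SSML would read the first sentence slowly, pause for half a second, and then stress the second sentence.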
