Structure of Human Speech

Slides: 29

Provided by: chrisd

Title: Structure of Human Speech

1
Structure of Human Speech

Chris Darwin

2
Vocal Tract
3
Pitch and Formants
1. Harmonics (giving pitch) are produced by vocal cord vibration
2. Formant frequencies are resonances of the vocal tract
3. Formant frequencies change as you change the shape of your vocal tract
4
Source Filter
Larynx
Vocal tract
Output sound
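The source-filter idea on this slide can be sketched in a few lines of code. This is only a minimal illustration, not the method used for the lecture demos: the larynx is approximated by an impulse train (one pulse per pitch period), and the vocal tract by a cascade of standard second-order digital resonators, one per formant. The function names, the formant frequencies, and the bandwidths are all assumptions chosen for illustration.

```python
import math

def impulse_train(f0, sample_rate, duration):
    """Source: a crude stand-in for vocal cord vibration, one pulse per pitch period."""
    n = int(sample_rate * duration)
    period = round(sample_rate / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, freq, bandwidth, sample_rate):
    """Filter: a second-order digital resonator modelling one formant.

    Implements y[n] = a*x[n] + b*y[n-1] + c*y[n-2], with coefficients
    chosen so that the gain at 0 Hz is 1.
    """
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth * t)
    b = 2.0 * math.exp(-math.pi * bandwidth * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize_vowel(f0, formants, sample_rate=16000, duration=0.5):
    """Output sound: the source passed through each formant resonator in turn."""
    sound = impulse_train(f0, sample_rate, duration)
    for freq, bandwidth in formants:
        sound = resonator(sound, freq, bandwidth, sample_rate)
    return sound

# Illustrative (assumed) formant frequencies and bandwidths in Hz.
vowel = synthesize_vowel(120, [(700, 80), (1200, 90), (2600, 120)])
```

Changing `f0` alters the pitch without moving the formants, and changing the formant list alters the vowel quality without changing the pitch, which is the independence of source and filter that the slide describes.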
5
Sex change
Me (m)
Shorter vocal-tract (higher formants)
Higher pitch
Both (-> f)
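Why a shorter vocal tract gives higher formants can be illustrated with the textbook approximation of the vocal tract as a uniform tube, closed at the glottis and open at the lips, whose resonances fall at odd multiples of c / 4L. The sketch below is an assumption-laden simplification: real vocal tracts are not uniform tubes, and the two lengths and the speed of sound used here are illustrative values, not figures from the slides.

```python
def tube_formants(length_cm, count=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [(2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
            for n in range(1, count + 1)]

longer_tract = tube_formants(17.5)   # assumed length, roughly an adult male tract
shorter_tract = tube_formants(14.5)  # assumed shorter tract

for n, (lo, hi) in enumerate(zip(longer_tract, shorter_tract), start=1):
    print(f"F{n}: {lo:.0f} Hz (17.5 cm) vs {hi:.0f} Hz (14.5 cm)")
```

Every resonance scales inversely with tube length, so shortening the tract raises all the formants together. Pitch is set separately by the rate of vocal cord vibration, which is why the slide treats the two changes as independent.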
6
Prosody vs Segments
Segmental: consonants / vowels -> words
Prosodic: pitch contour, stress. Emphasis, pragmatics.
I thought she was married?
I thought she was married.
I thought she was married!
I thought she was married!
NB: in tone languages pitch is used segmentally.
Are bird/mammal systems like human prosody? Yes! They generally use different pitch contours.
7
Vowel production
8
narrow-band spectrogram
sine-wave speech
9
Sine-wave speech
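Sine-wave speech replaces each formant with a single sinusoid that follows the formant's centre frequency, discarding the harmonics. The sketch below shows one way this could be generated from formant tracks; it is not the procedure used for the lecture demo. The function name, the track format (one frequency and amplitude per frame), and the example values are assumptions made for illustration.

```python
import math

def sine_wave_speech(formant_tracks, frame_rate, sample_rate=16000):
    """Sum one sinusoid per formant track.

    formant_tracks: list of tracks; each track is a list of
    (frequency_hz, amplitude) pairs, one pair per analysis frame.
    """
    samples_per_frame = sample_rate // frame_rate
    n_frames = min(len(track) for track in formant_tracks)
    out = [0.0] * (n_frames * samples_per_frame)
    for track in formant_tracks:
        phase = 0.0  # accumulated so the tone stays continuous across frames
        for frame in range(n_frames):
            freq, amp = track[frame]
            step = 2.0 * math.pi * freq / sample_rate
            for k in range(samples_per_frame):
                out[frame * samples_per_frame + k] += amp * math.sin(phase)
                phase += step
    return out

# Hypothetical tracks: F1 steady, F2 gliding downwards over 10 frames.
f1 = [(300.0, 1.0)] * 10
f2 = [(2200.0 - 100.0 * i, 0.5) for i in range(10)]
signal = sine_wave_speech([f1, f2], frame_rate=100)
```

Accumulating phase rather than recomputing it from time avoids clicks when the frequency changes from one frame to the next.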
10
Orchestra in your throat
11
Tuvan throat music
12
Tuvan throat music
13
Mynah bird speech
Klatt & Stefanski (1974) How does a mynah bird imitate human speech? J Acoust Soc Amer, 55, 822-832.
14
Mynah / Grey parrot

Mynah produces "formants" but probably through changing syrinx resonances, not through changing vocal tract shape. (Klatt & Stefanski, 1974, J Acoust Soc Amer)
Grey parrot has a longer vocal tract and may use changes in its shape to produce formant variation (more like human speech). (Warren, Patterson & Pepperberg, 1996, Auk)

15
Characteristics of speech
Narrow-band spectrogram
The only silence is the /g/ of "ago"

No gaps between words
Smoothly changing sound from one speech sound to the next
So you can't just shuffle the acoustic words

16
from Clive Frankish
17
Speech is more like semaphore than like music

Music: discrete targets giving discrete acoustic events
Semaphore: discrete targets with transitions between targets
Speech: articulatory transitions between targets

18
Semaphore
19
Formants in a wide-band spectrogram
[Figure: wide-band spectrogram of "we go", with labels marking F1, F2, F3, the burst, and the formant transitions]
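The difference between the narrow-band and wide-band spectrograms in these slides comes down to the length of the analysis window. As a rough rule of thumb (the exact figure depends on the window shape), the frequency resolution is about the reciprocal of the window duration. The sketch below uses that approximation to check whether individual harmonics would be resolved; the function names and the example window lengths are assumptions for illustration.

```python
def analysis_bandwidth_hz(window_ms):
    """Approximate frequency resolution of an analysis window (rule of thumb)."""
    return 1000.0 / window_ms

def resolves_harmonics(window_ms, f0_hz):
    """Harmonics sit f0 apart, so they separate only if the resolution is finer than f0."""
    return analysis_bandwidth_hz(window_ms) < f0_hz

for window_ms in (25.0, 4.0):  # assumed long (narrow-band) and short (wide-band) windows
    bandwidth = analysis_bandwidth_hz(window_ms)
    verdict = "resolved" if resolves_harmonics(window_ms, 100.0) else "smeared together"
    print(f"{window_ms:4.1f} ms window -> about {bandwidth:.0f} Hz: harmonics {verdict}")
```

With a long window the harmonics of a 100 Hz voice appear as separate horizontal lines (narrow-band). With a short window they merge into broad formant bands, but rapid events such as bursts and formant transitions are located more precisely in time (wide-band).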
20
Where are the segments?
21
Speech is more like speech than like semaphore
Speech does not have invariant acoustic targets: consonants change with the vowel.
Compare /s/ in /si/ with /s/ in /su/.
This is due to co-articulation.
22
Different transition - same consonant
[Figure: spectrogram patterns for "dee" and "da", with the formant transitions and 1400 Hz marked]
Liberman et al. (1967) Perception of the speech code. Psych Rev 74, 431-461
23
Co-articulation
Arises because (mainly) consonant gestures don't involve all the articulators.
E.g. /b/ is lips only, so the tongue is free to take up position for the next vowel.
/d/ and /s/ just involve the tongue tip touching the alveolar ridge, so the tongue body and lips are free to take up position for the next vowel, viz. /si/ vs /su/.
24
Same noise - different consonant
[Figure: spectrogram patterns for "pea" and "ka", with F1, F2, and the same 1400 Hz burst marked]
Liberman et al. (1967) Perception of the speech code. Psych Rev 74, 431-461
25
Two articulatory systems
Öhman suggested that articulation can be decomposed into two semi-independent systems:
Slow movement from one vowel target to the next, e.g. /i/ -> /u/
Rapid consonantal movement superimposed, e.g. /b/, /d/
So the /b/ in /ibu/ is not the same as in /ibi/
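The two-system idea can be made concrete with a toy model: a slow glide between vowel targets, plus a rapid consonant gesture that pulls only some articulators towards its own target. The sketch below is loosely modelled on this description and should not be read as Öhman's actual formulation. The articulator names, all numeric values (arbitrary units, 0 = closed), and the triangular timing function are assumptions made for illustration.

```python
ARTICULATORS = ("lips", "tongue_tip", "tongue_body")

# Hypothetical vowel targets in arbitrary units.
VOWELS = {
    "i": {"lips": 0.8, "tongue_tip": 0.6, "tongue_body": 0.9},
    "u": {"lips": 0.3, "tongue_tip": 0.4, "tongue_body": 0.2},
}

# Each consonant has a target and a weight saying which articulators it controls.
CONSONANTS = {
    "b": {"target": {"lips": 0.0, "tongue_tip": 0.0, "tongue_body": 0.0},
          "weight": {"lips": 1.0, "tongue_tip": 0.0, "tongue_body": 0.0}},
    "d": {"target": {"lips": 0.0, "tongue_tip": 0.0, "tongue_body": 0.0},
          "weight": {"lips": 0.0, "tongue_tip": 1.0, "tongue_body": 0.0}},
}

def vowel_glide(v1, v2, t):
    """Slow system: linear movement from vowel v1 to vowel v2, with t in [0, 1]."""
    return {a: (1.0 - t) * VOWELS[v1][a] + t * VOWELS[v2][a] for a in ARTICULATORS}

def consonant_strength(t, centre=0.5, width=0.2):
    """Rapid system: rises to 1 at the moment of closure and is 0 outside the gesture."""
    return max(0.0, 1.0 - abs(t - centre) / width)

def vcv_shape(v1, consonant, v2, t):
    """Vocal tract shape at time t: vowel glide with the consonant gesture superimposed."""
    v = vowel_glide(v1, v2, t)
    k = consonant_strength(t)
    c = CONSONANTS[consonant]
    return {a: v[a] + k * c["weight"][a] * (c["target"][a] - v[a]) for a in ARTICULATORS}

ibi = vcv_shape("i", "b", "i", 0.5)  # moment of lip closure in /ibi/
ibu = vcv_shape("i", "b", "u", 0.5)  # moment of lip closure in /ibu/
```

At closure the lips are shut in both `ibi` and `ibu`, but the tongue body is already partway to /u/ in `ibu`, so the two /b/ configurations differ even though the consonant gesture itself is identical.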
26
Co-articulation
Advantages
1. Information about different segments is spread across time (Hockett's squashed eggs). You know that a /u/ is coming because of the type of /s/ you have heard.
27
Co-articulation
2. Liberman thought that this spreading across time makes it easier to transmit information at a fast rate.
Liberman et al. (1967) Psych Rev 74, 431-461
28
Co-articulation - 2
The disadvantage of co-articulation for perception is that there are no constant acoustic targets in speech.
The same phoneme can be represented as different sounds in different contexts (/s/ before /u/ or /i/).
Conversely, the same sound can be heard as different consonants in different contexts (e.g. as /p/ before /i/ and /u/ but as /k/ before /a/).
29
Speech Code
Factors that make it hard (for machines) to
recognise speech