Provided by: wba2

1
Speech Science XII
Version 2007-8
  • Speech Perception
  • (acoustic cues)

2
Topics
  • Psychoacoustics
  • Psychophonetics: acoustic cues
  • Reading: BHR, chap. 6, 184-203 (5th
    ed.); chaps. 9/10, 201 ff.
  • P.-M., 3.2.2., first part, pp. 158-171
    (2nd ed.); 149-162 (1st ed.)

3
Psychoacoustics 1
  • Psychoacoustics investigates the relationship
    between basic (acoustic) signal properties and
    basic auditory impressions
  • - How loud something sounds.
  • - How high- or low-pitched something sounds.
  • - How long something sounds.
  • - What the timbre (quality) of a sound is.
  • The questions asked are:
  • - Can the signal be heard? (signal strength)
  • - Can differences between signals be heard?
    (for all signal properties)

4
Psychoacoustics 2
  • Important: Psychoacoustics relates the objective,
    measurable signal to subjective impressions.
  • These are two different worlds.
  • The simplest model of psychoacoustic
    perception would be a linear relationship:
  • - A change in a signal parameter always has an
    equivalent change in the auditory impression.
  • This is not the case
  • (which makes psychoacoustics very complex).
  • Some of the non-linearity has direct
    implications for phonetic understanding.
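
One common illustration of such a non-linear mapping is the sone scale of loudness, on which a 10-phon increase in loudness level roughly doubles the perceived loudness. The sketch below assumes the standard textbook relation, which is only a reasonable approximation above about 40 phon. The relation and the function name are not taken from these slides.

```python
def phon_to_sone(loudness_level_phon: float) -> float:
    """Convert loudness level (phon) to loudness (sone).

    Assumption: the textbook relation sone = 2 ** ((phon - 40) / 10),
    which is only a rough approximation above about 40 phon.
    """
    return 2.0 ** ((loudness_level_phon - 40.0) / 10.0)


if __name__ == "__main__":
    # Equal steps in phon give doubling (not equal) steps in sone.
    for phon in (40, 50, 60, 70, 80):
        print(f"{phon} phon -> {phon_to_sone(phon):.0f} sone")
```

Equal steps on the physical side therefore do not produce equal steps in the auditory impression.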

5
A non-linear relationship: Loudness
6
The reason for non-linear loudness
Resonance characteristics of the outer ear
7
Non-linearity above threshold
8
Also, sounds mask one another
If noise is present, a tone has to be stronger to
be heard (it has a higher audibility threshold).
Figure axis: intensity of pure tone (masked) stimulus (dB)
The closer the tone is in frequency to the centre
frequency of the noise, the stronger it has to
be to be heard!
9
Critical Bands (Barks, ERBs)
Wide-band noise with a gap still masks a tone in
the middle of the gap
until the gap reaches a critical width.
Then the signal is heard at the same threshold
as if there were no noise.
The noise no longer interferes with the part of
the hearing mechanism dealing with the tone.
These critical bands are narrow at low and
broader at higher frequencies.
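
The widening of the critical bands with frequency can be sketched numerically. The code below uses two published approximations that are not part of these slides: the Zwicker and Terhardt (1980) formula for the critical bandwidth underlying the Bark scale, and the Glasberg and Moore (1990) formula for the equivalent rectangular bandwidth (ERB). The function names are hypothetical.

```python
def critical_bandwidth_hz(freq_hz: float) -> float:
    """Critical bandwidth in Hz at centre frequency freq_hz.

    Assumption: Zwicker & Terhardt (1980) analytic approximation.
    """
    return 25.0 + 75.0 * (1.0 + 1.4 * (freq_hz / 1000.0) ** 2) ** 0.69


def erb_hz(freq_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz at centre frequency freq_hz.

    Assumption: Glasberg & Moore (1990) approximation.
    """
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)


if __name__ == "__main__":
    # Both bandwidths grow with centre frequency.
    for f in (100, 500, 1000, 2000, 4000, 8000):
        print(f"{f:5d} Hz: critical band {critical_bandwidth_hz(f):7.1f} Hz, "
              f"ERB {erb_hz(f):7.1f} Hz")
```

The two scales differ in their absolute values, but both show the pattern described above: narrow bands at low frequencies and broader bands at higher frequencies.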
10
Non-linearity of loudness with duration
  • Above approx. 300 ms (exact duration not certain)
    the perceived loudness of a sound is determined
    by signal strength (and frequency) independent of
    its duration.
  • Below this duration, a shorter sound is heard as
    less loud than a longer sound of equal intensity.
  • I.e., it is as if the energy is integrated over
    time, so that a shorter sound has less energy
    than a longer one.
  • Phonetic importance? Short (unstressed)
    syllables are perceptually less prominent than
    longer (stressed) syllables.
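
The integration idea can be sketched as an idealised energy integrator. The code assumes perfect energy integration up to a fixed window (the approx. 300 ms mentioned above) and none beyond it. Real hearing only approximates this, and the returned value is an equivalent level in dB, not a loudness measure. The function and parameter names are hypothetical.

```python
import math


def effective_level_db(level_db: float, duration_ms: float,
                       integration_ms: float = 300.0) -> float:
    """Equivalent level of a short sound under ideal energy integration.

    Assumption: energy is summed perfectly over at most integration_ms,
    so halving the duration below that limit costs about 3 dB.
    """
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    integrated_ms = min(duration_ms, integration_ms)
    return level_db + 10.0 * math.log10(integrated_ms / integration_ms)


if __name__ == "__main__":
    # Same physical intensity (60 dB), different durations.
    for dur in (30, 75, 150, 300, 600):
        print(f"{dur:4d} ms -> {effective_level_db(60.0, dur):5.1f} dB")
```

Under these assumptions a 30 ms sound behaves like one 10 dB weaker than a 300 ms sound of equal intensity, while lengthening beyond 300 ms changes nothing.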

11
Psychophonetics
  • Used here as a term to parallel
    psychoacoustics. In our definition,
    psychophonetics is the study of the relationship
    between the acoustic speech signal and functional
    aspects of speech, e.g. speech sounds,
    (stressed/unstressed) syllables, tonal accents,
    junctural phenomena etc.
  • The experimental procedure typically requires
    changing the analytic properties of the acoustic
    speech signal in a controlled manner and
    recording the perceptual effect.
  • The properties changed are those of acoustic
    analysis: duration, intensity, fundamental
    frequency and spectral structure.

12
Acoustic Cues
  • This term was coined in the 1950s, when
    synthesis and manipulation of the acoustic speech
    signal were starting. (Origin: Haskins
    Laboratories, New York, USA)
  • The cues are those acoustic properties that
    can be shown to affect the perception of a speech
    sound (so we have acoustic cues for vowels and
    consonants, and within these categories
    for e.g. voicing, manner, place of articulation
    in consonants, degree of opening, place,
    rounding etc. in vowels).

13
Acoustic cues - vowels 1
  • Cues: Formants 1 and 2 (to a first
    approximation)

... and the evidence from formant synthesis
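
The claim that F1 and F2 suffice to a first approximation can be sketched as a nearest-target lookup in the F1/F2 plane. The target values below are rough, illustrative figures for an adult male voice and are assumptions, not data from these slides. The distance is computed in Hz for simplicity, although an auditory scale would be closer to perception. All names are hypothetical.

```python
import math

# Illustrative (F1, F2) targets in Hz for three corner vowels.
# Assumption: rough adult-male values, not taken from the slides.
VOWEL_TARGETS = {
    "i": (270.0, 2290.0),
    "a": (730.0, 1090.0),
    "u": (300.0, 870.0),
}


def nearest_vowel(f1_hz: float, f2_hz: float) -> str:
    """Return the vowel whose (F1, F2) target is closest in Hz."""
    return min(
        VOWEL_TARGETS,
        key=lambda v: math.dist(VOWEL_TARGETS[v], (f1_hz, f2_hz)),
    )


if __name__ == "__main__":
    for f1, f2 in ((300, 2200), (700, 1200), (320, 800)):
        print(f"F1={f1} Hz, F2={f2} Hz -> {nearest_vowel(f1, f2)}")
```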
14
Acoustic cues - vowels 2
  • While monophthongs have a steady-state formant
    structure, diphthongs, e.g. aI, aU, ɔI, and
    (vowel glide) approximants, e.g. j, w, ɥ,
    have changing formants as a cue to their
    identity.

aI, aU, ɔI have a more or less fixed formant
pattern, determined by the identity of the two
vocalic elements which define them.
j, w, ɥ have a defined starting point, but
the degree of formant change is determined by the
following vowel. The starting point has a
(slightly more damped) formant structure similar
to the related vowel: j and i, w and u,
ɥ and y (see acoustics slides)
15
Acoustic cues - plosives
  • Plosives have a temporally complex set of
    acoustic cues resulting from (i) the closing
    movement, (ii) the closure phase and (iii) the
    release of the closure.

The closure is a period with no energy
(voiceless stops) or a weak low frequency
periodic signal (voicing in the closure). This
introduces a perceptible interruption.
The release burst is the result of turbulence
due to the escaping air from the increased
intra-oral pressure built up during the closure.
This may be relatively weak (in voiced stops) or
strong (in voiceless stops). The different
spectral properties of the burst noise signal the
different places of articulation.
16
Release bursts and vowel quality
17
Vowel formant transitions as consonant cues
  • Formant transitions (changing formant values in
    the vowel preceding and following the stop
    consonant) reflect the articulator movement
    towards and away from the closure. The F2
    transition is a cue to the consonantal place of
    articulation; F1 just signals the opening and
    closing movement.

The place of the stop determines the F2 formant
value from which or towards which the transition
moves (called the locus). But the actual shape of
the transition is determined by the vowel (as it
is with vowel glides).
18
Locus frequencies, e.g. d
19
What sort of transitions for which place?
  • The previous slide showed that the locus for
    d (and logically for t, n, l, s, z) is
    fairly constant. The value (for the average adult
    male vocal tract) is about 1800 Hz.
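
The locus idea can be sketched as a straight-line F2 track from the alveolar locus of about 1800 Hz to the F2 of the following vowel. This is a deliberately simplified assumption: real transitions are not linear and need not start exactly at the locus. The vowel F2 values in the usage lines are illustrative assumptions, and the function name is hypothetical.

```python
def f2_transition(locus_hz: float, vowel_f2_hz: float,
                  n_points: int = 5) -> list[float]:
    """Linearly interpolated F2 track from the locus to the vowel target.

    Assumption: a straight-line transition, used only to show the
    direction and extent of the F2 movement.
    """
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    step = (vowel_f2_hz - locus_hz) / (n_points - 1)
    return [locus_hz + i * step for i in range(n_points)]


if __name__ == "__main__":
    alveolar_locus_hz = 1800.0  # value quoted in the slide text
    # Illustrative vowel F2 targets (assumptions): front vs. back vowel.
    print("d + front vowel:", f2_transition(alveolar_locus_hz, 2200.0))
    print("d + back vowel: ", f2_transition(alveolar_locus_hz, 800.0))
```

With the same locus, the transition rises into a vowel with a high F2 and falls into a vowel with a low F2, so the shape depends on the vowel while the starting region stays fixed.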

For labial consonants, the vowel can be formed
independent of the consonant closure (the tongue
is free to move). Both F2 and F1 therefore just
reflect the opening and closing of the jaw and
lips. The locus is therefore always low.
For velar consonants, the consonant closure is
very dependent on the vowel (both use the tongue
dorsum). The locus is higher than for alveolars
both for front and back vowels, but for back
vowels it is lower than for front vowels. F2 and
F3 transitions often converge with velars.
20
(No Transcript)
21
The importance of timing as a cue to the
voicing distinction
The temporal differences shown here signal
the difference between weak and strong
plosives, whether there is closure voicing
present or not. It is often claimed that
the distinction fortis-lenis is better than
voiced-voiceless.
22
Acoustic cues - fricatives
  • Fricative identity is determined by the
    spectral distribution of the energy (see also
    acoustics slides).

Figure panels: D, T, f, v, S, Z, z, s
23
Summary of cues - Manner
24
Summary of cues - Place
25
Summary of cues - Fortis-lenis
Figure label: voice bar