Title: Dynamic Aspects of the Cocktail Party Listening Problem
1Dynamic Aspects of the Cocktail Party Listening Problem
- Douglas S. Brungart
- Air Force Research Laboratory
2Credits
AFOSR Sponsored
Research Team: Brian Simpson, Alex Kordik, Rich McKinley, Mark Ericson
Collaborators: Chris Darwin, Gerald Kidd
3Introduction
1) Energetic and Informational Masking: Speech in Noise vs. Speech in Speech
2) Monaural speech segregation
3) Binaural and Dichotic speech segregation
4) Dynamic aspects of the cocktail party problem
5) Audio-Visual cocktail party effects
4Energetic Masking
In classic speech-on-noise masking, only one type of masking occurs: Energetic Masking.
In Energetic Masking:
- The masking sound is more intense than the target in one or more critical bands
- Some portion of the target signal is inaudible at the periphery
5Energetic Masking Articulation Theory
Energetic masking in speech was studied for years by Fletcher and others at Bell Labs
- Articulation Theory
- Articulation Index (AI)
Allows accurate prediction of intelligibility
- For any phonetically balanced vocabulary
- For any continuous noise source
- Plus numerous correction factors: High Amplitudes, Reverb, Peak-Clipping, etc.
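As a rough illustration of the idea behind an Articulation Index style calculation, the sketch below scores each frequency band by how much of a 30 dB speech range sits above the noise, then takes a weighted sum across bands. It is a simplification under stated assumptions, not the full Bell Labs procedure: the function name `articulation_index`, the equal band weights, and the 12 dB peak allowance are illustrative choices, and none of the correction factors listed above are modeled.

```python
def articulation_index(speech_db, noise_db, weights=None):
    """Simplified AI-style score in [0, 1] from per-band rms levels in dB.

    Assumptions (illustrative only): speech peaks sit 12 dB above the rms
    level, each band contributes linearly over a 30 dB range, and bands
    are equally important unless weights are supplied.
    """
    if len(speech_db) != len(noise_db):
        raise ValueError("speech_db and noise_db must have the same length")
    n_bands = len(speech_db)
    if weights is None:
        weights = [1.0 / n_bands] * n_bands  # assumption: equal importance
    total = 0.0
    for speech, noise, weight in zip(speech_db, noise_db, weights):
        snr = speech - noise
        # Fraction of the 30 dB speech range that is audible above the noise.
        audibility = min(max((snr + 12.0) / 30.0, 0.0), 1.0)
        total += weight * audibility
    return total


# Hypothetical band levels: one band fully audible, one fully masked.
print(articulation_index([60.0, 60.0], [42.0, 72.0]))
```

A band where the masker is at least 12 dB above the speech contributes nothing under these assumptions, which corresponds to the "inaudible at the periphery" case of energetic masking.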
6Informational Masking
- Energetic Masking also occurs in Speech-on-Speech masking
- - Where signals overlap within a critical band
- However, informational masking also occurs
- Listeners hear two or more audible sounds, but can't segregate them into separate messages
- Classic example: multi-tone complexes
- - No energetic overlap in stimuli, but substantial masking is observed (Kidd, Neff)
7Methods The Coordinate Response Measure (CRM)
Data collected with the Coordinate Response Measure (CRM)
- Originally developed by Moore & McKinley (1980)
- Format: "Ready (Call Sign) go to (Color) (Number) now."
- Target is indicated by the call sign "Baron"
- Maskers are indicated by other call signs
- Complete CRM corpus is available (Bolia et al., 2001)
- - 8 Talkers in corpus (4 M, 4 F), 2048 Phrases
- - 8 Talkers x 4 Colors x 8 Numbers x 8 Call Signs
- Embedded call sign is ideal for multitalker studies
- Similar to many multichannel monitoring tasks
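The corpus size quoted above follows directly from the phrase template. The sketch below enumerates every combination to show where 2048 comes from. Only the template, the call sign "Baron", and the counts come from the slide; the other call signs and the color names are assumptions recalled from descriptions of the corpus and should be checked against the corpus itself, and the integer talker IDs are placeholders.

```python
from itertools import product

TALKERS = range(8)  # placeholder IDs for the 4 male and 4 female talkers
# Assumption: call signs other than "Baron" are not given on the slide.
CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle",
              "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["blue", "green", "red", "white"]  # assumption: color names
NUMBERS = range(1, 9)


def crm_phrases():
    """Yield (talker, call_sign, color, number, text) for every phrase."""
    for talker, call_sign, color, number in product(
            TALKERS, CALL_SIGNS, COLORS, NUMBERS):
        text = f"Ready {call_sign} go to {color} {number} now."
        yield talker, call_sign, color, number, text


phrases = list(crm_phrases())
print(len(phrases))  # 8 talkers * 8 call signs * 4 colors * 8 numbers
```

Filtering `phrases` on the call sign "Baron" gives the pool of possible target phrases; every other call sign is a candidate masker.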
8Methods The Coordinate Response Measure
Your call sign is Baron.
Listeners respond by selecting the appropriate colored digit with the computer mouse.
9Methods Pros and Cons of CRM
Advantages of CRM
- Rapid data collection, training, and scoring
- Sentences are reusable
- Embedded call sign to designate target
- - Does not require a priori designation
Disadvantages of CRM
- Limited vocabulary
- - Partially offset by lack of context
- - Not phonetically balanced
- Not conversationally realistic
CRM emphasizes speech-on-speech masking
11Methods Pros and Cons of CRM
Advantages of CRM
- Rapid data collection, training, and scoring
- Sentences are reusable
- Embedded call sign to designate target
- - Does not require a priori designation
Disadvantages of CRM
- Limited vocabulary
- - Partially offset by lack of context
- - Not phonetically balanced
- Not conversationally realistic
CRM emphasizes informational masking
12Two-Talker Diotic Listening Results
Figure legend: TM = Modulated Noise Masker; TN = Continuous Noise Masker; TD = Different-Sex Masker; TS = Same-Sex Masker; TT = Same-Talker Masker
13Two-Talker Diotic Listening Error Distribution
Most errors match the color and number spoken by
the masking talker. This is indicative of
informational masking
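The error pattern described above can be tallied by sorting each response word into one of three bins: it matches the target, it matches a masker, or it matches neither. The sketch below shows one way to do that. The function names, the trial layout, and the example trials are hypothetical and are not data from the study.

```python
from collections import Counter


def classify_word(response, target, masker_words):
    """Label one response word as 'target', 'masker', or 'other'."""
    if response == target:
        return "target"
    if response in masker_words:
        return "masker"
    return "other"


def error_distribution(trials):
    """Tally color and number responses across trials.

    Each trial is (response, target, maskers), where response and target
    are (color, number) pairs and maskers is a list of such pairs.
    """
    counts = {"color": Counter(), "number": Counter()}
    for response, target, maskers in trials:
        masker_colors = {color for color, _ in maskers}
        masker_numbers = {number for _, number in maskers}
        counts["color"][classify_word(response[0], target[0], masker_colors)] += 1
        counts["number"][classify_word(response[1], target[1], masker_numbers)] += 1
    return counts


# Made-up trials for illustration only.
example_trials = [
    (("red", 3), ("red", 3), [("blue", 5)]),    # fully correct
    (("blue", 5), ("red", 3), [("blue", 5)]),   # both words from the masker
    (("red", 7), ("red", 3), [("blue", 5)]),    # number matches neither
]
print(error_distribution(example_trials))
```

With one masker, 4 colors, and 8 numbers, a listener guessing at random among the wrong answers would match the masker's color about 1 time in 3 and its number about 1 time in 7, so masker-match rates well above those levels point to confusion between talkers rather than inaudibility.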
14Three-Talker Diotic Listening Results
Figure legend: T = Target Talker; M = Modulated Noise Masker; D = Different-Sex Masker; S = Same-Sex Masker; T = Same-Talker Masker
15Four-Talker Diotic Listening Results
Figure legend: T = Target Talker; M = Modulated Noise Masker; D = Different-Sex Masker; S = Same-Sex Masker; T = Same-Talker Masker
16 3-4 Talker Listening Results
17Dichotic Listening Introduction
- To this point, all stimuli have been diotic
- Spatial separation is known to play a role
- - Cherry's Cocktail Party Problem
- Dichotic masking is pure informational masking
- - No contralateral energetic masking occurs
- Previous results have suggested
- - Almost perfect segregation across ears
- - Cherry, Broadbent, Treisman, Kidd, Neff, etc.
18Dichotic Listening Procedure
The dichotic listening procedure was similar to the earlier procedure, but:
1) Talkers were known a priori
- 1 male, 1 female target talker
2) 2 talkers were presented in the right ear (T and M)
3) Masking signal was presented in the left ear
19Dichotic Listening Results
With 2 talkers in the right ear:
- Noise in the left ear doesn't interfere (even when loud)
- Speech interferes substantially (even when quiet)
- Reversed speech interferes, but only when the target in the right ear is lower than the masker in the right ear
20Binaural Listening: Spatial Separation in Azimuth
- From the classic cocktail party effect
- Spatial separation improves segregation
Figure: Diotic vs. 45° separation, same-sex talkers
21Binaural Listening: Spatial Separation in Distance
22Binaural Listening: Spatial Separation in Distance
With natural better-ear SNR cues, both speech and noise benefit from separation in distance
23Binaural Listening: Spatial Separation in Distance
With normalization, speech is better but noise is not
24Dynamic Aspects of Multitalker Listening
Most cocktail-party listening experiments assume either:
1) Target talker is known (Selective Attention)
2) Target talker is unknown (Divided Attention)
Real-world listening falls in between these extremes
- Attention focused primarily on one talker
- Other talkers monitored for important info
How do listeners adapt to conversational dynamics?
25Dynamic Cocktail Party Effects: Multitalker Transition Probability
Experiment
Figure: 3-Talker Condition
1) Standard CRM task
2) 2, 3, or 4 spatially separated same-sex talkers
- Close or far separation for 2 and 3 talkers
3) 5 transition probabilities (0-1)
4) 3 talker configurations
- Talkers selected randomly
- Each location assigned a talker
- Target talker follows target location
5) Total of 106,200 trials
- Balanced by target talker and target location
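One way to read the transition-probability manipulation is that, on each trial, the target moves to a different location with probability p and otherwise stays where it was. The sketch below generates a sequence of target locations under that reading and counts how many trials have passed since the last move, which is the quantity needed to track performance after a transition. This interpretation, the function names, and the parameter values are assumptions for illustration; the slide does not specify how trial sequences were generated.

```python
import random


def target_location_sequence(n_trials, n_locations, p_transition, seed=None):
    """Return a list of target locations, one per trial.

    Assumption: on each trial after the first, the target moves to a
    different, uniformly chosen location with probability p_transition.
    """
    if n_trials < 1 or n_locations < 2:
        raise ValueError("need at least 1 trial and 2 locations")
    if not 0.0 <= p_transition <= 1.0:
        raise ValueError("p_transition must be between 0 and 1")
    rng = random.Random(seed)
    location = rng.randrange(n_locations)
    sequence = [location]
    for _ in range(n_trials - 1):
        if rng.random() < p_transition:
            others = [loc for loc in range(n_locations) if loc != location]
            location = rng.choice(others)
        sequence.append(location)
    return sequence


def trials_since_transition(sequence):
    """For each trial, count the trials elapsed since the target last moved."""
    counts = []
    run = 0
    for i, location in enumerate(sequence):
        run = run + 1 if i > 0 and location == sequence[i - 1] else 0
        counts.append(run)
    return counts


locations = target_location_sequence(20, 3, 0.25, seed=1)
print(locations)
print(trials_since_transition(locations))
```

Grouping trial scores by the value returned from `trials_since_transition` would give a curve of performance against trials since the last change in target location.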
26Dynamic Cocktail Party Effects: Multitalker Transition Probability
Overall performance improves gradually after transitions
27Conclusions
- 1) Speech-on-Speech ≠ Speech-in-Noise
- - Deployment of Auditory Attention is Important
- - Signal similarity is a major factor
- - Spatial separation is particularly beneficial
- 2) Multitalker Listening is a Dynamic Process
- - Listeners adapt to source location changes over 5-8 trials
- - Listeners learn new situations quickly (10 trials)
- - Listeners adopt optimal listening strategies