Title: Within-Category Variation is Used in Spoken Word Recognition
1Within-Category Variation is Used in Spoken Word
Recognition: Temporal Integration at Two Time
Scales
Bob McMurray, University of Iowa, Dept.
of Psychology
2Collaborators
Richard Aslin, Michael Tanenhaus, David Gow,
Joe Toscano, Dana Subik, Julie Markant
3 Perception Cognition
A detailed understanding of perceptual processing
is critical to understanding higher level
cognition.
Specifically: sensitivity to fine-grained
perceptual detail can help integrate information
over time.
4 Temporal Integration
Temporal integration is a critical problem for
cognition.
- Information never arrives synchronously.
- Vision: integration across head-movements,
saccades and attention-shifts.
- Music perception: long-term dependencies and
short-term expectancies.
5 - In language, information arrives sequentially.
- Partial syntactic and semantic representations
are formed as words arrive.
The Hawkeyes beat the Boilermakers (once)
- Words are identified over sequential phonemes.
[Figure: a word unfolding phoneme by phoneme.]
6 Spoken Word Recognition is an ideal arena in
which to study these issues because:
- Research divides word recognition into perceptual
and cognitive mechanisms.
- Perceptual information is available for temporal
integration.
7 Scales of temporal integration in word
recognition
- A word: ordered series of articulations.
- - Build abstract representations.
- - Form expectations about future events.
- - Fast (online) processing.
- A phonology:
- - Abstract across utterances.
- - Expectations about possible future events.
- - Slow (developmental) processing.
8Mechanisms of Temporal Integration
- Stimuli do not change arbitrarily.
- Perceptual cues reveal something about the change
itself.
- Active integration
- Anticipating future events
- Retain partial present representations.
- Resolve prior ambiguity.
9Overview
1) Speech perception and Spoken Word Recognition.
2) Lexical activation is sensitive to
fine-grained detail in speech.
3) Fast temporal integration: taking advantage of
regularity in the signal.
4) Slow temporal integration: developmental
consequences.
10 Online Word Recognition
- Information arrives sequentially
- At early points in time, signal is temporarily
ambiguous.
- Later arriving information disambiguates the word.
11 Current models of spoken word recognition
- Immediacy: hypotheses formed from the earliest
moments of input.
- Activation based: lexical candidates (words)
receive activation to the degree they match the
input.
- Parallel processing: multiple items are active in
parallel.
- Competition: items compete with each other for
recognition.
12[Figure: the input "b... u tt e r" unfolding over time, with the candidate words beach, butter, bump, putter and dog.]
13These processes have been well defined for a
phonemic representation of the input.
[Figure: phoneme-by-phoneme transcription of an example input.]
But there is considerably less ambiguity if we consider
subphonemic information.
Example: subphonemic effects of motor processes.
14Coarticulation
Any action reflects future actions as it unfolds.
Example: Coarticulation. Articulation (lips,
tongue) reflects current, future and past
events. Subtle subphonemic variation in speech
reflects temporal organization.
Sensitivity to these perceptual details might
yield earlier disambiguation.
15 These processes have largely been ignored
because of a history of evidence that perceptual
variability gets discarded.
Example: Categorical Perception
16Categorical Perception
Subphonemic variation in VOT is discarded in
favor of a discrete symbol (phoneme).
17Evidence against the strong form of Categorical
Perception from psychophysical-type tasks
Discrimination Tasks: Pisoni & Tash (1974);
Pisoni & Lazarus (1974); Carney, Widin &
Viemeister (1977)
Training: Samuel (1977); Pisoni, Aslin, Perey &
Hennessy (1982)
Goodness Ratings: Miller (1997); Massaro &
Cohen (1983)
18Experiment 1
Does within-category acoustic detail
systematically affect higher level language? Is
there a gradient effect of subphonemic detail on
lexical activation?
19McMurray, Aslin & Tanenhaus (2002)
A gradient relationship would yield systematic
effects of subphonemic information on lexical
activation.
If this gradiency is useful for temporal
integration, it must be preserved over
time. Need a design sensitive to both acoustic
detail and detailed temporal dynamics of lexical
activation.
20Acoustic Detail
Use a speech continuum: more steps yield a better
picture of the acoustic mapping.
KlattWorks: generate synthetic continua from
natural speech.
9-step VOT continua (0-40 ms).
6 pairs of words: beach/peach, bale/pale, bear/pear,
bump/pump, bomb/palm, butter/putter.
6 fillers. lamp, leg, lock, ladder, lip, leaf;
shark, shell, shoe, ship, sheep, shirt.
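As a small illustration of the stimulus design above, the following sketch lays out a 9-step continuum from 0 to 40 ms for each word pair. The even 5 ms spacing and the names `vot_continuum`, `WORD_PAIRS` and `STIMULI` are assumptions made for the example; the slides give only the step count and the range.

```python
def vot_continuum(start_ms=0.0, end_ms=40.0, steps=9):
    """Return `steps` evenly spaced VOT values from start_ms to end_ms."""
    if steps < 2:
        raise ValueError("a continuum needs at least two steps")
    step = (end_ms - start_ms) / (steps - 1)
    return [start_ms + i * step for i in range(steps)]


# The six b/p pairs listed on the slide.
WORD_PAIRS = [
    ("beach", "peach"), ("bale", "pale"), ("bear", "pear"),
    ("bump", "pump"), ("bomb", "palm"), ("butter", "putter"),
]

# One stimulus per (word pair, VOT step): 6 pairs x 9 steps.
STIMULI = [(b, p, vot) for (b, p) in WORD_PAIRS for vot in vot_continuum()]
```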
22Temporal Dynamics
How do we tap on-line recognition? With an
on-line task: eye-movements.
Subjects hear spoken language and manipulate
objects in a visual world. The visual world
includes a set of objects with interesting
linguistic properties: a beach, a peach and some
unrelated items. Eye-movements to each object are
monitored throughout the task.
Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy,
1995
23Why use eye-movements and visual world paradigm?
- Relatively natural task.
- Eye-movements generated very fast (within 200ms
of first bit of information).
- Eye movements time-locked to speech.
- Subjects aren't aware of eye-movements.
- Fixation probability maps onto lexical
activation.
24Task
A moment to view the items
26Task
Bear
Repeat 1080 times
27Identification Results
High agreement across subjects and items for
category boundary.
[Figure: proportion of /p/ responses as a function of VOT (ms), from B to P.]
By subject: 17.25 +/- 1.33 ms. By item: 17.24
+/- 1.24 ms.
28Task
Target: Bear. Competitor: Pear. Unrelated: Lamp,
Ship.
29Task
30Task
- Given that
- the subject heard bear
- clicked on bear
How often was the subject looking at the pear?
[Figure: predicted looks to target and competitor under categorical results vs. a gradient effect.]
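The conditional analysis described on this slide (keep only trials where the subject clicked on the target, then ask how often the eyes were on the competitor) can be sketched as follows. The trial layout, the field names and the tiny data set are hypothetical, chosen only to make the computation concrete.

```python
def competitor_looks_by_vot(trials, target, competitor):
    """Proportion of fixation samples on the competitor at each VOT step,
    restricted to trials where the subject clicked on the target."""
    counts = {}  # vot -> [competitor samples, all samples]
    for trial in trials:
        if trial["response"] != target:
            continue  # condition on the overt response
        tally = counts.setdefault(trial["vot"], [0, 0])
        tally[0] += sum(1 for obj in trial["fixations"] if obj == competitor)
        tally[1] += len(trial["fixations"])
    return {vot: hits / total for vot, (hits, total) in counts.items() if total}


# Hypothetical trials: one fixated object per time sample.
TRIALS = [
    {"vot": 0, "response": "bear", "fixations": ["bear", "bear", "bear", "bear"]},
    {"vot": 10, "response": "bear", "fixations": ["pear", "bear", "bear", "bear"]},
    {"vot": 15, "response": "bear", "fixations": ["pear", "pear", "bear", "bear"]},
    {"vot": 15, "response": "pear", "fixations": ["pear", "pear", "pear", "bear"]},
]
```

A gradient effect would show up as this proportion rising with VOT toward the category boundary, even though every retained trial ended with the same response.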
31Results
[Figure: competitor fixations as a function of time since word onset (ms), with curves labeled by response and by VOT (0 ms, 5 ms).]
Long-lasting gradient effect seen throughout the
timecourse of processing.
32[Figure: competitor fixations (looks to the competitor) as a function of VOT (ms), by response, with the category boundary marked.]
34Summary
Subphonemic acoustic differences in VOT have a
gradient effect on lexical activation.
- Gradient effect of VOT on looks to the
competitor.
- Effect holds even for unambiguous stimuli.
- Seems to be long-lasting.
Consistent with a growing body of work using
priming (Andruski, Blumstein & Burton, 1994;
Utman, Blumstein & Burton, 2000; Gow, 2001, 2002).
35The Proposed Framework
Sensitivity and Use
1) Word recognition is systematically sensitive to
subphonemic acoustic detail.
2) Acoustic detail is represented as gradations
in activation across the lexicon.
3) This sensitivity enables the system to take
advantage of subphonemic regularities for
temporal integration.
4) This has fundamental consequences for
development: learning phonological organization.
36Lexical Sensitivity
- Word recognition is systematically sensitive to
subphonemic acoustic detail.
- Voicing
- Laterality, Manner, Place
- Natural Speech
- X Metalinguistic Tasks
38Lexical Sensitivity
[Figure: competitor fixations as a function of VOT (0 to 40 ms), by response (P or B), with the category boundary marked.]
39Lexical Sensitivity
- Word recognition is systematically sensitive to
subphonemic acoustic detail.
- Voicing
- Laterality, Manner, Place
- Natural Speech
- X Metalinguistic Tasks
- ? Non minimal pairs
- ? Duration of effect
- (experiment 1)
40 2) Acoustic detail is represented as gradations
in activation across the lexicon.
[Figure: the input "b... u m p" unfolding over time.]
41Temporal Integration
3) This sensitivity enables the system to take
advantage of subphonemic regularities for
temporal integration.
- Regressive ambiguity resolution (exp 1)
- Ambiguity retained until more information
arrives.
- Progressive expectation building (exp 2)
- Phonetic distinctions are spread over time
- Anticipate upcoming material.
42Development
4) Consequences for development: learning
phonological organization.
- Learning a language
- Integrating input across many utterances to build
long-term representation.
- Sensitivity to subphonemic detail (exp 4 & 5).
- Allows statistical learning of categories
(model).
43Experiment 2
44Misperception
What if the initial portion of a stimulus was
misperceived?
- Competitor still active: easy to activate it
the rest of the way.
- Competitor completely inactive: system will
garden-path.
P(misperception) depends on distance from the
boundary. Gradient activation allows the system
to hedge its bets.
45/ beIrəkeId / vs. / peIrəkit /
barricade vs. parakeet
[Figure: the input "p/b eI r ə k i t" unfolding over time, with parakeet and barricade in a categorical lexicon.]
46Methods
10 Pairs of b/p items.
48Eye Movement Results
[Figure: Barricade -> Parricade. Fixations to target (0 to 1) over time (0 to 900 ms), by VOT.]
Faster activation of target as VOTs near lexical
endpoint, even within the non-word range.
50Experiment 2 Conclusions
Gradient effect of within-category variation
without minimal pairs.
- Gradient effect is long-lasting: mean POD 240 ms.
- Regressive ambiguity resolution
- Subphonemic gradations maintained until more
information arrives.
- Subphonemic gradation can improve (or hinder)
recovery from garden path.
51Progressive Expectation Formation
- Can within-category detail be used to predict
future acoustic/phonetic events?
- Yes: phonological regularities create systematic
within-category variation.
- Predicts future events.
52Experiment 3: Anticipation
Word-final coronal consonants (n, t, d)
assimilate the place of the following segment.
Maroong Goose
Maroon Duck
Place assimilation -> ambiguous segments
anticipate upcoming material.
53Subject hears:
"select the maroon duck"
"select the maroon goose"
"select the maroong goose"
"select the maroong duck"
54Results
Anticipatory effect on looks to non-coronal.
55[Figure: looks to duck as a function of time (0 to 600 ms); fixation proportion (0 to 0.3) for assimilated vs. non-assimilated tokens, with the onset of "goose" plus oculomotor delay marked.]
Inhibitory effect on looks to coronal (duck,
p = .024).
56- Sensitivity to subphonemic detail
- Increase priors on likely upcoming events.
- Decrease priors on unlikely upcoming events.
- Active Temporal Integration Process.
- Occasionally assimilation creates ambiguity
- Resolves prior ambiguity: mudg drinker
- Similar to experiment 2
57Adult Summary
- Lexical activation is exquisitely sensitive to
within-category detail.
- This sensitivity is useful to integrate material
over time.
- - Regressive ambiguity resolution.
- - Progressive facilitation.
- Taking advantage of phonological and lexical
regularities.
58Development
Historically, work in speech perception has been
linked to development. Sensitivity to
subphonemic detail must revise our view of
development.
Use: infants face additional temporal integration
problems.
- No lexicon available to clean up noisy input:
rely on acoustic regularities.
- Extracting a phonology from the series of
utterances.
59Sensitivity to subphonemic detail
For 30 years, virtually all attempts to address this
question have yielded categorical discrimination
(e.g. Eimas, Siqueland, Jusczyk & Vigorito, 1971).
- Exception: Miller & Eimas (1996).
- Only at extreme VOTs.
- Only when habituated to non-prototypical
token.
60Use?
Nonetheless, infants possess abilities that would
require within-category sensitivity.
- Infants can use allophonic differences at word
boundaries for segmentation (Jusczyk, Hohne &
Bauman, 1999; Hohne & Jusczyk, 1994).
- Infants can learn phonetic categories from
distributional statistics (Maye, Werker & Gerken,
2002; Maye & Weiss, 2004).
61Statistical Category Learning
Speech production causes clustering along
contrastive phonetic dimensions.
E.g. Voicing / Voice Onset Time
B: VOT 0
P: VOT 40
62To statistically learn speech categories, infants
must
- This requires ability to track specific VOTs.
63Experiment 4
Why no demonstrations of sensitivity?
- Habituation
- Discrimination not ID.
- Possible selective adaptation.
- Possible attenuation of sensitivity.
- Synthetic speech
- Not ideal for infants.
- Single exemplar/continuum
- Not necessarily a category representation
Experiment 4: Reassess issue with improved
methods.
64HTPP
- Head-Turn Preference Procedure
- (Jusczyk & Aslin, 1995)
- Infants exposed to a chunk of language
- Words in running speech.
- Stream of continuous speech (a la statistical
learning paradigm).
- Word list.
- Memory for exposed items (or abstractions)
assessed
- Compare listening time between consistent and
inconsistent items.
65Test trials start with all lights off.
66Center Light blinks.
67Brings infant's attention to center.
68One of the side-lights blinks.
69When infant looks at side-light he hears a word
70as long as he keeps looking.
71Methods
7.5 month old infants exposed to either 4 b-, or
4 p-words. 80 repetitions total. Form a
category of the exposed class of words.
72Stimuli constructed by cross-splicing naturally
produced tokens of each end point.
73Novelty or Familiarity?
Novelty/Familiarity preference varies across
infants and experiments.
We're only interested in the middle stimuli (b,
p). Infants were classified as novelty or
familiarity preferring by performance on the
endpoints.
74 After being exposed to bear, beach, bail,
bomb: infants who show a novelty effect will
look longer for pear than bear.
What about in between?
[Figure: listening time for the test tokens Bear, Bear and Pear.]
75Results
Novelty infants (B: 36, P: 21)
[Figure: listening time (ms, 4000 to 10000) for Target, Target and Competitor test items, by exposure group (B or P).]
Target vs. Target: p < .001. Competitor vs. Target: p = .017.
76Familiarity infants (B: 16, P: 12)
Target vs. Target: p = .003. Competitor vs. Target: p = .012.
77Infants exposed to /p/
Novelty: N = 21
78Infants exposed to /b/
79Experiment 4 Conclusions
Contrary to all previous work:
- 7.5 month old infants show gradient sensitivity
to subphonemic detail.
- Clear effect for /p/.
- Effect attenuated for /b/.
80Reduced effect for /b/ But
81- Category boundary lies between Bear Bear
- - Between (3ms and 11 ms) ??
- Within-category sensitivity in a different range?
82Experiment 5
Same design as experiment 3. VOTs shifted away
from hypothesized boundary.
Train / Test:
-9.7 ms: Bomb, Bear, Beach, Bale
3.6 ms: Bomb, Bear, Beach, Bale
40.7 ms: Palm, Pear, Peach, Pail
83Familiarity infants (34 infants)
[Figure: listening time (ms, 4000 to 9000) for B-, B and P test tokens; comparisons marked .01 and .05.]
84Novelty infants (25 infants)
[Figure: listening time (ms, 4000 to 9000) for B-, B and P test tokens; comparisons marked .002 and .02.]
85Experiment 5 Conclusions
- Within-category sensitivity in /b/ as well as /p/.
- Shifted category boundary in /b/ not consistent
with adult boundary (or prior infant work). Why?
86/b/ results consistent with (at least) two
mappings.
1) Shifted boundary
[Figure: category mapping strength for /b/ and /p/ as a function of VOT.]
- Inconsistent with prior literature.
- Why would infants have this boundary?
87HTPP is a one-alternative task. It asks "B or
not-B", not "B or P".
Hypothesis: sparse categories are a by-product of
efficient learning.
88Computational Model
Distributional learning model
1) Model the distribution of tokens as
a mixture of Gaussian distributions
over a phonetic dimension (e.g. VOT).
2) After receiving an input, the Gaussian with
the highest posterior probability is the
category.
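A minimal sketch of the categorization step, assuming two Gaussians over VOT with means at 0 and 40 ms (the /b/ and /p/ clusters mentioned earlier in the talk). The equal mixing weights, the 8 ms standard deviation and all names are assumptions for illustration, not values from the model.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posteriors(x, categories):
    """Posterior probability of each category given the input x.

    categories maps a label to (mixing weight, mean, standard deviation).
    """
    joint = {
        label: weight * gaussian_pdf(x, mu, sigma)
        for label, (weight, mu, sigma) in categories.items()
    }
    total = sum(joint.values())
    return {label: value / total for label, value in joint.items()}


def categorize(x, categories):
    """Pick the Gaussian with the highest posterior probability."""
    post = posteriors(x, categories)
    return max(post, key=post.get)


# Hypothetical parameters: (weight, mean VOT in ms, standard deviation in ms).
CATEGORIES = {"b": (0.5, 0.0, 8.0), "p": (0.5, 40.0, 8.0)}
```

Under these assumed parameters the winning category changes at 20 ms, while the posterior itself still varies continuously within each category.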
89Statistical Category Learning
1) Start with a set of randomly selected
Gaussians.
2) After each input, adjust each parameter to find
the best description of the input.
3) Start with more Gaussians than necessary: the model
doesn't innately know how many categories.
- Mixing weight -> 0 for unneeded categories.
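The learning loop above can be sketched as an incremental update applied after every input. This example uses a responsibility-weighted running average (an online EM-style rule), which is an assumption: the slides do not specify the update rule, and the learning rate, the floor on the standard deviation and all names are hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def update(model, x, rate=0.05, min_sigma=1.0):
    """Adjust every Gaussian in place after one input x.

    model is a list of dicts with keys "w" (mixing weight), "mu", "sigma".
    """
    joint = [g["w"] * gaussian_pdf(x, g["mu"], g["sigma"]) for g in model]
    total = sum(joint)
    if total == 0.0:
        return  # input too far from every Gaussian to assign credit
    for g, j in zip(model, joint):
        resp = j / total  # posterior probability that g produced x
        diff = x - g["mu"]
        var = g["sigma"] ** 2
        g["mu"] += rate * resp * diff
        var += rate * resp * (diff * diff - var)
        g["sigma"] = max(min_sigma, math.sqrt(var))
        # Gaussians that rarely win lose weight; the weights keep summing to 1.
        g["w"] += rate * (resp - g["w"])


def train(model, inputs, rate=0.05):
    """Present the inputs one at a time and return the updated model."""
    for x in inputs:
        update(model, x, rate)
    return model
```

With more Gaussians than clusters in the input, a Gaussian that never wins any input has its weight shrink by a factor of (1 - rate) per step, which is one way to obtain the decay toward zero for unneeded categories.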
91- Overgeneralization
- large σ
- costly: lose phonetic distinctions
92- Undergeneralization
- small σ
- not as costly: maintain distinctiveness.
93- To increase likelihood of successful learning:
- err on the side of caution.
- start with small σ
94Sparseness coefficient: proportion of space not strongly
mapped to any category.
[Figure: categories along the VOT dimension.]
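One way to make the sparseness coefficient concrete is to sample the phonetic dimension on a grid and count the points where no category responds strongly. The peak-normalised Gaussian response, the 0.1 threshold, the grid range and all names below are assumptions for illustration; the slides do not define "strongly mapped".

```python
import math


def mapping_strength(x, mu, sigma):
    """Peak-normalised Gaussian response: 1.0 at the mean, falling toward 0."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z)


def sparseness(categories, grid, threshold=0.1):
    """Fraction of grid points whose best mapping strength is below threshold.

    categories is a list of (mean, standard deviation) pairs.
    """
    unmapped = sum(
        1
        for x in grid
        if max(mapping_strength(x, mu, sigma) for mu, sigma in categories) < threshold
    )
    return unmapped / len(grid)


VOT_GRID = range(-20, 61)  # 1 ms steps over a hypothetical VOT range

# Same category means, different widths.
narrow = sparseness([(0.0, 2.0), (40.0, 2.0)], VOT_GRID)
broad = sparseness([(0.0, 15.0), (40.0, 15.0)], VOT_GRID)
```

Under these assumptions the narrow categories leave most of the grid unmapped, while the broad ones cover all of it.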
95Start with large σ
[Figure: category structure over VOT, and average sparsity coefficient (0 to 0.4) over training epochs (0 to 12000), by starting σ.]
96Intermediate starting σ
[Figure: category structure over VOT, and average sparsity coefficient (0 to 0.4) over training epochs (0 to 12000), by starting σ.]
97Limitations
- Occasionally model leaves sparse regions at the
end of learning.
- Competition/Choice framework
- Additional competition or selection mechanisms
during processing: categorization despite
incomplete information.
- Multi-dimensional categories
- 1-D: 3 parameters / category
- 2-D: 6
- 3-D: 10
- 4-D: 15
- Cue/model-reliability may reduce dimensionality.
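The growth in parameters per category can be checked with a short count. This sketch assumes each category is one full-covariance Gaussian plus one mixing weight, so a d-dimensional category needs 1 + d + d(d+1)/2 parameters; that decomposition and the function name are assumptions, not stated on the slide.

```python
def params_per_category(dims):
    """Free parameters of one full-covariance Gaussian plus its mixing weight."""
    mean_terms = dims
    covariance_terms = dims * (dims + 1) // 2  # symmetric covariance matrix
    return 1 + mean_terms + covariance_terms


# Parameter counts for 1 to 4 phonetic dimensions.
COUNTS = {d: params_per_category(d) for d in range(1, 5)}
```

Under this count the totals for one to four dimensions are 3, 6, 10 and 15.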
98Non-parametric approach?
- Not constrained by a particular equation: can fill
space better.
- Similar properties in terms of starting σ and
sparseness.
99Model Conclusions
To avoid overgeneralization, it is better to start
with small estimates for σ.
Small or even medium starting σs lead to sparse
category structure during infancy: much of
phonetic space is unmapped.
Sparse categories: similar temporal integration
to exp 2. Retain ambiguity (and partial
representations) until more input is available.
100AEM Paradigm
Examination of sparseness/completeness of
categories needs a two alternative task.
- Also useful with
- Color
- Shape
- Spatial Frequency
- Faces
Quicktime Demo
101Experiment 6
Anticipatory Eye Movements
Train: Bear0 (Left), Pail35 (Right)
Test: Bear0, Pear40, Bear5, Pear35, Bear10, Pear30,
Bear15, Pear25
Same naturally-produced tokens from Exps 4 & 5.
102Expected results
[Figure: expected performance as a function of VOT for the Bear and Pail categories, with the adult boundary and unmapped space marked.]
103Results
Training tokens: 67% correct; 9 / 16 better than chance.
104Infant Summary
Infants show graded sensitivity to subphonemic
detail.
- /b/ results: regions of unmapped phonetic space.
- Statistical approach provides support for
sparseness.
- Given current learning theories, sparseness
results from optimal starting parameters.
- Empirical test will require a two-alternative
task.
- AEM: train infants to make eye-movements in
response to stimulus identity.
105Conclusions
Infants and adults are sensitive to subphonemic detail.
Sensitivity is important to adult and developing
word recognition systems.
1) Short-term cue integration.
2) Long-term phonology learning.
In both cases:
- Partially ambiguous material is retained until
more data arrives.
- Partially active representations anticipate the
likelihood of future material.
106Conclusions
Spoken language is defined by change. But the
information to cope with it is in the signal, if
we look online. Within-category acoustic
variation is signal, not noise.
109Misperception: Additional Results
110- 10 pairs of b/p items.
- 0-35 ms VOT continua.
- 20 filler items (lemonade, restaurant,
saxophone).
- Option to click X (Mispronounced).
- 26 subjects.
- 1240 trials over two days.
111Identification Results
Significant target responses even at
extreme. Graded effects of VOT on correct
response rate.
[Figure: response rate (0 to 1) for Voiced, Voiceless and NW responses as a function of VOT (0 to 35 ms), from Barricade to Parricade.]
112Phonetic Garden-Path
Garden-path effect: difference between looks
to each target (b vs. p) at same VOT.
113GP Effect: gradient effect of VOT.
Target: p < .0001. Competitor: p < .0001.
114Assimilation: Additional Results
115 runm picks runm takes
116Exp 3 & 4 Conclusions
- Within-category detail used in recovering from
assimilation: temporal integration.
- Anticipate upcoming material
- Bias activations based on context
- - Like Exp 2: within-category detail retained to
resolve ambiguity.
- Phonological variation is a source of information.
117Subject hears:
"select the mud drinker"
"select the mudg gear"
"select the mudg drinker"
Critical Pair
118[Figure: fixation proportion (0 to 0.45) over time (0 to 2000 ms), with the onset of "gear" and the average offset of "gear" (402 ms) marked.]
Mudg Gear is initially ambiguous with a late bias
towards Mud.
119Mudg Drinker is also ambiguous with a late bias
towards Mug (the /g/ has to come from
somewhere).