Within-Category Variation is Used in Spoken Word Recognition


Slides: 121

Provided by: bobmcm

Learn more at: http://www2.psychology.uiowa.edu


Transcript and Presenter's Notes


1
Within-Category Variation is Used in Spoken Word Recognition: Temporal Integration at Two Time Scales
Bob McMurray
University of Iowa, Dept. of Psychology
2
Collaborators
Richard Aslin, Michael Tanenhaus, David Gow,
Joe Toscano, Dana Subik, Julie Markant
3

Perception & Cognition
A detailed understanding of perceptual processing is critical to understanding higher level cognition.
Specifically: sensitivity to fine-grained perceptual detail can help integrate information over time.
4

Temporal Integration
Temporal integration: a critical problem for cognition. Information never arrives synchronously.

Vision: integration across head movements, saccades and attention shifts.
Music perception: long-term dependencies and short-term expectancies.

In language, information arrives sequentially.
Partial syntactic and semantic representations
are formed as words arrive.

The Hawkeyes beat the Boilermakers (once)

Words are identified over sequential phonemes.

/l æ ŋ gw ɪ dʒ/ ("language")
6

Spoken Word Recognition is an ideal arena in which to study these issues because:

- Research divides word recognition into perceptual and cognitive mechanisms.
- Perceptual information is available for temporal integration.

Scales of temporal integration in word recognition:
A word: an ordered series of articulations.
- Build abstract representations.
- Form expectations about future events.
- Fast (online) processing.

A phonology:
- Abstract across utterances.
- Expectations about possible future events.
- Slow (developmental) processing.

8
Mechanisms of Temporal Integration

Stimuli do not change arbitrarily.
Perceptual cues reveal something about the change
itself.
Active integration:
- Anticipate future events.
- Retain partial present representations.
- Resolve prior ambiguity.

9
Overview

1) Speech perception and spoken word recognition.

2) Lexical activation is sensitive to fine-grained detail in speech.
3) Fast temporal integration: taking advantage of regularity in the signal.
4) Slow temporal integration: developmental consequences.
10

Online Word Recognition
Information arrives sequentially
At early points in time, signal is temporarily
ambiguous.

Later arriving information disambiguates the word.

Current models of spoken word recognition:
- Immediacy: hypotheses are formed from the earliest moments of input.
- Activation based: lexical candidates (words) receive activation to the degree they match the input.
- Parallel processing: multiple items are active in parallel.
- Competition: items compete with each other for recognition.
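The four properties listed for current models can be illustrated with a toy sketch. It is not any specific published model: the letter-by-letter match score, the normalization used as a stand-in for competition, and the five-word lexicon are all assumptions made for illustration.

```python
# Toy lexicon (hypothetical); letters stand in for phonemes.
LEXICON = ["beach", "butter", "bump", "putter", "dog"]

def match_score(word, heard):
    """Fraction of the input heard so far that the word matches, position by position."""
    if not heard:
        return 0.0
    hits = sum(1 for w, h in zip(word, heard) if w == h)
    return hits / len(heard)

def activations(heard, lexicon=LEXICON):
    """All candidates are scored in parallel; normalizing makes them
    compete for a fixed amount of activation (an assumed competition rule)."""
    scores = {w: match_score(w, heard) for w in lexicon}
    total = sum(scores.values())
    if total == 0:
        return {w: 0.0 for w in lexicon}
    return {w: s / total for w, s in scores.items()}

# Immediacy: activations are computed from the first segment onward.
for t in range(1, len("butter") + 1):
    heard = "butter"[:t]
    print(heard, {w: round(a, 2) for w, a in activations(heard).items()})
```

Early in the word, all b-initial candidates share activation equally; as more of the input arrives, the best-matching word pulls ahead while partial matches stay active.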

12
(Figure: activation over time of the candidates beach, butter, bump, putter and dog as the input "b... u tt e r" unfolds.)
13
These processes have been well defined for a phonemic representation of the input.
(Figure: the input as a sequence of phonemes.)
But there is considerably less ambiguity if we consider subphonemic information.
Example: subphonemic effects of motor processes.
14
Coarticulation
Any action reflects future actions as it unfolds.
Example: coarticulation. Articulation (lips, tongue) reflects current, future and past events.
Subtle subphonemic variation in speech reflects temporal organization.
Sensitivity to these perceptual details might
yield earlier disambiguation.
15
These processes have largely been ignored because of a history of evidence that perceptual variability gets discarded.
Example: categorical perception.
16
Categorical Perception
Subphonemic variation in voice onset time (VOT) is discarded in favor of a discrete symbol (phoneme).
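A minimal sketch of the contrast at stake: a strictly categorical mapping keeps only the label, while a gradient mapping keeps graded support that varies within a category. The 20 ms boundary, the logistic form and the slope are illustrative assumptions, not values from the slides.

```python
import math

def categorical(vot_ms, boundary=20.0):
    """Strong categorical perception: only the phoneme label survives."""
    return "p" if vot_ms >= boundary else "b"

def gradient(vot_ms, boundary=20.0, slope=0.3):
    """Gradient alternative: support for /p/ as a logistic function of VOT
    (functional form and slope are assumed)."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary)))

for vot in range(0, 45, 5):
    print(vot, categorical(vot), round(gradient(vot), 3))
```

Under the categorical mapping, 0 ms and 15 ms are indistinguishable (both "b"); under the gradient mapping they differ, which is the kind of within-category difference the experiments look for.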
17
Evidence against the strong form of Categorical Perception from psychophysical-type tasks:
Discrimination tasks: Pisoni & Tash (1974); Pisoni & Lazarus (1974); Carney, Widin & Viemeister (1977)
Training: Samuel (1977); Pisoni, Aslin, Perey & Hennessy (1982)
Goodness ratings: Miller (1997); Massaro & Cohen (1983)
18
Experiment 1
Does within-category acoustic detail
systematically affect higher level language? Is
there a gradient effect of subphonemic detail on
lexical activation?
19
McMurray, Aslin & Tanenhaus (2002)
A gradient relationship would yield systematic effects of subphonemic information on lexical activation.
If this gradiency is useful for temporal integration, it must be preserved over time.
Need a design sensitive to both acoustic detail and detailed temporal dynamics of lexical activation.
20
Acoustic Detail
Use a speech continuum: more steps yield a better picture of the acoustic mapping.
KlattWorks: generate synthetic continua from natural speech.
9-step VOT continua (0-40 ms).
6 pairs of words: beach/peach, bale/pale, bear/pear, bump/pump, bomb/palm, butter/putter.
6 fillers: lamp, leg, lock, ladder, lip, leaf, shark, shell, shoe, ship, sheep, shirt.
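As a small worked example of the stimulus set described above, the sketch below enumerates the continuum steps and the pair-by-step combinations. Equal spacing between steps is an assumption; the slide gives only the range and the number of steps.

```python
WORD_PAIRS = [
    ("beach", "peach"), ("bale", "pale"), ("bear", "pear"),
    ("bump", "pump"), ("bomb", "palm"), ("butter", "putter"),
]

def vot_continuum(start_ms=0.0, end_ms=40.0, steps=9):
    """Equally spaced VOT values from start to end, inclusive (spacing assumed)."""
    step = (end_ms - start_ms) / (steps - 1)
    return [start_ms + i * step for i in range(steps)]

# One experimental item per word pair and continuum step.
stimuli = [(b, p, vot) for (b, p) in WORD_PAIRS for vot in vot_continuum()]

print(vot_continuum())
print(len(stimuli))
```

With these assumptions the steps fall 5 ms apart, and the six pairs yield 54 distinct experimental items.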
21
(No Transcript)
22
Temporal Dynamics
How do we tap on-line recognition? With an on-line task: eye movements.
Subjects hear spoken language and manipulate objects in a visual world.
The visual world includes a set of objects with interesting linguistic properties: a beach, a peach and some unrelated items.
Eye movements to each object are monitored throughout the task.
(Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995)
23
Why use eye-movements and visual world paradigm?

Relatively natural task.
Eye movements are generated very fast (within 200 ms of the first bit of information).
Eye movements are time-locked to speech.
Subjects aren't aware of eye movements.
Fixation probability maps onto lexical activation.

24
Task
A moment to view the items
25
(No Transcript)
26
Task
Bear
Repeat 1080 times
27
Identification Results
High agreement across subjects and items for
category boundary.
(Figure: proportion of /p/ responses as a function of VOT (ms), from B to P.)
By subject: 17.25 +/- 1.33 ms. By item: 17.24 +/- 1.24 ms.
28
Task
Target: bear. Competitor: pear. Unrelated: lamp, ship.
29
Task
30
Task

Given that the subject heard "bear" and clicked on "bear", how often was the subject looking at the pear?
(Figure: two predicted patterns of target and competitor fixations, labeled Categorical Results and Gradient Effect.)
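The conditional question on this slide (looks to the competitor, given the response) can be sketched as a filter-then-average step. The trial records and their field names are hypothetical and the numbers are made up for illustration; they are not data from the study.

```python
from collections import defaultdict

# Hypothetical trials: VOT step (ms), clicked response, and the proportion
# of the trial spent fixating the competitor picture.
trials = [
    {"vot": 0, "response": "b", "competitor_fix": 0.04},
    {"vot": 0, "response": "b", "competitor_fix": 0.06},
    {"vot": 10, "response": "b", "competitor_fix": 0.09},
    {"vot": 10, "response": "p", "competitor_fix": 0.30},
    {"vot": 15, "response": "b", "competitor_fix": 0.12},
    {"vot": 15, "response": "b", "competitor_fix": 0.14},
]

def competitor_looks(trials, response):
    """Mean competitor fixation per VOT, keeping only trials with the given response."""
    by_vot = defaultdict(list)
    for t in trials:
        if t["response"] == response:
            by_vot[t["vot"]].append(t["competitor_fix"])
    return {vot: sum(v) / len(v) for vot, v in sorted(by_vot.items())}

print(competitor_looks(trials, "b"))
```

Conditioning on the response keeps trials that ended in the other category from masquerading as a within-category effect: a categorical account predicts a flat function here, a gradient account predicts a rise toward the boundary.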
31
Results
(Figure: competitor fixations over time since word onset (ms), by response and VOT step.)
Long-lasting gradient effect seen throughout the
timecourse of processing.
32
(Figure: competitor fixations as a function of VOT (ms), plotted by response, with the category boundary marked.)
34
Summary
Subphonemic acoustic differences in VOT have
gradient effect on lexical activation.

Gradient effect of VOT on looks to the
competitor.

Effect holds even for unambiguous stimuli.

Seems to be long-lasting.

Consistent with a growing body of work using priming (Andruski, Blumstein & Burton, 1994; Utman, Blumstein & Burton, 2000; Gow, 2001, 2002).
35
The Proposed Framework
Sensitivity & Use

1) Word recognition is systematically sensitive to subphonemic acoustic detail.

2) Acoustic detail is represented as gradations in activation across the lexicon.

3) This sensitivity enables the system to take advantage of subphonemic regularities for temporal integration.

4) This has fundamental consequences for development: learning phonological organization.
36
Lexical Sensitivity

1) Word recognition is systematically sensitive to subphonemic acoustic detail.

Voicing
Laterality, Manner, Place
Natural Speech
X Metalinguistic Tasks

38
Lexical Sensitivity
(Figure: competitor fixations as a function of VOT (ms), 0 to 40 ms in 5 ms steps, plotted by response, with the category boundary marked.)
39
Lexical Sensitivity

1) Word recognition is systematically sensitive to subphonemic acoustic detail.

Voicing
Laterality, Manner, Place
Natural Speech
X Metalinguistic Tasks
? Non minimal pairs
? Duration of effect
(experiment 1)

40
2) Acoustic detail is represented as gradations
in activation across the lexicon.
(Figure: lexical activation over time for the input "b... u m p".)
41
Temporal Integration

3) This sensitivity enables the system to take advantage of subphonemic regularities for temporal integration.

Regressive ambiguity resolution (exp 1):
- Ambiguity retained until more information arrives.
Progressive expectation building (exp 2):
- Phonetic distinctions are spread over time.
- Anticipate upcoming material.

42
Development
4) Consequences for development: learning phonological organization.

Learning a language:
- Integrating input across many utterances to build a long-term representation.
- Sensitivity to subphonemic detail (exp 4 & 5).
- Allows statistical learning of categories (model).

43
Experiment 2
44
Misperception
What if the initial portion of a stimulus was misperceived?
- Competitor still active: easy to activate it the rest of the way.
- Competitor completely inactive: the system will garden-path.
P(misperception) depends on distance from the boundary.
Gradient activation allows the system to hedge its bets.
45
/beɪrəkeɪd/ vs. /peɪrəkit/
barricade vs. parakeet
(Figure: activation of parakeet and barricade in a categorical lexicon over time, for the input "p/b eɪ r ə k i t".)
46
Methods
10 Pairs of b/p items.
47
(No Transcript)
48
Eye Movement Results
Barricade → Parricade
(Figure: fixations to target (0 to 1) over time (0 to 900 ms), by VOT.)
Faster activation of target as VOTs near the lexical endpoint, even within the non-word range.
50
Experiment 2 Conclusions
Gradient effect of within-category variation without minimal pairs.

Gradient effect is long-lasting: mean POD 240 ms.
Regressive ambiguity resolution:
- Subphonemic gradations maintained until more information arrives.
- Subphonemic gradation can improve (or hinder) recovery from garden path.

51
Progressive Expectation Formation

Can within-category detail be used to predict
future acoustic/phonetic events?
Yes: phonological regularities create systematic within-category variation.
- Predicts future events.

52
Experiment 3: Anticipation
Word-final coronal consonants (n, t, d) assimilate the place of the following segment.
Maroong Goose
Maroon Duck
Place assimilation → ambiguous segments anticipate upcoming material.
53
Subject hears:
- "select the maroon duck"
- "select the maroon goose"
- "select the maroong goose"
- "select the maroong duck"
54
Results
Anticipatory effect on looks to non-coronal.
55
(Figure: looks to duck as a function of time; fixation proportion (0 to 0.3) over 0 to 600 ms for assimilated vs. non-assimilated tokens, with the onset of "goose" plus oculomotor delay marked.)
Inhibitory effect on looks to coronal (duck, p = .024).
56

Sensitivity to subphonemic detail:
- Increases priors on likely upcoming events.
- Decreases priors on unlikely upcoming events.
- Active temporal integration process.

Occasionally assimilation creates ambiguity:
- Resolves prior ambiguity: "mudg drinker".
- Similar to experiment 2.
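One way to read "increase priors on likely upcoming events" is as a Bayesian update on the cue carried by the assimilated segment. The sketch below takes that reading as an assumption; the slides do not commit to a formal model, and the likelihood values are hypothetical.

```python
def update_priors(priors, likelihoods):
    """Bayes' rule: posterior is proportional to prior times likelihood of the cue."""
    unnorm = {k: priors[k] * likelihoods[k] for k in priors}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}

# Equal expectations for the two pictures before the adjective is heard.
priors = {"goose": 0.5, "duck": 0.5}

# Hypothetical likelihood of hearing an assimilated "maroong" before each noun:
# velar assimilation is more likely when a velar-initial word follows.
likelihood_assimilated = {"goose": 0.8, "duck": 0.2}

posterior = update_priors(priors, likelihood_assimilated)
print(posterior)
```

The update raises the expectation for the non-coronal continuation and lowers it for the coronal one, matching the direction of the anticipatory and inhibitory effects reported for Experiment 3.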

57
Adult Summary

Lexical activation is exquisitely sensitive to
within-category detail.
This sensitivity is useful to integrate material over time:
- Regressive ambiguity resolution.
- Progressive facilitation.
- Taking advantage of phonological and lexical regularities.

58
Development
Historically, work in speech perception has been linked to development. Sensitivity to subphonemic detail must revise our view of development.
Use: infants face additional temporal integration problems.
- No lexicon available to clean up noisy input: rely on acoustic regularities.
- Extracting a phonology from the series of utterances.
59
Sensitivity to subphonemic detail: for 30 years, virtually all attempts to address this question have yielded categorical discrimination (e.g. Eimas, Siqueland, Jusczyk & Vigorito, 1971).

Exception: Miller & Eimas (1996).
- Only at extreme VOTs.
- Only when habituated to a non-prototypical token.

60
Use?
Nonetheless, infants possess abilities that would
require within-category sensitivity.

Infants can use allophonic differences at word boundaries for segmentation (Jusczyk, Hohne & Bauman, 1999; Hohne & Jusczyk, 1994).

Infants can learn phonetic categories from distributional statistics (Maye, Werker & Gerken, 2002; Maye & Weiss, 2004).

61
Statistical Category Learning
Speech production causes clustering along
contrastive phonetic dimensions.
E.g. Voicing / Voice Onset Time: B, VOT near 0; P, VOT near 40.
62
To statistically learn speech categories, infants
must

This requires ability to track specific VOTs.

63
Experiment 4
Why no demonstrations of sensitivity?

Habituation
Discrimination not ID.
Possible selective adaptation.
Possible attenuation of sensitivity.

Synthetic speech
Not ideal for infants.

Single exemplar/continuum
Not necessarily a category representation

Experiment 4: reassess the issue with improved methods.
64
HTPP

Head-Turn Preference Procedure
(Jusczyk & Aslin, 1995)
Infants exposed to a chunk of language:
- Words in running speech.
- Stream of continuous speech (à la statistical learning paradigm).
Word list.

Memory for exposed items (or abstractions)
assessed
Compare listening time between consistent and
inconsistent items.

65
Test trials start with all lights off.
66
Center Light blinks.
67
Brings infant's attention to center.
68
One of the side-lights blinks.
69
When infant looks at side-light he hears a word
70
as long as he keeps looking.
71
Methods
7.5-month-old infants exposed to either 4 b-words or 4 p-words. 80 repetitions total.
Infants form a category of the exposed class of words.
72
Stimuli constructed by cross-splicing naturally
produced tokens of each end point.
73
Novelty or Familiarity?
Novelty/Familiarity preference varies across
infants and experiments.
We're only interested in the middle stimuli (b, p).
Infants were classified as novelty or familiarity preferring by performance on the endpoints.
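The endpoint-based classification can be sketched as a simple comparison of listening times. The exact criterion used in the study is not given on the slide, so the rule below (longer listening to the unexposed endpoint means novelty) and the example times are assumptions.

```python
def classify_preference(exposed_endpoint_ms, unexposed_endpoint_ms):
    """Label an infant by which endpoint drew longer listening times.
    Ties fall to 'familiarity' here, an arbitrary choice for the sketch."""
    if unexposed_endpoint_ms > exposed_endpoint_ms:
        return "novelty"
    return "familiarity"

# Hypothetical listening times (ms) for two infants exposed to b-words,
# so "bear" is the exposed endpoint and "pear" the unexposed one.
infants = {
    "infant_A": {"bear": 5200.0, "pear": 8100.0},
    "infant_B": {"bear": 7900.0, "pear": 5600.0},
}

labels = {name: classify_preference(t["bear"], t["pear"]) for name, t in infants.items()}
print(labels)
```

Classifying on the endpoints only keeps the middle stimuli out of the grouping decision, so they can serve as the test of within-category sensitivity.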
74

After being exposed to bear, beach, bail, bomb: infants who show a novelty effect will look longer for pear than bear.
What about in between?
(Figure: listening time for Bear, Bear and Pear.)
75
Results
Novelty infants (B: 36, P: 21)
(Figure: listening time (ms), 4000 to 10000, for Target, Target and Competitor, by exposure group (B or P).)
Target vs. Target: p < .001. Competitor vs. Target: p = .017.
76
Familiarity infants (B: 16, P: 12)
Target vs. Target: p = .003. Competitor vs. Target: p = .012.
77
Infants exposed to /p/
Novelty: N = 21
78
Infants exposed to /b/
79
Experiment 4 Conclusions
Contrary to all previous work

7.5 month old infants show gradient sensitivity
to subphonemic detail.
Clear effect for /p/
Effect attenuated for /b/.

80
Reduced effect for /b/ But
81

Bear ? Pear

Category boundary lies between Bear and Bear
- Between 3 ms and 11 ms?

Within-category sensitivity in a different range?

82
Experiment 5
Same design as experiment 3. VOTs shifted away from hypothesized boundary.
Train and test items by VOT:
-9.7 ms: Bomb, Bear, Beach, Bale
3.6 ms: Bomb, Bear, Beach, Bale
40.7 ms: Palm, Pear, Peach, Pail
83
Familiarity infants (34 infants)
(Figure: listening time (ms), 4000 to 9000, for B-, B and P; comparisons marked at .01 and .05.)
84
Novelty infants (25 infants)
(Figure: listening time (ms), 4000 to 9000, for B-, B and P; comparisons marked at .002 and .02.)
85
Experiment 5 Conclusions

Within-category sensitivity in /b/ as well as /p/.

Shifted category boundary in /b/ not consistent
with adult boundary (or prior infant work). Why?

86
/b/ results consistent with (at least) two
mappings.
1) Shifted boundary
(Figure: category mapping strength for /b/ and /p/ as a function of VOT.)

- Inconsistent with prior literature.
- Why would infants have this boundary?

87
HTPP is a one-alternative task. It asks "B or not-B", not "B or P".
Hypothesis: sparse categories are a by-product of efficient learning.
88
Computational Model
Distributional learning model

1) Model the distribution of tokens as a mixture of Gaussian distributions over a phonetic dimension (e.g. VOT).

2) After receiving an input, the Gaussian with the highest posterior probability is the category.
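A minimal sketch of the categorization step just described: each category is a Gaussian over VOT, and an input is assigned to the Gaussian with the highest posterior probability. The parameter values (and the use of `phi` for a mixing weight) are illustrative assumptions, not the model's fitted values.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical categories: (mixing weight phi, mean in ms, sd in ms).
CATEGORIES = {"b": (0.5, 0.0, 8.0), "p": (0.5, 40.0, 10.0)}

def posteriors(vot, categories=CATEGORIES):
    """Posterior probability of each category given one VOT value."""
    joint = {k: phi * gaussian_pdf(vot, mu, sigma)
             for k, (phi, mu, sigma) in categories.items()}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}

def categorize(vot, categories=CATEGORIES):
    """Pick the category whose Gaussian has the highest posterior."""
    post = posteriors(vot, categories)
    return max(post, key=post.get)

for vot in (0, 10, 20, 30, 40):
    print(vot, categorize(vot), {k: round(v, 3) for k, v in posteriors(vot).items()})
```

The posterior itself is graded even though the final choice is discrete, so the same representation supports both a category decision and within-category sensitivity.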
89
Statistical Category Learning
1) Start with a set of randomly selected Gaussians.

2) After each input, adjust each parameter to find the best description of the input.

3) Start with more Gaussians than necessary: the model doesn't innately know how many categories there are.
? → 0 for unneeded categories.
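The learning procedure can be sketched as an online update in which each input nudges every Gaussian in proportion to its responsibility for that input. This is a generic incremental mixture update chosen for illustration; the slides do not give the actual update rule, and the learning rate, starting values and synthetic inputs are all assumptions.

```python
import math
import random

def pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def train(inputs, k=5, start_sigma=3.0, rate=0.02, seed=0):
    """Online mixture learning with more Gaussians (k) than true categories."""
    rng = random.Random(seed)
    mus = [rng.uniform(-20, 60) for _ in range(k)]   # randomly selected starting means
    sigmas = [start_sigma] * k
    phis = [1.0 / k] * k                             # mixing weights, sum to 1
    for x in inputs:
        joint = [p * pdf(x, m, s) for p, m, s in zip(phis, mus, sigmas)]
        total = sum(joint)
        if total == 0.0:
            continue  # input falls where no category has any density
        resp = [j / total for j in joint]            # posterior responsibility
        for i, r in enumerate(resp):
            mus[i] += rate * r * (x - mus[i])
            var = sigmas[i] ** 2 + rate * r * ((x - mus[i]) ** 2 - sigmas[i] ** 2)
            sigmas[i] = math.sqrt(max(var, 1e-6))
            phis[i] += rate * (r - phis[i])          # keeps the weights summing to 1
    return mus, sigmas, phis

# Synthetic bimodal VOT input (hypothetical: clusters near 0 and 40 ms).
data_rng = random.Random(1)
inputs = [data_rng.gauss(0, 5) if data_rng.random() < 0.5 else data_rng.gauss(40, 8)
          for _ in range(5000)]

mus, sigmas, phis = train(inputs)
for m, s, p in sorted(zip(mus, sigmas, phis)):
    print(round(m, 1), round(s, 1), round(p, 3))
```

In a scheme like this, a Gaussian that rarely takes responsibility for any input sees its weight decay toward zero, which is one way surplus categories can drop out without the number of categories being fixed in advance.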

90
(No Transcript)
91

Overgeneralization:
- large σ
- costly: lose phonetic distinctions

Undergeneralization:
- small σ
- not as costly: maintain distinctiveness.

To increase the likelihood of successful learning:
- err on the side of caution.
- start with small σ

94
Sparseness coefficient: proportion of space not strongly mapped to any category.
(Figure: category mappings along VOT.)
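One possible operationalization of a sparseness coefficient is the fraction of a VOT grid on which no category's density exceeds a threshold. The slide does not define the measure precisely, so the grid range, the threshold and the example categories are assumptions.

```python
import math

def pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def sparseness(categories, lo=-20.0, hi=60.0, step=1.0, threshold=0.01):
    """Fraction of grid points where every category's density is below threshold.
    categories is a list of (mean, sd) pairs in ms."""
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n)]
    unmapped = sum(
        1 for x in grid
        if max(pdf(x, mu, sigma) for mu, sigma in categories) < threshold
    )
    return unmapped / len(grid)

narrow = [(0.0, 3.0), (40.0, 3.0)]    # small sigma: tight categories
wide = [(0.0, 15.0), (40.0, 15.0)]    # large sigma: broad categories

print(round(sparseness(narrow), 3), round(sparseness(wide), 3))
```

Narrow categories leave most of the dimension unmapped, while broad ones cover it; this is the trade-off between sparseness and overgeneralization that the starting σ controls.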
95
Start with large σ
(Figure: categories along VOT, and average sparsity coefficient (0 to 0.4) over training epochs (0 to 12000), by starting σ.)
96
Intermediate starting σ
(Figure: categories along VOT, and average sparsity coefficient (0 to 0.4) over training epochs (0 to 12000), by starting σ.)
97
Limitations

Occasionally the model leaves sparse regions at the end of learning.
- Competition/choice framework: additional competition or selection mechanisms during processing allow categorization despite incomplete information.

Multi-dimensional categories:
- 1-D: 3 parameters / category
- 2-D: 6
- 3-D: 13
- 4-D: 15
- Cue/model-reliability may reduce dimensionality.
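The growth in parameters with dimensionality can be made concrete under one common counting: a full-covariance Gaussian in d dimensions has d means and d(d+1)/2 unique covariance entries, plus one mixing weight. That counting is an assumption about how the slide's numbers were derived.

```python
def params_per_category(d):
    """Means (d) + unique covariance entries (d*(d+1)/2) + one mixing weight."""
    return d + d * (d + 1) // 2 + 1

for d in range(1, 5):
    print(f"{d}-D: {params_per_category(d)} parameters / category")
```

This counting reproduces the 1-D, 2-D and 4-D figures listed on the slide; for 3-D it gives 10, so the slide's 3-D entry may follow a different convention. Either way the count grows quadratically, which is why reducing dimensionality matters.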

98
Non-parametric approach?

Not constrained by a particular equation: can fill space better.

Similar properties in terms of starting σ and sparseness.

99
Model Conclusions
To avoid overgeneralization, it is better to start with small estimates for σ.
Small or even medium starting σs lead to sparse category structure during infancy: much of phonetic space is unmapped.
Sparse categories: similar temporal integration to exp 2. Retain ambiguity (and partial representations) until more input is available.
100
AEM Paradigm
Examination of sparseness/completeness of
categories needs a two alternative task.

Also useful with
Color
Shape
Spatial Frequency
Faces

Quicktime Demo
101
Experiment 6
Anticipatory Eye Movements
Train: Bear (VOT 0): left; Pail (VOT 35): right.
Test: Bear 0, Pear 40; Bear 5, Pear 35; Bear 10, Pear 30; Bear 15, Pear 25.
Same naturally-produced tokens from Exps 4 & 5.
102
Expected results
(Figure: expected performance for Bear and Pail tokens as a function of VOT, with the adult boundary and unmapped space marked.)
103
Results
Correct: 67%. 9 / 16 better than chance.
(Figure: performance on training tokens.)
104
Infant Summary
Infants show graded sensitivity to subphonemic
detail.

/b/ results: regions of unmapped phonetic space.
Statistical approach provides support for sparseness.
- Given current learning theories, sparseness results from optimal starting parameters.
Empirical test will require a two-alternative task.
- AEM: train infants to make eye movements in response to stimulus identity.

105
Conclusions
Infants and adults are sensitive to subphonemic detail.
Sensitivity is important to adult and developing word recognition systems:
1) Short-term cue integration.
2) Long-term phonology learning.
In both cases:
- Partially ambiguous material is retained until more data arrives.
- Partially active representations anticipate the likelihood of future material.
106
Conclusions
Spoken language is defined by change. But the information to cope with it is in the signal, if we look online.
Within-category acoustic variation is signal, not noise.
108
(No Transcript)
109
Misperception: Additional Results
110

10 pairs of b/p items.
0-35 ms VOT continua.

20 filler items (lemonade, restaurant, saxophone).
Option to click X (Mispronounced).
26 subjects. 1240 trials over two days.
111
Identification Results
Significant target responses even at extreme. Graded effects of VOT on correct response rate.
(Figure: response rate (0 to 1) for Voiced, Voiceless and NW responses as a function of VOT (0 to 35 ms), from Barricade to Parricade.)
112
Phonetic Garden-Path
Garden-path effect: difference between looks to each target (b vs. p) at the same VOT.
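The garden-path measure defined on this slide is a per-VOT difference, which can be sketched directly. The fixation proportions are made up for illustration, and the sign convention (b-word minus p-word) is an assumption.

```python
# Hypothetical mean target-fixation proportions by VOT (ms); values are invented.
looks_to_b_target = {0: 0.80, 10: 0.72, 20: 0.60, 30: 0.45}
looks_to_p_target = {0: 0.40, 10: 0.52, 20: 0.63, 30: 0.78}

def garden_path_effect(looks_b, looks_p):
    """Difference in target looks (b-word minus p-word) at each shared VOT."""
    shared = sorted(set(looks_b) & set(looks_p))
    return {vot: looks_b[vot] - looks_p[vot] for vot in shared}

effect = garden_path_effect(looks_to_b_target, looks_to_p_target)
for vot, diff in effect.items():
    print(vot, round(diff, 2))
```

Holding VOT constant isolates the cost of being led toward the wrong word: the same acoustic onset helps one target and hinders the other.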
113
GP effect: gradient effect of VOT.
Target: p < .0001. Competitor: p < .0001.
(Figure panels: Target, Competitor.)
114
Assimilation: Additional Results
115
runm picks runm takes
116
Exp 3 & 4 Conclusions

Within-category detail is used in recovering from assimilation: temporal integration.
- Anticipate upcoming material.
- Bias activations based on context.
- Like Exp 2: within-category detail retained to resolve ambiguity.
Phonological variation is a source of information.

117
Subject hears:
- "select the mud drinker"
- "select the mudg gear"
- "select the mudg drinker"
Critical pair
118
(Figure: fixation proportion (0 to 0.45) over time (0 to 2000 ms), with the onset of "gear" and the average offset of "gear" (402 ms) marked.)
Mudg Gear is initially ambiguous with a late bias
towards Mud.
119
Mudg Drinker is also ambiguous with a late bias
towards Mug (the /g/ has to come from
somewhere).
120
(No Transcript)
