Title: Connectionist Time and Dynamic Systems Time in One Architecture? Modeling Word Learning at Two Timescales
Jessica S. Horst (jessica-horst_at_uiowa.edu), Bob McMurray, and Larissa K. Samuelson
Department of Psychology, University of Iowa
Two Time Scales in Neural Networks
- Connectionist and dynamical systems accounts
  - both stress change over time
  - complement each other in timescale
    - Dynamic systems: online processes
    - Connectionist networks: long-term learning
- Many domains of development require both timescales
  - Example: language development requires
    - sensitivity to the brief, sequential nature of the input
    - slower developmental processes
Two Time Scales in Language Acquisition
Word learning is often attributed to fast mapping: a quick link between a novel name and a novel object (e.g., Carey, 1978). But recent empirical data suggest that fast mapping and word learning may represent two distinct time scales (Horst & Samuelson, 2005).
- Fast mapping: a quick process emerging in the moment
- Word learning: a gradual process over the course of development
We capture both timescales in a recurrent network.
The Architecture (McMurray & Spivey, 2000)
[Figure: auditory inputs and visual inputs connected to a layer of decision (hidden) units; initial state, before learning]
- Activation feeds from the input layers to the decision layer.
- Decision units compete via inhibition.
- Activation feeds back to the input layers.
- The cycle continues until the system settles.
- Unsupervised Hebbian learning occurs on every cycle.
- Online decision dynamics reflect auditory and visual competitors.
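The settling loop described above can be sketched in Python. This is a minimal illustration, not the McMurray & Spivey (2000) implementation: the leaky update rule, the lateral-inhibition and feedback gains, the learning rate, the clipping of activations and weights to [0, 1], and the helper names `settle` and `clip` are all assumptions made for the example, and the toy demo uses far fewer units than the model.

```python
import random


def clip(x, lo=0.0, hi=1.0):
    """Keep an activation or weight inside [lo, hi] (assumed bound)."""
    return max(lo, min(hi, x))


def settle(ext_a, ext_v, w_a, w_v, *, rate=0.1, inhibition=0.3,
           feedback=0.2, lr=0.0, tol=1e-4, max_cycles=500):
    """Run one trial until the decision units stop changing.

    w_a[i][j] links auditory unit i to decision unit j (w_v likewise for
    visual units). The same weights carry activation back to the inputs.
    If lr > 0, a Hebbian update is applied in place on every cycle.
    Returns (decision activations, number of cycles run).
    """
    n_dec = len(w_a[0])
    a, v = list(ext_a), list(ext_v)
    d = [0.0] * n_dec
    for cycle in range(1, max_cycles + 1):
        total = sum(d)
        new_d = []
        for j in range(n_dec):
            # Feedforward: input layers -> decision layer.
            bottom_up = (sum(a[i] * w_a[i][j] for i in range(len(a)))
                         + sum(v[k] * w_v[k][j] for k in range(len(v))))
            # Competition: every other decision unit inhibits unit j.
            net = bottom_up - inhibition * (total - d[j])
            new_d.append(clip(d[j] + rate * (net - d[j])))
        change = max(abs(new - old) for new, old in zip(new_d, d))
        d = new_d
        # Feedback: decision units re-excite the input units linked to them.
        a = [clip(ext_a[i] + feedback * sum(w_a[i][j] * d[j] for j in range(n_dec)))
             for i in range(len(a))]
        v = [clip(ext_v[k] + feedback * sum(w_v[k][j] * d[j] for j in range(n_dec)))
             for k in range(len(v))]
        # Unsupervised Hebbian learning: strengthen links between co-active units.
        if lr > 0:
            for w, x in ((w_a, a), (w_v, v)):
                for i in range(len(x)):
                    for j in range(n_dec):
                        w[i][j] = clip(w[i][j] + lr * x[i] * d[j])
        if change < tol:  # decision layer has settled
            return d, cycle
    return d, max_cycles


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy net: 3 auditory, 3 visual, 4 decision units (not the 15/15/90 model).
    w_a = [[rng.uniform(0, 0.1) for _ in range(4)] for _ in range(3)]
    w_v = [[rng.uniform(0, 0.1) for _ in range(4)] for _ in range(3)]
    w_a[0][0] = w_v[0][0] = 0.8  # name 0 and object 0 already linked to unit 0
    d, cycles = settle([1, 0, 0], [1, 0, 0], w_a, w_v, lr=0.01)
    print(cycles, [round(x, 2) for x in d])
```

In the toy run, name 0 and object 0 are pre-linked to decision unit 0, so that unit wins the competition and drives the others to zero; with `lr > 0` the links onto the active units are also strengthened a little on every cycle, which is the "learning on every cycle" bullet in miniature.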
The Model
- 15 auditory and 15 visual units
- 90 decision units
- Names presented singly with a variable number of objects
- Name-decision and object-decision associations strengthened via learning
- After 4000 training trials, the network forms localist representations
- Learns name-object links and learns to ignore visual competitors
[Figure: connection strengths, end state after learning]
Two Time Scales
- Fast: moment by moment
  - Online information integration and constraint satisfaction (e.g., McClelland & Elman, 1986; Dell, 1986)
  - Settles into a stable pattern of activation driven by the auditory and visual inputs and by stored knowledge (weights)
  - Model makes correct name-object links based on the current input
- Slow: over the long term
  - Unsupervised Hebbian learning
  - Associates words with visual targets
  - Learns to ignore visual competitors
Dependent Time Scales
- The two time scales are not independent
- Long-term learning depends critically on the dynamics of the fast time scale
  - Competition between decision units ensures pseudo-localist representations, which are critical for Hebbian learning (e.g., Rumelhart & Zipser, 1986)
- Learning occurs on each cycle
  - Influences processing cycle by cycle and trial by trial
  - Accumulated learning across trials leads to learning on the long-term time scale (i.e., word learning)
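The dependence of slow learning on fast competition can be illustrated with a deliberately simplified sketch. It assumes that settled competition reduces to a single winning decision unit (a pseudo-localist code) and that each processing cycle applies a bounded Hebbian increment, Δw = lr · pre · post · (1 − w) with pre = post = 1, only to the links onto that winner, in the spirit of Rumelhart & Zipser's (1986) competitive learning rather than the poster's exact rule. The learning rate, cycles per trial, and the names `competition_winner`, `naming_trial` and `link_strength` are hypothetical.

```python
def competition_winner(net):
    """Stand-in for settled competition: the single most active decision
    unit (a pseudo-localist code)."""
    return max(range(len(net)), key=lambda j: net[j])


def naming_trial(name, obj, w_name, w_obj, lr=0.005, cycles=10):
    """One trial: competition picks a winner, then a small bounded Hebbian
    increment (pre = post = 1) is applied on every processing cycle to the
    name->winner and object->winner links only. Returns the winner."""
    net = [w_name[name][j] + w_obj[obj][j] for j in range(len(w_name[name]))]
    j = competition_winner(net)
    for _ in range(cycles):
        w_name[name][j] += lr * (1.0 - w_name[name][j])
        w_obj[obj][j] += lr * (1.0 - w_obj[obj][j])
    return j


def link_strength(w0, trials, lr=0.005, cycles=10):
    """Closed form of the same update: each cycle shrinks the gap to 1 by a
    factor (1 - lr), so after trials * cycles cycles the gap is (1 - lr) ** n."""
    return 1.0 - (1.0 - w0) * (1.0 - lr) ** (trials * cycles)


if __name__ == "__main__":
    w_name = [[0.05, 0.04, 0.03]]  # 1 name unit x 3 decision units (toy sizes)
    w_obj = [[0.05, 0.04, 0.03]]   # 1 object unit x 3 decision units
    naming_trial(0, 0, w_name, w_obj)
    print("after 1 trial:", round(w_name[0][0], 3))
    for _ in range(99):
        naming_trial(0, 0, w_name, w_obj)
    print("after 100 trials:", round(w_name[0][0], 3))
```

With these made-up parameters a single trial moves the name-to-winner link from 0.05 to roughly 0.1, whereas 100 trials push it to about 0.99: one exposure leaves only a small bias, and it is the accumulation of cycle-by-cycle learning across trials that produces a strong link.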
Empirical Results
Fast Time Scale
- 24-month-old children
- Saw 2 familiar objects and 1 novel object
- Asked to get familiar and novel objects (e.g., "Get the cow!" or "Get the yok!")
- Children were excellent at fast mapping (finding the referent of novel and familiar words in the moment)
Slow Time Scale
After a 5-minute delay, children were asked to pick the referent of a newly fast-mapped name (e.g., "Get the yok!").
- Children were unable to retain the mappings after the 5-minute delay
Replication
- Initial findings replicated with simpler tasks
- Was the retention failure an effect of the number of names or trials?
  - No: children's difficulty in retaining newly fast-mapped names is not related to the number of names or trials

Replication 1 (N = 12): 1 novel name, 8 familiar names, 7 preference trials
  Fast mapping: 9/12; Retention: 4/9 (n.s.)
Replication 2 (N = 12): 1 novel name, 2 familiar names
  Fast mapping: 7/12; Retention: 4/7 (n.s.)
Binomial tests: p < .05; p < .01
Simulations
- 20 networks initialized with random weights
- 15-word lexicon (names and objects)
  - 5 familiar words
  - 5 novel words
  - 5 held out
- Trained on the 5 familiar items for 5000 epochs
  - Items presented in random order
- Run in the fast-mapping experiment
  - 10 fast-mapping trials (5 familiar, 5 novel)
  - 5 retention trials
  - Learning was not turned off during the experiment
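The simulation protocol above can be written down as a trial schedule. This sketch assumes that an epoch is one pass through the 5 familiar items, that the familiar/novel/held-out split is drawn at random for each network, and that the retention trials probe the 5 novel words; `present` is a hypothetical callback standing in for one settling-and-learning trial of the network, and it is called in every phase because learning was not turned off.

```python
import random

LEXICON_SIZE = 15  # 15 name-object pairs
N_NETWORKS = 20    # independent networks with different random starts
EPOCHS = 5000


def split_lexicon(rng):
    """Assign the 15 items to 5 familiar, 5 novel and 5 held-out words
    (random assignment per network is an assumption)."""
    items = list(range(LEXICON_SIZE))
    rng.shuffle(items)
    return items[:5], items[5:10], items[10:]


def training_order(familiar, epochs, rng):
    """Each epoch presents every familiar item once, in a fresh random order."""
    for _ in range(epochs):
        epoch = list(familiar)
        rng.shuffle(epoch)
        yield from epoch


def experiment_trials(familiar, novel, rng):
    """10 fast-mapping trials (5 familiar, 5 novel) in random order, then
    5 retention trials, assumed here to probe the 5 novel words."""
    fast_mapping = [("fast-map", w) for w in list(familiar) + list(novel)]
    rng.shuffle(fast_mapping)
    retention = [("retention", w) for w in novel]
    return fast_mapping + retention


def run_network(seed, present, epochs=EPOCHS):
    """Drive one network through training and the experiment.

    `present(phase, item)` is a hypothetical callback for one trial; the
    network keeps learning in every phase.
    """
    rng = random.Random(seed)
    familiar, novel, _held_out = split_lexicon(rng)
    for item in training_order(familiar, epochs, rng):
        present("train", item)
    return [present(phase, item)
            for phase, item in experiment_trials(familiar, novel, rng)]


if __name__ == "__main__":
    calls = []
    for seed in range(N_NETWORKS):
        # Short demo run: 10 epochs instead of 5000.
        run_network(seed, lambda phase, item: calls.append(phase), epochs=10)
    print(len(calls))
```

Keeping the schedule separate from the network makes it easy to check that every simulated network sees the same number and kinds of trials as the protocol specifies.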
How the Model Behaves
- Fast time scale
  - Model succeeded on both types of fast-mapping trials
  - Model behavior patterned with the empirical results
- Slow time scale
  - The model fails to retain the newly learned words after a delay
[Figure: model retention performance, with chance level marked]
How the Model Thinks
- Analyses of the weight matrices revealed that relatively little learning occurred during the test phase.
[Figure: change (RMS, squared deviations) in portions of the weight matrix for familiar words after learning, familiar words after test, novel words at end, and control words at end]
[Figure: temporal dynamics of processing]
[Figure: connection strengths prior to and after the experiment]
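The weight-matrix analysis can be approximated as a root-mean-square (RMS) change between two weight snapshots, for example before and after the test phase. The poster does not give the formula, so this sketch assumes the RMS is taken over every weight in the rows (input units) that belong to one word group, such as familiar, novel, or control words; the function names `rms_change` and `rms_by_group` and the row-to-word mapping are hypothetical.

```python
import math


def rms_change(before, after, rows):
    """Root-mean-square change of the weights in the selected rows.

    before/after are weight matrices given as lists of rows; `rows` picks
    the input units that belong to one group of words.
    """
    sq = [(after[r][c] - before[r][c]) ** 2
          for r in rows for c in range(len(before[r]))]
    return math.sqrt(sum(sq) / len(sq))


def rms_by_group(before, after, groups):
    """RMS change for each named group of rows, e.g. familiar/novel/control."""
    return {name: rms_change(before, after, rows) for name, rows in groups.items()}


if __name__ == "__main__":
    before = [[0.0, 0.0], [0.0, 0.0], [0.5, 0.5]]
    after = [[3.0, 4.0], [0.0, 0.0], [0.5, 0.5]]
    print(rms_by_group(before, after,
                       {"familiar": [0], "novel": [1], "control": [2]}))
```

Comparing groups this way separates where learning happened: a small RMS for the novel-word rows across the test phase corresponds to the finding that relatively little learning occurred during testing.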
Conclusions
- Two time scales captured in a single architecture
  - Fast: online fast mapping
  - Slow: long-term word learning
- The model replicated the empirical findings
  - Excellent word learning and fast mapping
  - Poor retention
- The model has sufficient knowledge to select the referent at a given moment, given auditory and visual input and stored knowledge (weights)
  - But not enough to subsequently know the word
Conclusions (continued)
- In-the-moment learning
  - Subtly biases behavior
  - Combined with activation dynamics, yields the correct response
  - Does not provide robust, context-independent word knowledge (in the short term)
- Continued training on fast-mapped words (i.e., 5000 epochs) makes them familiar words
  - Accumulation of this learning provides robust, context-independent word knowledge over development
Take-Home Messages
1) A fast-mapped word is not a known word, but a known word is known because it has been fast-mapped many, many times.
2) Understanding development requires models that integrate both short-term dynamic processes and long-term learning.
References
- Carey, S. (1978). The child as word learner. In M. Halle, J. Bresnan, & G. A. Miller (Eds.), Linguistic Theory and Psychological Reality (pp. 264-293). Cambridge, MA: MIT Press.
- Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93(3), 283-321.
- Horst, J. S., & Samuelson, L. K. (2005, April). Slow Down: Understanding the Time Course Behind Fast Mapping. Poster session presented at the 2005 Biennial Meeting of the Society for Research in Child Development, Atlanta, GA.
- McClelland, J., & Elman, J. (1986). The TRACE Model of Speech Perception. Cognitive Psychology, 18(1), 1-86.
- McMurray, B., & Spivey, M. (2000). The Categorical Perception of Consonants: The Interaction of Learning and Processing. Proceedings of the Chicago Linguistic Society, 34(2), 205-220.
- Rumelhart, D., & Zipser, D. (1986). Feature Discovery by Competitive Learning. In D. Rumelhart & J. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition (Vol. 1). Cambridge, MA: MIT Press.
Acknowledgements
The authors would like to thank Joseph Toscano
for programming assistance and support. This
work was supported by NICHD Grant R01-HD045713 to
LKS.