Title: Using auto-encoders to model early infant categorization: results, predictions and insights
Slide 1: Using auto-encoders to model early infant categorization: results, predictions and insights
Slide 2: Overview
- An odd categorization asymmetry was observed in 3-4-month-old infants.
- We explain this asymmetry using a connectionist auto-encoder model.
- Our model made a number of predictions, which turned out to be correct.
- We used a more neurobiologically plausible encoding for the stimuli.
- The model can now show how young infants' reduced visual acuity may actually help them do basic-level categorization.
Slide 3: Background on infant statistical category learning
- Quinn, Eimas, and Rosenkrantz (1993) noticed a rather surprising categorization asymmetry in 3-4-month-old infants:
- Infants familiarized on cats are surprised by novel dogs,
- BUT infants familiarized on dogs are bored by novel cats.
Slide 4: How their experiment worked
Familiarization phase: infants saw 6 pairs of pictures of animals from one category, say, cats (i.e., a total of 12 different animals).
Test phase: infants saw a pair consisting of a new cat and a new dog. Their gaze time was measured for each of the two novel animals.
Slide 5: Familiarization Trials
Slides 6-10: (image-only familiarization trials; no transcript)
Slide 11: Test phase
Compare looking times.
Slide 12: Results (Quinn et al., 1993): The categorization asymmetry
- Infants familiarized on cats look significantly longer at the novel dog in the test phase than at the novel cat.
- There was no significant difference for infants familiarized on dogs in the time they looked at a novel cat compared to a novel dog.
Slide 13: Our hypothesis
- We assume that infants are hard-wired to be sensitive to novelty (i.e., they look longer at novel objects than at familiar objects).
- Cats, on the whole, are less varied, and thus their distribution is included in that of Dogs.
- Thus, when infants have seen a number of cats, a dog is perceived as novel. But when they have seen a number of dogs, a new cat is perceived as just another dog.
Slide 14: Statistical distributions of patterns are what count
The infants are becoming sensitive to the statistical distributions of the patterns they are observing.
Slide 15: Consider the distribution of values of a particular characteristic for Cats and Dogs
- Note that the distribution for Cats is:
  - narrower than that of Dogs,
  - included in that of Dogs.
Slide 16: Suppose an infant has become familiarized with the distribution for Cats
And then sees a dog. Chances are the new stimulus will fall outside the familiarized range of values.
Slide 17: On the other hand, suppose an infant has become familiarized with the distribution for Dogs
And then sees a cat. Chances are the new stimulus will be inside the familiarized range of values.
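The inclusion argument sketched on the last few slides can be made concrete with a small simulation. This is an illustrative sketch only: the uniform ranges below are invented stand-ins for the real feature distributions.

```python
import random

random.seed(0)

# Hypothetical 1-D feature (say, ear length), with the inclusion
# relationship from the slides: the narrow Cat range lies entirely
# inside the broad Dog range. These ranges are invented for illustration.
CAT_RANGE = (0.40, 0.60)
DOG_RANGE = (0.10, 0.90)

def sample(rng, n):
    lo, hi = rng
    return [random.uniform(lo, hi) for _ in range(n)]

def outside(value, rng):
    lo, hi = rng
    return value < lo or value > hi

N = 10_000

# Familiarized on cats, then shown novel dogs: most dogs fall outside
# the familiarized Cat range, so they register as novel.
p_dog_novel = sum(outside(v, CAT_RANGE) for v in sample(DOG_RANGE, N)) / N

# Familiarized on dogs, then shown novel cats: every cat falls inside
# the familiarized Dog range, so it looks like "just another dog".
p_cat_novel = sum(outside(v, DOG_RANGE) for v in sample(CAT_RANGE, N)) / N

print(p_dog_novel, p_cat_novel)
```

With these ranges, roughly three-quarters of novel dogs fall outside the familiarized cat range, while no novel cat falls outside the dog range: the asymmetry in miniature.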
Slide 18: How could we model this asymmetry?
- We based our connectionist model on a model of infant categorization proposed by Sokolov (1963).
Slide 19: Sokolov's (1963) model
A stimulus in the environment is encoded into an internal representation.
Slides 20-28: Decode and Compare; Adjust
The internal representation is decoded and compared with the stimulus in the environment ("equal?"). If they do not match, the representation is adjusted, re-encoded, and compared again.
Slide 29: Continue looping
until the internal representation corresponds to the external stimulus.
Slide 30: Using an autoassociator to simulate the Sokolov model
A stimulus from the environment is presented on the network's input units.
Slides 31-44: encode; decode; compare; adjust weights
The network encodes the stimulus, decodes it, compares the decoded output with the input, and adjusts its weights; this cycle repeats with each presentation of the stimulus.
Slide 45: Continue looping
until the internal representation corresponds to the external stimulus.
Slide 46: Infant looking time ≈ network error
- In the Sokolov model, an infant continues to look at the image until the discrepancy between the image and the internal representation of the image drops below a certain threshold.
- In the auto-encoder model, the network continues to process the input until the discrepancy between the input and the (decoded) internal representation of the input drops below a certain (error) threshold.
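The looking-time/error correspondence can be written out as code. This is a minimal sketch, not the authors' implementation: a tiny linear auto-associator trained by a delta rule, with "looking time" read off as the number of weight updates needed before reconstruction error falls below threshold (the learning rate and threshold values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def looking_time(stimulus, W, lr=0.1, threshold=0.01, max_steps=10_000):
    """Train a linear auto-associator on one stimulus; return the number
    of update cycles before reconstruction error drops below threshold
    (the model's analogue of looking time), plus the updated weights."""
    for step in range(1, max_steps + 1):
        reconstruction = W @ stimulus            # decode the internal code
        error = stimulus - reconstruction        # compare with the stimulus
        if np.mean(error ** 2) < threshold:
            return step, W                       # "look away"
        W = W + lr * np.outer(error, stimulus)   # adjust the weights
    return max_steps, W

x = rng.random(10)                 # one 10-feature stimulus
steps, W_trained = looking_time(x, np.zeros((10, 10)))

# A second presentation of the now-familiar stimulus ends immediately:
# the model's analogue of an infant's boredom with a familiar image.
steps_again, _ = looking_time(x, W_trained)
print(steps, steps_again)
```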
Slide 47: Input to our model
We used a three-layer, 10-8-10, non-linear auto-encoder (i.e., a network that tries to reproduce on output what it sees on input) to model the data. The inputs were ten feature values, normalized between 0 and 1.0 across all of the images, taken from the original stimuli used by Quinn et al. (1993). They were: head length, head width, eye separation, ear separation, ear length, nose length, nose width, leg length, vertical extent, and horizontal extent. The distributions and, especially, the amount of inclusion of these features are shown in the following graphs.
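A network of the shape described here can be sketched as follows. The 10-8-10 architecture, the [0, 1] normalization, and the twelve familiarization animals come from the text; the sigmoid units, learning rate, and weight initialization are assumptions, and the random vectors stand in for the hand-measured feature values:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 10-8-10 non-linear auto-encoder. Initialization and learning rate
# are assumptions.
W1 = rng.normal(0.0, 0.5, (8, 10)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, (10, 8)); b2 = np.zeros(10)

def train_step(x, lr=0.5):
    """One backprop step on squared reconstruction error; returns the
    error for this pattern before the update."""
    global W1, b1, W2, b2
    h = sigmoid(W1 @ x + b1)                 # encode: 10 -> 8
    y = sigmoid(W2 @ h + b2)                 # decode: 8 -> 10
    dy = (y - x) * y * (1 - y)               # output-layer delta
    dh = (W2.T @ dy) * h * (1 - h)           # hidden-layer delta
    W2 -= lr * np.outer(dy, h); b2 -= lr * dy
    W1 -= lr * np.outer(dh, x); b1 -= lr * dh
    return float(np.mean((y - x) ** 2))

# Twelve "animals", ten feature values each, normalized to [0, 1].
cats = rng.random((12, 10))
errors = [float(np.mean([train_step(x) for x in cats])) for _ in range(200)]
print(errors[0], errors[-1])                 # error drops with familiarization
```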
Slide 48: Comparing the distributions of the input features
Slide 49: Results of Our Simulation
Slide 50: (figure only)
Slide 51: A prediction of the auto-encoder model
- If we were to reverse the inclusion relationship between Dogs and Cats, we should be able to reverse the asymmetry.
- We selected the new stimuli from dog- and cat-breeder books (and very slightly morphed some of these stimuli).
- We created a set of Cats and Dogs such that Cats now included Dogs, i.e., the Cat category was the broad category and the Dog category was the narrow category.
Slide 52: Reversing the inclusion relationship
Old distributions: Dogs include Cats. Reversed distributions: Cats include Dogs. (Graphs shown for eye separation and ear length.)
Slide 53: Results
Prediction by the model
3-4-month-old infant data
Slide 54: Removing the inclusion relationship: Another prediction from the model
- Our model also predicts that, regardless of the variance of each category, if we remove the inclusion relationship, we should eliminate the categorization asymmetry.
Slide 55: A new set of cat/dog stimuli was created in which there is no inclusion relationship
Slide 56: Prediction and empirical results: The categorization asymmetry disappears
Infant data
Prediction of the auto-encoder
Slide 57: A critique of our methodology: The use of explicit features
- We used explicit features (head length, leg length, ear separation, nose length, etc.) to characterize the animals (we hand-measured the values using the photos shown to the infants).
- We decided instead to use simply Gabor-filtered spatial-frequency information to characterize the pictures.
Slides 58-64: The Forest and the Trees: What are spatial frequencies?
A sequence of images of the same scene at increasing spatial frequencies: very low, low, medium, medium-high, high, very high, and extremely high spatial frequencies. (For the extremely high frequencies, from 10 m away the forest is no longer visible; trees with branches and individual leaves are visible.)
Slide 65: The Forest and the Trees: Combining spatial frequencies to obtain the full image
Slide 66: Cats: infant-to-adult visual acuity
Images of a cat at increasing spatial-frequency ranges: very low spatial frequencies, two-month-old vision, 3-4-month-old vision, and (almost) adult vision.
Slide 67: Cats: infant-to-adult visual acuity (images)
Slides 68-72: (image-only slides; no transcript)
Slide 73: Adult vision with the full range of spatial frequencies
Slide 74: Spatial frequency maps of images with Gabor filtering
The spatial-frequency map runs from low to high frequencies. We cover this map with spatial-frequency ovals along various orientations of the image. (Each oval is normalized to have approximately the same energy.) This allows us to characterize each dog/cat image with a 26-unit vector.
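One way to implement such a Gabor-filter characterization is sketched below. The paper's actual oval layout over the spatial-frequency map is not specified here, so a 13-frequency × 2-orientation grid is assumed simply to produce a 26-unit, energy-normalized vector:

```python
import numpy as np

rng = np.random.default_rng(2)

def gabor_kernel(size, freq, theta, sigma):
    """An even-symmetric Gabor kernel: cosine carrier under a Gaussian."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def gabor_energy_vector(image, freqs, thetas, size=15, sigma=3.0):
    """Filter the image with a Gabor bank (FFT-based convolution) and
    return one energy value per (frequency, orientation) pair,
    normalized so the vector sums to 1."""
    F = np.fft.fft2(image)
    energies = []
    for f in freqs:
        for t in thetas:
            K = np.fft.fft2(gabor_kernel(size, f, t, sigma), s=image.shape)
            response = np.real(np.fft.ifft2(F * K))
            energies.append(np.sum(response ** 2))
    v = np.array(energies)
    return v / v.sum()

# 13 frequency bands x 2 orientations = a 26-unit vector, matching the
# vector length in the text (the bank layout itself is an assumption).
freqs = np.linspace(0.05, 0.35, 13)
thetas = [0.0, np.pi / 2]
image = rng.random((64, 64))     # random stand-in for a cat/dog picture
vec = gabor_energy_vector(image, freqs, thetas)
print(vec.shape)                 # (26,)
```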
Slide 75: This is an experiment. Consider the following image.
Slides 76-79: (image-only slides; no transcript)
Slide 80: Moral of the story: Sometimes too much detail hinders categorization (even for adults!)
Slide 81: The same is true for infants: Reducing high-frequency information improves category discrimination for distinct categories
Reducing the range of the spatial frequencies from the retinal map to V1 decreases within-category variance. This decreases the difference between two exemplars of the same category but increases the difference between exemplars from two different categories. This will make learning distant basic-level or super-ordinate category distinctions easier (but subordinate-level category distinctions will be more difficult).
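The variance argument above can be checked numerically. In this toy sketch (synthetic 1-D "images", not the actual stimuli), each category is a smooth low-frequency prototype plus high-frequency within-category detail; removing the high frequencies collapses within-category differences while leaving the between-category difference, carried by the low frequencies, intact:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 128
t = np.linspace(0, 2 * np.pi, n, endpoint=False)

def lowpass(signal, keep=5):
    """Keep only the `keep` lowest-frequency components of a 1-D signal,
    crudely simulating a reduced spatial-frequency range."""
    F = np.fft.rfft(signal)
    F[keep:] = 0
    return np.fft.irfft(F, n=n)

def hf_detail():
    """Within-category variation, confined to high frequencies."""
    F = np.fft.rfft(rng.normal(0.0, 0.5, n))
    F[:16] = 0
    return np.fft.irfft(F, n=n)

# Each category: a smooth low-frequency prototype plus exemplar-specific
# high-frequency detail. (Invented data, for illustration only.)
proto_cat = np.sin(t)
proto_car = -np.sin(t)           # a perceptually distant category

cat1, cat2 = proto_cat + hf_detail(), proto_cat + hf_detail()
car1 = proto_car + hf_detail()

def d(u, v):
    return float(np.linalg.norm(u - v))

within_full, between_full = d(cat1, cat2), d(cat1, car1)
within_blur = d(lowpass(cat1), lowpass(cat2))
between_blur = d(lowpass(cat1), lowpass(car1))

# Blurring shrinks the within-category distance far more than the
# between-category distance, so the separation between categories improves.
print(within_full, within_blur, between_full, between_blur)
```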
Slide 82: In other words, reduced visual acuity might actually be good for infant categorization
- Visual acuity in infants is not the same as that of adults. They do not perceive high spatial frequencies (i.e., fine details), or perceive them only poorly.
- This reduced visual acuity may actually improve perceptual efficiency by eliminating the information overload caused by too many extraneous fine details likely to overwhelm their cognitive system.
- Thus, distant basic-level and super-ordinate-level category learning may actually be facilitated by reduced visual acuity.
Slides 83-85: Reducing visual acuity in our model to simulate young-infant vision by removing high spatial frequencies
A sequence of images in which progressively more of the high spatial frequencies are removed.
Slide 86: Reducing visual acuity in our model to simulate young-infant vision by removing high spatial frequencies
The high spatial frequencies have been removed. The autoencoder will work with input from these images, thereby simulating early infant vision.
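The acuity reduction described on these slides can be sketched as a hard low-pass filter in the Fourier domain. This is a sketch on a random test image; the cutoff is in cycles per image here, not the cycles/degree figures quoted later in the deck:

```python
import numpy as np

rng = np.random.default_rng(3)

def remove_high_frequencies(image, cutoff):
    """Zero out all spatial-frequency components above `cutoff`
    (in cycles per image), simulating reduced infant acuity."""
    F = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    y, x = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    mask = np.sqrt(x**2 + y**2) <= cutoff     # keep only low frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

image = rng.random((64, 64))                  # stand-in for a stimulus image
blurred = remove_high_frequencies(image, cutoff=8)

# Coarse structure survives (the zero-frequency component is kept, so the
# mean is unchanged), but fine detail and hence overall variance are lost.
print(image.std(), blurred.std())
```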
Slide 87: Two simulations with Gabor-filtered input
- Reproducing previous results: Using vectors of the 26 weighted spatial-frequency values, instead of explicit feature values, produces autoencoder network results similar to those produced by infants tested on the same images.
- Reduced visual acuity: This is produced by largely eliminating high-spatial-frequency information from the input (i.e., blurry vision); it actually significantly improves the network's ability to categorize the images presented to it.
Slide 88: Reproducing previous results (Cats are the more variable category)
Figures compare: results with explicit feature values (French et al., 2001); results for 3-4-month-old infants; and network generalization errors with Gabor-filtered spatial-frequency information. (Panels labeled "very little jump in error" and "large jump in error".)
Slide 89: Conclusion about the use of Gabor-filtered input instead of explicit feature measurements
- Spatial frequency data in the model produces a reasonable fit to empirical data.
- We avoid the thorny issue of using a particular set of high-level feature measurements (ear length, eye separation, etc.) to characterize the images used in the simulations.
Slide 90: Reduced visual acuity
Reduced perceptual acuity in 3-4-month-old infants produces an advantage for differentiating perceptually distant basic-level categories and super-ordinate categories.
Slide 91: Simulation 2: The advantage in 3-4-month-old infants of reduced visual acuity
The frequencies removed or reduced were:
- Above 3-4 cycles/degree: very little contribution
- Above 7.1 cycles/degree: no contribution
Network used: 26-16-26 feedforward BP autoencoder network (learning rate 0.1, momentum 0.9)
Slide 92: Error reduction (i.e., improved generalization, even overgeneralization) with reduced visual acuity
Network error (figure)
Slide 93: Close categories vs. very dissimilar categories
When a network is familiarized on one category (say, Cat), reduced visual acuity decreases errors (i.e., improves generalization) for novel exemplars in the same category or very similar categories (like Dog). But it should help in discriminating dissimilar categories. So, for example, reduced visual acuity should produce a greater jump in error for a network (or increased attention for an infant) familiarized on Cats when exposed to Cars.
Slide 94: When trained on one category (Cats), errors on dissimilar categories (Cars) are increased by reduced visual acuity (i.e., better category discrimination). The larger the error, the better the discrimination.
Slide 95: (the same result, shown graphically)
Slide 96: Jump in network error when trained on Cats and tested on novel Cats vs. Cars
Slide 97: A prediction of the model: Consider Quinn et al. (1993)
- Familiarized on Cats, tested on a novel Dog: jump in interest.
- Familiarized on Dogs, tested on a novel Cat: no jump in interest.
But what if we took this test Cat and, by adding only high spatial-frequency information, transformed it into this Dog?
Slide 98: Presumably what the 3-month-old infant would see is this
- Familiarized on Cats, test stimulus perceived as a Cat: prediction, no jump in interest.
- Familiarized on Dogs, test stimulus perceived as a Cat: no jump in interest.
The asymmetry would disappear, even though adults would perceive a series of cats followed by a dog and would expect a jump in the infants' interest, as there usually is for a novel dog following familiarization on cats.
Slide 99: Modeling Dogs and Cats: Conclusions
- A simple connectionist auto-encoder does a good job of reproducing certain surprising infant categorization data.
- This supports a statistical, perceptually based, on-line categorization mechanism in young infants.
- This model makes testable predictions that have subsequently been confirmed in infants.
- Gabor-filtered spatial-frequency input is neurobiologically plausible and produces a good approximation to infant categorization data.
- A counter-intuitive learning advantage for categorizing distant basic-level categories and super-ordinate categories arises from reduced-acuity input.
Slide 100: The case of anomia
One possible answer: variability. In general, natural kinds are LESS VARIABLE than artefactual kinds (e.g., cats are less variable than chairs).
Slide 101: Natural and artificial kinds: Before and after representational compression
Images: a chair before and after compression; a butterfly before and after compression.