Title: Spatial representation in the mind/brain: Do we need a global topographical map? Zenon Pylyshyn Rutgers Center for Cognitive Science and Institute Jean Nicod
1Spatial representation in the mind/brain: Do we
need a global topographical map? Zenon Pylyshyn,
Rutgers Center for Cognitive Science and
Institute Jean Nicod
- What is special about representation of space in
perception and thought?
- Do we need a single global spatial
representation?
- Do we need a topographical display in the brain?
Workshop on Frames of Reference, Paris, November
17-19, 2005
2What is special about spatial representation?
- I have suggested (Pylyshyn, 1973) that no
convincing reason has been given why a form of
representation adequate for general knowledge
(i.e., a Language of Thought) cannot also serve
for encoding the content of spatial
representations
- The difference between representing spatial
relations and representing other contents may lie
in their being different topics requiring a
different conceptual vocabulary, but they may not
require a different format or medium of
representation. Why can't spatial content be
encoded in a first-order calculus using
Cartesian coordinates?
- Is it just that it conflicts with our conscious
experience?
- The problem with the general-LOT proposal is that
it fails to account for certain psychophysical
phenomena that are observed when vision and
spatial reasoning are actively engaged in solving
problems or in planning actions, i.e., when
spatial representations are constructed in
working memory.
3Spatial representation during perception and
reasoning
- The impression that spatial representations are
different from other kinds of representations is
usually associated with examples from perception
and spatial reasoning. In these contexts, as
opposed to long-term-memory storage, there is
reason to think that such representations are
different in several ways
- I have suggested several such differences
(Pylyshyn, 1978), e.g.,
- Working memory contents typically involve
relationships among tokens and contain no
quantifiers or negation, e.g., ∀x F(x) is
represented by a finite set of x's, each of which
has property F (i.e., "all circles are red" is
represented by a set of circles each of which is
red)
- In the present talk I will focus on another way
that such representations are special: in the
way they encode space. Because these
representations are not tied to vision, and do
not even require a visual cortex or need to be
accompanied by conscious experience, they are
best referred to as spatial representations
rather than mental images
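
The token-based encoding mentioned above can be illustrated with a minimal Python sketch. The names `Token`, `working_memory` and `all_have` are hypothetical and are not part of the talk; the point is only that the general claim is never stored as a quantified formula but is read off a finite set of individuals.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    """One individual held in working memory (hypothetical structure)."""
    kind: str
    color: str

# "All circles are red" is stored as a finite set of red circle tokens,
# not as a quantified formula.
working_memory = [Token("circle", "red"), Token("circle", "red"), Token("circle", "red")]

def all_have(tokens, kind, color):
    """Read the general claim off the tokens by checking each one."""
    return all(t.color == color for t in tokens if t.kind == kind)

print(all_have(working_memory, "circle", "red"))
```

Adding a single blue circle token to the list makes the check fail, which is all that negation amounts to in a representation of this kind.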
4What are some constraints on a theory of spatial
representation?
- First I will attempt to tease out some functional
requirements that may apply to a system for
representing space and spatial relations in
perception and especially in spatial reasoning
- These requirements may explain why people often
assume that there is a unified global frame of
reference for vision and spatial reasoning that
is implemented as a spatial display in the brain.
- These requirements also serve to introduce an
alternative proposal that meets the conditions
without assuming a global spatial display
5Some conditions on a system of codes for
representing spatial relations (1)
- The system must be able to represent magnitudes
- Psychophysical evidence shows that we have
encodings of magnitudes (at least relative
magnitudes) and that the magnitudes that are
encoded (i.e., the semantics of the codes) have a
particular systematic effect in reasoning (e.g.,
scalar variance, Fechner's law, symbolic distance
effect, etc.).
- This suggests that the codes themselves must have
properties that explain these systematic
magnitude effects (which would not be the case if
the magnitudes were encoded as numerals)
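
To make the contrast with numerals concrete, here is a small sketch of a noisy analog magnitude code. It assumes (this is not stated in the talk) that each magnitude m is coded with Gaussian noise whose standard deviation is proportional to m, with an arbitrary Weber fraction of 0.15; the function names are hypothetical. Under that assumption the symbolic distance effect falls out of the code itself, whereas a numeral code would predict no such effect.

```python
import math

WEBER_FRACTION = 0.15  # assumed value, for illustration only

def discriminability(a, b, w=WEBER_FRACTION):
    """Separation of two noisy magnitude codes, in units of their combined noise.

    Each magnitude m is assumed to be coded with Gaussian noise whose
    standard deviation is w * m (scalar variability).
    """
    return abs(a - b) / (w * math.sqrt(a * a + b * b))

def fechner_sensation(intensity, threshold=1.0, k=1.0):
    """Fechner's law: sensation grows with the logarithm of stimulus intensity."""
    return k * math.log(intensity / threshold)

# Symbolic distance effect: far pairs are easier to tell apart than near pairs.
print(discriminability(2, 8) > discriminability(7, 8))
# Size effect: the same difference is harder to detect at larger magnitudes.
print(discriminability(2, 3) > discriminability(8, 9))
```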
6Some conditions on a system of codes for
representing spatial relations (2)
- The system must represent stable spatial
configurations
- Spatial configurations involve relations over
multiple objects; in that sense they are
holistic and require simultaneous access to
multiple objects (multiple arguments in
relational predicates must be simultaneously
bound)
- What is special about such configurations is that
they may allow some spatial inferences by
pattern lookup without reference to independent
geometrical axioms (see Using space to represent
spatial properties later).
7Some conditions on a system of codes for
representing spatial relations (3)
- The system must somehow capture the continuity
and connectedness of space. This leaves many
unanswered questions:
- Does continuity entail that empty places are
represented as such?
- Does continuity entail that the representational
system itself determines that distances meet
metrical axioms (e.g., the triangle inequality AB
+ BC ≥ AC) or that they are Euclidean?
- Does continuity entail that the representation of
movements of objects is constrained so that in
getting from A to B objects must pass through
intermediate locations?
- The proposal I will present later gives a partial
answer to these questions
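
As a sketch of what it would mean for a code to "meet metrical axioms", the following Python fragment checks the triangle inequality over a set of represented places. The helper names and the sample coordinates are illustrative assumptions. Ordinary Euclidean distance passes the check; squared distance, a perfectly usable magnitude code, does not.

```python
import math
from itertools import permutations

def satisfies_triangle_inequality(points, dist, tol=1e-9):
    """Check d(A, B) + d(B, C) >= d(A, C) for every ordered triple of points."""
    return all(
        dist(a, b) + dist(b, c) >= dist(a, c) - tol
        for a, b, c in permutations(points, 3)
    )

def squared_distance(p, q):
    """Not a metric: squaring distances can violate the triangle inequality."""
    return math.dist(p, q) ** 2

places = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(satisfies_triangle_inequality(places, math.dist))         # Euclidean distance
print(satisfies_triangle_inequality(places, squared_distance))  # squared distance
```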
8Some conditions on a system of codes for
representing spatial relations (4)
- The system must represent spatial properties
across modalities, including proprioception and
the motor system
- It must be possible for a pattern such as
SQUARE(w,x,y,z) to involve objects in different
modalities
- Spatial representations must be able to engage
the motor system in a fairly direct manner
- One of the characteristics of what we call a
spatial representation is that we can point
to represented things (e.g., in our mental
image).
- But note that motor actions towards perceptual
and imagined representations are not identical
because they engage different perceptual-motor
systems (Goodale et al., 1994)
9Some conditions on a system of codes for
representing spatial relations (5)
- The system must be able to represent spatial
relations in 3D
- When relations in depth are encoded, they
must be in a similar format to the encoding of
relations in the plane since the two have to
operate together
- Experimental evidence from such mental imagery
phenomena as mental rotation or mental
scanning shows identical functions in depth as in
the plane
10Summary of constraints to be met: A system of
spatial representations must somehow do the
following
- It must represent magnitudes
- It must represent holistic configurations which
enable at least some direct one-step inferences
(by pattern-matching)
- It must capture connectedness and continuity
- It must represent spatial relations seamlessly
across modalities and engage the motor system
- It must represent distances in depth as well as
in the plane in a uniform manner (i.e., it must
represent 3D)
- I will return to these constraints when I discuss
a different proposal for how we represent space
11Two additional common assumptions about spatial
representation
- The foregoing list of constraints has
frequently led people to make two assumptions
about spatial representation that I will argue
are not justified
- The single frame of reference assumption is the
assumption that when we represent spatial layouts
in perception or in thought we do so in a single
global frame of reference, as opposed to a
patchwork of distinct but coordinated frames
- Our conscious awareness of spatial layouts
suggests a single frame of reference, but like a
lot of properties of conscious awareness this may
be illusory
- The holism/stability assumption is the assumption
that when we represent spatial layouts in
perception or thought the representation
simultaneously contains a large number of objects
and properties in a stable spatial configuration
12Why an inner display for vision?
- In vision the spatial-display theory was meant to
explain why our visual experience is panoramic
and stable even though the visual inputs are
highly local, partial and constantly changing
- But many studies have shown that there is no such
rich stable panoramic display (e.g., change
blindness, superposition, etc.; see O'Regan, 1992)
13Why an inner display for spatial reasoning?
- The spatial-display theory was also meant to
explain how a mental representation can meet the
spatial conditions listed earlier by creating a
2D image in a real spatial medium
- Such a display was assumed to use the same global
2D spatial medium that is used in vision. But
both display assumptions have serious problems.
14The global spatial display assumption
- There are many deep problems with the assumption
that spatial properties are represented in vision
and reasoning by an inner spatial display which
corresponds to our experience of a stable world
(perceived or imagined), many of which I have
discussed in connection with the picture theory
of mental imagery (Behavioral and Brain Sciences,
2002)
- One of the main problems relevant to the present
discussion is the assumption that visual
perception, cross-modal spatial integration,
visuomotor control, and spatial reasoning derive
from a single representation in an allocentric
reference frame
- There are many reasons to doubt that there is a
unified global frame of reference for
representing spatial information
15Reasons to reject the Master Map assumption
- There are many known frames of reference between
perception and motor control, relying on both
external and internal sensors
- While gaze-centered coordinates are common in
motor control, they are gain-modulated by inputs
from eye, head and body positions as well as by
motor intentions (Andersen & Buneo, 2002;
Duhamel, Colby & Goldberg, 1992)
- Visual information is also represented in hand-
and body-centered frames of reference (Làdavas,
2002)
- The neglect syndrome appears in many different
frames of reference
- Motor control necessarily involves many different
frames of reference, including joint-angle,
proprioceptive, kinesthetic, and even frames that
depend on groups of spindle bundles
- Earlier (downstream) frames of reference are
often not overwritten but may continue to have
observable consequences in perceptual-motor
coordination and in errors in kinesthetically
guided motion (Baud-Bovy & Viviani, 1998), so multiple
frames continue to exist in the nervous system
16A different way of approaching the question of
spatial representation
- Based on such problems with the global spatial
display assumption, I have proposed a provisional
hypothesis that preserves some of the advantages
of the global spatial display, but assumes that
the relevant spatial properties are in the
perceived world and can be accessed if we have
the right access mechanisms for selecting and
indexing objects in the perceived world
- For ease of reference let us call this the
Projection Hypothesis because it is as though the
spatial display were projected onto the real
space we perceive (though with only objects'
identities and locations, and none of their other
visual properties)
17The projection hypothesis
- The projection hypothesis relies on the spatial
properties of the concurrently perceived world to
meet the 5 conditions outlined earlier. It rests
on two theoretical postulates:
- We have a system of pointers (such as the FINST
perceptual index mechanism to be described later)
by which a small number of perceived objects in
the world can be selected and indexed. Indexes
provide a fixed reference to their targets
despite changes in the targets' locations
- When we perceive a scene that contains indexed
objects, our perceptual system is able to treat
those selected objects as though they were
assigned unique labels. Thus our perceptual
system is able to detect novel configurational
properties among these indexed objects.
18Aside on FINST indexes
- Because FINST indexes play a central role in this
story I will make a short detour to illustrate
this mechanism and to give some examples of
indexes at work
19Pick out 3 dots I will cue and keep track of them
- After you pick out the 3 cued dots, I'll ask you
to move your attention from the center one.
Describe the new relation among the three dots.
- In a field of identical elements you can select
several of them and move your attention among
them (e.g., "move one up" or "move 2 right", etc.)
so long as at no time do you have to hold on to
more than 4 dots (Intriligator & Cavanagh, 2001)
20In making relational judgments you must select
and keep track of several objects at once
When we judge that certain objects are collinear,
we must first pick out the relevant objects while
ignoring all their properties except their
location. Such picking out and referring are the
basic functions of FINST indexes
21Several objects must be picked out at once in
making relational judgments
- In making relational judgments such as inside or
on-the-same-contour you must pick out the
relevant individual objects first. Are dots
Inside-same-contour? On-same-contour?
22Other experimental demonstrations of FINST indexes
- Recognizing the cardinality of small sets of
things: subitizing vs counting (Trick, 1994)
- Searching through subsets: selecting items to
search through (Burkell, 1997)
- Selecting subsets and maintaining the selection
during a saccade (Currie, 2002)
- Multiple Object Tracking (MOT)
23Subset selection for search
Burkell, J., & Pylyshyn, Z. W. (1997). Searching
through subsets: A test of the visual indexing
hypothesis. Spatial Vision, 11(2), 225-258.
24Subset search results
- Only properties of the subset matter
- If the subset is a single-feature search it is
fast and the slope (RT vs number of items) is
shallow
- If the subset is a conjunction search set, it
takes longer and is more sensitive to the set
size
- The distance between targets does not matter, so
observers don't seem to be scanning the display
looking for the target but can switch their
attention directly to the subset items
25Selective search is also found when a saccade
occurs between the late onset cues and start of
search
Even with a saccade between selection and access,
items can be accessed efficiently
26Demonstrating the function of FINSTs
with Multiple Object Tracking (MOT)
- In a typical MOT experiment, 8 simple identical
objects are presented on a screen and 4 of them
are briefly distinguished in some visual manner,
usually by flashing them on and off.
- After these 4 targets are briefly identified, all
objects resume their identical appearance and
move randomly. The observer's task is to keep
track of the ones that had been designated as
targets at the start
- After a period of 5-10 seconds the motion stops
and observers must indicate, using a mouse, which
objects are the targets
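
The trial structure just described can be sketched as a short simulation. This is an idealized illustration, not the experimental software: `run_mot_trial` and its parameters are hypothetical, the motion is a simple random walk, and the capacity limits and tracking errors seen in real observers are left out. What the sketch does show is the logical point of the task: since all objects look alike, the only thing that can carry target status through the motion phase is a reference to the object itself, not a stored property or location.

```python
import random

def run_mot_trial(n_objects=8, n_targets=4, n_steps=200, seed=0):
    """Simulate one MOT trial in which indexes bind to object identity."""
    rng = random.Random(seed)
    # Every object looks identical; only its position distinguishes it.
    positions = {obj: (rng.random(), rng.random()) for obj in range(n_objects)}
    targets = set(rng.sample(range(n_objects), n_targets))
    # Binding: each index refers to an object, not to a location or a property.
    indexes = set(targets)
    for _ in range(n_steps):
        for obj in list(positions):
            x, y = positions[obj]
            positions[obj] = (x + rng.uniform(-0.01, 0.01),
                              y + rng.uniform(-0.01, 0.01))
    # Response: follow each index to wherever its object is now.
    selected = {obj: positions[obj] for obj in indexes}
    return targets, selected

targets, selected = run_mot_trial()
print(sorted(selected) == sorted(targets))
```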
27Keep track of the objects that flash
28How do we do it? What properties of individual
objects do we use?
29Keep track of the objects that flash
30Our explanation is that FINST indexes are bound
to targets when they flash and remain bound
for the duration of the trial. At the end of
the trial they allow attention to be moved to
each target to select the targets
31FINST indexes allow selected objects to be
accessed directly and without searching for
specific properties. Indexes stay bound to
objects as the objects move
32If you were like the cartoon character Plastic
Man and could place your fingers on things in the
world so as to refer to them uniquely, and if you
could then move your gaze or attention to them,
you would possess FINgers of INSTantiation
(FINSTs)!
33Summary
End of aside on FINSTs!
- The FINST mechanism provides a limited set of
indexical pointers bound to perceived objects
- FINSTs can associate perceived objects with
objects of thought
- The binding is stable over some period of time
(e.g., a few seconds) and continues despite
motion of the objects or eye movements.
- Perception is able to treat the indexed objects
as though they were perceptually marked
34Examples of the projection hypothesis
- To illustrate how the projection hypothesis
works, first consider index-based projection in
the visual modality, where indexes can convert
some apparently mental-space phenomena into
perceived-space phenomena (although I will return
to the non-visual case shortly, the visual case
is more salient and tends to dominate other
modalities)
- Examples from some mental imagery experiments
- Mental scanning (Kosslyn, 1973)
- Mental image superposition (Podgorny & Shepard,
1978)
- Visual-motor adaptation (Finke, 1979)
- S-R compatibility to imagined locations (Tlauka,
1998)
35Studies of mental scanning: Often cited to suggest
that representations have metrical properties
36Brain image or index-based projection?
- A way to do this task
- Associate places on the imagined map with places
in the world that you perceive
- Move your attention or gaze from one place to
another as they are named
37Using a perceived room to anchor FINSTs tagged
with map labels
38Using vision with selected labeled objects
- If you project the pattern of map places by
picking out objects in the room in front of you
that correspond roughly to these memorized
locations, then you can scan attention from one
such marked object to another. The space here is
real and the equation time = distance / speed is
a physical principle, not tacit knowledge about
the world.
- You can also use the tagged objects to infer
configurational properties you may not have
noticed, despite somehow memorizing the location
of all objects
- Which 3 or more places on the map are collinear?
- Which place on the map is furthest North, South,
East, West?
- Which 3 places form an isosceles triangle?
- Such configurational consequences can be detected
as opposed to logically inferred, so long as they
involve only a few places, because the visual
system can examine a scene with labeled indexed
objects
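
A rough sketch of the two points on this slide, with made-up place labels and coordinates standing in for objects in the room to which map labels have been bound. The arithmetic below is only a stand-in for what, on the projection hypothesis, the visual system reads directly off the indexed scene: scan time follows from real distance, and collinearity is a fact about the scene rather than something derived from stored axioms. The default attention speed is an arbitrary assumption.

```python
import math

# Hypothetical map labels bound to objects at (x, y) locations in the room, in metres.
indexed_places = {"hut": (0.0, 0.0), "well": (1.0, 1.0),
                  "tree": (2.0, 2.0), "lake": (3.0, 0.0)}

def scan_time(a, b, speed=0.5):
    """Time to move attention between two indexed objects: distance / speed."""
    return math.dist(indexed_places[a], indexed_places[b]) / speed

def collinear(a, b, c, tol=1e-9):
    """True if three indexed objects lie on one line (zero triangle area)."""
    (x1, y1), (x2, y2), (x3, y3) = (indexed_places[k] for k in (a, b, c))
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) <= tol

print(scan_time("hut", "tree") > scan_time("hut", "well"))
print(collinear("hut", "well", "tree"), collinear("hut", "well", "lake"))
```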
39Another example of a result attributable to
FINST-based projection: the Podgorny-Shepard
experiment
Remember the following pattern and imagine it
after it is gone
Are the following dots on or off the imagined
pattern?
40The pattern of reaction times is the same for
perceived shapes as for recalled shapes
- Both when the F display is seen and when the F is
imagined, the time to judge that the dot was on
the F was fastest when the dot was at the vertex
of the F and slower when it was on an arm of the
F (slowest when it was one square away).
- Does this show that the F and dots are
superimposed on a display in the brain and
perceived with the visual system?
- A more plausible explanation is that the cells
corresponding to rows and columns of the F in the
matrix are indexed and thus made distinct,
allowing vision to be used to judge whether the
dots fall on those rows/columns.
41Perceptual-motor adaptation to imagined hand
position (Finke, 1979)
- If you wear displacing prism lenses and
repeatedly reach for objects in front of you for
just a few minutes, you adapt to the erroneous
feedback. When the lenses are removed you
overshoot in the opposite direction.
- If, instead of wearing lenses, you move your hand
invisibly while you imagine that your hidden hand
is at the displaced location, you get the same
adaptation phenomena
- Does this show that both your imagined hand and
other properties of the scene are displayed
somewhere in your visual system?
- All you need are indexes to several objects in
the visual scene, together with a distinct label
for each (e.g., hand, block). This allows
attention or even gaze to move to them.
- No other visual properties need to be represented
in order to create the discrepancy between felt
and seen (i.e., indexed) position that is
required for adaptation to occur
42S-R Compatibility effect with a visual
display: The Simon effect. It is faster to make a
response in the direction of an attended object
than in another direction
Response for A is faster when YES is on the left
in these displays
43S-R Compatibility effect with a recalled (mental)
display
The same RT pattern occurs for a recalled display
as for a perceived one
RT is faster when the A is recalled (imagined)
as being on the left
44In all these cases you only need indexes to a few
visual objects located in appropriate places
- In all examples we have seen, the results can be
predicted without appealing to a mental display,
if you assume that
- You can index a few visible objects (including
texture elements on an apparently plain surface)
and
- The visual system can treat indexed objects as
distinct or visually labeled
45Visual indexes can anchor spatial representations
to a scene containing visual objects. But how
does this work without vision (e.g., in the dark)?
- We must rely on our remarkable capacity to orient
to (point to, navigate towards, ...) perceived and
recalled objects (including proprioceptive
objects) in space without vision
- Call this general capacity our spatial sense
- How can the projection hypothesis account for
this apparently world-centered spatial sense
without assuming a global allocentric frame of
reference?
- Answer: Just as it does with vision, by anchoring
represented objects to (non-visually) perceived
objects in the world
46The spatial sense and the projection hypothesis
- Indexing non-visual objects must exploit
auditory and somatosensory signals, and perhaps
even preparatory motor programs (the
intentional frame of reference proposed by
Andersen & Buneo, 2002; Duhamel, Colby &
Goldberg, 1992)
- Is there some special problem about somatosensory
inputs that makes them different from visual
inputs?
47Is there a problem about somatosensory inputs
providing objects for anchoring the spatial
sense?
- Unlike visual objects, the objects in the
somatosensory modalities are not fixed in an
allocentric frame of reference
- Notice that even in vision and audition, objects
are always moving relative to sensors, so
representations must be updated to take account
of movements (Andersen, 1999; Stricanne, Andersen
& Mazzoni, 1996)
- Does the spatial sense entail a representation in
a global allocentric frame of reference?
- Does coordinating between somatosensory and
visual inputs require a single global
representational frame of reference?
48Some concrete examples of spatial skills that
suggest a global frame of reference
- The assumption of a global spatial representation
underlying our sense of space is suggested by such
observations as your ability (not always very
accurate) to do the following in the dark:
- Point to (or touch) a finger of your other hand
- Move your eye towards or reach towards a source
of sound
- Reach towards where your hand was a second or so
earlier
- Imagine a rectangle and point to where its
vertices are in space
- Pick a random point on each side of the imagined
rectangle and join pairs of points on opposite
sides of the rectangle. Describe and point to
where the newly drawn lines intersect.
- Look at the things in front of you and then turn
around and point to the location of one of the
objects you saw that is now behind you (as in the
experiments by Attneave & Farrar, 1977)
49How are indexes going to help with such examples?
- In order for the somatosensory case to work the
way the purely visual case worked:
- We need to specify how it is possible to index an
object in space using somatosensory signals, and
- We need to show that a limited number of selected
(indexed) individuals are involved, as in the
visual case.
50What is the real problem of our sense of space?
- In order to solve the problem of how we index
objects in the world using somatosensory inputs
we need to solve the problem of how we recognize
two such inputs as corresponding to (reaching)
the same thing in the world
- This is the problem of the equivalence of
movements, or of proprioceptive inputs,
corresponding to reaching the same object; it's
the problem that Henri Poincaré recognized as the
central problem of understanding our sense of
space (Poincaré's "Why space has three
dimensions" in Dernières Pensées, 1913)
- Solving this problem requires solving the problem
of coordinating signals across frames of
reference
- That's why mechanisms of coordinate
transformation are of central importance: they
generate the relevant equivalences!
51Coordinate transformations are the basis for the
illusory global frame of reference
- A coordinate transformation operation takes a
representation of an object relative to one
coordinate system (say, retinal coordinates) and
produces a representation of that object relative
to another frame of reference (say, relative to
the location of a hand in proprioceptive or
kinematical coordinates)
- Coordinate transformations can thus define
equivalence classes of gestures and somatosensory
inputs that correspond to reaching the same
object in space
- They are also ubiquitous in the brain (especially
in posterior parietal cortex and superior
colliculus)
- Another important consequence of these mechanisms
is that, as Colby & Goldberg (1999) put it,
"Direct sensory-to-motor coordinate
transformation obviates the need for a single
representation of space in environmental
coordinates" (p. 319)
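
A deliberately simplified sketch of such a transformation: it assumes a flat 2D world in which frames differ only by a translation, which is far cruder than the gain-modulated neural coding discussed earlier. The function name and arguments are hypothetical. A target given relative to gaze is re-expressed relative to the hand, and nothing in the computation requires a stored map of the environment.

```python
def retinal_to_hand_centered(target_on_retina, eye_direction, hand_position):
    """Re-express a target given relative to gaze as a target relative to the hand.

    All arguments are (x, y) pairs in the same units. eye_direction and
    hand_position are both given relative to the body.
    """
    # Step 1: gaze-centered -> body-centered (add where the eye is pointing).
    body_x = target_on_retina[0] + eye_direction[0]
    body_y = target_on_retina[1] + eye_direction[1]
    # Step 2: body-centered -> hand-centered (subtract where the hand is).
    return (body_x - hand_position[0], body_y - hand_position[1])

# The same object seen with two different eye positions yields the same reach vector.
print(retinal_to_hand_centered((5, 0), (10, 0), (12, -3)))
print(retinal_to_hand_centered((-5, 0), (20, 0), (12, -3)))
```

Both calls describe one object at the same place relative to the body, so they produce the same hand-centered vector even though the retinal inputs differ.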
52Coordinate transformations need not transform all
points in a given frame of reference
- Coordinate transformations need not transform all
points in a frame of reference or even all
sensory objects. Only a few selected objects need
to be transformed at any one time
- The computational complexity of coordinate
transformations can be greatly reduced by only
transforming selected objects
- This idea is closely related to the
conversion-on-demand hypothesis proposed by
Henriques et al. (1998) to explain how open-loop
reaching can be carried out during eye movements
using gaze-centered coordinates
- In the Henriques et al. proposal visual
information is held in a gaze-centered frame of
reference and objects are converted to motor
coordinates only when needed, but the details are
not essential here
53This completes the parallel with the visual case
- Coordinate transformations provide the basis for
computing equivalence classes of somatosensory
signals S to things in real space (S ≡ S' iff
there is a coordinate transformation from S to
S')
- As in the visual case, the evidence suggests that
only a few such equivalence classes are computed,
corresponding to a few distal objects in the
world
- These objects are ones that have been selected
and assigned a reference index, as postulated in
Perceptual Index Theory (call them generalized
FINSTs).
- With these few indexes we can anchor a few
objects in perceptual representations or imagined
representations to objects (filled places) in
real space, which is what we require in order to
explain the spatial character of spatial thoughts
and the stable character of perceived space (as
in the visual examples discussed earlier)
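
The equivalence relation on signals can be illustrated with a toy grouping procedure. Everything here is an assumption made for the sake of the example: the signal format (frame name, frame origin, reading), the translation-only transformations, and the use of a shared body frame as a bookkeeping device. Pairwise transformations between frames would define the same classes without any shared frame.

```python
from collections import defaultdict

def to_body_frame(signal):
    """Map a (frame, offset, reading) signal into one common frame.

    Hypothetical format: `offset` is the origin of the sensor's frame relative
    to the body and `reading` is the sensed location within that frame.
    """
    _, (ox, oy), (rx, ry) = signal
    return (ox + rx, oy + ry)

def equivalence_classes(indexed_signals):
    """Group signals S, S' that a transformation maps onto the same location."""
    classes = defaultdict(list)
    for signal in indexed_signals:
        classes[to_body_frame(signal)].append(signal[0])
    return dict(classes)

# Only the few signals that have been indexed are ever transformed.
signals = [
    ("gaze", (10, 0), (5, 0)),
    ("left_hand", (12, -3), (3, 3)),
    ("hearing", (0, 0), (15, 0)),
    ("right_hand", (-12, -3), (2, 8)),
]
print(equivalence_classes(signals))
```

Here three signals from different modalities fall into one class, corresponding to a single distal object, while the fourth picks out a different object.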
54Summary: One or many spatial frames of reference?
- There are many coordinated frames of
reference and many topographical spatial
layouts in the brain, but the only frame of
reference that is global and allocentric is the
one outside our head: the real space to which we
have only limited indexical access
55Finally: Must there always be perceived objects
for there to be a spatial sense?
- A prediction of the projection hypothesis is that
in the absence of any perceived objects there
would be no spatial sense and therefore that none
of the findings demonstrating the spatial
character of representations (e.g., the mental
imagery experiments cited earlier) would be
observed
- I know of no data involving a total lack of
sensory objects, but the following results are
suggestive:
- In the absence of visual objects, as in the
Ganzfeld (Avant, 1965), orientation and eye
movements become uncoordinated, so it is
reasonable to expect poor spatial coordination
with no perceived objects in any modality
- Auditory localization is better when there is
structured visual input (Warren, 1970) or
auditory landmarks (Dufour & Despres, 2002),
suggesting that concurrent perception of things
in space is necessary for orientation
- Sensory deprivation (while an extreme case) also
leads to disorientation
56The End
- and an appeal for help
- Does anyone know of evidence relevant to the
question whether typical spatial sense skills are
manifested in the absence of structured
perceptual input of any kind?
- Typical spatial skills might include being able
to solve geometry problems by constructing
figures in your head
- A more direct test might be to see if
deafferented patients tested in the dark have
impaired spatial skills, but I have seen no data
on this
57The End