Title: Spatial representation in the mind/brain: Do we need a global topographical map? Zenon Pylyshyn Rutgers Center for Cognitive Science and Institute Jean Nicod
1. Spatial representation in the mind/brain: Do we need a global topographical map?
Zenon Pylyshyn
Rutgers Center for Cognitive Science and Institute Jean Nicod
- What is special about representation of space in perception and thought?
- Do we need a single global spatial representation?
- Do we need a topographical display in the brain?
Workshop on Frames of Reference, Paris, November 17-19, 2005
2. Outline of talk
- Representing space in LTM vs. in Working Memory (WM)
- Some conditions on representing space in WM
- Why a unitary global spatial display is often assumed as the form of representation, and a few reasons why that's wrong
- An alternative way of satisfying the conditions on spatial representation: The Projection Hypothesis
- Aside on Spatial Index (FINST) Theory
- How the projection hypothesis explains the spatial properties of certain representations: Examples from the visual modality
- How to generalize this story to proprioception: The spatial sense
- Where is the global allocentric display we thought we needed?
3. What is special about spatial representation?
- I have suggested (Pylyshyn, 1973) that there is no reason why a form of representation adequate for general knowledge (i.e., a Language of Thought or LOT) cannot also serve for encoding the content of spatial representations in memory
- The difference between representing spatial relations and representing other contents may lie in their being different topics requiring a different conceptual vocabulary, rather than in their having a different form or medium
- This general-LOT format fails to account for certain phenomena that are observed when vision and spatial reasoning are actively engaged in solving problems or in determining actions, i.e., when spatial representations are functioning in working memory.
4. Spatial representation during perception and reasoning
- I have outlined a number of ways that the representation of space in WM is different in form from that of other contents of WM. In this talk I will focus on one of these ways, namely the way that such representations deal with space
- Because such representations are not tied to vision or conscious visual experience, they are best referred to as spatial representations rather than mental images
- By the end I will conclude that even calling them spatial representations is somewhat misleading, but that comes later!
5. What are some constraints on a theory of spatial representation?
- I begin by trying to set out some functional requirements (or boundary conditions) that may apply to a system for representing space and spatial relations in working memory, in perception and especially in spatial reasoning
- I will later argue that the wrong conclusions have been drawn from these requirements about the form of such spatial representations
6. Some conditions on a system of codes for representing spatial relations (1)
- The system must be able to represent magnitudes
- Psychophysical evidence shows that we encode magnitudes (at least relative magnitudes) and that these magnitudes (i.e., the semantics of the codes) have systematic effects in behavior (e.g., the phenomena of scalar variance ratio, Fechner's law, the symbolic distance effect, etc.).
- Thus something about the form of the representation itself must explain these systematic magnitude effects (e.g., phenomena such as those listed above would not arise if the magnitudes were encoded symbolically as numerals)
7. Some conditions on a system of codes for representing spatial relations (2)
- The system must represent stable spatial configurations
- Spatial configurations involve relations over multiple objects; in that sense they are holistic and require simultaneous access to multiple objects (i.e., multiple arguments in relational predicates must be simultaneously bound)
- What is special about such configurations is that they may allow some spatial inferences by pattern lookup, without reference to independent geometrical axioms (such as the axiom of transitivity)
- Example of 3-term series problems and spatial paralogic
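As a rough illustration of inference by pattern lookup, the sketch below solves a three-term series problem by placing the terms in an ordered array and reading the answer off their positions, with no explicit transitivity rule. The function names (place_terms, left_of) and the restriction to premises that each introduce one new term next to an already placed one are assumptions of this sketch, not part of the talk.

```python
def place_terms(premises):
    """Build a left-to-right layout from premises of the form (x, y),
    each meaning 'x is to the left of y'.

    Assumption of this sketch: every premise after the first mentions
    exactly one term that is already placed, and the new term is put
    immediately next to it.
    """
    first_left, first_right = premises[0]
    layout = [first_left, first_right]
    for left, right in premises[1:]:
        if left in layout and right not in layout:
            layout.insert(layout.index(left) + 1, right)
        elif right in layout and left not in layout:
            layout.insert(layout.index(right), left)
    return layout


def left_of(layout, x, y):
    """Read the relation directly off the layout (pattern lookup)."""
    return layout.index(x) < layout.index(y)


layout = place_terms([("A", "B"), ("B", "C")])
print(layout)                     # ['A', 'B', 'C']
print(left_of(layout, "A", "C"))  # True, with no transitivity axiom applied
```

Because each new term is simply placed beside an existing one, indeterminate premise sets (e.g., A left of B, A left of C) still yield a single definite layout, which is one way a spatial strategy can produce answers that logic alone would not license.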
8. Some conditions on a system of codes for representing spatial relations (3)
- The system must somehow capture the continuity and connectedness of space. This requirement leaves many unanswered questions
- Does continuity entail that empty places are represented as such?
- Does continuity entail that the representational system itself determines that distances meet metrical axioms (e.g., the triangle inequality AB + BC ≥ AC) or that they are Euclidean?
- Does continuity entail that the representation of movements of objects is constrained so that in getting from A to B objects must pass through intermediate locations?
- The proposal I will present later gives a partial answer to these questions
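To make the question about metrical axioms concrete, here is a minimal sketch showing that a purely symbolic table of distances does not enforce the triangle inequality by itself: the constraint has to be checked (or imposed) separately. The table format and the function name are assumptions made for illustration.

```python
from itertools import permutations


def violates_triangle_inequality(dist):
    """Return the triples (a, b, c) for which d(a,b) + d(b,c) < d(a,c).

    `dist` maps frozenset({x, y}) to a stored distance; this table
    format is an assumption of the sketch.
    """
    points = sorted({p for pair in dist for p in pair})
    bad = []
    for a, b, c in permutations(points, 3):
        ab = dist[frozenset((a, b))]
        bc = dist[frozenset((b, c))]
        ac = dist[frozenset((a, c))]
        if ab + bc < ac:
            bad.append((a, b, c))
    return bad


# A symbolic table can store any numbers, including geometrically impossible ones.
impossible = {
    frozenset(("A", "B")): 1.0,
    frozenset(("B", "C")): 1.0,
    frozenset(("A", "C")): 5.0,
}
print(violates_triangle_inequality(impossible))
# [('A', 'B', 'C'), ('C', 'B', 'A')]
```

Distances read off a real spatial layout could never produce such a table, which is the sense in which a spatial medium would satisfy the axiom for free.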
9. Some conditions on a system of codes for representing spatial relations (4)
- The system must represent spatial properties across modalities, including proprioceptive and efferent modalities
- Spatial representations must be able to engage the motor system in a fairly direct manner
- One of the characteristics of what we call a spatial representation is that we can point to represented things (e.g., in our mental image). That's why a proposition such as LEFT-OF(A,B) seems an inadequate representation for <A,B>
- But note that motor actions towards perceptual and imagined representations are not identical because they engage different perceptual-motor pathways (Goodale et al., 1994)
10. Some conditions on a system of codes for representing spatial relations (5)
- The system must be able to represent spatial relations in 3D
- When relations in depth are encoded, they must be in a similar format to the encoding of relations in the plane, since the two have to operate together (e.g., in determining the Euclidean distance between points in 3D space)
- Experimental evidence from such phenomena as mental rotation or mental scanning shows identical functions in depth as in the plane
11. Summary of constraints to be met
A system of spatial representations must somehow do the following:
- It must represent magnitudes
- It must represent holistic configurations which enable at least some direct one-step inferences (by pattern-matching)
- It must capture connectedness and continuity
- It must represent spatial relations seamlessly across modalities and be able to engage the motor system
- It must represent distances in depth as well as in the plane in a uniform manner (i.e., it must represent 3D)
- I will return to these constraints when I discuss a different proposal for how we represent space
12. An additional major assumption about spatial representation
- The foregoing list of constraints has frequently led people to make one additional assumption about spatial representation that I will argue is not justified
- The single frame of reference assumption is the assumption that we represent spatial layouts in perception or in thought in a single global frame of reference, as opposed to a patchwork of distinct but coordinated frames
- Every theory I know that attempts to explain mental imagery or cross-modal coordination makes this assumption, explicitly or implicitly
13. Why a single display for vision?
- In vision the global spatial-display theory explains why our visual experience is panoramic and stable even though the visual inputs are highly local, partial and constantly changing
- But many studies have shown that there is no such rich stable panoramic display (e.g., change blindness, superposition, etc.; see O'Regan, 1992)
14. Why a single display for spatial reasoning?
- The global spatial-display theory also explains how a mental representation can meet the spatial conditions listed earlier: it does so by creating a 2D image in a real spatial medium
- Such a display was assumed to use the same global spatial medium that is used in vision. But both display assumptions have serious problems.
15. The global spatial display assumption
- There are many deep problems with the assumption that spatial properties are represented in vision and reasoning by an inner spatial display which corresponds to our experience of a stable world (perceived or imagined), many of which I have discussed in connection with the picture theory of mental imagery (BBS, 2002)
- V1 can't serve as the medium for an image representation, for many reasons given in my BBS paper and book, e.g., not stable, not broad enough, not 3D, images not presented in the right form (no Emmert's law, no amodal completion, image size not in the right form, no image rotation)
- One of the main problems relevant to the present discussion is the assumption that visual spatial perception, cross-modal spatial integration, visuomotor control, and spatial reasoning derive from a single representation in an allocentric frame of reference
- There are many reasons to doubt that there is a single global allocentric representation (master map) for spatial information
16. Many reasons to reject the Master Map assumption
- There are many known frames of reference between perception and motor control, relying on both external and internal sensors
- While gaze-centered coordinates are common in motor control, they are gain-modulated by inputs from eye, head and body positions as well as by motor intentions (Andersen & Buneo, 2002; Duhamel et al., 1992)
- Visual information is also represented in hand- and body-centered (also personal and peripersonal) frames of reference (Làdavas, 2002)
- Spatial neglect appears in many different frames of reference
- Motor control necessarily involves many different frames of reference, including proprioceptive, kinesthetic, joint-angle, and even dynamic frames of reference based on muscle spindle and joint tendon receptors
- Earlier (downstream) frames of reference are often not overwritten but may continue to have observable consequences on errors in kinesthetically-guided movements (Baud-Bovy & Viviani, 1998), so multiple frames can coexist in the nervous system
17. A different way of approaching the question of spatial representation
- Because of the many problems with the global spatial display assumption, I have proposed a provisional hypothesis that preserves some of the advantages of the global spatial display, but assumes that the relevant spatial properties are in the perceived world and can be accessed if we have the right access mechanisms for selecting and indexing objects in the perceived world
- For ease of reference let's call this the Projection Hypothesis because it is somewhat analogous to projecting the spatial display onto the real space that we perceive, even though only objects' identities (labels) and locations, and none of their other visual properties, are projected
18. The projection hypothesis
- The projection hypothesis claims that the perceptual systems rely on the spatial properties of the concurrently perceived world to meet the 5 conditions outlined earlier. The hypothesis rests on three theoretical postulates:
- We have a system of pointers (such as the FINST mechanism) by which a small number of perceived objects in the world can be selected and indexed. FINSTs are reference pointers to these target objects and remain attached to them despite changes in their locations
- When we perceive a scene that contains indexed objects, our perceptual system is able to treat those indexed objects as though they were assigned unique visual labels. (Thus it can detect previously-unnoticed patterns among indexed objects)
- Our LTM representation of locations need not meet the 5 conditions because it is not directly used in spatial reasoning or motor control
19. Visual Index (FINST) Theory
SHORT DETOUR (while gray background)!
- Because FINST Indexes play a central role in this story, I will make a short detour to illustrate this mechanism and to give some examples of indexes at work
20. Pick out the 3 dots I will cue and keep track of them
- After you pick out the 3 cued dots, I'll ask you to move your attention from the center one to the dot below it. Describe the new relation among the three dots.
- In a field of identical elements you can select several of them and move your attention among them so long as they are not too close together (Intriligator & Cavanagh, 2001)
21. In making relational judgments you must select and keep track of several objects at once
When we judge that certain objects are collinear, we must first pick out the relevant objects while ignoring all their properties except their location. Such picking out and referring are the basic functions of FINST Indexes
22. Several objects must be picked out at once in making relational judgments
- You must have the ability to pick out several individual items and keep track of them, since in order to make relational judgments, such as inside or on-the-same-contour, you must pick out the relevant individual objects first. Are dots Inside-same-contour? On-same-contour?
23. Other experimental demonstrations of FINST indexes
- Recognizing the cardinality of small sets of things: Subitizing vs. counting (Trick, 1994)
- Searching through subsets: selecting items to search through (Burkell, 1997)
- Selecting subsets and maintaining the selection during a saccade (Currie, 2002)
- Multiple Object Tracking (MOT)
24. Subset selection for search
Burkell, J., & Pylyshyn, Z. W. (1997). Searching through subsets: A test of the visual indexing hypothesis. Spatial Vision, 11(2), 225-258.
25. Subset search results
- Only properties of the subset matter
- If the subset is a single-feature search set, it is fast and the slope (RT vs. number of items) is shallow
- If the subset is a conjunction search set, it takes longer and is more sensitive to the set size
- The distance between targets does not matter, so observers don't seem to be scanning the display looking for the target but can switch their attention directly to the subset items
26. Selective search is also found when a saccade occurs between the late onset cues and start of search
Even with a saccade between selection and access, items can be accessed efficiently
27. Demonstrating the function of FINSTs with Multiple Object Tracking (MOT)
- In a typical MOT experiment, 8 simple identical objects are presented on a screen and 4 of them are briefly distinguished in some visual manner, usually by flashing them on and off.
- After these 4 targets are briefly identified, all objects resume their identical appearance and move randomly. The observer's task is to keep track of the ones that had been designated as targets at the start
- After a period of 5-10 seconds the motion stops and observers must indicate, using a mouse, which objects are the targets
28. Keep track of the objects that flash
29. How do we do it? What properties of individual objects do we use?
30. Keep track of the objects that flash
31. Our explanation is that FINST indexes are bound to targets when they flash and remain bound for the duration of the trial. At the end of the trial they allow attention to be moved to each target in turn, so that the targets can be selected
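The following toy simulation is one way to picture this explanation. It is only a sketch under strong assumptions: an index is modeled as a reference to the object itself (not to a location or to any stored property), tracking is treated as error-free, and the function name and parameters are invented for illustration. Real observers, of course, lose targets under many conditions.

```python
import random


def run_mot_trial(n_objects=8, n_targets=4, n_steps=200, seed=0):
    """Toy Multiple Object Tracking trial.

    Objects are visually identical; each is just a mutable [x, y] position.
    An index is a reference to the object itself (assumption of this sketch).
    """
    rng = random.Random(seed)
    objects = [[rng.uniform(0, 100), rng.uniform(0, 100)] for _ in range(n_objects)]

    # Cueing phase: bind indexes to the targets that flash.
    target_ids = sorted(rng.sample(range(n_objects), n_targets))
    indexes = [objects[i] for i in target_ids]

    # Motion phase: every object takes an independent random walk.
    for _ in range(n_steps):
        for obj in objects:
            obj[0] += rng.uniform(-1, 1)
            obj[1] += rng.uniform(-1, 1)

    # Response phase: follow each index to the object it is bound to.
    # Identity ("is"), not position or appearance, decides the selection.
    selected = [i for i, obj in enumerate(objects)
                if any(obj is ref for ref in indexes)]
    return target_ids, selected


targets, selected = run_mot_trial()
print(targets == selected)  # True
```

Nothing about the objects' properties or their starting locations is consulted at the end of the trial; the bindings alone carry the information about which objects are targets.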
32. FINST indexes allow selected objects to be accessed directly and without searching for specific properties. Indexes stay bound to objects as the objects move
33. If you were like the cartoon character Plastic Man and could place your fingers on things in the world so as to refer to them uniquely, and if you could then move your gaze or attention to any of them at will, you would possess fingers of instantiation
34. If you were like the cartoon character Plastic Man and could place your fingers on things in the world so as to refer to them uniquely, and if you could then move your gaze or attention to them, you would possess FINgers of INSTantiation (or FINSTs)
35. Summary
End of aside on FINSTs!
- The FINST mechanism provides a limited set of indexical pointers bound to perceived objects
- FINSTs can associate perceived objects with objects of thought
- The binding is stable over some period of time (e.g., a few seconds) and continues despite motion of the objects or eye movements.
- Perception is able to treat the indexed objects as though they were perceptually marked
36. Examples of the projection hypothesis
- To illustrate how the projection hypothesis works, first consider index-based projection in the visual modality, where indexes can convert some apparently mental-space phenomena into perceived-space phenomena (although I will return to the non-visual case shortly, the visual case is more salient and tends to dominate other modalities)
- Examples from some mental imagery experiments
- Mental scanning (Kosslyn, 1973)
- Mental image superposition (Podgorny & Shepard, 1978)
- Visual-motor adaptation (Finke, 1979)
- S-R compatibility to imagined locations (Tlauka, 1998)
37. Studies of mental scanning
Often cited to suggest that representations have metrical properties
38. Brain image or index-based projection?
- A way to do this task
- Associate places on the imagined map with places in the world that you perceive
- Move your attention or gaze from one place to another as they are named
39. Using a perceived room to anchor FINSTs tagged with map labels
40. Using vision with selected labeled objects
- If you project the pattern of map places by picking out objects in the room in front of you that correspond roughly to these memorized locations, then you can scan attention from one such marked object to another. The space here is real and the equation time = distance ÷ speed is a physical principle, not tacit knowledge about the world.
- You can also use the tagged objects to infer configurational properties you may not have noticed, despite somehow memorizing the location of all objects
- Which 3 or more places on the map are collinear?
- Which place on the map is furthest North, South, East, West?
- Which 3 places form an isosceles triangle?
- Such configurational consequences can be detected, as opposed to logically inferred, so long as they involve only a few places, because the visual system can examine a scene with labeled indexed objects
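The relations mentioned on this slide can be spelled out in a short sketch. The place names, room coordinates and scanning speed below are made up for illustration, and the code is not a claim about how the visual system works: on the projection hypothesis these relations hold in the perceived scene itself, and the sketch only makes explicit what that scene would supply.

```python
import math
from itertools import combinations

# Hypothetical room coordinates (in metres) of objects that have been
# indexed and tagged with map labels; all values are invented.
tagged = {
    "hut": (0.0, 0.0),
    "well": (1.0, 1.0),
    "tree": (3.0, 3.0),
    "lake": (4.0, 0.0),
}


def scan_time(a, b, speed=2.0):
    """Time to move attention between two tagged objects: distance / speed."""
    return math.dist(tagged[a], tagged[b]) / speed


def collinear_triples(tol=1e-9):
    """Triples of tagged places that lie on one straight line."""
    found = []
    for a, b, c in combinations(sorted(tagged), 3):
        (x1, y1), (x2, y2), (x3, y3) = tagged[a], tagged[b], tagged[c]
        # Twice the signed triangle area; zero means the points are collinear.
        area2 = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
        if abs(area2) <= tol:
            found.append((a, b, c))
    return found


def furthest_north():
    """Label of the tagged place with the largest y coordinate."""
    return max(tagged, key=lambda label: tagged[label][1])


print(scan_time("hut", "lake"))  # 2.0
print(collinear_triples())       # [('hut', 'tree', 'well')]
print(furthest_north())          # tree
```

Note that the collinearity of hut, well and tree was never stored as a fact; it falls out of the layout once the places are anchored to locations.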
41. Another example of a result attributable to FINST-based projection: the Podgorny-Shepard experiment
Remember the following pattern and imagine it after it is gone
Are the following dots on or off the imagined pattern?
42. The pattern of reaction times is the same for perceived shapes as for recalled shapes
- Both when the F display is seen and when the F is imagined, the time to judge that the dot was on the F was fastest when the dot was at the vertex of the F and slower when it was on an arm of the F (slowest when it was one square away).
- Does this show that the F and dots are superimposed on a display in the brain and perceived with the visual system?
- A more plausible explanation is that the cells corresponding to rows and columns of the F in the matrix are indexed and thus made distinct, allowing vision to be used to judge whether the dots fall on those rows/columns.
43. Perceptual-motor adaptation to imagined hand position (Finke, 1979)
Skip?
- If you wear prism displacing lenses and repeatedly reach for objects in front of you for just a few minutes, you adapt to the erroneous feedback. When the lenses are removed you overshoot in the opposite direction.
- If, instead of wearing lenses, you move your hand invisibly while you imagine that your hidden hand is at the displaced location, you get the same adaptation phenomena
- Does this show that both your imagined hand and other properties of the scene are displayed somewhere in your visual system?
- All you need are indexes to several objects in the visual scene, together with a distinct label for each (e.g., hand, block). This allows attention or even gaze to move to them.
- No visual details (e.g., hand properties) need to be imagined
- Some real visual objects (e.g., texture) need to be visible to bind indexes; just a blank background will not work (cf. Rossetti)
44. S-R Compatibility effect with a visual display
The Simon effect: It is faster to make a response in the direction of an attended object than in another direction
Response for A is faster when YES is on the left in these displays
45. S-R Compatibility effect with a recalled (mental) display
The same RT pattern occurs for a recalled display as for a perceived one
RT is faster when the A is recalled (imagined) as being on the left
46. In all these examples you only need to index a few visual objects located in appropriate places
- In all examples that we have seen, the results can be explained without appealing to a global spatial display, by assuming that:
- Vision can index a few visible objects (including texture elements on an apparently plain surface), and
- Vision can treat indexed objects as distinct or visually labeled
47. Reminder of the constraints to be met by a system of spatial representations
- Represent magnitudes
- Represent configurations
- Capture connectedness and continuity
- Represent spatial relations across modalities and be able to engage the motor system
- Represent 3D distances and relations
- By anchoring mental particulars to a few perceived objects in a scene, the visual system is able to exploit the above properties of the perceived world
48. Visual indexes can anchor spatial representations to a scene containing visual objects. But how does this work without vision (e.g., in the dark)?
- We must rely on our remarkable capacity to orient to (point to, navigate towards, ...) perceived and recalled objects (including proprioceptive objects) in space without vision
- Call this general capacity our location- or spatial-sense
- How can the projection hypothesis account for this apparently world-centered spatial sense without assuming a global allocentric frame of reference?
- Answer: Just as it does with vision, by binding represented objects to (non-visually) perceived objects in the world
- Indexing non-visual objects must exploit auditory and general proprioceptive signals, and perhaps even preparatory motor programs (Andersen & Buneo, 2002; Duhamel, Colby & Goldberg, 1992)
49. The real problem of our sense of space
- In order to solve the problem of how we index generalized objects in the world using proprioceptive inputs, we need to solve the problem of how we recognize two such inputs as corresponding to (reaching) the same object in space
- This is the problem of computing the equivalence of movements, or of proprioceptive inputs, that correspond to reaching the same object. Solving this problem requires solving the problem of coordinating signals between different afferent and efferent frames of reference
- That's why mechanisms of coordinate transformation are of central importance: they make it possible to compute the relevant equivalence classes
- Such mechanisms are ubiquitous in PPC, SC and elsewhere
50. Proprioception, coordinate transformations and the allocentric frame of reference
- Coordinate transformations provide the basis for computing the equivalence classes of proprioceptive signals S associated with reaching or sensing individual objects in space (S ≡ S' iff there is an appropriate coordinate transformation from S to S')
- Because of the ability to compute the set S corresponding to reaching/sensing places in the world, proprioception is able to provide allocentric information (cf. Rossetti's point that we should not equate proprioception with egocentric and vision with allocentric frames of reference)
- Computing S is the problem that Henri Poincaré recognized as central to understanding our sense of space (see Poincaré's "Why space has three dimensions" in Dernières Pensées, 1913). Without this we could not reach objects in the dark or from memory!
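The equivalence relation on this slide can be illustrated with a deliberately simplified sketch. It assumes that two frames differ only by a 2D translation and that the offset between them is known; the function names and all numerical values are hypothetical. Note that the transformation goes directly from one frame to the other, without passing through any world-centered coordinates.

```python
import math


def transform(signal, offset):
    """Re-express a position given in frame 1 in the coordinates of frame 2.

    `offset` is the origin of frame 2 measured in frame 1. The frames are
    assumed to differ only by a 2D translation (a simplification).
    """
    return (signal[0] - offset[0], signal[1] - offset[1])


def equivalent(s, s_prime, offset, tol=1e-6):
    """S is equivalent to S' iff transforming S into the frame of S' yields S'."""
    return math.dist(transform(s, offset), s_prime) <= tol


# Hypothetical values: the left-hand frame's origin lies 0.5 m to the
# right of the right-hand frame's origin.
left_frame_in_right_frame = (0.5, 0.0)
reach_right = (0.25, 0.5)   # an object as sensed relative to the right hand
reach_left = (-0.25, 0.5)   # the same object relative to the left hand
other_left = (0.0, 0.5)     # a different place relative to the left hand

print(equivalent(reach_right, reach_left, left_frame_in_right_frame))  # True
print(equivalent(reach_right, other_left, left_frame_in_right_frame))  # False
```

Signals that pass this test fall into one equivalence class, and it is the class, rather than any particular coordinate description, that picks out the object.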
51. Coordinate transformations are the basis for the illusory global frame of reference
- A coordinate transformation operation takes a representation of an object relative to one coordinate system (say, retinal coordinates) and produces a representation of that object relative to another frame of reference (say, relative to the location of a hand in proprioceptive or kinematical coordinates).
- An important consequence of these mechanisms is that, as Colby & Goldberg (1999, p. 319) put it, "Direct sensory-to-motor coordinate transformation obviates the need for a single representation of space in environmental coordinates"
52. The spatial sense and FINST Indexes
- Not all points in a representation need to be converted. As in the visual case, only a few equivalence classes, corresponding to a few objects in the world, need to be computed at any one time.
- This idea is closely related to the conversion-on-demand hypothesis proposed by Henriques et al. (1998) to explain how open-loop reaching can be carried out during eye movements using gaze-centered coordinates
- In the Henriques et al. proposal, visual information is held in a gaze-centered frame of reference and objects are converted to motor coordinates only when needed, but the details are not essential here
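A minimal sketch of the conversion-on-demand idea follows. It is not the Henriques et al. model: the class name, the translation-only transformation and all values are assumptions made for illustration. A few indexed objects are held in gaze-centered coordinates, an eye movement updates those coordinates, and a single object is converted to hand-centered coordinates only when a reach is requested.

```python
class IndexedScene:
    """Holds a few indexed objects in gaze-centered coordinates and converts
    one of them to hand-centered coordinates only when a reach is requested.

    Names and the translation-only transform are assumptions of this sketch.
    """

    def __init__(self, gaze_centered):
        self.gaze_centered = dict(gaze_centered)  # label -> (x, y)
        self.conversions = 0                      # how many conversions were made

    def shift_gaze(self, dx, dy):
        """An eye movement changes every gaze-centered position."""
        self.gaze_centered = {
            label: (x - dx, y - dy)
            for label, (x, y) in self.gaze_centered.items()
        }

    def reach_vector(self, label, hand_in_gaze_frame):
        """Convert a single indexed object to hand-centered coordinates."""
        self.conversions += 1
        x, y = self.gaze_centered[label]
        hx, hy = hand_in_gaze_frame
        return (x - hx, y - hy)


scene = IndexedScene({"cup": (4.0, 2.0), "pen": (-3.0, 1.0), "key": (0.0, -5.0)})
scene.shift_gaze(1.0, 0.0)
print(scene.reach_vector("cup", hand_in_gaze_frame=(1.0, -2.0)))  # (2.0, 4.0)
print(scene.conversions)                                           # 1
```

Only the object that is acted upon is ever expressed in motor coordinates; the other indexed objects stay in the gaze-centered frame until they are needed.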
53. Generalized FINST Indexes
- According to the projection hypothesis, the objects that are transformed are ones that have been selected and assigned a reference index, as postulated in Perceptual Index Theory (call them generalized FINSTs).
- With these indexes we can anchor a few objects in perceptual or imagined representations to objects in real space using proprioceptive signals, just as in the visual examples discussed earlier
- This is what we need in order to explain the spatial character of spatial thoughts and the stable character of perceived space, as argued in the visual case
54. CONCLUSION: How many spatial frames of reference are there?
- There are many coordinated frames of reference and many topographical spatial layouts in the brain, but the only frame of reference that is global and allocentric is the one outside our head: the real space to which we have only selective indexical access
55. PS: Must there always be some perceived objects for there to be a spatial sense?
- A prediction of the projection hypothesis is that in the absence of any perceived objects there would be no spatial sense, and therefore that none of the findings demonstrating the spatial character of representations (e.g., the mental imagery experiment results) would be observed
- I know of no data involving a total lack of sensory objects, but the following results are suggestive
- In the absence of visual objects, as in the Ganzfeld (Avant, 1965), orientation and eye movements become uncoordinated, so one might reasonably expect poor spatial coordination with no perceived objects
- Auditory localization is better when there is structured visual input (Warren, 1970) or auditory landmarks (Dufour & Despres, 2002), suggesting that concurrent perception of things in space is necessary for orientation
- Sensory deprivation (while extreme) also leads to disorientation
56. The End
- and an appeal for help
- Does anyone know of evidence relevant to the question of whether typical spatial sense skills are manifested in the absence of structured perceptual input of any kind?
- Typical spatial skills might include being able to solve geometry problems by constructing figures in your head
- A more direct test might be to see if deafferented patients tested in the dark have impaired spatial skills, but I have seen no data on this