How does the mind connect with the world and how does perception pick out unique individual things (tokens) - PowerPoint PPT Presentation


Slides: 62
Provided by: Zeno58

How does the mind connect with the world and how
does perception pick out unique individual things
How perception connects with the world
Some topics we will cover and some terminology
  • Coordinating noticings over time: one form of
    the Correspondence Problem (arises because
    perceptual representations are built
    incrementally over time)
  • Coordinating across modalities (esp. vision and
  • Coordinating conjunctions of properties: the
    Binding Problem (also the "many properties problem"
    or the "qualitative bundling problem")
  • All these are instances of a very general
    problem: the inadequacy of satisfaction as the
    sole relation between representations and what
    they represent. In John Perry's terms, there is
    an ineliminable need for a special sort of
    picking out, or demonstrative reference.

Some background...
Setting out the problem
  • The basic assumption of cognitive science is that
    in order to explain/predict people's behavior we
    need to appeal to what people believe and desire
    and to how they perceive the world around them,
    that is, to the content of their mental representations,
    as well as to how they draw inferences from these.
  • While these sorts of contents are necessary, they
    are not sufficient. We also need to appeal to a
    special sort of nonconceptual content that is
    related to the world not by the semantic relation
    of satisfaction, such as holds between a
    description and what it describes, but by a
    nonconceptual relation, such as holds between a
    demonstrative like "this" or "that" and its referent,
    or between a name and its referent. Such a
    relation simply picks out the referent, but does
    not describe it nor refer to it under some
    conceptual category.

Setting out the problem
  • The mind-world relation I will be discussing
    involves picking out individuals without using an
    encoding of any of their properties and without
    representing the individuals as falling under
    some conceptual category; it is therefore a
    nonconceptual relation. [Are these individuals
    what have been referred to as "objects"?]
  • I will be describing empirical evidence for the
    existence of a mechanism, called a Visual Index
    or FINST, that instantiates this relation.
  • But first: why do we need such a relation, and
    why do we need nonconceptual contents?

An example from personal experience
  • Back in the 1970s a computer science colleague
    and I set ourselves the overly-ambitious goal of
    developing a computer system that would reason
    about geometry by actually drawing a diagram and
    noticing adventitious properties of the diagram
    from which it would conjecture lemmas to prove.
  • We wanted the system to be as psychologically
    realistic as possible, so we assumed that it had a
    narrow field of view and noticed only limited,
    spatially-restricted information as it examined
    the drawing.
  • This immediately raised the problem of
    coordinating noticings and led us to the idea of
    visual indexes to keep track of previously
    encoded parts of the diagram.

Begin by drawing a line.
Now draw a second line.
And draw a third line.
Notice what you have so far. (Noticings are local:
you encode what you attend to.)
There is an intersection of two lines. But which
of the two lines you drew are they? There is no
way to indicate which individual things are seen
again without a way to refer to individual
(token) things.
Look around some more to see what is there...
Here is another intersection of two lines. Is it
the same intersection as the one seen
earlier? Without a special way to keep track of
individuals, the only way to tell would be to
encode unique properties of each of the lines.
Which properties should you encode?
Can we keep track of previous noticings by
encoding unique properties of individual items?
  • No description can pick out a unique individual
    when things in a scene are changing or when the
    representation itself is changing for any reason
    (how about rapid updating?)
  • But a visual representation is always changing,
    since it is always built up over time as
    properties are noticed.
  • Whether or not anything is changing, we need a
    way to refer to an individual qua individual (as
    in "it's a bird, it's a plane, no, it's ...")
  • One common way of doing this is by using
    direction of gaze (equivalent to the deictic
    reference "what I am looking at now"), but we can
    also pick out individuals independently of where we
    are looking, by using focal attention.
  • An observer can also pick out several individual
    tokens even if they are in a field of identical
    tokens, e.g., pick out a dot in a uniform field
    of identical dots.

A technical way to say why it would not be
reasonable to keep updating a description as the
representation changed
A pure description can be viewed as a statement
in predicate logic in which all variables are
bound by quantifiers. Thus an assertion that
there is a unique object with property P can be represented
as the first-order logic expression $\exists! x\, P(x)$.
(The latter is short for $\exists x\, [P(x) \land \forall y\, (P(y) \rightarrow x = y)]$.)
If we augmented our visual representation
by adding clauses to a pure description, for
example when we notice that some additional
property Q holds of a certain object, we would
have to do something like the following. First
we would have to find a unique description by
which the object in question might have been
encoded earlier. Then, on the assumption that the
earlier description applies to the object that
was now noticed to have Q, we would conjoin the
stored description with the new property Q,
resulting in the augmented description
$\exists! x\, [P(x) \land Q(x)]$. If a further property, R, of
the same object is detected at some later time,
this augmented description would have to be
retrieved and checked against the current
properties of the object newly noticed to have R.
If it matches, then the description would be
updated again, resulting in the new description
$\exists! x\, [P(x) \land Q(x) \land R(x)]$. Clearly this is a grotesque
way to incrementally construct visual
representations.
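The bookkeeping just described can be sketched in Python, under assumptions of our own: a scene item is modelled as a dictionary of property values, a stored description is the set of property values it requires, and `matches` and `reidentify` are hypothetical helpers, not anything from the source. `reidentify` implements the "there is exactly one x" reading, so it fails both when a description is not unique and when a stored property has changed, even though the token itself is still present.

```python
from typing import Dict, List, Optional

Item = Dict[str, str]  # hypothetical: a scene item is just a bag of property values


def matches(description: Item, item: Item) -> bool:
    """True if every property named in the description holds of the item."""
    return all(item.get(key) == value for key, value in description.items())


def reidentify(description: Item, scene: List[Item]) -> Optional[int]:
    """Return the position of the unique item satisfying the description
    (the 'there is exactly one x' reading), or None if zero or several match."""
    hits = [i for i, item in enumerate(scene) if matches(description, item)]
    return hits[0] if len(hits) == 1 else None


scene = [
    {"kind": "line", "orientation": "vertical", "color": "black"},
    {"kind": "line", "orientation": "horizontal", "color": "black"},
    {"kind": "line", "orientation": "diagonal", "color": "black"},
]

# Stored description P(x): "the vertical line".
description = {"orientation": "vertical"}
print(reidentify(description, scene))         # 0

# Property Q is noticed of that line; conjoin it: P(x) and Q(x).
scene[0]["length"] = "long"
description = {**description, "length": "long"}
print(reidentify(description, scene))         # 0

# A property shared by every line picks out nothing uniquely.
print(reidentify({"color": "black"}, scene))  # None

# The line rotates: the stored conjunction now matches nothing,
# although the token itself is still there.
scene[0]["orientation"] = "diagonal"
print(reidentify(description, scene))         # None
```

Every new noticing forces a retrieve, match, and re-conjoin cycle, and a single property change breaks the chain.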
Keeping track by encoding unique properties of
individual items will not work in general
  • We need a mind-to-world connection that is more
    like that provided by a demonstrative or proper
    name than like that provided by a (conceptual)
    description.
  • But unlike proper names, this mechanism is only
    available while the referent is in view and, ...
  • Unlike demonstratives in language, the mechanism
    is part of the wired-in architecture and does not
    depend on the intentions of a user. (It is
    primarily data-driven.)
  • This function is very like that of a pointer or
    local variable in a computer program: it allows
    access without explicitly encoding any of the
    referent's properties, and may only be available
    inside the scope of an active function (at run
    time). (But this variable-binding is
    interrupt-driven, as in production systems.)
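The pointer analogy can be made concrete with a small Python sketch. It is illustrative only: the names `VisualObject` and `FinstPool`, the capacity of 4, and the "first free slot" assignment policy are assumptions chosen for the example (the slides say only that a small number of indexes is available). Each index holds a reference to the object, not a copy of any of its properties, so it keeps picking out the same token however that token changes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # eq=False: compare by identity, never by property values
class VisualObject:
    """A distal object; its properties may change at any time."""
    properties: Dict[str, str] = field(default_factory=dict)


class FinstPool:
    """Hypothetical pool of a few visual indexes that behave like pointers."""

    def __init__(self, capacity: int = 4) -> None:
        self.slots: List[Optional[VisualObject]] = [None] * capacity

    def grab(self, obj: VisualObject) -> Optional[int]:
        """Bind a free index to obj and return its number, or None if all are busy.
        Nothing about obj (not even its location) is read or stored."""
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = obj
                return i
        return None

    def deref(self, i: int) -> Optional[VisualObject]:
        """Follow index i to the object it picks out."""
        return self.slots[i]

    def release(self, i: int) -> None:
        """Free index i so it can be captured by another object."""
        self.slots[i] = None


pool = FinstPool(capacity=4)
dots = [VisualObject({"shape": "dot"}) for _ in range(6)]  # identical-looking tokens
handles = [pool.grab(d) for d in dots]
print(handles)                        # [0, 1, 2, 3, None, None]

dots[2].properties["color"] = "red"   # the object changes ...
print(pool.deref(2) is dots[2])       # ... but index 2 still picks it out: True
```

The fixed-size slot list is one way to express the capacity limit; the two surplus dots simply go unindexed.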

Descriptions and Visual Demonstratives bear a
very different relation to their referents
  • The sort of relation that a demonstrative bears
    to its referent is indispensable if thoughts are
    to connect with actions.
  • John Perry has written about the indispensable
    nature of all indexicals, but the case of what I
    have been calling visual demonstratives is even
    more compelling.

Perry, J. (1979). The problem of the essential
indexical. Noûs, 13, 3-21.
The difference between a description and a
demonstrative (or direct) reference, and the
indispensability of the latter, is illustrated by
this example from John Perry's "The Problem of the
Essential Indexical". The author of the book Hiker's Guide
to the Desolation Wilderness stands in the
wilderness beside Gilmore Lake, looking at the
Mt. Tallac trail as it leaves the lake and climbs
the mountain. He desires to leave the
wilderness. He believes that the best way out
from Gilmore Lake is to follow the Mt. Tallac
trail up the mountain. But he doesn't move. He
is lost. He is not sure whether he is standing
beside Gilmore Lake, looking at Mt. Tallac, or
beside Clyde Lake, looking at the Maggie peaks.
Then he begins to move along the Mt. Tallac
trail. If asked, he would have to explain the
crucial change in his beliefs in this way: "I
came to believe that this is the Mt. Tallac trail
and that is Gilmore Lake." (Perry, 1979, p.
4) The person in this story recognized the
identity of something that was being referred to
in two different ways: by a description and by
direct selection, expressed by the demonstrative
"this". These are two very different ways of
picking something out.
Another example of why descriptions will not work
in general and why you need demonstrative
Footnote about the geometry example
  • Notice that in our geometry example, it would not
    eliminate the need for a nonconceptual index if
    you labeled parts of the diagram as you drew
    them. Why not?
  • Because to refer to the line with label L1 you
    would have to be able to think "This is line L1",
    and you could not think that unless you had a
    mechanism for picking out "this".
  • Being able to think "this" is another way to view
    the very problem for which indexes are
    postulated. You still need a mechanism for
    picking out and referring to an individual
    element qua individual, even if it is labeled!
  • That is the point of John Perry's claim about the
    essential indexical: in order to act on what
    you see, you need to bridge the gap from a
    reference (description or name) to an individual
    token thing, and this bridge is not conceptual.

Different types of mind-world relations
  • Two distinct types of mind-world connections:
  • The nonconceptual connection: cause (selection)
  • The semantic connection: satisfaction (reference)
  • The problem of how we make the transition from
    physical cause to meaning/reference is one of the
    great mysteries of mind (Brentano's Problem).
  • I address a (very small) issue related to that
    problem by suggesting that perception must be
    able to preconceptually pick out individuals
    (i.e., without using concepts) and that the
    mechanism for doing this, the Visual Index or
    FINST, provides a first step in the mind-world
    connection.
Why do we need to be able to pick out individuals
without concepts?
  • We need to make nonconceptual contact with the
    world through perception in order to stop the
    regress of concepts being defined in terms of
    other concepts, which are defined in terms of
    still other concepts. This is known as the
    Grounding Problem. (For more on this see Fodor's
    1998 book Concepts or his paper "Revenge of the ...")
  • The question of where to stop has received
    different answers from different philosophical
    schools. But sense data (sensory transduction)
    by itself will not work, because most concepts
    cannot be reduced to sense data, since they are
    not about how things look.
  • Our candidate is individuals as the forerunner of
    conceptualization and predication, and Picking Out
    as the basic operation that brings these individuals
    into contact with cognition.
  • Is an individual the same as an object?

Picking out in cognitive science
  • The sort of picking out that is relevant has
    been studied in psychology under the heading of
    focal attention.
  • An important recent development in the study of
    focal attention is the finding that attention is
    object-based: the first contact between
    world and mind is through selection of visual
    objects rather than places or properties.
  • Note: attention also appears to select empty
    places when it moves from one object to another,
    but there is reason to think that this is not the
    norm and is not what happens when attention is
    called up exogenously by a visual event.
  • Another important finding is that there appears
    to be a mechanism, which is a precursor of focal
    attention, that can select up to 5 objects at
    once.
  • This is the mechanism I call a Visual Index or a
    FINST. I am about to describe where it came
    from, what it does, and some of its
    empirically-based properties.

The requirements for picking out individual
things and keeping track of them reminded me of
an early comic book character called Plastic Man.
Imagine being able to place several of your
fingers on things in the world without being able
to detect their properties in this way, but being
able to refer to those things so you could move
your gaze or attention to them. If you could, you
would possess FINgers of INSTantiation (FINSTs)!
FINST Theory postulates a limited number of
pointers in early vision that are elicited by
causal events in the visual field and that enable
vision to refer to things without doing so under
a concept or a description.
Demonstrating FINSTs with Multiple Object
Tracking (MOT)
  • MOT has now been used in dozens of laboratories
    in many countries and in many different variants.
    A great deal is known about the conditions under
    which tracking is possible, and many
    counterintuitive findings have been demonstrated,
    many of which raise issues of interest to
    philosophy, but most of these have to be left
    for another occasion (time!).

Demonstrating the function of FINSTs
with Multiple Object Tracking (MOT)
  • In a typical experiment, 8 simple identical
    objects are presented on a screen and 4 of them
    are briefly distinguished in some visual manner,
    usually by flashing them on and off.
  • After these 4 targets are briefly identified, all
    objects resume their identical appearance and
    move randomly. The observer's task is to keep
    track of the ones designated as targets.
  • After a period of 5-10 seconds the motion stops
    and subjects must indicate, using a mouse, which
    objects were the targets.
  • People are very good at this task (85-98%
    correct). The question is: how do they do it?

Keep track of the objects that flash
How do we do it? What properties of individual
objects do we use?
Another example: self-occlusion
Self-occlusion does not seriously impair
tracking. This has made it easier to design
certain experiments where the trajectory patterns
need to be independent.
Going behind occluding surfaces does not disrupt
tracking
Not all well-defined features can be
tracked
Track the endpoints of these lines
Endpoints move exactly as the squares did!
Analyzing Multiple Object Tracking
  • Basic finding: most people (even many 5-year-old
    children) can track at least 4 individual objects
    that have no unique visual properties.
  • How is it done?
  • We have shown that it is unlikely that the
    tracking is done by keeping a record of the
    targets' locations (the only unique instantaneous
    target property) and updating it while serially
    visiting the objects.
  • We proposed that tracking uses the primitive
    mechanism of Visual Indexes or FINSTs.
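For contrast, here is a toy Python simulation of the strategy the slide argues against: keep a record of each target's last-seen location and, visiting one target per time step, reset that record to whichever object is currently nearest to it. Everything in it (the 10 x 10 display, fixed-speed random steps, round-robin visits, nearest-neighbour matching, and the function names) is an illustrative assumption, not a model from the source or from the cited experiments. It only makes explicit how such a scheme depends entirely on stored locations and can lose targets when objects move between visits.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
BOX = 10.0  # side of the toy display (arbitrary units)


def step(positions: List[Point], speed: float, rng: random.Random) -> List[Point]:
    """Move every object `speed` units in a random direction, clamped to the box."""
    moved = []
    for x, y in positions:
        angle = rng.uniform(0.0, 2.0 * math.pi)
        nx = min(max(x + speed * math.cos(angle), 0.0), BOX)
        ny = min(max(y + speed * math.sin(angle), 0.0), BOX)
        moved.append((nx, ny))
    return moved


def nearest(p: Point, positions: List[Point]) -> int:
    """Index of the object closest to the remembered location p."""
    return min(range(len(positions)), key=lambda i: math.dist(p, positions[i]))


def serial_location_tracking(n_objects: int = 8, n_targets: int = 4, steps: int = 40,
                             speed: float = 0.3, seed: int = 0) -> float:
    """Toy trial: return the fraction of targets reported correctly at the end."""
    rng = random.Random(seed)
    positions = [(rng.uniform(0.0, BOX), rng.uniform(0.0, BOX)) for _ in range(n_objects)]
    targets = set(range(n_targets))                    # objects 0..n_targets-1 are targets
    stored = [positions[i] for i in range(n_targets)]  # one location record per target
    for t in range(steps):
        positions = step(positions, speed, rng)
        k = t % n_targets                              # visit one target record per step
        stored[k] = positions[nearest(stored[k], positions)]
    reported = {nearest(s, positions) for s in stored}
    return len(reported & targets) / n_targets


print(serial_location_tracking(speed=0.0))  # 1.0: nothing moves, so the records stay exact
for speed in (0.3, 1.0, 2.0):
    print(speed, serial_location_tracking(speed=speed, seed=1))
```

Sweeping `speed` is one way to explore the scheme's dependence on how far objects travel between visits; the sketch makes no claim about human performance.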

Summarizing FINSTs
  • A FINST is a primitive reference mechanism that
    normally refers to individual visible objects in
    the world. There is a small number (4-5) of FINSTs
    available at any one time.
  • Objects are picked out and referred to without
    using any encoding of their properties, including
    their location. Picking out objects is prior
    to encoding any properties!
  • Indexing is nonconceptual because it does not
    represent an individual as a member of some
    conceptual category.
  • An important function of FINST indexes is to bind
    arguments of visual predicates to things in the
    world to which they refer. Only predicates with
    bound arguments can be evaluated. Since
    predicates are quintessential concepts, an index
    serves as a bridge from nonconceptual to
    conceptual representations.
  • Similarly, they can bind arguments of motor
    commands, including the command to move focal
    attention or gaze to the indexed object.
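A minimal Python sketch of the argument-binding idea, under assumptions of our own: indexes are integer keys in a dictionary, a visual predicate is an ordinary function of an object, and a predicate can be evaluated only after its index has been bound. `bind`, `referent`, `evaluate`, `move_attention_to`, and `is_red` are hypothetical names, not part of any published FINST implementation.

```python
from typing import Callable, Dict, Optional

Obj = Dict[str, str]  # hypothetical stand-in for a scene object

# Four indexes, initially unbound (pointing at nothing).
bindings: Dict[int, Optional[Obj]] = {i: None for i in range(4)}


def bind(index: int, obj: Obj) -> None:
    """Attach an index to an object (in the theory, caused by a visual event)."""
    bindings[index] = obj


def referent(index: int) -> Obj:
    """Dereference an index; an unbound index has no referent."""
    obj = bindings[index]
    if obj is None:
        raise ValueError(f"index {index} is unbound")
    return obj


def evaluate(predicate: Callable[[Obj], bool], index: int) -> bool:
    """A visual predicate can be evaluated only once its argument is bound."""
    return predicate(referent(index))


def move_attention_to(index: int) -> Obj:
    """A motor/attention command whose argument is an index, not a description."""
    return referent(index)


def is_red(obj: Obj) -> bool:
    return obj.get("color") == "red"


bind(0, {"color": "red", "shape": "square"})
print(evaluate(is_red, 0))            # True
print(move_attention_to(0)["shape"])  # square
try:
    evaluate(is_red, 1)               # index 1 has not been bound
except ValueError as err:
    print(err)                        # index 1 is unbound
```

Both the predicate and the motor command take the index as their argument, which is the sense in which the index bridges nonconceptual reference and conceptual evaluation.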

FINSTs are a mechanism for picking out individual
distal elements directly, as token sensory
individuals, rather than as bearers of some known
properties
  • Examples where such a mechanism is needed:
  • Incremental construction of visual
    representations: the correspondence problem over
    time (geometry example)
  • We can pick out several individuals in a field of
    identical elements: attentional selection is
    different from discrimination

Being able to pick out individual distal elements
directly is essential for many visual functions
  • Other examples where such a mechanism is needed:
  • Encoding relational predicates, e.g., Collinear(x,
    y, z, ...), Closed(C), Inside(x, C), Above(x, y),
    Square(w, x, y, z), requires simultaneously
    binding the arguments of n-place predicates to n
    elements in the visual scene.
  • Evaluating such visual predicates requires
    individuating and referring to the objects over
    which the predicate is evaluated; i.e., the
    arguments in the predicate must be bound to
    individual elements in the scene.
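An n-place predicate needs n simultaneously bound arguments. The Python sketch below evaluates a three-place `collinear` predicate and a two-place `above` predicate over indexed elements. The coordinate representation, the cross-product collinearity test, the tolerance `tol`, and all names are assumptions for the example. The predicates do read coordinates, but only after the indexes have already fixed which elements the arguments are; rebinding one index changes the answer without any description being consulted.

```python
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical scene: element id -> (x, y), with y growing upward.
scene: Dict[str, Point] = {"a": (0.0, 0.0), "b": (1.0, 1.0), "c": (2.0, 2.0), "d": (2.0, 0.0)}

# Three indexes bound to three scene elements (n bindings for an n-place predicate).
finsts: Dict[int, str] = {0: "a", 1: "b", 2: "c"}


def collinear(p: Point, q: Point, r: Point, tol: float = 1e-9) -> bool:
    """Three points are collinear iff the cross product of (q - p) and (r - p) is ~0."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(cross) <= tol


def above(p: Point, q: Point) -> bool:
    """True if p is higher than q."""
    return p[1] > q[1]


def evaluate_on_indexes(predicate: Callable[..., bool], indexes: Sequence[int]) -> bool:
    """Bind each argument position to the element its index refers to, then evaluate."""
    args = [scene[finsts[i]] for i in indexes]
    return predicate(*args)


print(evaluate_on_indexes(collinear, [0, 1, 2]))  # True: a, b, c lie on one line
finsts[2] = "d"                                    # rebind index 2 to another element
print(evaluate_on_indexes(collinear, [0, 1, 2]))  # False
print(evaluate_on_indexes(above, [1, 2]))          # True: b (y=1) is above d (y=0)
```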

Pick out 3 dots and keep track of them
  • In a field of identical elements you can select a
    number of them and move your attention among them
    (e.g., "move one up" or "move 2 right", etc.), so
    long as at no time do you have to hold on to more
    than 4 dots.

Picking out is different from discriminating
Pick out the third contour from the left
Several objects must be picked out at once in
making relational judgments
When we judge that certain objects are
collinear, we must pick out the relevant
individual objects first
Several objects must be picked out at once in
making relational judgments
  • The same is true for other relational judgments,
    like inside or on-the-same-contour. We must
    pick out the relevant individual objects first.
    Respond: Inside same contour? On same contour?

More functions of FINSTs
Further experimental explorations using different paradigms
  • Recognizing the cardinality of small sets of
    things without using sortals: subitizing vs.
    counting
  • Selecting subsets: selecting items to search
  • Selecting subsets and holding on to them during a
    saccade
  • Application of FINST index theory to infant
    cardinality studies (Leslie, Carey, Spelke, etc.)
    and to the acquisition of words/names by
    ostensive definitions. These will not be
    discussed here.

Subitizing vs. Counting
How many squares are there?
Subitizing indexed objects is fast, accurate and
(relatively) independent of how many items there
are. But a prerequisite for subitizing is being
able to pick out the relevant individuals. Only
the squares on the right can be subitized, because
picking out concentric items requires serial
processing.
Concentric squares cannot be subitized because
individuating them requires the serial operation
of curve tracing.
Signature subitizing phenomena only appear when
objects are automatically individuated and indexed
[Figure: counting slope vs. subitizing slope]
Trick, L. M., & Pylyshyn, Z. W. (1994). Why are
small and large numbers enumerated differently? A
limited capacity preattentive stage in vision.
Psychological Review, 101(1), 80-102.
Example of the operation of Visual Indexes
Subset selection for search
Burkell, J., & Pylyshyn, Z. W. (1997). Searching
through subsets: A test of the visual indexing
hypothesis. Spatial Vision, 11(2), 225-258.
Subset search results
  • Only properties of the subset matter, but note
    that properties of the entire subset are taken
    into account simultaneously (since that is what
    distinguishes a feature search from a conjunction
    search)
  • If the subset is a single-feature search set, it is
    fast and the slope (RT vs. number of items) is ...
  • If the subset is a conjunction search set, it
    takes longer and is more sensitive to the set
    size
  • The distance among the targets does not matter,
    so observers don't seem to be scanning the
    display looking for the target

The stability of the visual world entails the
capacity to reidentify individuals after a saccade
  • There is no problem about how tactile selection
    can provide a stable world when you move around
    while keeping your fingers on the same objects
    because in that case retaining individual
    identity is automatic
  • But with FINSTs the same can be true of vision
    for a small number of visual objects
  • This is compatible with the finding that one appears
    to retain the relative locations of only about 4
    elements during saccadic eye movements (Irwin,
    1996).
Irwin, D. E. (1996). Integrating
information across saccadic eye movements.
Current Directions in Psychological Science,
5(3), 94-100.

The selective search experiment with a saccade
induced between the late-onset cues and the start of
search
Even with a saccade between selection and access,
items can be accessed efficiently
Must we encode location when we detect the
presence of a property?
  • Many researchers claim that detecting a feature
    entails detecting it as being at some particular
    location. The assumption is that this location
    information is used to detect conjunctions of
    properties (Nissen, 1985). This is implicit in
    Treisman's Feature-Integration Theory.
  • Discussions (by psychologists and by
    philosophers) of how vision
    primitively selects things in the world typically
    confound individuals and locations.
  • Experiments mostly use static items, which
    confounds location and individuality. When
    moving items are used (as in MOT), the
    individual-object option usually wins over the
    location option; i.e., we detect a property as
    belonging to an object rather than as being at a
    particular location. We have also demonstrated
    this using generalized "objects" that move through
    a property space without changing location.

The view that we must encode location when we
detect a property is also the standard view in
philosophy
  • Austen Clark (in A Theory of Sentience),
    following the tradition of Quine and Strawson,
    also assumes that location is primary and that in
    our most primitive nonconceptual sensory contact
    with the world, which he calls the level of
    "sentience", the only resources available are
    those of what Strawson called a "feature-placing
    language": our sensory system detects the
    presence of "Feature F at location L".
  • Clark argues that because we can distinguish
    conjunctions (e.g., we can distinguish a red
    square beside a blue circle from a blue square
    beside a red circle), the earliest stages of
    sensation must provide this information in a way
    that does not merge properties and their
    locations; hence feature-at-location.
  • But we can do the same with objects: we can
    evaluate and record $P_n(O_i)$ for some sensory
    predicate $P_n$ so long as the variable $O_i$ is bound
    to the object i by an index.
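The contrast between Clark's feature-at-location format and object-indexed predication can be illustrated with a toy Python sketch. The data structures are assumptions for illustration: feature placing is modelled as a set of (feature, location) pairs, and the object-based alternative as a set of (feature, index) pairs in which no location appears; index numbers follow listing order, an arbitrary choice that does not affect the result. Both formats keep the two displays from the slide apart, while a representation that merges features loses the distinction.

```python
from typing import FrozenSet, List, Set, Tuple

Display = List[Tuple[str, Set[str]]]  # (location, features) for each object


def merged(display: Display) -> FrozenSet[str]:
    """A representation that has lost the binding: just the bag of features."""
    return frozenset(feat for _loc, feats in display for feat in feats)


def feature_placing(display: Display) -> FrozenSet[Tuple[str, str]]:
    """Clark-style 'feature F at location L' pairs."""
    return frozenset((feat, loc) for loc, feats in display for feat in feats)


def object_predication(display: Display) -> FrozenSet[Tuple[str, int]]:
    """'P_n(O_i)' pairs: feature n predicated of whatever index i is bound to.
    No location is recorded."""
    return frozenset(
        (feat, i) for i, (_loc, feats) in enumerate(display) for feat in feats
    )


display_1: Display = [("left", {"red", "square"}), ("right", {"blue", "circle"})]
display_2: Display = [("left", {"blue", "square"}), ("right", {"red", "circle"})]

print(merged(display_1) == merged(display_2))                          # True
print(feature_placing(display_1) == feature_placing(display_2))        # False
print(object_predication(display_1) == object_predication(display_2))  # False
```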

How can we unconfound locations and individuals
in order to study the role of individuals in
property encoding?
  • Study moving objects as in MOT experiments, and
    in a number of other paradigms (e.g., IOR and
    Object File priming studies).
  • Continually updating locations by moving
    attention to each object in MOT is ruled out
    (Pylyshyn & Storm, 1988).
  • Study more general types of "objects" that
    remain fixed in the same spatial locus but move
    through a space of properties. In that case,
    encoding location would not help you to keep
    track of the individuals (Blaser, Pylyshyn, &
    Holcombe, 2000).
Superimposed Gabor patches
Blaser, E., Pylyshyn, Z. W., & Holcombe, A. O.
(2000). Tracking an object through feature-space.
Nature, 408(Nov 9), 196-199.
[Figure: changing feature dimensions; surfaces in feature space. Trajectories are pseudo-random and independent, with frequent changes in speed and direction; the Gabors frequently "pass" each other along a ...; snapshots taken every 250 msec.]
  1. People are able to track this fixed-location
    object, and
  2. A single-object advantage is obtained.

The FINST Object File account of MOT
  • Object Files and object-specific priming
    (Kahneman & Treisman, 1992)
  • Object-file demo
  • Priming is not increased if objects are
    physically the same: Gordon, R. D., & Irwin, D.
    E. (2000). The role of physical and conceptual
    properties in preserving object continuity.
    Journal of Experimental Psychology: Learning,
    Memory, and Cognition, 26(1), 136-150.

The broader relevance of MOT and the Theory of
Object Files and FINSTs
  • Why is this of interest to cognitive science?
  • What it means for philosophical issues about
  • Are concepts (and in particular sortals) and
    conditions of identity essential for
    individuating and keeping track of individuals?
  • What this means for understanding the nature of
    sentience and the boundary between sensation and

A way of viewing what goes on in MOT
  • According to Kahneman and Treisman's Object File
    account, the appearance of certain new objects
    causes Object Files to be created for those
    objects. Each object file is attached to its
    respective object by a FINST Index.
  • What makes something the same object over time is
    that it remains connected to the same object-file
    (by the same FINST). Thus, being the same
    individual in this sense does not require
    property encoding or conceptualization.
  • The object file may contain information about the
    object to which it is attached, but keeping track
    of the object's identity does not require the use
    of this information. In the case of MOT the
    evidence suggests that little or nothing is
    stored in the object file nor used in tracking.
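A minimal Python sketch of the object-file account as summarized above. Assumptions: an `ObjectFile` is a record created when a new object appears, the FINST is modelled as a direct reference from the file to the object, and `Thing`, `on_new_object`, and `same_individual` are hypothetical names. Sameness over time is read off the attachment alone; the file's `notes` are never consulted, in line with the claim that tracking need not use what is stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)
class Thing:
    """A distal object whose properties may change."""
    props: Dict[str, str]


@dataclass(eq=False)
class ObjectFile:
    """Record created when a new object appears; `finst` points at the object."""
    finst: Thing
    notes: Dict[str, str] = field(default_factory=dict)  # optional; unused for tracking


def on_new_object(thing: Thing, files: List[ObjectFile]) -> ObjectFile:
    """The appearance of a new object causes a file to be opened and attached to it."""
    f = ObjectFile(finst=thing)
    files.append(f)
    return f


def same_individual(f: ObjectFile, thing: Thing) -> bool:
    """Identity over time = still attached to the same file; no property comparison."""
    return f.finst is thing


files: List[ObjectFile] = []
a, b = Thing({"color": "red"}), Thing({"color": "red"})  # physically identical
fa = on_new_object(a, files)
fb = on_new_object(b, files)

a.props["color"] = "green"      # a changes ...
print(same_individual(fa, a))   # True: still the same individual
print(same_individual(fa, b))   # False, although b now looks as a once did
```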

What role do object properties play in MOT?
  • Certain properties may have to be present in
    order for an object file to be created, and
    certain properties (very likely different
    properties) may be required in order for the
    index to keep track of the object, but this does
    not entail that such properties need to be
    encoded, stored in the object file, or used in
    tracking.
  • Compare this with Kripke's distinction between
    the properties that fix the referent (e.g., of a proper
    name) and what the name refers to. The former
    plays a role only at the name's initial baptism.

Why is this relevant to foundational questions in
the philosophy of mind?
  • You cannot pick out and individuate objects
    without concepts (cf. Quine, Strawson, etc.), but
  • You cannot pick out and individuate objects with
    ONLY concepts
  • Sooner or later you have to ground the concepts
    in purely causal connections between thoughts and
  • The question of how this is accomplished has been
    a puzzle in psychology and philosophy of mind
    (though most psychologists and AI people tacitly
    assume the sense-data view, that concepts are grounded
    in sensor outputs)
  • The present proposal is that FINSTs provide a
    nonconceptual mechanism for individuating objects
    and for tracking their identity that works most
    of the time in our kind of world

Marr's natural constraints
  • The idea that perceptual mechanisms are wired up
    (by evolutionary forces) to perform a required
    function rapidly and accurately, but only in the
    sort of world in which we happen to live, is an
    old idea that was made popular in the 1970s by
    David Marr under the name "natural constraints".
  • Marr showed that although the mapping from 3D to
    2D is not invertible, certain constraints that
    reflect the physical structure of our world can
    be built into the visual system so that the correct
    3D shape is almost always recovered veridically
    from 2D information in our kind of world.
  • Similarly, the Visual Index (FINST) hypothesis
    postulates a mechanism that picks out and keeps
    track of physical objects almost always correctly
    in our kind of world.
  • The same may be true for such abstract concepts
    as cause!

Schema for how FINSTs function
Still to come...
  • The psychology of selection: focal attention
  • Why must we select?
  • The binding problem
  • The psychology of spatial representation: is
    there a spatial display in the head?
  • What role does the experience of space play in
    our science?
  • Where do the spatial properties of mental
    representations come from?

There is an important side-effect of this view
for understanding how we represent space
  • The existence of amodal indexes explains how
    mental representations of space (e.g., in spatial
    mental images) can exhibit spatial properties
    without having to postulate an internal space in
    the brain.
  • A quick review

What does the experience of spatial layout tell
us about spatial representation?
  • Does it contain a rich texture of details and
    spatial relations?
  • Does it tell us about the format of our
    representation of space?
  • Does it tell us that we represent a panorama of
    objects fixed in space whose locations obey
    Euclidean axioms?

Failure of the picture-theory in vision
  • The picture theory was meant to explain
    why our experience is panoramic, fine-grained and
    stable while the visual inputs are highly local,
    partial and constantly changing; viz., we create a
    representation that reconstructs and fills in
    these properties
  • But the picture theory of spatial vision has been
    thoroughly discredited: there is no rich
    panoramic pictorial display available to
    cognition (e.g., see change blindness,
    superposition studies)

What is the problem of spatial representation?
  • We are so familiar with space and also with
    certain ways of thinking about space (inherited
    from Euclid and Descartes) that formulating the
    problem for purposes of understanding the
    psychology of spatial representation is a large
    part of the problem itself
  • We find it natural to think in terms of points
    and lines, but we never actually see Euclidean
    points or lines
  • Our intuitions about the content of spatial
    representation are seriously mistaken. Evidence
    shows that we take in and conceptualize very
    little of the spatial content of a scene.

The genesis of our sense of space
  • An important (and essentially modern) way of
    viewing this problem was provided by Henri
    Poincaré in one of his Last Essays, "Why space has
    three dimensions"
  • Poincaré's insight: the 3D space that we sense is
    intimately tied to our ability to act toward
    things located in it (details are beyond the
    scope of this talk)
  • This includes our multimodal perception of
    objects and of our own bodies
  • It assumes only that we can distinguish between
    sensing locations and sensing other qualities,
    and we can distinguish independent movements of
    objects and movements that we produce

Representations constructed in thought: mental
images and their connection to the perceived world
  • Intuitions about the form of mental
    representations play their most tempting and
    misleading role in the case of spatial imagery.
    They raise the intentional fallacy to its most
    seductive level: the fallacy that confuses content with
    form, properties of what is represented with
    properties of the representation.
  • I will confine my remarks to the case of the
    spatial properties of images because it is there
    that the FINST theory can, with some elaboration
    to cover modalities other than vision, play a
    decisive role.

A typical argument in support of the picture
theory: (1) activity in the visual cortex of primates
is retinotopic (see figure); (2) there is activity in
the visual cortex during mental imagery; ergo,
mental images are laid out spatially in visual
cortex.
Tootell, R. B., Silverman, M. S., Switkes, E., &
de Valois, R. L. (1982). Deoxyglucose analysis of
retinotopic organization in primate striate
cortex. Science, 218(4575), 902-904.
Some spatial imagery phenomena which suggest that
some images are represented spatially (i.e., by a
representational code or format that is itself spatial)
  • Image size effects
  • Image scanning effects
  • Functional space
    (these three are unconstrained by the
    architecture of mind/brain)
  • Combining images and perception
  • Podgorny & Shepard study
  • Imagery and visuomotor coordination
  • Adaptation to imagined hand position (Finke, …)
  • S-R compatibility and the Simon Effect (Tlauka, …)
  • Visual/imaginal neglect (Bisiach, 1978)

Image size effects
  • (1) Imagine a very small mouse across the room, OR
  • (2) Imagine a large mouse in your hand
  • Indicate when you can see its whiskers
  • Result: (2) is faster than (1)!
  • But what if you found that (1) was faster? What
    would you conclude?

The Intentional Fallacy strikes again!
Studies of mental scanning: Do they show that
images have metrical space?
(Pylyshyn & Bannon; see Pylyshyn, 1981)
Visual illusions with projected images
(Bernbaum & Chung, 1981)
Alternative explanations include response bias
and attentional allocation (which may be
responsible for the visual illusion as well).
Shepard & Podgorny experiment
Both when the displays are seen and when the F is
imagined, reaction time (RT) to say whether the dot
was on the F was fastest when the dot was at the
vertex of the F, next when it was on an arm of the F,
next when it was far away from the F, and slowest
when it was one square off the F.
Perceptual-motor adaptation to imagined hand position
  • If you wear displacing prism lenses and reach for
    objects in front of you for just a few minutes,
    you adapt to the erroneous feedback. When the
    lenses are removed you overshoot in the opposite
    direction.
  • If, instead of wearing lenses, you are told to
    imagine that your unseen hand is at the same
    location where the lens-wearing subjects saw their
    hand, you get the same adaptation and
    post-adaptation overshoot phenomena (Finke, R. A.
    (1979). The functional equivalence of mental
    images and errors of movement. Cognitive
    Psychology, 11, 235-264).

S-R Compatibility effect with a visual display
S-R Compatibility effect with a mental image
Tlauka, M., & McKenna, F. P. (1998). Mental
imagery yields stimulus-response compatibility.
Acta Psychologica, 67-79.
But we don't need a spatial display in our head
if we have a way to pick out and keep track of a
few stable perceived objects in real space
  1. All the experiments that are alleged to show the
    existence of a spatial display in visual cortex
    refer to only a small number of imagined objects.
  2. If we can pick out a small number of (occupied)
    locations in real space we can use these to
    allocate attention or to control direction of
    gaze. This by itself can explain what goes on in
    mental scanning, the Shepard & Podgorny study, and
    similar experiments involving superimposing
    images onto vision.
  3. If the relevant imagined objects are bound to
    selected perceived objects in the world, the
    mental objects would inherit persisting spatial
    locations and spatial relations (e.g., they would
    obey the metrical Euclidean axioms!).
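Point 3 can be made concrete with a small sketch. It is purely illustrative and not a model taken from the source: the scene coordinates, the landmark names (`lighthouse`, `hut`, `tree`), the `BINDING` table, and the constant `SCAN_SPEED` are all hypothetical. Each imagined object stores no location of its own; it only points at an indexed perceived object, so every distance is read off real space and the metrical axioms (here, the triangle inequality) hold without any spatial display in the head. Assuming, further, that shifting attention between indexed places takes time proportional to distance, predicted scanning times come out proportional to distance as well.

```python
from __future__ import annotations

import math
from itertools import permutations

# Hypothetical perceived scene: real-space locations of three indexed objects.
PERCEIVED = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 0.0)}

# Hypothetical bindings of imagined landmarks (e.g., on an imagined map)
# to those perceived, indexed objects.
BINDING = {"lighthouse": "A", "hut": "B", "tree": "C"}

# Assumed constant rate of shifting attention or gaze (scene units per second).
SCAN_SPEED = 10.0


def imagined_location(name: str) -> tuple[float, float]:
    """An imagined object has no location of its own; it inherits its bound object's."""
    return PERCEIVED[BINDING[name]]


def imagined_distance(a: str, b: str) -> float:
    """Distance between two imagined objects, read off the perceived world."""
    return math.dist(imagined_location(a), imagined_location(b))


def scan_time(a: str, b: str) -> float:
    """Predicted 'mental scanning' time: an attention shift between real indexed places."""
    return imagined_distance(a, b) / SCAN_SPEED


if __name__ == "__main__":
    # Because distances come from real space, the triangle inequality holds for free.
    for x, y, z in permutations(BINDING, 3):
        assert imagined_distance(x, z) <= imagined_distance(x, y) + imagined_distance(y, z)
    print(scan_time("lighthouse", "hut"))   # 0.5
    print(scan_time("lighthouse", "tree"))  # 0.6
```

Nothing in the sketch constrains the imagined objects to obey geometry; the metric comes entirely from the perceived scene they are bound to.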

We don't need a spatial display in our head...
  • Indexing perceived objects would turn imagery
    experiments into perception experiments insofar as
    their spatial properties are concerned, with the
    locations of perceived objects in real space
    governing the imagery phenomena. For example:
  • Mental scanning is really scanning between real
    indexed places
  • S-R compatibility to images is really a
    visuomotor compatibility to real (indexed)
    objects in visual space.
  • A mental image of your hand can induce visuomotor
    adaptation because only the location of the
    imagined hand, not its appearance, is relevant.
    All you need is a conflict between felt position
    and where an index, taken as your hand, is bound.
  • Hemispatial neglect is a deficit in orienting
    attention to real locations; that's why it can
    sometimes be mirrored in imagery, i.e., patients
    may fail to orient to the indexed objects that
    were in the neglected hemifield.
  • Finally, all imagery effects occur in the blind!

  • I have presented a view of how conceptual
    representations might be grounded in (bound to)
    nonconceptually selected (causally efficacious)
    properties of the world.
  • I propose that we have a limited-capacity
    mechanism that picks out a few sensory
    individuals: individuals that, in our kind of
    world, usually turn out to be real objects
  • While tracking (and re-identifying) individuals
    generally requires an apparatus of individuation
    and of identity, perceptual systems are equipped
    to approximate this function, which is generally
    veridical in our kind of world.
  • Picking out is the function of Visual Indexes or
    FINSTs. It is closely related to focal attention,
    but is able to select several (4-5) objects (not
    places). Thus sentience does not rest on a
    feature-placing language but on object selection,
    since only occupied places have causal powers!
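The limited-capacity indexing idea can likewise be sketched as a tiny data structure. This is a hypothetical illustration, not an implementation from the FINST literature: the class names, the `grab`/`release`/`deref` methods, and the capacity of 4 (picked from the "4-5" above) are assumptions. What it tries to capture is that an index can only be captured by an occupied item, never by an empty place; that there are only a few indexes; and that an index gives demonstrative access to the object itself rather than to a stored description of it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical capacity: the slides say FINSTs can select "several (4-5)" objects.
MAX_INDEXES = 4


@dataclass
class SceneItem:
    """A candidate individual in the visual field (hypothetical model)."""
    name: str
    location: tuple[float, float]
    occupied: bool = True  # an empty place has no causal power to grab an index


@dataclass
class IndexPool:
    """A small, fixed pool of visual indexes, each bound to at most one item."""
    capacity: int = MAX_INDEXES
    bindings: dict[int, SceneItem] = field(default_factory=dict)

    def grab(self, item: SceneItem) -> Optional[int]:
        """Bind a free index to an occupied item; return the index id, or None."""
        if not item.occupied:
            return None  # nothing there to capture an index
        if len(self.bindings) >= self.capacity:
            return None  # limited capacity: every index is already in use
        free_id = next(i for i in range(self.capacity) if i not in self.bindings)
        self.bindings[free_id] = item
        return free_id

    def release(self, idx: int) -> None:
        """Free an index so that another object can capture it."""
        self.bindings.pop(idx, None)

    def deref(self, idx: int) -> SceneItem:
        """Demonstrative access: the index yields the object, not a description of it."""
        return self.bindings[idx]


if __name__ == "__main__":
    pool = IndexPool()
    items = [SceneItem(f"obj{i}", (float(i), 0.0)) for i in range(6)]
    print([pool.grab(it) for it in items])  # [0, 1, 2, 3, None, None]
    empty = SceneItem("empty place", (9.0, 9.0), occupied=False)
    pool.release(0)
    print(pool.grab(empty))  # None: an empty place cannot capture the freed index
```

Properties are only ever read through `deref`, i.e., as properties of an indexed individual, which mirrors the claim that properties are encoded as properties of an individual object.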

  • This view has consequences for a theory of
    sentience. It contrasts with Austen Clark's by
    replacing the feature-placing assumption with the
    assumption that we only detect properties of
    indexed objects and not of empty places.
  • This view also has consequences for understanding
    how we can represent space nonconceptually.
  • It claims that, notwithstanding our subjective
    experience, our nonconceptual representation of
    space is not fine-grained, densely articulated,
    and inherently Euclidean (even locally). (It
    thus denies Peacocke's scenario content.)
  • It proposes that we only pick out
    causally-efficacious individuals, which we refer
    to demonstratively without using a conceptual
    representation of any of their properties.
  • It assumes that only properties of indexed
    objects are conceptually encoded (in other words,
    properties are always encoded as properties of an
    individual object)

  • Finally, it connects with my previous deflationary
    claims about the format of mental images by
    suggesting that our representation of space does
    not have intrinsic spatial properties, but
    inherits them by binding objects of thought to
    indexed objects in the concurrently perceived
    world. It is this perceived world, rather than
    the image, that has the stable spatial properties!

For more on these topics buy this book, published
in December 2003 by MIT Press.
Other relevant references:
Burkell, J., & Pylyshyn, Z. W. (1997). Searching through subsets: A test of the visual indexing hypothesis. Spatial Vision, 11(2), 225-258.
Scholl, B. J., & Pylyshyn, Z. W. (1999). Tracking multiple items through occlusion: Clues to visual objecthood. Cognitive Psychology, 38(2), 259-290.
Scholl, B. J., Pylyshyn, Z. W., & Feldman, J. (2001). What is a visual object? Evidence from target-merging in multiple-object tracking. Cognition, 80, 159-177.
Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.
Pylyshyn, Z. W., & Storm, R. W. (1988). Tracking multiple independent targets: Evidence for a parallel tracking mechanism. Spatial Vision, 3(3), 1-19.
Pylyshyn, Z. W. (2000). Situating vision in the world. Trends in Cognitive Sciences, 4(5), 197-207.
Pylyshyn, Z. W. (2001). Connecting vision and the world: Tracking the missing link. In J. Branquinho (Ed.), The Foundations of Cognitive Science (pp. 183-195). Oxford, UK: Clarendon Press.
Pylyshyn, Z. W. (2001). Visual indexes, preconceptual objects, and situated vision. Cognition, 80(1/2), 127-8.
Pylyshyn, Z. W. (2004). Tracking without keeping track: Some puzzling findings concerning multiple object tracking. Visual Cognition, 11(7), 801-822.
The End
Additional examples of MOT (multiple object tracking)
  • MOT with occlusion
  • MOT with virtual occluders
  • MOT with matched nonoccluding disappearance
  • Track endpoints of lines
  • Track rubber-band linked boxes
  • Track and remember ID by location
  • Track and remember ID by name (number)
  • Track while everything briefly disappears (½ sec)
    and goes on moving while invisible
  • Track while everything briefly disappears and
    reappears where they were when they disappeared