CS564 Lecture 7. Object Recognition and Scene Analysis. Reading Assignments: TMB2, Sections 2.2 and 5.2 – PowerPoint PPT Presentation

About This Presentation
Title:

CS564 Lecture 7. Object Recognition and Scene Analysis

Description:

Lecture 7. Object Recognition. Reading assignments: TMB2, Sections 2.2 and 5.2. Handout: Extracts from HBTNN ... Fusiform Face ... – PowerPoint PPT presentation

Number of Views: 197
Slides: 51
Provided by: lauren123
Learn more at: http://www-scf.usc.edu
Transcript and Presenter's Notes



1
CS564 Lecture 7. Object Recognition and Scene Analysis
Reading Assignments: TMB2, Sections 2.2 and 5.2
Handout: Extracts from HBTNN 2e Drafts:
Shimon Edelman and Nathan Intrator, Visual Processing of Object Structure
Guy Wallis and Heinrich Bülthoff, Object Recognition, Neurophysiology
Simon Thorpe and Michèle Fabre-Thorpe, Fast Visual Processing
(My thanks to Laurent Itti and Bosco Tjan for permission to use the slides they prepared for lectures on this topic.)
2
(No Transcript)
3
Bottom-Up Segmentation or Top-Down Control?
4
Object Recognition
  • What is Object Recognition?
  • Segmentation/Figure-Ground Separation: prerequisite or consequence?
  • Labeling an object: the focus of most studies
  • Extracting a parametric description as well
  • Object Recognition versus Scene Analysis
  • An object may be part of a scene, or
  • itself be recognized as a scene
  • What is Object Recognition for?
  • As a context for recognizing something else (locating a house by the tree in the garden)
  • As a target for action (climb that tree)

5
"What" versus "How" in Humans
AT (Jeannerod et al.), lesion here: inability to preshape (except for objects whose size is in the semantics)
Monkey data (Mishkin and Ungerleider): "What" versus "Where"
DF (Goodale and Milner), lesion here: inability to verbalize or pantomime size or orientation
6
Clinical Studies
  • Studies of patients with visual deficits strongly argue that tight interaction between the where and what/how visual streams is necessary for scene interpretation.
  • Visual agnosia: can see objects, copy drawings of them, etc., but cannot recognize or name them!
  • Dorsal agnosia: cannot recognize objects if more than two are presented simultaneously (a problem with localization)
  • Ventral agnosia: cannot identify objects.

7
These studies suggest…
  • We bind features of objects into objects (feature binding)
  • We bind objects in space into some arrangement (space binding)
  • We perceive the scene.
  • Feature binding: what/how stream
  • Space binding: where stream
  • Double role of spatial relationships:
  • To relate different portions of an object or scene, as a guide to recognition
  • Augmented by other how parameters, to guide our behavior with respect to the observed scene.

8
Inferotemporal Pathways
Later stages of IT (AIT/CIT) connect to the
frontal lobe, whereas earlier ones (CIT/PIT)
connect to the parietal lobe. This functional
distinction may well be important in forming a
complete picture of inter-lobe interaction.
9
Shape perception and scene analysis
  • Shape-selective neurons in cortex
  • Coding: one neuron per object, or population codes?
  • Biologically inspired algorithms for shape perception
  • The "gist" of a scene: how can we get it in 100 ms or less?
  • Visual memory: how much do we remember of what we have seen?
  • The world as an outside memory, and our eyes as a lookup tool

10
Face Cells in Monkey
11
Object recognition
  • The basic issues
  • Translation and rotation invariance
  • Neural models that achieve it
  • 3D viewpoint invariance (data and models)
  • Classical computer vision approaches: template matching and matched filters, wavelet transforms, correlation, etc.
  • Example: face recognition.
  • More examples of biologically inspired object recognition systems which work remarkably well

12
Extended Scene Perception
  • Attention-based analysis: scan the scene with attention, accumulating evidence from detailed local analysis at each attended location.
  • Main issues:
  • What is the internal representation?
  • How detailed is memory?
  • Do we really have a detailed internal representation at all!?
  • Gist: we can very quickly (~120 ms) classify entire scenes or do simple recognition tasks, yet can only shift attention twice in that much time!

13
Thorpe: Recognizing Whether a Scene Contains an Animal
Claim: this is so quick that only feedforward processing can be involved.
14
Eye Movements Beyond Feedforward Processing
  • 1) Examine scene freely
  • 2) Estimate material circumstances of family
  • 3) Give ages of the people
  • 4) Surmise what family has been doing before arrival of unexpected visitor
  • 5) Remember clothes worn by the people
  • 6) Remember position of people and objects
  • 7) Estimate how long the unexpected visitor has been away from family

15
The World as an Outside Memory
  • Kevin O'Regan, early '90s
  • Why build a detailed internal representation of the world?
  • Too complex…
  • Not enough memory…
  • …and useless?
  • The world is the memory. Attention and the eyes are a look-up tool!

16
The Attention Hypothesis
  • Rensink, 2000
  • No integrative buffer
  • Early processing extracts information up to proto-object complexity in a massively parallel manner
  • Attention is necessary to bind the different proto-objects into complete objects, as well as to bind object and location
  • Once attention leaves an object, the binding dissolves. Not a problem: it can be formed again whenever needed, by shifting attention back to the object.
  • Only a rather sketchy virtual representation is kept in memory, and attention/eye movements are used to gather details as needed

17
Challenges of Object Recognition
  • The binding problem: binding different features (color, orientation, etc.) to yield a unitary percept (see next slide)
  • Bottom-up vs. top-down processing: how much is assumed top-down vs. extracted from the image?
  • Perception vs. recognition vs. categorization: seeing an object vs. seeing it as something; matching views of known objects to memory vs. matching a novel object to object categories in memory.
  • Viewpoint invariance: a major issue is to recognize objects irrespective of the viewpoint from which we see them.

18
Four stages of representation (Marr, 1982)
  • 1) Pixel-based (light intensity)
  • 2) Primal sketch (discontinuities in intensity)
  • 3) 2½-D sketch (oriented surfaces, relative depth between surfaces)
  • 4) 3D model (shapes, spatial relationships, volumes)
  • TMB2 view: this may work in ideal cases, but in general cooperative computation of multiple visual cues and perceptual schemas will be required.
  • Problem: computationally intractable!

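The step from stage 1 to stage 2 is the easiest to make concrete. As a rough illustration (not Marr's actual proposal, which used zero-crossings of a Laplacian-of-Gaussian filter), a primal-sketch-like map of intensity discontinuities can be computed from pixel intensities with finite differences:

```python
import numpy as np

def primal_sketch(image, threshold=0.25):
    """Toy primal sketch: mark intensity discontinuities using the
    finite-difference gradient magnitude. (Marr's proposal used
    zero-crossings of a Laplacian-of-Gaussian; this is only a sketch.)"""
    gy, gx = np.gradient(image.astype(float))
    return np.hypot(gx, gy) > threshold

# A step edge: left half dark, right half bright.
img = np.zeros((4, 8))
img[:, 4:] = 1.0
edges = primal_sketch(img)  # True only along the discontinuity
```

Stages 3 and 4 are much harder: getting from such an edge map to surfaces and volumes is exactly where the cooperative-computation critique above bites.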
19
VISIONS
  • A computer vision system from 1987, developed by Allen Hanson and Edward Riseman on the basis of the HEARSAY system for speech understanding (TMB2 Sec. 4.2) and Arbib's Schema Theory (TMB2 Sec. 2.2 and Chap. 5)
  • It is schema-based and can be mapped onto hypotheses about cooperative computation in the brain.
  • Key idea: bringing context and scene knowledge into play, so that recognition of objects proceeds via islands of reliability to yield a consensus interpretation of the scene.
  • See TMB2 Sec. 5.2 for the figures.

20
Biederman: Recognition by Components
Geons: units of 3D geometric structure
21
JIM 3 (Hummel)
22
Collection of Fragments (Edelman and Intrator)
23
Collection of Fragments 2
24
(No Transcript)
25
Viewpoint Invariance
  • Major problem for recognition.
  • Biederman & Gerhardstein, 1994:
  • We can recognize two views of an unfamiliar object as being the same object.
  • Thus, viewpoint invariance cannot rely only on matching views to memory.

26
Models of Object Recognition
  • See Hummel, 1995, The Handbook of Brain Theory and Neural Networks
  • Direct Template Matching
  • A processing hierarchy yields activation of view-tuned units.
  • A collection of view-tuned units is associated with one object.
  • View-tuned units are built from V4-like units, using sets of weights which differ for each object.
  • E.g., Poggio & Edelman, 1990; Riesenhuber & Poggio, 1999

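The view-tuned-unit scheme can be sketched as Gaussian radial-basis units over a feature vector, pooled by an object unit, in the spirit of Poggio & Edelman (1990); the feature vectors and tuning width below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def view_tuned_unit(features, stored_view, sigma=1.0):
    """Gaussian (RBF) unit: responds maximally when the input matches
    its stored view, falling off with distance in feature space."""
    d2 = np.sum((np.asarray(features) - np.asarray(stored_view)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def object_unit(features, stored_views, sigma=1.0):
    """An object unit pools the responses of its view-tuned units."""
    return max(view_tuned_unit(features, v, sigma) for v in stored_views)

# Two stored views of one (hypothetical) object in a 2-D feature space.
views = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
r_match = object_unit([1.0, 0.0], views)  # exact match with a stored view
r_far = object_unit([5.0, 5.0], views)    # unfamiliar view, weak response
```

Per-object weight sets correspond here to each object unit keeping its own list of stored views.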
27
Computational Model of Object Recognition (Riesenhuber and Poggio, 1999)
28
  • The model neurons are tuned for size and 3D orientation of the object

29
Models of Object Recognition
  • Hierarchical Template Matching
  • The image is passed through layers of units with progressively more complex features at progressively less specific locations.
  • Hierarchical in that features at one stage are built from features at earlier stages.
  • E.g., Fukushima & Miyake's (1982) Neocognitron:
  • Several processing layers, comprising simple (S) and complex (C) cells.
  • S-cells in one layer respond to conjunctions of C-cells in the previous layer.
  • C-cells in one layer are excited by small neighborhoods of S-cells.

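The S/C alternation can be sketched in a few lines of numpy (the template, threshold, and pooling size are toy assumptions, not Fukushima's parameters): S-cells fire where a local patch matches a feature template, and C-cells take the max over a small neighborhood of S-cells, buying position tolerance.

```python
import numpy as np

def s_layer(image, template, threshold=0.9):
    """S-cells: respond where a local patch matches the feature template
    (normalized correlation against the template)."""
    th, tw = template.shape
    out = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    t = template.ravel() / np.linalg.norm(template)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + th, j:j + tw].ravel()
            n = np.linalg.norm(patch)
            if n > 0:
                out[i, j] = float(patch @ t / n >= threshold)
    return out

def c_layer(smap, pool=2):
    """C-cells: max over a small neighborhood of S-cells, so the feature
    is still signaled if its position shifts slightly."""
    out = np.zeros((smap.shape[0] // pool, smap.shape[1] // pool))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = smap[i * pool:(i + 1) * pool,
                             j * pool:(j + 1) * pool].max()
    return out
```

Stacking several such S/C pairs, with S-cells reading conjunctions of C-cell outputs, gives the progressively more complex, progressively less localized features described above.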
30
Models of Object Recognition
  • Transform & Match
  • First take care of rotation, translation, scale, etc. invariances.
  • Then recognize based on a standardized pixel representation of objects.
  • E.g., Olshausen et al., 1993: dynamic routing model
  • Template match, e.g., with an associative memory based on a Hopfield network.

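A minimal sketch of normalize-then-match (a crude stand-in for Olshausen's dynamic routing, which instead learns the transformation through control neurons): center the pattern by its intensity centroid, then pick the best template by normalized correlation. Templates are assumed to be already centered.

```python
import numpy as np

def center_by_centroid(img):
    """Translate the pattern so its intensity centroid sits at the center."""
    ys, xs = np.nonzero(img)
    dy = img.shape[0] // 2 - int(round(ys.mean()))
    dx = img.shape[1] // 2 - int(round(xs.mean()))
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

def match(img, templates):
    """Return the name of the best-matching (pre-centered) template."""
    v = center_by_centroid(img).ravel().astype(float)
    v = v / np.linalg.norm(v)
    scores = {name: float(v @ (t.ravel() / np.linalg.norm(t)))
              for name, t in templates.items()}
    return max(scores, key=scores.get)
```

Only translation is handled here; a full version would also normalize rotation and scale before the template match (e.g., against a Hopfield-style associative memory).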
31
Recognition by Components
  • Structural approach to object recognition
  • Biederman, 1987
  • Complex objects are composed of simpler pieces
  • We can recognize a novel/unfamiliar object by parsing it in terms of its component pieces, then comparing the assemblage of pieces to those of known objects.

32
Recognition by components (Biederman, 1987)
  • GEONS: geometric elements of which all objects are composed (cylinders, cones, etc.). On the order of 30 different shapes.
  • Skips the 2½-D sketch: geons are directly recognized from edges, based on their nonaccidental properties (i.e., 3D features that are usually preserved by the projective imaging process).

33
Basic Properties of GEONs
  • They are sufficiently different from each other
    to be easily discriminated
  • They are view-invariant (look identical from most
    viewpoints)
  • They are robust to noise (can be identified even
    with parts of image missing)

34
Support for RBC: we can recognize partially occluded objects easily if the occlusions do not obscure the set of geons which constitute the object.
35
Potential difficulties
  • Structural description not enough; also need metric info
  • Difficult to extract geons from real images
  • Ambiguity in the structural description: most often we have several candidates
  • For some objects, deriving a structural representation can be difficult
  • Edelman, 1997

36
Geon Neurons in IT?
  • These are preferred stimuli for some IT neurons.

37
(No Transcript)
38
Fusiform Face Area in Humans
39
Standard View on Visual Processing (Tjan, 1999)
[Diagram: visual processing maps one representation into another]
  • Image specific: supports fine discrimination; noise tolerant
  • Image invariant: supports generalization; noise sensitive
40
[Diagram: primary/early visual processing feeding multiple memory/decision sites (Face, Place, Common objects, ?) (e.g., Kanwisher et al.; Ishai et al.) (Tjan, 1999)]
41
Tjan's Recognition by Anarchy
[Diagram: primary visual processing feeds sensory memory and several independent memory modules; each makes an independent decision R1 … Ri … Rn with its own delay t1 … ti … tn, and the homunculus's response is the first arriving response.]
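The racing scheme can be sketched directly: each module reports a (latency, response) pair, and the homunculus takes whichever arrives first. The module latencies below are invented for illustration.

```python
def first_arrival(module_outputs):
    """Homunculus: accept whichever module's response arrives first.
    module_outputs: iterable of (latency, response) pairs."""
    latency, response = min(module_outputs, key=lambda lr: lr[0])
    return response

# Hypothetical race: an easy (familiar) stimulus lets the least
# normalized, fastest module answer before the slower ones finish.
outputs = [(120, "e"),   # module 1: little normalization, fast
           (250, "e"),   # module 2: more normalization
           (400, "e")]   # module 3: full normalization, slowest
winner = first_arrival(outputs)  # -> "e", from the fastest module
```

No central arbiter weighs the modules against each other; speed alone decides, which is what makes the scheme "anarchic".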
42
A toy visual system
Task: identify letters (e.g., "e") presented at arbitrary positions & orientations
43
[Pipeline: Image → normalize position → normalize orientation → down-sampling → memory]
44
[The same pipeline, now with three memory/decision sites (Site 1, Site 2, Site 3) tapping successive stages of normalization]
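The normalization stages can be sketched with numpy; the specific choices (centroid shift for position, image-moment principal axis for orientation, block averaging for down-sampling) are illustrative assumptions, not Tjan's implementation:

```python
import numpy as np

def normalize_position(img):
    """Shift the pattern so its intensity centroid is at the image center."""
    ys, xs = np.nonzero(img)
    dy = img.shape[0] // 2 - int(round(ys.mean()))
    dx = img.shape[1] // 2 - int(round(xs.mean()))
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

def orientation_angle(img):
    """Principal-axis angle (radians) from second-order image moments;
    rotating by minus this angle would normalize orientation."""
    ys, xs = np.nonzero(img)
    y, x = ys - ys.mean(), xs - xs.mean()
    return 0.5 * np.arctan2(2 * (x * y).sum(),
                            (x * x).sum() - (y * y).sum())

def downsample(img, k=2):
    """Block-average down-sampling by a factor of k."""
    H, W = img.shape
    return img[:H - H % k, :W - W % k].reshape(H // k, k, W // k, k).mean(axis=(1, 3))
```

Tapping the representation after each stage, as Sites 1-3 do, yields modules with different invariances and hence different speeds and difficulty trade-offs.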
45
Study stimuli: 5 orientations × 20 positions, at high SNR
Test stimuli: 1) familiar (studied) views, 2) new positions, 3) new positions & orientations
[Stimulus levels: Signal-to-Noise Ratio 1800, 1500, 800, 450, 210; RMS Contrast (%) 30, 25, 20, 15, 10]
46
Processing speed for each recognition module depends on the recognition difficulty for that module.
47
[Plot: Proportion Correct vs. Contrast (%) for familiar views, novel positions, and novel positions & orientations]
48
[Plot: Proportion Correct vs. Contrast (%) for familiar views, novel positions, and novel positions & orientations]
Black curve: the full model, in which recognition is based on the fastest of the responses from the three stages.
49
(No Transcript)
50
Experimental techniques in visual neuroscience
  • Recording from neurons: electrophysiology
  • Multi-unit recording using electrode arrays
  • Stimulating while recording
  • Anesthetized vs. awake animals
  • Single-neuron recording in awake humans
  • Probing the limits of vision: visual psychophysics
  • Functional neuroimaging techniques
  • Experimental design issues
  • Optical imaging
  • Transcranial magnetic stimulation