A Brain-Like Computer for Cognitive Applications: The Ersatz Brain Project - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
A Brain-Like Computer for Cognitive
Applications: The Ersatz Brain Project
  • James A. Anderson
  • James_Anderson_at_brown.edu
  • Department of Cognitive and Linguistic Sciences
  • Brown University, Providence, RI 02912
  • Paul Allopenna
  • pallopenna_at_aptima.com
  • Aptima, Inc.
  • 12 Gill Street, Suite 1400, Woburn, MA
  • Our Goal 
  • We want to build a first-rate, second-rate brain. 

2
Participants
  • Faculty 
  • Jim Anderson, Cognitive Science. 
  • Gerry Guralnik, Physics.
  • Tom Dean, Computer Science.
  • David Sheinberg, Neuroscience.
  • Students
  • Socrates Dimitriadis, Cognitive Science.
  • Brian Merritt, Cognitive Science.
  • Benjamin Machta, Physics.
  • Private Industry
  • Paul Allopenna, Aptima, Inc.
  • John Santini, Anteon, Inc.

3
Acknowledgements
  • This work was supported by
  • A seed money grant from the Office of the Vice
    President for Research, Brown University.
  • An SBIR, The Ersatz Brain Project,
    FA8750-05-C-0122, to Aptima, Inc. (Woburn MA),
    Dr. Paul Allopenna, Project Manager.
  • Also, early support was received from a DARPA
    grant to the Brown University Engineering
    Department in the Bio/Info/Micro program,
    MDA972-00-1-0026.

4
Comparison of Silicon Computers and Carbon
Computer
  • Digital computers are
  • Made from silicon
  • Accurate (essentially no errors)
  • Fast (nanoseconds)
  • Execute long chains of logical operations
    (billions)
  • Often irritating (because they don't think like
    us).

5
Comparison of Silicon Computers and Carbon
Computer
  • Brains are
  • Made from carbon
  • Inaccurate (low precision, noisy)
  • Slow (milliseconds, 10^6 times slower)
  • Execute short chains of parallel alogical
    associative operations (perhaps 10
    operations/second)
  • Yet largely understandable (because they think
    like us).

6
Comparison of Silicon Computers and Carbon
Computer
  • Huge disadvantage for carbon: more than 10^12 in
    the product of speed and power.
  • But we still do better than them in many
    perceptual skills: speech recognition, object
    recognition, face recognition, motor control.
  • Implication: Cognitive software uses only a
    few but very powerful elementary operations.

7
Major Point
  • Brains and computers are very different in their
    underlying hardware, leading to major differences
    in software.
  • Computers, as the result of 60 years of
    evolution, are great at modeling physics.
  • They are not great (after 50 years of trying,
    and largely failing) at modeling human cognition.
  • One possible reason: inappropriate hardware leads
    to inappropriate software.
  • Maybe we need something completely different: new
    software, new hardware, new basic operations,
    even new ideas about computation.

8
So Why Build a Brain-Like Computer?
  • 1. Engineering.
  •  
  • Computers are all special purpose devices.
  •  
  • Many of the most important practical computer
    applications of the next few decades will be
    cognitive in nature
  •  
  •         Natural language processing. 
  •         Internet search.
  •         Cognitive data mining.
  •         Decent human-computer interfaces.
  •         Text understanding.
  •  
  • We claim it will be necessary to have a
    cortex-like architecture (either software or
    hardware) to run these applications efficiently.

9
  • 2. Science
  •  
  • Such a system, even in simulation, becomes a
    powerful research tool.
  •  
  • It leads to designing software with a particular
    structure to match the brain-like computer.
  •  
  • If we capture any of the essence of the cortex,
    writing good programs will give insight into
    biology and cognitive science.
  •  
  • If we can write good software for a vaguely
    brain-like computer we may show we really
    understand something important about the brain.
  •  

10
  • 3. Personal 
  • It would be the ultimate cool gadget.
  • A technological vision
  • In 2055 the personal computer you buy in Wal-Mart
    will have two CPUs with very different
    architectures
  •  
  • First, a traditional von Neumann machine that
    runs spreadsheets, does word processing, keeps
    your calendar straight, etc. etc. What they do
    now.
  •  
  • Second, a brain-like chip
  •         To handle the interface with the von
    Neumann machine,
  •         Give you the data that you need from the
    Web or your files (but didn't think to ask for).
  •         Be your silicon friend, guide, and
    confidant.

11
History: Technical Issues
  • Many have proposed the construction of brain-like
    computers.
  • These attempts usually start with
  •         massively parallel arrays of neural
    computing elements
  •         elements based on biological neurons,
    and
  •         the layered 2-D anatomy of mammalian
    cerebral cortex.
  • Such attempts have failed commercially.
  • The early Connection Machines from Thinking
    Machines, Inc. (W. D. Hillis, The Connection
    Machine, 1987) were the most nearly successful
    commercially and are most like the architecture we
    are proposing here.
  •  
  • Consider the extremes of computational brain
    models.

12
First Extreme: Biological Realism
  • The human brain is composed of on the order of
    10^10 neurons, connected together with at least
    10^14 neural connections. (Probably
    underestimates.)
  • Biological neurons and their connections are
    extremely complex electrochemical structures.
  • The more realistic the neuron approximation the
    smaller the network that can be modeled.
  • There is good evidence that for cerebral cortex a
    bigger brain is a better brain.
  •  
  • Projects that model neurons in detail are of
    scientific importance.
  •  
  • But they are not large enough to simulate
    interesting cognition.

13
 Neural Networks.
  •  
  • The most successful brain-inspired models are
    neural networks.
  •  
  • They are built from simple approximations of
    biological neurons: nonlinear integration of many
    weighted inputs.
  •  
  • Throw out all the other biological detail.

14
Neural Network Systems
  • Units with these approximations can build systems
    that
  •    can be made large,
  •    can be analyzed,
  •    can be simulated,
  •    can display complex cognitive behavior. 
  • Neural networks have been used to model (rather
    well) important aspects of human cognition.

15
Second Extreme: Associatively Linked Networks.
  •  
  • The second class of brain-like computing models
    is a basic part of computer science
  •  
  • Associatively linked structures.
  •  
  • One example of such a structure is a semantic
    network.
  • Such structures underlie most of the practically
    successful applications of artificial
    intelligence.

16
Associatively Linked Networks (2)
  • The connection between the biological nervous
    system and such a structure is unclear.
  •  
  • Few believe that nodes in a semantic network
    correspond in any sense to single neurons.
  •  
  • Physiology (fMRI) suggests that a complex
    cognitive structure (a word, for instance)
    gives rise to widely distributed cortical
    activation.
  •  
  • Major virtue of linked networks: they have
    sparsely connected, interesting nodes (words,
    concepts).
  •  
  • In practical systems, the number of links
    converging on a node ranges from one or two up to
    a dozen or so.
  •  

17

The Ersatz Brain Approximation: The Network of
Networks.
  • Conventional wisdom says neurons are the basic
    computational units of the brain.
  • The Ersatz Brain Project is based on a different
    assumption.
  • The Network of Networks model was developed in
    collaboration with Jeff Sutton (Harvard Medical
    School, now at NSBRI).
  •  
  • Cerebral cortex contains intermediate level
    structure, between neurons and an entire cortical
    region.
  •  
  • Intermediate level brain structures are hard to
    study experimentally because they require
    recording from many cells simultaneously.

18
Network of Networks Approximation
  • We use the Network of Networks (NofN)
    approximation to structure the hardware and to
    reduce the number of connections.
  •  
  • We assume the basic computing units are not
    neurons, but small (10^4 neurons) attractor
    networks.
  •  
  • Basic Network of Networks Architecture
  • 2 Dimensional array of modules 
  • Locally connected to neighbors
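A minimal sketch of this two-dimensional, locally connected module array, in Python; the grid size, state dimension, and all names are illustrative assumptions, not project code:

```python
# A toy sketch of the basic Network of Networks layout: a 2-D array of
# modules, each holding a state vector and locally connected to its four
# nearest neighbors. Grid size and state dimension are assumed values.
import numpy as np

GRID = 8        # modules per side (illustrative)
STATE_DIM = 16  # dimensionality of a module's state vector (illustrative)

# one state vector per module in the 2-D array
states = np.zeros((GRID, GRID, STATE_DIM))

def neighbors(r, c, n=GRID):
    """Indices of the locally connected (4-neighbor) modules."""
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < n and 0 <= j < n]
```

Corner and edge modules simply have fewer neighbors, matching the local-connectivity assumption.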

19
Cortical Columns: Minicolumns
  • "The basic unit of cortical operation is the
    minicolumn. It contains of the order of 80-100
    neurons, except in the primate striate cortex,
    where the number is more than doubled. The
    minicolumn measures of the order of 40-50 µm in
    transverse diameter, separated from adjacent
    minicolumns by vertical, cell-sparse zones. The
    minicolumn is produced by the iterative division
    of a small number of progenitor cells in the
    neuroepithelium." (Mountcastle, p. 2)
  • V. B. Mountcastle (2003). Introduction to a
    special issue of Cerebral Cortex on columns.
    Cerebral Cortex, 13, 2-4.
  • Figure: Nissl stain of cortex in planum
    temporale.

20
Columns: Functional
  •  
  • Groupings of minicolumns seem to form the
    physiologically observed functional columns.
    Best known example is orientation columns in V1.
  • They are significantly bigger than minicolumns,
    typically around 0.3-0.5 mm.
  • Mountcastle's summation:
  • "Cortical columns are formed by the binding
    together of many minicolumns by common input and
    short range horizontal connections. The number
    of minicolumns per column varies between 50 and
    80. Long range intracortical projections link
    columns with similar functional properties." (p.
    3)
  •  
  • Cells in a column: (80)(100) = 8,000

21
Elementary Modules
  • The activity of the non-linear attractor networks
    (modules) is dominated by their attractor states.
  •  
  • Attractor states may be built in or acquired
    through learning.
  •  
  • We approximate the activity of a module as a
    weighted sum of attractor states. That is, the
    attractor states form an adequate set of basis
    functions.
  • Activity of module:
  • x ≈ Σ c_i a_i
  • where the a_i are the attractor states.
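The weighted-sum approximation can be illustrated numerically. A sketch, assuming random toy attractor states and least-squares projection for the coefficients:

```python
# Toy illustration of x ~ sum_i c_i a_i: treat the attractor states a_i as
# basis vectors and recover the coefficients c_i by least-squares
# projection. All vectors here are random illustrative data.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))       # columns are three attractor states a_i
c_true = np.array([0.5, -1.0, 2.0])   # coefficients used to build the activity
x = A @ c_true                        # module activity in the attractor span

c, *_ = np.linalg.lstsq(A, x, rcond=None)  # recover the c_i from x
```

When the activity lies in the span of the attractor states, the projection recovers the coefficients exactly; real module activity would only approximately lie in that span.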

22
The Single Module: BSB
  • The attractor network we use for the individual
    modules is the BSB network (Anderson, 1993).
  •  
  • It can be analyzed using the eigenvectors and
    eigenvalues of its local connections.

23
Interactions between Modules
  • Interactions between modules are described by
    state interaction matrices, M.
  • The state interaction matrix elements give the
    contribution of an attractor state in one module
    to the amplitude of an attractor state in a
    connected module.
  • In the BSB linear region:
  • x(t+1) = Σ M_i s_i + f x(t)
  • (weighted sum of input from other modules, plus
    ongoing activity)
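A toy numerical sketch of this linear-region update; the dimensions, the value of f, and the random matrices and states are illustrative assumptions:

```python
# Toy sketch of the linear-region update x(t+1) = sum_i M_i s_i + f x(t):
# each connected module's attractor-state amplitudes s_i act through a
# state interaction matrix M_i; f scales the ongoing activity.
# All dimensions and values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
f = 0.9                     # feedback on ongoing activity (assumed)
x = rng.standard_normal(4)  # this module's current state amplitudes
Ms = [rng.standard_normal((4, 4)) for _ in range(3)]  # interaction matrices M_i
ss = [rng.standard_normal(4) for _ in range(3)]       # connected modules' states s_i

x_next = f * x + sum(M @ s for M, s in zip(Ms, ss))
```

Because this stage is linear, contributions from connected modules superpose; the nonlinearity described on the next slide would then limit the result.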

24
The Linear-Nonlinear Transition
  • The first BSB processing stage is linear and sums
    influences from other modules.
  • The second processing stage is nonlinear.
  • This linear to nonlinear transition is a powerful
    computational tool for cognitive applications.
  •  
  • It describes the processing path taken by many
    cognitive processes.
  •  
  • A generalization from cognitive science
  •  
  • Sensory inputs → (categories, concepts, words)
  •  
  • Cognitive processing moves from continuous values
    to discrete entities.

25
Scaling
  • We can extend this associative model to larger
    scale groupings.
  •  
  • It may become possible to suggest a natural way
    to bridge the gap in scale between single neurons
    and entire brain regions.
  •  
  • Networks >
  • Networks of Networks >
  • Networks of (Networks of Networks) >
  • Networks of (Networks of (Networks of Networks))
  • and so on

26
Binding Module Patterns Together.
  • An associative Hebbian learning event will tend
    to link f with g through the local connections.
  •  
  • There is a speculative connection to the
    important binding problem of cognitive science
    and neuroscience.
  •  
  • The larger groupings will act like a unit.
  • Responses will be stronger to the pair f,g than
    to either f or g by itself.

27
Sparse Connectivity
  • The brain is sparsely connected. (Unlike most
    neural nets.)
  •  
  • A neuron in cortex may have on the order of
    100,000 synapses. There are more than 10^10
    neurons in the brain. Fractional connectivity is
    very low: 0.001.
  • Implications: 
  • Connections are expensive biologically since they
    take up space, use energy, and are hard to wire
    up correctly.
  • Therefore, connections are valuable.
  • The pattern of connection is under tight control.
  • Short local connections are cheaper than long
    ones.
  • Our approximation makes extensive use of local
    connections for computation.

28
Interference Patterns
  • We are using local transmission of (vector)
    patterns, not scalar activity level.
  • We have the potential for traveling pattern waves
    using the local connections.
  • Lateral information flow allows the potential for
    the formation of feature combinations in the
    interference patterns where two different
    patterns collide.

29
Learning the Interference Pattern
  • The individual modules are nonlinear learning
    networks.
  • We can form new attractor states when an
    interference pattern forms, that is, when two
    patterns meet at a module.

30
Module Evolution
  • Module evolution with learning
  •  
  •         From an initial repertoire of basic
    attractor states
  •  
  •         to the development of specialized
    pattern combination states unique to the history
    of each module.

31
Biological Evidence
32
Biological Evidence: Columnar Organization in
Inferotemporal Cortex
  • Tanaka (2003) suggests a columnar organization of
    different response classes in primate
    inferotemporal cortex.
  •  
  • There seems to be some internal structure in
    these regions; for example, spatial
    representation of the orientation of the image in
    the column.

33
IT Response Clusters Imaging
  • Tanaka (2003) used intrinsic visual imaging of
    cortex: train a video camera on the exposed
    cortex, and cell activity can be picked up.
  •  
  • At least a factor of ten higher resolution than
    fMRI.
  •  
  • Size of response is around the size of functional
    columns seen elsewhere: 300-400 microns.

34
Columns: Inferotemporal Cortex
  • Responses of a region of IT to complex images
    involve discrete columns.
  •  
  • The response to a picture of a fire extinguisher
    shows how regions of activity are determined.
  •  
  • Boundaries are where the activity falls by
    half.
  •  
  • Note some spots are roughly equally spaced.

35
Active IT Regions for a Complex Stimulus
  • Note the large number of roughly equally distant
    spots (2 mm) for a familiar complex image.

36
Histogram of Distances
  • We were able to plot histograms of distances in a
    number of published IT intrinsic images of
    complex figures.
  • Distances computed from data in the previous
    figure (Dimitriadis).

37
Back-of-the-Envelope Engineering Considerations
38
Network of Networks: Functional Summary.
  • The NofN approximation assumes a two dimensional
    array of attractor networks.
  • The attractor states dominate the output of the
    system at all levels.
  • Interactions between different modules are
    approximated by interactions between their
    attractor states.
  • Lateral information propagation plus nonlinear
    learning allows formation of new attractors at
    the location of interference patterns.
  • There is a linear and a nonlinear region of
    operation in both single and multiple modules.
  • The qualitative behavior of the attractor
    networks can be controlled by analog gain control
    parameters.

39
Engineering Hardware Considerations
  • We feel that there is a size, connectivity, and
    computational power sweet spot at the level of
    the parameters of the network of network model.
  •  
  • If an elementary attractor network has 10^4
    actual neurons, that network might display 50
    attractor states. Each elementary network might
    connect to 50 others through state connection
    matrices.
  •  
  • A brain-sized system might consist of 10^6
    elementary units with about 10^11 (0.1 terabyte)
    numbers specifying the connections.
  •  
  • If 100 to 1000 elementary units can be placed on
    a chip there would be a total of 1,000 to 10,000
    chips in a cortex-sized system.
  •  
  • These numbers are large but within the upper
    bounds of current technology.
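The slide's estimates can be checked with a few lines of arithmetic:

```python
# Back-of-the-envelope check of the slide's estimates: 10^6 elementary
# units, ~50 attractor states per unit, each unit connected to ~50 others,
# each connection specified by a 50 x 50 state interaction matrix.
units = 10**6
attractor_states = 50
connections_per_unit = 50

# numbers needed to specify all the connections (~10^11, the 0.1 terabyte
# on the slide at one byte per number)
numbers = units * connections_per_unit * attractor_states**2

# chips needed at 100 to 1000 elementary units per chip
chips_at_1000_per_chip = units // 1000
chips_at_100_per_chip = units // 100
```

The product comes out at 1.25 x 10^11 numbers, consistent with the "about 10^11" quoted above.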

40
Modules
  • Function of Computational (NofN) Modules:
  • Simulate local integration: addition of inputs
    from outside, from other modules.
  • Simulate local dynamics.
  • Communications controller: handle long range
    (i.e., not neighboring) interactions.
  • Simpler approximations are possible:
  • Cellular automaton. (Ignore local dynamics.)
  • Approximations to dynamics.

41
Topographic Model for Information Integration
42
A Software Example: Sensor Fusion
  • A potential application is to sensor fusion.
    Sensor fusion means merging information from
    different sensors into a unified interpretation.
  •  
  • We were involved in such a project in
    collaboration with Texas Instruments and
    Distributed Data Systems, Inc.
  •  
  • The project was a way to do the de-interleaving
    problem in radar signal processing using a neural
    net.
  •  
  • In a radar environment the problem is to
    determine how many radar emitters are present and
    whom they belong to.
  •  
  • Biologically, this corresponds to the
    behaviorally important question, "Who is looking
    at me?" (To be followed, of course, by "And what
    am I going to do about it?")

43
Radar
  • A receiver for radar pulses provides several
    kinds of quantitative data:
  • frequency,
  • intensity,
  • pulse width,
  • angle of arrival, and
  • time of arrival.
  •  
  • The user of the radar system wants to know
    qualitative information:
  •  
  • How many emitters?
  • What type are they?
  • Who owns them?
  • Has a new emitter appeared?

44
Concepts
  • The way we solved the problem was by using a
    concept forming model from cognitive science.
  •  
  • Concepts are labels for a large class of members
    that may differ substantially from each other.
    (For example, birds, tables, furniture.)
  •  
  • We built a system where a nonlinear network
    developed an attractor structure where each
    attractor corresponded to an emitter.
  •  
  • That is, emitters became discrete, valid
    concepts. 

45
Human Concepts
  • One of the most useful computational properties
    of human concepts is that they often show a
    hierarchical structure.
  •  
  • Examples might be
  •  
  • animal > bird > canary > Tweetie
  •  
  • or
  •  
  • artifact > motor vehicle > car > Porsche > 911.
  •  
  • A weakness of the radar concept model is that it
    did not allow development of these important
    hierarchical structures.

46
Sensor Fusion and Information Integration with
the Ersatz Brain.
  • We can do simple sensor fusion in the Ersatz
    Brain.
  • The data representation we develop is directly
    based on the topographic data representations
    used in the brain: topographic computation.
  •  
  • Spatializing the data, that is, letting it find a
    natural topographic organization that reflects
    the relationships between data values, is a
    technique of potential utility.
  • We are working with relationships between values,
    not with the values themselves.
  •  
  • Spatializing the problem provides a way of
    programming a parallel computer.  

47
Topographic Data Representation
  • Figure: bar codes for low, medium, and high
    values of a parameter.
  • We initially will use a simple bar code to code
    the value of a single parameter.
  • The precision of this coding is low.
  •  
  • But we don't care about quantitative precision:
    we want qualitative analysis.
  • Brains are good at qualitative analysis, poor at
    quantitative analysis. (Traditional computers
    are the opposite.)
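A toy version of this bar code, assuming an edge of 32 modules and a 3-module-wide bar:

```python
# A toy version of the bar-code representation: a parameter value in [0, 1]
# becomes a short bar of active modules along one edge of the array.
# Edge length and bar width are assumptions; the point is the low precision.
import numpy as np

EDGE = 32  # modules along one edge (illustrative)
BAR = 3    # width of the bar, i.e. the (deliberately low) coding precision

def bar_code(value, edge=EDGE, bar=BAR):
    """Return a 0/1 edge vector with a bar placed at the coded value."""
    code = np.zeros(edge)
    start = int(value * (edge - bar))   # coarse placement of the bar
    code[start:start + bar] = 1.0
    return code
```

Nearby parameter values map to overlapping bars, which is exactly the coarse, qualitative coding the slide describes.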

48
Demo
  • For our demo Ersatz Brain program, we will assume
    we have four parameters derived from a source.
  •  
  • An object is characterized by values of these
    four parameters, coded as bar codes on the edges
    of the array of CPUs.
  •  
  • We assume local linear transmission of patterns
    from module to module.

49
  • Each pair of input patterns gives rise to an
    interference pattern: a line through the midpoint
    of, and perpendicular to, the line between the
    pair of input locations.
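Assuming uniform transmission speed, the interference locus can be found by a brute-force distance comparison on a small grid; the grid size, tolerance, and input placement are our assumptions:

```python
# Sketch of where two propagating patterns collide: assuming uniform
# transmission speed from the two input locations, the interference set is
# the modules (nearly) equidistant from both, i.e. the perpendicular
# bisector of the segment joining them. Grid size and tolerance are assumed.
import numpy as np

def interference(p, q, n=9, tol=0.5):
    """Boolean n x n mask of modules equidistant (within tol) from p and q."""
    r, c = np.indices((n, n))
    dp = np.hypot(r - p[0], c - p[1])
    dq = np.hypot(r - q[0], c - q[1])
    return np.abs(dp - dq) < tol

# two inputs on opposite edges of the array, same row
mask = interference((4, 0), (4, 8))
```

For these two inputs the mask is the center column of the grid: the perpendicular bisector of the segment between them.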

50
  • There are places where three or four features
    meet at a module. Geometry determines location.
  •  
  • The higher-level combinations represent relations
    between several individual data values in the
    input pattern.
  •  
  • Combinations have literally fused spatial
    relations of the input data.

51
Formation of Hierarchical Concepts.
  • This approach allows the formation of what look
    like hierarchical concept representations.
  •  
  • Suppose we have three parameter values that are
    fixed for each object and one value that varies
    widely from example to example.
  •  
  • The system develops two different types of
    spatial data.
  •  
  • In the first, some high order feature
    combinations are fixed since the three fixed
    input (core) patterns never change.
  •  
  • In the second there is a varying set of feature
    combinations corresponding to the details of each
    specific example of the object.
  •  
  • The specific examples all contain the common core
    pattern.

52
Core Representation
  • The group of coincidences in the center of the
    array is due to the three input values arranged
    around the left, top and bottom edges.

53
  • Left are two examples where there is a different
    value on the right side of the array. Note the
    common core pattern (above).

54
Development of A Hierarchy Through Spatial
Localization.
  • The coincidences due to the core (three values)
    and to the examples (all four values) are
    spatially separated.
  •  
  • We can use the core as a representation of the
    examples since it is present in all of them.
  • It acts as the higher level in a simple
    hierarchy: all examples contain the core.
  • Key Point: This approach is based on
    relationships between parameter values and not on
    the values themselves.

55
Relationships are Valuable. Consider:
56
Which pair is most similar?
57
Experimental Results
  • One pair has high physical similarity to the
    initial stimulus, that is, one half of the figure
    is identical.
  • The other pair has high relational similarity,
    that is, they form a pair of identical figures.
  • Adults tend to choose relational similarity.
  • Children tend to choose physical similarity.
  • However, it is easy to bias adults and children
    toward either relational or physical similarity.
    Potentially a very flexible and programmable
    system.

58
Filtering Using Topographical Representations
Now, show how to use these ideas to do something
(perhaps) useful.
59
The Problem
  • Develop a topographic data representation
    inspired by the perceptual invariances seen
    in human speech.
  • Look at problems analyzing vowels in a speech
    signal as an example of an important class of
    signals.
  • First in a series of demonstrations using the
    topography of data representations to do useful
    computation.

60
Speech Signal Basics
  • Vowels are long duration and often stable.
  • But still hard to analyze correctly.
  • Problems: different speakers, accents, high
    variability, diphthongs, similarity between
    vowels, context effects, gender.
  • The acoustic signals from a vowel are dominated
    by the resonances of the vocal tract, called
    formants.
  • We are interested in using this problem as a test
    case.
  • Show difficulties of biological signal
    processing.
  • But: these are important signal types, and brains
    are very good with this type of data.

61
Vowel Processing
  • Vocal tracts come in different sizes: men,
    women, children, Alvin the Chipmunk.
  • Resonant peaks change their frequency as a
    function of vocal tract length.
  • This frequency shift can be substantial.
  • But this causes little problem for human speech
    perception.
  • An important perceptual feature for phoneme
    recognition seems to be the ratios between the
    formant frequencies, not just absolute values of
    frequency.
  • How can we make a system respond to ratios?

62
Power Spectrum of a Steady State Vowel
63
Sound Spectrogram: Male American
Words: heed, hid, head, had, hod, hawed, hood,
who'd. From P. Ladefoged (2000), A Course in
Phonetics, 4th Edition, Henle.
64
Sound Spectrogram: Female American
Words: heed, hid, head, had, hod, hawed, hood,
who'd. From P. Ladefoged (2000), A Course in
Phonetics, 4th Edition, Henle.
65
Average Formant Frequencies for Men, Women, and
Children.

                  i             æ             u
Men       F1    267 (0.86)    664 (0.77)    307 (0.81)
Women     F1    310 (1.00)    863 (1.00)    378 (1.00)
Children  F1    360 (1.16)   1017 (1.18)    432 (1.14)

Men       F2   2294 (0.82)   1727 (0.84)    876 (0.91)
Women     F2   2783 (1.00)   2049 (1.00)    961 (1.00)
Children  F2   3178 (1.14)   2334 (1.14)   1193 (1.24)

Men       F3   2937 (0.89)   2420 (0.85)   2239 (0.84)
Women     F3   3312 (1.00)   2832 (1.00)   2666 (1.00)
Children  F3   3763 (1.14)   3336 (1.18)   3250 (1.21)

Frequencies in Hz; values in parentheses are the
ratio to the corresponding female value. Data taken
from Watrous (1991), derived originally from
Peterson and Barney (1952).
66
Ratios Between Formant Frequencies for Men,
Women, and Children.

                   i      æ      u
Men       F1/F2   0.12   0.38   0.35
Women     F1/F2   0.11   0.42   0.39
Children  F1/F2   0.11   0.43   0.36

Men       F2/F3   0.78   0.71   0.39
Women     F2/F3   0.84   0.72   0.36
Children  F2/F3   0.84   0.70   0.37

Data taken from Watrous (1991), derived originally
from Peterson and Barney (1952).
67
Other Representation Issues
  • There is a roughly logarithmic spatial mapping of
    frequency onto the surface of auditory cortex.
  • Sometimes called a tonotopic mapping.
  • Logarithmic coding of a parameter changes
    multiplication by a constant into the addition of
    a constant.
  • A logarithmic spatial coding therefore translates
    all parameters multiplied by the constant the
    same distance.

68
Spatial Coding of Frequency
Three data points on a map of frequency. Multiply
by c: the distance moved on the map varies from
point to point. Now suppose we use the log of the
data value and again scale by c: each point moves
the same amount, D.
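The multiplication-to-translation property is easy to verify numerically; the frequencies and scaling factor below are illustrative values:

```python
# The slide's point in code: on a logarithmic map of frequency, multiplying
# every frequency by a constant c moves every point the same distance
# D = log(c). The frequencies and c are illustrative values.
import math

freqs = [300.0, 1200.0, 2400.0]   # formant-like frequencies in Hz (assumed)
c = 1.5                           # vocal-tract scaling factor (assumed)

positions = [math.log(f) for f in freqs]
shifted = [math.log(c * f) for f in freqs]
moves = [s - p for s, p in zip(shifted, positions)]
# every entry of moves equals log(c): the same displacement D for all points
```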
69
Multiple Maps
Human fMRI-derived maps in human auditory cortex.
Note at least five, probably six, maps. Some are
joined at the high frequency end and some at the
low frequency end. (Figure 6 from Talavage et
al., p. 1290)
70
Representational Filtering
  • Our computational goal:
  • Enhance the representation of ratios between
    formant frequencies.
  • De-emphasize the exact values of those
    frequencies.
  • We wish to make a filter, using the data
    representation, that responds to one aspect of
    the input data.
  • We suggest that brain-like computers can make use
    of this strategy.

71
Use the Information Integration Architecture
  • Assume the information integration square array
    of modules, with parameters fed in from the
    edges.
  • Map of frequency along an edge.
  • Assume formant frequencies are precise points.
    (Actually they are somewhat broad.)
  • We start by duplicating the frequency
    representation along the edges of a square.

72
Simple Topographic System To Represent
Relationships
  • Simplest system: two opposing maps of frequency.
  • Look at points equally distant between f1 on one
    map and f2 on the other.
  • Shift frequency by a constant amount, D.
  • The point of equal distance between the new
    frequencies (f1+D) and (f2+D) does not move.
73
Problems
  • Unfortunately, this desirable invariance property
    only holds on the center line.
  • Two points determine a line, not a point. There
    are many equidistant points.
  • What happens off the center line is more complex.
  • Still interesting, but a triple equidistant
    coincidence would be much more stable.

74
Three Parameter Coincidences
  • Assume we are interested in the more complex
    system where three frequency components come
    together at a single module at the same time.
  • We conjecture the target module may form a new
    internal representation corresponding to this
    triple coincidence.
  • Assume uniform transmission speed between
    modules.
  • Then we look for module locations equidistant
    from the locations of triple sets of frequencies.
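A brute-force sketch of the search for such equidistant modules, with an illustrative grid and input placement:

```python
# Brute-force sketch of a triple coincidence: with uniform transmission
# speed, find the modules (nearly) equidistant from three input locations.
# Grid size, tolerance, and the input locations are illustrative.
import numpy as np

def triple_coincidence(points, n=21, tol=0.5):
    """Modules whose distances to the three points agree within tol."""
    r, c = np.indices((n, n))
    d = [np.hypot(r - p[0], c - p[1]) for p in points]
    spread = np.maximum.reduce(d) - np.minimum.reduce(d)
    return np.argwhere(spread < tol)

# three inputs arranged on the left, top, and bottom edges of the array
hits = triple_coincidence([(10, 0), (0, 10), (20, 10)])
```

Unlike the pairwise case, which yields a whole line, three inputs typically pin down a single module (the circumcenter of the three locations), which is why the triple coincidence is the more stable feature.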

75
Triple Coincidences
76
Construction
  • The location of triple coincidences is a
    function of:
  • Ratios of the f's.
  • Values of the f's.
  • A careful parametric study has not yet been done.
  • But: the location now mixes frequency and ratios.

77
Data Representation
Multiple triple coincidence locations are
present. Depending on the triple, different
modules are activated. A three-formant system
has six locations corresponding to possible
triples. If we shift the frequency by an
amount D (multiplication by a constant!), the
location of the triple shifts slightly.
78
Two Different Stimuli: Selectivity of
Representation
The geometry of the triple coincidence points
varies with the location of the inputs along the
edges. A different set of frequencies will give
rise to a different set of triple coincidences.
Representation is selective.
79
Robust Data Representation
The system is robust. Changes in the shape of
the maps do not affect the qualitative results.
Different spatial data arrangements work
nicely. Changes in geometry have possibilities
for computation. The non-square arrangement
spreads out the triple coincidence points along
the vertical axis.
80
Module Assemblies
The representation of a vowel is composed of
multiple triple coincidences (multiple active
modules). But since information can move laterally,
closed loops of activity become possible. The idea
was proposed before: Hebb's cell assemblies were
self-exciting neural loops that corresponded to
cognitive entities: concepts. Hebb's cell
assemblies were hard to make work because of the
use of scalar interconnected units. We have
pattern-sensitive interconnections. Module
assemblies may become a powerful feature of the
Network of Networks approach. See if we can
integrate relatively dense local connections to
form module assemblies.
81
Loops
If the modules are simultaneously active, the
pairwise associations forming the loop abcda can
be learned through simple Hebb learning. The
path closes on itself. Consider a. After
traversing the linked path a>b>c>d>a, the pattern
arriving at a around the loop is a constant times
the pattern on a. If the constant is positive,
there is the potential for positive feedback if
the total loop gain is greater than one.
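A scalar toy version of this loop-gain argument (the real links are pattern-sensitive matrices, so the constant would come from matrix products; the gain values are assumptions):

```python
# The loop-gain argument in code: traversing a -> b -> c -> d -> a, the
# pattern returns multiplied by the product of the link gains (scalars in
# this toy version; the slide's links are pattern-sensitive matrices).
gains = [0.9, 1.2, 1.1, 1.05]  # assumed link gains around the loop

loop_gain = 1.0
for g in gains:
    loop_gain *= g
# here the product exceeds one, so feedback around the closed loop can grow
```

Note that individual links can attenuate (gain below one) while the loop as a whole is still self-exciting.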
82
Formation of Module Assemblies
A single frequency pattern will give rise to
multiple triple coincidences. Speculation:
Assume a module assembly mechanism. Simultaneous
activation can associate the active regions
together for a particular pattern. Two different
patterns can give rise to different module
assemblies.
83
Provocative Neurobiology
  • The behavior of the active regions under
    transformations (i.e., multiplication by a
    constant) has similarity to one of Tanaka's
    observations.
  • Tanaka shows an intrinsic imaging response in
    inferotemporal cortex to the image of a model
    head.
  • As the head rotates there is a gradual shift of
    the columnar-sized region.
  • The total movement for 180 degree rotation is
    about 1 mm (three or four columns).
  • The shift seems to be smooth with rotation.
  • Tanaka was sufficiently impressed with this
    result to modify his columnar model.  

84
Rotating Face Representation in Inferotemporal
Cortex (from Tanaka, 2003)
85
Revised Tanaka Columnar Model for Inferotemporal
Cortex
86
Theme, Variations, Transformations
  • Speculation: cortical processing involving common continuous transformations may be working on a theme-and-variations principle.
  • There are an infinite number of possible transformations.
  • But the most common ones seem to be topographically represented by a small, physically contiguous range of locations on the surface of cortex.
  • By far the most common transformation for a head would be rotation around its vertical axis, caused by different viewing angles.

87
Potential Value
  • This is an example of an approach to signal processing for biological and cognitive signals.
  • It applies to many important problems, for example vision, speech, even much of cognition and information integration.
  • The algorithm has potentially interesting aspects:
  • It is largely parallel.
  • Conjecture: it should be robust.
  • Conjecture: it may be able to handle common important transformations.
  • Speculation: it may put information in a useful form for later cognitive processing.
  • Speculation: if many small active areas (modules) are the right form for output, then this technique may work.

88
Potential Value (2)
  • To be done:
  • Develop general rules for topographic geometries.
  • Are the filter characteristics good? Over what range of values?
  • Example: could we develop a pure, stable ratio filter? Right now it is mixed.
  • Since we are assuming traveling waves underlying this model, what are the temporal dynamics?
  • Does it work for real data?

89
Conclusions: Representation
  • Topographic maps of the type we suggest can do information processing.
  • They can act like filters, enhancing some aspects of the input pattern and suppressing others.
  • Here, they enhance ratios of frequency components and suppress absolute frequency values.
  • Speculation: their behavior may have some similarities to effects seen in cortex.

90
Sparse Neural Systems: The Ersatz Brain Gets
Thinner

91
Neural Networks
  • The most successful brain-inspired models are neural networks.
  • They are built from simple approximations of biological neurons: nonlinear integration of many weighted inputs.
  • All the other biological detail is thrown out.
92
Layers
  • Up to now we have emphasized local, lateral interactions between cells and cortical columns.
  • But there are also long-range projections in cortex, where one large group of cells projects to another some distance away.
  • Traditional neural net processing is built around these projection systems and has little lateral interaction.
  • It usually assumes full connectivity between layers.
  • Is this correct?

93
Neural Network Systems
  • A standard neural network is formed using:
  • multiple layers
  • projections between layers.

94
A Fully Connected Network
Most neural nets assume full connectivity between
layers. A fully connected neural net uses lots
of connections!
95
Limitation 1 Sparse Connectivity
  • We believe that the computational strategy used by the brain is strongly determined by severe hardware limitations.
  • Example: the brain is sparsely connected. The fractional connectivity of the brain is very low, about 0.001.
  • Implications:
  • Connections are biologically expensive, since they take up space, use energy, and are hard to wire up correctly.
  • Connections are valuable.
  • The pattern of connection is under tight control.
  • Short local connections are cheaper than long ones.
  • But many long projections do exist and are very important.

96
Limitation 2 Sparse Coding
In sparse coding only a few active units
represent an event. "In recent years a
combination of experimental, computational, and
theoretical studies have pointed to the existence
of a common underlying principle involved in
sensory information processing, namely that
information is represented by a relatively small
number of simultaneously active neurons out of a
large population, commonly referred to as sparse
coding." (p. 481) B.A. Olshausen and D.J. Field
(2004). Sparse coding of sensory inputs. Current
Opinion in Neurobiology, 14, 481-487.
97
Advantages of Sparse Coding
  • There are numerous advantages to sparse coding. Sparse coding:
  • increases storage capacity in associative memories,
  • is easy to work with computationally,
  • is very fast (few or no network interactions),
  • is energy efficient.
  • Best of all, it seems to exist!
  • Higher levels (further from sensory inputs) show sparser coding than lower levels.
  • Inferotemporal cortex seems to be more selective and less spontaneously active than primary areas (V1).
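The storage-capacity advantage can be illustrated with a toy Willshaw-style binary associative memory. This is my choice of example, not a model from the slides, sketched in NumPy: sparse binary pattern pairs are stored by OR-ing Hebbian outer products, and recall thresholds at the number of active input units.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k, pairs = 256, 8, 20  # units per layer, active units per pattern, stored pairs

def sparse_pattern(n, k):
    """A binary vector with exactly k of n units active (sparse coding)."""
    v = np.zeros(n, dtype=int)
    v[rng.choice(n, size=k, replace=False)] = 1
    return v

xs = [sparse_pattern(N, k) for _ in range(pairs)]
ys = [sparse_pattern(N, k) for _ in range(pairs)]

# Willshaw-style memory: binary OR of outer-product Hebb matrices
W = np.zeros((N, N), dtype=int)
for x, y in zip(xs, ys):
    W |= np.outer(y, x)

# Recall: an output unit fires only if all k active inputs connect to it
recalled = [(W @ x >= k).astype(int) for x in xs]
errors = sum(int(np.any(r != y)) for r, y in zip(recalled, ys))
```

With patterns this sparse, all twenty pairs are recalled exactly; dense patterns of the same dimensionality would saturate the binary matrix much sooner.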

98
Sparse Connectivity Sparse Coding
  • See if we can make a learning system that starts from the assumption of both:
  • sparse connectivity and
  • sparse coding.
  • If we use simple neural net units it doesn't work so well.
  • But if we use our Network of Networks approximation, it works better and makes some interesting predictions.

99
The Simplest Connection
The simplest sparse system has a single active
unit connecting to a single active unit. If the
potential connection does exist, simple
outer-product Hebb learning can learn it
easily. Not interesting.

100
Paths
A useful notion in sparse systems is the idea of
a path. A path connects a sparsely coded input
unit with a sparsely coded output unit. Paths
have strengths just as connections do. Strengths
are based on the entire path, from input to
output, which may involve intermediate
connections. It is easy for Hebb synaptic
learning to learn paths.
101
Common Parts of a Path
One of many problems: suppose there is a common
portion of a path for two single-active-unit
associations, a with d (a→b→c→d) and e with f
(e→b→c→f). We cannot easily weaken or
strengthen the common part of the path (b→c)
because it is used in multiple associations.
Interference occurs.
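The interference is easy to see with scalar weights. In this illustrative sketch (the numbers are hypothetical), both associations share the middle weight w_bc, so doubling it to strengthen the a-d path unavoidably doubles the e-f path gain as well.

```python
# Two scalar paths share the middle link b -> c:
#   a -> b -> c -> d  with gain  w_ab * w_bc * w_cd
#   e -> b -> c -> f  with gain  w_eb * w_bc * w_cf
w_ab, w_cd = 0.5, 0.5
w_eb, w_cf = 0.5, 0.5
w_bc = 1.0  # the shared link

gain_ad = w_ab * w_bc * w_cd  # 0.25
gain_ef = w_eb * w_bc * w_cf  # 0.25

# Strengthen the shared link to boost the a-d association ...
w_bc *= 2.0
new_gain_ad = w_ab * w_bc * w_cd  # doubled, as intended
new_gain_ef = w_eb * w_bc * w_cf  # also doubled: an unintended side effect
```

With pattern (matrix) links instead of scalars, this coupling between the two associations disappears, as the later slides argue.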
102
Make Many, Many Paths!
Some speculations: if independent paths are
desirable, an initial construction bias would be
to make available as many potential paths as
possible. In a fully connected system, adding
more units than are contained in the input and output
layers would be redundant; they would add no
additional processing power. Obviously not so
in sparse systems! Fact: there is a huge
expansion in the number of units going from retina to
thalamus to cortex. In V1, a million input
fibers drive 200 million V1 neurons.
103
Network of Networks Approximation
  • Single units do not work so well in sparse systems.
  • Let us use our Network of Networks approximation and see if we can do better.
  • Network of Networks: the basic computing units are not neurons but small (10^4 neurons) attractor networks.
  • Basic Network of Networks architecture:
  • a two-dimensional array of modules,
  • locally connected to neighbors.

104
Interactions between Modules
  • Interactions between modules are vector in nature, not simple scalar activity.
  • Interactions between modules are described by state interaction matrices instead of simple scalar weights.
  • We gain greater path selectivity this way.

105
Feedforward, Feedback
  • Emphasize: cortex is not a simple feedforward system moving upward from layer to layer (input to output).
  • It has massive connections backwards from layer to layer, at least as dense as the forward connections.
  • There is not a simple processing hierarchy!

106
Columns and Their Connections
Columnar organization is maintained in both
forward and backward projections: "The
anatomical column acts as a functionally tuned
unit and point of information collation from
laterally offset regions and feedback pathways"
(p. 12); "feedback projections from
extra-striate cortex target the clusters of
neurons that provide feedforward projections to
the same extra-striate site" (p. 22).
Lund, Angelucci and Bressloff (2003). Cerebral
Cortex, 12, 15-24.
107
Sparse Network of Networks
Return to the simplest situation for layers.
Modules a and b can display two orthogonal
patterns: A and C on a, and B and D on b. The
same pathway can learn to associate A with B and
C with D. Path selectivity can overcome the
limitations of scalar systems. Paths run both
upward and downward.
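This path selectivity can be verified directly. Here is a minimal NumPy sketch (the tool choice is mine): one connection matrix M learns both associations by summed outer products, and because A and C are orthogonal, A evokes B and C evokes D over the same physical pathway without interference.

```python
import numpy as np

# Two orthonormal patterns on module a (A, C) and two on module b (B, D)
A = np.array([1., 1., 1., 1.]) / 2
C = np.array([1., -1., 1., -1.]) / 2
B = np.array([1., -1., -1., 1.]) / 2
D = np.array([1., 1., -1., -1.]) / 2

# A single pathway (one matrix) stores both associations by Hebb learning
M = np.outer(B, A) + np.outer(D, C)

out_A = M @ A  # equals B: the D*(C . A) cross-term vanishes since C . A = 0
out_C = M @ C  # equals D, over the same physical pathway
```

A scalar weight between a and b could carry only one association; the matrix carries both, which is the selectivity gain claimed for the Network of Networks.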
108
Common Paths Revisited
Consider the common-path situation again. We
want to associate patterns on two paths, a-b-c-d
and e-b-c-f, with link b-c in common. Parts of
the path are physically common, but they can be
functionally separated if they use different
patterns. Pattern information propagating
forwards and backwards can sharpen and strengthen
specific paths without interfering with the
strengths of other paths.

109
Associative Learning along a Path
Just stringing together simple associators works.

For module b:
  change in coupling term between a and b: Δ(S_ab) = η b aᵀ
  change in coupling term between c and b: Δ(T_cb) = η b cᵀ

For module c:
  Δ(coupling term U_dc) = η c dᵀ
  Δ(coupling term T_bc) = η c bᵀ

If pattern a is presented at layer 1, then the pattern on d is
  (U_cd)(T_bc)(S_ab) a = η³ (d cᵀ)(c bᵀ)(b aᵀ) a = (constant) d
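The chain of associators can be checked numerically. A minimal NumPy sketch (the language choice is mine) with unit-length random patterns and a hypothetical learning constant eta reproduces the result: presenting a yields eta³ times d at the far end.

```python
import numpy as np

rng = np.random.default_rng(2)
# Unit-length patterns on the four modules a, b, c, d
a, b, c, d = (v / np.linalg.norm(v) for v in rng.standard_normal((4, 16)))

eta = 0.9  # hypothetical learning constant
S_ab = eta * np.outer(b, a)  # coupling a -> b
T_bc = eta * np.outer(c, b)  # coupling b -> c
U_cd = eta * np.outer(d, c)  # coupling c -> d

out = U_cd @ T_bc @ S_ab @ a  # pattern reaching module d
# out equals eta**3 * d: a constant times the pattern on d
```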
110
Module Assemblies
Because information moves backward, forward, and
sideways, closed loops are possible and
likely. This has been tried before: Hebb's cell
assemblies were self-exciting neural loops that
corresponded to cognitive entities, for example
concepts. Hebb's cell assemblies are hard to
make work because they used scalar
interconnections between units. But module assemblies can
become a powerful feature of the sparse
approach: we have more selective
connections. See if we can integrate relatively
dense local connections with relatively sparse
projections to and from other layers to form
module assemblies.
111
Biological Evidence: Columnar Organization in IT
  • Tanaka (2003) suggests a columnar organization of different response classes in primate inferotemporal cortex.
  • There seems to be some internal structure in these regions, for example a spatial representation of the orientation of the image within the column.

112
Columns in Inferotemporal Cortex
  • Responses of a region of IT to complex images involve discrete columns.
  • The response to a picture of a fire extinguisher shows how regions of activity are determined.
  • Boundaries are drawn where the activity falls by half.
  • Note that some spots are roughly equally spaced.

113
Active IT Regions for a Complex Stimulus
  • Note the large number of roughly equally spaced spots (about 2 mm apart) for a familiar complex image.

114
Intralayer Connections
  • Intralayer connections are sufficiently dense that active modules a little distance apart can become associatively linked.
  • Recurrent collaterals of cortical pyramidal cells form relatively dense projections around a pyramidal cell. The lateral spread of recurrent collaterals in cortex seems to cover a circle of roughly 3 mm diameter.
  • If we assume that:
  • a column is roughly a third of a mm across,
  • there are roughly 10 columns in a square mm, and
  • a 3 mm diameter circle has an area of roughly 10 square mm,
  • then a column projects locally to about 100 other columns.

115
Loops
If the modules are simultaneously active, the
pairwise associations forming the loop a→b→c→d→a can
be learned through simple Hebb learning. The
path closes on itself. Consider a: after
traversing the linked path a→b→c→d→a, the pattern
arriving back at a around the loop is a constant times
the pattern on a. If the constant is positive,
there is the potential for positive feedback if
the total loop gain is greater than one.
116
Loops with Common Modules
Loops can be kept separate even with common
modules. If the b pattern is different in the
two loops, there is no problem: the selectivity
of the links will keep the activities separate, and
activity from one loop will not spread into the other
(unlike Hebb cell assemblies).
If b is identical in the two loops, b is
ambiguous: there is no a priori reason to
activate Loop 1, Loop 2, or both. Selective
loop activation is still possible, though it
requires additional assumptions to accomplish.
117
Richly Connected Loops
More complex connection patterns are
possible. Richer interconnection patterns might
have all connections learned. The ambiguous module
b would then receive input from d as well as from a and c.
A larger context allows better loop
disambiguation by increasing the coupling
strength of modules.
118
Working Together
Putting it all together: sparse interlayer
connections and dense intralayer connections work
together. Once a coupled module assembly is
formed, it can be linked to by other layers.
The result is a dynamic, adaptive computational
architecture that is both workable and
interesting.
119
Two Parts
Suppose we have two such assemblies that co-occur
frequently: parts of an object, say.
120
Make a Whole!
As learning continues, groups of module
assemblies bind together through Hebb associative
learning. The small assemblies can act as the
sub-symbolic substrate of cognition, and the
larger assemblies as symbols and concepts. Note
the many new interconnections.
121
Conclusion (1)
  • The binding process looks like compositionality.
  • The virtues of compositionality are well known.
  • It is a powerful and flexible way to build
    cognitive information processing systems.
  • Complex mental and cognitive objects can be built
    from previously constructed, statistically
    well-designed pieces. (Like cognitive Legos.)

122
Conclusion (2)
  • We are suggesting here a possible model for the dynamics and learning in a compositional-like system.
  • It is built from constraints derived from connectivity, learning, and dynamics, not as a way to do optimal information processing.
  • Perhaps this property of cognitive systems is more like a splendid bug fix than a well-chosen computational strategy.
  • Sparseness is an idea worth pursuing.
  • It may be a way to organize and teach a cognitive computer.

123
Conclusions
  • Speculation: perhaps digital computers and humans (and brain-like computers?) are evolving toward a complementary relationship.
  • Each computational style has its virtues:
  • Humans (and brain-like computers?) show flexibility, estimation, and connection to the physical world.
  • Digital computers show speed, logic, and accuracy.
  • Both styles of computation are valuable; there is a place for both.
  • But their hardware is so different that brain-like coprocessors make sense.
  • As always, software will be more difficult to build and understand than hardware.