Title: Learning with spikes, and the Unresolved Question in Neuroscience/Complex Systems
1. Learning with spikes, and the Unresolved Question in Neuroscience/Complex Systems
- Tony Bell
- Helen Wills Neuroscience Institute
- University of California at Berkeley
2. Learning in real neurons
Long-term potentiation and depression (LTP/LTD)
Bliss and Lømo (1973) discovered associative and input-specific (Hebbian) changes in the sizes of EPSCs, a potential memory mechanism (the memory trace). These changes were found first in the hippocampus, which is known to be implicated in learning and memory. LTP results from high-frequency presynaptic stimulation, or from low-frequency presynaptic stimulation paired with postsynaptic depolarisation. LTD results from prolonged low-frequency stimulation. Levy and Steward (1983) varied the timing of weak and strong inputs from entorhinal cortex to hippocampus, finding LTD when the weak input came after the strong one, and LTP when the strong input came simultaneously with, or up to 20 ms after, the weak one.
Spike Timing-Dependent Plasticity (STDP)
Markram et al. (1997) found a 10 ms window for the time-dependence of plasticity by manipulating pre- and post-synaptic spike timings.
3. Spike Timing-Dependent Plasticity
Experimenting with pre- and post-synaptic spike timings at a synapse between a retinal ganglion cell and a tectal cell (Zhang et al., 1998).
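The asymmetric window measured in such experiments is often summarised by a pair of exponentials. The sketch below assumes that common phenomenological form; the amplitudes, the 10 ms time constants and the all-to-all pairing of spikes are hypothetical choices, not values fitted to Zhang et al.'s data.

```python
import math


def stdp_window(dt_ms, a_plus=1.0, a_minus=1.05, tau_plus=10.0, tau_minus=10.0):
    """Weight change for one pre/post spike pair, with dt_ms = t_post - t_pre.

    Pre-before-post (dt > 0) potentiates, post-before-pre (dt < 0) depresses,
    and both effects decay exponentially with |dt|. Parameters are hypothetical.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0  # convention chosen here for exact coincidence


def pairwise_update(pre_times, post_times, **params):
    """Sum the window over all pre/post pairs (all-to-all pairing assumption)."""
    return sum(
        stdp_window(t_post - t_pre, **params)
        for t_pre in pre_times
        for t_post in post_times
    )


print(pairwise_update([0.0], [5.0]))  # pre leads post by 5 ms: positive
print(pairwise_update([5.0], [0.0]))  # post leads pre by 5 ms: negative
```

Making `a_minus` slightly larger than `a_plus` is one common way to keep uncorrelated firing from driving weights upward, but it is only a modelling choice here.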
4. STDP is different in different neurons. Diverse mechanisms, common objective??
(Figure from Abbott and Nelson)
5. STDP is different in different neurons. Diverse mechanisms, common objective?? This may be true, but first we had better understand the mechanism, or we will most likely think up a bad theory based on our current prejudices, and it won't have any relevance to biology (which, like the rest of the world, is stranger than we can suppose).
13. Equation for membrane voltage (cable equation)
Terms: membrane capacitance; conductance along the dendrite; maximum conductance for channel species k; time-varying fraction of those channels open; reversal potential for channel species k.
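As a hedged sketch, assume the usual compartmental form $c_m\,\partial V/\partial t = g_a\,\partial^2 V/\partial x^2 - \sum_k \bar g_k f_k (V - E_k) + I_{\text{inj}}$, where the symbol names are assumptions chosen to match the terms listed above. The code integrates a discretised version with forward Euler. The fractions open $f_k$ are held fixed, current is injected only at one end, and all values and units are hypothetical. Forward Euler stays stable only while `dt * (4 * g_axial + sum(g_max)) / c_m < 2`.

```python
import numpy as np


def simulate_cable(n_comp=50, dt=0.01, t_end=50.0, c_m=1.0, g_axial=5.0,
                   g_max=(0.1,), frac_open=(1.0,), e_rev=(-70.0,),
                   i_inject=1.0):
    """Forward-Euler integration of a discretised cable with sealed ends.

    Per compartment:
        c_m dV/dt = g_axial * (second difference of V)
                    - sum_k g_max[k] * frac_open[k] * (V - e_rev[k])
                    + injected current (compartment 0 only)
    """
    v = np.full(n_comp, float(e_rev[0]))  # start at the first reversal potential
    i_ext = np.zeros(n_comp)
    i_ext[0] = i_inject
    for _ in range(int(round(t_end / dt))):
        lap = np.empty_like(v)
        lap[1:-1] = v[:-2] - 2.0 * v[1:-1] + v[2:]
        lap[0] = v[1] - v[0]      # sealed (no-flux) end
        lap[-1] = v[-2] - v[-1]   # sealed (no-flux) end
        # Ionic current summed over channel species k
        i_ion = sum(g * f * (v - e) for g, f, e in zip(g_max, frac_open, e_rev))
        v = v + (dt / c_m) * (g_axial * lap - i_ion + i_ext)
    return v


v = simulate_cable()
print(v[0], v[25], v[-1])  # depolarisation falls off with distance from the injection site
```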
14. Equation for ion channel kinetics (non-linear Markov model)
- voltage: information from within the cell
- extracellular ligands: information from other cells
- intracellular calcium: information from other molecules
- etc.
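The smallest instance of such a Markov model is a two-state channel (closed ⇄ open) whose open fraction obeys $df/dt = \alpha(V)(1-f) - \beta(V) f$. The sketch below assumes that form. The exponential voltage dependence of the rates is hypothetical, and ligand or calcium dependence would enter as extra arguments to the rate functions. This $f$ is the "time-varying fraction of channels open" in the cable equation, and feeding it back into $V$ is what makes the coupled system non-linear.

```python
import math


def opening_rate(v):
    """Closed -> open transition rate (1/ms); hypothetical voltage dependence."""
    return 0.1 * math.exp(v / 20.0)


def closing_rate(v):
    """Open -> closed transition rate (1/ms); hypothetical voltage dependence."""
    return 0.1 * math.exp(-v / 20.0)


def steady_state_open(v):
    """Equilibrium open fraction alpha / (alpha + beta) at fixed voltage."""
    a, b = opening_rate(v), closing_rate(v)
    return a / (a + b)


def simulate_open_fraction(voltage_at, t_end=100.0, dt=0.01, f0=0.0):
    """Forward-Euler integration of df/dt = alpha(V)(1 - f) - beta(V) f.

    voltage_at(t) supplies the membrane voltage at time t (ms).
    """
    f = f0
    for step in range(int(round(t_end / dt))):
        v = voltage_at(step * dt)
        f += dt * (opening_rate(v) * (1.0 - f) - closing_rate(v) * f)
    return f


# Voltage clamp at 20 mV: the open fraction relaxes to alpha / (alpha + beta).
f_clamped = simulate_open_fraction(lambda t: 20.0)
print(f_clamped, steady_state_open(20.0))
```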
15. Can we connect the information-theoretic learning principles we studied yesterday to the biophysical and molecular reality of these processes? Let's give it a go in a simplified model: the Spike Response Model (a sophisticated variant of the integrate-and-fire model).
16. HOW DOES ONE SPIKE TIMING AFFECT ANOTHER?
Gerstner's SPIKE RESPONSE MODEL
IMPLICIT DIFFERENTIATION
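A minimal sketch of the idea, assuming a single output spike with no reset or refractoriness, an alpha-shaped PSP kernel $\varepsilon$, and potential $u(t) = \sum_j w_j\,\varepsilon(t - t_j)$. The output spike time $t$ is defined implicitly by $u(t) = \theta$. Differentiating that identity gives $\partial t/\partial t_j = w_j\,\varepsilon'(t - t_j)/u'(t)$ and $\partial t/\partial w_j = -\varepsilon(t - t_j)/u'(t)$. The kernel shape, `TAU` and `THETA` are hypothetical choices. One useful check is that the timing derivatives sum to 1, because shifting every input by the same amount shifts the output by that amount.

```python
import numpy as np

TAU = 5.0    # hypothetical PSP time constant (ms)
THETA = 1.0  # hypothetical firing threshold


def eps(s):
    """Alpha-function PSP kernel, peak 1 at s = TAU, zero for s <= 0."""
    s = np.asarray(s, dtype=float)
    return np.where(s > 0, (s / TAU) * np.exp(1.0 - s / TAU), 0.0)


def deps(s):
    """Time derivative of the kernel."""
    s = np.asarray(s, dtype=float)
    return np.where(s > 0, (1.0 - s / TAU) * np.exp(1.0 - s / TAU) / TAU, 0.0)


def potential(t, t_in, w):
    """u(t) = sum_j w_j eps(t - t_j)."""
    return float(np.sum(w * eps(t - t_in)))


def first_spike_time(t_in, w, t_max=50.0, dt=0.01):
    """First upward crossing of THETA: coarse grid search, then bisection."""
    grid = np.arange(0.0, t_max, dt)
    u = eps(grid[:, None] - t_in[None, :]) @ w
    above = np.nonzero(u >= THETA)[0]
    if above.size == 0 or above[0] == 0:
        return None
    lo, hi = grid[above[0] - 1], grid[above[0]]
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if potential(mid, t_in, w) >= THETA:
            hi = mid
        else:
            lo = mid
    return hi


def spike_time_gradients(t_out, t_in, w):
    """Implicit differentiation of u(t_out; t_in, w) = THETA."""
    slope = np.sum(w * deps(t_out - t_in))   # du/dt at the crossing (> 0)
    d_t_in = w * deps(t_out - t_in) / slope  # d t_out / d t_j
    d_w = -eps(t_out - t_in) / slope         # d t_out / d w_j
    return d_t_in, d_w


t_in = np.array([0.0, 2.0, 4.0])
w = np.array([0.5, 0.4, 0.6])
t_out = first_spike_time(t_in, w)
d_t_in, d_w = spike_time_gradients(t_out, t_in, w)
print(t_out, d_t_in, d_w)
```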
17. Assuming a deterministic, feedforward, invertible network, the idea is for output spikes to be as sensitive as possible to inputs.
- Maximum Likelihood: try to map inputs uniformly into the unit hypercube
- Maximum Spikelihood: try to map inputs into independent Poisson processes
p(t', i')
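Both targets can be illustrated with a standard fact rather than with the author's network: the time-rescaling theorem. Warping time by the integrated rate $\Lambda(t) = \int_0^t \lambda(s)\,ds$ turns an inhomogeneous Poisson train into a unit-rate Poisson process. Its inter-spike intervals then become Exp(1), and $1 - e^{-\text{ISI}}$ maps them uniformly onto the unit interval. The rate function below is hypothetical, and thinning is used only to generate test data.

```python
import numpy as np

rng = np.random.default_rng(0)


def rate(t):
    """Hypothetical time-varying firing rate (spikes per unit time)."""
    return 5.0 + 4.0 * np.sin(t)


def integrated_rate(t):
    """Lambda(t) = integral of rate from 0 to t."""
    return 5.0 * t + 4.0 * (1.0 - np.cos(t))


def inhomogeneous_poisson(t_end, rate_max=9.0):
    """Sample spike times by thinning a homogeneous process of rate rate_max."""
    n = rng.poisson(rate_max * t_end)
    candidates = np.sort(rng.uniform(0.0, t_end, n))
    keep = rng.uniform(0.0, rate_max, n) < rate(candidates)
    return candidates[keep]


spikes = inhomogeneous_poisson(2000.0)

# In rescaled time the train is unit-rate Poisson, so the intervals are Exp(1) ...
isi = np.diff(integrated_rate(spikes))
# ... and 1 - exp(-isi) spreads them uniformly over [0, 1).
u = 1.0 - np.exp(-isi)
print(isi.mean(), u.mean())  # close to 1 and 0.5
```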
18. OBJECTIVE FUNCTIONS FOR RATE AND SPIKING MODELS
USE THE BANDWIDTH
BE NON-LOSSY
Rate model: use all firing rates equally
$L(\mathbf{x}) = \log|W| + \sum_i \log q(u_i)$
Spiking model: make the spike count Poisson
$L(\mathbf{t}, \mathbf{i}) = \log|T| + \sum_i \log q(n'_i)$
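For the rate-model objective, a minimal sketch assumes a linear map $\mathbf{u} = W\mathbf{x}$ and a super-Gaussian prior $q(u) = 1/(\pi\cosh u)$, for which $d\log q/du = -\tanh u$. It also swaps in natural-gradient ascent, $\Delta W \propto (I - \tanh(\mathbf{u})\,\mathbf{u}^\top) W$, as used in infomax ICA. The sources, the mixing matrix and the learning rate are hypothetical, and the spiking objective with $\log|T|$ is not attempted here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_src, n_samp = 2, 5000
S = rng.laplace(size=(n_src, n_samp))   # independent, super-Gaussian sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])  # hypothetical mixing matrix
X = A @ S                               # observed mixtures x


def log_likelihood(W, X):
    """Sample average of log|W| + sum_i log q(u_i), with q(u) = 1 / (pi cosh u)."""
    U = W @ X
    log_q = -np.log(np.pi) - np.log(np.cosh(U))
    return np.log(abs(np.linalg.det(W))) + log_q.sum(axis=0).mean()


W = np.eye(n_src)
ll_start = log_likelihood(W, X)
learning_rate = 0.05
for _ in range(2000):
    U = W @ X
    # Natural-gradient ascent on the log-likelihood (d log q / du = -tanh u)
    W += learning_rate * (np.eye(n_src) - np.tanh(U) @ U.T / n_samp) @ W
ll_end = log_likelihood(W, X)

print(ll_start, ll_end)
print(W @ A)  # approximately a scaled permutation matrix
```

When `W @ A` is close to a scaled permutation, each output recovers one independent source up to order and scale. That is the rate-model analogue of the demultiplexing experiment.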
19. THE LEARNING RULE
For the objective
$L(\mathbf{t}, \mathbf{i}) = \log|T| + \sum_i \log q(n'_i)$
the learning rule is written in terms of:
- mean rate
- rate at input synapse
- sum over spikes from neuron j
when T is a single
20. Simulation results: coincidence detection (demultiplexing)
A 9x9 network extracts independent point processes from correlated ones.
(Figure: original spike trains → mixing matrix → multiplexed spike trains → unmixing matrix (learned) → demultiplexed spike trains)
21. (Figure: demultiplexed vs. original spike trains)
22. Compare with STDP
The Spike Response Model is causal: it only takes into account how output spikes talk about past input spikes.
(Froemke and Dan, Nature 2002; Bell and Parra, NIPS 17)
But real STDP has a predictive component (spikes also talk about future spikes).
Postsynaptic calcium integrates this information (Zucker, 1998), both causal (NMDA channels → CaMK) and predictive (L-channels → calcineurin).
23. Problems with this spikelihood model
- requires a non-lossy map (t, i) → (t', i') (which we enforced)
- learning is (horrendously) non-local
- model does not match STDP curves
- model ignores predictive information
- information only flows from synapse to soma, and not back down
(Figure labels: in, out)
24. By infomaxing from input spikes to output spikes, we are ignoring the information that flows from output spikes (and from elsewhere in the dendrites) back down to where the input information came from: the site of learning, the protein/calcium machinery at post-synaptic densities, where the plasticity calculation actually takes place. What happens if you include this in your Jacobian? Then the Jacobian between all spike timings becomes the sum total of all intradendritic causalities, and spikes are talking to synapses, not to other spikes. This is a massively overcomplete inter-level information flow (1000 times as many synaptic events as neural events). What kind of density estimation scheme do we then have?
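One ingredient any such scheme would need is a volume factor for a non-square Jacobian. When $n$ spike timings map onto an $n$-dimensional surface inside an $m$-dimensional space of synaptic readouts ($m \gg n$), $\log|\det J|$ no longer applies. The standard replacement is $\tfrac12\log\det(J^\top J)$, the log of the local stretch of volume, which reduces to $\log|\det J|$ when $J$ is square. The sketch below only computes that quantity. The random Jacobian is a hypothetical stand-in for the intradendritic causalities, and this is not offered as the author's answer to the question.

```python
import numpy as np


def log_volume_factor(J):
    """0.5 * log det(J^T J) for an m x n Jacobian J with m >= n and full column rank.

    Log of the factor by which an n-dimensional volume element is stretched
    when mapped into R^m; for square J it equals log|det J|.
    """
    sign, logdet = np.linalg.slogdet(J.T @ J)
    if sign <= 0:
        raise ValueError("Jacobian must have full column rank")
    return 0.5 * logdet


rng = np.random.default_rng(2)
n_spikes = 5                        # hypothetical number of spike timings
n_synaptic = 1000 * n_spikes        # "1000 times as many synaptic events"
J = rng.normal(size=(n_synaptic, n_spikes))  # stand-in for d(readouts)/d(timings)
print(log_volume_factor(J))
```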
25. The Within models and creates the Between
i.e. what happens inside the cells (timings, voltage, calcium) models and creates what happens between the cells (spikes).
26. Post-synaptic machinery (site of learning) integrates incoming spike information with global cell state.
Ca²⁺ converts timing and voltage information into molecular change.
(Figure labels: synapse; neurotransmitter (glutamate); NMDA channel; AMPA channel; voltage-dependent L-channel; Ca²⁺; endoplasmic reticulum; dendrite; protein machinery; vesicle with glu receptors is trafficked to plasma membrane)
27. Networks within networks
(Figure labels, from largest to smallest scale: network of 2 agents; 1 brain; network of neurons; 1 cell; network of protein complexes (e.g. synapses); network of macromolecules)
28. A Multi-Level View of Learning (STDP)
(Figure axis: increasing timescale)
LEARNING at a LEVEL is CHANGE IN INTERACTIONS between its UNITS, implemented by INTERACTIONS at the LEVEL beneath, and by extension resulting in CHANGE IN LEARNING at the LEVEL above.
Interactions: fast. Learning: slow.
Separation of timescales allows INTERACTIONS at one LEVEL to be LEARNING at the LEVEL above.
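A toy illustration of separation of timescales, using Oja's Hebbian rule as a swapped-in example rather than anything from the slide. A fast activity variable settles within each input presentation (the interaction). A slow weight then integrates those settled interactions over thousands of presentations (the learning) and ends up aligned with the input's principal component. The covariance, the time constants and the learning rate are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
C = np.array([[3.0, 1.0], [1.0, 1.0]])  # hypothetical input covariance
L = np.linalg.cholesky(C)

tau_fast, dt = 1.0, 0.25  # fast "interaction" timescale
eta = 0.005               # slow "learning" rate
w = np.array([1.0, 0.0])

for _ in range(5000):
    x = L @ rng.standard_normal(2)
    # Fast level: activity y relaxes towards w . x within one presentation.
    y = 0.0
    for _step in range(20):
        y += (dt / tau_fast) * (-y + w @ x)
    # Slow level: Oja's rule turns the settled interaction into a weight change.
    w += eta * y * (x - y * w)

principal = np.linalg.eigh(C)[1][:, -1]  # eigenvector of the largest eigenvalue
print(abs(w @ principal) / np.linalg.norm(w))  # close to 1
```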
29. Advantages
- A closed system can model itself (sleep, thought).
- World modeling is not done directly. Rather, it occurs as a side-effect of self-modeling. The world is a boundary condition on this modeling, imposed by the level above: the social level.
- The variables which form the probability model are explicitly located at the level beneath the level being modeled.
- Generalising to molecular and social networks suggests that gene expression and reward-based social agency may just be other forms of inter-level density estimation.
30. Does the standard model really suffice?
(Figure labels: Reinforcement; Decision; Eh... somewhere else; Action; V whatever; V1; Thalamus; Retina)
31. Does the standard model really suffice?
(Figure labels: Reinforcement; Decision; Eh... somewhere else; Action; V whatever; V1; Thalamus; Retina)
Or is it levels-chauvinism?
32. The standard (or rather the slightly more emerged) neurostatistical model, as articulated by Emo Todorov:
The emerging computational theory of perception is Bayesian inference. It postulates that the sensory system combines a prior probability over possible states of the world with a likelihood that observed sensory data was caused by each possible state, and computes a posterior probability over the states of the world given the sensory data. The emerging computational theory of movement is stochastic optimal control. It postulates that the motor system combines a utility function quantifying the goodness of each possible outcome with a dynamics model of how outcomes are caused by control sequences, and computes a control law (state-control mapping) which optimizes expected utility.
But we haven't yet seen what unsupervised models may do when they are involved in sensory-motor loops. They may sidestep common criticisms of feedforward unsupervised theories.
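To make the two halves of this standard model concrete, here are textbook minimal instances, not Todorov's own models. The first is Bayesian inference over a few discrete world states. The second is a scalar linear-quadratic regulator: for dynamics $x' = ax + bu$ plus additive noise and cost $\sum (q x^2 + r u^2)$, a backward Riccati recursion yields the control law $u = -Kx$. Additive Gaussian noise does not change $K$ (certainty equivalence). All numbers are hypothetical.

```python
import numpy as np


def posterior(prior, likelihood):
    """P(state | data) proportional to P(data | state) * P(state), over discrete states."""
    unnormalised = np.asarray(prior, float) * np.asarray(likelihood, float)
    return unnormalised / unnormalised.sum()


def lqr_gain(a, b, q, r, horizon=200):
    """Scalar discrete-time LQR for x' = a x + b u (+ noise), cost sum q x^2 + r u^2.

    Runs the backward Riccati recursion and returns the first-step gain K of u = -K x.
    """
    p = q
    gain = 0.0
    for _ in range(horizon):
        gain = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * gain
    return gain


post = posterior([0.5, 0.3, 0.2], [0.1, 0.6, 0.3])
print(post)
print(lqr_gain(1.0, 1.0, 1.0, 1.0))  # converges to (sqrt(5) - 1) / 2
```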
33. (1) Infomax between Levels (e.g. synapses density-estimate spikes)
- between-level
- includes all feedback
- molecular net models/creates
- social net is boundary condition
- permits arbitrary activity dependencies
- models input and intrinsic together
(2) Infomax between Layers (e.g. V1 density-estimates Retina)
- within-level
- feedforward
- molecular sublevel is implementation
- social superlevel is reward
- predicts independent activity
- only models outside input
(Figure labels: t, all neural spikes, with the pdf of all spike times; y, all synaptic readout from synapses and dendrites, with the pdf of all synaptic readouts)
If we can make this pdf uniform, then we have a model constructed from all synaptic and dendritic causality.
34. What about the mathematics? Is it tractable? Not yet. A new, in many ways satisfactory, objective is defined, but the gradient calculation seems very difficult. But this is still progress.
35. Density Estimation when the input is affected
Make the model like the reality by minimising the Kullback-Leibler divergence by gradient descent in a parameter of the model. When the input is affected by the model, this means changing the world to fit the model, as well as changing one's model to fit the world.
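A toy sketch of the two routes, under loudly stated assumptions: the world is $N(\mu_w, 1)$ with a hypothetical linear dependence $\mu_w = \mu_0 + \alpha\theta$ on the model's parameter $\theta$, and the model is $N(\theta, 1)$. Then $\mathrm{KL} = \tfrac12(\mu_w - \theta)^2$. Its gradient splits into a term with the world held fixed (move the model toward the world) and a term through the world's dependence on $\theta$ (move the world toward the model).

```python
MU_0, ALPHA = 2.0, 0.5  # hypothetical: the world's mean is MU_0 + ALPHA * theta


def world_mean(theta):
    return MU_0 + ALPHA * theta


def kl(theta):
    """KL(world || model) for unit-variance Gaussians N(world_mean, 1) and N(theta, 1)."""
    return 0.5 * (world_mean(theta) - theta) ** 2


theta, eta = 0.0, 0.5
for _ in range(200):
    gap = world_mean(theta) - theta
    model_term = -gap         # dKL/dtheta with the world held fixed: fit model to world
    world_term = ALPHA * gap  # dKL/dtheta through the world's dependence on theta
    theta -= eta * (model_term + world_term)

print(theta, kl(theta))  # theta approaches MU_0 / (1 - ALPHA) = 4, KL approaches 0
```

With `ALPHA = 0` the world term vanishes and this reduces to ordinary density estimation. Larger `ALPHA` gives the parameter more leverage over its own input.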
36. Conclusion
This should be easier, but it isn't yet. I'm open to suggestions. What have we learned from other complex self-organising systems? Is there a simpler model which captures the essence of the problem?