Population Coding

Alexandre Pouget
Okinawa Computational Neuroscience Course
Okinawa, Japan
November 2004

Outline

- Definition
- The encoding process
- Decoding population codes
- Quantifying information: Shannon and Fisher information
- Basis functions and optimal computation


Receptive field

[Figure: response of a neuron as a function of the stimulus s (direction of motion), with spike rasters for trials 1-4]


Tuning curves and noise

- Examples of tuning curves
- Retinal location, orientation, depth, color, eye movements, arm movements, numbers, etc.

Population Codes

[Figure: left panel "Tuning Curves", activity (0-100) vs. direction (deg, -100 to 100) for a population of neurons; right panel "Pattern of activity (r)", activity vs. preferred direction (deg) in response to an unknown stimulus s?]

Bayesian approach

- We want to recover P(s|r). Using Bayes' theorem, we have
  P(s|r) = P(r|s) P(s) / P(r)


Bayesian approach

- If we are to do any type of computation with population codes, we need a probabilistic model of how the activities are generated, P(r|s), i.e., we need to model the encoding process.

Activity distribution

Tuning curves and noise

- The activity (number of spikes per second) of a neuron can be written as
  r_i = f_i(s) + n_i
  where f_i(s) is the mean activity of the neuron (the tuning curve) and n_i is a noise term with zero mean.
- If the noise is gaussian, then
  P(r_i|s) = exp(-(r_i - f_i(s))^2 / 2σ^2) / √(2πσ^2)

Probability distributions and activity

- The noise is a random variable which can be characterized by a conditional probability distribution, P(n_i|s).
- The distribution of the activity, P(r_i|s), and the distribution of the noise differ only by their means (E[n_i] = 0, E[r_i] = f_i(s)).

Examples of activity distributions

- Gaussian noise with fixed variance
- Gaussian noise with variance equal to the mean
- Poisson distribution
- The variance of a Poisson distribution is equal to its mean.
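These three noise models can be simulated directly. The sketch below assumes a hypothetical population of 21 neurons with Gaussian tuning curves (peak 50 spikes/s, width 30 deg); all parameters are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def tuning_curve(s, preferred, peak=50.0, width=30.0):
    """Mean firing rate f_i(s): Gaussian tuning around each preferred direction."""
    return peak * np.exp(-0.5 * ((s - preferred) / width) ** 2)

preferred = np.linspace(-100, 100, 21)   # preferred directions of 21 neurons
s = 10.0                                 # true stimulus direction (deg)
f = tuning_curve(s, preferred)           # mean activities f_i(s)

# Three noise models from the slides:
r_fixed = f + rng.normal(0.0, 5.0, f.shape)                  # Gaussian, fixed variance
r_scaled = f + rng.normal(0.0, 1.0, f.shape) * np.sqrt(f)    # Gaussian, variance = mean
r_poisson = rng.poisson(f).astype(float)                     # Poisson (variance = mean)
```

Note that with variance equal to the mean, weakly driven neurons are nearly noiseless while strongly driven ones fluctuate the most, which is what the comparison figure below illustrates.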

Comparison of Poisson vs Gaussian noise with variance equal to the mean

[Figure: probability (0-0.09) vs. activity (spikes/sec, 0-140) for the two distributions]

Population of neurons

- Gaussian noise with fixed variance:
  P(r|s) = Π_i exp(-(r_i - f_i(s))^2 / 2σ^2) / √(2πσ^2)

Population of neurons

- Gaussian noise with arbitrary covariance matrix Σ:
  P(r|s) = exp(-(r - f(s))^T Σ^{-1} (r - f(s)) / 2) / √((2π)^n |Σ|)

Outline

- Definition
- The encoding process
- Decoding population codes
- Quantifying information: Shannon and Fisher information
- Basis functions and optimal computation

Population Codes

[Figure: left panel "Tuning Curves", activity (0-100) vs. direction (deg, -100 to 100); right panel "Pattern of activity (r)", activity vs. preferred direction (deg) in response to an unknown stimulus s?]

Nature of the problem

In response to a stimulus with unknown value s, you observe a pattern of activity r. What can you say about s given r?

Bayesian approach: recover P(s|r) (the posterior distribution).

Estimation Theory

[Figure: the activity vector r is fed to an estimator, which returns an estimate of the stimulus]

Estimation theory

- A common measure of decoding performance is the mean square error between the estimate and the true value, E[(ŝ - s)^2].
- This error can be decomposed as
  E[(ŝ - s)^2] = (E[ŝ] - s)^2 + E[(ŝ - E[ŝ])^2] = bias^2 + variance
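The bias-variance decomposition can be verified numerically. This sketch uses a deliberately biased shrinkage estimator of a hypothetical scalar stimulus; the identity holds exactly for sample moments.

```python
import numpy as np

rng = np.random.default_rng(1)
s_true = 10.0
n_trials = 200000

# A deliberately biased estimator: shrink each noisy observation toward 0.
obs = s_true + rng.normal(0.0, 2.0, n_trials)
est = 0.9 * obs

mse = np.mean((est - s_true) ** 2)        # mean square error
bias2 = (np.mean(est) - s_true) ** 2      # squared bias
variance = np.var(est)                    # variance of the estimator
```

Here the shrinkage trades a little extra bias (about 1 deg^2) against the variance of the raw observation.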

Efficient Estimators

- The smallest achievable variance for an unbiased estimator is known as the Cramér-Rao bound, σ_CR².
- An efficient estimator is such that Var(ŝ) = σ_CR².
- In general, Var(ŝ) ≥ σ_CR².


Examples of decoders

Voting Methods

- Optimal Linear Estimator:
  ŝ = Σ_i w_i r_i

Linear Estimators

- The weights w are chosen to minimize the mean square error E[(ŝ - s)^2].
- X and Y must be zero mean (subtract the means before fitting).
- The solution weights cells by the inverse of their noise: trust cells that have small variances and large covariances.


Voting Methods

- Optimal Linear Estimator
- Center of Mass

Center of Mass/Population Vector

- The center of mass,
  ŝ = Σ_i r_i s_i / Σ_i r_i,
  is optimal (unbiased and efficient) iff the tuning curves are gaussian with a zero baseline, uniformly distributed, and the noise follows a Poisson distribution.
- In general, the center of mass has a large bias and a large variance.
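A minimal sketch of the center-of-mass decoder, assuming Gaussian tuning curves with zero baseline (all parameters hypothetical). On the noiseless activity pattern it recovers the stimulus up to a small edge bias from the finite range of preferred directions.

```python
import numpy as np

def center_of_mass(r, preferred):
    """Center-of-mass estimate: activity-weighted mean of preferred directions."""
    return np.sum(r * preferred) / np.sum(r)

preferred = np.linspace(-100, 100, 21)          # preferred directions (deg)
s_true = 10.0
f = 50.0 * np.exp(-0.5 * ((s_true - preferred) / 30.0) ** 2)  # noiseless activities

s_com = center_of_mass(f, preferred)
```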

Voting Methods

- Optimal Linear Estimator
- Center of Mass
- Population Vector


Population Vector

Typically, the population vector is not the optimal linear estimator.

- Population vector is optimal iff The tuning

curves are cosine, uniformly distributed and the

noise follows a normal distribution with fixed

variance - In most cases, the population vector is biased

and has a large variance
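The population vector can be sketched as follows: each neuron votes with a unit vector along its preferred direction, weighted by its activity, and the estimate is the angle of the resultant. Assuming cosine tuning with a baseline (so activities stay nonnegative) and uniformly spaced preferred directions, the noiseless estimate recovers the stimulus exactly; all parameters here are hypothetical.

```python
import numpy as np

def population_vector(r, preferred_deg):
    """Angle of the activity-weighted sum of unit vectors along preferred directions."""
    th = np.deg2rad(preferred_deg)
    return np.rad2deg(np.arctan2(np.sum(r * np.sin(th)), np.sum(r * np.cos(th))))

preferred = np.linspace(0, 360, 16, endpoint=False)     # full circle of 16 neurons
s_true = 70.0
f = 1.0 + np.cos(np.deg2rad(preferred - s_true))        # cosine tuning with baseline

s_pv = population_vector(f, preferred)
```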

Maximum Likelihood

- The maximum likelihood estimate is the value of s maximizing the likelihood P(r|s). Therefore, we seek ŝ_ML such that
  ŝ_ML = argmax_s P(r|s)
- ŝ_ML is unbiased and efficient.

Maximum Likelihood

[Figure: tuning curves, activity (0-100) vs. direction (deg, -100 to 100)]

Maximum Likelihood

[Figure: a template with the tuning-curve profile is slid across the observed pattern of activity, plotted against preferred direction (deg)]

ML and template matching

- Maximum likelihood is a template matching procedure, BUT the metric used is not always the Euclidean distance; it depends on the noise distribution.


Maximum Likelihood

- If the noise is gaussian and independent,
  P(r|s) = Π_i exp(-(r_i - f_i(s))^2 / 2σ^2) / √(2πσ^2)
- Therefore, maximizing the likelihood amounts to minimizing the Euclidean distance Σ_i (r_i - f_i(s))^2,
- and the estimate is given by
  ŝ_ML = argmin_s Σ_i (r_i - f_i(s))^2
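This template-matching view of ML under independent Gaussian noise can be sketched with a grid search over s (tuning-curve and noise parameters hypothetical): the estimate is the stimulus whose template lies closest to the observed pattern in Euclidean distance.

```python
import numpy as np

rng = np.random.default_rng(2)
preferred = np.linspace(-100, 100, 21)

def f(s):
    """Gaussian tuning curves; s may be a scalar or an array of candidate stimuli."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    return 50.0 * np.exp(-0.5 * ((s[None, :] - preferred[:, None]) / 30.0) ** 2)

s_true = 10.0
r = f(s_true)[:, 0] + rng.normal(0.0, 5.0, preferred.size)   # one noisy trial

# ML = template matching: pick the grid value minimizing the Euclidean distance
grid = np.linspace(-100, 100, 2001)
d2 = np.sum((r[:, None] - f(grid)) ** 2, axis=0)
s_ml = grid[np.argmin(d2)]
```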

Maximum Likelihood

[Figure: best-fitting template over the pattern of activity, plotted against preferred direction (deg)]

Gaussian noise with variance proportional to the mean

- If the noise is gaussian with variance proportional to the mean, the distance being minimized changes to the variance-weighted distance Σ_i (r_i - f_i(s))^2 / f_i(s).

Bayesian approach

- We want to recover P(s|r). Using Bayes' theorem, we have
  P(s|r) = P(r|s) P(s) / P(r)

Bayesian approach

- The prior P(s) corresponds to any knowledge we may have about s before we get to see any activity.
- Note: the Bayesian approach does not reduce to the use of a prior.

Bayesian approach

Once we have P(s|r), we can proceed in two different ways. We can keep this distribution for Bayesian inferences (as we would do in a Bayesian network), or we can make a decision about s. For instance, we can estimate s as the value that maximizes P(s|r). This is known as the maximum a posteriori (MAP) estimate. For a flat prior, ML and MAP are equivalent.
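A minimal sketch of the posterior computation on a stimulus grid, assuming independent Gaussian noise and a flat prior (under which MAP and ML coincide); all tuning and noise parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
preferred = np.linspace(-100, 100, 21)

def f(s):
    s = np.atleast_1d(np.asarray(s, dtype=float))
    return 50.0 * np.exp(-0.5 * ((s[None, :] - preferred[:, None]) / 30.0) ** 2)

s_true, sigma = 10.0, 5.0
r = f(s_true)[:, 0] + rng.normal(0.0, sigma, preferred.size)

grid = np.linspace(-100, 100, 2001)
log_lik = -0.5 * np.sum((r[:, None] - f(grid)) ** 2, axis=0) / sigma**2
log_prior = np.zeros_like(grid)                 # flat prior over the grid
log_post = log_lik + log_prior                  # log P(s|r) up to a constant

post = np.exp(log_post - log_post.max())
post /= post.sum()                              # normalized posterior on the grid

s_map = grid[np.argmax(log_post)]               # MAP estimate
s_ml = grid[np.argmax(log_lik)]                 # ML estimate
```

Keeping the whole normalized posterior (rather than just s_map) is what allows further Bayesian inferences downstream.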

Bayesian approach

Limitations: the Bayesian approach and ML require a lot of data. Estimating P(r|s) requires at least O(n²) parameters; for n = 100 neurons, that is on the order of n² = 10,000 parameters.

Alternative: estimate P(s|r) directly using a nonlinear estimate (if s is a scalar and P(s|r) is gaussian, we only need to estimate two parameters!).


Outline

- Definition
- The encoding process
- Decoding population codes
- Quantifying information: Shannon and Fisher information
- Basis functions and optimal computation

Fisher Information

Fisher information is defined as
  I(s) = -E[∂² ln P(r|s) / ∂s²]
and it is equal to
  I(s) = E[(∂ ln P(r|s) / ∂s)²]
where P(r|s) is the distribution of the neuronal noise.


Fisher Information

- For one neuron with Poisson noise:
  I(s) = f'(s)² / f(s)
- For n independent neurons:
  I(s) = Σ_i f_i'(s)² / f_i(s)
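The Poisson formula can be evaluated directly for a population with Gaussian tuning curves (parameters hypothetical); the reciprocal of the result is the Cramér-Rao bound on the variance of any unbiased estimator.

```python
import numpy as np

preferred = np.linspace(-100, 100, 21)
peak, width = 50.0, 30.0

def f(s):
    return peak * np.exp(-0.5 * ((s - preferred) / width) ** 2)

def fprime(s):
    # analytic derivative of the Gaussian tuning curve with respect to s
    return f(s) * (preferred - s) / width**2

def fisher_poisson(s):
    """I(s) = sum_i f_i'(s)^2 / f_i(s) for independent Poisson neurons."""
    return np.sum(fprime(s) ** 2 / f(s))

I = fisher_poisson(10.0)
cr_variance = 1.0 / I     # Cramér-Rao bound on an unbiased estimator's variance
```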

Fisher Information and Tuning Curves

- Fisher information is maximum where the slope is maximum
- This is consistent with adaptation experiments
- Fisher information adds up for independent neurons (unlike Shannon information!)

Fisher Information

- In 1D, Fisher information decreases as the width of the tuning curves increases
- In 2D, Fisher information does not depend on the width of the tuning curve
- In 3D and above, Fisher information increases as the width of the tuning curves increases
- WARNING: this is true for independent gaussian noise.

Ideal observer

- The discrimination threshold of an ideal observer, δs, is proportional to σ_CR, the square root of the Cramér-Rao bound.
- In other words, an efficient estimator is an ideal observer.
- An ideal observer is an observer that can recover all the Fisher information in the activity (easy link between Fisher information and behavioral performance).
- If all distributions are gaussian, Fisher information is the same as Shannon information.

Population Vector and Fisher Information

[Figure: 1/(Fisher information) compared with the variance of the population vector and the CR bound]

The population vector should NEVER be used to estimate information content! The indirect method is prone to severe problems.


Outline

- Definition
- The encoding process
- Decoding population codes
- Quantifying information: Shannon and Fisher information
- Basis functions and optimal computation

- So far we have only talked about decoding from the point of view of an experimentalist.
- How is that relevant to neural computation? Neurons do not decode, they compute!
- What kind of computation can we perform with population codes?

Computing functions

- If we denote the sensory input as a vector S and the motor command as M, a sensorimotor transformation is a mapping from S to M:
  M = f(S)
- where f is typically a nonlinear function

Example

- 2-joint arm

[Figure: two-joint arm with joint angles q1, q2 and endpoint coordinates (x, y)]

Basis functions

- Most nonlinear functions can be approximated by linear combinations of basis functions
- Ex: Fourier transform
- Ex: radial basis functions
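A minimal sketch of a radial-basis-function decomposition: Gaussian basis functions tile the input range, and a least-squares fit finds the linear combination approximating a nonlinear target. The target function, number of basis functions, and widths are all hypothetical.

```python
import numpy as np

# Nonlinear target function sampled on [-1, 1]
x = np.linspace(-1.0, 1.0, 200)
target = np.sin(3.0 * x) + 0.5 * x**2

# Gaussian radial basis functions tiling the interval
centers = np.linspace(-1.0, 1.0, 15)
Phi = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / 0.2) ** 2)

# Linear combination of basis functions: fit the weights by least squares
w, *_ = np.linalg.lstsq(Phi, target, rcond=None)
approx = Phi @ w
max_err = np.max(np.abs(approx - target))
```

The same idea underlies the three-layer network on the next slide: the hidden layer computes the basis functions, and the output layer computes the linear combination.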

Basis Functions

- A basis function decomposition is like a three-layer network. The intermediate units are the basis functions.

[Figure: network mapping the input X to the output y through a layer of basis function units]

Basis Functions

- Networks with sigmoidal units are also basis function networks

[Figure: panels A-D, sigmoidal basis function layers over inputs X and Y combined linearly to produce the output Z]

Basis Functions

- Decompose the computation of M = f(S,P) in two stages:
- Compute basis functions of S and P
- Combine the basis functions linearly to obtain the motor command

Basis Functions

- Note that M can be a population code; e.g., the components of that vector could correspond to units with bell-shaped tuning curves.

Example: computing the head-centered location of an object from its retinal location

[Figure: head position, gaze direction, and fixation point determining the mapping from retinal to head-centered coordinates]

Basis Functions

[Figure: basis function units R_i]

Visual receptive fields in VIP are partially shifting with the eye

[Figure: retinotopic vs. head-centered receptive-field locations on a screen, relative to the fixation point]

(Duhamel, Bremmer, BenHamed and Graf, 1997)

Summary

- Definition
- Population codes involve the concerted activity of large populations of neurons
- The encoding process
- The activity of the neurons can be formalized as being the sum of a tuning curve plus noise

Summary

- Decoding population codes
- Optimal decoding can be performed with Maximum Likelihood estimation (ŝ_ML) or Bayesian inferences (P(s|r))
- Quantifying information: Fisher information
- Fisher information provides an upper bound on the amount of information available in a population code

Summary

- Basis functions and optimal computation
- Population codes can be used to perform arbitrary nonlinear transformations because they provide basis sets.

Where do we go from here?

- Computation and Bayesian inferences
- Knill, Koerding, Todorov: experimental evidence for Bayesian inferences in humans
- Shadlen: neural basis of Bayesian inferences
- Latham, Olshausen: Bayesian inferences in recurrent neural nets

Where do we go from here?

- Other encoding hypotheses: probabilistic interpretations
- Zemel, Rao