Population Coding, Alexandre Pouget, Okinawa Computational Neuroscience Course, Okinawa, Japan, November 2004

1
Population Coding
Alexandre Pouget
Okinawa Computational Neuroscience Course
Okinawa, Japan
November 2004
2
Outline
• Definition
• The encoding process
• Decoding population codes
• Quantifying information: Shannon and Fisher information
• Basis functions and optimal computation

3
Outline
• Definition
• The encoding process
• Decoding population codes
• Quantifying information: Shannon and Fisher information
• Basis functions and optimal computation

4
Receptive field
s: Direction of motion
Response
Stimulus
5
Receptive field
s: Direction of motion
Trial 1
Trial 2
Trial 3
Trial 4
Stimulus
7
Tuning curves and noise
• Example of tuning curves
• Retinal location, orientation, depth, color, eye movements, arm movements, numbers, etc.

8
Population Codes
[Figure: left, tuning curves (Activity vs. Direction (deg)); right, the pattern of activity r evoked by an unknown stimulus s (Activity vs. Preferred Direction (deg)).]
9
Bayesian approach
• We want to recover P(s|r). Using Bayes' theorem, we have
P(s|r) = P(r|s) P(s) / P(r)

10
Bayesian approach
• Bayes' rule: P(s, r) = P(s|r) P(r) = P(r|s) P(s)

11
Bayesian approach
• We want to recover P(s|r). Using Bayes' theorem, we have
P(s|r) = P(r|s) P(s) / P(r)

12
Bayesian approach
• If we are to do any type of computation with population codes, we need a probabilistic model of how the activity is generated, P(r|s), i.e., we need to model the encoding process.

13
Activity distribution
14
Tuning curves and noise
• The activity (# of spikes per second) of a neuron can be written as
ri = fi(s) + ni
• where fi(s) is the mean activity of the neuron (the tuning curve) and ni is noise with zero mean. If the noise is Gaussian, then
P(ri|s) = (2πσ²)^(-1/2) exp(-(ri - fi(s))² / 2σ²)

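As an illustration, the encoding model ri = fi(s) + ni can be simulated directly. This Python/NumPy sketch assumes Gaussian tuning curves; the amplitude, width, noise level, and number of neurons are illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def tuning_curve(s, preferred, amplitude=50.0, width=20.0):
    """Mean firing rate fi(s): a Gaussian bump centered on the neuron's
    preferred direction (all parameter values are illustrative)."""
    return amplitude * np.exp(-0.5 * ((s - preferred) / width) ** 2)

def encode(s, preferred_dirs, sigma=5.0):
    """Single-trial population response ri = fi(s) + ni, with
    independent Gaussian noise of fixed variance sigma^2."""
    f = tuning_curve(s, preferred_dirs)
    return f + rng.normal(0.0, sigma, size=preferred_dirs.shape)

preferred = np.linspace(-100, 100, 21)   # preferred directions (deg)
r = encode(0.0, preferred)               # one pattern of activity for s = 0
```

Each call to `encode` produces a different noisy pattern of activity around the same underlying tuning-curve profile, as on the trial-by-trial slides above.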
15
Probability distributions and activity
• The noise is a random variable which can be characterized by a conditional probability distribution, P(ni|s).
• The distributions of the activity, P(ri|s), and of the noise differ only by their means (E[ni] = 0, E[ri] = fi(s)).

16
Examples of activity distributions
• Gaussian noise with fixed variance:
P(ri|s) = (2πσ²)^(-1/2) exp(-(ri - fi(s))² / 2σ²)
• Gaussian noise with variance equal to the mean:
P(ri|s) = (2πfi(s))^(-1/2) exp(-(ri - fi(s))² / 2fi(s))

17
• Poisson distribution:
P(ri|s) = e^(-fi(s)) fi(s)^ri / ri!
• The variance of a Poisson distribution is equal to its mean.

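The variance-equals-mean property can be checked numerically; a minimal sketch, where the rate of 40 spikes/s is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson spike counts: P(r|s) = exp(-f(s)) f(s)^r / r!
rate = 40.0                                   # mean rate f(s), illustrative
counts = rng.poisson(rate, size=100_000)      # simulated spike counts

mean_hat = counts.mean()
var_hat = counts.var()
# For a Poisson distribution, the sample variance matches the sample mean.
```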
18
Comparison of Poisson vs. Gaussian noise with variance equal to the mean
[Figure: the two probability distributions, Probability vs. Activity (spike/sec).]
19
Population of neurons
• Gaussian noise with fixed variance (independent neurons):
P(r|s) = Πi P(ri|s) = (2πσ²)^(-n/2) exp(-Σi (ri - fi(s))² / 2σ²)

20
Population of neurons
• Gaussian noise with arbitrary covariance matrix Σ:
P(r|s) = |2πΣ|^(-1/2) exp(-(r - f(s))ᵀ Σ⁻¹ (r - f(s)) / 2)

21
Outline
• Definition
• The encoding process
• Decoding population codes
• Quantifying information: Shannon and Fisher information
• Basis functions and optimal computation

22
Population Codes
[Figure: left, tuning curves (Activity vs. Direction (deg)); right, the pattern of activity r evoked by an unknown stimulus s (Activity vs. Preferred Direction (deg)).]
23
Nature of the problem
In response to a stimulus with unknown value s, you observe a pattern of activity r. What can you say about s given r?
Bayesian approach: recover P(s|r) (the posterior distribution).
24
Estimation Theory
Activity vector r
26
Estimation Theory
r
27
Estimation theory
• A common measure of decoding performance is the mean squared error between the estimate ŝ and the true value s:
MSE = E[(ŝ - s)²]
• This error can be decomposed as:
MSE = E[(ŝ - E[ŝ])²] + (E[ŝ] - s)² (variance + bias²)

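The decomposition MSE = variance + bias² can be verified numerically. This sketch uses a deliberately biased, noisy estimator; the true value, bias of 2, and noise level are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

s_true = 30.0  # true stimulus value (illustrative)
# Simulated estimates from a biased (bias = 2) and noisy (sigma = 3) estimator
estimates = s_true + 2.0 + rng.normal(0.0, 3.0, size=200_000)

mse = np.mean((estimates - s_true) ** 2)
bias = np.mean(estimates) - s_true
variance = np.var(estimates)
# The decomposition holds: MSE = variance + bias^2
```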
28
Efficient Estimators
• The smallest achievable variance for an unbiased estimator is known as the Cramér-Rao bound, σCR².
• An efficient estimator is such that σest² = σCR².
• In general, σest² ≥ σCR².
29
Estimation Theory
Activity vector r
Examples of decoders
30
Voting Methods
• Optimal Linear Estimator: ŝ = Σi wi ri

31
Linear Estimators
32
Linear Estimators
33
Linear Estimators
• X and Y must be zero mean
• Trust cells that have small variances and large covariances
34
Voting Methods
• Optimal Linear Estimator

35
Voting Methods
• Optimal Linear Estimator
• Center of Mass: ŝ = Σi ri si / Σi ri (si = preferred stimulus of neuron i)

36
Center of Mass/Population Vector
• The center of mass is optimal (unbiased and efficient) iff the tuning curves are Gaussian with a zero baseline and uniformly distributed, and the noise follows a Poisson distribution.
• In general, the center of mass has a large bias and a large variance.

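A minimal sketch of the center-of-mass decoder (tuning-curve parameters are illustrative). On a noiseless, symmetric bump of activity it recovers the stimulus; as noted above, its bias and variance can be large in less favorable conditions.

```python
import numpy as np

def center_of_mass(r, preferred):
    """Center-of-mass decoder: activity-weighted average of the
    neurons' preferred directions."""
    return np.sum(r * preferred) / np.sum(r)

preferred = np.linspace(-100, 100, 21)             # preferred directions (deg)
# Noiseless Gaussian bump of activity centered on s = 20 (illustrative)
f = 50.0 * np.exp(-0.5 * ((preferred - 20.0) / 20.0) ** 2)
s_hat = center_of_mass(f, preferred)
```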
37
Voting Methods
• Optimal Linear Estimator
• Center of Mass
• Population Vector

38
Population Vector
39
Voting Methods
• Optimal Linear Estimator
• Center of Mass
• Population Vector

40
Population Vector
Typically, the population vector is not the optimal linear estimator.
41
Population Vector
42
Population Vector
• The population vector is optimal iff the tuning curves are cosine, uniformly distributed, and the noise follows a normal distribution with fixed variance.
• In most cases, the population vector is biased and has a large variance.

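A minimal population-vector decoder. In the ideal case described above (cosine tuning, uniformly distributed preferred directions, no noise) it recovers the stimulus exactly; the tuning parameters here are illustrative.

```python
import numpy as np

def population_vector(r, preferred_deg):
    """Population-vector decoder: sum unit vectors pointing at each
    neuron's preferred direction, weighted by its activity, and take
    the angle of the resultant."""
    theta = np.deg2rad(preferred_deg)
    x = np.sum(r * np.cos(theta))
    y = np.sum(r * np.sin(theta))
    return np.rad2deg(np.arctan2(y, x))

preferred = np.arange(0, 360, 10)  # uniform preferred directions (deg)
# Noiseless cosine tuning to a stimulus at 45 deg (baseline/amplitude illustrative)
f = 30.0 + 20.0 * np.cos(np.deg2rad(preferred - 45.0))
s_hat = population_vector(f, preferred)
```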
43
Maximum Likelihood
• The maximum likelihood estimate is the value of s maximizing the likelihood P(r|s). Therefore, we seek ŝML such that ∂ ln P(r|s)/∂s = 0 at s = ŝML.
• ŝML is (asymptotically) unbiased and efficient.

44
Maximum Likelihood
[Figure: tuning curves (Activity vs. Direction (deg)).]
45
Maximum Likelihood
Template
46
Maximum Likelihood
[Figure: template overlaid on the pattern of activity (Activity vs. Preferred Direction (deg)).]
47
ML and template matching
• Maximum likelihood is a template-matching procedure, BUT the metric used is not always the Euclidean distance; it depends on the noise distribution.

48
Maximum Likelihood
• The maximum likelihood estimate is the value of s maximizing the likelihood P(r|s). Therefore, we seek ŝML such that ∂ ln P(r|s)/∂s = 0 at s = ŝML.

49
Maximum Likelihood
• If the noise is Gaussian and independent:
P(r|s) = Πi (2πσ²)^(-1/2) exp(-(ri - fi(s))² / 2σ²)
• Therefore:
ln P(r|s) = -Σi (ri - fi(s))² / 2σ² + const
• and the estimate is given by:
ŝML = argmin_s Σi (ri - fi(s))²

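Under independent, fixed-variance Gaussian noise, ML therefore reduces to sliding the template f(s) across candidate stimulus values and keeping the one closest to r in Euclidean distance. A brute-force grid-search sketch, with all parameters illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

preferred = np.linspace(-100, 100, 21)  # preferred directions (deg)

def f(s):
    """Template: the mean population response to stimulus s
    (Gaussian tuning curves with illustrative parameters)."""
    return 50.0 * np.exp(-0.5 * ((s - preferred) / 20.0) ** 2)

# Observed pattern of activity for a hidden stimulus s = 10
r = f(10.0) + rng.normal(0.0, 3.0, size=preferred.shape)

# ML = template matching: minimize the Euclidean distance ||r - f(s)||^2
candidates = np.linspace(-100, 100, 2001)
distances = [np.sum((r - f(s)) ** 2) for s in candidates]
s_ml = candidates[int(np.argmin(distances))]
```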
50
Maximum Likelihood
[Figure: template matched to the pattern of activity (Activity vs. Preferred Direction (deg)).]
51
Gaussian noise with variance proportional to the mean
• If the noise is Gaussian with variance proportional to the mean, the distance being minimized changes to the weighted distance Σi (ri - fi(s))² / fi(s).

52
Bayesian approach
• We want to recover P(s|r). Using Bayes' theorem, we have
P(s|r) = P(r|s) P(s) / P(r)

53
Bayesian approach
• The prior P(s) corresponds to any knowledge we may have about s before we get to see any activity.
• Note: the Bayesian approach does not reduce to the use of a prior.

54
Bayesian approach
Once we have P(s|r), we can proceed in two different ways. We can keep this distribution for Bayesian inferences (as we would do in a Bayesian network), or we can make a decision about s. For instance, we can estimate s as being the value that maximizes P(s|r). This is known as the maximum a posteriori (MAP) estimate. For a flat prior, ML and MAP are equivalent.
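A grid-based sketch of the MAP estimate, showing that with a flat prior MAP and ML coincide. The tuning curves, noise level, and hidden stimulus are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

preferred = np.linspace(-100, 100, 21)

def f(s):
    """Gaussian tuning curves (illustrative parameters)."""
    return 50.0 * np.exp(-0.5 * ((s - preferred) / 20.0) ** 2)

sigma = 3.0
r = f(-20.0) + rng.normal(0.0, sigma, size=preferred.shape)  # hidden s = -20

candidates = np.linspace(-100, 100, 2001)
# log P(r|s) for independent Gaussian noise, up to an s-independent constant
log_lik = np.array([-np.sum((r - f(s)) ** 2) / (2 * sigma ** 2)
                    for s in candidates])
log_prior = np.zeros_like(candidates)   # flat prior
log_post = log_lik + log_prior          # log P(s|r) up to a constant

s_map = candidates[int(np.argmax(log_post))]
s_ml = candidates[int(np.argmax(log_lik))]
# With a flat prior, the MAP and ML estimates are identical.
```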
55
Bayesian approach
Limitations: the Bayesian approach and ML require a lot of data (estimating P(r|s) requires at least n(n-1)/2 parameters).
56
Bayesian approach
Limitations: the Bayesian approach and ML require a lot of data (estimating P(r|s) requires at least O(n²) parameters; for n = 100, n² = 10000).
Alternative: estimate P(s|r) directly using a nonlinear estimator (if s is a scalar and P(s|r) is Gaussian, we only need to estimate two parameters!).
58
Outline
• Definition
• The encoding process
• Decoding population codes
• Quantifying information: Shannon and Fisher information
• Basis functions and optimal computation

59
Fisher Information
Fisher information is defined as
I(s) = -E[∂² ln P(r|s) / ∂s²]
and it is equal to
I(s) = E[(∂ ln P(r|s) / ∂s)²]
where P(r|s) is the distribution of the neuronal noise.
60
Fisher Information
61
Fisher Information
• For one neuron with Poisson noise: I(s) = f′(s)² / f(s)
• For n independent neurons: I(s) = Σi fi′(s)² / fi(s)

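The formula for n independent Poisson neurons can be evaluated directly for a population of Gaussian tuning curves; the parameters below are illustrative.

```python
import numpy as np

preferred = np.linspace(-100, 100, 21)  # preferred directions (deg)
width = 20.0

def f(s):
    """Gaussian tuning curves (illustrative amplitude and width)."""
    return 50.0 * np.exp(-0.5 * ((s - preferred) / width) ** 2)

def fprime(s):
    """Analytic derivative of the Gaussian tuning curves w.r.t. s."""
    return f(s) * (preferred - s) / width ** 2

def fisher_poisson(s):
    """Fisher information of n independent Poisson neurons:
    I(s) = sum_i fi'(s)^2 / fi(s)."""
    return np.sum(fprime(s) ** 2 / f(s))

I0 = fisher_poisson(0.0)
```

Because this grid of preferred directions is symmetric about zero, the resulting Fisher information is symmetric in s as well.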
62
Fisher Information and Tuning Curves
• Fisher information is maximum where the slope of the tuning curve is maximum
• This is consistent with adaptation experiments
• Fisher information adds up for independent neurons (unlike Shannon information!)

63
Fisher Information
• In 1D, Fisher information decreases as the width of the tuning curves increases
• In 2D, Fisher information does not depend on the width of the tuning curves
• In 3D and above, Fisher information increases as the width of the tuning curves increases
• WARNING: this is true only for independent Gaussian noise.

64
Ideal observer
• The discrimination threshold of an ideal observer, δs, is proportional to σCR, the square root of the Cramér-Rao bound.
• In other words, an efficient estimator is an ideal observer.

65
• An ideal observer is an observer that can recover all the Fisher information in the activity (easy link between Fisher information and behavioral performance)
• If all distributions are Gaussian, Fisher information is the same as Shannon information.

66
Population Vector and Fisher Information
[Figure: estimator variance compared against 1/Fisher information (the CR bound) for the population vector.]
The population vector should NEVER be used to estimate information content! The indirect method is prone to severe problems.
68
Outline
• Definition
• The encoding process
• Decoding population codes
• Quantifying information: Shannon and Fisher information
• Basis functions and optimal computation

69
• So far we have only talked about decoding from the point of view of an experimentalist.
• How is that relevant to neural computation? Neurons do not decode, they compute!
• What kind of computation can we perform with population codes?

70
Computing functions
• If we denote the sensory input as a vector S and the motor command as M, a sensorimotor transformation is a mapping from S to M:
M = f(S)
• where f is typically a nonlinear function.

71
Example
• 2-joint arm
[Figure: planar two-joint arm with joint angles θ1, θ2 and endpoint coordinates (x, y).]
72
Basis functions
• Most nonlinear functions can be approximated by linear combinations of basis functions:
f(x) ≈ Σi wi Bi(x)
• Ex: Fourier transform

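A least-squares sketch of a basis-function decomposition: Gaussian basis functions are combined linearly to approximate an arbitrary smooth nonlinear function. The target function, basis centers, and widths are all arbitrary choices for illustration.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 200)
target = np.sin(x) * np.exp(-x ** 2 / 2)   # an arbitrary nonlinear function

# Gaussian basis functions Bi(x) (centers and width are illustrative)
centers = np.linspace(-np.pi, np.pi, 15)
Phi = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / 0.5) ** 2)

# Linear combination: find weights w minimizing ||Phi w - target||^2
w, *_ = np.linalg.lstsq(Phi, target, rcond=None)
approx = Phi @ w

max_err = np.max(np.abs(approx - target))
```

The basis-function layer (`Phi`) plays the role of the intermediate units in the three-layer network described on the next slides; only the output weights `w` depend on the particular function being computed.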
73
Basis Functions
74
Basis Functions
• A basis function decomposition is like a three-layer network. The intermediate units are the basis functions.
[Figure: three-layer network mapping input X to output y through a basis function layer.]
75
Basis Functions
• Networks with sigmoidal units are also basis
function networks

76
[Figure: panels A-D, functions Z(X, Y) computed by a basis function layer followed by a linear combination.]
77
Basis Functions
• Decompose the computation of M = f(S, P) into two stages:
• Compute basis functions of S and P
• Combine the basis functions linearly to obtain the motor command

78
Basis Functions
• Note that M can be a population code, e.g. the
components of that vector could correspond to
units with bell-shaped tuning curves.

79
Example: Computing the head-centered location of an object from its retinal location

Gaze
Fixation point
80
Basis Functions
81
Basis Function Units
Ri
82
Basis Function Units
Ri
83
Visual receptive fields in VIP shift partially with the eye
Fixation point
Retinotopic location
Screen
(Duhamel, Bremmer, BenHamed and Graf, 1997)
84
Summary
• Definition
• Population codes involve the concerted activity
of large populations of neurons
• The encoding process
• The activity of the neurons can be formalized as the sum of a tuning curve plus noise

85
Summary
• Decoding population codes
• Optimal decoding can be performed with maximum likelihood estimation (ŝML) or Bayesian inference (P(s|r))
• Quantifying information: Fisher information
• Fisher information provides an upper bound on the amount of information available in a population code

86
Summary
• Basis functions and optimal computation
• Population codes can be used to perform arbitrary
nonlinear transformations because they provide
basis sets.

87
Where do we go from here?
• Computation and Bayesian inferences
• Knill, Koerding, Todorov: Experimental evidence for Bayesian inferences in humans
• Shadlen: Neural basis of Bayesian inferences
• Latham, Olshausen: Bayesian inferences in recurrent neural nets

88
Where do we go from here?
• Other encoding hypotheses: probabilistic interpretations
• Zemel, Rao