Population Coding, Alexandre Pouget, Okinawa Computational Neuroscience Course, Okinawa, Japan, November 2004

1
Population Coding
Alexandre Pouget
Okinawa Computational Neuroscience Course
Okinawa, Japan
November 2004
2
Outline
• Definition
• The encoding process
• Decoding population codes
• Quantifying information: Shannon and Fisher information
• Basis functions and optimal computation

3
Outline
• Definition
• The encoding process
• Decoding population codes
• Quantifying information: Shannon and Fisher information
• Basis functions and optimal computation

4
Receptive field
s: Direction of motion
Response
Stimulus
5
Receptive field
s: Direction of motion
Trial 1
Trial 2
Trial 3
Trial 4
Stimulus
7
Tuning curves and noise
• Example of tuning curves
• Retinal location, orientation, depth, color, eye movements, arm movements, numbers, etc.

8
Population Codes
[Figure: left, tuning curves (Activity vs. Direction (deg)); right, the pattern of activity r evoked by an unknown stimulus s (Activity vs. Preferred Direction (deg)).]
9
Bayesian approach
• We want to recover P(s|r). Using Bayes' theorem, we have
P(s|r) = P(r|s) P(s) / P(r)

10
Bayesian approach
• Bayes' rule: P(s, r) = P(s|r) P(r) = P(r|s) P(s)

11
Bayesian approach
• We want to recover P(s|r). Using Bayes' theorem, we have
P(s|r) = P(r|s) P(s) / P(r)

12
Bayesian approach
• If we are to do any type of computation with population codes, we need a probabilistic model of how the activity is generated, P(r|s), i.e., we need to model the encoding process.

13
Activity distribution
14
Tuning curves and noise
• The activity (# of spikes per second) of a neuron can be written as
ri = fi(s) + ni
• where fi(s) is the mean activity of the neuron (the tuning curve) and ni is noise with zero mean. If the noise is Gaussian, then
P(ri|s) = (2πσ²)^(-1/2) exp(-(ri - fi(s))² / 2σ²)

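As an illustration, the encoding model ri = fi(s) + ni can be simulated directly. This Python/NumPy sketch assumes Gaussian tuning curves; the amplitude, width, noise level, and number of neurons are illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def tuning_curve(s, preferred, amplitude=50.0, width=20.0):
    """Mean firing rate fi(s): a Gaussian bump centered on the neuron's
    preferred direction (all parameter values are illustrative)."""
    return amplitude * np.exp(-0.5 * ((s - preferred) / width) ** 2)

def encode(s, preferred_dirs, sigma=5.0):
    """Single-trial population response ri = fi(s) + ni, with
    independent Gaussian noise of fixed variance sigma^2."""
    f = tuning_curve(s, preferred_dirs)
    return f + rng.normal(0.0, sigma, size=preferred_dirs.shape)

preferred = np.linspace(-100, 100, 21)   # preferred directions (deg)
r = encode(0.0, preferred)               # one pattern of activity for s = 0
```

Each call to `encode` produces a different noisy pattern of activity around the same underlying tuning-curve profile, as on the trial-by-trial slides above.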
15
Probability distributions and activity
• The noise is a random variable which can be characterized by a conditional probability distribution, P(ni|s).
• The distributions of the activity, P(ri|s), and of the noise differ only by their means (E[ni] = 0, E[ri] = fi(s)).

16
Examples of activity distributions
• Gaussian noise with fixed variance:
P(ri|s) = (2πσ²)^(-1/2) exp(-(ri - fi(s))² / 2σ²)
• Gaussian noise with variance equal to the mean:
P(ri|s) = (2πfi(s))^(-1/2) exp(-(ri - fi(s))² / 2fi(s))

17
• Poisson distribution:
P(ri|s) = e^(-fi(s)) fi(s)^ri / ri!
• The variance of a Poisson distribution is equal to its mean.

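The variance-equals-mean property can be checked numerically; a minimal sketch, where the rate of 40 spikes/s is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson spike counts: P(r|s) = exp(-f(s)) f(s)^r / r!
rate = 40.0                                   # mean rate f(s), illustrative
counts = rng.poisson(rate, size=100_000)      # simulated spike counts

mean_hat = counts.mean()
var_hat = counts.var()
# For a Poisson distribution, the sample variance matches the sample mean.
```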
18
Comparison of Poisson vs. Gaussian noise with variance equal to the mean
[Figure: the two probability distributions, Probability vs. Activity (spike/sec).]
19
Population of neurons
• Gaussian noise with fixed variance (independent neurons):
P(r|s) = Πi P(ri|s) = (2πσ²)^(-n/2) exp(-Σi (ri - fi(s))² / 2σ²)

20
Population of neurons
• Gaussian noise with arbitrary covariance matrix Σ:
P(r|s) = |2πΣ|^(-1/2) exp(-(r - f(s))ᵀ Σ⁻¹ (r - f(s)) / 2)

21
Outline
• Definition
• The encoding process
• Decoding population codes
• Quantifying information: Shannon and Fisher information
• Basis functions and optimal computation

22
Population Codes
[Figure: left, tuning curves (Activity vs. Direction (deg)); right, the pattern of activity r evoked by an unknown stimulus s (Activity vs. Preferred Direction (deg)).]
23
Nature of the problem
In response to a stimulus with unknown value s, you observe a pattern of activity r. What can you say about s given r?
Bayesian approach: recover P(s|r) (the posterior distribution).
24
Estimation Theory
Activity vector r
26
Estimation Theory
r
27
Estimation theory
• A common measure of decoding performance is the mean squared error between the estimate ŝ and the true value s:
MSE = E[(ŝ - s)²]
• This error can be decomposed as:
MSE = E[(ŝ - E[ŝ])²] + (E[ŝ] - s)² (variance + bias²)

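The decomposition MSE = variance + bias² can be verified numerically. This sketch uses a deliberately biased, noisy estimator; the true value, bias of 2, and noise level are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

s_true = 30.0  # true stimulus value (illustrative)
# Simulated estimates from a biased (bias = 2) and noisy (sigma = 3) estimator
estimates = s_true + 2.0 + rng.normal(0.0, 3.0, size=200_000)

mse = np.mean((estimates - s_true) ** 2)
bias = np.mean(estimates) - s_true
variance = np.var(estimates)
# The decomposition holds: MSE = variance + bias^2
```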
28
Efficient Estimators
• The smallest achievable variance for an unbiased estimator is known as the Cramér-Rao bound, σCR².
• An efficient estimator is such that σest² = σCR².
• In general, σest² ≥ σCR².
29
Estimation Theory
Activity vector r
Examples of decoders
30
Voting Methods
• Optimal Linear Estimator: ŝ = Σi wi ri

31
Linear Estimators
32
Linear Estimators
33
Linear Estimators
• X and Y must be zero mean
• Trust cells that have small variances and large covariances
34
Voting Methods
• Optimal Linear Estimator

35
Voting Methods
• Optimal Linear Estimator
• Center of Mass: ŝ = Σi ri si / Σi ri (si = preferred stimulus of neuron i)

36
Center of Mass/Population Vector
• The center of mass is optimal (unbiased and efficient) iff the tuning curves are Gaussian with a zero baseline and uniformly distributed, and the noise follows a Poisson distribution.
• In general, the center of mass has a large bias and a large variance.

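A minimal sketch of the center-of-mass decoder (tuning-curve parameters are illustrative). On a noiseless, symmetric bump of activity it recovers the stimulus; as noted above, its bias and variance can be large in less favorable conditions.

```python
import numpy as np

def center_of_mass(r, preferred):
    """Center-of-mass decoder: activity-weighted average of the
    neurons' preferred directions."""
    return np.sum(r * preferred) / np.sum(r)

preferred = np.linspace(-100, 100, 21)             # preferred directions (deg)
# Noiseless Gaussian bump of activity centered on s = 20 (illustrative)
f = 50.0 * np.exp(-0.5 * ((preferred - 20.0) / 20.0) ** 2)
s_hat = center_of_mass(f, preferred)
```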
37
Voting Methods
• Optimal Linear Estimator
• Center of Mass
• Population Vector

38
Population Vector
39
Voting Methods
• Optimal Linear Estimator
• Center of Mass
• Population Vector

40
Population Vector
Typically, the population vector is not the optimal linear estimator.
41
Population Vector
42
Population Vector
• The population vector is optimal iff the tuning curves are cosine, uniformly distributed, and the noise follows a normal distribution with fixed variance.
• In most cases, the population vector is biased and has a large variance.

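A minimal population-vector decoder. In the ideal case described above (cosine tuning, uniformly distributed preferred directions, no noise) it recovers the stimulus exactly; the tuning parameters here are illustrative.

```python
import numpy as np

def population_vector(r, preferred_deg):
    """Population-vector decoder: sum unit vectors pointing at each
    neuron's preferred direction, weighted by its activity, and take
    the angle of the resultant."""
    theta = np.deg2rad(preferred_deg)
    x = np.sum(r * np.cos(theta))
    y = np.sum(r * np.sin(theta))
    return np.rad2deg(np.arctan2(y, x))

preferred = np.arange(0, 360, 10)  # uniform preferred directions (deg)
# Noiseless cosine tuning to a stimulus at 45 deg (baseline/amplitude illustrative)
f = 30.0 + 20.0 * np.cos(np.deg2rad(preferred - 45.0))
s_hat = population_vector(f, preferred)
```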
43
Maximum Likelihood
• The maximum likelihood estimate is the value of s maximizing the likelihood P(r|s). Therefore, we seek ŝML such that ∂ ln P(r|s)/∂s = 0 at s = ŝML.
• ŝML is (asymptotically) unbiased and efficient.

44
Maximum Likelihood
[Figure: tuning curves (Activity vs. Direction (deg)).]
45
Maximum Likelihood
Template
46
Maximum Likelihood
[Figure: template overlaid on the pattern of activity (Activity vs. Preferred Direction (deg)).]
47
ML and template matching
• Maximum likelihood is a template-matching procedure, BUT the metric used is not always the Euclidean distance; it depends on the noise distribution.

48
Maximum Likelihood
• The maximum likelihood estimate is the value of s maximizing the likelihood P(r|s). Therefore, we seek ŝML such that ∂ ln P(r|s)/∂s = 0 at s = ŝML.

49
Maximum Likelihood
• If the noise is Gaussian and independent:
P(r|s) = Πi (2πσ²)^(-1/2) exp(-(ri - fi(s))² / 2σ²)
• Therefore:
ln P(r|s) = -Σi (ri - fi(s))² / 2σ² + const
• and the estimate is given by:
ŝML = argmin_s Σi (ri - fi(s))²

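Under independent, fixed-variance Gaussian noise, ML therefore reduces to sliding the template f(s) across candidate stimulus values and keeping the one closest to r in Euclidean distance. A brute-force grid-search sketch, with all parameters illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

preferred = np.linspace(-100, 100, 21)  # preferred directions (deg)

def f(s):
    """Template: the mean population response to stimulus s
    (Gaussian tuning curves with illustrative parameters)."""
    return 50.0 * np.exp(-0.5 * ((s - preferred) / 20.0) ** 2)

# Observed pattern of activity for a hidden stimulus s = 10
r = f(10.0) + rng.normal(0.0, 3.0, size=preferred.shape)

# ML = template matching: minimize the Euclidean distance ||r - f(s)||^2
candidates = np.linspace(-100, 100, 2001)
distances = [np.sum((r - f(s)) ** 2) for s in candidates]
s_ml = candidates[int(np.argmin(distances))]
```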
50
Maximum Likelihood
[Figure: template matched to the pattern of activity (Activity vs. Preferred Direction (deg)).]
51
Gaussian noise with variance proportional to the mean
• If the noise is Gaussian with variance proportional to the mean, the distance being minimized changes to the weighted distance Σi (ri - fi(s))² / fi(s).

52
Bayesian approach
• We want to recover P(s|r). Using Bayes' theorem, we have
P(s|r) = P(r|s) P(s) / P(r)

53
Bayesian approach
• The prior P(s) corresponds to any knowledge we may have about s before we get to see any activity.
• Note: the Bayesian approach does not reduce to the use of a prior.

54
Bayesian approach
Once we have P(s|r), we can proceed in two different ways. We can keep this distribution for Bayesian inferences (as we would do in a Bayesian network), or we can make a decision about s. For instance, we can estimate s as being the value that maximizes P(s|r). This is known as the maximum a posteriori (MAP) estimate. For a flat prior, ML and MAP are equivalent.
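A grid-based sketch of the MAP estimate, showing that with a flat prior MAP and ML coincide. The tuning curves, noise level, and hidden stimulus are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

preferred = np.linspace(-100, 100, 21)

def f(s):
    """Gaussian tuning curves (illustrative parameters)."""
    return 50.0 * np.exp(-0.5 * ((s - preferred) / 20.0) ** 2)

sigma = 3.0
r = f(-20.0) + rng.normal(0.0, sigma, size=preferred.shape)  # hidden s = -20

candidates = np.linspace(-100, 100, 2001)
# log P(r|s) for independent Gaussian noise, up to an s-independent constant
log_lik = np.array([-np.sum((r - f(s)) ** 2) / (2 * sigma ** 2)
                    for s in candidates])
log_prior = np.zeros_like(candidates)   # flat prior
log_post = log_lik + log_prior          # log P(s|r) up to a constant

s_map = candidates[int(np.argmax(log_post))]
s_ml = candidates[int(np.argmax(log_lik))]
# With a flat prior, the MAP and ML estimates are identical.
```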
55
Bayesian approach
Limitations: the Bayesian approach and ML require a lot of data (estimating P(r|s) requires at least n(n-1)/2 parameters).
56
Bayesian approach
Limitations: the Bayesian approach and ML require a lot of data (estimating P(r|s) requires at least O(n²) parameters; for n = 100, n² = 10000).
Alternative: estimate P(s|r) directly using a nonlinear estimator (if s is a scalar and P(s|r) is Gaussian, we only need to estimate two parameters!).
58
Outline
• Definition
• The encoding process
• Decoding population codes
• Quantifying information: Shannon and Fisher information
• Basis functions and optimal computation

59
Fisher Information
Fisher information is defined as
I(s) = -E[∂² ln P(r|s) / ∂s²]
and it is equal to
I(s) = E[(∂ ln P(r|s) / ∂s)²]
where P(r|s) is the distribution of the neuronal noise.
60
Fisher Information
61
Fisher Information
• For one neuron with Poisson noise: I(s) = f′(s)² / f(s)
• For n independent neurons: I(s) = Σi fi′(s)² / fi(s)

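The formula for n independent Poisson neurons can be evaluated directly for a population of Gaussian tuning curves; the parameters below are illustrative.

```python
import numpy as np

preferred = np.linspace(-100, 100, 21)  # preferred directions (deg)
width = 20.0

def f(s):
    """Gaussian tuning curves (illustrative amplitude and width)."""
    return 50.0 * np.exp(-0.5 * ((s - preferred) / width) ** 2)

def fprime(s):
    """Analytic derivative of the Gaussian tuning curves w.r.t. s."""
    return f(s) * (preferred - s) / width ** 2

def fisher_poisson(s):
    """Fisher information of n independent Poisson neurons:
    I(s) = sum_i fi'(s)^2 / fi(s)."""
    return np.sum(fprime(s) ** 2 / f(s))

I0 = fisher_poisson(0.0)
```

Because this grid of preferred directions is symmetric about zero, the resulting Fisher information is symmetric in s as well.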
62
Fisher Information and Tuning Curves
• Fisher information is maximum where the slope of the tuning curve is maximum
• This is consistent with adaptation experiments
• Fisher information adds up for independent neurons (unlike Shannon information!)

63
Fisher Information
• In 1D, Fisher information decreases as the width of the tuning curves increases
• In 2D, Fisher information does not depend on the width of the tuning curves
• In 3D and above, Fisher information increases as the width of the tuning curves increases
• WARNING: this is true only for independent Gaussian noise.

64
Ideal observer
• The discrimination threshold of an ideal observer, δs, is proportional to σCR, the square root of the Cramér-Rao bound.
• In other words, an efficient estimator is an ideal observer.

65
• An ideal observer is an observer that can recover all the Fisher information in the activity (easy link between Fisher information and behavioral performance)
• If all distributions are Gaussian, Fisher information is the same as Shannon information.

66
Population Vector and Fisher Information
[Figure: estimator variance compared against 1/Fisher information (the CR bound) for the population vector.]
The population vector should NEVER be used to estimate information content! The indirect method is prone to severe problems.
68
Outline
• Definition
• The encoding process
• Decoding population codes
• Quantifying information: Shannon and Fisher information
• Basis functions and optimal computation

69
• So far we have only talked about decoding from the point of view of an experimentalist.
• How is that relevant to neural computation? Neurons do not decode, they compute!
• What kind of computation can we perform with population codes?

70
Computing functions
• If we denote the sensory input as a vector S and the motor command as M, a sensorimotor transformation is a mapping from S to M:
M = f(S)
• where f is typically a nonlinear function.

71
Example
• 2-joint arm
[Figure: planar two-joint arm with joint angles θ1, θ2 and endpoint coordinates (x, y).]
72
Basis functions
• Most nonlinear functions can be approximated by linear combinations of basis functions:
f(x) ≈ Σi wi Bi(x)
• Ex: Fourier transform

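A least-squares sketch of a basis-function decomposition: Gaussian basis functions are combined linearly to approximate an arbitrary smooth nonlinear function. The target function, basis centers, and widths are all arbitrary choices for illustration.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 200)
target = np.sin(x) * np.exp(-x ** 2 / 2)   # an arbitrary nonlinear function

# Gaussian basis functions Bi(x) (centers and width are illustrative)
centers = np.linspace(-np.pi, np.pi, 15)
Phi = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / 0.5) ** 2)

# Linear combination: find weights w minimizing ||Phi w - target||^2
w, *_ = np.linalg.lstsq(Phi, target, rcond=None)
approx = Phi @ w

max_err = np.max(np.abs(approx - target))
```

The basis-function layer (`Phi`) plays the role of the intermediate units in the three-layer network described on the next slides; only the output weights `w` depend on the particular function being computed.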
73
Basis Functions
74
Basis Functions
• A basis function decomposition is like a three-layer network. The intermediate units are the basis functions.
[Figure: three-layer network mapping input X to output y through a basis function layer.]
75
Basis Functions
• Networks with sigmoidal units are also basis
function networks

76
[Figure: panels A-D, functions Z(X, Y) computed by a basis function layer followed by a linear combination.]
77
Basis Functions
• Decompose the computation of M = f(S, P) into two stages:
• Compute basis functions of S and P
• Combine the basis functions linearly to obtain the motor command

78
Basis Functions
• Note that M can be a population code, e.g. the
components of that vector could correspond to
units with bell-shaped tuning curves.

79
Example: Computing the head-centered location of an object from its retinal location

Gaze
Fixation point
80
Basis Functions
81
Basis Function Units
Ri
82
Basis Function Units
Ri
83
Visual receptive fields in VIP shift partially with the eye
Fixation point
Retinotopic location
Screen
(Duhamel, Bremmer, BenHamed and Graf, 1997)
84
Summary
• Definition
• Population codes involve the concerted activity
of large populations of neurons
• The encoding process
• The activity of the neurons can be formalized as the sum of a tuning curve plus noise

85
Summary
• Decoding population codes
• Optimal decoding can be performed with maximum likelihood estimation (ŝML) or Bayesian inference (P(s|r))
• Quantifying information: Fisher information
• Fisher information provides an upper bound on the amount of information available in a population code

86
Summary
• Basis functions and optimal computation
• Population codes can be used to perform arbitrary
nonlinear transformations because they provide
basis sets.

87
Where do we go from here?
• Computation and Bayesian inferences
• Knill, Koerding, Todorov: Experimental evidence for Bayesian inferences in humans
• Shadlen: Neural basis of Bayesian inferences
• Latham, Olshausen: Bayesian inferences in recurrent neural nets

88
Where do we go from here?
• Other encoding hypotheses: probabilistic interpretations
• Zemel, Rao