1
Population Coding
Alexandre Pouget
Okinawa Computational Neuroscience Course
Okinawa, Japan, November 2004
2
Outline
  • Definition
  • The encoding process
  • Decoding population codes
  • Quantifying information: Shannon and Fisher
    information
  • Basis functions and optimal computation

3
Outline
  • Definition
  • The encoding process
  • Decoding population codes
  • Quantifying information: Shannon and Fisher
    information
  • Basis functions and optimal computation

4
Receptive field
[Figure: stimulus s (direction of motion) and the neuron's response]
5
Receptive field
[Figure: stimulus s (direction of motion) and the neuron's responses on trials 1-4]
6
7
Tuning curves and noise
  • Example of tuning curves
  • Retinal location, orientation, depth, color, eye
    movements, arm movements, numbers, etc.

8
Population Codes
[Figure: left, tuning curves (activity vs. direction, deg) for an unknown stimulus s; right, the resulting pattern of activity r (activity vs. preferred direction, deg)]
9
Bayesian approach
  • We want to recover P(s|r). Using Bayes' theorem,
    we have

10
Bayesian approach
  • Bayes rule
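For reference, Bayes' rule as applied here:

  P(s | r) = \frac{P(r | s) P(s)}{P(r)}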

11
Bayesian approach
  • We want to recover P(s|r). Using Bayes' theorem,
    we have

12
Bayesian approach
  • If we are to do any type of computation with
    population codes, we need a probabilistic model
    of how the activity is generated, p(r|s), i.e.,
    we need to model the encoding process.

13
Activity distribution
14
Tuning curves and noise
  • The activity (number of spikes per second) of a
    neuron can be written as the sum of a tuning
    curve and a noise term, as shown below,
  • where f_i(s) is the mean activity of the neuron
    (the tuning curve) and n_i is noise with zero
    mean. If the noise is Gaussian, then P(r_i|s)
    takes the Gaussian form below.
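A minimal statement of this model, assuming the standard additive-noise formulation with fixed variance σ²:

  r_i = f_i(s) + n_i

  P(r_i | s) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(r_i - f_i(s))^2}{2\sigma^2} \right)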

15
Probability distributions and activity
  • The noise is a random variable which can be
    characterized by a conditional probability
    distribution, P(n_i|s).
  • The distributions of the activity, P(r_i|s), and
    the noise differ only by their means (E[n_i] = 0,
    E[r_i] = f_i(s)).

16
Examples of activity distributions
  • Gaussian noise with fixed variance
  • Gaussian noise with variance equal to the mean
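A sketch of the two cases for a single neuron, using the same notation as above:

  Fixed variance:              P(r_i | s) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(r_i - f_i(s))^2}{2\sigma^2} \right)

  Variance equal to the mean:  P(r_i | s) = \frac{1}{\sqrt{2\pi f_i(s)}} \exp\!\left( -\frac{(r_i - f_i(s))^2}{2 f_i(s)} \right)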

17
  • Poisson distribution
  • The variance of a Poisson distribution is equal
    to its mean.
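The standard Poisson likelihood, with mean rate f_i(s):

  P(r_i | s) = e^{-f_i(s)} \frac{f_i(s)^{r_i}}{r_i!}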

18
Comparison of Poisson vs Gaussian noise with
variance equal to the mean
[Figure: probability vs. activity (spikes/sec) for the two distributions]
19
Population of neurons
  • Gaussian noise with fixed variance

20
Population of neurons
  • Gaussian noise with arbitrary covariance matrix
    Σ
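A sketch of the population likelihood, writing r = (r_1, ..., r_n) and f(s) = (f_1(s), ..., f_n(s)):

  Independent, fixed variance:  P(r | s) = \prod_i \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(r_i - f_i(s))^2}{2\sigma^2} \right)

  Arbitrary covariance Σ:       P(r | s) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\!\left( -\tfrac{1}{2} (r - f(s))^T \Sigma^{-1} (r - f(s)) \right)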

21
Outline
  • Definition
  • The encoding process
  • Decoding population codes
  • Quantifying information: Shannon and Fisher
    information
  • Basis functions and optimal computation

22
Population Codes
[Figure: left, tuning curves (activity vs. direction, deg) for an unknown stimulus s; right, the resulting pattern of activity r (activity vs. preferred direction, deg)]
23
Nature of the problem
In response to a stimulus with unknown value s,
you observe a pattern of activity r. What can you
say about s given r?
Bayesian approach: recover p(s|r) (the posterior
distribution)
24
Estimation Theory
Activity vector r
25
26
Estimation Theory
27
Estimation theory
  • A common measure of decoding performance is the
    mean square error between the estimate and the
    true value
  • This error can be decomposed as
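In symbols, writing \hat{s} for the estimate:

  E\left[ (\hat{s} - s)^2 \right] = \left( E[\hat{s}] - s \right)^2 + \mathrm{Var}(\hat{s}) = \text{bias}^2 + \text{variance}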

28
Efficient Estimators
  • The smallest achievable variance for an unbiased
    estimator is known as the Cramér-Rao bound, σ²_CR.
  • An efficient estimator is one whose variance
    reaches this bound.
  • In general, the variance is larger than or equal
    to the bound (see below).
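In symbols, with I(s) the Fisher information introduced later in the talk:

  \sigma^2_{CR} = \frac{1}{I(s)}, \qquad \mathrm{Var}(\hat{s}) = \sigma^2_{CR} \ \text{(efficient)}, \qquad \mathrm{Var}(\hat{s}) \geq \sigma^2_{CR} \ \text{(in general)}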

29
Estimation Theory
Activity vector r
Examples of decoders
30
Voting Methods
  • Optimal Linear Estimator

31
Linear Estimators
32
Linear Estimators
33
Linear Estimators
X and Y must be zero mean
Trust cells that have small variances and large
covariances
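A sketch of the least-squares solution behind this statement: for zero-mean scalar variables, the optimal linear estimate of Y from X is \hat{Y} = w X with w = \mathrm{Cov}(X, Y)/\mathrm{Var}(X); for a population, assuming Σ_r is the covariance matrix of the activity,

  \hat{s} = \mathbf{w}^T \mathbf{r}, \qquad \mathbf{w} = \Sigma_r^{-1} \, \mathrm{Cov}(\mathbf{r}, s)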
34
Voting Methods
  • Optimal Linear Estimator

35
Voting Methods
  • Optimal Linear Estimator
  • Center of Mass
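The center-of-mass estimate, with s_i the preferred stimulus of neuron i:

  \hat{s} = \frac{\sum_i r_i s_i}{\sum_i r_i}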

36
Center of Mass/Population Vector
  • The center of mass is optimal (unbiased and
    efficient) iff the tuning curves are Gaussian
    with a zero baseline, uniformly distributed, and
    the noise follows a Poisson distribution
  • In general, the center of mass has a large bias
    and a large variance

37
Voting Methods
  • Optimal Linear Estimator
  • Center of Mass
  • Population Vector

38
Population Vector
39
Voting Methods
  • Optimal Linear Estimator
  • Center of Mass
  • Population Vector

40
Population Vector
Typically, the population vector is not the
optimal linear estimator.
41
Population Vector
42
Population Vector
  • The population vector is optimal iff the tuning
    curves are cosine, uniformly distributed, and the
    noise follows a normal distribution with fixed
    variance
  • In most cases, the population vector is biased
    and has a large variance
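For reference, the population vector estimate, with \mathbf{c}_i the unit vector pointing in neuron i's preferred direction:

  \hat{\mathbf{P}} = \sum_i r_i \mathbf{c}_i, \qquad \hat{s} = \text{direction of } \hat{\mathbf{P}}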

43
Maximum Likelihood
  • The maximum likelihood estimate is the value of
    s maximizing the likelihood P(r|s). Therefore, we
    seek the estimate ŝ_ML such that P(r|ŝ_ML) is
    maximal (see below).
  • ŝ_ML is unbiased and efficient.
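In symbols:

  \hat{s}_{ML} = \arg\max_s P(\mathbf{r} | s)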

44
Maximum Likelihood
[Figure: tuning curves, activity vs. direction (deg)]
45
Maximum Likelihood
Template
46
Maximum Likelihood
[Figure: observed pattern of activity and the template, activity vs. preferred direction (deg)]
47
ML and template matching
  • Maximum likelihood is a template-matching
    procedure, BUT the metric used is not always the
    Euclidean distance: it depends on the noise
    distribution.

48
Maximum Likelihood
  • The maximum likelihood estimate is the value of
    s maximizing the likelihood P(r|s). Therefore, we
    seek the estimate ŝ_ML that maximizes P(r|s).

49
Maximum Likelihood
  • If the noise is Gaussian and independent, the
    log-likelihood is a sum of squared errors
  • Therefore, maximizing the likelihood amounts to
    minimizing the Euclidean distance to the template
  • and the estimate is given by the expression below
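A sketch of the argument, assuming independent Gaussian noise with fixed variance σ²: the likelihood is

  P(\mathbf{r} | s) \propto \exp\!\left( -\frac{1}{2\sigma^2} \sum_i (r_i - f_i(s))^2 \right)

so maximizing it amounts to minimizing the Euclidean distance between r and the template f(s):

  \hat{s}_{ML} = \arg\min_s \sum_i (r_i - f_i(s))^2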

50
Maximum Likelihood
[Figure: activity vs. preferred direction (deg)]
51
Gaussian noise with variance proportional to the
mean
  • If the noise is Gaussian with variance
    proportional to the mean, the distance being
    minimized changes to the weighted distance below
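A sketch, assuming the variance of neuron i is proportional to f_i(s) and ignoring the weak s-dependence of the normalization term: the distance becomes variance-weighted,

  \hat{s}_{ML} \approx \arg\min_s \sum_i \frac{(r_i - f_i(s))^2}{f_i(s)}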

52
Bayesian approach
  • We want to recover P(s|r). Using Bayes' theorem,
    we have

53
Bayesian approach
  • The prior P(s) corresponds to any knowledge we may
    have about s before we get to see any activity.
  • Note: the Bayesian approach does not reduce to
    the use of a prior

54
Bayesian approach
Once we have P(s|r), we can proceed in two
different ways. We can keep this distribution for
Bayesian inferences (as we would do in a Bayesian
network), or we can make a decision about s. For
instance, we can estimate s as being the value
that maximizes P(s|r). This is known as the
maximum a posteriori (MAP) estimate. For a flat
prior, ML and MAP are equivalent.
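In symbols:

  \hat{s}_{MAP} = \arg\max_s P(s | \mathbf{r}) = \arg\max_s P(\mathbf{r} | s) \, P(s)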
55
Bayesian approach
Limitations: the Bayesian approach and ML require
a lot of data (estimating P(r|s) requires at
least n(n+1)/2 parameters)
56
Bayesian approach
Limitations: the Bayesian approach and ML require
a lot of data (estimating P(r|s) requires at
least O(n²) parameters; for n = 100, n² = 10,000)
Alternative: estimate P(s|r) directly using a
nonlinear estimate (if s is a scalar and P(s|r)
is Gaussian, we only need to estimate two
parameters!).
57
58
Outline
  • Definition
  • The encoding process
  • Decoding population codes
  • Quantifying information: Shannon and Fisher
    information
  • Basis functions and optimal computation

59
Fisher Information
Fisher information is defined as

  I(s) = E\left[ \left( \frac{\partial \log P(\mathbf{r} | s)}{\partial s} \right)^2 \right]

and it is equal to

  I(s) = -E\left[ \frac{\partial^2 \log P(\mathbf{r} | s)}{\partial s^2} \right]

where P(r|s) is the distribution of the neuronal
noise.
60
Fisher Information
61
Fisher Information
  • For one neuron with Poisson noise
  • For n independent neurons
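The standard expressions for Poisson noise:

  One neuron:               I(s) = \frac{f'(s)^2}{f(s)}

  n independent neurons:    I(s) = \sum_i \frac{f_i'(s)^2}{f_i(s)}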

62
Fisher Information and Tuning Curves
  • Fisher information is maximum where the slope is
    maximum
  • This is consistent with adaptation experiments
  • Fisher information adds up for independent
    neurons (unlike Shannon information!)

63
Fisher Information
  • In 1D, Fisher information decreases as the width
    of the tuning curves increases
  • In 2D, Fisher information does not depend on the
    width of the tuning curve
  • In 3D and above, Fisher information increases as
    the width of the tuning curves increases
  • WARNING: this is true for independent Gaussian
    noise.

64
Ideal observer
  • The discrimination threshold of an ideal
    observer, δs, is proportional to σ_CR, the
    square root of the Cramér-Rao bound.
  • In other words, an efficient estimator is an
    ideal observer.

65
  • An ideal observer is an observer that can recover
    all the Fisher information in the activity (easy
    link between Fisher information and behavioral
    performance)
  • If all distributions are Gaussian, Fisher
    information is the same as Shannon information.

66
Population Vector and Fisher Information
[Figure: variance of the population vector estimate compared with 1/Fisher information and the Cramér-Rao bound]
The population vector should NEVER be used to
estimate information content! The indirect
method is prone to severe problems.
67
68
Outline
  • Definition
  • The encoding process
  • Decoding population codes
  • Quantifying information: Shannon and Fisher
    information
  • Basis functions and optimal computation

69
  • So far we have only talked about decoding from
    the point of view of an experimentalist.
  • How is that relevant to neural computation?
    Neurons do not decode, they compute!
  • What kind of computation can we perform with
    population codes?

70
Computing functions
  • If we denote the sensory input as a vector S and
    the motor command as M, a sensorimotor
    transformation is a mapping from S to M
  • M = f(S)
  • where f is typically a nonlinear function

71
Example
  • Two-joint arm

[Figure: planar two-joint arm with joint angles θ1, θ2 and hand position (x, y)]
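As an illustration of the nonlinearity of f, the forward kinematics of such an arm, assuming link lengths L_1 and L_2:

  x = L_1 \cos\theta_1 + L_2 \cos(\theta_1 + \theta_2), \qquad y = L_1 \sin\theta_1 + L_2 \sin(\theta_1 + \theta_2)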
72
Basis functions
  • Most nonlinear functions can be approximated by
    linear combinations of basis functions
  • Ex: Fourier transform
  • Ex: radial basis functions
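A sketch of the decomposition, e.g. with Gaussian radial basis functions centered at points x_i:

  f(x) \approx \sum_i w_i B_i(x), \qquad B_i(x) = \exp\!\left( -\frac{(x - x_i)^2}{2\sigma^2} \right)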

73
Basis Functions
74
Basis Functions
  • A basis function decomposition is like a
    three-layer network. The intermediate units are
    the basis functions

[Figure: three-layer network mapping input X to output y through a basis-function layer]
75
Basis Functions
  • Networks with sigmoidal units are also basis
    function networks

76
[Figure: panels A-D, networks combining inputs X and Y into an output Z through a basis-function layer followed by a linear combination]
77
Basis Functions
  • Decompose the computation of M = f(S,P) into two
    stages (see the sketch below)
  • Compute basis functions of S and P
  • Combine the basis functions linearly to obtain
    the motor command
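A minimal numerical sketch of this two-stage scheme (not from the talk): Gaussian basis functions of the joint input (S, P) followed by a linear read-out. The centers, widths, and target mapping below are illustrative assumptions.

# Two-stage basis-function computation of M = f(S, P) (illustrative sketch)
import numpy as np

rng = np.random.default_rng(0)

# Grid of basis-function centers covering the (S, P) space
centers_s, centers_p = np.meshgrid(np.linspace(-1, 1, 10), np.linspace(-1, 1, 10))
centers = np.column_stack([centers_s.ravel(), centers_p.ravel()])  # (100, 2)
sigma = 0.25

def basis(S, P):
    """Stage 1: Gaussian basis functions of the combined input (S, P)."""
    x = np.column_stack([S, P])                                # (N, 2)
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, n_basis)
    return np.exp(-d2 / (2 * sigma ** 2))

def target(S, P):
    """An example nonlinear sensorimotor mapping to approximate (assumed)."""
    return np.sin(np.pi * S) * np.cos(np.pi * P)

# Fit the linear read-out weights (stage 2) by least squares on sample data
S_train = rng.uniform(-1, 1, 2000)
P_train = rng.uniform(-1, 1, 2000)
B = basis(S_train, P_train)
w, *_ = np.linalg.lstsq(B, target(S_train, P_train), rcond=None)

# Stage 2: the motor command is a linear combination of the basis functions
S_test = rng.uniform(-1, 1, 500)
P_test = rng.uniform(-1, 1, 500)
M_hat = basis(S_test, P_test) @ w
print("approximation RMSE:", np.sqrt(np.mean((M_hat - target(S_test, P_test)) ** 2)))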

78
Basis Functions
  • Note that M can be a population code, e.g. the
    components of that vector could correspond to
    units with bell-shaped tuning curves.

79
Example: computing the head-centered location of
an object from its retinal location

[Figure: fixation point, gaze direction, head position, and the retinal and head-centered locations of the object]
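The underlying relation, assuming all positions are expressed as angles in the same plane:

  x_{head} = x_{retina} + x_{eye}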
80
Basis Functions
81
Basis Function Units
[Figure: response R_i of a basis-function unit]
82
Basis Function Units
[Figure: response R_i of a basis-function unit]
83
Visual receptive fields in VIP shift partially
with the eye

[Figure: receptive-field locations on the screen relative to the fixation point, in head-centered and retinotopic coordinates]
(Duhamel, Bremmer, BenHamed and Graf, 1997)
84
Summary
  • Definition
  • Population codes involve the concerted activity
    of large populations of neurons
  • The encoding process
  • The activity of the neurons can be formalized as
    the sum of a tuning curve and noise

85
Summary
  • Decoding population codes
  • Optimal decoding can be performed with maximum
    likelihood estimation (ŝ_ML) or Bayesian
    inferences (p(s|r))
  • Quantifying information: Fisher information
  • Fisher information provides an upper bound on the
    amount of information available in a population
    code

86
Summary
  • Basis functions and optimal computation
  • Population codes can be used to perform arbitrary
    nonlinear transformations because they provide
    basis sets.

87
Where do we go from here?
  • Computation and Bayesian inferences
  • Knill, Koerding, Todorov: experimental evidence
    for Bayesian inferences in humans
  • Shadlen: neural basis of Bayesian inferences
  • Latham, Olshausen: Bayesian inferences in
    recurrent neural nets

88
Where do we go from here?
  • Other encoding hypotheses: probabilistic
    interpretations
  • Zemel, Rao