Population Coding

Alexandre Pouget
Okinawa Computational Neuroscience Course
Okinawa, Japan
November 2004

Outline

- Definition
- The encoding process
- Decoding population codes
- Quantifying information: Shannon and Fisher information
- Basis functions and optimal computation


Receptive field

[Figure: response of a neuron as a function of the stimulus s (direction of motion), with spike rasters for trials 1-4]


Tuning curves and noise

- Examples of tuning curves
- Retinal location, orientation, depth, color, eye movements, arm movements, numbers, etc.

Population Codes

[Figure: left panel "Tuning Curves", activity (0-100) vs. direction (deg, -100 to 100) for a population of neurons; right panel "Pattern of activity (r)", activity vs. preferred direction (deg) in response to an unknown stimulus s?]

Bayesian approach

- We want to recover P(s|r). Using Bayes' theorem, we have
  P(s|r) = P(r|s) P(s) / P(r)


Bayesian approach

- If we are to do any type of computation with population codes, we need a probabilistic model of how the activities are generated, P(r|s), i.e., we need to model the encoding process.

Activity distribution

Tuning curves and noise

- The activity (number of spikes per second) of a neuron can be written as
  r_i = f_i(s) + n_i
  where f_i(s) is the mean activity of the neuron (the tuning curve) and n_i is a noise term with zero mean.
- If the noise is gaussian, then
  P(r_i|s) = exp(-(r_i - f_i(s))^2 / 2σ^2) / √(2πσ^2)

Probability distributions and activity

- The noise is a random variable which can be characterized by a conditional probability distribution, P(n_i|s).
- The distribution of the activity, P(r_i|s), and the distribution of the noise differ only by their means (E[n_i] = 0, E[r_i] = f_i(s)).

Examples of activity distributions

- Gaussian noise with fixed variance
- Gaussian noise with variance equal to the mean
- Poisson distribution
- The variance of a Poisson distribution is equal to its mean.
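These three noise models can be simulated directly. The sketch below assumes a hypothetical population of 21 neurons with Gaussian tuning curves (peak 50 spikes/s, width 30 deg); all parameters are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def tuning_curve(s, preferred, peak=50.0, width=30.0):
    """Mean firing rate f_i(s): Gaussian tuning around each preferred direction."""
    return peak * np.exp(-0.5 * ((s - preferred) / width) ** 2)

preferred = np.linspace(-100, 100, 21)   # preferred directions of 21 neurons
s = 10.0                                 # true stimulus direction (deg)
f = tuning_curve(s, preferred)           # mean activities f_i(s)

# Three noise models from the slides:
r_fixed = f + rng.normal(0.0, 5.0, f.shape)                  # Gaussian, fixed variance
r_scaled = f + rng.normal(0.0, 1.0, f.shape) * np.sqrt(f)    # Gaussian, variance = mean
r_poisson = rng.poisson(f).astype(float)                     # Poisson (variance = mean)
```

Note that with variance equal to the mean, weakly driven neurons are nearly noiseless while strongly driven ones fluctuate the most, which is what the comparison figure below illustrates.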

Comparison of Poisson vs Gaussian noise with variance equal to the mean

[Figure: probability (0-0.09) vs. activity (spikes/sec, 0-140) for the two distributions]

Population of neurons

- Gaussian noise with fixed variance:
  P(r|s) = Π_i exp(-(r_i - f_i(s))^2 / 2σ^2) / √(2πσ^2)

Population of neurons

- Gaussian noise with arbitrary covariance matrix Σ:
  P(r|s) = exp(-(r - f(s))^T Σ^{-1} (r - f(s)) / 2) / √((2π)^n |Σ|)

Outline

- Definition
- The encoding process
- Decoding population codes
- Quantifying information: Shannon and Fisher information
- Basis functions and optimal computation

Population Codes

[Figure: left panel "Tuning Curves", activity (0-100) vs. direction (deg, -100 to 100); right panel "Pattern of activity (r)", activity vs. preferred direction (deg) in response to an unknown stimulus s?]

Nature of the problem

In response to a stimulus with unknown value s, you observe a pattern of activity r. What can you say about s given r?

Bayesian approach: recover P(s|r) (the posterior distribution).

Estimation Theory

[Figure: the activity vector r is fed to an estimator, which returns an estimate of the stimulus]

Estimation theory

- A common measure of decoding performance is the mean square error between the estimate and the true value, E[(ŝ - s)^2].
- This error can be decomposed as
  E[(ŝ - s)^2] = (E[ŝ] - s)^2 + E[(ŝ - E[ŝ])^2] = bias^2 + variance
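The bias-variance decomposition can be verified numerically. This sketch uses a deliberately biased shrinkage estimator of a hypothetical scalar stimulus; the identity holds exactly for sample moments.

```python
import numpy as np

rng = np.random.default_rng(1)
s_true = 10.0
n_trials = 200000

# A deliberately biased estimator: shrink each noisy observation toward 0.
obs = s_true + rng.normal(0.0, 2.0, n_trials)
est = 0.9 * obs

mse = np.mean((est - s_true) ** 2)        # mean square error
bias2 = (np.mean(est) - s_true) ** 2      # squared bias
variance = np.var(est)                    # variance of the estimator
```

Here the shrinkage trades a little extra bias (about 1 deg^2) against the variance of the raw observation.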

Efficient Estimators

- The smallest achievable variance for an unbiased estimator is known as the Cramér-Rao bound, σ_CR².
- An efficient estimator is such that Var(ŝ) = σ_CR².
- In general, Var(ŝ) ≥ σ_CR².


Examples of decoders

Voting Methods

- Optimal Linear Estimator:
  ŝ = Σ_i w_i r_i

Linear Estimators

- The weights w are chosen to minimize the mean square error E[(ŝ - s)^2].
- X and Y must be zero mean (subtract the means before fitting).
- The solution weights cells by the inverse of their noise: trust cells that have small variances and large covariances.


Voting Methods

- Optimal Linear Estimator
- Center of Mass

Center of Mass/Population Vector

- The center of mass,
  ŝ = Σ_i r_i s_i / Σ_i r_i,
  is optimal (unbiased and efficient) iff the tuning curves are gaussian with a zero baseline, uniformly distributed, and the noise follows a Poisson distribution.
- In general, the center of mass has a large bias and a large variance.
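A minimal sketch of the center-of-mass decoder, assuming Gaussian tuning curves with zero baseline (all parameters hypothetical). On the noiseless activity pattern it recovers the stimulus up to a small edge bias from the finite range of preferred directions.

```python
import numpy as np

def center_of_mass(r, preferred):
    """Center-of-mass estimate: activity-weighted mean of preferred directions."""
    return np.sum(r * preferred) / np.sum(r)

preferred = np.linspace(-100, 100, 21)          # preferred directions (deg)
s_true = 10.0
f = 50.0 * np.exp(-0.5 * ((s_true - preferred) / 30.0) ** 2)  # noiseless activities

s_com = center_of_mass(f, preferred)
```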

Voting Methods

- Optimal Linear Estimator
- Center of Mass
- Population Vector


Population Vector

Typically, the population vector is not the optimal linear estimator.

- Population vector is optimal iff The tuning

curves are cosine, uniformly distributed and the

noise follows a normal distribution with fixed

variance - In most cases, the population vector is biased

and has a large variance
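The population vector can be sketched as follows: each neuron votes with a unit vector along its preferred direction, weighted by its activity, and the estimate is the angle of the resultant. Assuming cosine tuning with a baseline (so activities stay nonnegative) and uniformly spaced preferred directions, the noiseless estimate recovers the stimulus exactly; all parameters here are hypothetical.

```python
import numpy as np

def population_vector(r, preferred_deg):
    """Angle of the activity-weighted sum of unit vectors along preferred directions."""
    th = np.deg2rad(preferred_deg)
    return np.rad2deg(np.arctan2(np.sum(r * np.sin(th)), np.sum(r * np.cos(th))))

preferred = np.linspace(0, 360, 16, endpoint=False)     # full circle of 16 neurons
s_true = 70.0
f = 1.0 + np.cos(np.deg2rad(preferred - s_true))        # cosine tuning with baseline

s_pv = population_vector(f, preferred)
```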

Maximum Likelihood

- The maximum likelihood estimate is the value of s maximizing the likelihood P(r|s). Therefore, we seek ŝ_ML such that
  ŝ_ML = argmax_s P(r|s)
- ŝ_ML is unbiased and efficient.

Maximum Likelihood

[Figure: tuning curves, activity (0-100) vs. direction (deg, -100 to 100)]

Maximum Likelihood

[Figure: a template with the tuning-curve profile is slid across the observed pattern of activity, plotted against preferred direction (deg)]

ML and template matching

- Maximum likelihood is a template matching procedure, BUT the metric used is not always the Euclidean distance; it depends on the noise distribution.


Maximum Likelihood

- If the noise is gaussian and independent,
  P(r|s) = Π_i exp(-(r_i - f_i(s))^2 / 2σ^2) / √(2πσ^2)
- Therefore, maximizing the likelihood amounts to minimizing the Euclidean distance Σ_i (r_i - f_i(s))^2,
- and the estimate is given by
  ŝ_ML = argmin_s Σ_i (r_i - f_i(s))^2
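This template-matching view of ML under independent Gaussian noise can be sketched with a grid search over s (tuning-curve and noise parameters hypothetical): the estimate is the stimulus whose template lies closest to the observed pattern in Euclidean distance.

```python
import numpy as np

rng = np.random.default_rng(2)
preferred = np.linspace(-100, 100, 21)

def f(s):
    """Gaussian tuning curves; s may be a scalar or an array of candidate stimuli."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    return 50.0 * np.exp(-0.5 * ((s[None, :] - preferred[:, None]) / 30.0) ** 2)

s_true = 10.0
r = f(s_true)[:, 0] + rng.normal(0.0, 5.0, preferred.size)   # one noisy trial

# ML = template matching: pick the grid value minimizing the Euclidean distance
grid = np.linspace(-100, 100, 2001)
d2 = np.sum((r[:, None] - f(grid)) ** 2, axis=0)
s_ml = grid[np.argmin(d2)]
```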

Maximum Likelihood

[Figure: best-fitting template over the pattern of activity, plotted against preferred direction (deg)]

Gaussian noise with variance proportional to the mean

- If the noise is gaussian with variance proportional to the mean, the distance being minimized changes to the variance-weighted distance Σ_i (r_i - f_i(s))^2 / f_i(s).

Bayesian approach

- We want to recover P(s|r). Using Bayes' theorem, we have
  P(s|r) = P(r|s) P(s) / P(r)

Bayesian approach

- The prior P(s) corresponds to any knowledge we may have about s before we get to see any activity.
- Note: the Bayesian approach does not reduce to the use of a prior.

Bayesian approach

Once we have P(s|r), we can proceed in two different ways. We can keep this distribution for Bayesian inferences (as we would do in a Bayesian network), or we can make a decision about s. For instance, we can estimate s as the value that maximizes P(s|r). This is known as the maximum a posteriori (MAP) estimate. For a flat prior, ML and MAP are equivalent.
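A minimal sketch of the posterior computation on a stimulus grid, assuming independent Gaussian noise and a flat prior (under which MAP and ML coincide); all tuning and noise parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
preferred = np.linspace(-100, 100, 21)

def f(s):
    s = np.atleast_1d(np.asarray(s, dtype=float))
    return 50.0 * np.exp(-0.5 * ((s[None, :] - preferred[:, None]) / 30.0) ** 2)

s_true, sigma = 10.0, 5.0
r = f(s_true)[:, 0] + rng.normal(0.0, sigma, preferred.size)

grid = np.linspace(-100, 100, 2001)
log_lik = -0.5 * np.sum((r[:, None] - f(grid)) ** 2, axis=0) / sigma**2
log_prior = np.zeros_like(grid)                 # flat prior over the grid
log_post = log_lik + log_prior                  # log P(s|r) up to a constant

post = np.exp(log_post - log_post.max())
post /= post.sum()                              # normalized posterior on the grid

s_map = grid[np.argmax(log_post)]               # MAP estimate
s_ml = grid[np.argmax(log_lik)]                 # ML estimate
```

Keeping the whole normalized posterior (rather than just s_map) is what allows further Bayesian inferences downstream.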

Bayesian approach

Limitations: the Bayesian approach and ML require a lot of data. Estimating P(r|s) requires at least O(n²) parameters; for n = 100 neurons, that is on the order of n² = 10,000 parameters.

Alternative: estimate P(s|r) directly using a nonlinear estimate (if s is a scalar and P(s|r) is gaussian, we only need to estimate two parameters!).


Outline

- Definition
- The encoding process
- Decoding population codes
- Quantifying information: Shannon and Fisher information
- Basis functions and optimal computation

Fisher Information

Fisher information is defined as
  I(s) = -E[∂² ln P(r|s) / ∂s²]
and it is equal to
  I(s) = E[(∂ ln P(r|s) / ∂s)²]
where P(r|s) is the distribution of the neuronal noise.


Fisher Information

- For one neuron with Poisson noise:
  I(s) = f'(s)² / f(s)
- For n independent neurons:
  I(s) = Σ_i f_i'(s)² / f_i(s)
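The Poisson formula can be evaluated directly for a population with Gaussian tuning curves (parameters hypothetical); the reciprocal of the result is the Cramér-Rao bound on the variance of any unbiased estimator.

```python
import numpy as np

preferred = np.linspace(-100, 100, 21)
peak, width = 50.0, 30.0

def f(s):
    return peak * np.exp(-0.5 * ((s - preferred) / width) ** 2)

def fprime(s):
    # analytic derivative of the Gaussian tuning curve with respect to s
    return f(s) * (preferred - s) / width**2

def fisher_poisson(s):
    """I(s) = sum_i f_i'(s)^2 / f_i(s) for independent Poisson neurons."""
    return np.sum(fprime(s) ** 2 / f(s))

I = fisher_poisson(10.0)
cr_variance = 1.0 / I     # Cramér-Rao bound on an unbiased estimator's variance
```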

Fisher Information and Tuning Curves

- Fisher information is maximum where the slope is maximum
- This is consistent with adaptation experiments
- Fisher information adds up for independent neurons (unlike Shannon information!)

Fisher Information

- In 1D, Fisher information decreases as the width of the tuning curves increases
- In 2D, Fisher information does not depend on the width of the tuning curve
- In 3D and above, Fisher information increases as the width of the tuning curves increases
- WARNING: this is true for independent gaussian noise.

Ideal observer

- The discrimination threshold of an ideal observer, δs, is proportional to σ_CR, the square root of the Cramér-Rao bound.
- In other words, an efficient estimator is an ideal observer.
- An ideal observer is an observer that can recover all the Fisher information in the activity (easy link between Fisher information and behavioral performance).
- If all distributions are gaussian, Fisher information is the same as Shannon information.

Population Vector and Fisher Information

[Figure: 1/(Fisher information) compared with the variance of the population vector and the CR bound]

The population vector should NEVER be used to estimate information content! The indirect method is prone to severe problems.


Outline

- Definition
- The encoding process
- Decoding population codes
- Quantifying information: Shannon and Fisher information
- Basis functions and optimal computation

- So far we have only talked about decoding from the point of view of an experimentalist.
- How is that relevant to neural computation? Neurons do not decode, they compute!
- What kind of computation can we perform with population codes?

Computing functions

- If we denote the sensory input as a vector S and the motor command as M, a sensorimotor transformation is a mapping from S to M:
  M = f(S)
- where f is typically a nonlinear function

Example

- 2-joint arm

[Figure: two-joint arm with joint angles q1, q2 and endpoint coordinates (x, y)]

Basis functions

- Most nonlinear functions can be approximated by linear combinations of basis functions
- Ex: Fourier transform
- Ex: radial basis functions
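A minimal sketch of a radial-basis-function decomposition: Gaussian basis functions tile the input range, and a least-squares fit finds the linear combination approximating a nonlinear target. The target function, number of basis functions, and widths are all hypothetical.

```python
import numpy as np

# Nonlinear target function sampled on [-1, 1]
x = np.linspace(-1.0, 1.0, 200)
target = np.sin(3.0 * x) + 0.5 * x**2

# Gaussian radial basis functions tiling the interval
centers = np.linspace(-1.0, 1.0, 15)
Phi = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / 0.2) ** 2)

# Linear combination of basis functions: fit the weights by least squares
w, *_ = np.linalg.lstsq(Phi, target, rcond=None)
approx = Phi @ w
max_err = np.max(np.abs(approx - target))
```

The same idea underlies the three-layer network on the next slide: the hidden layer computes the basis functions, and the output layer computes the linear combination.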

Basis Functions

- A basis function decomposition is like a three-layer network. The intermediate units are the basis functions.

[Figure: network mapping the input X to the output y through a layer of basis function units]

Basis Functions

- Networks with sigmoidal units are also basis function networks

[Figure: panels A-D, sigmoidal basis function layers over inputs X and Y combined linearly to produce the output Z]

Basis Functions

- Decompose the computation of M = f(S,P) in two stages:
- Compute basis functions of S and P
- Combine the basis functions linearly to obtain the motor command

Basis Functions

- Note that M can be a population code; e.g., the components of that vector could correspond to units with bell-shaped tuning curves.

Example: computing the head-centered location of an object from its retinal location

[Figure: head position, gaze direction, and fixation point determining the mapping from retinal to head-centered coordinates]

Basis Functions

[Figure: basis function units R_i]

Visual receptive fields in VIP are partially shifting with the eye

[Figure: retinotopic vs. head-centered receptive-field locations on a screen, relative to the fixation point]

(Duhamel, Bremmer, BenHamed and Graf, 1997)

Summary

- Definition
- Population codes involve the concerted activity of large populations of neurons
- The encoding process
- The activity of the neurons can be formalized as being the sum of a tuning curve plus noise

Summary

- Decoding population codes
- Optimal decoding can be performed with Maximum Likelihood estimation (ŝ_ML) or Bayesian inferences (P(s|r))
- Quantifying information: Fisher information
- Fisher information provides an upper bound on the amount of information available in a population code

Summary

- Basis functions and optimal computation
- Population codes can be used to perform arbitrary nonlinear transformations because they provide basis sets.

Where do we go from here?

- Computation and Bayesian inferences
- Knill, Koerding, Todorov: experimental evidence for Bayesian inferences in humans
- Shadlen: neural basis of Bayesian inferences
- Latham, Olshausen: Bayesian inferences in recurrent neural nets

Where do we go from here?

- Other encoding hypotheses: probabilistic interpretations
- Zemel, Rao