1
MACHINE LEARNINGInformation Theory and The
Neuron - II

Aude Billard
2
Overview
  • LECTURE I
  • Neuron Biological Inspiration
  • Information Theory and the Neuron
  • Weight Decay, Anti-Hebbian Learning → PCA
  • Anti-Hebbian Learning → ICA

  • LECTURE II
  • Capacity of the single Neuron
  • Capacity of Associative Memories (Willshaw Net,
  • Extended Hopfield Network)
  • LECTURE III
  • Continuous Time-Delay NN
  • Limit-Cycles, Stability and Convergence

3
Neural Processing - The Brain

[Figure: neuron schematic (dendrites, cell body, synapse) and the electrical potential E over time, showing integration, depolarization with decay, and the refractory period]
A neuron receives and integrates input from other
neurons. Once the integrated input exceeds a
critical level, the neuron discharges a spike.
This spiking event, also called depolarization, is
followed by a refractory period, during which the
neuron is unable to fire.
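As a concrete illustration of this behaviour, here is a minimal leaky integrate-and-fire sketch in Python; the threshold, leak and refractory values are arbitrary illustrative choices, not taken from the slides.

    import numpy as np

    def lif_neuron(inputs, threshold=1.0, leak=0.9, refractory=5):
        """Toy leaky integrate-and-fire unit: leaky integration of the input,
        a spike when the potential crosses threshold, then a refractory period."""
        potential, silent, spikes = 0.0, 0, []
        for x in inputs:
            if silent > 0:                      # refractory: the neuron cannot fire
                silent -= 1
                spikes.append(0)
                continue
            potential = leak * potential + x    # integration with decay
            if potential >= threshold:          # depolarization (spike)
                spikes.append(1)
                potential = 0.0
                silent = refractory
            else:
                spikes.append(0)
        return spikes

    print(lif_neuron(np.random.default_rng(0).uniform(0.0, 0.4, size=30)))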
4
Information Theory and The Neuron

[Figure: a single neuron with inputs weighted by w1–w4 and output y]
  • You can view the neuron as a memory.
  • What can you store in this memory?
  • What is the maximal capacity?
  • How can you find a learning rule that maximizes
    the capacity?

5
Information Theory and The Neuron
A fundamental principle of learning systems is
their robustness to noise. One way to measure a
system's robustness to noise is to determine the
mutual information between its inputs and its
output.
[Figure: a neuron with weighted inputs and output y]
6
Information Theory and The Neuron

[Figure: a single neuron with inputs weighted by w1–w4 and output y]
Consider the neuron as a sender-receiver system,
with X being the message sent and y the received
message. Information theory can give you a
measure of the information conveyed by y about X.
If the transmission system is imperfect
(noisy), you must find a way to ensure minimal
disturbance in the transmission.
7
Information Theory and The Neuron

[Figure: a single neuron with inputs weighted by w1–w4 and output y]
In order to maximize this signal-to-noise ratio,
one can simply increase the magnitude of the
weights.
8
Information Theory and The Neuron

[Figure: a single neuron with inputs weighted by w1–w4 and output y]
The mutual information between the neuron output
y and its inputs X is given below.
This time, one cannot simply increase the
magnitude of the weights, as this affects the
noise-dependent term in the expression as well.
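The equation itself is not reproduced in the transcript; assuming it is the standard entropy decomposition of the mutual information, it reads

    \[ I(X; y) \;=\; H(y) \;-\; H(y \mid X) . \]

With noise entering at the inputs, scaling the weights scales the noise contribution to the output as well, so both terms grow and the difference does not automatically increase.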
9
Information Theory and The Neuron

10

How to define a learning rule to optimize the
mutual information?
11
Hebbian Learning

[Figure: input units x connected to output units y by weighted connections]
If x_i and y_i fire simultaneously, the weight of
the connection between them is strengthened in
proportion to their firing strengths.
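A minimal Python sketch of the plain Hebbian update for a single linear unit; the learning rate and data are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    eta = 0.01                            # learning rate (illustrative)
    X = rng.normal(size=(1000, 4))        # input patterns x
    w = rng.normal(size=4) * 0.1          # initial weights

    for x in X:
        y = w @ x                         # linear output
        w += eta * x * y                  # Hebbian update: Delta w_i = eta * x_i * y

    # Note: without decay or normalization the weights grow
    # without bound (the following slides discuss this).
    print(np.linalg.norm(w))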
12
Hebbian Learning Limit Cycle

At a fixed point this must hold for all i; thus
w_j would have to be an eigenvector of C with
associated eigenvalue 0.
C is a symmetric, positive semi-definite matrix
⇒ all its eigenvalues are ≥ 0.
Under a small disturbance
⇒ the weights tend to grow in the direction of
the leading eigenvector of C.
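A sketch of the argument, assuming the averaged Hebbian update is ⟨Δw⟩ = ηCw (the slide's own equation is not reproduced in the transcript): a fixed point requires

    \[ \eta\, C\, w = 0 , \]

i.e. w must be an eigenvector of C with eigenvalue 0. Expanding w in the eigenbasis {u_k} of C with eigenvalues λ_k ≥ 0,

    \[ w(t) \;\approx\; \sum_k a_k \,(1 + \eta \lambda_k)^{t}\, u_k , \]

so any small disturbance along the leading eigenvector is amplified the fastest, and the weights end up growing in that direction.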
13
Hebbian Learning Weight Decay

The simple weight decay rule belongs to a class of
decay rules called subtractive rules.
The only advantage of subtractive rules over
simply clipping the weights is that they
eliminate weights that have little importance.
The advantage of multiplicative rules is that, in
addition to giving small weights, they also give
useful weights.
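The exact decay rules used on the slide are not reproduced here; in their standard textbook forms, a subtractive rule removes the same amount from every weight, while a multiplicative rule shrinks each weight in proportion to its size:

    \[ \text{subtractive:}\quad \Delta w_i \;=\; \eta\, x_i\, y \;-\; \gamma,
       \qquad
       \text{multiplicative:}\quad \Delta w_i \;=\; \eta\, x_i\, y \;-\; \gamma\, w_i . \]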
14
Information Theory and The Neuron

[Figure: a single neuron with inputs weighted by w1–w4 and output y]
Oja's one-neuron model
The weights converge toward the first eigenvector
of the input covariance matrix and are normalized.
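A minimal Python sketch of Oja's one-neuron rule; the mixing matrix, learning rate and number of samples are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    eta = 0.01
    A = np.array([[1.0, 0.5],
                  [0.5, 0.5]])                 # mixing to create correlated inputs
    X = rng.normal(size=(5000, 2)) @ A.T
    X -= X.mean(axis=0)

    w = rng.normal(size=2) * 0.1
    for x in X:
        y = w @ x
        w += eta * y * (x - y * w)             # Oja: Hebbian term + implicit normalization

    print(w / np.linalg.norm(w))               # converged direction (up to sign)
    print(np.linalg.eigh(np.cov(X.T))[1][:, -1])   # leading eigenvector of the covariance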
15
Hebbian Learning Weight Decay
Oja's subspace algorithm

Equivalent to minimizing the generalized form of
J
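The update the slide refers to is presumably the standard form of Oja's subspace rule for M output units with y = Wx (the cost J mentioned above is not reproduced in the transcript):

    \[ \Delta W \;=\; \eta \left( y\, x^{\top} \;-\; y\, y^{\top} W \right), \qquad y = W x , \]

whose rows converge to an orthonormal basis of the principal subspace spanned by the first M eigenvectors of the input covariance, not to the individual eigenvectors themselves.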
16
Hebbian Learning Weight Decay
  • Why PCA, LDA, ICA with ANNs?
  • They suggest how the brain could derive important
    properties of the sensory and motor spaces.
  • They allow one to discover new modes of computation
    with simple, iterative, local learning rules.

17
Recurrence in Neural Networks
  • So far, we have considered only feed-forward
    neural networks.
  • Most biological networks have recurrent
    connections.
  • This change of direction in the flow of
    information is interesting, as it allows the network
  • to keep a memory of the activation of the neurons
  • to propagate information across output
    neurons

18
Anti-Hebbian Learning

How can one maximize information transmission in a
network, i.e. maximize I(x; y)?
19
Anti-Hebbian Learning
Anti-Hebbian learning is also known as lateral
inhibition.
(Averages are taken over all training
patterns.)
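The lateral update itself is not reproduced in the transcript; in its standard anti-Hebbian form it reads

    \[ \Delta w_{ij} \;=\; -\,\eta\, \langle y_i\, y_j \rangle, \qquad i \neq j , \]

where ⟨·⟩ denotes the average over all training patterns: the more correlated two outputs are, the more strongly they come to inhibit each other.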
20
Anti-Hebbian Learning

If two outputs are highly correlated, the weight
between them will grow to a large negative value,
and each will tend to turn the other off.
There is no need for weight decay or
renormalization of anti-Hebbian weights, as they
are automatically self-limiting.
21
Anti-Hebbian Learning

Földiák's first model
22
Anti-Hebbian Learning

Földiák's first model
One can further show that there is a stable point
in the weight space.
23
Anti-Hebbian Learning

Földiák's second model
Each neuron also receives its own output with
weight 1.
  • This network will converge when
  • the outputs are decorrelated, and
  • the expected variance of each output is equal to
    1.
(A toy sketch of these two conditions follows below.)
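The sketch below is not Földiák's exact architecture; it is a toy decorrelating network in Python that uses lateral weights with the symmetric update ΔW = η(I − yyᵀ), whose fixed point is exactly ⟨yyᵀ⟩ = I, i.e. decorrelated outputs with unit variance. The mixing matrix, learning rate and sample count are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    eta = 0.01
    A = np.array([[1.0, 0.8],
                  [0.0, 0.6]])                       # mixing that correlates the inputs
    X = rng.normal(size=(20000, 2)) @ A.T

    W = np.zeros((2, 2))                             # lateral weights
    for x in X:
        y = np.linalg.solve(np.eye(2) - W, x)        # settled recurrent output: y = x + W y
        W += eta * (np.eye(2) - np.outer(y, y))      # anti-Hebbian off-diagonal, variance control on-diagonal

    Y = X @ np.linalg.inv(np.eye(2) - W).T
    print(np.cov(Y.T))                               # approximately the identity matrix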

24
PCA versus ICA
PCA looks at the covariance matrix only. What if
the data are not well described by the covariance
matrix? The only distribution that is
uniquely specified by its covariance (once the
mean is subtracted) is the Gaussian distribution.
Distributions that deviate from the Gaussian are
poorly described by their covariances.
25
PCA versus ICA
Even with non-Gaussian data, variance
maximization leads to the most faithful
representation in a reconstruction-error
sense. The mean-square error measure implicitly
assumes Gaussianity, since it penalizes
datapoints close to the mean less than those that
are far away. But it does not in general lead to
the most meaningful representation. ⇒ We need to
perform gradient descent on some function other
than the reconstruction error.
26
Uncorrelated versus Statistically Independent
Independence implies a factorization condition
that holds for any non-linear transformation f,
whereas decorrelation does not.
Statistical independence is a stronger constraint
than decorrelation.
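The definitions being contrasted (the slide's own equations are not in the transcript) are, in their standard form:

    \[ \text{independent:}\quad p(y_1, y_2) = p(y_1)\, p(y_2)
       \;\Longrightarrow\;
       E\!\left[f(y_1)\, g(y_2)\right] = E\!\left[f(y_1)\right] E\!\left[g(y_2)\right]
       \;\;\text{for any } f, g, \]

    \[ \text{uncorrelated:}\quad E[\,y_1 y_2\,] = E[\,y_1\,]\, E[\,y_2\,] . \]

Decorrelation only imposes the second condition, which is the special case f(y) = g(y) = y.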
27
Objective Function of ICA
We want to ensure that the outputs y_i are
maximally independent. This is identical to
requiring that their mutual information be
small, or, alternatively, that their joint entropy
be large.
[Venn diagram: H(x,y) contains H(x) and H(y); their overlap is the mutual information I(x,y); the remaining parts are H(x|y) and H(y|x)]
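The relation behind "mutual information small ⇔ joint entropy large" is the standard identity

    \[ I(y_1, \dots, y_n) \;=\; \sum_{i} H(y_i) \;-\; H(y_1, \dots, y_n) , \]

so, for fixed marginal entropies, minimizing the mutual information between the outputs is the same as maximizing their joint entropy.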
28
Anti-Hebbian Learning and ICA
Anti-Hebbian learning can also lead to a
decomposition into statistically independent
components and, as such, can perform an
ICA-type decomposition.
29
ICA for TIME-DEPENDENT SIGNALS
Original signals
Adapted from Hyvärinen, 2000
30
ICA for TIME-DEPENDENT SIGNALS
Mixed signals
Adapted from Hyvärinen, 2000
31
Anti-Hebbian Learning and ICA

Jutten and Hérault model
32
Anti-Hebbian Learning and ICA

HINT: use two odd functions for f and g
(f(-x) = -f(x)); then their Taylor series
expansions consist solely of odd terms.
Since most (audio) signals have an even
distribution, at convergence one has the condition
sketched below.
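Neither the Jutten–Hérault update nor the convergence condition appears in the transcript; in the anti-Hebbian convention used in these slides, their standard forms are:

    \[ \Delta w_{ij} \;\propto\; -\, f(y_i)\, g(y_j), \qquad i \neq j , \]

    \[ \text{at convergence:}\quad \langle\, f(y_i)\, g(y_j)\, \rangle \;=\; 0 \quad (i \neq j) . \]

With f and g odd and the source distributions even (symmetric), this cancels odd higher-order cross-moments of the outputs, not just their covariance.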
33
Anti-Hebbian Learning and ICA Application for
Blind Source Separation

[Figure: mixed signals]
Hsiao-Chun Wu et al, ICNN 1996, MWSCAS 1998,
ICASSP 1999
34
Anti-Hebbian Learning and ICA Application for
Blind Source Separation

[Figure: unmixed signals recovered through generalized anti-Hebbian learning]
Hsiao-Chun Wu et al, ICNN 1996, MWSCAS 1998,
ICASSP 1999
35
Anti-Hebbian Learning and ICA Application for
Blind Source Separation

[Figure: mixed signals]
Hsiao-Chun Wu et al, ICNN 1996, MWSCAS 1998,
ICASSP 1999
36
Anti-Hebbian Learning and ICA Application for
Blind Source Separation

[Figure: unmixed signals recovered through generalized anti-Hebbian learning]
Hsiao-Chun Wu et al, ICNN 1996, MWSCAS 1998,
ICASSP 1999
37
Information Maximization
Bell and Sejnowski proposed a network to maximize
the mutual information between the output and the
input when these are not subject to noise (or
rather, when the input and the noise can no longer
be distinguished, so that H(Y|X) tends to negative
infinity).
[Figure: a single neuron with inputs weighted by w1–w4, a bias w0, and output y]
Bell A.J. and Sejnowski T.J. 1995. An information
maximisation approach to blind separation and
blind deconvolution, Neural Computation, 7, 6,
1129-1159
38
Information Maximization
Bell and Sejnowski proposed a network to maximize
the mutual information between the output and the
input when these are not subject to noise (or
rather, when the input and the noise can no longer
be distinguished, so that H(Y|X) tends to negative
infinity).
H(Y|X) is independent of the weights W, and so
maximizing I(Y;X) with respect to W reduces to
maximizing the output entropy H(Y).
Bell A.J. and Sejnowski T.J. 1995. An information
maximisation approach to blind separation and
blind deconvolution, Neural Computation, 7, 6,
1129-1159
39
Information Maximization
The entropy of a distribution is maximized when
all outcomes are equally likely. ⇒ We must
choose an activation function at the output
neurons which equalizes each neuron's chances of
firing and so maximizes their collective entropy.
Bell A.J. and Sejnowski T.J. 1995. An information
maximisation approach to blind separation and
blind deconvolution, Neural Computation, 7, 6,
1129-1159
40
Anti-Hebbian Learning and ICA
The sigmoid is the optimal solution for evening
out a Gaussian distribution, so that all outputs
are equally probable.
Bell A.J. and Sejnowski T.J. 1995. An information
maximisation approach to blind separation and
blind deconvolution, Neural Computation, 7, 6,
1129-1159
41
Anti-Hebbian Learning and ICA
The sigmoid is the optimal solution for evening
out a Gaussian distribution, so that all outputs
are equally probable.
Bell A.J. and Sejnowski T.J. 1995. An information
maximisation approach to blind separation and
blind deconvolution, Neural Computation, 7, 6,
1129-1159
42
Anti-Hebbian Learning and ICA
The sigmoid is the optimal solution for evening
out a Gaussian distribution, so that all outputs
are equally probable.
[Figure: a single neuron with inputs weighted by w1–w4, a bias w0, and output y]
Bell A.J. and Sejnowski T.J. 1995. An information
maximisation approach to blind separation and
blind deconvolution, Neural Computation, 7, 6,
1129-1159
43
Anti-Hebbian Learning and ICA
The pdf of the output can be written in terms of
the input pdf and the slope of the squashing
function; the entropy of the output then follows;
and the learning rules that optimize this entropy
are obtained by gradient ascent on it (a sketch is
given below).
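The elided formulas presumably follow the cited paper; for a single input x passed through y = g(wx + w0) with a sigmoid g, the standard derivation reads:

    \[ p(y) \;=\; \frac{p(x)}{\left| \partial y / \partial x \right|}, \qquad
       H(y) \;=\; -E\!\left[\ln p(y)\right] \;=\; E\!\left[\ln \left| \frac{\partial y}{\partial x} \right|\right] + H(x), \]

    \[ \Delta w \;\propto\; \frac{\partial H(y)}{\partial w} \;=\; \frac{1}{w} + x\,(1 - 2y), \qquad
       \Delta w_0 \;\propto\; 1 - 2y . \]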
Bell A.J. and Sejnowski T.J. 1995. An information
maximisation approach to blind separation and
blind deconvolution, Neural Computation, 7, 6,
1129-1159
44
Anti-Hebbian Learning and ICA
Bell A.J. and Sejnowski T.J. 1995. An information
maximization approach to blind separation and
blind deconvolution, Neural Computation, 7, 6,
1129-1159
45
Anti-Hebbian Learning and ICA
This can be generalized to a network with many
inputs and many outputs and a sigmoid output
function. The learning rules that optimize the
mutual information between input and output are
then given below.
Such a network can linearly decompose up to 10
sources.
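The generalized rule is not reproduced in the transcript; in the cited paper it takes the form (for y = g(Wx + w0) with an element-wise sigmoid g):

    \[ \Delta W \;\propto\; \left( W^{\top} \right)^{-1} \;+\; (\mathbf{1} - 2y)\, x^{\top}, \qquad
       \Delta w_0 \;\propto\; \mathbf{1} - 2y . \]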
Bell A.J. and Sejnowski T.J. 1995. An information
maximisation approach to blind separation and
blind deconvolution, Neural Computation, 7, 6,
1129-1159