# Blind Source Separation by Independent Components Analysis


1
Blind Source Separation by Independent Components
Analysis
• Professor Dr. Barrie W. Jervis
• School of Engineering
• Sheffield Hallam University
• England
• B.W.Jervis@shu.ac.uk

2
The Problem
• Temporally independent unknown source signals are
linearly mixed in an unknown system to produce a
set of measured output signals.
• It is required to determine the source signals.

3
• Methods of solving this problem are known as
Blind Source Separation (BSS) techniques.
• In this presentation the method of Independent
Components Analysis (ICA) will be described.
• The arrangement is illustrated in the next slide.

4
Arrangement for BSS by ICA
[Diagram: unknown sources s1 … sn → mixing matrix A
→ measured signals x1 … xn → unmixing matrix W
→ activations u1 … un → nonlinearities g(.)
→ outputs y1 = g1(u1), …, yn = gn(un)]
5
Neural Network Interpretation
• The si are the independent source signals,
• A is the linear mixing matrix,
• The xi are the measured signals,
• W ≈ A⁻¹ is the estimated unmixing matrix,
• The ui are the estimated source signals or
activations, i.e. ui ≈ si,
• The gi(ui) are monotonic nonlinear functions
(sigmoids, hyperbolic tangents),
• The yi are the network outputs.

6
Principles of Neural Network Approach
• Use Information Theory to derive an algorithm
which minimises the mutual information between
the outputs y = g(u).
• This minimises the mutual information between the
source signal estimates, u, since g(u) introduces
no dependencies.
• The different u are then temporally independent
and are the estimated source signals.

7
Cautions I
• The magnitudes and signs of the estimated source
signals are unreliable, since any scaling or sign
change can be exchanged between the source signal
vector and the unmixing matrix, W:
• the magnitudes are not scaled,
• the signs are undefined.
• The order of the outputs is permuted compared
with the inputs.

8
Cautions II
• Similar overlapping source signals may not be
properly extracted.
• If the number of output channels < the number of
source signals, those source signals of lowest
variance will not be extracted. This is a problem
when these signals are important.

9
Information Theory I
• If X is a vector of variables (messages) xi which
occur with probabilities P(xi), then the average
information content of a stream of N messages is

H(X) = −Σi P(xi) log₂ P(xi)  bits

and is known as the entropy of the random
variable, X.
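The entropy formula can be checked with a short sketch; the two example distributions below are illustrative, not from the presentation:

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum p * log2(p), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries 1 bit per message; a biased one carries less.
print(entropy([0.5, 0.5]))   # 1.0 bit
print(entropy([0.9, 0.1]))   # about 0.469 bits
```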
10
Information Theory II
• Note that the entropy is expressible in terms of
probability.
• Given the probability density function (pdf)
of X we can find the associated entropy.
• This link between entropy and pdf is of the
greatest importance in ICA theory.

11
Information Theory III
• The joint entropy between two random variables X
and Y is given by

H(X,Y) = −Σx Σy P(x,y) log₂ P(x,y)

• For independent variables,

H(X,Y) = H(X) + H(Y)

12
Information Theory IV
• The conditional entropy of Y given X measures the
average uncertainty remaining about y when x is
known, and is

H(Y|X) = −Σx Σy P(x,y) log₂ P(y|x)

• The mutual information between Y and X is

I(Y,X) = H(Y) − H(Y|X)

• In ICA, X represents the measured signals, which
are applied to the nonlinear function g(u) to
obtain the outputs Y.
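All three quantities can be computed from a joint distribution. A minimal sketch, using a hypothetical 2×2 joint pmf and the chain rule H(Y|X) = H(X,Y) − H(X):

```python
import math

def H(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution P(x, y) of two correlated binary variables.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

px = [sum(row) for row in joint]            # marginal P(x)
py = [sum(col) for col in zip(*joint)]      # marginal P(y)

H_XY = H([p for row in joint for p in row]) # joint entropy H(X,Y)
H_Y_given_X = H_XY - H(px)                  # chain rule: H(Y|X) = H(X,Y) - H(X)
I_XY = H(py) - H_Y_given_X                  # mutual information I(Y,X)

print(H_XY, H_Y_given_X, I_XY)
```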

13
Bell and Sejnowski's ICA Theory (1995)
• Aim to maximise the amount of mutual information
between the inputs X and the outputs Y of the
neural network,

I(Y,X) = H(Y) − H(Y|X)

where H(Y|X) is the uncertainty remaining about Y
when X is known.
• Y is a function of W and g(u).
• Here we seek to determine the W which
produces the ui ≈ si, assuming the correct g(u).

14
Differentiating,

∂I(Y,X)/∂W = ∂H(Y)/∂W

since ∂H(Y|X)/∂W = 0 (H(Y|X) did not come through
W from X). So, maximising this mutual information
is equivalent to maximising the joint output
entropy, H(Y), which is seen to be equivalent to
minimising the mutual information between the
outputs and hence between the ui, as desired.
15
The Functions g(u)
• The outputs yi are amplitude bounded random
variables, and so the marginal entropies H(yi)
are maximum when the yi are uniformly distributed
- a known statistical result.
• With the H(yi) maximised, the mutual information
between the outputs is minimised, and with the yi
uniformly distributed, the nonlinearity gi(ui)
has the form of the cumulative distribution
function of the probability density function of
the si - a proven result.

16
Pause and review g(u) and W
• W has to be chosen to maximise the joint output
entropy, H(Y), which minimises the mutual
information between the estimated source signals,
ui.
• The g(u) should be the cumulative distribution
functions of the source signals, si.
• Determining the g(u) is a major problem.

17
One input and one output
• For a monotonic nonlinear function, g(x),

p(y) = p(x) / |∂y/∂x|

• Also

H(y) = −E[ln p(y)]

• Substituting,

H(y) = E[ln |∂y/∂x|] − E[ln p(x)]

where the second term is independent of W, so we
only need to maximise the first term.
18
• A stochastic gradient ascent learning rule is
adopted to maximise H(y) by assuming

Δw ∝ ∂H(y)/∂w

• Further progress requires knowledge of g(u).
Assume for now, after Bell and Sejnowski, that
g(u) is sigmoidal, i.e.

y = 1 / (1 + e⁻ᵘ)

• Also assume

u = wx + w₀

19
Learning Rule 1 input, 1 output
Hence, we find

Δw ∝ 1/w + x(1 − 2y),   Δw₀ ∝ 1 − 2y
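The 1-input rule Δw ∝ 1/w + x(1 − 2y) can be sanity-checked numerically: over a fixed sample, the analytic gradient should match a finite-difference estimate of the w-dependent part of H(y), namely E[ln |∂y/∂x|]. A sketch; the Gaussian input sample is illustrative:

```python
import math, random

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(5000)]  # fixed sample of inputs

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def mean_log_slope(w):
    # E[ln |dy/dx|] with y = g(wx); dy/dx = w * y * (1 - y).
    # Up to the w-independent term E[ln p(x)], this is the output entropy H(y).
    total = 0.0
    for x in xs:
        y = sigmoid(w * x)
        total += math.log(abs(w) * y * (1.0 - y))
    return total / len(xs)

w = 1.5
# Analytic gradient from the learning rule: 1/w + E[x(1 - 2y)]
analytic = 1.0 / w + sum(x * (1.0 - 2.0 * sigmoid(w * x)) for x in xs) / len(xs)
# Central finite difference of the same quantity
eps = 1e-5
numeric = (mean_log_slope(w + eps) - mean_log_slope(w - eps)) / (2 * eps)
print(analytic, numeric)   # the two agree closely
```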
20
Learning Rule N inputs, N outputs
• Need

ΔW ∝ ∂H(y)/∂W

• Assuming g(u) is sigmoidal again, we obtain

ΔW ∝ [Wᵀ]⁻¹ + (1 − 2y)xᵀ

21
• The network is trained until the changes in the
weights become acceptably small at each
iteration.
• Thus the unmixing matrix W is found.

22
• The computation of the inverse matrix [Wᵀ]⁻¹
is time-consuming, and may be avoided by
rescaling the entropy gradient by multiplying it
by WᵀW.
• Thus, for a sigmoidal g(u) we obtain

ΔW ∝ [I + (1 − 2y)uᵀ] W

• This is the natural gradient, introduced by
Amari (1998), and now widely adopted.
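The natural-gradient rule ΔW ∝ [I + (1 − 2y)uᵀ]W can be exercised end-to-end on synthetic data. A minimal sketch: the Laplacian sources, mixing matrix, learning rate, and iteration count are all illustrative choices, not part of the presentation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic super-Gaussian (Laplacian) sources, linearly mixed.
n = 5000
S = rng.laplace(size=(2, n))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])              # the "unknown" mixing matrix
X = A @ S                               # measured signals

W = np.eye(2)                           # initial unmixing estimate
lr = 0.05                               # learning rate (illustrative)
for _ in range(1000):
    U = W @ X                           # activations u = Wx
    Y = 1.0 / (1.0 + np.exp(-U))        # sigmoidal g(u)
    # Natural-gradient infomax update: dW = [I + (1 - 2y) u^T] W
    dW = (np.eye(2) + (1.0 - 2.0 * Y) @ U.T / n) @ W
    W += lr * dW

U = W @ X
# Up to permutation, sign and scaling, the rows of U recover the sources.
C = np.abs(np.corrcoef(np.vstack([S, U]))[:2, 2:])
print(C.round(2))
```

Each source should correlate strongly with exactly one activation, illustrating both the separation and the permutation/scaling indeterminacy noted earlier.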

23
The nonlinearity, g(u)
• We have already learnt that the g(u) should be
the cumulative distribution functions of the
individual source distributions.
• So far the g(u) have been assumed to be
sigmoidal, so what are the pdfs of the si?
• The corresponding pdfs of the si are
super-Gaussian.

24
Super- and sub-Gaussian pdfs
[Figure: sketches of a Gaussian, a super-Gaussian
(sharply peaked, heavy-tailed) and a sub-Gaussian
(flattened) pdf.]
• Note there are no mathematical definitions of
super- and sub-Gaussians

25
Super- and sub-Gaussians
• Super-Gaussians: kurtosis (fourth order central
moment, which measures the flatness of the
pdf) > 0. Infrequent signals of short duration,
e.g. evoked brain signals.
• Sub-Gaussians: kurtosis < 0. Signals
mainly "on", e.g. 50/60 Hz electrical mains.

26
Kurtosis
• Kurtosis is the normalised 4th order central
moment,

kurt(u) = E[u⁴] / (E[u²])² − 3

• and is seen to be calculated from the current
estimates of the source signals.
• To separate the independent sources, information
about their pdfs such as skewness (3rd. moment)
and flatness (kurtosis) is required.
• First and 2nd. moments (mean and variance) are
insufficient.
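A sketch of the kurtosis computation, applied to illustrative super-, plain, and sub-Gaussian samples (excess kurtosis, with 3 subtracted so a Gaussian scores zero):

```python
import numpy as np

rng = np.random.default_rng(1)

def kurtosis(u):
    """Normalised 4th central moment minus 3 (zero for a Gaussian)."""
    u = u - u.mean()
    return (u**4).mean() / (u**2).mean()**2 - 3.0

n = 200000
k_super = kurtosis(rng.laplace(size=n))        # super-Gaussian: positive
k_gauss = kurtosis(rng.normal(size=n))         # Gaussian: near zero
k_sub = kurtosis(rng.uniform(-1, 1, size=n))   # sub-Gaussian: negative
print(k_super, k_gauss, k_sub)
```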

27
A more generalised learning rule
• Girolami (1997) showed that tanh(ui) and
-tanh(ui) could be used for super- and
sub-Gaussians respectively.
• Cardoso and Laheld (1996) developed a stability
analysis to determine whether the source signals
were to be considered super- or sub-Gaussian.
• Lee, Girolami, and Sejnowski (1998) applied these
findings to develop their extended infomax
algorithm for super- and sub-Gaussians using a
kurtosis-based switching rule.

28
Extended Infomax Learning Rule
• With super-Gaussians modelled as

and sub-Gaussians as a Pearson mixture model
the new extended learning rule is
29
Switching Decision
• The ki are the elements of the N-dimensional
diagonal matrix, K, and are switched according to

ki = sign( E[sech²(ui)] E[ui²] − E[ui tanh(ui)] )

with ki = 1 for super-Gaussian and ki = −1 for
sub-Gaussian activations.
• Modifications of the formula for ki exist, but
in our experience the extended algorithm has been
unsatisfactory.
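The switching criterion ki = sign(E[sech²(ui)]E[ui²] − E[ui tanh(ui)]) can be illustrated directly; the two test distributions below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def switch_sign(u):
    """k_i = sign( E[sech^2(u)] E[u^2] - E[u tanh(u)] ):
    +1 selects the super-Gaussian rule, -1 the sub-Gaussian rule."""
    sech2 = 1.0 / np.cosh(u)**2
    return np.sign(sech2.mean() * (u**2).mean() - (u * np.tanh(u)).mean())

n = 100000
u_super = rng.laplace(size=n)        # heavy-tailed, peaked: super-Gaussian
u_sub = rng.uniform(-2, 2, size=n)   # flat-topped: sub-Gaussian
print(switch_sign(u_super), switch_sign(u_sub))   # 1.0 -1.0
```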

30
Reasons for unsatisfactory extended algorithm
• 1) Initial assumptions about super- and
sub-Gaussian distributions may be too inaccurate.
• 2) The switching criterion may be inadequate.

Alternatives
• Postulate vague distributions for the source
signals which are then developed iteratively
during training.
• Use an alternative approach, e.g. statistically

31
Summary so far
• We have seen how W may be obtained by training
the network, and the extended algorithm for
switching between super- and sub-Gaussians has
been described.
• Alternative approaches have been mentioned.
• Next we consider how to obtain the source signals
knowing W and the measured signals, x.

32
Source signal determination
• The system is

[Diagram: unknown sources si → mixing matrix A
→ measured signals xi → unmixing matrix W
→ estimated activations ui ≈ si → g(u)
→ outputs yi]

• Hence U = Wx and x = AS, where A ≈ W⁻¹, and
U ≈ S.
• The rows of U are the estimated source signals,
known as activations (as functions of time).
• The rows of x are the time-varying measured
signals.

33
Source Signals
[Figure: the estimated source signals (activations)
plotted by channel number against time, or sample
number.]
34
Expressions for the Activations
• We see that consecutive values of u are obtained
by filtering consecutive columns of x by the same
row of W:

ui(t) = Σj wij xj(t)

• The ith row of u is the ith row of W multiplied
by the columns of x.

35
Procedure
• Record N time points from each of M sensors,
where N ≥ 5M.
• Pre-process the data, e.g. filtering, trend
removal.
• Sphere the data using Principal Components
Analysis (PCA). This is not essential but speeds
up the computation by first removing first and
second order moments.
• Compute the ui ≈ si. Include desphering.
• Analyse the results.
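The sphering step can be sketched as PCA whitening: remove the means, then decorrelate the channels to unit variance. The helper below and its names are illustrative, not from the presentation:

```python
import numpy as np

rng = np.random.default_rng(3)

def sphere(X):
    """PCA whitening: remove means, decorrelate to unit variance.
    Returns the sphered data Z and the sphering matrix Sph (Z = Sph @ Xc)."""
    Xc = X - X.mean(axis=1, keepdims=True)        # remove first moments
    cov = Xc @ Xc.T / Xc.shape[1]                 # second moments
    vals, vecs = np.linalg.eigh(cov)              # PCA of the covariance
    Sph = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return Sph @ Xc, Sph

# Hypothetical mixed recordings from M = 2 sensors.
X = np.array([[1.0, 0.8], [0.3, 1.0]]) @ rng.laplace(size=(2, 5000))
Z, Sph = sphere(X)
print(np.cov(Z))   # approximately the identity matrix
```

Desphering afterwards is just multiplication by the inverse of the sphering matrix.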

36
Optional Procedures I
• The contribution of each activation at a sensor
may be found by back-projecting it to the
sensor.
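Back-projecting activation i to the sensors amounts to multiplying the ith column of the estimated mixing matrix A ≈ W⁻¹ by the ith activation. A minimal sketch with a hypothetical trained W:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical trained unmixing matrix and measured signals.
W = np.array([[ 2.0, -0.5],
              [-0.3,  1.5]])
X = rng.normal(size=(2, 1000))        # measured signals (illustrative)
U = W @ X                             # activations
A_est = np.linalg.inv(W)              # estimated mixing matrix, A ~ W^-1

# Contribution of activation i at every sensor: outer(A[:, i], U[i]).
contrib = [np.outer(A_est[:, i], U[i]) for i in range(2)]

# The contributions of all the activations sum back to the measured signals.
print(np.allclose(contrib[0] + contrib[1], X))   # True
```

Setting all but the wanted activations to zero before back-projecting gives the artefact-removal procedure of the next slide.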

37
Optional Procedures II
• A measured signal which is contaminated by
artefacts or noise may be extracted by
back-projecting all the signal activations to
the measurement electrode, setting other
activations to zero. (An artefact and noise
removal method).

38
Current Developments
• Overcomplete representations - more signal
sources than sensors.
• Nonlinear mixing.
• Nonstationary sources.
• General formulation of g(u).

39
Conclusions
• It has been shown how to extract temporally
independent unknown source signals from their
linear mixtures at the outputs of an unknown
system using Independent Components Analysis.
• Some of the limitations of the method have been
mentioned.
• Current developments have been highlighted.