
Blind Source Separation by Independent Components Analysis

- Professor Dr. Barrie W. Jervis
- School of Engineering
- Sheffield Hallam University
- England
- B.W.Jervis@shu.ac.uk

The Problem

- Temporally independent unknown source signals are linearly mixed in an unknown system to produce a set of measured output signals.
- It is required to determine the source signals.
- Methods of solving this problem are known as Blind Source Separation (BSS) techniques.
- In this presentation the method of Independent Components Analysis (ICA) will be described.
- The arrangement is illustrated in the next slide.

Arrangement for BSS by ICA

[Diagram: sources s1, s2, …, sn → mixing matrix A → measured signals x1, x2, …, xn → unmixing matrix W → activations u1, u2, …, un → nonlinearities g(·) → outputs y1 = g1(u1), y2 = g2(u2), …, yn = gn(un)]

Neural Network Interpretation

- The si are the independent source signals,
- A is the linear mixing matrix,
- The xi are the measured signals,
- W ≈ A⁻¹ is the estimated unmixing matrix,
- The ui are the estimated source signals or activations, i.e. ui ≈ si,
- The gi(ui) are monotonic nonlinear functions (sigmoids, hyperbolic tangents),
- The yi are the network outputs.
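These relations can be sketched in NumPy. The sources and the mixing matrix A below are invented for illustration; in the blind problem both are unknown, and here W is simply taken as A⁻¹ to show the roles of the quantities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two temporally independent sources (hypothetical example):
# a sine wave and uniform noise.
t = np.linspace(0.0, 1.0, 1000)
s = np.vstack([np.sin(2 * np.pi * 5 * t),
               rng.uniform(-1.0, 1.0, t.size)])

A = np.array([[1.0, 0.5],      # unknown linear mixing matrix
              [0.3, 1.0]])
x = A @ s                      # the measured signals

W = np.linalg.inv(A)           # ideal unmixing matrix, W ≈ A⁻¹
u = W @ x                      # activations, u ≈ s
```

In practice W must be estimated from x alone; the learning rules later in the presentation do exactly that.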

Principles of Neural Network Approach

- Use Information Theory to derive an algorithm which minimises the mutual information between the outputs y = g(u).
- This minimises the mutual information between the source signal estimates, u, since g(u) introduces no dependencies.
- The different u are then temporally independent and are the estimated source signals.

Cautions I

- The magnitudes and signs of the estimated source signals are unreliable since
  - the magnitudes are not scaled
  - the signs are undefined
- because magnitude and sign information is shared between the source signal vector and the unmixing matrix, W.
- The order of the outputs is permuted compared with the inputs.

Cautions II

- Similar overlapping source signals may not be properly extracted.
- If the number of output channels < the number of source signals, those source signals of lowest variance will not be extracted. This is a problem when these signals are important.

Information Theory I

- If X is a vector of variables (messages) xi which occur with probabilities P(xi), then the average information content per message of a stream of N messages is

  H(X) = -Σi P(xi) log2 P(xi)  bits

  and is known as the entropy of the random variable, X.
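As a minimal numerical check of this definition (an illustration, not from the slides), the per-message entropy of a discrete distribution can be computed directly:

```python
import numpy as np

def entropy_bits(p):
    """H(X) = -sum_i P(xi) * log2 P(xi), in bits per message."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                 # terms with P(xi) = 0 contribute nothing
    return -np.sum(p * np.log2(p))

h_coin = entropy_bits([0.5, 0.5])   # a fair coin carries 1 bit per message
```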

Information Theory II

- Note that the entropy is expressible in terms of probability.
- Given the probability density function (pdf) of X we can find the associated entropy.
- This link between entropy and pdf is of the greatest importance in ICA theory.

Information Theory III

- The joint entropy between two random variables X and Y is given by

  H(X,Y) = -Σx Σy P(x,y) log2 P(x,y)

- For independent variables,

  H(X,Y) = H(X) + H(Y)

Information Theory IV

- The conditional entropy of Y given X measures the average uncertainty remaining about y when x is known, and is

  H(Y|X) = -Σx Σy P(x,y) log2 P(y|x)

- The mutual information between Y and X is

  I(Y,X) = H(Y) - H(Y|X)

- In ICA, X represents the measured signals, which are applied to the nonlinear function g(u) to obtain the outputs Y.
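A small worked example (assumed, not from the slides) of these identities on a discrete joint distribution; Y is a copy of a fair bit X, so knowing X removes all uncertainty about Y and the mutual information equals one bit:

```python
import numpy as np

def H(p):
    """Entropy in bits of any array of probabilities."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

Pxy = np.array([[0.5, 0.0],     # joint pmf P(x, y) with Y = X
                [0.0, 0.5]])
Px = Pxy.sum(axis=1)            # marginal of X
Py = Pxy.sum(axis=0)            # marginal of Y

# I(Y,X) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)
I = H(Px) + H(Py) - H(Pxy)
```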

Bell and Sejnowski's ICA Theory (1995)

- Aim to maximise the amount of mutual information between the inputs X and the outputs Y of the neural network:

  I(Y,X) = H(Y) - H(Y|X)

  (H(Y|X) is the remaining uncertainty about Y when X is known.)

- Y is a function of W and g(u).
- Here we seek to determine the W which produces the ui ≈ si, assuming the correct g(u).
- Differentiating,

  ∂I(Y,X)/∂W = ∂H(Y)/∂W

  (∂H(Y|X)/∂W = 0, since H(Y|X) did not come through W from X.) So, maximising this mutual information is equivalent to maximising the joint output entropy, H(Y), which is seen to be equivalent to minimising the mutual information between the outputs and hence the ui, as desired.

The Functions g(u)

- The outputs yi are amplitude-bounded random variables, and so the marginal entropies H(yi) are maximum when the yi are uniformly distributed - a known statistical result.
- With the H(yi) maximised, the mutual information between the outputs ≈ 0, and the yi uniformly distributed, the nonlinearity gi(ui) has the form of the cumulative distribution function of the probability density function of the si - a proven result.

Pause and review g(u) and W

- W has to be chosen to maximise the joint output entropy, H(Y), which minimises the mutual information between the estimated source signals, ui.
- The g(u) should be the cumulative distribution functions of the source signals, si.
- Determining the g(u) is a major problem.

One input and one output

- For a monotonic nonlinear function, g(x),

  p(y) = p(x) / |∂y/∂x|

- Also

  H(y) = -E[ln p(y)]

- Substituting,

  H(y) = E[ln |∂y/∂x|] - E[ln p(x)]

  (the second term is independent of W; we only need to maximise the first term)

- A stochastic gradient ascent learning rule is adopted to maximise H(y) by assuming

  Δw ∝ ∂H(y)/∂w = ∂/∂w ( ln |∂y/∂x| )

- Further progress requires knowledge of g(u). Assume for now, after Bell and Sejnowski, that g(u) is sigmoidal, i.e.

  y = g(u) = 1 / (1 + e^(-u))

- Also assume

  u = wx + w0

Learning Rule 1 input, 1 output

Hence, we find

  Δw ∝ 1/w + x(1 - 2y)

  Δw0 ∝ 1 - 2y

Learning Rule N inputs, N outputs

- Need

  ΔW ∝ ∂H(Y)/∂W

- Assuming g(u) is sigmoidal again, we obtain

  ΔW ∝ [Wᵀ]⁻¹ + (1 - 2y)xᵀ

- The network is trained until the changes in the weights become acceptably small at each iteration.
- Thus the unmixing matrix W is found.

The Natural Gradient

- The computation of the inverse matrix [Wᵀ]⁻¹ is time-consuming, and may be avoided by rescaling the entropy gradient by multiplying it by WᵀW.
- Thus, for a sigmoidal g(u) we obtain

  ΔW ∝ [I + (1 - 2y)uᵀ] W

- This is the natural gradient, introduced by Amari (1998), and now widely adopted.
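The whole training loop for the sigmoidal case can be sketched as follows. The Laplacian sources, the mixing matrix, the learning rate and the iteration count are all illustrative choices, not values from the presentation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two super-Gaussian (Laplacian) sources, linearly mixed.
T = 5000
s = rng.laplace(size=(2, T))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s

W = np.eye(2)                        # initial unmixing estimate
lr = 0.02                            # learning rate (illustrative)
for _ in range(2000):
    u = W @ x                        # current activations
    y = 1.0 / (1.0 + np.exp(-u))     # sigmoidal g(u)
    # Natural-gradient update: dW proportional to [I + (1 - 2y)u^T] W
    dW = (np.eye(2) + (1.0 - 2.0 * y) @ u.T / T) @ W
    W += lr * dW

u = W @ x    # estimated sources, up to scale, sign and permutation
```

After training, W @ A should approach a scaled permutation matrix, reflecting the magnitude, sign and ordering ambiguities noted in Cautions I.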

The nonlinearity, g(u)

- We have already learnt that the g(u) should be the cumulative distribution functions of the individual source distributions.
- So far the g(u) have been assumed to be sigmoidal, so what are the pdfs of the si?
- The corresponding pdfs of the si are super-Gaussian.

Super- and sub-Gaussian pdfs

[Figure: sketches of a Gaussian pdf, a super-Gaussian pdf (sharper peak, heavier tails), and a sub-Gaussian pdf (flatter top, lighter tails)]

- Note there are no strict mathematical definitions of super- and sub-Gaussians.

Super- and sub-Gaussians

- Super-Gaussians: kurtosis (fourth-order central moment, measures the flatness of the pdf) > 0; infrequent signals of short duration, e.g. evoked brain signals.
- Sub-Gaussians: kurtosis < 0; signals mainly "on", e.g. 50/60 Hz electrical mains supply, but also eye blinks.

Kurtosis

- Kurtosis is the (normalised) 4th-order central moment,

  kurt(u) = E[(u - E[u])⁴] / (E[(u - E[u])²])² - 3

  and is seen to be calculated from the current estimates of the source signals.
- To separate the independent sources, information about their pdfs such as skewness (3rd moment) and flatness (kurtosis) is required.
- First and 2nd moments (mean and variance) are insufficient.
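The excess kurtosis can be estimated from sample activations as follows; the Laplacian and uniform samples are illustrative stand-ins for super- and sub-Gaussian sources:

```python
import numpy as np

def excess_kurtosis(u):
    """E[(u - mean)^4] / E[(u - mean)^2]^2 - 3 (zero for a Gaussian)."""
    u = np.asarray(u, dtype=float)
    c = u - u.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2 - 3.0

rng = np.random.default_rng(0)
k_super = excess_kurtosis(rng.laplace(size=100_000))    # super-Gaussian: > 0
k_sub = excess_kurtosis(rng.uniform(-1, 1, 100_000))    # sub-Gaussian: < 0
```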

A more generalised learning rule

- Girolami (1997) showed that tanh(ui) and -tanh(ui) could be used for super- and sub-Gaussians respectively.
- Cardoso and Laheld (1996) developed a stability analysis to determine whether the source signals were to be considered super- or sub-Gaussian.
- Lee, Girolami, and Sejnowski (1998) applied these findings to develop their extended infomax algorithm for super- and sub-Gaussians using a kurtosis-based switching rule.

Extended Infomax Learning Rule

- With super-Gaussians modelled as a Gaussian pdf multiplied by sech²(u), and sub-Gaussians as a Pearson mixture model (a mixture of two displaced Gaussians), the new extended learning rule is

  ΔW ∝ [I - K tanh(u)uᵀ - uuᵀ] W

Switching Decision

- The ki are the elements of the N-dimensional diagonal matrix, K, with ki = +1 for a super-Gaussian and ki = -1 for a sub-Gaussian source, and

  ki = sign( E[sech²(ui)] E[ui²] - E[ui tanh(ui)] )

- Modifications of the formula for ki exist, but in our experience the extended algorithm has been unsatisfactory.
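The switching decision can be sketched directly from sample activations; the two test rows below are illustrative super- and sub-Gaussian samples:

```python
import numpy as np

def switching_signs(u):
    """k_i = sign(E[sech^2(u_i)] E[u_i^2] - E[u_i tanh(u_i)]), row-wise."""
    sech2 = 1.0 / np.cosh(u) ** 2
    return np.sign(sech2.mean(axis=1) * (u ** 2).mean(axis=1)
                   - (u * np.tanh(u)).mean(axis=1))

rng = np.random.default_rng(0)
u = np.vstack([rng.laplace(size=100_000),      # super-Gaussian activations
               rng.uniform(-2, 2, 100_000)])   # sub-Gaussian activations
k = switching_signs(u)   # +1 for super-Gaussian, -1 for sub-Gaussian
```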

Reasons for unsatisfactory extended algorithm

- 1) Initial assumptions about super- and sub-Gaussian distributions may be too inaccurate.
- 2) The switching criterion may be inadequate.

Alternatives

- Postulate vague distributions for the source signals which are then developed iteratively during training.
- Use an alternative approach, e.g. the statistically based JADE (Cardoso).

Summary so far

- We have seen how W may be obtained by training the network, and the extended algorithm for switching between super- and sub-Gaussians has been described.
- Alternative approaches have been mentioned.
- Next we consider how to obtain the source signals knowing W and the measured signals, x.

Source signal determination

- The system is:

  [Diagram: si unknown → mixing matrix A → xi measured → unmixing matrix W → ui ≈ si estimated → g(u) → yi]

- Hence U = W.x and x = A.S, where A ≈ W⁻¹ and U ≈ S.
- The rows of U are the estimated source signals, known as activations (as functions of time).
- The rows of x are the time-varying measured signals.

Source Signals

[Figure: stacked activation time series, plotted as channel number against time, or sample number]

Expressions for the Activations

- We see that consecutive values of u are obtained by filtering consecutive columns of x by the same row of W:

  u(t) = W x(t)

- The ith row of U is the ith row of W multiplied by the columns of x:

  ui(t) = Σj wij xj(t)

Procedure

- Record N time points from each of M sensors, where N ≥ 5M.
- Pre-process the data, e.g. filtering, trend removal.
- Sphere the data using Principal Components Analysis (PCA). This is not essential, but speeds up the computation by first removing the first and second order moments.
- Compute the ui ≈ si. Include desphering.
- Analyse the results.
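The sphering step can be sketched with an eigendecomposition of the sample covariance matrix; the mixed Gaussian data here are purely illustrative:

```python
import numpy as np

def sphere(x):
    """Whiten x (channels x samples): zero mean, identity covariance."""
    xc = x - x.mean(axis=1, keepdims=True)     # remove the first moment
    cov = xc @ xc.T / xc.shape[1]
    d, E = np.linalg.eigh(cov)                 # PCA of the covariance
    S = E @ np.diag(d ** -0.5) @ E.T           # sphering matrix
    return S @ xc, S                           # keep S for desphering

rng = np.random.default_rng(0)
x = np.array([[2.0, 0.5],
              [0.5, 1.0]]) @ rng.normal(size=(2, 10_000))
z, S = sphere(x)    # z now has identity sample covariance
```

The unmixing matrix is then trained on z, and the final W for the raw data includes S (desphering folds S back in).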

Optional Procedures I

- The contribution of each activation at a sensor may be found by back-projecting it to the sensor.
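Back-projection can be sketched as follows. W is a hypothetical trained unmixing matrix; the contribution of one activation at every sensor is obtained by zeroing the other activations before mapping back through W⁻¹:

```python
import numpy as np

rng = np.random.default_rng(0)

W = np.array([[1.0, 0.2, 0.1],     # hypothetical trained unmixing matrix
              [0.3, 1.0, 0.2],
              [0.1, 0.4, 1.0]])
x = rng.normal(size=(3, 1000))     # measured signals (illustrative)
u = W @ x                          # activations
A_est = np.linalg.inv(W)           # estimated mixing matrix, A ≈ W⁻¹

# Contribution of activation 0 at every sensor: zero the other rows of u.
u0 = np.zeros_like(u)
u0[0] = u[0]
x_from_u0 = A_est @ u0

# Summing the back-projections of all activations reconstructs x.
x_total = np.zeros_like(x)
for i in range(u.shape[0]):
    ui = np.zeros_like(u)
    ui[i] = u[i]
    x_total += A_est @ ui
```

Setting artefact activations to zero instead, as in the next slide, removes their contribution from the reconstructed sensor signal.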

Optional Procedures II

- A measured signal which is contaminated by artefacts or noise may be extracted by back-projecting all the signal activations to the measurement electrode, setting the other (artefact and noise) activations to zero. (An artefact and noise removal method.)

Current Developments

- Overcomplete representations: more signal sources than sensors.
- Nonstationary sources.
- General formulation of g(u).

Conclusions

- It has been shown how to extract temporally independent unknown source signals from their linear mixtures at the outputs of an unknown system using Independent Components Analysis.
- Some of the limitations of the method have been mentioned.
- Current developments have been highlighted.