1
Independent Component Analysis (ICA) and Factor
Analysis (FA)
  • Amit Agrawal

2
Outline
  • Motivation for ICA
  • Definitions, restrictions and ambiguities
  • Comparison of ICA and FA with PCA
  • Estimation Techniques
  • Applications
  • Conclusions

3
Motivation
  • A method for finding underlying components in
    multi-dimensional data
  • ICA focuses on independent, non-Gaussian
    components, as compared to the uncorrelated,
    Gaussian components assumed in FA and PCA

4
Cocktail-party Problem
  • Multiple speakers in room (independent)
  • Multiple sensors receiving signals which are
    mixture of original signals
  • Estimate original source signals from mixture of
    received signals
  • Can be viewed as Blind-Source Separation as
    mixing parameters are not known

5
ICA Definition
  • Observe n random variables which are
    linear combinations of n random variables
    which are mutually independent
  • In Matrix Notation, X AS
  • Assume source signals are statistically
    independent
  • Estimate the mixing parameters and source signals
  • Find a linear transformation of observed signals
    such that the resulting signals are as
    independent as possible
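
A minimal sketch of this model and its estimation, assuming NumPy and
scikit-learn's FastICA; the specific sources and mixing matrix below are
illustrative choices, not from the slides:

    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    t = np.linspace(0, 8, 2000)
    # Two independent, non-Gaussian sources ("speakers")
    s1 = np.sign(np.sin(3 * t))        # square wave
    s2 = rng.laplace(size=t.size)      # Laplacian noise
    S = np.c_[s1, s2]                  # shape: (samples, sources)

    A = np.array([[1.0, 0.5],
                  [0.7, 1.2]])         # unknown mixing matrix
    X = S @ A.T                        # observed mixtures X = AS ("microphones")

    ica = FastICA(n_components=2, random_state=0)
    S_est = ica.fit_transform(X)       # estimated sources (up to order, sign, scale)
    A_est = ica.mixing_                # estimated mixing matrix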

6
Restrictions and Ambiguities
  • Components are assumed independent
  • Components must have non-gaussian densities
  • Energies of independent components cant be
    estimated
  • Sign Ambiguity in independent components
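
A small numeric check of the scale and sign ambiguity, assuming NumPy; the
matrices are arbitrary examples:

    import numpy as np

    rng = np.random.default_rng(1)
    S = rng.laplace(size=(2, 1000))    # two independent sources (rows)
    A = np.array([[1.0, 2.0],
                  [0.5, 1.5]])
    X = A @ S

    # Rescale source 1 by -3 and compensate in column 1 of A:
    D = np.diag([-3.0, 1.0])
    X_alt = (A @ np.linalg.inv(D)) @ (D @ S)
    print(np.allclose(X, X_alt))       # True: both factorizations explain the data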

7
Gaussian and Non-Gaussian components
  • If some components are gaussian and some are
    non-gaussian.
  • Can estimate all non-gaussian components
  • Linear combination of gaussian components can be
    estimated.
  • If only one gaussian component, model can be
    estimated

8
Why Non-Gaussian Components
  • Uncorrelated Gaussian r.v. are independent
  • Orthogonal mixing matrix cant be estimated from
    Gaussian r.v.
  • For Gaussian r.v. estimate of model is up to an
    orthogonal transformation
  • ICA can be considered as non-gaussian factor
    analysis

9
ICA vs. PCA
  • PCA
  • Finds a smaller set of components with reduced
    correlation; based on finding uncorrelated
    components
  • Needs only second-order statistics
  • ICA
  • Based on finding independent components
  • Needs higher-order statistics

10
Factor Analysis
  • Based on a generative latent variable model
    X = AY + N
  • where Y is zero-mean, Gaussian and uncorrelated
  • N is zero-mean Gaussian noise
  • Elements of Y are the unobservable factors
  • Elements of A are called factor loadings
  • In practice we have a good estimate of the
    covariance of X
  • Solve for A and the noise covariance
  • Variables should have high loadings on a small
    number of factors (see the sketch below)
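
A hedged sketch of fitting this model, assuming NumPy and scikit-learn's
FactorAnalysis; the dimensions and noise level are arbitrary choices for
illustration:

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)
    n_samples, n_factors, n_obs = 2000, 2, 5
    Y = rng.standard_normal((n_samples, n_factors))    # zero-mean, uncorrelated Gaussian factors
    A = rng.standard_normal((n_obs, n_factors))        # factor loadings
    N = 0.1 * rng.standard_normal((n_samples, n_obs))  # zero-mean Gaussian noise
    X = Y @ A.T + N                                    # generative model X = AY + N

    fa = FactorAnalysis(n_components=n_factors)
    fa.fit(X)
    loadings = fa.components_.T      # estimated loadings (identified only up to rotation)
    noise_var = fa.noise_variance_   # estimated diagonal noise covariance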

11
FA vs. PCA
  • PCA
  • Not based on a generative model, although it can
    be derived from one
  • A linear transformation of the observed data based
    on variance maximization or minimum mean-square
    error representation
  • Invertible, if all components are retained
  • FA
  • Based on a generative model
  • Values of the factors cannot be computed directly
    from the observations, due to the noise
  • Rows of the matrix A (factor loadings) are NOT
    proportional to eigenvectors of the covariance
    of X
  • Both are based on second-order statistics, due to
    the assumption of Gaussianity of the factors

12
Whitening as Preprocessing for ICA
  • Transform the observed data x into z = Vx whose
    elements are uncorrelated and have unit variances
  • Decorrelation followed by scaling
  • Any orthogonal transformation of a whitened r.v.
    is still white
  • So whitening gives the components only up to an
    orthogonal transformation
  • Useful as a preprocessing step for ICA (see the
    sketch below)
  • The search is restricted to orthogonal mixing
    matrices
  • Free parameters are reduced from n^2 (a general
    mixing matrix) to n(n-1)/2 (an orthogonal matrix)
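
A minimal whitening sketch via the eigendecomposition of the covariance,
assuming NumPy; the test data are arbitrary:

    import numpy as np

    def whiten(X):
        """Return (Z, V) with Z = (X - mean) V^T and cov(Z) = I."""
        Xc = X - X.mean(axis=0)                  # center
        d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
        V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T  # decorrelate, then scale: V = E D^{-1/2} E^T
        return Xc @ V.T, V

    rng = np.random.default_rng(0)
    X = rng.standard_normal((5000, 3)) @ np.array([[2.0, 0.0, 0.0],
                                                   [1.0, 1.0, 0.0],
                                                   [0.5, 0.3, 0.7]]).T
    Z, V = whiten(X)
    print(np.round(np.cov(Z, rowvar=False), 2))  # ~ identity matrix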

13
ICA Techniques
  • Maximization of non-gaussianity
  • Maximum Likelihood Estimation
  • Minimization of Mutual Information
  • Non-Linear Decorrelation

14
ICA by Maximization of Non-Gaussianity
  • Each source is itself a linear combination
    y = w^T x of the observed signals X
  • By the Central Limit Theorem (CLT), a sum of
    independent sources is more Gaussian than any
    single source, so maximize the non-Gaussianity of
    the linear combination of observed signals
  • How to measure non-Gaussianity?
  • Kurtosis
  • Negentropy

15
Non-Gaussianity using Kurtosis
  • Kurtosis: kurt(y) = E[y^4] - 3 (E[y^2])^2
  • Kurtosis = 0 for a Gaussian r.v.
  • Kurtosis < 0: sub-Gaussian, e.g. uniform
  • Kurtosis > 0: super-Gaussian, e.g. Laplacian
  • Simple to compute (see the example below)
  • Whiten the observed data x to get z = Vx
  • Maximize the absolute value (or square) of the
    kurtosis of w^T z subject to ||w|| = 1
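
A quick illustration of these signs, assuming NumPy; sample sizes are
arbitrary:

    import numpy as np

    def kurt(y):
        """kurt(y) = E[y^4] - 3 (E[y^2])^2 for a centered sample."""
        y = y - y.mean()
        return np.mean(y**4) - 3 * np.mean(y**2) ** 2

    rng = np.random.default_rng(0)
    n = 100_000
    print(kurt(rng.standard_normal(n)))   # ~ 0  (Gaussian)
    print(kurt(rng.uniform(-1, 1, n)))    # < 0  (sub-Gaussian)
    print(kurt(rng.laplace(size=n)))      # > 0  (super-Gaussian)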

16
Gradient Algorithm using Kurtosis
  • Start from some initial w
  • Compute the direction in which the absolute value
    of the kurtosis of y = w^T z is increasing
  • Move the vector w in that direction (gradient
    ascent on |kurt|), as in the sketch below
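
A minimal sketch of this gradient step for whitened data Z (rows are
samples), assuming NumPy; the learning rate and iteration count are
arbitrary choices:

    import numpy as np

    def grad_kurtosis_ica(Z, lr=0.1, n_iter=500, seed=0):
        """Gradient ascent on |kurt(w^T z)| with ||w|| = 1."""
        rng = np.random.default_rng(seed)
        w = rng.standard_normal(Z.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = Z @ w
            k = np.mean(y**4) - 3 * np.mean(y**2) ** 2                  # kurt(w^T z)
            grad = 4 * (np.mean(Z * (y**3)[:, None], axis=0) - 3 * w)   # gradient for ||w|| = 1
            w += lr * np.sign(k) * grad                                 # increase |kurt|
            w /= np.linalg.norm(w)                                      # project back to ||w|| = 1
        return w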

17
FastICA Algorithm
  • Gradient descent has the usual problems: choice of
    learning rate, slow convergence
  • FastICA is a fixed-point algorithm
  • Main idea
  • At convergence, the gradient must point in the
    direction of w itself (or -w)
  • Use the fixed-point update w <- E[z (w^T z)^3] - 3w
  • Normalize w to a unit vector after each step (see
    the sketch below)
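
A sketch of the one-unit, kurtosis-based fixed-point iteration for whitened
data Z, assuming NumPy; the tolerance and iteration cap are illustrative:

    import numpy as np

    def fastica_one_unit(Z, max_iter=200, tol=1e-6, seed=0):
        """One-unit FastICA (kurtosis nonlinearity) for whitened Z (samples x dims)."""
        rng = np.random.default_rng(seed)
        w = rng.standard_normal(Z.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            y = Z @ w
            w_new = np.mean(Z * (y**3)[:, None], axis=0) - 3 * w  # w <- E[z (w^T z)^3] - 3w
            w_new /= np.linalg.norm(w_new)                        # normalize after each step
            if abs(abs(w_new @ w) - 1) < tol:                     # w and -w are equivalent
                return w_new
            w = w_new
        return w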

18
Non-Gaussianity using Negentropy
  • Kurtosis is sensitive to outliers; not a robust
    measure
  • Negentropy is based on information-theoretic
    concepts
  • Underlying principle
  • A Gaussian r.v. has the largest entropy among all
    r.v.s of equal variance, so a Gaussian r.v. is the
    most random
  • Negentropy J(Y) = H(Ygauss) - H(Y), where Ygauss
    is a Gaussian r.v. with the same variance as Y
  • Negentropy >= 0; it is in a sense the optimal
    estimator of non-Gaussianity
  • Computationally difficult: requires an estimate of
    the pdf, so approximate it using higher-order
    statistics

19
Negentropy
  • A classical moment-based approximation is
    J(y) ≈ (1/12) E[y^3]^2 + (1/48) kurt(y)^2
  • This is the same as the square of the kurtosis if
    the first term is zero (e.g. for an r.v. with a
    symmetric pdf)
  • Suffers from the same problems as kurtosis
  • Generalize the higher-order cumulant information:
    replace the polynomial functions y^3 and y^4 by
    some other functions
  • J(y) ≈ [E[G(y)] - E[G(v)]]^2, where v is a
    Gaussian r.v. with zero mean and unit variance and
    G is a non-quadratic function
  • Useful choices: G(u) = (1/a) log cosh(a u) with
    1 <= a <= 2, and G(u) = -exp(-u^2/2) (see the
    sketch below)
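
A minimal sketch of this one-term approximation with G(u) = log cosh(u),
assuming NumPy; the proportionality constant is dropped and E[G(v)] is
estimated by Monte Carlo here:

    import numpy as np

    def negentropy_approx(y, n_ref=1_000_000, seed=0):
        """J(y) ~ (E[G(y)] - E[G(v)])^2 with G(u) = log cosh(u), v ~ N(0, 1)."""
        y = (y - y.mean()) / y.std()                     # standardize first
        v = np.random.default_rng(seed).standard_normal(n_ref)
        G = lambda u: np.log(np.cosh(u))
        return (np.mean(G(y)) - np.mean(G(v))) ** 2

    rng = np.random.default_rng(1)
    print(negentropy_approx(rng.standard_normal(50_000)))  # ~ 0 for a Gaussian
    print(negentropy_approx(rng.laplace(size=50_000)))     # > 0 for a non-Gaussian r.v.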

20
ICA using Maximum Likelihood (ML) Estimation
  • Express the likelihood as a function of the
    parameters of the model, i.e. the elements of the
    mixing matrix (see the formula below)
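
A sketch of the log-likelihood in the standard formulation of the cited
Hyvärinen et al. text; the notation B = A^{-1} for the unmixing matrix with
rows b_i, p_i for the component densities, and x(1), ..., x(T) for the
observations is assumed here:

    log L(B) = sum_{t=1..T} sum_{i=1..n} log p_i(b_i^T x(t)) + T log |det B|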

21
ML estimation
  • But the likelihood is also a function of the
    densities of the independent components; hence
    this is semi-parametric estimation
  • Estimation is easier
  • if prior information on the densities is
    available, so that the likelihood is a function of
    the mixing parameters only
  • or if the densities can be approximated by a
    family of densities specified by a limited number
    of parameters

22
ICA by Minimizing Mutual Information
  • In many cases we cannot assume that the data
    follows the ICA model
  • This approach does not assume anything about the
    data
  • ICA is viewed as a linear decomposition that
    minimizes a dependence measure among the
    components, i.e. finding maximally independent
    components
  • Mutual information >= 0, and is zero if and only
    if the variables are statistically independent
    (see the decomposition below)
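
One standard way to write the quantity being minimized; the notation y = Bx
for the vector of estimated components is an assumption here, following the
cited Hyvärinen et al. text:

    I(y_1, ..., y_n) = sum_i H(y_i) - H(y),  with  H(y) = H(x) + log |det B|

If x has been whitened and the estimates y are constrained to be
uncorrelated with unit variance, B is orthogonal and log |det B| = 0, so
minimizing I amounts to minimizing the sum of the marginal entropies H(y_i),
i.e. maximizing the sum of non-Gaussianities (as the next slide states).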

23
Mutual Information
  • Minimization of mutual information is equivalent
    to maximizing the sum of the non-Gaussianities of
    the estimates of the independent components
  • But when maximizing the sum of non-Gaussianities,
    the estimates are constrained to be uncorrelated

24
ICA by Non-Linear Decorrelation
  • Independent components can be found as nonlinearly
    uncorrelated linear combinations
  • The nonlinear correlation is defined as
    E[f(Y1) g(Y2)], where f and g are two functions
    with at least one of them being nonlinear
  • Y1 and Y2 are independent if and only if
    E[f(Y1) g(Y2)] = E[f(Y1)] E[g(Y2)] for all
    (bounded continuous) functions f and g
  • Assume Y1 and Y2 are nonlinearly decorrelated,
    i.e. E[f(Y1) g(Y2)] = 0
  • A sufficient condition for this is that Y1 and Y2
    are independent and, for one of them, the
    nonlinearity is an odd function such that f(Y1)
    has zero mean (see the check below)
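
A small numeric check of this condition, assuming NumPy; f = tanh is odd and
Y1 is symmetric here, so E[f(Y1)] = 0:

    import numpy as np

    rng = np.random.default_rng(0)
    y1 = rng.laplace(size=200_000)       # symmetric, so E[tanh(y1)] = 0
    y2 = rng.uniform(-1, 1, 200_000)     # independent of y1

    f, g = np.tanh, lambda u: u**3       # f is odd and nonlinear

    print(np.mean(f(y1) * g(y2)))        # ~ 0: nonlinearly decorrelated
    print(np.mean(f(y1) * g(y1 + y2)))   # clearly nonzero for a mixture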

25
Applications
  • Feature extraction
  • Taking windows from signals and treating them as
    multi-dimensional signals
  • Medical applications
  • Removing artifacts (e.g. due to muscle activity)
    from electroencephalography (EEG) and MEG data in
    brain imaging
  • Removal of artifacts from cardiographic signals
  • Telecommunications: the CDMA signal model can be
    cast in the form of an ICA model
  • Econometrics: finding hidden factors in financial
    data

26
Conclusions
  • A general-purpose technique
  • Formulated as the estimation of a generative model
  • The problem can be simplified by whitening the
    data
  • Estimation techniques include ML estimation,
    non-Gaussianity maximization, and minimization of
    mutual information
  • Can be applied in diverse fields

27
References
  • A. Hyvärinen, J. Karhunen, E. Oja, Independent
    Component Analysis, Wiley Interscience, 2001
  • A. Hyvärinen, E. Oja, Independent Component
    Analysis: A Tutorial, April 1999
    (http://www.cis.hut.fi/projects/ica/)

28
Thank You