Transcript and Presenter's Notes

Title: Independent Component Analysis


1
  • Independent Component Analysis
  • PART I
  • CS679 Lecture Note
  • by Gil-Jin Jang
  • Computer Science Department
  • KAIST

2
REFERENCES
  • A.J. Bell & T.J. Sejnowski. 1995. An Information-Maximization Approach to
    Blind Separation and Blind Deconvolution. Neural Computation 7, sections 1-4.
  • P. Comon. 1994. Independent Component Analysis, a New Concept? Signal
    Processing 36, pp. 287-314.
  • K. Pope & R. Bogner. 1996. Blind Signal Separation: Linear, Instantaneous
    Combinations. Digital Signal Processing, pp. 5-16.

3
ABSTRACT
  • New Self-Organizing Learning Algorithm
    • requires no knowledge of the input distributions
    • maximizes the information in the output of the neuron
    • the input is passed through non-linear units (sigmoid function, etc.)
  • Extra Properties of the Non-Linear Transfer Function
    • picks up higher-order moments of the input distributions
    • redundancy reduction between outputs
    • separates statistically independent components
    • a higher-order generalization of PCA
  • Simulations
    • blind separation and blind deconvolution
    • time-delayed source separation

4
(sec. 3) Background Terminology
  • BSS (blind source separation): redundancy reduction
    • linearly mixed signals: ICA / blind separation; whitening a signal / blind deconvolution
    • nonlinearly mixed signals: general mixing structures for multiple signals, intractable so far
  • PCA: the Karhunen-Loève transform (Duda & Hart)
5
Blind Separation
  • the 'cocktail-party' problem, with no delays
  • Problem: a linear combination of the sources by a matrix A
  • Solution: a square matrix W = PDA^-1
    • P a permutation matrix, D a diagonal matrix
  • found by minimizing the mutual information between the output units

x = As,  u = Wx = PDA^-1 x = PDs
x: observed signals; s: source signals; u: estimated independent components
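
A minimal numpy sketch of the model above (the sources and the mixing matrix are made-up toy values, not the paper's data), showing that any separating matrix of the form W = PDA^-1 returns the sources up to permutation and scaling:

```python
import numpy as np

# Toy instantaneous mixing: two super-Gaussian sources mixed by a square A.
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))          # source signals s
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])               # unknown mixing matrix A
x = A @ s                                # observed signals, x = A s

# If W = P D A^-1 (P a permutation, D a diagonal matrix), then
# u = W x = P D s: the sources, recovered up to ordering and scaling.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])               # swap the two outputs
D = np.diag([2.0, -0.5])                 # arbitrary scaling and sign
W = P @ D @ np.linalg.inv(A)
u = W @ x
print(np.allclose(u, P @ D @ s))         # True: separated up to P and D
```

The difficulty of blind separation is, of course, finding such a W from x alone; the learning rules on the later slides do this without knowledge of A.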
6
Blind Deconvolution
  • Problem: a single signal, corrupted by an unknown filter
    • a_1, a_2, ..., a_K: a K-th order causal filter
    • the "other sources" are time-delayed versions of the signal itself
  • Solution: a reverse filter w_1, w_2, ..., w_L
    • removes statistical dependencies across time

x(t): observed, corrupted signal; s(t): unknown source signal; u(t): recovered signal
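
A minimal sketch of this setup (the source, the corrupting filter, and the reverse filter are illustrative assumptions; here the reverse filter is simply a truncated inverse rather than a learned one):

```python
import numpy as np

# Toy blind-deconvolution setup: a source s(t) is corrupted by an unknown
# causal filter a, and a causal reverse filter w is applied to the
# observation x(t) to produce the recovered signal u(t).
rng = np.random.default_rng(1)
s = rng.laplace(size=2000)                   # unknown source signal s(t)
a = np.array([1.0, 0.7, 0.3])                # unknown causal filter a_1..a_K
x = np.convolve(s, a)[:len(s)]               # observed, corrupted signal x(t)

w = np.array([1.0, -0.7, 0.19])              # truncated inverse of a, as w_1..w_L
u = np.convolve(x, w)[:len(s)]               # recovered signal u(t)
print(np.corrcoef(s, u)[0, 1])               # close to 1: u(t) tracks s(t)
```

In the blind setting w must be learned from x(t) alone; slide 14 shows how the problem reduces to blind separation.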
7
1. Introduction
  • Information-theoretic unsupervised learning rules, applied to neural networks with non-linear units:
    • a single unit
    • an N×N mapping
    • a causal filter (blind deconvolution)
    • a time-delayed system
  • Flexible non-linearity: selection of the activation function

8
(sec. 4) Reducing Statistical Dependence via Information Maximization
  • goal: statistically independent outputs
  • uses an information-theoretic approach
  • purpose: maximize the sum of the individual entropies and minimize the mutual information
  • practically, maximizing the joint entropy of the outputs also minimizes the mutual information between them
  • for super-Gaussian input signals (e.g. speech):
    • maximizing the joint entropy in sigmoidal networks
    • minimizes the M.I. between the outputs (experimental results)

9
2. Information Maximization
  • maximize the mutual information between the input X and the output Y of a neural network:
    I(Y; X) = H(Y) - H(Y|X)

G: an invertible (deterministic) transformation; W: the network weights; g: the activation function (sigmoid); N: noise
H(Y): the differential entropy of the output Y; H(Y|X): the entropy of the output that did not come from the input X
gradient ascent rule with respect to the weights: ΔW ∝ ∂H(Y)/∂W
10
M.I. Minimization
  • basis: a stochastic gradient ascent rule with respect to W
  • because G is an invertible (deterministic) transformation, H(Y|X) = H(N) does not depend on W, so maximizing H(Y) also maximizes I(Y; X)
11
CASE 1: 1 Input and 1 Output
  • example: a sigmoidal transfer function
  • stochastic gradient ascent learning rule

Δw0 rule: centers the steepest part of the sigmoid on the peak of f(x), yielding the most informative bias
Δw rule: scales the slope of the sigmoid to match the variance of f(x), yielding the most informative weight; a narrow pdf calls for a sharply-sloping sigmoid
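
For a logistic unit y = 1/(1 + e^-(wx + w0)), these updates work out to Δw ∝ 1/w + x(1 - 2y) and Δw0 ∝ 1 - 2y. A minimal sketch on made-up one-dimensional data (the input density and learning rate are assumptions):

```python
import numpy as np

# Single-unit infomax sketch: the bias centers the sigmoid on the peak of
# f(x) and the weight scales its slope to the spread of f(x).
rng = np.random.default_rng(2)
x_data = 2.0 + 0.5 * rng.standard_normal(10000)    # 1-D input density f(x)

w, w0, lr = 0.1, 0.0, 0.01
for x in x_data:
    y = 1.0 / (1.0 + np.exp(-(w * x + w0)))        # sigmoidal output
    w += lr * (1.0 / w + x * (1.0 - 2.0 * y))      # delta-w rule
    w0 += lr * (1.0 - 2.0 * y)                     # delta-w0 rule

# -w0/w ends up near the peak of f(x) (here 2.0); 1/w roughly tracks its spread.
print(round(w, 2), round(w0, 2), round(-w0 / w, 2))
```

The 1/w term keeps the slope from collapsing to zero, while the anti-Hebbian term x(1 - 2y) keeps it from growing without bound; together they produce the "most informative" bias and weight described above.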
12
1 Input and 1 Output
  • Infomax principle (Laughlin 1981)
    • match a neuron's output function to the input distribution
    • inputs are passed through a sigmoid function
    • maximum information transmission: the high-density part of the pdf f(x) is lined up with the sloping part of the sigmoid g(x)
    • f_y(y) is then close to the flat (uniform) distribution, the maximum-entropy distribution for a variable bounded in (0, 1)

w0: centers the sigmoid on the peak of the distribution; w_opt: the scale that yields a flat output distribution
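
A quick numerical check of this principle (the input density is chosen here so that the matched sigmoid has a closed form): when the squashing function equals the input's cumulative distribution, the output density is flat on (0, 1).

```python
import numpy as np

# Inputs drawn from a standard logistic density, whose CDF is exactly the
# logistic sigmoid g(x) = 1 / (1 + exp(-x)).  Passing the inputs through
# their own CDF gives a uniform, maximum-entropy output on (0, 1).
rng = np.random.default_rng(5)
x = rng.logistic(size=100_000)                 # input with density f(x)
y = 1.0 / (1.0 + np.exp(-x))                   # g matched to the CDF of f(x)

hist, _ = np.histogram(y, bins=10, range=(0.0, 1.0), density=True)
print(np.round(hist, 2))                       # every bin near 1.0: flat f_y(y)
```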
13
CASE 2: N×N Network
  • an expansion of the 1-to-1 unit mapping
  • multi-dimensional learning rule
  • refer to the paper for the detailed derivation of the learning rule; a sketch of the resulting update follows below
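
For logistic units the multi-dimensional rule takes the form ΔW ∝ (W^T)^-1 + (1 - 2y)x^T. A minimal sketch on a toy two-source mixture (the mixing matrix, learning rate, and number of samples are assumptions; a real run would typically make several passes over the data):

```python
import numpy as np

# N x N infomax sketch:  Delta W ∝ (W^T)^-1 + (1 - 2y) x^T  with logistic units.
rng = np.random.default_rng(3)
N, T, lr = 2, 30000, 0.002
S = rng.laplace(size=(N, T))                   # super-Gaussian sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                     # unknown mixing matrix
X = A @ S                                      # observed mixtures, x = A s

W = np.eye(N)
for t in range(T):
    x = X[:, t:t + 1]                          # one sample, shape (N, 1)
    y = 1.0 / (1.0 + np.exp(-W @ x))           # sigmoidal outputs
    W += lr * (np.linalg.inv(W.T) + (1.0 - 2.0 * y) @ x.T)

print(np.round(W @ A, 2))   # ideally close to a scaled permutation matrix PD
```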

14
CASE 3: A Causal Filter (Blind Deconvolution)
  • assume the single output signal is statistically dependent on itself across time
  • transform the problem into the blind-separation domain

x(t): a time series of length M
w(t): a causal filter of length L (L < M), with taps w_1, w_2, ..., w_L
u(t): the output time series
X, Y, U: the corresponding vectors; W: an M×M lower-triangular matrix
a special case of blind separation
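
A small sketch of this reformulation (the sizes and tap values are made up): the causal filter becomes a banded lower-triangular M×M matrix whose diagonal holds the leading weight w_L, so that filtering is just the matrix product U = WX.

```python
import numpy as np

# Build the M x M lower-triangular (banded Toeplitz) matrix W for a causal
# filter w_1..w_L and check it reproduces ordinary causal convolution.
M, L = 6, 3
w = np.array([0.2, -0.5, 1.0])            # w_1, w_2, w_3; w_3 = w_L is the leading weight
W = np.zeros((M, M))
for i in range(M):
    for j in range(max(0, i - L + 1), i + 1):
        W[i, j] = w[L - 1 - (i - j)]      # delay i - j gets tap w_{L-(i-j)}
x = np.arange(1.0, M + 1)                 # a time series of length M
u = W @ x                                 # u(t) via the matrix formulation
print(np.allclose(u, np.convolve(x, w[::-1])[:M]))   # True
```

Because the diagonal of W is the constant w_L, its determinant is w_L^M, which is why the leading weight plays a special role in the learning rules on the next slide.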
15
Blind Deconvolution
  • learning rules when g is tanh(·)

w_L: the leading weight, playing the same role as the weight of a single unit
w_{L-j}: multiplies the delay line from x_{t-j} to y_t; its update decorrelates the past input from the present output
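
Consistent with that description, the per-sample tanh updates take the form Δw_L ∝ 1/w_L - 2·x_t·y_t for the leading weight and Δw_{L-j} ∝ -2·x_{t-j}·y_t for the delayed taps. A minimal sketch (the signal, corrupting filter, and step size are assumptions):

```python
import numpy as np

# Blind-deconvolution sketch with tanh units: the leading weight gets the
# single-unit 1/w term, the delayed taps get purely anti-Hebbian terms.
rng = np.random.default_rng(4)
s = rng.laplace(size=20000)
a = np.array([1.0, 0.7, 0.3])                  # unknown corrupting filter
x = np.convolve(s, a)[:len(s)]                 # observed signal x(t)

L, lr = 3, 1e-4
w = np.zeros(L)
w[-1] = 1.0                                    # w[-1] is the leading weight w_L
for t in range(L - 1, len(x)):
    window = x[t - L + 1:t + 1]                # x_{t-L+1}, ..., x_t
    y = np.tanh(np.dot(w, window))             # output y_t
    grad = -2.0 * y * window                   # anti-Hebbian term for every tap
    grad[-1] += 1.0 / w[-1]                    # extra 1/w_L term for the leading weight
    w += lr * grad

print(np.round(w, 2))   # roughly a whitening (inverse) filter for a
```

The anti-Hebbian terms drive E[x_{t-j} y_t] toward zero, i.e. they decorrelate past inputs from the present output, which whitens u(t).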
16
CASE 4: Weights with Time Delays
  • assume the signal is dependent on its own time-delayed version
  • learning rule for the delay d when g is tanh(·)
  • example: if y receives a mixture of sinusoids of the same frequency but different phases, d is adjusted until the same-frequency sinusoids have the same phase
  • applications: removing echo or reverberation

17
CASE 5: A Generalized Sigmoid Function
  • selection of the non-linear function g
    • the best choice is the cumulative pdf of the network's input u
  • a flexible, asymmetric generalized logistic sigmoid, defined by a differential equation in p and r

p, r > 1: very peaked (super-Gaussian); p, r < 1: flat, uniform-like (sub-Gaussian); p ≠ r: skewed distributions
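
One common way to write this flexible sigmoid is through the differential equation dy/du = y^p (1 - y)^r, which reduces to the ordinary logistic when p = r = 1. A small sketch that integrates it numerically (the Euler scheme and the parameter values are illustrative assumptions):

```python
import numpy as np

def generalized_sigmoid(u_values, p=1.0, r=1.0, du=1e-3):
    """Euler-integrate dy/du = y**p * (1 - y)**r with y(0) = 0.5.

    For brevity only u >= 0 is handled; negative u is symmetric when p = r.
    """
    u_values = np.asarray(u_values, dtype=float)
    grid = np.arange(0.0, u_values.max() + du, du)
    y = np.empty_like(grid)
    y[0] = 0.5
    for i in range(1, len(grid)):
        y[i] = y[i - 1] + du * y[i - 1] ** p * (1.0 - y[i - 1]) ** r
    return np.interp(u_values, grid, y)

u = np.array([0.0, 1.0, 2.0, 4.0])
# p = r = 1 reproduces the logistic 1 / (1 + exp(-u)) ~ [0.5, 0.73, 0.88, 0.98].
print(np.round(generalized_sigmoid(u), 2))
# p = r = 3 rises far more slowly and has heavy tails, matching a peaked,
# super-Gaussian input density (per the slide); p != r gives a skewed match.
print(np.round(generalized_sigmoid(u, p=3.0, r=3.0), 2))
```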
18
(sec. 4) Real-World Considerations
  • the learning rules are altered by incorporating (p, r)
  • example 1: for a single unit
  • example 2: for an N×N network

19
SUMMARY
  • Self-Organizing Learning Algorithm
    • based on the infomax principle
    • separates statistically independent components
    • blind separation, blind deconvolution, time-delayed signals
    • a higher-order generalization of PCA
  • Non-Linear Transfer Function
    • picks up higher-order moments
    • selected to best match the network's output distributions

20
Appendix (sec. 3): Higher-Order Statistics
  • 2nd-order decorrelation (Barlow & Földiák)
    • finds uncorrelated, linearly independent projections
    • blind separation: PCA, unsuitable for an asymmetric mixing matrix A
    • blind deconvolution: autocorrelation, amplitude only (phase-blind)
  • Higher-order statistics
    • minimize the M.I. (mutual information) between the outputs
    • the M.I. involves higher-order statistics: cumulants of all orders
    • explicit estimation of these cumulants requires intensive computation
    • static non-linear functions pick them up implicitly