Title: Machine learning techniques for quantifying neural synchrony: application to the diagnosis of Alzheimer's disease from EEG
1Machine learning techniques for quantifying neural synchrony: application to the diagnosis of Alzheimer's disease from EEG
- Justin Dauwels
- LIDS, MIT
- Amari Research Unit, Brain Science Institute, RIKEN
- June 9, 2008
2RIKEN Brain Science Institute
- RIKEN Wako Campus (near Tokyo)
- about 400 researchers and staff (20 foreign)
- 300 research fellows and visiting scientists
- about 60 laboratories
- research covers most aspects of brain science
Collaborators: François Vialatte, Theo Weber, Shun-ichi Amari, Andrzej Cichocki (RIKEN, MIT)
Project: Early diagnosis of Alzheimer's disease based on EEG
Financial Support
3Research Overview
Machine learning and signal processing for applications in NEUROSCIENCE
development of ALGORITHMS to analyze brain signals
- EEG (RIKEN, MIT, MGH)
- diagnosis of Alzheimer's disease (subject of this talk)
- detection/prediction of epileptic seizures
- analysis of EEG evoked by visual/auditory stimuli
- EEG during meditation
- projects related to brain-computer interface (BMI)
- Calcium imaging (RIKEN, NAIST, MIT)
- effect of calcium on neural growth
- role of calcium propagation in glial cells and neurons
4Overview
- Alzheimer's Disease (AD)
- EEG of AD patients: decrease in synchrony
- Synchrony measure in time-frequency domain
- Pairs of EEG signals
- Collections of EEG signals
- Numerical Results
- Outlook
5Alzheimer's disease
Outside glimpse: clinical perspective
One disease, many symptoms: memory, language, executive functions, apraxia, apathy, agnosia, etc.
Evolution of the disease (stages)
EEG data
- 2 to 5 years before
- mild cognitive impairment (often unnoticed)
- 6 to 25% progress to Alzheimer's per year
- Mild (early stage)
- becomes less energetic or spontaneous
- noticeable cognitive deficits
- still independent (able to compensate)
Memory (forgetting relatives)
- Moderate (middle stage)
- mental abilities decline
- personality changes
- become dependent on caregivers
Apathy
- Severe (late stage)
- complete deterioration of the personality
- loss of control over bodily functions
- total dependence on caregivers
Loss of self-control
Prevalence
- 2 to 5% of people over 65 years old
- up to 20% of people over 80
- Jeong 2004 (Nature)
Video sources: Alzheimer society
6Alzheimer's disease
Inside glimpse: brain atrophy
amyloid plaques and neurofibrillary tangles
Video source: Alzheimer society
Images: Jannis Productions (R. Fredenburg, S. Jannis)
Video source: P. Thompson, J. Neuroscience, 2003
8Alzheimer's disease
Inside glimpse: abnormal EEG
EEG system: inexpensive, mobile, useful for screening
Brain "slow-down"
- more power in slow rhythms (0.5-8 Hz), less power in fast rhythms (8-30 Hz) (Babiloni et al., 2004; Besthorn et al., 1997; Jelic et al., 1996; Jeong, 2004; Dierks et al., 1993)
Decrease of synchrony (focus of this project)
- AD vs. MCI (Hogan et al., 2003; Jiang et al., 2005)
- AD vs. Control (Herrmann and Demiralp, 2005; Yagyu et al., 1997; Stam et al., 2002; Babiloni et al., 2006)
- MCI vs. mild AD (Babiloni et al., 2006)
Images: www.cerebromente.org.br
9Spontaneous (scalp) EEG
[Figure: EEG signal x(t) (amplitude vs. t in sec); its Fourier power $|X(f)|^2$; and its time-frequency map $|X(t,f)|^2$ (wavelet transform, f in Hz vs. t in sec), showing time-frequency patterns ("bumps")]
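A minimal sketch of how a time-frequency map $|X(t,f)|^2$ could be computed, assuming a complex Morlet wavelet implemented by direct convolution in NumPy. The function name `morlet_tf_map`, the parameter `n_cycles`, and the synthetic test burst are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def morlet_tf_map(x, fs, freqs, n_cycles=7.0):
    """Return |X(t,f)|^2, one row per frequency, using complex Morlet wavelets.

    Assumes the signal is longer than the longest wavelet.
    """
    x = np.asarray(x, dtype=float)
    power = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2.0 * np.pi * f)            # temporal width of the wavelet
        t = np.arange(-4 * sigma_t, 4 * sigma_t, 1.0 / fs)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))   # unit energy
        coeffs = np.convolve(x, wavelet, mode="same")      # same length as x
        power[i] = np.abs(coeffs) ** 2
    return power

# Synthetic example (not EEG data): a 10 Hz burst centred at t = 1.5 s.
fs = 100.0
t = np.arange(0, 3, 1 / fs)                                # 3 s at 100 Hz = 300 samples
x = np.sin(2 * np.pi * 10 * t) * np.exp(-((t - 1.5) ** 2) / 0.1)
freqs = np.arange(4, 31)                                   # 4 to 30 Hz
tf = morlet_tf_map(x, fs, freqs)
```

In this toy example the burst shows up as a single localized blob in `tf`, which is the kind of pattern the slides call a bump.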
10Fourier transform
[Figure: a signal decomposed into three sinusoidal components (1, 2, 3), ordered along the frequency axis from low frequency to high frequency]
11Windowed Fourier transform
[Figure: Fourier basis functions multiplied by a window function give windowed basis functions; the Windowed Fourier Transform is shown in the (t, f) plane]
13Signatures of local synchrony
[Figure: time-frequency map, f (Hz) vs. t (sec), with time-frequency patterns ("bumps")]
EEG stems from thousands of neurons: a bump appears if neurons are phase-locked (= local synchrony)
16Comparing EEG signal rhythms?
2 signals
PROBLEM I: Signals of 3 seconds sampled at 100 Hz (= 300 samples). Time-frequency representation of one signal: about 25 000 coefficients
17Comparing EEG signal rhythms? (2)
PROBLEM II: Shifts in time-frequency!
18Sparse representation bump model
[Figure: time-frequency map, f (Hz) vs. t (sec), with $10^4$ to $10^5$ coefficients, reduced to a bump model, f (Hz) vs. t (sec), with about $10^2$ parameters]
Bumps: sparse representation
- Assumptions
- time-frequency map is suitable representation
- oscillatory bursts ("bumps") convey key information
Normalization
F. Vialatte et al., "A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics", Neural Networks (2007).
19Similarity of bump models...
How similar or "synchronous" are two bump models? = GLOBAL synchrony
Reminder: bumps are due to LOCAL synchrony
= MULTI-SCALE approach
20... by matching bumps
[Figure: two bump models y1 and y2 overlaid]
- Some bumps match
- Offset between matched bumps
SIMILAR bump models if
- Many matches
- Strongly overlapping matches
21... by matching bumps (2)
- Bumps in one model, but NOT in other
- → fraction of "spurious" bumps $\rho_{\mathrm{spur}}$
- Bumps in both models, but with offset
- → Average time offset $\delta_t$ (delay)
- → Timing jitter with variance $s_t$
- → Average frequency offset $\delta_f$
- → Frequency jitter with variance $s_f$
- Synchrony: only $s_t$ and $\rho_{\mathrm{spur}}$ relevant
Stochastic Event Synchrony (SES) = ($\rho_{\mathrm{spur}}$, $\delta_t$, $s_t$, $\delta_f$, $s_f$)
PROBLEM: Given two bump models, compute ($\rho_{\mathrm{spur}}$, $\delta_t$, $s_t$, $\delta_f$, $s_f$)
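To make the five SES parameters concrete, here is a small sketch that computes them once a matching between the bumps of two models is already known. It assumes that the fraction of spurious bumps is the number of unmatched bumps divided by the total number of bumps in both models. The function name `ses_from_matching` and the toy bump coordinates are hypothetical.

```python
import numpy as np

def ses_from_matching(t1, f1, t2, f2, pairs):
    """SES parameters for a given, non-empty list of matched index pairs (k, k')."""
    k, kp = np.array(pairs).T
    dt = t2[kp] - t1[k]                      # time offsets of matched bumps
    df = f2[kp] - f1[k]                      # frequency offsets of matched bumps
    n_total = len(t1) + len(t2)
    return {
        "rho_spur": (n_total - 2 * len(pairs)) / n_total,  # assumed definition
        "delta_t": dt.mean(), "s_t": dt.var(),
        "delta_f": df.mean(), "s_f": df.var(),
    }

# Toy bump centres (seconds, Hz); values are made up for illustration.
t1 = np.array([0.5, 1.0, 2.0]);      f1 = np.array([10.0, 12.0, 8.0])
t2 = np.array([0.6, 1.2, 2.5, 2.9]); f2 = np.array([10.5, 11.5, 20.0, 25.0])
params = ses_from_matching(t1, f1, t2, f2, pairs=[(0, 0), (1, 1)])
```

Finding the matching itself is the hard part of the problem; this sketch only covers the bookkeeping once the pairs are fixed.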
23Average synchrony
1. Group electrodes in regions
2. Bump model for each region
3. SES for each pair of models
4. Average the SES parameters
24Beyond pairwise interactions...
Multi-variate similarity
Pairwise similarity
25...by clustering
HARD combinatorial problem!
[Figure: bumps of five models y1, ..., y5, before and after grouping into clusters]
- Models similar if
- few deletions/large clusters
- little jitter
Constraint: in each cluster, at most one bump from each signal
27EEG Data
- EEG of 22 Mild Cognitive Impairment (MCI) patients and 38 age-matched control subjects (CTR), recorded while at rest with eyes closed → spontaneous EEG
- All 22 MCI patients suffered from Alzheimer's disease (AD) later on
- Electrodes located on 21 sites according to 10-20 international system
- Electrodes grouped into 5 zones (reduces number of pairs)
- 1 bump model per zone
- Used continuous "artifact-free" intervals of 20 s
- Band-pass filtered between 4 and 30 Hz
EEG data provided by Prof. T. Musha
28Similarity measures
- Correlation and coherence
- Granger causality (linear system): DTF, ffDTF, dDTF, PDC, PC, ...
- Phase synchrony: compare instantaneous phases (wavelet/Hilbert transform)
- State space based measures: sync likelihood, S-estimator, S-H-N-indices, ...
- Information-theoretic measures
[Figure: TIME and FREQUENCY panels contrasting "No Phase Locking" with "Phase Locking"]
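As an illustration of the phase-synchrony family, the sketch below compares instantaneous phases obtained with the Hilbert transform and summarizes them as a phase locking value (PLV). The PLV is one common choice of phase index and is assumed here; the talk does not say which index was used. The signals are synthetic sinusoids, not EEG.

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    """|mean(exp(i*(phase_x - phase_y)))|: 1 = constant phase lag, 0 = no locking."""
    phase_x = np.angle(hilbert(x))           # instantaneous phase of x
    phase_y = np.angle(hilbert(y))           # instantaneous phase of y
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

fs = 100.0
t = np.arange(0, 20, 1 / fs)                 # 20 s, as in the artifact-free intervals
x = np.sin(2 * np.pi * 10 * t)
y_locked = np.sin(2 * np.pi * 10 * t + 0.8)  # same frequency, fixed phase lag
y_free = np.sin(2 * np.pi * 13 * t)          # different frequency, drifting phase

plv_locked = phase_locking_value(x, y_locked)
plv_free = phase_locking_value(x, y_free)
```

For real EEG the signals would first be restricted to a narrow band, since the instantaneous phase of a broadband signal is hard to interpret.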
30Sensitivity (average synchrony)
[Table: sensitivity of each measure, grouped by family: Corr/Coh, Granger, Info. Theor., State Space, Phase, SES]
Mann-Whitney test: a small p-value suggests a large difference in the statistics of both groups
Significant differences for ffDTF and $\rho_{\mathrm{spur}}$!
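A sketch of the group comparison described above, using `scipy.stats.mannwhitneyu`. The group sizes (22 MCI, 38 CTR) follow the data description in the talk, but the values are random numbers generated purely for illustration and do not reproduce any result from the study.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Hypothetical synchrony values for one measure; NOT data from the study.
ctr = rng.normal(loc=0.30, scale=0.05, size=38)   # control subjects
mci = rng.normal(loc=0.40, scale=0.05, size=22)   # MCI patients

# Two-sided test of whether the two groups differ in distribution.
stat, p_value = mannwhitneyu(mci, ctr, alternative="two-sided")
```

When many measures are tested on the same subjects, as in the table above, the p-values would normally be read with a multiple-comparison correction in mind.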
31Classification
[Figure: classification plot with ffDTF on one axis]
- Clear separation, but not yet useful as diagnostic tool
- Additional indicators needed (fMRI, MEG, DTI, ...)
- Can be used for screening population (inexpensive, simple, fast)
32Correlations
Strong (anti-)correlations → "families" of sync measures
34Ongoing work
- Time-varying similarity parameters, e.g. $s_t$
[Figure: two examples of the sequence no stimulus, stimulus, no stimulus, with high $s_t$, low $s_t$, high $s_t$ respectively]
35Future work
- Matching event patterns instead of single events
- allows us to extract patterns in time-frequency map of EEG!
- HYPOTHESIS
- Perhaps specific patterns occur in time-frequency EEG maps
- of AD patients
- before onset of epileptic seizures
- REMARK
- coupling between frequency bands
[Figure: time-frequency map, f (Hz) vs. t (sec)]
36Conclusions
- Measure for similarity of point processes ("stochastic event synchrony")
- Key idea: alignment of events
- Solved by statistical inference
- Application: EEG synchrony of MCI patients
- About 85% correctly classified; perhaps useful for screening population
- Ongoing/future work: time-varying SES, extracting patterns of bumps
37References and software
- References
- Quantifying Statistical Interdependence by Message Passing on Graphs: Algorithms and Application to Neural Signals, Neural Computation (under revision)
- A Comparative Study of Synchrony Measures for the Early Diagnosis of Alzheimer's Disease Based on EEG, NeuroImage (under revision)
- Measuring Neural Synchrony by Message Passing, NIPS 2007
- Quantifying the Similarity of Multiple Multi-Dimensional Point Processes by Integer Programming with Application to Early Diagnosis of Alzheimer's Disease from EEG, EMBC 2008 (submitted)
- Software
- MATLAB implementation of the synchrony measures
39Machine learning for neuroscience
- Multi-scale in time and space
- Data fusion: EEG, fMRI, spike data, bio-imaging, ...
- Large-scale inference
- Visualization
Behavior → Brain → Brain Regions → Neural Assemblies → Single neurons → Synapses → Ion channels
40Estimation
Simple closed-form expressions
- Deltas: average offset
- Sigmas: variance of offset
- artificial observations (conjugate prior)
...where
41Large-scale synchrony
Apparently, all brain regions affected...
42Alzheimer's disease
Outside glimpse: the future (prevalence)
- 2 to 5% of people over 65 years old
- Up to 20% of people over 80
- Jeong 2004 (Nature)
[Figure: millions of sufferers, USA (Hebert et al. 2003)]
[Figure: millions of sufferers, World (Wimo et al. 2003)]
43Ongoing and future work
Applications
- Fluctuations of EEG synchrony
- Caused by auditory stimuli and music (T. Rutkowski)
- Caused by visual stimuli (F. Vialatte)
- Yoga professionals (F. Vialatte)
- Professional shogi players (RIKEN, Fujitsu)
- Brain-Computer Interfaces (T. Rutkowski)
- Spike data from interacting monkeys (N. Fujii)
- Calcium propagation in glial cells (N. Nakata)
- Neural growth (Y. Tsukada, Y. Sakumura)
- ...
Algorithms
- alternative inference techniques (e.g., MCMC, linear programming)
- time dependent (Gaussian processes)
- multivariate (T. Weber)
44Fitting bump models
[Figure: Signal and its fitted bump model; fitting by gradient method]
F. Vialatte et al., "A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics", Neural Networks (2007).
45Boxplots
- SURPRISE!
- No increase in jitter, but significantly less matched activity!
- Physiological interpretation
- neural assemblies more localized?
- harder to establish large-scale synchrony?
46Similarity of bump models...
How similar or synchronous are two bump
models?
47Probabilistic inference
POINT ESTIMATION: $\theta^{(i+1)} = \operatorname{argmax}_{\theta} \log p(y, y', c^{(i+1)}, \theta)$
- Uniform prior $p(\theta)$: $\delta_t$, $\delta_f$ = average offset; $s_t$, $s_f$ = variance of offset
- Conjugate prior $p(\theta)$: still closed-form expression
- Other kind of prior $p(\theta)$: numerical optimization (gradient method)
48Probabilistic inference
MATCHING: $c^{(i+1)} = \operatorname{argmax}_{c} \log p(y, y', c, \theta^{(i)})$
EQUIVALENT to (imperfect) bipartite max-weight matching problem:
$c^{(i+1)} = \operatorname{argmax}_{c} \log p(y, y', c, \theta^{(i)}) = \operatorname{argmax}_{c} \sum_{kk'} w_{kk'}^{(i)} c_{kk'}$
s.t. $\sum_{k'} c_{kk'} \le 1$, $\sum_{k} c_{kk'} \le 1$, and $c_{kk'} \in \{0, 1\}$
- find heaviest set of disjoint edges
- not necessarily perfect
- ALGORITHMS
- Polynomial-time algorithms give optimal solution(s) (Edmonds-Karp and auction algorithm)
- Linear programming relaxation: extreme points of LP polytope are integral
- Max-product algorithm gives optimal solution if unique [Bayati et al. (2005), Sanghavi (2007)]
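The imperfect matching step can be sketched with an off-the-shelf assignment solver. The version below uses `scipy.optimize.linear_sum_assignment`, which is a different polynomial-time solver from the Edmonds-Karp, auction, and max-product algorithms named on the slide. It allows bumps to stay unmatched by padding the weight matrix with zero-weight dummy nodes. The helper name `imperfect_matching` and the example weights are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def imperfect_matching(w):
    """Heaviest set of disjoint edges in a bipartite graph with weights w[k, k'].

    Nodes may stay unmatched: each side gets zero-weight dummy partners,
    so an edge is only selected if its weight is positive.
    """
    n, m = w.shape
    padded = np.zeros((n + m, n + m))
    padded[:n, :m] = w
    rows, cols = linear_sum_assignment(padded, maximize=True)
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if r < n and c < m and w[r, c] > 0]

# Made-up weights for 2 bumps in y and 3 bumps in y'.
w = np.array([[2.0, 1.5, -1.0],
              [1.8, -0.5, -2.0]])
pairs = imperfect_matching(w)          # third bump of y' stays unmatched
no_pairs = imperfect_matching(-np.ones((2, 2)))
```

Note that the greedy choice (0, 0) with weight 2.0 is not part of the optimum here, since pairing (0, 1) and (1, 0) gives the larger total of 3.3.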
49Max-product algorithm
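MATCHING: $c^{(i+1)} = \operatorname{argmax}_{c} \log p(y, y', c, \theta^{(i)})$
Generative model:
$p(y, y', c, \theta) \propto I(c)\, p_\theta(\theta) \prod_{kk'} \left( N(t'_{k'} - t_k;\, \delta_t, s_{t,kk'})\, N(f'_{k'} - f_k;\, \delta_f, s_{f,kk'})\, \beta^{-2} \right)^{c_{kk'}}$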
50Max-product algorithm
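MATCHING: $c^{(i+1)} = \operatorname{argmax}_{c} \log p(y, y', c, \theta^{(i)})$
Conditioning on $\theta$
[Figure: factor graph of the matching problem with max-product messages $\mu$ along its edges]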
51Max-product algorithm (2)
- Iteratively compute messages
- At convergence, compute marginals $p(c_{kk'})$ as the product of the three messages $\mu(c_{kk'})$ arriving at $c_{kk'}$
52Algorithm
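PROBLEM: Given two bump models, compute ($\rho_{\mathrm{spur}}$, $\delta_t$, $s_t$, $\delta_f$, $s_f$)
APPROACH: $(\hat{c}, \hat{\theta}) = \operatorname{argmax}_{c,\theta} \log p(y, y', c, \theta)$
SOLUTION: Coordinate descent
- MATCHING: $c^{(i+1)} = \operatorname{argmax}_{c} \log p(y, y', c, \theta^{(i)})$ → max-product
- ESTIMATION: $\theta^{(i+1)} = \operatorname{argmax}_{\theta} \log p(y, y', c^{(i+1)}, \theta)$ → closed-form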
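The alternation between matching and estimation can be sketched as follows. This is a simplified illustration under several assumptions: a uniform prior on the parameters (mean and variance of the offsets), weights of the form $-\left((\Delta t - \delta_t)^2/s_t + (\Delta f - \delta_f)^2/s_f\right) - 2\log\beta$, an assignment solver from SciPy in place of max-product, and a small floor on the variances to avoid division by zero. The function names, default values, and toy bumps are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(w):
    """Imperfect max-weight bipartite matching via zero-weight dummy nodes."""
    n, m = w.shape
    padded = np.zeros((n + m, n + m))
    padded[:n, :m] = w
    rows, cols = linear_sum_assignment(padded, maximize=True)
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if r < n and c < m and w[r, c] > 0]

def ses(t1, f1, t2, f2, beta=0.05, s_t=0.1, s_f=4.0, n_iter=20, floor=1e-4):
    dt = t2[None, :] - t1[:, None]           # all pairwise time offsets
    df = f2[None, :] - f1[:, None]           # all pairwise frequency offsets
    delta_t = delta_f = 0.0
    pairs = []
    for _ in range(n_iter):
        # MATCHING step for the current parameters
        w = -((dt - delta_t) ** 2 / s_t + (df - delta_f) ** 2 / s_f) - 2 * np.log(beta)
        new_pairs = match(w)
        converged = new_pairs == pairs
        pairs = new_pairs
        if converged or not pairs:
            break
        # ESTIMATION step: closed-form mean and variance of matched offsets
        k, kp = np.array(pairs).T
        delta_t, delta_f = dt[k, kp].mean(), df[k, kp].mean()
        s_t = max(dt[k, kp].var(), floor)    # floor is an added safeguard
        s_f = max(df[k, kp].var(), floor)
    n_total = len(t1) + len(t2)
    rho_spur = (n_total - 2 * len(pairs)) / n_total
    return pairs, dict(rho_spur=rho_spur, delta_t=delta_t, s_t=s_t,
                       delta_f=delta_f, s_f=s_f)

# Toy bump centres (seconds, Hz), made up for illustration.
t1 = np.array([0.5, 1.2, 2.0, 2.7]);     f1 = np.array([10.0, 12.0, 8.0, 25.0])
t2 = np.array([0.62, 1.28, 2.10, 0.9]);  f2 = np.array([10.5, 11.5, 8.2, 28.0])
pairs, theta = ses(t1, f1, t2, f2)
```

With a uniform prior the variance estimates can shrink sharply after a few matches, which makes later matching very strict; the variance floor here is only a crude stand-in for a proper prior on $s_t$ and $s_f$.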
53Generative model
[Figure: hidden bump model $y_{\text{hidden}}$ and its two noisy observations $y$ and $y'$, shifted by $(-\delta_t/2, -\delta_f/2)$ and $(\delta_t/2, \delta_f/2)$ respectively]
- Generate bump model (hidden)
- geometric prior for number $n$ of bumps: $p(n) = (1 - \lambda S)(\lambda S)^{n}$
- bumps are uniformly distributed in rectangle
- amplitude, width (in t and f) all i.i.d.
- Generate two "noisy" observations
- offset between hidden and observed bump = Gaussian random vector with
- mean $(\pm\delta_t/2, \pm\delta_f/2)$
- covariance $\mathrm{diag}(s_t/2, s_f/2)$
- amplitude, width (in t and f) all i.i.d.
- deletion with probability $p_d$
Easily extendable to more than 2 observations
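The generative process above can be simulated in a few lines. The sketch below samples only bump centres (time, frequency) and omits amplitude and width. It assumes the geometric prior $p(n) = (1 - \lambda S)(\lambda S)^n$ and assigns the negative half-offset to $y$ and the positive half-offset to $y'$. All names and default values (`lam_S`, the 3 s by 4-30 Hz rectangle, and so on) are hypothetical.

```python
import numpy as np

def sample_bump_pair(rng, lam_S=0.95, T=3.0, F=(4.0, 30.0),
                     delta_t=0.1, delta_f=0.0, s_t=0.01, s_f=0.25, p_d=0.2):
    """Sample hidden bump centres and two noisy, thinned observations of them."""
    # rng.geometric counts trials until first success (support 1, 2, ...),
    # so subtracting 1 gives p(n) = (1 - lam_S) * lam_S**n for n = 0, 1, ...
    n = rng.geometric(1.0 - lam_S) - 1
    hidden = np.column_stack([rng.uniform(0.0, T, n),
                              rng.uniform(F[0], F[1], n)])

    def observe(sign):
        keep = rng.random(n) >= p_d                        # deletion with prob. p_d
        mean = sign * np.array([delta_t, delta_f]) / 2.0   # half of the average offset
        std = np.sqrt(np.array([s_t, s_f]) / 2.0)          # half of the offset variance
        return (hidden + rng.normal(mean, std, size=(n, 2)))[keep]

    return hidden, observe(-1.0), observe(+1.0)

rng = np.random.default_rng(0)
hidden, y, y_prime = sample_bump_pair(rng)
```

Because each observation carries half of the offset mean and half of the variance, the offset between matched bumps in `y` and `y_prime` has mean $(\delta_t, \delta_f)$ and covariance $\mathrm{diag}(s_t, s_f)$.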
54Generative model (2)
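[Figure: observations $y$ and $y'$, shifted by $(-\delta_t/2, -\delta_f/2)$ and $(\delta_t/2, \delta_f/2)$; bump $i$ in $y$ and bumps $i'$ and $j'$ in $y'$]
- Binary variables $c_{kk'}$
- $c_{kk'} = 1$ if $k$ and $k'$ are observations of same hidden bump, else $c_{kk'} = 0$ (e.g., $c_{ii'} = 1$, $c_{ij'} = 0$)
- Constraints: $b_k = \sum_{k'} c_{kk'}$ and $b'_{k'} = \sum_{k} c_{kk'}$ are binary ("matching constraints")
- Generative model $p(y, y', y_{\text{hidden}}, c, \delta_t, \delta_f, s_t, s_f)$ (symmetric in $y$ and $y'$)
- Eliminate $y_{\text{hidden}}$ → offset is Gaussian RV with mean $(\delta_t, \delta_f)$ and covariance $\mathrm{diag}(s_t, s_f)$
- Probabilistic inference:
$p(y, y', c, \theta) = \int p(y, y', y_{\text{hidden}}, c, \theta)\, dy_{\text{hidden}}$
$(\hat{c}, \hat{\theta}) = \operatorname{argmax}_{c,\theta} \log p(y, y', c, \theta)$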
55Summary
- Bumps in one model, but NOT in other
- → fraction of "spurious" bumps $\rho_{\mathrm{spur}}$
- Bumps in both models, but with offset
- → Average time offset $\delta_t$ (delay)
- → Timing jitter with variance $s_t$
- → Average frequency offset $\delta_f$
- → Frequency jitter with variance $s_f$
PROBLEM: Given two bump models, compute ($\rho_{\mathrm{spur}}$, $\delta_t$, $s_t$, $\delta_f$, $s_f$)
APPROACH: $(\hat{c}, \hat{\theta}) = \operatorname{argmax}_{c,\theta} \log p(y, y', c, \theta)$
56Objective function
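[Figure: observations $y$ and $y'$, shifted by $(-\delta_t/2, -\delta_f/2)$ and $(\delta_t/2, \delta_f/2)$; bump $i$ in $y$ and bumps $i'$ and $j'$ in $y'$]
- Logarithm of model: $\log p(y, y', c, \theta) = \sum_{kk'} w_{kk'} c_{kk'} + \log I(c) + \log p_\theta(\theta) + \mathrm{const}$
$w_{kk'} = -\left( \frac{1}{s_t} (t'_{k'} - t_k - \delta_t)^2 + \frac{1}{s_f} (f'_{k'} - f_k - \delta_f)^2 \right) - 2 \log \beta$, with $\beta = p_d\, (\lambda/V)^{1/2}$
(first term: Euclidean distance between bump centers)
- Large $w_{kk'}$ if a) bumps are close, b) small $p_d$, c) few bumps per volume element
- No need to specify $p_d$, $\lambda$, and $V$: they only appear through $\beta$ = knob to control the number of matches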
57Distance measures
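Scaling
$w_{kk'} = -\left( \frac{1}{s_{t,kk'}} (t'_{k'} - t_k - \delta_t)^2 + \frac{1}{s_{f,kk'}} (f'_{k'} - f_k - \delta_f)^2 \right) - 2 \log \beta$
$s_{t,kk'} = (\Delta t_k + \Delta t'_{k'})\, s_t$, $s_{f,kk'} = (\Delta f_k + \Delta f'_{k'})\, s_f$
Non-Euclidean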
59Prior for parameters
- Expect bumps to appear at about same frequency, but delayed
- Frequency shift requires non-linear transformation, less likely than delay
- Conjugate priors for $s_t$ and $s_f$ (scaled inverse chi-squared)
- Improper prior for $\delta_t$ and $\delta_f$: $p(\delta_t) = 1 = p(\delta_f)$
60Preliminary results for multi-variate model
[Figure: linear combination of $p_c$ for CTR and MCI subjects]
61Probabilistic inference
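PROBLEM: Given two bump models, compute ($\rho_{\mathrm{spur}}$, $\delta_t$, $s_t$, $\delta_f$, $s_f$)
APPROACH: $(\hat{c}, \hat{\theta}) = \operatorname{argmax}_{c,\theta} \log p(y, y', c, \theta)$
SOLUTION: Coordinate descent
- MATCHING: $c^{(i+1)} = \operatorname{argmax}_{c} \log p(y, y', c, \theta^{(i)})$
- POINT ESTIMATION: $\theta^{(i+1)} = \operatorname{argmax}_{\theta} \log p(y, y', c^{(i+1)}, \theta)$
$\min_{x \in X,\, y \in Y} d(x, y)$
[Figure: sets X and Y]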
62Generative model
[Figure: hidden bump model $y_{\text{hidden}}$ and its noisy observations y1, ..., y5]
- Generate bump model (hidden)
- geometric prior for number $n$ of bumps: $p(n) = (1 - \lambda S)(\lambda S)^{n}$
- bumps are uniformly distributed in rectangle
- amplitude, width (in t and f) all i.i.d.
- Generate $M$ "noisy" observations
- offset between hidden and observed bump = Gaussian random vector with
- mean $(\delta_{t,m}/2, \delta_{f,m}/2)$
- covariance $\mathrm{diag}(s_{t,m}/2, s_{f,m}/2)$
- amplitude, width (in t and f) all i.i.d.
- deletion with probability $p_d$
- (other prior $p_c^0$ for cluster size)
$p_c(i) = p(\text{cluster size} = i \mid y)$, $i = 1, 2, \ldots, M$
Parameters: $\theta$ = ($\delta_{t,m}$, $\delta_{f,m}$, $s_{t,m}$, $s_{f,m}$, $p_c$)
63Role of local synchrony
[Figure: panels "Stimuli" (Voice, Face): assembly activation; "Consolidation": Hebbian consolidation; "Stimulus" (Voice): assembly recall]
(Hebb 1949, Fuster 1997)
64Probabilistic inference
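PROBLEM: Given $M$ bump models, compute $\theta$ = ($\delta_{t,m}$, $\delta_{f,m}$, $s_{t,m}$, $s_{f,m}$, $p_c$)
APPROACH: $(\hat{c}, \hat{\theta}) = \operatorname{argmax}_{c,\theta} \log p(y, y', c, \theta)$
SOLUTION: Coordinate descent
- CLUSTERING (IP or MP): $c^{(i+1)} = \operatorname{argmax}_{c} \log p(y, y', c, \theta^{(i)})$
- POINT ESTIMATION: $\theta^{(i+1)} = \operatorname{argmax}_{\theta} \log p(y, y', c^{(i+1)}, \theta)$
- Integer program
- Max-product algorithm (MP) on sparse graph
- Integer programming methods (e.g., LP relaxation)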