Transcript: Analyzing Quantitative EEG Using Machine Learning and Information Theoretic Techniques
1
Analyzing Quantitative EEG Using Machine Learning and Information Theoretic Techniques
Pedja Neskovic
Booz Allen Hamilton, Arlington, VA
Institute for Brain and Neural Systems, Brown University, Providence, RI
2
  • Objective
  • Classify EEG signals into different tasks (learning, memory, etc.) within and across subjects, i.e., decode brain signals
  • Methods
  • Use information theory to extract informative characteristics, and machine learning to learn/detect patterns in the EEG

3
n-back task
  • The n-back task requires subjects to decide
    whether a currently present stimulus matches one
    presented n trials previously

4
  • Challenges
  • Low signal-to-noise ratio: the noise level is typically about 20 µV, while the signal of interest is about 5 µV
  • Signal is contaminated by artifacts (eye movements/blinks, muscle contractions)
  • Processing large amounts of data in real time
  • Design a method for extracting the most discriminative features

5
Representing EEG signals
  • Representation: what characteristics/features to use?
  • Most widely used features: power spectrum (PS), coherence analysis, linear correlation (LC), auto-regression (AR) analysis, PCA, etc.
  • Main drawbacks: these capture only linear dependences and 2nd-order statistics
  • Introduce new features: Entropy (H) and Mutual Information (MI)
  • In contrast to the power spectrum, which is based on second-order statistics, H encompasses higher-order statistics
  • In contrast to linear approaches (e.g., correlation, coherence, partial coherence analysis, linear Granger causality, and the Directed Transfer Function (DTF)), MI captures both linear and nonlinear dependences

6
Definitions
  • Entropy measures the uncertainty of a discrete random variable X
  • If H = 0, every measurement occurs with probability 1 or 0
  • Mutual information is the reduction in the uncertainty of X due to the knowledge of Y
  • MI = 0 iff X and Y are statistically independent (in contrast with the Pearson correlation, which quantifies only linear dependences); standard forms below
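
These quantities have the standard discrete (Shannon) forms:

```latex
H(X)      = -\sum_{x} p(x)\,\log p(x)
H(X \mid Y) = -\sum_{x,y} p(x,y)\,\log p(x \mid y)
I(X;Y)    = H(X) - H(X \mid Y)
          = \sum_{x,y} p(x,y)\,\log \frac{p(x,y)}{p(x)\,p(y)}
```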

7
Calculating p(x)
  • Standard approach: use histograms to estimate p(x) (see the sketch below)

(Figure: example signal v plotted over time t, illustrating how amplitudes are binned for the histogram estimate)
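
A minimal sketch of this histogram (plug-in) estimate for one channel; the number of bins is an illustrative choice, not a value from the talk:

```python
import numpy as np

def histogram_entropy(x, bins=16):
    """Plug-in entropy estimate (in nats) of a 1-D signal.

    The amplitude range is split into `bins` equal-width bins, p(x) is
    estimated from the normalized bin counts, and the Shannon entropy
    -sum p log p is computed over the non-empty bins.
    """
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]                      # empty bins contribute nothing
    return -np.sum(p * np.log(p))

# Example on a simulated channel (white noise stands in for EEG)
print(histogram_entropy(np.random.randn(1000)))
```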
8
Stochastic process
  • Shortcoming of using H and MI: temporal dependences are ignored
  • Model the outputs as a Markov stochastic process: represent the outputs of one electrode as a sequence of random variables

9
Capturing temporal dependences
  • Conditional entropy - captures temporal dependences within a single electrode
  • Conditional MI - captures both spatial and temporal dependences (forms sketched below)
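
Under a first-order Markov view of the electrode outputs, plausible forms for these quantities are (the exact conditioning used in the talk is an assumption here):

```latex
% Conditional entropy: temporal dependence within one electrode
H(X_t \mid X_{t-1}) = H(X_t, X_{t-1}) - H(X_{t-1})

% Conditional MI: dependence between electrodes X and Y given their past
I(X_t ; Y_t \mid X_{t-1})             % three random variables
I(X_t ; Y_t \mid X_{t-1}, Y_{t-1})    % four random variables
```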

10
How to estimate entropy?
  • The histogram approach is not good if we have to estimate joint probabilities of many variables
  • Some of the bins will have zero counts; sometimes there are more bins than data points
  • To calculate the expected entropy, use a Bayesian approach

p - vector of true (unknown) probabilities
n - vector of counts
11
Entropy estimation
  • Likelihood: multinomial distribution
  • Prior: Dirichlet distribution, whose parameters reflect the prior knowledge of the number of points in each bin (see the sketch below)

ψ - digamma function
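
A minimal sketch of the resulting posterior-mean entropy under a symmetric Dirichlet prior (the Wolpert-Wolf form of the estimator is assumed here); the concentration parameter a is an illustrative choice:

```python
import numpy as np
from scipy.special import digamma

def bayesian_entropy(counts, a=1.0):
    """Posterior-mean entropy (nats) given histogram bin counts.

    Likelihood: multinomial over the bins.
    Prior: symmetric Dirichlet with concentration `a` per bin, so the
    posterior over the bin probabilities is Dirichlet(counts + a).
    The expected entropy under that posterior is written in terms of
    the digamma function.
    """
    m = np.asarray(counts, dtype=float) + a   # posterior Dirichlet parameters
    total = m.sum()                           # effective number of points (counts + prior)
    return digamma(total + 1.0) - np.sum(m * digamma(m + 1.0)) / total

# Example: bins with zero counts no longer break the estimate
print(bayesian_entropy([3, 0, 5, 0, 2], a=0.5))
```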
12
Classification: task and subject
  • Goal: Associate a given EEG segment with both the subject and the class
  • Classifiers: Naïve Bayes (NB) and SVM
  • Features: 62 for H, and 1,891 for MI
  • 24 classes: 6 subjects × 4 tasks

13
Dependence on prior parameters
(Tables: single-trial and five-trial classification rates with SVM for different prior parameters)

  • The prior parameter becomes less important as more observations become available
  • N_eff - effective number of points
14
SVM classification rates, 64 features
(Tables: classification rates for five-trial-long and single-trial-long segments)
  • PS - Power Spectra
  • H - Entropy, H(X)
  • CH - Conditional Entropy, H(X_t | X_{t-1}) - captures temporal dependences within a single electrode
  • Band A: 1-20 Hz
  • Band B: 20-40 Hz
  • Band C: 40-60 Hz

 
15
SVM classification rates, 1,891 features
(Tables: classification rates for five-trial-long and single-trial-long segments)
  • LC - Linear Correlation
  • MI(2) - MI between two electrodes, MI(X;Y) - captures "spatial" dependences
  • MI(3) - MI between two electrodes using 3 random variables, MI(X_t; Y_t | X_{t-1}) - captures spatial and temporal dependences
  • MI(4) - MI using 4 random variables, MI(X_t; Y_t | X_{t-1}, Y_{t-1})

 
16
Classifying EEG across subjects
  • Using all the features, classification is around chance!
  • Problems: too many features (e.g., 62 × 62 for MI features), and not all are important (noisy features, irrelevant features, etc.)
  • Solution: reduce the number of features by measuring the statistical dependence between each feature and the class variable (e.g., using MI)

17
Classifying EEG across subjects
  • Problem: we have to calculate a joint probability that involves many features and the class variable
  • Solution: select features sequentially - treat one feature at a time and calculate the MI between every feature and the class variable
  • Maximize the dependence between a feature and the class variable while minimizing the redundancy among the selected features (see the sketch below)
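
A minimal sketch of this sequential max-relevance / min-redundancy selection; the MI estimates reuse simple histogram binning, and all data shapes and names are illustrative (the talk reduces 1,891 MI features to 40):

```python
import numpy as np

def mi_discrete(a, b, bins=8):
    """MI (nats) between two 1-D variables via joint histogram binning."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

def select_features(X, y, n_select=40, bins=8):
    """Greedily pick features with high MI to the class and low MI to each other."""
    n_features = X.shape[1]
    relevance = np.array([mi_discrete(X[:, j], y, bins) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]          # start with the most relevant feature
    while len(selected) < n_select:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([mi_discrete(X[:, j], X[:, k], bins) for k in selected])
            score = relevance[j] - redundancy       # relevance minus mean redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Small synthetic example: 200 EEG segments x 100 candidate features, 4 task labels
X = np.random.randn(200, 100)
y = np.random.randint(0, 4, 200)
print(select_features(X, y, n_select=10))
```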

18
Classification across subjects
Goal: Associate a given EEG segment with the task, across subjects (test on a subject that has not been used for training)
  • Classifiers
  • Naïve Bayes (Bayes)
  • Support Vector Machine (SVM)
  • Nearest Neighbor (NN)
  • Features
  • For MI, start with 1,891 features and use only the 40 most discriminative ones
  • Use 6 subjects: 5 for training and 1 for testing (see the sketch below)
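
A minimal sketch of this leave-one-subject-out evaluation with scikit-learn; the feature matrix, labels, and subject IDs are placeholders, not data from the study:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: one row per EEG segment, one column per selected feature
X = np.random.randn(240, 40)                  # e.g. the 40 selected MI features
y = np.random.randint(0, 4, size=240)         # 4 task labels
subjects = np.repeat(np.arange(6), 40)        # 6 subjects, 40 segments each

# Train on 5 subjects, test on the held-out one, rotating over all subjects
logo = LeaveOneGroupOut()
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, groups=subjects, cv=logo)
print(scores.mean())
```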

19
Results
(Figures: across-subject classification rates for Power Spectra, Conditional Entropy, and Conditional MI features)
20
Conclusions
  • Information-theoretic features (H, CH, and MI(i)) outperform conventional features (power spectrum, linear correlation)
  • Although H and MI features capture non-linear dependences, they assume that the signal is stationary
  • To capture temporal dependences, treat electrode outputs as a stochastic process and introduce conditional entropy and conditional mutual information features
  • To detect more difficult patterns (classifying across subjects from single trials), it is necessary to reduce the dimensionality of the feature space

21
Collaborators
  • Liang Wu and Leon Cooper
  • Physics Department
  • Institute for Brain and Neural Systems
  • Brown University, Providence, RI
  • Bill Heindel and Elena Festa
  • Department of Psychology
  • Brown University, Providence, RI