Transcript: Analyzing Quantitative EEG Using Machine Learning and Information Theoretic Techniques
1
Analyzing Quantitative EEG Using Machine Learning and Information Theoretic Techniques
Pedja Neskovic
Booz Allen Hamilton, Arlington, VA
Institute for Brain and Neural Systems, Brown University, Providence, RI
2
  • Objective
  • Classify EEG signals into different tasks (learning, memory, etc.) within and across subjects, i.e., decode brain signals
  • Methods
  • Use information theory to extract informative characteristics, and machine learning to learn/detect patterns in the EEG

3
n-back task
  • The n-back task requires subjects to decide
    whether a currently present stimulus matches one
    presented n trials previously

4
  • Challenges
  • Low signal-to-noise ratio: the noise level is typically about 20 µV, while the signal of interest is about 5 µV
  • Signal is contaminated by artifacts (eye movements/blinks, muscle contractions)
  • Processing large amounts of data in real time
  • Design a method for extracting the most discriminative features

5
Representing EEG signals
  • Representation: what characteristics/features to use?
  • Most widely used features: power spectrum (PS), coherence analysis, linear correlation (LC), auto-regression (AR) analysis, PCA, etc.
  • Main drawbacks: these capture only linear dependences and 2nd-order statistics
  • Introduce new features: Entropy (H) and Mutual Information (MI)
  • In contrast to the power spectrum, which is based on second-order statistics, H encompasses higher-order statistics
  • In contrast to linear approaches (e.g., correlation, coherence, partial coherence analysis, linear Granger causality, and the Directed Transfer Function (DTF)), MI captures both linear and nonlinear dependences

6
Definitions
  • Entropy measures the uncertainty of a discrete random variable X
  • If H = 0, every measurement occurs with probability 1 or 0
  • Mutual information is the reduction in the uncertainty of X due to the knowledge of Y
  • MI = 0 iff X and Y are statistically independent (in contrast with the Pearson correlation, which quantifies only linear dependences); standard forms below
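
These quantities have the standard discrete (Shannon) forms:

```latex
H(X)      = -\sum_{x} p(x)\,\log p(x)
H(X \mid Y) = -\sum_{x,y} p(x,y)\,\log p(x \mid y)
I(X;Y)    = H(X) - H(X \mid Y)
          = \sum_{x,y} p(x,y)\,\log \frac{p(x,y)}{p(x)\,p(y)}
```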

7
Calculating p(x)
  • Standard approach: use histograms to estimate p(x) (see the sketch below)

(Figure: example signal v plotted over time t, illustrating how amplitudes are binned for the histogram estimate)
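
A minimal sketch of this histogram (plug-in) estimate for one channel; the number of bins is an illustrative choice, not a value from the talk:

```python
import numpy as np

def histogram_entropy(x, bins=16):
    """Plug-in entropy estimate (in nats) of a 1-D signal.

    The amplitude range is split into `bins` equal-width bins, p(x) is
    estimated from the normalized bin counts, and the Shannon entropy
    -sum p log p is computed over the non-empty bins.
    """
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]                      # empty bins contribute nothing
    return -np.sum(p * np.log(p))

# Example on a simulated channel (white noise stands in for EEG)
print(histogram_entropy(np.random.randn(1000)))
```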
8
Stochastic process
  • Shortcoming of using H and MI: temporal dependences are ignored
  • Model the outputs as a Markov stochastic process: represent the outputs of one electrode as a sequence of random variables

9
Capturing temporal dependences
  • Conditional entropy - captures temporal dependences within a single electrode
  • Conditional MI - captures both spatial and temporal dependences (forms sketched below)
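
Under a first-order Markov view of the electrode outputs, plausible forms for these quantities are (the exact conditioning used in the talk is an assumption here):

```latex
% Conditional entropy: temporal dependence within one electrode
H(X_t \mid X_{t-1}) = H(X_t, X_{t-1}) - H(X_{t-1})

% Conditional MI: dependence between electrodes X and Y given their past
I(X_t ; Y_t \mid X_{t-1})             % three random variables
I(X_t ; Y_t \mid X_{t-1}, Y_{t-1})    % four random variables
```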

10
How to estimate entropy?
  • The histogram approach is not good if we have to estimate joint probabilities of many variables
  • Some of the bins will have zero counts; sometimes there are more bins than data points
  • To calculate the expected entropy, use a Bayesian approach

p - vector of true (unknown) probabilities
n - vector of counts
11
Entropy estimation
  • Likelihood: multinomial distribution
  • Prior: Dirichlet distribution, whose parameters reflect the prior knowledge of the number of points in each bin (see the sketch below)

ψ - digamma function
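
A minimal sketch of the resulting posterior-mean entropy under a symmetric Dirichlet prior (the Wolpert-Wolf form of the estimator is assumed here); the concentration parameter a is an illustrative choice:

```python
import numpy as np
from scipy.special import digamma

def bayesian_entropy(counts, a=1.0):
    """Posterior-mean entropy (nats) given histogram bin counts.

    Likelihood: multinomial over the bins.
    Prior: symmetric Dirichlet with concentration `a` per bin, so the
    posterior over the bin probabilities is Dirichlet(counts + a).
    The expected entropy under that posterior is written in terms of
    the digamma function.
    """
    m = np.asarray(counts, dtype=float) + a   # posterior Dirichlet parameters
    total = m.sum()                           # effective number of points (counts + prior)
    return digamma(total + 1.0) - np.sum(m * digamma(m + 1.0)) / total

# Example: bins with zero counts no longer break the estimate
print(bayesian_entropy([3, 0, 5, 0, 2], a=0.5))
```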
12
Classification: task and subject
  • Goal: Associate a given EEG segment with both the subject and the class
  • Classifiers: Naïve Bayes (NB) and SVM
  • Features: 62 for H, and 1,891 for MI
  • 24 classes: 6 subjects × 4 tasks

13
Dependence on prior parameters
(Tables: single-trial and five-trial classification rates with SVM for different prior parameters)

  • The prior parameter becomes less important as more observations become available
  • N_eff - effective number of points
14
SVM classification rates, 64 features
(Tables: classification rates for five-trial-long and single-trial-long segments)
  • PS - Power Spectra
  • H - Entropy, H(X)
  • CH - Conditional Entropy, H(X_t | X_{t-1}) - captures temporal dependences within a single electrode
  • Band A: 1-20 Hz
  • Band B: 20-40 Hz
  • Band C: 40-60 Hz

 
15
SVM classification rates, 1,891 features
(Tables: classification rates for five-trial-long and single-trial-long segments)
  • LC - Linear Correlation
  • MI(2) - MI between two electrodes, MI(X;Y) - captures "spatial" dependences
  • MI(3) - MI between two electrodes using 3 random variables, MI(X_t; Y_t | X_{t-1}) - captures spatial and temporal dependences
  • MI(4) - MI using 4 random variables, MI(X_t; Y_t | X_{t-1}, Y_{t-1})

 
16
Classifying EEG across subjects
  • Using all the features, classification is around chance!
  • Problems: too many features (e.g., 62 × 62 for MI features), and not all are important (noisy features, irrelevant features, etc.)
  • Solution: reduce the number of features by measuring the statistical dependence between each feature and the class variable (e.g., using MI)

17
Classifying EEG across subjects
  • Problem: we have to calculate a joint probability that involves many features and the class variable
  • Solution: select features sequentially - treat one feature at a time and calculate the MI between every feature and the class variable
  • Maximize the dependence between a feature and the class variable while minimizing the redundancy among the selected features (see the sketch below)
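
A minimal sketch of this sequential max-relevance / min-redundancy selection; the MI estimates reuse simple histogram binning, and all data shapes and names are illustrative (the talk reduces 1,891 MI features to 40):

```python
import numpy as np

def mi_discrete(a, b, bins=8):
    """MI (nats) between two 1-D variables via joint histogram binning."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

def select_features(X, y, n_select=40, bins=8):
    """Greedily pick features with high MI to the class and low MI to each other."""
    n_features = X.shape[1]
    relevance = np.array([mi_discrete(X[:, j], y, bins) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]          # start with the most relevant feature
    while len(selected) < n_select:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([mi_discrete(X[:, j], X[:, k], bins) for k in selected])
            score = relevance[j] - redundancy       # relevance minus mean redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Small synthetic example: 200 EEG segments x 100 candidate features, 4 task labels
X = np.random.randn(200, 100)
y = np.random.randint(0, 4, 200)
print(select_features(X, y, n_select=10))
```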

18
Classification across subjects
Goal: Associate a given EEG segment with the task, across subjects (test on a subject that has not been used for training)
  • Classifiers
  • Naïve Bayes (Bayes)
  • Support Vector Machine (SVM)
  • Nearest Neighbor (NN)
  • Features
  • For MI, start with 1,891 features and use only the 40 most discriminative ones
  • Use 6 subjects: 5 for training and 1 for testing (see the sketch below)
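
A minimal sketch of this leave-one-subject-out evaluation with scikit-learn; the feature matrix, labels, and subject IDs are placeholders, not data from the study:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: one row per EEG segment, one column per selected feature
X = np.random.randn(240, 40)                  # e.g. the 40 selected MI features
y = np.random.randint(0, 4, size=240)         # 4 task labels
subjects = np.repeat(np.arange(6), 40)        # 6 subjects, 40 segments each

# Train on 5 subjects, test on the held-out one, rotating over all subjects
logo = LeaveOneGroupOut()
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, groups=subjects, cv=logo)
print(scores.mean())
```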

19
Results
(Figures: across-subject classification rates for Power Spectra, Conditional Entropy, and Conditional MI features)
20
Conclusions
  • Information-theoretic features (H, CH, and MI(i)) outperform conventional features (power spectrum, linear correlation)
  • Although H and MI features capture non-linear dependences, they assume that the signal is stationary
  • To capture temporal dependences, treat electrode outputs as a stochastic process and introduce conditional entropy and conditional mutual information features
  • To detect more difficult patterns (classifying across subjects from single trials), it is necessary to reduce the dimensionality of the feature space

21
Collaborators
  • Liang Wu and Leon Cooper
  • Physics Department
  • Institute for Brain and Neural Systems
  • Brown University, Providence, RI
  • Bill Heindel and Elena Festa
  • Department of Psychology
  • Brown University, Providence, RI