1
Feature Selection and Causality Inference
  • Isabelle Guyon, Clopinet
  • André Elisseeff, IBM Zürich

2
Purpose
  • What affects your health?
  • What affects the economy?
  • What affects climate change?
  • and which actions will have beneficial effects?

3
Road Map
  • What is feature selection?
  • Why is it hard?
  • What works best in practice?
  • How are we going to make progress?

4
Feature Selection
[Diagram: input features X, target Y]
Remove features to improve (or at least not degrade)
performance.
5
Uncovering Dependencies
Factors of variability:
  • factual vs. artifactual
  • known vs. unknown
  • observable vs. unobservable
  • controllable vs. uncontrollable
6
Predictions and Actions
[Diagram: features X, target Y]
Judea Pearl, Causality, 2000
7
Individual Feature Irrelevance
  • P(Xi, Y) = P(Xi) P(Y)
  • P(Xi | Y) = P(Xi)
[Figure: density of xi]
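As an aside not in the original slides, here is a minimal sketch of how this irrelevance criterion can be checked empirically: the mutual information between a discrete feature Xi and the label Y is zero exactly when P(Xi, Y) = P(Xi) P(Y). All data below are hypothetical placeholders.

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information of two discrete variables.
    Zero exactly when the empirical P(x, y) factorizes as P(x) P(y)."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * np.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)
x1 = rng.integers(0, 2, 1000)       # independent of y by construction
x2 = y ^ (rng.random(1000) < 0.2)   # noisy copy of y
print(mutual_information(x1, y))    # ~0: P(Xi, Y) = P(Xi) P(Y)
print(mutual_information(x2, y))    # substantially > 0: dependent
```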
8
Individual Feature Relevance
[Figure: class-conditional distributions of xi for classes −1 and +1, with means μ−, μ+ and standard deviations σ−, σ+]
9
Multivariate Cases
Guyon-Elisseeff, JMLR 2004; Springer 2006
10
Is multivariate FS always best?
Kohavi-John, 1997
n features, 2^n possible feature subsets!
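To make the combinatorial point concrete (my illustration, not from the slides), a toy exhaustive wrapper that scores every one of the 2^n subsets by cross-validated accuracy; this is feasible only for very small n, which is exactly why practical multivariate FS needs heuristics. Assumes scikit-learn is available; the data are synthetic placeholders.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))            # n = 8 features -> 2^8 = 256 subsets
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only features 0 and 1 matter

best = (-np.inf, ())
for k in range(1, X.shape[1] + 1):
    for s in combinations(range(X.shape[1]), k):
        acc = cross_val_score(LogisticRegression(), X[:, list(s)], y, cv=5).mean()
        best = max(best, (acc, s))
print(best)  # the winning subset should contain features 0 and 1
```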
11
In practice
  • Univariate feature selection often gives better
    results than multivariate feature selection.
  • NO feature selection at all sometimes gives the
    best results, even in the presence of known
    distracters.
  • How can we make multivariate FS work better?

NIPS 2003 and WCCI 2006 challenges
http://clopinet.com/challenges
12
Definition of relevance
  • We want to determine whether a variable Xi is
    relevant to the target Y.
  • Surely irrelevant feature:
    P(Xi, Y | S\i) = P(Xi | S\i) P(Y | S\i)
    for all S\i ⊆ X\i
    and for all assignments of values to S\i
  • Are all non-irrelevant features relevant?

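A brute-force sketch of this definition for small discrete datasets (my reconstruction, not the authors' code): feature i is declared surely irrelevant when the conditional mutual information I(Xi; Y | S\i) is near zero for every subset S\i of the remaining features. The tolerance and data are hypothetical, and the cost grows as 2^n.

```python
from collections import Counter
from itertools import combinations

import numpy as np

def cmi(x, y, Z):
    """Empirical conditional mutual information I(X; Y | Z), discrete data.
    Z is an (n, k) array; k = 0 reduces to plain mutual information."""
    n = len(x)
    z = list(map(tuple, Z)) if Z.shape[1] else [()] * n
    cxyz, cxz = Counter(zip(x, y, z)), Counter(zip(x, z))
    cyz, cz = Counter(zip(y, z)), Counter(z)
    return sum(c / n * np.log(c * cz[zi] / (cxz[xi, zi] * cyz[yi, zi]))
               for (xi, yi, zi), c in cxyz.items())

def surely_irrelevant(X, y, i, tol=1e-2):
    """True if I(Xi; Y | S) ~ 0 for every subset S of the other features.
    Exponential in the number of features -- only for tiny problems."""
    others = [j for j in range(X.shape[1]) if j != i]
    return all(cmi(X[:, i], y, X[:, list(S)]) < tol
               for k in range(len(others) + 1)
               for S in combinations(others, k))

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 4))
y = X[:, 0] ^ X[:, 1]               # XOR target: univariate tests miss X0 and X1
print(surely_irrelevant(X, y, 0))   # False: X0 matters once X1 is conditioned on
print(surely_irrelevant(X, y, 3))   # True: X3 never matters
```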
13
Is X2 relevant?
Example 1:
P(X1, X2, Y) = P(X1 | X2, Y) P(X2) P(Y)
14
Are X1 and X2 relevant?
Example 2:
[Figure: two classes, Y = disease vs. Y = normal]
P(X1, X2, Y) = P(X1 | X2, Y) P(X2) P(Y)
15
Adding a variable
Example 3:
16
Example 3: Is chocolate good for your health?
[Figure: chocolate intake (X1) vs. life expectancy (Y), with a third variable X2]
17
Really?
Example 3: Is chocolate good for your health?
[Figure: chocolate intake vs. life expectancy]
18
Same independence relations, different causal relations
P(X1, X2, Y) = P(X1 | X2) P(Y | X2) P(X2)
P(X1, X2, Y) = P(Y | X2) P(X2 | X1) P(X1)
P(X1, X2, Y) = P(X1 | X2) P(X2 | Y) P(Y)
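A hypothetical simulation of two of these factorizations (fork and chain; the reverse chain is symmetric) illustrating the point: both yield the same conditional independence X1 ⊥ Y | X2, so observational data alone cannot tell the causal structures apart.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def flip(v, p=0.1):
    """Pass a binary array through a noisy channel (flip with probability p)."""
    return v ^ (rng.random(len(v)) < p)

# Fork  X1 <- X2 -> Y:
x2 = rng.integers(0, 2, n)
x1, y = flip(x2), flip(x2)
# Chain X1 -> X2 -> Y:
u1 = rng.integers(0, 2, n)
u2 = flip(u1)
w = flip(u2)

for name, (a, m, b) in [("fork", (x1, x2, y)), ("chain", (u1, u2, w))]:
    r = [np.corrcoef(a[m == v], b[m == v])[0, 1] for v in (0, 1)]
    print(name, np.round(r, 3))  # ~0 in each stratum of X2: X1 _||_ Y | X2 in both
```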
19
Is X1 relevant?
Example 3:
20
Non-causal features may be predictive yet not
relevant (examples 1, 2, and 3).
21
Causal Features
P(X, Y) = P(X | Y) P(Y)
22
Experiments
  • Features: gene expression coefficients.
  • Samples: prostate tissues, tumor vs. control.
  • Training data (Stanford): 87 laser-microdissected
    tissues (tumor = G3/4, control = NL, dysplasia,
    BPH), U133A array (20,000 genes).
  • Test data (Oncomine, 3 datasets): 164 tissues,
    U95A array (12,500 genes).
  • X (87 × 6839), Y (87), Xt (164 × 6839), Yt (164).
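A sketch of the kind of univariate-filter pipeline evaluated on the next slide (my reconstruction; the transcript contains no code): rank genes by a Golub-style signal-to-noise score on the training set, keep the top Fnum = 5, fit a simple classifier, and score it on the test set. Shapes follow the slide, but the data here are random placeholders, so the printed accuracy is meaningless.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = rng.normal(size=(87, 6839)), rng.integers(0, 2, 87)      # train (Stanford)
Xt, yt = rng.normal(size=(164, 6839)), rng.integers(0, 2, 164)  # test (Oncomine)

def s2n(X, y):
    """Golub-style signal-to-noise per gene: (mu+ - mu-) / (sigma+ + sigma-)."""
    pos, neg = X[y == 1], X[y == 0]
    return (pos.mean(0) - neg.mean(0)) / (pos.std(0) + neg.std(0))

top = np.argsort(-np.abs(s2n(X, y)))[:5]      # keep Fnum = 5 genes
clf = LogisticRegression().fit(X[:, top], y)
print(clf.score(Xt[:, top], yt))              # accuracy on the test tissues
```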

23
Univariate Filter AUC
Fnum = 5, error rate = 0.28, BER = 0.26, AUC = 0.83
[ROC curve: sensitivity vs. specificity]
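For reference, a sketch of how the reported metrics can be computed (not the authors' evaluation code): BER is the average of the per-class error rates, and AUC is the area under the ROC curve. Assumes scikit-learn; the labels and scores below are simulated.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def balanced_error_rate(y_true, y_pred):
    """BER: average of the error rates on each class (insensitive to imbalance)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return 0.5 * (fp / (fp + tn) + fn / (fn + tp))

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 164)             # simulated test labels
scores = y_true + rng.normal(0, 0.8, 164)    # simulated decision values
y_pred = (scores > 0.5).astype(int)
print(balanced_error_rate(y_true, y_pred))
print(roc_auc_score(y_true, scores))         # area under the ROC curve
```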
24
Causal Feature Selection
Fnum = 5, error rate = 0.15, BER = 0.12, AUC = 0.95
[ROC curve: sensitivity vs. specificity]
25
Causal features are robust under change of
distribution
[Plot: number of genes found among the 1000 best for predicting the test data (y-axis, 0-35) vs. gene rank, ranking performed with training examples (x-axis, 0-50)]
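The robustness curve can be reproduced schematically as follows (placeholder rankings, not the authors' data): for each cutoff r on the training-based ranking, count how many of those r genes also appear among the 1000 best genes for predicting the test data.

```python
import numpy as np

def overlap_curve(train_ranking, test_top, max_rank=50):
    """Entry r-1: how many of the r top-ranked training genes are in test_top."""
    top = set(test_top)
    return np.cumsum([g in top for g in train_ranking[:max_rank]])

rng = np.random.default_rng(0)
train_ranking = rng.permutation(6839)     # placeholder ranking from training data
test_top = rng.permutation(6839)[:1000]   # placeholder "1000 best on test data"
print(overlap_curve(train_ranking, test_top)[:10])
```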
26
Conclusion
  • Feature selection focuses on uncovering subsets
    of variables X1, X2, ... predictive of the target Y.
  • Taking a closer look at the type of dependencies
    may help refine the notion of variable relevance.
  • Uncovering causal relationships may yield better
    feature selection, robust under distribution
    changes.
  • These causal features may be better targets of
    action.

27
http://clopinet.com/fextract-book
Feature Extraction: Foundations and Applications,
I. Guyon et al., Eds., Springer, 2006.
  • Tutorials
  • NIPS03 challenge results
  • Challenge data
  • Sample code
  • Teaching material

28
Extras (not in the talk)
29
Individual Feature Relevance
S2N = (μ+ − μ−) / (σ+ + σ−)
Golub et al., Science, Vol. 286, 15 Oct. 1999
S2N ≈ R(x, y) after standardization x ← (x − μx)/σx
[Figure: class-conditional distributions for labels −1 and +1, with means μ−, μ+ and standard deviations σ−, σ+]
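A quick numerical check of the ≈ claim on hypothetical data: with labels y ∈ {−1, +1}, the Pearson correlation R(x, y) of the standardized feature has the same sign and a similar magnitude as the S2N score.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.where(rng.random(5000) < 0.5, 1, -1)   # labels in {-1, +1}
x = 2.0 * y + rng.normal(0, 3, 5000)          # informative but noisy feature

pos, neg = x[y == 1], x[y == -1]
s2n = (pos.mean() - neg.mean()) / (pos.std() + neg.std())
xs = (x - x.mean()) / x.std()                 # standardization x <- (x - mu_x)/sigma_x
r = np.corrcoef(xs, y)[0, 1]
print(s2n, r)                                 # same sign, similar magnitude
```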
30
Is X1 relevant?
[Figure: peak vs. temperature]
Population selection bias
P(X1, X2, Y) = P(X1 | X2) P(Y | X2) P(X2)
31
Is X1 relevant?
[Figure: causal chain health status (X1) → plate (X2) → peak (Y)]
Confounding factor
P(X1, X2, Y) = P(Y | X2) P(X2 | X1) P(X1)