Title: Non-linear Principal Manifolds: a Useful Tool in Bioinformatics and Medical Applications
1 Non-linear Principal Manifolds: a Useful Tool in Bioinformatics and Medical Applications
- Andrei Zinovyev
- Institut des Hautes Études Scientifiques,
- France
2 Plan of the talk
- Object of study
- Definition of principal manifold (PM)
- Constructing PMs: elastic maps
- Examples of biomedical applications
3 Principal manifolds: elastic maps framework
LLE
ISOMAP
Clustering
Multidim. scaling
Principal manifolds
PCA
K-means
Visualization
SOM
Non-linear Data-mining methods
Factor analysis
Supervised classification
SVM
Regression, approximation
4 Finite set of objects in R^N
IRIS database
Sepal length  Sepal width  Petal length  Petal width  Species
4.9           3.0          1.4           0.2          Iris-setosa
4.7           3.2          1.3           0.3          Iris-setosa
4.6           3.1          1.5           0.2          Iris-setosa
7.0           3.2          4.7           1.4          Iris-versicolor
6.4           3.2          4.5           1.5          Iris-versicolor
6.9           3.1          4.9           1.5          Iris-versicolor
6.3           3.3          6.0           2.5          Iris-virginica
5.8           2.7          X             1.9          Iris-virginica
7.1           3.0          5.9           2.1          Iris-virginica
6.3           2.9          5.6           1.8          Iris-virginica
X_i, i = 1..m
5 Mean point
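A minimal sketch of the mean point, assuming the objects are stored as rows of a NumPy array and that the missing entry marked X in the table is encoded as NaN. The NaN encoding and the function name `mean_point` are illustrative choices, not taken from the slides.

```python
import numpy as np

def mean_point(X):
    """Coordinate-wise mean of the rows of X, skipping missing (NaN) entries."""
    X = np.asarray(X, dtype=float)
    return np.nanmean(X, axis=0)

# Three Iris-virginica rows from the table; NaN stands in for the missing value "X".
X = np.array([
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, np.nan, 1.9],
    [7.1, 3.0, 5.9, 2.1],
])
center = mean_point(X)  # one value per coordinate
```

Skipping missing entries per coordinate is only one possible convention; it keeps every object in the computation instead of discarding incomplete rows.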
6 Principal object
7 Principal Component Analysis
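The formulas on this slide did not survive extraction. As a reminder of the linear case, here is a small sketch of PCA via the singular value decomposition, assuming a complete data matrix (no missing values) with objects as rows. The function name `pca` and the synthetic data are hypothetical.

```python
import numpy as np

def pca(X, n_components=2):
    """Return (mean, components, projections) for data matrix X (rows = objects)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of Vt are orthonormal principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_components]
    projections = centered @ components.T  # coordinates of each object on the principal axes
    return mean, components, projections

# Synthetic example: points scattered along the direction (1, 2, 0) with small noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t, 0.01 * rng.normal(size=200)])
mean, components, projections = pca(X, n_components=1)
```

The first principal direction found here should be close to (1, 2, 0) up to sign and normalization, which is the straight-line analogue of the principal manifold introduced next.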
8 Principal manifold
9 What do we want?
- Non-linear surface (1D, 2D, 3D)
- Smooth and not twisted
- The data model is unknown
- Speed (time linear in Nm)
- Uniqueness
- Fast way to project datapoints
10 Metaphor of elasticity
U(Y)
U(E), U(R)
Data points
Graph nodes
11 Constructing elastic nets
12 Definition of elastic energy
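The energy formula itself is missing from the extracted slide. The sketch below assumes the form commonly given for elastic nets: an approximation term U(Y), plus a stretching term U(E) over edges and a bending term U(R) over "ribs" (triples of adjacent nodes). It is written for a simple chain of nodes, and the uniform weights `lam` and `mu` and the function name are assumptions made for illustration.

```python
import numpy as np

def elastic_energy(X, Y, lam, mu):
    """Energy of a chain of nodes Y (p x N) fitted to data X (m x N).

    Assumed form: U = U_Y + lam * U_E + mu * U_R, where
      U_Y = mean squared distance from each data point to its closest node,
      U_E = sum of squared edge lengths (stretching),
      U_R = sum of squared second differences along the chain (bending).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)  # (m, p) squared distances
    u_y = d2.min(axis=1).mean()
    u_e = (np.diff(Y, axis=0) ** 2).sum()
    u_r = (np.diff(Y, n=2, axis=0) ** 2).sum()
    return u_y + lam * u_e + mu * u_r

# A straight chain has no bending energy; a bent chain does.
straight = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
bent = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
u_straight = elastic_energy(straight, straight, lam=0.5, mu=10.0)
u_bent = elastic_energy(bent, bent, lam=0.0, mu=1.0)
```

In this toy case the data coincide with the nodes, so only the elastic terms contribute: the straight chain pays for its two unit edges, the bent chain for its single kink.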
13 Elastic manifold
14 Global minimum and softening
λ0, μ0 ≈ 10^3
λ0, μ0 ≈ 10^2
λ0, μ0 ≈ 10^1
λ0, μ0 ≈ 10^-1
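The softening idea, starting with a stiff net and gradually lowering the elasticity coefficients, can be sketched for a one-dimensional chain of nodes. This is an illustrative reconstruction, not the author's implementation: it assumes the coefficients are a stretching weight `lam` and a bending weight `mu`, uses the schedule of powers of ten read off the slide, and alternates between assigning points to their closest node and solving the resulting linear system for the node positions. All names are hypothetical.

```python
import numpy as np

def fit_elastic_curve(X, n_nodes=10,
                      schedule=((1e3, 1e3), (1e2, 1e2), (1e1, 1e1), (1e-1, 1e-1)),
                      n_iter=20):
    """Fit a chain of nodes to data X (m x N), softening (lam, mu) stage by stage."""
    X = np.asarray(X, dtype=float)
    m = len(X)
    # Initialise the nodes along the first principal axis of the data.
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scale = s[0] / np.sqrt(m)
    Y = mean + np.linspace(-scale, scale, n_nodes)[:, None] * Vt[0]
    E = np.diff(np.eye(n_nodes), axis=0)       # edge (first-difference) operator
    R = np.diff(np.eye(n_nodes), n=2, axis=0)  # rib (second-difference) operator
    for lam, mu in schedule:
        for _ in range(n_iter):
            # Step 1: assign every data point to its closest node.
            d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
            nearest = d2.argmin(axis=1)
            counts = np.bincount(nearest, minlength=n_nodes)
            sums = np.zeros_like(Y)
            np.add.at(sums, nearest, X)
            # Step 2: with assignments fixed the energy is quadratic in Y,
            # so its minimum is the solution of a linear system.
            A = np.diag(counts / m) + lam * (E.T @ E) + mu * (R.T @ R)
            Y_new = np.linalg.solve(A, sums / m)
            if np.allclose(Y_new, Y):
                break
            Y = Y_new
    return Y

# Synthetic example: noisy half-circle.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, np.pi, 300)
X = np.column_stack([np.cos(t), np.sin(t)]) + 0.02 * rng.normal(size=(300, 2))
Y = fit_elastic_curve(X)
```

Each stage starts from the result of the previous, stiffer one, so the early stages fix the overall layout and the later ones let the chain bend towards the data.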
15 Adaptive algorithms
Refining net
Growing net
Idea of scaling
Adaptive net
16 Projection onto the manifold
Closest node of the net
Closest point of the manifold
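The two projection options can be compared in a short sketch, assuming a one-dimensional manifold represented as a piecewise-linear curve through its nodes. The function names and the curve coordinate `s` are illustrative assumptions.

```python
import numpy as np

def closest_node(x, Y):
    """Index of the node of Y (p x N) nearest to point x."""
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return int(((Y - x) ** 2).sum(axis=1).argmin())

def closest_point_on_curve(x, Y):
    """Nearest point to x on the polyline through the nodes Y.

    Returns (point, s), where s is a curve coordinate in [0, p-1]:
    its integer part is the segment index, its fraction the position on it.
    """
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float)
    a, b = Y[:-1], Y[1:]
    d = b - a
    len2 = (d ** 2).sum(axis=1)
    # Orthogonal projection parameter on each segment, clipped to the segment ends.
    t = np.clip(((x - a) * d).sum(axis=1) / np.where(len2 > 0, len2, 1.0), 0.0, 1.0)
    candidates = a + t[:, None] * d
    k = int(((candidates - x) ** 2).sum(axis=1).argmin())
    return candidates[k], k + t[k]

Y = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
x = np.array([0.4, 1.0])
node = closest_node(x, Y)
point, s = closest_point_on_curve(x, Y)
```

Projecting to the closest node gives a discrete coordinate, while projecting to the closest point of the curve gives a continuous one, at the cost of checking every segment.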
17 Colorings: visualize any function
Value of the coordinate
18 Density visualization
19 Example: different topologies
R^N
R^2
20 VIDAExpert tool and elmap C package
21 Regression and principal manifolds
22 Image skeletonization or clustering around curves
23 Approximation of molecular surfaces
24 Application: economic data
Density
Gross output
Profit
Growth rate
25 Medical table: 1700 patients with myocardial infarction
Patients map, density
Lethal cases
26 Medical table: 1700 patients with myocardial infarction
128 indicators
Stenocardia functional class
Number of infarctions in anamnesis
Age
27 Codon usage in all genes of one genome
Escherichia coli
Bacillus subtilis
Majority of genes
Foreign genes
Hydrophobic genes
Highly expressed genes
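The slides do not say how codon usage is encoded for each gene. One common choice, assumed here for illustration, is a 64-dimensional vector of relative codon frequencies, so that every gene becomes a point in R^64. The function name and the toy sequence are hypothetical.

```python
from itertools import product

# All 64 codons in a fixed order, so every gene maps to a vector of the same layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]

def codon_usage(sequence):
    """Relative frequency of each of the 64 codons in a coding sequence.

    The sequence is read in frame from its first base. Codons containing
    characters other than A, C, G, T and any trailing partial codon are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(CODONS, 0)
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]

# Toy gene: ATG GCT GCT AAA TAA
vector = codon_usage("ATGGCTGCTAAATAA")
```

Stacking such vectors for all genes of a genome gives the data matrix on which a map like the one on this slide could be built.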
28 Golub's leukemia dataset: 3051 genes, 38 samples (ALL/B-cell, ALL/T-cell, AML)
Map of genes: vote for ALL, vote for AML
used by T. Golub, used by W. Lie
ALL sample
AML sample
29 Golub's leukemia dataset: map of samples
AML, ALL/B-cell, ALL/T-cell
Retinoblastoma binding protein P48
Cystatin C
density
CA2 Carbonic anhydrase II
X-linked Helicase II
30 Thank you for your attention!