Title: Decomposing Complex Clinical Phenotypes by Biologically Structured Microarray Analysis
1Decomposing Complex Clinical Phenotypes by
Biologically Structured Microarray Analysis
- Claudio Lottaz and Rainer Spang
Berlin Center for Genome Based Bioinformatics,
Berlin (Germany)
Computational Diagnostics, Max Planck Institute
for Molecular Genetics, Berlin (Germany)
2Overview
- Introduction
- Using functional annotation for semi-supervised classification
- Heterogeneity vs. performance
- Evaluation on cancer related data
- Conclusions
3Tumor Classification
- Setting
- Data: gene expression profiles
- Goal: prediction/classification of outcome/sub-type
- More formally
- Many expression levels measured
- Samples labelled as disease and control
- Train classifier
4State-of-the-Art
- Various powerful methods
- Support vector machines
- Shrunken centroids...
- Regularization to fight overfitting
- Feature selection
- Large margins...
- Common hypothesis: generate a single molecular signature
5Complex Phenotypes
- A single clinical phenotype may be caused by different molecular mechanisms
- Our approach: discover several sub-classes in the disease group
- Each sub-class has a homogeneous molecular signature
6Molecular Symptoms
- Classical signatures are globally optimal
- They have no biological focus
- Genes are co-regulated, thus correlated → in a global signature, genes can be replaced with little loss
- Molecular symptom
- A functionally focused signature to identify a disease sub-class
- High specificity, sub-optimal sensitivity
7Molecular Patient Stratification
- Patterns of molecular symptoms define a
molecular patient stratification
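To make the idea of a pattern-based stratification concrete, here is a minimal sketch. It assumes each molecular symptom has already been called present or absent per patient; the symptom names and patient ids are hypothetical and not taken from the talk.

```python
from collections import defaultdict

def stratify(symptom_calls):
    """Group patients by their pattern of present/absent molecular symptoms.

    symptom_calls maps a patient id to a dict {symptom name: bool}.
    Returns a dict mapping each observed pattern (a tuple of booleans,
    ordered by sorted symptom name) to the sorted list of patients showing it.
    """
    symptoms = sorted({s for calls in symptom_calls.values() for s in calls})
    strata = defaultdict(list)
    for patient, calls in symptom_calls.items():
        pattern = tuple(calls.get(s, False) for s in symptoms)
        strata[pattern].append(patient)
    return {pattern: sorted(patients) for pattern, patients in strata.items()}

# Hypothetical calls for four patients and two symptoms.
calls = {
    "p1": {"apoptosis": True, "cell_cycle": False},
    "p2": {"apoptosis": True, "cell_cycle": False},
    "p3": {"apoptosis": False, "cell_cycle": True},
    "p4": {"apoptosis": False, "cell_cycle": False},
}
strata = stratify(calls)
```

Patients sharing the same pattern end up in the same stratum, so the number of strata is bounded by the number of distinct symptom patterns actually observed.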
8Using Functional Annotations: A Priori vs. A Posteriori
9Gene Ontology
- Biological terms in a directed graph
- Genes annotated to terms
- Levels represent specificity of terms
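The structure described above can be sketched as a small directed graph with gene annotations. This is only a toy illustration: the term names and gene symbols are made up, and taking the level of a term as the longest path to a root is an assumption, since the slides do not define it.

```python
def term_levels(parents):
    """Level of each term: length of the longest path up to a root (root = 0)."""
    levels = {}

    def level(term):
        if term not in levels:
            ps = parents.get(term, [])
            levels[term] = 0 if not ps else 1 + max(level(p) for p in ps)
        return levels[term]

    for term in parents:
        level(term)
    return levels

def accessible_genes(term, children, annotations):
    """Genes annotated to a term or to any of its descendants."""
    genes = set(annotations.get(term, ()))
    for child in children.get(term, ()):
        genes |= accessible_genes(child, children, annotations)
    return genes

# Toy graph (hypothetical terms); a term may have more than one parent.
parents = {
    "process": [],
    "metabolism": ["process"],
    "signalling": ["process"],
    "kinase_signalling": ["signalling", "metabolism"],
}
# Invert the parent lists to obtain child lists.
children = {}
for term, ps in parents.items():
    children.setdefault(term, [])
    for p in ps:
        children.setdefault(p, []).append(term)

# Hypothetical gene annotations.
annotations = {
    "metabolism": {"G1"},
    "signalling": {"G2"},
    "kinase_signalling": {"G3", "G4"},
}
levels = term_levels(parents)
signalling_genes = accessible_genes("signalling", children, annotations)
```

Deeper levels correspond to more specific terms, and the genes accessible through a node are those annotated to it or to any term below it.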
10Structured Analysis of Microarrays
- Classification in leaf nodes
- Regularized multivariate classifier
- Local signatures
- Diagnosis propagation
- Combine child diagnoses in inner nodes
- Generate more general diagnoses
- Regularization
- Shrink the classifier graph
- Remove uninformative branches
11Leaf Node Classification
- Shrunken centroid classification (Tibshirani et al. 2002)
- Classification according to distance to centroids
- Regularization via gene shrinkage
- Determine probability-like values as
classification results
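The leaf-node steps above can be sketched roughly as follows. This is a simplified illustration, not the published procedure: it standardizes by the pooled within-class standard deviation only and omits the class-size factor, the variance offset and the class priors of the full method. The data, the threshold `delta` and all function names are assumptions made for the example.

```python
import math

def soft_threshold(d, delta):
    """Shrink d toward zero by delta; values within delta of zero become 0."""
    return math.copysign(max(abs(d) - delta, 0.0), d)

def fit_shrunken_centroids(X, y, delta):
    """Return (shrunken centroid per class, pooled within-class std per gene)."""
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    centroids = {}
    for k in classes:
        rows = [row for row, label in zip(X, y) if label == k]
        centroids[k] = [sum(r[i] for r in rows) / len(rows) for i in range(p)]
    # Pooled within-class standard deviation (assumed non-zero for every gene).
    std = []
    for i in range(p):
        ss = sum((row[i] - centroids[label][i]) ** 2 for row, label in zip(X, y))
        std.append(math.sqrt(ss / (n - len(classes))))
    # Gene shrinkage: move each class centroid toward the overall centroid.
    shrunken = {
        k: [overall[i] + std[i] * soft_threshold((c[i] - overall[i]) / std[i], delta)
            for i in range(p)]
        for k, c in centroids.items()
    }
    return shrunken, std

def predict_proba(x, shrunken, std):
    """Probability-like class scores from standardized squared distances."""
    dist = {k: sum(((xi - ci) / si) ** 2 for xi, ci, si in zip(x, c, std))
            for k, c in shrunken.items()}
    best = min(dist.values())  # subtract the minimum for numerical stability
    expo = {k: math.exp(-(d - best) / 2) for k, d in dist.items()}
    total = sum(expo.values())
    return {k: v / total for k, v in expo.items()}

# Toy data: gene 0 separates the classes, gene 1 barely differs.
X = [[2.0, 0.2], [2.2, 0.0], [1.8, 0.1],
     [0.0, 0.0], [0.2, -0.1], [-0.2, 0.1]]
y = ["disease"] * 3 + ["control"] * 3
shrunken, std = fit_shrunken_centroids(X, y, delta=1.0)
proba = predict_proba([1.9, 0.05], shrunken, std)
```

With this threshold the weakly differing gene is shrunk to the overall centroid in both classes, so it no longer contributes to the decision, which is the regularizing effect of gene shrinkage.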
12Propagation of Classification
- Weighted averages
- Weight according to child performance
- Weights are normalized per inner node
[Figure: parent node Pa combines the diagnoses of child nodes C1, C2 and C3 using weights w1, w2 and w3]
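As a minimal sketch of the propagation step, the function below combines child diagnoses at one inner node by a weighted average with weights normalized within the node. The numeric weights and scores are hypothetical; how the weights are derived from child performance is not specified here.

```python
def propagate(child_probs, child_weights):
    """Weighted average of child diagnoses at one inner node.

    child_probs: probability-like disease scores of the children.
    child_weights: non-negative performance-based weights (hypothetical values).
    Dividing by the weight total normalizes the weights within the node.
    """
    total = sum(child_weights)
    if total == 0:
        return None  # no informative child: the node carries no diagnosis
    return sum(w * p for w, p in zip(child_weights, child_probs)) / total

# Three children C1, C2, C3 with weights w1, w2, w3.
parent = propagate([0.9, 0.6, 0.2], [2.0, 1.0, 1.0])
```

Applying this bottom-up, from the leaves to the root, yields increasingly general diagnoses in the inner nodes.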
13Graph Shrinkage
- Weights of nodes are shrunken by a constant
- Negative weights are set to zero → uninformative branches vanish
- Best shrinkage level chosen in cross-validation
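The shrinkage step can be illustrated with a short sketch. It assumes that a node whose shrunken weight is zero disappears together with everything reachable only through it; the graph, the weights and the shrinkage constant `delta` are hypothetical.

```python
def shrink_weights(weights, delta):
    """Subtract a constant from every node weight and clip negatives to zero."""
    return {node: max(w - delta, 0.0) for node, w in weights.items()}

def surviving_nodes(root, children, weights):
    """Nodes reachable from the root through positive-weight nodes only."""
    kept = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in kept:
            continue  # a node with several parents is visited only once
        kept.add(node)
        stack.extend(c for c in children.get(node, []) if weights.get(c, 0.0) > 0)
    return sorted(kept)

# Hypothetical classifier graph and performance-based weights.
children = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
weights = {"A": 0.8, "B": 0.15, "A1": 0.6, "A2": 0.25, "B1": 0.9}

shrunk = shrink_weights(weights, delta=0.2)
surviving = surviving_nodes("root", children, shrunk)
```

In this toy graph branch B vanishes, and with it the leaf B1, even though B1 itself has a high weight. In practice one would repeat this for a grid of shrinkage levels and keep the one with the best cross-validated performance.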
14Biased Classifier Evaluation
Calibration of Sensitivity and Specificity
Shrinkage Parameter
Worst Performance in Leaf Node
?Cj DCi ( ?j Dj )-1
15Classifier Heterogeneity
- Difference between two classifiers: measures inconsistency of classifications
- Nodes: redundancy
- Graphs: redundancy (K? nodes of the shrunken graph)
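One way to picture these quantities is sketched below. The slides do not give the exact formula, so this example assumes that inconsistency is the fraction of samples on which two classifiers disagree, and that a graph-level value is the average over all pairs of node classifiers. Both choices, and the node names, are assumptions for illustration only.

```python
from itertools import combinations

def inconsistency(calls_a, calls_b):
    """Fraction of samples on which two classifiers give different calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty call lists of equal length")
    return sum(a != b for a, b in zip(calls_a, calls_b)) / len(calls_a)

def mean_pairwise_inconsistency(calls_by_node):
    """Average inconsistency over all pairs of node classifiers in a graph."""
    pairs = list(combinations(sorted(calls_by_node), 2))
    if not pairs:
        raise ValueError("need at least two node classifiers")
    total = sum(inconsistency(calls_by_node[a], calls_by_node[b]) for a, b in pairs)
    return total / len(pairs)

# Hypothetical disease (1) / control (0) calls of three nodes on four patients.
calls_by_node = {
    "n1": [1, 1, 0, 0],
    "n2": [1, 1, 0, 0],
    "n3": [1, 0, 0, 1],
}
pair_value = inconsistency(calls_by_node["n1"], calls_by_node["n3"])
graph_value = mean_pairwise_inconsistency(calls_by_node)
```

Under this reading, two nodes with zero inconsistency (n1 and n2 here) are redundant, while a high graph-level value indicates heterogeneous node classifiers.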
16Calibration
- Sensitivity vs. Specificity??
- Best classifiers set to control prevalence
- More molecular symptoms: set ? higher than control prevalence
- Heterogeneity vs. Performance ?
- Molecular symptoms are heterogeneous
- Thus high ? eliminates them
17Leukemia Data Set
- Data set by Yeoh et al. 2002
- Acute lymphocytic leukemia
- 327 patients of 7 clinical sub-types
- Expression profiles by HG-U95Av2
- Task for illustration
- Detect MLL sub-type
- 20 MLL samples
- 109 test set / 218 training set
18Functional Annotations
- Focus on GO's Biological Process branch (8173 terms)
- 12625 probesets on the chip
- 8679 genes (68.7% of probesets)
- In 1359 leaf nodes
- 845 inner nodes (total 2204 nodes)
19MLL Classifier
- 2796 genes accessible through 32 nodes
20MLL Stratification
21Conclusions
- Semi-supervised classification
- Detect sub-classes
- In labelled disease groups
- Functional annotation
- Use in an a priori fashion
- To find biologically focused signatures → molecular symptoms
- Resolve complex clinical phenotypes (stratification through molecular symptoms)