Decomposing Complex Clinical Phenotypes by Biologically Structured Microarray Analysis

Transcript and Presenter's Notes


1
Decomposing Complex Clinical Phenotypes by
Biologically Structured Microarray Analysis
  • Claudio Lottaz and Rainer Spang

Berlin Center for Genome Based Bioinformatics,
Berlin (Germany)
Computational Diagnostics, Max Planck Institute
for Molecular Genetics, Berlin (Germany)
2
Overview
  • Introduction
  • Using functional annotation for semi-supervised
    classification
  • Heterogeneity vs. performance
  • Evaluation on cancer related data
  • Conclusions

3
Tumor Classification
  • Setting
  • Data: gene expression profiles
  • Goal: prediction/classification of
    outcome/sub-type
  • More formally
  • Many expression levels measured
  • Samples labelled as disease and control
  • Train classifier

4
State-of-the-Art
  • Various powerful methods
  • Support vector machines
  • Shrunken centroids...
  • Regularization to fight overfitting
  • Feature selection
  • Large margins...
  • Common hypothesis: generate a single molecular
    signature

5
Complex Phenotypes
  • A single clinical phenotype may be caused by
    different molecular mechanisms
  • Our approach: discover several sub-classes in
    the disease group
  • Each sub-class has a homogeneous molecular
    signature

6
Molecular Symptoms
  • Classical signatures are globally optimal
  • They have no biological focus
  • Genes are co-regulated and thus correlated, so in a
    global signature genes can be replaced with
    little loss
  • Molecular Symptom
  • A functionally focused signature to identify a
    disease sub-class
  • High specificity, sub-optimal sensitivity

7
Molecular Patient Stratification
  • Patterns of molecular symptoms define a
    molecular patient stratification

8
Using Functional Annotations: A Priori vs. A
Posteriori
  • Common procedure
  • Our suggestion

9
Gene Ontology
  • Biological terms in a directed graph
  • Genes annotated to terms
  • Levels represent specificity of terms
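
One way to picture the structure on this slide is a small directed graph held in dictionaries. This is only a toy sketch: the term names, gene IDs and helper functions are invented for illustration, and no real Gene Ontology data is loaded.

```python
from collections import deque

# Hypothetical toy ontology: term -> more specific child terms.
children = {
    "biological_process": ["metabolism", "cell_cycle"],
    "metabolism": ["lipid_metabolism"],
    "cell_cycle": [],
    "lipid_metabolism": [],
}

# Hypothetical annotations: term -> genes annotated directly to it.
annotations = {
    "metabolism": {"g1"},
    "lipid_metabolism": {"g2", "g3"},
    "cell_cycle": {"g3", "g4"},
}


def leaf_terms():
    """Terms without children, i.e. the most specific terms."""
    return sorted(term for term, kids in children.items() if not kids)


def accessible_genes(term):
    """Genes annotated to a term or to any of its descendants."""
    genes = set(annotations.get(term, set()))
    for child in children[term]:
        genes |= accessible_genes(child)
    return genes


def levels(root):
    """Shortest distance of each term from the root; deeper means more specific."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children[term]:
            if child not in depth:
                depth[child] = depth[term] + 1
                queue.append(child)
    return depth


leaves = leaf_terms()
root_genes = accessible_genes("biological_process")
depths = levels("biological_process")
```

Here the leaves are the terms where a local classifier would be trained, and the inner nodes collect the genes reachable through their descendants.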

10
Structured Analysis of Microarrays
  • Classification in leaf nodes
  • Regularized multivariate classifier
  • Local signatures
  • Diagnosis propagation
  • Combine child diagnoses in inner nodes
  • Generate more general diagnoses
  • Regularization
  • Shrink the classifier graph
  • Remove uninformative branches

11
Leaf Node Classification
  • Shrunken centroid classification (Tibshirani et
    al. 2002)
  • Classification according to distance to centroids
  • Regularization via gene shrinkage
  • Determine probability-like values as
    classification results
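
The idea of the leaf-node classifier can be sketched in plain Python. This is a simplified illustration and not the authors' implementation: it shrinks standardized centroid differences by a threshold `delta` and leaves out the class-size and offset terms of the published method. The function names and the toy data are assumptions.

```python
import math


def fit_shrunken_centroids(X, y, delta):
    """X: list of samples (lists of floats); y: labels; delta: shrinkage amount."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    overall = [sum(x[i] for x in X) / n for i in range(p)]
    centroids = {}
    for k in classes:
        rows = [x for x, label in zip(X, y) if label == k]
        centroids[k] = [sum(r[i] for r in rows) / len(rows) for i in range(p)]
    # Pooled within-class standard deviation per gene (1.0 if it is zero).
    sd = []
    for i in range(p):
        ss = sum((x[i] - centroids[label][i]) ** 2 for x, label in zip(X, y))
        sd.append(math.sqrt(ss / (n - len(classes))) or 1.0)
    # Soft-threshold each standardized difference from the overall centroid.
    shrunken = {}
    for k in classes:
        shrunken[k] = []
        for i in range(p):
            d = (centroids[k][i] - overall[i]) / sd[i]
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)
            shrunken[k].append(overall[i] + sd[i] * d_shrunk)
    priors = {k: y.count(k) / n for k in classes}
    return shrunken, sd, priors


def predict_proba(x, model):
    """Probability-like values from standardized distances to the shrunken centroids."""
    shrunken, sd, priors = model
    scores = {}
    for k, c in shrunken.items():
        dist = sum(((xi - ci) / si) ** 2 for xi, ci, si in zip(x, c, sd))
        scores[k] = -0.5 * dist + math.log(priors[k])
    top = max(scores.values())
    expd = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(expd.values())
    return {k: v / total for k, v in expd.items()}


# Toy data: gene 0 separates the classes, gene 1 does not.
X = [[2.0, 0.1], [2.2, -0.1], [1.8, 0.0], [0.0, 0.1], [0.2, -0.1], [-0.2, 0.0]]
y = ["disease"] * 3 + ["control"] * 3
model = fit_shrunken_centroids(X, y, delta=0.5)
probs = predict_proba([1.9, 0.0], model)
```

In this toy example, the uninformative gene has the same shrunken centroid value in both classes, so it no longer affects which class is chosen. That is the sense in which shrinkage acts as feature selection.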

12
Propagation of Classification
  • Weighted averages
  • Weight according to child performance
  • Weights are normalized per inner node

[Figure: inner node Pa with child nodes C1, C2, C3 and edge weights w1, w2, w3]
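
The propagation step can be sketched as a normalized weighted average. The function name and the example numbers are illustrative. The slides do not say how weights are computed from child performance, so the weights are simply given here.

```python
def propagate(child_probs, child_weights):
    """Combine child diagnoses into one parent diagnosis.

    child_probs: probability-like values from the children (C1, C2, ...).
    child_weights: non-negative weights (w1, w2, ...), assumed to reflect
    child performance; they are normalized within this inner node.
    """
    total = sum(child_weights)
    if total == 0:
        return None  # no informative child, so no diagnosis at this node
    return sum(w * p for w, p in zip(child_weights, child_probs)) / total


# Three children, the first one weighted most strongly.
parent = propagate([0.9, 0.6, 0.2], [0.5, 0.3, 0.2])
# Rescaling all weights leaves the result unchanged because of the normalization.
parent_rescaled = propagate([0.9, 0.6, 0.2], [5, 3, 2])
```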
13
Graph Shrinkage
  • Weights of nodes are shrunken by a constant
  • Negative weights are set to zero, so uninformative
    branches vanish
  • Best shrinkage level chosen in cross-validation
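
The shrinkage rule above can be sketched in a few lines. The parameter name `kappa`, the helper names and the example weights are assumptions. Choosing the shrinkage level by cross-validation is not shown.

```python
def shrink_weights(weights, kappa):
    """Subtract a constant from every node weight and clip negatives to zero."""
    return {node: max(w - kappa, 0.0) for node, w in weights.items()}


def prune(weights):
    """Drop nodes whose weight has been shrunk to zero."""
    return {node: w for node, w in weights.items() if w > 0}


def normalize(weights):
    """Rescale the surviving weights of one inner node to sum to one."""
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


raw = {"C1": 0.5, "C2": 0.3, "C3": 0.1}
shrunk = shrink_weights(raw, kappa=0.2)
surviving = normalize(prune(shrunk))
```

In this toy example, C3 falls to zero and vanishes, while C1 and C2 share the parent's weight at roughly 0.75 and 0.25.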

14
Biased Classifier Evaluation
Calibration of Sensitivity and Specificity
Shrinkage Parameter
Worst Performance in Leaf Node
?Cj DCi ( ?j Dj )-1
15
Classifier Heterogeneity
  • Difference between two classifiers: measures
    inconsistency of classifications
  • Nodes: redundancy
  • Graphs: redundancy (K? nodes of the shrunken
    graph)
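
The transcript does not give the exact formula for the difference between two classifiers. As a loose illustration only, one simple inconsistency measure is the fraction of samples on which two classifiers disagree. The function below is an assumption, not the measure used in the talk.

```python
def disagreement(labels_a, labels_b):
    """Fraction of samples that two classifiers label differently."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both classifiers must label the same samples")
    if not labels_a:
        raise ValueError("at least one sample is required")
    return sum(a != b for a, b in zip(labels_a, labels_b)) / len(labels_a)


# Two classifiers agree on samples 1 and 3 and disagree on samples 2 and 4.
rate = disagreement([1, 1, 0, 0], [1, 0, 0, 1])
```

Under this reading, two nodes with a disagreement close to zero would be redundant, and a higher value would indicate heterogeneous classifiers.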

16
Calibration
  • Sensitivity vs. Specificity??
  • Best classifiers set to control prevalence
  • More molecular symptoms set ? higher than
    control prevalence
  • Heterogeneity vs. Performance ?
  • Molecular symptoms are heterogeneous
  • Thus high ? eliminates them

17
Leukemia Data Set
  • Data set by Yeoh et al. 2002
  • Acute lymphocytic leukemia
  • 327 patients of 7 clinical sub-types
  • Expression profiles by HG-U95Av2
  • Task for illustration
  • Detect MLL sub-type
  • 20 MLL samples
  • 109 test set / 218 training set

18
Functional Annotations
  • Focus on GO's Biological Process branch (8173
    terms)
  • 12625 probesets on the chip
  • 8679 genes (68.7% of probesets)
  • In 1359 leaf nodes
  • 845 inner nodes (total 2204 nodes)

19
MLL Classifier
  • 2796 genes accessible through 32 nodes

20
MLL Stratification
21
Conclusions
  • Semi-supervised classification
  • Detect sub-classes
  • In labelled disease groups
  • Functional annotation
  • Use in an a priori fashion
  • To find biologically focused signatures, i.e.
    molecular symptoms
  • Resolve complex clinical phenotypes
    (stratification through molecular symptoms)