Part 5: Linking Microarray Data with Survival Analysis

Slides: 39
Provided by: kar143
1
Part 5: Linking Microarray Data with Survival Analysis
2
Use of microarray data via model-based
classification in the study and prediction of
survival from lung cancer (Ben-Tovim Jones et
al., 2005)
3
Problems
  • Censored observations: the time of occurrence of the event (death) has not yet been observed.
  • Small sample sizes: study limited by patient numbers.
  • Specific patient group: is the study applicable to other populations?
  • Difficulty in integrating different studies (different microarray platforms).

4
A Case Study: The Lung Cancer Data Sets from CAMDA03
Four independently acquired lung cancer data sets (Harvard, Michigan, Stanford and Ontario).
The challenge: To integrate information from different data sets (2 Affy chips of different versions, 2 cDNA arrays).
The final goal: To make an impact on cancer biology and eventually patient care.
Especially, we welcome the methodology of survival analysis using microarrays for cancer prognosis (Park et al. Bioinformatics S120, 2002).
5
Methodology of Survival Analysis using Microarrays
Cluster the tissue samples (e.g. using hierarchical clustering), then compare the survival curves for each cluster using a non-parametric Kaplan-Meier analysis (Alizadeh et al. 2000).
Park et al. (2002) and Nguyen and Rocke (2002) used partial least squares with the proportional hazards model of Cox.
Unsupervised vs. Supervised Methods: Semi-supervised approach of Bair and Tibshirani (2004), to combine gene expression data with the clinical data.
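The Kaplan-Meier step in the cluster-then-compare approach can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the software used in the cited studies; the function name kaplan_meier and the toy follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. recurrence) was observed, 0 if censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed exactly at time t
        n_leaving = 0  # events plus censorings at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Toy data (hypothetical): five patients, two of them censored.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 0, 1, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t = {t}: S(t) = {s:.3f}")
```

Running the function once per tissue cluster gives the cluster-specific curves that are then compared. Censored patients leave the risk set without lowering the curve, which is how the estimator uses incomplete follow-up.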
6
AIM: To link gene-expression data with survival from lung cancer in the CAMDA03 challenge.
A. CLUSTER ANALYSIS: We apply a model-based clustering approach to classify tumour tissues on the basis of microarray gene expression.
B. SURVIVAL ANALYSIS: The association between the clusters so formed and patient survival (recurrence) times is established.
C. DISCRIMINANT ANALYSIS: We demonstrate the potential of the clustering-based prognosis as a predictor of the outcome of disease.
7
Lung Cancer
Approx. 80% of lung cancer patients have NSCLC (of which adenocarcinoma is the most common form).
All patients diagnosed with NSCLC are treated on the basis of stage at presentation (tumour size, lymph node involvement and presence of metastases).
Yet 30% of patients with resected stage I lung cancer will die of metastatic cancer within 5 years of surgery.
Want a prognostic test for early-stage lung adenocarcinoma to identify patients more likely to recur, and who would therefore benefit from adjuvant therapy.
8
Lung Cancer Data Sets
(see http://www.camda.duke.edu/camda03)
Wigle et al. (2002), Garber et al. (2001),
Bhattacharjee et al. (2001), Beer et al. (2002).
9
Heat Map for 2880 Ontario Genes (39 Tissues)
[Figure: heat map; axes: genes and tissues]
11
Heat Maps for the 20 Ontario Gene-Groups (39 Tissues)
[Figure: heat maps; axes: genes and tissues]
Tissues are ordered as Recurrence (1-24) and Censored (25-39).
12
Expression Profiles for Useful Metagenes (Ontario 39 Tissues)
[Figure: panels for Gene Groups 1, 2, 19 and 20; vertical axis: log expression value; horizontal axis: tissues, ordered as Recurrence (1-24) and Censored (25-39); legend: Our Tissue Cluster 1, Our Tissue Cluster 2]
13
Tissue Clusters
CLUSTER ANALYSIS via EMMIX-GENE of 20 METAGENES yields TWO CLUSTERS:
CLUSTER 1 (31): 23 (recurrence) plus 8 (censored). Poor-prognosis.
CLUSTER 2 (8): 1 (recurrence) plus 7 (censored). Good-prognosis.
14
SURVIVAL ANALYSIS: LONG-TERM SURVIVOR (LTS) MODEL
$S(t) = \pi_1 S_1(t) + \pi_2$,
where T is time to recurrence and $\pi_1 = 1 - \pi_2$ is the prior prob. of recurrence.
Adopt Weibull model for the survival function for recurrence $S_1(t)$.
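A long-term survivor mixture of this kind is commonly written as S(t) = pi1 * S1(t) + pi2 with pi2 = 1 - pi1, so the curve levels off at pi2 instead of falling to zero. The sketch below evaluates that form with a Weibull S1(t). The parameterisation S1(t) = exp(-(t/scale)^shape), the function names and the parameter values are assumptions for illustration; the slides do not give the fitted parameters.

```python
import math


def weibull_survival(t, shape, scale):
    """Weibull survival function, assumed form exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def lts_survival(t, pi1, shape, scale):
    """Long-term survivor mixture: pi1 * S1(t) + (1 - pi1).

    pi1 is the prior probability of recurrence; the remaining
    1 - pi1 of patients are treated as never recurring.
    """
    return pi1 * weibull_survival(t, shape, scale) + (1.0 - pi1)


# Hypothetical parameter values, chosen only to show the plateau.
for t in (0, 250, 500, 1000, 5000):
    print(t, round(lts_survival(t, pi1=0.6, shape=1.5, scale=500.0), 3))
```

In practice pi1 and the Weibull parameters would be estimated from the censored recurrence times, for example by maximum likelihood, and the fitted curve compared with the Kaplan-Meier estimate.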
15
Fitted LTS Model vs. Kaplan-Meier
16
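PCA of Tissues Based on Metagenes
[Figure: scatter plot; axes: First PC and Second PC]
17
PCA of Tissues Based on Metagenes
[Figure: scatter plot; axes: First PC and Second PC]
18
PCA of Tissues Based on All Genes (via SVD)
[Figure: scatter plot; axes: First PC and Second PC]
19
PCA of Tissues Based on All Genes (via SVD)
[Figure: scatter plot; axes: First PC and Second PC]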
20
Cluster-Specific Kaplan-Meier Plots
21
Survival Analysis for Ontario Dataset
  • Nonparametric analysis

Cluster  No. of Tissues  No. of Censored  Mean time to Failure (±SE)
1        29              8                665 ± 85.9
2        8               7                1388 ± 155.7

A significant difference between Kaplan-Meier estimates for the two clusters (P = 0.027).
  • Cox's proportional hazards analysis

Variable                      Hazard ratio (95% CI)  P-value
Cluster 1 vs. Cluster 2       6.78 (0.9-51.5)        0.06
Tumor stage (I vs. II & III)  1.07 (0.57-2.0)        0.83
22
Discriminant Analysis (Supervised Classification)
A prognosis classifier was developed to predict the class of origin of a tumor tissue with a small error rate after correction for the selection bias.
A support vector machine (SVM) was adopted to identify important genes that play a key role in predicting the clinical outcome, using all the genes, and the metagenes.
A cross-validation (CV) procedure was used to calculate the prediction error, after correction for the selection bias.
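Selection bias arises when genes are chosen using all tissues and the error is then cross-validated only over the classifier fit; correcting it means repeating the gene selection inside every training fold. The sketch below illustrates that idea with the standard library only. It is not the authors' procedure: a nearest-centroid rule with a mean-difference gene ranking stands in for the SVM with recursive feature elimination, and all function names and the simulated data are hypothetical.

```python
import random


def select_features(X, y, k):
    """Rank features by absolute difference in class means; keep the top k."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def fit_centroids(X, y, feats):
    """Class centroids restricted to the selected features."""
    cents = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        cents[lab] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    return cents


def predict(cents, row, feats):
    """Assign the class whose centroid is nearest (squared distance)."""
    def dist(lab):
        return sum((row[j] - c) ** 2 for j, c in zip(feats, cents[lab]))
    return min((0, 1), key=dist)


def cv_error(X, y, k, n_folds=10, select_inside=True, seed=0):
    """n_folds-fold CV error rate.

    select_inside=True  : features re-selected in each training fold
                          (corrected for selection bias)
    select_inside=False : features selected once on all the data
                          (biased, typically optimistic)
    Assumes every training fold contains both classes.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    global_feats = None if select_inside else select_features(X, y, k)
    errors = 0
    for test in folds:
        held_out = set(test)
        X_tr = [X[i] for i in idx if i not in held_out]
        y_tr = [y[i] for i in idx if i not in held_out]
        feats = select_features(X_tr, y_tr, k) if select_inside else global_feats
        cents = fit_centroids(X_tr, y_tr, feats)
        errors += sum(predict(cents, X[i], feats) != y[i] for i in test)
    return errors / len(X)


# Simulated pure-noise data (hypothetical): no gene carries class information,
# so an honest error estimate should sit near 0.5.
rng = random.Random(1)
X_noise = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(40)]
y_noise = [i % 2 for i in range(40)]
print("selection outside CV:", cv_error(X_noise, y_noise, k=10, select_inside=False))
print("selection inside CV: ", cv_error(X_noise, y_noise, k=10, select_inside=True))
```

On noise data the estimate with selection outside the folds tends to look much better than chance, because the selected genes have already seen the held-out labels; the size of the gap depends on the random draw.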
23
ONTARIO DATA (39 tissues): Support Vector Machine (SVM) with Recursive Feature Elimination (RFE)
[Figure: Ten-fold Cross-Validation Error Rate (CV10E) of Support Vector Machine (SVM), applied to g = 2 clusters (G1: 1-14, 16-29, 33, 36, 38; G2: 15, 30-32, 34, 35, 37, 39); vertical axis: Error Rate (CV10E), 0 to 0.12; horizontal axis: log2 (number of genes), 0 to 12]
24
STANFORD DATA
918 genes based on 73 tissue samples from 67
patients. Row and column normalized, retained
451 genes after select-genes step. Used 20
metagenes to cluster tissues. Retrieved
histological groups.
25
Heat Maps for the 20 Stanford Gene-Groups (73 Tissues)
[Figure: heat maps; axes: genes and tissues]
Tissues are ordered by their histological classification: Adenocarcinoma (1-41), Fetal Lung (42), Large cell (43-47), Normal (48-52), Squamous cell (53-68), Small cell (69-73).
26
STANFORD CLASSIFICATION
Cluster 1: 1-19 (good prognosis)
Cluster 2: 20-26 (long-term survivors)
Cluster 3: 27-35 (poor prognosis)
27
Heat Maps for the 15 Stanford Gene-Groups (35 Tissues)
[Figure: heat maps; axes: genes and tissues]
Tissues are ordered by the Stanford classification into AC groups: AC group 1 (1-19), AC group 2 (20-26), AC group 3 (27-35).
28
Expression Profiles for Top Metagenes (Stanford 35 AC Tissues)
[Figure: panels for Gene Groups 1, 2, 3 and 4; vertical axis: log expression value; horizontal axis: tissues; legend: Stanford AC group 1, Stanford AC group 2, Stanford AC group 3, Misallocated]
29
Cluster-Specific Kaplan-Meier Plots
30
Cluster-Specific Kaplan-Meier Plots
31
Survival Analysis for Stanford Dataset
  • Kaplan-Meier estimation

Cluster  No. of Tissues  No. of Censored  Mean time to Failure (±SE)
1        17              10               37.5 ± 5.0
2        5               0                5.2 ± 2.3

A significant difference in survival between clusters (P < 0.001).
  • Cox's proportional hazards analysis

Variable                       Hazard ratio (95% CI)  P-value
Cluster 3 vs. Clusters 1 & 2   13.2 (2.1-81.1)        0.005
Grade 3 vs. grades 1 or 2      1.94 (0.5-8.5)         0.38
Tumor size                     0.96 (0.3-2.8)         0.93
No. of tumors in lymph nodes   1.65 (0.7-3.9)         0.25
Presence of metastases         4.41 (1.0-19.8)        0.05
32
Survival Analysis for Stanford Dataset
  • Univariate Cox's proportional hazards analysis (metagenes)

Metagene  Coefficient (SE)  P-value
1         1.37 (0.44)       0.002
2         -0.24 (0.31)      0.44
3         0.14 (0.34)       0.68
4         -1.01 (0.56)      0.07
5         0.66 (0.65)       0.31
6         -0.63 (0.50)      0.20
7         -0.68 (0.57)      0.24
8         0.75 (0.46)       0.10
9         -1.13 (0.50)      0.02
10        0.73 (0.39)       0.06
11        0.35 (0.50)       0.48
12        -0.55 (0.41)      0.18
13        -0.61 (0.48)      0.20
14        0.22 (0.36)       0.53
15        1.70 (0.92)       0.06
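The coefficient, standard error and P-value columns of a Cox table are linked if the P-values come from a two-sided Wald test, which is an assumption here; the slides do not say which test was used. Under that assumption the sketch below recovers the P-value, hazard ratio and an approximate 95% confidence interval from a coefficient and its SE. The function name wald_summary is hypothetical.

```python
import math


def wald_summary(coef, se, z_crit=1.96):
    """Wald statistics for one Cox regression coefficient.

    Returns (z, two-sided p-value, hazard ratio, (CI lower, CI upper)).
    The hazard ratio is exp(coef); the interval is exp(coef +/- z_crit * se).
    """
    z = coef / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    hazard_ratio = math.exp(coef)
    ci = (math.exp(coef - z_crit * se), math.exp(coef + z_crit * se))
    return z, p_value, hazard_ratio, ci


# Metagene 1 from the univariate table: coefficient 1.37, SE 0.44.
z, p, hr, (lo, hi) = wald_summary(1.37, 0.44)
print(f"z = {z:.2f}, p = {p:.4f}, HR = {hr:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

For metagene 1 the computed P-value rounds to the tabulated 0.002, which is consistent with the Wald assumption for that row.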
33
Survival Analysis for Stanford Dataset
  • Multivariate Cox's proportional hazards analysis (metagenes)

Metagene  Coefficient (SE)  P-value
1         3.44 (0.95)       0.0003
2         -1.60 (0.62)      0.010
8         -1.55 (0.73)      0.033
11        1.16 (0.54)       0.031

The final model consists of four metagenes.
34
STANFORD DATA: Support Vector Machine (SVM) with Recursive Feature Elimination (RFE)
[Figure: Ten-fold Cross-Validation Error Rate (CV10E) of Support Vector Machine (SVM), applied to g = 2 clusters; vertical axis: Error Rate (CV10E), 0 to 0.07; horizontal axis: log2 (number of genes), 0 to 10]
35
CONCLUSIONS
  • We applied a model-based clustering approach to classify tumors using their gene signatures into
    (a) clusters corresponding to tumor type;
    (b) clusters corresponding to clinical outcomes for tumors of a given subtype.
  • In (a), almost perfect correspondence between cluster and tumor type, at least for non-AC tumors (but not in the Ontario dataset).

36
CONCLUSIONS (cont.)
The clusters in (b) were identified with clinical
outcomes (e.g. recurrence/recurrence-free and
death/long-term survival). We were able to show
that gene-expression data provide prognostic
information, beyond that of clinical indicators
such as stage.
37
CONCLUSIONS (cont.)
Based on the tissue clusters, a discriminant analysis using support vector machines (SVM) further demonstrated the potential of gene expression as a tool for guiding treatment and patient care for lung cancer patients.
This supervised classification procedure was used to provide marker genes for prediction of clinical outcomes, in addition to those provided by the cluster-genes step in the initial unsupervised classification.
38
LIMITATIONS

Small number of tumors available (e.g. Ontario and Stanford datasets).
Clinical data available for only subsets of the tumors, often for only one tumor type (AC).
High proportion of censored observations limits comparison of survival rates.