Development and Validation of Predictive Classifiers using Gene Expression Profiles

- Richard Simon, D.Sc.
- Chief, Biometric Research Branch
- National Cancer Institute
- http://brb.nci.nih.gov

BRB Website: brb.nci.nih.gov

- Powerpoint presentations and audio files
- Reprints and Technical Reports
- BRB-ArrayTools software
- BRB-ArrayTools Data Archive
- 100 published cancer gene expression datasets with clinical annotations
- Sample Size Planning for Clinical Trials with Predictive Biomarkers


Types of Clinical Outcome

- Survival or disease-free survival
- Response to therapy

- 90 publications identified that met criteria
- Abstracted information for all 90
- Performed detailed review of statistical analysis for the 42 papers published in 2004

Major Flaws Found in 40 Studies Published in 2004

- Inadequate control of multiple comparisons in gene finding
- 9/23 studies had unclear or inadequate methods to deal with false positives
- 10,000 genes x .05 significance level = 500 false positives
- Misleading report of prediction accuracy
- 12/28 reports based on incomplete cross-validation
- Misleading use of cluster analysis
- 13/28 studies invalidly claimed that expression clusters based on differentially expressed genes could help distinguish clinical outcomes
- 50% of studies contained one or more major flaws


Kinds of Biomarkers

- Surrogate endpoint
- Pre/post rx, early measure of clinical outcome
- Pharmacodynamic
- Pre/post rx, measures an effect of rx on disease
- Prognostic
- Which patients need rx
- Predictive
- Which patients are likely to benefit from a specific rx
- Product characterization

Cardiac Arrhythmia Suppression Trial

- Ventricular premature beats were proposed as a surrogate for survival
- Antiarrhythmic drugs suppressed ventricular premature beats but killed patients at approximately 2.5 times the rate of placebo

Prognostic Biomarkers

- Most prognostic factors are not used because they are not therapeutically relevant
- Most prognostic factor studies are poorly designed
- They are not focused on a clear therapeutically relevant objective
- They use a convenience sample of patients for whom tissue is available. Generally the patients are too heterogeneous to support therapeutically relevant conclusions
- They address statistical significance rather than predictive accuracy relative to standard prognostic factors

Pusztai et al. The Oncologist 8:252-8, 2003

- 939 articles on prognostic markers or prognostic factors in breast cancer in the past 20 years
- ASCO guidelines recommend routine testing only for ER, PR and HER-2 in breast cancer
- "With the exception of ER or progesterone receptor expression and HER-2 gene amplification, there are no clinically useful molecular predictors of response to any form of anticancer therapy."

Prognostic and Predictive Classifiers

- Most cancer treatments benefit only a minority of patients to whom they are administered
- Particularly true for molecularly targeted drugs
- Being able to predict which patients are likely to benefit would
- Save patients from unnecessary toxicity, and enhance their chance of receiving a drug that helps them
- Help control medical costs
- Improve the success rate of clinical drug development


- Molecularly targeted drugs may benefit a relatively small population of patients with a given primary site/stage of disease
- Iressa
- Herceptin


Prognostic Biomarkers Can be Therapeutically Relevant

- 3-5% of node-negative ER+ breast cancer patients require or benefit from systemic rx other than endocrine rx
- Prognostic biomarker development should focus on a specific therapeutic decision context

B-14 Results: Relapse-Free Survival

Paik et al, SABCS 2003

Key Features of OncotypeDx Development

- Identification of important therapeutic decision context
- Prognostic marker development was based on patients with node-negative ER-positive breast cancer receiving tamoxifen as only systemic treatment
- Use of patients in NSABP clinical trials
- Staged development and validation
- Separation of data used for test development from data used for test validation
- Development of robust assay with rigorous analytical validation
- 21-gene RT-PCR assay for FFPE tissue
- Quality assurance by single reference laboratory operation

Predictive Biomarkers

- Cancers of a primary site are often a heterogeneous grouping of diverse molecular diseases
- The molecular diseases vary enormously in their responsiveness to a given treatment
- It is feasible (but difficult) to develop prognostic markers that identify which patients need systemic treatment and which have tumors likely to respond to a given treatment
- e.g. breast cancer and ER/PR, Her2

Mutations / Copy number changes / Translocations / Expression profile → Treatment

DNA Microarray Technology

- Powerful tool for understanding mechanisms and enabling predictive medicine
- Challenges ability of biomedical scientists to use effectively to produce biological knowledge or clinical utility
- Challenges statisticians with new problems for which existing analysis paradigms are often inapplicable
- Excessive hype and skepticism

Myth

- That microarray investigations should be unstructured data-mining adventures without clear objectives

- Good microarray studies have clear objectives, but not generally gene-specific mechanistic hypotheses
- Design and analysis methods should be tailored to study objectives

Good Microarray Studies Have Clear Objectives

- Class Comparison
- Find genes whose expression differs among predetermined classes
- Find genes whose expression varies over a time course in response to a defined stimulus
- Class Prediction
- Prediction of predetermined class (phenotype) using information from gene expression profile
- Survival risk group prediction
- Class Discovery
- Discover clusters of specimens having similar expression profiles
- Discover clusters of genes having similar expression profiles

Class Comparison and Class Prediction

- Not clustering problems
- Global similarity measures generally used for clustering arrays may not distinguish classes
- Don't control multiplicity or distinguish data used for classifier development from data used for classifier evaluation
- Supervised methods
- Require multiple biological samples from each class

Levels of Replication

- Technical replicates
- RNA sample divided into multiple aliquots and re-arrayed
- Biological replicates
- Multiple subjects
- Replication of the tissue culture experiment

- Biological conclusions generally require independent biological replicates. The power of statistical methods for microarray data depends on the number of biological replicates.
- Technical replicates are useful insurance to ensure that at least one good quality array of each specimen will be obtained.

Class Prediction

- Predict which tumors will respond to a particular treatment
- Predict which patients will relapse after a particular treatment

Microarray Platforms for Developing Predictive Classifiers

- Single label arrays
- Affymetrix GeneChips
- Dual label arrays using common reference design
- Dye swaps are unnecessary

Common Reference Design

          Array 1   Array 2   Array 3   Array 4
RED       A1        A2        B1        B2
GREEN     R         R         R         R

Ai = ith specimen from class A
Bi = ith specimen from class B
R = aliquot from reference pool

- The reference generally serves to control variation in the size of corresponding spots on different arrays and variation in sample distribution over the slide.
- The reference provides a relative measure of expression for a given gene in a given sample that is less variable than an absolute measure.
- The reference is not the object of comparison.
- The relative measure of expression will be compared among biologically independent samples from different classes.
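A tiny numeric sketch of why the reference channel helps: the intensities below are hypothetical spot readings (not from the talk), with array 2 scanned at twice the overall brightness of array 1. The absolute red-channel values differ, but the log-ratio against the shared reference aliquot is identical, which is the "relative measure" the slide describes.

```python
import math

# Hypothetical spot intensities; array2 is uniformly 2x brighter than array1.
red   = {"array1": 1200.0, "array2": 2400.0}   # specimen channel (A1, A2)
green = {"array1":  400.0, "array2":  800.0}   # shared reference aliquot R

# Relative expression measure: log2(specimen / reference) per array.
rel = {a: math.log2(red[a] / green[a]) for a in red}
print(rel)  # both arrays give the same relative measure, log2(3)
```

The array-wide scale factor cancels in the ratio, so samples hybridized to different arrays become comparable.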


Class Prediction

- A set of genes is not a classifier
- Testing whether analysis of independent data results in selection of the same set of genes is not an appropriate test of predictive accuracy of a classifier

Components of Class Prediction

- Feature (gene) selection
- Which genes will be included in the model
- Select model type
- E.g. diagonal linear discriminant analysis, nearest-neighbor, etc.
- Fitting parameters (regression coefficients) for model
- Selecting value of tuning parameters
- Estimating prediction accuracy

Class Prediction ≠ Class Comparison

- The criteria for gene selection for class prediction and for class comparison are different
- For class comparison, false discovery rate is important
- For class prediction, predictive accuracy is important
- Demonstrating statistical significance of prognostic factors is not the same as demonstrating predictive accuracy.
- Statisticians are used to inference, not prediction
- Most statistical methods were not developed for p >> n prediction problems

Myth

- Complex classification algorithms such as neural networks perform better than simpler methods for class prediction.

Simple Gene Selection

- Select genes that are differentially expressed among the classes at a significance level α (e.g. 0.01)
- The α level is a tuning parameter
- For class comparison, false discovery rate is important
- For class prediction, predictive accuracy is important
- For prediction it is usually more serious to exclude an informative variable than to include some noise variables
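The simple gene selection above can be sketched in a few lines of pure Python. This is an illustrative toy, not the talk's software: the data are simulated, and the |t| cutoff of 2.9 is an assumed stand-in for a significance level of roughly α = 0.01 with 18 degrees of freedom.

```python
import random, math

def two_sample_t(x, y):
    """Unequal-variance two-sample t-statistic for one gene."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny + 1e-12)

def select_genes(class_a, class_b, t_cut):
    """Indices of genes with |t| above t_cut; each sample is a list of log-ratios."""
    p = len(class_a[0])
    chosen = []
    for j in range(p):
        t = two_sample_t([s[j] for s in class_a], [s[j] for s in class_b])
        if abs(t) > t_cut:
            chosen.append(j)
    return chosen

random.seed(0)
p, n = 200, 10                       # 200 genes, 10 samples per class
A = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
B = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
for s in B:                          # plant 5 truly differential genes (indices 0-4)
    for j in range(5):
        s[j] += 2.0
picked = select_genes(A, B, t_cut=2.9)  # |t| > 2.9 ~ alpha = 0.01 (assumed cutoff)
print(sorted(picked))
```

Most of the planted genes clear the cutoff, along with an occasional false positive, which is acceptable for prediction per the last bullet above.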

Optimal significance level cutoffs for gene selection. 50 differentially expressed genes out of 22,000 on n arrays.

2δ/σ (standardized difference)   n=10     n=30      n=50
1                                0.167    0.003     0.00068
1.25                             0.085    0.0011    0.00035
1.5                              0.045    0.00063   0.00016
1.75                             0.026    0.00036   0.00006
2                                0.015    0.0002    0.00002


Complex Gene Selection

- Small subset of genes which together give most accurate predictions
- Genetic algorithms
- Little evidence that complex feature selection is useful in microarray problems
- Failure to compare to simpler methods
- Improper use of cross-validation

Linear Classifiers for Two Classes

- Fisher linear discriminant analysis
- Diagonal linear discriminant analysis (DLDA): assumes features are uncorrelated
- Compound covariate predictor (Radmacher)
- Golub's weighted voting method
- Support vector machines with inner product kernel
- Perceptron

Fisher LDA

The Compound Covariate Predictor (CCP)

- Motivated by J. Tukey, Controlled Clinical Trials, 1993
- A compound covariate is built from the basic covariates (log-ratios)
- tj is the two-sample t-statistic for gene j
- xij is the log-expression measure of sample i for gene j
- Sum is over selected genes
- Threshold of classification: midpoint of the CCP means for the two classes
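In symbols, the bullets above amount to the following (S denotes the set of selected genes; the superscripts index the two classes; this reconstruction follows the definitions of tj and xij given above):

```latex
% Compound covariate for sample i, summing over the selected genes S
c_i = \sum_{j \in S} t_j \, x_{ij}
% Classification threshold: midpoint of the CCP means for the two classes
\hat{c} = \tfrac{1}{2}\left(\bar{c}^{(1)} + \bar{c}^{(2)}\right)
```

A new sample is assigned to whichever class its compound covariate value falls on relative to the threshold.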

Linear Classifiers for Two Classes

- Compound covariate predictor weights gene j by its t-statistic tj, instead of the variance-scaled mean difference used as the weight for DLDA

Support Vector Machine

Perceptrons

- Perceptrons are neural networks with no hidden layer and linear transfer functions between input and output
- Number of input nodes equals number of genes selected
- Number of output nodes equals number of classes minus 1
- Number of inputs may be major principal components of genes or major principal components of informative genes
- Perceptrons are linear classifiers

Other Simple Methods

- Nearest neighbor classification
- Nearest k-neighbors
- Nearest centroid classification
- Shrunken centroid classification

Nearest Neighbor Classifier

- To classify a sample in the validation set as being in outcome class 1 or outcome class 2, determine which sample in the training set its gene expression profile is most similar to.
- Similarity measure used is based on genes selected as being univariately differentially expressed between the classes
- Correlation similarity or Euclidean distance generally used
- Classify the sample as being in the same class as its nearest neighbor in the training set

When p >> n

- It is always possible to find a set of features and a weight vector for which the classification error on the training set is zero.
- Why consider more complex models?

- Artificial intelligence sells to journal reviewers and peers who cannot distinguish hype from substance when it comes to microarray data analysis.
- Comparative studies generally indicate that simpler methods work as well or better for microarray problems because they avoid overfitting the data.


Other Methods

- Top-scoring pairs
- CART
- Random Forest

Apparent Dimension Reduction Based Methods

- Principal component regression
- Supervised principal component regression
- Partial least squares
- Stepwise logistic regression

When There Are More Than 2 Classes

- Nearest neighbor type methods
- Decision tree of binary classifiers

Decision Tree of Binary Classifiers

- Partition the set of classes {1, 2, …, K} into two disjoint subsets S1 and S2
- e.g. S1 = {1}, S2 = {2, 3, 4}
- Develop a binary classifier for distinguishing the composite classes S1 and S2
- Compute the cross-validated classification error for distinguishing S1 and S2
- Repeat the above steps for all possible partitions in order to find the partition S1 and S2 for which the cross-validated classification error is minimized
- If S1 and S2 are not singleton sets, then repeat all of the above steps separately for the classes in S1 and S2 to optimally partition each of them
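The "all possible partitions" step above can be enumerated directly. This is a small illustrative helper (not from the talk's software): it lists every split of the class set into two nonempty disjoint subsets, fixing the first class in S1 so each unordered split appears exactly once.

```python
from itertools import combinations

def binary_partitions(classes):
    """All splits of the class set into two nonempty disjoint subsets
    (S1, S2); unordered, so each split is produced exactly once."""
    classes = list(classes)
    k = len(classes)
    rest = classes[1:]
    out = []
    for r in range(0, k):
        for extra in combinations(rest, r):
            s1 = {classes[0], *extra}   # classes[0] pinned to S1
            s2 = set(classes) - s1
            if s2:                       # skip the split with an empty S2
                out.append((s1, s2))
    return out

parts = binary_partitions([1, 2, 3, 4])
print(len(parts))   # 2^(K-1) - 1 = 7 candidate splits for K = 4
```

Each candidate split would then be scored by its cross-validated classification error, as the slide describes.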

Evaluating a Classifier

- "Prediction is difficult, especially the future." - Niels Bohr

Validating a Predictive Classifier

- Fit of a model to the same data used to develop it is no evidence of prediction accuracy for independent data
- Goodness of fit is not prediction accuracy
- Demonstrating statistical significance of prognostic factors is not the same as demonstrating predictive accuracy
- Demonstrating stability of selected genes is not demonstrating predictive accuracy of a model for independent data


Split-Sample Evaluation

- Training-set
- Used to select features, select model type, determine parameters and cut-off thresholds
- Test-set
- Withheld until a single model is fully specified using the training-set.
- Fully specified model is applied to the expression profiles in the test-set to predict class labels.
- Number of errors is counted
- Ideally test set data is from different centers than the training data and assayed at a different time

Cross-Validated Prediction (Leave-One-Out Method)

1. Full data set is divided into training and test sets (test set contains 1 specimen).
2. Prediction rule is built from scratch using the training set.
3. Rule is applied to the specimen in the test set for class prediction.
4. Process is repeated until each specimen has appeared once in the test set.
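The four steps above can be sketched as a pure-Python loop. This is a toy illustration with simulated data and an assumed simple classifier (nearest centroid on the top-|t| genes), not the talk's software; the essential point is that gene selection happens inside the loop, on the training set only.

```python
import random, math

def tstat(x, y):
    """Unequal-variance two-sample t-statistic for one gene."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny + 1e-12)

def select_top(data, labels, k):
    """Indices of the k genes with largest |t| on the given (training) samples."""
    a = [d for d, l in zip(data, labels) if l == 0]
    b = [d for d, l in zip(data, labels) if l == 1]
    scored = sorted(((abs(tstat([s[j] for s in a], [s[j] for s in b])), j)
                     for j in range(len(data[0]))), reverse=True)
    return [j for _, j in scored[:k]]

def centroid_predict(data, labels, genes, x):
    """Nearest-centroid rule restricted to the selected genes."""
    dists = []
    for c in (0, 1):
        cls = [d for d, l in zip(data, labels) if l == c]
        cent = [sum(s[j] for s in cls) / len(cls) for j in genes]
        dists.append(sum((x[j] - m) ** 2 for j, m in zip(genes, cent)))
    return 0 if dists[0] < dists[1] else 1

def loocv_error(data, labels, k=10):
    errors = 0
    for i in range(len(data)):                  # step 4: every specimen left out once
        tr = data[:i] + data[i + 1:]            # step 1: hold out one specimen
        trl = labels[:i] + labels[i + 1:]
        genes = select_top(tr, trl, k)          # step 2: rule rebuilt from scratch
        errors += centroid_predict(tr, trl, genes, data[i]) != labels[i]  # step 3
    return errors / len(data)

# Simulated data: 10 samples per class, 500 genes, 5 of them truly informative
random.seed(1)
p, n = 500, 10
data = [[random.gauss(0, 1) for _ in range(p)] for _ in range(2 * n)]
labels = [0] * n + [1] * n
for s in data[n:]:
    for j in range(5):
        s[j] += 2.0
err = loocv_error(data, labels)
print(err)
```

With a real class difference planted in the data, the cross-validated error comes out low.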

Leave-one-out Cross Validation

- Omit sample 1
- Develop multivariate classifier from scratch on training set with sample 1 omitted
- Predict class for sample 1 and record whether prediction is correct

Leave-one-out Cross Validation

- Repeat analysis for training sets with each single sample omitted one at a time
- e = number of misclassifications determined by cross-validation
- Subdivide e for estimation of sensitivity and specificity

- Cross-validation is only valid if the test set is not used in any way in the development of the model. Using the complete set of samples to select genes violates this assumption and invalidates cross-validation.
- With proper cross-validation, the model must be developed from scratch for each leave-one-out training set. This means that feature selection must be repeated for each leave-one-out training set.
- Simon R, Radmacher MD, Dobbin K, McShane LM. Pitfalls in the analysis of DNA microarray data. Journal of the National Cancer Institute 95:14-18, 2003.
- The cross-validated estimate of misclassification error is an estimate of the prediction error for the model fit by applying the specified algorithm to the full dataset

Prediction on Simulated Null Data

- Generation of Gene Expression Profiles
- 14 specimens (Pi is the expression profile for specimen i)
- Log-ratio measurements on 6000 genes
- Pi ~ MVN(0, I6000)
- Can we distinguish between the first 7 specimens (Class 1) and the last 7 (Class 2)?
- Prediction Method
- Compound covariate prediction (discussed later)
- Compound covariate built from the log-ratios of the 10 most differentially expressed genes.


Partial Cross-Validation of Random Data

- Generate data for p features and n cases identically distributed in two classes
- No model should predict more accurately than the flip of a fair coin
- Using all the data, select k << p features that appear most differentially expressed between the two classes
- Cross-validate the estimation of model parameters using the same k features for all LOOCV training sets
- The cross-validated estimate of prediction error will be near 0 over 99% of the time.
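This bias is easy to reproduce. The sketch below (an illustrative simulation, not the talk's software; nearest centroid stands in for the classifier) commits exactly the mistake described: on pure noise, genes are selected once using all samples, and only the model fitting is cross-validated. The resulting error estimate falls far below the honest 0.5 of a coin flip.

```python
import random, math

def tstat(x, y):
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny + 1e-12)

def select_top(data, labels, k):
    a = [d for d, l in zip(data, labels) if l == 0]
    b = [d for d, l in zip(data, labels) if l == 1]
    scored = sorted(((abs(tstat([s[j] for s in a], [s[j] for s in b])), j)
                     for j in range(len(data[0]))), reverse=True)
    return [j for _, j in scored[:k]]

def centroid_predict(data, labels, genes, x):
    dists = []
    for c in (0, 1):
        cls = [d for d, l in zip(data, labels) if l == c]
        cent = [sum(s[j] for s in cls) / len(cls) for j in genes]
        dists.append(sum((x[j] - m) ** 2 for j, m in zip(genes, cent)))
    return 0 if dists[0] < dists[1] else 1

# Pure noise: the labels carry no information about the expression data
random.seed(2)
p, n = 1000, 10
data = [[random.gauss(0, 1) for _ in range(p)] for _ in range(2 * n)]
labels = [0] * n + [1] * n

# THE MISTAKE: genes chosen once, using ALL samples, before cross-validation
genes = select_top(data, labels, 10)

errors = 0
for i in range(len(data)):
    tr = data[:i] + data[i + 1:]
    trl = labels[:i] + labels[i + 1:]
    errors += centroid_predict(tr, trl, genes, data[i]) != labels[i]
partial_cv_error = errors / len(data)
print(partial_cv_error)
```

Moving the `select_top` call inside the leave-one-out loop restores an error estimate near 0.5, as it should be on null data.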


Class Prediction

- Cluster analysis is frequently used in publications for class prediction in a misleading way

Fallacy of Clustering Classes Based on Selected Genes

- Even for arrays randomly distributed between classes, genes will be found that are significantly differentially expressed
- With 10,000 genes measured, about 500 false positives will be differentially expressed with p < 0.05
- Arrays in the two classes will necessarily cluster separately when using a distance measure based on genes selected to distinguish the classes


Myth

- Split-sample validation is superior to LOOCV or 10-fold CV for estimating prediction error


Simulated Data: 40 cases, 10 genes selected from 5000

Method              Estimate   Std Deviation
True                .078
Resubstitution      .007       .016
LOOCV               .092       .115
10-fold CV          .118       .120
5-fold CV           .161       .127
Split sample 1-1    .345       .185
Split sample 2-1    .205       .184
.632 bootstrap      .274       .084

Comparison of Internal Validation Methods: Molinaro, Pfeiffer & Simon

- For small sample sizes, LOOCV is much more accurate than split-sample validation
- Split-sample validation over-estimates prediction error
- For small sample sizes, LOOCV is preferable to 10-fold, 5-fold cross-validation or repeated k-fold versions
- For moderate sample sizes, 10-fold is preferable to LOOCV
- Some claims for bootstrap resampling for estimating prediction error are not valid for p >> n problems


Simulated Data: 40 cases

Method              Estimate   Std Deviation
True                .078
10-fold             .118       .120
Repeated 10-fold    .116       .109
5-fold              .161       .127
Repeated 5-fold     .159       .114
Split 1-1           .345       .185
Repeated split 1-1  .371       .065

DLBCL Data

Method           Bias    Std Deviation   MSE
LOOCV            -.019   .072            .008
10-fold CV       -.007   .063            .006
5-fold CV        .004    .07             .007
Split 1-1        .037    .117            .018
Split 2-1        .001    .119            .017
.632 bootstrap   -.006   .049            .004

Permutation Distribution of Cross-validated Misclassification Rate of a Multivariate Classifier

- Randomly permute class labels and repeat the entire cross-validation
- Re-do for all (or 1000) random permutations of class labels
- Permutation p value is the fraction of random permutations that gave as few misclassifications as e in the real data
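A small self-contained sketch of this permutation test (illustrative only; dimensions, the nearest-centroid classifier, and the 40 permutations are assumptions chosen to keep it fast): the whole cross-validation, feature selection included, is repeated for each shuffled set of labels.

```python
import random, math

def tstat(x, y):
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny + 1e-12)

def select_top(data, labels, k):
    a = [d for d, l in zip(data, labels) if l == 0]
    b = [d for d, l in zip(data, labels) if l == 1]
    scored = sorted(((abs(tstat([s[j] for s in a], [s[j] for s in b])), j)
                     for j in range(len(data[0]))), reverse=True)
    return [j for _, j in scored[:k]]

def centroid_predict(data, labels, genes, x):
    dists = []
    for c in (0, 1):
        cls = [d for d, l in zip(data, labels) if l == c]
        cent = [sum(s[j] for s in cls) / len(cls) for j in genes]
        dists.append(sum((x[j] - m) ** 2 for j, m in zip(genes, cent)))
    return 0 if dists[0] < dists[1] else 1

def loocv_error(data, labels, k=5):
    errors = 0
    for i in range(len(data)):
        tr, trl = data[:i] + data[i + 1:], labels[:i] + labels[i + 1:]
        genes = select_top(tr, trl, k)       # selection redone per fold
        errors += centroid_predict(tr, trl, genes, data[i]) != labels[i]
    return errors / len(data)

def perm_pvalue(data, labels, n_perm=40, k=5, seed=0):
    """Fraction of label permutations whose fully cross-validated error
    is as small as the observed one (+1 correction in both counts)."""
    e_obs = loocv_error(data, labels, k)
    rng = random.Random(seed)
    count = 0
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)                    # entire CV repeated on shuffled labels
        if loocv_error(data, perm, k) <= e_obs:
            count += 1
    return (count + 1) / (n_perm + 1)

random.seed(4)
p, n = 100, 8
data = [[random.gauss(0, 1) for _ in range(p)] for _ in range(2 * n)]
labels = [0] * n + [1] * n
for s in data[n:]:
    for j in range(5):
        s[j] += 2.0                          # genuine class signal
pv = perm_pvalue(data, labels)
print(pv)
```

With real signal in the data, permuted labelings almost never match the observed error, so the p value is small.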

Gene-Expression Profiles in Hereditary Breast Cancer

- Breast tumors studied
- 7 BRCA1 tumors
- 8 BRCA2 tumors
- 7 sporadic tumors
- Log-ratio measurements of 3226 genes for each tumor after initial data filtering

RESEARCH QUESTION: Can we distinguish BRCA1+ from BRCA1− cancers and BRCA2+ from BRCA2− cancers based solely on their gene expression profiles?


Classification of BRCA2 Germline Mutations

Classification Method                     LOOCV Prediction Error
Compound Covariate Predictor              14%
Fisher LDA                                36%
Diagonal LDA                              14%
1-Nearest Neighbor                        9%
3-Nearest Neighbor                        23%
Support Vector Machine (linear kernel)    18%
Classification Tree                       45%

Myth

- Huge sample sizes are needed to develop effective predictive classifiers

Sample Size Planning References

- K Dobbin, R Simon. Sample size determination in microarray experiments for class comparison and prognostic classification. Biostatistics 6:27, 2005
- K Dobbin, R Simon. Sample size planning for developing classifiers using high dimensional DNA microarray data. Biostatistics 8:101, 2007
- K Dobbin, Y Zhao, R Simon. How large a training set is needed to develop a classifier for microarray data? Clinical Cancer Research 14:108, 2008

Sample Size Planning for Classifier Development

- The expected value (over training sets) of the probability of correct classification PCC(n) should be within ε of the maximum achievable PCC(∞)

Probability Model

- Two classes
- Log expression or log-ratio MVN in each class with common covariance matrix
- m differentially expressed genes
- p − m noise genes
- Expression of differentially expressed genes is independent of expression for noise genes
- All differentially expressed genes have the same inter-class mean difference 2δ
- Common variance for differentially expressed genes and for noise genes

Classifier

- Feature selection based on univariate t-tests for differential expression at significance level α
- Simple linear classifier with equal weights (except for sign) for all selected genes. Power for selecting each of the informative genes that are differentially expressed by mean difference 2δ is 1 − β(n)

- For 2 classes of equal prevalence, let λ1 denote the largest eigenvalue of the covariance matrix of the informative genes; PCC(n) can then be expressed in terms of n, α, 1 − β(n), and λ1


Sample size as a function of effect size (log-base 2 fold-change between classes divided by standard deviation). Two different tolerances shown. Each class is equally represented in the population. 22000 genes on an array.


BRB-ArrayTools: Survival Risk Group Prediction

- No need to transform data to good vs bad outcome. Censored survival is directly analyzed
- Gene selection based on significance in univariate Cox Proportional Hazards regression
- Uses k principal components of selected genes
- Gene selection re-done for each resampled training set
- Develop k-variable Cox PH model for each leave-one-out training set

BRB-ArrayTools: Survival Risk Group Prediction

- Classify left-out sample as above or below median risk based on model not involving that sample
- Repeat, leaving out 1 sample at a time to obtain cross-validated risk group predictions for all cases
- Compute Kaplan-Meier survival curves of the two predicted risk groups
- Permutation analysis to evaluate statistical significance of separation of K-M curves

BRB-ArrayTools: Survival Risk Group Prediction

- Compare Kaplan-Meier curves for the gene expression based classifier to that for a standard clinical classifier
- Develop classifier using standard clinical staging plus genes that add to standard staging

Does an Expression Profile Classifier Predict More Accurately Than Standard Prognostic Variables?

- Some publications fit a logistic model to standard covariates and the cross-validated predictions of expression profile classifiers
- This is valid only with split-sample analysis because the cross-validated predictions are not independent

Does an Expression Profile Classifier Predict More Accurately Than Standard Prognostic Variables?

- Not an issue of which variables are significant after adjusting for which others or which are independent predictors
- Predictive accuracy and inference are different
- The predictiveness of the expression profile classifier can be evaluated within levels of the classifier based on standard prognostic variables

Survival Risk Group Prediction

- LOOCV loop
- Create training set by omitting ith case
- Develop PH model for training set
- Compute predictive index for ith case using PH model developed for training set
- Compute percentile of predictive index for ith case among predictive indices for cases in the training set

Survival Risk Group Prediction

- Plot Kaplan-Meier survival curves for cases with predictive index percentiles above 50 and for cases with cross-validated risk percentiles below 50
- Or for however many risk groups and thresholds are desired
- Compute log-rank statistic comparing the cross-validated Kaplan-Meier curves

Survival Risk Group Prediction

- Evaluate individual genes by fitting single-variable proportional hazards regression models to log expression for each gene
- Select genes based on p-value threshold for single-gene PH regressions
- Compute first k principal components of the selected genes
- Fit PH regression model with the k PCs as predictors. Let b1, …, bk denote the estimated regression coefficients
- To predict for a case with expression profile vector x, compute the k supervised PCs y1, …, yk and the predictive index = b1y1 + … + bkyk
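The principal-component and predictive-index steps above can be sketched in pure Python. This is an illustrative fragment only: the data are simulated stand-ins for the genes passing the univariate Cox p-value threshold, a single component (k = 1) is used, and the coefficient b1 is a placeholder constant where the real procedure would use a Cox proportional hazards fit (e.g. via a survival package).

```python
import random, math

def first_pc(X, iters=200):
    """First principal component of the rows of X, by power iteration on
    X^T X after column-centering; returns (direction, per-case scores)."""
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    Xc = [[r[j] - means[j] for j in range(p)] for r in X]
    rng = random.Random(0)
    v = [rng.gauss(0, 1) for _ in range(p)]
    for _ in range(iters):
        u = [sum(r[j] * v[j] for j in range(p)) for r in Xc]   # X v
        w = [sum(u[i] * Xc[i][j] for i in range(n)) for j in range(p)]  # X^T (X v)
        nrm = math.sqrt(sum(x * x for x in w))
        v = [x / nrm for x in w]
    scores = [sum(Xc[i][j] * v[j] for j in range(p)) for i in range(n)]
    return v, scores

# Toy expression matrix on p "selected" genes for n cases (simulated)
rng = random.Random(3)
n, p = 30, 20
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]

_, y1 = first_pc(X)                 # supervised PC scores y1 for each case
b1 = 0.8                            # PLACEHOLDER: b1 would come from a Cox PH fit
pi = [b1 * y for y in y1]           # predictive index = b1*y1 (k = 1 here)
median_pi = sorted(pi)[n // 2]
high_risk = [i for i, v in enumerate(pi) if v > median_pi]
print(len(high_risk))
```

Cases are then split at the median of the predictive index into high- and low-risk groups, matching the 50th-percentile cut-off used in the slides.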

Survival Risk Group Prediction

- Repeat the entire procedure for permutations of survival times and censoring indicators to generate the null distribution of the log-rank statistic
- The usual chi-square null distribution is not valid because the cross-validated risk percentiles are correlated among cases
- Evaluate statistical significance of the association of survival and expression profiles by referring the log-rank statistic for the unpermuted data to the permutation null distribution

- Outcome prediction in estrogen-receptor positive, chemotherapy and tamoxifen treated patients with locally advanced breast cancer

R. Simon, G. Bianchini, M. Zambetti, S. Govi, G. Mariani, M. L. Carcangiu, P. Valagussa, L. Gianni; National Cancer Institute, Bethesda, MD; Fondazione IRCCS - Istituto Tumori di Milano, Milan, Italy

PATIENTS AND METHODS - I

- Fifty-seven patients with ER positive tumors enrolled in a neoadjuvant clinical trial for LABC were evaluated. All patients had been treated with doxorubicin and paclitaxel q 3wk x 3, followed by weekly paclitaxel x 12 before surgery, then adjuvant intravenous CMF q 4wk x 4 and thereafter tamoxifen.
- High-throughput qRT-PCR gene expression analysis in paraffin-embedded formalin-fixed core biopsies at diagnosis was performed by Genomic Health to quantify expression of 363 genes (plus 21 for Oncotype DX™ determination), as described previously (Gianni L, JCO 2005). RS genes were excluded from analysis.

PATIENTS AND METHODS - II

- Three models (prognostic indices) were developed to predict Distant Event Free Survival (DEFS)
- GENE MODEL: Using only expression data, genes were selected based on univariate Cox analysis p value under a specific threshold significance level.
- COVARIATES MODEL: Using RS (as continuous variable), age and IBC status (covariates), a multivariate proportional hazards model was developed.
- COMBINED MODEL: Using a combination of these covariates and expression data, genes were selected which add to predicting survival over the predictive value provided by the covariates and under a specific threshold significance level.
- Survival risk groups were constructed using the supervised principal component method implemented in BRB-ArrayTools (Bair E, Tibshirani R, PLOS Biology 2004).

PATIENTS AND METHODS - III

- In order to evaluate the predictive value of each model, a complete Leave-One-Out Cross-Validation was used.
- For each i-th cross-validated training set (with one case removed) a prognostic index (PI) function was created. The PI for the omitted patient is ranked relative to the PIs for the i-th training set. Because the PI is a continuous variable, cut-off percentiles have to be pre-specified for defining the risk groups. The omitted patient is placed into a risk group based on her percentile ranking. The entire procedure was repeated using different cut-off percentiles (BRB-ArrayTools User's Manual v3.7).

PATIENTS AND METHODS - IV

- Statistical significance was determined by repeating the entire cross-validation process for 1000 random permutations of the survival data.
- For the GENE MODEL, the p value tested the null hypothesis that there is no relation between the expression data and survival (by providing a null distribution of the log-rank statistic)
- For the COVARIATES MODEL, the p value was based on the parametric log-rank test statistic between risk groups
- For the COMBINED MODEL, the p value addressed whether the expression data add significantly to risk prediction compared to the covariates

RESULTS: Patients' characteristics at diagnosis

- The median follow-up was 76 months (range 18-103) (by inverse Kaplan-Meier method)
- Patients' characteristics are summarized in Table 1.

OS and DEFS of all patients

Overall Survival and Distant Event Free Survival (all patients)

Genes selected for the GENE MODEL and COMBINED MODEL

- The significance level for gene selection used for the identified models was p = 0.005.
- All genes included in the COMBINED MODEL were also selected in the GENE MODEL.

Cross-validated Kaplan-Meier curves for risk groups using 50th percentile cut-off

Distant Event Free Survival panels: GENE MODEL, COVARIATES MODEL, COMBINED MODEL

BRB-ArrayTools

- Contains analysis tools that I have selected as valid and useful
- Analysis wizard and multiple help screens for biomedical scientists
- Imports data from all platforms and major databases
- Automated import of data from NCBI Gene Expression Omnibus

Predictive Classifiers in BRB-ArrayTools

- Classifiers
- Diagonal linear discriminant
- Compound covariate
- Bayesian compound covariate
- Support vector machine with inner product kernel
- K-nearest neighbor
- Nearest centroid
- Shrunken centroid (PAM)
- Random forest
- Tree of binary classifiers for k-classes
- Survival risk-group
- Supervised PCs

- Feature selection options
- Univariate t/F statistic
- Hierarchical variance option
- Restricted by fold effect
- Univariate classification power
- Recursive feature elimination
- Top-scoring pairs
- Validation methods
- Split-sample
- LOOCV
- Repeated k-fold CV
- .632 bootstrap

Selected Features of BRB-ArrayTools

- Multivariate permutation tests for class comparison to control number and proportion of false discoveries with specified confidence level
- Permits blocking by another variable, pairing of data, averaging of technical replicates
- SAM
- Fortran implementation 7X faster than R versions
- Extensive annotation for identified genes
- Internal annotation of NetAffx, Source, Gene Ontology, Pathway information
- Links to annotations in genomic databases
- Find genes correlated with quantitative factor while controlling number or proportion of false discoveries
- Find genes correlated with censored survival while controlling number or proportion of false discoveries
- Analysis of variance

Selected Features of BRB-ArrayTools

- Gene set enrichment analysis
- Gene Ontology groups, signaling pathways, transcription factor targets, micro-RNA putative targets
- Automatic data download from Broad Institute
- KS and LS test statistics for null hypothesis that gene set is not enriched
- Efron/Tibshirani max-mean test
- Goeman's Global test of null hypothesis that no genes in set are differentially expressed
- Class prediction
- Multiple classifiers
- Complete LOOCV, k-fold CV, repeated k-fold, .632 bootstrap
- Permutation significance of cross-validated error rate

Selected Features of BRB-ArrayTools

- Survival risk-group prediction
- Supervised principal components with and without clinical covariates
- Cross-validated Kaplan-Meier curves
- Permutation test of cross-validated KM curves
- Clustering tools for class discovery with reproducibility statistics on clusters
- Internal access to Eisen's Cluster and TreeView
- Visualization tools including rotating 3D principal components plot exportable to Powerpoint with rotation controls
- Extensible via R plug-in feature
- Tutorials and datasets

BRB-ArrayTools

- Extensive built-in gene annotation and linkage to gene annotation websites
- Publicly available for non-commercial use
- http://brb.nci.nih.gov

Conclusions

- New technology and biological knowledge make it increasingly feasible to identify which patients are most likely to benefit from a specified treatment
- Predictive medicine is feasible based on genomic characterization of a patient's tumor
- Targeting treatment can greatly improve the therapeutic ratio of benefit to adverse effects
- Treated patients benefit
- Economic benefit for society

Conclusions

- Achieving the potential of new technology requires paradigm changes in focus and methods of correlative science.
- Effective interdisciplinary research requires increased emphasis on cross-education of laboratory, clinical and statistical scientists

Acknowledgements

- Kevin Dobbin
- Alain Dupuy
- Wenyu Jiang
- Annette Molinaro
- Michael Radmacher
- Joanna Shih
- Yingdong Zhao
- BRB-ArrayTools Development Team