Clinical Trials of Predictive Medicine: New Challenges and Paradigms (PowerPoint presentation transcript)

Description:

Clinical Trials of Predictive Medicine: New Challenges and Paradigms. Richard Simon, D.Sc., Chief, Biometric Research Branch, National Cancer Institute.

Transcript and Presenter's Notes


1
Clinical Trials of Predictive Medicine: New
Challenges and Paradigms
  • Richard Simon, D.Sc.
  • Chief, Biometric Research Branch
  • National Cancer Institute
  • http://brb.nci.nih.gov

2
Biometric Research Branch Website: brb.nci.nih.gov
  • PowerPoint presentations
  • Reprints
  • BRB-ArrayTools software
  • Data archive
  • Web-based sample size planning for:
  • Clinical trials
  • Development of gene expression-based predictive
    classifiers

3
Why We Need Prognostic and Predictive Biomarkers
  • Most cancer patients don't benefit from the
    systemic treatments they receive
  • Being able to predict which patients are likely
    to benefit would:
  • Benefit patients
  • Control medical costs
  • Improve the success rate of clinical drug
    development

4-6
(Image slides; no transcript available)
7
  • Predictive biomarkers
  • Measured before treatment to identify who will or
    will not benefit from a particular treatment
  • ER, HER2, KRAS
  • Prognostic biomarkers
  • Measured before treatment to indicate long-term
    outcome for patients untreated or receiving
    standard treatment
  • Used to identify who does not require more
    intensive treatment
  • OncotypeDx

8
Prognostic and Predictive Biomarkers in Oncology
  • Single gene or protein measurement
  • ER protein expression
  • HER2 amplification
  • KRAS mutation
  • Index or classifier that summarizes expression
    levels of multiple genes
  • OncotypeDx recurrence score

9
Most Prognostic Factors are not Used
  • They are developed in unfocused studies not
    designed to address an intended medical use
  • The studies are based on convenience samples of
    heterogeneous patients for whom tissue is
    available
  • Although they correlate with a clinical endpoint,
    they have no demonstrated medical utility
  • They are not actionable

10
Types of Validation for Prognostic and Predictive
Biomarkers
  • Analytical validation
  • Accuracy compared to gold-standard assay
  • Robust and reproducible if there is no
    gold-standard
  • Clinical validation
  • Does the biomarker predict what it's supposed to
    predict for independent data?
  • Clinical/Medical utility
  • Does use of the biomarker result in patient
    benefit
  • Is it actionable?
  • Generally by improving treatment decisions

11
Clinical Trials Should Be Science Based
  • Cancers of a primary site are generally composed
    of a heterogeneous group of diverse molecular
    diseases
  • The molecular diseases vary fundamentally with
    regard to the oncogenic mutations that cause
    them, and in their responsiveness to specific
    drugs

12
Standard Clinical Trial Approaches
  • Based on the assumptions that:
  • Qualitative treatment-by-subset interactions are
    unlikely
  • Costs of over-treatment are less than costs
    of under-treatment
  • Have led to widespread over-treatment of patients
    with drugs from which few benefit

13
Predictive Biomarkers
  • In the past often studied as exploratory post-hoc
    subset analyses of RCTs.
  • Numerous subsets examined
  • No focused, pre-specified hypothesis
  • No control of type I error

14
How Can We Develop New Drugs in a Manner More
Consistent With Modern Tumor Biology and Obtain
Reliable Information About What Regimens Work for
What Kind of Tumors?
15
Prospective Drug Development With a Companion
Diagnostic
  1. Develop a completely specified genomic classifier
    of the patients likely to benefit from a new drug
  2. Establish analytical validity of the classifier
  3. Use the completely specified classifier to design
    and analyze a new clinical trial to evaluate
    effectiveness of the new treatment and how it
    relates to the classifier

16
Guiding Principle
  • The data used to develop the classifier must be
    distinct from the data used to test hypotheses
    about treatment effect in subsets determined by
    the classifier
  • Developmental studies are exploratory
  • Studies on which treatment effectiveness claims
    are to be based should be definitive studies that
    test a treatment hypothesis in a patient
    population completely pre-specified by the
    classifier

17
Targeted Design
  • Restrict entry to the phase III trial based on
    the binary predictive classifier

18
Develop Predictor of Response to New Drug
(Diagram: using phase II data, develop a predictor of
response to the new drug. Patients predicted responsive
are randomized between the new drug and control;
patients predicted non-responsive go off study.)
19
Applicability of the Targeted Design
  • Primarily for settings where the classifier is
    based on a single gene whose protein product is
    the target of the drug
  • e.g. trastuzumab
  • With a strong biological basis for the
    classifier, it may be unacceptable to expose
    classifier-negative patients to the new drug
  • Analytical validation, biological rationale, and
    phase II data provide the basis for regulatory
    approval of the test

20
Evaluating the Efficiency of the Targeted Design
  • Simon R and Maitournam A. Evaluating the
    efficiency of targeted designs for randomized
    clinical trials. Clinical Cancer Research
    10:6759-63, 2004; correction and supplement
    12:3229, 2006
  • Maitournam A and Simon R. On the efficiency of
    targeted clinical trials. Statistics in Medicine
    24:329-339, 2005
  • Reprints and interactive sample size calculations
    at http://linus.nci.nih.gov

21
  • Relative efficiency of the targeted design depends on:
  • The proportion of patients who test positive
  • The effectiveness of the new drug (compared to
    control) for test-negative patients
  • When fewer than half of patients test positive
    and the drug has little or no benefit for
    test-negative patients, the targeted design requires
    dramatically fewer randomized patients

22
Trastuzumab (Herceptin)
  • Metastatic breast cancer
  • 234 randomized patients per arm
  • 90% power for a 13.5% improvement in 1-year
    survival over a 67% baseline at the 2-sided .05 level
  • If benefit were limited to the 25% of patients
    who are assay-positive, the overall improvement in
    survival would have been 3.375%
  • 4025 patients/arm would have been required
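The dilution arithmetic above can be checked with the standard normal-approximation sample-size formula for comparing two proportions. This is only a sketch: the original trial was sized with its own design methods, so the formula lands near, but not exactly on, the slide's 234 and 4025 per arm.

```python
import math

# Standard normal quantiles for two-sided alpha = 0.05 and power = 0.90
Z_ALPHA = 1.959964
Z_BETA = 1.281552

def n_per_arm(p_control, delta):
    """Approximate patients per arm needed to detect an absolute
    improvement `delta` over a control rate `p_control`
    (normal approximation for comparing two proportions)."""
    p1, p2 = p_control, p_control + delta
    pbar = (p1 + p2) / 2
    num = (Z_ALPHA * math.sqrt(2 * pbar * (1 - pbar))
           + Z_BETA * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / delta ** 2)

# Full benefit: 13.5% absolute improvement over a 67% baseline
print(n_per_arm(0.67, 0.135))        # in the neighborhood of the slide's 234

# Benefit confined to the 25% assay-positive patients dilutes the
# overall effect to 0.25 * 13.5% = 3.375%
print(n_per_arm(0.67, 0.25 * 0.135)) # in the neighborhood of the slide's 4025
```

Because the detectable effect is quartered, the required sample size grows roughly 16-fold, which is why restricting eligibility to test-positive patients is so much more efficient in this setting.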

23
Web Based Software for Designing RCT of Drug and
Predictive Biomarker
  • http://brb.nci.nih.gov

24
Biomarker Stratified Design
25
  • Do not use the diagnostic to restrict
    eligibility, but to structure a prospective
    analysis plan
  • Having a prospective analysis plan is essential
  • Stratifying (balancing) the randomization is
    useful to ensure that all randomized patients
    have tissue available but is not a substitute for
    a prospective analysis plan
  • The purpose of the study is to evaluate the new
    treatment overall and for the pre-defined
    subsets, not to modify or refine the classifier
  • The purpose is not to demonstrate that repeating
    the classifier development process on independent
    data results in the same classifier

26
  • R Simon. Using genomics in clinical trial design.
    Clinical Cancer Research 14:5984-93, 2008

27
(Image slide; no transcript available)
28
Analysis Plan B (Limited Confidence in Test)
  • Compare the new drug to the control overall for
    all patients, ignoring the classifier
  • If p_overall ≤ 0.03, claim effectiveness for the
    eligible population as a whole
  • Otherwise perform a single subset analysis
    evaluating the new drug in the classifier-positive
    patients
  • If p_subset ≤ 0.02, claim effectiveness for the
    classifier-positive patients
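The split of the type I error in Plan B (0.03 overall, 0.02 for the single pre-specified subset test) can be written as a small decision function. The thresholds come from the slide; the function name and return strings are just for illustration.

```python
def analysis_plan_b(p_overall, p_subset):
    """Analysis Plan B: overall test at 0.03; if it fails, one
    pre-specified subset test in classifier-positive patients
    at 0.02. The two levels together spend the usual 0.05 budget."""
    if p_overall <= 0.03:
        return "effective in the eligible population as a whole"
    if p_subset <= 0.02:
        return "effective in classifier-positive patients"
    return "no claim of effectiveness"

print(analysis_plan_b(0.02, 0.40))  # overall test wins
print(analysis_plan_b(0.20, 0.01))  # only the subset test wins
print(analysis_plan_b(0.20, 0.10))  # neither wins
```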

29
Analysis Plan C
  • Test for a difference (interaction) between the
    treatment effect in test-positive patients and the
    treatment effect in test-negative patients
  • If the interaction is significant at level α_int, then
    compare treatments separately for test-positive
    patients and test-negative patients
  • Otherwise, compare treatments overall
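Plan C's branching can be sketched with a normal-approximation interaction test on log hazard ratios estimated separately in the two subsets. The slide does not fix a particular interaction statistic, so this specific z-test is an illustrative assumption.

```python
import math

def analysis_plan_c(lhr_pos, se_pos, lhr_neg, se_neg, alpha_int=0.10):
    """If the test-positive and test-negative treatment effects
    (log hazard ratios with standard errors) differ at level
    alpha_int, analyze the two subsets separately; otherwise
    compare treatments overall."""
    z = (lhr_pos - lhr_neg) / math.sqrt(se_pos ** 2 + se_neg ** 2)
    p_int = 1 - math.erf(abs(z) / math.sqrt(2))  # two-sided normal p-value
    if p_int <= alpha_int:
        return ("compare treatments separately in test-positive "
                "and test-negative patients")
    return "compare treatments overall"

# 50% hazard reduction in positives (log HR about -0.69), none in negatives
print(analysis_plan_c(-0.69, 0.20, 0.0, 0.15))
# uniform 33% hazard reduction in both subsets
print(analysis_plan_c(-0.40, 0.20, -0.40, 0.15))
```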

30
Sample Size Planning for Analysis Plan C
  • 88 events in test-positive patients are needed to
    detect a 50% reduction in hazard at the 5% two-sided
    significance level with 90% power
  • If 25% of patients are positive, when there are
    88 events in positive patients there will be
    about 264 events in negative patients
  • 264 events provide 90% power for detecting a 33%
    reduction in hazard at the 5% two-sided significance
    level
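The event counts above follow from the Schoenfeld approximation for the number of events needed in a 1:1 randomized survival comparison; small rounding differences from the slide's figures are expected.

```python
import math

Z_ALPHA = 1.959964   # two-sided 5% significance level
Z_BETA = 1.281552    # 90% power

def events_needed(hazard_reduction):
    """Schoenfeld approximation: total events needed in a 1:1
    randomized trial to detect a given proportional reduction in
    hazard (e.g. 0.50 means a hazard ratio of 0.5)."""
    log_hr = math.log(1 - hazard_reduction)
    return math.ceil(4 * (Z_ALPHA + Z_BETA) ** 2 / log_hr ** 2)

print(events_needed(0.50))   # 88 events, matching the slide
print(events_needed(0.33))   # close to the slide's 264
print(3 * 88)                # events expected in negatives when 25% are positive
```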

31
Simulation Results for Analysis Plan C
  • Using α_int = 0.10, the interaction test has power
    93.7% when there is a 50% reduction in hazard in
    test-positive patients and no treatment effect in
    test-negative patients
  • A significant interaction and a significant
    treatment effect in test-positive patients are
    obtained in 88% of cases under the above
    conditions
  • If the treatment reduces hazard by 33% uniformly,
    the interaction test is negative and the overall
    test is significant in 87% of cases

32
Does the RCT Need to Be Significant Overall for
the T vs C Treatment Comparison?
  • No
  • It is incorrect to require that the overall T vs
    C comparison be significant in order to claim that T is
    better than C for test-positive patients but not for
    test-negative patients
  • That requirement has been traditionally used to
    protect against data dredging. It is
    inappropriate for focused trials of a treatment
    with a companion test.

33
Biomarker Adaptive Threshold Design
  • Wenyu Jiang, Boris Freidlin, and Richard Simon
  • JNCI 99:1036-43, 2007

34
Biomarker Adaptive Threshold Design
  • Randomized trial of T vs C
  • Previously identified a biomarker score B thought
    to be predictive of patients likely to benefit
    from T relative to C
  • Eligibility not restricted by biomarker
  • No threshold for biomarker determined
  • Time-to-event data

35
Procedure A
  • Compare T vs C for all patients
  • If results are significant at the .04 level, claim
    broad effectiveness of T
  • Otherwise, proceed as follows

36
Procedure A
  • Test T vs C restricted to patients with biomarker
    B > b
  • Let S(b) be the log likelihood ratio statistic
  • Repeat for all values of b
  • Let S = max S(b)
  • Compute the null distribution of S by permuting
    treatment labels
  • If the data value of S is significant at the 0.01
    level, then claim effectiveness of T for a
    patient subset
  • Compute point and bootstrap interval estimates of
    the threshold b
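Procedure A's maximized-statistic permutation test can be sketched as follows. For brevity, a standardized difference in means over candidate thresholds stands in for the slide's log likelihood ratio statistic, and the simulated trial (biomarker distribution, effect size, seed) is entirely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_subset_stat(outcome, arm, biomarker, thresholds):
    """S = max over thresholds b of a T-vs-C statistic restricted to
    patients with biomarker > b (standardized mean difference as a
    stand-in for the log likelihood ratio statistic)."""
    best = -np.inf
    for b in thresholds:
        sel = biomarker > b
        t = outcome[sel & (arm == 1)]
        c = outcome[sel & (arm == 0)]
        if len(t) < 5 or len(c) < 5:
            continue
        se = np.sqrt(t.var(ddof=1) / len(t) + c.var(ddof=1) / len(c))
        best = max(best, (t.mean() - c.mean()) / se)
    return best

def adaptive_threshold_pvalue(outcome, arm, biomarker, n_perm=500):
    """Null distribution of S obtained by permuting treatment labels."""
    thresholds = np.quantile(biomarker, np.linspace(0.0, 0.8, 9))
    s_obs = max_subset_stat(outcome, arm, biomarker, thresholds)
    null = [max_subset_stat(outcome, rng.permutation(arm), biomarker, thresholds)
            for _ in range(n_perm)]
    return (1 + sum(s >= s_obs for s in null)) / (n_perm + 1)

# Illustrative trial: treatment helps only patients with biomarker > 0.6
n = 200
biomarker = rng.uniform(size=n)
arm = rng.integers(0, 2, size=n)
outcome = rng.normal(size=n) + 1.5 * arm * (biomarker > 0.6)
p = adaptive_threshold_pvalue(outcome, arm, biomarker)
print(p)   # compared against the design's 0.01 subset threshold
```

The slides' 0.04/0.01 split applies on top of this: the subset claim is made only when the overall test at level .04 fails and this permutation p-value falls below 0.01.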

37
Estimated Power of Broad Eligibility Design (n = 386
events) vs Adaptive Design A (n = 412 events);
80% power for a 30% hazard reduction

Model | Broad Eligibility Design | Biomarker Adaptive Threshold A
40% reduction in 50% of patients (22% overall reduction) | .70 | .78
60% reduction in 25% of patients (20% overall reduction) | .65 | .91
79% reduction in 10% of patients (14% overall reduction) | .35 | .93
38
Generalization of Biomarker Adaptive Threshold
Design
  • Have identified K candidate binary
    classifiers B1, ..., BK thought to be predictive
    of patients likely to benefit from T relative to
    C
  • Eligibility not restricted by the candidate
    classifiers

39
  • Compare T vs C for all patients
  • If results are significant at the .04 level, claim
    broad effectiveness of T
  • Otherwise, proceed as follows

40
  • Test T vs C restricted to patients positive for
    Bk, for k = 1, ..., K
  • Let S(Bk) be the log likelihood ratio statistic for
    the treatment effect in patients positive for Bk
    (k = 1, ..., K)
  • Let S = max S(Bk) and k* = argmax S(Bk)
  • Compute the null distribution of S by permuting
    treatment labels
  • If the data value of S is significant at the 0.01
    level, then claim effectiveness of T for patients
    positive for Bk*

41
Adaptive Signature Design
  • Boris Freidlin and Richard Simon
  • Clinical Cancer Research 11:7872-8, 2005

42
Adaptive Signature Design: End of Trial Analysis
  • Compare T to C for all patients at significance
    level 0.04
  • If the overall H0 is rejected, then claim
    effectiveness of T for eligible patients
  • Otherwise:

43
  • Otherwise
  • Using only the first half of patients accrued
    during the trial, develop a binary classifier
    that predicts the subset of patients most likely
    to benefit from the new treatment T compared to
    control C
  • Compare T to C for patients accrued in the second
    stage who are predicted responsive to T based on the
    classifier
  • Perform the test at significance level 0.01
  • If H0 is rejected, claim effectiveness of T for the
    subset defined by the classifier

44
Treatment effect restricted to a subset: 10% of
patients sensitive, 10 sensitivity genes, 10,000
genes, 400 patients.

Test | Power
Overall .05 level test | 46.7%
Overall .04 level test | 43.1%
Sensitive subset .01 level test (performed only when overall .04 level test is negative) | 42.2%
Overall adaptive signature design | 85.3%
45
Cross-Validated Adaptive Signature Design (to be
submitted for publication)
  • Wenyu Jiang, Boris Freidlin, and Richard Simon

46
Cross-Validated Adaptive Signature Design End of
Trial Analysis
  • Compare T to C for all patients at significance
    level α_overall
  • If the overall H0 is rejected, then claim
    effectiveness of T for eligible patients
  • Otherwise:

47
Otherwise
  • Partition the full data set into K parts
  • Form a training set by omitting one of the K
    parts. The omitted part is the test set
  • Using the training set, develop a predictive
    classifier of the subset of patients who benefit
    preferentially from the new treatment T compared
    to control C using the methods developed for the
    ASD
  • Classify the patients in the test set as
    sensitive (classifier +) or insensitive
    (classifier -)
  • Repeat this procedure K times, leaving out a
    different part each time
  • After this is completed, all patients in the full
    dataset are classified as sensitive or
    insensitive

48
  • Compare T to C for sensitive patients by
    computing a test statistic S e.g. the difference
    in response proportions or log-rank statistic
    (for survival)
  • Generate the null distribution of S by permuting
    the treatment labels and repeating the entire
    K-fold cross-validation procedure
  • Perform the test at significance level 0.05 - α_overall
  • If H0 is rejected, claim effectiveness of T for
    subset defined by classifier
  • The sensitive subset is determined by developing
    a classifier using the full dataset

49
70% Response to T in Sensitive Patients; 25%
Response to T Otherwise; 25% Response to C; 20% of
Patients Sensitive

Test | ASD | CV-ASD
Overall 0.05 test | 0.486 | 0.503
Overall 0.04 test | 0.452 | 0.471
Sensitive subset 0.01 test | 0.207 | 0.588
Overall power | 0.525 | 0.731
50
Does It Matter If the Randomization in the RCT
Was Not Stratified By the Test?
  • No
  • Stratification improves balance of stratification
    factors in overall comparisons
  • Stratification does not improve comparability of
    treatment (T) and control (C) groups within test
    positive patients or within test negative
    patients.
  • In a fully prospective trial, stratification of
    the randomization by the test is only useful for
    ensuring that all patients have an adequate test
    performed

51
Biotechnology Has Forced Biostatistics to Focus
on Prediction
  • This has led to many exciting methodological
    developments
  • p >> n problems, in which the number of genes is much
    greater than the number of cases
  • Statistics has over-focused on inference. Many of
    the methods and much of the conventional wisdom
    of statistics are based on inference problems and
    not applicable to prediction problems

52
  • p > n prediction problems are not multiple
    comparison problems
  • Feature selection should be optimized for
    accurate prediction, not for controlling the
    false discovery rate
  • Standard statistical methods for model building
    and evaluation are not effective
  • e.g. Fisher's LDA vs diagonal LDA
  • Model performance on the training set is
    extremely misleading for p > n problems and should
    never be reported

53
  • Goodness of fit is not a proper measure of
    predictive accuracy
  • Statistical significance of regression
    coefficients or of the model is not a proper
    measure of predictive accuracy

54
  • Validation of a predictive model means that the
    model predicts accurately for independent data
  • Validation does not mean that the model is stable
    or that using the same algorithm on independent
    data will give a similar model
  • Validation of model prediction does not indicate
    that the model has medical utility for any
    intended use

55
Prediction Based Clinical Trials
  • Using cross-validation we can evaluate new
    methods for analysis of clinical trials in terms
    of their intended use, which is informing
    therapeutic decision making

56
Conclusions
  • New biotechnology and knowledge of tumor biology
    provide important opportunities to improve
    therapeutic decision making
  • Treatment of broad populations with regimens that
    do not benefit most patients is increasingly
    neither necessary nor economically sustainable
  • The established molecular heterogeneity of human
    diseases requires the use of new approaches to the
    development and evaluation of therapeutics

57
  • While developing and applying these new
    approaches, statisticians should continue to
  • Make sure that they are solving the right problem
  • Focus on the big picture
  • Prepare themselves to be full partners with their
    collaborators

58
Acknowledgements
  • Boris Freidlin
  • Yingdong Zhao
  • Aboubakar Maitournam
  • Wenyu Jiang