Title: Clinical Validation of Prognostic Biomarkers of Risk and Predictive Biomarkers of Drug Efficacy or Safety
1Clinical Validation of Prognostic Biomarkers of
Risk and Predictive Biomarkers of Drug Efficacy
or Safety
- Gene Pennello, Ph.D.
- Team Leader, Diagnostics Devices Branch
- Division of Biostatistics
- Office of Surveillance and Biometrics
- Center for Devices and Radiological Health, FDA
SAMSI Risk Perception Policy Practice
Workshop October 3, 2007
2Outline
- FDA and Device Regulation
- Types of Biomarkers
- Validation of Diagnostics
- Predictive and Prognostic Biomarkers
- Definitions, Endpoints
- Study Designs for Predictive Biomarkers
- Prospective Designs: efficiency comparison
- Prospective-Retrospective Designs
- Summary
3FDA
CDER, Drugs
CDRH, Devices
CBER, Biologics
CVM, Veterinary
CFSAN, Food
NCTR
4What are Medical Devices?
An item for treating or diagnosing a health
condition whose intended use is not achieved
primarily by chemical or biological action within
the body (Section 201(h) of the Federal Food, Drug,
and Cosmetic (FD&C) Act).
Definition by exclusion: simply put, a medical device is any
medical item for use in humans that is neither a drug
nor a biological product.
5Example of Medical Devices
- Relatively simple devices: tongue depressors,
thermometers, latex gloves, simple surgical
instruments
- Ophthalmic devices: intraocular lenses, PRK
lasers
- Radiological devices: MRI machines, CT
scanners, digital mammography, computer-aided
detection
- Cardiovascular devices: pacemakers,
defibrillators, heart valves, coronary stents,
artificial hearts
- Monitoring devices: glucometers, bone
densitometers
- Diagnostic devices: diagnostic test kits for
HIV, prostate-specific antigen (PSA) test, human
papillomavirus (HPV) test
6Example of Medical Devices
- Dental, ear, nose, and throat devices: hearing
aids, bronchoscopy system
- General, surgical, and restorative devices:
breast implants, artificial hips, spinal fixation
devices, artificial skin
- Emerging technologies:
- Multiplex genetic tests (e.g., for multiple
mutations or microbes)
- Genomic and proteomic Dx tests
- Nanotechnological devices
- Microspheres for molecular treatment of cancer
- Robotics
- Theranostics (predictive biomarkers of response
or adverse reaction to therapy)
- Artificial pancreas
7Example of Medical Devices
Due to the wide variety in technology,
complexity, and intended use, medical devices
can present novel statistical design and analysis
challenges.
8Device Regulation
- Decision to approve a PMA application must rely
upon valid scientific evidence to determine
whether there is reasonable assurance that the
device is safe and effective.
- Valid scientific evidence is evidence from well
controlled studies, partially controlled studies
and objective trials without matched controls,
well documented case histories conducted by
qualified experts that there is a reasonable
assurance of safety and effectiveness . . .
- U.S. Code of Federal Regulations, Title 21 (Food
and Drugs), U.S. Government Printing Office,
Washington DC, 2001, Part 860.7. Web address:
http://www.access.gpo.gov/nara/cfr/waisidx_01/21cfr860_01.html
(accessed February 2002)
9Device Regulation
- Least Burdensome Provisions of FDA Modernization
Act (1997)
- Secretary shall only request information that is
necessary to making substantial equivalence
determinations.
- Secretary shall consider . . . the least
burdensome appropriate means of evaluating device
effectiveness that would have a reasonable
likelihood of resulting in approval.
- U.S. Code of Federal Regulations, Title 21 (Food
and Drugs), U.S. Government Printing Office,
Washington DC, 2001, Part 513(i)(1)(D) and
513(a)(3)(D)(ii). Web address:
http://www.access.gpo.gov/nara/cfr/waisidx_01/21cfr860_01.html
10FDA Least Burdensome Guidance
- FDA Guidance: The Least Burdensome Provisions of
the FDA Modernization Act of 1997: Concept and
Principles (2002)
- "Modern statistical methods may also play an
important role in achieving a least burdensome
path to market. For example, through the use of
Baysian [sic] analyses, studies can be combined
in order to help reduce the sample size needed
for the experimental and/or control device."
11Examples of Less Burdensome
- Non-U.S. data
- Surrogate endpoints (e.g., acute follow-up)
- Interim analysis, Adaptive design
- Bayesian methods (e.g., to reduce sample size)
- Propensity Scores for historical controls
- Sensitivity analysis for missing data
Note: could trade clinical for statistical burden.
- FDA Draft Guidance for the Use of Bayesian
Statistics in Medical Device (released May 23,
2006), www.fda.gov/cdrh/osb/guidance/1601.html
12Least Burdensome Provision
- Least burdensome provision in FDAMA of 1997 is
directed to both medical devices and diagnostics
(including biomarkers).
13Device Risk Classification
- Class I: Devices for which general controls
provide reasonable assurance of the safety and
effectiveness.
- Class II: General controls insufficient. Can
establish special controls (performance
standards: CLIA, ISO, FDA guidance). May require
clinical data on a 510(k).
- Class III: General and special controls
insufficient. Life-sustaining/supporting,
substantial importance in preventing impairment
of human health, potential unreasonable risk of
illness or injury. Needs pre-market approval
(PMA).
14Post-Market Transformation
- Make postmarket data more widely available to
Center staff and supplement search and reporting
tools
- "Investigate the use of data and text mining
techniques to identify the 'needles in the
haystack' by identifying patterns in the incoming
data that equate to public health signals."
- Example is WebVDME Bayesian data-mining
- Design a pilot project to test the usefulness of
quantitative decision-making methods for medical
device regulation across the total product life
cycle
http://www.fda.gov/cdrh/postmarket/mdpi-report-1106.html
15Types of Biomarkers
- Diagnostic
- Early detection (screening), enabling
intervention at an earlier and potentially more
curable stage than under usual clinical
diagnostic conditions
- Monitoring of disease response during therapy,
with potential for adjusting level of
intervention (e.g., dose) on a dynamic and
personal basis
- Risk assessment leading to preventive
interventions for those at sufficient risk
- Prognosis, allowing for more aggressive therapy
for patients with poorer prognosis
- Prediction of safety or efficacy (response) of a
therapy, thereby providing guidance in choice of
therapy
16Types of Biomarkers
- Diagnostic
- Early Detection (screening)
- Monitoring
- Risk Assessment
- Prognostic
- Predictive of Safety or Efficacy
- The first three are considered together, where
the focus is on identifying the disease or
condition.
17Types of Biomarkers
- Diagnostic
- Early Detection (screening)
- Monitoring
- Risk Assessment
- Prognostic
- Predictive of Safety or Efficacy
- The last three are attempting to predict the
future.
18Analytical Validation
- How well are you measuring the measurand?
- Precision / Reproducibility
- Method Comparison
- LoB, LoD, LoQ
- Linearity
- Stability
- Clinical and Laboratory Standards Institute (CLSI)
- (http://www.nccls.org/)
19Clinical Validation (Qualification)
- Does the test have clinical utility?
- Does it have added value over standard tests
(e.g., clinical covariates like age, tumor size,
stage)?
- May or may not require a clinical study
- EX. Roche AmpliChip
CDRH guidance document Statistical Guidance on
Reporting Results from Studies Evaluating
Diagnostic Tests, issued in final form in March
2007, concerns reporting agreement when there is
no perfect standard and also discrepancy
resolution.
http://www.fda.gov/cdrh/osb/guidance/1620.html
20Roche AmpliChip CYP450 Test (CDRH de novo 510(k)
K042259)
- Genotypes two cytochrome P450 genes (29
polymorphisms in the CYP2D6 gene, 2 in CYP2C19) to
provide the predictive phenotype of the metabolic
rate for a class of therapeutics metabolized
primarily by CYP2D6 or CYP2C19 gene products.
The phenotypes are:
- (1) Poor metabolizers
- (2) Intermediate metabolizers
- (3) Extensive metabolizers
- (4) Ultrarapid metabolizers
- Cytochrome P450s are a large multi-gene family of
enzymes found in the liver, and are linked to the
metabolism of approximately 70-80% of all drugs.
Among them, the polymorphic CYP2D6 and CYP2C19
genes are responsible for approximately 25% of
all CYP450-mediated drug metabolism. A
polymorphism in these enzymes can lead to an
excessive or prolonged therapeutic effect or
drug-related toxicity after a typical dose by
failing to clear a drug from the blood or by
changing the pattern of metabolism to produce
toxic metabolites.
http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfPMN/pmn.cfm
21Adding Value to Standard Clinical Predictors
- Head to Head: Marker superior to clinical
predictors at predicting outcome.
- Incremental Improvement: Combination superior to
clinical predictors alone.
- Marker Predictive within Clinical Strata: e.g.,
HR(+, -) significant within age, tumor grade,
tumor size groups.
22Multivariate Index Assays
- An IVDMIA is a device that
- Combines the values of multiple variables using
an interpretation function to yield a single,
patient-specific result (e.g., a
classification, score, index, etc.), that
is intended for use in the diagnosis of disease
or other conditions, or in the cure, mitigation,
treatment or prevention of disease, and
- Provides a result whose derivation is
non-transparent and cannot be independently
derived or verified by the end user.
- MIA result could be binary (dichotomous) (such
as yes or no), categorical (such as disease type),
ordinal (such as low, medium, high) or on a
continuous scale.
- Source: FDA MIA Draft Guidance
- http://www.fda.gov/cdrh/oivd/guidance/1610.html
23Typical Endpoints for Prognostic or Predictive
Biomarkers
- Time to Event
  Treatment   Median Survival Time
  A           6 months
  B           12 months
  Hazard Ratio = 0.5
- Event by Time t
  Treatment   R    Not R   Response Rate
  A           30   30      0.50 (30/60)
  B           10   50      0.17 (10/60)
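The two endpoint summaries above can be recomputed from the tabulated values. The sketch below assumes exponential survival, which the slide does not state, so that each median maps to a constant hazard. The function names are illustrative only.

```python
import math

def exponential_hazard(median_survival):
    """Constant hazard implied by a median survival time (exponential assumption)."""
    return math.log(2) / median_survival

def response_rate(responders, non_responders):
    """Proportion of patients with the event (response) by time t."""
    return responders / (responders + non_responders)

# Time-to-event endpoint: medians of 6 months (A) and 12 months (B).
hazard_a = exponential_hazard(6.0)
hazard_b = exponential_hazard(12.0)
hazard_ratio_b_vs_a = hazard_b / hazard_a

# Event-by-time-t endpoint: responders and non-responders per arm.
rate_a = response_rate(30, 30)
rate_b = response_rate(10, 50)

print(f"Hazard ratio (B vs. A): {hazard_ratio_b_vs_a:.2f}")
print(f"Response rate A: {rate_a:.2f}, B: {rate_b:.2f}")
```

Under a non-exponential survival model the ratio of medians would not in general equal the hazard ratio, so this is a consistency check rather than a general rule.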
24Relative Risk vs. Diagnostic Accuracy
Event by Time t
             E    No E   Total
Marker +     30   30     60
Marker -     10   50     60
Total        40   80     120
Relative Risk = 3.0 = (30/60)/(10/60)
Se  = 0.75 (30/40)
Sp  = 0.63 (50/80)
PPV = 0.50 (30/60)
NPV = 0.83 (50/60)
- Relative Risk looks good, but Dx accuracy not
great, so limited clinical utility?
Example taken from Emir, Wieand, Su, Cha,
Analysis of repeated markers used to predict
progression of cancer, Statist. Med., 17, 2563-78,
1998.
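The contrast between a large relative risk and modest diagnostic accuracy can be reproduced from the 2x2 counts on this slide. This is a minimal sketch in which rows are marker results and columns are event status. The function and argument names are illustrative, not from the cited paper.

```python
def two_by_two_metrics(pos_event, pos_no_event, neg_event, neg_no_event):
    """Relative risk and diagnostic accuracy from a marker-by-event 2x2 table."""
    risk_pos = pos_event / (pos_event + pos_no_event)  # event risk in marker +
    risk_neg = neg_event / (neg_event + neg_no_event)  # event risk in marker -
    return {
        "relative_risk": risk_pos / risk_neg,
        "sensitivity": pos_event / (pos_event + neg_event),
        "specificity": neg_no_event / (neg_no_event + pos_no_event),
        "ppv": risk_pos,
        "npv": neg_no_event / (neg_no_event + neg_event),
    }

# Counts from the slide: marker + row (30, 30), marker - row (10, 50).
metrics = two_by_two_metrics(30, 30, 10, 50)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The same table gives a relative risk of 3 while a quarter of events are missed and more than a third of non-events test positive, which is the slide's point.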
25Hazard Ratio vs. Diagnostic Accuracy
- NCCTG Mayo Clinic Study. CA15-3 ratio as
diagnostic for progression of breast cancer (as
determined by physical exam).
Hazard Ratio = 2.3 (p = 0.0002)
Se  = 0.30 (0.17, 0.43)
Sp  = 0.82 (0.74, 0.89)
PPV = 0.27 (0.21, 0.33)
Example taken from Emir, Wieand, Su, Cha,
Analysis of repeated markers used to predict
progression of cancer, Statist. Med., 17, 2563-78,
1998.
26Diagnostic Performance
- Sensitivity (TP rate): fraction of responders
who test +
- Specificity (TN rate): fraction of
non-responders who test -
- FP rate (1 - specificity): fraction of
non-responders who test +
Test is useful if TP rate > FP rate, i.e.,
sensitivity + specificity > 1.
EX. Useless test: sensitivity = 0.80,
specificity = 0.20
27Diagnostic Performance
- Positive predictive value (PPV): fraction of
test +s who respond
- Negative predictive value (NPV): fraction of
test -s who don't respond
- 1 - NPV: fraction of test -s who respond
Test is useful if PPV + NPV > 1, i.e.,
PPV > 1 - NPV.
EX. Useless test: PPV = 0.60, NPV = 0.40
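The two "useless test" examples describe the same situation when linked through Bayes' theorem. The sketch below assumes a response prevalence of 0.60, a value chosen for illustration and not given on the slides. The function names are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV from sensitivity, specificity, and prevalence via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

def is_informative(sensitivity, specificity, tolerance=1e-12):
    """True when TP rate exceeds FP rate, i.e. sensitivity + specificity > 1."""
    return sensitivity + specificity > 1.0 + tolerance

# Sensitivity 0.80 and specificity 0.20 sum to exactly 1: the test is useless.
ppv, npv = predictive_values(0.80, 0.20, prevalence=0.60)  # prevalence is assumed
print(f"informative: {is_informative(0.80, 0.20)}")
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}, PPV + NPV = {ppv + npv:.2f}")
```

With sensitivity + specificity = 1, the PPV equals the prevalence and the NPV equals its complement, so PPV + NPV = 1 whatever prevalence is assumed.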
28ROC Curve
A ROC curve is a plot of sensitivity (true
positive rate) vs. 1-specificity (false positive
rate) over all possible cutoff points for the
test. The test is informative if the area under
the curve is greater than 0.5.
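A minimal sketch of how the ROC curve and its area can be computed from marker values, sweeping every observed value as a cutoff. The marker values are hypothetical, and the convention that a score at or above the cutoff counts as a positive test is an assumption.

```python
def roc_points(scores_pos, scores_neg):
    """(FP rate, TP rate) pairs, calling a case test-positive when score >= cutoff."""
    cutoffs = sorted(set(scores_pos) | set(scores_neg), reverse=True)
    points = [(0.0, 0.0)]  # cutoff above every score: nobody tests positive
    for cutoff in cutoffs:
        tp_rate = sum(s >= cutoff for s in scores_pos) / len(scores_pos)
        fp_rate = sum(s >= cutoff for s in scores_neg) / len(scores_neg)
        points.append((fp_rate, tp_rate))
    return points

def area_under_curve(points):
    """Trapezoidal area under a ROC curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

responders = [0.9, 0.8, 0.6, 0.55]     # hypothetical marker values
non_responders = [0.7, 0.5, 0.4, 0.3]  # hypothetical marker values
auc = area_under_curve(roc_points(responders, non_responders))
print(f"AUC = {auc:.3f}")
```

A marker with identical distributions in responders and non-responders gives an area of 0.5 under this calculation, the uninformative reference value.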
29Prognostic Biomarker (Strong Defn)
- Prognostic factor. Informs about an outcome
independent of specific treatment (ability of
tumor to proliferate, invade, and/or spread).
- Prognostic biomarker is associated with
likelihood of an outcome (e.g., survival,
response, recurrence) such that magnitude of
association is independent of treatment.
- On some scale, treatment and biomarker effects
are additive, that is, do not interact.
30[Figure: two panels, each labeled HR(A,B) = 0.67]
32Prognostic Biomarker (Weak Defn)
- Prognostic factor. Informs about an outcome
independent of specific treatment (ability of
tumor to proliferate, invade, and/or spread).
- Prognostic biomarker is associated with
likelihood of an outcome (e.g., survival,
response, recurrence) in a population that is
untreated or on a standard (non-targeted)
treatment.
- If population is clearly defined, then can use
to choose more or less aggressive therapy, but
not specific therapies, per se.
33[Figure: two panels, each labeled HR(A,B) = 0.67]
34Prognostic Biomarker
- Her2-neu for node-negative women with breast
cancer: prognostic for recurrence
- Breast cancer prognostic test based on microarray
gene expression of RNAs extracted from breast
tumor tissue to assess a patient's risk for
distant metastasis, for women less than 61 with
Stage I or II disease with tumor size less than
or equal to 5.0 cm and who are lymph node negative.
- (Ref. Buyse et al., JNCI 98, 1183-1192)
35Agendia Mammaprint Gene Signature for Time to
Distant Metastasis (N = 302)
           Low risk group      High risk group
5-year     0.95 (0.91-0.99)    0.78 (0.72-0.84)
10-year    0.90 (0.85-0.96)    0.71 (0.65-0.78)
Buyse et al., JNCI (2006), 98, 1183-1192
36Proportion alive at 10 years
Clinical    Gene Signature   N     Proportion
Low Risk    Low Risk         52    0.88 (0.74 to 0.95)   Sp
Low Risk    High Risk        28    0.69 (0.45 to 0.84)   1 - Se
High Risk   Low Risk         59    0.89 (0.77 to 0.95)   Sp
High Risk   High Risk        163   0.69 (0.61 to 0.76)   1 - Se
Buyse et al., JNCI 2006
37Predictive Biomarker
- Predictive factor. Implies relative sensitivity
or resistance to specific treatments or agents.
- Predictive biomarker predicts differential effect
of treatment on outcome.
- Treatment and biomarker interact.
- Predictive biomarker can be useful for selecting
specific therapy.
38[Figure: two panels, labeled HR(A,B) = 0.5 and HR(A,B) = 1.0]
39Predictive Biomarker of Efficacy
- Marker: HER2/neu. Treatment: Trastuzumab
(Herceptin)
- Objective response rate
          Herceptin + Chemo   Chemo
  FISH+   95/176 (54%)        51/168 (30%)
  FISH-   19/50 (38%)         22/57 (39%)
- Arch. Pathol. Lab Med, Jan 2007 (ASCO/CAP
Guidelines)
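One way to see the marker-by-treatment interaction in these response rates is to compare the treatment effect within each FISH stratum. The sketch below uses a Wald z statistic on the risk-difference scale. That choice of scale and test is an assumption made for illustration, not the analysis reported in the cited guidelines. The names are hypothetical.

```python
import math

def risk_difference(resp_trt, n_trt, resp_ctl, n_ctl):
    """Difference in response rates (treatment minus control) and its variance."""
    p_trt = resp_trt / n_trt
    p_ctl = resp_ctl / n_ctl
    variance = p_trt * (1 - p_trt) / n_trt + p_ctl * (1 - p_ctl) / n_ctl
    return p_trt - p_ctl, variance

# Counts from the slide: Herceptin + chemo vs. chemo alone, by FISH status.
diff_pos, var_pos = risk_difference(95, 176, 51, 168)  # FISH+
diff_neg, var_neg = risk_difference(19, 50, 22, 57)    # FISH-

# Interaction contrast: difference between the two stratum-specific effects,
# treating the strata as independent samples.
interaction = diff_pos - diff_neg
z_interaction = interaction / math.sqrt(var_pos + var_neg)

print(f"Effect in FISH+: {diff_pos:.3f}")
print(f"Effect in FISH-: {diff_neg:.3f}")
print(f"Interaction: {interaction:.3f} (z = {z_interaction:.2f})")
```

The conclusion could differ on another scale, such as the odds ratio, which is why a later slide suggests looking for a scale on which effects are additive.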
40Predictive Biomarkers for Safety
- Predict risk of an adverse event dependent on the
biomarker
- Example
- UGT1A1, cleared by FDA, to predict the risk of
neutropenia in patients taking irinotecan for
colorectal cancer
41Prospective Study Designs for Predictive Markers
- Untargeted Design (Reference)
- Validate Treatment, Marker Simultaneously
- Marker by Treatment Design
- Targeted Design (Marker Subset Only)
- Marker Strategy Design
- Historical Control
42Untargeted Design (Reference)
- Test if drug works in entire population.
- Mixture of marker and drug effects.
- Can store samples if test is not ready.
43Marker by Treatment (Interaction) Design
- A Randomized Block Design
- Can test for biomarker by treatment interaction
(predictive biomarker)
- Test needs to be available before trial ensues.
44Marker by Treatment Design Questions
- Test Drug Overall and within Marker Subset
- Tests at levels 0.04 and 0.01 suggested to
control Type I error rate at 0.05 (Simon), but
subset could drive overall result.
- Frequentist multiplicity penalty may preclude
subset testing as good business strategy.
- Statement about drug, not biomarker.
- Test Marker Overall and within Drug Subset
- Statement about marker, not drug.
- Test for Treatment by Marker Interaction
- Simultaneously validates drug and marker.
45Targeted Design
- Test if drug works in subset.
- Cannot test if marker discriminates. Only PPV
available.
46Efficiency of Designs
                                         Relative Efficiency
Marker + Prevalence  Relative Efficacy   Targeted Design   Interaction Design
25%                  0%                  16x               8x
50%                  0%                  4x                2x
75%                  0%                  1.8x              0.9x
- Efficiency gain depends on marker prevalence,
relative efficacy, and difference tested.
Relative efficacy: marker - to marker + patients.
Simon, Maitournam, CCR 2004.
Marker by Treatment Design, test for interaction:
approx. efficiency when enriching with half +s,
half -s.
47Efficiency of Designs
                                         Relative Efficiency
Marker + Prevalence  Relative Efficacy   Targeted Design   Interaction Design
25%                  25%                 5.2x              1.5x
50%                  25%                 2.6x              0.7x
75%                  25%                 1.5x              0.4x
- Efficiency gain depends on marker prevalence,
relative efficacy, and difference tested.
Relative efficacy: marker - to marker + patients.
Simon, Maitournam, CCR 2004.
Marker by Treatment Design, test for interaction:
approx. efficiency when enriching with half +s,
half -s.
48Efficiency of Designs
                                         Relative Efficiency
Marker + Prevalence  Relative Efficacy   Targeted Design   Interaction Design
25%                  50%                 2.5x              0.3x
50%                  50%                 1.8x              0.2x
75%                  50%                 1.3x              0.1x
- Efficiency gain depends on marker prevalence,
relative efficacy, and difference tested.
Relative efficacy: marker - to marker + patients.
Simon, Maitournam, CCR 2004.
Marker by Treatment Design, test for interaction:
approx. efficiency when enriching with half +s,
half -s.
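The Targeted Design column in the three efficiency tables is consistent, up to rounding, with a simple approximation: the untargeted trial estimates a prevalence-weighted average of the marker-positive and marker-negative effects, and required sample size scales with the inverse square of the effect. The sketch below is my reading of the tables under an equal-variance assumption, not a formula quoted from the slides. It does not attempt to reproduce the Interaction Design column.

```python
def targeted_relative_efficiency(prevalence_pos, relative_efficacy):
    """Approximate ratio of patients randomized: untargeted vs. targeted design.

    prevalence_pos: fraction of patients who are marker positive.
    relative_efficacy: treatment effect in marker - patients as a fraction of
        the effect in marker + patients.
    """
    # Average effect in the unselected population, in units of the marker + effect.
    diluted_effect = prevalence_pos + (1.0 - prevalence_pos) * relative_efficacy
    return 1.0 / diluted_effect ** 2

for relative_efficacy in (0.0, 0.25, 0.5):
    for prevalence in (0.25, 0.50, 0.75):
        ratio = targeted_relative_efficiency(prevalence, relative_efficacy)
        print(f"prevalence {prevalence:.2f}, relative efficacy "
              f"{relative_efficacy:.2f}: {ratio:.2f}x")
```

The gain from targeting shrinks as the marker becomes more prevalent or as the drug retains more of its effect in marker-negative patients, which matches the pattern across the tables.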
49Improving Efficiency of Interaction Design
- Enrich with Test Positives if Pr(+) is low
- Find scale such that marker and treatment effects
are additive
- Adaptive Randomization
- Bayesian subset analysis
- If reader variability (e.g., IHC), then use
multiple readers.
- Prior Information
50Possibilities for Increasing Efficiency of
Interaction Design
- Enrich with Test Positives if Pr(+) is low
- Estimates of Sensitivity and Specificity are
biased because they depend on Pr(+).
- Use inverse probability weighting (Horvitz,
Thompson, 1952) or Bayes Theorem (Begg, Greenes,
1983) to obtain unbiased estimates.
51A Marker-Based Strategy
- Pro: More ethical, perhaps. More patients given
experimental drug. Test utility based on PPV_E,
NPV_E.
- Con: Cannot assess test-treatment interaction.
52Marker-Based Strategy
Response on E
        Naïve         Unbiased
Se      a / (a+c)     a / (a+2c)
Sp      d / (d+b)     2d / (2d+b)
PPV     a / (a+b)     same
NPV     d / (c+d)     same

Test +    R    Not R
E         a    b
P         0    0

Test -    R    Not R
E         c    d
P         e    f
53A Marker-Based Strategy
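Response on E
        Naïve             Unbiased
Se      20/43 (0.47)      20/66 (0.30)
Sp      157/177 (0.89)    314/334 (0.94)
PPV     20/40 (0.50)      same
NPV     157/180 (0.87)    same

Test +    R     Not R   Total
E         20    20      40
P         0     0       0
Total     20    20      40

Test -    R     Not R   Total
E         23    157     180
P         24    156     180
2 x E     46    314     360
(The last row is twice the E row, i.e., the
weighted counts 2c and 2d used in the unbiased
estimates, not the column sums.)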
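The naïve and unbiased estimates on this slide can be reproduced by inverse probability weighting. Every test-positive patient receives E, but only a fraction of test-negative patients do, so each test-negative patient on E is weighted by the inverse of that fraction. The sketch assumes 1:1 randomization of test negatives (weight 2), matching the factor of 2 in the formulas. The function and argument names are illustrative.

```python
def accuracy_on_experimental(a, b, c, d, prob_e_given_negative=0.5):
    """Naive and inverse-probability-weighted accuracy among patients on E.

    a, b: test + responders / non-responders on E (all test + receive E).
    c, d: test - responders / non-responders randomized to E.
    prob_e_given_negative: chance a test - patient is assigned E (assumed 0.5).
    """
    weight = 1.0 / prob_e_given_negative  # each test - patient on E stands for `weight` patients
    naive = {"se": a / (a + c), "sp": d / (d + b)}
    weighted = {
        "se": a / (a + weight * c),
        "sp": weight * d / (weight * d + b),
    }
    # Predictive values condition on the test result, so no weighting is needed.
    predictive = {"ppv": a / (a + b), "npv": d / (c + d)}
    return naive, weighted, predictive

naive, weighted, predictive = accuracy_on_experimental(a=20, b=20, c=23, d=157)
print("naive:     ", {k: round(v, 2) for k, v in naive.items()})
print("weighted:  ", {k: round(v, 2) for k, v in weighted.items()})
print("predictive:", {k: round(v, 2) for k, v in predictive.items()})
```

Weighting lowers the sensitivity estimate and raises the specificity estimate because test-negative patients are under-represented among those receiving E.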
54Possibilities for Increasing Efficiency of
Interaction Design
- Transformation
- Find a transformation (Box-Cox?) of outcome that
makes treatment and marker effects additive.
- Can then pool marker effect estimates within
treatments A and B.
- Can also pool drug effect estimates within marker
+s and marker -s.
55Possibilities for Increasing Efficiency of
Interaction Design
- Adaptive Randomization
- Adapt randomization ratio to treatment A and B
within biomarker subsets to maximize (a) power,
or (b) fraction of patients on better treatment
- If response rate < 0.5 for both treatments, then
(a) and (b) are compatible, otherwise in tension.
- Pr(+) disturbed, so need to adjust Se, Sp
56Possibilities for Increasing Efficiency of
Interaction Design
- Bayesian subset analysis (cf. Dixon, Simon)
- Subsets modeled as exchangeable via random
effects.
- Subset estimate borrows strength from complement
subset, increasing precision of estimate.
- However, interaction estimate more conservative
relative to usual non-Bayesian analysis.
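The borrowing of strength described above can be illustrated with a normal-normal random-effects model in which the between-subset variance is treated as known. This is a simplified empirical-Bayes sketch, not the method of Dixon and Simon: it plugs in the pooled mean and ignores uncertainty in it. All input values are hypothetical.

```python
def shrink_subset_estimates(estimates, variances, between_subset_variance):
    """Shrink subset treatment-effect estimates toward a precision-weighted mean."""
    weights = [1.0 / (v + between_subset_variance) for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    shrunk = []
    for y, v in zip(estimates, variances):
        shrinkage = v / (v + between_subset_variance)  # more for noisier subsets
        shrunk.append(shrinkage * pooled + (1.0 - shrinkage) * y)
    return pooled, shrunk

# Hypothetical treatment effects in marker + and marker - subsets.
raw_effects = [0.24, -0.01]
raw_variances = [0.0027, 0.0089]
pooled_effect, shrunk_effects = shrink_subset_estimates(
    raw_effects, raw_variances, between_subset_variance=0.01  # assumed value
)

raw_interaction = raw_effects[0] - raw_effects[1]
shrunk_interaction = shrunk_effects[0] - shrunk_effects[1]
print(f"Pooled effect: {pooled_effect:.3f}")
print(f"Shrunk subset effects: {[round(e, 3) for e in shrunk_effects]}")
print(f"Interaction, raw vs. shrunk: {raw_interaction:.3f} vs. {shrunk_interaction:.3f}")
```

Each subset estimate moves toward the pooled effect, so the difference between subsets is pulled toward zero. A full Bayesian analysis would also place a prior on the between-subset variance rather than fixing it.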
57Bayesian Subset Analysis
- Power is enhanced to show drug works in marker
subset (blue).
- Power is enhanced to show marker works
(discriminates) in patients taking drug (red)
58Possibilities for Increasing Efficiency of
Interaction Design
- Use Multiple Readers
- EGFR IHC test (Dako) and Cetuximab and
Panitumumab (Amgen) for Colorectal Cancer. % of
cells stained and maximum staining intensity
subject to reader variability
- Use multiple readers, account for random reader
effects.
- Multiple Reader, Multiple Case (MRMC) designs are
used for digital mammography systems and
computer-aided detection (CAD) systems
- Analysis can be difficult.
59Possibilities for Increasing Efficiency of
Interaction Design
- Prior Information (Bayesian analysis)
- Borrow strength from previous study regarded as
exchangeable with current study.
60Marker Based Strategy Design
[Flow diagram: Register, Randomize, Test Marker.
Marker Based Strategy arm: Marker Level (-)
receives Treatment A, Marker Level (+) receives
Treatment B. Non Marker Based Strategy arm:
Treatment A.]
Sargent et al., JCO 2005
61Marker Based Strategy Design
[Flow diagram: Register, Randomize, Test Marker.
Marker Based Strategy arm: Marker Level (-)
receives Treatment A, Marker Level (+) receives
Treatment B. Non Marker Based Strategy arm:
Randomize to Treatment A or Treatment B.]
Sargent et al., JCO 2005
62Marker Based Strategy Design
- Lacks power: Differential effect comparison
diluted because some patients in non-marker-based
strategy arm get marker-based treatment (could
eliminate these to increase power).
- Might be best suited if have > 2 treatments or
> 2 markers
- EX. Irinotecan regimen (dose, timing, frequency)
determined by UGT1A1 genotype (6/6, 6/7, or 7/7)
in colorectal cancer patients.
63Marker Based Strategy
- If no gold standard, then can be only way to
assess effectiveness of a test.
- EX. Detection of tumor of origin (TOO) in
cancers of unknown primary.
- No gold standard: IHC, imaging, may fail to
identify TOO.
- Randomize patients to be managed with
- new test + standard, or
- with standard alone
- Compare arms on survival
64Targeted Design w. Historical Control
- Drug already on market, but has poor response
rate.
- If response rate in marker study is
significantly greater than historical rate, then
marker discriminates.
- Limitations
- Lacks power because effect diluted.
- Need to calibrate historical rate to marker
study (adjust for covariates).
65Prospective-Retrospective Designs
- Prospectively apply marker to stored samples (in
retrospect).
- Can test overall, within subset, or for
interaction.
- Missing samples could introduce bias.
- RCT samples. Randomization ensures case and
control samples have similar characteristics.
- Case-control samples. Avoid selection bias by
matching on sample processing date, processing
sites, etc., and not excluding censored times.
- Reserve samples only for analytically validated
markers that are biologically plausible.
66The Challenge of Multiplicity
- Multiplicity of classifiers
- Microarrays and proteomics
- Many predictive models could be built with so
many inputs.
- The challenge is to confirm any such model with
an independent data set.
- A caveat: the independent test data set cannot
be continually reused. Great discipline is
required in this regard.
67Cross-Validation Pitfall
Simon, Radmacher, Dobbin, McShane (2003),
Pitfalls in the Use of DNA Microarray Data for
Diagnostic and Prognostic Classification, JNCI,
95 (1)
68Summary Remarks
- How to assess a test or biomarker is well-known,
but not as well-known in therapeutic circles.
- Need to assess whether the biomarker adds
anything to what we already know.
- The number of possibly good biomarker candidates
is enormous, but great care is needed in
restricting the search.
69Summary Remarks
- Need to encourage least burdensome approaches to
validating biomarkers without compromising level
of evidence
- Essential to confirm marker in independent
dataset
- Studies to demonstrate informativeness of a
biomarker can be quite difficult to design,
conduct and analyze.
70Acknowledgements
- CDRH Division of Biostatistics (DBS)
- Greg Campbell, Division Director
- Diagnostic Devices Branch (DDB)
- Lakshmi Vishnuvajjala, Branch Chief
- Estelle Russek-Cohen, Team Leader
- Gene Pennello, Team Leader
Bipasa Biswas, Kyungsook Kim, Harry Bushar, Samir
Lababidi, Arkendra De, Kristen Meier, Shanti
Gomatam, Kyunghee Song, Thomas Gwise, Rong Tang
71More References
- Sargent et al. (2005). Clinical trial designs for
predictive marker validation in cancer treatment
trials. J Clin Oncol 23:2020-2027.
- Pennello, Vishnuvajjala (2005). Statistical
design and analysis issues with pharmacogenomic
drug-diagnostic co-development. In American Stat.
Assoc. 2005 Proc. of the Biopharm. Section, Joint
Statistical Meetings, Minneapolis, MN, August
2005. American Stat. Assoc., Alexandria, VA.
- FDA Drug-Diagnostic Co-Development Concept Paper.
April 2005.
http://www.fda.gov/cder/genomics/pharmacoconceptfn.pdf