Statistical aspects of clinical research
  • David Giltinan
  • May 2006

  • Why is clinical research hard?
  • Key statistical concerns
  • Get the correct answer to the right question,
    using the appropriate number of subjects
  • Key components of a clinical trial
  • Clear, feasible, appropriate study objective(s)
  • Target patient population
  • Study design: visit and evaluation schedule
  • Efficacy and safety endpoints
  • Sample size
  • Analysis methods
  • Next week
  • Interim analyses: early termination?
  • Subgroup analyses

Clinical research is not for sissies
  • Answering even relatively simple questions under
    the best conditions (a controlled clinical
    trial) can be tricky. Possible sources of bias
    abound and, if appropriate safeguards are not
    taken, may combine to give a false or misleading
    result
  • Some of the factors which make clinical research
    hard:
  • Formulating the right scientific question can
    be deceptively tricky
  • Logistical complexity, especially the need to use
    multiple sites
  • Trial conduct is highly interdisciplinary,
    requiring sustained, well-coordinated effort from
    many groups
  • Staggered recruitment of subjects; uncertainty
    about the accrual pattern is unavoidable
  • Patient dropout, particularly in longer trials
  • Potential for the goalpost to move mid-trial:
    unforeseen events can destroy, or severely
    reduce, the relevance of the study even before it
    is completed

Laws governing clinical trial conduct ¹
  • Lasagna's Law
  • The prevalence of any disease under study drops
    dramatically once study enrolment opens up, and
    returns to previous levels only once enrolment
    closes
  • Murphy's Law
  • Anything that can go wrong, will go wrong
  • In particular, the most egregious breach of
    protocol instructions will occur at the
    highest-enrolling site
  • Giltinan's Law
  • The quality of data obtained from any site is
    inversely proportional to the degree of
    exaltation of the thought leader or principal
    investigator at the site (in extreme cases, the
    role of thought leader is so all-consuming that
    delays in filing the necessary paperwork result
    in actual enrolment levels close to zero)
  • ¹ Clearly, all just different manifestations of
    Murphy's Law

Strategy to tactics: protocol development
  • A key concern is that each individual study
    protocol must achieve its goals, not just on its
    own terms; it must also make sense within the
    broader picture
  • A major practical issue is the ever-changing
    nature of the landscape: the long duration of
    most trials, and the uncertainty about the
    results, mean that the original target may have
    shifted by completion of a given trial
  • Nonetheless, a key requirement when designing any
    trial is that the proposed design should give the
    best chance possible of enabling the development
    plan to proceed to the next stage, once results
    from the trial become available
  • The previous condition should be met, even when
    results do not correspond to the desired answer.
    It is important to remember that a failed
    clinical trial is not one which fails to give the
    desired answer, but rather one which fails to
    give an unambiguous answer

Study objectives should be clear, specific, and
  • Phase III objectives determined primarily by (i)
    target product profile (think desired label
    claim), (ii) norms for the given disease
  • Primary and secondary objectives should map
    readily to corresponding statistical hypotheses
  • Safety objectives are given greater emphasis in
    Phases I and II; Phase III focuses on efficacy
    and safety
  • Objectives should be specified as precisely as
    possible. At a minimum, include information on:
  • What measure of efficacy/safety will be used?
  • Key features of the target patient population
  • Dosing regimen, i.e. amount, frequency, and route
    of dosing
  • Preferable to use neutral language when
    specifying objectives (personal opinion). Phrases
    like "to compare (investigate) the efficacy" or
    "to characterize the pharmacokinetics" are
    preferable to, e.g., "to demonstrate efficacy" or
    "to establish superiority"

Protocol Tip 1: Specify clear study objectives
  • Examples
  • "To investigate the effect of a single 5mg dose
    of rhwonderprotein, administered by transgenic
    snakebite, on clotting ability in Irish
    clergymen, as measured by the change from
    baseline in prothrombin time", rather than "To
    demonstrate the efficacy of rhwonderprotein in
    improving clotting ability"
  • To investigate the effect of twice daily SC
    injection of 40µg/kg of rhIGF-I for 12 weeks on
    glycemic control, in subjects with moderate to
    severe Type II diabetes, as measured by the
    average change from baseline in HbA1c, compared
    to subjects in the placebo group

Bias sources and precautions
  • Selection bias: unambiguous eligibility criteria
  • Allocation bias: randomization, stratification,
    blinding
  • Evaluation bias (observer/instrument): blinding,
    standardization (training, or central evaluation)
  • Recall bias: appropriate data collection
    instruments
  • Time (systematic change in patient population,
    treatment, or other aspect of study conduct as
    trial progresses): balanced treatment allocation;
    protocol should specify salient details of study
    conduct, avoiding room for differential
    interpretations
  • Withdrawal / drop out patterns: pre-specified
    analysis conventions, sensitivity analyses
  • Lack of compliance with study protocol: training;
    engaged study coordinators at site
  • Unblinding (of patient, physician, or study ...):
    randomized allocation; suitable precautions
    surrounding treatment codes and drug ...
Bias: the statistician's arch-nemesis
  • Loosely speaking, bias arises as a result of:
  • Groups differ at baseline w.r.t. an important
    prognostic factor
  • Groups differ w.r.t. some aspect of study conduct
    that could affect response
  • Key statistical tools against bias are:
  • Randomization (allocation of subjects to
    treatment groups is randomized)
  • Blinding
  • Stratification
  • Uniform implementation of study procedures across
    study sites is also critical. Differences may
    complicate interpretation, or compromise
    generalizability of results. Of particular
    concern:
  • Different interpretation of eligibility criteria
  • Systematic differences across sites in how key
    variables are measured

Bias, efficiency, and generalizability
  • Trial design and execution should
  • Avoid bias - wrong, or misleading, result
  • Generalize to the target population of interest -
    avoid an irrelevant result
  • Be efficient - avoid using more subjects than
    necessary
  • Studies which are inadequately powered, or
    otherwise deficiently designed, may be viewed as
    particularly inefficient (and ethically dubious)

  • Randomization is the basis for statistical
  • An observed significance level (p-value)
    represents the probability that differences in
    outcome at least as large as those observed would
    arise from random fluctuations alone.
  • Without randomization, a statistically
    significant difference may be the result of
    non-random differences in the distribution of
    unknown prognostic factors
  • Randomization does not ensure that groups are
    medically equivalent, but it distributes randomly
    the unknown biasing factors
  • Randomization plays an important role for the
    generalization of the observed clinical trials

Randomization: Practical Tips
  • If prognostic factors are known, use randomization
    methods that can account for them
  • Stratification / blocking
  • Adaptive randomization
  • If possible, randomize patients within a site
  • Patients enrolled early may differ from patients
    enrolled later
  • Watch out for staggered enrollment
  • Temporary closing of study sites or arms can
    cause problems
  • Protocol amendments that affect
    inclusion/exclusion criteria may be tricky
  • Even in open label studies randomization codes
    should be locked
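
The stratification / blocking tip above can be sketched in a few lines. This is only an illustration under stated assumptions: two arms labelled "A" and "B", a block size of 4, and one independent schedule per site. The function names and site labels are hypothetical and not taken from the slides. In practice the codes would be generated and locked by a validated system.

```python
import random

def permuted_block_schedule(n_blocks, block_size=4, arms=("A", "B"), seed=None):
    """Return a treatment sequence built from randomly permuted, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        # Each block contains every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule

def stratified_schedules(strata, n_blocks, block_size=4, seed=0):
    """One independent permuted-block schedule per stratum (for example, per site)."""
    return {
        stratum: permuted_block_schedule(n_blocks, block_size, seed=seed + i)
        for i, stratum in enumerate(strata)
    }

# Hypothetical site labels, for illustration only.
schedules = stratified_schedules(["site-01", "site-02"], n_blocks=3)
```

Because every block is balanced, treatment allocation within a site never drifts far from 1:1, which also limits the impact of early versus late enrollment differences.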

  • Randomization does not guarantee that there will
    be no bias by subjective judgment in evaluating
    and reporting the treatment effect
  • Such bias can be minimized by blocking the
    identity of treatment (blinding)
  • Types of blinding
  • Challenges
  • Ethical considerations
  • Unblinding procedures for safety reasons
  • Unblinding procedures at final analysis

Protocol Tip 2: Avoid Ambiguity
  • Protection against certain types of bias is
    through appropriate design precautions
    (stratification, randomization, blinding)
  • Other types of bias are prevented only by giving
    unambiguous instructions to the sites on the
    intended patient population and how all aspects
    of the study should be conducted
  • Sites will sniff out each ambiguity in the
    protocol, and interpret and execute the
    instructions more divergently than you can
    imagine
  • There is vagueness regarding key aspects of study
    conduct, e.g. use of con meds, evaluation
    schedule, endpoint definition, handling of
    dropouts, how key evaluations will be carried
    out, etc. etc. etc.
  • Major divergence in interpretation (e.g. in
    deciding eligibility, or how to measure a key
    response variable)
  • has the potential to torpedo the protocol
  • may not become evident until it's too late

Protocol Tip 3: Accommodating multiple sites
  • As a routine precaution, it is advisable to limit
    the contribution to enrolment of any single site
    to no more than 15% of the total. Note that this
    limit is generally not specified explicitly in
    protocol text, but is communicated to sites at
    study initiation nonetheless
  • Non-standard evaluations may require intensive
    training of site personnel to reduce systematic
    differences in evaluation among sites
  • Centralized (blinded) evaluation, when feasible,
    is often the best option
  • It is a good idea to develop a prospective
    publication strategy, securing upfront buy-in
    from key stakeholders
  • A plan and timetable for disseminating study
    results should be developed, following existing
    SOPs, and communicated to sites prospectively

Protocol Tip 3: Accommodating multiple sites
  • Regular, frequent communication with sites is
  • Early monitoring of key variables is advisable,
    to allow problems to be detected and fixed early
  • Appropriate mechanisms should be in place to
    allow evaluation of aggregated safety data in a
    timely fashion (remember that individual sites
    may not be able to discern adverse patterns,
    based only on their data)
  • Each team member should try to attain at least a
    basic understanding of the role of every other
    team member

Endpoints (1)
  • Discussion here will focus primarily on efficacy
  • What about other kinds of endpoints?
  • Pharmacokinetic endpoints are generally standard
    parameters derived from the observed
    concentration-time profiles
  • Safety endpoints also tend to be fairly standard;
    most are common across protocols, with occasional
    disease/drug-specific markers
  • Incidence of adverse events (general,
    protocol-specified, by body system, etc.)
  • Changes in key laboratory parameters
  • Incidence of antibodies (neutralizing or not)
  • Pharmacodynamic endpoints, in contrast, are
    measures of activity, and will vary from study to
    study. Recommendations for efficacy endpoints

Endpoints (2): General Remarks
  • No problem in Phase I, where focus is primarily
    on safety and PK endpoints. Limited sample sizes
    preclude formal evaluation of efficacy; if it
    must be mentioned in the protocol, it is
    preferable to refer to activity, rather than
    efficacy
  • Drug approval requires establishing an acceptable
    risk-benefit profile. It is important to bear in
    mind that the regulatory expectation is that of
    clinical benefit to the patient
  • Thus, in general, the primary efficacy endpoint
    should be a measure of clinical effect (as
    opposed to, e.g. a biochemical or physiological
  • Taking the primary efficacy endpoint in a pivotal
    trial to be a biomarker which is not a direct
    measure of clinical benefit is something which
    should be done only with prior buy-in from all
    relevant regulatory agencies
  • In general, such buy-in can be attained only in
    the case of an established surrogate endpoint
    (more on this below)

Endpoints (3): relevance should be accepted
  • Ideally, there is a well-established primary
    efficacy endpoint, accepted as a suitable
    measure of patient benefit.
  • This can circumvent much tedious discussion, and
    has the added advantage that consensus on what
    constitutes a meaningful treatment effect is
    likely already to exist.
  • When such consensus exists, to ignore it would be
  • Often there may be consensus on the choice of
    primary efficacy variable, but secondary aspects,
    such as definition of relapse or rebound may
    still be under debate
  • For diseases with no consensus on how best to
    measure efficacy, expect longer development times
  • It is not recommended to launch Phase I without a
    reasonably clear vision of what the primary
    efficacy variable will be in pivotal studies;
    postponing difficult discussions won't
    necessarily make them any easier
  • Agreement on conventions for handling
    dropouts/missing data is also important

Endpoints (4): Objective is better
  • Generally speaking, endpoints which can be
    measured in a completely objective fashion are
    preferred
  • This may not always be possible; some degree of
    subjectivity may be unavoidable (e.g. in
    endpoints such as physician's or patient's
    evaluation of improvement)
  • The degree to which this kind of subjectivity may
    be acceptable is likely to depend on perceptions
    about the integrity of blinding in the study
  • In evaluating quality of life, use of a
    validated instrument is preferable. In many
    cases, a disease-specific QOL questionnaire
  • Consultation with the Health Economics group is
    highly recommended, to ensure that collection of
    QOL data supports the target product profile
    (don't wait until Phase III to do this)

Endpoints (5): measurement aspects
  • In general, key efficacy endpoints should be
    straightforward to measure. Avoid measures which
    might still be considered experimental, which
    require highly complex instrumentation, or
    involve extremely specialized assays.
    Measurements which rely heavily on technician
    skill or judgement can also be problematic
  • Centralized evaluation of key endpoints may help
    guard against inter-site variation
  • If key variables do involve specialized assays,
    make sure that assay procedures are thoroughly
    understood, and consistently implemented

Endpoints (6): Multiple Endpoints
  • Multiple secondary endpoints are common
  • Multiple primary endpoints are sometimes used
  • If consensus on a single 1° endpoint is
  • Should be a course of last resort (personal view)
  • Have an associated penalty, in terms of a higher
    bar to declare statistical significance at a
    given level α
  • A common approach is to require significance at
    level α/k, where k is the number of
    co-primary endpoints (Bonferroni)
  • Bonferroni works reasonably, provided k is not
    too large, and if the constituent endpoints are
  • For highly correlated endpoints, Bonferroni is
    inefficient: true attained significance will be
    < α
  • Especially problematic if there is interest in
    multiple subsets
  • Try to show some discipline regarding the number
    of 2° endpoints
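
As a small illustration of the Bonferroni approach described above, the sketch below compares each co-primary p-value with α/k. The p-values are made up for illustration, and the function name is hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Compare each p-value with alpha / k, where k is the number of endpoints."""
    k = len(p_values)
    threshold = alpha / k
    return threshold, [p <= threshold for p in p_values]

# Three hypothetical co-primary endpoints (illustrative p-values only).
threshold, significant = bonferroni([0.030, 0.012, 0.200])
```

With three endpoints each p-value must fall at or below 0.05/3, so a result of 0.030 that would pass as a single primary endpoint no longer does.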

Endpoints (7): a statistical taxonomy
  • (1) Continuous - e.g. reduction in cholesterol,
    HbA1c, visual acuity
  • (2) Categorical
  • Multiple categories with no natural ordering
  • Ordered categorical - e.g. different degrees of
  • (3) Dichotomous - e.g. response/non-response,
    dead/alive at a specific time post-treatment
  • (4) Time-to-event - e.g. survival, time to
  • Different analysis methods are appropriate for
    each main endpoint type; sample size requirements
    differ as well
  • (3) is obviously a special case of (2)

Endpoints (8): statistical properties
  • Approximate ordering by information content (from
    highest to lowest) is
  • Continuous > time-to-event ordered
    > categorical > binary
  • As a result, demonstrating an effect when the
    primary efficacy measure is a response rate is
    typically most demanding, in terms of sample size
  • Although continuous response variables may have
    preferable statistical properties, it is quite
    common for FDA to require the primary efficacy
    variable to be a response rate, where response is
    defined as the proportion of subjects who reach a
    specified threshold of improvement on the
    continuous scale (Raptiva, Lucentis)

Endpoints in cancer trials
  • Response rate (where response is based on change
    in tumor size, according to well-defined
    criteria; best post-treatment evaluation is
    counted, so response is not linked to a specific
    timepoint)
  • Duration of response (note that the resolution
    with which this can be determined will depend on
    the frequency of scheduled evaluations)
  • Survival time
  • Time to disease progression, where criteria for
    progression are well-defined
  • Progression-free survival
  • One major question is the extent to which a
    treatment effect on response, in terms of
    reduction of tumor size, is predictive of
    treatment effect on survival. Unfortunately,
    this seems to vary by tumor and treatment class.

Sample Size Considerations
  • In the standard hypothesis testing framework for
  • Type I error: conclude an ineffective drug is
    effective (false positive)
  • Type II error: conclude an effective drug is
    ineffective (false negative)
  • Ideally, both error probabilities should be
  • Generally, sample size is chosen to give
    acceptable power (defined as 1 - Type II error
    rate, or 1 - β) for a prespecified false positive
    rate, α
  • In phase III efficacy trials, α is 0.05, by
    regulatory fiat
  • Acceptable power is generally taken to be 90% for
    pivotal studies

Phase III Trials: Sample Sizes
  • This has implications for sample size, due to
    tension between both types of error
  • Timeline implications, as study duration =
    treatment duration + accrual time
  • Common pitfall: exaggerate extent of the
    possible treatment effect ("power for the home
    run"), giving over-optimistic sample sizes
  • General guideline: power study to detect
    treatment effect specified in the target product
    profile (regular, not optimistic, scenario)
  • In some cases, sample size is dictated by safety,
    rather than efficacy, considerations (satisfy
    minimum regulatory requirements)

Sample Size Considerations
  • For a given value of α, power depends on
  • Magnitude of the treatment effect (↑)
  • Sample size (↑)
  • Inter-subject variability for continuous
    measurements (↓)
  • Response rates for binary responses (↑↓)
  • For most pivotal efficacy trials, the standard
    approach is to calculate the sample size
    necessary to give adequate (90%) power to detect
    a clinically meaningful treatment effect, with a
    type I error rate of 5%
  • Calculating the sample size needed for a given
    power requires some knowledge about variability
    of continuous responses (or response rates, for
    binary data)
  • "Clinically meaningful" needs to be defined in
    terms of the target product profile, not as the
    effect size which will give acceptable power for
    the sample size I'm willing/able to use
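
A minimal sketch of the standard calculation described above, assuming a two-sided test, equal allocation to two groups, and the usual normal approximation (the slides do not specify a formula, so these choices and the function names are assumptions). Inputs are the treatment effect, the variability or response rates, α, and the target power.

```python
import math
from statistics import NormalDist

def n_per_group_means(delta, sigma, alpha=0.05, power=0.90):
    """Subjects per group to detect a mean difference delta, given common SD sigma."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided type I error
    z_b = NormalDist().inv_cdf(power)           # power = 1 - type II error
    return math.ceil(2 * (sigma * (z_a + z_b) / delta) ** 2)

def n_per_group_rates(p_control, p_treated, alpha=0.05, power=0.90):
    """Subjects per group to detect a difference between two response rates."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((z_a + z_b) ** 2 * variance / (p_treated - p_control) ** 2)

# Illustrative inputs only: effect of half a standard deviation,
# and response rates of 30% versus 45%.
n_means_90 = n_per_group_means(delta=0.5, sigma=1.0, power=0.90)
n_means_80 = n_per_group_means(delta=0.5, sigma=1.0, power=0.80)
n_rates_90 = n_per_group_rates(0.30, 0.45, power=0.90)
```

Halving the assumed treatment effect quadruples the required sample size, which is why an over-optimistic effect size leads so quickly to an underpowered study.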

Sample Size: other approaches
  • Sample size is not always dictated by this kind
    of power analysis; in some cases, safety
    requirements may be the deciding factor
    (rheumatoid arthritis, psoriasis)
  • In earlier phases, it may not be practical to run
    trials big enough to control both Type I and Type
    II error rates as well as we might like
  • 80% power is generally considered adequate in
    Phase II; on occasion we may settle for less
  • Similarly, requiring significance at the 5% level
    may be overly stringent in Phase II
  • Personal view: it is foolish to allow the
    hegemony of hypothesis testing to control our
    thinking prior to Phase III
  • Instead, view the issue as an estimation problem
  • Precision analysis
  • Choose sample size in such a way that there is a
    desired precision at fixed confidence level
  • Small chance of detecting true treatment effect
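
The precision analysis idea above can be sketched as follows, assuming the quantity being estimated is a single mean or a single response rate and that a normal-approximation confidence interval is acceptable. The function names and the choice of half-width as the precision measure are assumptions for illustration.

```python
import math
from statistics import NormalDist

def n_for_mean_precision(sigma, half_width, confidence=0.95):
    """Sample size so the confidence interval for a mean has the given half-width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / half_width) ** 2)

def n_for_rate_precision(p, half_width, confidence=0.95):
    """Sample size so the confidence interval for a rate has the given half-width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / half_width ** 2)

# Illustrative inputs only.
n_mean = n_for_mean_precision(sigma=1.0, half_width=0.25)
n_rate = n_for_rate_precision(p=0.5, half_width=0.10)
```

Using p = 0.5 gives the most conservative (largest) sample size for a rate, since p(1 - p) is maximal there.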

Sample Size for Time to Event Endpoints
  • Challenge
  • Power for correctly detecting a clinically
    meaningful difference at a fixed type I error
    rate depends primarily on the number of events
    (deaths, progressions, etc.)
  • Specifying the number of events doesn't uniquely
    determine the number of subjects
  • For instance, suppose the required number of
    events is 280. If 300 subjects per group is
    sufficient to give the required number of events,
    then 250 per group must as well; it will just
    take longer
  • Thus, sample size calculations are a little more
    complex for time-to-event responses and will
    depend on
  • calculating the number of events needed to give
    the desired power
  • an assumption about the median time-to-event in
    the control group
  • an assumption about the size of the difference
    between control and treated groups
  • projected accrual patterns
  • targeted study duration
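
The first step in the list above, the number of events needed for the desired power, is often approximated with Schoenfeld's formula for a logrank comparison with 1:1 allocation. The slides do not name a formula, so this is an assumed choice. The sketch also assumes exponential survival in both groups, so that the hazard ratio equals the ratio of medians; the median values are illustrative only.

```python
import math
from statistics import NormalDist

def hazard_ratio_from_medians(median_control, median_treated):
    """Hazard ratio (treated vs control), assuming exponential survival in both groups."""
    return median_control / median_treated

def required_events(hazard_ratio, alpha=0.05, power=0.90):
    """Total events needed (Schoenfeld approximation, 1:1 allocation, two-sided test)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(4 * (z_a + z_b) ** 2 / math.log(hazard_ratio) ** 2)

# Illustrative assumption: median 12 months on control, 16 months on treatment.
hr = hazard_ratio_from_medians(12.0, 16.0)
events = required_events(hr)
```

Turning the event count into a number of subjects still requires the accrual pattern and study duration, which is why the same event target can be met by different enrollment sizes.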

Interim Analyses
  • Interim analysis is a tool to protect the welfare
    of subjects
  • By stopping enrollment/treatment as soon as a
    drug is determined to be harmful
  • By stopping enrollment as soon as a drug is
    determined to be beneficial
  • By stopping trials which will yield little
    additional useful information (or which have
    negligible chance of demonstrating efficacy if
    fully enrolled, given results to date)
  • The associated statistical methods are generally
    referred to as group sequential methods

Interim analysis: Concerns
  • Should preserve an overall false positive rate of
    α for the trial: cannot claim statistical
    significance at level α if the unadjusted p-value
    at one of the interim analyses happens to be less
    than α
  • In general, the unadjusted p-value for testing
    treatment effect at any given interim analysis
    will be compared to a more stringent (lower)
    bound; to stop early (for efficacy) requires
    compelling evidence
  • Regulatory agencies need to be convinced that
    interim analyses do not compromise the integrity
    of the blind
  • Regulatory guidelines over the past 10 years have
    become stricter and stricter, ultimately
    requiring that interim analyses be conducted by
    an external, independent group, i.e. study team
    members are no longer privy to interim results
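
The first concern above, inflation of the overall false positive rate by repeated unadjusted looks, can be illustrated with a small simulation. This sketch assumes five equally spaced looks, normally distributed data with no true treatment effect, and a stop at the first look whose z-statistic crosses the bound. The value 2.413 is the commonly tabulated Pocock constant for five looks at two-sided α = 0.05; it is used here as one example of a group sequential bound, not as the method the slides have in mind.

```python
import random
from statistics import NormalDist

def false_positive_rate(n_looks, critical_z, n_trials=20000, seed=1):
    """Simulate trials under the null and count how often any look crosses the bound."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        total = 0.0
        for look in range(1, n_looks + 1):
            # Each look adds one equal-sized increment of standardized information.
            total += rng.gauss(0.0, 1.0)
            z = total / look ** 0.5
            if abs(z) > critical_z:
                hits += 1
                break
    return hits / n_trials

unadjusted_bound = NormalDist().inv_cdf(0.975)  # about 1.96
naive_rate = false_positive_rate(5, unadjusted_bound)
pocock_rate = false_positive_rate(5, 2.413)
```

With the unadjusted bound the simulated overall false positive rate is well above the nominal 5%, while the more stringent bound brings it back to roughly 5%.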

Interim analysis: Concerns
  • Basically, interim results should not be shared
    with anyone in the sponsor company, or at
    participating study centers
  • The only feedback to the sponsor is in the form
    of the recommendations from the Data Monitoring
    Committee (DMC)
  • Details of any proposed interim analysis,
    including the sponsor's expectations of the DMC,
    should be laid out prospectively in a written
  • SOPs and a charter template exist and should be
  • Although team members do not conduct the actual
    analyses, scheduled interim analyses can be
    highly labor-intensive nonetheless. Genentech's
    biostatistician/statistical programmer will still
    need to work with the external data group to
    develop detailed specifications for the analyses
    and displays to be made available to the Data
    Monitoring Board

Interim analysis
  • Early stopping for efficacy is not the only
    possibility (recent experience notwithstanding).
    Doing so is generally non-controversial, provided
    an appropriate group sequential stopping rule,
    and the role of the DMC, have been identified
  • Early stopping for safety can range from
    scenarios which are very clear-cut to situations
    which are considerably more ambiguous. In the
    latter case, having an experienced DMC chair can
    be particularly important
  • Early stopping for lack of efficacy (futility
    analysis) is not particularly common (with one
    exception, discussed on the next slide); the
    idea that incorporating this option can result in
    substantial reduction in the number of patients
    (gating risk) seems slightly misleading
    (personal opinion)
  • Stopping for futility in a controlled trial will
    typically happen only if the treatment appears
    considerably inferior to control at the interim
  • Enrolment continues during preparation for the
    interim analysis, which typically occurs at a
    point where accrual has gained momentum, so the
    number of subjects saved may not be that great

Early stopping for futility
  • An exception is the case of uncontrolled oncology
    trials focusing on estimation of response rate
  • Use of a two-stage (or multi-stage) design is
  • At a given analysis stage, if the observed
    response rate is so low that it essentially rules
    out the possibility that the true response rate
    is acceptable, may choose to stop
  • Typically the argument is based on the upper 90%
    or 95% confidence limit for the true response
    rate: stop if this is lower than the minimum
    rate identified as interesting in the TPP
  • Recall the "rule of 3", often invoked in the
    context of safety data. If a particular event
    (adverse reaction, response) occurs in 0 out of N
    subjects tested, then the 95% upper confidence
    limit for the true rate of occurrence is
    approximately 3/N.
  • Thus, for instance, if no responses are observed
    in the first 20 subjects, this effectively rules
    out values of the true response rate greater than
    3/20, or 15%. If the TPP requires a response rate
    of at least 20%, stopping for futility seems
    reasonable
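
The rule of 3 above is an approximation to the exact one-sided binomial limit: with 0 events in N subjects, the upper limit p solves (1 - p)^N = 0.05, and since -ln(0.05) is close to 3, p is roughly 3/N. A short sketch comparing the two (function names are hypothetical):

```python
def exact_upper_limit(n, confidence=0.95):
    """Exact one-sided upper confidence limit for a rate when 0 of n events are seen."""
    return 1 - (1 - confidence) ** (1 / n)

def rule_of_three(n):
    """Quick approximation to the 95% upper limit when 0 of n events are seen."""
    return 3 / n

exact_20 = exact_upper_limit(20)
approx_20 = rule_of_three(20)
```

For N = 20 the exact limit is slightly below the 3/20 given by the rule, so the rule errs on the conservative side.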

Statistical analysis methods for rates
  • A fairly detailed exposition can be found on our
    website at gwiz/projects/stathelp
    introductory course notes, lecture 4
  • Use of the binomial distribution
  • Calculating standard errors: normal approximation
    for large samples
  • Estimation and confidence intervals for a single
    rate
  • Testing for difference between two rates (z-test,
    χ²-test, Fisher's exact test)
  • Estimation and confidence intervals for the
    difference between two rates
  • Testing for differences in rates among several
    groups (χ²-test, Fisher's exact test)
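
As an illustration of two items in the list above, the sketch below runs a large-sample z-test for the difference between two rates and computes a normal-approximation confidence interval for that difference. It assumes samples large enough for the normal approximation; the counts are invented for illustration and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def two_rate_z_test(x1, n1, x2, n2, confidence=0.95):
    """z-test and confidence interval for the difference between two response rates."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled standard error is used for the test (rates equal under the null).
    pooled = (x1 + x2) / (n1 + n2)
    se_test = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_test
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error is used for the confidence interval.
    se_ci = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return z, p_value, (diff - z_crit * se_ci, diff + z_crit * se_ci)

# Illustrative counts only: 45/100 responders versus 30/100.
z, p_value, ci = two_rate_z_test(45, 100, 30, 100)
```

For small samples or rates near 0 or 1, Fisher's exact test is the safer choice than this approximation.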

Statistical methods for survival analysis
  • If the response of interest is survival time,
    then specialized methods are needed, for two main
    reasons
  • Frequency distribution of survival times is
    usually not well-behaved: not normal, not even
  • In the context of clinical studies, cannot wait
    to observe all survival times; this means, for
    some subjects, all we know is that their survival
    time exceeds the observation period
  • In statistical jargon, such survival times are
    called (right)-censored observations
  • Methods for survival times are also applicable to
    any response of time-to-event type, e.g. time
    to disease progression, etc.

Overview of survival analysis methods
  • Definitions: survivor function, hazard function
  • Estimation of survival curve: Kaplan-Meier
  • Comparison of two or more survival curves:
    logrank test, Wilcoxon test
  • Comparing survival curves, allowing adjustment
    for other factors (e.g. baseline disease status):
    proportional hazards regression, aka the Cox
    model

[Figure: Kaplan-Meier disease-free survival curves stratified by p53 mutation status (n = 542). Solid/dotted lines: without/with a p53 tumor mutation.]

Graphing survival data: Kaplan-Meier estimation
  • We wish to estimate the proportion remaining
    disease-free at any given time or, equivalently,
    the estimated probability that a member of the
    population from which the sample is drawn is
    alive without disease at that time
  • Because of the censoring we use the Kaplan-Meier
    method. For each time interval we estimate the
    probability that those without disease at the
    beginning remain so throughout the interval. This
    is a conditional probability.
  • The probability of being disease-free at any time
    point is calculated as the product of the
    conditional probabilities of surviving without
    disease through each interval prior to that time
  • The calculations are simplified by ignoring times
    at which there were no recorded events (whether
    progressions or losses to censorship).
  • Censorship is accommodated in the calculations by
    ensuring that all subjects previously lost to
    censoring are removed from the risk set when
    calculating the conditional probability for a
    given timepoint
  • Because the overall probability of being
    disease-free at a particular timepoint is
    calculated as a
    product of the relevant conditional
    probabilities, this (Kaplan-Meier) method of
    estimating the survival curve is sometimes
    referred to as the product-limit estimate
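
The product-limit calculation described above can be sketched directly. This is a bare-bones illustration, not a replacement for a validated statistics package. It assumes that subjects censored at the same time as an event are still counted as at risk for that event (a common convention), and the data are invented.

```python
def kaplan_meier(times, events):
    """Product-limit estimate.

    times:  follow-up time for each subject
    events: True if the event was observed, False if the subject was censored
    Returns a list of (time, estimated survival) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            # Conditional probability of remaining event-free through time t.
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set.
        n_at_risk -= n_leaving
    return curve

# Illustrative data: six subjects, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 7, 8], [True, True, False, True, False, True])
```

Times with only censored observations produce no step in the curve, but they do shrink the risk set used at later event times.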

Describing survival pattern for a single group
  • Survival probabilities are usually presented as a
    connected "curve". The curve takes the form of a
    step function, with changes in the estimated
    probability occurring (only) when an event
    (progression) was observed
  • Observations censored during any interval affect
    the number still at risk at the start of the next
    interval. Censoring is thus accommodated when
    calculating the step sizes; its effect on the
    curve is relatively subtle, but becomes
    cumulatively more important over time. Some
    versions of the Kaplan-Meier curve display
    censoring times as superimposed short vertical
    lines (works best for relatively small sample
    sizes)
  • In practice, a computer is used to do these
    calculations
  • Standard errors and confidence intervals for
    estimated survival probabilities can be found by
    using a formula due to Greenwood
  • Reporting estimated median survival with
    associated confidence limits is usual; estimating
    other percentiles is also possible

Comparing survival patterns across groups
  • Two most common tests are
  • Logrank test
  • Wilcoxon test
  • If comparison needs to allow adjustment for other
    covariates besides group ID (e.g. baseline disease
    status), the most common approach is
  • Cox (proportional hazards) regression
  • As the name implies, this analysis frames the
    comparison in terms of the effect a treatment or
    covariate exerts on the hazard function, rather
    than directly on the survival function

Comparing survival patterns: testing
  • Logrank test
  • Basic idea: at each new event time, figure out
    the survival pattern that would be expected if
    the null hypothesis (no difference) were true
  • Quantify the difference between the observed
    survival pattern and that expected under the null
    hypothesis. This is done at each new event time.
  • Obtain a cumulative measure of discrepancy from
    H0 by adding up the contributions across all
    event times
  • Compare the result to appropriate tables
    (chi-square) to obtain a p-value
  • Wilcoxon test: variation of the logrank test which
    gives greater weight to discrepancies occurring
    at early event times
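
The logrank steps listed above translate fairly directly into code. The sketch below is a minimal two-group version, assuming the usual hypergeometric variance at each event time and a chi-square reference distribution with 1 degree of freedom. The data are invented, and real analyses would use a validated package.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group logrank test. Returns (chi-square statistic, p-value)."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers still at risk just before time t.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Events occurring exactly at time t.
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n          # expected under the null hypothesis
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Illustrative data with all events observed: group B clearly outlives group A.
all_events = [True] * 5
chi_square, p_value = logrank_test([1, 2, 3, 4, 5], all_events,
                                   [6, 7, 8, 9, 10], all_events)
```

A weighted variant, such as the Wilcoxon test mentioned above, multiplies each event time's contribution by a weight (for example the number at risk) before summing.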

Comparing survival patterns estimation
  • Limitations of the logrank test
  • Only addresses the question "is there a
    difference?" No direct quantification of the size
    of the difference
  • Doesn't allow adjustment for other relevant
    prognostic factors (e.g. differences at baseline)
  • These questions are usually addressed by Cox
    (proportional hazards) regression. Salient output:
  • estimated coefficient with standard error and/or
    confidence interval
  • Usually interested in whether or not coefficient
    is zero
  • Quantifies effect on hazard, rather than the
    survival function
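
To make the "salient output" concrete: a Cox coefficient is on the log-hazard scale, so exponentiating it gives a hazard ratio, and the usual Wald interval and test follow from the coefficient and its standard error. The sketch below assumes a normal approximation; the function name and the numbers fed to it are hypothetical, not results from any trial.

```python
from math import erfc, exp, sqrt

def summarize_cox_coefficient(beta, se, z_crit=1.959964):
    """Turn a Cox coefficient and its standard error into a hazard ratio,
    an approximate 95% confidence interval, and a two-sided Wald p-value."""
    hazard_ratio = exp(beta)
    lower = exp(beta - z_crit * se)
    upper = exp(beta + z_crit * se)
    z = beta / se
    # Two-sided p-value for H0: coefficient = 0 (hazard ratio = 1)
    p_value = erfc(abs(z) / sqrt(2.0))
    return hazard_ratio, lower, upper, p_value

# Hypothetical fitted values for a treatment indicator
hr, lo, hi, p = summarize_cox_coefficient(beta=-0.5, se=0.2)
print(f"HR={hr:.2f}  95% CI=({lo:.2f}, {hi:.2f})  p={p:.4f}")
```

A coefficient of zero corresponds to a hazard ratio of 1, which is why the test of interest is usually whether the coefficient differs from zero.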

Definitions of survival and hazard functions
  • For completeness, here are the definitions
  • Survival function
  • S(t) = probability of surviving past time t
  • Hazard function
  • h(t) = instantaneous rate of dying at time t,
    given one has survived until that time
  • For calculus fans, the hazard function turns out
    to be h(t) = -(d/dt) log S(t)
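
The relationship between the hazard and survival functions can be checked numerically. The sketch below assumes a Weibull survival curve purely for illustration (the shape and scale values are arbitrary) and compares a finite-difference derivative of -log S(t) with the known Weibull hazard.

```python
from math import exp, log

def survival_weibull(t, shape=1.5, scale=10.0):
    """Weibull survival function S(t) = exp(-(t/scale)^shape)."""
    return exp(-((t / scale) ** shape))

def hazard_weibull(t, shape=1.5, scale=10.0):
    """Closed-form Weibull hazard, used here only as a reference value."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def hazard_from_survival(survival, t, dt=1e-5):
    """Approximate h(t) = -(d/dt) log S(t) with a central difference."""
    return -(log(survival(t + dt)) - log(survival(t - dt))) / (2.0 * dt)

for t in (1.0, 4.0, 9.0):
    numeric = hazard_from_survival(survival_weibull, t)
    exact = hazard_weibull(t)
    print(f"t={t}: numeric={numeric:.6f}  closed form={exact:.6f}")
```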

Safety analyses
  • Safety and efficacy data differ in some key
    respects
  • Safety hypotheses are not specified a priori
  • Failure to achieve statistical significance does
    not mean that a safety finding can be ignored
  • With safety data the goal is to prove a negative
  • Safety analyses are usually descriptive
  • A few serious medical events can lead to the
    termination of a product's development; extreme
    value distributions are relevant to safety
  • Concurrent controls may not provide adequate
    context for interpretation

Safety Analysis - Challenges
  • Phase III trials are typically sized based on
    efficacy: what type of safety statements are
    possible?
  • Drug exposure: how to summarize, how to
    correlate with adverse events observed, etc.
  • Dose response
  • Open label trials
  • Placebo-controlled trials
  • Sources of bias (under-reporting, longer
    follow-up leads to more events)
  • Adverse events: very many types, so what is
    an appropriate way to summarize/analyze?
  • Multiplicity

Safety Analyses - Challenges
  • Number of subjects and duration of exposure
    during development are minimal relative to the
    number of patients that may receive drug
    post-approval
  • Only the most common AEs (e.g., incidence of 1%
    or more) are identified
  • Less common AEs (1 in 1000) cannot be reliably
    detected
  • Rare events (1 in 10,000) will almost certainly
    not be observed at all
  • Some patient groups may have been excluded from
    trials entirely, or insufficiently represented
    to a degree which precludes identifying any risks
    specific to them
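
A short calculation shows why rarer events slip through. If each subject independently has the same chance of experiencing an adverse event, the probability of seeing at least one case among n subjects is 1 - (1 - p)^n. The sketch below uses a hypothetical safety database of 1,500 subjects; that size and the independence assumption are illustrative only. Seeing a single case is also not the same as recognizing it as drug-related, so these figures are optimistic.

```python
def prob_at_least_one(incidence, n_subjects):
    """Probability of observing one or more events among n_subjects,
    assuming independent subjects with a common true incidence."""
    return 1.0 - (1.0 - incidence) ** n_subjects

def rule_of_three_upper_bound(n_subjects):
    """Approximate upper 95% confidence bound on the incidence
    when zero events were observed among n_subjects."""
    return 3.0 / n_subjects

n = 1500  # hypothetical number of subjects exposed during development
for label, incidence in [("1 in 100", 0.01), ("1 in 1000", 0.001), ("1 in 10,000", 0.0001)]:
    print(f"{label:>12}: P(at least one case) = {prob_at_least_one(incidence, n):.3f}")

print(f"zero events in {n} subjects still allows incidence up to about "
      f"{rule_of_three_upper_bound(n):.4f}")
```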

Regulatory Requirements
  • Safety
  • Applicant must demonstrate product safety (FDA
    has obligation to demand)
  • Extent of data: There must be sufficient
    information to decide whether the drug is safe.
  • Adequate analyses: Adequate tests by all
    methods reasonably applicable must be performed
    to evaluate safety for labeled use.
  • Reasonable results: Tests should show that drug
    is safe as labeled
  • Risks must be adequately defined.
  • Extreme risks (even if rare) must be obvious.

Regulatory Requirements
  • Efficacy
  • Applicant must demonstrate substantial evidence
    of effectiveness claimed.
  • Substantial evidence: evidence consisting of
    adequate and well-controlled investigations,
    including clinical investigations, from which
    experts could conclude the drug will have the
    claimed effect.
  • Investigations imply replication or
  • Typical: 2 Phase III trials with identical or
    similar designs
  • In special circumstances, 1 Phase III trial may
    be sufficient.
  • E.g. life-threatening diseases with very limited
    therapeutic options (always a good idea to talk
    to regulatory agencies prior to trial initiation)

Guidelines and Regulations
  • Regulatory Agencies
  • FDA
  • EEC (European Economic Community)
  • U.S. Codes of Federal Regulations for Clinical
  • ICH (International Conference on Harmonization)
  • Initiatives undertaken by regulatory authorities
    and industry associations to promote
    international harmonization of regulatory
  • Good Clinical Practice (GCP)
  • Structure and content of clinical studies
  • Clinical safety data management: Definitions and
    standards for expedited reporting
  • Statistical principles for clinical trials

Biomarker - working definition
  • "... a laboratory measurement or physical sign
    used as a substitute for a clinical endpoint that
    measures how a patient feels, functions, or
    survives"
  • from a definition of the term "surrogate
    endpoint" by Temple, cited in Fleming and DeMets
    (1996), "Surrogate endpoints in clinical trials:
    are we being misled?", Annals of Internal
    Medicine, 125, pages 605-613

  • Some thoughts on biomarkers

Biomarkers as surrogate endpoints
  • Predict clinical efficacy of treatment based
    on its effect on biomarker (data may be
    available earlier; may provide answer with a
    smaller number of subjects)
  • Use in Phase II is common
  • dose ranging based on biomarker
  • Phase III go/no-go decision based on observed
    treatment effect on biomarker

Common biomarker types
  • Biochemical (cholesterol, HIV viral load,
    cytokine concentration, hemoglobin A1c ...)
  • Immunological (lymphocyte subpopulation counts,
    CD4, CD11a T cells, CD20 B cells ...)
  • Saturation of target cell surface antigen or
    soluble ligand
  • Physiological (e.g. blood pressure, pulmonary
    function testing, episodes of arrhythmia ...)
  • Imaging (angiography, tumor size, bone density by
    DEXA scan ...)

Biomarkers as surrogates - successes
  • Lowering of cholesterol level by treatment with
    statins (survival benefit established)
  • Reduction in viral RNA in peripheral blood
    through treatment with protease inhibitors delays
    HIV disease progression
  • Improved glycemic control (HbA1c) predictive of
    delayed onset of microvascular complications
    (retino-, nephro-, neuropathy) in Type I diabetes
  • 90-minute TIMI flow (angiography) predictive of
    30-day survival following thrombolytic therapy
  • Reduction in free IgE following treatment with an
    anti-IgE antibody correlates with symptom
    improvement scores in allergic rhinitis and asthma

Biomarkers as surrogates - can't win 'em all
  • Experience with biomarkers is not always positive
  • CD4 counts as a surrogate in AIDS trials: mixed
    performance as a predictor of clinical benefit
  • Tumor size in cancer trials: experience runs
    both ways; appears to depend both on tumor type
    and on class of treatments
  • Experience in the CAST trial demonstrated that
    treatment with encainide/flecainide clearly
    reduced the incidence of arrhythmias, but
    increased mortality
  • Similar results in context of treating atrial
  • Blood pressure as surrogate: effect translates
    to clinical benefit for some drug classes, but
    not others

What can make biomarkers unreliable?
  • Biomarker not on causal pathway of disease
  • Several pathways: intervention affects the one
    mediated through the biomarker, but not others
  • Biomarker not on the pathway affected by the
    intervention, or is insensitive to treatment
  • Intervention has mechanisms of action unrelated
    to the disease process (aka the "law of
    unintended consequences")
  • Failure of either type is possible - biomarker
    could falsely predict, or fail to predict,
    clinical benefit

What can make biomarkers unreliable?
  • Other potential contributing factors include
  • Measurement difficulties due to rater effects
  • GNE experience (?-interferon in renal cell
    carcinoma) strongly supports advisability of
    blinded tumor evaluation by a single central
    review board (to limit bias, minimize center
    differences)
  • Measurement difficulties arising from sample
    transport, storage, and handling
  • Time constraints in assaying fresh blood,
    possible effects of activation of T-cells, lack
    of standardization of FACS assay protocols and
    reporting methods, heterogeneity of samples,
    center differences (use of local or central labs)

What can make biomarkers unreliable?
  • Other potential assay-related difficulties
    include:
  • Matrix effects
  • Interference by other proteins can affect assay
    specificity and/or sensitivity
  • Development of antibodies
  • Can be hard to detect, harder to quantify, and
    extremely difficult to assess clinical
    significance, if any
  • Inter-laboratory differences
  • Can be large enough to make biomarker data

Biomarkers - editorial comments
  • Avoid the "what we can measure is what we should
    measure" fallacy
  • Experience with imaging-based biomarkers to date
    has been disappointing
  • Non-targeted genomic assays (e.g. microarrays
    followed by data mining) have the potential for
    much wasted effort
  • Avoid the "rearranging the deckchairs on the
    Titanic" fix, e.g. straining to improve assay
    precision from a CV of 20% to 15% when the
    within-subject CV for the marker is 40% and the
    inter-subject CV is 50%.
  • Cytokines make particularly treacherous
  • Proteomics is not for sissies
  • Distinguish between "must know" and
    "nice to know"
  • An understanding of mechanism of action may be
    nice to know, but is not a requirement for drug

Personal opinions (tongue in cheek)
  • If the word "cascade" appears in the description
    of the disease process, all bets are off
  • The topic of biomarkers seems to drive otherwise
    thoughtful researchers to an irrational frenzy of
    wishful thinking
  • The message so eloquently expounded by Jagger et
    al remains as relevant today as it was in 1969
  • Lasagna's Law already militates against rapid
    accrual of eligible subjects to clinical trials
  • To slow recruitment from a trickle to a complete
    grinding halt, only two words are needed in the
    protocol: "serial biopsy"

Biomarkers - general conclusions
  • Utility of a particular biomarker depends not
    only on the disease, but also on the nature of
    the therapeutic intervention
  • Validation of any candidate biomarker must
    necessarily be considered on a case-by-case basis
  • Validity of a marker for a given drug class may
    not transfer to other drug classes for the same
    disease
  • Success is most likely when intervention clearly
    affects the biomarker, whose role in the disease
    process is well-established and clearly
    understood
  • Validation of a putative marker cannot happen
    without ultimately generating the required
    clinical outcome data
  • Regulatory conservatism is to be expected, and
    seems appropriate
