EPI 260 Statistics in Phase II Clinical Trials - PowerPoint PPT Presentation


EPI 260: Statistics in Phase II Clinical Trials
Jimmy Hwang, Ph.D., Biostatistics Core, Cancer
Center, UC San Francisco, April 29, 2010

Early Phase Clinical Development: Phase II
Studies Statistics (in syllabus)
  • Purpose of Phase II clinical studies
  • Phase II study design
  • formulation of testable hypotheses
  • determine the study endpoints and when they will
    be evaluated
  • define the population to be studied
  • select the appropriate study design
  • Determine the required sample size by making
    assumptions about the extent of benefit to be
    achieved with the new treatment and acceptable
    errors in making a final decision about whether
    the null hypothesis can be rejected
  • Methods for statistical analysis

Four types of trial designs (1)
  • Phase I: pharmacologically oriented
  • The safe dose range
  • The side effects
  • How the body copes with the drug
  • If the treatment shrinks cancer
  • Phase II: preliminary evidence of efficacy and
    safety
  • If the new treatment works well enough to test in
    phase 3
  • Which types of cancer it is effective against
  • More about side effects and how to manage them
  • More about the most effective dose to use

Four types of trial designs (2)
  • Phase III: new treatments are compared with the
    best currently available treatment (the standard
    treatment)
  • A completely new treatment vs. the standard
    treatment
  • Different doses or ways of giving a standard
    treatment
  • A new radiotherapy schedule vs. the standard one
  • Phase IV: post-marketing surveillance
  • More about the side effects and safety of the
    drug
  • What the long-term risks and benefits are
  • How well the drug works when it's used more
    widely than in clinical trials

Statistical Considerations
  • Define Clinical Question (Objectives).
  • Study Development and Protocol Development
  • Types of Study (pilot, clinical trial,
    observational, etc.)
  • Endpoints (feasibility and appropriateness)
  • Protocol Development (objectives, aims,
    statistical design, patient selection, data
    collection procedures, number of points, stopping
    rules and interim analysis, statistical
    endpoints, analysis plan, sample size)
  • During study: randomization, data quality
    control, interim analysis and/or monitoring of
    patient safety
  • Study finishing: data lock, data analysis and
    interpretation, assisting decisions about
    follow-up studies, and preparation of papers
    and …

Statistical Perspectives
  1. Philosophy of inference divides statisticians:
    frequentists versus Bayesians
  2. Statistical procedures are not standardized.
  3. Things to consider
  4. Randomization
  5. Intent-to-treat design
  6. Unbalanced groups
  7. Stratification
  8. Large-scale and small clinical trials, meta-analysis
  9. Adjusted or weighted analysis
  10. Trials can provide confirmatory evidence.
  11. Other methods are valid for making clinical
    decisions.

Basic Question
  • Clinical reasoning requires generalizing from
    individual patients.
  • Statistical reasoning emphasizes inference based
    on structured data processing.
  • Which treatment is safer and better?

  • Benefit could be defined as
  • Antitumor activity
  • Safety
  • The pharmacokinetics or pharmacodynamics
  • The biologic correlates that may predict
    response or resistance to treatment and/or …

Intent-to-treat (ITT) Principle
  • Unlike in animal studies, investigators cannot
    dictate what a participant does in a
    clinical trial.
  • A participant may forget to take the pills,
    receive a dose reduction due to toxicity, drop out
    of the study at any point, or be lost to follow-up.
  • Use only full compliers? Use all subjects?
  • ITT compares intervention strategies rather than
    the treatment actually received

Standards of Ethical Conduct
  1. The study participants must give voluntary
    informed consent.
  2. There must be no reasonable alternative to
    conducting the experiment.
  3. The anticipated results must have a basis in
    biological knowledge and animal experimentation.
  4. The procedures should avoid unnecessary suffering
    and injury.
  5. There is no expectation for death or disability
    as a result of the trial.

Standards of Ethical Conduct
  1. The degree of risk for the patient is consistent
    with the humanitarian importance of the study.
  2. The subjects are protected against even a remote
    possibility of death or injury.
  3. The study must be conducted by qualified
    persons.
  4. The subject can stop participation at will.
  5. The investigator has an obligation to terminate
    the experiment if injury seems likely.

Study Protocol
  • Every well-designed study requires a protocol.
  • The protocol is a written agreement between
    investigators, participants, and the scientific
    community.
  • The protocol is a comprehensive operational
    manual. It specifies the standard operating
    procedures.

Defining study questions
  • Each clinical trial must have a primary question.
  • The primary question, as well as any secondary or
    subsidiary questions, should be carefully
    selected, clearly defined, and stated in advance.
  • Selection of the questions
  • Primary and secondary objectives
  • Interventions
  • Response variables
  • Surrogate endpoints, biomarkers

Primary Objective
  • Define the one question that the investigators
    are most interested in answering and that can be
    adequately answered.
  • Define the primary endpoint
  • Toxicity, efficacy (response/survival), QOL
  • Define the type of study
  • Hypothesis testing or estimation
  • Superiority or equivalence trials
  • The sample size is based on the primary objective.

Secondary Objectives
  • Different endpoints
  • Subgroup hypotheses
  • Prospectively defined
  • Based on reasonable expectations
  • Limited in number
  • Hypothesis testing vs. hypothesis generating
  • Hunting expedition vs. fishing expedition
  • Multiplicity Issues
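
The multiplicity problem can be put in numbers. A minimal sketch, assuming k independent tests, each run at level α, with every null hypothesis true (real secondary endpoints are usually correlated, so this is an idealization): the chance of at least one false positive is 1 - (1 - α)^k. The Bonferroni adjustment (test each hypothesis at α/k) is shown as one standard remedy; the slides do not say which adjustment the course uses.

```python
def familywise_error(alpha: float, k: int) -> float:
    """P(at least one false positive) across k independent tests, all nulls true."""
    return 1 - (1 - alpha) ** k


def bonferroni_level(alpha: float, k: int) -> float:
    """Per-test level that keeps the familywise error at or below alpha."""
    return alpha / k


for k in (1, 5, 10):
    unadjusted = familywise_error(0.05, k)
    adjusted = familywise_error(bonferroni_level(0.05, k), k)
    print(k, round(unadjusted, 3), round(adjusted, 3))
```

With five secondary endpoints tested at 0.05, the chance of at least one spurious "finding" is already above 20%.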

What Study Aims Tell You
  • Type of study and general design
  • (pilot, phase I, II or III study arms)
  • Who is eligible
  • Outcome measure
  • (e.g. toxicity, response, duration, biomarker)
  • When outcome will be evaluated
  • (Timing of evaluations)

Interim Analysis: Why?
  • Many trials require a large N and/or long duration.
  • Interim analysis can result in more efficient
    designs, and a correct conclusion can be reached
    earlier
  • Ethical considerations
  • Pace of scientific advancement demands learning
    from the observed data.
  • Public health concerns, pressure from activists
  • Requirement from IRB and other regulatory agencies

Interim Analysis: Factors to Consider before
Early Termination
  • Possible differences in prognostic factors among
    treatment groups
  • Bias in assessing response variables
  • Impact of missing data
  • Differential concomitant treatment or adherence
  • Differential side effects
  • Secondary outcomes
  • Internal consistency
  • External consistency, other trials

Interim Analysis: Reasons for Early Stopping
  • Efficacy: treatments are convincingly different
    or not different (as judged by impartial,
    knowledgeable experts)
  • Toxicity: serious adverse events, side effects, or
    toxicity are too severe (they outweigh the
    potential benefits)
  • Futility: a significant difference at the end of
    the trial is unlikely
  • Data are of poor quality
  • Accrual is too slow to finish in a timely fashion
  • More information becomes available outside the
    study (making it unnecessary or unethical to
    continue)
  • Scientific questions are no longer important
  • Poor adherence (preventing answers to basic
    questions)
  • Resources for the study are lost or no longer
    available
  • Fraud or misconduct undermines study integrity.

Interim Analysis: To Stop or Not To Stop?
  • How sure?
  • Is the evidence strong enough, or is it just due to
    stochastic variation, imbalance in covariates,
    or other factors?
  • Wrongly stopping for efficacy: false positive
  • False claim that the drug is active
  • Time and money wasted on future development
  • Wrongly stopping for futility: false negative
  • Kill a promising drug
  • Group ethics vs. individual ethics
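
The "How sure?" question can be made concrete with a small simulation. The sketch makes several assumptions: a one-sample z-test of a normal mean with known SD, five equally spaced looks, each tested without adjustment at the nominal two-sided 0.05 level, and no true effect. Stopping the first time |Z| > 1.96 pushes the false-positive rate well above 5%. This inflation is the reason group-sequential stopping boundaries exist, although the slide does not name a specific method. All parameter values are illustrative.

```python
import random
from statistics import NormalDist


def false_positive_rate(looks=5, n_per_look=20, alpha=0.05, n_sims=20000, seed=2010):
    """Share of simulated null trials that 'stop for efficacy' at some unadjusted look."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    stops = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for _ in range(looks):
            # Add one batch of N(0, 1) outcomes, so H0 (true mean 0) holds.
            # The sum of n_per_look such outcomes is N(0, n_per_look).
            total += rng.gauss(0.0, n_per_look ** 0.5)
            n += n_per_look
            if abs(total) / n ** 0.5 > z_crit:  # z = sample mean / (1 / sqrt(n))
                stops += 1
                break
    return stops / n_sims


print(false_positive_rate(looks=1), false_positive_rate(looks=5))
```

With one look, the rate should come out near the nominal 0.05. With five unadjusted looks, it should be roughly 0.14.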

Data Assessment: Reasons for Noncompliance
  • Toxicity or side effects
  • Interventions involving lifestyle/behavior change
  • Complex or inconvenient interventions
  • Insufficient understanding
  • Change of mind, refusal
  • Lack of family support
  • If non-compliance is treatment-dependent, it will
    result in biased data

Data Assessment: Non-adherence
  • Includes non- or partial compliers, drop-ins, and
    drop-outs
  • Could be due to toxicity, lack of efficacy, or
    refusal.
  • Need to compare the non-adherence rate between
    treatment groups
  • Exclude in the analysis
  • Rationale: patients not taking the medication will
    not benefit from it.
  • Compares the optimal intervention vs. control
  • Can lead to biased results
  • Include in the analysis
  • Intent-to-treat (ITT) principle
  • Power reduced but also less bias
  • More relevant for generalizing study results to
    the real-world setting
  • Do both: sensitivity analysis
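
Here is a sketch of the "do both" sensitivity analysis on hypothetical data. The `Patient` records and all counts are invented for illustration. Non-compliance in this example is treatment-dependent: patients on E who stop early do not respond. The per-protocol comparison (compliers only) therefore inflates E's apparent benefit, while ITT analyzes everyone as randomized.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    arm: str         # "E" (experimental) or "S" (standard), as randomized
    complied: bool   # took the assigned treatment as planned
    responded: bool  # primary outcome


def response_rate(patients, arm, compliers_only=False):
    """Response rate in one arm: all randomized (ITT) or compliers only (per-protocol)."""
    group = [p for p in patients if p.arm == arm and (p.complied or not compliers_only)]
    return sum(p.responded for p in group) / len(group)


# Hypothetical trial: 30 patients per arm; 10 on E stop early (e.g. for toxicity)
trial = (
    [Patient("E", True, True)] * 12 + [Patient("E", True, False)] * 8
    + [Patient("E", False, False)] * 10
    + [Patient("S", True, True)] * 9 + [Patient("S", True, False)] * 19
    + [Patient("S", False, False)] * 2
)

for label, per_protocol in (("ITT", False), ("per-protocol", True)):
    print(label,
          round(response_rate(trial, "E", per_protocol), 2),
          round(response_rate(trial, "S", per_protocol), 2))
```

The difference between the arms grows from 10 percentage points (ITT) to about 28 (per-protocol). A gap of this kind is what the sensitivity analysis is meant to expose.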

Data Assessment: Poor Quality or Missing Data
  • Missing visits may or may not be due to outcomes
    related to treatment, such as patients' health
    status
  • Informative or non-informative missingness
  • Missing completely at random
  • Missing at random (missingness does not depend on
    unobserved values)
  • Not missing at random
  • Available methods
  • Complete case analysis
  • Last value carried forward
  • Single imputation
  • Multiple imputation
  • Sensitivity analysis
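
Two of the listed methods, complete case analysis and last value carried forward (LOCF), are simple enough to sketch directly. The visit data are hypothetical. LOCF implicitly assumes that a patient's outcome stays frozen after dropout, which is rarely plausible. Both methods can be biased unless the data are missing completely at random.

```python
from statistics import mean

# Hypothetical scores at 4 visits per patient; None marks a missed visit
visits = {
    "pt1": [10.0, 12.0, 13.0, 15.0],
    "pt2": [11.0, 11.5, None, None],  # dropped out after visit 2
    "pt3": [9.0, None, 10.0, 12.0],   # missed visit 2 only
}


def locf(series):
    """Last value carried forward: fill each gap with the most recent observed value.

    A gap before the first observed value stays None.
    """
    filled, last = [], None
    for value in series:
        last = value if value is not None else last
        filled.append(last)
    return filled


def complete_cases(data):
    """Keep only patients with no missing visits."""
    return {pid: s for pid, s in data.items() if None not in s}


final_locf = mean(locf(s)[-1] for s in visits.values())
final_cc = mean(s[-1] for s in complete_cases(visits).values())
print(final_locf, final_cc)
```

The two approaches give different final-visit means from the same data: about 12.8 with LOCF versus 15.0 from the single complete case. This is why the list ends with sensitivity analysis.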

Defining Response Variables
  • Dose-limiting toxicities (DLT), complications
  • Response, incidence of a disease, total
    mortality, death from a specific cause
  • Overall survival, time to progression, time to …
  • Blood pressure, biomarkers, PSA, CD4 count
  • Quality of life
  • Cost and ease of administering the intervention
  • In general, a single response variable should be
    identified to answer the primary question.

Defining Response Variables
  • Define the questions prospectively and …
  • e.g., the study drug can increase the response
    rate (PR + CR) from 25% to 50% in patients with
    certain …
  • The primary response variable can be assessed in
    all participants and as completely as possible
  • Informative drop-out or loss to follow-up due
    to …
  • Participation generally ends when the primary
    response variable occurs
  • Off-drug, off-study, extended follow-up
  • Response variables should be unbiased and
    precisely assessed
  • Hard, objective endpoints vs. soft, subjective
    endpoints
  • Standardization of evaluation, central lab, and
    pre-trial training

Scales of measurement
  • Nominal
  • Ordinal
  • Interval
  • Ratio

Statistical Methods for Categorical Data
  • Describe one group: proportion
  • Compare one group to a hypothetical value:
    chi-square test
  • Compare two unpaired groups: chi-square test
  • Compare two paired groups: McNemar's test
  • Compare three or more unmatched groups:
    chi-square test
  • Model the effect of multiple prognostic
    variables: logistic regression
  • When the sample size is small, use Fisher's exact
    test
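
For the last row, here is a minimal pure-Python sketch of a one-sided Fisher exact test. It operates on a 2x2 table of responders by treatment arm and computes the p-value from the hypergeometric distribution with the margins fixed. The table counts are hypothetical. If SciPy is available, `scipy.stats.fisher_exact(table, alternative="greater")` should give the same p-value and can serve as a cross-check.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: treatment E, treatment S; columns: response, no response.
    Sums the hypergeometric probabilities of tables (same margins) with >= a
    responders on E, i.e. tests whether E has more responders than chance allows.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    numerator = sum(comb(row1, x) * comb(n - row1, col1 - x)
                    for x in range(a, min(row1, col1) + 1))
    return numerator / comb(n, col1)


# Hypothetical small trial: 7/10 responders on E vs. 2/10 on S
print(round(fisher_exact_greater(7, 3, 2, 8), 4))
```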

Statistical Methods for Continuous Data
  • Describe one group: mean, SD
  • Compare one group to a hypothetical value:
    one-sample t-test
  • Compare two unpaired groups: two-sample t-test
  • Compare paired data: paired t-test
  • Compare three or more unmatched groups: one-way
    ANOVA

Statistical Methods for Non-Parametric Data
  • Describe one group: median, percentiles
  • Compare one group to a hypothetical value:
    signed-rank test
  • Compare two unpaired groups: Mann-Whitney test
    (Wilcoxon rank-sum test)
  • Compare paired data: signed-rank test
  • Compare three or more unmatched groups:
    Kruskal-Wallis test

Statistical Methods for Survival Data
  • Describe one group: Kaplan-Meier
  • Compare two unpaired groups: log-rank test
  • Compare three or more unmatched groups, or
    continuous risk factors: Cox regression
  • Model the effect of multiple prognostic factors:
    Cox regression
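
The Kaplan-Meier estimator in the first row works as follows. At each event time, it multiplies the running survival estimate by the fraction of at-risk patients who survive that time. Below is a minimal sketch with hypothetical follow-up data. Patients censored at an event time are counted as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death, progression) occurred, 0 if censored
    Returns a list of (time, S(t)) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group all patients with time t
            deaths += data[i][1]
            censored += 1 - data[i][1]
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # conditional survival past time t
            curve.append((t, surv))
        at_risk -= deaths + censored      # remove events and censorings from the risk set
    return curve


# Hypothetical data: months of follow-up, 1 = progression, 0 = censored
for t, s in kaplan_meier([3, 5, 5, 8, 10, 12], [1, 1, 0, 1, 0, 1]):
    print(t, round(s, 3))
```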

Samples and Population
  • Research findings are based on samples drawn from
    populations
  • Inferential statistics allow us to infer what the
    population is like, based on sample data
  • Population: the defined group of individuals from
    which a sample is drawn
  • The sample should closely reflect the population;
    otherwise there is sampling bias.

  • Sampling: the process of choosing members of a
    population to be included in the sample
  • Research uses data from a sample to make
    inferences about a population.

  • How much do scores vary about the average?
  • Variance = (sum of squared deviations of each
    score from the mean)/(n - 1)
  • Variance is small when scores are close to the
    mean
  • Standard deviation = square root of the variance
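
Here is the variance formula above, written out on a hypothetical set of scores. It is cross-checked against Python's standard `statistics.variance`, which also divides by n - 1.

```python
from statistics import variance


def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / (len(scores) - 1)


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical numbers of doctor visits
v = sample_variance(scores)
sd = v ** 0.5                      # standard deviation = square root of the variance
print(round(v, 4), round(sd, 4), round(variance(scores), 4))
```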

Within-group variability
  • Within-group variability is measured by the
    variance divided by the sample size
  • Tells us how far individual scores deviate from
    the group mean
  • This reflects "error"
  • The number becomes lower with increasing sample
    size

Two Group Means
  • Ask samples of males and females about their
    number of doctor visits during the past year
  • Suppose the mean for males is 1.3 and the mean
    for females is 2.1

Do males and females differ?
  • Is the mean number for males different from the
    mean number for females?
  • Obviously, the sample means are different
  • Can we infer that the population means differ as
    well?

What's the Problem?
  • The difference observed in the samples may be
    real
  • However, the difference could just reflect
    chance: there is always a margin of error around
    the sample means

Hypothesis Testing
  • α: Type I error (level of significance), a false
    positive
  • β: Type II error, a false negative; 1 - β is the
    power
  • For a given sample size, α and β are inversely
    related
  • Sample size calculation: find N such that both α
    and β are under control. Typically, compute N for
    a given α to yield (1 - β) × 100% power; for
    example, compute N for α = 0.05 to yield 80%
    power.

Null and Research Hypotheses
  • Null hypothesis H0
  • Population means are in fact equal
  • Any mean difference observed in the samples
    reflects the margin of error
  • straw man or what you want to reject
  • any observed deviation from what we expect to see
    is due to chance variability
  • Research hypothesis H1
  • Population means are not equal
  • The mean difference observed is real
  • claim, or what you want to accept or test

Alternative Hypotheses H1
  • Is the new treatment different from the
    standard? (2-sided)
  • Better than the standard? (1-sided, directional)
  • Not different from the standard? (equivalence)
  • Not worse than the standard? (non-inferiority)

Hypothesis testing
  • Problem: determine whether or not the population
    means of two groups of subjects truly differ with
    respect to the outcome of interest.
  • Solution: assume that the two groups do not
    differ, and see if the sample data disagree with
    this assumption. That is, perform a hypothesis
    test.

Hypothesis testing (contd)
  • The null hypothesis assumes that there is no
    difference in outcome between the two groups.
  • The alternative hypothesis assumes that one group
    has a more favorable outcome than the other.
  • The research hypothesis is usually the
    alternative hypothesis.

Hypothesis testing (contd)
  • To do a hypothesis test
  • Calculate a test statistic from the data.
  • Determine whether the value of the test statistic
    is likely or unlikely under the null hypothesis.
  • If the value is very unlikely, reject the null
    hypothesis.

Hypothesis testing (contd)
  • Problem: we might reject the null hypothesis
    when it is true.
  • That is, we might commit a Type I error.
  • Solution: construct the test so that there is
    only a 5% chance of incorrectly rejecting the
    null hypothesis.
  • That is, the level of the test (alpha) is 0.05.

Type I Error
  • The chance of rejecting a NULL that is true is
    α; this type of mistake is called a Type I error
    or false positive
  • Reject the null hypothesis when it is true
  • Its likelihood is set by the alpha-level decision
    rule (usually .05)
  • 5% is a reasonably low probability of being
    wrong, but it could be set lower
  • For early phase II trials, we often use more
    liberal type I error rates so as not to miss
    potentially active treatments
  • In medical contexts, the specificity of a test is
    the chance that the test result is negative given
    that the subject is negative; this is just 1 - α

P < .05
  • The alpha level for rejecting the null hypothesis
    is conventionally set as .05
  • Obtained sample data are inconsistent with what
    the null hypothesis expects
  • Reject the null hypothesis and therefore accept
    the research hypothesis
  • Therefore, conclude that the obtained difference
    in means is statistically significant

Type II Error
  • Incorrectly accepting the null hypothesis when
    there really is a difference
  • The chance of not rejecting a NULL that is false
    is β; this type of mistake is called a Type II
    error or a false negative
  • In medical contexts, the sensitivity of a test is
    the chance that the test result is positive given
    that the subject is positive; this is just 1 - β,
    also called power

Power
  • Probability of correctly rejecting the null
    hypothesis when it is false
  • 1 - β
  • Power is higher with
  • Large sample size
  • Large difference between group means
  • Low within-group variability
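
The three bullets above can be checked with the normal-approximation power formula for comparing two group means. The sketch assumes a one-sided two-sample z-test, equal group sizes, and a known common SD. The numbers are illustrative. Each of the last three calls changes one ingredient from the baseline, and each should give higher power.

```python
from statistics import NormalDist


def power_two_means(n_per_group, delta, sigma, alpha=0.05):
    """Approximate power of a one-sided two-sample z-test (normal approximation).

    n_per_group: patients per arm; delta: true difference in means; sigma: common SD.
    """
    std_normal = NormalDist()
    se = sigma * (2 / n_per_group) ** 0.5    # SE of the difference in sample means
    z_alpha = std_normal.inv_cdf(1 - alpha)  # one-sided critical value
    return 1 - std_normal.cdf(z_alpha - delta / se)


base = power_two_means(50, 0.5, 1.0)
print(round(base, 3))                                 # baseline
print(round(power_two_means(100, 0.5, 1.0), 3))       # larger sample size
print(round(power_two_means(50, 0.8, 1.0), 3))        # larger difference between means
print(round(power_two_means(50, 0.5, 0.7), 3))        # lower within-group variability
```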

What is p value?
  • The p-value is the probability of obtaining data
    as extreme as, or more extreme than, the observed
    result when the null hypothesis is true.
  • That is, the p-value measures the strength of the
    evidence against the null hypothesis.
  • For a level 0.05 test, we reject the null
    hypothesis if the p-value is 0.05 or less.
  • Smaller p-values → stronger evidence against H0.
  • Statistical significance or clinical significance
  • Large samples: small differences may be
    statistically significant
  • Small samples: large differences may not be
    statistically significant
  • Frequentist inference depends on the sample
    space, i.e., the design.

What is p value?
  • Decide whether or not to reject the NULL
    hypothesis H0 based on the chance of obtaining a
    TS as extreme as or more extreme than the one we
    got (as far away from what we expected or even
    farther, in the direction of the ALT), ASSUMING
    THE NULL IS TRUE
  • The likelihood of observing the same outcome or
    one more extreme if the study were repeated
  • This chance is called the observed significance
    level, or p-value
  • A TS with a p-value less than some prespecified
    false positive level (or size) α is said to be
    statistically significant at that level

What is p value?
  • The interpretation of a p-value is a little
    tricky. In particular, it does NOT tell us the
    probability that the NULL hypothesis is true
  • The p-value represents the chance that we would
    see a difference as big as we saw (or bigger) if
    there were really nothing happening other than
    chance variability.
  • p = 0.08: 8 times out of 100, the same result or
    a more extreme one would occur due to chance alone
  • A single convenient number giving a measure of
    the degree of surprise the experiment should
    cause a believer in the null hypothesis

Judging a p-value
  • p < 0.05: the results are significant
  • p < 0.01: the results are highly significant
  • p < 0.001: the results are very highly
    significant
  • p > 0.05: the results are not statistically
    significant
  • p slightly above 0.05: a trend toward statistical
    significance

Statistical Significance Tests
  • Significance tests provide a way of making a
    decision about the population means
  • There are many such tests used for different
    types of data. But all use the same logic

Test statistic
  • Measure how far the observed data are from what
    is expected assuming the NULL (H0) by computing
    the value of a test statistic (TS) from the data
  • The particular TS computed depends on the
    parameter being tested
  • For example, to test the population mean µ, the
    TS is the sample mean (or the standardized sample
    mean)

  • An experiment is conducted to study the effect of
    exercise on the reduction of the cholesterol
    level in slightly obese patients considered to be
    at risk for heart attack. 80 patients are put on
    a specified exercise plan while maintaining a
    normal diet. At the end of 4 weeks the change in
    cholesterol level will be noted. It is thought
    that the program will reduce the average
    cholesterol reading by more than 25 points.
  • Data
  • sample mean 27
  • sample SD 18

Steps in hypothesis testing (I)
  • 1. Identify the population parameter being tested
    (i.e., the population mean). Here, the parameter
    being tested is the population mean reduction in
    cholesterol reading
  • 2. Formulate the NULL (H0) and ALT hypotheses
  • H0: µ = 25 (or µ ≤ 25)
  • Ha: µ > 25
  • 3. Compute the test statistic (TS)
  • t = (27 - 25)/(18/√80) ≈ 0.99

Steps in hypothesis testing (II)
  • Compute the p-value.
  • Here, p = P(T79 > 0.99) ≈ 0.16, where T79 has a
    t distribution with 79 degrees of freedom
  • (Optional) Decision rule
  • REJECT H0 if the p-value ≤ α
  • (This is a type of argument by contradiction.)
    A typical value of α is .05, but there's no law
    that it needs to be. If we use .05, the decision
    here will be not to reject H0, since 0.16 > 0.05.
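
The cholesterol calculation can be reproduced in a few lines. Python's standard library has no t distribution, so this sketch gets the one-sided p-value by integrating the Student t density numerically (Simpson's rule). With SciPy available, `scipy.stats.t.sf(t, 79)` would be the usual shortcut. The helper names are our own.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t): 0.5 minus the integral of the density from 0 to t (Simpson's rule, even steps)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def one_sample_t(mean, sd, n, mu0):
    """One-sided one-sample t-test of H0: mu = mu0 vs. Ha: mu > mu0."""
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, t_upper_tail(t, n - 1)


t, p = one_sample_t(mean=27, sd=18, n=80, mu0=25)
print(f"t = {t:.2f}, one-sided p = {p:.2f}")
```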

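Hypotheses and decisions
  • Null: the new drug doesn't work. Alternative: the
    new drug works
  • Decide the new drug works, and it does: correctly
    reject H0 (power)
  • Decide the new drug works, but it doesn't:
    Type I error (proceed with an ineffective drug)
  • Abandon the new drug, and it doesn't work:
    correctly don't reject H0
  • Abandon the new drug, but it works: Type II error
    (abandon a drug that might work)

Pitfalls in hypothesis testing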
  • Even if a result is statistically significant,
    it can still be due to chance
  • Statistical significance is not the same as
    practical importance
  • A test of significance does not say how important
    the difference is, or what caused it
  • A test does not check the study design. If the
    test is applied to a nonrandom sample (or the
    whole population), the p-value may be meaningless
  • Data-snooping makes p-values hard to interpret

Introduction to Permutation test (Rank Test)
  • A type of nonparametric hypothesis test
  • Also called randomization test, exact test
  • Very widely applicable class of tests
  • Introduced in the 1930s
  • Usually requires only a few weak assumptions
  • Often shows good power

5 Steps to a permutation test
  • 1. Analyze the problem: identify the NULL and ALT
    hypotheses
  • 2. Choose a test statistic (TS)
  • 3. Compute the TS for the original labeling of
    the observations
  • 4. Rearrange (permute) the labels and recompute
    the TS for the rearranged labels (do for all
    possible permutations)
  • 5. Decide whether to reject NULL based on this
    permutation distribution

  • A permutation is a reordering of the numbers 1,
    ..., n
  • Example: what are some permutations of the
    numbers 1, 2, 3, 4?
  • The NULL specifies that the permutations are all
    equally likely
  • The sampling distribution of the TS under the
    NULL is computed by forming all permutations,
    calculating the TS for each and considering these
    values all equally likely
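
For small samples, the permutation distribution can be enumerated exactly. The sketch uses hypothetical data (3 vs. 4 patients), which gives C(7, 3) = 35 equally likely relabelings under the NULL. The TS is the difference in means. Because order within each group doesn't matter, the code enumerates combinations rather than all 7! orderings. The first line answers the slide's question: there are 4! = 24 permutations of 1, 2, 3, 4.

```python
from itertools import combinations, permutations
from statistics import mean

print(len(list(permutations([1, 2, 3, 4]))))  # 24 orderings of 1, 2, 3, 4


def exact_permutation_pvalue(group_a, group_b):
    """One-sided exact p-value for mean(B) - mean(A), enumerating every relabeling."""
    pooled = group_a + group_b
    observed = mean(group_b) - mean(group_a)
    n_a, count, total = len(group_a), 0, 0
    for idx in combinations(range(len(pooled)), n_a):  # each choice of "group A" labels
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        total += 1
        count += mean(b) - mean(a) >= observed - 1e-12  # small tolerance for float ties
    return count / total


# Hypothetical data: the observed labeling is the most extreme of the 35
p = exact_permutation_pvalue([12, 15, 9], [20, 18, 25, 22])
print(round(p, 4))
```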

  • Suppose we wish to compare the length of stay in
    the hospital for patients with the same diagnosis
    at two different hospitals. We have the following
    data:
  • 1st hospital
  • 21, 10, 32, 60, 8, 44, 29, 5, 13, 26, 33
  • 2nd hospital
  • 86, 27, 10, 68, 87, 76, 125, 60, 35, 73, 96, 44, 238
  • How could we carry out a permutation test of the
    NULL hypothesis of no difference between the two
    hospitals?
  • Why is a t test not useful in this case?

  • The distribution of length of stay is very skewed
    and far from a normal distribution.
  • Using the rank-sum test:
  • R = 83.5, T = 3.10, p = 0.002
  • This is an example of an unpaired 2-sample test
  • Here, we have to find all of the combinations
    (since order within each group doesn't matter)
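
Here are the five steps applied to the hospital data. The TS is the rank sum R of the 1st hospital, with midranks for the tied values 10, 44, and 60. Enumerating all C(24, 11) = 2,496,144 relabelings is possible but slow in pure Python. This sketch therefore draws 20,000 random relabelings instead, a Monte Carlo approximation to the exact permutation p-value. The seed and number of draws are arbitrary choices.

```python
import random


def midranks(values):
    """Ranks 1..N of the pooled values, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1  # average of positions start+1..end+1
        start = end + 1
    return ranks


def rank_sum_permutation_test(group1, group2, n_perm=20000, seed=1):
    """Two-sided Monte Carlo permutation test using the rank sum of group1 as the TS."""
    ranks = midranks(list(group1) + list(group2))
    n1, n = len(group1), len(ranks)
    expected = n1 * (n + 1) / 2   # mean of group1's rank sum under the NULL
    observed = sum(ranks[:n1])    # TS for the original labeling
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(ranks)        # a random relabeling of the pooled observations
        if abs(sum(ranks[:n1]) - expected) >= abs(observed - expected):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


hospital1 = [21, 10, 32, 60, 8, 44, 29, 5, 13, 26, 33]
hospital2 = [86, 27, 10, 68, 87, 76, 125, 60, 35, 73, 96, 44, 238]
r, p = rank_sum_permutation_test(hospital1, hospital2)
print(r, round(p, 4))
```

The observed rank sum matches the slide's R = 83.5. The Monte Carlo p-value should land close to the reported 0.002.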

  • Can get a permutation test for any TS, even if
    its sampling distribution is unknown
  • This gives more freedom in choosing a TS
  • Can use on unbalanced designs
  • Can combine dependent tests on mixtures of
    different data types (e.g. with numerical and
    categorical data)

  • Assumption that the observations are exchangeable
    under the NULL
  • This allows us to randomly move observations
    between the groups
  • For example, when testing for a difference in 2
    group means you would need to assume that the
    distributions in both groups have the same shape
    and spread
  • Cannot use for testing hypotheses in a single
    population, or to compare groups that are
    different under the NULL

Introduction to ROC curves
  • ROC Receiver Operating Characteristic
  • Started in electronic signal detection theory
    (1940s - 1950s)
  • Has become very popular in biomedical
    applications, particularly radiology and imaging
  • Also used in machine learning applications to
    assess classifiers
  • Can be used to compare tests/procedures
  • True positive rate (sensitivity) vs. false
    positive rate (1-specificity)

Examples using ROC analysis
  • Threshold selection for tuning an already trained
    classifier (e.g., neural nets)
  • Defining signal thresholds in DNA microarrays
  • Comparing test statistics for identifying
    differentially expressed genes in replicated
    microarray data
  • Assessing performance of different protein
    prediction algorithms
  • Inferring protein homology

ROC curves simplest case
  • Consider a diagnostic test for a disease
  • The test has 2 possible outcomes
  • Positive (suggesting presence of disease)
  • Negative (suggesting absence of disease)
  • An individual can test either positive or
    negative for the disease

Specific Example
  • (Figure: test results for subjects with and
    without the disease; a threshold on the test
    result divides them into four groups: true
    positives, true negatives, false negatives, and
    false positives)
Moving the threshold
  • (Figure: the same four groups as the threshold
    moves)
ROC Curve
  • (Figure: true positive rate (sensitivity) plotted
    against false positive rate (1-specificity))

Area under ROC curve (AUC)
  • Overall measure of test performance
  • Comparisons between two tests based on
    differences between (estimated) AUC
  • For continuous data, AUC is equivalent to the
    Mann-Whitney U-statistic (a non-parametric test of
    difference in location between two populations)
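
The following sketch shows this equivalence on hypothetical test results. The trapezoidal area under the empirical ROC curve equals the proportion of (diseased, non-diseased) pairs in which the diseased subject's result is higher, with ties counted as 1/2. That proportion is the Mann-Whitney U statistic divided by n1 × n2.

```python
def roc_points(diseased, healthy):
    """Empirical ROC curve: (FPR, TPR) as the threshold sweeps from high to low.

    A result is called positive when score >= threshold.
    """
    points = [(0.0, 0.0)]
    for t in sorted(set(diseased) | set(healthy), reverse=True):
        tpr = sum(x >= t for x in diseased) / len(diseased)  # sensitivity
        fpr = sum(y >= t for y in healthy) / len(healthy)    # 1 - specificity
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


def auc_mann_whitney(diseased, healthy):
    """P(X_diseased > X_healthy), ties counted as 1/2: Mann-Whitney U / (n1 * n2)."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in diseased for y in healthy)
    return wins / (len(diseased) * len(healthy))


diseased = [0.9, 0.8, 0.7, 0.55, 0.4]  # hypothetical test results, diseased subjects
healthy = [0.6, 0.5, 0.3, 0.2, 0.1]    # hypothetical test results, non-diseased subjects
print(auc_trapezoid(roc_points(diseased, healthy)), auc_mann_whitney(diseased, healthy))
```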

Interpretation of AUC
  • The probability that the test result from a
    randomly chosen diseased individual is more
    indicative of disease than that from a randomly
    chosen nondiseased individual:
    P(Xi > Xj | Di = 1, Dj = 0)
  • A nonparametric distance between
    disease/nondisease test results.
  • No clinically relevant meaning
  • A lot of the area comes from the range of
    large false positive values; no one cares what's
    going on in that region.
  • The curves might cross, so that there might be a
    meaningful difference in performance that is not
    picked up by AUC

Elements of sample size calculation
  • Hypothesis
  • H0: new treatment = standard treatment
  • Ha: new treatment is better
  • Type I and Type II errors
  • α = .025 (or two-sided α = .05)
  • β = .15 (power 85%)
  • Effect size
  • Δ = µ1 - µ2 (for continuous outcomes)
  • Δ = π1 - π2 (for dichotomous outcomes)
  • Sample variation
  • σ (standard deviation of the outcome)
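
With these elements, the usual normal-approximation formula for a continuous outcome is n per group = 2(z_{1-α} + z_{1-β})² σ²/Δ². The sketch assumes equal allocation, a one-sided test with α = .025 and β = .15 as on the slide, and a hypothetical standardized effect Δ/σ = 0.5. It is one common formula, not necessarily the software used for the course examples.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.025, beta=0.15):
    """Patients per arm for a one-sided two-sample z-test of means (normal approximation)."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha) + z(1 - beta)) * sigma / delta) ** 2)


print(n_per_group(delta=0.5, sigma=1.0))   # hypothetical effect of half an SD
print(n_per_group(delta=0.25, sigma=1.0))  # halving the effect roughly quadruples n
```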

Test of Proportions
  • Determining the Sample Size
  • What is the level of significance (α)?
  • The probability of rejecting a true null
    hypothesis
  • What are the chances of detecting a real
    difference? (Power)
  • How large a difference (Δ) is clinically
    important?

Determining the Sample Size
  • Criteria are inter-related
  • If you know 3 of the 4 parameters (n, α, β, and
    Δ), the fourth is fixed
  • Must keep the study feasible
  • There are trade offs
  • There is no one correct answer

Sample Size Calculation Is Only An Estimate
  • Parameters used in calculation are estimates
    themselves with a level of uncertainty.
  • Estimated treatment effect may be based on a
    different population.
  • The estimated treatment effect is often overly
    optimistic, being based on highly selected pilot
    studies.
  • Patient eligibility criteria may change,
    thus affecting the study population.
  • It is better to design a larger study with early
    stopping than a smaller study that must try to
    expand N or extend follow-up during the trial.

Sample Size and Power Why?
  • Before a study: how large a sample does the
    study require? (in planning)
  • After a study: if no association was found, is
    that due to a true lack of association in the
    population, or to low power from a small sample
    size?

Power sample size
  • Problem we might fail to reject the null
    hypothesis when the alternative is true.
  • That is, we might commit Type II error.
  • Solution: select a large enough sample so that
    there is an 80% chance of rejecting the null
    hypothesis if the alternative is true.
  • Then the power to detect the alternative is 80%.

Power sample size (contd)
  • Problem: sometimes the required sample size is
    too large.
  • Solutions
  • Be content to detect with less power (allow more
    type II error).
  • Increase the level of the test (allow more type I
    error).
  • Pick a more extreme alternative.

Sample Size
  • Larger sample sizes provide more accurate
    estimates of the characteristics of the
    population
  • Confidence intervals specify where the
    population value probably lies
  • As sample size increases, the margin of error
    decreases
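
The shrinking margin of error can be illustrated with the simple normal-approximation (Wald) interval for a proportion. The observed response rate of 0.3 is hypothetical. Quadrupling n halves the margin. For small phase II samples, an exact or Wilson interval is usually preferred, so treat this only as an illustration.

```python
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence=0.95):
    """Half-width of the Wald (normal-approximation) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 for 95% confidence
    return z * (p_hat * (1 - p_hat) / n) ** 0.5


for n in (25, 100, 400):
    print(n, round(margin_of_error(0.3, n), 3))
```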

Change in Sample Size: Test of Proportions
Test of hypothesis for a one-arm phase II trial
  • H0: p ≤ 0.10 vs. H1: p ≥ 0.25 (Δ = 0.15)
  • n = 40
  • Design: α1 = 0.04 (1-sided test), 1 - β = 0.82

Change in Sample Size: Test of Proportions
Test of hypothesis for a one-arm phase II trial,
H0: p ≤ 0.10 vs. H1: p ≥ 0.25 (Δ = 0.15)

Power (1 - β)   α1 = 0.05   α1 = 0.025   α1 = 0.01
0.80            40          49           62
0.90            55          64           78
0.95            70          79           103

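
The table does not say how its numbers were computed. One reading consistent with the 40-patient design (α ≈ 0.04, power ≈ 0.82) is an exact binomial test. That test declares the drug promising when at least r = 8 of 40 patients respond. The value r = 8 is our reconstruction, chosen as the smallest r that keeps the exact type I error at or below 0.05. The slide does not state it.

```python
from math import comb


def binom_tail(r, n, p):
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))


def critical_value(n, p0, alpha):
    """Smallest number of responses r with P(X >= r | p0) <= alpha."""
    return next(r for r in range(n + 2) if binom_tail(r, n, p0) <= alpha)


n, p0, p1 = 40, 0.10, 0.25
r = critical_value(n, p0, alpha=0.05)
print(r, round(binom_tail(r, n, p0), 3), round(binom_tail(r, n, p1), 3))
```

The exact α and power come out near the 0.04 and 0.82 shown on the slide.
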
TTP Example
Assumptions: 1 arm, α1 = 0.05, power = 0.80
  • H0: median = 30 mos.; H1: median = 40 mos.
  • Hazard reduction: 26%
  • Accrual: 12/mo.
  • Duration of accrual: 14.7 mos.
  • Follow-up: 24 mos.
  • Total sample size: 176 pts.

Change to a 2 Arm Study
Assumptions: 2-arm study, α1 = 0.05, power = 0.80
  • H0: median = 30 mos.; H1: median = 40 mos.
  • Hazard reduction: 26%
  • Accrual: 12/mo.
  • Duration of accrual: 43.1 mos.
  • Follow-up: 24 mos.
  • Total sample size: 518

Increase Power
Assumptions: 2-arm study, α1 = 0.05,
power = 0.80 → 0.90
  • H0: median = 30 mos.; H1: median = 40 mos.
  • Hazard reduction: 26%
  • Accrual: 12/mo.
  • Duration of accrual: 43.1 → 55.8 mos.
  • Follow-up: 24 mos.
  • Total sample size: 518 → 670

Statistical Power
Characteristics of Phase I Trials
  • Small sample sizes
  • Not hypothesis driven
  • Toxicity (DLT and MTD) and Efficacy
  • Patient safety and benefit
  • Dose escalation and drug discovery
  • Clinician, Patients and Drug Development

Phase I trial designs
  • Conventional/Standard Method
  • 3+3 Dose Escalation Design
  • Sequential/Bayesian Methods
  • Continual Reassessment Method (CRM)
  • Random Walk Rules (RWR)
  • Decision-theoretic Approaches
  • Escalation with Overdose Control (EWOC)

Phase I Dose Study: Standard Method (3+3 Design)
  • At each predefined dose level, treat 3 patients,
    starting with dose level 1.
  • If 0 of 3 have a DLT, increase to the next level
  • If 2 or more have a DLT, decrease to the previous
    level
  • If 1 of 3 has a DLT, treat 3 more at the current
    dose
  • If 1 of 6 has a DLT, increase to the next level
  • If 2 or more have a DLT, decrease to the previous
    level
  • If a dose has de-escalated to the previous level
  • If only 3 had been treated, enroll 3 more for a
    total of 6
  • If 6 have been treated, stop the study and
    declare that dose the MTD.
  • MTD: the largest dose at which 1 or fewer of 6
    patients had a DLT
  • Escalation never occurs to a dose at which 2 or
    more DLTs have occurred.
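
The rules above translate almost line by line into a small simulator. The sketch works under stated assumptions. The true DLT probability at each dose level is a hypothetical input. Two situations the slide leaves open are resolved by our own choice: the study stops with no MTD if level 1 is too toxic, and the highest dose is declared the MTD if it is tolerated in 6 patients.

```python
import random


def three_plus_three(dlt_probs, seed=0):
    """Simulate one trial under the 3+3 rules.

    dlt_probs[i] is the (hypothetical) true DLT probability at dose level i + 1.
    Returns (index of the declared MTD, or None if level 1 is too toxic,
             number of patients treated at each level).
    """
    rng = random.Random(seed)
    n_levels = len(dlt_probs)
    treated = [0] * n_levels
    dlts = [0] * n_levels

    def treat(level, k):
        # Enroll k patients at this level; each has a DLT with probability dlt_probs[level]
        for _ in range(k):
            treated[level] += 1
            dlts[level] += rng.random() < dlt_probs[level]

    level = 0
    treat(level, 3)                   # start with 3 patients at dose level 1
    while True:
        if treated[level] == 3 and dlts[level] == 1:
            treat(level, 3)           # 1 of 3: treat 3 more at the current dose
        elif dlts[level] >= 2:
            if level == 0:
                return None, treated  # assumption: no tolerable dose among those tested
            level -= 1                # 2 or more DLTs: de-escalate
        elif level + 1 < n_levels and dlts[level + 1] < 2:
            level += 1                # 0 of 3, or at most 1 of 6: escalate
            treat(level, 3)
        elif treated[level] == 3:
            treat(level, 3)           # cannot escalate: expand to 6 before declaring the MTD
        else:
            return level, treated     # 6 treated, at most 1 DLT, escalation not allowed


mtd, treated = three_plus_three([0.05, 0.10, 0.25, 0.45], seed=7)
print(mtd, treated)
```

Repeating the simulation over many seeds shows how often each dose ends up declared the MTD under a given toxicity scenario.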

Sample Size for Safety Trials
  • Type I Error (Alpha)
  • Acceptable Safety Rate (Rho)
  • Sample Size (N)

Alpha             0.10   0.05   0.05   0.10   0.10
Rho (%)           5      10     14     20     25
Sample size (N)   45     28     20     10     8

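
The slide lists the numbers but not the formula. One reading reproduces every column. If the true rate of the event of concern is ρ, the chance of seeing none in N patients is (1 - ρ)^N, so N ≈ ln(α)/ln(1 - ρ), rounded to the nearest integer. This is our reconstruction, not something the slide states. Rounding up instead gives a slightly larger, more conservative N for some columns.

```python
import math


def safety_sample_size(alpha, rho):
    """N with (1 - rho) ** N close to alpha: N = ln(alpha) / ln(1 - rho), rounded."""
    return round(math.log(alpha) / math.log(1 - rho))


pairs = [(0.10, 0.05), (0.05, 0.10), (0.05, 0.14), (0.10, 0.20), (0.10, 0.25)]
print([safety_sample_size(a, r) for a, r in pairs])  # [45, 28, 20, 10, 8]
```

Read the other way: if none of N patients has the event, a true rate as high as ρ would have produced that outcome with probability only about α.
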
Characteristics of Phase II Trials
  • Aim To determine the efficacy of a new
    treatment (what outcomes to observe)
  • Small study of one experimental treatment (E)
  • Often a single-arm trial of E alone, without a
    control arm
  • Efficacy and safety are evaluated using an
    early outcome
  • Data on E are compared to historical data on
    standard treatment (S)
  • If E is promising, then organize a randomized
    phase III trial of E-vs-S based on a
    time-to-event outcome (T)

Primary Outcome Measure: Point Estimate
  • Mean Hgb: µ
  • Proportion responding: p
  • Median nadir PSA
  • Failure rate

Typical Phase II Trials
  • Typical cancer phase II trials investigate the
    response rate
  • Historical reference: p0
  • Desired clinically significant response: p1
  • Hypotheses
  • H0: p ≤ p0 (the true response rate is no larger
    than p0, a minimum response rate of interest)
  • H1: p ≥ p1 (the true response rate is at least
    p1, a target response rate)
  • Stop the trial early if p is not sufficiently
    large

Typical Phase II Trials
  • One-stage design: use Fisher's exact test
    to reject the null
  • Two-stage design: enroll N1 patients in the first
    stage. If there are not enough responses, stop
    the trial. Otherwise, continue to the full N
    (> N1) patients and evaluate treatment response
    based on the number of responses
  • N and N1 are chosen according to prespecified
    type I and II errors.
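
The two-stage rule can be evaluated exactly with binomial probabilities. The quantities of interest are the probability of early termination (PET), the probability of declaring the drug promising, and the expected sample size E(N). The sketch uses Simon's convention: stop if responses ≤ r1 after n1 patients, and declare the drug promising if total responses > r after n. The design r1 = 1, n1 = 10, r = 5, n = 29 and the response rates 0.10 and 0.30 are hypothetical inputs, not a design taken from these slides.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_stage_oc(r1, n1, r, n, p):
    """Operating characteristics of a two-stage single-arm design at true response rate p.

    Stage 1: treat n1 patients; stop early if responses <= r1.
    Stage 2: treat n - n1 more; call the drug promising if total responses > r.
    Returns (probability of early termination, P(declare promising), expected N).
    """
    pet = sum(binom_pmf(x, n1, p) for x in range(r1 + 1))
    promising = sum(
        binom_pmf(x1, n1, p)
        * sum(binom_pmf(x2, n - n1, p) for x2 in range(max(r - x1 + 1, 0), n - n1 + 1))
        for x1 in range(r1 + 1, n1 + 1)
    )
    expected_n = n1 + (1 - pet) * (n - n1)
    return pet, promising, expected_n


pet, promising, en = two_stage_oc(1, 10, 5, 29, 0.10)      # at the uninteresting rate p0
print(round(pet, 3), round(promising, 3), round(en, 1))
print(round(two_stage_oc(1, 10, 5, 29, 0.30)[1], 3))       # P(promising) at the target rate p1
```

At p0, the rejection probability plays the role of α, and at p1 it is the power. A design search such as Simon's then chooses (r1, n1, r, n) to minimize E(N | p0) (optimal) or n (minimax) subject to α and β.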

Phase II Trial Designs
  • Single sample (1 stage)
  • Multiple stage design
  • Gehan (2-stage), Fleming
  • Simon's optimal, minimax
  • Bayesian
  • Multiple Outcomes Measures
  • Interim Analyses
  • Stop for toxicity or lack of activity
  • Not rejecting null hypothesis.

Phase II Trial Designs
  • Randomized Phase II (2 samples)
  • Reduce bias by randomizing pts.
  • Concurrent accrual/Comparative
  • Control/Selection
  • Randomized discontinuation
  • Interim Analysis
  • Stop for toxicity or lack of activity
  • Not rejecting null hypothesis
  • Adaptive

Simon's Optimal 2-Stage Design
  • p0 = 0.6 vs. p1 = 0.75
  • α = 0.10, β = 0.10
  • E(N | p0) = 48, PET(p0) = 0.65

Characteristics of Phase III Trials
  • Use phase 2 data to decide what to test in phase
    3
  • Randomize between E and S, usually multi-center
  • Typically based on T: survival time or DFS time
  • The scientific standard for deciding if E is
    better than S