Title: Session 5: The design of RCTs of treatments for HIV infection
1Session 5 The design of RCTs of treatments for
HIV infection
A beginner's guide to some of the methodological
and statistical issues in HIV research
- Caroline Sabin
- Reader in Medical Statistics and Epidemiology
- Department of Primary Care and Population
Sciences, RFUCMS
2What is a clinical trial?
Any form of planned experiment which involves
patients and is designed to find the most
appropriate treatment for a particular medical
condition
3Types of trials (clinical)
Phase I studies: Focus on safety rather than efficacy. Dose-escalation studies, studies of drug metabolism and bioavailability. Usually based on small numbers of subjects, often healthy volunteers.
Phase II studies: Initial investigation for clinical effect. Small-scale studies into effectiveness and safety of drug.
Phase III studies: Full-scale treatment evaluation. Comparison to standard therapy (if one exists) or placebo.
Phase IV trials: Post-marketing surveillance. Monitoring for adverse effects. Long-term studies of morbidity and mortality. Promotion exercises.
4Topics already covered in first session
- Control groups
- Randomisation
- Blinding
- Parallel vs. cross-over trials
- The limitations of RCTs
5Topics to be covered today
- Why do we need a control group?
- Why do we need randomisation?
- The protocol
- Defining endpoints (primary and secondary endpoints, clinical vs surrogate endpoints)
- How to deal with protocol violations (patients who drop out of the study and missing data)
- Approaches to analysis (ITT, as treated)
- Subgroup and interim analyses
6Why do we need a control group?
- Early medical developments were usually so great that controls weren't always needed (eg. trials of anaesthetics, first trials of antibiotics etc.)
- However, most developments these days are more modest and some form of control group is now essential
7Why do we need a control group?
- Silverman, 1985: Epidemic of retrolental fibroplasia in babies
- Uncontrolled trials suggested that treatment with adrenocorticotrophic hormone had a 75% success rate
- After controlled trials were finally carried out, it was found that 75% of infants return to normal without treatment
- Identification of true cause of epidemic (oxygen to premature babies) was delayed
8Why do we need a control group?
- Uncontrolled trials may give a distorted view of a new therapy
- Patients may improve over time, even without treatment; thus, any improvement cannot necessarily be attributed to treatment
- Patients selected for treatment may be less seriously ill than those not selected for treatment, which may overestimate the benefits of new therapy
- Patients in clinical trials generally do better than patients on same treatment who are not in trials
9Which control group? - the use of historical or
non-randomised controls
Characteristics of patients
- Controls less likely to have clearly defined criteria for inclusion/exclusion
- May have been a change in the type of patient eligible for treatment, or prognosis may have changed over time
- Investigator may have been more restrictive in choice of patients for the trial than when treating patients in the past
10Which control group? - the use of historical or
non-randomised controls
Experimental environment
- Quality of recorded data may not be as good
- Definitions of response may differ between groups (eg. viral load endpoints)
- Ancillary care may improve in a trial (eg. adherence support, support for toxicities etc.)
Thus, treatment and control groups may differ
with respect to many features other than
treatment, and so we cannot attribute any
difference in outcome to the new treatment
11What is randomisation?
- Allocation of patients to treatments is determined by chance
- Randomised trials provide the most efficient trial design (ie. they are the most powerful), as any factors that may affect outcome will, on average, be distributed equally between the treatment groups
- This allows any difference in treatment response to be attributed to the treatment
- Removes impact of known confounding factors as well as unknown ones
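Allocation by chance can be sketched in a few lines. The example below uses permuted blocks, which is one common scheme and not one named on the slides; the function name, arm labels, block size and seed are all assumptions chosen for illustration.

```python
import random

def permuted_block_allocation(n_patients, arms=("A", "B"), block_size=4, seed=2024):
    """Return a treatment allocation list built from randomly shuffled blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    allocations = []
    while len(allocations) < n_patients:
        # Each block contains every arm equally often ...
        block = list(arms) * (block_size // len(arms))
        # ... and chance alone decides the order within the block
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_patients]

allocation = permuted_block_allocation(20)
print(allocation)
print({arm: allocation.count(arm) for arm in ("A", "B")})
```

In a real trial the list would be generated and held away from the recruiting clinicians, so that the next allocation cannot be predicted.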
12Why do trials need to be randomised?
- Non-randomised trials have the potential to be seriously biased
- If there are systematic differences between the patients in the treatment groups at the outset of the trial, then any differences in treatment response cannot necessarily be attributed to the new treatment
- Eg. treatment comparisons in cohort studies
13When can a randomised trial be done
New treatment better than standard
New treatment worse than standard
Equipoise
- Who should have equipoise?
- The doctors recruiting patients
- The patients entering the trial
- (is this true in reality?)
14Other benefits of randomisation
- Helps with blinding of trial (see later)
- Prevents any conscious or subconscious selection bias, whereby doctor tends to put more (or less) severely ill patients in a particular treatment group
- Beware of any approach to randomisation whereby clinicians may be able to establish treatment allocation prior to entry to the trial (eg. systematic allocation by date of birth, alternate allocation)
15Selection of patients for a trial
Discuss trial with patient and assess eligibility
Obtain informed consent
Formally enter patient into trial
Randomise
16Other benefits of randomisation (cont.)
Example: Trial of anticoagulant therapy (Wright 1948)
Patients admitted on odd days: anticoagulants
Patients admitted on even days: placebo
Anticoagulant therapy: n=589
Placebo: n=442
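A rough check of how unlikely this imbalance would be under genuine chance allocation. The sketch assumes each patient had a 50% chance of either arm and uses a normal approximation to the binomial; only the two group sizes come from the slide.

```python
import math

n_anticoagulant = 589  # from the slide
n_placebo = 442        # from the slide
n_total = n_anticoagulant + n_placebo

# Assumption: under allocation by chance, the number in one arm is
# Binomial(n_total, 0.5); approximate it with a normal distribution.
expected = n_total * 0.5
sd = math.sqrt(n_total * 0.5 * 0.5)
z = (n_anticoagulant - expected) / sd

# Two-sided p-value from the standard normal distribution
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.1e}")
```

An imbalance this large is hard to explain by chance, which is consistent with clinicians knowing the allocation in advance and influencing which patients were admitted on which days.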
17The protocol
The workshop manual for the trial. Will contain many or all of the following:
- Background, aims and objectives
- Trial design
- Patient selection: inclusion/exclusion criteria
- Treatment schedules
- Monitoring
- Registration, randomisation and blinding
- Methods of patient evaluation
- Patient consent
- Size of study
- Plans for dealing with protocol deviations
- Plans for statistical analysis
- Ethical approval and administrative matters
18Selection of patients for a trial
- A trial should have explicit inclusion criteria and exclusion criteria: precise definitions of who can be included in the study
- Patients should be broadly representative of some future group of patients to whom the trial may be applied
- BUT patients in trials are not necessarily a random selection of all HIV+ve individuals (unlikely to be the case)
19Evaluation of response: the primary endpoint
- In any trial we need to define (preferably) a single primary endpoint that captures the key effects of treatment on the patient
- Primary endpoint is usually related to efficacy
- If results from different endpoints are inconsistent, the primary endpoint will be the one on which any decisions about the value of the drug will be mainly based
20Evaluation of response: secondary endpoints
- In addition to the primary endpoint, we may also define any number of secondary endpoints
- These are often related to toxicity or quality of life, or may be other measures of efficacy not captured by the primary endpoint
21Definitions of endpoints: example
Abacavir substitution for nucleoside analogs in patients with HIV lipoatrophy. Carr A et al. JAMA 2002; 288: 207-215.
Primary endpoint: Mean change in limb fat mass measured by DXA at week 24
Secondary endpoints:
- Adverse events
- Anthropometry
- Total and central fat mass
- Biochemical, lipid, and glycemic measurements
- Viral load
- CD4 count
- Quality of life
22Defining an endpoint
- In most trials patients are monitored very regularly (eg. every 4 weeks after randomisation)
- Tempting to compare treatments at each time point; however, this is not advisable because of problems with multiple testing and the fact that the tests are not independent
- Thus, must select a single time point for assessment of the primary endpoint (eg. 24 weeks or 48 weeks)
- Treatments should be formally compared at that time point only
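The multiple testing problem can be put into numbers. The sketch below uses the textbook formula for independent tests at the 5% level; as noted above, tests at successive visits are not independent, so the figures are only a rough guide to how quickly false positives accumulate. The function name is hypothetical.

```python
def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false-positive result among n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Visits every 4 weeks up to week 48 would give 12 comparisons
for n_tests in (1, 6, 12):
    print(n_tests, round(familywise_error(n_tests), 3))
```

Even allowing for correlation between visits, testing at every time point makes a spurious "significant" difference far more likely than the nominal 5%.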
24Clinical vs. surrogate endpoints
- We are usually most interested in the effect of a new treatment on a clinical outcome (eg. new AIDS events or death)
- However, currently, trials of HAART that use clinical endpoints generally have to be extremely large and follow patients for very long periods of time in order to have sufficient power to detect a difference between treatment regimens
- Thus, we often consider the effect of the treatment regimen on a surrogate endpoint (eg. change in CD4, HIV RNA etc.)
25Surrogate endpoints
A laboratory measurement or a physical sign used
as a substitute for a clinically meaningful
endpoint that measures directly how a patient
feels, functions or survives.
Temple RJ. A regulatory authority's opinion about surrogate endpoints. In: Nimmo WS, Tucker GT, eds. Clinical measurement in drug evaluation. New York, NY: John Wiley & Sons Inc., 1995.
26Surrogate endpoints (cont.)
In order for a laboratory marker to be a good surrogate endpoint for a clinical outcome, it has to fulfil two criteria:
- Surrogate must be on the causal pathway of the disease process
- Entire effect of the intervention on clinical outcome should be captured by changes in the surrogate
Treatment -> Changes in surrogate -> Improved clinical outcome
27Surrogate endpoints (cont.)
- Pre-HAART, CD4 count was established as reliable surrogate endpoint for AIDS/death
- Most trials now use HIV RNA as a surrogate endpoint (eg. viral load <50 copies/ml)
- BUT not all of the effect of the treatment (eg. toxicities) may act through changes in the CD4 count or HIV RNA level
- Many combinations have similar virological efficacy; other outcomes may now be more important
29Protocol violations
For a number of reasons, patients included and randomised in the trial may not behave as stated in the protocol
- Ineligible patients: may be recruited by mistake
- Non-adherence: patients may forget to take some or all of their drugs, may not attend for follow-up visits, may take alternative treatments
- Patient withdrawals: patients may not be able to tolerate drugs, may switch treatments
QUESTION: how should these be dealt with in any analysis?
30Analysis by Intention-to-treat (ITT)
All patients randomised to treatment should be
included in the analysis in the groups to which
they were randomised
31Analysis by Intention-to-treat (ITT)
- Provides a measure of the real-life effect of treatment
- Is the only unbiased estimate of the treatment's effect
- Most major journals require analysis by ITT
- All presentations should include analysis by ITT as the primary analysis unless there is a strong justification for not doing this
32On-treatment analyses
Only include those patients who complete a full
course of treatment to which they were randomised
33On-treatment analyses
- Suggested that this shows the optimal effect of treatment when taken as recommended
- However, has potential to provide extremely biased estimates of treatment effect, as those with the worst responses to treatment are likely to be the ones who drop out/switch treatments
- Approach will give an overly positive estimate of effect of new treatment
34On-treatment analyses - example
RCT with primary endpoint of virological failure
at week 48. Patients are allowed to switch
therapy once failure has occurred.
[Figure: timelines for individual patients, marked as viral load >50 copies/ml or viral load <50 copies/ml, with the points at which patients CHANGED TREATMENT]
35On-treatment analyses - example
RCT with primary endpoint of virological failure
at week 48. Patients are allowed to switch
therapy once failure has occurred.
[Figure: timelines for individual patients, marked as viral load >50 copies/ml or viral load <50 copies/ml, with the points at which patients CHANGED TREATMENT]
Primary endpoint at week 48: 1/1 (100%)
36On-treatment analyses
- Those remaining on randomised treatment at 48 weeks will, by definition, be those who have not experienced virological failure
- Anyone with virological failure prior to week 48 will change treatment and will be excluded from the denominator
- The response rate among those still on treatment will therefore always be close to 100% (depending on how quickly treatments are changed after virological failure)
- FOR THIS REASON, ON-TREATMENT ANALYSES SHOULD NOT BE USED FOR THE PRIMARY ANALYSIS OF A TRIAL
37Problems when analysing by ITT with surrogate
endpoints
- If patients are lost to follow-up or drop out of a trial, they are unlikely to attend for follow-up visits and blood tests
- Whilst it may be possible to obtain information on clinical endpoints from other sources, information on CD4 counts or HIV RNA levels may be unavailable
- Where data are missing, it is difficult to run an ITT analysis in which all patients are included in the analysis
38Alternative methods of ITT analyses
Where data on surrogate markers are missing, a
number of alternative strategies have been
proposed
- ITT Missing=Failure (ITT M=F)
- All missing values are treated as failures in the analysis irrespective of most recent value; this ensures that all patients are included in the denominator. If anything, this gives the most pessimistic view of the new treatment.
39Alternative methods of ITT analyses
Where data on surrogate markers are missing, a
number of alternative strategies have been
proposed
- ITT last observation carried forward (LOCF)
- The last available measurement for each person
is used in the analysis (irrespective of how long
before the endpoint it was measured). This is an
ITT analysis as all patients are included in the
denominator but it is not favoured by regulatory
bodies (eg. FDA)
40Alternative methods of ITT analyses
Where data on surrogate markers are missing, a
number of alternative strategies have been
proposed
- ITT missing=excluded
- All patients with missing surrogate values are excluded from the analyses; this is NOT an ITT analysis as the denominator does not include all patients recruited to the trial. Essentially this is an on-treatment analysis
41Examples of different approaches
[Figure: patients' viral load measurements over time up to the primary endpoint. Legend: viral load >50 copies/ml; viral load <50 copies/ml]
42Examples of different approaches
Approach: On treatment / ITT missing=excluded
Responder at primary endpoint (1 = responder, 0 = non-responder, - = missing):
1 1 1 - - 0 - - 1 - 0 1 - - 1 - 1 - 1 0
Response rate: 8/11 (73%)
[Figure legend: viral load >50 copies/ml; viral load <50 copies/ml]
43Examples of different approaches
Approach: ITT missing=failure
Responder at primary endpoint (1 = responder, 0 = non-responder):
1 1 1 0 0 0 0 0 1 0 0 1 0 0 1 0 1 0 1 0
Response rate: 8/20 (40%)
[Figure legend: viral load >50 copies/ml; viral load <50 copies/ml]
44Examples of different approaches
Approach: ITT missing=LOCF
Responder at primary endpoint (1 = responder, 0 = non-responder):
1 1 1 0 1 0 0 1 1 1 0 1 0 0 1 0 1 0 1 0
Response rate: 11/20 (55%)
[Figure legend: viral load >50 copies/ml; viral load <50 copies/ml]
45Examples of different approaches - summary
Approach                               Response rate
On treatment / ITT missing=excluded    73%
ITT missing=failure                    40%
ITT missing=LOCF                       55%
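The three response rates can be reproduced from the responder values shown on the preceding slides. This is a minimal sketch: the two lists are copied from the slides (with `None` standing for a missing value), while the function names are hypothetical.

```python
# Responder status at the primary endpoint, from the slides:
# 1 = responder, 0 = non-responder, None = missing
observed = [1, 1, 1, None, None, 0, None, None, 1, None,
            0, 1, None, None, 1, None, 1, None, 1, 0]

# Values after carrying the last observation forward, as shown on the LOCF slide
locf = [1, 1, 1, 0, 1, 0, 0, 1, 1, 1,
        0, 1, 0, 0, 1, 0, 1, 0, 1, 0]

def missing_excluded(values):
    """On treatment / missing=excluded: drop patients with no value."""
    present = [v for v in values if v is not None]
    return sum(present), len(present)

def missing_failure(values):
    """ITT missing=failure: every missing value counts as a non-responder."""
    filled = [0 if v is None else v for v in values]
    return sum(filled), len(filled)

def describe(label, responders, denominator):
    print(f"{label}: {responders}/{denominator} ({100 * responders / denominator:.0f}%)")

describe("Missing=excluded", *missing_excluded(observed))
describe("ITT missing=failure", *missing_failure(observed))
describe("ITT missing=LOCF", sum(locf), len(locf))
```

The numerator is the same in the first two approaches; only the denominator changes, which is why the choice of approach matters so much when many values are missing.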
46Subgroup analyses
- It is often tempting to consider the effect of the treatment regimen in a number of subgroups of patients
- For example, consider the effect of the regimen in the following groups:
- - Males/females
- - Low/high viral load at baseline
- - Low/high CD4 count at baseline
- - ARV-naïve/ARV-experienced at start of trial
47Subgroup analyses
- There are a number of dangers inherent in performing too many subgroup analyses
- The increased number of tests being performed means that there are problems of multiple testing (ie. some of these comparisons are likely to be significant due to chance)
- Although the study as a whole will have sufficient power to detect a difference, the subgroups will often be based on a much smaller sample size and so will not be sufficiently powered
48Subgroup analyses: example 1
Although the difference between regimens A and B is similar in women to that in men, it is not significant in women due to the small number of women in the study. This does not provide evidence that there is no benefit of regimen B in women.
49Subgroup analyses: example 2
Although regimen B now looks better in females than males, a formal test of the interaction between sex and treatment group (p=0.11) provides no clear evidence of a real difference between the sexes; the apparent difference may well have arisen by chance.
50Subgroup analyses
- In any trial analysis, if subgroup analyses are thought to be important then they should be specified a priori in the protocol
- The study should be sufficiently large that these subgroup analyses will be large enough to detect important differences
- Evidence of a subgroup effect should never be based on a comparison of p-values in the individual subgroups, but should be based on formal tests of interaction between the factors of interest
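One simple form of interaction test compares the treatment effect in the two subgroups directly. The sketch below uses a z-test on the difference between two risk differences with a normal approximation; the responder counts are invented purely for illustration, and a real analysis would more usually fit an interaction term in a regression model.

```python
import math

def risk_difference(events_a, n_a, events_b, n_b):
    """Difference in proportions (B minus A) and its standard error."""
    p_a, p_b = events_a / n_a, events_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return p_b - p_a, se

def interaction_test(diff_1, se_1, diff_2, se_2):
    """z-test of whether the treatment effect differs between two subgroups."""
    z = (diff_1 - diff_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p

# Hypothetical numbers of responders on regimens A and B (not from any trial)
diff_men, se_men = risk_difference(120, 200, 140, 200)
diff_women, se_women = risk_difference(18, 40, 28, 40)

z, p = interaction_test(diff_women, se_women, diff_men, se_men)
print(f"Effect in men: {diff_men:.2f}, effect in women: {diff_women:.2f}")
print(f"Interaction: z = {z:.2f}, p = {p:.2f}")
```

With these invented counts the effect looks larger in women, yet the interaction test does not reach conventional significance, so the data do not show that the regimens behave differently in the two groups.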
51Interim analyses
- In any trial there is always a concern that one of the treatment arms may be inferior in some way to the others (eg. one regimen may be far more efficacious or may be associated with a greater rate of serious toxicity than the others)
- If so, it may be considered to be ethically unsound to continue to place patients at risk of the serious toxicity or of treatment failure
- As a result, one or more interim analyses may be planned at pre-specified time points to monitor the progress of the trial
52Interim analyses
- However, there is always the chance that initial findings, particularly on small numbers of patients, may have arisen by chance
- If the trial is allowed to continue to completion, these trends may disappear
- Have to be very careful about stopping the trial early based on results of interim analyses
- If interim analyses are to be performed, then it is usually recommended that the trial is only stopped if evidence for a difference between the arms is very strong (eg. p<0.0001)
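A small simulation illustrates why a strict threshold is recommended. It simulates trials in which the two arms truly do not differ and counts how often any interim look crosses the stopping threshold. All settings (number of looks, patients per look, number of simulated trials, seed) are assumptions for illustration, and the outcome is treated as a normally distributed measurement for simplicity.

```python
import math
import random

def simulate_false_positive_rate(n_looks, n_per_look=20, threshold_z=1.96,
                                 n_trials=2000, seed=1):
    """Share of simulated null trials that cross the threshold at any look."""
    rng = random.Random(seed)
    stopped = 0
    for _ in range(n_trials):
        total_diff, n_pairs = 0.0, 0
        for _ in range(n_looks):
            # Each new pair of patients (one per arm) adds a difference with
            # mean 0 and variance 2, because there is no true treatment effect
            for _ in range(n_per_look):
                total_diff += rng.gauss(0, 1) - rng.gauss(0, 1)
            n_pairs += n_per_look
            z = total_diff / math.sqrt(2 * n_pairs)
            if abs(z) > threshold_z:
                stopped += 1  # trial would have been stopped early in error
                break
    return stopped / n_trials

print("1 look,  p<0.05:    ", simulate_false_positive_rate(1))
print("5 looks, p<0.05:    ", simulate_false_positive_rate(5))
print("5 looks, p<0.0001:  ", simulate_false_positive_rate(5, threshold_z=3.89))
```

Repeated looks at the conventional 5% level stop noticeably more null trials than a single analysis would, whereas a threshold of roughly p<0.0001 (z of about 3.89) keeps false stops rare.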
53Interim analyses: the role of a DSMB
- Often a Data and Safety Monitoring Board (DSMB) may be convened
- Will include a number of independent experts in the area, usually including a statistician
- The DSMB will evaluate safety data on a regular basis (this information will not usually be blinded) and will report back to the trial Steering Committee
- The DSMB may recommend that a trial be stopped early if necessary
54Interim analyses (cont)
- If interim results suggest superiority of one of
the arms, but DSMB do not recommend stopping the
trial, presentation of results could hinder the
successful completion of the trial
- - Patients already randomised may switch to superior arm, resulting in high levels of drop-out
- - New patients will not wish to be randomised to the inferior arm
55Interim analyses (cont)
- If data from interim analyses are to be released, it is important that either blinding is maintained, or results are not presented separately for the groups
- In some cases, even blinded or combined data may give an indication of the effect of the new drug/combination (eg. in a placebo controlled trial)
- If this is the case, then no results concerning the primary endpoint of the trial (even blinded or combined) should be presented
56Tests of superiority
- In a standard trial we usually test the null hypothesis that there is no difference between the treatment arms, against an alternative hypothesis that there is a difference between treatment arms
- Note that no direction is specified for this difference (ie. drug A could be worse or better than drug B)
- This is known as a test of superiority, even though we don't specify which drug is superior
57Tests of equivalence
- Sometimes, however, we may not want to test whether one drug is better than another, but may simply want to show that the two drugs are equivalent
- This is usually the case when a new drug appears to have similar efficacy but may have a better toxicity profile, be easier to take, or be cheaper
- Designing a study to show equivalence requires a different emphasis to a study of superiority
58Tests of superiority: the effect of increasing the sample size
[Figure: difference in percentage with confidence intervals, on an axis running from "A less effective" to "A more effective". First estimate: non-significant difference, but huge uncertainty. Second estimate: similar difference but significant, due to increased power of study.]
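The effect of sample size on the confidence interval can be shown with a short calculation. The sketch uses a simple (Wald) 95% confidence interval for a difference in proportions; the response proportions and sample sizes are hypothetical.

```python
import math

def diff_in_percent_ci(p_a, p_b, n_per_arm, z=1.96):
    """Difference in percentage (A minus B) with an approximate 95% CI."""
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_per_arm + p_b * (1 - p_b) / n_per_arm)
    # Return (estimate, lower limit, upper limit) in percentage points
    return tuple(round(100 * x, 1) for x in (diff, diff - z * se, diff + z * se))

# Hypothetical response proportions: 70% on regimen A, 60% on regimen B
for n_per_arm in (50, 500):
    estimate, lower, upper = diff_in_percent_ci(0.70, 0.60, n_per_arm)
    print(f"n per arm = {n_per_arm}: {estimate} ({lower} to {upper})")
```

The estimated difference is the same in both cases; only the larger study gives an interval narrow enough to exclude zero.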
59Tests of equivalence (cont.)
- In a test of equivalence we focus more strongly on the confidence interval for the treatment effect
- The confidence interval around the treatment effect must be narrow to exclude even a moderate difference
- In order to do this, we usually require a much larger sample size than we would need to show that one is superior to the other
- Have to decide a priori on what can be deemed as equivalent
60Example: difference in percentage undetectable at 24 weeks with confidence interval (regimen A vs regimen B)
[Figure: difference in percentage with confidence intervals, plotted against the EQUIVALENCE RANGE. First estimate: non-significant difference, but huge uncertainty. Second estimate: non-significant difference but less uncertainty.]
61Testing for equivalence (cont.)
- Need to specify the maximum amount by which it is thought that the two treatments could differ even when thought to be equivalent
- If neither the lower nor the upper limit of the CI of the treatment effect exceeds this value (ie. the whole interval lies within the equivalence range), then the two drugs are deemed equivalent
- Sample size is chosen to ensure that the confidence interval around the treatment effect is narrow
- Usually requires approximately twice as large a sample as a test of non-equivalence
62Tests of non-inferiority
- Conceptually similar to tests of equivalence
- New drug may be expected to be slightly inferior to standard but at the same time offers other benefits (eg. easier to take, fewer toxicities)
- Need to show that the effect of the new treatment is not below some pre-stated non-inferiority margin
- Confidence intervals again need to be narrow, and sample sizes may be larger than in a superiority trial
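The decision rules for equivalence and non-inferiority can be sketched as a check of the confidence interval against a pre-stated margin. The proportions, sample sizes and the 10 percentage point margin below are hypothetical, and the two-sided 95% Wald interval is an assumption; trials may specify a different interval in the protocol.

```python
import math

def assess_against_margin(p_new, p_std, n_new, n_std, margin, z=1.96):
    """Compare the CI for (new minus standard) with a pre-stated margin."""
    diff = p_new - p_std
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    lower, upper = diff - z * se, diff + z * se
    return {
        "difference": diff,
        "ci": (lower, upper),
        # Equivalence: the whole interval lies inside (-margin, +margin)
        "equivalent": -margin < lower and upper < margin,
        # Non-inferiority: only the lower limit has to stay above -margin
        "non_inferior": lower > -margin,
    }

# Hypothetical: 78% undetectable on the new regimen vs 80% on standard
large_trial = assess_against_margin(0.78, 0.80, 400, 400, margin=0.10)
small_trial = assess_against_margin(0.78, 0.80, 100, 100, margin=0.10)
for label, result in (("400 per arm", large_trial), ("100 per arm", small_trial)):
    lower, upper = result["ci"]
    print(f"{label}: CI {lower:.3f} to {upper:.3f}, "
          f"equivalent={result['equivalent']}, non-inferior={result['non_inferior']}")
```

The same observed difference supports non-inferiority only in the larger trial, where the interval is narrow enough to stay above the margin.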
63Example: difference in percentage undetectable at 24 weeks with confidence interval (regimen A vs regimen B)
[Figure: difference in percentage with confidence intervals, on an axis running from "A less effective" to "A more effective". Intervals are labelled DRUG CONSIDERED NON-INFERIOR or DRUG CONSIDERED INFERIOR.]
64Group work
65Session 6 Critically appraising research
66Why do we need to appraise research?
- Many people, particularly pharma companies, have vested interests in some pieces of research
- Although it is unlikely that anyone would deliberately falsify research for their own interests, the way in which results are presented may be misleading
- Even if a study is perfectly carried out and presented appropriately, the results may not be applicable to your situation
- Thus, we need to consider any piece of research carefully before acting on its recommendations
67The peer-review process: journal articles
- Most major journals use a process known as peer-review
- Each submitted article is usually sent to two or more experts in the field so that they may give their opinion on the design and conduct of the study, the analytical methods used and importance of the results
- On the basis of their reports, journals may either reject the article, ask for a resubmission with changes, or accept the article
68Problems with the peer-review process
- Hard to make the process truly blind; therefore personal biases may be introduced
- May not be sent to someone who fully understands or knows the area
- Relies largely on goodwill of reviewers (payment, if any, is minimal) and some put more time and effort into the task than others
- Thus, peer review system isn't perfect and poor papers may be published
- Difficult to improve the system though - having some system in place is better than none at all!
69What is and isn't published
- Large studies are more likely to be published than small ones, irrespective of study quality
- If the study is small then those studies that show a significant result are more likely to be published (publication bias)
- There is a perception that if you are part of an established group with a known track record, it is easier to get things published; not sure whether this is backed up by evidence!
70Peer-review of other material
- Very limited
- Conference presentations are often selected on basis of peer review of abstract only
- However, abstract is often very short, very vague and may not contain all the information required to make a valid decision on its quality
- Conference abstracts are also selected to fit in with the programme and planned sessions
71The main questions when appraising research
Do I believe the results?
72The main questions when appraising research
Do I believe the results?
YES
Are the results important or new?
73The main questions when appraising research
Do I believe the results?
YES
Are the results important or new?
YES
Are the results applicable to me or to other
people in a similar situation?
74The main questions when appraising research
Do I believe the results?
75Do I believe the results - are the results valid? (RCTs)
- Was the assignment of patients to treatment groups randomised?
- How was the assignment list produced?
- How was the assignment list concealed from the doctors?
76Do I believe the results - are the results valid? (RCTs)
- Were all the patients who entered properly accounted for?
- How complete was the follow-up?
- How did the authors deal with patients who did not receive assigned treatment or who deviated from the protocol?
- Was an ITT analysis performed?
77Do I believe the results - are the results valid? (RCTs)
- To what extent was blinding carried out?
- Patients?
- Doctors?
- Other study personnel?
78Do I believe the results - are the results valid? (RCTs)
- How similar were groups at start of trial?
79Do I believe the results - are the results valid? (RCTs)
- Aside from the experimental intervention, were the two groups treated equally?
80Assessing validity: some words of caution!
- Remember that it is very easy to criticise a paper, but it is not always as easy to carry out the research in the first place
- It is very difficult to write the perfect paper (someone, somewhere will always find something wrong with it)
- You have to decide if, putting all your criticisms together, they are enough to make you seriously doubt the validity of the findings
81The main questions when appraising research
Do I believe the results?
YES
Are the results important or new?
82Are the results important - what are the results? (RCTs)
- How large was the treatment effect?
83Are the results important - what are the results? (RCTs)
- How large was the treatment effect?
- Are the results clinically significant?
- Relative risk?
- Difference in risks?
- Number needed to treat? (an estimate of the number of patients who need to receive the treatment in order to prevent one bad event)
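The three measures are linked by simple arithmetic. The sketch below computes them from event counts in each arm; the counts are hypothetical and the function name is an assumption for illustration.

```python
def effect_measures(events_treated, n_treated, events_control, n_control):
    """Relative risk, difference in risks and number needed to treat."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    risk_difference = risk_control - risk_treated  # absolute risk reduction
    # NNT is only meaningful when the treatment reduces the risk
    nnt = 1 / risk_difference if risk_difference > 0 else None
    return {
        "relative_risk": risk_treated / risk_control,
        "risk_difference": risk_difference,
        "number_needed_to_treat": nnt,
    }

# Hypothetical trial: 20/200 bad events on treatment vs 40/200 on control
measures = effect_measures(20, 200, 40, 200)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

A relative risk of 0.5 sounds the same whether the control risk is 20% or 2%, but the number needed to treat would be ten times larger in the second case, which is why the absolute measures matter for judging clinical significance.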
84Are the results important - what are the results? (RCTs)
- How precise was the treatment effect?
85Are the results important - what are the results? (RCTs)
- How precise was the treatment effect?
- How wide was the confidence interval?
- What interpretation do you make of the confidence interval?
86The main questions when appraising research
Do I believe the results?
YES
Are the results important or new?
YES
Are the results applicable to me or to other
people in a similar situation?
87Are the results applicable to me? (RCTs)
- Were patients in the trial similar to my own situation?
- Did the authors consider all clinically important outcomes?
- Are the likely benefits of the new treatment worth the potential harms/costs?
88Critically appraising cohort studies
General points are the same as for RCTs, although
clearly the issues of randomisation, blinding
etc. are not appropriate
89Critically appraising cohort studies
How representative is the cohort? Who is included in the cohort? Who is excluded? Are there any differences between those included and excluded that could limit the generalisability of the findings?
90Critically appraising cohort studies
Follow-up: How is this maintained? How many patients are lost to follow-up? How are these patients dealt with in the analysis?
91Critically appraising cohort studies
Temporal changes: If the authors are considering changes over time (or some treatment whose use may change over time) then has anything else changed which could explain the findings?
92Critically appraising cohort studies
Possible bias: Have all other factors that could explain any differences been considered?
93Group work