Session 5: The design of RCTs of treatments for HIV infection

1
Session 5: The design of RCTs of treatments for
HIV infection
A beginner's guide to some of the methodological
and statistical issues in HIV research

Caroline Sabin
Reader in Medical Statistics and Epidemiology
Department of Primary Care and Population
Sciences, RFUCMS

2
What is a clinical trial?
Any form of planned experiment which involves
patients and is designed to find the most
appropriate treatment for a particular medical
condition
3
Types of trials (clinical)
Phase I studies: Focus on safety rather than efficacy. Dose-escalation studies, studies of drug metabolism and bioavailability. Usually based on small numbers of subjects, often healthy volunteers.
Phase II studies: Initial investigation for clinical effect. Small-scale studies into effectiveness and safety of drug.
Phase III studies: Full-scale treatment evaluation. Comparison to standard therapy (if one exists) or placebo.
Phase IV trials: Post-marketing surveillance. Monitoring for adverse effects. Long-term studies of morbidity and mortality. Promotion exercises.
4
Topics already covered in first session

Control groups
Randomisation
Blinding
Parallel vs. cross-over trials
The limitations of RCTs

5
Topics to be covered today

Why do we need a control group?
Why do we need randomisation?
The protocol
Defining endpoints (primary and secondary
endpoints, clinical vs surrogate endpoints)
How to deal with protocol violations (patients
who drop out of the study and missing data)
Approaches to analysis (ITT, as treated)
Subgroup and interim analyses

6
Why do we need a control group?

Early medical developments were usually so great
that controls weren't always needed (eg. trials
of anaesthetics, first trials of antibiotics
etc.)
However, most developments these days are more
modest and some form of control group is now
essential

7
Why do we need a control group?

Silverman, 1985: Epidemic of retrolental
fibroplasia in babies
Uncontrolled trials suggested that treatment with
adrenocorticotrophic hormone had a 75% success
rate
After controlled trials were finally carried out,
it was found that 75% of infants return to normal
without treatment
Identification of true cause of epidemic (oxygen
to premature babies) was delayed

8
Why do we need a control group?

Uncontrolled trials may give a distorted view of
a new therapy
Patients may improve over time, even without
treatment; thus, any improvement cannot
necessarily be attributed to treatment
Patients selected for treatment may be less
seriously ill than those not selected for
treatment which may overestimate the benefits of
new therapy
Patients in clinical trials generally do better
than patients on same treatment who are not in
trials

9
Which control group? - the use of historical or
non-randomised controls
Characteristics of patients

Controls less likely to have clearly defined
criteria for inclusion/exclusion
May have been a change in the type of patient
eligible for treatment, or prognosis may have
changed over time
Investigator may have been more restrictive in
choice of patients for the trial than when
treating patients in the past

10
Which control group? - the use of historical or
non-randomised controls
Experimental environment

Quality of recorded data may not be as good
Definitions of response may differ between groups
(eg. viral load endpoints)
Ancillary care may improve in a trial (eg.
adherence support, support for toxicities etc.)

Thus, treatment and control groups may differ
with respect to many features other than
treatment, and so we cannot attribute any
difference in outcome to the new treatment
11
What is randomisation?

Allocation of patients to treatments is
determined by chance
Randomised trials provide most efficient trial
design (ie. they are the most powerful) as they
ensure that any factors that may affect outcome
will be distributed equally between the treatment
groups
This allows any difference in treatment response
to be attributed to the treatment
Removes impact of known confounding factors as
well as unknown ones

12
Why do trials need to be randomised?

Non-randomised trials have the potential to be
seriously biased
If there are systematic differences between the
patients in the treatment groups at the outset of
the trial, then any differences in treatment
response cannot necessarily be attributed to the
new treatment
Eg. treatment comparisons in cohort studies

13
When can a randomised trial be done?
New treatment better than standard
New treatment worse than standard
Equipoise

Who should have equipoise?
The doctors recruiting patients
The patients entering the trial
(is this true in reality?)

14
Other benefits of randomisation

Helps with blinding of trial (see later)
Prevents any conscious or subconscious selection
bias, whereby doctor tends to put more (or less)
severely ill patients in a particular treatment
group
Beware of any approach to randomisation whereby
clinicians may be able to establish treatment
allocation prior to entry to the trial (eg.
systematic allocation by date of birth, alternate
allocation)
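
As an illustration of an allocation list that cannot be worked out from the date of birth or the order of entry, here is a minimal sketch of permuted-block randomisation. The function name, arm labels, block size and seed are all assumptions made for the example; in practice the list would be generated and held away from the recruiting clinicians.

```python
import random


def permuted_block_list(n_patients, arms=("A", "B"), block_size=4, seed=None):
    """Build an allocation list from randomly permuted blocks.

    Each block contains every arm equally often, so the arms stay balanced
    throughout recruitment while the order inside a block is random.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_patients:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order within the block
        allocations.extend(block)
    return allocations[:n_patients]


# Hypothetical trial of 20 patients, two arms, blocks of four
allocation_list = permuted_block_list(20, seed=2024)
print(allocation_list)
print({arm: allocation_list.count(arm) for arm in ("A", "B")})
```

With 20 patients and blocks of four, each arm receives exactly 10 patients whatever the seed. Very small fixed blocks can themselves become predictable, which is one reason block sizes are sometimes varied.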

15
Selection of patients for a trial
Discuss trial with patient and assess eligibility
Obtain informed consent
Formally enter patient into trial
Randomise
16
Other benefits of randomisation (cont.)
Example: Trial of anticoagulant therapy (Wright, 1948)
Patients admitted on odd days: anticoagulants
Patients admitted on even days: placebo
Anticoagulant therapy: n=589
Placebo: n=442
17
The protocol
The "workshop manual" for the trial. Will
contain many or all of the following:

Background, aims and objectives
Trial design
Patient selection: inclusion/exclusion criteria
Treatment schedules
Monitoring
Registration, randomisation and blinding
Methods of patient evaluation
Patient consent
Size of study
Plans for dealing with protocol deviations
Plans for statistical analysis
Ethical approval and administrative matters

18
Selection of patients for a trial

A trial should have explicit inclusion criteria
and exclusion criteria: precise definitions of
who can be included in the study
Patients should be broadly representative of some
future group of patients to whom the trial may be
applied
BUT patients in trials are not necessarily a
random selection of all HIV-positive individuals
(unlikely to be the case)

19
Evaluation of response: the primary endpoint

In any trial we need to define (preferably) a
single primary endpoint that captures the key
effects of treatment on the patient
Primary endpoint is usually related to efficacy
If results from different endpoints are
inconsistent, the primary endpoint will be the
one on which any decisions about the value of the
drug will be mainly based

20
Evaluation of response: secondary endpoints

In addition to the primary endpoint, we may also
define any number of secondary endpoints
These are often related to toxicity or quality of
life, or may be other measures of efficacy not
captured by the primary endpoint

21
Definitions of endpoints: example
Abacavir substitution for nucleoside analogs in
patients with HIV lipoatrophy. Carr A et al.
JAMA (2002) 288: 207-215.
Primary endpoint: Mean change in limb fat mass measured by DXA at week 24
Secondary endpoints:
- Adverse events
- Anthropometry
- Total and central fat mass
- Biochemical, lipid, and glycemic measurements
- Viral load
- CD4 count
- Quality of life
22
Defining an endpoint

In most trials patients are monitored very
regularly (eg. every 4 weeks after randomisation)
Tempting to compare treatments at each time point
- however, this is not advisable because of
problems with multiple testing and the fact that
the tests are not independent
Thus, must select a single time point for
assessment of the primary endpoint (eg. 24 weeks
or 48 weeks)
Treatments should be formally compared at that
timepoint only

24
Clinical vs. surrogate endpoints

We are usually most interested in the effect of a
new treatment on a clinical outcome (eg. new AIDS
events or death)
However, currently, trials of HAART that use
clinical endpoints generally have to be extremely
large and follow patients for very long periods
of time in order to have sufficient power to
detect a difference between treatment regimens
Thus, we often consider the effect of the
treatment regimen on a surrogate endpoint (eg.
change in CD4, HIV RNA etc.)

25
Surrogate endpoints
A laboratory measurement or a physical sign used
as a substitute for a clinically meaningful
endpoint that measures directly how a patient
feels, functions or survives.
Temple RJ. A regulatory authority's opinion
about surrogate endpoints. In: Nimmo WS, Tucker
GT, eds. Clinical measurement in drug
evaluation. New York, NY: John Wiley & Sons Inc.;
1995.
26
Surrogate endpoints (cont.)
In order for a laboratory marker to be a good
surrogate endpoint for a clinical outcome, it has
to fulfill two criteria

Surrogate must be on the causal pathway of the
disease process
Entire effect of the intervention on clinical
outcome should be captured by changes in the
surrogate

Treatment -> Changes in surrogate -> Improved clinical outcome
27
Surrogate endpoints (cont.)

Pre-HAART, CD4 count was established as reliable
surrogate endpoint for AIDS/death
Most trials now use HIV RNA as a surrogate
endpoint (eg. viral load <50 copies/ml)
BUT not all of the effect of the treatment (eg.
toxicities) may act through changes in the CD4
count or HIV RNA level
Many combinations have similar virological
efficacy; other outcomes may now be more
important

29
Protocol violations
For a number of reasons, patients included and
randomised in the trial may not behave as
stated in the protocol

Ineligible patients: may be recruited by mistake
Non-adherent patients: may forget to take some or
all of their drugs, may not attend for follow-up
visits, may take alternative treatments
Patient withdrawals: not able to tolerate drugs,
may switch treatments

QUESTION: how should these be dealt with in any
analysis?
30
Analysis by Intention-to-treat (ITT)
All patients randomised to treatment should be
included in the analysis in the groups to which
they were randomised
31
Analysis by Intention-to-treat (ITT)

Provides a measure of the real-life effect of
treatment
Is the only unbiased estimate of the treatment's
effect
Most major journals require analysis by ITT
All presentations should include analysis by ITT
as the primary analysis unless there is a strong
justification for not doing this

32
On-treatment analyses
Only include those patients who complete a full
course of treatment to which they were randomised
33
On-treatment analyses

Suggested that this shows the optimal effect of
treatment when taken as recommended
However, has potential to provide extremely
biased estimates of treatment effect as those
with the worst responses to treatment are likely
to be the ones who drop-out/switch treatments
Approach will give an overly positive estimate of
effect of new treatment

34
On-treatment analyses - example
RCT with primary endpoint of virological failure
at week 48. Patients are allowed to switch
therapy once failure has occurred.
[Figure: viral load over time for each patient, shown as >50 copies/ml or <50 copies/ml, with the points at which patients CHANGED TREATMENT marked]
35
On-treatment analyses - example
RCT with primary endpoint of virological failure
at week 48. Patients are allowed to switch
therapy once failure has occurred.
[Figure: viral load over time for each patient, shown as >50 copies/ml or <50 copies/ml, with the points at which patients CHANGED TREATMENT marked]
Primary endpoint at week 48: 1/1 (100%)
36
On-treatment analyses

Those remaining on randomised treatment at 48
weeks will, by definition, be those who have not
experienced virological failure
Anyone with virological failure prior to week 48
will change treatment and will be excluded from
the denominator
Observed response rate for the primary endpoint
will always be close to 100% (depending on how
quickly treatments are changed after virological
failure)
FOR THIS REASON, ON-TREATMENT ANALYSES SHOULD NOT
BE USED FOR THE PRIMARY ANALYSIS OF A TRIAL
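
The denominator problem can be seen in a small worked example. The sketch below uses ten invented patients from a single arm (the data are hypothetical and not taken from the slides) and compares the proportion without virological failure under an ITT analysis and under an on-treatment analysis.

```python
# Hypothetical arm of 10 patients (illustrative data only).
# Each tuple: (virological failure by week 48, still on randomised treatment at week 48)
patients = [
    (False, True), (False, True), (False, True), (False, True),  # no failure, never switched
    (True, False), (True, False), (True, False),                 # failed, then switched
    (True, False), (True, False),                                # failed, then switched
    (True, True),                                                # failed late, not yet switched
]


def success_rate(group):
    """Proportion of the group without virological failure."""
    return sum(1 for failed, _ in group if not failed) / len(group)


# ITT: everyone randomised stays in the denominator
itt_rate = success_rate(patients)

# On-treatment: only those still on their randomised treatment are counted
on_treatment = [p for p in patients if p[1]]
on_treatment_rate = success_rate(on_treatment)

print(f"ITT: {itt_rate:.0%} of {len(patients)} patients")
print(f"On-treatment: {on_treatment_rate:.0%} of {len(on_treatment)} patients")
```

In this invented arm the ITT success rate is 40%, while the on-treatment rate is 80% because five of the six failures have left the denominator. Had the last patient also switched before week 48, the on-treatment rate would be 100%.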

37
Problems when analysing by ITT with surrogate
endpoints

If patients are lost-to-follow-up or drop out of
a trial, they are unlikely to attend for
follow-up visits and blood tests
Whilst it may be possible to obtain information
on clinical endpoints from other sources,
information on CD4 counts or HIV RNA levels may
be unavailable
Where data are missing, it is difficult to run an
ITT analysis in which all patients are included
in the analysis

38
Alternative methods of ITT analyses
Where data on surrogate markers are missing, a
number of alternative strategies have been
proposed

ITT Missing=Failure (ITT M=F)
All missing values are treated as failures in
the analysis irrespective of most recent value;
this ensures that all patients are included in
the denominator. If anything, this gives the most
pessimistic view of the new treatment.

39
Alternative methods of ITT analyses
Where data on surrogate markers are missing, a
number of alternative strategies have been
proposed

ITT last observation carried forward (LOCF)
The last available measurement for each person
is used in the analysis (irrespective of how long
before the endpoint it was measured). This is an
ITT analysis as all patients are included in the
denominator but it is not favoured by regulatory
bodies (eg. FDA)

40
Alternative methods of ITT analyses
Where data on surrogate markers are missing, a
number of alternative strategies have been
proposed

ITT missing=excluded
All patients with missing surrogate values are
excluded from the analyses; this is NOT an ITT
analysis as the denominator does not include all
patients recruited to the trial. Essentially
this is an on-treatment analysis

41
Examples of different approaches
[Figure: primary endpoint for each patient, shown as viral load >50 copies/ml or <50 copies/ml]
42
Examples of different approaches
Responder: On treatment / ITT missing=excluded
Primary endpoint: 1 1 1 - - 0 - - 1 - 0 1 - - 1 - 1 - 1 0
Response rate: 8/11 (73%)
[Figure legend: viral load >50 copies/ml; viral load <50 copies/ml]
43
Examples of different approaches
Responder: ITT missing=failure
Primary endpoint: 1 1 1 0 0 0 0 0 1 0 0 1 0 0 1 0 1 0 1 0
Response rate: 8/20 (40%)
[Figure legend: viral load >50 copies/ml; viral load <50 copies/ml]
44
Examples of different approaches
Responder: ITT missing LOCF
Primary endpoint: 1 1 1 0 1 0 0 1 1 1 0 1 0 0 1 0 1 0 1 0
Response rate: 11/20 (55%)
[Figure legend: viral load >50 copies/ml; viral load <50 copies/ml]
45
Examples of different approaches: summary
Approach                               Response rate
On treatment / ITT missing=excluded    73%
ITT missing=failure                    40%
ITT missing LOCF                       55%
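
The three response rates can be reproduced from the 20-patient example on the preceding slides. In the sketch below, 1 is read as a responder (viral load <50 copies/ml), 0 as a non-responder and None as a missing week-48 value; that coding, the variable names and the helper function are assumptions for the example. The last available values for patients with missing data are those implied by the LOCF slide.

```python
# Week-48 primary endpoint for 20 patients; None marks a missing value
observed = [1, 1, 1, None, None, 0, None, None, 1, None,
            0, 1, None, None, 1, None, 1, None, 1, 0]

# Same patients with the last available measurement carried forward
last_available = [1, 1, 1, 0, 1, 0, 0, 1, 1, 1,
                  0, 1, 0, 0, 1, 0, 1, 0, 1, 0]


def summarise(values):
    """Return (responders, denominator, percentage rounded to a whole number)."""
    responders = sum(values)
    return responders, len(values), round(100 * responders / len(values))


# Missing = excluded: drop patients without a week-48 value
missing_excluded = summarise([v for v in observed if v is not None])

# Missing = failure: every missing value counts as a non-responder
missing_failure = summarise([0 if v is None else v for v in observed])

# LOCF: use the last available measurement for every patient
locf = summarise(last_available)

for label, result in [("Missing=excluded", missing_excluded),
                      ("Missing=failure", missing_failure),
                      ("LOCF", locf)]:
    print(f"{label}: {result[0]}/{result[1]} ({result[2]}%)")
```

The same 20 patients give 8/11 (73%), 8/20 (40%) or 11/20 (55%) depending only on how the nine missing values are handled.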
46
Subgroup analyses

It is often tempting to consider the effect of
the treatment regimen in a number of subgroups of
the patients in the trial
For example, consider the effect of the regimen
in the following groups
- Males/females
- Low/high viral load at baseline
- Low/high CD4 count at baseline
- ARV-naïve/ARV-experienced at start of trial

47
Subgroup analyses

There are a number of dangers inherent in
performing too many subgroup analyses
The increased number of tests being performed
means that there are problems of multiple testing
(ie. some of these comparisons are likely to be
significant due to chance)
Although the study will have sufficient power to
detect a difference, the subgroups will often be
based on a much smaller sample size and so will
not be sufficiently powered
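
A rough sense of the multiple testing problem comes from the chance of at least one false-positive result across several tests. The sketch below assumes the tests are independent and each is carried out at the 5% level; subgroup tests in a real trial are rarely independent, so this is an approximation only. The function name is an assumption for the example.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 4, 8, 20):
    print(f"{n_tests} tests: {familywise_error_rate(n_tests):.1%}")
```

With four subgroup comparisons the chance of at least one spuriously significant result is already close to 19% under these assumptions.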

48
Subgroup analyses: example 1
Although the difference between regimens A and B
in women is similar to that in men, it is not
significant due to the small number of women in
the study. This does not provide evidence that
there is no benefit of regimen B in women
49
Subgroup analyses: example 2
Although regimen B now looks better in females
than males, a formal test of the interaction
between sex and treatment group (p=0.11)
suggests that these results are likely to have
arisen by chance
50
Subgroup analyses

In any trial analysis, if subgroup analyses are
thought to be important then they should be
specified a priori in the protocol
The study should be sufficiently large that these
subgroup analyses will be large enough to detect
important differences
Evidence of a subgroup effect should never be
based on a comparison of p-values in the
individual subgroups, but should be based on
formal tests of interaction between the factors
of interest
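
One simple form of interaction test compares the treatment effect in the two subgroups directly. The sketch below does this for a difference in response proportions using a normal approximation; the counts are invented for illustration and the function names are assumptions. A full analysis would more usually fit a regression model with a treatment-by-subgroup interaction term.

```python
import math


def risk_difference(resp_a, n_a, resp_b, n_b):
    """Difference in response proportions (B minus A) and its variance."""
    p_a, p_b = resp_a / n_a, resp_b / n_b
    variance = p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b
    return p_b - p_a, variance


def interaction_p_value(subgroup_1, subgroup_2):
    """Two-sided p-value for a difference in treatment effect between two subgroups."""
    diff_1, var_1 = risk_difference(*subgroup_1)
    diff_2, var_2 = risk_difference(*subgroup_2)
    z = (diff_1 - diff_2) / math.sqrt(var_1 + var_2)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Hypothetical counts: (responders on A, n on A, responders on B, n on B)
men = (40, 100, 55, 100)    # B better by 15 percentage points
women = (8, 20, 14, 20)     # B better by 30 percentage points

p_interaction = interaction_p_value(men, women)
print(f"Interaction p-value: {p_interaction:.2f}")
```

Here the apparent benefit of B is twice as large in women, yet the interaction p-value is well above 0.05 because the estimate in the small subgroup of women is so imprecise.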

51
Interim analyses

In any trial there is always a concern that one
of the treatment arms may be inferior in some way
to the others (eg. one regimen may be far more
efficacious or may be associated with a greater
rate of serious toxicity than the others)
If so, it may be considered to be ethically
unsound to continue to place patients at risk of
the serious toxicity or of treatment failure
As a result, one or more interim analyses may be
planned at pre-specified time points to monitor
the progress of the trial

52
Interim analyses

However, there is always the chance that initial
findings, particularly on small numbers of
patients, may have arisen by chance
If the trial is allowed to continue to
completion, these trends may disappear
Have to be very careful about stopping the trial
early based on results of interim analyses
If interim analyses are to be performed, then it
is usually recommended that the trial is only
stopped if evidence for a difference between the
arms is very strong (eg. p<0.0001)
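
The reason for such a strict threshold can be illustrated by simulation. The sketch below assumes a trial with no true treatment effect, in which each observation is a standardised treatment difference with known standard deviation 1, and counts how often the trial would stop early at any of five interim looks. The number of looks, the look times, the seed and the function name are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist


def false_positive_rate(z_crit, n_trials=2000, looks=(20, 40, 60, 80, 100), seed=1):
    """Proportion of null trials that cross the boundary at any interim look."""
    rng = random.Random(seed)
    stopped = 0
    for _ in range(n_trials):
        total, n = 0.0, 0
        for look in looks:
            while n < look:
                total += rng.gauss(0.0, 1.0)  # no true difference between arms
                n += 1
            if abs(total / math.sqrt(n)) > z_crit:  # z-statistic on data so far
                stopped += 1
                break
    return stopped / n_trials


z_nominal = NormalDist().inv_cdf(1 - 0.05 / 2)    # two-sided p < 0.05
z_strict = NormalDist().inv_cdf(1 - 0.0001 / 2)   # two-sided p < 0.0001

naive = false_positive_rate(z_nominal)
strict = false_positive_rate(z_strict)
print(f"Stopping at p<0.05 on any look: {naive:.1%} of null trials stopped")
print(f"Stopping at p<0.0001 on any look: {strict:.1%} of null trials stopped")
```

Under these assumptions, testing at p<0.05 on every look stops far more than 5% of trials in which there is no real difference, whereas the strict boundary almost never does.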

53
Interim analyses: the role of a DSMB

Often a Data Safety and Monitoring Board (DSMB)
may be convened
Will include a number of independent experts in
the area, usually including a statistician
The DSMB will evaluate safety data on a regular
basis (this information will not usually be
blinded) and will report back to the trial
Steering Committee
The DSMB may recommend that a trial be stopped
early if necessary

54
Interim analyses (cont)

If interim results suggest superiority of one of
the arms, but DSMB do not recommend stopping the
trial, presentation of results could hinder the
successful completion of the trial

Patients already randomised may switch to
superior arm, resulting in high levels of
drop-out
New patients will not wish to be randomised to
the inferior arm

55
Interim analyses (cont)

If data from interim analyses are to be released,
it is important that either blinding is
maintained, or results are not presented
separately for the groups
In some cases, even blinded or combined data may
give an indication of the effect of the new
drug/combination (e.g. in a placebo controlled
trial)
If this is the case, then no results concerning
the primary endpoint of the trial (even blinded
or combined) should be presented

56
Tests of superiority

In a standard trial we usually test the null
hypothesis that there is no difference between
the treatment arms, against an alternative
hypothesis that there is a difference between
treatment arms
Note that no direction is specified for this
difference (ie. drug A could be worse or better
than drug B)
This is known as a test of superiority, even
though we don't specify which drug is superior

57
Tests of equivalence

Sometimes, however, we may not want to test
whether one drug is better than another, but
may simply want to show that the two drugs are
equivalent
This is usually the case when a new drug appears
to have similar efficacy but may have a better
toxicity profile, be easier to take, or is
cheaper
Designing a study to show equivalence requires a
different emphasis to a study of superiority

58
Tests of superiority: the effect of increasing
the sample size
[Figure: difference in percentage with confidence intervals, on an axis running from "A less effective" to "A more effective"]
Smaller study: non-significant difference, but huge uncertainty
Larger study: similar difference but significant, due to
increased power of study
59
Tests of equivalence (cont.)

In a test of equivalence we focus more strongly
on the confidence interval for the treatment
effect
The confidence interval around the treatment
effect must be narrow to exclude even a moderate
difference
In order to do this, we usually require a much
larger sample size than we would need to show
that one is superior to the other
Have to decide a priori on what can be deemed as
equivalent

60
Example: difference in percentage undetectable
at 24 weeks with confidence interval (regimen A
vs regimen B)
[Figure: difference in percentage with confidence intervals, plotted against the EQUIVALENCE RANGE]
Non-significant difference, but huge uncertainty
Non-significant difference but less uncertainty
61
Testing for equivalence (cont.)

Need to specify the maximum amount by which it is
thought that the two treatments could differ even
when thought to be equivalent
If the lower (or upper) limit of the CI of the
treatment effect does not exceed this value, then
the two drugs are deemed equivalent
Sample size is chosen to ensure that the
confidence interval around the treatment effect
is narrow
Usually requires approximately twice as large a
sample as a test of non-equivalence
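
The decision rule above can be sketched as a check of the confidence interval against a pre-specified margin. The example below uses a normal-approximation 95% confidence interval for a difference in proportions; the counts, the margin of 10 percentage points and the function names are assumptions for illustration (some equivalence analyses use a 90% interval instead).

```python
import math
from statistics import NormalDist


def difference_ci(resp_a, n_a, resp_b, n_b, level=0.95):
    """Normal-approximation confidence interval for the difference in proportions (A minus B)."""
    p_a, p_b = resp_a / n_a, resp_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


def is_equivalent(ci, margin):
    """Equivalent only if the whole interval lies inside (-margin, +margin)."""
    lower, upper = ci
    return -margin < lower and upper < margin


margin = 0.10  # hypothetical equivalence range of +/- 10 percentage points

small_trial = difference_ci(40, 50, 39, 50)      # 80% vs 78%, 50 per arm
large_trial = difference_ci(400, 500, 390, 500)  # 80% vs 78%, 500 per arm

for label, ci in [("50 per arm", small_trial), ("500 per arm", large_trial)]:
    print(f"{label}: CI {ci[0]:+.3f} to {ci[1]:+.3f}, equivalent: {is_equivalent(ci, margin)}")
```

Both hypothetical trials observe the same 2-point difference, but only the larger one has an interval narrow enough to sit inside the equivalence range.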

62
Tests of non-inferiority

Conceptually similar to tests of equivalence
New drug may be expected to be slightly inferior
to standard but at the same time offers other
benefits (eg. easier to take, fewer toxicities)
Need to show that the effect of the new treatment
is not below some pre-stated non-inferiority
margin
Confidence intervals again need to be narrow, and
sample sizes may be larger than in a superiority
trial
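
To give a feel for the sample sizes involved, the sketch below applies the usual normal-approximation formulae for comparing two proportions. Every input is hypothetical: a standard-arm response rate of 65% or 80%, a superiority trial designed to detect a 15-point improvement, a non-inferiority margin of 10 points, 80% power, and a one-sided 2.5% level for non-inferiority. Whether the non-inferiority trial is larger depends on how the margin compares with the difference a superiority trial would target.

```python
import math
from statistics import NormalDist


def z_quantile(q):
    """Quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(q)


def n_superiority(p_standard, p_new, alpha=0.05, power=0.80):
    """Patients per arm to detect p_new vs p_standard (two-sided test)."""
    delta = p_new - p_standard
    variance = p_standard * (1 - p_standard) + p_new * (1 - p_new)
    return math.ceil((z_quantile(1 - alpha / 2) + z_quantile(power)) ** 2 * variance / delta ** 2)


def n_non_inferiority(p_both, margin, alpha=0.025, power=0.80):
    """Patients per arm to exclude a deficit of `margin`, assuming equal true rates."""
    variance = 2 * p_both * (1 - p_both)
    return math.ceil((z_quantile(1 - alpha) + z_quantile(power)) ** 2 * variance / margin ** 2)


n_sup = n_superiority(0.65, 0.80)   # detect 65% vs 80%
n_ni = n_non_inferiority(0.80, 0.10)  # both arms truly 80%, margin 10 points
print(f"Superiority: {n_sup} per arm")
print(f"Non-inferiority: {n_ni} per arm")
```

Under these assumptions the non-inferiority design needs nearly twice as many patients per arm, because the margin to be excluded is smaller than the difference the superiority trial is looking for.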

63
Example: difference in percentage undetectable
at 24 weeks with confidence interval (regimen A
vs regimen B)
[Figure: difference in percentage with confidence intervals, on an axis running from "A less effective" to "A more effective"; intervals are labelled DRUG CONSIDERED NON-INFERIOR or DRUG CONSIDERED INFERIOR]
64
Group work
65
Session 6: Critically appraising research
66
Why do we need to appraise research?

Many people, particularly pharma companies, have
vested interests in some pieces of research
Although it is unlikely that anyone would
deliberately falsify research for their own
interests, the way in which results are presented
may be misleading
Even if a study is perfectly carried out and
presented appropriately, the results may not be
applicable to your situation
Thus, we need to consider any piece of research
carefully before acting on its recommendations

67
The peer-review process: journal articles

Most major journals use a process known as
peer-review
Each submitted article is usually sent to two or
more experts in the field so that they may give
their opinion on the design and conduct of the
study, the analytical methods used and importance
of the results
On the basis of their reports, journals may
either reject the article, ask for a resubmission
with changes, or accept the article

68
Problems with the peer-review process

Hard to make the process truly blind; therefore
personal biases may be introduced
May not be sent to someone who fully understands
or knows the area
Relies largely on goodwill of reviewers (payment,
if any, is minimal) and some put more time and
effort into the task than others
Thus, peer review system isn't perfect and poor
papers may be published
Difficult to improve the system though - having
some system in place is better than none at all!

69
What is and isn't published

Large studies are more likely to be published
than small ones, irrespective of study quality
If the study is small then those studies that
show a significant result are more likely to be
published (publication bias)
There is a perception that if you are part of an
established group with a known track record, it
is easier to get things published; not sure
whether this is backed up by evidence!

70
Peer-review of other material

Very limited
Conference presentations are often selected on
basis of peer review of abstract only
However, abstract is often very short, very vague
and may not contain all the information required
to make a valid decision on its quality
Conference abstracts are also selected to fit in
with the programme and planned sessions

71
The main questions when appraising research
Do I believe the results?
72
The main questions when appraising research
Do I believe the results?
YES
Are the results important or new?
73
The main questions when appraising research
Do I believe the results?
YES
Are the results important or new?
YES
Are the results applicable to me or to other
people in a similar situation?
74
The main questions when appraising research
Do I believe the results?
75
Do I believe the results? Are the results valid?
(RCTs)

Was the assignment of patients to treatment
groups randomised?

How was the assignment list produced?
How was the assignment list concealed from the
doctors?

76
Do I believe the results? Are the results valid?
(RCTs)

Were all the patients who entered properly
accounted for?

How complete was the follow-up?
How did the authors deal with patients who did
not receive assigned treatment or who deviated
from the protocol?
Was an ITT analysis performed?

77
Do I believe the results? Are the results valid?
(RCTs)

To what extent was blinding carried out?

Patients?
Doctors?
Other study personnel?

78
Do I believe the results? Are the results valid?
(RCTs)

How similar were groups at start of trial?

79
Do I believe the results? Are the results valid?
(RCTs)

Aside from the experimental intervention, were
the two groups treated equally?

80
Assessing validity: some words of caution!

Remember that it is very easy to criticise a
paper, but it is not always as easy to carry out
the research in the first place
It is very difficult to write the perfect paper
(someone, somewhere will always find something
wrong with it)
You have to decide if, putting all your
criticisms together, they are enough to make you
seriously doubt the validity of the findings

81
The main questions when appraising research
Do I believe the results?
YES
Are the results important or new?
82
Are the results important? What are the results?
(RCTs)

How large was the treatment effect?

83
Are the results important? What are the results?
(RCTs)

How large was the treatment effect?

- Are the results clinically significant?
- Relative risk?
- Difference in risks?
- Number needed to treat? (an estimate of the number of patients who need to receive the treatment in order to prevent one bad event)
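
These three measures can be calculated from the number of bad events in each arm. The sketch below uses invented counts (30 events among 100 control patients, 20 among 100 treated patients) and exact fractions to avoid rounding problems; the function name is an assumption for the example.

```python
import math
from fractions import Fraction


def effect_measures(events_control, n_control, events_treated, n_treated):
    """Return relative risk, risk difference and number needed to treat."""
    risk_control = Fraction(events_control, n_control)
    risk_treated = Fraction(events_treated, n_treated)
    relative_risk = risk_treated / risk_control
    risk_difference = risk_control - risk_treated  # absolute risk reduction
    # NNT is only meaningful when the treatment reduces risk
    nnt = math.ceil(1 / risk_difference) if risk_difference > 0 else None
    return relative_risk, risk_difference, nnt


# Hypothetical trial: 30/100 events on control, 20/100 on treatment
relative_risk, risk_difference, nnt = effect_measures(30, 100, 20, 100)
print(f"Relative risk: {float(relative_risk):.2f}")
print(f"Difference in risks: {float(risk_difference):.2f}")
print(f"Number needed to treat: {nnt}")
```

In this invented example the relative risk is 0.67, the difference in risks is 10 percentage points, and 10 patients would need to be treated to prevent one bad event.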
84
Are the results important? What are the results?
(RCTs)

How precise was the treatment effect?

85
Are the results important? What are the results?
(RCTs)

How precise was the treatment effect?

- How wide was the confidence interval?
- What interpretation do you make of the confidence interval?
86
The main questions when appraising research
Do I believe the results?
YES
Are the results important or new?
YES
Are the results applicable to me or to other
people in a similar situation?
87
Are the results applicable to me? (RCTs)

Were patients in the trial similar to my own
situation?
Did the authors consider all clinically important
outcomes?
Are the likely benefits of the new treatment
worth the potential harms/costs?

88
Critically appraising cohort studies
General points are the same as for RCTs, although
clearly the issues of randomisation, blinding
etc. are not appropriate
89
Critically appraising cohort studies
How representative is the cohort?
Who is included in the cohort?
Who is excluded?
Are there any differences between those included and excluded that could limit the generalisability of the findings?
90
Critically appraising cohort studies
Follow-up
How is this maintained?
How many patients are lost to follow-up?
How are these patients dealt with in the analysis?
91
Critically appraising cohort studies
Temporal changes
If the authors are considering changes over time (or some treatment whose use may change over time) then has anything else changed which could explain the findings?
92
Critically appraising cohort studies
Possible bias
Have all other factors that could explain any differences been considered?
93
Group work
