1
Design and Analysis of Cluster Randomization
Trials in Health Research

Allan Donner, Ph.D., Professor and Chair
Department of Epidemiology & Biostatistics
The University of Western Ontario, London,
Ontario, Canada
Neil Klar, Ph.D., Senior Biostatistician
Division of Preventive Oncology
Cancer Care Ontario, Toronto, Ontario, Canada

Dr. Allan Donner

Dr. Neil Klar

4
Learning Objectives

To distinguish experimental trials based on the
unit of randomization (e.g. individual, family,
community).
To appreciate the consequences of cluster
randomization on sample size estimation and data
analysis.
To identify key features of a cluster
randomization trial which need to be included in
reports.

5
What Are Cluster Randomization Trials?

Cluster randomization trials are experiments in
which clusters of individuals rather than
independent individuals are randomly allocated to
intervention groups.

6
Example 1
Study Purpose: To evaluate the effectiveness of Vitamin A
supplements on childhood mortality.

450 villages in Indonesia were randomly assigned
to either participate in a Vitamin A
supplementation scheme, or serve as a control.
One year mortality rates were compared in the two
groups.
Sommer et al.
(Lancet, 1986)

7
Example 2
Study Purpose: To promote smoking cessation using
community resources.

11 paired communities were selected and one
member of each pair was randomly assigned to the
intervention group.
5-year smoking cessation rates were compared in
the two groups.
Communities were matched on demographic
characteristics
(e.g. size, population density) and
geographical proximity.
COMMIT Research Group (Am J Public Health,
1995)

8
Example 3
Study Purpose: To evaluate the effectiveness of treated
nasal tissues versus standard tissues.

90 families were selected and randomized to one
of the two intervention groups separately in
each of three family size strata (2, 3, or 4
members per family).
The 24-week incidence of respiratory illness was
compared in the two groups.
Farr et al. (Am. J. Epid., 1988)

9
Reasons for Adopting Cluster Randomization

Administrative convenience
To obtain cooperation of investigators
Ethical considerations
To enhance subject compliance
To avoid treatment group contamination
Intervention is naturally applied at the cluster
level

10
Unit of Randomization vs. Unit of Analysis

A key property of cluster randomization trials is
that inferences are frequently intended to apply
at the individual level while randomization is at
the cluster or group level. Thus the unit of
randomization may be different from the unit of
analysis.
In this case, the lack of independence among
individuals in the same cluster, i.e.
between-cluster variation, creates special
methodologic challenges in both design and analysis.

11
Implications of Between-Cluster Variation

Presence of between-cluster variation implies
(i) Reduction in effective sample size.
Extent depends on degree of within-cluster
correlation and on average cluster size.
(ii) Standard approaches for sample size
estimation and statistical analysis do not apply.
Application of standard sample size approaches
leads to an underpowered study.
Application of standard statistical methods
generally tends to bias p-values downwards, i.e.
could lead to spurious statistical significance.

12
Possible Reasons for Between-Cluster
Variation

1. Subjects frequently select the clusters to
which they belong,
e.g., patient characteristics could be
related to age or sex
differences among physicians.
2. Important covariates at the cluster level
affect all individuals within the cluster in the
same manner,
e.g., differences in temperature
between nurseries may be related to infection
rates.
3. Individuals within clusters frequently
interact and, as a result, may respond similarly,
e.g., education strategies or therapies provided
in a group setting.
4. Tendency of infectious diseases to spread
more rapidly within than among families or
communities.

13
Quantifying the Effect of Clustering

Consider a trial in which k clusters of size m
are randomly assigned to each of an experimental
and a control group. Also assume the response
variable Y is normally distributed with common
variance $\sigma^2$.
The aim is to test $H_0: \mu_1 = \mu_2$. Appropriate
estimates of $\mu_1$ and $\mu_2$ are given by $\bar{Y}_1$, $\bar{Y}_2$, the
usual sample means.
Also $V(\bar{Y}_i) = (\sigma^2 / km)\,[1 + (m - 1)\rho]$, $i = 1, 2$,
where $\rho$ is the coefficient of intracluster
correlation.
Then $IF = 1 + (m - 1)\rho$ is the variance
inflation factor, or design effect, associated
with cluster randomization.
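
As a small numerical sketch of the variance inflation factor 1 + (m - 1) x ICC described above: the function names and the example values (10 clusters of 50 subjects, intracluster correlation 0.02) are illustrative assumptions, not figures from the presentation.

```python
def design_effect(m: float, rho: float) -> float:
    """Variance inflation factor IF = 1 + (m - 1) * rho for clusters of size m."""
    return 1.0 + (m - 1.0) * rho


def variance_of_group_mean(sigma2: float, k: int, m: float, rho: float) -> float:
    """Variance of a group mean based on k clusters of size m."""
    return sigma2 / (k * m) * design_effect(m, rho)


def effective_sample_size(k: int, m: float, rho: float) -> float:
    """Number of independent subjects carrying the same information as k*m clustered ones."""
    return k * m / design_effect(m, rho)


# Hypothetical inputs chosen only for illustration.
if_value = design_effect(m=50, rho=0.02)
n_eff = effective_sample_size(k=10, m=50, rho=0.02)
print(round(if_value, 2), round(n_eff, 1))
```

Even a small intracluster correlation roughly halves the effective sample size here, which is why the inflation factor has to enter the sample size calculation.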

14
Variance Component Interpretation of $\rho$

The overall response variance $\sigma^2$ may be
expressed as the sum of two components, i.e.,
$\sigma^2 = \sigma^2_A + \sigma^2_W$,
where
$\sigma^2_A$ = between-cluster component of variance
$\sigma^2_W$ = within-cluster component of variance
then
$\rho = \sigma^2_A / (\sigma^2_A + \sigma^2_W)$

15
Sample Size Requirements for Completely
Randomized Designs

Comparison of Means
Suppose k clusters of size m are to be
assigned to each of two intervention groups.
Then the number of subjects required per
intervention group to test $H_0: \mu_1 = \mu_2$ is given by
$n = (Z_{\alpha/2} + Z_\beta)^2 \,(2\sigma^2)\,[1 + (m - 1)\rho] / (\mu_1 - \mu_2)^2$
where $\sigma^2 = \sigma^2_A + \sigma^2_W$.
Equivalently, the number of required clusters is
given by $k = n/m$.

16
Example

Hsieh (1988) reported on the results of a pilot
study for a planned 5-year trial examining
cardiovascular risk factors, obtaining
cholesterol levels from 754 individuals in 4
worksites.
Estimated variance components were
$S^2_W = 2209$, $S^2_A = 93$.
The value of $\rho$ is therefore estimated as
$\hat{\rho} = 93 / (93 + 2209) = 0.04$.
Assuming 70 subjects/worksite,
$IF = 1 + (70 - 1)(0.04) = 3.76$.

17
To obtain 80% power at $\alpha = .05$ (2-sided) for
detecting a mean difference of 20 mg/dl between
intervention groups, the number of required
worksites per group is given by
$k = (1.96 + 0.84)^2 \,[2(2302)]\,(3.76) / [70\,(20)^2] = 4.8 \approx 5$.
To adjust for the use of normal distribution
critical values, and for possible loss to follow-up,
one might enroll 7 clusters per group.
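
The worksite calculation can be retraced in a few lines. This is a sketch under the slide's own inputs (variance components 93 and 2209, 70 subjects per worksite, a 20 mg/dl difference, normal critical values 1.96 and 0.84); the function names are illustrative, and the intracluster correlation is rounded to 0.04 as on the slide.

```python
import math


def icc(var_between: float, var_within: float) -> float:
    """Intracluster correlation as the between-cluster share of total variance."""
    return var_between / (var_between + var_within)


def clusters_per_group(z_alpha: float, z_beta: float, sigma2: float,
                       m: float, rho: float, delta: float) -> float:
    """Clusters of size m needed per group to detect a mean difference delta."""
    inflation = 1.0 + (m - 1.0) * rho
    n = (z_alpha + z_beta) ** 2 * 2.0 * sigma2 * inflation / delta ** 2
    return n / m


s2_between, s2_within = 93.0, 2209.0
rho_hat = round(icc(s2_between, s2_within), 2)   # rounded to 0.04, as on the slide
k = clusters_per_group(1.96, 0.84, s2_between + s2_within, 70, rho_hat, 20.0)
print(rho_hat, round(k, 1), math.ceil(k))
```

Rounding k up gives 5 worksites per group before any allowance for small-sample critical values or loss to follow-up.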
18
Impact on Power of Increasing the Number of
Clusters vs. Increasing Cluster Size

Let d = mean difference between intervention
groups; then
$Var(d) = (2\sigma^2 / km)\,[1 + (m - 1)\rho]$.
As the number of clusters $k \to \infty$, $Var(d) \to 0$,
but
as the cluster size $m \to \infty$, $Var(d) \to 2\sigma^2\rho / k = 2\sigma^2_A / k$.
Trials randomizing clusters of between 30 and 50 individuals
will tend to have almost the same statistical
power as trials randomizing the same number of
much larger units. But clusters of larger size
are often recruited for very practical reasons
(to reduce contamination, to avoid logistic or
ethical problems, etc.).
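
A short numerical sketch of the two limits: with the number of clusters fixed, the variance of the difference levels off at 2 x (between-cluster variance) / k as cluster size grows, whereas adding clusters drives it toward zero. The total variance of 1.0 and intracluster correlation of 0.05 are hypothetical values chosen only for illustration.

```python
def var_difference(sigma2: float, k: int, m: int, rho: float) -> float:
    """Variance of the difference in group means: k clusters of size m per group."""
    return 2.0 * sigma2 / (k * m) * (1.0 + (m - 1.0) * rho)


sigma2, rho = 1.0, 0.05                  # hypothetical values
floor_for_k10 = 2.0 * sigma2 * rho / 10  # limit as m grows with k = 10

# Growing the clusters: the variance approaches the floor but never falls below it.
for m in (10, 50, 1000):
    print("k=10, m=%d:" % m, round(var_difference(sigma2, 10, m, rho), 5))

# Adding clusters: the variance keeps shrinking in proportion to 1/k.
for k in (10, 100, 1000):
    print("k=%d, m=50:" % k, round(var_difference(sigma2, k, 50, rho), 6))
```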

19
Factors Influencing Loss of Precision

1. Interventions often applied on a group basis
with little or no attention given to individual
study participants.
2. Some studies permit the immigration of new
subjects after baseline.
3. Entire clusters, rather than just
individuals, may be lost to follow-up.
4. Over-optimistic expectations regarding
effect size.

20
Strategies for Improving Precision in Cluster
Randomization Trials

1. Establish cluster-level eligibility criteria
so as to reduce between-cluster variability
e.g., geographical restrictions.
2. Consider increasing the number of clusters
randomized, even if only in the control group.
3. Consider matching or stratifying in the design
by baseline variables having prognostic
importance.
4. Obtain baseline measurements on other
potentially important prognostic variables.
5. Take repeated assessments over time from the
same clusters or from different clusters of
subjects.
6. Develop a detailed protocol for ensuring
compliance and minimizing loss to follow-up.

21
The Importance of Cluster Level Replication

Some investigators have designed community
intervention trials in which exactly one cluster
has been assigned to each intervention group.
Such trials invariably result in interpretational
difficulties caused by the total confounding of
two sources of variation:
the variation in response due to the effect of
intervention, and
the natural variation that exists between the two
communities (clusters) even in the absence of an
intervention effect.
Analysis is only possible under the untenable
assumption that there is no clustering of
individuals' responses within communities.
More attention to the effects of clustering when
determining sample size might help to eliminate
designs which lack replication.

22
Analysis of Binary Outcomes

Objective
To assess the effects of interventions on
individuals when clusters are the sampling unit.
Statistical Issue
Responses on individuals within the same
cluster tend to be positively correlated,
violating the assumption of independence required
for the application of standard statistical
methods.

Example
Data obtained from a study evaluating the effect
of school-based interventions in reducing
adolescent tobacco use.
12 school units were randomly assigned to each of
four conditions, including three intervention
conditions and a control condition (existing
curriculum).
We compare here the effect of the SFG (Smoke Free
Generation) intervention to the existing
curriculum (EC) with respect to reducing the
proportion of children who report using smokeless
tobacco after four years of follow-up.

24
Data
Overall rates of tobacco use:
Control: 91/1479 = .062
Experimental: 58/1341 = .043
25
Procedures Reviewed

1. Standard chi-square test
2. Two-sample t-test on school-specific event rates
3. Direct adjustment of standard chi-square test
4. Generalized estimating equations approach

26
Hypothesis to be tested
27
Two-sample t-test Comparing Average Values of the
Event Rates
Pooled variance $= (.035^2 + .026^2) / 2 = .00095$
$t_{22} = (.060 - .039) / \sqrt{.00095\,(1/12 + 1/12)} = 1.66$ ($p = .11$)
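
The cluster-level t-test can be sketched from the summary figures quoted on the slide (mean school-specific event rates .060 and .039, standard deviations .035 and .026, 12 schools per group). The function name is illustrative. Because these inputs are already rounded, the statistic comes out near 1.67 rather than exactly the 1.66 shown, which presumably used unrounded school-level data.

```python
import math


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int):
    """Two-sample pooled-variance t statistic on cluster-level summaries."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, n1 + n2 - 2


# Each school contributes one observation: its own event rate.
t_stat, df = pooled_t(0.060, 0.035, 12, 0.039, 0.026, 12)
print(round(t_stat, 2), df)
```

The degrees of freedom depend on the number of schools (24 - 2 = 22), not on the 2,820 children, which is the point of analyzing at the cluster level.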
28
Adjusted Chi-square Test

An adjustment which depends on computing
clustering correction factors for each group is
given by
$C_1 = \sum_j m_{1j}\,[1 + (m_{1j} - 1)\hat{\rho}] / \sum_j m_{1j}$,
$C_2 = \sum_j m_{2j}\,[1 + (m_{2j} - 1)\hat{\rho}] / \sum_j m_{2j}$,
where $m_{1j}$, $m_{2j}$ are the cluster sizes within
groups 1 and 2, respectively, and $\hat{\rho}$ is an
estimate of within-cluster correlation.

29
For Data in the Example

$\hat{\rho} = .011$, $C_1 = 2.599$, $C_2 = 2.536$
Then the adjusted one degree of freedom
chi-square statistic is given by
$\chi^2_A = M_1 (P_1 - P)^2 / [C_1 P (1 - P)] + M_2 (P_2 - P)^2 / [C_2 P (1 - P)]$
$= 1479\,(.062 - .053)^2 / [2.599\,(.053)(1 - .053)] + 1341\,(.043 - .053)^2 / [2.536\,(.053)(1 - .053)]$
$= 1.83$ ($p = .18$)
Comments
1. Reduces to standard Pearson chi-square test
when $\hat{\rho} = 0$.
2. Imposes no distributional assumptions on the
responses within a cluster.
3. Assumes the $C_i$, $i = 1, 2$, are not
significantly different, i.e. estimate the same
population design effect.
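
The adjusted statistic can be reproduced from the counts and correction factors quoted above (91/1479 control, 58/1341 experimental, correction factors 2.599 and 2.536). This is a sketch with illustrative function names. The school-level cluster sizes are not given in the transcript, so `clustering_correction` is shown only with made-up sizes, while the chi-square call uses the slide's own correction factors.

```python
def clustering_correction(cluster_sizes, rho):
    """C = sum_j m_j * [1 + (m_j - 1) * rho] / sum_j m_j for one group."""
    total = sum(cluster_sizes)
    return sum(m * (1.0 + (m - 1.0) * rho) for m in cluster_sizes) / total


def adjusted_chi_square(events1, total1, c1, events2, total2, c2):
    """Pearson chi-square with each group's term deflated by its correction factor."""
    p1 = events1 / total1
    p2 = events2 / total2
    p = (events1 + events2) / (total1 + total2)   # pooled event rate
    scale = p * (1.0 - p)
    return (total1 * (p1 - p) ** 2 / (c1 * scale)
            + total2 * (p2 - p) ** 2 / (c2 * scale))


# Slide values: unrounded proportions reproduce the reported statistic.
chi2_adj = adjusted_chi_square(91, 1479, 2.599, 58, 1341, 2.536)
print(round(chi2_adj, 2))  # 1.83

# Hypothetical cluster sizes, only to show how a correction factor is formed.
print(round(clustering_correction([10, 10], 0.1), 2))  # 1.9
```

Using the rounded proportions (.062, .043, .053) in place of the raw counts gives a noticeably larger value, so the raw counts are the safer input.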

30
Generalized Estimating Equations Approach

The generalized estimating equations (GEE)
approach, developed by Liang and Zeger (1986),
can be used to construct an extension of standard
logistic regression which adjusts for the effect
of clustering, and does not require parametric
assumptions.
Consider the logistic regression model given by
$\log[P / (1 - P)] = \beta_0 + \beta_1 x$,
where $x = 1$ if experimental
and $x = 0$ if control.
The odds ratio for the effect of intervention is
given by $\exp(\beta_1)$.
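
With a single binary covariate, the ordinary (independence) logistic fit has a closed form: the estimated slope is the log of the sample odds ratio. The sketch below computes that crude odds ratio from the overall counts in the example (58/1341 experimental, 91/1479 control). It illustrates only the point estimate under an independence assumption. It is not the exchangeable-correlation GEE fit reported in the presentation, and it says nothing about the robust standard error, which requires the school-level data.

```python
import math


def crude_odds_ratio(events_exp: int, total_exp: int,
                     events_ctrl: int, total_ctrl: int) -> float:
    """Sample odds ratio, experimental versus control, from a 2x2 table."""
    odds_exp = events_exp / (total_exp - events_exp)
    odds_ctrl = events_ctrl / (total_ctrl - events_ctrl)
    return odds_exp / odds_ctrl


odds_ratio = crude_odds_ratio(58, 1341, 91, 1479)
beta1 = math.log(odds_ratio)   # slope of the logistic model under independence
print(round(odds_ratio, 2), round(beta1, 2))
```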

31
Result

The resulting robust one degree of freedom
chi-square statistic, constructed using a working
exchangeable correlation matrix, is given by
$\chi^2_{LZ} = 2.62$ ($p = 0.11$).
Comments
1) The GEE extensions of logistic regression
were fit using the statistical package SAS. These
procedures are also available in several other
statistical packages (e.g., STATA, SUDAAN).
2) Robust statistical inferences are valid so
long as there are large numbers of clusters, even
if the working correlation is not correctly
specified.
3) Can be extended to model dependence on
individual level covariates.

32
Summary of Results
33
Reporting Cluster Randomization Trials

Reporting of study design
Justify the use of cluster randomization.
Provide a clear definition of the unit of
randomization.
Explain how the chosen sample size or statistical
power calculations account for between-cluster
variation.

34
Reporting of study results

Provide the number of clusters randomized, the
average cluster size and the number of subjects
selected for study from each cluster.
Provide the values of the intracluster
correlation coefficient as calculated for the
primary outcome variables.
Explain how the reported statistical analyses
account for between-cluster variations.

35
References Links

References
1. Donner A, Klar N. Design and Analysis of
Cluster Randomization Trials in Health Research.
London: Arnold; 2000.
2. Kerry SM, Bland JM. Statistics notes: Sample
size in cluster randomisation. BMJ 1998; 316: 549.
3. Campbell MK, Grimshaw JM. Cluster randomised
trials: time for improvement. BMJ 1998; 317:
1171-1172.
4. Edwards SJL, Braunholtz DA, Lilford RJ,
Stevens AJ. Ethical issues in the design and
conduct of cluster randomised controlled trials.
BMJ 1999; 318: 1407-1409.