Design and Analysis of Cluster Randomization Trials in Health Research - PowerPoint PPT Presentation

1
Design and Analysis of Cluster Randomization
Trials in Health Research
  • Allan Donner, Ph.D., Professor and Chair
  • Department of Epidemiology & Biostatistics
  • The University of Western Ontario, London,
    Ontario, Canada
  • Neil Klar, Ph.D., Senior Biostatistician
  • Division of Preventive Oncology
  • Cancer Care Ontario, Toronto, Ontario, Canada

2
  • Dr. Allan Donner

3
  • Dr. Neil Klar

4
Learning Objectives
  • To distinguish experimental trials based on the
    unit of randomization (e.g. individual, family,
    community).
  • To appreciate the consequences of cluster
    randomization on sample size estimation and data
    analysis.
  • To identify key features of a cluster
    randomization trial which need to be included in
    reports.

5
What Are Cluster Randomization Trials?
  • Cluster randomization trials are experiments in
    which clusters of individuals rather than
    independent individuals are randomly allocated to
    intervention groups.

6
Example 1. Study Purpose: To evaluate the
effectiveness of Vitamin A supplements on
childhood mortality.
  • 450 villages in Indonesia were randomly assigned
    to either participate in a Vitamin A
    supplementation scheme, or serve as a control.
  • One year mortality rates were compared in the two
    groups.
  • Sommer et al. (Lancet, 1986)

7
Example 2. Study Purpose: To promote
smoking cessation using community resources.
  • 11 paired communities were selected and one
    member of each pair was randomly assigned to the
    intervention group.
  • 5-year smoking cessation rates were compared in
    the two groups.
  • Communities were matched on demographic
    characteristics (e.g. size, population density)
    and geographical proximity.
  • COMMIT Research Group (Am J Public Health,
    1995)

8
Example 3. Study Purpose: To evaluate the
effectiveness of treated nasal tissues versus
standard tissues.
  • 90 families were selected and randomized to one
    of the two intervention groups separately in
    each of three family size strata (2, 3, or 4
    members per family).
  • 24-week incidence of respiratory illness was
    compared in the two groups.
  • Farr et al. (Am. J. Epid., 1988)

9
Reasons for Adopting Cluster Randomization
  • Administrative convenience
  • To obtain cooperation of investigators
  • Ethical considerations
  • To enhance subject compliance
  • To avoid treatment group contamination
  • Intervention is naturally applied at the cluster
    level

10
Unit of Randomization vs. Unit of Analysis
  • A key property of cluster randomization trials is
    that inferences are frequently intended to apply
    at the individual level while randomization is at
    the cluster or group level. Thus the unit of
    randomization may be different from the unit of
    analysis.
  • In this case, the lack of independence among
    individuals in the same cluster, i.e.
    between-cluster variation, creates special
    methodologic challenges in both design and
    analysis.

11
Implications of Between-Cluster Variation
  • Presence of between-cluster variation implies:
  • (i) Reduction in effective sample size. The
    extent depends on the degree of within-cluster
    correlation and on average cluster size.
  • (ii) Standard approaches for sample size
    estimation and statistical analysis do not apply.
    Application of standard sample size approaches
    leads to an underpowered study. Application of
    standard statistical methods generally tends to
    bias p-values downwards, i.e. could lead to
    spurious statistical significance.

12
Possible Reasons for Between-Cluster Variation
  • 1. Subjects frequently select the clusters to
    which they belong, e.g. patient characteristics
    could be related to age or sex differences among
    physicians.
  • 2. Important covariates at the cluster level
    affect all individuals within the cluster in the
    same manner, e.g. differences in temperature
    between nurseries may be related to infection
    rates.
  • 3. Individuals within clusters frequently
    interact and, as a result, may respond similarly,
    e.g. education strategies or therapies provided
    in a group setting.
  • 4. Tendency of infectious diseases to spread
    more rapidly within than among families or
    communities.

13
Quantifying the Effect of Clustering
  • Consider a trial in which $k$ clusters of size $m$
    are randomly assigned to each of an experimental
    and control group. Also assume the response
    variable $Y$ is normally distributed with common
    variance $\sigma^2$.
  • The aim is to test $H_0: \mu_1 = \mu_2$. Then
    appropriate estimates of $\mu_1$ and $\mu_2$ are
    given by $\bar{Y}_1$, $\bar{Y}_2$, the usual
    sample means.
  • Also $V(\bar{Y}_i) = (\sigma^2 / km)\,[1 + (m - 1)\rho]$, $i = 1, 2$,
  • where $\rho$ is the coefficient of intracluster
    correlation.
  • Then $IF = 1 + (m - 1)\rho$ is the variance
    inflation factor or design effect associated
    with cluster randomization.
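
As a rough illustration of the variance inflation factor defined on this slide, the following Python sketch computes it together with the implied effective sample size per group. The function names and the example design (10 clusters of 50 subjects, intracluster correlation 0.02) are assumptions made for illustration and do not come from the presentation.

```python
def inflation_factor(m, rho):
    """Variance inflation factor (design effect): 1 + (m - 1) * rho."""
    return 1.0 + (m - 1) * rho


def effective_sample_size(k, m, rho):
    """Independent subjects carrying the same information as k clusters of size m."""
    return k * m / inflation_factor(m, rho)


# Hypothetical design: 10 clusters of 50 subjects per group, rho = 0.02.
design_effect = inflation_factor(50, 0.02)
n_eff = effective_sample_size(10, 50, 0.02)
print(f"IF = {design_effect:.2f}, effective n per group = {n_eff:.1f} of {10 * 50}")
```

Even a small intracluster correlation roughly halves the effective sample size here, because the inflation grows with the cluster size.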

14
Variance Component Interpretation of $\rho$
  • The overall response variance $\sigma^2$ may be
    expressed as the sum of two components, i.e.,
  • $\sigma^2 = \sigma^2_A + \sigma^2_W$,
  • where
  • $\sigma^2_A$ = between-cluster component of variance
  • $\sigma^2_W$ = within-cluster component of variance
  • then
  • $\rho = \sigma^2_A / (\sigma^2_A + \sigma^2_W)$

15
Sample Size Requirements for Completely
Randomized Designs
  • Comparison of Means
  • Suppose k clusters of size m are to be
    assigned to each of two intervention groups.
  • Then the number of subjects required per
    intervention group to test $H_0: \mu_1 = \mu_2$
    is given by
  • $n = (Z_{\alpha/2} + Z_\beta)^2 (2\sigma^2)\,[1 + (m - 1)\rho] / (\mu_1 - \mu_2)^2$
  • where $\sigma^2 = \sigma^2_A + \sigma^2_W$.
  • Equivalently, the number of required clusters is
    given by $k = n/m$.

16
Example
  • Hsieh (1988) reported on the results of a pilot
    study for a planned 5-year trial examining
    cardiovascular risk factors, obtaining
    cholesterol levels from 754 individuals in 4
    worksites.
  • Estimated variance components were
  • $S^2_W = 2209$, $S^2_A = 93$.
  • The value of $\rho$ is assessed as
    $\hat{\rho} = 93 / (93 + 2209) = 0.04$.
  • Assuming 70 subjects/worksite,
  • $IF = 1 + (70 - 1)(0.04) = 3.76$
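
The arithmetic on this slide can be checked with a few lines of Python. The variance components and the cluster size of 70 are taken from the slide; the variable names are illustrative. The sketch also shows, as a side observation, how much the inflation factor moves if the intracluster correlation is not rounded to two decimals first.

```python
s2_within = 2209.0   # within-cluster variance component reported on the slide
s2_among = 93.0      # between-cluster variance component reported on the slide
m = 70               # assumed number of subjects per worksite

rho_hat = s2_among / (s2_among + s2_within)
if_rounded = 1 + (m - 1) * round(rho_hat, 2)   # slide arithmetic, rho rounded to 0.04
if_unrounded = 1 + (m - 1) * rho_hat           # same formula without rounding rho

print(f"rho = {rho_hat:.4f}")
print(f"IF with rho rounded to 0.04: {if_rounded:.2f}")
print(f"IF with unrounded rho:       {if_unrounded:.2f}")
```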

17
  • To obtain 80% power at $\alpha = .05$ (2-sided)
    for detecting a mean difference of 20 mg/dl
    between intervention groups, the number of
    required worksites per group is given by
  • $k = (1.96 + 0.84)^2 \, 2(2302)(3.76) / [70\,(20)^2] = 4.8 \approx 5$
  • To adjust for the use of normal distribution
    critical values, and possible loss to follow-up,
    one might enroll 7 clusters per group.

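
The sample size formula for comparing two means can be sketched as a small Python function and applied to the worksite example. The function name and its parameter names are assumptions; the inputs (total variance 2302, intracluster correlation 0.04, 70 subjects per worksite, difference of 20 mg/dl) are the slide's values. Critical values come from the standard library's `statistics.NormalDist` rather than the rounded 1.96 and 0.84.

```python
from statistics import NormalDist


def subjects_per_group(sigma2, rho, m, delta, alpha=0.05, power=0.80):
    """Subjects per group for comparing two means under cluster randomization."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    inflation = 1 + (m - 1) * rho                   # variance inflation factor
    return (z_alpha + z_beta) ** 2 * 2 * sigma2 * inflation / delta ** 2


n = subjects_per_group(sigma2=2302, rho=0.04, m=70, delta=20)
k = n / 70   # clusters (worksites) per group
print(f"n per group = {n:.0f}, worksites per group = {k:.2f}")
```

Rounding the cluster count up gives 5 worksites per group before any allowance for loss to follow-up.
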
18
Impact on Power of Increasing the Number of
Clusters vs. Increasing Cluster Size
  • Let $d$ = mean difference between intervention
    groups; then
  • $\mathrm{Var}(d) = (2\sigma^2 / km)\,[1 + (m - 1)\rho]$
  • As the number of clusters $k \to \infty$,
    $\mathrm{Var}(d) \to 0$, but
  • as the cluster size $m \to \infty$,
    $\mathrm{Var}(d) \to (2\sigma^2 \rho) / k = 2\sigma^2_A / k$.
  • Trials randomizing clusters of between 30 and 50
    individuals will tend to have almost the same
    statistical power as trials randomizing the same
    number of much larger units. But clusters of
    larger size are often recruited for very
    practical reasons (to reduce contamination, to
    avoid logistic or ethical problems, etc.).
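
The contrast between adding clusters and enlarging clusters can be seen numerically. This is a minimal sketch with hypothetical values (unit total variance and an intracluster correlation of 0.05); only the variance formula itself comes from the slide.

```python
def var_difference(sigma2, rho, k, m):
    """Var(d) = (2 * sigma2 / (k * m)) * (1 + (m - 1) * rho)."""
    return 2 * sigma2 / (k * m) * (1 + (m - 1) * rho)


sigma2, rho = 1.0, 0.05            # hypothetical values chosen for illustration
floor = 2 * sigma2 * rho / 10      # limiting variance for k = 10 as m grows

# Growing the cluster size with k fixed: the variance levels off at the floor.
for m in (10, 50, 500, 5000):
    print(f"k=10,   m={m:>4}: Var(d) = {var_difference(sigma2, rho, 10, m):.5f}")

# Growing the number of clusters with m fixed: the variance keeps shrinking.
for k in (10, 50, 500, 5000):
    print(f"k={k:>4}, m=50:   Var(d) = {var_difference(sigma2, rho, k, 50):.5f}")

print(f"floor for k=10: {floor:.5f}")
```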

19
Factors Influencing Loss of Precision
  • 1. Interventions often applied on a group basis
    with little or no attention given to individual
    study participants.
  • 2. Some studies permit the immigration of new
    subjects after baseline.
  • 3. Entire clusters, rather than just
    individuals, may be lost to follow-up.
  • 4. Over-optimistic expectations regarding
    effect size.

20
Strategies for Improving Precision in Cluster
Randomization Trials
  • 1. Establish cluster-level eligibility criteria
    so as to reduce between-cluster variability
    e.g., geographical restrictions.
  • 2. Consider increasing the number of clusters
    randomized, even if only in the control group.
  • 3. Consider matching or stratifying in the design
    on baseline variables having prognostic
    importance.
  • 4. Obtain baseline measurements on other
    potentially important prognostic variables.
  • 5. Take repeated assessments over time from the
    same clusters or from different clusters of
    subjects.
  • 6. Develop a detailed protocol for ensuring
    compliance and minimizing loss to follow-up.

21
The Importance of Cluster Level Replication
  • Some investigators have designed community
    intervention trials in which exactly one cluster
    has been assigned to each intervention group.
  • Such trials invariably result in interpretational
    difficulties caused by the total confounding of
    two sources of variation:
  • the variation in response due to the effect of
    intervention, and
  • the natural variation that exists between the two
    communities (clusters) even in the absence of an
    intervention effect.
  • Analysis is only possible under the untenable
    assumption that there is no clustering of
    individuals' responses within communities.
  • More attention to the effects of clustering when
    determining sample size might help to eliminate
    designs which lack replication.

22
Analysis of Binary Outcomes
  • Objective
  • To assess the effects of interventions on
    individuals when clusters are the sampling unit.
  • Statistical Issue
  • Responses on individuals within the same
    cluster tend to be positively correlated,
    violating the assumption of independence required
    for the application of standard statistical
    methods.

23
  • Example
  • Data obtained from a study evaluating the effect
    of school-based interventions in reducing
    adolescent tobacco use.
  • 12 school units were randomly assigned to each of
    four conditions, including three intervention
    conditions and a control condition (existing
    curriculum).
  • We compare here the effect of the SFG (Smoke Free
    Generation) intervention to the existing
    curriculum (EC) with respect to reducing the
    proportion of children who report using smokeless
    tobacco after four years of follow-up.

24
Data
Overall rates of tobacco use:
  • Control: 91/1479 = .062
  • Experimental: 58/1341 = .043

25
Procedures Reviewed
  • 1. Standard chi-squared test
  • 2. Two-sample t-test on school-specific event
    rates
  • 3. Direct adjustment of standard chi-square test
  • 4. Generalized estimating equations approach

26
Hypothesis to be tested
27
Two-sample T-test Comparing Average Values of the
Event Rates
  • Pooled variance $= (.035^2 + .026^2) / 2 = .00095$
  • $t_{22} = (.060 - .039) / \sqrt{.00095\,(1/12 + 1/12)} = 1.66$ ($p = .11$)

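
The cluster-level t-test can be reproduced from the summary figures on this slide. In the sketch below the two means and two standard deviations are the slide's rounded values; the variable names are illustrative, and which standard deviation belongs to which group is not stated in the transcript (it does not affect the result, since the group sizes are equal). With these rounded inputs the statistic comes out at about 1.67, so the slide's 1.66 presumably reflects unrounded school-level rates.

```python
from math import sqrt

mean_1, mean_2 = 0.060, 0.039    # mean school-level event rates from the slide
sd_a, sd_b = 0.035, 0.026        # school-level standard deviations from the slide
schools_per_group = 12

# Equal group sizes, so the pooled variance is the simple average of the two variances.
pooled_var = (sd_a ** 2 + sd_b ** 2) / 2
std_error = sqrt(pooled_var * (1 / schools_per_group + 1 / schools_per_group))
t_stat = (mean_1 - mean_2) / std_error
df = 2 * schools_per_group - 2

print(f"pooled variance = {pooled_var:.5f}, t({df}) = {t_stat:.3f}")
```
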
28
Adjusted Chi-square Test
  • An adjustment, which depends on computing
    clustering correction factors for each group, is
    given by
  • $C_1 = \sum_j m_{1j}\,[1 + (m_{1j} - 1)\hat{\rho}] / \sum_j m_{1j}$,
  • $C_2 = \sum_j m_{2j}\,[1 + (m_{2j} - 1)\hat{\rho}] / \sum_j m_{2j}$,
  • where $m_{1j}$, $m_{2j}$ are the cluster sizes
    within groups 1 and 2, respectively, and
    $\hat{\rho}$ is an estimate of within-cluster
    correlation.
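
A clustering correction factor of this form is a size-weighted average of the per-cluster inflation factors. The sketch below assumes a hypothetical list of cluster sizes, since the presentation does not list the school sizes; the function name is likewise illustrative.

```python
def clustering_correction(cluster_sizes, rho):
    """C = sum(m_j * (1 + (m_j - 1) * rho)) / sum(m_j) for one intervention group."""
    weighted = sum(m * (1 + (m - 1) * rho) for m in cluster_sizes)
    return weighted / sum(cluster_sizes)


# Hypothetical cluster sizes for one group (not taken from the presentation).
sizes = [80, 95, 110, 120, 130, 150]
print(f"C with rho = 0.011: {clustering_correction(sizes, 0.011):.3f}")
print(f"C with rho = 0:     {clustering_correction(sizes, 0.0):.3f}")
```

When all clusters have the same size, the factor equals the usual inflation factor for that size.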

29
For Data in the Example
  • $\hat{\rho} = .011$, $C_1 = 2.599$, $C_2 = 2.536$
  • Then the adjusted one degree of freedom
    chi-square statistic is given by
  • $\chi^2_A = M_1 (P_1 - P)^2 / [C_1 P (1 - P)] + M_2 (P_2 - P)^2 / [C_2 P (1 - P)]$
  • $= 1479\,(.062 - .053)^2 / [2.599\,(.053)(1 - .053)] + 1341\,(.043 - .053)^2 / [2.536\,(.053)(1 - .053)] = 1.83$ ($p = .18$)
  • Comments
  • 1. Reduces to standard Pearson chi-square test
    when $\hat{\rho} = 0$.
  • 2. Imposes no distributional assumptions on the
    responses within a cluster.
  • 3. Assumes the $C_i$, $i = 1, 2$, are not
    significantly different, i.e. estimate the same
    population design effect.
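
The adjusted statistic can be recomputed from the group counts and the two correction factors reported on this slide. The sketch below is one way to write it; the function name is an assumption. It uses unrounded proportions, which reproduces the reported 1.83, and it also evaluates the statistic with both correction factors set to 1, which is the unadjusted Pearson chi-square mentioned in comment 1.

```python
def adjusted_chi_square(events, totals, corrections):
    """Two-group chi-square with each group's term divided by its correction factor."""
    p_pooled = sum(events) / sum(totals)
    stat = 0.0
    for x, n, c in zip(events, totals, corrections):
        p_group = x / n
        stat += n * (p_group - p_pooled) ** 2 / (c * p_pooled * (1 - p_pooled))
    return stat


events = [91, 58]        # control, experimental: users of smokeless tobacco
totals = [1479, 1341]    # control, experimental: group sizes

adjusted = adjusted_chi_square(events, totals, [2.599, 2.536])   # C1, C2 from the slide
unadjusted = adjusted_chi_square(events, totals, [1.0, 1.0])     # no clustering correction
print(f"adjusted = {adjusted:.2f}, unadjusted = {unadjusted:.2f}")
```

The unadjusted value is more than twice the adjusted one, which illustrates how ignoring clustering can overstate the evidence for an effect.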

30
Generalized Estimating Equations Approach
  • The generalized estimating equations (GEE)
    approach, developed by Liang and Zeger (1986),
    can be used to construct an extension of standard
    logistic regression which adjusts for the effect
    of clustering, and does not require parametric
    assumptions.
  • Consider the logistic-regression model given by
  • $\log[P / (1 - P)] = \beta_0 + \beta_1 X$
  • where $X = 1$ if experimental
  • and $X = 0$ if control.
  • The odds ratio for the effect of intervention is
    given by $\exp(\beta_1)$.
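
For a logistic model with a single binary covariate, the coefficients fitted under an independence assumption can be read directly off the 2x2 table, which makes the odds-ratio interpretation concrete. The sketch below uses the overall counts from the tobacco example. It gives point estimates only and ignores clustering, so it is not the GEE fit itself; an exchangeable working correlation may give a somewhat different estimate when school sizes vary.

```python
from math import exp, log

events = {"control": 91, "experimental": 58}
totals = {"control": 1479, "experimental": 1341}


def log_odds(x, n):
    """Log odds of the observed proportion x / n."""
    return log(x / (n - x))


# Intercept: log odds in the control group (covariate equal to 0).
intercept = log_odds(events["control"], totals["control"])
# Slope: difference in log odds between experimental and control groups.
slope = log_odds(events["experimental"], totals["experimental"]) - intercept

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, odds ratio = {exp(slope):.2f}")
```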

31
Result
  • The resulting robust one degree of freedom
    chi-square statistic, constructed using a working
    exchangeable correlation matrix, is given by
  • $\chi^2_{LZ} = 2.62$ ($p = 0.11$).
  • Comments
  • 1) The GEE extensions of logistic regression
    were fit using the statistical package SAS. These
    procedures are also available in several other
    statistical packages (e.g., STATA, SUDAAN).
  • 2) Robust statistical inferences are valid so
    long as there are large numbers of clusters, even
    if the working correlation is not correctly
    specified.
  • 3) Can be extended to model dependence on
    individual level covariates.

32
Summary of Results
33
Reporting Cluster Randomization Trials
  • Reporting of study design
  • Justify the use of cluster randomization.
  • Provide a clear definition of the unit of
    randomization.
  • Explain how the chosen sample size or statistical
    power calculations account for between-cluster
    variation.

34
Reporting of study results
  • Provide the number of clusters randomized, the
    average cluster size and the number of subjects
    selected for study from each cluster.
  • Provide the values of the intracluster
    correlation coefficient as calculated for the
    primary outcome variables.
  • Explain how the reported statistical analyses
    account for between-cluster variations.

35
References and Links
  • References
  • 1. Donner A, Klar N. Design and Analysis of
    Cluster Randomization Trials in Health Research.
    Arnold, London, 2000.
  • 2. Kerry SM, Bland JM. Statistics notes: Sample
    size in cluster randomisation. BMJ 1998; 316:
    549.

36
  • 3. Campbell MK, Grimshaw JM. Cluster randomised
    trials: time for improvement. BMJ 1998; 317:
    1171-1172.
  • 4. Edwards SJL, Braunholtz DA, Lilford RJ,
    Stevens AJ. Ethical issues in the design and
    conduct of cluster randomised controlled trials.
    BMJ 1999; 318: 1407-1409.

37
  • Links
  • The last three articles are available at
    http://www.bmj.com/
  • Arnold Publishers:
    http://www.arnoldpublishers.com/