Systematic Reviews: Methods and Procedures

George A. Wells
Editor, Cochrane Musculoskeletal Review Group
Department of Epidemiology and Community Medicine
University of Ottawa, Ottawa, Ontario, Canada

Meta-analysis

- Meta-analysis is a statistical analysis of a collection of studies
- Meta-analysis methods focus on contrasting and comparing results from different studies in anticipation of identifying consistent patterns and sources of disagreement among these results
- Primary objective
  - Synthetic goal (estimation of summary effect) vs
  - Analytic goal (estimation of differences)

- Systematic Review
  - the application of scientific strategies that limit bias to the systematic assembly, critical appraisal and synthesis of all relevant studies on a specific topic
- Meta-Analysis
  - a systematic review that employs statistical methods to combine and summarize the results of several studies

Features of narrative reviews and systematic reviews

FEATURE          NARRATIVE                               SYSTEMATIC
QUESTION         Broad                                   Focused
SOURCES/SEARCH   Usually unspecified, possibly biased    Comprehensive, explicit
SELECTION        Unspecified, biased?                    Criterion-based, uniformly applied
APPRAISAL        Variable                                Rigorous
SYNTHESIS        Usually qualitative                     Quantitative
INFERENCE        Sometimes evidence-based                Usually evidence-based

Steps of a Cochrane Systematic Review

- Clearly formulated question
- Comprehensive data search
- Unbiased selection and extraction process
- Critical appraisal of data
- Synthesis of data
- Perform sensitivity and subgroup analyses if appropriate and possible
- Prepare a structured report

- What is the study objective?
  - to validate results in a large population
  - to guide new studies
- Pose question in both biologic and health care terms, specifying with operational definitions
  - population
  - intervention
  - outcomes (both beneficial and harmful)

Inclusion Criteria

- Study design
- Population
- Interventions
- Outcomes

Step 2: Comprehensive data search

- Need a well formulated and co-ordinated effort
- Seek guidance from a librarian
- Specify language constraints
- Requirements for comprehensiveness of search depend on the field and question to be addressed
- Possible sources include
  - computerized bibliographic databases
  - review articles
  - abstracts
  - conference proceedings
  - dissertations
  - books
  - experts
  - granting agencies
  - trial registries
  - industry
  - journal handsearching

- Procedure
  - usually begin with searches of bibliographic reports (citation indexes, abstract databases)
  - publications retrieved and references therein searched for more references
  - as a step towards eliminating publication bias, need information from unpublished research
    - databases of unpublished reports
    - clinical research registries
    - clinical trial registries
    - unpublished theses
    - conference indexes

Published Reports (publication bias, i.e. tendency to publish statistically significant results)

Step 3: Unbiased selection and extraction process

Study Selection

- 2 independent reviewers select studies
- Selection of studies addressing the question posed, based on a priori specification of the population, intervention, outcomes and study design
- Level of agreement: kappa
- Differences resolved by consensus
- Specify reasons for rejecting studies
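
The level of agreement between the two reviewers can be summarized with Cohen's kappa. The sketch below is illustrative only: the function name `cohen_kappa` and the screening counts are hypothetical and do not come from the slides.

```python
def cohen_kappa(both_include, only_a, only_b, both_exclude):
    """Cohen's kappa for two reviewers making include/exclude decisions.

    Arguments are counts of studies: included by both reviewers, included
    only by reviewer A, included only by reviewer B, excluded by both.
    """
    n = both_include + only_a + only_b + both_exclude
    if n == 0:
        raise ValueError("no studies were screened")
    # Observed proportion of agreement.
    p_observed = (both_include + both_exclude) / n
    # Agreement expected by chance from each reviewer's marginal inclusion rate.
    a_rate = (both_include + only_a) / n
    b_rate = (both_include + only_b) / n
    p_chance = a_rate * b_rate + (1 - a_rate) * (1 - b_rate)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical screening of 100 citations.
kappa = cohen_kappa(both_include=20, only_a=5, only_b=10, both_exclude=65)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement (85% in this made-up example) for the agreement that would be expected by chance alone.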

Data Extraction

- 2 independent reviewers extract data using predetermined forms
  - Patient characteristics
  - Study design and methods
  - Study results
  - Methodologic quality
- Level of agreement: kappa
- Differences resolved by consensus

Data Extraction (contd)

- Be explicit, unbiased and reproducible
- Include all relevant measures of benefit and harm of the intervention
- Contact investigators of the studies for clarification of published methods etc.
- Extract individual patient data when published data do not answer questions about intention-to-treat analyses, time-to-event analyses, subgroups, dose-response relationships

Step 4: Critical appraisal of data

Description of Studies

- Size of study
- Characteristics of study patients
- Details of specific interventions used
- Details of outcomes assessed

Methodologic Quality Assessment

- Can use as
  - threshold for inclusion
  - possible explanation for heterogeneity
- Base quality assessments on extent to which bias is minimized
- Make quality assessment scoring systems transparent and parsimonious
- Evaluate reproducibility of quality assessment
- Report quality scoring system used

Quality Assessment Example

indicates that randomization was appropriate (e.g. random numbers were computer generated)

Step 5: Synthesis of data

Outcome

- Discrete (event)
  - Basic data
  - Odds Ratio (OR), Relative Risk (RR), Risk Difference (RD)
- Continuous (measured)
  - Basic data
  - Mean Difference (MD), Standardized Mean Difference (SMD)
- Overall estimate: Fixed Effects or Random Effects

Effect measures: discrete data

- P1 = event rate in experimental group
- P2 = event rate in control group
- RD = Risk difference = P2 - P1
- RR = Relative risk = P1 / P2
- RRR = Relative risk reduction = (P2 - P1) / P2
- OR = Odds ratio = [P1 / (1 - P1)] / [P2 / (1 - P2)]
- NNT = No. needed to treat = 1 / (P2 - P1)

Example

- Experimental event rate = 0.3
- Control event rate = 0.4
- RD = 0.4 - 0.3 = 0.1
- RR = 0.3 / 0.4 = 0.75
- RRR = (0.4 - 0.3) / 0.4 = 0.25
- OR = (0.3/0.7) / (0.4/0.6) = 0.64
- NNT = 1 / (0.4 - 0.3) = 10
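
The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch; the function name `effect_measures` is an assumption made for illustration, and the risk difference follows the control-minus-experimental order used on this slide.

```python
def effect_measures(p1, p2):
    """Effect measures from event rates.

    p1: event rate in the experimental group
    p2: event rate in the control group
    Assumes 0 < p1 < 1 and 0 < p2 < 1.
    """
    rd = p2 - p1  # risk difference, control minus experimental as on the slide
    return {
        "RD": rd,
        "RR": p1 / p2,
        "RRR": rd / p2,
        "OR": (p1 / (1 - p1)) / (p2 / (1 - p2)),
        # NNT is undefined when the two rates are equal.
        "NNT": 1 / rd if rd != 0 else float("inf"),
    }


measures = effect_measures(0.3, 0.4)
for name, value in measures.items():
    print(f"{name} = {value:.2f}")
```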

Discrete - Odds Ratio (OR)

               Event   No event   Total
Experimental     a        b        ne
Control          c        d        nc

- Odds = number of patients experiencing event / number of patients not experiencing event
- Odds ratio = Odds in Experimental group / Odds in Control group
- OR = (a/b) / (c/d) = ad / bc
- Basic Data: a/ne, c/nc

Discrete - Odds Ratio - Example

               Event   No event   Total
Experimental    13       33        46
Control          7       31        38

- OR = (13/33) / (7/31) = 1.74
- Basic Data: 13/46, 7/38

Discrete - Relative Risk (RR)

               Event   No event   Total
Experimental     a        b        ne
Control          c        d        nc

- Risk = number of patients experiencing event / number of patients
- Risk Ratio = Risk in Experimental group / Risk in Control group
- RR = (a/ne) / (c/nc)
- Basic Data: a/ne, c/nc

Discrete - Relative Risk - Example

               Event   No event   Total
Experimental    13       33        46
Control          7       31        38

- RR = (13/46) / (7/38) = 1.53
- Basic Data: 13/46, 7/38

Discrete - Risk Difference (RD)

               Event   No event   Total
Experimental     a        b        ne
Control          c        d        nc

- Risk = number of patients experiencing event / number of patients
- Risk Difference = (Risk in Experimental group) - (Risk in Control group)
- RD = Pe - Pc = a/ne - c/nc (experimental minus control; the earlier summary of effect measures uses the opposite order, P2 - P1)
- Basic Data: a/ne, c/nc

Discrete - Risk Difference - Example

               Event   No event   Total
Experimental    13       33        46
Control          7       31        38

- RD = Pe - Pc = 13/46 - 7/38 = 0.098
- Basic Data: 13/46, 7/38

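Discrete - Odds Ratio (OR): Estimation

               Event   No event   Total
Experimental     a        b        ne
Control          c        d        nc

- Estimator: $\widehat{OR} = ad / bc$
- Standard Error: $\mathrm{SE}(\ln \widehat{OR}) = \sqrt{1/a + 1/b + 1/c + 1/d}$
- 100(1 - $\alpha$)% CI: $\exp\left[\ln \widehat{OR} \pm z_{\alpha/2}\,\mathrm{SE}(\ln \widehat{OR})\right]$

Discrete - Relative Risk (RR): Estimation

               Event   No event   Total
Experimental     a        b        ne
Control          c        d        nc

- Estimator: $\widehat{RR} = (a/n_e) / (c/n_c)$
- Standard Error: $\mathrm{SE}(\ln \widehat{RR}) = \sqrt{1/a - 1/n_e + 1/c - 1/n_c}$
- 100(1 - $\alpha$)% CI: $\exp\left[\ln \widehat{RR} \pm z_{\alpha/2}\,\mathrm{SE}(\ln \widehat{RR})\right]$

Discrete - Risk Difference (RD): Estimation

               Event   No event   Total
Experimental     a        b        ne
Control          c        d        nc

- Estimator: $\widehat{RD} = P_e - P_c = a/n_e - c/n_c$
- Standard Error: $\mathrm{SE}(\widehat{RD}) = \sqrt{P_e(1 - P_e)/n_e + P_c(1 - P_c)/n_c}$
- 100(1 - $\alpha$)% CI: $\widehat{RD} \pm z_{\alpha/2}\,\mathrm{SE}(\widehat{RD})$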
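
As a hedged illustration of estimator, standard error and confidence interval for a single 2x2 table, the sketch below uses the usual large-sample formulas (log scale for OR and RR, natural scale for RD). The formula images did not survive in this transcript, so these standard forms are an assumption about what the slides showed; the function name `two_by_two_summary` is hypothetical. The data are the 13/46 vs 7/38 example used earlier.

```python
import math


def two_by_two_summary(a, b, c, d, z=1.96):
    """Point estimates and approximate 95% CIs for OR, RR and RD.

    a, b: events / non-events in the experimental group
    c, d: events / non-events in the control group
    Assumes all four cells are non-zero (no continuity correction is applied).
    """
    ne, nc = a + b, c + d
    pe, pc = a / ne, c / nc

    odds_ratio = (a * d) / (b * c)
    se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    risk_ratio = pe / pc
    se_ln_rr = math.sqrt(1 / a - 1 / ne + 1 / c - 1 / nc)

    risk_diff = pe - pc  # experimental minus control
    se_rd = math.sqrt(pe * (1 - pe) / ne + pc * (1 - pc) / nc)

    def log_scale_ci(estimate, se_log):
        # The CI is built on the log scale and then exponentiated.
        return estimate * math.exp(-z * se_log), estimate * math.exp(z * se_log)

    return {
        "OR": (odds_ratio, *log_scale_ci(odds_ratio, se_ln_or)),
        "RR": (risk_ratio, *log_scale_ci(risk_ratio, se_ln_rr)),
        "RD": (risk_diff, risk_diff - z * se_rd, risk_diff + z * se_rd),
    }


summary = two_by_two_summary(13, 33, 7, 31)
for name, (estimate, lower, upper) in summary.items():
    print(f"{name}: {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```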

When to use OR / RR / RD

OR vs RR
- Odds Ratio ≈ Relative Risk if event occurs infrequently (i.e. a and c small relative to b and d):
  RR = a(c+d) / [(a+b)c] ≈ ad / bc = OR
- Odds Ratio lies further from 1 than Relative Risk if event occurs frequently

RD vs RR
- Use RD when interpretation in terms of absolute difference is better than in relative terms (e.g. interest in absolute reduction in adverse events)


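Continuous Data - Mean Difference (MD)

               number   mean          standard deviation
Experimental     ne     $\bar{x}_e$          se
Control          nc     $\bar{x}_c$          sc

- $MD = \bar{x}_e - \bar{x}_c$, with $\mathrm{SE}(MD) = \sqrt{s_e^2/n_e + s_c^2/n_c}$

Continuous Data - Standardized Mean Difference (SMD)

               number   mean          standard deviation
Experimental     ne     $\bar{x}_e$          se
Control          nc     $\bar{x}_c$          sc

- $SMD = (\bar{x}_e - \bar{x}_c) / s$, where $s = \sqrt{[(n_e - 1)s_e^2 + (n_c - 1)s_c^2] / (n_e + n_c - 2)}$ is the pooled standard deviation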

When to use MD / SMD

- Mean Difference
  - When studies have comparable outcome measures (i.e. same scale, probably same length of follow-up)
  - A meta-analysis using MDs is known as a weighted mean difference (WMD)
- Standardized Mean Difference
  - When studies use different outcome measurements which address the same clinical outcome (e.g. different scales)
  - Converts scale to a common scale: number of standard deviations
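
To make the distinction concrete, the sketch below computes both measures for one study. It assumes the SMD is the difference in means divided by the pooled standard deviation (small-sample corrections such as Hedges' adjustment are not applied); the function name and the summary statistics are hypothetical.

```python
import math


def mean_difference_measures(n_e, mean_e, sd_e, n_c, mean_c, sd_c):
    """Mean difference (MD) and standardized mean difference (SMD) for one study."""
    md = mean_e - mean_c
    se_md = math.sqrt(sd_e ** 2 / n_e + sd_c ** 2 / n_c)
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(
        ((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / (n_e + n_c - 2)
    )
    smd = md / pooled_sd  # the difference expressed in standard deviation units
    return md, se_md, smd


# Hypothetical study: outcome measured on some continuous scale.
md, se_md, smd = mean_difference_measures(
    n_e=50, mean_e=10.0, sd_e=2.0, n_c=50, mean_c=8.0, sd_c=2.0
)
print(f"MD = {md:.2f} (SE {se_md:.2f}), SMD = {smd:.2f}")
```

Because the SMD is unit-free, studies that measure the same clinical outcome on different scales can be placed on a common footing before pooling.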

Example: Combining different scales for Swollen Joint Count

Sources of Variation over Studies

- True inter-study variation may exist (fixed/random-effects model)
- Sampling error may vary among studies (sample size)
- Characteristics may differ among studies (population, intervention)

Modelling Variation

- Parameter of interest: $\theta$ (quantifies average treatment effect)
- Number of independent studies: k
- Summary statistic: $Y_i$ ($i = 1, 2, \ldots, k$)
- Large sample size: asymptotic normal distribution

Fixed-effects model vs Random-effects model

Fixed-Effects Model

- Outcome $Y_i$ from study $i$ is a sample from a distribution with mean $\theta$ (i.e. common mean across studies)
- $Y_i$ are independently distributed as $N(\theta, \sigma_i^2)$ ($i = 1, 2, \ldots, k$), where $\sigma_i^2 = \mathrm{Var}(Y_i)$ and we assume $E(Y_i) = \theta$

[Figure: Fixed-Effects Model]

Random-Effects Model

- Outcome $Y_i$ from study $i$ is a sample from a distribution with mean $\theta_i$ (i.e. study-specific means)
- $Y_i$ are independently distributed as $N(\theta_i, \sigma_i^2)$ ($i = 1, 2, \ldots, k$), where $\sigma_i^2 = \mathrm{Var}(Y_i)$ and we assume $E(Y_i) = \theta_i$
- $\theta_i$ is a realization from a distribution of effects with mean $\theta$
- $\theta_i$ are independently distributed as $N(\theta, \tau^2)$ ($i = 1, 2, \ldots, k$), where
  - $\tau^2 = \mathrm{Var}(\theta_i)$ is the inter-study variation
  - $\theta$ is the average treatment effect

[Figure: Random-Effects Model]

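Random-Effects Model (contd)

Estimating Average Study Effect

- after averaging study-specific effects, distribution of $Y_i$ is $N(\theta, \sigma_i^2 + \tau^2)$, with $\sigma_i^2$ the within-study variance and $\tau^2$ the inter-study variance
- although $\theta$ is the parameter of interest, $\tau^2$ must be considered and estimated

Estimating Study-Specific Effects

- distribution of $\theta_i$ conditional on observed data, $\theta$ and $\tau^2$ is $N\left(F_i\theta + (1 - F_i)Y_i,\ \sigma_i^2(1 - F_i)\right)$
- where $F_i = \sigma_i^2 / (\sigma_i^2 + \tau^2)$ is the shrinkage factor for the ith study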
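
The shrinkage idea can be sketched numerically. The code assumes the usual normal-normal form, in which each study's estimate is pulled towards the overall mean by the factor F_i = within-study variance / (within-study variance + between-study variance). The formula itself did not survive in this transcript, so this form is an assumption, and all names and numbers below are hypothetical.

```python
def shrink_study_effects(estimates, within_variances, overall_mean, tau2):
    """Shrunken study-specific effects under a normal random-effects model.

    Returns a list of (conditional mean, conditional variance) pairs.
    Assumes tau2 and the within-study variances are not both zero.
    """
    shrunken = []
    for y, variance in zip(estimates, within_variances):
        # Shrinkage factor: imprecise studies (large variance) are pulled
        # more strongly towards the overall mean.
        f = variance / (variance + tau2)
        shrunken.append((f * overall_mean + (1 - f) * y, variance * (1 - f)))
    return shrunken


# Hypothetical inputs: a precise study and an imprecise study.
shrunken = shrink_study_effects(
    estimates=[0.1, 0.9],
    within_variances=[0.01, 0.09],
    overall_mean=0.5,
    tau2=0.03,
)
for mean, variance in shrunken:
    print(f"shrunken effect = {mean:.3f}, variance = {variance:.4f}")
```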

Modelling Variation

- Studies are stratified and then combined to account for differences in sample size and study characteristics
- A weighted average of estimates from each study is calculated
- Question of whether a common or study-specific parameter is to be estimated remains
- Procedure
  - perform test of homogeneity
  - if no significant difference, use fixed-effects model
  - otherwise identify study characteristics that stratify studies into subsets with homogeneous effects, or use random-effects model

Fixed Effects Model

- Require from each study
  - effect estimate and
  - standard error of effect estimate
- Combine these using a weighted average
  - pooled estimate = sum of (estimate × weight) / sum of weights
  - where weight = 1 / variance of estimate
- Assumes a common underlying effect behind every trial
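
The weighted average described above can be written out directly. This is a minimal sketch of inverse-variance pooling; the function name `pool_fixed_effect` and the three study results are hypothetical.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted (fixed-effects) pooled estimate and its SE."""
    # weight = 1 / variance of estimate
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se


# Hypothetical effect estimates (e.g. log odds ratios) and standard errors.
pooled, pooled_se = pool_fixed_effect([0.10, 0.30, 0.20], [0.10, 0.20, 0.10])
print(f"pooled = {pooled:.3f}, SE = {pooled_se:.3f}")
print(f"95% CI: {pooled - 1.96 * pooled_se:.3f} to {pooled + 1.96 * pooled_se:.3f}")
```

The second study has twice the standard error of the others, so it receives a quarter of their weight.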

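Fixed-Effects Model: General Scheme

Study   Measure   Std Error   Weight
1       Y1        s1          W1
2       Y2        s2          W2
.       .         .           .
k       Yk        sk          Wk

(no association: Yi = 0; weight $W_i = 1/s_i^2$)

Overall Measure

$\bar{Y} = \sum_i W_i Y_i / \sum_i W_i$, with $\mathrm{SE}(\bar{Y}) = 1 / \sqrt{\sum_i W_i}$

Chi-Square Tests

- Association: $\chi^2_1 = \left(\sum_i W_i Y_i\right)^2 / \sum_i W_i$; if large, association
- Heterogeneity: $\chi^2_{k-1} = \sum_i W_i (Y_i - \bar{Y})^2$; if large, heterogeneity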
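
The two chi-square statistics can be sketched as follows, assuming the standard inverse-variance forms: a 1 d.f. test of overall association and a (k - 1) d.f. test of heterogeneity. The function name and the data are hypothetical; 3.841 and 5.991 are the 5% critical values of chi-square on 1 and 2 d.f.

```python
def chi_square_tests(estimates, standard_errors):
    """Association (1 d.f.) and heterogeneity (k - 1 d.f.) chi-square statistics."""
    weights = [1 / se ** 2 for se in standard_errors]
    sum_w = sum(weights)
    sum_wy = sum(w * y for w, y in zip(weights, estimates))
    pooled = sum_wy / sum_w
    # Large values indicate that the pooled effect differs from zero.
    association = sum_wy ** 2 / sum_w
    # Large values indicate more between-study spread than chance would give.
    heterogeneity = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    return association, heterogeneity, len(estimates) - 1


# Hypothetical data for three studies.
association, heterogeneity, df = chi_square_tests(
    [0.10, 0.30, 0.20], [0.10, 0.20, 0.10]
)
print(f"association chi-square (1 df) = {association:.2f}, critical value 3.841")
print(f"heterogeneity chi-square ({df} df) = {heterogeneity:.2f}, critical value 5.991")
```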

Features in Graphic Display

- For each trial
  - estimate (square)
  - 95% confidence interval (CI) (line)
  - size (square) indicates weight allocated
- Solid vertical line of no effect
  - if CI crosses line then effect not significant (p > 0.05)
- Horizontal axis
  - arithmetic: RD, MD, SMD
  - logarithmic: OR, RR
- Diamond represents combined estimate and 95% CI
- Dashed line plotted vertically through combined estimate

Odds Ratio

Three methods for combining
(1) Mantel-Haenszel method
(2) Peto's method
(3) Maximum likelihood method

Relative Risk

Risk Difference

[Figures: Peto Odds Ratio, Mantel-Haenszel Odds Ratio, Relative Risk, Risk Difference, Weighted Mean Difference, Standardized Mean Difference]
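
For the first two pooling methods named above, a minimal sketch is given below. It assumes the textbook forms of the Mantel-Haenszel estimator (a ratio of weighted cross-products) and the Peto estimator (observed minus expected events over the hypergeometric variance); the function names are hypothetical, the first table is the 13/46 vs 7/38 example from earlier, and the second table is made up for illustration.

```python
import math


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio from a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


def peto_or(tables):
    """Peto one-step pooled odds ratio from a list of (a, b, c, d) tables."""
    sum_o_minus_e = 0.0
    sum_variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        n_e, n_c = a + b, c + d
        events, non_events = a + c, b + d
        expected = n_e * events / n  # expected experimental events under no effect
        variance = n_e * n_c * events * non_events / (n ** 2 * (n - 1))
        sum_o_minus_e += a - expected
        sum_variance += variance
    return math.exp(sum_o_minus_e / sum_variance)


# (a, b, c, d) = experimental events/non-events, control events/non-events.
tables = [(13, 33, 7, 31), (10, 40, 15, 35)]  # second table is hypothetical
print(f"Mantel-Haenszel OR = {mantel_haenszel_or(tables):.3f}")
print(f"Peto OR = {peto_or(tables):.3f}")
```

With a single table, the Mantel-Haenszel estimate reduces to the ordinary odds ratio ad/bc.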

Heterogeneity

- Define meaning of heterogeneity for each review
- Define a priori the important degree of heterogeneity (in large data sets trivial heterogeneity may be statistically significant)
- If heterogeneity exists, examine potential sources (differences in study quality, participants, intervention specifics or outcome measurement/definition)
- If heterogeneity exists across studies, consider using random effects model
- If heterogeneity can be explained using a priori hypotheses, consider presenting results by these subgroups
- If heterogeneity cannot be explained, proceed with caution with further statistical aggregation and subgroup analysis

Heterogeneity: How to Identify it

- Common sense
  - are the patients, interventions and outcomes in each of the included studies sufficiently similar?
- Exploratory analysis of study-specific estimates
- Statistical tests

Heterogeneity: How to deal with it

[Figure: Lau et al. 1997]

Heterogeneity: Exploring it

- Subgroup analyses
  - subsets of trials
  - subsets of patients
  - SUBGROUPS SHOULD BE PRE-SPECIFIED TO AVOID BIAS
- Meta-regression
  - relate size of effect to characteristics of the trials

[Figures: Exploring Heterogeneity, subgroup analysis]

Random Effects Model

- Assume true effect estimates really vary across studies
- Two sources of variation
  - within studies (between patients)
  - between studies (heterogeneity)
- What the software does
  - Revise weights to take into account both components of variation
  - weight = 1 / (variance + heterogeneity)
- When heterogeneity exists we get
  - a different pooled estimate (but not necessarily) with a different interpretation
  - a wider confidence interval
  - a larger p-value

Random Effects Model

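If $\tau^2$ is known then the MLE of $\theta$ is $\hat{\theta} = \sum_i w_i^* Y_i / \sum_i w_i^*$, where $w_i^* = (\sigma_i^2 + \tau^2)^{-1}$

If $\tau^2$ is unknown three common methods of inference can be used
- Restricted Maximum Likelihood (REML)
- Bayesian
- Method of Moments (MOM)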

Method of Moments (Random effects model)

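Study   Measure   Weight (FE)   Weight (RE)
1       Y1        w1            $w_1^* = (w_1^{-1} + \hat{\tau}^2)^{-1}$
2       Y2        w2            $w_2^* = (w_2^{-1} + \hat{\tau}^2)^{-1}$
.       .         .             .
k       Yk        wk            $w_k^* = (w_k^{-1} + \hat{\tau}^2)^{-1}$

Overall Measure

$\hat{\theta} = \sum_i w_i^* Y_i / \sum_i w_i^*$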
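
A minimal sketch of the method-of-moments calculation follows, assuming the DerSimonian-Laird form of the between-study variance estimate (the method named later in this deck). The function name and the three study results are hypothetical.

```python
import math


def pool_random_effects(estimates, standard_errors):
    """Method-of-moments (DerSimonian-Laird) random-effects pooling."""
    k = len(estimates)
    w = [1 / se ** 2 for se in standard_errors]  # fixed-effects weights
    sum_w = sum(w)
    fixed_mean = sum(wi * y for wi, y in zip(w, estimates)) / sum_w
    # Heterogeneity statistic Q around the fixed-effects mean.
    q = sum(wi * (y - fixed_mean) ** 2 for wi, y in zip(w, estimates))
    # Moment estimate of the between-study variance, truncated at zero.
    scale = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / scale)
    # Revised weights: 1 / (within-study variance + between-study variance).
    w_star = [1 / (1 / wi + tau2) for wi in w]
    pooled = sum(ws * y for ws, y in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1 / sum(w_star))
    return pooled, pooled_se, tau2


# Hypothetical, clearly heterogeneous studies of equal precision.
pooled, pooled_se, tau2 = pool_random_effects([0.1, 0.5, 0.9], [0.1, 0.1, 0.1])
print(f"tau^2 = {tau2:.3f}, pooled = {pooled:.3f}, SE = {pooled_se:.3f}")
```

When tau^2 is estimated as zero the revised weights equal the fixed-effects weights, so the two models give the same answer.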

Effect of model choice on study weights

Larger studies receive proportionally less weight in RE model than in FE model

[Figures: Fixed vs Random Effects, Discrete Data (Fixed Effects, Random Effects); Fixed vs Random Effects, Continuous Data (Fixed Effects, Random Effects); Omission of Outlier - Chestnut Study]

Analysis

- Include all relevant and clinically useful measures of treatment effect
- Perform a narrative, qualitative summary when data are too sparse, of too low quality or too heterogeneous to proceed with a meta-analysis
- Specify if fixed or random effects model is used
- Describe proportion of patients used in final analysis
- Use confidence intervals
- Include a power analysis
- Consider cumulative meta-analysis (by order of publication date, baseline risk, study quality) to assess the contribution of successive studies
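
A cumulative meta-analysis re-pools the evidence each time a study is added. The sketch below orders hypothetical studies by publication year and uses fixed-effects inverse-variance pooling at each step; the function name, the ordering variable and the data are assumptions made for illustration.

```python
import math


def cumulative_meta_analysis(studies):
    """Fixed-effects pooled estimate after each successive study.

    studies: list of (year, estimate, standard_error) tuples.
    Returns a list of (year, pooled estimate, pooled SE).
    """
    results = []
    sum_w = 0.0
    sum_wy = 0.0
    for year, estimate, se in sorted(studies):  # order by publication year
        weight = 1 / se ** 2
        sum_w += weight
        sum_wy += weight * estimate
        results.append((year, sum_wy / sum_w, math.sqrt(1 / sum_w)))
    return results


# Hypothetical studies: (publication year, effect estimate, standard error).
cumulative = cumulative_meta_analysis(
    [(1995, 0.3, 0.1), (1990, 0.4, 0.2), (1992, 0.2, 0.1)]
)
for year, pooled, se in cumulative:
    print(f"up to {year}: pooled = {pooled:.3f} (SE {se:.3f})")
```

Sorting by baseline risk or study quality instead of year only requires changing the sort key.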

Step 6: Sensitivity and subgroup analyses

Subgroup Analyses

- Pre-specify hypothesis-testing subgroup analyses and keep few in number
- Label all a posteriori subgroup analyses
- When subgroup differences are detected, interpret in light of whether they are
  - established a priori
  - few in number
  - supported by plausible causal mechanisms
  - important (qualitative vs quantitative)
  - consistent across studies
  - statistically significant (adjusted for multiple testing)

Sensitivity Analyses

- Test robustness of results relative to key features of the studies and key assumptions and decisions
- Include tests of bias due to retrospective nature of systematic reviews (e.g. with/without studies of lower methodologic quality)
- Consider fragility of results by determining effect of small shifts in number of events between groups
- Consider cumulative meta-analysis to explore relationship between effect size and study quality, control event rates and other relevant features
- Test a reasonable range of values for missing data from studies with uncertain results

Funnel Plot

- Scatterplot of effect estimates against sample size
- Used to detect publication bias
- If no bias, expect symmetric, inverted funnel
- If bias, expect asymmetric or skewed shape

[Figure: schematic funnel plot; asymmetry is a suggestion of missing small studies]

Funnel Plot Example 1: Prophylaxis of NSAID-induced Gastric Ulcers

[Figure: Sample Size (0 to 700) against Effect Size (RR, 0.0 to 1.2); Intervention: H2-Blockers]

Funnel Plot Example 2: Alendronate for Postmenopausal Osteoporosis

[Figure: Sample Size (0 to 2500) against Weighted Mean Difference (0 to 10); WMD of change in lumbar bone mineral density]

Step 7: Prepare a structured report

Presentation of Results

- Include a structured abstract
- Include a table of the key elements of each study
- Include summary data from which the measures are computed
- Employ informative graphic displays representing confidence intervals, group event rates, sample sizes etc.

Interpretation of Results

- Interpret results in context of current health care
- State methodologic limitations of studies and review
- Consider size of effect in studies and review, their consistency and presence of dose-response relationship
- Consider interpreting results in context of temporal cumulative meta-analysis
- Interpret results in light of other available evidence
- Make recommendations clear and practical
- Propose future research agenda (clinical and methodological requirements)

Generic Inferential Framework


- (1) Conceptually, think of a generic effect size statistic T
- (2) corresponding effect size parameter $\theta$
- (3) associated standard error SE(T), the square root of the variance
- (4) for some effect sizes, a suitable transformation may be needed to make inference based on normal distribution theory

Generic inferential framework ...

- (A) Fixed-Effects Model (FEM)
  - Assume a common effect size
  - Obtain average effect size as a weighted mean (unbiased)
  - Optimal weight is reciprocal of variance (inverse variance weighted method)

Generic inferential framework ...

- Variances are inversely proportional to within-study sample sizes
  - what is the effect of larger studies in calculating weights?
- may also weight by quality index, q, scaled from 0 to 1

Generic inferential framework ...

- Average effect size has conditional variance (a function of the conditional variances of each effect size, quality index, ...)
  - e.g. V = 1 / total weight
- Multiply the resulting standard error by appropriate critical value (1.96, 2.58, 1.645)
- Construct confidence interval and/or test statistic

Generic inferential framework ...

- Test the homogeneity assumption using a weighted sum of squares of deviations of the effect sizes, Q
- If Q exceeds the critical value of chi-square at k-1 d.f. (k = number of studies), then the observed between-study variance is significantly greater than what would be expected under the null hypothesis

Generic inferential framework ...

- When within-study sample sizes are very large, homogeneity may be rejected by Q even when individual effect size estimates do not differ much
- One can take different courses of action when homogeneity is rejected (see next page)

Generic inferential framework ...

- Methodologic choices in dealing with heterogeneous data

Generic inferential framework ...

- (B) Random-Effects Model (REM)
  - Total variability of an observed study effect size reflects within and between variance (extra variance component)
  - If between-studies variance is zero, equations of REM reduce to those of FEM
  - Presence of a variance component which is significantly different from zero may be indicative of REM

Generic inferential framework ...

- Once significance of variance component is established (e.g. Q test for homogeneity of effect size),
  - its magnitude should be estimated
  - variance components can be estimated in many ways
  - the most commonly used method is the DerSimonian-Laird method, which is based on a method-of-moments approach
- Compute random effects weighted mean as an estimate of the average of the random effects in the population
- Construct confidence interval and conduct hypothesis tests as before (new variance and thus new weights)

Correlation Coefficient

Example: Correlation coefficient

- A measure of association more popular in cross-sectional observational studies than in RCTs is Pearson's correlation coefficient, r, given by
  $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
- X and Y must be continuous (e.g. blood pressure and weight)
- r lies between -1 and 1
- not available in RevMan / MetaView at this time

Correlation coefficient (contd)

- Following the generic framework discussed earlier
  - the effect size statistic is r
  - the corresponding effect size parameter is the underlying population correlation coefficient, $\rho$
  - in this case, a suitable transformation is needed to achieve approximate normality of the effect size
  - inference is conducted on the scale of the transformed variable and final results are back-transformed to the original scale

Correlation coefficient (contd)

- Assuming X and Y have a bivariate normal distribution, the Fisher's Z transformed variable
  $Z = \frac{1}{2} \ln\frac{1 + r}{1 - r}$
- has, for large samples, an approximate normal distribution with mean $\zeta = \frac{1}{2} \ln\frac{1 + \rho}{1 - \rho}$
- and a variance of $1/(n - 3)$
- Hence, the weighting factor associated with Z is W = 1/Var = n - 3

Correlation coefficient (contd)

- meta-analysis is carried out on Z-transformed measures and final results are transformed back to the scale of correlation using
  $r = \frac{e^{2Z} - 1}{e^{2Z} + 1}$

Numerical Example

- Source: Fleiss J., Statistical Methods in Medical Research 1993; 2: 121-145.
- correlation coefficients reported by 7 independent studies in education are included in the meta-analysis
- Comparison: association between a characteristic of the teacher and the mean measure of his or her students' achievement

Example: Fleiss (1993)

Study    n       r        Z      W      WZ      WZ^2
1       15   -0.073   -0.073    12   -0.876    0.064
2       16    0.308    0.318    13    4.134    1.315
3       15    0.481    0.524    12    6.288    3.295
4       16    0.428    0.457    13    5.941    2.715
5       15    0.180    0.182    12    2.184    0.397
6       17    0.290    0.299    14    4.186    1.252
7       15    0.400    0.424    12    5.088    2.157
Sum                             88   26.945   11.195

Z = Fisher's Z-transformation of r; W = n - 3

Q = 2.94 on 6 df is not statistically significant.

Results and discussions

- No evidence for heterogeneous association across studies
- Fixed effect analysis may be undertaken
- Questions
  - Would a random effect analysis as shown earlier produce a different numerical value for the combined correlation coefficient?
  - How would the weights be modified to carry out a REM?

Results and discussions (contd)

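- the weighted mean of Z is $\bar{Z} = \sum WZ / \sum W = 26.945 / 88 = 0.306$
- the approximate standard error of the combined mean is $\mathrm{SE}(\bar{Z}) = 1 / \sqrt{\sum W} = 1 / \sqrt{88} = 0.107$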

Results and discussions (contd)

- Test of significance is carried out using $z = \bar{Z} / \mathrm{SE}(\bar{Z}) = 0.306 / 0.107 \approx 2.87$
- this value exceeds the critical value 1.96 (corresponding to the 5% level of significance), so we conclude that the average value of Z (hence the average correlation) is statistically significant

Results and discussions (contd)

- 95% confidence interval for $\zeta$ is $0.306 \pm 1.96 \times 0.107$, i.e. (0.097, 0.515)
- Transforming back to the original scale, a 95% CI for the parameter of interest, $\rho$, is (0.097, 0.474)
  - again confirming a significant association
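
The whole worked example can be re-computed from the n and r columns of the Fleiss table. The sketch below is one way to do so; the variable names are illustrative, and small differences from the tabulated values arise because the table rounds Z to three decimals.

```python
import math

# (n, r) for the seven studies in the Fleiss (1993) example.
studies = [
    (15, -0.073), (16, 0.308), (15, 0.481), (16, 0.428),
    (15, 0.180), (17, 0.290), (15, 0.400),
]

z_values = [math.atanh(r) for _, r in studies]  # Fisher's Z = 0.5 * ln((1+r)/(1-r))
weights = [n - 3 for n, _ in studies]           # W = 1 / Var(Z) = n - 3

sum_w = sum(weights)
mean_z = sum(w * z for w, z in zip(weights, z_values)) / sum_w
q = sum(w * (z - mean_z) ** 2 for w, z in zip(weights, z_values))  # homogeneity, k-1 df
se_mean_z = 1 / math.sqrt(sum_w)
z_statistic = mean_z / se_mean_z

# 95% CI on the Z scale, then back-transformed with tanh (the inverse of atanh).
ci_z = (mean_z - 1.96 * se_mean_z, mean_z + 1.96 * se_mean_z)
pooled_r = math.tanh(mean_z)
ci_r = tuple(math.tanh(bound) for bound in ci_z)

print(f"Q = {q:.2f} on {len(studies) - 1} df")
print(f"mean Z = {mean_z:.3f}, SE = {se_mean_z:.3f}, z = {z_statistic:.2f}")
print(f"pooled r = {pooled_r:.3f}, 95% CI {ci_r[0]:.3f} to {ci_r[1]:.3f}")
```

Here `math.tanh` performs the same back-transformation as (e^{2Z} - 1) / (e^{2Z} + 1).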

Critical Appraisal of a Systematic Review

(A) The Message

- Does the review set out to answer a precise question about patient care?
  - Should be different from an uncritical encyclopedic presentation

(B) The Validity

- Have studies been sought thoroughly?
  - Medline and other relevant bibliographic databases
  - Cochrane controlled clinical trials register
  - Foreign language literature
  - "Grey literature" (unpublished or un-indexed reports: theses, conference proceedings, internal reports, non-indexed journals, pharmaceutical industry files)
  - Reference chaining from any articles found
  - Personal approaches to experts in the field to find unpublished reports
  - Hand searches of the relevant specialized journals

Validity (contd)

- Have inclusion and exclusion criteria for studies been stated explicitly, taking account of the patients in the studies, the interventions used, the outcomes recorded and the methodology?

Validity (contd)

- Have the authors considered the homogeneity of the studies: the idea that the studies are sufficiently similar in their design, interventions and subjects to merit combination?
  - this is done either by eyeballing graphs like the forest plot or by application of chi-square tests (Q test)

(C) The Utility

- The various studies may have used patients of different ages or social classes, but if the treatment effects are consistent across the studies, then generalisation to other groups or populations is more justified.

Utility (contd)

- Be wary of sub-group analyses where the authors attempt to draw new conclusions by comparing the outcomes for patients in one study with the patients in another study
- Be wary of "data-dredging" exercises, testing multiple hypotheses against the data, especially if the hypotheses were constructed after the study had begun data collection.

Utility (contd)

- One may also want to ask
  - Were all clinically important outcomes considered?
  - Are the benefits worth the harms and costs?