Meta-analysis of Test Performance When There Is a Gold Standard (PowerPoint PPT presentation)

Transcript and Presenter's Notes


1
Meta-analysis of Test Performance When There Is a
Gold Standard
  • Prepared for
  • The Agency for Healthcare Research and Quality
    (AHRQ)
  • Training Modules for Medical Test Reviews Methods
    Guide
  • www.ahrq.gov

2
Learning Objectives
  • Graphically display diagnostic test performance
    across multiple studies using a gold standard
    reference
  • Explain the dependence of sensitivity and
    specificity over studies and thus the need for a
    multivariate (joint) analysis
  • Describe choices for a meta-analysis to summarize
    test performance depending on whether the
    sensitivity and specificity estimates from
    multiple studies vary (or do not vary) widely

Trikalinos TA, Coleman CI, Griffith L, et al. Meta-analysis of test performance when there is a gold standard. In: Methods guide for medical test reviews. Available at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.
3
Background
  • This module focuses on how to conduct a
    meta-analysis with a gold standard reference.
  • Module 9 discusses how to conduct a meta-analysis
    when no gold standard reference exists.
  • There are two goals for a meta-analysis in a
    systematic review:
  • Provide summary estimates for key quantities
  • Explain observed heterogeneity in the results of
    studies included in the review
  • For systematic reviews of medical tests, a
    meta-analysis often focuses on synthesis of test
    performance data.

4
Important Terms
  • Gold Standard: A reference standard that is
    considered adequate in defining the presence or
    absence of the condition of interest (disease).
  • Diagnostic Test: This type of test is potentially
    less accurate than using the gold standard to
    ascertain disease.
  • Data: The main focus is on tests with positive or
    negative results based on a cut-off level
    (threshold); each study provides a 2×2 tabulation.

Test Result | With Disease   | Healthy
Positive    | True positive  | False positive
Negative    | False negative | True negative
5
Measures Used To Assess Test Performance (1 of 2)
  • Sensitivity The proportion of test positives
    among people with a disease (true-positive rate)
  • Specificity The proportion of test negatives
    among healthy people (true-negative rate)
  • Positive predictive value Proportion with
    disease among people with test-positive results
  • Negative predictive value Proportion of healthy
    people with test-negative results
  • The predictive values can be computed from
    sensitivity, specificity, and disease prevalence.
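
As noted in the last bullet above, for a given disease prevalence p the predictive values follow directly from Bayes' theorem (standard identities, stated here for reference):

    \mathrm{PPV} = \frac{\mathrm{Se}\cdot p}{\mathrm{Se}\cdot p + (1-\mathrm{Sp})(1-p)},
    \qquad
    \mathrm{NPV} = \frac{\mathrm{Sp}\,(1-p)}{\mathrm{Sp}\,(1-p) + (1-\mathrm{Se})\,p}

For example, a test with Se = 0.90 and Sp = 0.85 applied at a prevalence of 10 percent gives PPV of about 0.40 and NPV of about 0.99.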

6
Measures Used To Assess Test Performance (2 of 2)
  • Positive likelihood ratio sensitivity/(1 ?
    specificity) proportion of test positives among
    diseased/proportion of test positives among
    healthy
  • Negative likelihood ratio (1 ?
    sensitivity)/specificity proportion of test
    negatives among diseased/proportion of test
    negatives among healthy
  • Diagnostic odds ratio (true positives/false
    negatives)/(false positives/true negatives)
    odds of a positive test with disease over odds of
    a positive test without disease
  • Diagnostic odds ratios do not allow weighing of
    the true-positive and false-positive rates
    separately.
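
A minimal Python sketch, using hypothetical 2×2 counts, that computes the measures defined on this and the preceding slide (the counts, variable names, and printout are illustrative only):

    # Hypothetical 2x2 table: rows = test result, columns = disease status
    tp, fp = 90, 20    # test positive: true positives, false positives
    fn, tn = 10, 180   # test negative: false negatives, true negatives

    sensitivity = tp / (tp + fn)            # true-positive rate
    specificity = tn / (tn + fp)            # true-negative rate
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    diagnostic_or = (tp / fn) / (fp / tn)   # equals lr_positive / lr_negative

    print(f"Se={sensitivity:.2f}  Sp={specificity:.2f}  "
          f"LR+={lr_positive:.2f}  LR-={lr_negative:.2f}  DOR={diagnostic_or:.1f}")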

7
Dependence of Sensitivity and Specificity Across
Studies (1 of 2)
  • Meta-analysis aims to provide a meaningful
    summary of sensitivity and specificity across
    studies.
  • Within each study, sensitivity and specificity
    are independent: they are estimated from
    different patients (those with a disease or those
    who are healthy).
  • Across studies, sensitivity and specificity are
    generally negatively correlated: as one
    increases, the other is expected to decrease.
  • This negative correlation is most obvious with
    varying thresholds (known as the threshold
    effect), varying time from symptom onset to test,
    et cetera.
  • Positive correlations are often due to a missing
    covariate.

8
Dependence of Sensitivity and Specificity Across
Studies (2 of 2)
  • This is an example with 11 studies using D-dimer
    tests to diagnose venous thromboembolism, showing
    that sensitivity increases as specificity
    decreases.
  • Summarizing the two correlated quantities is a
    multivariate problem, and multivariate methods
    should be used to address it.

Becker DM, Philbrick JT, Bachhuber TL, et al. Arch Intern Med 1996 May 13;156(9):939-46. PMID: 8624174.
9
Challenges
  • How to quantitatively summarize medical test
    performance when:
  • The sensitivity and specificity estimates of
    various studies do not vary widely or extensively
  • Can use a summary point to obtain summary test
    performance if the tests have the same threshold
  • Summary point: a summary sensitivity and summary
    specificity pair
  • The sensitivity and specificity of multiple
    studies vary widely
  • Can use a summary line to describe the
    relationship between average sensitivity and
    average specificity
  • May be less important than variations in
    thresholds, reference standards, study designs,
    et cetera, between the studies

10
Principles for Addressing the Challenges
  • Principle 1: Favor the most informative way to
    summarize the data.
  • Choose between a summary point and a summary
    line.
  • Use the summary point when sensitivity/specificity
    do not vary much.
  • Use the summary line when there are different
    thresholds for positive tests or estimates vary
    widely.
  • Both can also be used, since they convey
    complementary information.
  • The choice is subjective; there are no
    hard-and-fast rules.
  • Principle 2: Explore the variability in study
    results with graphs and suitable analyses rather
    than relying exclusively on grand means (i.e.,
    a single summary statistic).

11
Deciding Which Metrics To Meta-analyze (1 of 6)
  • Problem:
  • Within a study, sensitivity, specificity,
    positive/negative predictive values, and
    prevalence are all interrelated via simple
    formulas.
  • Meta-analyzing each metric across studies will
    create summaries that are inconsistent with these
    formulas.
  • Proposed solution (sketched below):
  • Obtain summaries for sensitivities and
    specificities across studies via meta-analysis,
    then back-calculate the rest of the metrics
    (using the formulas) over a range of prevalence
    values.
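
A minimal sketch of the proposed back-calculation, assuming hypothetical summary estimates of sensitivity and specificity and a hypothetical range of plausible prevalence values:

    # Hypothetical summary estimates from a meta-analysis
    summary_se, summary_sp = 0.90, 0.85

    # Likelihood ratios depend only on sensitivity and specificity
    lr_positive = summary_se / (1 - summary_sp)
    lr_negative = (1 - summary_se) / summary_sp

    # Predictive values are back-calculated over a range of prevalences
    for prev in (0.05, 0.10, 0.20, 0.30, 0.50):
        ppv = summary_se * prev / (summary_se * prev + (1 - summary_sp) * (1 - prev))
        npv = summary_sp * (1 - prev) / (summary_sp * (1 - prev) + (1 - summary_se) * prev)
        print(f"prevalence={prev:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}  "
              f"LR+={lr_positive:.2f}  LR-={lr_negative:.2f}")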

12
Deciding Which Metrics To Meta-analyze (2 of 6)
  • A Visual Summary of Sensitivity/Specificity
    Across Studies With back calculation of the Other
    Metrics

NLR = negative likelihood ratio; NPV = negative predictive value; PLR = positive likelihood ratio; PPV = positive predictive value; Prev = prevalence; Se = sensitivity; Sp = specificity
13
Deciding Which Metrics To Meta-analyze (3 of 6)
  • Why does it make sense to directly meta-analyze
    sensitivity and specificity?
  • It aligns well with our understanding of
    positivity threshold effects.
  • Sensitivity and specificity are often considered
    independent of prevalence.
  • Summary sensitivity and specificity obtained by
    direct meta-analysis will always be between 0 and
    1.
  • These two metrics are not as easily understood as
    predictive values and likelihood ratios, so back
    calculation of these other metrics is useful.

14
Deciding Which Metrics To Meta-analyze (4 of 6)
  • Why does it not make sense to directly
    meta-analyze predictive values or prevalence?
  • Predictive values are dependent on prevalence.
  • Rarely is it meaningful to meta-analyze each
    value across studies.
  • Prevalence is often wide ranging.
  • Prevalence cannot be estimated from case-control
    studies (the main design of many medical test
    studies).
  • It is better to back calculate these values over
    a range of plausible prevalence values.

15
Deciding Which Metrics To Meta-analyze (5 of 6)
  • Why can directly meta-analyzing positive and
    negative likelihood ratios be problematic?
  • Combining likelihood ratios across studies does
    not guarantee the summary values are internally
    consistent.
  • It is possible to obtain summary likelihood
    ratios that correspond to impossible summary
    sensitivities or specificities (i.e., values <0
    or >1); see the back-solving formulas below.
  • Back calculation avoids this.
  • This is not a common case, however; often, direct
    meta-analysis yields the same conclusions as back
    calculation.
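
For reference, solving the likelihood-ratio definitions for sensitivity and specificity gives the back-solving relations referred to above; when directly pooled LR+ and LR- values are mutually inconsistent, these expressions can fall outside the interval from 0 to 1:

    \mathrm{Sp} = \frac{LR^{+} - 1}{LR^{+} - LR^{-}},
    \qquad
    \mathrm{Se} = \frac{LR^{+}\,(1 - LR^{-})}{LR^{+} - LR^{-}}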

16
Deciding Which Metrics To Meta-analyze (6 of 6)
  • Directly analyzing diagnostic odds ratios
  • Is straightforward and follows standard
    meta-analytic methods
  • Characteristics of the diagnostic odds ratio
  • Closely linked to sensitivity, specificity, and
    likelihood ratios
  • Can easily be included in meta-regression models
    for analysis of heterogeneity between studies
  • Disadvantages
  • Challenging to interpret
  • Impossible to weigh the true-positive rate and
    the false-positive rate separately

17
Desired Characteristics of Meta-analytic Methods
  • Meta-analytic methods should
  • Respect the multivariate nature of test
    performance metrics (i.e., sensitivity and
    specificity)
  • Allow for nonindependence between sensitivity and
    specificity across studies (threshold effect)
  • Allow for between-study heterogeneity (i.e.,
    variability not explained by the statistical
    distribution of the data in each study)
  • The most theoretically motivated approaches are
    based on multivariate methods (hierarchical
    modeling).

18
Preferred Methods for Obtaining a Summary Point
(1 of 4)
  • Multivariate meta-analysis of sensitivity and
    specificity (i.e., joint analysis of both) should
    be performed, rather than separate univariate
    meta-analyses.
  • It requires hierarchical modeling.
  • Bivariate model
  • Hierarchical summary receiver operator
    characteristic model
  • Both families of models use two levels to model
    the data (sketched below).
  • 1st level: within-study variability, from 2×2
    table counts
  • 2nd level: between-study variability (i.e.,
    heterogeneity), allowing for nonindependence of
    sensitivity and specificity across studies
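
In one common notation (compare the Reitsma et al. and Harbord et al. entries in the references), the two levels for study i with n_{1i} diseased and n_{0i} nondiseased subjects can be written as:

    \mathrm{TP}_i \sim \mathrm{Binomial}(n_{1i}, \mathrm{Se}_i),
    \qquad
    \mathrm{TN}_i \sim \mathrm{Binomial}(n_{0i}, \mathrm{Sp}_i)

    \begin{pmatrix} \mathrm{logit}\,\mathrm{Se}_i \\ \mathrm{logit}\,\mathrm{Sp}_i \end{pmatrix}
    \sim N\!\left( \begin{pmatrix} \mu_{1} \\ \mu_{2} \end{pmatrix},
    \begin{pmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\ \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right)

Here the inverse logits of \mu_1 and \mu_2 are the summary sensitivity and specificity, and \rho captures the (typically negative) correlation between them across studies.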

19
Preferred Methods for Obtaining a Summary Point
(2 of 4)
  • Model families differ in the parameters used for
    between-study variability in the 2nd level.
  • The bivariate model uses parameters that are
    transformations of the average sensitivity and
    specificity.
  • The hierarchical summary receiver operator
    characteristic (HSROC) model uses a scale
    parameter and an accuracy parameter.
  • Both models are functions of the sensitivity and
    specificity.
  • They also define an underlying HSROC curve.
  • Both models are mathematically the same in the
    absence of covariates (one common HSROC
    parameterization is shown below).
  • Both models assume a normal distribution of
    parameters, which can be difficult to satisfy.
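
For comparison, one common statement of the Rutter-Gatsonis HSROC model (a sketch; notation varies across publications) models the probability \pi_{ij} of a positive test in study i and disease group j, with the group indicator X_{ij} = -1/2 for nondiseased and +1/2 for diseased subjects:

    \mathrm{logit}(\pi_{ij}) = \left(\theta_i + \alpha_i X_{ij}\right)\exp\!\left(-\beta X_{ij}\right),
    \qquad
    \theta_i \sim N(\Theta, \tau_\theta^2), \quad \alpha_i \sim N(\Lambda, \tau_\alpha^2)

Here \theta_i is the positivity (threshold) parameter, \alpha_i the accuracy parameter, and \beta a shape parameter; the group counts are binomial with probability \pi_{ij}.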

20
Preferred Methods for Obtaining a Summary Point
(3 of 4)
  • Researchers need to choose between the bivariate
    and the hierarchical summary receiver operator
    characteristic (HSROC) models when covariates are
    present (i.e., meta-regression analysis). For
    example
  • The bivariate model is more appropriate when
    there is variation in disease severity.
  • This affects sensitivity but not specificity.
  • The bivariate model allows direct evaluation of
    the difference in sensitivity and/or specificity.
  • The HSROC model is more effective when spectrum
    effects (the subjects in a study do not
    represent the patients who will receive the test
    in practice) are present.
  • This is more likely to affect test accuracy
    rather than threshold.
  • The HSROC model allows direct evaluation of the
    difference in accuracy and/or threshold
    parameters.

21
Preferred Methods for Obtaining a Summary Point
(4 of 4)
  • Methods Commonly Used To Calculate a Summary
    Point

Method | Description or Comment | Does It Have the Desired Characteristics?
Independent meta-analysis of sensitivity and specificity | Separate meta-analyses per metric; within-study variability preferably modeled by the binomial distribution | Ignores the correlation between sensitivity and specificity; underestimates the summary sensitivity and specificity and yields wrong confidence intervals
Joint (multivariate) meta-analysis of sensitivity and specificity based on hierarchical modeling | Based on multivariate (joint) modeling of sensitivity and specificity; two families of models that are equivalent when there are no covariates; modeling preferably using the binomial likelihood rather than normal approximations | The generally preferred method
22
Preferred Methods for Obtaining a Summary Line (1
of 2)
  • Hierarchical modeling is recommended.
  • Hierarchical summary lines can be calculated from
    bivariate random-effects model parameters
  • A range of hierarchical summary receiver operator
    characteristic (HSROC) lines can be calculated
    from fitted bivariate model parameters.
  • An example is the Rutter-Gatsonis HSROC model.
  • These lines represent alternative
    characterizations of the bivariate distribution
    of sensitivity and specificity.
  • They show how the summary sensitivity changes
    with the summary specificity (one common
    construction is sketched below).
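
One such construction (a sketch; several variants are described by Arends et al., 2008, in the references) uses the regression of logit sensitivity on logit specificity implied by the fitted bivariate parameters \mu_1, \mu_2, \sigma_1, \sigma_2, \rho:

    \mathrm{logit}(\mathrm{Se}) = \mu_1 + \frac{\rho\,\sigma_1}{\sigma_2}\left(\mathrm{logit}(\mathrm{Sp}) - \mu_2\right)

The line is drawn in ROC space by letting logit(Sp) range over the values observed in the included studies and plotting the resulting sensitivity against 1 - specificity; with a negative \rho, sensitivity rises as specificity falls.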

Rutter CM, Gatsonis CA. Acad Radiol 1995 Mar;2 Suppl 1:S48-56; discussion S65-7, S70-1 passim. PMID: 9419705.
23
Preferred Methods for Obtaining a Summary Line (2
of 2)
  • Methods Commonly Used To Calculate a Summary Line

Method | Description or Comment | Does It Have the Desired Characteristics?
Moses-Littenberg model | Summary line based on a simple regression of the difference of logit-transformed true-positive and false-positive rates versus their average | Ignores unexplained variation between studies (fixed effects); does not account for correlation between sensitivity and specificity; does not account for variability in the independent variable; inability to weight studies optimally yields wrong inferences when covariates are used
Random-intercept augmentation of the Moses-Littenberg model | Regression of the difference of logit-transformed true-positive and false-positive rates versus their average, with random effects that allow for variability across studies | Does not account for correlation between sensitivity and specificity; does not account for variability in the independent variable
Summary receiver operator characteristic (ROC) curves based on hierarchical modeling | Same hierarchical modeling as for multivariate meta-analysis to obtain a summary point; many ways to obtain a (hierarchical) summary ROC: Rutter-Gatsonis (most common) and several alternative curves | Most theoretically motivated method; the Rutter-Gatsonis hierarchical summary ROC is recommended in the Cochrane Handbook, as it is the method that has been used most often
Littenberg B, Moses LE. Med Decis Making 1993 Oct-Dec;13(4):313-21. PMID: 8246704. Rutter CM, Gatsonis CA. Acad Radiol 1995 Mar;2 Suppl 1:S48-56; discussion S65-7, S70-1 passim. PMID: 9419705.
24
Special Case Joint Analysis of Sensitivity and
Specificity With Multiple Thresholds (1 of 2)
  • It is not uncommon for studies to report multiple
    sensitivity/specificity pairs at several
    thresholds for positive tests.
  • Option 1: Decide on one threshold from each study
    (e.g., the threshold with the highest
    sensitivity).
  • Option 2: Use all thresholds.
  • An extension of the hierarchical summary receiver
    operator characteristic model has been developed
    for this purpose.
  • A method combining whole receiver operator
    characteristic (ROC) curves can also be used.
  • It is recommended that data be explored
    graphically in ROC space to highlight
    similarities and differences among the studies.

25
Special Case Joint Analysis of Sensitivity and
Specificity With Multiple Thresholds (2 of 2)
  • This is an example of an ROC graph for studies
    with different thresholds for total serum
    bilirubin. Points on the line for each study
    represent sensitivity/specificity pairs at
    different threshold values.

This is a typical receiver operator
characteristic (ROC) graph for four hypothetical
studies. Studies in the left shaded area have an
LR+ above 10. Studies in the top shaded area have
an LR- below 0.1. Those in the intersection have
both.
Trikalinos TA, Chung M, Lau J, et al. Pediatrics 2009 Oct;124(4):1162-71. PMID: 19786450.
26
Recommended Algorithm
  • A three-step algorithm is recommended for
    meta-analyzing studies with a gold standard
    reference:
  • Start by considering sensitivity and specificity
    separately.
  • Perform a multivariate meta-analysis (when each
    study reports a single threshold).
  • Explore between-study heterogeneity.

27
Step 1 Start by Considering Sensitivity and
Specificity Separately (1 of 2)
  • Reviewers should familiarize themselves with the
    pattern of study-level sensitivities and
    specificities.
  • Use graphical displays.
  • Forest plots of study sensitivities and
    specificities with their confidence intervals
    give a visual impression of the variability of
    sensitivity and specificity across studies.
  • A plot of sensitivity (vertical axis) versus
    1 - specificity (horizontal axis) gives a visual
    impression of the relationship between
    sensitivity and specificity across studies (see
    the plotting sketch below). These plots are also
    known as receiver operator characteristic graphs.
  • A shoulder-and-arm pattern is present when there
    is a threshold effect.
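
A minimal plotting sketch for the ROC-space display mentioned above, using hypothetical per-study counts (TP, FN, FP, TN); forest plots would be produced analogously from the study-level estimates and their confidence intervals:

    import matplotlib.pyplot as plt

    # Hypothetical per-study 2x2 counts: (TP, FN, FP, TN)
    studies = [(45, 5, 20, 80), (38, 12, 10, 90), (60, 15, 30, 120), (25, 3, 18, 60)]

    sens = [tp / (tp + fn) for tp, fn, fp, tn in studies]   # sensitivity per study
    fpr = [fp / (fp + tn) for tp, fn, fp, tn in studies]    # 1 - specificity per study

    plt.scatter(fpr, sens)
    plt.xlim(0, 1)
    plt.ylim(0, 1)
    plt.xlabel("1 - specificity (false-positive rate)")
    plt.ylabel("Sensitivity (true-positive rate)")
    plt.title("Study-level estimates in ROC space")
    plt.show()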

28
Step 1 Start by Considering Sensitivity and
Specificity Separately (2 of 2)
  • Examples of forest plots
  • An example of a receiver operator characteristic
    graph with the shoulder-and-arm pattern

Increasing the threshold decreases sensitivity
but increases specificity
Becker DM, Philbrick JT, Bachhuber TL, et al. Arch Intern Med 1996 May 13;156(9):939-46. PMID: 8624174.
29
Step 2 Multivariate Meta-analysis (When Each
Study Reports a Single Threshold)
  • Obtain a two-dimensional summary point
    (sensitivity, specificity) using the bivariate
    meta-analysis model, preferably modeling the
    within-study variability with a binomial
    likelihood (see the sketch below).
  • Obtain summary lines based on multivariate
    meta-analytic models.
  • The interpretation of a summary line is not
    automatically one of threshold effects,
    especially if there is a positive correlation
    between sensitivity and specificity across
    studies.
  • If more than one threshold is reported per study,
    consider incorporating all of them in the
    analysis, both qualitatively (via graphs) and
    quantitatively (via proper methods).
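
The sketch below, using hypothetical counts, fits an approximate bivariate random-effects model on the logit scale by maximizing a normal-approximation likelihood. It is for illustration only: the module prefers modeling the within-study variability with the exact binomial likelihood, which dedicated meta-analysis software provides.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import expit

    # Columns: TP, FN, FP, TN (one row per study) -- hypothetical data
    counts = np.array([
        [45,  5, 20,  80],
        [38, 12, 10,  90],
        [60, 15, 30, 120],
        [25,  3, 18,  60],
        [70, 10, 25, 150],
    ], dtype=float) + 0.5                                      # continuity correction

    tp, fn, fp, tn = counts.T
    y = np.column_stack([np.log(tp / fn), np.log(tn / fp)])    # logit Se, logit Sp
    s2 = np.column_stack([1/tp + 1/fn, 1/tn + 1/fp])           # within-study variances

    def neg_log_lik(theta):
        mu = theta[:2]                                         # mean logit Se, logit Sp
        sd = np.exp(theta[2:4])                                # between-study SDs
        rho = np.tanh(theta[4])                                # between-study correlation
        Sigma = np.array([[sd[0]**2, rho*sd[0]*sd[1]],
                          [rho*sd[0]*sd[1], sd[1]**2]])
        nll = 0.0
        for yi, s2i in zip(y, s2):
            V = Sigma + np.diag(s2i)                           # total covariance, study i
            r = yi - mu
            nll += 0.5 * (np.log(np.linalg.det(V)) + r @ np.linalg.solve(V, r))
        return nll

    start = np.array([y[:, 0].mean(), y[:, 1].mean(), -1.0, -1.0, 0.0])
    fit = minimize(neg_log_lik, start, method="Nelder-Mead")
    mu_se, mu_sp = fit.x[:2]
    print("summary sensitivity:", round(float(expit(mu_se)), 3))
    print("summary specificity:", round(float(expit(mu_sp)), 3))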

30
Step 3 Explore Between-Study Heterogeneity
  • The hierarchical summary receiver operator
    characteristic (HSROC) model allows direct
    evaluation of heterogeneity in accuracy and
    threshold parameters.
  • Bivariate models allow direct evaluation of
    sensitivity and specificity.
  • Added covariates that reduce variability across
    studies may need to be taken into account when
    summarizing the studies.
  • Some common sources of heterogeneity
  • Patient population/selection
  • Methods to verify/interpret results
  • Clinical setting
  • Disease severity

31
Example 1 D-Dimer Assays for Diagnosing Venous
Thromboembolism (1 of 4)
Forest Plots of Sensitivity, Specificity, and
Likelihood Ratios
  • D-dimers are fragments specific to fibrin
    degradation.
  • They are measured by using an enzyme-linked
    immunosorbent assay (ELISA) to diagnose venous
    thromboembolism.

Becker DM, Philbrick JT, Bachhuber TL, et al. Arch Intern Med 1996 May 13;156(9):939-46. PMID: 8624174.
32
Example 1 D-Dimer Assays for Diagnosing Venous
Thromboembolism (2 of 4)
  • Forest plots show more heterogeneity in
    sensitivity/specificity than in likelihood
    ratios.
  • Verified by formal heterogeneity testing
  • May be a threshold effect
  • Because of the variety of thresholds used across
    the studies, it is more informative to summarize
    test performance with a hierarchical summary
    receiver operator characteristic plot than by
    summarizing sensitivities and specificities.

33
Example 1 D-Dimer Assays for Diagnosing Venous
Thromboembolism (3 of 4)
HSROC Plot of D-Dimer Tests Using the Highest
Thresholds
  • The shoulder-and-arm pattern indicates the
    threshold effect.
  • The location of points in the upper shaded area
    of the receiver operator characteristic space
    indicates high sensitivity and low specificity.
  • The test minimizes false-negative results and is
    good for ruling out disease.

Lijmer JG, Bossuyt PM, Heisterkamp SH. Stat Med 2002 Jun 15;21(11):1525-37. PMID: 12111918.
34
Example 1 D-Dimer Assays for Diagnosing Venous
Thromboembolism (4 of 4)
Calculated Negative Predictive Values for the
D-Dimer Test With the Prevalence of Venous
Thromboembolism Between 5 and 50 Percent
  • It is informative to give a summary of the
    negative and positive predictive values for this
    test.
  • Calculate over a range of prevalence values using
    the summary sensitivity and specificity values.
  • A consistently high negative predictive value
    line means that a high percentage of people who
    test negative actually are negative for the
    disease.

Lijmer JG, Bossuyt PM, Heisterkamp SH. Stat Med 2002 Jun 15;21(11):1525-37. PMID: 12111918.
35
Example 2 Serial Measurements of the Creatine
Kinase-Myocardial Band To Diagnose Acute Cardiac
Ischemia (1 of 3)
  • Serial measurements of the creatine
    kinase-myocardial band (CK-MB) are used to
    diagnose acute cardiac ischemia in the emergency
    room.
  • Blood levels of CK-MB increase over time from
    symptom onset.
  • Fourteen studies performed CK-MB testing at
    varying times after symptom onset.
  • There was evident heterogeneity in sensitivity
    that was not attributable to a threshold effect.
  • The sensitivity of the test increased as the time
    from symptom onset increased.
  • The difference in sensitivity may be attributable
    to time to test; to explore this possibility, a
    bivariate meta-analytic model was used.

36
Example 2 Serial Measurements of the Creatine
Kinase-Myocardial Band To Diagnose Acute Cardiac
Ischemia (2 of 3)
  • Sensitivity increases with longer hours from
    symptom onset to the last measurement of the
    creatine kinase-myocardial band.

Figure panels: Actual Hours; 95-Percent Confidence Regions
Actual hours are indicated next to the points (circles: 3 hours or less; Xs: more than 3 hours).
Dashed lines show the 95-percent confidence regions (blue: 3 hours or less; red: more than 3 hours).
Balk EM, Ioannidis JP, Salem D, et al. Ann Emerg Med 2001 May;37(5):478-94. PMID: 11326184. Lau J, Ioannidis JP, Balk E, et al. Evid Rep Technol Assess (Summ) 2000 Sep;(26):1-4. PMID: 11079073.
37
Example 2 Serial Measurements of the Creatine
Kinase-Myocardial Band To Diagnose Acute Cardiac
Ischemia (3 of 3)
  • The hierarchical summary receiver operator
    characteristic (HSROC) model (bivariate
    meta-regression) was used to compare summary
    sensitivity and specificity with a binary
    variable to account for timing of the last serial
    creatine kinase-myocardial band measurement
    (fixed-effects binary covariate).
  • Note that properly specified bivariate/HSROC
    meta-regressions can be used to compare two or
    more index tests.

Meta-analysis Metric | 3 Hours or Less | More Than 3 Hours | P-Value for the Comparison Across Subgroups
Summary sensitivity (percentage) | 80 (64 to 90) | 96 (85 to 99) | 0.36
Summary specificity (percentage) | 97 (94 to 98) | 97 (95 to 99) | 0.56
Balk EM, Ioannidis JP, Salem D, et al. Ann Emerg Med 2001 May;37(5):478-94. PMID: 11326184. Lau J, Ioannidis JP, Balk E, et al. Evid Rep Technol Assess (Summ) 2000 Sep;(26):1-4. PMID: 11079073.
38
Overall Recommendations (1 of 3)
  • Use the bivariate random-effects meta-analytic
    models to obtain a summary sensitivity and
    specificity.
  • Back-calculate the overall positive and negative
    predictive values (over a range of prevalence
    values) from summary estimates of sensitivity and
    specificity, rather than meta-analyzing them
    directly.
  • Back-calculate overall positive and negative
    likelihood ratios from summary estimates of
    sensitivity and specificity, rather than
    meta-analyzing them directly.

39
Overall Recommendations (2 of 3)
  • To obtain a summary line, use multivariate
    meta-analysis methods such as the hierarchical
    summary receiver operator characteristic (HSROC)
    model.
  • Several summary lines can be obtained based on
    multivariate meta-analytic models.
  • They can differ when the estimated correlation
    between sensitivity and specificity is positive
    and when there is little between-study
    variability.
  • If there is evidence of a positive correlation,
    the variability in the studies cannot be
    attributed to a threshold effect.
  • Explore for missing important covariates.

40
Overall Recommendations (3 of 3)
  • If more than one threshold is reported per study,
    this must be taken into account in the
    quantitative analyses.
  • Qualitative analysis with graphs and quantitative
    analyses with proper methods are encouraged.
  • Explore the impact of study characteristics on
    summary results using meta-regression-based
    analyses or subgroup analyses in the context of
    the primary methodology used to summarize the
    studies.

41
Practice Question 1 (1 of 2)
  • Within individual studies of a systematic review,
    sensitivity and specificity are independent
    variables.
  • True
  • False

42
Practice Question 1 (2 of 2)
  • Explanation for Question 1
  • This statement is true. Sensitivity and
    specificity within each study are independent
    because they are estimated from different
    patients. Across studies they typically are
    negatively correlated.

43
Practice Question 2 (1 of 2)
  • Why does this module recommend directly
    meta-analyzing sensitivity and specificity?
  • a. Sensitivity and specificity are dependent on
    the prevalence of the condition under study.
  • b. Other predictive values and likelihood ratios
    can be back-calculated for a range of prevalence
    values by using known formulas.
  • c. Summary sensitivity and specificity obtained by
    direct meta-analysis will always be greater than
    1.
  • d. Interpretation of sensitivity and specificity
    is very intuitive.

44
Practice Question 2 (2 of 2)
  • Explanation for Question 2
  • The correct answer is b. Once the summary
    sensitivity and specificity are calculated by
    meta-analysis, there are formulas that allow the
    back calculation of overall predictive values and
    likelihood ratios. Likelihood ratios and
    predictive values are more easily interpreted by
    the reader of the review. Sensitivity and
    specificity are often considered to be
    independent of prevalence because they do not
    depend on it mathematically and will always be
    between 0 and 1.

45
Practice Question 3 (1 of 2)
  • What is the preferred method for obtaining a
    summary sensitivity and specificity in a
    meta-analysis?
  • a. Multivariate meta-analysis
  • b. Separate univariate meta-analyses
  • c. Using a summary line
  • d. The Kester and Buntinx variant

46
Practice Question 3 (2 of 2)
  • Explanation for Question 3
  • The correct answer is a. A multivariate
    meta-analysis of sensitivity and specificity is
    the recommended method for obtaining a summary
    point (summary sensitivity and specificity). This
    is a joint analysis of both quantities instead of
    separate univariate meta-analyses. Obtaining a
    summary line is an alternative to calculating a
    summary point. The Kester and Buntinx method is
    used to analyze sensitivity and specificity pairs
    when there are several thresholds for positive
    tests.

47
Practice Question 4 (1 of 2)
  • In which situation would a summary line be more
    helpful in summarizing medical test performance?
  • a. Sensitivity and specificity estimates of the
    various studies do not vary widely.
  • b. Sensitivity and specificity of the various
    studies vary over a large range.

48
Practice Question 4 (2 of 2)
  • Explanation for Question 4
  • The correct answer is b. Both a summary point and
    a summary line are informative and are useful in
    synthesizing data. There are no strict rules to
    follow in deciding which to use. A summary line
    may be more helpful as a summary of test
    performance when the sensitivity and specificity
    estimates of various studies vary over a large
    range.

49
Authors
  • This presentation was prepared by Brooke
    Heidenfelder, Andrzej Kosinski, Rachael Posey,
    Lorraine Sease, Remy Coeytaux, Gillian Sanders,
    and Alex Vaz, of the Duke University
    Evidence-based Practice Center.
  • The module is based on Trikalinos TA, Coleman CI,
    Griffith L, et al. Meta-analysis of test
    performance when there is a gold standard. In:
    Chang SM and Matchar DB, eds. Methods guide for
    medical test reviews. Rockville, MD: Agency for
    Healthcare Research and Quality; June 2012. p.
    8.1-21. AHRQ Publication No. 12-EHC017. Available
    at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.

50
References (1 of 9)
  • Arends LR, Hamza TH, van Houwelingen JC, et al.
    Bivariate random effects meta-analysis of ROC
    curves. Med Decis Making 2008 Sep-Oct;28(5):621-38.
    PMID: 18591542.
  • Balk EM, Ioannidis JP, Salem D, et al. Accuracy
    of biomarkers to diagnose acute cardiac ischemia
    in the emergency department: a meta-analysis. Ann
    Emerg Med 2001 May;37(5):478-94. PMID: 11326184.
  • Becker DM, Philbrick JT, Bachhuber TL, et al.
    D-dimer testing and acute venous thromboembolism.
    A shortcut to accurate diagnosis? Arch Intern Med
    1996 May 13;156(9):939-46. PMID: 8624174.
  • Bossuyt PM, Reitsma JB, Bruns DE, et al. The
    STARD statement for reporting studies of
    diagnostic accuracy: explanation and elaboration.
    Ann Intern Med 2003 Jan 7;138(1):W1-12. PMID:
    12513067.

51
References (2 of 9)
  • Chappell FM, Raab GM, Wardlaw JM. When are
    summary ROC curves appropriate for diagnostic
    meta-analyses? Stat Med 2009 Sep
    20;28(21):2653-68. PMID: 19591118.
  • Deeks JJ, Altman DG. Diagnostic tests 4:
    likelihood ratios. BMJ 2004 Jul
    17;329(7458):168-9. PMID: 15258077.
  • Dukic V, Gatsonis C. Meta-analysis of diagnostic
    test accuracy assessment studies with varying
    number of thresholds. Biometrics 2003
    Dec;59(4):936-46. PMID: 14969472.
  • Fu R, Gartlehner G, Grant M, et al. Conducting
    quantitative synthesis when comparing medical
    interventions: AHRQ and the Effective Health Care
    Program. J Clin Epidemiol 2011 Nov;64(11):1187-97.
    PMID: 21477993.
  • Glas AS, Lijmer JG, Prins MH, et al. The
    diagnostic odds ratio: a single indicator of test
    performance. J Clin Epidemiol 2003
    Nov;56(11):1129-35. PMID: 14615004.

52
References (3 of 9)
  • Harbord RM, Deeks JJ, Egger M, et al. A
    unification of models for meta-analysis of
    diagnostic accuracy studies. Biostatistics 2007
    Apr;8(2):239-51. PMID: 16698768.
  • Harbord RM, Whiting P, Sterne JA, et al. An
    empirical comparison of methods for meta-analysis
    of diagnostic accuracy showed hierarchical models
    are necessary. J Clin Epidemiol 2008
    Nov;61(11):1095-103. PMID: 19208372.
  • Hartmann KE, Matchar DB, Chang S. Chapter 6:
    assessing applicability of medical test studies
    in systematic reviews. J Gen Intern Med 2012
    Jun;27 Suppl 1:S39-46. PMID: 22648674.
  • Irwig L, Tosteson AN, Gatsonis C, et al.
    Guidelines for meta-analyses evaluating
    diagnostic tests. Ann Intern Med 1994 Apr
    15;120(8):667-76. PMID: 8135452.

53
References (4 of 9)
  • Kardaun JW, Kardaun OJ. Comparative diagnostic
    performance of three radiological procedures for
    the detection of lumbar disk herniation. Methods
    Inf Med 1990 Jan;29(1):12-22. PMID: 2308524.
  • Kester AD, Buntinx F. Meta-analysis of ROC
    curves. Med Decis Making 2000 Oct-Dec;20(4):430-9.
    PMID: 11059476.
  • Lau J, Ioannidis JP, Balk E, et al. Evaluation of
    technologies for identifying acute cardiac
    ischemia in emergency departments. Evid Rep
    Technol Assess (Summ) 2000 Sep;(26):1-4. PMID:
    11079073.
  • Lau J, Ioannidis JP, Schmid CH. Summing up
    evidence: one answer is not always enough. Lancet
    1998 Jan 10;351(9096):123-7. PMID: 9439507.
  • Leeflang MM, Bossuyt PM, Irwig L. Diagnostic test
    accuracy may vary with prevalence: implications
    for evidence-based diagnosis. J Clin Epidemiol
    2009 Jan;62(1):5-12. PMID: 18778913.

54
References (5 of 9)
  • Lijmer JG, Bossuyt PM, Heisterkamp SH. Exploring
    sources of heterogeneity in systematic reviews of
    diagnostic tests. Stat Med 2002 Jun
    15;21(11):1525-37. PMID: 12111918.
  • Lijmer JG, Mol BW, Heisterkamp S, et al.
    Empirical evidence of design-related bias in
    studies of diagnostic tests. JAMA 1999 Sep
    15;282(11):1061-6. PMID: 10493205.
  • Littenberg B, Moses LE. Estimating diagnostic
    accuracy from multiple conflicting reports: a new
    meta-analytic method. Med Decis Making 1993
    Oct-Dec;13(4):313-21. PMID: 8246704.
  • Loong TW. Understanding sensitivity and
    specificity with the right side of the brain. BMJ
    2003 Sep 27;327(7417):716-9. PMID: 14512479.
  • Moses LE, Shapiro D, Littenberg B. Combining
    independent studies of a diagnostic test into a
    summary ROC curve: data-analytic approaches and
    some additional considerations. Stat Med 1993 Jul
    30;12(14):1293-316. PMID: 8210827.

55
References (6 of 9)
  • Mulherin SA, Miller WC. Spectrum bias or spectrum
    effect? Subgroup variation in diagnostic test
    evaluation. Ann Intern Med 2002 Oct
    1;137(7):598-602. PMID: 12353947.
  • Oei EH, Nikken JJ, Verstijnen AC, et al. MR
    imaging of the menisci and cruciate ligaments: a
    systematic review. Radiology 2003
    Mar;226(3):837-48. PMID: 12601211.
  • Reitsma JB, Glas AS, Rutjes AW, et al. Bivariate
    analysis of sensitivity and specificity produces
    informative summary measures in diagnostic
    reviews. J Clin Epidemiol 2005 Oct;58(10):982-90.
    PMID: 16168343.
  • Riley RD, Abrams KR, Lambert PC, et al. An
    evaluation of bivariate random-effects
    meta-analysis for the joint synthesis of two
    correlated outcomes. Stat Med 2007 Jan
    15;26(1):78-97. PMID: 16526010.

56
References (7 of 9)
  • Riley RD, Abrams KR, Sutton AJ, et al. Bivariate
    random-effects meta-analysis and the estimation
    of between-study correlation. BMC Med Res
    Methodol 2007 Jan 12;7:3. PMID: 17222330.
  • Rutjes AW, Reitsma JB, Di Nisio M, et al.
    Evidence of bias and variation in diagnostic
    accuracy studies. CMAJ 2006 Feb 14;174(4):469-76.
    PMID: 16477057.
  • Rutter CM, Gatsonis CA. Regression methods for
    meta-analysis of diagnostic test data. Acad
    Radiol 1995 Mar;2 Suppl 1:S48-56; discussion
    S65-7, S70-1 passim. PMID: 9419705.
  • Simel DL, Bossuyt PM. Differences between
    univariate and bivariate models for summarizing
    diagnostic accuracy may not be large. J Clin
    Epidemiol 2009 Dec;62(12):1292-300. PMID:
    19447007.
  • Thompson SG, Sharp SJ. Explaining heterogeneity
    in meta-analysis: a comparison of methods. Stat
    Med 1999 Oct 30;18(20):2693-708. PMID: 10521860.

57
References (8 of 9)
  • Trikalinos TA, Coleman CI, Griffith L, et al.
    Meta-analysis of test performance when there is a
    gold standard. In: Chang SM and Matchar DB,
    eds. Methods guide for medical test reviews.
    Rockville, MD: Agency for Healthcare Research and
    Quality; June 2012. p. 8.1-21. AHRQ Publication
    No. 12-EHC017. Available at
    www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.
  • Trikalinos TA, Chung M, Lau J, et al. Systematic
    review of screening for bilirubin encephalopathy
    in neonates. Pediatrics 2009 Oct;124(4):1162-71.
    PMID: 19786450.
  • Visser K, Hunink MG. Peripheral arterial disease:
    gadolinium-enhanced MR angiography versus
    color-guided duplex US--a meta-analysis.
    Radiology 2000 Jul;216(1):67-77. PMID: 10887229.
  • Whiting P, Rutjes AW, Reitsma JB, et al. Sources
    of variation and bias in studies of diagnostic
    accuracy: a systematic review. Ann Intern Med
    2004 Feb 3;140(3):189-202. PMID: 14757617.

58
References (9 of 9)
  • Zwinderman AH, Bossuyt PM. We should not pool
    diagnostic likelihood ratios in systematic
    reviews. Stat Med 2008 Feb 28;27(5):687-97. PMID:
    17611957.