Options for Summarizing Medical Test Performance in the Absence of a Gold Standard

1
Options for Summarizing Medical Test
Performance in the Absence of a Gold Standard
  • Prepared for:
  • The Agency for Healthcare Research and Quality
    (AHRQ)
  • Training Modules for Medical Test Reviews Methods
    Guide
  • www.ahrq.gov

2
Learning Objectives
  • Recognize settings where the reference standard
    may be imperfect (i.e., no gold standard)
  • Describe sources of potential bias resulting from
    the use of an imperfect reference standard when
    estimating the sensitivity and specificity of a
    medical test
  • Understand the options for analyzing data, their
    advantages and justification, and potential
    assumptions

Trikalinos TA, Balion CM. Options for summarizing
medical test performance in the absence of a
gold standard. In: Methods guide for medical
test reviews. Available at
www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.
3
Introduction: Classical Paradigm

                 Truly Diseased        Truly Healthy
Index test (+)   True positive (TP)    False positive (FP)
Index test (-)   False negative (FN)   True negative (TN)
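
As a minimal sketch of the classical paradigm, assuming the reference standard is a true gold standard, sensitivity and specificity follow directly from the four cells of the table. The function name and the counts are hypothetical illustrations, not data from the module.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the cells of a 2-by-2 table."""
    sensitivity = tp / (tp + fn)  # share of truly diseased who test positive
    specificity = tn / (tn + fp)  # share of truly healthy who test negative
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
se, sp = sensitivity_specificity(tp=80, fp=10, fn=20, tn=90)
print(f"Se = {se:.2f}, Sp = {sp:.2f}")
```

These formulas are only meaningful when the column labels (truly diseased, truly healthy) are correct, which is the assumption the rest of the module relaxes.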
 
4
Introduction: Reference Standard Issues
  • Sometimes, true status is directly observable
    (e.g., for tests predicting short-term mortality
    after a procedure).
  • True status is commonly based on a reference
    standard (test), which is considered to be a
    gold standard if it actually reflects the
    true status.
  • Reference standard bias arises when the
    reference test does not mirror the truth well.
  • The further the reference test deviates from the
    truth, the less accurate the estimate of the
    index test's performance.
  • An imperfect reference standard is a reference
    standard test that misclassifies true status at
    a rate that cannot be ignored.

5
Imperfect Reference Standard Scenario (1 of 2)
  • The simplest case is an index test and a
    reference standard that give dichotomous results
    (e.g., positive or negative for disease).
  • Both the index and reference tests can err by not
    reflecting the true status.
  • The example in the following slide shows true
    2-by-2 table probabilities in relation to the
    eight combinations of index and reference test
    results.
  • These eight probabilities (α1, β1, γ1, δ1, α2,
    β2, γ2, and δ2) need to be estimated from the
    accuracy data.
  • The perfect reference standard is the gold
    standard.

6
Imperfect Reference Standard Scenario (2 of 2)

                 Truly Diseased       Truly Healthy
                 RS (+)    RS (-)     RS (+)    RS (-)
Index test (+)   α1        β2         α2        β1
Index test (-)   γ1        δ2         γ2        δ1

                 RS (+)         RS (-)
Index test (+)   α = α1 + α2    β = β1 + β2
Index test (-)   γ = γ1 + γ2    δ = δ1 + δ2
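
The link between the two tables can be sketched as follows: each observed cell is the sum of two unobservable cells, one from the truly diseased and one from the truly healthy. The eight probabilities below are hypothetical values chosen for illustration (in practice they are unknown), and the dictionary keys and function names are assumptions of this sketch.

```python
# Keys are (true status, index result, reference result); values are
# hypothetical joint probabilities that sum to 1.
true_probs = {
    ("diseased", "+", "+"): 0.192, ("diseased", "+", "-"): 0.048,
    ("diseased", "-", "+"): 0.048, ("diseased", "-", "-"): 0.012,
    ("healthy", "+", "+"): 0.007, ("healthy", "+", "-"): 0.063,
    ("healthy", "-", "+"): 0.063, ("healthy", "-", "-"): 0.567,
}


def observed_table(probs):
    """Collapse over true status: each observed cell sums two true cells."""
    cells = {}
    for (_, index, ref), p in probs.items():
        cells[(index, ref)] = cells.get((index, ref), 0.0) + p
    return cells


def naive_sensitivity(probs):
    """Sensitivity computed as if reference-positive meant diseased."""
    obs = observed_table(probs)
    return obs[("+", "+")] / (obs[("+", "+")] + obs[("-", "+")])


def true_sensitivity(probs):
    """Sensitivity computed among the truly diseased."""
    diseased = {k: p for k, p in probs.items() if k[0] == "diseased"}
    positive = sum(p for k, p in diseased.items() if k[1] == "+")
    return positive / sum(diseased.values())


obs = observed_table(true_probs)
naive_se = naive_sensitivity(true_probs)
true_se = true_sensitivity(true_probs)
print(f"naive Se = {naive_se:.3f}, true Se = {true_se:.3f}")
```

Only the four observed cells are available from a study, which is why the eight underlying probabilities cannot be recovered without additional information.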
7
Imperfect Reference Standard Bias (1 of 2)
8
Imperfect Reference Standard Bias (2 of 2)
  • Naïve estimates are underestimates versus true
    values when test results are independent among
    those with and without the condition of interest
    (conditional independence).

Abbreviations: Se_index = index test sensitivity;
Sp_index = index test specificity; P = disease prevalence
Solid red line: true sensitivity
Dashed red line: true specificity
Solid black line: naïve sensitivity
Dashed black line: naïve specificity
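
A minimal sketch of why naïve estimates fall below the true values under conditional independence: the expected naïve sensitivity is a weighted average of the index test's true sensitivity and its false-positive rate. The function name and the accuracy values are hypothetical and chosen only for illustration.

```python
def naive_estimates(se_index, sp_index, se_ref, sp_ref, prevalence):
    """Expected naive Se and Sp of the index test against an imperfect
    reference standard, assuming conditional independence."""
    p = prevalence
    # Joint probabilities of the observable (index, reference) results.
    both_pos = p * se_index * se_ref + (1 - p) * (1 - sp_index) * (1 - sp_ref)
    both_neg = p * (1 - se_index) * (1 - se_ref) + (1 - p) * sp_index * sp_ref
    ref_pos = p * se_ref + (1 - p) * (1 - sp_ref)
    naive_se = both_pos / ref_pos
    naive_sp = both_neg / (1 - ref_pos)
    return naive_se, naive_sp


# Hypothetical true accuracy: index and reference both Se = 0.8, Sp = 0.9.
results = {}
for prev in (0.1, 0.3, 0.5):
    results[prev] = naive_estimates(0.8, 0.9, 0.8, 0.9, prev)
    n_se, n_sp = results[prev]
    print(f"P = {prev:.1f}: naive Se = {n_se:.3f}, naive Sp = {n_sp:.3f}")
```

In this illustration the size of the bias changes with prevalence, but its direction does not.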
9
Reference Standard Validity
  • Only rarely are we absolutely sure that the
    reference standard is a perfect reflection of the
    truth.
  • Often, we are comfortable with overlooking small
    or moderate misclassifications by the reference
    standard.
  • Hard-and-fast rules for judging the (in)adequacy
    of the reference standard do not exist.
  • Consult content experts on a case-by-case basis
    to make judgments.
  • There are three settings in which one might
    question the validity of the reference standard:
    • The reference method yields different
      measurements over time or across settings.
    • The condition of interest is variably defined.
    • The new method is an improved version of a
      usually applied test.

10
Imperfect Reference Standard: Setting 1
  • Situation: The reference method yields different
    measurements over time or across settings.
  • Example: Diagnosis of obstructive sleep apnea
    typically requires a high Apnea-Hypopnea Index
    (AHI; an objective measurement) and the presence
    of suggestive symptoms and signs.
  • Problem: There is large night-to-night
    variability in measured AHI and substantial
    between-rater and between-laboratory variability.

11
Imperfect Reference Standard: Setting 2
  • Situation: The condition of interest is variably
    defined.
  • Example: The disease, such as psoriatic
    arthritis, is complex.
  • Problem: There is no single symptom, sign, or
    measurement that suffices to make a diagnosis of
    the disease with certainty. Instead, a set of
    diagnostic criteria (symptoms, signs, imaging
    results, and laboratory measures) is used to
    identify the disease, and these criteria will
    unavoidably be applied differently across studies.

12
Imperfect Reference Standard: Setting 3
  • Situation: The new method is an improved version
    of a usually applied test.
  • Example: Measurement of parathyroid hormone (PTH).
  • Problem: Older measurement methodologies are
    being replaced by newer, more specific ones.
    • Measurements with the new and old methodologies
      do not agree very well.
    • It is incorrect to use the older method as the
      reference standard.

13
Analytic Options for a Systematic Review
  • Analytic options 1 and 2 below are preferred when
    possible to summarize data from two fallible
    tests; option 3 is also suitable.
    1. Forgo the classical paradigm, which focuses on
       test accuracy; instead, assess the ability of
       the index test to predict patient outcomes
       (using the index test as a predictive
       instrument).
    2. Forgo the classical paradigm; instead, assess
       agreement of the index and reference test
       results, that is, treat index and reference
       tests as two alternative measurement methods.
    3. Using the classical paradigm, calculate naïve
       estimates of the index test's sensitivity and
       specificity, but qualify study findings to avoid
       misinterpretation.
    4. Mathematically adjust the naïve estimates of
       the index test's sensitivity and specificity to
       account for the imperfect reference standard.

14
Analysis Option 1: Focus on Prediction of Patient
Outcomes
  • Forgo the classical paradigm, which compares the
    index test to a reference standard (test
    accuracy).
  • Test accuracy is not informative or
    interpretable with an imperfect reference
    standard.
  • Instead, assess the ability of the index test to
    predict patient outcomes such as history, future
    clinical events, and response to therapy.
  • This option follows a well-known paradigm in
    systematic reviews for evaluating prognostic
    tests (more information is available in Module
    12).

15
Analysis Option 2: Focus on the Agreement of
Index and Reference Tests (1 of 2)
  • Forgo the classical paradigm (test accuracy).
  • Instead, assess agreement (concordance) of the
    index and reference test results.
  • Simply treat the index and reference tests as two
    alternative measurement methods.
  • How to do this depends on whether the results are
    categorical or continuous.
  • For categorical test results:
    • Cohen's kappa statistic is a measure of
      categorical agreement that accounts for agreement
      by chance.
    • Meta-analyses of kappa statistics are not common
      in the medical literature; they will need to be
      explained and interpreted in detail.
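
For dichotomous results, Cohen's kappa can be sketched from the cross-tabulation of index and reference test results. The function name and the counts are hypothetical; the formula is the standard one (observed agreement corrected for the agreement expected from the marginal proportions).

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for a 2-by-2 agreement table.

    a: both tests positive, b: index positive / reference negative,
    c: index negative / reference positive, d: both tests negative.
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from the marginal proportions of each test.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 85 percent raw agreement, 50 percent expected by chance.
kappa = cohens_kappa(a=40, b=10, c=5, d=45)
print(f"kappa = {kappa:.2f}")  # prints kappa = 0.70
```

Kappa treats the two tests symmetrically, which fits this option's premise that neither test is assumed to be the truth.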

16
Analysis Option 2: Focus on the Agreement of
Index and Reference Tests (2 of 2)
  • When test results are continuous and individual
    data points are available, the researcher can:
    • Directly compare measurements between tests
    • Pool data from all available studies and
      (1) Perform a regression of one test on the
          other, using a method that accounts for
          measurement error
      (2) Conduct a Bland-Altman analysis (difference
          versus the average of the two test results)
  • When test results are continuous but individual
    data points are not available, the researcher can:
    • Summarize study-level information from (1) or (2)
      above
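
A Bland-Altman analysis can be sketched as the mean difference between paired measurements (the bias) with 95-percent limits of agreement around it. The paired values below are hypothetical, the function name is an assumption, and the limits use the conventional bias plus or minus 1.96 standard deviations of the differences.

```python
import statistics


def bland_altman(index_values, reference_values):
    """Return the mean bias, the 95% limits of agreement, and the
    (average, difference) pairs that a Bland-Altman plot would display."""
    diffs = [i - r for i, r in zip(index_values, reference_values)]
    means = [(i + r) / 2 for i, r in zip(index_values, reference_values)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return bias, limits, list(zip(means, diffs))


# Hypothetical paired measurements from two methods.
index_values = [4, 12, 18, 30, 45]
reference_values = [5, 10, 20, 35, 40]
bias, (lower, upper), points = bland_altman(index_values, reference_values)
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

Wide limits of agreement indicate that the two methods cannot be used interchangeably, even when the mean bias is close to zero.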

17
Analysis Option 3: Calculate Naïve Estimates
and Discuss Bias
  • Calculate naïve estimates of the index test
    sensitivity (Se) and specificity (Sp), ignoring
    imperfection of the reference standard but making
    qualitative judgments on the direction of bias of
    these naïve estimates.
  • If index and reference tests are independent
    within strata of disease (conditional
    independence), naïve estimates of index test Se
    and Sp are biased downward (underestimated).
  • If index and reference tests are correlated within
    strata of disease, naïve estimates of Se and Sp
    can be:
    • Overestimates when the tests agree more than by
      chance
    • Underestimates when the tests disagree more than
      by chance
  • Problem: The researcher cannot assume conditional
    independence without justification; external data
    are needed.
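
The effect of conditional dependence on the direction of bias can be sketched by adding a covariance term between the two tests within each disease stratum. This covariance parameterization is an assumption of the sketch (one common way to model dependence), and all numeric values are hypothetical.

```python
def naive_estimates_dependent(se_index, sp_index, se_ref, sp_ref, prevalence,
                              cov_diseased=0.0, cov_healthy=0.0):
    """Expected naive Se and Sp of the index test when index and reference
    results may be correlated within the diseased and the healthy.

    A positive covariance means the tests agree more than by chance
    within that stratum; zero means conditional independence.
    """
    p = prevalence
    both_pos = (p * (se_index * se_ref + cov_diseased)
                + (1 - p) * ((1 - sp_index) * (1 - sp_ref) + cov_healthy))
    both_neg = (p * ((1 - se_index) * (1 - se_ref) + cov_diseased)
                + (1 - p) * (sp_index * sp_ref + cov_healthy))
    ref_pos = p * se_ref + (1 - p) * (1 - sp_ref)
    return both_pos / ref_pos, both_neg / (1 - ref_pos)


# Hypothetical true accuracy: Se = 0.8 and Sp = 0.9 for both tests, P = 0.3.
independent = naive_estimates_dependent(0.8, 0.9, 0.8, 0.9, 0.3)
dependent = naive_estimates_dependent(0.8, 0.9, 0.8, 0.9, 0.3,
                                      cov_diseased=0.10, cov_healthy=0.05)
print("independent:", [round(x, 3) for x in independent])
print("dependent:  ", [round(x, 3) for x in dependent])
```

With these illustrative values, the independent case falls below the true Se and Sp and the positively correlated case rises above them, matching the qualitative judgments in the bullets above.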

18
Analysis Option 3: Example
  • The prostate-specific antigen (PSA) test is used
    to detect prostate cancer.
  • Numerous methods have been developed to test PSA
    levels.
  • These tests are prone to false-negative
    misclassification: PSA levels are not elevated in
    up to 15 percent of prostate cancer cases.
  • Obesity can reduce serum PSA.
  • Obesity will likely affect all PSA-detection
    methods, old and new (conditional dependence).
  • Conditional dependence of PSA tests results in
    overestimation of the accuracy of a new (index)
    test.
  • When a PSA test is compared to a non-PSA
    reference (e.g., a prostate biopsy), there is no
    conditional dependence; misclassification results
    in underestimation.

19
Analysis Option 4: Mathematically Adjust Naïve
Estimates
  • Mathematically adjust (correct) the naïve
    estimates of the index test sensitivity and
    specificity to account for the imperfect
    reference standard.
  • Data from 2 × 2 tables are not enough; additional
    information is needed from the literature.
  • The task is easiest if conditional independence
    can be assumed and either:
    • The sensitivity and specificity of the imperfect
      reference test are known from other studies, or
    • The specificities of both the index test and the
      imperfect reference standard are known from
      other studies, but the sensitivities are unknown.
  • Bayesian inference can incorporate information
    from other studies as prior distributions rather
    than fixed values. It provides estimates of
    sensitivity, specificity, and disease prevalence.
  • Alternative sets of assumptions are possible.
  • Problem: Model misspecification can result in
    biased estimates.
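
For the simplest case above (conditional independence, with the reference test's sensitivity and specificity known from other studies), the adjustment has a closed form obtained by solving the equations that link the observed cells to the unknown index test accuracy and prevalence. This is a sketch under those stated assumptions, not the module's prescribed method; the function name and counts are hypothetical.

```python
def adjust_for_imperfect_reference(a, b, c, d, se_ref, sp_ref):
    """Adjusted Se and Sp of the index test and the disease prevalence.

    a: index+/reference+, b: index+/reference-,
    c: index-/reference+, d: index-/reference- (observed counts).
    Assumes conditional independence and known se_ref, sp_ref.
    """
    n = a + b + c + d
    p_a, p_b, p_c, p_d = a / n, b / n, c / n, d / n
    ref_pos = p_a + p_c
    youden = se_ref + sp_ref - 1  # must be > 0 for an informative reference
    prevalence = (ref_pos - (1 - sp_ref)) / youden
    se_index = (p_a * sp_ref - p_b * (1 - sp_ref)) / (ref_pos - (1 - sp_ref))
    sp_index = (p_d * se_ref - p_c * (1 - se_ref)) / (se_ref - ref_pos)
    return se_index, sp_index, prevalence


# Hypothetical counts generated from Se = 0.8, Sp = 0.9 for both tests and
# a prevalence of 0.3; the naive Se from this table would be 199/310 = 0.64.
se_adj, sp_adj, prev_adj = adjust_for_imperfect_reference(
    a=199, b=111, c=111, d=579, se_ref=0.8, sp_ref=0.9)
print(f"Se = {se_adj:.2f}, Sp = {sp_adj:.2f}, prevalence = {prev_adj:.2f}")
```

With real data, sampling error or a wrong assumed value for the reference test can push these estimates outside the 0 to 1 range, which is one way model misspecification shows itself.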

20
Example: Performing a Systematic Review on
Obstructive Sleep Apnea
  • Obstructive sleep apnea (OSA) is characterized by
    sleep disturbances secondary to upper airway
    obstruction.
  • OSA has a prevalence of 2 to 4 percent in
    middle-aged adults.
  • It is associated with daytime somnolence,
    cardiovascular morbidity, diabetes, and other
    adverse outcomes.
  • Treatment includes continuous positive airway
    pressure.
  • A systematic review on the diagnosis of OSA in
    the home setting used:
    • Portable monitors as the index diagnostic test
    • Facility-based polysomnography as the reference
      standard
  • The reviewers first attempted analysis option 3,
    then moved on to analysis option 2.

Trikalinos TA, Ip S, Raman G, et al. Home diagnosis of
obstructive sleep apnea-hypopnea syndrome. Technology
Assessment. Available at
www.cms.gov/Medicare/Coverage/DeterminationProcess/downloads/id48TA.pdf.
21
Systematic Review Example: Choice of Reference
Standard and Cutoff
  • There is no perfect or accepted reference
    standard for obstructive sleep apnea (OSA).
  • A diagnosis of OSA is based on suggestive signs
    and symptoms and objective assessment of
    breathing patterns during sleep with
    facility-based polysomnography (PSG).
  • PSG quantifies the Apnea-Hypopnea Index (AHI).
  • Portable monitors can also measure AHI.
  • A high AHI (usually judged against a cutoff of 15
    events per hour of sleep) is suggestive of OSA;
    alternative cutoffs range from 5 to 40
    events/hour.
  • The main analysis in the systematic review used
    an AHI cutoff of 15, but cutoffs of 10 and 20
    were also analyzed (there were too few data to
    analyze other cutoffs).
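
Analyzing several cutoffs can be sketched by dichotomizing paired AHI measurements at each cutoff and recomputing the naïve estimates. The measurements are hypothetical, and two choices are assumptions of this sketch: a result at or above the cutoff counts as positive, and the same cutoff is applied to both tests.

```python
def naive_accuracy_at_cutoff(index_ahi, reference_ahi, cutoff):
    """Naive Se and Sp of the index test after dichotomizing both tests."""
    tp = fp = fn = tn = 0
    for idx, ref in zip(index_ahi, reference_ahi):
        idx_pos, ref_pos = idx >= cutoff, ref >= cutoff  # assumed rule
        if idx_pos and ref_pos:
            tp += 1
        elif idx_pos:
            fp += 1
        elif ref_pos:
            fn += 1
        else:
            tn += 1
    se = tp / (tp + fn) if (tp + fn) else None
    sp = tn / (tn + fp) if (tn + fp) else None
    return se, sp


# Hypothetical paired AHI values (events per hour of sleep).
portable = [3, 8, 12, 14, 16, 22, 25, 40]
psg = [4, 11, 9, 17, 18, 19, 30, 35]
by_cutoff = {c: naive_accuracy_at_cutoff(portable, psg, c) for c in (10, 15, 20)}
for cutoff, (se, sp) in by_cutoff.items():
    print(f"cutoff {cutoff}: naive Se = {se:.2f}, naive Sp = {sp:.2f}")
```

Because polysomnography is itself imperfect, these remain naïve estimates at every cutoff.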

22
Systematic Review Example: Analysis Option 3,
Naïve Estimates
  • The reviewers calculated naïve estimates of the
    sensitivity (Se) and specificity (Sp) of the
    Apnea-Hypopnea Index by comparing portable
    monitors with polysomnography and qualified
    the results.
  • Naïve estimates of sensitivity and specificity
    were displayed in receiver operating
    characteristic (ROC) space.
  • High Se and Sp levels were suggested.
  • However, there was considerable variability in
    the measurements.
  • It was not possible to deduce whether the naïve
    estimates overestimate or underestimate the
    true Se and Sp.

23
Systematic Review Example: Analysis Option 2,
Pooled Data Analysis
  • To judge whether the tests are interchangeable,
    reviewers also described concordance between the
    Apnea-Hypopnea Index (AHI) measured by portable
    monitors (index test) and by polysomnography
    (reference test) using Bland-Altman analysis
    (continuous data with individual data points
    available).
  • They found better agreement for lower AHI levels.

Dashed line: line of perfect agreement
Broad limits: suboptimal agreement
24
Systematic Review Example: Analysis Option 2,
Study-Specific Results
  • The reviewers summarized Bland-Altman plots
    across studies.
  • The mean difference in the two measurements of
    the Apnea-Hypopnea Index (mean bias) and the
    95-percent limits of agreement are shown for each
    study.
  • The 95-percent limits of agreement are very wide
    in most studies, suggesting great variability in
    the measurements with the two methods.

25
Systematic Review Example: Conclusions and a
Recommendation
  • Measurements of the Apnea-Hypopnea Index (AHI)
    with the two methods generally agree on which
    patients have 15 or fewer events per hour of
    sleep (low AHI).
  • The methods disagree on the exact measurement
    among people who have higher AHIs on average.
  • The reviewers identified a gap in the literature.
  • The reviewers recommended undertaking studies
    that perform clinical validation of portable
    monitors, i.e., their ability to predict patients'
    history, risk propensity, or clinical profile
    (analysis option 1).

26
Overall Recommendations
  • When multiple reference standard tests, or
    multiple cutoffs for the same reference test, are
    available:
    • Justify the choice of test and/or cutoff, or
    • Consider analyzing multiple options
  • Decide on the most appropriate analysis options
    to synthesize test performance.
  • The four analysis options presented in this
    module are largely complementary approaches and
    are not mutually exclusive.
  • Analysis options 1, 2, and 3 are recommended.
  • Analysis option 4 requires expert statistical
    help.
  • There are no empirical data on the merits and
    pitfalls of the mathematical adjustments in
    option 4 for an imperfect reference standard.

27
Practice Question 1 (1 of 2)
  • The validity of the reference standard should be
    questioned when the new test being evaluated is
    an improved version of the usually applied test.
    a. True
    b. False

28
Practice Question 1 (2 of 2)
  • Explanation for Question 1
  • The statement is true. There are several
    situations when the validity of the reference
    standard should be questioned. These include when
    a new method of testing is an improved version of
    the usually applied test. Measurements using the
    different methods may not agree well.

29
Practice Question 2 (1 of 2)
  • Which of the following options is considered most
    preferable for evaluating information on a
    diagnostic test when there is no perfect
    reference test (gold standard)?
    a. Assess the test's ability to predict
       patient-relevant outcomes instead of test
       accuracy.
    b. Assess whether the results of the two tests
       agree or disagree and treat them as two
       alternative measurement methods.
    c. Calculate estimates of the index test's
       sensitivity and specificity from each study, but
       qualify the study findings.
    d. Adjust the estimates of sensitivity and
       specificity of the index test to account for the
       imperfect reference standard.

30
Practice Question 2 (2 of 2)
  • Explanation for Question 2
  • The correct answer is a. All of the options
    listed are suggested methods for synthesizing
    information on medical tests when there is no
    gold standard. The preferred method involves
    assessing the test's ability to predict
    patient-relevant outcomes instead of calculating
    test accuracy when compared with an imperfect
    standard. This way, the index test is treated as
    a predictive instrument.

31
Practice Question 3 (1 of 2)
  • When considering imperfect reference standard
    bias, which of the following applies to naïve
    estimates of sensitivity and specificity when
    there is conditional independence of the results?
    a. They are overestimates compared to the true
       values.
    b. They are underestimates compared to the true
       values.
    c. They are always equal to the true values.
    d. They cannot be compared to the true values.

32
Practice Question 3 (2 of 2)
  • Explanation for Question 3
  • The correct answer is b. Conditional independence
    implies that the results of the index and
    reference tests are independent among people with
    and without the condition of interest. In this
    case, estimates of sensitivity and specificity
    from the standard formulas will usually be
    smaller than the true values. In other words, the
    naïve estimates of sensitivity and specificity
    for the index test will be underestimates of the
    true values.

33
Practice Question 4 (1 of 2)
  • When evaluating a medical test with no gold
    standard, one can mathematically calculate
    accurate sensitivity and specificity of the index
    test using standard 2 × 2 cross-tabulation of
    test results.
    a. True
    b. False

34
Practice Question 4 (2 of 2)
  • Explanation for Question 4
  • The statement is false. The estimates of
    sensitivity and specificity will have to be
    adjusted to account for the imperfect reference
    standard. This may require expert statistical
    help.

35
Authors
  • This presentation was prepared by Brooke
    Heidenfelder, Andrzej Kosinski, Rachael Posey,
    Lorraine Sease, Remy Coeytaux, Gillian Sanders,
    and Alex Vaz, members of the Duke University
    Evidence-based Practice Center
  • The module is based on Trikalinos TA, Balion CM.
    Options for summarizing medical test performance
    in the absence of a gold standard. In: Chang SM
    and Matchar DB, eds. Methods guide for medical
    test reviews. Rockville, MD: Agency for
    Healthcare Research and Quality; June 2012. p.
    9.1-16. AHRQ Publication No. 12-EHC017. Available
    at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.

36
References (1 of 8)
  • Albert PS, Dodd LE. A cautionary note on the
    robustness of latent class models for estimating
    diagnostic error without a gold standard.
    Biometrics 2004 Jun;60(2):427-35. PMID: 15180668.
  • Altman DG, Bland JM. Absence of evidence is not
    evidence of absence. BMJ 1995 Aug;311(7003):485.
    PMID: 7647644.
  • Bablok W, Passing H, Bender R, et al. A general
    regression procedure for method transformation.
    Application of linear regression procedures for
    method comparison studies in clinical chemistry,
    Part III. J Clin Chem Clin Biochem 1988
    Nov;26(11):783-90. PMID: 3235954.
  • Black MA, Craig BA. Estimating disease prevalence
    in the absence of a gold standard. Stat Med 2002
    Sep 30;21(18):2653-69. PMID: 12228883.
  • Bland JM, Altman DG. Measuring agreement in
    method comparison studies. Stat Methods Med Res
    1999 Jun;8(2):135-60. PMID: 10501650.

37
References (2 of 8)
  • Bland JM, Altman DG. Applying the right
    statistics: analyses of measurement studies.
    Ultrasound Obstet Gynecol 2003 Jul;22(1):85-93.
    PMID: 12858311.
  • Bossuyt PM. Interpreting diagnostic test accuracy
    studies. Semin Hematol 2008 Jul;45(3):189-95.
    PMID: 18582626.
  • Dendukuri N, Hadgu A, Wang L. Modeling
    conditional dependence between diagnostic tests:
    a multiple latent variable model. Stat Med 2009
    Feb 1;28(3):441-61. PMID: 19067379.
  • Dendukuri N, Joseph L. Bayesian approaches to
    modeling the conditional dependence between
    multiple diagnostic tests. Biometrics 2001
    Mar;57(1):158-67. PMID: 11252592.
  • Garrett ES, Eaton WW, Zeger S. Methods for
    evaluating the performance of diagnostic tests in
    the absence of a gold standard: a latent class
    model approach. Stat Med 2002 May
    15;21(9):1289-307. PMID: 12111879.

38
References (3 of 8)
  • Gart JJ, Buck AA. Comparison of a screening test
    and a reference test in epidemiologic studies.
    II. A probabilistic model for the comparison of
    diagnostic tests. Am J Epidemiol 1966
    May;83(3):593-602. PMID: 5932703.
  • Goldberg JD, Wittes JT. The estimation of false
    negatives in medical screening. Biometrics 1978
    Mar;34(1):77-86. PMID: 630038.
  • Gyorkos TW, Genta RM, Viens P, et al.
    Seroepidemiology of Strongyloides infection in
    the Southeast Asian refugee population in Canada.
    Am J Epidemiol 1990 Aug;132(2):257-64. PMID:
    2196791.
  • Hui SL, Zhou XH. Evaluation of diagnostic tests
    without gold standards. Stat Methods Med Res 1998
    Dec;7(4):354-70. PMID: 9871952.
  • Joseph L, Gyorkos TW. Inferences for likelihood
    ratios in the absence of a "gold standard". Med
    Decis Making 1996 Oct-Dec;16(4):412-7. PMID:
    8912303.

39
References (4 of 8)
  • Jonas DE, Wilt TJ, Taylor BC, et al. Chapter 11:
    challenges in and principles for conducting
    systematic reviews of genetic tests used as
    predictive indicators. J Gen Intern Med 2012
    Jun;27 Suppl 1:S83-93. PMID: 22648679.
  • Linnet K. Estimation of the linear relationship
    between the measurements of two methods with
    proportional errors. Stat Med 1990
    Dec;9(12):1463-73. PMID: 2281234.
  • Linnet K. Performance of Deming regression
    analysis in case of misspecified analytical error
    ratio in method comparison studies. Clin Chem
    1998 May;44(5):1024-31. PMID: 9590376.
  • Qu Y, Tan M, Kutner MH. Random effects models in
    latent class analysis for evaluating accuracy of
    diagnostic tests. Biometrics 1996
    Sep;52(3):797-810. PMID: 8805757.

40
References (5 of 8)
  • Reitsma JB, Rutjes AW, Khan KS, et al. A review
    of solutions for diagnostic accuracy studies with
    an imperfect or missing reference standard. J
    Clin Epidemiol 2009 Aug;62(8):797-806. PMID:
    19447581.
  • Rutjes AW, Reitsma JB, Coomarasamy A, et al.
    Evaluation of diagnostic tests when there is no
    gold standard. A review of methods. Health
    Technol Assess 2007 Dec;11(50):iii, ix-51. PMID:
    18021577.
  • Sokal RR, Rohlf EF. Biometry. New York, NY:
    Freeman; 1981.
  • Sun S. Meta-analysis of Cohen's kappa. Health
    Serv Outcomes Res Method 2011;11:145-163.
  • Thompson IM, Pauler DK, Goodman PJ, et al.
    Prevalence of prostate cancer among men with a
    prostate-specific antigen level ≤4.0 ng per
    milliliter. N Engl J Med 2004 May
    27;350(22):2239-46. PMID: 15163773.

41
References (6 of 8)
  • Toft N, Jorgensen E, Hojsgaard S. Diagnosing
    diagnostic tests: evaluating the assumptions
    underlying the estimation of sensitivity and
    specificity in the absence of a gold standard.
    Prev Vet Med 2005 Apr;68(1):19-33. PMID:
    15795013.
  • Torrance-Rynard VL, Walter SD. Effects of
    dependent errors in the assessment of diagnostic
    test performance. Stat Med 1997 Oct
    15;16(19):2157-75. PMID: 9330426.
  • Trikalinos TA, Balion CM. Options for summarizing
    medical test performance in the absence of a
    gold standard. In: Chang SM and Matchar DB,
    eds. Methods guide for medical test reviews.
    Rockville, MD: Agency for Healthcare Research and
    Quality; June 2012. p. 9.1-16. AHRQ Publication
    No. 12-EHC017. Available at
    www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.

42
References (7 of 8)
  • Trikalinos TA, Balion CM, Coleman CI, et al.
    Chapter 8: meta-analysis of medical test
    performance when there is a gold standard. J
    Gen Intern Med 2012 Jun;27 Suppl 1:S56-66. PMID:
    22648676.
  • Trikalinos TA, Ip S, Raman G, et al. Home
    diagnosis of obstructive sleep apnea-hypopnea
    syndrome. Technology Assessment (Prepared by the
    Tufts-New England Medical Center Evidence-based
    Practice Center). Rockville, MD: Agency for
    Healthcare Research and Quality; August 2007.
    Available at
    www.cms.gov/Medicare/Coverage/DeterminationProcess/downloads/id48TA.pdf.
  • Vacek PM. The effect of conditional dependence on
    the evaluation of diagnostic tests. Biometrics
    1985 Dec;41(4):959-68. PMID: 3830260.
  • Walter SD, Irwig LM. Estimation of test error
    rates, disease prevalence and relative risk from
    misclassified data: a review. J Clin Epidemiol
    1988;41(9):923-37. PMID: 3054000.

43
References (8 of 8)
  • Walter SD, Irwig L, Glasziou PP. Meta-analysis of
    diagnostic tests with imperfect reference
    standards. J Clin Epidemiol 1999
    Oct;52(10):943-51. PMID: 10513757.
  • Whiting P, Rutjes AW, Reitsma JB, et al. Sources
    of variation and bias in studies of diagnostic
    accuracy: a systematic review. Ann Intern Med
    2004 Feb 3;140(3):189-202. PMID: 14757617.