Multiple Tests, Multivariable Decision Rules, and Studies of Diagnostic Test Accuracy - PowerPoint PPT Presentation

1
Multiple Tests, Multivariable Decision Rules, and
Studies of Diagnostic Test Accuracy
Chapter 8: Multiple Tests and Multivariable Decision Rules
Chapter 5: Studies of Diagnostic Test Accuracy
Michael A. Kohn, MD, MPP 10/19/2006
2
Outline of Topics
  • Combining results of multiple tests: importance
    of test non-independence
  • Recursive Partitioning
  • Logistic Regression
  • Published rules for combining test results:
    importance of validation separate from derivation
  • Biases in studies of diagnostic test accuracy
    • Overfitting bias
    • Incorporation bias
    • Referral bias
    • Double gold standard bias
    • Spectrum bias

3
Warning: Different Example
  • Example of combining two tests in this talk:
    Prenatal sonographic Nuchal Translucency (NT) and
    Nasal Bone Exam (NBE) as dichotomous tests for
    Trisomy 21
  • Example of combining two tests in book:
    Premature birth (GA < 36 weeks) and low birth
    weight (BW < 2500 grams) as dichotomous tests for
    neonatal morbidity

Cicero, S., G. Rembouskos, et al. (2004).
"Likelihood ratio for trisomy 21 in fetuses with
absent nasal bone at the 11-14-week scan."
Ultrasound Obstet Gynecol 23(3): 218-23. (Soon
to be replaced.)
4
If NT ≥ 3.5 mm, Positive for Trisomy 21
What's wrong with this definition?
6
  • In general, don't make multi-level tests like NT
    into dichotomous tests by choosing a fixed cutoff
  • I did it here to make the discussion of multiple
    tests easier
  • I arbitrarily chose to call ≥ 3.5 mm positive

7
One Dichotomous Test
                          Trisomy 21
  Nuchal Translucency     D+      D-      LR
  ≥ 3.5 mm                212     478     7.0
  < 3.5 mm                121     4745    0.4
  Total                   333     5223

Do you see that the LR of 7.0 is (212/333)/(478/5223)?
Review of Chapter 3: What are the sensitivity,
specificity, PPV, and NPV of this test? (Be
careful.)
8
Nuchal Translucency
  • Sensitivity = 212/333 = 64%
  • Specificity = 4745/5223 = 91%
  • Prevalence = 333/(333 + 5223) = 6%
  • (Study population: pregnant women about to
    undergo CVS, so high prevalence of Trisomy 21)
  • PPV = 212/(212 + 478) = 31%
  • NPV = 4745/(121 + 4745) = 97.5%

The NPV is not that great: prior to the test, P(D-) was already 94%.
9
Clinical Scenario: One Test
Pre-Test Probability of Down's = 6%, NT Positive
  • Pre-test prob = 0.06
  • Pre-test odds = 0.06/0.94 = 0.064
  • LR(+) = 7.0
  • Post-Test Odds = Pre-Test Odds x LR(+)
    = 0.064 x 7.0 = 0.44
  • Post-Test prob = 0.44/(0.44 + 1) = 0.31
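
The odds arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, and the counts are the NT table values (212/333 and 478/5223).

```python
def prob_to_odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def odds_to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def post_test_prob(pre_test_prob, lr):
    """Post-test odds = pre-test odds x LR, then convert back to a probability."""
    return odds_to_prob(prob_to_odds(pre_test_prob) * lr)


# LR(+) for NT >= 3.5 mm, taken from the 2x2 table
lr_nt_pos = (212 / 333) / (478 / 5223)

# Pre-test probability of 6%, positive NT
p_post = post_test_prob(0.06, lr_nt_pos)

print(round(lr_nt_pos, 1), round(p_post, 2))  # 7.0 0.31
```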

10
Clinical Scenario: One Test
Using Probabilities
Pre-Test Probability of Tri21 = 6%
NT Positive
Post-Test Probability of Tri21 = 31%
Using Odds
Pre-Test Odds of Tri21 = 0.064
NT Positive (LR = 7.0)
Post-Test Odds of Tri21 = 0.44
11
Clinical Scenario: One Test
Pre-Test Probability of Tri21 = 6%, NT Positive
[Figure: log-odds scale with an arrow for NT+ (LR = 7.0) running from
odds = 0.064 (prob = 0.06) to odds = 0.44 (prob = 0.31)]
  Log10(Odds)   -2      -1.5    -1      -0.5    0      0.5    1
  Odds          1:100   1:33    1:10    1:3     1:1    3:1    10:1
  Prob          0.01    0.03    0.09    0.25    0.5    0.75   0.91
12
Nasal Bone Seen = NBE Negative for Trisomy 21
Nasal Bone Absent = NBE Positive for Trisomy 21
13
Second Dichotomous Test
  Nasal Bone    Tri21+    Tri21-    LR
  Absent        229       129       27.8
  Present       104       5094      0.32
  Total         333       5223

Do you see that the LR of 27.8 is (229/333)/(129/5223)?
14
Clinical Scenario: Two Tests
Using Probabilities
Pre-Test Probability of Trisomy 21 = 6%
NT Positive for Trisomy 21 (≥ 3.5 mm)
Post-NT Probability of Trisomy 21 = 31%
NBE Positive for Trisomy 21 (no bone seen)
Post-NBE Probability of Trisomy 21 = ?
15
Clinical Scenario: Two Tests
Using Odds
Pre-Test Odds of Tri21 = 0.064
NT Positive (LR = 7.0)
Post-Test Odds of Tri21 = 0.44
NBE Positive (LR = 27.8?)
Post-Test Odds of Tri21 = 0.44 x 27.8? = 12.4?
(P = 12.4/(1 + 12.4) = 92.5%?)
16
Clinical Scenario: Two Tests
Pre-Test Probability of Trisomy 21 = 6%, NT ≥ 3.5 mm AND
Nasal Bone Absent
[Figure: log-odds scale. One arrow shows NT+ alone (LR = 6.96), from
odds = 0.064 (prob = 0.06) to odds = 0.44 (prob = 0.31). A longer arrow
shows NBE+ alone (LR = 27.8). "Can we do this?": placing the NT+ and NBE+
arrows end to end would reach odds = 12.4 (prob = 0.925).]
  Log10(Odds)   -2      -1.5    -1      -0.5    0      0.5    1
  Odds          1:100   1:33    1:10    1:3     1:1    3:1    10:1
  Prob          0.01    0.03    0.09    0.25    0.5    0.75   0.91
17
Question
  • Can we use the post-test odds after a positive
    Nuchal Translucency as the pre-test odds for the
    positive Nasal Bone Examination?
  • i.e., can we combine the positive results by
    multiplying their LRs?
  • LR(NT+, NBE+) = LR(NT+) x LR(NBE+)?
  • = 7.0 x 27.8?
  • = 194?

18
Answer: No
  NT      NBE     Trisomy 21 +   %       Trisomy 21 -   %       LR
  Pos     Pos     158            47%     36             0.7%    69
  Pos     Neg     54             16%     442            8.5%    1.9
  Neg     Pos     71             21%     93             1.8%    12
  Neg     Neg     50             15%     4652           89%     0.2
  Total           333            100%    5223           100%
The LR for NT+ and NBE+ together is 69, not 194.
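
As a rough check on this table, the sketch below recomputes the LR for the joint result (NT+, NBE+) from the four-row counts and compares it with the product of the two single-test LRs. The dictionary layout and function names are just for illustration.

```python
# Counts from the slide: (NT result, NBE result) -> (Trisomy 21 +, Trisomy 21 -)
counts = {
    ("pos", "pos"): (158, 36),
    ("pos", "neg"): (54, 442),
    ("neg", "pos"): (71, 93),
    ("neg", "neg"): (50, 4652),
}
n_dis = sum(d for d, _ in counts.values())     # 333 with Trisomy 21
n_nondis = sum(n for _, n in counts.values())  # 5223 without


def joint_lr(nt, nbe):
    """LR for one specific combination of NT and NBE results."""
    d, n = counts[(nt, nbe)]
    return (d / n_dis) / (n / n_nondis)


def single_test_lr(test_index, result):
    """LR for one test alone (0 = NT, 1 = NBE), ignoring the other test."""
    d = sum(v[0] for k, v in counts.items() if k[test_index] == result)
    n = sum(v[1] for k, v in counts.items() if k[test_index] == result)
    return (d / n_dis) / (n / n_nondis)


lr_joint = joint_lr("pos", "pos")
lr_product = single_test_lr(0, "pos") * single_test_lr(1, "pos")

print(round(lr_joint, 1))  # 68.8, the LR actually observed
print(round(lr_product))   # 194, what independence would predict
```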
19
Non-Independence
  • Absence of the nasal bone does not tell you as
    much if you already know that the nuchal
    translucency is ≥ 3.5 mm.

20
Clinical Scenario
Using Odds
Pre-Test Odds of Tri21 = 0.064
NT+/NBE+ (LR = 68.8)
Post-Test Odds = 0.064 x 68.8 = 4.40
(P = 4.40/(1 + 4.40) = 81%, not 92.5%)
21
Non-Independence
[Figure: log-odds scale with arrows for NT+ alone and NBE+ alone.
"NT+ and NBE+ if tests were independent" places the two arrows end to end.
"NT+ and NBE+ since tests are dependent" is a single shorter arrow that
ends at prob = 0.81.]
  Log10(Odds)   -2      -1.5    -1      -0.5    0      0.5    1
  Odds          1:100   1:33    1:10    1:3     1:1    3:1    10:1
  Prob          0.01    0.03    0.09    0.25    0.5    0.75   0.91
22
Non-Independence of NT and NBE
  • Apparently, even in chromosomally normal fetuses,
    enlarged NT and absence of the nasal bone are
    associated. A false positive on the NT makes a
    false positive on the NBE more likely. Of normal
    (D-) fetuses with NT < 3.5 mm, only 2.0% had nasal
    bone absent. Of normal (D-) fetuses with NT
    ≥ 3.5 mm, 7.5% had nasal bone absent.

Some (but not all) of this may have to do with
ethnicity. In this London study, chromosomally
normal fetuses of Afro-Caribbean ethnicity had
both larger NTs and more frequent absence of the
nasal bone.
In Trisomy 21 (D+) fetuses, normal NT was
associated with the presence of the nasal bone,
so a false negative on the NT was associated with
a false negative on the NBE.
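
The non-independence described above can be checked directly from the four-row NT/NBE table. The sketch below computes, within each disease group, how often the nasal bone is absent given the NT result. The D- figures reproduce the 2.0% and 7.5% quoted on the slide; the D+ figures are derived here from the same counts and are not quoted in the talk.

```python
# Counts from the NT/NBE table: (NT, NBE) -> (Trisomy 21 +, Trisomy 21 -)
counts = {
    ("pos", "pos"): (158, 36),
    ("pos", "neg"): (54, 442),
    ("neg", "pos"): (71, 93),
    ("neg", "neg"): (50, 4652),
}


def p_nbe_pos_given_nt(nt, column):
    """P(nasal bone absent | NT result) within one column (0 = D+, 1 = D-)."""
    pos = counts[(nt, "pos")][column]
    neg = counts[(nt, "neg")][column]
    return pos / (pos + neg)


normal_nt_neg = p_nbe_pos_given_nt("neg", 1)  # 93 / (93 + 4652)
normal_nt_pos = p_nbe_pos_given_nt("pos", 1)  # 36 / (36 + 442)
t21_nt_neg = p_nbe_pos_given_nt("neg", 0)     # 71 / (71 + 50)
t21_nt_pos = p_nbe_pos_given_nt("pos", 0)     # 158 / (158 + 54)

# If the tests were independent within each group, the two rates in each
# pair would be about equal. They are not.
print(round(100 * normal_nt_neg, 1), round(100 * normal_nt_pos, 1))  # 2.0 7.5
print(round(100 * t21_nt_neg), round(100 * t21_nt_pos))              # 59 75
```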
23
Non-Independence
  • Instead of looking for the nasal bone, what if
    the second test were just a repeat measurement of
    the nuchal translucency?
  • A second positive NT would do little to increase
    your certainty of Trisomy 21. If it was false
    positive the first time around, it is likely to
    be false positive the second time.

24
Reasons for Non-Independence
  • Tests measure the same aspect of disease.
  • Consider exercise ECG (EECG) and radionuclide
    scan as tests for coronary artery disease (CAD)
    with the gold standard being anatomic narrowing
    of the arteries on angiogram. Both EECG and
    nuclide scan measure functional narrowing. In a
    patient without anatomic narrowing (a D-
    patient), coronary artery spasm could cause false
    positives on both tests.

25
Reasons for Non-Independence
  • Spectrum of disease severity.
  • In the EECG/nuclide scan example, CAD is defined
    as ≥ 70% stenosis on angiogram. A D+ patient
    with 71% stenosis is much more likely to have a
    false negative on both the EECG and the nuclide
    scan than a D+ patient with 99% stenosis.

26
Reasons for Non-Independence
  • Spectrum of non-disease severity.
  • In this example, CAD is defined as ≥ 70% stenosis
    on angiogram. A D- patient with 69% stenosis is
    much more likely to have a false positive on both
    the EECG and the nuclide scan than a D- patient
    with 33% stenosis.

27
Counterexamples: Possibly Independent Tests
  • For Venous Thromboembolism:
    • CT Angiogram of Lungs and Doppler Ultrasound of
      Leg Veins
    • Alveolar Dead Space and D-Dimer
    • MRA of Lungs and MRV of leg veins

28
Unless tests are independent, we can't combine
results by multiplying LRs
29
Ways to Combine Multiple Tests
  • On a group of patients (derivation set), perform
    the multiple tests and determine true disease
    status (apply the gold standard)
  • Measure LR for each possible combination of
    results
  • Recursive Partitioning
  • Logistic Regression

30
Determine LR for Each Result Combination
  NT      NBE     Tri21+   %       Tri21-   %        LR     Post-Test Prob
  Pos     Pos     158      47%     36       0.7%     69     81%
  Pos     Neg     54       16%     442      8.5%     1.9    11%
  Neg     Pos     71       21%     93       1.8%     12     43%
  Neg     Neg     50       15%     4652     89.1%    0.2    1%
  Total           333      100%    5223     100%
Post-test probabilities assume pre-test prob = 6%
31
Determine LR for Each Result Combination
2 dichotomous tests: 4 combinations
3 dichotomous tests: 8 combinations
4 dichotomous tests: 16 combinations
Etc.
2 3-level tests: 9 combinations
3 3-level tests: 27 combinations
Etc.
32
Determine LR for Each Result Combination
How do you handle continuous tests?
Not practical for most groups of tests.
33
Recursive Partitioning: Measure NT First
34
Recursive Partitioning: Examine Nasal Bone First
35
Recursive Partitioning: Examine Nasal Bone First
CVS if P(Trisomy 21) > 5%
36
Recursive Partitioning: Examine Nasal Bone First
CVS if P(Trisomy 21) > 5%
37
Recursive Partitioning
  • Same as Classification and Regression Trees
    (CART)
  • Don't have to work out probabilities (or LRs) for
    all possible combinations of tests, because of
    tree pruning

38
Tree Pruning: Goldman Rule
  • 8 "Tests" for Acute MI in ER Chest Pain Patient:
    • ST Elevation on ECG
    • CP < 48 hours
    • ST-T changes on ECG
    • Hx of MI
    • Radiation of Pain to Neck/LUE
    • Longest pain > 1 hour
    • Age > 40 years
    • CP not reproduced by palpation

Goldman L, Cook EF, Brand DA, et al. A computer
protocol to predict myocardial infarction in
emergency department patients with chest pain. N
Engl J Med. 1988;318(13):797-803.
39
8 tests → 2^8 = 256 Combinations
41
Recursive Partitioning
  • Does not deal well with continuous test results
    when there is a monotonic relationship
    between the test result and the probability of
    disease

42
Logistic Regression
  • Ln(Odds(D+)) =
    a + b_NT x NT + b_NBE x NBE + b_interact x (NT)(NBE)
  • Coding of NT and NBE: + = 1, - = 0
  • More on this later in ATCR!

43
  • Logistic Regression Approach to the R/O ACI
    patient

                              Coefficient   MV Odds Ratio
  Constant                    -3.93
  Presence of chest pain       1.23         3.42
  Pain major symptom           0.88         2.41
  Male Sex                     0.71         2.03
  Age 40 or less              -1.44         0.24
  Age > 50                     0.67         1.95
  Male over 50 years          -0.43         0.65
  ST elevation                 1.314        3.72
  New Q waves                  0.62         1.86
  ST depression                0.99         2.69
  T waves elevated             1.095        2.99
  T waves inverted             1.13         3.10
  T wave + ST changes         -0.314        0.73
Selker HP, Griffith JL, D'Agostino RB. A tool
for judging coronary care unit admission
appropriateness, valid for both real-time and
retrospective use. A time-insensitive predictive
instrument (TIPI) for acute cardiac ischemia: a
multicenter study. Med Care. Jul
1991;29(7):610-627. For corrected coefficients,
see http://medg.lcs.mit.edu/cardiac/cpain.htm
44
Clinical Scenario
  • 71 y/o man with 2.5 hours of CP, substernal,
    non-radiating, described as bloating. Cannot
    say if same as prior MI or worse than prior
    angina.
  • Hx of CAD, s/p CABG 10 yrs prior, stenting 3
    years and 1 year ago. DM on Avandia.
  • ECG: RBBB, Qs inferiorly. No ischemic ST-T
    changes.

Real patient seen by MAK 1 am 10/12/04
46
                              Coefficient   Clinical Scenario   Result
  Constant                    -3.93                             -3.93
  Presence of chest pain       1.23         1                    1.23
  Pain major symptom           0.88         1                    0.88
  Sex                          0.71         1                    0.71
  Age 40 or less              -1.44         0                    0
  Age > 50                     0.67         1                    0.67
  Male over 50 years          -0.43         1                   -0.43
  ST elevation                 1.314        0                    0
  New Q waves                  0.62         0                    0
  ST depression                0.99         0                    0
  T waves elevated             1.095        0                    0
  T waves inverted             1.13         0                    0
  T wave + ST changes         -0.314        0                    0
  Sum (log odds)                                                -0.87
  Odds of ACI                                                    0.418952
  Probability of ACI                                             30%
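
The table's arithmetic can be reproduced with a short sketch: sum coefficient x value to get the log odds, exponentiate to get odds, then convert to a probability. The coefficients and patient values are the ones on the slide; the variable names are only for this example, and nothing here should be read as the published instrument's software.

```python
import math

# (coefficient, value for this patient) as listed on the slide
terms = {
    "Constant": (-3.93, 1),
    "Presence of chest pain": (1.23, 1),
    "Pain major symptom": (0.88, 1),
    "Male sex": (0.71, 1),
    "Age 40 or less": (-1.44, 0),
    "Age > 50": (0.67, 1),
    "Male over 50 years": (-0.43, 1),
    "ST elevation": (1.314, 0),
    "New Q waves": (0.62, 0),
    "ST depression": (0.99, 0),
    "T waves elevated": (1.095, 0),
    "T waves inverted": (1.13, 0),
    "T wave + ST changes": (-0.314, 0),
}

# Logistic model: ln(odds) = constant + sum of coefficient x indicator
log_odds = sum(coef * value for coef, value in terms.values())
odds = math.exp(log_odds)
prob = odds / (1.0 + odds)

print(round(log_odds, 2), round(odds, 3), round(prob, 2))  # -0.87 0.419 0.3
```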
47
What Happened to Pre-test Probability?
  • Typically clinical decision rules report
    probabilities rather than likelihood ratios for
    combinations of results.
  • Can "back out" LRs if we know prevalence, P(D+),
    in the study dataset.
  • With logistic regression models, this backing
    out is known as a prevalence offset. (See
    Chapter 8A.)
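
A minimal sketch of the "backing out" step, assuming only that the reported probability and the study prevalence are known: the LR is the post-test odds divided by the pre-test (prevalence) odds. The function name is made up for this example, and the numbers reuse the NT+/NBE+ row of the Trisomy 21 table from earlier in the talk.

```python
def lr_from_prob(post_prob, prevalence):
    """Back out the LR implied by a reported probability and the study prevalence."""
    post_odds = post_prob / (1.0 - post_prob)
    pre_odds = prevalence / (1.0 - prevalence)
    return post_odds / pre_odds


# In the study dataset, 158 of the 158 + 36 fetuses with NT+ and NBE+ had
# Trisomy 21, and the prevalence was 333 of 333 + 5223.
reported_prob = 158 / (158 + 36)
study_prevalence = 333 / (333 + 5223)

lr = lr_from_prob(reported_prob, study_prevalence)
print(round(lr, 1))  # 68.8, matching the LR in the combination table
```

On the log scale this is a subtraction: ln(LR) = ln(post-test odds) - ln(prevalence odds), which is why the term "offset" fits.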

48
Optimal Cutoff for a Single Continuous Test
  • Depends on
  • Pre-test Probability of Disease
  • ROC Curve (Likelihood Ratios)
  • Relative Misclassification Costs
  • Cannot choose an optimal cutoff with just the ROC
    curve.

49
Optimal Cutoff Line for Two Continuous Tests
50
Choosing Which Tests to Include in the Decision
Rule
  • Have focused on how to combine results of two or
    more tests, not on which of several tests to
    include in a decision rule.
  • Options include
  • Recursive partitioning
  • Automated stepwise logistic regression

Choice of variables in derivation data set
requires confirmation in a separate validation
data set.
51
Need for Validation: Example
  • Study of clinical predictors of bacterial
    diarrhea.
  • Evaluated 34 historical items and 16 physical
    examination questions.
  • 3 questions (abrupt onset, > 4 stools/day, and
    absence of vomiting) best predicted a positive
    stool culture (sensitivity 86%, specificity 60%
    for all 3).
  • Would these 3 be the best predictors in a new
    dataset? Would they have the same sensitivity
    and specificity?

DeWitt TG, Humphrey KF, McCarthy P. Clinical
predictors of acute bacterial diarrhea in young
children. Pediatrics. Oct 1985;76(4):551-556.
52
Need for Validation
  • Develop prediction rule by choosing a few tests
    and findings from a large number of
    possibilities.
  • Takes advantage of chance variations in the data.
  • Predictive ability of rule will probably
    disappear when you try to validate on a new
    dataset.
  • Can be referred to as overfitting.
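
To make the overfitting point concrete, here is a toy simulation. It is not the diarrhea study: every number is an assumption (100 patients per set, 50 candidate yes/no items, an arbitrary seed), and every item is a coin flip, so none of them truly predicts the outcome. Picking the best-looking item in the derivation set still tends to give an apparent accuracy well above 50%, while the same item in a fresh validation set should hover around chance.

```python
import random

random.seed(0)  # arbitrary seed so the sketch is repeatable
N_PATIENTS, N_ITEMS = 100, 50  # hypothetical sizes


def simulate(n_patients, n_items):
    """Return (outcomes, items); all values are coin flips, so no item is truly predictive."""
    outcomes = [random.randint(0, 1) for _ in range(n_patients)]
    items = [[random.randint(0, 1) for _ in range(n_patients)]
             for _ in range(n_items)]
    return outcomes, items


def accuracy(item, outcomes):
    """Fraction of patients in whom the item agrees with the outcome."""
    return sum(x == y for x, y in zip(item, outcomes)) / len(outcomes)


deriv_y, deriv_items = simulate(N_PATIENTS, N_ITEMS)
valid_y, valid_items = simulate(N_PATIENTS, N_ITEMS)

# "Derive" the rule: keep whichever item looks best in the derivation set
deriv_acc = [accuracy(item, deriv_y) for item in deriv_items]
best = max(range(N_ITEMS), key=lambda j: deriv_acc[j])

apparent = deriv_acc[best]                         # optimistic by construction
validated = accuracy(valid_items[best], valid_y)   # same item, new patients

print("apparent accuracy:", apparent)
print("validated accuracy:", validated)
```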

53
VALIDATION
  • No matter what technique (CART or logistic
    regression) is used, the rule for combining
    multiple test results must be tested on a data
    set different from the one used to derive it.
  • Beware of validation sets that are just
    re-hashes of the derivation set.
  • (This begins our discussion of potential problems
    with studies of diagnostic tests.)

54
Studies of Diagnostic Test Accuracy (Sackett, EBM,
pg 68)
  1. Was there an independent, blind comparison with a
    reference (gold) standard of diagnosis?
  2. Was the diagnostic test evaluated in an
    appropriate spectrum of patients (like those in
    whom we would use it in practice)?
  3. Was the reference standard applied regardless of
    the diagnostic test result?
  4. Was the test (or cluster of tests) validated in a
    second, independent group of patients?

55
Bias in Studies of Diagnostic Test Accuracy
  • Index Test = Test Being Evaluated
  • Gold Standard = Test Used to Determine True
    Disease Status

56
Studies of Diagnostic Tests (Sackett, EBM, pg 68)
  1. Was there an independent, blind comparison with a
    reference (gold) standard of diagnosis?
  2. Was the diagnostic test evaluated in an
    appropriate spectrum of patients (like those in
    whom we would use it in practice)?
  3. Was the reference standard applied regardless of
    the diagnostic test result?
  4. Was the test (or cluster of tests) validated in a
    second, independent group of patients?

57
Studies of Diagnostic Tests: Incorporation Bias
Index Test is incorporated into gold standard.
Consider a study of the usefulness of various
findings for diagnosing pancreatitis. If the
"Gold Standard" is a discharge diagnosis of
pancreatitis, which in many cases will be based
upon the serum amylase, then the study can't
quantify the accuracy of the amylase for this
diagnosis.
58
Studies of Diagnostic Tests: Incorporation Bias
A study of BNP in dyspnea patients as a
diagnostic test for CHF also showed that the CXR
performed extremely well in predicting CHF.
The two cardiologists who determined the final
diagnosis of CHF were blinded to the BNP level
but not to the CXR report, so the assessment of
BNP should be unbiased, but not the assessment of
CXR.
Maisel AS, Krishnaswamy P, Nowak RM, McCord J,
Hollander JE, Duc P, et al. Rapid measurement of
B-type natriuretic peptide in the emergency
diagnosis of heart failure. N Engl J Med
2002;347(3):161-7.
59
Studies of Diagnostic Tests (Sackett, EBM, pg 68)
  1. Was there an independent, blind comparison with a
    reference (gold) standard of diagnosis?
  2. Was the diagnostic test evaluated in an
    appropriate spectrum of patients (like those in
    whom we would use it in practice)?
  3. Was the reference standard applied regardless of
    the diagnostic test result?
  4. Was the test (or cluster of tests) validated in a
    second, independent group of patients?

60
Studies of Diagnostic Tests: Verification Bias
The study population only includes those to whom
the gold standard was applied, but patients with
positive index tests are more likely to be
referred for the gold standard.
Example: V/Q Scan as a test for PE. Gold
standard is a PA-gram. Patients with negative
V/Q scans are less frequently referred for
PA-gram than those with positive V/Q scans.
Only patients who had PA-grams are included in
the study.
AKA Work-up Bias, Referral Bias, or Ascertainment Bias
61
Studies of Diagnostic Tests: Verification Bias
                  PA-gram +    PA-gram -
  V/Q Scan +      a            b
  V/Q Scan -      c ↓          d ↓
Sensitivity (a/(a + c)) is biased UP.
Specificity (d/(b + d)) is biased DOWN.
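
A small numeric sketch of the direction of this bias. All counts and referral fractions below are hypothetical, chosen only to make the arithmetic easy; they are not from any V/Q scan study.

```python
def sens_spec(a, b, c, d):
    """Sensitivity and specificity from a 2x2 table (a=TP, b=FP, c=FN, d=TN)."""
    return a / (a + c), d / (b + d)


def verified_only(a, b, c, d, frac_pos_verified, frac_neg_verified):
    """Keep only the patients who were referred for the gold standard."""
    return (a * frac_pos_verified, b * frac_pos_verified,
            c * frac_neg_verified, d * frac_neg_verified)


# Hypothetical table if EVERY patient had received the gold standard
a, b, c, d = 80, 20, 20, 180
true_sens, true_spec = sens_spec(a, b, c, d)

# Hypothetical referral pattern: all index-test positives are verified,
# but only 25% of index-test negatives are
obs_sens, obs_spec = sens_spec(*verified_only(a, b, c, d, 1.0, 0.25))

print(round(true_sens, 2), round(true_spec, 2))  # 0.8 0.9
print(round(obs_sens, 2), round(obs_spec, 2))    # 0.94 0.69
```

Shrinking c inflates a/(a + c), and shrinking d deflates d/(b + d), which is the pattern stated on the slide.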
62
Studies of Diagnostic Tests: Double Gold Standard
Bias
One gold standard (e.g., biopsy) is applied in
patients with a positive index test, another gold
standard (e.g., clinical follow-up) is applied in
patients with a negative index test.
63
Studies of Diagnostic Tests: Double Gold Standard
Test: V/Q Scan
Disease: PE
Gold Standard: PA-gram in patients who had one,
clinical follow-up in patients who didn't
Study Population: All patients presenting to the
ED who received a V/Q scan.
Assume some patients did not get a PA-gram
because of normal/low probability V/Q scans but
would have had positive PA-grams. Instead they
had negative clinical follow-up and were counted
as true negatives. If they had had PA-grams,
they would have been counted as false negatives.
PIOPED. JAMA 1990;263(20):2753-9.
64
Studies of Diagnostic Tests: Double Gold Standard
                  PA-gram +    PA-gram -
  V/Q Scan +      a            b
  V/Q Scan -      c            d
Sensitivity (a/(a + c)) biased UP
Specificity (d/(b + d)) biased UP
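
The same kind of toy arithmetic shows why both measures move up here. The counts are hypothetical: suppose 10 index-negative patients who would have had positive PA-grams instead had negative clinical follow-up, so they move from cell c (false negative) to cell d (true negative).

```python
def sens_spec(a, b, c, d):
    """Sensitivity and specificity from a 2x2 table (a=TP, b=FP, c=FN, d=TN)."""
    return a / (a + c), d / (b + d)


# Hypothetical table if everyone had received the definitive gold standard
a, b, c, d = 80, 20, 20, 180

# Hypothetical number of false negatives relabeled as true negatives
# because their clinical follow-up was negative
relabeled = 10

true_sens, true_spec = sens_spec(a, b, c, d)
obs_sens, obs_spec = sens_spec(a, b, c - relabeled, d + relabeled)

print(round(true_sens, 3), round(true_spec, 3))  # 0.8 0.9
print(round(obs_sens, 3), round(obs_spec, 3))    # 0.889 0.905
```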
65
Studies of Diagnostic Tests (Sackett, EBM, pg 68)
  1. Was there an independent, blind comparison with a
    reference (gold) standard of diagnosis?
  2. Was the diagnostic test evaluated in an
    appropriate spectrum of patients (like those in
    whom we would use it in practice)?
  3. Was the reference standard applied regardless of
    the diagnostic test result?
  4. Was the test (or cluster of tests) validated in a
    second, independent group of patients?

66
Studies of Diagnostic Tests: Spectrum Bias
So far, we have said that PPV and NPV of a test
depend on the population being tested,
specifically on the prevalence of D+ in the
population.
We said that sensitivity and specificity are
properties of the test and independent of the
prevalence and, by implication at least, the
population being tested.
In fact, ...
67
Studies of Diagnostic Tests: Spectrum Bias
Sensitivity depends on the spectrum of disease in
the population being tested.
Specificity depends on the spectrum of
non-disease in the population being tested.
68
Studies of Diagnostic Tests: Spectrum Bias
D+ and D- groups are not homogeneous.
D-/D+ really is D-/(D+, D++, or D+++)
D-/D+ really is (D1-, D2-, or D3-)/D+
69
Studies of Diagnostic Tests: Spectrum Bias
Example: Absence of Nasal Bone (on 13-week
ultrasound) as a Test for Chromosomal Abnormality
70
Spectrum Bias: Absence of Nasal Bone as a Test for
Chromosomal Abnormality
  Nasal Bone    D+      D-      LR
  Absent        229     129     27.8
  Present       104     5094    0.32
  Total         333     5223

Sensitivity = 229/333 = 69%, BUT the D+ group
only included fetuses with Trisomy 21
71
Spectrum Bias: Absence of Nasal Bone as a Test for
Chromosomal Abnormality
  • D+ group excluded 295 fetuses with other
    chromosomal abnormalities (esp. Trisomy 18)
  • If the purpose of the nasal bone exam is to
    determine on whom to get CVS, these 295 fetuses
    with chromosomal abnormalities other than trisomy
    21 should be included in the D+ group.
  • 95/295 (32%, not 69%) had absent nasal bone.

72
Spectrum Bias: Absence of Nasal Bone as a Test for
Chromosomal Abnormality
  Nasal Bone    D+                   D-      LR
  Absent        229 + 95 = 324       129     20.9
  Present       104 + 200 = 304      5094    0.50
  Total         333 + 295 = 628      5223

Sensitivity = 324/628 = 52%, NOT the 69% obtained when
the D+ group only included fetuses with Trisomy 21
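
The effect of widening the D+ group is a two-line calculation. This sketch uses only the counts given in the talk (229 of 333 Trisomy 21 fetuses and 95 of 295 fetuses with other chromosomal abnormalities had an absent nasal bone); the variable names are for illustration.

```python
# Absent nasal bone among Trisomy 21 fetuses only
t21_absent, t21_total = 229, 333

# Absent nasal bone among fetuses with other chromosomal abnormalities
other_absent, other_total = 95, 295

sens_t21_only = t21_absent / t21_total
sens_other_only = other_absent / other_total
sens_all_abnormal = (t21_absent + other_absent) / (t21_total + other_total)

print(round(sens_t21_only, 2))      # 0.69
print(round(sens_other_only, 2))    # 0.32
print(round(sens_all_abnormal, 2))  # 0.52
```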
73
Spectrum Bias: Absence of Nasal Bone as a Test for
Chromosomal Abnormality
  • By excluding chromosomal abnormalities other than
    Trisomy 21 from the D+ group, the study
    exaggerates the sensitivity of the Nasal Bone
    Exam (NBE) for chromosomal abnormalities.
  • True Sensitivity of NBE for chromosomal
    abnormalities = 52%
  • Biased estimate due to spectrum bias (excluding
    other chromosomal problems) = 69%

74
Biases in Studies of Tests
  • Overfitting Bias: "data-snooped" cutoffs take
    advantage of chance variations in the derivation set,
    making the test look falsely good.
  • Incorporation Bias: index test is part of gold
    standard (Sensitivity Up, Specificity Up)
  • Verification/Referral Bias: positive index test
    increases referral to gold standard (Sensitivity
    Up, Specificity Down)
  • Double Gold Standard: positive index test causes
    application of definitive gold standard, negative
    index test results in clinical follow-up
    (Sensitivity Up, Specificity Up)
  • Spectrum Bias
    • D+ = sickest of the sick (Sensitivity Up)
    • D- = wellest of the well (Specificity Up)

75
Biases in Studies of Tests
  • Don't just identify potential biases; figure out
    how the biases could affect the conclusions.
  • Studies concluding a test is worthless are not
    invalid if biases in the design would have led to
    the test looking BETTER than it really is.