1
Multiple Tests, Multivariable Decision Rules, and
Studies of Diagnostic Test Accuracy
Coursebook Chapter 5: Multiple Tests and Multivariable Decision Rules
Coursebook Chapter 6: Studies of Diagnostic Test Accuracy
Michael A. Kohn, MD, MPP 10/14/2004
2
Outline of Topics
  • Combining results of multiple tests: importance of test non-independence
  • Recursive Partitioning
  • Logistic Regression
  • Published rules for combining test results: importance of validation separate from derivation
  • Biases in studies of diagnostic tests:
    • Overfitting bias
    • Incorporation bias
    • Referral bias
    • Double gold standard bias
    • Spectrum bias

3
Warning: Different Example
  • Example of combining two tests in this talk:
  • Exercise ECG and nuclide scan as dichotomous tests for CAD (assumed to be a dichotomous D+/D- disease)
  • Example of combining two tests in Coursebook:
  • Premature birth (GA < 36 weeks) and low birth weight (BW < 2500 grams) as dichotomous tests for neonatal morbidity

Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology: a basic science for clinical medicine. 2nd ed. Boston: Little, Brown; 1991.
4
One Dichotomous Test
  Exercise ECG   CAD+   CAD-   LR
  Positive       299    44     6.80
  Negative       201    456    0.44
  Total          500    500

Do you see that this is (299/500)/(44/500)?
Review of Chapter 3: What are the sensitivity, specificity, PPV, and NPV of this test? (Be careful.)
5
Clinical Scenario: One Test (Pre-Test Probability of CAD = 33%; EECG Positive)
  • Pre-test prob = 0.33
  • Pre-test odds = 0.33/0.67 = 0.5
  • LR(+) = 6.80
  • Post-Test Odds = Pre-Test Odds x LR(+) = 0.5 x 6.80 = 3.40
  • Post-Test prob = 3.40/(3.40 + 1) = 0.77
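A minimal sketch of this odds-LR update in Python (numbers from the slide; the helper names are ours):

```python
# Convert probability <-> odds and apply a likelihood ratio.
def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(odds):
    return odds / (1 + odds)

pre_odds = prob_to_odds(0.33)             # ~0.49; the slide rounds to 0.5
post_odds = pre_odds * 6.80               # LR(+) for a positive exercise ECG
print(round(post_odds, 2))                # 3.35 (3.40 with the slide's rounding)
print(round(odds_to_prob(post_odds), 2))  # 0.77
```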

6
Clinical Scenario: One Test
Using Probabilities: Pre-Test Probability of CAD = 33%; EECG Positive; Post-Test Probability of CAD = 77%
Using Odds: Pre-Test Odds of CAD = 0.50; EECG Positive (LR = 6.80); Post-Test Odds of CAD = 3.40
7
Clinical Scenario: One Test (Pre-Test Probability of CAD = 33%; EECG Positive)

  • EECG+ (LR = 6.80) shifts the marker log10(6.80) ≈ 0.83 units to the right on the log-odds ruler:

  Log(Odds):  -2      -1.5    -1      -0.5    0       0.5     1
  Odds:       1:100   1:33    1:10    1:3     1:1     3:1     10:1
  Prob:       0.01    0.03    0.09    0.25    0.5     0.75    0.91

  Start: Odds = 0.50, Prob = 0.33
  End:   Odds = 3.40, Prob = 0.77
8
Second Dichotomous Test
  Nuclide Scan   CAD+   CAD-   LR
  Positive       416    190    2.19
  Negative       84     310    0.27
  Total          500    500

Do you see that this is (416/500)/(190/500)?
9
Clinical Scenario: Two Tests
Using Probabilities: Pre-Test Probability of CAD = 33%; EECG Positive; Post-EECG Probability of CAD = 77%; Nuclide Scan Positive; Post-Nuclide Probability of CAD = ?
10
Clinical Scenario: Two Tests
Using Odds: Pre-Test Odds of CAD = 0.50; EECG Positive (LR = 6.80); Post-Test Odds of CAD = 3.40; Nuclide Scan Positive (LR = 2.19?); Post-Test Odds of CAD = 3.40 x 2.19? = 7.44? (P = 7.44/(1 + 7.44) = 88%?)
11
Clinical Scenario: Two Tests (Pre-Test Probability of CAD = 33%; EECG Positive)

  • E-ECG+ (LR = 6.80) shifts the marker log10(6.80) ≈ 0.83 units right on the log-odds ruler; Nuclide+ (LR = 2.19) would shift it another log10(2.19) ≈ 0.34.
  • Can we do this? (Chain the E-ECG+ and Nuclide+ arrows end to end?)

  Log(Odds):  -2      -1.5    -1      -0.5    0       0.5     1
  Odds:       1:100   1:33    1:10    1:3     1:1     3:1     10:1
  Prob:       0.01    0.03    0.09    0.25    0.5     0.75    0.91

  Start:                Odds = 0.50, Prob = 0.33
  After E-ECG+:         Odds = 3.40, Prob = 0.77
  After both (chained): Odds = 7.44, Prob = 0.88
12
Question
  • Can we use the post-test odds after a positive exercise ECG as the pre-test odds for the positive nuclide scan?
  • I.e., can we combine the positive results by multiplying their LRs?
  • LR(E-ECG+, Nuclide+) = LR(E-ECG+) x LR(Nuclide+)?
  • = 6.80 x 2.19?
  • = 14.88?

13
Answer: No

  E-ECG   Nuclide   CAD+         CAD-         LR
  Pos     Pos       276 (55%)    26 (5%)      10.62
  Pos     Neg       23 (5%)      18 (4%)      1.28
  Neg     Pos       140 (28%)    164 (33%)    0.85
  Neg     Neg       61 (12%)     292 (58%)    0.21
  Total             500 (100%)   500 (100%)

Not 14.88.
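The joint LRs can be recomputed directly from the table; a short Python check (counts from the slide) shows why multiplying the single-test LRs overstates the evidence:

```python
# Joint likelihood ratios from the 2x2x2 table (500 CAD+ and 500 CAD- patients).
counts = {                # (E-ECG, Nuclide): (CAD+ count, CAD- count)
    ("pos", "pos"): (276, 26),
    ("pos", "neg"): (23, 18),
    ("neg", "pos"): (140, 164),
    ("neg", "neg"): (61, 292),
}
for combo, (d_pos, d_neg) in counts.items():
    lr = (d_pos / 500) / (d_neg / 500)
    print(combo, round(lr, 2))
# ('pos', 'pos') comes out 10.62, not 6.80 x 2.19 = 14.88: the tests are correlated.
```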
14
Non-Independence
  • A positive nuclide scan does not tell you as much
    if the patient has already had a positive
    exercise ECG.

15
Clinical Scenario
Using Odds
Pre-Test Odds of CAD = 0.50; EECG+/Nuclide Scan+ (LR = 10.62); Post-Test Odds of CAD = 0.50 x 10.62 = 5.31 (P = 5.31/(1 + 5.31) = 84%, not 88%)
16
Non-Independence

E-ECG+ (LR = 6.80) followed by Nuclide+ (LR = 2.19): if the tests were independent, chaining the two arrows would land at odds 7.44; since the tests are dependent, the actual combined arrow (LR = 10.62) is shorter.

  Log(Odds):  -2      -1.5    -1      -0.5    0       0.5     1
  Odds:       1:100   1:33    1:10    1:3     1:1     3:1     10:1
  Prob:       0.01    0.03    0.09    0.25    0.5     0.75    0.91

  Start:       Odds = 0.50, Prob = 0.33
  Actual end:  Odds = 5.31, Prob = 0.84
17
Non-Independence
  • Instead of the nuclide scan, what if the second
    test were just a repeat exercise ECG?
  • A second positive E-ECG would do little to increase your certainty of CAD. If it was a false positive the first time around, it is likely to be a false positive the second time.

18
Counterexamples: Possibly Independent Tests
  • For venous thromboembolism:
  • CT Angiogram of Lungs and Doppler Ultrasound of
    Leg Veins
  • Alveolar Dead Space and D-Dimer
  • MRA of Lungs and MRV of leg veins

19
Unless tests are independent, we can't combine results by multiplying LRs.
20
Ways to Combine Multiple Tests
  • On a group of patients (derivation set), perform
    the multiple tests and determine true disease
    status (apply the gold standard)
  • Measure LR for each possible combination of
    results
  • Recursive Partitioning
  • Logistic Regression

21
Determine LR for Each Result Combination
  E-ECG   Nuclide   CAD+         CAD-         LR      Post-Test Prob
  Pos     Pos       276 (55%)    26 (5%)      10.62   84%
  Pos     Neg       23 (5%)      18 (4%)      1.28    39%
  Neg     Pos       140 (28%)    164 (33%)    0.85    30%
  Neg     Neg       61 (12%)     292 (58%)    0.21    9%
  Total             500 (100%)   500 (100%)

Assumes pre-test prob = 33%.
22
Determine LR for Each Result Combination
2 dichotomous tests: 4 combinations
3 dichotomous tests: 8 combinations
4 dichotomous tests: 16 combinations
Etc.

2 three-level tests: 9 combinations
3 three-level tests: 27 combinations
Etc.
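The counts are just levels raised to the number of tests; a two-line Python check (illustrative only):

```python
# Result combinations grow as levels ** number_of_tests.
for n in (2, 3, 4):
    print(n, "dichotomous tests:", 2 ** n, "combinations")
for n in (2, 3):
    print(n, "three-level tests:", 3 ** n, "combinations")
```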
23
Determine LR for Each Result Combination
How do you handle continuous tests?
Not practical for most groups of tests.
24
Recursive Partitioning
25
Recursive Partitioning
  • Same as Classification and Regression Trees (CART)
  • Don't have to work out probabilities (or LRs) for all possible combinations of tests, because of tree pruning (see the sketch below)
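A sketch of CART-style recursive partitioning using scikit-learn's DecisionTreeClassifier on simulated dichotomous tests (the data and test characteristics are invented, not taken from the slides):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 1000
disease = rng.random(n) < 0.33                           # ~33% prevalence
# Two imperfect dichotomous tests, more often positive in disease:
test1 = np.where(disease, rng.random(n) < 0.60, rng.random(n) < 0.09)
test2 = np.where(disease, rng.random(n) < 0.83, rng.random(n) < 0.38)
X = np.column_stack([test1, test2]).astype(int)

# Cost-complexity pruning (ccp_alpha > 0) drops branches that add little,
# so not every result combination gets its own leaf.
tree = DecisionTreeClassifier(ccp_alpha=0.01).fit(X, disease)
print(export_text(tree, feature_names=["test1", "test2"]))
```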

26
Tree Pruning: Goldman Rule
  • 8 Tests for Acute MI in ER Chest Pain Patient:
  • ST Elevation on ECG
  • CP < 48 hours
  • ST-T changes on ECG
  • Hx of ACI
  • Radiation of Pain to Neck/LUE
  • Longest pain > 1 hour
  • Age > 40 years
  • CP not reproduced by palpation

Goldman L, Cook EF, Brand DA, et al. A computer protocol to predict myocardial infarction in emergency department patients with chest pain. N Engl J Med. 1988;318(13):797-803.
27
8 tests → 2^8 = 256 combinations
28
(No Transcript)
29
Recursive Partitioning
  • Does not deal well with continuous test results

30
Logistic Regression
  • Ln(Odds(D+)) = a + b_E-ECG(E-ECG) + b_Nuclide(Nuclide) + b_interact(E-ECG)(Nuclide)
  • Positive test = 1; negative test = 0
  • More on this later in ATCR!
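A minimal sketch of fitting this model with statsmodels, expanding the earlier 2x2x2 table into patient-level rows (coding assumed: positive = 1, negative = 0):

```python
import numpy as np
import statsmodels.api as sm

# One row per patient: (E-ECG, Nuclide, disease), counts from the earlier table.
rows = []
for eecg, nuc, n_cad, n_nocad in [(1, 1, 276, 26), (1, 0, 23, 18),
                                  (0, 1, 140, 164), (0, 0, 61, 292)]:
    rows += [(eecg, nuc, 1)] * n_cad + [(eecg, nuc, 0)] * n_nocad
data = np.array(rows)

# Columns: E-ECG, Nuclide, and their interaction term.
X = sm.add_constant(np.column_stack([data[:, 0], data[:, 1],
                                     data[:, 0] * data[:, 1]]))
model = sm.Logit(data[:, 2], X).fit(disp=0)
print(model.params)   # a, b_E-ECG, b_Nuclide, b_interact
```

With the interaction term, this model has one parameter per result combination (four cells, four parameters), so it reproduces the four joint LRs exactly rather than assuming independence.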

31
  • Logistic regression approach to the R/O ACI patient:

  Variable                  Coefficient   MV Odds Ratio
  Constant                  -3.93
  Presence of chest pain     1.23         3.42
  Pain major symptom         0.88         2.41
  Male sex                   0.71         2.03
  Age 40 or less            -1.44         0.24
  Age > 50                   0.67         1.95
  Male over 50 years        -0.43         0.65
  ST elevation               1.314        3.72
  New Q waves                0.62         1.86
  ST depression              0.99         2.69
  T waves elevated           1.095        2.99
  T waves inverted           1.13         3.10
  T wave ST changes         -0.314        0.73
Selker HP, Griffith JL, D'Agostino RB. A tool for judging coronary care unit admission appropriateness, valid for both real-time and retrospective use. A time-insensitive predictive instrument (TIPI) for acute cardiac ischemia: a multicenter study. Med Care. Jul 1991;29(7):610-627. For corrected coefficients, see http://medg.lcs.mit.edu/cardiac/cpain.htm
32
Clinical Scenario
  • 71 y/o man with 2.5 hours of CP, substernal, non-radiating, described as "bloating." Cannot say if same as prior MI or worse than prior angina.
  • Hx of CAD, s/p CABG 10 yrs prior, stenting 3 years and 1 year ago. DM on Avandia.
  • ECG: RBBB, Qs inferiorly. No ischemic ST-T changes.

Real patient seen by MAK 1 am 10/12/04
33
(No Transcript)
34
  Variable                  Coefficient   Clinical Scenario   Result
  Constant                  -3.93                             -3.93
  Presence of chest pain     1.23         1                    1.23
  Pain major symptom         0.88         1                    0.88
  Male sex                   0.71         1                    0.71
  Age 40 or less            -1.44         0                    0
  Age > 50                   0.67         1                    0.67
  Male over 50 years        -0.43         1                   -0.43
  ST elevation               1.314        0                    0
  New Q waves                0.62         0                    0
  ST depression              0.99         0                    0
  T waves elevated           1.095        0                    0
  T waves inverted           1.13         0                    0
  T wave ST changes         -0.314        0                    0
  Sum (log odds)                                              -0.87

Odds of ACI = e^(-0.87) = 0.419
Probability of ACI = 0.419/(1 + 0.419) = 0.30 (30%)
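Recomputing the slide's arithmetic in Python (coefficients exactly as printed above):

```python
import math

logit = (-3.93     # constant
         + 1.23    # presence of chest pain
         + 0.88    # pain major symptom
         + 0.71    # male sex
         + 0.67    # age > 50
         - 0.43)   # male over 50 years
odds = math.exp(logit)      # e^(-0.87) ~= 0.419
prob = odds / (1 + odds)    # ~= 0.30
print(round(logit, 2), round(odds, 3), round(prob, 2))   # -0.87 0.419 0.3
```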
35
What Happened to Pre-test Probability?
  • Typically, clinical decision rules report probabilities rather than likelihood ratios for combinations of results.
  • We can back out LRs if we know the prevalence, P(D+), in the study dataset.
  • With logistic regression models, this backing out is known as a "prevalence offset." (See Chapter 5A and the sketch below.)
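One way to picture the backing-out step: a small sketch (our function name, not the Chapter 5A formulation) that recovers an LR from a reported post-test probability and the study prevalence:

```python
def lr_from_probs(post_prob, prevalence):
    """Back an LR out of a reported probability, given study prevalence."""
    post_odds = post_prob / (1 - post_prob)
    pre_odds = prevalence / (1 - prevalence)
    return post_odds / pre_odds

# E.g., a rule reporting 84% probability in a dataset with 33% prevalence:
print(round(lr_from_probs(0.84, 0.33), 1))   # ~10.7, close to the joint LR of 10.62
```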

36
Need for Validation
  • Develop a prediction rule by choosing a few tests and findings from a large number of possibilities.
  • This takes advantage of chance variations in the data.
  • The rule's predictive ability will probably shrink or disappear when you try to validate it on a new dataset.
  • This can be referred to as overfitting (simulated below).
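A quick simulation of the problem (invented noise data): choose the "best" of 50 useless predictors on a derivation set, then watch its apparent accuracy vanish on a validation set.

```python
import numpy as np

rng = np.random.default_rng(1)
y_deriv = rng.integers(0, 2, 200)            # disease status, derivation set
y_valid = rng.integers(0, 2, 200)            # disease status, validation set
X_deriv = rng.integers(0, 2, (200, 50))      # 50 candidate tests, pure noise
X_valid = rng.integers(0, 2, (200, 50))

acc = (X_deriv == y_deriv[:, None]).mean(axis=0)   # accuracy of each candidate
best = acc.argmax()
print(round(acc[best], 2))                             # looks better than 0.5
print(round((X_valid[:, best] == y_valid).mean(), 2)) # back near 0.5
```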

37
Need for Validation Example
  • Study of clinical predictors of bacterial
    diarrhea.
  • Evaluated 34 historical items and 16 physical
    examination questions.
  • 3 questions (abrupt onset, > 4 stools/day, and absence of vomiting) best predicted a positive stool culture (sensitivity 86%, specificity 60% for all 3).
  • Would these 3 be the best predictors in a new
    dataset? Would they have the same sensitivity
    and specificity?

DeWitt TG, Humphrey KF, McCarthy P. Clinical predictors of acute bacterial diarrhea in young children. Pediatrics. Oct 1985;76(4):551-556.
38
VALIDATION
  • No matter what technique (CART or logistic
    regression) is used, the rule for combining
    multiple test results must be tested on a data
    set different from the one used to derive it.
  • Beware of validation sets that are just
    re-hashes of the derivation set.
  • (This begins our discussion of potential problems
    with studies of diagnostic tests.)

39
Studies of Diagnostic Test Accuracy (Sackett, EBM, pg 68)
  1. Was there an independent, blind comparison with a
    reference (gold) standard of diagnosis?
  2. Was the diagnostic test evaluated in an
    appropriate spectrum of patients (like those in
    whom we would use it in practice)?
  3. Was the reference standard applied regardless of
    the diagnostic test result?
  4. Was the test (or cluster of tests) validated in a
    second, independent group of patients?

40
Studies of Diagnostic Tests: Overfitting Bias (Data Snooping)
Usually a problem for multi-test rules which use a few predictors chosen from a wide array of candidates. But, in studies of single tests, beware of data-snooped cutoffs:
"A procalcitonin concentration of 3.9088 ng/ml is the best cutoff for predicting ventilator-associated pneumonia."
"A CSF WBC:RBC ratio < 1:117 is a sensitive and specific predictor of real meningitis vs. a traumatic puncture."
"A birth weight cutoff of 1625 grams accurately identifies newborns at high risk for neonatal morbidity and mortality."
41
Studies of Diagnostic Tests: Overfitting Bias
  • Problems with data-snooped cutoffs:
  • They are dependent on the derivation set and require independent validation.
  • Fixed cutoffs assume a common prevalence or pre-test probability of disease. (Recall our discussion in Chapter 4 about the undesirability of a fixed cutoff for a multi-level or continuous test.)

42
Studies of Diagnostic Tests (Sackett, EBM, pg 68)
  1. Was there an independent, blind comparison with a
    reference (gold) standard of diagnosis?
  2. Was the diagnostic test evaluated in an
    appropriate spectrum of patients (like those in
    whom we would use it in practice)?
  3. Was the reference standard applied regardless of
    the diagnostic test result?
  4. Was the test (or cluster of tests) validated in a
    second, independent group of patients?

43
Studies of Diagnostic Tests: Incorporation Bias
Consider a study of the usefulness of various
findings for diagnosing pancreatitis. If the
"Gold Standard" is a discharge diagnosis of
pancreatitis, which in many cases will be based
upon the serum amylase, then the study can't
quantify the accuracy of the amylase for this
diagnosis.
44
Studies of Diagnostic Tests: Incorporation Bias
A study of BNP in dyspnea patients as a diagnostic test for CHF also showed that the CXR performed extremely well in predicting CHF. The two cardiologists who determined the final diagnosis of CHF were blinded to the BNP level but not to the CXR report, so the assessment of BNP should be unbiased, but not the assessment of the CXR.
Maisel AS, Krishnaswamy P, Nowak RM, McCord J, Hollander JE, Duc P, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med. 2002;347(3):161-7.
45
Studies of Diagnostic Tests (Sackett, EBM, pg 68)
  1. Was there an independent, blind comparison with a
    reference (gold) standard of diagnosis?
  2. Was the diagnostic test evaluated in an
    appropriate spectrum of patients (like those in
    whom we would use it in practice)?
  3. Was the reference standard applied regardless of
    the diagnostic test result?
  4. Was the test (or cluster of tests) validated in a
    second, independent group of patients?

46
Studies of Diagnostic Tests: Referral Bias
The study population only includes those to whom
the gold standard was applied, but patients with
positive index tests are more likely to be
referred for the gold standard.
Example: Swelling as a test for ankle fracture.
Gold standard is a positive X-ray. Patients with
swelling are more likely to be referred for
x-ray. Only patients who had x-rays are included
in the study.
47
Studies of Diagnostic Tests: Referral Bias

              Fracture   No Fracture
  Swelling    a          b
  No Swelling c ↓        d ↓

(Patients without swelling are under-referred for x-ray, so cells c and d are undercounted.)
Sensitivity (a/(a+c)) is biased UP.
Specificity (d/(b+d)) is biased DOWN.
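A toy calculation of the distortion (all numbers invented): start from a full population in which swelling has sensitivity 0.70 and specificity 0.80, then x-ray 90% of swollen but only 30% of non-swollen patients.

```python
a, b, c, d = 70, 20, 30, 80                    # full-population 2x2 counts
print(a / (a + c), d / (b + d))                # true values: 0.7 and 0.8

a_obs, b_obs = 0.9 * a, 0.9 * b                # 90% of swollen get x-rayed
c_obs, d_obs = 0.3 * c, 0.3 * d                # ...but only 30% of non-swollen
print(round(a_obs / (a_obs + c_obs), 2))       # 0.88: sensitivity biased UP
print(round(d_obs / (b_obs + d_obs), 2))       # 0.57: specificity biased DOWN
```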
48
Studies of Diagnostic Tests: Referral Bias Example
Test: A-a O2 gradient
Disease: PE
Gold Standard: VQ scan or pulmonary angiogram
Study Population: Patients who had a VQ scan or PA-gram
Results: An A-a O2 gradient > 20 mm Hg had very high sensitivity (almost every patient with PE by VQ scan or PA-gram had a gradient > 20 mm Hg), but very low specificity (lots of patients with negative PA-grams had gradients > 20 mm Hg).
McFarlane MJ, Imperiale TF. Use of the alveolar-arterial oxygen gradient in the diagnosis of pulmonary embolism. Am J Med. 1994;96(1):57-62.
49
Studies of Diagnostic Tests: Referral Bias

                    VQ Scan+   VQ Scan-
  A-aO2 > 20 mmHg   a          b
  A-aO2 < 20 mmHg   c ↓        d ↓

Sensitivity (a/(a+c)) is biased UP.
Specificity (d/(b+d)) is biased DOWN.
The study still concluded the test is not sensitive enough, so it probably isn't.
50
Studies of Diagnostic Tests: Double Gold Standard
One gold standard (e.g., biopsy) is applied in patients with a positive index test; another gold standard (e.g., clinical follow-up) is applied in patients with a negative index test.
51
Studies of Diagnostic Tests: Double Gold Standard
Test: A-a O2 gradient
Disease: PE
Gold Standard: VQ scan or pulmonary angiogram in patients who had one; clinical follow-up in patients who didn't
Study Population: All patients presenting to the ED with dyspnea.
Some patients did not get a VQ scan or PA-gram because of normal A-a O2 gradients but would have had positive studies. Instead they had negative clinical follow-up and were counted as true negatives.
52
Studies of Diagnostic Tests: Double Gold Standard

                PE   No PE
  A-a O2 > 20   a    b
  A-a O2 < 20   c    d

Sensitivity (a/(a+c)) biased UP.
Specificity (d/(b+d)) biased UP.
53
Studies of Diagnostic Tests (Sackett, EBM, pg 68)
  1. Was there an independent, blind comparison with a
    reference (gold) standard of diagnosis?
  2. Was the diagnostic test evaluated in an
    appropriate spectrum of patients (like those in
    whom we would use it in practice)?
  3. Was the reference standard applied regardless of
    the diagnostic test result?
  4. Was the test (or cluster of tests) validated in a
    second, independent group of patients?

54
Studies of Diagnostic Tests: Spectrum Bias
So far, we have said that PPV and NPV of a test depend on the population being tested, specifically on the prevalence of D+ in the population.
We said that sensitivity and specificity are
properties of the test and independent of the
prevalence and, by implication at least, the
population being tested.
In fact,
55
Studies of Diagnostic Tests: Spectrum Bias
Sensitivity depends on the spectrum of disease in the population being tested.
Specificity depends on the spectrum of non-disease in the population being tested.
56
Studies of Diagnostic Tests: Spectrum Bias
The D+ and D- groups are not homogeneous.
D+ really is D+, D++, or D+++ (e.g., mild, moderate, or severe disease).
D- really is D1-, D2-, or D3- (different non-disease conditions).
57
Studies of Diagnostic Tests: Spectrum Bias
Example: Pale conjunctiva as a test for iron deficiency anemia.
Assume that conjunctival paleness always occurs at HCT < 25.
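Under that assumption, sensitivity is just the fraction of anemic patients whose HCT falls below 25, so it depends entirely on the severity mix; a toy illustration (HCT values invented):

```python
def sensitivity(hcts, threshold=25):
    """Fraction of diseased patients with pale conjunctiva (HCT < threshold)."""
    return sum(1 for h in hcts if h < threshold) / len(hcts)

mild_spectrum   = [28, 30, 27, 32, 24]    # mostly HCT >= 25
severe_spectrum = [18, 22, 20, 24, 27]    # mostly HCT < 25
print(sensitivity(mild_spectrum))         # 0.2: low in a mild-disease population
print(sensitivity(severe_spectrum))       # 0.8: high in a severe-disease population
```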
58
Pale Conjunctiva as a Test for Iron Deficiency
59
Pale Conjunctiva as a Test for Iron Deficiency
Sensitivity is HIGHER in the population with more severe disease.
60
Pale Conjunctiva as a Test for Iron Deficiency
61
Pale Conjunctiva as a Test for Iron Deficiency
Specificity is LOWER in the population with more
severe non-disease. (Patients without the disease
in question are more likely to have other
diseases that can be confused with the disease in
question.)
62
Biases in Studies of Tests
  • Overfitting Bias: data-snooped cutoffs take advantage of chance variations in the derivation set, making the test look falsely good.
  • Incorporation Bias: the index test is part of the gold standard (Sensitivity Up, Specificity Up).
  • Referral Bias: a positive index test increases referral to the gold standard (Sensitivity Up, Specificity Down).
  • Double Gold Standard: a positive index test causes application of the definitive gold standard; a negative index test results in clinical follow-up (Sensitivity Up, Specificity Up).
  • Spectrum Bias:
  • D+ = sickest of the sick (Sensitivity Up)
  • D- = wellest of the well (Specificity Up)

63
Biases in Studies of Tests
  • Don't just identify potential biases; figure out how the biases could affect the conclusions.
  • Studies concluding a test is worthless are not invalid if biases in the design would have led to the test looking BETTER than it really is.