Title: Multiple Tests, Multivariable Decision Rules, and Studies of Diagnostic Test Accuracy
1. Multiple Tests, Multivariable Decision Rules, and Studies of Diagnostic Test Accuracy
Chapter 8: Multiple Tests and Multivariable Decision Rules
Chapter 5: Studies of Diagnostic Test Accuracy
Michael A. Kohn, MD, MPP. 10/19/2006
2. Outline of Topics
- Combining results of multiple tests; importance of test non-independence
- Recursive partitioning
- Logistic regression
- Published rules for combining test results; importance of validation separate from derivation
- Biases in studies of diagnostic test accuracy
  - Overfitting bias
  - Incorporation bias
  - Referral bias
  - Double gold standard bias
  - Spectrum bias
3. Warning: Different Example
- Example of combining two tests in this talk:
  - Prenatal sonographic Nuchal Translucency (NT) and Nasal Bone Exam (NBE) as dichotomous tests for Trisomy 21
- Example of combining two tests in the book:
  - Premature birth (GA < 36 weeks) and low birth weight (BW < 2500 grams) as dichotomous tests for neonatal morbidity

Cicero S, Rembouskos G, et al. "Likelihood ratio for trisomy 21 in fetuses with absent nasal bone at the 11-14-week scan." Ultrasound Obstet Gynecol. 2004;23(3):218-23. (Soon to be replaced.)
4. If NT ≥ 3.5 mm, Positive for Trisomy 21
What's wrong with this definition?
6.
- In general, don't make multi-level tests like NT into dichotomous tests by choosing a fixed cutoff
- I did it here to make the discussion of multiple tests easier
- I arbitrarily chose to call ≥ 3.5 mm positive
7. One Dichotomous Test
Nuchal Translucency   D+    D-     LR
≥ 3.5 mm              212   478    7.0
< 3.5 mm              121   4745   0.4
Total                 333   5223

Do you see that this is (212/333)/(478/5223)?
Review of Chapter 3: What are the sensitivity, specificity, PPV, and NPV of this test? (Be careful.)
8. Nuchal Translucency
- Sensitivity = 212/333 = 64%
- Specificity = 4745/5223 = 91%
- Prevalence = 333/(333 + 5223) = 6%
  (Study population: pregnant women about to undergo CVS, so high prevalence of Trisomy 21)
- PPV = 212/(212 + 478) = 31%
- NPV = 4745/(121 + 4745) = 97.5%

The NPV is not that great: prior to the test, P(D-) was already 94%.
9. Clinical Scenario: One Test
Pre-test probability of Down's = 6%; NT positive
- Pre-test prob = 0.06
- Pre-test odds = 0.06/0.94 = 0.064
- LR(+) = 7.0
- Post-test odds = pre-test odds x LR(+) = 0.064 x 7.0 = 0.44
- Post-test prob = 0.44/(0.44 + 1) = 0.31
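The odds-LR-odds conversion on this slide can be sketched as a small helper, using the slide's numbers (pre-test probability 6%, LR(+) = 7.0):

```python
def post_test_prob(pre_prob: float, lr: float) -> float:
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_odds = pre_prob / (1 - pre_prob)   # 0.06/0.94 = 0.064
    post_odds = pre_odds * lr              # 0.064 x 7.0 = 0.447
    return post_odds / (1 + post_odds)

p = post_test_prob(0.06, 7.0)
print(round(p, 2))  # 0.31
```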
10. Clinical Scenario: One Test
Using probabilities: Pre-test probability of Tri21 = 6% → NT positive → post-test probability of Tri21 = 31%
Using odds: Pre-test odds of Tri21 = 0.064 → NT positive (LR = 7.0) → post-test odds of Tri21 = 0.44
11. Clinical Scenario: One Test
Pre-test probability of Tri21 = 6%; NT positive
[Log-odds scale figure: the NT arrow (LR = 7.0) shifts pre-test odds 0.064 (prob 0.06) to post-test odds 0.44 (prob 0.31). The scale shows log(odds) from -2 to 1, odds from 1:100 to 10:1, and probabilities from 0.01 to 0.91.]
12. Nasal Bone Seen: NBE Negative for Trisomy 21
Nasal Bone Absent: NBE Positive for Trisomy 21
13. Second Dichotomous Test
Nasal Bone   Tri21+   Tri21-   LR
Absent       229      129      27.8
Present      104      5094     0.32
Total        333      5223

Do you see that this is (229/333)/(129/5223)?
14. Clinical Scenario: Two Tests
Using probabilities:
Pre-test probability of Trisomy 21 = 6% → NT positive for Trisomy 21 (≥ 3.5 mm) → post-NT probability of Trisomy 21 = 31% → NBE positive for Trisomy 21 (no bone seen) → post-NBE probability of Trisomy 21 = ?
15. Clinical Scenario: Two Tests
Using odds:
Pre-test odds of Tri21 = 0.064 → NT positive (LR = 7.0) → post-test odds of Tri21 = 0.44 → NBE positive (LR = 27.8?) → post-test odds of Tri21 = 0.44 x 27.8? = 12.4? (P = 12.4/(1 + 12.4) = 92.5%?)
16. Clinical Scenario: Two Tests
Pre-test probability of Trisomy 21 = 6%; NT ≥ 3.5 mm AND nasal bone absent
[Log-odds scale figure: the NT arrow (LR = 6.96) and the NBE arrow (LR = 27.8) are laid end to end. Can we do this? Chaining "NT and NBE" would shift odds 0.064 (prob 0.06) through 0.44 (prob 0.31) to 12.4 (prob 0.925).]
17. Question
- Can we use the post-test odds after a positive Nuchal Translucency as the pre-test odds for the positive Nasal Bone Examination?
- i.e., can we combine the positive results by multiplying their LRs?
- LR(NT+, NBE+) = LR(NT+) x LR(NBE+)?
- = 7.0 x 27.8?
- = 194?
18. Answer: No
NT    NBE   Trisomy 21+   Trisomy 21-   LR
Pos   Pos   158 (47%)     36 (0.7%)     69
Pos   Neg   54 (16%)      442 (8.5%)    1.9
Neg   Pos   71 (21%)      93 (1.8%)     12
Neg   Neg   50 (15%)      4652 (89%)    0.2
Total       333 (100%)    5223 (100%)

Not 194!
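A small sketch recomputing the combined LRs directly from the 2x2x2 counts above shows why multiplying the individual LRs (7.0 x 27.8 = 194) fails:

```python
counts = {  # (NT, NBE): (Tri21+, Tri21-) counts from the table
    ("pos", "pos"): (158, 36),
    ("pos", "neg"): (54, 442),
    ("neg", "pos"): (71, 93),
    ("neg", "neg"): (50, 4652),
}
n_dpos = sum(d for d, _ in counts.values())    # 333
n_dneg = sum(nd for _, nd in counts.values())  # 5223

# LR for each result combination: P(result | D+) / P(result | D-)
lrs = {
    combo: (dpos / n_dpos) / (dneg / n_dneg)
    for combo, (dpos, dneg) in counts.items()
}
print({c: round(lr, 1) for c, lr in lrs.items()})
# LR for (pos, pos) comes out near 69, far short of the 194 predicted by independence
```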
19. Non-Independence
- Absence of the nasal bone does not tell you as much if you already know that the nuchal translucency is ≥ 3.5 mm.
20. Clinical Scenario
Using odds:
Pre-test odds of Tri21 = 0.064 → NT+/NBE+ (LR = 68.8) → post-test odds = 0.064 x 68.8 = 4.40 (P = 4.40/(1 + 4.40) = 81%, not 92.5%)
21. Non-Independence
[Log-odds scale figure: the NT and NBE arrows, laid end to end as if the tests were independent, would reach odds 12.4 (prob 0.925); because the tests are dependent, the shorter combined "NT and NBE" arrow reaches only odds 4.40 (prob 0.81).]
22. Non-Independence of NT and NBE
Apparently, even in chromosomally normal fetuses, enlarged NT and absence of the nasal bone are associated. A false positive on the NT makes a false positive on the NBE more likely. Of normal (D-) fetuses with NT < 3.5 mm, only 2.0% had the nasal bone absent. Of normal (D-) fetuses with NT ≥ 3.5 mm, 7.5% had the nasal bone absent.

Some (but not all) of this may have to do with ethnicity. In this London study, chromosomally normal fetuses of Afro-Caribbean ethnicity had both larger NTs and more frequent absence of the nasal bone.

In Trisomy 21 (D+) fetuses, normal NT was associated with presence of the nasal bone, so a false negative on the NT was associated with a false negative on the NBE.
23. Non-Independence
- Instead of looking for the nasal bone, what if the second test were just a repeat measurement of the nuchal translucency?
- A second positive NT would do little to increase your certainty of Trisomy 21. If it was a false positive the first time around, it is likely to be a false positive the second time.
24. Reasons for Non-Independence
Tests measure the same aspect of disease.
- Consider exercise ECG (EECG) and radionuclide scan as tests for coronary artery disease (CAD), with the gold standard being anatomic narrowing of the arteries on angiogram. Both EECG and nuclide scan measure functional narrowing. In a patient without anatomic narrowing (a D- patient), coronary artery spasm could cause false positives on both tests.
25. Reasons for Non-Independence
Spectrum of disease severity.
- In the EECG/nuclide scan example, CAD is defined as ≥ 70% stenosis on angiogram. A D+ patient with 71% stenosis is much more likely to have a false negative on both the EECG and the nuclide scan than a D+ patient with 99% stenosis.
26. Reasons for Non-Independence
Spectrum of non-disease severity.
- In this example, CAD is defined as ≥ 70% stenosis on angiogram. A D- patient with 69% stenosis is much more likely to have a false positive on both the EECG and the nuclide scan than a D- patient with 33% stenosis.
27. Counterexamples: Possibly Independent Tests
For venous thromboembolism:
- CT angiogram of lungs and Doppler ultrasound of leg veins
- Alveolar dead space and D-dimer
- MRA of lungs and MRV of leg veins
28. Unless tests are independent, we can't combine results by multiplying LRs.
29. Ways to Combine Multiple Tests
- On a group of patients (derivation set), perform the multiple tests and determine true disease status (apply the gold standard)
- Measure the LR for each possible combination of results
- Recursive partitioning
- Logistic regression
30. Determine LR for Each Result Combination
NT    NBE   Tri21+       Tri21-         LR    Post-Test Prob
Pos   Pos   158 (47%)    36 (0.7%)      69    81%
Pos   Neg   54 (16%)     442 (8.5%)     1.9   11%
Neg   Pos   71 (21%)     93 (1.8%)      12    43%
Neg   Neg   50 (15%)     4652 (89.1%)   0.2   1%
Total       333 (100%)   5223 (100%)

Assumes pre-test prob = 6%
31. Determine LR for Each Result Combination
2 dichotomous tests: 4 combinations
3 dichotomous tests: 8 combinations
4 dichotomous tests: 16 combinations
Etc.

2 three-level tests: 9 combinations
3 three-level tests: 27 combinations
Etc.
32. Determine LR for Each Result Combination
How do you handle continuous tests?
Not practical for most groups of tests.
33. Recursive Partitioning: Measure NT First
34. Recursive Partitioning: Examine Nasal Bone First
35. Recursive Partitioning: Examine Nasal Bone First; CVS if P(Trisomy 21) > 5%
36. Recursive Partitioning: Examine Nasal Bone First; CVS if P(Trisomy 21) > 5%
37. Recursive Partitioning
- Same as Classification and Regression Trees (CART)
- Don't have to work out probabilities (or LRs) for all possible combinations of tests, because of tree pruning
38. Tree Pruning: Goldman Rule
8 tests for acute MI in the ER chest pain patient:
- ST elevation on ECG
- CP < 48 hours
- ST-T changes on ECG
- Hx of MI
- Radiation of pain to neck/LUE
- Longest pain > 1 hour
- Age > 40 years
- CP not reproduced by palpation

Goldman L, Cook EF, Brand DA, et al. A computer protocol to predict myocardial infarction in emergency department patients with chest pain. N Engl J Med. 1988;318(13):797-803.
39. 8 tests → 2^8 = 256 combinations
41. Recursive Partitioning
- Does not deal well with continuous test results when there is a monotonic relationship between the test result and the probability of disease
42. Logistic Regression
Ln(Odds(D+)) = a + b_NT(NT) + b_NBE(NBE) + b_interact(NT)(NBE)
where NT = 1 if positive, 0 if negative, and NBE = 1 if positive, 0 if negative.
More on this later in ATCR!
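As a sketch of the model's form (the coefficients below are made up for illustration, since the slide gives no fitted values; a negative interaction coefficient captures the non-independence, because a positive second test then adds less when the first is already positive):

```python
import math

def prob_dpos(nt: int, nbe: int,
              a: float = -2.8, b_nt: float = 1.9,
              b_nbe: float = 2.5, b_int: float = -0.7) -> float:
    """ln(odds(D+)) = a + b_nt*NT + b_nbe*NBE + b_int*NT*NBE, with tests coded 0/1."""
    log_odds = a + b_nt * nt + b_nbe * nbe + b_int * nt * nbe
    odds = math.exp(log_odds)
    return odds / (1 + odds)

# With b_int < 0, going NBE- to NBE+ raises log-odds by 2.5 when NT is negative,
# but only by 2.5 - 0.7 = 1.8 when NT is already positive.
print(round(prob_dpos(0, 0), 2), round(prob_dpos(1, 1), 2))
```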
43. Logistic Regression Approach to the R/O ACI Patient
                         Coefficient   MV Odds Ratio
Constant                 -3.93
Presence of chest pain    1.23         3.42
Pain major symptom        0.88         2.41
Male sex                  0.71         2.03
Age 40 or less           -1.44         0.24
Age > 50                  0.67         1.95
Male over 50 years       -0.43         0.65
ST elevation              1.314        3.72
New Q waves               0.62         1.86
ST depression             0.99         2.69
T waves elevated          1.095        2.99
T waves inverted          1.13         3.10
T wave + ST changes      -0.314        0.73

Selker HP, Griffith JL, D'Agostino RB. A tool for judging coronary care unit admission appropriateness, valid for both real-time and retrospective use. A time-insensitive predictive instrument (TIPI) for acute cardiac ischemia: a multicenter study. Med Care. 1991;29(7):610-627. For corrected coefficients, see http://medg.lcs.mit.edu/cardiac/cpain.htm
44. Clinical Scenario
- 71 y/o man with 2.5 hours of CP, substernal, non-radiating, described as "bloating." Cannot say if same as prior MI or worse than prior angina.
- Hx of CAD, s/p CABG 10 yrs prior, stenting 3 years and 1 year ago. DM on Avandia.
- ECG: RBBB, Qs inferiorly. No ischemic ST-T changes.

Real patient seen by MAK, 1 am, 10/12/04
46.
                         Coefficient   Result   Contribution
Constant                 -3.93                  -3.93
Presence of chest pain    1.23         1         1.23
Pain major symptom        0.88         1         0.88
Male sex                  0.71         1         0.71
Age 40 or less           -1.44         0         0
Age > 50                  0.67         1         0.67
Male over 50 years       -0.43         1        -0.43
ST elevation              1.314        0         0
New Q waves               0.62         0         0
ST depression             0.99         0         0
T waves elevated          1.095        0         0
T waves inverted          1.13         0         0
T wave + ST changes      -0.314        0         0
Sum (log odds)                                  -0.87

Odds of ACI = e^(-0.87) = 0.419
Probability of ACI = 0.419/(1 + 0.419) = 30%
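The arithmetic on this slide can be sketched directly from the active TIPI terms (value 1) plus the constant:

```python
import math

# Contributions for this patient: constant plus the terms that are "on".
terms = {
    "constant": -3.93,
    "chest pain present": 1.23,
    "pain major symptom": 0.88,
    "male sex": 0.71,
    "age > 50": 0.67,
    "male over 50": -0.43,
}
log_odds = sum(terms.values())   # -0.87
odds = math.exp(log_odds)        # about 0.419
prob = odds / (1 + odds)         # about 0.30
print(round(log_odds, 2), round(odds, 3), round(prob, 2))
```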
47. What Happened to Pre-Test Probability?
- Typically, clinical decision rules report probabilities rather than likelihood ratios for combinations of results.
- Can back out LRs if we know the prevalence, P(D+), in the study dataset.
- With logistic regression models, this backing out is known as a "prevalence offset." (See Chapter 8A.)
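One common way such an offset works (a sketch, not necessarily Chapter 8A's exact formulation) is to subtract the study population's log prior odds from the model's log-odds output and add the log prior odds for the new setting:

```python
import math

def offset_log_odds(model_log_odds: float, p_study: float, p_new: float) -> float:
    """Shift a rule's log-odds from study prevalence p_study to a new pre-test prob p_new."""
    logit = lambda p: math.log(p / (1 - p))
    return model_log_odds - logit(p_study) + logit(p_new)

# A rule probability of 30% derived at 15% prevalence maps to a lower
# probability when applied at 6% pre-test probability.
adjusted = offset_log_odds(math.log(0.30 / 0.70), 0.15, 0.06)
print(round(math.exp(adjusted) / (1 + math.exp(adjusted)), 2))
```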
48. Optimal Cutoff for a Single Continuous Test
Depends on:
- Pre-test probability of disease
- ROC curve (likelihood ratios)
- Relative misclassification costs
Cannot choose an optimal cutoff with just the ROC curve.
49. Optimal Cutoff Line for Two Continuous Tests
50. Choosing Which Tests to Include in the Decision Rule
- Have focused on how to combine results of two or more tests, not on which of several tests to include in a decision rule.
- Options include:
  - Recursive partitioning
  - Automated stepwise logistic regression
Choice of variables in the derivation data set requires confirmation in a separate validation data set.
51. Need for Validation: Example
- Study of clinical predictors of bacterial diarrhea.
- Evaluated 34 historical items and 16 physical examination questions.
- 3 questions (abrupt onset, > 4 stools/day, and absence of vomiting) best predicted a positive stool culture (sensitivity 86%, specificity 60% for all 3).
- Would these 3 be the best predictors in a new dataset? Would they have the same sensitivity and specificity?

DeWitt TG, Humphrey KF, McCarthy P. Clinical predictors of acute bacterial diarrhea in young children. Pediatrics. 1985;76(4):551-556.
52. Need for Validation
- Develop a prediction rule by choosing a few tests and findings from a large number of possibilities.
- This takes advantage of chance variations in the data.
- The predictive ability of the rule will probably disappear when you try to validate it on a new dataset.
- Can be referred to as "overfitting."
53. Validation
- No matter what technique (CART or logistic regression) is used, the rule for combining multiple test results must be tested on a data set different from the one used to derive it.
- Beware of validation sets that are just re-hashes of the derivation set.
- (This begins our discussion of potential problems with studies of diagnostic tests.)
54. Studies of Diagnostic Test Accuracy (Sackett, EBM, pg 68)
- Was there an independent, blind comparison with a reference ("gold") standard of diagnosis?
- Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
- Was the reference standard applied regardless of the diagnostic test result?
- Was the test (or cluster of tests) validated in a second, independent group of patients?
55. Bias in Studies of Diagnostic Test Accuracy
- Index test: the test being evaluated
- Gold standard: the test used to determine true disease status
56. Studies of Diagnostic Tests (Sackett, EBM, pg 68)
(The four questions from slide 54, repeated.)
57. Studies of Diagnostic Tests: Incorporation Bias
The index test is incorporated into the gold standard.
Consider a study of the usefulness of various findings for diagnosing pancreatitis. If the "gold standard" is a discharge diagnosis of pancreatitis, which in many cases will be based upon the serum amylase, then the study can't quantify the accuracy of the amylase for this diagnosis.
58. Studies of Diagnostic Tests: Incorporation Bias
A study of BNP in dyspnea patients as a diagnostic test for CHF also showed that the CXR performed extremely well in predicting CHF.
The two cardiologists who determined the final diagnosis of CHF were blinded to the BNP level but not to the CXR report, so the assessment of BNP should be unbiased, but not the assessment of the CXR.

Maisel AS, Krishnaswamy P, Nowak RM, McCord J, Hollander JE, Duc P, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med. 2002;347(3):161-7.
59. Studies of Diagnostic Tests (Sackett, EBM, pg 68)
(The four questions from slide 54, repeated.)
60. Studies of Diagnostic Tests: Verification Bias
The study population only includes those to whom the gold standard was applied, but patients with positive index tests are more likely to be referred for the gold standard.

Example: V/Q scan as a test for PE. The gold standard is a PA-gram. Patients with negative V/Q scans are less frequently referred for PA-gram than those with positive V/Q scans. Only patients who had PA-grams are included in the study.

AKA work-up bias, referral bias, or ascertainment bias.
61. Studies of Diagnostic Tests: Verification Bias
            PA-gram+   PA-gram-
V/Q Scan+   a          b
V/Q Scan-   c (↓)      d (↓)
(Fewer scan-negative patients are verified, so c and d are undercounted.)

Sensitivity (a/(a+c)) is biased UP.
Specificity (d/(b+d)) is biased DOWN.
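The direction of this bias can be sketched with hypothetical numbers (the counts and the 20% referral rate below are invented for illustration, not from any study):

```python
# Full population: a=TP, b=FP, c=FN, d=TN for the index test vs. gold standard.
true_counts = {"a": 80, "b": 100, "c": 20, "d": 800}

# Only 20% of index-negative patients are referred for the gold standard;
# all index-positive patients are referred.
verified = {
    "a": true_counts["a"],
    "b": true_counts["b"],
    "c": round(true_counts["c"] * 0.2),
    "d": round(true_counts["d"] * 0.2),
}

def sens_spec(t):
    return t["a"] / (t["a"] + t["c"]), t["d"] / (t["b"] + t["d"])

print(sens_spec(true_counts))  # true sensitivity and specificity
print(sens_spec(verified))     # sensitivity biased up, specificity biased down
```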
62. Studies of Diagnostic Tests: Double Gold Standard Bias
One gold standard (e.g., biopsy) is applied in patients with a positive index test; another gold standard (e.g., clinical follow-up) is applied in patients with a negative index test.
63. Studies of Diagnostic Tests: Double Gold Standard
Test: V/Q scan. Disease: PE. Gold standard: PA-gram in patients who had one, clinical follow-up in patients who didn't. Study population: all patients presenting to the ED who received a V/Q scan.
Assume some patients did not get a PA-gram because of normal/low-probability V/Q scans but would have had positive PA-grams. Instead they had negative clinical follow-up and were counted as true negatives. If they had had PA-grams, they would have been counted as false negatives.

PIOPED. JAMA. 1990;263(20):2753-9.
64. Studies of Diagnostic Tests: Double Gold Standard
            PA-gram+   PA-gram-
V/Q Scan+   a          b
V/Q Scan-   c          d

Sensitivity (a/(a+c)) biased UP.
Specificity (d/(b+d)) biased UP.
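A hypothetical numeric sketch (counts invented for illustration): scan-negative patients who truly have PE get negative clinical follow-up instead of a PA-gram, so they move from cell c (false negative) to cell d (true negative), inflating both sensitivity and specificity:

```python
# Counts if a single definitive gold standard were applied to everyone.
definitive = {"a": 80, "b": 100, "c": 30, "d": 790}

# 10 scan-negative PE patients are "cleared" by benign clinical follow-up
# and counted as true negatives rather than false negatives.
shift = 10
observed = dict(definitive, c=definitive["c"] - shift, d=definitive["d"] + shift)

def sens_spec(t):
    return t["a"] / (t["a"] + t["c"]), t["d"] / (t["b"] + t["d"])

print(sens_spec(definitive))
print(sens_spec(observed))  # both sensitivity and specificity biased up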
65. Studies of Diagnostic Tests (Sackett, EBM, pg 68)
(The four questions from slide 54, repeated.)
66. Studies of Diagnostic Tests: Spectrum Bias
So far, we have said that the PPV and NPV of a test depend on the population being tested, specifically on the prevalence of D+ in the population.
We said that sensitivity and specificity are properties of the test, independent of the prevalence and, by implication at least, of the population being tested.
In fact...
67. Studies of Diagnostic Tests: Spectrum Bias
Sensitivity depends on the spectrum of disease in the population being tested.
Specificity depends on the spectrum of non-disease in the population being tested.
68. Studies of Diagnostic Tests: Spectrum Bias
The D+ and D- groups are not homogeneous.
D-/D+ really is D-/(D1+, D2+, or D3+)
D-/D+ really is (D1-, D2-, or D3-)/D+
69. Studies of Diagnostic Tests: Spectrum Bias
Example: Absence of the nasal bone (on 13-week ultrasound) as a test for chromosomal abnormality
70. Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality
Nasal Bone   D+    D-     LR
Absent       229   129    27.8
Present      104   5094   0.32
Total        333   5223

Sensitivity = 229/333 = 69%, BUT the D+ group only included fetuses with Trisomy 21.
71. Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality
- The D+ group excluded 295 fetuses with other chromosomal abnormalities (esp. Trisomy 18)
- If the purpose of the nasal bone exam is to determine on whom to get CVS, these 295 fetuses with chromosomal abnormalities other than Trisomy 21 should be included in the D+ group.
- 95/295 (32%, not 69%) had an absent nasal bone.
72. Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality
Nasal Bone   D+                D-
Absent       229 + 95 = 324    129
Present      104 + 200 = 304   5094
Total        333 + 295 = 628   5223

Sensitivity = 324/628 = 52%, NOT the 69% obtained when the D+ group only included fetuses with Trisomy 21.
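The sensitivity shift on this slide can be recomputed directly from the counts (229/333 for Trisomy 21 only; 95/295 for the other chromosomal abnormalities):

```python
# Narrow D+ group: Trisomy 21 only; broad D+ group adds the other abnormalities.
tri21_absent, tri21_total = 229, 333
other_absent, other_total = 95, 295

sens_narrow = tri21_absent / tri21_total
sens_broad = (tri21_absent + other_absent) / (tri21_total + other_total)
print(round(sens_narrow, 2), round(sens_broad, 2))  # 0.69 0.52
```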
73. Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality
- By excluding chromosomal abnormalities other than Trisomy 21 from the D+ group, the study exaggerates the sensitivity of the Nasal Bone Exam (NBE) for chromosomal abnormalities.
- True sensitivity of NBE for chromosomal abnormalities: 52%
- Biased estimate due to spectrum bias (excluding other chromosomal problems): 69%
74. Biases in Studies of Tests
- Overfitting bias: data-snooped cutoffs take advantage of chance variations in the derivation set, making the test look falsely good.
- Incorporation bias: the index test is part of the gold standard (Sensitivity Up, Specificity Up)
- Verification/referral bias: a positive index test increases referral to the gold standard (Sensitivity Up, Specificity Down)
- Double gold standard: a positive index test causes application of the definitive gold standard; a negative index test results in clinical follow-up (Sensitivity Up, Specificity Up)
- Spectrum bias:
  - D+ = sickest of the sick (Sensitivity Up)
  - D- = wellest of the well (Specificity Up)
75. Biases in Studies of Tests
- Don't just identify potential biases; figure out how the biases could affect the conclusions.
- Studies concluding a test is worthless are not invalid if biases in the design would have led to the test looking BETTER than it really is.