Title: CAT (Critically Appraised Topic) (adapted from Sackett, et al. 2000)
1. CAT (Critically Appraised Topic) (adapted from Sackett et al., 2000)
- 1-page summary of evidence resulting from critical appraisal of an article, test, etc.
- Answers a specific foreground question
- Compared to no treatment, does parent-administered treatment significantly improve the language skills of toddlers with language delay?
2. First part of CAT identical for tx and dx studies (see handout pp. 2-3)
- Clinical bottom line (appears 1st but completed last)
- Clinical question
- Search terms
- Appraised by whom, and date
- Synopsis of key (memorable) information, in a concise, maximally useful format (e.g., types of subjects, procedures, measures, results, etc.)
3. CAT-egories (appraisal points) for a study of therapy (Sackett et al., 2000)
- Prospective, controlled?
- Random assignment?
- Comparing ≥ 2 conditions?
- Recognizable subjects?
- Evidence of pre-tx group similarity?
- Blinding (insofar as possible) of evaluators,
relevant others?
4. Appraisal points (cont.)
- Control over nuisance variables?
- Valid, reliable measures of tx effects?
- Statistically significant difference (p-value)?
- Practically significant difference (d-value)?
- Precision of treatment effects (narrow CI)?
- Outcomes for all enrolled?
- Cost-benefit and feasibility analyses?
5. A sample treatment CAT
- CAT: Language of delayed toddlers improves in response to parent-administered focused stimulation
- Clinical bottom line: Compared to an untreated control group, motivated mothers of low-vocabulary toddlers significantly decreased their speaking rate and language complexity and increased their vocabulary inputs in response to 18 hr of instruction in focused stimulation techniques, and their children produced significantly more words and early grammatical forms.
- Clinical question: Compared to no treatment, does parent-administered treatment significantly improve the language skills of toddlers with language delay?
- Search terms: word learning AND toddlers, PubMed clinical query
- Appraised by: Dollaghan
6. Key appraisal points
- Prospective, controlled: Yes
- Randomized: Yes
- Comparing ≥ 2 conditions: Yes
- Recognizable Ss: Yes
- Pre-tx similarity: Yes
- Blinding: Yes (Cn), no (parent)
- Control over nuisance variables: Yes
- Valid, reliable measures: Yes
- Statistically significant differences: Yes
- Practically significant differences: Yes
- Precision of treatment effects: No
- Outcomes for all enrolled: Yes
- Cost-benefit, feasibility analyses: Yes
7. Critical appraisal of evidence on diagnostic indicators
- The key variables by which individuals are identified as members of a class, ostensibly to improve prediction and outcome for them
- Myriad diagnostic indicators have been proposed in communication sciences and disorders
- Diagnostic indicators in your area of interest?
8. Most diagnostic indicators in CSD are based on Phase I studies
- Group mean comparison studies
- People with, and people without, the condition of interest are compared with respect to a proposed indicator
- Correlational studies
- Association between proposed indicator and accepted indicators
- Such studies can't address the two most crucial features of a diagnostic indicator: accuracy and precision
9. Accuracy and precision
- Accuracy: the ability of an indicator to identify a condition of interest, i.e., the amount of agreement between the proposed indicator and a reference standard
- Precision: the width of confidence intervals (CI) for estimates of accuracy
10. Accuracy of a diagnostic indicator
- The ability of an indicator to identify a condition of interest, i.e., the amount of agreement between the proposed indicator and a reference standard
- Preferred measures of diagnostic accuracy: positive and negative likelihood ratios (Battaglia et al., 2002)
11. Positive Likelihood Ratio (LR+)
- Reflects the degree of confidence that a person who scores in the positive (affected or disordered) range on a dx indicator does have the disorder
- Formula: LR+ = sensitivity / (1 - specificity)
- The higher the LR+, the more informative the indicator for identifying people who have the disorder
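The positive likelihood ratio formula can be sketched in a few lines of Python. The function name `positive_lr` and the input values are illustrative assumptions, not part of the original slides.

```python
def positive_lr(sensitivity: float, specificity: float) -> float:
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    false_positive_rate = 1.0 - specificity
    if false_positive_rate == 0.0:
        # With no false positives the ratio has a zero denominator.
        raise ValueError("LR+ is undefined when specificity is exactly 1")
    return sensitivity / false_positive_rate


# Illustrative values only: sensitivity .80, specificity .70
print(round(positive_lr(0.80, 0.70), 2))
```

A higher result means a positive test result shifts confidence further toward the disorder being present.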
12. Interpreting LR+ values (Sackett et al., 1991)
- LR+ > 20: Very high. Virtually certain that a person with this score has the disorder
- LR+ 10: High. Disorder very likely in a person with this score
- LR+ 4: Intermediate. The indicator is suggestive of disorder but insufficient to diagnose
- LR+ 1: Equivocal. A person who scores in the disordered range on the measure may or may not have the disorder; the measure provides no new information
13. Negative Likelihood Ratio (LR-)
- Reflects the degree of confidence that a person scoring in the negative (normal) range on the diagnostic indicator truly does not have the disorder
- Formula: LR- = (1 - sensitivity) / specificity
- The lower the LR-, the more informative the indicator for ruling out the presence of disorder
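The negative likelihood ratio can be sketched the same way. The function name `negative_lr` and the input values are illustrative assumptions, not part of the original slides.

```python
def negative_lr(sensitivity: float, specificity: float) -> float:
    """Negative likelihood ratio: (1 - sensitivity) / specificity."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    if specificity == 0.0:
        # With no true negatives the ratio has a zero denominator.
        raise ValueError("LR- is undefined when specificity is exactly 0")
    return (1.0 - sensitivity) / specificity


# Illustrative values only: sensitivity .80, specificity .70
print(round(negative_lr(0.80, 0.70), 2))
```

A result closer to zero means a negative test result does more to rule the disorder out.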
14. Interpreting LR- values (Sackett et al., 1991)
- LR- < 0.10: Very low. Virtually certain that a person scoring in this range does not have the disorder
- LR- 0.20: Low. Disorder very unlikely
- LR- 0.40: Intermediate. The indicator is suggestive but insufficient to rule out the disorder
- LR- 1.0: Equivocal. A person scoring in the normal range on this measure may or may not be normal
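The anchor values listed for LR+ and LR- can be turned into a rough lookup. This is a sketch under a loud assumption: the slides give anchor points, not ranges, so treating each anchor as the edge of a band (and the choice of strict versus inclusive comparisons) is a simplification made here for illustration. The function names are hypothetical.

```python
def interpret_lr_plus(lr_plus: float) -> str:
    """Map an LR+ to the highest anchor it reaches (assumed banding)."""
    if lr_plus > 20:
        return "very high"
    if lr_plus >= 10:
        return "high"
    if lr_plus >= 4:
        return "intermediate"
    return "equivocal"


def interpret_lr_minus(lr_minus: float) -> str:
    """Map an LR- to the lowest anchor it reaches (assumed banding)."""
    if lr_minus < 0.10:
        return "very low"
    if lr_minus <= 0.20:
        return "low"
    if lr_minus <= 0.40:
        return "intermediate"
    return "equivocal"


# Illustrative values only
print(interpret_lr_plus(2.67), interpret_lr_minus(0.29))
```

Values that fall between anchors call for judgment; the labels here are only a first pass.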
15. Calculating sensitivity and specificity (nothing more than LR precursors)
- Sensitivity: the percentage of people with the disorder that the new indicator correctly classifies as disordered
- Specificity: the percentage of people who don't have the disorder that the new indicator correctly classifies as not disordered
- The true status of every individual with regard to the disorder is established according to a gold (or reference) standard
16. Disorder Status (re Gold Standard)
New Test Result       + Disorder (LI)       - Disorder (LN)
+ Disorder (LI)       a                     b
- Disorder (LN)       c                     d
                      with disorder         without disorder
17. Disorder Status (re Gold Standard)
[2x2 table: columns are + Disorder (LI) and - Disorder (LN) per the gold standard; rows are + Disorder (LI) and - Disorder (LN) per the new test result]
Sensitivity = a/(a+c) (the proportion of people with the disorder that the new test identifies as having the disorder)
18. Disorder Status (re Gold Standard)
New Test Result       + Disorder             - Disorder
+ Disorder            a (True positive)      b (False positive)
- Disorder            c (False negative)     d (True negative)
Specificity = d/(b+d) (the proportion of people without the disorder that the new test identifies as not having the disorder)
19. Example
- 100 children diagnosed with language impairments (LI) and enrolled in language intervention, and 100 same-age children with no history of language impairment (LN), were administered a new test of grammatical morphology.
- 80 of the children with LI, and 30 of the children with LN, scored in the disordered range on the new measure.
20. Disorder Status (re Gold Standard)
New Test Result       + Disorder (LI)        - Disorder (LN)
+ Disorder (LI)       a = 80                 b = 30
- Disorder (LN)       c = (20)               d = (70)
                      100 with disorder      100 without disorder
Sens = a/(a+c) = 80/100 = .80
Spec = d/(b+d) = 70/100 = .70
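The arithmetic for this example can be sketched in Python. The class name `TwoByTwo` and its method names are hypothetical; the cell counts are the ones given in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cells of a 2x2 table: new test result against the gold standard."""
    a: int  # true positives: test positive, disorder present
    b: int  # false positives: test positive, disorder absent
    c: int  # false negatives: test negative, disorder present
    d: int  # true negatives: test negative, disorder absent

    def sensitivity(self) -> float:
        # Proportion of people with the disorder whom the test flags
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        # Proportion of people without the disorder whom the test clears
        return self.d / (self.b + self.d)


table = TwoByTwo(a=80, b=30, c=20, d=70)
print(table.sensitivity(), table.specificity())  # prints: 0.8 0.7
```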
21. Why not just use sensitivity and specificity as measures of accuracy?
- It's their interrelationship that is most important overall
- Sensitivity and specificity vary substantially according to sample characteristics, including N, base rate (prevalence), severity, confusability
- Likelihood Ratios are not impervious to sample characteristics, but are much less affected than are sensitivity and specificity
22. Calculating Likelihood Ratios
- Sens = .80
- Spec = .70
- LR+ = sens/(1 - spec) = .80/.30 = 2.67
- LR- = (1 - sens)/spec = .20/.70 = 0.29
- Several programs, some free on the web, are set up to allow entry in 2x2 table format
- In addition to accuracy measures, they also provide information on precision
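As a minimal sketch of what such a 2x2-entry program does for the accuracy measures, both likelihood ratios can be computed straight from the four cell counts. The function name `likelihood_ratios` is hypothetical; the counts are those of the 100 LI / 100 LN example.

```python
def likelihood_ratios(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Return (LR+, LR-) from 2x2 cell counts.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    with_disorder = a + c
    without_disorder = b + d
    # LR+ = sensitivity / (1 - specificity) = (a / (a+c)) / (b / (b+d))
    lr_plus = (a / with_disorder) / (b / without_disorder)
    # LR- = (1 - sensitivity) / specificity = (c / (a+c)) / (d / (b+d))
    lr_minus = (c / with_disorder) / (d / without_disorder)
    return lr_plus, lr_minus


lr_plus, lr_minus = likelihood_ratios(a=80, b=30, c=20, d=70)
print(f"LR+ = {lr_plus:.2f}, LR- = {lr_minus:.2f}")  # LR+ = 2.67, LR- = 0.29
```

The sketch raises `ZeroDivisionError` when b or d is zero; a real tool would need to handle those cells explicitly.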
23. Precision of a diagnostic indicator
- Width of confidence intervals (CI) for sensitivity, specificity, and likelihood ratios, calculated by adding and subtracting a multiple of standard error (e.g., 1.96 SE for a 95% CI)
- Standard error depends on sample size and reliability: larger samples and higher reliability will result in narrower CIs, all else being equal
- Sackett et al. (2000) appendix shows how to calculate CIs by hand, and programs (some free) provide CIs given raw numbers in a 2x2 table
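The "add and subtract 1.96 SE" recipe can be sketched for a proportion such as sensitivity or specificity. This assumes the simple binomial standard error sqrt(p(1-p)/n), which reflects sample size only and not measurement reliability; exact or score-based methods used by some programs will give somewhat different limits. The function name `wald_ci` is hypothetical.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate CI for a proportion: p plus or minus z * SE.

    Assumes SE = sqrt(p * (1 - p) / n); limits are clipped to [0, 1].
    """
    p = successes / n
    se = math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)


# Sensitivity of .80 observed in 100 versus 10 people with the disorder
for successes, n in ((80, 100), (8, 10)):
    low, high = wald_ci(successes, n)
    print(f"n = {n}: 95% CI {low:.2f}-{high:.2f}")
```

The same point estimate yields a much wider interval with the smaller sample.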
24. Sample size and precision: 95% CIs for studies with same LRs but different Ns
Measure   Value   N = 200 (95% CI)   N = 20 (95% CI)
Sens      .80     0.71-0.87          0.44-0.98
Spec      .70     0.60-0.79          0.35-0.93
LR+       2.67    1.98-3.70          1.12-7.66
LR-       0.29    0.19-0.42          0.08-0.87
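The effect of sample size on the LR+ interval can be sketched with the common log-scale approximation, where SE(ln LR+) = sqrt(1/a - 1/(a+c) + 1/b - 1/(b+d)). Two assumptions are made here: the approximation is not necessarily the method behind the intervals in the table, so the limits will not match them exactly, and the N = 20 study is taken to have 10 people per group (cells 8, 3, 2, 7). The function name `lr_plus_ci` is hypothetical.

```python
import math


def lr_plus_ci(a: int, b: int, c: int, d: int,
               z: float = 1.96) -> tuple[float, float, float]:
    """Return (LR+, lower, upper) using the log-scale approximation."""
    lr_plus = (a / (a + c)) / (b / (b + d))
    se_log = math.sqrt(1 / a - 1 / (a + c) + 1 / b - 1 / (b + d))
    lower = math.exp(math.log(lr_plus) - z * se_log)
    upper = math.exp(math.log(lr_plus) + z * se_log)
    return lr_plus, lower, upper


# Same sensitivity (.80) and specificity (.70), different sample sizes
studies = {"N = 200": (80, 30, 20, 70), "N = 20": (8, 3, 2, 7)}
for label, cells in studies.items():
    lr_plus, lower, upper = lr_plus_ci(*cells)
    print(f"{label}: LR+ = {lr_plus:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

The point estimate is identical in both studies, but the smaller study leaves far more uncertainty about the true LR+.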
25. CAT-ing evidence on a diagnostic indicator (Sackett et al., 2000; Battaglia et al., 2002)
- Does the study report a comparison between measures, or measure and gold standard?
- Sine qua non for evidence of diagnostic accuracy
- Was the gold (or reference) standard valid, reliable, and/or reasonable?
- Gold standard and new indicator also must be independent to avoid incorporation bias that can inflate accuracy measures
26. Criteria for diagnostic indicators (cont.)
- Were patients enrolled prospectively and consecutively (or by random assignment), and
- Did the sample include a spectrum of patient types and severities?
- These two criteria are important in avoiding spectrum bias, in which the sample includes only clear-cut or hand-picked cases and thus does not represent the diagnostic task
27. Criteria for diagnostic indicators (cont.)
- Were the new measure and the reference standard administered independently, by different examiners, and
- Were the examiners blinded to the subjects' performance on the other test and to other relevant subject information?
- Were the new measure and the reference standard both administered to all subjects and controls?
- Important to avoid differential verification bias, when controls are assumed to be normal without testing on gold standard
28. Criteria for diagnostic indicators (cont.)
- Do likelihood ratios suggest adequate diagnostic accuracy?
- LR+ > 4.0 (> 10, cf. Bayes Library, 2002)
- LR- < 0.40 (< 0.20, cf. Bayes Library, 2002)
- Precision (narrow confidence intervals)?
- Feasibility for usual clinical practice?
- Value (i.e., better than current measure)?
29. Evidence on norm-referenced tests as diagnostic indicators for early LI
- Many norm-referenced tests have diagnosis of LI as their explicit purpose
- A growing number of tests meet typical psychometric criteria, e.g., 100 subjects per age level; reliability > .90; means, standard deviations, and standard errors of measurement
- But very few provide evidence of diagnostic accuracy or precision, and none meet the recommended critical appraisal criteria
30. Norm-referenced tests not providing information on accuracy or precision
- Test of Language Development (TOLD)
- Sequenced Inventory of Communication Development (SICD)
- Test of Early Language Development (TELD)
- Reynell Scales
- MacArthur Communicative Development Inventories (CDI)
31. A few tests provide information allowing accuracy and precision to be calculated
Age    LI   LN   LR+ (95% CI)        LR- (95% CI)
PLS-4, Total language score < 85
3      24   24   6.7 (2.6-19.4)      0.19 (.08-.42)
4      23   23   18 (3.6-102)        0.23 (.10-.44)
5      28   28   4.4 (2.1-10.2)      0.26 (.12-.50)
3-5    75   75   6.7 (3.7-12.5)      0.23 (.14-.35)
CELF-P, Total language score < 85
3-5    80   80   5.3 (2.9-10.2)      0.45 (.34-.58)
CELF-P, Total language score < 77
3-5    80   80   12.7 (4.4-37.8)     0.54 (.43-.66)
- But note that these studies would fail many of the other critical appraisal criteria, their accuracy notwithstanding.
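As a rough check, the point estimates in the table can be screened against the adequacy thresholds given earlier in the deck (LR+ > 4.0 and LR- < 0.40). This sketch looks at point estimates only and ignores the confidence intervals and the other appraisal criteria; the row labels and the function name `meets_lr_criteria` are made up for illustration.

```python
# (label, LR+, LR-) point estimates copied from the table
ROWS = [
    ("PLS-4 total < 85, age 3", 6.7, 0.19),
    ("PLS-4 total < 85, age 4", 18.0, 0.23),
    ("PLS-4 total < 85, age 5", 4.4, 0.26),
    ("PLS-4 total < 85, ages 3-5", 6.7, 0.23),
    ("CELF-P total < 85, ages 3-5", 5.3, 0.45),
    ("CELF-P total < 77, ages 3-5", 12.7, 0.54),
]


def meets_lr_criteria(lr_plus: float, lr_minus: float,
                      min_lr_plus: float = 4.0,
                      max_lr_minus: float = 0.40) -> bool:
    """True when both point estimates clear the adequacy thresholds."""
    return lr_plus > min_lr_plus and lr_minus < max_lr_minus


for label, lr_plus, lr_minus in ROWS:
    verdict = "meets both" if meets_lr_criteria(lr_plus, lr_minus) else "falls short"
    print(f"{label}: LR+ {lr_plus}, LR- {lr_minus} -> {verdict}")
```

On these point estimates, a cutoff can look strong for ruling a disorder in while remaining weak for ruling it out, which is why both ratios are checked.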
32. The situation is no better for other proposed diagnostic indicators
- Few compare indicator to a gold standard, so accuracy can't be determined
- Few used blinded examiners, so there is a high potential for context and other biases
- Small samples, wide CIs (rarely provided)
- When sensitivity and specificity have been reported, they have sometimes been calculated incorrectly and/or misinterpreted
33. I choose not to despair
- Knowing the limitations of our diagnostic tools is an important prerequisite to designing better diagnostic tools
- Several possible ways forward, most involving clinician-researcher partnerships
34. A way forward to EBP in Speech-language pathology and Audiology
- Designing studies to meet the criteria for strong evidence
- e.g., STARD (Bossuyt et al., 2003) statement
- Large-scale, cooperative studies of diagnostic indicators
- CARE-COAD model (Straus et al., 2002)
- Dealing with the absence of a gold standard
- e.g., Demissie et al., 1998; Dunson, 2001; reliability and outcome studies
- Diagnostic studies as multivariable, prediction research (Moons & Grobbee, 2002)
35. Test yourself
- Critical appraisal of diagnostic test (handout p. 5)
- Critical appraisal of treatment study (handout p. 4)
36. Critical appraisal and CAT enable the remaining steps to EBP
- 5. Decide whether the evidence is strong enough to influence your clinical practice
- 6. Integrate the evidence with the intangibles
- 7. Update!
37. EBP is itself a set of assumptions, not a cult
- Ultimately, strong evidence will be needed to determine whether EBP results in improved clinical service.
- And EBP can't be applied blindly, to all kinds of problems...
38. As with many interventions intended to prevent
ill health, the effectiveness of parachutes has
not been subjected to rigorous evaluation by
using randomised controlled trials. Advocates of
evidence based medicine have criticised the
adoption of interventions evaluated by using only
observational data. We think that everyone might
benefit if the most radical protagonists of
evidence based medicine organised and
participated in a double blind, randomised,
placebo controlled, crossover trial of the
parachute.
39. Thanks!
References