Title: Establishing Clinically Important Differences (CIDs) in Health-Related Quality of Life (HRQoL) Measures
1. Establishing Clinically Important Differences (CIDs) in Health-Related Quality of Life (HRQoL) Measures
- Fredric D. Wolinsky
- The University of Iowa
- Center for Health Policy and Research
- October 29, 2004
2. Three Reasons Why Patients Receive Medical Treatment
- Increased longevity
- Prevention of future morbidity
- To feel better
3. The Dilemma: Measuring the Success of Interventions
- For the first two dimensions (longevity and prevention), this is straightforward and relatively easy
- For HRQoL, however, it is not
- Indeed, physiological and laboratory test results are often used as indirect measures or substituted end points
4. What Is HRQoL?
- The aspects of patients' lives that they value
- At a minimum, measures should tap:
  - Symptoms
  - The functional consequences of those symptoms
  - Emotional function
- Measures may be disease-specific or generic
5. How Is HRQoL Measured?
- Questionnaires ask patients how they feel or what they are experiencing
- Response options are typically yes/no, < 7-point scales, or visual analog scales
- Answers are summed within domains or dimensions to yield an overall score
6. The Big Question
- When is the change between the baseline and follow-up HRQoL scores for the same patient clinically relevant?
- Note: we are assessing intra-individual change, not inter-individual change
7. The Answer
- The blunt version: we don't know
- The politically correct version: additional work to enhance the interpretability of HRQoL outcome measures, particularly in terms of clinical significance, is needed
- From Clancy and Eisenberg, Science 282:245-246, 1998
8. Two Methods for Determining the Minimum Level of Real Change
- Distributional approaches
- Anchor-based approaches
9. Common Distributional Approaches
- Effect size measures, where the average change is divided by the baseline standard deviation (Cohen, 1969)
  - Small effect size > .20
  - Medium effect size > .50
  - Large effect size > .80
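The effect size calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function names `effect_size` and `label`, the treatment of values falling exactly on a cutoff, and the example scores are all assumptions made for this sketch.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Average change divided by the baseline standard deviation."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def label(es):
    """Label an effect size using the .20/.50/.80 cutoffs from the slide.

    Counting a value that sits exactly on a cutoff as belonging to the
    larger category is an assumption of this sketch.
    """
    magnitude = abs(es)
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "medium"
    if magnitude >= 0.20:
        return "small"
    return "trivial"


# Hypothetical HRQoL scores for five patients (illustrative only).
baseline = [40, 50, 60, 70, 80]
follow_up = [45, 56, 64, 77, 88]

es = effect_size(baseline, follow_up)
print(f"effect size = {es:.2f} ({label(es)})")
```

Here the mean change is 6 points against a baseline SD of about 15.8, so the sketch reports a small effect.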
10. Are Effect Size Measures Arbitrary?
- Testa (1987) says yes, and then arbitrarily argues for .60 for medium effect size
- Feinstein (1999) suggested .56 based on mathematical properties of $r_{xy}$
- Sloan et al. (1998) showed that if the entire range of a scale is covered in 6 SDs, then an effect size of .50 is the same as the anchor-based suggestion of 0.5 on a 7-point scale
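The arithmetic behind the Sloan et al. point can be written out as follows. This is a reconstruction under the assumption that the "range" of a 7-point item is the distance from its lowest to its highest response (7 − 1 = 6 points); the slide itself does not spell out the steps.

```latex
\begin{align*}
\text{range of a 7-point item} &= 7 - 1 = 6 \text{ points} \\
6\,\mathrm{SD} = 6 \text{ points} &\;\Rightarrow\; \mathrm{SD} = 1 \text{ point} \\
0.5\,\mathrm{SD} &= 0.5 \times 1 \text{ point} = 0.5 \text{ points}
\end{align*}
```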
11. More on the 0.5 SD Effect Size
- Norman et al. (2003) recently reviewed 38 studies that calculated MIDs (minimally important differences)
- In all but six studies, the MID was very close to .50 SD (M = .495, SD = .155)
- And it did not matter whether the measure was generic or disease-specific, or how many response categories there were
12. The Standard Error of Measurement (SEM)
- The SEM is defined as the SD times the square root of one minus its reliability coefficient, or
  - $\mathrm{SEM} = \mathrm{SD} \times \sqrt{1 - r_{xx}}$
- 1-SEM has been argued to represent reliable change
  - If $r_{xx} = .75$, 1-SEM = .50 SD
  - If $r_{xx} > .75$, 1-SEM < .50 SD
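A short Python sketch of the SEM formula shows how the 1-SEM criterion shrinks relative to 0.5 SD as reliability rises. The function name `sem`, the SD of 10, and the reliability values other than .75 are illustrative assumptions, not figures from the study.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical scale SD; reliabilities chosen only to illustrate the trend.
sd = 10.0
for r_xx in (0.75, 0.84, 0.91):
    value = sem(sd, r_xx)
    print(f"r_xx = {r_xx:.2f}: 1-SEM = {value:.2f} ({value / sd:.2f} SD)")
```

At a reliability of .75 the SEM is exactly half the SD; any higher reliability gives a 1-SEM threshold below 0.5 SD.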
13. Anchor-Based Measures
- This approach uses an independent criterion (i.e., an anchor) instead of the statistical distribution
- Anchors are what the patient, the primary care physician (PCP), or the expert says would be the smallest difference that is perceived to be beneficial and warrants a change in treatment
14. An Anchor-Based Example: Global Assessment Items
- Overall, how has your energy level (vitality) changed since your last interview? Would you say your energy level is worse, about the same, or better?
- If worse: How much worse do you feel? Would you say you feel almost the same, hardly any worse at all; a little worse; somewhat worse; moderately worse; a good deal worse; a great deal worse; or a very great deal worse?
15. An Anchor-Based Example: Global Assessment Items
- This yields a 15-point response scale (-7 to +7)
- Conceptually (expressed in absolute values):
  - 0 and 1 reflect no change
  - 2 and 3 reflect small change
  - 4 and 5 reflect medium change
  - 6 and 7 reflect large change
- The average gain scores (deltas) within these categories reflect CIDs
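The mapping from global ratings to change categories, and the averaging of gain scores within each category, can be sketched as below. This is an illustration under stated assumptions: the function names, the choice to keep improvements and declines in separate groups (so that opposite-signed gains do not cancel), and the example records are hypothetical and not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def change_category(rating):
    """Map a global change rating (-7..+7) to a category by absolute value."""
    if not -7 <= rating <= 7:
        raise ValueError("rating must lie between -7 and +7")
    size = abs(rating)
    if size <= 1:
        return "none"
    if size <= 3:
        return "small"
    if size <= 5:
        return "medium"
    return "large"


def group_key(rating):
    """Combine category and direction; 'none' is treated as one group."""
    category = change_category(rating)
    if category == "none":
        return "none"
    return f"{category} {'better' if rating > 0 else 'worse'}"


def mean_gain_by_category(records):
    """Average the gain scores of (rating, gain) pairs within each group."""
    groups = defaultdict(list)
    for rating, gain in records:
        groups[group_key(rating)].append(gain)
    return {key: mean(gains) for key, gains in groups.items()}


# Hypothetical (global rating, per-item gain score) pairs.
records = [(0, 0.1), (1, -0.1), (2, 0.4), (3, 0.6),
           (-2, -0.3), (-3, -0.5), (5, 1.2), (-6, -1.5)]

for key, value in sorted(mean_gain_by_category(records).items()):
    print(f"{key}: mean gain = {value:+.2f}")
```

Each group mean is then a candidate CID for that size and direction of change.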
16. Guyatt et al.'s Three HRQoL Measures With Established CIDs
- CHQ -- the Chronic Heart Failure Questionnaire -- for CAD/CHF patients
- CRQ -- the Chronic Respiratory Questionnaire -- for COPD patients
- AQLQ -- the Asthma Quality of Life Questionnaire -- for asthma patients
17. These Measures Use 7-Point Response Sets with Two Types of Items
- Standard items: the same questions asked of all patients in the study
- Patient-specific items: different questions generated for each patient, based on a 3-step process
  - ask patients what is important to them
  - review a standard list of items
  - have patients select the five most important items
18. Bad News and Good News About Patient-Specific Questions
- Bad news: it takes more time to administer (about 8-10 minutes to derive the top five affected activities), and is cognitively taxing for older or disadvantaged patients
- Good news: it increases relevance to each patient, and enhances follow-up participation rates
19. The CHQ: Three Domains, 16 Items
- Dyspnea is tapped by 5 patient-specific activities affected by their disease
- Fatigue is tapped by 4 standard items
- Emotional function is tapped by 7 standard items
20. The CRQ: Four Domains, 20 Items
- Dyspnea is tapped by 5 patient-specific activities affected by their disease
- Fatigue is tapped by 4 standard items
- Emotional function is tapped by 7 standard items
- Mastery is tapped by 4 standard items
21. The AQLQ: Four Domains, 32 Items
- Activities are tapped by 5 patient-specific and 6 standard activities
- Symptoms are tapped by 12 standard items
- Emotional function is tapped by 5 standard items
- Environment is tapped by 4 standard items
22. Guyatt et al. Used Two Streams of Data to Derive CIDs
- Panel of providers experienced in administering the CHQ and CRQ questionnaires
  - same small panel for the CHQ and CRQ
  - no panel for the AQLQ
- Patients with symptomatic disease
  - very small samples
  - CHF and COPD patients combined
23. What the Panelists and the Patients Did
- Panelists defined CIDs by consensus
- Patients completed HRQoL measures at baseline and follow-up, and global change assessments at follow-up, from which anchor gain-scores were calculated
- By triangulation, a minimal CID standard was defined as a 0.5 per-item change
24. The Aims of Our AHRQ-Funded Study
- Refine and extend CIDs (small, medium, and large) for the CHQ, CRQ, and AQLQ in CAD/CHF, COPD, or asthma patients
- Develop comparable CIDs for the SF-36 (V-2) in these three patient groups
- Identify SEM values corresponding to these CIDs
25. Why Include the SF-36, V-2?
- It is the most widely used HRQoL measure
- Cost-effectiveness studies frequently use it
- No CIDs exist for the SF-36, despite the myth, based on inter-individual data, that the CID equals 3-5 points for any subscale
- New response sets provide better reliability
26. The SF-36, V-2: Eight Domains, 35 Standard Items
- Physical functioning: 10 items, 3 responses
- Role physical: 4 items, 5 responses
- Bodily pain: 2 items, 5 or 6 responses
- General health perceptions: 5 items, 5 responses
- Vitality: 4 items, 5 responses
- Social functioning: 2 items, 5 responses
- Role emotional: 3 items, 5 responses
- Mental health: 5 items, 5 responses
27. We Rely On Three Streams of Data
- North American panels of expert physicians to define CIDs
- Patients attending general medicine clinics participating in a multi-wave longitudinal observational study
- The primary care physicians of the patients in that longitudinal study
28. Data Stream One: The Expert Physician Panels
- Three separate panels of nine physicians each
- Potential physician panelists were identified using Medline searches for published studies using the CHQ, CRQ, AQLQ, or SF-36 among patients within the three disease groups
- Final panelist selection was based on obtaining a balance between generalists and specialists, geographic diversity, research groups, and availability
29. The Three Tasks for the Expert Physician Panels
- Approve search terms for electronic medical chart reviews that identify potentially eligible patients
- Achieve consensus on CID thresholds for small, medium, and large individual change for the better or worse
- Agree on the suggested wording of the global change questions and their response options that will be asked of patients at follow-up
30. Data Stream Two: The Middle-Aged or Older Patients
- 1,662 patients, or 300 targeted in each disease group (CAD/CHF, COPD, and asthma) at each clinical site (St. Louis VAMC PRIME and related clinics; Indiana University PCC clinics)
- A three-part eligibility process: electronic medical chart review, PCP confirmation, and patient screening
31. The Three Tasks for the Middle-Aged or Older Patients
- 15-minute enrollment interview in the clinic to elicit the HRQoL patient-specific items
- 30-minute baseline telephone interview that includes the disease-specific measure, the SF-36, socio-demographics, stress, sense of control, religiosity, and patient satisfaction measures
- Six bimonthly 30-minute telephone interviews including most of the above measures and the retrospective change questions
32. Data Stream Three: The Primary Care Physicians
- 46 GIM faculty who each have 50 or more potentially eligible patients
  - 14 PCPs from the St. Louis VAMC clinics
  - 32 PCPs from Indiana University clinics
33. Three Tasks for the Primary Care Physicians
- Chart review and disease confirmation of their own potentially eligible patients
- Completion of a 6-item questionnaire about their patients' disease severity, prognosis, and treatment history at the time of the enrollment visit
- Completion of a 6-item questionnaire at their patients' subsequent visits about clinically significant changes since the last visit, and whether these resulted in a treatment modification
34. How We Are Determining the CIDs
35. Determining the Expert-Based CIDs
- A two-round Delphi approach including background materials
- A 4-6 hour in-person meeting to review the Delphi results and achieve consensus on the CIDs
- Iterative review of the panel reports to achieve consensus
- Each panel report has been published (JGIM, AHJ, AAAI); the comparison paper is in press (HSR)
36. Determining the PCP-Based CIDs
- At follow-up visits, PCPs are asked if there has been a clinically significant change in the patient's condition since the last visit, if so in what direction, and whether it was small, medium, or large (7, not 15, responses)
- Within those response groups, the patient's gain score in HRQoL (current minus last) is calculated
37. Determining the Patient-Based CIDs
- At follow-up visits, patients are asked if there has been a significant change in their condition since the last visit, if so in what direction, and, on a 7-point scale, how much better or worse (15 responses)
- Within those response groups, the patient's gain score in HRQoL (current minus last) is calculated
38. Arriving at Common Ground
- Triangulate or consolidate the three (expert, PCP, and patient-based) sets of CIDs
- But, if they are too disparate, we have specified an a priori evidentiary hierarchy:
  - PCPs (it's CIDs)
  - Patients (it's their HRQoL)
  - Experts (they weren't there)
39. Results: The Expert Physician Data Stream
42. Results: The PCP Data Stream
43. PCP Baseline Questionnaires
- Rating of patient severity: M = 3.0 (1 = much worse, 3 = about average, 5 = much better)
- Approximate chance of patient being hospitalized or dying in two years: MH = 28.1%, MD = 14.8%
- Patients on medication (97%), who ever had lab tests ordered (84%), or were referred to specialists (50%) for the target condition
44. PCP Follow-up Questionnaires
- PCPs reported CIDs in the patient's condition in 205 (19.4%) of the 1,057 follow-up visits
- Of these 205 linked visits:
  - 65.4% were declines
  - 56.5% were small changes
  - 36.5% were medium changes
  - 6.8% were large changes
45. PCP Follow-up Questionnaires
- Among the 205 patients with CIDs in their target condition, at that visit:
  - 46.8% had orders for changes in medication
  - 24.9% had orders for labs or procedures
  - 12.7% had referrals to specialists
46. Mean Transformed Gain Scores by the PCPs' Global Change Perception Category for the SF-36 Scales
47. Mean Transformed PCP-Based CIDs for the AQLQ Scales
48. Mean Transformed PCP-Based CIDs for the CHQ Scales
49. Mean Transformed PCP-Based CIDs for the CRQ Scales
50. Results: The Patient Data Stream
51. Patient Follow-up Questionnaires
- Patients reported CIDs in their condition in 2,981 (36.2%) of their 8,254 linked follow-up visits
- Of these 2,981 linked follow-up visits with CIDs:
  - 62.5% were declines
  - 45.6% were small changes
  - 35.4% were medium changes
  - 19.0% were large changes
52. Mean Transformed Gain Scores by Global Change Perception Category for the SF-36 Scales
53. Mean Transformed Gain Scores by Global Change Perception Category for the AQLQ Scales
54. Mean Transformed Gain Scores by Global Change Perception Category for the CHQ Scales
55. Mean Transformed Gain Scores by Global Change Perception Category for the CRQ Scales
56. Summary: The Expert Panels
- Used doubling and trebling rules to go from small to medium to large CIDs
- Assumed that CIDs for improvement and decline were symmetrical
- Used state change multiples for the SF-36
- Were a bit (heart, lung) to twice (asthma) as conservative as Guyatt's original panels
57. Summary: The PCPs
- Were less likely than patients to indicate that change had occurred (19% vs. 36%)
- Were more likely than patients to indicate that changes were small (57% vs. 46%) and less likely to indicate that they were large (7% vs. 19%)
- Did not alter the treatment regimen for 58% of patients with CIDs
58. Summary: The Patients
- Because of the high (> .75) reliability estimates for the scales, the value of 1-SEM was often much less than that of 0.5 SD
- The anchor-based CIDs were not symmetrical, and the expert panels' doubling and trebling rules did not hold
- The anchor-based CIDs are much smaller than the distributional or panel CIDs
59. Conclusion
- Reconciling the CID estimates from the three data streams is not going to be straightforward
- So, stay tuned