Class 5 Additional Psychometric Characteristics: Validity and Bias, Responsiveness, Sensitivity to Change

1
Class 5 Additional Psychometric Characteristics:
Validity and Bias, Responsiveness, Sensitivity to Change
October 16, 2008
  • Anita L. Stewart
  • Institute for Health & Aging
  • University of California, San Francisco

2
Overview
  • Validity
  • Including bias
  • How bias affects validity
  • Responsiveness, sensitivity to change
  • Meaningfulness of change

3
Validity
  • Does a measure (or instrument) measure what it is
    supposed to measure?
  • And: Does a measure NOT measure what it is NOT
    supposed to measure?

4
Valid Scale? No!
  • There is no such thing as a valid scale
  • We accumulate evidence of validity in a variety
    of populations in which it has been tested
  • Similar to reliability

5
Validation of Measures is an Iterative, Lengthy
Process
  • Accumulation of evidence
  • Different samples
  • Longitudinal designs

6
Types of Measurement Validity
  • Content
  • Criterion
  • Construct
      • Convergent
      • Discriminant
      • Convergent/discriminant

All can be concurrent or predictive
7
Content Validity
  • Relevant when writing items
  • Extent to which a set of items represents the
    defined concept

8
Relevance of Content Validity to Selecting
Measures
  • Conceptual adequacy
  • Does the candidate measure adequately represent
    the concept YOU intend to measure?

9
Content Validity Appropriate at Two Levels
  • Battery or instrument: Are all relevant domains
    represented in an instrument?
  • Measure: Are all aspects of a defined
    concept represented in the items of a
    scale?

10
Example of Content Validity of Instrument
  • You are studying health-related quality of life
    (HRQL) in clinical depression
  • Your HRQL concept includes sleep problems,
    ability to work, and social functioning
  • SF-36 is a candidate, but it is missing sleep
    problems

11
Types of Measurement Validity
  • Content
  • Criterion
  • Construct
      • Convergent
      • Discriminant
      • Convergent/discriminant

All can be concurrent or predictive
12
Criterion Validity
  • How well a measure correlates with another
    measure considered to be an accepted standard
    (criterion)
  • Can be concurrent or predictive

13
Criterion Validity of Self-reported Health Care
Utilization
  • Compare self-report with objective data
    (computer records of utilization)
  • MD visits past 6 months (self-report)
    correlated .64 with computer records
  • hospitalizations past 6 months (self-report)
    correlated .74 with computer records

Ritter PL et al., J Clin Epid, 2001;54:136-141
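The criterion-validity coefficient here is a Pearson correlation between the self-report and the accepted standard (the records). A minimal Python sketch follows; the visit counts are hypothetical (not the Ritter et al. data) and `pearson_r` is an illustrative helper name.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical MD visits in the past 6 months for 8 respondents
self_report = [2, 0, 5, 3, 1, 4, 6, 2]   # what respondents said
records = [3, 0, 4, 3, 2, 6, 5, 1]       # the criterion: computer records

print(f"r = {pearson_r(self_report, records):.2f}")
```

The records play the role of the criterion; the same function would apply to hospitalization counts.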
14
Criterion Validity of Screening Measure
  • Develop depression screening tool to identify
    persons likely to have disorder
  • Do clinical assessment only on those who screen
    'likely'
  • Criterion validity: extent to which the screening
    tool detects (predicts) those with the disorder
  • Assessed with sensitivity and specificity, ROC curves
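Sensitivity, specificity, and the ROC curve can be sketched in a few lines of Python. The screener scores, the rule "score >= cutoff means screen likely", and the helper names are assumptions for illustration. Each cutoff yields one (sensitivity, specificity) pair, i.e., one point on the ROC curve; the area under the curve (AUC) summarizes all cutoffs.

```python
def sensitivity_specificity(scores, has_disorder, cutoff):
    """Screen positive = score >= cutoff. Returns (sensitivity, specificity)."""
    pairs = list(zip(scores, has_disorder))
    tp = sum(1 for s, d in pairs if d and s >= cutoff)      # cases correctly flagged
    fn = sum(1 for s, d in pairs if d and s < cutoff)       # cases missed
    tn = sum(1 for s, d in pairs if not d and s < cutoff)   # non-cases correctly passed
    fp = sum(1 for s, d in pairs if not d and s >= cutoff)  # non-cases flagged
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, has_disorder):
    """AUC: probability that a randomly chosen case scores higher than a
    randomly chosen non-case (ties count as 1/2)."""
    cases = [s for s, d in zip(scores, has_disorder) if d]
    non_cases = [s for s, d in zip(scores, has_disorder) if not d]
    wins = sum(1.0 if c > nc else 0.5 if c == nc else 0.0
               for c in cases for nc in non_cases)
    return wins / (len(cases) * len(non_cases))


# Hypothetical screener scores and clinical-assessment results (True = disorder)
scores = [2, 5, 9, 12, 14, 3, 7, 16, 4, 11]
has_disorder = [False, False, True, True, True, False, False, True, False, False]

for cutoff in (5, 10, 15):
    sens, spec = sensitivity_specificity(scores, has_disorder, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
print(f"AUC = {roc_auc(scores, has_disorder):.2f}")
```

Raising the cutoff trades sensitivity for specificity, which is the choice a screening study has to justify.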

15
Criterion Validity of Measure to Predict Outcome
  • If goal is to predict health or other outcome
  • Extent to which the measure predicts the outcome
  • Example: Develop a self-reported war-related stress
    measure to identify vets at risk of PTSD
  • How well does it predict subsequent PTSD? (Vogt et
    al., 2004, readings)

16
Interpreting Validity Coefficients
  • Magnitude and conformity to hypothesis are
    important, not statistical significance
  • Nunnally: validity coefficients rarely exceed .30
    to .40, which may be adequate (1994, p. 99)
  • McDowell and Newell: typically between 0.40 and
    0.60 (1996, p. 36)
  • Maximum correlation between 2 measures = square
    root of the product of their reliabilities
  • 2 scales with .70 reliabilities: max correlation =
    .70
  • So a correlation of .60 would be high
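The ceiling rule can be written as r_max = sqrt(r_xx × r_yy), where r_xx and r_yy are the two reliabilities. A small Python check, using an illustrative helper name, reproduces the two-scale example and shows how close an observed .60 comes to that ceiling:

```python
import math


def max_correlation(rel_x, rel_y):
    """Ceiling on the observed correlation between two measures:
    the square root of the product of their reliabilities."""
    for rel in (rel_x, rel_y):
        if not 0.0 <= rel <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(rel_x * rel_y)


ceiling = max_correlation(0.70, 0.70)   # two scales with .70 reliabilities
observed = 0.60
print(f"max possible r = {ceiling:.2f}")
print(f"observed r = {observed:.2f} is {observed / ceiling:.0%} of that ceiling")
```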

17
Types of Measurement Validity
  • Content
  • Criterion
  • Construct
      • Convergent
      • Discriminant
      • Convergent/discriminant

All can be concurrent or predictive
18
Construct Validity Basics
  • Does measure relate to other measures in
    hypothesized ways?
  • Do measures behave as expected?
  • 3-step process
      • State hypothesis: direction and magnitude
      • Calculate correlations
      • Do results confirm hypothesis?
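The three-step process above can be made explicit in code. This Python sketch assumes cutoffs of .30 and .50 to separate weak, moderate, and strong correlations; those bands, the comparison-measure names, and the correlation values are hypothetical and are not taken from the slides.

```python
def magnitude_band(r):
    """Classify |r| into an ASSUMED magnitude band (.30 and .50 cutoffs)."""
    size = abs(r)
    if size < 0.30:
        return "weak"
    if size < 0.50:
        return "moderate"
    return "strong"


def confirms_hypothesis(r, direction, magnitude):
    """Step 3: does r match the hypothesized direction ('+' or '-') and band?"""
    sign_ok = (r > 0) if direction == "+" else (r < 0)
    return sign_ok and magnitude_band(r) == magnitude


# Step 1: hypotheses for a depression measure (hypothetical comparison measures)
hypotheses = {
    "psychosocial_problems_A": ("+", "moderate"),
    "psychosocial_problems_B": ("+", "moderate"),
}
# Step 2: correlations as they might come out of the data (illustrative values)
observed = {"psychosocial_problems_A": 0.42, "psychosocial_problems_B": 0.18}

# Step 3: compare each result with its hypothesis
for name, (direction, magnitude) in hypotheses.items():
    r = observed[name]
    ok = confirms_hypothesis(r, direction, magnitude)
    print(f"{name}: r = {r:.2f} {'supports' if ok else 'does not support'} the hypothesis")
```

Writing the bands down before looking at the data keeps step 3 honest.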

19
Source of Hypotheses in Construct Validity
  • Prior literature in which associations between
    constructs have been observed
  • e.g., other samples, with other measures of
    constructs you are testing
  • Theory that specifies how constructs should be
    related
  • Clinical experience

20
Who Tests for Validity?
  • When measure is being developed, investigators
    should test construct validity
  • As measure is applied, results of other studies
    provide information that can be used as evidence
    of construct validity

21
Types of Measurement Validity
  • Content
  • Criterion
  • Construct
      • Convergent
      • Discriminant
      • Convergent/discriminant

All can be concurrent or predictive
22
Convergent Validity
  • Hypotheses stated as expected direction and
    magnitude of correlations
  • We expect X measure of depression to be
    positively and moderately correlated with two
    measures of psychosocial problems
  • The higher the depression, the higher the level
    of problems on both measures

23
Testing Validity of Expectations Regarding Aging
Measure
  • Hypothesis 1: ERA-38 total score would correlate
    moderately with ADLs, PCS, MCS, depression,
    comorbidity, and age
  • Hypothesis 2: Functional independence scale would
    show strongest associations with ADLs, PCS, and
    comorbidity

Sarkisian CA et al. Gerontologist. 2002;42:534
24
Testing Validity of Expectations Regarding Aging
Measure
  • Hypothesis 1: ERA-38 total score would correlate
    moderately with ADLs, PCS, MCS, depression,
    comorbidity, and age (convergent)
  • Hypothesis 2: Functional independence scale would
    show strongest associations with ADLs, PCS, and
    comorbidity

Sarkisian CA et al. Gerontologist. 2002;42:534
25
ERA-38 Convergent Validity Results Hypothesis 1
26
ERA-38 Non-Supporting Convergent Validity Results
27
Types of Measurement Validity
  • Content
  • Criterion
  • Construct
      • Convergent
      • Discriminant
      • Convergent/discriminant

All can be concurrent or predictive
28
Discriminant Validity: Known Groups
  • Does the measure distinguish between groups
    known to differ in concept being measured?
  • Tests for mean differences between groups
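A known-groups test typically checks two things: whether the group means fall in the hypothesized order, and whether they differ more than chance would allow. This Python sketch computes a one-way ANOVA F statistic by hand; the scores and group labels are hypothetical.

```python
def one_way_anova(groups):
    """One-way ANOVA for mean differences among k known groups.
    Returns (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical physical functioning scores, listed from the group expected
# to score best to the group expected to score worst
groups = {
    "general population": [95, 90, 100, 85, 88],
    "clinic patients": [80, 70, 85, 75, 78],
    "public health clinic": [55, 40, 60, 50, 45],
}
means = [sum(g) / len(g) for g in groups.values()]
ordered_as_hypothesized = all(a > b for a, b in zip(means, means[1:]))
f_stat, df_b, df_w = one_way_anova(list(groups.values()))
print(f"means = {means}, ordered as hypothesized: {ordered_as_hypothesized}")
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

If SciPy is available, `scipy.stats.f_oneway(*samples)` returns the same F statistic together with a p-value.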

29
Example of a Known Groups Validity Hypothesis
  • Among three groups
      • General population
      • Patients visiting providers
      • Patients in a public health clinic
  • Hypothesis: scores on functioning and well-being
    measures will be best in the general population
    and worst in patients in a public health
    clinic

30
Mean Scores on MOS 20-item Short Form in Three
Groups
  • Columns: general population / MOS patients /
    public health patients
  • Physical function: 91 / 78 / 50
  • Role function: 88 / 78 / 39
  • Mental health: 78 / 73 / 59
  • Health perceptions: 74 / 63 / 41

Bindman AB et al., Med Care 1990;28:1142

31
PedsQL Known Groups Validity
  • Hypothesis: PedsQL scores would be lower in
    children with a chronic health condition than in
    those without

JW Varni et al. PedsQL 4.0: Reliability and
Validity of the Pediatric Quality of Life
Inventory. Med Care, 2001;39:800-812.
32
Types of Measurement Validity
  • Content
  • Criterion
  • Construct
      • Convergent
      • Discriminant
      • Convergent/discriminant

All can be concurrent or predictive
33
Convergent/Discriminant Validity
  • Does measure correlate lower with measures it is
    not expected to be related to than to measures
    it is expected to be related to?
  • The extent to which the pattern of correlations
    conforms to hypothesis is confirmation of
    construct validity

34
Basis for Convergent/Discriminant Hypotheses
  • All measures of health will correlate to some
    extent
  • Hypothesis is of relative magnitude

35
Example of Convergent/Discriminant Validity
Hypothesis
  • Expected pattern of relationships
  • A measure of physical functioning is
    hypothesized to be more highly related to a
    measure of mobility than to a measure of
    depression

36
Example of Convergent/Discriminant Validity
Evidence
  • Pearson correlations with physical functioning
      • Mobility: .57
      • Depression: .25
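Because the hypothesis concerns relative magnitude, checking it amounts to comparing correlations. This minimal Python sketch (helper name hypothetical) uses the .57 and .25 reported for physical functioning. It checks only the ordering; it does not test whether the two correlations differ statistically, which would require a test for dependent correlations.

```python
def pattern_confirmed(convergent, discriminant):
    """True if every correlation hypothesized to be higher (convergent)
    exceeds, in absolute value, every correlation hypothesized to be lower."""
    lowest_convergent = min(abs(r) for r in convergent.values())
    highest_discriminant = max(abs(r) for r in discriminant.values())
    return lowest_convergent > highest_discriminant


# Correlations with physical functioning, as reported for this example
print(pattern_confirmed({"mobility": 0.57}, {"depression": 0.25}))
```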

37
Testing Validity of Expectations Regarding Aging
Measure
  • Hypothesis 1: ERA-38 total score would correlate
    moderately with ADLs, PCS, MCS, depression,
    comorbidity, and age (convergent)
  • Hypothesis 2: Functional independence scale would
    show strongest associations with ADLs, PCS, and
    comorbidity (convergent/discriminant)

Sarkisian CA et al. Gerontologist. 2002;42:534
38
ERA-38 Convergent/Discriminant Validity Results
Hypothesis 2
39
ERA-38 Non-Supporting Validity Results
40
Construct Validity Thoughts: Lee Sechrest
  • There is no point at which construct validity is
    established
  • It can only be established incrementally
  • Our attempts to measure constructs help us better
    understand and revise these constructs

Sechrest L, Health Serv Res, 2005;40(5 Part II):1596
41
Construct Validity Thoughts: Lee Sechrest (cont.)
  • An impression of construct validity emerges from
    examining a variety of empirical results that
    together make a compelling case for the assertion
    of construct validity

42
Construct Validity Thoughts: Lee Sechrest (cont.)
  • Because of the wide range of constructs in the
    social sciences, many of which cannot be exactly
    defined ...
  • ... once measures are developed and in use, we must
    continue efforts to understand them and their
    relationships to other measured variables.

43
Overview
  • Validity
  • Including bias
  • Responsiveness, sensitivity to change
  • Meaningfulness of change

44
Components of an Individual's Observed Item Score
(from Class 3)
  • Observed item score = true score + random error
    + systematic error


45
Random versus Systematic Error
  • Observed item score = true score + random error
    + systematic error
  • Random error: relevant to reliability
  • Systematic error: relevant to validity
46
Bias is Systematic Error
  • Affects validity of scores
  • If scores contain systematic error, we cannot know
    the true mean score
  • We will obtain observed scores that are
    systematically higher or lower than the true
    scores
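A short simulation illustrates the point. It assumes, for simplicity, that true scores are normally distributed and that the systematic error is the same constant for everyone; all numbers are illustrative. Random error averages out of the sample mean, whereas systematic error shifts it.

```python
import random
import statistics

random.seed(42)  # reproducible illustration
true_scores = [random.gauss(50, 10) for _ in range(10_000)]


def observe(true, random_sd=5.0, systematic=0.0):
    """Observed score = true score + random error + systematic error.
    Systematic error is modeled as a constant shift (an assumption)."""
    return [t + random.gauss(0, random_sd) + systematic for t in true]


unbiased = observe(true_scores)
biased = observe(true_scores, systematic=-4.0)  # e.g., under-reporting

for label, scores in [("true", true_scores), ("random error only", unbiased),
                      ("random + systematic", biased)]:
    print(f"{label:>20}: mean {statistics.mean(scores):5.1f}")
```

A constant shift leaves the ranking of respondents unchanged, so the biased scores would still look just as reliable; reliability evidence alone cannot reveal this kind of bias.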

47
Bias or Systematic Error?
  • Bias implies that the direction of the error is
    known
  • Systematic error: a direction-neutral term
  • The same error applies to the entire sample

48
Sources of Bias in Observed Scores of
Individuals
  • Respondent
      • Socially desirable responding
      • Acquiescent response bias
      • Cultural beliefs (e.g., not reporting distress)
      • Halo effects
  • Observer
      • Belief that respondent is ill
  • Instrument

49
Socially Desirable Responding
  • Tendency to respond in socially desirable ways to
    present oneself favorably
  • Observed score is consistently lower or higher
    than true score in the direction of a more
    socially acceptable score

50
Socially Desirable Response Set: Looking Good
  • After coming up with an answer to a question, the
    respondent screens the answer
  • Will this make the person like me less?
  • May edit the answer to be more desirable
  • Example: a woman has 2 drinks of alcohol a day,
    but responds that she drinks a few times a week
  • Systematic underreporting of risk behavior

51
Ways to Minimize Socially Desirable Responding
  • Write items to increase acceptability of an
    undesirable response
  • Instead of:
      • "Have you followed your doctor's
        recommendations?"
  • Use:
      • "Have you had any of the following problems
        following your doctor's recommendations?"

52
Example of Bias Due to Cultural Norms or Beliefs
  • A person feels sad most of the time
  • Unwilling to admit this to the interviewer, so
    answers "a little of the time"
  • Not culturally appropriate to admit to negative
    feelings
  • Always present a positive personality
  • Observed response reflects less sadness than
    true sadness of respondent

53
Acquiescent Response Set: Yea-Saying
  • Tendency to
      • agree with statements regardless of content
      • give positive responses such as "yes," "true,"
        "satisfied"
  • Extent and nature of bias depend on the direction
    of wording of the questions
  • Minimizing acquiescence
      • Include positively- and negatively-worded items
        in the same scale
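Mixing item directions only helps if the negatively worded items are reverse-scored before items are combined. A minimal Python sketch, assuming a 1-to-5 agree/disagree format and hypothetical item positions:

```python
def reverse_score(response, low=1, high=5):
    """Reverse a response on a low..high scale (e.g., 5 -> 1, 4 -> 2)."""
    if not low <= response <= high:
        raise ValueError(f"response {response} is outside {low}..{high}")
    return low + high - response


def scale_score(responses, reversed_items, low=1, high=5):
    """Mean item score after reverse-scoring the negatively worded items,
    identified by their 0-based positions in `responses`."""
    scored = [reverse_score(r, low, high) if i in reversed_items else r
              for i, r in enumerate(responses)]
    return sum(scored) / len(scored)


# Hypothetical 4-item scale (1 = strongly disagree ... 5 = strongly agree);
# items at positions 1 and 3 are negatively worded.
NEGATIVE_ITEMS = {1, 3}
yea_sayer = [5, 5, 5, 5]          # agrees with every statement
consistent_high = [5, 1, 5, 1]    # agrees with positive items, rejects negative ones

print(scale_score(yea_sayer, NEGATIVE_ITEMS))        # 3.0 (scale midpoint)
print(scale_score(consistent_high, NEGATIVE_ITEMS))  # 5.0
```

A respondent who agrees with everything lands at the scale midpoint rather than at an extreme, so acquiescence no longer pushes the score in one direction.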

54
Discrepancies in Various Information Sources
Bias or Different Perspectives?
  • In reporting on a patient's well-being
      • Patients report the highest levels
      • Clinicians report levels in the middle
      • Family members report the lowest levels
  • No way to know which is the true score
  • To say one score is biased implies that another
    is the true score

55
Overview
  • Validity
  • Including bias
  • Responsiveness, sensitivity to change
  • Meaningfulness of change

56
Two Meanings of Sensitivity and Responsiveness
to Change
  • Measure able to detect true changes
  • One knows how much change is meaningful
  • regardless of statistical significance
  • change scores are interpretable in terms of
    meaningfulness

57
Sensitivity to Change: Detects True Change
  • Sensitive to true differences or changes in the
    attribute being measured
  • Sensitive enough to measure differences in
    outcomes that might be expected given the
    relative effectiveness of treatments
  • Ability of a measure to detect change
    statistically

58
Instrument has Potential Distribution of Scores
to Detect Change
  • Evidence of good variability in sample like yours
    (at baseline)
  • Room to improve
  • Multi-item scales: many scale levels
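"Room to improve" can be checked directly in baseline data from a sample like yours. This Python sketch computes the percentage of respondents at the lowest and highest possible scores; the 0-100 range and the scores are hypothetical.

```python
def floor_ceiling(scores, min_score, max_score):
    """Percent of baseline scores at the scale minimum (floor) and maximum (ceiling)."""
    n = len(scores)
    floor = 100 * sum(s == min_score for s in scores) / n
    ceiling = 100 * sum(s == max_score for s in scores) / n
    return floor, ceiling


baseline = [100, 95, 100, 80, 100, 65, 100, 90, 100, 100]  # hypothetical 0-100 scale
print(floor_ceiling(baseline, 0, 100))  # (0.0, 60.0)
```

In this made-up sample 60% of respondents already have the maximum score, so they cannot show improvement on this measure.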

59
Importance of Sensitivity
  • Need to know measure can detect change if
    planning to use it as outcome of intervention
  • Approaches for testing sensitivity are often
    simultaneous tests of
  • effectiveness of an intervention
  • sensitivity or responsiveness of measures

60
Considerations in Developing CHAMPS Physical
Activity (PA) Questionnaire
  • Needed outcome measure to detect changes in PA
    due to CHAMPS intervention
  • increase PA levels in everyday life (e.g.,
    walking, stretching) in activities of their
    choice
  • Existing measures were designed to capture younger
    persons' PA

Stewart AL et al. Med Sci Sports Exerc,
2001;33:1126-1141.
61
Changes in Measure Resulting from Intervention:
Validity Evidence for Others
  • After the intervention detected PA change, others
    used our results as evidence of sensitivity to
    change
  • Used in Project ACTIVE because of its
    sensitivity to change in CHAMPS (S Wilcox et al,
    Am J Pub Health, 2006;96:1201-1209)
  • Changes in a depression measure in a drug trial
    are evidence that the measure is capable of
    detecting change in another study

62
Measuring Sensitivity
  • Score is stable in those who are not changing
  • Score changes in those who are actually changing
    (true change)
  • Not easy to quantify
  • can administer multiple measures of same concept
    in intervention
  • see which measures change the most
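One common way to see which measures change the most is the standardized response mean (SRM): mean change divided by the standard deviation of change. The slides do not name a specific index, so treat the SRM as one illustrative choice; the pre/post scores are hypothetical.

```python
import statistics


def standardized_response_mean(before, after):
    """SRM = mean change / SD of change (one common responsiveness index)."""
    change = [a - b for b, a in zip(before, after)]
    return statistics.mean(change) / statistics.stdev(change)


# Hypothetical pre/post scores on two measures of the same concept,
# given to the same 5 intervention participants
measure_a = ([40, 45, 50, 38, 42], [48, 50, 57, 45, 47])
measure_b = ([40, 45, 50, 38, 42], [43, 44, 55, 37, 46])

for name, (pre, post) in {"A": measure_a, "B": measure_b}.items():
    print(f"measure {name}: SRM = {standardized_response_mean(pre, post):.2f}")
```

Measure A changes more relative to its own noise. The SRM speaks only to the "score changes in those who are changing" half of the definition; stability in those who are not changing would need a separate check, for example in a group not expected to change.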

63
Responsiveness to Change
  • Used DSM-IV criteria to classify patients who had
    major depression at an earlier time into
      • Persistent depression
      • Partial remission
      • Full remission
  • Examined PHQ-9 change scores in relation to these
    criteria
  • PHQ-9: a short screener for depression

Löwe B et al. Med Care, 2004;42:1194-1201
64
Changes in PHQ-9 Scores by Criteria of Change in
Depression
Löwe et al, 2004, p. 1200
65
Relevant or Meaningful Change
  • Is the observed change important?
  • To clinician
  • meaningful to clinician
  • change might influence patient management
  • To patient
  • patient notices change
  • amount of change matters

66
Statistical Significance versus Importance
  • Statistical significance is not sufficient for
    clinical importance
  • Depends on sample size
  • Can obtain statistical significance of a very
    small change
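The dependence on sample size is easy to demonstrate. This Python sketch uses a large-sample normal approximation (an assumption) and a hypothetical mean change of 0.5 points with a standard deviation of change of 10.

```python
import math


def two_sided_p(mean_change, sd_change, n):
    """Two-sided p-value for a mean change score, using a large-sample
    normal approximation: z = mean / (SD / sqrt(n))."""
    z = mean_change / (sd_change / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))


# The same tiny change (0.5 points, SD of change 10) at different sample sizes
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: p = {two_sided_p(0.5, 10.0, n):.2g}")
```

The change is identical in every row; only n differs. Whether 0.5 points matters is a separate question, which is what the MID addresses.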

67
Minimal Important Difference (MID)
  • MID: the minimal difference that is clinically
    important
  • Smallest difference considered to be worthwhile
    or important
  • Context specific

68
Anchor-Based Approaches to Estimating MID
  • Anchor: external information on the amount of
    change
  • Identify a group that you know has changed by a
    minimal amount
      • Clinical change
      • Patient-reported change
  • Change in health measure for this group = MID

69
Example of Patient-Reported Anchor
  • Since one year ago, how would you rate your
    health in general now?
  • Much worse now than one year ago
  • Somewhat worse than one year ago
  • About the same as one year ago
  • Somewhat better than one year ago
  • Much better than one year ago

70
Two Categories Can Define Minimal Change Groups
  • Since one year ago, how would you rate your
    health in general now?
  • Much worse now than one year ago
  • Somewhat worse than one year ago
  • About the same as one year ago
  • Somewhat better than one year ago
  • Much better than one year ago

71
Minimal Change Groups
  • Select the subset of respondents who reported
    "somewhat better" or "somewhat worse"
  • The change in your health measure for this subset
    would constitute the MID
  • Could also combine the two groups using absolute
    change
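The minimal-change-group logic reduces to averaging change scores within selected anchor categories. In this Python sketch the anchor labels follow the transition question ("somewhat better," "somewhat worse," and so on), while the change scores and the helper name are hypothetical.

```python
import statistics

# Hypothetical records: (anchor response, change in health measure score)
data = [
    ("somewhat better", 6), ("somewhat better", 4), ("somewhat better", 8),
    ("somewhat worse", -3), ("somewhat worse", -5),
    ("about the same", 1), ("about the same", -1),
    ("much better", 15),
]


def mid_estimate(records, groups=("somewhat better",), use_abs=False):
    """Mean change score among respondents in the minimal-change anchor group(s).
    With use_abs=True, improvement and decline are pooled as absolute change."""
    changes = [change for anchor, change in records if anchor in groups]
    if use_abs:
        changes = [abs(c) for c in changes]
    return statistics.mean(changes)


print(mid_estimate(data))                                  # MID for improvement
print(mid_estimate(data, ("somewhat worse",)))             # MID for decline
print(mid_estimate(data, ("somewhat better", "somewhat worse"), use_abs=True))
```

Estimating improvement and decline separately shows whether the MID is symmetric; pooling absolute change gives a single figure when it is.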

72
Other Approaches to Assess Meaning of Change
(Relative to a Measured Change)
  • Patient noticed change
      • "Since …, how would you rate the amount of
        change in your physical functioning?"
      • 7-point scale: very much better … very much
        worse
  • Patient satisfied with change
      • "How satisfied are you with the amount of change
        in physical functioning?"
      • 7-point scale: extremely satisfied … not at all
        satisfied

73
Other Measures of Perceived Change
  • Study of patients with hip or knee replacement
  • How successful was your (hip, knee) replacement
    in …
      • allowing you to return to your normal daily
        activities?
      • relieving your pain?
  • Response choices: extremely, very, moderately,
    slightly, not at all successful

KB Bayley et al. Med Care 1995;33:AS226
74
Next Class (Class 6)
  • Factor analysis with Steve Gregorich

75
Homework
  • Complete rows 21-27 in matrix for your two
    measures
  • Nature of samples on which it has been tested,
    validity, responsiveness and sensitivity to
    change