Validity - PowerPoint PPT Presentation

Slides: 55
Provided by: Patr581

Title: Validity


1
Validity Outline
  • Definition
  • Validity: Two Different Views
  • Types of Validity
  • Face
  • Content
  • Criterion
  • Predictive vs. Concurrent
  • Validity Coefficients
  • Construct
  • Convergent
  • Discriminant

2
Validity: Definition
  • Validity measures agreement between a test score
    and the characteristic it is believed to measure
  • The basic question is: are you measuring what you
    think you're measuring?

3
Validity: two very different views
  • Traditional
  • Validity is a property of tests
  • Does the test measure what you think it measures?

4
Validity: two very different views
  • Traditional
  • Recent (e.g., Messick, 1989; Committee on
    Standards for Educational and Psychological
    Testing (CSEPT))
  • Validity is a property of test score
    interpretations
  • Validity exists when actions based on the
    interpretation are justified given a theoretical
    basis and social consequences

5
Note the difference
  • Does the test measure what you think it measures?
  • Validity exists when actions based on the
    interpretation are justified given a theoretical
    basis and social consequences

6
A problem with the CSEPT view
  • Who is to say the social consequences of test
    use are good or bad?
  • According to CSEPT, validity is a subjective
    judgment
  • In my view, this makes the concept useless: if
    you like the result the test gives you, you will
    consider it valid. If you don't, you won't.
  • That's not how scientists think.

7
Borsboom et al. (2004)
  • Borsboom et al. reject CSEPT's view
  • "Validity is a very basic concept and was
    correctly formulated, for instance, by Kelley
    (1927, p. 14) when he stated that a test is valid
    if it measures what it purports to measure" (p.
    1061)

8
Borsboom et al. (2004)
  • "A test is valid for measuring an attribute if
    and only if (a) the attribute exists and (b)
    variations in the attribute causally produce
    variations in the outcomes of the measurement
    procedure."
  • Variations in what you are measuring cause
    variations in your measurements.
  • E.g., variations across people in intelligence
    cause variations in their IQ scores
  • This is not a correlational model of validity

9
Borsboom et al. (2004)
  • You don't create a test and then do the analysis
    necessary to establish its validity
  • Rather, you begin by doing the theoretical work
    necessary to create a valid test in the first
    place.
  • On this view, validity is not a big issue.

10
Borsboom et al. vs. CSEPT
  • Who is right?
  • Each scientist has to make up his or her own
    mind on that question
  • I find Borsboom et al.'s arguments compelling.
  • Other psychologists may disagree

11
The CSEPT view
  • CSEPT recognizes 3 types of evidence for test
    validity
  • Content-related
  • Criterion-related
  • Construct-related
  • Boundaries not clearly defined
  • Cronbach (1980): Construct is basic, while
    Content & Criterion are subtypes.

12
Parenthetical Point: Face Validity
  • Face validity refers to the appearance that a
    test measures what it is intended to measure.
  • Face validity has P.R. value: test-takers may be
    better motivated if the test appears to be a
    sensible way to measure what it measures.

13
CSEPT: Content validity
  • Content-related evidence considers coverage of
    the conceptual domain tested.
  • Important in educational settings
  • Like face validity, it is determined by logic
    rather than statistics
  • Typically assessed by expert judges

14
CSEPT: Content validity
  • Content-related evidence considers coverage of
    the conceptual domain tested.
  • Construct-irrelevant variance
  • Construct under-representation
  • Is each item relevant to domain?
  • Is domain adequately covered or are parts of it
    left out?
  • But if you are going to ask these questions, why
    not do it when creating the test?

15
Borsboom et al.: Content validity
  • Borsboom et al. would say that content validity
    is not something to be established after the test
    has been created.
  • Rather, you build it into your test by having a
    good theory of what you are testing
  • E.g., for a test in this course to have content
    validity, it should test your understanding of
    content validity!

16
CSEPT: Criterion validity
  • Criterion-related evidence tells us how well a
    test score corresponds to a particular criterion
    measure.
  • A criterion is a standard against which a test is
    compared.
  • The test score should tell us something about the
    criterion score.

17
CSEPT: Criterion validity
  • A criterion is a standard against which a test is
    compared.
  • E.g., we could compare GPAs to SAT scores to
    produce evidence for the validity of conclusions
    drawn on the basis of SAT scores
  • Two basic types
  • Predictive
  • Concurrent

18
CSEPT: Criterion validity
  • Predictive validity
  • Test scores used to predict future performance:
    how good is the prediction?
  • E.g., SAT is used to predict final undergraduate
    GPA
  • SAT & GPA are moderately correlated
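Using a test for prediction usually means fitting a line to an earlier group for whom both the test score and the later criterion are known, then applying that line to new test-takers. A minimal sketch, assuming simple least-squares regression and using invented SAT-like scores and GPAs (the numbers and the helper `fit_line` are hypothetical, not real admissions data or any institution's procedure):

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y (criterion) from x (test)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Invented earlier cohort: admission test scores and their later final GPAs
past_test = [1050, 1200, 1320, 980, 1150, 1400, 1100, 1250]
past_gpa = [2.8, 3.1, 3.4, 2.9, 3.0, 3.5, 3.2, 3.1]

slope, intercept = fit_line(past_test, past_gpa)

# Predict final GPA for new applicants from their test scores alone
for score in (1000, 1300):
    print(f"test score {score}: predicted GPA {intercept + slope * score:.2f}")
```

How good the prediction is depends on how strongly the test and criterion are correlated in the earlier group; with a moderate correlation, individual GPAs will scatter widely around the predicted values.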

19
CSEPT: Criterion validity
  • Predictive validity
  • Concurrent validity
  • Correlation between test scores and criterion
    when the two are measured at the same time.
  • Test illuminates current performance rather than
    predicting future performance (e.g., why does the
    patient have a temperature? Why can't the student
    do math?)

20
Borsboom et al.: Criterion validity
  • Criterion validity involves a correlation of
    test scores with some criterion such as GPA
  • That does not establish the test's validity, only
    its utility.
  • E.g., height and weight are correlated, but a
    test of height is not a test of what bathroom
    scales measure.

21
Borsboom et al.: Criterion validity
  • SAT is valid because it was developed on the
    sensible theory that past academic achievement
    is a good guide to future academic achievement
  • Validity is built into the test, not established
    after the test has been created

22
Borsboom et al.: Criterion validity
  • Validation research aims at showing how variation
    in the attribute causes variation in the test
    score
  • This requires a theory of the task: how does
    the test-taker do the mental operations needed to
    respond to test items?

23
CSEPT: Criterion validity
  • Note: there is no point in developing a test if
    you already have a criterion, unless impracticality
    or expense makes use of the criterion difficult.
  • Criterion measure only available in the future?
  • Criterion too expensive to use?

24
CSEPT: Criterion validity
  • Validity Coefficient
  • Compute correlation (r) between test score and
    criterion.
  • r = .30 or .40 would be considered normal.
  • r > .60 is rare
  • Note: r varies between -1.0 and +1.0
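The validity coefficient is an ordinary Pearson correlation between test scores and criterion scores. A minimal sketch with invented scores (the data and the helper name `pearson_r` are hypothetical, not taken from any real validity study):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between paired test scores and criterion scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: test scores and a criterion measured on the same 8 people
test_scores = [52, 61, 47, 70, 58, 66, 49, 55]
criterion = [2.9, 3.1, 3.0, 3.4, 2.7, 3.6, 2.8, 3.3]

r = pearson_r(test_scores, criterion)
print(f"validity coefficient r = {r:.2f}")
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same Pearson coefficient.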

25
CSEPT: Criterion validity
  • Validity Coefficient
  • r² gives the proportion of variance in the
    criterion explained by the test score.
  • E.g., if r_xy = .30, then r² = .09, so 9% of the
    variability in Y can be explained by variation
    in X
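The claim that r² is the proportion of criterion variance explained can be checked numerically: for a least-squares line with an intercept, r² equals 1 - SS_residual / SS_total. A sketch with invented scores (the helper name `variance_explained` is hypothetical):

```python
import math

def variance_explained(xs, ys):
    """Return (r**2, 1 - SS_res/SS_tot) for a least-squares line predicting y from x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)  # SS_total for the criterion
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Criterion variability left over after using the test to predict it
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return r ** 2, 1 - ss_res / syy

# Hypothetical test (x) and criterion (y) scores
x = [10, 12, 9, 15, 11, 14, 8, 13]
y = [20, 25, 24, 27, 19, 30, 22, 23]
r2, explained = variance_explained(x, y)
print(f"r^2 = {r2:.3f}, share of criterion variance explained = {explained:.3f}")

# The slide's worked example: r = .30 gives r^2 = .09, i.e. 9% of the variance
print(f"{0.30 ** 2:.2f}")  # prints 0.09
```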

26
CSEPT: Criterion validity
  • Interpreting validity coefficients: watch out
    for
  • Changes in causal relationships
  • What does criterion mean? Is it valid, reliable?
  • Is subject population for validity study
    appropriate?
  • Sample size
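One standard way to see why sample size matters (the Fisher z-transformation, which the slide itself does not specify) is to put an approximate 95% confidence interval around an observed r. A sketch, assuming the usual normal approximation; the function name is hypothetical:

```python
import math

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation via Fisher's z-transformation."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(r)               # transform r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)       # standard error of z
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to the r scale

for n in (30, 300):
    lo, hi = r_confidence_interval(0.30, n)
    print(f"r = .30, n = {n}: 95% CI ({lo:.2f}, {hi:.2f})")
```

With n = 30 the interval around r = .30 includes zero; with n = 300 it does not, so the same validity coefficient carries very different weight depending on the size of the validity study.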

27
CSEPT: Criterion validity
  • Interpreting validity coefficients: watch out
    for
  • Criterion/predictor confusion
  • Range restrictions
  • Do validity study results generalize?
  • Differential predictions
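Range restriction can be illustrated by simulation. The sketch below assumes a made-up linear model (criterion = 0.5 × test + noise) and keeps only people whose test score is more than 1 SD above the mean, as if only high scorers had been admitted and later measured on the criterion:

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)  # fixed seed so the run is reproducible
test = [rng.gauss(0, 1) for _ in range(20_000)]
# Assumed model: criterion depends partly on the test score, plus unrelated noise
criterion = [0.5 * t + rng.gauss(0, 1) for t in test]

full_r = pearson_r(test, criterion)

# Restrict the range: only people scoring above +1 SD are "selected"
selected = [(t, c) for t, c in zip(test, criterion) if t > 1.0]
restricted_r = pearson_r([t for t, _ in selected], [c for _, c in selected])

print(f"full-range r = {full_r:.2f}, restricted-range r = {restricted_r:.2f}")
```

Under these assumptions the full-range r is roughly .45 and the restricted-range r roughly .2, even though the relationship between test and criterion is identical for everyone. A validity study run only on selected people can therefore understate the coefficient.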

28
CSEPT: Construct validity
  • Problem: for many psychological characteristics
    of interest, there is no agreed-upon universe of
    content and no clear criterion
  • We cannot assess content or criterion validity
    for such characteristics
  • These characteristics involve constructs:
    something built by mental synthesis.

29
CSEPT: Construct validity
  • Examples of constructs
  • Intelligence
  • Love
  • Curiosity
  • Mental health
  • CSEPT: We obtain evidence of validity by
    simultaneously defining the construct and
    developing instruments to measure it.
  • This is bootstrapping.

30
Bootstrapping construct validity
  • assemble evidence about what a test means; in
    other words, about the characteristic it is
    testing.
  • CSEPT: this process is never finished
  • Borsboom: this is part of the process of creating
    a test in the first place, not something done
    after the fact

31
Bootstrapping construct validity
  • assemble evidence
  • show relationships between a test and other tests
  • none of the other tests is a criterion
  • Borsboom: these relationships do not tell us what
    a test score means
  • (e.g., age is correlated with annual income but a
    measure of age is not a measure of annual income).

32
Bootstrapping construct validity
  • assemble evidence
  • show relationships
  • each new relationship adds meaning to the test
  • the test's meaning is gradually clarified over time
  • Borsboom would say: why all the mystery? The
    meaning of many tests (e.g., WAIS, academic
    exams, Piaget's tests) is clear right from the
    start

33
CSEPT: Construct validity
  • Example from the text: Rubin's work on Love.
  • Rubin collected a set of items for a Love scale
  • He read poetry and novels, and asked people for
    definitions
  • He created a scale of Love and one of Liking

34
CSEPT: Construct validity
  • Rubin gave the scales to many subjects and
    factor-analyzed the results
  • Love integrates Attachment, Caring, & Intimacy
  • Liking integrates Adjustment, Maturity, Good
    Judgment, and Intelligence
  • The two are independent: you can love someone you
    don't like (as songwriters know)

35
Campbell & Fiske (1959)
  • Two types of Construct-related Evidence
  • Convergent evidence
  • When a test correlates well with other tests
    believed to measure the same construct

36
Campbell & Fiske (1959)
  • Two types of Construct-related Evidence
  • Convergent evidence
  • Discriminant evidence
  • When a test does not correlate with other tests
    believed to measure some other construct.

37
Convergent validity
  • Example: Health Index
  • Scores correlated with age, number of symptoms,
    chronic medical conditions, physiological
    measures
  • Treatments designed to improve health should
    increase Health Index scores. They do.

38
Discriminant validity
  • low correlations between new test and tests
    believed to tap unrelated constructs.
  • evidence that the new test measures something
    unique
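Campbell & Fiske's two kinds of evidence can be illustrated with simulated data. Everything here is an assumption for illustration: two independent constructs (A and B), a new test and an established test that both measure A, and a test of B, each scored as the true value plus random measurement noise:

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(7)  # fixed seed for reproducibility
n = 5_000

# Two hypothetical, unrelated constructs
construct_a = [rng.gauss(0, 1) for _ in range(n)]
construct_b = [rng.gauss(0, 1) for _ in range(n)]

def measure(true_scores, noise_sd=0.5):
    """Observed score = true score on the construct + measurement noise."""
    return [t + rng.gauss(0, noise_sd) for t in true_scores]

new_test = measure(construct_a)
established_a = measure(construct_a)  # another test of the same construct
established_b = measure(construct_b)  # a test of an unrelated construct

convergent = pearson_r(new_test, established_a)    # should be high
discriminant = pearson_r(new_test, established_b)  # should be near zero
print(f"convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")
```

A pattern of high convergent and low discriminant correlations is what CSEPT would count as construct-related evidence. Borsboom et al.'s objection still applies: the pattern alone does not say what the new test measures.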

39
CSEPT: Validity & Reliability
  • CSEPT: No point in trying to establish validity
    of an unreliable test.
  • It's possible to have a reliable test that has no
    meaning (is not valid).
  • It is logically impossible to produce evidence of
    validity for an unreliable test.
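Classical test theory offers one standard way to make this slide's point concrete (the formula is not stated on the slide): the correlation a test can show with a criterion is limited by the reliabilities of both measures. Under the classical model, observed r_xy = r_true × √(r_xx × r_yy), where r_true is the correlation between true scores. A sketch, with a hypothetical function name:

```python
import math

def attenuated_r(true_r, rel_test, rel_criterion):
    """Observed correlation expected under classical test theory, given the
    correlation between true scores and the reliabilities of the two measures."""
    for value in (rel_test, rel_criterion):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return true_r * math.sqrt(rel_test * rel_criterion)

# Even a perfect true-score relationship is invisible with a totally unreliable test
for rel in (0.9, 0.5, 0.0):
    print(f"test reliability {rel:.1f}: observed r = {attenuated_r(1.0, rel, 0.8):.2f}")
```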

40
Borsboom: Validity & Reliability
  • Borsboom et al.: what does it mean to say that a
    test is reliable but not valid?
  • What is it a test of?
  • It isn't a test at all, just a collection of
    items

41
Borsboom: Validity & Reliability
  • Borsboom et al.: validity is a necessary condition
    for reliability
  • Reliability of a test of X estimates precision of
    measurement of X, but how could you estimate the
    precision of measurement of X for a test that
    does not measure X?
  • Thus, validity is presumed when you assess
    reliability

42
Blanton & Jaccard: arbitrary metrics
  • We observe a behavior in order to learn about the
    underlying psychological characteristic
  • A person's test score represents their standing
    on that underlying dimension
  • Such scores form an arbitrary metric
  • That is, we do not know how the observed scores
    are related to the true scores on the underlying
    dimension

43
[Figure: Persons A and B placed on an underlying dimension with a neutral point; Test 1 scores the dimension from 0 to 6, Test 2 from 6 to 0. Adapted from Blanton & Jaccard (2006), Figure 1, p. 29.]

44
Arbitrary metrics: the IAT
  • The Implicit Association Test (IAT) is claimed to
    diagnose implicit attitudinal preferences or
    racist attitudes
  • IAT authors say you may have prejudices you don't
    know you have.
  • Are these claims true?

45
Arbitrary metrics: the IAT
  • Task: categorize stimuli using two pairs of
    categories
  • Two buttons to press, two assignments of
    categories to buttons, used in sequence

46
Arbitrary metrics: the IAT
  • Assignment pattern A
  • Button 1: press if stimulus refers to the
    category White or the category Pleasant
  • Button 2: press if stimulus refers to the
    category Black or the category Unpleasant
  • Assignment pattern B
  • Button 1: press if stimulus refers to the
    category White or the category Unpleasant
  • Button 2: press if stimulus refers to the
    category Black or the category Pleasant

47
Arbitrary metrics: the IAT
  • IAT authors claim that if responses are faster to
    Pattern A than to Pattern B, that indicates a
    preference for Whites over Blacks; in other
    words, a racist attitude
  • IAT authors also give test-takers feedback about
    how strong their preferences are, based on how
    much faster their responses are to Pattern A than
    to Pattern B
  • This is inappropriate
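The feedback described on this slide rests on how much faster responses are under Pattern A than under Pattern B. A deliberately simplified sketch with invented latencies follows; it computes only a raw difference of mean response times and is not the scoring procedure the IAT authors actually use, which is more elaborate:

```python
from statistics import mean

# Invented response latencies in milliseconds for one hypothetical test-taker
pattern_a_ms = [612, 580, 655, 598, 630, 605]  # White+Pleasant / Black+Unpleasant
pattern_b_ms = [701, 688, 735, 690, 712, 699]  # White+Unpleasant / Black+Pleasant

# Positive value = faster responding under Pattern A than under Pattern B
difference_ms = mean(pattern_b_ms) - mean(pattern_a_ms)
print(f"Pattern B - Pattern A mean latency: {difference_ms:.1f} ms")
```

Computing the number is easy; what the code cannot supply is the function linking milliseconds to attitude strength, which is the gap Blanton & Jaccard point to.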

48
Arbitrary metrics: the IAT
  • Blanton & Jaccard
  • The IAT does not tell us about racist attitudes
  • IAT authors take a dimension which is
    non-arbitrary when used by physicists (time)
    and use it in an arbitrary way in psychology

49
Arbitrary metrics: the IAT
  • The function relating the response dimension
    (time) to the underlying dimension (attitudes) is
    unknown
  • Zero on the (Pattern A − Pattern B) difference
    may not be zero on the underlying attitude
    preference dimension
  • There are alternative models of how that (Pattern
    A − Pattern B) difference could arise
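The point that a zero (or nonzero) latency difference need not correspond to zero (or nonzero) preference can be shown with two hypothetical generating models. Both are invented for illustration: in one, the difference reflects an attitude; in the other, the test-taker has no preference at all and the difference comes from a non-attitudinal source, modeled here as a fixed extra cost for responding under Pattern B:

```python
def attitude_model(preference, ms_per_unit=90.0):
    """Hypothetical model: latency difference proportional to an attitude."""
    return ms_per_unit * preference

def offset_model(preference, ms_per_unit=90.0, pattern_b_cost_ms=90.0):
    """Hypothetical model: same attitude effect plus a non-attitudinal cost for Pattern B."""
    return ms_per_unit * preference + pattern_b_cost_ms

# Two very different underlying states...
diff_from_attitude = attitude_model(preference=1.0)
diff_without_attitude = offset_model(preference=0.0)

# ...produce identical observed differences
print(diff_from_attitude, diff_without_attitude)
```

The observed difference alone cannot tell these models apart; deciding between them requires a theory of the task, which is exactly what an arbitrary metric lacks.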

50
Review
  • CSEPT
  • Validity is a characteristic of evidence, not of
    tests.
  • Valid evidence supports conclusions drawn using
    test results
  • Validity is determined by social consequences of
    test use
  • Borsboom et al.
  • Validity is not a methodological issue, but a
    substantive (theoretical) issue
  • A test of an attribute is valid if (a) the
    attribute exists, and (b) variation in the
    attribute causes variation in test scores

51
Review
  • CSEPT
  • Validity can be established in three ways, though
    boundaries between them are fuzzy
  • Content-related evidence
  • Criterion-related evidence
  • Construct-related evidence
  • Borsboom et al.
  • It's all the same validity: a test is valid if it
    measures what you think it measures
  • Validity is not mysterious

52
Review
  • CSEPT
  • Content-related evidence: do test items represent
    the whole domain of interest?
  • Criterion-related evidence: do test scores relate
    to a criterion either now (concurrent) or in the
    future (predictive)?
  • Borsboom et al.
  • These questions are properly part of the process
    of creating a test

53
Review
  • CSEPT
  • Construct-related evidence is obtained when we
    develop a psychological construct and the way to
    measure it at the same time.
  • A test can be reliable but not valid. A test
    cannot be valid if not reliable.
  • Borsboom et al.
  • A test must be valid for a reliability estimate
    to have any meaning

54
Review
  • Blanton & Jaccard (2006) warn against
    over-interpretation of scores that are based on
    an arbitrary metric
  • For an arbitrary metric, we have no idea how the
    test scores are actually related to the
    underlying dimension