Validity

Slides: 58
Provided by: Patr581

1
Validity: Outline
  1. Definition
  2. Two different views: Traditional
  3. Two different views: CSEPT
  4. Face Validity
  5. Content Validity: CSEPT
  6. Content Validity: Borsboom

2
Validity: Outline
  • Criterion Validity: CSEPT
  • Predictive vs. Concurrent
  • Validity Coefficients
  • Criterion Validity: Borsboom
  • Construct Validity: CSEPT
  • Convergent
  • Discriminant

3
Validity: Definition
  • Validity measures agreement between a test score
    and the characteristic it is believed to measure

4
Validity: CSEPT view
  • Validity is a property of test score
    interpretations
  • Validity exists when actions based on the
    interpretation are justified given a theoretical
    basis and social consequences

5
Validity: Traditional view
  • Validity is a property of tests
  • Does the test measure what you think it measures?

6
Note the difference
  • Validity exists when actions based on the
    interpretation are justified given a theoretical
    basis and social consequences
  • Does the test measure what you think it measures?

7
A problem with the CSEPT view
  • Who is to say the social consequences of test
    use are good or bad?
  • According to CSEPT, validity is a subjective
    judgment
  • In my view, this makes the concept useless: if
    you like the result the test gives you, you will
    consider it valid. If you don't, you won't.
  • That's not how scientists think.

8
Borsboom et al. (2004)
  • Borsboom et al. reject CSEPT's view
  • "Validity is a very basic concept and was
    correctly formulated, for instance, by Kelley
    (1927, p. 14) when he stated that a test is valid
    if it measures what it purports to measure." (p.
    1061)

9
Borsboom et al. (2004)
  • Variations in what you are measuring cause
    variations in your measurements.
  • E.g., variations across people in intelligence
    cause variations in their IQ scores
  • This is not a correlational model of validity

10
Borsboom et al. (2004)
  • You don't create a test and then do the analysis
    necessary to establish its validity
  • Rather, you begin by doing the theoretical work
    necessary to understand your subject and create a
    valid test in the first place.
  • On this view, validity is not a big problem.

11
Borsboom et al. vs. CSEPT
  • Who is right?
  • Each scientist has to make up his or her own
    mind on that question
  • I agree with Borsboom et al.'s arguments.
  • Other psychologists may disagree.

12
The CSEPT view
  • CSEPT recognizes 3 types of evidence for test
    validity
  • Content-related
  • Criterion-related
  • Construct-related
  • Boundaries not clearly defined
  • Cronbach (1980): Construct is basic, while
    Content and Criterion are subtypes.

13
Parenthetical Point: Face Validity
  • Face validity refers to the appearance that a
    test measures what it is intended to measure.
  • Face validity has P.R. value: test-takers may
    have better motivation if the test appears to be
    a sensible way to measure what it measures.

14
Content validity: CSEPT
  • Content-related evidence considers coverage of
    the conceptual domain tested.
  • Important in educational settings
  • Like face validity, it is determined by logic
    rather than statistics
  • Typically assessed by expert judges

15
Content validity: CSEPT
  • Construct-irrelevant variance
  • arises when irrelevant items are included
  • or when external factors such as illness
    influence test scores
  • requires a judgment about what is truly
    external
  • Construct under-representation
  • Is domain adequately covered or are parts of it
    left out?

16
Content validity: Borsboom et al.
  • Borsboom et al. would say that content validity
    is not something to be established after the test
    has been created.
  • Rather, you build it into your test by having a
    good theory of what you are testing

17
Criterion validity: CSEPT
  • Criterion-related evidence tells us how well a
    test score corresponds to a particular criterion
    measure.
  • Generally, we want the test score to tell us
    something about the criterion score.
  • How well the test does this provides
    criterion-related evidence

18
Criterion validity: CSEPT
  • CSEPT: we could compare undergraduate GPAs to SAT
    scores to produce evidence of validity of
    conclusions drawn on the basis of SAT scores.
  • Two basic types
  • Predictive
  • Concurrent

19
Criterion validity: CSEPT
  • Predictive validity
  • Test scores used to predict future performance:
    how good is the prediction?
  • E.g., SAT is used to predict final undergraduate
    GPA
  • SAT and GPA are moderately correlated

20
Criterion validity: CSEPT
  • Predictive validity
  • Concurrent validity
  • Correlation between test scores and criterion
    when the two are measured at same time.
  • Test illuminates current performance rather than
    predicting future performance (e.g., why does
    patient have a temperature? Why can't student do
    math?)

21
Criterion validity: Borsboom et al.
  • Criterion validity involves a correlation of
    test scores with some criterion such as GPA
  • That does not establish the test's validity, only
    its utility.
  • E.g., height and weight are correlated, but a
    test of height is not a test of what bathroom
    scales measure.

22
Criterion validity: Borsboom et al.
  • SAT is valid because it was developed on the
    sensible theory that past academic achievement
    is a good guide to future academic achievement
  • Validity is built into the test, not established
    after the test has been created

23
Criterion validity
  • Note: no point in developing a test if you
    already have a criterion, unless impracticality
    or expense makes use of the criterion difficult.
  • Criterion measure only available in the future?
  • Criterion too expensive to use?

24
Criterion validity
  • Validity Coefficient
  • Compute correlation (r) between test score and
    criterion.
  • r = .30 or .40 would be considered normal.
  • r > .60 is rare
  • Note: r varies between -1.0 and 1.0
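
As a minimal sketch of how a validity coefficient is computed, the snippet below correlates test scores with a criterion. The data are invented for illustration and `pearson_r` is a helper written for this example, not something from the slides; a real validity study would need a far larger sample.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
test_scores = [1, 2, 3, 4, 5, 6]   # e.g., scores on a selection test
criterion = [4, 3, 2, 1, 6, 5]     # e.g., a later performance rating

validity_coefficient = pearson_r(test_scores, criterion)
print(f"r = {validity_coefficient:.2f}")
```

With these made-up numbers r comes out at roughly .37, which falls in the range the slide describes as normal.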

25
Criterion validity
  • Validity Coefficient
  • r² gives proportion of variance in criterion
    explained by test score.
  • E.g., if r_xy = .30, then r² = .09, so 9% of
    variability in Y can be explained by variation
    in X
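
The arithmetic above can be checked with a short sketch. The function name `variance_explained` is made up for this example; the .30 value is the one from the slide, and .60 is added only to show how quickly r² grows.

```python
def variance_explained(r):
    """Proportion of criterion variance accounted for by the test score."""
    return r ** 2

for r_xy in (0.30, 0.60):
    share = variance_explained(r_xy)
    print(f"r = {r_xy:.2f} -> r^2 = {share:.2f} ({share:.0%} of the variance in Y)")
```

Even a coefficient of .60, which the previous slide calls rare, leaves most of the variability in the criterion unexplained.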

26
Interpreting validity coefficients
  • Watch out for
  • Changes in causal relationships
  • What does criterion mean? Is it valid, reliable?
  • Is subject population for validity study
    appropriate?
  • Sample size

27
Interpreting validity coefficients
  • Watch out for
  • Criterion/predictor confusion
  • Range restrictions
  • Do validity study results generalize?
  • Differential predictions

28
Construct validity: CSEPT
  • Problem: for many psychological characteristics
    of interest there is no agreed-upon universe of
    content and no clear criterion
  • We cannot assess content or criterion validity
    for such characteristics
  • These characteristics involve constructs:
    something built by mental synthesis.

29
Construct validity: CSEPT
  • Examples of constructs
  • Intelligence
  • Love
  • Curiosity
  • Mental health
  • CSEPT: We obtain evidence of validity by
    simultaneously defining the construct and
    developing instruments to measure it.
  • This is bootstrapping.

30
Bootstrapping construct validity
  • assemble evidence about what a test means: in
    other words, about the characteristic it is
    testing.
  • CSEPT: this process is never finished

31
Bootstrapping construct validity
  • assemble evidence about what a test means: in
    other words, about the characteristic it is
    testing.
  • Borsboom: this is part of the process of creating
    the test in the first place, not something done
    after the fact

32
Bootstrapping construct validity
  • assemble evidence
  • show relationships between a test and other tests
  • CSEPT: none of the other tests is a criterion, but
    the web of relationships tells us what the test
    means

33
Bootstrapping construct validity
  • assemble evidence
  • show relationships between a test and other tests
  • Borsboom: these relationships do not tell us what
    a test score means
  • (e.g., age is correlated with annual income but a
    measure of age is not a measure of annual income).

34
Bootstrapping construct validity
  • assemble evidence
  • show relationships
  • each new relationship adds meaning to the test
  • CSEPT: a test's meaning is gradually clarified
    over time

35
Bootstrapping construct validity
  • assemble evidence
  • show relationships
  • each new relationship adds meaning to the test
  • Borsboom would say, why all the mystery? The
    meaning of many tests (e.g., WAIS, academic
    exams, Piaget's tests) is clear right from the
    start

36
Construct validity
  • Example from text: Rubin's work on Love.
  • Rubin collected a set of items for a Love scale
  • He read poetry and novels; he asked people for
    definitions
  • created a scale of Love and one of Liking

37
CSEPT: Construct validity
  • Rubin gave scale to many subjects and
    factor-analyzed results
  • Love integrates Attachment, Caring, Intimacy
  • Liking integrates Adjustment, Maturity, Good
    Judgment, and Intelligence
  • The two are independent: you can love someone you
    don't like (as song-writers know)

38
Rubin's study of Love
  • Borsboom et al.: when creating a test, the
    researcher specifies the processes that convey
    the effect of the measured attribute on the test
    score.
  • Rubin laboriously built a theory about what the
    construct Love means.
  • Rubin's process (reading poetry and novels,
    asking people for definitions) was a good
    process, so his test has construct validity.

39
Campbell & Fiske (1959)
  • Two types of Construct-related Evidence
  • Convergent evidence
  • When a test correlates well with other tests
    believed to measure the same construct

40
Campbell & Fiske (1959)
  • Two types of Construct-related Evidence
  • Convergent evidence
  • Discriminant evidence
  • When a test does not correlate with other tests
    believed to measure some other construct.

41
Convergent validity
  • Example: Health Index
  • Scores correlated with age, number of symptoms,
    chronic medical conditions, physiological
    measures
  • Treatments designed to improve health should
    increase Health Index scores. They do.

42
Discriminant validity
  • Low correlations between new test and tests
    believed to tap unrelated constructs.
  • Evidence that the new test measures something
    unique
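
Convergent and discriminant evidence can be sketched as two correlations computed for the same new test. All three score lists are invented for illustration, and `pearson_r` is a helper written for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for six people on three tests (invented data).
new_test = [1, 2, 3, 4, 5, 6]
same_construct = [2, 1, 4, 3, 6, 5]    # established test of the same construct
other_construct = [3, 1, 2, 2, 1, 3]   # test of a supposedly unrelated construct

convergent_r = pearson_r(new_test, same_construct)
discriminant_r = pearson_r(new_test, other_construct)

print(f"convergent evidence:   r = {convergent_r:.2f}")
print(f"discriminant evidence: r = {discriminant_r:.2f}")
```

A high first correlation together with a near-zero second one is the pattern Campbell & Fiske describe. On Borsboom et al.'s view, discussed earlier, such correlations still do not by themselves say what the test measures.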

43
Validity & Reliability: CSEPT
  • CSEPT: No point in trying to establish validity
    of an unreliable test.
  • It's possible to have a reliable test that is not
    valid (has no meaning).
  • Logically impossible to produce evidence of
    validity for an unreliable test.
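
One common way to make this concrete comes from classical test theory, which is not named on the slide and is brought in here as an assumption: the observed correlation between a test and a criterion cannot exceed the square root of the product of their reliabilities. The sketch below uses invented reliability values, and `max_validity` is a name made up for this example.

```python
from math import sqrt

def max_validity(reliability_test, reliability_criterion):
    """Upper bound on the observed test-criterion correlation under
    classical test theory: sqrt(r_xx * r_yy)."""
    return sqrt(reliability_test * reliability_criterion)

# Hypothetical criterion reliability, held fixed while test reliability falls.
CRITERION_RELIABILITY = 0.8

for r_xx in (0.9, 0.5, 0.1, 0.0):
    bound = max_validity(r_xx, CRITERION_RELIABILITY)
    print(f"test reliability {r_xx:.1f} -> r_xy can be at most {bound:.2f}")
```

As test reliability approaches zero, so does the largest validity coefficient the test could show.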

44
Validity & Reliability: Borsboom
  • Borsboom et al.: what does it mean to say that a
    test is reliable but not valid?
  • What is it a test of?
  • It isn't a test at all, just a collection of
    items

45
Blanton & Jaccard: arbitrary metrics
  • We observe a behavior in order to learn about the
    underlying psychological characteristic
  • A person's test score represents their standing
    on that underlying dimension
  • Such scores form an arbitrary metric
  • Such scores form an arbitrary metric
  • That is, we do not know how the observed scores
    are related to the true scores on the underlying
    dimension

46
[Figure: Person A and Person B located on an underlying dimension with a Neutral point, shown against the score scales of Test 1 (labels 0 to 6) and Test 2 (labels 6 to 0). Adapted from Blanton & Jaccard (2006), Figure 1, p. 29.]
47
Arbitrary metrics: the IAT
  • Implicit Association Test (IAT) claimed to
    diagnose implicit attitudinal preferences or
    racist attitudes
  • IAT authors say you may have prejudices you don't
    know you have.
  • Are these claims true?

48
Arbitrary metrics: the IAT
  • Task: categorize stimuli using 2 pairs of
    categories
  • 2 buttons to press, 2 assignments of categories
    to buttons, used in sequence

49
Arbitrary metrics: the IAT
  • Assignment pattern A
  • Button 1: press if stimulus refers to the
    category White or the category Pleasant
  • Button 2: press if stimulus refers to the
    category Black or the category Unpleasant
  • Assignment pattern B
  • Button 1: press if stimulus refers to the
    category White or the category Unpleasant
  • Button 2: press if stimulus refers to the
    category Black or the category Pleasant

50
Arbitrary metrics: the IAT
  • IAT authors claim that if responses are faster to
    Pattern A than to Pattern B, that indicates a
    preference for Whites over Blacks: in other
    words, a racist attitude
  • IAT authors also give test-takers feedback about
    how strong their preferences are, based on how
    much faster their responses are to Pattern A than
    to Pattern B
  • This is inappropriate

51
Arbitrary metrics: the IAT
  • Blanton & Jaccard
  • The IAT does not tell us about racist attitudes
  • IAT authors take a dimension which is
    non-arbitrary when used by physicists (time)
    and use it in an arbitrary way in psychology

52
Arbitrary metrics: the IAT
  • The function relating the response dimension
    (time) to the underlying dimension (attitudes) is
    unknown
  • Zero on the (Pattern A - Pattern B) difference
    may not be zero on the underlying attitude
    preference dimension
  • There are alternative models of how that (Pattern
    A - Pattern B) difference could arise
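
The zero-point problem can be sketched with a deliberately simple model. The linear form, the slope, and the offset below are all assumptions invented for illustration, not anything proposed by the IAT authors or by Blanton & Jaccard. The difference is taken as latency under Pattern A minus latency under Pattern B, in milliseconds.

```python
def latency_difference(attitude, slope, offset):
    """Hypothetical model of the (Pattern A - Pattern B) latency gap in ms.

    attitude: position on the underlying dimension (0 = truly neutral).
    slope, offset: unknown in practice; the values used below are invented.
    """
    return slope * attitude + offset

# Invented parameters. The offset stands in for anything other than attitude
# that makes one pattern faster (an assumption made purely for illustration).
SLOPE_MS = -50.0
OFFSET_MS = -40.0

# A truly neutral person still shows a nonzero difference...
neutral_difference = latency_difference(0.0, SLOPE_MS, OFFSET_MS)

# ...and a difference of zero corresponds to a non-neutral attitude.
attitude_at_zero_difference = -OFFSET_MS / SLOPE_MS

print("difference for a neutral person (ms):", neutral_difference)
print("attitude that yields zero difference:", attitude_at_zero_difference)
```

Because the offset cannot be observed, a measured difference of zero cannot be read as a neutral attitude, and a nonzero difference cannot be read as a preference of any particular strength.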

53
Review
  • CSEPT
  • Validity is a characteristic of evidence, not of
    tests.
  • Valid evidence supports conclusions drawn using
    test results
  • Validity is determined by social consequences of
    test use
  • Borsboom et al.
  • Validity is not a methodological issue, but a
    substantive (theoretical) issue
  • A test of an attribute is valid if (a) the
    attribute exists, and (b) variation in the
    attribute causes variation in test scores

54
Review
  • CSEPT
  • Validity can be established in three ways, though
    boundaries between them are fuzzy
  • Content-related evidence
  • Criterion-related evidence
  • Construct-related evidence
  • Borsboom et al.
  • It's all the same validity: a test is valid if it
    measures what you think it measures
  • Validity is not mysterious

55
Review
  • CSEPT
  • Content-related evidence: do test items represent
    whole domain of interest?
  • Criterion-related evidence: do test scores relate
    to a criterion either now (concurrent) or in the
    future (predictive)?
  • Borsboom et al.
  • These questions are properly part of the process
    of creating a test

56
Review
  • CSEPT
  • Construct-related evidence is obtained when we
    develop a psychological construct and the way to
    measure it at the same time.
  • A test can be reliable but not valid. A test
    cannot be valid if not reliable.
  • Borsboom et al.
  • A test must be valid for a reliability estimate
    to have any meaning

57
Review
  • Blanton & Jaccard (2006) warn against
    over-interpretation of scores which are based on
    an arbitrary metric
  • For an arbitrary metric, we have no idea how the
    test scores are actually related to the
    underlying dimension