1
Construct Validity: A Universal Validity System
  • Susan Embretson
  • Georgia Institute of Technology
  • University of Maryland
  • Conference on the Concept of Validity

2
Introduction
  • Validity is a controversial concept in
    educational and psychological testing
  • Research on educational and psychological tests
    during the last half of the 20th century was
    guided by a distinction among types of validity
  • Criterion-related validity, content validity and
    construct validity
  • Construct validity is the most problematic type
    of validity
  • It involves theory and the relationship of data
    to theory

3
Introduction
  • Yet the most controversial type of validity
    became the sole type of validity in the revised
    joint standards for educational and psychological
    tests (AERA/APA/NCME, 1999)
  • In the current standards, "Validity refers to the
    degree to which evidence and theory support the
    interpretations of test scores entailed by
    proposed uses of tests"
  • Content validity and criterion-related validity
    are two of five different kinds of evidence.
  • Reflects substantial impact from Messick's (1989)
    thesis of a single type of validity (construct
    validity) with several different aspects.

4
Topics
  • Overview of the validity concept
  • Current issues on validity
  • Discontent with construct validity for
    educational tests
  • Need for content validity
  • Critique of content validity as a basis for
    educational testing
  • Universal system for construct validity
  • Applies to all tests
  • Achievement tests
  • Ability tests
  • Personality/psychopathology
  • Summary

5
History of the Construct Validity Concept
Origins
  • American Psychological Association (1954).
    Technical recommendations for psychological tests
    and diagnostic techniques. Psychological
    Bulletin, 51(2), 1-38.
  • Prepared by a joint committee of the American
    Psychological Association, American Educational
    Research Association, and National Council on
    Measurements Used in Education.
  • "Validity information indicates to the test user
    the degree to which the test is capable of
    achieving certain aims. Thus, a vocabulary
    test might be used simply as a measure of present
    vocabulary, as a predictor of college success, as
    a means of discriminating schizophrenics from
    organics, or as a means of making inferences
    about 'intellectual capacity.'"
  • "We can distinguish among the four types of
    validity by noting that each involves a different
    emphasis on the criterion." (p. 13)

6
Implications of Original Views
  • Same test can be used in different ways
  • Relevant type of validity depends on test use
  • The types of validity differ in the importance of
    the behaviors involved in the test

7
More Recent Views on Types of Validity
  • Standards for Educational and Psychological
    Testing (1954, 1966, 1974, 1985, 1999)
  • 1985
  • "Traditionally, the various means of accumulating
    validity evidence have been grouped into
    categories called content-related,
    criterion-related and construct-related evidence
    of validity. These categories are
    convenient...but the use of category labels does
    not imply that there are distinct types of
    validity."
  • "An ideal validation includes several types of
    evidence, which span all three of the traditional
    categories."

8
Conceptualizations of Validity: Psychological
Testing Textbooks
  • "All validity analyses address the same basic
    question: Does the test measure knowledge and
    characteristics that are appropriate to its
    purpose? There are three types of validity
    analysis, each answering this question in a
    slightly different way." (Friedenberg, 1995)
  • "...the types of validity are potentially
    independent of one another." (Murphy &
    Davidshofer, 1988)
  • "There are three types of evidence: (1)
    construct-related, (2) criterion-related, and (3)
    content-related. ...It is important to
    emphasize that categories for grouping different
    types of validity are convenient; however, the
    use of categories does not imply that there are
    distinct forms of validity." (Kaplan & Saccuzzo,
    1993)

9
Most Recent View on Validity
  • Standards for Educational and Psychological
    Testing (1999)
  • "Validity refers to the degree to which evidence
    and theory support the interpretations of test
    scores entailed by proposed uses of tests." (p. 9)
  • "These sources of evidence may illuminate
    different aspects of validity, but they do not
    represent distinct types of validity. Validity
    is a unitary concept."
  • "The wide variety of tests and circumstances
    makes it natural that some types of evidence will
    be especially critical in a given case, whereas
    other types will be less useful." (p. 9)
  • "Because a validity argument typically depends on
    more than one proposition, strong evidence in
    support of one in no way diminishes the need for
    evidence to support others." (p. 11)

10
Implications of 1999 Validity Concept
  • No distinct types of validity
  • Multiple sources of evidence for single test aim
  • Example: a mathematics achievement test used to
    assess readiness for a more advanced course
  • Propositions for inference
  • 1) Certain skills are prerequisite for advanced
    course
  • 2) Content domain structure for the test
    represents skills
  • 3) Test scores represent domain performance
  • 4) Test scores are not unduly influenced by
    irrelevant variables, such as writing ability,
    spatial ability, anxiety etc.
  • 5) Success in advanced course can be assessed
  • 6) Test scores are related to success in advanced
    curriculum

11
Current Issues with the Validity Concept:
Educational Testing
  • Lissitz and Samuelson (2007)
  • Propose some changes in terminology and emphasis
    in the validity concept
  • Argue that construct validity as it currently
    exists has little to offer test construction in
    educational testing.
  • In fact, their system leads to a most startling
    conclusion
  • Construct validity is irrelevant to defining what
    is measured by an educational test!!
  • Content validity becomes primary in determining
    what an educational test measures

12
Critique of Content Validity as a Basis for
Educational Testing
  • Content validity is not up to the burden of
    defining what is measured by a test
  • Relying on content validity evidence, as
    available in practice, to determine the meaning
    of educational tests could have detrimental
    impact on test quality
  • Giving content validity primacy for educational
    tests could lead to very different types and
    standards of evidence for educational and
    psychological tests

13
Validity in Educational Tests: Response to Lissitz
and Samuelson
  • Background
  • Embretson, S. E. (1983). Construct validity:
    Construct representation versus nomothetic span.
    Psychological Bulletin, 93, 179-197.
  • Construct representation
  • Establishes the meaning of test scores by
    identifying the theoretical mechanisms that
    underlie test performance (i.e., the processes,
    strategies and knowledge)
  • Nomothetic span
  • Establishes the significance of test scores by
    identifying the network of relationships of test
    scores with other variables

14
Validity in Lissitz and Samuelson's Framework
  • Taxonomy of test evaluation procedures
  • 1) Investigative Focus
  • Internal sources: analysis of the test and its
    items
  • Provides evidence about what is measured
  • External sources: relationship of test scores to
    other measures and criteria
  • Provides evidence about impact, utility and trait
    theory
  • 2) Perspective
  • Theoretical orientation: concern with measuring
    traits
  • Practical orientation: concern with measuring
    achievement

15
Figure 2. Taxonomy of Test Evaluation Procedures
                     Perspective
Focus            Theoretical            Practical
Internal         Latent Process         Content and Reliability
External         Nomological Network    Utility and Impact
16
Figure 1. The Structure of the Technical
Evaluation of Educational Testing
17
Implications for Validity
  • System represents best current practices
  • Internal meaning (validity) established
  • For educational tests, content and reliability
    evidence
  • Evidence based on internal structure (i.e.,
    reliability, etc.)
  • Evidence based on test content
  • For psychological tests, depends on latent
    processes
  • Evidence based on response processes
  • Evidence based on internal structure (item
    correlations)
  • But, notice the limitations
  • Response process and test content evidence are
    not relevant to both types of tests
  • External evidence based on relations to other
    variables has no role in validity

18
Internal Evidence for Educational Tests Part I
  • Reliability concept in the Lissitz and Samuelson
    framework is generally multifaceted and
    traditional
  • Item interrelationships
  • Relationship of test scores over conditions or
    time
  • Differential item functioning (DIF)
  • Adverse impact
  • (Perhaps adverse impact and DIF could be
    considered as external information)
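
The DIF evidence listed above can be illustrated with a worked example. Below is a minimal sketch, not from the presentation, of a Mantel-Haenszel DIF check for one dichotomous item, using total test score as the matching variable; the function name and data layout are hypothetical.

```python
# Minimal sketch (hypothetical): Mantel-Haenszel common odds ratio for one
# dichotomous item, matching reference and focal examinees on total score.
from collections import defaultdict

def mh_odds_ratio(item_scores, total_scores, groups):
    """item_scores: 0/1 responses to the studied item
    total_scores: total test scores (the matching variable)
    groups: 'ref' or 'focal' for each examinee
    Returns the MH common odds ratio; values near 1.0 suggest little DIF."""
    strata = defaultdict(lambda: {"A": 0, "B": 0, "C": 0, "D": 0})
    for y, s, g in zip(item_scores, total_scores, groups):
        cell = strata[s]
        if g == "ref":
            cell["A" if y == 1 else "B"] += 1   # reference right / wrong
        else:
            cell["C" if y == 1 else "D"] += 1   # focal right / wrong
    num = den = 0.0
    for c in strata.values():
        n = sum(c.values())
        if n > 0:
            num += c["A"] * c["D"] / n
            den += c["B"] * c["C"] / n
    return num / den if den > 0 else float("nan")

# Usage: mh_odds_ratio(item_column, total_column, group_column) on scored data.
```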

19
Internal Evidence for Educational Tests Part II
  • Concept of Content Validity
  • Previous test standards (1985)
  • Content validity was a type of evidence that
    "...demonstrates the degree to which a sample of
    items, tasks or questions on a test are
    representative of some defined universe or domain
    of content"
  • Two important elements added by Lissitz and
    Samuelson
  • Cognitive complexity level
  • whether the test covers the relevant
    instructional or content domain and the coverage
    is at the right level of cognitive complexity
  • Test development procedures
  • Information about item writer credentials and
    quality control

20
Test Blueprints as Content Validity Evidence
  • Blueprints represent domain structure by
    specifying percentages of test items that should
    fall in various categories
  • Example: the NAEP mathematics test blueprint
  • Five content strands
  • Three levels of complexity
  • Majority of states employ similar strands
  • But, blueprints and other forms of test
    specifications (along with reliability evidence)
    are not sufficient to establish meaning for an
    educational test
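
To make the blueprint idea above concrete, here is a small hypothetical sketch of auditing an assembled test form against target percentages for (strand, complexity) cells; the strand labels, targets, and tolerance are illustrative and are not the actual NAEP blueprint.

```python
# Hypothetical sketch: audit a test form against blueprint target percentages.
# Strand/complexity labels, targets and the 5-point tolerance are illustrative.
from collections import Counter

blueprint = {  # target percentage of items per (strand, complexity) cell
    ("number sense", "low"): 20, ("number sense", "moderate"): 15,
    ("algebra", "moderate"): 20, ("algebra", "high"): 10,
    ("measurement", "moderate"): 15, ("geometry", "low"): 10,
    ("data analysis", "high"): 10,
}

form = [  # classification assigned to each item on the assembled form
    ("number sense", "low"), ("algebra", "moderate"), ("algebra", "moderate"),
    ("measurement", "moderate"), ("geometry", "low"), ("data analysis", "high"),
]

counts = Counter(form)
for cell, target in sorted(blueprint.items()):
    actual = 100.0 * counts.get(cell, 0) / len(form)
    flag = "" if abs(actual - target) <= 5 else "  <- outside tolerance"
    print(f"{cell}: target {target}%, actual {actual:.1f}%{flag}")
```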

21
1. Domain Structure is a Theory Which Changes
Over Time
  • NAEP framework, particularly for cognitive
    complexity, has evolved (NAGB, 2006)
  • Views on complexity level also may change based
    on empirical evidence, such as item difficulty
    modeling, task decomposition and other methods
  • Changes in domain structure also could evolve in
    response to recommendations of panels of experts.
  • National Mathematics Advisory Panel

22
2. Reliability of Classifications is Not Well
Documented
  • Scant evidence that items can be reliably
    classified into the blueprint categories
  • Certain factors in an achievement domain may make
    these categorizations difficult
  • For example, in mathematics a single real-world
    problem may involve algebra and number sense, as
    well as measurement content
  • Item could be classified into three of the five
    strands.
  • Similarly, classifying items for mathematical
    complexity also can be difficult
  • Abstract definitions of the various levels in
    many systems
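
Whether raters agree on these classifications can be checked directly; the following is a minimal sketch, with made-up data, of Cohen's kappa for two raters assigning the same items to content strands.

```python
# Minimal sketch (made-up data): Cohen's kappa for two raters classifying
# the same items into content strands; kappa = 1 means perfect agreement.
from collections import Counter

def cohens_kappa(rater1, rater2):
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in set(rater1) | set(rater2)) / (n * n)
    return (observed - expected) / (1 - expected)

# Disagreement on items that touch several strands lowers kappa.
r1 = ["algebra", "algebra", "number", "measurement", "geometry", "number"]
r2 = ["algebra", "number",  "number", "algebra",     "geometry", "number"]
print(round(cohens_kappa(r1, r2), 2))   # 0.52 for this toy example
```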

23
3. Unrepresentative Samples from Domain
  • Practical limitations on testing conditions may
    lead to unrepresentative samples of the content
    domain
  • More objective item formats, such as multiple
    choice and limited constructed response have long
    been favored
  • Reliably and inexpensively scored
  • But these formats may not elicit the deeper
    levels of reasoning that experts believe should
    be assessed for the subject matter

24
4. Irrelevant Item Solving Processes
  • Using content specifications, along with item
    writer credentials and item quality control, may
    not be sufficient to assure high quality tests
  • Leighton and Gierl (2007) view content
    specifications as one of three cognitive models
    for making inferences about examinees' thinking
    processes
  • A limitation of the test specifications model for
    such inferences is that no evidence is provided
    that examinees are in fact using the presumed
    skills and knowledge to solve items

25
NAEP Validity Study for Mathematics, Grade 4 and
Grade 8
  • Mathematicians examined items from NAEP and some
    state accountability tests
  • A small percentage of items was deemed flawed (3-7%)
  • A larger percentage was deemed marginal (23-30%)
  • Marginal items had construct-irrelevant
    difficulties
  • problems with pattern specifications
  • unduly complicated presentation
  • unclear or misleading language
  • excessively time-consuming processes
  • Marginal items previously had survived both
    content-related and empirical methods of
    evaluation

26
Examples of Irrelevant Knowledge, Skills and
Abilities
  • Source
  • National Mathematics Advisory Panel (2008).
    Foundations for success: The final report of the
    National Mathematics Advisory Panel. Washington,
    DC: U.S. Department of Education.
  • Method: logical/theoretical analysis by
    mathematicians and curriculum experts
  • Mathematics involves aspects of logical analysis,
    spatial ability and verbal reasoning, yet their
    role can be excessive

27
Dependence on Non-Mathematical Knowledge
28
Dependence on Logic, Not Mathematics
29
Excessive Dependence on Spatial Ability
30
Excessive Dependence on Reasoning and Minimal
Mathematics
31
Implication for Educational Tests
  • Identifying irrelevant sources of item
    performance requires more than content-related
    evidence
  • Latent process evidence is relevant
  • E.g., methods include cognitive analysis (e.g.,
    item difficulty modeling), verbal reports of
    examinees and factor analysis
  • External sources of evidence may provide needed
    safeguards
  • Example: implications of the correlation of an
    algebra test with a test of English
  • If this correlation is too high, it may suggest a
    failure in the system of internal evidence that
    supports test meaning
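
One way to make the "too high" judgment above operational, sketched below with made-up numbers, is to compare the observed algebra-English correlation to the ceiling implied by the two tests' reliabilities (the classical disattenuation formula); the threshold used here is illustrative, not a published standard.

```python
# Hypothetical sketch: compare an algebra-English correlation to the maximum
# correlation the two reliabilities allow (disattenuation). Numbers made up.
import math

r_observed = 0.72                      # correlation of algebra with English test
rel_algebra, rel_english = 0.90, 0.85  # reliability estimates for each test

ceiling = math.sqrt(rel_algebra * rel_english)  # largest observable correlation
r_true = r_observed / ceiling                   # disattenuated correlation

if r_true > 0.80:  # illustrative cutoff only
    print(f"Disattenuated r = {r_true:.2f}: verbal demands may be "
          "construct-irrelevant variance in the algebra scores.")
else:
    print(f"Disattenuated r = {r_true:.2f}: within the expected range.")
```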

32
Construct Validity as a Universal System and a
Unifying Concept
  • Features
  • Consistent with current Test Standards (1999)
  • Consistent with many of Lissitz and Samuelson's
    distinctions and elaborations
  • Validity Concept
  • Universal
  • All sources of evidence are included
  • Appropriate for both educational and
    psychological tests
  • Interactive
  • Evidence in one category is influenced or
    informed by adequacy in the other categories

33
Categories of Evidence in the Validity System
  • Eleven categories of evidence
  • Categories apply to both educational and
    psychological tests
  • Consistent with most validity frameworks and the
    current Test Standards (1999), it is postulated
    that tests differ in which categories in the
    system are most crucial to test meaning,
    depending on their intended use
  • Even so, most categories of evidence are
    potentially relevant to a test

34
A Universal Validity System
[Diagram: categories of evidence in the universal validity system. Internal categories (Logic/Theory, Latent Process Studies, Testing Conditions, Item Design Principles, Domain Structure, Test Specifications, Scoring Models, Psychometric Properties) → Meaning; external categories (Utility, Other Measures, Impact) → Significance.]
35
Internal Categories of Evidence
Logic/Theoretical Analysis: Theory of the subject matter content; specification of areas and their interrelationships
Latent Process Studies: Studies on content interrelationships, prerequisite skills, the impact of task features and testing conditions on responses, etc.
Testing Conditions: Available test administration methods, scoring mechanisms (raters, machine scoring, computer algorithms), testing time, locations, etc.
Item Design Principles: Scientific evidence and knowledge about how features of items (formats, item context, complexity and specific content) impact the KSAs applied by examinees
36
Internal Categories of Evidence
Domain Structure: Specification of content areas and levels, as well as their relative importance and interrelationships
Test Specifications: Blueprints specifying domain structure representation, constraints on item features, and specification of testing conditions
Psychometric Properties: Item interrelationships, DIF, reliability, and the relationship of item psychometric properties to content and stimulus features
Scoring Models: Psychometric models and procedures to combine responses within and between items, weighting of items, item selection standards, relationship of scores to proficiency categories, etc. Decisions about dimensionality, guessing, elimination of poorly fitting items, etc., impact scores and their relationships
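
As one concrete instance of the scoring-model choices just described (dimensionality, guessing, item fit), a widely used item response model is the three-parameter logistic model; the notation below is standard IRT notation rather than anything specific to the presentation.

```latex
% 3PL IRT model: probability that an examinee with ability \theta answers
% item i correctly, given discrimination a_i, difficulty b_i, and a lower
% asymptote ("guessing" parameter) c_i.
P_i(\theta) = c_i + (1 - c_i)\,
  \frac{\exp\bigl[a_i(\theta - b_i)\bigr]}{1 + \exp\bigl[a_i(\theta - b_i)\bigr]}
```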
37
External Categories of Evidence
Utility: Relationship of scores to external variables, criteria or categories
Other Measures: Relationship of scores to other tests of knowledge, skills and abilities
Impact: Consequences of test use, adverse impact, proficiency levels, etc.

38
The Universal System of Validity
  • Test Specifications is the most essential
    category; together with Scoring Models, it
    determines
  • Representation of domain structure
  • Psychometric properties of the test
  • External relationships of test scores
  • Preceding Test Specifications are categories of
    scientific evidence, knowledge and theory
  • Domain Structure
  • Item Design Principles
  • In turn preceded by
  • Latent Process Studies
  • Logical/Theoretical Analysis
  • Testing Conditions

39
General Features of Validity System
  • Test meaning is determined by internal sources of
    information
  • Test significance is determined by external
    sources of information
  • Content aspects of the test are central to test
    meaning
  • Test specifications, which include test content
    and test development procedures, have a central
    role in determining test meaning
  • Test specifications also determine the
    psychometric properties of tests, including
    reliability information

40
General Features of the Universal Validity System
  • Broad system of evidence is relevant to support
    Test Specifications
  • Item Design Principles: relevance of examinees'
    responses to the intended domain
  • Domain Structure: regarded as a theory
  • Other preceding evidence
  • Latent Process Studies
  • Logical/theoretical analyses of the domain
  • Testing Conditions

41
General Features of the Universal Validity System
  • Interactions among components
  • Internal evidence → expectations for external
    evidence
  • External evidence informs adequacy of evidence
    from internal sources
  • Potential inadequacies arise when
  • Hypotheses are not confirmed
  • Unintended consequences of test use
  • System of evidence includes both theoretical and
    practical elements
  • Relevant to educational and psychological tests

42
The Universal System of Validity
  • Example of Feedback
  • Speeded math test to emphasize automatic
    numerical processes
  • External evidence: strong adverse impact
  • Internal evidence categories to question
  • Item Design
  • Relationship of item speededness to automaticity
  • Domain Structure
  • Heavy emphasis on the automaticity of numerical
    skills

43
Application to Educational and Psychological
Tests: Achievement
  • Current emphasis
  • Test specification
  • Central to standards-based testing
  • Domain structures
  • Essential to blueprints
  • Scoring models and psychometric properties
  • State of the art in large-scale testing
  • Underemphasized areas
  • Item design principles
  • Research basis is emerging
  • Latent process studies
  • Important in establishing construct-relevancy of
    student responses
  • Logical/Theoretical Analysis
  • Important in defining domain structure
  • Implications of feedback from studies on
  • Utility, Other Measures, Impact

44
Application to Educational and Psychological
Tests: Achievement
  • Example: Item Design and Latent Process Studies
  • Item response format for mathematics items
  • Katz, I. R., Bennett, R. E., & Berger, A. E.
    (2000). Effects of response format on difficulty
    of SAT-Mathematics items: It's not the strategy.
    Journal of Educational Measurement, 37(1), 39-57.
  • Mathematical vs. non-mathematical item content
  • National Mathematics Advisory Panel

45
Application to Educational and Psychological
Tests: Personality
  • Current emphasis
  • Logical/Theoretical Analysis
  • I.e., personality theories
  • Utility
  • Prediction of job performance
  • Other Measures
  • Factor analytic studies
  • Underemphasized areas
  • Test Specifications
  • Domain Structure
  • Item Design Principles
  • Latent Process Studies

46
Application to Educational and Psychological
Tests: Personality
  • Test Specifications and Domain Structure
  • Ignoring domain structure → lack of convergent
    validity
  • Multifaceted personality constructs
  • Unbalanced or uncontrolled item set
  • Best-represented facet emphasized in item
    selection
  • Item selection will not be consistent
  • Example: Conscientiousness construct facets
  • Dependability, Achievement (Moutafi et al., 2006)
  • Opposing relationships to commitment
  • Duty (−), Achievement Striving (+)

47
Application to Educational and Psychological
Tests: Personality
  • Test Specifications and Domain Structure
  • Example of structure in personality
  • Facet theory to
  • Define domain membership
  • Define the domain structure of observations
  • Roskam, E., & Broers, N. (1996). Constructing
    questionnaires: An application of facet design
    and item response theory to the study of
    lonesomeness. In G. Engelhard & M. Wilson
    (Eds.), Objective measurement: Theory into
    practice, Volume 3 (pp. 349-385). Norwood, NJ:
    Ablex Publishing.

48
Facet Theory Approach to Measure of Lonesomeness
49
Application to Educational and Psychological
Tests: Personality
  • Item Design Principles and Latent Process Studies
  • Most measures are self-report format
  • Basis of self-report may involve strong
    construct-irrelevant aspects
  • Tasks require judging the relevance of a
    statement to one's own behavior and then reliably
    summarizing that behavior
  • California Psychological Inventory items
  • When in a group of people I usually do what the
    others want rather than make suggestions.
  • There have been a few times when I have been very
    mean to another person.
  • I am a good mixer.
  • I am a better talker than listener.

50
Application to Educational and Psychological
Tests: Personality
  • Science of self-report is emerging and linked to
    cognitive psychology
  • Stone, A. A., Turkkan, J. S., Bachrach, C. A.,
    Jobe, J. B., Kurtzman, H. S., & Cain, V. S.
    (2000). The science of self-report. Mahwah, NJ:
    Erlbaum.
  • Studies on how item and test design impacts
    self-report accuracy
  • Even under optimal conditions, self-reports are
    biased
  • Daily diaries of dietary self-reports contain
    insufficient calories to sustain life
  • Smith, A. F., Jobe, J. B., & Mingay, D. M. (1991b).
    Retrieval from memory of dietary information.
    Applied Cognitive Psychology, 5, 269-296.
  • Personality inventories provide far less optimal
    conditions for reliable reporting

51
Application to Educational and Psychological
Tests: Personality
  • Mechanisms in self-report
  • Response styles
  • Social desirability
  • Acquiescence
  • Memory
  • When memory information is insufficient, other
    methods are applied
  • Context
  • Information earlier in the questionnaire
  • Ambiguity of issue discussed
  • Moods evoked by earlier questions

52
Self-Report Context Effects
53
Summary
  • History of validity shows changes in the concept
  • Notion of types still apparent
  • Construct validity is a universal system of
    evidence relevant to diverse tests
  • Construct validity is appropriate for educational
    tests
  • The content aspect alone is not sufficient