1
Item Response Theory
  • Dan Mungas, Ph.D.
  • Department of Neurology
  • University of California, Davis

2
What is it? Why should anyone care?
3
IRT Basics
4
Item Response Theory - What Is It?
  • Modern approach to psychometric test development
  • Mathematical measurement theory
  • Associated numeric and computational methods
  • Widely used in large scale educational,
    achievement, and aptitude testing
  • More than 50 years of conceptual and
    methodological development

5
Item Response Theory - Methods
  • Dataset consists of a rectangular table
  • rows correspond to examinees
  • columns correspond to items
  • IRT applications simultaneously estimate examinee
    ability and item parameters (see the sketch below)
  • iterative, maximum likelihood estimation
    algorithms
  • processor intensive, but no longer a problem
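A minimal sketch of the data structure and the
estimation target, assuming a 2PL model; the data and
parameter values are hypothetical, and real analyses
use dedicated IRT software:

    import numpy as np

    # Rectangular data table: rows = examinees, columns = items
    # (1 = correct, 0 = incorrect)
    responses = np.array([
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [1, 0, 0, 0],
        [1, 1, 1, 1],
    ])

    def joint_log_likelihood(theta, a, b):
        """2PL log-likelihood that iterative ML algorithms maximize
        over examinee abilities (theta) and item parameters (a, b)."""
        p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
        return np.sum(responses * np.log(p)
                      + (1 - responses) * np.log(1 - p))

    # Evaluate at provisional values; an estimation algorithm would
    # iteratively adjust theta, a, and b to increase this quantity
    theta = np.array([-0.5, 0.5, -1.0, 1.5])   # examinee abilities
    a = np.ones(4)                             # item discriminations
    b = np.array([-1.0, 0.0, 0.5, 1.0])        # item difficulties
    print(joint_log_likelihood(theta, a, b))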

6
Basic Data Structure
7
Item Types
  • Dichotomous
  • Multiple Choice
  • Polytomous
  • Information is greater for a polytomous item than
    for the same item dichotomized at a cutpoint

8
What is the item-level response?
  • Smallest discrete unit (e.g. Object Naming)
  • Sum of correct responses (trials in word list
    learning test)
  • For practical reasons, continuous measures might
    have to be recoded into ordinal scales with a
    reduced number of response categories (e.g., 10-15)

9
Item Response Theory - Basic Results
  • Item parameters
  • difficulty
  • discrimination
  • correction for guessing
  • most applicable for multiple choice items
  • Subject Ability (in the psychometric sense)
  • Capacity to successfully respond to test items
    (or propensity to respond in a certain direction)
  • Net result of all genetic and environmental
    influences
  • Measured by scales composed of homogeneous items
  • Item difficulty and subject ability are on the
    same scale

10
Item Characteristic Curves
11
Item Response Theory - Outcomes
  • Item-Level Results
  • Item Characteristic Curve (ICC)
  • non-linear function relating ability to the
    probability of a correct response to the item
  • Item Information Curve (IIC)
  • non-linear function showing precision of
    measurement (reliability) at different ability
    points
  • Both curves are defined by the item parameters
    (see the sketch below)
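A minimal sketch of both curves for a single
dichotomous item under the 2PL model (parameter
values are hypothetical):

    import numpy as np

    def icc(theta, a, b):
        """Item Characteristic Curve: P(correct) as a function of ability."""
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def iic(theta, a, b):
        """Item Information Curve: I(theta) = a^2 * P * (1 - P) for a 2PL item."""
        p = icc(theta, a, b)
        return a ** 2 * p * (1.0 - p)

    theta = np.linspace(-4, 4, 9)
    print(icc(theta, a=1.5, b=0.0))  # rises through 0.5 at theta = b
    print(iic(theta, a=1.5, b=0.0))  # peaks at theta = b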

12
Item Characteristic Curves
13
Information Curves
15
Item Response Theory - Outcomes
  • Test-Level Results
  • Test Characteristic Curve (TCC)
  • non-linear function relating ability to expected
    total test score
  • Test Information Curve (TIC)
  • non-linear function showing precision of
    measurement (reliability) at different ability
    points
  • Both are sums of the item-level functions of the
    included items (see the sketch below)
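A minimal sketch of the summation, assuming a small
bank of 2PL items with hypothetical parameters:

    import numpy as np

    a = np.array([0.8, 1.2, 1.5, 2.0])    # discriminations
    b = np.array([-1.5, -0.5, 0.5, 1.5])  # difficulties

    theta = np.linspace(-4, 4, 81)
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))  # all ICCs

    tcc = p.sum(axis=1)                       # expected total score
    tic = (a ** 2 * p * (1 - p)).sum(axis=1)  # sum of item information
    print(tcc[::20], tic[::20])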

16
Test Characteristic Curve - Mini-Mental State
Examination
17
Information Curves
18
Item Response Theory - Fundamental Assumptions
  • Unidimensionality - items measure a homogeneous,
    single domain
  • Local independence - covariance among items is
    determined only by the latent dimension measured
    by the item set

19
IRT Models
  • 1PL (Rasch)
  • Only Difficulty and Ability are estimated
  • Discrimination is assumed to be equal across
    items
  • 2PL
  • Discrimination, Difficulty and Ability are
    estimated
  • Guessing is assumed to not have an effect
  • 3PL
  • Discrimination, Difficulty, Guessing, and Ability
    are estimated (multiple choice items) - see the
    sketch below
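The three models can be written as one formula, with
the simpler models as special cases; a minimal sketch
(parameter values are hypothetical):

    import numpy as np

    def p_correct(theta, a=1.0, b=0.0, c=0.0):
        """3PL probability of a correct response.
        c = 0 gives the 2PL; c = 0 with a common a gives the 1PL/Rasch."""
        return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

    print(p_correct(0.0, a=1.0, b=0.0))           # 1PL-style: 0.5 at theta = b
    print(p_correct(0.0, a=2.0, b=0.0))           # 2PL: steeper curve, still 0.5
    print(p_correct(-4.0, a=2.0, b=0.0, c=0.25))  # 3PL: guessing floor near 0.25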

20
Item Response Theory - Invariance Properties
  • Invariance requires that basic assumptions are
    met
  • Item parameters are invariant across different
    samples
  • Within the range of overlap of distributions
  • Distributions of samples can differ
  • Ability estimates are invariant across different
    item sets
  • Assumes that the items' difficulty range spans the
    subject ability range of interest

21
Why Do We Care? - Applications of IRT in Health
Care Settings
  • Refined scoring of tests
  • Characterization of psychometric properties of
    existing tests
  • Construction of new tests

22
Test Scoring
  • IRT permits refined scoring that allows for
    differential weighting of items based on their
    item parameters (see the sketch below)
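A minimal sketch of pattern scoring under the 2PL
model: the ability estimate maximizes the likelihood
of the observed response pattern, so items with
larger discriminations carry more weight (grid-search
ML for clarity; parameters are hypothetical):

    import numpy as np

    def ml_ability(responses, a, b):
        """Grid-search ML ability estimate for one examinee.
        (All-correct or all-incorrect patterns have no finite ML
        estimate; a grid endpoint is returned in that case.)"""
        grid = np.linspace(-4, 4, 401)
        p = 1.0 / (1.0 + np.exp(-a * (grid[:, None] - b)))
        loglik = (responses * np.log(p)
                  + (1 - responses) * np.log(1 - p)).sum(axis=1)
        return grid[np.argmax(loglik)]

    a = np.array([0.5, 1.0, 2.0])   # discriminations
    b = np.array([-1.0, 0.0, 1.0])  # difficulties
    # Same sum score (2 of 3 correct), different IRT scores:
    print(ml_ability(np.array([1, 1, 0]), a, b))
    print(ml_ability(np.array([0, 1, 1]), a, b))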

23
Physical Function Scale - Hays, Morales, & Reise
(2000)

Item                               LIMITED  LIMITED   NOT LIMITED
                                   A LOT    A LITTLE  AT ALL
Vigorous activities (running,
  lifting heavy objects,
  strenuous sports)                   1        2          3
Climbing one flight                   1        2          3
Walking more than 1 mile              1        2          3
Walking one block                     1        2          3
Bathing / dressing self               1        2          3
Preparing meals / doing laundry       1        2          3
Shopping                              1        2          3
Getting around inside home            1        2          3
Feeding self                          1        2          3
24
How to Score the Test
  • Simple approach - there are numbers that will be
    circled; total these up, and we have a score
  • But should "limited a lot" for walking a mile
    receive the same weight as "limited a lot" in
    getting around inside the home?
  • Should "limited a lot" for walking one block be
    twice as bad as "limited a little" for walking
    one block?

25
How IRT Can Help
  • IRT provides us with a data-driven means of
    rational scoring for such measures
  • Items that are more discriminating are given
    greater weight
  • In practice, the simple sum score is often very
    good; improvement is at the margins

26
Description of Psychometric Properties
  • The Test Information Curve (TIC) shows
    reliability that varies continuously with ability
  • Depicts ability levels associated with high and
    low reliability
  • The standard error of measurement is directly
    related to the information value I(θ)
  • SEM(θ) = 1 / sqrt(I(θ))
  • SEM(θ) and I(θ) also have a direct
    correspondence to traditional reliability r
  • r(θ) = 1 - 1 / I(θ) (computed below)
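These conversions in code (assuming the conventional
unit-variance ability scale):

    import numpy as np

    def sem(info):
        """SEM(theta) = 1 / sqrt(I(theta))"""
        return 1.0 / np.sqrt(info)

    def reliability(info):
        """r(theta) = 1 - 1 / I(theta)"""
        return 1.0 - 1.0 / info

    for i in (2.0, 5.0, 10.0):
        print(f"I = {i:4.1f}  SEM = {sem(i):.3f}  r = {reliability(i):.3f}")
    # I = 10 corresponds to SEM of about 0.32 and r = 0.90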

27
I(θ), SEM, r
28
TICs for English- and Spanish-Language Versions of
Two Scales
Mungas et al., 2004
29
Construction of New Scales
  • Items can be selected to create scales with
    desired measurement properties
  • Can be used for prospective test development
  • Can be used to create new scales from existing
    tests/item pools
  • IRT will not overcome inadequate items

30
TICs from an Existing Global Cognition Scale and
Re-Calibrated Existing Cognitive Tests
Mungas et al., 2003
31
Principles of Scale Construction
  • Information corresponds to assessment goals
  • Broad and flat TIC for a longitudinal change
    measure in a population with heterogeneous ability
  • For selection or diagnostic test, peak at point
    of ability continuum where discrimination is most
    important
  • But normal cognition spans a 4.0 s.d. range, and
    the range is even greater in demographically
    diverse populations

32
Other Issues In IRT
  • Polytomous IRT models are available
  • Useful for ordinal (Likert) rating scales
  • Each possible score of the item (minus 1) is
    treated like a separate item with a different
    difficulty parameter (see the sketch below)
  • Information is greater for a polytomous item than
    for the same item dichotomized at a cutpoint
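A minimal sketch of one common polytomous model, the
graded response model, for a three-category item like
those in the physical function scale above (parameter
values are hypothetical):

    import numpy as np

    def grm_probs(theta, a, thresholds):
        """Graded response model: an item with k ordered categories has
        k - 1 boundary curves sharing one discrimination (a), each with
        its own difficulty threshold. Returns P(score = j), j = 0..k-1."""
        p_star = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(thresholds))))
        cum = np.concatenate(([1.0], p_star, [0.0]))
        return cum[:-1] - cum[1:]

    # "limited a lot" / "limited a little" / "not limited at all"
    print(grm_probs(theta=0.0, a=1.5, thresholds=[-1.0, 1.0]))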

33
Other Issues in IRT
  • Applicable to broad range of content domains
  • IRT certainly applies to cognitive abilities
  • Also applies to other health outcomes
  • Quality of life
  • Physical function
  • Fatigue
  • Depression
  • Pain

34
Other Issues in IRT
  • Differential Item Function - Test Bias
  • IRT provides explicit methods to evaluate and
    quantify the extent to which items and tests have
    different measurement properties in different
    groups
  • e.g. racial and ethnic groups, linguistic groups,
    gender (see the sketch below)
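One simple descriptive index of DIF compares an
item's ICCs estimated separately in two groups; a
sketch with hypothetical within-group parameters
(operational DIF analyses use formal statistical
tests rather than this raw area index):

    import numpy as np

    def dif_area(a_ref, b_ref, a_foc, b_foc, lo=-4.0, hi=4.0, n=801):
        """Unsigned area between reference- and focal-group ICCs
        (0 means the item behaves identically in both groups)."""
        theta = np.linspace(lo, hi, n)
        p_ref = 1.0 / (1.0 + np.exp(-a_ref * (theta - b_ref)))
        p_foc = 1.0 / (1.0 + np.exp(-a_foc * (theta - b_foc)))
        return np.abs(p_ref - p_foc).mean() * (hi - lo)

    print(dif_area(1.5, 0.0, 1.5, 0.6))  # item harder in focal group
    print(dif_area(1.5, 0.0, 1.5, 0.0))  # no DIF: area is 0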

35
English and Spanish Item Characteristic Curves
for Lamb/Cordero Item
36
English and Spanish Item Characteristic Curves
for Stone/Piedra Item
37
Differential Item Function (DIF)
  • DIF refers to systematic bias in measuring true
    ability - it doesn't address group differences in
    ability

38
Challenges/ Limitations of IRT
  • Large samples required for stable estimation
  • 150-200 for 1PL
  • 400-500 for 2PL
  • 600-1000 for 3PL
  • Analytic methods are labor intensive
  • There are a number of (expensive) applications
    readily available for IRT analyses
  • Evaluation of basic assumptions, identification
    of appropriate model, and systematic IRT analysis
    require considerable expertise and labor

but, R!!
39
Computerized Adaptive Testing (CAT)
  • IRT-based, computer-driven method
  • Selects items that most closely match the
    examinee's ability
  • Administers only the items needed to achieve a
    pre-specified level of precision in measurement
    (information, s.e.m., reliability) - see the
    sketch below
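A minimal simulation of the CAT logic under a 2PL
model: give the most informative unused item at the
current ability estimate, update the estimate, and
stop when the standard error reaches a target (the
item bank, stopping value, and use of grid-search ML
are all simplifying assumptions):

    import numpy as np

    def cat_session(true_theta, a, b, target_sem=0.45, seed=0):
        """Simulated adaptive test; returns (ability estimate, items used)."""
        rng = np.random.default_rng(seed)
        grid = np.linspace(-4, 4, 401)
        unused = list(range(len(a)))
        taken, answers = [], []
        theta_hat = 0.0  # start at the population mean
        while unused:
            # Select the unused item with maximum information at theta_hat
            p_sel = 1 / (1 + np.exp(-a[unused] * (theta_hat - b[unused])))
            info_sel = a[unused] ** 2 * p_sel * (1 - p_sel)
            item = unused.pop(int(np.argmax(info_sel)))
            # Simulate the examinee's response to that item
            p_true = 1 / (1 + np.exp(-a[item] * (true_theta - b[item])))
            taken.append(item)
            answers.append(int(rng.random() < p_true))
            # Re-estimate ability by grid-search ML over items given so far
            r = np.array(answers)
            p = 1 / (1 + np.exp(-a[taken] * (grid[:, None] - b[taken])))
            theta_hat = grid[np.argmax(
                (r * np.log(p) + (1 - r) * np.log(1 - p)).sum(axis=1))]
            # Stop once the pre-specified precision is reached
            p_hat = 1 / (1 + np.exp(-a[taken] * (theta_hat - b[taken])))
            info = (a[taken] ** 2 * p_hat * (1 - p_hat)).sum()
            if 1 / np.sqrt(info) <= target_sem:
                break
        return theta_hat, len(taken)

    rng = np.random.default_rng(1)
    a_bank = rng.uniform(1.5, 2.5, 20)   # hypothetical 20-item bank
    b_bank = np.linspace(-3, 3, 20)
    print(cat_session(true_theta=1.0, a=a_bank, b=b_bank))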

40
Why CAT?
  • Efficiency
  • Administration
  • Standardization
  • Time efficiency
  • Data collection
  • Scoring
  • Computer can implement complex scoring algorithms

41
CAT Example 1
42
CAT Example 2
43
Practical Considerations for CAT
44
What You Need for CAT
  • Computer technology
  • Item Selection
  • Item Administration
  • Scale Scoring
  • Item bank with IRT parameters
  • Range of item difficulty relevant to measurement
    needs

45
What is Straightforward/Easy?
  • Dichotomous items
  • Multiple choice items
  • Ordered polytomous response scales
  • Up to 10-15 response options

46
Technical Challenges
  • Continuous response scales (memory, timed tasks)
  • Can be recoded into a smaller number of ordered
    response ranges
  • but this loses information

47
Methodological Challenges
  • Sample size requirements
  • Minimally 300-600 cases for stable estimation of
    item parameters
  • Differential Item Function and Measurement Bias
  • Essentially involves item calibration within
    groups of interest
  • e.g., age, education, language, gender, race
  • Available literature provides minimal guidance

48
References
  • Mungas, D., Reed, B. R., & Kramer, J. H. (2003).
    Psychometrically matched measures of global
    cognition, memory, and executive function for
    assessment of cognitive decline in older persons.
    Neuropsychology, 17(3), 380-392.
  • Mungas, D., Reed, B. R., Crane, P. K., Haan, M.
    N., & González, H. (2004). Spanish and English
    Neuropsychological Assessment Scales (SENAS):
    Further development and psychometric
    characteristics. Psychological Assessment, 16(4),
    347-359.