INTRO 2 IRT - PowerPoint PPT Presentation

Provided by: timcro

Title: INTRO 2 IRT


1
INTRO 2 IRT
  • Tim Croudace

2
Descriptions of IRT
  • "IRT refers to a set of mathematical models that
    describe, in probabilistic terms, the
    relationship between a person's response to a
    survey question/test item and his or her level of
    the latent variable being measured by the
    scale"
  • Fayers and Hays, p. 55
  • Assessing Quality of Life in Clinical Trials.
    Oxford Univ Press
  • Chapter on applying IRT for evaluating
    questionnaire item and scale properties.
  • This latent variable is usually a hypothetical
    construct (trait/domain or ability) which is
    postulated to exist but cannot be measured by a
    single observable variable/item.
  • Instead it is indirectly measured by using
    multiple items or questions in a multi-item
    test/scale.

3
The data
pattern  0000 1000 0001 0010 1001 1010 0011 1011
n         477   63   12  150    7   32   11    4
pattern  0100 1100 0101 0110 1101 1110 0111 1111
n         231   94   13  378   12  169   45   31
$\operatorname{logit}(\pi_{hi}) = \alpha_{h0} + \alpha_{h1} z_i$
e.g. $\alpha_{h0}$: $\alpha_{10}$;  $\alpha_{h1}$: $\alpha_{21}$;  $\alpha_{h0}$: $\alpha_{40}$
Sources of knowledge: q1 = radio, q2 = newspapers,
q3 = reading, q4 = lectures
A single latent dimension: Z ~ Normal (mean 0, std dev 1), so Var = 1 too!
4
Simple sum scores (n = 1729 new individual values)
q1 q2 q3 q4     n  Total score
 0  0  0  0   477  0   (477 zeros added to data set as a new column)
 1  0  0  0    63  1
 0  0  0  1    12  1
 0  0  1  0   150  1
 1  0  0  1     7  2
 1  0  1  0    32  2
 0  0  1  1    11  2
 1  0  1  1     4  3
 0  1  0  0   231  1
 1  1  0  0    94  2
 0  1  0  1    13  2
 0  1  1  0   378  2
 1  1  0  1    12  3
 1  1  1  0   169  3
 0  1  1  1    45  3
 1  1  1  1    31  4
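The sum scoring on this slide can be sketched in a few lines of Python. This is only an illustration: the variable and function names are invented here, and the patterns and frequencies are copied from the data slide.

```python
from collections import Counter

# Response patterns (q1 q2 q3 q4) and their frequencies, as listed on the data slide.
patterns = ["0000", "1000", "0001", "0010", "1001", "1010", "0011", "1011",
            "0100", "1100", "0101", "0110", "1101", "1110", "0111", "1111"]
counts = [477, 63, 12, 150, 7, 32, 11, 4, 231, 94, 13, 378, 12, 169, 45, 31]

def sum_score(pattern):
    """Total score: the number of items endorsed (count of 1s in the pattern)."""
    return sum(int(ch) for ch in pattern)

# How many individuals obtain each total score (0 to 4).
score_distribution = Counter()
for pattern, n in zip(patterns, counts):
    score_distribution[sum_score(pattern)] += n

total_n = sum(counts)
for score in sorted(score_distribution):
    print(score, score_distribution[score])
```

Note that the sum score collapses the 16 patterns into only five distinct values, so patterns such as 1000 and 0100 cannot be told apart.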
5
Binary Factor / Latent Trait Analysis Results:
logit-probit model
Warming up to this sort of thing soon ...
[Path diagram: a single latent factor F with arrows to the binary items U1, U2, U3, ..., Up]
[Figure captions: 2 items with similar thresholds and similar
slopes; 3 items with different thresholds but
similar slopes]
6
The key concept: latent factor models for
constructs underpinning multiple binary (0/1)
responses
  • based on innovations in educational testing and
    psychometric statistics that are > 50 years old
  • The same models used in educational testing with
    correct / incorrect answers can be applied to
    symptom present / absent data (both binary)
  • Extensions to ordinal outcomes (Likert scales)
  • Flexibility in parametric form available
  • Semi- and non-parametric approaches too

7
Binary IRT: the A B C D of it
8
Linear vs non-linear regression of response
probability on latent variable
[Figure: y-axis = probability of response (Yes) on
a simple binary (Yes/No) scale item;
x-axis = score on the latent construct being
measured]
Adapted without permission from a slide by Prof
H Goldstein
9
Ordinal IRT: the A B C D of GRM
10
IRT models
  • Simplest case of a latent trait analysis
  • Manifest variables are binary: only 2
    distinctions are made
  • these take 0/1 values
  • Yes / No
  • Right / Wrong
  • Symptom present / absent
  • Agree / disagree distinctions for attitudes are more
    likely to be ordinal (> 2 response categories):
    see next lecture, IRT 2, on Friday
  • For scoring of individuals
  • (not parameter estimation for items)
  • it is frequently assumed that the UNOBSERVED
    (latent) variable
  • <the latent factor / trait>
  • is not only continuous but normally distributed
  • or the prior distribution is normal but the posterior
    distribution may not be

11
IRT for binary data
  • The most commonly used model was developed by
    Lord and Birnbaum
  • Lord-Birnbaum model (Lord, 1952; Birnbaum, )
  • 2-parameter logistic
  • a.k.a. the logit-probit model, Bartholomew
    (1987)
  • The model is essentially a non-linear single
    factor model
  • When applied to binary data, the traditional
    linear factor model is only an approximation to
    the appropriate item response model
  • sometimes satisfactory, but sometimes very poor
    (we can guess when)
  • Some accounts of Item Response Theory make it
    sound like a revolutionary, very modern
    development
  • this is not true!
  • It should not replace or displace classical
    concepts, and has suffered from being presented
    and taught as disconnected from these
  • A unified treatment can be given that builds one
    from the other (McDonald, 1999) but this would be
    a one term course on its own
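To make the contrast between the non-linear item response model and a linear approximation concrete, here is a minimal sketch. It assumes the 2-parameter logistic form logit(p) = intercept + slope * z used on the data slide. The function names are invented for illustration, and the straight line is simply the tangent to the curve at its midpoint, chosen here for illustration and not a fitted linear factor model.

```python
import math

def icc_2pl(z, intercept, slope):
    """Response probability at latent score z under the 2-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * z)))

def linear_approx(z, intercept, slope):
    """Tangent line to the logistic curve at its midpoint (where p = 0.5).

    The midpoint is at z = -intercept / slope and the gradient there is slope / 4.
    """
    return 0.5 + (slope / 4.0) * (z + intercept / slope)

# Hypothetical item: intercept 0, slope 3.40 (a steep, highly discriminating item).
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(z, round(icc_2pl(z, 0.0, 3.40), 3), round(linear_approx(z, 0.0, 3.40), 3))
```

Near the midpoint the two agree closely, but away from it the straight line leaves the 0 to 1 range while the logistic curve does not, which is one way to see when a linear approximation is likely to be poor.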

12
What IRT does
  • IRT models provide a clear statement [picture!]
  • of the performance of each item in the
    scale/test
  • and
  • how the scale/test functions, overall,
  • for measuring the construct of interest
  • in the study population
  • The objective is to model each item by
    estimating the properties describing item
    performance characteristics
  • hence Item Characteristic Curve
  • or Symptom Response Function.

13
Very bland (but simple) example
  • Lombard and Doering (1947) data
  • Questions on cancer knowledge with four
    addressing the source of the information
  • Fitting a latent variable model might be proposed
    as a way of constructing a measure of how well
    informed an individual is about cancer
  • A second stage might relate knowledge about
    cancer to knowledge about other diseases or
    general knowledge

14
Very bland (but simple) example
  • Lombard and Doering (1947) data
  • Questions on cancer knowledge with four
    addressing the source of the information
  • radio
  • newspapers
  • (solid) reading (books?)
  • lectures
  • 2 to the power 4 i.e. 16 possible response
    patterns from 0000 to 1111
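The count of 16 patterns can be checked by enumerating them. A small sketch follows; the item labels come from the slide, everything else is illustrative.

```python
from itertools import product

# The four "source of information" items, in the order q1..q4.
items = ["radio", "newspapers", "reading", "lectures"]

# Every possible 0/1 response pattern across the four items.
patterns = ["".join(bits) for bits in product("01", repeat=len(items))]

print(len(patterns))              # number of possible patterns: 2 ** 4
print(patterns[0], patterns[-1])  # first and last pattern
```

The enumeration runs in binary counting order, which is not the order in which the slides list the patterns.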

15
Data
pattern  0000 1000 0001 0010 1001 1010 0011 1011
n         477   63   12  150    7   32   11    4
pattern  0100 1100 0101 0110 1101 1110 0111 1111
n         231   94   13  378   12  169   45   31
  • Lombard and Doering (1947) data
  • 2 to the power 4
  • i.e. 16 possible response patterns (all occur)
  • with more items this is neither likely nor
    necessary
  • frequency shown for
  • 0000 to 1111
  • frequency is the number with each item response
    pattern

17
Basic objectives of modelling
  • When multiple items are applied in a test /
    survey, we can use latent variable modelling to
  • explore inter-relationships among observed
    responses
  • determine whether the inter-relationships can be
    explained by a small number of factors
  • THEN, to assign a SCORE to each individual
    on the basis of their responses
  • Basically to rank order (arrange) or quantify
    (score) survey participants, test takers,
    individuals who have been studied
  • CAN BE THOUGHT OF AS ADDING A NEW SCORE TO YOUR
    DATASET FOR EACH INDIVIDUAL
  • this analysis will also help you to understand
    the properties of each item, as a measure of the
    target construct (what properties?)
  • GRAPHICAL REPRESENTATION IS BEST

18
Item properties that we are interested in are
captured graphically by so-called Item
Characteristic Curves (ICCs)
19
  • Item/Symptom and Test/Scale INFORMATION
  • is useful and necessary for examining score
    precision (the accuracy of estimated scores)
  • we are interested in this for different
    individuals (individuals with different score
    values)
  • by inspecting the amount of information at
    each score level, across the score range (range
    of estimated scores), we identify
    variations in measurement precision (reliability of
    individuals' estimated scores)
  • this enables us to make statements about the
    effective measurement range of an instrument in
    a population

20
e.g. Item Characteristic Curves
21
Item information functions: add them together to
get the TIF (Test Information Function)
Beware: y-axis scaling is not the same in every panel
22
Test Information Function
23
Item information functions, shown alongside
their ICCs
[Figure: four item information panels; y-axis upper limits as labelled: 3.0, 0.14, 0.14, 0.40]
Beware: y-axis scaling is not the same in every panel
24
$\text{s.e.m.} = 1 / \sqrt{\text{Information}}$
Standard error of measurement is not constant
(U-shaped, not symmetrical)
25
Approximate reliability
  • Reliability
  • $= 1 - 1/\text{Info}$
  • $= 1 - 1 / [\,1 / (\text{s.e.m.})^2\,] = 1 - (\text{s.e.m.})^2$
  • s.e.m. = standard error of measurement
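The chain from item information to s.e.m. to approximate reliability can be sketched as follows. The item information formula slope^2 * p * (1 - p) is the standard one for the 2-parameter logistic model. The slopes are the four weights quoted elsewhere in these slides, but the intercepts are not given in the transcript, so the zeros used here are purely hypothetical placeholders and the printed numbers will not match the lecture's plots.

```python
import math

def item_information(z, intercept, slope):
    """Information contributed by one 2PL item at latent score z."""
    p = 1.0 / (1.0 + math.exp(-(intercept + slope * z)))
    return slope ** 2 * p * (1.0 - p)

def total_information(z, items):
    """Test information: the sum of the item information functions at z."""
    return sum(item_information(z, a0, a1) for a0, a1 in items)

def sem(info):
    """Standard error of measurement: 1 / sqrt(information)."""
    return 1.0 / math.sqrt(info)

def approx_reliability(info):
    """Approximate reliability, 1 - 1/Info, for a latent variable with variance 1."""
    return 1.0 - 1.0 / info

slopes = [0.72, 3.40, 1.34, 0.77]
items = [(0.0, a1) for a1 in slopes]  # intercepts of 0.0 are HYPOTHETICAL

for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    info = total_information(z, items)
    print(z, round(info, 3), round(sem(info), 3), round(approx_reliability(info), 3))
```

With all intercepts set to zero the s.e.m. curve is symmetric about z = 0; unequal intercepts, as in real data, give the asymmetric U shape noted on the slide. Where the information falls below 1 the approximation returns a negative value, which is a reminder that it is only a rough guide.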

26
Back to the Data
What would be the easiest thing to do with these
numbers to score the patterns..?
27
Answer ..
  • Simply add them up

29
Weighted by discriminating power scores
Weights (alpha_h1): q1 = 0.72, q2 = 3.40, q3 = 1.34, q4 = 0.77
q1 q2 q3 q4     n  Total  Factor  Weights of              Component score
                   score  score   endorsed items          (weighted by alpha_h1)
 0  0  0  0   477    0    -0.98   0                       0
 1  0  0  0    63    1    -0.68   0.72                    0.72
 0  0  0  1    12    1    -0.67   0.77                    0.77
 0  0  1  0   150    1    -0.46   1.34                    1.34
 1  0  0  1     7    2    -0.41   0.72  0.77              1.48
 1  0  1  0    32    2    -0.23   0.72  1.34              2.06
 0  0  1  1    11    2    -0.22   1.34  0.77              2.10
 1  0  1  1     4    3     0.0    0.72  1.34  0.77        2.82
 0  1  0  0   231    1     0.16   3.40                    3.40
 1  1  0  0    94    2     0.42   0.72  3.40              4.12
 0  1  0  1    13    2     0.43   3.40  0.77              4.16
 0  1  1  0   378    2     0.66   3.40  1.34              4.74
 1  1  0  1    12    3     0.72   0.72  3.40  0.77        4.88
 1  1  1  0   169    3     0.99   0.72  3.40  1.34        5.46
 0  1  1  1    45    3     1.02   3.40  1.34  0.77        5.50
 1  1  1  1    31    4     1.41   0.72  3.40  1.34  0.77  6.22
32
Something a little more subtle
  • Simple sum scores assume all item responses are
    equally useful for defining the construct
  • this may not be the case
  • If items are differentially important
  • (different discriminating power with respect to
    what we are measuring), we might want to take that
    into account
  • How? Weighted sum scores: component scores
  • weighted by what?
  • weighted by the estimates (factor-loading-type
    parameters) from a latent variable model
  • a latent trait model with a single latent factor

33
Weighted scores
Weights (alpha_h1 parameters): Q1 0.72, Q2 3.40, Q3 1.34,
Q4 0.77. These numbers are related to the slopes
of the S-shaped curves (the ICCs)
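A weighted (component) score is just the sum of the weights of the endorsed items. A minimal sketch, using the four weights quoted above; the function name is invented for illustration.

```python
# Weights (alpha_h1 parameters) for Q1..Q4 as quoted on the slide.
WEIGHTS = (0.72, 3.40, 1.34, 0.77)

def component_score(pattern, weights=WEIGHTS):
    """Sum the weights of the items answered 1 in a pattern such as '1100'."""
    return sum(w for bit, w in zip(pattern, weights) if bit == "1")

for pattern in ("0000", "1000", "0100", "1100", "1011", "1111"):
    print(pattern, round(component_score(pattern), 2))
```

Unlike the simple sum score, the weighted score can rank a one-item pattern above a three-item pattern: 0100 scores higher than 1011 because Q2 carries by far the largest weight. Because the weights here are rounded to two decimals, results for patterns that include Q4 can differ in the second decimal from the values tabulated on the slides (for example 6.23 here against 6.22 there), presumably because unrounded estimates were used in the lecture.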
34
Estimated component scores (weighted values)
Weights (alpha_h1), Q1 to Q4: 0.72 3.40 1.34 0.77
35
But the bee's knees are ...
  • The estimated factor scores from the model
  • Not just some simple sum of unweighted or
    weighted items
  • Takes into account the proposed score
    distribution (Gaussian normal) and the estimated
    model parameters (but not the fact that they are
    estimates rather than known values) and more
    besides (when missing data are present)

the estimated factor scores
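One common way to obtain such factor scores is the posterior mean of z given the response pattern, combining a Normal(0,1) prior with the 2PL likelihood. The sketch below approximates that mean on a grid. It is an illustration under stated assumptions, not necessarily the method used for the slides: the intercepts are not given in the transcript, so zeros are used as hypothetical placeholders, and the output will therefore not reproduce the tabulated factor scores.

```python
import math

def posterior_mean_score(pattern, items, n_points=81, z_min=-4.0, z_max=4.0):
    """Grid approximation to E[z | pattern] with a Normal(0,1) prior on z.

    `items` is a list of (intercept, slope) pairs, one per item in the pattern.
    """
    step = (z_max - z_min) / (n_points - 1)
    numerator = 0.0
    denominator = 0.0
    for k in range(n_points):
        z = z_min + k * step
        prior = math.exp(-0.5 * z * z)  # unnormalised N(0,1) density
        likelihood = 1.0
        for bit, (a0, a1) in zip(pattern, items):
            p = 1.0 / (1.0 + math.exp(-(a0 + a1 * z)))
            likelihood *= p if bit == "1" else (1.0 - p)
        weight = prior * likelihood
        numerator += z * weight
        denominator += weight
    return numerator / denominator

slopes = [0.72, 3.40, 1.34, 0.77]
items = [(0.0, a1) for a1 in slopes]  # intercepts of 0.0 are HYPOTHETICAL

for pattern in ("0000", "1011", "0100", "1111"):
    print(pattern, round(posterior_mean_score(pattern, items), 2))
```

Under the 2-parameter logistic model the posterior depends on the pattern only through the weighted sum of endorsed items, so these scores order the patterns in the same way as the component scores, while also placing them on the z-score metric.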
36
A graphical and interactive introduction to IRT
  • Play with the key features of IRT models
  • www2.uni-jena.de/svw/metheval/irt/VisualIRT.pdf

37
a, b (see): the 2-parameter IRT model
  • VisualIRT (pdf)

  • Individuals: score = new ruler value
  • Any hypothetical latent variable (factor/trait
    continuum), expressed in a z-score metric
    (Gaussian normal (0,1))
  • Item properties: slope = item discrimination;
    location = item commonality
    (difficulty / prevalence / severity)
38
IRT Resources
  • A visual guide to Item Response Theory
  • I. Partchev
  • Introduction to IRT,
  • R.Baker
  • http://ericae.net/irt/baker/toc.htm
  • An introduction to modern measurement theory
  • B Reeve
  • Chapter in Fayers and Machin QoL book
  • P Fayers
  • ABC of Item Response Theory
  • H Goldstein
  • Moustaki papers, and online slides (FA at 100)
  • LSE books (Bartholomew, Knott, Moustaki, Steele)

39
  • Applications of Item Response Theory to Practical Testing Problems. Frederick M. Lord. 274 pages. 1980.
  • Applying The Rasch Model. Trevor G. Bond and Christine M. Fox. 255 pages. 2001.
  • Constructing Measures: An Item Response Modeling Approach. Mark Wilson. 248 pages. 2005.
  • The EM Algorithm and Related Statistical Models. Michiko Watanabe and Kazunori Yamaguchi. 250 pages. 2004.
  • Essays on Item Response Theory. Edited by Anne Boomsma, Marijtje A.J. van Duijn, Tom A.A. Snijders. 438 pages. 2001.
  • Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach. Edited by Paul De Boeck and Mark Wilson. 382 pages. 2004.
  • Fundamentals of Item Response Theory. Ronald K. Hambleton, H. Swaminathan, and H. Jane Rogers. 184 pages. 1991.
  • Handbook of Modern Item Response Theory. Edited by Wim J. van der Linden and Ronald K. Hambleton. 510 pages. 1997.
  • Introduction to Nonparametric Item Response Theory. Klaas Sijtsma and Ivo W. Molenaar. 168 pages. 2002.
  • Item Response Theory. Mathilda Du Toit. 906 pages. 2003.
  • Item Response Theory for Psychologists. Susan E. Embretson and Steven P. Reise. 376 pages. 2000.
  • Item Response Theory: Parameter Estimation Techniques (Second Edition, Revised and Expanded w/CD). Frank Baker and Seock-Ho Kim. 495 pages. 2004.
  • Item Response Theory: Principles and Applications. Ronald K. Hambleton and Hariharan Swaminathan. 332 pages. 1984.
  • Logit and Probit: Ordered and Multinomial Models. Vani K. Borooah. 96 pages. 2002.
  • Markov Chain Monte Carlo in Practice. W.R. Gilks, Sylvia Richardson, and D.J. Spiegelhalter. 512 pages. 1995.
  • Monte Carlo Statistical Methods. Christian P. Robert and George Casella. 645 pages. 2004.
  • Polytomous Item Response Theory Models. Remo Ostini and Michael L. Nering. 120 pages. 2005.
  • Rasch Models for Measurement. David Andrich. 96 pages. 1988.
  • Rasch Models: Foundations, Recent Developments, and Applications. Edited by Gerhard H. Fischer and Ivo W. Molenaar. 436 pages. 1995.
  • The Sage Handbook of Quantitative Methodology for the Social Sciences. Edited by David Kaplan. 511 pages. 2004.
  • Test Equating, Scaling, and Linking: Methods and Practices (Second Edition). Michael J. Kolen and Robert L. Brennan. 548 pages. 2004.