# Estimation of Item Response Models - PowerPoint PPT Presentation

Provided by: RLevy

Transcript and Presenter's Notes
1
Estimation of Item Response Models
• Mister Ibik
• Division of Psychology in Education
• Arizona State University
• EDP 691 Advanced Topics in Item Response Theory

2
Motivation and Objectives
• Why estimate?
• A distinguishing feature of IRT modeling, as
compared to classical techniques, is the presence
of parameters
• These parameters characterize and guide inference
regarding entities of interest (i.e., examinees,
items)
• We will think through
• Different estimation situations
• Alternative estimation techniques
• The logic and mathematics underpinning these
techniques
• Various strengths and weaknesses
• What you will have
• A detailed introduction to principles and
mathematics
• A resource to be revisited...and revisited...and
revisited

3
Outline
• Some Necessary Mathematical Background
• Maximum Likelihood and Bayesian Theory
• Estimation of Person Parameters When Item
Parameters are Known
• ML
• MAP
• EAP
• Estimation of Item Parameters When Person
Parameters are Known
• ML
• Simultaneous Estimation of Item and Person
Parameters
• JML
• CML
• MML
• Other Approaches

4
Background: Finding the Root of an Equation
• Newton-Raphson Algorithm
• Finds the root of an equation
• Example: the function f(x) = x²
• Has a root (where f(x) = 0) at x = 0

5
Newton-Raphson
• Newton-Raphson takes a given point, x0, and
systematically progresses to find the root of the
equation
• Utilizes the slope of the function to find where
the root may be
• The slope of the function is given by the
derivative, denoted f'(x)
• f'(x) gives the slope of the straight line that is
tangent to f(x) at x
• The tangent is the best linear prediction of how
the function is changing
• For x0, the best guess for the root is the point
where the tangent line equals 0
• This occurs at x = x0 - f(x0)/f'(x0)
• So the next candidate point for the root is
x1 = x0 - f(x0)/f'(x0)
6
Newton-Raphson Updating (1)
• Suppose x0 = 1.5

f(x0) = 2.25
f'(x0) = 3
x1 = 1.5 - 2.25/3 = 0.75
7
Newton-Raphson Updating (2)
• Now x1 = 0.75

f(x1) = 0.5625
f'(x1) = 1.5
x2 = 0.75 - 0.5625/1.5 = 0.375
8
Newton-Raphson Updating (3)
• Now x2 = 0.375

f(x2) = 0.1406
f'(x2) = 0.75
x3 = 0.375 - 0.1406/0.75 = 0.1875
9
Newton-Raphson Updating (4)
• Now x3 = 0.1875

f(x3) = 0.0352
f'(x3) = 0.375
x4 = 0.1875 - 0.0352/0.375 = 0.0938
10
Newton-Raphson Example
| Iteration | x      | f(x)   | f'(x)  | f(x)/f'(x) | Change |
|-----------|--------|--------|--------|------------|--------|
| 0         | 1.5000 | 2.2500 | 3.0000 | 0.7500     | 0.7500 |
| 1         | 0.7500 | 0.5625 | 1.5000 | 0.3750     | 0.3750 |
| 2         | 0.3750 | 0.1406 | 0.7500 | 0.1875     | 0.1875 |
| 3         | 0.1875 | 0.0352 | 0.3750 | 0.0938     | 0.0938 |
| 4         | 0.0938 | 0.0088 | 0.1875 | 0.0469     | 0.0469 |
| 5         | 0.0469 | 0.0022 | 0.0938 | 0.0234     | 0.0234 |
| 6         | 0.0234 | 0.0005 | 0.0469 | 0.0117     | 0.0117 |
| 7         | 0.0117 | 0.0001 | 0.0234 | 0.0059     | 0.0059 |
| 8         | 0.0059 | 0.0000 | 0.0117 | 0.0029     | 0.0029 |
| 9         | 0.0029 | 0.0000 | 0.0059 | 0.0015     | 0.0015 |
| 10        | 0.0015 | 0.0000 | 0.0029 | 0.0007     | 0.0007 |
11
Newton-Raphson Summary
• Iterative algorithm for finding the root of an
equation
• Takes a starting point and systematically
progresses to find the root of the function
• Requires the derivative of the function
• Each successive point is given by
x_{n+1} = x_n - f(x_n)/f'(x_n)
• The process continues until we get arbitrarily
close, as usually measured by the change in x or
in the function value

12
Difficulties With Newton-Raphson
• Some functions have multiple roots
• Which root is found often depends on the start
value

13
Difficulties With Newton-Raphson
• Numerical complications can arise
• When the derivative is relatively small in
magnitude, the algorithm shoots into outer space

14
Logic of Maximum Likelihood
• A general approach to parameter estimation
• The use of a model implies that the data may be
sufficiently characterized by the features of the
model, including the unknown parameters
• Parameters govern the data in the sense that the
data depend on the parameters
• Given values of the parameters we can calculate
the (conditional) probability of the data
• P(Xij = 1 | θi, bj) =
exp(θi - bj) / (1 + exp(θi - bj))
• Maximum likelihood (ML) estimation asks: what
are the values of the parameters that make the
data most probable?

15
Example: Series of Bernoulli Variables With
Unknown Probability
• Bernoulli variable: P(X = 1) = p
• The probability of the data is given by
p^X (1-p)^(1-X)
• Suppose we have two random variables X1 and X2
• Taken as a function of the parameters, the
probability of the data is called the likelihood
• Suppose X1 = 1, X2 = 0
• P(X1 = 1, X2 = 0 | p) = L(p | X1 = 1, X2 = 0)
= p(1-p)
• Choose p to maximize the conditional probability
of the data
• For p = 0.1, L = 0.1(1-0.1) = 0.09
• For p = 0.2, L = 0.2(1-0.2) = 0.16
• For p = 0.3, L = 0.3(1-0.3) = 0.21
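This grid evaluation can be sketched in a few lines (illustrative code, not from the slides):

```python
# Likelihood for X1 = 1, X2 = 0 under a Bernoulli(p) model: L(p) = p(1 - p).
def likelihood(p):
    return p * (1 - p)

# Evaluate the likelihood on a coarse grid of candidate values for p.
values = {p: likelihood(p) for p in (0.1, 0.2, 0.3, 0.4, 0.5)}
best = max(values, key=values.get)
```

On this grid the likelihood peaks at p = 0.5, the sample proportion of 1s, where L = 0.25.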

16
Example: Likelihood Function
17
The Likelihood Function in IRT
• The likelihood may be thought of as the
conditional probability, where the data are known
and the parameters vary
• Let Pij = P(Xij = 1 | θi, ξj), where ξj denotes
the parameters of item j
• The goal is to maximize this function: what
values of the parameters yield the highest value?
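Under the independence assumptions used throughout, the likelihood for the full response matrix X, and the log-likelihood used below, take the standard IRT product form (with Pij as just defined):

```latex
L(\theta, \xi \mid X) = \prod_{i=1}^{N} \prod_{j=1}^{J}
  P_{ij}^{x_{ij}} \, (1 - P_{ij})^{1 - x_{ij}},
\qquad
\ln L = \sum_{i=1}^{N} \sum_{j=1}^{J}
  \left[ x_{ij} \ln P_{ij} + (1 - x_{ij}) \ln(1 - P_{ij}) \right].
```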

18
Log-Likelihood Functions
• It is numerically easier to maximize the natural
logarithm of the likelihood
• The log-likelihood has the same maximum as the
likelihood

19
Maximizing the Log-Likelihood
• Note that at the maximum of the function, the
slope of the tangent line equals 0
• The slope of the tangent is given by the first
derivative
• If we can find the point at which the first
derivative equals 0, we will have also found the
point at which the function is maximized

20
Overview of Numerical Techniques
• One can maximize the lnL function by finding a
point where its derivative is 0
• A variety of methods are available for maximizing
L, or lnL
• Newton-Raphson
• Fisher Scoring
• Expectation-Maximization (EM)
• The generality of ML estimation and these
numerical techniques results in the same concepts
and estimation routines being employed across
modeling situations
• Logistic regression, log-linear modeling, FA,
SEM, LCA

21
ML Estimation of Person Parameters When Item
Parameters Are Known
• Assume item parameters bj, aj, and cj are known
• Assume unidimensionality, local and respondent
independence
• Conditional probability now depends on person
parameter only
• Likelihood function for the person parameters
only

22
ML Estimation of Person Parameters When Item
Parameters Are Known
• Choose each θi such that L or lnL is maximized
• Let's suppose we have one examinee
• Maximize this function using any of several
methods
• We'll use Newton-Raphson

23
Newton-Raphson Estimation Recap
• Recall NR seeks to find the root of a function
(where the function equals 0)
• What is our function of interest?
• What is the derivative of this function?
• Updated value = current value - (function of
interest) / (derivative of the function of interest)
24
Newton-Raphson Estimation of Person Parameters
• Newton-Raphson uses the derivative of the
function of interest
• Our function is itself a derivative: the first
derivative of lnL with respect to θi
• We'll need the second derivative as well as the
first derivative

25
ML Estimation of Person Parameters When Item
Parameters Are Known The Log-Likelihood
• The log-likelihood to be maximized
• Select a start value and iterate towards a
solution using Newton-Raphson
• A hill-climbing sequence

26
ML Estimation of Person Parameters When Item
Parameters Are Known Newton-Raphson
• Start at -1.0

27
ML Estimation of Person Parameters When Item
Parameters Are Known Newton-Raphson
• Move to 0.09

28
ML Estimation of Person Parameters When Item
Parameters Are Known Newton-Raphson
• Move to -0.0001
• When the change in θi is arbitrarily small (e.g.,
less than 0.001), stop estimation
• No meaningful change in the next step
• The key is that the slope of the tangent is 0
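The hill-climbing sequence above can be sketched for the Rasch case, where the only known item parameters are difficulties bj. This is a minimal illustration with made-up responses and difficulties, not code from the slides:

```python
import math

# Newton-Raphson for one examinee's theta under the Rasch model, with item
# difficulties b treated as known. The function of interest is dlnL/dtheta;
# its derivative, d2lnL/dtheta2, drives the update.

def rasch_p(theta, b):
    """P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ml_theta(x, b, theta=-1.0, tol=1e-4, max_iter=100):
    for _ in range(max_iter):
        p = [rasch_p(theta, bj) for bj in b]
        first = sum(xj - pj for xj, pj in zip(x, p))    # dlnL/dtheta
        second = -sum(pj * (1 - pj) for pj in p)        # d2lnL/dtheta2
        step = first / second
        theta -= step
        if abs(step) < tol:                             # change arbitrarily small
            break
    return theta

# A mixed response pattern (the ML estimate does not exist for all-0 or all-1)
theta_hat = ml_theta([1, 0, 1, 1, 0], [-1.0, -0.5, 0.0, 0.5, 1.0])
```

At convergence the first derivative is essentially zero: the examinee's expected score under the model matches the observed raw score.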

29
Newton-Raphson Estimation of Multiple Person
Parameters
• But we have N examinees, each with a θi to be
estimated
• We need a multivariate version of the
Newton-Raphson algorithm

30
First Order Derivatives
• First order derivatives of the log-likelihood
• ∂lnL/∂θi only involves terms corresponding to
subject i

Why???
31
Second Order Derivatives
• Hessian: the matrix of second order partial
derivatives of the log-likelihood
• This matrix needs to be inverted
• In the current context, this matrix is diagonal

Why???
32
Second Order Derivatives
• The inverse of the Hessian is diagonal with
elements that are the reciprocals of the diagonal
of the Hessian
• Updates for each θi do not depend on any other
subject's θ

33
Second Order Derivatives
• The updates for each θi are independent of one
another
• The procedure can be performed one examinee at a
time

34
ML Estimation of Person Parameters When Item
Parameters Are Known Standard Errors
• The approximate, asymptotic standard error of the
ML estimate of θi is SE(θ̂i) = 1/√I(θ̂i)
• where I(θi) is the information function
• Standard errors are
• asymptotic with respect to the number of items
• approximate because only an estimate of θi is
employed
• asymptotically approximately unbiased

35
ML Estimation of Person Parameters When Item
Parameters Are Known Strengths
• ML estimates have some desirable qualities
• They are consistent
• If a sufficient statistic exists, then the MLE is
a function of that statistic (Rasch models)
• Asymptotically normally distributed
• Asymptotically most efficient (least variable)
estimator among the class of normally distributed
unbiased estimators
• Asymptotically with respect to what?

36
ML Estimation of Person Parameters When Item
Parameters Are Known Weaknesses
• ML estimates have some undesirable qualities
• Estimates may fly off into outer space
• They do not exist for so-called perfect scores
(all 1s or all 0s)
• Can be difficult to compute or verify when the
likelihood function is not single peaked (may
occur with 3-PLM or more complex IRT models)

37
ML Estimation of Person Parameters When Item
Parameters Are Known Weaknesses
• Strategies to handle wayward solutions
• Bound the amount of change at any one iteration
• Atheoretical
• No longer common
• Use an alternative estimation framework (Fisher,
Bayesian)
• Strategies to handle perfect scores
• Do not estimate θi
• Use an alternative estimation framework
(Bayesian)
• Strategies to handle local maxima
• Re-estimate the parameters using different
starting points and look for agreement

38
ML Estimation of Person Parameters When Item
Parameters Are Known Weaknesses
• An alternative to the Newton-Raphson technique is
Fisher's method of scoring
• Instead of the Hessian, it uses the information
matrix (based on the Hessian)
• This usually leads to quicker convergence
• Often is more stable than Newton-Raphson
• But what about those perfect scores?

39
Bayes Theorem
• We can avoid some of the problems that occur in
ML estimation by employing a Bayesian approach
• All entities treated as random variables
• Bayes' Theorem for random variables A and B:
P(A | B) = P(B | A) P(A) / P(B)
• P(A | B): the posterior distribution of A, given B
• P(B | A): the conditional probability of B, given A
• P(A): the prior probability of A
• P(B): the marginal probability of B

40
Bayes Theorem
• If A is discrete, P(B) = Σa P(B | a) P(a)
• If A is continuous, P(B) = ∫ P(B | a) P(a) da
• Note that P(B | A) = L(A | B)

41
Bayesian Estimation of Person Parameters The
Posterior
• Select a prior distribution for θi, denoted P(θi)
• Recall the likelihood function takes on the form
P(Xi | θi)
• The posterior density of θi given Xi is
P(θi | Xi) = P(Xi | θi) P(θi) / P(Xi)
• Since P(Xi) is a constant,
P(θi | Xi) ∝ P(Xi | θi) P(θi)

42
Bayesian Estimation of Person Parameters The
Posterior
• The Likelihood
• The Prior
• The Posterior

43
Maximum A Posteriori Estimation of Person
Parameters
• The Maximum A Posteriori (MAP) estimate is
the maximum of the posterior density of θi
• Computed by maximizing the posterior density, or
its log
• Find θi such that ∂lnP(θi | Xi)/∂θi = 0
• Use Newton-Raphson or Fisher scoring
• The max of lnP(θi | Xi) occurs at the max of
lnP(Xi | θi) + lnP(θi)
• This can be thought of as augmenting the
likelihood with prior information
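Augmenting the likelihood this way changes the Newton-Raphson derivatives only slightly. A sketch for a Rasch likelihood combined with a N(0, 1) prior (an illustration under assumed data and names, not code from the slides):

```python
import math

# MAP estimation of theta: maximize lnL + lnP(theta). With a N(0, 1) prior,
# lnP(theta) contributes -theta to the first derivative and -1 to the second,
# so a finite estimate exists even for perfect response patterns.

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def map_theta(x, b, theta=0.0, tol=1e-4, max_iter=100):
    for _ in range(max_iter):
        p = [rasch_p(theta, bj) for bj in b]
        first = sum(xj - pj for xj, pj in zip(x, p)) - theta   # d lnPost/d theta
        second = -sum(pj * (1 - pj) for pj in p) - 1.0         # d2 lnPost/d theta2
        step = first / second
        theta -= step
        if abs(step) < tol:
            break
    return theta

# A perfect score: the ML estimate would diverge, but the MAP is finite.
theta_map = map_theta([1, 1, 1, 1], [-1.0, -0.5, 0.5, 1.0])
```

Because the second derivative is always at most -1, the log-posterior is strictly concave and the iteration is well behaved even where the likelihood alone has no maximum.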

44
Choice of Prior Distribution
• Choosing P(θi) = U(-∞, ∞) makes the posterior
proportional to the likelihood
• In this case, the MAP is very similar to the ML
estimate
• The prior distribution P(θi) is often assumed to
be N(0, 1)
• The normal distribution is commonly justified by
appeal to the central limit theorem
• The choice of mean and variance identifies the
scale of the latent continuum

45
MAP Estimation of Person Parameters Features
• The approximate, asymptotic standard error of the
MAP is SE(θ̂i) = 1/√I(θ̂i)
• where I(θi) is the information from the posterior
density
• Advantages of the MAP estimator
• Exists for every response pattern (why?)
• Generally leads to a reduced tendency for local
extrema
• Disadvantages of the MAP estimator
• Must specify a prior
• Exhibits shrinkage in that it is biased towards
the mean; may need lots of items to swamp the
prior if it is misspecified
• Calculations are iterative and may take a long
time
• May result in local extrema

46
Expected A Posteriori (EAP) Estimation of Person
Parameters
• The Expected A Posteriori (EAP) estimator is the
mean of the posterior distribution
• Exact computations are often intractable
• We approximate the integral using numerical
techniques
• Essentially, we take a weighted average of the
values, where the weights are determined by the
posterior distribution
• Recall that the posterior distribution is itself
determined by the prior and the likelihood

47
• The Posterior Distribution
• Evaluate the heights of the distribution at each
point
• Use the relative heights as the weights

[Figure: posterior heights evaluated at each point; the heights sum to
.165, so, e.g., a point with height .021 receives weight .021/.165 = .127]
48
• The Expected A Posteriori (EAP) is estimated by a
weighted average: EAP(θi) = Σr Qr H(Qr)
• where H(Qr) is the weight of point Qr in the
posterior (compare Embretson & Reise, 2000, p.
177)
• The standard error is the standard deviation of
the posterior and may be approximated via
SE(θ̂i) = √(Σr (Qr - EAP(θi))² H(Qr))
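The weighted-average computation can be sketched directly. The evenly spaced grid, N(0, 1) prior kernel, and item values below are my own illustrative assumptions:

```python
import math

# EAP by numerical quadrature: evaluate prior x likelihood at grid points Qr,
# normalize to get posterior weights H(Qr), then take the weighted mean (the
# EAP estimate) and the weighted SD (its standard error).

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap_theta(x, b, n_points=41, lo=-4.0, hi=4.0):
    q = [lo + k * (hi - lo) / (n_points - 1) for k in range(n_points)]
    heights = []
    for theta in q:
        prior = math.exp(-0.5 * theta ** 2)          # N(0, 1) kernel
        like = 1.0
        for xj, bj in zip(x, b):
            p = rasch_p(theta, bj)
            like *= p if xj == 1 else 1.0 - p
        heights.append(prior * like)
    total = sum(heights)
    w = [h / total for h in heights]                 # the weights H(Qr)
    eap = sum(wr * qr for wr, qr in zip(w, q))
    se = math.sqrt(sum(wr * (qr - eap) ** 2 for wr, qr in zip(w, q)))
    return eap, se

# Non-iterative, and defined even for a perfect score:
eap, se = eap_theta([1, 1, 1, 1], [-1.0, -0.5, 0.5, 1.0])
```

Note there is no iteration anywhere: the posterior is simply evaluated, normalized, and averaged, which is why EAP cannot get stuck in local extrema.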

49
• Exists for all possible response patterns
• Non-iterative solution strategy
• Not a maximum, therefore no local extrema
• Has smallest MSE in the population
• Must specify a prior
• Exhibits shrinkage to the prior mean; if the
prior is misspecified, may need lots of items to
swamp the prior

50
ML Estimation of Item Parameters When Person
Parameters Are Known Assumptions
• Assume
• person parameters θi are known
• respondent and local independence
• Choose values for item parameters that maximize
lnL
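The item-side problem mirrors the person-side one. A Rasch sketch for a single item's difficulty, with person parameters treated as known (illustrative data and names, not code from the slides):

```python
import math

# ML estimation of one Rasch item difficulty b, given known person parameters
# theta_i. The derivatives mirror the person-side case, with the sum now
# running over examinees: dlnL/db = sum_i (P_i - x_i).

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ml_difficulty(x, thetas, b=0.0, tol=1e-4, max_iter=100):
    for _ in range(max_iter):
        p = [rasch_p(ti, b) for ti in thetas]
        first = sum(pi - xi for pi, xi in zip(p, x))   # dlnL/db
        second = -sum(pi * (1 - pi) for pi in p)       # d2lnL/db2
        step = first / second
        b -= step
        if abs(step) < tol:
            break
    return b

# Five examinees, three of whom answered the item correctly
b_hat = ml_difficulty([0, 0, 1, 1, 1], [-1.0, -0.5, 0.0, 0.5, 1.0])
```

By local independence the likelihood factors over items, so in practice this routine runs separately for each item, just as the person-side routine runs separately for each examinee.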

51
Newton-Raphson Estimation
• What is the structure of this matrix?

52
ML Estimation of Item Parameters When Person
Parameters Are Known
• Just as we could estimate subjects one at a time
thanks to respondent independence, we can
estimate items one at a time thanks to local
independence
• Multivariate Newton-Raphson

53
ML Estimation of Item Parameters When Person
Parameters Are Known Standard Errors
• To obtain the approximate, asymptotic standard
errors
• Invert the associated information matrix, which
yields the variance-covariance matrix
• Take the square root of the elements of the
diagonal
• Asymptotic w.r.t. sample size and approximate
because we only have estimates of the parameters
• This is conceptually similar to the approach for
estimating θ
• But why do we need a matrix approach?

54
ML Estimation of Item Parameters When Person
Parameters Are Known Standard Errors
• ML estimates of item parameters have the same
properties as those for person parameters:
consistent, efficient, asymptotic (w.r.t. the
number of subjects)
• aj parameters can be difficult to estimate and
tend to get inflated with small sample sizes
• cj parameters are often difficult to estimate
well
• Usually because there's not a lot of information
in the data about the lower asymptote
• Especially true when items are easy
• Especially true when items are easy
• Generally need larger and more heterogeneous
samples to estimate 2-PL and 3-PL
• Can employ Bayesian estimation (more on this
later)