
Estimation of Item Response Models

- Mister Ibik
- Division of Psychology in Education
- Arizona State University
- EDP 691 Advanced Topics in Item Response Theory

Motivation and Objectives

- Why estimate?
- Distinguishing feature of IRT modeling as compared to classical techniques is the presence of parameters
- These parameters characterize and guide inference regarding entities of interest (i.e., examinees, items)
- We will think through
- Different estimation situations
- Alternative estimation techniques
- The logic and mathematics underpinning these techniques
- Various strengths and weaknesses
- What you will have
- A detailed introduction to principles and mathematics
- A resource to be revisited, and revisited, and revisited

Outline

- Some Necessary Mathematical Background
- Maximum Likelihood and Bayesian Theory
- Estimation of Person Parameters When Item Parameters Are Known
- ML
- MAP
- EAP
- Estimation of Item Parameters When Person Parameters Are Known
- ML
- Simultaneous Estimation of Item and Person Parameters
- JML
- CML
- MML
- Other Approaches

Background: Finding the Root of an Equation

- Newton-Raphson Algorithm
- Finds the root of an equation
- Example: the function f(x) = x²
- Has a root (where f(x) = 0) at x = 0

Newton-Raphson

- Newton-Raphson takes a given point, x0, and systematically progresses to find the root of the equation
- Utilizes the slope of the function to find where the root may be
- The slope of the function is given by the derivative
- Denoted f'(x)
- Gives the slope of the straight line that is tangent to f(x) at x
- Tangent: the best linear prediction of how the function is changing
- For x0, the best guess for the root is the point where the tangent line crosses 0
- This occurs at x0 - f(x0)/f'(x0)
- So the next candidate point for the root is x1 = x0 - f(x0)/f'(x0)

Newton-Raphson Updating (1)

- Suppose x0 = 1.5
- f(x0) = 2.25, f'(x0) = 3
- x1 = 1.5 - 2.25/3 = 0.75

Newton-Raphson Updating (2)

- Now x1 = 0.75
- f(x1) = 0.5625, f'(x1) = 1.5
- x2 = 0.75 - 0.5625/1.5 = 0.375

Newton-Raphson Updating (3)

- Now x2 = 0.375
- f(x2) = 0.1406, f'(x2) = 0.75
- x3 = 0.375 - 0.1406/0.75 = 0.1875

Newton-Raphson Updating (4)

- Now x3 = 0.1875
- f(x3) = 0.0352, f'(x3) = 0.375
- x4 = 0.1875 - 0.0352/0.375 = 0.0938

Newton-Raphson Example

Iteration   x        f(x)     f'(x)    f(x)/f'(x)   Change
0           1.5000   2.2500   3.0000   0.7500       0.7500
1           0.7500   0.5625   1.5000   0.3750       0.3750
2           0.3750   0.1406   0.7500   0.1875       0.1875
3           0.1875   0.0352   0.3750   0.0938       0.0938
4           0.0938   0.0088   0.1875   0.0469       0.0469
5           0.0469   0.0022   0.0938   0.0234       0.0234
6           0.0234   0.0005   0.0469   0.0117       0.0117
7           0.0117   0.0001   0.0234   0.0059       0.0059
8           0.0059   0.0000   0.0117   0.0029       0.0029
9           0.0029   0.0000   0.0059   0.0015       0.0015
10          0.0015   0.0000   0.0029   0.0007       0.0007

Newton-Raphson Summary

- Iterative algorithm for finding the root of an equation
- Takes a starting point and systematically progresses to find the root of the function
- Requires the derivative of the function
- Each successive point is given by x(n+1) = x(n) - f(x(n))/f'(x(n))
- The process continues until we get arbitrarily close, as usually measured by the change in some function
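The updating scheme above can be sketched in a few lines of code (an illustrative sketch, not from the original slides):

```python
def newton_raphson(f, f_prime, x0, tol=1e-3, max_iter=100):
    """Iterate x(n+1) = x(n) - f(x(n)) / f'(x(n)) until the change is small."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)
        x -= step
        if abs(step) < tol:  # stop when the update is arbitrarily small
            break
    return x

# The slides' example: f(x) = x^2 with f'(x) = 2x, starting at x0 = 1.5
root = newton_raphson(lambda x: x**2, lambda x: 2 * x, 1.5)
```

Each iteration halves the candidate value, reproducing the table above.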

Difficulties With Newton-Raphson

- Some functions have multiple roots
- Which root is found often depends on the start value

Difficulties With Newton-Raphson

- Numerical complications can arise
- When the derivative is relatively small in magnitude, the algorithm shoots into outer space

Logic of Maximum Likelihood

- A general approach to parameter estimation
- The use of a model implies that the data may be sufficiently characterized by the features of the model, including the unknown parameters
- Parameters govern the data in the sense that the data depend on the parameters
- Given values of the parameters we can calculate the (conditional) probability of the data
- P(Xij = 1 | θi, bj) = exp(θi - bj) / (1 + exp(θi - bj))
- Maximum likelihood (ML) estimation asks: What are the values of the parameters that make the data most probable?

Example: Series of Bernoulli Variables With Unknown Probability

- Bernoulli variable: P(X = 1) = p
- The probability of the data is given by p^X (1-p)^(1-X)
- Suppose we have two random variables X1 and X2
- When taken as a function of the parameters, it is called the likelihood
- Suppose X1 = 1, X2 = 0
- P(X1 = 1, X2 = 0 | p) = L(p | X1 = 1, X2 = 0) = p(1-p)
- Choose p to maximize the conditional probability of the data
- For p = 0.1, L = 0.1 × (1-0.1) = 0.09
- For p = 0.2, L = 0.2 × (1-0.2) = 0.16
- For p = 0.3, L = 0.3 × (1-0.3) = 0.21
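Continuing the grid of candidate values, a quick sketch (illustrative, not from the slides) confirms that the likelihood peaks at p = 0.5:

```python
# Evaluate L(p) = p * (1 - p) for the observed data X1 = 1, X2 = 0
# over a grid of candidate values of p.
candidates = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
likelihood = {p: p * (1 - p) for p in candidates}

# The ML estimate is the candidate that makes the data most probable
p_hat = max(likelihood, key=likelihood.get)
```

With one success in two trials, the maximum is at p = 0.5, matching the intuition that the MLE for a Bernoulli probability is the sample proportion.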

Example: Likelihood Function

The Likelihood Function in IRT

- The likelihood may be thought of as the conditional probability, where the data are known and the parameters vary
- Let Pij = P(Xij = 1 | θi, ξj), where ξj collects the item parameters
- The goal is to maximize this function: what values of the parameters yield the highest value?

Log-Likelihood Functions

- It is numerically easier to maximize the natural logarithm of the likelihood, which turns the product of probabilities into a sum
- The log-likelihood has the same maximum as the likelihood, because the logarithm is a monotone increasing function

Maximizing the Log-Likelihood

- Note that at the maximum of the function, the slope of the tangent line equals 0
- The slope of the tangent is given by the first derivative
- If we can find the point at which the first derivative equals 0, we will have also found the point at which the function is maximized

Overview of Numerical Techniques

- One can maximize the lnL function by finding a point where its derivative is 0
- A variety of methods are available for maximizing L, or lnL
- Newton-Raphson
- Fisher Scoring
- Expectation-Maximization (EM)
- The generality of ML estimation and these numerical techniques results in the same concepts and estimation routines being employed across modeling situations
- Logistic regression, log-linear modeling, FA, SEM, LCA

ML Estimation of Person Parameters When Item Parameters Are Known

- Assume item parameters bj, aj, and cj are known
- Assume unidimensionality, local and respondent independence
- The conditional probability now depends on the person parameter only
- The likelihood function for the person parameters only: L(θi | Xi) = ∏j Pij^Xij (1 - Pij)^(1-Xij)

ML Estimation of Person Parameters When Item Parameters Are Known

- Choose each θi such that L or lnL is maximized
- Let's suppose we have one examinee
- Maximize this function using any of several methods
- We'll use Newton-Raphson

Newton-Raphson Estimation Recap

- Recall NR seeks to find the root of a function (where it equals 0)
- NR updates follow the general structure: updated value = current value - (function of interest) / (derivative of the function of interest), each evaluated at the current value

Newton-Raphson Estimation of Person Parameters

- Newton-Raphson uses the derivative of the function of interest
- Our function is itself a derivative: the first derivative of lnL with respect to θi
- We'll need the second derivative as well as the first derivative
- Updates given by θi(new) = θi(old) - (∂lnL/∂θi) / (∂²lnL/∂θi²)

ML Estimation of Person Parameters When Item Parameters Are Known: The Log-Likelihood

- The log-likelihood to be maximized
- Select a start value and iterate towards a solution using Newton-Raphson
- A hill-climbing sequence

ML Estimation of Person Parameters When Item Parameters Are Known: Newton-Raphson

- Start at -1.0
- Move to 0.09
- Move to -0.0001
- When the change in θi is arbitrarily small (e.g., less than 0.001), stop estimation
- No meaningful change in next step
- The key is that the tangent is 0
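A sketch of this iteration for one examinee, assuming the 2-PL (and the Rasch model when all aj = 1); the responses and item parameters below are invented for illustration:

```python
import math

def theta_mle(responses, b, a=None, start=-1.0, tol=1e-3, max_iter=50):
    """Newton-Raphson ML estimate of theta for one examinee under the 2-PL
    (Rasch when all a_j = 1), given known item parameters.
    responses: 0/1 item scores; b: difficulties; a: discriminations."""
    if a is None:
        a = [1.0] * len(b)
    theta = start
    for _ in range(max_iter):
        d1 = d2 = 0.0  # first and second derivatives of lnL w.r.t. theta
        for x, bj, aj in zip(responses, b, a):
            p = 1.0 / (1.0 + math.exp(-aj * (theta - bj)))
            d1 += aj * (x - p)
            d2 -= aj**2 * p * (1 - p)
        step = d1 / d2
        theta -= step  # theta(new) = theta(old) - d1/d2
        if abs(step) < tol:  # stop when the change is arbitrarily small
            break
    return theta

# Hypothetical data: 3 of 5 items correct, Rasch difficulties symmetric about 0
theta_hat = theta_mle([1, 1, 0, 1, 0], [-1.0, -0.5, 0.0, 0.5, 1.0])
```

Starting from -1.0, the sequence climbs the log-likelihood hill and settles where its first derivative is 0.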

Newton-Raphson Estimation of Multiple Person

Parameters

- But we have N examinees, each with a θi to be estimated
- We need a multivariate version of the Newton-Raphson algorithm

First Order Derivatives

- First order derivatives of the log-likelihood
- ∂lnL/∂θi only involves terms corresponding to subject i

Why???

Second Order Derivatives

- Hessian: second order partial derivatives of the log-likelihood
- This matrix needs to be inverted
- In the current context, this matrix is diagonal
Why???

Second Order Derivatives

- The inverse of the Hessian is diagonal, with elements that are the reciprocals of the diagonal of the Hessian
- Updates for each θi do not depend on any other subject's θ

Second Order Derivatives

- The updates for each θi are independent of one another
- The procedure can be performed one examinee at a time

ML Estimation of Person Parameters When Item Parameters Are Known: Standard Errors

- The approximate, asymptotic standard error of the ML estimate of θi is SE(θi) = 1/√I(θi)
- where I(θi) is the information function
- Standard errors are
- asymptotic with respect to the number of items
- approximate because only an estimate of θi is employed
- asymptotically approximately unbiased
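Under the 2-PL, the test information is I(θ) = Σj aj² Pj (1 - Pj), which yields the standard error above; a small sketch with invented item parameters:

```python
import math

def theta_information(theta, b, a=None):
    """Test information I(theta) = sum_j a_j^2 * P_j * (1 - P_j) under the
    2-PL (Rasch when all a_j = 1)."""
    if a is None:
        a = [1.0] * len(b)
    info = 0.0
    for bj, aj in zip(b, a):
        p = 1.0 / (1.0 + math.exp(-aj * (theta - bj)))
        info += aj**2 * p * (1 - p)
    return info

# Approximate SE of the ML estimate at theta = 0 for three hypothetical items
se = 1.0 / math.sqrt(theta_information(0.0, [-1.0, 0.0, 1.0]))
```

With only three items the information is small and the standard error is large, illustrating why SEs are asymptotic with respect to the number of items.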

ML Estimation of Person Parameters When Item

Parameters Are Known Strengths

- ML estimates have some desirable qualities
- They are consistent
- If a sufficient statistic exists, then the MLE is a function of that statistic (Rasch models)
- Asymptotically normally distributed
- Asymptotically most efficient (least variable) estimator among the class of normally distributed unbiased estimators
- Asymptotically with respect to what?

ML Estimation of Person Parameters When Item

Parameters Are Known Weaknesses

- ML estimates have some undesirable qualities
- Estimates may fly off into outer space
- They do not exist for so-called perfect scores (all 1s or 0s)
- Can be difficult to compute or verify when the likelihood function is not single peaked (may occur with the 3-PLM or more complex IRT models)

ML Estimation of Person Parameters When Item

Parameters Are Known Weaknesses

- Strategies to handle wayward solutions
- Bound the amount of change at any one iteration
- Atheoretical
- No longer common
- Use an alternative estimation framework (Fisher, Bayesian)
- Strategies to handle perfect scores
- Do not estimate θi
- Use an alternative estimation framework (Bayesian)
- Strategies to handle local maxima
- Re-estimate the parameters using different starting points and look for agreement

ML Estimation of Person Parameters When Item

Parameters Are Known Weaknesses

- An alternative to the Newton-Raphson technique is Fisher's method of scoring
- Instead of the Hessian, it uses the information matrix (based on the Hessian)
- This usually leads to quicker convergence
- Often is more stable than Newton-Raphson
- But what about those perfect scores?

Bayes' Theorem

- We can avoid some of the problems that occur in ML estimation by employing a Bayesian approach
- All entities are treated as random variables
- Bayes' Theorem for random variables A and B: P(A | B) = P(B | A) P(A) / P(B)
- P(A | B): the posterior distribution of A given B; the probability of A, given B
- P(B | A): the conditional probability of B, given A
- P(A): the prior probability of A
- P(B): the marginal probability of B

Bayes' Theorem

- If A is discrete: P(B) = Σa P(B | A = a) P(A = a)
- If A is continuous: P(B) = ∫ P(B | A) P(A) dA
- Note that P(B | A) = L(A | B)

Bayesian Estimation of Person Parameters: The Posterior

- Select a prior distribution for θi, denoted P(θi)
- Recall the likelihood function takes on the form P(Xi | θi)
- The posterior density of θi given Xi is P(θi | Xi) = P(Xi | θi) P(θi) / P(Xi)
- Since P(Xi) is a constant, P(θi | Xi) ∝ P(Xi | θi) P(θi)

Bayesian Estimation of Person Parameters: The Posterior

- [Figure: the likelihood, the prior, and the resulting posterior]

Maximum A Posteriori Estimation of Person Parameters

- The Maximum A Posteriori (MAP) estimate is the maximum of the posterior density of θi
- Computed by maximizing the posterior density, or its log
- Find θi such that ∂lnP(θi | Xi)/∂θi = 0
- Use Newton-Raphson or Fisher scoring
- The max of lnP(θi | Xi) occurs at the max of lnP(Xi | θi) + lnP(θi)
- This can be thought of as augmenting the likelihood with prior information
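Augmenting the likelihood this way changes the Newton-Raphson sketch only slightly; the version below assumes a Rasch model and a standard normal prior (both illustrative choices), adding the prior's derivatives to those of the log-likelihood:

```python
import math

def theta_map(responses, b, start=0.0, tol=1e-3, max_iter=50):
    """Newton-Raphson MAP estimate of theta under the Rasch model with a
    N(0, 1) prior: maximizes ln P(X | theta) + ln P(theta)."""
    theta = start
    for _ in range(max_iter):
        d1 = d2 = 0.0  # derivatives of the log-likelihood
        for x, bj in zip(responses, b):
            p = 1.0 / (1.0 + math.exp(-(theta - bj)))
            d1 += x - p
            d2 -= p * (1 - p)
        # ...augmented by the derivatives of the log of the N(0, 1) prior
        d1 -= theta
        d2 -= 1.0
        step = d1 / d2
        theta -= step
        if abs(step) < tol:
            break
    return theta

# The MAP exists even for a perfect score, where the ML estimate diverges
map_perfect = theta_map([1, 1, 1], [-1.0, 0.0, 1.0])
```

The -θ and -1 terms come from differentiating ln P(θ) for the standard normal; they pull the estimate toward the prior mean and keep the second derivative strictly negative.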

Choice of Prior Distribution

- Choosing P(θi) = U(-∞, ∞) yields a posterior proportional to the likelihood
- In this case, the MAP is very similar to the ML estimate
- The prior distribution P(θi) is often assumed to be N(0, 1)
- The normal distribution is commonly justified by appeal to the CLT
- The choice of mean and variance identifies the scale of the latent continuum

MAP Estimation of Person Parameters Features

- The approximate, asymptotic standard error of the MAP is SE(θi) = 1/√I(θi)
- where I(θi) is the information from the posterior density
- Advantages of the MAP estimator
- Exists for every response pattern (why?)
- Generally leads to a reduced tendency for local extrema
- Disadvantages of the MAP estimator
- Must specify a prior
- Exhibits shrinkage in that it is biased towards the mean; may need lots of items to swamp the prior if it's misspecified
- Calculations are iterative and may take a long time
- May result in local extrema

Expected A Posteriori (EAP) Estimation of Person Parameters

- The Expected A Posteriori (EAP) estimator is the mean of the posterior distribution
- Exact computations are often intractable
- We approximate the integral using numerical techniques
- Essentially, we take a weighted average of the values, where the weights are determined by the posterior distribution
- Recall that the posterior distribution is itself determined by the prior and the likelihood

Numerical Integration Via Quadrature

- The posterior distribution
- With quadrature points
- Evaluate the heights of the distribution at each point
- Use the relative heights as the weights
- Example from the figure: if the heights sum to ≈ .165, a point with height .021 gets weight .021/.165 ≈ .127

EAP Estimation of θ via Quadrature

- The Expected A Posteriori (EAP) estimate is a weighted average of the quadrature points
- where H(Qr) is the weight of point Qr in the posterior (compare Embretson & Reise, 2000, p. 177)
- The standard error is the standard deviation in the posterior and may also be approximated via quadrature
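The weighted-average computation can be sketched as follows, assuming a Rasch model and an N(0, 1) prior evaluated on an evenly spaced grid (all illustrative choices):

```python
import math

def theta_eap(responses, b, n_points=61, lo=-4.0, hi=4.0):
    """EAP estimate of theta under the Rasch model with a N(0, 1) prior,
    approximating the posterior mean and SD by quadrature."""
    points = [lo + (hi - lo) * r / (n_points - 1) for r in range(n_points)]
    heights = []
    for q in points:
        h = math.exp(-0.5 * q * q)  # prior height (up to a constant)
        for x, bj in zip(responses, b):
            p = 1.0 / (1.0 + math.exp(-(q - bj)))
            h *= p if x == 1 else (1 - p)  # times the likelihood
        heights.append(h)
    total = sum(heights)  # normalizing: relative heights become the weights
    eap = sum(q * h for q, h in zip(points, heights)) / total
    sd = math.sqrt(sum((q - eap) ** 2 * h
                       for q, h in zip(points, heights)) / total)
    return eap, sd

# Hypothetical data: 2 of 3 items correct
eap, se = theta_eap([1, 1, 0], [-1.0, 0.0, 1.0])
```

No iteration is involved: the grid is evaluated once, which is why the EAP is non-iterative and exists for every response pattern.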

EAP Estimation of θ via Quadrature

- Advantages
- Exists for all possible response patterns
- Non-iterative solution strategy
- Not a maximum, therefore no local extrema
- Has smallest MSE in the population
- Disadvantages
- Must specify a prior
- Exhibits shrinkage to the prior mean; if the prior is misspecified, may need lots of items to swamp the prior

ML Estimation of Item Parameters When Person Parameters Are Known: Assumptions

- Assume
- person parameters θi are known
- respondent and local independence
- Choose values for item parameters that maximize lnL

Newton-Raphson Estimation

- What is the structure of this matrix?

ML Estimation of Item Parameters When Person Parameters Are Known

- Just as we could estimate subjects one at a time thanks to respondent independence, we can estimate items one at a time thanks to local independence
- Multivariate Newton-Raphson
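For the Rasch case (a single difficulty per item), the one-item-at-a-time update mirrors the person-parameter sketch, with the roles of θ and b reversed; the person parameters and responses below are invented:

```python
import math

def item_difficulty_mle(responses, thetas, start=0.0, tol=1e-3, max_iter=50):
    """Newton-Raphson ML estimate of a single Rasch difficulty b_j, given
    known person parameters: same machinery, one item at a time."""
    b = start
    for _ in range(max_iter):
        d1 = d2 = 0.0
        for x, t in zip(responses, thetas):
            p = 1.0 / (1.0 + math.exp(-(t - b)))
            d1 += p - x          # dlnL/db = sum_i (P_ij - X_ij)
            d2 -= p * (1 - p)    # d2lnL/db2 = -sum_i P_ij (1 - P_ij)
        step = d1 / d2
        b -= step
        if abs(step) < tol:
            break
    return b

# One item answered by five examinees with known (hypothetical) thetas
b_hat = item_difficulty_mle([0, 0, 1, 1, 1], [-2.0, -1.0, 0.0, 1.0, 2.0])
```

Models with aj or cj parameters need the full multivariate update, since several parameters per item must be solved jointly.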

ML Estimation of Item Parameters When Person Parameters Are Known: Standard Errors

- To obtain the approximate, asymptotic standard errors
- Invert the associated information matrix, which yields the variance-covariance matrix
- Take the square root of the elements of the diagonal
- Asymptotic w.r.t. sample size and approximate because we only have estimates of the parameters
- This is conceptually similar to those for the estimation of θ
- But why do we need a matrix approach?

ML Estimation of Item Parameters When Person Parameters Are Known: Standard Errors

- ML estimates of item parameters have the same properties as those for person parameters: consistent, efficient, asymptotic (w.r.t. subjects)
- aj parameters can be difficult to estimate and tend to get inflated with small sample sizes
- cj parameters are often difficult to estimate well
- Usually because there's not a lot of information in the data about the asymptote
- Especially true when items are easy
- Generally need larger and more heterogeneous samples to estimate the 2-PL and 3-PL
- Can employ Bayesian estimation (more on this later)