Computerized Adaptive Testing

About This Presentation

Title:

Computerized Adaptive Testing

Description:

Reducing the duration and cost of assessment with the GAIN: Computer Adaptive Testing Evidence-Based Practice Requires accurate diagnosis, treatment placement, and ... – PowerPoint PPT presentation

Slides: 33

Provided by: dhd82

Learn more at: https://www.chestnut.org

Transcript and Presenter's Notes

1
Reducing the duration and cost of assessment with
the GAIN: Computer Adaptive Testing
2
Evidence-Based Practice

Requires accurate diagnosis, treatment placement,
and outcomes monitoring
Assessment over a wide range of domains
The cost of evidence-based assessment is
Time
Respondent Burden
Increased staff resources (including training)

3
Improving Efficiency

The use of screeners and short-form instruments
has significantly improved the efficiency of the
assessment process
Can help determine whether a full assessment is
warranted
But not a substitute for a full assessment
Lack of precision
Floor and ceiling effects
Limited content validity

4
Computerized Adaptive Testing

Selects items from a large bank of items based on
the responses made to previous items.
Continues to select and administer items until
sufficient measurement precision is obtained.
Combines the precision and comprehensiveness of a
full assessment with the efficiency of a screener.
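
The select-administer-stop loop described above can be sketched as follows. This is a minimal illustration under assumptions that the slides do not state: a dichotomous Rasch model, maximum-likelihood scoring, a fixed step of 1 logit while all responses so far are identical, and a single standard-error target. The names `p_endorse`, `estimate`, and `run_cat` are hypothetical.

```python
import math

def p_endorse(theta, b):
    """Rasch probability of endorsing an item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate(responses, theta=0.0, iters=20):
    """Maximum-likelihood measure and standard error.

    responses is a list of (difficulty, 0/1) pairs.
    """
    score = sum(x for _, x in responses)
    if score == 0 or score == len(responses):
        # All responses identical: no finite ML estimate yet,
        # so take a fixed step up or down (an assumed rule).
        return theta + (1.0 if score else -1.0), float("inf")
    for _ in range(iters):
        ps = [p_endorse(theta, b) for b, _ in responses]
        info = sum(p * (1 - p) for p in ps)
        step = (score - sum(ps)) / info
        theta += max(-1.0, min(1.0, step))  # damped Newton-Raphson step
    info = sum(p_endorse(theta, b) * (1 - p_endorse(theta, b))
               for b, _ in responses)
    return theta, 1.0 / math.sqrt(info)

def run_cat(bank, answer, se_target=0.5, max_items=None):
    """Administer items until SE <= se_target or items run out.

    bank maps item id -> difficulty; answer(item_id) returns 0 or 1.
    """
    remaining = dict(bank)
    responses, theta, se = [], 0.0, float("inf")
    max_items = max_items or len(bank)
    while remaining and len(responses) < max_items and se > se_target:
        # Under the Rasch model an item is most informative when its
        # difficulty is closest to the current measure.
        item = min(remaining, key=lambda i: abs(remaining[i] - theta))
        b = remaining.pop(item)
        responses.append((b, answer(item)))
        theta, se = estimate(responses, theta)
    return theta, se, len(responses)
```

For example, `run_cat(bank, answer, se_target=0.6)` returns the final measure, its standard error, and the number of items administered, which is how the efficiency gain over the full item bank would be counted.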

5
CAT Process
Typical Pattern of Responses

Score is calculated and the next best item is
selected based on item difficulty

[Figure labels: Increased Difficulty, Middle Difficulty, Decreased Difficulty; Correct, Incorrect; +/- 1 Std. Error]
6
CAT in Clinical Assessment
7
CAT in Clinical Assessment: Issues

Triage of individuals to support clinical
decision making

Measurement of multiple clinical dimensions and
subdimensions

Persons with atypical presentation of symptoms

Generalizability of assessment to various groups

8
Clinical Decision Making

How severe are the symptoms?
What type of treatment is most appropriate?
Can CAT be used to answer these questions more
efficiently?

9
Strategy

Use CAT to place persons into low, moderate and
high levels of substance abuse and dependency.
Starting Rules
Using screener measures to set the initial
measure and select the first item
Variable Stop Rules
Tight precision around cut points
Less precision away from cut points
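
One way to obtain tight precision near cut points and looser precision elsewhere is to stop as soon as the confidence interval around the current measure no longer contains a cut point. The sketch below assumes that rule; the slides do not say which variable stop rule was used, and the function names, the 1.96 multiplier, and the standard-error floor are illustrative choices.

```python
def classify(theta, cut_points, labels=("low", "moderate", "high")):
    """Place a measure into a severity group given sorted cut points."""
    return labels[sum(theta >= c for c in cut_points)]

def should_stop(theta, se, cut_points, z=1.96, se_floor=0.3):
    """Variable stop rule (assumed form).

    Stop when the interval theta +/- z*se contains no cut point, so the
    classification is already clear, or when se has reached se_floor,
    the tightest precision required next to a cut point.
    """
    lo, hi = theta - z * se, theta + z * se
    straddles_cut = any(lo < c < hi for c in cut_points)
    return (not straddles_cut) or se <= se_floor
```

With cut points at -1 and 1 logits, a person measured at 3.0 with a standard error of 0.8 is already clearly in the high group and testing stops, while a person at 1.1 keeps receiving items until the standard error reaches the floor.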

10
CAT Standard Error
11
Results

CAT to full-measure correlations ranged from .87
to .99
Agreement between CAT-based and full-measure
classification of persons into treatment groups
(kappa coefficients) ranged from .66 to .71.
Screener starting rule improved CAT efficiency by
7 percent
Variable stop rules improved efficiency by 15-38
percent
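
For readers unfamiliar with the kappa coefficient reported above, the following sketch computes unweighted Cohen's kappa for two sets of group assignments. Whether the original analysis used the unweighted form is an assumption, and `cohen_kappa` is a hypothetical helper name.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length label sequences.

    Raises ZeroDivisionError if chance agreement is exactly 1
    (both raters use one identical label throughout).
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Agreement expected by chance from the two marginal distributions.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

Here `a` would hold the treatment group assigned from the CAT measure and `b` the group assigned from the full measure, one entry per person.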

12
Measuring Multiple Dimensions
13
Assessment on Multiple Dimensions

Instruments often measure multiple domains
In CAT, treating a multi-domain measure as
measuring one domain is problematic
Some subdimensions may not be adequately measured

14
Strategy: Content Balancing

Set an item quota for each subscale
Maximum number of subscale items to administer
during the CAT
An item is selected if
Its subscale quota has not been met
It provides maximum information
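
The quota rule above can be sketched as a filter followed by a maximum-information pick. The sketch assumes Rasch item information and a bank stored as item id mapped to a (subscale, difficulty) pair; `rasch_info` and `next_item` are hypothetical names, not part of any GAIN software.

```python
import math

def rasch_info(theta, b):
    """Rasch item information p * (1 - p) at measure theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def next_item(bank, administered, counts, quotas, theta):
    """Pick the most informative item whose subscale quota is not met.

    bank: item id -> (subscale, difficulty)
    administered: set of item ids already given
    counts: subscale -> number of items given so far
    quotas: subscale -> maximum number of items to give
    Returns None when every quota is met or no items remain.
    """
    eligible = [
        item for item, (subscale, _) in bank.items()
        if item not in administered
        and counts.get(subscale, 0) < quotas[subscale]
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda item: rasch_info(theta, bank[item][1]))
```

The caller is expected to add the returned item to `administered` and increment the count for its subscale before asking for the next item.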

15
Content Balancing Procedures
Method     Screener   Content Balanced
None       No         No
Screener   Yes        No
Mixed      Yes        Yes
Full       No         Yes
16
Percentage of Items Administered by Subscale
IMDS Scale            N Items   None   Screener   Mixed   Full
Depression            1           99        100     100    100
Depression            3           79         77     100    100
Homicidal/Suicidal    1           21        100     100    100
Homicidal/Suicidal    3            8          8     100    100
Anxiety               1          100        100     100    100
Anxiety               3          100        100     100    100
Trauma                1          100        100     100    100
Trauma                3          100        100     100    100
17
Content Balancing: CAT to Full IMDS Correlations
IMDS Scales           None   Screener   Mixed   Full
IMDS                  0.98       0.98    0.98   0.97
Depression            0.96       0.94    0.96   0.96
Homicidal/Suicidal    0.60       0.83    0.96   0.95
Anxiety               0.96       0.95    0.96   0.96
Trauma                0.97       0.97    0.97   0.97
Average r             0.89       0.93    0.97   0.96
18
Identifying Persons with Atypical Presentation of
Symptoms
19
Overview

Implications: Clients sometimes endorse severe
clinical symptoms that are not reflected by
overall scores on standard assessments.
Statistics that can detect atypical presentation
of symptoms have important clinical implications.
Strategy: Identify fit statistics sensitive to
atypical presentation in a CAT context.

20
Rasch Fit Statistics

Fit statistics are used to test particular
hypotheses.
Atypicalness: Used to detect unexpected outlying,
off-target responses. Outlier sensitive.
Example: A person with a high level on the
measured trait misses an easy item.
Randomness: Used to detect unexpected inlying,
targeted responses.
Both infit and outfit are chi-square statistics.
An infit or outfit value of 1.0 indicates perfect
fit to the Rasch model.
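
The two statistics can be computed for one person as in the sketch below, which uses the standard Rasch mean-square definitions for dichotomous items. Reading the outlier-sensitive statistic (Atypicalness) as outfit and the inlier-sensitive one (Randomness) as infit is the usual Rasch interpretation but is an assumption here, and `fit_statistics` is a hypothetical name.

```python
import math

def fit_statistics(theta, responses):
    """Return (infit, outfit) mean squares for one person.

    responses is a list of (difficulty, 0/1) pairs.
    outfit: unweighted mean of squared standardized residuals,
            so far-off-target surprises dominate.
    infit:  squared residuals summed and divided by summed variance,
            so well-targeted items carry the most weight.
    """
    sq_resid, variances = [], []
    for b, x in responses:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        sq_resid.append((x - p) ** 2)
        variances.append(p * (1.0 - p))
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(responses)
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit
```

Applied to the example on this slide, a person with a high measure who misses one very easy item gets a large outfit value while infit rises far less, because the surprising response sits on an off-target item.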

21
Problems with Fit
Responses by Severity (Low to High)   Randomness   Atypicalness
111 11111100000 0000                         0.3            0.5
111 10101100010 0000                         0.6            1.0
111 11101010000 0000                         1.0            1.0
111 00001110000 0000                         0.9            1.3
011 11111110000 0000                         3.8            1.0
111 11111100000 0001                         3.8            1.0
101 01010101010 1010                         4.0            2.3
000 00000000011 1111                        12.6            4.3
22
Clinical Implications of Misfit

Our analyses indicate that there are subgroups
who endorse severe symptoms without endorsement
of milder symptoms.
Examples
Atypical suicide
Substance use withdrawal without dependence

23
Atypicalness by Number of Items
                   Atypicalness Categories
Number of Items    Uber Typical   Typical   Atypical
16                         30.2      48.1       21.7
12                         34.3      51.1       14.6
8                          38.4      53.2        8.4
4                          58.2      40.0        1.8
24
Content Balancing and Atypicalness
Atypicalness Category   None   Screener   Mixed   Full   Full IMDS
Proto Typical           26.7       34.6    48.3   50.5        49.2
Typical                 69.0       58.7    40.8   38.9        38.4
Atypical                 4.3        6.5    10.9   10.6        12.4
Kappa                    .27        .32     .48    .50          --
25
Future Research

Identify alternative fit statistics that are more
sensitive to atypical presentation of symptoms
Determine when it is likely that someone may
present with atypical symptoms, and if so, select
items to confirm atypicalness.

26
Generalizability of CAT to Various Groups
27
Overview

Persons at the same severity level may differ in
their endorsement of specific items.
This is called differential item functioning
(DIF)
On the GAIN, DIF has been detected by
Age (adolescent vs. adult)
Gender
Ethnicity/Race
Drug of choice
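
As an illustration of what a DIF check looks like, the sketch below computes a Mantel-Haenszel common odds ratio for one item, comparing two groups within strata of matched severity. The slides do not say which DIF method was applied to the GAIN, so Mantel-Haenszel is a stand-in here, and the function name and record layout are assumptions.

```python
from collections import defaultdict

def mantel_haenszel_odds_ratio(records, reference="ref"):
    """Common odds ratio of endorsing an item, reference vs. focal group.

    records: iterable of (group, stratum, response) with response 0/1,
    where stratum is a matched severity level (for example a score band).
    A value near 1.0 suggests no DIF on this item.
    Returns NaN when the denominator is zero.
    """
    # Per stratum: [ref yes, ref no, focal yes, focal no]
    cells = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, x in records:
        if group == reference:
            cells[stratum][0 if x else 1] += 1
        else:
            cells[stratum][2 if x else 3] += 1
    num = den = 0.0
    for a, b, c, d in cells.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den if den else float("nan")
```

In use, `group` would be one of the factors listed above (for example adolescent vs. adult), and the check would be repeated for every item and factor, which is where the bookkeeping grows quickly.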

28
DIF By GAIN Scale
Scale                      Total   Age   Gender   Race   Prim. Drug
Internal Mental Distress      43    13        5     10           26
Crime Violence                31    11       14     22           27
Behavioral Complexity         33    12       12     17           22
Substance Problems            16     8        5      9           16
29
DIF and CAT

The presence of DIF can limit our ability to
generalize measurement findings across different
groups.
Controlling for DIF becomes complicated as the
number of DIF items and groups/factors increases.
Currently exploring a number of methods for
controlling DIF in CAT.

30
Potential of CAT in Clinical Practice

Reduce respondent burden
Reduce staff resources
Reduce data fragmentation
Streamline complex assessment procedures
Assist in clinical decision making
Identify persons with atypical profiles
Improve measurement generalizability

31
Future Research

How do we put it all together?
Much of the research in the area of CAT has used
computer simulation. There is a need to test
working CAT systems in clinical practice.
