How to Assess and Measure Competency

Transcript and Presenter's Notes

1
How to Assess andMeasure Competency
  • Robert C. Shaw, Jr., PhD
  • Program Director

2
Presentation Outline
  • Describe a program's responsibilities
  • Assess appropriate content
  • Measure abilities as precisely as possible
  • Reference each cut score to a criterion

3
The validity claim
  • Our program is confident we can make valid
    inferences from an assessment because
  • we carefully selected and structured the content
  • and
  • observed scores are reasonably precise
  • Weakness in either claim diminishes the validity
    argument

4
Define appropriate content
  • What should we assess?

5
Information sources for content
Certification Boards' Expectations
6
What should we assess?
  • A program should seek multiple opinions about
    program content
  • May mean more than one faculty member in the
    program
  • Could extend to survey results from several
    stakeholders
  • Those who hire your graduates
  • Those who graduated

7
Describe potential content
  • Define potential content by describing job
    behaviors or tasks
  • Interpret ABG results
  • Determine the appropriate time to refer a patient
    for consultation from another service
  • Adjust mechanical ventilation settings to
    optimize oxygenation for a patient while
    minimizing the risk of pulmonary injury

8
Define terminal behaviors
  • Focus terminal assessments on end-product
    behavior you expect students to master
  • Insert a pulmonary artery catheter in a patient
    within a critical care setting using standard
    technique while minimizing risks of infection and
    lung involvement
  • Integrate pulmonary function testing results with
    patient history and other laboratory results to
    produce a diagnosis

9
Measure task criticality
  • Typically expressed by the interaction of a
  • importance/significance/risk measure
  • and a
  • frequency/extent measure

10
Potential survey measurements
  • How important is the task to success?
  • OR
  • How significant is the task to safe and effective
    practice?
  • 4 = Extremely
  • 3 = Very
  • 2 = Moderately
  • 1 = Minimally

11
Potential survey measurements
  • If this task is incorrectly performed, how strong
    is the risk?
  • 3 = Potentially fatal
  • 2 = Likely to increase morbidity
  • 1 = Unlikely to have an adverse effect
  • OR
  • 3 = High
  • 2 = Moderate
  • 1 = Low

12
Potential survey measurements
  • How frequently do you perform the task?
  • 3 = Every week
  • 2 = A few times each year
  • 1 = Less than once a year
  • OR
  • 3 = Very often
  • 2 = Occasionally
  • 1 = Infrequently

13
Potential survey measurements
  • Have you performed the task in the last year?
  • 1 = Yes
  • 0 = No

14
What can we do with task measurements?
  • Norm-referenced approach
  • Rank order tasks from most to least critical
  • Start at the top and work down using available
    time
  • Criterion-referenced approach
  • Identify tasks that are sufficiently critical to
    ensure program coverage and competency assessment
    (see the sketch below)
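To make both approaches concrete, here is a minimal sketch; the example tasks, rating values, threshold, and the criticality-as-product convention are illustrative assumptions, not from the slides:

```python
# Hypothetical mean survey ratings per task (importance 1-4, frequency 1-3)
tasks = {
    "Interpret ABG results":          {"importance": 3.6, "frequency": 2.8},
    "Adjust ventilation settings":    {"importance": 3.9, "frequency": 2.5},
    "Refer patient for consultation": {"importance": 2.7, "frequency": 1.4},
}

# One common convention: criticality = importance x frequency
criticality = {name: r["importance"] * r["frequency"] for name, r in tasks.items()}

# Norm-referenced: rank from most to least critical, cover from the top down
ranked = sorted(criticality, key=criticality.get, reverse=True)

# Criterion-referenced: keep every task at or above a chosen criticality cut
THRESHOLD = 6.0  # illustrative value, set by faculty judgment
covered = [t for t in ranked if criticality[t] >= THRESHOLD]

print(ranked)   # all tasks, most critical first
print(covered)  # tasks the program commits to cover and assess
```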

15
Select item type(s) for each assessment
  • Constructed response (e.g., short answer, essay,
    performance)
  • Short development time
  • Long scoring time
  • Scores have strong subjective characteristics
  • Selected response (e.g., true/false, matching,
    multiple-choice)
  • Long development time
  • Short scoring time
  • Scores have strong objective characteristics

16
High-stakes terminal assessments should be
standardized
  • Specify how the assessment should look before
    writing/selecting items
  • Test specifications ensure each assessment is
    similar, fair, and covers critical content

17
Test specifications are typically two-dimensional
18
Entire test blueprint/matrix
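The original slide showed the blueprint as an image. A hypothetical two-dimensional blueprint (content domains by cognitive process level; all counts illustrative) might look like:

                            Recall   Application   Analysis   Total
  Airway management             4         6            2        12
  Mechanical ventilation        5         8            5        18
  Blood gas interpretation      3         6            7        16
  Total                        12        20           14        46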
19
Test specifications and items
  • Each item should be linked to a task and a
    cognitive process level
  • It helps to store items in a database
  • A sophisticated database will permit additional
    layers of classification
  • Acute/chronic care
  • Age groups

20
Item banking software
  • FastTest
  • www.assess.com/frmSoftCat.htm
  • ExamView
  • www.pearsonncs.com/examview/examview.htm
  • LXRTest
  • www.lxrtest.com/

21
Measure abilities precisely
  • Are we confident an assessment has yielded a
    sufficiently precise ability estimate?

22
Reliability
  • Theoretical premise
  • Observed scores are assumed to express true
    ability plus some measurement error
  • High reliability implies low measurement error
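In classical test theory notation (a standard formulation consistent with, though not spelled out on, the slide):

```latex
X = T + E, \qquad
\rho_{XX'} \;=\; \frac{\sigma_T^2}{\sigma_X^2} \;=\; 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

so high reliability means the error variance is small relative to observed-score variance.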

23
Reliability
  • Reliability indices are R² values, which express
    the percentage of observed-score variance that
    can be attributed to true-score variance
  • How high is high enough?
  • A test-score reliability of at least .85 is
    characteristic of large-scale, standardized
    assessments; many exceed .90
  • Sufficiently reliable test scores from a test
    built by a program should show values of at least
    .60

24
Reliability
  • Reliability is an attribute of a set of test
    scores; it is not an attribute of a test
  • Therefore, a program should assess reliability
    for each group
  • KR-20 is appropriate for dichotomously scored
    (0, 1) items
  • Coefficient alpha works for polytomously scored
    (0, 1, …, n) items (a sketch of both follows)
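A minimal sketch of both coefficients, assuming a students-by-items score matrix (NumPy is an assumption; any matrix library works):

```python
import numpy as np

def coefficient_alpha(scores):
    """Cronbach's coefficient alpha for an (n_students, n_items) score matrix.
    For dichotomously scored (0, 1) items this reduces to KR-20."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Example: five students, four dichotomous items
scores = np.array([[1, 1, 1, 0],
                   [1, 0, 1, 1],
                   [0, 0, 1, 0],
                   [1, 1, 1, 1],
                   [0, 0, 0, 0]])
print(round(coefficient_alpha(scores), 2))  # ≈ 0.79
```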

25
Why are selected response items used for so many
assessments?
  • Assuming the time to assess is constant, more
    responses can be elicited from students using
    selected response items
  • more items
  • broader content coverage
  • increased information
  • enhanced measurement precision
  • stronger validity
  • Scores are more strongly objective

26
Add items or options?
  • A program cannot go wrong by adding more items to
    an assessment
  • A program may only consume space and time by
    adding more options to multiple-choice items
  • There is growing evidence that items with three
    options are optimal, particularly when doing so
    permits inclusion of more items on an assessment
  • Dr. Thomas Haladyna, Arizona State University

27
Up to a point, measurement precision and item
quantity are directly related
[Figure: reliability (y-axis) plotted against item count (x-axis), with separate curves for higher-quality and lower-quality items; reliability rises with item count up to a plateau, and higher-quality items yield higher reliability at any given count.]
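The "up to a point" relationship in the figure is conventionally quantified by the Spearman–Brown prophecy formula (standard psychometrics, not named on the slide): lengthening a test by a factor k from reliability ρ gives

```latex
\rho_k = \frac{k\,\rho}{1 + (k - 1)\,\rho}
```

which rises with k but with diminishing returns, and rises from a higher starting point when the items are of higher quality.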
28
What encourages high item quality?
  • Write well
  • Clear, concise, accurate
  • Remove unnecessary information from the stimulus
  • Present nuanced choices that require a
    sophisticated mastery of material to correctly
    respond
  • Item review is another opportunity to seek
    multiple opinions

29
What encourages high item quality?
  • Avoid formats known to be flawed
  • D. All of the above
  • D. None of the above
  • Negative wording
  • All of the following are true EXCEPT
  • Which of the following is not true?

30
What encourages high item quality?
  • Apply quality improvement principles
  • Analyze item performance
  • Retain items that contribute to test score
    reliability
  • Change or discard items that fail to contribute
    or negatively affect reliability

31
Item analysis properties
  • Difficulty
  • p = proportion of students who responded
    correctly
  • Discrimination
  • r_pb = point-biserial correlation between item
    success and students' total test scores
    (both are sketched below)
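A minimal sketch of both statistics for a 0/1 score matrix (NumPy assumed; the rest-score correction noted in the comment is a common refinement, not from the slide):

```python
import numpy as np

def item_analysis(scores):
    """Difficulty p and point-biserial discrimination r_pb for each item."""
    scores = np.asarray(scores, dtype=float)
    totals = scores.sum(axis=1)
    p = scores.mean(axis=0)  # proportion correct per item
    # Correlate each item with the total score; using (totals - item column)
    # instead would give the 'corrected' rest-score variant.
    r_pb = np.array([np.corrcoef(scores[:, j], totals)[0, 1]
                     for j in range(scores.shape[1])])
    return p, r_pb
```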

32
Item difficulty
[Figure: contribution to test-score reliability (y-axis) as a function of item difficulty p (x-axis, 0.0 to 1.0); the contribution peaks when p falls between roughly 0.4 and 0.6 and drops toward zero at the extremes.]
33
Item discrimination
  • Because r_pb values are correlations, values
    reflect one of three possibilities relative to
    reliability
  • Positive contribution
  • No contribution
  • Negative contribution

34
Using item parameters diagnostically
  • Relative to reliability contribution, item
  • p values provide magnitude information
  • r_pb values provide magnitude and direction (+ or
    −) information

35
Using item parameters diagnostically
  • Difficulty and discrimination properties equally
    contribute to reliability
  • The best items show .30 < p < .70 AND r_pb > .20
    (a flagging sketch follows)
  • The worst items exist at the difficulty extremes
    and show zero or negative discrimination
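A one-function sketch of that diagnostic rule, with the thresholds taken straight from the slide (NumPy assumed):

```python
import numpy as np

def flag_items(p, r_pb):
    """Return indices of items outside .30 < p < .70 or with r_pb <= .20."""
    keep = (np.asarray(p) > 0.30) & (np.asarray(p) < 0.70) & (np.asarray(r_pb) > 0.20)
    return np.where(~keep)[0]  # indices of items to review or discard
```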

36
After diagnosing an item that shows a weak or
negative reliability contribution
  • What should we do?
  • Observe option response frequencies and the mean
    total scores of students choosing each option
    (a sketch follows this list)
  • Identify incorrect responses that attracted
    students with test scores equal to or greater
    than the average
  • Replace the offending option with a less
    attractive response
  • Rewrite the stem to clarify ambiguities
  • OR
  • Discard the whole item and use a better one the
    next time
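A minimal sketch of that option-level diagnosis (the function name and data layout are illustrative assumptions):

```python
import numpy as np

def distractor_report(choices, key, totals):
    """For one item: how often each option was chosen, the mean total test
    score of the students choosing it, and whether it is the keyed answer.
    choices: selected options per student (e.g., 'A'..'D'); key: correct
    option; totals: each student's total test score."""
    choices, totals = np.asarray(choices), np.asarray(totals)
    report = {}
    for opt in np.unique(choices):
        mask = choices == opt
        report[opt] = {"n": int(mask.sum()),
                       "mean_total": float(totals[mask].mean()),
                       "keyed": opt == key}
    return report

# An incorrect option whose mean_total is at or above the overall mean is
# the 'offending option' the slide says to replace.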

37
Item analysis software
  • Iteman
  • www.assess.com/Software/iteman.htm
  • examSystem II
  • www.pearsonncs.com/examsystem/index.htm
  • LXRTest
  • www.lxrtest.com/
  • True Score II
  • www.nine-patch.com/TSCDL.htm
  • Excel Templates (free)
  • www.eflclub.com/elvin/publications/2003/itemanalysis.html

38
Internal resources may be available
  • There is a good probability a large university
    with education, psychology, and/or statistics
    departments will have a system available for
    scoring items and providing analyses of test
    scores and items

39
Reference each cut score to a criterion
  • Should we define and assess minimal competence
    for our program?

40
Cut points
  • Highly reliable test scores reveal differences
    between students' abilities and can help
    accurately rank order students, which may be
    important to employers
  • However, the program is likely interested in
    assessing whether each student is sufficiently
    competent to safely and effectively practice
  • Such assessment concerns typically surface as
    students are about to graduate

41
Measuring minimal competence
  • A program should decide whether it wants to
    create one large assessment with a single
    compensatory cut point
  • OR
  • give each content domain its own cut score, a
    conjunctive model (both models are sketched below)
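A minimal sketch of the two decision models (domain names and cut values are illustrative):

```python
def passes_compensatory(domain_scores, total_cut):
    """Single overall cut: strength in one domain can offset weakness in another."""
    return sum(domain_scores.values()) >= total_cut

def passes_conjunctive(domain_scores, domain_cuts):
    """Each content domain must clear its own cut."""
    return all(domain_scores[d] >= cut for d, cut in domain_cuts.items())

scores = {"airway": 78, "ventilation": 62, "diagnostics": 85}
print(passes_compensatory(scores, total_cut=210))   # True: 225 >= 210
print(passes_conjunctive(scores, {"airway": 70,
                                  "ventilation": 70,
                                  "diagnostics": 70}))  # False: 62 < 70
```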

42
Why do so many competency assessments use a
compensatory cut?
  • If a program selects the more rigorous
    conjunctive model, then each component test will
    produce its own set of scores, each with its own
    reliability
  • Each component must have a sufficient number of
    items or data points to be confident each student
    group's test scores will show adequate
    reliability
  • Modules of fewer than 80-100 program-made items
    are unlikely to produce adequate reliability

43
Seek multiple opinions . . . again
  • Program faculty should define the skills
    competent practitioners possess
  • This is a group activity
  • Each cut point should be linked to a definition
    of minimally competent practitioners

44
Performance assessments
  • Pick your spots
  • Ensure a sufficient quantity of information is
    collected
  • Standardize administration
  • Measure agreement between/among evaluators (one
    sketch follows)
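For the agreement measurement, here is a minimal sketch of Cohen's kappa for two raters; kappa is not named on the slide, but it is one standard chance-corrected choice:

```python
import numpy as np

def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters' categorical ratings."""
    r1, r2 = np.asarray(rater1), np.asarray(rater2)
    categories = np.union1d(r1, r2)
    p_observed = np.mean(r1 == r2)
    p_expected = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in categories)
    return (p_observed - p_expected) / (1.0 - p_expected)

# Example: two evaluators scoring ten performance checklist items pass/fail
print(round(cohens_kappa(["P","P","F","P","F","P","P","F","P","P"],
                         ["P","F","F","P","F","P","P","P","P","P"]), 2))  # ≈ 0.52
```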

45
Summary
  • Collective opinions are closer to the truth than
    any one opinion about
  • appropriate assessment content,
  • item quality, and
  • justifiable cut scores
  • Unreliable scales have no utility

46
Thank you for the opportunity to share some
details about measurement
  • Questions?