Improving Content Validity: A Confidence Interval for Small Sample Expert Agreement

About This Presentation

Description:

Improving Content Validity: A Confidence Interval for Small Sample Expert Agreement Jeffrey M. Miller & Randall D. Penfield NCME, San Diego April 13, 2004 – PowerPoint PPT presentation

Slides: 21

Provided by: JeffreyM159

Learn more at: http://plaza.ufl.edu

Transcript and Presenter's Notes

1
Improving Content Validity: A Confidence
Interval for Small Sample Expert Agreement

Jeffrey M. Miller & Randall D. Penfield
NCME, San Diego
April 13, 2004
University of Florida
millerjm_at_ufl.edu penfield_at_coe.ufl.edu

2
INTRODUCING CONTENT VALIDITY

Validity refers to the degree to which evidence
and theory support the interpretations of test
scores entailed by proposed uses of tests
(AERA/APA/NCME, 1999)
Content validity refers to the degree to which
the content of the items reflects the content
domain of interest (APA, 1954)

3
THE NEED FOR IMPROVED REPORTING
Content is a precursor to drawing a score-based
inference. It is evidence-in-waiting (Shepard,
1993; Yalow & Popham, 1983).
Unfortunately, in many technical manuals,
content representation is dealt with in a
paragraph, indicating that selected panels of
subject matter experts (SMEs) reviewed the test
content, or mapped the items to the content
standards (Crocker, 2003).
4
QUANTIFYING CONTENT VALIDITY

Several indices for quantifying expert agreement
have been proposed.
The mean rating across raters is often used in
calculations.
However, the mean alone does not provide
information regarding its proximity to the
unknown population mean.
We need a usable inferential procedure to gain
insight into the accuracy of the sample mean as
an estimate of the population mean.

5
THE CONFIDENCE INTERVAL

A simple method is to calculate the
traditional Wald confidence interval.
However, this interval is inappropriate for
rating scales.

There are too few raters and response categories
to assume that population normality holds.
There is no reason to believe the distribution
should be normal.
The rating scale is bounded with categories that
are discrete.
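
To illustrate the bounded-scale problem, here is a minimal Python sketch of the Wald interval, assuming the usual form of mean plus or minus z times the standard error. The function name wald_interval and the ratings are hypothetical and are not taken from the presentation.

```python
import math
import statistics


def wald_interval(ratings, z=1.96):
    """Wald interval for a mean, assumed here to be mean +/- z * s / sqrt(n)."""
    n = len(ratings)
    mean = statistics.mean(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical ratings from 10 raters on a 1-4 scale (illustrative only).
ratings = [4, 4, 4, 4, 4, 4, 4, 4, 4, 3]
lower, upper = wald_interval(ratings)
print(f"{lower:.3f} to {upper:.3f}")
```

With these ratings the upper limit lands above 4, the highest category on the scale, which is the kind of result a bounded rating scale cannot support.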

6
AN ALTERNATIVE IS THE
SCORE CONFIDENCE INTERVAL FOR RATING SCALES

Penfield (2003) demonstrated that the Score
method outperformed the Wald interval, especially
when:
The number of raters was small (e.g., 10)
The number of categories was small (e.g., 5)

Furthermore, this interval is asymmetric
It is based on the actual distribution for the
mean rating of concern.
Further, the limits cannot extend below or above
the actual limits of the categories.

7
STEPS TO CALCULATING THE SCORE CONFIDENCE INTERVAL

1. Obtain values for n, k, and z
n = the number of raters
k = the highest possible rating
z = the standard normal variate associated with
the confidence level (e.g., ±1.96 at 95%
confidence)

2. Calculate the mean item rating: the sum of
the ratings for an item divided by the number of
raters

3. Calculate p: $p = \frac{\sum X}{nk}$, the sum of the ratings divided by the product of n and k
Or, if the scale begins with 1, then
p

4. Use p to calculate the upper and lower limits
for a confidence interval for population
proportion (Wilson, 1927)

5. Calculate the upper and lower limits of the
Score confidence interval for the population mean
rating
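
The formulas for steps 3 to 5 did not survive in this transcript. The block below is a reconstruction inferred from the numbers in the worked example that follows, not a copy of the original slides: it assumes Wilson's (1927) interval is applied with nk in the role of the number of trials, and it uses the symbols $p_L$, $p_U$, $\bar{X}_L$, and $\bar{X}_U$ as labels introduced here. The variant of p for a scale that begins with 1 is not reconstructed. It should be checked against Penfield (2003) before use.

```latex
% Steps 2 and 3: mean rating and proportion of the maximum possible total
\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n},
\qquad
p = \frac{\sum_{i=1}^{n} X_i}{nk}

% Step 4: Wilson limits for the proportion, with nk trials
p_L = \frac{2nkp + z^2 - z\sqrt{4nkp(1-p) + z^2}}{2(nk + z^2)},
\qquad
p_U = \frac{2nkp + z^2 + z\sqrt{4nkp(1-p) + z^2}}{2(nk + z^2)}

% Step 5: limits for the population mean rating
\bar{X}_L = \bar{X} - z\sqrt{\frac{k\,p_L(1-p_L)}{n}},
\qquad
\bar{X}_U = \bar{X} + z\sqrt{\frac{k\,p_U(1-p_U)}{n}}
```

With n = 10, k = 4, z = 1.96, and a rating sum of 31, these expressions give the intermediate values 65.842, 11.042, and 87.683 that appear in the example.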

Shorthand Example
Item: 3 + ? = 8
The content of this item represents the ability
to add single-digit numbers.
1 = Strongly Disagree, 2 = Disagree,
3 = Agree, 4 = Strongly Agree
Suppose the expert review session includes 10
raters.
The responses are 3, 3, 3, 3, 3, 3, 3, 3, 3, 4

Shorthand Example
n = 10
k = 4
z = 1.96
The sum of the ratings = 31
Mean rating = 31/10 = 3.10
$p = \frac{\sum X}{nk}$, so
$p = 31 / (10 \times 4) = 0.775$

Shorthand Example (cont.)
Lower limit for p: $(65.842 - 11.042) / 87.683 = 0.625$
Upper limit for p: $(65.842 + 11.042) / 87.683 = 0.877$

Shorthand Example (cont.)
Lower limit for the mean: $3.100 - 1.96\sqrt{0.938/10} = 2.500$
Upper limit for the mean: $3.100 + 1.96\sqrt{0.432/10} = 3.507$

We are 95% confident that the population mean
rating falls somewhere between 2.500 and 3.507.
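
The arithmetic in this example can be checked with a short Python sketch. The function name score_interval is hypothetical, and the formulas are inferred from the intermediate values shown in the example rather than copied from the original slides, so treat this as an approximation of the procedure under that assumption.

```python
import math


def score_interval(ratings, k, z=1.96):
    """Score interval for a mean rating, as inferred from the worked example.

    Assumes p = sum / (n * k) and a Wilson interval with n * k trials.
    """
    n = len(ratings)
    total = sum(ratings)
    mean = total / n
    nk = n * k
    p = total / nk

    # Wilson limits for the proportion p
    centre = 2 * nk * p + z**2
    spread = z * math.sqrt(4 * nk * p * (1 - p) + z**2)
    denom = 2 * (nk + z**2)
    p_lower = (centre - spread) / denom
    p_upper = (centre + spread) / denom

    # Limits for the population mean rating
    lower = mean - z * math.sqrt(k * p_lower * (1 - p_lower) / n)
    upper = mean + z * math.sqrt(k * p_upper * (1 - p_upper) / n)
    return lower, upper


# The ten expert ratings from the example, on a scale with k = 4.
ratings = [3, 3, 3, 3, 3, 3, 3, 3, 3, 4]
lower, upper = score_interval(ratings, k=4)
print(f"{lower:.3f} to {upper:.3f}")  # 2.500 to 3.507
```

The interval is asymmetric around the mean of 3.10: the lower limit sits about 0.60 below it and the upper limit about 0.41 above it.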
17

Content Validation
1. Method 1: Retain only items with a Score interval
of a particular width, based on either
An a priori determination of appropriateness
An empirical standard (25th and 75th percentiles
of all widths)
2. Method 2: Retain items based on a hypothesis
test that the lower limit is above a particular
value
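
The two retention methods can be sketched in Python as simple filters over precomputed Score intervals. The function names, the item labels, the cutoffs, and the intervals for item_2 and item_3 are all hypothetical; only the interval for item_1 comes from the worked example. The sketch also assumes that Method 1 keeps items whose interval is no wider than a chosen maximum, which the slide does not state explicitly.

```python
def retain_by_width(intervals, max_width):
    """Method 1 (assumed reading): keep items whose interval width is at most max_width."""
    return [item for item, (low, high) in intervals.items()
            if high - low <= max_width]


def retain_by_lower_limit(intervals, cutoff):
    """Method 2: keep items whose lower limit is above the cutoff."""
    return [item for item, (low, _high) in intervals.items()
            if low > cutoff]


# item_1 is the interval from the worked example; the others are made up.
intervals = {
    "item_1": (2.500, 3.507),
    "item_2": (3.10, 3.85),
    "item_3": (1.90, 3.20),
}

kept_by_width = retain_by_width(intervals, max_width=1.0)
kept_by_lower = retain_by_lower_limit(intervals, cutoff=2.0)
print(kept_by_width)
print(kept_by_lower)
```

The two methods need not agree: here item_1 fails the width criterion but passes the lower-limit criterion, so the choice of method and cutoff should be made before the expert review.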

EXAMPLE WITH 4 ITEMS

Conclusions
Score method provides a confidence interval that
is not dependent on the normality assumption
Outperforms the Wald interval when the number of
raters and scale categories is small
Provides a decision-making method for the fate of
items in expert review sessions.
Computational complexity can be eased through
simple programming in Excel, SPSS, and SAS

For further reading:
Penfield, R. D. (2003). A score method for
constructing asymmetric confidence intervals for
the mean of a rating scale item. Psychological
Methods, 8, 149-163.
Penfield, R. D., & Miller, J. M. (in press).
Improving content validation studies using an
asymmetric confidence interval for the mean of
expert ratings. Applied Measurement in Education.
