Improving Content Validity: A Confidence Interval for Small Sample Expert Agreement

Jeffrey M. Miller & Randall D. Penfield, NCME, San Diego, April 13, 2004

1
Improving Content Validity: A Confidence
Interval for Small Sample Expert Agreement
  • Jeffrey M. Miller & Randall D. Penfield
  • NCME, San Diego
  • April 13, 2004
  • University of Florida
  • millerjm_at_ufl.edu penfield_at_coe.ufl.edu

2
INTRODUCING CONTENT VALIDITY
  • Validity refers to the degree to which evidence
    and theory support the interpretations of test
    scores entailed by proposed uses of tests
    (AERA/APA/NCME, 1999)
  • Content validity refers to the degree to which
    the content of the items reflects the content
    domain of interest (APA, 1954)

3
THE NEED FOR IMPROVED REPORTING
  • Content is a precursor to drawing a score-based
    inference. It is evidence-in-waiting (Shepard,
    1993; Yalow & Popham, 1983)
  • Unfortunately, in many technical manuals,
    content representation is dealt with in a
    paragraph, indicating that selected panels of
    subject matter experts (SMEs) reviewed the test
    content, or mapped the items to the content
    standards (Crocker, 2003)

4
QUANTIFYING CONTENT VALIDITY
  • Several indices for quantifying expert agreement
    have been proposed
  • The mean rating across raters is often used in
    calculations
  • However, the mean alone does not provide
    information regarding its proximity to the
    unknown population mean.
  • We need a usable inferential procedure to gain
    insight into the accuracy of the sample mean as
    an estimate of the population mean.

5
THE CONFIDENCE INTERVAL
  • A simple method is to calculate the
    traditional Wald confidence interval
  • However, this interval is inappropriate for
    rating scales:
  1. There are too few raters and response categories
    to assume that population normality holds.
  2. There is no reason to believe the distribution
    should be normal.
  3. The rating scale is bounded, with categories that
    are discrete.
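
To illustrate the third point, here is a minimal Python sketch of a Wald interval (mean ± z × standard error) applied to ten ratings on a 1 to 4 scale. The function name `wald_interval`, the ratings themselves, and the choice of the sample standard deviation are assumptions made for illustration; none of them come from the presentation.

```python
import math
import statistics


def wald_interval(ratings, z=1.96):
    """Wald interval for a mean: mean +/- z * (sample SD / sqrt(n))."""
    n = len(ratings)
    mean = statistics.mean(ratings)
    standard_error = statistics.stdev(ratings) / math.sqrt(n)
    return mean - z * standard_error, mean + z * standard_error


# Hypothetical ratings on a 1-4 scale: nine raters choose 4, one chooses 3.
ratings = [4, 4, 4, 4, 4, 4, 4, 4, 4, 3]
lower, upper = wald_interval(ratings)

# The upper limit lands above 4, the highest rating the scale allows.
print(round(lower, 3), round(upper, 3))
```

With these ratings the upper limit falls above the top category of the scale, which is the kind of out-of-range result a bounded rating scale cannot support.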

6
AN ALTERNATIVE IS THE
SCORE CONFIDENCE INTERVAL FOR RATING SCALES
  • Penfield (2003) demonstrated that the Score
    method outperformed the Wald interval, especially
    when:
  • The number of raters was small (e.g., 10)
  • The number of categories was small (e.g., 5)
  • Furthermore, this interval is asymmetric
  • It is based on the actual distribution for the
    mean rating of concern.
  • Further, the limits cannot extend below or above
    the actual limits of the categories.

7
STEPS TO CALCULATING THE SCORE CONFIDENCE INTERVAL
  • 1. Obtain values for n, k, and z
  • n = the number of raters
  • k = the highest possible rating
  • z = the standard normal variate associated with
    the confidence level (e.g., ±1.96 at 95%
    confidence)

8
  • 2. Calculate the mean item rating: the sum of
    the ratings for an item divided by the number of
    raters, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$,
    where $X_i$ is the rating given by rater i

9
  • 3. Calculate p: $p = \frac{\sum_{i=1}^{n} X_i}{nk}$,
    the sum of the ratings divided by n times k
  • Or if scale begins with 1 then
  • p

10
  • 4. Use p to calculate the upper and lower limits
    for a confidence interval for population
    proportion (Wilson, 1927)
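
The equation for this step is missing from the transcript. One form of Wilson's score limits that reproduces the numbers in the shorthand example (65.842, 11.042, and 87.683) treats nk as the sample size. The labels $p_L$ and $p_U$ for the lower and upper limits are introduced here for convenience and are an assumption, not the slide's notation:

```latex
p_L = \frac{2nkp + z^2 - z\sqrt{4nkp(1-p) + z^2}}{2\,(nk + z^2)},
\qquad
p_U = \frac{2nkp + z^2 + z\sqrt{4nkp(1-p) + z^2}}{2\,(nk + z^2)}
```

For n = 10, k = 4, p = 0.775, and z = 1.96, the shared term $2nkp + z^2$ is 65.842, the square-root term times z is 11.042, and the denominator is 87.683.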

11
  • 5. Calculate the upper and lower limits of the
    Score confidence interval for the population mean
    rating
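
The equation for this step is also missing from the transcript. The arithmetic in the shorthand example (for instance, 0.938 = 4 × 0.625 × 0.375) is consistent with the following form, where $p_L$ and $p_U$ are the lower and upper limits for the proportion from step 4. The symbols $\bar{X}_L$ and $\bar{X}_U$ are labels assumed here for the two limits of the mean rating:

```latex
\bar{X}_L = \bar{X} - z\sqrt{\frac{k\,p_L\,(1 - p_L)}{n}},
\qquad
\bar{X}_U = \bar{X} + z\sqrt{\frac{k\,p_U\,(1 - p_U)}{n}}
```

Because each limit uses its own variance term, the interval is generally not symmetric around the sample mean.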

12
  • Shorthand Example
  • Item 3 ? 8
  • The content of this item represents the ability
    to add single-digit numbers.
  • 1 = Strongly Disagree, 2 = Disagree,
    3 = Agree, 4 = Strongly Agree
  • Suppose the expert review session includes 10
    raters.
  • The responses are 3, 3, 3, 3, 3, 3, 3, 3, 3, 4

13
  • Shorthand Example
  • n = 10
  • k = 4
  • z = 1.96
  • Sum of the ratings = 31
  • Mean rating = 31/10 = 3.10
  • p = sum of the ratings / (n × k), so
    p = 31 / (10 × 4) = 0.775

14
  • Shorthand Example (cont.)
  • Lower limit for p = (65.842 - 11.042) / 87.683 = 0.625
  • Upper limit for p = (65.842 + 11.042) / 87.683 = 0.877

15
  • Shorthand Example (cont.)
  • Lower limit = 3.100 - 1.96 × sqrt(0.938/10) = 2.500
  • Upper limit = 3.100 + 1.96 × sqrt(0.432/10) = 3.507

16

We are 95% confident that the population mean
rating falls somewhere between 2.500 and 3.507.
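
The five steps can be sketched in Python as follows. This is a minimal sketch rather than the authors' own code: the function name `score_interval` is hypothetical, and the formulas for steps 3 to 5 are inferred from the arithmetic in the shorthand example, because the slide equations are missing from the transcript.

```python
import math


def score_interval(ratings, k, z=1.96):
    """Score confidence interval for the mean of one rating-scale item.

    ratings: one rating per expert; k: highest possible rating;
    z: standard normal variate for the chosen confidence level.
    """
    n = len(ratings)                       # step 1: number of raters
    total = sum(ratings)
    mean = total / n                       # step 2: mean item rating
    p = total / (n * k)                    # step 3: proportion of the maximum
    nk = n * k

    # Step 4: Wilson limits for p, treating n * k as the sample size.
    shared = 2 * nk * p + z ** 2
    root = z * math.sqrt(4 * nk * p * (1 - p) + z ** 2)
    denominator = 2 * (nk + z ** 2)
    p_lower = (shared - root) / denominator
    p_upper = (shared + root) / denominator

    # Step 5: limits for the mean rating, each with its own variance term.
    lower = mean - z * math.sqrt(k * p_lower * (1 - p_lower) / n)
    upper = mean + z * math.sqrt(k * p_upper * (1 - p_upper) / n)
    return lower, upper


# Ratings from the shorthand example: nine 3s and one 4 on a 1-4 scale.
lower, upper = score_interval([3, 3, 3, 3, 3, 3, 3, 3, 3, 4], k=4)
print(f"{lower:.3f} {upper:.3f}")  # prints 2.500 3.507
```

The printed limits match the interval reported for the shorthand example.
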
17
  • Content Validation
  • 1. Method 1: Retain only items with a Score
    interval of a particular width, based on:
  • An a priori determination of appropriateness
  • An empirical standard (25th and 75th percentiles
    of all widths)
  • 2. Method 2: Retain items based on a hypothesis
    test that the lower limit is above a particular
    value
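
As a rough illustration, the two retention rules could be coded as below. Everything named here is an assumption for the sake of the sketch: the helper names, the cutoff of 2.0, the maximum width of 1.1, and the intervals for `item_b` and `item_c` are hypothetical. Only the `item_a` interval comes from the shorthand example, and reading Method 1 as "keep intervals no wider than a threshold" is one interpretation of the slide.

```python
# Score intervals as (lower, upper). Only item_a is from the shorthand
# example; item_b and item_c are made-up values for illustration.
intervals = {
    "item_a": (2.500, 3.507),
    "item_b": (1.800, 3.100),
    "item_c": (3.200, 3.900),
}

CUTOFF = 2.0      # assumed minimum acceptable lower limit (Method 2)
MAX_WIDTH = 1.1   # assumed a priori maximum interval width (Method 1)


def retain_by_width(interval, max_width):
    """Method 1 (one reading): keep items whose interval is narrow enough."""
    lower, upper = interval
    return (upper - lower) <= max_width


def retain_by_lower_limit(interval, cutoff):
    """Method 2: keep items whose lower limit lies above the cutoff."""
    lower, _ = interval
    return lower > cutoff


for name, interval in intervals.items():
    print(name,
          retain_by_width(interval, MAX_WIDTH),
          retain_by_lower_limit(interval, CUTOFF))
```

An empirical standard could replace `MAX_WIDTH` by computing percentiles of the observed widths across all items instead of fixing the threshold in advance.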

18
     
  • EXAMPLE WITH 4 ITEMS

     
19
  • Conclusions
  • Score method provides a confidence interval that
    is not dependent on the normality assumption
  • Outperforms the Wald interval when the number of
    raters and scale categories is small
  • Provides a decision-making method for the fate of
    items in expert review sessions.
  • Computational complexity can be eased through
    simple programming in Excel, SPSS, and SAS

20
  • For further reading,
  • Penfield, R. D. (2003). A score method for
    constructing asymmetric confidence intervals for
    the mean of a rating scale item. Psychological
    Methods, 8, 149-163.
  • Penfield, R. D., & Miller, J. M. (in press).
    Improving content validation studies using an
    asymmetric confidence interval for the mean of
    expert ratings. Applied Measurement in Education.