Rating Performance Assessments of Students With and Without Disabilities: A Generalizability Study of Teacher Bias
1
Rating Performance Assessments of Students With and Without Disabilities: A Generalizability Study of Teacher Bias
Jose-Felipe Martinez-Fernandez, Ann M. Mastergeorge
UCLA Graduate School of Education & Information Studies
Center for the Study of Evaluation
National Center for Research on Evaluation, Standards, and Student Testing
American Educational Research Association, New Orleans, April 1-5, 2001
2
Introduction
  • Performance assessments are an increasingly popular method for evaluating academic performance.
  • A number of studies have shown that well-trained raters can score performance assessments reliably for the general population of students.
  • This study addressed whether trained raters introduce any bias when scoring performance assessments of students with disabilities.

3
Purpose
  • Compare the sources of score variability for students with and without disabilities in Language Arts and Mathematics performance assessments.
  • Determine whether important differences exist across student groups in terms of variance components, and if so, whether rater (teacher) bias plays a role.
  • Complement the results with raters' perceptions of bias (their own and other raters').

4
Method
  • Student and rater samples come from a larger district-wide validation study involving thousands of performance assessments.
  • Teachers from each grade and content area were trained as raters.
  • A total of 6 studies (each with different raters and students) were performed for 3rd, 7th, and 9th grade assessments in Language Arts and Mathematics.

5
Method (continued)
  • For each study, 60 assessments (30 from regular education students and 30 from students who received some kind of accommodation) were rated by 4 raters on two occasions.
  • Raters were aware of each student's disability status only on the 2nd rating occasion. Bias is defined as systematic differences in the scores across occasions (formalized briefly after this list).
  • No practice or memory effects were expected.
  • The score scale ranges from 1 to 4.
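One way to make this working definition concrete (our notation, not from the slides): for a given student group, bias would appear as a systematic occasion effect,

$$B_g = \bar{X}_{g,\,o=2} - \bar{X}_{g,\,o=1},$$

where $\bar{X}_{g,o}$ averages scores over persons and raters in group $g$ on rating occasion $o$; in the generalizability analyses below this corresponds to variance attributable to the Occasion (O) facet and its interactions with disability status.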

6
Method (continued)
  • Two kinds of generalizability designs were used. First, a nested-within-disability design with all 60 students: P(D) x R x O.
  • Second, separate fully crossed P x R x O designs for each disability group of 30 students (a variance-component sketch for this crossed design follows this list).
  • Math assessments consisted of two tasks. Both a random P x R x O x T design and a fixed P x R x O design averaging over tasks were used.
  • A survey inquired about raters' perceptions regarding bias in rating students with disabilities (their own and other raters').
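As a point of reference, below is a minimal sketch (assumed, not the authors' code) of how variance components for the fully crossed P x R x O design can be estimated from the classical expected-mean-squares equations and combined into a dependability (phi) coefficient. The function names, the simulated 30 x 4 x 2 score array, and the 1-4 integer scores are illustrative assumptions.

```python
import numpy as np

def g_study_pxrxo(X):
    """Estimate variance components for a fully crossed p x r x o random design.

    X : score array with shape (n_p persons, n_r raters, n_o occasions).
    Returns a dict of variance-component estimates (negatives truncated to 0).
    """
    n_p, n_r, n_o = X.shape
    m = X.mean()
    m_p = X.mean(axis=(1, 2)); m_r = X.mean(axis=(0, 2)); m_o = X.mean(axis=(0, 1))
    m_pr = X.mean(axis=2); m_po = X.mean(axis=1); m_ro = X.mean(axis=0)

    # Sums of squares for main effects, two-way interactions, and the residual
    ss_p = n_r * n_o * np.sum((m_p - m) ** 2)
    ss_r = n_p * n_o * np.sum((m_r - m) ** 2)
    ss_o = n_p * n_r * np.sum((m_o - m) ** 2)
    ss_pr = n_o * np.sum((m_pr - m_p[:, None] - m_r[None, :] + m) ** 2)
    ss_po = n_r * np.sum((m_po - m_p[:, None] - m_o[None, :] + m) ** 2)
    ss_ro = n_p * np.sum((m_ro - m_r[:, None] - m_o[None, :] + m) ** 2)
    ss_pro = np.sum((X - m) ** 2) - (ss_p + ss_r + ss_o + ss_pr + ss_po + ss_ro)

    # Mean squares
    ms_p = ss_p / (n_p - 1); ms_r = ss_r / (n_r - 1); ms_o = ss_o / (n_o - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))
    ms_po = ss_po / ((n_p - 1) * (n_o - 1))
    ms_ro = ss_ro / ((n_r - 1) * (n_o - 1))
    ms_pro = ss_pro / ((n_p - 1) * (n_r - 1) * (n_o - 1))

    # Expected-mean-squares solutions for the random-effects variance components
    v = {
        "pro,e": ms_pro,
        "pr": (ms_pr - ms_pro) / n_o,
        "po": (ms_po - ms_pro) / n_r,
        "ro": (ms_ro - ms_pro) / n_p,
        "p": (ms_p - ms_pr - ms_po + ms_pro) / (n_r * n_o),
        "r": (ms_r - ms_pr - ms_ro + ms_pro) / (n_p * n_o),
        "o": (ms_o - ms_po - ms_ro + ms_pro) / (n_p * n_r),
    }
    return {k: max(val, 0.0) for k, val in v.items()}

def phi(v, n_r, n_o):
    """Dependability (phi) coefficient for absolute decisions with n_r raters
    and n_o occasions in the D-study."""
    abs_error = (v["r"] / n_r + v["o"] / n_o + v["pr"] / n_r +
                 v["po"] / n_o + (v["ro"] + v["pro,e"]) / (n_r * n_o))
    return v["p"] / (v["p"] + abs_error)

# Illustrative use: 30 students per group, 4 raters, 2 occasions, 1-4 score scale
rng = np.random.default_rng(0)
X = rng.integers(1, 5, size=(30, 4, 2)).astype(float)
components = g_study_pxrxo(X)
print(components, phi(components, n_r=4, n_o=2))
```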

7
Score Distributions
8
Generalizability Results: Nested Design
Language Arts Score: Rater x Occasion x Person (Disability)
9
Generalizability Results (continued): Nested Design
Mathematics Score: Task x Rater x Occasion x Person (Disability)
10
Generalizability Results (continued): Crossed Design by Disability
Language Arts Score: Rater x Occasion x Person
11
Generalizability Results (continued): Crossed Design by Disability
Mathematics Score: Task x Rater x Occasion x Person
12
Generalizability Results (continued): Crossed Design by Disability, Mathematics with Task Facet Fixed
Score: Person x Rater x Occasion, averaging over the two tasks
13
Rater Survey: Rater Perceptions (p < .01, N = 40)
14
Rater Survey (continued): Mean Score of Raters on Self and Others Regarding Fairness and Bias in Scoring
15
Discussion
  • Variance Components
  • The Person (P) component is always the largest (50% to 70% of variance across designs). However, a good amount of measurement error remains (triple interaction, ignored facets).
  • Some differences exist between the regular education and disability groups in terms of variance components.

16
Discussion (continued)
  • Differences between groups
  • The total amount of variance is always smaller in the disability groups (more skewed score distribution).
  • Variance due to persons (P), and therefore dependability coefficients, are lower for the disability group in Language Arts. This also holds in Mathematics with a fixed, averaged task facet, but not with two random tasks (the dependability formula below shows how the components combine).
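For reference, the dependability coefficient for a crossed P x R x O design is the standard generalizability-theory ratio of person variance to person variance plus absolute error variance (our restatement, not taken from the slides):

$$\Phi = \frac{\sigma^2_p}{\sigma^2_p + \dfrac{\sigma^2_r}{n_r} + \dfrac{\sigma^2_o}{n_o} + \dfrac{\sigma^2_{pr}}{n_r} + \dfrac{\sigma^2_{po}}{n_o} + \dfrac{\sigma^2_{ro} + \sigma^2_{pro,e}}{n_r\,n_o}}$$

A smaller person component $\sigma^2_p$ relative to the error terms therefore lowers $\Phi$ directly, which is the pattern reported for the disability group.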

17
Discussion (continued)
  • Rater Bias
  • No Rater (R) main effects: no leniency differences across raters.
  • No rating occasion (O) effect: overall, there is no bias introduced by rater knowledge of disability status.
  • No rater interactions with tasks or occasions.

18
Discussion (continued)
  • However, there is a non-negligible Person by Rater (PxR) interaction, which is considerably larger for students with disabilities.
  • This does not necessarily constitute bias, but it can still compromise the validity of scores for accommodated students.
  • Are features in papers from students with disabilities differentially salient to different raters?

19
Discussion (continued)
  • There is a large Person by Task (PxT) interaction in Math, but it is considerably smaller for students with disabilities.
  • Students with disabilities may not be as aware of the different nature of the tasks, so this otherwise natural interaction (Miller & Linn, 2000, among others) does not show as strongly.
  • Accommodations may not be having the intended leveling effects.
  • With a random task facet, the lower PxT interaction increases reliability for students with disabilities (see the error-variance expression after this list).
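One way to see why (a standard relative-error expression for the fully crossed random P x R x O x T design, restated here rather than taken from the slides): the PxT component enters the relative error variance divided only by the number of tasks,

$$\sigma^2_\delta = \frac{\sigma^2_{pr}}{n_r} + \frac{\sigma^2_{po}}{n_o} + \frac{\sigma^2_{pt}}{n_t} + \frac{\sigma^2_{pro}}{n_r n_o} + \frac{\sigma^2_{prt}}{n_r n_t} + \frac{\sigma^2_{pot}}{n_o n_t} + \frac{\sigma^2_{prot,e}}{n_r n_o n_t}, \qquad E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\delta},$$

so with only $n_t = 2$ tasks, a smaller $\sigma^2_{pt}$ noticeably shrinks $\sigma^2_\delta$ and raises $E\rho^2$ for the disability group.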

20
Discussion (continued)
  • From the rater survey:
  • Teachers believe that there is a certain amount of bias and unfairness from raters when scoring performance assessments from students with disabilities.
  • Raters see themselves as more fair and unbiased than the general population of raters.
  • Whether this is due to training or to initially high self-perceptions is not clear. A not-uncommon "I'm great, but others aren't as much" kind of effect could be the sole reason.

21
Future Directions and Questions
  • Are there different patterns for different kinds
    of disabilities/accommodations?
  • Are accommodations being used appropriately and
    having the intended effects?
  • Do patterns hold for raters at the local school
    sites who in general receive less training?
  • Does rater background influence the size and
    nature of these effects and interactions?
  • How does the testing occasion facet influence
    variance components/other interactions?