Three-part colloquium series

Provided by: ctay8
Learn more at: https://education.uw.edu

1
Three-part colloquium series
  • Can Large-Scale Tests Be Fair to All Students?
    Research on Bias Issues for WASL (November 2)
  • WASL History and Early Research: Everything You
    Needed to Know About WASL but Didn't Think to Ask
    (December 1)
  • Classroom-Based Assessments and State Standards:
    Implementing Alternatives to Standardized Tests
    (December 11)

2
Can Large-Scale Tests Be Fair to All
Students? Bias Issues Related to WASL
  • Catherine S. Taylor
  • University of Washington
  • November 2, 2006

3
Background
  • 10 years' experience in test development
    (1981-1991) prior to coming to the University of
    Washington
  • Moved to the University of Washington in 1991
    (School Reform Law passed in 1993)
  • Principal Investigator for R&D Grant (1994-1995)
    to support development of prototype
    assessments of the Essential Academic Learning
    Requirements (EALRs)
  • Washington State Technical Advisory Committee for
    Assessment (1995-1999)
  • Principal Investigator for WASL Validity Research
    Grant (2000-2004) to investigate validity of
    WASL scores

4
The focuses of my research
  • How to prepare teachers for effective
    classroom-based assessments
  • Validity theory
  • Validity and large-scale testing policy
  • Threats to the validity of large-scale tests

5
Focuses of this presentation
  • Study of Bias and Sensitivity Review procedures
    used for WASL (2004)
  • Report of input from two Public Forums on Bias
    and Sensitivity (2004)
  • Yakima
  • Seattle
  • Studies of Differential Item Functioning (AKA
    statistical bias) in WASL test items (1997-2001)

6
What is an Item?
  • An item is a question or set of directions
    (prompt)
  • Multiple-choice item
  • A question or prompt
  • 3-4 answer choices, only one of which is correct
  • Performance item
  • A question or prompt
  • Space in which students construct an answer
  • A rule for assigning points to students' answers
  • WASL performance items
  • Short answer (0-2 points)
  • Extended response (0-4 points)

7
WASL items are developed using state-of-the-art
procedures
  • Test Specifications define how many and what
    types of items will be on a test
  • Item Specifications define exactly what kinds of
    items will assess each Grade Level Expectation
    (GLE)
  • Item writing overseen by skilled test developers
  • Item reviews by teachers check for match to GLEs
  • Bias and sensitivity reviews by individuals who
    represent the diversity of WA State students

8
WASL test items are tested using state-of-the-art
procedures
  • Item pilots: items are randomly assigned to
    students throughout WA State
  • Item data reviews based on students'
    performances
  • Statistical difficulty: Is the item easy or
    difficult because of the content tested, NOT some
    flaw in the item?
  • Statistical validity: Do high-performing students
    do better on the item than low-performing
    students?
  • Statistical bias: Is item performance related to
    level of knowledge and skill, NOT group
    membership?
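
The first two statistical checks above can be sketched in a few lines. This is a minimal illustration, not the procedure used for WASL: it assumes difficulty is summarized as the mean proportion of available points earned, and validity (discrimination) as the correlation between item score and total test score. The function names and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def item_difficulty(item_scores, max_points=1):
    """Mean proportion of the available points earned on the item."""
    return mean(item_scores) / max_points


def item_discrimination(item_scores, total_scores):
    """Pearson correlation between item score and total test score.

    A clearly positive value means high-performing students tend to do
    better on the item than low-performing students.
    """
    mean_item, mean_total = mean(item_scores), mean(total_scores)
    sd_item, sd_total = pstdev(item_scores), pstdev(total_scores)
    if sd_item == 0 or sd_total == 0:
        return 0.0  # no variation, so the correlation is undefined
    covariance = mean(
        [(i - mean_item) * (t - mean_total)
         for i, t in zip(item_scores, total_scores)]
    )
    return covariance / (sd_item * sd_total)


# Hypothetical data: six students, one multiple-choice item (0 or 1 point).
item = [1, 1, 0, 1, 0, 0]
total = [38, 35, 20, 30, 18, 22]

difficulty = item_difficulty(item)
discrimination = item_discrimination(item, total)
print(difficulty, discrimination)
```

For a short-answer or extended-response item, `max_points` would be set to 2 or 4 so that difficulty stays on a 0 to 1 scale.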

9
Study 1: Bias and Sensitivity Reviews
  • Committee members represent diversity in the
    student population (regions, ethnicity, gender,
    socio-economic status, religion, special
    population issues)
  • Members review reading passages and items for:
  • Implied or overt stereotyping or negative
    representations of any group
  • Too much or too little representation of any
    group
  • Terms that may be confusing to students based on
    language, region, culture, socio-economic status,
    etc.
  • Controversial issues and topics that may affect
    some groups more than others

10
Procedures Used to Observe Bias and Sensitivity
Reviews
  • Participant-observer
  • Recorded panelists' comments during review process
  • Cross-checked records with facilitator notes
  • Looked for patterns in notes/records in relation
    to reading passages and items

11
Results of Bias and Sensitivity Review
Observations
  • Few passages or test items are identified as
    problematic
  • Reading passages present the greatest potential
    for bias
  • Sources of bias in reading passages are subtle

12
Reading passages present the greatest potential
for bias
  • WASL includes
  • narrative and informative passages
  • passages with social studies, science, and
    literary content
  • WASL reading passages are from published sources
  • Authors resist changes to their published writing
    (even when changes lessen bias/stereotyping)

13
Sources of bias in reading passages are subtle
  • Alterations of original narratives
  • Legends and folk tales may be altered to fit
    Western notions of literature
  • Language changes can change meaning ("first
    feast" vs. "barbeque")
  • Othering
  • Biographies may focus on how individuals overcame
    or coped with their minority status (Jackie
    Robinson, Helen Keller)
  • Informational passages about cultural groups may
    have a patronizing tone (i.e., aren't 'their'
    ways cute)
  • Interpretations: Items may focus on
    interpretations that are unique to middle-class
    values rather than values of the culture of origin

14
Study 2: Bias and Sensitivity Forums
  • Two community forums (Yakima and Seattle)
  • Community members came together to discuss
    concerns about WASL
  • Participants included
  • Teachers and school administrators
  • Tribal elders
  • Latino community leaders
  • Parents and community members

15
Procedures Used to Gather Data during Bias and
Sensitivity Forums
  • Conducted a mock bias and sensitivity review
  • Presented methods used for statistical bias
    analysis, also called differential item
    functioning (DIF)
  • Showed items flagged for DIF and asked for likely
    causes
  • Small group discussion with reports to larger
    group
  • Recorded participant ideas about bias issues in
    WASL
  • Examined written notes and chart paper for themes

16
Themes in Participant Comments
  • Need for involvement of minority teachers in all
    stages of WASL development work
  • Need for sensitivity to cultural values in
    selection of reading passages, item content, and
    the types of questions (particularly in reading)
  • Need for inclusion of tribal elders in selection
    of text and contexts for WASL items
  • Need for inclusion of individuals with cultural
    expertise in bias/sensitivity review panels

17
Study 3: Differential Item Functioning (DIF)
Analyses: Typical Steps in a DIF Analysis
  • Identify groups to be compared
  • Compute item performance for students in
    different groups at each total test score
  • Summarize the differences in performance across
    all test scores
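
The three steps above can be sketched as follows. The slides do not say which DIF statistic was used for WASL, so this sketch assumes a simple standardization approach: students are matched on total test score, the focal and reference groups' mean item performance is compared at each score, and the differences are averaged with weights equal to the number of focal-group students at that score. The function name, record layout, and sample data are hypothetical.

```python
from collections import defaultdict


def standardized_dif(records, focal, reference):
    """Focal-weighted mean difference in item performance.

    records: iterable of (group, total_score, item_score), where
    item_score is the proportion of the item's points earned (0 to 1).
    Returns focal minus reference, averaged over total scores at which
    both groups have at least one student.
    """
    # Step 2: accumulate [sum of item scores, count] per group per total score.
    cells = defaultdict(lambda: {focal: [0.0, 0], reference: [0.0, 0]})
    for group, total, item in records:
        if group in (focal, reference):  # Step 1: keep only the compared groups
            cell = cells[total][group]
            cell[0] += item
            cell[1] += 1

    # Step 3: summarize the differences across all total scores.
    weighted_diff = 0.0
    weight = 0
    for by_group in cells.values():
        f_sum, f_n = by_group[focal]
        r_sum, r_n = by_group[reference]
        if f_n == 0 or r_n == 0:
            continue  # no matched comparison possible at this score
        weighted_diff += f_n * (f_sum / f_n - r_sum / r_n)
        weight += f_n
    return weighted_diff / weight if weight else 0.0


# Hypothetical data: group label, total test score, item score.
sample = [
    ("F", 10, 1), ("F", 10, 0), ("R", 10, 1), ("R", 10, 1),
    ("F", 20, 1), ("R", 20, 1), ("R", 20, 1),
]
dif = standardized_dif(sample, focal="F", reference="R")
print(dif)
```

A value near zero means students with the same total score perform alike on the item regardless of group. How large a value must be before an item is flagged is a policy choice that this sketch does not address.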

19
DIF Can Go Both Ways
  • When individual students get their total scores
    from different items, that's normal
  • When there is a pattern in how groups of students
    get their total scores, that's DIF
  • When students in a group do better than expected
    on an item based on their total test score, DIF
    is in favor of the group
  • When students in a group do more poorly than
    expected on an item based on their total test
    score, DIF is against the group
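
One common way to express the direction of DIF for right/wrong items is the Mantel-Haenszel statistic, offered here only as an illustration because the slides do not name the method used for WASL. The sketch pools the odds of a correct answer for the reference group versus the focal group across total-score levels, then converts the pooled odds ratio to the delta scale (D-DIF = -2.35 ln alpha). A positive value means the item favors the focal group; a negative value means it works against the focal group. The function name and data are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_d_dif(records, focal, reference):
    """Mantel-Haenszel D-DIF for a right/wrong item.

    records: iterable of (group, total_score, correct), correct in {0, 1}.
    Returns None when the pooled odds ratio cannot be computed.
    """
    # Per total score: [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, total, correct in records:
        if group == reference:
            tables[total][0 if correct else 1] += 1
        elif group == focal:
            tables[total][2 if correct else 3] += 1

    numerator = denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if numerator == 0 or denominator == 0:
        return None
    alpha = numerator / denominator  # pooled odds ratio, reference vs. focal
    return -2.35 * math.log(alpha)


# Hypothetical data: all students have the same total score of 10.
sample_mh = (
    [("R", 10, 1)] * 3 + [("R", 10, 0)] * 1
    + [("F", 10, 1)] * 1 + [("F", 10, 0)] * 3
)
against_focal = mantel_haenszel_d_dif(sample_mh, focal="F", reference="R")
roles_swapped = mantel_haenszel_d_dif(sample_mh, focal="R", reference="F")
print(against_focal, roles_swapped)
```

Swapping which group is treated as focal flips the sign, which is the sense in which DIF can go both ways.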

20
Typical Causes of DIF
  • Impact: Students from different groups receive
    different educational experiences such that item
    performance differences reflect true differences
    in knowledge/skills.
  • Culture/Background: Students from different
    backgrounds bring unique perspectives to bear on
    test items.
  • Flaws: Flaws in items that cause one group to
    respond differently than another.

21
Research on DIF for WASL Test Items
  • Studies conducted after items had been
  • reviewed by bias and sensitivity committee
  • examined for statistical bias
  • used in an operational test
  • Compared performance of
  • Males and Females
  • White students and Black/African American
    students
  • White students and Latino/Hispanic students
  • White students and Native American students
  • White students and Asian/Pacific Islander students

22
Research on DIF for WASL Test Items
  • Examined test items from
  • 1997, 1998, 1999, 2000, 2001 Grade 4 Reading and
    Mathematics
  • 1998, 1999, 2000, 2001 Grade 7 Reading and
    Mathematics
  • 1999, 2000, 2001 Grade 10 Reading and Mathematics

23
DIF Results for Reading
  • Most reading items showed no statistical bias
  • Reading items flagged for Gender DIF
  • Multiple-choice items tend to favor boys
  • Performance items tend to favor girls
  • DIF items favoring boys tend to be related to
    informational passages
  • Reading items flagged for Ethnic DIF
  • Multiple-choice items asking for text
    interpretation tend to favor white students
  • Performance items asking for text interpretation
    tend to favor minority students
  • Patterns became more extreme across grade levels

24
Mean Number of Reading Items Flagged for DIF
(Males and Females)

25
Mean Number of Reading Items Flagged for DIF
(Asian/Pacific Islander and White)

26
Mean Number of Reading Items Flagged for DIF
(Black/African American and White)

27
Mean Number of Reading Items Flagged for DIF
(Native American and White)

28
Mean Number of Reading Items Flagged for DIF
(Latino/Hispanic and White)

29
Excerpt from a reading passage
  • The best-looking fences are often the simplest.
    A simple fence around a beautiful home can be
    like a frame around a picture. The house isn't
    hidden; its beauty is enhanced by the frame. But
    a fence can be a massive, ugly thing, too, made
    of bricks and mortar. Sometimes the insignificant
    little fences do their job just as well as the
    ten-foot walls. Maybe it's only a string
    stretched between here and there in a field. The
    message is clear: don't cross here.
  • Every fence has its own personality, and some
    don't have much. There are friendly fences. A
    friendly fence takes kindly to being leaned on.
    There are friendly fences around some
    playgrounds. And some playground fences are more
    fun to play on than anything they surround. There
    are more mean fences than friendly fences
    overall, though. Some have their own built-in
    invitation not to be sat upon. Unfriendly fences
    get it right back sometimes. You seldom see one
    that hasn't been hit, bashed, or bumped or in
    some way broken or knocked down.

30
Example of a Reading Item that Shows
Statistical Bias in Favor of Focal Groups
  • In the sixth paragraph, the author talks about
    friendly and unfriendly fences. How can you tell
    them apart?
  • _________________________________________________
  • Favors Latinos, Blacks/African Americans, and
    Asian/Pacific Islanders

31
Example of a Reading Item that Shows Statistical
Bias in Favor of Focal Groups
  • What is the author's attitude toward fences?
    Give three pieces of evidence from the essay to
    support your point.
  • _________________________________________________
  • Favors females, Asian/Pacific Islanders, and
    Latinos

32
Example of a Reading Item that Shows Statistical
Bias in Favor of Males and Whites

33
DIF Results for Mathematics
  • Most mathematics items showed no statistical bias
  • Mathematics items flagged for Gender DIF
  • Multiple-choice items tend to favor boys
  • Performance items tend to favor girls
  • DIF items favoring boys tend to require simple
    applications of mathematical procedures in
    number, algebra, geometry, and statistics
  • DIF items favoring girls tend to assess data
    analysis, measurement, complex applications,
    reasoning, and problem-solving
  • Number of items flagged for DIF increased across
    grade levels

34
DIF Results for Mathematics
  • Ethnic DIF statistical patterns
  • Performance items were flagged for DIF more often
    than multiple-choice items
  • Slightly more of the flagged performance items
    favored minority students, although differences
    were small

35
DIF Results for Mathematics
  • Content analysis of Mathematics items flagged for
    Ethnic DIF
  • Flagged items favoring Asian/Pacific Islander
    students generally assessed number concepts,
    computation, geometric procedures, algebraic
    procedures, and simple statistics
  • Flagged items favoring Black/African American, Native
    American, and Latino/Hispanic students generally
    assessed number, number patterns, computation,
    and logical reasoning
  • Flagged items favoring White students generally
    assessed data analysis, data representation,
    measurement, reasoning, and problem-solving

36
Mean Number of Mathematics Items Flagged for DIF
(Males and Females)

37
Mean Number of Mathematics Items Flagged for DIF
(Asian/Pacific Islander and White)

38
Mean Number of Mathematics Items Flagged for DIF
(Black/African American and White)

39
Mean Number of Mathematics Items Flagged for DIF
(Native American and White)

40
Mean Number of Mathematics Items Flagged for DIF
(Latino/Hispanic and White)

41
Example of a Mathematics Item that Shows
Statistical Bias in Favor of Focal Groups
  • Favors Latinos, Native Americans, Asian/Pacific
    Islanders, Black/African Americans, and Females

42
Example of a Mathematics Item that Shows
Statistical Bias in Favor of Focal Groups
  • Favors Asian/Pacific Islanders

43
Conclusions from DIF Studies
  • Results suggest
  • Exclusive reliance on multiple-choice items for
    reading tests may result in bias against girls
    and minority students, particularly when items
    assess interpretation of text
  • Exclusive reliance on multiple-choice items for
    mathematics tests may result in bias against
    girls
  • Ethnic DIF results in mathematics suggest that
    content of instruction differs for students in
    different groups

44
Additional Points
  • Similar results have been found in studies of
    other tests
  • However, these results can only be generalized
    when
  • Items are written in the same way as WASL items
    (structured, not too open-ended)
  • Diverse, appropriate interpretations and problem
    solutions are selected for use to train scorers

45
Can Standardized Tests be Fair to All Students?
  • Yes, under some conditions
  • Use of reading passages that maintain cultural
    characteristics
  • Well-developed performance items that present
    clear directions to students
  • Use of item writers from diverse backgrounds
  • Selection of anchor papers and training papers
    that represent diverse, valid responses
  • Cultural experts in bias and sensitivity reviews

46
For further information on my WASL research
  • December 1 colloquium for results of research on
    the overall validity and reliability of WASL scores
  • December 11 colloquium for discussions of the
    research basis for use of classroom-based
    evidence as an alternative to WASL for high
    school graduation