A Tale of Two Tests: STANAG and CEFR. Comparing the Results of side-by-side testing of reading proficiency

Description:

Title: STANAG 6001 and the Common European Framework of Reference for Languages: Learning, teaching, assessment. Last modified by: Elvira Swender

Provided by: nato152
Learn more at: https://www.natobilc.org
Transcript and Presenter's Notes

1
A Tale of Two Tests: STANAG and CEFR
Comparing the Results of side-by-side testing of reading proficiency
  • BILC Conference
  • May 2010
  • Istanbul, Turkey
  • Dr. Elvira Swender, ACTFL

2
With apologies to the author
3
With apologies to the author
We had a Dickens of a time with this
study.
4
Overview
  • Two systems: STANAG and CEFR
  • Two tests of reading proficiency
  • BAT-Reading
  • Leipzig Test of Reading Proficiency (LTRP)
  • The side-by-side study
  • Observations
  • Questions

5
Two Systems
6
Why is there a need to relate STANAG and CEFR?
  • To recognize linguistic abilities of military
    personnel in civilian society
  • To provide a framework to military institutions
    in nation states operating STANAG qualifications
    who need to equate them with CEFR for the purpose
    of gaining civilian recognition of military
    qualifications
  • To provide guidance to employers, trainers, and
    non-language experts on how to interpret/evaluate
    CEFR qualifications
  • To identify competence gaps and thereby determine
    whether an individual is capable of undertaking a
    job requiring a given SLP
  • To allow informed decisions to be made on
    appropriate linguistic competence

7
Birds of a Feather
8
Broad Questions?
  • Can the two systems be compared?
  • Are the two systems related?
  • Can the two systems be aligned?
  • Can the two systems be equated?

9
Comparing CEFR and STANAG: Similarities
Feature (CEFR / STANAG)
  • Describe language abilities on a scale from
    little or no ability to that of a highly
    articulate speaker
  • CEFR: A1, A2, B1, B2, C1, C2
  • STANAG: 0, 1, 1+, 2, 2+, 3, 3+, 4, 4+, 5
  • Criterion referenced
  • Address speaking, listening, reading, and writing
  • Contain can-do statements
  • Describe tasks (functions), contexts, and
    expectations for accuracy
  • All criteria, some of the time
  • All criteria, all of the time
10
A Summary of the Major Contrasts
CEFR / STANAG
  • The primary purpose is to check learners'
    progress in developing communicative competence
    within a specific course of study.
  • The primary purpose is to test individuals'
    general proficiency across a wide range of topics
    regardless of their course of study.
  • The primary users of the information are
    teachers, administrators, and employers.
  • The primary users of the information are the
    teachers and students.
  • By design, STANAG is under-specified for
    measuring step-by-step progress within a specific
    curriculum.
  • By design, the CEFR is under-specified for
    testing of general, real-world proficiency.

11
About this Study
  • University of Leipzig
  • April 19-23, 2010
  • Proctored on-line tests in computer lab
  • Goal was to involve five groups with 20
    participants each
  • Levels A1, A2, B1, B2, C1 according to course
    enrolled
  • Split test design
  • half of the participants in each group took the
    BAT-R test first, the other half took the RPT-E
    first
  • Tests taken on different days
  • 2 to 3 days apart depending on group
  • 90 minutes per test

12
Characteristics of Participants
  • Gender
  • Female 65%, Male 35%
  • Age
  • Average 25 (range 19-63)
  • First language
  • German (85%)
  • Arabic, Russian, Polish, Brazilian, Chinese, Thai
  • Mean of years of English study in school
  • German students 8.7 years
  • Foreign students 5.1 years
  • Enrolled in 1 of 5 different levels
  • English Language Institute to English teacher
    trainees

13
BAT Reading Test
  • Test of English reading proficiency
  • Advisory scores for calibrating national
    proficiency tests
  • STANAG 6001 (version 3), Levels 1, 2, 3
  • Internet-delivered and computer scored
  • Developed by BILC Test Working Group
  • Delivered by ACTFL

14
Format
  • Criterion-referenced tests
  • Allow for direct application of the STANAG
    Proficiency Scale
  • Texts and tasks are aligned by level
  • Each proficiency level is tested separately
  • Test takers take all items for Levels 1, 2, 3
  • 20 texts at each level
  • One item with 4 multiple choice responses per text

21
Scoring Criteria
  • The proficiency rating is assigned based on two
    separate scores
  • Floor: sustained ability across a range of
    tasks and contexts specific to one level
  • Ceiling: non-sustained ability at the next
    higher proficiency level
  • Must show mastery at a level to be assigned
    that level
  • Non-compensatory scoring
  • Performance at the next higher level provides
    evidence of random, emerging, or developing
    proficiency at that level.
  • Developing proficiency at the next higher level
    indicates a "+" rating.
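
The floor-and-ceiling logic on this slide can be sketched in Python. The slide gives no numeric cut-offs, so the 70% mastery threshold, the 50% "developing" threshold, the rule that a "+" requires a floor of at least 1, and the function name `assign_rating` are all hypothetical choices made for illustration. They are not the BAT-R's actual scoring rules.

```python
# Hypothetical cut-offs: the slides do not state the real BAT-R thresholds.
MASTERY = 0.70      # assumed share of items correct to "sustain" a level
DEVELOPING = 0.50   # assumed share at the next level that earns a "+"

def assign_rating(share_correct):
    """share_correct: shares of items correct for Levels 1, 2, 3, in order."""
    floor = 0
    for level, share in enumerate(share_correct, start=1):
        if share < MASTERY:
            break           # non-compensatory: a failed level stops the climb
        floor = level
    rating = str(floor)
    # Ceiling check: non-sustained ability at the next higher level.
    # Assumption: no "0+" rating, matching the ratings reported in this study.
    if 1 <= floor < len(share_correct) and share_correct[floor] >= DEVELOPING:
        rating += "+"
    return rating

print(assign_rating([0.90, 0.60, 0.20]))  # floor at 1, developing at 2
print(assign_rating([0.60, 0.95, 0.95]))  # strong Level 2 and 3 scores do not
                                          # compensate for a failed Level 1
```

Under these assumed thresholds, high scores at Levels 2 and 3 cannot raise the rating when Level 1 is not sustained, which is what non-compensatory scoring means here.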

22
Leipzig Test of Reading Proficiency
  • Test of English reading proficiency for entering
    and exiting students at universities in the state
    of Saxony/Germany
  • To determine proficiency levels from A1 to C1
    according to the CEFR
  • For placement and certification purposes
  • Entrance and exit requirements in all subjects
  • Developed by the University of Leipzig under a
    grant from the state of Saxony

23
Format
  • 5 texts with 3 questions each per level
  • 15 items per level
  • Multiple choice questions
  • one correct answer and three distracters
  • Entire Series of tests
  • Combine 2 or 3 adjoining levels
  • A1-B1 or B1-B2 or B1-C1
  • Version of the test used in this study
  • B1-C1

24
Level A1
  • 5 texts, 60-100 words each
  • Major tasks and functions
  • Topic recognition and comprehension of simple
    single facts
  • Content
  • Basic personal and social needs
  • Text type
  • Very short, simple, straightforward texts: notes,
    postcards, simple instructions and directions
  • 3 MC questions per text
  • Global, selective, detail

25
Screen shot of A1 item
  • to come (requested from Helen)

26
Level C1
  • 5 texts, 200-300 words each
  • Major tasks and functions
  • Complex information processing including
    inferences, hypotheses, and nuances
  • Content
  • Academic, professional, and literary material
  • Text type
  • Op/ed pieces, analyses and commentaries, detailed
    technical reports, literary texts
  • 3 MC questions per text
  • global, detail, inference

28
Scoring Criteria
  • Total number of points
  • Rate the highest levels that have a combined total
    of at least 18 points, with at least 11 points
    (70%) at the lower level
  • 18-24 points (60-80%): lower level
  • 25-30 points (81-100%): higher level
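
One possible reading of this scoring rule, sketched in Python. It assumes 15 points per level (as stated on the Format slide), the B1-C1 version of the test, and that adjoining level pairs are checked from the highest pair downward. The function name `ltrp_rating` and the `None` return for test takers who meet no threshold are illustrative assumptions, not part of the original test specification.

```python
def ltrp_rating(points, levels=("B1", "B2", "C1")):
    """points: correct answers (0-15) per level, in the order of `levels`."""
    # Check adjoining level pairs, starting with the highest pair.
    for i in range(len(levels) - 2, -1, -1):
        lower, higher = points[i], points[i + 1]
        combined = lower + higher
        if combined >= 18 and lower >= 11:
            # 18-24 points: lower level; 25-30 points: higher level.
            return levels[i + 1] if combined >= 25 else levels[i]
    return None  # assumption: no rating within the tested range

print(ltrp_rating([14, 12, 5]))   # B1+B2 = 26 points
print(ltrp_rating([13, 7, 2]))    # B1+B2 = 20 points
```

Unlike the floor-and-ceiling approach, this is a total-points rule: a strong score at one level of the pair can partly offset a weaker score at the other.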

29
Findings
30
BAT-R    A1   A2   B1   B2   C1   TOTAL
0         1    .    .    .    .       1
1         2    4    1    .    .       7
1+        .    4    6    .    .      10
2         .    1   16    6    3      26
2+        .    .    .    6    1       7
3         .    .    .    5   10      15
TOTAL     3    9   23   17   14      66
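
The crosstab counts can be checked and summarized with a short Python sketch. The placement of counts in columns, and the reading of the repeated row labels as 1+ and 2+, are inferred from the row and column totals. The names `CROSSTAB`, `column_totals`, `row_shares`, and `most_common_level` are illustrative.

```python
LTRP = ["A1", "A2", "B1", "B2", "C1"]
# Rows: BAT-R rating; values: number of participants at each LTRP level.
CROSSTAB = {
    "0":  [1, 0, 0, 0, 0],
    "1":  [2, 4, 1, 0, 0],
    "1+": [0, 4, 6, 0, 0],
    "2":  [0, 1, 16, 6, 3],
    "2+": [0, 0, 0, 6, 1],
    "3":  [0, 0, 0, 5, 10],
}

def column_totals(table):
    """Number of participants at each LTRP level."""
    return [sum(col) for col in zip(*table.values())]

def row_shares(table, rating):
    """Observed share of each LTRP level among one BAT-R rating."""
    row = table[rating]
    total = sum(row)
    return {lvl: round(n / total, 2) for lvl, n in zip(LTRP, row) if n}

def most_common_level(table, rating):
    """LTRP level held by the most participants with this BAT-R rating."""
    row = table[rating]
    return LTRP[row.index(max(row))]

print(column_totals(CROSSTAB))        # matches the TOTAL row
print(row_shares(CROSSTAB, "2"))      # spread of BAT-R 2 across LTRP levels
print(most_common_level(CROSSTAB, "2"))
```

These are observed shares among the 66 participants, so they describe this sample only.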
31
Scatter Plot of Total Raw Scores
(Axes: BAT-R Total Score and LTRP Total Score)
(Correlation of Total Raw Scores: r = .905, p < .001)
32
With the current data, one could say
  • At the lowest and highest ends of the scales
    there is alignment
  • No one who was rated 1 was also rated B2 or C1
  • No one who was rated 3 was rated A1, A2, or B1.
  • The middle ranges are where there is the least
    amount of alignment
  • A BAT-R 2 can be anything from A2 to C1

33
BAT-R    A1   A2   B1   B2   C1   TOTAL
0         1    .    .    .    .       1
1         2    4    1    .    .       7
1+        .    4    6    .    .      10
2         .    1   16    6    3      26
2+        .    .    .    6    1       7
3         .    .    .    5   10      15
TOTAL     3    9   23   17   14      66
34
With the current data, one could say
  • BAT-R: LTRP
  • 0: 0 or A1
  • 1: A1 or A2 (Mostly A2)
  • 1+: A2 or B1 (Mostly B1)
  • 2: A2, B1, B2, or C1 (Mostly B1)
  • 2+: B2 or C1 (Mostly B2)
  • 3: B2 or C1 (Mostly C1)

35
With the current data, one could say
  • LTRP: BAT-R
  • A1: 0 or 1 (Mostly 1)
  • A2: 1, 1+ or 2 (Mostly 1)
  • B1: 1 or 2 (Mostly 2)
  • B2: 2, 2+ or 3 (Mostly 2)
  • C1: 2, 2+ or 3 (Mostly 3)

36
Estimated Probability
Estimated Probability of a BAT-R Rating Based on LTRP Rating
LTRP Rating   BAT-R Rating
              0      1      1+     2      2+     3
0             0.93   0.07   .      .      .      .
A1            0.30   0.67   0.03   .      .      .
A2            0.01   0.49   0.40   0.09   .      .
B1            .      0.03   0.21   0.74   0.01   0.01
B2            .      .      0.01   0.57   0.23   0.18
C1            .      .      .      0.04   0.08   0.88
Shaded values are the highest probability on the row.
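
The table can be encoded as a lookup, as a minimal Python sketch. The reading of the repeated column labels as 1+ and 2+, the treatment of "." cells as 0.0, and the names `PROB`, `most_likely_bat_r`, `p_bat_r_given_ltrp`, and `row_sums` are assumptions made for illustration.

```python
BAT_R = ["0", "1", "1+", "2", "2+", "3"]
# Rows: LTRP rating; values: estimated probability of each BAT-R rating.
# "." cells on the slide are entered as 0.0 here (an assumption).
PROB = {
    "0":  [0.93, 0.07, 0.0, 0.0, 0.0, 0.0],
    "A1": [0.30, 0.67, 0.03, 0.0, 0.0, 0.0],
    "A2": [0.01, 0.49, 0.40, 0.09, 0.0, 0.0],
    "B1": [0.0, 0.03, 0.21, 0.74, 0.01, 0.01],
    "B2": [0.0, 0.0, 0.01, 0.57, 0.23, 0.18],
    "C1": [0.0, 0.0, 0.0, 0.04, 0.08, 0.88],
}

def most_likely_bat_r(ltrp):
    """BAT-R rating with the highest estimated probability on the row."""
    row = PROB[ltrp]
    return BAT_R[row.index(max(row))]

def p_bat_r_given_ltrp(bat_r, ltrp):
    """Estimated probability of a BAT-R rating, given an LTRP rating."""
    return PROB[ltrp][BAT_R.index(bat_r)]

def row_sums():
    """Each row should sum to about 1 (rounding leaves 0.99 in places)."""
    return {ltrp: round(sum(row), 2) for ltrp, row in PROB.items()}

print(most_likely_bat_r("B2"))
print(p_bat_r_given_ltrp("2", "B1"))
print(row_sums())
```

Because each row is conditioned on the LTRP rating, the rows sum to roughly 1. Values read down a single BAT-R column (for example 0.09, 0.74, 0.57, 0.04 for BAT-R 2) come from different rows and need not sum to 1.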
37
What is the probability?
  • That a BAT-R 2 is also an LTRP
  • A2: 9%
  • B1: 74%
  • B2: 57%
  • C1: 5%

38
What is the probability?
  • That a BAT-R 3 is also an LTRP
  • B1: 9%
  • B2: 18%
  • C1: 88%

39
What is the probability?
  • That an LTRP B1 is also a BAT-R
  • 1: 3%
  • 1+: 21%
  • 2: 74%
  • 2+: 1%
  • 3: 1%

40
What is the probability?
  • That an LTRP B2 is also a BAT-R
  • 1+: 1%
  • 2: 57%
  • 2+: 23%
  • 3: 18%

41
Answering the Broad Questions
Can the two systems be compared?
YES
Are the two systems related?
YES
Can the two systems be aligned?
Somewhat
Can the two systems be equated?
Probably not
42
Heat Chart
STANAG 6001
CEFR
43
When comparing testing systems
  • Ask about the purpose of the test
  • Placement, progress, prove a level, etc.
  • Ask about what the test is testing 
  • Is it a test of achievement, performance,
    proficiency? 
  • Does it test spontaneous abilities or rehearsed
    performance? 
  • Ask about how the test scores are determined
  • Non-compensatory
  • prove a floor and ceiling
  • Total points
  • Ask if research exists

44
Answers from a CEFR Expert
  • CEFR is not one system. It is NOT intended to be
    used to transfer scores from one country to the
    next or from one language to another but rather
    to set a framework within which educators can
    build curricula.
  • Not a harmonisation project
  • Alignment is problematic because we do not know
    what we are aligning. Not a matter of alignment
    or equivalency but a matter of relationship
  • The scale is an origin for comparison. The scale
    functions as exemplars and activities. The scale
    is a meta-framework for learning and teaching.
  • Conversation with Nick Saville,
  • Cambridge, England
  • April 15, 2010

45
In Closing
  • It is a far, far better thing that we do than we
    have ever done
  • to know how to use test scores.

46
Questions?
  • Contact eswender_at_actfl.org

47
Extra slides
48
Crosstabulation of Test Results