A Tale of Two Tests: STANAG and CEFR. Comparing the Results of Side-by-Side Testing of Reading Proficiency

Description: STANAG 6001 and the Common European Framework of Reference for Languages: Learning, teaching, assessment. Last modified by: Elvira Swender

Provided by: nato152

Learn more at: https://www.natobilc.org

1
A Tale of Two Tests: STANAG and CEFR
Comparing the Results of Side-by-Side Testing of Reading Proficiency

BILC Conference
May 2010
Istanbul, Turkey
Dr. Elvira Swender, ACTFL

2
With apologies to the author
3
With apologies to the author
We had a Dickens of a time with this
study.
4
Overview

Two systems: STANAG and CEFR
Two tests of reading proficiency
BAT-Reading
Leipzig Test of Reading Proficiency (LTRP)
The side-by-side study
Observations
Questions

5
Two Systems
6
Why is there a need to relate STANAG and CEFR?

To recognize linguistic abilities of military
personnel in civilian society
To provide a framework to military institutions
in nation states operating STANAG qualifications
who need to equate them with CEFR for the purpose
of gaining civilian recognition of military
qualifications
To provide guidance to employers, trainers,
non-language experts on how to interpret/evaluate
CEFR qualifications
To identify competence gaps and thereby determine
whether an individual is capable of undertaking a
job requiring a given SLP
To allow informed decisions to be made on
appropriate linguistic competence

7
Birds of a Feather
8
Broad Questions?

Can the two systems be compared?
Are the two systems related?
Can the two systems be aligned?
Can the two systems be equated?

9
Comparing CEFR and STANAG: Similarities

Feature CEFR STANAG

Describe language abilities on a scale from
little or no ability to that of a highly
articulate speaker
CEFR: A1, A2, B1, B2, C1, C2
STANAG: 0, 1, 1+, 2, 2+, 3, 3+, 4, 4+, 5
Criterion referenced
Address speaking, listening, reading, and writing
Contain can-do statements
Describe tasks (functions), contexts, and
expectations for accuracy
All criteria, some of the time
All criteria, all of the time
10
A Summary of the Major Contrasts
CEFR STANAG

The primary purpose is to check learners'
progress in developing communicative competence
within a specific course of study.

The primary purpose is to test individuals'
general proficiency across a wide range of topics
regardless of their course of study.

The primary users of the information are
teachers and administrators, employers.

The primary users of the information are the
teachers and students.

By design, STANAG is under-specified for
measuring step-by-step progress within a specific
curriculum.

By design, the CEFR is under-specified for
testing of general, real-world proficiency.

11
About this Study

University of Leipzig
April 19-23, 2010
Proctored on-line tests in computer lab
Goal was to involve five groups with 20
participants each
Levels A1, A2, B1, B2, C1 according to course
enrolled
Split test design
half of the participants in each group took the
BAT-R test first, the other half took the RPT-E
first
Tests taken on different days
2 to 3 days apart depending on group
90 minutes per test

12
Characteristics of Participants

Gender
Female 65%, Male 35%
Age
Average 25 (Range 19-63)
First language
German (85%)
Arabic, Russian, Polish, Brazilian, Chinese, Thai
Mean of years of English study in school
German students 8.7 years
Foreign students 5.1 years
Enrolled in 1 of 5 different levels
English Language Institute to English teacher
trainees

13
BAT Reading Test

Test of English reading proficiency
Advisory scores for calibrating national
proficiency tests
STANAG 6001 (version 3), Levels 1, 2, 3
Internet-delivered and computer scored
Developed by BILC Test Working Group
Delivered by ACTFL

14
Format

Criterion-referenced tests
Allow for direct application of the STANAG
Proficiency Scale
Texts and tasks are aligned by level
Each proficiency level is tested separately
Test takers take all items for Levels 1, 2, 3
20 texts at each level
One item with 4 multiple choice responses per text

21
Scoring Criteria

The proficiency rating is assigned based on two separate scores:
Floor: sustained ability across a range of tasks and contexts specific to one level
Ceiling: non-sustained ability at the next higher proficiency level
Test takers must show mastery at a level to be assigned that level
Non-compensatory scoring
Performance at the next higher level provides evidence of random, emerging, or developing proficiency at that level.
Developing proficiency at the next higher level indicates a "+" rating.
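
The floor-and-ceiling logic on this slide can be sketched in a few lines of Python. This is only an illustration: the slides do not give the cut-off scores, so the 70% "sustained" threshold, the 50% "developing" threshold, and the function name are hypothetical assumptions, not the BAT-R scoring rules.

```python
from typing import Dict

LEVELS = [1, 2, 3]  # levels tested by the BAT-R, per the slides


def floor_ceiling_rating(pct_correct: Dict[int, float],
                         sustained: float = 0.70,    # hypothetical mastery cut-off
                         developing: float = 0.50    # hypothetical "developing" cut-off
                         ) -> str:
    """Return a rating such as '1', '1+' or '2' from per-level scores (0.0 to 1.0)."""
    floor = 0
    for level in LEVELS:
        if pct_correct[level] >= sustained:
            floor = level
        else:
            # Non-compensatory: a strong score at a higher level
            # cannot make up for a level that was not mastered.
            break

    # Ceiling: look only at the level immediately above the floor.
    next_score = pct_correct.get(floor + 1, 0.0)
    plus = developing <= next_score < sustained
    return f"{floor}+" if plus else str(floor)
```

With these assumed thresholds, scores of 90% at Level 1, 55% at Level 2, and 80% at Level 3 yield "1+": the Level 3 score is ignored because Level 2 was not sustained.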

22
Leipzig Test of Reading Proficiency

Test of English reading proficiency for entering
and exiting students at universities in the state
of Saxony/Germany
To determine proficiency levels from A1 to C1
according to the CEFR
For placement and certification purposes
Entrance and exit requirements in all subjects
Developed by the University of Leipzig under a
grant from the state of Saxony

23
Format

5 texts with 3 questions each per level
15 items per level
Multiple choice questions
one correct answer and three distracters
Entire Series of tests
Combine 2 or 3 adjoining levels
A1-B1 or B1-B2 or B1-C1
Version of the test used in this study
B1-C1

24
Level A1

5 texts 60-100 words each
Major tasks and functions
Topic recognition and comprehension of simple
single facts
Content
Basic personal and social needs
Text type
Very short, simple, straightforward texts: notes,
postcards, simple instructions and directions
3 MC questions per text
Global, selective, detail

25
Screen shot of A1 item

To come (requested from Helen)

26
Level C1

5 texts 200-300 words each
Major tasks and functions
Complex information processing including
inferences, hypotheses, and nuances
Content
Academic, professional, and literary material
Text type
Op/ed pieces, analyses and commentaries, detailed
technical reports, literary texts
3 MC questions per text
global, detail, inference

28
Scoring Criteria

Total number of points
The rating is based on the highest pair of adjoining levels with a combined total of
at least 18 points, of which at least 11 points (70%) come from the lower level
18-24 points (60-80%): lower level
25-30 points (81-100%): higher level
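
The total-points rule above can be written out as a short Python sketch. It follows one reading of the slide: pairs of adjoining levels are checked from the highest pair down, and the 11-point minimum on the lower level applies to both outcomes. The function name and the input layout are hypothetical.

```python
from typing import List, Optional, Tuple


def ltrp_rating(points: List[Tuple[str, int]]) -> Optional[str]:
    """points: (level name, points out of 15), ordered from lowest to highest level."""
    pairs = list(zip(points, points[1:]))       # adjoining level pairs
    for (lo_name, lo_pts), (hi_name, hi_pts) in reversed(pairs):
        total = lo_pts + hi_pts                 # out of 30
        if total >= 18 and lo_pts >= 11:
            # 25-30 points: higher level; 18-24 points: lower level
            return hi_name if total >= 25 else lo_name
    return None                                 # no pair reached the minimum
```

For the B1-C1 version used in this study, `ltrp_rating([("B1", 14), ("B2", 12), ("C1", 5)])` returns "B2": the B2/C1 pair totals only 17 points, while the B1/B2 pair totals 26.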

29
Findings
30
BAT-R rating (rows) by LTRP rating (columns)
        A1   A2   B1   B2   C1   TOTAL
0        1    0    0    0    0       1
1        2    4    1    0    0       7
1+       0    4    6    0    0      10
2        0    1   16    6    3      26
2+       0    0    0    6    1       7
3        0    0    0    5   10      15
TOTAL    3    9   23   17   14      66
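
The cross-tabulation can be checked and summarized with a short Python sketch. It assumes the rows are BAT-R ratings 0, 1, 1+, 2, 2+, 3 and that the empty cells are zeros, with cell positions inferred from the row and column totals; the names below are illustrative only.

```python
LTRP_LEVELS = ["A1", "A2", "B1", "B2", "C1"]

# Counts per BAT-R rating (rows) and LTRP rating (columns),
# with cell positions inferred from the slide's totals.
COUNTS = {
    "0":  [1, 0, 0, 0, 0],
    "1":  [2, 4, 1, 0, 0],
    "1+": [0, 4, 6, 0, 0],
    "2":  [0, 1, 16, 6, 3],
    "2+": [0, 0, 0, 6, 1],
    "3":  [0, 0, 0, 5, 10],
}


def column_totals(counts):
    """Number of participants at each LTRP level."""
    return [sum(col) for col in zip(*counts.values())]


def row_shares(counts, bat_r):
    """Observed share of each LTRP level among participants with one BAT-R rating."""
    row = counts[bat_r]
    n = sum(row)
    return {level: c / n for level, c in zip(LTRP_LEVELS, row)}
```

The column totals come out as 3, 9, 23, 17, 14 (66 in all), matching the slide. `row_shares(COUNTS, "2")` shows that about 62% of the participants rated BAT-R 2 were rated B1 on the LTRP. These are raw proportions conditioned on the BAT-R rating, not the estimated probabilities reported later in the deck.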
31
Scatter Plot of Total Raw Scores
BAT-R Total Score
LTRP Total Score
(Correlation of Total Raw Scores: r = .905, p < .001)
32
With the current data, one could say

At the lowest and highest ends of the scales
there is alignment
No one who was rated 1 was also rated B2 or C1
No one who was rated 3 was rated A1, A2, or B1.
The middle ranges are where there is the least
amount of alignment
A BAT-R 2 can be anything from A2 to C1

34
With the current data, one could say

BAT-R   LTRP
0       0 or A1
1       A1 or A2 (Mostly A2)
1+      A2 or B1 (Mostly B1)
2       A2, B1, B2, or C1 (Mostly B1)
2+      B2 or C1 (Mostly B2)
3       B2 or C1 (Mostly C1)

35
With the current data, one could say

LTRP    BAT-R
A1      0 or 1 (Mostly 1)
A2      1, 1+ or 2 (Mostly 1)
B1      1+ or 2 (Mostly 2)
B2      2, 2+ or 3 (Mostly 2)
C1      2, 2+ or 3 (Mostly 3)

36
Estimated Probability
Estimated Probability of a BAT-R Rating Based on LTRP Rating

LTRP Rating   BAT-R Rating
              0      1      1+     2      2+     3
0             0.93*  0.07   .      .      .      .
A1            0.30   0.67*  0.03   .      .      .
A2            0.01   0.49*  0.40   0.09   .      .
B1            .      0.03   0.21   0.74*  0.01   0.01
B2            .      .      0.01   0.57*  0.23   0.18
C1            .      .      .      0.04   0.08   0.88*

* Highest probability on the row (shaded on the original slide).
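
A short Python sketch of how this table could be used as a lookup. It assumes the BAT-R columns are 0, 1, 1+, 2, 2+, 3 and treats the "." cells as 0.0; the names are illustrative and not part of the study.

```python
BAT_R = ["0", "1", "1+", "2", "2+", "3"]

# Estimated probability of each BAT-R rating, given the LTRP rating.
# Values are copied from the slide; "." cells are entered as 0.0.
P_BAT_GIVEN_LTRP = {
    "0":  [0.93, 0.07, 0.0, 0.0, 0.0, 0.0],
    "A1": [0.30, 0.67, 0.03, 0.0, 0.0, 0.0],
    "A2": [0.01, 0.49, 0.40, 0.09, 0.0, 0.0],
    "B1": [0.0, 0.03, 0.21, 0.74, 0.01, 0.01],
    "B2": [0.0, 0.0, 0.01, 0.57, 0.23, 0.18],
    "C1": [0.0, 0.0, 0.0, 0.04, 0.08, 0.88],
}


def most_likely_bat_r(ltrp):
    """Return (BAT-R rating, probability) for the highest value on the row."""
    row = P_BAT_GIVEN_LTRP[ltrp]
    best = max(range(len(row)), key=row.__getitem__)
    return BAT_R[best], row[best]
```

For example, `most_likely_bat_r("B2")` returns `("2", 0.57)`, which also shows how much probability remains spread over 2+ and 3 in the middle of the scale.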
37
What is the probability?

That a BAT-R 2 is also an LTRP
A2: 9%
B1: 74%
B2: 57%
C1: 5%

38
What is the probability?

That a BAT-R 3 is also an LTRP
B1: 9%
B2: 18%
C1: 88%

39
What is the probability?

That an LTRP B1 is also a BAT-R
1: 3%
1+: 21%
2: 74%
2+: 1%
3: 1%

40
What is the probability?

That an LTRP B2 is also a BAT-R
1+: 1%
2: 57%
2+: 23%
3: 18%

41
Answering the Broad Questions
Can the two systems be compared?
YES
Are the two systems related?
YES
Can the two systems be aligned?
Somewhat
Can the two systems be equated?
Probably not
42
Heat Chart
STANAG 6001
CEFR
43
When comparing testing systems

Ask about the purpose of the test
Placement, progress, prove a level, etc.
Ask about what the test is testing
Is it a test of achievement, performance,
proficiency?
Does it test spontaneous abilities or rehearsed
performance?
Ask about how the test scores are determined
Non-compensatory
prove a floor and ceiling
Total points
Ask if research exists

44
Answers from a CEFR Expert

CEFR is not one system. It is NOT intended to be
used to transfer scores from one country to the
next or from one language to another but rather
to set a framework within which educators can
build curricula.
Not a harmonisation project
Alignment is problematic because we do not know
what we are aligning. Not a matter of alignment
or equivalency but a matter of relationship
The scale is an origin for comparison. The scale
functions as exemplars and activities. The scale
is a meta-framework for learning and teaching.
Conversation with Nick Saville,
Cambridge, England
April 15, 2010