Dávid Gergely: Building a Case for Euro Examinations or A case study

Slides: 18

Provided by: CETT1

Transcript and Presenter's Notes

1
Dávid Gergely: Building a Case for Euro
Examinations or A case study
2

The Mission of the Study

Piloting the Manual and seeing how good the
methodology of linking is.
Getting initial measures for items and tasks
calibrated to the CEF
Establishing a link for Euro examinations with
the CEF.
In sum, validate the test by following the
methodology outlined in the Manual. Build a case
for the CEF link by collecting validity evidence.

3
Initial decisions by Euro

Management decision to select GramVoc only:
a question of finances
North: "The most difficult task you could pick"
Unpopular kind of test
The Dutch CEF Construct project focused on
reading and listening
ALTE produced grids for speaking and listening
Any CEF scales relevant to the GramVoc paper?

4
Productive orientation of CEF

General Linguistic Range, B2:
Has a sufficient range of language to be able to
give clear descriptions, express viewpoints and
develop arguments without much conspicuous
searching for words, using some complex sentence
forms to do so.

5
In retrospect

Advantages of selecting the GramVoc paper:
The knowledge of language underlies all other
skills in the examination.
GramVoc project as pilot for the rest of the Euro
papers. As part of the efforts of the Hungarian
Accreditation Board, Euro Examinations will do
level-setting exercises for all skills-based Euro
papers.

6
Process and Audience for the Case Study

Four phases of action according to the Manual
Familiarization
Specification
Standardization (of judgements)
Empirical validation
Working with the team of full-time item writers
as holders of standards

7
The Familiarization Phase 1

Survey of familiarity with the CEF scales
Descriptors from 15 scales, 133 items, as in a
test
Statistical analyses
Initial facility value of responses: 0.4
Low? How low?
For 16 of the 133 descriptors, nobody got the level right.
Significantly more B1 descriptors.
No descriptor -- same descriptor problem
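
The facility value mentioned above is the proportion of responses that match the keyed answer, here the CEF level of a descriptor. A minimal sketch in Python follows. The function name and the sample judgements are hypothetical, invented for illustration, and are not data from the study.

```python
def facility_value(responses, key):
    """Proportion of responses that match the keyed CEF level."""
    if not responses:
        raise ValueError("no responses to score")
    return sum(r == key for r in responses) / len(responses)


# Hypothetical judgements by five item writers for one descriptor keyed at B1.
judgements = ["B1", "B2", "B2", "B1", "C1"]
fv = facility_value(judgements, "B1")

# A descriptor where nobody got the level right has a facility value of 0.
nobody_right = facility_value(["B2", "B2", "C1"], "B1")
```

Averaging such values over all descriptors and respondents would give an overall figure comparable in kind to the 0.4 reported on the slide.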

8
The Familiarization Phase 2

Insights from categorizing descriptors
No correct identification of level, spread of
responses: 16
&lt;50% of team correctly identified level: 55
50% or more correctly identified level: 62
In cases of uncertainty, tendency to place level of
descriptor higher than in CEF. Lower Euro
standards? Leniency?
Chi-squares: leniency not related to any of the
scales, but it is related to level B2.
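
The chi-square comparison above tests whether two categorical variables (for example, descriptor level and lenient placement) are independent. As a rough sketch, the Pearson statistic can be computed from a contingency table as below. The table layout and the counts are hypothetical; the slides do not report the actual tables.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows = B2 descriptors vs. all other levels,
# columns = placed higher than the CEF level vs. not placed higher.
counts = [[20, 10], [10, 20]]
chi2, dof = chi_square_statistic(counts)
```

The statistic would then be compared with the chi-square distribution for the given degrees of freedom to obtain a p-value, a step this sketch leaves out.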

9
The Specification Phase: a qualitative content
audit

Lack of yardsticks for a test like GramVoc
Van Ek and Trim volumes not useful.
CEF provides description of 15 categories, but
without level specification (pp. 108-117).
Euro specifications need attention.
Two lines of work:
Elucidating item-writers' concepts
Expert analysis of what (item focuses) actually
goes into the test on the basis of the scope, the
gradation and stability between 2 consecutive
test administrations.

10
Specification Phase 2: Elucidating item-writers'
concepts

Item-writers' conceptualisations of levels:
coherent? In line with CEF? In line with Euro
specifications?
Item writers select the best task for each task
type and level. Answer: "What is it that makes
this task the best for you?"
Series of workshops to bring item-writers'
conceptualisations to light.

11
Specification Phase 3: Expert analysis of item
focuses

Evidence of construct under-representation?
Anything else measured, other than the construct?
Items to generate construct-irrelevant variance?
2 experts identify item focuses, then jointly
finalize classification of items acc. to 15 CEF
categories.
Predict problematic items.

12
Results: Specification Phase

Item-writers' concepts broadly match CEF. Better
overall results than in familiarization phase.
Statistical analysis of expert classifications:
Distribution of focuses related to task type and
author (text), but not related to level and
administration.
Results similar when two administrations at the
same level were compared.
Lack of significant focus differences by level
prompted investigation of item complexity.
Statistical test inconclusive p 0.05

13
The Standardization of Judgements: Line 1

Investigating the gap between Local Euro
standards and the CEF standards
Item-writers identified, on the basis of
collations, descriptors the content of which
exceeded local standards.
Tabulation and qualitative analysis of responses.
History of descriptors taken into account.
The gap does not widen up the CEF scale. Most
conspicuous at B2, but less considerable if
descriptor history is accounted for.
Why do the uncalibrated descriptors represent a
higher level of requirements than those that went
through calibration?

14
Standardization of Judgements, Line 2: Video
rating conference

CEF Performance Samples: link to North's rating
conference (1996/2000)
A second-best option and problems
How similar were the ratings of the Euro
item-writers to each other's?
Encouraging results
Reliability of scale use: Cronbach's alpha 0.96,
Kendall's W 0.85
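
Both indices above summarize how consistently a panel of raters orders or scores the same performances. The sketch below shows one common way to compute them, assuming a complete raters-by-samples matrix with no tied ranks for Kendall's W and treating raters as the "items" for Cronbach's alpha. The function names and the data are hypothetical and are not taken from the rating conference.

```python
from statistics import variance


def kendalls_w(ranks):
    """Kendall's coefficient of concordance for m raters ranking n samples.

    Assumes each inner list is one rater's ranks 1..n with no ties.
    """
    m = len(ranks)
    n = len(ranks[0])
    rank_sums = [sum(col) for col in zip(*ranks)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def cronbach_alpha(ratings):
    """Cronbach's alpha with raters treated as items and samples as cases."""
    k = len(ratings)
    sum_of_rater_variances = sum(variance(r) for r in ratings)
    totals = [sum(col) for col in zip(*ratings)]
    return k / (k - 1) * (1 - sum_of_rater_variances / variance(totals))


# Hypothetical data: three raters, four video samples.
# The third rater swaps the order of the first two samples.
panel = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [2, 1, 3, 4],
]
w = kendalls_w(panel)
alpha = cronbach_alpha(panel)
```

Real rating data usually contain ties, in which case average ranks and a tie correction to W would be needed; that refinement is omitted here.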

15
Standardization of Judgements, Line 3: Standard
Setting

With about 20 scripts per level for both the
2003 and 2004 tests
An examinee-based method. Scripts carefully
chosen, arranged in decreasing order of ability
Overfitting candidates
Info about items
Rating done twice, bearing in mind:
Round 1: conventional Euro standards, Kendall's W
ranged 0.8 - 0.83
Round 2: CEF standards, Kendall's W ranged 0.75 -
0.79
Results provided additional info about Line 1

16
Empirical Validation Phase

Empirical validation started very early
Internal validation: item analyses
Independent analyses
Joint analyses of same level papers
External validation
Using standard setting data from the
Standardization phase as ratings
Calibrate overall test difficulties
Anchor item means of independent analyses to
calibrated overall test difficulties
Use a corrected version of North's scale
Compare cutoffs obtained in this way with
conventional Euro cutoffs.

Thank you.
