Dávid Gergely: Building a Case for Euro Examinations, or A Case Study

Transcript and Presenter's Notes

1
Dávid Gergely: Building a Case for Euro
Examinations, or A Case Study
2

The Mission of the Study
  • Piloting the Manual and assessing how sound the
    linking methodology is
  • Getting initial measures for items and tasks
    calibrated to the CEF
  • Establishing a link between Euro examinations and
    the CEF
  • In sum: validate the test by following the
    methodology outlined in the Manual, and build a
    case for the CEF link by collecting validity
    evidence.

3
Initial decisions by Euro
  • Management decision to select the GramVoc paper only
  • A question of finances
  • North: "The most difficult task you could pick"
  • An unpopular kind of test
  • The Dutch CEF Construct project focused on
    reading and listening
  • ALTE produced grids for speaking and listening
  • Are any CEF scales relevant to the GramVoc paper?

4
Productive orientation of the CEF
  • General Linguistic Range, B2:
  • "Has a sufficient range of language to be able to
    give clear descriptions, express viewpoints and
    develop arguments without much conspicuous
    searching for words, using some complex sentence
    forms to do so."

5
In retrospect
  • Advantages of selecting the GramVoc paper:
  • The knowledge of language underlies all other
    skills in the examination
  • The GramVoc project serves as a pilot for the rest
    of the Euro papers: as part of the efforts of the
    Hungarian Accreditation Board, Euro Examinations
    will run level-setting exercises for all
    skills-based Euro papers.

6
Process and Audience for the Case Study
  • Four phases of action, according to the Manual:
  • Familiarization
  • Specification
  • Standardization (of judgements)
  • Empirical validation
  • Working with the team of full-time item writers
    as holders of standards

7
The Familiarization Phase 1
  • Survey of familiarity with the CEF scales
  • Descriptors from 15 scales, 133 items,
    administered as a test
  • Statistical analyses (see the sketch after this
    list)
  • Initial facility value of responses: 0.4
  • Low? How low?
  • For 16 of 133 descriptors nobody got the level
    right; significantly more of these were B1
    descriptors
  • No descriptor vs. same descriptor problem
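A facility value here is just the proportion of judges who placed a
descriptor at its calibrated CEF level. A minimal sketch of the
computation, on an invented 0/1 response matrix rather than the
study's data:

import numpy as np

# Rows are descriptors, columns are judges; 1 = judge identified the
# calibrated CEF level correctly, 0 = missed it (invented data).
responses = np.array([
    [1, 0, 1, 0, 1],   # descriptor 1: 3 of 5 judges correct
    [0, 0, 0, 0, 0],   # descriptor 2: nobody got the level right
    [1, 1, 1, 0, 0],   # descriptor 3
])

facility = responses.mean(axis=1)                    # per-descriptor values
print("facility values:", facility)                  # [0.6 0.  0.6]
print("mean facility:", round(facility.mean(), 2))   # 0.4, cf. the slide
print("zero-facility descriptors:", int((facility == 0).sum()))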

8
The Familiarization Phase 2
  • Insights from categorizing descriptors:
  • No correct identification of level, spread of
    responses: 16 descriptors
  • Fewer than 50% of the team correctly identified
    the level: 55 descriptors
  • 50% or more correctly identified the level: 62
    descriptors
  • In cases of uncertainty, a tendency to place the
    level of a descriptor higher than in the CEF.
    Lower Euro standards? Leniency?
  • Chi-squares: leniency not related to any of the
    scales, but it is related to level B2 (see the
    sketch after this list).
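A chi-square test of independence is one way to check whether lenient
placements cluster at particular levels. A minimal sketch with an
invented contingency table (the slides do not reproduce the study's
figures):

from scipy.stats import chi2_contingency

# Hypothetical counts of lenient vs. on-target placements by CEF level.
table = [
    # lenient, on-target
    [12, 30],   # A2
    [18, 25],   # B1
    [31, 14],   # B2  (more lenient placements here)
    [10, 27],   # C1
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
# A small p says leniency is associated with level, in line with the
# slide's finding that leniency is tied to B2.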

9
The Specification Phase: a qualitative content
audit
  • Lack of yardsticks for a test like GramVoc
  • The van Ek and Trim volumes not useful
  • The CEF provides descriptions of 15 categories,
    but without level specification (pp. 108-117)
  • Euro specifications need attention
  • Two lines of work:
  • Elucidating item-writers' concepts
  • Expert analysis of what (item focuses) actually
    goes into the test, on the basis of scope,
    gradation and stability between two consecutive
    test administrations

10
Specification Phase 2: Elucidating item-writers'
concepts
  • Are item-writers' conceptualisations of levels
    coherent? In line with the CEF? In line with Euro
    specifications?
  • Item writers select the best task for each task
    type and level, then answer: "What is it that
    makes this task the best for you?"
  • Series of workshops to bring item-writers'
    conceptualisations to light

11
Specification Phase 3: Expert analysis of item
focuses
  • Evidence of construct under-representation? Is
    anything measured other than the construct? Do any
    items generate construct-irrelevant variance?
  • Two experts identify item focuses, then jointly
    finalize the classification of items according to
    the 15 CEF categories
  • Predict problematic items

12
Results: Specification Phase
  • Item-writers' concepts broadly match the CEF;
    better overall results than in the familiarization
    phase
  • Statistical analysis of expert classifications:
  • Distribution of focuses related to task type and
    author (text), but not related to level or
    administration
  • Results similar when two administrations at the
    same level were compared
  • The lack of significant focus differences by level
    prompted an investigation of item complexity;
    statistical test inconclusive (p > 0.05)

13
The Standardization of Judgements, Line 1
  • Investigating the gap between local Euro
    standards and the CEF standards
  • Item-writers identified, on the basis of
    collations, descriptors whose content exceeded
    local standards
  • Tabulation and qualitative analysis of responses;
    history of descriptors taken into account
  • The gap does not widen up the CEF scale. It is
    most conspicuous at B2, but less considerable once
    descriptor history is accounted for.
  • Why do the uncalibrated descriptors represent a
    higher level of requirements than those that went
    through calibration?

14
Standardization of Judgements, Line 2: Video
rating conference
  • CEF performance samples: link to North's rating
    conference (1996/2000)
  • A second-best option, and its problems
  • How similar were the ratings of the Euro
    item-writers to each other's?
  • Encouraging results (see the sketch after this
    list):
  • Reliability of scale use: Cronbach's alpha = 0.96
  • Kendall's W = 0.85
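Cronbach's alpha treats the raters as items and asks how consistently
they scale the same performances. A minimal sketch, computed on an
invented (performances x raters) matrix rather than the conference
data:

import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (performances x raters) score matrix."""
    k = scores.shape[1]                          # number of raters
    rater_vars = scores.var(axis=0, ddof=1)      # variance per rater
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of sum scores
    return (k / (k - 1)) * (1 - rater_vars.sum() / total_var)

# Invented scores: 6 video performances rated on a 9-band scale by
# 4 item-writers.
scores = np.array([
    [4, 4, 5, 4],
    [6, 7, 6, 6],
    [3, 3, 3, 4],
    [8, 8, 7, 8],
    [5, 5, 6, 5],
    [7, 6, 7, 7],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")   # high when raters agree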

15
Standardization of Judgements, Line 3: Standard
Setting
  • About 20 scripts per level, for both the 2003 and
    2004 tests
  • An examinee-based method: scripts carefully
    chosen, arranged in decreasing order of ability
  • Overfitting candidates
  • Info about items
  • Rating done twice, bearing in mind:
  • Round 1: conventional Euro standards; Kendall's W
    ranged 0.80-0.83
  • Round 2: CEF standards; Kendall's W ranged
    0.75-0.79 (computation sketched after this list)
  • Results provided additional info about Line 1
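Kendall's W (coefficient of concordance) measures how far the raters
agree on the rank order of the scripts, from 0 (no agreement) to 1
(identical orderings). A minimal sketch, without tie correction and on
invented ratings:

import numpy as np
from scipy.stats import rankdata

def kendalls_w(scores: np.ndarray) -> float:
    """Kendall's W for a (scripts x raters) matrix, no tie correction."""
    n, m = scores.shape
    ranks = np.apply_along_axis(rankdata, 0, scores)  # rank scripts per rater
    rank_sums = ranks.sum(axis=1)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12 * s / (m**2 * (n**3 - n))

# Invented data: 8 scripts in roughly decreasing order of ability,
# rated by 5 judges (noise added around a common ordering).
rng = np.random.default_rng(0)
true_ability = np.arange(8, 0, -1, dtype=float)              # 8 scripts
scores = true_ability[:, None] + rng.normal(0, 0.8, (8, 5))  # 5 raters
print(f"W = {kendalls_w(scores):.2f}")   # approaches 1 as raters agree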

16
Empirical Validation Phase
  • Empirical validation started very early
  • Internal validation: item analyses
  • Independent analyses
  • Joint analyses of same-level papers
  • External validation (the anchoring step is
    sketched after this list):
  • Use standard-setting data from the
    Standardization phase as ratings
  • Calibrate overall test difficulties
  • Anchor the item means of the independent analyses
    to the calibrated overall test difficulties
  • Use a corrected version of North's scale
  • Compare the cutoffs obtained in this way with the
    conventional Euro cutoffs
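The anchoring step can be read as a scale alignment: each
independently analysed paper yields item difficulties on its own
arbitrary scale, and shifting each set so that its item mean equals
the overall test difficulty calibrated from the standard-setting
ratings puts all items on one common scale. A minimal sketch, assuming
Rasch-style logit difficulties with all numbers invented:

import numpy as np

independent_items = {
    "2003": np.array([-0.8, -0.2, 0.1, 0.4, 0.9]),   # local logits
    "2004": np.array([-1.1, -0.3, 0.0, 0.5, 1.0]),
}
overall_difficulty = {"2003": 0.35, "2004": 0.55}    # assumed joint values

anchored = {
    admin: items - items.mean() + overall_difficulty[admin]
    for admin, items in independent_items.items()
}
for admin, items in anchored.items():
    print(admin, np.round(items, 2))   # both papers now on one scale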

17
  • Thank you.