Effects of Item Content Characteristics on Item Difficulty of Multiple Choice Test Items in an EFL Listening Assessment (PowerPoint transcript)
1
Effects of Item Content Characteristics on Item
Difficulty of Multiple Choice Test Items in an
EFL Listening Assessment
Ikkyu Choi, University of California, Los Angeles
  • ECOLT 2010, October 30, 2010

2
Background
  • Korean College Scholastic Ability Test (CSAT)
  • one of the main criteria in the university
    student selection process
  • the highest-stakes test administered in Korea
  • several characteristics distinguishing it from
    its predecessors, including the introduction of
    a dedicated English listening section (consisting
    of multiple choice items)

3
Background
  • One thorny problem: the Listening Section
  • much easier than both its reading counterpart
    and the pre-set target standards (Cha, 1997;
    Kim, 2001; Lee, 2001)
  • low item discrimination (Kim, 2001)
  • -> a need to increase the difficulty level of
    the English Listening Comprehension (ELC) items

4
The Purpose of the Study
  • To identify the variables, and their underlying
    factor structure, that affect the difficulty of
    multiple choice test items such as those adopted
    in the CSAT listening section

5
Research Questions
  • What are the characteristics of CSAT-type
    multiple choice ELC test items, and how are
    those characteristics related?
  • What relationships exist between item content
    characteristics and item difficulty?

6
Review of Literature
  • In Free-Response Assessment Contexts
  • Buck and Tatsuoka (1998) identified 15 item
    content characteristics and 14 interactions among
    them as meaningful predictors of task difficulty
  • Brindley and Slatyer (2002) controlled item
    difficulty by manipulating some item content
    characteristics
  • Carr (2006) constructed a model that accounts for
    item difficulty in a reading comprehension
    context

7
Review of Literature
  • In TOEFL Listening Contexts
  • Freedle and Kostin (1996): 14 variables, including
    the type of topic, the required degree of
    inference, and the location of information, were
    significant in predicting item difficulty
  • Nissan, DeVincenzi, and Tang (1996): five
    meaningful predictors of item difficulty,
    including the frequency of negatives and
    infrequent vocabulary, and how familiar the
    speakers' roles were
  • Kostin (2004): 14 significant predictors, most of
    which had been found significant in the two
    earlier studies

8
Review of Literature
  • In the CSAT Context
  • Lee et al. (2003) and Chang (2004): the degree of
    inference, grammatical competence and time
    required to answer the item, the number of
    attractive distractors and their degree of
    attractiveness, and the level of grammar involved
    in the item (for the reading section)
  • Jin and Park (2004): 14 meaningful predictors of
    CSAT English test item difficulty

9
Methodology
  • Participants
  • Test takers: 1,280 Korean middle- and high-school
    students
  • Item content raters: 2 graduate students
    majoring in English education
  • Test Items
  • 120 items from 78 CSAT preparation examinations
    (4 matched forms, 30 items each)
  • each item involved a conversation between a male
    and a female speaker, and required test takers to
    identify specific information from the
    conversation
  • each item had two sub-questions, which asked test
    takers to indicate their level of confidence in
    getting the item right and their degree of
    comprehension of the stimulus

10
Methodology
  • Item Content Variables
  • variables expected or found to be influential on
    test taker performance, in theory (e.g., Brown et
    al., 1984; Rost, 2002) and in relevant empirical
    studies (e.g., Freedle & Kostin, 1993; Kostin,
    2004)
  • 27 item characteristic variables were selected
  • divided into 6 groups according to their
    characteristics: Word Level, Sentence Level, Key
    Sentence, Discourse Level, Item Level, and
    Item/Stimulus Overlap

11
Methodology
  • Content Rating Instruments
  • taken directly from, or derived from, those used
    by Bachman (1990), Bachman, Davidson, Ryan, and
    Choi (1995), Bachman, Davidson, and Milanovic
    (1996), Buck and Tatsuoka (1998), Freedle and
    Kostin (1993), Kostin (2004), Carr (2006), and
    Nissan, DeVincenzi, and Tang (1996)
  • classified into three categories (Carr, 2006),
    namely counting, calculating, and judging, in
    terms of appropriate measurement procedures

12
Excerpt from the Rating Instrument

  • WLNIDW (Word; counted): number of words in the
    stimulus not listed in middle school English
    textbooks
  • WLNWMS (Word; counted): number of words in the
    stimulus containing more than three syllables
  • WLNIMV (Word; counted): number of
    idiomatic/multiword verbs
  • WLAWL (Word; calculated): average word length in
    characters
  • WLDIF (Word; calculated): judged relevance of the
    words not listed in middle school English
    textbooks to key information of the stimulus
  • SLNDC (Sentence; counted): number of dependent
    clauses in the stimulus
  • SLDIF (Sentence; calculated): Flesch-Kincaid
    Grade Level of the stimulus
  • SLNWCR (Sentence; counted): number of
    within-sentence referential expressions in the
    stimulus
  • SLNBCR (Sentence; counted): number of
    between-sentence referential expressions in the
    stimulus
  • KSLOC (Key Sentence; judged): key sentence
    location (more difficult when located in the
    middle)
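
To make the counted and calculated categories concrete, here is a
minimal illustrative sketch, not the study's actual instrument, of how
three of the variables above (WLNWMS, WLAWL, SLDIF) could be computed
from a stimulus transcript. The syllable counter is a naive vowel-group
heuristic and the example sentence is invented:

```python
# Illustrative only: naive operationalizations of three variables from
# the rating instrument above. A real rating would use vetted word
# lists and a pronunciation dictionary for syllable counts.
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: one syllable per group of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def rate_stimulus(stimulus: str) -> dict:
    words = re.findall(r"[A-Za-z']+", stimulus)
    sentences = [s for s in re.split(r"[.!?]+", stimulus) if s.strip()]
    syllables = sum(count_syllables(w) for w in words)
    return {
        # WLNWMS: words containing more than three syllables
        "WLNWMS": sum(1 for w in words if count_syllables(w) > 3),
        # WLAWL: average word length in characters
        "WLAWL": sum(len(w) for w in words) / len(words),
        # SLDIF: Flesch-Kincaid Grade Level of the stimulus
        "SLDIF": 0.39 * len(words) / len(sentences)
                 + 11.8 * syllables / len(words) - 15.59,
    }

print(rate_stimulus("Would you mind turning down the music? "
                    "I am preparing for tomorrow's examination."))
```
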
13
Data Analysis
  • Item Content Analysis
  • inter-rater reliability for ratings of judged
    variables: r = .84
  • descriptive statistics including means, standard
    deviations, minimum and maximum values, skewness,
    and kurtosis
  • Item Difficulty Estimation (see the sketch below)
  • test taker performance: the proportion of test
    takers who did not provide the correct response
  • degree of confidence: the average of responses
    to the first sub-question
  • degree of comprehension: the average of responses
    to the second sub-question
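
A minimal sketch of how the three per-item indicators described above
could be computed, assuming a scored response matrix (1 = correct,
0 = incorrect) and sub-question answers on a numeric scale. The array
shapes follow the study (1,280 test takers, 120 items), but the data
here are randomly generated placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
scored = rng.integers(0, 2, size=(1280, 120))         # 1 = correct answer
confidence = rng.integers(1, 6, size=(1280, 120))     # first sub-question
comprehension = rng.integers(1, 6, size=(1280, 120))  # second sub-question

# Item difficulty: proportion of test takers who did NOT answer correctly.
difficulty = 1.0 - scored.mean(axis=0)

# Confidence and comprehension indicators: per-item averages.
confidence_indicator = confidence.mean(axis=0)
comprehension_indicator = comprehension.mean(axis=0)

print(difficulty.shape, difficulty[:3].round(2))
```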

14
Data Analysis
  • Initial Model 1

15
Data Analysis
  • Initial Model 2

16
Data Analysis
  • Initial Model 3

17
Results
  • Item Content Characteristics
  • difficult words (words not included in middle
    school textbooks) were used infrequently
  • the stems and options of the ELC items showed
    very limited variability
  • simple counts of matches between the options and
    the stimulus could diverge from the difficulty
    test takers actually faced, due to the nature of
    the overlap
  • some key sentences were recorded at a high speech
    rate, but this could be compensated for by the
    hints and repetitions often found in the stimulus

18
Results
  • Item Difficulty
  • test taker performance: close to a normal
    distribution
  • confidence and comprehension indicators: close to
    a normal distribution
  • linear dependency between the confidence and
    comprehension indicators (r = .989)
  • -> To avoid multicollinearity, only the
    comprehension indicator was retained (see the
    sketch below).
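
A small sketch of the retention decision described above, with
synthetic per-item averages standing in for the real indicators (the
study reported r = .989 between them); the cutoff value is
illustrative, not from the study:

```python
import numpy as np

rng = np.random.default_rng(1)
comprehension_ind = rng.uniform(2.0, 4.5, size=120)    # per-item means
confidence_ind = comprehension_ind + rng.normal(0.0, 0.05, 120)

# Pearson correlation between the two indicators across the 120 items.
r = np.corrcoef(confidence_ind, comprehension_ind)[0, 1]
print(f"r = {r:.3f}")

# With near-perfect correlation, keep only one indicator to avoid
# multicollinearity; the study retained the comprehension indicator.
if abs(r) > 0.95:  # illustrative cutoff
    retained = comprehension_ind
```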

19
Results
  • Candidate Model 1

20
Results
  • Candidate Model 2

21
Results
  • Candidate Model 3

22
Results
  • Model Fit

Model  Chi-square (df, p)   CFI    NNFI   SRMR   RMSEA
1      34.22 (29, p = .23)  .99    .98    .058   .026
2      39.49 (38, p = .40)  1.00   .99    .055   .012
3      21.00 (17, p = .23)  .99    .98    .062   .039

-> All three models showed good fit to the data.
Considering goodness of fit, practicality, and
interpretability, the third model, which accounted
for item difficulty with stimulus complexity and
item/stimulus overlap, was chosen as the final
model. (Standard definitions of the reported fit
indices are sketched below.)
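
For reference, the chi-square-based fit indices reported above have
standard definitions; a sketch of the usual formulas (subscript M =
fitted model, B = baseline model, N = sample size; exact conventions
vary slightly across software packages):

```latex
\mathrm{RMSEA} = \sqrt{\frac{\max\!\left(\chi^2_M - df_M,\, 0\right)}{df_M\,(N-1)}}
\qquad
\mathrm{CFI} = 1 - \frac{\max\!\left(\chi^2_M - df_M,\, 0\right)}
                        {\max\!\left(\chi^2_B - df_B,\; \chi^2_M - df_M,\; 0\right)}
```
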
23
Implications
  • The frequency of difficult words in a stimulus
    could be utilized as an effective means of item
    difficulty control.
  • While counts of surface matches between a
    stimulus and its options could indicate high
    difficulty for a given item, judged ratings of
    the degree of overlap could point in the
    opposite direction.

24
Limitations
  • a small sample of 120 items made the results of
    the covariance structure analysis unstable
  • a small number of raters
  • a rather simplistic, linear model of ELC item
    difficulty that did not take test taker
    characteristics into account

25
  • Thank You!!!