Title: Are all questions created equal?: Factors that influence cloze question difficulty.
1. Are all questions created equal? Factors that influence cloze question difficulty.
- Brooke Soden Hensler
  Carnegie Mellon University
  (starting graduate school at Florida Center for Reading Research this Fall)
- Joseph E. Beck
  Carnegie Mellon University
Society for the Scientific Study of Reading
July 2006
Funding: National Science Foundation
2. Why Look at Multiple Choice Cloze Questions?
- Multiple choice cloze questions are widely used assessments of comprehension
- Problem: outcome measure is typically binary (little information about student).
- Goal: use multiple choice cloze questions to
  - More accurately assess students
  - Track student reading development
  - Better understand what makes cloze questions hard
3. Project LISTEN's Computer Reading Tutor (Mostow & Aist, 2001)
- Automated
- Students use throughout year
- Accompanying paper standardized test scores (pre & post)
4. Student is reading a story aloud to the Reading Tutor
5. A question appears: Reading Tutor reads both Question and Response Choices. (Mostow, et al., 2004)
6. Student resumes reading story aloud to the Reading Tutor
7. Reading Tutor Advantages
- Well-specified, unbiased question construction (randomly generated)
- Questions automatically administered, scored, recorded
- Longitudinal collection over school year
- Large N (students & questions)
8. How many Qs from Whom? Data Description
- 81,175 Questions
- 1042 Students
- Median number of questions answered per student: 11
  - (Many students were infrequent users of the tutor)
- 2001-02 & 2002-03 School years
- Diverse population in Pittsburgh area
9. Research Questions
- Is a particular part of speech (e.g., nouns, verbs, etc.) more difficult for students?
  - If nouns are learned first (Gentner, 1982; Golinkoff, et al., 2000), might students be more proficient at answering noun questions?
- Which factors influence question difficulty?
- How can we better assess students using multiple choice cloze questions?
  - Vocabulary researchers have given partial credit for correct part of speech (e.g., Schwanenflugel, et al., 1997)
10. Approach
- Build logistic regression model to predict individual question performance
  - Terms in model: student identity, part of speech of answer, properties of question (e.g., question length)
- Advantages of modeling approach
  - Simultaneously estimates impact of question properties and student proficiency on question performance
  - Makes use of all 80k questions
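The model described above can be written out as a sketch. The slides do not give the exact parameterization, so the symbols below are assumptions for illustration: s indexes students, q indexes questions, and each beta is a fitted coefficient for one of the listed terms.

```latex
\[
\log \frac{P(\mathrm{correct}_{s,q})}{1 - P(\mathrm{correct}_{s,q})}
  = \beta_{s}
  + \beta_{\mathrm{POS}(q)}
  + \beta_{\mathrm{len}} \cdot \mathrm{length}_{q}
  + \dots
\]
```

Here the student term captures proficiency and the remaining terms capture question properties, which is what lets a single fit estimate both at once.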
11-12. Effect of Parts of Speech
Nouns < Verbs < Adjectives < Adverbs (ordered from easier to harder)
Significance of the three adjacent comparisons: p < 0.001, p < 0.05, p < 0.001
13-15. Impact of other Part of Speech terms
- Most Common Part of Speech (p < 0.01): more common POS is easier; less common POS is harder
- Choices with Answer's POS (p < 0.001): more choices with correct POS is harder; fewer choices with correct POS is easier
Examples:
- Henry read his _______ under the tree. (cup, dog, book, hair)
  (noun: more choices with correct POS, harder)
- Sally had to _______ her lips when she heard the news. (lamp, purse, beautiful, magnificent)
  (verb: fewer choices with correct POS, easier)
16-18. Impact of other terms
- Question Length (p < 0.001): longer is harder; shorter is easier
- Deletion Location (p < 0.001): blank earlier is harder; blank later is easier
Examples:
- We can _______ the stars in the sky despite the bright city lights around us. (at, with, most, see)
- They rode their _______ . (farmer, bikes, play, blue)
19. Using model to assess student reading comprehension
- Model estimates Beta parameter for each student
  - Represents how well student did at answering cloze questions (controlling for difficulty factors)
  - Should correlate with external comprehension measure
- Compare Beta vs. percent correct for predicting WRMT comprehension composite
  - Student Beta: r = .644, p < .001
  - Percent correct: r = .507, p < .001
  - Reliability of difference in correlations, p < .01
- Also provides check on validity of regression model
N = 465; 1 extreme outlier was eliminated from analyses.
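One way to read the two correlations is as shared variance (r squared). The sketch below only squares the values reported on the slide; the function name is hypothetical, and it does not reproduce the significance test for the difference between the correlations.

```python
# Correlations with the WRMT comprehension composite, as reported on the slide.
R_STUDENT_BETA = 0.644
R_PERCENT_CORRECT = 0.507


def variance_explained(r: float) -> float:
    """Proportion of variance shared under a simple linear relationship (r squared)."""
    return r * r


for label, r in [("Student Beta", R_STUDENT_BETA), ("Percent correct", R_PERCENT_CORRECT)]:
    print(f"{label}: r = {r:.3f}, r^2 = {variance_explained(r):.3f}")
```

On this reading, the student Beta shares roughly 41% of its variance with the external measure, against roughly 26% for raw percent correct.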
20. Conclusions
- Length of question, location of deleted word, and part of speech of correct answer affect question difficulty.
- Logistic regression is a strong choice for analyzing cloze data.
- Multiple-choice cloze questions can assess a student at a more accurate level than current practice.
21. Questions?
- Nominated for Best Paper Award:
  Soden Hensler, B., & Beck, J. E. (2006). Better student assessing by finding difficulty factors in a fully automated comprehension measure. Intelligent Tutoring Systems.
- Brooke Soden Hensler
  - bsodenhensler_at_gmail.com
- Joseph E. Beck
  - joseph.beck_at_gmail.com
- Project LISTEN: The Reading Tutor
  - http://www.cs.cmu.edu/listen/
22. References
- Gentner, D. (1981). Some interesting differences between verbs and nouns. Cognition and Brain Theory, 4(2).
- Golinkoff, R. M., Hirsh-Pasek, K., Bloom, L., Smith, L. B., Woodward, A. L., Akhtar, N., Tomasello, M., & Hollich, G. (2000). Becoming a word learner: A debate on lexical acquisition. New York: Oxford University Press.
- Mostow, J., & Aist, G. (2001). Evaluating tutors that listen: An overview of Project LISTEN. In K. Forbus & P. Feltovich (Eds.), Smart Machines in Education (pp. 169-234). Menlo Park, CA: MIT/AAAI Press.
- Mostow, J., Beck, J. E., Bey, J., Cuneo, A., Sison, J., Tobin, B., & Valeri, J. (2004). Using automated questions to assess reading comprehension, vocabulary, and effects of tutorial interventions. Technology, Instruction, Cognition and Learning, 2, 97-134.
- Schwanenflugel, P. J., Stahl, S. A., & McFalls, E. L. (1997). Partial word knowledge and vocabulary growth during reading comprehension. Journal of Literacy Research, 29(4).
23. Additional Slides
24. Terms in Model
Factors (term: description)
- Part of Speech: Simplified part of speech classification of the correct answer as Noun, Verb, Adjective, Adverb, or Function Word.
- Most Common Part of Speech: Whether or not the correct answer's POS is the most common POS the word could take on.
- POS Confusability: The number of POS the correct answer can take on.
- Level of Difficulty: 4 levels of difficulty based on frequency in English or special annotation.
- Student Identity: Unique identification for each student.
Covariates (term: description)
- Question Length: Number of characters of the cloze question and the corresponding response choices.
- Deletion Location: Proportion of the sentence that is before the blank (location of word deletion).
- Choices with Answer's POS: Probability that the student could have answered the question using only part of speech information.
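The three covariates can be sketched on the deck's own example questions. The slides do not specify how each quantity was computed, so the following are assumptions for illustration: characters are counted with spaces and the blank included, deletion location is measured in characters, the guessing probability is taken as 1 over the number of choices that can take the answer's part of speech, and the part-of-speech tags are hand-assigned. All function and variable names are hypothetical.

```python
BLANK = "_" * 7  # the seven-underscore gap used in the example questions


def question_length(question: str, choices: list[str]) -> int:
    """Characters in the cloze question plus characters in every response choice."""
    return len(question) + sum(len(choice) for choice in choices)


def deletion_location(question: str) -> float:
    """Proportion of the question's characters that come before the blank."""
    return question.index(BLANK) / len(question)


def pos_guess_probability(answer_pos: str, choice_pos: dict[str, set[str]]) -> float:
    """Chance of guessing right among the choices that can take the answer's POS."""
    candidates = [word for word, tags in choice_pos.items() if answer_pos in tags]
    return 1.0 / len(candidates)


short_q = f"They rode their {BLANK} ."
long_q = f"We can {BLANK} the stars in the sky despite the bright city lights around us."

# Hand-assigned POS tags (an assumption made for this illustration).
henry = {"cup": {"noun", "verb"}, "dog": {"noun", "verb"},
         "book": {"noun", "verb"}, "hair": {"noun"}}
sally = {"lamp": {"noun"}, "purse": {"noun", "verb"},
         "beautiful": {"adjective"}, "magnificent": {"adjective"}}

print(question_length(short_q, ["farmer", "bikes", "play", "blue"]))
print(round(deletion_location(short_q), 2), round(deletion_location(long_q), 2))
print(pos_guess_probability("noun", henry), pos_guess_probability("verb", sally))
```

Under these assumptions the "Henry" question, where all four choices can be nouns, gets a guessing probability of 0.25, while the "Sally" question, where only "purse" can be a verb, gets 1.0.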
25-26. Developmental Trends in Learning Parts of Speech
[Figure not recovered; reported significance values: p = .52, p = .64, p = .99, p = .71, p < .001]
27. Syntactic Awareness
[Figure not recovered; reported significance values: p = .73, p = .48, p = .01, p = .02, p < .001]
28. Effect of Part of Speech: Interpretation
A positive Beta means the student is more likely to answer the question correctly.
Part of Speech: Noun < Verb < Adjective < Adverb < Function Words
- Noun: Beta = 0.39, p < .001
- Verb: Beta = 0.29, p < .001
- Adjective: Beta = 0.19, p < .001
- Adverb: Beta = 0.12, p < .001
- Function Words: comparison point
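Because the model is a logistic regression, each Beta is a shift on the log-odds scale. The sketch below converts the reported Betas into odds ratios relative to function words. It assumes the comparison point has Beta = 0, and the 0.5 baseline probability is a hypothetical value chosen only to make the shift visible; neither is stated on the slide.

```python
import math

# Betas reported on the slide; function words are the comparison point (assumed 0).
BETAS = {"Noun": 0.39, "Verb": 0.29, "Adjective": 0.19, "Adverb": 0.12, "Function Word": 0.0}


def odds_ratio(beta: float) -> float:
    """Multiplicative change in the odds of a correct answer versus the comparison point."""
    return math.exp(beta)


def shifted_probability(baseline_p: float, beta: float) -> float:
    """Probability of a correct answer after adding beta to the baseline log-odds."""
    logit = math.log(baseline_p / (1.0 - baseline_p)) + beta
    return 1.0 / (1.0 + math.exp(-logit))


BASELINE_P = 0.5  # hypothetical chance of answering a function-word question correctly

for pos, beta in BETAS.items():
    print(f"{pos:14s} odds ratio {odds_ratio(beta):.2f}  "
          f"P(correct) {shifted_probability(BASELINE_P, beta):.3f}")
```

For example, a Beta of 0.39 corresponds to odds of a correct answer about 1.48 times those for a function-word question, all else equal.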