Title: THE GENERATION OF AUTOMATED STUDENT FEEDBACK FOR A COMPUTER-ADAPTIVE TEST
1. THE GENERATION OF AUTOMATED STUDENT FEEDBACK FOR A COMPUTER-ADAPTIVE TEST
- University of Hertfordshire
- School of Computer Science
- Mariana Lilley
- Dr. Trevor Barker
- Dr. Carol Britton
2. Objectives
- Overview of ongoing research at the University of Hertfordshire on the use of computer-adaptive tests (CATs)
- Our approach to the generation of automated feedback
- Student attitude
- Future work
3. Research overview
- Research started in 2001.
- Five empirical studies, involving over 350 participants.
- Findings suggest that the computer-adaptive test (CAT) approach has the potential to offer a more consistent and accurate measurement of student proficiency levels than non-adaptive computer-based tests (CBTs).
- Statistical analysis of the data gathered to date suggests that the CAT approach is a fair measure of proficiency levels, producing higher test-retest correlations than either CBT or off-computer assessments.
- More importantly, these results were observed in three different subject domains, namely English as a second language, Visual Basic programming and Human-Computer Interaction. This was taken to indicate that the approach can be transferred and generalised to different subject domains.
4. Traditional and adaptive approaches to testing
- Computer-Based Tests (CBTs) mimic aspects of a paper-and-pencil test
  - Accuracy and speed of marking
  - Predefined set of questions presented to all participants, so questions are not tailored to each individual student
- Computer-Adaptive Tests (CATs) mimic aspects of an oral interview
  - Accuracy and speed of marking
  - Questions are dynamically selected and thus tailored according to student performance
5. Main benefits of the adaptive approach
- Questions that are too easy or too difficult are likely to
  - Be demotivating
  - Provide little or no valuable information about student knowledge
- Questions at the boundary of student knowledge are likely to
  - Be challenging
  - Be motivating
  - Provide lecturers with valuable information with regard to student ability
- Beginning in the days when education was for the privileged few, the wise tutor would modify the oral examination of a student by judiciously choosing questions appropriate to the student's knowledge and ability (Wainer, 1990).
6. Computer-Adaptive Test
- Based on Item Response Theory (IRT)
- If a student answers a question correctly, the estimate of his/her ability is raised and a more difficult question is presented
- If a student answers a question incorrectly, the estimate of his/her ability is lowered and an easier question follows (a sketch of this loop appears below)
- Can be of fixed or variable length
- Score based on the final proficiency estimate
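The slides describe the adaptive loop but not its implementation. Below is a minimal illustrative sketch in Python of a fixed-length loop of this kind; the names (run_cat, ask, item_bank) are invented for the example, and the shrinking step size is a simplified stand-in for re-estimating ability under the 3-PL model after every response.

    def run_cat(item_bank, ask, test_length=20):
        """Fixed-length adaptive test loop (illustrative sketch).

        item_bank: list of dicts with "id" and "b" (difficulty).
        ask: callable that poses a question and returns True if the
        student answered it correctly.
        """
        theta, step = 0.0, 1.0          # initial ability estimate and step
        remaining = list(item_bank)     # assumes len(item_bank) >= test_length
        responses = []
        for _ in range(test_length):
            # Select the unused question whose difficulty is closest to
            # the current ability estimate.
            item = min(remaining, key=lambda q: abs(q["b"] - theta))
            remaining.remove(item)
            correct = ask(item)
            responses.append((item["id"], correct))
            # Correct answer -> harder next question; incorrect -> easier.
            theta += step if correct else -step
            step *= 0.8                 # settle as evidence accumulates
        return theta, responses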
7. Item Response Theory
- A family of mathematical functions
- The most well-known models for dichotomously scored questions:
  - One-Parameter Logistic Model (1-PL)
  - Two-Parameter Logistic Model (2-PL)
  - Three-Parameter Logistic Model (3-PL)
- In the CAT application introduced here:
  - 3-PL Model
  - Fixed length
8. The 3-PL model from IRT
- θ represents the student's ability
- b represents the question's difficulty
- a represents the question's discrimination
- c represents pseudo-chance (the probability of a correct response by guessing)
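For reference, the standard 3-PL item characteristic function built from these parameters, giving the probability of a correct response (some formulations also include a scaling constant D ≈ 1.7 in the exponent):

    P(\theta) = c + (1 - c) \cdot \frac{1}{1 + e^{-a(\theta - b)}}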
9. Level of difficulty
- One of the underlying ideas within Bloom's taxonomy of cognitive skills (Anderson & Krathwohl, 2001) is that tasks can be arranged in a hierarchy from less to more complex.
10. Feedback provided for the first and second assessment sessions
- Scores sent via email
- Students seemed pleased to receive their scores via email
- Some students reported that the score on its own provided learners with very little, if any, help in determining which part of the subject domain they should revise next or which topic they should prioritise
- Student views were in line with the opinion of the experts who participated in the pedagogical evaluation of the CAT prototype (Lilley & Barker, 2002)
11. Feedback provided for the first and second assessment sessions
- To: <<Student_Name>>
- Your score for the Visual Basic Test 1 was <<Student_Score>>.
- This is an automated message from
- The Programming_Module team
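A minimal sketch of how a mail-merge message of this kind could be generated in Python; the function render_score_email and the field names are illustrative, not the application's actual code.

    from string import Template

    # Mirrors the message above; the <<...>> placeholders become
    # $-style fields for string.Template.
    MESSAGE = Template(
        "To: $student_name\n"
        "Your score for the Visual Basic Test 1 was $student_score.\n"
        "This is an automated message from\n"
        "The Programming_Module team"
    )

    def render_score_email(student_name, student_score):
        """Fill the score-notification template for one student."""
        return MESSAGE.substitute(student_name=student_name,
                                  student_score=student_score)

    print(render_score_email("A. Student", 7))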
12. Assessment
- Bachelor of Science (BSc) in Computer Science
- 123 participants
- The participants took the test in week 30 as part of their real assessment for the module
- 6 non-adaptive questions followed by 14 adaptive ones
- Human-Computer Interaction topics:
  - Issues related to the use of sound at interfaces
  - Graphical representation at interfaces, focusing on the use of colour and images
  - User-centred approaches to requirements gathering
  - Design, prototyping and construction
  - Usability goals and user experience goals
  - Evaluation paradigms and techniques
13. Providing students with a copy of the test
- A simple potential solution was to provide students with a copy of all questions they got wrong.
- A major limitation of this approach was the lack of explanation or comment on their performance.
- It seemed unlikely that providing students with the answers to the questions they did not get right would foster research and/or reflection skills.
- A further practical limitation of the approach was increased exposure of the objective questions stored in the database.
- Re-use of questions is one of the perceived benefits of computer-assisted assessments (Freeman & Lewis, 1998; Harvey & Mogey, 1999).
14. Automated feedback using Item Response Theory (IRT)
- Overall proficiency level calculated as in previous assessments using the CAT application (i.e. using the 3-PL Model)
- A proficiency level was calculated for each set of student responses for a given topic
- Questions answered incorrectly by each individual student were identified (a sketch of these steps follows this list)
- Design and implementation of a feedback database
  - Feedback according to topic
  - Feedback according to question
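The slides do not give the estimation code, so the following Python sketch is illustrative only: a simple Newton-Raphson maximum-likelihood ability estimate under the 3-PL model, applied per topic. The names (estimate_theta, per_topic_theta) and the response format are assumptions for the example; a production implementation would need safeguards for all-correct or all-incorrect response sets, where the likelihood has no finite maximum.

    import math
    from collections import defaultdict

    def p_correct(theta, a, b, c):
        """3-PL probability of a correct response."""
        return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

    def estimate_theta(responses, iters=20, eps=1e-4):
        """MLE of ability from (a, b, c, correct) tuples, clamped to [-4, 4]."""
        def loglik(t):
            return sum(math.log(p_correct(t, a, b, c) if correct
                                else 1.0 - p_correct(t, a, b, c))
                       for a, b, c, correct in responses)
        theta = 0.0
        for _ in range(iters):
            # Numerical first and second derivatives of the log-likelihood.
            d1 = (loglik(theta + eps) - loglik(theta - eps)) / (2 * eps)
            d2 = (loglik(theta + eps) - 2 * loglik(theta)
                  + loglik(theta - eps)) / eps ** 2
            if abs(d2) < 1e-9:
                break
            theta = max(-4.0, min(4.0, theta - d1 / d2))
        return theta

    def per_topic_theta(answered):
        """answered: list of (topic, a, b, c, correct); returns topic -> theta."""
        by_topic = defaultdict(list)
        for topic, a, b, c, correct in answered:
            by_topic[topic].append((a, b, c, correct))
        return {topic: estimate_theta(resp) for topic, resp in by_topic.items()}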
15. Summary of overall performance
16. Summary of performance per topic
17. Feedback according to topic
18. Feedback according to topic
19. Feedback according to question
- Section named "Based on your test performance, we suggest the following areas for revision"
- This section of the feedback document comprised a list of points for revision, based on the questions answered incorrectly by each individual student
- The feedback sentences did not reproduce the questions themselves
- Each feedback sentence listed specific sections within the recommended reading and/or additional learning materials
- The same feedback sentence could be used for more than one question in the database (see the selection sketch below)
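A sketch, with hypothetical names, of how the question-level feedback could be assembled: each incorrectly answered question maps to a feedback sentence, one sentence may cover several questions, and duplicates are dropped while preserving order.

    def revision_points(incorrect_ids, sentence_for_question, sentences):
        """incorrect_ids: question ids the student got wrong.
        sentence_for_question: question_id -> sentence_id (many-to-one).
        sentences: sentence_id -> feedback sentence text.
        """
        seen, points = set(), []
        for qid in incorrect_ids:
            sid = sentence_for_question[qid]
            if sid not in seen:            # reuse of a sentence across
                seen.add(sid)              # questions yields one entry
                points.append(sentences[sid])
        return points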
20. Example of question
21. Example of question
22. Example of feedback sentence related to questions regarding bit-depth
- Do some independent research on bit depth (the number of bits per pixel allocated for storing indexed colour information in a graphics file). As a starting point, see http://www.microsoft.com/windowsxp/experiences/glossary_a-g.asp#24-bitcolor. See also Chapter 5 from Principles of Interactive Multimedia, as section 5.6.4 introduces important aspects related to the use of colour at interfaces.
23. Student attitude towards the feedback format adopted
- All students who participated in Assessment 3 were invited to express their views on the feedback format used (participation was optional)
- 58 students (47%) replied to our email
- Students were asked to classify the feedback received as "very useful", "useful" or "not useful"
- Students were also asked to present one positive and one negative aspect of the feedback provided
24. Summary of positive aspects
25. Summary of negative aspects
26. Summary of problems with document layout and/or type
27. Example of automated feedback generated
28. Feedback according to topic
29. Recommended action(s)
30. Discussion
- Like Denton (2003), it is our belief that the potential benefits of automated feedback have not yet been fully explored by academic staff, even by those who are already making use of computer-assisted assessment tools.
- Our initial ideas on how CATs/IRT can be used to provide students with personalised, meaningful feedback include:
  - An ability estimation algorithm based on the Three-Parameter Logistic Model from IRT
  - A feedback database
  - Feedback sentences selected from the feedback database based on the estimated ability level and the questions answered incorrectly
  - For each individual student, only those sentences that apply to his or her test performance are selected
  - Selected feedback sentences added to a new Word document and sent to each individual student's email account (a plain-text sketch of this assembly step follows)
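A plain-text sketch of the assembly step, reusing the hypothetical helpers from the earlier sketches; the real system produced a Word document, which is not reproduced here.

    def build_feedback_document(student, overall_theta, topic_thetas, points):
        """Assemble the feedback text sent to one student."""
        lines = [f"Feedback for {student}",
                 f"Overall proficiency estimate: {overall_theta:.2f}",
                 "",
                 "Performance per topic:"]
        for topic, theta in sorted(topic_thetas.items()):
            lines.append(f"  {topic}: {theta:.2f}")
        lines += ["",
                  "Based on your test performance, we suggest the "
                  "following areas for revision:"]
        lines += [f"  - {point}" for point in points]
        return "\n".join(lines)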
31. Discussion
- Learners like to be assessed and value comments on their performance.
- The investment of effort by learners necessitates comment from tutors.
- As class sizes increase and more use is made of online formative and summative assessment methods, it becomes increasingly difficult to provide individual feedback in HE.
- Students still value a human contribution to feedback, but they also realise that this is becoming rarer in their academic lives.
- Student attitude to this approach was positive in general.
- At the very least, we have shown that our automated feedback method identifies areas of weakness and strength and provides useful advice for individual development.
32. Future work
- Creation of one distinct feedback sentence per question
- It is anticipated that these sentences should resemble the actual question more than the current comments do
- "Would it be possible to attach the question and the correct answers from the test?"
- The overall layout of the document will be reviewed
  - To facilitate the location of information on the feedback sheet (some learners reported that they did not intuitively locate their overall score in the feedback document)
- The distribution of the feedback document as a PDF rather than a Word (DOC) file is also being considered
33. Future work
- Review our assumption that performance in one topic area within a subject domain is the best indicator of performance in a related topic area in the same domain
- It is possible that students might have differing abilities in quite similar topic areas
- Impact on test length and/or proficiency level estimation
- To increase the personalisation of the feedback, we intend to compare each learner's performance in previous assessments with his or her performance in the most recent one