Title: THE GENERATION OF AUTOMATED STUDENT FEEDBACK FOR A COMPUTER-ADAPTIVE TEST
1. THE GENERATION OF AUTOMATED STUDENT FEEDBACK FOR A COMPUTER-ADAPTIVE TEST
- University of Hertfordshire
- School of Computer Science
- Mariana Lilley
- Dr. Trevor Barker
- Dr. Carol Britton
2. Objectives
- Overview of ongoing research at the University of Hertfordshire on the use of computer-adaptive tests (CATs)
- Our approach to the generation of automated feedback
- Student attitude
- Future work
3. Research overview
- Research started in 2001.
- Five empirical studies, involving over 350 participants.
- Findings suggest that the computer-adaptive test (CAT) approach has the potential to offer a more consistent and accurate measurement of student proficiency levels than non-adaptive computer-based tests (CBTs).
- Statistical analysis of the data gathered to date suggests that the CAT approach is a fair measure of proficiency levels, producing higher test-retest correlations than either CBT or off-computer assessments.
- More importantly, these results were observed in three different subject domains, namely English as a second language, Visual Basic programming and Human-Computer Interaction. This was taken to indicate that the approach can be transferred and generalised to different subject domains.
4. Traditional and adaptive approaches to testing
- Computer-Based Tests (CBTs) mimic aspects of a paper-and-pencil test
  - Accuracy and speed of marking
  - Predefined set of questions presented to all participants, so questions are not tailored to each individual student
- Computer-Adaptive Tests (CATs) mimic aspects of an oral interview
  - Accuracy and speed of marking
  - Questions are dynamically selected and thus tailored according to student performance
5. Main benefits of the adaptive approach
- Questions that are too easy or too difficult are likely to
  - Be demotivating
  - Provide little or no valuable information about student knowledge
- Questions at the boundary of student knowledge are likely to
  - Be challenging
  - Be motivating
  - Provide lecturers with valuable information with regard to student ability
- Beginning in the days when education was for the privileged few, the wise tutor would modify the oral examination of a student by judiciously choosing questions appropriate to the student's knowledge and ability (Wainer, 1990).
6. Computer-Adaptive Test
- Based on Item Response Theory (IRT)
- If a student answers a question correctly, the estimate of his/her ability is raised and a more difficult question is presented
- If a student answers a question incorrectly, the estimate of his/her ability is lowered and an easier question follows (a sketch of this loop appears below)
- Can be of fixed or variable length
- Score based on the final proficiency estimate
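The slides describe the adaptive loop but not its implementation. Below is a minimal illustrative sketch in Python of a fixed-length loop of this kind; the names (run_cat, ask, item_bank) are invented for the example, and the shrinking step size is a simplified stand-in for re-estimating ability under the 3-PL model after every response.

    def run_cat(item_bank, ask, test_length=20):
        """Fixed-length adaptive test loop (illustrative sketch).

        item_bank: list of dicts with "id" and "b" (difficulty).
        ask: callable that poses a question and returns True if the
        student answered it correctly.
        """
        theta, step = 0.0, 1.0          # initial ability estimate and step
        remaining = list(item_bank)     # assumes len(item_bank) >= test_length
        responses = []
        for _ in range(test_length):
            # Select the unused question whose difficulty is closest to
            # the current ability estimate.
            item = min(remaining, key=lambda q: abs(q["b"] - theta))
            remaining.remove(item)
            correct = ask(item)
            responses.append((item["id"], correct))
            # Correct answer -> harder next question; incorrect -> easier.
            theta += step if correct else -step
            step *= 0.8                 # settle as evidence accumulates
        return theta, responses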
7. Item Response Theory
- A family of mathematical functions
- The most well-known models for dichotomously scored questions:
  - One-Parameter Logistic Model (1-PL)
  - Two-Parameter Logistic Model (2-PL)
  - Three-Parameter Logistic Model (3-PL)
- In the CAT application introduced here:
  - 3-PL Model
  - Fixed length
8. The 3-PL model from IRT
- θ represents the student's ability
- b represents the question's difficulty
- a represents the question's discrimination
- c represents pseudo-chance (the probability of a correct response by guessing)
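For reference, the standard 3-PL item characteristic function built from these parameters, giving the probability of a correct response (some formulations also include a scaling constant D ≈ 1.7 in the exponent):

    P(\theta) = c + (1 - c) \cdot \frac{1}{1 + e^{-a(\theta - b)}}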
9. Level of difficulty
- One of the underlying ideas within Bloom's taxonomy of cognitive skills (Anderson & Krathwohl, 2001) is that tasks can be arranged in a hierarchy from less to more complex.
10. Feedback provided for the first and second assessment sessions
- Scores sent via email
- Students seemed pleased to receive their scores via email
- Some students reported that the score on its own provided learners with very little, if any, help in determining which part of the subject domain they should revise next or which topic they should prioritise
- Student views were in line with the opinion of the experts who participated in the pedagogical evaluation of the CAT prototype (Lilley & Barker, 2002)
11. Feedback provided for the first and second assessment sessions
- To: <<Student_Name>>
- Your score for the Visual Basic Test 1 was <<Student_Score>>.
- This is an automated message from
- The Programming_Module team
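A minimal sketch of how a mail-merge message of this kind could be generated in Python; the function render_score_email and the field names are illustrative, not the application's actual code.

    from string import Template

    # Mirrors the message above; the <<...>> placeholders become
    # $-style fields for string.Template.
    MESSAGE = Template(
        "To: $student_name\n"
        "Your score for the Visual Basic Test 1 was $student_score.\n"
        "This is an automated message from\n"
        "The Programming_Module team"
    )

    def render_score_email(student_name, student_score):
        """Fill the score-notification template for one student."""
        return MESSAGE.substitute(student_name=student_name,
                                  student_score=student_score)

    print(render_score_email("A. Student", 7))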
12. Assessment
- Bachelor of Science (BSc) in Computer Science
- 123 participants
- The participants took the test in week 30 as part of their real assessment for the module
- 6 non-adaptive questions followed by 14 adaptive ones
- Human-Computer Interaction topics:
  - Issues related to the use of sound at interfaces
  - Graphical representation at interfaces, focusing on the use of colour and images
  - User-centred approaches to requirements gathering
  - Design, prototyping and construction
  - Usability goals and user experience goals
  - Evaluation paradigms and techniques
13. Providing students with a copy of the test
- A simple potential solution was to provide students with a copy of all questions they got wrong.
- A major limitation of this approach was the lack of explanation or comment on their performance.
- It seemed unlikely that providing students with the answers to the questions they did not get right would foster research and/or reflection skills.
- A further practical limitation of the approach was increased exposure of the objective questions stored in the database.
- Re-use of questions is one of the perceived benefits of computer-assisted assessments (Freeman & Lewis, 1998; Harvey & Mogey, 1999).
14. Automated feedback using Item Response Theory (IRT)
- Overall proficiency level calculated as in previous assessments using the CAT application (i.e. using the 3-PL Model)
- A proficiency level was calculated for each set of student responses for a given topic
- Questions answered incorrectly by each individual student were identified (a sketch of these steps follows this list)
- Design and implementation of a feedback database
  - Feedback according to topic
  - Feedback according to question
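The slides do not give the estimation code, so the following Python sketch is illustrative only: a simple Newton-Raphson maximum-likelihood ability estimate under the 3-PL model, applied per topic. The names (estimate_theta, per_topic_theta) and the response format are assumptions for the example; a production implementation would need safeguards for all-correct or all-incorrect response sets, where the likelihood has no finite maximum.

    import math
    from collections import defaultdict

    def p_correct(theta, a, b, c):
        """3-PL probability of a correct response."""
        return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

    def estimate_theta(responses, iters=20, eps=1e-4):
        """MLE of ability from (a, b, c, correct) tuples, clamped to [-4, 4]."""
        def loglik(t):
            return sum(math.log(p_correct(t, a, b, c) if correct
                                else 1.0 - p_correct(t, a, b, c))
                       for a, b, c, correct in responses)
        theta = 0.0
        for _ in range(iters):
            # Numerical first and second derivatives of the log-likelihood.
            d1 = (loglik(theta + eps) - loglik(theta - eps)) / (2 * eps)
            d2 = (loglik(theta + eps) - 2 * loglik(theta)
                  + loglik(theta - eps)) / eps ** 2
            if abs(d2) < 1e-9:
                break
            theta = max(-4.0, min(4.0, theta - d1 / d2))
        return theta

    def per_topic_theta(answered):
        """answered: list of (topic, a, b, c, correct); returns topic -> theta."""
        by_topic = defaultdict(list)
        for topic, a, b, c, correct in answered:
            by_topic[topic].append((a, b, c, correct))
        return {topic: estimate_theta(resp) for topic, resp in by_topic.items()}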
15. Summary of overall performance
16. Summary of performance per topic
17. Feedback according to topic
18. Feedback according to topic
19. Feedback according to question
- Section named "Based on your test performance, we suggest the following areas for revision"
- This section of the feedback document comprised a list of points for revision, based on the questions answered incorrectly by each individual student
- The feedback sentences did not reproduce the questions themselves
- Each feedback sentence listed specific sections within the recommended reading and/or additional learning materials
- The same feedback sentence could be used for more than one question in the database (see the selection sketch below)
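A sketch, with hypothetical names, of how the question-level feedback could be assembled: each incorrectly answered question maps to a feedback sentence, one sentence may cover several questions, and duplicates are dropped while preserving order.

    def revision_points(incorrect_ids, sentence_for_question, sentences):
        """incorrect_ids: question ids the student got wrong.
        sentence_for_question: question_id -> sentence_id (many-to-one).
        sentences: sentence_id -> feedback sentence text.
        """
        seen, points = set(), []
        for qid in incorrect_ids:
            sid = sentence_for_question[qid]
            if sid not in seen:            # reuse of a sentence across
                seen.add(sid)              # questions yields one entry
                points.append(sentences[sid])
        return points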
20. Example of question
21. Example of question
22. Example of feedback sentence related to questions regarding bit-depth
- Do some independent research on bit depth (the number of bits per pixel allocated for storing indexed colour information in a graphics file). As a starting point, see http://www.microsoft.com/windowsxp/experiences/glossary_a-g.asp#24-bitcolor. See also Chapter 5 from Principles of Interactive Multimedia, as section 5.6.4 introduces important aspects related to the use of colour at interfaces.
23. Student attitude towards the feedback format adopted
- All students who participated in Assessment 3 were invited to express their views on the feedback format used (participation was optional)
- 58 students (47%) replied to our email
- Students were asked to classify the feedback received as "very useful", "useful" or "not useful"
- Students were also asked to present one positive and one negative aspect of the feedback provided
24. Summary of positive aspects
25. Summary of negative aspects
26. Summary of problems with document layout and/or type
27. Example of automated feedback generated
28. Feedback according to topic
29. Recommended action(s)
30. Discussion
- Like Denton (2003), it is our belief that the potential benefits of automated feedback have not yet been fully explored by academic staff, even by those who are already making use of computer-assisted assessment tools.
- Our initial ideas on how CATs/IRT can be used to provide students with personalised, meaningful feedback include:
  - An ability estimation algorithm based on the Three-Parameter Logistic Model from IRT
  - A feedback database
  - Feedback sentences selected from the feedback database based on the estimated ability level and the questions answered incorrectly
  - For each individual student, only those sentences that apply to his or her test performance are selected
  - Selected feedback sentences added to a new Word document and sent to each individual student's email account (a plain-text sketch of this assembly step follows)
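A plain-text sketch of the assembly step, reusing the hypothetical helpers from the earlier sketches; the real system produced a Word document, which is not reproduced here.

    def build_feedback_document(student, overall_theta, topic_thetas, points):
        """Assemble the feedback text sent to one student."""
        lines = [f"Feedback for {student}",
                 f"Overall proficiency estimate: {overall_theta:.2f}",
                 "",
                 "Performance per topic:"]
        for topic, theta in sorted(topic_thetas.items()):
            lines.append(f"  {topic}: {theta:.2f}")
        lines += ["",
                  "Based on your test performance, we suggest the "
                  "following areas for revision:"]
        lines += [f"  - {point}" for point in points]
        return "\n".join(lines)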
31. Discussion
- Learners like to be assessed and value comments on their performance.
- The investment of effort by learners necessitates comment from tutors.
- As class sizes increase and more use is made of online formative and summative assessment methods, it becomes increasingly difficult to provide individual feedback in HE.
- Students still value a human contribution to feedback, but they also realise that this is becoming rarer in their academic lives.
- Student attitude to this approach was positive in general.
- At the very least, we have shown that our automated feedback method identifies areas of weakness and strength and provides useful advice for individual development.
32. Future work
- Creation of one distinct feedback sentence per question
- It is anticipated that these sentences should resemble the actual question more than the current comments do
- "Would it be possible to attach the question and the correct answers from the test?"
- The overall layout of the document will be reviewed
  - To facilitate the location of information on the feedback sheet (some learners reported that they did not intuitively locate their overall score in the feedback document)
- The distribution of the feedback document as a PDF rather than a Word (DOC) file is also being considered
33. Future work
- Review our assumption that performance in one topic area within a subject domain is the best indicator of performance in a related topic area in the same domain
- It is possible that students might have differing abilities in quite similar topic areas
- Impact on test length and/or proficiency level estimation
- To increase the personalisation of the feedback, we intend to compare each learner's performance in previous assessments with his or her performance in the most recent one