Transcript and Presenter's Notes

Title: Automatically Grading Programming Assignments with Web-CAT


1
Automatically Grading Programming Assignments
with Web-CAT
  • Stephen H. Edwards
  • Virginia Tech
  • Dept. of Computer Science
  • edwards@cs.vt.edu
  • http://web-cat.sourceforge.net/

2
My goals today are to
  • Explain how requiring students to formulate and
    test hypotheses about their own code can improve
    their understanding and performance
  • Describe our experiences with an alternate
    grading approach supported by a new tool, Web-CAT
  • Describe some of the flexibility in Web-CAT for
    supporting other approaches
  • Convince you that software testing can be an
    important (and practical) addition to classroom
    practices

3
Students hold onto ineffective techniques
  • Too often, intro students believe that if their code...
  • ...compiles, the errors are mostly gone
  • ...runs correctly when I try it once, it is correct
  • ...runs on the instructor-provided sample input, it
    is correct
  • ...has a problem, it can be fixed by trial and error

4
What is reflection-in-action?
  • For an expert, when the current technique is
    failing:
  • Step back and reflect: "I must be missing
    something"
  • Re-examine the situation, your solution, and your
    implicit assumptions about the problem
  • Leads to guesses (hypotheses) about why the
    solution isn't working or why something else will
    be better
  • Carry out an experiment which serves to
    generate both a new understanding of the
    phenomenon and a change in the situation

5
Practicing software testing will help students
frame and carry out experiments
  • The problem: too much focus on synthesis and
    analysis too early in teaching CS
  • Need to be able to read and comprehend source
    code
  • Envision how a change in the code will result in
    a change in the behavior
  • Need explicit, continually reinforced practice in
    hypothesizing about program behavior and then
    experimentally verifying their hypotheses

6
Student comments suggest their current testing
practices are often weak
  • "I run them through some simple tests to ensure
    that it is operating as expected. But for the
    most part I have always relied on supplied test
    data."
  • "I don't think about test cases until I am
    confident my program is 100% working. Of course,
    it almost never is."
  • "I usually write the whole thing up and then start
    doing rapid-fire tests of everything I can think
    of."

7
A comprehensive strategy is necessary for a
culture shift in what students do
  • Students cannot test their own code
  • Want a culture shift in student behavior
  • A single upper-division course would have little
    impact on practices in other classes
  • So: systematically incorporate testing practices
    across many courses

[Diagram: testing practices woven across CS1, CS2, OO Design, and Data Structures]
8
Expect students to apply their testing skills all
the time in programming assignments
  • Expect students to test their own work
  • Empower students by engaging them in the
    process of assessing their own programs
  • Require students to demonstrate the correctness
    of their own work through testing
  • Do this consistently across many courses

9
What tools and techniques should I teach?
  • We want to start with skills that are directly
    applicable to authentic student-oriented tasks
  • Don't want to add bureaucratic busywork to
    assignments
  • Without tool support, this is a lost cause!
  • It is imperative to give students skills they
    value
  • But most textbooks only give a conceptual
    intro to idealized industrial practices, not
    techniques students can use in their own
    assignments

10
Test-driven development is very accessible for
students
  • Also called "test-first coding"
  • Focuses on thorough unit testing at the level of
    individual methods/functions
  • "Write a little test, write a little code"
  • Tests come first and describe what is expected;
    code follows and is revised until all tests pass
    (a minimal example appears after this list)
  • Encourages lots of small (even tiny) iterations
  • See http://web-cat.sf.net/ for on-line references
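
As a concrete illustration, here is a minimal test-first sketch in Java. It assumes JUnit 3-style tests and a hypothetical Account class; the class, its API, and the test values are invented for illustration, and the class body is written only after (and revised until) the tests pass.

  import junit.framework.TestCase;

  // Step 1: the student writes a small test that describes the
  // expected behavior before any of the corresponding code exists.
  public class AccountTest extends TestCase
  {
      public void testDeposit()
      {
          Account account = new Account(100.00);
          account.deposit(25.00);
          assertEquals(125.00, account.getBalance(), 0.001);
      }

      public void testWithdrawCannotOverdraw()
      {
          Account account = new Account(100.00);
          assertFalse(account.withdraw(500.00));              // should be refused
          assertEquals(100.00, account.getBalance(), 0.001);  // balance unchanged
      }
  }

  // Step 2: the student then writes (and keeps revising) the class
  // until every test above passes.
  class Account
  {
      private double balance;

      public Account(double initialBalance) { balance = initialBalance; }

      public void deposit(double amount) { balance += amount; }

      // Refuses the withdrawal (returns false) if funds are insufficient.
      public boolean withdraw(double amount)
      {
          if (amount > balance) { return false; }
          balance -= amount;
          return true;
      }

      public double getBalance() { return balance; }
  }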

11
Students can apply TDD in assignments and get
immediate, useful benefits
  • Conceptually, easy for students to understand and
    relate to
  • Increases confidence in code
  • Increases understanding of requirements
  • Preempts big bang integration

12
The problem is devising an effective assessment
strategy
  • Need to assess student performance at testing
  • Need to give productive feedback
  • Need to provide rapid turnaround
  • Cannot afford huge increase in resources required

13
Conventional automated assessment does not
encourage good testing habits
  • Student uploads program
  • Program is compiled
  • Executed against test data
  • Scored based on output

14
The conventional approach provides useful
benefits that do lead to a cultural change
  • Fast, precise feedback to students
  • Chance(s) to improve based on feedback
  • Good assessment of behavior
  • Systematic use resulted in culture change

15
But the conventional approach may discourage
desired behavior and skills
  • Focus is on output correctness, first and
    foremost
  • "Get it working first; work on commenting,
    structure, etc. later"
  • Students not encouraged or rewarded for testing
    on their own
  • Students often do less testing

16
Proper grading and feedback can provide positive
incentive for desirable behavior
  • Decide what behavior to foster
  • Choose a corresponding scoring/reward
    system
  • Design feedback approach
  • Use students' adaptive nature to drive cultural
    change

17
Proper grading and feedback is critical to
reinforcing desired behavior
  • Assess test validity: the correctness of students'
    tests
  • Assess test completeness: the thoroughness of
    students' tests
  • Assess program correctness: the behavior of the
    student's solution
  • Multiply the three scores as percentages (a small
    worked example follows)
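
To make the multiplicative scheme concrete, a tiny worked example follows; the component values are hypothetical and are not data from the study.

  // Hypothetical component scores, each expressed as a fraction of 1.0.
  public class ScoreExample
  {
      public static void main(String[] args)
      {
          double testValidity       = 0.90; // fraction of the student's tests that are correct
          double testCompleteness   = 0.80; // how thoroughly those tests cover the solution
          double programCorrectness = 0.95; // fraction of behavior checks the solution passes

          // Multiplying, rather than averaging, means a weak score on any
          // one dimension pulls the overall grade down sharply.
          double score = testValidity * testCompleteness * programCorrectness * 100.0;
          System.out.printf("Final score: %.1f%n", score); // prints "Final score: 68.4"
      }
  }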

18
Students improve their code quality when using
Web-CAT
[Chart: student code defect rates shown against two reference points,
newly written untested code and commercial-quality code]
19
Students start earlier and finish earlier when
they use Web-CAT


20
An evaluation of submitted code indicates
students program more effectively
(Bold = p < .05 significance)            Without TDD    With TDD
Recorded grades                          90.2           96.1
TA assessment                            98.1           98.2
Automated grader assessment              76.8           94.0
Faults on master test suite              36.7           24.9
Projected defects/KSLOC                  70             38 (45% fewer!)
How early was first submission?          2.2 days       4.2 days
21
After using TDD and Web-CAT, students clearly
perceive practical benefits
Statement (mean rating; higher = stronger agreement)
More helpful at detecting errors than Curator        4.3
Provides excellent support for TDD                   4.1
Increases my confidence in correctness               3.9
Increases my confidence when making changes          3.8
Makes me test my solution more thoroughly            3.8
Makes me more systematic in devising tests           3.8
Would like to use, even if not required              3.8
22
Student reactions are very positive toward TDD
  • "I am very excited about using TDD."
  • "I agree that TDD can be beneficial and I'm glad
    we are being required to experiment with it in
    this course."
  • "If it increases the effectiveness of my
    programming and decreases the time I spend
    debugging, then I am all for it."
  • "Previously, I had to quit my detailed testing
    and stick to making the program appear to work
    with the sample data given every time a deadline
    drew near. With TDD, the tests are such an
    integral part of the project that no
    time-conserving measure will save me."

23
We use Web-CAT to automatically process student
submissions and check their work
  • Web application written in 100% pure Java
  • Deployed as a servlet
  • Built on Apple's WebObjects
  • Uses a large-grained plug-in architecture
    internally, providing for easily extensible data
    model, UI, and processing features

24
Web-CAT's strengths are targeted at broader use
  • Security: mini-plug-ins for different
    authentication schemes, global user permissions,
    and per-course role-based permissions
  • Portability: 100% pure Java servlet for the
    Web-CAT engine
  • Extensibility: completely language-neutral,
    process-agnostic approach to grading, via
    site-wide or instructor-specific grading plug-ins
  • Manual grading: HTML "web printouts" of student
    submissions can be directly marked up by course
    staff to provide feedback

25
Grading plug-ins are the key to process
flexibility and extensibility in Web-CAT
  • Processing for an assignment consists of a tool
    chain or pipeline of one or more grading
    plug-ins
  • The instructor has complete control over which
    plug-ins appear in the pipeline, in what order,
    and with what parameters
  • A simple and flexible, yet powerful way for
    plug-ins to communicate with Web-CAT and with
    each other
  • We have a number of existing plug-ins for Java,
    C++, Scheme, Prolog, Pascal, Standard ML, ...
  • Instructors can write and upload their own
    plug-ins
  • Plug-ins can be written in any language
    executable on the server (we usually use Perl)

26
The most well-known plug-in is for grading Java
assignments that include student tests
  • ANT-based build of arbitrary Java projects
  • PMD and Checkstyle static analysis
  • ANT-based execution of student-written JUnit
    tests
  • Carefully designed Java security policy
  • Clover test coverage instrumentation
  • ANT-based execution of optional instructor
    reference tests
  • Unified HTML web printout
  • Highly configurable (PMD rules, Checkstyle rules,
    supplemental jar files, supplemental data files,
    Java security policy, point deductions, and lots
    more)

27
Web-CAT supports a variety of languages, and its
Java plug-in is aimed at software testing
  • (Same Java plug-in feature list as the previous
    slide)

28
Web-CAT provides timely, constructive feedback on
how to improve performance
  • Indicates where code can be improved
  • Indicates which parts were not tested well enough
  • Provides as many revise/resubmit cycles as
    possible

29
The most important step in writing testable
assignments is
  • Learning to write tests yourself
  • Writing an instructor's solution with tests that
    thoroughly cover all the expected behavior
  • Practice what you are teaching/preaching

30
Students get frustrated without feedback, so
reference tests must provide some
  • If students only get a score, but no other
    feedback about how to improve, they easily get
    frustrated
  • We augment our reference tests to provide hints
    for failed tests, cross-referenced to the program
    assignment

Requirement in the assignment spec: "mul: this command takes two arguments from the evaluation stack and multiplies them" (requirement 11).
Feedback to the student on a failed test: "Your testing does not fully cover (11)".
More detailed alternate feedback: "(11) mul command failed, expected 4 but received 8".
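
One way such a hint can travel with an instructor reference test is as the assertion's message. The sketch below assumes a hypothetical Calculator class with a stack-based API; only the hint text keyed to requirement 11 comes from the slide.

  import java.util.Stack;
  import junit.framework.TestCase;

  public class MulReferenceTest extends TestCase
  {
      public void testMulCommand()
      {
          Calculator calc = new Calculator();
          calc.push(2);
          calc.push(2);
          calc.execute("mul");

          // The message is what the student sees when the assertion fails;
          // "(11)" points back to requirement 11 in the assignment spec.
          assertEquals("(11) mul command failed", 4, calc.top());
      }
  }

  // Minimal stand-in implementation so the example is self-contained.
  class Calculator
  {
      private Stack<Integer> stack = new Stack<Integer>();

      public void push(int value)  { stack.push(value); }
      public int  top()            { return stack.peek(); }

      public void execute(String command)
      {
          if ("mul".equals(command))
          {
              stack.push(stack.pop() * stack.pop());
          }
      }
  }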
31
Students will try to get Web-CAT to do their work
for them
  • Students appreciate the feedback, but will avoid
    thinking at (nearly) all costs
  • Too much feedback encourages students to use
    Web-CAT for testing instead of writing their own
    tests; they use it as a development tool instead
    of simply to check their work
  • This limits the learning benefits, which come in
    large part from students writing their own tests
  • Lesson: balance providing suggestive feedback
    without giving away the answers; lead the
    student to think about the problem

32
We have also tried to influence student work
habits to improve their success
  • Encourage early submission by providing extra
    incentives or using late penalties
  • Score bonuses and/or penalties are easy
  • Another useful approach:
  • Generous limit on the total number of submissions
    (60)
  • Hints disappear one day before the due date
  • Project closes for one day to encourage students
    to step away and reflect on the last bug
  • Project opens again for one day with hints
    re-enabled, but with a cap on how much the score
    can improve

33
Lessons for writing program assignments intended
for automatic grading
  • Requires greater clarity and specificity
  • Requires you to explicitly decide what you wish
    to test, and what you wish to leave open to
    student interpretation
  • Requires you to unambiguously specify the
    behaviors you intend to test
  • Requires preparing a reference solution before
    the project is due; more upfront work for
    professors or TAs
  • Grading is much easier, as many things are taken
    care of by Web-CAT; course staff can focus on
    assessing design

34
Areas to look out for in writing testable
assignments
  • How do you write tests for the following?
  • Main programs
  • Code that reads/writes to/from stdin/stdout or
    files
  • Code with graphical output
  • Code with a graphical user interface

35
Testing main programs
  • The key: think in object-oriented terms
  • There should be a principal class that does all
    the work, and a really short main program
  • The problem is then simply how to test the
    principal class (i.e., test all of its methods)
  • Make sure you specify your assignments so that
    such principal classes provide enough accessors
    to inspect or extract what you need to test
    (a sketch follows)
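
A minimal sketch of this structure, using a hypothetical word-counting assignment; the WordCounter name and its methods are invented for illustration.

  import java.io.BufferedReader;
  import java.io.IOException;
  import java.io.InputStreamReader;

  // Principal class: all of the program's logic lives here, where
  // student tests can reach it directly.
  public class WordCounter
  {
      private int words = 0;

      // Reads everything from the given stream and tallies the words.
      public void readFrom(BufferedReader in) throws IOException
      {
          String line;
          while ((line = in.readLine()) != null)
          {
              String trimmed = line.trim();
              if (trimmed.length() > 0)
              {
                  words += trimmed.split("\\s+").length;
              }
          }
      }

      // Accessor that lets test cases inspect the result.
      public int getWordCount()
      {
          return words;
      }

      // The "really short main": just wires the principal class to stdin.
      public static void main(String[] args) throws IOException
      {
          WordCounter counter = new WordCounter();
          counter.readFrom(new BufferedReader(new InputStreamReader(System.in)));
          System.out.println(counter.getWordCount());
      }
  }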

36
Testing input and output behavior
  • The key: specify assignments so that input and
    output use streams given as parameters, and are
    not hard-coded to specific sources/destinations
  • Then use string-based streams to write test
    cases, and show students how (see the sketch below)
  • In Java, we use BufferedReaders and PrintWriters
    for all I/O
  • In C++, we use istreams and ostreams for all I/O
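
For example, a string-based test of the hypothetical WordCounter sketched on the previous slide needs no files or console input at all:

  import java.io.BufferedReader;
  import java.io.StringReader;
  import junit.framework.TestCase;

  public class WordCounterTest extends TestCase
  {
      public void testCountsWordsAcrossLines() throws Exception
      {
          WordCounter counter = new WordCounter();

          // A StringReader stands in for a file or stdin.
          BufferedReader in = new BufferedReader(
              new StringReader("one two three\nfour five\n"));
          counter.readFrom(in);

          assertEquals(5, counter.getWordCount());
      }
  }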

37
Testing programs with graphical output
  • The key: if graphics are only for output, you can
    ignore them in testing
  • Ensure there are enough methods to extract the
    key data in test cases (see the sketch below)
  • We use this approach for testing Karel the Robot
    programs, which use graphic animation so students
    can observe behavior
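
A sketch of what an accessor-based test might look like for a Karel-style assignment; the Robot class and its API here are hypothetical.

  import junit.framework.TestCase;

  public class RobotTest extends TestCase
  {
      public void testMoveUpdatesPosition()
      {
          Robot karel = new Robot(1, 1);   // start at column 1, row 1
          karel.move();
          karel.move();

          // The animation is never examined; the test checks the state
          // that the animation is drawn from.
          assertEquals(3, karel.getColumn());
          assertEquals(1, karel.getRow());
      }
  }

  // Minimal stand-in so the example compiles; a real version would also
  // trigger the on-screen animation inside move().
  class Robot
  {
      private int column, row;

      public Robot(int column, int row) { this.column = column; this.row = row; }

      public void move() { column++; }     // moves one cell to the east

      public int getColumn() { return column; }
      public int getRow()    { return row; }
  }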

38
Testing programs with graphical UIs
  • This is a harder problem, and maybe too
    distracting for many students, depending on their
    level
  • The key question: what is the goal in writing the
    tests? Is it the GUI you want to test, some
    internal behavior, or both?
  • Three basic approaches:
  • Specify a well-defined boundary between the GUI
    and the core, and only test the core code
  • Switch in an alternative implementation of the UI
    classes during testing (sketched below)
  • Test by simulating GUI events
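
The first two approaches can be combined: keep the core behind a small UI interface and substitute a recording fake during testing. A hypothetical sketch (all class names invented):

  import junit.framework.TestCase;

  // The core talks to the UI only through this interface.
  interface MessageView
  {
      void showMessage(String text);
  }

  // Core logic with no direct dependence on any GUI toolkit.
  class GuessingGame
  {
      private MessageView view;
      private int secret;

      public GuessingGame(MessageView view, int secret)
      {
          this.view = view;
          this.secret = secret;
      }

      public void guess(int value)
      {
          view.showMessage(value == secret ? "Correct!" : "Try again");
      }
  }

  public class GuessingGameTest extends TestCase
  {
      // Fake view that records what the core asked it to display.
      static class RecordingView implements MessageView
      {
          String lastMessage;
          public void showMessage(String text) { lastMessage = text; }
      }

      public void testCorrectGuessIsReported()
      {
          RecordingView view = new RecordingView();
          GuessingGame game = new GuessingGame(view, 7);
          game.guess(7);
          assertEquals("Correct!", view.lastMessage);
      }
  }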

39
Conclusion: including software testing helps
promote learning and performance
  • If you require students to write their own tests...
  • Our experience indicates students are more likely
    to complete assignments on time, produce one-third
    fewer bugs, and achieve higher grades on
    assignments
  • It is definitely more work for the instructor
  • But it definitely improves the quality of
    programming assignment writeups and student
    submissions

40
Visit our SourceForge project!
  • http://web-cat.sourceforge.net/
  • Info about using our automated grader, getting
    trial accounts, etc.
  • Movies of making submissions, setting up
    assignments, and more
  • Custom Eclipse plug-ins for C++ TDD
  • Links to our own Eclipse feature site