Automated Scoring and Annotation of Essays with the Intelligent Essay Assessor

1
Automated Scoring and Annotation of Essays with
the Intelligent Essay Assessor

Darrell Laham, Ph.D.
Knowledge Analysis Technologies

www.knowledge-technologies.com
dlaham_at_knowledge-technologies.com
NCME Annual Meeting, April 12, 2001
2
Knowledge Analysis Technologies, LLC

Founded: In 1998 as a University of Colorado
spin-off
Mission: Develop and market applications using
core Latent Semantic Analysis technology
Flagship Product: Intelligent Essay Assessor
Technical staff: 9 members, of whom 5 Ph.D.s have
developed and patented the technology: Landauer,
Laham, Foltz, Lochbaum, Streeter
R&D support: Ongoing government contracts and
grants: Army Research Institute, Air Force
Research Labs, Office of Naval Research, NSF,
NIST, DARPA
IEA in use: Army (TRADOC), Prentice-Hall Keys to
Success, Test University, university, middle
school distance learning. Seeking strategic
partnerships with testing agencies / content
providers to integrate technology with
instructional materials.

3
Automated essay scoring

Intelligent Essay Assessor technologies
Latent Semantic Analysis for scoring quality of
content and providing tutorial feedback
Style & Mechanics measures (NLP surface
features) for scoring and validation of essay as
appropriate for task
Student essays written to directed prompts
Constructed-response alternative to
multiple-choice for domain knowledge assessment
Directed essay questions or summaries
Writing Across the Curriculum
Reliable, objective, consistent and immediate
Used as second reader, formative evaluations,
diagnostic tutorials, interactive textbooks

4
Inter-rater reliability for standardized and
classroom tests
5
Inter-rater reliability for resolved reader
scores
6
The Intelligent Essay Assessor
[Diagram. Surviving labels: Customized Reader, Expert Scored Essays, Overall, Content, Style, Mechanics, Score, Confidence, "variance VL"]
7
Diagnosis & Prescription in an Interactive
Tutorial

Embedded assessment in online learning
Direct student to appropriate learning content
Returns separate Overall, Content, Style and
Mechanics scores with associated rubric feedback
Can provide other useful feedback
Suggest revisions
Flag missing, irrelevant or redundant content
Determine source of knowledge
Detect plagiarism (from text or other students)
Ask students to summarize presented materials to
promote deeper levels of content learning
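
The plagiarism and redundancy flags listed above both come down to asking how similar two texts are. The following is a minimal sketch of that idea, not the IEA's method: it swaps in plain word-count vectors where the product would presumably compare essays in an LSA space, and the function names, sample essays, and the 0.8 threshold are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def bow(text):
    """Bag-of-words vector: word -> count (a stand-in for an LSA vector)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0

def flag_similar(essays, threshold=0.8):
    """Return pairs of essay names whose similarity meets the (assumed) threshold."""
    vecs = {name: bow(text) for name, text in essays.items()}
    return [(a, b) for a, b in combinations(sorted(vecs), 2)
            if cosine(vecs[a], vecs[b]) >= threshold]

# Hypothetical essays, for illustration only.
essays = {
    "student_a": "neurons send signals across the synapse using neurotransmitters",
    "student_b": "neurons send signals across the synapse using chemical neurotransmitters",
    "student_c": "the heart pumps blood through arteries and veins",
}
flagged = flag_similar(essays)
```

Comparing an essay against sections of the source text in the same way would give a rough signal for copied, missing, or off-topic content; a real threshold would have to be tuned against human judgments.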

8
Prentice Hall Companion Websites
9
Prentice Hall Companion Websites
10
Core Technology: Latent Semantic Analysis

A new kind of artificial intelligence that uses
Machine-learning and neural networks
Sophisticated mathematics
Enormous computing power
Large amounts of electronic text
To capture the meaning of
General English
Subject-specific vocabulary and concepts
By reading
Textbooks, reference material, notes, etc.

11
Why Use LSA?

Measures semantic content against prescribed
standards of quality based on human judgment
NOT just a keyword system; automatically learns
synonyms and variations in wording
Successfully models human judgments in various
psycholinguistic tasks
Passes TOEFL synonym test at mean of foreign
students
Word Sorting tasks
Concept Classification tasks
Passes domain specific multiple-choice tests when
trained on domain knowledge (passed Psych 101
final exam when trained on textbook)
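
The claim that LSA learns synonyms without keyword matching can be illustrated with a toy example. This is a generic sketch of LSA as a truncated SVD of a term-by-document matrix, not the IEA implementation: the corpus, the choice of k = 2, and all variable names are assumptions, and practical systems typically weight the counts and keep far more dimensions.

```python
import numpy as np

# Toy corpus (hypothetical): "doctor" and "physician" never share a document,
# but both co-occur with "hospital", "patient", and "nurse".
docs = [
    "doctor hospital patient",
    "doctor nurse hospital",
    "physician hospital patient",
    "physician nurse hospital",
    "car engine road",
    "car wheel road",
    "engine wheel car",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-by-document count matrix.
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        X[index[w], j] += 1

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Truncated SVD: keep only the k largest singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]  # one k-dimensional vector per term

# Keyword-style comparison on raw counts vs. comparison in the reduced space.
raw_sim = cosine(X[index["doctor"]], X[index["physician"]])
lsa_sim = cosine(term_vecs[index["doctor"]], term_vecs[index["physician"]])
lsa_unrelated = cosine(term_vecs[index["doctor"]], term_vecs[index["car"]])
```

On raw counts the two words look unrelated because they never appear together; in the reduced space they end up close because they occur in similar contexts, while "doctor" and "car" stay apart.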

12
Relative prediction strength of individual IEA
components
13
Relative percent contribution for IEA components
14
Reliability for GMAT Issue Test set: varying
numbers of pre-scored essays in the Training set
(LSA component measure only, N = 292)
15
Scattergram for GMAT 1 Test Set
16
Reliability by Reader Experience: Intro
Psychology Essays on Neural Transmission
17
Summary Street Produces Better Summaries as
Judged by Teachers: 2-Week Trial, Middle School
Students
18
Intelligent Essay Assessor Case Study

900 Narrative Essays by Middle School Students
Prompt: "Usually the gate was closed but today it
was left open"
Scored by an international testing organization
IEA trained on General Knowledge
IEA learns from human scores of training set to
score new essays
IEA agrees with human readers as well as the
human readers agree with each other
(correlation of .90)
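
One way to picture "learns from human scores of training set" and "agrees with human readers" is sketched below. It assumes, purely for illustration, that a new essay is scored by a similarity-weighted average of the human scores of its nearest training essays, and that agreement is measured with a Pearson correlation. The vectors, scores, k = 3, and the function names are hypothetical, and the resulting number has no connection to the case-study figure.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_score(essay_vec, train_vecs, train_scores, k=3):
    """Similarity-weighted mean of the human scores of the k nearest training essays."""
    sims = np.array([cosine(essay_vec, v) for v in train_vecs])
    nearest = np.argsort(sims)[-k:]                 # indices of the k most similar essays
    weights = np.clip(sims[nearest], 1e-9, None)    # keep weights strictly positive
    return float(np.average(train_scores[nearest], weights=weights))

# Hypothetical essay vectors in a 2-D semantic space, with human scores.
train_vecs = np.array([[1.0, 0.1], [0.9, 0.2], [0.8, 0.3],
                       [0.3, 0.8], [0.2, 0.9], [0.1, 1.0]])
train_scores = np.array([6.0, 5.0, 5.0, 2.0, 2.0, 1.0])

# Held-out essays that human readers have also scored.
new_vecs = np.array([[0.95, 0.15], [0.15, 0.95], [0.85, 0.25], [0.25, 0.85]])
human_scores = np.array([6.0, 1.0, 5.0, 2.0])

predicted = np.array([predict_score(v, train_vecs, train_scores) for v in new_vecs])

# Agreement between machine and human scores as a Pearson correlation.
r = float(np.corrcoef(predicted, human_scores)[0, 1])
```

The same correlation computed between two human readers on the same essays gives the inter-rater baseline against which the machine's agreement is compared.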

19
Scattergram for Narrative Essays
20
IEA compared to Other Systems

Focus is on quality of content as judged by
people rather than on measures of surface
features & keywords
Uses background knowledge of domain in assessment
in addition to previously scored essays
Measures what students are saying rather than
just how well they are saying it
Does best when linked to course & student learning
materials: provides formative assessment of
domain knowledge with tutorial feedback rather
than just a simple overall score
Requires fewer training essays (100 vs. 500)
More difficult to coach students in ways to
receive an artificially high score (e.g. use
semi-colons or say "Thus" and "Therefore")
Models do NOT use any count variables (Word
count, etc.)
Proven as accurate as e-rater on GMAT test