Title: Automated Scoring and Annotation of Essays with the Intelligent Essay Assessor
1 Automated Scoring and Annotation of Essays with the Intelligent Essay Assessor
- Darrell Laham, Ph.D.
- Knowledge Analysis Technologies
www.knowledge-technologies.com
dlaham_at_knowledge-technologies.com
NCME Annual Meeting, April 12, 2001
2 Knowledge Analysis Technologies, LLC
- Founded in 1998 as University of Colorado spin-off
- Mission: Develop and market applications using core Latent Semantic Analysis technology
- Flagship product: Intelligent Essay Assessor
- Technical staff: 9 members, of which 5 Ph.D.s have developed and patented the technology (Landauer, Laham, Foltz, Lochbaum, Streeter)
- R&D support: Ongoing government contracts and grants (Army Research Institute, Air Force Research Labs, Office of Naval Research, NSF, NIST, DARPA)
- IEA in use: Army (TRADOC), Prentice-Hall Keys to Success, Test University, university, middle school distance learning. Seeking strategic partnerships with testing agencies / content providers to integrate technology with instructional materials.
3 Automated essay scoring
- Intelligent Essay Assessor technologies
- Latent Semantic Analysis for scoring quality of content and providing tutorial feedback
- Style and Mechanics measures (NLP surface features) for scoring and validation of essay as appropriate for task
- Student essays written to directed prompts
- Constructed-response alternative to multiple-choice for domain knowledge assessment
- Directed essay questions or summaries
- Writing Across the Curriculum
- Reliable, objective, consistent and immediate
- Used as second reader, formative evaluations, diagnostic tutorials, interactive textbooks
4 Inter-rater reliability for standardized and classroom tests
5 Inter-rater reliability for resolved reader scores
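Inter-rater reliability of the kind named in these two slide titles is commonly reported as a Pearson correlation between two sets of scores for the same essays. The sketch below assumes that statistic (the slide titles do not say which one was charted). The function name `pearson_r` and all scores are hypothetical illustrations, not data from the talk.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one reader gives constant scores")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores on a 1-6 scale for eight essays (assumed for illustration).
reader_1 = [4, 3, 5, 2, 6, 4, 3, 5]
reader_2 = [4, 3, 4, 2, 6, 5, 3, 5]
machine = [4, 3, 5, 3, 6, 4, 3, 4]

human_human = pearson_r(reader_1, reader_2)
machine_human = pearson_r(machine, reader_1)
print(f"reader 1 vs reader 2: {human_human:.2f}")
print(f"machine vs reader 1:  {machine_human:.2f}")
```

Comparing the machine-to-reader correlation with the reader-to-reader correlation on the same essays is one way to read a claim that an automated scorer agrees with humans about as well as humans agree with each other.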
6 The Intelligent Essay Assessor
[Diagram; surviving labels: Customized Reader, Overall, CONTENT, Expert Scored Essays, Content, Score, Style, Mechanics, variance VL, Confidence]
7 Diagnosis and Prescription in an Interactive Tutorial
- Embedded assessment in online learning
- Direct student to appropriate learning content
- Returns separate Overall, Content, Style and Mechanics scores with associated rubric feedback
- Can provide other useful feedback
- Suggest revisions
- Flag missing, irrelevant or redundant content
- Determine source of knowledge
- Detect plagiarism (from text or other students)
- Ask students to summarize presented materials to promote deeper levels of content learning
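The plagiarism and redundancy checks listed on this slide can be pictured as a pairwise similarity scan over submitted essays. The sketch below is only an illustration under stated assumptions: plain word-count vectors stand in for the LSA vectors the product would use, and the names `bag_of_words`, `flag_similar_pairs`, the 0.9 threshold, and the sample essays are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

def bag_of_words(text):
    """Word-count vector for a text (an assumed stand-in for an LSA vector)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def flag_similar_pairs(essays, threshold=0.9):
    """Return (id_a, id_b, similarity) for every essay pair at or above threshold."""
    vectors = {eid: bag_of_words(text) for eid, text in essays.items()}
    flagged = []
    for a, b in combinations(sorted(vectors), 2):
        sim = cosine(vectors[a], vectors[b])
        if sim >= threshold:
            flagged.append((a, b, sim))
    return flagged

# Hypothetical student submissions (assumed for illustration).
essays = {
    "s1": "The neuron fires when the membrane potential crosses the threshold.",
    "s2": "When the membrane potential crosses the threshold, the neuron fires.",
    "s3": "Neurotransmitters cross the synapse and bind to receptors.",
}
flagged = flag_similar_pairs(essays)
for a, b, sim in flagged:
    print(f"{a} and {b} look suspiciously similar (cosine {sim:.2f})")
```

Because the vectors ignore word order, reordering the clauses of a copied sentence (as in `s2`) does not hide the overlap. The same scan run against source texts instead of other students' essays would cover the "from text" case.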
8 Prentice Hall Companion Websites
9 Prentice Hall Companion Websites
10 Core Technology: Latent Semantic Analysis
- A new kind of artificial intelligence that uses
- Machine-learning and neural networks
- Sophisticated mathematics
- Enormous computing power
- Large amounts of electronic text
- To capture the meaning of
- General English
- Subject-specific vocabulary and concepts
- By reading
- Textbooks, reference material, notes, etc.
11 Why Use LSA?
- Measures semantic content against prescribed standards of quality based on human judgment
- NOT just a keyword system; automatically learns synonyms and variations in wording
- Successfully models human judgments in various psycholinguistic tasks
- Passes TOEFL synonym test at mean of foreign students
- Word Sorting tasks
- Concept Classification tasks
- Passes domain-specific multiple-choice tests when trained on domain knowledge (passed Psych 101 final exam when trained on textbook)
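The claim that LSA learns synonyms rather than matching keywords can be illustrated with a toy example: build a term-by-document count matrix, keep only its top singular dimensions, and compare words by cosine in the reduced space. This is a minimal sketch under stated assumptions, not the product's implementation: it uses raw counts, two dimensions, a six-document made-up corpus, and a pure-Python power iteration in place of a real SVD routine. All function names are hypothetical.

```python
import math

def term_document_matrix(docs):
    """Return (vocab, A) with A[t][d] = count of term t in document d."""
    vocab = sorted({w for doc in docs for w in doc.lower().split()})
    row = {w: t for t, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for d, doc in enumerate(docs):
        for w in doc.lower().split():
            A[row[w]][d] += 1.0
    return vocab, A

def top_eigenvectors(G, k, iters=500):
    """Top-k eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(G)
    G = [r[:] for r in G]  # work on a copy
    found = []
    for _ in range(k):
        v = [1.0 + 0.01 * i for i in range(n)]  # fixed, non-uniform start vector
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * G[i][j] * v[j] for i in range(n) for j in range(n))
        found.append(v)
        for i in range(n):  # deflate: subtract the component just found
            for j in range(n):
                G[i][j] -= lam * v[i] * v[j]
    return found

def lsa_term_vectors(docs, k):
    """Map each term to its k-dimensional coordinates (rows of U * Sigma)."""
    vocab, A = term_document_matrix(docs)
    n = len(docs)
    gram = [[sum(r[i] * r[j] for r in A) for j in range(n)] for i in range(n)]  # A^T A
    V = top_eigenvectors(gram, k)  # right singular vectors of A
    return {w: [sum(A[t][d] * v[d] for d in range(n)) for v in V]
            for t, w in enumerate(vocab)}

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def choose_synonym(stem, options, vectors):
    """Pick the option closest to the stem word, as in a multiple-choice synonym item."""
    return max(options, key=lambda w: cosine(vectors[stem], vectors[w]))

# Made-up corpus: "doctor" and "physician" never occur in the same document.
docs = [
    "doctor treats patient", "physician treats patient",
    "doctor hospital nurse", "physician hospital nurse",
    "car engine wheel", "truck engine wheel",
]
vocab, A = term_document_matrix(docs)
raw = cosine(A[vocab.index("doctor")], A[vocab.index("physician")])
vectors = lsa_term_vectors(docs, k=2)
answer = choose_synonym("doctor", ["engine", "physician", "wheel", "truck"], vectors)
print("raw co-occurrence cosine:", raw)
print("chosen synonym:", answer)
```

In the raw matrix the two words share no documents, so a keyword comparison scores them as unrelated. After reduction they land in the same direction because they occur in similar contexts, and `choose_synonym` picks "physician". With only two dimensions every medical term collapses onto one axis, which is an artifact of the toy corpus. A realistic space would use far more text and dimensions.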
12 Relative prediction strength of individual IEA components
13 Relative percent contribution for IEA components
14 Reliability for GMAT Issue Test set: varying numbers of pre-scored essays in the Training set (LSA component measure only, N = 292)
15 Scattergram for GMAT 1 Test Set
16 Reliability by Reader Experience: Intro Psychology Essays on Neural Transmission
17 Summary Street Produces Better Summaries as Judged by Teachers: 2-Week Trial, Middle School Students
18 Intelligent Essay Assessor Case Study
- 900 Narrative Essays by Middle School Students
- Prompt: "Usually the gate was closed but today it was left open"
- Scored by an international testing organization
- IEA trained on General Knowledge
- IEA learns from human scores of training set to score new essays
- IEA agrees with human readers as well as the human readers agree with each other (correlation of .90)
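One way to picture "learns from human scores of training set to score new essays" is nearest-neighbor scoring: a new essay receives the similarity-weighted average of the human scores given to the most similar pre-scored essays. The slides do not spell out the procedure, so treat this as an assumed illustration. Word-count vectors stand in for LSA vectors, and `predict_score`, `k`, and the sample essays and scores are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Word-count vector (an assumed stand-in for an LSA essay vector)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_score(essay, scored_essays, k=3):
    """Similarity-weighted mean of the human scores of the k nearest scored essays.

    Returns None when the essay resembles nothing in the training set.
    """
    target = vectorize(essay)
    neighbors = sorted(
        ((cosine(target, vectorize(text)), score) for text, score in scored_essays),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None
    return sum(sim * score for sim, score in neighbors) / total

# Hypothetical training set: (essay text, human score). Not data from the case study.
scored = [
    ("the gate was open so the dog ran into the street", 4),
    ("the dog ran out the open gate and down the street", 5),
    ("i like pizza", 1),
    ("pizza is my favorite food", 1),
]
predicted = predict_score("the dog ran through the open gate", scored, k=2)
print(f"predicted score: {predicted:.2f}")
```

Returning `None` for an essay that matches nothing in the training set is a design choice in this sketch: an essay unlike anything seen before is better routed to a human reader than given a guessed score.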
19 Scattergram for Narrative Essays
20 IEA compared to Other Systems
- Focus is on quality of content as judged by people rather than on measures of surface features and keywords
- Uses background knowledge of domain in assessment in addition to previously scored essays
- Measures what students are saying rather than just how well they are saying it
- Does best when linked to course and student learning materials; provides formative assessment of domain knowledge with tutorial feedback rather than just a simple overall score
- Requires fewer training essays (100 vs. 500)
- More difficult to coach student in ways to receive artificially high score (e.g. use semi-colons or say "Thus" and "Therefore")
- Models do NOT use any count variables (word count, etc.)
- Proven as accurate as e-rater on GMAT test
21 Other LSA / IEA Products (available or under development)
- Intelligent Tutoring Systems for Leadership Skills
- with Army Research Institute and Yale University
- Collaborative Learning Environments
- with Knowledge Forum (Scardamalia and Bereiter)
- Job, training, and personnel analysis (www.careermap.org)
- with Air Force Research Laboratory
- Interactive Electronic Library Tools
- with National Science Foundation
- Interactive Electronic Technical Manuals
- with Office of Naval Research
- Tools for rapid development of on-line courses
- with NIST and eCollege