While waiting for the talk to start, try to find 4 mistakes in this student essay.

Provided by: LRDC5

Learn more at: https://www.public.asu.edu

1
While waiting for the talk to start, try to find
4 mistakes in this student essay.
Question: Suppose you are running in a straight
line at constant speed. You throw a pumpkin
straight up. Where will it land? Explain why.
Student: Once the pumpkin leaves my hand, the
horizontal force that I am exerting on it no
longer exists, only a vertical force (caused by
my throwing it). As it reaches its maximum
height, gravity (exerted vertically downward)
will cause the pumpkin to fall. Since no
horizontal force acted on the pumpkin from the
time it left my hand, it will fall at the same
place where it left my hands.
2
Can (physics) tutoring systems be more effective
than human tutors?

Kurt VanLehn
Pittsburgh Science of Learning Center
LRDC & the Computer Science Department
University of Pittsburgh

3
Thanks!

Current team
Pamela Jordan (lead)
Patricia Albacete
Min Chi
John Connelly
Roxana Gheorghui
Sung-Young Jung
Brian Moses Hall
Uma Pappuswamy
Mike Ringenberg

Team Alumni
Dumisizwe Bhembe
Michael Boettner
Andy Gaydos
Maxim Makatchev
Antonio Roque
Carolyn Rosé
Stephanie Siler
Ramesh Srivistava
Roy Wilson

Art Graesser's group at the University of
Memphis
4
The Learning Science research question: Increasing tutoring systems' effectiveness?

Computer-aided instruction (CAI) > classroom by d = 0.4 sigma
(Kulik, 1994)
Intelligent tutoring systems (ITS) > classroom by d = 1.0 sigma
(Koedinger et al., 1997; VanLehn et al., 2006)
Human tutors (HT) > classroom by d = 2.0 sigma
(Bloom, 1984)
How can we build tutoring systems that are as effective as human tutors?
where the effect size (Cohen's d) is

$$d = \frac{\mathrm{gain}_{\mathrm{experimental}} - \mathrm{gain}_{\mathrm{controls}}}{\mathrm{SD}_{\mathrm{pooled}}}$$
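A minimal sketch of the effect-size formula, assuming each list holds per-student gain scores (e.g., posttest minus pretest) and that the pooled standard deviation is the usual (n - 1)-weighted one; the slide does not say which pooling formula was used, and the function name cohens_d is hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence

def cohens_d(gains_exp: Sequence[float], gains_ctrl: Sequence[float]) -> float:
    """Difference in mean gains divided by the pooled standard deviation."""
    n1, n2 = len(gains_exp), len(gains_ctrl)
    s1, s2 = stdev(gains_exp), stdev(gains_ctrl)  # sample SDs (n - 1 denominator)
    # Assumed pooling: weight each group's variance by its degrees of freedom
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(gains_exp) - mean(gains_ctrl)) / pooled

# Hypothetical gains: means 14 vs. 10, pooled SD 2, so d = 2.0 ("2 sigma")
print(cohens_d([12, 14, 16], [8, 10, 12]))
```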

5
The Cognitive Science research question: The more interactivity, the more gain?
6
The Computer Science research question: Deep linguistic techniques vs. shallow?
Shallow linguistic vs. deep linguistic, by component:
Natural language understanding (NLU): shallow = LSA, other bag-of-words; deep = syntactic grammars, lexicons, semantics
Dialog management: shallow = finite state networks; deep = reactive planning
Natural language generation (NLG): shallow = text templates; deep = plan-based
Non-routine language: shallow = ignored; deep = anaphora, negation, …
Because the techniques are compared in the
context of a tutoring system, we can evaluate
them for pedagogical effectiveness as well as the
usual measures of speed, accuracy, generality,
etc.
7
Outline

Introduction
Focus on multi-step problem solving
What is human tutoring?
Research questions
Why2-Atlas
Evaluations
Of individual techniques
Of the whole system

Next
8
A multi-step quantitative problem
[Figure: the problem's solution, with seven parts each labeled "Step"]
9
A multi-step qualitative problem

Q: Suppose a man is running in a straight line at constant speed. He throws a pumpkin straight up. Where will it land?

Step: Initially, the man and the pumpkin have the same horizontal velocity. His throw exerts a net force vertically on the pumpkin, thus causing a vertical acceleration, which leaves the horizontal velocity unaffected.
[Figure: three further parts labeled "Step"]
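The step above can be checked numerically. This is a minimal sketch, assuming no air resistance (the step's reasoning ignores it) and that the pumpkin comes back down to the height at which it was released; the function name simulate, the time step and the speeds are hypothetical. Only gravity acts after release, so the pumpkin's horizontal velocity never changes and it stays directly above the runner.

```python
from typing import Tuple

def simulate(run_speed: float, throw_speed: float, dt: float = 1e-3) -> Tuple[float, float]:
    """Return (pumpkin_x, runner_x) when the pumpkin falls back to release height."""
    g = 9.81                      # m/s^2, downward
    runner_x = 0.0
    x, y = 0.0, 0.0
    vx, vy = run_speed, throw_speed   # pumpkin starts with the runner's horizontal velocity
    while True:
        vy -= g * dt              # vertical acceleration from gravity only
        x += vx * dt              # no horizontal force, so vx is never updated
        y += vy * dt
        runner_x += run_speed * dt    # runner keeps the same constant speed
        if y <= 0.0:
            return x, runner_x

pumpkin_x, runner_x = simulate(run_speed=5.0, throw_speed=10.0)
print(pumpkin_x - runner_x)  # the pumpkin lands back in the runner's hand
```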
10
A multi-step problem where order of steps doesn't matter

Q: Why do most computers have a disk drive? Why can't they have only RAM?
Student:
RAM's content disappears when power quits, but disk content persists.
RAM usually holds less information than disk.
RAM takes battery power, so larger RAM takes more power.
Certain information, e.g., the operating system and user files, must be stored permanently.

[Each of the four sentences is labeled as one "Step"]
11
Outline

Introduction
Focus on multi-step problem solving
What is human tutoring?
Research questions
Why2-Atlas
Evaluations
Of individual techniques
Of the whole system

Next
12
Human tutorial dialogue is a sequence of
episodes, one per step
Q: Why does a computer need disk as well as RAM?

S: RAM is too small. Only the disk is big enough.
T: That's usually true. But suppose you bought a lot of RAM. Why wouldn't that work?
S: The battery would run out too fast.
T: Excellent. What else?
S: That's it.
T: What if the battery dies?
S: Oh. The RAM dies.
T: Anything wrong with that?
S: You lose your files.
T: Besides the user's files, what else would be lost?
S: Beats me.
T: The operating system!

13
Schematic of tutorial dialogue

Problem statement
Step
Step
Step
Step
Answer
Reflection (optional)

14
Schematic of dialogue about a single step
Step start
  T: Elicit
    S: Correct
    S: Incorrect → Remediation (T: hint, or prompt, or explain, or analogy, or …)
  T: Tell
Step end
15
Comparisons of expert to novice human tutors
Step start
  T: Elicit (experts)
    S: Correct
    S: Incorrect → T: hint, or prompt, or explain, or analogy, or … (experts may have a wider variety)
  T: Tell (novices)
Step end
16
Outline

Introduction
Focus on multi-step problem solving
What is human tutoring?
Research questions
Why2-Atlas
Evaluations
Of individual techniques
Of the whole system

Next
17
The Learning Science research question: Increasing tutoring system effectiveness

CAI: remediation on answer only
ITS (e.g., Andes): remediation on each step
Hint sequence, with final bottom-out hint
Human tutors: remediation on each step
Natural language dialogues
Many tutorial tactics
A tutoring system with natural language for its remediation?
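A hint sequence with a final bottom-out hint can be sketched as a small class. This is a minimal sketch under stated assumptions, not Andes's implementation: the class name HintSequence and the pumpkin hints are hypothetical, and the design choice that repeated requests keep returning the bottom-out hint is an assumption.

```python
from typing import List

class HintSequence:
    """Hints for one step, from most general to the bottom-out hint (which gives the answer)."""

    def __init__(self, hints: List[str]) -> None:
        if not hints:
            raise ValueError("a hint sequence needs at least one hint")
        self.hints = hints
        self.position = 0

    def next_hint(self) -> str:
        hint = self.hints[self.position]
        if self.position < len(self.hints) - 1:
            self.position += 1    # once exhausted, further requests stay on the bottom-out hint
        return hint

seq = HintSequence([
    "What forces act on the pumpkin after it leaves your hand?",
    "Think about gravity. In which direction does it pull?",
    "Gravity is the only force, and it acts vertically downward.",  # bottom-out hint
])
print(seq.next_hint())
```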

18
The Cognitive Science research question: The more interactivity, the more gain?
19
The Computer Science research question: Deep linguistic techniques vs. shallow?
Shallow linguistic vs. deep linguistic, by component:
Natural language understanding (NLU): shallow = LSA, other bag-of-words; deep = syntactic grammars, lexicons, semantics
Dialog management: shallow = finite state networks; deep = reactive planning
Natural language generation (NLG): shallow = text templates; deep = plan-based
Non-routine language: shallow = ignored; deep = anaphora, negation, …
Evaluate for pedagogical effectiveness as well
as the usual measures of speed, accuracy,
generality, etc.
20
A task domain where deep understanding may add
value

Qualitative physics
A massive truck and a light car have a head-on
collision. Which suffers the greater impact
force? Why?
Linguistic relationships matter
car, truck, exerts, more, force
Detecting deep misconceptions
E.g., Bigger things exert more force.
Unfortunately, these misconceptions are
notoriously resistant to instruction
Try giving 10 hours of instruction
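Why linguistic relationships matter for the truck/car question can be shown with a bag-of-words representation, the input that shallow understanders such as LSA work from. The two sentences below are hypothetical student answers that make opposite claims (both are misconceptions, since the forces are equal), yet they produce identical bags of words; the function name bag_of_words is also hypothetical.

```python
from collections import Counter

def bag_of_words(sentence: str) -> Counter:
    """Word counts only: word order, and hence who exerts force on whom, is discarded."""
    return Counter(sentence.lower().rstrip(".").split())

a = "The truck exerts more force on the car."
b = "The car exerts more force on the truck."
print(bag_of_words(a) == bag_of_words(b))  # same bag, opposite meanings
```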

21
Outline

Introduction
Focus on multi-step problem solving
What is human tutoring?
What is an ITS? CAI?
Research questions
Why2-Atlas
Evaluations
Of individual techniques
Of the whole system

Next
22
Student's screen for Why2-Atlas
Problem
Dialogue history
Student's essay
Student's turn in the dialogue
23
Schematic of Why2-Atlas tutorial dialogue

T: <displays problem>
S: <enters essay>
T: <analyzes essay to identify missing/incorrect steps, picks one, starts a script for remediation of the step>
T: When the pumpkin is in the air, what forces act on it?
<many turns>
T: Please change your essay.
S: <edits essay>
T: <analyzes essay to identify missing/incorrect steps, picks one, starts a script>
<many turns>
T: Please change your essay.
S: <edits essay>
T: <analyzes essay, decides all steps are correct>
T: Excellent! Here's my version: <presents ideal essay>
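The outer loop of this dialogue can be sketched as a function that takes the system's modules as callbacks. This is a hypothetical sketch, not Why2-Atlas code: tutor_problem, get_essay, analyze and run_script are illustrative names, and always remediating the first flaw found is an assumed policy (the real discourse manager chooses using the dialogue history).

```python
from typing import Callable, List

def tutor_problem(
    get_essay: Callable[[], str],             # S: <enters/edits essay>
    analyze: Callable[[str], List[str]],       # returns missing/incorrect steps in the essay
    run_script: Callable[[str], None],         # remediation script for one step (many turns)
    ideal_essay: str,
) -> str:
    essay = get_essay()
    while True:
        flaws = analyze(essay)
        if not flaws:                          # all steps correct
            return "Excellent! Here's my version: " + ideal_essay
        run_script(flaws[0])                   # assumed policy: remediate the first flaw
        essay = get_essay()                    # T: Please change your essay.

# Usage with stub modules
essays = iter(["It lands behind me.", "It lands in my hand: it keeps the same horizontal velocity."])
scripts_run: List[str] = []
analyze = lambda e: [] if "same horizontal velocity" in e else ["horizontal velocity"]
print(tutor_problem(lambda: next(essays), analyze, scripts_run.append, "<ideal essay>"))
```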

24
Real example (slide 1 of 3)

Question: Suppose a man is running in a straight
line at constant speed. He throws a pumpkin
straight up. Where will it land? Explain.
Student's essay: The pumpkin would land in front
of me because of the horizontal speed that is
exerting on the pumpkin.

25
Example (slide 2 of 3)

Tutor: I see some opportunities for growth. Let's chat. So let's discuss a point that seems to be missing in your essay. Can you tell me what forces are acting on the pumpkin after it is released?
Student: Gravity force

(The tutor picks a missing step, forces, and starts a KCD (knowledge construction dialogue) script.)
(The answer is correct, so the tutor goes to the next state of the KCD.)
26
Example (slide 3 of 3)

Tutor: Good. Since we know that gravity is the only force acting on the pumpkin, let me ask you this. As the gravitational force pulls the pumpkin down, does it also pull the pumpkin horizontally a little bit?
Student: Yes.
Tutor: Let's look at this more closely. In which direction does the earth's gravitational force act?
Student: Vertical
Tutor: Let's consider this some more. Is that vertical upwards or vertical downwards?
Student: Vertical downwards

(The answer "Yes" is wrong, so the tutor starts a subdialogue.)
(The answer "Vertical" is incomplete, so the tutor starts a subdialogue.)
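The KCD behavior in this example (a correct answer advances to the next state; a wrong or incomplete answer pushes into a subdialogue) can be sketched as a tiny script interpreter. This is a hypothetical sketch, not the Why2-Atlas KCD interpreter or TuTalk: KCDState, run_kcd and the keyword-based answer checkers are illustrative assumptions, and after a subdialogue this sketch simply continues with the main line.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class KCDState:
    question: str
    is_correct: Callable[[str], bool]                       # hypothetical answer checker
    subdialogue: List["KCDState"] = field(default_factory=list)

def run_kcd(states: List[KCDState], ask: Callable[[str], str], log: List[str]) -> None:
    """Correct answers advance; wrong or incomplete ones run the state's subdialogue first."""
    for state in states:
        answer = ask(state.question)
        log.append("T: " + state.question)
        log.append("S: " + answer)
        if not state.is_correct(answer):
            run_kcd(state.subdialogue, ask, log)

downwards = KCDState("Is that vertical upwards or vertical downwards?", lambda a: "down" in a.lower())
direction = KCDState("In which direction does the earth's gravitational force act?",
                     lambda a: "down" in a.lower(), [downwards])
horizontal = KCDState("Does gravity also pull the pumpkin horizontally a little bit?",
                      lambda a: a.lower().startswith("no"), [direction])
forces = KCDState("What forces are acting on the pumpkin after it is released?",
                  lambda a: "gravity" in a.lower())

answers = iter(["Gravity force", "Yes.", "Vertical", "Vertical downwards"])  # the student's turns above
log: List[str] = []
run_kcd([forces, horizontal], lambda q: next(answers), log)
print("\n".join(log))
```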
27
Schematic of a single step
Step start
  T: Elicit
    S: Correct
    S: Incorrect → KCD (T, S, T, S, T …)
  T: Tell
Step end
28
Outline

Introduction
Focus on multi-step problem solving
What is human tutoring?
What is an ITS? CAI?
Research questions
Why2-Atlas
Evaluations
Of individual techniques
Of the whole system

Next
29
Why2-Atlas main modules
[Diagram: Student enters/edits the essay → words → Sentence understander → FOPL propositions → Essay understander → missing / bad steps → Discourse manager (decides what to do w.r.t. history) → Clarification | Done | Script for remedying missing/bad step. Other boxes: KCD script interpreter, RealPro NLG, Ideal essay, Student.]
30
Modules evaluated (in yellow)
[Module diagram as on slide 29; the evaluated modules are highlighted in yellow]
31
Evaluate for accuracy (w.r.t. human judges) and
speed
[Module diagram as on slide 29; callout on the sentence understander (words → propositions)]

Deep NLU: Carmel (LCFlex parser, Comlex lexicon, semantic authoring tool)
Shallow NLU: Naïve Bayes, LSA
Hybrids: CarmelTC, Rappel
Result:
Similar accuracy
Complementary errors
Best to use all 3
32
Evaluate for utility as tool
[Module diagram as on slide 29]

Re-implemented as TuTalk
GUI authoring system
XML authoring system
Handy features (e.g., +/- feedback) for ITS
Currently being used by 4 projects
33
Evaluate for accuracy & speed
[Module diagram as on slide 29; callout: "Next few slides"]
34
Essay analysis: You probably found all 4 incorrect steps. Can the essay analyzer?
Question: Suppose you are running in a straight
line at constant speed. You throw a pumpkin
straight up. Where will it land? Explain why.
Student: Once the pumpkin leaves my hand, the
horizontal force that I am exerting on it no
longer exists, only a vertical force (caused by
my throwing it). As it reaches its maximum
height, gravity (exerted vertically downward)
will cause the pumpkin to fall. Since no
horizontal force acted on the pumpkin from the
time it left my hand, it will fall at the same
place where it left my hands.
35
Research problem, more precisely, is:

Given:
Student's sentences s1, s2, s3, s4, s5, …
Set of correct steps c1, c2, c3, c4, c5, …
Set of incorrect steps i1, i2, i3, i4, i5, …
Determine: which correct and incorrect steps match the student's sentences
Directly (graph matching)
Indirectly, using domain knowledge

36
Why do we need indirect matching?

The student said (incorrectly): "The pumpkin slows down, so it lands behind me."
Correct steps
Yada
Yada
Yada
Incorrect steps
Yada
When there is no force to propel an object along,
it slows down
Air friction matters
Yada

Essay analyzer should output both derivations,
with estimates of their probabilities
37
First method: Abduction using Tacitus-Lite

Backchaining theorem prover (like Prolog)
Student's utterance → goal to be proved
Problem statement → givens
Proofs of earlier student utterances → more givens
Accepts goals without proof (at a cost)
Because not everything can be anticipated
Searches for the lowest-cost proof
Checks consistency as it goes
Don't try to prove p when the proof already has ¬p.
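The ideas on this slide (backchaining, accepting goals without proof at a cost, lowest-cost search, consistency checking) can be illustrated with a toy weighted-abduction prover. This is a propositional sketch under stated assumptions, not Tacitus-Lite: the function cheapest_proof, the "not " negation convention, the rule costs and the proposition names (loosely following the two derivations of "The pumpkin slows down" on the next slides) are all hypothetical.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

Rule = Tuple[str, List[str], float]      # (head, body, cost of using the rule)
Proof = Tuple[float, FrozenSet[str]]     # (total cost, propositions assumed without proof)

def neg(p: str) -> str:
    return p[4:] if p.startswith("not ") else "not " + p

def cheapest_proof(goal: str, givens: Set[str], rules: List[Rule], assume_cost: float,
                   visiting: FrozenSet[str] = frozenset()) -> Optional[Proof]:
    """Backchain from goal; any goal may instead be assumed at assume_cost."""
    if neg(goal) in givens:
        return None                                  # consistency: never prove p when not-p holds
    if goal in givens:
        return (0.0, frozenset())
    best: Proof = (assume_cost, frozenset({goal}))   # accept without proof, at a cost
    if goal in visiting:
        return best                                  # cycle guard
    for head, body, cost in rules:
        if head != goal:
            continue
        total, assumed = cost, frozenset()
        for sub in body:
            sub_proof = cheapest_proof(sub, givens, rules, assume_cost, visiting | {goal})
            if sub_proof is None:
                break
            total += sub_proof[0]
            assumed |= sub_proof[1]
        else:
            if any(neg(a) in assumed for a in assumed):
                continue                             # reject proofs assuming both p and not-p
            if total < best[0]:
                best = (total, assumed)
    return best

rules: List[Rule] = [
    ("velocity decreasing", ["h velocity decreasing"], 1.0),                     # imprecision
    ("h velocity decreasing", ["h acceleration negative"], 1.0),                 # kinematics
    ("h acceleration negative", ["h net force negative"], 1.0),                  # Newton's second law
    ("h net force negative", ["h friction negative", "h man force zero"], 1.0),  # net force = sum
    ("h velocity decreasing", ["h net force zero"], 1.0),                        # buggy: no force, so slows
    ("h net force zero", ["h friction zero", "h man force zero"], 1.0),          # net force = sum
]
givens = {"h friction zero", "h man force zero"}
print(cheapest_proof("velocity decreasing", givens, rules, assume_cost=5.0))
```

With these costs the cheapest explanation uses the buggy rule and assumes nothing, while the air-friction derivation is more expensive because its false premise must be assumed.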

38
Derivation 1 (of 2) for "The pumpkin slows down"
The velocity of the pumpkin is decreasing. (Student said this)
  by imprecision (an inference rule), from:
The horizontal component of the velocity of the pumpkin is decreasing.
  by an incorrect inference rule, "(The net force causes the velocity, so) zero net force implies velocity decreases", from:
The horizontal component of the net force on the pumpkin is zero.
  by a correct inference rule, "Net force is sum of forces", from:
The horizontal component of the air friction force on the pumpkin is zero. (given)
The horizontal component of the man's force on the pumpkin is zero. (given)
39
Derivation 2 (of 2) for "The pumpkin slows down"
The velocity of the pumpkin is decreasing. (Student said this)
  by imprecision, from:
The horizontal component of the velocity of the pumpkin is decreasing.
  by kinematics, from:
The horizontal component of the acceleration of the pumpkin is negative.
  by Newton's second law, from:
The horizontal component of the net force on the pumpkin is negative.
  by "Net force is sum of forces", from:
The horizontal component of the air friction force on the pumpkin is negative. (false assumption)
The horizontal component of the man's force on the pumpkin is zero. (given)
40
Results of using Tacitus-Lite

Acceptable accuracy, but far too slow
Cost may not be a good substitute for probability
when there are multiple competing explanations

41
Second method: Precompute the time-consuming reasoning

Precomputations
The deductive closure of the problem statement givens
Save as directed graph
Label subsets of nodes that represent correct and incorrect steps
Convert to Bayesian network & train
To analyze a student's utterance
Clamp directly matched nodes as evidence
Run Bayesian network
Read out most probable steps
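The runtime half of this method (clamp the directly matched nodes, run the network, read out the most probable steps) can be sketched with exact inference by enumeration. This is a toy under loudly stated assumptions: a two-layer noisy-OR network is used as a stand-in for the trained Why2-Atlas network, and the function posterior, the node names, the priors, the link strength and the leak probability are all hypothetical.

```python
from itertools import product
from typing import Dict, List

def posterior(priors: Dict[str, float],          # P(step is present) for each expectation/misconception
              parents: Dict[str, List[str]],     # observable proposition -> steps that can produce it
              evidence: Dict[str, bool],         # clamped (directly matched) proposition nodes
              strength: float = 0.9,             # hypothetical noisy-OR link strength
              leak: float = 0.01) -> Dict[str, float]:
    """Exact posterior of each step node by enumerating all step assignments."""
    steps = list(priors)
    mass_true = {s: 0.0 for s in steps}
    total = 0.0
    for values in product([False, True], repeat=len(steps)):
        world = dict(zip(steps, values))
        p = 1.0
        for s in steps:
            p *= priors[s] if world[s] else 1.0 - priors[s]
        for node, observed in evidence.items():
            p_off = 1.0 - leak                   # noisy-OR: node stays off only if every cause fails
            for s in parents[node]:
                if world[s]:
                    p_off *= 1.0 - strength
            p *= (1.0 - p_off) if observed else p_off
        total += p
        for s in steps:
            if world[s]:
                mass_true[s] += p
    return {s: mass_true[s] / total for s in steps}

priors = {"misconception: no force, so it slows": 0.2, "expectation: same horizontal velocity": 0.5}
parents = {"h velocity decreasing": ["misconception: no force, so it slows"],
           "h net force zero": ["misconception: no force, so it slows", "expectation: same horizontal velocity"]}
post = posterior(priors, parents, {"h velocity decreasing": True})
print(max(post, key=post.get), post)  # read out the most probable step
```

Unobserved proposition nodes are simply left out of the evidence, which marginalizes them away; a real network would also learn its conditional probabilities rather than use fixed noisy-OR parameters.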

42
Results: Fast enough. Better accuracy, but
not by much.
43
Summary

Methods
Abductive theorem prover
Bayesian deductive closure
Results
Similar accuracy
Bayesian deductive closure faster than
abductive theorem prover

[Module diagram as on slide 29]
44
Outline

Introduction
Focus on multi-step problem solving
What is human tutoring?
What is an ITS? CAI?
Research questions
Why2-Atlas
Evaluations
Of individual techniques
Of the whole system

Next
45
Evaluation framework: only step remediation varies with the condition

Pretest (1 hr)
Training (5 to 10 hrs): for each question, do
1. Student enters initial essay
2. Tutor analyzes it for missing/incorrect steps, picks one, and discusses it with student
3. Student enters revised essay
4. Tutor either congratulates student & presents ideal essay, or goes to step 2
Posttest (1 hr)

46
Conditions

Expert Human tutors
Text-based communication
Spoken communication
Computer tutors
Why2-Atlas (VanLehn et al.)
ITSPOKE (Litman et al.)
Why2-AutoTutor (Graesser et al.)
Control conditions
Canned text remediation
Textbook

47
(No Transcript)
48
Human tutors
Step start
  T: Elicit
    S: Correct
    S: Incorrect → T: hint, or prompt, or explain, or analogy, or …
  T: Tell
Step end
49
Why2-Atlas
Step start
  T: Elicit
    S: Correct
    S: Incorrect → Knowledge construction dialogue
  T: Tell
Step end
50
Why2-AutoTutor
Step start
  T: Elicit
    S: Correct
    S: Incorrect → Hint, or prompt, or assert
  T: Tell
Step end
51
Canned-text remediation
Step start
  T: Elicit
    S: Correct
    S: Incorrect → <text>
  T: Tell
Step end
52
Results from 7 experiments

Why2-Atlas ≈ Why2-AutoTutor
Trend for >, but not significant
Why2-Atlas may need more development
Why2 > Textbook
In the Textbook condition, students do not write essays
Why2 ≈ Human tutoring !!!
Human tutoring ≈ Canned text remediation
Exception: If pre-physics students get instruction designed for post-physics students, then Human tutoring > Canned text remediation

53
Impact & significance of the results

Why2-Atlas ≈ Why2-AutoTutor
Common in AI that complex techniques are only slightly better than simple ones, at least initially.
Why2 > Textbook
Common in Learning Sciences that active > passive
Why2 ≈ Human tutoring ≈ Canned text remediation
Highly counter-intuitive to Learning and Cognitive scientists (including us)

54
Hypothesis 1: Exactly how tutors remedy a step doesn't matter much
Step start
  T: Elicit
    S: Correct
    S: Incorrect → (What's in here doesn't matter much)
  T: Tell
Step end
55
Other studies where type of step remediation had little impact

Human tutors
Human tutoring ≈ human tutoring with only content-free prompting for step remediation (Chi et al., 2001)
Human tutoring ≈ solving a problem in pairs with a video solution available (Chi et al., 2007)
Human tutoring ≈ canned text during post-practice remediation (Katz et al., 2003)
Human tutoring ≈ an ITS (Reif & Scott, 1999)
Micro-analyses of human tutoring (VanLehn et al., 2003)
Socratic human tutoring ≈ didactic human tutoring (Rosé et al., 2001a; Johnson & Johnson, 1992)
Natural language tutoring systems
Circsim (canned text) ≈ Circsim-Tutor (Evens & Michael, 2007)
Andes-Atlas ≈ Andes with canned text (Rosé et al., 2001b)
Cognitive geometry tutor (Aleven et al., 2004)

56
Hypothesis 2: Cannot eliminate the step remediation loop
Step start
  T: Elicit
    S: Correct
    S: Incorrect → Text
  T: Tell (must avoid this)
Step end
57
Studies consistent with harmfulness of just telling/explaining

Human tutoring
Human tutoring > textbook alone (Azevedo; Evens; VanLehn)
Human tutoring > lecture/demo (Wood et al., 1978; Swanton, …)
Natural language tutoring systems
NLT > textbook alone (Graesser; Evens; Lane; VanLehn)
NLT > lecture/demo (Craig)

58
Conclusions

Learning Science: Can computer tutors be as effective as human tutors?
Yes, as long as students attempt steps, with feedback & hints on each
Computer Science: When is deep linguistic technology more effective than shallow?
Several positive results at module level
At whole-system level, still tied, but encouraging
Cognitive Science: The higher the interactivity, the higher the learning gains?
No. See next slide.

59
The interactivity plateau
Claim: Perhaps Bloom's 2 experiments were
confounded
60
How can we achieve super-human results?
61
Future work (slide 1 of 3): Increasing engagement

NeuroCog engagement meter (DARPA)
Can we reliably measure engagement with fMRI?
Can we train students to maintain engagement with it?
Interesting problems (PSLC)
Ill-defined design problems
Recommender system
ITS as a member of a social network (DFK, PSLC)
Pairs > solos for engagement, but correctness?
Can we add an ITS without destroying engagement?

62
Future work (slide 2 of 3): Faster learning & faster authoring

Author & student interface ≈ PowerPoint
Fast to learn & use
e.g., type "Let V1, V2 be the initial, final velocities"
Freedom & domain independence
As students master a step
Tutor does it, or
It gets folded into a larger step
TruthBench
Knowledge acquisition for truth checking vs.
knowledge acquisition for solving an (ill-defined) problem
Examples instead of hints

63
Future work (slide 3 of 3): Teach what an AI learner needs

Explicit teaching of backwards chaining (PSLC)
Accelerates learning & transfers (Chi & VanLehn, 2007)
Explicit teaching of confluences
KE = ½ m v² → If mass and kinetic energy are constant, then velocity must be constant
Explicit teaching of abstraction & planning
KE = ½ m v² → If need a velocity, then find a kinetic energy
Dream system: A model human learner
For testing curriculum designs
Getting the step sizes right
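Both KE examples above rest on the same rearrangement, shown here as a standard worked equation added for illustration (it is not from the slide):

$$
KE = \tfrac{1}{2} m v^2 \quad\Longrightarrow\quad v = \sqrt{\frac{2\,KE}{m}}
$$

So if m and KE are both constant, v is constant; strictly, this fixes the speed (the magnitude of the velocity), since the equation says nothing about direction. Read in the other direction, the same equation supports the planning heuristic: to find v, first find KE.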

64
Thanks!

See www.pitt.edu/vanlehn for publications

65
When to use deep vs. shallow?
Sentence understanding: shallow = LSA, Rainbow, Rappel; deep = Carmel parser, semantics; recommendation: use both
Essay/discourse understanding: shallow = LSA; deep = abduction, Bnets; recommendation: use deep
Dialog management: shallow = finite state networks; deep = reactive planning; recommendation: use locally smart FSA
Natural language generation: shallow = text; deep = plan-based; recommendation: use equivalent texts
66

It ain't so much the things we don't know that
get us into trouble. It's the things we know that
just ain't so.
-- Josh Billings (Henry Wheeler Shaw)

67
A deep sentence understander: Carmel

LCFlex parser, a robust parser that uses skipping, insertion & flexible unification (Rosé & Lavie, 2001)
Comlex, with 40,000 lexemes (Grishman et al., 1994)
A broad-coverage, domain-independent, English syntactic grammar
CarmelTools for semi-automatically creating semantic functions (Rosé, 2000; Rosé et al., AIED 2003).

68
Shallow sentence understanders

Words only
Rainbow: a Naïve Bayes text classifier
Given a new bag of words, calculates the most probable domain propositions using Bayes' rule
LSA, several others
Worse, not currently used
Words + syntactic features
Rappel (Jordan)
Minipar produces dependency relations between words
Ripper builds one classifier per predicate type per argument of the predicate type
CarmelTC (Rosé et al., HLT/NAACL 2003)
Worse, not currently used
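The "words only" approach can be illustrated with a generic multinomial Naïve Bayes classifier: pick the label maximizing log P(label) + Σ log P(word | label). This is a textbook sketch, not Rainbow itself; the add-one smoothing, the one-label-per-sentence simplification, the function names train_nb and classify_nb, and the training sentences and proposition labels are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

def train_nb(examples: List[Tuple[str, str]]):
    """examples: (sentence, proposition label) pairs. Returns priors, word counts, vocabulary."""
    label_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    n = sum(label_counts.values())
    priors = {label: count / n for label, count in label_counts.items()}
    return priors, word_counts, vocab

def classify_nb(text: str, priors, word_counts, vocab) -> str:
    """Most probable label for a new bag of words, via Bayes' rule in log space."""
    best_label, best_score = "", -math.inf
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for w in text.lower().split():
            # Add-one smoothing so an unseen word does not zero out the whole product
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_nb([
    ("gravity is the only force on the pumpkin", "only-gravity"),
    ("the only force acting is gravity", "only-gravity"),
    ("the pumpkin keeps its horizontal velocity", "same-horizontal-velocity"),
    ("horizontal velocity of the pumpkin stays the same", "same-horizontal-velocity"),
])
print(classify_nb("only gravity acts on it", *model))
```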

69
Results

Speed: All were fast enough
Accuracy: All were too low
Deep + words + words & syntactic features: Best (Jordan et al., ITS04)
Run all 3
Use heuristics to choose 1 of 3 outputs
E.g., if "velocity" or "speed" appears in the sentence, then velocity should appear in the output propositions somewhere.
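Choosing one of the three outputs with the heuristic quoted above might look like the following. This is a hypothetical sketch: choose_output, the string encoding of propositions, the preference order of the candidates, and the fallback to the first candidate when none passes are assumptions; only the velocity/speed heuristic itself comes from the slide.

```python
from typing import List, Optional

Propositions = List[str]

def choose_output(sentence: str, candidates: List[Propositions]) -> Optional[Propositions]:
    """Return the first candidate output that passes the velocity heuristic."""
    words = set(sentence.lower().replace(".", " ").replace(",", " ").split())
    needs_velocity = bool(words & {"velocity", "speed"})
    for props in candidates:
        if needs_velocity and not any("velocity" in p for p in props):
            continue  # sentence mentions speed/velocity, but this output never does
        return props
    return candidates[0] if candidates else None  # assumed fallback

sentence = "The speed of the pumpkin is the same as the speed of the man."
candidates = [
    [],                                                  # e.g., a failed deep parse
    ["force(gravity, pumpkin)"],                         # e.g., a words-only guess
    ["compare(velocity(pumpkin), velocity(man), same)"],
]
print(choose_output(sentence, candidates))
```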

70
Direct matching via largest common subgraph (Shearer et al., 2001)
Expectation: "The speed of the pumpkin is the same as the speed of man."
compare(x1, x2, same)
speed(x1, pumpkin, ...)
speed(x2, man, ...)
Student said: "allow us to compare it to the speed of the pumpkin."
compare(x5, x6, ...)
speed(x5, pumpkin, ...)
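A brute-force stand-in for this direct matching is shown below. It is not the Shearer et al. algorithm, which finds largest common subgraphs far more efficiently; it simply tries every injective mapping of the student's variables onto the expectation's variables and keeps the one under which the most expectation literals are matched. The tuple encoding, the use of None for the slide's "...", the x-digit variable convention and the function names are hypothetical.

```python
import re
from itertools import permutations
from typing import Dict, List, Optional, Tuple

Literal = Tuple[Optional[str], ...]  # (predicate, arg1, arg2, ...); None = unspecified ("...")

def is_var(term: Optional[str]) -> bool:
    return term is not None and re.fullmatch(r"x\d+", term) is not None

def literal_match(s: Literal, e: Literal, mapping: Dict[str, str]) -> bool:
    """Does student literal s match expectation literal e under the variable mapping?"""
    if s[0] != e[0] or len(s) != len(e):
        return False
    for a, b in zip(s[1:], e[1:]):
        if a is None or b is None:
            continue                      # unspecified argument matches anything
        if is_var(a):
            if mapping.get(a) != b:
                return False
        elif a != b:
            return False
    return True

def best_match(student: List[Literal], expectation: List[Literal]) -> Tuple[Dict[str, str], int]:
    """Try every injective variable mapping; keep the one matching the most expectation literals."""
    s_vars = sorted({t for lit in student for t in lit[1:] if is_var(t)})
    e_vars = sorted({t for lit in expectation for t in lit[1:] if is_var(t)})
    best_mapping: Dict[str, str] = {}
    best_score = 0
    for image in permutations(e_vars, len(s_vars)):
        mapping = dict(zip(s_vars, image))
        score = sum(any(literal_match(s, e, mapping) for s in student) for e in expectation)
        if score > best_score:
            best_mapping, best_score = mapping, score
    return best_mapping, best_score

expectation = [("compare", "x1", "x2", "same"), ("speed", "x1", "pumpkin", None), ("speed", "x2", "man", None)]
student = [("compare", "x5", "x6", None), ("speed", "x5", "pumpkin", None)]
print(best_match(student, expectation))  # 2 of the 3 expectation literals are matched
```

The exhaustive search is exponential in the number of variables, which is one reason a dedicated subgraph algorithm is used in practice.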
71
Generation of the ATMS for a problem starts with givens (node = atomic prop., red = incorrect)
[Figure: correct givens and a buggy given]
72
Apply rules forward (RA = rule application)
[Figure: rule applications RA1 and RA2 built on the correct givens and the buggy given]
73
Stop when no more rule applications
[Figure: rule applications RA1 to RA4 built on the correct givens and the buggy given]
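Building the deductive closure (apply rules forward until no more rule applications) can be sketched as a fixpoint loop that also records each rule application, so the result can be saved as a directed graph. This is a simplified sketch: unlike the slides' propositions, it handles ground propositions only, and the function deductive_closure, the rule names RA1 to RA4 and the propositions are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[str, FrozenSet[str], str]  # (rule name, premises, conclusion)

def deductive_closure(givens: Set[str], rules: List[Rule]) -> Tuple[Set[str], List[Rule]]:
    """Forward-chain to a fixpoint; the applications are the graph's premise -> conclusion edges."""
    facts = set(givens)
    applications: List[Rule] = []
    fired: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if name not in fired and premises <= facts:
                fired.add(name)
                applications.append((name, premises, conclusion))
                facts.add(conclusion)
                changed = True
    return facts, applications

givens = {"h force(man) = 0", "h force(air) = 0", "buggy: zero net force => slows"}  # correct + buggy givens
rules: List[Rule] = [
    ("RA1", frozenset({"h force(man) = 0", "h force(air) = 0"}), "h net force = 0"),
    ("RA2", frozenset({"h net force = 0"}), "h acceleration = 0"),
    ("RA3", frozenset({"h acceleration = 0"}), "h velocity constant"),
    ("RA4", frozenset({"h net force = 0", "buggy: zero net force => slows"}), "h velocity decreasing"),
]
facts, applications = deductive_closure(givens, rules)
print([name for name, _, _ in applications])
```

Both the correct conclusion and the buggy one end up in the closure; labeling which node subsets are expectations and which are misconceptions is a separate step.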
74
Propositions are not always ground! Variables (colored links) are shared across nodes
[Figure: correct givens; buggy given]
75
Specific subsets correspond to expectations / misconceptions
[Figure: node subsets labeled "Expectation (a key step in the explanation)" and "Misconception"; correct givens; buggy given]
76
At runtime, find all node subsets that unify with the student input
[Figure: input; correct givens; buggy given]
77
In this case, two subsets (happy faces) unify with the student's input
[Figure: input; correct givens; buggy given]
78
Output the expectations/misconceptions that are directly matched
[Figure: input; RA1 to RA4; one subset marked "Direct match", another marked "Close, but not a direct match"; correct givens; buggy given]
79
Bnet nodes represent sets of nodes for expectations/misconceptions
[Figure: input; RA1 to RA4; correct givens; buggy given]
80
Clamp the student's input nodes (happy faces) and update the net
[Figure: nodes marked high, moderate, and low posterior probability; RA1 to RA4; correct givens; buggy given]
81
Training and evaluation

Topology of network is given by deductive
closure, etc.
Only learn the conditional probabilities
293 sentences coded by human
Expectation/Maximization
10-fold cross validation
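The 10-fold cross-validation over the 293 coded sentences can be illustrated with a plain index split; the EM training of the conditional probabilities is not shown. This is a minimal sketch: k_fold_splits is a hypothetical name, and the contiguous, unshuffled folds are an assumption (the slide does not say how the folds were formed; shuffling first is common practice).

```python
from typing import List, Tuple

def k_fold_splits(n_items: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_items-1 into k contiguous folds; each fold is the test set once."""
    base, extra = divmod(n_items, k)
    splits, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)   # the first `extra` folds get one more item
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        splits.append((train, test))
        start += size
    return splits

splits = k_fold_splits(293, 10)
print([len(test) for _, test in splits])  # 3 folds of 30 and 7 folds of 29
```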
