Title: While waiting for the talk to start, try to find 4 mistakes in this student essay.
1While waiting for the talk to start, try to find
4 mistakes in this student essay.
Question: Suppose you are running in a straight
line at constant speed. You throw a pumpkin
straight up. Where will it land? Explain why.
Student: Once the pumpkin leaves my hand, the
horizontal force that I am exerting on it no
longer exists, only a vertical force (caused by
my throwing it). As it reaches its maximum
height, gravity (exerted vertically downward)
will cause the pumpkin to fall. Since no
horizontal force acted on the pumpkin from the
time it left my hand, it will fall at the same
place where it left my hands.
2Can (physics) tutoring systems be more effective
than human tutors?
- Kurt VanLehn
- Pittsburgh Science of Learning Center
- LRDC and the Computer Science Department
- University of Pittsburgh
3Thanks!
- Current team
- Pamela Jordan (lead)
- Patricia Albacete
- Min Chi
- John Connelly
- Roxana Gheorghui
- Sung-Young Jung
- Brian Moses Hall
- Uma Pappuswamy
- Mike Ringenberg
- Team Alumni
- Dumisizwe Bhembe
- Michael Boettner
- Andy Gaydos
- Maxim Makatchev
- Antonio Roque
- Carolyn Rosé
- Stephanie Siler
- Ramesh Srivistava
- Roy Wilson
Art Graesser's group at the University of Memphis
4The Learning Science research question: Increasing
tutoring systems' effectiveness?
- Computer-aided instruction (CAI) > classroom by d = 0.4 sigma (Kulik, 1994)
- Intelligent tutoring systems (ITS) > classroom by d = 1.0 sigma (Koedinger et al., 1997; VanLehn et al., 2006)
- Human tutors (HT) > classroom by d = 2.0 sigma (Bloom, 1984)
- How can we build tutoring systems that are as effective as human tutors?
- where effect size (Cohen's d) = [gain(experimental) - gain(controls)] / standard_deviation(pooled)
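The effect-size definition above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the cited studies: the function name `cohens_d` and the gain scores are hypothetical, and the pooled standard deviation uses the common sample-size-weighted formula, which the slide does not specify.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(experimental_gains, control_gains):
    """Effect size: difference in mean gains over the pooled standard deviation."""
    n_e, n_c = len(experimental_gains), len(control_gains)
    s_e, s_c = stdev(experimental_gains), stdev(control_gains)
    # Pooled SD, weighting each group's variance by its degrees of freedom
    # (an assumption; the slide only says "standard_deviation(pooled)").
    pooled = sqrt(((n_e - 1) * s_e ** 2 + (n_c - 1) * s_c ** 2) / (n_e + n_c - 2))
    return (mean(experimental_gains) - mean(control_gains)) / pooled


# Hypothetical pretest-to-posttest gains, for illustration only.
tutored = [3, 4, 5]
classroom = [1, 2, 3]
effect = cohens_d(tutored, classroom)
```

With these made-up numbers both groups have a standard deviation of 1 and the mean gains differ by 2, so the effect size is 2 sigma.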
5The Cognitive Science research question: The
more interactivity, the more gain?
?
6The Computer Science research question: Deep
linguistic techniques vs. shallow?
Task | Shallow linguistic | Deep linguistic
Natural language understanding (NLU) | LSA, other bag-of-words | Syntactic grammars, lexicons, semantics
Dialog management | Finite state networks | Reactive planning
Natural language generation (NLG) | Text templates | Plan-based
Non-routine language | Ignored | Anaphora, negation, ...
Because the techniques are compared in the
context of a tutoring system, we can evaluate
them for pedagogical effectiveness as well as the
usual measures of speed, accuracy, generality,
etc.
7Outline
- Introduction
- Focus on multi-step problem solving
- What is human tutoring?
- Research questions
- Why2-Atlas
- Evaluations
- Of individual techniques
- Of the whole system
Next
8A multi-step quantitative problem
Step
Step
Step
Step
Step
Step
Step
9A multi-step qualitative problem
- Q: Suppose a man is running in a straight line at constant speed. He throws a pumpkin straight up. Where will it land?
Step
Initially, the man and the pumpkin have the same
horizontal velocity. His throw exerts a net
force vertically on the pumpkin, thus causing a
vertical acceleration, which leaves the
horizontal velocity unaffected
Step
Step
Step
10A multi-step problem where order of steps doesn't
matter
- Q: Why do most computers have a disk drive? Why can't they have only RAM?
- Student:
- RAM's content disappears when power quits, but disk content persists.
- RAM usually holds less information than disk.
- RAM takes battery power, so larger RAM takes more power.
- Certain information, e.g., operating system and user files, must be stored permanently.
Step
Step
Step
Step
11Outline
- Introduction
- Focus on multi-step problem solving
- What is human tutoring?
- Research questions
- Why2-Atlas
- Evaluations
- Of individual techniques
- Of the whole system
Next
12Human tutorial dialogue is a sequence of
episodes, one per step
Q: Why does a computer need disk as well as RAM?
- S: RAM is too small. Only the disk is big enough.
- T: That's usually true. But suppose you bought a lot of RAM? Why wouldn't that work?
- S: The battery would run out too fast.
- T: Excellent. What else?
- S: That's it.
- T: What if the battery dies?
- S: Oh. The RAM dies.
- T: Anything wrong with that?
- S: You lose your files.
- T: Besides the user's files, what else would be lost?
- S: Beats me.
- T: The operating system!
13Schematic of tutorial dialogue
- Problem statement
- Step
- Step
- Step
- Step
- Answer
- Reflection (optional)
14Schematic of dialogue about a single step
Stepend
T Tell
Stepstart
T Elicit
S Correct
Remediation
T Hint, or prompt, or explain, or analogy, or
S Incorrect
15Comparisons of expert to novice human tutors
Stepend
T Tell
Novices
Stepstart
Experts
T Elicit
S Correct
T Hint, or prompt, or explain, or analogy, or
S Incorrect
Experts may have a wider variety
16Outline
- Introduction
- Focus on multi-step problem solving
- What is human tutoring?
- Research questions
- Why2-Atlas
- Evaluations
- Of individual techniques
- Of the whole system
Next
17The Learning Science research question: Increasing
tutoring system effectiveness
- CAI: Remediation on answer only
- ITS (e.g., Andes): Remediation on each step
- Hint sequence, with final bottom-out hint
- Human tutors: Remediation on each step
- Natural language dialogues
- Many tutorial tactics
- A tutoring system with natural language for its remediation?
18The Cognitive Science research question: The
more interactivity, the more gain?
?
19The Computer Science research question: Deep
linguistic techniques vs. shallow?
Task | Shallow linguistic | Deep linguistic
Natural language understanding (NLU) | LSA, other bag-of-words | Syntactic grammars, lexicons, semantics
Dialog management | Finite state networks | Reactive planning
Natural language generation (NLG) | Text templates | Plan-based
Non-routine language | Ignored | Anaphora, negation, ...
Evaluate for pedagogical effectiveness as well
as the usual measures of speed, accuracy,
generality, etc.
20A task domain where deep understanding may add
value
- Qualitative physics
- A massive truck and a light car have a head-on collision. Which suffers the greater impact force? Why?
- Linguistic relationships matter
- car, truck, exerts, more, force
- Detecting deep misconceptions
- E.g., "Bigger things exert more force."
- Unfortunately, these misconceptions are notoriously resistant to instruction
- Try giving 10 hours of instruction
21Outline
- Introduction
- Focus on multi-step problem solving
- What is human tutoring?
- What is an ITS? CAI?
- Research questions
- Why2-Atlas
- Evaluations
- Of individual techniques
- Of the whole system
Next
22Student's screen for Why2-Atlas
Problem
Dialogue history
Student's essay
Student's turn in the dialogue
23Schematic of Why2-Atlas tutorial dialogue
- T: <displays problem>
- S: <enters essay>
- T: <analyzes essay to identify missing or incorrect steps; picks one; starts a script for remediation of the step>
- T: When the pumpkin is in the air, what forces act on it?
- <many turns>
- T: Please change your essay
- S: <edits essay>
- T: <analyzes essay to identify missing or incorrect steps; picks one; starts a script>
- <many turns>
- T: Please change your essay
- S: <edits essay>
- T: <analyzes essay; decides all steps are correct>
- T: Excellent! Here's my version: <presents ideal essay>
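The outer loop in the schematic above (analyze the essay, pick one flawed step, remediate, ask for a revision, repeat) can be sketched as follows. This is only an illustration of the control flow: the keyword-matching analyzer, the step table `STEPS`, and the scripted `toy_student` are hypothetical stand-ins, not Why2-Atlas components.

```python
def missing_steps(essay, steps):
    """Toy analyzer: a step counts as present if its keyword occurs in the essay."""
    return [name for name, keyword in steps.items() if keyword not in essay.lower()]


def tutor_session(essay, steps, revise, max_rounds=10):
    """Repeat: analyze essay, remediate one step, request a revision."""
    log = []
    for _ in range(max_rounds):
        flawed = missing_steps(essay, steps)
        if not flawed:
            log.append("present ideal essay")
            break
        step = flawed[0]                      # pick one missing step
        log.append(f"run script for: {step}")
        essay = revise(essay, step)           # "Please change your essay"
    return essay, log


# Hypothetical steps (name -> keyword) and a scripted student, for illustration.
STEPS = {
    "gravity is the only force": "gravity",
    "horizontal velocity is unchanged": "horizontal velocity",
}


def toy_student(essay, step):
    """Pretend the dialogue worked: the student adds the discussed step."""
    return essay + " " + step.capitalize() + "."


final_essay, session_log = tutor_session("The pumpkin lands behind me.", STEPS, toy_student)
```

In the real system the analysis and the remediation scripts are the hard parts; here they are reduced to a keyword test and a log entry so that the loop itself is visible.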
24Real example (slide 1 of 3)
- Question: Suppose a man is running in a straight line at constant speed. He throws a pumpkin straight up. Where will it land? Explain.
- Student's essay: The pumpkin would land in front of me because of the horizontal speed that is exerting on the pumpkin.
25Example (slide 2 of 3)
- Tutor: I see some opportunities for growth. Let's chat. So let's discuss a point that seems to be missing in your essay. Can you tell me what forces are acting on the pumpkin after it is released?
- Student: Gravity force
Picks a missing step (forces); starts a KCD script
Correct, so goes to next state of the KCD
26Example (slide 3 of 3)
- Tutor: Good. Since we know that gravity is the only force acting on the pumpkin, let me ask you this. As the gravitational force pulls the pumpkin down, does it also pull the pumpkin horizontally a little bit?
- Student: Yes.
- Tutor: Let's look at this more closely. In which direction does the earth's gravitational force act?
- Student: Vertical
- Tutor: Let's consider this some more. Is that vertical upwards or vertical downwards?
- Student: Vertical downwards
Wrong, so subdialogue
Incomplete, so subdialogue
27Schematic of a single step
Stepend
T Tell
Stepstart
T Elicit
S Correct
KCD
T S T S T
S Incorrect
28Outline
- Introduction
- Focus on multi-step problem solving
- What is human tutoring?
- What is an ITS? CAI?
- Research questions
- Why2-Atlas
- Evaluations
- Of individual techniques
- Of the whole system
Next
29Why2-Atlas main modules
Student enters/edits the essay
Words
Sentence understander
FOPL propositions
Essay understander
Missing / bad steps
Discourse manager decides what to do w.r.t.
history
Clarification
Done
Script for remedying missing/bad step
KCD script interpreter
RealPro NLG
Ideal essay
Student
30Modules evaluated (in yellow)
Student enters/edits the essay
Words
Sentence understander
FOPL propositions
Essay understander
Missing / bad steps
Discourse manager decides what to do w.r.t.
history
Clarification
Done
Script for remedying missing/bad step
KCD script interpreter
RealPro NLG
Ideal essay
Student
31Evaluate for accuracy (w.r.t. human judges) and
speed
Student enters/edits the essay
Words
Sentence understander
Propositions
- Deep NLU: Carmel
- LCFlex parser
- Comlex lexicon
- Semantic authoring tool
- Shallow NLU: Naïve Bayes, LSA
- Hybrids: CarmelTC, Rappel
- Result
- Similar accuracy
- Complementary errors
- Best to use all 3
Essay understander
Missing/incorrect steps
Discourse manager decides what to do w.r.t.
history
Script
Clarification
Done
KCD script interpreter
RealPro NLG
Ideal essay
Student
32Evaluate for utility as tool
Student enters/edits the essay
- Re-implemented as TuTalk
- GUI authoring system
- XML authoring system
- Handy features (e.g., +/- feedback) for ITS
- Currently being used by 4 projects
Words
Sentence understander
Propositions
Essay understander
Missing/incorrect steps
Discourse manager decides what to do w.r.t.
history
Script
Clarification
Done
KCD script interpreter
RealPro NLG
Ideal essay
Student
33Evaluate for accuracy and speed
Student enters/edits the essay
Words
Next few slides
Sentence understander
Propositions
Essay understander
Missing/incorrect steps
Discourse manager decides what to do w.r.t.
history
Script
Clarification
Done
KCD script interpreter
RealPro NLG
Ideal essay
Student
34Essay analysis: You probably found all 4
incorrect steps. Can the essay analyzer?
Question: Suppose you are running in a straight
line at constant speed. You throw a pumpkin
straight up. Where will it land? Explain why.
Student: Once the pumpkin leaves my hand, the
horizontal force that I am exerting on it no
longer exists, only a vertical force (caused by
my throwing it). As it reaches its maximum
height, gravity (exerted vertically downward)
will cause the pumpkin to fall. Since no
horizontal force acted on the pumpkin from the
time it left my hand, it will fall at the same
place where it left my hands.
35Research problem, more precisely, is
- Given
- Student's sentences: s1 & s2, s3 & s4 & s5, ...
- Set of correct steps: c1 & c2, c3 & c4 & c5, ...
- Set of incorrect steps: i1 & i2, i3, i4 & i5, ...
- Determine which correct and incorrect steps match the student's sentences
- Directly (graph matching)
- Indirectly, using domain knowledge
36Why do we need indirect matching?
- The student said (incorrectly)
- The pumpkin slows down, so it lands behind me.
- Correct steps
- Yada
- Yada
- Yada
- Incorrect steps
- Yada
- When there is no force to propel an object along, it slows down
- Air friction matters
- Yada
Essay analyzer should output both derivations,
with estimates of their probabilities
37First method: Abduction using Tacitus-Lite
- Backchaining theorem prover (like Prolog)
- Student's utterance → goal to be proved
- Problem statement → givens
- Proofs of earlier student utterances → more givens
- Accepts goals without proof (at a cost)
- Because not everything can be anticipated
- Searches for lowest cost proof
- Checks consistency as it goes
- Don't try to prove p when the proof already has ¬p.
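The idea of cost-based abduction described above can be illustrated with a small propositional backchainer. This is a rough sketch under strong assumptions, not Tacitus-Lite: it has no variables or unification, performs no consistency check, and every rule, cost, and proposition name below is invented for illustration.

```python
def cheapest_proof(goal, rules, givens, assume_cost, depth=5):
    """Return (cost, assumed) for the lowest-cost proof of `goal`.

    rules maps a head to a list of (body, rule_cost) pairs; a goal may also
    be accepted without proof at the price assume_cost.get(goal, 10.0).
    """
    if goal in givens:
        return 0.0, frozenset()
    # Option 1: accept the goal without proof, at a cost.
    best = (assume_cost.get(goal, 10.0), frozenset([goal]))
    if depth == 0:
        return best
    # Option 2: backchain through each rule whose head is the goal.
    for body, rule_cost in rules.get(goal, []):
        cost, assumed = rule_cost, frozenset()
        for subgoal in body:
            sub_cost, sub_assumed = cheapest_proof(
                subgoal, rules, givens, assume_cost, depth - 1)
            cost += sub_cost
            assumed |= sub_assumed
        if cost < best[0]:
            best = (cost, assumed)
    return best


# Hypothetical encoding of the two explanations of "the pumpkin slows down".
RULES = {
    "velocity_decreasing": [
        (["net_force_zero"], 2.0),      # buggy rule: zero net force implies slowing
        (["net_force_negative"], 0.5),  # correct kinematics plus Newton's second law
    ],
    "net_force_zero": [(["air_friction_zero", "mans_force_zero"], 0.5)],
    "net_force_negative": [(["air_friction_negative", "mans_force_zero"], 0.5)],
}
GIVENS = {"air_friction_zero", "mans_force_zero"}
ASSUME_COST = {"air_friction_negative": 3.0}

cost, assumed = cheapest_proof("velocity_decreasing", RULES, GIVENS, ASSUME_COST)
```

With these invented costs the proof through the buggy rule costs 2.5 and needs no assumptions, while the proof through air friction costs 4.0 because it must assume a non-given fact, so the search returns the former.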
38Derivation 1 (of 2) for "The pumpkin slows down"
Incorrect inference rule
The velocity of the pumpkin is decreasing
Student said this
Imprecision
An inference rule
The horizontal component of the velocity of the
pumpkin is decreasing
(The net force causes the velocity, so) zero net
force implies velocity decreases
The horizontal component of the net force on the
pumpkin is zero
A correct inference rule
Net force is sum of forces
The horizontal component of the air friction
force on the pumpkin is zero
The horizontal component of the man's force on
the pumpkin is zero
given
given
39Derivation 2 (of 2) for "The pumpkin slows down"
The velocity of the pumpkin is decreasing
Student said this
Imprecision
The horizontal component of the velocity of the
pumpkin is decreasing
Kinematics
The horizontal component of the acceleration of
the pumpkin is negative
Newton's second law
The horizontal component of the net force on the
pumpkin is negative
Net force is sum of forces
The horizontal component of the air friction
force on the pumpkin is negative
The horizontal component of the man's force on
the pumpkin is zero
False assumption
given
40Results of using Tacitus-Lite
- Acceptable accuracy, but far too slow
- Cost may not be a good substitute for probability
when there are multiple competing explanations
41Second method: Precompute the time-consuming
reasoning
- Precomputations
- The deductive closure of the problem statement givens
- Save as directed graph
- Label subsets of nodes that represent correct and incorrect steps
- Convert to Bayesian network and train
- To analyze a student's utterance
- Clamp directly matched nodes as evidence
- Run Bayesian network
- Read out most probable steps
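The first two precomputation steps (deductive closure, saved as a directed graph) can be sketched with plain forward chaining. This covers only that part: the conversion to a Bayesian network and its training are not shown, and the propositional rules and names below are hypothetical simplifications of the physics rules.

```python
def deductive_closure(givens, rules):
    """Forward-chain to a fixpoint; return (facts, edges).

    rules is a list of (premises, conclusion) pairs. Each edge
    (premise, conclusion) records one step of a derivation.
    """
    facts = set(givens)
    edges = set()
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if all(p in facts for p in premises):
                new_edges = {(p, conclusion) for p in premises} - edges
                if conclusion not in facts or new_edges:
                    facts.add(conclusion)
                    edges |= new_edges
                    changed = True
    return facts, edges


# Hypothetical givens and rules for the horizontal motion of the pumpkin.
GIVENS = {"mans_force_x = 0", "air_friction_x = 0"}
RULES = [
    (("mans_force_x = 0", "air_friction_x = 0"), "net_force_x = 0"),
    (("net_force_x = 0",), "accel_x = 0"),
    (("accel_x = 0",), "velocity_x constant"),
]

facts, edges = deductive_closure(GIVENS, RULES)
```

The loop stops when a full pass adds neither a fact nor an edge, which matches the "stop when no more rule applications" condition used when the graph is built.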
42Results: Fast enough. Better accuracy, but
not by much.
43Summary
- Methods
- Abductive theorem prover
- Bayesian deductive closure
- Results
- Similar accuracy
- Bayesian deductive closure faster than
abductive theorem prover
Student enters/edits the essay
Words
Sentence understander
Propositions
Essay understander
Missing/incorrect steps
Discourse manager decides what to do w.r.t.
history
Script
Clarification
Done
KCD script interpreter
RealPro NLG
Ideal essay
Student
44Outline
- Introduction
- Focus on multi-step problem solving
- What is human tutoring?
- What is an ITS? CAI?
- Research questions
- Why2-Atlas
- Evaluations
- Of individual techniques
- Of the whole system
Next
45Evaluation framework
Only step remediation varies with the condition
- Pretest (1 hr)
- Training (5 to 10 hrs). For each question, do:
- 1. Student enters initial essay
- 2. Tutor analyzes it for missing or incorrect steps, picks one, and discusses it with student
- 3. Student enters revised essay
- 4. Tutor either congratulates student and presents ideal essay, or goes to step 2
- Posttest (1 hr)
46Conditions
- Expert Human tutors
- Text-based communication
- Spoken communication
- Computer tutors
- Why2-Atlas (VanLehn et al.)
- ITSPOKE (Litman et al.)
- Why2-AutoTutor (Graesser et al.)
- Control conditions
- Canned text remediation
- Textbook
48Human tutors
Stepend
T Tell
Stepstart
T Elicit
S Correct
T Hint, or prompt, or explain, or analogy, or
S Incorrect
49Why2-Atlas
Stepend
T Tell
Stepstart
T Elicit
S Correct
Knowledge construction dialogue
S Incorrect
50Why2-AutoTutor
Stepend
T Tell
Stepstart
T Elicit
S Correct
Hint,or prompt,or assert
S Incorrect
51Canned-text remediation
Stepend
T Tell
Stepstart
T Elicit
S Correct
<text>
S Incorrect
52Results from 7 experiments
- Why2-Atlas = Why2-AutoTutor
- Trend for >, but not significant
- Why2-Atlas may need more development
- Why2 > Textbook
- In Textbook condition, students do not write essays
- Why2 = Human tutoring !!!
- Human tutoring = Canned text remediation
- Exception: If pre-physics students get instruction designed for post-physics students, then Human tutoring > Canned text remediation
53Impact and significance of the results
- Why2-Atlas = Why2-AutoTutor
- Common in AI that complex techniques are only slightly better than simple ones, at least initially.
- Why2 > Textbook
- Common in Learning Sciences that active > passive
- Why2 = Human tutoring = Canned text remediation
- Highly counter-intuitive to Learning and Cognitive scientists (including us)
54Hypothesis 1: Exactly how tutors remedy a step
doesn't matter much
Stepend
T Tell
Stepstart
T Elicit
S Correct
What's in here doesn't matter much
S Incorrect
55Other studies where type of step remediation had
little impact
- Human tutors
- Human tutoring = human tutoring with only content-free prompting for step remediation (Chi et al., 2001)
- Human tutoring = solving a problem in pairs with a video solution available (Chi et al., 2007)
- Human tutoring = canned text during post-practice remediation (Katz et al., 2003)
- Human tutoring = an ITS (Reif & Scott, 1999)
- Micro-analyses of human tutoring (VanLehn et al., 2003)
- Socratic human tutoring = didactic human tutoring (Rosé et al., 2001a; Johnson & Johnson, 1992)
- Natural language tutoring systems
- Circsim (canned text) = Circsim-Tutor (Evens & Michael, 2007)
- Andes-Atlas = Andes with canned text (Rosé et al., 2001b)
- Cognitive geometry tutor (Aleven et al., 2004)
56Hypothesis 2: Cannot eliminate the step
remediation loop
Stepend
T Tell
Stepstart
Must avoid this
T Elicit
S Correct
Text
S Incorrect
57Studies consistent with harmfulness of just
telling and explaining
- Human tutoring
- Human tutoring > textbook alone (Azevedo; Evens; VanLehn)
- Human tutoring > lecture/demo (Wood et al., 1978; Swanton, ...)
- Natural language tutoring systems
- NLT > textbook alone (Graesser; Evens; Lane; VanLehn)
- NLT > lecture/demo (Craig)
58Conclusions
- Learning Science: Can computer tutors be as effective as human tutors?
- Yes, as long as students attempt steps with feedback and hints on each
- Computer Science: When is deep linguistic technology more effective than shallow?
- Several positive results at module level
- At whole system level, still tied, but encouraging
- Cognitive Science: The higher the interactivity, the higher the learning gains?
- No. See next slide
59The interactivity plateau
Claim: Perhaps Bloom's 2 experiments were
confounded
60How can we achieve super-human results?
?
61Future work (slide 1 of 3): Increasing engagement
- NeuroCog engagement meter (DARPA)
- Can we reliably measure engagement with fMRI?
- Can we train students to maintain engagement with it?
- Interesting problems (PSLC)
- Ill-defined design problems
- Recommender system
- ITS as a member of a social network (DFK, PSLC)
- Pairs > solos for engagement, but correctness?
- Can we add an ITS without destroying engagement?
62Future work (slide 2 of 3): Faster learning,
faster authoring
- Author student interface PowerPoint
- Fast to learn and use
- e.g., type "Let V1, V2 be the initial, final velocities"
- Freedom and domain independence
- As students master a step
- Tutor does it, or
- It gets folded into a larger step
- TruthBench
- Knowledge acquisition for truth checking vs.
- Knowledge acquisition for solving an (ill-defined) problem
- Examples instead of hints
63Future work (slide 3 of 3): Teach what an AI
learner needs
- Explicit teaching of backwards chaining (PSLC)
- Accelerates learning and transfers (Chi & VanLehn, 2007)
- Explicit teaching of confluences
- KE = ½ m v² → If mass and kinetic energy are constant, then velocity must be constant
- Explicit teaching of abstraction planning
- KE = ½ m v² → If need a velocity, then find a kinetic energy
- Dream system: A model human learner
- For testing curriculum designs
- Getting the step sizes right
64Thanks!
- See www.pitt.edu/~vanlehn for publications
65When to use deep vs. shallow?
Task | Shallow linguistic | Deep linguistic | Recommendation
Sentence understanding | LSA, Rainbow, Rappel | Carmel parser, semantics | Use both
Essay/Discourse understanding | LSA | Abduction, Bnets | Use deep
Dialog management | Finite state networks | Reactive planning | Use locally smart FSA
Natural language generation | Text | Plan-based | Use equivalent texts
66- "It ain't so much the things we don't know that get us into trouble. It's the things we know that just ain't so."
- -- Josh Billings (Henry Wheeler Shaw)
67A deep sentence understander: Carmel
- LCFlex parser, a robust parser that uses skipping, insertion and flexible unification (Rosé & Lavie, 2001)
- Comlex, with 40,000 lexemes (Grishman et al., 1994)
- A broad-coverage, domain-independent, English syntactic grammar
- CarmelTools for semi-automatically creating semantic functions (Rosé, 2000; Rosé et al., AIED 2003).
68Shallow sentence understanders
- Words only
- Rainbow: A Naïve Bayes text classifier
- Given new bag of words, calculates most probable domain propositions using Bayes' rule
- LSA, several others
- Worse, not currently used
- Words + syntactic features
- Rappel (Jordan)
- Minipar produces dependency relations between words
- Ripper builds one classifier per predicate type per argument of the predicate type
- CarmelTC (Rosé et al., HLT/NAACL 2003)
- Worse, not currently used
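The bag-of-words approach attributed to Rainbow above (pick the most probable domain proposition with Bayes' rule) can be illustrated with a minimal multinomial Naïve Bayes classifier. This is a generic textbook sketch with add-one smoothing, not Rainbow's implementation, and the training sentences and labels are made up.

```python
from collections import Counter, defaultdict
from math import log


def train(examples):
    """examples: list of (sentence, label). Returns the counts needed to classify."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, label in examples:
        words = sentence.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def classify(sentence, model):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for word in sentence.lower().split():
            score += log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled sentences, for illustration only.
EXAMPLES = [
    ("gravity pulls the pumpkin down", "force_gravity"),
    ("the only force is gravity", "force_gravity"),
    ("the pumpkin keeps its horizontal velocity", "velocity_constant"),
    ("horizontal speed stays the same", "velocity_constant"),
]
model = train(EXAMPLES)
label = classify("gravity is the force", model)
```

Because word order is discarded, this kind of classifier cannot distinguish "the car exerts more force on the truck" from the reverse, which is the limitation the syntactic-feature and deep approaches try to address.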
69Results
- Speed: All were fast enough
- Accuracy: All were too low
- Deep = Words = Words + syntactic features
- Best (Jordan et al., ITS04)
- Run all 3
- Use heuristics to choose 1 of 3 outputs
- E.g., If "velocity" or "speed" appear in the sentence, then "velocity" should appear in the output propositions somewhere.
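The example heuristic above can be sketched as a filter over the candidate outputs. The function name, the string encoding of propositions, and the priority-order fallback are assumptions made for illustration; the actual heuristics and proposition format are not given on the slide.

```python
import re


def choose_output(sentence, candidates):
    """Pick one of several understanders' outputs (lists of proposition strings).

    Heuristic from the slide: if "velocity" or "speed" appears in the sentence,
    prefer an output whose propositions mention velocity. Candidates are tried
    in priority order; assumes at least one candidate is supplied.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    needs_velocity = bool(words & {"velocity", "speed"})
    for propositions in candidates:
        if needs_velocity and not any("velocity" in p for p in propositions):
            continue  # this output ignores a quantity the student mentioned
        return propositions
    return candidates[0]  # no candidate passes: fall back to the first


# Hypothetical outputs from three understanders for one sentence.
sentence = "The speed of the pumpkin stays the same."
candidates = [
    ["force(pumpkin, zero)"],
    ["velocity(pumpkin, constant)"],
    ["position(pumpkin, ahead)"],
]
chosen = choose_output(sentence, candidates)
```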
70Direct matching via largest common subgraph
(Shearer et al., 2001)
Expectation: "The speed of the pumpkin is the
same as the speed of the man."
compare(x1, x2, same)
x2
x1
speed(x1, pumpkin, ...)
speed(x2, man, ...)
Student said: "... allow us to compare it to the
speed of the pumpkin."
compare(x5, x6, ...)
x5
speed(x5,pumpkin,...)
71Generation of the ATMS for a problem starts with
givens (node = atomic prop.; red = incorrect)
Correct givens
Buggy given
72Apply rules forward (RA = rule application)
RA1
RA2
Correct givens
Buggy given
73Stop when no more rule applications
RA4
RA3
RA1
RA2
Correct givens
Buggy given
74Propositions are not always ground! Variables
(colored links) shared across nodes
Correct givens
Buggy given
75Specific subsets correspond to expectations /
misconceptions
Expectation (a key step in the explanation)?
Expectation
Misconception
Correct givens
Buggy given
76At runtime, find all node subsets that unify with
the student input
Input
Correct givens
Buggy given
77In this case, two subsets (happy faces) unify
with student's input
Input
Correct givens
Buggy given
78Output the expectations/misconceptions that are
directly matched
Input
RA4
RA3
Direct match
RA2
Close, but not a direct match
RA1
Correct givens
Buggy given
79Bnet nodes represent sets of nodes for
expectations/misconceptions
Input
RA4
RA3
RA2
RA1
Correct givens
Buggy given
80Clamp student's input nodes (happy faces) and
update net
High posterior probability
RA4
RA3
Moderate posterior probability
RA2
RA1
Low posterior probability
Correct givens
Buggy given
81Training and evaluation
- Topology of network is given by deductive closure, etc.
- Only learn the conditional probabilities
- 293 sentences coded by human
- Expectation/Maximization
- 10-fold cross validation
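The 10-fold cross-validation protocol over the 293 coded sentences can be sketched as an index split. Only the splitting is shown; the shuffling seed and the function name `k_fold_indices` are illustrative assumptions, and training the conditional probabilities on each fold is left out.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists; each item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# 293 sentences, as on the slide; fold sizes come out as 29 or 30.
splits = list(k_fold_indices(293, k=10))
```

Each sentence is held out in exactly one fold, so the reported accuracy is always measured on sentences the network was not trained on.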