While waiting for the talk to start, try to find 4 mistakes in this student essay. - PowerPoint PPT Presentation

Slides: 82
Provided by: LRDC5
Transcript and Presenter's Notes


1
While waiting for the talk to start, try to find
4 mistakes in this student essay.
Question: Suppose you are running in a straight
line at constant speed. You throw a pumpkin
straight up. Where will it land? Explain why.
Student: Once the pumpkin leaves my hand, the
horizontal force that I am exerting on it no
longer exists, only a vertical force (caused by
my throwing it). As it reaches its maximum
height, gravity (exerted vertically downward)
will cause the pumpkin to fall. Since no
horizontal force acted on the pumpkin from the
time it left my hand, it will fall at the same
place where it left my hands.
2
Can (physics) tutoring systems be more effective
than human tutors?
  • Kurt VanLehn
  • Pittsburgh Science of Learning Center
  • LRDC and the Computer Science Department
  • University of Pittsburgh

3
Thanks!
  • Current team
  • Pamela Jordan (lead)
  • Patricia Albacete
  • Min Chi
  • John Connelly
  • Roxana Gheorghui
  • Sung-Young Jung
  • Brian Moses Hall
  • Uma Pappuswamy
  • Mike Ringenberg
  • Team Alumni
  • Dumisizwe Bhembe
  • Michael Boettner
  • Andy Gaydos
  • Maxim Makatchev
  • Antonio Roque
  • Carolyn Rosé
  • Stephanie Siler
  • Ramesh Srivistava
  • Roy Wilson

Art Graesser's group at the University of
Memphis
4
The Learning Science research question: Increasing
tutoring systems' effectiveness?
  • Computer-aided instruction (CAI) > classroom by
    d = 0.4 sigma
  • Kulik, 1994
  • Intelligent tutoring systems (ITS) > classroom by
    d = 1.0 sigma
  • Koedinger et al., 1997; VanLehn et al., 2006
  • Human tutors (HT) > classroom by d = 2.0 sigma
  • Bloom, 1984
  • How can we build tutoring systems that are as
    effective as human tutors?
  • where effect size (Cohen's d) =
    [gain(experimental) - gain(controls)] /
    standard_deviation(pooled)
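
The effect-size definition on this slide can be worked through numerically. The following is a minimal sketch, assuming the pooled standard deviation is computed from sample variances in the usual way; the gain scores are made-up numbers for illustration, not data from any of the cited studies.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two samples (sample variances, n - 1)."""
    na, nb = len(a), len(b)
    return sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))


def cohens_d(experimental_gains, control_gains):
    """Difference in mean gain, in units of the pooled standard deviation."""
    diff = mean(experimental_gains) - mean(control_gains)
    return diff / pooled_sd(experimental_gains, control_gains)


# Hypothetical gain scores (posttest minus pretest) for two small groups.
experimental = [3, 4, 5, 6]
control = [1, 2, 3, 4]
print(round(cohens_d(experimental, control), 2))  # about 1.55
```

With these invented numbers the experimental group gains about 1.5 pooled standard deviations more than the control group, which is the sense in which the slide says "ITS > classroom by d = 1.0 sigma".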

5
The Cognitive Science research question: The
more interactivity, the more gain?
6
The Computer Science research question: Deep
linguistic techniques vs. shallow?
                                      Shallow linguistic        Deep linguistic
Natural language understanding (NLU)  LSA, other bag-of-words   Syntactic grammars, lexicons, semantics
Dialog management                     Finite state networks     Reactive planning
Natural language generation (NLG)     Text templates            Plan-based
Non-routine language                  Ignored                   Anaphora, negation, ...
Because the techniques are compared in the
context of a tutoring system, we can evaluate
them for pedagogical effectiveness as well as the
usual measures of speed, accuracy, generality,
etc.
7
Outline
  • Introduction
  • Focus on multi-step problem solving
  • What is human tutoring?
  • Research questions
  • Why2-Atlas
  • Evaluations
  • Of individual techniques
  • Of the whole system

Next
8
A multi-step quantitative problem
[Figure: a worked solution with each of its seven steps labeled "Step"]
9
A multi-step qualitative problem
  • Q: Suppose a man is running in a straight line at
    constant speed. He throws a pumpkin straight up.
    Where will it land?

Step
Initially, the man and the pumpkin have the same
horizontal velocity. His throw exerts a net
force vertically on the pumpkin, thus causing a
vertical acceleration, which leaves the
horizontal velocity unaffected.
Step
Step
Step
10
A multi-step problem where order of steps doesn't
matter
  • Q: Why do most computers have a disk drive? Why
    can't they have only RAM?
  • Student:
  • RAM's content disappears when power quits, but
    disk content persists.
  • RAM usually holds less information than disk.
  • RAM takes battery power, so larger RAM takes more
    power.
  • Certain information, e.g., operating system and
    user files, must be stored permanently.

Step
Step
Step
Step
11
Outline
  • Introduction
  • Focus on multi-step problem solving
  • What is human tutoring?
  • Research questions
  • Why2-Atlas
  • Evaluations
  • Of individual techniques
  • Of the whole system

Next
12
Human tutorial dialogue is a sequence of
episodes, one per step
Q: Why does a computer need disk as well as RAM?
  • S: RAM is too small. Only the disk is big
    enough.
  • T: That's usually true. But suppose you bought
    a lot of RAM? Why wouldn't that work?
  • S: The battery would run out too fast.
  • T: Excellent. What else?
  • S: That's it.
  • T: What if the battery dies?
  • S: Oh. The RAM dies.
  • T: Anything wrong with that?
  • S: You lose your files.
  • T: Besides the user's files, what else would be
    lost?
  • S: Beats me.
  • T: The operating system!

13
Schematic of tutorial dialogue
  • Problem statement
  • Step
  • Step
  • Step
  • Step
  • Answer
  • Reflection (optional)

14
Schematic of dialogue about a single step
Step end
T: Tell
Step start
T: Elicit
S: Correct
Remediation
T: Hint, or prompt, or explain, or analogy, or ...
S: Incorrect
15
Comparisons of expert to novice human tutors
Step end
T: Tell
Novices
Step start
Experts
T: Elicit
S: Correct
T: Hint, or prompt, or explain, or analogy, or ...
S: Incorrect
Experts may have a wider variety
16
Outline
  • Introduction
  • Focus on multi-step problem solving
  • What is human tutoring?
  • Research questions
  • Why2-Atlas
  • Evaluations
  • Of individual techniques
  • Of the whole system

Next
17
The Learning Science research question: Increasing
tutoring system effectiveness
  • CAI: Remediation on answer only
  • ITS (e.g., Andes): Remediation on each step
  • Hint sequence, with final bottom-out hint
  • Human tutors: Remediation on each step
  • Natural language dialogues
  • Many tutorial tactics
  • A tutoring system with Natural Language for its
    remediation?

18
The Cognitive Science research question: The
more interactivity, the more gain?
19
The Computer Science research question: Deep
linguistic techniques vs. shallow?
                                      Shallow linguistic        Deep linguistic
Natural language understanding (NLU)  LSA, other bag-of-words   Syntactic grammars, lexicons, semantics
Dialog management                     Finite state networks     Reactive planning
Natural language generation (NLG)     Text templates            Plan-based
Non-routine language                  Ignored                   Anaphora, negation, ...
Evaluate for pedagogical effectiveness as well
as the usual measures of speed, accuracy,
generality, etc.
20
A task domain where deep understanding may add
value
  • Qualitative physics
  • A massive truck and a light car have a head-on
    collision. Which suffers the greater impact
    force? Why?
  • Linguistic relationships matter
  • car, truck, exerts, more, force
  • Detecting deep misconceptions
  • E.g., Bigger things exert more force.
  • Unfortunately, these misconceptions are
    notoriously resistant to instruction
  • Try giving 10 hours of instruction

21
Outline
  • Introduction
  • Focus on multi-step problem solving
  • What is human tutoring?
  • What is an ITS? CAI?
  • Research questions
  • Why2-Atlas
  • Evaluations
  • Of individual techniques
  • Of the whole system

Next
22
Student's screen for Why2-Atlas
Problem
Dialogue history
Student's essay
Student's turn in the dialogue
23
Schematic of Why2-Atlas tutorial dialogue
  • T: <displays problem>
  • S: <enters essay>
  • T: <analyzes essay to identify missing and
    incorrect steps; picks one; starts a script for
    remediation of the step>
  • T: When the pumpkin is in the air, what forces
    act on it?
  • <many turns>
  • T: Please change your essay
  • S: <edits essay>
  • T: <analyzes essay to identify missing and incorrect
    steps; picks one; starts a script>
  • <many turns>
  • T: Please change your essay
  • S: <edits essay>
  • T: <analyzes essay; decides all steps are
    correct>
  • T: Excellent! Here's my version: <presents
    ideal essay>
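
The schematic above is essentially a loop: analyze the essay, remediate one flawed step, ask for a revision, and stop when no flawed steps remain. The sketch below illustrates that control flow only. The keyword-based `analyze` function, the `STEP_KEYWORDS` table, and all other names are hypothetical stand-ins, not the Why2-Atlas implementation.

```python
# Hypothetical step detector: a step counts as present if its keyword appears.
STEP_KEYWORDS = {
    "forces": "gravity",
    "horizontal velocity": "horizontal velocity",
}


def analyze(essay):
    """Return the names of steps that appear to be missing from the essay."""
    text = essay.lower()
    return [step for step, keyword in STEP_KEYWORDS.items() if keyword not in text]


def run_tutorial(essays, analyze, run_script, ideal_essay):
    """Drive the essay/remediation loop over an iterator of student essays.

    essays: yields the initial essay, then each revision
    run_script: called with the chosen step; stands in for the multi-turn dialogue
    """
    transcript = ["T: <displays problem>"]
    for essay in essays:
        transcript.append("S: <enters or edits essay>")
        flawed = analyze(essay)
        if not flawed:
            transcript.append("T: Excellent! Here's my version: " + ideal_essay)
            return transcript
        run_script(flawed[0])  # pick one flawed step and remediate it
        transcript.append("T: <remediates '" + flawed[0] + "'>")
        transcript.append("T: Please change your essay")
    return transcript


remediated = []
drafts = iter([
    "The pumpkin lands in front of me.",
    "Only gravity acts on the pumpkin.",
    "Only gravity acts, so the pumpkin keeps its horizontal velocity.",
])
for line in run_tutorial(drafts, analyze, remediated.append, "<ideal essay>"):
    print(line)
```

In this toy run the tutor remediates "forces" first, then "horizontal velocity", and accepts the third draft.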

24
Real example (slide 1 of 3)
  • Question: Suppose a man is running in a straight
    line at constant speed. He throws a pumpkin
    straight up. Where will it land? Explain.
  • Student's essay: The pumpkin would land in front
    of me because of the horizontal speed that is
    exerting on the pumpkin.

25
Example (slide 2 of 3)
  • Tutor: I see some opportunities for growth. Let's
    chat. So let's discuss a point that seems to be
    missing in your essay. Can you tell me what
    forces are acting on the pumpkin after it is
    released?
  • Student: Gravity force

Picks a missing step (forces); starts a KCD
script
Correct, so goes to next state of the KCD
26
Example (slide 3 of 3)
  • Tutor: Good. Since we know that gravity is the
    only force acting on the pumpkin, let me ask you
    this. As the gravitational force pulls the
    pumpkin down, does it also pull the pumpkin
    horizontally a little bit?
  • Student: Yes.
  • Tutor: Let's look at this more closely. In which
    direction does the earth's gravitational force
    act?
  • Student: Vertical
  • Tutor: Let's consider this some more. Is that
    vertical upwards or vertical downwards?
  • Student: Vertical downwards

Wrong, so subdialogue
Incomplete, so subdialogue
27
Schematic of a single step
Step end
T: Tell
Step start
T: Elicit
S: Correct
KCD
T S T S T
S: Incorrect
28
Outline
  • Introduction
  • Focus on multi-step problem solving
  • What is human tutoring?
  • What is an ITS? CAI?
  • Research questions
  • Why2-Atlas
  • Evaluations
  • Of individual techniques
  • Of the whole system

Next
29
Why2-Atlas main modules
Student enters/edits the essay
Words
Sentence understander
FOPL propositions
Essay understander
Missing / bad steps
Discourse manager decides what to do w.r.t.
history
Clarification
Done
Script for remedying missing/bad step
KCD script interpreter
RealPro NLG
Ideal essay
Student
30
Modules evaluated (in yellow)
Student enters/edits the essay
Words
Sentence understander
FOPL propositions
Essay understander
Missing / bad steps
Discourse manager decides what to do w.r.t.
history
Clarification
Done
Script for remedying missing/bad step
KCD script interpreter
RealPro NLG
Ideal essay
Student
31
Evaluate for accuracy (w.r.t. human judges) and
speed
Student enters/edits the essay
Words
Sentence understander
Propositions
  • Deep NLU: Carmel
  • LCFlex parser
  • Comlex lexicon
  • Semantic authoring tool
  • Shallow NLU: Naïve Bayes, LSA
  • Hybrids: CarmelTC, Rappel
  • Result
  • Similar accuracy
  • Complementary errors
  • Best to use all 3

Essay understander
Missing/incorrect steps
Discourse manager decides what to do w.r.t.
history
Script
Clarification
Done
KCD script interpreter
RealPro NLG
Ideal essay
Student
32
Evaluate for utility as tool
Student enters/edits the essay
  • Re-implemented as TuTalk
  • GUI authoring system
  • XML authoring system
  • Handy features (e.g., +/- feedback) for ITS
  • Currently being used by 4 projects

Words
Sentence understander
Propositions
Essay understander
Missing/incorrect steps
Discourse manager decides what to do w.r.t.
history
Script
Clarification
Done
KCD script interpreter
RealPro NLG
Ideal essay
Student
33
Evaluate for accuracy and speed
Student enters/edits the essay
Words
Next few slides
Sentence understander
Propositions
Essay understander
Missing/incorrect steps
Discourse manager decides what to do w.r.t.
history
Script
Clarification
Done
KCD script interpreter
RealPro NLG
Ideal essay
Student
34
Essay analysis: You probably found all 4
incorrect steps. Can the essay analyzer?
Question: Suppose you are running in a straight
line at constant speed. You throw a pumpkin
straight up. Where will it land? Explain why.
Student: Once the pumpkin leaves my hand, the
horizontal force that I am exerting on it no
longer exists, only a vertical force (caused by
my throwing it). As it reaches its maximum
height, gravity (exerted vertically downward)
will cause the pumpkin to fall. Since no
horizontal force acted on the pumpkin from the
time it left my hand, it will fall at the same
place where it left my hands.
35
Research problem, more precisely, is
  • Given:
  • Student's sentences s1s2, s3s4s5, ...
  • Set of correct steps c1c2, c3c4c5, ...
  • Set of incorrect steps i1i2, i3, i4i5, ...
  • Determine: Which correct and incorrect steps
    match the student's sentences
  • Directly (graph matching)
  • Indirectly, using domain knowledge

36
Why do we need indirect matching?
  • The student said (incorrectly):
  • "The pumpkin slows down, so it lands behind me."
  • Correct steps
  • Yada
  • Yada
  • Yada
  • Incorrect steps
  • Yada
  • When there is no force to propel an object along,
    it slows down
  • Air friction matters
  • Yada

Essay analyzer should output both derivations,
with estimates of their probabilities
37
First method: Abduction using Tacitus-Lite
  • Backchaining theorem prover (like Prolog)
  • Student's utterance → goal to be proved
  • Problem statement → givens
  • Proofs of earlier student utterances → more
    givens
  • Accepts goals without proof (at a cost)
  • Because not everything can be anticipated
  • Searches for lowest cost proof
  • Checks consistency as it goes
  • Don't try to prove p when the proof already has
    ¬p.
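
The idea of a backchaining prover that may assume a goal at a cost, and that searches for the cheapest proof, can be sketched in a few lines. This is a toy propositional illustration under stated assumptions: the rule names, the integer costs, and the `prove` function are invented for the example, and the consistency check is omitted. It does not reproduce the Tacitus-Lite interface.

```python
# Facts taken as given by the problem statement (toy propositional atoms).
GIVENS = {"man_force_zero", "air_friction_zero"}

# head -> list of (rule name, rule cost, body). Costs are invented integers.
RULES = {
    "pumpkin_slows": [("imprecision", 1, ["horiz_velocity_decreasing"])],
    "horiz_velocity_decreasing": [
        ("buggy: zero net force implies slowing", 5, ["horiz_net_force_zero"]),
        ("kinematics and Newton's second law", 1, ["horiz_net_force_negative"]),
    ],
    "horiz_net_force_zero": [
        ("net force is sum of forces", 1, ["man_force_zero", "air_friction_zero"]),
    ],
    "horiz_net_force_negative": [
        ("net force is sum of forces", 1, ["man_force_zero", "air_friction_negative"]),
    ],
}

ASSUMPTION_COST = 10  # price of accepting a goal without proof


def prove(goal, depth=5):
    """Return (cost, steps) for the cheapest proof of goal found within depth."""
    if goal in GIVENS:
        return 0, []
    best = (ASSUMPTION_COST, ["assume " + goal])
    if depth > 0:
        for name, rule_cost, body in RULES.get(goal, []):
            cost, steps = rule_cost, [name]
            for subgoal in body:
                sub_cost, sub_steps = prove(subgoal, depth - 1)
                cost += sub_cost
                steps = steps + sub_steps
            if cost < best[0]:
                best = (cost, steps)
    return best


cost, steps = prove("pumpkin_slows")
print(cost, steps)
```

With these made-up costs the cheapest proof of "the pumpkin slows" runs through the buggy rule, because the alternative requires assuming a negative air-friction force that the givens do not supply. A proof that uses a buggy rule is how a misconception would be flagged in this toy setting.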

38
Derivation 1 (of 2) for "The pumpkin slows down"
The velocity of the pumpkin is decreasing [student said this]
  via: Imprecision [an inference rule]
The horizontal component of the velocity of the
pumpkin is decreasing
  via: (The net force causes the velocity, so) zero net
  force implies velocity decreases [incorrect inference rule]
The horizontal component of the net force on the
pumpkin is zero
  via: Net force is sum of forces [a correct inference rule]
The horizontal component of the air friction
force on the pumpkin is zero [given]
The horizontal component of the man's force on
the pumpkin is zero [given]
39
Derivation 2 (of 2) for "The pumpkin slows down"
The velocity of the pumpkin is decreasing [student said this]
  via: Imprecision
The horizontal component of the velocity of the
pumpkin is decreasing
  via: Kinematics
The horizontal component of the acceleration of
the pumpkin is negative
  via: Newton's second law
The horizontal component of the net force on the
pumpkin is negative
  via: Net force is sum of forces
The horizontal component of the air friction
force on the pumpkin is negative [false assumption]
The horizontal component of the man's force on
the pumpkin is zero [given]
40
Results of using Tacitus-Lite
  • Acceptable accuracy, but far too slow
  • Cost may not be a good substitute for probability
    when there are multiple competing explanations

41
Second method: Precompute the time-consuming
reasoning
  • Precomputations
  • The deductive closure of the problem statement
    givens
  • Save as directed graph
  • Label subsets of nodes that represent correct and
    incorrect steps
  • Convert to Bayesian network and train
  • To analyze a student's utterance
  • Clamp directly matched nodes as evidence
  • Run Bayesian network
  • Read out most probable steps
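
The precomputation step (deductive closure of the givens, saved as a directed graph) can be illustrated with simple forward chaining. This is a minimal sketch assuming propositional rules; the atom names and rules are hypothetical, and the later conversion to a Bayesian network is not shown.

```python
def deductive_closure(givens, rules):
    """Forward-chain until no rule adds anything new.

    rules: list of (premises, conclusion) pairs over propositional atoms.
    Returns (facts, edges), where each edge is (premise, conclusion).
    """
    facts = set(givens)
    edges = set()
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts:
                new_edges = {(p, conclusion) for p in premises}
                if conclusion not in facts or not new_edges <= edges:
                    facts.add(conclusion)
                    edges |= new_edges
                    changed = True
    return facts, edges


# Hypothetical rules for the pumpkin problem.
rules = [
    (("man_force_zero", "air_friction_zero"), "horiz_net_force_zero"),
    (("horiz_net_force_zero",), "horiz_acceleration_zero"),
    (("horiz_acceleration_zero",), "horiz_velocity_constant"),
]
facts, edges = deductive_closure({"man_force_zero", "air_friction_zero"}, rules)
print(sorted(facts))
print(sorted(edges))
```

Because the closure is computed once per problem, the only work left at analysis time is matching the student's utterance against graph nodes and updating the network.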

42
Results: Fast enough. Better accuracy, but
not by much.
43
Summary
  • Methods
  • Abductive theorem prover
  • Bayesian deductive closure
  • Results
  • Similar accuracy
  • Bayesian deductive closure faster than
    abductive theorem prover

Student enters/edits the essay
Words
Sentence understander
Propositions
Essay understander
Missing/incorrect steps
Discourse manager decides what to do w.r.t.
history
Script
Clarification
Done
KCD script interpreter
RealPro NLG
Ideal essay
Student
44
Outline
  • Introduction
  • Focus on multi-step problem solving
  • What is human tutoring?
  • What is an ITS? CAI?
  • Research questions
  • Why2-Atlas
  • Evaluations
  • Of individual techniques
  • Of the whole system

Next
45
Evaluation framework
Only step remediation varies with the condition
  • Pretest (1 hr)
  • Training (5 to 10 hrs): For each question, do
  • Student enters initial essay
  • Tutor analyzes it for missing and incorrect steps,
    picks one, and discusses it with student
  • Student enters revised essay
  • Tutor either congratulates student and presents
    ideal essay, or goes to step 2
  • Posttest (1 hr)

46
Conditions
  • Expert Human tutors
  • Text-based communication
  • Spoken communication
  • Computer tutors
  • Why2-Atlas (VanLehn et al.)
  • ITSPOKE (Litman et al.)
  • Why2-AutoTutor (Graesser et al.)
  • Control conditions
  • Canned text remediation
  • Textbook

47
(No Transcript)
48
Human tutors
Step end
T: Tell
Step start
T: Elicit
S: Correct
T: Hint, or prompt, or explain, or analogy, or ...
S: Incorrect
49
Why2-Atlas
Step end
T: Tell
Step start
T: Elicit
S: Correct
Knowledge construction dialogue
S: Incorrect
50
Why2-AutoTutor
Step end
T: Tell
Step start
T: Elicit
S: Correct
Hint, or prompt, or assert
S: Incorrect
51
Canned-text remediation
Step end
T: Tell
Step start
T: Elicit
S: Correct
<text>
S: Incorrect
52
Results from 7 experiments
  • Why2-Atlas = Why2-AutoTutor
  • Trend for >, but not significant
  • Why2-Atlas may need more development
  • Why2 > Textbook
  • In Textbook condition, students do not write
    essays
  • Why2 = Human tutoring !!!
  • Human tutoring = Canned text remediation
  • Exception: If pre-physics students get
    instruction designed for post-physics students,
    then Human tutoring > Canned text remediation

53
Impact and significance of the results
  • Why2-Atlas = Why2-AutoTutor
  • Common in AI that complex techniques are only
    slightly better than simple ones, at least
    initially.
  • Why2 > Textbook
  • Common in Learning Sciences that active > passive
  • Why2 = Human tutoring = Canned text remediation
  • Highly counter-intuitive to Learning and
    Cognitive scientists (including us)

54
Hypothesis 1: Exactly how tutors remedy a step
doesn't matter much
Step end
T: Tell
Step start
T: Elicit
S: Correct
What's in here doesn't matter much

S: Incorrect
55
Other studies where type of step remediation had
little impact
  • Human tutors
  • Human tutoring = human tutoring with only
    content-free prompting for step remediation (Chi
    et al., 2001)
  • Human tutoring = solving a problem in pairs with
    a video solution available (Chi et al., 2007)
  • Human tutoring = canned text during post-practice
    remediation (Katz et al., 2003)
  • Human tutoring = an ITS (Reif & Scott, 1999)
  • Micro-analyses of human tutoring (VanLehn et al.,
    2003)
  • Socratic human tutoring = didactic human tutoring
    (Rosé et al., 2001a; Johnson & Johnson, 1992)
  • Natural language tutoring systems
  • Circsim (canned text) = Circsim Tutor (Evens &
    Michael, 2007)
  • Andes-Atlas = Andes with canned text (Rosé et al.,
    2001b)
  • Cognitive geometry tutor (Aleven et al., 2004)

56
Hypothesis 2: Cannot eliminate the step
remediation loop
Step end
T: Tell
Step start
Must avoid this
T: Elicit
S: Correct
Text
S: Incorrect
57
Studies consistent with harmfulness of just
telling and explaining
  • Human tutoring
  • Human tutoring > textbook alone (Azevedo; Evens;
    VanLehn)
  • Human tutoring > lecture/demo (Wood et al., 1978;
    Swanton,
  • Natural language tutoring systems
  • NLT > textbook alone (Graesser; Evens; Lane;
    VanLehn)
  • NLT > lecture/demo (Craig)

58
Conclusions
  • Learning Science: Can computer tutors be as
    effective as human tutors?
  • Yes, as long as students attempt steps with
    feedback and hints on each
  • Computer Science: When is deep linguistic
    technology more effective than shallow?
  • Several positive results at module level
  • At whole system level, still tied, but
    encouraging
  • Cognitive Science: The higher the interactivity,
    the higher the learning gains?
  • No. See next slide

59
The interactivity plateau
Claim: Perhaps Bloom's 2 experiments were
confounded
60
How can we achieve super-human results?
?
61
Future work (slide 1 of 3): Increasing engagement
  • NeuroCog engagement meter (DARPA)
  • Can we reliably measure engagement with fMRI?
  • Can we train students to maintain engagement with
    it?
  • Interesting problems (PSLC)
  • Ill-defined design problems
  • Recommender system
  • ITS as a member of a social network (DFK, PSLC)
  • Pairs > solos for engagement, but correctness?
  • Can we add an ITS without destroying engagement?

62
Future work (slide 2 of 3): Faster learning and
faster authoring
  • Author and student interface = PowerPoint
  • Fast to learn and use
  • e.g., type "Let V1, V2 be the initial, final
    velocities"
  • Freedom and domain independence
  • As students master a step
  • Tutor does it, or
  • It gets folded into a larger step
  • TruthBench
  • Knowledge acquisition for truth checking vs.
  • Knowledge acquisition for solving an
    (ill-defined) problem
  • Examples instead of hints

63
Future work (slide 3 of 3): Teach what an AI
learner needs
  • Explicit teaching of backwards chaining (PSLC)
  • Accelerates learning and transfers (Chi & VanLehn,
    2007)
  • Explicit teaching of confluences
  • KE = ½ m v² → If mass and kinetic energy are
    constant, then velocity must be constant
  • Explicit teaching of abstraction planning
  • KE = ½ m v² → If need a velocity, then find a
    kinetic energy
  • Dream system: A model human learner
  • For testing curriculum designs
  • Getting the step sizes right

64
Thanks!
  • See www.pitt.edu/vanlehn for publications

65
When to use deep vs. shallow?
                               Shallow linguistic      Deep linguistic            Recommendation
Sentence understanding         LSA, Rainbow, Rappel    Carmel parser, semantics   Use both
Essay/Discourse understanding  LSA                     Abduction, Bnets           Use deep
Dialog management              Finite state networks   Reactive planning          Use locally smart FSA
Natural language generation    Text                    Plan-based                 Use equivalent texts
66
  • It ain't so much the things we don't know that
    get us into trouble. It's the things we know that
    just ain't so.
  • -- Josh Billings (Henry Wheeler Shaw)

67
A deep sentence understander: Carmel
  • LCFlex parser, a robust parser that uses
    skipping, insertion, and flexible unification (Rosé &
    Lavie, 2001)
  • Comlex, with 40,000 lexemes (Grishman et al.,
    1994)
  • A broad-coverage, domain-independent, English
    syntactic grammar
  • CarmelTools for semi-automatically creating
    semantic functions (Rosé, 2000; Rosé et al., AIED
    2003).

68
Shallow sentence understanders
  • Words only
  • Rainbow: A Naïve Bayes text classifier
  • Given new bag of words, calculates most probable
    domain propositions using Bayes' rule
  • LSA, several others
  • Worse, not currently used
  • Words + syntactic features
  • Rappel (Jordan)
  • Minipar produces dependency relations between
    words
  • Ripper builds one classifier per predicate type
    per argument of the predicate type
  • CarmelTC (Rosé et al., HLT/NAACL 2003)
  • Worse, not currently used
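
To make the "words only" approach concrete, here is a minimal sketch of a Naïve Bayes bag-of-words classifier that maps a sentence to its most probable proposition label. It is written from scratch for illustration and is not the Rainbow toolkit's API; the training sentences and label names are invented.

```python
from collections import Counter, defaultdict
from math import log


def train(examples):
    """examples: list of (sentence, label). Returns the counts the classifier needs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for sentence, label in examples:
        words = sentence.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(sentence, model):
    """Return the label with the highest posterior under Bayes' rule."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = log(label_counts[label] / total)  # log prior
        n_words = sum(word_counts[label].values())
        for word in sentence.lower().split():
            # Laplace smoothing so unseen words do not zero out the product.
            score += log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data: sentence -> domain proposition label.
model = train([
    ("gravity pulls the pumpkin down", "gravity_acts_down"),
    ("the only force is gravity", "gravity_acts_down"),
    ("the pumpkin keeps its horizontal velocity", "horizontal_velocity_constant"),
    ("horizontal speed stays the same", "horizontal_velocity_constant"),
])
print(classify("gravity is the force", model))
```

Word order is ignored entirely, which is why a classifier of this kind cannot tell "the car exerts more force on the truck" from "the truck exerts more force on the car".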

69
Results
  • Speed: All were fast enough
  • Accuracy: All were too low
  • Deep = Words = Words + syntactic features
  • Best (Jordan et al., ITS04)
  • Run all 3
  • Use heuristics to choose 1 of 3 outputs
  • E.g., If "velocity" or "speed" appear in the
    sentence, then velocity should appear in the
    output propositions somewhere.
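
The example heuristic on this slide can be sketched as a small selection function. This is an illustration under assumptions: the function name, the proposition strings, and the fallback to the first candidate are all hypothetical, and the real system's heuristics are not described in enough detail here to reproduce.

```python
def choose_output(sentence, candidates):
    """Pick one understander's output for a sentence.

    candidates: list of proposition lists, one per understander, in
    preference order. If the sentence mentions velocity or speed, prefer
    the first candidate whose propositions mention velocity.
    """
    words = set(sentence.lower().replace(".", " ").replace(",", " ").split())
    if words & {"velocity", "speed"}:
        for propositions in candidates:
            if any("velocity" in p for p in propositions):
                return propositions
    return candidates[0]  # assumed fallback: the most preferred understander


outputs = [
    ["force(pumpkin, gravity)"],        # e.g., from the deep understander
    ["velocity(pumpkin, constant)"],    # e.g., from a shallow classifier
    [],                                 # e.g., from a hybrid that found nothing
]
print(choose_output("The speed of the pumpkin is constant.", outputs))
print(choose_output("Gravity pulls it down.", outputs))
```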

70
Direct matching via largest common subgraph
(Shearer et al., 2001)
Expectation: "The speed of the pumpkin is the
same as the speed of man."
compare(x1, x2, same)
x2
x1
speed(x1, pumpkin, ...)
speed(x2, man, ...)
Student said: "... allow us to compare it to the
speed of the pumpkin."
compare(x5, x6, ...)
x5
speed(x5,pumpkin,...)
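
Direct matching compares the student's proposition graph with an expectation's graph under some renaming of variables. The sketch below is a brute-force stand-in, not the largest-common-subgraph algorithm of Shearer et al.: it tries every assignment of student variables to expectation variables and counts how many student propositions find a match. The tuple encoding and the `"..."` wildcard for unspecified arguments are assumptions made for the example.

```python
from itertools import permutations

WILDCARD = "..."  # an argument left unspecified


def matches(student_prop, expected_prop, binding):
    """True if the propositions agree argument by argument under the binding."""
    if len(student_prop) != len(expected_prop):
        return False
    for s, e in zip(student_prop, expected_prop):
        s = binding.get(s, s)  # rename student variables
        if s != e and s != WILDCARD and e != WILDCARD:
            return False
    return True


def best_overlap(student, expected, student_vars, expected_vars):
    """Largest number of student propositions matched under any variable binding."""
    best = 0
    for perm in permutations(expected_vars, len(student_vars)):
        binding = dict(zip(student_vars, perm))
        score = sum(any(matches(s, e, binding) for e in expected) for s in student)
        best = max(best, score)
    return best


expected = [
    ("compare", "x1", "x2", "same"),
    ("speed", "x1", "pumpkin", WILDCARD),
    ("speed", "x2", "man", WILDCARD),
]
student = [
    ("compare", "x5", "x6", WILDCARD),
    ("speed", "x5", "pumpkin", WILDCARD),
]
print(best_overlap(student, expected, ["x5", "x6"], ["x1", "x2"]))  # 2
```

Both student propositions match when x5 is bound to x1 and x6 to x2, so the student's fragment overlaps the expectation on two of its three propositions. Exhaustive search over bindings is only practical for graphs this small.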
71
Generation of the ATMS for a problem starts with
givens (node = atomic prop.; red = incorrect)
Correct givens
Buggy given
72
Apply rules forward (RA = rule application)
RA1
RA2
Correct givens
Buggy given
73
Stop when no more rule applications
RA4
RA3
RA1
RA2
Correct givens
Buggy given
74
Propositions are not always ground! Variables
(colored links) shared across nodes
Correct givens
Buggy given
75
Specific subsets correspond to expectations /
misconceptions
Expectation (a key step in the explanation)?
Expectation
Misconception
Correct givens
Buggy given
76
At runtime, find all node subsets that unify with
the student input
Input
Correct givens
Buggy given
77
In this case, two subsets (happy faces) unify
with student's input
Input
Correct givens
Buggy given
78
Output the expectations/misconceptions that are
directly matched
Input
RA4
RA3
Direct match
RA2
Close, but not a direct match
RA1
Correct givens
Buggy given
79
Bnet nodes represent sets of nodes for
expectations/misconceptions
Input
RA4
RA3
RA2
RA1
Correct givens
Buggy given
80
Clamp student's input nodes (happy faces) and
update net
High posterior probability
RA4
RA3
Moderate posterior probability
RA2
RA1
Low posterior probability
Correct givens
Buggy given
81
Training and evaluation
  • Topology of network is given by deductive
    closure, etc.
  • Only learn the conditional probabilities
  • 293 sentences coded by human
  • Expectation/Maximization
  • 10-fold cross validation
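
The evaluation protocol (10-fold cross validation over 293 coded sentences) can be sketched as an index splitter. This shows only the fold bookkeeping, with a hypothetical function name and a fixed shuffle seed chosen for reproducibility; the Expectation-Maximization training of the conditional probabilities is not shown.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


# 293 sentences, as on the slide: each sentence is tested exactly once.
for train_idx, test_idx in k_fold_indices(293):
    print(len(train_idx), len(test_idx))
```

Each sentence lands in exactly one test fold, so every reported accuracy figure comes from sentences the network's probabilities were not trained on.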