Title: Cohesion and Learning in a Tutorial Spoken Dialog System
1. Cohesion and Learning in a Tutorial Spoken Dialog System
2. Outline
- Tutoring
- Goals
- 4 issues in measuring cohesion
  - Why they're interesting
  - How we test them
- Results
3. Natural Language Dialog Tutoring
- Human tutors are better than classroom instruction (Bloom 84)
- Intelligent Tutoring Systems (ITSs) hope to replicate this advantage
- Is dialog important to learning?
  - Dialog acts: question answering, explanatory reasoning, deep student answers (Graesser et al. 95, Forbes-Riley et al. 05)
  - Difficult to automatically tag dialog input, so:
  - Automatically detectable dialog features
    - Average turn length, etc. (Litman et al. 04)
- We look at cohesion
  - Lexical co-occurrence between turns
4. Goals and Results
- Goals
  - Want to find if cohesion is correlated with learning in our tutoring dialogs
    - If it is, may inform ITS design
  - Want to find a computationally tractable measure of cohesion
    - So it can be used in a real-time tutor
- Results
  - Do find strong correlations with learning
    - For low pre-testers
    - For interactive (tutor to student) measures of cohesion
  - Robust to multiple measures of lexical cohesion
5. Four Issues
- Why/how to identify cohesion in dialogs?
- Do students of different skill levels respond to cohesion in the same way?
  - (Is there an aptitude/treatment interaction?)
- Is interactivity important?
- What other processing steps help?
6. Issue 1: How to identify cohesion in dialogs?
- Why might cohesion be important in tutoring?
- McNamara & Kintsch (96)
  - Students read high and low coherence text
  - High coherence text was the low coherence version altered to:
    - Use consistent referring expressions
    - Identify anaphora
    - Supply background information
  - Interaction between pre-test score and response to textual coherence
    - Low pre-testers learned more from more coherent text
    - High pre-testers learned LESS from more coherent text
7. Measuring Cohesion
- Measurements from computational linguistics
  - Hearst (94): topic segmentation, text
    - Word-count similarity of spans of text
  - Olney & Cai (05): topic segmentation, tutorial dialog
    - Several measures, including Hearst's
  - Morris & Hirst (91): lexical chains
    - Thesaurus entries
  - Barzilay & Elhadad (97): automatic lexical chains
    - WordNet senses
- We develop measures similar to Hearst's
  - But novel in that they are
    - Applied to dialog rather than text
    - Used to find correlations with learning
8. Issue 1: How to identify cohesion in dialogs?
- Defining cohesion
  - Halliday and Hasan (76)
  - Grammatical vs. lexical cohesion
- Lexical cohesion
  - Reiteration
    - Exact word repetition
    - Synonym repetition
    - Near-synonym repetition
    - Super-ordinate class
    - General referring noun
- Cohesion measured by counting cohesive ties
  - Two words joined by a cohesive device (i.e. reiteration)
10. Issue 1: How to identify cohesion in dialogs?
- How we measure lexical cohesion
- We count cohesive ties between turns
  - Tokens (with stop words)
    - (token = word)
  - Tokens (stop words removed)
    - (stop words = high frequency, low information words)
  - Stems (stop words removed)
11. Stems
- Stem = non-inflected core of a word
- Porter Stemmer
- Allows us to find ties between various inflected forms of the same word in adjacent turns
- Turns are tutor and student contributions to tutoring dialogs collected by the ITSPOKE group
12. Applying Cohesion Measures to Our Corpora: Example
Turn | Contribution
Student Essay | No. The airplane and the packet have the same horizontal velocity. When the packet is dropped, the only force acting on it is g, and the net force is zero. The packet accelerates vertically down, but does not accelerate horizontally. The packet keeps moving at the same velocity while it is falling as it had when it was on the airplane. There will be displacement because the packet still moves horizontally after it is dropped. The packet will keep moving past the center of the swimming pool because of its horizontal velocity.
ITSPOKE | Uh huh. There is more still that your essay should cover. Maybe this will help you remember some of the details need in the explanation. After the packet is released, the only force acting on it is gravitational force, which acts in the vertical direction. What is the magnitude of the acceleration of the packet in the horizontal direction?

Cohesive Ties | Matches | Count
Token, with stops | packet, horizontal, the, it, is, of, only, force, acting, on, there, will, still, after | 14
Token, no stops | packet, horizontal, only, force, acting, there, will, still, after | 9
Stem, no stops | packet, horizont, onli, forc, act, acceler, vertic, there, will, still, after | 11
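The token-level counts in this example can be reproduced with a short sketch. It assumes that a tie is a distinct word type shared by the two turns (which is what the match lists above suggest), and it uses a small hypothetical stop list, since the stop list actually used is not given on the slides. The names `word_types`, `cohesive_ties`, and `STOP_WORDS` are illustrative only.

```python
import re

# Hypothetical minimal stop list (an assumption; the real list is not shown).
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "it", "is", "of", "on", "in", "to", "at", "as"}
)


def word_types(turn, stop_words=frozenset(), normalize=None):
    """Return the set of distinct words in a turn, lowercased.

    stop_words are dropped; normalize (e.g. a stemming function) is
    applied to each remaining word if given.
    """
    tokens = re.findall(r"[a-z]+", turn.lower())
    kept = (t for t in tokens if t not in stop_words)
    return {normalize(t) if normalize else t for t in kept}


def cohesive_ties(turn_a, turn_b, stop_words=frozenset(), normalize=None):
    """Word types shared by two turns (exact-repetition ties)."""
    types_a = word_types(turn_a, stop_words, normalize)
    types_b = word_types(turn_b, stop_words, normalize)
    return types_a & types_b


student = (
    "No. The airplane and the packet have the same horizontal velocity. "
    "When the packet is dropped, the only force acting on it is g, and the "
    "net force is zero. The packet accelerates vertically down, but does "
    "not accelerate horizontally. The packet keeps moving at the same "
    "velocity while it is falling as it had when it was on the airplane. "
    "There will be displacement because the packet still moves horizontally "
    "after it is dropped. The packet will keep moving past the center of "
    "the swimming pool because of its horizontal velocity."
)
tutor = (
    "Uh huh. There is more still that your essay should cover. Maybe this "
    "will help you remember some of the details need in the explanation. "
    "After the packet is released, the only force acting on it is "
    "gravitational force, which acts in the vertical direction. What is the "
    "magnitude of the acceleration of the packet in the horizontal direction?"
)

print(len(cohesive_ties(student, tutor)))                         # 14
print(len(cohesive_ties(student, tutor, stop_words=STOP_WORDS)))  # 9
```

The stem-level count would pass a stemming function as `normalize`; a Porter stemmer such as NLTK's `PorterStemmer().stem` is one option, not shown here to keep the sketch dependency-free.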
16. Issue 2: Is there an aptitude/treatment interaction?
- Why there might be
  - McNamara & Kintsch
- How we test it
  - Mean pre-test split
    - All students
    - Above-mean pre-test students (high pre-testers)
    - Below-mean pre-test students (low pre-testers)
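A mean pre-test split can be sketched as follows. The function name, the student ids, and the scores are hypothetical, and placing a student who scores exactly at the mean in the low group is an assumption; the slides do not say how that case was handled.

```python
from statistics import mean


def mean_pretest_split(pretest_scores):
    """Split student ids into (high, low) pre-testers around the mean score.

    pretest_scores maps student id -> pre-test score.
    Assumption: a score exactly equal to the mean goes to the low group.
    """
    cutoff = mean(pretest_scores.values())
    high = [s for s, score in pretest_scores.items() if score > cutoff]
    low = [s for s, score in pretest_scores.items() if score <= cutoff]
    return high, low


# Hypothetical scores, for illustration only.
scores = {"s1": 30, "s2": 55, "s3": 40, "s4": 75}
high, low = mean_pretest_split(scores)
print(high)  # ['s2', 's4']
print(low)   # ['s1', 's3']
```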
17. Issue 3: Is interactivity important?
- Why it might be
  - Chi et al. (01)
    - Tutor centered, student centered, interactive
    - Deep learning through self construction
    - Not tutor actions alone
  - Litman & Forbes-Riley (05)
    - Learning correlated with both
      - student utterances that display reasoning
      - tutor questions that require reasoning
- How we test it
  - Interactive corpus: compare tutor turns to student turns
  - Tutor-only corpus
  - Student-only corpus
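One way to derive the three corpora from a single dialog is sketched below. It assumes that the interactive corpus pairs each turn with the following turn by the other speaker, and that the tutor-only and student-only corpora pair consecutive turns by the same speaker after the other speaker's turns are dropped. That reading is an assumption, and `corpus_views` and the speaker labels are hypothetical names.

```python
def adjacent_pairs(items):
    """Return consecutive pairs: [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(items, items[1:]))


def corpus_views(dialog):
    """Build turn pairs for the three comparisons.

    dialog is an ordered list of (speaker, text) tuples, where speaker is
    "tutor" or "student" (labels assumed for this sketch).
    """
    tutor_turns = [text for speaker, text in dialog if speaker == "tutor"]
    student_turns = [text for speaker, text in dialog if speaker == "student"]
    interactive = [
        (first[1], second[1])
        for first, second in adjacent_pairs(dialog)
        if first[0] != second[0]  # keep only speaker changes
    ]
    return {
        "interactive": interactive,
        "tutor_only": adjacent_pairs(tutor_turns),
        "student_only": adjacent_pairs(student_turns),
    }


# Hypothetical four-turn dialog, for illustration only.
dialog = [
    ("student", "essay text"),
    ("tutor", "first question"),
    ("student", "first answer"),
    ("tutor", "second question"),
]
views = corpus_views(dialog)
print(views["interactive"])
print(views["tutor_only"])
print(views["student_only"])
```

Cohesive ties would then be counted over each list of pairs separately.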
18. Issue 4: What other processing steps help?
- Tried several on training corpus
  - Removing stop words
  - N-turn spans
  - Selecting substantive turns
  - TF-IDF normalization
  - Turn-normalized counts
    - (Raw tie count / number of turns in dialog)
- Found final options on training corpus
  - One-turn spans, turn normalization, no TF-IDF, no substantive turn selection
  - All reported results use these options
- Tested options on new corpus
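The turn-normalized count with one-turn spans might be computed along these lines. This is a minimal sketch that assumes ties are distinct shared word types between each pair of adjacent turns; the helper names and the sample turns are hypothetical.

```python
import re


def words(turn):
    """Distinct lowercased words in a turn."""
    return set(re.findall(r"[a-z]+", turn.lower()))


def shared_types(turn_a, turn_b):
    """Number of distinct word types the two turns have in common."""
    return len(words(turn_a) & words(turn_b))


def turn_normalized_cohesion(turns):
    """Raw tie count over adjacent turn pairs, divided by the number of turns."""
    if not turns:
        return 0.0
    raw_ties = sum(shared_types(a, b) for a, b in zip(turns, turns[1:]))
    return raw_ties / len(turns)


# Hypothetical three-turn dialog: each adjacent pair shares "the" and "packet".
turns = [
    "The packet falls down",
    "Does the packet accelerate",
    "Yes the packet accelerates",
]
print(turn_normalized_cohesion(turns))  # 4 ties over 3 turns
```

Dividing by the number of turns keeps long dialogs from scoring higher simply because they offer more turn pairs in which ties can occur.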
19. Where did the corpora come from?
- ITSPOKE is a speech-enabled version of Why2-Atlas (VanLehn et al. 02)
- Qualitative physics
- Tutoring cycle
  - Student reads instructional materials
  - Takes a pre-test
  - Starts interactive tutoring cycle
    - Problem
    - Essay
    - Tutor evaluates essay, engages in dialog
    - Revise essay
    - Repeat
  - Takes a post-test
20. Tutoring Corpora
- Transcripts of tutoring sessions
- Training corpus (fall 2003)
  - 20 students, 5 problems each
  - 95 dialogs (5 had no dialog)
  - 13 low pre-testers, 7 high pre-testers
- Testing corpus (spring 2005)
  - 34 students, 5 problems each
  - 163 dialogs (7 had no dialog)
  - 18 low pre-testers, 16 high pre-testers
21. Results: Aptitude/Treatment
Students | Train 2003 R | Train 2003 p-value | Test 2005 R | Test 2005 p-value
Grouped by Token (with stop words)
All Students | 0.380 | 0.098 | 0.207 | 0.239
Low Pretest | 0.614 | 0.026 | 0.448 | 0.062
High Pretest | 0.509 | 0.244 | 0.014 | 0.958
Grouped by Token (stop words removed)
All Students | 0.431 | 0.058 | 0.269 | 0.124
Low Pretest | 0.676 | 0.011 | 0.481 | 0.043
High Pretest | 0.606 | 0.149 | 0.132 | 0.627
Grouped by Stem (stop words removed)
All Students | 0.423 | 0.063 | 0.261 | 0.135
Low Pretest | 0.685 | 0.010 | 0.474 | 0.047
High Pretest | 0.633 | 0.127 | 0.121 | 0.655
- Test: partial correlation of post-test and cohesion count, controlling for pre-test
- Cohesion correlated with learning for low pre-test students
- Not for high pre-test students
- Little difference between types of measurement
- Less significant on testing data; at the token-with-stops level, the low pre-test correlation is reduced to a trend (p = 0.062)
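A partial correlation of this kind can be sketched with the standard first-order formula, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The slides do not say which software or procedure was used, so this is only an illustration; the function names and all data values below are hypothetical, and the significance test behind the p-values is omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """Correlation of x and y with the linear effect of z removed from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical per-student values, for illustration only.
cohesion = [1.2, 2.5, 1.9, 3.1, 2.2]       # turn-normalized tie counts
posttest = [0.55, 0.70, 0.60, 0.85, 0.65]  # post-test scores
pretest = [0.40, 0.45, 0.50, 0.55, 0.35]   # pre-test scores (controlled for)

r = partial_corr(cohesion, posttest, pretest)
print(round(r, 3))
```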
28. Results: Aptitude/Treatment (2003 data)
- No significant difference between amounts of (turn-normalized) cohesion in high and low pre-test groups
- Difference in correlation between high and low pre-testers is not due to different amounts of cohesion
29. Results: Interactivity (2003)
- Cohesion between tutor utterances is not correlated with learning
30. Results: Interactivity (2003)
- No evidence that cohesion between student productions is correlated with learning (but student utterances are very short with a computer tutor)
31. Discussion
- Both high and low pre-testers successfully learned from these dialogs
- Our measure of lexical cohesion seems to reflect only what the low pre-testers do to learn; it is not correlated with what high pre-testers do
- McNamara & Kintsch also found a positive correlation for low pre-testers, but a negative correlation for high pre-testers
32. Discussion
- Our measures are slightly different
- McNamara & Kintsch: manipulated coherence in text
  - Reader does not contribute to coherence
  - Coherence is the extent to which semantic relations are spelled out in the text, rather than inferred by the reader
  - Low pre-testers probably learned because high coherence text allowed them to make inferences they couldn't from the low coherence text
    - Low pre-testers, low coherence: didn't know the terms
    - High coherence may allow a greater number of successful inferences for their low pre-testers
- Our work: dialog
  - Student does contribute to cohesion
  - Higher cohesion means using more of the same terms
  - Speculation: high cohesion may indicate the number of successful inferences our low pre-testers already made
  - High pre-testers already know the terms, so new inferences are not involved in using them
33. Summary
- We have taken automatically computable measures of cohesion from computational linguistics
- Applied them to tutorial dialog
- Found correlations with student learning
34. Conclusions
- Simple, automatically computable measures of lexical cohesion correlate with learning
- But only for students with low pre-test scores, even though low and high pre-testers showed similar amounts of cohesion
- Correlation is robust to differences in type of measurement
- It's the cohesion between student and tutor that's important
35. Future Work
- Short term
  - Cohesion may also be related to learning in high pre-testers, but we're measuring the wrong kind of cohesion
    - Work underway to try sense-level measures
    - Halliday & Hasan's synonym levels of reiteration
      - Acceleration = speeding up
    - New issues
      - Word sense disambiguation (one sense per discourse?)
  - Or measuring it in the wrong places
    - Try finding cohesion at impasses (VanLehn 03)
    - Try finding change in cohesion over time (Pickering & Garrod 04)
    - Is it the dialog, or the essay?
- Long term
  - Test by manipulating cohesion in ITSPOKE
36. Thanks
- Diane Litman
- ITSPOKE group
37. Questions?
42. Cohesion vs. Coherence
- Cohesive devices
  - Things that tie different parts of a discourse together
  - Anaphora, repetition, etc.
  - But still may not make sense
    - "John hid Bill's car keys. He likes spinach." (Jurafsky & Martin 00)
- Coherence relations
  - Semantic relations between utterances
  - Result, explanation, elaboration, etc. (Hobbs 79)
43. Britton & Gulgoz (91)
- Original text
  - Air war in the North, 1965
  - By the fall of 1964, Americans in both Saigon and Washington had begun to focus on Hanoi as the source of the continuing problem in the south.
- Modified text
  - Air war in North Vietnam, 1965
  - By the beginning of 1965, Americans in both Saigon and Washington had begun to focus on Hanoi, capital of North Vietnam, as the source of the continuing problems in the south.