Title: The Weakest Link: Detecting and Correcting Errors in Learner English
1. The Weakest Link: Detecting and Correcting Errors in Learner English
- Pete Whitelock
- Sharp Labs. of Europe, Oxford
- pete_at_sharp.co.uk
2. Language Engineering
- Translation
- Text ↔ Speech
- Dialogue Management
- Information retrieval
- Summarisation
- Question-answering, Information Extraction
- Language Learning and Reference
- OCR, Routing, etc. etc.
3. Why?
- Sharp Corporation work on MT since 1979
- SLE's Intelligent Dictionary since 1997
- Help Japanese to read English
- Obvious need for new application
- Help Japanese to write English
- Bilingual example retrieval
- Context-sensitive thesaurus
- Error detection AND correction
4. Types of Error
- Omissions, Insertions (usu. of grammatical elements)
- Word Order
- Replacements
- Context-sensitive (real-word) spelling errors
- Homophone and near-homophone errors
- to/too/two, their/there/they're
- lose/loose, pain/pane
- lend/rend
- Typos
- bane/babe
- than/that, from/form
- Morphological errors
- inflectional, eg agreement errors (category-preserving)
- derivational (category-changing)
- safety/safe/safely
- interested/interesting
5. Semantic and collocational errors
- I don't want to marry with him. → ∅
- We walked in the farm → on
- Then I asked ∅ their suggestions → for
- I used to play judo but now I play karate → do
- Please teach me your phone number → tell/give
- Could you teach me the way to the station → tell
- I became to like him → came/started
- The light became dim → grew
- When he became 16 → turned/reached
- My boyfriend presented me some flowers → gave
- I'm very bad at writing pictures → drawing
- My brother always wins me at tennis → beats
- My father often smacked my hip → bottom
- Tradition in our dairy life → daily
6. History of Language Engineering
- Quantitative
- Statistics
- (relatively) simple statistical models trained on large quantities of text
- Speech
- Symbolic
- Linguistics, AI
- Large teams of people building complex grammars
- Textual, esp. translation
(Timeline figure, 1975-1995: the two traditions converging on a more balanced approach)
7. Error Detection: Symbolic Approaches
- IBM's Epistle (1982), Critique (1989)
- Heidorn, Jensen, Richardson et al.
- → MS Word Grammar Checker (1992?)
- Full rule-based parsing
- Error rules (S → NPsg VPpl)
- Confusion sets
- eg alter/altar, abut/about
- when one member appears, parse with all
- only effective when POSs disjoint
8. Statistical Intuition
Given a general model of probability of word
sequences, improbable stretches of text
correspond to errors
9. Statistical Approaches I: Word n-grams
- IBM Lange (1987), Damerau (1993)
- based on success of ASR technology
- severe data sparseness problems
- eg for a vocabulary of 20,000 words:

  n   possible n-grams
  2   400 million (4 × 10^8)
  3   8 trillion (8 × 10^12)
  4   1.6 × 10^17
10. Worse still
11. Problem
- there are many tokens of rare types
- there are few types of common tokens
- → data sparseness
12. And
- Any given N is not enough
- must cause awfully bad effect
- we make many specialised magazines
13. But
- There are techniques to deal with data sparseness: smoothing, clustering, etc.
- Trigram model is very effective
- especially when using POS n-grams
14. Statistical Approaches II: POS n-grams
- Atwell (1987), Schabes et al. (1996)
- It is to fast.
- PP BEZ PREP ADJ STOP
- to_PREP confusable_with too_ADV
- p(ADV ADJ STOP) >> p(PREP ADJ STOP)
- Not appropriate for items with same POS
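The POS-trigram check can be sketched as follows: for a confusable token, score each member of its confusion set by the probability of the local tag trigram and pick the winner. This is a minimal sketch; the trigram probabilities below are invented for illustration, not corpus-derived.

```python
# Confusion sets: each member paired with its POS (only "to" here, for illustration)
CONFUSABLES = {"to": [("to", "PREP"), ("too", "ADV")]}

# Hypothetical POS-trigram probabilities (a real model is trained on tagged text)
TRIGRAM_P = {
    ("PREP", "ADJ", "STOP"): 1e-6,
    ("ADV", "ADJ", "STOP"): 4e-3,
}

def best_variant(word, following_tags):
    """Pick the confusion-set member whose POS makes the local trigram most probable."""
    candidates = CONFUSABLES.get(word, [(word, None)])
    scored = [(TRIGRAM_P.get((tag,) + following_tags, 1e-9), w)
              for w, tag in candidates]
    return max(scored)[1]

print(best_variant("to", ("ADJ", "STOP")))  # "too": corrects "It is to fast."
```

Since p(ADV ADJ STOP) >> p(PREP ADJ STOP), the ADV reading wins and "too" is suggested; as the slide notes, this breaks down when the confusables share a POS.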
15. Machine Learning Approaches
- techniques from WSD
- define confusable sets C
- define features of context
- eg specific words, POS sequences, etc
- learn map from features to elements of C
- Bayes, Winnow (Golding, Schabes, Roth)
- LSA (Jones & Martin)
- Maximum entropy (Izumi et al.)
- effective, esp. for category preserving errors
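The learning step can be sketched as a naive Bayes map from context features to confusion-set members (one of the learners named above); the toy counts here are invented for illustration, a real system trains on a large corpus.

```python
import math

C = ["to", "too"]  # a confusion set
# feature -> per-member counts; features are words in a window around the target
counts = {"fast": {"to": 1, "too": 50}, "go": {"to": 80, "too": 2}}
prior = {"to": 1000, "too": 100}  # overall frequencies of each member

def classify(features):
    """Return the confusion-set member with highest (smoothed) posterior."""
    best, best_score = None, -math.inf
    for c in C:
        score = math.log(prior[c])
        for f in features:
            fc = counts.get(f, {})
            # add-one smoothing so unseen feature/member pairs get nonzero mass
            score += math.log((fc.get(c, 0) + 1) / (prior[c] + len(counts)))
        if score > best_score:
            best, best_score = c, score
    return best

print(classify(["fast"]))  # "too": context "fast" favours the ADV reading
```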
16. Problems
- experiments typically restricted to small set of spelling-type errors
- but almost any word can be used in error
- data problems with scaling up
- semantic-type errors have huge confusion sets
- but presence in a confusion set is the only trigger for error processing
- where is the probabilistic intuition?
17. Statistical Approach: Problem
(Figure: the words of "I gave the dog stewing steak" scattered across the slide; related words need not be adjacent)
18. Dependency Structure
- I gave the dog stewing steak
- gave: subj → I, obj → dog, obj2 → steak
- dog: spec → the
- steak: mod → stewing
19. (Figure: dependency structure of "When I bought the dog stewing steak was cheap"; "dog" and "stewing" are adjacent but linguistically unrelated)
20. So
- Items that are linguistically close may be physically remote
- → difficult to train contiguous n-gram model
- Items that are physically close may be linguistically remote
- → low probabilities are sometimes uninteresting
21. ALEK: Chodorow & Leacock (2000)
- Compute MI for word bigrams and trigrams
- 30 words, 10,000 examples for each (NANC)
- TOEFL grade correlates significantly with proportion of low-frequency n-grams
- Mitigate uninteresting improbability by aggressive thresholding
- → very low recall (c. 20%) for 78% precision
22. Bigert & Knutsson (2002)
- Detect improbable tri(pos)grams
- Use result of parsing to detect trigrams straddling syntactic boundaries, and ignore them
- → mitigate uninteresting improbability
23. Idea
- Compute strength of links between words that are linguistically adjacent
- Concentrate sparse data in linguistic equivalence classes
- Capture physically longer dependencies
- Weaker links should be a more reliable indicator of an error
- Error correction can be triggered only when required
- Use confusion sets to improve strength of links
24. Method I: Parsing
- parse a large quantity of text written by native speakers (80 million words, BNC)
- produce dependency structures
- count frequencies of <word1, dep, word2> types and compute strength
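The counting step is straightforward once the parser emits dependency triples; a minimal sketch over toy parser output:

```python
from collections import Counter

# Toy dependency triples as a parser might emit them; a real run
# would stream 80m words of parsed BNC text through this counter.
parses = [
    ("give", "obj", "steak"),
    ("give", "obj", "steak"),
    ("give", "obj2", "dog"),
]

freq = Counter(parses)  # frequency of each <word1, dep, word2> type
print(freq[("give", "obj", "steak")])  # 2
```

These type frequencies are the raw material for the strength metrics (t-score, Yule's Q) computed later.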
25. Parsing I
- Get small quantity of hand-tagged, labeled, bracketed text (1 million words, Brown corpus)
- Exploit labeling to enrich tagset
- ( (S (NP Implementation/NN_H (PP of/OF (NP (NP Georgia/NP) 's/POS (NBAR automobile/NN_M title/NN_M) law/NN_H))) (AUX was/BED) (VP also/RB recommended/VBB_I (PP by/BY (NP the/AT outgoing/AJJ jury/NN_H)))) ./.)
26. Tagset
- AJJ attributive adjective
- PJJ predicative adjective
- NN_H head noun
- NN_M modifier noun
- AVBB attributive past participle
- AVBG attributive present participle
27. Tagset (cont.)
- VB(BDGHIPZ)_(ITA)
- Verb forms for different transitivities
- BE(DGHIPZ) copula
- HV(DGIPZ) auxiliary have
- DO(DPZ) auxiliary do
- MD modals
- TO infinitival to
28. Tagset (cont.)
- AT a, the
- DT demonstrative determiners (this, each)
- DP various pronouns (this, nothing, each)
- PP possessive determiners
- SPP subject pronouns
- OPP object pronouns
- EX existential there
- SC subordinating conjunction
- PREP prepositions except by and of
- BY by
- OF of
29. Tagset (cont.)
- RB regular adverb
- RC 2nd as in as X as Y
- RD predet adverb (only, just)
- RG post-np adverb (ago, aside, away)
- RI pre-SC/PREP adverb (only, even, just)
- RJ pre-adjective adverb (as, so, very, too)
- RQ pre-numeral adverb (only, about)
- RT temporal adverb (now, then, today)
- NT temporal NP (last week, next May)
30.
- Define dependency grammar in terms of enriched tags; define various sets of tags to use in the statement of possible dependencies:
- AUX = MD|DOZ|DOP|DOD
- ATVERB = VBI_T|VBH_T|VBP_T|VBD_T|VBZ_T|VBG_T
- FIN_MAIN_VERB = VBZ_I|VBP_I|VBD_I|VBZ_T|VBP_T|VBD_T|VBZ_A|VBP_A|VBD_A
- INF_MAIN_VERB = VBI_I|VBI_T|VBI_A
- INF_VERB = BEI|HVI|INF_MAIN_VERB
- P_PAR = BEH|VBH_I|VBH_T|VBH_A
- GER = BEG|HVG|VBG_I|VBG_T|VBG_A
- NFIN_VERB = INF_VERB|P_PAR|VBB_T|GER|TO
- NFIN_MAIN_VERB = INF_MAIN_VERB|VBB_T|VBG_I|VBG_T|VBG_A|VBH_I|VBH_T|VBH_A
- MAIN_VERB = FIN_MAIN_VERB|NFIN_MAIN_VERB
- VERB = FIN_VERB|NFIN_VERB
- ATVERB obj ACC_NP
- MAIN_VERB vcompt TO (hope, expect etc.)
- MAIN_VERB vcompi INF_VERB (have, help, let, see etc.)
- MAIN_VERB vcompg GER (stop, start, catch, keep etc.)
31.
- Exploit labeled bracketing to compute dependency structure:
- 0 Implementation/NN_H <subj-7>
- 1 of/OF <pmod-0>
- 2 Georgia/NP <pos-3>
- 3 's/POS <spec-6>
- 4 automobile/NN_M <mod-6>
- 5 title/NN_M <mod-6>
- 6 law/NN_H <pobj-1>
- 7 was/BED
- 8 also/RB <adv-9>
- 9 recommended/VBB_T <ccompb-7>
- 10 by/BY <padv-9>
- 11 the/AT <spec-13>
- 12 outgoing/AJJ <mod-13>
- 13 jury/NN_H <pobj-10>
- 14 ./.
32. (Figure: the dependency tree for the slide-31 sentence, rooted at was/BED, with implementation/NN_H and recommended/VBB_T as its dependents)
33.
- compute MLE that two words with tags ti and tj, separated by n words, are in a dependency relation

  ltag  rtag   ltag_is  sep  rel   poss  actual  %
  AT    NNS_H  D        1    spec  7207  6966    96
  AT    NNS_H  D        2    spec  4370  4225    96
  AT    NNS_H  D        3    spec  3204  1202    37
  AT    NNS_H  D        4    spec  4300  325     7
  AT    NNS_H  D        5    spec  4061  78      1

  (36,000 entries)
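The table above reads as a maximum-likelihood estimate, p(link) = actual / possible, for each (left tag, right tag, direction, separation, relation) entry; a minimal sketch over those five rows:

```python
# Entries copied from the slide's table: key -> (possible, actual)
TABLE = {
    ("AT", "NNS_H", "D", 1, "spec"): (7207, 6966),
    ("AT", "NNS_H", "D", 2, "spec"): (4370, 4225),
    ("AT", "NNS_H", "D", 3, "spec"): (3204, 1202),
    ("AT", "NNS_H", "D", 4, "spec"): (4300, 325),
    ("AT", "NNS_H", "D", 5, "spec"): (4061, 78),
}

def link_prob(ltag, rtag, direction, sep, rel):
    """MLE that two words with these tags at this separation are linked."""
    poss, actual = TABLE.get((ltag, rtag, direction, sep, rel), (0, 0))
    return actual / poss if poss else 0.0

print(round(link_prob("AT", "NNS_H", "D", 1, "spec"), 2))  # 0.97
```

Note how the estimate falls off sharply with separation (96% at distance 1, 1% at distance 5), which is what lets the greedy parser prefer local attachments.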
34. On the large corpus
- tag the input text (assign a probability to each possible part-of-speech a word might have)
- look at each pair of words, and at each tag that they might have, and compute the probability that they are in a dependency relation with those tags at that distance apart
- sort the potential relations by probability
- apply greedy algorithm that tries to add each dependency in turn and checks that certain constraints are not violated
- stop adding links when threshold exceeded (initially high)
35. Constraints: forbidden configurations include
(Figure: schematic strings w1 w2 w3 (w4) with disallowed link patterns, e.g. duplicate obj links and crossing links)
36.
- count raw frequencies for each pair of lemmas in combination, eg <give_V, obj, page_N>
- compute contingency table:

             obj-Y      obj-page
  X-obj      8,002,918  2,103
  give-obj   150,854    10
37.
- Compute metric which normalises for frequency of elements (eg t-score)
- if combination is more likely than chance, metric is positive
- if combination is less likely than chance, metric is negative
38. For example
- t(give-obj-page) = -9.4
- t(devote-obj-page) = 6.0
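Using the counts from the slide-36 contingency table, both metrics can be sketched directly; the (O - E)/sqrt(O) form of the t-score used here reproduces the -9.4 quoted above.

```python
import math

# Counts from the give/page contingency table (slide 36)
N      = 8_002_918   # all obj link tokens in the corpus (X-obj-Y)
f_give = 150_854     # give-obj-Y
f_page = 2_103       # X-obj-page
f_both = 10          # give-obj-page

def t_score(n, fx, fy, fxy):
    """t = (observed - expected) / sqrt(observed), a common formulation."""
    expected = fx * fy / n
    return (fxy - expected) / math.sqrt(fxy)

def yules_q(n, fx, fy, fxy):
    """Q = (ad - bc) / (ad + bc): easy to normalise, bounded in [-1, 1]."""
    a, b, c = fxy, fx - fxy, fy - fxy
    d = n - a - b - c
    return (a * d - b * c) / (a * d + b * c)

print(round(t_score(N, f_give, f_page, f_both), 1))   # -9.4, matching the slide
print(round(yules_q(N, f_give, f_page, f_both), 2))   # -0.6: weaker than chance
```

Both come out negative: "give ... page" co-occurs far less than chance, which is exactly the weak-link signal the method exploits.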
39. Metrics
- MI overestimates infrequent links
- T seems best for error spotting
- Yule's Q easy to normalise
- χ² (chi-squared) not easy to work with
- log-likelihood
- computed the above for
- 65m link tokens
- 6.5m types (1.5m trigrams) with frequency > 1
40. Method II: Bootstrapping
- parse the large corpus again, adding a term to the probability calculation which represents the collocational strength (Q)
- set the threshold lower
- recompute collocational strength
- current parser (unlabeled) accuracy: 82% precision, 88% recall
41. Method III: On-line Error Detection
- (tag with learner tagger to deal with category-changing errors)
- parse learner text according to the same grammar and compute strengths of all links
- sort links by weakness
- try replacing words in weakest link by confusables
- if link is strengthened, and other links are not significantly weakened, suggest replacement
- repeat while there are links weaker than threshold
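One pass of the correction loop can be sketched as follows: given a weak link, try each confusable of the head word and keep the one that strengthens the link most. The t-scores and confusion set below are invented for illustration (echoing the "wins me" → "beats me" example).

```python
# Hypothetical link strengths (t-scores) and a toy confusion set
T = {("win", "obj", "me"): -3.2, ("beat", "obj", "me"): 5.1}
CONFUSABLES = {"win": ["beat", "defeat"]}

def suggest(head, rel, dep, threshold=0.0):
    """If the link is weak, return the confusable that most strengthens it."""
    score = T.get((head, rel, dep), 0.0)
    if score >= threshold:
        return head                      # link strong enough: no correction
    best, best_score = head, score
    for alt in CONFUSABLES.get(head, []):
        s = T.get((alt, rel, dep), 0.0)
        if s > best_score:
            best, best_score = alt, s
    return best

print(suggest("win", "obj", "me"))  # "beat": "wins me" -> "beats me"
```

A full implementation would also verify, as the slide requires, that the replacement does not significantly weaken the word's other links.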
42. For instance
(Figure: t-scores contrasting strong links, e.g. associate with, beat me, tall building, his property, with weak ones: high building, win me, associate to)
43. Extend to 3-grams
- by accident GOOD
- by car accident BAD
- a knowledge BAD
- a knowledge of GOOD
44. Data
- Development data: 121 Common Errors of English made by Japanese
- Training data: Brown/BNC (only for parser)
- Test data: extract from UCLES/CUP learner corpus (3m words of exam scripts marked up with errors)
45. Confusables
- Annotations from learner corpus
- Co-translations from Shogakukan's Progressive J-E dictionary
- Synonym sets from OUP Concise Thesaurus
46. Good Results
- We settled down to our new house → settled in house
- The gases from cars are ruining our atmosphere → emissions from cars
- Such experiments caused a bad effect → had effect
- We had a promise that we would visit the country → made promise
- I couldn't study from the evening lecture → learn from lecture
- It gives us the utmost pleasure → greatest pleasure
47. Bad Results I: people say unlikely things
- Do you remember the view of sunrise in the desert? → know view
- I listened to every speech → every word
- Dudley's trousers slid down his fat bottom → the bottom
- SOLUTION: more data, longer n-grams
48. Bad Results II: parser goes wrong
- My most disappointing experience → great experience
- Next, the polluted air from the car does people harm → close air
- SOLUTION: improve parser
49. Bad Results III: the input text is just too ill-formed
- I saw them who have got horrible injured cause of car accident → be cause
- SOLUTION: learner tagger
50. Bad Results IV: missed errors due to lack of evidence
- I will marry with my boyfriend next year
- marry with must be followed by one of a small set of items: child, son, daughter
- I recommend you go interesting places
- you can go places, but places can't be modified
- SOLUTION: more data
51. Evaluation
52. Summary of results

              PREP  VERB  NOUN  ADJ   Target  Cf. MS Word 2000
  Precision   82%   67%   71%   81%   90%     95%
  Recall      33%   23%   26%   26%   25-50%  5%
53. Conclusions and Directions
- finds and corrects types of error poorly treated in other approaches
- computing collocational strength is necessary but not sufficient for high precision, high recall error correction
- needs to be integrated with other techniques
- learn optimal combination of evidence, eg by using collocational strengths as (some of) the features in a ML/WSD system
- deploy existing technology in other ways
54. New directions
- Essay grading
- not only errors, but the whole distribution of a learner's lexis in the frequency × strength space
- on the UCLES data, PASS and FAIL students are most clearly distinguished by their use of medium-frequency word combinations
- PASS students use strong collocations
- FAIL students use free combinations
55.
(Rows: strength (Q) bands; columns: frequency bands per 80m words, 0: 0-1, 1: 2-7, 2: 8-63, 3: 64-1023, 4: 1024+)

  -1 to -0.7    speak client | explain you | get position | write you
  -0.7 to -0.3  exaggerate news | need coin | know opinion | have possibility
  -0.3 to 0.3   recommend academy | have minibus | find teacher | have job | take it
  0.3 to 0.7    see musical | consider fact | meet people | have opportunity
  0.7 to 1.0    insert coin | book seat | make suggestion | have chance
56. Subtract FAIL values from PASS values
(Figures: FCE PASS - FCE FAIL, CAE PASS - CAE FAIL, CPE PASS - CPE FAIL)
57. http://www.sle.sharp.co.uk/JustTheWord
An Explorable Model of English Collocation for Writers, Learners, Teachers and Testers