Title: Error Analysis for Learning-based Coreference Resolution
1Error Analysis for Learning-based Coreference Resolution
2Outline
- CR state-of-the-art and our system
- Distribution of errors
- Discussion: possible remedies
3Coreference Resolution
- "This deal means that Bernard Schwartz can focus most of his time on Globalstar and that is a key plus for Globalstar because Bernard Schwartz is brilliant," said Robert Kaimovitz, a satellite communications analyst at Unterberg Harris in New York.
- ..
- Globalstar still needs to raise 600 million, and Schwartz said that the company would try..
6Machine Learning Approaches
- Soon et al. (2001)
- Cardie & Wagstaff (1999)
- Strube et al. (2002)
- Ng & Cardie (2001-2004)
- ACE competition
7Features - Soon et al. (2001)
- Anaphor is a pronoun
- Anaphor is a definite NP
- Anaphor is an NP with a demonstrative pronoun (this, ..)
- Antecedent is a pronoun
- Both markables are proper names
- Number agreement
- Gender agreement
- Alias
- Appositive
- Same surface form
- Semantic class agreement
- Distance in sentences
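A minimal sketch of how some of these features could be computed for one (antecedent, anaphor) pair. It assumes a hypothetical `Markable` record whose fields (`sentence`, `number`, `gender`, ...) are filled in by upstream preprocessing, and it simplifies Soon et al.'s definitions (for example, "same surface form" is an exact lower-case match here, without stripping articles or demonstratives). Treat it as an illustration, not their implementation.

```python
from dataclasses import dataclass

# Hypothetical markable record; field names are illustrative only.
@dataclass
class Markable:
    text: str
    sentence: int        # index of the sentence containing the markable
    is_pronoun: bool
    is_proper_name: bool
    number: str          # "sg" or "pl"
    gender: str          # "masc", "fem", "neut" or "unknown"

DEFINITE = ("the ",)
DEMONSTRATIVES = ("this ", "that ", "these ", "those ")

def pair_features(antecedent: Markable, anaphor: Markable) -> dict:
    """A simplified subset of Soon et al.-style features for one markable pair."""
    ana = anaphor.text.lower()
    return {
        "ana_pronoun": anaphor.is_pronoun,
        "ana_definite": ana.startswith(DEFINITE),
        "ana_demonstrative": ana.startswith(DEMONSTRATIVES),
        "ante_pronoun": antecedent.is_pronoun,
        "both_proper_names": antecedent.is_proper_name and anaphor.is_proper_name,
        "number_agree": antecedent.number == anaphor.number,
        # Unknown gender is treated as compatible with anything (a simplification).
        "gender_agree": "unknown" in (antecedent.gender, anaphor.gender)
                        or antecedent.gender == anaphor.gender,
        "same_surface": antecedent.text.lower() == ana,
        "sentence_distance": anaphor.sentence - antecedent.sentence,
    }

ante = Markable("Globalstar", 1, False, True, "sg", "neut")
ana = Markable("the company", 1, False, False, "sg", "neut")
print(pair_features(ante, ana))
```

A learner such as C4.5 is then trained on one such vector per candidate pair.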
8Features - other approaches
- Cardie & Wagstaff: 11 features
- Strube et al.: 17 features (the same standard features + approximate matching via minimum edit distance (MED))
- Ng & Cardie: 53 features (no improvement with the extended feature set; better results (F = 63.4) with manual feature selection)
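Minimum edit distance is the standard Levenshtein distance between two strings. A minimal sketch in Python; the normalization in `med_similarity` is a hypothetical choice for illustration and not necessarily the one Strube et al. used.

```python
def min_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def med_similarity(a: str, b: str) -> float:
    """Hypothetical normalization into [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - min_edit_distance(a, b) / max(len(a), len(b))

print(min_edit_distance("Ziff-Davis Publishing", "Ziff"))
print(med_similarity("Loral Space", "Loral"))
```

Unlike exact string match, this yields a graded value, so partially overlapping names still receive a similarity score.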
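9Performance - Soon et al.
| System | Learner | R | P | F |
|---|---|---|---|---|
| Soon et al.'s system | C5.0, optimized | 56.1 | 65.5 | 60.4 |
| Our reimplementation | C4.5, not optimized | 53.5 | 72.8 | 61.7 |
| Our reimplementation | Ripper | 44.6 | 74.8 | 55.9 |
| Our reimplementation | SVM | 50.9 | 68.8 | 58.5 |
| Our reimplementation | MaxEnt | 49.2 | 64.1 | 55.7 |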
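The last number in each row is consistent with the balanced F-measure (harmonic mean) of the first two, assuming the three numbers are recall, precision and F. A quick check in Python:

```python
def f_measure(recall: float, precision: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of recall and precision (F1 when beta = 1)."""
    if recall == 0 and precision == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

rows = {
    "C5.0, optimized (Soon et al.)": (56.1, 65.5),
    "C4.5, not optimized": (53.5, 72.8),
    "Ripper": (44.6, 74.8),
    "SVM": (50.9, 68.8),
    "MaxEnt": (49.2, 64.1),
}
for name, (r, p) in rows.items():
    print(f"{name:30s} F = {f_measure(r, p):.1f}")
```

Because F is a harmonic mean, the high-precision, low-recall Ripper run ends up with one of the lowest F values.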
10Performance - Soon et al.
11Tricky and easy anaphors
- Cristea et al. (2002): state-of-the-art coreference resolution systems have essentially the same performance level
  - Pronominal anaphora: 80
  - Full-scale coreference: 60
- Hypothesis: tricky vs. easy anaphors
12Our system
- Goal: bridge the gap between theory and practice
  - sophisticated linguistic knowledge + data-driven coreference resolution algorithm
13New Features
- Different aspects of CR
- Surface similarity (122 features)
- Syntax (64)
- Semantic Compatibility (29)
- Salience (136)
- (Anaphoricity)
- More or less sophisticated linguistic theories
exist for all these phenomena
14Evaluation
- Methodology
- Standard dataset (MUC-7)
- Standard learning set-up
- Compare to Soon et al. (2001)
15Performance (F)
| Learner | Basic feature set | Extended feature set |
|---|---|---|
| Soon et al., C5.0 | 60.4 | N/A |
| C4.5 | 61.7 | 64.6 |
| SVM | 58.5 | 65.4 |
| Ripper | 55.9 | 57.5 |
| MaxEnt | 55.7 | 59.4 |
16Performance
17Error analysis
- Different approaches, same performance
- Same errors?
- Tricky anaphors? (Cristea et al., 2002)
- Extensive error analysis needed!
18Outline
- CR state-of-the-art and our system
- Distribution of errors
- Discussion: possible remedies
19Recall errors
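| Error source | Errors | % |
|---|---|---|
| MUC | 17 | 3.6 |
| Markables | 166 | 35.4 |
| Propagated P | 31 | 6.6 |
| Pronouns | 77 | 16.4 |
| NE-matching | 31 | 6.6 |
| Syntax | 39 | 8.3 |
| Nominal anaphora | 104 | 22.2 |
| total | 469 | 100 |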
20Recall errors - markables
- Auxiliary document parts
- Tokenization
- Modifiers
- Bracketing/labeling
21Recall errors - markables
- .. there was no requirement for tether to be manufactured in a contaminant-free enviroment.
- A mesmerizing set.
22Recall errors - pronouns
- 1st person plural: reconstructing the group
  - The retiring Republican chairman of the House Committee on Science want U.S. Businesses to <..> "We need to make it easier for the private sector..," Walker said
- 3rd sg, 3rd pl: (non-)salience
  - The explanation for the History Channel's success begin with its association with another channel owned by the same parent consortium.
23Recall errors - nominal
- Mostly common noun phrases with different heads; WordNet does not help much
  - .. a report on the satellite's findings <..> the abilities of U.S. Reconnaissance technology <..> the use of advanced intelligence-gathering tools <..> Remote-sensing instruments..
24Precision errors
| Error source | Errors | % |
|---|---|---|
| MUC | 30 | 7.4 |
| Markables | 76 | 18.6 |
| Pronouns | 78 | 19.1 |
| NE-matching | 20 | 4.9 |
| Syntax | 22 | 5.4 |
| Nominal anaphora | 182 | 44.6 |
| total | 408 | 100 |
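The percentages in the error tables are each class's share of the total number of errors. A small Python sketch that reproduces the precision-error column from the raw counts (the dictionary simply copies the counts from the table):

```python
# Precision-error counts per error source, copied from the table.
precision_errors = {
    "MUC": 30,
    "Markables": 76,
    "Pronouns": 78,
    "NE-matching": 20,
    "Syntax": 22,
    "Nominal anaphora": 182,
}

def shares(counts: dict) -> dict:
    """Percentage share of each error class, rounded to one decimal place."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}

dist = shares(precision_errors)
for cls, pct in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(f"{cls:18s} {pct:5.1f}%")
```

Sorting by share makes the dominance of nominal anaphora among precision errors immediately visible.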
25Precision errors- pronouns
- Incorrect parsing/tagging
  - Two key vice presidents, Wei Yen and Eric Carlson, are leaving to start their own Silicon Valley companies.
- (non-)salience
- matching (propagated R)
26Precision errors - nominal
- Mostly same-head descriptions. Possible solutions:
  - modifiers?
  - anaphoricity detectors?
27P errors nominal - modifiers
- Idea: "red car" cannot corefer with "blue car"
- Problem: list of mutually incompatible properties?
- MUC-7 test data:
  - incompatible modifiers: 30
  - new modifier for the anaphor: 15
  - compatible modifiers: 58
  - no modifiers: 62
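A minimal sketch of the modifier idea, assuming a hypothetical hand-built list of mutually exclusive property groups (`INCOMPATIBLE_GROUPS`); obtaining such a list at scale is exactly the open problem raised on this slide.

```python
# Hypothetical, hand-built groups of mutually exclusive modifiers.
INCOMPATIBLE_GROUPS = [
    {"red", "blue", "green", "black", "white"},  # colors
    {"first", "second", "third", "last"},        # ordinals
]

def modifiers_compatible(ante_mods: set, ana_mods: set) -> bool:
    """Return False if both NPs take values from the same exclusive group
    and share no value in it (e.g. "red" vs. "blue")."""
    for group in INCOMPATIBLE_GROUPS:
        a = ante_mods & group
        b = ana_mods & group
        if a and b and a.isdisjoint(b):
            return False
    return True

print(modifiers_compatible({"red"}, {"blue"}))  # red car vs. blue car
print(modifiers_compatible({"red"}, set()))     # red car vs. the car
```

Used as a hard filter, such a check can only remove candidate pairs, so it targets precision errors; its coverage depends entirely on the property list.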
28P errors nominal - dnew (discourse-new)
- Idea: identify and discard unlikely anaphors
- Problem: even a very good detector does not help
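Where such a detector would plug into the pipeline: a sketch assuming a hypothetical scoring function `p_discourse_new` (probability that a markable introduces a new entity) and an arbitrary threshold. Given the finding above, this only illustrates the mechanics of the filter, not a remedy.

```python
from typing import Callable, List

def filter_anaphors(markables: List[str],
                    p_discourse_new: Callable[[str], float],
                    threshold: float = 0.9) -> List[str]:
    """Drop markables the detector confidently labels discourse-new;
    only the remaining ones are passed on as candidate anaphors."""
    return [m for m in markables if p_discourse_new(m) < threshold]

def toy_detector(markable: str) -> float:
    # Hypothetical stand-in: indefinite NPs are treated as very likely discourse-new.
    return 0.95 if markable.lower().startswith(("a ", "an ")) else 0.2

print(filter_anaphors(["a satellite", "the company", "Schwartz"], toy_detector))
# ['the company', 'Schwartz']
```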
29Outline
- CR state-of-the-art and our system
- Distribution of errors
- Discussion: possible remedies
30Discussion - Errors
- Problematic areas
- Data
- Preprocessing modules
- Features
- Resolution strategy
31Discussion - Data
- bigger corpus
- more uniform doc selection, text only
- better definition of COREF
- better scoring
32Discussion - Preprocessing
- local improvements (e.g. appositions)
- probabilistic architecture to neutralize errors
33Discussion - Features
- feature selection
- ensemble learning
- more targeted learning for under-represented
phenomena (abbreviations)
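One simple form of ensemble learning is majority voting over the pairwise decisions of several classifiers, for instance the C4.5, SVM, Ripper and MaxEnt runs above. A sketch, assuming each learner's output is available as a hypothetical dict from (antecedent id, anaphor id) pairs to a boolean decision; ties are resolved as non-coreferent, a choice that favors precision.

```python
from collections import Counter
from typing import Dict, Tuple

Pair = Tuple[str, str]  # hypothetical (antecedent id, anaphor id)

def majority_vote(decisions: Dict[str, Dict[Pair, bool]]) -> Dict[Pair, bool]:
    """Combine per-learner pairwise decisions by simple majority."""
    pairs = set().union(*(d.keys() for d in decisions.values()))
    combined: Dict[Pair, bool] = {}
    for pair in pairs:
        # A learner that did not classify a pair counts as a "no" vote.
        votes = Counter(d.get(pair, False) for d in decisions.values())
        combined[pair] = votes[True] > votes[False]  # a tie yields False
    return combined

decisions = {
    "C4.5":   {("m1", "m2"): True,  ("m3", "m4"): True},
    "SVM":    {("m1", "m2"): True,  ("m3", "m4"): True},
    "Ripper": {("m1", "m2"): False, ("m3", "m4"): True},
    "MaxEnt": {("m1", "m2"): False, ("m3", "m4"): False},
}
print(majority_vote(decisions))
```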
34Discussion - Resolution
- less local: move to the chain level
- less uniform: specific treatment for different types of anaphors
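A sketch of what moving to the chain level could mean, assuming hypothetical `coref` (pairwise decision) and `compatible` (hard constraint) functions: a markable joins the most recently extended chain that contains a positive match and whose every member is compatible with it, instead of linking only to the single closest antecedent. The toy head-match and color functions stand in for a trained classifier and a real compatibility check.

```python
from typing import Callable, List

def resolve_to_chains(markables: List[str],
                      coref: Callable[[str, str], bool],
                      compatible: Callable[[str, str], bool]) -> List[List[str]]:
    """Greedy left-to-right clustering that tests a markable against whole chains."""
    chains: List[List[str]] = []
    for m in markables:
        target = None
        # Most recently extended chains are kept last, so scan from the end.
        for idx in range(len(chains) - 1, -1, -1):
            chain = chains[idx]
            if any(coref(a, m) for a in chain) and all(compatible(a, m) for a in chain):
                target = idx
                break
        if target is None:
            chains.append([m])             # start a new chain
        else:
            chain = chains.pop(target)
            chain.append(m)
            chains.append(chain)           # move the extended chain to the end
    return chains

COLORS = {"red", "blue", "green"}

def toy_coref(a: str, b: str) -> bool:
    return a.split()[-1] == b.split()[-1]  # same head noun (last word)

def toy_compatible(a: str, b: str) -> bool:
    ca, cb = COLORS & set(a.split()), COLORS & set(b.split())
    return not (ca and cb and ca.isdisjoint(cb))

print(resolve_to_chains(["red car", "blue car", "the car", "the red car"],
                        toy_coref, toy_compatible))
```

With a purely pairwise closest-first strategy, "the red car" would be linked to "the car" and thereby merged into the "blue car" chain; the chain-level check keeps it with "red car".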
35Discussion - Conclusion
- ML approaches to coreference resolution yield similar performance values
- Some anaphors are indeed tricky (esp. crucial for precision errors)
- But some errors can be eliminated within an ML framework:
  - improving the training material
  - more elaborate integration of preprocessing modules
  - more global resolution strategies
36
37Recall errors
38Recall errors - MUC
- Mainly incorrect bracketing
  - ..said <COREF .. MIN="vice president">Jim Johannesen, <COREF .. MIN="vice president">vice president of site development for McDonald's</COREF></COREF>..
- Only clear typos etc. were considered MUC errors
39Recall errors - propagated P
- The company also said the Marine Corps has begun testing two of its radars as part of a short-range ballistic missile defense program. That testing could lead to an order for the radars.
- Crucial for pronouns and indicators for intrasentential coreference
40Recall errors - matching
- Mostly ORGANIZATIONs.
- Problems
  - Abbreviations
    - Federal Communication Commission
    - FCC
  - Hyphenated names
    - Ziff-Davis Publishing
    - Ziff
  - Foreign names
    - Taiwan President Lee Teng-hui
    - President Lee
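A sketch of two alias heuristics for the cases listed here: initials-based acronyms and the first part of a hyphenated name. The function names and rules are hypothetical simplifications; the foreign-name case is deliberately left unhandled, to show why it remains a recall problem.

```python
def acronym(name: str) -> str:
    """Initials of the capitalized words, e.g. 'Federal Communication Commission' -> 'FCC'."""
    return "".join(w[0] for w in name.split() if w[:1].isupper())

def alias_match(full: str, short: str) -> bool:
    """Hypothetical alias test covering abbreviations and hyphenated names only."""
    if short == acronym(full):
        return True                                  # abbreviation
    first_token = full.split()[0]
    if "-" in first_token and short == first_token.split("-")[0]:
        return True                                  # e.g. Ziff-Davis Publishing -> Ziff
    return False

print(alias_match("Federal Communication Commission", "FCC"))
print(alias_match("Ziff-Davis Publishing", "Ziff"))
print(alias_match("Taiwan President Lee Teng-hui", "President Lee"))  # not covered
```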
41Recall errors - syntax
- Apposition, copula
- Problems
  - Parsing mistakes
  - Missing constructions
    - ..the venture will become synonymous with JSkyB
  - P/R trade-off
    - ..Kevlar, a synthetic fiber, and Nomex..
  - Quantitative constructions
    - .. More than quadruple the three-month daily average of 88,700 shares
42Precision errors
43Precision errors - matching
- Finer NE analysis could help, but mostly too difficult even for humans
  - Loral
  - Loral Space and Communications Corp
  - Loral Space
  - Space Systems Loral
44Anaphoricity
- Some markables are not anaphors. We can tell that by looking at them, without any sophisticated coreference resolution.
- Poesio & Vieira and Ng & Cardie try to identify discourse-new entities automatically
- Not used for this talk