Error Analysis for Learning-based Coreference Resolution

Learn more at: http://www.lrec-conf.org

1
Error Analysis for Learning-based Coreference
Resolution
  • Olga Uryupina
  • 27.05.08

2
Outline
  • CR state-of-the-art and our system
  • Distribution of errors
  • Discussion: possible remedies

3
Coreference Resolution
  • This deal means that Bernard Schwartz can focus
    most of his time on Globalstar and that is a key
    plus for Globalstar because Bernard Schwartz is
    brilliant, said Robert Kaimovitz, a satellite
    communications analyst at Unterberg Harris in New
    York.
  • ..
  • Globalstar still needs to raise $600 million,
    and Schwartz said that the company would try..

6
Machine Learning Approaches
  • Soon et al. (2001)
  • Cardie & Wagstaff (1999)
  • Strube et al. (2002)
  • Ng & Cardie (2001-2004)
  • ACE competition

7
Features: Soon et al. (2001)
  1. Anaphor is a pronoun
  2. Anaphor is a definite NP
  3. Anaphor is an NP with a demonstrative pronoun
    (this,..)
  4. Antecedent is a pronoun
  5. Both markables are proper names
  6. Number agreement
  7. Gender agreement
  8. Alias
  9. Appositive
  10. Same surface form
  11. Semantic class agreement
  12. Distance in sentences
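
A few of the pairwise features listed above can be sketched in code. This is a minimal illustration, not the original implementation: the `Markable` class, the word lists, and the sentence indices are assumptions made for the example, and the surface-form test is a plain case-insensitive comparison.

```python
from dataclasses import dataclass

# Small illustrative word lists (assumptions, far from complete).
PRONOUNS = {"he", "she", "it", "they", "his", "her", "its", "their", "him", "them"}
DEMONSTRATIVES = {"this", "that", "these", "those"}


@dataclass
class Markable:
    """Hypothetical container for one noun phrase."""
    text: str
    sentence: int  # index of the sentence containing the markable


def first_token(m: Markable) -> str:
    return m.text.lower().split()[0]


def pair_features(antecedent: Markable, anaphor: Markable) -> dict:
    """Compute a handful of surface-level features for one markable pair."""
    return {
        "anaphor_is_pronoun": anaphor.text.lower() in PRONOUNS,
        "anaphor_is_definite": first_token(anaphor) == "the",
        "anaphor_is_demonstrative": first_token(anaphor) in DEMONSTRATIVES,
        "antecedent_is_pronoun": antecedent.text.lower() in PRONOUNS,
        "same_surface_form": antecedent.text.lower() == anaphor.text.lower(),
        "sentence_distance": anaphor.sentence - antecedent.sentence,
    }


# Sentence indices below are made up for the example.
globalstar = Markable("Globalstar", sentence=0)
company = Markable("the company", sentence=2)
features = pair_features(globalstar, company)
```

Agreement, alias, appositive and semantic-class features need preprocessing output (parser, NE tagger, WordNet) and are left out of the sketch.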

8
Features: other approaches
  • Cardie & Wagstaff: 11 features
  • Strube et al.: 17 features (the same standard
    features + approximate matching (MED))
  • Ng & Cardie: 53 features (no improvement on the
    extended feature set, better results (F=63.4)
    with manual feature selection)

9
Performance: Soon et al.
  • Soon et al.'s system (first row)
  • Our reimplementation (remaining rows)

                       Recall  Precision  F
C5.0, optimized        56.1    65.5       60.4
C4.5, not optimized    53.5    72.8       61.7
Ripper                 44.6    74.8       55.9
SVM                    50.9    68.8       58.5
MaxEnt                 49.2    64.1       55.7
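
The third figure in each row is consistent with the balanced F-measure of the first two. The check below is a sketch that assumes the first two columns are recall and precision; since the formula is symmetric, their order does not affect the result.

```python
def f_measure(recall: float, precision: float) -> float:
    """Balanced F-measure (harmonic mean of recall and precision)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall, precision, reported F) as given in the table above.
rows = {
    "C5.0, optimized": (56.1, 65.5, 60.4),
    "C4.5, not optimized": (53.5, 72.8, 61.7),
    "Ripper": (44.6, 74.8, 55.9),
    "SVM": (50.9, 68.8, 58.5),
    "MaxEnt": (49.2, 64.1, 55.7),
}

recomputed = {name: round(f_measure(r, p), 1) for name, (r, p, _) in rows.items()}
for name, (_, _, reported) in rows.items():
    print(f"{name:22s} reported={reported}  recomputed={recomputed[name]}")
```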
10
Performance: Soon et al.
  • Learning Curve for C5.0

11
Tricky and easy anaphors
  • Cristea et al. (2002): state-of-the-art
    coreference resolution systems have essentially
    the same performance level
  • Pronominal anaphora: 80%
  • Full-scale coreference: 60%
  • Hypothesis: tricky vs. easy anaphors

12
Our system
  • Goal:
  • Bridge the gap between the theory and the
    practice
  • sophisticated linguistic knowledge + data-driven
    coreference resolution algorithm

13
New Features
  • Different aspects of CR
  • Surface similarity (122 features)
  • Syntax (64)
  • Semantic Compatibility (29)
  • Salience (136)
  • (Anaphoricity)
  • More or less sophisticated linguistic theories
    exist for all these phenomena

14
Evaluation
  • Methodology
  • Standard dataset (MUC-7)
  • Standard learning set-up
  • Compare to Soon et al. (2001)

15
Performance (F)

                    Basic feature set  Extended f. set
Soon et al., C5.0   60.4               N/A
C4.5                61.7               64.6
SVM                 58.5               65.4
Ripper              55.9               57.5
MaxEnt              55.7               59.4

16
Performance
  • Learning Curve, SVM

17
Error analysis
  • Different approaches, same performance
  • Same errors?
  • Tricky anaphors? (Cristea et al., 2002)
  • Extensive error analysis needed!

18
Outline
  • CR state-of-the-art and our system
  • Distribution of errors
  • Discussion: possible remedies

19
Recall errors

                    Errors  %
MUC                 17      3.6
Markables           166     35.4
Propagated P        31      6.6
Pronouns            77      16.4
NE-matching         31      6.6
Syntax              39      8.3
Nominal anaphora    104     22.2
total               469     100

20
Recall errors - markables
  • Auxiliary doc parts
  • Tokenization
  • Modifiers
  • Bracketing/labeling

21
Recall errors - markables
  • .. there was no requirement for tether to be
    manufactured in a contaminant-free environment.
  • A mesmerizing set.

22
Recall errors - pronouns
  • 1st pl: reconstructing the group
  • The retiring Republican chairman of the House
    Committee on Science wants U.S. businesses to <..>
    We need to make it easier for the private
    sector.. Walker said
  • 3rd sg, 3rd pl: (non-)salience
  • The explanation for the History Channel's
    success begins with its association with another
    channel owned by the same parent consortium.

23
Recall errors - nominal
  • Mostly common noun phrases with different heads;
    WordNet does not help much
  • .. a report on the satellite's findings <..> the
    abilities of U.S. reconnaissance technology <..>
    the use of advanced intelligence-gathering tools
    <..> remote-sensing instruments..

24
Precision errors

                    Errors  %
MUC                 30      7.4
Markables           76      18.6
Pronouns            78      19.1
NE-matching         20      4.9
Syntax              22      5.4
Nominal anaphora    182     44.6
total               408     100
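
The percentage column can be recomputed from the raw counts. The snippet is a small sketch using only the figures in the precision-error table above; the variable names are illustrative.

```python
# Precision-error counts per category, as listed in the table above.
precision_errors = {
    "MUC": 30,
    "Markables": 76,
    "Pronouns": 78,
    "NE-matching": 20,
    "Syntax": 22,
    "Nominal anaphora": 182,
}

total = sum(precision_errors.values())
shares = {name: round(100 * count / total, 1)
          for name, count in precision_errors.items()}

# Largest category first.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {precision_errors[name]:4d}  {share:5.1f}%")
```

Sorting by share makes the dominance of nominal anaphora among precision errors easy to see.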

25
Precision errors - pronouns
  • incorrect parsing/tagging
  • Two key vice presidents, Wei Yen and Eric
    Carlson, are leaving to start their own Silicon
    Valley companies.
  • (non-)salience
  • matching (propagated R)

26
Precision errors - nominal
  • Mostly same-head descriptions. Possible
    solutions:
  • modifiers?
  • anaphoricity detectors?

27
P errors nominal - modifiers
  • Idea: "red car" cannot corefer with "blue car"
  • Problem: list of mutually incompatible
    properties?
  • MUC-7 test data:
  • incompatible modifiers: 30
  • 'new' mod for anaphora: 15
  • compatible modifiers: 58
  • no modifiers: 62
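
The modifier idea can be sketched as a filter over same-head noun phrases. Everything here is an assumption for illustration: the `INCOMPATIBLE` list is a toy stand-in for the "list of mutually incompatible properties" the slide asks about, and the head is naively taken to be the last token.

```python
DETERMINERS = {"a", "an", "the", "this", "that", "these", "those"}

# Toy list of mutually incompatible modifier pairs (hypothetical).
INCOMPATIBLE = {
    frozenset({"red", "blue"}),
    frozenset({"former", "current"}),
}


def split_np(np: str):
    """Return (head, modifiers), naively treating the last token as the head."""
    tokens = [t for t in np.lower().split() if t not in DETERMINERS]
    if not tokens:
        return "", set()
    return tokens[-1], set(tokens[:-1])


def modifiers_incompatible(np1: str, np2: str) -> bool:
    """True if two same-head NPs carry a known incompatible modifier pair."""
    head1, mods1 = split_np(np1)
    head2, mods2 = split_np(np2)
    if head1 != head2:
        return False  # the filter only targets same-head descriptions
    return any(frozenset({a, b}) in INCOMPATIBLE for a in mods1 for b in mods2)


blocked = modifiers_incompatible("the red car", "a blue car")
```

The filter stays silent whenever either phrase has no modifiers or the pair is not in the list, which covers the "compatible modifiers" and "no modifiers" rows above.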

28
P errors nominal - dnew
  • Idea: identify and discard unlikely anaphors
  • Problem: even a very good detector does not help

29
Outline
  • CR state-of-the-art and our system
  • Distribution of errors
  • Discussion: possible remedies

30
Discussion - Errors
  • Problematic areas
  • Data
  • Preprocessing modules
  • Features
  • Resolution strategy

31
Discussion - Data
  • bigger corpus
  • more uniform doc selection, text only
  • better definition of COREF
  • better scoring

32
Discussion - Preprocessing
  • local improvements (e.g. appositions)
  • probabilistic architecture to neutralize errors

33
Discussion - Features
  • feature selection
  • ensemble learning
  • more targeted learning for under-represented
    phenomena (abbreviations)

34
Discussion - Resolution
  • less local: move to the chains level
  • less uniform: specific treatment for different
    types of anaphors

35
Discussion - Conclusion
  • ML approaches to coreference resolution yield
    similar performance values
  • Some anaphors are indeed tricky (esp. crucial for
    precision errors)
  • But some errors can be eliminated within an ML
    framework:
  • improving the training material
  • elaborated integration of preprocessing modules
  • more global resolution strategies

36
  • Thank You!

37
Recall errors

                    Errors  %
MUC                 17      3.6
Markables           166     35.4
Propagated P        31      6.6
Pronouns            77      16.4
NE-matching         31      6.6
Syntax              39      8.3
Nominal anaphora    104     22.2
total               469     100

38
Recall errors - MUC
  • Mainly incorrect bracketing
  • ..said <COREF .. MIN="vice president">Jim
    Johannesen, <COREF .. MIN="vice president">vice
    president of site development for
    McDonald's</COREF></COREF>..
  • Only clear typos etc. considered MUC-errors

39
Recall errors - propagated P
  • The company also said the Marine Corps has begun
    testing two of its radars as part of a
    short-range ballistic missile defense program.
    That testing could lead to an order for the
    radars.
  • Crucial for pronouns and indicators for
    intrasentential coreference

40
Recall errors - matching
  • Mostly ORGANIZATIONs.
  • Problems:
  • Abbreviations: Federal Communication Commission /
    FCC
  • Hyphenated names: Ziff-Davis Publishing / Ziff
  • Foreign names: Taiwan President Lee Teng-hui /
    President Lee
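
The abbreviation case can be approached with a simple initials check, sketched below. The function names and the stop-word list are assumptions for illustration, and the talk does not describe a specific matching rule.

```python
# Function words skipped when forming an acronym (illustrative list).
STOP_WORDS = {"of", "the", "and", "for", "in"}


def acronym(name: str) -> str:
    """Build an acronym from the initials of the content words in a name."""
    return "".join(
        word[0].upper()
        for word in name.split()
        if word[0].isalpha() and word.lower() not in STOP_WORDS
    )


def is_abbreviation(short: str, full: str) -> bool:
    """True if `short` equals the initials of `full` (dots ignored)."""
    return short.replace(".", "").upper() == acronym(full)


fcc_match = is_abbreviation("FCC", "Federal Communication Commission")
ziff_match = is_abbreviation("Ziff", "Ziff-Davis Publishing")
```

The initials check handles "FCC" but not "Ziff" or "President Lee", so hyphenated and foreign names would need separate rules.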

41
Recall errors - syntax
  • Apposition, copula
  • Problems:
  • Parsing mistakes
  • Missing constructions: ..the venture will become
    synonymous with JSkyB
  • P/R trade-off: ..Kevlar, a synthetic fiber, and
    Nomex..
  • Quantitative constructions: .. more than
    quadruple the three-month daily average of
    88,700 shares

42
Precision errors

                    Errors  %
MUC                 30      7.4
Markables           76      18.6
Pronouns            78      19.1
NE-matching         20      4.9
Syntax              22      5.4
Nominal anaphora    182     44.6
total               408     100

43
Precision errors - matching
  • Finer NE analysis could help, but mostly too
    difficult even for humans
  • Loral
  • Loral Space and Communications Corp
  • Loral Space
  • Space Systems Loral

44
Anaphoricity
  • Some markables are not anaphors. We can tell that
    by looking at them, without any sophisticated
    coreference resolution.
  • Poesio & Vieira, Ng & Cardie try to identify
    discourse-new entities automatically
  • Not used for this talk
