Designing Test-Beds for General Anaphora Resolution - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Designing Test-Beds for General Anaphora Resolution


1
Designing Test-Beds for General Anaphora
Resolution
  • Oana Postolache
  • oana_at_coli.uni-sb.de
  • University of Saarland, Saarbrücken, Germany
  • Al. I. Cuza University of Iasi, Romania

  • Dan Cristea
  • dcristea_at_infoiasi.ro
  • Al. I. Cuza University of Iasi, Romania
  • ICS - Romanian Academy, the Iasi Branch, Romania
2
Motivation
  • AR is a key component of many other NLP
    processes (e.g. summarisation, IE, Q/A)
  • In this larger setting it is often important
    to measure the degree to which a component
    degrades the overall performance of the system
  • E.g. the detection of markables alone, the AR
    component alone, etc.

3
Aims
  • Propose a methodology for detecting
    bottlenecks in a pipeline NLP system
  • Experiment with an architecture made of a
    markable detection module and an AR module
  • Propose a methodology for evaluating the
    behaviour of such a system when markables are not
    given
  • Report recent results of a markable detection
    module and an AR module on two types
    of input

4
Evaluation of a minimum AR system
[Diagram: pipeline of an RE-extractor followed by an AR-engine]
5
Evaluation of a minimum AR system
[Diagram: the same pipeline, with three evaluation points: the RE-extractor alone, the AR-engine alone, and the whole system globally]
6
Our corpora
  • A plain text corpus of approx. 19,500 words in
    1,966 sentences, extracted from Orwell's
    novel 1984 (Orwell, 1949)
  • A corpus manually annotated for syntactic
    structure, containing approx. 6,250 words in 281
    sentences, extracted from the English Penn
    Treebank (Marcus et al., 1994)

7
Markables
  • Generally, conformant with MUC-7 and ACE criteria
  • Differences
  • do not include relative clauses
  • each term of an apposition is taken separately
    (Big Brother, the primal traitor)
  • conjoined expressions are annotated individually
    (John and Mary, hills and mountains)
  • modifying nouns appearing in noun-noun
    modification are not marked separately (glass
    doors, prison food, the junk bond market)

8
Markables
  • What do we mark?
  • noun phrases
  • definite (the principle, the flying object)
  • indefinite (a book, a future star)
  • undetermined (sole guardian of truth)
  • names (Winston Smith, The Ministry of Love)
  • dates (April)
  • currency expressions ($40)
  • percentages (48%)
  • pronouns
  • personal (I, you, he, him, she, her, it, they,
    them)
  • possessive (his, her, hers, its, their, theirs)
  • reflexive (himself, herself, itself, themselves)
  • demonstrative (this, that, these, those)
  • wh-pronouns when they replace an entity (which,
    who, whom, whose, that)
  • numerals
  • when they refer to entities (four of them, the
    first, the second)

9
The Orwell corpus
  • Chapters 1, 2, 3 and 5 from George Orwell's
    Nineteen Eighty-Four
  • Automatic detection of markables (a sketch
    follows this list):
  • POS-tagging
  • FDG parser
  • markable = any construction dominated by a
    noun/pronoun
  • detection of head and lemma (given)
  • deletion of relative clauses
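Below is a minimal sketch of this markable-detection step. spaCy's dependency parse is used as a stand-in for the FDG parser (an assumption made only for illustration); the rule itself follows the slide: a markable is any construction dominated by a noun or pronoun, with relative clauses deleted and the head and lemma recorded.
```python
# Sketch of the Orwell-corpus markable-detection step (spaCy stands in
# for the FDG parser; this is an assumption, not the original tool chain).
import spacy

nlp = spacy.load("en_core_web_sm")

def inside_relative_clause(tok, head):
    """True if tok sits under a relative-clause edge below head."""
    while tok is not head:
        if tok.dep_ == "relcl":
            return True
        tok = tok.head
    return False

def extract_markables(text):
    markables = []
    for tok in nlp(text):
        if tok.pos_ not in ("NOUN", "PROPN", "PRON"):
            continue
        # a markable is the construction dominated by the noun/pronoun,
        # minus any relative clause attached below it
        span = [t for t in tok.subtree if not inside_relative_clause(t, tok)]
        markables.append({
            "text": " ".join(t.text for t in sorted(span, key=lambda t: t.i)),
            "head": tok.text,
            "lemma": tok.lemma_,
        })
    return markables

print(extract_markables("I saw a blond boy who was playing in the garden."))
```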

10
The Orwell corpus: dimensions
11
The Penn Treebank corpus
  • 7 files from WSJ
  • Extraction of markables from the PTB-style
    constituency trees (a sketch follows this list)
  • Collins rules to extract the head
  • WordNet script for the lemma
  • Dependency links between words
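Below is a minimal sketch of markable extraction from a PTB-style constituency tree. The head rule (rightmost noun-like leaf of the NP) is a simplification standing in for the full Collins head rules, and NLTK's WordNet lemmatiser plays the role of the "WordNet script for the lemma"; both are assumptions made for illustration.
```python
# Sketch of markable extraction from PTB-style constituency trees.
from nltk import Tree
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def extract_markables(ptb_tree):
    markables = []
    for np in ptb_tree.subtrees(lambda t: t.label() == "NP"):
        tagged = np.pos()                          # [(word, POS), ...]
        nouns = [w for w, pos in tagged
                 if pos.startswith("NN") or pos.startswith("PRP")]
        head = nouns[-1] if nouns else tagged[-1][0]   # simplified head rule
        markables.append({
            "text": " ".join(np.leaves()),
            "head": head,
            "lemma": lemmatizer.lemmatize(head.lower()),
        })
    return markables

tree = Tree.fromstring(
    "(S (NP (DT the) (NN junk) (NN bond) (NN market)) (VP (VBD fell)))")
print(extract_markables(tree))
```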

12
The Penn Treebank corpus
13
AR-engine: the architecture
14
Terminology
[Diagram: referential expressions (REa, REb, REc, REd, ..., REx) on the text layer, their projections (PSx) on the projection layer, and discourse entities (DE1, ..., DEj, ..., DEm) on the semantic layer]
17
What is an AR model?
[Same three-layer diagram as above]
18
Phases of the engine
projection phase
19
Phases of the engine
proposing/evoking phase
[Diagram: the proposing/evoking step shown for REa]
22
Phases of the engine
completion phase
[Diagram: completion shown for REa and its discourse entity DEa]
25
Phases of the engine
re-evaluation phase
[Diagram: re-evaluation shown for REa, REb and REc, with projections PSb and PSc and the discourse entity DEa]
31
Our model: primary attributes
  • Lexical / morphological
  • lemma
  • number
  • POS
  • headForm
  • Syntactic
  • synt-role
  • dependency-link
  • npText
  • includedNPs
  • isDefinite, isIndefinite
  • predNameOf
  • Semantic
  • isMaleName, isFemaleName, isFamilyName, isPerson
  • HeSheItThey
  • Positional
  • offset
  • sentenceID
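A minimal sketch of how these primary attributes could be grouped into a per-RE record; the field names follow the slide, while the types and defaults are assumptions made only for illustration.
```python
# Sketch of one RE's feature record; types/defaults are assumptions.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class REAttributes:
    # lexical / morphological
    lemma: str
    number: str                      # e.g. "sg" / "pl"
    pos: str
    head_form: str
    # syntactic
    synt_role: Optional[str] = None
    dependency_link: Optional[str] = None
    np_text: str = ""
    included_nps: List[str] = field(default_factory=list)
    is_definite: bool = False
    is_indefinite: bool = False
    pred_name_of: Optional[str] = None
    # semantic
    is_male_name: bool = False
    is_female_name: bool = False
    is_family_name: bool = False
    is_person: bool = False
    he_she_it_they: Tuple[float, float, float, float] = (0.0, 0.0, 0.0, 0.0)
    # positional
    offset: int = 0
    sentence_id: int = 0
```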

32
Our model: knowledge sources
  • For each attribute there is a knowledge source
    that fetches the value, using:
  • The POS tagger output
  • The FDG structure
  • Large name databases
  • The WordNet hierarchy
  • Punctuation

33
Knowledge sources - HeSheItThey
  • HeSheItThey = (Phe, Pshe, Pit, Pthey)
  • for pronouns: straightforward
  • for NPs:
  • n = number of synsets of the head
  • f = number of synsets which are hyponyms of <female>
  • m = number of synsets which are hyponyms of <male>
  • p = number of synsets which are hyponyms of <person>
  • If the NP is plural: Phe = 0, Pshe = 0, Pit = 0, Pthey = 1
  • Else: Phe = m/n, Pshe = f/n, Pit = (n-p)/n, Pthey = 0
    (a sketch follows below)
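A minimal sketch of this knowledge source for NP heads, using NLTK's WordNet interface. It counts, over the synsets of the head noun, how many fall under <male>, <female> and <person>, and turns the counts into the four probabilities defined above. The choice of anchor synsets (male.n.02, female.n.02, person.n.01) and the fallback for unknown nouns are assumptions.
```python
# Sketch of the HeSheItThey source; requires nltk.download("wordnet").
from nltk.corpus import wordnet as wn

MALE = wn.synset("male.n.02")        # assumed anchor synsets
FEMALE = wn.synset("female.n.02")
PERSON = wn.synset("person.n.01")

def is_hyponym_of(synset, ancestor):
    """True if ancestor is reachable from synset via hypernym links."""
    return ancestor in synset.closure(lambda s: s.hypernyms())

def he_she_it_they(head_lemma, plural=False):
    if plural:
        return (0.0, 0.0, 0.0, 1.0)          # Phe, Pshe, Pit, Pthey
    synsets = wn.synsets(head_lemma, pos=wn.NOUN)
    n = len(synsets)
    if n == 0:
        return (0.0, 0.0, 1.0, 0.0)          # unknown noun: assume "it"
    m = sum(is_hyponym_of(s, MALE) for s in synsets)
    f = sum(is_hyponym_of(s, FEMALE) for s in synsets)
    p = sum(is_hyponym_of(s, PERSON) for s in synsets)
    return (m / n, f / n, (n - p) / n, 0.0)

print(he_she_it_they("woman"))   # should lean towards Pshe
print(he_she_it_they("table"))   # should lean towards Pit
```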

34
Knowledge sources - wh
  • Source for detecting the referent of a wh-pronoun
  • Case 1:
  • I saw a blond boy who was playing in the
    garden.
  • Case 2:
  • The colour of the chair which was underneath
    the table
  • The atmosphere of happiness which she carried
    with her.

35
Our model: rules
  • Demolishing rules
  • IncludingRule: prohibits coreference between
    nested REs
  • Certifying rules
  • PredNameRule
  • ProperNameRule
  • Promoting/demoting rules
  • HeSheItTheyRule
  • RoleRule
  • NumberRule
  • LemmaRule
  • PersonRule
  • SynonymyRule
  • HypernymyRule
  • WordnetChainRule

36
Our model: domain of referential accessibility
  • Linear

37
Evaluation of the RE-extractor
[Diagram: the pipeline, with the evaluation point on the RE-extractor]
When does a gold-test pair of markables match?
38
Evaluation of the RE-extractor
[Diagram: a gold markable and a test markable sharing the same head]
  • When does a gold-test pair of markables match?
  • head matching (HM): if they have the same head
39
Evaluation of the RE-extractor
[Diagram: a gold markable of length l1 and a test markable of length l2; P, R, F are computed over the matches]
  • When does a gold-test pair of markables match?
  • partial matching (PM): if they have the same
    head and the mutual overlap is higher than 50%
    (compared to the longest span), i.e. l2 / l1 > 0.5
    (a sketch of both matching criteria follows below)
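A minimal sketch of the two matching criteria, head matching (HM) and partial matching (PM). Markables are represented here as (start, end, head) triples over token offsets; that representation is an assumption made for illustration.
```python
# Sketch of the gold-test markable matching criteria.
def head_match(gold, test):
    """HM: the two markables share the same head."""
    return gold[2] == test[2]

def partial_match(gold, test, threshold=0.5):
    """PM: same head, and the mutual overlap exceeds the threshold,
    measured against the longer of the two spans."""
    if not head_match(gold, test):
        return False
    overlap = min(gold[1], test[1]) - max(gold[0], test[0])
    longest = max(gold[1] - gold[0], test[1] - test[0])
    return overlap > 0 and overlap / longest > threshold

gold = (10, 16, "market")   # hypothetical token offsets and head
test = (12, 16, "market")
print(head_match(gold, test), partial_match(gold, test))
```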
40
Evaluation of the AR-engine
  • Same set of markables (on the identity of head
    criterion)
  • For each anaphor in the gold
  • If it belongs to a chain that doesn't contain any
    other anaphor, then we look in the test set to
    see if it belongs to a similar trivial chain, in
    which case it will take the value 1

[Diagram: anaphor i in a trivial gold chain, matched in the test; ci = 1]
41
Evaluation of the AR-engine
  • Same set of markables (on the identity of head
    criterion)
  • For each anaphor in the gold
  • If it belongs to a chain that doesn't contain any
    other anaphor, then we look in the test set to
    see if it belongs to a similar trivial chain;
    otherwise it will get the value 0

[Diagram: anaphor i in a trivial gold chain, not matched in the test; ci = 0]
42
Evaluation of the AR-engine
  • Same set of markables (on the identity of head
    criterion)
  • For each anaphor in the gold
  • If the anaphor belongs to a chain containing n
    other anaphors, then we look in the test set
    and count how many of these n anaphors belong to
    the chain corresponding to the current test-set
    anaphor (we denote this number by m). The ratio
    m/n will be the value assigned to the current
    anaphor.

[Diagram: a gold chain with 3 other anaphors, of which 2 are found in the corresponding test chain; ci = 2/3]
43
Evaluation of the AR-engine
  • Same set of markables (on the identity of head
    criterion)
  • For each anaphor in the gold
  • If the anaphor belongs to a chain containing n
    other anaphors, then we look in the test set
    and count how many of these n anaphors belong to
    the chain corresponding to the current test-set
    anaphor (we denote this number by m). The ratio
    m/n will be the value assigned to the current
    anaphor.
  • Then we add this number over all anaphors and
    divide by the number of anaphors: Σci / N
    (a sketch of this metric follows below)

[Diagram: the same example as on the previous slide; ci = 2/3]
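A minimal sketch of this chain-based score, assuming gold and test share the same markables (identified here by ids) and that the "n other anaphors" of a chain can be approximated by its other mentions.
```python
# Sketch of the AR-engine evaluation over coreference chains.
def chain_of(markable, chains):
    """Return the set of markables in the chain containing `markable`."""
    for chain in chains:
        if markable in chain:
            return set(chain)
    return {markable}

def ar_engine_score(gold_chains, test_chains):
    scores = []
    for chain in gold_chains:
        for i, anaphor in enumerate(chain):
            if i == 0:
                continue                        # first mention: not an anaphor
            others = set(chain) - {anaphor}     # the n other mentions of the chain
            test_chain = chain_of(anaphor, test_chains)
            m = len(others & test_chain)        # how many of them the test recovers
            scores.append(m / len(others))      # trivial chains (n = 1) give 1 or 0
    return sum(scores) / len(scores) if scores else 0.0

gold = [["m1", "m2", "m3", "m4"]]               # one gold chain
test = [["m1", "m2", "m4"], ["m3"]]             # m3 missed in the test
print(round(ar_engine_score(gold, test), 3))
```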
44
Evaluation of the AR-engine working on
coreferences
[Diagram: the pipeline, with the evaluation point on the AR-engine only]
45
Evaluation of the whole system
  • Possibly different sets of markables, identified
    on the identity-of-head criterion and, where
    found both in gold and test, possibly with
    different spans
  • same global formula, but the contribution of each
    markable is weighted by the mutual overlap
    score (mos), i.e. the test-versus-gold overlap
    of the markables

[Diagram: a gold markable of length a and a test markable of length b; mosi = b/a]
46
Evaluation of the whole system
  • Possibly different sets of markables, identified
    on the identity-of-head criterion and, where
    found both in gold and test, possibly with
    different spans
  • same global formula, but the contribution of each
    markable is weighted by the mutual overlap
    score (a sketch follows below)

[Diagram: a gold chain with 3 other anaphors; the matched test markables have mos values 0.5, 0 and 0.7, so ci = 1.2/3]
R = Σci / Ng
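A minimal sketch of the whole-system recall, extending the AR-engine score sketched above: each recovered mention now contributes its mutual overlap score (mos) instead of a full count of 1. The mos mapping from markable id to overlap score, and the reuse of chain_of from the previous sketch, are assumptions made for illustration.
```python
# Sketch of the whole-system recall (reuses chain_of from the sketch above).
def whole_system_recall(gold_chains, test_chains, mos):
    scores = []
    for chain in gold_chains:
        for i, anaphor in enumerate(chain):
            if i == 0:
                continue                        # first mention: not an anaphor
            others = set(chain) - {anaphor}
            found = others & chain_of(anaphor, test_chains)
            # weight each recovered mention by its test-vs-gold overlap
            scores.append(sum(mos.get(x, 0.0) for x in found) / len(others))
    # the denominator is the number of gold anaphors (Ng): misses lower R,
    # while false alarms would show up in the corresponding precision
    return sum(scores) / len(scores) if scores else 0.0
```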
47
Evaluation of the whole system
  • Possibly different sets of markables, identified
    on the identity-of-head criterion and, where
    found both in gold and test, possibly with
    different spans
  • misses (failures to find certain markables)
    influence R
  • false alarms (markables erroneously considered
    in the test) influence P

[Diagram: the same chains, now with a miss on the gold side and a false alarm on the test side]
48
Evaluation of the whole system
[Diagram: the pipeline, evaluated globally]
49
Commentaries
  • RE-extractor module gives better results on PTB
    than on Orwell
  • human syntactic annotation versus automatic FDG
    structure detection
  • AR module gives slightly better results on PTB
    than on Orwell
  • news (finance) versus belles-lettres
  • heads in PTB are extracted by rules relying on the
    human syntactic annotation; in Orwell, by rules
    relying on the FDG output
  • Difficult to compare with other
    approaches/authors
  • apparently we are in the upper range
  • BUT not the same corpus, not the same evaluation
    metric

50
Conclusions
  • propose a methodology to evaluate pipeline
    architectures when the gold and test data are
    available at intermediate steps in the
    processing chain; the method makes it possible to
    assess the contribution of individual modules
    irrespective of the degradation of the results
    due to the weaknesses of the other contributing
    modules
  • report and compare new coreference resolution
    results on input belonging to two different
    registers (belles-lettres and news) and to two
    different types of input (plain text and treebank
    annotation)
  • introduce a method to evaluate a coreference
    resolution module when the markables in test and
    gold differ not only in number but also in span
  • the coreference resolution model uses a new
    heuristic based on WordNet (the HeSheItThey
    metric, a kind of natural gender for nouns)
    which helps considerably

51
Thank you