Title: Transferring Coreference Chains through Word Alignment
1Transferring Coreference Chains through Word Alignment
- Oana Postolache, Dan Cristea, Constantin Orasan
- oana_at_coli.uni-saarland.de, dcristea_at_infoiasi.ro, C.Orasan_at_wlv.ac.uk
- University of Saarland, Germany
- Alexandru Ioan Cuza University, Romania
- Institute of Computer Science, Romanian Academy, Romania
- University of Wolverhampton, United Kingdom
- LREC'06
2What are coreference chains?
6The goal
- Automatic annotation of coreference chains for languages with sparser resources (Romanian).
- Experiment
7Roadmap
- Description of the parallel corpus
- Coreference information annotated
- Experiment
- Automatic word alignment
- Extraction of Romanian REs corresponding to the English REs
- Coreference chains transfer
- Evaluation
- RE evaluation
- Coreference chains evaluation
- Error analysis
- Conclusions
9English-Romanian parallel corpus
- George Orwell's novel 1984
- 6,411 sentences.
- The English version is being manually annotated with coreference chains (half of the corpus is done so far).
- Experimental data: three parts from the first chapter
- 13K words.
- 638 sentences.
- The Romanian version was manually annotated for evaluation purposes.
10Coreference information annotated
- Conformant with MUC-7 and ACE 2003.
- Referential expressions (REs) are:
- Noun phrases: definite, indefinite, undetermined
- Proper names
- Pronouns and wh-pronouns
- Numerals
- The REs include only restrictive clauses.
- The term of an apposition is taken separately.
- Conjoined expressions are taken individually.
- Noun premodifiers are not marked.
12Experiment: Automatic word alignment
- We used the Romanian-English aligner COWAL (Tufis et al., 2006).
- Performance: 83.30% F-measure.
- The first-ranked system out of 37 at the ACL'05 shared task on word alignment.
13Experiment: Extraction of the Romanian REs
- For an Eng RE with words e1, e2, ..., en, we extract the set of aligned Rom words r1, r2, ..., rm, in surface order.
- Heads are transferred through the alignment from Eng to Rom (1:n).
- We consider the Rom RE to be the span of words between r1 and rm.
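The span extraction described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the alignment is represented as a dictionary from English word positions to sets of Romanian word positions, and the function name `project_span` and the toy alignment are hypothetical, not taken from COWAL or from the authors' implementation.

```python
from typing import Dict, List, Optional, Set, Tuple


def project_span(eng_positions: List[int],
                 alignment: Dict[int, Set[int]]) -> Optional[Tuple[int, int]]:
    """Project an English RE onto Romanian through a word alignment.

    Collects every Romanian position aligned to a word of the English RE,
    sorts them in surface order (r1 ... rm), and returns the span (r1, rm).
    Returns None when no word of the RE is aligned.
    """
    rom_positions = sorted(
        {r for e in eng_positions for r in alignment.get(e, set())}
    )
    if not rom_positions:
        return None
    return rom_positions[0], rom_positions[-1]


# Hypothetical toy alignment (an assumption, for illustration only):
# En: a(0) forced(1) labour(2) camp(3)
# Ro: un(0) lagar(1) de(2) munca(3) silnica(4)
alignment = {0: {0}, 1: {4}, 2: {3}, 3: {1}}

print(project_span([0, 1, 2, 3], alignment))  # (0, 4)
```

Note that the span also covers unaligned words lying between r1 and rm (here "de"), which is what taking the span "between r1 and rm" implies.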
15Experiment: Extraction of the Romanian REs
- Four situations:
- 1. An Eng RE has a corresponding Rom RE with ONE head.
- 2. An Eng RE has a corresponding Rom RE with ONE OR MORE heads.
- 3. An Eng RE has a corresponding Rom RE with NO head.
- 4. An Eng RE has NO corresponding Rom RE.
- Only REs conforming to 1. and 2. are considered.
- The head of the Rom RE is taken as the leftmost head whose POS is Noun, Pronoun or Numeral.
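The head-selection rule above can be illustrated with a short sketch. The tuple layout `(position, word, pos)`, the tag names, and the function name `select_head` are assumptions made for this example; the actual tagset and data structures used in the experiment are not given on the slides.

```python
from typing import List, Optional, Tuple

# POS categories accepted for a Romanian head, as listed on the slide.
HEAD_POS = {"Noun", "Pronoun", "Numeral"}

Candidate = Tuple[int, str, str]  # (surface position, word, POS tag)


def select_head(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the leftmost candidate head whose POS is Noun, Pronoun or Numeral.

    `candidates` are the Romanian words aligned to the English head.
    Returns None when no candidate qualifies (the RE is then discarded).
    """
    eligible = [c for c in candidates if c[2] in HEAD_POS]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c[0])


# Hypothetical candidates, given in arbitrary order.
candidates = [(4, "silnica", "Adjective"), (3, "munca", "Noun"), (1, "lagar", "Noun")]
print(select_head(candidates))  # (1, 'lagar', 'Noun')
```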
16Experiment: Coreference chains transfer
- As the Eng REs are clustered in chains referring to the same entity, and we have the corresponding Rom REs, we simply import the clustering.
- As not all Eng REs have a corresponding Rom RE, the number of clusters may differ between Eng and Rom.
- The lengths of corresponding clusters may also differ.
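A minimal sketch of importing the clustering, assuming chains are stored as lists of RE identifiers and that the English-to-Romanian RE mapping uses `None` for REs without a usable Romanian counterpart. All identifiers and the function name `transfer_chains` are hypothetical.

```python
from typing import Dict, List, Optional


def transfer_chains(eng_chains: Dict[str, List[str]],
                    eng_to_rom: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Copy the English clustering onto the projected Romanian REs.

    English REs with no Romanian counterpart are skipped, so a Romanian
    chain can be shorter than its English source, and a chain whose REs
    were all skipped disappears entirely.
    """
    rom_chains: Dict[str, List[str]] = {}
    for chain_id, eng_res in eng_chains.items():
        rom_res = [eng_to_rom[re_id] for re_id in eng_res
                   if eng_to_rom.get(re_id) is not None]
        if rom_res:
            rom_chains[chain_id] = rom_res
    return rom_chains


# Hypothetical data: e2 and e4 have no usable Romanian RE.
eng_chains = {"c1": ["e1", "e2", "e3"], "c2": ["e4"]}
eng_to_rom = {"e1": "r1", "e2": None, "e3": "r3"}

print(transfer_chains(eng_chains, eng_to_rom))  # {'c1': ['r1', 'r3']}
```

In the toy data the Romanian side ends up with one chain instead of two, and that chain has two members instead of three, which mirrors both differences noted on the slide.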
18Evaluation
- Transferred data (system) is compared against
gold standard data (manual) for Rom.
19Evaluation of the RE heads
- We only consider the heads of the system REs and
the heads of the gold standard REs.
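One way to score heads is with precision, recall and F-measure over sets of head tokens. The sketch below assumes that each head is identified by a `(sentence, position)` pair and that a system head counts as correct only on an exact match with a gold head; the slides do not spell out the matching criterion, so both choices are assumptions.

```python
from typing import Set, Tuple

Head = Tuple[int, int]  # (sentence index, word position) -- assumed identifier


def precision_recall_f(system: Set[Head], gold: Set[Head]) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-measure of system heads against gold heads."""
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical heads: 2 of 3 system heads are correct, 2 of 4 gold heads are found.
system_heads = {(1, 3), (1, 7), (2, 2)}
gold_heads = {(1, 3), (1, 7), (2, 5), (3, 1)}

p, r, f = precision_recall_f(system_heads, gold_heads)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```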
20Evaluation of the RE spans (1/2)
- All REs
- The overlap between the system REs and the gold standard REs:
$$\mathrm{Overlap} = \frac{2 \cdot |\mathrm{wordsSystemRE} \cap \mathrm{wordsGoldRE}|}{|\mathrm{wordsSystemRE}| + |\mathrm{wordsGoldRE}|}$$
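The overlap measure on this slide is twice the number of words shared by the system RE and the gold RE, divided by the total number of words in the two REs. A small sketch, assuming each RE is represented as a set of word positions (the function name `span_overlap` is hypothetical):

```python
from typing import Set


def span_overlap(system_words: Set[int], gold_words: Set[int]) -> float:
    """Overlap = 2 * |system ∩ gold| / (|system| + |gold|)."""
    total = len(system_words) + len(gold_words)
    if total == 0:
        return 0.0
    return 2 * len(system_words & gold_words) / total


# Hypothetical spans: the system RE covers one word more than the gold RE.
system_re = {3, 4, 5, 6}
gold_re = {4, 5, 6}

print(round(span_overlap(system_re, gold_re), 3))  # 0.857
```

The score is 1.0 only when the two spans cover exactly the same words, and 0.0 when they share none.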
21Evaluation of the RE spans (2/2)
- The previous numbers also reflect the penalties for missing certain REs in the system output, or for having wrong REs (errors already counted in the heads evaluation).
- Only correct system REs
- System REs with a correct head against the corresponding gold REs.
22Evaluation of coreference chains
- All system REs against the gold REs.
- The correct system REs against the gold REs.
24Error analysis (1/3): Incorrect detection of Rom REs
- Wrong alignment
- Eng adjectives/adverbs/verbs translated in Rom by nouns
- En: naturally sanguine face
- Ro: fața sangvină de la natură (face sanguine from the nature)
- Choices of the Rom translator
- En: The actual writing would be easy
- Ro: Scrisul în sine era o treabă ușoară (The writing itself was an easy job)
- Eng noun premodifiers translated in Rom as prepositional phrase postmodifiers or possessives
- En: a forced labour camp
- Ro: un lagăr de muncă silnică (a camp of forced labour)
25Error analysis (2/3): Errors in the spans overlap
- Wrong alignment
- Triggered by the choice of translation
- En: Someone with a comb and a piece of toilet paper was trying to keep tune with the music.
- Ro: Cineva se străduia, cu un pieptene și o bucată de hârtie igienică, să țină isonul muzicii.
- (Someone was trying, with a comb and a piece of toilet paper, to keep tune with the music.)
26Error analysis (3/3): Incorrect detection of coreference chains
- Errors due to translation choice
- En: The sky was a harsh blue.
- predicative noun subject
- Ro: Cerul era de un albastru strident.
- (The sky was as a harsh blue.)
28Conclusions
- What and why?
- An automatic method for projecting coreference chains in parallel corpora
- To augment the scarce resources of coreference information
- A preprocessing step prior to manual correction in the annotation effort
- How good?
- References: high precision (> 95%) but lower recall (70%)
- Coreference chains: relatively high F-measure (> 90%) for correct REs
29Acknowledgements
- This work emerged from a workshop at EUROLAN 2005.
- We are grateful to Dan Tufis and his team for making the Romanian-English aligned corpus available to us.
- This work has been partially supported by the CEEX research grant ROTEL-29 of AMCSIT (Romanian Ministry of Education and Research, code PC-D03-PT00-205).
Thank you!