Transferring Coreference Chains through Word Alignment

1
Transferring Coreference Chains through Word
Alignment
  • Oana Postolache, Dan Cristea, Constantin Orasan
  • oana@coli.uni-saarland.de, dcristea@infoiasi.ro,
    C.Orasan@wlv.ac.uk
  • University of Saarland, Germany
  • Alexandru Ioan Cuza University, Romania
  • Institute of Computer Science, Romanian Academy,
    Romania
  • University of Wolverhampton, United Kingdom
  • LREC 2006

2
What are coreference chains?
6
The goal
  • Automatic annotation of coreference chains for
    languages with scarcer resources (here, Romanian).
  • Experiment

7
Roadmap
  • Description of the parallel corpus
  • Coreference information annotated
  • Experiment
      • Automatic word alignment
      • Extraction of Romanian REs corresponding to the
        English REs
      • Coreference chains transfer
  • Evaluation
      • RE evaluation
      • Coreference chains evaluation
  • Error analysis
  • Conclusions

9
English-Romanian parallel corpus
  • George Orwell's novel 1984
  • 6,411 sentences
  • The English version is being manually annotated
    with coreference chains (about half of the corpus
    is annotated so far).
  • Experimental data: three parts of the first
    chapter
      • 13K words
      • 638 sentences
  • The Romanian version was manually annotated for
    evaluation purposes.

10
Coreference information annotated
  • Conformant with MUC-7 and ACE 2003.
  • Referential expressions (REs) are:
      • Noun phrases: definite, indefinite, undetermined
      • Proper names
      • Pronouns and wh-pronouns
      • Numerals
  • REs include restrictive clauses only.
  • Each term of an apposition is marked separately.
  • Conjoined expressions are marked individually.
  • Noun premodifiers are not marked.

12
Experiment: Automatic word alignment
  • We used the Romanian-English aligner COWAL (Tufis
    et al., 2006).
  • Performance: 83.30% F-measure.
  • Ranked first out of 37 systems in the ACL'05
    shared task on word alignment.

13
Experiment: Extraction of the Romanian REs
  • For an Eng RE with words e1, e2, …, en, we extract
    the set of Rom words aligned to them, r1, r2, …, rm,
    in surface order.
  • Heads are transferred through the alignment from
    Eng to Rom (1:n).
  • The Rom RE is taken to be the span of words from
    r1 to rm.
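The span projection above can be sketched in a few lines of Python. This is a minimal illustration, not COWAL's interface: it assumes a hypothetical alignment format (a set of 0-based (English index, Romanian index) token pairs for one sentence pair), and the function name `project_re` is illustrative.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical alignment format: (english_index, romanian_index) pairs,
# 0-based token positions within one sentence pair.
Alignment = Set[Tuple[int, int]]


def project_re(eng_re: List[int], alignment: Alignment) -> Optional[Tuple[int, int]]:
    """Project an English RE (token indices e1..en) onto the Romanian side.

    Collects every Romanian word aligned to a word of the RE, orders them by
    surface position (r1..rm) and returns the span (r1, rm), or None when no
    word of the RE is aligned.
    """
    eng_words = set(eng_re)
    rom_words = sorted({r for e, r in alignment if e in eng_words})
    if not rom_words:
        return None
    # The Romanian RE covers every word from r1 to rm, aligned or not.
    return rom_words[0], rom_words[-1]


# "a forced labour camp" -> "un lagăr de muncă silnică" (hypothetical alignment)
alignment = {(0, 0), (1, 4), (2, 3), (3, 1)}
print(project_re([0, 1, 2, 3], alignment))  # (0, 4)
```

Taking the span from r1 to rm means unaligned function words inside it (here "de") still end up in the Romanian RE.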

15
Experiment: Extraction of the Romanian REs
  • Four situations:
      1. An Eng RE has a corresponding Rom RE with ONE
         head.
      2. An Eng RE has a corresponding Rom RE with MORE
         THAN ONE head.
      3. An Eng RE has a corresponding Rom RE with NO
         head.
      4. An Eng RE has NO corresponding Rom RE.
  • Only REs in situations 1 and 2 are considered.
  • The head of the Rom RE is taken to be the leftmost
    head whose POS is Noun, Pronoun or Numeral.
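A sketch of how the four situations and the head choice could be computed, assuming the same hypothetical alignment format as a set of (English index, Romanian index) pairs and hypothetical POS tag names (`NOUN`, `PRON`, `NUM`). The slides do not say what happens when no aligned head has a nominal POS; here the head is simply left as `None`.

```python
from typing import List, Optional, Set, Tuple

Alignment = Set[Tuple[int, int]]  # hypothetical (eng_index, rom_index) pairs

# Hypothetical tag names for the POS classes named on the slide.
HEAD_POS = {"NOUN", "PRON", "NUM"}


def transfer_head(
    eng_head: int,
    rom_span: Optional[Tuple[int, int]],
    alignment: Alignment,
    rom_pos: List[str],
) -> Tuple[int, Optional[int]]:
    """Return (situation, rom_head) for one English RE.

    Situations follow the slide: 1 = one Rom head, 2 = more than one,
    3 = a Rom RE but no head, 4 = no Rom RE at all.
    """
    if rom_span is None:
        return 4, None
    # 1:n transfer: every Rom word aligned to the Eng head is a head candidate.
    heads = sorted(r for e, r in alignment if e == eng_head)
    if not heads:
        return 3, None
    situation = 1 if len(heads) == 1 else 2
    # Leftmost candidate whose POS is Noun, Pronoun or Numeral (None if none is).
    rom_head = next((r for r in heads if rom_pos[r] in HEAD_POS), None)
    return situation, rom_head


def keep(situation: int) -> bool:
    # Only situations 1 and 2 are passed on to the chain transfer.
    return situation in (1, 2)
```

For "a forced labour camp" → "un lagăr de muncă silnică", with "camp" (index 3) aligned only to "lagăr" (index 1), `transfer_head(3, (0, 4), alignment, rom_pos)` yields situation 1 with head 1.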

16
Experiment: Coreference chains transfer
  • Since the Eng REs are clustered into chains
    referring to the same entity, and we have the
    corresponding Rom REs, we simply import the
    clustering.
  • As not all Eng REs have a corresponding Rom RE,
    the number of clusters may differ between Eng and
    Rom.
  • The lengths of corresponding clusters may also
    differ.
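Importing the clustering amounts to mapping each English chain through the English-to-Romanian RE correspondence and dropping REs that have no counterpart. A minimal sketch, assuming hypothetical string ids for REs and a mapping that contains only the REs kept after extraction:

```python
from typing import Dict, List


def transfer_chains(
    eng_chains: List[List[str]], eng_to_rom: Dict[str, str]
) -> List[List[str]]:
    """Import the English clustering onto the Romanian REs.

    eng_chains: chains of English RE ids referring to the same entity.
    eng_to_rom: English RE id -> Romanian RE id, only for REs kept after
    extraction (situations 1 and 2); other REs are simply absent.
    """
    rom_chains = []
    for chain in eng_chains:
        projected = [eng_to_rom[re_id] for re_id in chain if re_id in eng_to_rom]
        # A chain whose REs were all lost disappears, so the number of
        # chains (and their lengths) can differ between the two languages.
        if projected:
            rom_chains.append(projected)
    return rom_chains


eng_chains = [["e1", "e2", "e3"], ["e4"]]
eng_to_rom = {"e1": "r1", "e3": "r3"}
print(transfer_chains(eng_chains, eng_to_rom))  # [['r1', 'r3']]
```

In the example, the first chain shrinks from three REs to two and the second chain vanishes, which illustrates both differences noted on the slide.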

18
Evaluation
  • Transferred data (system) is compared against
    gold standard data (manual) for Rom.

19
Evaluation of the RE heads
  • We only consider the heads of the system REs and
    the heads of the gold standard REs.
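The slides do not spell out the head measure; a natural reading, consistent with the precision and recall figures in the conclusions, is precision, recall and F-measure over head positions. A sketch under that assumption, with heads identified by hypothetical (sentence id, token index) pairs:

```python
from typing import Hashable, Set, Tuple


def prf(system: Set[Hashable], gold: Set[Hashable]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of system heads against gold heads.

    Heads are identified by position, e.g. hypothetical
    (sentence_id, token_index) pairs.
    """
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


system_heads = {(0, 1), (0, 4), (1, 2)}
gold_heads = {(0, 1), (1, 2), (1, 5), (2, 0)}
print(prf(system_heads, gold_heads))  # precision 2/3, recall 1/2, F = 4/7
```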

20
Evaluation of the RE spans (1/2)
  • All REs: the overlap between each system RE and
    the corresponding gold standard RE is computed as
    $$\mathrm{Overlap} = \frac{2\,|\text{words}_{\text{SystemRE}} \cap \text{words}_{\text{GoldRE}}|}{|\text{words}_{\text{SystemRE}}| + |\text{words}_{\text{GoldRE}}|}$$
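As a worked instance of the span overlap (twice the number of shared words divided by the total number of words in the two REs), take hypothetical sizes that are not from the experiment: a 5-word system RE that contains all words of a 2-word gold RE scores

$$\mathrm{Overlap} = \frac{2 \cdot 2}{5 + 2} = \frac{4}{7} \approx 0.57$$

An exact match scores 1 and disjoint spans score 0.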

21
Evaluation of the RE spans (2/2)
  • The previous numbers also include penalties for
    REs missing from the system output and for wrong
    system REs (errors already reflected in the heads
    evaluation).
  • Only correct system REs: system REs with a correct
    head, compared against the corresponding gold REs.
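The two span settings differ only in which REs are averaged over. The slides do not say how span scores are aggregated, so the sketch below makes two assumptions: system and gold REs are paired by head (a hypothetical (sentence id, token index) key), and in the "all REs" setting an RE missing on either side contributes an overlap of 0.

```python
from typing import Dict, Tuple

Span = Tuple[int, int]  # inclusive (first_token, last_token) of an RE
Head = Tuple[int, int]  # hypothetical (sentence_id, head_token_index)


def span_overlap(a: Span, b: Span) -> float:
    """2 * |shared words| / (|words in a| + |words in b|)."""
    words_a = set(range(a[0], a[1] + 1))
    words_b = set(range(b[0], b[1] + 1))
    return 2 * len(words_a & words_b) / (len(words_a) + len(words_b))


def mean_overlap(
    system: Dict[Head, Span], gold: Dict[Head, Span], only_correct: bool
) -> float:
    """Average span overlap, pairing system and gold REs by head.

    only_correct=False: every RE counts; REs missing on either side score 0.
    only_correct=True: only system REs whose head matches a gold head count.
    """
    keys = system.keys() & gold.keys() if only_correct else system.keys() | gold.keys()
    if not keys:
        return 0.0
    total = sum(span_overlap(system[k], gold[k]) for k in keys if k in system and k in gold)
    return total / len(keys)
```

Restricting to correct heads removes the detection errors from the span score, which is why the second setting isolates how well the span boundaries themselves are transferred.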

22
Evaluation of coreference chains
  • All system REs against the gold REs.
  • The correct system REs against the gold REs.
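The slides do not name the chain-level metric. As an assumption, the sketch below uses the link-based MUC score (Vilain et al., 1995), a common choice for MUC-7-style annotation: recall counts how many gold links survive in the system partition, and precision is the same computation with the roles swapped. Mentions are identified by hypothetical ids, and chains on each side are assumed disjoint.

```python
from typing import Hashable, List, Set, Tuple

Chain = Set[Hashable]  # a chain = set of mention ids (e.g. hypothetical head positions)


def muc_recall(key: List[Chain], response: List[Chain]) -> float:
    """MUC link-based recall of `response` against `key`.

    Calling it with the arguments swapped gives MUC precision.
    Assumes the chains on each side are disjoint.
    """
    numerator = denominator = 0
    for k in key:
        # Count the parts into which the response chains split k; mentions of k
        # that appear in no response chain each form a singleton part.
        parts, covered = 0, set()
        for r in response:
            shared = k & r
            if shared:
                parts += 1
                covered |= shared
        parts += len(k - covered)
        numerator += len(k) - parts
        denominator += len(k) - 1
    return numerator / denominator if denominator else 0.0


def muc(key: List[Chain], response: List[Chain]) -> Tuple[float, float, float]:
    recall = muc_recall(key, response)
    precision = muc_recall(response, key)
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


gold = [{"a", "b", "c"}]
system = [{"a", "b"}, {"c"}]
print(muc(gold, system))  # (1.0, 0.5, 0.666...)
```

Running it once with all system mentions and once with only the correct-head mentions gives the two settings listed on the slide.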

24
Error analysis (1/3): Incorrect detection of Rom REs
  • Wrong alignment
  • Eng adjectives/adverbs/verbs translated into Rom
    as nouns
      • En: naturally sanguine face
      • Ro: fața sangvină de la natură (face sanguine
        from the nature)
  • Choices made by the Rom translator
      • En: The actual writing would be easy
      • Ro: Scrisul în sine era o treabă ușoară (The
        writing itself was an easy job)
  • Eng noun premodifiers translated into Rom as
    prepositional phrase postmodifiers or possessives
      • En: a forced labour camp
      • Ro: un lagăr de muncă silnică (a camp of forced
        labour)

25
Error analysis (2/3): Errors in the span overlap
  • Wrong alignment
  • Errors triggered by the choice of translation
      • En: Someone with a comb and a piece of toilet
        paper was trying to keep tune with the music.
      • Ro: Cineva se străduia, cu un pieptene și o
        bucată de hârtie igienică, să țină isonul
        muzicii.
        (Someone was trying, with a comb and a piece of
        toilet paper, to keep tune with the music.)

26
Error analysis (3/3): Incorrect detection of
coreference chains
  • Errors due to translation choice
      • En: The sky was a harsh blue.
        (a harsh blue: predicative noun, linked to the
        subject The sky)
      • Ro: Cerul era de un albastru strident.
        (The sky was of a harsh blue.)

28
Conclusions
  • What and why?
      • An automatic method for projecting coreference
        chains in parallel corpora
      • To augment the scarce resources of coreference
        information
      • A preprocessing step prior to manual correction
        in the annotation effort
  • How good?
      • Referential expressions: high precision (> 95%)
        but lower recall (around 70%)
      • Coreference chains: relatively high F-measure
        (> 90%) for correct REs

29
Acknowledgements
  • This work emerged from a workshop at EUROLAN
    2005.
  • We are grateful to Dan Tufis and his team for
    making the Romanian-English aligned corpus
    available to us.
  • This work has been partially supported by the
    CEEX research grant ROTEL-29 of AMCSIT (Romanian
    Ministry of Education and Research code
    PC-D03-PT00-205).

Thank you!