David Farwell, Stephen Helmreich


1
Columbia, CRL/NMSU, ISI/USC, LTI/CMU, MITRE,
UMIACS/UMD
  • David Farwell, Stephen Helmreich (Computing Research Laboratory, New Mexico State
    University)
  • Lori Levin, Teruko Mitamura (Language Technologies Institute, Carnegie Mellon
    University)
  • Bonnie Dorr, Rebecca Green (Institute for Advanced Computer Studies, University of
    Maryland)
  • Eduard Hovy (Information Sciences Institute, University of Southern California)
  • Keith Miller, Florence Reeder (MITRE Corporation)
  • Owen Rambow, Nizar Habash (Columbia University)

2
  • What we annotate:
    • multiple comparable bilingual text corpora
    • parallel text corpora
    • multiple translations of texts
  • Genre: newspaper texts / DARPA corpus
  • Goals:
    • common representation (interlingua)
    • common methodology and tools
    • observe and catalogue different surface realizations of the same meaning
      across and within languages

5
  • Annotation process:
    • Text is syntactically parsed (Connexor / IL0)
    • Parse is reviewed and corrected (TrEd)
    • Annotation to IL1 (Tiamat)
    • Content words annotated for sense (Omega)
    • Arguments annotated for thematic role (LCS)
  • 2 English translations of 6 articles
  • Source languages: Arabic, French, Hindi, Japanese, Korean, Spanish
  • 12 annotators, 2 at each site
  • Total: 144 texts annotated to IL1 level

6
  • Results: agreement and time
  • Tools (Tiamat)
  • Manuals (IL0 for 7 languages, IL1)
  • Inter-annotator agreement (kappa): .83 for Mikrokosmos concepts (mK), .66 for
    WordNet senses (wn), .59 for theta-roles
  • Annotation time: 4 hours/annotator/text, 250 words/text, 2 annotators/text;
    approx. 2 person-years for 100K words at IL1
  • Next step: merge IL1 representations and develop transformation algorithms to
    produce IL2
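
The agreement and time figures on this slide can be sanity-checked with a short sketch. It assumes plain two-annotator Cohen's kappa, which may differ from the kappa variant the project actually used, and the theta-role labels are invented toy data, not project data. The helper names (`cohen_kappa`, `annotation_hours`) and the hours-per-person-year figure are likewise assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def annotation_hours(total_words, words_per_text=250,
                     annotators_per_text=2, hours_per_annotator_text=4):
    """Total annotator hours, using the per-text figures from the slide."""
    texts = total_words / words_per_text
    return texts * annotators_per_text * hours_per_annotator_text


# Toy theta-role labels (invented for illustration only).
ann_a = ["agent", "theme", "goal", "agent", "theme", "agent"]
ann_b = ["agent", "theme", "theme", "agent", "goal", "agent"]
kappa = cohen_kappa(ann_a, ann_b)

hours = annotation_hours(100_000)  # 400 texts * 2 annotators * 4 h = 3200 h
# ASSUMPTION: hours worked per person-year is not given on the slide.
for hours_per_year in (1600, 2080):
    print(f"{hours_per_year} h/year -> {hours / hours_per_year:.2f} person-years")
print(f"toy kappa = {kappa:.2f}")
```

With 100K words the slide's figures give 3200 annotator hours, which is about 1.5 to 2 person-years depending on the assumed working year, consistent with the "approx. 2 person-years" estimate.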