Title: ACL Birds of a Feather: Corpus Annotation with Interlingual Content. Interlingual Annotation of Multilingual Text Corpora
1 ACL Birds of a Feather: Corpus Annotation with Interlingual Content
Interlingual Annotation of Multilingual Text Corpora
Bonnie Dorr, David Farwell, Rebecca Green, Nizar Habash, Stephen Helmreich, Eduard Hovy, Lori Levin, Keith Miller, Teruko Mitamura, Owen Rambow, Florence Reeder, Advaith Siddharthan
CMU, Columbia University, ISI/USC, Mitre, New Mexico State University, University of Maryland
2 Theory
- Goal 1: Define a semantic interlingual (IL) representation that can be used for annotation
- Goal 2: Use the IL to semantically annotate a multilingual parallel corpus
- Basic Premise: The definition of the IL is informed by comparing multiple languages and multiple English translations per foreign-language text
3 Annotations: Multi-Layered Representation
- IL0: Normalized deep-syntactic dependency
- IL1: IL0 structure plus semantic annotations from the Omega ontology
- IL2: Unifies different IL1 representations for semantically similar sentences; structurally, a forest of dependencies with semantic annotations from the Omega ontology, plus coreference
- ILmore: Whatever is unhandled so far
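The layering on this slide could be modelled roughly as follows. This is a minimal sketch, not the project's actual data format: the class name `ILNode`, its fields, the relation labels, and the concept label `ANNOUNCE` are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ILNode:
    """One node of a dependency tree (hypothetical structure)."""
    lemma: str
    relation: str = "root"  # dependency relation to the parent (IL0 layer)
    concepts: List[str] = field(default_factory=list)  # ontology labels (IL1 layer)
    children: List["ILNode"] = field(default_factory=list)

    def level(self) -> str:
        """Return 'IL1' if any node in the tree carries a concept, else 'IL0'."""
        if self.concepts or any(c.level() == "IL1" for c in self.children):
            return "IL1"
        return "IL0"


# IL0: a bare dependency structure (relation labels are made up).
tree = ILNode("announce", children=[
    ILNode("Sheikh Mohamed", relation="arg1"),
    ILNode("want", relation="arg2"),
])
print(tree.level())

# IL1: the same structure with a semantic annotation added.
tree.concepts.append("ANNOUNCE")  # placeholder, not a real Omega entry
print(tree.level())
```

Keeping the concepts as an optional list on each node mirrors the idea that IL1 reuses the IL0 structure unchanged and only adds annotations on top of it.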
4 Notation
Sheikh Mohamed, who is also the Defense Minister
of the United Arab Emirates, announced at the
inauguration ceremony that we want to make Dubai
a new trading center.
6 Notation
In progress; coreference not shown
Sheikh Mohamed, who is also the Defense Minister
of the United Arab Emirates, announced at the
inauguration ceremony that we want to make Dubai
a new trading center.
7 Languages
- Seven Languages
  - Arabic, French, Hindi, Japanese, Korean, Spanish as source languages
  - English as a target language
- Domains and Genres
  - Economic News
  - Total source corpus of about one million words
  - 125 source news articles in each language
  - Three professional English translations for each article
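The article counts implied by these figures can be worked out directly. The short sketch below only multiplies the numbers given on the slide; the variable names are illustrative.

```python
# Figures taken from the slide.
SOURCE_LANGUAGES = ["Arabic", "French", "Hindi", "Japanese", "Korean", "Spanish"]
ARTICLES_PER_LANGUAGE = 125
TRANSLATIONS_PER_ARTICLE = 3

# Source articles across all six source languages.
source_articles = len(SOURCE_LANGUAGES) * ARTICLES_PER_LANGUAGE

# English translations, three per source article.
english_translations = source_articles * TRANSLATIONS_PER_ARTICLE

print(source_articles, english_translations)  # 750 2250
```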
8 Annotation Support Resources Built
- Annotation Manuals
  - Seven IL0 Manuals (English completed, foreign in progress)
  - One IL1 Manual
  - IL2 Manual (in progress)
- Annotation Tools
  - Created Tiamat for annotation
  - Reused TrEd tree editor from Prague as is (thanks!)
9 Completed Annotations
- Completed six pairs of English translations (250 words apiece) from each of the source languages for IL1 level
- Ten annotators were asked to annotate nouns, verbs, adjectives and adverbs only with Omega concepts
- Annotators selected one or more concepts from both WordNet and Mikrokosmos-derived nodes
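The restriction to open-class words and the two concept sources could be represented along these lines. This is a sketch under assumptions: the `Token` record, the part-of-speech tag names, and the helper `annotatable` are hypothetical and do not describe the Tiamat tool's actual format.

```python
from dataclasses import dataclass, field
from typing import List

# Parts of speech the annotators were asked to label (tag names are assumed).
OPEN_CLASS = {"noun", "verb", "adjective", "adverb"}


@dataclass
class Token:
    """A word plus the concepts an annotator selected for it (hypothetical record)."""
    text: str
    pos: str
    wordnet: List[str] = field(default_factory=list)      # WordNet-derived concepts
    mikrokosmos: List[str] = field(default_factory=list)  # Mikrokosmos-derived concepts


def annotatable(tokens: List[Token]) -> List[Token]:
    """Keep only the tokens whose part of speech is open-class."""
    return [t for t in tokens if t.pos in OPEN_CLASS]


# Fragment of the example sentence, with hand-assigned tags for illustration.
tokens = [
    Token("want", "verb"),
    Token("to", "particle"),
    Token("make", "verb"),
    Token("a", "determiner"),
    Token("new", "adjective"),
    Token("center", "noun"),
]
print([t.text for t in annotatable(tokens)])
```

Storing the two concept lists separately keeps it easy to compute agreement per source, as the next slide does.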
10 Inter-annotator Agreement
For 95 completed annotations

               Annotators   Agreement   Kappa
MikroKosmos    3.50         0.745       0.743
WordNet        6.08         0.660       0.657
Theta Roles    5.75         0.538       0.509
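For readers unfamiliar with the statistic, kappa corrects observed agreement P_o for the agreement P_e expected by chance: kappa = (P_o - P_e) / (1 - P_e). The slide does not say which multi-annotator variant was used, so the sketch below swaps in plain two-annotator Cohen's kappa purely as an illustration. The function name and the toy labels are invented and are not drawn from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy data, invented for illustration only.
ann_1 = ["c1", "c1", "c2", "c3"]
ann_2 = ["c1", "c2", "c2", "c3"]
print(cohen_kappa(ann_1, ann_2))
```

Note that in the table the kappa values sit very close to the raw agreement figures, which is what one would expect when chance agreement is small, as it tends to be with a large inventory of candidate concepts.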
11 Planned Production Rate
- Ed, David ?
- Future Plans
- Completed first year of a three-year project, subject to renewal
12 Potential Collaboration
- Share resources
  - Tools
  - Manuals
- Use a common corpus
  - Future comparative analysis
- Discussions
  - AMTA 2004 IL workshop
  - Other venues