Title: BancTrad: a parallel annotated corpora web interface


1
BancTrad: a parallel annotated corpora web interface
T. Badia, G. Boleda, C. Colominas, M. Garmendia, A. Gonzàlez, M. Quixal
Faculty of Translation and Interpreting, Universitat Pompeu Fabra
II International Conference on Specialised Translation, March 2002
The authors wish to thank Stefan Bott for his technical help.
2
Contents
  • Overview
  • Extralinguistic mark-up and alignment
  • Linguistic analysis
  • Search machine architecture
  • Demo: search possibilities and applications

3
Overview
  • BancTrad: a parallel annotated corpora web
    interface
  • Languages: Catalan, Spanish, English, German,
    French
  • Sources: work done in translation courses,
    publishing houses, Internet

4
Mark-up and alignment
Extra-linguistic mark-up:
  • Original and translation references
  • Register (colloquial, standard, learned)
  • Type of text (normative, descriptive, literary...)
  • Genre
  • ...
Alignment (Déjà Vu, Atril)
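The extra-linguistic features listed on this slide could be held in one small record per text. What follows is a minimal sketch, not BancTrad's actual schema: the class, field, and function names are hypothetical, and only the three register values are taken from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class Register(Enum):
    # The three register values named on the slide.
    COLLOQUIAL = "colloquial"
    STANDARD = "standard"
    LEARNED = "learned"


@dataclass(frozen=True)
class TextMetadata:
    """Hypothetical record for the extra-linguistic mark-up of one text."""
    original_ref: str      # reference of the original text
    translation_ref: str   # reference of its translation
    register: Register
    text_type: str         # e.g. "normative", "descriptive", "literary"
    genre: str             # open-ended on the slide, so kept as free text


def parse_register(value: str) -> Register:
    """Map a form value to a Register; raises ValueError for unknown values."""
    return Register(value.strip().lower())
```

Keeping the register as a closed enumeration while leaving text type and genre as free text mirrors the slide, which closes the first list and leaves the other two open.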
6
MS Word form for the mark-up of extralinguistic
features
8
Alignment with Déjà Vu Maintenance (Atril)
9
Linguistic analysis
  • Tagger for Catalan: CATCG (UPF; Badia et al. 2000)
      • linguistic
      • lemma, POS, syntactic function
  • Tagger for French, German and English: TreeTagger (IMS Stuttgart; Schmid 1997)
      • stochastic
      • lemma, POS
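Both taggers attach a lemma and a POS tag to each word form. A minimal sketch of reading such output follows, assuming the three-column, tab-separated layout (form, POS, lemma) that TreeTagger can emit when asked for tokens and lemmas. The `Token` type, the function name, and the sample tags are illustrative assumptions, and CATCG's additional syntactic-function column is not modelled.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    form: str
    pos: str
    lemma: str


def parse_tagger_output(text: str) -> List[Token]:
    """Parse 'form<TAB>POS<TAB>lemma' lines into Token tuples."""
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 3:
            raise ValueError(f"expected 3 tab-separated fields, got: {line!r}")
        tokens.append(Token(*parts))
    return tokens


# Illustrative input only; the tags are not taken from the presentation.
sample = "The\tDT\tthe\nhouses\tNNS\thouse\n"
for tok in parse_tagger_output(sample):
    print(tok.form, tok.pos, tok.lemma)
```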

10
The search engine (I)
Text to corpus: Corpus Workbench (CWB) tools (IMS Stuttgart; Christ 1994)
Corpus query tool: CQP (Corpus Query Processor), from the CWB
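CWB reads corpora as "verticalized" text: one token per line with tab-separated annotations, and XML-style tags on their own lines for structural units. The sketch below shows how tagged sentences might be rendered in that layout before being handed to the CWB encoding tools. The function name, the `text` and `s` tag names, and the `register` attribute are assumptions for illustration, not BancTrad's actual configuration.

```python
from xml.sax.saxutils import quoteattr


def to_vertical(sentences, text_attrs):
    """Render tagged sentences as one-token-per-line text with XML-style tags.

    sentences:  list of sentences, each a list of (form, pos, lemma) tuples
    text_attrs: dict of text-level metadata, e.g. {"register": "standard"}
    """
    attrs = "".join(
        f" {name}={quoteattr(value)}" for name, value in text_attrs.items()
    )
    lines = [f"<text{attrs}>"]
    for sentence in sentences:
        lines.append("<s>")
        for form, pos, lemma in sentence:
            # One token per line, annotations separated by tabs.
            lines.append("\t".join((form, pos, lemma)))
        lines.append("</s>")
    lines.append("</text>")
    return "\n".join(lines) + "\n"


print(to_vertical([[("The", "DT", "the"), ("houses", "NNS", "house")]],
                  {"register": "standard"}))
```

Carrying the extra-linguistic features as attributes of the text-level tag is one way to make them available as search restrictions later; whether BancTrad does it this way is not stated in the slides.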
11
The search engine (II)
Query routing through the client/server
architecture (query from left to right, results
the other way round)
12
Search possibilities
Three levels of expertise:
a) Basic mode: sequences of specific word forms
b) Intermediate mode: sequences of five words, possibly with
annotations: form, lemma, morphosyntactic tag, and syntactic function
c) Expert mode: queries expressed in the syntax of CQP (using regular
expressions)
http://glotis.upf.es
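The basic and intermediate modes can be thought of as form front ends that generate the kind of CQP query an expert would type directly. A minimal sketch of that translation follows. The attribute names (`word`, `lemma`, `pos`, `func`) and the function names are assumptions, since a real corpus defines its own attributes, and this is not the code behind the BancTrad interface.

```python
# Attribute names are assumptions; a real corpus defines its own.
ALLOWED_ATTRS = ("word", "lemma", "pos", "func")
MAX_TOKENS = 5  # the intermediate mode covers sequences of five words


def token_pattern(constraints):
    """Build one CQP token expression, e.g. [lemma="be" & pos="V.*"].

    Values are passed through unchanged, so CQP treats them as regular
    expressions.
    """
    if not constraints:
        return "[]"  # in CQP, an empty pair of brackets matches any token
    parts = []
    for attr, value in constraints.items():
        if attr not in ALLOWED_ATTRS:
            raise ValueError(f"unknown attribute: {attr}")
        if '"' in value:
            raise ValueError("double quotes are not supported in this sketch")
        parts.append(f'{attr}="{value}"')
    return "[" + " & ".join(parts) + "]"


def build_query(tokens):
    """Join up to MAX_TOKENS token expressions into one CQP sequence query."""
    if not 1 <= len(tokens) <= MAX_TOKENS:
        raise ValueError(f"expected 1 to {MAX_TOKENS} token slots")
    return " ".join(token_pattern(t) for t in tokens)


# Intermediate mode: a lemma, then any token, then any noun-like tag.
print(build_query([{"lemma": "tener"}, {}, {"pos": "N.*"}]))
# Basic mode: a plain sequence of word forms.
print(build_query([{"word": w} for w in "la casa".split()]))
```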
13
Applications
  • Teaching
  • Translating
  • Doing research (linguistics, translation theory)
  • Creating further language resources
      • multilingual dictionaries
      • chunkers
      • stochastic-based machine translation systems

14
References
Badia, T., G. Boleda, M. Quixal, E. Bofias (2001) "A modular architecture for the processing of free text", in Proceedings of the Workshop on Modular Programming applied to NLP at EUROLAN 2001, Iasi.
Christ, Oliver (1994) "A modular and flexible architecture for an integrated corpus query system", COMPLEX'94, Budapest. http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench
Schmid, Helmut (1997) "Probabilistic Part-of-Speech Tagging Using Decision Trees", in Daniel Jones and Harold Somers, editors, New Methods in Language Processing, Studies in Computational Linguistics, UCL Press, London, pp. 154-164. http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html