Multi-lingual - PowerPoint PPT Presentation

Slides: 10
Provided by: nikiv
1
Multi-lingual multi-institutional distant
learning
  • Example of an international master programme in
    Computational Linguistics
  • 14-16 November, Blaubeuren, Germany

Nikolai Vazov, Sofia University
Kiril Simov, LML - BAS, BulTreeBank project
Petya Ossenova, Sofia University, BulTreeBank project
2
General goals of the programme
  • to put together linguistics (linguists) and
    computer technologies (computer scientists)
  • to put together foreign and local expertise
  • to promote international multi-lingual
    cooperation in order to develop multi-lingual
    language electronic resources

3
Programme participants (years 1-2)
  • Two project management partners
    • French Ministry of Foreign Affairs
    • French Cultural Institute in Sofia
  • Two academic partners
    • University of Sofia (2 departments)
    • University of Paris IV - Sorbonne

4
Programme participants (year 3)
  • Three project management partners
    • French Ministry of Foreign Affairs
    • French Cultural Institute in Sofia
    • Agence Universitaire de la Francophonie
  • Six academic partners
    • University of Sofia (3 departments)
    • University of Paris IV - Sorbonne
    • LML - Bulgarian Academy of Sciences
    • University of Montréal (RALI, OLST)
    • University of Iasi (Romania)
    • RACAI (Romania)

5
Organisation (educational activities)
  • Foreign participants
    • 1-3 intensive teaching sessions
    • remote follow-up between the sessions
    • remote examination
  • Local participants
    • successive modules (1-3 weeks each) accompanied
      by web-based courses
    • remote tutoring after the course: individual
      work with students (the ratio of 8 students to
      15 professors allows for it)
    • on-line personal library for each student
      (articles to read and discuss with the other
      participants)
    • remote examination

6
Organisation (research activities)
  • Carried out as individual tasks with a twofold
    impact:
    • development of personal skills in manipulating
      electronic text data (using CLaRK, Perl, MySQL,
      XML, HTML)
    • integration of individual tasks into the main
      goal of the team: creation of mono- and
      multi-lingual electronic resources

7
Organisation (research activities)
  • Examples
    • writing tokenizers for French (solved in
      TreeTagger)
    • sentence boundary identification (not entirely
      handled by TreeTagger, but indispensable for
      parallel corpora)
    • named entity recognition
    • temporal expression extraction
    • abbreviation identification
    • parenthetical expression identification
    • concordances (Bulgarian, French)
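
Two of the example tasks above, sentence boundary identification and abbreviation identification, interact: a full stop after an abbreviation should not end a sentence. The following is a minimal sketch of that idea, not the team's implementation (the slides name CLaRK and Perl as the tools actually used). The function name `split_sentences` and the tiny `ABBREVIATIONS` list are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (lower-cased, with the dot).
# A real list for French would be far larger.
ABBREVIATIONS = {"m.", "dr.", "etc.", "cf.", "p."}


def split_sentences(text):
    """Split after ., ! or ? unless the token ending there is a known abbreviation."""
    sentences = []
    start = 0
    # Candidate boundary: terminal punctuation, whitespace, then an uppercase letter.
    for match in re.finditer(r"[.!?]\s+(?=[A-ZÀ-ÖØ-Þ])", text):
        end = match.start() + 1  # index just past the punctuation mark
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # the dot belongs to an abbreviation, not a sentence end
        sentences.append(text[start:end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = ("M. Dupont est arrivé hier. Il a rencontré le Dr. Martin "
              "à Sofia. Ils ont parlé du corpus.")
    for sentence in split_sentences(sample):
        print(sentence)
```

A rule of this kind only covers the easy cases; an abbreviation that genuinely ends a sentence would be missed, which is one reason the task is listed as not entirely solved.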

8
On-line resources and tools
Developed by the team
  • CLaRK system
  • Morphological dictionary for Bulgarian
  • Large tagged corpus of Bulgarian
  • Concordances (French, Bulgarian)
  • Temporal expressions extractor (French)
  • Large archive of bilingual (French-Bulgarian)
    texts
Other available resources
  • Large tagged corpus FRANTEXT
  • Large bilingual (French-English) aligned corpus
    Hansard
  • Taggers (TreeTagger and LATL)
  • Le Monde sur CD-ROM (with integrated search
    engine)
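
A concordance of the kind listed among the tools shows every occurrence of a word together with its left and right context (keyword in context, KWIC). The sketch below illustrates the general technique only; it is not the team's concordancer, and the function name `kwic` and its `width` parameter are hypothetical.

```python
import re


def kwic(text, keyword, width=30):
    """Return one aligned line per occurrence: left context, keyword, right context."""
    lines = []
    # Whole-word, case-insensitive match; works for Latin and Cyrillic text alike.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Pad both contexts to a fixed width so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


if __name__ == "__main__":
    sample = "Le corpus est grand. Ce corpus est bilingue."
    for line in kwic(sample, "corpus", width=15):
        print(line)
```

The same function can be run on either side of a parallel text; linking a French hit to its Bulgarian counterpart would additionally require sentence alignment, which this sketch does not attempt.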

9
Future work
  • New (better targeted) master: "Electronic
    language resources"
  • Goals of the master defined the other way around:
    research needs determine the course content and
    not vice versa
  • Envisaged product: parallel French-Bulgarian
    corpus with named entity identification
  • So far: collection of parallel texts, development
    of a proper-name transcription module