Title: Nicoletta Calzolari
1CLARIN and FLaReNet: new European Initiatives
for Language Resources and Language Technologies
- Nicoletta Calzolari
- Istituto di Linguistica Computazionale del CNR, Pisa, Italy
- glottolo_at_ilc.cnr.it
2Today, many vitality and success signs for LRs
- In Spoken, Written, Multimodal areas and in new emerging areas
- Statistical approaches
- Different dimensions and layers: Content (Ontologies), Emotion, Time, ...
- For Evaluation
- For Training
- LREC (> 900 submissions), many LRs at COLING and even at ACL!!
- ELRA (self-sustaining), LDC
- LRE (new Journal, N. Ide and NC)
- ISO-TC37-SC4/WG4 (International Standards for LRs)
- AFNLP
- ESFRI, CLARIN (also political and strategic role)
- New calls or initiatives in EU, US, ASIA, on LRs, interoperability, cooperation, ...
3BUT an important point
- In the 90s
- There was a global vision of the field and its main components
- Standards
- Creation of LRs
- Distribution
- Then
- Automatic acquisition
towards the Infrastructure of LRs and LT
ELRA
LDC
- While today
- There is an ever increasing set of initiatives for new LRs, basic robust technologies, models??, algorithms, ...
- We have a LR community culture
- BUT sort of scattered, opportunistic, not much coherence
4Today
- The wealth of data and of basic technologies is such that
- We should reflect again on the field as a whole and ask if
- Standards
- Creation of LRs
- Automatic acquisition
- Distribution
- are still the important components,
- or how they have changed/must change
- Collaborative creation and management
could be at the basis of a new Paradigm for LRs and LT and of a new Infrastructure??
Which new challenges towards a new and more mature infrastructure of LRs and LTs??
5ISO LMF: Lexical Markup Framework
Builds also on EAGLES/ISLE
Structural skeleton, with the basic hierarchy of information in a lexical entry
plus various extensions
LMF specs comply with UML modeling principles; an XML DTD allows implementation
NEDO Asian Lang.
The field is mature
NICT Language-Grid Service Ontology
from Monica Monachini
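The "structural skeleton" idea can be illustrated with a minimal sketch that builds one lexical entry as XML. The element names (LexicalResource, Lexicon, LexicalEntry, Lemma, Sense, feat with att/val pairs) follow the way the LMF core package is commonly serialised; the helper names, the lemma "strada" and the sense identifier are assumptions made only for illustration, not content from the slides.

```python
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Attach an LMF-style attribute/value pair to an element."""
    return ET.SubElement(parent, "feat", {"att": att, "val": val})


def build_entry(lemma, pos, sense_ids):
    """Build a LexicalEntry holding a Lemma and one Sense per identifier."""
    entry = ET.Element("LexicalEntry")
    feat(entry, "partOfSpeech", pos)
    lemma_el = ET.SubElement(entry, "Lemma")
    feat(lemma_el, "writtenForm", lemma)
    for sense_id in sense_ids:
        ET.SubElement(entry, "Sense", {"id": sense_id})
    return entry


# Hypothetical one-entry Italian lexicon (values are illustrative only).
resource = ET.Element("LexicalResource")
lexicon = ET.SubElement(resource, "Lexicon")
feat(lexicon, "language", "it")
lexicon.append(build_entry("strada", "noun", ["strada_1"]))

xml_text = ET.tostring(resource, encoding="unicode")
print(xml_text)
```

Extensions (morphology, syntax, semantics, multilingual notations) would add further child elements under the same skeleton; they are left out here to keep the sketch small.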
6XML based Abstract Lexicon Interchange Format
Mapping exercise
Entries from existing lexicons have been mapped
to LMF to prove that the model is able to
represent many best practices and achieve
unification
- Major best practices
- OLIF
- PAROLE/SIMPLE
- LC-Star
- WordNet, EuroWordNet
- FrameNet
- BDef: formal database of lexicographic definitions derived from the Explanatory Dictionary of Contemporary French
- others on the way
from Monica Monachini
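As a rough illustration of what such a mapping exercise involves, the sketch below converts one wordnet-style synset record into LMF-like lexical entries whose senses point to a shared synset. The input record layout, the class names and the sense identifier scheme are assumptions for this example; the actual mappings for OLIF, PAROLE/SIMPLE and the other lexicons are not described in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    id: str
    synset: str  # identifier of the synset shared by all synonyms


@dataclass
class LexicalEntry:
    lemma: str
    part_of_speech: str
    senses: list = field(default_factory=list)


def wordnet_to_lmf(record):
    """Map one wordnet-style synset record to LMF-like lexical entries."""
    entries = []
    for lemma in record["lemmas"]:
        sense = Sense(id=f"{lemma}_{record['synset_id']}",
                      synset=record["synset_id"])
        entries.append(LexicalEntry(lemma, record["pos"], [sense]))
    return entries


# Hypothetical source record; the record layout is an assumption.
record = {"synset_id": "N1290",
          "lemmas": ["passaggio", "strada", "via"],
          "pos": "n"}
entries = wordnet_to_lmf(record)
for entry in entries:
    print(entry.lemma, entry.senses[0].id, entry.senses[0].synset)
```

The design point is that a synonym set becomes several entries that share one synset reference, so the same target model can also hold lexicons that are organised by lemma rather than by concept.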
7Lexical WEB Content Interoperability ?
Standards
- As a critical step for semantic mark-up in the
SemWeb
[Figure labels: WordNets, NomLex, ComLex, SIMPLE, LMF, FrameNet, Lex_x, Lex_y; with intelligent agents]
Standards for Interoperability
Enough??
8Need of tools to make this vision operational and concrete
- New prototype: LeXFlow
- (http://xmlgroup.iit.cnr.it:98/MILE/lexflow/demo.xhtml)
- web-based collaborative environment for semi-automatic management/integration of lexical resources
- enabling interoperability of distributed lexical resources
- accessed by different types of agents
- From Language Resources
- To Language Services
9Architecture for cooperative integration of lexicons
[Figure: architecture diagram. Labels: Agents (Role1, Role2, Role3, Role4); Coordination; Web service Interface; Application (Simple-Wordnet Relation Calculator, MultiWordnet Relation Calculator); Web service Interface; Italian Simple, Italian Wordnet, Chinese Wordnet, ILI Mapper, Relation Mapper; Data]
10[Figure: Italian and Chinese wordnet synsets linked through the ILI; labels follow in extraction order]
parte, tratto N12348
hypernymy/HYP
A new proposed mero relation
passaggio, strada, via N1290
meronymy/MPT
curvatura, svolta, curva N20944
hyponymy/HPO
carreggiata N21225
Synonym
Derived
ILI1.5-3001757-n road, route ILI1.6-3243979-n
ILI1.5-8488101-n bend, crook, turn ILI1.6-9992072-n
ILI1.5-2857000-n passage ILI1.6-3092396-n
ILI1.5-5691718-n stretch ILI1.6-???
ILI1.5-3002522-n roadway ILI1.6-3245327-n
Synonym
Reinforcement validity
tong_dao N03092396
/HYP
che_dao N3245327
dao_lu, dao, lu N03243979
/HPO
wan N9992072
/MPT
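The figure data can be turned into a small sketch of relation projection through the ILI. The ILI 1.5 to 1.6 rows and the Chinese synset identifiers are taken from the slide (with the "N" prefix and leading zeros normalised). The three Italian relation triples are an assumption: they are one plausible reading of the figure, with "strada" aligned to "road, route", and the function name is hypothetical.

```python
# ILI 1.5 -> ILI 1.6 rows as listed on the slide; None marks the row
# for which the slide gives no ILI 1.6 identifier.
ILI_15_TO_16 = {
    "3001757-n": "3243979-n",  # road, route
    "8488101-n": "9992072-n",  # bend, crook, turn
    "2857000-n": "3092396-n",  # passage
    "5691718-n": None,         # stretch
    "3002522-n": "3245327-n",  # roadway
}

# Chinese synsets from the slide, keyed by ILI 1.6 identifier.
CHINESE = {
    "3243979-n": "dao_lu, dao, lu",
    "3092396-n": "tong_dao",
    "3245327-n": "che_dao",
    "9992072-n": "wan",
}

# ASSUMPTION: relations read off the Italian side of the figure,
# expressed over ILI 1.5 identifiers (source, relation, target).
ITALIAN_RELATIONS = [
    ("3001757-n", "HYP", "5691718-n"),  # strada -> parte, tratto
    ("3001757-n", "MPT", "8488101-n"),  # strada -> curva
    ("3001757-n", "HPO", "3002522-n"),  # strada -> carreggiata
]


def project_relations(relations):
    """Carry relations over to Chinese synsets when both ends map."""
    projected, skipped = [], []
    for source, relation, target in relations:
        source_16 = ILI_15_TO_16.get(source)
        target_16 = ILI_15_TO_16.get(target)
        if source_16 in CHINESE and target_16 in CHINESE:
            projected.append((CHINESE[source_16], relation, CHINESE[target_16]))
        else:
            skipped.append((source, relation, target))
    return projected, skipped


projected, skipped = project_relations(ITALIAN_RELATIONS)
print(projected)
print(skipped)
```

A projected relation that already exists in the target wordnet would count as reinforcement of validity, while one that is missing would be a newly proposed relation. A relation whose target has no ILI 1.6 counterpart, like the "stretch" row, cannot be projected and is reported separately.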
11LexFlow
- Architecture for making distributed wordnets interoperable
- It lends itself to different applications in LR processing
- Enrichment of existing lexical resources
- Creation of new resources
- Validation of existing resources
- Can provide a platform for cooperative and collective creation and management of LRs, by providing a web-based environment for the collaboration and interaction of distributed agents and resources
- Prototype of a web application supporting the GlobalWordNet Grid initiative, i.e. a shared multi-lingual knowledge base for cross-lingual processing based on distributed resources over the Grid
New project: KYOTO
12Some steps for a new generation of LRs
- From huge efforts in building static, large-scale, general-purpose LRs
- To non-static LRs rapidly built on-demand, tailored to specific user needs
- From closed, locally developed and centralized resources
- To LRs residing over distributed places, accessible on the web, choreographed by agents acting over them
- From Language Resources
- To Language Services
13UIMA at ILC
- Create an infrastructure to allow
- Distributed access to resources
- Creation of shared resources
- Use of methods to access NLP technologies
- Integrate available software via Web Services
- Standardise resources to be accessed from other
research centers
14Distributed Language Services
- A long-term scenario implying
- content interoperability standards,
- supra-national cooperation and
- development of architectures enabling accessibility
- Create new resources on the basis of existing ones
- Exchange and integrate information across repositories
- Compose new services on demand
- Collaborative and collective/social development and validation, cross-resource integration and exchange of information
Language Grid
Wiki
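"Compose new services on demand" can be pictured with a toy sketch in which each language service is a callable and a new service is obtained by chaining existing ones. Everything here is an assumption for illustration: the local functions stand in for remote web services, and the two-word lexicon is invented.

```python
from functools import reduce


def compose(*services):
    """Chain services left to right into one new callable service."""
    return lambda data: reduce(lambda acc, service: service(acc), services, data)


def tokenize(text):
    """Stand-in for a remote tokenisation service."""
    return text.lower().split()


# Toy bilingual lexicon standing in for a distributed lexical resource.
TOY_LEXICON = {"strada": "road", "curva": "bend"}


def lookup(tokens):
    """Stand-in for a lexicon access service: pair tokens with glosses."""
    return [(token, TOY_LEXICON.get(token)) for token in tokens]


def keep_known(pairs):
    """Drop tokens for which the lexicon has no gloss."""
    return [(token, gloss) for token, gloss in pairs if gloss is not None]


# A new "gloss" service composed on demand from three existing ones.
gloss_service = compose(tokenize, lookup, keep_known)
print(gloss_service("La strada ha una curva"))
```

In a distributed setting each step would be a call to a service hosted at a different centre, which is why content interoperability standards matter: the output of one service has to be a valid input for the next.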
15Many dimensions around the notion of language
finally
- We need to put together
- technical,
- organisational,
- strategic,
- economic,
- political issues of LRs
Two new European Infrastructural Networking
Initiatives
Multilingualism
Political issues, e.g. a commonly agreed list of minimal requirements for national LRs (BLARK)
Need of bodies for a broad research agenda and strategic actions for LT and LRs (W/S/MM) based on all the dimensions
Interdisciplinarity, Multidisciplinarity
- Cultural issues
- Language and cultural identity
- Language and the Humanities
- Economic, social issues
- Applications
- Services
Technical issues
16Which Communities?
Technologies exist, but the infrastructure that
puts them together and sustains them is still
missing
for
- Humanities
- Social Sciences
- Digital Libraries
- Cultural Heritage
core
- Language Resources
- Language Technologies
- Standardisation
Enabling infrastr
CLARIN ResInfra
FLaReNet Network
Multilinguality
on
- Grid
- Semantic Web
- Ontologists
- ICT
Focus on cooperation
- Many application domains
- (eculture, egovernment, ehealth, ...)
for
17CLARIN
ESFRI Research Infrastructures
Common Language Resources and Technologies Infrastructure for the Humanities and Social Sciences
- Large-scale pan-European collaborative effort (31 countries)
- Make LRs and LTs available and readily usable to scholars of humanities and social sciences (all disciplines)
- Need to overcome the present fragmented situation by harmonising structural and terminological differences
- Basis is a Grid-type infrastructure and Semantic Web technology
- The benefits of computer enhanced language processing become available only when a critical mass of coordinated effort is invested in building an enabling infrastructure, which can provide services in the form of provision of tools and resources as well as training and counseling across a wide span of domains
- The infrastructure will be based on a number of resource, service and expertise centres
18CLARIN Mission
- Create a comprehensive and free-to-use distributed archive of LRs and LTs covering not only the languages of all member states, but also other languages studied and used in Europe
- Through the fact that the tools and resources will be interoperable across languages and domains, contribute to preserving and supporting the multilingual and multicultural European heritage
- An operational open infrastructure of web services will introduce a new paradigm of distributed collaborative development
- Allow many contributors to add all kinds of new services based on existing ones, thus ensuring reusability and allowing scaling up to suit individual needs
19How can we tackle these challenges?
- J. Taylor: eScience is about global collaboration in key areas of science and the next generation of infrastructures that will enable it
- Need to build new types of platforms
- to allow researchers to combine existing resources easily into new ones to tackle the big challenges
- to increase the productivity of all interested researchers, since currently too much time is wasted on preparatory work
from P. Wittenburg
20- CLARIN establishes such a new generation of extended infrastructure
- Thus CLARIN is not about creating and building new language resources and technology, but
- making them available and accessible
- as services
- in a stable and persistent infrastructure
- to allow tackling the great challenges
- CLARIN: http://www.clarin.eu
- Grid Project: http://www.mpi.nl/dam-lr
- ISO TC37/SC4: http://www.tc37sc4.org
- Standards Project: http://lirics.loria.fr/
from P. Wittenburg
21We still have a long path
also a new project
- in an e-Contentplus Call for a Thematic Network on Language Resources
- FLaReNet
- To provide common recommendations (to the EC) for future actions
- To give priorities
- Need of visions
In a global context, in cooperation with CLARIN and also with non-EU members
22Which Communities?
LRs and LTs exist, but a global vision, policy and strategy are still missing
for
- Humanities
- Social Sciences
- Digital Libraries
- Cultural Heritage
core
- Language Resources
- Language Technologies
- Standardisation
- Ontologists
- Content
CLARIN ResInf
EU Forum
FLaReNet Network
Multilinguality
Focus on cooperation
for
- Many application domains
- (eculture, egovernment, ehealth, intelligence, domotics, content industry, ...)
for
23FLaReNet: Fostering Language Resources Network
- A European forum
- to facilitate interaction among LR stakeholders
- The Network structure considers that LRs present various dimensions and must be approached from many perspectives
- technical, but also
- organisational
- economic
- legal
- political
- Addresses also
- multicultural and multilingual aspects, essential when facing access and use of digital content in today's Europe
24Organised in Thematic Working Groups
- A layered structure, with leading experts and groups (national and European institutions, SMEs, large companies) for all relevant LR areas (about 40 partners)
- in collaboration with CLARIN
- to ensure coherence of LR-related efforts in Europe
- FLaReNet will
- consolidate existing knowledge, presenting it analytically and visibly
- contribute to structuring the area of LRs of the future by discussing new strategies to
- convert existing and experimental technologies related to LRs into useful economic and societal benefits
- integrate so far partial solutions into broader infrastructures
- consolidate areas mature enough for recommendation of best practices
- anticipate the needs of new types of LRs
25Thematic Areas
- The Chart for the area of LRs in its different dimensions
- Methods and models for LR building, reuse, interlinking and maintenance
- Harmonisation of formats and standards
- Definition of evaluation protocols and evaluation procedures
- Methods for the automatic construction and processing of LRs
- To build together
- Evolving RoadMap
- Blueprint of actions and infrastructures
26Objectives and expected results
- The largest Network of LR and HLT players, with diverse approaches, efforts and technologies
- Enable progress toward community consensus
- Give an extended picture of LRs and recast its definition in the light of recent scientific, methodological, technological, social developments
- Consolidate methods and approaches, common practices, frameworks and architectures
- A roadmap identifying areas where consensus has been achieved or is emerging vs. areas where additional discussion and testing is required, together with an indication of priorities
- Recommendations in the form of a plan of coherent actions for the EU and national organizations
- A European model for the LRs of the next years
Ambitious!
27Outcomes of FLaReNet
- The outcomes will be of a directive nature
- to help the EC and national funding agencies identify priority areas of LRs of major interest for the public that need public funding to develop or improve
- A blueprint of actions will constitute input to policy development both at EU and national level
- for identifying new language policies that support linguistic diversity in Europe
- in combination with strengthening the language product market, e.g. for new products and innovative services, especially for less technologically advanced languages
28These Initiatives, together
- Call for international cooperation also outside Europe
- and will be relevant for
- setting up a global worldwide Forum of Language Resources and Language Technologies