Nicoletta Calzolari - PowerPoint PPT Presentation

1
CLARIN and FLaReNet: new European Initiatives
for Language Resources and Language Technologies
  • Nicoletta Calzolari
  • Istituto di Linguistica Computazionale del CNR,
    Pisa, Italy
  • glottolo_at_ilc.cnr.it

2
Today, many signs of vitality & success for LRs
  • In Spoken, Written, Multimodal areas & in
    new emerging areas
  • Statistical approaches
  • Different dimensions & layers: Content
    (Ontologies), Emotion, Time, ...
  • For Evaluation
  • For Training
  • LREC (> 900 submissions), many LRs at COLING and
    even at ACL!!
  • ELRA (self-sustaining) & LDC
  • LRE (new Journal, N. Ide & NC)
  • ISO-TC37-SC4/WG4 (International Standards for
    LRs)
  • AFNLP
  • ESFRI - CLARIN (also political strategic role)
  • New calls or initiatives in EU, US, Asia on LRs,
    interoperability, cooperation, ...

3
BUT an important point
  • In the 90s
  • There was a global vision of the field & its main
    components
  • Standards
  • Creation of LRs
  • Distribution
  • Then
  • Automatic acquisition

towards the Infrastructure of LRs & LT
(ELRA, LDC)
  • While today
  • There is an ever-increasing set of initiatives
    for new LRs, basic robust technologies, models??,
    algorithms, ...
  • We have an LR community culture
  • BUT it is somewhat scattered, opportunistic, with
    little coherence

4
Today
  • The wealth of data & basic technologies is
    such that
  • We should reflect again on the field as a whole &
    ask whether
  • Standards
  • Creation of LRs
  • Automatic acquisition
  • Distribution
  • are still the important components,
  • or how they have changed/must change
  • Content interoperability
  • Collaborative creation & management
  • Dynamic LRs
  • Sharing

could be at the basis of a new Paradigm for LRs &
LT, of a new Infrastructure??
Which new challenges towards a new, more
mature infrastructure of LRs & LTs??
5
ISO LMF: Lexical Markup Framework
Also builds on EAGLES/ISLE
Structural skeleton, with the basic hierarchy of
information in a lexical entry, plus
various extensions
LMF specs comply with UML modeling principles; an
XML DTD allows implementation
NEDO (Asian languages)
The field is mature
NICT Language-Grid Service Ontology
from Monica Monachini
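The slide describes LMF only as a structural skeleton with an XML serialization, so a small sketch may help make the entry hierarchy concrete. The Python below uses `xml.etree.ElementTree` to build a minimal tree: a LexicalResource containing a Lexicon, whose LexicalEntry holds a Lemma and one Sense per sense id. Data categories are attached as `feat` elements with `att`/`val` attributes. The element names follow the LMF core model as generally described, but this is a simplified sketch: it omits other parts of the model and has not been validated against the official LMF DTD. The entry `carreggiata` and the sense id `carreggiata_1` are hypothetical examples.

```python
import xml.etree.ElementTree as ET


def feat(parent: ET.Element, att: str, val: str) -> ET.Element:
    """Attach an attribute-value pair as <feat att="..." val="..."/>."""
    return ET.SubElement(parent, "feat", att=att, val=val)


def lexical_entry(lemma: str, pos: str, sense_ids: list[str]) -> ET.Element:
    """Minimal LexicalEntry: part of speech, one Lemma, one Sense per id."""
    entry = ET.Element("LexicalEntry")
    feat(entry, "partOfSpeech", pos)
    lemma_el = ET.SubElement(entry, "Lemma")
    feat(lemma_el, "writtenForm", lemma)
    for sense_id in sense_ids:
        ET.SubElement(entry, "Sense", id=sense_id)
    return entry


def lexical_resource(language: str, entries: list[ET.Element]) -> ET.Element:
    """Wrap the entries in a single Lexicon inside a LexicalResource."""
    resource = ET.Element("LexicalResource")
    lexicon = ET.SubElement(resource, "Lexicon")
    feat(lexicon, "language", language)
    lexicon.extend(entries)
    return resource


if __name__ == "__main__":
    # Hypothetical Italian entry; the sense id is illustrative only.
    root = lexical_resource(
        "it", [lexical_entry("carreggiata", "noun", ["carreggiata_1"])]
    )
    print(ET.tostring(root, encoding="unicode"))
```

Because every lexicon in this sketch shares the same nesting, an entry can be navigated with the same path (for example `Lexicon/LexicalEntry/Lemma/feat`) whatever its source. That shared structure is the point of mapping heterogeneous lexicons onto one skeleton.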
6
XML-based Abstract Lexicon Interchange Format
Mapping exercise
Entries from existing lexicons have been mapped
to LMF to prove that the model is able to
represent many best practices and achieve
unification
  • Major best practices
  • OLIF
  • PAROLE/SIMPLE
  • LC-Star
  • WordNet - EuroWordNet
  • FrameNet
  • BDef: formal database of lexicographic definitions
    derived from the Explanatory Dictionary of
    Contemporary French
  • others on the way

from Monica Monachini
7
Lexical Web: Content Interoperability?
Standards
  • As a critical step for semantic mark-up in the
    SemWeb

[Diagram: lexicons (WordNets, NomLex, ComLex, SIMPLE,
FrameNet, Lex_x, Lex_y) interconnected through LMF,
with intelligent agents]
Standards for Interoperability:
Enough??
8
Need for tools to make this vision operational &
concrete
  • New prototype LeXFlow
  • (http://xmlgroup.iit.cnr.it98/MILE/lexflow/demo.xhtml)
  • web-based collaborative environment for
    semi-automatic management/integration of lexical
    resources
  • enabling interoperability of distributed lexical
    resources
  • accessed by different types of agents
  • From Language Resources
  • To Language Services

9
Architecture for cooperative integration of
lexicons
[Diagram: agents in Role1 to Role4 under Coordination;
Web service interfaces; Application: Simple-Wordnet
Relation Calculator, MultiWordnet Relation Calculator;
Data: Italian Simple, Italian Wordnet, Chinese Wordnet;
also ILI Mapper, Relation Mapper]
10
[Diagram: example across wordnets]
Italian Wordnet synsets: parte, tratto (N12348);
passaggio, strada, via (N1290); curvatura, svolta,
curva (N20944); carreggiata (N21225)
Relations: iperonimia/HYP, meronymy/MPT (a new
proposed mero relation), iponimia/HPO; links:
Synonym, Derived
ILI mapping, ILI 1.5 → ILI 1.6:
  ILI1.5-3001757-n road, route → ILI1.6-3243979-n
  ILI1.5-8488101-n bend, crook, turn → ILI1.6-9992072-n
  ILI1.5-2857000-n passage → ILI1.6-3092396-n
  ILI1.5-5691718-n stretch → ILI1.6-???
  ILI1.5-3002522-n roadway → ILI1.6-3245327-n
Chinese Wordnet synsets (characters lost in
extraction): tong_dao (N03092396); che_dao
(N3245327); dao_lu, dao, lu (N03243979); wan
(N9992072)
Relations: HYP, HPO, MPT; links: Synonym
Reinforcement of validity
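One way to read this example is as a relation-calculator step. A relation between two synsets in one wordnet is carried over to another wordnet through the ILI, after updating ILI 1.5 offsets to ILI 1.6. The Python sketch below assumes that reading. The ILI 1.5 → 1.6 offset pairs are copied from the slide. The links from wordnet synsets to ILI records and the relations themselves are hypothetical toy data, and the function `carry_over` is an illustration, not LeXFlow's actual implementation. A relation already present in the target wordnet counts as reinforcing its validity, and one that is missing is returned as a proposal.

```python
from dataclasses import dataclass

# ILI 1.5 -> ILI 1.6 offsets for noun synsets, as listed on the slide.
# "stretch" (5691718) has no 1.6 offset there ("???"), so it is left out.
ILI_15_TO_16 = {
    "3001757": "3243979",  # road, route
    "8488101": "9992072",  # bend, crook, turn
    "2857000": "3092396",  # passage
    "3002522": "3245327",  # roadway
}


@dataclass(frozen=True)
class Relation:
    source: str  # synset id within one wordnet
    label: str   # e.g. "HYP", "HPO", "MPT"
    target: str


def carry_over(relations_a, ili15_of_a, synset_of_ili16_b, relations_b):
    """Map each relation of wordnet A onto wordnet B through the ILI.

    Returns (confirmed, proposed): relations already in B, which reinforce
    the validity of A's relation, and relations B lacks, which are
    candidates to add to B.
    """
    known_b = set(relations_b)
    confirmed, proposed = [], []
    for rel in relations_a:
        # A synset -> ILI 1.5 -> ILI 1.6; yields None if any link is missing.
        ili_src = ILI_15_TO_16.get(ili15_of_a.get(rel.source, ""))
        ili_tgt = ILI_15_TO_16.get(ili15_of_a.get(rel.target, ""))
        src_b = synset_of_ili16_b.get(ili_src)
        tgt_b = synset_of_ili16_b.get(ili_tgt)
        if src_b is None or tgt_b is None:
            continue  # no equivalent synset in B for one end of the relation
        candidate = Relation(src_b, rel.label, tgt_b)
        (confirmed if candidate in known_b else proposed).append(candidate)
    return confirmed, proposed


# Hypothetical toy data: the synset ids appear on the slide, but these
# ILI links and relations are invented for illustration.
ITALIAN_TO_ILI15 = {"N1290": "3001757", "N20944": "8488101", "N21225": "3002522"}
CHINESE_BY_ILI16 = {"3243979": "N03243979", "9992072": "N9992072", "3245327": "N3245327"}
ITALIAN_RELATIONS = [Relation("N1290", "MPT", "N20944"), Relation("N1290", "HPO", "N21225")]
CHINESE_RELATIONS = [Relation("N03243979", "MPT", "N9992072")]

if __name__ == "__main__":
    confirmed, proposed = carry_over(
        ITALIAN_RELATIONS, ITALIAN_TO_ILI15, CHINESE_BY_ILI16, CHINESE_RELATIONS
    )
    print("confirmed:", confirmed)
    print("proposed:", proposed)
```

Relations whose synsets have no ILI link, or no counterpart in the other wordnet, are skipped rather than guessed. In this toy data the unmapped "stretch" record is one such case.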
11
LexFlow
  • Architecture for making distributed wordnets
    interoperable
  • It lends itself to different applications in LR
    processing
  • Enrichment of existing lexical resources
  • Creation of new resources
  • Validation of existing resources
  • Can provide a platform for the cooperative &
    collective creation & management of LRs, by
    providing a web-based environment for the
    collaboration & interaction of distributed agents
    and resources
  • Prototype of a web application supporting the
    GlobalWordNet Grid initiative, i.e. a shared
    multi-lingual knowledge base for cross-lingual
    processing based on distributed resources over
    the Grid

New project: KYOTO
12
Some steps for a new generation of LRs
  • From huge efforts in building static,
    large-scale, general-purpose LRs
  • To non-static LRs rapidly built on demand,
    tailored to specific user needs
  • From closed, locally developed and centralized
    resources
  • To LRs residing over distributed places,
    accessible on the web, choreographed by agents
    acting over them
  • From Language Resources
  • To Language Services

13
UIMA at ILC
  • Create an infrastructure to allow
  • Distributed access to resources
  • Creation of shared resources
  • Use of methods to access NLP technologies
  • Integrate available software via Web Services
  • Standardise resources so that they can be accessed
    by other research centers

14
Distributed Language Services
  • A long-term scenario implying
  • content interoperability standards,
  • supra-national cooperation and
  • development of architectures enabling
    accessibility
  • Create new resources on the basis of existing ones
  • Exchange and integrate information across
    repositories
  • Compose new services on demand
  • Collaborative collective/social development and
    validation, cross-resource integration and
    exchange of information

Language Grid
Wiki
15
Many dimensions around the notion of language
finally
  • We need to put together
  • technical,
  • organisational,
  • strategic,
  • economic,
  • political issues of LRs

Two new European Infrastructural Networking
Initiatives
Multilingualism
Political issues, e.g. a commonly agreed list of
minimal requirements for national LRs (BLARK)
Need for bodies for a broad research agenda &
strategic actions for LTs & LRs (W/S/MM), based on
all the dimensions
Interdisciplinarity & Multidisciplinarity
  • Cultural issues
  • Language and cultural identity
  • Language and the Humanities
  • Economic,
  • social issues
  • Applications
  • Services

Technical issues
16
Which Communities?
Technologies exist, but the infrastructure that
puts them together and sustains them is still
missing
for
  • Humanities
  • Social Sciences
  • Digital Libraries
  • Cultural Heritage

core
  • Language Resources
  • Language Technologies
  • Standardisation

Enabling infrastructure
CLARIN ResInfra
FLaReNet Network
Multilinguality
on
  • Grid
  • Semantic Web
  • Ontologists
  • ICT

Focus on cooperation
  • Many application domains
  • (e-culture, e-government, e-health, ...)

for
17
CLARIN
ESFRI Research Infrastructures
Common Language Resources and Technologies
Infrastructure for the Humanities & Social
Sciences
  • Large-scale pan-European collaborative effort
    (31 countries)
  • Make LRs & LTs available & readily usable to
    scholars of the humanities & social sciences (&
    all disciplines)
  • Need to overcome the present fragmented situation
    by harmonising structural and terminological
    differences
  • Basis is a Grid-type infrastructure and Semantic
    Web technology
  • The benefits of computer enhanced language
    processing become available only when a critical
    mass of coordinated effort is invested in
    building an enabling infrastructure, which can
    provide services in the form of tools &
    resources as well as training &
    counseling across a wide span of domains
  • The infrastructure will be based on a number of
    resource, service and expertise centres

18
CLARIN Mission
  • Create a comprehensive, free-to-use
    distributed archive of LRs & LTs covering not
    only the languages of all member states, but also
    other languages studied and used in Europe
  • Since the tools & resources will be
    interoperable across languages & domains,
    contribute to preserving and supporting the
    multilingual & multicultural European heritage
  • An operational open infrastructure of web
    services will introduce a new paradigm of
    distributed collaborative development
  • Allow many contributors to add all kinds of new
    services based on existing ones, thus ensuring
    reusability and allowing scaling up to suit
    individual needs

19
How can we tackle these challenges?
  • J. Taylor: eScience is about global collaboration
    in key areas of science and the next generation
    of infrastructures that will enable it
  • Need to build new types of platforms
  • to allow researchers to easily combine existing
    resources into new ones to tackle the big
    challenges
  • to increase the productivity of all interested
    researchers, since currently too much time is
    wasted on preparatory work

from P. Wittenburg
20
  • eScience Vision
  • CLARIN establishes such a new generation of
    extended infrastructure
  • Thus CLARIN is not about creating and building
    new language resources and technology, but
  • making them available and accessible
  • as services
  • in a stable and persistent infrastructure
  • to allow tackling the great challenges
  • CLARIN http://www.clarin.eu
  • Grid Project http://www.mpi.nl/dam-lr
  • ISO TC37/SC4 http://www.tc37sc4.org
  • Standards Project http://lirics.loria.fr/

from P. Wittenburg
21
We still have a long way to go
also a new project
  • in an e-Contentplus Call for a
  • Thematic Network on Language Resources
  • FLaReNet
  • To provide common recommendations (to the EC) for
    future actions
  • To give priorities
  • Need of visions

In a global context, in cooperation with CLARIN &
also with non-EU members
22
Which Communities?
LRs & LTs exist, but a global vision, policy and
strategy are still missing
for
  • Humanities
  • Social Sciences
  • Digital Libraries
  • Cultural Heritage

core
  • Language Resources
  • Language Technologies
  • Standardisation
  • Ontologists
  • Content

CLARIN ResInf
EU Forum
FLaReNet Network
Multilinguality
Focus on cooperation
for
  • EC
  • Funding agencies
  • Many application domains
  • (e-culture, e-government, e-health,
    intelligence, domotics, content industry, ...)

for
23
FLaReNet: Fostering Language Resources Network
  • A European forum
  • to facilitate interaction among LR stakeholders
  • The Network structure considers that LRs present
    various dimensions and must be approached from
    many perspectives
  • technical, but also
  • organisational
  • economic
  • legal
  • political
  • Addresses also
  • multicultural and multilingual aspects, essential
    when facing access and use of digital content in
    today's Europe

24
Organised in Thematic Working Groups
  • A layered structure, with leading expert
    groups (national and European institutions, SMEs,
    large companies) for all relevant LR areas (about
    40 partners)
  • in collaboration with CLARIN
  • to ensure coherence of LR-related efforts in
    Europe
  • FLaReNet will
  • consolidate existing knowledge, presenting it
    analytically and visibly
  • contribute to structuring the area of LRs of the
    future by discussing new strategies to
  • convert existing and experimental technologies
    related to LRs into useful economic and societal
    benefits
  • integrate so far partial solutions into broader
    infrastructures
  • consolidate areas mature enough for
    recommendation of best practices
  • anticipate the needs of new types of LRs

25
Thematic Areas
  • The Chart for the area of LRs in its different
    dimensions
  • Methods and models for LR building, reuse,
    interlinking and maintenance
  • Harmonisation of formats and standards
  • Definition of evaluation protocols and evaluation
    procedures
  • Methods for the automatic construction and
    processing of LRs
  • To build together
  • Evolving RoadMap
  • Blueprint of actions and infrastructures

26
Objectives & expected results
  • The largest Network of LR and HLT players, with
    diverse approaches, efforts and technologies
  • Enable progress toward community consensus
  • Give an extended picture of LRs & recast their
    definition in the light of recent scientific,
    methodological, technological, social
    developments
  • Consolidate methods & approaches, common
    practices, frameworks and architectures
  • A roadmap identifying areas where consensus has
    been achieved or is emerging vs. areas where
    additional discussion and testing is required,
    together with an indication of priorities
  • Recommendations in the form of a plan of coherent
    actions for the EU and national organizations
  • A European model for the LRs of the next years

Ambitious!
27
Outcomes of FLaReNet
  • The outcomes will be of a directive nature
  • to help the EC and national funding agencies
    identify priority areas of LRs of major
    interest for the public that need public funding
    to be developed or improved
  • A blueprint of actions will constitute input to
    policy development both at EU and national level
  • for identifying new language policies that
    support linguistic diversity in Europe
  • in combination with strengthening the language
    product market, e.g. for new products &
    innovative services, especially for less
    technologically advanced languages

28
These Initiatives, together
  • Call for international cooperation also outside
    Europe
  • and will be relevant for
  • setting up a global worldwide Forum of Language
    Resources and Language Technologies