Nicoletta Calzolari, ILC - CNR - Pisa, Italy

1
Nicoletta Calzolari, ILC - CNR - Pisa, Italy
Language Resources & Semantic Web
2
To make the Semantic Web a reality ...
  • need to tackle the twofold challenge of
  • content availability and
  • multilinguality
  • Natural convergence with HLT
  • multilingual semantic processing
  • ontologies
  • semantic-syntactic computational lexicons

3
Computational Multilingual Lexicons: an
essential component for the Semantic Web
  • Language - lexicons - are the gateway to
    knowledge
  • Semantic Web developers need repositories of
    words and terms, with knowledge of their relations
    in language use and their ontological classification.
  • The cost of adding this structured and
    machine-understandable lexical information can be
    one of the factors that delay the full deployment
    of the Semantic Web.
  • The effort of making available millions of
    words for dozens of languages is something that
    no small group is able to afford.
  • A radical shift in the lexical paradigm - whereby
    many participants add linguistic content
    descriptions in an open distributed lexical
    framework - is required to make the Web usable

4
Infrastructure of Language Resources...
...static
  • Semantic network: Euro-/ItalWordNet
  • Lexicons: PAROLE/SIMPLE/CLIPS
  • TreeBank
  • sw

International Standards
But they will never be complete
dynamic
  • Lexical acquisition systems (syntactic and
    semantic) from text corpora
  • Robust systems of morphosyntactic and syntactic
    analysis
  • Word-sense disambiguation systems

5
Italian Semantic Network: Italian module of
EuroWordNet (http://www.hum.uva.nl/ewn/)
  • 50,000 lemmas organized in synonym groups
    (synsets), structured in hierarchies linked by
    130,000 semantic relations
  • 50,000 hyperonymy/hyponymy relations
  • 16,000 relations among different POS (role,
    cause, derivation, etc.)
  • 2,000 part-whole relations
  • 1,500 antonymy relations, etc.
  • Synsets linked to the InterLingual Index
    (ILI: Princeton WordNet),
  • Through the ILI, link to all the European WordNets
    (de-facto standard)
  • and to the common Top Ontology
  • Possibility of plug-in with domain terminological
    lexicons
  • Usable in IR, CLIR, IE, QA, ...
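
The ILI linking described above can be pictured with a minimal sketch. It assumes that each synset carries the identifier of one ILI record and that synsets from different languages sharing a record count as equivalents. The class names, the `ili-0001` style identifiers, and the English lemmas are illustrative assumptions, not EuroWordNet's actual data model or contents.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    language: str   # language code, e.g. "it" or "en"
    lemmas: tuple   # the synonym group
    ili_id: str     # hypothetical InterLingual Index record identifier


class InterLingualIndex:
    """Toy index grouping synsets of any language under a shared ILI id."""

    def __init__(self):
        self._by_ili = defaultdict(list)

    def add(self, synset):
        self._by_ili[synset.ili_id].append(synset)

    def translate(self, lemma, source, target):
        """Return target-language lemmas that share an ILI record with `lemma`."""
        results = set()
        for synsets in self._by_ili.values():
            # Does this ILI record contain the lemma in the source language?
            if any(s.language == source and lemma in s.lemmas for s in synsets):
                for s in synsets:
                    if s.language == target:
                        results.update(s.lemmas)
        return sorted(results)


ili = InterLingualIndex()
# Illustrative entries only; the pairings are assumptions for this sketch.
ili.add(Synset("it", ("cuocere", "cucinare"), "ili-0001"))
ili.add(Synset("en", ("cook",), "ili-0001"))
ili.add(Synset("it", ("friggere",), "ili-0002"))
ili.add(Synset("en", ("fry",), "ili-0002"))

print(ili.translate("friggere", "it", "en"))
```

Because every wordnet links only to the index, adding a new language in this sketch requires no pairwise mapping to the existing ones.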

6
Domain - Semantic class
mangiare
7
Domain - Semantic class
[Diagram: lemmas of the food domain around "mangiare" and their semantic classes]
  • Verbs: mangiare, cucinare, cuocere, arrostire,
    bollire, lessare, stufare, friggere, rosolare,
    grigliare
  • Nouns: zucchero, alloro, tartufo, mestolo,
    pentola, friggitrice, carne, tavola, forchetta,
    ristorante, mela, posata, cuoco, carota, coniglio,
    bollitore, pesce, arrosto, pesciera
  • Semantic classes: NATURAL_SUBSTANCE, FLAVOURING,
    VEGETAL_ENTITY, BUILDING, FURNITURE, FOOD, FRUIT,
    VEGETABLES, SUBSTANCE_FOOD, INSTRUMENT, CONTAINER,
    PROFESSION, ARTIFACT_FOOD
8
machine language learning
9
machine language learning
linguistic learning
development of conceptual networks
linguistic change models
language usage models
adaptive classification systems
information extraction
bootstrapping of lexical information
bootstrapping of grammars
10
Beyond MILE: towards open distributed lexicons
Ontology URI: http://www.zzz
Semantic Lexicon URI: http://www.xxx
Syntactic Constructions URI: http://www.yyy
Lex_object semFeature URI: http://www.xxxHUMAN
Lex_object syntagmaNT URI: http://www.zzzNP
Monolingual/Multilingual Lexicon
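
One possible reading of this slide is that a lexical object does not copy shared information but points to it by URI, so that ontologies, semantic lexicons and syntactic constructions can live in separate repositories. The sketch below illustrates that idea only. The `Registry` class, the `shared_uri` helper, the `#` fragment separator, and the choice of "cuoco" as the example entry are all assumptions, not part of MILE; the base URIs are the slide's own placeholders.

```python
from dataclasses import dataclass, field

# Placeholder base URIs mirroring the slide's www.zzz / www.xxx / www.yyy.
ONTOLOGY = "http://www.zzz"
SEMANTIC_LEXICON = "http://www.xxx"
SYNTACTIC_CONSTRUCTIONS = "http://www.yyy"


def shared_uri(base, name):
    # Assumption: a shared object is addressed as <base>#<name>.
    return f"{base}#{name}"


@dataclass
class LexObject:
    """A lexical entry that references shared objects instead of copying them."""
    lemma: str
    refs: dict = field(default_factory=dict)  # property name -> URI


class Registry:
    """Stands in for the remote repositories; maps URI -> description."""

    def __init__(self):
        self._objects = {}

    def publish(self, uri, description):
        self._objects[uri] = description

    def resolve(self, uri):
        # Returns None when no participant has published this URI.
        return self._objects.get(uri)


registry = Registry()
human = shared_uri(SEMANTIC_LEXICON, "HUMAN")
np = shared_uri(ONTOLOGY, "NP")
registry.publish(human, "semantic feature: HUMAN")
registry.publish(np, "syntagma: NP")

# Hypothetical entry: the lemma and its features are chosen for illustration.
cuoco = LexObject("cuoco", {"semFeature": human, "syntagmaNT": np})
resolved = {prop: registry.resolve(uri) for prop, uri in cuoco.refs.items()}
print(resolved)
```

In such a design, any participant can publish new shared objects, and existing entries pick them up by reference without being edited.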
11
Target.. Multilingual
Knowledge Management
Technical Feasibility
  • Prerequisite: is a commonly agreed text/lexicon
    annotation protocol, also for the
    semantic/conceptual level, an achievable goal
    (to be able to automatically establish links
    among different languages)?
  • Yes, at the lexical level
  • More complex for corpus annotation?

EAGLES/ISLE
12
A few issues for discussion: lexicon standards
  • Semantic Web standards and the needs of content
    processing technologies
  • importance of reaching consensus on (linguistic
    and non-linguistic) content, in addition to
    agreement on formats and encoding issues (words
    convey content knowledge)
  • short/medium-term requirements with respect to
    standards for multilingual lexicons and content
    encoding, including industrial requirements
  • Relation with the spoken language community
  • MILE and Asian languages: how to cooperate
    concretely?
  • Define further steps necessary to converge on
    common priorities

13
A few issues for discussion: content,
priorities...
  • Which types of resources should we invest in,
    with respect to short- vs. medium-term results?
  • Need for robust systems, able to acquire/tune
    lexical/linguistic (also multilingual) knowledge,
    to auto-enrich static basic resources?
  • What is the relation between lexical standards
    and text annotation protocols?
  • Knowledge management is critical. For content
    interoperability, is the field mature enough to
    converge around agreed standards also for the
    semantic/conceptual level (e.g. to automatically
    establish links among different languages)?
  • Is the field of multilingual lexical resources
    ready to tackle the challenges set by the
    Semantic Web development?

Towards a new paradigm??
14
A new paradigm for LR?
  • Where the focus is on cooperation
  • New Strategic Vision?
  • towards a Distributed Open Lexical
    Infrastructure?
  • for distributed cooperative creation,
    management, etc. of Lexical Resources
  • technical and organisational requirements

15
ELITE (expression of interest for the 6th FP)
European Lexical Infrastructure and Technology
Language Resources & Semantic Web
  • New proposed paradigm for lexicon development
  • Open Distributed Lexical Infrastructure
  • for content description and content
    interoperability,
  • to make lexical resources usable within the
    emerging Semantic Web scenario