1
Infrastructural Language Resources and Standards
for Multilingual Computational Lexicons
Nicoletta Calzolari, with many others
Istituto di Linguistica Computazionale - CNR - Pisa
glottolo_at_ilc.cnr.it
2
The ENABLER Mission
  • Language Resources (LRs) and Evaluation: central
    component of the linguistic infrastructure
  • LRs supported by national funding in National
    Projects
  • Availability of LRs is also a sensitive issue,
    touching the sphere of linguistic and cultural
    identity, but also with economic and political
    implications
  • The ENABLER Network of National initiatives
    aims at enabling the realisation of a
    cooperative framework
  • formulate a common agenda of medium- and long-term
    research priorities
  • contribute to the definition of an overall
    framework for the provision of LRs

3
... towards ...
  • Only by
  • Combining the strengths of different initiatives
    and communities
  • Exploiting at best the modus operandi of the
    national funding authorities in different
    national situations
  • Responding to/anticipating needs and priorities
    of R&D and industrial communities
  • Promoting the adoption of de facto standards and
    best practices
  • With a clear distinction of tasks and roles for
    different actors
  • can we produce the
  • synergies, economy of scale, convergence and
    critical mass
  • necessary to provide the infrastructural LRs
    needed to realise the full potential of a
    multilingual global information society

4
Lexicon and Corpus: a multi-faceted interaction
  • L → C tagging
  • C → L frequencies (of different linguistic
    objects)
  • C → L proper nouns, acronyms, ...
  • L → C parsing, chunking, ...
  • C → L training of parsers
  • C → L lexicon updating
  • C → L collocational data (MWE, idioms, gram.
    patterns ...)
  • C → L nuances of meanings and semantic clustering
  • C → L acquisition of lexical (syntactic/semantic)
    knowledge
  • L → C semantic tagging/word-sense disambiguation
    (e.g. in Senseval)
  • C → L more semantic information on LE
  • C → L corpus-based computational lexicography
  • C → L validation of lexical models
  • C → L ...
  • L → C ...

5
... Language as a Continuum
Lexicon and Corpus as two viewpoints on the same
linguistic object ... even more so in a multilingual
context
  • Interesting - and intriguing - aspects of corpus
    use
  • impossibility of descriptions based on a
    clear-cut boundary between what is admitted and
    what is not
  • in actual usage, language displays a large number
    of properties behaving as a continuum, and not as
    properties of yes/no type
  • the same is true for the so-called rules, where
    we find more a tendency towards rules than
    precise rules in corpus evidence
  • it is difficult to constrain word meaning within a
    rigorously defined organisation: by its very
    nature it tends to evade any strict boundary

  • BUT

6
Extraction from texts vs. formal representation
in lexicons
  • It is difficult to constrain word meaning within
    a rigorously defined organisation: by its very
    nature it tends to evade any strict boundary
  • The rigour and lack of flexibility of formal
    representation languages cause difficulties when
    mapping NL word meaning, ambiguous and
    flexible by its own nature, into them
  • No clear-cut boundary when analysing many
    phenomena: it is more of a continuum
  • The same impression if one looks at examples of
    types of alternations
  • no clear-cut classes across languages
  • or within one language

7
Correlation between different levels of
linguistic description in the design of a
lexical entry
  • To understand word-meaning
  • Focus on the correlation between syntactic and
    semantic aspects
  • But other linguistic levels - such as morphology,
    morphosyntax, lexical cooccurrence, collocational
    data, etc. - are closely interrelated/involved
  • These relations must be captured when accounting
    for meaning discrimination
  • The complexity of these interrelationships makes
    semantic disambiguation such a hard task in NLP
  • Textual corpora as a device to discover and
    reveal the intricacy of these relationships
  • Frame/SIMPLE semantics as a device to unravel and
    disentangle the complex situation into elementary
    and computationally manageable pieces

8
... towards Corpus-based Semantic Lexicons, at least
in principle
  • both in the design of the model,
  • and in the building of the lexicon (at least
    partially)
  • with (semi-)automatic means
  • Design of the lexical entry with a combined
    approach
  • theoretical: e.g. Fillmore's Frame Semantics /
    Pustejovsky's Generative Lexicon
  • empirical: corpus evidence
  • even if there are not always sound and explicit
    criteria for classification according to frame
    elements/qualia relations/...

9
Infrastructure of Language Resources ...
... static
  • Semantic networks: Euro-/ItalWordNet
  • Lexicons: PAROLE/SIMPLE/CLIPS
  • TreeBanks

International Standards
But they will never be complete
... dynamic
  • Lexical acquisition systems (syntactic and
    semantic) from corpora
  • Infrastructure of tools
  • Robust morphosyntactic and syntactic analysers
  • Word-sense disambiguation systems
  • Sense classifiers
  • ...

10
ItalWordNet Semantic Network: Italian module of
EuroWordNet
  • 50,000 lemmas organized in synonym groups
    (synsets), structured in hierarchies linked by
    130,000 semantic relations
  • 50,000 hyperonymy/hyponymy relations
  • 16,000 relations among different POS (role,
    cause, derivation, etc.)
  • 2,000 part-whole relations
  • 1,500 antonymy relations, etc.
  • Synsets linked to the InterLingual Index
    (ILI = Princeton WordNet)
  • Through the ILI, linked to all the European WordNets
    (de facto standard)
  • and to the common Top Ontology
  • Possibility of plug-in with domain terminological
    lexicons
  • (legal, maritime)
  • Usable in IR, CLIR, IE, QA, ...

11
EuroWordNet Multilingual Data Structure
12
Synsets linked by Semantic Relations in
ItalWordNet
[Diagram: synset {casa, abitazione, dimora}, linked to
the ILI synset {home, domicile, ..., house}]
  • TOP Concepts: Object, Artifact, Building
  • Hyperonym: edificio, ...
  • Hyponym: villetta, catapecchia, bicocca, ...
    (ILI: cottage, bungalow)
  • Role_location: stare, abitare, ...
  • Role_target_direction: rincasare
  • Role_patient: affitto, locazione
  • Mero_part: vestibolo, stanza
  • Holo_part: casale, frazione, caseggiato
13
Jur-WordNet
  • With ITTG-CNR (Istituto di Teoria e Tecniche
    dell'Informazione Giuridica)
  • Jur-WordNet: extension for the juridical domain
    of ItalWordNet
  • Knowledge base for multilingual access to sources
    of legal information
  • Source of metadata for semantic mark-up of legal
    texts
  • To be used, together with the generic
    ItalWordNet, in applications of Information
    Extraction, Question Answering, Automatic
    Tagging, Knowledge Sharing, Norm Comparison, etc.

14
Terminological Lexicon of Navigation and Sea
Transportation
→ Nolo
  • Synsets: 1,614
  • Lemmas: 2,116
  • Senses: 2,232
  • Nouns: 1,621
  • Verbs: 205
  • Adjectives: 35
  • Proper Nouns: 236
15
PAROLE/SIMPLE: 12 harmonised computational lexicons
http://www.ilc.cnr.it/clips/
  • PAROLE Italian Syntactic Lexicon, 1996-98 (SGML):
    morphology 20,000 entries, syntax 20,000 words
  • SIMPLE Italian Semantic Lexicon, 1998-2000 (SGML):
    semantics 10,000 senses
  • CLIPS, 2000-2004 (XML): phonology, morphology,
    syntax 55,000 words, semantics 55,000 senses
16
machine language learning
17
machine language learning
linguistic learning
development of conceptual networks
linguistic change models
language usage models
adaptive classification systems
information extraction
bootstrapping of lexical information
bootstrapping of grammars
18
Architecture for linguistic knowledge acquisition
...
[Diagram labels: terminology, unstructured text data,
annotation tools, LKG, cross-lingual information
retrieval, annotated data, lexica, multi-lingual
information extraction, machine learning for
linguistic knowledge acquisition, lexicon model,
user needs, multi-lingual text mining]
... towards dynamic lexicons, able to auto-enrich
19
Harmonisation: More and more Need of a Global
View for Global Interoperability
  • Integration/sharing of data and software/tools
  • Need of compatibility among various components
  • An exemplary cycle
  • Formalisms
  • Grammars
  • Software: Taggers, Chunkers, Parsers, ...
  • Representation, Annotation
  • Lexicon, Corpora
  • Terminology
  • Software
  • Acquisition Systems
  • I/O Interfaces

Languages
20
A short guide to ISLE/EAGLES
http://www.ilc.cnr.it/EAGLES96/isle/ISLE_Home_Page.htm
  • Multilingual Computational Lexicon
  • Working Group

21
Target: the Multilingual ISLE Lexical
Entry (MILE)
  • General methodological principles (from EAGLES)
  • high granularity: factor out the (maximal) set of
    primitive units of lexical info (basic notions)
    with the highest degree of inter-theoretical
    agreement
  • modular and layered: various degrees of
    specification possible
  • explicit representation of info
  • allow for underspecification (hierarchical
    structure)
  • leading principle: edited union of existing
    lexicons/models (redundancy is not a problem)
  • open to different paradigms of multilinguality
  • oriented to the creation of large-scale
    distributed lexicons

22
Paths to Discover the Basic Notions of MILE
  • clues in dictionaries to decide on target
    equivalent
  • guidelines for lexicographers
  • clues (to disambiguate/translate) in corpus
    concordances
  • lexical requirements from various types of
    transfer conditions and actions in MT systems
  • lexical requirements from interlingua-based
    systems

23
Designing MILE: Steps towards MILE
  • Creating entries (Bertagna, Reeves, Bouillon)
  • Identifying the MILE Basic Notions
    (Bertagna, Monachini, Atkins, Bouillon)
  • Defining the MILE Lexical Model (Lenci,
    Calzolari, etc.)
  • Formalising MILE (Ide)
  • Development of the ISLE Lexical Tool (Bel)
  • ISLE & spoken language & multimodality (Gibbon)
  • Metadata for the lexicon (Peters, Wittenburg)
  • A case study: MWEs in MILE (Quochi, Lenci,
    Calzolari)
  • the MILE Basic Notions
  • the MILE Lexical Model

24
The MILE Basic Notions (the EAGLES/ISLE CLWG)
  • Basic lexical dimensions and info-types relevant to
    establish multilingual links
  • Typology of lexical multilingual correspondences
    (relevant conditions and actions)
  • Identified by
  • creating sample multilingual lexical entries
    (Bertagna, Reeves)
  • investigating the use of sense indicators in
    traditional bilingual dictionaries (Atkins,
    Bouillon)
  • ...

25
The MILE Lexical Classes: Data Categories for
Content Interoperability
  • Francesca Bertagna, Alessandro Lenci, Monica
    Monachini, Nicoletta Calzolari
  • ILC-CNR, Pisa
  • Pisa University

26
Overview
  • MILE Lexical Model with Lexical Objects and Data
    Categories
  • Mapping of existing lexicons onto MILE
  • RDF schema and DC Registry for some
    pre-instantiated lexical objects together with a
    sample entry from the PAROLE-SIMPLE lexicons in
    MILE
  • Future

27
The MILE Lexical Model
Guidelines syntactic semantic lexicons
GENELEX Model
PAROLE-SIMPLE Lexicons
Multilingual Lexicons (EuroWordNet, etc.)
where after?
MILE Lexical Model
28
The MILE Main Features
  • A general architecture devised as a common
    representational layer for multilingual
    Computational Lexicons
  • both for hand-coded and corpus-driven lexical
    data
  • Key features
  • Modularity
  • Granularity
  • Extensibility and openness - User-adaptability
  • Resource Sharing
  • Content Interoperability
  • Reusability

Semantic Web technologies and standards applied to
lexicon modelling
29
The MILE Lexical Model (MLM)
  • The MLM core is the Multilingual ISLE Lexical
    Entry (MILE)
  • a general schema for multilingual lexical
    resources
  • a lexical meta-entry as a common representational
    layer for multilingual lexicons
  • Computational lexicons can be viewed as different
    instances of the MILE schema

MILE Lexical Model
lexicon2
lexicon1
lexicon3
30
MILE: the building-block model
  • The MILE architecture is designed according to
    the building-block model
  • Lexical entries are obtained by combining various
    types of lexical objects (atomic and complex)
  • Users design their lexicon by
  • selecting and/or specifying the relevant lexical
    objects
  • combining the lexical objects into lexical entries
  • Lexical objects may be shared
  • within the same lexicon (intra-lexicon
    reusability)
  • among different lexicons (inter-lexicon
    reusability)
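
The building-block idea can be illustrated with a minimal Python sketch. This is not the ISLE implementation: the class names `LexicalObject` and `Lexicon`, the helper `users_of`, and the sample lemmas are assumptions made only for illustration. The point is that one lexical object, defined once, is referenced by several entries in one lexicon and by entries in another.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LexicalObject:
    """A reusable building block (atomic or complex). Hypothetical model."""
    kind: str            # e.g. "Phrase", "Construction"
    name: str            # e.g. "NP"
    parts: tuple = ()    # a complex object is built from other objects


@dataclass
class Lexicon:
    name: str
    entries: dict = field(default_factory=dict)  # lemma -> list of lexical objects

    def add_entry(self, lemma, *objects):
        self.entries[lemma] = list(objects)


def users_of(obj, *lexicons):
    """List (lexicon name, lemma) pairs whose entry contains the given object."""
    return sorted(
        (lx.name, lemma)
        for lx in lexicons
        for lemma, objects in lx.entries.items()
        if obj in objects
    )


# An atomic object, then a complex object that combines it (illustrative names).
np_phrase = LexicalObject("Phrase", "NP")
transitive = LexicalObject("Construction", "Transitive", parts=(np_phrase, np_phrase))

lexicon1 = Lexicon("lexicon1")
lexicon1.add_entry("eat", transitive)
lexicon1.add_entry("read", transitive)       # intra-lexicon reusability

lexicon2 = Lexicon("lexicon2")
lexicon2.add_entry("mangiare", transitive)   # inter-lexicon reusability

print(users_of(transitive, lexicon1, lexicon2))
```

Because the entries hold references to the same object, a change to the shared definition would be visible to every entry that uses it, which is the practical benefit of sharing.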

31
MILE: the building-block model
32
Modularity in MILE
Horizontal organization, where independent but
interlinked modules express different
dimensions of lexical entries
multi-MILE
multilingual correspondence conditions
multiple levels of modularity
33
The Mono-MILE
  • Each monolingual layer within Mono-MILE
    identifies a basic unit of lexical description
  • semantic layer, SemU: basic unit to describe the
    semantic properties of the MU
  • syntactic layer, SynU: basic unit to describe the
    syntactic behaviour of the MU
  • morphological layer, MU: basic unit to describe
    the inflectional and derivational morphological
    properties of the word
34
The Mono-MILE
MU
Within each layer, a basic linguistic information
unit is identified
35
Granularity in MILE
  • Concerns the vertical dimension. Within a given
    lexical layer, varying degrees of depth of
    lexical descriptions are allowed, both shallow
    and deep lexical representations

36
Defining the MLM
  • The MLM is designed as an E-R model (MILE Entry
    Schema)
  • defines the lexical objects and the ways they can
    be combined into a lexical entry
  • The MLM includes 3 types of lexical objects
  • MILE Lexical Classes (MLC)
  • MILE Lexical Data Categories (MDC)
  • MILE Lexical Operations (MLO)

37
The MILE Lexical Objects
  • Within each layer, basic lexical notions are
    represented by lexical objects
  • MILE Lexical Classes MLC
  • MILE Data Categories MDC
  • Lexical operations
  • They are an ontology of lexical objects as an
    abstraction over different lexical models and
    architectures

38
The MILE E/R diagrams
  • The lexical objects are described with E-R
    diagrams which define them and the ways they can
    be combined into a lexical entry

39
MILE Lexical Objects: Syntactic Layer
[E-R diagram] Relations from MLC:SynU
  • hasSyntacticFrame → MLC:SyntacticFrame (1..*)
  • hasFrameSet → MLC:FrameSet
  • composedby → MLC:Composition
  • correspondTo → MLC:SemU (via
    MLC:CorrespSynUSemU)
40
... expanding one node
[Diagram] SynU → SyntacticFrame
  • SyntacticFrame → Construction, Self
  • Construction → Slot, Slot
  • Slot → Function, Phrase
41
MILE Lexical Objects: Semantic Layer
[E-R diagram] Relations from MLC:SemU
  • belongsToSynset → MLC:Synset
  • hasSemFrame → MLC:SemanticFrame (0..1)
  • hasSemFeature → MLC:SemanticFeature
  • hasCollocation → MLC:Collocation
  • semanticRelation → MLC:SemU (via
    MLC:SemanticRelation)
42
MILE Lexical Objects: Synt-Sem Linking
[E-R diagram] Relations from MLC:CorrespSynUSemU
  • hasSourceSynu → MLC:SynU (1)
  • hasTargetSemu → MLC:SemU (1)
  • hasPredicativeCorresp → MLC:PredicativeCorresp (1)
  • IncludesSlotArgCorresp → MLC:SlotArgCorresp (0..*)
43
Syntax-Semantics Linking
CorrespSynUSemU
PredCorresp
Slot0Arg1 Slot1Arg0
44
Syntax-Semantics Linking
John gave the book to Mary / John gave Mary the book
  • SynU1: subj_NP, obj_NP, obl_PP_to
  • SynU2: subj_NP, obj_NP, obj_NP
  • SemU1: Semantic_Frame GIVE
  • Arg1: Agent
  • Arg2: Theme
  • Arg3: Goal
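
A minimal Python sketch of the linking for the "give" example follows. The class names echo the slide labels (SynU, SemU, CorrespSynUSemU), but the field layout and the slot-to-argument pairings are assumptions inferred from the two example sentences, not taken from the ISLE specification. It shows two syntactic units sharing one semantic unit through different slot-argument correspondences.

```python
from dataclasses import dataclass


@dataclass
class SemU:
    name: str
    frame: str
    args: dict    # argument label -> semantic role


@dataclass
class SynU:
    name: str
    slots: tuple  # ordered syntactic slots


@dataclass
class CorrespSynUSemU:
    synu: SynU
    semu: SemU
    slot_to_arg: dict  # slot index -> argument label (assumed encoding)

    def roles(self):
        """Pair each syntactic slot with the semantic role it realises."""
        return [
            (self.synu.slots[index], self.semu.args[arg])
            for index, arg in sorted(self.slot_to_arg.items())
        ]


semu1 = SemU("SemU1", "GIVE", {"Arg1": "Agent", "Arg2": "Theme", "Arg3": "Goal"})

# "John gave the book to Mary"
synu1 = SynU("SynU1", ("subj_NP", "obj_NP", "obl_PP_to"))
# "John gave Mary the book"
synu2 = SynU("SynU2", ("subj_NP", "obj_NP", "obj_NP"))

# Assumed pairings: in the double-object frame the first object is the Goal.
link1 = CorrespSynUSemU(synu1, semu1, {0: "Arg1", 1: "Arg2", 2: "Arg3"})
link2 = CorrespSynUSemU(synu2, semu1, {0: "Arg1", 1: "Arg3", 2: "Arg2"})

print(link1.roles())
print(link2.roles())
```

Keeping the correspondence as a separate object, instead of storing roles inside the SynU, is what lets both alternations point at the same SemU.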
45
Syntax-Semantic Linking in SIMPLE
SynU_migliorare
Intransitive structure
Slot0 Ø
Transitive structure Slot0 Slot1
Frameset

CorrespSynUSemU
CorrespSynUSemU
isomorphic
non-isomorphic
SlotArgCorresp
SlotArgCorresp
PRED_migliorare
ARG0: Agent
ARG1: Patient
SemU1_migliorare
SemU2_migliorare
CAUSE_CHANGE_OF_STATE
CHANGE_OF_STATE
46
The Multilingual layer
[E-R diagram] Relations from MultiCorresp
  • hasMUMUCorr → MUMUCorresp (1..0)
  • hasSynUSynuCorr → SynUSynUCorresp (1..0)
  • hasSemUSemUCorr → SemUSemUCorresp (1..0)
  • hasSynsetMultCorr → SynsetMultCorresp (1..0)
  • hasSemFrameCorr → SemanticFrameMultCorresp (1..0)
47
MILE approach to multilinguality
  • Open to various approaches
  • transfer-based
  • monolingual descriptions are used to state
    correspondences (tests and actions) between
    source and target entries
  • interlingua-based
  • monolingual entries linked to
    language-independent lexical objects (e.g.
    semantic frames, primitive predicates, etc.)

48
The Multi-MILE
  • Multi-MILE specifies a formal environment to
    express multilingual correspondences between
    lexical items
  • Source and target lexical entries can be linked
    by exploiting (possibly combined) aspects of
    their monolingual descriptions
  • monolingual lexicons act as pivot lexical
    repositories, on top of which language-to-language
    multilingual modules can be defined

49
The Multi-MILE
  • Multi-MILE may include
  • Multilingual operations to establish transfer
    links between source and target mono-MILE
  • Multilingual lexical objects
  • enrich the source and target lexical
    descriptions, but
  • do not belong to the monolingual lexicons
  • Language-independent lexical objects
  • Primitive semantic frames, interlingual
    synsets, etc.
  • Relevant for interlingua approaches to
    multilinguality

50
Multi-MILE
  • IT_SemU_2 → EN_SemU_1
  • IT_SynU_2 → EN_SynU_1
  • IT_Slot_0 → EN_Slot_1
  • IT_Slot_1 → EN_Slot_0
  • AddFeature to source SemU: HUMAN
  • AddSlot to target SynU: MODIF PP_with
51
Multi-MILE
IT Lexicon → EN Lexicon, via multilingual conditions
  • dito + modif(mano) → finger
  • dito + modif(piede) → toe
  • entrare → to enter
  • entrare + PP_di_corsa → run + PP_into
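
The conditional correspondences on this slide can be sketched in Python as transfer rules with an optional test. The `TransferRule` type, the `transfer` function, and the way the context is encoded as a dictionary are all hypothetical; only the Italian and English items and their conditions come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransferRule:
    source: str
    target: str
    condition: Optional[tuple] = None  # (context key, required value), assumed encoding


RULES = [
    TransferRule("dito", "finger", ("modif", "mano")),
    TransferRule("dito", "toe", ("modif", "piede")),
    TransferRule("entrare", "run PP_into", ("PP", "di_corsa")),
    TransferRule("entrare", "to enter"),  # unconditional default
]


def transfer(source, context):
    """Return the target of the first rule whose test holds in the context.

    Conditional rules are tried first; an unconditional rule, if any,
    serves as the fallback. Returns None when nothing applies.
    """
    candidates = [rule for rule in RULES if rule.source == source]
    for rule in candidates:
        if rule.condition is not None:
            key, value = rule.condition
            if context.get(key) == value:
                return rule.target
    for rule in candidates:
        if rule.condition is None:
            return rule.target
    return None


print(transfer("dito", {"modif": "mano"}))
print(transfer("dito", {"modif": "piede"}))
print(transfer("entrare", {"PP": "di_corsa"}))
print(transfer("entrare", {}))
```

Note that "dito" has no unconditional rule here, so without a modifier the sketch returns nothing; whether a real lexicon should supply a default is a design decision the slide leaves open.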
52
MILE Lexical Classes
  • Represent the main building blocks of lexical
    entries
  • Formalize the MILE Basic Notions
  • Define an ontology of lexical objects
  • represent lexical notions such as semantic unit,
    syntactic feature, syntactic frame, semantic
    predicate, semantic relation, synset, etc.
  • Similar to class definitions in OO languages
  • specify the relevant attributes
  • define the relations with other classes
  • hierarchically structured

53
MILE Lexical Classes: an ontology of lexical
objects
54
MILE Lexical Data Categories
  • MDC are instances of the MILE lexical Classes
  • Can be used off the shelf or as a departure
    point for the definition of new or modified
    categories
  • Enable modular specification of lexical entities
    using all or parts of the lexical information in
    the repository
  • Each MDC represents a resource
  • uniquely identified by a URI
  • Two types of MDC
  • Core MDC
  • belong to shared repositories (Lexical Data
    Category Registry)
  • lexical objects and linguistic notions with wide
    consensus
  • User-defined MDC
  • user-specific or language specific lexical
    objects

55
The MILE Data Categories
  • Instances of the MILE Lexical Classes are Data
    Categories
  • MDC can belong to a shared repository or be
    user-defined

MLC
Core MDC
User-defined MDC
56
The MILE Data Categories: User-adaptability and
extensibility
  • MLC:SemanticFeature (instance_of)
  • Core: HUMAN, ARTIFACT, EVENT, ANIMAL, GROUP
  • UserDefined: AGE, MAMMAL
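
A registry that keeps core and user-defined data categories apart, each identified by a URI, might look like the following Python sketch. The namespaces `http://example.org/...`, the `Registry` class, and its methods are invented for illustration; only the feature names and the core/user-defined split come from the slide.

```python
from dataclasses import dataclass

# Hypothetical namespaces; the real registry URIs are not given in the slides.
CORE_NS = "http://example.org/mile/core#"
USER_NS = "http://example.org/my-lexicon#"


@dataclass(frozen=True)
class DataCategory:
    uri: str            # unique identifier of the resource
    lexical_class: str  # the MILE Lexical Class it instantiates
    core: bool          # True for shared-repository categories


class Registry:
    def __init__(self):
        self._by_uri = {}

    def register(self, name, lexical_class, core):
        uri = (CORE_NS if core else USER_NS) + name
        if uri in self._by_uri:
            raise ValueError(f"duplicate data category: {uri}")
        category = DataCategory(uri, lexical_class, core)
        self._by_uri[uri] = category
        return category

    def names(self, lexical_class, core):
        """Local names of the instances of a class, core or user-defined."""
        return sorted(
            dc.uri.split("#", 1)[1]
            for dc in self._by_uri.values()
            if dc.lexical_class == lexical_class and dc.core == core
        )


registry = Registry()
for name in ["HUMAN", "ARTIFACT", "EVENT", "ANIMAL", "GROUP"]:
    registry.register(name, "SemanticFeature", core=True)
for name in ["AGE", "MAMMAL"]:
    registry.register(name, "SemanticFeature", core=False)

print(registry.names("SemanticFeature", core=True))
print(registry.names("SemanticFeature", core=False))
```

Putting user-defined categories in their own namespace means a lexicon can extend the shared set without risking a clash with a core identifier.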
57
MILE Lexical Data Categories
MLMFeature
MLMGrammaticalFunction
58
MILE Lexical Operations
  • They are used to state conditions and perform
    operations over lexical entries
  • Link syntactic slots and semantic arguments
  • Constrain the syntax-semantic link
  • Express tests and actions in the transfer
    conditions in the multi-MILE
  • They provide the glue to link various
    independent intra-lexical and inter-lexical
    components

59
Multilingual Operations
  • Source-to-target language transfer conditions can
    be expressed by combining multilingual operations
  • Three types of multilingual operations
  • Multilingual correspondences
  • Link a source lexical object (MU, SemU, SynU,
    semantic argument, syntactic slot) and a target
    lexical object (MU, SemU, SynU, semantic
    argument, syntactic slot)
  • Add-operations
  • Add lexical information relevant for the
    cross-lingual link, but not present in the source
    or target mono-MILE
  • Constrain-operations
  • Constrain the transfer link to some portions of
    source and target mono-MILE
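
The first two operation types can be sketched in Python using the identifiers from the Multi-MILE example (IT_SemU_2, EN_Slot_1, HUMAN, MODIF PP_with). The classes `MonoMile`, `Correspondence`, `AddFeature`, `AddSlot` and the functions below are assumptions for illustration; constrain-operations are left out. The sketch applies add-operations to copies, so the monolingual entries themselves stay unchanged.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class MonoMile:
    semu: str
    synu: str
    slots: list
    sem_features: set = field(default_factory=set)


@dataclass(frozen=True)
class Correspondence:
    source: str  # identifier of a source lexical object
    target: str  # identifier of a target lexical object


@dataclass(frozen=True)
class AddFeature:
    side: str    # "source" or "target"
    feature: str


@dataclass(frozen=True)
class AddSlot:
    side: str
    slot: str


def slot_map(operations):
    """Collect the slot-to-slot correspondences from a list of operations."""
    return {
        op.source: op.target
        for op in operations
        if isinstance(op, Correspondence) and "_Slot_" in op.source
    }


def apply_adds(operations, source, target):
    """Return enriched copies; the monolingual entries are not modified."""
    src, tgt = copy.deepcopy(source), copy.deepcopy(target)
    for op in operations:
        if isinstance(op, AddFeature):
            (src if op.side == "source" else tgt).sem_features.add(op.feature)
        elif isinstance(op, AddSlot):
            (src if op.side == "source" else tgt).slots.append(op.slot)
    return src, tgt


it_entry = MonoMile("IT_SemU_2", "IT_SynU_2", ["IT_Slot_0", "IT_Slot_1"])
en_entry = MonoMile("EN_SemU_1", "EN_SynU_1", ["EN_Slot_0", "EN_Slot_1"])

operations = [
    Correspondence("IT_SemU_2", "EN_SemU_1"),
    Correspondence("IT_SynU_2", "EN_SynU_1"),
    Correspondence("IT_Slot_0", "EN_Slot_1"),
    Correspondence("IT_Slot_1", "EN_Slot_0"),
    AddFeature("source", "HUMAN"),
    AddSlot("target", "MODIF PP_with"),
]

enriched_it, enriched_en = apply_adds(operations, it_entry, en_entry)
print(slot_map(operations))
print(enriched_it.sem_features, enriched_en.slots)
```

Working on copies mirrors the statement that multilingual lexical objects enrich the source and target descriptions but do not belong to the monolingual lexicons.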

60
Defining the MLM
MILE Entry Schema
MILE Lexical Classes
RDF/S Descriptions
61
RDF Instantiation of the MLM
Lexicon2
Resources
Lexicon1
Lexicon3
Metadata
Lexical Objects
Resources
Lexical Classes
Lexical Data Categories
62
MILE Lexical Model
  • Ideal structure for rendering in RDF
  • hierarchy of lexical objects built up by
    combining atomic data categories via clearly
    defined relations
  • Proof of concept
  • Create an RDF schema for the MILE Lexical Model
  • version 1.2
  • Instantiate MILE Lexical Data Categories

63
User-Adaptability and Resource Sharing in MILE
  • Compatible with different models of lexical
    analysis
  • Relational semantic models (e.g. WordNet)
  • Syntactic and semantic frames
  • Ontology-based lexicons
  • Compatible with different degrees of
    specification
  • Deep lexical representations (e.g. PAROLE-SIMPLE)
  • Terminological lexicons
  • Compatible with different paradigms of
    multilinguality
  • Lexicons for Transfer Based MT
  • Interlingua-based lexicons

64
The MILE Lexical Model
MILE Lexical Model
65
RDF Instantiation of the MLM
  • Enable universal access to sophisticated
    linguistic info
  • Provide means for inferencing over lexical info
  • Incorporate lexical information into the Semantic
    Web
  • W3C standards
  • Resource Description Framework (RDF)
  • Web Ontology Language (OWL)
  • Built on the XML web infrastructure to enable the
    creation of a Semantic Web
  • web objects are classified according to their
    properties
  • semantics of relations (links) to other web
    objects precisely defined

66
The RDF Schema
  • Defines classes of objects (MLC) and their
    relations to other objects
  • Like a class definition in Java, etc.
  • Classes and properties in the schema correspond
    to the E-R model
  • Can specify sub-classes/sub-properties and
    inheritance

67
Goals
  • Lexical information will form a central component
    of semantic information
  • Need a standardized, machine processable format
    so that information can be used, merged with
    others
  • Main task: get the data model right

See Semantic Web
68
Advantages of RDF
  • Modularity
  • Can create instances of bits of lexical
    information for re-use in a single lexicon or
    across lexicons
  • Instances can be stored in a central repository
    for use by others
  • Can use partial information or all of it
  • Building block approach to lexicon creation
  • Web-compatible
  • RDF instantiation will integrate into Semantic
    Web
  • Inferencing capabilities

69
Example
  • Three parts
  • RDF Schema for lexical entries
  • Defines classes and properties, sub-classes, etc.
  • Sample repository of RDF-instantiated lexical
    objects
  • Three levels of granularity
  • Sample lexicon entries
  • Use repository information at different levels

70
Sample Repositories
  • repository of enumerated classes for lexical
    objects at the lowest level of granularity
  • definition of sets of possible values for various
    lexical objects
  • repository of phrases for common phrase types,
    e.g., NP, VP, etc.
  • repository of constructions for common syntactic
    constructions

71
<rdfs:Class rdf:about="http://www.cs.vassar.edu/ide/rdf/isle-enumerated-classes#FunctionType">
  <owl:oneOf>
    <rdf:Seq>
      <rdf:li>Subj</rdf:li>
      <rdf:li>Obj</rdf:li>
      <rdf:li>Comp</rdf:li>
      <rdf:li>Arg</rdf:li>
      <rdf:li>Iobj</rdf:li>
    </rdf:Seq>
  </owl:oneOf>
</rdfs:Class>
<rdfs:Class rdf:about="http://www.cs.vassar.edu/ide/rdf/isle-enumerated-classes#SynFeatureName">
  <owl:oneOf>
    <rdf:Seq>
      <rdf:li>tense</rdf:li>
      <rdf:li>gender</rdf:li>
      <rdf:li>control</rdf:li>
      <rdf:li>person</rdf:li>
      <rdf:li>aux</rdf:li>
    </rdf:Seq>
  </owl:oneOf>
</rdfs:Class>
<rdfs:Class rdf:about="http://www.cs.vassar.edu/ide/rdf/isle-enumerated-classes#SynFeatureValue">
  <owl:oneOf>
    <rdf:Seq>
      <rdf:li>have</rdf:li>
      <rdf:li>be</rdf:li>
      <rdf:li>subject_control</rdf:li>
      <rdf:li>object_control</rdf:li>
      <rdf:li>masculine</rdf:li>
      <rdf:li>feminine</rdf:li>
    </rdf:Seq>
  </owl:oneOf>
</rdfs:Class>
Enumerated classes
72
Sample LDCR for a Phrase Object
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:mlc="http://www.cs.vassar.edu/ide/rdf/isle-schema-v.6#">
  <Phrase rdf:ID="NP" rdfs:label="NP"/>
  <Phrase rdf:ID="Vauxhave">
    <hasSynFeature>
      <SynFeature>
        <hasSynFeatureName rdf:value="aux"/>
        <hasSynFeatureValue rdf:value="have"/>
      </SynFeature>
    </hasSynFeature>
  </Phrase>
</rdf:RDF>
73
Sample LDCR entry for a Construction object
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns="http://www.cs.vassar.edu/ide/rdf/isle-schema-v.6#">
  <Construction rdf:ID="TransIntrans">
    <slot>
      <SlotRealization rdf:ID="NPsubj">
        <hasFunction rdf:value="Subj"/>
        <filledBy rdf:resource="http://www.cs.vassar.edu/ide/rdf/isle-datcats/Phrases#NP"/>
      </SlotRealization>
    </slot>
    <slot>
      <SlotRealization rdf:ID="NPobj">
        <hasFunction rdf:value="Obj"/>
        <filledBy rdf:resource="http://www.cs.vassar.edu/ide/rdf/isle-datcats/Phrases#NP"/>
      </SlotRealization>
    </slot>
  </Construction>
</rdf:RDF>
74
Full entry
<Entry rdf:ID="eat1">
  <hasSynu rdf:parseType="Resource">
    <SynU rdf:ID="eat1-SynU">
      <example>John ate the cake</example>
      <hasSyntacticFrame>
        <SyntacticFrame rdf:ID="eat1SynFrame">
          <hasSelf>
            <Self rdf:ID="eat1Self">
              <headedBy>
                <Phrase rdf:ID="Vauxhave">
                  <hasSynFeature>
                    <SynFeature>
                      <hasSynFeatureName rdf:value="aux"/>
                      <hasSynFeatureValue rdf:value="have"/>
                    </SynFeature>
                  </hasSynFeature>
                </Phrase>
              </headedBy>
            </Self>
          </hasSelf>
(Continued)
75
Continued from previous slide
          <hasConstruction>
            <Construction rdf:ID="eat1Const">
              <slot>
                <SlotRealization rdf:ID="NPsubj">
                  <hasFunction rdf:value="Subj"/>
                  <filledBy rdf:value="NP"/>
                </SlotRealization>
              </slot>
              <slot>
                <SlotRealization rdf:ID="NPobj">
                  <hasFunction rdf:value="Obj"/>
                  <filledBy rdf:value="NP"/>
                </SlotRealization>
              </slot>
            </Construction>
          </hasConstruction>
          <hasFrequency rdf:value="8788" mlc:corpus="PAROLE"/>
        </SyntacticFrame>
      </hasSyntacticFrame>
    </SynU>
  </hasSynu>
</Entry>
</rdf:RDF>
76
Entry Using Phrase
<Entry rdf:ID="eat1">
  <hasSynu rdf:parseType="Resource">
    <SynU rdf:ID="eat1-SynU">
      <example>John ate the cake</example>
      <hasSyntacticFrame>
        <SyntacticFrame rdf:ID="eat1SynFrame">
          <hasSelf>
            <Self rdf:ID="eat1Self">
              <headedBy rdf:resource="http://www.cs.vassar.edu/ide/rdf/isle-datcats/Phrases#Vauxhave"/>
            </Self>
          </hasSelf>
          <hasConstruction>
            <Construction rdf:ID="eat1Const">
              <slot>
                <SlotRealization rdf:ID="NPsubj">
                  <hasFunction rdf:value="Subj"/>
                  <filledBy rdf:resource="http://www.cs.vassar.edu/ide/rdf/isle-datcats/Phrases#NP"/>
                </SlotRealization>
              </slot>
              <slot>
                <SlotRealization rdf:ID="NPobj">
                  <hasFunction rdf:value="Obj"/>
                  <filledBy rdf:resource="http://www.cs.vassar.edu/ide/rdf/isle-datcats/Phrases#NP"/>
                </SlotRealization>
              </slot>
            </Construction>
          </hasConstruction>
          <hasFrequency rdf:value="8788" mlc:corpus="PAROLE"/>
        </SyntacticFrame>
      </hasSyntacticFrame>
    </SynU>
  </hasSynu>
</Entry>
77
Entry Using Construction
<Entry rdf:ID="eat1">
  <hasSynu rdf:parseType="Resource">
    <SynU rdf:ID="eat1-SynU">
      <example>John ate the cake</example>
      <hasSyntacticFrame>
        <SyntacticFrame rdf:ID="eat1SynFrame">
          <hasSelf>
            <Self rdf:ID="eat1Self">
              <headedBy rdf:resource="http://www.cs.vassar.edu/ide/rdf/isle-datcats/Phrases#Vauxhave"/>
            </Self>
          </hasSelf>
          <hasConstruction rdf:resource="http://www.cs.vassar.edu/ide/rdf/isle-datcats/Constructions#TransIntrans"/>
          <hasFrequency rdf:value="8788" mlc:corpus="PAROLE"/>
        </SyntacticFrame>
      </hasSyntacticFrame>
    </SynU>
  </hasSynu>
</Entry>
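
The three versions of the "eat1" entry differ only in how much they inline and how much they reference from the repository. A minimal Python sketch of that idea follows, using plain dictionaries in place of RDF. The `REPOSITORY` keys, the `{"ref": ...}` convention, and the `expand` and `make_entry` helpers are hypothetical; the sketch shows that an entry referencing a whole construction and an entry referencing only phrases resolve to the same full description.

```python
# Hypothetical in-memory stand-in for the shared repository; the keys imitate
# the "Phrases#NP"-style identifiers that appear in the slides.
REPOSITORY = {
    "Phrases#NP": {"type": "Phrase", "label": "NP"},
    "Phrases#Vauxhave": {"type": "Phrase", "features": {"aux": "have"}},
    "Constructions#TransIntrans": {
        "type": "Construction",
        "slots": [
            {"function": "Subj", "filledBy": {"ref": "Phrases#NP"}},
            {"function": "Obj", "filledBy": {"ref": "Phrases#NP"}},
        ],
    },
}


def expand(node):
    """Recursively replace {"ref": key} nodes with the repository object."""
    if isinstance(node, dict):
        if set(node) == {"ref"}:
            return expand(REPOSITORY[node["ref"]])
        return {key: expand(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand(item) for item in node]
    return node


def make_entry(construction):
    """Build the sample entry around a given construction description."""
    return {
        "id": "eat1",
        "example": "John ate the cake",
        "headedBy": {"ref": "Phrases#Vauxhave"},
        "construction": construction,
        "frequency": {"value": 8788, "corpus": "PAROLE"},
    }


# Medium granularity: the construction is spelled out, phrases are referenced.
entry_using_phrase = make_entry({
    "type": "Construction",
    "slots": [
        {"function": "Subj", "filledBy": {"ref": "Phrases#NP"}},
        {"function": "Obj", "filledBy": {"ref": "Phrases#NP"}},
    ],
})

# Coarsest granularity: the whole construction is referenced.
entry_using_construction = make_entry({"ref": "Constructions#TransIntrans"})

full_entry = expand(entry_using_construction)
print(full_entry["construction"]["slots"][0])
```

The sketch ignores the entry-local identifiers (such as eat1Const) that the RDF versions carry, so it compares content only.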
78
Semantic Representation
  • The data model underlying RDF/UML, etc. is
    universal, abstract enough to capture all types
    of info
  • Semantic representations
  • Registry of basic data categories
  • meta-categories: addressee, utterance, etc.
  • Information categories: eyebrow movement,
    gestures, pitch, ...
  • Supporting ONTOLOGY of information categories
  • Interpretative procedures yield another level of
    meaning representation
  • Registry of categories

UNINTERPRETED REPRESENTATION
INTERPRETED REPRESENTATION
INTERPRETATION PROCESS
79
MILE Lexical Data Category Registry (MDC)
  • Instantiation of pre-defined lexical objects
  • Extension of the shared class schema with
    lexicon-specific sub-classes and sub-properties
  • Can be used off the shelf or as a departure
    point for the definition of new or modified
    categories
  • Enables modular specification of lexical entities
  • eliminate redundancy
  • identify lexical entries or sub-entries with
    shared properties

80
MLC in RDF/S: features
Features are properties of lexical objects
[Diagram labels: mlm:LexObject, mlm:Values,
mlm:feature, rdfs:subPropertyOf, rdfs:subClassOf,
mlm:semFeature, mlm:SemValues, mlm:synFeature,
mlm:SynValues]
81
MLC in RDF/S: syntactic features
<rdf:Property rdf:ID="synCat">
  <rdfs:subPropertyOf rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#synFeature"/>
  <rdfs:range rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#SynCatValues"/>
</rdf:Property>
<rdfs:Class rdf:ID="SynCatValues">
  <rdfs:subClassOf rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#SynValues"/>
  <owl:oneOf rdf:parseType="Collection">
    <owl:Thing rdf:about="Noun"/>
    <owl:Thing rdf:about="Verb"/>
    <owl:Thing rdf:about="Adjective"/>
    ...
  </owl:oneOf>
</rdfs:Class>
</rdf:RDF>
feature values
82
MLC in RDF/S: semantic features
<rdf:Property rdf:ID="domain">
  <rdfs:subPropertyOf rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#semFeature"/>
  <rdfs:range rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#DomainValues"/>
</rdf:Property>
<rdfs:Class rdf:ID="DomainValues">
  <rdfs:subClassOf rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#SemValues"/>
  <owl:oneOf rdf:parseType="Collection">
    <owl:Thing rdf:about="Finance"/>
    <owl:Thing rdf:about="Medicine"/>
    <owl:Thing rdf:about="Sport"/>
    ...
  </owl:oneOf>
</rdfs:Class>
</rdf:RDF>
domain ontology
83
Synsets in RDF/S
mlmword
mlmSynset
rdfsliteral
mlmgloss
rdfsliteral
mlmfeature
mlmsynsetRelation
mlmValues
mlmSynset
cf. also
http://www.semanticweb.org/library/wordnet/wordnet-20000620.rdfs
84
Synsets in RDF/S
<rdfs:Class rdf:ID="Synset">
  <rdfs:label>Synset</rdfs:label>
  <rdfs:comment>This class formalizes the notion of synset as defined in WordNet (Fellbaum 1998).</rdfs:comment>
  <rdfs:subClassOf rdf:resource="#LexObject"/>
</rdfs:Class>
<rdf:Property rdf:ID="synsetRelation">
  <rdfs:domain rdf:resource="#Synset"/>
  <rdfs:range rdf:resource="#Synset"/>
</rdf:Property>
<rdf:Property rdf:ID="hypernym" mlm:source="WordNet1.7">
  <rdfs:comment>The WordNet hypernym relation</rdfs:comment>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</rdf:Property>
<rdf:Property rdf:ID="meronym" mlm:source="WordNet1.7">
  <rdfs:comment>The WordNet meronym relation</rdfs:comment>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</rdf:Property>
synsetRelation: relation between synsets
hypernym, meronym: different types of synset relations
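
Declaring hypernym and meronym as sub-properties of synsetRelation is what makes simple inference possible: any hypernym statement is also a synsetRelation statement. The following Python sketch illustrates that single rule with plain tuples. It is not an RDF reasoner; the `SUBPROPERTY_OF` table and the `infer` function are assumptions for illustration, and the two concept identifiers are the WordNet 1.7 numbers quoted in the slides.

```python
# Sub-property declarations, mirroring rdfs:subPropertyOf in the schema.
SUBPROPERTY_OF = {
    "hypernym": "synsetRelation",
    "meronym": "synsetRelation",
}


def super_properties(prop):
    """Follow the sub-property chain upwards, guarding against cycles."""
    found = []
    while prop in SUBPROPERTY_OF and SUBPROPERTY_OF[prop] not in found:
        prop = SUBPROPERTY_OF[prop]
        found.append(prop)
    return found


def infer(triples):
    """Add, for every triple, the same statement under each super-property."""
    inferred = set(triples)
    for subject, prop, obj in triples:
        for parent in super_properties(prop):
            inferred.add((subject, parent, obj))
    return inferred


# (subject, property, object) statements.
triples = {("concept#01752990", "hypernym", "concept#01752283")}

for triple in sorted(infer(triples)):
    print(triple)
```

A query for all synsetRelation links can then retrieve hypernym and meronym links alike, without the query having to list every relation type.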
85
WordNet 1.7 Synsets
<mlm:Synset rdf:about="http://www.cogsci.princeton.edu/wn1.7/concept#01752990"
            mlm:source="WordNet1.7">
  <mlm:gloss>A member of the genus Canis</mlm:gloss>
  <mlm:word>dog</mlm:word>
  <mlm:word>domestic dog</mlm:word>
  <mlm:word>Canis familiaris</mlm:word>
  <mdc:synCat rdf:resource="Noun"/>
  <mdc:domain rdf:resource="Zoology"/>
  <mdc:hypernym rdf:resource="http://www.cogsci.princeton.edu/wn1.7/concept#01752283"/>
</mlm:Synset>
mdc:synCat, mdc:domain: features
mdc:hypernym: hypernym
86
Foundations of the Mapping Experiment
87
1. The MILE building-block model
  • The MILE Lexical Classes and the MILE Lexical
    Data Categories are the main building blocks of
    the MILE lexical architecture
  • Building blocks allow two kinds of reusability
  • intra-lexicon reusability (within the same
    lexicon)
  • inter-lexicon reusability (among different
    lexicons)

88
How do building blocks work?
89
2. MILE: a meta-entry
  • MILE is
  • a general schema for multilingual lexical
    resources
  • a lexical meta-entry, a common representational
    layer for multilingual lexicons
  • Computational lexicons can be viewed as different
    instances of the MILE schema

[Diagram: lexicon1, lexicon2 and lexicon3 shown as instances of MILE]
90
MILE and Content Interoperability
  • This common shared compatible representation of
    lexical objects is particularly suited to
  • manipulate objects available in different lexical
    resources
  • understand their deep semantics
  • apply the same operations to lexical objects of
    the same type
  • key elements of Content Interoperability

91
The Mapping Experiment: Why?
  • It is a concrete experiment aimed at testing the
    expressive potential and capabilities of the
    MILE
  • The idea is that if the MILE atomic notions,
    combined in different ways, suit the
    different visions underlying two lexicons such
    as FrameNet and NOMLEX, then
  • the MILE will come out strengthened
  • its adoption as an interface between differently
    conceived lexical architectures can be promoted
    further
  • key issues for content interoperability between
    resources can be addressed

92
The mapping scenarios
  • High-level mapping of the objects of a lexicon
    into the objects of the abstract model
  • → the native structure is maintained and no
    format conversion is performed
  • Translate instances of lexical entries directly
    into MILE
  • → MILE acts as a true interchange format

93
FrameNet to MILE
94
FrameNet-MILE: Observations
  • The mapping is promising
  • Frame → Predicate (primitive)
  • Frame Elements → Argument (enlarge the set of
    possible values)
  • Lexical_Unit → SemU
  • Link SemU-Predicate (obligatory) should become
    underspecified
  • But
  • The lack of an inheritance mechanism in the Predicate
    makes it impossible to represent the hierarchical
    organization of Frames and Sub-frames, temporal
    ordering among Frames, and subsumption relations
    among Frames
  • We could add a new object, PredicateRelation, to
    allow for the description of relations occurring
    between predicates and sub-predicates
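The three correspondences listed above (Frame → Predicate, Frame Elements → Argument, Lexical_Unit → SemU) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class fields, the function name frame_to_mile and the sample frame data are hypothetical, and neither the FrameNet nor the MILE data format is reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    role: str  # filled from a FrameNet Frame Element name


@dataclass
class Predicate:
    name: str  # filled from the FrameNet Frame name
    arguments: List[Argument] = field(default_factory=list)


@dataclass
class SemU:
    lemma: str            # filled from a FrameNet Lexical_Unit
    predicate: Predicate  # the SemU-Predicate link


def frame_to_mile(frame_name, frame_elements, lexical_units):
    """Apply the three correspondences: Frame -> Predicate,
    Frame Elements -> Arguments, Lexical_Unit -> SemU."""
    predicate = Predicate(frame_name, [Argument(fe) for fe in frame_elements])
    # Every lexical unit of the frame shares the same Predicate object.
    return [SemU(lu, predicate) for lu in lexical_units]


# Illustrative input only; not taken from the FrameNet database.
sem_units = frame_to_mile("Motion", ["Theme", "Source", "Goal"], ["move", "go"])
```

The sketch has no slot for relations between Predicates, which is the gap that the proposed PredicateRelation object is meant to fill.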

95
[Diagram: NOMLEX to MILE. MILE objects: MLC:SynU, MLC:SemU, MLC:SemanticFrame (TypeOfLink Agentnom, IncludedArg 0), MLC:CorrespSynUSemU, MLC:Predicate, MLC:Argument, MLC:Argument. NOMLEX: nom-type ((subject))]
96
NOMLEX-MILE: Observations
  • The mapping is promising
  • Notions represented in NOMLEX have a
    correspondent in MILE
  • But ...
  • they are expressed with two opposite lexical
    structures
  • In NOMLEX,
  • lexical information is expressed in a very
    compact way
  • there are no clear-cut boundaries between the levels of
    linguistic description
  • In MILE,
  • compressed information has to be decompressed and spread
    over different MILE lexical layers and objects:
    SynU, SemU, and SemanticFrame with its Predicate and
    relevant Arguments, to account for the
    incorporation of the Agent.

97
Lessons Learned from the Mapping
  • The results of the experiments are promising
  • FrameNet offers the possibility of comparing
    two similar lexical models whose lexical objects
    do not overlap perfectly → test
    the adequacy of the linguistic objects
  • NOMLEX gives the opportunity to work with two
    lexicons where linguistic notions correspond but
    are expressed with an opposite lexicon structure
    → test the adequacy of the architectural
    model
  • The high granularity and modularity of MILE
  • allow compatibility with differently packaged
    linguistic objects
  • allow the addition of new objects and relations
    without distorting the general architecture

98
RDF and MILE: Why?
  • Some reasons (from Nancy Ide et al. 2003)
  • MILE as a hierarchy of lexical objects built up
    by combining data categories via clearly defined
    relations is an ideal structure for rendering in
    RDF
  • RDF mechanism, with the capacity of expressing
    named relations between objects, offers a
    web-based means to represent the MILE
    architecture
  • RDF representation of linguistic information is
    an invaluable resource for language processing
    applications in the Semantic Web
  • RDF description and instantiation is in line with
    the goal of ISO TC37 SC4

99
RDF Representation of MILE
  • MILE was already supplied with
  • an RDF schema for the MILE Syntactic Layer
  • an instantiation of pre-defined syntactic objects
  • We increased the repository of shared lexical
    objects with the RDF description and (partial!)
    instantiations of the objects of the semantic and
    linking layers
  • This has been carried out with the intent to
  • submit the result to ISO TC37/SC4
  • foster the adoption of MILE, by offering a
    library of ready-to-use RDF objects

100
An RDF Schema for the synt-sem linking
<!-- An RDF Schema for ISLE lexical entries
     v 0.1 2004/05/05
     Author: Monachini -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:mlc="http://www.cs.vassar.edu/ide/rdf/isle-schema-v.6#"
         xmlns:mlc="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#">
<!-- ISLE/MILE lexical objects (classes for the synt-sem linking) -->
<rdfs:Class rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#CorrespSynUSemU">
  <rdfs:label>CorrespSynUSemU</rdfs:label>
  <rdfs:comment>This class links a SynU to a SemU</rdfs:comment>
</rdfs:Class>
<rdfs:Class rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#PredicativeCorresp">
  <rdfs:label>PredicativeCorresp</rdfs:label>
  <rdfs:comment>This class contains the associations between the syntactic slots and semantic arguments</rdfs:comment>
</rdfs:Class>
<rdfs:Class rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#SlotArgCorresp">
  <rdfs:label>SlotArgCorresp</rdfs:label>
  <rdfs:comment>This class links a syntactic slot to a semantic argument</rdfs:comment>
</rdfs:Class>
Classes
101
An RDF Schema for the synt-sem linking
<!-- Properties (relations) between objects and between objects and atomic values -->
<rdf:Property rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#hasSourceSynU">
  <rdfs:label>hasSourceSynU</rdfs:label>
  <rdfs:domain rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#CorrespSynUSemU"/>
  <rdfs:range rdf:resource="http://www.cs.vassar.edu/ide/rdf/isle-schema-v.6#SynU"/>
</rdf:Property>
<rdf:Property rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#hasTargetSemU">
  <rdfs:label>hasTargetSemU</rdfs:label>
  <rdfs:domain rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#CorrespSynUSemU"/>
  <rdfs:range rdf:resource="http://www.cs.vassar.edu/ide/rdf/isle-schema-v.6#SemU"/>
</rdf:Property>
<rdf:Property rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#hasPredicativeCorresp">
  <rdfs:label>hasPredicativeCorresp</rdfs:label>
  <rdfs:domain rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#CorrespSynUSemU"/>
  <rdfs:range rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#PredicativeCorresp"/>
</rdf:Property>
<rdf:Property rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#includesSlotArgCorresp">
  <rdfs:label>includesSlotArgCorresp</rdfs:label>
  <rdfs:domain rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#PredicativeCorresp"/>
  <rdfs:range rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#SlotArgCorresp"/>
</rdf:Property>
Properties
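The four properties above each pair a domain class with a range class. As a rough illustration, the Python sketch below transcribes those pairs into a table and uses them to check a handful of statements. Two assumptions should be noted: RDFS formally treats domain and range as inference rules, whereas this sketch reads them as validation constraints, and namespaces are dropped so that classes are compared by local name only. The node identifier link1 is hypothetical.

```python
# property -> (domain class, range class), transcribed from the schema above.
PROPERTIES = {
    "hasSourceSynU": ("CorrespSynUSemU", "SynU"),
    "hasTargetSemU": ("CorrespSynUSemU", "SemU"),
    "hasPredicativeCorresp": ("CorrespSynUSemU", "PredicativeCorresp"),
    "includesSlotArgCorresp": ("PredicativeCorresp", "SlotArgCorresp"),
}


def violations(statements, types):
    """Return one message per statement that breaks a domain or range pair.

    statements: iterable of (subject, property, object) identifiers
    types:      mapping from identifier to its class name
    """
    problems = []
    for subject, prop, obj in statements:
        if prop not in PROPERTIES:
            problems.append(f"unknown property: {prop}")
            continue
        domain, range_ = PROPERTIES[prop]
        if types.get(subject) != domain:
            problems.append(f"{prop}: subject {subject} is not a {domain}")
        if types.get(obj) != range_:
            problems.append(f"{prop}: object {obj} is not a {range_}")
    return problems


# Hypothetical typing of a few nodes; "link1" is an invented identifier.
TYPES = {
    "link1": "CorrespSynUSemU",
    "SYNUamareV": "SynU",
    "SEMUamareEXPEVE": "SemU",
    "isobivalent": "PredicativeCorresp",
}

good = [
    ("link1", "hasSourceSynU", "SYNUamareV"),
    ("link1", "hasTargetSemU", "SEMUamareEXPEVE"),
    ("link1", "hasPredicativeCorresp", "isobivalent"),
]
bad = [("link1", "hasTargetSemU", "SYNUamareV")]
```

Calling violations(good, TYPES) yields an empty list, while violations(bad, TYPES) reports that the object of hasTargetSemU is not a SemU.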
102
The library of Pre-instantiated objects
  • Enable modular specification of lexical entities
  • eliminate redundancy
  • identify lexical entries or sub-entries with
    shared properties
  • create ready-to-use packages that can be combined
    in different ways
  • Can be used off the shelf or as a starting
    point for the definition of new or modified
    categories

103
MDCR for some objects
<!-- Sample LDCR entry for a PredicativeCorresp and SlotArgCorresp objects
     DataCats for ISLE lexical entries
     v 0.1 2004/05/17
     Author: Monachini -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
<PredicativeCorresp rdf:ID="isobivalent">
  <includesSlotArgCorresp
    rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-datacats/SlotArgCorresp#Arg0Slot0 Arg1Slot1"/>
  </includesSlotArgCorresp>
</PredicativeCorresp>
<SlotArgCorresp rdf:ID="Arg0Slot0"
  SlotNumber="0"
  ArgNumber="0">
</SlotArgCorresp>
<SlotArgCorresp rdf:ID="Arg1Slot1"

Pre-instantiated PredicativeCorresp
Pre-instantiated SlotArgCorresp
104
A Sample Entry in MILE
  • The entry is shown in two alternative forms
  • with the full specification of a lexical object
    PredicativeCorresp
  • with an already instantiated object PredicativeCorresp
  • The advantage of the second form is that
  • the object does not need to be specified in the
    entry
  • and can be used and reused in other entries
  • Both explore the potential of MILE for the representation
    of lexical data

105
Sample full entry for amareV
<!-- The SynU SemU link -->
<correspondsTo>
  <CorrespSynUSemU>
    <hasSourceSynU mlcpID="SYNUamareV">
    </hasSourceSynU>
    <hasTargetSemU mlcpID="SEMUamareEXPEVE">
    </hasTargetSemU>
    <hasPredicativeCorresp>
      <PredicativeCorresp mlcpID="amare-PredCorresp">
        <includesSlotArgCorresp>
          <SlotArgCorresp SlotNumber="0" ArgNumber="0">
          </SlotArgCorresp>
          <SlotArgCorresp SlotNumber="1" ArgNumber="1">
          </SlotArgCorresp>
        </includesSlotArgCorresp>
      </PredicativeCorresp>
    </hasPredicativeCorresp>
  </CorrespSynUSemU>
</correspondsTo>

The full object PredicativeCorresp
106
the abbreviated entry
      <!-- The SynU SemU link -->
      <correspondsTo>
        <CorrespSynUSemU>
          <hasSourceSynU mlcpID="SYNUamareV">
          </hasSourceSynU>
          <hasTargetSemU mlcpID="SEMUamareEXPEVE"> </hasTargetSemU>
          <hasPredicativeCorresp
            rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-datacats/PredicativeCorresp#isobivalent"/>
        </CorrespSynUSemU>
      </correspondsTo>
    </SynU>
  </hasSynu>

Instantiated object PredicativeCorresp
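The difference between the full and the abbreviated entry comes down to resolving a reference against the library of pre-instantiated objects. The Python sketch below shows one way this could work, with dictionaries standing in for RDF nodes. The LIBRARY structure, the "ref" key and the expand function are hypothetical, and the values for the second slot/argument pair of isobivalent are assumed from the full amare entry, since the library slide is cut off at that point.

```python
import copy

# Hypothetical in-memory library keyed by the identifier used on the slides.
LIBRARY = {
    "isobivalent": {
        "type": "PredicativeCorresp",
        "includesSlotArgCorresp": [
            {"SlotNumber": 0, "ArgNumber": 0},
            {"SlotNumber": 1, "ArgNumber": 1},  # assumed, see lead-in
        ],
    },
}

# Abbreviated entry: the PredicativeCorresp is referenced, not spelled out.
ABBREVIATED = {
    "hasSourceSynU": "SYNUamareV",
    "hasTargetSemU": "SEMUamareEXPEVE",
    "hasPredicativeCorresp": {"ref": "isobivalent"},
}


def expand(entry, library=LIBRARY):
    """Return a copy of entry with every {'ref': id} value replaced by a
    deep copy of the corresponding library object."""
    expanded = {}
    for key, value in entry.items():
        if isinstance(value, dict) and set(value) == {"ref"}:
            expanded[key] = copy.deepcopy(library[value["ref"]])
        else:
            expanded[key] = value
    return expanded


full = expand(ABBREVIATED)
```

Because the library object is copied on expansion, many entries can point to isobivalent without any of them being able to alter the shared definition.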
107
The RDF Schema, the DCR for MILE objects and the
entries are available at www.ilc.cnr.it/clips/rdf/

108
and INTERA?
  • INTERA Multilingual Terminological Lexica will
    follow and merge the two frameworks
  • The MILE and
  • ISO TMF (Terminological Markup Framework)

109
Beyond MILE: future work
  • MILE Lexical Model oriented towards an
  • Open Distributed Lexical Infrastructure
  • Lexical Information Servers for multiple access
    to lexical information repositories
  • Enhance
  • user-adaptivity
  • resource sharing
  • cooperative creation
  • Develop integration and interchange tools

110
Broadening MILE ... other languages
  • Ongoing enlargement to Asian languages (Chinese,
    Japanese, Korean, Thai, Hindi ...)
  • promote common initiatives between Asia and Europe
    (e.g. within the EU 6th FP)
  • The creation of an Open Distributed Lexical
    Infrastructure, also supported by Asian
    institutions
  • AFNLP
  • University of Tokyo (Dept. of Computer Science)
  • Korean KAIST and KORTERM
  • Academia Sinica (Taiwan)

To valorise results and increase the visibility of LR
standardisation initiatives in a world-wide
context, while concretely promoting the
launch of a new common platform for
multilingual LR creation and management
111
Using semantically tagged corpora to acquire
semantic info and enhance Lexicons
  • evaluate the disambiguating power of the semantic
    types of the lexicon
  • assess the need of integrating lexicons with
    attested senses and/or phraseology
  • identify the inadequacy of sense distinctions in
    lexicons
  • check actual frequency of known senses in
    different text types
  • have a more precise and complete view on the
    semantics of a lemma
  • identify the most general senses
  • capture the most specific shifts of meaning
  • Capture just the core, basic distinctions in a
    core lexicon
  • Corpus analysis must not lead to excessive
    granularity of sense distinctions, but draw a
    distinction between
  • sense discrimination to be kept under control
    - clustering (manually or automatically)
  • additional, more granular information (often of
    collocational nature) which can/must be
    acquired/encoded within the broader senses, e.g.
    to help translation

112
Dynamic lexicon
  • Current computational lexicons (even WordNets)
    are static objects, still shaped on traditional
    dictionaries
  • suffering from the limitations induced by paper
    support
  • Thinking about the complex relationships between
    lexicon and corpus
  • towards a flexible model of dynamic lexicon
  • extending the expressiveness of a core static
    lexicon, adapting to the requirements of language
    in use as attested in corpora
  • with semantic clustering techniques, etc.
  • Convert the extreme flexibility and
    multidimensionality of meaning into large-scale
    and exploitable (VIRTUAL?) resources

→ Lexicon and Corpus together
113
What to annotate?
  • Mix of
  • Word-sense annotation (implicit semantic markup)
  • Semantic/conceptual markup
  • Syntagmatic relations
  • Dependency relations
  • Semantic roles

114
Need for a common Encoding Policy?
  • Agree on common policy issues?
  • is it feasible?
  • desirable?
  • to what extent?
  • This would imply, among other things,
  • analysis of needs, also applicative/industrial,
    before any large development initiative
  • base semantic tagging on commonly accepted
    standards/guidelines ??
  • up to which level?
  • Common semantic tagset, Gold Standard??
  • build a core set of semantically tagged corpora,
    encoded in a harmonised way, for a number of
    languages??
  • make annotated corpora available to the community
    at large
  • involve the community, collect and analyse
    existing semantically tagged corpora
  • devise a common set of parameters for analysis

115
A few Issues for discussion: MILE and lexicon
standards. More standardisation initiatives?
  • MILE - a general schema for encoding multilingual
    lexical info, as a meta-entry, as a common
    representational layer
  • Short- and medium-term requirements with respect to standards
    for multilingual lexicons and content encoding,
    also industrial requirements
  • Relation with the Spoken language community (see
    ELRA)
  • Semantic Web standards and the needs of content
    processing technologies: importance of reaching
    consensus on (linguistic and non-linguistic)
    content, in addition to agreement on formats and
    encoding issues (words convey content and
    knowledge)
  • Define further steps necessary to converge on
    common priorities

116
Broadening MILE ... other communities
  • NLP, lexicons, terminologies, ontologies,
    Semantic Web
  • a continuum?
  • Knowledge management is critical.
  • For content interoperability, need to converge
    around agreed standards also for the
    semantic/conceptual level
  • is the field mature enough to converge around
    agreed standards also for the semantic/conceptual
    level (e.g. to automatically establish links
    among different languages)?
  • Is the field of multilingual lexical resources
    ready to tackle the challenges set by the
    Semantic Web development?
  • Foster better integration with
  • corpus-driven data
  • terminology/ontology/semantic web communities
  • multimodal and multimedia aspects

Oriented towards open, distributed lexical
resources: Lexical Information Servers for
multiple access to lexical information
repositories
117
A few Issues for discussion: NLP, lexicons,
content, ontologies, Semantic Web: a
continuum?
  • Need for robust systems, able to acquire/tune
    multilingual lexical/linguistic/