Computational Lexicons and the Semantic Web

Bucharest, 30 July 2003

1
Computational Lexicons and the Semantic Web
  • Alessandro Lenci
  • Università di Pisa Department of Linguistics
  • Istituto di Linguistica Computazionale - CNR

2
Tutorial Outline
  • Computational lexicons for the Semantic Web (SW)
  • how they are
  • how they should be
  • The SW for computational lexicons
  • lexicon design in the age of the SW
  • Training session
  • case study: lexical modelling in RDF/S

3
The Semantic Web Vision
  • Turning the WWW into a machine understandable
    knowledge base

Intelligent Agents
Documents
Semantic Web
Applications
Databases
4
Six Challenges for the SW (Benjamins et al. 2002)
  • Content availability
  • Ontology availability
  • Multilinguality
  • Scalability
  • Visualization
  • Stability of SW languages

5
Six Challenges for the SW (Benjamins et al. 2002)
  • Content availability
  • Ontology availability
  • Multilinguality
  • Scalability
  • Visualization
  • Stability of SW languages

Human Language Technology (HLT)
6
Lexical Information and HLT
  • All language analysis involves determining
    meaning at some level
  • Anything from groups of related words to a
    full-blown representation of each sentence

Information retrieval
bank account money → Topic: financial
John went to the store → GO (AGENT: John, TARGET: store)
7
Computational Lexicons and HLT
Computational lexicons provide machine
understandable word knowledge
  • Explicit representation of word meaning
  • word content accessible to computational agents
  • Word meaning linked to word syntax and morphology
  • Multilingual lexical links

8
Computational Lexicons and HLT
  • Contain the linguistic information required to
    build meaning representations

Lexicon
went vpast GO
go v. (NP_SUBJ ((role AGENT) (sem animate)) (VP ((verb GO) (PP ((prep TO) (NP ((role TARGET) (sem loc)))))
John n. sem human
store n. sem loc
Lexicon
account n. domain financial
account v.
bank_1 n. domain financial
bank_2 n. domain geography
money n. domain financial
bank account money → Topic: financial
John went to the store → GO (AGENT: John, TARGET: store)
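A minimal sketch of how a domain-tagged lexicon like the one above could yield "Topic: financial" for the query "bank account money". The voting scheme and the function name guess_topic are assumptions made for illustration, not the method used in the tutorial; only the noun senses shown on the slide are included.

```python
from collections import Counter

# Toy domain lexicon mirroring the slide; sense labels such as "bank_1" are kept as-is.
DOMAIN_LEXICON = {
    "account": [("account", "financial")],
    "bank": [("bank_1", "financial"), ("bank_2", "geography")],
    "money": [("money", "financial")],
}


def guess_topic(words):
    """Return the domain supported by the most words, or None if no word is known."""
    votes = Counter()
    for word in words:
        # Each word votes once for every distinct domain among its senses.
        for domain in {d for _, d in DOMAIN_LEXICON.get(word, [])}:
            votes[domain] += 1
    return votes.most_common(1)[0][0] if votes else None


print(guess_topic("bank account money".split()))
```

The ambiguous word "bank" votes for both of its domains, and the unambiguous neighbours tip the balance, which is the kind of ambiguity reduction the slide hints at.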
9
Computational Lexicons and HLT
  • Critical language resources for NLP systems
  • syntactic subcategorization frames for parsing
  • semantic selectional preferences for ambiguity
    reduction
  • semantic classes for WSD, semantic tagging, etc.
  • Key components of HLT
  • monolingual lexicons: IE, QA, etc.
  • multilingual lexicons: MT, CLIR, etc.

10
Ontologies and Computational Lexicons
HLT
Access to Content
Semantic Web
Ontologies
Computational Lexicons
?
11
Ontologies
  • An ontology is a system of concepts relevant for
    knowledge and action in (a portion of) the world
  • categorization of objects and processes
  • inference
  • action planning

An ontology is a specification of a
conceptualization (Gruber 1993)
12
Ontologies
A set of knowledge terms, including the
vocabulary, the semantic interconnections, and
some simple rules of inference and logic (Hendler
2001)
ARTIFACT
OBJECT
ANIMAL
LOCATION
ENTITY
EVENT
13
Types of Ontologies
Vertical typology
Foundational Ontology
OBJECT
Domain Core Ontology
SOFTWARE
Domain Specific Ontology
WORD_PROCESSOR
  • Horizontal typology
  • Information System ontology
  • AI ontology
  • Linguistic ontology

14
Linguistic Ontology
  • A system of symbols representing the concepts
    (meanings) encoded by NL expressions (lexical
    units, terms, etc.)
  • specify semantic classes grouping semantically
    similar terms
  • semantic representation language
  • interlingua

car, van, truck
ARTIFACT
VEHICLE
OBJECT
dog, cat, horse
ANIMAL
MAMMAL
beach
LOCATION
ENTITY
BEACH
spiaggia
piano concert, rock concert
EVENT
CONCERT
15
Ontologies and Computational Lexicons
Ontology
Concept Space
Semantics
Syntax
Multilinguality
Morphology
Language/s
Computational Lexicon
16
Computational Lexicons: typology
  • Monolingual vs. multilingual
  • General purpose vs. domain (application) specific
  • Content type
  • (Morpho)-Syntactic
  • Semantic
  • Mixed
  • Terminological

17
Syntactic Computational Lexicons
  • Syntactic lexical information is distilled in
    subcategorization frames
  • ComLex, PAROLE, etc.
  • Syntactic frames typically include
  • number of selected arguments
  • syntactic categories of their realizations (PP,
    NP, etc.)
  • lexical constraints on argument realization (e.g.
    preposition heading a PP)
  • argument functional role (Subj, Obj, etc.)
  • optionality, control, auxiliary selection, etc.

hit V (Subj NP) (Objd NP)
answer N (Obji PP_to)
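As a rough illustration, the two frames above can be held in a small data structure and checked against the arguments found by a parser. The class names Slot and Frame and the function licenses are hypothetical; they are not taken from ComLex or PAROLE.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Slot:
    function: str                      # functional role, e.g. "Subj", "Objd", "Obji"
    category: str                      # syntactic category, e.g. "NP", "PP"
    preposition: Optional[str] = None  # lexical constraint on a PP head
    optional: bool = False


@dataclass(frozen=True)
class Frame:
    lemma: str
    pos: str
    slots: Tuple[Slot, ...]


HIT = Frame("hit", "V", (Slot("Subj", "NP"), Slot("Objd", "NP")))
ANSWER = Frame("answer", "N", (Slot("Obji", "PP", preposition="to"),))


def licenses(frame, observed):
    """True if the observed (function, category, preposition) triples fit the frame."""
    remaining = list(frame.slots)
    for triple in observed:
        match = next(
            (s for s in remaining
             if (s.function, s.category, s.preposition) == triple),
            None,
        )
        if match is None:
            return False           # an argument the frame does not select
        remaining.remove(match)
    # Every slot left unfilled must be optional.
    return all(s.optional for s in remaining)


print(licenses(HIT, [("Subj", "NP", None), ("Objd", "NP", None)]))
print(licenses(HIT, [("Subj", "NP", None)]))
```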
18
Semantic Computational Lexicons
  • Representing the meaning of a word (minimally)
    requires
  • Distinguishing different senses of the word
  • E.g. bank: financial institution vs. geographical
    configuration
  • Capturing inferences
  • E.g. being human implies being animate
  • Representing similarity of meaning with other
    words
  • E.g. bank, account, money all related to finances

19
Semantic Computational Lexicons
  • Mikrokosmos (Nirenburg, Mahesh et al.)
  • WordNet (Miller, Fellbaum et al.)
  • EuroWordNet (Vossen et al.)
  • SIMPLE (Calzolari, Lenci et al.)
  • FrameNet (Fillmore et al.)

20
Computational Lexicons: design issues
  • Network based
  • hierarchy (taxonomy)
  • WordNet
  • heterarchy
  • EuroWordNet
  • Frame based
  • Mikrokosmos
  • FrameNet
  • Hybrid
  • SIMPLE

21
EuroWordNet
22
EuroWordNet: Top Ontology
23
EuroWordNet
24
PAROLE-SIMPLE Lexicons
  • 12 EU monolingual core lexicons built according
    to a harmonized model and further extended at the
    national level
  • Integrated combinations of syntactic and semantic
    information
  • syntactic subcategorization frames
  • semantic type (Ontology)
  • semantic frames linked to syntax
  • semantic roles
  • selectional preferences
  • etc.
  • semantic relations
  • Pustejovsky's qualia roles, etc.
  • regular polysemy
  • event structure

25
SIMPLE Architecture
Italian lexicon
PAROLE Syntax
SemU
Semantic Frame (semantic roles, etc.)
Semantic Relations
Event Structure
Polysemy
etc.
26
SIMPLE: semantic relations
Top
Telic
Formal
Constitutive
Agentive
Is_a
Is_a_part_of
Property
Created_by
Agentive_cause
Indirect_telic
Activity
Contains
Instrumental
Is_the_habit_of
...
...
Used_for
Used_as
27
SIMPLE: semantic network
<fabbricare> make
Ala (wing)
Agentive
SemU: 3232; Type: Part; Part of an airplane
Agentive
<volare> fly
Used_for
Is_a_part_of
<aeroplano> airplane
Isa
SemU: 3268; Type: Part; Part of a building
<parte> part
Isa
Used_for
Isa
SemU: D358; Type: Body_part; Organ of birds for
flying
<edificio> building
Is_a_part_of
Is_a_part_of
SemU: 3467; Type: Role; Role in football
<uccello> bird
<giocatore> player
Isa
28
SIMPLE: semantic frames
PRED employ1 Arg1 <AGENT - HUMAN> Arg2 <PATIENT - HUMAN>
agent nominalization
master link
patient nominalization
event nominalization
SemU employee
SemU employment
SemU to employ
SemU employer
29
SIMPLE: semantic frames
Comprendere V
SemU: 61725; Type: Cognitive_event; To understand
SemU: 6962; Type: Constitutive_state; To include
PRED Comprendere1 <Arg1 human>, <Arg2 semiotic>
PRED Comprendere2 <Arg1 Entity>, <Arg2 Entity>
30
SIMPLE: semantic frames
il difensore di Berlusconi (Berlusconi's defender)
il difensore del Milan (the Milan fullback)
Difensore N
agent nominalization
SemU: 4125; Type: Role; Defender
PRED Difendere1 <Arg1>, <Arg2>
SemU: 3526; Type: Role; Fullback
<squadra> team
Is_a_member_of
31
Semantic multidimensionality
  • Identifying the semantic contribution of an
    NP requires access to a rich representation of
    the semantic content of its nominal head
  • The semantic structure of the nominal head
    determines the semantic relation expressed by a
    modifying PP (in Italian)
  • la pagina del libro (the page of the book): PART-OF
  • il difensore del Milan (the Milan fullback): MEMBER-OF
  • il suonatore di liuto (the lute player): TELIC
  • il tavolo di legno (the wooden table): MADE-OF
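A small sketch of the idea that the head noun's semantic structure selects the relation expressed by the PP. The dictionaries and the type labels (Artifact, Group, Instrument, Material) are hypothetical placeholders, not the SIMPLE ontology types.

```python
# Hypothetical mini-lexicon: each head noun lists the relations its semantic
# structure makes available, with the semantic type required of the modifier.
HEAD_RELATIONS = {
    "pagina": [("PART-OF", "Artifact")],
    "difensore": [("MEMBER-OF", "Group")],
    "suonatore": [("TELIC", "Instrument")],
    "tavolo": [("MADE-OF", "Material")],
}

# Hypothetical semantic types for the nouns inside the modifying PP.
MODIFIER_TYPE = {
    "libro": "Artifact",
    "Milan": "Group",
    "liuto": "Instrument",
    "legno": "Material",
}


def pp_relation(head, modifier):
    """Return the relation linking head and PP modifier, or None if none applies."""
    mod_type = MODIFIER_TYPE.get(modifier)
    for relation, required_type in HEAD_RELATIONS.get(head, []):
        if required_type == mod_type:
            return relation
    return None


for head, modifier in [("pagina", "libro"), ("difensore", "Milan"),
                       ("suonatore", "liuto"), ("tavolo", "legno")]:
    print(head, modifier, pp_relation(head, modifier))
```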
32
SIMPLE: sample entries
semantic relations
ontology
semantic frame
33
Computational Lexicons: loose ends
  • Non-compositional aspects in the lexicon
  • collocations, terms, MWEs, etc.
  • Integration between lexicons and corpus data
  • lexical tuning, data-driven lexicon population,
    etc.
  • Semantic dynamics (polysemy, lexical creativity,
    etc.)
  • context-sensitivity of meaning as a challenge
    for lexical semantics
  • sense enumeration vs. sense generation
  • heavy smoker, heavy book, heavy road, heavy sea,
    heavy wine, heavy sky, heavy artillery, etc.

34
Computational Lexicons: loose ends
  • Semantic type system for lexical senses must
    account for a non-static kaleidoscope of senses
  • Salience of aspects of meaning differs for
    different types
  • natural kinds → Is-a; artifacts → function
  • Possible solutions
  • multiple layers of representation
  • explicit identification of information so that
    NLP systems can access what is needed at a given
    time
  • dynamic type systems

35
Computational Lexicons: new challenges from the SW
  • From language resources for HLT to knowledge
    resources for inferential engines
  • in-depth lexical description for better content
    understanding
  • Content interoperability between computational
    lexicons
  • better integration between lexical information
    from different sources
  • Beyond the lexical information bottleneck
  • automatic lexical knowledge acquisition

36
Lexical Inferences
  • Midfielder Scott Sellars was sold to Blackburn
    for 35,000 and was bought back in the summer for
    750,000.
  • (FrameNet Corpus)

after e1: OWN(buyer, goods), NOT(OWN(buyer, money))
after e2: NOT(OWN(seller, goods)), OWN(seller, money)
e1 < e2; TIME(e2) = SUMMER
37
Hot Topics
To provide SW agents with high inferential
capacities in accessing linguistic content
  • In-depth lexical analysis
  • e.g. X buys Y from Z at t => Z owns Y before t,
    X owns Y after t
  • Key issues at the lexicon-grammar interface
  • predicate event structure
  • states, processes, accomplishments, etc.
  • temporal adverbs and temporal expressions
  • e.g. in three years, etc.
  • quantificational expressions etc.
  • syntax-semantics argument linking
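The buy example can be sketched as a rule attached to the predicate that is instantiated with the fillers of its arguments. The rule format and the function infer_buy are assumptions made for illustration; the tutorial does not specify an inference mechanism.

```python
# Hypothetical encoding of the rule: a list of (predicate, role names, temporal relation).
BUY_RULE = [
    ("OWN", ("seller", "goods"), "before"),
    ("OWN", ("buyer", "goods"), "after"),
]


def infer_buy(buyer, goods, seller, t):
    """Instantiate the ownership facts licensed by 'buyer buys goods from seller at t'."""
    roles = {"buyer": buyer, "goods": goods, "seller": seller}
    return [
        (predicate, tuple(roles[name] for name in args), f"{relation} {t}")
        for predicate, args, relation in BUY_RULE
    ]


for fact in infer_buy("X", "Y", "Z", "t"):
    print(fact)
```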

38
Computational Lexicons and the Semantic Web
  • Part 2
  • Lexicon Design in the Age of the Semantic Web

39
Lexicons of the Future
  • General purpose
  • portable over different domains
  • Multilingual
  • relations among lexical entities in different
    languages
  • Flexible and extensible
  • enable use of information at appropriate
    granularity for the application
  • enable continual, dynamic extension
  • Integrated with Web technology
  • content interoperability

40
Lexical Content Interoperability
  • The Lexical Web
  • Enable universal access to lexical information

FrameNet
SIMPLE
WordNet
EuroWordNet
Intelligent Agents
41
Some Requirements for Lexical Content
Interoperability
  • Compatibility between different models of
    lexical analysis
  • relational semantic models (e.g. WordNet)
  • Syntactic and semantic frames
  • Compatibility between different degrees of
    lexical specification
  • deep lexical representations (e.g. PAROLE-SIMPLE)
  • shallow semantic descriptions
  • Compatibility between different paradigms of
    multilinguality
  • lexicons for transfer-based MT
  • interlingua-based lexicons

42
The Need for Standards
  • To represent common information
  • while keeping flexibility
  • To enhance the sharing and reusability of
    multilingual lexical resources
  • To establish an open environment for the
    development and integration of multilingual
    resources
  • Information must be consistent with related
    technologies in order to take advantage of them
  • XML, RDF/S, etc.

43
International Standards for Language Engineering
  • Definition of standards for multilingual
    computational lexicons both at the content and at
    the representational level

44
ISLE
EAGLES guidelines for syntactic and semantic
lexicons
GENELEX Model
MILE Lexical Model
45
The MILE Lexical Model
  • A general architecture to foster the content
    interoperability between multilingual
    computational lexicons
  • Key issues
  • Modularity
  • User-adaptability
  • Resource sharing
  • Reusability

SW technologies and standards applied to lexicon
modelling
46
The MILE Lexical Model (MLM)
  • The MLM core is the Multilingual ISLE Lexical
    Entry (MILE)
  • a general schema for multilingual lexical
    resources
  • a lexical meta-entry as a common representational
    layer for multilingual lexicons
  • Computational lexicons can be viewed as different
    instances of the MILE schema

MILE Lexical Model
lexicon1
lexicon3
lexicon2
47
MILE: the building-block model
  • The MILE architecture is designed according to
    the building-block model
  • Lexical entries are obtained by combining various
    types of lexical objects (atomic and complex)
  • Users design their lexicon by
  • selecting and/or specifying the relevant lexical
    objects
  • combining the lexical objects into lexical entries
  • Lexical objects may be shared
  • within the same lexicon (intra-lexicon
    reusability)
  • among different lexicons (inter-lexicon
    reusability)

48
MILE: the building-block model
49
Modularity in MILE
multi-MILE
multilingual correspondence conditions
multiple levels of modularity
50
The Mono-MILE
  • Each monolingual layer within Mono-MILE
    identifies a basic unit of lexical description

SemU
basic unit to describe the semantic properties of
the MU
semantic layer
basic unit to describe the syntactic behavior of
the MU
SynU
syntactic layer
basic unit to describe the inflectional and
derivational morphological properties of the word
MU
morphological layer
51
The Mono-MILE
MU
52
Syntax-Semantics Linking
CorrespSynUSemU
53
Syntax-Semantics Linking
John gave the book to Mary
John gave Mary the book
SynU1: subj_NP, obj_NP, obl_PP_to
SynU2: subj_NP, obj_NP, obj_NP
SemU1
Semantic_Frame: GIVE
Arg1: Agent
Arg2: Theme
Arg3: Goal
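A sketch of how two syntactic units can share one semantic frame through separate correspondence tables, so that both sentences receive the same role assignment. The table layout and the function link are assumptions; in the double-object SynU2 the first object is taken to be the Goal and the second the Theme.

```python
GIVE_FRAME = {"Arg1": "Agent", "Arg2": "Theme", "Arg3": "Goal"}

# Slots of each syntactic unit, in surface order.
SYNUS = {
    "SynU1": ["subj_NP", "obj_NP", "obl_PP_to"],  # John gave the book to Mary
    "SynU2": ["subj_NP", "obj_NP", "obj_NP"],     # John gave Mary the book
}

# Hypothetical correspondence tables: the i-th slot maps to the listed argument.
CORRESP = {
    "SynU1": ["Arg1", "Arg2", "Arg3"],
    "SynU2": ["Arg1", "Arg3", "Arg2"],
}


def link(synu, fillers):
    """Map slot fillers (in surface order) to the semantic roles of the GIVE frame."""
    if len(fillers) != len(SYNUS[synu]):
        raise ValueError(f"{synu} expects {len(SYNUS[synu])} fillers")
    return {GIVE_FRAME[arg]: filler for arg, filler in zip(CORRESP[synu], fillers)}


print(link("SynU1", ["John", "the book", "Mary"]))
print(link("SynU2", ["John", "Mary", "the book"]))
```

Keeping the correspondence separate from both the SynU and the SemU means either unit can be reused in other entries without modification.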
54
The Multi-MILE
  • Open to various approaches to multilinguality
  • transfer-based
  • monolingual descriptions are used to state
    complex correspondences (tests and actions)
    between source and target entries
  • interlingua-based
  • monolingual entries linked to
    language-independent lexical objects (e.g.
    semantic frames, primitive predicates, etc.)

55
Multi-MILE
IT_SemU_2 → EN_SemU_1
IT_SynU_2 → EN_SynU_1
IT_Slot_0 → EN_Slot_1
IT_Slot_1 → EN_Slot_0
AddFeature to source SemU: HUMAN
AddSlot to target SynU: MODIF PP_with
56
Multi-MILE
IT Lexicon
EN Lexicon
multilingual conditions
finger
modif(mano)
dito
modif(piede)
toe
multilingual conditions
entrare to enter
run PP_into
PP_di_corsa
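The multilingual conditions above can be read as ordered test-and-action rules. The sketch below assumes a simple rule list and a function named transfer, neither of which is part of the MILE specification; what happens to dito without a modifier is left open, since the slide does not say.

```python
# Each rule: (source lemma, required modifier or None for "no condition", target).
# More specific rules come first, so they win over the unconditional default.
RULES = [
    ("dito", "mano", "finger"),
    ("dito", "piede", "toe"),
    ("entrare", "di corsa", "run into"),
    ("entrare", None, "enter"),
]


def transfer(source, modifier=None):
    """Return the English target for an Italian entry, given an optional modifier."""
    for lemma, condition, target in RULES:
        if lemma == source and (condition is None or condition == modifier):
            return target
    return None  # no correspondence stated for this configuration


print(transfer("dito", "piede"))
print(transfer("entrare", "di corsa"))
print(transfer("entrare"))
```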
57
Defining the MLM
  • The MLM is designed as an E-R model (MILE Entry
    Schema)
  • defines the lexical objects and the ways they can
    be combined into a lexical entry
  • The MLM includes two types of lexical objects
  • MILE Lexical Classes (MLC)
  • MILE Lexical Data Categories (MDC)

58
MILE Lexical Classes
  • Represent the main building blocks of lexical
    entries
  • Define an ontology of lexical objects
  • represent lexical notions such as semantic unit,
    syntactic feature, syntactic frame, semantic
    predicate, semantic relation, synset, etc.
  • Similar to class definitions in OO languages
  • specify the relevant attributes
  • define the relations with other classes
  • hierarchically structured

59
MILE Lexical Classes: an ontology of lexical
objects
60
MILE Lexical Data Categories
  • MDC are instances of the MILE Lexical Classes
  • Each MDC represents a resource
  • uniquely identified by a URI
  • Two types of MDC
  • Core MDC
  • belong to shared repositories (Lexical Data
    Category Registry)
  • lexical objects and linguistic notions with wide
    consensus
  • User Defined MDC
  • user-specific or language specific lexical
    objects

61
MILE Lexical Data Categories
MLMFeature
MLMGrammaticalFunction
62
Defining the MLM
MILE Entry Schema
MILE Lexical Classes
RDF/S Descriptions
63
RDF Instantiation of the MLM
Lexicon2
Resources
Lexicon1
Lexicon3
Metadata
Lexical Objects
Resources
Lexical Classes
Lexical Data Categories
64
General Means
  • W3C standards
  • Resource Description Framework (RDF/S)
  • Web Ontology Language (OWL)
  • Built on the XML web infrastructure to enable the
    creation of a Semantic Web
  • web objects are classified according to their
    properties
  • semantics of relations (links) to other web
    objects precisely defined

65
MILE Lexical Model
  • Ideal structure for rendering in RDF
  • hierarchy of lexical objects built up by
    combining atomic data categories via clearly
    defined relations
  • Proof of concept
  • Create an RDF schema for the MILE Lexical Model
  • version 1.2
  • Instantiate MILE Lexical Data Categories

66
The RDF Schema
  • Defines classes of objects (MLC) and their
    relations to other objects
  • Like a class definition in Java, etc.
  • Classes and properties in the schema correspond
    to the E-R model
  • Can specify sub-classes/sub-properties and
    inheritance

67
MILE Lexical Data Category Registry (MDC)
  • Instantiation of pre-defined lexical objects
  • Extension of the shared class schema with
    lexicon-specific sub-classes and sub-properties
  • Can be used off the shelf or as a departure
    point for the definition of new or modified
    categories
  • Enables modular specification of lexical entities
  • eliminate redundancy
  • identify lexical entries or sub-entries with
    shared properties

68
MLC in RDF/S: features
features are properties of lexical objects
mlm:LexObject
mlm:Values
mlm:feature
rdfs:subPropertyOf
rdfs:subClassOf
mlm:semFeature
rdfs:subClassOf
mlm:SemValues
mlm:synFeature
mlm:SynValues
69
MLC in RDF/S: syntactic features
<rdf:Property rdf:ID="synCat">
  <rdfs:subPropertyOf rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#synFeature"/>
  <rdfs:range rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#SynCatValues"/>
</rdf:Property>
<rdfs:Class rdf:ID="SynCatValues">
  <rdfs:subClassOf rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#SynValues"/>
  <owl:oneOf rdf:parseType="Collection">
    <owl:Thing rdf:about="Noun"/>
    <owl:Thing rdf:about="Verb"/>
    <owl:Thing rdf:about="Adjective"/>
    ...
  </owl:oneOf>
</rdfs:Class>
</rdf:RDF>
feature values
70
MLC in RDF/S: semantic features
<rdf:Property rdf:ID="domain">
  <rdfs:subPropertyOf rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#semFeature"/>
  <rdfs:range rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#DomainValues"/>
</rdf:Property>
<rdfs:Class rdf:ID="DomainValues">
  <rdfs:subClassOf rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#SemValues"/>
  <owl:oneOf rdf:parseType="Collection">
    <owl:Thing rdf:about="Finance"/>
    <owl:Thing rdf:about="Medicine"/>
    <owl:Thing rdf:about="Sport"/>
    ...
  </owl:oneOf>
</rdfs:Class>
</rdf:RDF>
domain ontology
71
Synsets in RDF/S
mlm:word
mlm:Synset
rdfs:Literal
mlm:gloss
rdfs:Literal
mlm:feature
mlm:synsetRelation
mlm:Values
mlm:Synset
cf. also http://www.semanticweb.org/library/wordnet/wordnet-20000620.rdfs
72
Synsets in RDF/S
<rdfs:Class rdf:ID="Synset">
  <rdfs:label>Synset</rdfs:label>
  <rdfs:comment>This class formalizes the notion of synset as defined in WordNet (Fellbaum 1998).</rdfs:comment>
  <rdfs:subClassOf rdf:resource="#LexObject"/>
</rdfs:Class>
<rdf:Property rdf:ID="synsetRelation">
  <rdfs:domain rdf:resource="#Synset"/>
  <rdfs:range rdf:resource="#Synset"/>
</rdf:Property>
<rdf:Property rdf:ID="hypernym" mlm:source="WordNet1.7">
  <rdfs:comment>The WordNet hypernym relation</rdfs:comment>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</rdf:Property>
<rdf:Property rdf:ID="meronym" mlm:source="WordNet1.7">
  <rdfs:comment>The WordNet meronym relation</rdfs:comment>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</rdf:Property>
relation between synsets
different types of synset relations
73
WordNet 1.7 Synsets
<mlm:Synset rdf:about="http://www.cogsci.princeton.edu/wn1.7/concept#01752990" mlm:source="WordNet1.7">
  <mlm:gloss>A member of the genus Canis</mlm:gloss>
  <mlm:word>dog</mlm:word>
  <mlm:word>domestic dog</mlm:word>
  <mlm:word>Canis familiaris</mlm:word>
  <mdc:synCat rdf:resource="Noun"/>
  <mdc:domain rdf:resource="Zoology"/>
  <mdc:hypernym rdf:resource="http://www.cogsci.princeton.edu/wn1.7/concept#01752283"/>
</mlm:Synset>
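A sketch of how a client might read such a synset description using only the Python standard library. The mlm and mdc namespace URIs below are hypothetical placeholders (the slides do not give them), and the "#" separator in the concept URLs is assumed; the function name read_synsets is also illustrative.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MLM = "http://example.org/mlm#"  # hypothetical namespace URI
MDC = "http://example.org/mdc#"  # hypothetical namespace URI

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:mlm="{MLM}" xmlns:mdc="{MDC}">
  <mlm:Synset rdf:about="http://www.cogsci.princeton.edu/wn1.7/concept#01752990">
    <mlm:gloss>A member of the genus Canis</mlm:gloss>
    <mlm:word>dog</mlm:word>
    <mlm:word>domestic dog</mlm:word>
    <mlm:word>Canis familiaris</mlm:word>
    <mdc:synCat rdf:resource="Noun"/>
    <mdc:domain rdf:resource="Zoology"/>
    <mdc:hypernym rdf:resource="http://www.cogsci.princeton.edu/wn1.7/concept#01752283"/>
  </mlm:Synset>
</rdf:RDF>"""


def read_synsets(xml_text):
    """Extract id, gloss, words and hypernym links from each mlm:Synset element."""
    root = ET.fromstring(xml_text)
    synsets = []
    for node in root.findall(f"{{{MLM}}}Synset"):
        synsets.append({
            "id": node.get(f"{{{RDF}}}about"),
            "gloss": node.findtext(f"{{{MLM}}}gloss"),
            "words": [w.text for w in node.findall(f"{{{MLM}}}word")],
            "hypernyms": [h.get(f"{{{RDF}}}resource")
                          for h in node.findall(f"{{{MDC}}}hypernym")],
        })
    return synsets


for synset in read_synsets(DOC):
    print(synset["words"], "->", synset["hypernyms"])
```

A full RDF toolkit would resolve relative resources such as "Noun" against a base URI; this sketch leaves them as plain strings.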
features
hypernym
74
Conclusions and Future Work
  • The MILE Lexical Model is oriented towards open,
    distributed lexical resources
  • Lexical Information Servers for multiple access
    to lexical information repositories
  • Enhance user-adaptivity and resource sharing
  • Develop integration and interchange tools
  • Promote interchange with the Semantic Web and
    Ontology communities
  • Related projects and initiatives
  • ISO, INTERA, ENABLER, etc.

75
Acknowledgements
S. Atkins, N. Bel, F. Bertagna, P. Bouillon, N.
Calzolari, C. Fellbaum, R. Grishman, N. Ide, M.
Palmer, W. Peters, G. Thurmair, M. Villegas, P.
Wittenburg, A. Zampolli and many others
Thank You !