From WordNet, to EuroWordNet, to the Global Wordnet Grid: anchoring languages to universal meaning - PowerPoint PPT Presentation

1
From WordNet, to EuroWordNet, to the Global Wordnet Grid: anchoring languages to universal meaning
  • Piek Vossen
  • VU University Amsterdam

2
What kind of resource is wordnet?
  • Most widely used database in language technology
  • Enormous impact in language technology
    development
  • Large
  • Free and downloadable
  • English

3
WordNet
  • http://wordnet.princeton.edu/
  • Developed by George Miller and his team at
    Princeton University, as the implementation of a
    mental model of the lexicon
  • Organized around the notion of a synset: a set of
    synonyms in a language that represent a single
    concept
  • Semantic relations between concepts
  • Covers over 117,000 concepts and over 150,000
    English words

4
Relational model of meaning
[Diagram: a network of related words: animal, cat, kitten, dog, puppy, man, woman, boy, girl, and Dutch meisje (girl)]
5
Wordnet: a network of semantically related words
[Diagram: a fragment of WordNet around car, with the synsets {conveyance, transport}, {vehicle}, {motor vehicle, automotive vehicle}, {car, auto, automobile, machine, motorcar}, {cruiser, squad car, patrol car, police car, prowl car}, {cab, taxi, hack, taxicab}, {car door}, {car window}, {car mirror}, {bumper}, {armrest}, {doorlock} and {hinge, flexible joint}]
6
Wordnet Semantic Relations
WN 1.5 starting point: the synset as a weak notion of synonymy: "two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not alter the truth value" (Miller et al. 1993).
Relations between synsets (relation: POS combination, example):
  • ANTONYMY: adjective-to-adjective, good/bad; verb-to-verb, open/close
  • HYPONYMY: noun-to-noun, car/vehicle; verb-to-verb, walk/move
  • MERONYMY: noun-to-noun, head/nose
  • ENTAILMENT: verb-to-verb, buy/pay
  • CAUSE: verb-to-verb, kill/die
7
Wordnet Data Model
[Diagram: three layers (vocabulary of a language, concepts, relations). Word senses such as bank 1, bank 2, fiddle, violin, fiddler, violist, string 1 and string 2 point to concept records: rec 12345 "financial institute", rec 54321 "side of a river", rec 9876 "small string instrument", rec 65438 "musician playing violin", rec35576 "string of instrument" and rec29551 "underwear". The records are linked by type-of relations (to rec42654 "musician" and rec25876 "string instrument") and a part-of relation.]
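The three layers of this data model can be sketched with plain Python dictionaries. This is a minimal illustration, not WordNet's actual file format: the record numbers are the illustrative ones from the slide, and the sense numbers and the exact word-to-record and relation links are an assumed reading of the diagram.

```python
# Concept layer: record id -> gloss. Ids are the illustrative ones from the slide.
concepts = {
    12345: "financial institute",
    54321: "side of a river",
    9876: "small string instrument",
    65438: "musician playing violin",
    42654: "musician",
    35576: "string of instrument",
    29551: "underwear",
    25876: "string instrument",
}

# Vocabulary layer: (word, sense number) -> concept record.
# The sense numbers and links are an assumed reading of the diagram.
senses = {
    ("bank", 1): 12345,
    ("bank", 2): 54321,
    ("fiddle", 1): 9876,
    ("violin", 1): 9876,
    ("fiddler", 1): 65438,
    ("violist", 1): 65438,
    ("string", 1): 35576,
    ("string", 2): 29551,
}

# Relation layer: typed links between concept records, not between words.
relations = [
    (9876, "type-of", 25876),
    (65438, "type-of", 42654),
    (35576, "part-of", 25876),
]


def synset(record_id):
    """All words with a sense pointing to the same record form one synset."""
    return sorted(word for (word, _), rid in senses.items() if rid == record_id)


def related(record_id, relation):
    """Targets of one relation type starting from a concept record."""
    return [target for source, rel, target in relations
            if source == record_id and rel == relation]


print(synset(9876))                             # ['fiddle', 'violin']
print(concepts[related(35576, "part-of")[0]])  # string instrument
```

Because relations attach to records, a link such as type-of is stated once for the synset {fiddle, violin} rather than once per word, while the ambiguous word bank simply has two senses pointing to different records.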
8
Some observations on Wordnet
  • synsets are more compact representations for
    concepts than word meanings in traditional
    lexicons
  • synonyms and hypernyms are substitutional
    variants:
    • begin / commence
    • I once had a canary. The bird got sick. The poor
      animal died.
  • hyponymy and meronymy chains are important
    transitive relations for predicting properties
    and explaining textual properties:
    • object -> artifact -> vehicle -> 4-wheeled
      vehicle -> car
  • strict separation of part of speech, although
    concepts are closely related (bed/sleep) and
    are similar (dead/death)
  • lexicalization patterns reveal important mental
    structures
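Because hyponymy is transitive, a property stated once at a general level can be predicted for every concept below it. A minimal sketch over the object -> ... -> car chain; the hypernym links are hand-coded and the property strings are hypothetical, made up for illustration:

```python
# Hand-coded hypernym links (child -> parent) following the chain above.
HYPERNYM = {
    "car": "4-wheeled vehicle",
    "4-wheeled vehicle": "vehicle",
    "vehicle": "artifact",
    "artifact": "object",
}

# Hypothetical properties attached at different levels of the chain.
PROPERTIES = {
    "object": {"has a location"},
    "artifact": {"made by people"},
    "vehicle": {"used for transport"},
    "4-wheeled vehicle": {"has four wheels"},
}


def hypernym_chain(concept):
    """Follow hypernym links transitively up to the top of the hierarchy."""
    chain = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def inherited_properties(concept):
    """A concept inherits every property stated for any of its hypernyms."""
    props = set(PROPERTIES.get(concept, set()))
    for ancestor in hypernym_chain(concept):
        props |= PROPERTIES.get(ancestor, set())
    return props


print(hypernym_chain("car"))  # ['4-wheeled vehicle', 'vehicle', 'artifact', 'object']
print(sorted(inherited_properties("car")))
```

The same upward walk explains the canary example: bird and animal are hypernyms of canary, so they can substitute for it in later sentences.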

9
Lexicalization patterns
[Diagram: a hierarchy from entity via the 25 unique beginners (e.g. organism, object) through concepts such as animal, plant, artifact, waste, garbage and threat, and bird, tree, flower and building, down to the basic level concepts and below: canary, common canary, rose, crocodile, dog, church, abbey]
Basic level concepts:
  • balance of two principles:
    • predict most features
    • apply to most subclasses
  • where most concepts are created
  • amalgamate most parts
  • most abstract level at which a picture can be drawn
10
Wordnet top level
11
Meronymy pictures
[Picture labelled: beak]
12
Meronymy pictures
13
Co-reference constraint in wordnet: cats cannot be a kind of cats
  • S: (n) cat, true cat (feline mammal usually
    having thick soft fur and no ability to roar:
    domestic cats; wildcats)
  • S: (n) guy, cat, hombre, bozo (an informal term
    for a youth or man) "a nice guy"; "the guy's only
    doing it for some doll"
  • S: (n) cat (a spiteful woman gossip) "what a cat
    she is!"
  • S: (n) kat, khat, qat, quat, cat, Arabian tea,
    African tea (the leaves of the shrub Catha edulis
    which are chewed like tobacco or used to make
    tea; has the effect of a euphoric stimulant) "in
    Yemen kat is used daily by 85% of adults"
  • S: (n) cat-o'-nine-tails, cat (a whip with nine
    knotted cords) "British sailors feared the cat"
  • S: (n) Caterpillar, cat (a large tracked vehicle
    that is propelled by two endless metal belts;
    frequently used for moving earth in construction
    and farm work)
  • S: (n) big cat, cat (any of several large cats
    typically able to roar and living in the wild)
  • S: (n) computerized tomography, computed
    tomography, CT, computerized axial tomography,
    computed axial tomography, CAT (a method of
    examining body organs by scanning them with X
    rays and using a computer to construct a series
    of cross-sectional scans along a single axis)
  • S: (n) domestic cat, house cat, Felis domesticus,
    Felis catus (any domesticated member of the genus
    Felis)

14
(No Transcript)
15
Wordnet 3.0 statistics
16
Wordnet 3.0 statistics
17
Wordnet 3.0 statistics
18
http://www.visuwords.com
19
(No Transcript)
20
Usage of Wordnet
  • Improve recall of text-based analysis:
    • Query -> Index
    • Synonyms: commence -> begin
    • Hypernyms: taxi -> car
    • Hyponyms: car -> taxi
    • Meronyms: trunk -> elephant
    • Lexical entailments: gun -> shoot
  • Inferencing:
    • what things can burn?
  • Expression in language generation and
    translation:
    • alternative words and paraphrases
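Query expansion is the simplest way to use these relations for recall: add synonyms and, optionally, hyponyms or hypernyms of a query term before matching it against an index. A minimal sketch over a hand-made mini-wordnet; all entries and the helper names are hypothetical, and the matching is a crude substring test:

```python
# Hypothetical mini-wordnet: synonyms, hypernyms and hyponyms of a few terms.
SYNONYMS = {"commence": {"begin"}, "begin": {"commence"}}
HYPERNYMS = {"taxi": {"car"}, "car": {"vehicle"}}
HYPONYMS = {"car": {"taxi", "police car"}, "vehicle": {"car"}}


def expand(term, use_hyponyms=True, use_hypernyms=False):
    """Expand a query term with related words to raise recall."""
    terms = {term} | SYNONYMS.get(term, set())
    if use_hyponyms:
        terms |= HYPONYMS.get(term, set())
    if use_hypernyms:
        terms |= HYPERNYMS.get(term, set())
    return terms


def search(query, documents, **options):
    """Return ids of documents containing any expanded term (crude substring match)."""
    terms = expand(query, **options)
    return sorted(doc_id for doc_id, text in documents.items()
                  if any(t in text for t in terms))


docs = {1: "the taxi was late", 2: "a red car", 3: "we begin at noon"}
print(search("car", docs))       # [1, 2]
print(search("commence", docs))  # [3]
```

Expanding downwards (car -> taxi) is usually safer than expanding upwards (car -> vehicle), which tends to trade precision for recall.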

21
Improve recall
  • Information retrieval:
    • small databases without redundancy, e.g. image
      captions, video text
  • Text classification:
    • small training sets
  • Question answering systems:
    • query analysis: who, whom, where, what, when

22
Improve recall
  • Anaphora resolution:
    • The girl fell off the table. She....
    • The glass fell off the table. It...
  • Coreference resolution:
    • When he moved the furniture, the antique table
      got damaged.
  • Information extraction (unstructured text to
    structured databases):
    • generic forms or patterns ("vehicle") -> text with
      specific cases ("car")
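For the anaphora and coreference examples, wordnet hypernyms can rule out antecedents that do not agree semantically with the pronoun, and can link table to furniture. A minimal sketch, assuming hand-coded hypernym links and a deliberately crude, hypothetical agreement rule (she needs a female person, it needs a non-person):

```python
# Hypothetical hypernym links for the example sentences.
HYPERNYM = {
    "girl": "female person", "female person": "person",
    "glass": "container", "container": "artifact",
    "table": "furniture", "furniture": "artifact",
}


def ancestors(word):
    """All hypernyms of a word, nearest first."""
    result = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        result.append(word)
    return result


def compatible(pronoun, noun):
    """Very rough semantic agreement check between a pronoun and a candidate noun."""
    classes = set(ancestors(noun)) | {noun}
    if pronoun == "she":
        return "female person" in classes
    if pronoun == "it":
        return "person" not in classes
    return True


def resolve(pronoun, candidates):
    """Keep only the candidate antecedents that agree with the pronoun."""
    return [noun for noun in candidates if compatible(pronoun, noun)]


print(resolve("she", ["girl", "table"]))  # ['girl']
print(resolve("it", ["glass", "table"]))  # ['glass', 'table']
print(ancestors("table"))                 # ['furniture', 'artifact']
```

The semantic constraint only filters candidates: it settles the girl sentence but cannot choose between glass and table, which still needs syntactic or discourse information.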

23
Improve recall
  • Summarizers:
    • Sentence selection based on word counts ->
      concept counts
    • Avoid repetition in summary -> language
      generation
    • Limited inferencing: detect locations,
      organisations, etc.
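Counting concepts instead of word forms lets synonyms reinforce each other when sentences are scored for a summary. A minimal sketch; the word-to-concept map, its identifiers and the scoring function are hypothetical:

```python
from collections import Counter

# Hypothetical word -> concept map; synonyms share one concept id.
CONCEPT = {
    "car": "C-car", "auto": "C-car", "automobile": "C-car",
    "begin": "C-begin", "commence": "C-begin",
}


def word_counts(tokens):
    return Counter(tokens)


def concept_counts(tokens):
    """Count concepts instead of word forms; unmapped words count as themselves."""
    return Counter(CONCEPT.get(token, token) for token in tokens)


def sentence_score(sentence, counts):
    """Score a sentence by the summed document frequency of its concepts."""
    return sum(counts[CONCEPT.get(t, t)] for t in sentence.split())


text = "the car broke down . the auto was old . an automobile is expensive"
tokens = text.split()
counts = concept_counts(tokens)
print(word_counts(tokens)["car"])   # 1
print(counts["C-car"])              # 3
print(sentence_score("the car broke down", counts))  # 7
```

With word counts, car, auto and automobile each occur once; with concept counts they form the most frequent topic of the text.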

24
Many others
  • Data sparseness for machine learning: hapaxes can
    be replaced by semantic classes
  • Use redundancy for more robustness: spelling
    correction and speech recognition can build
    semantic expectations using Wordnet and make
    better choices
  • Sentiment and opinion mining
  • Natural language learning

25
Recall Precision
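[Diagram: overlapping sets of relevant and retrieved documents for words such as cell phone, mobile phones, police cell, jail, nerve cell and neuron]
recall = intersection / relevant
precision = intersection / retrieved
Recall < 20% for basic search engines! (Blair & Maron 1985)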
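Both ratios can be computed directly from sets of documents. A minimal sketch; the document sets are hypothetical results for the query cell phone, once from a literal keyword search and once after synonym expansion:

```python
def recall_precision(relevant, retrieved):
    """recall = |relevant & retrieved| / |relevant|,
    precision = |relevant & retrieved| / |retrieved|."""
    hits = relevant & retrieved  # the intersection of the two sets
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical documents for the query "cell phone".
relevant = {"cell phone", "mobile phones"}
literal_search = {"cell phone", "police cell", "nerve cell"}
expanded_search = {"cell phone", "mobile phones"}

print(recall_precision(relevant, literal_search))   # (0.5, 0.3333333333333333)
print(recall_precision(relevant, expanded_search))  # (1.0, 1.0)
```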
26
EuroWordNet
  • The development of a multilingual database with
    wordnets for several European languages
  • Funded by the European Commission, DG XIII,
    Luxembourg as projects LE2-4003 and LE4-8328
  • March 1996 - September 1999
  • 2.5 million euro
  • http://www.hum.uva.nl/ewn
  • http://www.illc.uva.nl/EuroWordNet/finalresults-ewn.html

27
EuroWordNet
  • Languages covered:
    • EuroWordNet-1 (LE2-4003): English, Dutch,
      Spanish, Italian
    • EuroWordNet-2 (LE4-8328): German, French, Czech,
      Estonian
  • Size of vocabulary:
    • EuroWordNet-1: 30,000 concepts, 50,000 word
      meanings
    • EuroWordNet-2: 15,000 concepts, 25,000 word
      meanings
  • Type of vocabulary:
    • the most frequent words of the languages
    • all concepts needed to relate more specific
      concepts

28
EuroWordNet Model
[Diagram: language-specific wordnets linked to a shared Inter-Lingual-Index]
I: language-independent link
II: link from a language-specific wordnet to the Inter-Lingual-Index
III: language-dependent link
29
EuroWordNet Design
30
Differences in relations between EuroWordNet and
WordNet
  • Added Features to relations
  • Cross-Part-Of-Speech relations
  • New relations to differentiate shallow
    hierarchies
  • New interpretations of relations

31
EWN Relationship Labels
Disjunction/conjunction of multiple relations of the same type. WordNet1.5:
  • door 1 -- (a swinging or sliding barrier that will close the entrance to a room or building; "he knocked on the door"; "he slammed the door as he left") PART OF: doorway, door, entree, entry, portal, room access
  • door 6 -- (a swinging or sliding barrier that will close off access into a car; "she forgot to lock the doors of her car") PART OF: car, auto, automobile, machine, motorcar
32
EWN Relationship Labels
airplane HAS_MERO_PART conj1 door
airplane HAS_MERO_PART conj2 disj1 jet engine
airplane HAS_MERO_PART conj2 disj2 propeller
door HAS_HOLO_PART disj1 car
door HAS_HOLO_PART disj2 room
door HAS_HOLO_PART disj3 entrance
dog HAS_HYPERONYM conj1 mammal
dog HAS_HYPERONYM conj2 pet
albino HAS_HYPERONYM disj1 plant
albino HAS_HYPERONYM disj2 animal
Default interpretation: non-exclusive disjunction
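One way to read these labels: relations with the same conj number form one group, all groups must hold, and the members of a group (the disj alternatives, or unlabelled relations by default) form a non-exclusive disjunction. A minimal sketch under that reading; the tuple layout and function name are assumptions, not the EuroWordNet database format:

```python
from collections import defaultdict

# Each relation: (target, conj label or None, disj label or None), as listed above.
AIRPLANE_PARTS = [("door", 1, None), ("jet engine", 2, 1), ("propeller", 2, 2)]
DOG_HYPERONYMS = [("mammal", 1, None), ("pet", 2, None)]
ALBINO_HYPERONYMS = [("plant", None, 1), ("animal", None, 2)]


def satisfied(relations, facts):
    """Conjunction over conj groups; inside a group, non-exclusive disjunction.
    Relations without a conj label fall into one default group."""
    groups = defaultdict(list)
    for target, conj, _disj in relations:
        groups[conj].append(target)
    return all(any(t in facts for t in targets) for targets in groups.values())


print(satisfied(AIRPLANE_PARTS, {"door", "propeller"}))        # True
print(satisfied(AIRPLANE_PARTS, {"jet engine", "propeller"}))  # False: no door
```

An airplane thus needs a door and at least one of jet engine or propeller, a dog is both a mammal and a pet, and an albino is a plant or an animal.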
33
EWN Relationship Labels
Factive/non-factive CAUSES (Lyons 1977):
  • factive (default interpretation): to kill causes to die
    kill CAUSES die
  • non-factive: E1 probably or likely causes event E2, or E1 is intended to cause some event E2: to search may cause to find
    search CAUSES find (non-factive)
34
Cross-Part-Of-Speech relations
WordNet1.5: nouns and verbs are not interrelated by basic semantic relations such as hyponymy and synonymy:
  • adornment 2 => change of state -- (the act of changing something)
  • adorn 1 => change, alter -- (cause to change; make different)
EuroWordNet: words of different parts of speech can be inter-linked with explicit xpos-synonymy, xpos-antonymy and xpos-hyponymy relations:
  • adorn V XPOS_NEAR_SYNONYM adornment N
  • size N XPOS_NEAR_HYPONYM tall A, short A
35
Role relations
In the case of many verbs and nouns the most salient relation is not the hyperonym but the relation between the event and the involved participants. These relations are expressed as follows:
  • knife ROLE_INSTRUMENT to cut
  • to cut INVOLVED_INSTRUMENT knife (reversed)
  • school ROLE_LOCATION to teach
  • to teach INVOLVED_LOCATION school (reversed)
These relations are typically used when other relations, mainly hyponymy, do not clarify the position of the concept in the network, but the word is still closely related to another word.
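Since every reversed relation mirrors an encoded one, the reversed links can be generated mechanically. A minimal sketch, assuming (source, relation, target) triples and only the two inverse pairs shown above; other role pairs would be added to the table the same way:

```python
# Inverse pairs for the role relations above.
INVERSE = {
    "ROLE_INSTRUMENT": "INVOLVED_INSTRUMENT",
    "ROLE_LOCATION": "INVOLVED_LOCATION",
}
INVERSE.update({v: k for k, v in list(INVERSE.items())})


def add_reversed(triples):
    """Return the triples plus their automatically generated reversed counterparts."""
    result = set(triples)
    for source, rel, target in triples:
        if rel in INVERSE:
            result.add((target, INVERSE[rel], source))
    return result


encoded = [("knife", "ROLE_INSTRUMENT", "to cut"),
           ("school", "ROLE_LOCATION", "to teach")]
for triple in sorted(add_reversed(encoded)):
    print(triple)
```

Generating the reversed direction avoids encoding each link twice and keeps the two directions consistent.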
36
Co_Role relations
guitar player HAS_HYPERONYM player
guitar player CO_AGENT_INSTRUMENT guitar
player HAS_HYPERONYM person
player ROLE_AGENT to play music
player CO_AGENT_INSTRUMENT musical instrument
to play music HAS_HYPERONYM to make
to play music ROLE_INSTRUMENT musical instrument
guitar HAS_HYPERONYM musical instrument
guitar CO_INSTRUMENT_AGENT guitar player
ice saw HAS_HYPERONYM saw
ice saw CO_INSTRUMENT_PATIENT ice
saw HAS_HYPERONYM saw
saw ROLE_INSTRUMENT to saw
ice CO_PATIENT_INSTRUMENT ice saw (reversed)
37
Co_Role relations
Examples of the other relations are:
  • criminal CO_AGENT_PATIENT victim
  • novel writer/poet CO_AGENT_RESULT novel/poem
  • dough CO_PATIENT_RESULT pastry/bread
  • photographic camera CO_INSTRUMENT_RESULT photo
38
Overview of the Language Internal relations in
EuroWordnet
Same part-of-speech relations:
  • NEAR_SYNONYMY: apparatus - machine
  • HYPERONYMY/HYPONYMY: car - vehicle
  • ANTONYMY: open - close
  • HOLONYMY/MERONYMY: head - nose
Cross-part-of-speech relations:
  • XPOS_NEAR_SYNONYMY: dead - death; to adorn - adornment
  • XPOS_HYPERONYMY/HYPONYMY: to love - emotion
  • XPOS_ANTONYMY: to live - dead
  • CAUSE: die - death
  • SUBEVENT: buy - pay; sleep - snore
  • ROLE/INVOLVED: write - pencil; hammer - hammer
  • STATE: the poor - poor
  • MANNER: to slurp - noisily
  • BELONG_TO_CLASS: Rome - city
39
Horizontal vertical semantic relations
[Diagram: vertical (HYPONYM) and horizontal (role, co-role, CAUSE and STATE) relations around patient: chronic patient, mental patient; doctor, child doctor; cure, treat; disease, disorder, stomach disease, kidney disorder; child; physiotherapy, medicine, etc.; hospital, etc.]
40
The Multilingual Design
  • Inter-Lingual-Index: an unstructured fund of
    concepts to provide an efficient mapping across
    the languages
  • Index-records are mainly based on WordNet synsets
    and consist of synonyms, glosses and source
    references
  • Various types of complex equivalence relations
    are distinguished
  • Equivalence relations go from synsets to index
    records, not on a word-to-word basis
  • Indirect matching of synsets linked to the same
    index items
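Indirect matching needs no direct links between, say, the Dutch and Spanish wordnets: two synsets match when they are linked to a shared ILI record. A minimal sketch; the synsets, the ILI identifiers and the links are hypothetical:

```python
from collections import defaultdict

# Hypothetical equivalence links: (language, synset) -> set of ILI record ids.
EQ_LINKS = {
    ("nl", "auto"): {"ili-car"},
    ("es", "coche"): {"ili-car"},
    ("it", "automobile"): {"ili-car"},
    ("nl", "toestel"): {"ili-machine", "ili-device", "ili-apparatus", "ili-tool"},
    ("es", "aparato"): {"ili-apparatus"},
}


def index_by_ili(links):
    """Invert the links: ILI record -> all (language, synset) pairs pointing to it."""
    by_ili = defaultdict(set)
    for key, ili_ids in links.items():
        for ili in ili_ids:
            by_ili[ili].add(key)
    return by_ili


BY_ILI = index_by_ili(EQ_LINKS)


def equivalents(language, synset, target_language):
    """Synsets in another language that share at least one ILI record."""
    found = set()
    for ili in EQ_LINKS.get((language, synset), set()):
        found |= {s for (lang, s) in BY_ILI[ili] if lang == target_language}
    return sorted(found)


print(equivalents("nl", "auto", "es"))     # ['coche']
print(equivalents("nl", "toestel", "es"))  # ['aparato']
```

Each wordnet only maintains its own links to the index; any language pair can then be matched through it.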

41
Equivalent Near Synonym
  • 1. Multiple targets (1:many)
    • Dutch wordnet: schoonmaken (to clean) matches
      with 4 senses of clean in WordNet1.5:
      • make clean by removing dirt, filth, or unwanted
        substances from
      • remove unwanted substances from, such as
        feathers or pits, as of chickens or fruit
      • remove in making clean: "Clean the spots off the
        rug"
      • remove unwanted substances from (as in
        chemistry)
  • 2. Multiple sources (many:1)
    • Dutch wordnet: versiersel near_synonym
      versiering; ILI-record: decoration
  • 3. Multiple targets and sources (many:many)
    • Dutch wordnet: toestel near_synonym
      apparaat; ILI-records: machine, device, apparatus,
      tool

42
Equivalent Hyperonymy
  • Typically used for gaps in English WordNet:
    • genuine, cultural gaps for things not known in
      English culture
      • Dutch: klunen, to walk on skates over land from
        one frozen water to the other
    • pragmatic, in the sense that the concept is known
      but is not expressed by a single lexicalized form
      in English
      • Dutch: kunststof, artifact substance <>
        artifact object

43
Equivalent Hyponymy
  • has_eq_hyponym
  • Used when wordnet1.5 only provides narrower
    terms. In this case there can only be a pragmatic
    difference, not a genuine cultural gap, e.g.
    Spanish dedo: either finger or toe.

44
Complex mappings across languages
[Diagram: EN-Net, IT-Net, NL-Net and ES-Net linked through the ILI. English toe (part of foot) and finger (part of hand) correspond to Italian dito and Spanish dedo (finger or toe); English head (part of body) corresponds to Dutch hoofd (human head) and kop (animal head).]
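The complex mappings in the diagram can be stored as typed links from a local synset to ILI records. A minimal sketch; the link data follow the diagram, while the data layout and helper names are hypothetical:

```python
# Hypothetical complex equivalence links, following the diagram:
# (language, local word) -> set of (equivalence relation, ILI record)
LINKS = {
    ("es", "dedo"): {("eq_has_hyponym", "finger"), ("eq_has_hyponym", "toe")},
    ("it", "dito"): {("eq_has_hyponym", "finger"), ("eq_has_hyponym", "toe")},
    ("nl", "hoofd"): {("eq_has_hyperonym", "head")},
    ("nl", "kop"): {("eq_has_hyperonym", "head")},
}


def same_links(a, b):
    """True if two local synsets have exactly the same typed links to the ILI."""
    return LINKS.get(a, set()) == LINKS.get(b, set())


def ili_targets(key, relation):
    """ILI records reached from a local synset through one equivalence relation."""
    return sorted(record for rel, record in LINKS.get(key, set()) if rel == relation)


print(same_links(("es", "dedo"), ("it", "dito")))     # True
print(ili_targets(("es", "dedo"), "eq_has_hyponym"))  # ['finger', 'toe']
print(same_links(("nl", "hoofd"), ("nl", "kop")))     # True, although not equivalent
```

The last line shows the limitation: hoofd (human head) and kop (animal head) both get an eq_has_hyperonym link to head, so identical complex links do not prove equivalence; each link only says that the local concept is narrower than head.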
45
Typical gaps in the (English) ILI
  • Dutch
    • doodschoppen (to kick to death):
      eq_hyperonym kill (V) and kick (V)
    • aardig (adjective, to like):
      eq_near_synonym like (V)
    • cassière (female cashier):
      eq_hyperonym cashier, woman
    • kunstproduct (artifact substance):
      eq_hyperonym artifact and product
  • Spanish
    • alevín (young fish):
      eq_hyperonym fish and eq_be_in_state young
    • cajera (female cashier):
      eq_hyperonym cashier, woman

46
Wordnets as semantic structures
  • Wordnets are unique language-specific structures:
    • different lexicalizations
    • differences in synonymy and homonymy
    • different relations between synsets
    • same organizational principles: synset structure
      and same set of semantic relations
  • Language-independent knowledge is assigned to the
    ILI and can thus be shared by all languages
    linked to the ILI: both an ontology and a domain
    hierarchy

47
Autonomous Language-Specific
[Diagram: the autonomous hierarchies of Wordnet1.5 and the Dutch Wordnet for object (voorwerp), block (blok), body (lichaam), tool (werktuig), spoon (lepel), bag (tas) and box (bak)]
48
Linguistic versus Artificial Ontologies
  • Artificial ontology:
    • better control or performance, or a more compact
      and coherent structure
    • introduce artificial levels for concepts which
      are not lexicalized in a language (e.g.
      instrumentality, hand tool)
    • neglect levels which are lexicalized but not
      relevant for the purpose of the ontology (e.g.
      tableware, silverware, merchandise)
  • What properties can we infer for spoons?
    • spoon -> container, artifact, hand tool, object,
      made of metal or plastic, for eating, pouring or
      cooking

49
Linguistic versus Artificial Ontologies
  • Linguistic ontology:
    • Exactly reflects the relations between all the
      lexicalized words and expressions in a language.
    • Captures valuable information about the lexical
      capacity of languages: what is the available fund
      of words and expressions in a language?
  • What words can be used to name spoons?
    • spoon -> object, tableware, silverware,
      merchandise, cutlery, ...

50
Wordnets versus ontologies
  • Wordnets:
    • autonomous language-specific lexicalization
      patterns in a relational network
    • Usage: to predict substitution in text for
      information retrieval, text generation, machine
      translation, word-sense-disambiguation
  • Ontologies:
    • data structure with formally defined concepts
    • Usage: making semantic inferences

51
Sharing world knowledge
  • All wordnets in the world can be linked to the
    same ontology
  • All wordnets in the world can be linked to the
    same thesaurus

52
Wordnet Domain information
[Diagram: three layers (relations, concepts, vocabularies of languages): bank, violin, violist and string are linked by sense numbers to rec 12345 "financial institute", rec 54321 "river side", rec 9876 "small string instrument", rec 65438 "musician playing a violin", rec35576 "string of an instrument" and rec29551 "underwear", with type-of relations to rec42654 "musician" and rec25876 "string instrument" and a part-of relation]
53
How to harmonize wordnets?
  • Wordnets are unique language-specific
    lexicalization patterns
  • Define universal sets of concepts that play a
    major role in many different wordnets: so-called
    Base Concepts
  • Define base concepts in each language wordnet:
    • high level in the hierarchy
    • many hyponyms
  • Provide the closest equivalent in English wordnet
  • Determine the intersection of English
    equivalences
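The two selection criteria and the intersection step can be sketched as follows. This is a minimal illustration with a toy hierarchy and made-up per-language selections; the thresholds standing in for "high level" and "many hyponyms" are arbitrary assumptions, not the values used in EuroWordNet:

```python
from collections import defaultdict

# Hypothetical toy hierarchy: child -> parent.
PARENT = {
    "animal": "organism", "plant": "organism",
    "bird": "animal", "dog": "animal", "crocodile": "animal",
    "canary": "bird", "tree": "plant", "flower": "plant", "rose": "flower",
}


def hyponym_counts():
    """Number of direct and indirect hyponyms of every concept."""
    counts = defaultdict(int)
    for child in PARENT:
        node = child
        while node in PARENT:
            node = PARENT[node]
            counts[node] += 1
    return counts


def depth(concept):
    """Distance from the top of the hierarchy (the root has depth 0)."""
    d = 0
    while concept in PARENT:
        concept = PARENT[concept]
        d += 1
    return d


def base_concepts(min_hyponyms=3, max_depth=1):
    """Candidates: high in the hierarchy and with many hyponyms (arbitrary thresholds)."""
    counts = hyponym_counts()
    return {c for c, n in counts.items() if n >= min_hyponyms and depth(c) <= max_depth}


print(sorted(base_concepts()))  # ['animal', 'organism', 'plant']

# Hypothetical selections per language, already mapped to English equivalents.
selected = {
    "nl": {"animal", "plant", "artifact", "feeling"},
    "es": {"animal", "plant", "food", "feeling"},
    "it": {"animal", "plant", "artifact"},
}
common = set.intersection(*selected.values())
at_least_two = {c for c in set.union(*selected.values())
                if sum(c in s for s in selected.values()) >= 2}
print(sorted(common))        # ['animal', 'plant']
print(sorted(at_least_two))  # ['animal', 'artifact', 'feeling', 'plant']
```

The strict intersection is small, which is why concepts selected by at least two languages are considered as well.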

54
Lexicalization patterns
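[Diagram: the lexicalization hierarchy (entity, the 25 unique beginners such as organism and object, then garbage, threat, animal, artifact, plant, building, tree, bird, flower, the basic level concepts, and canary, common canary, church, abbey, rose, crocodile, dog), now with a band of 1024 base concepts marked between the unique beginners and the basic level concepts]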
55
Base Concept Intersection

  • human 1, individual 1, mortal 1, person 1, someone 1, soul 1
  • animal 1, animate being 1, beast 1, brute 1, creature 1, fauna 1
  • flora 1, plant 1, plant life 1
  • matter 1, substance 1
  • food 1, nutrient 1
  • feeling 1
  • act 1, human action 1, human activity 1
  • cause 6, get 9, have 7, induce 2, make 12, stimulate 3
  • create 2, make 13
  • go 14, locomote 1, move 15, travel 4
  • be 4, have the quality of being 1
56
Explanations for low intersection of Base Concepts
  • The individual selections are not representative
    enough.
  • There are major differences in the way meanings
    are classified, which have an effect on the
    frequency of the relations.
  • The translations of the selection to WordNet1.5
    synsets are not reliable
  • The resources cover very different vocabularies

57
Concepts selected by at least two languages: intersections of pairs
58
Common Base Concepts
59
Table 4: Number of Common BCs represented in the local wordnets
  Language | Related to CBCs | Eq_synonym | Eq_near | CBCs without direct equivalent
  NL | 992 | 725 | 269 | 97
  ES | 1012 | 1009 | 0 | 15
  IT | 878 | 759 | 191 | 9
Table 5: BC4 gaps in at least two wordnets (10 synsets):
  • body covering 1
  • mental object 1, cognitive content 1, content 2
  • body substance 1
  • natural object 1
  • social control 1
  • place of business 1, business establishment 1
  • change of magnitude 1
  • plant organ 1
  • contractile organ 1
  • plant part 1
  • psychological feature 1
  • spatial property 1, spatiality 1
60
Table 6: Local senses with complex equivalence relations to CBCs (NL, ES, IT)
  • Eq_has_hyperonym: 61, 40, 4
  • Eq_has_hyponym: 34, 14, 20
  • Eq_has_holonym: 2, 0
  • Eq_has_meronym: 3, 2
  • Eq_involved: 3
  • Eq_is_caused_by: 3
  • Eq_is_state_of: 1
Example of a complex relation:
  • CBC: cause to feel unwell 1 (verb)
  • closest Dutch concept: onwel 1 (adjective, 'sick')
  • equivalence relation: eq_is_caused_by
61
EuroWordNet data
62
From EuroWordNet to Global WordNet
  • Currently, wordnets exist for more than 50
    languages, including
  • Arabic, Bantu, Basque, Chinese, Bulgarian,
    Estonian, Hebrew, Icelandic, Japanese, Kannada,
    Korean, Latvian, Nepali, Persian, Romanian,
    Sanskrit, Tamil, Thai, Turkish, Zulu...
  • Many languages are genetically and typologically
    unrelated
  • http://www.globalwordnet.org

63
Global Wordnet Association
EuroWordNet
BalkaNet
  • Arabic
  • Polish
  • Welsh
  • Chinese
  • 20 Indian Languages
  • Brazilian Portuguese
  • Hebrew
  • Latvian
  • Persian
  • Kurdish
  • Avestan
  • Baluchi
  • Hungarian
  • Danish
  • Norwegian
  • Swedish
  • Portuguese
  • Korean
  • Russian
  • Basque
  • Catalan
  • Thai
  • Romanian
  • Bulgarian
  • Turkish
  • Slovenian
  • Greek
  • Serbian
  • English
  • German
  • Spanish
  • French
  • Italian
  • Dutch
  • Czech
  • Estonian

http://www.globalwordnet.org
64
Some downsides of the EuroWordnet model
  • Construction is not done uniformly
  • Coverage differs
  • Not all wordnets can communicate with one another
  • Proprietary rights restrict free access and usage
  • A lot of semantics is duplicated
  • Complex and obscure equivalence relations due to
    linguistic differences between English and other
    languages

65
Next step: Global WordNet Grid
[Diagram: an Inter-Lingual Ontology (Object, Device, TransportDevice) to which the words of several languages are linked: Dutch voertuig, auto, trein; Estonian liiklusvahend, auto, killavoor; French véhicule, voiture, train; Czech dopravní prostredník, auto, vlak]
66
GWNG Main Features
  • Construct separate wordnets for each Grid
    language
  • Contributors from each language encode the same
    core set of concepts plus culture/language-specific
    ones
  • Synsets (concepts) can be mapped
    crosslinguistically via an ontology

67
The Ontology Main Features
  • Formal ontology serves as universal index of
    concepts
  • List of concepts is not just based on the lexicon
    of a particular language (unlike in EuroWordNet)
    but uses ontological observations
  • Ontology contains only upper and mid-level
    concepts
  • Concepts are related in a type hierarchy
  • Concepts are defined with axioms

68
The Ontology Main Features
  • In addition to high-level (primitive) concepts, the
    ontology needs to express low-level concepts
    lexicalized in the Grid languages
  • Additional concepts can be defined with
    expressions in Knowledge Interchange Format (KIF),
    based on first-order predicate calculus and
    atomic elements

69
The Ontology Main Features
  • Minimal set of concepts (Reductionist view)
  • to express equivalence across languages
  • to support inferencing
  • Ontology must be powerful enough to encode all
    concepts that are lexically expressed in any of
    the Grid languages
  • Ontology need not and cannot provide a linguistic
    encoding for all concepts found in the Grid
    languages
  • Lexicalization in a language is not sufficient to
    warrant inclusion in the ontology
  • Lexicalization in all or many languages may be
    sufficient
  • Ontological observations will be used to define
    the concepts in the ontology

70
Ontological observations
  • Identity criteria as used in OntoClean (Guarino &
    Welty 2002):
    • rigidity: to what extent are properties true for
      entities in all worlds? You are always a human,
      but you can be a student for a short while.
    • essence: what properties are essential for an
      entity? Shape is essential for a statue but not
      for the clay it is made of.
    • unicity: what represents a whole and what
      entities are parts of these wholes? An ocean is a
      whole but the water it contains is not.

71
Type-role distinction
  • Current WordNet treatment:
    • (1) a husky is a kind of dog (type)
    • (2) a husky is a kind of working dog (role)
  • What's wrong?
    • (2) is defeasible, (1) is not:
    • This husky is not a dog (contradiction)
    • This husky is not a working dog (possible)
  • Other roles: watchdog, sheepdog, herding dog,
    lapdog, etc.
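OntoClean turns this observation into a checkable constraint: an anti-rigid concept (a role such as working dog) must not subsume a rigid one (a type such as husky). A minimal sketch that flags such links; it simplifies OntoClean's meta-properties to a hand-assigned rigid/anti-rigid flag, and the data and function name are hypothetical:

```python
# Hypothetical meta-properties: True = rigid type, False = anti-rigid role.
RIGID = {"dog": True, "husky": True, "working dog": False, "watchdog": False}

# Subsumption links as in the current WordNet treatment described above.
IS_A = [("husky", "dog"), ("husky", "working dog"), ("watchdog", "dog")]


def rigidity_violations(is_a, rigid):
    """OntoClean constraint: an anti-rigid concept must not subsume a rigid one."""
    return [(child, parent) for child, parent in is_a
            if rigid[child] and not rigid[parent]]


print(rigidity_violations(IS_A, RIGID))  # [('husky', 'working dog')]
```

The flagged link is exactly statement (2); watchdog under dog is fine, because a role may sit below a type.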

72
Ontology and lexicon
  • Hierarchy of disjunct types
    • Canine: PoodleDog, NewfoundlandDog,
      GermanShepherdDog, Husky
  • Lexicon
  • NAMES for TYPES:
    • poodle (EN), poedel (NL), pudoru (JP)
    • (instance x Poodle)
  • LABELS for ROLES:
    • watchdog (EN), waakhond (NL), banken (JP)
    • ?((instance x Canine) and (role x
      GuardingProcess))

73
Ontology and lexicon
  • Hierarchy of disjunct types
    • River, Clay, etc.
  • Lexicon
  • NAMES for TYPES:
    • river (EN), rivier, stroom (NL)
    • (instance x River)
  • LABELS for dependent concepts:
    • rivierwater (NL) (water from a river -> water is
      not a unit)
    • kleibrok (NL) (irregularly shaped piece of
      clay -> non-essential)
    • ?((instance x Water) and (instance y River) and
      (portion x y))
    • ?((instance x Object) and (instance y Clay) and
      (portion x y) and (shape x Irregular))

74
Rigidity
  • The primitive concepts represented in the
    ontology are rigid types
  • Entities with non-rigid properties will be
    represented with KIF statements
  • But ontology may include some universal, core
    concepts referring to roles like father, mother

75
Properties of the Ontology
  • Minimal: terms are distinguished by essential
    properties only
  • Comprehensive: includes all distinct concept
    types of all Grid languages
  • Allows definitions via KIF of all lexemes that
    express non-rigid, non-essential properties of
    types
  • Logically valid, allows inferencing

76
Mapping Grid Languages onto the Ontology
  • Explicit and precise equivalence relations among
    synsets in different languages
  • type hierarchy is minimal
  • subtle differences can be encoded in KIF
    expressions
  • Grid database contains wordnets with synsets that
    label
    • either primitive types in the hierarchies,
    • or words relating to these types in ways made
      explicit in KIF expressions
  • If two languages create the same KIF expression,
    this is a statement of equivalence!
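The equivalence test can be illustrated by comparing two conjunctive KIF-style definitions as unordered sets of atoms. A minimal sketch, not a KIF parser: it assumes flat conjunctions whose atoms contain no nested parentheses and that both languages use the same variable names; the function names are hypothetical:

```python
import re

# Innermost "(predicate arg ...)" groups, i.e. atoms without nested parentheses.
ATOM = re.compile(r"\(([^()]+)\)")


def atoms(expression):
    """Collect the atoms of a simple conjunctive KIF-like expression."""
    return frozenset(tuple(match.split()) for match in ATOM.findall(expression))


def equivalent(expr_a, expr_b):
    """Treat two conjunctions as equivalent when they contain the same atoms, in any order."""
    return atoms(expr_a) == atoms(expr_b)


watchdog_en = "((instance x Canine) and (role x GuardingProcess))"
waakhond_nl = "(and (role x GuardingProcess) (instance x Canine))"
straathond_nl = "((instance x Canine) and (habitat x Street))"
print(equivalent(watchdog_en, waakhond_nl))    # True
print(equivalent(watchdog_en, straathond_nl))  # False
```

Real KIF definitions would also need variable renaming and proper logical equivalence checking; the sketch only shows why a shared formal definition makes cross-language equivalence explicit.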

77
How to construct the GWNG
  • Take an existing ontology as starting point
  • Use English WordNet to maximize the number of
    disjunct types in the ontology
  • Link English WordNet synsets as names to the
    disjunct types
  • Provide KIF expressions for all other English
    words and synsets
  • Copy the relation to the ontology to other
    languages, including KIF statements built for
    English
  • Revise KIF statements to make the mapping more
    precise
  • Map all words and synsets that are not and cannot be
    mapped to English WordNet to the ontology:
    • propose extensions to the type hierarchy
    • create KIF expressions for all non-rigid concepts

78
Initial Ontology: SUMO (Niles and Pease)
  • SUMO: Suggested Upper Merged Ontology
    • consistent with good ontological practice
    • fully mapped to WordNet(s): 1000 equivalence
      mappings, the rest through subsumption
    • freely and publicly available
    • allows data interoperability
    • allows NLP
    • allows reasoning/inferencing

79
SUMO
  • 1,000 generic, abstract, high-level terms
  • 4,000 definitional statements
  • MILO (Mid-Level Ontology)
  • closer to lexicon, WordNet

80
Mapping Grid languages onto the Ontology
  • Check existing SUMO mappings to Princeton WordNet
    -> extend the ontology with rigid types for
    specific concepts
  • Extend it to many other WordNet synsets
  • Observe OntoClean principles! (Synsets referring
    to non-rigid, non-essential, non-unicitous
    concepts must be expressed in KIF)

81
Lexicalizations not mapped to WordNet
  • Not added to the type hierarchy
  • straathond (NL) (a dog that lives in the streets)
  • ((instance x Canine) and (habitat x Street))
  • Added to the type hierarchy
  • klunen (NL) (to walk on skates from one frozen
    body to the next over land)
  • WalkProcess ? KluunProcess
  • Axioms
  • (and (instance x Human) (instance y Walk)
    (instance z Skates) (wear x z) (instance s1
    Skate) (instance s2 Skate) (before s1 y) (before
    y s2) etc
  • National dishes, customs, games,....

82
Most mismatching concepts are not new types
  • Refer to sets of types in specific circumstances
    or to concepts that are dependent on these types;
    next to rivierwater (NL) there are many others:
  • theewater (NL) (water used for making tea)
  • koffiewater (NL) (water used for making coffee)
  • bluswater (NL) (water used for extinguishing
    fire)
  • Relate to linguistic phenomena
  • gender, perspective, aspect, diminutives,
    politeness, pejoratives, part-of-speech
    constraints

83
KIF expression for gender marking
  • teacher (EN)
  • ?((instance x Human) and (agent x
    TeachingProcess))
  • Lehrer (DE) ?((instance x Man) and (agent x
    TeachingProcess))
  • Lehrerin (DE) ?((instance x Woman) and (agent x
    TeachingProcess))
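Because the three definitions differ only in the class of the agent, their relation follows from the type hierarchy: teacher (EN) subsumes both German words, which are not equivalent to each other. A minimal sketch; the small hierarchy and the pair encoding of the definitions are hypothetical simplifications of the KIF expressions above:

```python
# Hypothetical type hierarchy (child -> parent) for the classes used above.
SUBCLASS = {"Man": "Human", "Woman": "Human", "Human": "Entity"}


def is_subclass(a, b):
    """True if class a equals b or lies below b in the hierarchy."""
    while a != b and a in SUBCLASS:
        a = SUBCLASS[a]
    return a == b


# Each definition reduced to (class of x, process that x is the agent of).
DEFINITIONS = {
    "teacher (EN)": ("Human", "TeachingProcess"),
    "Lehrer (DE)": ("Man", "TeachingProcess"),
    "Lehrerin (DE)": ("Woman", "TeachingProcess"),
}


def subsumes(general, specific):
    """Same process, and the agent class of general is equal or broader."""
    g_class, g_process = DEFINITIONS[general]
    s_class, s_process = DEFINITIONS[specific]
    return g_process == s_process and is_subclass(s_class, g_class)


print(subsumes("teacher (EN)", "Lehrerin (DE)"))  # True
print(subsumes("Lehrer (DE)", "Lehrerin (DE)"))   # False
```

Gender marking therefore needs no new concepts in the ontology: it is expressed by combining existing types (Man, Woman) with the same process.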

84
KIF expression for perspective
  • sell: subj(x), direct obj(z), indirect obj(y)
  • versus
  • buy: subj(y), direct obj(z), indirect obj(x)
  • ?(and (instance x Human) (instance y Human)
    (instance z Entity) (instance e
    FinancialTransaction) (source x e) (destination y
    e) (patient e))
  • The same process, but a different perspective through
    subject and object realization: marry has two verbs
    in Russian; apprendre in French can mean teach and
    learn

85
Aspectual variants
  • Slavic languages: two members of a verb pair, for
    an ongoing event and a completed event.
  • English can mark perfectivity with particles,
    as in the phrasal verbs eat up and read through.
  • Romance languages mark aspect by verb
    conjugations on the same verb.
  • Dutch: verbs with marked aspect can be created by
    prefixing a verb with door-: doorademen, dooreten,
    doorfietsen, doorlezen, doorpraten (continue to
    breathe/eat/bike/read/talk).
  • These verbs are restrictions on phases of the
    same process
  • Does NOT warrant the extension of the ontology
    with separate processes for each aspectual variant

86
Kinship relations in Arabic
  • عم (Eam): father's brother, paternal uncle
  • خال (xaAl): mother's brother, maternal uncle
  • عمة (Eamap): father's sister, paternal aunt
  • خالة (xaAlap): mother's sister, maternal aunt

87
Kinship relations in Arabic
  • .........
  • شقيقة ($aqiyqap): full sister, sister on the
    paternal and maternal side (as distinct from
    أخت (>uxot) 'sister', which may refer to a
    sister from the paternal or maternal side, or both
    sides)
  • ثكلان (vakolAna): father bereaved of a child (as
    opposed to يتيم (yatiym), or يتيمة (yatiymap)
    for the feminine, 'orphan': a person whose
    father or mother died, or both father and mother
    died)
  • ثكلى (vakolaYa): mother bereaved of a child (as
    opposed to يتيم, or يتيمة for the feminine,
    'orphan': a person whose father or mother died, or
    both father and mother died)

88
Complex Kinship concepts
  • father's brother, paternal uncle
  • WORDNET
    • paternal uncle -> uncle
    • -> brother of ....????
  • ONTOLOGY
      (=>
        (paternalUncle ?P ?UNC)
        (exists (?F)
          (and
            (father ?P ?F)
            (brother ?F ?UNC))))
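The ontology axiom can be used to find paternal uncles in a small fact base. A minimal sketch under two stated assumptions: the rule above is written as an implication from paternalUncle to the father/brother facts, and the sketch reads it in the other direction, as a definition; and (brother ?F ?UNC) is read as "?F and ?UNC are brothers", so argument order does not matter. All names and facts are hypothetical:

```python
# Hypothetical facts: person -> father, and unordered pairs of brothers.
FATHER = {"ali": "hasan", "hasan": "omar"}
BROTHERS = [{"hasan", "karim"}, {"hasan", "yusuf"}, {"omar", "zaid"}]


def paternal_uncles(person):
    """?UNC is a paternal uncle of ?P if some ?F is the father of ?P and ?UNC is a brother of ?F."""
    father = FATHER.get(person)
    if father is None:
        return set()
    return {other for pair in BROTHERS if father in pair
            for other in pair if other != father}


print(sorted(paternal_uncles("ali")))  # ['karim', 'yusuf']
```

The wordnet only records that a paternal uncle is a kind of uncle; the axiom adds which chain of kinship relations makes someone one.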

89
Universality as evidence
  • The English verb cut abstracts from the precise
    process, but there are troponyms that implicate
    the manner:
    • snip and clip imply scissors; chop and hack imply a
      large knife or an axe
  • Dutch: there is no general verb, only specific
    verbs:
    • knippen ('clip, snip, cut with scissors or a
      scissor-like tool'), snijden ('cut with a knife or
      knife-like tool'), hakken ('chop, hack, cut with
      an axe or similar tool')
  • If lexicalization of the specific processes is more
    universal, this can be seen as evidence that the
    specific processes, and not the generic verb, should
    be listed in the ontology

90
Open Questions/Challenges
  • What is a word, i.e., a lexical unit?
  • What is the status of complex lexemes like
    English lightning rod, word of mouth, find out,
    kick the bucket?
  • What is a semantic unit, i.e. a concept?

91
Open Questions/Challenges
  • Is there a core inventory of concepts that are
    universally encoded?
  • If so, what are these concepts?
  • How can crosslinguistic equivalence be verified?
  • Is there systematicity to the language-specific
    extensions?
  • What are the lexicalization patterns of
    individual languages?
  • Are lexical gaps accidental or systematic?

92
Coverage: what belongs in a universal lexical database?
  • Formal, linguistic criteria for inclusion
  • Informal, cultural criteria
  • Both are difficult to define and apply!

93
Advantages of the Global Wordnet Grid
  • Shared and uniform world knowledge
  • universal inferencing
  • uniform text analysis and interpretation
  • More compact and less redundant databases
  • A clearer notion of how languages map to the
    knowledge
  • better criteria for expressing knowledge
  • better criteria for understanding variation

94
Expansion with pure hyponymy relations
[Diagram: dog with the hyponyms hunting dog, puppy, dachshund (with short hair dachshund and long hair dachshund), lapdog, poodle, bitch, street dog and watchdog]
Expansion from a type to roles
95
Expansion with pure hyponymy relations
[Diagram: dog with the hyponyms hunting dog, puppy, dachshund (with short hair dachshund and long hair dachshund), lapdog, poodle, bitch, street dog and watchdog]
Expansion from a role to types and other roles
96
Automotive ontology (http://www.ontoprise.de)
97
Who uses ontologies?
98
(No Transcript)