
1
Ontologies and Ontology Learning from Text
  • Philipp Cimiano
  • HCI Postgraduate Research School
  • Aalborg, 25.04.2006

2
What to expect?
  • Overview of OL state-of-the-art
  • Focus on Breadth
  • But I will pick out some details
  • Emphasis on own work
  • Evaluation of OL approaches
  • Discussion of implementations / tools
  • Discussion of applications
  • Let's keep it interactive!

3
Related Material
  • P. Buitelaar, P. Cimiano: Tutorial on Ontology Learning from Text at EACL 2006, Trento.
  • P. Buitelaar, P. Cimiano, M. Sintek, M. Grobelnik: Tutorial on Ontology Learning from Text at ECML/PKDD 2006, Porto.
  • P. Buitelaar, P. Cimiano, B. Magnini: Ontology Learning from Text: Methods, Evaluation and Applications, IOS Press, 2005.
  • P. Cimiano: Ontology Learning and Population from Text: Algorithms, Evaluation and Applications, Springer Verlag, to appear (2006).

4
Agenda
  • Ontologies
  • Motivation
  • Ontology Learning
  • Layer Cake
  • Term Extraction
  • Concept Hierarchies
  • Relations
  • Applications
  • Conclusion

5
Ontologies
  • Computers are essentially symbol-manipulating
    machines.
  • For applications in which meaning is shared
    between parties, ontologies play a crucial role.
  • Ontologies fix the interpretation of symbols w.r.t. some semantics (typically model-theoretic)
  • Ontologies are formal specifications of a shared conceptualization of a certain domain [Gruber 93].

6
SW Ontology languages
  • Nowadays, there are different ontology languages
  • DAML+OIL
  • RDF(S)
  • OWL
  • F-Logic
  • Essentially, they provide
  • Taxonomic organization of concepts
  • Relations between concepts (with type and
    cardinality constraints)
  • Instantiation relations

7
Ontologies in Philosophy
  • A Branch of Philosophy that Deals with the Nature
    and Organization of Reality
  • Science of Being (Aristotle, Metaphysics)
  • What Characterizes Being?
  • Eventually, what is Being?

8
Ontologies in Computer Science
  • Ontology refers to an engineering artifact
  • a specific vocabulary used to describe a certain
    reality
  • a set of explicit assumptions regarding the
    intended meaning of the vocabulary
  • An Ontology is
  • an explicit specification of a conceptualization [Gruber 93]
  • a shared understanding of a domain of interest [Uschold and Gruninger 96]

9
Why Develop an Ontology?
  • Make domain assumptions explicit
  • Easier to exchange domain assumptions
  • Easier to understand and update legacy data
  • Separate domain knowledge from operational
    knowledge
  • Re-use domain and operational knowledge
    separately
  • A community reference for applications
  • Shared understanding of what information means

10
Applications of Ontologies
  • NLP
  • Information Extraction, e.g. Buitelaar et al.
    06, Stevenson et al. 05, Mädche et al. 02
  • Information Retrieval (Semantic Search), e.g.
    WebKB Martin et al. 00, SHOE Hendler et al.
    00, OntoSeek Guarino et al. 99
  • Question Answering, e.g. Sinha and Narayanan
    05, Schlobach et al. 04, Aqualog Lopez and
    Motta 04, Pasca and Harabagiu 01
  • Machine Translation, e.g. Nirenburg et al. 04,
    Beale et al. 95, Hovy and Nirenburg 92,
    Knight 93
  • Other
  • Business Process Modeling, e.g. Uschold et al.
    98
  • Information Integration, e.g. Kashyap 99,
    Wiederhold 92
  • Knowledge Management (incl. Semantic Web), e.g.
    Fensel 01, Mulholland et al. 2001, Staab and
    Schnurr 00, Sure et al. 00, Abecker et al.
    97
  • Software Agents, e.g. Gluschko et al. 99,
    Smith and Poulter 99
  • User Interfaces, e.g. Kesseler 96

11
Example Semantic Image Retrieval
  • E.g. Give me images with a ball on a table.
  • State of the art: ask Google Images for "ball on table"
  • Semantic Web: specify what you want precisely
  • FORALL X <- X:image AND EXISTS B,T X[contains -> B] AND X[contains -> T] AND B:ball AND T:table AND B[locatedOn -> T].

12
Representation, Acquisition, and Mapping of
Personal Information Models is at the heart of KM
Research
13
Information Integration
(Figure: databases DB1, DB2, ..., DBn integrated under a common ontology; example query: ?X employee(X) AND worksFor(X, salesDep))
14
Mapping in Distributed Systems
(Figure: distributed peers P1, ..., P5 connected by mappings such as P1:composer(X,Y) <- P2:author(X,Y); example query: ?X P1:title(X) AND P1:composer(X, Mozart))
15
Types of Ontologies Guarino 98
16
Ontologies and Their Relatives
17
Ontologies and Their Relatives (Cont'd)
18
Thesauri - Examples
MeSH Heading: Databases, Genetic
Entry Term: Genetic Databases
Entry Term: Genetic Sequence Databases
Entry Term: OMIM
Entry Term: Online Mendelian Inheritance in Man
Entry Term: Genetic Data Banks
Entry Term: Genetic Data Bases
Entry Term: Genetic Databanks
Entry Term: Genetic Information Databases
See Also: Genetic Screening
MeSH (Medical Subject Headings) is organized by terms (currently over 250,000) that correspond to a specific medical subject. For each such term a list of syntactic, morphological or semantic variants is given.
MT 3606 natural and applied sciences
UF: gene pool, genetic resource, genetic stock, genotype, heredity
BT1: biology
BT2: life sciences
NT1: DNA
NT1: eugenics
RT: genetic engineering (6411)
EuroVoc covers terminology in all of the official
EU languages for all fields that concern the EU
institutions, e.g., politics, trade, law,
science, energy, agriculture, 27 such fields in
total.
19
Semantic Networks - Examples
Pharmacologic Substance affects Pathologic Function
Pharmacologic Substance causes Pathologic Function
Pharmacologic Substance complicates Pathologic Function
Pharmacologic Substance diagnoses Pathologic Function
Pharmacologic Substance prevents Pathologic Function
Pharmacologic Substance treats Pathologic Function
UMLS (Unified Medical Language System) integrates
linguistic, terminological and semantic
information. The Semantic Network consists of 134
semantic types and 54 relations between types.
20
Example Geographical Ontology
(Figure: a geographical ontology. Concepts: GE with subconcepts Inhabited GE and Natural GE; Inhabited GE subsumes city and country, Natural GE subsumes river and mountain. Relations: is-a, instance_of, located_in, flow_through, has_capital, capital; attributes length (km) and height (m). Instances: Neckar (length 367 km), Zugspitze (height 2962 m), Germany, Berlin, Stuttgart, linked by instance_of, located_in, flow_through and has_capital.)
21
Mathematical Definition Stumme et al 2003
  • Structure
  • C set of concept identifiers
  • R set of relation identifiers
  • ≤C: partial order on C (concept hierarchy)
  • ≤R: partial order on R (relation hierarchy)
  • Signature
  • Mathematical definition of extension of concepts
    c and relations r
  • Axiom System, e.g.

22
But to be honest...
  • There are not many (real) ontologies around
  • Most SW Ontologies are RDFSed thesauri!
  • Most people don't think model-theoretically!
  • So we have to live with
  • Linguistic Ontologies like WordNet
  • Thesauri
  • Automatically Learned Thesauri / Taxonomies / Ontologies

23
NLP Applications
  • Ontologies in NLP Applications
  • Information Retrieval: Query Expansion
  • Information Extraction: Template Definition, Semantic Integration
  • Question Answering: Question Analysis, Answer Selection
  • Machine Translation: Interlingua
  • Summarization: Semantic Graphs

24
Information Extraction
  • Class-based Template Definition
  • Allows for Reasoning over Extracted Templates
    with Respect to the Ontology (see e.g. Nedellec
    and Nazarenko 05 for discussion)
  • Rule Induction
  • Discovering Semantically Similar Patterns (e.g.
    Unsupervised Approach w.r.t. WordNet Stevenson
    and Greenwood 05)
  • Discourse Analysis
  • Event Co-Reference Resolution (e.g. LaSIE
    Gaizauskas et al. 95)
  • Semantic Integration (Template Merging)
  • Extraction from Heterogeneous Sources (Text,
    Tables and other Semi-Structured Data, Image
    Captions) SmartWeb Buitelaar et al. 06a/b
  • Multi-Document Information Extraction ArtEquAKT
    Alani et al. 03

25
Question Answering
  • Question Analysis
  • Ontology/WordNet-based Semantic Question
    Interpretation
  • e.g. Pasca and Harabagiu 01
  • Answer Selection
  • Ontology/WordNet-based Reasoning for Answer
    Type-Checking
  • Ontology of Events Sinha and Narayanan 05
  • Geographical Ontology, WordNet Schlobach et al.
    04
  • WordNet Pasca and Harabagiu 01
  • Ontology-based Question Answering
  • Derive Answers from a Knowledge Base
  • e.g. Aqualog Lopez and Motta 04

26
Machine Translation Summarization
  • Conceptual Model for Interlingua in MT
  • Not much current work, but see e.g. Hovy and
    Nirenburg 92, Knight 93
  • Background Knowledge from Relevant Ontologies in
    Concept-based Summarization
  • Not common, but see e.g. Lee et al. 04, Lenci
    et al. 02
  • Multi-Document Concept-based Summarization
  • e.g. ArtEquAKT Alani et al. 03

27
Semantic Interpretation
  • Ontology-based Inference and Reasoning for
  • Compound Analysis
  • headache medicine (medicine cure headache)
  • Metonymy and Coercion
  • The Boston office called (office → person, person part_of office)
  • I began the book (book → event, read = telic(book))
  • Bridging and Discourse
  • Peter bought a car. The engine runs well (engine part_of car)
  • Word Sense Disambiguation
  • in the corner (→ location) / before the corner is taken (→ event)
  • Beckham kicked the ball (kick → shot) / the referee (kick → foul)

28
Summary
  • Ontologies nowadays have important applications
  • Semantic Web
  • Knowledge Management
  • Agent Communication
  • Information Integration (e.g. databases)
  • Planning and Composition of services
  • Natural Language Processing

29
Motivation for Ontology Learning
  • High cost for modelling ontologies.
  • Solution learn from existing data?
  • Which data?
  • Legacy Data (XML or DB schema) → Lifting
  • Texts ?
  • Images ?
  • In this talk we will discuss ontology learning
    from texts.

30
Learning ontologies from texts
  • Problems
  • Bridge the gap between symbol and concept/ontology level
  • Knowledge is rarely mentioned explicitly in texts.

31
OL from Text as Reverse Engineering
32
OL from Text - Some pre-History
  • AI - Knowledge Acquisition
  • Since 60s/70s Semantic Network Extraction and
    similar for Story Understanding
  • e.g. MARGIE (Schank et al. 73), LUNAR (Woods 73)
  • NLP - Lexical Knowledge Extraction
  • 70s/80s/early 90s Extraction of Lexical Semantic
    Representations from Machine Readable
    Dictionaries
  • e.g. ACQUILEX LKB (Copestake et al. 92)
  • 80s/90s Extraction of Semantic Lexicons from
    Corpora for Information Extraction Systems
  • e.g. AutoSlog (Riloff 93), CRYSTAL (Soderland et
    al. 95)
  • IR - Thesaurus Extraction
  • Since 60s Extraction of Keywords, Thesauri and
    Controlled Vocabularies
  • e.g. (Sparck-Jones 66/86, 71), Sextant
    (Grefenstette 92), DR-Link (Liddy 94)

33
Ontology Learning Layer Cake
34
TextToOnto Text2Onto
  • Ontology Learning Frameworks @ AIFB
  • Various algorithms, various features
  • Download (open source)
  • Sourceforge (TextToOnto)
  • Ontoware (Text2Onto)
  • Ontology Learning is an overnight task!

35
Tools
36
Ontology Learning Layer Cake
General Axioms
Axiom Schemata
Relation Hierarchy
Relations
Concept Hierarchy
Concept Formation
(Multilingual) Synonyms
Terms
37
Terms
  • Terms are at the basis of the ontology learning
    process
  • Terms express more or less complex semantic units
  • But what is a term?
  • Huge Selection of Top Brand Computer Terminals
    Available for Immediate Delivery
  • Because Vecmar carries such a large inventory of
    high-quality computer terminals, including ADDS
    terminals, Boundless terminals, DEC terminals, HP
    terminals, IBM terminals, LINK terminals, NCR
    terminals and Wyse terminals, your order can
    often ship same day. Every computer terminal
    shipped to you is protected with careful packing,
    including thick boxes. All of our shipping
    options - including international - are available
    through major carriers.
  • Extracted term candidates (phrases)
  • computer
  • terminal
  • computer terminal
  • ? high-quality computer terminal
  • ? top brand computer terminal
  • ? HP terminal, DEC terminal,

38
Term Extraction
  • Determine most relevant phrases as terms
  • Linguistic Methods
  • Rules over linguistically analyzed text
  • Linguistic analysis: Part-of-Speech Tagging, Morphological Analysis, ...
  • Extract patterns: Adjective-Noun, Noun-Noun, Adj-Noun-Noun, ...
  • Ignore Names (DEC, HP, ...), Certain Adjectives (quality, top, ...), etc.
  • Statistical Methods
  • Co-occurrence (collocation) analysis for term
    extraction within the corpus
  • Comparison of frequencies between domain and
    general corpora
  • Computer Terminal will be specific to the
    Computer domain
  • Dining Table will be less specific to the
    Computer domain
  • Hybrid Methods
  • Linguistic rules to extract term candidates
  • Statistical (pre- or post-) filtering

39
Statistical Analysis
  • Scores used in Term Extraction
  • MI (Mutual Information): Co-occurrence Analysis
  • TF.IDF: Term Weighting
  • χ² (Chi-square): Co-occurrence Analysis / Term Weighting
  • Other
  • c-value / nc-value (Frantzi & Ananiadou, 1999)
  • Considers length (c-value) and context (nc-value) of terms
  • Domain Relevance / Domain Consensus (Navigli and Velardi, 2004)
  • Considers term distribution within (DC) and between (DR) corpora

40
Term Extraction
  • Use some statistical measure to assess term
    relevance, e.g. tf.idf

The word is more important if it appears several times in a target document.
The word is more important if it appears in fewer documents.
tf(w): term frequency (number of word occurrences in a document); df(w): document frequency (number of documents containing the word); N: number of all documents; tfidf(w): relative importance of the word in the document.
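For reference, a standard tf.idf formulation consistent with the definitions above (the exact variant used on the slide may differ):

\[ \mathrm{tfidf}(w, d) = tf(w, d) \cdot \log \frac{N}{df(w)} \]

A word thus scores highly if it is frequent in the target document d but occurs in few of the N documents overall.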
41
C- / NC-value
  • Combination of
  • C-value (indicator for termhood)
  • NC-value (contextual indicators for termhood)
  • C-value (frequency-based method sensitive to
    multi-word terms)

42
C- / NC-value
  • NC-value (incorporation of information from
    context words indicating termhood)
  • C-/NC-value
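The usual definitions from Frantzi & Ananiadou, which these slides presumably follow, are roughly:

\[
\mathrm{Cvalue}(a) =
\begin{cases}
\log_2 |a| \cdot f(a) & \text{if } a \text{ is not nested}\\[2pt]
\log_2 |a| \cdot \Bigl( f(a) - \frac{1}{|T_a|} \sum_{b \in T_a} f(b) \Bigr) & \text{otherwise}
\end{cases}
\]

\[
\mathrm{NCvalue}(a) = 0.8 \cdot \mathrm{Cvalue}(a) + 0.2 \cdot \sum_{b \in C_a} f_a(b)\, \mathrm{weight}(b)
\]

where |a| is the length of the candidate term a in words, f(a) its frequency, T_a the set of longer candidate terms containing a, C_a the set of context words of a, f_a(b) how often b appears as a context word of a, and weight(b) the fraction of terms b co-occurs with.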

43
Terms Tools
44
TextToOnto
45
Ontology Learning Layer Cake
General Axioms
Axiom Schemata
Relation Hierarchy
Relations
Concept Hierarchy
Concept Formation
(Multilingual) Synonyms
Terms
46
Synonyms
  • Next step in ontology learning is to identify
    terms that share (some) semantics, i.e.,
    potentially refer to the same concept
  • Synonyms (Within Languages)
  • 100% synonyms don't exist, only term pairs with similar meanings
  • Examples from http://thesaurus.com
  • terminal: video display, input device
  • graphics terminal: video display unit, screen
  • Techniques
  • Clustering, e.g. Grefenstette
  • Significance of Co-occurrence, e.g. PMI-IR
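For reference, PMI-IR (Turney) scores a candidate synonym by co-occurrence statistics obtained from a search engine; in its simplest form,

\[ \mathrm{score}(\text{choice}) = \frac{\mathrm{hits}(\text{problem NEAR choice})}{\mathrm{hits}(\text{choice})} \]

i.e. an estimate of P(problem | choice); variants of the score differ in the query operators used.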

47
Synonyms - Evaluation
  • Gold Standard
  • TOEFL (Landauer LSA 64.45%, Turney PMI-IR 48-74%)
  • WordNet (problematic due to domain-independence,
    e.g. Pantel and Lin 03)
  • WordNet tuning, e.g. Cucchiarelli and Velardi
    98, Turcato 00, Buitelaar and Sacaleanu 01
  • Human Evaluation
  • Task-based
  • (Cross-lingual ) IR/QA - e.g. Query Expansion
  • Other
  • Artificial Evaluation (see Grefenstette 94)
  • e.g. transform cell -> CELL in some contexts

48
Synonyms Tools
49
Ontology Learning Layer Cake
General Axioms
Axiom Schemata
Relation Hierarchy
Relations
Concept Hierarchy
Concept Formation
(Multilingual) Synonyms
Terms
50
Concepts Intension, Extension, Lexicon
  • A term may indicate a concept, if we can define
    its
  • Intension
  • (in)formal definition of the set of objects that
    this concept describes
  • a disease is an impairment of health or a
    condition of abnormal functioning
  • Extension
  • a set of objects (instances) that the definition
    of this concept describes
  • influenza, cancer, heart disease,
  • Discussion: what is an instance? - heart disease or my uncle's heart disease
  • Lexical Realizations
  • the term itself and its multilingual synonyms
  • disease, illness, Krankheit, maladie, ...
  • Discussion: synonyms vs. instances: disease, heart disease, cancer, ...

51
Concepts Intension
  • Extraction of a Definition for a Concept from
    Text
  • Informal Definition
  • e.g., a gloss for the concept as used in WordNet
  • OntoLearn (Navigli and Velardi 04; Velardi et al. 05) uses natural language generation to compositionally build up a WordNet gloss for automatically extracted concepts
  • Integration Strategy strategy for the
    integration of
  • Formal Definition
  • e.g., a logical form that defines all formal
    constraints on class membership
  • Inductive Logic Programming, Formal Concept
    Analysis,

52
Concepts Extension
  • Extraction of Instances for a Concept from Text
  • Commonly referred to as Ontology Population
  • Relates to Knowledge Markup (Semantic Metadata)
  • Uses Named-Entity Recognition and Information
    Extraction
  • Instances can be
  • Names for objects, e.g.
  • Person, Organization, Country, City,
  • Event instances (with participant and property
    instances), e.g.
  • Football Match (with Teams, Players, Officials,
    ...)
  • Disease (with Patient-Name, Symptoms, Date, )

53
Concept Formation - Evaluation
  • Concept Extension
  • Gold Standard
  • overlap on clusters, e.g. OntoBasis
  • overlap on set of instances w.r.t. KB (difficult)
  • Human Evaluation (e.g. OntoBasis)
  • Task Based
  • QA from KBs
  • Concept Intension (in/formal definitions)
  • Gold Standard (e.g. WordNet glosses, WikiPedia)
  • Human Evaluation (e.g. WordNet glosses Velardi
    et al. 05)
  • Task Based
  • Ontology Engineering
  • Understanding
  • Consistency

54
Concept Formation Tools
55
Ontology Learning Layer Cake
General Axioms
Axiom Schemata
Relation Hierarchy
Relations
Concept Hierarchy
Concept Formation
(Multilingual) Synonyms
Terms
56
Taxonomy Extraction - Overview
  • Lexico-syntactic patterns
  • Distributional Similarity Clustering
  • Linguistic Approaches
  • Taxonomy Extension/Refinement
  • Combination of Methods
  • Evaluation
  • Tools Matrix

57
Hearst Patterns Hearst 1992
  • Patterns to extract a relation of interest fulfilling the following requirements
  • They should occur frequently and in many text
    genres.
  • They should accurately indicate the relation of
    interest.
  • They should be recognizable with little or no
    pre-encoded knowledge.

58
Acquiring Hearst Patterns
  • Hearst also suggests a procedure in order to acquire such patterns from a corpus
  • Decide on a lexical relation R of interest, e.g.
    hyponymy/hypernymy.
  • Gather a list of terms for which this relation is
    known to hold, e.g. hyponym(car, vehicle). This
    list can be found automatically using the Hearst
    patterns or by bootstrapping from an existing
    lexicon or knowledge base.
  • Find places in the corpus where these expressions
    occur syntactically near one another.
  • Find the commonalities and generalize the
    expressions in 3. to yield patterns that indicate
    the relation of interest.
  • Once a new pattern has been identified, gather
    more instances of the target relation and go to
    step 3.

59
Hearst Patterns - Examples
  • Examples for hyponymy patterns
  • Vehicles such as cars, trucks and bikes
  • Such fruits as oranges, nectarines or apples
  • Swimming, running and other activities
  • Publications, especially papers and books
  • A seabass is a fish.

60
Hearst Patterns (Continued)
  • Use regular expressions defined over syntactic categories
  • NP such as NP, NP, ... and NP
  • Such NP as NP, NP, ... or NP
  • NP, NP, ... and other NP
  • NP, especially NP, NP ,... and NP
  • NP is a NP.
  • ...
  • Precision w.r.t. WordNet: 55.46% (66/119) on the basis of the New York Times corpus
  • Cederberg and Widdows 03 report lower results (around 40%)
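As a concrete illustration (not the original implementation), the sketch below matches the first of these patterns with a plain regular expression over raw text; a real system would operate on the NP chunks produced by a tagger/chunker and lemmatize the results, and the function name extract_hyponyms is ours.

```python
import re

# Minimal sketch: extract hyponym/hypernym candidates with one Hearst pattern,
# "NP such as NP, NP, ... and NP". Noun phrases are approximated by single
# words to keep the example short.
PATTERN = re.compile(
    r"(?P<hyper>\w+) such as (?P<hypos>\w+(?:, \w+)*(?:,? (?:and|or) \w+)?)"
)

def extract_hyponyms(text):
    pairs = []
    for m in PATTERN.finditer(text):
        hypernym = m.group("hyper").lower()
        for hypo in re.split(r",\s*|\s+(?:and|or)\s+", m.group("hypos")):
            pairs.append((hypo.lower(), hypernym))
    return pairs

print(extract_hyponyms("Vehicles such as cars, trucks and bikes are taxed."))
# -> [('cars', 'vehicles'), ('trucks', 'vehicles'), ('bikes', 'vehicles')]
```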

61
Extensions of Hearsts approach
  • Using Hearst Patterns for Anaphora Resolution
  • Poesio et al. 02 / Markert et al. 03
  • Additional Patterns Iwanska et al. 00
  • Using Questions Sundblad 02
  • Application to collateral texts Ahmad et al. 03
  • Matching patterns on the Web
  • KnowItAll Etzioni et al. 04-05, PANKOW Cimiano
    et al. 04-05
  • Improving Accuracy (LSA) Coverage
    (Conjunctions)
  • Cederberg and Widdows 03
  • Learning Patterns
  • Snowball [Agichtein et al. 00], [Downey et al. 04], [Ravichandran and Hovy 02], [Snow et al. 04]

62
Generalizing Patterns
  • Pantel, Ravichandran, Hovy
  • using edit distance as a basis to generalize
    patterns
  • Snowball Agichtein et al. 00
  • patterns as triples of bag-of-words represented
    as vectors, i.e. (left,arg1,middle,arg2,right)
  • use dot product to calculate similarity
  • calculating centroid as a generalization of the
    pattern
  • Other
  • Downey et al. 04
  • Ravichandran and Hovy 02
  • Snow et al. 04

63
Taxonomy Extraction - Overview
  • Lexico-syntactic patterns
  • Distributional Similarity Clustering
  • Linguistic Approaches
  • Taxonomy Extension/Refinement
  • Combination of Methods
  • Evaluation
  • Tools Matrix

64
What does the X stand for?
  • X is very nice.
  • In X it is always sunny.
  • We usually spend our holidays at X.
  • We observe that we can group words which appear in certain contexts.
  • For this purpose we need to represent the context of words.

65
Distributional Hypothesis Vector Space Model
  • Harris, 1986
  • Words are (semantically) similar to the extent to which they share similar linguistic contexts
  • Firth, 1957
  • You shall know a word by the company it keeps
  • Idea: collect context information and represent it as a vector
  • compute similarity among vectors w.r.t. a measure
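A minimal sketch of this idea in Python, with a toy two-sentence corpus and a simple window of two tokens on each side as the context definition (both are illustrative choices, not the setup used in the experiments reported later):

```python
import math
from collections import Counter

# Represent each word by counts of the words occurring in a +/-2 token window
# around it, and compare words by cosine similarity.
def context_vectors(sentences, window=2):
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(context)
    return vectors

def cosine(a, b):
    dot = sum(a[f] * b[f] for f in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

vectors = context_vectors([
    "the museum houses a collection of modern art",
    "the gallery houses a collection of medieval art",
])
print(cosine(vectors["museum"], vectors["gallery"]))   # 1.0: identical contexts
```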

66
Context Features
  • Four-grams Schuetze 93
  • Word-windows Grefenstette 92
  • Predicate-Argument relations (SUBJ/OBJ/COMPLEMENT)
  • Modifier Relations (fast car, the hood of the
    car)
  • Grefenstette 92, Cimiano 04b, Gasperin et al.
    03
  • Appositions (Ferrari, the fastest car in the
    world)
  • Caraballo 99
  • Coordination (ladies and gentlemen)
  • Caraballo 99, Dorow and Widdows 03

67
Overall Process for Clustering Concept Hierarchies
Ling. Analysis → Attribute Extraction → Pruning → Clustering
68
Extracting contextual features
The museum houses an impressive collection of
medieval and modern art. The building combines
geometric abstraction with classical references
that allude to the Roman influence on the
region.
house_subj(museum), house_obj(collection), combine_subj(museum), combine_obj(abstraction), combine_with(reference), allude_to(influence)
69
Pseudo-syntactic Dependencies
  • The museum houses an impressive collection of
    medieval and modern art. The building combines
    geometric abstraction with classical references
    that allude to the Roman influence on the region.

NP verb NP → verb_subj / verb_obj
impressive(collection), geometric(abstraction), combine_with(reference), classical(reference), allude_to(influence), roman(influence), influence_on(region), on_region(influence)
house_subj(museum), house_obj(museum), combine_subj(museum), combine_obj(abstraction), combine_with(reference)

70
Weighting Measures
71
Clustering Concept Hierarchies from Text
  • Similarity-based
  • Set-theoretical
  • Soft clustering

72
Similarity-based Clustering
  • Similarity Measures
  • Binary (Jaccard, Dice)
  • Geometric (Cosine, Euclidean/Manhattan distance)
  • Information-theoretic (Relative Entropy, Mutual Information)
  • (...)
  • Methods
  • Hierarchical agglomerative clustering
  • Hierarchical top-down clustering, e.g. Bi-Section KMeans
  • (...)
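A small sketch of similarity-based hierarchical clustering over such term vectors, using SciPy's agglomerative linkage with cosine distances; the four term vectors below are made up for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Invented context-feature counts for four terms; in practice these would be
# the corpus-derived features described above.
terms = ["car", "bus", "trip", "excursion"]
X = np.array([
    [4.0, 3.0, 0.0, 1.0],   # car
    [3.0, 4.0, 1.0, 0.0],   # bus
    [0.0, 1.0, 5.0, 4.0],   # trip
    [1.0, 0.0, 4.0, 5.0],   # excursion
])

# Average-linkage (bottom-up) clustering over cosine distances.
Z = linkage(X, method="average", metric="cosine")
print(Z)  # merge history: which clusters were joined, and at what distance

tree = dendrogram(Z, labels=terms, no_plot=True)
print(tree["ivl"])  # leaf order of the dendrogram, grouping car/bus and trip/excursion
```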

73
Hierarchical Agglomerative Clustering
(Figure: dendrogram grouping car, bus, trip and excursion)
74
Bi-Section-KMeans
75
Clustering Concept Hierarchies
  • Similarity-based
  • Set Theoretical
  • Soft clustering

76
Formal Concept Analysis Ganter, Wille 1999
  • finds closed sets of attributes and objects (Formal Concepts)
  • yields a hierarchy with a formal interpretation in terms of subsumption of attributes
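A toy sketch of the idea behind FCA: enumerate the formal concepts (closed pairs of an object set and an attribute set) of a small, invented context by brute force. This is not Ganter's NextClosure algorithm, and the object/attribute table is illustrative only.

```python
from itertools import combinations

context = {
    "car":       {"bookable", "rentable", "driveable"},
    "bike":      {"bookable", "rentable", "rideable"},
    "excursion": {"bookable", "joinable"},
    "trip":      {"bookable", "joinable"},
}
all_attributes = set().union(*context.values())

def intent(objects):
    """Attributes shared by all given objects."""
    return set.intersection(*(context[o] for o in objects)) if objects else set(all_attributes)

def extent(attributes):
    """Objects having all given attributes."""
    return {o for o, attrs in context.items() if attributes <= attrs}

concepts = set()
for r in range(len(context) + 1):
    for objs in combinations(context, r):
        i = intent(set(objs))
        concepts.add((frozenset(extent(i)), frozenset(i)))   # close the object set

for ext, att in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(ext), "<->", sorted(att))
```

Reading the concepts ordered by extent size reproduces a small hierarchy like the one on the following slides, e.g. {car, bike} with intent {bookable, rentable} sitting below the top concept with intent {bookable}.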

77
TextToOnto FCA
78
Evaluation
  • Evaluation with respect to existing ontologies
    for a certain domain (tourism and finance)
  • Quantitative comparison of agglomerative,
    divisive and conceptual clustering (FCA)
  • Qualitative comparison: understandability, efficiency

79
Comparison of Hierarchies
(Figure: an automatically learned hierarchy, with concepts such as bookable, rentable, joinable, driveable and rideable over the leaves apartment, car, bike, trip and excursion, is compared with the reference hierarchy, with concepts root, thing, activity, vehicle and TWV over the same leaves.)
80
Semantic Cotopy [Maedche & Staab 02]
(Figure: the same two hierarchies as on the previous slide)
SC(bike) in O1 = {bike, rideable, driveable, rentable, bookable}
SC(bike) in O2 = {bike, TWV, vehicle, thing, root}
=> TO(bike, O1, O2) = 1/9 !!!
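In the notation of Maedche & Staab, the semantic cotopy of a concept and the taxonomic overlap between two ontologies O1 and O2 can be written roughly as:

\[ SC(c, O) := \{ c' \in C \mid c' \le_C c \ \vee\ c \le_C c' \} \]
\[ TO(c, O_1, O_2) := \frac{|SC(c, O_1) \cap SC(c, O_2)|}{|SC(c, O_1) \cup SC(c, O_2)|} \]

which yields the 1/9 above: only bike is shared between the two cotopies of size 5. The common semantic cotopy used on the next slide restricts SC to concepts that occur in both ontologies, which is why SC(driveable) and SC(vehicle) reduce to {bike, car}.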
81
Common Semantic Cotopy (SC)
(Figure: the same two hierarchies as on the previous slides)
SC(driveable) = {bike, car}
SC(vehicle) = {bike, car}
=> TO(driveable, O1, O2) = 1
82
Trivial Concept Hierarchies
(Figure: a trivial hierarchy attaching apartment, trip, excursion, car and bike directly below root/thing/activity, compared with the reference hierarchy containing vehicle and TWV.)
Computation only for non-leaf concepts!!!
83
Example for Precision/Recall
F = 100%
84
Example for Precision/Recall
(Figure: learned hierarchy with concepts bookable, rentable, joinable and driveable over apartment, trip, excursion, car and bike)
F = 93.33%
85
Example for Precision/Recall
(Figure: learned hierarchy with concepts bookable, rentable, joinable, driveable, planable and rideable over apartment, trip, excursion, car and bike)
F = 94.74%
86
Trivial Concept Hierarchies
F = 57.14%
87
Evaluation
  • Variant of the semantic cotopy
  • Calculation of overlap in both directions
  • Precision
  • Recall
  • F-Measure
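Spelled out, an evaluation along these lines (the exact choice of which concepts are averaged over varies between papers) computes roughly

\[ P(O_1, O_2) = \frac{1}{|C_1'|} \sum_{c \in C_1'} TO(c, O_1, O_2), \qquad R(O_1, O_2) = P(O_2, O_1), \qquad F = \frac{2PR}{P + R} \]

where C_1' are the (non-leaf) concepts of the learned ontology O_1 and TO is the taxonomic overlap based on the common semantic cotopy.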

88
Syntactic Dependencies
89
Recall over Precision (Tourism)
90
Recall over Precision (Finance)
91
Pseudo-syntactic dependencies
92
Summary of Results
93
Experimental results
  • Formal Concept Analysis yields better concept hierarchies than similarity-based clustering algorithms.
  • The results of FCA are easier to understand (intensional description of concepts!)
  • Bi-Section-KMeans is most efficient (O(n²))
  • Though FCA is exponential in the worst case, it shows a favourable runtime behaviour (sparsely populated formal contexts)
  • The more fine-grained the features, the better the results!

94
Clustering Concept Hierarchies from Text
  • Similarity-based
  • Set-theoretical Probabilistic
  • Soft clustering

95
What About Multiple Word Meanings?
  • bank: financial institute or natural object?
  • At least two clusters!
  • So we need soft clustering algorithms
  • Clustering By Committee (CBC) Lin et al. 2002
  • Gaussian Mixtures (EM)
  • PoBOC (Pole-Based Overlapping Clustering)
  • FCA
  • (...)
  • Challenge recognize multiple word meanings!

96
Soft clustering algorithms
  • Principle underlying PoBOC and CBC
  • Construct first poles or committees corresponding to very homogeneous groups of words, e.g. monosemous words
  • In a second step, assign words which do not form poles or committees to one or more committees; these are the ambiguous words
  • Additional trick in CBC: once you assign a word to a committee, remove the overlapping features, i.e. subtract the meaning of the committee

97
Approach by Widdows and Dorow 2002
  • Extract shallow grammatical relations for words → build a context vector.
  • Apply LSA/LSI to reduce the dimension of the co-occurrence matrix.
  • Calculate similarity as the cosine of the angle between the corresponding vectors.
  • Senses of a word = disjoint subgraphs

98
Scalability
  • Problem with clustering algorithms
  • Compute at least pairwise similarity between words, i.e. O(n²k)
  • Idea of Ravichandran, Pantel and Hovy
  • Apply locality sensitive hash functions
  • i.e. approximate cosine measure by a randomized
    procedure

99
Randomly approximating the cosine measure
where d is the number of random vectors!
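The construction behind this (random-hyperplane hashing in the style of Charikar, as used by Ravichandran, Pantel and Hovy) works as follows: draw d random vectors r_1, ..., r_d and map each word vector u to the bit string h(u) with h_i(u) = 1 iff r_i · u ≥ 0. Then

\[ P[h_i(u) = h_i(v)] = 1 - \frac{\theta(u, v)}{\pi} \quad\Rightarrow\quad \cos(u, v) \approx \cos\!\Bigl(\pi \cdot \frac{\mathrm{Hamming}(h(u), h(v))}{d}\Bigr) \]

so pairwise cosine similarities can be approximated from cheap Hamming distances over short bit signatures.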
100
Taxonomy Extraction - Overview
  • Lexico-syntactic patterns
  • Distributional Similarity Clustering
  • Linguistic Approaches
  • Taxonomy Extension/Refinement
  • Combination of Methods
  • Evaluation
  • Tools Matrix

101
Demos
  • Similar Words: http://www.isi.edu/pantel/Content/Demos/LexSem/thesaurus.htm
  • CBC: http://www.isi.edu/pantel/Content/Demos/LexSem/cbc.htm

102
Linguistic Approaches
  • Modifiers
  • Modifiers (adjectives/nouns) typically restrict or narrow down the meaning of the modified noun
  • e.g. isa(international credit card, credit card)
  • Yields a very accurate heuristic for learning taxonomic relations, e.g. OntoLearn [Velardi & Navigli], OntoLT [Buitelaar et al., 2004], TextToOnto [Cimiano et al.], [Sanchez et al., 2005]
  • Compositional interpretation of compounds
    OntoLearn
  • e.g. long-term debt
  • Disambiguate long-term and debt with respect to
    WordNet
  • Generate a gloss out of the glosses of the
    respective synsets
  • long-term debt: a kind of debt, the state of owing something (especially money), relating to or extending over a relatively long time

103
Taxonomy Extraction - Overview
  • Lexico-syntactic patterns
  • Distributional Similarity Clustering
  • Linguistic Approaches
  • Taxonomy Extension/Refinement
  • Combination of Methods
  • Evaluation
  • Tools Matrix

104
General Problem
105
Hearst & Schuetze 1993
  • For each word w in WordSpace
  • collect the 20 nearest neighbors in space using
    the cosine measure,
  • compute the score si of category i for w as the
    number of nearest neighbors that are in i, and
  • assign w to the highest scoring category.
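A compact sketch of this assignment step with toy data; k is 20 in the original formulation, and the vectors below are invented:

```python
import numpy as np

# Score each category by how many of the target word's k nearest neighbours
# (cosine similarity in word space) carry that category, then pick the best.
def assign_category(word_vec, labelled, k=20):
    """labelled: list of (vector, category) pairs for already categorised words."""
    sims = []
    for vec, cat in labelled:
        sim = np.dot(word_vec, vec) / (np.linalg.norm(word_vec) * np.linalg.norm(vec))
        sims.append((sim, cat))
    scores = {}
    for _, cat in sorted(sims, reverse=True)[:k]:
        scores[cat] = scores.get(cat, 0) + 1
    return max(scores, key=scores.get)

labelled = [(np.array([1.0, 0.0]), "vehicle"),
            (np.array([0.9, 0.1]), "vehicle"),
            (np.array([0.0, 1.0]), "activity")]
print(assign_category(np.array([0.8, 0.2]), labelled, k=2))   # -> vehicle
```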

106
Widdows 2003
  • For a target word w, find words from the corpus
    which are similar to those of w. Consider these
    corpus-derived neighbors N(w)
  • Map the target word w to the place in the
    taxonomy where the neighbors N(w) are most
    concentrated.
  • Crucial question: what does "most concentrated" mean?

107
Determine where they are most concentrated
  • Maximization problem

108
Classification of Instances
  • Known from memory-based learning or
  • instance-based learning Daelemans et al.

109
Dataset
  • Two annotators assigned named entities appearing
    in Lonely Planet destination descriptions to
    their corresponding class (682 concepts)
  • Kappa = 63.54
  • The annotators coincided on 277 named entities, which are used for the evaluation

110
Evaluation
  • F-Measure averaged over both gold standards
  • Symmetric variant of the Learning Accuracy Hahn
    et al.
  • Majority Baseline: F = 12.64%

111
Word windows vs. Syntactic surface dependencies
  • Take n words to the left and right of the word as context, without crossing sentence boundaries (n = 3, 5, 10)
  • Use syntactic surface dependencies extracted with a shallow parser
  • adjective modifiers: a nice city → nice(city)
  • possessive modifiers: the city's center → has_center(city)
  • subject/objects/PP-complements: the river flows through the city → flows_sub(river), flow_through(city)
  • prepositional phrases: (the city near the river) → near_river(city), city_near(river)
  • copula constructions: (a flamingo is a bird) → is_bird(flamingo)
  • verb phrases with "to have": (a country has a capital) → has_capital(country)

112
Improvement - surface dependencies
(Figure: evaluation results 57.78, 60.03, 19.70, 19.58)
113
Data Sparseness 1 Using Conjunctions
  • If two terms or NEs w1 and w2 are coordinated using the conjunctions "and" or "or", count any occurrence of a feature of w1 also as a feature of w2 and the other way round.
  • leads to a smoothing of the frequency landscape

114
Improvement - Conjunctions
(Figure: evaluation results 61.23, 22.80)
115
Data Sparseness 2 Exploiting the Taxonomy in
line with Pekar et al.
  • Construct the context vector for a word by aggregating the vectors of its hyponyms w.r.t. the taxonomy
  • Vector addition
  • Category (vector normalization)
  • Centroid (average vector)
  • We use only direct hyponyms
  • We obtained reasonable results only with the centroid-based method

116
Improvement Taxonomy (centroid)
(Figure: evaluation results 64.11, 23.02)
117
Data Sparseness 3 Anaphora Resolution
  • Approach based on the algorithm by Lappin and
    Leass 94
  • Additionally use patterns to detect pleonastic uses of "it", in line with Dimitrov 2002
  • Replace each non-pleonastic anaphor by its grammatically correct form, e.g.
  • The port capital of Vathy is dominated by its fortified Venetian harbour.
  • The port capital of Vathy is dominated by Vathy's fortified Venetian harbour.

118
Improvement Anaphora Resolution
(Figure: evaluation results 64.11, 23.82)
119
Data Sparseness 4 Downloading documents from the
Web
  • Observation Named entities are especially
    affected by data sparseness
  • Idea download documents from the Web in which
    the named entity occurs (using the Google-API or
    similar)
  • Calculate the relevance of the downloaded document w.r.t. the corpus by calculating a bag-of-words-style similarity (cosine)
  • Reject the document if the similarity is below some threshold (0.2 in our experiments, considering at most 20 documents)

120
Improvement Downloading documents
(Figure: evaluation results 65.91, 26.21)
121
Postprocessing Web Statistics
  • Take the k best answers c1,...,ck
  • Count the number of occurrences of Hearst-style patterns using the Google API
  • Aggregate the counts, dividing by the count of the constant part for each entity e
  • Choose the concept ci maximizing this value
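A sketch of this post-processing step; web_hit_count is a hypothetical stand-in for the search-engine API call, and the Hearst-style pattern set is illustrative rather than the exact one used:

```python
# Pick, among the k best candidate concepts, the one best supported by web counts.
def web_hit_count(query):
    raise NotImplementedError("plug in a real search API here")

PATTERNS = ["{c}s such as {e}", "{e} is a {c}", "{e} and other {c}s"]

def best_concept(entity, candidates):
    """Return the candidate concept with the highest normalised pattern-hit count."""
    def score(concept):
        hits = sum(web_hit_count(p.format(c=concept, e=entity)) for p in PATTERNS)
        return hits / (web_hit_count(entity) or 1)   # divide by the constant part
    return max(candidates, key=score)
```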

122
Improvement Web Statistics
(Figure: evaluation results 69.87, 32.60)
123
Taxonomy Extension / Refinement
  • Conclusions
  • difficult problem
  • approaches not comparable (datasets, measures, ontologies, number of concepts, ...)

124
Taxonomy Extraction - Overview
  • Lexico-syntactic patterns
  • Distributional Similarity Clustering
  • Linguistic Approaches
  • Taxonomy Extension/Refinement
  • Combination of Methods
  • Evaluation
  • Tools Matrix

125
Initial Blueprints for Combination
  • Ontology learning is error-prone; combination of techniques can be expected to make results more accurate
  • Caraballo 99
  • Label tree produced with hierarchical
    agglomerative clustering using lexico-syntactic
    patterns
  • Cimiano 05b/c
  • Guided Clustering
  • Integrate a hypernym oracle with agglomerative
    clustering
  • Classification-based approach
  • use features derived from several learning
    paradigms
  • Cederberg Widdows 03
  • Increase accuracy and coverage of
    lexico-syntactic patterns by using LSA and
    coordination patterns

126
Hierarchical Agglomerative Clustering with
Postprocessing
  • Caraballo's Method [Caraballo 1999]
  • Agglomerative Clustering
  • Labeling Clusters with hypernyms derived from
    Hearst patterns
  • Removing unlabeled concepts thus compacting the
    hierarchy
  • Evaluation select 20 nouns with at least 20
    hypernyms and present them to human judges with
    the 3 best hypernyms for each
  • Results
  • Best Hypernym: 33% (Majority) / 39% (Any)
  • Any Hypernym: 47.5% (Majority) / 60.5% (Any)

127
Classification-based approach [Cimiano et al. 2005b]
Features: isa_WN(t1,t2), isa_Hearst(t1,t2), isa_WWW(t1,t2), isa_head(t1,t2) → classifier output isa(t1,t2) with probability p
Idea: use as input features derived by applying different techniques, resources, etc. and find the optimal combination in a supervised manner!
128
Results for Combination
129
Improving Precision and Recall of Hearst patterns
Cederberg and Widdows 03
  • Main Idea
  • Improve precision by filtering hyponym pairs using their similarity in WordSpace (error reduction by 30%, P = 58%)
  • Improve recall by using coordination information, i.e. A < B and coordinated(A, C) → C < B
  • This yields a five-fold increase in recall while maintaining precision at P = 54% using the WordSpace filtering technique.

130
Concept Hierarchy Tools
131
Ontology Learning Layer Cake
General Axioms
Axiom Schemata
Relation Hierarchy
Relations
Concept Hierarchy
Concept Formation
(Multilingual) Synonyms
Terms
132
Specific Relations / Attributes
  • Part-of [Charniak et al. 98]
  • X consists of Y
  • Qualia [Yamada et al. 04, Cimiano & Wenderoth 05]
  • Formal: such X as Y
  • Purpose: X is used for Y
  • Agentive: a ADV Xed Y
  • Causation [Girju 02, Sanchez 04]
  • X leads to Y
  • Attributes [Poesio and Almuhareb 05]

133
What are Qualia Structures?
  • lexical structures describing the nature of an
    object
  • introduced by James Pustejovsky in the context of
    the Generative Lexicon (GL)
  • inspired by Aristotle's basic factors or causes
  • described in terms of four roles
  • Formal: describing the properties which distinguish the object in a larger domain
  • Agentive: describing the factors involved in its creation
  • Constitutive: describing its physical properties (weight, material, shape, parts, components, etc.)
  • Telic: describing its purpose or function

134
Qualia Structure for knife
  • Formal: artifact_tool
  • Constitutive: blade, handle, ...
  • Agentive: make_act
  • Telic: cut_act

135
Why are QS useful for NLP?
  • Analysis of compounds (Johnston Busa)
  • headache medicine (telic role)
  • Co-composition and coercion (Pustejovsky)
  • fast car vs. fast highway vs. fast waiter
  • begin a book (agentive or telic role)
  • Bridging reference resolution (Bos et al.)
  • Peter bought a car. The motor runs well.
    (Constitutive role)
  • Query Expansion / Refinement (Voorhees)
  • headache medicine cure

136
Motivation for automatically learning Qualia
Structures
  • Axiom 1: World knowledge is needed for NLP
  • Axiom 2: Modeling a significant amount of world knowledge by hand is not always useful (e.g. Cyc)
  • current NLP systems rely on WordNet
  • broad, but also
  • shallow (not many relations)
  • proliferation of senses
  • domain-independent
  • Long-term goal: create a large lexical resource of qualia structures and use it for NLP applications

137
The General Idea
  • Our approach exploits
  • lexico-syntactic patterns (Hearst 1992, etc.) to
    discover fillers of the qualia roles
  • The web as a big corpus
  • A statistical measure to weight each filler
  • Result Weighted Qualia Structure (wQS) for a
    given word

138
The Process
Word → Generate Clues → POS-tagging → Match regular expressions → Statistical Weighting → Weighted QS
139
The Process some details
  • Generating clues in order to download a set of
    promising pages in which the patterns will be
    matched
  • POS-tagging to avoid errors
  • Bill Gates is a computer hacker.
  • Statistical Weighting by the Jaccard coefficient
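Instantiated over web hit counts, the Jaccard coefficient for a word w and a candidate role filler f would look roughly like

\[ \mathrm{Jaccard}(w, f) = \frac{\mathrm{hits}(w \wedge f)}{\mathrm{hits}(w) + \mathrm{hits}(f) - \mathrm{hits}(w \wedge f)} \]

where the exact clue used for the joint count depends on the qualia role and pattern in question.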

140
The Formal role
NP: ([a-z]+/DT)? ([a-z]+/JJ)* [a-z]+/NN(S)?
141
The Telic role
PURP: [a-z]+/VB NP | NP be/VB [a-z]+/VBD
142
The Constitutive role
143
The Agentive role
  • No reliable patterns found
  • X is made by Y
  • X is produced by Y
  • Use a predefined set of creation verbs
  • build, produce, make, write, plant, elect,
    create, cook, construct, design
  • Calculate

144
Evaluation by a human judge
  • Evaluation scale
  • 0: totally wrong
  • 1: not completely wrong
  • 2: almost acceptable
  • 3: correct

145
Automatically generated qualia structure for
knife
146
Results
  • Evaluation: the judge assigns credits from 0 (wrong) to 3 (totally correct)

147
General Relations Exploiting Linguistic Structure
  • OntoLT SubjToClass_PredToSlot_DObjToRange
    Heuristic
  • Maps a linguistic subject to a class, its
    predicate to a corresponding slot for this class
    and the direct object to the range of the slot
  • TextToOnto Acquisition of Subcategorization
    Frames
  • love(man,woman)
  • love(kid,mother)
  • love(kid,grandfather)
  • Problem related to acquisition of
    subcategorization frames and selectional
    restrictions in Natural Language Processing
  • e.g. Resnik 97, Ribas 95, Clark and Weir 02

love(person,person)
148
Finding the Right Level of Abstraction
  • Ciaramita et al. 05
  • Genia Corpus, Genia Ontology
  • Verb-based relations
  • X activates B
  • Use χ² to decide whether to generalize or not (significance level)
  • Results
  • 83.3% of relations correct according to human evaluation
  • 53.1% correctly generalized

149
Our experiments
  • Genia corpus Genia ontology
  • Extract subj-verb-obj relations using a shallow parser (Abney's CASS)
  • Try to find the appropriate domain and range for the relations w.r.t. Genia
  • Use different statistical measures to generalize!

150
Comparing different measures
  • Conditional Probability
  • Point-wise Mutual Information
  • Chi-square test
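Written out for a concept c observed in an argument slot r (e.g. the object slot of activate), the three measures are, up to the usual variants,

\[ P(c \mid r) = \frac{f(c, r)}{f(r)}, \qquad \mathrm{PMI}(c, r) = \log \frac{P(c \mid r)}{P(c)}, \qquad \chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

where the χ² sum runs over the 2×2 contingency table of c vs. r, comparing observed counts O with the counts E expected under independence.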

151
An example
  • Words found as objects of activate
  • protein_molecule: 5
  • protein_family_or_group: 10
  • amino_acid: 10
  • Conditional probability
  • P(protein | activate_obj) = 15/25 = 0.6
  • P(amino_acid | activate_obj) = 25/25 = 1
  • PMI
  • PMI(protein, activate_obj) = log(0.6/0.14) = 2.1
  • PMI(amino_acid, activate_obj) = log(1/0.27) = 1.89

152
Example (Cont'd)
153
Results
  • Evaluation
  • A biologist labelled 100 relations by hand by selecting the appropriate domain and range from the Genia corpus
  • Surprisingly, the conditional probability gives
    the best results!
  • But chi-square still works better than PMI!
  • Peculiarities
  • Genia ontology very shallow
  • Corpus semantically annotated

154
Relations Tools
155
TextToOnto Relations
156
TextToOnto - Relations
157
Ontology Learning Layer Cake
General Axioms
Axiom Schemata
Relation Hierarchy
Relations
Concept Hierarchy
Concept Formation
(Multilingual) Synonyms
Terms
158
Axiom Schemata General Axioms
  • DIRT (Discovery of Inference Rules from Text Lin
    et al. 01)
  • calculate significant collocations on dependency
    paths
  • Example: X solves Y
  • Y is solved by X, X resolves Y, X finds a
    solution to Y, X tries to solve Y, Y deals with
    X, Y is resolved by X, X addresses Y, X seeks a
    solution to Y, X do something about Y, ...
  • AEON Völker et al. 05
  • Rigidity, Identity, Unity, Dependence
  • Haase and Völker 05
  • Disjointness Axioms on the basis of coordination
  • e.g. disjoint(man, woman)

159
Tools - Axioms
160
Summary
  • Terms: use some statistical measure to assess relevance w.r.t. a corpus
  • Concept Hierarchies
  • Formal Concept Analysis / Clustering
  • Hearst Patterns
  • Relations: use NLP techniques to extract verbs and their argument structure (Generalize!)

161
Agenda
  • Ontologies
  • Motivation
  • Ontology Learning
  • Layer Cake
  • Term Extraction
  • Concept Hierarchies
  • Relations
  • Applications
  • Conclusion

162
Applications
  • Information Retrieval
  • Query Expansion
  • Document Similarity (IR)
  • Natural Language Processing
  • Word Sense Disambiguation
  • Text Mining
  • Enhanced bag-of-word model

163
Classification and Clustering of Texts
  • Typically, document classification and clustering methods rely on the bag-of-words model.
  • Recently, the bag-of-words model has been enhanced to also contain conceptual features derived from a domain ontology [Bloehdorn and Hotho 2004].

164
Generalization
Relative improvement w.r.t. the bag-of-words model: between 2% and 7%.
165
Using automatically learned ontologies
(Figure: documents represented as term vectors are mapped to concept vectors via a conceptual document representation, and via linguistic context vectors and term clustering, before classification / clustering.)
166
Results
  • Automatically learned ontologies achieve results comparable to hand-crafted ontologies w.r.t. clustering and classification tasks.
  • Best Algorithm: Bi-Section KMeans
  • Unclear how many levels one has to move up!
  • Conclusion: for some applications automatically generated ontologies are good enough.

167
SEKT Case Studies
  • BT case study
  • Legal case study

168
BT (British Telecom) Case Study
  • Digital Library (since 1994)
  • Single interface for accessing multiple databases
    with content from different publishers
  • More than 1 million technical articles and papers
    from 12000 publications, about 1000 business and
    management magazines
  • Main features
  • Information spaces: collections of documents about interesting topics
  • Searching and browsing
  • Personalization: alerts, bookmarks, annotations, private information spaces

169
BT Case Study Semantic Web Information Space
170
BT Case Study Ontology Learning Scenario
  • Learn fine-grained topic hierarchy from each
    information space
  • Why?
  • Visualization of information spaces
  • Searching and browsing information spaces (Query
    Refinement)
  • Topic discovery
  • Integrated with a Query Refinement Tool

171
Evaluation Setting
  • Corpus: 1700 abstracts from the knowledge management information space
  • 5 human annotators, domain experts
  • For each type of ontology element
  • Each annotator was given the top 50 ontology
    learning results (regarding confidence /
    relevance)
  • Rating scale ranging from 1 (completely wrong) to
    5 (perfectly correct)

172
Algorithms
  • Concept and Instance Extraction
  • TFIDF (discussed)
  • Subclass relations
  • Combination of Hearst Patterns & WordNet
  • Linguistic Heuristics (partially discussed)
  • Instance-of relations
  • Hearst Patterns (discussed)
  • Non-taxonomic relations
  • Analysis of verb structure (discussed)
  • Subtopic relations
  • Sanderson and Croft algorithm
  • Disjointness Axioms
  • Analysis of enumerations, e.g. men and women

173
Evaluation Results / Conclusion
  • Promising evaluation results
  • Problems due to evaluation procedure and human
    perception
  • High disagreement among human annotators
  • What is a topic?
  • Which score do I have to assign if I do not know
    a concept / instance or if the label is
    ambiguous?
  • How can you talk about disjointness of concepts
    which do not have a set theoretic interpretation?

174
Legal Case Study
  • In General
  • Complaint about the diligence of the legal administration.
  • The judges are overworked.
  • In Particular
  • New Judges
  • A lot of theoretical knowledge, but little practical knowledge
  • On duty:
  • When they are confronted with situations in which they are not sure what to do,
  • they disturb experienced judges with typical questions,
  • usually their former tutor (Preparador)
  • Existing Technology
  • Legal Databases
  • Essential in their daily work
  • Based on keywords and Boolean operators
  • A search retrieves a huge number of hits

175
Description of the Problem Legal Domain
  • Solution
  • Design an intelligent system to help new judges
    with their typical problems.
  • Extended FAQ system using Semantic Web
    technologies
  • Connect the FAQ system with the existing jurisprudence.
  • Search Jurisprudence using Semantic Web
    technologies.

176
Learning Concept Hierarchies with the Spanish
version of TextToOnto
177
Expert Knowledge Retrieval
  • Use automatically learned ontologies for
    computation of similarity between question and
    FAQ database (consider synonyms, etc.)

(Figure: the iFAQ Search Engine processes a user question through ontology domain detection, a keyword/synonym matching stage and ontology graph path matching, retrieving FAQ candidates from the FAQ collection; a search factory also connects to other search engines.)
178
Applications in IR
  • Query Refinement
  • Use corpus-derived synonyms
  • Use corpus-derived subconcepts
  • Query Interpretation
  • Headache medicine
  • Cure or cause ?
  • See OntoQuery project

179
Take-home Message
  • Powerful Methods
  • Matching of lexico-syntactic patterns
  • Distributional Similarity
  • Use any similarity measure of your choice
  • Yields similar words (near synonyms)
  • Very promising applications
  • Information retrieval