1
Knowledge Acquisition
for Semantic Search System
  • Wang Wei
  • School of Computer Science
  • Faculty of Science
  • The University of Nottingham Malaysia Campus
  • August, 2008

ITSIM 08
2
Semantic Search System
  • Conventional IR system
  • Information retrieval models and techniques: VSM,
    Probabilistic Model, LSA, pLSA, etc.
  • Document-oriented; NO direct answers.
  • Semantic Search System
  • Entity- and knowledge-oriented.
  • Semantic Web technologies, formal knowledge
    representation, logical reasoning, etc.
  • Ontology
  • Fundamental component in semantic search.
  • Knowledge acquisition and ontology learning.
  • Ontology engineering: manual, transformation, and
    learning.

3
Ontology Categorisation
  • Formal ontology: axioms and definitions stated in
    logic.
  • Bird subclass-of Animal; Sparrow is-a Bird.
  • Prototype-based ontology: distinguished by
    typical instances or prototypes.
  • Beijing, Hong Kong, KL, NY.
  • Different kinds of birds.
  • Terminological ontology: need not be fully
    specified by axioms and definitions.
  • ACM classification tree, MeSH, etc.
  • Ontology categorisation from Sowa
  • Image from Wikipedia

4
Ontology Learning Tasks and Methods
  • A layered cake specifies the different tasks.
  • Ontology Learning Methods
  • Lexico-syntactic based approach
  • Information Extraction
  • Clustering and Classification
  • Data Co-occurrence Analysis
  • Probabilistic Topic Models for learning
    terminological ontologies.
  • pLSA: probabilistic Latent Semantic Analysis.
  • LDA: Latent Dirichlet Allocation.
  • Image from Cimiano

5
Probabilistic Topic Models
  • Generative Process
  • pLSA
  • LDA
  • Why use probabilistic topic models?
  • Developed in information retrieval to address
    synonymy and polysemy.
  • Capture semantic relations between words and
    documents.
  • Interpretable in terms of probabilistic topics,
    compared to LSA (Latent Semantic Analysis).
  • Efficient dimension reduction techniques.
  • Application
  • Document modelling, clustering, classification,
    etc.
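The generative process shared by these models can be illustrated with a toy sketch. The following is a minimal illustration of the LDA generative story only (not the learning step); the vocabulary, topic distributions, and hyperparameter values are invented for the example.

```python
import random

random.seed(0)

def sample_dirichlet(alphas):
    # Draw from a Dirichlet by normalising independent Gamma samples.
    g = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

def sample_discrete(probs, items):
    # Draw one item according to the given probabilities.
    r = random.random()
    acc = 0.0
    for p, item in zip(probs, items):
        acc += p
        if r < acc:
            return item
    return items[-1]

# Hypothetical toy vocabulary and two topic-word distributions.
vocab = ["ontology", "logic", "retrieval", "ranking"]
topics = [
    [0.45, 0.45, 0.05, 0.05],   # a "semantic web" topic
    [0.05, 0.05, 0.45, 0.45],   # an "IR" topic
]

def generate_document(n_words, alpha=0.5):
    # LDA generative process: sample a per-document topic mixture,
    # then for each word sample a topic, then a word from that topic.
    theta = sample_dirichlet([alpha] * len(topics))
    doc = []
    for _ in range(n_words):
        z = sample_discrete(theta, range(len(topics)))
        doc.append(sample_discrete(topics[z], vocab))
    return doc

print(generate_document(8))
```

pLSA follows the same word-from-topic story but ties the topic mixture to each training document instead of drawing it from a Dirichlet prior, which is why LDA generalises better to unseen documents.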

6
Learning Terminological Ontologies and Concept
Hierarchies
  • The task can be divided into concept extraction
    and relation learning.
  • Concept extraction
  • Well studied in the literature.
  • Information extraction, machine learning to
    extract key phrases, and so on.
  • Relation learning
  • Using data co-occurrence analysis.
  • Not a formal model.
  • Highly dependent on co-occurrence of data.
  • We aim to learn two relations defined in the SKOS
    model.

7
Relations in SKOS Model
  • SKOS: Simple Knowledge Organisation System.
  • Expresses basic structure and content of concept
    schemes such as thesauri, classification schemes,
    subject heading lists, taxonomies, folksonomies,
    etc.
  • The objective is to learn broader and related
    relations.
  • Images from SKOS

8
Information Theory Principle for Concept
Relationship
  • Motivated by information theory.
  • Defined over the Kullback-Leibler divergence.
  • Definition: a concept Cp is broader than another
    concept Cq if the following two conditions hold:
  • (Similarity condition) the similarity measure
    between them is greater than a certain threshold,
    and
  • (Divergence difference condition) the difference
    between the two Kullback-Leibler divergence
    measures exceeds the noise factor.
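The two conditions can be sketched in code. The choice of cosine similarity, the threshold values, and the direction of the divergence-difference inequality are assumptions for illustration; the slide does not spell them out.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # KL(P || Q): average surprise when messages expected from P
    # are actually drawn from Q (smoothed to avoid log(0)).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cosine_similarity(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm

def is_broader(cp, cq, sim_threshold=0.3, noise=0.05):
    # Cp broader-than Cq if (1) the concepts are similar enough, and
    # (2) the difference of the two KL divergences exceeds the noise
    # factor.  The direction of the inequality is an assumption here.
    if cosine_similarity(cp, cq) <= sim_threshold:
        return False
    return kl_divergence(cp, cq) - kl_divergence(cq, cp) > noise

# Toy topic distributions: a broad concept spreads over topics,
# a narrow one concentrates on a single topic.
animal = [0.4, 0.3, 0.3]
bird = [0.9, 0.05, 0.05]
print(is_broader(animal, bird))  # → True
print(is_broader(bird, animal))  # → False
```

The intuition: the broad concept's flatter distribution approximates the narrow one better than the other way round, so the asymmetry of KL divergence indicates the direction of the broader relation.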

9
Discussion and Comparison to Other Theories
  • KL divergence
  • P normally represents the true distribution of
    data, while Q represents a practical
    approximation of P.
  • Average surprise of an incoming message drawn
    from distribution Q when it is expected from P.
  • The principle compares the difference between two
    such surprises.
  • Comparison to other theories
  • A coarse assumption: a term A subsumes a term B
    if the documents in which B occurs are a subset
    of the documents in which A occurs (quite
    effective in certain situations).
  • The recent theory of Surprise: the quantity is
    defined as the KL divergence between the prior
    and posterior distributions of a random variable.
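The coarse co-occurrence assumption is easy to state in code; the document-occurrence sets below are invented for illustration.

```python
def subsumes(docs_a, docs_b):
    # Co-occurrence heuristic from the slide: term A subsumes term B
    # if every document in which B occurs also contains A.
    return set(docs_b) <= set(docs_a)

# Hypothetical document-occurrence sets for three terms.
docs = {
    "animal": {"d1", "d2", "d3", "d4"},
    "bird": {"d2", "d3"},
    "retrieval": {"d4", "d5"},
}

print(subsumes(docs["animal"], docs["bird"]))       # → True
print(subsumes(docs["animal"], docs["retrieval"]))  # → False
```

Unlike the KL-based principle, this test is binary and breaks as soon as a single document mentions the narrower term without the broader one, which is why it is described as highly dependent on the co-occurrence of data.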

10
Concept Hierarchy Construction Algorithms
  • Local Similarity Hierarchy Learning (LSHL)
    algorithm
  • Performs local (greedy) search; only constructs
    concept hierarchies.
  • Global Similarity Hierarchy Learning (GSHL)
    algorithm
  • Performs global search, constructs terminological
    ontologies.
  • Learns both broader and related relations
    (broader can be viewed as subsumption).
  • Model parameters
  • The number of topics or classes used to learn
    the parameters of the topic model.
  • The maximum number of designated sub-nodes for a
    particular node.
  • The thresholds for the similarity and divergence
    measures.
  • The noise factor: the difference between the two
    KL divergence measures.
  • Maximum number of iterations.
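A greedy construction in the spirit of LSHL can be sketched as follows. This is not the published algorithm, only an illustration of how the parameters above interact; the similarity table, threshold, and sub-node cap are invented.

```python
def build_hierarchy(concepts, similarity, threshold=0.5, max_children=3):
    # Greedy sketch: visit concepts in order and attach each one to
    # the most similar already-placed concept, subject to the
    # similarity threshold and the cap on sub-nodes per node.
    children = {c: [] for c in concepts}
    parent = {concepts[0]: None}        # first concept becomes the root
    placed = [concepts[0]]
    for c in concepts[1:]:
        candidates = [p for p in placed
                      if len(children[p]) < max_children
                      and similarity(p, c) > threshold]
        if candidates:
            best = max(candidates, key=lambda p: similarity(p, c))
            parent[c] = best
            children[best].append(c)
        else:
            parent[c] = None            # unattached: a new root
        placed.append(c)
    return parent

# Toy pairwise similarities (hypothetical values).
sims = {
    frozenset({"animal", "bird"}): 0.8,
    frozenset({"animal", "sparrow"}): 0.6,
    frozenset({"bird", "sparrow"}): 0.9,
}
sim = lambda a, b: sims[frozenset({a, b})]

tree = build_hierarchy(["animal", "bird", "sparrow"], sim)
print(tree)  # → {'animal': None, 'bird': 'animal', 'sparrow': 'bird'}
```

A global variant would compare each concept against all candidate parents rather than stopping at the first acceptable local attachment, and would additionally record related links for pairs that pass the similarity condition but not the divergence-difference condition.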

11
Experiment
  • Dataset preparation and concept extraction.
  • Web page crawler and scraper.
  • Concept extraction.
  • Concept representation using documents.
  • Text corpus: stop-word removal, POS-tagging, and
    stemming.
  • The dataset: ACM-SW.
  • Learning LDA models with different numbers of
    classes
  • Train pLSA and LDA models using 30-90 classes.

12
Experiment (cont.)
  • Folding-in documents of concepts to the learned
    topic models
  • Concepts are represented as documents.
  • Folding-in is similar to training, conditioned on
    the topic-word probabilities already learned.
  • For pLSA: the tempered EM algorithm.
  • For LDA: Gibbs sampling.
  • Pair-wise similarities between all concepts are
    calculated and used as input to the LSHL and GSHL
    algorithms.
  • Applying LSHL and GSHL algorithms to learn
    concept hierarchies and terminological
    ontologies.
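The pairwise-similarity step can be sketched as follows, here using cosine similarity over the folded-in topic mixtures; the concept names and mixtures are invented for illustration.

```python
import math

def cosine(u, v):
    # Cosine similarity between two topic mixtures.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def pairwise_similarities(concepts):
    # concepts: name -> topic mixture obtained by folding the
    # concept's documents into the trained topic model.
    names = sorted(concepts)
    return {(a, b): cosine(concepts[a], concepts[b])
            for i, a in enumerate(names) for b in names[i + 1:]}

# Hypothetical folded-in topic mixtures for three concepts.
mixtures = {
    "ontology": [0.7, 0.2, 0.1],
    "reasoning": [0.6, 0.3, 0.1],
    "ranking": [0.1, 0.2, 0.7],
}

sims = pairwise_similarities(mixtures)
print(sims[("ontology", "reasoning")] > sims[("ontology", "ranking")])
```

The resulting similarity table is exactly the input the LSHL and GSHL algorithms consume when deciding where each concept attaches.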

13
Evaluation
  • A total of 168 sets of ontology statements were
    learned and evaluated by domain experts.
  • In almost all cases, the precision of the
    ontology learned using LDA is better than using
    pLSA.
  • The best precision using LDA is 86.6% and the
    worst is 58%.
  • The best precision using pLSA is 80%, and the
    worst is 39%.
  • The possible reason is the generalisability of
    LDA to new documents.

14
Examples of Learned Ontologies
15
  • Thank you for your attention.