Semantic Query By Example - PowerPoint PPT Presentation

About This Presentation
Title:

Semantic Query By Example

Description:

Urgent need to incorporate ontology into the realm of RDBMS to support semantic queries ... 'Iris Neoplasm', 'Tumor of the Uvea', 'Eye Neoplasm' Example ... – PowerPoint PPT presentation

Slides: 31
Provided by: Jian83

Transcript and Presenter's Notes


1
Semantic Query By Example
  • Lipyeow Lim, Haixun Wang, Min Wang
  • IBM T.J. Watson Research Center
  • Jiang Kai jiangkai_at_fudan.edu.cn

2
Outline
  • Introduction
  • Background
  • Semantic Query By Example
  • Feature Extraction
  • Learning Query Semantics
  • Active Learning for QBE
  • Experiments
  • Conclusion

3
Introduction
  • Urgent need to incorporate ontology into the
    realm of RDBMS to support semantic queries
  • Difficulties in semantic query on ontology
  • Integrate data and its related ontology
  • Express semantic queries

4
Example: EMR (electronic medical records)
  • Different disease codes for the same symptoms
  • Iris Neoplasm, Tumor of the Uvea, Eye
    Neoplasm

5
(No Transcript)
6
Example
  • Query: find all patients diagnosed with eye tumor
  • RDF-like data model, store ontology in
    Thesaurus(src, rel, tgt)
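
To make concrete why such a query is laborious to write by hand, here is a minimal sketch of the triple-style storage using an in-memory SQLite database. Only the Thesaurus(src, rel, tgt) schema comes from the slide; the Patient table, the 'is_a' relation name, and the toy hierarchy among the three disease names are assumptions made purely for illustration.

```python
import sqlite3

def find_patients(conn, root_concept):
    """Return ids of patients diagnosed with root_concept or any descendant concept."""
    sql = """
    WITH RECURSIVE sub(concept) AS (
        SELECT ?
        UNION
        SELECT t.src FROM Thesaurus t JOIN sub s ON t.tgt = s.concept
        WHERE t.rel = 'is_a'          -- hypothetical relation name
    )
    SELECT p.pid FROM Patient p JOIN sub s ON p.diagnosis = s.concept
    ORDER BY p.pid
    """
    return [row[0] for row in conn.execute(sql, (root_concept,))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Thesaurus(src TEXT, rel TEXT, tgt TEXT)")
conn.execute("CREATE TABLE Patient(pid INTEGER, diagnosis TEXT)")  # hypothetical base table
# Toy hierarchy, invented for this example only.
conn.executemany("INSERT INTO Thesaurus VALUES (?, ?, ?)", [
    ("Tumor of the Uvea", "is_a", "Eye Neoplasm"),
    ("Iris Neoplasm", "is_a", "Tumor of the Uvea"),
    ("Eye Neoplasm", "is_a", "Neoplasm"),
])
conn.executemany("INSERT INTO Patient VALUES (?, ?)", [
    (1, "Iris Neoplasm"), (2, "Eye Neoplasm"), (3, "Neoplasm"),
])
eye_tumor_patients = find_patients(conn, "Eye Neoplasm")
```

Even in this toy setting the user must know which relation to follow and in which direction, which is the kind of ontology knowledge the slides argue users should not need.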

7
Example
  • XML format

8
Challenge
  • Hard to write the query: it requires intimate
    knowledge of the structure of the ontology
  • Fuzziness in the semantics of the query
  • E.g. find all patients diagnosed with some disease
    in the choroid, which is part of the eye.
    Several relationships link disease concepts to
    anatomic locations: Disease_Has_Primary_Anatomic_Site,
    Disease_Has_Associated_Anatomic_Site,
    Disease_Has_Metastatic_Anatomic_Site

9
Contributions
  • A novel approach to address the problem
  • A novel QBE (Query By Example) method
  • Active learning to improve the accuracy of query
    semantics modeling
  • Experimentally validate the effectiveness of the
    method with as few as 2-4 user-given examples

10
Review on QBE
  • First proposed in the 1970s; two common
    characteristics
  • The examples are used to specify a query that
    will be generated
  • The generated query is a normal query in the
    sense that all the query conditions are defined
    on the base attributes of the underlying tables
    in the database
  • The novel semantic QBE framework
  • Learns the query processing directly, without
    learning the query first
  • Uses not only the base attributes but also the
    semantics of the base data encoded in the
    ontology and the connections between them

11
Background
  • Supervised Learning
  • Create a function f from a training dataset
  • Predict the output for any valid input object x
    as $y = f(x)$
  • Accuracy depends largely on the quality of the
    training dataset
  • Active Learning
  • Situations in which unlabeled data is abundant
    but labeling data is expensive
  • Query the user for labels of examples
  • Graph Classification
  • Classify object (graph/subgraph) based on a set
    of labeled graphs
  • Classify vertices on a graph
  • Vertices and edges in the ontology have their own
    content

12
Semantic Query by Example
  • Query posed against a base table D(X,OID)
  • Offline process
  • Compute the feature vector associated with OID
  • Online process
  • User poses a query by giving a set of example
    tuples. An SVM learns a model from the examples
  • User determines whether the predictions are
    accurate for a small number of additional tuples

13
Feature Extraction
  • Challenges
  • Feature space must capture enough information so
    that all discriminative features are covered
  • Vectorize the local hierarchical structure of an
    ontology node
  • Naïve scheme: include immediate child nodes,
    parent nodes, and paths to some important nodes
  • Nodes and paths are still complex
  • Not general enough
  • Alternative: perform feature extraction per query
  • Expensive
  • Requires the user to know which parts of the
    ontology are important

14
Shortest distance based feature extraction
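  • Ontology has a set of concepts
    $C = \{c_1, \dots, c_k\}$.
    $f(\mathrm{OID}) = \langle y_1, \dots, y_k \rangle$,
    where $y_i$ is the shortest distance from node $o$
    (the ontology node identified by OID) to any node
    whose concept is $c_i$.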
  • Omits much information in the ontology
  • Treat the ontology as an undirected graph
  • Disregard the labels of the edges, concentrate
    only on the labels of the nodes
  • Effectively capture the local structures
  • Tell how close a node is to different concepts
  • Infer much hierarchical information since the
    distribution of concepts is not uniformly random
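
The shortest-distance scheme above can be sketched with one breadth-first search per concept. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the edge-list and dictionary inputs, and the toy graph are all hypothetical, and unreachable concepts are given an infinite distance.

```python
from collections import deque

def shortest_distance_features(edges, concept_of, concepts):
    """For every node, return [y_1..y_k]: hops to the nearest node of each concept."""
    # Build an undirected adjacency list; edge labels are ignored.
    adj = {node: set() for node in concept_of}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    features = {node: [float("inf")] * len(concepts) for node in concept_of}
    for i, c in enumerate(concepts):
        # Multi-source BFS starting from all nodes labelled with concept c.
        sources = [n for n, label in concept_of.items() if label == c]
        dist = {n: 0 for n in sources}
        queue = deque(sources)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for n, d in dist.items():
            features[n][i] = d
    return features

# Toy ontology: node 1 is the root with children 2 and 3; 4 hangs off 2, 5 off 3.
concept_of = {1: "A", 2: "B", 3: "C", 4: "B", 5: "C"}
edges = [(1, 2), (1, 3), (2, 4), (3, 5)]
features = shortest_distance_features(edges, concept_of, ["A", "B", "C"])
```

Because each concept needs only one traversal of the graph, the vectors can be computed once for all nodes, which fits the offline step described earlier in the slides.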

15
Example
16
Example
  • Find nodes that are parent nodes of any node of
    concept C

17
Implementation
18
Learning Query Semantics
  • User provides examples in the format (X, OID,
    label), where $\text{label} \in \{1, -1\}$
  • Randomly pick a set of tuples $P_n$ in D and label
    them as negative.
  • Tuples satisfying the query are only a small
    proportion of D, so few of the randomly picked
    tuples are mislabeled; the classifier can usually
    overcome such noise
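
The training-set construction just described might look roughly like the sketch below. The function name, the representation of D as a list of (X, OID) pairs, and the fixed random seed are assumptions for illustration; the slides do not say how many negatives are drawn or how they are sampled beyond "randomly".

```python
import random

def build_training_set(table, user_examples, n_negatives, seed=0):
    """Combine user-labelled tuples with randomly sampled presumed negatives.

    table: list of (x, oid) tuples from D.
    user_examples: list of (x, oid, label) with label in {1, -1}.
    """
    rng = random.Random(seed)
    labelled = {(x, oid) for x, oid, _ in user_examples}
    # Only tuples the user has not labelled are candidates for random negatives.
    pool = [row for row in table if row not in labelled]
    sampled = rng.sample(pool, min(n_negatives, len(pool)))
    return list(user_examples) + [(x, oid, -1) for x, oid in sampled]

# Hypothetical data: 10 tuples, of which the user labels two.
table = [("patient%d" % i, "oid%d" % i) for i in range(10)]
user_examples = [("patient0", "oid0", 1), ("patient1", "oid1", -1)]
training = build_training_set(table, user_examples, n_negatives=4)
```

Some sampled tuples may in fact satisfy the query; with low selectivity these are rare, which is the noise the last bullet refers to.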

19
Support Vector Machine
  • SVMs learn a linear decision boundary to
    discriminate between the two classes.
  • An SVM trained from a dataset $\{(x_i, \text{label}_i)\}$
    specifies a linear classification rule by a pair
    $(w, b)$, where $w \in \mathbb{R}^N$ and
    $f(x) = w \cdot x + b$; $x$ is classified as
    positive if $f(x) > 0$, or negative if $f(x) < 0$.
  • Decision boundary:
    $\{x \in \mathbb{R}^N : w \cdot x + b = 0\}$
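
As a small illustration of the linear rule on this slide, the sketch below evaluates f(x) = w·x + b and assigns a class. The weights, bias, and inputs are made-up values, not a trained model, and treating f(x) = 0 as negative is an arbitrary choice since the slide leaves that case open.

```python
def decision_value(w, b, x):
    """Compute f(x) = w . x + b for equal-length sequences w and x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return 1 if f(x) > 0, otherwise -1 (points on the boundary count as -1 here)."""
    return 1 if decision_value(w, b, x) > 0 else -1

# Hypothetical parameters, chosen only to exercise both outcomes.
w, b = [1.0, -2.0], 0.5
positive = classify(w, b, [3.0, 1.0])   # f = 3 - 2 + 0.5 = 1.5
negative = classify(w, b, [0.0, 1.0])   # f = 0 - 2 + 0.5 = -1.5
```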

20
SVM
  • Optimal boundary
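
The transcript preserves only the heading of this slide. For reference, the standard textbook hard-margin formulation of the optimal boundary is given below; it is assumed here rather than taken from the slide, and the symbols $w$, $b$, $x_i$, $\text{label}_i$ follow the preceding SVM slide.

$$
\min_{w,\,b} \; \tfrac{1}{2}\,\lVert w \rVert^2
\quad \text{subject to} \quad
\text{label}_i \,(w \cdot x_i + b) \ge 1 \quad \text{for all } i
$$

Under these constraints the distance between the two supporting hyperplanes $w \cdot x + b = \pm 1$ is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin.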

21
Active Learning for QBE
  • Iteratively select the best example to
    incorporate into the training dataset to obtain a
    high quality classifier
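
The slide does not say how the "best" example is chosen. One common criterion, assumed here, is uncertainty sampling: ask the user about the unlabelled item whose feature vector lies closest to the current decision boundary. The function names and the toy model below are hypothetical.

```python
def decision_value(w, b, x):
    """Compute f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def pick_most_uncertain(w, b, unlabelled):
    """Return the OID whose feature vector has the smallest |f(x)|.

    unlabelled: dict mapping OID -> feature vector.
    """
    return min(unlabelled, key=lambda oid: abs(decision_value(w, b, unlabelled[oid])))

# Hypothetical current model and candidate pool.
w, b = [1.0, -1.0], 0.0
unlabelled = {"o1": [3.0, 0.0], "o2": [1.0, 0.9], "o3": [0.0, 2.0]}
next_oid = pick_most_uncertain(w, b, unlabelled)
```

In a full loop the user's label for the selected item would be added to the training set and the classifier retrained, repeating until the predictions look accurate.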

22
Experiment
  • Queries

23
Varying the no. of positive examples
  • Table size: 1000; training dataset size: 40
  • Query selectivity 10

24
Varying the num of training examples
  • Num of positive examples 8
  • Query selectivity 10

25
Varying the num of training examples
26
Varying query selectivity
  • Num of training examples 160
  • Num of positive examples 8

27
Varying table size
  • Num of training examples 160
  • Num of positive examples 16
  • Query selectivity 5

28
Processing time
  • Processing a query by example on a table of size
    10,000 takes only 0.25 sec

29
Conclusion
  • Introduce a data mining approach to support
    semantic queries in relational databases
  • Dealing with the ontology directly when asking
    semantic queries
  • Efficient, effective and general in supporting
    semantic queries

30
The End
  • Thank you all!