Indexes for Supporting Path Queries


Provided by: scottpa5


1
Indexes for Supporting Path Queries

Presented by
Doug Brewer, Scott Patterson, Ravi Pavagada

2
Outline

Problem Description
Goal
Indexes
Attribute
Subsumption
Path
Query Optimization
GUI

3
Problem Description

Current RDF stores use traditional relational
databases to store RDF data.
Only some can perform inferencing and subsumption.
All of them must perform expensive joins to compute
path queries with predicates and subsumption.

4
Our goal

To create an RDF store that accomplishes the following
more efficiently than current stores:
Compute queries on classes or properties, both
direct and derived
Compute attribute predicate queries
Compute path queries with predicates
Compute queries with multiple sources and/or
destinations

5
System Architecture

We use Berkeley DB as our backing store

[Architecture diagram with components: Index Manager, Main Memory, Disk Indexes]

6
Indexes

We created two types of indexes on
resources:
Attribute indexes
Subsumption indexes

7
Attribute Indexes

Built during the pre-processing phase for each
property that has a literal value
These indexes are used to handle wildcards
in queries
The Brahms main-memory model was used to load the RDF
graph
The attribute indexes are then created and stored
in a data structure similar to a Patricia Trie
One Patricia Trie is built for each attribute

8
Attribute Indexes - Patricia Trie

Contains the set of values for a given property
Patricia Trie structure
Each tree or subtree stores in its root the
longest common prefix of all its strings, possibly
the empty string
For each string stored in the subtree, a suffix
is computed
Each suffix is stored in a child node, i.e. the
root of a new subtree
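
The structure above can be sketched in Java as a compressed prefix tree that maps each attribute value to the ids of the instances carrying it, with a prefix wildcard lookup (for example "Mat*"). This is only a minimal illustration, not the presenters' implementation: the class name AttributeTrie, its methods, the sample values, and the restriction to trailing wildcards are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical compressed prefix tree for one attribute (not the authors' code).
class AttributeTrie {
    private String label = "";                              // text on the edge into this node
    private final List<AttributeTrie> children = new ArrayList<>();
    private final Set<Long> instanceIds = new TreeSet<>();  // non-empty if a value ends here

    // Store an attribute value together with the id of the instance that has it.
    void insert(String value, long instanceId) {
        if (value.isEmpty()) { instanceIds.add(instanceId); return; }
        for (AttributeTrie child : children) {
            int common = commonPrefix(child.label, value);
            if (common == 0) continue;
            if (common < child.label.length()) child.split(common);
            child.insert(value.substring(common), instanceId);
            return;
        }
        AttributeTrie leaf = new AttributeTrie();
        leaf.label = value;
        leaf.instanceIds.add(instanceId);
        children.add(leaf);
    }

    // Keep the first `at` characters here and push the remaining suffix into a child.
    private void split(int at) {
        AttributeTrie tail = new AttributeTrie();
        tail.label = label.substring(at);
        tail.children.addAll(children);
        tail.instanceIds.addAll(instanceIds);
        children.clear();
        instanceIds.clear();
        children.add(tail);
        label = label.substring(0, at);
    }

    // Ids of all instances whose value starts with `prefix` (the wildcard "prefix*").
    Set<Long> matchPrefix(String prefix) {
        Set<Long> out = new TreeSet<>();
        if (prefix.isEmpty()) { collect(out); return out; }
        for (AttributeTrie child : children) {
            int common = commonPrefix(child.label, prefix);
            if (common == 0) continue;
            if (common == prefix.length()) {
                child.collect(out);                          // prefix consumed on this edge
            } else if (common == child.label.length()) {
                out.addAll(child.matchPrefix(prefix.substring(common)));
            }
            return out;                                      // at most one child can match
        }
        return out;
    }

    private void collect(Set<Long> out) {
        out.addAll(instanceIds);
        for (AttributeTrie child : children) child.collect(out);
    }

    private static int commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    public static void main(String[] args) {
        AttributeTrie trie = new AttributeTrie();
        trie.insert("Mat", 1L);
        trie.insert("Mathew", 2L);
        trie.insert("Ravi", 3L);
        System.out.println(trie.matchPrefix("Mat")); // prints [1, 2]
    }
}
```

In this sketch, inserting "Mathew" after "Mat" leaves "Mat" as the parent and stores the suffix "hew" in a child node, which mirrors the prefix and suffix rule described above.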

9
Attribute Indexes - Patricia Trie

[Figure: Patricia Trie over example name strings (e.g. Amit, Doug, Haibo, John, Mat, Ra, Scott), with shared prefixes in parent nodes and suffixes such as "hew", "mesh", "vi", and "ndra" in child nodes]

The Patricia Trie index is used to find the attribute
values that match a given wildcard; it returns the
set of instances that contain these attribute
values

10
Subsumption Indexes

Uses the Brahms and SemDis APIs
Four types of subsumption indexes
Classes, considering direct instances
This index maps the class URI to its instance
IDs. Instance IDs are unique IDs assigned
to each instance.
Classes, considering derived instances
This index maps the class URI to its derived
instance IDs. Derived instances are obtained by
first finding the subclasses of a given class
and then getting all the instances of those
classes.

11
Subsumption Indexes

Properties, considering direct values
This uses two indexes
One maps a property to its objects
One maps each object to its respective subjects, or
instances
Properties, considering derived values
Gets the sub-properties of a given property
Gets the instances using the above maps
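
As a rough illustration of the class-based indexes, the sketch below keeps one map from class URI to direct instance ids and derives the second view by walking the subclass relation. The class name SubsumptionIndex, its methods, the plain-string class names, and the choice to follow subclasses transitively are assumptions, not details from the slides; the property indexes could follow the same pattern with sub-properties in place of subclasses.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of the two class indexes (direct and derived).
class SubsumptionIndex {
    private final Map<String, Set<String>> subClasses = new HashMap<>();    // class -> direct subclasses
    private final Map<String, Set<Long>> directInstances = new HashMap<>(); // class -> instance ids

    void addSubClass(String parent, String child) {
        subClasses.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
    }

    void addInstance(String classUri, long instanceId) {
        directInstances.computeIfAbsent(classUri, k -> new TreeSet<>()).add(instanceId);
    }

    // Direct index: instances typed with exactly this class.
    Set<Long> direct(String classUri) {
        return directInstances.getOrDefault(classUri, Collections.<Long>emptySet());
    }

    // Derived index: find the subclasses first, then gather their instances.
    // Assumption: subclasses are followed transitively.
    Set<Long> derived(String classUri) {
        Set<Long> out = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(classUri);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (!seen.add(current)) continue;               // guard against cycles
            if (!current.equals(classUri)) out.addAll(direct(current));
            for (String sub : subClasses.getOrDefault(current, Collections.<String>emptySet())) {
                todo.push(sub);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SubsumptionIndex index = new SubsumptionIndex();
        index.addSubClass("Person", "Student");
        index.addSubClass("Student", "GradStudent");
        index.addInstance("Person", 1L);
        index.addInstance("Student", 2L);
        index.addInstance("GradStudent", 3L);
        System.out.println(index.direct("Person"));  // prints [1]
        System.out.println(index.derived("Person")); // prints [2, 3]
    }
}
```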

12
Path Expressions

This index structure is a disk-based map that
holds a collection of trees, each of which holds path
information between two nodes in the graph
The trees store which edges lie along each path,
which allows the index to filter out paths based
on edges that are required to be on the path

13
Path Expressions

Each tree has a CollapsedLeaf component inside,
which holds all the path information,
allowing the tree to be stored without the entire
tree structure
The Berkeley DB store for the trees is set up to be
used like a map and implements the Map interface
in Java
This allows quick handling of the most common
case (i.e. searches that involve some node)

14
Path Expressions

PETree Fields
startNode (Subject)
This is a long that holds the numerical
representation of an RDF node
endNode (Object)
This is a long that holds the numerical
representation of an RDF node
count
Holds the number of paths in the tree

15
Path Expressions

CollapsedLeaf Fields
startNode
This is the subject node of the (s -> p -> o)
this leaf stores
endNode
This is the object node of the (s -> p -> o) this
leaf stores
predicateNumber
The predicate of the (s -> p -> o) this leaf
stores
LRbits
Stores whether a leaf is a left child or a right child
of a node in the tree
Treebits
Stores how the tree was constructed: whether, at a
certain point, it was unioned or joined.
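
The field lists above can be pictured as two plain Java classes plus a map-style lookup. The sketch below is hypothetical: only startNode and endNode are stated to be longs, so the other field types are assumptions, an in-memory HashMap stands in for the Berkeley DB backed map, and PathIndexSketch with its put and find methods is invented for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One (s -> p -> o) edge with its position bits; types other than long are assumed.
final class CollapsedLeaf {
    final long startNode;
    final long endNode;
    final long predicateNumber;
    final BitSet lrBits;    // left/right child markers
    final BitSet treeBits;  // union/join construction markers

    CollapsedLeaf(long startNode, long endNode, long predicateNumber,
                  BitSet lrBits, BitSet treeBits) {
        this.startNode = startNode;
        this.endNode = endNode;
        this.predicateNumber = predicateNumber;
        this.lrBits = lrBits;
        this.treeBits = treeBits;
    }
}

// Path information between two nodes, stored as leaves only.
final class PETree {
    final long startNode;
    final long endNode;
    int count;                                    // number of paths in the tree
    final List<CollapsedLeaf> leaves = new ArrayList<>();

    PETree(long startNode, long endNode, int count) {
        this.startNode = startNode;
        this.endNode = endNode;
        this.count = count;
    }

    // True if some stored edge uses the given predicate.
    boolean usesPredicate(long predicateNumber) {
        for (CollapsedLeaf leaf : leaves) {
            if (leaf.predicateNumber == predicateNumber) return true;
        }
        return false;
    }
}

// In-memory stand-in for the disk-based map, keyed by start node.
class PathIndexSketch {
    private final Map<Long, List<PETree>> byStartNode = new HashMap<>();

    void put(PETree tree) {
        byStartNode.computeIfAbsent(tree.startNode, k -> new ArrayList<>()).add(tree);
    }

    // Trees starting at startNode that contain the required predicate.
    List<PETree> find(long startNode, long requiredPredicate) {
        List<PETree> hits = new ArrayList<>();
        for (PETree tree : byStartNode.getOrDefault(startNode, new ArrayList<PETree>())) {
            if (tree.usesPredicate(requiredPredicate)) hits.add(tree);
        }
        return hits;
    }
}
```

Keying the map by node is one way to serve searches that involve a given node with a single lookup; filtering on a required edge then only touches the leaves of the trees found.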

16
Path Expressions

Paths are built end node first
The node that comes just before the end node is
checked for compatibility with the end node by
ANDing together the BitSets of the two nodes
The LRbits determine whether the two nodes
could have the same parent, based on a position in the
BitSet with a 1 set. The highest-order 1 is
used when making these computations [2]
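
The slide does not spell out the exact compatibility rule, so the following only demonstrates the java.util.BitSet mechanics it mentions: ANDing two bit sets and reading the highest-order 1 of the result. The class LrBitsCheck, its method names, and the reading "compatible means at least one shared 1 bit" are assumptions made for illustration.

```java
import java.util.BitSet;

// Hypothetical helper showing BitSet AND and highest-order bit lookup.
class LrBitsCheck {
    // Index of the highest-order bit set in both inputs, or -1 if none is shared.
    static int highestCommonBit(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();  // copy so the caller's bits are untouched
        both.and(b);
        return both.length() - 1;          // length() is highest set index + 1, or 0 if empty
    }

    // Assumed rule: two nodes are compatible when they share at least one 1 bit.
    static boolean compatible(BitSet a, BitSet b) {
        return highestCommonBit(a, b) >= 0;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(1); a.set(3); a.set(5);
        BitSet b = new BitSet();
        b.set(3); b.set(4);
        System.out.println(highestCommonBit(a, b)); // prints 3
    }
}
```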

17
Query Optimization

Joins can quickly become one of the most
computationally expensive operations
These have been optimized by ordering the joins
so that sets are joined from the smallest set to the
largest remaining set
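
A small sketch of the smallest-first ordering, under the assumption that "joining" candidate sets means intersecting sets of instance ids: sorting by size keeps the running result no larger than the smallest input and lets the loop stop early once it is empty. The class JoinOrder and the method joinAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical smallest-first join (intersection) of candidate id sets.
class JoinOrder {
    static Set<Long> joinAll(List<Set<Long>> sets) {
        if (sets.isEmpty()) return new HashSet<>();
        List<Set<Long>> ordered = new ArrayList<>(sets);
        ordered.sort(Comparator.comparingInt((Set<Long> s) -> s.size()));
        Set<Long> result = new HashSet<>(ordered.get(0));   // start from the smallest set
        for (int i = 1; i < ordered.size() && !result.isEmpty(); i++) {
            result.retainAll(ordered.get(i));               // result can only shrink
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<Long>> sets = new ArrayList<>();
        sets.add(new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L, 5L)));
        sets.add(new HashSet<>(Arrays.asList(3L, 4L)));
        sets.add(new HashSet<>(Arrays.asList(2L, 3L, 9L)));
        System.out.println(joinAll(sets)); // prints [3]
    }
}
```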

18
Query Optimization

Suppose one would like to compute paths from
sources 12, 13, 16 to destinations 15, 21
Since computing the paths for 12 will also compute the
paths from 13 to 15 and from 16 to 21, we don't
want redundant computations

[Figure: example graph with nodes numbered 12 through 22]

20
Query Optimization

Solving paths for multiple sources
A path from one source to a destination may
contain the node(s) of other source(s)
Data structures are used to
Keep track of which source nodes have been
visited in a path
Keep track of which source nodes' paths have been
computed
Keep track of the path expressions
Redundant computation of path expressions is
eliminated
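
One way to avoid the redundant work is to cache the paths found from every node that is expanded, so that a later source lying on an earlier source's path is answered from the cache. The sketch below assumes an acyclic graph and uses made-up edges between the node numbers from the example; the class MultiSourcePaths and its members are hypothetical and are not the presenters' data structures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical memoized path search; assumes the graph has no cycles.
class MultiSourcePaths {
    private final Map<Integer, List<Integer>> edges = new HashMap<>();
    private final Map<Integer, List<List<Integer>>> cache = new HashMap<>(); // node -> paths to destinations
    private final Set<Integer> destinations;
    int expansions = 0;                       // how many nodes were actually expanded

    MultiSourcePaths(Set<Integer> destinations) {
        this.destinations = destinations;
    }

    void addEdge(int from, int to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // All paths from `node` to any destination, computed at most once per node.
    List<List<Integer>> pathsFrom(int node) {
        List<List<Integer>> known = cache.get(node);
        if (known != null) return known;      // already computed for an earlier source
        expansions++;
        List<List<Integer>> result = new ArrayList<>();
        if (destinations.contains(node)) {
            result.add(new ArrayList<>(Collections.singletonList(node)));
        }
        for (int next : edges.getOrDefault(node, Collections.<Integer>emptyList())) {
            for (List<Integer> tail : pathsFrom(next)) {
                List<Integer> path = new ArrayList<>();
                path.add(node);
                path.addAll(tail);
                result.add(path);
            }
        }
        cache.put(node, result);
        return result;
    }

    public static void main(String[] args) {
        MultiSourcePaths search = new MultiSourcePaths(new HashSet<>(Arrays.asList(15, 21)));
        search.addEdge(12, 13); search.addEdge(13, 14); search.addEdge(14, 15); // made-up edges
        search.addEdge(12, 16); search.addEdge(16, 17); search.addEdge(17, 21);
        for (int source : new int[] {12, 13, 16}) search.pathsFrom(source);
        System.out.println(search.expansions); // prints 7
    }
}
```

With these edges, the search from 12 expands seven nodes, and the later requests for sources 13 and 16 are served entirely from the cache.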

21
GUI

An intuitive GUI has been designed to allow users
to input
Resource instances
Classes
Predicate properties
Attributes
Users may enter multiple inputs via the Add button
Check boxes allow users to choose between derived
or direct predicates

22
[Screenshot: GUI with Source and Destination fields (each with an Add button) under "Enter Resource Instance" and "Enter Resource Class Predicate", fields with Add buttons under "Enter Property Predicate" and "Enter Attribute Predicate", and Search and Reset buttons]

23
References

[1] H. Stuckenschmidt, R. Vdovjak, G. Houben,
J. Broekstra, Index Structures and Algorithms
for Querying Distributed RDF Repositories
[2] V. Christophides, D. Plexousakis, M. Scholl,
S. Tourtounis, On Labeling Schemes for
Semantic Web
[3] K. Anyanwu, A. Maduko, A. Sheth,
SemRank: Ranking Complex Relationship Search
Results on the Semantic Web, The 14th
International World Wide Web Conference
(WWW2005), Chiba, Japan, May 10-14, 2005
[4] K. Anyanwu, A. Maduko, A. Sheth,
J. Miller, Top-k Path Query Evaluation in
Semantic Web Databases, The 24th ACM SIGMOD
International Conference on Management of Data
(SIGMOD 2005), Baltimore, Maryland, June 14-16,
2005
[5] K. Anyanwu, A. Sheth, The ρ Operator:
Discovering and Ranking Associations on the
Semantic Web, SIGMOD Record (Special issue on
Amicalola Workshop), 31(4), pp. 42-47, 2002
[6] B. Aleman-Meza, C. Halaschek-Wiener, I. B.
Arpinar, C. Ramakrishnan, A. Sheth, Ranking
Complex Relationships on the Semantic Web, IEEE
Internet Computing, 9(3), pp. 37-44, May/June 2005
[7] B. Aleman-Meza, C. Halaschek, I. B. Arpinar,
A. Sheth, Context-Aware Semantic Association
Ranking, First International Workshop on Semantic
Web and Databases, Berlin, Germany, September
7-8, 2003, pp. 33-50