1
Indexes for Supporting Path Queries
  • Presented by
  • Doug Brewer, Scott Patterson, Ravi Pavagada

2
Outline
  • Problem Description
  • Goal
  • Indexes (Attribute, Subsumption, Path)
  • Query Optimization
  • GUI

3
Problem Description
  • Current RDF Stores use traditional relational
    databases to store RDF data.
  • Only some can do inferencing and subsumption.
  • They all have to do expensive joins to compute
    path queries with predicates and subsumption.

4
Our goal
  • To create an RDF store that accomplishes the following
    more efficiently than current stores:
  • Compute queries on Classes or Properties, both
    direct and derived
  • Compute attribute predicate queries
  • Compute path queries with predicates
  • Compute queries with multiple sources and/or
    destinations

5
System Architecture
  • We use Berkeley DB as our backing store

[Figure: architecture diagram showing the Index Manager, Main Memory, and Disk Indexes]

6
Indexes
  • Created two different types of indexes on
    resources
  • Attribute indexes
  • Subsumption indexes

7
Attribute Indexes
  • Built during pre-processing phase for each
    property that has a literal value
  • These indexes are used for handling wildcards
    in the queries
  • The Brahms main-memory model is used to load the
    RDF graph
  • The attribute indexes are then created and stored
    in a data structure similar to a Patricia Trie
  • A Patricia Trie is built for each attribute

8
Attribute Indexes - Patricia Trie
  • Contains a set of values for a given property
  • Patricia Trie structure
  • Each tree or subtree stores in its root the
    longest common prefix of all its strings, possibly
    the empty string
  • For each string stored in the subtree, a suffix
    is computed
  • Each suffix is stored in a child node, i.e. the
    root of a new subtree

9
Attribute Indexes - Patricia Trie
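  • Patricia Trie

[Figure: example Patricia Trie built over person names (Amit, Doug, Haibo, John, Scott, Sheth, Miller, Perry and others); shared prefixes such as "Mat" and "Ra" branch into suffix nodes such as "hew", "j", "mesh" and "vi"]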
  • The Patricia Trie index is used to find the
    attribute values that match a given wildcard; it
    returns the set of instances that contain those
    attribute values
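
The wildcard lookup described above can be sketched as a small compressed trie. This is a minimal illustration, not the presenters' implementation: the class name AttributeTrie, its methods, and the choice to support only trailing wildcards ("Ra*") are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical compressed (Patricia-style) trie from attribute values to instance ids. */
class AttributeTrie {
    private static final class Node {
        String label;                                   // edge label leading into this node
        final Map<Character, Node> children = new HashMap<>();
        final Set<Long> instanceIds = new TreeSet<>();  // filled only where a value ends
        Node(String label) { this.label = label; }
    }

    private final Node root = new Node("");

    /** Records that the instance has the given attribute value. */
    void insert(String value, long instanceId) {
        Node node = root;
        String rest = value;
        while (!rest.isEmpty()) {
            Node child = node.children.get(rest.charAt(0));
            if (child == null) {
                // No edge shares a first character: store the whole suffix in a new child.
                child = new Node(rest);
                node.children.put(rest.charAt(0), child);
                node = child;
                break;
            }
            int common = commonPrefixLength(rest, child.label);
            if (common < child.label.length()) {
                // Split the edge: keep the shared prefix, push the remainder down.
                Node tail = new Node(child.label.substring(common));
                tail.children.putAll(child.children);
                tail.instanceIds.addAll(child.instanceIds);
                child.label = child.label.substring(0, common);
                child.children.clear();
                child.instanceIds.clear();
                child.children.put(tail.label.charAt(0), tail);
            }
            node = child;
            rest = rest.substring(common);
        }
        node.instanceIds.add(instanceId);
    }

    /** Returns the ids of all instances whose value starts with the prefix. */
    Set<Long> matchPrefix(String prefix) {
        Node node = root;
        String rest = prefix;
        while (!rest.isEmpty()) {
            Node child = node.children.get(rest.charAt(0));
            if (child == null) return new TreeSet<>();
            int common = commonPrefixLength(rest, child.label);
            if (common < rest.length() && common < child.label.length()) {
                return new TreeSet<>();   // prefix diverges inside the edge label
            }
            node = child;
            rest = rest.substring(common);
        }
        Set<Long> result = new TreeSet<>();
        collect(node, result);
        return result;
    }

    private static void collect(Node node, Set<Long> out) {
        out.addAll(node.instanceIds);
        for (Node child : node.children.values()) collect(child, out);
    }

    private static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }
}
```

Inserting "Raj", "Ramesh" and "Ravi" leaves a single "Ra" node with three suffix children, so a query for "Ra*" only has to walk one edge before collecting the matching instance ids.
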
10
Subsumption Indexes
  • Uses the Brahms and SemDis APIs
  • Four types of subsumption indexes
  • Classes, considering direct instances
  • This index maps the class URI to its instance
    ids. Instance ids are unique ids assigned to
    each instance.
  • Classes, considering derived instances
  • This index maps the class URI to its derived
    instance ids. Derived instances are obtained by
    first finding the subclasses of a given class
    and then getting all the instances of those
    classes.
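
A minimal in-memory sketch of the two class indexes follows. The class ClassSubsumptionIndex and its method names are hypothetical, plain Java maps stand in for the Brahms and SemDis APIs, and following subclasses transitively is an assumption (the slide only says "sub classes").

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for the class subsumption indexes. */
class ClassSubsumptionIndex {
    private final Map<String, Set<Long>> directInstances = new HashMap<>();
    private final Map<String, Set<String>> subClasses = new HashMap<>();

    void addInstance(String classUri, long instanceId) {
        directInstances.computeIfAbsent(classUri, k -> new TreeSet<>()).add(instanceId);
    }

    void addSubClass(String superClassUri, String subClassUri) {
        subClasses.computeIfAbsent(superClassUri, k -> new TreeSet<>()).add(subClassUri);
    }

    /** Direct index: class URI to the ids of its own instances. */
    Set<Long> direct(String classUri) {
        return directInstances.getOrDefault(classUri, Collections.emptySet());
    }

    /** Derived index: instances of all subclasses (assumed transitive) of the class. */
    Set<Long> derived(String classUri) {
        Set<Long> result = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(classUri);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (!seen.add(current)) continue;        // guard against cyclic hierarchies
            if (!current.equals(classUri)) result.addAll(direct(current));
            for (String sub : subClasses.getOrDefault(current, Collections.emptySet())) {
                todo.push(sub);
            }
        }
        return result;
    }
}
```

In a real store the derived sets would presumably be precomputed and persisted rather than traversed on every query; the traversal here only shows how one set is obtained from the other.
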

11
Subsumption Indexes
  • Properties, considering direct values
  • This index uses two maps
  • One maps a property to its objects
  • The other maps objects to their respective
    subjects or instances
  • Properties, considering derived values
  • Gets the sub-properties of a given property
  • Gets the instances using the above maps
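
The two-map layout for properties might look like the sketch below. All names are hypothetical. Keying the second map by a (property, object) pair is an assumption made so that subjects linked to the same object through a different property are not returned; the slides do not say how the original handles this.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for the property subsumption indexes. */
class PropertySubsumptionIndex {
    private final Map<String, Set<String>> objectsByProperty = new HashMap<>();
    private final Map<List<String>, Set<Long>> subjectsByObject = new HashMap<>();
    private final Map<String, Set<String>> subProperties = new HashMap<>();

    void addTriple(long subjectId, String property, String object) {
        objectsByProperty.computeIfAbsent(property, k -> new TreeSet<>()).add(object);
        subjectsByObject
            .computeIfAbsent(Arrays.asList(property, object), k -> new TreeSet<>())
            .add(subjectId);
    }

    void addSubProperty(String superProperty, String subProperty) {
        subProperties.computeIfAbsent(superProperty, k -> new TreeSet<>()).add(subProperty);
    }

    /** Direct values: subjects that use exactly this property. */
    Set<Long> direct(String property) {
        Set<Long> result = new TreeSet<>();
        for (String object : objectsByProperty.getOrDefault(property, Collections.emptySet())) {
            result.addAll(subjectsByObject.getOrDefault(
                Arrays.asList(property, object), Collections.emptySet()));
        }
        return result;
    }

    /** Derived values: subjects that use any sub-property (assumed transitive). */
    Set<Long> derived(String property) {
        Set<Long> result = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(property);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (!seen.add(current)) continue;
            if (!current.equals(property)) result.addAll(direct(current));
            for (String sub : subProperties.getOrDefault(current, Collections.emptySet())) {
                todo.push(sub);
            }
        }
        return result;
    }
}
```
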

12
Path Expressions
  • This index structure is a disk-based map that
    holds a collection of trees, each of which holds
    path information between two nodes in the graph
  • The trees store which edges lie along each path,
    which allows the index to filter out paths based
    on edges that are required to be on the path

13
Path Expressions
  • The trees have a CollapsedLeaf component inside,
    which holds all the path information
  • This allows the tree to be stored without the
    entire tree structure
  • Berkeley DB, which stores the trees, is set up to
    be used like a map and implements the Map
    interface in Java
  • This allows quick lookup in the most common
    case (i.e. searches that involve some node)

14
Path Expressions
  • PETree Fields
  • startNode (Subject)
  • This is a long that holds the numerical
    representation of an RDF node
  • endNode (Object)
  • This is a long that holds the numerical
    representation of an RDF node
  • count
  • Holds the number of paths in the tree

15
Path Expressions
  • CollapsedLeaf Fields
  • startNode
  • This is the subject node of the (s -> p -> o)
    this leaf stores
  • endNode
  • This is the object node of the (s -> p -> o) this
    leaf stores
  • predicateNumber
  • The predicate of the (s -> p -> o) this leaf
    stores
  • LRbits
  • Stores whether a leaf is a left child or a right
    child of a node in the tree
  • Treebits
  • Stores how the tree was constructed, i.e. whether
    at a certain point it was unioned or joined
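
The field lists for PETree and CollapsedLeaf can be written down as plain Java classes, as in the sketch below. The field names come from the slides; the int types for predicateNumber and count, the use of java.util.BitSet for LRbits and Treebits, the leaves list, and the containsPredicate helper are all assumptions for illustration. The helper is deliberately simplified: it checks whether any stored edge uses a predicate, not whether a particular path does.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** One (s -> p -> o) edge stored in a path tree; field names follow the slides. */
final class CollapsedLeaf {
    final long startNode;       // subject node of the edge
    final long endNode;         // object node of the edge
    final int predicateNumber;  // numeric predicate id (int is an assumption)
    final BitSet lrBits;        // left-child / right-child flags
    final BitSet treeBits;      // union / join construction flags

    CollapsedLeaf(long startNode, long endNode, int predicateNumber,
                  BitSet lrBits, BitSet treeBits) {
        this.startNode = startNode;
        this.endNode = endNode;
        this.predicateNumber = predicateNumber;
        this.lrBits = lrBits;
        this.treeBits = treeBits;
    }
}

/** Path information between two nodes, stored as a flat list of collapsed leaves. */
final class PETree {
    final long startNode;   // numerical representation of the source RDF node
    final long endNode;     // numerical representation of the destination RDF node
    int count;              // number of paths held in the tree
    final List<CollapsedLeaf> leaves = new ArrayList<>();   // hypothetical container

    PETree(long startNode, long endNode) {
        this.startNode = startNode;
        this.endNode = endNode;
    }

    /** Simplified edge filter: true if some stored edge uses the predicate. */
    boolean containsPredicate(int predicateNumber) {
        for (CollapsedLeaf leaf : leaves) {
            if (leaf.predicateNumber == predicateNumber) return true;
        }
        return false;
    }
}
```
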

16
Path Expressions
  • Paths are built end node first
  • The node that comes just before the end node is
    checked for compatibility with it by ANDing
    together the BitSets of the two nodes
  • The LRbits determine whether the two nodes could
    have the same parent, indicated by a position in
    the BitSet with a 1 set. The highest-order 1 is
    used when making these computations [2].

17
Query Optimization
  • Joins can quickly become one of the more
    computationally expensive operations
  • These have been optimized by joining the sets in
    order of size, from the smallest set to the
    largest remaining set
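
The smallest-first ordering can be sketched as follows, modelling a join as a set intersection over instance ids. That modelling, and the names JoinOrdering and joinAll, are assumptions for the example; the slides do not specify the join operator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that joins (intersects) id sets from smallest to largest. */
class JoinOrdering {
    static Set<Long> joinAll(List<Set<Long>> sets) {
        if (sets.isEmpty()) return new TreeSet<>();
        // Sort a copy so the caller's list is left untouched.
        List<Set<Long>> ordered = new ArrayList<>(sets);
        ordered.sort((a, b) -> Integer.compare(a.size(), b.size()));
        // Start from the smallest set so every intermediate result stays small.
        Set<Long> result = new TreeSet<>(ordered.get(0));
        for (int i = 1; i < ordered.size() && !result.isEmpty(); i++) {
            result.retainAll(ordered.get(i));
        }
        return result;
    }
}
```

Because the running result can never grow beyond the smallest input, later joins against large sets only have to test a handful of candidates, and the loop stops early once the result is empty.
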

18
Query Optimization
  • Suppose one would like to compute paths from
    sources 12, 13, 16 to destinations 15, 21
  • Since computing paths for 12 will also compute
    the paths for 13 to 15 and 16 to 21, we don't
    want redundant computations

[Figure: example graph over nodes 12 through 22, with sources 12, 13, 16 and destinations 15, 21]

20
Query Optimization
  • Solving Paths for multiple sources
  • A Path for one source to a destination may
    contain the node(s) of (an)other source(s)
  • Data structures are used to
  • Keep track of which source nodes have been
    visited in a Path
  • Keep track of which source nodes' paths have
    been computed
  • Keep track of the Path expressions
  • Redundancy of computing Path expressions is
    eliminated
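
One way to avoid the redundant work is to cache the paths found from every node visited, so that a source lying on another source's path is never expanded twice. The sketch below assumes an acyclic graph and uses hypothetical names (MultiSourcePaths, pathsFrom, expansions); the presenters' actual data structures are not described in the slides.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical memoized path search over an acyclic graph of node ids. */
class MultiSourcePaths {
    private final Map<Long, List<Long>> edges = new HashMap<>();
    private final Set<Long> destinations;
    private final Map<Long, List<List<Long>>> cache = new HashMap<>();
    int expansions = 0;   // how many nodes were actually expanded

    MultiSourcePaths(Set<Long> destinations) {
        this.destinations = destinations;
    }

    void addEdge(long from, long to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** True if the paths from this node have already been computed. */
    boolean isComputed(long node) {
        return cache.containsKey(node);
    }

    /** All paths from the node to any destination; assumes no cycles. */
    List<List<Long>> pathsFrom(long node) {
        List<List<Long>> known = cache.get(node);
        if (known != null) return known;    // reuse instead of recomputing
        expansions++;
        List<List<Long>> result = new ArrayList<>();
        if (destinations.contains(node)) {
            List<Long> self = new ArrayList<>();
            self.add(node);
            result.add(self);
        }
        for (long next : edges.getOrDefault(node, Collections.emptyList())) {
            for (List<Long> tail : pathsFrom(next)) {
                List<Long> path = new ArrayList<>();
                path.add(node);
                path.addAll(tail);
                result.add(path);
            }
        }
        cache.put(node, result);
        return result;
    }
}
```

With hypothetical edges 12 to 13, 13 to 14, 14 to 15, 12 to 16, 16 to 17 and 17 to 21, a call to pathsFrom(12) also fills the cache for sources 13 and 16, so the later calls for those sources return without expanding any node.
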

21
GUI
  • An intuitive GUI has been designed to allow users
    to input
  • Resource instances
  • Classes
  • Predicate Properties
  • Attributes
  • Users may enter multiple inputs via the Add button
  • Check boxes allow users to choose between derived
    or direct predicates

22
[Figure: GUI screenshot with Source and Destination fields, each with an Add button, for resource instances and for resource class predicates; fields with Add buttons for property predicates and attribute predicates; Search and Reset buttons]

23
References
  • [1] H. Stuckenschmidt, R. Vdovjak, G. Houben,
    J. Broekstra, Index Structures and Algorithms
    for Querying Distributed RDF Repositories
  • [2] V. Christophides, D. Plexousakis, M. Scholl,
    S. Tourtounis, On Labeling Schemes for
    Semantic Web
  • [3] Kemafor Anyanwu, Angela Maduko, Amit Sheth,
    SemRank: Ranking Complex Relationship Search
    Results on the Semantic Web, The 14th
    International World Wide Web Conference
    (WWW2005), Chiba, Japan, May 10-14, 2005
  • [4] Kemafor Anyanwu, Angela Maduko, Amit Sheth,
    John Miller, Top-k Path Query Evaluation in
    Semantic Web Databases, The 24th ACM SIGMOD
    International Conference on Management of Data
    (SIGMOD 2005), Baltimore, Maryland, June 14-16,
    2005
  • [5] Kemafor Anyanwu, Amit Sheth, The ρ Operator:
    Discovering and Ranking Associations on the
    Semantic Web, SIGMOD Record (Special issue on
    Amicalola Workshop), 31(4), pp. 42-47, 2002
  • [6] B. Aleman-Meza, C. Halaschek-Wiener, I. B.
    Arpinar, C. Ramakrishnan, and A. Sheth, Ranking
    Complex Relationships on the Semantic Web, IEEE
    Internet Computing, 9(3), pp. 37-44, May/June 2005
  • [7] B. Aleman-Meza, C. Halaschek, I. B. Arpinar,
    and A. Sheth, Context-Aware Semantic Association
    Ranking, First International Workshop on Semantic
    Web and Databases, Berlin, Germany, September
    7-8, 2003, pp. 33-50