Comparing path-based and vertically-partitioned RDF databases

1
Comparing path-based and vertically-partitioned
RDF databases

Preetha Lakshmi & Chris Mueller
12/10/2007
CSCI 8715
Shashi Shekhar

2
Outline

Motivation
Background and related work
Problem statement
Our contributions
Assumptions
Experimental process
Results
Conclusions

3
Motivation

Semantic Web
libraries
scientific databases
industry
social networks
Computer-to-computer communication

4
RDF Schema
Schema
Instance
5
RDF Schema
RDF Triples: <subject, property, object>
e.g. <www.picasso.net, first, Pablo>
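
As a rough illustration of the triple form above, here is a minimal Python sketch. The `Triple` type and the `objects_of` helper are names invented for this example, and the second triple is a hypothetical addition showing that an object can itself be a resource.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: <subject, property, object>."""
    subject: str
    prop: str
    obj: str


# The triple quoted on the slide.
triples = [Triple("www.picasso.net", "first", "Pablo")]

# Hypothetical extra triple (not from the slides): the object is a resource id.
triples.append(Triple("www.picasso.net", "paints", "r2"))


def objects_of(subject, prop, data):
    """Return every object stored for the given subject and property."""
    return [t.obj for t in data if t.subject == subject and t.prop == prop]


first_names = objects_of("www.picasso.net", "first", triples)
```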
6
Related Work

Triple store
Property tables
Class property tables
Dynamic table model
Vertically partitioned tables (Abadi et al., 2007)
Path-based approach (Matono et al., 2005)

Require more self joins, normal joins, NULL value
storage
7
Vertical Partitioning

A table is created for each property

First
  Subject  Object
  'r1'     'Picasso'
  'r4'     'August'

Last
  Subject  Object
  'r1'     'Picasso'
  'r4'     'Rodin'

Paints
  Subject  Object
  'r1'     'r2'
  'r1'     'r3'

... etc.
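
The idea of one table per property can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the experiments in this talk used Oracle 10g, the function name `vertically_partition` is made up here, and the rows are copied as they appear in the First, Last and Paints tables above.

```python
import sqlite3
from collections import defaultdict

# Rows as shown in the slide's First, Last and Paints tables.
triples = [
    ("r1", "first", "Picasso"),
    ("r4", "first", "August"),
    ("r1", "last", "Picasso"),
    ("r4", "last", "Rodin"),
    ("r1", "paints", "r2"),
    ("r1", "paints", "r3"),
]


def vertically_partition(conn, triples):
    """Create one (subject, object) table per property and load the triples."""
    by_property = defaultdict(list)
    for subject, prop, obj in triples:
        by_property[prop].append((subject, obj))
    for prop, rows in by_property.items():
        # Property names become table names, so quote them as identifiers.
        table = '"' + prop.replace('"', '""') + '"'
        conn.execute(f"CREATE TABLE {table} (subject TEXT, object TEXT)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return sorted(by_property)


conn = sqlite3.connect(":memory:")
tables = vertically_partition(conn, triples)
paints_rows = conn.execute(
    'SELECT subject, object FROM "paints" ORDER BY object'
).fetchall()
```

Each property table holds only two columns, so no NULLs are stored for subjects that lack a property.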
8
Path-based Model

Path signatures relate to instance data

Path
  pathid  pathexp
  1       ''
  2       'first'
  3       'last'
  4       'paints'
  5       'title<paints'
  6       'sculpts'
  7       'title<sculpts'

Resource
  name       pathid  root
  'r1'       1       'r1'
  'r2'       4       'r1'
  'r3'       4       'r1'
  'r4'       1       'r4'
  'Picasso'  2       'r1'
  'Pablo'    3       'r1'
  'August'   2       'r4'
  'Rodin'    3       'r4'
  ...
Our enhancement
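
One way to picture how the Path and Resource rows relate to the triples is the sketch below. It is an assumption-laden illustration, not the authors' loader: `build_path_tables` is a hypothetical helper, the title 'Guernica' for r2 is invented sample data, and path ids are assigned in traversal order, so they will not match the numbering on the slide.

```python
def build_path_tables(triples, roots):
    """Return (path, resource) for the given key nodes (roots).

    path maps each path expression to a pathid; resource is a list of
    (name, pathid, root) rows, mirroring the two tables on the slide.
    """
    edges = {}
    for subject, prop, obj in triples:
        edges.setdefault(subject, []).append((prop, obj))

    path = {"": 1}  # the empty expression denotes the root itself
    resource = []
    for root in roots:
        stack = [(root, "", frozenset([root]))]
        while stack:
            node, pathexp, seen = stack.pop()
            pathid = path.setdefault(pathexp, len(path) + 1)
            resource.append((node, pathid, root))
            for prop, obj in edges.get(node, []):
                if obj in seen:
                    continue  # do not walk around a cycle forever
                child = prop if pathexp == "" else f"{prop}<{pathexp}"
                stack.append((obj, child, seen | {obj}))
    return path, resource


triples = [
    ("r1", "first", "Picasso"),
    ("r1", "last", "Pablo"),
    ("r1", "paints", "r2"),
    ("r1", "paints", "r3"),
    ("r2", "title", "Guernica"),  # hypothetical: titles are not listed on the slide
]
path, resource = build_path_tables(triples, roots=["r1"])
```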
9
Problem Statement

Given:
A set of RDF triples
Vertical partitioning storage model
Path-based storage model
Find: Query plans for the various categories of
queries under these two storage schemes.
Objective: To determine which query types perform
comparatively better or worse in the two storage
models
Why is this challenging?
Need for efficient storage of structured data
Different application domains use RDF, so generic
storage schemes should support a diverse
workload.

10
Contributions

Identification of benchmark queries
schema, instance, path, and aggregate queries
Enhancement to the path-based schema that
addresses different types of workloads
Comparison of path-based model and vertical
partitioning
Analysis of cyclic queries

11
Query Types
Non-path
Path
Schema vs Instance
Aggregate
List
Cycle
Connection
Diameter
Constraints
Relationship
intermediate node
terminal node

Schema queries
find all types of artists
list all property names
list nodes with 2 or more descendants.
find the transitive sub-classes of a class
'sculpture'
list properties with 2 or more descendants
Instance queries
find the titles of all paintings by Picasso
select all nodes within one edge-length of r4
list all the properties of node r4
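
For a sense of how an instance query looks under vertical partitioning, the sketch below answers "find the titles of all paintings by Picasso" with `sqlite3`. It is a sketch under assumptions: the talk's experiments ran on Oracle 10g, the `title` table and both titles are hypothetical sample data, and 'Picasso' is stored in the Last table as on the Vertical Partitioning slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "last"   (subject TEXT, object TEXT);
CREATE TABLE "paints" (subject TEXT, object TEXT);
CREATE TABLE "title"  (subject TEXT, object TEXT);
INSERT INTO "last"   VALUES ('r1', 'Picasso'), ('r4', 'Rodin');
INSERT INTO "paints" VALUES ('r1', 'r2'), ('r1', 'r3');
-- Hypothetical titles, not taken from the slides.
INSERT INTO "title"  VALUES ('r2', 'Guernica'), ('r3', 'The Old Guitarist');
""")

# Three property tables, two joins: last -> paints -> title.
titles = [
    row[0]
    for row in conn.execute("""
        SELECT t.object
        FROM "last" AS l
        JOIN "paints" AS p ON p.subject = l.subject
        JOIN "title"  AS t ON t.subject = p.object
        WHERE l.object = 'Picasso'
        ORDER BY t.object
    """)
]
```

Each extra edge in the query adds one more property table and one more join.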

12
Query Types

Path queries
find the title of any painting painted by anyone
display all the titles of work done by artists
find the names of all the sculptors
...with constraint on intermediate node
find an artist's name where the artifact is a
painting
...with terminal node constraints
display all the titles of work done by Picasso
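
In the path-based layout, a path query can be answered by matching the path expression instead of chaining joins. The `sqlite3` sketch below loads the Path and Resource rows from the Path-based Model slide; the two title rows are hypothetical, and the `LIKE` pattern is just one possible way to match "title reached through any property".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE path     (pathid INTEGER PRIMARY KEY, pathexp TEXT);
CREATE TABLE resource (name TEXT, pathid INTEGER, root TEXT);
INSERT INTO path VALUES
  (1, ''), (2, 'first'), (3, 'last'), (4, 'paints'),
  (5, 'title<paints'), (6, 'sculpts'), (7, 'title<sculpts');
INSERT INTO resource VALUES
  ('r1', 1, 'r1'), ('r2', 4, 'r1'), ('r3', 4, 'r1'), ('r4', 1, 'r4'),
  ('Picasso', 2, 'r1'), ('Pablo', 3, 'r1'),
  ('August', 2, 'r4'), ('Rodin', 3, 'r4'),
  -- Hypothetical title rows, not taken from the slides.
  ('Guernica', 5, 'r1'), ('The Thinker', 7, 'r4');
""")


def names_matching(pattern):
    """Names of resources whose path expression matches a LIKE pattern."""
    rows = conn.execute(
        """
        SELECT r.name
        FROM resource AS r
        JOIN path AS p ON p.pathid = r.pathid
        WHERE p.pathexp LIKE ?
        ORDER BY r.name
        """,
        (pattern,),
    )
    return [row[0] for row in rows]


# "find the title of any painting painted by anyone"
painting_titles = names_matching("title<paints")
# "display all the titles of work done by artists"
all_titles = names_matching("title<%")
```

Both queries need a single join between Resource and Path, regardless of how long the path is.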

13
Query Types

Path queries
connection queries
list all the properties of node r4
is there a connection between 'Picasso' and
'Guernica'?
diameter queries
select all nodes in the graph within one
edge-length of r4
non-simple path queries
detect loops in the dataset starting at 'Picasso'
detect loops in the whole dataset
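
The connection and loop queries above can be pictured as plain graph traversals. The sketch below works on an in-memory adjacency map rather than on either storage scheme, so it says nothing about query plans. All function names are hypothetical, the connection test ignores edge direction (an assumption), and the `painted_by` edge is invented only to create a loop.

```python
def build_graph(triples, directed=True):
    """Adjacency map from triples; property names are ignored."""
    graph = {}
    for subject, _prop, obj in triples:
        graph.setdefault(subject, set()).add(obj)
        graph.setdefault(obj, set())
        if not directed:
            graph[obj].add(subject)
    return graph


def is_connected(graph, start, goal):
    """Connection query: is `goal` reachable from `start`?"""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def has_loop_from(graph, start):
    """Non-simple path query: does some path leave `start` and return to it?"""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


def has_any_loop(graph):
    """Loop detection over the whole dataset (use a directed graph)."""
    return any(has_loop_from(graph, node) for node in graph)


triples = [
    ("r1", "last", "Picasso"),
    ("r1", "paints", "r2"),
    ("r2", "title", "Guernica"),  # hypothetical title for r2
]
connected = is_connected(build_graph(triples, directed=False), "Picasso", "Guernica")
acyclic = not has_any_loop(build_graph(triples))

# Hypothetical back edge, added only to demonstrate loop detection.
looped = build_graph(triples + [("r2", "painted_by", "r1")])
loop_at_r1 = has_loop_from(looped, "r1")
```

Checking the whole dataset repeats the single-start search from every node, which is simple but not the cheapest approach for large graphs.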

14
Query Types

Aggregate queries
find all nodes with 2 or more properties
list all subjects that have two instances of a
single property
Relationship queries
find any relationship between r1 and r4
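
The two aggregate queries can be sketched over vertically partitioned tables with `sqlite3`. This is an illustration only: the rows come from the First, Last and Paints tables shown earlier, "two instances" is read here as "at least two" (an assumption), and the list of property tables is hard-coded rather than read from a catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "first"  (subject TEXT, object TEXT);
CREATE TABLE "last"   (subject TEXT, object TEXT);
CREATE TABLE "paints" (subject TEXT, object TEXT);
INSERT INTO "first"  VALUES ('r1', 'Picasso'), ('r4', 'August');
INSERT INTO "last"   VALUES ('r1', 'Picasso'), ('r4', 'Rodin');
INSERT INTO "paints" VALUES ('r1', 'r2'), ('r1', 'r3');
""")

properties = ["first", "last", "paints"]

# "find all nodes with 2 or more properties": every property table must be
# visited, so the tables are glued together with UNION ALL before grouping.
union = " UNION ALL ".join(
    f"SELECT subject, '{p}' AS prop FROM \"{p}\"" for p in properties
)
multi_property = [
    row[0]
    for row in conn.execute(
        f"SELECT subject FROM ({union}) "
        "GROUP BY subject HAVING COUNT(DISTINCT prop) >= 2 ORDER BY subject"
    )
]

# "list all subjects that have two instances of a single property",
# shown here for the paints property only.
repeated_paints = [
    row[0]
    for row in conn.execute(
        'SELECT subject FROM "paints" '
        "GROUP BY subject HAVING COUNT(*) >= 2 ORDER BY subject"
    )
]
```

The first query touches one table per property, which is why the number of tables is tracked as a validation parameter alongside the number of joins.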

15
Assumptions

Using a small dataset, with the assumption that
number of joins and efficiency of the queries
will not change significantly with larger
datasets
No explicit storage of the RDF schema in the
vertically-partitioned scheme (application
independent)
INSERT, UPDATE, DELETE are insignificant
compared to SELECT
Key nodes in the path-based model are
well-defined
In practice, key nodes would be generated
dynamically after user load analysis

16
Experimental Process

Set up both schemes in Oracle 10g for the RDF
graph shown earlier
Materialized path lengths in path-based scheme
Generated query plans
Analyzed queries based on the validation
parameters
Cycle queries joins are not supported

Validation parameters
Nodes
Edges
Number of joins
Number of tables
CPU cost
Storage bytes

17
Dataset used for experiment
18
Experimental Results

For CPU cost and bytes (storage), the entry in
the table indicates which scheme used fewer CPU
cycles or occupied less space. Where both
required an identical or similar amount of
computation or storage, the entry is "same".
Queries that cannot be answered are indicated by
"--".

19
Conclusions & Observations

Vertical partitioning performs well for:
Short path lengths and terminal node constraints.
Offers storage benefits for instance queries
without path expressions.
Enhanced path-based model performs well for:
Schema queries, path queries, cycle queries
Queries that the original path-based model could
not address but the enhanced model can answer:
Connection queries and diameter queries
Path queries with intermediate node constraints
20
Conclusions (Cont'd)

Both schemes show the same performance on
instance queries without path expressions.
Neither scheme addresses relationship
queries.
Interesting results for cycle queries:
specifying the start node gives worse performance
than leaving the start node unspecified
specifying the start node uses an Oracle filter.

21
Future Work

Test large and diverse datasets
Test vertical partitioning with a
column-orientated database like MonetDB
Pruning strategies for cycle queries
Impose join indexes
Find approaches to answer relationship queries
Storage classification based on the application
domain

22
Thank You

Questions?

Please see http://www.cs.umn.edu/cmueller/cs8715
for a copy of the report that accompanies this
presentation, including a full bibliography
