RDF Triple Stores presentation

1
RDF Triple Stores

Nipun Bhatia
Department of Computer Science, Stanford University

2
Contents

Introduction
Different Architectures
Implications
An Example: Jena SDB
Evaluations
Evaluations using LUBM/DBpedia
Open Research Issues
Which RDF store to choose for a particular application?
Possible system diagram for Phenotype Annotations

3
Introduction

What is an RDF store?
A system that provides a mechanism for persistent storage of, and access to, RDF graphs.
Potential application areas
Plenty! Backend for Protege, BioPortal, Phenotype Annotations.

4
Different Architectures

Based on their implementation, RDF stores can be divided into 3 broad categories: In-Memory, Native, and Non-Native Non-Memory.
In-Memory: the RDF graph is stored as triples in main memory. E.g., storing an RDF graph using the Jena API or the Sesame API.
Native: persistent storage systems with their own database implementation. E.g., Sesame Native, Virtuoso, AllegroGraph, Oracle 11g.
Non-Native Non-Memory: persistent storage systems set up to run on third-party databases. E.g., Jena SDB.
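
To make the in-memory category concrete, here is a minimal sketch in Python. It is not the Jena or Sesame API; the class name `InMemoryTripleStore`, its methods, and the `ex:` terms are hypothetical and only illustrate triples held in main memory and matched against a pattern.

```python
from collections import defaultdict


class InMemoryTripleStore:
    """Toy in-memory store: triples live in a Python set, nothing is persisted."""

    def __init__(self):
        self.triples = set()
        # Secondary lookup by subject so subject-bound patterns avoid a full scan.
        self.by_subject = defaultdict(set)

    def add(self, s, p, o):
        triple = (s, p, o)
        self.triples.add(triple)
        self.by_subject[s].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        candidates = self.by_subject.get(s, set()) if s is not None else self.triples
        return sorted(
            t for t in candidates
            if (p is None or t[1] == p) and (o is None or t[2] == o)
        )


store = InMemoryTripleStore()
store.add("ex:alice", "rdf:type", "ex:Student")
store.add("ex:alice", "ex:memberOf", "ex:dept1")
store.add("ex:bob", "rdf:type", "ex:Professor")

students = store.match(p="rdf:type", o="ex:Student")
about_alice = store.match(s="ex:alice")
```

Everything is lost when the process exits, which is the main trade-off against the two persistent categories.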

5
Implications

Scalability
Different query languages supported to varying degrees.
Sesame: SeRQL. Oracle 11g: its own query language.
Different levels of inferencing.
Sesame: RDFS inference. AllegroGraph: RDFS. Oracle 11g: RDFS, OWL Prime.
Lack of interoperability and portability.
More pronounced in native stores.

6
Jena SDB

SDB is basically a Java loader.
Multiple stores supported: MySQL, PostgreSQL, Oracle, DB2.
Takes incoming triples and breaks them down into components ready for the database.
Multiple layouts.
Integration with the Joseki server.
SPARQL supported.
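
The "break triples into components" step can be sketched as follows. This is an illustration only, not SDB's actual schema or loader: the `nodes` and `triples` tables, the `node_id` hashing scheme, and the use of SQLite in place of MySQL or PostgreSQL are all assumptions made so the example runs with the Python standard library.

```python
import hashlib
import sqlite3


def node_id(term):
    """Derive a stable 63-bit integer id from a term's text (illustrative only)."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> 1


def load_triples(conn, triples):
    """Split each triple into node rows plus one row of ids in the triples table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes (id INTEGER PRIMARY KEY, lex TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples "
        "(s INTEGER, p INTEGER, o INTEGER, PRIMARY KEY (s, p, o))"
    )
    for s, p, o in triples:
        ids = []
        for term in (s, p, o):
            nid = node_id(term)
            conn.execute(
                "INSERT OR IGNORE INTO nodes (id, lex) VALUES (?, ?)", (nid, term)
            )
            ids.append(nid)
        conn.execute("INSERT OR IGNORE INTO triples (s, p, o) VALUES (?, ?, ?)", ids)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_triples(conn, [
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:bob", "rdf:type", "ex:Student"),
    ("ex:alice", "ex:memberOf", "ex:dept1"),
])

# A triple pattern (?s rdf:type ex:Student) becomes a join back to the node table.
rows = conn.execute(
    """
    SELECT subj.lex
    FROM triples t
    JOIN nodes subj ON subj.id = t.s
    JOIN nodes pred ON pred.id = t.p
    JOIN nodes obj ON obj.id = t.o
    WHERE pred.lex = ? AND obj.lex = ?
    ORDER BY subj.lex
    """,
    ("rdf:type", "ex:Student"),
).fetchall()
```

The point of the sketch is the shape of the work: every pattern in a query turns into joins in the third-party database, which is where a non-native store spends its time.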

7
Evaluations

Third-party evaluations for Sesame, Jena SDB, Virtuoso.
Oracle 11g: the company's own evaluations.
Methodology
LUBM: Lehigh University Benchmark
DBpedia
Multiple queries
Load times

8
Evaluations

DBpedia: database of structured information extracted from Wikipedia. Information about places, persons, music albums and films [2].
LUBM: synthetically generated RDF data containing universities, departments, students, etc. [1]
Dataset size
DataSet 1: 15,472,624 triples, 2.1 GB
DataSet 2: LUBM 50, 2.75 million; LUBM 1000, 55.09 million
3 queries

9
Loading Time: DataSet 1

10
Results: Query 1

Simple select query with 2 variables.

11
Query 2

Unconstrained select query: only the predicate was specified.

12
Query 3

Complex query: uses a filter.

13
Oracle 11g: DataSet 2

Ontology (size)         RDFS                     OWL Prime
                        Triples    Time          Triples    Time
LUBM 50 (6.8 M)         2.75 M     12.14 min     3.05 M     8.01 min
LUBM 1000 (133.6 M)     55.09 M    7 h 19 min    65.25 M    7 h 12 min

14
Observations

Native stores perform better than systems using third-party stores.
Optimizations are possible.
Each of the systems uses a different database layout.
Virtuoso: OGPS, POGS, PSOG, SOPG
SDB: SPO, GSPO
Hashing on SDB performs very poorly.
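
Why the layout matters can be seen in a small sketch. The letters name the column order of a sorted index over (G)raph, (S)ubject, (P)redicate, (O)bject, so a lookup is cheap only when the bound terms form a prefix of that order. The code below is an assumption-laden illustration, not how Virtuoso or SDB implement their indexes; `build_index` and `prefix_lookup` are hypothetical helpers.

```python
from bisect import bisect_left

# Position of each component inside a (graph, subject, predicate, object) quad.
POSITIONS = {"G": 0, "S": 1, "P": 2, "O": 3}


def build_index(quads, order):
    """Return quads re-keyed in the given column order (e.g. "POGS") and sorted."""
    cols = [POSITIONS[c] for c in order]
    return sorted(tuple(q[c] for c in cols) for q in quads)


def prefix_lookup(index, prefix):
    """Binary-search the sorted index for every key starting with `prefix`."""
    prefix = tuple(prefix)
    start = bisect_left(index, prefix)
    hits = []
    for key in index[start:]:
        if key[:len(prefix)] != prefix:
            break
        hits.append(key)
    return hits


quads = [
    ("g1", "ex:alice", "rdf:type", "ex:Student"),
    ("g1", "ex:bob", "rdf:type", "ex:Professor"),
    ("g1", "ex:alice", "ex:memberOf", "ex:dept1"),
    ("g2", "ex:carol", "rdf:type", "ex:Student"),
]

# With a POGS index, "predicate and object known" is a single range scan.
pogs = build_index(quads, "POGS")
student_rows = prefix_lookup(pogs, ("rdf:type", "ex:Student"))
```

With only a GSPO index, the same question has no usable prefix unless the graph and subject are known, so the store would have to scan the whole index.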

15
Open Research Issues

Inferencing [4]
Present common implementations
Make a number of small queries to propagate the effects of rule firing.
Each of these queries creates an interaction with the database.
Not very efficient.
Approaches
Snapshot the contents of the database-backed model into RAM for the duration of processing by the inference engine.
Perform inferencing in-stream.
Precompute the inference closure of the ontology, analyze the incoming data streams, and add triples to them based on the inference closure.
Assumes a rigid separation of the RDF data (A-box) and the ontology data (T-box).
Even this may not work for very large ontologies, such as biomedical ontologies.
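
The in-stream approach can be sketched for a single rule, RDFS subclass inheritance. This is a minimal illustration under stated assumptions, not the pipeline from [4]: the function names, the restriction to `rdfs:subClassOf`, and the example classes are all hypothetical.

```python
from collections import defaultdict

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def superclass_closure(tbox):
    """Precompute, for every class, all of its direct and indirect superclasses."""
    parents = defaultdict(set)
    for s, p, o in tbox:
        if p == SUBCLASS:
            parents[s].add(o)
    closure = {}
    for cls in list(parents):
        seen, stack = set(), list(parents[cls])
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(parents.get(parent, ()))
        closure[cls] = seen
    return closure


def infer_in_stream(abox_stream, closure):
    """Yield each incoming triple followed by the type triples it entails."""
    for s, p, o in abox_stream:
        yield (s, p, o)
        if p == TYPE:
            for parent in sorted(closure.get(o, ())):
                yield (s, TYPE, parent)


tbox = [
    ("ex:GradStudent", SUBCLASS, "ex:Student"),
    ("ex:Student", SUBCLASS, "ex:Person"),
]
abox = [
    ("ex:alice", TYPE, "ex:GradStudent"),
    ("ex:alice", "ex:memberOf", "ex:dept1"),
]

closure = superclass_closure(tbox)
enriched = list(infer_in_stream(iter(abox), closure))
```

The closure is computed once from the T-box, and each A-box triple is then handled with a dictionary lookup and no database round trip. That is also why the approach depends on the T-box staying fixed and small enough to hold in memory.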

16
Open Research Issues

Query Optimization
Third-party stores undo any optimization done at the API level.
The better performance of native stores points in that direction.
Some work exists on optimizing SPARQL queries for in-memory stores.

17
Which RDF store to choose for an app?

Frequency of loads that the application would
perform.
Single scaling factor and linear load times.
Level of inferencing.
Which query languages are supported; W3C recommendations.
Special system needs. E.g., AllegroGraph needs a 64-bit processor.

18
Phenotype Annotations

[Figure: possible system diagram. Recoverable labels: Phenotype Annotations; Jena API; Inferencing; Jena Model; SDB; MySQL / Virtuoso; set of ontologies required for Phenotype Annotations, e.g. PATO, Fly, etc.]

19
References

[1] http://esw.w3.org/topic/RdfStoreBenchmarking
[2] http://www4.wiwiss.fu-berlin.de/benchmarks-200801/
[3] Kurt Rohloff et al. An Evaluation of Triple-Store Technologies for Large Data Stores. Comparing Sesame, Jena and AllegroGraph. 2007.
[4] N. Bhatia, A. Seaborne. Ingestion pipeline for RDF.
