RDF Triple Stores - PowerPoint PPT Presentation


Title: RDF Triple Stores


1
RDF Triple Stores
  • Nipun Bhatia
  • Department of Computer Science, Stanford
    University

2
Contents
  • Introduction
  • Different Architectures
  • Implications
  • An Example: Jena SDB
  • Evaluations
  • Evaluations using LUBM/DBpedia
  • Open Research Issues
  • Which RDF store to choose for a particular
    application?
  • Possible system diagram for Phenotype
    Annotations.

3
Introduction
  • What is an RDF store?
  • A system to provide a mechanism for persistent
    storage and access of RDF graphs.
  • Potential application areas
  • Plenty! Backend for Protege, BioPortal,
    Phenotype Annotations.

4
Different Architectures
  • Based on their implementation, RDF stores fall
    into 3 broad categories: in-memory, native, and
    non-native non-memory.
  • In-memory: the RDF graph is stored as triples in
    main memory, e.g. storing an RDF graph using the
    Jena API or the Sesame API.
  • Native: persistent storage systems with their
    own database implementation, e.g. Sesame
    Native, Virtuoso, AllegroGraph, Oracle 11g.
  • Non-native non-memory: persistent storage
    systems set up to run on third-party databases,
    e.g. Jena SDB.
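
To make the in-memory category concrete, here is a minimal sketch in Python of a graph held entirely in main memory. It is not the Jena or Sesame API; the class name `InMemoryTripleStore`, its methods, and the `ex:` terms are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class InMemoryTripleStore:
    """Toy in-memory store (hypothetical): every triple lives in a set in RAM."""

    def __init__(self):
        self._triples = set()
        # Secondary index: subject -> triples, so subject lookups avoid a full scan.
        self._by_subject = defaultdict(set)

    def add(self, s, p, o):
        triple = (s, p, o)
        self._triples.add(triple)
        self._by_subject[s].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        candidates = self._by_subject.get(s, set()) if s is not None else self._triples
        return sorted(
            t for t in candidates
            if (p is None or t[1] == p) and (o is None or t[2] == o)
        )

    def __len__(self):
        return len(self._triples)


store = InMemoryTripleStore()
store.add("ex:alice", "rdf:type", "ex:Student")
store.add("ex:alice", "ex:memberOf", "ex:Dept1")
store.add("ex:bob", "rdf:type", "ex:Student")
students = store.match(p="rdf:type", o="ex:Student")
```

Nothing here survives a process restart, which is the practical difference from the two persistent categories.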

5
Implications
  • Scalability
  • Different query languages are supported to varying
    degrees.
  • Sesame: SeRQL. Oracle 11g: its own query language.
  • Different levels of inferencing.
  • Sesame: RDFS. AllegroGraph: RDFS.
  • Oracle 11g: RDFS, OWL Prime.
  • Lack of interoperability and portability.
  • More pronounced in native stores.

6
Jena SDB
  • SDB is essentially a Java loader.
  • Multiple backing stores are supported: MySQL,
    PostgreSQL, Oracle, DB2.
  • Takes incoming triples and breaks them down into
    components ready for the database.
  • Multiple layouts.
  • Integration with the Joseki server.
  • SPARQL is supported.

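The step of breaking incoming triples into database-ready components can be pictured with a small Python sketch. It is an assumption-laden illustration, not SDB's actual schema or code: the `nodes` and `triples` tables, the `node_id` hashing scheme, and the use of SQLite in place of MySQL or PostgreSQL are all hypothetical.

```python
import hashlib
import sqlite3


def node_id(term: str) -> int:
    """Derive a stable 63-bit integer id from a term's lexical form (illustrative)."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> 1


def load(conn, triples):
    """Split each triple into node rows plus one row of node ids."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes (id INTEGER PRIMARY KEY, lex TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples "
        "(s INTEGER, p INTEGER, o INTEGER, PRIMARY KEY (s, p, o))"
    )
    for triple in triples:
        ids = [node_id(term) for term in triple]
        # Each distinct term is stored once; repeated terms are ignored.
        conn.executemany("INSERT OR IGNORE INTO nodes VALUES (?, ?)", zip(ids, triple))
        conn.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", ids)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, [
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:bob", "rdf:type", "ex:Student"),
])
node_count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
triple_count = conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
```

In this sketch the two triples share the terms `rdf:type` and `ex:Student`, so only four node rows are written for six term occurrences.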
7
Evaluations
  • Third-party evaluations for Sesame, Jena SDB,
    Virtuoso
  • Oracle 11g: the company's own evaluations
  • Methodology
  • LUBM: Lehigh University Benchmark
  • DBpedia
  • Multiple queries
  • Load times

8
Evaluations
  • DBpedia: a database of structured information
    extracted from Wikipedia, with information about
    places, persons, music albums and films [2].
  • LUBM: synthetically generated RDF data
    containing universities, departments, students,
    etc. [1]
  • Dataset size
  • DataSet 1: 15,472,624 triples, 2.1 GB
  • DataSet 2: LUBM 50, 2.75 million triples; LUBM 1000,
    55.09 million triples
  • 3 queries

9
Loading Time: DataSet 1

10
Results: Query 1
  • Simple select query with 2 variables

11
Query 2
  • Unconstrained select query: only the predicate was
    specified.

12
Query 3
  • Complex query: uses a filter

13
Oracle 11g: DataSet 2

Ontology (size)        RDFS                  OWL Prime
                       Triples   Time        Triples   Time
LUBM 50 (6.8 M)        2.75 M    12.14 min   3.05 M    8.01 min
LUBM 1000 (133.6 M)    55.09 M   7h 19m      65.25 M   7h 12m

14
Observations
  • Native stores perform better than systems using
    third-party stores.
  • Optimizations are possible.
  • Each of the systems uses different database
    layouts.
  • Virtuoso: OGPS, POGS, PSOG, SOPG
  • SDB: SPO, GSPO
  • Hashing on SDB is very bad.
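
Layout names such as SPO, GSPO or POGS name the order in which graph (G), subject (S), predicate (P) and object (O) appear in an index key. The Python sketch below shows why that order matters: a sorted index answers a query efficiently only when the known components form a prefix of the key. Only the layout names come from the slide; the functions `build_index` and `prefix_lookup` are hypothetical, and this is not a description of how Virtuoso or SDB implement their indexes.

```python
from bisect import bisect_left

# Position of each component in an input quad (g, s, p, o).
POSITIONS = {"G": 0, "S": 1, "P": 2, "O": 3}


def build_index(quads, layout):
    """Sort quads by the component order named in `layout`, e.g. "POGS"."""
    order = [POSITIONS[c] for c in layout]
    return sorted(tuple(q[i] for i in order) for q in quads)


def prefix_lookup(index, prefix):
    """Return all index entries starting with `prefix`, located by binary search."""
    prefix = tuple(prefix)
    start = bisect_left(index, prefix)
    hits = []
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break
        hits.append(entry)
    return hits


quads = [
    ("g1", "ex:alice", "rdf:type", "ex:Student"),
    ("g1", "ex:bob", "rdf:type", "ex:Student"),
    ("g1", "ex:alice", "ex:memberOf", "ex:Dept1"),
]
pogs = build_index(quads, "POGS")
# Predicate and object are known, so they form a prefix of the POGS key.
typed_students = prefix_lookup(pogs, ("rdf:type", "ex:Student"))
```

With a GSPO index alone, the same lookup would need a scan, because the known predicate and object are not at the front of the key.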

15
Open Research Issues
  • Inferencing [4]
  • Present common implementations
  • Make a number of small queries to propagate the
    effects of rule firing.
  • Each of these queries creates an interaction with
    the database.
  • Not very efficient.
  • Approaches
  • Snapshot the contents of the database-backed
    model into RAM for the duration of processing by
    the inference engine.
  • Perform inferencing in-stream.
  • Precompute the inference closure of the ontology,
    analyze the incoming data streams, and add triples
    to them based on that closure.
  • Assumes rigid separation of the RDF data (A-box)
    and the ontology data (T-box).
  • Even this may not work for very large ontologies,
    e.g. biomedical ontologies.
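
The in-stream approach can be sketched in Python for a single rule, the RDFS subclass rule (an instance of a class is also an instance of its superclasses). This is a minimal illustration under stated assumptions, not the pipeline from [4]: the function names, the dictionary encoding of the T-box, and the `ex:` terms are hypothetical, and a real reasoner would cover many more rules.

```python
def subclass_closure(subclass_of):
    """Precompute, for each class, all of its transitive superclasses."""

    def ancestors(cls, seen):
        for parent in subclass_of.get(cls, ()):
            if parent not in seen:  # the seen set also guards against cycles
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    return {cls: ancestors(cls, set()) for cls in subclass_of}


def infer_in_stream(triples, closure):
    """Yield each incoming triple, followed by the rdf:type triples it entails."""
    for s, p, o in triples:
        yield (s, p, o)
        if p == "rdf:type":
            for parent in sorted(closure.get(o, ())):
                yield (s, "rdf:type", parent)


# T-box (ontology): direct subclass links, closed once before loading.
tbox = {
    "ex:GraduateStudent": ["ex:Student"],
    "ex:Student": ["ex:Person"],
}
# A-box (data): processed one triple at a time, with no database round trips.
abox = [("ex:alice", "rdf:type", "ex:GraduateStudent")]
out = list(infer_in_stream(abox, subclass_closure(tbox)))
```

The closure is held in memory for the whole load, which is where the concern about very large ontologies comes from.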

16
Open Research Issues
  • Query Optimization
  • Third-party stores undo any optimization done
    at the API level.
  • Better performance of native stores points in
    that direction.
  • Some work exists on optimizing SPARQL queries for
    in-memory stores.

17
Which RDF store to choose for an app?
  • Frequency of loads that the application would
    perform.
  • Single scaling factor and linear load times.
  • Level of inferencing.
  • Which query languages are supported, e.g. W3C
    recommendations.
  • Special system needs, e.g. AllegroGraph needs a
    64-bit processor.

18
Phenotype Annotations
[System diagram. Labels recoverable from the slide:
Phenotype Annotations; Set of ontologies required for
phenotype annotations, e.g. PATO, Fly; Jena API;
Inferencing; Jena Model; SDB; MySQL / Virtuoso.]

19
References
  • [1] http://esw.w3.org/topic/RdfStoreBenchmarking
  • [2] http://www4.wiwiss.fu-berlin.de/benchmarks-200801/
  • [3] Kurt Rohloff et al. An Evaluation of
    Triple-Store Technologies for Large Data Stores.
    Comparing Sesame, Jena and AllegroGraph. 2007.
  • [4] N. Bhatia, A. Seaborne. Ingestion pipeline for
    RDF.