Jena Persistent Storage Property Table Design

1
Jena Persistent Storage: Property Table Design
  • Kevin Wilkinson
  • HP Labs Palo Alto

2
Islands of Information
[Figure: the "Legacy Relational Continent" and the island of "Jena", linked by the "D2RQ Shipping Co."]
Can ship from RDBMS to Jena (takes effort). Can't get back. Goal: a nearly seamless bridge.
3
Topics
  • Motivation: Why Property Tables?
  • Creating Property Tables
  • Accessing Property Tables
  • Accessing Legacy Database Tables
  • Implementation

4
Background: RDF Persistence
  • Task: persistent storage for an RDF graph
  • Conventional solution: triple store
  • Problems with the triple store approach
  • Property tables: how they help
  • Property tables in Jena today

5
RDF Triple Store Approach
  • Statement table and Symbols table (RDBMS)
    • + efficient in space
    • - retrieval requires 3-way join
    • - data patterns ignored (everyone has a name, addr)

[Figure: Statement table and Symbols table]
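
As a rough illustration of the statement-plus-symbols layout, here is a minimal sketch using SQLite from Python. The table and column names (`statements`, `symbols`, `subj`, `prop`, `obj`) and the sample data are assumptions made for illustration, not the Jena schema. It reads the slide's "3-way join" as one join against the symbols table per statement column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE symbols (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
    CREATE TABLE statements (subj INTEGER, prop INTEGER, obj INTEGER);
""")

def symbol_id(value):
    """Return the id for a value, interning it in the symbols table if new."""
    conn.execute("INSERT OR IGNORE INTO symbols (value) VALUES (?)", (value,))
    row = conn.execute("SELECT id FROM symbols WHERE value = ?", (value,)).fetchone()
    return row[0]

def add(s, p, o):
    """Store one statement as three symbol ids (hypothetical helper)."""
    conn.execute("INSERT INTO statements VALUES (?, ?, ?)",
                 (symbol_id(s), symbol_id(p), symbol_id(o)))

add("ex:p1", "ex:name", "Alice")
add("ex:p1", "ex:addr", "Palo Alto")

# Decoding statements back to text joins the symbols table once per column.
rows = conn.execute("""
    SELECT s.value, p.value, o.value
    FROM statements t
    JOIN symbols s ON s.id = t.subj
    JOIN symbols p ON p.id = t.prop
    JOIN symbols o ON o.id = t.obj
    ORDER BY p.value
""").fetchall()
```

Each distinct string is stored once, which is where the space efficiency comes from; the cost is the join work on every retrieval.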
6
Problems with Triple Store
  • Doesn't leverage patterns in data
  • Can't leverage locality (spatial/temporal)
  • Excessive load time (can't use db loader)
  • Database optimizer useless: no statistics to distinguish, e.g., (?var, ex:empId, 123) vs. (?var, ex:gender, M)
  • Alternatives: native RDF store, object-relational store, property tables

7
What's a Property Table?
  • A table that stores patterns of RDF statements
  • An n-column prop tbl stores n-1 statements per row (1 col per prop, plus the subject)
  • Augments, doesn't replace, the triple store
  • Partitioned statements: a statement is stored in the TS or a prop tbl, never both
  • Partitioned properties: all values for a given property are in the TS or a prop tbl
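
The two partitioning rules and the "n columns, n-1 statements" rule can be sketched in a few lines of Python. The names `COLUMNS`, `target_store` and `row_to_statements`, and the `ex:` properties, are hypothetical and only illustrate the idea.

```python
# A 3-column property table: subject plus two property columns (assumed layout).
COLUMNS = ("subject", "ex:name", "ex:addr")
TABLE_PROPS = set(COLUMNS[1:])

def target_store(prop):
    """Partitioned properties: every value of a property lives in exactly one place."""
    return "person_table" if prop in TABLE_PROPS else "triple_store"

def row_to_statements(row):
    """An n-column row yields up to n-1 statements; NULL columns yield none."""
    subject, *values = row
    return [(subject, prop, val)
            for prop, val in zip(COLUMNS[1:], values)
            if val is not None]

stmts = row_to_statements(("ex:p1", "Alice", None))
```

Here the row for `ex:p1` carries a name but no address, so only one statement comes back.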

8
Triple Store plus Property Tables
[Figure: the same data stored as "Triple Store Only" versus a "Person Property Table" plus a "Triple Store"]
9
Property Table Pro/Con
  • Advantages
    • efficiencies in storage and access
    • transparent to application
    • enables access to legacy relational tables (which can be modeled as property tables)
    • bridges the RDF-relational divide
  • Disadvantages
    • exhaustive search if property unknown, e.g., (Tony Blair, -, -)
    • queries don't compile to a single SQL statement
    • loss of flexibility: fixed schema, typed property values

10
Property Tables in Jena Today
  • Two tables created for each Jena2 graph
    • Stmt table: a triple store
    • Reif table: property table for reified stmts (the only property table currently supported)
  • Our goal: generalize the existing framework

11
Creating Property Tables
  • Types of property tables
  • Column encoding
  • Table specification

12
Types of Property Tables
  • Single-valued property table: stores several single-valued properties for a subject
  • Multi-valued property table: stores one multi-valued property for a subject
  • Property-class table: stores class membership; rdf:type is the only property allowed in multiple tables

13
Property Table Column Encoding
  • Issue: how to encode values in columns?
  • Option 1: Jena encoding or symbol ids
  • en:http://www.hp.com/ex:foo or 1234
  • Option 2: native db encoding
  • foo
  • Choice: support both
  • Option 2 is needed to access legacy database tables

14
Property Table Creation
  • Property tables are:
    • user-defined
    • sharable across graphs
    • created when the graph is created
    • specified in a meta-graph (RDF stmts): table name, type, column descriptors, etc.

15
Accessing Property Tables
  • Graph operations: add, delete, find, query
  • Recall: properties are partitioned over tables
  • add, delete: applied to the table for the stmt's property
  • find: applied to each table, results merged
  • query: requires special processing

16
Add Stmt on Property Tables
  • Add 1 statement: create new row in table; use null for unknown property values
  • Add n statements (bulk add): order stmts by subject and add one row for each subject
  • Delete is similar
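
The bulk-add rule (order by subject, one row per subject, null for missing values) can be sketched as follows. This is a minimal illustration in plain Python, assuming a hypothetical two-property table; `PROPS` and `rows_for_bulk_add` are names invented here.

```python
from itertools import groupby
from operator import itemgetter

# Property columns of the (assumed) table, in column order.
PROPS = ("ex:name", "ex:addr")

def rows_for_bulk_add(statements):
    """Order statements by subject, then emit one row per subject.

    Properties with no statement for a subject become None (SQL NULL).
    """
    ordered = sorted(statements, key=itemgetter(0))
    rows = []
    for subject, group in groupby(ordered, key=itemgetter(0)):
        values = {prop: obj for _, prop, obj in group}
        rows.append((subject,) + tuple(values.get(p) for p in PROPS))
    return rows

rows = rows_for_bulk_add([
    ("ex:p2", "ex:name", "Bob"),
    ("ex:p1", "ex:addr", "Palo Alto"),
    ("ex:p1", "ex:name", "Alice"),
])
```

The resulting tuples could be handed to a single batched insert, which is the point of grouping before touching the database.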

17
Find Operation on Prop Tbls
  • Find (s,p,o): each of s, p, o is a value or don't-care; returns all matching RDF statements
  • Goal: process find with one SQL statement
  • Triple store: 8 possible find patterns
    • (-,-,-) (s,-,-) (-,p,-) (s,p,-) (-,-,o) (s,-,o) (-,p,o) (s,p,o)
    • Predefine 8 SQL queries, one for each find pattern
  • Property table of p props
    • 4(p+1) possible queries
    • Don't predefine queries; generate and cache them
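
The count of 4(p+1) follows from the three choices in a find pattern: the subject is bound or not (2), the property is one of the p columns or don't-care (p+1), and the object is bound or not (2). Below is a hedged sketch of "generate and cache" using `functools.lru_cache`. The table name, column mapping and SQL shape are assumptions for illustration; in particular, expanding a don't-care property into a `UNION ALL` over the columns of one table is just one possible way to keep a single statement per table.

```python
from functools import lru_cache
from itertools import product

TABLE = "person"
PROP_COLUMNS = {"ex:name": "name", "ex:addr": "addr"}  # p = 2 properties

@lru_cache(maxsize=None)
def find_sql(s_bound, prop, o_bound):
    """Build SQL for one find pattern; prop is a property name or None (don't-care)."""
    props = [prop] if prop is not None else list(PROP_COLUMNS)
    selects = []
    for p in props:
        col = PROP_COLUMNS[p]
        where = [f"{col} IS NOT NULL"]
        if s_bound:
            where.append("subj = ?")
        if o_bound:
            where.append(f"{col} = ?")
        selects.append(
            f"SELECT subj, '{p}' AS prop, {col} AS obj FROM {TABLE} WHERE "
            + " AND ".join(where)
        )
    return " UNION ALL ".join(selects)

# Enumerate every pattern: 2 * (p + 1) * 2 = 4(p + 1).
patterns = list(product([False, True], [None, *PROP_COLUMNS], [False, True]))
for pattern in patterns:
    find_sql(*pattern)
```

Queries are built only when a pattern is first used and are reused afterwards, so a wide table does not need its full set of statements prepared up front.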

18
Query Processing on Prop Tbls
  • Goal of 1 SQL stmt per query is not achievable
  • e.g., the query
  • (Tony Blair, -, ?var)
  • must search all tables and merge results. This can't be done with a union query

19
Query Proc on Prop Tbls (cont'd)
  • But some joins can be eliminated
  • e.g., given a person name-address property table,
  • the query
  • (?var,name,-) (?var,addr,-)
  • can be processed as an SQL select (no join)
  • Over a triple store, this query requires a join.
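
To make the join elimination concrete, the sketch below runs the name-and-address query both ways in SQLite. The schemas (`stmt`, `person`) and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Triple store: one row per statement.
    CREATE TABLE stmt (subj TEXT, prop TEXT, obj TEXT);
    INSERT INTO stmt VALUES ('ex:p1', 'name', 'Alice');
    INSERT INTO stmt VALUES ('ex:p1', 'addr', 'Palo Alto');
    INSERT INTO stmt VALUES ('ex:p2', 'name', 'Bob');

    -- Property table: one row per subject, one column per property.
    CREATE TABLE person (subj TEXT PRIMARY KEY, name TEXT, addr TEXT);
    INSERT INTO person VALUES ('ex:p1', 'Alice', 'Palo Alto');
    INSERT INTO person VALUES ('ex:p2', 'Bob', NULL);
""")

# (?var,name,-) (?var,addr,-) over the triple store needs a self-join.
via_triple_store = conn.execute("""
    SELECT a.subj
    FROM stmt a JOIN stmt b ON a.subj = b.subj
    WHERE a.prop = 'name' AND b.prop = 'addr'
""").fetchall()

# Over the property table the same query is a plain select.
via_property_table = conn.execute("""
    SELECT subj FROM person
    WHERE name IS NOT NULL AND addr IS NOT NULL
""").fetchall()
```

Both queries return the subjects that have a name and an address; only the first one pays for a join.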

20
Accessing Legacy Database Tables
  • Goal: access legacy relational db tables
  • Note: D2RQ provides read-only access
  • We want:
    • support for updates
    • seamless querying across Jena and legacy db
  • Challenge: extend Jena property tables to support legacy tables

21
Legacy Table Columns
  • Legacy table has a key and n value columns
  • The key identifies some object (i.e., resource)
  • If the key is only 1 column, the table looks like a property table
  • If the key is more than 1 column (i.e., a compound key), a work-around is needed (called virtual bnodes)

[Figure: legacy table layout with columns key, val1, val2, ..., valn]
22
Support for Compound Keys
  • Assume key components (key_i) identify objects
  • So, a compound key represents a relationship among the key components
  • RDF models compound relationships with bnodes
  • A legacy table has no bnode, so we have to fake it

23
Virtual Bnodes for Compound Keys
  • Virtual bnode: surrogate for a compound key
    • identifies a table row, i.e., a relationship instance
    • can be used in querying
    • generated dynamically upon retrieval, e.g., concatenate key components
  • Compound key properties: map from virtual bnode to key components

24
Example: Inventory Table
  • Inventory table: an inventory relationship table
  • Compound key: (storeId, partId)
  • RDF graphs for each row
  • Compound key properties: ex:storeId, ex:partId
  • Virtual bnode ids (e.g., _:i1) generated dynamically
[Figure: Inventory table rows and their RDF graphs, rooted at virtual bnodes _:i1 and _:i2]
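
The inventory example can be sketched as follows: a virtual bnode label is built by concatenating the key components, and each row is expanded into statements hanging off that label. The separator, the label format, the `ex:quantity` column and the sample values are all assumptions made for illustration; a real implementation would also need to escape the separator inside key values.

```python
SEP = "|"  # assumed separator; key values containing it would need escaping

def virtual_bnode(table, key):
    """Build a bnode label by concatenating the table name and key components."""
    return "_:" + SEP.join([table, *map(str, key)])

def row_to_statements(table, key_props, value_props, row):
    """Expand one legacy row into (subject, property, object) statements.

    The compound key properties map the virtual bnode back to its key components.
    """
    key = row[:len(key_props)]
    node = virtual_bnode(table, key)
    props = list(key_props) + list(value_props)
    return [(node, p, v) for p, v in zip(props, row) if v is not None]

stmts = row_to_statements(
    "inventory",
    ("ex:storeId", "ex:partId"),  # compound key properties from the slide
    ("ex:quantity",),             # hypothetical value column
    (6, 30, 12),
)
```

Because the label is derived from the key alone, the same row always yields the same virtual bnode, so it can be used in later queries without being stored.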
25
Implementation
  • In progress; first testing in a month or so
  • Legacy database table support to follow (another month)
  • Initially, RDQL support only
  • Performance evaluation: synthetic, scalable dataset and benchmark queries

26
Summary
  • Property tables
    • leverage patterns in RDF datasets
    • performance benefit for some applications
    • better enable use of relational tools (loaders, optimizers)
  • Legacy relational tables
    • can be updated
    • look like property tables
    • virtual bnodes to represent compound keys