An Efficient SQL-based RDF Querying Scheme

Learn more at: http://www.vldb2005.org
2
An Efficient SQL-based RDF Querying Scheme
Eugene Inseok Chong, Souripriya Das, George Eadon, Jagannathan Srinivasan
New England Development Center, Oracle
3
Talk Outline
  • Introduction
  • Functionality
  • Design and Implementation
  • Performance
  • Conclusions and Future Work

4
Introduction
5
RDF (Resource Description Framework)
  • RDF is a W3C standard for describing resources on the Web
  • Uniform Resource Identifiers (URIs) are used to identify resources
    – Example: http://www.oracle.com/people#John
  • RDF triples are used to make statements about a resource
    – Format: (subject predicate object)
    – Example: (John brotherOf Mary)
    – Represents a directed, labeled edge in an RDF graph

[Figure: RDF graph edge John --brotherOf--> Mary]
6
RDF Data and Graph Example
  • Family data
    – (John brotherOf Mary)
    – (Mary parentOf Matt)
    – (John name "John")
    – (Mary name "Mary")
    – (Matt name "Matt")

[Figure: RDF graph of the family data: John --brotherOf--> Mary --parentOf--> Matt, with a name edge from each node to its name literal]
7
RDF Querying Problem
  • Given
    – RDF graphs: the data set to be searched
    – A graph pattern containing a set of variables
  • Find
    – Matching subgraphs
  • Return
    – Sets of variable bindings, where each set corresponds to a matching subgraph

8
RDF Query Example
  • Family data
    – (John brotherOf Mary)
    – (Mary parentOf Matt)
    – (John name "John")
    – (Mary name "Mary")
    – (Matt name "Matt")
  • Graph pattern (names of Mary's brothers)
    – (?x brotherOf ?y)
    – (?y name "Mary")
    – (?x name ?n)
  • Variable bindings
    – x = John, y = Mary, n = "John"
  • Matching subgraph
    – (John brotherOf Mary)
    – (Mary name "Mary")
    – (John name "John")

[Figure: the family-data RDF graph (John, Mary, Matt and their name literals)]
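To make the matching semantics concrete, here is a minimal in-memory sketch in Python. It is not how RDF_MATCH works internally: each triple pattern is unified against every stored triple, and only bindings that agree on shared variables are carried forward. Writing literals with embedded quotes (e.g. '"Mary"') to distinguish them from URIs is an assumption of this sketch.

```python
from __future__ import annotations

Triple = tuple[str, str, str]

FAMILY: list[Triple] = [
    ("John", "brotherOf", "Mary"),
    ("Mary", "parentOf", "Matt"),
    ("John", "name", '"John"'),
    ("Mary", "name", '"Mary"'),
    ("Matt", "name", '"Matt"'),
]


def unify(pattern: Triple, triple: Triple, binding: dict[str, str]) -> dict[str, str] | None:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # setdefault returns the existing value if the variable is already bound
            if result.setdefault(term[1:], value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant does not match
    return result


def match(patterns: list[Triple], data: list[Triple]) -> list[dict[str, str]]:
    """Return one binding per matching subgraph (naive nested-loop search)."""
    bindings: list[dict[str, str]] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in data
            if (extended := unify(pattern, triple, binding)) is not None
        ]
    return bindings


# Names of Mary's brothers
query = [("?x", "brotherOf", "?y"), ("?y", "name", '"Mary"'), ("?x", "name", "?n")]
print(match(query, FAMILY))
```

Each surviving binding corresponds to one matching subgraph, so this query yields exactly one binding: x = John, y = Mary, n = "John".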
9
RDF Storage Issues
  • Need to store RDF <subject, predicate, object> triples, where the individual components can be URIs, blank nodes, or literals
  • Namespaces used in URIs can be long
  • Multiple triples describe a resource, resulting in repetition of (possibly long) URIs
  • A literal occurring in multiple triples may have different representations
    – e.g. 120, 120.0, 12.0e1, 1.20e2
  • An RDF graph may include schema triples
    – e.g. (brotherOf rdfs:domain Male)

10
RDF Querying Issues in SQL
  • Support specifying graph-pattern-based queries in SQL
  • The same variable occurring in multiple triples of a graph pattern: processing requires self-joins
    – e.g. (?x brotherOf ?y)
           (?y name "Mary")
           (?x name ?n)
  • Query processing (e.g., for filter conditions and ORDER BY) requires datatype-specific comparison semantics
    – Schema triple: (age rdfs:range xsd:int)
    – Graph pattern: (?x age ?a)
    – Filter condition: a > 60
    – ORDER BY a DESCENDING
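The sketch below illustrates the datatype issue using hypothetical `(?x age ?a)` bindings stored as strings. Under plain string comparison, "100" sorts below "60". The `RANGE_CASTS` table, which maps a declared `rdfs:range` to a Python cast, is an assumption made only for this illustration.

```python
# Hypothetical bindings for (?x age ?a), with literal values stored as strings
ages = {"Ann": "7", "Bob": "65", "Cal": "100"}

# String semantics: lexicographic, so "7" > "65" > "60" > "100"
string_filter = sorted(x for x, a in ages.items() if a > "60")
string_order = sorted(ages, key=lambda x: ages[x], reverse=True)

# Datatype-aware semantics: cast according to the declared range before comparing
RANGE_CASTS = {"xsd:int": int}  # hypothetical mapping from rdfs:range to a cast
cast = RANGE_CASTS["xsd:int"]
typed_filter = sorted(x for x, a in ages.items() if cast(a) > 60)
typed_order = sorted(ages, key=lambda x: cast(ages[x]), reverse=True)

print("string:", string_filter, string_order)
print("xsd:int:", typed_filter, typed_order)
```

With string semantics the filter `a > 60` wrongly keeps Ann (age 7) and drops Cal (age 100); casting first gives the intended result.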

11
RDF Querying Issues: Inference
  • Query processing may involve inferencing
  • Example
    – Data: (Jim brotherOf John), (John fatherOf Mary)
    – Graph pattern: (?x uncleOf ?y)
    – Result: empty
    – Rule: (?x brotherOf ?y) (?y fatherOf ?z) → (?x uncleOf ?z)
    – Inferred data: (Jim uncleOf Mary)
    – Result: x = Jim, y = Mary
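Here is a naive forward-chaining sketch of the uncle rule. It derives new triples until a fixpoint is reached, then evaluates the pattern over base plus inferred data. The slides do not say how RDF_MATCH evaluates rules, so treat this only as an illustration of the semantics.

```python
data = {("Jim", "brotherOf", "John"), ("John", "fatherOf", "Mary")}


def apply_uncle_rule(triples):
    """(?x brotherOf ?y) (?y fatherOf ?z) -> (?x uncleOf ?z), applied once."""
    return {
        (x, "uncleOf", z)
        for (x, p1, y) in triples if p1 == "brotherOf"
        for (y2, p2, z) in triples if p2 == "fatherOf" and y2 == y
    }


def closure(triples):
    """Apply the rule repeatedly until no new triples appear (a fixpoint)."""
    result = set(triples)
    while True:
        new = apply_uncle_rule(result) - result
        if not new:
            return result
        result |= new


def uncles(triples):
    """Evaluate the graph pattern (?x uncleOf ?y) as sorted (x, y) pairs."""
    return sorted((s, o) for (s, p, o) in triples if p == "uncleOf")


print(uncles(data))           # stored data only: no uncleOf triples
print(uncles(closure(data)))  # with the inferred (Jim uncleOf Mary)
```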

12
RDF Querying Approach
  • General approach
    – Create a new (declarative, SQL-like) query language, e.g. RQL, SeRQL, TRIPLE, N3, Versa, SPARQL, RDQL, RDFQL, SquishQL, RSQL, etc.
  • SQL-based approach
    – Introduces a SQL table function, RDF_MATCH, that uses a SPARQL-like graph pattern to express RDF queries
  • Benefits of the SQL-based approach
    – Leverages all the powerful constructs in SQL (e.g., SELECT / FROM / WHERE, ORDER BY, GROUP BY, aggregates, joins) to process graph query results
    – RDF queries can easily be combined with conventional queries on database tables, thereby avoiding staging

13
Embedding RDF Query in SQL
  • SELECT … FROM …, TABLE (
        <RDF query, expressed as an RDF_MATCH table function invocation>
    ) t, … WHERE …
  • Use of the RDF_MATCH table function allows embedding a graph query in a SQL query
14
Functionality
15
RDF_MATCH Table Function
  • Input parameters
      RDF_MATCH (
        Pattern,     → graph pattern
        Models,      → data (set of RDF graphs)
        RuleBases,   → rules (0 or more rulebases)
        Aliases)     → list of prefixes for namespaces
  • Returns a set of columns containing variable bindings
    – A variable matching a URI is returned as a single VARCHAR2 column with the same name (e.g. x for ?x)
    – A variable matching a literal is returned as a pair of VARCHAR2 columns: one with the variable name (e.g. x for ?x) and one with its type (xtype for ?x)

16
RDF_MATCH Example
  • Example: student reviewers less than 25 years old
      SELECT t.r reviewer, t.c conf, t.a age
      FROM TABLE(
        RDF_MATCH(
          '(?r rdf:type Student)
           (?r reviewerOf ?c)
           (?r age ?a)',
          RDFModels('reviewers'),
          NULL,
          RDFAliases())
        ) t
      WHERE t.a < 25
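For comparison, the sketch below writes out by hand, in SQLite, the kind of self-join this query needs when triples sit in a plain (s, p, o) table. The table name `triples`, the sample rows, and the inline subquery standing in for `TABLE(RDF_MATCH(...))` are illustrative assumptions, not the SQL that Oracle generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [  # hypothetical 'reviewers' data
    ("Ann", "rdf:type", "Student"), ("Ann", "reviewerOf", "ICDE"), ("Ann", "age", "23"),
    ("Bob", "rdf:type", "Student"), ("Bob", "reviewerOf", "VLDB"), ("Bob", "age", "31"),
    ("Cat", "rdf:type", "Professor"), ("Cat", "reviewerOf", "VLDB"), ("Cat", "age", "22"),
])

query = """
SELECT t.r AS reviewer, t.c AS conf, t.a AS age
FROM (
    -- stands in for TABLE(RDF_MATCH(...)): one triples instance per pattern,
    -- joined on the shared variable ?r
    SELECT t1.s AS r, t2.o AS c, CAST(t3.o AS INTEGER) AS a
    FROM triples t1
    JOIN triples t2 ON t2.s = t1.s AND t2.p = 'reviewerOf'
    JOIN triples t3 ON t3.s = t1.s AND t3.p = 'age'
    WHERE t1.p = 'rdf:type' AND t1.o = 'Student'
) t
WHERE t.a < 25
"""
print(conn.execute(query).fetchall())
```

Cat is excluded by the type pattern and Bob by the outer filter, so only Ann's row remains.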

17
Specifying Rules
  • RDFS rulebase: pre-loaded
  • User-defined rules can be added
  • Rule: the chairperson of a conference is also a reviewer
      ('rb',                      → rulebase name
       'ChairpersonRule',         → rule name
       '(?r ChairpersonOf ?c)',   → antecedents
       NULL,                      → filter condition
       NULL,                      → aliases
       '(?r ReviewerOf ?c)')      → consequents

18
RDF_MATCH Example with rulebase
  • Query: find reviewers of conferences
      SELECT t.r reviewer FROM TABLE(
        RDF_MATCH(
          '(?r ReviewerOf ?c)',
          RDFModels('reviewers'),
          RDFRules('rb'),
          NULL)) t
  • Data: (Mary ChairpersonOf IDBC2005)
  • Inferred data: (Mary ReviewerOf IDBC2005)
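One way to express this rule in plain SQL, assuming a simple (s, p, o) `triples` table, is as an extra `UNION` branch next to the stored `ReviewerOf` triples. Tom is an invented extra reviewer, and the helper `reviewers` is hypothetical; this is a sketch of the idea, not Oracle's generated code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("Mary", "ChairpersonOf", "IDBC2005"),  # data from the slide
    ("Tom", "ReviewerOf", "IDBC2005"),      # hypothetical stored reviewer
])

# Stored (?r ReviewerOf ?c) triples
BASE = "SELECT s, o FROM triples WHERE p = 'ReviewerOf'"
# ChairpersonRule: (?r ChairpersonOf ?c) -> (?r ReviewerOf ?c)
RULE = "SELECT s, o FROM triples WHERE p = 'ChairpersonOf'"


def reviewers(with_rules: bool) -> list[str]:
    """Answer (?r ReviewerOf ?c), optionally including rule-derived triples."""
    source = f"{BASE} UNION {RULE}" if with_rules else BASE
    sql = f"SELECT DISTINCT s FROM ({source}) ORDER BY s"
    return [row[0] for row in conn.execute(sql)]


print(reviewers(with_rules=False))
print(reviewers(with_rules=True))
```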

19
Design and Implementation
20
RDF Data Storage
  • Triples: data is stored, after normalization, in two tables
    – UriMap (UriID, UriValue, …) contains the mapping of URIs, blank nodes, and literals to internal identifiers
    – IdTriples (ModelID, SubjectID, PropertyID, ObjectID, …) contains the triple information encoded as three identifiers
  • Multiple representations of literals: the first occurrence is treated as canonical, and the rest are mapped to the canonical representation
    – e.g. 120.0, 120, 1.20e2, and 12.0e1 all share one canonical representation
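Here is a minimal Python sketch of this normalization, under simplifying assumptions. `uri_map` plays the role of UriMap (keyed by value for lookup), `id_triples` plays the role of IdTriples, and only plain numeric literals are canonicalized, by value equality via `Decimal`. The regular expression, class name, and example URIs are all hypothetical.

```python
import re
from decimal import Decimal

# Plain decimal or scientific numerals, e.g. 120, 120.0, 12.0e1, 1.20e2
NUMERIC = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


class TripleStore:
    """In-memory sketch of UriMap(UriID, UriValue) + IdTriples(ModelID, SubjectID, PropertyID, ObjectID)."""

    def __init__(self) -> None:
        self.uri_map: dict[str, int] = {}          # UriValue -> UriID, each value stored once
        self.first_form: dict[Decimal, str] = {}   # numeric value -> first lexical form seen
        self.id_triples: list[tuple[int, int, int, int]] = []

    def canonical(self, term: str) -> str:
        """Map numeric literals with equal values to the first form seen."""
        if NUMERIC.fullmatch(term):
            return self.first_form.setdefault(Decimal(term), term)
        return term

    def term_id(self, term: str) -> int:
        """Return the internal ID for a term, assigning the next ID if it is new."""
        return self.uri_map.setdefault(self.canonical(term), len(self.uri_map) + 1)

    def add(self, model_id: int, s: str, p: str, o: str) -> None:
        self.id_triples.append((model_id, self.term_id(s), self.term_id(p), self.term_id(o)))


store = TripleStore()
item = "http://www.example.org/catalog#item42"  # hypothetical long URIs
price = "http://www.example.org/catalog#price"
for literal in ["120", "120.0", "12.0e1", "1.20e2"]:
    store.add(1, item, price, literal)

print(store.id_triples)
print(store.uri_map)
```

Each long URI is stored once, and every triple carries only small integers. The four spellings of 120 all resolve to the ID of the first form seen.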

21
RDF_MATCH Query Processing
  • Substitute aliases with namespaces in the search pattern
  • Convert URIs and literals to internal IDs
  • Generate the query
    – Generate a self-join query based on matching variables
    – Generate SQL subqueries for the rulebase component (if any)
    – Generate the result by joining internal IDs with the UriMap table
    – Use model IDs to restrict the IdTriples table
  • Compile and execute the generated query
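The steps above can be sketched in Python against SQLite. `generate_sql` is a hypothetical helper, not Oracle's API. It maps constants to internal IDs, turns repeated variables into self-join conditions, restricts every IdTriples instance to one model, and joins UriMap only for projected variables. Alias substitution and rulebase subqueries are omitted.

```python
import sqlite3

COLUMNS = ("SubjectID", "PropertyID", "ObjectID")


def generate_sql(patterns, model_id, project, uri_ids):
    """Translate (s, p, o) patterns into one self-join over IdTriples."""
    where, first_seen = [], {}  # first_seen: variable -> column of its first occurrence
    for i, pattern in enumerate(patterns):
        where.append(f"t{i}.ModelID = {int(model_id)}")  # restrict to one model
        for column, term in zip(COLUMNS, pattern):
            ref = f"t{i}.{column}"
            if not term.startswith("?"):
                where.append(f"{ref} = {int(uri_ids[term])}")  # constant -> internal ID
            elif term in first_seen:
                where.append(f"{ref} = {first_seen[term]}")  # shared variable -> self-join
            else:
                first_seen[term] = ref
    tables = [f"IdTriples t{i}" for i in range(len(patterns))]
    select = []
    for var in project:  # only projected variables are mapped back to values
        tables.append(f"UriMap u_{var}")
        where.append(f"u_{var}.UriID = {first_seen['?' + var]}")
        select.append(f"u_{var}.UriValue AS {var}")
    return f"SELECT {', '.join(select)} FROM {', '.join(tables)} WHERE {' AND '.join(where)}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UriMap (UriID INTEGER PRIMARY KEY, UriValue TEXT UNIQUE)")
conn.execute("CREATE TABLE IdTriples (ModelID INTEGER, SubjectID INTEGER, PropertyID INTEGER, ObjectID INTEGER)")

family = [
    ("John", "brotherOf", "Mary"),
    ("Mary", "parentOf", "Matt"),
    ("John", "name", '"John"'),
    ("Mary", "name", '"Mary"'),
    ("Matt", "name", '"Matt"'),
]
for triple in family:  # load model 1, assigning IDs through UriMap
    ids = []
    for term in triple:
        conn.execute("INSERT OR IGNORE INTO UriMap (UriValue) VALUES (?)", (term,))
        ids.append(conn.execute("SELECT UriID FROM UriMap WHERE UriValue = ?", (term,)).fetchone()[0])
    conn.execute("INSERT INTO IdTriples VALUES (1, ?, ?, ?)", ids)

uri_ids = dict(conn.execute("SELECT UriValue, UriID FROM UriMap"))
pattern = [("?x", "brotherOf", "?y"), ("?y", "name", '"Mary"'), ("?x", "name", "?n")]
sql = generate_sql(pattern, model_id=1, project=["x", "n"], uri_ids=uri_ids)
print(sql)
print(conn.execute(sql).fetchall())
```

Because `y` is not projected, no UriMap join is generated for it. This is the idea behind the projection-list optimization evaluated later in the talk.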

22
Optimization: Table Function Rewrite
  • TableRewriteSQL()
    – Takes an RDF query (specified via arguments) as input
    – Generates a SQL string
  • Substitute the table function call with the generated SQL string
  • Reparse and execute the resulting query
  • Advantages
    – Avoids the execution-time overhead (linear in the number of result rows) associated with the table function infrastructure
    – Leverages SQL optimizer capabilities to optimize the resulting query (including filter condition pushdown)

23
Optimization: Materialized Join Views
  • Generic materialized join views (MJVs)
    – Subject-Subject, Object-Subject, …
  • Subject-property matrix MJVs (SPMJVs)
    – Custom, workload-based (e.g., frequent search patterns)
  • Example: select student name, university, and age
      Select r, u, a
      (?r rdf:type Student)
      (?r enrolledAt ?u)
      (?r age ?a)
    – SPMJV: <Student enrolledAt age>
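SQLite has no materialized views, so in this sketch a `CREATE TABLE ... AS SELECT` snapshot stands in for the SPMJV <Student enrolledAt age>, with no incremental maintenance. The table names and rows are hypothetical. The point is that, once the matrix exists, the three-triple pattern becomes a single-table scan instead of a three-way self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [  # hypothetical data
    ("Ann", "rdf:type", "Student"), ("Ann", "enrolledAt", "MIT"), ("Ann", "age", "23"),
    ("Bob", "rdf:type", "Student"), ("Bob", "enrolledAt", "CMU"), ("Bob", "age", "27"),
    ("Cat", "rdf:type", "Professor"), ("Cat", "age", "45"),
])

# Build the subject-property matrix once: one row per Student subject,
# one column per property (enrolledAt, age).
conn.execute("""
    CREATE TABLE spmjv_student AS
    SELECT t1.s AS r, t2.o AS enrolledAt, t3.o AS age
    FROM triples t1
    JOIN triples t2 ON t2.s = t1.s AND t2.p = 'enrolledAt'
    JOIN triples t3 ON t3.s = t1.s AND t3.p = 'age'
    WHERE t1.p = 'rdf:type' AND t1.o = 'Student'
""")

# The pattern (?r rdf:type Student) (?r enrolledAt ?u) (?r age ?a) is now one scan.
rows = conn.execute("SELECT r, enrolledAt, age FROM spmjv_student ORDER BY r").fetchall()
print(rows)
```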

24
Performance
25
Dataset
  • WordNet: lexical database for the English language
  • UniProt: large scale (80 million triples)
    – Protein and annotation data

26
Experiments
  • Varying number of triples in search pattern
  • Varying filter conditions
  • Varying projection list
  • Large-scale RDF data
  • Subject-property MJVs

27
Varying Number of Triples
  • (?a wn:hyponymOf ?b)
    (?b wn:hyponymOf ?c)
    …
  • Increasing number of self-joins
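This small Python/SQLite sketch shows how the test pattern grows. n triple patterns become n instances of a `triples` table linked by n - 1 self-join conditions. `chain_query` is a hypothetical helper, and the three-edge hyponym chain is invented data, not WordNet.

```python
import sqlite3


def chain_query(n: int, prop: str = "wn:hyponymOf") -> str:
    """SQL for (?v0 prop ?v1) (?v1 prop ?v2) ... with n triple patterns.

    Each shared variable ?v1 .. ?v(n-1) adds one self-join condition.
    """
    tables = ", ".join(f"triples t{i}" for i in range(n))
    conditions = [f"t{i}.p = '{prop}'" for i in range(n)]
    conditions += [f"t{i}.o = t{i + 1}.s" for i in range(n - 1)]  # n - 1 self-joins
    columns = ["t0.s AS v0"] + [f"t{i}.o AS v{i + 1}" for i in range(n)]
    return f"SELECT {', '.join(columns)} FROM {tables} WHERE {' AND '.join(conditions)}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, 'wn:hyponymOf', ?)",
    [("dog", "canine"), ("canine", "carnivore"), ("carnivore", "mammal")],
)

for n in range(1, 5):
    rows = conn.execute(chain_query(n)).fetchall()
    print(f"{n} patterns, {n - 1} self-joins: {rows}")
```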

28
Varying Number of Triples
29
Varying Projection List
  • (?c0 wn:wordForm ?word)
    (?c0 wn:wordForm ?syn1)
    (?c1 wn:wordForm ?syn1)
    … (5 triples)
  • Benefit of the projection list optimization
    – Eliminates joins with the UriMap table for variables not referenced outside of RDF_MATCH

30
Varying Projection List
31
Large-Scale RDF Data
  • UniProt: 10M, 20M, 40M, 80M triples
  • 6 example queries provided with UniProt
  • Number of matches remains constant as the dataset size changes (ROWNUM)

33
Query Response Times
34
Conclusions
35
Conclusions and Future Work
  • SQL-based RDF querying scheme
    – RDF_MATCH table function
    – Supports graph-pattern-based queries on RDF data with RDFS and user-defined rules
  • Efficient execution
    – Table function rewrite
    – Materialized join views: generic and subject-property
    – Rule indexes
  • Future work
    – OPTIONAL support: outer join
    – Provenance support
