Rewriting Nested XML Queries Using Nested Views - PowerPoint PPT Presentation

Provided by: nic171
Learn more at: https://cseweb.ucsd.edu
Transcript and Presenter's Notes

1
Rewriting Nested XML Queries Using Nested Views
  • Nicola Onose
  • joint work with Alin Deutsch, Yannis Papakonstantinou, Emiran Curtmola
  • University of California, San Diego

2
The problem
INTRO
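Can we answer Q using only view access paths?
[Figure: the query Q maps the input XML data to the query result; the views V1, ..., Vn are evaluated over the same input and materialized as docV1, ..., docVn.]
  • views defined by queries V1, ..., Vn and
    materialized as docV1, ..., docVn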

3
The problem
INTRO
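[Figure: the query Q maps the input XML data to the query result; the rewriting query R (?) must produce the same result from docV1, ..., docVn, the materialized views V1, ..., Vn.]
  • views defined by queries V1, ..., Vn and
    materialized as docV1, ..., docVn
  • is there a query R such that
    R(V1(Input), ..., Vn(Input)) = Q(Input)?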
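
Stated slightly more formally, the question on this slide can be sketched as follows. The symbol I for an arbitrary input document is our own notation, not taken from the slides, and the equality is read as "same result on every input":

```latex
\exists R \;\; \forall I : \quad R\bigl(V_1(I), \dots, V_n(I)\bigr) \;=\; Q(I)
```

The quantifier order matters: a single R has to work for all inputs, not a different R per input.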

4
Motivation: caching, indexes
INTRO
[Figure: the rewriting query R computes the result of Q from docV1, ..., docVn, the materialized views V1, ..., Vn, which are faster to access than the original input XML data.]
  • caching: answer new queries using the results of
    previously answered ones
  • (partial) indexes: materialized references to
    frequently accessed parts of the data

5
Motivation: security views
INTRO
[Figure: Q over the input XML data versus a rewriting query R (?) over docV1, ..., docVn, where V1, ..., Vn are the security views (permitted queries).]
  • checking the existence of R is a security problem:
    allow only queries that can be expressed in terms
    of certain permitted queries, the security views

6
Motivation: data integration
INTRO
[Figure: Q is posed against a virtual global DB; the rewriting query R runs over source1, ..., sourcen; local/global mappings are expressed as views.]
  • data integration: given a query expressed in
    global terms, rewrite it using the descriptions
    of the particular sources

7
Rewritings enabled by pattern matching
INTRO
  • Previous literature: find parts of the query that
    are precomputed by the views.
  • How to decide that? Match the patterns of the
    views into the query.
  • In the relational case, patterns were tableaux
    (conjunctive queries).
  • For XPath: tree patterns.
  • Matching XML queries?
  • (until recently) no pattern-based description of
    XQuery semantics
  • Nested XML Tableaux (NEXT) come to fill the
    gap: "The NEXT Logical Framework for XQuery",
    A. Deutsch et al., VLDB 2004

8
Scope of Our Approach
INTRO
Tree Patterns: cover XPath
NEXT: extends tree patterns with
  - nested for-loops
  - joins
  - element construction etc.
NEXT+: extends NEXT to the whole XQuery language, including
  - function calls
  - universal quantification
  - disjunction, negation etc.
  • Nested XML Tableaux (NEXT) extend previous work
    on tree patterns.
  • NEXT+ extends NEXT to the whole of XQuery.

9
Scope of Our Approach
INTRO
Tree Patterns: cover XPath
NEXT: extends tree patterns with
  - nested for-loops
  - joins
  - element construction etc.
NEXT+: extends NEXT to the whole XQuery language, including
  - function calls
  - universal quantification
  - disjunction, negation etc.
completeness guarantee: if a rewriting exists, we
will find one
soundness guarantee: if a rewriting is found, it
is equivalent to the original query
10
Rewriting using views: example
INTRO
Query Q (group titles by author): for each
distinct author, output the titles of his/her
books.
View V (group authors by title): for each book,
output its title and the list of authors.
The result of the view is cached and has faster
access time than getting the data directly from
the source.
Rewriting R: scan the view and create an entry
for each distinct author in the view output; add
to it all the titles of the respective author.
[Figure: bib.xml as a tree of book elements with author and title children, e.g. the title "Data on the Web".]
11
Rewriting using views: example
INTRO
Query Q (group titles by author): for each
distinct author, output the titles of his/her
books.
View V (group authors by title):
  for $b1 in doc//book, $t1 in $b1/title
  return <authorlist>{ $t1, $b1/author }</authorlist>
Previous work captures:
  - XPath navigation
Rewriting R: scan the view and create an entry
for each distinct author in the view output; add
to it all the titles of the respective author.
12
Rewriting using views: example
INTRO
Query Q (group titles by author):
  for $a in distinct-values(doc//book[title]/author)
  return <bibentry>{ $a,
           for $b in doc//book, $t in $b/title
           where some $a1 in $b/author satisfies $a1 eq $a
           return $t }</bibentry>
View V (group authors by title):
  for $b1 in doc//book, $t1 in $b1/title
  return <authorlist>{ $t1, $b1/author }</authorlist>
  • Previous work captures:
      - XPath navigation
  • NEXT captures:
      - XPath navigation
      - nested for loops
      - joins
      - element construction etc.

13
Rewriting using views: example
INTRO
Query Q (group titles by author):
  for $a in distinct-values(doc//book[title]/author)
  return <bibentry>{ $a,
           for $b in doc//book, $t in $b/title
           where some $a1 in $b/author satisfies $a1 eq $a
           return $t }</bibentry>
View V (group authors by title):
  for $b1 in doc//book, $t1 in $b1/title
  return <authorlist>{ $t1, $b1/author }</authorlist>
Highlighted on this slide: the variables $a and $a1, which the query joins with $a1 eq $a.
  • Previous work captures:
      - XPath navigation
  • NEXT captures:
      - XPath navigation
      - nested for loops
      - joins
      - element construction etc.

15
Rewriting using views: example
INTRO
Query Q (group titles by author):
  for $a in distinct-values(doc//book[title]/author)
  return <bibentry>{ $a,
           for $b in doc//book, $t in $b/title
           where some $a1 in $b/author satisfies $a1 eq $a
           return $t }</bibentry>
View V (group authors by title):
  for $b1 in doc//book, $t1 in $b1/title
  return <authorlist>{ $t1, $b1/author }</authorlist>
Rewriting R:
  for $a3 in distinct-values(docV/authorlist[title]/author)
  return <bibentry>{ $a3,
           for $p in docV/authorlist, $t3 in $p/title
           where some $a4 in $p/author satisfies $a4 eq $a3
           return $t3 }</bibentry>
In R, docV is bound to the root of the view output, and the paths under docV navigate inside the view output.
[Figure: bib.xml as a tree of book elements with author and title children, e.g. the title "Data on the Web".]
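
To make the example concrete, here is a minimal Python sketch that mimics Q, V and R on a toy in-memory stand-in for bib.xml and checks that R over the view output agrees with Q over the source. The data layout (dicts with one title and a list of authors), the second book entry, and the function names are illustrative assumptions; the slides only define the XQuery expressions.

```python
def distinct(values):
    """Distinct values in first-occurrence order (stand-in for distinct-values)."""
    return list(dict.fromkeys(values))


def query_q(bib):
    """Q: for each distinct author, the titles of his/her books."""
    authors = distinct(a for b in bib for a in b["authors"])
    # 'a in b["authors"]' plays the role of: some $a1 in $b/author satisfies $a1 eq $a
    return [(a, [b["title"] for b in bib if a in b["authors"]]) for a in authors]


def view_v(bib):
    """V: one (title, authors) entry per book, like an <authorlist> element."""
    return [(b["title"], list(b["authors"])) for b in bib]


def rewriting_r(doc_v):
    """R: same grouping as Q, but reading only the view output docV."""
    authors = distinct(a for _, auths in doc_v for a in auths)
    return [(a3, [t3 for t3, auths in doc_v if a3 in auths]) for a3 in authors]


# Illustrative data only; every book has exactly one title here.
BIB = [
    {"title": "Data on the Web", "authors": ["Abiteboul", "Buneman", "Suciu"]},
    {"title": "Foundations of Databases", "authors": ["Abiteboul", "Hull", "Vianu"]},
]

doc_v = view_v(BIB)                        # materialize the view once
assert rewriting_r(doc_v) == query_q(BIB)  # R(V(Input)) = Q(Input) on this input
```

Agreement on one input is of course only a sanity check; the equivalence test described later in the talk has to hold for every input.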
16
Outline
  • NEXT (NEsted XML Tableaux)
  • Rewriting Algorithm and Extensions
  • Experiments
  • Previous Work
  • Conclusions

18
Architecture of the NEXT framework
NEXT
[Figure: architecture of the framework, top to bottom]
  XQuery query and views
  -> Normalization
  -> Nested XML Tableaux (NEXT), i.e. patterns
  -> Logical Optimization (Rewriting Using Views, Minimization)
  -> Nested XML Tableaux (NEXT)
  -> Logical Plan, for the Plan Execution Engine
  -> Translate to XQuery, for any XQuery processor
  Callouts in the figure: "presented at this conference", "VLDB04"
19
The need for normalization
NEXT
  for $a in distinct-values(doc//book[title]/author)
  return <bibentry>{ $a,
           for $b in doc//book, $t in $b/title
           where some $a1 in $b/author satisfies $a1 eq $a
           return $t }</bibentry>
[Figure: XQuery query and views -> Normalization -> Nested XML Tableaux (NEXT)]
20
Normalization into NEXT
NEXT
Before normalization:
  for $a in distinct-values(doc//book[title]/author)
  return <bibentry>{ $a,
           for $b in doc//book, $t in $b/title
           where some $a1 in $b/author satisfies $a1 eq $a
           return $t }</bibentry>
[Figure: XQuery query and views -> Normalization -> Nested XML Tableaux (NEXT)]
After turning the "some ... satisfies" condition into a join:
  for $a in distinct-values(doc//book[title]/author)
  return <bibentry>{ $a,
           for $b in doc//book, $a1 in $b/author, $t in $b/title
           where $a1 eq $a
           return $t }</bibentry>
21
Normalization into NEXT
NEXT
Before normalization:
  for $a in distinct-values(doc//book[title]/author)
  return <bibentry>{ $a,
           for $b in doc//book, $t in $b/title
           where some $a1 in $b/author satisfies $a1 eq $a
           return $t }</bibentry>
[Figure: XQuery query and views -> Normalization -> Nested XML Tableaux (NEXT)]
After normalization, with an explicit groupby:
  for $a in distinct-values(doc//book[title]/author)
  return <bibentry>{ $a,
           for $b in doc//book, $a1 in $b/author, $t in $b/title
           where $a1 eq $a
           groupby $b, $t
           return $t }</bibentry>
cardinality?
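
The cardinality question can be illustrated with a small Python sketch. It assumes, for illustration only, that a book may list the same author value twice; in that case the plain join emits a title once per matching author, while the original `some` condition emits it once per (book, title) pair. Grouping on the book and title bindings restores the original cardinality. The data layout and function names are hypothetical, and list positions stand in for node identities.

```python
def titles_some(bib, a):
    """where some $a1 in $b/author satisfies $a1 eq $a"""
    return [t for b in bib for t in b["titles"]
            if any(a1 == a for a1 in b["authors"])]


def titles_join(bib, a):
    """Join form without groupby: one output per matching ($b, $a1, $t) binding."""
    return [t for b in bib for a1 in b["authors"] for t in b["titles"] if a1 == a]


def titles_join_groupby(bib, a):
    """Join form with 'groupby $b, $t': one output per distinct (book, title) binding."""
    seen, out = set(), []
    for i, b in enumerate(bib):
        for a1 in b["authors"]:
            for j, t in enumerate(b["titles"]):
                key = (i, j)  # positions stand in for the node ids of $b and $t
                if a1 == a and key not in seen:
                    seen.add(key)
                    out.append(t)
    return out


# Hypothetical data: the first book lists the author value "Ann" twice.
SAMPLE = [
    {"titles": ["T1"], "authors": ["Ann", "Ann"]},
    {"titles": ["T2"], "authors": ["Bob", "Ann"]},
]

print(titles_some(SAMPLE, "Ann"))          # one entry per (book, title)
print(titles_join(SAMPLE, "Ann"))          # T1 appears twice
print(titles_join_groupby(SAMPLE, "Ann"))  # duplicates removed again
```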

NEXT
22
NEXT Patterns
NEXT
  • alternative way of defining the XQuery semantics
    (but equivalent to the standard), given by
    matching patterns

View V, with outer block B1(V) and nested block B2(V):
  for $b1 in doc//book, $t1 in $b1/title
  groupby $b1, $t1
  return <authorlist>{ $t1,
           for $a2 in $b1/author
           groupby $a2
           return $a2 }</authorlist>
  • graphical representation of NEXT nested patterns

[Figure: V as a forest of tree patterns. B1(V): doc//book(b1)/title(t1); groupby b1, t1; return <authorlist> t1, B2(V) </authorlist>. B2(V): book(b1)/author(a2); groupby a2; return a2.]
23
NEXT Patterns
NEXT
  • alternative way of defining the XQuery semantics
    (but equivalent to the standard), given by
    matching patterns

View V, with outer block B1(V) and nested block B2(V):
  for $b1 in doc//book, $t1 in $b1/title
  groupby $b1, $t1
  return <authorlist>{ $t1,
           for $a2 in $b1/author
           groupby $a2
           return $a2 }</authorlist>
  • graphical representation of NEXT nested patterns

[Figure: B1(V): doc//book(b1)/title(t1); groupby b1, t1; return <authorlist> t1, B2(V) </authorlist>. B2(V): book(b1)/author(a2); groupby a2; return a2. Callouts: "descendant navigation" (doc to book) and "child navigation" (book to title).]
24
NEXT Patterns
NEXT
  • alternative way of defining the XQuery semantics
    (but equivalent to the standard), given by
    matching patterns

View V, with outer block B1(V) and nested block B2(V):
  for $b1 in doc//book, $t1 in $b1/title
  groupby $b1, $t1
  return <authorlist>{ $t1,
           for $a2 in $b1/author
           groupby $a2
           return $a2 }</authorlist>
  • graphical representation of NEXT nested patterns

[Figure: B1(V): doc//book(b1)/title(t1); groupby b1, t1; return <authorlist> t1, B2(V) </authorlist>. B2(V): book(b1)/author(a2); groupby a2; return a2. Callout: "return function", on <authorlist> t1, B2(V) </authorlist>.]
25
NEXT Patterns
NEXT
  • alternative way of defining the XQuery semantics
    (but equivalent to the standard), given by
    matching patterns

View V, with outer block B1(V) and nested block B2(V):
  for $b1 in doc//book, $t1 in $b1/title
  groupby $b1, $t1
  return <authorlist>{ $t1,
           for $a2 in $b1/author
           groupby $a2
           return $a2 }</authorlist>
  • graphical representation of NEXT nested patterns

[Figure: B1(V): doc//book(b1)/title(t1); groupby b1, t1; return <authorlist> t1, B2(V) </authorlist>. B2(V): book(b1)/author(a2); groupby a2; return a2. Callout: "list of groupby variables", on b1, t1.]
26
NEXT Patterns
NEXT
  • alternative way of defining the XQuery semantics
    (but equivalent to the standard), given by
    matching patterns

Query Q, with outer block B1(Q) and nested block B2(Q):
  for $b0 in doc//book, $t0 in $b0/title, $a in $b0/author
  groupby $a
  return <bibentry>{ $a,
           for $b in doc//book, $a1 in $b/author, $t in $b/title
           where $a1 eq $a
           groupby $b, $t
           return $t }</bibentry>
View V, with outer block B1(V) and nested block B2(V):
  for $b1 in doc//book, $t1 in $b1/title
  groupby $b1, $t1
  return <authorlist>{ $t1,
           for $a2 in $b1/author
           groupby $a2
           return $a2 }</authorlist>
  • graphical representation of NEXT nested patterns

[Figure, view side: B1(V): doc//book(b1)/title(t1); groupby b1, t1; return <authorlist> t1, B2(V) </authorlist>. B2(V): book(b1)/author(a2); groupby a2; return a2.]
[Figure, query side: B1(Q): doc//book(b0) with title(t0) and author(a); groupby a; return <bibentry> a, B2(Q) </bibentry>. B2(Q): doc//book(b) with title(t) and author(a1); groupby b, t; return t.]
27
NEXT Patterns
NEXT
  • alternative way of defining the XQuery semantics
    (but equivalent to the standard), given by
    matching patterns

Query Q:
  for $b0 in doc//book, $t0 in $b0/title, $a in $b0/author
  groupby $a
  return <bibentry>{ $a,
           for $b in doc//book, $a1 in $b/author, $t in $b/title
           where $a1 eq $a
           groupby $b, $t
           return $t }</bibentry>
View V:
  for $b1 in doc//book, $t1 in $b1/title
  groupby $b1, $t1
  return <authorlist>{ $t1,
           for $a2 in $b1/author
           groupby $a2
           return $a2 }</authorlist>
  • graphical representation of NEXT nested patterns

[Figure, view side: B1(V): doc//book(b1)/title(t1); groupby b1, t1; return <authorlist> t1, B2(V) </authorlist>. B2(V): book(b1)/author(a2); groupby a2; return a2.]
[Figure, query side: B1(Q): doc//book(b0) with title(t0) and author(a); groupby a; return <bibentry> a, B2(Q) </bibentry>. B2(Q): doc//book(b) with title(t) and author(a1); groupby b, t; return t.]
28
Outline
  • NEXT (NEsted XML Tableaux)
  • Rewriting Algorithm and Extensions
  • Experiments
  • Previous Work
  • Conclusions

29
Architecture of the NEXT framework
NEXT
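[Figure: architecture of the framework, top to bottom; the callout "rewriting algorithm" marks the part covered next]
  XQuery query and views
  -> Normalization
  -> Nested XML Tableaux (NEXT)
  -> Logical Optimization (Rewriting Using Views, Minimization)
  -> Nested XML Tableaux (NEXT)
  -> Logical Plan, for the Plan Execution Engine
  -> Translate to XQuery, for an independent XQuery processor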
30
Overview of the Rewriting Algorithm
REWRITING ALGORITHM
  • Input: query Q, views V
  • Phase 1: detect alternative access paths towards the
    variable bindings through the views
  • Phase 2: build a candidate rewriting R that uses only the
    access paths from phase 1
  • Phase 3: check that R is equivalent to Q

[Figure: query Q -> access paths through V -> candidate rewriting (view access paths only)]
31
Step 1: Detect View Access Paths
REWRITING ALGORITHM
  • access paths: ways of accessing data using the
    view
  • identify matching subqueries (extended tree
    pattern matching)
  • find a mapping and add navigation from the view
    return

[Figure: the view pattern next to the query body. View: doc//book(b1)/title(t1), returning <authorlist> t1, B2(V) </authorlist>, with nested block book(b1)/author(a2) returning a2. Query body: doc//book(b0) with title(t0) and author(a); nested block doc//book(b) with author(a1) and title(t).]
32
Step 1: Detect View Access Paths
REWRITING ALGORITHM
  • access paths: ways of accessing data using the
    view
  • identify matching subqueries (extended tree
    pattern matching)
  • find a mapping and add navigation from the view
    return

[Figure: view, query body and extended query. View: doc//book(b1)/title(t1), returning <authorlist> t1, B2(V) </authorlist>, with nested block book(b1)/author(a2) returning a2. Query body: doc//book(b0) with title(t0) and author(a); nested block doc//book(b) with author(a1) and title(t). Extended query: the outer block now also contains docV/authorlist(p0)/title(t2).]
33
Step 1: Detect View Access Paths
REWRITING ALGORITHM
  • access paths: ways of accessing data using the
    view
  • identify matching subqueries (extended tree
    pattern matching)
  • find a mapping and add navigation from the view
    return
  • and another one

[Figure: view, query body and extended query. View: doc//book(b1)/title(t1), returning <authorlist> t1, B2(V) </authorlist>, with nested block book(b1)/author(a2) returning a2. Query body: doc//book(b0) with title(t0) and author(a); nested block doc//book(b) with author(a1) and title(t). Extended query: the outer block now also contains docV/authorlist(p0) with title(t2) and author(a3).]
34
Step 1: Detect View Access Paths
REWRITING ALGORITHM
  • access paths: ways of accessing data using the
    view
  • identify matching subqueries (extended tree
    pattern matching)
  • find a mapping and add navigation from the view
    return
  • and another one
  • computing all such mappings yields a query extension
    that uses only view access paths

[Figure: view, query body and extended query. View: doc//book(b1)/title(t1), returning <authorlist> t1, B2(V) </authorlist>, with nested block book(b1)/author(a2) returning a2. Query body: doc//book(b0) with title(t0) and author(a); nested block doc//book(b) with author(a1) and title(t). Query extension: the outer block gains docV/authorlist(p0) with title(t2) and author(a3); the nested block gains docV/authorlist(p) with author(a4) and title(t3).]
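
The core of this step, finding a mapping from a view pattern into the query pattern, can be sketched in Python as a brute-force search for label-preserving and edge-preserving mappings. This is an illustrative simplification and not the algorithm from the paper: the nested view blocks are flattened into one tree, the search is exponential, and the dictionary layout and function names are assumptions made for this example.

```python
from itertools import product


def reachable(pattern, start):
    """All nodes reachable from 'start' through one or more pattern edges."""
    children = {}
    for parent, child, _axis in pattern["edges"]:
        children.setdefault(parent, []).append(child)
    seen, stack = set(), list(children.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen


def find_mappings(view, query):
    """Enumerate mappings h from view nodes to query nodes such that labels
    agree, child edges map to child edges, and descendant edges map to
    paths of length >= 1 in the query pattern."""
    v_nodes = list(view["labels"])
    q_nodes = list(query["labels"])
    q_child = {(p, c) for p, c, axis in query["edges"] if axis == "child"}
    found = []
    for image in product(q_nodes, repeat=len(v_nodes)):
        h = dict(zip(v_nodes, image))
        if any(view["labels"][v] != query["labels"][h[v]] for v in v_nodes):
            continue
        edges_ok = all(
            (h[p], h[c]) in q_child if axis == "child"
            else h[c] in reachable(query, h[p])
            for p, c, axis in view["edges"]
        )
        if edges_ok:
            found.append(h)
    return found


# View V flattened to one tree: doc//book(b1) with children title(t1), author(a2).
VIEW = {
    "labels": {"doc": "doc", "b1": "book", "t1": "title", "a2": "author"},
    "edges": [("doc", "b1", "desc"), ("b1", "t1", "child"), ("b1", "a2", "child")],
}
# Outer block of Q: doc//book(b0) with children title(t0), author(a).
QUERY = {
    "labels": {"doc": "doc", "b0": "book", "t0": "title", "a": "author"},
    "edges": [("doc", "b0", "desc"), ("b0", "t0", "child"), ("b0", "a", "child")],
}

for h in find_mappings(VIEW, QUERY):
    print(h)
```

Each mapping found this way tells which query variables are also reachable through the view output, which is what justifies adding the docV/authorlist navigation to the query.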
35
Step 2: Candidate Rewriting
REWRITING ALGORITHM
  • same return function as the initial query, but
    with other variable bindings

[Figure: the original query inside the extended query. B1(Q): doc//book(b0) with title(t0) and author(a), extended with docV/authorlist(p0) with title(t2) and author(a3); groupby a; return <bibentry> a, B2(Q) </bibentry>. B2(Q): doc//book(b) with author(a1) and title(t), extended with docV/authorlist(p) with author(a4) and title(t3); groupby b, t; return t.]
36
Step 2: Candidate Rewriting
REWRITING ALGORITHM
  • same return function as the initial query, but
    with other variable bindings

[Figure: original query and candidate rewriting. Original query: B1(Q): doc//book(b0) with title(t0) and author(a); groupby a; return <bibentry> a, B2(Q) </bibentry>. B2(Q): doc//book(b) with author(a1) and title(t); groupby b, t; return t. Candidate rewriting: B1(R): docV/authorlist(p0) with title(t2) and author(a3); groupby a3; return <bibentry> a3, B2(R) </bibentry>. B2(R): docV/authorlist(p) with author(a4) and title(t3); return t3.]
37
Step 3: Equivalence Check
REWRITING ALGORITHM
  • check that R is equivalent to Q: containment mappings
    defined on the tree of query blocks
  • and then (optional step) translate back to XQuery

Rewriting R:
  for $a3 in distinct-values(docV/authorlist[title]/author)
  return <bibentry>{ $a3,
           for $p in docV/authorlist, $t3 in $p/title
           where some $a4 in $p/author satisfies $a4 eq $a3
           return $t3 }</bibentry>
[Figure: B1(R): docV/authorlist(p0) with title(t2) and author(a3); groupby a3; return <bibentry> a3, B2(R) </bibentry>. B2(R): docV/authorlist(p) with title(t3) and author(a4); return t3.]
38
Under the Hood
REWRITING ALGORITHM
  • two types of equality: by value and by node id
  • mappings must take this into account
  • and so must the groupby clause
  • XQuery results are ordered. We consider rewritings
    that
      - do not respect order (for DB-centric
        applications)
      - respect order (for text-centric applications)
  • for rewritings that respect order: look for an
    ordering of the view access paths that preserves
    the original query order (details in the paper)
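
The difference between the two kinds of equality can be sketched in Python, where `==` on the text plays the role of value equality (XQuery `eq`) and the `is` operator plays the role of node identity (XQuery also has an `is` operator for nodes). The `Node` class and the sample values are hypothetical.

```python
class Node:
    """A minimal XML-like node: a label plus its text value."""

    def __init__(self, label, text):
        self.label = label
        self.text = text


def equal_by_value(n1, n2):
    """Value equality: the two nodes carry the same text."""
    return n1.text == n2.text


def equal_by_id(n1, n2):
    """Node-id equality: the two variables are bound to the very same node."""
    return n1 is n2


# Two distinct author elements (say, in two different books) with the same value.
author_in_book1 = Node("author", "Abiteboul")
author_in_book2 = Node("author", "Abiteboul")

print(equal_by_value(author_in_book1, author_in_book2))  # same text
print(equal_by_id(author_in_book1, author_in_book2))     # different nodes
```

A mapping that identifies two variables is only justified under the kind of equality the query actually uses, which is why the mappings and the groupby lists have to track it.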

39
Extensions to NEXT
REWRITING ALGORITHM
  • Extended NEXT to NEXT+
      - extends the pattern-based representation to the
        whole of XQuery
      - functions and other expressions (negation,
        disjunction, aggregates etc.) are modeled as
        uninterpreted functions
  • Extended the algorithm to use NEXT+: need to
    identify maximal subparts that are pure NEXT
    blocks

  for $x in doc/book
  where count( for $a in $x/author
               where $x/price eq 60
               groupby $a
               return $a ) eq count( ... )
  groupby $x
  return $x
40
Extensions to NEXT
REWRITING ALGORITHM
  • Extended NEXT to NEXT+
      - extends the pattern-based representation to the
        whole of XQuery
      - functions and other expressions (negation,
        disjunction, aggregates etc.) are modeled as
        uninterpreted functions
  • Extended the algorithm to use NEXT+: need to
    identify maximal subparts that are pure NEXT
    blocks.

  for $x in doc/book
  where count( for $a in $x/author
               where $x/price eq 60
               groupby $a
               return $a ) eq count( ... )
  groupby $x
  return $x
Callouts:
  - rewrite the outer block, disregarding function calls
  - rewrite the blocks inside function arguments, with
    free variables bound in upper blocks
41
Formal Guarantees
REWRITING ALGORITHM
  • The rewriting algorithm is sound
  • and complete for a large fragment of XQuery (the
    one that can be translated into NEXT), without
    order
  • Completeness means that if there are any
    rewritings, we are guaranteed to find at least
    one.
  • There is no hope for completeness for
      - ordered rewritings: equivalence is undecidable
      - expressions beyond NEXT: negation and universal
        quantification also lead to undecidability
  • In these cases, our algorithm is a best-effort
    approach, with guaranteed soundness.

42
Implementation (considerations)
REWRITING ALGORITHM
  • completeness guarantees come at a price: we must
    compute mappings between view and query
    patterns
  • in general, NP-complete, but PTIME if the
    patterns are trees (no equality conditions),
    based on M. Yannakakis, "Algorithms for acyclic
    database schemes", 1981
  • our goal: design an implementation whose running
    time is polynomial for pure tree patterns and
    degrades progressively with the number of added
    joins

43
Implementation in practice
REWRITING ALGORITHM
[Figure: V and Q are compiled into a query plan (SPJ) and an XML instance; evaluating the plan on the instance yields the mappings.]
  • when computing the query plan, apply techniques
    from the Yannakakis algorithm: push projections
    and selections
  • performance degrades with the number of
    equalities: the problem is NP-complete in the
    width of the view pattern (see the paper) and in
    PTIME when there are no join equalities.

44
Outline
  • NEXT (NEsted XML Tableaux)
  • Rewriting Algorithm and Extensions
  • Experiments
  • Previous Work
  • Conclusions

45
Experiments: Design
EXPERIMENTS
  • The running time of the algorithm increases with
      - the number of nested levels: mappings are
        computed block by block
      - the size of the pattern: the number of mapped
        and target nodes increases
      - the number of views: more patterns to match
  • Our experiments measured how the algorithm scales
    with these parameters.
  • We designed a configuration where we generated
    queries and views of increasing size and nesting
    depth.

46
Experiments: Implementation
EXPERIMENTS
Queries and views with similar basic patterns, in a
vertical chain of blocks
[Figure: blocks Bk and Bk+1 in a vertical chain; each block repeats a basic pattern rooted at doc, with nodes mk, mk+1 and branches c1, c2, ..., ci, each with a child a.]
  • Irrelevant views don't matter (they can be quickly
    discarded), so we create only relevant views (with
    mappings into the query)
  • split the query recursively into fragments,
    which become the views
  • make them overlap on basic patterns

47
Experiments: Good Scalability
EXPERIMENTS
d = depth (# of nested levels in a query)
b = breadth (# of basic patterns in a block)
1.25s for d = 16, b = 16 and 128 views
48
Previous work
  • rewriting XPath queries using XPath views:
    "Rewriting XPath Queries Using Materialized
    Views", W. Xu et al., VLDB 2005
  • rewriting XQuery using XPath views: "A Framework
    for Using Materialized XPath Views in XML Query
    Processing", A. Balmin et al., VLDB 2004
  • rewriting an XQuery with only one XQuery view that
    has to contain the query: "ACE-XQ: A CachE-aware
    XQuery Answering System", L. Chen et al., WebDB 2002
  • caching common XQuery subexpressions: "Implementing
    Memoization in a Streaming XQuery
    Processor", Y. Diao et al., XSym 2004

49
Conclusions
  • NEXT is a pattern-based representation that
    describes what the query result is and not how it
    is computed -> more opportunities for semantic
    optimizations
  • extensible to all of XQuery, using NEXT+
  • rewriting-using-views algorithm
      - sound for the whole language
      - complete for a large fragment of XQuery
      - good scalability
  • independent of the underlying algebra of the
    query processor

50
Online Demo
  • http://db.ucsd.edu/reform