KapQuilt: Semantic Caching for Quilt Queries --- A New Quilt Query Answerable By Cached Ones?

Provided by: Lic143
Learn more at: https://davis.wpi.edu
Transcript and Presenter's Notes


1
KapQuilt: Semantic Caching for Quilt Queries --- A New Quilt Query Answerable By Cached Ones?
  • Li Chen

2
Outline
  • Background
  • Motivation
  • Goals: a semantic cache for Quilt queries
  • Overall task list
  • Immediate task
  • Approaches for containment and rewriting
  • Case studies
  • Module design
  • Timetable

3
Background
[Diagram of related topics. Labels shown: query efficiency; query quality; query performance; dynamically decide materialized views; control concurrency of queries and updates; Argos; ECA, sweep; query optimization; semantic cache; materialized view maintenance; answer queries using views; data mining; database design; containment theory and algorithms; web site integration; data warehousing; web site management; views; independence of physical and logical data]
4
Dimensions of Semantic Caching
languages: SQL (simple select-project-join (SPJ); group, aggregation, query blocks), datalog, OQL, TSL, Quilt
containment relationships: fully contained, max-contained
outcomes: rewritten query, query plan
5
Motivations
  • What's new about semantic caching?
  • Web proxies cache only web page hits, not actual
    computed queries
  • Web information integration needs expressive XML
    queries
  • Semantic caching for XML queries is new
  • Quilt is a full-capability XML query language,
    promising for the integration of web information, and
    Kweelt is an implemented Quilt query engine

6
Goals
  • Goal: build a semantic caching (SC) system for Quilt queries
  • to better answer popular queries
  • quicker, less expensive and more up-to-date
  • KapQuilt comes to the rescue

Limitation: we start from the core subset of
Quilt queries, ignoring nested queries and
regular expression queries for now
7
KapQuilt System Architecture
[Architecture diagram. Labels shown: Client; KSP; remote query requests; Kweelt API; Kweelt Engine (Parser, Evaluator, DOM); KapQuilt (Query Decomposer, Query Matcher, Query Rewriter, Cost Estimator, CIS, DTDM); Query plans; PQ; RQ; XML Parser; Wrapper; Other Node Factories; data sources Doc and RDB]
8
Task List
  • Answer whether a query is computable from cached
    ones
  • If answerable, compute the PQ (probe query) and RQ
    (remainder query)
  • If there are many PQ candidates, pick the one that benefits most
  • Decide whether a query is worth caching, and when to
    cache it
  • When cache space is limited, apply a replacement
    policy
  • Decompose and coalesce the query segments in the
    cache
  • Concurrency control of queries and updates
  • Analyze costs in various web query architectures
  • Keep cached views always fresh
  • Integrate Argos with KapQuilt

9
Immediate Task! -- MQP project goal as well
  • Design and implement the core functions of KapQuilt
  • input
  • a set of cached queries S = {s1, s2, ...}
  • a new query q
  • output
  • a probe query (PQ)
  • may be null if q is not answerable at all
  • if not null, $PQ = \pi_A \sigma_c (s_1 \bowtie s_2 \bowtie \dots \bowtie s_n)$
  • a remainder query (RQ)
  • may be null if q is fully contained in S
  • if not null, the RQ goes down to be evaluated against the data
    sources

10
Approaches
  • Analyze quilt query process and its variable
    binding mechanism
  • Set up cache index structure (CIS) to represent
    elements of a quilt query
  • Warm up cache by initializing CIS with decomposed
    queries
  • Implement the query containment and rewriting
    algorithm for quilt
  • Conduct experimental studies for cost analysis
  • Integrate with Argos system for cached view
    maintenance

11
A Taxonomy for XML Query
SQL
XQL
OQL
XSL Patterns
XPointer
XML-QL
XQL-99
XPath
Quilt
12
Briefs on Quilt
  • Quilt is a functional language
  • A query is an expression, composed of
  • FLWR Expressions
  • FOR ... LET ... WHERE ... RETURN
  • Filters
  • XPath expressions
  • document("bids.xml")//bid[itemno="47"]/bid_amount
  • Operators and functions
  • Element Constructors
  • <bid>
  • <userid> $u </userid> ,
  • <bid_amount> $a </bid_amount>
  • </bid>

13
Data Flow in a FLWR Expression
XML
--> FOR/LET
--> List of tuples of bound variables:
    ($x = value, $y = value, $z = value), ($x = value, $y = value, $z = value), ($x = value, $y = value, $z = value)
--> WHERE
--> List of tuples of bound variables
--> RETURN
--> XML
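The data flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `bids` list is a hypothetical in-memory stand-in for an XML source, `flwr` is a made-up helper name, and LET bindings are left out for brevity. It is not how Kweelt evaluates queries.

```python
from itertools import product

# Hypothetical in-memory stand-in for an XML source such as bids.xml.
bids = [
    {"userid": "U01", "itemno": "47", "bid_amount": 35},
    {"userid": "U02", "itemno": "47", "bid_amount": 40},
    {"userid": "U03", "itemno": "12", "bid_amount": 15},
]

def flwr(for_sources, where, ret):
    """Evaluate a FOR ... WHERE ... RETURN pipeline over lists of items."""
    # FOR: build the list of tuples of bound variables (cartesian product).
    names = list(for_sources)
    tuples = [dict(zip(names, combo)) for combo in product(*for_sources.values())]
    # WHERE: keep only the tuples that satisfy the condition.
    kept = [t for t in tuples if where(t)]
    # RETURN: invoked once per surviving tuple to construct output.
    return [ret(t) for t in kept]

result = flwr(
    {"b": bids},
    where=lambda t: t["b"]["itemno"] == "47",
    ret=lambda t: "<bid><userid>%s</userid><bid_amount>%s</bid_amount></bid>"
    % (t["b"]["userid"], t["b"]["bid_amount"]),
)
```

Each stage consumes and produces a plain list, which mirrors the tuple-based evaluation described on the slide.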
14
Quilt Compared to XQL
  • A superset of XQL
  • Overcome shortcomings of XQL
  • no variable bindings, joins, transformations,
    ordering, aggregate functions, etc
  • no data integration from multiple XML sources
  • do semi-join, but in pretty non-intuitive syntax
  • Cover queries on structured document (including
    SGML), relational data even!

book[author = //book[title = 'Moby Dick']/author]
15
Quilt Query Process
  • Variable binding is an important means
  • a query can define multiple variables, in order
  • dependency relationships exist among variables
  • a tuple list is bound to each variable, condition
    evaluation and return invocation are tuple-based
  • tuple lists are handles to data tree components,
    of which answer tree is composed

16
Cache Index Structure (CIS)
  • A structure to capture the essential elements of
    a quilt query
  • What are the essential elements of a Quilt query?
  • variable bindings, conditions and return nodes
    all refer to element nodes in the DTD.
  • a query can be identified by its variable nodes V,
    return nodes T, condition nodes F and their
    dependency relationships
  • each element node in a DTD tree can be assigned a
    unique number (with a unique absolute XPath)

17
Example DTD
<?xml version="1.0"?>
<!DOCTYPE bib [
<!ELEMENT bib (book* )>
<!ELEMENT book (title, (author+ | editor+ ), publisher, price )>
<!ATTLIST book year CDATA #REQUIRED >
<!ELEMENT author (last, first )>
<!ELEMENT editor (last, first, affiliation )>
<!ELEMENT title (#PCDATA )>
<!ELEMENT last (#PCDATA )>
<!ELEMENT first (#PCDATA )>
<!ELEMENT affiliation (#PCDATA )>
<!ELEMENT publisher (#PCDATA )>
<!ELEMENT price (#PCDATA )>
]>
[Figure: DTD tree with the number assigned to each node]
bib (1)
  book (2)
    year (3): CDATA (4)
    title (5): PCDATA (6)
    author (7)
      last (8): PCDATA (9)
      first (10): PCDATA (11)
    editor (12)
      last (13): PCDATA (14)
      first (15): PCDATA (16)
      affiliation (17): PCDATA (18)
    publisher (19): PCDATA (20)
    price (21): PCDATA (22)
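The numbering in the DTD tree figure is consistent with a preorder walk that visits the year attribute before the child elements and counts text content as a leaf of its own. A minimal sketch of such a walk follows; the nested-tuple encoding, the `leaf` and `number_nodes` helpers, and the "@" prefix for attributes are assumptions made for this example, not the deck's DTDWalker.

```python
# Hypothetical encoding of the example DTD as (name, children) tuples.
# Attributes carry a leading "@" and come before child elements; text
# content is modelled as an explicit PCDATA/CDATA leaf.
def leaf(name, text="PCDATA"):
    return (name, [(text, [])])

BIB = ("bib", [
    ("book", [
        leaf("@year", "CDATA"),
        leaf("title"),
        ("author", [leaf("last"), leaf("first")]),
        ("editor", [leaf("last"), leaf("first"), leaf("affiliation")]),
        leaf("publisher"),
        leaf("price"),
    ]),
])

def number_nodes(tree):
    """Assign preorder numbers; return {absolute path: number}."""
    index = {}
    counter = 0

    def visit(node, parent_path):
        nonlocal counter
        name, children = node
        counter += 1
        path = parent_path + "/" + name
        index[path] = counter          # absolute path identifies the node
        for child in children:
            visit(child, path)

    visit(tree, "")
    return index

index = number_nodes(BIB)
```

With this encoding, `index["/bib/book/publisher"]` is 19 and `index["/bib/book/editor/affiliation"]` is 17, the same numbers the query samplers use for their condition and return nodes.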
18
Quilt Query Sampler I
Q1: <bib> FOR $book IN document("bib.xml")//book[@year.gt.1991 AND publisher="Addison-Wesley"] RETURN <book year=$book/@year>$book/title</book> </bib>
[Figure: CIS for Q1, with node numbers from the example DTD]
variable nodes
  v1: //book, i.e. /bib/book (node 2)
condition nodes
  f1: v1 @year.gt.1991 (node 3)
  f2: v1/publisher = "Addison-Wesley" (node 19)
return nodes
  t1: v1 @year (node 3)
  t2: v1/title (node 5)
example bindings of v1 (1 = condition satisfied, 0 = not satisfied)
  b1: year y1 = 1993, publisher p1, f1 = 1, f2 = 1
  b2: year y2 = 1995, publisher p2, f1 = 1, f2 = 0
  b3: year y3 = 1990, publisher p3, f1 = 0, f2 = 1
result
  r1: /book built from b1, with year y1 = 1993 and title t1
Quilt Query Sampler II
Q2: FOR $author IN DISTINCT document("bib.xml")//author, $book IN document("bib.xml")//book[author = $author] RETURN <result> $book/title, $author </result>
[Figure: CIS for Q2, with example bindings a1 to a3 of authors and b1 to b3 of books, and results r1 to r5]
variable nodes
  v1: /bib/book/author (node 7)
  v2: /bib/book[author = v1] (node 2)
return nodes
  t1: v2/title (node 5)
  t2: v1 (node 7)
result structure
  /result containing /bib/book[author = v1]/title (node 5) and /bib/book/author (node 7)
20
Quilt Query Sampler III
Q3: <results> FOR $author IN DISTINCT document("bib.xml")//author RETURN <result> $author, document("bib.xml")//book[author = $author]/title </result> </results>
[Figure: CIS for Q3, with example bindings a1 to a3 of authors and b1 to b3 of books, and results r1 to r3]
variable nodes
  v1: /bib/book/author (node 7)
return nodes
  t1: v1 (node 7)
  t2: /bib/book[author = v1]/title (node 5)
result structure
  /result containing /bib/book/author (node 7) and /bib/book[author = v1]/title (node 5)
21
Quilt Query Sampler IV
Q4: <books-with-prices> FOR $a_book IN document("prices.xml")//book[source = "www.amazon.com"], $b_book IN document("prices.xml")//book[source = "www.bn.com"][title = $a_book/title] RETURN <book-with-prices> $b_book/title, <price-amazon>$a_book/price/text()</price-amazon>, <price-bn>$b_book/price/text()</price-bn> </book-with-prices> </books-with-prices>
22
Quilt Query Sampler IV
[Figure: CIS for Q4, with example bindings b1 to b3 for each variable and example price values]
variable nodes
  v1: bib/book[source = "www.amazon.com"] (node 2)
  v2: bib/book[source = "www.bn.com"][title = v1/title] (node 2)
return nodes
  t1: v2/title (node 5)
  t2: v1/price/text() (node 22)
  t3: v2/price/text() (node 22)
result structure
  /book-with-prices containing title (node 5), price-amazon (node 22, PCDATA) and price-bn (node 22, PCDATA)
More Quilt Query Sampler
Q5: <bib> FOR $book IN document("bib.xml")//book[price.lt.50] RETURN <book year=$book/@year><editors>$book/editor</editors></book> </bib>
variable nodes
  v1: /bib/book (node 2)
condition nodes
  f1: v1/price.lt.50 (node 21)
return nodes
  t1: v1 @year (node 3)
  t2: v1/editor (node 12)
Q6: <bib> FOR $book IN document("bib.xml")//book[price.lt.50], $author IN /bib[book = $book]//author[last = "Abiteboul"] RETURN <book>$book/title, $book/price, $author</book> </bib>
variable nodes
  v1: /bib/book (node 2)
  v2: /bib[book = v1]//author (node 7)
condition nodes
  f1: v1/price.lt.50 (node 21)
return nodes
  t1: v1/title (node 5)
  t2: v1/price (node 21)
  t3: v2 (node 7)
24
Query Containment for Relational Queries
25
Our Containment Theorem
  • Given a set of cached queries S = {s1, s2, ...} and a
    new query q,
  • q is fully answerable by S if conditions 1 to 6 hold
26
Explanations
  • Every condition node f of q must either
    also be a condition node, with a looser
    predicate, of some si in the cache, or be a
    return node of some sj.
  • For every condition node fi of q, if it is not
    a return node of any sj, then it must be
    a condition node, with a looser predicate, of
    some si in the cache, and any other condition
    node fk of si must also be a condition node of q.
  • Alternatively, there is a subset of S whose condition nodes are
    a subset of those of q, but whose condition nodes
    and return nodes together are a superset of the condition
    nodes of q.
27
Explanations
  • Every return node t of q must also be a
    return node of some si in the cache.
  • For every pair of return nodes ti and tj of q, if
    their counterparts are in different segments si
    and sj, then there must be a common return node
    in si and sj.
  • For every return node t of q, if it is derived
    from a variable node v, then its counterpart in
    the cache must also be derived from the same
    variable node, and all the condition nodes
    derived from this v in the cache must also have
    counterparts derived from v in q.
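A simplified, single-segment version of these checks can be sketched as follows. The dictionary encoding of a query (condition nodes keyed by DTD node number, return nodes as a set), the `(operator, value)` predicate tuples and the function names are assumptions made for this illustration; the multi-segment rules (common return nodes, variable-node derivation) are deliberately left out.

```python
def looser_or_equal(cached, new):
    """True if every value satisfying `new` also satisfies `cached`."""
    c_op, c_val = cached
    n_op, n_val = new
    if cached == new:
        return True
    if c_op == ">":
        return (n_op == ">" and n_val >= c_val) or (n_op == "=" and n_val > c_val)
    if c_op == "<":
        return (n_op == "<" and n_val <= c_val) or (n_op == "=" and n_val < c_val)
    return False

def answerable_by(q, s):
    """Check the condition-node and return-node rules against one segment."""
    for node, pred in q["conds"].items():
        in_conds = node in s["conds"] and looser_or_equal(s["conds"][node], pred)
        in_returns = node in s["returns"]
        # Each condition node of q must match a looser condition node
        # or a return node of the cached segment.
        if not (in_conds or in_returns):
            return False
    # The cached segment must not carry a condition node that q lacks.
    if any(node not in q["conds"] for node in s["conds"]):
        return False
    # Every return node of q must be a return node of the cached segment.
    return q["returns"] <= s["returns"]

# Q1 cached as s1: @year (3) .gt. 1991, publisher (19) = "Addison-Wesley".
s1 = {"conds": {3: (">", 1991), 19: ("=", "Addison-Wesley")}, "returns": {3, 5}}
q_affiliation = {"conds": {17: ("=", "WPI")}, "returns": {3, 5}}
q_publisher = {"conds": {19: ("=", "Addison-Wesley")}, "returns": {3, 5}}
q_java = {
    "conds": {3: ("=", 1997), 5: ("like", "JAVA"), 19: ("=", "Addison-Wesley")},
    "returns": {3, 5},
}
```

Here `answerable_by(q_java, s1)` holds, while the affiliation query fails the first rule and the publisher-only query fails the second, matching the containment and rewriting cases in the deck.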
28
Query Rewriting Rules
If a query is judged to be computable from
cached views, the following rules determine
the rewritten q:
  • 1. Decide which filters to keep (not yet evaluated by
    any cached query) and which filters to remove
    (already evaluated by some cached query), and
    remember the cached queries S with
    established f mappings.
  • Keep every f that has a t match, and every
    f with a looser f match;
    they still appear as condition
    nodes in the probe query.
  • Remove every f with an exact f match.
  • For each non-exact f match, remember its s, so
    as to know which s to associate
    with the left-over filters.

29
Query Rewriting Rules (cont.)
  • 2. Figure out the semantic meaning of the
    newly constructed nodes
    in the returning structure of each cached
    query; they are associated
    with new XPaths that replace their
    old ones.
  • A newly constructed node can be seen as a
    renaming of some old
    element node. Return nodes usually appear
    under each newly constructed
    node, so a mapping from the new node to the
    old one can be inferred
    from those return nodes.
  • Replace the old XPaths in the new query q
    with the new XPaths to the
    newly constructed nodes in the cached views.
  • 3. When a query is rewritten using joins of
    more than one s with a common
    t pair, be sure to add those joins as new
    conditions.
  • If there is no variable binding in the new q, a
    new binding should be produced
    for one of the common t pair so that there is
    a way to join with its pair.
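Rule 1 can be sketched as a partition of the new query's filters. The encoding below (condition nodes keyed by DTD node number, predicates as `(operator, value)` tuples) and the `split_filters` name are assumptions for illustration, and the sketch presumes q has already been judged answerable. The sample data is the Rewriting I case against s1.

```python
def split_filters(q_conds, s_conds):
    """Partition q's filters into those kept in the probe query and those removed."""
    kept, removed = {}, {}
    for node, pred in q_conds.items():
        if s_conds.get(node) == pred:
            # Exact f match: the cached query already applied this filter.
            removed[node] = pred
        else:
            # t match or looser f match: re-apply it on the cached view.
            kept[node] = pred
    return kept, removed

# s1 (cached Q1): @year (3) .gt. 1991 and publisher (19) = "Addison-Wesley".
s1_conds = {3: (">", 1991), 19: ("=", "Addison-Wesley")}
# New query: @year = 1997 AND title like JAVA AND publisher = "Addison-Wesley".
q_conds = {3: ("=", 1997), 5: ("like", "JAVA"), 19: ("=", "Addison-Wesley")}

kept, removed = split_filters(q_conds, s1_conds)
```

The kept filters are the year and title conditions, which are the left-over filters of the rewritten query in Rewriting I; the publisher filter is dropped because s1 evaluated it exactly.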

30
Query Containment I
Suppose that queries q1, q2, q3, q4, q5, q6 are
cached as C = {s1, s2, s3, s4, s5, s6}, and a new query q
comes in.
case q of:
<bib> FOR $book IN document("bib.xml")//book[editor/affiliation = "WPI"] RETURN <book year=$book/@year>$book/title</book> </bib>
It does not even satisfy the first condition, so it is not answerable: f1 refers to
element node 17, which has no match in s1 to s6.
<bib> FOR $book IN document("bib.xml")//book[publisher = "Addison-Wesley"] RETURN <book year=$book/@year>$book/title</book> </bib>
It satisfies the first condition, but not the second
one, so it is not answerable: f1 <--> f2 in s1, but
the other condition node of s1, f1, is not a
condition node of q.
31
Query Rewriting I
case q of:
<bib> FOR $book IN /bib/book[@year = 1997 AND title like "JAVA" AND publisher = "Addison-Wesley"] RETURN <book year=$book/@year>$book/title</book> </bib>
It satisfies all the conditions, so it is answerable:
1) f1 <--> f1 in s1, f2 <--> t2 of s1, f3 <--> f2 in s1
2) there is no other f in s1
3) t1 <--> t1 in s1, t2 <--> t2 in s1
4) t1 and t2 are both from the same s1
5) t1 and t2 are derived from v1, and so are their counterparts in s1; v1 --> f1, f2 and v1 --> f1, f2
[Figure: in q, v1 is /bib/book with filters f1, f2, f3; the result structure of s1 is /book with t1: v1 @year (node 3) and t2: v1/title (node 5), so /bib/book is rewritten as /book[source = "s1"]]
Rewrite the query as:
<bib> FOR $book IN /book[source = "s1"][@year = 1997 AND title like "JAVA"] RETURN <book year=$book/@year>$book/title</book> </bib>
The left-over filters are @year = 1997 and title like "JAVA".
32
Query Rewriting II
case q of:
<bib> FOR $book IN /bib/book[@year.gt.1991 AND publisher = "Addison-Wesley" AND price.lt.50] RETURN <book>$book/title, <editors>$book/editor</editors></book> </bib>
It satisfies all the conditions, so it is answerable:
1) f1 <--> f1 in s1, f2 <--> f2 in s1, f3 <--> f1 of s5
2) there is no other f in s1 and s5
3) t1 <--> t2 in s1, t2 <--> t2 in s5
4) t1 in s1 = t1 in s5
5) t1 and t2 are derived from v1, and so are their counterparts in the cache; v1 --> f1, f2 and v1 --> f1, f2, f3
[Figure: in q, v1 is /bib/book with filters f1, f2, f3; the result structure of s1 is /book with t1: v1 @year (node 3) and t2: v1/title (node 5); the result structure of s5 is /book with t1: v1 @year (node 3) and t2: v1/editor; /bib/book is rewritten as /book1[source = "s1"] and /book2[source = "s5"]]
Rewrite the query as:
<bib> FOR $book1 IN /book[source = "s1"], $book2 IN /book[source = "s5"][title = $book1/title] RETURN <book>$book1/title, <editors>$book2/editor</editors></book> </bib>
33
Query Rewriting III
Suppose that q2 is cached in S, but q3 is
not cached; instead, it arrives as a new query.
case q of:
<bib> FOR $book IN document("bib.xml")//book[publisher = "Addison-Wesley"] RETURN <book year=$book/@year>$book/title</book> </bib>
<bib> FOR $book IN //book[@year = 1997 AND title like "JAVA" AND publisher = "Addison-Wesley"] RETURN <book year=$book/@year>$book/title</book> </bib>
34
Query Rewriting IV
Suppose that q2 is cached in S, but q3 is
not cached; instead, it arrives as a new query.
Q4: <books-with-prices> FOR $a_book IN document("prices.xml")//book[source = "www.amazon.com"], $b_book IN document("prices.xml")//book[source = "www.bn.com"][title = $a_book/title] RETURN <book-with-prices> $b_book/title, <price-amazon>$a_book/price/text()</price-amazon>, <price-bn>$b_book/price/text()</price-bn> </book-with-prices> </books-with-prices>
35
Input: Query q, Semantic Cache C
Output: Result of q
AnsweringQuery Procedure
  answerable, fullyAns <--- False
  T <--- current timestamp
  C = {s1, s2, ...} <--- set up CIS for each segment si
  s <--- set up CIS for q
  M <--- matched node set, null at the beginning
  R = RENs <--- not matched node set
  RC <--- CENs of s
  si <--- look for the first q-related si in C
  S <--- put si into a candidate set
  While (si can be found)
    answerable <--- True
    Ts of S <--- T
    (MS, RM) <--- query_trimming(si, s)
      MS <--- matching nodes of s and si
      RM <--- remaining nodes of s not covered by si
    R = R - RM
    M = M ∪ MS
    RC <--- RC combined with RemainingCENS
    If (R = null)
      fullyAns <--- True
      break
    si <--- next q-related segment in C
  PQ <--- query_rewriting(S, JoinCISs, M, RC)
  MatV <--- materialized view sets referred to by S
  ResPQ, Result <--- process PQ against MatV
  If (fullyAns = False)
    RQ <--- query_rewriting(S, JoinCISs, M, RC)
    ResRQ <--- process RQ at the server
    Result <--- coalesce(ResPQ, ResRQ)
    create a new segment Snew that contains the result of q
    Ts of Snew <--- T
    If there isn't enough space, do cache replacement
    Cache Snew
  return(Result).
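The core of the loop in this procedure, collecting candidate segments until no return node of q is left uncovered, can be sketched as below. This is a heavily simplified illustration: the cache is assumed to be a plain mapping from segment name to its set of return-node numbers, the function name is made up, and condition nodes, join information, timestamps and cache replacement are all ignored.

```python
def collect_candidates(q_returns, cache):
    """Collect candidate segments until q's return nodes are covered."""
    remaining = set(q_returns)   # R: return nodes not matched yet
    matched = set()              # M: matched node set
    candidates = []              # S: candidate segments
    for name, seg_returns in cache.items():
        covered = remaining & seg_returns
        if not covered:
            continue             # segment is not related to q
        candidates.append(name)
        matched |= covered
        remaining -= covered
        if not remaining:
            break                # q is fully answerable from the cache
    return candidates, matched, remaining

# Return nodes of the cached Q1 (s1) and Q5 (s5), by DTD node number.
cache = {"s1": {3, 5}, "s5": {3, 12}}

# Title (5) and editor (12): covered by s1 and s5 together.
full = collect_candidates({5, 12}, cache)
# Title (5) and price (21): price is left for a remainder query.
partial = collect_candidates({5, 21}, cache)
```

An empty `remaining` set corresponds to the fully answerable case; a non-empty one marks what a remainder query would still have to fetch from the data sources.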
36
Input: CIS structures for q (s) and a segment si;
       current candidate set S and JoinCISs
Output: judge whether si is related to q; if yes, add it into S,
        with matching nodes MS and remaining nodes RM
Query_trimming Procedure
  IsRelated <--- False
  MS <--- null
  RM <--- boundRENs of s
  RS <--- RENs in si
  RNS <--- RENs in all s of S
  RemainingCENS <--- boundCENs of s
  While (RM ≠ null)
    matchingRENS <--- match nodes in RM and RS by applying theorems
    if (matchingRENS ≠ null)
      commonSet <--- RNS ∩ RS
      if (commonSet ≠ null)
        IsRelated <--- True
        <si, sj, commonSet> <--- sj is the segment in S that has commonSet with si
        JoinCISs <--- add <si, sj, commonSet> into JoinCISs
      (MS, RM) <--- (matchingRENS, RM - M)
      RemainingCENS <--- left-over ones except exact-match CENS
  return (S, JoinCISs, MS, RM, RemainingCENS).
37
[Module design diagram. Labels shown:
  set up: DTD; DTDWalker; DTDTree (ElementNode); ENIndex; to-be-cached queries; QueryDecomposer; QueryIndex; VENIndex, CENIndex, RENIndex (VariableEleNode, ConditionEleNode, ReturnEleNode); CacheIndexStructs; MatView (ViewDTDTree)
  new query: QueryDecomposer; AnsweringQuery; NewQuery; QueryTrimmer; QueryRewritter; ProbeQuery; (MatchingCENPair, MatchingRENPair, MatchingENPair); Result; remainingRENs; RemainderQuery; ResultCoalescer
  decisions: contained? (Y); fully contained? (N)]
38
[Class diagram]
ElementNode
  enId: int, dtdRef: DTDTree, enName: String, absXpath: String, parentEN: ElementNode, childrenEN: Vector
  equalTo(EN): boolean, relativeXpathForm(EN): String
DTDTree
  dtdName: String, dtdLoc: String, rootEN: ElementNode, elementNodes: Vector
VariableEleNode
  venName: String, cisRef: CIS, parentVEN: VEN, childrenVEN: Vector, childrenCEN: Vector, childrenREN: Vector
ConditionEleNode
  cenName: String, cisRef: CIS, condition: String, parentVEN: VEN
  stricterThan(CEN): boolean
ReturnEleNode
  renName: String, cisRef: CIS, parentVEN: VEN
ViewDTDTree
  matchingENPairs: Vector
39
[Class diagram, continued]
ProbeQuery
  boundVENs: Vector, boundCENs: Vector, boundRENs: Vector, newDTD: ViewDTDTree
NewQuery
  candidateCISSet: Vector, matchingCENPairs: Vector, remainingCENs: Vector, matchingRENPairs: Vector, remainingRENs: Vector
CacheIndexStructs
  cisId: String, qstring: String, viewDTD: ViewDTDTree, matRef: MatView, boundVENs: Vector, boundCENs: Vector, boundRENs: Vector
  hasMoreCENsThan(CIS): boolean, hasOverlapTENs(MatchingRENPair, MatchingRENPair): boolean
MatchingRENPair
  newTEN: ReturnEleNode, oldTEN: ReturnEleNode, otherOldTENs: Vector
  diffCISFrom(MatchingRENPair): boolean, overlapTENWith(MatchingRENPair): boolean, sameParentVEN(): boolean, newWithPVENhasMoreCENs(): boolean
MatView
  matId: String, cisRef: CacheIndexStructs
MatchingCENPair
  newCEN: ConditionEleNode, oldCEN: ConditionEleNode, oldREN: ReturnEleNode, remainingCond: string
MatchingENPair
  newConstructEN: ElementNode, originalDtdEN: ElementNode, source: MatView
Other members shown: initialize, stricterThan(CEN): boolean
40
Timetable
  • By 11/15: design due, implementation starts
  • By 11/30: coding half finished
  • By 12/10: coding fully finished
  • By 12/20: integration finished
  • By 1/15: designed cases tested
  • By 1/30: experiments designed and run
  • By 2/15: experiment results collected
  • By 2/28: code documented, writing
  • By 3/15: summary

41
Task Assignment
  • Lily
  • design classes, containment, rewriting and
    candidate picking algorithms, design experiments
  • ideas for query decomposition, result
    combination, cache decomposing /coalesce,
    replacement policy, data updates handling
  • Jake
  • implement containment algo, ...
  • Ian
  • implement classes of EN, VEN, CEN, TEN, ...
  • Amar
  • module of rewriting algo

42
Implementation Toolsuites
  • JDK1.2, Servlet
  • XML Parser, DTD Parser
  • Quilt Parser, Kweelt(Quilt) Query Engine