Title: Web data management and distribution XQuery processing Part 4: A logical algebra for XQuery
1 Web data management and distribution. XQuery processing, Part 4: A logical algebra for XQuery
S. Abiteboul, I. Manolescu, P. Rigaux, P. Senellart (INRIA Saclay Île-de-France)
This course is based on I. Manolescu, Y. Papakonstantinou, "XQuery Midflight: Emerging Database-Oriented Paradigms and a Classification of Research Advances", ICDE 2005
2 XQDMA: an abstraction of the XQuery Data Model
- Labeled (ordered) trees
- Nodes
- Four kinds: document, element, attribute, text
- Single document node
- Element/attribute nodes are labeled with XQDM element/attribute names
- By convention, attribute names start with @
- Text nodes are labeled with XQDM values
- Nodes have unique identities
- Type function T labels every node with an XQDM type
- Edges
- The document node is the root and has exactly one child, which is an element
- Attribute nodes may appear only as children of element nodes
- Attribute nodes may only have text node children
- Text nodes may only be leaves
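The node kinds and edge rules above can be checked mechanically. The following is a minimal sketch, not part of the original course material: the `Node` class and the `check` function are hypothetical names, and the XQDM type function T is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

KINDS = {"document", "element", "attribute", "text"}


@dataclass(eq=False)  # eq=False keeps identity-based comparison ("unique identities")
class Node:
    kind: str
    label: str
    children: List["Node"] = field(default_factory=list)


def check(node: Node, parent: Optional[Node] = None) -> None:
    """Raise ValueError if the subtree rooted at `node` breaks an edge rule."""
    if node.kind not in KINDS:
        raise ValueError(f"unknown node kind: {node.kind}")
    if node.kind == "document":
        if parent is not None:
            raise ValueError("the document node must be the root")
        if len(node.children) != 1 or node.children[0].kind != "element":
            raise ValueError("the document node needs exactly one element child")
    if node.kind == "attribute":
        if parent is None or parent.kind != "element":
            raise ValueError("attributes may only be children of elements")
        if any(c.kind != "text" for c in node.children):
            raise ValueError("attributes may only have text children")
    if node.kind == "text" and node.children:
        raise ValueError("text nodes must be leaves")
    for child in node.children:
        check(child, node)


# Hypothetical fragment: an element with one @name attribute.
doc = Node("document", "group.xml", [
    Node("element", "group", [
        Node("attribute", "@name", [Node("text", "KadoP")]),
    ]),
])
check(doc)  # passes silently
```

A text node with a child, such as `Node("text", "x", [Node("text", "y")])`, is rejected with a `ValueError`.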
3 Equality Relationships
- Node ID-based equality =id (XQuery: is)
- Two nodes are id-equal if they are the same node
- Value-based equality =v (XQuery: eq)
- Two values are equal if the results of casting one or both into a common domain are equal. This depends on the values' types.
- 24 atomic types (XQDM, XSch); casting rules in XQFO
- Limited value-based equality
- Text nodes are limited value-based equal if their labels are equal
- Attribute/element nodes n1 and n2 are limited value-based equal if their labels are equal and
- (unordered) for every child of n1 there is an equal child of n2, and vice versa
- (obvious generalization to the ordered case)
- Limited value-based equality implies value-based equality
- Not vice versa (e.g., a text node with "005" is equal to a text node with "05", considering typing and coercion)
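The three equalities can be contrasted on the "005" / "05" example. This is only a sketch under stated assumptions: casting through `Decimal` is a stand-in for the real casting rules, and the kind check in `limited_value_equal` is an added assumption the slides do not spell out. All names are hypothetical.

```python
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation
from typing import List


@dataclass(eq=False)  # nodes compare by identity
class Node:
    kind: str
    label: str
    children: List["Node"] = field(default_factory=list)


def id_equal(a: Node, b: Node) -> bool:
    """Node ID-based equality: the very same node."""
    return a is b


def value_equal(x: str, y: str) -> bool:
    """Value-based equality: cast both labels to numbers when possible (assumption)."""
    try:
        return Decimal(x) == Decimal(y)
    except InvalidOperation:
        return x == y


def limited_value_equal(a: Node, b: Node) -> bool:
    """Limited value-based equality, unordered variant: labels must match exactly."""
    if a.kind != b.kind or a.label != b.label:  # kind check is an assumption
        return False
    return (all(any(limited_value_equal(c, d) for d in b.children) for c in a.children)
            and all(any(limited_value_equal(d, c) for c in a.children) for d in b.children))


t_a = Node("text", "005")
t_b = Node("text", "05")
print(id_equal(t_a, t_b))                # False: two distinct nodes
print(value_equal(t_a.label, t_b.label)) # True: both cast to the number 5
print(limited_value_equal(t_a, t_b))     # False: the labels differ as strings
```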
4 Order Relationship
- Node order relationship << (XQuery: before), in ordered trees
- Parent before children
- Attribute nodes directly follow their parent (and precede non-attribute nodes)
- Undefined order between attributes
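One way to read these rules is as a pre-order traversal that visits attributes first. The sketch below assumes that reading; `Node`, `document_order` and `before` are hypothetical names, and because the order between attributes is undefined, the sketch simply keeps their stored order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # identity-based comparison, so list.index finds the exact node
class Node:
    kind: str  # "document", "element", "attribute" or "text"
    label: str
    children: List["Node"] = field(default_factory=list)


def document_order(node: Node) -> List[Node]:
    """Parent first, then its attribute children, then its other children."""
    attrs = [c for c in node.children if c.kind == "attribute"]
    others = [c for c in node.children if c.kind != "attribute"]
    out = [node]
    for child in attrs + others:
        out.extend(document_order(child))
    return out


def before(a: Node, b: Node, root: Node) -> bool:
    """True if a comes before b in the order computed from root."""
    order = document_order(root)
    return order.index(a) < order.index(b)


# Hypothetical element whose attribute is stored after its name child.
name = Node("element", "name", [Node("text", "Mary")])
attr = Node("attribute", "@id", [Node("text", "p2")])
person = Node("element", "person", [name, attr])

print(before(person, attr, person))  # True: parent before children
print(before(attr, name, person))    # True: attribute precedes non-attribute nodes
```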
5 Node and value comparisons
[Figure: tree of the sample document "group.xml". Legend: document node, element node, attribute node, text node. Node labels shown include group (g1), faculty (f1), @name, inproject and URL; node identifiers include r1, s1, j1, j2, p1 to p3, m1 to m3, i1 to i3, u1, u2, and numbered n and t nodes. Text values shown: KadoP, NexT, Lily, Mary, m@u.edu, j@u.edu, m@acm.org, kadop.net, next.org.]
6 Equality comparisons
t1 ≠id t5, t1 =v t5, t5 =v t14, i1 ≠id i3, i1 =v i3, i1 ≠v n1
[Figure: the "group.xml" tree of slide 5, repeated.]
7 Order comparisons
j1 << n1, j1 << u1, p1 << i1 << t5 << n3 << f1
[Figure: the "group.xml" tree of slide 5, repeated.]
8 Equalities on XQDMA lists
- Deep-equal
- Two lists are deep-equal if they have the same length and their items at corresponding positions are value-based equal
- The "=" of XQuery translates to an existential equality comparison
- Existential list equality (ele) comparison =∃
- l1 =∃ l2 iff ∃ o1 ∈ l1, o2 ∈ l2 such that o1 =v o2
- ele is not transitive
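A small sketch makes the difference between the two list equalities concrete, including a counterexample to transitivity of existential equality. It assumes lists of already atomized string values and uses plain string comparison in place of value-based equality; the function names are hypothetical.

```python
from typing import List


def deep_equal(l1: List[str], l2: List[str]) -> bool:
    """Same length and pairwise equal items at corresponding positions."""
    return len(l1) == len(l2) and all(a == b for a, b in zip(l1, l2))


def exists_equal(l1: List[str], l2: List[str]) -> bool:
    """Existential equality: some item of l1 equals some item of l2."""
    return any(a == b for a in l1 for b in l2)


a = ["KadoP"]
b = ["KadoP", "NexT"]
c = ["NexT"]

print(exists_equal(a, b))  # True: they share "KadoP"
print(exists_equal(b, c))  # True: they share "NexT"
print(exists_equal(a, c))  # False: so the relation is not transitive
print(deep_equal(a, b))    # False: the lengths differ
```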
9 Unified data model (UDM) extending XQDMA
- (XQDMA) lists l = [o1, o2, ..., on]
- The oi are XQDMA nodes
- Tuples t = (v1=a1, v2=a2, ..., vn=an)
- "(In t) the variable vi binds to the variable binding ai"
- t.vi may be an XQDMA list, or
- a set/bag/list (collection) of homogeneous tuples
- (v1, v2, ..., vn) is the tuple schema
10 UDM tuple equality
b1 = (v1=p1, v2=t5, v3=[i1, t5]), b2 = (v1=p1, v2=t11, v3=[i3, t14])
b1 = b2
[Figure: the "group.xml" tree of slide 5, repeated with some text nodes renumbered.]
11 Unified Tuple-Based Algebra Operators
12 Unified tuple-based algebra operators
- Navigation
- XML construction
- Nested plans
- Relational-style operators
- Other operators
13 XPath navigation
- Based on tree patterns
- Navigation operator: nav(Collection of Tuples)
Example paths:
$R//faculty/person
$R//person
$R//person[email]/name
[Figure: tree patterns for the paths above, with nodes faculty, person ($P), name ($N) and email.]
14 XPath navigation
Path: $R//person[mail]/name
Input (R): (r1)
Output (R, N): (r1, n3), (r1, n4)
[Figure: the pattern person with children mail and $N: name, evaluated over a fragment of the "group.xml" tree containing persons p1 and p2, mails m1 to m3 and name n4.]
15 Tree patterns capture navigation of "for"
for $P in $R//person, $M in $P/mail, $N in $P/name return $P, $M, $N
Pattern: $P: person, with children $M: mail and $N: name
Input (R): (r1)
Output (R, P, M, N): (r1, p1, m1, n3), (r1, p2, m2, n4), (r1, p2, m3, n4)
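The navigation for this pattern can be imitated with nested loops over a small document. This is a sketch only: the XML content is an assumed stand-in for "group.xml" (three persons, the third without mail), and `nav` is a hypothetical function hard-wired to this one pattern.

```python
import xml.etree.ElementTree as ET

# Assumed miniature of the sample document, not the original file.
DOC = (
    "<group><faculty>"
    "<person><name>John</name><mail>j@u.edu</mail></person>"
    "<person><name>Mary</name><mail>m@u.edu</mail><mail>m@acm.org</mail></person>"
    "<person><name>Lily</name></person>"
    "</faculty></group>"
)


def nav(root):
    """One tuple per combination of bindings for $P: //person, $M: $P/mail, $N: $P/name."""
    rows = []
    for p in root.iter("person"):
        for m in p.findall("mail"):
            for n in p.findall("name"):
                rows.append({"R": root, "P": p, "M": m, "N": n})
    return rows


root = ET.fromstring(DOC)
rows = nav(root)
for r in rows:
    print(r["N"].text, r["M"].text)
```

The person without a mail child contributes no tuple, because every edge of the pattern is mandatory here.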
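16 Generalized "for" navigation
for $P in $R//person, $M in $P/mail, $N in $P/name return $P, $M, $N
Pattern: $P: person, with children $M: mail and $N: name
Input (R): (r1)
Output of nav (R, P, M, N): (r1, p1, m1, n3), (r1, p2, m2, n4), (r1, p2, m3, n4)
Output of the projection π P,M,N (P, M, N): (p1, m1, n3), (p2, m2, n4), (p2, m3, n4)
17 Generalized "for" navigation
for $P in $R//person, $M in $P/mail, $N in $P/name return $P, $M, $N
Pattern: $P: person, with children $M: mail and $N: name
Input (R): (r1)
Output (P, M, N): (p1, m1, n3), (p2, m2, n4), (p2, m3, n4)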
18 Generalized "for" and "where" navigation
for $P in $R//person, $N in $P/name where $P/email return $P, $N
Pattern: $P: person, with children mail (no variable) and $N: name
Output (P, N): (p1, n3), (p2, n4), (p2, n4)
19 XML result construction
for $P in //person, $N in $P/name, $M in $P/mail return <person> {$M, $N} </person>
Pattern: $P: person, with children $M: mail and $N: name
[Figure: three constructed person elements. x1 contains m1' (j@u.edu) and n3' (John); x2 contains m2' (m@u.edu) and n4' (Mary); x3 contains m3' (m@acm.org) and n4' (Mary).]
20 XML result construction
for $P in //person, $N in $P/name, $M in $P/mail return <person> {$M, $N} </person>
[Figure: the constructed person elements x1 (n3', m1'), x2 (m2', n4') and x3 (m3', n4').]
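Result construction can be sketched as building one new element per binding tuple and filling it with copies of the bound nodes, which is how the primed identifiers in the figure can be read. The source fragment and the `construct` function below are assumptions for illustration.

```python
import copy
import xml.etree.ElementTree as ET


def construct(rows):
    """Build one <person> element per tuple, holding copies of the $M and $N nodes."""
    out = []
    for r in rows:
        person = ET.Element("person")
        person.append(copy.deepcopy(r["M"]))  # copy, so the source tree is untouched
        person.append(copy.deepcopy(r["N"]))
        out.append(person)
    return out


# Assumed source fragment with a single binding tuple.
src = ET.fromstring("<person><name>John</name><mail>j@u.edu</mail></person>")
rows = [{"P": src, "M": src.find("mail"), "N": src.find("name")}]

result = construct(rows)
print(ET.tostring(result[0], encoding="unicode"))
# <person><mail>j@u.edu</mail><name>John</name></person>
```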
21 Nested plans and the apply operator (1)
for $P in //person return <person> { for $N in $P/name return $N } </person>
22 Nested plans and apply (2)
for $P in //person return <person> { for $N in $P/name return $N, $P/mail } </person>
Nested plan: P1
Output of the outer navigation (P): (p1), (p2), (p3)
23 Nested plans and apply (3)
for $P in //person return <person> { for $N in $P/name return $N, $P/mail } </person>
[Figure: the constructed result: person x1 (n3', m1'), person x2 (n4', m2', m3'), person x3 (n5').]
24 Nested plans and let clauses
Previous query:
for $P in //person return <person> { for $N in $P/name return $N, $P/mail } </person>
25 Nested plans and let clauses
Same query with let:
for $P in //person
let $L1 := for $N in $P/name
           let $L2 := ($N, $P/mail)
           return $L2
return <person> {$L1} </person>
26 Nested plans and let clauses
Same query with let:
for $P in //person
let $L1 := for $N in $P/name
           let $L2 := $P/mail
           return ($N, $L2)
return <person> {$L1} </person>
[Figure: nested plan P2: nav with pattern $N: name, then crList over N, L2.]
27 Nested plans and let clauses
Same query with let:
for $P in //person
let $L1 := for $N in $P/name
           let $L2 := ($N, $P/mail)
           return $L2
return <person> {$L1} </person>
[Figure: nested plan P1.]
28 Nested plans and let clauses
- Nested queries can be equivalently rewritten using let clauses until return clauses reach the form
- $V1, $V2, ..., $Vk or <tag> {$V1, $V2, ..., $Vk} </tag>
- Nested queries with let can be "automatically" translated
- for → nav
- let → apply
- return → crList
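The translation of for, let and return into nav, apply and crList can be sketched with three small functions over lists of binding tuples. Everything here is an illustrative assumption: the document content, the function names `nav`, `apply_plan` and `cr_list`, and the use of ElementTree paths in place of tree patterns.

```python
import xml.etree.ElementTree as ET

DOC = (  # assumed miniature of the sample document
    "<group><faculty>"
    "<person><name>John</name><mail>j@u.edu</mail></person>"
    "<person><name>Mary</name><mail>m@u.edu</mail><mail>m@acm.org</mail></person>"
    "<person><name>Lily</name></person>"
    "</faculty></group>"
)


def nav(tuples, src, path, var):
    """for: one output tuple per node reached from tuple[src] along path."""
    return [{**t, var: n} for t in tuples for n in t[src].findall(path)]


def apply_plan(tuples, plan, var):
    """let: run the nested plan once per tuple and bind its whole result to var."""
    return [{**t, var: plan(t)} for t in tuples]


def cr_list(tuples, variables):
    """return: per tuple, concatenate the bindings of the given variables."""
    out = []
    for t in tuples:
        items = []
        for v in variables:
            items.extend(t[v] if isinstance(t[v], list) else [t[v]])
        out.append(items)
    return out


def inner(t):
    """for $N in $P/name let $L2 := $P/mail return ($N, $L2)"""
    ts = nav([t], "P", "name", "N")
    ts = apply_plan(ts, lambda u: u["P"].findall("mail"), "L2")
    return [item for items in cr_list(ts, ["N", "L2"]) for item in items]


root = ET.fromstring(DOC)
outer = nav([{"R": root}], "R", ".//person", "P")  # for $P in //person
outer = apply_plan(outer, inner, "L1")             # let $L1 := nested plan
result = cr_list(outer, ["L1"])                    # content of each <person>
print([[e.text for e in items] for items in result])
```

Each inner list holds the nodes that would be copied into one constructed person element; the person without mail still yields a list with its name.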
29 Nested plans and optional navigation
- Capture all navigation with a single pattern
- optional edges
- null (⊥) variable values
for $P in //person return <person> { for $N in $P/name return $N, $P/mail } </person>
30 Nested plans and optional navigation
for $P in //person return <person> { for $N in $P/name return $N, $P/mail } </person>
[Figure: plan apps P1→V over a navigation with input R and pattern $P: person, with optional children $N: name and $M: mail.]
31 Nested plans and optional navigation
for $P in //person return <person> { for $N in $P/name return $N, $P/mail } </person>
Columns: P, G (a nested collection of (P, M, N) tuples), V
(p1, {(p1, m1, n3)})
(p2, {(p2, m2, n4), (p2, m3, n4)})
(p3, {(p3, ⊥, n5)})
[Figure: plan apps P1→V over the same pattern; constructed result person x1 (n3', m1'), person x2 (n4', m2', m3'), person x3 (n5').]
32 Nested plans and optional navigation
for $P in //person return <person> { for $N in $P/name return $N, $P/mail } </person>
Columns: P, G (a nested collection of (P, M, N) tuples), V
(p1, {(p1, m1, n3)})
(p2, {(p2, m2, n4), (p2, m3, n4)})
(p3, {(p3, ⊥, n5)})
[Figure: plan apps P1→V over a navigation with input R and pattern $P: person, $M: mail, $N: name.]
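Optional edges and null values can be imitated by substituting `None` when a child is missing, then grouping the flat tuples by person. The sketch below makes several assumptions: the document content, the function names, and treating both the mail and the name edge as optional.

```python
import xml.etree.ElementTree as ET

DOC = (  # assumed miniature of the sample document
    "<group><faculty>"
    "<person><name>John</name><mail>j@u.edu</mail></person>"
    "<person><name>Mary</name><mail>m@u.edu</mail><mail>m@acm.org</mail></person>"
    "<person><name>Lily</name></person>"
    "</faculty></group>"
)


def nav_optional(root):
    """Pattern $P: person with optional $M: mail and $N: name; None stands for null."""
    rows = []
    for p in root.iter("person"):
        mails = p.findall("mail") or [None]  # no mail child: bind M to null
        names = p.findall("name") or [None]  # no name child: bind N to null
        for m in mails:
            for n in names:
                rows.append({"P": p, "M": m, "N": n})
    return rows


def group_by_person(rows):
    """Nest the flat tuples into one group per person, in first-seen order."""
    groups = {}
    for r in rows:
        groups.setdefault(id(r["P"]), []).append(r)
    return list(groups.values())


def text(node):
    return None if node is None else node.text


root = ET.fromstring(DOC)
groups = group_by_person(nav_optional(root))
print([[(text(r["M"]), text(r["N"])) for r in g] for g in groups])
```

Unlike navigation with mandatory edges, the person without mail survives as a group whose single tuple has a null M binding.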
33 Selection predicates
Predicate | Meaning | XQuery notation / fn or op in XQFO
=id | same node | is / op:is-same-node
=v | same value | eq / fn:compare, op:numeric-equal, ...
<v | smaller value | lt / fn:compare, op:numeric-less-than, ...
<< | node before | << / op:node-before
= | list equality, tuple equality | eq / fn:deep-equal
=∃ | existential equality | = / functions backing eq, tuple equality
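34 Selection plan (1)
for $P in //person, $N in $P/name where $P/email = "m@acm.org" return $N
Plan elements: selection σ M =v∃ "m@acm.org"; apps P1→V
Result: n4
35 Selection plan (2)
for $P in //person, $N in $P/name where $P/mail = "m@acm.org" return $N
Plan elements: apps P2→M2 over a navigation with input R and pattern $P: person, $M: mail, $N: name
Output of nav (P, M, N): (p1, m1, n3), (p2, m2, n4), (p2, m3, n4)
Columns after grouping: P, G (a nested collection of (P, M, N) tuples), M2
(p1, {(p1, m1, n3)})
(p2, {(p2, m2, n4), (p2, m3, n4)})
N values: n3, n4
36 Selection plan (2)
for $P in //person, $N in $P/name where $P/mail = "m@acm.org" return $N
Plan elements: selection σ M2 =v∃ "m@acm.org" over apps P2→M2
Remaining row: (p2, {(p2, m2, n4), (p2, m3, n4)})
Result: n4
37 Selection plan (3)
for $P in //person, $N in $P/name where $P/mail = "m@acm.org" return $N
Plan elements: selection σ M =v "m@acm.org" over a navigation with input R and pattern $P: person, $M: mail, $N: name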
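The selection over a nested column can be sketched as a filter that keeps a tuple when at least one item of the nested list equals the constant. The tuples below use identifiers and mail strings as stand-ins for nodes, the row for p3 is an added assumption, and `select_exists` is a hypothetical name.

```python
def select_exists(tuples, var, constant):
    """Keep tuples whose list-valued binding `var` contains an item equal to constant."""
    return [t for t in tuples if any(item == constant for item in t[var])]


# One tuple per person; M2 holds the list of that person's mail values.
tuples = [
    {"P": "p1", "N": "n3", "M2": ["j@u.edu"]},
    {"P": "p2", "N": "n4", "M2": ["m@u.edu", "m@acm.org"]},
    {"P": "p3", "N": "n5", "M2": []},  # assumed: a person without mail
]

selected = select_exists(tuples, "M2", "m@acm.org")
print([t["N"] for t in selected])  # ['n4']
```

Because the mails are grouped per person before the selection, the name n4 is returned once even though p2 has two mail children.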
38 Joins
- for $p in //person, $j in //projects
- where $p/inproject = $j/@name
- return $p, $j
[Figure: join on x =v∃ y of two navigations over R: pattern $P: person with child $x: inproject, and pattern $J: project with child $y: @name.]
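A join whose condition is an existential comparison between two list-valued bindings can be evaluated by nested loops or with a hash table on one side. The sketch below shows both under stated assumptions: the person and project tuples are invented stand-ins, values are compared as plain strings, and the function names are hypothetical.

```python
def nested_loops_join(left, right, lvar, rvar):
    """Pair every left tuple with every right tuple sharing at least one value."""
    return [{**l, **r} for l in left for r in right
            if any(a == b for a in l[lvar] for b in r[rvar])]


def hash_join(left, right, lvar, rvar):
    """Same result, using an index from each right-side value to tuple positions."""
    index = {}
    for pos, r in enumerate(right):
        for b in r[rvar]:
            index.setdefault(b, []).append(pos)
    out = []
    for l in left:
        # A set avoids emitting the same pair twice when several values match.
        matches = sorted({pos for a in l[lvar] for pos in index.get(a, [])})
        out.extend({**l, **right[pos]} for pos in matches)
    return out


persons = [  # x: values of the inproject children (assumed assignment)
    {"P": "p1", "x": ["KadoP"]},
    {"P": "p2", "x": ["NexT", "KadoP"]},
    {"P": "p3", "x": []},
]
projects = [  # y: value of the @name attribute
    {"J": "j1", "y": ["KadoP"]},
    {"J": "j2", "y": ["NexT"]},
]

pairs = [(t["P"], t["J"]) for t in hash_join(persons, projects, "x", "y")]
print(pairs)  # [('p1', 'j1'), ('p2', 'j1'), ('p2', 'j2')]
```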
39 Generalized navigation
for $P in //person, $M in $P/mail, $N in $P/name return <pers> {$M, $N} </pers>
[Figure: fragment of the "group.xml" tree: r1, group (g1), faculty (f1), ..., persons p1 and p2, with mails m2 (t12: m@u.edu) and m3 (t13: m@acm.org) and name n4 (t10: Mary).]
40 Other algebras
- Tsimmis and YAT algebras for semistructured data
- Introduced navigation and construction patterns
- SAL from U. Tel Aviv
- Close relative of OQL
- TAX, Generalized Tree Patterns from Michigan
- Navigation extracts bindings to hidden tuples (packaged as trees)
- Grouping tracking navigation
- Enosys algebra
- Collect bindings (nav and join), do nested plans, create XML
- Also present in the NEXT system
- Xstasy from U. Pisa
- Context-based algebra from U. Oregon
- Rainbow algebra from Worcester Polytechnic Inst.
41 Putting it all together: processing a sample query
42 Sample query
for $P in //person, $N in $P/name where $P/mail = "m@acm.org" return $N
43 Sample query and logical plan 1
for $P in //person, $N in $P/name where $P/mail = "m@acm.org" return $N
[Figure: logical plan 1: selection σ M =v "m@acm.org" over a navigation with input R and pattern $P: person, $M: mail, $N: name.]
45 Sample query and logical plan 1
How to implement navTN?
- If a (person, mail, name) table exists, scan it, then apply the selection
- If a (person, mail) and a (person, name) table exist, join them
- If the document still exists as a whole, evaluate in streaming
for $P in //person, $N in $P/name where $P/mail = "m@acm.org" return $N
[Figure: logical plan 1: selection σ M =v "m@acm.org" over a navigation with input R and pattern $P: person, $M: mail, $N: name.]
46 Sample query and logical plan 2
How to implement navTN2?
- If a (person, mail, name) table exists, scan it, then apply the selection
- If a (person, mail, name) table exists, indexed on mail, perform an index access
- If the document still exists as a whole, evaluate in streaming
for $P in //person, $N in $P/name where $P/mail = "m@acm.org" return $N
[Figure: logical plan 2: a navigation with input R and pattern $P: person, $M: mail = "m@acm.org", $N: name.]
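Two of the implementation options listed above, a table scan followed by selection and an index access on mail, can be compared on a toy table. This is a sketch under assumptions: the table stores mail values as strings next to node identifiers, and the function names are hypothetical.

```python
from collections import defaultdict

# Assumed materialized (person, mail, name) table.
TABLE = [
    ("p1", "j@u.edu", "n3"),
    ("p2", "m@u.edu", "n4"),
    ("p2", "m@acm.org", "n4"),
]


def scan_then_select(table, mail):
    """Read every row and keep the names of rows with the requested mail."""
    return [n for (_p, m, n) in table if m == mail]


def build_mail_index(table):
    """Index the table on its mail column: mail value -> list of rows."""
    index = defaultdict(list)
    for row in table:
        index[row[1]].append(row)
    return index


def index_access(index, mail):
    """Look up only the matching rows through the index."""
    return [n for (_p, _m, n) in index.get(mail, [])]


index = build_mail_index(TABLE)
print(scan_then_select(TABLE, "m@acm.org"))  # ['n4']
print(index_access(index, "m@acm.org"))      # ['n4']
```

Both physical alternatives return the same answer; the index access avoids reading rows whose mail does not match.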
47 XQuery processing: conclusion
48 Conclusions
- XML query processing takes place in a huge variety of settings
- Many small documents
- Few large documents
- Varied structure vs. uniform structure
- Retrieve vs. construct queries
- Different implementation options
- Persistent store
- Streaming
- Both
- Choice of storage model = choice of materialized views
- Access path selection = view-based query rewriting
49 Conclusions
- A logical algebra for XQuery allows making sense of a query (decomposing it into logical operators)
- Each logical operator may have different implementations (physical operators)
- Navigation
- Table access
- Index access
- Stream-based evaluation
- Join
- Hash-based
- Nested loops
- Apply
- Iteration-based (nested loops)
- If there are duplicates, use group-by
- List constructor: typically easy