Processing Recursive XQuery over XML Streams: The Raindrop Approach

1
Processing Recursive XQuery over XML Streams: The Raindrop Approach
Mingzhu Wei, Ming Li, Elke A. Rundensteiner, Murali Mani
Worcester Polytechnic Institute
XSDM Workshop, 2006
Supported by the USA National Science Foundation
2
What's Special about XML Streams?
Token-by-token access
Q1: for $a in stream('persons')//person return $a, $a//name
Tokens along the timeline: <person> <name> Jack, Brooks </name> </person>
Pattern retrieval on token streams
3
Running Example
Q1: for $a in stream('persons')//person return $a, $a//name
D1 (not recursive):
1 <person> 2 <name> 3 Jack, Brooks 4 </name> 5 <children> 6 </children> 7 </person>
8 <person> 9 <name> 10 Amy 11 </name> 12 </person>
D2 (recursive):
1 <person> 2 <name> 3 Jack, Brooks 4 </name> 5 <children>
6 <person> 7 <name> 8 Will, Brooks 9 </name> 10 </person>
11 </children> 12 </person>
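The difference between D1 and D2 can be checked mechanically: a document is recursive for a tag when an element with that tag occurs inside another element with the same tag. The following is a minimal sketch under that reading; the function name `is_recursive` and the regex tokenizer are assumptions made for illustration and are not part of Raindrop.

```python
import re

# Matches start and end tags such as <person> or </person>.
TAG = re.compile(r"<(/?)(\w+)>")

def is_recursive(xml: str, tag: str) -> bool:
    """Return True if some <tag> element is nested inside another <tag>."""
    depth = 0  # number of currently open <tag> elements
    for closing, name in TAG.findall(xml):
        if name != tag:
            continue
        if closing:
            depth -= 1
        else:
            depth += 1
            if depth > 1:
                return True
    return False

D1 = ("<person><name>Jack, Brooks</name><children></children></person>"
      "<person><name>Amy</name></person>")
D2 = ("<person><name>Jack, Brooks</name><children>"
      "<person><name>Will, Brooks</name></person></children></person>")

print(is_recursive(D1, "person"))  # False
print(is_recursive(D2, "person"))  # True
```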
4
Retrieving Patterns Using Automata
How to process / pattern retrieval in automata?
Q2: for $a in stream('persons')/person return $a, $a/name
[Figure: automaton of Q2 (states s0, s1, s2; transitions person, name)]
How to process // pattern retrieval in automata?
Q1: for $a in stream('persons')//person return $a, $a//name
[Figure: automaton of Q1 and its stack (states s0 to s4; transitions person, name, and two transitions shown as "?")]
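A minimal sketch of how // patterns can be retrieved from a token stream with an automaton and a stack. This is a generic stack-based NFA, not necessarily the automaton Raindrop builds; the function name `match_descendant_path` and the regex tokenizer are assumptions made for illustration. Token IDs count start tags, text and end tags from 1, as in the running example.

```python
import re

def match_descendant_path(xml, steps):
    """Return token IDs of start tags that match //steps[0]//steps[1]//..."""
    tokens = re.findall(r"<[^>]+>|[^<]+", xml)
    # One set of active states per open element; state i means
    # "the first i steps have been matched on the current path".
    stack = [{0}]
    hits = []
    for token_id, tok in enumerate(tokens, start=1):
        if tok.startswith("</"):
            stack.pop()                 # leaving an element restores the old states
        elif tok.startswith("<"):
            tag = tok[1:-1]
            parent = stack[-1]
            active = set(parent)        # '//' keeps every state alive at any depth
            for state in parent:
                if steps[state] == tag:
                    if state + 1 == len(steps):
                        hits.append(token_id)   # final state reached
                    else:
                        active.add(state + 1)
            stack.append(active)
    return hits

D2 = ("<person><name>Jack, Brooks</name><children>"
      "<person><name>Will, Brooks</name></person></children></person>")

print(match_descendant_path(D2, ["person"]))          # [1, 6]
print(match_descendant_path(D2, ["person", "name"]))  # [2, 7]
```

The stack is what makes recursion tractable here: when the inner person closes, the automaton falls back to the state set of the outer person instead of losing it.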
5
Raindrop Algebra Plan
Plan (top to bottom):
op5: StructuralJoin $a
op4: ExtractUnnest $a, holding <person> <name> </name> </person>
op3: ExtractNest $b, holding <name> </name>
op2: Navigate $a//name -> $b
op1: Navigate //person -> $a
Stream data
Note that the structural join (an in-time structural join) only performs a Cartesian product! The person element is purged after the output is generated!
6
Problems with Recursion
D2:
1 <person> 2 <name> 3 Jack, Brooks 4 </name> 5 <children>
6 <person> 7 <name> 8 Will, Brooks 9 </name> 10 </person>
11 </children> 12 </person>
Plan (top to bottom):
op5: StructuralJoin $a
op4: ExtractUnnest $a, holding <person> <name> </name> <person> <name> </name> </person>
op3: ExtractNest $b, holding <name> </name> and <name> </name>
op2: Navigate $a//name -> $b
op1: Navigate //person -> $a
Stream data
After the second person and name are joined, we can't get the correct result for the first person.
7
Goals
  • How to correctly process recursive data and
    recursive queries?
  • How to guarantee that data is output as early as
    possible?
  • When the data is non-recursive, how can the plan be kept
    as cheap as possible?

8
Recursive-Mode Operators
  • Each operator has a recursive-mode counterpart.
  • IDs are associated with elements.
  • Each element is associated with a triple
    (startID, endID, level).
  • Given two elements and their triples, we can determine
    ancestor-descendant and parent-child relationships.

1 <person> 2 <name> 3 Jack 4 </name> 5 <children>
6 <person> 7 <name> 8 Amy 9 </name> 10 </person>
11 </children> 12 </person>
Outer <person>: (1, 12, 1)
First <name>: (2, 4, 2)
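The triple scheme can be sketched as follows: token IDs are assigned in arrival order, an element's triple is completed when its end tag arrives, and the two relationships are decided by comparing triples. The names `Triple`, `assign_triples`, `is_ancestor` and `is_parent` are assumptions made for illustration, not Raindrop's API.

```python
import re
from typing import NamedTuple

class Triple(NamedTuple):
    tag: str
    start_id: int
    end_id: int
    level: int

def assign_triples(xml):
    """Number tokens from 1 and return one Triple per element, in start order."""
    tokens = re.findall(r"<[^>]+>|[^<]+", xml)
    open_stack = []  # (tag, start_id, level) of elements still open
    done = []
    for token_id, tok in enumerate(tokens, start=1):
        if tok.startswith("</"):
            tag, start_id, level = open_stack.pop()
            done.append(Triple(tag, start_id, token_id, level))
        elif tok.startswith("<"):
            open_stack.append((tok[1:-1], token_id, len(open_stack) + 1))
        # text tokens consume an ID but do not create an element
    return sorted(done, key=lambda t: t.start_id)

def is_ancestor(a, b):
    """a is an ancestor of b if a's interval strictly contains b's."""
    return a.start_id < b.start_id and b.end_id < a.end_id

def is_parent(a, b):
    """a is the parent of b if it contains b and b is exactly one level deeper."""
    return is_ancestor(a, b) and b.level == a.level + 1

DOC = ("<person><name>Jack</name><children>"
       "<person><name>Amy</name></person></children></person>")
for t in assign_triples(DOC):
    print(t.tag, (t.start_id, t.end_id, t.level))
```

On this document the sketch yields (1, 12, 1) for the outer person and (2, 4, 2) for its name, matching the triples on the slide.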
9
Features of Recursive Navigate Operators
  • Keep track of the triple for each element.
  • Call the structural join only when all triples in the
    Navigate operators are complete.

1 <person> 2 <name> 3 Jack 4 </name> 5 <children>
6 <person> 7 <name> 8 Amy 9 </name> 10 </person>
11 </children> 12 </person>
Inner <person>: (6, 10, 3)
Inner <name>: (7, 9, 4)
[Figure: triples held by Navigate //person -> $a and Navigate $a//name -> $b at Token 1, Token 2, Token 9 and Token 12. The outer person triple starts as (1, -, 1) and the first name triple as (2, -, 2), which is later completed to (2, 4, 2); the outer person triple receives its endID 12 at Token 12.]
10
Features of Recursive Extract Operators
  • ExtractUnnest: composes the tokens into tuples and
    associates ID information with each corresponding element.
  • ExtractNest: collects the tokens and creates one tuple for
    the whole collection. The groupby functionality is moved
    to the top structural join.

11
Changes of Structural Join
  • In-time structural join: performs a Cartesian product,
    e.g. (a, b1), (a, b2) for inputs a and b1, b2.
  • ID-based structural join: changes from the in-time
    structural join to an ID-based comparison method.
  • ID-based comparison condition, valid for the parent-child
    relationship:
    (a.startID < b.startID and b.endID < a.endID and
    b.level = a.level + 1)
  • ID-based comparison condition for the ancestor-descendant
    relationship:
    (a.startID < b.startID and b.endID < a.endID)

[Figure: Structural Join $a over ExtractUnnest $a, holding a1 (1, 12, 1) and a2 (6, 10, 3), and ExtractUnnest $b, holding b1 (2, 4, 2) and b2 (7, 9, 4). Output tuples: (a1, b1), (a1, b2), (a2, b2).]
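The ID-based comparison can be sketched as a nested-loop join over (startID, endID, level) triples. The function name `id_based_join` and the `parent_child` flag are assumptions made for illustration; the triples are the ones from the slide.

```python
def id_based_join(ancestors, descendants, parent_child=False):
    """Pair each a with every b it contains; optionally require b to be a child."""
    out = []
    for a_name, (a_start, a_end, a_level) in ancestors:
        for b_name, (b_start, b_end, b_level) in descendants:
            contains = a_start < b_start and b_end < a_end
            if contains and (not parent_child or b_level == a_level + 1):
                out.append((a_name, b_name))
    return out

persons = [("a1", (1, 12, 1)), ("a2", (6, 10, 3))]
names = [("b1", (2, 4, 2)), ("b2", (7, 9, 4))]

print(id_based_join(persons, names))
# [('a1', 'b1'), ('a1', 'b2'), ('a2', 'b2')]
print(id_based_join(persons, names, parent_child=True))
# [('a1', 'b1'), ('a2', 'b2')]
```

A plain Cartesian product would also emit (a2, b1), which is wrong because b1 ends before a2 starts; the ID comparison is what filters it out.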
12
Structural Join Invoking Issue
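  • Invoking strategy: the structural join is invoked only
    when all the triples are complete.

[Figure: Structural Join $a over ExtractUnnest $a, holding a1 (1, 12, 1) and a2 (6, 10, 3), and ExtractUnnest $b, holding b1 (2, 4, 2) and b2 (7, 9, 4). Once all triples are complete, the join outputs (a1, b1), (a1, b2), (a2, b2) and the buffers are cleaned.]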
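A minimal end-to-end sketch of this invoking strategy for Q1: triples are buffered while any person element is still open, and the ID-based join runs, followed by a cleanup of the buffers, as soon as the last open person closes. The function name `run_q1` and the single-function structure are assumptions made for illustration; Raindrop distributes this work over separate Navigate, Extract and StructuralJoin operators.

```python
import re

def run_q1(xml):
    """Yield (person_triple, name_triple) pairs for //person and $a//name."""
    tokens = re.findall(r"<[^>]+>|[^<]+", xml)
    stack = []               # open elements as (tag, start_id, level)
    persons, names = [], []  # completed triples waiting for the join
    open_persons = 0         # person triples that are still incomplete
    for token_id, tok in enumerate(tokens, start=1):
        if tok.startswith("</"):
            tag, start_id, level = stack.pop()
            triple = (start_id, token_id, level)
            if tag == "name" and open_persons > 0:
                names.append(triple)
            elif tag == "person":
                persons.append(triple)
                open_persons -= 1
                if open_persons == 0:   # all triples complete: invoke the join
                    for a in sorted(persons):
                        for b in names:
                            if a[0] < b[0] and b[1] < a[1]:
                                yield a, b
                    persons.clear()     # clean the buffers after output
                    names.clear()
        elif tok.startswith("<"):
            tag = tok[1:-1]
            stack.append((tag, token_id, len(stack) + 1))
            if tag == "person":
                open_persons += 1

D2 = ("<person><name>Jack, Brooks</name><children>"
      "<person><name>Will, Brooks</name></person></children></person>")
for pair in run_q1(D2):
    print(pair)
# ((1, 12, 1), (2, 4, 2))
# ((1, 12, 1), (7, 9, 4))
# ((6, 10, 3), (7, 9, 4))
```

On non-recursive input the join fires after every top-level person, so results are still produced as early as possible.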
13
Another Query With ExtractNest Operators
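Q3: for $x in //a return $x//b, $x//c
Plan (top to bottom):
StructuralJoin $x
ExtractNest $y, ExtractNest $z
Navigate $x//b -> $y, Navigate $x//c -> $z
Navigate //a -> $x
Stream data
[Figure: document tree with (startID, endID) pairs. The outer a (1, 14) has children a (2, 9), b (10, 11) and c (12, 13); the inner a (2, 9) has children b (3, 4), b (5, 6) and c (7, 8).]
  • ExtractNest = ExtractUnnest + GroupBy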

14
Process ExtractNest GroupBy
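Q3: for $x in //a return $x//b, $x//c
Plan with GroupBy pushed up (top to bottom):
Structural Join $x
ExtractNest $y, ExtractNest $z
Navigate $x//b -> $y, Navigate $x//c -> $z
Navigate //a -> $x
Stream data
[Figure: triples (startID, endID, level) held by the operators. a1: (1, 14, 1); a2: (2, 9, 2); b elements: (3, 4, 3), (5, 6, 3), (10, 11, 2); c elements: (7, 8, 3), (12, 13, 2). Groups shown at the structural join: (b1, b2, b3), (c1, c2) and (b2, b3), (c1).]
It is better to do the groupby in the structural join here!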
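Why the groupby belongs in the structural join can be seen from a small sketch: with recursive data the same b or c element may belong to the groups of several a elements, so grouping can only be decided once the a triples are known. The function name `nest_by_ancestor` is an assumption made for illustration; the (startID, endID) pairs are the ones shown for Q3.

```python
def nest_by_ancestor(ancestors, descendants):
    """For each ancestor (start, end), collect the descendants it contains."""
    return {
        a: [d for d in descendants if a[0] < d[0] and d[1] < a[1]]
        for a in ancestors
    }

a_elems = [(1, 14), (2, 9)]
b_elems = [(3, 4), (5, 6), (10, 11)]
c_elems = [(7, 8), (12, 13)]

print(nest_by_ancestor(a_elems, b_elems))
# {(1, 14): [(3, 4), (5, 6), (10, 11)], (2, 9): [(3, 4), (5, 6)]}
print(nest_by_ancestor(a_elems, c_elems))
# {(1, 14): [(7, 8), (12, 13)], (2, 9): [(7, 8)]}
```

An ExtractNest operator that grouped eagerly, before the join, would have to emit a single collection per input and could not place (3, 4) in both groups.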
15
Further Optimization
  • Using a context-aware structural join

Run-time switching from the ID-based structural join to the more efficient in-time structural join strategy.
[Figure: Automata and Navigate feed a Context Check. If the data is recursive, the Recursive Structural Join is used; if the data is not recursive, the In-time Structural Join is used. In both cases tuples are output and then purged.]
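A minimal sketch of the run-time switch, assuming the context check can be expressed on the buffered triples: if no a element overlaps another, the cheap Cartesian product is safe; otherwise the ID-based comparison is used. In the slides the check is driven by the automaton and the Navigate operators, so the function `context_aware_join` and the way it detects recursion are illustrative assumptions only.

```python
from itertools import product

def context_aware_join(a_triples, b_triples):
    """Pick the in-time join when the a elements are not nested, else compare IDs."""
    a_sorted = sorted(a_triples)
    # Elements are properly nested, so an overlap of neighbours in start order
    # means one a element lies inside another: the data is recursive.
    recursive = any(nxt[0] < cur[1] for cur, nxt in zip(a_sorted, a_sorted[1:]))
    if not recursive:
        return list(product(a_triples, b_triples))   # in-time structural join
    return [(a, b) for a in a_triples for b in b_triples
            if a[0] < b[0] and b[1] < a[1]]          # ID-based structural join

# Non-recursive context: the first person of D1 and its name.
print(context_aware_join([(1, 7, 1)], [(2, 4, 2)]))
# Recursive context: both persons and both names of D2.
print(context_aware_join([(1, 12, 1), (6, 10, 3)], [(2, 4, 2), (7, 9, 4)]))
```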
16
Plan Optimization with Multiple Structural Joins
for $a in stream('s')//a return
for $b in $a//b return
for $c in $b//c return
$c//d, $c//e, $c//f, $b//f, $a//g
Goal: try to generate as many non-recursive operators as possible.
Traverse the query plan in a top-down manner. When a structural join that corresponds to a path expression with // is encountered, this structural join and its descendants are instantiated as recursive-mode operators.
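The top-down rule can be sketched as a tree walk that switches to recursive mode at the first structural join whose path contains // and keeps that mode for everything beneath it. The `JoinNode` class, the `assign_modes` function and the example plan are hypothetical and only illustrate the traversal; they are not Raindrop's plan representation.

```python
from dataclasses import dataclass, field

@dataclass
class JoinNode:
    path: str                                  # path expression of this join
    children: list = field(default_factory=list)
    recursive: bool = False                    # mode chosen by assign_modes

def assign_modes(node, inherited=False):
    """Mark a join recursive if it, or any join above it, uses '//'."""
    node.recursive = inherited or "//" in node.path
    for child in node.children:
        assign_modes(child, node.recursive)

# Hypothetical plan: only the join on //c and the joins below it need recursive mode.
d = JoinNode("/d")
c = JoinNode("//c", [d])
b = JoinNode("/b", [c])
e = JoinNode("/e")
root = JoinNode("/a", [b, e])
assign_modes(root)

for n in (root, b, c, d, e):
    print(n.path, "recursive" if n.recursive else "recursion-free")
```

In the query on this slide every path uses //, so the rule would instantiate the whole plan in recursive mode.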
17
Experiments
  • Advantages of early invocation of structural join
  • Context-aware structural join VS recursive
    structural join

18
Recursion-free Mode VS Recursive Mode
19
Related work
  • Stack-Tree-Anc [AJK02]: uses a stack to store the chain of
    ancestor candidates; can be combined with our system.
  • Transducer-based XML query processor [LPY02]: FSAs without
    a stack are not sufficient for handling recursion.
  • YFilter, NFA-based path navigation [DF03]: does not
    guarantee that the structural join is processed at the
    first possible moment.

20
Conclusions
  • Propose a new class of stream operators for
    recursive XQuery stream processing
  • Propose a context-aware structural join
  • Use cheaper algebra operators whenever possible
    in plan generation
  • Illustrate performance benefits with little
    overhead in experiments

21
  • http://davis.wpi.edu/dsrg/raindrop/

samanwei@cs.wpi.edu
22
Thank you!
Questions?