Holistic Twig Joins: Optimal XML Pattern Matching

1
Holistic Twig Joins: Optimal XML Pattern Matching
  • Authors: Nicolas Bruno, Nick Koudas,
    Divesh Srivastava
  • Presented by: Huang Yukai

2
Outline
  • 1. Background
  • 2. Algorithm PathStack
  • 3. Algorithm TwigStack
  • 4. Experiments
  • 5. Conclusion and References

3
Background (1)
  • An XML database is a forest of rooted, ordered,
    labeled trees, each node corresponding to an
    element or a value, and the edges representing
    the relationships.

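<book> <title> XML </title> <allauthors>
<author> <fn> jane </fn> <ln> poe </ln>
</author> ...... </allauthors> <year>
2000 </year> ...... </book>

[Figure: tree representation of the document. book has children title, allauthors and year; title contains XML; allauthors contains author elements; each author has fn and ln (here jane and poe); year contains 2000.]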
4
Background (2)
  • XML queries specify patterns of selection
    predicates on multiple elements.
  • e.g. book[title = 'XML']//author[fn = 'jane' AND
    ln = 'poe']
  • Finding all occurrences of such a twig pattern in
    an XML database is a core operation for XML query
    processing.

[Figure: the query twig pattern. book has a child title (value XML) and a descendant author; author has children fn (value jane) and ln (value poe).]
5
Background (3)
  • Previous work:
  • (i) decompose the twig pattern into binary
    structural relationships;
  • (ii) match the binary relationships against the
    XML database;
  • (iii) stitch together these basic matches.
  • Drawback: the intermediate result size can get
    very large, even when the input and the final
    output are small.
  • Motivation: algorithm complexity should be
    independent of the size of intermediate results.

[Figure: the query twig pattern from the previous slide: book, title (XML), author, fn (jane), ln (poe).]
6
Background (4)
  • Labeling method

7
Outline
  • 1. Background
  • 2. Algorithm PathStack
  • 3. Algorithm TwigStack
  • 4. Experiments
  • 5. Conclusion and References

8
PathStack (1)
  • Notation
  • q denotes a twig pattern (and also the root node
    of that twig pattern).
  • Each node q in the query twig pattern has an
    associated stream Tq, which contains the
    positional representations of all database nodes
    that match q, sorted by (DocId, LeftPos).
  • Each node q also has an associated stack Sq,
    whose entries are pairs (positional
    representation of a node from Tq, pointer to an
    entry in Sparent(q)).
9
PathStack (2.1) an example
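  • Input: data D and query Q
  • Positional representation of each node in the
    data:

A1 (1, 1:9, 1)  B1 (1, 2:8, 2)  A2 (1, 3:7, 3)  B2 (1, 4:6, 4)  C1 (1, 5:5, 5)

[Figure: data D is the single nested path A1, B1, A2, B2, C1; query Q is the path pattern over nodes A, B, C.]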
10
PathStack (2.2) an example
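  • Phase 1: construct the stacks

(1) For each node q in the query, get the stream Tq from the XML
database:
Ta: A1, A2 (eof)   Tb: B1, B2 (eof)   Tc: C1 (eof)
(2) Loop while eof(Tq) is false, where q is the leaf node, pushing
stream elements onto the stacks.

[Figure: stack contents after the pushes, bottom to top. SA: A1, A2; SB: B1, B2; SC: C1.]
The loop ends because node C is the leaf node and Tc is at eof.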
11
PathStack (2.3) an example
  • Phase 2: output all the solutions encoded in the
    stacks

[Figure: stack contents, bottom to top. SA: A1, A2; SB: B1, B2; SC: C1.]
Query results: A1B1C1, A1B2C1, A2B2C1
12
PathStack (3)
  • Algorithm PathStack

13
PathStack (4)
  • Procedure showSolutions
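Since the pseudocode slides are not part of this transcript, the following is a minimal sketch of the stack-based idea rather than the authors' exact procedure. It assumes a path query with only ancestor-descendant edges, and it emits solutions each time a leaf element is pushed instead of in a separate phase. The names Pos, path_stack, head and expand are illustrative.

```python
from collections import namedtuple

# Positional representation (DocId, LeftPos:RightPos, LevelNum) plus a display name.
Pos = namedtuple("Pos", "name doc left right level")

def path_stack(path, streams):
    """path: query labels from root to leaf (ancestor-descendant edges only).
    streams: label -> list of Pos sorted by (doc, left)."""
    cursor = {q: 0 for q in path}
    # Stack entries are (element, index of the top of the parent stack at push time).
    stacks = {q: [] for q in path}
    leaf = path[-1]
    results = []

    def head(q):
        i = cursor[q]
        return streams[q][i] if i < len(streams[q]) else None

    def expand(depth, idx, suffix):
        # Follow stack pointers upward to enumerate every root-to-leaf match.
        elem, ptr = stacks[path[depth]][idx]
        suffix = (elem.name,) + suffix
        if depth == 0:
            results.append(suffix)
            return
        for j in range(ptr + 1):
            expand(depth - 1, j, suffix)

    while head(leaf) is not None:
        # Pick the query node whose next stream element starts first.
        qmin = min((q for q in path if head(q) is not None),
                   key=lambda q: (head(q).doc, head(q).left))
        nxt = head(qmin)
        # Clean stacks: pop entries that end before the next element begins.
        for q in path:
            s = stacks[q]
            while s and (s[-1][0].doc, s[-1][0].right) < (nxt.doc, nxt.left):
                s.pop()
        depth = path.index(qmin)
        parent_top = len(stacks[path[depth - 1]]) - 1 if depth > 0 else -1
        stacks[qmin].append((nxt, parent_top))
        cursor[qmin] += 1
        if qmin == leaf:
            expand(depth, len(stacks[leaf]) - 1, ())
            stacks[leaf].pop()
    return results

# Data from the PathStack example: the nested path A1, B1, A2, B2, C1.
streams = {
    "A": [Pos("A1", 1, 1, 9, 1), Pos("A2", 1, 3, 7, 3)],
    "B": [Pos("B1", 1, 2, 8, 2), Pos("B2", 1, 4, 6, 4)],
    "C": [Pos("C1", 1, 5, 5, 5)],
}
solutions = path_stack(["A", "B", "C"], streams)
print(solutions)
```

On the example data the sketch yields A1B1C1, A1B2C1 and A2B2C1. Each stack entry stores only a pointer into its parent stack, so the stacks encode all three matches compactly until they are expanded.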

14
Outline
  • 1. Background
  • 2. Algorithm PathStack
  • 3. Algorithm TwigStack
  • 4. Experiments
  • 5. Conclusion and References

15
TwigStack (1)
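  • A straightforward approach is to decompose the
    twig into multiple root-to-leaf path patterns,
    use PathStack to get partial solutions for each
    path, and merge them at the end.
  • Drawback: many partial path solutions may not
    contribute to any final twig match.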
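The decompose-and-merge approach can be sketched as follows. This is an illustration under stated assumptions, not code from the presentation: the twig is a hypothetical (label, children) tuple with unique labels, each path solution is a dict from query label to element id, and the element ids are made up.

```python
def root_to_leaf_paths(twig):
    """twig: (label, [child twigs]). Returns the root-to-leaf label paths."""
    label, children = twig
    if not children:
        return [(label,)]
    return [(label,) + p for child in children for p in root_to_leaf_paths(child)]

def merge_path_solutions(solutions_per_path):
    """Join per-path solutions that agree on every shared query node."""
    merged = [{}]
    for sols in solutions_per_path:
        merged = [{**m, **s} for m in merged for s in sols
                  if all(m[k] == s[k] for k in m.keys() & s.keys())]
    return merged

# Hypothetical twig: A with two branches, B and C.
twig = ("A", [("B", []), ("C", [])])
paths = root_to_leaf_paths(twig)

# Hypothetical per-path solutions, as PathStack might produce them.
ab_solutions = [{"A": "a1", "B": "b1"}, {"A": "a2", "B": "b2"}]
ac_solutions = [{"A": "a2", "C": "c1"}]
merged = merge_path_solutions([ab_solutions, ac_solutions])
print(paths, merged)
```

Here the partial solution (a1, b1) is computed and stored but discarded during the merge because a1 has no C descendant. That wasted work is what a holistic twig algorithm tries to avoid.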

16
TwigStack (2) an example
  • Input: data D and query Q

[Figure: data tree D with nodes R, A1, A2, B1, B2, C1, C2, D1, D2, E, F, and query twig Q with nodes A, B, C, D, F.]
  • How to skip node A1?
18
XB-trees
  • XB-tree: a variant of the B-tree for indexing the
    positional representation (DocId,
    LeftPos:RightPos, LevelNum) of elements in an
    XML tree.
  • The nodes in a page are sorted by DocId and
    LeftPos.
  • Each node N in an internal page carries a
    bounding segment [N.L, N.R] that contains the
    regions of all its child nodes.
  • Two operations over XB-trees: advance and
    drilldown.
  • TwigStackXB (omitted)
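The two operations can be sketched as follows. This is a simplified sketch under assumptions, not the structure from the slides: it handles a single document (DocId is omitted), and the class names Page and Cursor and the helpers make_internal and seek are hypothetical. It assumes advance moves to the next entry, climbing to the parent page when the current page is exhausted, and drilldown moves to the first entry of the child page.

```python
class Page:
    def __init__(self, entries):
        self.entries = entries  # leaf: (L, R); internal: (L, R, child Page)
        self.parent = None      # (parent Page, index within parent) or None

def make_internal(children):
    """Build an internal page whose entries bound each child page."""
    page = Page([])
    for i, child in enumerate(children):
        lo = min(e[0] for e in child.entries)
        hi = max(e[1] for e in child.entries)
        page.entries.append((lo, hi, child))
        child.parent = (page, i)
    return page

class Cursor:
    def __init__(self, root):
        self.page, self.idx = root, 0

    def eof(self):
        return self.page is None

    def current(self):
        return self.page.entries[self.idx][:2]

    def at_leaf(self):
        return len(self.page.entries[self.idx]) == 2

    def advance(self):
        if self.idx + 1 < len(self.page.entries):
            self.idx += 1
        elif self.page.parent is None:
            self.page = None            # ran off the end of the root page
        else:
            self.page, self.idx = self.page.parent
            self.advance()              # continue after the parent entry

    def drilldown(self):
        self.page, self.idx = self.page.entries[self.idx][2], 0

def seek(cursor, min_left):
    """Find the first element with LeftPos >= min_left, skipping whole subtrees."""
    while not cursor.eof():
        lo, hi = cursor.current()
        if cursor.at_leaf():
            if lo >= min_left:
                return (lo, hi)
            cursor.advance()
        elif hi < min_left:
            cursor.advance()            # nothing below can qualify: skip it
        else:
            cursor.drilldown()
    return None

leaves = [Page([(1, 9), (2, 8)]), Page([(3, 7), (4, 6)]), Page([(20, 25), (21, 22)])]
root = make_internal(leaves)
found = seek(Cursor(root), 20)
print(found)
```

In this example the first two leaf pages are never read: their bounding segments end before position 20, so the cursor advances past them at the internal level.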

19
Outline
  • 1. Background
  • 2. Algorithm PathStack
  • 3. Algorithm TwigStack
  • 4. Experiments
  • 5. Conclusion and References

20
Experiments (1)
  • Experimental Setting
  • Implemented in C++
  • A computer with a 550 MHz Pentium III processor,
    768 MB of main memory and a 2 GB disk.
  • Datasets
  • (a) synthetic data: randomly generated trees with
    parameters depth, fan-out and labels.
  • (b) real-world data: an unfolded fragment of
    the DBLP database.

21
Experiments (2)
  • PathStack vs. Binary Structural Joins

22
Experiments (3.1)
  • PathStack vs. PathMPMJ [3]

23
Experiments (3.2)
  • PathStack vs. PathMPMJ [3]

24
Experiments (4)
  • PathStack vs. TwigStack
  • Queries

26
Experiments (5)
  • Using XB-trees

27
Conclusions
  • Holistic join algorithms: PathStack and
    TwigStack.
  • Open issues:
  • handling more complicated XPath expressions;
  • value-based joins (e.g. links across documents).

28
References
  • [1] N. Bruno, N. Koudas, D. Srivastava. Holistic
    Twig Joins: Optimal XML Pattern Matching.
    Technical Report. ...
  • [2] S. Al-Khalifa, H. V. Jagadish, N. Koudas, J.
    M. Patel, D. Srivastava, and Y. Wu. Structural
    joins: A primitive for efficient XML query
    pattern matching. ICDE '02. (some stack-based
    algorithms for joins)
  • [3] C. Zhang, J. Naughton, D. DeWitt, Q. Luo, and
    G. Lohman. On supporting containment queries in
    relational database management systems. SIGMOD
    '01. (the MPMGJN algorithm)