Efficient Processing of Ordered XML Twig Pattern

1
Efficient Processing of Ordered XML Twig Pattern
  • By Jiaheng Lu, Tok Wang Ling, Tian Yu, Changqing Li, Wei Ni
  • Presented by Tian Yu
  • 23 Aug 2005

2
Outline
  • Introduction and motivation
  • Background
  • XML tree and twig pattern matching
  • Two previous algorithms: TwigStack and TwigStackList
  • Our ordered twig algorithms
  • Ordered Children Extension (OCE for short)
  • A generalized holistic matching algorithm: OrderedTJ
  • Experiments
  • Conclusion

4
Introduction
  • XML is rapidly gaining popularity as a data representation.
  • XML documents are modeled as ordered trees.
  • XML queries specify patterns of selection predicates on multiple
    elements that have some structural relationships (parent-child,
    ancestor-descendant).

5
What is a Twig Pattern?
  • A twig pattern is a small tree whose nodes are tags, attributes or
    text values and whose edges are either Parent-Child (P-C) edges or
    Ancestor-Descendant (A-D) edges.
  • E.g. query description: select Figure elements that are
    descendants of Paragraph elements, which in turn are children of
    Section elements that have a child element Title.
  • Twig pattern (figure): Section is the root, with P-C edges to Title
    and Paragraph; Paragraph has an A-D edge to Figure.
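A minimal Python sketch of how such a twig pattern could be represented. The TwigNode class and the root_leaf_paths helper are hypothetical names introduced here for illustration, not part of the presentation; each node records the edge to its parent ("/" for P-C, "//" for A-D), and the helper lists the root-to-leaf query paths of the twig.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TwigNode:
    """Hypothetical query-node type; edge is the edge from the parent."""
    tag: str
    edge: str = "//"  # "/" = Parent-Child, "//" = Ancestor-Descendant
    children: List["TwigNode"] = field(default_factory=list)


def root_leaf_paths(node: TwigNode, prefix: str = "") -> List[str]:
    """Return every root-to-leaf path of the twig as an XPath-like string."""
    path = prefix + node.edge + node.tag
    if not node.children:
        return [path]
    paths: List[str] = []
    for child in node.children:
        paths.extend(root_leaf_paths(child, path))
    return paths


# Section has children Title and Paragraph; Paragraph has descendant Figure.
twig = TwigNode("Section", "//", [
    TwigNode("Title", "/"),
    TwigNode("Paragraph", "/", [TwigNode("Figure", "//")]),
])
print(root_leaf_paths(twig))
```

The two paths it prints are the root-leaf query paths that the two-phase algorithms discussed later evaluate and then merge.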
6
Motivation
  • XML documents are modeled as ordered trees, so it is natural to
    have ordered queries.
  • Four ordered axes: following-sibling, preceding-sibling, following,
    preceding.
  • Example
  • Ordered query: //book/title/following-sibling::chapter
  • Unordered query: //book/title/chapter

7
Order axis
  • Four axes: following-sibling, preceding-sibling, following, and
    preceding.
  • In the sample XML document (figure: a tree with nodes a to j), set
    the context node to be f:
  • Following of f: i and j
  • Preceding of f: b, c and e
  • Following-sibling of f: i
  • Preceding-sibling of f: e
  • Following-sibling of f: a following node of f that shares the same
    parent with f.
  • Preceding-sibling of f: a preceding node of f that shares the same
    parent with f.
8
Ordered Twig Pattern
  • //chapter[title="related work"]/following::section
  • Intuitive meaning: search for all the sections that appear after
    (but are not descendants of) chapter elements with the title
    "related work" in the XML document.
  • The query node Book is ordered.

10
Ordered Twig Pattern
  • //chapter[title="related work"]/following::section
  • If the twig pattern is unordered:
  • section1, section2, and section3 are all matching elements.

11
Ordered Twig Pattern
  • //chapter[title="related work"]/following::section
  • But for the ordered query, section1 and section2 are not in the
    solution. How do we detect that in our method?
12
Motivation
  • Naïve method
  • Use an existing algorithm to output the intermediate path
    solutions for each individual root-leaf query path.
  • Merge the path solutions so that the final solutions are
    guaranteed to satisfy the order predicates of the query.
  • Disadvantage of the naïve method
  • Many intermediate results may not contribute to final answers.
  • Our solution: efficient processing of ordered XML twig patterns.
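To illustrate why the naïve method wastes work, here is a small Python sketch of its final filtering step. It assumes each matched element carries hypothetical (start, end) positions and that "a before b" means a ends before b starts; the function name and the sample positions are invented for this example.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]      # (start, end) positions of an element
Match = Dict[str, Region]     # query node name -> matched element


def naive_ordered_filter(matches: List[Match],
                         order_pairs: List[Tuple[str, str]]) -> List[Match]:
    """Keep only matches where, for each (a, b) pair, a ends before b starts."""
    return [m for m in matches
            if all(m[a][1] < m[b][0] for a, b in order_pairs)]


# Hypothetical positions: one chapter and three candidate sections.
chapter = (10, 20)
unordered_matches = [
    {"chapter": chapter, "section": (3, 5)},    # before the chapter
    {"chapter": chapter, "section": (12, 15)},  # inside the chapter
    {"chapter": chapter, "section": (25, 30)},  # after the chapter
]
ordered = naive_ordered_filter(unordered_matches, [("chapter", "section")])
print(len(unordered_matches), len(ordered))
```

In this sketch two of the three unordered matches are computed and then thrown away, which is the kind of useless intermediate result the slides refer to.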

14
XML Twig Pattern Matching
  • An XML document is commonly modeled as a rooted,
    ordered and tagged tree.

Figure (XML tree): a tree rooted at book that contains preface,
chapter, section, title and paragraph elements and the text values
Intro, Data and XML.
15
Region Coding
  • Node label [1]: (startPos, endPos, LevelNum)
  • E.g. (labels from the figure):
  • book (1,21,1)
  • preface (2,4,2), containing Intro (3,3,3)
  • chapter (5,12,2), containing section/title elements (6,8,3) and
    (9,11,3)
  • chapter (13,20,2), containing section/title elements (14,16,3) and
    (17,19,3)
  • Level-4 text nodes: (7,7,4), (10,10,4), (15,15,4), (18,18,4)

  1. M. P. Consens and T. Milo. Optimizing queries on files. In
    Proceedings of ACM SIGMOD, 1994.

16
Region Coding
  • Given e1 and e2, e1 is an ancestor of e2 iff
    e1.start < e2.start and e1.end > e2.end.
  • Figure: the labeled tree from the previous slide, with e1 marking
    book (1,21,1) and e2 marking one of its descendants.

17
Region Coding
  • Given e1 and e2, e1 is the parent of e2 iff
    e1.start < e2.start and e1.end > e2.end, and
    e1.level + 1 = e2.level.
  • Figure: the same labeled tree, with e1 marking book (1,21,1) and e2
    marking one of its children.
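The region-coding tests can be sketched in a few lines of Python. The Region type and the function names are assumptions made for this illustration; the labels are the ones shown on the Region Coding slides, and the "follows" test uses the end-before-start comparison that the OCE definition relies on later.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int
    end: int
    level: int


def is_ancestor(e1: Region, e2: Region) -> bool:
    """e1 strictly contains e2."""
    return e1.start < e2.start and e1.end > e2.end


def is_parent(e1: Region, e2: Region) -> bool:
    """e1 contains e2 and sits exactly one level above it."""
    return is_ancestor(e1, e2) and e1.level + 1 == e2.level


def follows(e1: Region, e2: Region) -> bool:
    """e2 starts after e1 has ended (e2 is neither inside nor before e1)."""
    return e1.end < e2.start


book = Region(1, 21, 1)
preface = Region(2, 4, 2)
intro = Region(3, 3, 3)
chapter1 = Region(5, 12, 2)
chapter2 = Region(13, 20, 2)

print(is_ancestor(book, intro), is_parent(book, intro), is_parent(book, preface))
print(follows(preface, chapter1), follows(chapter1, preface))
```

All three checks need only the two labels being compared, so structural relationships can be decided without walking the tree.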

19
Previous work TwigStack
  • TwigStack [2]: a holistic approach
  • Two-phase algorithm
  • Phase 1 (TwigJoin): part of the intermediate root-leaf paths are
    output.
  • Phase 2 (Merge): merge the intermediate paths to get the final
    results.
  • [2] N. Bruno, D. Srivastava, and N. Koudas. Holistic twig joins:
    optimal XML pattern matching. In Proceedings of ACM SIGMOD, 2002.

20
Sub-optimality of TwigStack
  • TwigStack is optimal when the query contains only
    ancestor-descendant relationships.
  • If the query contains any parent-child relationship, TwigStack may
    output some intermediate path solutions that cannot contribute to
    final results.
  • We say that TwigStack is sub-optimal for queries with parent-child
    relationships.

21
TwigStackList
  • The main problem of TwigStack is that it treats all edges as
    ancestor-descendant relationships in the first phase, so it is not
    efficient for queries with parent-child relationships.
  • Improved method: TwigStackList [3] (CIKM 2004)
  • It keeps an additional list structure for each query node to cache
    elements that are likely to participate in final solutions.
  • TwigStackList [3] improves on TwigStack because it considers
    parent-child relationships in the first phase.
  • TwigStackList is optimal when there is no P-C edge for branching
    nodes (a branching node is a node with more than one descendant or
    child).

[3] J. Lu, T. Chen, and T. W. Ling. Efficient processing of XML twig
patterns with parent child edges: a look-ahead approach. In CIKM,
pages 533-542, 2004.
22
TwigStackList v.s. TwigStack
  • Twig pattern (figure): query nodes section, title, paragraph and
    figure, with no parent-child relationship for the branching node.
  • XML tree (figure): Root with section elements s1 and s2, title
    elements t1 to t3, paragraph elements p1 to p3 and figure elements
    f1 and f2.
  • TwigStack outputs the useless path solution <s1, t1>, since it does
    not check the parent-child relationship.
  • TwigStackList has no useless output: <s1, t1> is not in the output.

24
Ordered Children Extension (OCE)
  • Definition
  • An element e_n (with tag n) has an OCE if:
  • 1) In the query Q, for every A-D child n_i of n (if any), there is
    an element e_ni (with tag n_i) that is a descendant of e_n, and
    e_ni also has an OCE; and
  • 2) In the query Q, for every P-C child n_i of n (if any), there is
    an element e' (with tag n) in the path from e_n to e_ni such that
    e' is the parent of e_ni, and e_ni also has an OCE; and
  • 3) For each child (or descendant) n_i of n, if there is a node m_i
    that is the immediate right sibling of n_i, there are elements
    e_ni and e_mi such that e_ni is a child (or descendant) of element
    e_n, e_ni.end < e_mi.start, and both e_ni and e_mi have an OCE.

The first two conditions are guaranteed by TwigStackList. Our main
focus is the third condition.
25
Ordered Children Extension (OCE)
  • Definition
  • Condition 3)
  • For each child (or descendant) n_i of n, if there is a node m_i
    that is the immediate right sibling of n_i, there are elements
    e_ni and e_mi such that e_ni is a child (or descendant) of element
    e_n, e_ni.end < e_mi.start, and both e_ni and e_mi have an OCE.
  • Figure: in the ordered XML query, n has children n_i and m_i with
    n_i to the left of m_i; in the XML document, e_n has e_ni and e_mi
    below it, with e_ni before e_mi.
26
Ordered Children Extension (OCE)
  • In an ordered XML query:
  • If node n is an ordered node
  • To find its OCE, all three of the previous conditions must be
    checked.
  • If node n is an unordered node
  • To find its OCE, only the first two conditions need to be checked.
    The last condition does not apply.
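The OCE definition can be read as a recursive check. The Python sketch below is one possible reading under stated assumptions, not the authors' implementation: QNode, Elem, label, below, candidates and has_oce are hypothetical names, the documents are small in-memory trees, condition 3 is tested pairwise between neighbouring query children exactly as the definition states, and the two sample documents are invented in the spirit of the deck's Example 1 and Example 2.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class QNode:
    """Hypothetical query node; edge is the edge from its parent."""
    tag: str
    edge: str = "AD"            # "AD" or "PC"
    ordered: bool = False
    children: List["QNode"] = field(default_factory=list)


@dataclass
class Elem:
    """Hypothetical document element with (start, end) positions."""
    tag: str
    children: List["Elem"] = field(default_factory=list)
    start: int = 0
    end: int = 0


def label(root: Elem) -> None:
    """Assign properly nested (start, end) positions by a depth-first walk."""
    counter = 0

    def visit(e: Elem) -> None:
        nonlocal counter
        counter += 1
        e.start = counter
        for c in e.children:
            visit(c)
        counter += 1
        e.end = counter

    visit(root)


def below(e: Elem) -> Iterator[Tuple[Elem, Elem]]:
    """Yield (parent, element) for every element strictly below e."""
    for c in e.children:
        yield e, c
        yield from below(c)


def candidates(e: Elem, q: QNode) -> List[Elem]:
    """Elements below e that can match query child q and have an OCE."""
    found = []
    for parent, d in below(e):
        if d.tag != q.tag:
            continue
        # Condition 2: for a P-C edge the parent must carry e's tag.
        if q.edge == "PC" and parent.tag != e.tag:
            continue
        if has_oce(d, q):
            found.append(d)
    return found


def has_oce(e: Elem, q: QNode) -> bool:
    """Check the three OCE conditions for element e against query node q."""
    per_child = [candidates(e, c) for c in q.children]
    if any(not c for c in per_child):          # conditions 1 and 2
        return False
    if q.ordered:                              # condition 3, ordered nodes only
        for left, right in zip(per_child, per_child[1:]):
            if not any(l.end < r.start for l in left for r in right):
                return False
    return True


query = QNode("a", ordered=True, children=[
    QNode("b", "AD"), QNode("c", "PC"), QNode("d", "AD")])

# Assumed shapes: in doc1, d comes after c; in doc2, d sits inside c.
doc1 = Elem("a", [Elem("e", [Elem("b")]), Elem("c"), Elem("e", [Elem("d")])])
doc2 = Elem("a", [Elem("e", [Elem("b")]), Elem("c", [Elem("d")])])
label(doc1)
label(doc2)
print(has_oce(doc1, query), has_oce(doc2, query))
```

Setting ordered=False on the root query node skips condition 3, so the second document would then pass the check.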

29
Ordered Children Extension Example 1
  • Figure: document with elements a1, b1, c1, d1, e1 and e2; ordered
    query with root a and child query nodes b, c and d.
  • a1 has an OCE:
  • 1) a1 has descendants b1 and d1 and child c1 (conditions 1 and 2
    of the OCE definition are fulfilled).
  • 2) b1 has a right sibling element c1, and c1 has a right sibling
    element d1 (condition 3 of the OCE definition is fulfilled).
32
Ordered Children Extension Example 2
  • Figure: document with elements a1, b1, c1, d1 and e1; ordered
    query with root a and child query nodes b, c and d.
  • a1 does not have any OCE:
  • 1) a1 has descendants b1 and d1 and child c1 (conditions 1 and 2
    of the OCE definition are fulfilled).
  • 2) b1 has a right sibling node c1 (condition 3 of the OCE
    definition is fulfilled for b and c).
  • 3) However, c1 only has d1 as a descendant. There is no element
    with the label d that is a right sibling of element c1
    (condition 3 of the OCE definition is not satisfied).
34
Data structure
  • Each node n in the twig query has a Stream, a List, and a Stack.
  • Data stream Tn
  • We partition an XML document into streams.
  • All elements in a stream have the same tag and are ordered by
    their start position.
  • The elements in each stream are read only once, from head to tail.
  • Figure: ordered query with root a and child query nodes b, c and
    d; a document with four levels. Streams: Ta: a1, a2, a3;
    Tb: b1, b2; Tc: c1, c2; Td: d1, d2, d3.
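A short Python sketch of the stream idea, assuming elements arrive as (tag, label) pairs with the (start, end, level) labels from the Region Coding slides. build_streams and Cursor are hypothetical names used only for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Label = Tuple[int, int, int]  # (start, end, level)


def build_streams(elements: List[Tuple[str, Label]]) -> Dict[str, List[Label]]:
    """Group elements by tag and sort each stream by start position."""
    streams: Dict[str, List[Label]] = defaultdict(list)
    for tag, lab in elements:
        streams[tag].append(lab)
    return {tag: sorted(labs) for tag, labs in streams.items()}


class Cursor:
    """Read-once, head-to-tail view over one stream."""

    def __init__(self, stream: List[Label]) -> None:
        self._stream = stream
        self._pos = 0

    def eof(self) -> bool:
        return self._pos >= len(self._stream)

    def head(self) -> Label:
        return self._stream[self._pos]

    def advance(self) -> None:
        self._pos += 1


elements = [
    ("chapter", (13, 20, 2)), ("book", (1, 21, 1)),
    ("preface", (2, 4, 2)), ("chapter", (5, 12, 2)),
]
streams = build_streams(elements)
cur = Cursor(streams["chapter"])
print(cur.head())
cur.advance()
print(cur.head())
```

The cursor only moves forward, which mirrors the rule that each stream is read once from head to tail.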
35
Data structure
  • Each node n in the twig query has a Stream, a List, and a Stack.
  • List Ln
  • The elements in the lists help to check P-C relationships.
  • Elements in each list Ln are strictly nested from first to last,
    i.e. in the XML document each element is an ancestor or parent of
    the element that follows it.
  • Figure: example lists for the query nodes a, b, c and d:
    La: a1, a2; Lb: b1, ...; Lc: c1; Ld: d1, d3.
36
Data structure
  • Each node n in the twig query has a Stream, a List, and a Stack.
  • Stack Sn
  • Stacks are used to store elements that have at least one OCE.
  • Elements in the stacks are potential solutions of the XML query.
  • When we insert a new element into a stack, the top element of the
    stack is popped if it does not have an A-D relationship with the
    new element.
  • Figure: one stack per query node (Sa, Sb, Sc, Sd) for the query
    with nodes a, b, c and d.
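The push rule for a stack can be sketched as follows, assuming elements are (start, end, level) tuples and that "has an A-D relationship" means the stack top is an ancestor of the incoming element. The function name and the sample labels are hypothetical. The loop keeps popping until an ancestor is on top, which is one reading of the rule above.

```python
from typing import List, Tuple

Label = Tuple[int, int, int]  # (start, end, level)


def push(stack: List[Label], elem: Label) -> None:
    """Pop tops that are not ancestors of elem, then push elem."""
    start, end, _level = elem
    while stack and not (stack[-1][0] < start and stack[-1][1] > end):
        stack.pop()
    stack.append(elem)


# Hypothetical elements with the same tag: a1 contains a2 and a3.
a1, a2, a3 = (1, 20, 1), (2, 9, 2), (10, 19, 2)
stack: List[Label] = []
push(stack, a1)
push(stack, a2)   # a1 is an ancestor of a2, so both stay
push(stack, a3)   # a2 is not an ancestor of a3, so a2 is popped first
print(stack)
```

After each push the stack holds a chain of nested elements, so every entry is an ancestor of the entries above it.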
37
A holistic matching algorithm OrderedTJ
  • We propose a general algorithm, OrderedTJ, that computes answers
    to an ordered query twig.
  • Our key focus is to check the ordered nodes in the query and find
    elements that have at least one OCE.

39
Main function
  • The OrderedTJ main function operates in two phases.
  • Phase 1: parts of the query root-leaf paths are output. The
    ordering requirements of the ordered query are checked.
  • Phase 2: these solutions are merge-joined to compute the answers
    to the whole query.
  • Figure annotations: Important function, Phase 1, Phase 2.
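Phase 2 can be pictured with the following Python sketch, in which a simple nested-loop join stands in for the merge-join. Path solutions are modelled as dictionaries from query node to element name, join_paths is a hypothetical helper, and the b9/s9 path is an invented non-matching entry; the other two paths are the ones output in the walkthrough later in the deck.

```python
from typing import Dict, List

Path = Dict[str, str]  # query node -> matched element name


def join_paths(left: List[Path], right: List[Path]) -> List[Path]:
    """Combine path solutions that agree on every shared query node."""
    joined = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined


chapter_paths = [
    {"Book": "b1", "Chapter": "c2", "Title": "t2", "related work": "r"},
]
section_paths = [
    {"Book": "b1", "Section": "s3"},
    {"Book": "b9", "Section": "s9"},  # hypothetical path that does not join
]
matches = join_paths(chapter_paths, section_paths)
print(matches)
```

Because the ordering requirements are already checked in Phase 1, this join only has to agree on shared query nodes; it does not re-check order.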
40
getNext(n)
  • It returns the next stream to be processed and advanced.
  • Figure annotations: Check Order, Check P-C.
41
An example of OrderedTJ algorithm
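  • Document (figure): book b1 with chapters c1, c2, c3, titles t1,
    t2, t3, sections s1, s2, s3, and the text values Introduction,
    Algorithm and Related work.
  • Query (figure): ordered twig rooted at Book, with query nodes
    Chapter, Title, "related work" and Section.
  • Streams: Book: b1; Chapter: c1, c2, c3; Title: t1, t2, t3;
    Section: s1, s2, s3; plus a stream for "related work".
  • Next action: partition the XML document into streams.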
42
An example of OrderedTJ algorithm
  • Next action: show the lists for nodes with a P-C child.
43
An example of OrderedTJ algorithm
  • Next action: show the stacks of every node in the query.
44
An example of OrderedTJ algorithm
  • Observation: t1 has no descendant "related work".
  • Next action: advance(Title).
45
An example of OrderedTJ algorithm
  • Observation: t2 has the descendant "related work".
  • Next action: insert t2 into the list of Title.
46
An example of OrderedTJ algorithm
  • Observation: c1 has no descendant title that has the child
    "related work".
  • Next action: advance(Chapter).
47
An example of OrderedTJ algorithm
  • Observation: c2 has a descendant t2 that has the child
    "related work".
  • Next action: insert c2 into the list of Chapter.
48
An example of OrderedTJ algorithm
  • Observation: s1 is not a following element of c2.
  • Next action: advance(Section).
49
An example of OrderedTJ algorithm
  • Observation: s2 is not a following element of c2.
  • Next action: advance(Section).
50
An example of OrderedTJ algorithm
  • Observation: b1 has an OCE.
  • Next action: push b1 into the stack of Book.
51
An example of OrderedTJ algorithm
  • Observation: c2 has an OCE.
  • Next action: push c2 into the stack of Chapter.
52
An example of OrderedTJ algorithm
  • Observation: t2 has an OCE.
  • Next action: push t2 into the stack of Title.
53
An example of OrderedTJ algorithm
  • Observation: "related work" is the leaf node.
  • Next action: push r into the stack of Related work.
54
An example of OrderedTJ algorithm
  • Observation: a path is found.
  • Next action: output b1, c2, t2, r.
55
An example of OrderedTJ algorithm
  • Observation: s3 is a leaf node and follows element c2.
  • Next action: push s3 into the stack.
56
An example of OrderedTJ algorithm
  • Observation: a path is found.
  • Next action: output b1, s3.
57
An example of OrderedTJ algorithm
  • Path solutions output so far: (b1, c2, t2, r) and (b1, s3).
58
An example of OrderedTJ algorithm
  • Next action: join the output paths.
  • A match is found.
60
Optimality of OrderedTJ
  • TwigStack does not consider P-C relationships and therefore
    produces more intermediate results than TwigStackList.
  • Therefore, we compare the optimality of OrderedTJ with that of
    TwigStackList.
  • Example: we match ordered Query 1 against XML Document 1 using the
    two algorithms, TwigStackList and OrderedTJ.
  • Figure: Query 1 is an ordered twig with root a and children b and
    c; Document 1 contains the elements a1, a2, b1 and c1.
61
Optimality of OrderedTJ
  • TwigStackList can solve an ordered XML query only with the naïve
    method.
  • Therefore, it converts Query 1 to Query 2 by removing the order
    marker from the twig pattern.
  • Figure: Query 1 (ordered) and Query 2 (unordered), each with root
    a and children b and c; Document 1 with elements a1, a2, b1 and
    c1.
62
Optimality of OrderedTJ
  • Sub-optimality of TwigStackList
  • When there is a P-C relationship at the branching node, there can
    be redundant intermediate output.
  • In this example
  • The elements in the streams are read only once, from head to tail.
  • Therefore, when TwigStackList processes elements a1, c1 and b1, it
    has no way to decide whether there is an element b2 that is a
    child of a1.
  • Therefore, the algorithm outputs the useless solution <a1, c1>.
  • Figure (TwigStackList): Query 2 (unordered twig with root a and
    children b and c) and the document with elements a1, a2, b1, c1
    and a possible b2.
63
Optimality of OrderedTJ
  • Optimality of OrderedTJ
  • It allows a parent-child relationship on the first branching edge
    of an ordered node.
  • In this example
  • When OrderedTJ processes elements a1, c1 and b1, there is no
    element with tag b before c1, so condition 3 of the OCE definition
    is not satisfied and c1 does not contribute to any final answer.
  • Therefore, the algorithm does not output the useless solution
    <a1, c1>.
  • Figure (OrderedTJ): Query 1 (ordered twig with root a and children
    b and c) and the document with elements a1, a2, b1 and c1.
64
Optimality of OrderedTJ
TwigStack optimality: A-D only.
TwigStack is optimal for queries with only A-D edges.
65
Optimality of OrderedTJ
TwigStackList optimality: A-D for branching nodes (includes A-D only).
TwigStackList is optimal for queries that have only A-D edges for
branching nodes. The other edges in the query can be P-C edges.
66
Optimality of OrderedTJ
OrderedTJ optimality: P-C for the first branch of ordered nodes
(includes A-D for branching nodes and A-D only).
OrderedTJ allows a parent-child relationship on the first branching
edge of ordered nodes.
68
Experiments
  • Algorithms for comparison
  • Straightforward TwigStack (STW for short)
  • Straightforward TwigStackList (STWL)
  • Our proposed OrderedTJ
  • Benchmarks
  • XMark: synthetic data
  • Size: 115 MB (factor 1.0)
  • TreeBank: real data from the Wall Street Journal
  • Size: 82 MB, 2.5 million nodes
69
Experiments
  • Test queries
  • Q1, Q2, Q3 for XMark; Q4, Q5, Q6 for TreeBank
  • Evaluation metrics
  • Number of intermediate path solutions
  • Total running time

70
Experiments Execution Time
OrderedTJ outputs fewer intermediate results and therefore has a
shorter execution time.
71
Experiments Intermediate result
Query  Dataset   STW    STWL   OrderedTJ  Useful solutions
Q1     XMark     71956  71956  44382      44382
Q2     XMark     65940  65940  10679      10679
Q3     XMark     71522  71522  23959      23959
Q4     TreeBank   2237   1502    381        302
Q5     TreeBank  92705  92705  83635      79941
Q6     TreeBank  10663     11      5          5
Table 1. The number of intermediate path solutions
For all queries, OrderedTJ has the fewest intermediate path solutions.
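As a quick arithmetic check on Table 1, the Python sketch below subtracts the useful solutions from each algorithm's output to get the number of redundant intermediate path solutions. The table1 dictionary copies the figures above; the redundant helper is a hypothetical name introduced for this example.

```python
from typing import Dict, Tuple

# Rows of Table 1: query -> (STW, STWL, OrderedTJ, useful solutions)
table1: Dict[str, Tuple[int, int, int, int]] = {
    "Q1": (71956, 71956, 44382, 44382),
    "Q2": (65940, 65940, 10679, 10679),
    "Q3": (71522, 71522, 23959, 23959),
    "Q4": (2237, 1502, 381, 302),
    "Q5": (92705, 92705, 83635, 79941),
    "Q6": (10663, 11, 5, 5),
}


def redundant(row: Tuple[int, int, int, int]) -> Dict[str, int]:
    """Intermediate path solutions that do not contribute to final answers."""
    stw, stwl, ordered_tj, useful = row
    return {
        "STW": stw - useful,
        "STWL": stwl - useful,
        "OrderedTJ": ordered_tj - useful,
    }


for query, row in table1.items():
    print(query, redundant(row))
```

By this measure OrderedTJ produces no redundant path solutions for Q1, Q2, Q3 and Q6, and 79 and 3694 for Q4 and Q5 respectively.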
73
Experiments Intermediate result
Query 1 (figure): ordered twig with query nodes test, bold and keyword.
Q1 has only A-D edges, so STW and STWL output the same number of
intermediate results (Table 1). OrderedTJ outputs fewer intermediate
results because it also considers the ordering relationship.
74
Experiments Intermediate result
Query 4 (figure): ordered twig with query nodes S, VP, PP, IN, NP and
VBN.
Q4 has P-C edges for non-branching nodes, so STWL outputs fewer
intermediate results than STW (Table 1). OrderedTJ outputs even fewer
because it also considers the ordering relationship. OrderedTJ still
produces some redundant intermediate results compared with the final
useful results, because there are P-C edges on the second branch of
the ordered node PP.
75
Experiments Intermediate result
Query 6 (figure): ordered twig with query nodes S, DT and PRP_DOLLAR_.
STWL outputs fewer intermediate results than STW, since there is a P-C
edge in the query (Table 1). OrderedTJ outputs no redundant
intermediate results compared with the final useful results, because
the query only has a P-C edge on the first branch of the ordered node
PP. OrderedTJ is optimal in this case.
77
Conclusions
  • We developed a new algorithm, OrderedTJ, to solve the problem of
    ordered twig pattern matching.
  • OrderedTJ identifies a larger query class for which I/O optimality
    is guaranteed.
  • Experimental results showed the effectiveness, scalability, and
    efficiency of our algorithm.
  • Future work: implement more efficient indexing methods, e.g.
    B tree or R tree, to skip XML elements.

78
Reference(1)
  • [1] M. P. Consens and T. Milo. Optimizing queries on files. In
    Proceedings of ACM SIGMOD, 1994.
  • Node labels: region encoding.
  • [2] N. Bruno, D. Srivastava, and N. Koudas. Holistic twig joins:
    optimal XML pattern matching. In SIGMOD Conference, pages 310-321,
    2002.
  • Proposes the TwigStack algorithm.
  • [3] J. Lu, T. Chen, and T. W. Ling. Efficient processing of XML
    twig patterns with parent child edges: a look-ahead approach. In
    CIKM, pages 533-542, 2004.
  • Proposes the TwigStackList algorithm.

79
Reference(2)
  • [4] Y. Chen, S. B. Davidson, and Y. Zheng. BLAS: an efficient
    XPath processing system. In Proc. of SIGMOD, pages 47-58, 2004.
  • Proposes a new algorithm for XPath queries.
  • [5] J. Lu, T. W. Ling, C. Y. Chan, and T. Chen. From region
    encoding to extended Dewey: on efficient processing of XML twig
    pattern matching. In VLDB, 2005.
  • Proposes a new twig pattern matching algorithm based on a proposed
    prefix labeling scheme.

80
END
  • Thank you!
  • Q & A