Jiaheng Lu, Ting Chen and Tok Wang Ling - PowerPoint PPT Presentation

TJFast: Effective Processing of XML Twig Pattern Matching
Jiaheng Lu, Ting Chen and Tok Wang Ling
National University of Singapore
lujiahen,chent,lingtw_at_comp.nus.edu.sg
1. INTRODUCTION
Extended Dewey solves two problems: wildcard queries and query performance.
Finding all the occurrences of a twig pattern in an XML database is a core operation for efficient evaluation of XML queries. Our motivation is twofold: (1) the performance of previous holistic twig join algorithms [1, 2] can be further improved; (2) algorithms based on region encoding cannot answer queries with wildcards in branching nodes.
Extended Dewey is a prefix-based labeling scheme, so we can answer this wildcard query using extended Dewey labels.
Given an extended Dewey label, we can use statistical information and a model function to derive its path. For example:
0.5.1.1 -> bib/book/chapter/section/text
1.2.1.1 -> bib/book/chapter/section/section
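The poster does not spell out how a label is turned into a path, so the following is only a rough sketch under a stated assumption: each label component is reduced modulo the number of possible child tags of the parent, and the remainder selects the child tag. The schema `CHILD_TAGS`, the function `derive_path` and the labels used here are hypothetical, chosen for illustration; they are not meant to reproduce the labels shown above.

```python
# Hypothetical schema: each tag maps to an ordered list of its possible child tags.
CHILD_TAGS = {
    "bib": ["book"],
    "book": ["author", "title", "chapter"],
    "chapter": ["title", "section"],
    "section": ["text", "section", "keyword"],
}


def derive_path(label, root="bib"):
    """Translate a dotted label into a root-to-node tag path.

    Assumed rule (not confirmed by the poster): a component x under a parent
    with n possible child tags denotes the child tag at index x mod n.
    """
    path = [root]
    if label:  # an empty label denotes the root itself
        for component in label.split("."):
            children = CHILD_TAGS[path[-1]]
            path.append(children[int(component) % len(children)])
    return "/".join(path)


print(derive_path("0.5.1.3"))
print(derive_path("0.2.3.4"))
```

Under this assumption the label alone, together with the schema, is enough to recover every tag on the path from the root, which is what lets a join algorithm skip reading the labels of the ancestors.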
According to the region codes, which document, Doc1 or Doc2, matches the query?
To answer a twig pattern query, we propose a new holistic twig join algorithm called TJFast. Unlike previous algorithms, TJFast needs to access only the labels of the leaf query nodes to answer path and twig queries, which significantly reduces I/O cost. For example, given the path query //chapter/section/text, we access only the labels of text elements. Given the twig query //chapter/section[.//keyword]/text, we scan only keyword and text.
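To illustrate why the leaf label can be enough for a path query, here is a minimal sketch that checks a query against the root-to-leaf tag path recovered from a leaf's label. It is not the TJFast algorithm (there are no streams and no twig join); `parse_path_query` and `path_matches` are hypothetical helper names, and the tag paths are assumed to have already been derived from leaf labels.

```python
import re


def parse_path_query(query):
    """Split a query such as '//chapter/section/text' into (axis, tag) steps."""
    return re.findall(r"(//|/)([^/]+)", query)


def path_matches(query, path):
    """Return True if the last tag of `path` is a match for the path query.

    `path` is the list of tags from the document root down to a leaf element.
    The axis of a step relates it to the previous step (or to the root).
    """
    steps = parse_path_query(query)
    if not steps or not path:
        return False

    def match(si, pi):
        axis, tag = steps[si]
        if tag != "*" and path[pi] != tag:
            return False
        if si == 0:
            # '/' anchors the first step at the root, '//' allows any depth.
            return axis == "//" or pi == 0
        if axis == "/":
            return pi >= 1 and match(si - 1, pi - 1)
        return any(match(si - 1, k) for k in range(pi))

    return match(len(steps) - 1, len(path) - 1)


leaf_path = "bib/book/chapter/section/text".split("/")
print(path_matches("//chapter/section/text", leaf_path))
```

Only the path of the text element is inspected; no chapter or section label is read, which is the source of the I/O saving described above.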
3. TJFAST
By reading the region encodings of elements a, b and c alone, we cannot answer this wildcard branching query. In this paper, we provide a new prefix-based labeling scheme called extended Dewey.
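The difference between the two kinds of label can be sketched with a small example. Everything in it is an assumption made for illustration: the query `a/*[b]/c`, the two documents, the region triples (start, end, level), and the use of plain Dewey labels (tuples of child positions) as a stand-in for extended Dewey. In the first document b and c share one parent under a; in the second they sit under two different children of a.

```python
from collections import namedtuple

Region = namedtuple("Region", ["start", "end", "level"])

# Hypothetical Doc1: a -> x -> (b, c).  Hypothetical Doc2: a -> (x -> b, y -> c).
DOC1_REGION = {"a": Region(1, 100, 1), "x": Region(2, 99, 2),
               "b": Region(10, 20, 3), "c": Region(30, 40, 3)}
DOC2_REGION = {"a": Region(1, 100, 1), "x": Region(5, 25, 2),
               "b": Region(10, 20, 3), "y": Region(26, 45, 2),
               "c": Region(30, 40, 3)}

# Plain Dewey labels for the same two documents (root a has the empty label).
DOC1_DEWEY = {"a": (), "b": (0, 0), "c": (0, 1)}
DOC2_DEWEY = {"a": (), "b": (0, 0), "c": (1, 0)}


def query_node_labels(doc):
    """Keep only the labels of the named query nodes a, b and c."""
    return {tag: doc[tag] for tag in ("a", "b", "c")}


def matches_by_prefix(labels):
    """Check a/*[b]/c using prefix labels of a, b and c only."""
    a, b, c = labels["a"], labels["b"], labels["c"]
    same_parent = b[:-1] == c[:-1]
    parent_is_child_of_a = len(b) == len(a) + 2 and b[:len(a)] == a
    return same_parent and parent_is_child_of_a


# The region labels of a, b and c are identical, so they cannot tell the documents apart.
print(query_node_labels(DOC1_REGION) == query_node_labels(DOC2_REGION))
# The prefix labels of b and c expose their parents, so the two documents differ.
print(matches_by_prefix(DOC1_DEWEY), matches_by_prefix(DOC2_DEWEY))
```

With region labels the element matched by the wildcard has to be read to decide the query; with a prefix-based label the parent of b and c is already encoded in their own labels.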
2. FEATURES OF EXTENDED DEWEY
4. PRELIMINARY EXPERIMENTS
We use random data sets (with 3 million nodes) consisting of seven labels, namely a,b,...,e. The node labels in the data are uniformly distributed. We issue four twig queries: a.//b//c, a./b/c, a./b/c/d/e, a.//b/c//d/e. We compare our method with the previous work TwigStack [1] and TwigStackList [2].
A sample XML tree with extended Dewey labels
Unlike the previous region encoding labeling scheme, extended Dewey has two unique features:
References
(1) N. Bruno, D. Srivastava, and N. Koudas. Holistic twig joins: optimal XML pattern matching. In SIGMOD Conference, pages 310-321, 2002.
(2) J. Lu, T. Chen, and T. W. Ling. Efficient processing of XML twig patterns with parent child edges: a look-ahead approach. In CIKM, pages 533-542, 2004.
(3) P. O'Neil et al. ORDPATHs: Insert-friendly XML node labels. In SIGMOD, pages 903-908, 2004.
  1. Extended Dewey enables us to answer wildcard queries efficiently.
  2. We only need to access the labels of query leaf nodes to answer a twig query, which significantly reduces the I/O cost of the join algorithm.