<AUTHORS> - PowerPoint PPT Presentation

Provided by: gnanas

Learn more at: http://www.cise.ufl.edu

Transcript and Presenter's Notes

1
<TITLE>Indexing and Querying XML Data for
../Regular Path Expressions/</TITLE>

<AUTHORS>
<NAME ID="1">Quanzhong Li</NAME>
<NAME ID="2">Bongki Moon</NAME>
</AUTHORS>

<PRESENTERS>
<NAME UFID="1234567">SUNDAR</NAME>
<NAME UFID="7654321">SUPRIYA</NAME>
</PRESENTERS>

2
Need for this paper

XML has emerged as a popular standard for data representation and data exchange on the Internet
XML query languages use regular path expressions to query the data
Conventional approaches to indexing and searching this data, based on tree traversals, go for a toss under heavy access requests
Traversing the hierarchy of XML data becomes an overhead if the path lengths are long or unknown
What can be done?

3
Try our System and the Algorithms !!!

New system for indexing and storing XML data: XISS
New numbering scheme for elements and attributes
- Quick to determine the ancestor-descendant relationship
New index structures
- Easier to find all elements and attributes with a given name string
Join algorithms for processing regular path expression queries
- EE-Join to search paths from element to element
- EA-Join to find element-attribute pairs
- KC-Join to find the Kleene closure (*) on repeated paths or elements

4
Go XISS!!!

In general, XML data can be queried by value or by structure
By value: get me 'document'; get me 'element=node1' or 'attribute=10'
By structure: get me the parent and child elements/attributes of a given element
Components
- Index structure: element, attribute and structure (index)
- Data loader
- Query processor
Numbering scheme first...

5
Dietz vs. Li-Moon

Dietz says: "If x and y are nodes of a tree T, x is an ancestor of y iff x comes before y when I climb down the tree (preorder) and after y when I climb up (postorder)", and shows us his scheme:
Ancestor-descendant relationship determination in constant time
Li-Moon say: "But this lacks flexibility!"
It leads to many re-computations when a new node is inserted.
Hmm, let us check out Li-Moon's.
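
Dietz's test can be tried out on a toy tree. This is a minimal sketch, assuming the tree is stored as a dict of child lists; the tree itself and the names `number_tree` and `is_ancestor` are made up for illustration and do not come from the slides.

```python
def number_tree(children, root):
    """Return (pre, post) dicts holding each node's preorder and postorder rank."""
    pre, post = {}, {}

    def visit(node):
        pre[node] = len(pre)            # rank assigned on the way down
        for child in children.get(node, []):
            visit(child)
        post[node] = len(post)          # rank assigned on the way back up

    visit(root)
    return pre, post


def is_ancestor(x, y, pre, post):
    """Dietz's test: x precedes y in preorder and follows y in postorder."""
    return pre[x] < pre[y] and post[x] > post[y]


# Hypothetical document tree, used only for illustration.
children = {"book": ["chapter1", "chapter2"], "chapter1": ["figure1"]}
pre, post = number_tree(children, "book")
print(is_ancestor("book", "figure1", pre, post))      # True
print(is_ancestor("chapter2", "figure1", pre, post))  # False
```

The check itself is two comparisons, but every rank is a dense position in the traversal, so inserting one node shifts the ranks of many others.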

6
Li-Moon's Numbering

Hey folks, we are going to extend this preorder to cover a range of descendants
Just associate a pair of numbers <order, size> with each node
Parent node x says to its child node y: "I came before you, so my order is less than yours, and my order + size is >= your order + size, so your interval is always contained in my interval"
If x and y are siblings (same parent) and x comes before y, then order(x) + size(x) < order(y)

7
Voila!

Here it goes,
So, for any node x, size(x) >= the sum of the sizes of all its direct children; size(x) can be Laarrrge!
That being said: given nodes x and y of a tree T, x is an ancestor of y iff
order(x) < order(y) <= order(x) + size(x)
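
One way to produce such pairs is an extended preorder walk that leaves some unused numbers inside every interval. The sketch below is an assumption-laden illustration, not the authors' procedure: the `slack` parameter, its value of 2, the toy tree and the function names are all hypothetical.

```python
def assign(children, root, slack=2):
    """Assign <order, size> pairs by an extended preorder walk.

    `slack` unused numbers are reserved at the end of every node's interval;
    the value 2 is an arbitrary choice for this sketch.
    """
    numbers = {}

    def visit(node, order):
        next_free = order + 1
        for child in children.get(node, []):
            # Each child starts right after the interval of the previous one.
            next_free = visit(child, next_free) + 1
        size = (next_free - 1 - order) + slack
        numbers[node] = (order, size)
        return order + size             # last number covered by this node

    visit(root, 1)
    return numbers


def is_ancestor(x, y, numbers):
    """x is an ancestor of y iff order(x) < order(y) <= order(x) + size(x)."""
    (ox, sx), (oy, _) = numbers[x], numbers[y]
    return ox < oy <= ox + sx


# Hypothetical document tree, used only for illustration.
children = {"book": ["chapter1", "chapter2"], "chapter1": ["figure1"]}
numbers = assign(children, "book")
print(numbers["chapter1"])                         # (2, 5)
print(is_ancestor("book", "figure1", numbers))     # True
print(is_ancestor("chapter2", "figure1", numbers)) # False
```

Only the pair stored with each node is consulted, so the test needs no tree traversal.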

8
Good news!

Easy accommodation of future insertions: more flexible
Global reordering is not necessary until the reserved space runs out
The order in the <order, size> pair is a unique identifier for each element and attribute in the document
Attribute nodes are placed before their sibling elements in the order. Why?
How does this scheme help? Wait for the algorithms!
Switching back to XISS
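
The point about insertions and reserved space can be made concrete. This is a rough sketch under stated assumptions: the starting numbers are hand-picked, `insert_last_child` is a hypothetical helper, and the "reserve half of what is left" policy is an arbitrary choice, not something the slides specify.

```python
def insert_last_child(numbers, parent, siblings, new_node):
    """Number new_node as the last child of parent without touching other nodes.

    numbers maps node -> (order, size); siblings lists the parent's current
    children. Returns False when the parent's reserved space is used up,
    which is the point where a global reordering would be needed.
    """
    p_order, p_size = numbers[parent]
    # Highest number already covered by the parent itself or an existing child.
    used = max([p_order] + [numbers[s][0] + numbers[s][1] for s in siblings])
    order = used + 1
    if order > p_order + p_size:
        return False
    # Arbitrary policy for this sketch: the new node reserves half of what is left.
    numbers[new_node] = (order, (p_order + p_size - order) // 2)
    siblings.append(new_node)
    return True


# Hypothetical numbering with spare room inside book's interval (1, 12].
numbers = {"book": (1, 11), "chapter1": (2, 5), "chapter2": (8, 2)}
chapters = ["chapter1", "chapter2"]
print(insert_last_child(numbers, "book", chapters, "chapter3"))  # True
print(numbers["chapter3"])                                       # (11, 0)
```

The existing pairs for book, chapter1 and chapter2 are left untouched; only when the function returns False would renumbering be required.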

9
Internals of XISS

Index Structure Overview

10
More structures

Element Index
Structure Index

11
Path Join Algorithms

Conventional approaches (top-down, bottom-up and hybrid traversals) are not effective
Main idea of the proposed algorithm
For a given query chapter/_*/figure,
- find all chapter elements
- find all figure elements
- join the qualified chapter-figure pairs without traversing the XML data trees (possible if the ancestor-descendant relationship can be determined quickly)

12
Complex -> Simple

A complex path expression is decomposed into many simple path expressions
Intermediate results are joined to get the final result.
Different types of sub-expressions

13
EA-Join Algorithm

To join intermediate results from sub-expressions with a list of elements and a list of attributes
E.g. figure[@caption="flowchart"]
The numbering scheme should place attributes before their sibling elements in the order

14
EA-Join Algorithm

Input: a list of figure elements and a list of caption attributes, grouped by document
Steps (2 stages)
- Element sets and attribute sets are merged by document ID (single scan)
- Elements and attributes are merged by determining the parent-child relationship using the <order> value (single scan)
Output: a set of (e, a) pairs where e is the parent of a
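
The two stages above can be sketched as follows. The record layouts are assumptions: the slides only say that the <order> value decides parenthood, so the `parent_order` field on attributes, the dict-based grouping by document ID and the sample data are all hypothetical.

```python
from collections import namedtuple

# Hypothetical record layouts; the slides only say that <order> is used.
Element = namedtuple("Element", "doc order size")
Attribute = namedtuple("Attribute", "doc order parent_order")


def ea_join(elements_by_doc, attributes_by_doc):
    """Return (element, attribute) pairs where the element is the attribute's parent.

    Both arguments map a document id to a list sorted by order.
    """
    pairs = []
    # Stage 1: match the groups that share a document id.
    for doc in sorted(elements_by_doc.keys() & attributes_by_doc.keys()):
        elements, attributes = elements_by_doc[doc], attributes_by_doc[doc]
        i = 0
        # Stage 2: one forward pass over both sorted lists.
        for attr in attributes:
            while i < len(elements) and elements[i].order < attr.parent_order:
                i += 1
            if i < len(elements) and elements[i].order == attr.parent_order:
                pairs.append((elements[i], attr))
    return pairs


# Made-up data: the caption at order 7 belongs to a non-figure element.
figures = {1: [Element(1, 3, 2), Element(1, 9, 2)], 2: [Element(2, 5, 1)]}
captions = {1: [Attribute(1, 4, 3), Attribute(1, 7, 6), Attribute(1, 10, 9)]}
result = ea_join(figures, captions)
print([(e.order, a.order) for e, a in result])  # [(3, 4), (9, 10)]
```

The element pointer never moves backwards because attributes sit directly after their parent in the order, which is what makes a single scan sufficient in this sketch.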

15
EE-Join Algorithm

To join intermediate results, each of which is a list of elements from a sub-expression
E.g. chapter/_*/figure
Input: a list of chapter elements and a list of figure elements
Steps (2 stages), similar to the EA-Join algorithm
- Both element sets are merged by document ID (single scan)
- Chapter elements and figure elements are merged by determining the ancestor-descendant relationship using the <order, size> values
Output: a set of (e, f) pairs where e is an ancestor of f

16
EE-Algorithm

The second stage cannot be done in a single scan
In this example, a figure element can be a descendant of more than one chapter element (see book1.xml)
order(figure) will lie in more than one chapter interval (order(chapter), order(chapter) + size(chapter)]
Multiple scans are still highly effective for searching long or unknown-length paths compared to conventional tree traversals.
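
A small sketch shows why one figure can pair with several chapters. It is not the scan strategy from the paper: here the repeated scanning is replaced by a binary search over the sorted figure orders, and the record layout, function name and sample data are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import namedtuple

Element = namedtuple("Element", "doc order size")  # hypothetical record layout


def ee_join(ancestors_by_doc, descendants_by_doc):
    """Return (a, d) pairs where order(a) < order(d) <= order(a) + size(a).

    Both arguments map a document id to a list sorted by order.
    """
    pairs = []
    # Stage 1: match the groups that share a document id.
    for doc in sorted(ancestors_by_doc.keys() & descendants_by_doc.keys()):
        descendants = descendants_by_doc[doc]
        orders = [d.order for d in descendants]
        # Stage 2: every ancestor selects the slice that falls in its interval.
        for anc in ancestors_by_doc[doc]:
            lo = bisect_right(orders, anc.order)             # first order > order(anc)
            hi = bisect_right(orders, anc.order + anc.size)  # first order past the interval
            pairs.extend((anc, d) for d in descendants[lo:hi])
    return pairs


# Made-up data: the chapter at order 4 is nested inside the chapter at order 2.
chapters = {1: [Element(1, 2, 10), Element(1, 4, 4), Element(1, 14, 2)]}
figures = {1: [Element(1, 6, 0), Element(1, 11, 0)]}
result = ee_join(chapters, figures)
print([(c.order, f.order) for c, f in result])  # [(2, 6), (2, 11), (4, 6)]
```

The figure at order 6 appears twice in the output, once per enclosing chapter, which is the situation that rules out a single forward scan.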

17
KC-Algorithm

Processes a regular path expression with zero, one or more occurrences of a sub-expression
E.g. chapter*, chapter+
Input: a set of elements from an XML document
Steps
- Each stage applies the EE-Join algorithm to the previous stage's result
- Repeat until the result no longer changes
Output: the Kleene closure of all elements in the given input set
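
The stage-by-stage idea can be sketched for chains of directly nested elements. Several things here are assumptions rather than the paper's method: the `depth` field used to tell a parent from a more distant ancestor, the nested loop standing in for the EE-Join step, and the sample data.

```python
from collections import namedtuple

Element = namedtuple("Element", "order size depth")  # depth is an assumed field


def is_parent(x, y):
    """Parent test: y lies in x's interval and sits exactly one level deeper."""
    return x.order < y.order <= x.order + x.size and y.depth == x.depth + 1


def kc_join(elements):
    """Return all chains e1/e2/.../ek (k >= 1) of directly nested input elements."""
    stage = [(e,) for e in elements]    # chains of length 1
    closure = list(stage)
    while stage:                        # stop when a stage adds nothing new
        # Extend every chain from the previous stage by one more element.
        stage = [chain + (e,) for chain in stage for e in elements
                 if is_parent(chain[-1], e)]
        closure.extend(stage)
    return closure


# Made-up chapter elements: c2 is inside c1, c3 is inside c2, c4 stands alone.
c1, c2, c3, c4 = (Element(2, 10, 1), Element(4, 4, 2),
                  Element(6, 0, 3), Element(14, 2, 1))
chains = kc_join([c1, c2, c3, c4])
print(len(chains))                      # 7
print((c1, c2, c3) in chains)           # True
```

The result corresponds to one or more occurrences; for zero occurrences the empty path would be added as well. The loop ends because each stage goes one level deeper and the tree depth is finite.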

18
Experiments...?