Title: Indexing and Querying XML Data for Regular Path Expressions

1
Indexing and Querying XML Data for Regular Path
Expressions

Quanzhong Li and Bongki Moon
Dept. of Computer Science
University of Arizona
VLDB 2001.

2
Querying XML

XML has a tree-structured data model.
Queries involve navigating the data using regular path expressions (e.g., XPath),
e.g. /chapter/_*/figure[@caption="Tree Frogs"]
Accessing all elements with the same name string.
Ancestor-descendant relationship between elements.

3
Contribution

New system for indexing XML data.
Querying XML data based on a numbering scheme for elements.
Join algorithms for processing complex regular path expressions.

4
Outline

Numbering scheme
Index structure
Join algorithms
Experimental results

5
Path expression evaluation

Previous approaches
Conventional tree traversals.
Disadvantage: overhead of traversing for long or unknown path lengths.
New approach
Indexing for efficient element access.
Numbering scheme for the ancestor-descendant relationship.

6
Dietz's Numbering Scheme

For two given nodes x and y of a tree T, x is an ancestor of y if and only if x occurs before y in the preorder traversal of T and after y in the postorder traversal.

[Figure: example tree with each node labeled (preorder, postorder): (1,7), (2,4), (3,1), (4,2), (5,3), (6,6), (7,5)]
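Dietz's condition can be checked with a short Python sketch. The Node class, the helper names, and the tree shape are assumptions made for illustration; the shape was chosen only because it reproduces the seven (preorder, postorder) labels listed on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    pre: int = 0   # rank in preorder traversal (1-based)
    post: int = 0  # rank in postorder traversal (1-based)


def number_tree(root):
    """Assign preorder and postorder ranks to every node of the tree."""
    pre_counter = 0
    post_counter = 0

    def visit(node):
        nonlocal pre_counter, post_counter
        pre_counter += 1
        node.pre = pre_counter          # numbered on the way down
        for child in node.children:
            visit(child)
        post_counter += 1
        node.post = post_counter        # numbered on the way back up

    visit(root)


def is_ancestor(x, y):
    """Dietz: x is an ancestor of y iff x is earlier in preorder and later in postorder."""
    return x.pre < y.pre and x.post > y.post


# Hypothetical tree shape that yields the labels on the slide.
leaves = [Node("c"), Node("d"), Node("e")]
left = Node("b", leaves)
g = Node("g")
right = Node("f", [g])
root = Node("a", [left, right])
number_tree(root)
```

With this shape, root is labeled (1,7), left (2,4), and g (7,5), so is_ancestor(left, leaves[0]) holds while is_ancestor(left, g) does not.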
7
Proposed numbering scheme

This scheme associates with each node a pair of numbers <order, size> as follows:
For a tree node y and its parent x,
order(x) < order(y) and
order(y) + size(y) <= order(x) + size(x).
For two sibling nodes x and y, if x is the predecessor of y in preorder traversal, then
order(x) + size(x) < order(y).

[Figure: example tree with each node labeled (order, size): (1,100), (10,30), (11,5), (17,5), (25,5), (41,10), (45,5)]
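One way to produce labels that satisfy these conditions is sketched below in Python. The function name, the (name, children) tuple encoding, and the slack parameter are assumptions for illustration, not the presenters' procedure; the sketch lays children out left to right and pads every node's size with a fixed amount of unused room.

```python
def assign_labels(tree, slack=2):
    """Return {name: (order, size)} for a tree given as (name, [children]).

    Assumes node names are unique. Each node's interval [order, order + size]
    encloses the intervals of all its children plus `slack` spare values.
    """
    labels = {}

    def visit(node, order):
        name, children = node
        next_free = order + 1                 # first value available to a child
        for child in children:
            child_size = visit(child, next_free)
            next_free += child_size + 1       # step strictly past the sibling's interval
        size = (next_free - 1 - order) + slack
        labels[name] = (order, size)
        return size

    visit(tree, 1)
    return labels


doc = ("book", [("ch1", [("fig1", []), ("fig2", [])]), ("ch2", [])])
labels = assign_labels(doc)
```

For this small document the sketch yields book (1,14), ch1 (2,8), fig1 (3,2), fig2 (6,2), and ch2 (11,2), which satisfy the parent and sibling conditions stated on the slide.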
8
Advantages

Efficient Updates
Extra space can be reserved to accommodate future
insertions.
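A minimal Python sketch of how reserved space might absorb an insertion without renumbering. The function name and the calling convention are hypothetical; the parent and children values are taken from the (order, size) labels shown with the proposed numbering scheme, under the assumption that (11,5), (17,5), and (25,5) are the children of (10,30).

```python
def order_for_new_leaf(parent, children, after):
    """Pick an order value for a new leaf (size 0) under `parent`.

    parent:   (order, size) of the parent node.
    children: existing children as (order, size), sorted by order.
    after:    index of the sibling the new leaf should follow, or -1 to
              insert before the first child.
    Returns the smallest unused order in that gap, or None when the gap is
    empty and some renumbering would be needed.
    """
    p_order, p_size = parent
    if after < 0:
        low = p_order + 1
    else:
        prev_order, prev_size = children[after]
        low = prev_order + prev_size + 1      # strictly past the previous sibling
    if after + 1 < len(children):
        high = children[after + 1][0] - 1     # strictly before the next sibling
    else:
        high = p_order + p_size               # stay inside the parent's interval
    return low if low <= high else None


parent = (10, 30)
children = [(11, 5), (17, 5), (25, 5)]
after_second = order_for_new_leaf(parent, children, 1)
after_last = order_for_new_leaf(parent, children, 2)
after_first = order_for_new_leaf(parent, children, 0)
```

Here the gap after (17,5) yields order 23 and the gap after (25,5) yields order 31, while no value is free between (11,5) and (17,5), so that insertion returns None.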

9
Ancestor-descendant relationship

For two given nodes x and y of a tree T, x is an ancestor of y if and only if
order(x) < order(y) <= order(x) + size(x).
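The test order(x) < order(y) <= order(x) + size(x) takes constant time per pair. A small Python sketch, with nodes written as (order, size) tuples (an encoding assumed here) and values taken from the labels shown with the proposed numbering scheme:

```python
def is_ancestor(x, y):
    """x and y are (order, size) pairs; True iff x is an ancestor of y."""
    x_order, x_size = x
    y_order, _ = y
    return x_order < y_order <= x_order + x_size


root = (1, 100)
a = (10, 30)
b = (41, 10)
c = (45, 5)

checks = [
    is_ancestor(root, c),  # 1 < 45 <= 101
    is_ancestor(b, c),     # 41 < 45 <= 51
    is_ancestor(a, b),     # 41 lies outside [10, 40]
]
```

No tree traversal is needed: two integer comparisons decide each pair.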

10
Outline

Numbering scheme
Index structure
Join algorithms
Experimental results

11
Index and Data Organization
[Figure: XISS architecture. Labels: Query Processor (Query, Result), Element Index, Attribute Index, Structure Index, Name Index, Value Table, XML Raw Data, Document Loader, Paged File]
12
Element Index
[Figure: element index. Labels: B-tree on element nid; document ID list; B-tree; element list with the same name in the same document; element record: <Order, Size>, Depth, Parent ID]
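The lookup path suggested by this slide (element name id, then document id, then the list of element records) can be mimicked in memory. In the Python sketch below, nested dictionaries stand in for the B-trees, and the record fields follow the slide's labels; the class and function names are assumptions, and nothing here reflects the on-disk layout of XISS.

```python
from collections import defaultdict
from typing import NamedTuple


class ElementRecord(NamedTuple):
    order: int
    size: int
    depth: int
    parent_id: int  # the slide's "Parent ID"; its exact meaning is assumed


def new_element_index():
    """nid -> did -> list of ElementRecord, kept sorted by order."""
    return defaultdict(lambda: defaultdict(list))


def add_element(index, nid, did, record):
    records = index[nid][did]
    records.append(record)
    records.sort()  # NamedTuples compare field by field, so order comes first


def lookup(index, nid, did):
    """All elements with name id `nid` in document `did`, sorted by order."""
    return index.get(nid, {}).get(did, [])


index = new_element_index()
add_element(index, nid=7, did=1, record=ElementRecord(17, 5, 2, 10))
add_element(index, nid=7, did=1, record=ElementRecord(11, 5, 2, 10))
add_element(index, nid=7, did=2, record=ElementRecord(3, 0, 1, 1))
hits = lookup(index, 7, 1)
```

Keeping each per-document list sorted by order is what lets the join algorithms merge two such lists in a single pass.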
13
Structure Index
[Figure: structure index. Labels: B-tree on document ID (did); array of all elements and attributes in the same document; each entry: nid, <order, size>, Parent order, Child order, Sibling order, Attribute order]
14
Outline

Numbering scheme
Index structure
Join algorithms
Experimental results

15
Regular Path expression

Complex regular path expressions,
e.g., /chapter/_*/figure[@caption="Tree Frogs"]

Symbol   Function of symbol
_        Any single node
|        Union of nodes
*        Zero or more occurrences of a node
@        Denotes attributes
16
Regular expression Decomposition

A regular path expression can be decomposed into a combination of the following basic subexpressions:
A subexpression with a single element or a single attribute,
A subexpression with an element and an attribute (e.g., figure[@caption="Tree Frogs"]),
A subexpression with two elements (e.g., chapter/figure or chapter/_*/figure),
A subexpression with a Kleene closure (+, *) of another subexpression, and
A subexpression that is a union of two other subexpressions.

17
Example

(E1/E2)*/E3/((E4[@A=v]) | (E5/_*/E6))

[Figure: decomposition of the expression into a tree of joins. Operands: E1, E2, E3, E4, @A=v, E5, E6. Operators: /, /_*/, *, |. Joins: EE-Join (four), EA-Join, KC-Join, Union]
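The decomposition can be written down as a nested operator tree. The Python sketch below encodes one plausible reading of the figure for (E1/E2)*/E3/((E4[@A=v]) | (E5/_*/E6)); the tuple encoding, the left-to-right grouping of the top-level path, and the helper name are assumptions.

```python
from collections import Counter

# Each inner node is (operator, operand, ...); leaves are element or attribute names.
plan = (
    "EE-Join",
    ("EE-Join",
        ("KC-Join", ("EE-Join", "E1", "E2")),   # (E1/E2)*
        "E3"),
    ("Union",
        ("EA-Join", "E4", "@A=v"),              # E4[@A=v]
        ("EE-Join", "E5", "E6")),               # E5/_*/E6
)


def count_operators(node, counts=None):
    """Count how often each join operator occurs in the plan tree."""
    if counts is None:
        counts = Counter()
    if isinstance(node, tuple):
        counts[node[0]] += 1
        for child in node[1:]:
            count_operators(child, counts)
    return counts


operator_counts = count_operators(plan)
```

Under this reading the plan uses four EE-Joins, one EA-Join, one KC-Join, and one Union, matching the operator labels in the figure.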
18
Join algorithms

Element-Attribute join (EA-Join)
Element-Element join (EE-Join)
Kleene-Closure join (KC-Join)

19
EA-Join Algorithm

Input:
{E1, ..., Em}: each Ei is a set of elements having a common document identifier.
{A1, ..., An}: each Aj is a set of attributes having a common document identifier.
Output:
A set of (e, a) pairs such that the element e is the parent of the attribute a.

// Sort-merge {Ei} and {Aj} by document identifier.
For each Ei and Aj with the same did do
    // Sort-merge Ei and Aj by PARENT-CHILD relationship.
    For each e in Ei and a in Aj do
        If (e is a parent of a) then output (e, a)
    End
End
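A minimal Python sketch of an element-attribute merge join in the spirit of the pseudocode above. It assumes each attribute record carries the order of its parent element (the Structure Index slide lists a "Parent order" field); the record types and function name are hypothetical, and a single merge over (did, order) keys replaces the two nested sort-merge steps.

```python
from typing import NamedTuple


class Element(NamedTuple):
    did: int
    order: int
    size: int


class Attribute(NamedTuple):
    did: int
    order: int
    parent_order: int  # assumption: order of the owning element


def ea_join(elements, attributes):
    """Return (element, attribute) pairs where the element is the attribute's parent."""
    elems = sorted(elements, key=lambda e: (e.did, e.order))
    attrs = sorted(attributes, key=lambda a: (a.did, a.parent_order))
    out = []
    i = j = 0
    while i < len(elems) and j < len(attrs):
        e, a = elems[i], attrs[j]
        e_key = (e.did, e.order)
        a_key = (a.did, a.parent_order)
        if e_key < a_key:
            i += 1                 # this element owns no remaining attribute
        elif e_key > a_key:
            j += 1                 # this attribute's parent is not in the input
        else:
            out.append((e, a))
            j += 1                 # the same element may own further attributes
    return out


elements = [Element(1, 1, 3), Element(1, 5, 2), Element(2, 1, 3)]
attributes = [Attribute(1, 2, 1), Attribute(1, 6, 5), Attribute(2, 4, 9)]
pairs = ea_join(elements, attributes)
```

Both inputs are scanned once after sorting; the attribute in document 2 is dropped because no element with order 9 appears in the input.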

20
Example
[Figure: example document tree. Labels: book; chapter (three nodes); appendix; Figure (three nodes)]
21
Attribute-element position
chapter <1,3>
chapter <1,3>
chapter <2,1>
chapter <3,1>
name <4,0>
name <2,0>
name <4,0>
name <3,0>
22
EE-Join Algorithm

Input:
{E1, ..., Em} and {F1, ..., Fn}: each Ei and each Fj is a set of elements having a common document identifier.
Output:
A set of (e, f) pairs such that the element e is an ancestor of the element f.

// Sort-merge {Ei} and {Fj} by document identifier.
For each Ei and Fj with the same did do
    // Sort-merge Ei and Fj by ANCESTOR-DESCENDANT relationship.
    For each e in Ei and f in Fj do
        If (e is an ancestor of f) then output (e, f)
    End
End
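A Python sketch of an element-element join that uses the test order(e) < order(f) <= order(e) + size(e). The record type and function name are hypothetical, and grouping by dictionary stands in for the outer sort-merge on document identifier; this is one way to realize the pseudocode, not necessarily the presenters' implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Element(NamedTuple):
    did: int
    order: int
    size: int


def ee_join(ancestors, descendants):
    """Return (e, f) pairs where e is an ancestor of f in the same document."""
    e_by_doc = defaultdict(list)
    f_by_doc = defaultdict(list)
    for e in ancestors:
        e_by_doc[e.did].append(e)
    for f in descendants:
        f_by_doc[f.did].append(f)

    out = []
    for did in sorted(e_by_doc.keys() & f_by_doc.keys()):
        es = sorted(e_by_doc[did], key=lambda e: e.order)
        fs = sorted(f_by_doc[did], key=lambda f: f.order)
        start = 0
        for e in es:
            # Candidates at or before e can never match e or any later ancestor.
            while start < len(fs) and fs[start].order <= e.order:
                start += 1
            # Descendants of e form a contiguous run inside e's interval.
            k = start
            while k < len(fs) and fs[k].order <= e.order + e.size:
                out.append((e, fs[k]))
                k += 1
    return out


ancestors = [Element(1, 1, 100), Element(1, 10, 30), Element(1, 41, 10), Element(2, 1, 5)]
descendants = [Element(1, 11, 5), Element(1, 45, 5), Element(2, 9, 0)]
pairs = ee_join(ancestors, descendants)
```

The inner scan restarts from start rather than from where the previous ancestor stopped, because nested ancestors share descendants and the same candidates may need to be visited again.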

23
Extreme case of EE-Join
chapter <1,90>
chapter <2,80>
chapter <8,20>
chapter <9,10>
figure <19,0>
figure <10,0>
figure <11,0>
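Reading the labels on this slide as <order, size> pairs, a brute-force check in Python shows why the case is extreme: the chapters are nested inside one another and every figure falls within every chapter's interval. The list names are chosen for illustration.

```python
chapters = [(1, 90), (2, 80), (8, 20), (9, 10)]
figures = [(19, 0), (10, 0), (11, 0)]

# Apply order(c) < order(f) <= order(c) + size(c) to every combination.
pairs = [
    (c, f)
    for c in chapters
    for f in figures
    if c[0] < f[0] <= c[0] + c[1]
]
pair_count = len(pairs)
```

All 4 x 3 = 12 combinations qualify, so the join output is as large as the cross product of the two input lists.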
24
KC-Join Algorithm