Indexing and Querying XML Data for Regular Path Expressions



1
Indexing and Querying XML Data for Regular Path
Expressions
  • Quanzhong Li and Bongki Moon
  • Dept. of Computer Science
  • University of Arizona
  • VLDB 2001.

2
Querying XML
  • XML has a tree-structured data model.
  • Queries navigate the data using regular path
    expressions (e.g., XPath), such as
    /chapter/_*/figure[@caption = "Tree Frogs"]
  • Evaluating such queries requires
    • access to all elements with the same name string, and
    • the ancestor-descendant relationship between
      elements.

3
Contribution
  • A new system for indexing XML data.
  • Querying XML data based on a numbering scheme for
    elements.
  • Join algorithms for processing complex regular
    path expressions.

4
Outline
  • Numbering scheme
  • Index structure
  • Join algorithms
  • Experimental results

5
Path expression evaluation
  • Previous approaches
    • Conventional tree traversals
    • Disadvantage: overhead of traversal for long or
      unknown path lengths.
  • New approach
    • Indexing for efficient element access.
    • A numbering scheme for the ancestor-descendant
      relationship.

6
Dietz's Numbering Scheme
  • For two given nodes x and y of a tree T, x is an
    ancestor of y if and only if
    • x occurs before y in the preorder traversal of T,
      and
    • x occurs after y in the postorder traversal of T.

[Figure: example tree with each node labeled (preorder, postorder): (1,7), (2,4), (3,1), (4,2), (5,3), (6,6), (7,5)]
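To make the preorder/postorder test concrete, here is a minimal Python sketch. It numbers a tree given as nested `(name, [children])` tuples (a representation chosen only for illustration, with node names assumed unique) and applies Dietz's test. The sample tree `a` ... `g` is a hypothetical tree consistent with the labels in the figure.

```python
def number_dietz(tree):
    """Return {name: (preorder rank, postorder rank)}, both 1-based, for a tree
    given as nested (name, [children]) tuples; names are assumed unique."""
    ranks = {}
    pre = post = 0

    def visit(node):
        nonlocal pre, post
        name, children = node
        pre += 1
        my_pre = pre              # preorder: number the node before its children
        for child in children:
            visit(child)
        post += 1                 # postorder: number the node after its children
        ranks[name] = (my_pre, post)

    visit(tree)
    return ranks


def is_ancestor(x, y):
    """Dietz: x is an ancestor of y iff x precedes y in preorder and follows y in postorder."""
    return x[0] < y[0] and x[1] > y[1]


# Hypothetical tree matching the figure: a -> (b -> c, d, e), (f -> g).
tree = ("a", [("b", [("c", []), ("d", []), ("e", [])]), ("f", [("g", [])])])
ranks = number_dietz(tree)
print(ranks["b"], ranks["g"])                                                    # (2, 4) (7, 5)
print(is_ancestor(ranks["b"], ranks["d"]), is_ancestor(ranks["b"], ranks["g"]))  # True False
```

Both ranks come from a single depth-first traversal, so numbering is linear in the number of nodes and each ancestor check takes constant time.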
7
Proposed numbering scheme
  • This scheme associates with each node a pair of
    numbers <order, size> as follows:
    • For a tree node y and its parent x,
      • order(x) < order(y)
      • order(y) + size(y) ≤ order(x) + size(x)
    • For two sibling nodes x and y, if x is the
      predecessor of y in preorder traversal, then
      • order(x) + size(x) < order(y)

[Figure: example tree labeled with <order, size> pairs. The intervals imply a root (1,100) with children (10,30) and (41,10); (10,30) has children (11,5), (17,5), and (25,5), and (41,10) has child (45,5).]
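One way to produce labels that satisfy these constraints is sketched below in Python. The allocation policy is an assumption for illustration and not necessarily how XISS assigns labels: numbers are handed out in preorder, and a hypothetical `spare` count of unused numbers is reserved at the end of every node's range. Each subtree then occupies the contiguous range [order, order + size].

```python
def number_with_gaps(tree, spare=2):
    """Assign <order, size> pairs in preorder to a tree of (name, [children]) tuples,
    reserving `spare` unused numbers at the end of each node's range (hypothetical policy)."""
    labels = {}
    next_order = 1

    def visit(node):
        nonlocal next_order
        name, children = node
        order = next_order
        next_order += 1
        for child in children:
            visit(child)
        next_order += spare                            # room for future insertions under this node
        labels[name] = (order, next_order - 1 - order)  # range ends at the last number used

    visit(tree)
    return labels


tree = ("a", [("b", [("c", []), ("d", []), ("e", [])]), ("f", [("g", [])])])
labels = number_with_gaps(tree)
print(labels["b"], labels["c"])   # (2, 11) (3, 2)
```

With `spare=0`, order is simply the preorder rank and size is the number of descendants. With spare numbers reserved, a new child of `c` could take order 4 inside the range of `c` without relabeling any other node, which is the kind of update the reserved space is meant to absorb.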
8
Advantages
  • Efficient Updates
  • Extra space can be reserved to accommodate future
    insertions.

9
Ancestor-descendant relationship
  • For two given nodes x and y of a tree T, x is an
    ancestor of y if and only if
    • order(x) < order(y) ≤ order(x) + size(x).
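The test needs only the two labels, with no traversal. A minimal Python sketch, assuming an inclusive upper bound (so a node labeled <9,10> is an ancestor of one labeled <19,0>), checked against labels from the example tree:

```python
def is_ancestor(x, y):
    """x and y are (order, size) pairs; x is an ancestor of y
    iff order(x) < order(y) <= order(x) + size(x)."""
    order_x, size_x = x
    order_y, _ = y
    return order_x < order_y <= order_x + size_x


print(is_ancestor((1, 100), (45, 5)))   # True: the root contains everything
print(is_ancestor((10, 30), (45, 5)))   # False: 45 > 10 + 30
print(is_ancestor((10, 30), (25, 5)))   # True
print(is_ancestor((11, 5), (17, 5)))    # False: siblings
```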

10
Outline
  • Numbering scheme
  • Index structure
  • Join algorithms
  • Experimental results

11
Index and Data Organization
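[Figure: XISS index and data organization. Components shown: a Query Processor (Query in, Result out); the XISS indexes (Element Index, Attribute Index, Structure Index, Name Index, Value Table); a Document Loader with the XML Raw Data; and a Paged File.]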
12
Element Index
[Figure: element index. B-trees keyed by element nid lead to document ID lists; each entry points to the element list with the same name in the same document. Each element record holds <order, size>, depth, and parent ID.]
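A minimal in-memory sketch of this layout in Python, assuming plain dicts in place of the B-trees and integer name identifiers. The class and method names (`ElementRecord`, `ElementIndex`, `insert`, `documents`, `elements`) are hypothetical. Records are kept sorted by order so that a join can scan each per-document list in document order without re-sorting.

```python
from bisect import insort
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ElementRecord:
    order: int       # compared first, so records sort by order
    size: int
    depth: int
    parent_id: int   # assumed encoding: the parent's order number


class ElementIndex:
    """nid -> did -> element records sorted by order (dicts stand in for B-trees)."""

    def __init__(self) -> None:
        self._by_nid: dict[int, dict[int, list[ElementRecord]]] = {}

    def insert(self, nid: int, did: int, rec: ElementRecord) -> None:
        per_doc = self._by_nid.setdefault(nid, {})
        insort(per_doc.setdefault(did, []), rec)   # keep the list in document order

    def documents(self, nid: int) -> list[int]:
        """Document ID list for one element name."""
        return sorted(self._by_nid.get(nid, {}))

    def elements(self, nid: int, did: int) -> list[ElementRecord]:
        """All elements with the same name in the same document."""
        return list(self._by_nid.get(nid, {}).get(did, []))


idx = ElementIndex()
FIGURE = 7                                        # hypothetical nid for "figure"
idx.insert(FIGURE, 2, ElementRecord(25, 5, 3, 10))
idx.insert(FIGURE, 2, ElementRecord(11, 5, 3, 10))
idx.insert(FIGURE, 1, ElementRecord(4, 0, 2, 1))
print(idx.documents(FIGURE))                       # [1, 2]
print([r.order for r in idx.elements(FIGURE, 2)])  # [11, 25]
```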
13
Structure Index
[Figure: structure index. A B-tree keyed by document ID (did) points to an array of all elements and attributes in the same document; each entry holds nid, <order, size>, parent order, child order, sibling order, and attribute order.]
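The per-document entries can be sketched the same way. The slide lists parent, child, sibling, and attribute order fields without defining them; this Python sketch assumes that "child order" is the first child and "sibling order" is the next sibling, which lets the children of a node be enumerated by following links. Dicts stand in for the B-tree and for the array, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StructEntry:
    nid: int                          # name identifier of the element or attribute
    order: int
    size: int
    parent: Optional[int] = None      # parent order (None for the root)
    child: Optional[int] = None       # assumed: order of the first child
    sibling: Optional[int] = None     # assumed: order of the next sibling
    attribute: Optional[int] = None   # assumed: order of the first attribute


class StructureIndex:
    """did -> {order: entry}; dicts stand in for the B-tree and the per-document array."""

    def __init__(self) -> None:
        self._docs: dict[int, dict[int, StructEntry]] = {}

    def add(self, did: int, entry: StructEntry) -> None:
        self._docs.setdefault(did, {})[entry.order] = entry

    def children(self, did: int, order: int) -> list[int]:
        """Orders of the children of one node, following child and sibling links."""
        doc = self._docs[did]
        out = []
        nxt = doc[order].child
        while nxt is not None:
            out.append(nxt)
            nxt = doc[nxt].sibling
        return out


idx = StructureIndex()
idx.add(1, StructEntry(nid=0, order=1, size=100, child=10))      # hypothetical nids
idx.add(1, StructEntry(nid=3, order=10, size=30, parent=1, sibling=41))
idx.add(1, StructEntry(nid=4, order=41, size=10, parent=1))
print(idx.children(1, 1))   # [10, 41]
```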
14
Outline
  • Numbering scheme
  • Index structure
  • Join algorithms
  • Experimental results

15
Regular Path Expressions
  • Complex regular path expressions,
    e.g., /chapter/_*/figure[@caption = "Tree Frogs"]

Symbol   Function of symbol
_        Any single node
|        Union of nodes
*        Zero or more occurrences of a node
@        Denotes attributes
16
Regular Path Expression Decomposition
  • A regular path expression can be decomposed into a
    combination of the following basic subexpressions:
    • a subexpression with a single element or a single
      attribute,
    • a subexpression with an element and an attribute
      (e.g., figure[@caption = "Tree Frogs"]),
    • a subexpression with two elements (e.g.,
      chapter/figure or chapter/_*/figure),
    • a subexpression with a Kleene closure (+, *) of
      another subexpression, and
    • a subexpression that is a union of two other
      subexpressions.

17
Example
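  • (E1/E2)*/E3/((E4[@A = v]) | (E5/_*/E6))

[Figure: decomposition of this expression into joins. E1/E2 is an EE-Join whose result feeds a KC-Join; the closure is joined with E3 by an EE-Join (/). E4[@A = v] is an EA-Join and E5/_*/E6 is an EE-Join; their results are combined by a Union, which is joined with the E3 result by a final EE-Join (/).]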
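The decomposition can be represented as a small plan tree whose post-order gives one valid evaluation order. This Python sketch encodes the example expression and lists its operators bottom-up; the `Plan` class, the `leaf` helper, and the labels are hypothetical, and the sketch shows structure only without evaluating anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A plan node: a leaf (element or attribute set) or a join operator."""
    kind: str          # "leaf", "EE-Join", "EA-Join", "KC-Join", or "Union"
    label: str = ""    # path step for EE-Join ("/" or "/_*/"), or the leaf name
    inputs: tuple = ()


def leaf(label):
    return Plan("leaf", label)


def operators_bottom_up(node):
    """List operators in post-order: every input is produced before it is consumed."""
    ops = []
    for child in node.inputs:
        ops += operators_bottom_up(child)
    if node.kind != "leaf":
        ops.append(f"{node.kind}({node.label})" if node.label else node.kind)
    return ops


# (E1/E2)*/E3/((E4[@A = v]) | (E5/_*/E6))
e1_e2 = Plan("EE-Join", "/", (leaf("E1"), leaf("E2")))
closure = Plan("KC-Join", "", (e1_e2,))
with_e3 = Plan("EE-Join", "/", (closure, leaf("E3")))
attr = Plan("EA-Join", "", (leaf("E4"), leaf("@A=v")))
e5_e6 = Plan("EE-Join", "/_*/", (leaf("E5"), leaf("E6")))
union = Plan("Union", "", (attr, e5_e6))
plan = Plan("EE-Join", "/", (with_e3, union))
print(operators_bottom_up(plan))
# ['EE-Join(/)', 'KC-Join', 'EE-Join(/)', 'EA-Join', 'EE-Join(/_*/)', 'Union', 'EE-Join(/)']
```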
18
Join algorithms
  • Element-attribute join (EA-Join)
  • Element-element join (EE-Join)
  • Kleene closure join (KC-Join)

19
EA-Join Algorithm
  • Input
    • E1, ..., Em: each Ei is a set of elements having a
      common document identifier.
    • A1, ..., An: each Aj is a set of attributes having
      a common document identifier.
  • Output
    • A set of (e, a) pairs such that the element e is
      the parent of the attribute a.

    // Sort-merge {Ei} and {Aj} by document identifier.
    For each Ei and Aj with the same did do
        // Sort-merge Ei and Aj by parent-child relationship.
        For each e in Ei and a in Aj do
            If (e is the parent of a) then output (e, a)
        End
    End
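A runnable Python sketch of EA-Join follows. It assumes each attribute record carries the order of its owning element (a hypothetical `parent_order` field, in the spirit of the parent order stored in the structure index), so "e is the parent of a" becomes an equality test. Sorting both inputs on (did, order) lets one merge pass cover both levels of the sort-merge above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    did: int     # document identifier
    order: int
    size: int


@dataclass(frozen=True)
class Attribute:
    did: int
    order: int
    parent_order: int   # hypothetical: order of the element that owns this attribute


def ea_join(elements, attributes):
    """Return (e, a) pairs where element e is the parent of attribute a."""
    es = sorted(elements, key=lambda e: (e.did, e.order))
    attrs = sorted(attributes, key=lambda a: (a.did, a.parent_order))
    out = []
    i = j = 0
    while i < len(es) and j < len(attrs):
        e_key = (es[i].did, es[i].order)
        a_key = (attrs[j].did, attrs[j].parent_order)
        if e_key < a_key:
            i += 1
        elif e_key > a_key:
            j += 1
        else:
            # One element may own several attributes: emit every match.
            k = j
            while k < len(attrs) and (attrs[k].did, attrs[k].parent_order) == e_key:
                out.append((es[i], attrs[k]))
                k += 1
            i += 1
    return out


elements = [Element(1, 10, 30), Element(1, 41, 10), Element(2, 10, 5)]
attributes = [Attribute(1, 11, 10), Attribute(1, 12, 10), Attribute(1, 42, 41),
              Attribute(2, 11, 10), Attribute(2, 50, 40)]
pairs = ea_join(elements, attributes)
print([(e.did, e.order, a.order) for e, a in pairs])
# [(1, 10, 11), (1, 10, 12), (1, 41, 42), (2, 10, 11)]
```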

20
Example
[Figure: example document tree with a book element, three chapter elements, an appendix element, and three figure elements]
21
Attribute-element position
[Figure: <order, size> positions of chapter elements and name attributes: chapter <1,3>, chapter <1,3>, chapter <2,1>, chapter <3,1>; name <4,0>, name <2,0>, name <4,0>, name <3,0>]
22
EE-Join Algorithm
  • Input
    • E1, ..., Em and F1, ..., Fn: each Ei and each Fj is
      a set of elements having a common document
      identifier.
  • Output
    • A set of (e, f) pairs such that the element e is
      an ancestor of the element f.

    // Sort-merge {Ei} and {Fj} by document identifier.
    For each Ei and Fj with the same did do
        // Sort-merge Ei and Fj by ancestor-descendant relationship.
        For each e in Ei and f in Fj do
            If (e is an ancestor of f) then output (e, f)
        End
    End
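A simple Python sketch of EE-Join is below. It sorts each document's F elements by order and, for each e, uses binary search to find every f with order(e) < order(f) ≤ order(e) + size(e). This is a simplification built on Python's `bisect` module rather than the paper's exact merge procedure, and it assumes an inclusive upper bound in the ancestor test.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    did: int
    order: int
    size: int


def ee_join(es, fs):
    """Return (e, f) pairs where e is an ancestor of f in the same document."""
    by_doc = defaultdict(list)                 # did -> F elements sorted by order
    for f in sorted(fs, key=lambda f: (f.did, f.order)):
        by_doc[f.did].append(f)
    orders = {did: [f.order for f in group] for did, group in by_doc.items()}

    out = []
    for e in sorted(es, key=lambda e: (e.did, e.order)):
        if e.did not in by_doc:
            continue
        keys = orders[e.did]
        lo = bisect_right(keys, e.order)           # first f with order(f) > order(e)
        hi = bisect_right(keys, e.order + e.size)  # first f with order(f) > order(e) + size(e)
        out.extend((e, f) for f in by_doc[e.did][lo:hi])
    return out


# The "extreme case": four nested chapters and three figures in one document.
chapters = [Element(1, 1, 90), Element(1, 2, 80), Element(1, 8, 20), Element(1, 9, 10)]
figures = [Element(1, 19, 0), Element(1, 10, 0), Element(1, 11, 0)]
pairs = ee_join(chapters, figures)
print(len(pairs))   # 12
```

On the nested chapters and figures from the extreme case, every chapter is an ancestor of every figure, so the output has |E| × |F| = 12 pairs. No join strategy can avoid that cost when the result itself is that large.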

23
Extreme case of EE-Join
[Figure: nested chapter elements <1,90>, <2,80>, <8,20>, <9,10> and figure elements <19,0>, <10,0>, <11,0>]
24
KC-Join Algorithm
  • Input
    • E1, ..., Em, where each Ei is a group of elements
      from an XML document.
  • Output
    • The Kleene closure of E1, ..., Em.

    // Apply the EE-Join algorithm repeatedly.
    Set i ← 1
    Set Ki ← {E1, ..., Em}
    Repeat
        Set i ← i + 1
        Set Ki ← EE-Join(Ki-1, K1)
    Until (Ki is empty)
    Output the union of K1, K2, ..., Ki-1.
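The loop above can be sketched in Python as follows. To keep the closure meaningful, this sketch assumes the repeated step is a parent-child variant of EE-Join (the ancestor test plus a depth difference of one, using the depth stored in element records). The names `child_join` and `kc_join` are hypothetical, and the nested-loop join is chosen for brevity, not speed. Paths are represented as (start, end) pairs, so K1 holds (e, e) for every input element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    did: int
    order: int
    size: int
    depth: int


def child_join(paths, elements):
    """Parent-child variant of EE-Join: extend each (start, end) path by an
    element f whose parent is `end`."""
    return {
        (start, f)
        for (start, end) in paths
        for f in elements
        if end.did == f.did
        and end.order < f.order <= end.order + end.size   # end is an ancestor of f
        and f.depth == end.depth + 1                      # ... and its parent
    }


def kc_join(elements):
    """Kleene closure (one or more steps), following the loop above."""
    k = {(e, e) for e in elements}    # K1
    result = set(k)
    while True:
        k = child_join(k, elements)   # Ki = EE-Join(Ki-1, K1)
        if not k:                     # until Ki is empty
            return result             # union of K1 .. Ki-1
        result |= k


# Hypothetical depths for four nested chapters.
chapters = [Element(1, 1, 90, 1), Element(1, 2, 80, 2),
            Element(1, 8, 20, 3), Element(1, 9, 10, 4)]
pairs = kc_join(chapters)
print(len(pairs))   # 10
```

The loop terminates because each iteration extends paths by one level, and no path can be longer than the document depth.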

25
Outline
  • Numbering scheme
  • Index structure
  • Join algorithms
  • Experimental results

26
Experimental Results
  • Comparison with top-down and bottom-up evaluation
    methods.
  • Comparison for
    • EE-Join (E1/_*/E2)
    • EA-Join (E@A)
  • Scalability test.

27
EE-Join performance
28
EA-Join performance
29
Results
  • The EE-Join algorithm outperformed bottom-up
    evaluation.
  • The EA-Join algorithm is comparable with top-down
    evaluation and outperformed bottom-up evaluation.
  • Both algorithms scale linearly.