Indexing Methods for Efficient XML Query Processing

1
Indexing Methods for Efficient XML Query Processing

Jun-Ki Min
KAIST
http://islab.kaist.ac.kr/jkmin/

2
XML

eXtensible Markup Language
The de facto standard for data representation and exchange on the Web
XML Data
An instance of semistructured data
Self-describing
Irregularly structured

3
XML Data

Comprises hierarchically nested collections of elements
An element can contain
An atomic data value
A sequence of subelements
Attributes composed of name-value pairs
ID-IDREF relationships
Tree or graph representation

4
XML Example
<libraryDB>
  <book editor="1">
    <title> title1 </title>
    <author> author1 </author>
    <chapter> </chapter>
  </book>
  <paper>
    <title> title2 </title>
    <author id="1"> author2 </author>
    <author> author3 </author>
    <section> </section>
  </paper>
</libraryDB>
[Figure labels: ToXin, Index Fabric, APEX]

5
XML Query

XML Query Languages
XSLT, XML-QL, XPath, XQuery
Use path expressions to traverse the irregularly structured data
ex) /libraryDB/book/title or //title
Searching the whole XML data => inefficiency
Structural Summary, Path Index
Improve efficiency by restricting the search to only the relevant portions of the XML data
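
The two example path expressions can be tried out on the library document from the XML Example slide. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper names absolute_path and descendant_path are illustrative and not part of the talk. Without an index, both helpers walk the document itself, which is the inefficiency the slide refers to.

```python
import xml.etree.ElementTree as ET

XML = """<libraryDB>
  <book editor="1">
    <title>title1</title>
    <author>author1</author>
    <chapter/>
  </book>
  <paper>
    <title>title2</title>
    <author id="1">author2</author>
    <author>author3</author>
    <section/>
  </paper>
</libraryDB>"""

root = ET.fromstring(XML)


def absolute_path(root, path):
    """Evaluate a simple absolute path such as /libraryDB/book/title."""
    labels = path.strip("/").split("/")
    if root.tag != labels[0]:
        return []
    frontier = [root]
    for label in labels[1:]:
        # Follow only the children whose tag matches the next label.
        frontier = [child for node in frontier for child in node if child.tag == label]
    return frontier


def descendant_path(root, label):
    """Evaluate //label by visiting every element of the document."""
    return list(root.iter(label))


full_match = [n.text for n in absolute_path(root, "/libraryDB/book/title")]
partial_match = [n.text for n in descendant_path(root, "title")]
print(full_match, partial_match)
```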

6
Schemas for XML

DTD, XML Schema
Specify the constraints of XML data
<!ELEMENT book (title, author, chapter)>
Schemas are not mandatory
=> lack of an external schema
Structural Summary
Summary of label paths
Path Index
Structural Summary + Extents
7
Schemas for XML

Applications
User Interface
XML Data Design, Editing
Query Formulation
Query Validation
Query Optimization
Path Index

8
Structural Summary

DTD Extraction
XTRACT
based on element information
Structural Summary
Representative Objects
based on path information

9
XTRACT

Garofalakis, Gionis, Rastogi, Seshadri, Shim, SIGMOD 00
Infers a concise and accurate DTD
Chooses a DTD from candidate DTDs
(a b), (b a) => (a|b)* or (a b)|(b a)
Based on the Minimum Description Length (MDL) principle
Ranks each candidate DTD by the number of bits required to describe the subelement sequences in terms of that candidate DTD
(a|b)*: 6 (for DTD) + 3 + 3 = 12
(a b)|(b a): 9 (for DTD) + 1 + 1 = 11
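
The MDL ranking can be illustrated with a toy cost model. This sketch is not XTRACT's actual bit-level encoding. It assumes one cost unit per symbol of the DTD expression, one unit per choice made inside a repetition, one unit for the repetition count, and one unit for picking an alternative. All function names are hypothetical.

```python
def dtd_cost(expr):
    """Assumed cost of writing down the DTD: one unit per non-blank symbol."""
    return len(expr.replace(" ", ""))


def cost_star_of_choices(seq):
    """Cost of describing seq with (a|b)*: repetition count plus one choice per symbol."""
    return 1 + len(seq)


def cost_alternation(seq):
    """Cost of describing seq with (a b)|(b a): one unit to pick the alternative."""
    return 1


sequences = [("a", "b"), ("b", "a")]
candidates = {
    "(a|b)*": cost_star_of_choices,
    "(a b)|(b a)": cost_alternation,
}

# Total description length = cost of the DTD + cost of every sequence given the DTD.
totals = {
    expr: dtd_cost(expr) + sum(encode(seq) for seq in sequences)
    for expr, encode in candidates.items()
}
best = min(totals, key=totals.get)
print(totals, best)
```

Under these assumptions the more specific candidate wins, even though its DTD is longer to write down.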

10
Representative Objects(RO)

Nestorov, Ullman, Wiener, Chawathe, ICDE 97
Provide a concise representation of the inherent schema of semistructured hierarchical data
Full-RO
Describes all simple paths
K-RO
A K-RO guarantees that its paths of length k+1 exist in the data
1-RO
Simplest, very compact representation

11
Representative Objects(RO)
12
Path Index

Access Support Relations
Deterministic: Strong DataGuide, Index Fabric, ToXin, APEX
Non-Deterministic: 1-Index, A(k)-Index, FB Index

13
Access Support Relations

Kemper, Moerkotte IS 92
Originated from OODBMS
select Name
from Mercedes.Manufactures.Composition.Division
Supports joins along arbitrary reference chains
Generalization of the Join Index [Valduriez 87]
Based on the paths in the schema
Materializes access paths of arbitrary length
Supports only predefined subsets of paths

14
DataGuides

Goldman, Widom, VLDB 97
An implementation of Full-RO
Summary of label paths from the root (= simple paths)
Concise: describes every unique simple path exactly once, regardless of the number of times it appears
Accuracy: does not contain label paths that do not appear in the data
Convenience: can be stored and accessed using techniques similar to those available for processing semistructured data

15
DataGuides

The construction algorithm emulates the conversion of a non-deterministic finite automaton (NFA) to a deterministic finite automaton (DFA)
Intuitively, a simple path is represented as a node in the DataGuide
A single XML data set may have multiple DataGuides

16
Strong DataGuide

If the sets of nodes reachable by two simple paths are equal, the two simple paths are represented by a single node
Linear time and linear space for tree-structured data
Exponential time and exponential space (worst case) for graph-structured data
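
The NFA-to-DFA analogy can be sketched as a subset construction in which every DataGuide node is the set of data nodes reached by one label path (its target set). The sketch below is a simplified illustration, not the authors' implementation. The edge list is the tree of the library example with illustrative node ids, and strong_dataguide and lookup are hypothetical helper names.

```python
from collections import deque

# Tree edges of the library example: node -> [(label, child)].
EDGES = {
    0: [("libraryDB", 1)],
    1: [("book", 2), ("paper", 6)],
    2: [("title", 3), ("author", 4), ("chapter", 5)],
    6: [("title", 7), ("author", 8), ("author", 9), ("section", 10)],
}


def strong_dataguide(edges, root):
    """Subset construction: one guide node per distinct target set."""
    start = frozenset([root])
    guide = {}  # target set -> {label: target set}
    queue = deque([start])
    while queue:
        targets = queue.popleft()
        if targets in guide:
            continue  # equal target sets share a single guide node
        by_label = {}
        for node in targets:
            for label, child in edges.get(node, []):
                by_label.setdefault(label, set()).add(child)
        guide[targets] = {label: frozenset(kids) for label, kids in by_label.items()}
        queue.extend(guide[targets].values())
    return start, guide


def lookup(start, guide, path):
    """Follow a label path in the guide; returns the target set (possibly empty)."""
    targets = start
    for label in path:
        targets = guide[targets].get(label)
        if targets is None:
            return frozenset()
    return targets


start, guide = strong_dataguide(EDGES, 0)
paper_authors = lookup(start, guide, ["libraryDB", "paper", "author"])
print(sorted(paper_authors), len(guide))
```

On a tree each guide node corresponds to exactly one label path, so the guide stays linear in size. On a graph the number of distinct target sets can grow exponentially.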

17
1/2/T-Index

Milo and Suciu, ICDT 99
1-Index
Summarizes all label paths starting from the root
Supports queries of the form q = P x where P = /l1/l2/.../ln
Non-deterministic
Based on backward bisimulation, which originated in graph verification
Extents are disjoint
More compact than Strong DataGuides

18
1-Index

Equivalence relation ($\equiv$)
$v \equiv u$ iff $L_v = L_u$
where $L_x = \{\, w \mid w \text{ is a simple path from the root to } x \,\}$
1-Index: the collection of all equivalence classes
Exponential construction cost
Backward bisimulation ($\approx_b$)
If $x \approx_b y$ and $x$ is the root, then $y$ is the root
Conversely, if $x \approx_b y$ and $y$ is the root, then $x$ is the root
If $x \approx_b y$ and $(x', l, x)$ is an edge, then there exists an edge $(y', l, y)$ such that $x' \approx_b y'$
Conversely, if $x \approx_b y$ and $(y', l, y)$ is an edge, then there exists an edge $(x', l, x)$ such that $x' \approx_b y'$

19
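$\equiv$ vs $\approx_b$

[Figure: example graph with edges labelled a, b, c, d and two nodes X and Y]

$X \equiv Y$ since $L_X = L_Y = \{a.b.d,\ a.c.d\}$
$X \not\approx_b Y$
$v \approx_b u \Rightarrow v \equiv u$
$O(m \log m)$ construction cost [Paige and Tarjan 87]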
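
The difference between the two relations can be checked on a small graph. The sketch below uses a naive partition refinement, not the O(m log m) algorithm of Paige and Tarjan, and the graph is a hypothetical one chosen so that X and Y have the same label paths a.b.d and a.c.d without being backward bisimilar. It is not the graph from the slide's figure.

```python
def backward_bisimulation(nodes, edges, root):
    """Naive refinement. edges: list of (parent, label, child). Returns node -> block id."""
    block = {n: (0 if n == root else 1) for n in nodes}
    while True:
        # A node's signature: its current block plus the (label, block) of each incoming edge.
        signature = {
            n: (block[n], frozenset((l, block[p]) for p, l, c in edges if c == n))
            for n in nodes
        }
        ids = {}
        refined = {n: ids.setdefault(signature[n], len(ids)) for n in nodes}
        if len(ids) == len(set(block.values())):
            return refined  # no block was split, so the partition is stable
        block = refined


def label_paths(edges, root, node):
    """L_node: all label paths from the root to node (the graph must be acyclic)."""
    if node == root:
        return {()}
    return {
        path + (l,)
        for p, l, c in edges if c == node
        for path in label_paths(edges, root, p)
    }


X, Y = 6, 7
NODES = list(range(8))
EDGES = [
    (0, "a", 1), (1, "b", 2), (1, "c", 3), (2, "d", X), (3, "d", X),
    (0, "a", 4), (4, "b", 5), (4, "c", 5), (5, "d", Y),
]

blocks = backward_bisimulation(NODES, EDGES, 0)
same_paths = label_paths(EDGES, 0, X) == label_paths(EDGES, 0, Y)
bisimilar = blocks[X] == blocks[Y]
print(same_paths, bisimilar)
```

X is reached through two different parents (one via b, one via c), while Y is reached through a single parent that has both a b and a c incoming edge, so the refinement separates them.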

20
1-Index vs Strong DataGuide

For tree-structured data, the Strong DataGuide and the 1-Index coincide

21
2/T-Index

2-Index
Supports queries of the form x1 P x2
ex) //title
Equivalence relation ($\equiv$)
$(v, u) \equiv (v', u')$ iff $L_{(v,u)} = L_{(v',u')}$
where $L_{(x,y)} = \{\, w \mid w \text{ is a label path from } x \text{ to } y \,\}$
Summary of path information between two arbitrary nodes
T-Index
Generalization of the 1-Index and the 2-Index
$(v_1, \ldots, v_n) \equiv (u_1, \ldots, u_n)$ iff $L_{(v_1, \ldots, v_n)} = L_{(u_1, \ldots, u_n)}$
Conceptually similar to Access Support Relations
Supports only predefined paths

22
Index Fabric

Cooper, Sample, Franklin, Hjaltason, Shadmon, VLDB 01
Tree-structured data
Conceptually similar to the Strong DataGuide
Layered structure
Uses a Patricia trie to index a large number of search keys
The simple path of an element that has a data value is encoded as a special character sequence
Keeps a key that combines the encoded sequence and the data value

23
Index Fabric
XML Data

Keeps information only for elements that have data values
Patricia trie: lossy compression
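
The key encoding can be sketched as follows. Each label is mapped to a one-character designator, and the key of an element with a data value is its encoded path followed by the value. The designator assignment and the space separator are assumptions made for illustration. A sorted list searched with bisect stands in for the layered Patricia trie, so this shows only the key scheme, not the index structure itself.

```python
import bisect
import xml.etree.ElementTree as ET

XML = """<libraryDB>
  <book editor="1"><title>title1</title><author>author1</author><chapter/></book>
  <paper><title>title2</title><author id="1">author2</author>
         <author>author3</author><section/></paper>
</libraryDB>"""


def build_keys(root):
    """Return (designators, sorted keys) for every element that has a data value."""
    designators = {}  # label -> single character (hypothetical encoding)
    keys = []

    def visit(node, prefix):
        code = designators.setdefault(node.tag, chr(ord("A") + len(designators)))
        prefix += code
        text = (node.text or "").strip()
        if text:  # elements without a data value are not indexed
            keys.append(prefix + " " + text)
        for child in node:
            visit(child, prefix)

    visit(root, "")
    keys.sort()
    return designators, keys


def search(designators, keys, path, value=""):
    """Prefix search: all keys for the label path whose value starts with `value`."""
    prefix = "".join(designators[label] for label in path) + " " + value
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_left(keys, prefix + "\uffff")
    return keys[lo:hi]


designators, keys = build_keys(ET.fromstring(XML))
book_title = search(designators, keys, ["libraryDB", "book", "title"], "title1")
paper_authors = search(designators, keys, ["libraryDB", "paper", "author"])
print(keys)
```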

24
ToXin

Rizzolo, Mendelzon, WebDB 01
Tree-structured data
Conceptually similar to the Strong DataGuide (not the minimal DataGuide)
Supports both forward and backward navigation
Path Tree (= Strong DataGuide)
A node of the Path Tree has an Index Table or Value Tables
Index Table (IT): parent-child relationships
Value Table (VT): owner-value relationships

25
ToXin
XML Data

Index Tables

Path             parent  child
LibraryDB        null    1
LibraryDB.book   1       2
LibraryDB.paper  1       6

Value Tables

LibraryDB.book.author
parent value
author1

Since ToXin keeps parent-child relationships, it supports path expressions with value predicates
ex) /libraryDB/book[author="author1"]
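
A minimal sketch of index tables and value tables, assuming nodes are numbered in document order starting at 1 (which reproduces the parent and child ids shown in the Index Tables above). The function names and the dictionary layout are illustrative, not ToXin's actual storage format. The value predicate is answered by a value-table lookup followed by one backward step through the index table.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML = """<libraryDB>
  <book editor="1"><title>title1</title><author>author1</author><chapter/></book>
  <paper><title>title2</title><author id="1">author2</author>
         <author>author3</author><section/></paper>
</libraryDB>"""


def build_toxin(root):
    index_tables = defaultdict(list)  # label path -> [(parent id, child id)]
    value_tables = defaultdict(list)  # label path -> [(owner id, value)]
    counter = 0

    def visit(node, parent_id, parent_path):
        nonlocal counter
        counter += 1
        node_id = counter
        path = node.tag if not parent_path else parent_path + "." + node.tag
        index_tables[path].append((parent_id, node_id))
        text = (node.text or "").strip()
        if text:
            value_tables[path].append((node_id, text))
        for child in node:
            visit(child, node_id, path)

    visit(root, None, "")
    return index_tables, value_tables


def books_with_author(index_tables, value_tables, name):
    """Evaluate /libraryDB/book[author="name"] using the tables only."""
    path = "libraryDB.book.author"
    owners = {owner for owner, value in value_tables[path] if value == name}
    # Backward navigation: from the matching author nodes up to their book parents.
    return [parent for parent, child in index_tables[path] if child in owners]


index_tables, value_tables = build_toxin(ET.fromstring(XML))
matching_books = books_with_author(index_tables, value_tables, "author1")
print(index_tables["libraryDB.book"], matching_books)
```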

26
A(k)-Index

Kaushik, Shenoy, Bohannon, Gudes, ICDE 02
The Strong DataGuide and the 1-Index record all simple paths
Increased index size => increased search space
Approximation of the 1-Index
Non-deterministic
Utilizes local similarity (degree k)
Reduces the size of the index graph

27
A(k)-Index

k-bisimulation ($\approx^k$)
For any two nodes $v$ and $u$, $v \approx^0 u$ iff $v$ and $u$ have the same label
$v \approx^k u$ iff $v \approx^{k-1} u$ and for every parent $v'$ of $v$ there is a parent $u'$ of $u$ such that $v' \approx^{k-1} u'$, and vice versa
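
The recursive definition maps directly onto k rounds of refinement. The following is a minimal sketch on the node-labelled tree of the library example (node ids are illustrative, and the function name k_bisimulation is hypothetical). Each round costs one pass over the edges, which is in line with the O(km) building cost.

```python
def k_bisimulation(labels, parents, k):
    """labels: node -> label; parents: node -> list of parents. Returns node -> block."""
    block = dict(labels)  # 0-bisimulation: nodes with the same label share a block
    for _ in range(k):
        # Two nodes stay together only if they were together in the previous round
        # and their parents cover the same set of previous-round blocks.
        signature = {
            n: (block[n], frozenset(block[p] for p in parents.get(n, [])))
            for n in labels
        }
        ids = {}
        block = {n: ids.setdefault(signature[n], len(ids)) for n in labels}
    return block


LABELS = {
    1: "libraryDB", 2: "book", 3: "title", 4: "author", 5: "chapter",
    6: "paper", 7: "title", 8: "author", 9: "author", 10: "section",
}
PARENTS = {2: [1], 6: [1], 3: [2], 4: [2], 5: [2], 7: [6], 8: [6], 9: [6], 10: [6]}

a0 = k_bisimulation(LABELS, PARENTS, 0)
a1 = k_bisimulation(LABELS, PARENTS, 1)
print(len(set(a0.values())), len(set(a1.values())))
```

With k = 0 the two title nodes share one index node. With k = 1 they are separated because one has a book parent and the other a paper parent, while the two authors of the paper stay together.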

28
A(k)-Index

Building cost: O(km)
In general, for the 1-Index, k < log m
Query Processing
Label path expressions of length at most k+1
precise
Label path expressions of length greater than k+1
safe, but may include false results
validation => requires a data scan

29
APEX: Adaptive Path indEx for XML Data

Chung, Min, Shim, SIGMOD 02
The Strong DataGuide and the 1-Index keep all simple paths
Users frequently use partial matching path queries
ex) //book/title
Exhaustive navigation of the index structure for partial matching path queries may degrade performance

30
APEX

Deterministic
Approximation of DataGuides
Efficient processing of partial matching path
queries
Workload-Aware
Self-tuning strategies [Chaudhuri et al. 00]
Utilizes the query workload
Builds APEX from both the XML data and frequently used paths
Sequential pattern mining [Agrawal and Srikant 95]

31
APEX
Frequently used paths: book.title

Hash tree (top level)
label      xnode  next
xroot      0
libraryDB  1
book       2
paper      3
title
author     4
chapter    5
section    6
editor     7

Hash tree (entries under title)
label      count  xnode  next
book              8
remainder         9

Extents
xnode  extent
0      <null,0>
1      <0,1>
2      <1,2>
3      <1,6>
4      <2,4>, <6,8>, <6,9>
5      <2,5>
6      <6,10>
7      <2,8>
8      <2,3>
9      <6,7>

Hash Tree
Keeps frequently used paths
Prevents exhaustive search
Graph Structure
Structural summary + extents
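
The extents above can be reproduced with a small sketch. Each edge <parent, child> is assigned to the longest frequently used path that is a suffix of its incoming label path. If no such path matches but the label ends some frequently used path, the edge goes to a "remainder" entry, and otherwise it is grouped by label alone. A plain dictionary replaces the hash tree here, and the function names are hypothetical, so this illustrates only the grouping, not APEX's actual data structure or its update algorithm.

```python
from collections import defaultdict

# (parent, label, child); node ids follow the extents listed above.
EDGES = [
    (None, "xroot", 0), (0, "libraryDB", 1), (1, "book", 2), (1, "paper", 6),
    (2, "title", 3), (2, "author", 4), (2, "chapter", 5),
    (6, "title", 7), (6, "author", 8), (6, "author", 9), (6, "section", 10),
    (2, "editor", 8),  # IDREF edge from the book to the author with id="1"
]
REQUIRED = [("book", "title")]  # frequently used paths


def classify(path, required):
    """Name of the index node for an edge whose incoming label path is `path`."""
    matches = [r for r in required if path[-len(r):] == r]
    if matches:
        return ".".join(max(matches, key=len))
    if any(r[-1] == path[-1] for r in required):
        return "remainder." + path[-1]
    return path[-1]


def build_extents(edges, required):
    path_to = {None: ()}  # node -> label path of the first edge that reaches it
    extents = defaultdict(list)
    for parent, label, child in edges:
        path = path_to[parent] + (label,)
        path_to.setdefault(child, path)
        extents[classify(path, required)].append((parent, child))
    return dict(extents)


extents = build_extents(EDGES, REQUIRED)
# The partial matching query //book/title is answered by a single lookup.
book_titles = extents["book.title"]
print(book_titles, extents["remainder.title"])
```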

XML Data
32
FB Index

Kaushik, Bohannon, Naughton, Korth, SIGMOD 02
Supports twig path expressions
ex) /A/B[C]
Basic Idea
For every edge e labelled l from v to u, add an (inverse) edge e^-1 with label l^-1 from u to v
Then compute the 1-Index on this modified graph
Very large index space
Apply some heuristics
Exploiting local similarity (k-bisimulation)

[Figure: edges labelled A, B, and C^-1]
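
The basic idea can be tried on a hypothetical four-edge graph in which one B node has a C child and another does not. The sketch reuses a naive backward-bisimulation refinement (not the Paige and Tarjan algorithm). The function names and the "^-1" label suffix are illustrative choices.

```python
def backward_bisimulation(nodes, edges, root):
    """Naive refinement. edges: list of (parent, label, child). Returns node -> block id."""
    block = {n: (0 if n == root else 1) for n in nodes}
    while True:
        signature = {
            n: (block[n], frozenset((l, block[p]) for p, l, c in edges if c == n))
            for n in nodes
        }
        ids = {}
        refined = {n: ids.setdefault(signature[n], len(ids)) for n in nodes}
        if len(ids) == len(set(block.values())):
            return refined  # stable partition
        block = refined


def with_inverse_edges(edges):
    """For every edge (v, l, u) add the inverse edge (u, l^-1, v)."""
    return edges + [(c, l + "^-1", p) for p, l, c in edges]


NODES = [0, 1, 2, 3, 4]
EDGES = [(0, "A", 1), (1, "B", 2), (1, "B", 3), (2, "C", 4)]

one_index = backward_bisimulation(NODES, EDGES, 0)
fb_index = backward_bisimulation(NODES, with_inverse_edges(EDGES), 0)
print(one_index[2] == one_index[3], fb_index[2] == fb_index[3])
```

The 1-Index puts both B nodes in one extent because they share the incoming path A.B, so it cannot answer /A/B[C] without looking at the data. After adding inverse edges, the B node with a C child has an extra incoming C^-1 edge and ends up in its own extent.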
33
Discussion

Path Index
Improves query performance by restricting the search space
Can be applied to various applications
Selectivity Estimation
QBE (Query By Example)
Future Work
Support for twig queries
Query Optimization
Cost formula of the path index

34
Thank You!

Any Questions?
http://islab.kaist.ac.kr/jkmin
jkmin_at_islab.kaist.ac.kr

35
Reference

C. Chung, J. Min and K. Shim, "APEX: An Adaptive Path Index for XML Data", SIGMOD 02
B. Cooper, N. Sample, M. Franklin, G. Hjaltason and M. Shadmon, "A Fast Index for Semistructured Data", VLDB 01
M. Garofalakis, A. Gionis, R. Rastogi, S. Seshadri and K. Shim, "XTRACT: A System for Extracting Document Type Descriptors from XML Documents", SIGMOD 00
R. Goldman and J. Widom, "DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases", VLDB 97
R. Kaushik, P. Bohannon, J. Naughton and H. Korth, "Covering Indexes for Branching Path Queries", SIGMOD 02
R. Kaushik, P. Shenoy, P. Bohannon and E. Gudes, "Exploiting Local Similarity for Indexing Paths in Graph-Structured Data", ICDE 02
A. Kemper and G. Moerkotte, "Access Support Relations: An Indexing Method for Object Bases", Information Systems 92
T. Milo and D. Suciu, "Index Structures for Path Expressions", ICDT 99
S. Nestorov, J. Ullman, J. Wiener and S. Chawathe, "Representative Objects: Concise Representations of Semistructured, Hierarchical Data", ICDE 97
F. Rizzolo and A. Mendelzon, "Indexing XML Data with ToXin", WebDB 01
R. Paige and R. Tarjan, "Three Partition Refinement Algorithms", SIAM Journal on Computing 87
P. Valduriez, "Join Indices", TODS 87
