ViST: a dynamic index method for querying XML data by tree structures - PowerPoint PPT Presentation


Provided by: ezhe4

Learn more at: https://www.cs.umd.edu


Transcript and Presenter's Notes

Title: ViST: a dynamic index method for querying XML data by tree structures

1
ViST: a dynamic index method for querying XML
data by tree structures

Authors: Haixun Wang, Sanghyun Park, Wei Fan,
Philip Yu
Presenter: Elena Zheleva, November 2003

2
Overview

Modeling XML Queries
Structure-encoded sequences
Indexing
ViST
Experimental Results

3
Modeling XML Queries
4

DTD of purchase records
<!ELEMENT purchases (purchase*)>
<!ELEMENT purchase (seller, buyer)>
<!ATTLIST seller ID ID location CDATA name CDATA>
<!ELEMENT seller (item*)>
<!ATTLIST buyer ID ID location CDATA name CDATA>
<!ELEMENT item (item*)>
<!ATTLIST item name CDATA manufacturer CDATA>

5
Modeling XML Queries

Focus in XML query language design: the ability to
express complex structural or graphical queries

6
Modeling XML Queries

Querying XML data: finding substructures of the
data graph that match the structure of the query
Structure-encoded sequences: a sequential
representation of both XML data and XML queries

7
Structure-Encoded Sequences
8
Structure-Encoded Sequences

Maps both the data and the queries to sequences
Answers a query by matching its sequence as a
subsequence of the data sequence
Purpose: to avoid as many join operations as
possible
Def.: a sequence of (symbol, prefix) pairs

9
Mapping Data

Represent the XML document/tree in preorder
Represent it as a structure-encoded sequence
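
The two mapping steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `structure_encoded_sequence` is made up here, attributes are ignored, and text values are used directly as symbols, which is a simplification assumed for this example.

```python
import xml.etree.ElementTree as ET


def structure_encoded_sequence(xml_text):
    """Return the preorder list of (symbol, prefix) pairs for a document.

    The prefix of a node is the tuple of tags on the path from the root
    down to its parent. Attributes are ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    sequence = []

    def visit(node, prefix):
        # Emit the node itself, then its text value, then its children.
        sequence.append((node.tag, prefix))
        child_prefix = prefix + (node.tag,)
        text = (node.text or "").strip()
        if text:
            sequence.append((text, child_prefix))
        for child in node:
            visit(child, child_prefix)

    visit(root, ())
    return sequence


doc = "<purchase><seller><item>ibm</item></seller><buyer>dell</buyer></purchase>"
seq = structure_encoded_sequence(doc)
for symbol, prefix in seq:
    print(symbol, "/".join(prefix))
```

Each pair carries the full path to the node, so a node's position in the tree can be recovered from the sequence alone.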

10
Mapping Queries

Benefit of sequence matching: the query gets
processed as a whole
Path Expression

11
Structure-Encoded Sequences

Query
Data

12
Querying XML

through Structure-Encoded Sequence Matching
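
As a rough illustration of the idea on this slide, a query sequence can be tested against a data sequence with an ordinary (non-contiguous) subsequence check. The sketch below assumes exact (symbol, prefix) equality and leaves out wildcards; the sequences and the name `is_subsequence` are invented for the example.

```python
def is_subsequence(query, data):
    """True if the pairs of `query` occur in `data` in the same order,
    not necessarily next to each other."""
    remaining = iter(data)
    # For each query pair, advance through the data until it is found.
    return all(any(q == d for d in remaining) for q in query)


# Hypothetical data sequence for one purchase record.
data = [
    ("purchase", ()),
    ("seller", ("purchase",)),
    ("item", ("purchase", "seller")),
    ("ibm", ("purchase", "seller", "item")),
    ("buyer", ("purchase",)),
    ("dell", ("purchase", "buyer")),
]

# Query: purchases whose buyer is "dell".
query_match = [
    ("purchase", ()),
    ("buyer", ("purchase",)),
    ("dell", ("purchase", "buyer")),
]

# Query: purchases whose buyer is "ibm" (not present in the data).
query_miss = [
    ("purchase", ()),
    ("buyer", ("purchase",)),
    ("ibm", ("purchase", "buyer")),
]

print(is_subsequence(query_match, data))
print(is_subsequence(query_miss, data))
```

The whole query is checked in one pass over the data, with no per-path joins.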

13
Indexing
14
Role of Indexing

To provide an algorithm to perform this sequence
matching
Desired features for the algorithm:
Efficient support for subsequence matching
Use well-supported DB indexing techniques such as
B+ trees
Allow dynamic index insertion

15
What is indexing useful for?

Auxiliary access structures
Used to speed up the retrieval of records
in response to certain search conditions
Provide efficient support for arbitrary
structured queries
using the wildcards // and *
16
Indexing

State-of-the-art approaches:
Indexes on paths
Indexes on nodes
Indexes on both (structures): ViST

17
ViST
18
Algorithms

Naïve Algorithm: based on Suffix Trees
RIST: Relationships Indexed Suffix Tree
ViST: Virtual Suffix Tree

19
Algorithm Using Suffix Trees

Suffix Tree: a compact index to all distinct,
contiguous substrings of a string
D-Ancestorship: in the XML doc tree
(determined through the structure-encoded sequence)
S-Ancestorship: in the suffix tree

20
Example Using Suffix Trees
21
Algorithm Using Suffix Trees

Searches:
first by S-Ancestorship: searching under the suffix
tree
then by D-Ancestorship: matching nodes and
prefixes
Disadvantages:
Costly: traverses a large portion of the subtree
Most commercial DBMSs do not support suffix trees
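
The search described above can be sketched as follows. This is only an illustration under stated assumptions: a plain trie over whole document sequences stands in for the suffix tree, prefixes are matched exactly (no wildcards), and `TrieNode`, `insert`, and `naive_search` are names invented here.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # (symbol, prefix) -> TrieNode
        self.doc_ids = []   # documents whose sequence ends at this node


def insert(root, sequence, doc_id):
    node = root
    for item in sequence:
        node = node.children.setdefault(item, TrieNode())
    node.doc_ids.append(doc_id)


def descendants(node):
    """Yield every (item, node) below `node`: the S-descendants."""
    for item, child in node.children.items():
        yield item, child
        yield from descendants(child)


def naive_search(root, query):
    frontier = [root]
    for q in query:
        # S-Ancestorship: look anywhere below the current nodes.
        # D-Ancestorship: keep only nodes whose (symbol, prefix) matches.
        frontier = [child
                    for node in frontier
                    for item, child in descendants(node)
                    if item == q]
    result = set()
    for node in frontier:
        result.update(node.doc_ids)
        for _, child in descendants(node):
            result.update(child.doc_ids)
    return result


P = ("purchase", ())
S = ("seller", ("purchase",))
I = ("item", ("purchase", "seller"))
B = ("buyer", ("purchase",))
IBM = ("ibm", ("purchase", "seller", "item"))
INTEL = ("intel", ("purchase", "seller", "item"))
DELL = ("dell", ("purchase", "buyer"))

root = TrieNode()
insert(root, [P, S, I, IBM, B, DELL], doc_id=1)
insert(root, [P, S, I, INTEL, B, DELL], doc_id=2)

print(naive_search(root, [P, B, DELL]))
print(naive_search(root, [P, S, I, IBM]))
```

Every query step walks the full subtree under each candidate node, which is the cost the slide lists as a disadvantage.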

22
RIST: Indexing by Ancestor-Descendant
Relationships

Jumps directly to the nodes Y of which X is both
a D-Ancestor and an S-Ancestor
Index Construction: uses B+ trees
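
A small sketch of the "jump" may help. It assumes each suffix tree node carries a label (n, size), where n is its preorder number and size its number of descendants, and it uses a dictionary of sorted lists as a stand-in for the B+ trees. The index contents and the name `jump` are made up for the example.

```python
from bisect import bisect_right

# Hypothetical index: (symbol, prefix) -> sorted list of (n, size) labels.
# The outer dictionary plays the role of the D-Ancestorship lookup, the
# sorted lists the role of the S-Ancestorship lookup.
index = {
    ("purchase", ()): [(1, 8)],
    ("ibm", ("purchase", "seller", "item")): [(4, 2)],
    ("intel", ("purchase", "seller", "item")): [(7, 2)],
    ("buyer", ("purchase",)): [(5, 1), (8, 1)],
}


def jump(index, key, n_x, size_x):
    """Return the labels stored under `key` that lie below node x,
    i.e. those with n_x < n <= n_x + size_x."""
    labels = index.get(key, [])
    lo = bisect_right(labels, (n_x, float("inf")))
    hi = bisect_right(labels, (n_x + size_x, float("inf")))
    return labels[lo:hi]


buyer = ("buyer", ("purchase",))
print(jump(index, buyer, 1, 8))  # below the purchase node
print(jump(index, buyer, 4, 2))  # below the ibm node
print(jump(index, buyer, 7, 2))  # below the intel node
```

One range lookup replaces the subtree traversal of the naive algorithm.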

23
RIST: Indexing by Ancestor-Descendant
Relationships

Subsequence Matching
Determine D-Ancestorship by prefixes
Determine S-Ancestorship by label <n_x, size_x>
x: suffix tree node (root of S-tree)
n_x: prefix traversal order of x
size_x: number of descendants of x
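
To make the labels concrete, the sketch below numbers the nodes of a small tree in preorder, records the number of descendants of each node, and tests ancestorship with a range check. The example tree and the names `label_tree` and `is_s_ancestor` are illustrative assumptions; node names are assumed to be unique.

```python
def label_tree(tree):
    """`tree` is a nested dict: node name -> dict of children.
    Returns node name -> (n, size), with n the preorder number and
    size the number of descendants."""
    labels = {}
    counter = 0

    def visit(name, children):
        nonlocal counter
        n = counter
        counter += 1
        for child_name, grandchildren in children.items():
            visit(child_name, grandchildren)
        # Every number handed out since n belongs to a descendant.
        labels[name] = (n, counter - n - 1)

    for name, children in tree.items():
        visit(name, children)
    return labels


def is_s_ancestor(x, y):
    """x and y are (n, size) labels; True if x is an ancestor of y."""
    n_x, size_x = x
    n_y, _ = y
    return n_x < n_y <= n_x + size_x


tree = {"root": {"a": {"b": {}, "c": {}}, "d": {}}}
labels = label_tree(tree)
print(labels)
print(is_s_ancestor(labels["a"], labels["c"]))
print(is_s_ancestor(labels["a"], labels["d"]))
```

The check needs only the two labels, so it can be answered from an index without touching the tree.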

24
ViST: the Virtual Suffix Tree

Same sequence-matching algorithm as RIST
BUT supports dynamic insertions
Uses a dynamic method to assign labels
Once assigned, the labels are fixed and are not
affected by subsequent data insertion or deletion
Labels the suffix tree without building it
Relies on statistical information about the XML
data
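
One way to picture dynamic labeling is to give every node a scope of numbers and to carve each new child's scope out of the unused part of its parent's scope. The sketch below is an assumption made for illustration, not the rule from the paper: each new child simply receives a fixed fraction 1/LAMBDA of what is left, where ViST would use statistics about the data to choose the sizes. `VirtualNode` and `LAMBDA` are hypothetical names, and the `children` dictionary stands in for index lookups.

```python
LAMBDA = 2  # hypothetical: each new child takes 1/LAMBDA of the free scope


class VirtualNode:
    """A node labeled (n, size): it owns n, its descendants own
    n + 1 .. n + size."""

    def __init__(self, n, size):
        self.n = n
        self.size = size
        self.next_free = n + 1  # first number not yet given to a child
        self.children = {}

    def child(self, key):
        # An existing child keeps the label it was first given.
        if key in self.children:
            return self.children[key]
        remaining = self.n + self.size - self.next_free + 1
        if remaining < 1:
            raise OverflowError("scope exhausted")
        total = max(1, remaining // LAMBDA)
        node = VirtualNode(self.next_free, total - 1)
        self.next_free += total
        self.children[key] = node
        return node


root = VirtualNode(0, 1023)
a = root.child("a")
b = root.child("b")
x = a.child("x")
print((a.n, a.size), (b.n, b.size), (x.n, x.size))

# A later insertion does not disturb labels that were already assigned.
c = root.child("c")
print((a.n, a.size), (c.n, c.size))
```

Because child scopes nest inside the parent scope, the range check n_x < n_y <= n_x + size_x still decides ancestorship, and no existing label has to change when new data arrives.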

25
ViST: the Virtual Suffix Tree