ViST: a dynamic index method for querying XML data by tree structures

Provided by: ezhe4
Learn more at: https://www.cs.umd.edu
Transcript and Presenter's Notes

1
ViST: a dynamic index method for querying XML
data by tree structures
  • Authors: Haixun Wang, Sanghyun Park, Wei Fan,
    Philip Yu
  • Presenter: Elena Zheleva, November 2003

2
Overview
  • Modeling XML Queries
  • Structure-encoded sequences
  • Indexing
  • ViST
  • Experimental Results

3
Modeling XML Queries

4
  • DTD of purchase records:
  • <!ELEMENT purchases (purchase*)>
  • <!ELEMENT purchase (seller, buyer)>
  • <!ATTLIST seller ID ID location CDATA name CDATA>
  • <!ELEMENT seller (item*)>
  • <!ATTLIST buyer ID ID location CDATA name CDATA>
  • <!ELEMENT item (item*)>
  • <!ATTLIST item name CDATA manufacturer CDATA>

5
Modeling XML Queries
  • Focus in XML query language design: the ability
    to express complex structural or graphical queries

6
Modeling XML Queries
  • Querying XML data: finding substructures of the
    data graph that match the query
  • Structure-encoded sequences: a sequential
    representation of both XML data and XML queries

7
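Structure-Encoded Sequences

8
Structure-Encoded Sequences
  • Map both the data and the queries to sequences
  • Answer a query by matching it as a
    (non-contiguous) subsequence of the data
  • Purpose: to avoid as many join operations as
    possible
  • Definition: a sequence of (symbol, prefix) pairs,
    where the prefix is the path from the root to
    the symbol's node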

9
Mapping Data
  • Traverse the XML document tree in preorder
  • Emit one (symbol, prefix) pair per node to form
    the structure-encoded sequence
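
The mapping on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function name `structure_encoded_sequence` and the sample document are assumptions made for the example. Attribute names are treated as child nodes and attribute values as leaves beneath them. Hashing of values and the ordering of sibling nodes are left out.

```python
import xml.etree.ElementTree as ET


def structure_encoded_sequence(xml_text):
    """Return the (symbol, prefix) pairs of an XML document in preorder.

    The prefix is the tuple of tags on the path from the root down to,
    but not including, the node itself.
    """
    root = ET.fromstring(xml_text)
    sequence = []

    def visit(node, prefix):
        sequence.append((node.tag, prefix))
        inner = prefix + (node.tag,)
        # Assumption: an attribute is modelled as a child node whose
        # single leaf is the attribute value.
        for attr_name, attr_value in node.attrib.items():
            sequence.append((attr_name, inner))
            sequence.append((attr_value, inner + (attr_name,)))
        for child in node:
            visit(child, inner)

    visit(root, ())
    return sequence


# Hypothetical purchase record, loosely following the DTD on slide 4.
doc = (
    '<purchase>'
    '<seller name="dell"><item name="ibm"/></seller>'
    '<buyer name="panasia"/>'
    '</purchase>'
)

for pair in structure_encoded_sequence(doc):
    print(pair)
```

Each printed pair carries its own root path, which is what lets one flat sequence preserve the tree structure.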

10
Mapping Queries
  • Benefit of sequence matching: the query is
    processed as a whole
  • Path Expression

11
Structure-Encoded Sequences
  • Query
  • Data

12
Querying XML
  • through Structure-Encoded Sequence Matching
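
As a rough sketch of the idea, a query without wildcards can be tested against a document by checking whether the query sequence is a non-contiguous subsequence of the data sequence. The function name `matches` and the sample sequences are assumptions for illustration. This is the naive linear scan, not the indexed matching described on the later slides.

```python
def matches(query, data):
    """Return True if `query` is a (non-contiguous) subsequence of `data`.

    Both arguments are lists of (symbol, prefix) pairs. A pair matches
    only if symbol and prefix are both equal, so no wildcards here.
    """
    position = 0
    for wanted in query:
        # Scan forward for the next data item equal to the wanted pair.
        while position < len(data) and data[position] != wanted:
            position += 1
        if position == len(data):
            return False
        position += 1
    return True


# Hypothetical data sequence for one purchase record.
data = [
    ("purchase", ()),
    ("seller", ("purchase",)),
    ("name", ("purchase", "seller")),
    ("dell", ("purchase", "seller", "name")),
    ("buyer", ("purchase",)),
    ("name", ("purchase", "buyer")),
    ("panasia", ("purchase", "buyer", "name")),
]

# "Find purchases whose seller is named dell."
seller_is_dell = [
    ("purchase", ()),
    ("seller", ("purchase",)),
    ("dell", ("purchase", "seller", "name")),
]

# "Find purchases whose buyer is named dell."
buyer_is_dell = [
    ("purchase", ()),
    ("buyer", ("purchase",)),
    ("dell", ("purchase", "buyer", "name")),
]

print(matches(seller_is_dell, data))
print(matches(buyer_is_dell, data))
```

The whole query is checked in one pass over the data sequence, with no per-path lookups that would have to be joined afterwards.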

13
Indexing

14
Role of Indexing
  • To provide an algorithm that performs this
    sequence matching
  • Desired features for the algorithm:
  • Efficient support for subsequence matching
  • Use of well-supported DB indexing techniques such
    as B+ trees
  • Support for dynamic index insertion

15
What is indexing useful for?
  • Indexes are auxiliary access structures
  • Used to speed up the retrieval of records
  • In response to certain search conditions
  • Here: provide efficient support for arbitrary
    structured queries
  • Including queries that use the wildcards // and *

16
Indexing
  • State-of-the-art approaches:
  • Indexes on paths
  • Indexes on nodes
  • Indexes on both (whole structures): ViST

17
ViST

18
Algorithms
  • Naïve algorithm based on suffix trees
  • RIST: Relationships Indexed Suffix Tree
  • ViST: Virtual Suffix Tree

19
Algorithm Using Suffix Trees
  • Suffix tree: a compact index to all distinct,
    contiguous substrings of a string
  • D-Ancestorship: ancestorship in the XML document
    tree
  • Determined through the structure-encoded sequence
  • S-Ancestorship: ancestorship in the suffix tree

20
Example Using Suffix Trees

21
Algorithm Using Suffix Trees
  • Searches:
  • First by S-Ancestorship: searching under the
    current suffix tree node
  • Then by D-Ancestorship: matching nodes and
    prefixes
  • Disadvantages:
  • Costly: must traverse a large portion of the
    subtree
  • Most commercial DBMSs do not support suffix trees

22
RIST: Indexing by Ancestor-Descendant
Relationships
  • Jumps directly to the nodes Y to which X is both
    a D-Ancestor and an S-Ancestor
  • Index construction uses B+ trees

23
RIST: Indexing by Ancestor-Descendant
Relationships
  • Subsequence Matching
  • Determine D-Ancestorship by prefixes
  • Determine S-Ancestorship by the label <n_x, size_x>
  • x: suffix tree node (root of S-tree)
  • n_x: prefix (preorder) traversal order of x
  • size_x: number of descendants of x
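
The labeling scheme can be illustrated with a small sketch. With preorder labels, a node y lies below x exactly when n_y falls in the range (n_x, n_x + size_x], so S-Ancestorship becomes a range check. Several things here are assumptions made for the example, not the presented index layout: a plain in-memory trie of whole sequences stands in for the suffix tree, a sorted Python list searched with `bisect` stands in for a B+ tree, and all names (`Node`, `build_trie`, `assign_labels`, `build_index`, `jump`) are hypothetical.

```python
import bisect
from collections import defaultdict


class Node:
    def __init__(self):
        self.children = {}  # sequence item -> child Node
        self.n = 0          # preorder number
        self.size = 0       # number of descendants


def build_trie(sequences):
    """Insert every sequence into a trie, one item per level."""
    root = Node()
    for sequence in sequences:
        node = root
        for item in sequence:
            node = node.children.setdefault(item, Node())
    return root


def assign_labels(node, counter=None):
    """Assign <n, size> labels in preorder."""
    if counter is None:
        counter = [0]
    node.n = counter[0]
    counter[0] += 1
    for child in node.children.values():
        assign_labels(child, counter)
    node.size = counter[0] - node.n - 1
    return node


def is_s_ancestor(x, y):
    """True if x is a proper ancestor of y in the trie."""
    return x.n < y.n <= x.n + x.size


def build_index(root):
    """Map each item to the sorted preorder labels of its trie nodes."""
    index = defaultdict(list)

    def walk(node):
        for item, child in node.children.items():
            index[item].append(child.n)
            walk(child)

    walk(root)
    for labels in index.values():
        labels.sort()
    return index


def jump(index, item, x):
    """Labels of nodes carrying `item` that lie below x (range query)."""
    labels = index.get(item, [])
    low = bisect.bisect_right(labels, x.n)
    high = bisect.bisect_right(labels, x.n + x.size)
    return labels[low:high]


trie = assign_labels(build_trie([["a", "b", "c"], ["a", "d"], ["e", "c"]]))
index = build_index(trie)
a = trie.children["a"]
print(a.n, a.size)
print(jump(index, "c", a))
```

If the items are (symbol, prefix) pairs, the key lookup covers D-Ancestorship through the prefix and the range query covers S-Ancestorship, so no subtree has to be traversed.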

24
ViST: the Virtual Suffix Tree
  • Same sequence-matching algorithm as RIST
  • BUT supports dynamic insertions
  • Uses a dynamic method to assign labels
  • Once assigned, the labels are fixed and are not
    affected by subsequent data insertion or deletion
  • Labels the suffix tree without building it
  • Relies on statistical information about the XML
    data

25
ViST: the Virtual Suffix Tree
  • Index structure contains the sequence
  • Sequence to be inserted
  • Dynamic scope of x: <n_x, size_x, k_x>
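
One way to picture dynamic scopes is the following sketch. The allocation rule is a deliberately simple assumption: each new child receives a fixed fraction 1/lam of whatever range is still unused in its parent's scope. The slides say the real method relies on statistical information about the data, which this sketch does not model. Reading k as a count of the subscopes handed out so far is also an assumption, as are the class name `Scope`, the helper `insert`, the parameter `lam`, and the in-memory `children` dictionary, which is used only to find existing nodes.

```python
class Scope:
    """A node labelled <n, size, k>; it owns the labels n .. n + size."""

    def __init__(self, n, size):
        self.n = n
        self.size = size
        self.k = 0                # subscopes handed out so far
        self.next_free = n + 1    # first label not yet given to a child
        self.children = {}        # item -> Scope, for lookup only

    def child(self, item, lam=2):
        """Return the child for `item`, allocating a subscope if new."""
        if item in self.children:
            return self.children[item]
        remaining = self.n + self.size - self.next_free + 1
        share = remaining // lam  # assumed rule: 1/lam of what is left
        if share < 1:
            raise ValueError("scope exhausted; a larger range is needed")
        node = Scope(self.next_free, share - 1)
        self.next_free += share
        self.k += 1
        self.children[item] = node
        return node


def insert(root, sequence, lam=2):
    """Insert a sequence; return the <n, size> labels along its path."""
    node = root
    labels = []
    for item in sequence:
        node = node.child(item, lam)
        labels.append((node.n, node.size))
    return labels


root = Scope(0, 1023)
print(insert(root, ["a", "b"]))
print(insert(root, ["c"]))
print(insert(root, ["a", "b"]))  # same labels as the first insertion
```

Because a child's range is carved out of unused space, existing labels never change when new sequences arrive, and the containment test n_y in (n_x, n_x + size_x] still identifies descendants.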

26
ViST: the Virtual Suffix Tree

27
Experimental Results
  • Datasets used:
  • DBLP: CS bibliography DB
  • 289,627 records/publications
  • Each publication: a tree of max depth 6
  • Avg length of structure-encoded sequence: 31
  • XMARK
  • 1 record
  • Complicated tree structure
  • Synthetic

28
Experimental Results
  • Comparison methods:
  • Index Fabric: indexes XML paths
  • XISS: uses nodes as the basic query unit
  • ViST: takes approx. 1/10 of the time to perform
    queries, since the other methods need (multiple)
    join operations

29
Experimental Results - remove
  • Index structure and size (1/3 less than the
    suffix tree)
  • DocId B+ tree: N elements
  • Combined D-Ancestor and S-Ancestor B+ tree: N x
    L elements
  • Index construction

30
Conclusion
  • XML queries are answered by subsequence matching
  • Advantages of the ViST algorithm for subsequence
    matching:
  • Avoids expensive join operations
  • Indexes both the content and the structure of XML
    documents
  • Relies on B+ trees, which are well supported for
    disk-based data
  • Supports dynamic data insertion and deletion