Efficient Complex Query Support For Multi-version XML Documents

Title: Managing and Querying Multiversion XML Documents
Author: Shu-Yao Chien
Last modified by: Donghui Zhang
Created Date: 6/11/2001 1:19:27 AM

1
Efficient Complex Query Support For
Multi-version XML Documents
Shu-Yao Chien, Dept. of CS, UCLA, csy_at_cs.ucla.edu
Vassilis J. Tsotras, Dept. of CSE, UC Riverside, tsotras_at_cs.ucr.edu
Carlo Zaniolo, Dept. of CS, UCLA, zaniolo_at_cs.ucla.edu
Donghui Zhang, Dept. of CSE, UC Riverside, donghui_at_cs.ucr.edu
2
Content
  • Motivation
  • Problem statement
  • Framework
  • Problem Reduction
  • Solutions
  • Performance
  • Conclusions

3
Motivation
  • The web changes everything; XML unifies
    everything.
  • An assortment of new and old applications seek
    from XML a shared technology and toolset to
    support their assorted requirements.
  • Version management for XML documents is an
    important topic.
  • Main requirements and research challenges:
      • Efficient version retrieval.
      • Storage efficiency.
      • Complex query support.

4
Problem Definition
  • Given an XML document that evolves over time, how
    can we store its whole history and efficiently
    perform complex queries on any version?

5
Durable Node Numbering Scheme
  • An XML document has an ordered-tree structure, and
    each element has:
      • a Durable Node Number (DNN), and
      • a Range.

6
Node Numbering Scheme --- by Example
  • DNN preserves element order as in a pre-order
    traversal.
  • Range preserves the parent-child relationship:
    for a parent P and its child C,
      • dnn(P) < dnn(C) < dnn(C) + range(C) <
        dnn(P) + range(P).
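
The containment condition above can be checked with two integer comparisons per node. The following is a minimal sketch, assuming each element carries its DNN and Range as plain integers; the `Node` class, the `contains` helper, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    dnn: int  # Durable Node Number (position in pre-order)
    rng: int  # Range reserved for the node's subtree


def contains(p: Node, c: Node) -> bool:
    """Check dnn(P) < dnn(C) < dnn(C) + range(C) < dnn(P) + range(P)."""
    return p.dnn < c.dnn < c.dnn + c.rng < p.dnn + p.rng


# Hypothetical numbering of a small document.
book = Node("book", 1, 1000)
chapter = Node("chapter", 128, 512)
figure = Node("figure", 200, 10)
appendix = Node("appendix", 700, 100)

print(contains(book, chapter))      # True
print(contains(chapter, figure))    # True
print(contains(chapter, appendix))  # False
```

Sorting nodes by `dnn` recovers document order, so both order and ancestry are available without walking the tree.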

7
Version Model
  • Each element has:
      • Lifespan: (Vstart, Vend)
      • SPaR range: (DNN, Range)
  • Adding a new version N corresponds to a set of
    changes:
      • Delete(E): set E.Vend to N and free its SPaR
        range.
      • Insert(E): set the lifespan of E to (N, now)
        and assign it an unused SPaR range.
      • Update(E, new value): Delete(E) + Insert(E),
        using the same SPaR range but the new value.
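
The three change operations can be sketched as follows. This is an illustration only: the `Element` and `History` names are hypothetical, `None` stands in for "now", the lifespan is assumed to be half-open (an element deleted in version N is not part of version N), and the bookkeeping of freed SPaR ranges is omitted.

```python
from dataclasses import dataclass
from typing import Optional

NOW = None  # open-ended Vend, standing in for "now"


@dataclass
class Element:
    value: str
    spar: tuple  # (DNN, Range)
    vstart: int
    vend: Optional[int] = NOW

    def alive_in(self, version: int) -> bool:
        # Assumption: lifespan is the half-open interval [Vstart, Vend).
        return self.vstart <= version and (self.vend is NOW or version < self.vend)


class History:
    def __init__(self):
        self.elements = []

    def insert(self, n, value, spar):
        e = Element(value, spar, vstart=n)
        self.elements.append(e)
        return e

    def delete(self, n, e):
        e.vend = n  # the SPaR range would be freed here

    def update(self, n, e, new_value):
        # Delete + Insert, reusing the same SPaR range.
        self.delete(n, e)
        return self.insert(n, new_value, e.spar)

    def snapshot(self, version):
        alive = [e for e in self.elements if e.alive_in(version)]
        return [e.value for e in sorted(alive, key=lambda e: e.spar[0])]


h = History()
abstract = h.insert(1, "abstract v1", (10, 5))
chapter = h.insert(1, "chapter", (100, 50))
h.update(2, abstract, "abstract v2")
h.delete(3, chapter)

print(h.snapshot(1))  # ['abstract v1', 'chapter']
print(h.snapshot(2))  # ['abstract v2', 'chapter']
print(h.snapshot(3))  # ['abstract v2']
```

Because nothing is ever overwritten, every past version remains reconstructible from the lifespans alone.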

8
Framework for Storage Schemes
  • Two types of tags: individual tags (abstract,
    conclusion) and list tags (chapter, section,
    figure).
  • Users query list-tag elements by order (e.g.
    chapter 2) rather than by SPaR range (e.g. the
    chapter whose SPaR range is (128, 512)). The order
    must therefore be translated into a SPaR range,
    which calls for separate indices.

9
Problem Reduction
  • Complex queries that can be reduced to partial
    version retrievals:
      • Structural projection: project the part of the
        document between chapters 2 and 5 in version 20.
      • Path expression: find the chapter that contains
        figure 7 in version 10.

10
Problem Reduction
  • Structural projection: project the part of the
    document between chapters 2 and 5 in version 20.
      • Query the CH-index to find all chapters in
        version 20.
      • Compute the SPaR range between chapters 2 and 5.
      • Perform a partial version retrieval on the full
        index.
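
The three steps above can be sketched with in-memory lists standing in for the CH-index and the full index. Everything here is hypothetical: the record layout, the sample values, the half-open lifespan, and the choice to include the whole subtree of the last chapter in the computed range.

```python
# Records are (tag, dnn, range, vstart, vend); vend None means "now".
# All values are made up for illustration.
FULL_INDEX = [
    ("chapter", 100, 90, 1, None),
    ("section", 110, 20, 1, None),
    ("chapter", 200, 90, 1, None),
    ("figure", 210, 5, 1, 15),
    ("chapter", 300, 90, 5, None),
    ("section", 320, 20, 5, None),
    ("chapter", 400, 90, 1, None),
]


def alive(rec, version):
    vstart, vend = rec[3], rec[4]
    return vstart <= version and (vend is None or version < vend)


def chapters_in(version):
    """Stand-in for the CH-index: live chapters in document order."""
    live = [r for r in FULL_INDEX if r[0] == "chapter" and alive(r, version)]
    return sorted(live, key=lambda r: r[1])


def partial_retrieval(version, lo, hi):
    """Stand-in for the full index: live elements with lo <= DNN <= hi."""
    live = [r for r in FULL_INDEX if alive(r, version) and lo <= r[1] <= hi]
    return sorted(live, key=lambda r: r[1])


def project_chapters(version, first, last):
    chs = chapters_in(version)                       # step 1
    lo = chs[first - 1][1]                           # step 2
    hi = chs[last - 1][1] + chs[last - 1][2]
    return partial_retrieval(version, lo, hi)        # step 3


print([(r[0], r[1]) for r in project_chapters(20, 2, 3)])
# [('chapter', 200), ('chapter', 300), ('section', 320)]
```

A linear scan is used only to keep the sketch short; it says nothing about the cost of the indexed versions.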

11
Problem Reduction
  • Partial version retrieval: given version i and
    DNN range r, find all elements whose DNN ∈ r in
    version i.

12
Problem Reduction
  • Path expression: find/construct the chapter that
    contains figure 7 in version 10.
      • Query the FIG-index to find the SPaR range of
        figure 7 in version 10.
      • Query the CH-index using that SPaR range to
        find the chapter.
      • To construct the chapter, perform a partial
        version retrieval on the full index.
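
The lookup across the two tag indices can be sketched as below. The lists `CH_INDEX` and `FIG_INDEX` are hypothetical stand-ins for the real indices, with made-up values and an assumed half-open lifespan; the final construction step (partial version retrieval) is left out.

```python
# Records are (label, dnn, range, vstart, vend); vend None means "now".
CH_INDEX = [
    ("chapter A", 100, 90, 1, None),
    ("chapter B", 200, 90, 1, None),
    ("chapter C", 300, 90, 4, None),
]
FIG_INDEX = [
    ("figure", 120, 5, 1, None),
    ("figure", 230, 5, 1, 8),
    ("figure", 250, 5, 2, None),
    ("figure", 310, 5, 4, None),
]


def alive(rec, version):
    vstart, vend = rec[3], rec[4]
    return vstart <= version and (vend is None or version < vend)


def nth(index, version, n):
    """n-th live element (1-based) of a list tag, in document order."""
    live = sorted((r for r in index if alive(r, version)), key=lambda r: r[1])
    return live[n - 1]


def chapter_containing_figure(version, n):
    _, f_dnn, f_rng, _, _ = nth(FIG_INDEX, version, n)    # FIG-index lookup
    for ch in CH_INDEX:                                   # CH-index lookup
        label, c_dnn, c_rng, _, _ = ch
        if alive(ch, version) and c_dnn < f_dnn and f_dnn + f_rng < c_dnn + c_rng:
            return label
    return None


print(chapter_containing_figure(10, 3))  # chapter C
print(chapter_containing_figure(3, 3))   # chapter B
```

The same ordinal ("figure 3") resolves to different elements in different versions, which is why the order-to-SPaR translation has to be version-aware.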

13
Indexing for List Tags
  • Indexing for list tags (CH-index, FIG-index) is
    trivial because these indices are small.
  • Multi-version B-tree (MVBT) [BGO96]:
    asymptotically optimal in space, update, and
    partial version retrieval.

14
Storage and Query Scheme for Full Index
  • We examine two schemes:
      • MVBT storage/index
      • UBCC storage + secondary index

15
Motivation for UBCC Storage
  • The MVBT is capable of storing and querying the
    multi-versioned XML document, and is
    asymptotically optimal. Why UBCC?
  • MVBT is designed for handling one-by-one updates,
    not specialized for the batch update in the
    document versioning environment.

16
Traditional Versioning Schemes
  • Naive approach: stores each version in its
    entirety. This minimizes retrieval cost but makes
    storage very inefficient.
  • RCS (Revision Control System):
      • stores the latest version in its entirety;
      • represents old versions by deltas (reverse
        edit scripts);
      • minimizes storage cost;
      • version retrieval cost grows linearly with
        version number.
  • SCCS (Source Code Control System):
      • objects are time-stamped and stored in their
        document order;
      • version retrieval cost can be as high as
        reading the whole change history.
  • These schemes are used by most current systems,
    but need improvements in storage management,
    retrieval, query, and support for complex objects.

17
UBCC Storage Scheme
  • RCS and SCCS store major versions and
    incremental modifications. To answer a query, they
    find the nearest major version and apply the
    incremental changes, possibly across multiple
    versions. They are also designed for full version
    retrieval.
  • UBCC [VLDB01] (Usefulness-Based Copy Control)
    uses the concept of page usefulness.

18
Page Usefulness by Example
  • We set a minimum usefulness requirement Umin,
    e.g. 70% (0 < Umin < 1).
  • A page is useful/useless when its usefulness is
    above/below Umin.
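
A minimal sketch of the usefulness test, under an assumption the slide does not spell out: usefulness is taken here to be the fraction of a page's capacity occupied by objects that are valid in the version of interest. The `usefulness` function, the page layout, and the treatment of a page sitting exactly at Umin as useful are all illustrative choices.

```python
U_MIN = 0.70  # minimum usefulness requirement from the slide's example


def usefulness(page, version, capacity):
    """Fraction of the page occupied by objects valid in `version`.

    page: list of (size, vstart, vend); vend None means still valid.
    Assumes a half-open lifespan [vstart, vend).
    """
    valid = sum(
        size
        for size, vstart, vend in page
        if vstart <= version and (vend is None or version < vend)
    )
    return valid / capacity


def is_useful(page, version, capacity):
    return usefulness(page, version, capacity) >= U_MIN


# Hypothetical page of capacity 100 holding four objects of size 25.
page = [(25, 1, None), (25, 1, None), (25, 1, 2), (25, 1, 3)]

for v in (1, 2, 3):
    print(v, usefulness(page, v, 100), is_useful(page, v, 100))
# 1 1.0 True
# 2 0.75 True
# 3 0.5 False
```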

19
Usefulness Based Copy Control (UBCC)
  • STEP 1: Determine page usefulness for copying.
  • STEP 2: Append new/copied objects into new pages
    by their logical order.

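[Figure: UBCC example, labeled VERSION 1. Pages P1 and P2 hold Root, Ch A, Fig D, Sec E, Ch B, Sec F, Fig G, Fig H, with U(P1) = 75% and U(P2) = 50% < Umin = 70%. After the COPY step, pages P3 and P4 hold Sec J, Ch B, Sec F, Fig M, Ch K, Sec L, Sec T, Fig R, with U(P3) = 100% and U(P4) = 100%.]
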
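
The two UBCC steps can be sketched as a single check-in function. This is a simplified illustration, not the authors' implementation: usefulness is counted in objects per page, the element names are borrowed from the slide's figure, and the page assignment, DNN values, and the set of deleted and inserted elements are assumptions.

```python
def check_in(live_pages, deleted, inserted, u_min=0.70, capacity=4):
    """One UBCC check-in (simplified).

    live_pages: {page_id: [(dnn, name), ...]} objects valid in the previous version.
    deleted: set of names removed by the new version.
    inserted: list of (dnn, name) added by the new version.
    Returns (pages that stay useful, newly written pages).
    """
    to_write = list(inserted)
    kept = {}
    for pid, objs in live_pages.items():
        valid = [o for o in objs if o[1] not in deleted]
        # STEP 1: a page below u_min is useless; copy its valid objects.
        if len(valid) / capacity >= u_min:
            kept[pid] = valid
        else:
            to_write.extend(valid)
    # STEP 2: append new and copied objects in logical (DNN) order.
    to_write.sort()
    new_pages = [to_write[i:i + capacity] for i in range(0, len(to_write), capacity)]
    return kept, new_pages


pages = {
    "P1": [(1, "Root"), (100, "Ch A"), (110, "Fig D"), (150, "Sec E")],
    "P2": [(200, "Ch B"), (210, "Sec F"), (230, "Fig G"), (240, "Fig H")],
}
deleted = {"Fig D", "Fig G", "Fig H"}
inserted = [(180, "Sec J"), (220, "Fig M"), (300, "Ch K"),
            (310, "Sec L"), (320, "Sec T"), (330, "Fig R")]

kept, new_pages = check_in(pages, deleted, inserted)
print(sorted(kept))                        # ['P1']
print([name for _, name in new_pages[0]])  # ['Sec J', 'Ch B', 'Sec F', 'Fig M']
print([name for _, name in new_pages[1]])  # ['Ch K', 'Sec L', 'Sec T', 'Fig R']
```

P1 stays at 75% and is left alone, while P2 drops to 50%, so its two surviving elements are rewritten next to the new elements that surround them in document order.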
20
Complexity Analysis
  • Version retrieval I/O cost for Version N is
    bounded by SN / Umin, where
      • SN is the size of Version N.
      • E.g. Umin = 50% gives I/O < 2 SN.
  • Version file size is linear in the size of the
    change history (as with RCS), and is bounded by
    O(Schg / (1 - Umin)), where
      • Schg is the size of the change history.
      • Umin is the usefulness requirement.
  • Both are optimal!
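
A quick numeric reading of the two bounds, as a sketch only: sizes are in pages, the constant hidden by the O-notation is taken to be 1, and the input sizes are arbitrary.

```python
def retrieval_bound(s_n, u_min):
    """Upper bound on pages read to retrieve a version of size s_n."""
    return s_n / u_min


def storage_bound(s_chg, u_min):
    """Storage bound for a change history of size s_chg (constant factor 1)."""
    return s_chg / (1 - u_min)


for u_min in (0.5, 0.75):
    print(u_min, retrieval_bound(1000, u_min), storage_bound(1000, u_min))
```

Raising Umin tightens the retrieval bound and loosens the storage bound, so Umin acts as the knob that trades one against the other.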

21
Indexing Choices using UBCC
  • UBCC is used to cluster the document elements.
    On top of the document file, we build either:
      • an MVBT as a dense index, or
      • an MVRT as a sparse index.

22
Sparse Page Index --- Multi-version R Tree
  • Multi-version R-Tree: each record corresponds to
    a UBCC page and stores:
      • Life span (T1, T2)
      • Maximum DNN range (D1, D2)
      • UBCC page ID
  • When retrieving a segment for a version, the MVRT
    is searched to locate useful data pages with an
    overlapping DNN range.

[Figure: Retrieve Version 10, Segment (D1, D2). Pages P5, P8, P11, P15, and P22 are plotted against a Version axis and a DNN range axis, with V 10, D1, and D2 marked.]

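
The page lookup amounts to a two-dimensional overlap test: the query version must fall within the page's life span and the query segment must overlap the page's DNN range. The sketch below uses a linear scan as a stand-in for the MVRT traversal; the `PageRecord` class, the half-open life span, and all numeric values are hypothetical (only the page names echo the figure).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageRecord:
    page_id: str
    t1: int            # first version for which the page is useful
    t2: Optional[int]  # first version for which it no longer is; None = now
    d1: int            # smallest DNN ever stored in the page
    d2: int            # largest DNN ever stored in the page


def pages_for(records, version, lo, hi):
    """Pages useful in `version` whose DNN range overlaps [lo, hi]."""
    return [
        r.page_id
        for r in records
        if r.t1 <= version and (r.t2 is None or version < r.t2)
        and r.d1 <= hi and lo <= r.d2
    ]


records = [
    PageRecord("P5", 1, 6, 0, 300),
    PageRecord("P8", 1, None, 0, 200),
    PageRecord("P11", 6, None, 201, 400),
    PageRecord("P15", 8, None, 401, 700),
    PageRecord("P22", 12, None, 300, 500),
]

print(pages_for(records, 10, 250, 450))  # ['P11', 'P15']
```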
23
Sparse vs. Dense Indexing
  • Good for sparse MVRT:
      • small size;
      • each page is checked at most once.
  • Bad for sparse MVRT:
      • May read unnecessary pages, e.g.
        request Version 3, SPaR (420, 700):
        page P qualifies but contains no valid
        element.
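
The unnecessary read arises because the sparse index keeps only a page's life span and maximum DNN range, not its contents. The sketch below builds one hypothetical page that passes the index test for Version 3, SPaR (420, 700), yet holds no element valid for that request; the page contents and the half-open lifespans are made up.

```python
# Index record for page P: life span and maximum DNN range.
page_record = {"lifespan": (1, 6), "dnn_range": (400, 800)}

# Contents of page P as (dnn, vstart, vend).
page_elements = [
    (400, 1, 6),
    (500, 1, 3),  # deleted in version 3
    (650, 4, 6),  # inserted in version 4
    (800, 1, 6),
]


def qualifies(record, version, lo, hi):
    """Sparse-index test: life span covers version, DNN ranges overlap."""
    t1, t2 = record["lifespan"]
    d1, d2 = record["dnn_range"]
    return t1 <= version < t2 and d1 <= hi and lo <= d2


def valid_elements(elements, version, lo, hi):
    """What the page actually contributes once it has been read."""
    return [d for d, vs, ve in elements if vs <= version < ve and lo <= d <= hi]


print(qualifies(page_record, 3, 420, 700))         # True
print(valid_elements(page_elements, 3, 420, 700))  # []
```

A dense index with one entry per element would not direct the query to this page, at the price of a much larger index.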

24
Experimental Setup
  • Sun Enterprise 250 Server, Solaris 2.8, 16KB page
    size, 100-page buffer, GNU C.
  • Dataset: 1000 versions; the initial version has
    1000 objects; each object is 200 bytes; the change
    between two versions is 10%.
  • Implemented schemes:
      • Scheme 1: MVBT storage/index
      • Scheme 2: UBCC storage, dense MVBT index
      • Scheme 3: UBCC storage, sparse MVRT index

25
Performance Comparison --- Check-In Time and
Index Size
26
Performance Comparison --- Partial Version
Retrieval
27
Conclusions and Future Work
  • We proposed a framework for storing and querying
    multi-versioned XML documents.
  • We examined techniques that merge traditional
    versioning schemes and temporal databases for XML
    version management.
  • Best scheme:
      • UBCC storage
      • Sparse MVRT for the full index
      • Dense MVBT for each tag index
  • Emerging issues:
      • Query language support for version queries.
      • User interface for browsing versions and
        presenting query results.

28
Thank you!