1
Change-Centric Management of Versions in an XML
Warehouse
  • Amélie Marian
  • Columbia University
  • Serge Abiteboul, Grégory Cobéna, Laurent Mignet
  • INRIA-Rocquencourt

2
Overview
  • The Xyleme Project
  • Change Management
  • Version Management
  • XIDs
  • XML Diff
  • Deltas
  • Storage of XML documents versions
  • Implementation and experiments

3
The Xyleme Project
  • A dynamic XML data warehouse with high-level
    services
  • User-friendly Query Engine
  • Semantic Data Integration
  • Version Management
  • Query Subscription, Change Monitoring services
  • The Xyleme project is now finished
  • Start-up also called Xyleme

4
Change Management
  • Version Management
  • Learning about Changes
  • Monitoring Changes: Query Subscription
  • Querying the Past: Temporal Queries

5
Version Management
  • Our Requirements
  • Obtain the current version
  • Get the modifications since time t
  • Subscribe to change notifications, query changes
  • Compute temporal queries
  • Rebuild the version Vi of a document at time ti

6
Getting the Documents
  • XML documents are fetched from the web
  • We only have snapshots of the documents

7
XIDs
  • Unique identifiers needed to track XML nodes
    through time
  • Track changes on a specific node (e.g., a product
    in a catalog)
  • Reconstruct the history of a node
  • But physically adding an ID attribute to each
    node is expensive storage-wise
  • ⇒ XIDs make it possible to attach persistent IDs
    to every node in a storage-efficient manner

8
XIDs
  • XIDs stored separately as a list (XID-map)
  • List of the node IDs in a postorder traversal of
    the tree
  • XIDnext gives the next available XID
  • Compact Representation
  • Document is not modified

9
XML Diff
  • We implemented a XML diff algorithm to compute
    changes between two versions of a document
  • Use of XML structure for matching
  • Content matching
  • Linear in the size of the document
  • XML diff has two roles
  • Match nodes
  • Build the delta
  • Ongoing work on improving the XML diff

10
Node Matching using a Diff Algorithm
Diff(V1,V2): delete(5), update(13,150), insert(16,2,(17-21))
New XID-map: (6-10, 17-21, 11-16), next available XID: 22
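
The effect of this diff on the XID-map can be replayed on a toy tree. The sketch assumes that version 1 is a root (XID 16) with three five-node Product subtrees (XIDs 1-5, 6-10, 11-15), which is consistent with the operations shown but is not stated on the slide. `Node`, `product`, `find` and `delete` are illustrative helpers.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    xid: int
    children: list = field(default_factory=list)

def product(first):
    """A Product subtree using XIDs first..first+4 in postorder."""
    name = Node(first + 1, [Node(first)])       # Name element and its text
    price = Node(first + 3, [Node(first + 2)])  # Price element and its text
    return Node(first + 4, [name, price])

def find(node, xid):
    """Return the node carrying `xid`, or None."""
    if node.xid == xid:
        return node
    for child in node.children:
        hit = find(child, xid)
        if hit is not None:
            return hit
    return None

def delete(node, xid):
    """Remove the subtree rooted at `xid` from below `node`; True if found."""
    for i, child in enumerate(node.children):
        if child.xid == xid:
            del node.children[i]
            return True
        if delete(child, xid):
            return True
    return False

def postorder(node):
    """XIDs of the tree in postorder: this list is the XID-map."""
    out = []
    for child in node.children:
        out.extend(postorder(child))
    out.append(node.xid)
    return out

root = Node(16, [product(1), product(6), product(11)])  # assumed version 1
xid_next = 17

delete(root, 5)                                   # delete(5)
# update(13,150) changes a value only, so the XID-map is unaffected
find(root, 16).children.insert(2 - 1, product(xid_next))  # insert(16,2,(17-21))
xid_next += 5

new_map = postorder(root)
```

Matched nodes keep their XIDs, deleted XIDs are never reused, and inserted nodes receive fresh XIDs starting at the next available one.
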
11
Edit-Scripts: SEQUENCE
  • Sequences of basic operations over XML trees
  • Delete(n)
  • Update(n, v)
  • Insert(m,k,T)
  • Move(n,k,m)
  • An Edit Script can be applied to a document D if
    its operations are consistent with D
  • An Edit Script applied to a document D will
    result in a unique document D'
  • Several Edit Scripts applied to a document D can
    result in the same document D'
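
A minimal sketch of these properties, under simplifying assumptions: a document is modelled as a dictionary mapping each node label to its ordered list of children, only inserts and leaf deletes are handled, and operations are written as tuples in the order (node, position, parent). `apply_script` is a name chosen for this example.

```python
import copy

def apply_script(doc, script):
    """Apply operations in order. Positions are 1-based and relative to the
    document state at the time each operation runs."""
    doc = copy.deepcopy(doc)
    for op, *args in script:
        if op == "insert":
            label, pos, parent = args
            kids = doc[parent]
            if not 1 <= pos <= len(kids) + 1:
                raise ValueError(f"position {pos} is inconsistent with the document")
            kids.insert(pos - 1, label)
            doc[label] = []
        elif op == "delete":
            (label,) = args
            parent = next((p for p, kids in doc.items() if label in kids), None)
            if parent is None:
                raise ValueError(f"node {label} is not in the document")
            doc[parent].remove(label)
            del doc[label]
        else:
            raise ValueError(f"unsupported operation {op}")
    return doc

v0 = {"P": ["A"], "A": []}

script_1 = [("insert", "B", 1, "P"), ("insert", "C", 1, "P")]
script_2 = [("insert", "C", 1, "P"), ("insert", "B", 2, "P")]

result_1 = apply_script(v0, script_1)
result_2 = apply_script(v0, script_2)
```

Both scripts are consistent with `v0`, each yields exactly one result, and the two results coincide: P ends up with children C, B, A.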

12
Deltas (Δ): SET
  • We introduce an alternative way of representing
    changes: Deltas
  • Δi,j (unit delta) contains the set of operations
    needed to go from Vi to Vj (Diff(Vi,Vj))
  • A Delta (Δ) over a document D is the sequence of
    unit deltas over D
  • Δ = Δ1,2, ..., Δk-1,k
  • There is an (almost) unique delta from Vi to Vj
  • We represent Deltas as XML documents

13
Shortcomings of Deltas
  • Deltas are not reversible and cannot be composed
    (information on position is missing)
  • Storage Policies
  • a) V1, Δ1,2, ..., Δnow-1,now
  • b) Δ2,1, ..., Δnow,now-1, Vnow
  • c) V1, Δ2,1, ..., Δnow,now-1
  • d) Δ1,2, ..., Δnow-1,now, Vnow
  • Only a) and b) are lossless
  • But we would like to have fast access to
  • Vnow
  • Δi,now

14
Completed Deltas (Δ)
  • Completed deltas contain more information
  • Delete(m,k,T)
  • Update(n, ov, nv)
  • Insert(m,k,T)
  • Move(n,k,m,p,q)
  • Completed Deltas can be reversed and composed
  • Completed Deltas are in the spirit of some logs
    in DB systems
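
The extra fields are what make reversal mechanical. The sketch below models three of the four completed operations as small records and reverses them. The field names are assumptions made for this example, and Move is left out because the slide does not spell out the meaning of its five arguments.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    parent: int    # m
    position: int  # k
    subtree: str   # T, kept as serialized XML in this sketch

@dataclass(frozen=True)
class Delete:
    parent: int    # m
    position: int  # k
    subtree: str   # T: the deleted content is retained

@dataclass(frozen=True)
class Update:
    node: int      # n
    old_value: str # ov
    new_value: str # nv

def reverse_op(op):
    """Return the operation that undoes `op`."""
    if isinstance(op, Insert):
        return Delete(op.parent, op.position, op.subtree)
    if isinstance(op, Delete):
        return Insert(op.parent, op.position, op.subtree)
    if isinstance(op, Update):
        return Update(op.node, op.new_value, op.old_value)
    raise TypeError(f"unsupported operation: {op!r}")

def reverse_delta(delta):
    """A unit delta is a set of operations, so its reverse is also a set."""
    return frozenset(reverse_op(op) for op in delta)

camera = "<Product><Name>Camera</Name><Price>300</Price></Product>"
delta_1_2 = frozenset({Delete(16, 1, camera), Update(13, "200", "150")})
delta_2_1 = reverse_delta(delta_1_2)
```

A plain `Delete(n)` could not be reversed this way, since the parent, the position and the deleted subtree would all be unknown.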

15
Example of XML Δ
    <delta>
      <unit_delta>
      </unit_delta>
      <unit_delta>
        <time from="1" to="2"/>
        <delete parent="16" position="1" xid-map="(1-5)">
          <Product>
            <Name>Camera</Name>
            <Price>300</Price>
          </Product>
        </delete>
        <update xid="13" new_value="150" old_value="200"/>
        <insert parent="16" position="2" xid-map="(17-21)">
          <Product>
            <Name>DVD</Name>
            <Price>500</Price>
          </Product>
        </insert>
      </unit_delta>
    </delta>
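
Since deltas are themselves XML documents, they can be read with any XML parser. The following sketch lists the operations of one unit delta using Python's standard library. The attribute quoting and the closing tags in `DELTA_XML` are reconstructed for this example, and `list_operations` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DELTA_XML = """
<delta>
  <unit_delta>
    <time from="1" to="2"/>
    <delete parent="16" position="1" xid-map="(1-5)">
      <Product><Name>Camera</Name><Price>300</Price></Product>
    </delete>
    <update xid="13" new_value="150" old_value="200"/>
    <insert parent="16" position="2" xid-map="(17-21)">
      <Product><Name>DVD</Name><Price>500</Price></Product>
    </insert>
  </unit_delta>
</delta>
"""

def list_operations(delta_xml):
    """Return (operation name, attributes) pairs for every unit delta."""
    root = ET.fromstring(delta_xml)
    ops = []
    for unit in root.findall("unit_delta"):
        for op in unit:
            if op.tag in ("delete", "insert", "update"):
                ops.append((op.tag, dict(op.attrib)))
    return ops

operations = list_operations(DELTA_XML)
for name, attrs in operations:
    print(name, attrs)
```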

16
Operations on Deltas
  • Compute with version
  • Vi ∘ Δi,j = Vj
  • Reverse: (Δi,j)^-1 = Δj,i
  • Compose: Δi,j ∘ Δj,k = Δi,k
  • Simplify: Δi,j → Δ'i,j

17
Storage of Versions
  • For a document D (or a query result Q), we store
  • Current Version Vk
  • XID-map (as text) of Vk
  • Current Δ = Δ1,2, ..., Δk-1,k
  • When a new version k+1 arrives
  • Compute XML diff between k and k+1, compute
    Δk,k+1
  • Replace current version with Vk+1
  • Replace XID-map
  • Append Δk,k+1 to Δ
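
The storage cycle above can be sketched with a deliberately simplified model: a document is a flat dictionary from XID to value (no tree structure, no XID-map), and `diff` is a trivial dictionary comparison standing in for the XML diff. All names here (`diff`, `reverse`, `apply`, `VersionStore`) are illustrative. Because the stored unit deltas are completed, an older version can be rebuilt by undoing them from the current version.

```python
def diff(old, new):
    """Completed unit delta between two flat documents (dict: XID -> value)."""
    delta = []
    for xid in old.keys() - new.keys():
        delta.append(("delete", xid, old[xid], None))
    for xid in new.keys() - old.keys():
        delta.append(("insert", xid, None, new[xid]))
    for xid in old.keys() & new.keys():
        if old[xid] != new[xid]:
            delta.append(("update", xid, old[xid], new[xid]))
    return delta

def reverse(delta):
    """Swap insert/delete and old/new values."""
    swap = {"insert": "delete", "delete": "insert", "update": "update"}
    return [(swap[op], xid, new, old) for op, xid, old, new in delta]

def apply(doc, delta):
    """Return a copy of `doc` with the unit delta applied."""
    doc = dict(doc)
    for op, xid, _old, new in delta:
        if op == "delete":
            del doc[xid]
        else:
            doc[xid] = new
    return doc

class VersionStore:
    def __init__(self, first_version):
        self.current = dict(first_version)  # Vk
        self.deltas = []                    # deltas[i] leads from Vi+1 to Vi+2

    def add_version(self, new_version):
        self.deltas.append(diff(self.current, new_version))  # append unit delta
        self.current = dict(new_version)                     # replace Vk

    def version(self, i):
        """Rebuild version i (1-based) by undoing deltas from the current one."""
        doc = self.current
        for delta in reversed(self.deltas[i - 1:]):
            doc = apply(doc, reverse(delta))
        return doc

v1 = {3: "300", 13: "200"}
v2 = {13: "150", 19: "500"}
v3 = {13: "150", 19: "450", 22: "new"}

store = VersionStore(v1)
store.add_version(v2)
store.add_version(v3)
```

The current version is available without any computation, while `store.version(1)` walks the delta sequence backwards.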

18
Levels of Versioning
  • Full versioning is expensive, we support
    different levels of versioning
  • Full Versioning: Vnow + Δ
  • Partial Versioning: Vnow + part of Δ
  • Last Version Update: Vnow + Δnow-1,now
  • Change Support: Vnow + XML diff computed for
    Query Subscription
  • Not Versioned: Vnow

19
Implementation
  • Version Manager and XML diff implemented in C++
  • A change simulator was implemented for tests
  • A GUI was implemented

20
GUI Interface

21
Deltas Statistics
  • Reasonable when there are not many modifications
  • Relatively expensive for small documents
  • Depends on the quality of the diff

22
Deltas Statistics (2)
  • 30% of modifications on the document
  • From left to right
  • Snapshots
  • Completed Deltas
  • Deltas: composition and previous version
    reconstruction are not possible
  • Composed Completed Deltas: advantages of
    Completed Deltas but coarser granularity and
    higher cost

23
Conclusion
  • Management of Versions based on Change
    Representation
  • Representation in tree data (XML)
  • Study of storage policies
  • Implementation of running prototypes
  • Completed Deltas: a Set of Modifications
  • Mathematical properties on completed deltas
    (algebraic group)
  • Current work on Query Subscription, Continuous
    Queries and Changes over Collections of Documents

25
Example: Edit-Scripts vs. Deltas
  • Version 0: P with a single child A
  • A Possible Edit-Script: relative positions (at
    time of operation)
  • Insert(B,1,P)
  • Insert(C,1,P)
  • The Delta: absolute positions (final)
  • Insert(B,2,P)
  • Insert(C,1,P)
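
To make the difference concrete, the sketch below derives the insert-only delta directly from the child lists of the two versions, using final positions, and then applies it. It assumes that version 0 is P with the single child A and handles inserts only. `delta_of_inserts` and `apply_insert_delta` are names invented for this example.

```python
def delta_of_inserts(parent, old_kids, new_kids):
    """Set of Insert(node, final position, parent) for nodes absent from old_kids."""
    return {
        ("insert", label, pos, parent)
        for pos, label in enumerate(new_kids, start=1)
        if label not in old_kids
    }

def apply_insert_delta(old_kids, delta):
    """Apply an insert-only delta. The delta is a set, so no order is given.
    Inserting by increasing final position places every node correctly."""
    kids = list(old_kids)
    for _op, label, pos, _parent in sorted(delta, key=lambda op: op[2]):
        kids.insert(pos - 1, label)
    return kids

version_0 = ["A"]            # children of P
final = ["C", "B", "A"]      # children of P after both inserts

delta = delta_of_inserts("P", version_0, final)
rebuilt = apply_insert_delta(version_0, delta)
```

The delta contains Insert(B,2,P) and Insert(C,1,P) whatever order the edit script performed them in, which is why it can be treated as a set.
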
26
Example: Missing Information for Delta
Composition (Δ(0,2))
  • Δ(0,1): Insert(B,2,P)
  • Δ(1,2): Delete(C), Insert(D,2,P)
  • Δ(1,2) with completed information: Delete(C,1,P),
    Insert(D,2,P)
  • Deltas do not give information on parents and
    positions of deleted elements
  • Positions of inserted elements in composition
    cannot be computed