TIMBER: A Native XML Database

1
TIMBER: A Native XML Database
  • Author: H.V. Jagadish et al.
  • Presenter: Min Lu
  • Date: Apr 5, 2005

2
Introduction
  • Growing XML: XML repositories
  • New approach: native XML DB
  • TIMBER: Tree-structured native XML database
    Implemented at the University of Michigan by
    Bright Energetic Researchers

3
Topics of Discussion
  • Motivation
  • TIMBER Architecture
  • Tree Algebra (TAX)
  • Query Optimization
  • Conclusion

4
Motivation
5
Motivation
  • XML characteristics
    - Tree structured: elements can be structurally
      related, and these relationships are meaningful
    - Flexibility
  • Mapping XML to a relational DB
    - Unnormalized relational representation
    - Or a large number of tables

6
Motivation
  • Native XML DB
    - Tamino: a commercial one
    - Natix: a native XML data management system,
      designed for storing and processing XML data
    - TIMBER: built on the Shore storage manager

7
Topics of Discussion
  • Motivation
  • TIMBER Architecture
  • Tree Algebra (TAX)
  • Query Optimization
  • Conclusion

8
TIMBER Architecture
  • Shore
    - Disk memory management
    - Buffering
    - Concurrency control

9
TIMBER Architecture: Data Flow
[Figure: data-flow diagram. Labels: Interface, Parse tree, One node at a time, Internal representation, (Shore)]
10
TIMBER Architecture: Query Flow
[Figure: query-flow diagram. Labels: Operator tree, Call, (Shore)]
11
Nodes in TIMBER
  • One node for each element
  • All attributes clubbed into one node
  • Content of element pulled into a child node
  • Processing instructions and comments are simply
    ignored
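
The node model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TIMBER's storage code: the `Node` class and `build` function are hypothetical names, the standard-library `xml.etree.ElementTree` parser stands in for TIMBER's own parser, and text that follows a child element (mixed content) is not handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node record; kind is "element", "attributes", or "content".
    kind: str
    name: str = ""
    value: object = None
    children: list = field(default_factory=list)


def build(elem):
    """Map one parsed XML element to the node model described on the slide."""
    node = Node("element", elem.tag)
    if elem.attrib:
        # All attributes of the element are clubbed into a single child node.
        node.children.append(Node("attributes", value=dict(elem.attrib)))
    text = (elem.text or "").strip()
    if text:
        # The element's content is pulled into its own child node.
        node.children.append(Node("content", value=text))
    for child in elem:
        node.children.append(build(child))
    return node


# The default ElementTree parser drops comments and processing instructions,
# which matches the "simply ignored" behaviour on the slide.
doc = '<book year="2005" lang="en"><!-- note --><title>TIMBER</title></book>'
root = build(ET.fromstring(doc))
print([child.kind for child in root.children])
```

Here the `book` element ends up with one attributes node and one `title` element node, and `title` in turn carries a single content node.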

12
Node Labels
  • Determining parent-child (PC) and
    ancestor-descendant (AD) relationships is a
    frequent operation
  • Label each node with a triple:
    start, end, level (S, E, L)

13
Triple Labels for AD and PC
  • AD: (S1, E1, L1) is an ancestor of (S2, E2, L2)
    ⇔ S1 < S2 and E1 > E2
  • ex. (1.0, 9.0, 1) and (3.0, 6.0, 5)
  • PC: (S1, E1, L1) is the parent of (S2, E2, L2)
    ⇔ S1 < S2 and E1 > E2 and L1 = L2 - 1
  • ex. (1.0, 9.0, 1) and (2.0, 8.0, 2)
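
The two containment tests translate directly into code. The sketch below is a minimal illustration using the slide's own example labels; the `Label` type and the function names are assumptions made for this example and are not taken from TIMBER.

```python
from typing import NamedTuple


class Label(NamedTuple):
    # (S, E, L) triple from the slide; the type name is hypothetical.
    start: float
    end: float
    level: int


def is_ancestor(a: Label, d: Label) -> bool:
    """AD test: a's interval strictly contains d's interval."""
    return a.start < d.start and a.end > d.end


def is_parent(p: Label, c: Label) -> bool:
    """PC test: containment plus exactly one level of difference."""
    return is_ancestor(p, c) and p.level == c.level - 1


root = Label(1.0, 9.0, 1)
deep = Label(3.0, 6.0, 5)
child = Label(2.0, 8.0, 2)

print(is_ancestor(root, deep))   # True
print(is_parent(root, child))    # True
print(is_parent(root, deep))     # False: levels differ by more than one
```

Both checks need only the two labels, so no tree traversal is required to decide a structural relationship.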

14
Triple Label Benefits
  • Updates: no re-labeling
  • Use double values to leave gaps for new nodes
  • Serves as a node identifier
  • Store nodes by their start labels to cluster
    sub-elements together with their parents
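
One way to picture the gap idea: a new leaf takes start and end values strictly inside the unused interval between two existing label values, so no existing node is re-labeled. The function below is a sketch under that assumption. The name `label_between` and the choice of splitting the gap into thirds are illustrative, and the slides do not say how TIMBER actually picks the values.

```python
def label_between(left: float, right: float, level: int):
    """Return an (S, E, L) triple for a new leaf placed inside the open gap (left, right)."""
    third = (right - left) / 3.0
    start, end = left + third, left + 2.0 * third
    if not (left < start < end < right):
        # Assumption: a real system would have to re-label or rebalance here.
        raise ValueError("gap exhausted")
    return (start, end, level)


# Existing labels from the earlier example slide: a parent and its only child.
parent = (1.0, 9.0, 1)
old_child = (2.0, 8.0, 2)

# Insert a new first child of `parent`, in the gap between S=1.0 and S=2.0.
new_child = label_between(parent[0], old_child[0], level=2)

# Sorting by start label gives document order, with the new node next to its parent.
ordered = sorted([parent, old_child, new_child])
print(ordered.index(new_child))
```

The parent and the old child keep their labels, and the containment tests for AD and PC still hold for the new node.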

15
Topics of Discussion
  • Motivation
  • TIMBER Architecture
  • Tree Algebra (TAX)
  • Query Optimization
  • Conclusion

16
Tree Algebra (TAX)
  • Set-at-a-time for efficiency
  • Bulk algebra: takes one or more sets of trees as
    input and outputs a set of trees
  • Pattern tree: the portion of interest
  • Witness tree: bears witness to the success of the
    pattern match on the input tree
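
To make the pattern tree and witness tree idea concrete, here is a small matcher. It is a simplified sketch, not the TAX definition: pattern nodes match on tag only, edges are marked "pc" (parent-child) or "ad" (ancestor-descendant), and the names `Tree`, `Pattern`, `witnesses_at`, and `match` are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Tree:
    tag: str
    children: list = field(default_factory=list)


@dataclass
class Pattern:
    tag: str
    edges: list = field(default_factory=list)  # list of (axis, Pattern); axis is "pc" or "ad"


def descendants(node):
    for child in node.children:
        yield child
        yield from descendants(child)


def witnesses_at(pattern, node):
    """Yield one witness tree per way the pattern embeds with its root at `node`."""
    if node.tag != pattern.tag:
        return
    options = []
    for axis, sub in pattern.edges:
        candidates = node.children if axis == "pc" else descendants(node)
        options.append([w for c in candidates for w in witnesses_at(sub, c)])
    # One witness for every combination of sub-matches; none if any edge fails.
    for combo in product(*options):
        yield Tree(node.tag, list(combo))


def match(pattern, root):
    """Set-at-a-time style: return all witness trees found anywhere in `root`."""
    return [w for n in [root, *descendants(root)] for w in witnesses_at(pattern, n)]


data = Tree("dept", [
    Tree("faculty", [Tree("name"), Tree("RA")]),
    Tree("faculty", [Tree("name"), Tree("secretary"), Tree("RA"), Tree("RA")]),
])
pattern = Pattern("faculty", [("pc", Pattern("RA"))])
result = match(pattern, data)
print(len(result))  # 3
```

Each witness contains only the nodes that matched, so the second faculty member, who has two RAs, contributes two witness trees.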

17
Pattern Tree and Witness Tree
[Figure: pattern tree and witness tree. Node labels: A, B, C]
18
Operators in TAX
  • Algebra operations developed:
    - Selection, Projection, Product
    - Set union, Set difference
    - Renaming, Reordering, Grouping
  • The core of XQuery can be parsed to TAX operators

19
Projection Operator in TAX
  • Input C: collection of trees
  • Parameter P: pattern tree
  • Parameter PL: projection list
    (the info to keep in the output)
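
A heavily simplified sketch of projection follows. The assumption, which is not from the slides, is that the pattern tree and projection list are reduced to a plain set of tags to keep. Every other node is dropped and its kept descendants are reattached to the nearest kept ancestor. The real TAX operator works from pattern-tree matches, and the names `Tree`, `project`, and `projection` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    tag: str
    children: list = field(default_factory=list)


def project(tree, keep):
    """Return the forest that remains after dropping nodes whose tag is not in `keep`."""
    kept_below = [t for child in tree.children for t in project(child, keep)]
    if tree.tag in keep:
        return [Tree(tree.tag, kept_below)]
    # This node is dropped; its kept descendants move up one level.
    return kept_below


def projection(collection, keep):
    """Input: a collection of trees. Output: a collection of trees."""
    return [t for tree in collection for t in project(tree, keep)]


C = [Tree("faculty", [Tree("name"), Tree("office", [Tree("RA")]), Tree("RA")])]
out = projection(C, keep={"faculty", "RA"})
print(out)
```

In this example the `name` and `office` nodes disappear, and both `RA` nodes end up as direct children of `faculty`, preserving their relative order.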

20
Topics of Discussion
  • Motivation
  • TIMBER Architecture
  • Tree Algebra (TAX)
  • Query Optimization
  • Conclusion

21
Query Optimization
  • Plan 1: join the faculty node with the secretary
    node first, then join the result with the RA
    node.
  • Plan 2: join the faculty node with the RA node
    first, then join the result with the secretary
    node.

22
Query Optimizer
  • The query optimizer enumerates all evaluation
    plans, estimates their costs, then chooses the
    optimal one.
  • An algorithm, FP_Optimization, finds the best
    evaluation plan.
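
The enumerate, estimate, choose loop can be illustrated with a toy optimizer for the faculty / secretary / RA example. This is not FP_Optimization, whose details are not given in the slides. The cardinalities, selectivities, and the cost model (sum of intermediate result sizes) are all invented for illustration.

```python
from itertools import permutations

# Hypothetical statistics, made up for this example only.
CARD = {"faculty": 100, "secretary": 20, "RA": 5000}
SEL = {
    frozenset({"faculty", "secretary"}): 0.01,
    frozenset({"faculty", "RA"}): 0.002,
}


def plan_cost(order):
    """Cost of a left-deep plan: the total size of its intermediate results."""
    joined = [order[0]]
    size = float(CARD[order[0]])
    cost = 0.0
    for node in order[1:]:
        sel = 1.0  # no join edge means a cross product
        for other in joined:
            sel *= SEL.get(frozenset({node, other}), 1.0)
        size = size * CARD[node] * sel
        cost += size
        joined.append(node)
    return cost


# Enumerate every join order, estimate its cost, and keep the cheapest.
plans = list(permutations(["faculty", "secretary", "RA"]))
best = min(plans, key=plan_cost)
print(best, plan_cost(best))
```

With these made-up numbers, joining faculty with secretary first is cheaper because it produces a much smaller intermediate result than joining faculty with RA first. Exhaustive enumeration grows factorially with the number of pattern nodes, which is why a dedicated search algorithm is needed for larger patterns.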

23
Case Study for Query Optimization
  • Consider the query against the DB mBench 0.1x
    data set with about 130,000 nodes

[Figure: query pattern trees for the case study. Node labels: A, B, C, D, E, F, G]
24
Query Optimization
25
Performance Study
26
Topics of Discussion
  • Motivation
  • TIMBER Architecture
  • Tree Algebra (TAX)
  • Query Optimization
  • Conclusion

27
Conclusion
  • A comprehensive set-at-a-time query processing
    ability in a native XML store, with all the
    standard components of relational query
    processing
  • New access methods have been developed to
    evaluate queries from XML
  • New cost estimation and query optimization
    techniques have been developed.

28
Work to be Done
  • Currently all processing instructions, comments,
    and such are simply ignored.
    - An extra child node of the element node with
      all such data needs to be created.
  • TIMBER was developed when XQuery didn't support
    updates.
    - 11th Feb 2005: First Public Working Draft of
      the XQuery Update Facility Requirements
    - A parser has to be implemented to support
      updates.
  • During an extremely localized sequence of
    inserts, the Start and End labels become an
    issue.
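
The last point can be demonstrated with plain floating-point arithmetic. The sketch assumes, purely for illustration, that each new label is the midpoint of the remaining gap and that every insert lands immediately before the previous one. The function name is hypothetical and TIMBER's actual labeling policy may differ.

```python
def inserts_until_exhausted(lo: float, hi: float) -> int:
    """Count midpoint inserts that fit in (lo, hi) when every insert hits the same spot."""
    count = 0
    while True:
        mid = (lo + hi) / 2.0
        if not (lo < mid < hi):
            # No representable double is left strictly between lo and hi.
            return count
        hi = mid  # the next insert goes between lo and the node just added
        count += 1


n = inserts_until_exhausted(1.0, 2.0)
print(n)
```

Because a double has a 53-bit significand, the gap between 1.0 and 2.0 is used up after a few dozen such inserts, at which point neighbouring nodes would need new labels.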

29
Questions?
  • Thank you!