Storing and Querying XML Data using an RDBMS - PowerPoint PPT Presentation

Transcript and Presenter's Notes


1
Storing and Querying XML Data using an RDBMS
  • Speaker: Dapeng Liu
  • dliu_at_cs.wayne.edu
  • Wayne State University
  • Department of Computer Science

2
XML is rapidly becoming a popular data format
  • XML has become a standard
  • Large volumes of data
  • Research focuses on storing and querying XML
  • The purpose of this paper (1999) is to present
    the results of an initial study about storing and
    querying XML data.

3
Current options for processing XML
  • RDBMS
  • File system
  • Object-oriented database
  • Special-purpose/semi-structured systems

4
File system
  • Can be used with very little effort to store XML
    data
  • Won't provide any support for querying
  • Organization
  • Unnecessary reconstruction

5
Object-Oriented database
  • Would allow XML elements and sub-elements to be
    clustered
  • Currently not mature enough to process complex
    queries on large databases; waiting times are too long
  • Network database, hierarchical database
  • UML?

6
Special-purpose/semi-structured systems
  • definition?
  • Lore, Lotus Notes, Tamino

7
Special-purpose systems (cont.)
  • http://www-db.stanford.edu/lore/: "The Lore project
    was declared a success in the year 2000 and is
    now pretty much out of business. These pages
    represent a snapshot of the project at some time
    in the past."
  • http://www.softwareag.com/ (Tamino)

8
RDBMS
  • Can organize large volumes of XML data
  • Powerful relational database engines
  • Standard access through xDBC (ODBC/JDBC)

9
What to do?
  • Relational schema
  • Map XML data into relational database
  • Query on relational database
  • Reconstruction

10
How to design RDB schemas?
  • Decide manually how to store XML
  • Infer from the DTD, e.g., by inlining
  • Analyze the XML data and the expected query workload

11
Ways to map XML into RDB
  • 3 alternative ways to store the edges (the structure)
  • 2 alternative ways to store the leaves (the values)
  • Total: 3 × 2 = 6 ways
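The count of six mappings is simply the cross product of the structure choices and the value choices. The minimal sketch below enumerates it; the approach names are taken from the "Tested database schemas" slide, and the function name `mapping_alternatives` is hypothetical.

```python
from itertools import product

# Structure mappings (how edges are stored) and value mappings (how leaves
# are stored), using the approach names from the "Tested database schemas" slide.
STRUCTURE_MAPPINGS = ["Edge", "Binary", "Universal"]
VALUE_MAPPINGS = ["separate value tables", "inlined values"]


def mapping_alternatives():
    """Return every (structure, values) pair: 3 x 2 = 6 combinations."""
    return list(product(STRUCTURE_MAPPINGS, VALUE_MAPPINGS))


for structure, values in mapping_alternatives():
    print(f"{structure} + {values}")
```

Note that the study did not test all six: per the slides, it measured Edge, Binary and Universal with separate value tables, plus Binary with inlining.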

13
Edge Approach
  • Simplest scheme: all edges are stored in a single Edge table
  • Indices on source and on (name, target)
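A minimal sketch of the Edge approach in SQLite, assuming the column layout shown on the "Edge vs. Binary" slide (source, ordinal, name, flag, target) and reading the index bullet as one index on `source` plus a composite index on `(name, target)`. The helper names `build_edge_db` and `children_of` are hypothetical, and `target` is kept as text because it holds either a value id or an object id.

```python
import sqlite3


def build_edge_db():
    """Create an in-memory Edge table holding the example rows from the slides."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Edge (
            source  INTEGER,  -- id of the parent object
            ordinal INTEGER,  -- position of the edge among the parent's edges
            name    TEXT,     -- edge label (element or attribute name)
            flag    TEXT,     -- 'int'/'string' for values, 'ref' for references
            target  TEXT      -- value id (v1, v2, ...) or id of the child object
        )""")
    # One index on source, one composite index on (name, target).
    conn.execute("CREATE INDEX edge_source ON Edge(source)")
    conn.execute("CREATE INDEX edge_name_target ON Edge(name, target)")
    rows = [
        (1, 1, "age", "int", "v1"),
        (1, 2, "name", "string", "v2"),
        (1, 3, "address", "string", "v3"),
        (1, 4, "child", "ref", "3"),
        (1, 5, "child", "ref", "4"),
        (2, 1, "age", "int", "v4"),
    ]
    conn.executemany("INSERT INTO Edge VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def children_of(conn, source):
    """Follow the 'child' edges of one object, in document order."""
    cur = conn.execute(
        "SELECT target FROM Edge WHERE source = ? AND name = 'child' "
        "ORDER BY ordinal",
        (source,),
    )
    return [target for (target,) in cur]


conn = build_edge_db()
print(children_of(conn, 1))
```

Every path step becomes a self-join of the one Edge table, which is what makes the scheme simple but puts all edges, of every label, into the same relation.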

16
Binary Approach
  • Group all edges with the same label into one
    table
  • Horizontal partitioning of the Edge table
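The horizontal partitioning can be sketched in a few lines: every Edge row goes to the table for its label, and the `name` column is dropped because the table itself now implies it. This is an illustrative sketch over the example rows from the "Edge vs. Binary" slide; `partition_edges` and the `B_<label>` naming are assumptions chosen to mirror the slide's Bage/Bchild tables.

```python
from collections import defaultdict

# Example Edge rows from the slides: (source, ordinal, name, flag, target)
EDGE_ROWS = [
    (1, 1, "age", "int", "v1"),
    (1, 2, "name", "string", "v2"),
    (1, 3, "address", "string", "v3"),
    (1, 4, "child", "ref", "3"),
    (1, 5, "child", "ref", "4"),
    (2, 1, "age", "int", "v4"),
]


def partition_edges(edge_rows):
    """Split Edge rows into one Binary table per label.

    The name column is dropped from each partition: it is implied by the
    table the row lands in (B_age, B_child, ...).
    """
    tables = defaultdict(list)
    for source, ordinal, name, flag, target in edge_rows:
        tables[f"B_{name}"].append((source, ordinal, flag, target))
    return dict(tables)


for table, rows in partition_edges(EDGE_ROWS).items():
    print(table, rows)
```

A query that only touches `age` edges can then scan B_age alone instead of filtering the whole Edge table by name.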

17
Edge vs. Binary
  Edge table:
    source  ordinal  name     flag    target
    1       1        age      int     v1
    1       2        name     string  v2
    1       3        address  string  v3
    1       4        child    ref     3
    1       5        child    ref     4
    2       1        age      int     v4
  Binary tables (one per label, name column dropped):
    B_child:
      source  ordinal  flag    target
      1       4        ref     3
      1       5        ref     4
    B_age:
      source  ordinal  flag    target
      1       1        int     v1
      2       1        int     v4
    B_name:
      source  ordinal  flag    target
      1       2        string  v2
    B_address:
      source  ordinal  flag    target
      1       3        string  v3

18
A Binary + Inlining Example
19
Universal Table
  • Full outer join of all Binary tables
  • Separate indices on all the source and all the
    target columns

20
  • Lots of redundancy
  • Store paths
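The redundancy comes from the outer join itself: an object with k edges of one label and m edges of another produces k × m Universal rows, repeating every other column. The sketch below emulates a full outer join of the Binary tables on `source` in plain Python (rather than SQL) over the slide's example data; the function name and the tuple-per-label row layout are assumptions, since a real Universal table would spread each label over its own ordinal/flag/target columns.

```python
from itertools import product

# Binary tables from the slides: label -> [(source, ordinal, flag, target)]
BINARY_TABLES = {
    "B_age": [(1, 1, "int", "v1"), (2, 1, "int", "v4")],
    "B_name": [(1, 2, "string", "v2")],
    "B_address": [(1, 3, "string", "v3")],
    "B_child": [(1, 4, "ref", "3"), (1, 5, "ref", "4")],
}


def universal_table(binary_tables):
    """Emulate a full outer join of all Binary tables on `source`.

    Each output row is (source, entry_for_label_1, entry_for_label_2, ...),
    where an entry is an (ordinal, flag, target) tuple, or None when the
    object has no edge with that label (the NULLs of an outer join).
    """
    labels = sorted(binary_tables)
    by_source = {}
    for label in labels:
        for source, ordinal, flag, target in binary_tables[label]:
            by_source.setdefault(source, {}).setdefault(label, []).append(
                (ordinal, flag, target))
    rows = []
    for source in sorted(by_source):
        # k edges of one label and m of another yield k * m rows.
        per_label = [by_source[source].get(label, [None]) for label in labels]
        for combo in product(*per_label):
            rows.append((source, *combo))
    return labels, rows


labels, rows = universal_table(BINARY_TABLES)
print(["source", *labels])
for row in rows:
    print(row)
```

Object 1 has two children, so its age, name and address entries appear twice; object 2 has only an age, so its row is mostly NULLs.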

21
Mapping Values
  • Separate value tables: require joins
  • Inlined values: many null values
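To make the trade-off concrete, here is a small SQLite sketch of both value mappings. Everything in it is hypothetical: the table names `Vint`, `Vstring` and `EdgeInl`, the column names `val_int` and `val_string`, and the values 36, 7 and 'Ann' (the slides only show value ids v1, v2, v4). The same query needs a join in one layout and leaves NULL cells in the other.

```python
import sqlite3

SCHEMA = """
-- (a) Separate value tables: Edge.target holds a value id; values live in
--     one table per type, so reading a value needs a join.
CREATE TABLE Edge (source INTEGER, ordinal INTEGER, name TEXT,
                   flag TEXT, target TEXT);
CREATE TABLE Vint (vid TEXT PRIMARY KEY, value INTEGER);
CREATE TABLE Vstring (vid TEXT PRIMARY KEY, value TEXT);
INSERT INTO Edge VALUES (1, 1, 'age', 'int', 'v1'),
                        (1, 2, 'name', 'string', 'v2'),
                        (2, 1, 'age', 'int', 'v4');
INSERT INTO Vint VALUES ('v1', 36), ('v4', 7);
INSERT INTO Vstring VALUES ('v2', 'Ann');

-- (b) Inlined values: one column per value type (plus target for
--     references), NULL wherever a column does not apply.
CREATE TABLE EdgeInl (source INTEGER, ordinal INTEGER, name TEXT, flag TEXT,
                      val_int INTEGER, val_string TEXT, target INTEGER);
INSERT INTO EdgeInl VALUES (1, 1, 'age', 'int', 36, NULL, NULL),
                           (1, 2, 'name', 'string', NULL, 'Ann', NULL),
                           (2, 1, 'age', 'int', 7, NULL, NULL);
"""


def older_than_separate(conn, limit):
    """Objects whose age exceeds `limit`, via a join with the Vint table."""
    sql = """SELECT e.source FROM Edge e JOIN Vint v ON e.target = v.vid
             WHERE e.name = 'age' AND v.value > ? ORDER BY e.source"""
    return [s for (s,) in conn.execute(sql, (limit,))]


def older_than_inlined(conn, limit):
    """Same query on the inlined table: no join, just a column predicate."""
    sql = """SELECT source FROM EdgeInl
             WHERE name = 'age' AND val_int > ? ORDER BY source"""
    return [s for (s,) in conn.execute(sql, (limit,))]


def inlined_null_cells(conn):
    """Count NULL cells in the inlined value and target columns."""
    sql = """SELECT SUM(val_int IS NULL) + SUM(val_string IS NULL)
                    + SUM(target IS NULL) FROM EdgeInl"""
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(older_than_separate(conn, 10), older_than_inlined(conn, 10))
print(inlined_null_cells(conn))
```

With only three edges, half of the nine inlined value/target cells are already NULL; the proportion grows with the number of value types.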

22
What to measure
  • Size of the resulting relational database
  • Time to reconstruct an XML document
  • Time to execute different classes of XML queries

23
Tested database schemas
  • Edge, Binary, Universal approaches with separate
    value tables
  • Binary approach with inlining

24
Configuration
  • Two 75 MHz SPARC CPUs, 128 MB memory
  • Solaris 2.6
  • Main memory buffer pool of the database is 6.4 MB,
    < 1/10 of the 80 MB of XML data
  • Default configuration, unless otherwise stated
  • Java with JDBC

25
Configuration (cont)
  • Which RDBMS? Oracle, DB2, Informix, Sybase
  • Two CPUs, small memory (80 MB of data)
  • JDBC

26
Benchmark Specification
27
Benchmark Specification (cont)
  • XML document is flat: no nested objects
  • Does the document contain cycles?
  • Two types of values: short strings and long strings
  • 100K objects, 450K values: 90K texts of 500
    bytes and 360K strings of 15 bytes
  • Each query is run once to warm up the database
    buffer, then at least three more times
  • Attributes: 13 or 20?
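The value counts on this slide are internally consistent, and a quick calculation shows how much raw value data they imply. The sketch below only multiplies the slide's own figures; attributing the rest of the roughly 80 MB document to markup and structure is an assumption, not a figure from the slides.

```python
# Figures from the benchmark slide.
LONG_TEXTS, LONG_TEXT_BYTES = 90_000, 500
SHORT_STRINGS, SHORT_STRING_BYTES = 360_000, 15

# 90K + 360K should match the "450K values" on the slide.
total_values = LONG_TEXTS + SHORT_STRINGS

# Raw bytes of value data: 45,000,000 + 5,400,000.
value_bytes = LONG_TEXTS * LONG_TEXT_BYTES + SHORT_STRINGS * SHORT_STRING_BYTES

print(f"{total_values:,} values, {value_bytes:,} bytes of value data")
```

So about 50 MB of the data is value text, almost 90% of it in the long 500-byte texts.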

28
Database Size
29
Two extremes in queries
  • Light queries: the predicates are very selective,
    so index lookups are attractive and intermediate
    results fit into the database buffers (0.1)
  • Heavy queries: the use of indices is typically
    not attractive and intermediate results do not
    fit into the database buffers (10)

31
Results: query running times
  • The optimizer was configured to force the use of
    indices and index nested-loop joins for light
    queries, with an obvious effect
  • Performance
  • Binary > Edge, Universal
  • Inlined > separate value tables
  • Three exceptions

33
Reconstructing the XML Document
  • > 30 minutes! In some cases even 100 minutes
  • Multiple scans of the Universal table
  • Proposal: also store the original XML document
  • In general, no single way of storing the data
    meets the requirements of all purposes
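One way to see why full reconstruction is expensive: a naive rebuild from an Edge table issues one query per object and then stitches the results back into a tree. The sketch below is a hypothetical illustration (the table contents, the inlined `value` column and the function names are assumptions, and it presumes the object graph has no cycles); it is not the reconstruction procedure used in the study. For a document with 100K objects, this style of rebuild would issue 100K queries.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_db():
    """Tiny hypothetical Edge table with an inlined value column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Edge (source INTEGER, ordinal INTEGER, name TEXT,"
                 " flag TEXT, value TEXT, target INTEGER)")
    conn.executemany("INSERT INTO Edge VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, "name", "string", "Ann", None),
        (1, 2, "child", "ref", None, 2),
        (2, 1, "name", "string", "Bob", None),
    ])
    return conn


def rebuild(conn, oid, tag="object", stats=None):
    """Recursively rebuild object `oid` as an XML element.

    Issues one SELECT per object; assumes the graph is a tree (no cycles).
    """
    if stats is not None:
        stats["queries"] += 1
    elem = ET.Element(tag, id=str(oid))
    rows = conn.execute(
        "SELECT name, flag, value, target FROM Edge WHERE source = ? "
        "ORDER BY ordinal", (oid,)).fetchall()
    for name, flag, value, target in rows:
        if flag == "ref":
            # Follow the reference: one more query for the child object.
            elem.append(rebuild(conn, target, name, stats))
        else:
            ET.SubElement(elem, name).text = value
    return elem


stats = {"queries": 0}
root = rebuild(build_db(), 1, stats=stats)
print(ET.tostring(root, encoding="unicode"), stats["queries"])
```

Keeping a copy of the original document, as the proposal above suggests, avoids this per-object work entirely at the cost of storing the data twice.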

34
Conclusion
  • There is no guarantee that any of the more
    sophisticated approaches known so far will
    perform better than our simple schemas
  • The only operation with unacceptably high cost
    was the complete reconstruction of a very large
    XML document

35
Related work
  • XML without schema → relational data
  • DTD → relational schemas
  • XML Schema → relational schemas
  • Order
  • General method to reconstruct XML
  • XQuery, XPath → SQL
  • Constraints
  • Semantics