1
Storing and Querying XML Data using an RDBMS

Speaker Dapeng Liu
dliu_at_cs.wayne.edu
Wayne State University
Department of Computer Science

2
XML is rapidly becoming a popular data format

XML has become a standard
Large volumes of data
Research focuses on storing and querying XML
The purpose of this paper is to present the
results of an initial study about storing and
querying XML data (1999)

3
Current status of processing XML

RDBMS
File system
Object-Oriented database
Special-purpose/semi-structured system

4
File system

Can be used with very little effort to store XML
data
Won't provide any support for querying

Organization
Unnecessary reconstruction

5
Object-Oriented database

Would allow clustering of XML elements and
sub-elements
Currently not mature enough to process complex
queries on large databases; response times are too long
Network database, hierarchical database
UML?

6
Special-purpose/semi-structured system

definition?
Lore, Lotus Notes, Tamino

7
Special-purpose system (cont.)

http://www-db.stanford.edu/lore/: "The Lore project
was declared a success in the year 2000 and is
now pretty much out of business. These pages
represent a snapshot of the project at some time
in the past."
http://www.softwareag.com/

8
RDBMS

Organize large volumes of XML data
Powerful relational database engine
xDBC platform

9
What to do?

Relational schema
Map XML data into relational database
Query on relational database
Reconstruction

10
How to design RDB schemas?

Decide how to store XML manually
Infer from the DTD, e.g., inlining
Analyze XML and the expected query workload

11
Ways to map XML into RDB

3 alternative ways to store the edges
2 alternative ways to store the leaves
Total: 3 × 2 = 6 ways

3 alternative ways to store the structure
2 alternative ways to store the values

13
Edge Approach

Simplest scheme
Indices on source and on (name, target)
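
A minimal sketch of the Edge scheme, using Python's built-in sqlite3 rather than any of the systems in the study. The column names follow the Edge vs. Binary slide (source, ord, name, flag, target). The index names, the `shred` helper, and the sample document are assumptions made for illustration. Leaf values are stored directly in `target` for brevity, every leaf is flagged as a string, and XML attributes are ignored.

```python
import sqlite3
import xml.etree.ElementTree as ET

def create_edge_table(conn):
    # One row per edge of the XML graph.
    conn.execute(
        "CREATE TABLE Edge (source INTEGER, ord INTEGER, name TEXT, "
        "flag TEXT, target TEXT)"
    )
    # One reading of the slide: an index on source, a combined one on (name, target).
    conn.execute("CREATE INDEX edge_source ON Edge (source)")
    conn.execute("CREATE INDEX edge_name_target ON Edge (name, target)")

def shred(conn, xml_text):
    """Hypothetical helper: insert one Edge row per child element."""
    root = ET.fromstring(xml_text)
    counter = [1]  # the root element gets id 1

    def visit(elem, elem_id):
        for ord_, child in enumerate(elem, start=1):
            if len(child) == 0:
                # Leaf: keep the text value in the target column.
                row = (elem_id, ord_, child.tag, "string", child.text or "")
                conn.execute("INSERT INTO Edge VALUES (?, ?, ?, ?, ?)", row)
            else:
                # Inner node: target refers to the id of the child object.
                counter[0] += 1
                child_id = counter[0]
                row = (elem_id, ord_, child.tag, "ref", str(child_id))
                conn.execute("INSERT INTO Edge VALUES (?, ?, ?, ?, ?)", row)
                visit(child, child_id)

    visit(root, 1)

conn = sqlite3.connect(":memory:")
create_edge_table(conn)
shred(conn, "<person><name>Ann</name><age>30</age>"
            "<child><name>Bob</name></child></person>")
edge_rows = conn.execute(
    "SELECT source, ord, name, flag, target FROM Edge ORDER BY source, ord"
).fetchall()
```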

16
Binary Approach

Group all edges with the same label into one
table
Horizontal partitioning of the Edge table

17
Edge vs. Binary

Edge
sr  ord  name     flag    target
1   1    age      int     v1
1   2    name     string  v2
1   3    address  string  v3
1   4    child    ref     3
1   5    child    ref     4
2   1    age      int     v4

Bchild
sr  ord  flag    target
1   4    ref     3
1   5    ref     4

Bage
sr  ord  flag    target
1   1    int     v1
2   1    int     v4

Bname
sr  ord  flag    target
1   2    string  v2

Baddress
sr  ord  flag    target
1   3    string  v3
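
The split from the Edge table into the Binary tables above can be sketched in a few lines of Python. The rows are the ones shown on this slide; the function name `partition_by_label` and the dictionary representation are assumptions for illustration only.

```python
from collections import defaultdict

# Rows of the Edge table from the slide: (sr, ord, name, flag, target).
EDGE_ROWS = [
    (1, 1, "age", "int", "v1"),
    (1, 2, "name", "string", "v2"),
    (1, 3, "address", "string", "v3"),
    (1, 4, "child", "ref", "3"),
    (1, 5, "child", "ref", "4"),
    (2, 1, "age", "int", "v4"),
]

def partition_by_label(edge_rows):
    """Group edges by label; each group becomes one table B<label>."""
    tables = defaultdict(list)
    for source, ord_, name, flag, target in edge_rows:
        # The name column is dropped because the table name carries it.
        tables["B" + name].append((source, ord_, flag, target))
    return dict(tables)

binary = partition_by_label(EDGE_ROWS)
```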

18
A Binary Inlining Example
19
Universal Table

Full outer join of all Binary tables
Separate indices on all the source and all the
target columns

Lots of redundancy
Store paths
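
To see where the redundancy comes from, the full outer join can be emulated in plain Python on the Binary tables of the Edge vs. Binary slide. This is a sketch under assumptions: the join is taken on the source column, each joined table contributes one tuple (or None) per result row, and the function name `universal_table` is hypothetical.

```python
from itertools import product

# Binary tables from the Edge vs. Binary slide: (sr, ord, flag, target).
BINARY = {
    "child": [(1, 4, "ref", "3"), (1, 5, "ref", "4")],
    "age": [(1, 1, "int", "v1"), (2, 1, "int", "v4")],
    "name": [(1, 2, "string", "v2")],
    "address": [(1, 3, "string", "v3")],
}

def universal_table(binary):
    """Emulate a full outer join of all Binary tables on source."""
    labels = sorted(binary)
    sources = sorted({row[0] for rows in binary.values() for row in rows})
    result = []
    for source in sources:
        per_label = []
        for label in labels:
            matches = [row[1:] for row in binary[label] if row[0] == source]
            # No matching edge: this label's columns are NULL (None).
            per_label.append(matches or [None])
        # Every combination of matching edges yields one universal row.
        for combo in product(*per_label):
            result.append((source,) + combo)
    return result

universal = universal_table(BINARY)
```

Object 1 has two child edges, so its age, name and address entries appear in two rows each, while object 2 gets a single row that is mostly None.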

21
Mapping Values

Separate value tables
need join
Inlined values
many null values
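
The trade-off can be illustrated with two tiny schemas in sqlite3. Everything here is an assumption for illustration: the table names (`Vint`, `Vstring`, `EdgeInl`), the column names `val_int` and `val_string`, and the sample values 55 and 'Peter' do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Variant 1: separate value tables, one per data type.
CREATE TABLE Edge (source INTEGER, ord INTEGER, name TEXT,
                   flag TEXT, target INTEGER);
CREATE TABLE Vint (vid INTEGER PRIMARY KEY, value INTEGER);
CREATE TABLE Vstring (vid INTEGER PRIMARY KEY, value TEXT);
INSERT INTO Vint VALUES (1, 55);
INSERT INTO Vstring VALUES (2, 'Peter');
INSERT INTO Edge VALUES (1, 1, 'age', 'int', 1);
INSERT INTO Edge VALUES (1, 2, 'name', 'string', 2);

-- Variant 2: values inlined, one column per data type.
CREATE TABLE EdgeInl (source INTEGER, ord INTEGER, name TEXT,
                      val_int INTEGER, val_string TEXT, target INTEGER);
INSERT INTO EdgeInl VALUES (1, 1, 'age', 55, NULL, NULL);
INSERT INTO EdgeInl VALUES (1, 2, 'name', NULL, 'Peter', NULL);
""")

# Separate value tables: the predicate on the value needs a join.
separate = conn.execute(
    "SELECT e.source FROM Edge e JOIN Vint v ON e.target = v.vid "
    "WHERE e.name = 'age' AND e.flag = 'int' AND v.value = 55"
).fetchall()

# Inlined values: no join, but unused type columns stay NULL.
inlined = conn.execute(
    "SELECT source FROM EdgeInl WHERE name = 'age' AND val_int = 55"
).fetchall()

null_cells = conn.execute(
    "SELECT SUM((val_int IS NULL) + (val_string IS NULL) + (target IS NULL)) "
    "FROM EdgeInl"
).fetchone()[0]
```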

22
What to measure

Size of the resulting relational database
Time to reconstruct an XML document
Time to execute different classes of XML queries

23
Tested database schemas

Edge, Binary, Universal approaches with separate
value tables
Binary approach with inlining

24
Configuration

Two 75 MHz SPARC processors, 128 MB main memory
Solaris 2.6
Main memory buffer pool of the database is 6.4 MB,
< 1/10 of the 80 MB of XML data
Default configuration, unless otherwise stated
Java with JDBC

25
Configuration (cont.)

Which RDBMS? Oracle, DB2, Informix, Sybase
Two CPUs, small memory (80 MB data)
JDBC

26
Benchmark Specification
27
Benchmark Specification (cont.)

XML document is flat, no nesting objects
Document contains cycles?
Two types of values: short and long strings
100K objects, 450K values, 90K texts of 500
bytes, 360K strings of 15 bytes
Run each query once to warm up database buffer,
then at least three times
Attributes, 13 or 20?
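
The measurement protocol (one warm-up run, then at least three timed runs) could look roughly like this. It is a sketch against sqlite3, not the Java/JDBC harness used in the study; the function name `time_query` and the toy table `t` are assumptions.

```python
import sqlite3
import time

def time_query(conn, sql, params=(), runs=3):
    """Run the query once untimed to warm the buffer, then time `runs` runs."""
    conn.execute(sql, params).fetchall()  # warm-up, result discarded
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        timings.append(time.perf_counter() - start)
    return timings  # seconds per timed run

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
timings = time_query(conn, "SELECT COUNT(*) FROM t WHERE x > ?", (500,))
```

How the individual timings are aggregated (mean, median, best of three) is not stated on the slide, so the sketch returns them all.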

28
Database Size
29
Two extremes in queries

Light query: the predicates are very
selective, so index lookups are attractive
and intermediate results fit into the database
buffers (0.1%)
Heavy query: the use of indices is
typically not attractive and intermediate results
do not fit into the database buffers (10%)
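
A small sketch of the two extremes on an Edge-style table, assuming that the 0.1 and 10 on this slide denote selectivity in percent. The attribute names `id` and `bucket`, the generated data, and the helper `select_names` are made up for illustration; values are stored directly in `target`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Edge (source INTEGER, ord INTEGER, name TEXT,
                   flag TEXT, target TEXT);
CREATE INDEX edge_source ON Edge (source);
CREATE INDEX edge_name_target ON Edge (name, target);
""")

rows = []
for obj in range(1, 1001):
    rows.append((obj, 1, "id", "int", str(obj)))           # unique per object
    rows.append((obj, 2, "bucket", "int", str(obj % 10)))  # 10 distinct values
    rows.append((obj, 3, "name", "string", f"person{obj}"))
conn.executemany("INSERT INTO Edge VALUES (?, ?, ?, ?, ?)", rows)

def select_names(conn, attr, value):
    """Names of all objects whose attribute `attr` equals `value`."""
    sql = (
        "SELECT n.target FROM Edge AS a "
        "JOIN Edge AS n ON n.source = a.source "
        "WHERE a.name = ? AND a.target = ? AND n.name = 'name' "
        "ORDER BY n.source"
    )
    return [r[0] for r in conn.execute(sql, (attr, value))]

light = select_names(conn, "id", "7")      # matches 1 of 1000 objects (0.1%)
heavy = select_names(conn, "bucket", "3")  # matches 100 of 1000 objects (10%)
```

Both queries have the same shape (a self-join of the Edge table on source); only the selectivity of the predicate differs.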

31
Result about running times for the queries

Optimizer configured to force the use of indices and
index nested-loop joins on light queries. Obvious
effect.
Performance
Binary > Edge, Universal
Inlined > separate
Three exceptions

33
Reconstructing the XML Document

> 30 minutes! Even 100 minutes.
Multiple scans of the Universal table
Proposal: save the original XML document
In general, there is no way to store data that meets
the requirements of all purposes
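
For illustration, a naive reconstruction from an Edge-style table can be written as a recursive walk. This is only a sketch, not the procedure measured in the study: it assumes an acyclic document, values stored directly in `target`, and a hypothetical function `reconstruct`. It issues one query per inner node.

```python
import sqlite3
import xml.etree.ElementTree as ET

def reconstruct(conn, node_id, tag):
    """Rebuild the element for object `node_id`, children in `ord` order."""
    elem = ET.Element(tag)
    rows = conn.execute(
        "SELECT name, flag, target FROM Edge WHERE source = ? ORDER BY ord",
        (node_id,),
    ).fetchall()
    for name, flag, target in rows:
        if flag == "ref":
            # target is the id of a nested object; recurse into it.
            elem.append(reconstruct(conn, int(target), name))
        else:
            # Leaf value stored directly in the target column.
            ET.SubElement(elem, name).text = target
    return elem

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Edge (source INTEGER, ord INTEGER, name TEXT, "
    "flag TEXT, target TEXT)"
)
conn.executemany("INSERT INTO Edge VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "name", "string", "Ann"),
    (1, 2, "child", "ref", "2"),
    (2, 1, "name", "string", "Bob"),
])
xml_text = ET.tostring(reconstruct(conn, 1, "person"), encoding="unicode")
```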

34
Conclusion

There is no guarantee that any of the more
sophisticated approaches known so far will
perform better than our simple schemas
The only operation which had unacceptably high
cost was completely reconstructing a very large
XML document

35
Related work