On View Support for a Native XML DBMS


On View Support for a Native XML DBMS. Ting Chen, Tok Wang Ling (School of Computing, National University of Singapore); Daofeng Luo, Xiaofeng Meng.


1
On View Support for a Native XML DBMS

Ting Chen, Tok Wang Ling
School of Computing, National University of Singapore
Daofeng Luo, Xiaofeng Meng
Information School, Renmin University of China

2
Outline

View for XML Documents
  Two Main Approaches
  Problems
ORA-SS: Object-Relationship-Attribute Model for Semi-structured Data
  ORA-SS class diagram and instance diagram
  ORA-SS for view schema definition
Element-Based Clustering (EBC)
  Basic Approach
  ORA-SS and EBC
XML View Transformation
  Problem Definition
  Algorithm
Conclusion

3
View for XML Documents

Two main approaches to define views
Define views in script languages like XQuery or XSLT
  General, but demanding from the users' point of view because XQuery and XSLT scripts are complex
  Difficult to optimize from a performance viewpoint
Define views by schema mapping
  E.g. Clio [7] and eXcelon [3]
  A declarative approach relieves users of writing complex scripts to perform view transformation
  Schema mappings can then be translated into XQuery (or XSLT) scripts
We focus on the problem of view transformation through schema mapping

4
View for XML Documents

View for XML Documents via Schema Mapping
Problem: Current XML schema formats cannot express views with semantic constraints, which results in ambiguity
E.g. (next slide) The source XML file contains information about researchers working on different projects and the publication list of each researcher.

5
View for XML Documents
The view schema in Fig. (c) of the above diagram is such an example. It has at least two possible meanings, which can lead to different view results!
1. For each project, list all the papers published by project members; for each paper of the project, list all the authors of the paper.
2. For each project, list all the papers published by project members; for each paper of the project, list all the authors of the paper who work for the project.
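The difference between the two readings can be made concrete with a small Python sketch. The data and the function names (`source`, `authors_anywhere`, `view`) are hypothetical and only illustrate the ambiguity; they are not taken from the presentation.

```python
# Hypothetical source data: project -> researcher -> set of papers.
source = {
    "j1": {"r1": {"p1"}, "r2": {"p1", "p2"}},
    "j2": {"r2": {"p1", "p2"}, "r3": {"p2"}},
}

def authors_anywhere(paper):
    """All researchers who wrote `paper`, in any project."""
    return {r for members in source.values()
            for r, papers in members.items() if paper in papers}

def view(restrict_to_project):
    """Build project -> paper -> sorted author list under one reading."""
    result = {}
    for project, members in source.items():
        papers = set().union(*members.values())
        result[project] = {}
        for paper in sorted(papers):
            if restrict_to_project:
                # Reading 2: only authors who work for this project.
                authors = {r for r, ps in members.items() if paper in ps}
            else:
                # Reading 1: every author of the paper.
                authors = authors_anywhere(paper)
            result[project][paper] = sorted(authors)
    return result

reading_1 = view(restrict_to_project=False)
reading_2 = view(restrict_to_project=True)
print(reading_1["j1"]["p2"])  # ['r2', 'r3']
print(reading_2["j1"]["p2"])  # ['r2']
```

Under reading 1, researcher r3 appears under project j1 although r3 does not work for j1; under reading 2, r3 is left out. The same view schema therefore yields two different view documents.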
6
ORA-SS Data Model

ORA-SS [2]
Object Class
Relationship Type
Attribute (object attribute or relationship attribute)
E.g. An ORA-SS Instance Diagram

Compared with the XML document in Slide 5, two extra fields (attribute Date and sub-element Position) are added in the above ORA-SS instance diagram
7
ORA-SS Data Model

ORA-SS Schema Diagram

There are two binary relationship types in the schema: Project-Researcher (JR) and Researcher-Paper (RP). The set of papers under a researcher does not depend on the project he/she works in.
Position is an attribute of relationship type JR instead of Researcher. This means that a researcher may hold different positions across the projects he/she works in.
Date is a single-valued attribute of object class Paper. Different occurrences of the same paper will always have the same Date value.
J_Name, R_Name and P_ID are the identifiers of object classes Project, Researcher and Paper respectively, as indicated by solid circles. Identifier values are used to tell whether two object occurrences are identical.

8
ORA-SS Data Model

ORA-SS for View Schema Definition
It is able to define views with different semantics
View (a) has two binary relationship types. The intention of the view schema is to find all the papers published by researchers in a project and, for each paper, to find all of its authors.
View (b) has only one ternary relationship type. The view is defined to find all the papers published by researchers in a project; however, for each paper, View (b) finds only those authors working for the project.

View (a)
View (b)
9
ORA-SS Data Model

ORA-SS for View Schema Definition
The two view schemas in Slide 8 correspond to two different XSLT scripts. The following is the XSLT script for schema (a):

<root>
  <xsl:for-each-group select="root/Project" group-by="@J_Name">
    <Project>
      <J_Name><xsl:value-of select="@J_Name"/></J_Name>
      <xsl:for-each-group select="current-group()/Researcher/Paper" group-by="@P_Name">
        <Paper>
          <xsl:variable name="vPName" select="@P_Name"/>
          <P_Name><xsl:value-of select="@P_Name"/></P_Name>
          <xsl:for-each-group select="/root/Project/Researcher[Paper/@P_Name = $vPName]" group-by="@R_Name">
            <Researcher>
              <R_Name><xsl:value-of select="@R_Name"/></R_Name>
            </Researcher>
          </xsl:for-each-group>
        </Paper>
      </xsl:for-each-group>
    </Project>
  </xsl:for-each-group>
</root>
10
ORA-SS Data Model

The following is the XSLT script for schema (b). The main difference between the two scripts lies in the third xsl:for-each-group directive, the one for Researcher. The script for schema (a) needs to search the whole document to find the complete author list of a paper, because the authors may not all work for the same project. The script for schema (b), on the other hand, avoids the global search because it only needs to find the authors of the paper who work for the same project.

<root>
  <xsl:for-each-group select="root/Project" group-by="@J_Name">
    <Project>
      <J_Name><xsl:value-of select="@J_Name"/></J_Name>
      <xsl:for-each-group select="current-group()/Researcher/Paper" group-by="@P_Name">
        <Paper>
          <xsl:variable name="vPName" select="@P_Name"/>
          <P_Name><xsl:value-of select="@P_Name"/></P_Name>
          <xsl:for-each-group select="current-group()/.." group-by="@R_Name">
            <Researcher>
              <R_Name><xsl:value-of select="@R_Name"/></R_Name>
            </Researcher>
          </xsl:for-each-group>
        </Paper>
      </xsl:for-each-group>
    </Project>
  </xsl:for-each-group>
</root>
11
Element-Based-Clustering (EBC)

Element-Based Clustering [5]
Extension of the Element-Based (EB) [6] strategy
Element nodes (records) with the same tag name are clustered and organized as a list

12
Element-Based-Clustering (EBC)

EBC Node Labeling
EBC assigns a label to each node in an XML document
Labels for nodes in an XML data tree can be calculated in the following manner:
1. The root element has label nil
2. Perform a pre-order traversal (i.e. document order) of the XML document
For node x ("." here means string concatenation):
label(x) = label(x.parent) . position of x in x.parent's childList
Node A is an ancestor of node B if and only if label(A) is a proper prefix of label(B)
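The labeling rule can be sketched in Python as follows. This is a minimal illustration, not the EBC implementation: labels are kept as tuples of child positions (an assumption made here so that the prefix test compares whole components, not characters), the empty tuple stands in for nil, and the helper names `label_nodes`, `is_ancestor` and `show` are hypothetical.

```python
import xml.etree.ElementTree as ET

def label_nodes(root):
    """Return (label, element) pairs in document (pre-order) order."""
    labeled = []

    def visit(node, label):
        labeled.append((label, node))
        # A child's label is its parent's label plus its 1-based position.
        for position, child in enumerate(node, start=1):
            visit(child, label + (position,))

    visit(root, ())  # the root gets the empty label ("nil")
    return labeled

def is_ancestor(a, b):
    """True if label a is a proper prefix of label b."""
    return len(a) < len(b) and b[:len(a)] == a

def show(label):
    """Render a label in dotted form, e.g. (1, 2, 1) -> '1.2.1'."""
    return ".".join(map(str, label)) or "nil"

doc = ET.fromstring(
    "<root><Project><Researcher><Paper/></Researcher>"
    "<Researcher><Paper/><Paper/></Researcher></Project><Project/></root>"
)
labels = [(show(label), node.tag) for label, node in label_nodes(doc)]
print(labels)
```

For this small document the two Project elements receive labels 1 and 2, the second Researcher receives 1.2, and its papers receive 1.2.1 and 1.2.2.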

13
ORA-SS and EBC

How does ORA-SS schema help tune XML document
storage?

Cluster: entries, each written as value (label)
Project: j1 (1), j2 (2)
Researcher: r1 (1.1), r2 (1.2), r2 (2.1), r3 (2.2)
Position: Leader (1.2.3), Staff (2.1.3), Leader (2.2.2)
Paper: p1, 05/2002 (1.1.1); p1, 05/2002 (1.2.1); p2, 03/2000 (1.2.2); p1, 05/2002 (2.1.1); p1, 03/2000 (2.1.2); p2, 05/2002 (2.2.1)
14
ORA-SS and EBC

1. The object identifier of an object is stored together with the object. Objects of the same class form a cluster. Each cluster is a sequential file.
2. Relationship attribute values are stored in a separate cluster.
3. For object attribute values, we need some heuristics.
It is tempting to store object attributes together with the object, since they are likely to be accessed at the same time. However, if an object class has too many object attributes, storing all attribute values together with the object leads to a situation similar to the Subtree-Based storage strategy. One solution is to store only the essential object attributes (as determined by users) with the object and leave the others in separate clusters.
4. Node labels (a node can be an object, a relationship attribute value or an object attribute value) are stored together with the nodes.
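A rough sketch of rules 1, 2 and 4 in Python: labeled nodes are grouped into one cluster per class, each entry keeps its label, and the relationship attribute Position ends up in its own cluster. The node list is hypothetical, in-memory lists stand in for the sequential files, and object attributes such as Date are left out for brevity.

```python
from collections import defaultdict

# Hypothetical labeled nodes in document order: (label, class, value).
# For object classes the value is the object identifier.
nodes = [
    ((1,), "Project", "j1"),
    ((1, 1), "Researcher", "r1"),
    ((1, 1, 1), "Paper", "p1"),
    ((1, 2), "Researcher", "r2"),
    ((1, 2, 1), "Paper", "p1"),
    ((1, 2, 2), "Paper", "p2"),
    ((1, 2, 3), "Position", "Leader"),
    ((2,), "Project", "j2"),
    ((2, 1), "Researcher", "r2"),
    ((2, 1, 1), "Paper", "p1"),
    ((2, 1, 2), "Position", "Staff"),
]

def build_clusters(nodes):
    """Group nodes by class; each cluster keeps (label, value) in document order."""
    clusters = defaultdict(list)
    for label, cls, value in nodes:
        clusters[cls].append((label, value))
    return dict(clusters)

clusters = build_clusters(nodes)
for cls, entries in clusters.items():
    print(cls, entries)
```

Because the nodes arrive in document order, every cluster is already sorted on its labels, which is the property the structural join relies on.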

15
View Transformation Problem Definition

Problem Definition
Given an XML document D1, a source schema V1 and
a valid view schema V2 of V1, transform D1 to
document D2, so that D2 is a valid document under
V2.

[Figure: view transformation diagram with elements labeled Source Document, Source Schema, User Defined Schema Mapping, View Schema and View Document.]
16
View Transformation

View schema defined in DTD can have different
interpretations which result in different views

[Figure: the ambiguous view schema (Proj, Paper, Researcher) and its two ORA-SS interpretations: Slide 8(a), with binary relationship types JP and PR, and Slide 8(b), with one ternary relationship type JPR.]
17
View Transformation

View schema expressed in ORA-SS is unambiguous
The relationship set of an ORA-SS view schema clearly defines how the view document should be constructed
Two basic techniques are used to construct a single relationship in the view schema
Structural join: SJ (based on object labels)
  For each relationship R in the view schema, we first use structural join to find the set of paths of type R such that the object occurrences in each path lie on the same path in the source document.
Value join: Merge (based on logical object identifiers)
  In an XML document an object can have many occurrences. Two occurrences are considered the same if they have identical object identifiers. The set of paths resulting from the structural join is then value joined (or merged) using logical object keys.

18
View Transformation

What makes things complicated?
A path in the view schema can have more than one relationship, and we need to join two relationships together. E.g. the view schema in Slide 8(a) contains two relationships.
Two relationships are joined on their overlapping object classes, so the results constructed for one relationship may be used in the construction of another.
Value join (merge) based on logical keys destroys the sortedness of the output path list of the structural join. As a consequence, the output of a value join cannot be used efficiently in a subsequent structural join with paths from other relationships. (Structural join requires sorted input lists.)
Solution: Duplicate-Preserving Merge (D-Merge). It keeps the structural join output path list intact. For two occurrences of the same object in the list, their child contents are merged, and then each of them gets its own copy of the merged content.
D-Merge is used only when a relationship needs to join with another relationship.
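The contrast between Merge and D-Merge can be sketched in Python. The path list below is a hypothetical structural join output for Paper-Researcher, and the representation (an occurrence as a (label, key) pair, child contents as a list of researcher keys) is an assumption made for illustration only.

```python
from collections import defaultdict

# Hypothetical SJ output: (paper occurrence, researcher key), sorted on the paper label.
# A paper occurrence is (label, key).
paths = [
    (((1, 1, 1), "p1"), "r1"),
    (((1, 2, 1), "p1"), "r2"),
    (((1, 2, 2), "p2"), "r2"),
    (((2, 1, 1), "p2"), "r3"),
]

def merged_children(paths):
    """Collect, per paper key, the children of all its occurrences."""
    children = defaultdict(list)
    for (_, key), child in paths:
        if child not in children[key]:
            children[key].append(child)
    return children

def merge(paths):
    """Value join: one entry per distinct key; occurrence labels are dropped."""
    return dict(merged_children(paths))

def d_merge(paths):
    """Keep every occurrence in its original order, each with its own copy
    of the merged child contents."""
    children = merged_children(paths)
    seen, out = set(), []
    for occurrence, _ in paths:
        if occurrence not in seen:
            seen.add(occurrence)
            out.append((occurrence, list(children[occurrence[1]])))
    return out

print(merge(paths))
print(d_merge(paths))
```

The `d_merge` output still lists the four paper occurrences in label order, so it can be fed into a later structural join with the Project list, whereas the `merge` output no longer carries labels at all.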

19
View Transformation Algorithm

The relationship set of a view schema determines the view transformation process.
Take the two view schemas in Slide 8 as an example:

View (a)
L = SJ(list(R), list(P), {P, R})
L = D-Merge(L, P)
L = SJ(L, list(J), {J, P})
L = Merge(L, J)

View (b)
L = SJ(list(J), list(P), {J, P})
L = SJ(L, list(R), {J, P, R})
L = Merge(L, J)

20
View Transformation Algorithm

Structural Join
Based on object labels
Binary Structural Join [1]
Input
  Two node lists sorted on node numbers: AList of potential ancestor nodes and DList of potential descendant nodes
Output
  OutputList of join results (ai, dj), in which ai is the parent/ancestor of dj, ai is from AList and dj is from DList
The EBC scheme stores elements with the same tag name in pre-order (i.e. sorted on element number), so structural join can be applied naturally
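A binary structural join over sorted label lists might look like the following Python sketch. It is a simplified stack-based merge in the spirit of [1], not the algorithm as published there; labels are assumed to be tuples of child positions, and the output here comes out sorted on the descendant.

```python
def is_ancestor(a, d):
    """True if label a is a proper prefix of label d."""
    return len(a) < len(d) and d[:len(a)] == a

def structural_join(alist, dlist):
    """Join two label lists that are sorted in document order.

    Returns (a, d) pairs where a is an ancestor of d. Each list is scanned
    once; the stack holds a chain of nested potential ancestors.
    """
    out, stack, i = [], [], 0
    for d in dlist:
        # Push every potential ancestor that precedes d in document order.
        while i < len(alist) and alist[i] < d:
            a = alist[i]
            while stack and not is_ancestor(stack[-1], a):
                stack.pop()
            stack.append(a)
            i += 1
        # Drop stack entries whose subtree ended before d.
        while stack and not is_ancestor(stack[-1], d):
            stack.pop()
        out.extend((a, d) for a in stack)
    return out

projects = [(1,), (2,)]
papers = [(1, 1, 1), (1, 2, 1), (1, 2, 2), (2, 1, 1)]
pairs = structural_join(projects, papers)
print(pairs)
```

Tuple comparison in Python coincides with document order for these labels, which is why the plain `<` test is enough to decide which ancestors to push.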

21
View Transformation Algorithm

Structural Join
Based on object labels
Complex Structural Join (example: structural join of three sorted input lists)
Input
  Sorted node lists A, B, C
Output
  OutputList of join results (ai, bj, ck), in which ai, bj, ck (from lists A, B, C respectively) are located on the same path in the source document
Two binary joins
  Step 1: Join A and B
    OutputList AB of (ai, bj), sorted on ai
  Step 2: Join AB (using ai as node label) and C
    OutputList of (ai, bj, ck), sorted on ai
Important: ck must be on the same path as both ai and bj
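The final remark on this slide can be checked with a short Python sketch. It uses a nested-loop join for brevity (an assumption made here; it is not the sorted merge join), and the lists A, B, C and the helper names are hypothetical. Joining on ai alone lets a paper attach to the wrong researcher.

```python
def on_same_path(x, y):
    """True if one label is a prefix of the other, i.e. both lie on one root-to-leaf path."""
    n = min(len(x), len(y))
    return x[:n] == y[:n]

def join(paths, nodes, check_positions):
    """Extend each path with every node that lies on the same path as the
    path members at the given positions (nested loop, for illustration)."""
    return [path + (node,)
            for path in paths
            for node in nodes
            if all(on_same_path(path[i], node) for i in check_positions)]

A = [(1,), (2,)]                       # e.g. Project labels
B = [(1, 1), (1, 2), (2, 1)]           # e.g. Researcher labels
C = [(1, 1, 1), (1, 2, 1), (2, 1, 1)]  # e.g. Paper labels

ab = join([(a,) for a in A], B, [0])   # Step 1: join A and B
abc_on_a_only = join(ab, C, [0])       # checks ck against ai only
abc = join(ab, C, [0, 1])              # checks ck against ai and bj

print(len(abc_on_a_only), len(abc))    # 5 3
```

The two extra results in `abc_on_a_only` pair researcher 1.1 with paper 1.2.1 and researcher 1.2 with paper 1.1.1, which share a project but do not lie on one path.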

22
View Transformation Algorithm

A motivating example

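[Figure: the source document, the source schema (Proj, Researcher, Paper with binary relationship types JR and RP) and the view schema (Proj, Paper, Researcher with binary relationship types JP and PR).]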
23
View Transformation Algorithm

Data Structures
Record
  Object label
  Object key (object identifier)
  ChildList: an array of child record references
Tuple
  A tuple consists of an array of records
  A tuple has a root record
Example: Array of Tuples and Corresponding Tree
  Tuple array index 0: Record r1 (ChildList <1,2>), ROOT
  Tuple array index 1: Record r2 (ChildList <3>)
  Tuple array index 2: Record r3 (ChildList nil)
  Tuple array index 3: Record r4 (ChildList nil)
  Corresponding tree: r1 has children r2 and r3; r2 has child r4
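One possible rendering of these data structures in Python, given only as a sketch: the class names `Record` and `PathTuple`, the `tree` helper and the label values are assumptions; only the fields (label, key, ChildList, root record) come from the slide.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    label: tuple                 # object label (position in the source document)
    key: str                     # object key (object identifier)
    child_list: list = field(default_factory=list)  # indexes of child records

@dataclass
class PathTuple:
    records: list                # array of records
    root: int = 0                # index of the root record

    def tree(self, index=None):
        """Return the subtree at `index` as (key, [child subtrees])."""
        index = self.root if index is None else index
        record = self.records[index]
        return (record.key, [self.tree(i) for i in record.child_list])

# The example from the slide: r1 is the root with children r2 and r3; r2 has child r4.
t = PathTuple([
    Record((1,), "r1", [1, 2]),
    Record((1, 1), "r2", [3]),
    Record((1, 2), "r3"),
    Record((1, 1, 1), "r4"),
])
print(t.tree())  # ('r1', [('r2', [('r4', [])]), ('r3', [])])
```

Storing child references as array indexes keeps a tuple in one flat array, so it can be copied or concatenated with another tuple without chasing pointers.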
24
View Transformation Algorithm

Operations
Structural Join (SJ)
  Input
    L1: Array of Tuples of type <A1, A2, A3, ..., An>
    L2: Array of Tuples of type <B1, B2, B3, ..., Bn>
    Mask M: Bit Array, which specifies the object classes that participate in the structural join
  Output
    L3: Array of Tuples of type <A1, A2, A3, ..., An, B1, B2, B3, ..., Bn>
    For each tuple in L3, the objects whose types are specified in M must lie on the same path in the source document
Merge (Merge)
  Input
    L1: Array of Tuples of type <A1, A2, A3, ..., An>
    Mask M: Bit Array, which specifies a list of object classes
  Output
    L: Array of Tuples
    Any two tuples t1 and t2 in L1 that have the same set of objects whose types are specified in M are merged.
Duplicate-Preserving Merge (D-Merge)
  Input
    L1: Array of Tuples of type <A1, A2, A3, ..., An>

Note: The Structural Join operation uses object labels (node numbers), while Merge and D-Merge use object identifiers.
25
View Transformation Algorithm

Definition 1: Relationship R1 is lower than R2, or R1 < R2, if the top-most participating object class of R1 is a descendant of the top-most participating object class of R2.
Definition 2: A relationship is independent if its set of participating object classes is not included in that of any other relationship. Otherwise the relationship is nested.
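The two definitions can be expressed in a few lines of Python. In this sketch a view schema is a hypothetical child-to-parent map, a relationship is the set of its participating object classes, and the function names are illustrative; the example encodes the view schema of Slide 8(a) as described in the text.

```python
# View schema of Slide 8(a) as a child -> parent map (an assumed encoding).
parent = {"Paper": "Project", "Researcher": "Paper"}

def depth(cls):
    """Number of ancestors of an object class in the view schema."""
    d = 0
    while cls in parent:
        cls = parent[cls]
        d += 1
    return d

def top(rel):
    """Top-most participating object class of a relationship."""
    return min(rel, key=depth)

def is_descendant(cls, ancestor):
    while cls in parent:
        cls = parent[cls]
        if cls == ancestor:
            return True
    return False

def lower(r1, r2):
    """Definition 1: r1 < r2."""
    return is_descendant(top(r1), top(r2))

def independent(name, rels):
    """Definition 2: the class set is not included in any other relationship."""
    return not any(rels[name] <= other
                   for other_name, other in rels.items() if other_name != name)

rels = {"JP": {"Project", "Paper"}, "PR": {"Paper", "Researcher"}}
# Lower relationships first; sorting on the depth of the top-most class
# is one linear order compatible with Definition 1.
order = sorted(rels, key=lambda name: -depth(top(rels[name])))
print(order)  # ['PR', 'JP']
```

With these definitions PR is lower than JP, so PR is processed first, which matches the order used in the worked example for Slide 8(a).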
Algorithm (Main steps)
Step 0. Initialize L1 = null, L2 = null, M = {}
        // M contains the set of object classes that have been structurally joined
        L_O = null for each object class O in the view schema
Step 1. Put the independent relationships of the ORA-SS view schema into a partially ordered set (S, <)
Step 2. While (S is not empty)
          Extract the first relationship R from S
          If (M ∩ R is not empty)
            L1 = L_O for the highest O in M ∩ R (O has the smallest level in the view schema)
          Else
            L1 = null
          For each object class O in (R − M), sorted decreasingly by depth in the view schema
            M = M ∪ {O}
            L2 = Cluster_O   // Cluster_O is an array storing objects of class O
            If (L1 != null)
              L1 = StructuralJoin(L1, L2, M ∩ R)
            Else

26
View Transformation Algorithm

Example: View Schema of Slide 8(a)
Step 1: Construct relationship Paper-Researcher (PR)

Step 1.1. L = SJ(list(P), list(R), {P, R})
Step 1.2. L = D-Merge(L, P)
S = {PR, JP}
[Figure: source document and view schema (Proj, Paper, Researcher with relationship types JP and PR).]
27
View Transformation Algorithm

Example: View Schema of Slide 8(a)
Step 2: Construct relationship Project-Paper (JP)

Step 2.1. L = SJ(L, list(J), {J, P})
Step 2.2. L = Merge(L, J)
S = {JP}
[Figure: source document and view schema (Proj, Paper, Researcher with relationship types JP and PR).]
28
Conclusion

We demonstrate how to combine an efficient XML storage scheme (EBC) and an expressive XML data model (ORA-SS) to provide XML view support for a native XML DBMS.
ORA-SS, used for view schema definitions, can express a great variety of constraints graphically. More importantly, it avoids ambiguity, which is a typical problem if DTD or XML Schema is used as the view schema format.
In our transformation method, a relationship type is the basic unit of transformation. Both object-label-based structural join and logical-key-based value join are employed to construct view results. ORA-SS view schema information guides correct view transformation.

29
Reference

[1] Shurug Al-Khalifa, H. V. Jagadish, Nick Koudas, Jignesh M. Patel, Divesh Srivastava, Yuqing Wu. Structural Joins: A Primitive for Efficient XML Query Pattern Matching. In Proceedings of ICDE, 2002.
[2] Gillian Dobbie, Wu Xiaoying, Tok Wang Ling, Mong Li Lee. ORA-SS: An Object-Relationship-Attribute Model for Semistructured Data. TR21/00, Technical Report, Department of Computer Science, National University of Singapore, December 2000.
[3] eXcelon. An General XML Data Manager. http://www.exceloncorp.com/
[4] H. V. Jagadish, Shurug Al-Khalifa, et al. TIMBER: A Native XML Database. Technical Report, University of Michigan, April 2002.
[5] Xiaofeng Meng, Daofeng Luo, Mong Li Lee, Jing An. OrientStore: A Schema Based Native XML Storage System. In Proceedings of the 29th VLDB Conference, Berlin, Germany, 2003.
[6] J. McHugh, S. Abiteboul, R. Goldman, D. Quass, and J. Widom. Lore: A Database Management System for Semistructured Data. SIGMOD Record, Vol. 26(3): 54-66, September 1997.
[7] Lucian Popa, Mauricio A. Hernandez, Yannis Velegrakis, Renee J. Miller, Felix Naumann, Howard Ho. Mapping XML and Relational Schemas with Clio. In ICDE 2002 Demo, 2002.