Title: On View Support for a Native XML DBMS
- Ting Chen, Tok Wang Ling
- School of Computing, National University of Singapore
- Daofeng Luo, Xiaofeng Meng
- Information School, Renmin University of China
Slide 2: Outline
- View for XML Documents
  - Two Main Approaches
  - Problems
- ORA-SS (Object-Relationship-Attribute Model for Semi-structured Data)
  - ORA-SS class diagram and instance diagram
  - ORA-SS for view schema definition
- Element-Based Clustering (EBC)
  - Basic Approach
  - ORA-SS and EBC
- XML View Transformation
  - Problem Definition
  - Algorithm
- Conclusion
Slide 3: View for XML Documents
- Two main approaches to defining views
  - Define views in script languages such as XQuery or XSLT
    - General, but demanding for users, because XQuery and XSLT scripts are complex
    - Difficult to optimize from a performance point of view
  - Define views by schema mapping
    - E.g. Clio [7] and eXcelon [3]
    - A declarative approach that relieves users from writing complex scripts to perform view transformation
    - Schema mappings can then be translated into XQuery (or XSLT) scripts
- We focus on the problem of view transformation through schema mapping
Slide 4: View for XML Documents
- Views for XML documents via schema mapping
  - Problem: current XML schema formats cannot express views with semantic constraints, which results in ambiguity
  - E.g. (next slide): the source XML file contains information about researchers working on different projects and the publication list of each researcher.
Slide 5: View for XML Documents
The view schema in Fig. (c) of the diagram above is such an example. It has at least two possible meanings, which lead to different view results:
1. For each project, list all the papers published by project members; for each paper of the project, list all the authors of the paper.
2. For each project, list all the papers published by project members; for each paper of the project, list only those authors of the paper who work for the project.
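To make the ambiguity concrete, here is a minimal Python sketch that evaluates both readings over a tiny hypothetical source document. The document is flattened into (project, researcher, paper) facts, one per Project/Researcher/Paper path. The data and the function names `meaning_1` and `meaning_2` are illustrative assumptions, not part of the system described here.

```python
# Hypothetical source facts: one (project, researcher, paper) triple per
# Project/Researcher/Paper path in a small source document.
FACTS = [
    ("j1", "r1", "p1"), ("j1", "r2", "p1"), ("j1", "r2", "p2"),
    ("j2", "r2", "p1"), ("j2", "r2", "p2"), ("j2", "r3", "p2"),
]

def meaning_1(facts):
    """Reading 1: every paper by a project member, with ALL of its authors."""
    view = {}
    for project, _, paper in facts:
        view.setdefault(project, {})[paper] = set()
    for papers in view.values():
        for paper in list(papers):
            # Global search: authors of this paper anywhere in the document.
            papers[paper] = {r for _, r, p in facts if p == paper}
    return view

def meaning_2(facts):
    """Reading 2: the same papers, but only authors working for the project."""
    view = {}
    for project, researcher, paper in facts:
        view.setdefault(project, {}).setdefault(paper, set()).add(researcher)
    return view

print(meaning_1(FACTS)["j1"]["p2"])  # authors r2 and r3
print(meaning_2(FACTS)["j1"]["p2"])  # author r2 only
```

Both readings list the same papers per project. They differ only in which authors appear under each paper, which is exactly the choice a DTD-style view schema cannot express.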
Slide 6: ORA-SS Data Model
- ORA-SS [2]
  - Object class
  - Relationship type
  - Attribute (object attribute or relationship attribute)
- E.g. an ORA-SS instance diagram
Compared with the XML document in Slide 5, the ORA-SS instance diagram above adds two extra fields: the attribute Date and the sub-element Position.
Slide 7: ORA-SS Data Model
- There are two binary relationship types in the schema: Project-Researcher (JR) and Researcher-Paper (RP). The set of papers under a researcher does not depend on the project he/she works in.
- Position is an attribute of relationship type JR instead of Researcher. This means that a researcher may hold different positions in the different projects he/she works in.
- Date is a single-valued attribute of object class Paper. Different occurrences of the same paper always have the same Date value.
- J_Name, R_Name and P_ID are the identifiers of object classes Project, Researcher and Paper respectively, as indicated by solid circles. Identifier values are used to tell whether two object occurrences are identical.
Slide 8: ORA-SS Data Model
- ORA-SS for view schema definition
  - It can define views with different semantics
  - View (a) has two binary relationship types. The view schema is intended to find all the papers published by researchers in a project and, for each paper, to find all of its authors.
  - View (b) has only one ternary relationship type. The view is defined to find all the papers published by researchers in a project; however, for each paper, View (b) only finds those authors working for the project.
[Figure: view schemas View (a) and View (b)]
Slide 9: ORA-SS Data Model
- ORA-SS for view schema definition
  - The two view schemas in Slide 8 correspond to two different XSLT (2.0) scripts. The following is the XSLT script for schema (a):

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <root>
          <xsl:for-each-group select="root/Project" group-by="@J_Name">
            <Project>
              <J_Name><xsl:value-of select="@J_Name"/></J_Name>
              <xsl:for-each-group select="current-group()/Researcher/Paper" group-by="@P_Name">
                <Paper>
                  <xsl:variable name="vPName" select="@P_Name"/>
                  <P_Name><xsl:value-of select="@P_Name"/></P_Name>
                  <!-- search the whole document for every author of this paper -->
                  <xsl:for-each-group select="/root/Project/Researcher[Paper/@P_Name = $vPName]" group-by="@R_Name">
                    <Researcher>
                      <R_Name><xsl:value-of select="@R_Name"/></R_Name>
                    </Researcher>
                  </xsl:for-each-group>
                </Paper>
              </xsl:for-each-group>
            </Project>
          </xsl:for-each-group>
        </root>
      </xsl:template>
    </xsl:stylesheet>
Slide 10: ORA-SS Data Model
- The following is the XSLT script for schema (b).
- The main difference between the two scripts lies in the third xsl:for-each-group directive, the one for Researcher. The script for schema (a) must search the whole document to find the complete author list of a paper, because the authors may not work for the same project. The script for schema (b), on the other hand, avoids this global search, because it only needs to find the authors of the paper who work for the same project.

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <root>
          <xsl:for-each-group select="root/Project" group-by="@J_Name">
            <Project>
              <J_Name><xsl:value-of select="@J_Name"/></J_Name>
              <xsl:for-each-group select="current-group()/Researcher/Paper" group-by="@P_Name">
                <Paper>
                  <xsl:variable name="vPName" select="@P_Name"/>
                  <P_Name><xsl:value-of select="@P_Name"/></P_Name>
                  <!-- only the researchers of this project who wrote this paper -->
                  <xsl:for-each-group select="current-group()/.." group-by="@R_Name">
                    <Researcher>
                      <R_Name><xsl:value-of select="@R_Name"/></R_Name>
                    </Researcher>
                  </xsl:for-each-group>
                </Paper>
              </xsl:for-each-group>
            </Project>
          </xsl:for-each-group>
        </root>
      </xsl:template>
    </xsl:stylesheet>
Slide 11: Element-Based Clustering (EBC)
- Element-Based Clustering (EBC) [5]
  - An extension of the Element-Based (EB) strategy [6]
  - Element nodes (records) with the same tag name are clustered and organized as a list
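Here is a minimal sketch of the clustering idea. It is not the EBC implementation from [5]. The Python function below walks an XML document in document order with the standard `xml.etree.ElementTree` module and groups element nodes by tag name, so each cluster is a list already sorted in document order. The sample document and the function name `ebc_clusters` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
  <Project J_Name="j1">
    <Researcher R_Name="r1"><Paper P_Name="p1"/></Researcher>
    <Researcher R_Name="r2"><Paper P_Name="p1"/><Paper P_Name="p2"/></Researcher>
  </Project>
  <Project J_Name="j2">
    <Researcher R_Name="r2"><Paper P_Name="p1"/></Researcher>
  </Project>
</root>"""

def ebc_clusters(xml_text):
    """Map each tag name to the list of its elements, in document order."""
    clusters = {}
    # Element.iter() yields the element and its descendants depth-first,
    # i.e. in document (pre-)order, so every list comes out sorted.
    for elem in ET.fromstring(xml_text).iter():
        clusters.setdefault(elem.tag, []).append(elem)
    return clusters

clusters = ebc_clusters(SAMPLE)
print([e.get("R_Name") for e in clusters["Researcher"]])  # ['r1', 'r2', 'r2']
```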
Slide 12: Element-Based Clustering (EBC)
- EBC node labeling
  - EBC assigns a label to every node in an XML document
  - Labels for the nodes of an XML data tree are computed as follows:
    1. The root element has label nil
    2. Perform a pre-order traversal (i.e., document order) of the XML document; for each node x, where "." denotes string concatenation:
       label(x) = label(x.parent) . (position of x in x.parent's childList)
  - Node A is an ancestor of node B if and only if Label(A) is a proper prefix of Label(B), compared component by component (so 1.1 is a prefix of 1.1.3 but not of 1.10)
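The labeling rule can be sketched in Python as follows. Labels are kept as tuples of positions rather than dotted strings, which turns the ancestor test into a component-wise prefix check; with plain strings, "1.1" would wrongly look like a prefix of "1.10". The function names and the sample document are hypothetical. The sample mirrors the j1 part of the instance on the next slide, and the printed labels match the labels shown there.

```python
import xml.etree.ElementTree as ET

def ebc_labels(xml_text):
    """Return (tag, label) pairs in document order. The root gets the empty
    label (nil); every other node gets its parent's label extended by its
    1-based position among the parent's children."""
    out = []

    def visit(node, label):
        out.append((node.tag, label))  # pre-order: node before its children
        for pos, child in enumerate(node, start=1):
            visit(child, label + (pos,))

    visit(ET.fromstring(xml_text), ())
    return out

def is_ancestor(a, b):
    """A is an ancestor of B iff label(A) is a proper prefix of label(B)."""
    return len(a) < len(b) and b[:len(a)] == a

def dotted(label):
    return ".".join(map(str, label)) or "nil"

doc = ('<root><Project J_Name="j1">'
       '<Researcher R_Name="r1"><Paper P_ID="p1"/></Researcher>'
       '<Researcher R_Name="r2"><Paper P_ID="p1"/><Paper P_ID="p2"/>'
       '<Position>Leader</Position></Researcher></Project></root>')
for tag, label in ebc_labels(doc):
    print(tag, dotted(label))  # e.g. "Position 1.2.3" on the last line
```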
Slide 13: ORA-SS and EBC
- How can an ORA-SS schema help tune XML document storage?
EBC labels are given in parentheses:
  Project: j1 (1), j2 (2)
  Researcher: r1 (1.1), r2 (1.2), r2 (2.1), r3 (2.2)
  Position: Leader (1.2.3), Staff (2.1.3), Leader (2.2.2)
  Paper (P_ID, Date): p1, 05/2002 (1.1.1); p1, 05/2002 (1.2.1); p2, 03/2000 (1.2.2); p1, 05/2002 (2.1.1); p1, 03/2000 (2.1.2); p2, 05/2002 (2.2.1)
Slide 14: ORA-SS and EBC
- 1. The object identifier of an object is stored together with the object. Objects of the same class form a cluster, and each cluster is a sequential file.
- 2. Relationship attribute values are stored in separate clusters.
- 3. For object attribute values, we need some heuristics.
  - It is tempting to store object attributes together with the object, since they are likely to be accessed at the same time. However, if an object class has too many object attributes, storing all attribute values with the object leads to a situation similar to the Subtree-Based storage strategy. One solution is to store only the essential object attributes (which can be chosen by users) with the object and to leave the others in separate clusters.
- 4. Node labels are stored together with the nodes, whether the node is an object, a relationship attribute value, or an object attribute value.
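The four rules can be sketched in Python as below. The sketch assumes a hypothetical per-class configuration read off the ORA-SS schema, naming the identifier attribute and the attributes the user marked as essential. Every record carries its EBC label (rule 4), and each cluster is sorted by label as a stand-in for a sequential file. The `SCHEMA` table, the `Title` attribute, and the `Class.attribute` cluster naming are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical storage settings derived from an ORA-SS schema.
SCHEMA = {
    "Researcher": {"id": "R_Name", "essential": []},
    "Paper": {"id": "P_ID", "essential": ["Date"]},
}

def build_clusters(objects, rel_attrs):
    """objects: (class, label, attrs) triples.
    rel_attrs: (relationship, attribute, label, value) tuples.
    Returns a dict from cluster name to label-sorted records."""
    clusters = defaultdict(list)
    for cls, label, attrs in objects:
        cfg = SCHEMA[cls]
        inline = {a: attrs[a] for a in cfg["essential"] if a in attrs}
        # Rules 1 and 4: identifier, essential attributes and label stay
        # with the object in the cluster of its class.
        clusters[cls].append((label, attrs[cfg["id"]], inline))
        # Rule 3: the remaining object attributes get their own clusters.
        for a, v in attrs.items():
            if a != cfg["id"] and a not in cfg["essential"]:
                clusters[f"{cls}.{a}"].append((label, v))
    # Rule 2: relationship attribute values are stored in separate clusters.
    for rel, attr, label, value in rel_attrs:
        clusters[f"{rel}.{attr}"].append((label, value))
    for records in clusters.values():
        records.sort(key=lambda rec: rec[0])  # document order by label
    return dict(clusters)

clusters = build_clusters(
    [("Researcher", (1, 2), {"R_Name": "r2"}),
     ("Paper", (1, 2, 1), {"P_ID": "p1", "Date": "05/2002", "Title": "T1"})],
    [("JR", "Position", (1, 2, 3), "Leader")],
)
```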
Slide 15: View Transformation: Problem Definition
- Problem definition
  - Given an XML document D1, a source schema V1, and a valid view schema V2 of V1, transform D1 into a document D2 such that D2 is valid under V2.
[Figure: View Transformation, relating the source schema, the view schema, the user-defined schema mapping, the source document, and the view document]
Slide 16: View Transformation
- A view schema defined in a DTD can have different interpretations, which result in different views
[Figure: the ambiguous view schema Proj → Paper → Researcher ("?") can be read either as Slide 8(a), with binary relationship types JP and PR, or as Slide 8(b), with the ternary relationship type JPR]
Slide 17: View Transformation
- A view schema expressed in ORA-SS is unambiguous
  - The relationship set of an ORA-SS view schema clearly defines how the view document should be constructed
- Two basic techniques are used to construct a single relationship of the view schema
  - Structural join, SJ (based on object labels)
    - For each relationship R in the view schema, we first use structural join to find the set of paths of type R such that the object occurrences in each path lie on the same path in the source document.
  - Value join, Merge (based on logical object identifiers)
    - In an XML document an object can have many occurrences. Two occurrences are considered the same if they have identical object identifiers. The set of paths resulting from the structural join is then value-joined (merged) using the logical object keys.
Slide 18: View Transformation
- What makes things complicated?
  - A path in the view schema can contain more than one relationship, and the relationships must be joined together. E.g. the view schema in Slide 8(a) contains two relationships.
    - Two relationships are joined on their overlapping object classes, so the results constructed for one relationship may be used in the construction of another.
  - A value join (merge) based on logical keys destroys the sortedness of the output path list of the structural join. Consequently, the output of the value join cannot be used efficiently in a subsequent structural join with paths from other relationships (structural join requires sorted input lists).
    - Solution: Duplicate-Preserving Merge (D-Merge). It keeps the structural join output path list intact. For two occurrences of the same object in the list, their child contents are merged, and each occurrence then gets its own copy of the merged content.
    - D-Merge is used only when a relationship needs to join with another relationship.
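Here is a minimal Python sketch of the difference between Merge and D-Merge. It assumes the input rows are (paper label, paper key, author key) triples produced by a structural join of Researcher and Paper occurrences, sorted on the paper label. The row format and the function names are hypothetical.

```python
# Hypothetical SJ output: (paper label, paper key, author key), sorted on label.
ROWS = [
    ((1, 1, 1), "p1", "r1"), ((1, 2, 1), "p1", "r2"), ((1, 2, 2), "p2", "r2"),
    ((2, 1, 1), "p1", "r2"), ((2, 1, 2), "p2", "r2"), ((2, 2, 1), "p2", "r3"),
]

def merge(rows):
    """Plain value join on the paper key: one entry per logical paper.
    Labels are dropped, so the result cannot feed a later structural join."""
    authors = {}
    for _, pkey, rkey in rows:
        authors.setdefault(pkey, set()).add(rkey)
    return authors

def d_merge(rows):
    """Duplicate-preserving merge: one entry per input occurrence, in the
    same label order, each with its own copy of the merged author set."""
    authors = merge(rows)
    return [(label, pkey, set(authors[pkey])) for label, pkey, _ in rows]

out = d_merge(ROWS)
assert [r[0] for r in out] == [r[0] for r in ROWS]  # still sorted on labels
```

Because `d_merge` keeps one entry per occurrence in label order, its output can still be structurally joined with, say, the Project list. Each paper occurrence then already carries the complete author list.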
Slide 19: View Transformation Algorithm
- The relationship set of a view schema determines the view transformation process.
- Take the two view schemas in Slide 8 as an example (J = Project, P = Paper, R = Researcher):
  - View (a)
    - L ← SJ(list(R), list(P), {P, R})
    - L ← D-Merge(L, {P})
    - L ← SJ(L, list(J), {J, P})
    - L ← Merge(L, {J})
  - View (b)
    - L ← SJ(list(J), list(P), {J, P})
    - L ← SJ(L, list(R), {J, P, R})
    - L ← Merge(L, {J})
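The two plans can be traced on a small example. The Python sketch below uses hypothetical sample occurrences, each an (EBC label, logical key) pair. Simple nested-loop joins stand in for the sort-based structural join. The comments name the plan step each line implements, and the results show that the plans yield different author lists, matching the two readings discussed for Slide 5.

```python
# Hypothetical occurrences: (EBC label, logical key).
J = [((1,), "j1"), ((2,), "j2")]
R = [((1, 1), "r1"), ((1, 2), "r2"), ((2, 1), "r2"), ((2, 2), "r3")]
P = [((1, 1, 1), "p1"), ((1, 2, 1), "p1"), ((1, 2, 2), "p2"),
     ((2, 1, 1), "p1"), ((2, 1, 2), "p2"), ((2, 2, 1), "p2")]

def on_path(x, y):
    """Labels x and y lie on one root-to-leaf path (one prefixes the other)."""
    n = min(len(x), len(y))
    return x[:n] == y[:n]

def view_a(J, R, P):
    rp = [(r, p) for r in R for p in P if on_path(r[0], p[0])]   # SJ {P,R}
    authors = {}                                                 # D-Merge {P}
    for (_, rkey), (_, pkey) in rp:
        authors.setdefault(pkey, set()).add(rkey)
    lp = [(p, authors[p[1]]) for _, p in rp]   # every occurrence kept
    jp = [(j, p, a) for j in J for p, a in lp if on_path(j[0], p[0])]  # SJ {J,P}
    view = {}                                                    # Merge {J}
    for (_, jkey), (_, pkey), a in jp:
        view.setdefault(jkey, {}).setdefault(pkey, set()).update(a)
    return view

def view_b(J, R, P):
    jp = [(j, p) for j in J for p in P if on_path(j[0], p[0])]   # SJ {J,P}
    jpr = [(j, p, r) for j, p in jp for r in R                   # SJ {J,P,R}
           if on_path(j[0], r[0]) and on_path(p[0], r[0])]
    view = {}                                                    # Merge {J}
    for (_, jkey), (_, pkey), (_, rkey) in jpr:
        view.setdefault(jkey, {}).setdefault(pkey, set()).add(rkey)
    return view
```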
Slide 20: View Transformation Algorithm
- Structural join
  - Based on object labels
  - Binary structural join [1]
    - Input
      - Two node lists sorted on node numbers: AList of potential ancestor nodes and DList of potential descendant nodes
    - Output
      - OutputList of join results (ai, dj), in which ai, from AList, is a parent/ancestor of dj, from DList
  - Because the EBC scheme stores elements with the same tag name in pre-order (i.e., sorted on element number), structural join can be applied naturally
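For illustration, here is a Python sketch of a stack-based binary structural join in the spirit of the Stack-Tree-Desc algorithm of [1]. It is adapted to EBC-style labels stored as tuples, whereas the original algorithm works on region-encoded positions. Both input lists must be sorted in document order, which for tuple labels is plain lexicographic order. The function names are hypothetical.

```python
def is_ancestor(a, d):
    """Proper component-wise prefix test on tuple labels."""
    return len(a) < len(d) and d[:len(a)] == a

def structural_join(alist, dlist):
    """Return (a, d) pairs with a in alist an ancestor of d in dlist.
    Both lists must be sorted in document order; the output is sorted on d."""
    out, stack, i = [], [], 0
    for d in dlist:
        # Push every potential ancestor that precedes d in document order.
        while i < len(alist) and alist[i] < d:
            a = alist[i]
            i += 1
            while stack and not is_ancestor(stack[-1], a):
                stack.pop()
            stack.append(a)
        # Drop entries that cannot contain d; what remains is a chain of
        # nested nodes, all of them ancestors of d.
        while stack and not is_ancestor(stack[-1], d):
            stack.pop()
        out.extend((a, d) for a in stack)
    return out
```

The test `alist[i] < d` relies on Python's lexicographic tuple comparison, in which a label sorts before all of its extensions. A popped node precedes the current descendant without containing it, so it cannot contain any later descendant either. That is why each input list is scanned only once.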
Slide 21: View Transformation Algorithm
- Structural join
  - Based on object labels
  - Complex structural join (example: structural join of three sorted input lists)
    - Input
      - Sorted node lists A, B, C
    - Output
      - OutputList of join results (ai, bj, ck), in which ai, bj, ck (from lists A, B, C respectively) lie on the same path in the source document
    - Two binary joins
      - Step 1: join A and B
        - OutputList AB of (ai, bj), sorted on ai
      - Step 2: join AB (using ai as the node label) and C
        - OutputList of (ai, bj, ck), sorted on ai
      - Important: ck must be on the same path as both ai AND bj
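The two-step scheme can be sketched in Python as below. For brevity, a nested-loop binary join stands in for the stack-based one, and "on the same path" is checked as a mutual prefix test on tuple labels. The function names and the sample labels (Project, Paper and Researcher occurrences) are assumptions for illustration.

```python
def is_ancestor(a, d):
    return len(a) < len(d) and d[:len(a)] == a

def same_path(x, y):
    """True if one label is a component-wise prefix of the other."""
    n = min(len(x), len(y))
    return x[:n] == y[:n]

def binary_join(alist, dlist):
    # Nested-loop stand-in for a stack-based binary structural join.
    return [(a, d) for a in alist for d in dlist if is_ancestor(a, d)]

def three_way_join(A, B, C):
    # Step 1: join A and B; sorting orders the AB list on a_i.
    ab = sorted(binary_join(A, B))
    # Step 2: join AB with C using a_i as the label, and additionally
    # require c_k to lie on the same path as b_j.
    return [(a, b, c) for a, b in ab for c in C
            if same_path(a, c) and same_path(b, c)]

projects = [(1,), (2,)]
papers = [(1, 1, 1), (1, 2, 1), (1, 2, 2), (2, 1, 1), (2, 1, 2), (2, 2, 1)]
researchers = [(1, 1), (1, 2), (2, 1), (2, 2)]
result = three_way_join(projects, papers, researchers)
```

The extra check against b_j is what the "Important" bullet asks for. Checking c_k against a_i alone would wrongly pair paper 1.1.1 with researcher 1.2, since both lie under project 1.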
Slide 22: View Transformation Algorithm
[Figure: source document; source schema Proj → Researcher → Paper with binary relationship types JR and RP; view schema Proj → Paper → Researcher with binary relationship types JP and PR]
Slide 23: View Transformation Algorithm
- Data structures
  - Record
    - Object label
    - Object key (object identifier)
    - ChildList: an array of references to child records
  - Tuple
    - A tuple consists of an array of records
    - A tuple has a root record
- Example: the record array (indexes 0 to 3) of a tuple and the corresponding tree
  Index 0: Record r1, ChildList <1, 2>, ROOT
  Index 1: Record r2, ChildList <3>
  Index 2: Record r3, ChildList nil
  Index 3: Record r4, ChildList nil
  Corresponding tree: r1 is the root, with children r2 and r3; r4 is a child of r2.
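Here is a minimal Python rendering of these data structures. It assumes, as in the example, that the root record sits at index 0 of the array and that ChildList holds array indexes. The class name `ViewTuple` and the sample labels and keys are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    label: tuple  # object label (EBC), used by structural join
    key: str      # object identifier, used by Merge and D-Merge
    child_list: list = field(default_factory=list)  # indexes into the record array

@dataclass
class ViewTuple:
    records: list  # the record at index 0 is the root record

    def root(self):
        return self.records[0]

    def children(self, i):
        return [self.records[c] for c in self.records[i].child_list]

# The example above: r1 (ROOT) has children r2 and r3; r2 has child r4.
t = ViewTuple([
    Record((1,), "r1", [1, 2]),
    Record((1, 1), "r2", [3]),
    Record((1, 2), "r3"),
    Record((1, 1, 1), "r4"),
])
```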
Slide 24: View Transformation Algorithm
- Operations
  - Structural Join (SJ)
    - Input
      - L1: array of tuples of type <A1, A2, A3, ..., An>
      - L2: array of tuples of type <B1, B2, B3, ..., Bn>
      - Mask M: bit array specifying the object classes that participate in the structural join
    - Output
      - L3: array of tuples of type <A1, A2, A3, ..., An, B1, B2, B3, ..., Bn>
      - For each tuple in L3, its objects whose types are specified in M MUST lie on the same path in the source document
  - Merge
    - Input
      - L1: array of tuples of type <A1, A2, A3, ..., An>
      - Mask M: bit array specifying a list of object classes
    - Output
      - L: array of tuples
      - Any two tuples t1 and t2 in L1 that have the same set of objects of the types specified in M are merged.
  - Duplicate-Preserving Merge (D-Merge)
    - Input
      - L1: array of tuples of type <A1, A2, A3, ..., An>
Note: the Structural Join operation uses object numbers (labels), while Merge and D-Merge use object identifiers.
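Here is a nested-loop Python sketch of the masked Structural Join and of Merge. Each tuple is assumed to be a Python tuple of (class, label, key) objects, and the mask a set of class names. As the note says, SJ compares labels and Merge compares keys. This Merge simply groups the non-masked objects under the masked keys, a simplification of merging child contents. All names and sample data are hypothetical. The usage follows View (b) but merges on {J, P} to expose the authors of each project's papers.

```python
from itertools import product

def on_same_path(x, y):
    n = min(len(x), len(y))
    return x[:n] == y[:n]

def structural_join(l1, l2, mask):
    """Concatenate t1 + t2 when all objects of the masked classes lie on one path."""
    out = []
    for t1, t2 in product(l1, l2):
        t = t1 + t2
        labels = [label for cls, label, _ in t if cls in mask]
        # Pairwise comparable labels form a chain, i.e. a single path.
        if all(on_same_path(a, b) for a in labels for b in labels):
            out.append(t)
    return out

def merge(l1, mask):
    """Merge tuples that agree on the keys of the masked objects."""
    groups = {}
    for t in l1:
        k = tuple(key for cls, _, key in t if cls in mask)
        groups.setdefault(k, set()).update(
            (cls, key) for cls, _, key in t if cls not in mask)
    return groups

J = [(("J", (1,), "j1"),), (("J", (2,), "j2"),)]
P = [(("P", lab, key),) for lab, key in [((1, 1, 1), "p1"), ((1, 2, 1), "p1"),
     ((1, 2, 2), "p2"), ((2, 1, 1), "p1"), ((2, 1, 2), "p2"), ((2, 2, 1), "p2")]]
R = [(("R", lab, key),) for lab, key in [((1, 1), "r1"), ((1, 2), "r2"),
     ((2, 1), "r2"), ((2, 2), "r3")]]
L = structural_join(J, P, {"J", "P"})
L = structural_join(L, R, {"J", "P", "R"})
authors = merge(L, {"J", "P"})
```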
Slide 25: View Transformation Algorithm
- Definition 1: Relationship R1 is lower than R2, written R1 < R2, if the top-most participating object class of R1 is a descendant of the top-most participating object class of R2.
- Definition 2: A relationship is independent if its set of participating object classes is not included in that of any other relationship. Otherwise the relationship is nested.
- Algorithm (main steps)
  - Step 0. Initialize L1 = null, L2 = null, M = ∅ // M contains the set of object classes that have been structurally joined
    - L_O = null for each O in the view schema
  - Step 1. Put the independent relationships of the ORA-SS view schema into a partially ordered set (S, <)
  - Step 2. While (S is not empty)
    - Extract the first relationship R from S
    - If (M ∩ R is not empty)
      - L1 = L_O for the highest O in M ∩ R (the O with the smallest level in the view schema)
    - Else
      - L1 = null
    - For each object class O in (R - M), sorted decreasingly by depth in the view schema
      - M = M ∪ {O}
      - L2 = Cluster_O // Cluster_O is an array storing the objects of class O
      - if (L1 != null)
        - L1 = StructuralJoin(L1, L2, M ∩ R)
      - else
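Step 1 can be sketched in Python as follows. The sketch assumes that relationships are given as tuples of class names and that each class's depth in the view schema is known. If R1 < R2, then R1's top-most class is deeper than R2's, so sorting by the depth of the top-most participating class, deepest first, gives one linear extension of the partial order of Definition 1. The function name is hypothetical. For the view schema of Slide 8(a), it yields PR before JP, the order used in the worked example.

```python
def order_relationships(relationships, depth):
    """relationships: name -> tuple of participating object classes.
    depth: object class -> level in the view schema (root class = 0).
    Returns the independent relationships, lower ones first."""
    # Definition 2: drop relationships whose classes are included in another's.
    independent = [
        name for name, classes in relationships.items()
        if not any(set(classes) <= set(other)
                   for o, other in relationships.items() if o != name)
    ]

    def top_depth(name):
        # Depth of the top-most (smallest-level) participating class.
        return min(depth[c] for c in relationships[name])

    # Deepest top-most class first; sorted() stays stable with reverse=True.
    return sorted(independent, key=top_depth, reverse=True)

depth = {"Proj": 0, "Paper": 1, "Researcher": 2}
print(order_relationships({"JP": ("Proj", "Paper"),
                           "PR": ("Paper", "Researcher")}, depth))  # ['PR', 'JP']
```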
Slide 26: View Transformation Algorithm
- Example: view schema of Slide 8(a)
  - Step 1: Construct relationship Paper-Researcher
    - Step 1.1. L ← SJ(list(P), list(R), {P, R})
    - Step 1.2. L ← D-Merge(L, {P})
  - S = {PR, JP}
[Figure: source document and view schema Proj → Paper → Researcher, with binary relationship types JP and PR]
Slide 27: View Transformation Algorithm
- Example: view schema of Slide 8(a)
  - Step 2: Construct relationship Project-Paper
    - Step 2.1. L ← SJ(L, list(J), {J, P})
    - Step 2.2. L ← Merge(L, {J})
  - S = {JP}
[Figure: source document and view schema Proj → Paper → Researcher, with binary relationship types JP and PR]
Slide 28: Conclusion
- We demonstrate how to combine an efficient XML storage scheme (EBC) and an expressive XML data model (ORA-SS) to provide view support for a native XML DBMS.
- ORA-SS, used for view schema definition, can express a great variety of constraints graphically. More importantly, it avoids the ambiguity that typically arises when DTD or XML Schema is used as the view schema format.
- In our transformation method, a relationship type is the basic unit of transformation. Both object-label-based structural join and logical-key-based value join are employed to construct view results. ORA-SS view schema information can guide correct view transformation.
Slide 29: References
- [1] Shurug Al-Khalifa, H. V. Jagadish, Nick Koudas, Jignesh M. Patel, Divesh Srivastava, Yuqing Wu. Structural Joins: A Primitive for Efficient XML Query Pattern Matching. In Proceedings of ICDE, 2002.
- [2] Gillian Dobbie, Wu Xiaoying, Tok Wang Ling, Mong Li Lee. ORA-SS: An Object-Relationship-Attribute Model for Semistructured Data. Technical Report TR21/00, Department of Computer Science, National University of Singapore, December 2000.
- [3] eXcelon. A General XML Data Manager. http://www.exceloncorp.com/
- [4] H. V. Jagadish, Shurug Al-Khalifa, et al. TIMBER: A Native XML Database. Technical Report, University of Michigan, April 2002.
- [5] Xiaofeng Meng, Daofeng Luo, Mong Li Lee, Jing An. OrientStore: A Schema Based Native XML Storage System. In Proceedings of the 29th VLDB Conference, Berlin, Germany, 2003.
- [6] J. McHugh, S. Abiteboul, R. Goldman, D. Quass, J. Widom. Lore: A Database Management System for Semistructured Data. SIGMOD Record, 26(3):54-66, September 1997.
- [7] Lucian Popa, Mauricio A. Hernandez, Yannis Velegrakis, Renee J. Miller, Felix Naumann, Howard Ho. Mapping XML and Relational Schemas with Clio. In ICDE 2002 (Demo), 2002.