On View Support for a Native XML DBMS

1
On View Support for a Native XML DBMS
  • Ting Chen, Tok Wang Ling
  • School of Computing, National University of
    Singapore
  • Daofeng Luo, Xiaofeng Meng
  • Information School, Renmin University of
    China

2
Outline
  • View for XML Documents
  • Two Main Approaches
  • Problems
  • ORA-SS: Object-Relationship-Attribute Model for
    Semi-structured Data
  • ORA-SS class diagram and instance diagram
  • ORA-SS for view schema definition
  • Element-Based Clustering (EBC)
  • Basic Approach
  • ORA-SS and EBC
  • XML View Transformation
  • Problem Definition
  • Algorithm
  • Conclusion

3
View for XML Documents
  • Two main approaches to defining views
  • Define views in script languages such as XQuery or
    XSLT
  • General, but demanding from the user's point of view
    because XQuery and XSLT scripts are complex
  • Difficult to optimize for performance
  • Define views by schema mapping
  • E.g. Clio [7] and eXcelon [3]
  • A declarative approach relieves users of
    writing complex scripts to perform view
    transformation
  • Schema mappings can then be translated into
    XQuery (or XSLT) scripts
  • We focus on the problem of view transformation
    through schema mapping

4
View for XML Documents
  • View for XML Documents via Schema-Mapping
  • Problem: Current XML schema formats cannot
    express views with semantic constraints,
    resulting in ambiguity
  • E.g. (next slide) the source XML file contains
    information about researchers working on
    different projects and the publication list of
    each researcher.

5
View for XML Documents
The view schema in Fig. (c) of the above diagram
is such an example. It has at least two possible
meanings, which can lead to different view
results:
  1. For each project, list all the papers
     published by project members; for each paper of
     the project, list all the authors of the
     paper.
  2. For each project, list all the papers
     published by project members; for each paper of
     the project, list only those authors of the paper
     who work for the project.
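
The two readings can be made concrete with a small sketch. The data below is hypothetical (it is not the figure's document), and the names source, authors_anywhere and view are illustrative only. The point is simply that the same source yields different author lists under the two meanings.

```python
# Hypothetical sample data: project -> researcher -> papers by that researcher.
source = {
    "j1": {"r1": ["p1"], "r2": ["p1", "p2"]},
    "j2": {"r2": ["p1", "p2"], "r3": ["p2"]},
}

def authors_anywhere(paper):
    """All researchers in the whole document who wrote `paper`."""
    return sorted({r for members in source.values()
                   for r, papers in members.items() if paper in papers})

def view(project_only):
    """Build project -> paper -> authors under one of the two meanings."""
    result = {}
    for project, members in source.items():
        papers = sorted({p for ps in members.values() for p in ps})
        result[project] = {}
        for paper in papers:
            if project_only:
                # Meaning 2: only authors who work for this project.
                authors = sorted(r for r, ps in members.items() if paper in ps)
            else:
                # Meaning 1: every author of the paper, from any project.
                authors = authors_anywhere(paper)
            result[project][paper] = authors
    return result

meaning_1 = view(project_only=False)
meaning_2 = view(project_only=True)
print(meaning_1["j1"]["p2"])  # ['r2', 'r3']
print(meaning_2["j1"]["p2"])  # ['r2']
```

Under meaning 1 the author list of a paper is the same wherever the paper appears; under meaning 2 it varies by project.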
6
ORA-SS Data Model
  • ORA-SS [2]
  • Object Class
  • Relationship Type
  • Attribute (object attribute or relationship
    attribute)
  • E.g. An ORA-SS Instance Diagram

Compared with the XML document in Slide 5, two
extra fields (attribute Date and sub-element
Position) are added in the above ORA-SS instance
diagram.
7
ORA-SS Data Model
  • ORA-SS Schema Diagram
  • There are two binary relationship types in the
    schema: Project-Researcher (JR) and
    Researcher-Paper (RP). The set of papers under a
    researcher does not depend on the project he/she
    works on.
  • Position is an attribute of relationship type JR
    rather than of Researcher. This means that a
    researcher may hold different positions in the
    different projects he/she works on.
  • Date is a single-valued attribute of object
    class Paper. Different occurrences of the same
    paper always have the same Date value.
  • J_Name, R_Name and P_ID are the identifiers of object
    classes Project, Researcher and Paper
    respectively, as indicated by solid circles.
    Identifier values are used to tell whether two object
    occurrences are identical.

8
ORA-SS Data Model
  • ORA-SS for View Schema Definition
  • It is able to define views with different
    semantics
  • View (a) has two binary relationship types. The
    intention of the view schema is to find all the
    papers published by researchers in a project and,
    for each paper, to find all of its authors.
  • View (b) has only one ternary relationship type.
    The view is defined to find all the papers
    published by researchers in a project; however,
    for each paper, View (b) finds only those authors
    working for the project.

View (a)
View (b)
9
ORA-SS Data Model
  • ORA-SS for View Schema Definition
  • The two view schemas in Slide 8 correspond to two
    different XSLT scripts. The following is the XSLT
    script for schema (a)

<root>
  <xsl:for-each-group select="root/Project" group-by="@J_Name">
    <Project>
      <J_Name><xsl:value-of select="@J_Name"/></J_Name>
      <xsl:for-each-group select="current-group()/Researcher/Paper"
                          group-by="@P_Name">
        <Paper>
          <xsl:variable name="vPName" select="@P_Name"/>
          <P_Name><xsl:value-of select="@P_Name"/></P_Name>
          <xsl:for-each-group
              select="/root/Project/Researcher[Paper/@P_Name = $vPName]"
              group-by="@R_Name">
            <Researcher>
              <R_Name><xsl:value-of select="@R_Name"/></R_Name>
            </Researcher>
          </xsl:for-each-group>
        </Paper>
      </xsl:for-each-group>
    </Project>
  </xsl:for-each-group>
</root>
10
ORA-SS Data Model
  • The following is the XSLT script for schema (b)
  • The main difference between the two scripts lies in the
    third xsl:for-each-group instruction, the one for
    Researcher. The script for schema (a) needs to search
    the whole document to find the complete author
    list of a paper, because the authors may not all work for
    the same project. The script for schema (b), on the
    other hand, avoids the global search because it
    only needs to find the authors of the paper who work for
    the same project.

<root>
  <xsl:for-each-group select="root/Project" group-by="@J_Name">
    <Project>
      <J_Name><xsl:value-of select="@J_Name"/></J_Name>
      <xsl:for-each-group select="current-group()/Researcher/Paper"
                          group-by="@P_Name">
        <Paper>
          <xsl:variable name="vPName" select="@P_Name"/>
          <P_Name><xsl:value-of select="@P_Name"/></P_Name>
          <xsl:for-each-group select="current-group()/.."
                              group-by="@R_Name">
            <Researcher>
              <R_Name><xsl:value-of select="@R_Name"/></R_Name>
            </Researcher>
          </xsl:for-each-group>
        </Paper>
      </xsl:for-each-group>
    </Project>
  </xsl:for-each-group>
</root>
11
Element-Based-Clustering (EBC)
  • Element-Based Clustering [5]
  • Extension of the Element-Based (EB) [6] strategy
  • Element nodes (records) with the same tag name
    are clustered and organized as a list

12
Element-Based-Clustering (EBC)
  • EBC node labeling
  • EBC assigns a label to each node in an XML document
  • Labels for nodes in an XML data tree can be
    calculated in the following manner:
  • 1. The root element has label nil
  • 2. Perform a pre-order traversal (i.e. in
    document order) of the XML document
  • For each node x ("." here means string
    concatenation):
  • label(x) = label(x.parent) . position of x
    in x.parent's childList
  • Node A is an ancestor of node B if and only if
    Label(A) is a proper prefix of Label(B)
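
A minimal sketch of this labeling rule, assuming labels are kept as tuples of child positions rather than dotted strings (so that "1" is not mistaken for a prefix of "11.2"). The Node class and the helper names are illustrative and not part of the original system.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)
    label: tuple = ()  # () plays the role of the root's "nil" label

def assign_labels(node, label=()):
    """Pre-order traversal: label(x) = label(x.parent) . position of x."""
    node.label = label
    for position, child in enumerate(node.children, start=1):
        assign_labels(child, label + (position,))

def is_ancestor(a, b):
    """A is an ancestor of B iff label(A) is a proper prefix of label(B)."""
    return len(a.label) < len(b.label) and b.label[:len(a.label)] == a.label

def show(node):
    """Render a label in dotted form, e.g. 1.2.3 (or nil for the root)."""
    return ".".join(str(p) for p in node.label) or "nil"

# Hypothetical document: two projects, the first with two researchers.
position = Node("Position")
r1 = Node("Researcher", [Node("Paper")])
r2 = Node("Researcher", [Node("Paper"), Node("Paper"), position])
j1 = Node("Project", [r1, r2])
j2 = Node("Project")
root = Node("root", [j1, j2])

assign_labels(root)
print(show(root), show(j1), show(r2), show(position))  # nil 1 1.2 1.2.3
print(is_ancestor(j1, position), is_ancestor(j2, position))  # True False
```

Tuple comparison in Python is lexicographic, so sorting nodes by these labels also yields document order.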

13
ORA-SS and EBC
  • How does ORA-SS schema help tune XML document
    storage?

Cluster      Nodes, shown as value(label)
Project      j1(1)  j2(2)
Researcher   r1(1.1)  r2(1.2)  r2(2.1)  r3(2.2)
Position     Leader(1.2.3)  Staff(2.1.3)  Leader(2.2.2)
Paper        p1,05/2002(1.1.1)  p1,05/2002(1.2.1)  p2,03/2000(1.2.2)
             p1,05/2002(2.1.1)  p1,03/2000(2.1.2)  p2,05/2002(2.2.1)
14
ORA-SS and EBC
  • 1. The object identifier of an object is stored
    together with the object. Objects of the same
    class form a cluster. Each cluster is a
    sequential file.
  • 2. Relationship attribute values are stored
    in a separate cluster.
  • 3. For object attribute values, we need some
    heuristics.
  • It is tempting to store object attributes
    together with the object, since they are likely to
    be accessed at the same time. However, if an
    object class has too many object attributes,
    storing all attribute values together with the
    object leads to a situation similar to the
    Subtree-Based storage strategy. One solution is
    to store only the essential object attributes (as
    determined by users) with the object and leave
    the others in separate clusters.
  • 4. Node labels are stored together with the nodes
    (a node can be an object, a relationship attribute
    value or an object attribute value).

15
View Transformation Problem Definition
  • Problem Definition
  • Given an XML document D1, a source schema V1 and
    a valid view schema V2 of V1, transform D1 to
    document D2, so that D2 is a valid document under
    V2.

[Figure: view transformation diagram with the components Source Document, Source Schema, User Defined Schema Mapping, View Schema and View Document]
16
View Transformation
  • View schema defined in DTD can have different
    interpretations which result in different views

[Figure: an ambiguous view schema (Proj, Paper, Researcher) and its two ORA-SS interpretations: Slide 8(a), with binary relationship types JP and PR, and Slide 8(b), with the ternary relationship type JPR]
17
View Transformation
  • A view schema expressed in ORA-SS is unambiguous
  • The relationship set of an ORA-SS view schema
    clearly defines how the view document should be
    constructed
  • Two basic techniques are used in the construction of
    a single relationship in the view schema
  • Structural join: SJ (based on object labels)
  • For each relationship R in the view schema, we first
    use structural join to find the set of paths of
    type R such that the object occurrences in each
    path lie on the same path in the source document.
  • Value join: Merge (based on logical object
    identifiers)
  • In an XML document an object can have many
    occurrences. Two occurrences are considered
    the same if they have identical object
    identifiers. The set of paths resulting from the
    structural join is then value joined (or merged)
    using logical object keys.

18
View Transformation
  • What makes things complicated?
  • A path in the view schema can have more than one
    relationship, and we need to join the relationships
    together. E.g. the view schema in Slide 8(a) contains
    two relationships.
  • Two relationships are joined on their
    overlapping object classes, so the results
    constructed for one relationship may be used in the
    construction of another.
  • Value join (merge) based on logical keys destroys
    the sortedness of the output path list of the
    structural join. As a consequence, the
    output of a value join cannot be used efficiently in a
    subsequent structural join with paths from other
    relationships. (Structural join
    requires sorted input lists.)
  • Solution: Duplicate-Preserving Merge (D-Merge).
    It keeps the structural join output path list
    intact. For two occurrences of the same object in
    the list, their child contents are merged and
    then each of them gets its own copy of the
    merged content.
  • D-Merge is used only when a relationship needs
    to be joined with another relationship.
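
The contrast between Merge and D-Merge can be sketched as follows. This is a simplified illustration, not the system's implementation: each path is assumed to be a (label, key, children) triple, labels are tuples of child positions, and the sample paper occurrences are hypothetical.

```python
def merge(paths):
    """Value join: one entry per object key, with the children unioned.
    Labels are dropped, so the label order of the input is lost."""
    merged = {}
    for _label, key, children in paths:
        merged.setdefault(key, set()).update(children)
    return {key: sorted(children) for key, children in merged.items()}

def d_merge(paths):
    """Duplicate-preserving merge: keep every occurrence in its original
    (label-sorted) position and give each its own copy of the merged children."""
    merged = merge(paths)
    return [(label, key, list(merged[key])) for label, key, _ in paths]

# Hypothetical structural-join output: paper occurrences (sorted by label)
# paired with the researcher each occurrence sits under.
paths = [
    ((1, 1, 1), "p1", ["r1"]),
    ((1, 2, 1), "p1", ["r2"]),
    ((1, 2, 2), "p2", ["r2"]),
    ((2, 1, 1), "p1", ["r2"]),
    ((2, 1, 2), "p2", ["r2"]),
    ((2, 2, 1), "p2", ["r3"]),
]

print(merge(paths))       # {'p1': ['r1', 'r2'], 'p2': ['r2', 'r3']}
print(d_merge(paths)[0])  # ((1, 1, 1), 'p1', ['r1', 'r2'])
```

Because d_merge leaves the list in label order, its output can still be fed to a later structural join, which is the property the slide asks for.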

19
View Transformation Algorithm
  • The relationship set of a view schema determines
    the view transformation process.
  • Take the two view schemas in Slide 8 as an
    example:
  • View (a)
  • L = SJ(list(R), list(P), {P,R})
  • L = D-Merge(L, P)
  • L = SJ(L, list(J), {J,P})
  • L = Merge(L, J)
  • View (b)
  • L = SJ(list(J), list(P), {J,P})
  • L = SJ(L, list(R), {J,P,R})
  • L = Merge(L, J)

20
View Transformation Algorithm
  • Structural Join
  • Based on object labels
  • Binary Structural Join [1]
  • Input
  • Two node lists sorted on node number:
    AList of potential ancestor nodes and DList of
    potential descendant nodes
  • Output
  • OutputList of join results (ai, dj), in
    which ai is the parent/ancestor of dj, ai is
    from AList and dj is from DList
  • The EBC scheme stores elements with the same tag
    name in pre-order (i.e. sorted on element
    number), so structural join can be applied
    naturally
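
A minimal sketch of a stack-based binary structural join over two sorted lists. It assumes prefix labels stored as tuples of child positions (the reference algorithm in [1] works on region numbers instead), and the function names and sample lists are illustrative.

```python
def is_ancestor(a, d):
    """Label a is a proper prefix of label d."""
    return len(a) < len(d) and d[:len(a)] == a

def structural_join(alist, dlist):
    """Return (a, d) pairs where a is an ancestor of d.
    Both inputs must be sorted in document order; each list is scanned once."""
    out, stack, i = [], [], 0
    for d in dlist:
        # Push every potential ancestor that precedes d in document order.
        while i < len(alist) and alist[i] < d:
            a = alist[i]
            while stack and not is_ancestor(stack[-1], a):
                stack.pop()  # that subtree is finished
            stack.append(a)
            i += 1
        # Discard stacked nodes whose subtree ended before d.
        while stack and not is_ancestor(stack[-1], d):
            stack.pop()
        # Everything left on the stack is an ancestor of d.
        out.extend((a, d) for a in stack)
    return out

# Hypothetical clusters: project labels and paper labels.
projects = [(1,), (2,)]
papers = [(1, 1, 1), (1, 2, 1), (1, 2, 2), (2, 1, 1), (2, 1, 2), (2, 2, 1)]
pairs = structural_join(projects, papers)
print(pairs[0], len(pairs))  # ((1,), (1, 1, 1)) 6
```

The output here is ordered by descendant; a variant ordered by ancestor would need to buffer results per stack entry.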

21
View Transformation Algorithm
  • Structural Join
  • Based on object labels
  • Complex Structural Join (example: structural
    join of three sorted input lists)
  • Input
  • Sorted node lists A, B, C
  • Output
  • OutputList of join results (ai, bj, ck),
    in which ai, bj, ck (from lists A, B, C
    respectively) are located on the same path in the
    source document
  • Two binary joins
  • Step 1: Join A and B
  • OutputList AB of pairs (ai, bj), sorted on ai
  • Step 2: Join AB (using ai as node label) and C
  • OutputList of triples (ai, bj, ck), sorted on ai
  • Important: ck should be on the same path as both
    ai AND bj
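
The requirement that ck share a path with both ai and bj can be illustrated with a deliberately naive nested-loop sketch (not the sorted, single-pass join). Labels are assumed to be tuples of child positions; join, on_same_path and the mask argument are illustrative names, and the node lists are hypothetical.

```python
def on_same_path(x, y):
    """True if one label is a prefix of the other, i.e. both nodes lie on
    one root-to-leaf path."""
    n = min(len(x), len(y))
    return x[:n] == y[:n]

def join(tuples, nodes, mask):
    """Extend each tuple with every node that shares a path with all
    tuple positions listed in mask. Output keeps the order of `tuples`."""
    return [t + (c,) for t in tuples for c in nodes
            if all(on_same_path(t[i], c) for i in mask)]

# Hypothetical clusters: A = projects, B = papers, C = researchers.
A = [(1,), (2,)]
B = [(1, 1, 1), (1, 2, 1), (1, 2, 2), (2, 1, 1), (2, 1, 2), (2, 2, 1)]
C = [(1, 1), (1, 2), (2, 1), (2, 2)]

ab = join([(a,) for a in A], B, mask=[0])   # Step 1: join A and B
abc = join(ab, C, mask=[0, 1])              # Step 2: check against ai AND bj
loose = join(ab, C, mask=[0])               # checking ai only is too weak

print(len(ab), len(abc), len(loose))  # 6 6 12
print(abc[0])                         # ((1,), (1, 1, 1), (1, 1))
```

Checking only ai pairs each paper with every researcher of the project, which is why the second join must test both positions.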

22
View Transformation Algorithm
  • A motivating example

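[Figure: motivating example showing a source document, the source schema (Proj, Researcher, Paper with relationship types JR and RP) and the view schema (Proj, Paper, Researcher with relationship types JP and PR)]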
23
View Transformation Algorithm
  • Data Structures
  • Record
  • Object label
  • Object key (object identifier)
  • ChildList: an array of child record references
  • Tuple
  • A tuple consists of an array of records
  • A tuple has a root record
  • Example
  • Array of Tuples

Tuple array index   Record
0                   r1 (ChildList <1,2>), ROOT
1                   r2 (ChildList <3>)
2                   r3 (ChildList nil)
3                   r4 (ChildList nil)

Corresponding tree: r1 has children r2 and r3; r2 has child r4.
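
The record and tuple structures might be sketched as below. The class names Record and ViewTuple, the field names and the labels attached to r1..r4 are assumptions made for illustration; only the child lists come from the example.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    label: tuple   # object label (position in the source document)
    key: str       # object key (object identifier)
    child_list: list = field(default_factory=list)  # indices of child records

@dataclass
class ViewTuple:
    records: list  # array of records
    root: int = 0  # index of the root record

    def to_tree(self, index=None):
        """Expand the child references into a nested (key, children) tree."""
        index = self.root if index is None else index
        record = self.records[index]
        return (record.key, [self.to_tree(i) for i in record.child_list])

# The example tuple: r1 -> <1,2>, r2 -> <3>, r3 and r4 have no children.
# Labels are hypothetical placeholders.
example = ViewTuple([
    Record((1,), "r1", [1, 2]),
    Record((1, 1), "r2", [3]),
    Record((1, 2), "r3"),
    Record((1, 1, 1), "r4"),
])
print(example.to_tree())
# ('r1', [('r2', [('r4', [])]), ('r3', [])])
```

Storing children as array indices keeps a tuple flat and cheap to copy, which matters when D-Merge hands each occurrence its own copy of the merged content.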
24
View Transformation Algorithm
  • Operations
  • Structural Join (SJ)
  • Input
  • L1: Array of Tuples of type <A1, A2, A3, ..., An>
  • L2: Array of Tuples of type <B1, B2, B3, ..., Bn>
  • Mask M: Bit array which specifies the object
    classes that participate in the structural join
  • Output
  • L3: Array of Tuples of type <A1, A2, A3, ...,
    An, B1, B2, B3, ..., Bn>
  • For each tuple in L3, the objects whose types
    are specified in M must lie on the same path
    in the source document
  • Merge (Merge)
  • Input
  • L1: Array of Tuples of type <A1, A2, A3, ..., An>
  • Mask M: Bit array which specifies a list of
    object classes
  • Output
  • L: Array of Tuples
  • Any two tuples t1 and t2 in L1 which have
    the same set of objects whose types are specified
    in M are merged.
  • Duplicate-Preserving Merge (D-Merge)
  • Input
  • L1: Array of Tuples of type <A1, A2, A3, ..., An>

Note: the Structural Join operation uses object
numbers, while Merge and D-Merge use object
identifiers.
25
View Transformation Algorithm
  • Definition 1: Relationship R1 is lower than R2, or
    R1 < R2, if the top-most participating object
    class of R1 is a descendant of the top-most
    participating object class of R2.
  • Definition 2: A relationship is independent if
    its set of participating object classes is not
    included in that of any other relationship. Otherwise the
    relationship is nested.
  • Algorithm (main steps)
  • Step 0. Initialize L1 = null, L2 = null,
  • M = {},
    // M contains the set of object
    classes that have been structurally joined
  • L_O = null
    for each object class O in the view schema
  • Step 1. Put the independent relationships of the
    ORA-SS view schema into a partially ordered set
    (S, <)
  • Step 2. While (S is not empty)
  • Extract the first relationship R from S
  • If (M ∩ R is not empty)
  • L1 = L_O for the highest O (O has
    the smallest level in the view schema) in M ∩ R
  • Else
  • L1 = null
  • For each object class O in (R - M), sorted
    in decreasing order of depth in the view schema
  • M = M ∪ {O}
  • L2 = Cluster_O // Cluster_O
    is an array storing objects of class O
  • if (L1 != null)
  • L1 = StructuralJoin(L1, L2, M ∩ R)
  • else

26
View Transformation Algorithm
  • Example: view schema of Slide 8(a)
  • Step 1: Construct relationship Paper-Researcher (PR)
  • Step 1.1. L = SJ(list(P), list(R), {P,R})
  • Step 1.2. L = D-Merge(L, P)
  • S = {PR, JP}

[Figure: source document and view schema (Proj, Paper, Researcher with relationship types JP and PR)]
27
View Transformation Algorithm
  • Example: view schema of Slide 8(a)
  • Step 2: Construct relationship Project-Paper (JP)
  • Step 2.1. L = SJ(L, list(J), {J,P})
  • Step 2.2. L = Merge(L, J)
  • S = {JP}

[Figure: source document and view schema (Proj, Paper, Researcher with relationship types JP and PR)]
28
Conclusion
  • We demonstrate how to combine an efficient XML
    storage scheme (EBC) and an expressive XML data
    model (ORA-SS) to provide XML view support for a
    native XML DBMS.
  • ORA-SS, used for view schema definitions, can
    express a great variety of constraints
    graphically. More importantly, it avoids the
    ambiguity that typically arises when DTD
    or XML Schema is used as the view schema format.
  • In our transformation method, a relationship type
    is the basic unit of transformation. Both
    structural join based on object labels and
    value join based on logical keys are employed to construct view
    results. ORA-SS view schema information guides
    correct view transformation.

29
Reference
  • [1] Shurug Al-Khalifa, H. V. Jagadish, Nick Koudas,
    Jignesh M. Patel, Divesh Srivastava, Yuqing Wu.
    Structural Joins: A Primitive for Efficient XML
    Query Pattern Matching. In Proceedings of ICDE,
    2002.
  • [2] Gillian Dobbie, Wu Xiaoying, Tok Wang Ling, Mong
    Li Lee. ORA-SS: An Object-Relationship-Attribute
    Model for Semistructured Data. TR21/00, Technical
    Report, Department of Computer Science, National
    University of Singapore, December 2000.
  • [3] eXcelon. An General XML Data Manager.
    http://www.exceloncorp.com/
  • [4] H. V. Jagadish, Shurug Al-Khalifa, et al. TIMBER:
    A Native XML Database. Technical Report,
    University of Michigan, April 2002.
  • [5] Xiaofeng Meng, Daofeng Luo, Mong Li Lee, Jing An.
    OrientStore: A Schema Based Native XML Storage
    System. In Proceedings of the 29th VLDB
    Conference, Berlin, Germany, 2003.
  • [6] J. McHugh, S. Abiteboul, R. Goldman, D. Quass,
    and J. Widom. Lore: A Database Management System
    for Semistructured Data. SIGMOD Record,
    Vol. 26(3): 54-66, September 1997.
  • [7] Lucian Popa, Mauricio A. Hernandez, Yannis
    Velegrakis, Renee J. Miller, Felix
    Naumann, Howard Ho. Mapping XML and Relational
    Schemas with Clio. In ICDE 2002 Demo, 2002.