XML-to-Relational Schema Mapping Algorithm ODTDMap

1
XML-to-Relational Schema Mapping Algorithm ODTDMap
  • Speaker: Artem Chebotko
  • Email: artem_at_wayne.edu
  • Wayne State University
  • Joint work with Mustafa Atay, Shiyong Lu, and
    Farshad Fotouhi

2
Introduction
  • XML has emerged as the standard for representing
    and exchanging data on the World Wide Web.
  • The growing number of XML documents creates a
    need to store and query them efficiently.

3
Current approaches to storing and querying XML
documents
  • Native XML repositories, e.g., Software AG's
    Tamino and eXcelon's XIS.
  • XML-enabled commercial database systems such as
    SQL Server, Oracle, and DB2.
  • Using an RDBMS/ODBMS to store and query XML
    documents.

4
Issues of the relational approach
  • Schema mapping: the XML data model needs to be
    mapped into the relational model.
  • Data mapping: XML documents need to be shredded
    and composed into tuples to be inserted into the
    relational database.
  • Query mapping: XML queries need to be translated
    into SQL queries.
  • Reverse data mapping: query results need to be
    tagged into XML format.

5
Our contributions
  • We propose a schema mapping algorithm, ODTDMap,
    which generates a relational schema from an XML
    DTD for storing and querying ordered XML
    documents.
  • Improvements over the existing algorithms:
    • Losslessness
    • Efficient support for XML queries
    • Completeness (recursion, set-valued attributes,
      DTD operators)

6
Outline of the talk
  • Introduction to XML DTDs
  • Mapping DTDs to relational schemas
    • Simplifying DTDs
    • Creating and inlining DTD graphs
    • Generating relational schemas
  • An example
  • Conclusions and future work

7
An overview of DTDs: a DTD example
  • <!DOCTYPE memo [
  • <!ELEMENT memo (to, from, date, subject?, body)>
  • <!ATTLIST memo security CDATA>
  • <!ATTLIST memo lang CDATA>
  • <!ELEMENT to (#PCDATA)>
  • <!ELEMENT from (#PCDATA)>
  • <!ELEMENT date (#PCDATA)>
  • <!ELEMENT subject (#PCDATA)>
  • <!ELEMENT body (para)>
  • <!ELEMENT para (#PCDATA)>
  • ]>
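
As a rough illustration of how such declarations can be read programmatically, the following Python sketch collects element content models and attribute names with regular expressions. The function name `read_declarations` and the regex-based approach are assumptions made for this example; a real DTD parser must also handle entities, comments, and conditional sections.

```python
import re

# Simplistic patterns: one declaration per <!...> with no nested '>' characters.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+(.*?)\s*>", re.S)
ATTLIST_RE = re.compile(r"<!ATTLIST\s+(\S+)\s+(\S+)\s+(.*?)\s*>", re.S)

def read_declarations(dtd_text):
    """Return (elements, attributes) found in the DTD text.

    elements maps element name -> content model string;
    attributes maps element name -> list of attribute names.
    """
    elements = {name: model for name, model in ELEMENT_RE.findall(dtd_text)}
    attributes = {}
    for element, attr_name, _rest in ATTLIST_RE.findall(dtd_text):
        attributes.setdefault(element, []).append(attr_name)
    return elements, attributes

# A fragment of the memo DTD from the slide.
MEMO_DTD = """
<!ELEMENT memo (to, from, date, subject?, body)>
<!ATTLIST memo security CDATA>
<!ATTLIST memo lang CDATA>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (para)>
"""

elements, attributes = read_declarations(MEMO_DTD)
```

Here `elements["memo"]` holds the content model of `memo`, and `attributes["memo"]` lists its two declared attributes.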

8
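DTD: Document Type Definition
  • <!DOCTYPE root-element [ doctype-declaration... ]>
  • <!ELEMENT element-name content-model>, where the
    content model is built with the operators
    sequence (,), choice (|), zero or more (*), one
    or more (+), and optional (?)
  • <!ATTLIST element-name attr-name attr-type
    attr-default ...>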

9
DTD: Document Type Definition (cont)
  • <!ATTLIST element-name attr-name attr-type
    attr-default ...> declares which attributes are
    allowed or required in which elements.
  • Attribute types:
    • CDATA: any value is allowed (the default)
    • (value|...): enumeration of allowed values
    • ID, IDREF, IDREFS: ID attribute values must be
      unique (contain "element identity"); IDREF
      attribute values must match some ID (reference
      to an element)
    • ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION:
      just forget these... (consider them deprecated)
  • Attribute defaults:
    • #REQUIRED: the attribute must be explicitly
      provided
    • #IMPLIED: the attribute is optional, no default
      provided
    • "value": if not explicitly provided, this value
      is inserted by default
    • #FIXED "value": as above, but only this value is
      allowed

10
Mapping DTDs to relational schemas
  • Simplifying DTDs
  • Creating and inlining DTD graphs
  • Generating relational schemas

11
Simplifying DTDs
  • A DTD might be very complex due to nesting, e.g.,
    <!ELEMENT a ((b, c, d?)?, (e?, f, (g, h?))?)>
  • An XML query language is concerned with:
    • The parent-child relationships between XML
      elements
    • The relative order relationships between
      siblings (add an ordinal attribute to each
      relation)
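
To illustrate why these two relationships matter, here is a minimal Python sketch that records, for every element of a document, its parent and its position among its siblings. The single generic row layout `(id, parent_id, tag, ordinal, text)` and the function name `shred` are assumptions for this example only; they are not the schema that ODTDMap generates.

```python
import xml.etree.ElementTree as ET

def shred(xml_text):
    """Return one row (id, parent_id, tag, ordinal, text) per element."""
    root = ET.fromstring(xml_text)
    rows = []

    def visit(elem, parent_id, ordinal):
        node_id = len(rows)  # ids are assigned in document order
        text = (elem.text or "").strip()
        rows.append((node_id, parent_id, elem.tag, ordinal, text))
        # The ordinal is the 1-based position among the siblings.
        for position, child in enumerate(elem, start=1):
            visit(child, node_id, position)

    visit(root, None, 1)
    return rows

rows = shred("<memo><to>Ann</to><from>Bob</from></memo>")
```

Sorting the rows of one parent by `ordinal` restores the original sibling order, which is what makes ordered reconstruction possible.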

12
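DTD simplification rules
  • 1. e+ → e*
  • 2. e? → e
  • 3. (e1 | ... | en) → (e1, ..., en)
  • 4. (a) (e1, ..., en)* → (e1*, ..., en*)
  •    (b) e** → e*
  • 5. (a) ..., e, ..., e, ... → ..., e*, ..., ...
  •    (b) ..., e, ..., e*, ... → ..., e*, ..., ...
  •    (c) ..., e*, ..., e, ... → ..., e*, ..., ...
  •    (d) ..., e*, ..., e*, ... → ..., e*, ..., ...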
13
Example of simplifying a DTD
  • <!ELEMENT a ((b, c, d?)?, (e?, f, (g, h?))?)>
  • simplified to
  • <!ELEMENT a (b, c, d, e, f, g, h)>
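
The simplification can be sketched in Python as follows. The sketch assumes the usual reading of the rules: '+' becomes '*', '?' is dropped, '|' is treated like ',', a '*' on a group is pushed down to its members, and repeated occurrences of an element are merged into one starred element. The function name `simplify_content_model` and the single-pass strategy are assumptions for illustration, not the authors' implementation.

```python
import re

def simplify_content_model(model):
    """Return the flattened, simplified content model as a string."""
    tokens = re.findall(r"#?[A-Za-z_][\w.\-]*|[(),|?*+]", model)
    pos = 0

    def parse_item():
        # Parse one element name or parenthesized group plus its suffixes.
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                if tokens[pos] in (",", "|"):  # choice is treated as sequence
                    pos += 1
                else:
                    items.extend(parse_item())
            pos += 1  # skip ")"
        else:
            items = [[token, False]]
        while pos < len(tokens) and tokens[pos] in ("?", "*", "+"):
            if tokens[pos] != "?":  # '*' or '+' stars every member; '?' is dropped
                for item in items:
                    item[1] = True
            pos += 1
        return items

    merged = {}  # keeps first-occurrence order
    for name, starred in parse_item():
        # A repeated element collapses into a single starred element.
        merged[name] = True if name in merged else starred
    return "(" + ", ".join(n + ("*" if s else "") for n, s in merged.items()) + ")"

simplified = simplify_content_model("((b, c, d?)?, (e?, f, (g, h?))?)")
```

For the content model on this slide the sketch yields `(b, c, d, e, f, g, h)`, matching the simplified declaration shown above.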

14
Creating and inlining DTD graphs
  • We create a DTD graph based on the simplified
    DTD.
  • Definition 3.2 (DTD graph): The structure of a DTD
    can be represented by a labeled graph, in which
    nodes represent elements and attributes, and
    edges represent their parent-child relationships.
    The edges are labeled by either '*' (star edge)
    or ',' (normal edge), where the label ',' is not
    shown for simplicity.
  • Idea: inline a child c into its parent p if p can
    contain at most one occurrence of c.
  • Rationale: inlined elements are stored together
    with their parent in a single relation.
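
A DTD graph of this kind can be sketched as an adjacency list in Python. The input format (element name mapped to a list of `(child, is_starred)` pairs), the function name `build_dtd_graph`, and the sample data with a starred `para` are all assumptions for this example; attribute nodes could be added in the same way as child elements.

```python
def build_dtd_graph(simplified):
    """Map each node to its outgoing edges as (child, label) pairs.

    The label is '*' for a star edge and ',' for a normal edge.
    """
    graph = {}
    for parent, children in simplified.items():
        graph.setdefault(parent, [])
        for child, starred in children:
            graph[parent].append((child, "*" if starred else ","))
            graph.setdefault(child, [])  # leaf nodes get an empty edge list
    return graph

# Hypothetical simplified DTD: memo contains to and body; body contains para*.
graph = build_dtd_graph({
    "memo": [("to", False), ("body", False)],
    "body": [("para", True)],
})
```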

15
Inlinable node and subtree, shared node
  • Definition 3.3 (Inlinable node): Given a DTD
    graph, a node is inlinable if and only if it has
    exactly one incoming edge and that edge is a
    normal edge.
  • Definition 3.4 (Inlinable subtree): Given a DTD
    graph and a node e in the graph, e and all other
    inlinable nodes that are reachable from e by
    normal edges constitute a subtree. This subtree
    is called the inlinable subtree for the node e
    (it is rooted at e).
  • Definition 3.5 (Shared node): Given a DTD graph, a
    node is called a shared node if it has more than
    one incoming edge.
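
The three definitions translate fairly directly into code. The sketch below assumes a DTD graph stored as a dictionary from node name to a list of `(child, label)` pairs, with ',' for normal edges and '*' for star edges; this representation, the helper names, and the sample graph are assumptions for illustration.

```python
from collections import defaultdict

def incoming_edges(graph):
    """Map each node to the list of (parent, label) edges that enter it."""
    incoming = defaultdict(list)
    for parent, edges in graph.items():
        for child, label in edges:
            incoming[child].append((parent, label))
    return incoming

def is_inlinable(node, incoming):
    # Definition 3.3: exactly one incoming edge, and it is a normal edge.
    edges = incoming.get(node, [])
    return len(edges) == 1 and edges[0][1] == ","

def is_shared(node, incoming):
    # Definition 3.5: more than one incoming edge.
    return len(incoming.get(node, [])) > 1

def inlinable_subtree(root, graph, incoming):
    # Definition 3.4: root plus inlinable nodes reachable via normal edges.
    subtree, stack, seen = [root], [root], {root}
    while stack:
        node = stack.pop()
        for child, label in graph.get(node, []):
            if label == "," and child not in seen and is_inlinable(child, incoming):
                seen.add(child)
                subtree.append(child)
                stack.append(child)
    return subtree

# Hypothetical graph: sig is shared by body and letter; para hangs off a star edge.
graph = {
    "memo": [("to", ","), ("body", ",")],
    "body": [("para", "*"), ("sig", ",")],
    "letter": [("sig", ",")],
    "to": [], "para": [], "sig": [],
}
incoming = incoming_edges(graph)
memo_subtree = inlinable_subtree("memo", graph, incoming)
```

The `seen` set guards against revisiting nodes when the DTD is recursive.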

16
Inlining
  • Case 1: Node a is connected to b by a normal edge
    and b has no other incoming edges: inline b into
    a.
  • Case 2: Node a is connected to b by a normal edge
    but b has other incoming edges: b is a shared
    node, so no inlining.
  • Case 3: Node a is connected to b by a star edge:
    no inlining.
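
The case analysis can be written as a small decision function. This is only a sketch of the three cases listed here, not the full inlining procedure; the graph representation (node name mapped to `(child, label)` pairs), the function name `inlining_case`, and the sample graph are assumptions.

```python
def inlining_case(graph, a, b):
    """Return 1, 2, or 3 for the edge from a to b, following the cases above."""
    # Label of the edge a -> b (',' = normal edge, '*' = star edge).
    edge_label = next(label for child, label in graph[a] if child == b)
    # Count every edge in the graph that enters b.
    incoming_count = sum(
        1 for edges in graph.values() for child, _ in edges if child == b
    )
    if edge_label == "*":
        return 3  # star edge: no inlining
    if incoming_count > 1:
        return 2  # shared node: no inlining
    return 1      # inline b into a

# Hypothetical graph used only to exercise the three cases.
graph = {
    "memo": [("to", ","), ("body", ",")],
    "body": [("para", "*"), ("sig", ",")],
    "letter": [("sig", ",")],
    "to": [], "para": [], "sig": [],
}
cases = {
    ("memo", "to"): inlining_case(graph, "memo", "to"),
    ("body", "sig"): inlining_case(graph, "body", "sig"),
    ("body", "para"): inlining_case(graph, "body", "para"),
}
```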

17
Inlining (cont)

18
Inlining DTD graphs

19
Complexity of inlining
  • Theorem 3.7 (Time complexity): The time complexity
    of our inlining algorithm is O(n), where n is the
    number of elements in the input DTD.

20
The inlining procedure

21
The inlining procedure (cont): INCORRECT

22
The inlining procedure (cont): CORRECT

23
Generating relational schema

24
Generating schema mapping info
  • Definition 3.8 (σ mapping): σ is a mapping from X
    to R, where X is the set of XML element and
    attribute types in the input XML DTD, and R is
    the set of relations in the relational database.
    Given an XML element type e, σ(e) returns the
    relation that is used to store e. Similarly,
    given an XML attribute type a of element type e,
    σ(e.a) returns the relation that is used to store
    a of e.
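
The mapping of Definition 3.8 behaves like a lookup table from element and attribute types to relation names. The Python sketch below builds such a table from a description of which types each relation stores. The relation names, the choice to store `para` in its own relation, and the function name `build_mapping` are hypothetical; the slides do not show the actual generated schema for the memo DTD.

```python
def build_mapping(relations):
    """Invert relation -> stored types into type -> relation."""
    mapping = {}
    for relation, stored_types in relations.items():
        for xml_type in stored_types:
            mapping[xml_type] = relation
    return mapping

# Hypothetical schema: everything inlined into memo except para.
relations = {
    "memo": ["memo", "memo.security", "memo.lang",
             "to", "from", "date", "subject", "body"],
    "para": ["para"],
}
mapping = build_mapping(relations)
```

With this table, looking up `mapping["to"]` or `mapping["memo.security"]` returns the relation that stores that element or attribute type.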

25
A complete example

26
DTD graph and inlined DTD graph

27
Generated relational schema

28
Conclusions
  • We defined the schema mapping algorithm ODTDMap,
    which has several improvements over existing
    algorithms.
  • It is lossless in the sense that one can
    reconstruct the original XML document in its
    original document order, based on the target
    relational schema generated by ODTDMap.
  • It efficiently supports recursive queries and
    schemas.
  • It defines how to map set-valued XML attributes.
  • Experimental results showed good performance and
    scalability of the algorithm.

29
Future work
  • Extending our work to XML Schema to support data
    types other than the string type.
  • Maintaining ID/IDREF/IDREFS attributes by means of
    key and foreign-key constraints.