Design Issues in XML Databases - PowerPoint PPT Presentation

About This Presentation
Title:

Design Issues in XML Databases

Description:

Information on primary keys and foreign keys can be used to create a document hierarchy. ... The system tables contain information about the 'metadata' primary keys, ... – PowerPoint PPT presentation

Number of Views:20
Avg rating:3.0/5.0
Slides: 27
Provided by: ranimikk
Learn more at: https://www.cse.scu.edu
Category:

less

Transcript and Presenter's Notes

Title: Design Issues in XML Databases


1
Design Issues in XML Databases
  • Ref Designing XML Databases by Mark Graves

2
Storing data in XML
  • XML can be stored as a flat file, in an
    object-oriented database or in a relational
    database.
  • Flat File Database A flat file is a simple
    storage mechanism and does not provide indexed
    queries or easy modification of a document.
  • The easiest way to store XML data is a single
    flat file that stores the entire XML document.
  • The data in the file is accessible by a variety
    of text editors and xml tools (eg parsers).
  • Flat file databases (consisting of many files)
    are a useful alternative.
  • In cases where the functionality of a traditional
    DBMS may not be necessary and when
    special-purpose operations like text search, or
    temporal or spatial operations are needed, flat
    file databases may be a better choice.

3
Flat-file Databases
  • Usually flat file databases will store large
    quantities of static data in a flat file database
    organized with one document per file.
  • Flat file databases have the following
    limitations
  • quick access and indexing are difficult.
  • Long updates and recovery are not efficient.

4
Flat-file Databases
  • Splitting a document into several document
    fragments may be more efficient than storing an
    entire document in a file.
  • For example, in applications involving many
    transaction, each transaction can be stored in a
    file.
  • Another approach is to slice a document with an
    identifier for each slice.
  • A simple mechanism to link documents is to have a
    special type name, for example, include with
    attributes that provide the necessary information
    to link to another document.
  • Example
  • ltinclude elementtransaction id12345/gt
  • These places where the document can be sliced
    into multiple document fragments are called
    slice points.
  • When the slice points are already determined by
    the application, the appropriate element type may
    used directly, such as
  • lttransaction id12345
  • In practice, this approach is too inefficient
    because of time needed to access each file
    individually.

5
Using a Relational Database to store XML data
  • Developing a relational schema for XML may be the
    most practical approach to integrating XML into a
    high-throughput enterprise.
  • There are three approaches that are used with the
    choice of a Relational Schema
  • Fine-grained
  • Coarse-grained and
  • Medium-grained

6
Using a Relational Database to store XML data
  • Fine-grained approach every construct in the
    document is given a unique identity in the
    relational database.
  • Every element, attribute and character data
    region can be individually accessed, modified or
    deleted with minimal effect on other document
    constructs.
  • This provides the most flexibility and ease of
    access both with the XML DBMS-specific operations
    as well as the traditional relational ones.
  • However, regenerating the entire document can be
    time-consuming when large.

7
Using a Relational Database to store XML data
  • Fine-grained Relational Schema
  • -- Create fine-grained storage tables and
    constraints
  • create table xdb_doc (
  • doc_id NUMBER(8) NOT NULL,
  • name VARCHAR2(128) NOT NULL,
  • root NUMBER(8)
  • )
  • alter table xdb_doc add primary key (doc_id)
  • create table xdb_ele (
  • doc_id NUMBER(8) NOT NULL,
  • ele_id NUMBER(8) NOT NULL,
  • parent_id NUMBER(8),
  • tag VARCHAR2(32) NOT NULL
  • )
  • alter table xdb_ele add primary key (doc_id,
    ele_id)
  • create table xdb_attr (

8
Using a Relational Database to store XML data
  • create table xdb_child (
  • doc_id NUMBER(8) NOT NULL,
  • ele_id NUMBER(8) NOT NULL,
  • indx NUMBER(6) NOT NULL,
  • child_class VARCHAR2(4) NOT NULL, -- ELE, STR,
    or TEXT
  • child_id NUMBER(8) NOT NULL
  • )
  • alter table xdb_child add primary key (doc_id,
    ele_id, indx)
  • create table xdb_str (
  • doc_id NUMBER(8) NOT NULL,
  • cdata_id NUMBER(8) NOT NULL,
  • ele_id NUMBER(8) NOT NULL,
  • value VARCHAR2(255) NOT NULL
  • )

9
Using a Relational Database to store XML data
  • create table xdb_text (
  • doc_id NUMBER(8) NOT NULL,
  • cdata_id NUMBER(8) NOT NULL,
  • ele_id NUMBER(8) NOT NULL,
  • value LONG NOT NULL
  • )
  • alter table xdb_text add primary key (doc_id,
    cdata_id)
  • -- Foreign Keys
  • alter table xdb_doc add constraint
    fk_xdb_doc_root
  • foreign key (doc_id, root) references
    xdb_ele (doc_id, ele_id)
  • alter table xdb_ele add constraint
    fk_xdb_ele_doc_id
  • foreign key (doc_id) references xdb_doc
    (doc_id)
  • alter table xdb_attr add constraint
    fk_xdb_attr_doc_id
  • foreign key (doc_id) references xdb_doc
    (doc_id)
  • alter table xdb_attr add constraint
    fk_xdb_attr_ele_id
  • foreign key (doc_id, ele_id) references
    xdb_ele (doc_id, ele_id)
  • alter table xdb_child add constraint
    fk_xdb_child_doc_id

10
Using a Relational Database to store XML data
  • -- Search for all elements with a given tag name.
  • select doc_id, ele_id
  • from xdb_ele
  • where doc_id 1
  • and tag 'TagName'
  • -- Search for all elements with a given attribute
    name.
  • select doc_id, attr_id
  • from xdb_attr
  • where doc_id 1
  • and name 'AttrName'
  • and value 'AttrValue'
  • -- Search for all elements with a given tag name
    that has a
  • -- character data child consisting of a given
    string.
  • select ele.doc_id, ele.ele_id
  • from xdb_ele ele, xdb_str str
  • where ele.doc_id 1

11
Using a Relational Database to store XML data
  • Coarse-grained approach In this approach, a
    document is stored in its entirety.
  • Even though it is similar to storing the
    documents in flat files, this approach has the
    advantage of allowing the documents to be
    referred to within other structures in the
    database.
  • In addition, it provides the security, recovery
    and other features of a DBMS in which it is
    stored.
  • Its particular usefulness is part of a hybrid
    representation.

12
Using a Relational Database to store XML data
  • Coarse-grained approach
  • create table cgrel_doc (
  • doc_id NUMBER(8) NOT NULL,
  • name VARCHAR2(128),
  • body LONG
  • )
  • alter table cgrel_doc add primary key (doc_id)

13
Using a Relational Database to store XML data
  • Medium-grained approach
  • The fine-grained approach works well to perform
    tasks that access elements, whereas the tasks to
    store and retrieve an entire document are
    difficult.
  • The coarse-grained approach works well in
    manipulating entire documents but has difficulty
    with the element manipulation.
  • The medium-grained approach is a compromise
    between the fine and coarse-grained approaches.

14
Using a Relational Database to store XML data
  • Medium-grained approach The document tree can be
    sliced into sections where the sub-sections are
    stored with a coarse-grained approach. This is
    particularly useful if the sections are accessed
    individually for example, in reference books
    such as dictionaries, a medium grained approach
    would be to store each entry separately.

15
Using a Relational Database to store XML data
  • Medium-grained approach
  • Determining the slice points is a complex issue.
  • How many slice-points are created?
  • How many levels of slicing are created?
  • Does the slicing depend upon the element type
    name or the depth in the tree or the size of the
    document section?
  • Are some sections of the document sliced more
    finely than the other sections?

16
Using a Relational Database to store XML data
  • Medium-grained approach
  • One way to approach the slicing granularity is to
    view slicing as a method to index the database.
  • An index speeds up access for a particular
    database request by creating an index table that
    provides quick navigation of the indexed
    information.
  • Slices ca be created on a element type name (s)
    for which frequent access is anticipated.
  • A combination of element type names and attribute
    values can also be used to drive the index slice
    method.
  • Choosing slice points based on desired indexing
    will reduce the data access time over the
    coarse-grained approach for queries or other
    accesses that involve the indexed tags.
  • Indexes can be created on a few highly requested
    element types.
  • If most of the directly accessed element type
    names are indexed then the access time for those
    queries approaches the access time under the
    fine-grained approach while also reducing the
    document regeneration time.

17
Using a Relational Database to store XML data
  • Medium-grained approach
  • Another way to approach the slicing granularity
    is to view the slicing method as a buffering
    mechanism.
  • Slices may be determined by physical
    characteristics, such as size of the slice.
  • The slice size can be chosen to efficiently use
    network communication protocols to reduce the
    time needed to retrieve a portion of the document
    when the network response is a critical factor.
  • Combinations of approaches may be used for
    particular applications or documents.

18
Using a Relational Database to store XML data
  • Medium-grained approach
  • One point to be addressed is how to represent the
    slice points in the document.
  • One mechanism is to create a specific element
    type to represent the necessary information,
    ensuring that the element type name is unique in
    the document.
  • For example, an element type called slice or
    proxy could be created with attributes that
    contain sufficient information to reconstruct the
    document, namely document-id and element-id.

19
Using a Relational Database to store XML data
  • Medium-grained approach
  • tables and constraints for medium-grained storage
  • create table xdb_frag (
  • doc_id NUMBER(8) NOT NULL,
  • frag_id NUMBER(8) NOT NULL,
  • ele_id NUMBER(8) NOT NULL,
  • body LONG NOT NULL
  • )
  • alter table xdb_frag add primary key (doc_id,
    frag_id)
  • create table xdb_frag_ref (
  • doc_id NUMBER(8) NOT NULL,
  • frag_id NUMBER(8) NOT NULL,
  • ele_ref NUMBER(8) NOT NULL
  • )
  • alter table xdb_frag_ref add primary key (doc_id,
    frag_id, ele_ref)
  • alter table xdb_frag add constraint
    fk_xdb_frag_doc_id

20
Relational Data Server
  • A relational data server (combined with a web
    server) makes data from a relational DBMS
    available in XML.
  • It works by querying the database and formatting
    the report from the RDBMS as XML.
  • The basic process consists of the following
    steps
  • The user specifies a relational query to the web
    browser as a URL.

21
Relational Data Server
  • The basic process consists of the following
    steps
  • The user specifies a relational query to the web
    browser as a URL.
  • The web browser sends the URL to the data server.
  • The data server parses the URL request and
    creates a SQL query.
  • The data server passes the SQL query to the
    database server.
  • The database server executes the query.
  • The database server returns the relational report
    to the data server.
  • The data server formats the report as XML.
  • The data server returns the XML report to the Web
    browser.
  • The web browser parses the XML report and
    displays it to the user.
  • When a stylesheet is used, it must be retrieved
    and parsed by the web browser.

22
Some Issues To be addressed
  • Can relational views be displayed in addition to
    tables?
  • Can data be updated? Or, is data read-only?
  • Can data be retrieved from multiple tables?
  • How are joins handled?
  • How complex can the mapping from relations to XML
    be?
  • Can helper tables or joining tables be recognized
    and handled differently?
  • Can foreign key constraints be followed? To what
    depth?
  • How are links between tables handled?
  • How expressive is the query facility?

23
  • We will briefly examine a relational data server
    (implemented in java) called rServe.
  • Ref http//www.xwdb.org/docs/XWDB_RServe_User_Gui
    de.pdf
  • Some examples of usage
  • http//127.0.0.1/servlets/com.xweave.xmldb.rserve.
    XMLServlet?tablenamecustomerstylesheet/ss/gener
    ic1.xsl
  • Which retrieves all the records from the customer
    table, then formats the data using the stylesheet
    http//127.0.0.1/ss/generic.xsl.

24
Creating a SQL Query
  • The relational data server uses the parameters of
    the URL to build a SQL statement.
  • Example
  • Select from tablename
  • Much of the functionality of a SQL statement can
    be encloded as a URL.
  • The where condition is given by pre-pending the
    condition to the value in a name-value pair.
  • Example
  • pnop1quantitylt20

25
Formatting a Report as XML
  • One way to format data as XML is to use the table
    names and column names to create the element
    types.
  • Information on primary keys and foreign keys can
    be used to create a document hierarchy.
  • An issue to address about element naming is
  • Special characters may be used in table and
    column names that are not valid as xml element
    names.

26
Extracting Dictionary Data
  • A commercial relational DBMS provides information
    in its catalog (Data Dictionary) consisting of
    system tables.
  • The system tables contain information about the
    metadata primary keys, foreign keys and other
    integrity constraints that are defined on the
    data.
  • The data in a RDBMS can be mapped to XML as a
    tree of parent/child relationships (represented
    using primary and foreign keys).
Write a Comment
User Comments (0)
About PowerShow.com