Title: Design Issues in XML Databases
1Design Issues in XML Databases
- Ref Designing XML Databases by Mark Graves
2Storing data in XML
- XML can be stored as a flat file, in an
object-oriented database or in a relational
database. - Flat File Database A flat file is a simple
storage mechanism and does not provide indexed
queries or easy modification of a document. - The easiest way to store XML data is a single
flat file that stores the entire XML document. - The data in the file is accessible by a variety
of text editors and xml tools (eg parsers). - Flat file databases (consisting of many files)
are a useful alternative. - In cases where the functionality of a traditional
DBMS may not be necessary and when
special-purpose operations like text search, or
temporal or spatial operations are needed, flat
file databases may be a better choice.
3Flat-file Databases
- Usually flat file databases will store large
quantities of static data in a flat file database
organized with one document per file. - Flat file databases have the following
limitations - quick access and indexing are difficult.
- Long updates and recovery are not efficient.
4Flat-file Databases
- Splitting a document into several document
fragments may be more efficient than storing an
entire document in a file. - For example, in applications involving many
transaction, each transaction can be stored in a
file. - Another approach is to slice a document with an
identifier for each slice. - A simple mechanism to link documents is to have a
special type name, for example, include with
attributes that provide the necessary information
to link to another document. - Example
- ltinclude elementtransaction id12345/gt
- These places where the document can be sliced
into multiple document fragments are called
slice points. - When the slice points are already determined by
the application, the appropriate element type may
used directly, such as - lttransaction id12345
- In practice, this approach is too inefficient
because of time needed to access each file
individually.
5 Using a Relational Database to store XML data
- Developing a relational schema for XML may be the
most practical approach to integrating XML into a
high-throughput enterprise. - There are three approaches that are used with the
choice of a Relational Schema - Fine-grained
- Coarse-grained and
- Medium-grained
6 Using a Relational Database to store XML data
- Fine-grained approach every construct in the
document is given a unique identity in the
relational database. - Every element, attribute and character data
region can be individually accessed, modified or
deleted with minimal effect on other document
constructs. - This provides the most flexibility and ease of
access both with the XML DBMS-specific operations
as well as the traditional relational ones. - However, regenerating the entire document can be
time-consuming when large.
7 Using a Relational Database to store XML data
- Fine-grained Relational Schema
- -- Create fine-grained storage tables and
constraints - create table xdb_doc (
- doc_id NUMBER(8) NOT NULL,
- name VARCHAR2(128) NOT NULL,
- root NUMBER(8)
- )
- alter table xdb_doc add primary key (doc_id)
- create table xdb_ele (
- doc_id NUMBER(8) NOT NULL,
- ele_id NUMBER(8) NOT NULL,
- parent_id NUMBER(8),
- tag VARCHAR2(32) NOT NULL
- )
- alter table xdb_ele add primary key (doc_id,
ele_id) - create table xdb_attr (
8 Using a Relational Database to store XML data
- create table xdb_child (
- doc_id NUMBER(8) NOT NULL,
- ele_id NUMBER(8) NOT NULL,
- indx NUMBER(6) NOT NULL,
- child_class VARCHAR2(4) NOT NULL, -- ELE, STR,
or TEXT - child_id NUMBER(8) NOT NULL
- )
- alter table xdb_child add primary key (doc_id,
ele_id, indx) - create table xdb_str (
- doc_id NUMBER(8) NOT NULL,
- cdata_id NUMBER(8) NOT NULL,
- ele_id NUMBER(8) NOT NULL,
- value VARCHAR2(255) NOT NULL
- )
9 Using a Relational Database to store XML data
- create table xdb_text (
- doc_id NUMBER(8) NOT NULL,
- cdata_id NUMBER(8) NOT NULL,
- ele_id NUMBER(8) NOT NULL,
- value LONG NOT NULL
- )
- alter table xdb_text add primary key (doc_id,
cdata_id) - -- Foreign Keys
- alter table xdb_doc add constraint
fk_xdb_doc_root - foreign key (doc_id, root) references
xdb_ele (doc_id, ele_id) - alter table xdb_ele add constraint
fk_xdb_ele_doc_id - foreign key (doc_id) references xdb_doc
(doc_id) - alter table xdb_attr add constraint
fk_xdb_attr_doc_id - foreign key (doc_id) references xdb_doc
(doc_id) - alter table xdb_attr add constraint
fk_xdb_attr_ele_id - foreign key (doc_id, ele_id) references
xdb_ele (doc_id, ele_id) - alter table xdb_child add constraint
fk_xdb_child_doc_id
10 Using a Relational Database to store XML data
- -- Search for all elements with a given tag name.
- select doc_id, ele_id
- from xdb_ele
- where doc_id 1
- and tag 'TagName'
- -- Search for all elements with a given attribute
name. - select doc_id, attr_id
- from xdb_attr
- where doc_id 1
- and name 'AttrName'
- and value 'AttrValue'
- -- Search for all elements with a given tag name
that has a - -- character data child consisting of a given
string. - select ele.doc_id, ele.ele_id
- from xdb_ele ele, xdb_str str
- where ele.doc_id 1
11Using a Relational Database to store XML data
- Coarse-grained approach In this approach, a
document is stored in its entirety. - Even though it is similar to storing the
documents in flat files, this approach has the
advantage of allowing the documents to be
referred to within other structures in the
database. - In addition, it provides the security, recovery
and other features of a DBMS in which it is
stored. - Its particular usefulness is part of a hybrid
representation.
12Using a Relational Database to store XML data
- Coarse-grained approach
- create table cgrel_doc (
- doc_id NUMBER(8) NOT NULL,
- name VARCHAR2(128),
- body LONG
- )
- alter table cgrel_doc add primary key (doc_id)
13Using a Relational Database to store XML data
- Medium-grained approach
- The fine-grained approach works well to perform
tasks that access elements, whereas the tasks to
store and retrieve an entire document are
difficult. - The coarse-grained approach works well in
manipulating entire documents but has difficulty
with the element manipulation. - The medium-grained approach is a compromise
between the fine and coarse-grained approaches.
14Using a Relational Database to store XML data
- Medium-grained approach The document tree can be
sliced into sections where the sub-sections are
stored with a coarse-grained approach. This is
particularly useful if the sections are accessed
individually for example, in reference books
such as dictionaries, a medium grained approach
would be to store each entry separately.
15Using a Relational Database to store XML data
- Medium-grained approach
- Determining the slice points is a complex issue.
- How many slice-points are created?
- How many levels of slicing are created?
- Does the slicing depend upon the element type
name or the depth in the tree or the size of the
document section? - Are some sections of the document sliced more
finely than the other sections?
16Using a Relational Database to store XML data
- Medium-grained approach
- One way to approach the slicing granularity is to
view slicing as a method to index the database. - An index speeds up access for a particular
database request by creating an index table that
provides quick navigation of the indexed
information. - Slices ca be created on a element type name (s)
for which frequent access is anticipated. - A combination of element type names and attribute
values can also be used to drive the index slice
method. - Choosing slice points based on desired indexing
will reduce the data access time over the
coarse-grained approach for queries or other
accesses that involve the indexed tags. - Indexes can be created on a few highly requested
element types. - If most of the directly accessed element type
names are indexed then the access time for those
queries approaches the access time under the
fine-grained approach while also reducing the
document regeneration time.
17Using a Relational Database to store XML data
- Medium-grained approach
- Another way to approach the slicing granularity
is to view the slicing method as a buffering
mechanism. - Slices may be determined by physical
characteristics, such as size of the slice. - The slice size can be chosen to efficiently use
network communication protocols to reduce the
time needed to retrieve a portion of the document
when the network response is a critical factor. - Combinations of approaches may be used for
particular applications or documents.
18Using a Relational Database to store XML data
- Medium-grained approach
- One point to be addressed is how to represent the
slice points in the document. - One mechanism is to create a specific element
type to represent the necessary information,
ensuring that the element type name is unique in
the document. - For example, an element type called slice or
proxy could be created with attributes that
contain sufficient information to reconstruct the
document, namely document-id and element-id.
19Using a Relational Database to store XML data
- Medium-grained approach
- tables and constraints for medium-grained storage
- create table xdb_frag (
- doc_id NUMBER(8) NOT NULL,
- frag_id NUMBER(8) NOT NULL,
- ele_id NUMBER(8) NOT NULL,
- body LONG NOT NULL
- )
- alter table xdb_frag add primary key (doc_id,
frag_id) - create table xdb_frag_ref (
- doc_id NUMBER(8) NOT NULL,
- frag_id NUMBER(8) NOT NULL,
- ele_ref NUMBER(8) NOT NULL
- )
- alter table xdb_frag_ref add primary key (doc_id,
frag_id, ele_ref) - alter table xdb_frag add constraint
fk_xdb_frag_doc_id
20Relational Data Server
- A relational data server (combined with a web
server) makes data from a relational DBMS
available in XML. - It works by querying the database and formatting
the report from the RDBMS as XML. - The basic process consists of the following
steps - The user specifies a relational query to the web
browser as a URL.
21Relational Data Server
- The basic process consists of the following
steps - The user specifies a relational query to the web
browser as a URL. - The web browser sends the URL to the data server.
- The data server parses the URL request and
creates a SQL query. - The data server passes the SQL query to the
database server. - The database server executes the query.
- The database server returns the relational report
to the data server. - The data server formats the report as XML.
- The data server returns the XML report to the Web
browser. - The web browser parses the XML report and
displays it to the user. - When a stylesheet is used, it must be retrieved
and parsed by the web browser.
22Some Issues To be addressed
- Can relational views be displayed in addition to
tables? - Can data be updated? Or, is data read-only?
- Can data be retrieved from multiple tables?
- How are joins handled?
- How complex can the mapping from relations to XML
be? - Can helper tables or joining tables be recognized
and handled differently? - Can foreign key constraints be followed? To what
depth? - How are links between tables handled?
- How expressive is the query facility?
23- We will briefly examine a relational data server
(implemented in java) called rServe. - Ref http//www.xwdb.org/docs/XWDB_RServe_User_Gui
de.pdf - Some examples of usage
- http//127.0.0.1/servlets/com.xweave.xmldb.rserve.
XMLServlet?tablenamecustomerstylesheet/ss/gener
ic1.xsl - Which retrieves all the records from the customer
table, then formats the data using the stylesheet
http//127.0.0.1/ss/generic.xsl.
24Creating a SQL Query
- The relational data server uses the parameters of
the URL to build a SQL statement. - Example
- Select from tablename
- Much of the functionality of a SQL statement can
be encloded as a URL. - The where condition is given by pre-pending the
condition to the value in a name-value pair. - Example
- pnop1quantitylt20
25Formatting a Report as XML
- One way to format data as XML is to use the table
names and column names to create the element
types. - Information on primary keys and foreign keys can
be used to create a document hierarchy. - An issue to address about element naming is
- Special characters may be used in table and
column names that are not valid as xml element
names.
26Extracting Dictionary Data
- A commercial relational DBMS provides information
in its catalog (Data Dictionary) consisting of
system tables. - The system tables contain information about the
metadata primary keys, foreign keys and other
integrity constraints that are defined on the
data. - The data in a RDBMS can be mapped to XML as a
tree of parent/child relationships (represented
using primary and foreign keys).