Title: An Extension to XML Schema for Structured Data Processing
1. An Extension to XML Schema for Structured Data Processing
- Presented by Jacky Ma
- Date: 10 April 2002
2. Presentation Outline
- The Problems
- Research Objectives
- The Schema Extension: MMX
- MMX Query System
- Discussion
- Conclusion
3. The Problems
- Mapping XML data into relational tables
  - Not natural to XML structure
  - Efficient, but may not be an effective method
- Legacy application-specific structured data
  - Similar modeling but proprietary implementation
  - Not interoperable, and difficult to maintain
  - Lack of modular design, and thus difficult to combine into more complex data structures
- Meta-data can serve a wide range of needs, while XML Schema is nowadays used solely for physical data validation
4. Research Objectives
- To facilitate more effective searching and storing of XML contents by making use of meta-data (XML Schema)
- To propose a data-oriented model that allows different storage mechanisms, processing models, and query models on XML contents
5. Our Approach: MMX
- Use meta-data to map XML data into structured data objects
- Define the structured data models conceptually and link the models to the XML document structure syntactically
- Meta-data is defined as an extension of XML Schema
- The extension is called MMX (Multi Model XML)
6. Program Driven vs. Data Driven
- Program Driven: information for processing is hard-coded in the program
- Data Driven: processing instructions are hard-coded in the data?!
- MMX!
7. A Glance at the XML Data
8. A Glance at the Linked Schema
9. Schema Extension
- The extended schema is associated with a namespace
- The extended schema goes within a schema element, like <tree:element> in the example
- <tree:element> specifies a single structure object instance
- Name association for elements and attributes
- Class hierarchies
  - <tree:element> -> <tree:internal> -> <tree:leafNode>
  - finally to the structure specified in <tree:leafNodeValue>
- Additional properties in <rootNodeAttr>, <internalNodeAttr> and <leafNodeAttr>
- The schema writer has to know the structure model specification, while the XML writer only needs to know the given schema
10. Modeling
- For an instance of an MMX data object
  - As an encapsulated information object only accessible from the root, thus as a single tree node
  - As a mapping from root node, query method and query parameters to the value at leaf nodes
  - Leaf nodes may contain any valid XML content, as long as it is defined in the Schema
    - I.e. may contain another MMX data object
- A query is modeled as a 3-tuple
  - (accessing-node, query-method, query-parameters)
  - Accessing-node is specified by XPath
  - Query-method is specified as a String value
  - Query-parameters are multi-dimensional, depending on the current model
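The query tuple above can be sketched in a few lines. This is only an illustration in Python, not the MMX implementation: the sample document, the element names, and the two query methods (`key-lookup`, `spatial-search` as an exact-position match) are assumptions made up for the example. It shows the accessing-node given as an XPath, the query-method given as a string, and a parameter tuple whose length depends on the model being queried.

```python
from dataclasses import dataclass
from typing import Any, Tuple
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Query:
    """One query: (accessing-node, query-method, query-parameters)."""
    accessing_node: str                 # XPath to the root of a data object
    query_method: str                   # method name as a plain string
    query_parameters: Tuple[Any, ...]   # arity depends on the model


# Hypothetical document holding two structured objects of different models.
DOC = ET.fromstring(
    "<store>"
    "<index><entry key='a'>alpha</entry><entry key='b'>beta</entry></index>"
    "<map><point x='3' y='5'>B</point><point x='8' y='1'>C</point></map>"
    "</store>"
)


def key_lookup(root, params):
    """Hypothetical one-parameter method: find the entry with this key."""
    (key,) = params
    for entry in root.findall("entry"):
        if entry.get("key") == key:
            return entry.text
    return None


def spatial_search(root, params):
    """Hypothetical two-parameter method: find the point stored at (x, y)."""
    x, y = params
    for point in root.findall("point"):
        if (int(point.get("x")), int(point.get("y"))) == (x, y):
            return point.text
    return None


METHODS = {"key-lookup": key_lookup, "spatial-search": spatial_search}


def run(doc, query):
    """Resolve the accessing-node, then dispatch on the method string."""
    root = doc.find(query.accessing_node)  # ElementTree supports an XPath subset
    if root is None:
        raise LookupError(f"no node at {query.accessing_node!r}")
    return METHODS[query.query_method](root, query.query_parameters)


by_key = run(DOC, Query("./index", "key-lookup", ("b",)))
by_position = run(DOC, Query("./map", "spatial-search", (3, 5)))
print(by_key, by_position)
```

Keeping the method as a string and the parameters as an open-ended tuple means the caller needs no compile-time knowledge of the model behind the accessing-node.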
11. Modeling (2)
[Figure: point A, Tree(1), point B, Tree(2), XML elements]
Tree(1) is accessible from point A. A query (e.g. (A, spatial-search, (3, 5)), assuming Tree(1) accepts spatial-search with two coordinates) may return point B as the answer, either as the XPath of B or as the XML subtree of B. From point B, the user may drill down the tree by issuing another query on Tree(2).
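The drill-down described on this slide might look roughly like the following. It is a sketch under assumptions: the element names (`a`, `tree1`, `b`, `tree2`, `item`), the stored values, and the second query method (`key_lookup`) are all hypothetical, and the first query returns the answer as a path relative to Tree(1) rather than as a subtree.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: <tree1> is reachable from point A (<a>); its leaf <b>
# itself contains a second structured object, <tree2>.
DOC = ET.fromstring(
    "<a><tree1>"
    "<b x='3' y='5'><tree2>"
    "<item key='k1'>first</item><item key='k2'>second</item>"
    "</tree2></b>"
    "<c x='9' y='9'/>"
    "</tree1></a>"
)


def spatial_search(tree, params):
    """Return the path (relative to `tree`) of the leaf stored at (x, y)."""
    x, y = params
    for leaf in tree:
        if (int(leaf.get("x")), int(leaf.get("y"))) == (x, y):
            return leaf.tag
    return None


def key_lookup(tree, params):
    """Hypothetical method of the inner model: value stored under a key."""
    (key,) = params
    for item in tree.findall("item"):
        if item.get("key") == key:
            return item.text
    return None


# Step 1: query Tree(1) from point A; the answer identifies point B.
tree1 = DOC.find("./tree1")
path_b = spatial_search(tree1, (3, 5))

# Step 2: drill down from B and issue another query on Tree(2).
tree2 = tree1.find(f"./{path_b}/tree2")
value = key_lookup(tree2, ("k2",))
print(path_b, value)
```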
12. Query with and without MMX
- From the original XML data, we cannot assume the semantics of the data
  - We can ONLY do XML-based queries such as XPath
  - We can do the spatial query ONLY IF we can map the data into an R-Tree
- After mapping the data into an R-Tree
  - Spatial Queries
    - Give me the point at (2,7)
    - Give me the point nearest to (4,4)
  - Nearest Neighbor Search
    - Give me the point nearest to Franklin
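The three example queries above can be made concrete with a small sketch. Two assumptions apply: the point names and coordinates below are invented for illustration (only the query arguments and the name Franklin come from the slide), and a brute-force scan stands in for the R-Tree, which would answer the same queries without visiting every point.

```python
import math

# Hypothetical points mapped out of the XML data: name -> (x, y).
POINTS = {
    "Franklin": (2, 7),
    "Adams": (4, 5),
    "Jefferson": (8, 1),
}


def point_at(x, y):
    """Spatial query: names of the points stored exactly at (x, y)."""
    return [name for name, pos in POINTS.items() if pos == (x, y)]


def nearest_to(x, y, exclude=None):
    """Spatial query: name of the point closest to (x, y)."""
    candidates = {n: p for n, p in POINTS.items() if n != exclude}
    return min(candidates, key=lambda n: math.dist(candidates[n], (x, y)))


def nearest_neighbor(name):
    """Nearest neighbor search: closest other point to a named point."""
    x, y = POINTS[name]
    return nearest_to(x, y, exclude=name)


print(point_at(2, 7))
print(nearest_to(4, 4))
print(nearest_neighbor("Franklin"))
```

None of these functions can be expressed as a plain XPath over the raw document, which is the point of mapping the data into a spatial model first.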
13. Processing
- Users might not know the type of the node (and do not need to know). They are interested in what they can do
- Users retrieve the list of possible operations by issuing a LIST-OPERATION method to the root element of an MMX object
- Possible operations may include queries, updates, and other model-specific operations
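A client that discovers operations before using them could be sketched as follows. The class and the operation names `QUERY-MIN` and `UPDATE-INSERT` are hypothetical; only LIST-OPERATION is named in the slides. The idea shown is that LIST-OPERATION is itself invoked by name, so the client never needs to know the node type.

```python
class MMXObject:
    """Hypothetical stand-in for the root element of an MMX data object."""

    def __init__(self, values):
        self._values = sorted(values)
        # Operation name -> handler. LIST-OPERATION is just another entry.
        self._operations = {
            "LIST-OPERATION": self._list_operation,
            "QUERY-MIN": self._query_min,
            "UPDATE-INSERT": self._update_insert,
        }

    def invoke(self, method, *params):
        """Dispatch on the method name given as a string."""
        handler = self._operations.get(method)
        if handler is None:
            raise ValueError(f"unsupported operation: {method}")
        return handler(*params)

    def _list_operation(self):
        return sorted(self._operations)

    def _query_min(self):
        return self._values[0]

    def _update_insert(self, value):
        self._values.append(value)
        self._values.sort()
        return len(self._values)


obj = MMXObject([5, 3, 8])
operations = obj.invoke("LIST-OPERATION")   # discover what is possible
smallest = obj.invoke("QUERY-MIN")          # a query
size = obj.invoke("UPDATE-INSERT", 1)       # an update
new_smallest = obj.invoke("QUERY-MIN")
print(operations, smallest, size, new_smallest)
```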
14. MMX Query System
- To show that the schema, modeling, and processing of the MMX extension are workable
- To illustrate how it assists in querying XML data
- To serve as a platform for testing the implementation of arbitrary structured models
- Implemented with JDK 1.4
15. System Design
[Diagram: Clients, XML, DOM, MMXDocument, Node Data, Schema, MMX Element, ParseSchema, FetchClasses, Abstract MMX Element, VP-Tree, X-Tree, R-Tree, R-Tree Schema; relation labels: Extends class, (Partly) Defines, Maps]
The abstract class defines the common interface that has to be implemented in each MMX Element, such as LIST-OPERATION, QUERY, BUILD, etc.
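The role of the abstract class can be illustrated with a minimal sketch. It is written in Python rather than the JDK 1.4 used by the system, and everything beyond the three operation names is an assumption: the method signatures, the toy `SortedListElement` model (standing in for R-Tree, X-Tree or VP-Tree), and the registry lookup, which is only a guess at what a step like FetchClasses might do.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left, bisect_right


class AbstractMMXElement(ABC):
    """Common interface every structure model has to implement."""

    @abstractmethod
    def build(self, records):
        """BUILD: load the model from data extracted from the XML."""

    @abstractmethod
    def list_operation(self):
        """LIST-OPERATION: names of the operations this model supports."""

    @abstractmethod
    def query(self, method, params):
        """QUERY: run the named query method with model-specific params."""


class SortedListElement(AbstractMMXElement):
    """Toy one-dimensional model used only for this illustration."""

    def __init__(self):
        self._keys = []

    def build(self, records):
        self._keys = sorted(records)

    def list_operation(self):
        return ["range"]

    def query(self, method, params):
        if method != "range":
            raise ValueError(f"unsupported query method: {method}")
        low, high = params
        start = bisect_left(self._keys, low)
        stop = bisect_right(self._keys, high)
        return self._keys[start:stop]


# Hypothetical registry: model name found in the schema -> implementing class.
REGISTRY = {"sorted-list": SortedListElement}


def fetch_class(model_name):
    return REGISTRY[model_name]


element = fetch_class("sorted-list")()
element.build([7, 2, 9, 4])
hits = element.query("range", (3, 8))
print(element.list_operation(), hits)
```

Because clients only see the abstract interface, adding a new structure model means adding one subclass and one registry entry, which matches the modular design argued for in the deck.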
16. Discussion - Pros
- Compatible with the relational approach, and supersedes it
- Modular design promotes reusability and maintainability
- XML flattens the legacy structured data to make it text-editable and easy to transport and process by different systems
17. Discussion - Cons
- There is no generic syntax to precisely describe all kinds of structure models
- The size of an XML file is often larger than that of a legacy data file
- Each structure model needs additional implementation effort
- The schema specification quickly becomes longer as the number of supported models increases
18. Conclusion
- Propose a representation to encapsulate data structures
- Describe XML data with the Schema conceptually as well as syntactically
- Map legacy structure models into Schema, and map XML data to the structure models by the Schema
- Structured data repository with increased interoperability, reusability, and transportability
19. Q&A