Title: XML: How can we use it as a Logical Data Model
1. XML: How can we use it as a Logical Data Model?
2. Outline
- XML 101
- Database Design 101
- XML Schemas: how to specify structures and constraints
- Why use XML for database applications?
- ERex (Entity Relationship extended for XML) conceptual model
- Translating ERex schemas to XML schemas
- Conclusions and Future Work
3. What is XML?
<book>
  <author>J. E. Hopcroft</author>
  <author>J. D. Ullman</author>
  <publisher name="Addison-Wesley"/>
</book>
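As a minimal sketch of how an application might read the book document above, the following uses Python's standard `xml.etree.ElementTree` module. The function name `summarize_book` is hypothetical and only serves the illustration.

```python
import xml.etree.ElementTree as ET

# The book document from the slide, written out as well-formed XML.
BOOK_XML = """
<book>
  <author>J. E. Hopcroft</author>
  <author>J. D. Ullman</author>
  <publisher name="Addison-Wesley"/>
</book>
"""

def summarize_book(xml_text):
    """Return (list of author names, publisher name) for a <book> document."""
    book = ET.fromstring(xml_text)
    # Authors are repeated subelements; the publisher name is an attribute.
    authors = [a.text for a in book.findall("author")]
    publisher = book.find("publisher").get("name")
    return authors, publisher

authors, publisher = summarize_book(BOOK_XML)
print(authors, publisher)
```

The example shows the two ways XML carries values: element content (the authors) and attributes (the publisher name).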
4. XML for information exchange
5. XML Publishing
6. XML for Data Modeling
Location → location (@val, @time, GPS)
GPS → gps (@satellite)
Location → location (@val, @time, Bstation)
Bstation → bstation (@id, @sigStrength)
Location → location (@val, @time, (GPS | Bstation))
7. XML as a logical data model
Location → location (@val, @time, (GPS | Bstation))
- Use data modeling features provided by XML
- Union types
- Recursive types
- Ordered relationships
- Easier to Query?
- Problems
- What is a good XML schema for an application?
- How do we store the data in relational databases?
8. XML for data integration
9. Database Design Stages
Application Requirements
Conceptual Design
Conceptual Schema
Logical Design
Logical Schema
Physical Design
Physical Schema
10. Logical Data Model and Redundancy
11. Specifying Structures for XML
- G = (N, T, P, S)
- N = {Book, Author, Publisher, PCDATA}
- T = {book, author, publisher, pcdata}
- S = {Book}
- Book → book (Author, Publisher)
- Author → author (PCDATA)
- Publisher → publisher (@name: String)
- PCDATA → pcdata (ε)
Regular Tree Grammar: every production rule is of the form A → a X, where A ∈ N, a ∈ T, and X is a regular expression over N.
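The grammar above can be checked mechanically. The sketch below is one possible validator, not the method used in the talk. It assumes a grammar in which each tag belongs to exactly one type, so a child's type can be read off its tag. It also assumes that Author may repeat (written `+` here), since the example document has two authors and the occurrence marker seems to have been lost from the slide text. PCDATA content and attributes are not checked.

```python
import re
import xml.etree.ElementTree as ET

# type name -> (element tag, content model as a regex over child type names).
# In the encoded child sequence every type name is followed by one space.
GRAMMAR = {
    "Book": ("book", r"(Author )+Publisher "),  # "+" is an assumption
    "Author": ("author", r""),
    "Publisher": ("publisher", r""),
}
START = {"Book"}

# One type per tag, so the tag alone determines the type.
TYPE_OF_TAG = {tag: typ for typ, (tag, _) in GRAMMAR.items()}

def conforms(elem, typ):
    """True if elem and its subtree match production `typ`."""
    tag, model = GRAMMAR[typ]
    if elem.tag != tag:
        return False
    child_types = [TYPE_OF_TAG.get(child.tag) for child in elem]
    if None in child_types:
        return False  # a child tag that no production generates
    encoded = "".join(t + " " for t in child_types)
    if re.fullmatch(model, encoded) is None:
        return False
    return all(conforms(c, t) for c, t in zip(elem, child_types))

def valid(xml_text):
    """True if the document root matches one of the start symbols."""
    root = ET.fromstring(xml_text)
    return any(conforms(root, s) for s in START)

print(valid("<book><author>A</author><publisher name='P'/></book>"))
```

Encoding the children as a string of type names lets an ordinary regular expression play the role of X in a rule A → a X.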
12. Specifying Constraints for XML
(Library, Person, <@name>)
(Library, Book, <@ISBN>)
(Library, Paper, <@title>)
(Person, Review, <@article>)
@article: IDREF references (Book | Paper)
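A key written as (rel, sel, <fields>) can be read as: within each rel element, the sel elements must have distinct field values. The sketch below checks this for attribute fields only. It assumes that each type corresponds to one tag and that sel ranges over descendants of rel. The function name `key_violations` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """
<library>
  <person name="RRM"><review article="b1" rating="5"/></person>
  <person name="JDU"><review article="b1" rating="4"/></person>
  <book ISBN="111" BID="b1"/>
</library>
"""

def key_violations(root, rel_tag, sel_tag, fields):
    """Return (rel, sel, value) triples where a key value repeats inside one rel element."""
    violations = []
    for rel in root.iter(rel_tag):
        seen = set()  # key values already seen inside this rel element
        for sel in rel.iter(sel_tag):
            if sel is rel:
                continue  # iter() also yields rel itself when the tags coincide
            value = tuple(sel.get(f) for f in fields)
            if value in seen:
                violations.append((rel.tag, sel.tag, value))
            seen.add(value)
    return violations

library = ET.fromstring(LIBRARY_XML)
print(key_violations(library, "library", "person", ["name"]))
print(key_violations(library, "person", "review", ["article"]))
```

Because the key is relative to Person, two different persons may review the same article. The same attribute would not be a key relative to Library.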
13. Why use XML as logical data model?
14. Path Expressions vs Joins
Query: Give names of students of professors of age 40
Relational algebra: π name ((σ age=40 (Professor)) ⋈ Student)
Path expression: professor[@age=40]/student/@sname
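To make the contrast concrete, the sketch below answers the query both ways: once with a path expression over nested XML, and once with an explicit join over two flat relations. The sample data and the function names are hypothetical. The path syntax is the XPath subset supported by `xml.etree.ElementTree`, which compares attribute values as strings.

```python
import xml.etree.ElementTree as ET

# Nested representation: students are subelements of their professor.
UNIVERSITY_XML = """
<university>
  <professor name="P1" age="40">
    <student sname="S1"/>
    <student sname="S2"/>
  </professor>
  <professor name="P2" age="55">
    <student sname="S3"/>
  </professor>
</university>
"""

def students_by_path(xml_text):
    """Follow the hierarchy: professor[@age='40']/student/@sname."""
    root = ET.fromstring(xml_text)
    return [s.get("sname") for s in root.findall("professor[@age='40']/student")]

# Flat representation: Professor(name, age) and Student(sname, advisor).
PROFESSOR = [("P1", 40), ("P2", 55)]
STUDENT = [("S1", "P1"), ("S2", "P1"), ("S3", "P2")]

def students_by_join():
    """Select professors of age 40, then join with Student on the advisor name."""
    return [sname
            for (pname, age) in PROFESSOR if age == 40
            for (sname, advisor) in STUDENT if advisor == pname]

print(students_by_path(UNIVERSITY_XML))
print(students_by_join())
```

In the nested form the relationship is navigated by structure, so no join condition has to be written.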
15. Union Types - attributes
Person → person (@name, ((@city, @state) | @zip))
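A small sketch of what the union type means for a single element: a person carries either a city and a state, or a zip, but not both. The check below looks only at attribute names. The function name `matches_person_type` is hypothetical.

```python
import xml.etree.ElementTree as ET

def matches_person_type(person):
    """Check a <person> element against (@name, ((@city, @state) | @zip))."""
    attrs = set(person.attrib)
    by_city = attrs == {"name", "city", "state"}  # first branch of the union
    by_zip = attrs == {"name", "zip"}             # second branch of the union
    return by_city or by_zip

print(matches_person_type(ET.fromstring('<person name="A" zip="01609"/>')))
```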
16. Union Types - Relationships
Person → person (@name, @zip, (Book | Paper))
17. Union Types - Relationships
Conference → conference (@name, @venue, Paper)
Journal → journal (@name, @publisher, Paper)
18. Recursive Types
Query: What are subparts of bike?
SQL:
WITH RECURSIVE SubPart (name) AS (
  (SELECT name FROM Assembly WHERE superPart = 'bike')
  UNION
  (SELECT R2.name FROM SubPart R1, Assembly R2 WHERE R2.superPart = R1.name))
SELECT * FROM SubPart
Path expression: part[@name='bike']//part/@name
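The two formulations can be run side by side. The sketch below uses Python's `sqlite3` module for the recursive query and `xml.etree.ElementTree` for the descendant traversal. The table layout Assembly(name, superPart), the sample parts and the function names are assumptions made for the illustration. The SQL is written without parentheses around the two SELECTs because SQLite does not accept them there.

```python
import sqlite3
import xml.etree.ElementTree as ET

ROWS = [("wheel", "bike"), ("frame", "bike"), ("spoke", "wheel"), ("tire", "wheel")]

PARTS_XML = """
<assembly>
  <part name="bike">
    <part name="wheel"><part name="spoke"/><part name="tire"/></part>
    <part name="frame"/>
  </part>
</assembly>
"""

def subparts_sql(rows, top):
    """All direct and indirect subparts of `top`, via a recursive CTE."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Assembly (name TEXT, superPart TEXT)")
    con.executemany("INSERT INTO Assembly VALUES (?, ?)", rows)
    query = """
        WITH RECURSIVE SubPart(name) AS (
            SELECT name FROM Assembly WHERE superPart = ?
            UNION
            SELECT R2.name FROM SubPart R1, Assembly R2
            WHERE R2.superPart = R1.name
        )
        SELECT name FROM SubPart
    """
    result = sorted(row[0] for row in con.execute(query, (top,)))
    con.close()
    return result

def subparts_xml(xml_text, top):
    """Same answer from nested <part> elements, using a descendant traversal."""
    root = ET.fromstring(xml_text)
    names = set()
    for part in root.iter("part"):
        if part.get("name") == top:
            # iter() walks the whole subtree, including `part` itself
            names.update(p.get("name") for p in part.iter("part") if p is not part)
    return sorted(names)

print(subparts_sql(ROWS, "bike"))
print(subparts_xml(PARTS_XML, "bike"))
```

With a recursive type the nesting itself records the part hierarchy, so the descendant step replaces the fixpoint computation.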
19. IDREF vs Foreign Keys
@PRef: IDREF references (Professor)
Query: Give names of students of professors of age 40
student[@PRef ⇒ professor/@age = 40]/@name
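Dereferencing an IDREF amounts to a lookup by ID. The sketch below builds an index from ID values to professor elements and follows each student's @PRef through it. The attribute @PID as the professor's ID, the sample data and the function name are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

UNIVERSITY_XML = """
<university>
  <professor name="P1" age="40" PID="p1"/>
  <professor name="P2" age="55" PID="p2"/>
  <student name="S1" PRef="p1"/>
  <student name="S2" PRef="p2"/>
  <student name="S3" PRef="p1"/>
</university>
"""

def students_of_professors_aged(xml_text, age):
    """Names of students whose @PRef points to a professor of the given age."""
    root = ET.fromstring(xml_text)
    # Index professors by ID so each reference is resolved with one lookup.
    by_id = {p.get("PID"): p for p in root.iter("professor")}
    names = []
    for student in root.iter("student"):
        professor = by_id.get(student.get("PRef"))
        # Attribute values are strings, so compare against str(age).
        if professor is not None and professor.get("age") == str(age):
            names.append(student.get("name"))
    return names

print(students_of_professors_aged(UNIVERSITY_XML, 40))
```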
20. IDREF as union of foreign keys
[Figure: Book, Person, Paper, Review]
21. IDREF as union of foreign keys
@article: IDREF references (Book | Paper)
22. Conceptual Model: ERex (ER extended for XML)
23. From ER model
24. From ER Model
25. Ordered Relationships
26. Categories and Set Constraints
PersonCity ∩ PersonZip = ∅
PersonCity ∪ PersonZip = Person
27. Categories
28. Set constraints on Roles
personBook ∩ personPaper = ∅
29. Set Constraints on Roles
confPaper ∩ journalPaper = ∅
confPaper ∪ journalPaper = Paper
30. Translating ERex schemas to XML schemas
31. System Architecture
32. 1:n relationships
33. Representing 1:n Relationships - subelement
University → university (Professor)
Professor → professor (@name, @age, Student)
Student → student (@name, @year)
(University, Professor, <@name>)
(University, Student, <@name>)
34. Representing 1:n Relationships - IDREF
University → university (Professor, Student)
Professor → professor (@name, @age, @PID)
Student → student (@name, @year, @PRef)
(University, Professor, <@name>)
(University, Student, <@name>)
@PRef: IDREF references (Professor)
35. Representing 1:n Relationships - foreign keys
University → university (Professor, Student)
Professor → professor (@name, @age)
Student → student (@name, @year, @advisor)
(University, Professor, <@name>)
(University, Student, <@name>)
(University, Student, <@advisor>) references (University, Professor, <@name>)
36. m:n relationships
37. Representing m:n relationships
Library → library (Person, Book)
Person → person (@name, Review)
Book → book (@ISBN, @title, @BID)
Review → review (@article, @rating)
(Library, Person, <@name>)
(Library, Book, <@ISBN>)
(Person, Review, <@article>)
@article: IDREF references (Book)
38. Representing m:n relationships
[Figure: Book, Person, Review]
Query: What is the rating given by RRM for book T1?
Relational algebra: π rating ((σ title='T1' (Book)) ⋈ (σ pname='RRM' (Review)))
Path expression: person[@name='RRM']/review[@article ⇒ Book/@title = 'T1']/@rating
39. N-ary relationships
Root → root (Company, Product, City)
Company → company (@name, Supply)
Supply → supply (@ProdRef, @CityRef, @qty)
Product → product (@name, @ProdID)
City → city (@name, @CityID)
(Root, Company, <@name>)
(Root, Product, <@name>)
(Root, City, <@name>)
(Company, Supply, <@ProdRef, @CityRef>)
@ProdRef: IDREF references (Product)
@CityRef: IDREF references (City)
40. Recursive Relationships
Assembly → assembly (Part)
Part → part (@name, @qty, Part)
(Assembly, Part, <@name>)
41. Ordered Relationships
Library → library (Person)
Person → person (@name, @zip, Book)
Book → book (@ISBN, @title)
(Library, Person, <@name>)
(Library, Book, <@ISBN>)
42. Categories and set constraints
PersonCity ∩ PersonZip = ∅
PersonCity ∪ PersonZip = Person
Person → person (@name, ((@city, @state) | @zip))
43. Categories and Set Constraints
Root → root (Person, Book, Paper)
Person → person (@name, Review)
Review → review (@article, @rating)
(Root, Person, <@name>)
(Person, Review, <@article>)
@article: IDREF references (Book | Paper)
44. Set constraints on Roles
personBook ∩ personPaper = ∅
Person → person (@name, (Book | Paper))
45. Set Constraints on Roles
confPaper ∩ journalPaper = ∅
confPaper ∪ journalPaper = Paper
Conference → conference (@name, @venue, Paper)
Journal → journal (@name, @publisher, Paper)
46. Converting ERex → XML
- Goals
  - Maximize relationships represented using subelement.
  - Others: try to represent using IDREF
47. Algorithm: ERex → XML
- A non-terminal symbol for each
  - entity type with key
  - m:n relationship
  - n-ary relationship
- Root non-terminal symbol
- Represent attributes
- Represent relationships and identify top nodes
  - 1:1 and 1:n relationships
  - m:n relationships
  - n-ary relationships
- Identify key and IDREF constraints.
48. personBook ∩ personPaper = ∅
PersonCity ∩ PersonZip = ∅
PersonCity ∪ PersonZip = Person
49. Person → person (Book | Paper)
Person → person (Review)
Review → review (@rating, @article)
Root → root (Person)
N = {Root, Person, Book, Paper, Review}
Book → book (@ISBN, @btitle, @year)
Paper → paper (@ptitle, @year, @journal)
Person → person (@name, ((@city, @state) | @zip))
(Root, Person, <@name>)
(Root, Book, <@ISBN>)
(Root, Paper, <@ptitle>)
(Person, Review, <@article>)
@article: IDREF references (Book | Paper)
50. Conclusions
- Obtained good XML Schema from ERex schemas
- XML for DB applications
- Single-type tree grammars
- Constraints specified on types
51. Open Problems
- Publishing relational databases/storing XML as relational
- Formalizing operations for XML (XPath, XQuery, XSLT)
- Translation of XML operations to SQL
- XML used for text/document publishing
- Keyword Search in XML documents
- Storing data consisting of structured and unstructured portions, integrating relational and XML stores.
52. Thank You!
- URL: http://www.cs.wpi.edu/mmani
- Email: mmani@cs.wpi.edu
53. XML Schema Language Proposals
- W3C DTD: local tree grammar
- W3C XML Schema: single type tree grammar
- ISO/OASIS RELAX NG: full-fledged regular tree grammar
54. Properties of different Regular Tree Grammar classes
- Expressiveness
  - Regular tree grammar strictly more expressive than single type tree grammar
  - Single type tree grammar strictly more expressive than local tree grammar
- Closure properties
  - Regular tree grammar closed under union, intersection and difference
  - Single type tree grammar/local tree grammar closed only under intersection
- Type assignment
  - Type assignment can be ambiguous for regular tree grammar.
  - Type assignment is unambiguous for local tree grammar/single type tree grammar.
55. Ambiguous Type Assignment
- G = (N, T, P, S)
- N = {Book, Author1, Author2, Publisher, PCDATA}
- T = {book, author, publisher, pcdata}
- S = {Book}
- Book → book (Author1, Author2, Publisher)
- Author1 → author (PCDATA)
- Author2 → author (PCDATA)
- Publisher → publisher (@name: String)
- PCDATA → pcdata (ε)
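The ambiguity can be seen by enumerating type assignments by brute force. The sketch below assumes the content model of Book is (Author1*, Author2*, Publisher). The repetition markers are an assumption, since the slide text as extracted shows no occurrence markers, and without them each author position would have exactly one possible type. The function name `type_assignments` is hypothetical.

```python
import itertools
import re

# Both Author1 and Author2 generate the tag "author", so the tag alone
# does not determine the type of an author element.
CANDIDATES = {"author": ["Author1", "Author2"], "publisher": ["Publisher"]}

# Assumed content model of Book, encoded over type names followed by a space.
BOOK_MODEL = r"(Author1 )*(Author2 )*Publisher "

def type_assignments(child_tags):
    """Return every assignment of types to the children that satisfies BOOK_MODEL."""
    results = []
    for combo in itertools.product(*(CANDIDATES[tag] for tag in child_tags)):
        encoded = "".join(t + " " for t in combo)
        if re.fullmatch(BOOK_MODEL, encoded):
            results.append(combo)
    return results

for assignment in type_assignments(["author", "author", "publisher"]):
    print(assignment)
```

A book with two authors admits three assignments under this assumption, which is the kind of ambiguity that a single type tree grammar rules out.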
56. Constraint Specification for XML: why?
If we represent all relationships only by hierarchies, then the logical model will have redundancy.
- What constraint specification?
  - Key, Foreign Key
  - ID/IDREF
57. Specifying Constraints for XML: Example
58. Specifying Constraints for XML
- Keys are specified using (rel, sel, field)
  - rel is the relative axis
  - sel is the selector axis
  - field is a set of path expressions
- For any element that belongs to rel, sel will give a set of elements. For this set of elements, field is the key.
- rel and sel can be types or path expressions
- Foreign keys are specified as (rel1, sel1, field1) references (rel2, sel2, field2)
59. Constraint Specification Proposals
- W3C XML Schema
  - Relative axis: type
  - Selector axis: path expression
- Keys for XML (WWW10)
  - Relative axis: path expression
  - Selector axis: path expression
- UCM (WWW10)
  - No relative axis
  - Selector axis: type
60. Our proposal
- Relative axis: type
- Selector axis: type
- IDREF and IDREFS identify target types