Title: XML Data
1. XML Data
- <book>
- <title> database systems </title>
- <author> John <lastname> Korth </lastname></author>
- <price currency="USD"> 5.87 </price>
- </book>
- DTD
- <!ELEMENT book (title, author, price)>
- <!ELEMENT title (#PCDATA)>
- <!ELEMENT author (#PCDATA | lastname)*>
2.
- <tr>
- <td width="20" valign="top"> Firma Karl-Heinz Rosowski </td>
- <td width="20" valign="top"> Maikstraße 14 </td>
- <td width="20" valign="top"> 22041 Hamburg </td>
- <td width="20" valign="top"> 721 99 64 </td>
- <td width="20" valign="top"> 21110111 </td>
- </tr>
HTML Version
- <?xml version="1.0"?>
- <Addresses>
- <Address id="12359">
- <Name>Firma Karl-Heinz Rosowski</Name>
- <Street>Maikstraße 14</Street>
- <ZIP>22041</ZIP>
- <City>Hamburg</City>
- <Tel>721 99 64</Tel>
- <Fax>21110111</Fax>
- <Email/>
- </Address>
- </Addresses>
XML Version
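Because the XML version names each field, a program can read the record by tag instead of by table-cell position. A minimal sketch using Python's standard xml.etree.ElementTree module; the function name read_addresses is an assumption made for this example, not part of the slides:

```python
import xml.etree.ElementTree as ET

# The address document from the slide, held in a string for the example.
ADDRESS_XML = """<?xml version="1.0"?>
<Addresses>
  <Address id="12359">
    <Name>Firma Karl-Heinz Rosowski</Name>
    <Street>Maikstraße 14</Street>
    <ZIP>22041</ZIP>
    <City>Hamburg</City>
    <Tel>721 99 64</Tel>
    <Fax>21110111</Fax>
    <Email/>
  </Address>
</Addresses>"""


def read_addresses(xml_text):
    """Return one dict per <Address>, keyed by child tag name."""
    root = ET.fromstring(xml_text)
    records = []
    for address in root.findall("Address"):
        record = {"id": address.get("id")}
        for child in address:
            # An empty element such as <Email/> has text None.
            record[child.tag] = child.text
        records.append(record)
    return records


addresses = read_addresses(ADDRESS_XML)
print(addresses[0]["City"])
```

The HTML version offers no comparable hook: every cell is a td, so a reader would have to rely on column order.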
3. XML - Document - Continued
- <?xml version="1.0"?> is the XML declaration.
- Elements: the most common form of markup, written <element> ... </element>. For example, <name>Jack Lemon</name>.
- Attributes are name-value pairs that occur inside start-tags after the element name. For example, <Address id="12359"> attaches the value 12359 to the attribute id of the Address element.
- Entity references handle characters that are special in XML; for example, &lt; stands for < in an XML document.
4.
- Comments: <!-- this is a comment -->
- CDATA sections: a CDATA (character data) section instructs the parser to ignore most markup characters. For example, in source code such as <![CDATA[ *p = &q; b = (i <= 3); ]]>, everything between <![CDATA[ and ]]> is passed to the application as character data, without interpretation.
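A short sketch of how entity references and CDATA sections reach an application, using Python's standard xml.etree.ElementTree parser. The element name code and the sample expression are assumptions chosen for illustration:

```python
import xml.etree.ElementTree as ET

# The same text written twice: once with entity references,
# once inside a CDATA section.
with_entities = "<code>b = (i &lt; 3 &amp;&amp; j &gt; 0)</code>"
with_cdata = "<code><![CDATA[b = (i < 3 && j > 0)]]></code>"

text_a = ET.fromstring(with_entities).text
text_b = ET.fromstring(with_cdata).text

# Both spellings deliver identical character data to the application.
print(text_a)
print(text_a == text_b)
```

Either form works; CDATA is mainly a convenience when a passage contains many markup characters.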
5. XML - DTD - Element Type Declarations
- Element type declarations identify the names of elements and the nature of their content. A typical element type declaration looks like:
- <!ELEMENT Address (Name, Street, ZIP?, City, Tel, Fax*, Email?)>
- Address is the element name, and (Name, Street, ZIP?, City, Tel, Fax*, Email?) is the content model. Every address must contain Name, Street, City, and Tel. ZIP and Email are optional, whereas there can be zero or more Fax numbers.
6.
- The declarations for Name, Street, ZIP, and the other child elements must also be given. For example:
- <!ELEMENT Name (#PCDATA)>
- Attribute list declarations identify which elements may have attributes, what values the attributes may hold, and what the default value is. Attribute values appear only within start-tags and empty-element tags.
- <Address id="12359">
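A content model is essentially a regular expression over child element names. The sketch below checks the Address model by hand, with ZIP and Email optional and Fax repeated zero or more times, as the text above describes. Python's standard library does not validate against a DTD, so this is an illustration of the idea, not a validator; ADDRESS_MODEL and matches_address_model are names made up for this example:

```python
import re
import xml.etree.ElementTree as ET

# Content model (Name, Street, ZIP?, City, Tel, Fax*, Email?) written as a
# regular expression over the space-terminated sequence of child tag names.
ADDRESS_MODEL = re.compile(r"Name Street (ZIP )?City Tel (Fax )*(Email )?")


def matches_address_model(address):
    """Check the order and multiplicity of an <Address> element's children."""
    child_tags = "".join(child.tag + " " for child in address)
    return ADDRESS_MODEL.fullmatch(child_tags) is not None


# Hypothetical instances: one conforming, one missing the required Street.
good = ET.fromstring(
    '<Address id="12359"><Name>N</Name><Street>S</Street>'
    "<City>C</City><Tel>1</Tel><Fax>2</Fax><Fax>3</Fax></Address>"
)
bad = ET.fromstring(
    '<Address id="1"><Name>N</Name><City>C</City><Tel>1</Tel></Address>'
)

print(matches_address_model(good))
print(matches_address_model(bad))
```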
7. XML - Summary
- HTML describes presentation
- XML describes content
- XML vs. HTML
- users define new tags
- arbitrary nesting
- validation is possible
8. XML and the Semistructured Data Model
- XML data is fundamentally different from relational and object-oriented data.
- XML is not rigidly structured.
- In the relational and OO data models, every data instance has a schema that is separate from and independent of the data.
- XML data is self-describing and can naturally model irregularities that cannot be modeled by the relational or OO data models.
9.
- For example, data items may have missing elements or multiple occurrences of the same element; elements may have atomic values in some data items and structured values in others; and collections of elements can have heterogeneous structure.
- Even XML data that has an associated DTD is self-describing (the schema is always stored with the data) and, except for very restricted forms of DTDs, may have all the irregularities described above.
- XML is an instance of semistructured data.
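The irregularities listed above can be seen in a small example. The records below are hypothetical (tag names and values are invented for illustration): one item has an atomic name and two phone elements, the other a structured name and no phone at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: the tag names and values are made up for illustration.
ITEMS_XML = """<items>
  <item><name>Ann Lee</name><phone>111</phone><phone>222</phone></item>
  <item><name><first>Bob</first><last>Ray</last></name></item>
</items>"""


def describe(item):
    """Summarize one item: is <name> atomic, and how many <phone> children?"""
    name = item.find("name")
    return {
        # No child elements means the name is a plain string.
        "name_is_atomic": len(name) == 0,
        "phone_count": len(item.findall("phone")),
    }


summaries = [describe(item) for item in ET.fromstring(ITEMS_XML)]
for summary in summaries:
    print(summary)
```

A single relational table would need nulls, a repeating-group workaround, and two encodings of name to hold both records.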
10. XML-QL
- Regular path expressions
- Pattern matching
- Uses edge-labeled graphs
- Extracts data from existing XML documents and constructs new XML documents
- Supports ordered and unordered views of an XML document
- Simple and declarative
11. XML-QL
- The simplest XML-QL queries extract data from an XML document. Consider the following DTD:
- <!ELEMENT book (author+, title, publisher)>
- <!ATTLIST book year CDATA>
- <!ELEMENT article (author+, title, year?, (shortversion | longversion))>
- <!ATTLIST article type CDATA>
- <!ELEMENT publisher (name, address)>
- <!ELEMENT author (firstname?, lastname)>
12. XML-QL Example Data
<bib>
  <book year="1995">
    <title> An Introduction to DB Systems </title>
    <author> <lastname> Date </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
  <book year="1995">
    <title> Foundations for OR Databases </title>
    <author> <lastname> Date </lastname> </author>
    <author> <lastname> Darwen </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
</bib>
13. Matching Data Using Patterns
- XML-QL uses element patterns to match data in an XML document.
- Find all authors of books whose publisher is Addison-Wesley in the XML document www.a.b.c/bib.xml:
- WHERE <book>
- <publisher><name>Addison-Wesley</name></publisher>
- <title> $t </title>
- <author> $a </author>
- </book> IN "www.a.b.c/bib.xml"
- CONSTRUCT $a
- The query matches every <book> element in the XML document that has at least one <title> element, one <author> element, and one <publisher> element whose <name> is Addison-Wesley. For each such match, it binds $t and $a to every title and author pair.
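To make the binding semantics concrete, here is a rough Python emulation of the query over the example data. It is a sketch of what the pattern does, not an XML-QL implementation; the function name authors_by_publisher is an assumption, and the document is inlined as a string rather than fetched from www.a.b.c/bib.xml:

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
  <book year="1995">
    <title>An Introduction to DB Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1995">
    <title>Foundations for OR Databases</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>"""


def authors_by_publisher(bib, publisher_name):
    """Yield one (title, author element) binding per matching pattern instance."""
    for book in bib.iter("book"):
        names = [(n.text or "").strip() for n in book.findall("publisher/name")]
        if publisher_name not in names:
            continue
        # Every title/author combination inside the book is a separate binding.
        for title in book.findall("title"):
            for author in book.findall("author"):
                yield (title.text or "").strip(), author


bib = ET.fromstring(BIB_XML)
bindings = list(authors_by_publisher(bib, "Addison-Wesley"))
for title, author in bindings:
    print(author.findtext("lastname"), "|", title)
```

The second book contributes two bindings, one per author, which is why a flat CONSTRUCT repeats the title.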
14. XML-QL Constructing XML Data
- Often we would like to format the result.
- Find all authors and titles of books whose publisher is Addison-Wesley in the XML document www.a.b.c/bib.xml:
- WHERE <book>
- <publisher><name>Addison-Wesley</></>
- <title> $t </title>
- <author> $a </author>
- </book> IN "www.a.b.c/bib.xml"
- CONSTRUCT <result>
- <author> $a </>
- <title> $t </>
- </>
15. Constructing XML Data - cont.
Result of the query:
<result>
  <author> <lastname> Date </lastname> </author>
  <title> An Introduction to DB Systems </title>
</result>
<result>
  <author> <lastname> Date </lastname> </author>
  <title> Foundations for OR Databases </title>
</result>
<result>
  <author> <lastname> Darwen </lastname> </author>
  <title> Foundations for OR Databases </title>
</result>
One result for each author, duplicating title information.
16. XML-QL Nested Queries
WHERE <book>
        <title> $t </>
        <publisher><name>Addison-Wesley</></>
      </> CONTENT_AS $p IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <title> $t </>
            WHERE <author> $a </> IN $p
            CONSTRUCT <author> $a </>
          </>
Result of the query:
<result>
  <title> An Introduction to DB Systems </title>
  <author> <lastname> Date </lastname> </author>
</result>
<result>
  <title> Foundations for OR Databases </title>
  <author> <lastname> Date </lastname> </author>
  <author> <lastname> Darwen </lastname> </author>
</result>
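The nesting can be emulated with an outer loop over books and an inner loop over the content of each matched book. This is a sketch under the same assumptions as before: an inlined copy of the example data and a made-up function name, titles_with_authors.

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
  <book year="1995">
    <title>An Introduction to DB Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1995">
    <title>Foundations for OR Databases</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>"""


def titles_with_authors(bib, publisher_name):
    """Build one <result> per matching book: its title, then all its authors."""
    results = []
    for book in bib.iter("book"):
        if book.findtext("publisher/name", "").strip() != publisher_name:
            continue
        result = ET.Element("result")
        ET.SubElement(result, "title").text = book.findtext("title", "").strip()
        # Inner query: ranges over the content of this one book only.
        # The author elements are shared with the source tree, not copied.
        for author in book.findall("author"):
            result.append(author)
        results.append(result)
    return results


results = titles_with_authors(ET.fromstring(BIB_XML), "Addison-Wesley")
for result in results:
    print(result.findtext("title"), [a.findtext("lastname") for a in result.findall("author")])
```

Each title now appears once, with its authors grouped under it.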
17. XML-QL Join Queries
XML-QL queries can express joins by matching two or more elements that contain the same value. Find all articles that have at least one author who has written a book since 1995.
WHERE <article>
        <author>
          <firstname> $f </>   // firstname $f
          <lastname> $l </>    // lastname $l
        </>
      </> CONTENT_AS $a IN "www.a.b.c/bib.xml",
      <book year=$y>
        <author>
          <firstname> $f </>   // join on same firstname $f
          <lastname> $l </>    // join on same lastname $l
        </>
      </> IN "www.a.b.c/bib.xml",
      $y > 1995
CONSTRUCT <article> $a </>
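The join condition is that the same (firstname, lastname) pair is bound on both sides. A rough Python equivalent follows; the sample data is hypothetical (all names, titles, and years are invented), as are the function names:

```python
import xml.etree.ElementTree as ET

# Hypothetical data: names, titles, and years are invented for the example.
BIB_XML = """<bib>
  <article><author><firstname>Ann</firstname><lastname>Lee</lastname></author>
    <title>Article A</title></article>
  <article><author><firstname>Bob</firstname><lastname>Ray</lastname></author>
    <title>Article B</title></article>
  <book year="1997"><author><firstname>Ann</firstname><lastname>Lee</lastname></author>
    <title>Book X</title></book>
  <book year="1990"><author><firstname>Bob</firstname><lastname>Ray</lastname></author>
    <title>Book Y</title></book>
</bib>"""


def author_keys(element):
    """Return the set of (firstname, lastname) pairs among an element's authors."""
    keys = set()
    for author in element.findall("author"):
        first = author.findtext("firstname")
        last = author.findtext("lastname")
        # The pattern needs both names, so authors missing either are skipped.
        if first is not None and last is not None:
            keys.add((first, last))
    return keys


def articles_by_recent_book_authors(bib, after_year):
    """Select articles sharing an author with a book whose year > after_year."""
    recent_authors = set()
    for book in bib.findall("book"):
        if int(book.get("year")) > after_year:
            recent_authors |= author_keys(book)
    # Join condition: some author pair appears on both sides.
    return [a for a in bib.findall("article") if author_keys(a) & recent_authors]


bib = ET.fromstring(BIB_XML)
selected = articles_by_recent_book_authors(bib, 1995)
print([a.findtext("title") for a in selected])
```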
18. XML-QL Data Model for XML
- An XML graph is a graph G in which each node is represented by a unique string called an object identifier (OID), G's edges are labeled with element tags, G's nodes are labeled with sets of attribute-value pairs, G's leaves are labeled with one string value, and G has a distinguished node called the root.
19. XML-QL Data Model for XML
- The model allows several edges between the same two nodes, with the following restrictions:
- between any two nodes there can be at most one edge with a given label
- a node cannot have two leaf children with the same label and the same string value
- XML graphs are not only derived from XML documents, but are also generated by queries.
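One way to picture the model is to convert a parsed document into explicit nodes and labeled edges. The sketch below is an assumption-laden illustration: the OID format ("o1", "o2", ...) and the dictionary layout are invented here, the root element's own tag is not recorded because the root has no incoming edge, and IDREF edges are not followed.

```python
import itertools
import xml.etree.ElementTree as ET


def to_edge_labeled_graph(root):
    """Turn an element tree into (nodes, edges, root_oid).

    nodes maps OID -> {"attrs": dict, "value": str or None};
    edges is a list of (parent_oid, element_tag, child_oid).
    """
    counter = itertools.count(1)
    nodes, edges = {}, []

    def visit(element):
        oid = "o%d" % next(counter)
        is_leaf = len(element) == 0
        nodes[oid] = {
            "attrs": dict(element.attrib),
            # Only leaves carry a string value in this sketch.
            "value": (element.text or "").strip() if is_leaf else None,
        }
        for child in element:
            # The child's tag labels the edge leading to it.
            edges.append((oid, child.tag, visit(child)))
        return oid

    root_oid = visit(root)
    return nodes, edges, root_oid


doc = ET.fromstring(
    '<book year="1995"><title>DB</title>'
    "<author><lastname>Date</lastname></author></book>"
)
nodes, edges, root_oid = to_edge_labeled_graph(doc)
for edge in sorted(edges):
    print(edge)
```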
20. XML - Element Identity, IDs, and IDREFS
- For element sharing, XML reserves an attribute of type ID, which allows a unique key to be associated with an element.
- An attribute of type IDREF allows an element to refer to another element with the designated key, and one of type IDREFS may refer to multiple elements.
21.
- <!ATTLIST person ID ID #REQUIRED>
- <!ATTLIST article author IDREFS #IMPLIED>
- <person ID="o123">
- <firstname>John</firstname>
- <lastname>Smith</lastname>
- </person>
- <person ID="o234">
- . . .
- </person>
- <article author="o123 o234">
- <title> ... </title>
- <year> 1995 </year>
- </article>
22. XML - Element Identity, IDs, and IDREFS
23.
The following query produces all lastname, title pairs by joining the article element's author attribute (of type IDREFS) with the person element's ID attribute value.
WHERE <article author=$i>
        <title> </> ELEMENT_AS $t
      </>,
      <person ID=$i>
        <lastname> </> ELEMENT_AS $l
      </>
CONSTRUCT <result> $t $l </>
The idiom <title></> ELEMENT_AS $t binds $t to a <title> element with arbitrary contents. The element expression <title/> matches a <title> element with empty contents.
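The same ID/IDREFS join can be sketched in Python by indexing person elements by ID and splitting the article's author attribute on whitespace. The document below is hypothetical: the wrapper element doc, the second person's names, and the article title are invented to fill in what the slide elides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the second person and the article title are invented.
DOC_XML = """<doc>
  <person ID="o123"><firstname>John</firstname><lastname>Smith</lastname></person>
  <person ID="o234"><firstname>Jane</firstname><lastname>Doe</lastname></person>
  <article author="o123 o234"><title>Some Title</title><year>1995</year></article>
</doc>"""


def lastname_title_pairs(doc):
    """Join article author IDREFS with person IDs; return (lastname, title) pairs."""
    people = {p.get("ID"): p for p in doc.findall("person")}
    pairs = []
    for article in doc.findall("article"):
        # An IDREFS value is a whitespace-separated list of IDs.
        for ref in article.get("author", "").split():
            person = people.get(ref)
            if person is not None:
                pairs.append((person.findtext("lastname"), article.findtext("title")))
    return pairs


pairs = lastname_title_pairs(ET.fromstring(DOC_XML))
print(pairs)
```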
24. XML-QL - Advanced Examples
- Tag variables
- Regular path expressions
- Transforming XML data (from one DTD to another)
- Integrating data from different XML sources
- Embedding queries in data
- For XML-QL, see http://www.w3.org/TR/NOTE-xml-ql