Title: XML, XML Schema, XPath and XQuery Query Languages
1XML, XML Schema, XPath and XQuery Query Languages
Slides collated from several sources, including
D. Suciu at Univ. of Washington
2XML Data
3XML
- W3C standard to complement HTML
- origins: structured text (SGML)
- motivation:
  - HTML describes presentation
  - XML describes content
- HTML ∈ XML ⊂ SGML
4From HTML to XML
HTML describes the presentation
5HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
    Abiteboul, Hull, Vianu
    <br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
    Abiteboul, Buneman, Suciu
    <br> Morgan Kaufmann, 1999
6XML
<bibliography>
  <book> <title> Foundations </title>
         <author> Abiteboul </author>
         <author> Hull </author>
         <author> Vianu </author>
         <publisher> Addison Wesley </publisher>
         <year> 1995 </year>
  </book>
  ...
</bibliography>
XML describes the content
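Because the tags name the content, a program can pull out fields by name. The sketch below uses Python's standard xml.etree.ElementTree; the second book entry and the helper name summarize are illustrative assumptions, not part of the slide.

```python
import xml.etree.ElementTree as ET

# Sample data modeled on the slide; the second <book> is added for illustration.
BIB = """
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
  <book>
    <title>Data on the Web</title>
    <author>Abiteboul</author>
    <author>Buneman</author>
    <author>Suciu</author>
    <publisher>Morgan Kaufmann</publisher>
    <year>1999</year>
  </book>
</bibliography>
"""

def summarize(xml_text):
    """Return (title, [authors], year) for every book element."""
    root = ET.fromstring(xml_text)
    return [
        (book.findtext("title"),
         [author.text for author in book.findall("author")],
         int(book.findtext("year")))
        for book in root.findall("book")
    ]

books = summarize(BIB)
```

The same extraction from the HTML version would have to guess which italic text is a title and which line holds the year.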
7XML Terminology
- tags: book, title, author, ...
- start tag: <book>, end tag: </book>
- elements: <book>...</book>, <author>...</author>
- elements are nested
- empty element: <red></red>, abbreviated <red/>
- an XML document has a single root element
A well-formed XML document is one whose tags match.
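Well-formedness can be tested by simply trying to parse the text. A minimal sketch with Python's standard parser follows; the function name is_well_formed and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if xml_text parses as one properly nested XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

samples = [
    "<book><title>Foundations</title></book>",   # matching, nested tags
    "<red/>",                                    # abbreviated empty element
    "<book><title>Foundations</book></title>",   # tags closed in the wrong order
    "<book/><book/>",                            # two root elements
]
results = [is_well_formed(s) for s in samples]
```

This only checks syntax; whether the document follows a DTD or schema is a separate question.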
8XML Attributes
<book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  ...
  <year> 1995 </year>
</book>
Attributes are an alternative way to represent data.
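To illustrate that the choice between attribute and element is a modeling decision, the sketch below reads the same price from both encodings. The element-based variant and the helper price_of are assumptions added for the comparison.

```python
import xml.etree.ElementTree as ET

as_attributes = ET.fromstring(
    '<book price="55" currency="USD"><title>Foundations of Databases</title></book>'
)
# Hypothetical alternative encoding of the same data as child elements.
as_elements = ET.fromstring(
    "<book><price>55</price><currency>USD</currency>"
    "<title>Foundations of Databases</title></book>"
)

def price_of(book):
    """Read the price whether it is stored as an attribute or a child element."""
    raw = book.get("price")          # attribute lookup; None if absent
    if raw is None:
        raw = book.findtext("price") # fall back to the child element
    return int(raw)

prices = (price_of(as_attributes), price_of(as_elements))
```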
9More XML Oids and References
<person id="o555"> <name> Jane </name> </person>
<person id="o456"> <name> Mary </name>
                   <children idref="o123 o555"/>
</person>
<person id="o123" mother="o456"> <name>John</name>
</person>
Oids and references in XML are just syntax.
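Since the references are only strings, the application has to resolve them itself. A small sketch, assuming the three person elements are wrapped in a hypothetical <people> root; the helper names are also illustrative.

```python
import xml.etree.ElementTree as ET

PEOPLE = """
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>
"""
root = ET.fromstring(PEOPLE)

# Index every person by its oid so references can be followed.
by_id = {p.get("id"): p for p in root.iter("person")}

def name_of(oid):
    return by_id[oid].findtext("name")

def children_of(oid):
    """Follow the space-separated idref list on the <children> element."""
    node = by_id[oid].find("children")
    if node is None:
        return []
    return [name_of(ref) for ref in node.get("idref").split()]

def mother_of(oid):
    ref = by_id[oid].get("mother")
    return name_of(ref) if ref else None
```

Nothing in the parser stops an idref from pointing at an oid that does not exist; by_id[ref] would simply raise KeyError here.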
10So Far
- Differences between XML data and relational data?
  - Data model?
  - Typed?
  - Homogeneity?
  - Correctness?
  - Usage/purpose?
11XML Data Model
- Numerous competing models
- Document Object Model (DOM)
- class hierarchy (node, element, attribute, ...)
- defines API to inspect/modify the document
- XML query data model (formal)
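As one concrete DOM binding, Python ships xml.dom.minidom. The sketch below inspects and then modifies a tiny document; the sample document and the added year/currency values are assumptions for illustration.

```python
from xml.dom import minidom, Node

doc = minidom.parseString('<book price="55"><title>Foundations</title></book>')
root = doc.documentElement

# Inspect: every part of the document is a node of some class in the hierarchy.
title = root.getElementsByTagName("title")[0]
root_is_element = root.nodeType == Node.ELEMENT_NODE
text_is_text = title.firstChild.nodeType == Node.TEXT_NODE
title_text = title.firstChild.data

# Modify: create new nodes through the document and attach them.
year = doc.createElement("year")
year.appendChild(doc.createTextNode("1995"))
root.appendChild(year)
root.setAttribute("currency", "USD")

serialized = root.toxml()
```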
12XML Namespaces
- http://www.w3.org/TR/REC-xml-names
- name ::= [prefix:]localpart
<book xmlns:isbn="www.isbn-org.org/def">
  <title> ... </title>
  <number> 15 </number>
  <isbn:number> ... </isbn:number>
</book>
13XML Namespaces
- syntactic: <number>, <isbn:number>
- semantic: provide URL for shared schema
<tag xmlns:mystyle="http://...">
  <mystyle:title> ... </mystyle:title>
  <mystyle:number> ...
</tag>
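A parser keeps the two number elements apart by expanding each prefix to its namespace URI. The sketch below shows this with xml.etree.ElementTree; the element contents (including ISBN-PLACEHOLDER) are made up, and the URI is the one from the earlier slide, used only as an identifier.

```python
import xml.etree.ElementTree as ET

DOC = """
<book xmlns:isbn="www.isbn-org.org/def">
  <title>Foundations</title>
  <number>15</number>
  <isbn:number>ISBN-PLACEHOLDER</isbn:number>
</book>
"""
root = ET.fromstring(DOC)

# Prefix-to-URI map used when searching; the prefix itself is arbitrary.
ns = {"isbn": "www.isbn-org.org/def"}

plain = root.findtext("number")                       # unqualified <number>
qualified = root.findtext("isbn:number", namespaces=ns)
expanded_tag = root.find("isbn:number", ns).tag       # how the parser stores the name
```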
14So Far
- What are namespaces good for ?
- Are they typically available for relational
databases?
15Schemas for XML
16DTD - Element Type Definitions
<!ELEMENT paper (title, author*, year, (journal|conference))>
17XML Schemas
- generalizes DTDs (an SGML derivative)
- uses XML syntax, unlike DTDs
- two main documents: structure and data types
- XML Schema is more powerful than DTDs, but more complex
18XML Schema
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="year"/>
    <xsd:choice>
      <xsd:element name="journal"/>
      <xsd:element name="conference"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>
DTD: <!ELEMENT paper (title, author*, year, (journal|conference))>
19So Far
- Differences between XML schema and relational schema?
  - Purpose? Do we need it?
  - Definition time?
  - Strictness of typing?
  - Underlying model?
20Elements versus Types in XML Schema
DTD: <!ELEMENT person (name, address)>
With a local (anonymous) type:
<xsd:element name="person">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="address" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
With a named type ttt:
<xsd:element name="person" type="ttt"/>
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="address" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
21Elements versus Types in XML Schema
- Types
- Simple types (integers, strings, ...)
- Complex types (regular expressions, like in DTDs)
- Element-type-element alternation
- Root element has a complex type
- Complex type is a regular expression of elements
- Those elements have their complex types ...
- ...
- Leaves have simple types
22Local and Global Types in XML Schema
- Local type:
<xsd:element name="person">
  [define locally the person's type]
</xsd:element>
- Global type:
<xsd:element name="person" type="ttt"/>
<xsd:complexType name="ttt">
  [define here the type ttt]
</xsd:complexType>
Global types can be reused in other elements.
23Local vs. Global Elements in XML Schema
- Local element:
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element name="address" type="..."/> ...
  </xsd:sequence>
</xsd:complexType>
- Global element:
<xsd:element name="address" type="..."/>
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element ref="address"/> ...
  </xsd:sequence>
</xsd:complexType>
Global elements: like in DTDs.
26Regular Expressions in XML Schema
- Recall the element-type-element alternation:
<xsd:complexType name="....">
  [regular expression on elements]
</xsd:complexType>
- Regular expressions:
  - <xsd:sequence> A B C </...>  =  A B C
  - <xsd:choice> A B C </...>  =  A | B | C
  - <xsd:group> A B C </...>  =  (A B C)
  - <xsd:... minOccurs="0" maxOccurs="unbounded"> .. </...>  =  (...)*
  - <xsd:... minOccurs="0" maxOccurs="1"> .. </...>  =  (...)?
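The correspondence with regular expressions can be made literal. The sketch below is a toy, not a schema validator: it turns sequence, choice and occurrence bounds into a Python regular expression over the list of child tags, and checks the paper content model (title, author*, year, (journal|conference)). All helper names are assumptions.

```python
import re
import xml.etree.ElementTree as ET

def elem(name):
    """One child element, encoded as the token <name>."""
    return "<" + re.escape(name) + ">"

def seq(*parts):
    return "".join(parts)

def choice(*parts):
    return "(?:" + "|".join(parts) + ")"

def occurs(part, min_occurs=1, max_occurs=1):
    """max_occurs=None plays the role of maxOccurs="unbounded"."""
    upper = "" if max_occurs is None else str(max_occurs)
    return "(?:%s){%d,%s}" % (part, min_occurs, upper)

# (title, author*, year, (journal|conference))
PAPER = seq(elem("title"),
            occurs(elem("author"), 0, None),
            elem("year"),
            choice(elem("journal"), elem("conference")))

def matches(content_model, element):
    """Check the sequence of child tags against the content model."""
    children = "".join("<%s>" % child.tag for child in element)
    return re.fullmatch(content_model, children) is not None

good = ET.fromstring("<paper><title/><author/><author/><year/><journal/></paper>")
no_authors = ET.fromstring("<paper><title/><year/><conference/></paper>")
no_year = ET.fromstring("<paper><title/><author/><journal/></paper>")
both = ET.fromstring("<paper><title/><year/><journal/><conference/></paper>")
```

Only child element names are checked; attribute and simple-type checks are left out.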
27Attributes in XML Schema
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    . . . . . .
  </xsd:sequence>
  <xsd:attribute name="language" type="xsd:NMTOKEN" fixed="English"/>
</xsd:complexType>
Attributes are associated with the type, not with the element.
Only complex types can carry them, so adding attributes to a simple type takes more work.
28Derived Types by Extensions
<complexType name="Address">
  <sequence>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
</complexType>
<complexType name="USAddress">
  <complexContent>
    <extension base="ipo:Address">
      <sequence>
        <element name="state" type="ipo:USState"/>
        <element name="zip" type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
Corresponds to inheritance.
29Key Constraints in XML
30Keys in XML Schema
XML:
<purchaseReport>
  <regions>
    <zip code="95819">
      <part number="872-AA" quantity="1"/>
      <part number="926-AA" quantity="1"/>
      <part number="833-AA" quantity="1"/>
      <part number="455-BX" quantity="1"/>
    </zip>
    <zip code="63143">
      <part number="455-BX" quantity="4"/>
    </zip>
  </regions>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="926-AA">Baby Monitor</part>
    <part number="833-AA">Lapis Necklace</part>
    <part number="455-BX">Sturdy Shelves</part>
  </parts>
</purchaseReport>
XML Schema for Key:
<key name="NumKey">
  <selector xpath="parts/part"/>
  <field xpath="@number"/>
</key>
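What the NumKey constraint asks for can be sketched by hand: select the nodes, read the field, and require every value to exist and be unique. This is an illustration of the semantics, not how a schema processor is implemented; check_key and its return shape are assumptions.

```python
import xml.etree.ElementTree as ET

REPORT = ET.fromstring("""
<purchaseReport>
  <regions>
    <zip code="95819">
      <part number="872-AA" quantity="1"/>
      <part number="926-AA" quantity="1"/>
      <part number="833-AA" quantity="1"/>
      <part number="455-BX" quantity="1"/>
    </zip>
    <zip code="63143">
      <part number="455-BX" quantity="4"/>
    </zip>
  </regions>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="926-AA">Baby Monitor</part>
    <part number="833-AA">Lapis Necklace</part>
    <part number="455-BX">Sturdy Shelves</part>
  </parts>
</purchaseReport>
""")

def check_key(context, selector, field_attr):
    """Return (missing_count, duplicate_values) for a one-attribute key."""
    seen, duplicates, missing = set(), [], 0
    for node in context.findall(selector):   # selector, relative to context
        value = node.get(field_attr)          # field: an attribute of the node
        if value is None:
            missing += 1                      # a key requires existence
        elif value in seen:
            duplicates.append(value)          # and uniqueness
        else:
            seen.add(value)
    return missing, duplicates

parts_key = check_key(REPORT, "parts/part", "number")
regions_key = check_key(REPORT, "regions/zip/part", "number")
```

The key holds for parts/part, while the same field would not be a key over regions/zip/part, where part 455-BX is ordered in two regions.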
31Keys in XML Schema
<key name="someDummyNameHere">
  <selector xpath="p"/>
  <field xpath="p1"/>
  <field xpath="p2"/>
  . . .
  <field xpath="pk"/>
</key>
Notes:
- All XPath expressions start at the element currently being defined.
- The fields must identify a single node.
32Keys in XML Schema
- Unique: guarantees uniqueness
- Key: guarantees uniqueness and existence
- All XPath expressions are restricted:
  - /a/b | /a/c  OK for selector
  - //a/b/*/c  OK for field
- Note: better than the DTD ID mechanism
33Examples of Keys in XML Schema
<key name="fullName">
  <selector xpath=".//person"/>
  <field xpath="firstname"/>
  <field xpath="surname"/>
</key>
<unique name="nearlyID">
  <selector xpath=".//*"/>
  <field xpath="@id"/>
</unique>
Note: each person must have a single firstname and a single surname.
34Foreign Keys in XML Schema
<keyref name="personRef" refer="fullName">
  <selector xpath=".//personPointer"/>
  <field xpath="@first"/>
  <field xpath="@last"/>
</keyref>
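The keyref says that every (first, last) pair on a personPointer must match some (firstname, surname) key value. A hand-rolled check of that rule might look as follows; the sample document, its <club> root and the names in it are invented for illustration.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""
<club>
  <person><firstname>Jane</firstname><surname>Doe</surname></person>
  <person><firstname>John</firstname><surname>Roe</surname></person>
  <personPointer first="Jane" last="Doe"/>
  <personPointer first="Mary" last="Poe"/>
</club>
""")

# Key values: one (firstname, surname) tuple per person.
keys = {
    (p.findtext("firstname"), p.findtext("surname"))
    for p in DOC.findall(".//person")
}

# References whose field values match no key value.
dangling = [
    (ref.get("first"), ref.get("last"))
    for ref in DOC.findall(".//personPointer")
    if (ref.get("first"), ref.get("last")) not in keys
]
```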
35So Far
- Differences between keys/foreign keys in XML and in the relational model?
  - Purpose?
  - Underlying model?
36XPath
38XPath
- Goal: permit access to some nodes of a document
- XPath's main construct: axis navigation
- Navigation step: axis::node-test[predicates]
- Examples:
  - descendant::node()
  - child::author
  - attribute::booktitle[. = "XML"]
39XPath
- An XPath path consists of one or more navigation steps, separated by /
- Navigation step: axis::node-test[predicates]
- Examples:
  - /descendant::node()/child::author
  - /descendant::node()/child::author/parent::*/attribute::booktitle[. = "XML"][2]
- XPath offers shortcuts:
  - no axis means child
  - // ≡ /descendant-or-self::node()/
40XPath- Child Axis Navigation
- author is shorthand for child::author
- Examples:
  - aaa -- all the child nodes labeled aaa
  - aaa/bbb -- all the bbb grandchildren that are children of aaa children
  - */bbb -- all the bbb grandchildren, under any child
- Notes:
  - . -- the context node
  - / -- the root node
42XPath- Child Axis Navigation
- /doc -- all doc children of the root
- ./aaa -- all aaa children of the context node (equivalent to aaa)
- text() -- all text children of the context node
- node() -- all children of the context node (includes text nodes, but not attribute nodes)
- .. -- parent of the context node
- .// -- the context node and all its descendants
- // -- the root node and all its descendants
- //text() -- all the text nodes in the document
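Several of these paths can be tried directly with Python's xml.etree.ElementTree, which supports only a subset of abbreviated XPath (no text() or node() tests, so text is read through itertext()). The sample document is an assumption; the context node is its root element <doc>.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""
<doc>
  <aaa><bbb>one</bbb></aaa>
  <ccc><bbb>two</bbb></ccc>
  <aaa>three</aaa>
</doc>
""")

aaa = DOC.findall("aaa")              # aaa children of the context node
same_aaa = DOC.findall("./aaa")       # equivalent to the line above
aaa_bbb = DOC.findall("aaa/bbb")      # bbb grandchildren under aaa children
any_bbb = DOC.findall("*/bbb")        # bbb grandchildren under any child
descendants = DOC.findall(".//*")     # all descendant elements
parent = DOC.find("aaa/bbb/..")       # step back up to the parent

# Stand-in for .//text(): collect the non-blank text of all descendants.
texts = [t.strip() for t in DOC.itertext() if t.strip()]
```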
43Predicates
- *[2] -- the second child element of the context node
- chapter[5] -- the fifth chapter child of the context node
- *[last()] -- the last child element of the context node
- chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is "introduction" (the string-value is the concatenation of all text in descendant text nodes)
- person[.//firstname = "joe"] -- the person children of the context node that have among their descendants a firstname element with string-value "joe"
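A few of these predicates can be run with xml.etree.ElementTree, whose XPath subset covers positions and child-value tests but not a descendant path inside a predicate; that last case is written in plain Python below. The sample document is invented for the example.

```python
import xml.etree.ElementTree as ET

BOOK = ET.fromstring("""
<book>
  <chapter><title>introduction</title></chapter>
  <chapter><title>data model</title></chapter>
  <chapter><title>queries</title></chapter>
  <person><name><firstname>joe</firstname></name></person>
  <person><name><firstname>ann</firstname></name></person>
</book>
""")

second_chapter = BOOK.find("chapter[2]")          # positions start at 1
last_chapter = BOOK.find("chapter[last()]")
intro = BOOK.findall("chapter[title='introduction']")

# person[.//firstname = "joe"] is outside ElementTree's subset,
# so the descendant test is spelled out by hand.
joes = [
    p for p in BOOK.findall("person")
    if any(f.text == "joe" for f in p.iter("firstname"))
]
```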
44Axis navigation
- So far, our expressions have moved us down by moving to child nodes.
- Exceptions are:
  - . stay where you are
  - / go to the root
  - // all descendants of the root
  - .// all descendants of the context node
45Axis navigation
- XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
- Some of these describe single nodes:
  - self, parent
- Some describe sequences of nodes:
  - all others
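To make a few axes concrete, the sketch below computes them by hand over an ElementTree, for element nodes only. The class name Axes and its method names are assumptions; results are returned nearest-first for ancestor and in document order otherwise.

```python
import xml.etree.ElementTree as ET

class Axes:
    """Hand-rolled versions of a few XPath axes (elements only)."""

    def __init__(self, root):
        # ElementTree keeps no parent pointers, so build child -> parent once.
        self.parent_of = {child: node for node in root.iter() for child in node}

    def parent(self, node):
        return self.parent_of.get(node)       # None for the root element

    def ancestor(self, node):
        out = []
        while node in self.parent_of:
            node = self.parent_of[node]
            out.append(node)
        return out                            # nearest ancestor first

    def descendant(self, node):
        return list(node.iter())[1:]          # iter() yields the node itself first

    def _siblings(self, node):
        return list(self.parent_of[node]) if node in self.parent_of else [node]

    def following_sibling(self, node):
        siblings = self._siblings(node)
        return siblings[siblings.index(node) + 1:]

    def preceding_sibling(self, node):
        siblings = self._siblings(node)
        return siblings[:siblings.index(node)]

def tags(nodes):
    return [n.tag for n in nodes]

root = ET.fromstring("<a><b><d/><e/></b><c/></a>")
axes = Axes(root)
b, d, e = root.find("b"), root.find("b/d"), root.find("b/e")
```

For example, tags(axes.ancestor(d)) lists the tags on the path from d up to the root.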
46XPath Navigation Axes
[Figure: diagram of the XPath navigation axes relative to the context node: ancestor, following-sibling, preceding-sibling, self, child, attribute, following, preceding, namespace, descendant]
47XPath Abbreviated Syntax
(nothing)   child::
@           attribute::
//          /descendant-or-self::node()/
.           self::node()
.//         descendant-or-self::node()/
..          parent::node()
/           (document root)
48XPath
- Widely adopted -- in XML Schema and in many query languages.
- About as expressive as regular path expressions
49So Far
- Differences between SQL and XPath?
- Which query capabilities are similar?
- What features does SQL have, but not XPath?
- What features does XPath support, but not SQL?
- Is XPath a full-fledged query language?
50Query Languages - XQuery
51Summary of XQuery
- FLWR expressions
- FOR and LET expressions
- Collections and sorting
- Resources:
  - "XQuery: A Query Language for XML", Chamberlin, Florescu, et al.
  - W3C recommendation: www.w3.org/TR/xquery/
52XQuery
- Designed based on Quilt (which is based on XML-QL)
- http://www.w3.org/TR/xquery/ (2/2001)
- XML Query data model (ordered)
53FLWR (Flower) Expressions
- FOR ... LET... FOR... LET...
- WHERE...
- RETURN...
55XQuery
- Find all book titles published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
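For readers more used to a programming language, the FOR/WHERE/RETURN shape maps closely onto a list comprehension. The Python sketch below mimics the query over an in-memory document; the sample data (years, and the extra book "old") is invented, and this is an analogy rather than an XQuery engine.

```python
import xml.etree.ElementTree as ET

# Stand-in for bib.xml; titles follow the slide, years are made up.
bib = ET.fromstring("""
<bib>
  <book><title>abc</title><year>1996</year></book>
  <book><title>old</title><year>1990</year></book>
  <book><title>def</title><year>1998</year></book>
  <book><title>ghi</title><year>2000</year></book>
</bib>
""")

titles = [
    title.text                                  # RETURN $x/title
    for book in bib.findall("book")             # FOR $x IN /bib/book
    if int(book.findtext("year")) > 1995        # WHERE $x/year > 1995
    for title in book.findall("title")
]
```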
57XQuery Example
For each author of a book by Morgan Kaufmann, list all books she published:
FOR $a IN (document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
         $a,
         FOR $t IN /bib/book[author=$a]/title
         RETURN $t
       </result>
What is the query result?
58XQuery
- Result:
<result>
  <author>Jones</author>
  <title> abc </title>
  <title> def </title>
</result>
<result>
  <author>Jones</author>
  <title> abc </title>
  <title> def </title>
</result>
<result>
  <author> Smith </author>
  <title> ghi </title>
</result>
59XQuery Example Duplicates
- For each author of a book by Morgan Kaufmann, list all books she published:
FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
         $a,
         FOR $t IN /bib/book[author=$a]/title
         RETURN $t
       </result>
distinct = a function that eliminates duplicates
60Example XQuery Result
- Result:
<result>
  <author>Jones</author>
  <title> abc </title>
  <title> def </title>
</result>
<result>
  <author> Smith </author>
  <title> ghi </title>
</result>
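The effect of distinct can be reproduced in a few lines of Python. The sketch mirrors the nested query over a small in-memory document whose contents are assumptions chosen to match the slide's result (Jones wrote abc and def, Smith wrote ghi).

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book><title>abc</title><author>Jones</author><publisher>Morgan Kaufmann</publisher></book>
  <book><title>def</title><author>Jones</author><publisher>Morgan Kaufmann</publisher></book>
  <book><title>ghi</title><author>Smith</author><publisher>Morgan Kaufmann</publisher></book>
</bib>
""")

# Outer FOR without distinct: one binding per author element.
with_duplicates = [
    a.text for a in bib.findall("book[publisher='Morgan Kaufmann']/author")
]

# distinct(...): keep the first occurrence of each author name.
authors = list(dict.fromkeys(with_duplicates))

def titles_by(author):
    """Inner FOR: titles of all books that list this author."""
    return [
        book.findtext("title")
        for book in bib.findall("book")
        if author in [a.text for a in book.findall("author")]
    ]

result = {author: titles_by(author) for author in authors}
```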
61XQuery
- FOR $x IN expr
  - binds $x to each element in the list expr
  - useful for iteration over some input list
- LET $x := expr
  - binds $x to the entire list expr
  - useful for common subexpressions and for grouping and aggregations
62XQuery with LET Clause
<big_publishers>
  FOR $p IN distinct(document("bib.xml")//publisher)
  LET $b := document("bib.xml")//book[publisher = $p]
  WHERE count($b) > 100
  RETURN $p
</big_publishers>
count = an (aggregate) function that returns the number of elements
63XQuery
- Find books whose price is larger than average:
LET $a := avg(document("bib.xml")/bib/book/@price)
FOR $b IN document("bib.xml")/bib/book
WHERE $b/@price > $a
RETURN $b
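The same computation in Python makes the roles of the two bindings visible: the average is computed once over the whole collection, then each book is compared against it. Prices and titles in the sample document are made up.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book price="30"><title>abc</title></book>
  <book price="50"><title>def</title></book>
  <book price="70"><title>ghi</title></book>
</bib>
""")

# LET $a := avg(/bib/book/@price)   -- one value for the whole collection
prices = [float(book.get("price")) for book in bib.findall("book")]
average = sum(prices) / len(prices)

# FOR $b IN /bib/book WHERE $b/@price > $a RETURN $b
above_average = [
    book.findtext("title")
    for book in bib.findall("book")
    if float(book.get("price")) > average
]
```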
64FOR versus LET
- FOR
  - binds node variables → iteration
- LET
  - binds collection variables → one value
65FOR v.s. LET
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book> </result>
<result> <book>...</book> </result>
<result> <book>...</book> </result>
...
LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book>
         <book>...</book>
         <book>...</book>
         ...
</result>
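The difference between the two bindings can be imitated in Python: iterating gives one result per book, while binding the whole list gives a single result holding all books. The three-book document and the helper wrap are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib><book><title>abc</title></book>"
    "<book><title>def</title></book>"
    "<book><title>ghi</title></book></bib>"
)
books = bib.findall("book")

def wrap(items):
    """Build a <result> element around the given items."""
    result = ET.Element("result")
    result.extend(items)
    return result

# FOR $x IN /bib/book RETURN <result> $x </result>
for_results = [wrap([book]) for book in books]

# LET $x := /bib/book RETURN <result> $x </result>
let_results = [wrap(books)]
```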
66Collections in XQuery
- Ordered and unordered collections
  - /bib/book/author = an ordered collection
  - distinct(/bib/book/author) = an unordered collection
- LET $a := /bib/book → $a is a collection
- $b/author → a collection (several authors...)
RETURN <result> $b/author </result>
Returns:
<result> <author>...</author>
         <author>...</author>
         <author>...</author>
         ...
</result>
67XQuery Summary
- FOR-LET-WHERE-RETURN = FLWR
- Pipeline: FOR/LET clauses → list of tuples → WHERE clause → list of tuples → RETURN clause → instances of the XQuery data model
68XQuery
69Sorting in XQuery
<publisher_list>
  FOR $p IN distinct(document("bib.xml")//publisher)
  RETURN <publisher>
           <name> $p/text() </name> ,
           FOR $b IN document("bib.xml")//book[publisher = $p]
           RETURN <book>
                    $b/title ,
                    $b/@price
                  </book> SORTBY (price DESCENDING)
         </publisher> SORTBY (name)
</publisher_list>
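The two SORTBY clauses correspond to two nested sorts: publishers by name, and each publisher's books by descending price. A Python sketch of the same grouping follows, over invented sample data.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book price="30"><title>abc</title><publisher>Morgan Kaufmann</publisher></book>
  <book price="70"><title>def</title><publisher>Morgan Kaufmann</publisher></book>
  <book price="50"><title>ghi</title><publisher>Addison Wesley</publisher></book>
</bib>
""")

# distinct(//publisher), then SORTBY (name)
publishers = sorted({p.text for p in bib.iter("publisher")})

def books_of(publisher):
    """(title, price) pairs for one publisher, SORTBY (price DESCENDING)."""
    pairs = [
        (book.findtext("title"), float(book.get("price")))
        for book in bib.iter("book")
        if book.findtext("publisher") == publisher
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

publisher_list = [(name, books_of(name)) for name in publishers]
```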
70Sorting in XQuery
- Sorting arguments refer to the name space of the RETURN clause, not of the FOR clause
- TIP: to sort on an element you don't want to display, first return it, then remove it with an additional query.
71If-Then-Else
FOR $h IN //holding
RETURN <holding>
         $h/title,
         IF $h/@type = "Journal"
         THEN $h/editor
         ELSE $h/author
       </holding> SORTBY (title)
72Existential Quantifiers
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
      contains($p, "sailing") AND contains($p, "windsurfing")
RETURN $b/title
73Universal Quantifiers
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
      contains($p, "sailing")
RETURN $b/title
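SOME and EVERY behave like Python's any() and all() over the bound sequence, including the convention that EVERY over an empty sequence is true. The sketch below replays both queries on an invented three-book document; substring tests stand in for contains().

```python
import xml.etree.ElementTree as ET

library = ET.fromstring("""
<library>
  <book><title>A</title>
    <para>sailing and windsurfing</para>
    <para>sailing only</para>
  </book>
  <book><title>B</title>
    <para>sailing at dawn</para>
    <para>more sailing</para>
  </book>
  <book><title>C</title>
    <para>chess</para>
  </book>
</library>
""")

def paras(book):
    return [p.text or "" for p in book.iter("para")]

# WHERE SOME $p IN $b//para SATISFIES contains(sailing) AND contains(windsurfing)
some_titles = [
    book.findtext("title")
    for book in library.iter("book")
    if any("sailing" in p and "windsurfing" in p for p in paras(book))
]

# WHERE EVERY $p IN $b//para SATISFIES contains(sailing)
every_titles = [
    book.findtext("title")
    for book in library.iter("book")
    if all("sailing" in p for p in paras(book))
]
```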
74So Far
- Similarities between SQL and XQuery?
- Differences between SQL and XQuery?
75XML, XML Data Model, XML Schema, XPath, XQuery