Introduction to XML - PowerPoint PPT Presentation

Introduction to XML

Slides: 111
Provided by: yic3
1
Introduction to XML
  • Yi Chen

2
Acknowledgements
  • The slides on XML introduction are built from
    slides of many people
  • Peter Buneman, Susan Davidson, Zack Ives, Wang
    Chiew Tan, Mary Fernandez, Michael Benedikt,
    Juliana Freire, Arnaud Sahuguet, Daniela
    Florescu, Donald Kossmann

3
Why XML
  • HTML: the standard format for displaying unstructured web
    data for human reading
  • XML is the confluence of two factors
  • The Web needed a more declarative format for
    data, trying to describe the meaning of the data
    using extended tags
  • Database people needed a more flexible data
    interchange format
  • XML covers the continuous spectrum from
    unstructured documents to structured data
  • XML: a universal format for representing
    (semi-)structured web data for machine
    processing.
  • interoperability
  • self-describing
  • extensibility

4
Where does XML data come from?
  • Published by relational databases
  • Produced by information extraction/text mining
    over unstructured data
  • Obtained by integrating data from different
    sources (which can have different schemas)
  • Data whose schema can change over time
  • Obtained by calling web services (Amazon, Google,
    weather.com, ...)
  • XML is the format of the web data that is
    processed by programs

5
XML: Extensible Markup Language
  • http://www.w3.org/XML/
  • XML 1.0, W3C Recommendation Feb '98
  • XML is just a syntax,
  • But a standardized, extensible syntax
  • Compared with HTML, XML allows specification of
    new dialects by inventing tags.

6
An XML Document Example
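  <imdb>
    <show year="1993">
      <title>Fugitive, The</title>
      <review>
        <suntimes><reviewer>Roger Ebert</reviewer> gives <rating>two thumbs up</rating>! A fun action movie, Harrison Ford at his best.</suntimes>
      </review>
      <review>
        <nyt>The standard hollywood summer movie strikes back.</nyt>
      </review>
      <box_office>183,752,965</box_office>
    </show>
    <show year="1994">
      <title>X Files,The</title>
      <seasons>4</seasons>
    </show>
  </imdb>

  • Callouts: year is an attribute; the suntimes review is mixed content
  • Two basic components
  • Tags / structure / meta data / markup
  • Text / value / data
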
7
Tags
  • Tags describe the meaning of the text.
  • Tags come in pairs: start tags and end tags.
  • E.g. <title> ... </title>
  • They must be properly nested
  • <show> <title> ... </title> </show> --- good
  • <show> <title> ... </show> </title> --- bad
  • The region between start tag and end tag defines
    an element.

8
Structure of XML Data
  • Nesting tags can be used to express various
    structures.
  • I.e. Elements are nested.
  • E.g.

Fugitive, The
9
Structure of XML Data (cont.)
  • We can represent a list by using the same
    tag repeatedly
  • Order matters!

... ... ...
...
10
Attributes
  • We can have attributes with name and value pairs
    within a start tag.
  • E.g. <show year="1993"> ... </show>
  • Alternatively, we can represent the information
    as a nested element (instead of an attribute)
  • E.g. <show> <year>1993</year> ... </show>
  • Differences between elements and attributes
  • Elements are ordered, attributes are not
  • For an element, each attribute has a distinct
    name
  • Which one to choose?

11
Text and Mixed Content
  • XML has only one basic type -- text.
  • It is bounded by tags, e.g.
  • <title>The Big Sleep</title>
  • XML text is called PCDATA (for parsed character
    data).
  • Can be mixed with other subelements --- Mixed
    Content
  • <suntimes><reviewer>Roger Ebert</reviewer> gives
    <rating>two thumbs up</rating>! A fun action movie, Harrison
    Ford at his best.</suntimes>
  • Mixed content is very useful for document data
  • People speak in sentences. XML can preserve the
    structure of natural language, while adding
    semantic markup that can be interpreted by
    machines.
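
A minimal sketch of how a program sees mixed content, using Python's standard xml.etree.ElementTree. The review string follows the sample data in these slides; the helper name mixed_content_pieces is hypothetical. ElementTree stores the text that follows a child's end tag in that child's tail attribute.

```python
import xml.etree.ElementTree as ET

# Mixed content: text interleaved with <reviewer> and <rating> subelements.
REVIEW = (
    "<suntimes><reviewer>Roger Ebert</reviewer> gives "
    "<rating>two thumbs up</rating>! A fun action movie, "
    "Harrison Ford at his best.</suntimes>"
)


def mixed_content_pieces(element):
    """Return the text runs and subelements of `element` in document order."""
    pieces = []
    if element.text:  # text before the first child, if any
        pieces.append(("text", element.text))
    for child in element:
        pieces.append((child.tag, child.text))
        if child.tail:  # text that follows the child's end tag
            pieces.append(("text", child.tail))
    return pieces


review = ET.fromstring(REVIEW)
pieces = mixed_content_pieces(review)
# itertext() walks all text in document order, recovering the sentence.
full_text = "".join(review.itertext())
```

The markup can be dropped to recover the natural-language sentence (full_text), or kept to pick out the reviewer and the rating.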

12
Continuous spectrum between text, semi-structured
data, and structured data
  • Roger Ebert gives two thumbs up! The Fugitive is
    a fun action movie, Harrison Ford at his best.
  • two thumbs up!
    The Fugitive is a fun action movie, Harrison Ford
    at his best.
  • Roger Ebert
  • two thumbs up
  • The Fugitive
  • action
  • Harrison Ford

13
Representing XML Data as Trees
  • Fugitive, The
  • Roger
    Ebert gives two thumbs
    up! A fun action movie,
    Harrison Ford at his best.
  • The standard hollywood
    summer movie strikes back.

14
XML and Relational Data (I)
  • XML easily encodes relations
  • movie relation

XML Fugitive,
The 1993
Andrew Davis

Are they equivalent?
What about order?
15
XML and Relational Data (II)
  • movie relation
  • actor relation

Are there other XML formats to encode these two
relations?
16
XML and Relational Data (III)
What about Fugitive,
The 1993
Andrew Davis
Harrison Ford Tommy
Lee Jones
  • movie relation
  • actor relation

17
XML and Relational Data (IV)
  • Relational data
  • Killer application: banking industry
  • Invented as a mathematically clean abstract data
    model
  • Philosophy: schema first, then data
  • Strict rules for data normalization, flat tables
  • Order is irrelevant; textual data supported but
    not a primary goal
  • XML
  • First killer application: publishing industry
  • Invented as a syntax for data, only later an
    abstract data model
  • Philosophy: data can exist with or without a
    schema, or with multiple schemas
  • No data normalization; flexibility is a must
  • Order may be very important; textual data support
    is a primary goal

18
Sources of XML data
  • Inter-application communication data (e.g. Web
    Services)
  • Mobile devices communication data
  • Logs
  • Web syndication (RSS)
  • Metadata (e.g. Schema, WSDL, XMP)
  • Presentation data (e.g. XHTML)
  • Documents (e.g. Word)
  • Views of other sources of data
  • e.g. Relational, LDAP (Lightweight Directory
    Access Protocol), CSV (comma-separated values),
    Excel, etc.
  • Sensor data

19
XML Dialects in Vertical Application Domains
Basically everywhere!
  • Health Level Seven (HL7)
  • Geography Markup Language (GML)
  • Systems Biology Markup Language (SBML)
  • Digital photography metadata (XMP)
  • Extensible Financial Reporting Markup Language
    (XFRML)
  • MusicXML,
  • Spacecraft Markup Language (SML)
  • Bank Internet Payment System (BIPS),
  • Bioinformatic Sequence Markup Language (BSML),
  • Chemical Markup Language (CML),
  • Electronic Business XML Initiative (ebXML),
  • FinXML, Financial Information eXchange protocol
    (FIX),
  • Scalable Vector Graphics (SVG),
  • Real Estate Listing Markup Language (RELML), ...
  • More at http://xml.coverpages.org/gen-apps.html
  • http://www.xml.org/xml/industry_industrysectors.jsp

20
RSS
  • RSS 2.0 (Really Simple Syndication): a web feed
    format used to publish frequently updated
    digital content, such as blogs and news feeds.
  • RSS delivers its information as an XML file
    called an "RSS feed", "webfeed", "RSS stream", or
    "RSS channel".
  • Users subscribe to a feed by supplying their
    reader with a link to the feed
  • Feed readers (or aggregators): client software
    that regularly checks a list of feeds on behalf of
    a user, then pulls and displays any updated content
    it finds.
  • Also called RSS readers / feed readers / feed aggregators /
    news readers / search aggregators
  • E.g. Google Reader, My Yahoo!, Bloglines, web
    browsers

21
XForms: The next generation of web forms
  • http://www.w3.org/TR/xforms/
  • Benefits
  • device-neutral
  • platform-independent
  • excellent XML integration: can create and be
    created from XML documents
  • Provide common features (e.g. validation using
    XML schema and query languages) without scripting

22
Microsoft Office in XML
  • Office 2003 was able to import/export all
    documents into XML
  • Office 2007 models the documents NATIVELY in XML
    (Microsoft Office Open XML, i.e. OOXML)
  • Examples of vocabularies and schemas
  • WordprocessingML (the XML file format for Word
    2003), SpreadsheetML (Excel 2003), and
    DataDiagramingML (Visio 2003)

23
Web Services
  • Web service: a software system designed to
    support interoperable machine-to-machine
    interaction over a network.
  • Web service protocol stack
  • Service transport: HTTP, SMTP, ...
  • XML messaging: SOAP (Simple Object Access
    Protocol)
  • Service description: WSDL (Web Services Description
    Language), an XML-based language that provides a
    model for describing web services
  • Service discovery: UDDI (Universal Description,
    Discovery and Integration), an XML-based registry
    for businesses to list themselves on the Internet

24
XML Isn't Enough on Its Own
  • It is just a data format, but we care about
    data!
  • How to design XML format for a given domain?
  • Sometimes more structure about the data is
    helpful.
  • How can we know when we are getting garbage?
  • How can we query the data?
  • How can we understand what we got?

25
XML Standards Landscape
  • Schema languages
  • DTDs: http://www.w3schools.com/dtd/default.asp
  • XML Schemas: http://www.w3.org/XML/Schema
  • Programming APIs
  • DOM (Document Object Model): http://www.w3.org/DOM/
  • SAX (Simple API for XML): http://www.saxproject.org/
  • JAXP (Java API for XML Processing):
    http://java.sun.com/webservices/jaxp/
  • Query languages
  • XPath: http://www.w3.org/TR/xpath
  • XQuery: http://www.w3.org/TR/xquery/
  • XSLT: http://www.w3.org/TR/xslt
  • Standard organizations
  • W3C (the World Wide Web Consortium)
  • OASIS (Organization for the Advancement of
    Structured Information Standards)

26
Comparison with RDBMS
                    XML                          Relational
  Data              XML documents                Relational databases
  Schema            DTD, XML Schema (optional)   Relational schema (required)
  Programming API   DOM and SAX API              JDBC / ODBC
  Query language    XPath, XQuery, XSLT          SQL
27
XML Schema
  • Schema: a model of the data
  • Structural definitions
  • Type definitions
  • Defaults
  • Useful for
  • validating data
  • facilitating the writing of applications that
    process data
  • facilitating the writing of queries
  • designing storage and query evaluation strategies
  • defining prior agreements between parties for
    data exchange
  • mapping to programming languages (e.g. Java, C)

28
DTD: Document Type Definition
  • Part of the original XML 1.0 specification
  • Describes the grammar of the XML file
  • Element declarations: how elements are allowed to
    nest within each other by rules
  • Attribute lists: describe what attributes are
    allowed on which element
  • Some constraints on the value of elements and
    attributes
  • The root element of the XML tree

29
Sample XML Data

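<imdb>
  <show year="1993">
    <title>Fugitive, The</title>
    <review>
      <suntimes><reviewer>Roger Ebert</reviewer> gives <rating>two thumbs up</rating>! A fun action movie, Harrison Ford at his best.</suntimes>
    </review>
    <review>
      <nyt>The standard hollywood summer movie strikes back.</nyt>
    </review>
    <box_office>183,752,965</box_office>
  </show>
  <show year="1994">
    <title>X Files,The</title>
    <seasons>4</seasons>
  </show>
</imdb>
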
Exactly one title
As many reviews as needed after title
Box office or seasons info
30
Specifying the Structure
  • title: to specify a title
    element
  • director?: to specify an
    optional (0 or 1) director element
  • review*: to specify 0 or more
    review elements
  • title, review+: to specify a title
    followed by 1 or more review elements
  • box_office | seasons: to specify a box_office
    or a seasons element

31
Specifying the Structure (cont)
  • So the whole structure of a movie element is
    specified by
  • (title, review*, (seasons | box_office))
  • This is known as a regular expression
  • #PCDATA: only textual content allowed

32
Regular Expressions
  • Each regular expression determines a
    corresponding finite state automaton
  • Let's start with a simpler example
  • title, review*
  • This suggests a simple parsing program to
    validate XML data

[Figure: finite state automaton with transitions labeled title and review]
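
The "simple parsing program" can be sketched as a table-driven automaton. This sketch assumes the content model title, review* (one title followed by zero or more reviews); the state numbering and the function name children_match are hypothetical, and only the order of child tags is checked, not their contents.

```python
import xml.etree.ElementTree as ET

# Transition table for the content model:  title, review*
# (state, child tag) -> next state. State 0 is the start state.
TRANSITIONS = {
    (0, "title"): 1,   # the mandatory title
    (1, "review"): 1,  # any number of reviews afterwards
}
ACCEPTING = {1}


def children_match(element):
    """Run the automaton over the tags of the element's children."""
    state = 0
    for child in element:
        state = TRANSITIONS.get((state, child.tag))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPTING


ok = ET.fromstring("<show><title>Fugitive, The</title><review/><review/></show>")
bad = ET.fromstring("<show><review/><title>Fugitive, The</title></show>")
```

Here children_match(ok) accepts and children_match(bad) rejects, because a review appears before the title.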
33
A More Complicated Example
  • title, review, (box_office | seasons), actor

34
Defining the attribute lists
  • <!ATTLIST element-name attribute-name attribute-type default-value>
  • attribute-type: CDATA, ....
  • default-value
  • value: the default value of the attribute
  • #REQUIRED: the attribute value must be included
  • #IMPLIED: the attribute is optional
  • #FIXED value: the attribute value is fixed.
  • E.g. <!ATTLIST show year CDATA #IMPLIED>

35
The DTD for the Sample XML Data
  • DTD
  • seasons))
  • Indicating DTD in an XML file

36
DTDs Aren't Enough Sometimes
  • DTDs capture grammatical structure, but have
    some drawbacks
  • Not themselves in XML: inconvenient to build
    tools for them
  • No built-in data types and domains
  • Limited abilities to specify constraints on
    values
  • No way of defining OO-like inheritance

37
XML Schema of the Data
XML namespace, specified by a URI
  • ...ema"
  • ... type="xs:string"/>
  • ... minOccurs="1" maxOccurs="unbounded"/>
  • ... name="box_office" type="xs:integer"/>
  • ... name="seasons" type="xs:integer"/>
  • ... use="optional"/>

A user-defined (unnamed) complex data type
By default, minOccurs="1", maxOccurs="1", type="xsd:anyType"
The value of use can be optional or
required. We can also have the attributes fixed and
default
38
Using XML Schema for Type Inheritance
  • Suppose that we want to differentiate two types
    of shows movie and tv_show. Both have
    title, review subelements and year
    attributes. Movies have box_office, tv_shows have
    seasons. An XML document records movies and TV
    shows.
  • To do this, we need to
  • name show type explicitly (since this type is
    used more than once in the schema)
  • create two new data types, derived from show by
    extension.
  • Note that XML schema also allows derivation by
    restriction

39
  • ema"
  • typexsstring/
  • mixedtrue minoccurs1 maxoccursunbounded
    /
  • typexsinteger useoptional/
  • namebox_office" type"xsinteger"/


We can also write the schema without naming
Movie and TV_show types
40
DTD vs XML Schema
  • XML Schema is more expressive
  • It defines data type information
  • Simple types and complex types
  • Built-in types and user-defined types
  • Can specify the cardinality of an element within
    its parent element
  • Can specify expressive value constraints, such as
    keys, and foreign keys (more on this later!)
  • An XML Schema is itself an XML file!

41
Well-Formed XML
  • Well-formed applies to any XML document (with or
    without a DTD)
  • All open-tags have matching close-tags, or a
    special case
  • <tag/> is a shortcut for <tag></tag>
  • Attributes (which are unordered, in contrast to
    elements) only appear once within an element
  • There's a single root element
  • XML is case-sensitive
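
A small sketch of checking these well-formedness rules with Python's standard parser, which raises ParseError on any violation. The function name is_well_formed is hypothetical; no DTD or schema is involved, so this tests well-formedness only, not validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<show><title>X Files,The</title><seasons/></show>"
badly_nested = "<show><title>X Files,The</show></title>"
repeated_attribute = '<show year="1993" year="1994"/>'
two_roots = "<show/><show/>"
wrong_case = "<show></Show>"
```

Each of the four bad strings breaks one rule from the list above: nesting, attribute uniqueness, the single root element, and case sensitivity.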

42
Valid XML
  • Valid specifies that the document conforms to the
    DTD or XML Schema.
  • conforms to the grammar
  • the types of attributes are correct
  • constraints on references are satisfied

43
Summary
  • As a data format, the main virtues of XML are its
    widespread acceptance and the (important) ability
    to handle semi-structured data (data without
    schema)
  • Problems remain
  • How to store large XML documents?
  • How to query them?
  • How to map XML from/to other data representation
    formats?

44
XML Standard Landscape
  • Schema languages
  • DTDs
  • XML Schemas
  • Programming APIs
  • DOM
  • SAX
  • Query languages
  • XPath
  • XQuery
  • XSLT

45
Programming APIs
[Figure: XML document -> XML Parser (DOM, SAX) -> DOM instance or SAX events -> Applications]
  • DOM
  • http://www.w3.org/DOM/
  • http://www.w3schools.com/dom/
  • SAX
  • http://www.saxproject.org/

46
DOM: Document Object Model
  • Build a tree data structure (in memory)
  • Provide access to nodes in the tree.
  • Level 1. Functionality for XML document
    navigation and manipulation.
  • Level 2. Stylesheets and namespaces
  • Level 3. Document loading and saving; DTDs and
    schemas

47
DOM Interfaces
  • Interface Document
  • Methods: createElement, getElementsByTagName, ...
  • Interface Node
  • Attributes: parentNode, childNodes, firstChild,
    nextSibling, attributes, ...
  • Methods: appendChild, removeChild, ...
  • Interface Element
  • Methods: getAttributeNode, removeAttributeNode, ...

48
DOM an Example
49
Navigating DOM trees
  • What does this code fragment do?
  • var x = getElementsByTagName('title')
  • for (i = 0; i < x.length; i++) ... x[i].childNodes[0].nodeValue ...

What if we are only interested in titles of
shows?
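
One possible answer to the question above, sketched with Python's xml.dom.minidom, which exposes the same DOM calls (getElementsByTagName, childNodes, nodeValue, parentNode). The sample document, including its book element, is a made-up illustration so that not every title belongs to a show.

```python
from xml.dom import minidom

# Hypothetical document: two shows plus one non-show element with a title.
DOC = """<imdb>
  <show year="1993"><title>Fugitive, The</title></show>
  <show year="1994"><title>X Files,The</title></show>
  <book><title>Data on the Web</title></book>
</imdb>"""

dom = minidom.parseString(DOC)

# Every <title> anywhere in the document, as in the fragment above.
all_titles = [
    node.childNodes[0].nodeValue
    for node in dom.getElementsByTagName("title")
]

# Only titles whose parent element is a <show>.
show_titles = [
    node.childNodes[0].nodeValue
    for node in dom.getElementsByTagName("title")
    if node.parentNode.nodeName == "show"
]
```

Filtering on parentNode is one simple way to restrict the result; navigating from each show element down to its title child would work equally well.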
50
SAX: Simple API for XML
  • http://www.saxproject.org/
  • Event based
  • Instead of reading the entire file in memory and
    building a tree, SAX reads a stream of tokens and
    triggers events, e.g.,
  • startDocument
  • startElement
  • endElement
  • endDocument
  • characters
  • Applications write handlers for events.
  • Supports document order access to data
  • Read-only access, No update-in-place

51
SAX an example
  • startElement(imdb, null)
  • startElement(show, (year, 1993))
  • startElement(title,)
  • characters(Fugitive, The)
  • endElement(title)
  • startElement(review, null)
  • startElement(suntimes, null)
  • startElement(reviewer)
  • characters(Roger
    Ebert)
  • endElement(reviewer)
  • characters( gives )
    ...
  • startElement(rating,
    null)
  • characters(two thumbs
    up)
  • endElement(rating)
  • characters(! A fun movie)
  • endElement(suntimes)
  • endElement(review) .


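<imdb>
  <show year="1993">
    <title>Fugitive, The</title>
    <review>
      <suntimes><reviewer>Roger Ebert</reviewer> gives <rating>two thumbs up</rating>! A fun action movie, Harrison Ford at his best.</suntimes>
    </review>
    ..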
How can you find all show titles using a SAX
parser?
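
One way to answer this, sketched with Python's xml.sax: keep a stack of open element names and buffer character data only while inside a title whose parent is a show. The class name ShowTitleHandler and the shortened sample document are assumptions made for illustration.

```python
import xml.sax

DOC = """<imdb>
<show year="1993"><title>Fugitive, The</title><review><suntimes><reviewer>Roger Ebert</reviewer> gives <rating>two thumbs up</rating>! A fun action movie.</suntimes></review></show>
<show year="1994"><title>X Files,The</title><seasons>4</seasons></show>
</imdb>"""


class ShowTitleHandler(xml.sax.ContentHandler):
    """Collect the text of every <title> that is a child of a <show>."""

    def __init__(self):
        super().__init__()
        self.path = []       # stack of currently open element names
        self.titles = []
        self._buffer = None  # list of text chunks while inside show/title

    def startElement(self, name, attrs):
        self.path.append(name)
        if self.path[-2:] == ["show", "title"]:
            self._buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if self._buffer is not None and self.path[-2:] == ["show", "title"]:
            self.titles.append("".join(self._buffer))
            self._buffer = None
        self.path.pop()


handler = ShowTitleHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
```

The handler never holds more than the current path and one title in memory, which is the point of the event-based model.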
52
DOM vs SAX
  • DOM
  • XML represents a tree model and DOM is very
    natural to understand.
  • Supports navigation to document
  • Enable dynamic update, add, and delete to
    document content
  • SAX
  • Lightweight
  • Good for applications that read large XML
    documents once.
  • E.g. filter stock quotes, network alerts,
    load XML documents into storage systems

53
Link to XML parsers download site
  • http://xml.apache.org/
  • Xerces projects implement DOM and SAX parsers in
    Java, C++, Perl.
  • http://xml.apache.org/xerces2-j/
  • http://xml.apache.org/xerces-c/
  • http://xml.apache.org/xerces-p/

54
XML Standards Landscape
  • Schema languages
  • DTDs
  • XML Schemas
  • Programming APIs
  • DOM
  • SAX
  • Query languages
  • XPath
  • XQuery
  • XSLT

55
Common Querying Tasks
  • Filter, select XML values
  • Navigation, selection, extraction
  • Merge, integrate values from multiple XML sources
  • Joins, aggregation
  • Transform XML values from one schema to another
  • XML construction

56
XML Query Languages
  • XPath
  • http://www.w3.org/TR/xpath
  • Language for navigation, selection, extraction
  • Used in XSLT, XQuery, XPointer, XLink, XML
    Schema, et al.
  • XQuery
  • http://www.w3.org/TR/xquery/
  • Strongly-typed query language
  • Additional join, transformation and construction
    ability
  • XSLT
  • http://www.w3.org/TR/xslt
  • Transform XML to XML, HTML, Text

57
A Simple XPath Query
  • In its simplest form, an XPath is like a path in
    a file system: /imdb/show/review

[Figure: XML data tree (imdb; show children with @year 1993 and title "Fugitive, The"; review children containing suntimes and nyt; suntimes with reviewer "Roger Ebert", the text "gives", rating, ...) illustrating the XPath query]
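
A minimal sketch of evaluating this path with Python's xml.etree.ElementTree, which supports a limited subset of XPath. Paths there are evaluated relative to the root element, so /imdb/show/review becomes ./show/review on the imdb element. The data is a shortened version of the sample document.

```python
import xml.etree.ElementTree as ET

IMDB_XML = """<imdb>
  <show year="1993">
    <title>Fugitive, The</title>
    <review><suntimes><reviewer>Roger Ebert</reviewer> gives <rating>two thumbs up</rating>!</suntimes></review>
    <review><nyt>The standard hollywood summer movie strikes back.</nyt></review>
    <box_office>183,752,965</box_office>
  </show>
  <show year="1994">
    <title>X Files,The</title>
    <seasons>4</seasons>
  </show>
</imdb>"""

root = ET.fromstring(IMDB_XML)       # root is the <imdb> element
reviews = root.findall("./show/review")

# Each result is a node together with its subtree; look at the first child.
review_sources = [review[0].tag for review in reviews]
```

The second show has no review children, so it contributes nothing to the result.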
58
XPath Basics
  • The XPath returns a set of XML nodes (and their
    subtrees) selected by the path
  • XPaths can have node tests at the end, returning
    only particular node types
  • e.g., text(), returns the PCDATA associated
    with a node and,
  • element(), attribute(), etc.
  • XPath is fundamentally an ordered language it
    can query in order-aware fashion, and it returns
    nodes in order

59
Wildcards
  • In fact, besides using tag/attribute names, we may
    use * as a wildcard in queries.
  • E.g. /imdb/*/review

60
XPath Axes
  • In the previous XPath query, / matches a child
    edge in XML tree, i.e. go down the XML tree one
    step
  • XPath has axes to specify more expressive
    navigation, so we can go up, left, or right, for
    multiple steps.
  • self
  • child (/), parent
  • descendant (//), ancestor
  • descendant-or-self, ancestor-or-self
  • preceding-sibling, following-sibling
  • preceding, following

61
Another Sample XPath Query
[Figure: XML data tree for imdb (show, title, review, @year, suntimes, nyt, reviewer, rating), annotated with the child step to imdb, the descendant step to title, and the following-sibling step to review]
XPath Query: /child::imdb/descendant::title/following-sibling::review
62
Context Nodes and Relative Paths
  • XPath has a notion of a context node
  • analogous to a current directory in a file
    system
  • . represents this context node
  • .. represents the parent node, so we can
    express relative paths
  • E.g. .///../..
  • By default, the document root is the context node
    at the start of an XPath evaluation

63
Predicates
  • A predicate allows us to filter the node set
    based on selection-like conditions over
    sub-XPaths
  • A predicate tests existence of a path.
  • A predicate can be boolean expressions (and, or)
    and include XPath functions
  • An example XPath Query
  • //show[./review//rating="two thumbs up"]/title

64
An XPath Query with Predicates
[Figure: XML data tree for imdb (show, title, review, @year, suntimes, nyt, reviewer, rating "two thumbs up"), annotated with the descendant step to show, the child steps to title and review, and the descendant step to rating]
XPath Query: //show[./review//rating="two thumbs up"]/title
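
Python's ElementTree does not support nested path predicates like this one, so the sketch below emulates the query step by step in plain Python instead of handing the XPath string to an engine. The function name titles_with_rating is hypothetical, and the data is a shortened version of the sample document.

```python
import xml.etree.ElementTree as ET

IMDB_XML = """<imdb>
  <show year="1993">
    <title>Fugitive, The</title>
    <review><suntimes><reviewer>Roger Ebert</reviewer> gives <rating>two thumbs up</rating>!</suntimes></review>
    <review><nyt>The standard hollywood summer movie strikes back.</nyt></review>
  </show>
  <show year="1994">
    <title>X Files,The</title>
    <seasons>4</seasons>
  </show>
</imdb>"""


def titles_with_rating(root, wanted):
    """Emulate //show[./review//rating = wanted]/title."""
    titles = []
    for show in root.iter("show"):                    # //show
        ratings = [
            rating.text
            for review in show.findall("review")      # ./review
            for rating in review.iter("rating")       # //rating
        ]
        if wanted in ratings:                         # predicate holds
            titles.extend(t.text for t in show.findall("title"))  # /title
    return titles


root = ET.fromstring(IMDB_XML)
matching = titles_with_rating(root, "two thumbs up")
```

The predicate only filters the show nodes; the final /title step is applied to the shows that survive the filter.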
65
XQuery
  • A concrete syntax
  • http://www.w3.org/TR/xquery
  • A formal semantics and algebra
  • http://www.w3.org/TR/query-semantics/
  • Some use cases
  • http://www.w3.org/TR/xquery-use-cases/

66
XQuery
  • A strongly-typed XML manipulation language
  • Designed mostly by DB and functional language
    people
  • Attempts to satisfy the needs of data management
    and document management
  • The database-style core is mostly complete
  • The document keyword querying features are in
    progress shows in the order-preserving default
    model
  • http://www.w3.org/TR/xquery-full-text-requirements/
  • http://www.w3.org/TR/xmlquery-full-text-use-cases/

67
XQuery's Basic Form
  • Has an analogous form to SQL's SELECT..FROM..WHERE
    ..GROUP BY..ORDER BY
  • The model
  • binds nodes (or node sets) to variables
  • operates over each legal combination of
    bindings
  • produces a set of nodes
  • FLWOR statement
  • for: iterators that bind variables
  • let: collections
  • where: conditions
  • order by: order-conditions
  • return: output constructor

68
Iterations in XQuery: For
  • A series of (possibly nested) FOR statements
    assigning the results of XPaths to variables
  • The FOR clause iterates over the items in the
    binding sequence, binding the variable to each
    item in turn
  • for $root in document("http://my.org/my.xml")
  • for $v1 in $root//show,
  • $v2 in $v1/reviews,
  • document()/doc() function specifies an input file
    as a URI
  • Something like a template that pattern-matches,
    and produces binding tuples
  • For each of these, we evaluate the WHERE and
    possibly output according to the RETURN template

69
XQuery Example: Q1
  • for $s in document("imdb.xml")/imdb/show,
  • $yr in $s/@year
  • where $yr = 1994
  • return $s

70
Output of Q1

Input (imdb.xml): the sample XML data, with the 1993 show "Fugitive, The" and the 1994 show "X Files,The"

Output:
<show year="1994">
  <title>X Files,The</title>
  <seasons>4</seasons>
</show>
71
Output in XQuery Return
  • Building a nested XML trees is perhaps the most
    common operation
  • In XQuery, it's easy: put a subquery in the
    return clause where you want things to repeat!
  • Curly braces delimit enclosed expressions
    from literal text
  • Q2
  • for $s in document("imdb.xml")/imdb/show
  • where $s/@year = 1993
  • return
  • $s/title
  • for $r in $s/reviews
  • where $r//reviewer/text() = "Roger Ebert"
    return $r

72
Output of Q2

Fugitive, The

Roger Ebert gives
two thumbs
up ! A fun
action movie, Harrison Ford at his best.

The standard
hollywood summer movie strikes back.
183,752,965x_office year1994 X
Files,The
4
73
Output of Q2 for Revised XML Data
Will this show appear in the output?

Fugitive, The
The standard hollywood
summer movie strikes back.
183,752,965fice
  • Q2
  • for s in document(imdb.xml)/shows
  • where s/_at_year 1993
  • return
  • s/title
  • for r in s/reviews
  • where r//reviewer/text()
    Roger Ebert return r

How to revise Q2 such that this show will not be
returned?
74
Collections & Aggregation in XQuery: Let
  • In XQuery, many operations return collections;
    the LET clause binds each variable to the result
    of its associated expression, without iteration
  • Aggregation simply applies a function over a
    collection, where the function returns a value
  • Q3
  • let $s := document("imdb.xml")/imdb/show
  • return
  • fn:count(fn:distinct-values(
    $s/reviewers))

75
Distinct-ness
  • In XQuery, DISTINCT-ness happens as a function
    over a collection
  • But since we have nodes, we can do duplicate
    removal according to value or node
  • fn:distinct-values(collection): remove duplicate
    atomic values
  • http://www.xqueryfunctions.com/xq/fn_distinct-values.html
  • fn:distinct-nodes(collection): remove duplicate
    nodes
  • E.g.
  • Roger Ebert
  • Roger Ebert

Same value, Different nodes
76
Iterations vs Collections
  • Different uses of FOR and LET clauses.
  • Example 1
  • let s (, , ) return
    s
  • Example 2
  • for s in (, , ) return
    s
  • Output
  • Only one item is generated, containing the
    binding of s
  • Output

  • One tuple is generated for each of these
    bindings, and the return clause is invoked for
    each tuple

77
Collections & Aggregation in XQuery: Let (cont)
  • Q4: Suppose each reviewer has a first name and a
    last name; list all distinct reviewer names.
  • let $r := doc("imdb.xml")//reviewer
  • for $last in distinct-values($r/last),
  • $first in distinct-values($r[last = $last]/first)
  • order by $last, $first
  • return
  • $last
  • $first

78
Collections & Aggregation in XQuery: Let (cont)
  • We can compose aggregations and create new
    collections from the old
  • Q5
  • let $avgItemsSold := fn:avg( for $order in
    document("my.xml")/orders/order let
    $totalSold := fn:sum($order/item/quantity)
    return $totalSold )
  • return $avgItemsSold
  • What does this query do?
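
As a cross-check, the same computation can be sketched in Python: sum the item quantities of each order, then average those per-order totals. The contents of my.xml are not shown in the slides, so the ORDERS_XML sample below is invented for illustration.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical stand-in for my.xml, following the path orders/order/item/quantity.
ORDERS_XML = """<orders>
  <order>
    <item><quantity>2</quantity></item>
    <item><quantity>3</quantity></item>
  </order>
  <order>
    <item><quantity>1</quantity></item>
  </order>
</orders>"""

root = ET.fromstring(ORDERS_XML)

# Inner FLWOR: one totalSold per order (the fn:sum over item quantities).
totals = [
    sum(int(quantity.text) for quantity in order.findall("item/quantity"))
    for order in root.findall("order")
]

# Outer aggregation: fn:avg over the collection built by the inner loop.
avg_items_sold = mean(totals)
```

The inner expression builds a new collection (one number per order), and the outer function aggregates that collection, which is the composition the slide describes.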

79
Joins in XQuery
  • We can use variable bindings to represent joins
  • Q6
  • for r in distinct-values(doc(imdb.xm
    l")//reviewer)
  • return
  • r
  • for s in
    doc(imdb.xml")/show
  • where some sr in
    s//reviewer r
  • return s/title

According to the distinct-values function, each $r
is bound to a string value, not an element node.
In a direct element constructor, curly braces
delimit enclosed expressions, distinguishing them
from literal text. Enclosed expressions are
evaluated and replaced by their value.
=, !=, <, <=, >, >= compare the values (of
sequences); atomization is applied to each
operand. (eq, ne, lt, le, gt, ge are for comparing
single values.) is, <<, >> compare node
identity or document order: http://www.w3.org/TR/xquery/#id-comparisons
80
Sorting in XQuery
  • SQL actually allows you to sort its output, with
    a special ORDER BY clause
  • XQuery borrows this idea
  • In XQuery, what we order is the sequence of
    result tuples output by the return clause
  • Q7
  • for $x in document("imdb.xml")/imdb/show
  • order by $x/year
  • return $x
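
A rough Python analogue of Q7, assuming the year is stored as the year attribute of each show, as in the sample data (the query as written orders by a year child element). The input order is deliberately reversed so the effect of sorting is visible.

```python
import xml.etree.ElementTree as ET

SHOWS_XML = """<imdb>
  <show year="1994"><title>X Files,The</title></show>
  <show year="1993"><title>Fugitive, The</title></show>
</imdb>"""

root = ET.fromstring(SHOWS_XML)

# "for" binds each show; "order by" sorts the bindings; "return" emits them.
ordered = sorted(root.findall("show"), key=lambda show: int(show.get("year")))
ordered_titles = [show.findtext("title") for show in ordered]
```

Without the sort, the shows would come back in document order, which is XQuery's default.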

81
What If Order Doesn't Matter?
  • By default
  • SQL is unordered
  • XQuery is ordered, since XML is ordered!
  • What if we want to use XML to represent unordered
    data, e.g. relations?
  • XQuery has a way of telling the query engine to
    avoid preserving order
  • unordered for $x in (mypath)

82
XQuery Beyond FLWOR
  • XQuery has many built-in functions and
    predicates,such as
  • count(), sum(), min(), max(), position(),
    first(), last() which work over sequences
  • distinct-values(), distinct-nodes() remove
    duplicates
  • Set operations union, intersection
  • If-then-else statements and function definition
    (define function name (params) returns result)
    are also included

83
Querying & Defining Metadata
  • XML is a model that mixes data and meta data.
    XQuery has capabilities to query them seamlessly
  • Obtain a node's name using function node-name()
  • for $x in document("imdb.xml")/imdb/*
  • return node-name($x)
  • Construct elements and attributes
  • for $x in document("imdb.xml")/imdb/*,
  • $year in $x/@year,
  • $title in $x/title/text()
  • return element { node-name($x) }
  • { attribute year { $year }, $title }
  • Can't do this in SQL!

84
XSL(T) The Bridge Back to HTML
  • XSL (Extensible Stylesheet Language)
  • A transformation language
  • XSLT is designed primarily for the
    transformations used in XSL. Besides XSLT, XSL
    includes an XML vocabulary for specifying
    formatting.
  • E.g. convert from XML → HTML, which is how a lot
    of people do their formatting
  • Products like Apache Cocoon generally translate
    XML → HTML on the server side
  • http://www.w3.org/TR/xslt

85
  • We have discussed the schema (partially), API and
    query languages for XML.
  • Comparing XPath with XQuery (FLWOR)
  • XPath expresses limited FWR
  • Most of the components in RDBMS have their
    counterpart in XML standards.
  • What else?

86
Keys
  • An essential part of database design
  • provide a way to identify a tuple in a relation
  • Important for updates
  • More philosophically,
  • a key of a tuple is the invariant connection
    between the tuple and the real world entity it
    represents, e.g. SSN in the Persons relation

  SSN    Name
  1234   H. Simpson
87
DOM Node Addresses
[Figure: DOM tree of a db document in which every node is addressed by its child position (1, 2, 3, ...): composer nodes with name (first, last), born, period / @period and work children; work nodes with @num, title and num; values include G.F Handel, J.S Bach, 1685, baroque, "Art Thou Troubled?", "Ich habe genug", 19, 82, 552]
Need a value-based mechanism for identifying
nodes
88
Value-based Constraints for XML
  • ID/IDREF in DTD
  • KEY/KEYREF in XML Schema
  • Functional dependencies for XML

89
Specifying ID and IDREF attributes in DTD
Used to uniquely identify elements in an XML
document
  • id ID #REQUIRED
  • mother IDREF #IMPLIED
  • father IDREF #IMPLIED
  • children IDREFS #IMPLIED

90
Navigate IDREF edges in XML Data
  • Using ID/IDREF, XML can be used to represent a
    directed graph data model.
  • To navigate this graph model, we can use
  • Function id()
  • Selects elements by their unique ID
  • Examples
  • id("foo") selects the element with unique ID
    foo
  • id(//person/mother)
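
A sketch of what id() does, emulated in Python with a dictionary from ID values to elements. The family document is invented for illustration and uses the id, mother, father and children attributes from the attribute list in these slides. ElementTree does not read a DTD, so treating the id attribute as an ID is an assumption made by this code.

```python
import xml.etree.ElementTree as ET

# Hypothetical data using ID (id) and IDREF/IDREFS (mother, father, children).
FAMILY_XML = """<family>
  <person id="p1" children="p3"><name>Ann</name></person>
  <person id="p2" children="p3"><name>Bob</name></person>
  <person id="p3" mother="p1" father="p2"><name>Cy</name></person>
</family>"""

root = ET.fromstring(FAMILY_XML)

# Index every person by its ID; IDs are unique within the whole document.
by_id = {person.get("id"): person for person in root.iter("person")}


def deref(idrefs):
    """Emulate id(): map a whitespace-separated list of IDs to elements."""
    return [by_id[ref] for ref in idrefs.split() if ref in by_id]


# Follow the mother edge of every person that has one.
mothers = [
    mother.findtext("name")
    for person in root.iter("person")
    if person.get("mother")
    for mother in deref(person.get("mother"))
]

# IDREFS: follow the children edges of person p1.
children_of_ann = [c.findtext("name") for c in deref(by_id["p1"].get("children"))]
```

The explicit deref call at every hop shows why IDREF edges are navigated differently from ordinary parent/child edges.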

91
  • Is function id() sufficient to express
    navigations in a graph?
  • Restrictions of using id() to navigate through
    IDREF
  • Need to explicitly invoke id() in the query
  • IDREF edges in the graph are navigated differently from
    other edges.
  • No reverse traversal of id() can be
    performed

92
Why ID/IDREF Is Not Sufficient for Value-based
Constraints?
  • It is not categorized.
  • Person name and book title share the same ID
    type
  • Unary keys only
  • We can not specify that a person has key of the
    first name and last name.
  • It is impossible to express two alternative keys
    for a node
  • It is always global
  • i.e. ID attributes are unique within the
    entire document

93
Example 1 Absolute/Global Keys
[Figure: tree of a db document: composer nodes with name (first, last), born, period and work children; work nodes with title and num; values include G.F Handel, J.S Bach, 1685, baroque, "Art Thou Troubled?", "Ich habe genug", 19, 82, 552]
94
Example 2 Relative/Local Keys
[Figure: tree of a db document: book nodes with title (Biology, Chemistry) and chapter children; chapter nodes with num and sec children; sec nodes with num; the num values (such as 1) repeat across different books and chapters]
95
Expressing Keys in XML Schema (I)
XML
XML Schema

namecomposer"
type"rcomposerType"/
name"NumKey" xpath//work"/


J.S
Bach
Ich
habe genug 82
552
1685
..
Notes:
1. All XPaths start at the element
currently being defined
2. Each field must
identify a single node for each node selected
by the selector
96
An Example Key in XML Schema (II)
XML
XML Schema

J.S
Bach
Ich
habe genug 82
552
1685
..
xpath"//composer"/

97
Two Flavors of Keys in XML Schema
XML Schema
xpathp"/ xpathp2"/ . . . xpathpk"/
xpathp"/ xpathp2"/ . . . xpathpk"/
98
Keys in XML Schema
  • Unique: guarantees uniqueness
  • Key: guarantees uniqueness and existence
  • All XPath expressions are restricted
  • /a/b /a/c OK for selector
  • //a/b//c OK for field
  • To help the implementation
  • Note: better and more expressive than the DTD's ID
    mechanism

99
How to Use XML Keys to Specify an ID?

  • Why do we use unique instead of key?
  • Often XML does not require every node to have an ID,
    so the existence of an ID is not guaranteed.

100
How to Specify a Relative Key?
[Figure: tree of a db document: book nodes with title (Biology, Chemistry) and chapter children; chapter nodes with num and sec children; sec nodes with num; the num values (such as 1) repeat across different books and chapters]
101
How to Specify a Relative Key (cont)
XML Schema

. namebookKey" xpath/book"/ xpathtitle"/ namebook"
. namechapterKey" xpath/chapter"/ xpathnum"/
  • Define keys in the right context!
  • A chapter can be uniquely identified by num
    within a book node.
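
The difference between a relative and a global key can be sketched as a small checker: a key holds if, inside every context node, the field value exists and is unique among the selected nodes. The function name key_holds and the sample data are hypothetical; this is not how an XML Schema validator is implemented.

```python
import xml.etree.ElementTree as ET

DB_XML = """<db>
  <book><title>Biology</title>
    <chapter><num>1</num></chapter>
    <chapter><num>2</num></chapter>
  </book>
  <book><title>Chemistry</title>
    <chapter><num>1</num></chapter>
  </book>
</db>"""


def key_holds(contexts, selector, field):
    """Check existence and uniqueness of `field` among the nodes that
    `selector` picks, separately inside each context node."""
    for context in contexts:
        seen = set()
        for node in context.findall(selector):
            value = node.findtext(field)
            if value is None or value in seen:
                return False
            seen.add(value)
    return True


root = ET.fromstring(DB_XML)

# Key declared in the context of each book: chapter numbers are unique per book.
relative_ok = key_holds(root.findall("book"), "chapter", "num")

# Same key declared at the db level: chapter number 1 occurs in two books.
global_ok = key_holds([root], "book/chapter", "num")
```

Only the choice of context changes between the two calls, which is why the key must be declared on the right element.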

102
Keyrefs in XML Schema
db
XML Schema
XML Data
namebookKey" xpath/book"/ xpathtitle"/ namebookRef" referbookKey"



..
..
..

..
..

For Keyref Sometimes it is not clear what
paths should be selector, versus field
103
Reference to Global/Relative Keys
  • We can use a Keyref to refer to a global key.
  • E.g. We can refer to a book by its title anywhere in
    the document.
  • To refer to a relative key, Keyrefs must be in the
    same context (element) as the keys,
  • since we have no mechanism to specify the
    context of a key in a keyref
  • E.g. We cannot refer to a chapter of another book!

104
Another Representation for Relative Keys
  • Compute an absolute key from several relative
    keys in a transitive way
  • Such a (absolute) key can be referenced anywhere
    in the document



ment namebook" namechapterKey" xpath/chapter"/ xpathnum"/
105
Combining Keys and Schemas
  • "On XML Integrity Constraints in the Presence of
    DTDs", Fan and Libkin, PODS 2001
  • Keys + DTDs sometimes imply unexpected effects

106
Combining Keys and Schemas
expertJim DB

Graphics nameJim AI

OS
. . . .
107
Combining Keys/Keyrefs and Schemas
DTD subject) REQUIRED subject expert CDATA REQUIRED
  • Keys
  • Any teacher node is keyed by @name
  • Any subject node is keyed by @expert
  • Keyrefs
  • All the values of //subject/@expert appear in
    //teacher/@name
  • But it is impossible for an XML document to
    satisfy all of them!
  • In general, it is undecidable to check if such an XML
    document exists

108
Functional dependencies for XML
  • for $x in //student, $x/@sno -> $x/name
  • Why would this be useful?
  • Not in XML Schema yet!

109
References for XML Constraints
  • DTD: http://www.w3.org/TR/REC-xml/
  • XML Schema: http://www.w3.org/TR/xmlschema-0/
  • "Keys for XML" by Buneman, Davidson, Fan, Hara,
    Tan, in WWW10, 2001.
  • "A Normal Form for XML Documents" by Marcelo
    Arenas, Leonid Libkin, in PODS 2002.
  • "Data on the Web", Abiteboul, Buneman, Suciu,
    section 7.7

110
Useful links
  • http://www.w3.org/XML/
  • http://www.w3.org/TR/xpath
  • http://www.w3.org/TR/xquery-use-cases/
  • http://www.w3.org/TR/xquery/