1
Querying XML: XQuery and XSLT
  • Zachary G. Ives
  • University of Pennsylvania
  • CIS 550 Database Information Systems
  • October 21, 2009

Some slide content courtesy of Susan Davidson,
Dan Suciu, Raghu Ramakrishnan
2
Reminders
  • Homework 4 handed out
  • XQuery and XPath
  • Midterm on Monday
  • Will have more detailed project specs next week

3
Querying XML
  • How do you query a directed graph? A tree?
  • The standard approach used by many XML,
    semistructured-data, and object query languages:
  • Define some sort of template describing
    traversals from the root of the directed graph
  • In XML, the basis of this template is called an
    XPath
  • In its simplest form, an XPath is like a path in
    a file system, but there are more elaborate
    versions with axes, predicates, etc.
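
A minimal sketch of the path idea, using Python's standard xml.etree.ElementTree module, which supports only a subset of XPath. The miniature DBLP-style document and all values in it are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature DBLP-style document (invented for illustration).
DOC = """
<dblp>
  <article key="a1"><author>Ann</author><title>Trees</title><year>1997</year></article>
  <article key="a2"><author>Bob</author><title>Graphs</title><year>2002</year></article>
  <mastersthesis key="m1"><author>Cy</author><title>Paths</title><year>2002</year></mastersthesis>
</dblp>
"""

root = ET.fromstring(DOC)  # root is the <dblp> element

# Simple path, like a file-system path: every title under an article.
article_titles = [t.text for t in root.findall("./article/title")]

# Descendant step ("//"): every title anywhere below the root.
all_titles = [t.text for t in root.findall(".//title")]

# Predicate on a child value: children of dblp whose year is "2002".
titles_2002 = [e.find("title").text for e in root.findall("./*[year='2002']")]

# Predicate on an attribute: the article whose key attribute is "a1".
title_by_key = root.find("./article[@key='a1']/title").text

print(article_titles, all_titles, titles_2002, title_by_key)
```

Each path is a template for a traversal from the root: a step names a child, "//" skips any number of levels, and a bracketed predicate filters the nodes reached so far.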

4
XML Data Model Visualized
[Figure: tree diagram of a DBLP XML document. Legend: root, processing instruction (p-i), element, attribute, text. The root has a ?xml processing instruction and a dblp element; dblp contains mastersthesis and article elements, which carry mdate and key attributes and child elements such as author, title, year, school, editor, journal, volume, and ee.]
5
XQuery
  • A strongly-typed, Turing-complete XML
    manipulation language
  • Attempts to do static typechecking against XML
    Schema
  • Based on an object model derived from Schema
  • Unlike SQL, fully compositional, highly
    orthogonal
  • Inputs and outputs are collections (sequences or
    bags) of XML nodes
  • Anywhere a particular type of object may be used,
    you may use the results of a query of the same type
  • Designed mostly by DB and functional language
    people
  • Attempts to satisfy the needs of data management
    and document management
  • The database-style core is mostly complete (even
    has support for NULLs in XML!!)
  • The document keyword querying features are still
    in the works; this shows in the order-preserving
    default model

6
XQuery's Basic Form
  • Has an analogous form to SQL's SELECT..FROM..WHERE
    ..GROUP BY..ORDER BY
  • The model: bind nodes (or node sets) to
    variables; operate over each legal combination of
    bindings; produce a set of nodes
  • FLWOR statement (note case sensitivity!):
  • for: iterators that bind variables
  • let: collections
  • where: conditions
  • order by: order-conditions (older version was
    SORTBY)
  • return: output constructor
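
The five FLWOR clauses map fairly directly onto an ordinary loop. The sketch below is a rough Python analogue, not XQuery itself; the sample document, the function name flwor, and the output element names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document (invented for illustration).
DOC = """<dblp>
  <proceedings><title>VLDB</title><yr>1999</yr></proceedings>
  <proceedings><title>SIGMOD</title><yr>1999</yr></proceedings>
  <proceedings><title>ICDE</title><yr>2001</yr></proceedings>
</dblp>"""


def flwor(root):
    kept = []
    for p in root.findall("./proceedings"):       # for: iterate, binding p
        yr = p.findtext("yr")                     # let: bind a derived value
        if yr == "1999":                          # where: filter the bindings
            kept.append(p)
    kept.sort(key=lambda p: p.findtext("title"))  # order by: sort what survived
    out = ET.Element("root-tag")                  # return: construct the output
    for p in kept:
        proc = ET.SubElement(out, "proc")
        proc.text = p.findtext("title")
    return out


result = flwor(ET.fromstring(DOC))
titles = [proc.text for proc in result]
print(ET.tostring(result, encoding="unicode"))
```

The point of the analogy is the order of evaluation: bindings are enumerated first, filtered second, sorted third, and only then turned into output nodes.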

7
Iterations in XQuery
  • A series of (possibly nested) FOR statements
    assigning the results of XPaths to variables
      for $root in document("http://my.org/my.xml")
      for $sub in $root/rootElement,
          $sub2 in $sub/subElement, ...
  • Something like a template that pattern-matches,
    produces a binding tuple
  • For each of these, we evaluate the WHERE and
    possibly output the RETURN template
  • document() or doc() function specifies an input
    file as a URI
  • The old version was document(), now doc(), but it
    depends on your XQuery implementation

8
Two XQuery Examples
      <root-tag> {
        for $p in document("dblp.xml")/dblp/proceedings,
            $yr in $p/yr
        where $yr = "1999"
        return <proc> {$p} </proc>
      } </root-tag>

      for $i in document("dblp.xml")/dblp/inproceedings[author/text() = "John Smith"]
      return <smith-paper>
               <title>{ $i/title/text() }</title>
               <key>{ $i/@key }</key>
               { $i/crossref }
             </smith-paper>

9
Nesting in XQuery
  • Nesting XML trees is perhaps the most common
    operation
  • In XQuery, it's easy: put a subquery in the
    return clause where you want things to repeat!
      for $u in document("dblp.xml")/universities
      where $u/country = "USA"
      return <ms-theses-99>
               { $u/title }
               { for $mt in $u/../mastersthesis
                 where $mt/year/text() = "1999" and ____________
                 return $mt/title }
             </ms-theses-99>

10
Collections & Aggregation in XQuery
  • In XQuery, many operations return collections
  • XPaths, sub-XQueries, functions over these, ...
  • The let clause assigns the results to a variable
  • Aggregation simply applies a function over a
    collection, where the function returns a value
    (very elegant!)
      let $allpapers := document("dblp.xml")/dblp/article
      return <article-authors>
               <count> { fn:count(fn:distinct-values($allpapers/author)) } </count>
               { for $paper in doc("dblp.xml")/dblp/article
                 let $pauth := $paper/author
                 return <paper> { $paper/title }
                          <count> { fn:count($pauth) } </count>
                        </paper> }
             </article-authors>
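
As a rough analogue of the query above, aggregation in Python is also just a function applied to a collection. This is a sketch over an invented two-article document; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document (invented for illustration).
DOC = """<dblp>
  <article><title>P1</title><author>Smith</author></article>
  <article><title>P2</title><author>Smith</author><author>Gray</author></article>
</dblp>"""

root = ET.fromstring(DOC)

# let: bind a whole collection of nodes to one name.
all_papers = root.findall("./article")

# count(distinct-values(...)): distinct author names over all papers.
distinct_authors = {a.text for p in all_papers for a in p.findall("author")}
author_count = len(distinct_authors)

# Per-paper aggregate: (title, number of authors) for each article.
per_paper = [(p.findtext("title"), len(p.findall("author"))) for p in all_papers]

print(author_count, per_paper)
```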

11
Collections, Ctd.
  • Unlike in SQL, we can compose aggregations and
    create new collections from old
      <result> {
        let $avgItemsSold := fn:avg(
              for $order in document("my.xml")/orders/order
              let $totalSold := fn:sum($order/item/quantity)
              return $totalSold)
        return $avgItemsSold
      } </result>
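
Composing aggregates means feeding the collection produced by one aggregate into another. A small Python sketch of the same average-of-sums computation, over an invented orders document (the helper name total_sold is hypothetical):

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical orders document (invented for illustration).
DOC = """<orders>
  <order><item><quantity>2</quantity></item><item><quantity>3</quantity></item></order>
  <order><item><quantity>7</quantity></item></order>
</orders>"""

root = ET.fromstring(DOC)


def total_sold(order):
    # Inner aggregate: sum of the quantities within one order.
    return sum(int(q.text) for q in order.findall("./item/quantity"))


# New collection built from the old one: one total per order.
totals = [total_sold(order) for order in root.findall("./order")]

# Outer aggregate applied to the collection of inner results.
avg_items_sold = mean(totals)

print(totals, avg_items_sold)
```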

12
Distinct-ness
  • In XQuery, DISTINCT-ness happens as a function
    over a collection
  • But since we have nodes, we can do duplicate
    removal according to value or node
  • Can do fn:distinct-values(collection) to remove
    duplicate values, or fn:distinct-nodes(collection)
    to remove duplicate nodes
      for $years in fn:distinct-values(doc("dblp.xml")//year/text())
      return $years
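
The difference between duplicate removal by value and by node can be sketched in Python, where object identity stands in for node identity. The document and variable names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: two different year nodes carry the same value.
DOC = """<dblp>
  <article><year>1999</year></article>
  <article><year>1999</year></article>
  <article><year>2002</year></article>
</dblp>"""

root = ET.fromstring(DOC)
years = root.findall(".//year")

# A collection in which the first node appears twice, as can happen
# when two overlapping paths reach the same node.
collection = years + [years[0]]

# Distinct by value: compare text content only.
distinct_values = sorted({y.text for y in collection})

# Distinct by node: keep each node object once, even if another
# node happens to hold an equal value.
seen = set()
distinct_nodes = []
for y in collection:
    if id(y) not in seen:
        seen.add(id(y))
        distinct_nodes.append(y)

print(distinct_values, len(distinct_nodes))
```

By value, the two 1999 nodes collapse into one entry; by node, they stay separate and only the repeated reference to the same node is dropped.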

13
Sorting in XQuery
  • SQL actually allows you to sort its output, with
    a special ORDER BY clause (which we haven't
    discussed, but which specifies a sort key list)
  • XQuery borrows this idea
  • In XQuery, what we order is the sequence of
    result tuples output by the return clause
      for $x in document("dblp.xml")/proceedings
      order by $x/title/text()
      return $x

14
What If Order Doesn't Matter?
  • By default:
  • SQL is unordered
  • XQuery is ordered everywhere!
  • But unordered queries are much faster to answer
  • XQuery has a way of telling the query engine to
    avoid preserving order:
      unordered { for $x in (mypath) ... }

15
Querying & Defining Metadata: Can't Do This in SQL
  • Can get a node's name by querying node-name():
      for $x in document("dblp.xml")/dblp/*
      return node-name($x)
  • Can construct elements and attributes using
    computed names:
      for $x in document("dblp.xml")/dblp/*,
          $year in $x/year,
          $title in $x/title/text()
      return
        element { node-name($x) } {
          attribute { fn:concat("year-", $year) } { $title }
        }

16
XQuery Summary
  • Very flexible and powerful language for XML
  • Clean and orthogonal: can always replace a
    collection with an expression that creates
    collections
  • DB and document-oriented (we hope)
  • The core is relatively clean and easy to
    understand
  • Turing-complete: we'll talk more about XQuery
    functions soon

17
XSL(T): The Bridge Back to HTML
  • XSL (Extensible Stylesheet Language) is actually
    divided into two parts
  • XSL-FO: formatting for XML
  • XSLT: a special transformation language
  • We'll leave XSL-FO for you to read off
    www.w3.org, if you're interested
  • XSLT is actually able to convert from XML → HTML,
    which is how many people do their formatting
    today
  • Products like Apache Cocoon generally translate
    XML → HTML on the server side

18
A Different Style of Language
  • XSLT is based on a series of templates that match
    different parts of an XML document
  • There's a policy for what rule or template is
    applied if more than one matches (it's not what
    you'd think!)
  • XSLT templates can invoke other templates
  • XSLT templates can be nonterminating (beware!)
  • XSLT templates are based on XPath matches, and
    we can also apply other templates (potentially to
    selected XPaths)
  • Within each template, we describe what should be
    output
  • (Matches to text default to outputting it)

19
An XSLT Stylesheet
      <xsl:stylesheet version="1.1"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="/dblp">
          <html><head>This is DBLP</head>
          <body>
            <xsl:apply-templates />
          </body>
          </html>
        </xsl:template>
        <xsl:template match="inproceedings">
          <h2><xsl:apply-templates select="title" /></h2>
          <p><xsl:apply-templates select="author"/></p>
        </xsl:template>
      </xsl:stylesheet>

20
Results of XSLT Stylesheet
  • Input:
      <dblp>
        <inproceedings>
          <title>Paper1</title>
          <author>Smith</author>
        </inproceedings>
        <inproceedings>
          <author>Chakrabarti</author>
          <author>Gray</author>
          <title>Paper2</title>
        </inproceedings>
      </dblp>
  • Output (ignoring whitespace):
      <html><head>This is DBLP</head>
      <body>
        <h2>Paper1</h2>
        <p>Smith</p>
        <h2>Paper2</h2>
        <p>ChakrabartiGray</p>
      </body>
      </html>
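
To make the template-matching style concrete, here is a toy Python emulation of the stylesheet's two rules plus a default rule. It is a sketch of the dispatch idea only, not an XSLT processor: rules are looked up by tag name rather than by full XPath match patterns, whitespace is discarded, and all function names are invented.

```python
import xml.etree.ElementTree as ET

DOC = """<dblp>
  <inproceedings><title>Paper1</title><author>Smith</author></inproceedings>
  <inproceedings>
    <author>Chakrabarti</author><author>Gray</author><title>Paper2</title>
  </inproceedings>
</dblp>"""


def default_rule(node):
    # Default behaviour: emit the node's text, then process its children.
    return (node.text or "").strip() + apply_templates(list(node))


def dblp_rule(node):
    # Emit the page skeleton and process all children inside <body>.
    return ("<html><head>This is DBLP</head><body>"
            + apply_templates(list(node))
            + "</body></html>")


def inproceedings_rule(node):
    # Process only selected children: titles go in <h2>, authors in <p>.
    return ("<h2>" + apply_templates(node.findall("title")) + "</h2>"
            + "<p>" + apply_templates(node.findall("author")) + "</p>")


# Simplification: one rule per tag name, with a fallback.
RULES = {"dblp": dblp_rule, "inproceedings": inproceedings_rule}


def apply_templates(nodes):
    return "".join(RULES.get(n.tag, default_rule)(n) for n in nodes)


html = apply_templates([ET.fromstring(DOC)])
print(html)
```

Because the inproceedings rule wraps all selected authors in a single paragraph, a paper with two authors produces one p element containing both names.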

21
What XSLT Can and Can't Do
  • XSLT is great at converting XML to other formats
  • XML → diagrams in SVG, HTML, LaTeX, ...
  • XSLT doesn't do joins (well), it only works on
    one XML file at a time, and it's limited in
    certain respects
  • It's not a query language, really
  • But it's a very good formatting language
  • Most web browsers (post Netscape 4.7x) support
    XSLT and XSL formatting objects
  • But most real implementations use XSLT with
    something like Apache Cocoon
  • You may want to use XSL/XSLT for your projects;
    see www.w3.org/TR/xslt for the spec

22
Querying XML
  • We've seen three XML manipulation formalisms
    today
  • XPath: the basic language for projecting and
    selecting (evaluating path expressions and
    predicates) over XML
  • XQuery: a statically typed, Turing-complete XML
    processing language
  • XSLT: a template-based language for transforming
    XML documents
  • Each is extremely useful for certain applications!

23
Views in SQL and XQuery
  • A view is a named query
  • We use the name of the view to invoke the query
    (treating it as if it were the relation it
    returns)
  • SQL:
      CREATE VIEW V(A,B,C) AS
        SELECT A,B,C FROM R WHERE R.A = '123'
  • XQuery:
      declare function V() as element(content)* {
        for $r in doc("R")/root/tree,
            $a in $r/a, $b in $r/b, $c in $r/c
        where $a = "123"
        return <content>{$a, $b, $c}</content>
      };

Using the views:
      SELECT * FROM V, R WHERE V.B = 5 AND V.C = R.C

      for $v in V()/content, $r in doc("r")/root/tree
      where $v/b = $r/b
      return $v
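
A minimal runnable sketch of the SQL half, using Python's built-in sqlite3 module with an in-memory database. The rows in R are invented, and the view is declared without an explicit column list so that its column names come from the SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A TEXT, B INTEGER, C INTEGER);
    -- Hypothetical sample rows (invented for illustration).
    INSERT INTO R VALUES ('123', 5, 10), ('123', 6, 20), ('999', 5, 10);
    -- The view is just a named query over R.
    CREATE VIEW V AS
        SELECT A, B, C FROM R WHERE R.A = '123';
""")

# Use the view by name, exactly as if it were a stored relation.
rows = conn.execute(
    "SELECT V.A, V.B, V.C, R.A FROM V, R "
    "WHERE V.B = 5 AND V.C = R.C ORDER BY R.A"
).fetchall()

print(rows)
```

The referencing query never mentions the filter on A; that condition lives inside the view definition and is applied whenever V is used.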
24
What's Useful about Views
  • Providing security/access control
  • We can assign users permissions on different
    views
  • Can select or project so we only reveal what we
    want!
  • Can be used as relations in other queries
  • Allows the user to query things that make more
    sense
  • Describe transformations from one schema (the
    base relations) to another (the output of the
    view)
  • The basis of converting from XML to relations or
    vice versa
  • This will be incredibly useful in data
    integration, discussed soon
  • Allow us to define recursive queries

25
Materialized vs. Virtual Views
  • A virtual view is a named query that is actually
    re-computed every time it is merged with the
    referencing query
      CREATE VIEW V(A,B,C) AS
        SELECT A,B,C FROM R WHERE R.A = '123'
  • A materialized view is one that is computed once
    and its results are stored as a table
  • Think of this as a cached answer
  • These are incredibly useful!
  • Techniques exist for using materialized views to
    answer other queries
  • Materialized views are the basis of relating
    tables in different schemas

Referencing query: SELECT * FROM V, R WHERE V.B = 5 AND V.C = R.C
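
The contrast can be sketched with Python's sqlite3 module. SQLite has no built-in materialized views, so the sketch emulates one with an ordinary table that is filled from the query and refreshed by hand; the table names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A TEXT, B INTEGER, C INTEGER);
    INSERT INTO R VALUES ('123', 5, 10);
    -- Virtual view: only the query text is stored.
    CREATE VIEW V_virtual AS
        SELECT A, B, C FROM R WHERE A = '123';
    -- Emulated materialized view: the query's answer is stored as a table.
    CREATE TABLE V_materialized AS
        SELECT A, B, C FROM R WHERE A = '123';
""")

# The base relation changes after both views were defined.
conn.execute("INSERT INTO R VALUES ('123', 6, 20)")


def count(name):
    return conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]


virtual_count = count("V_virtual")            # recomputed, sees the new tuple
materialized_count = count("V_materialized")  # cached answer, now stale

# Naive refresh: throw the cached answer away and recompute it.
conn.executescript("""
    DELETE FROM V_materialized;
    INSERT INTO V_materialized SELECT A, B, C FROM R WHERE A = '123';
""")
refreshed_count = count("V_materialized")

print(virtual_count, materialized_count, refreshed_count)
```

The stale count in the middle is the price of caching: the stored answer is cheap to read but must be kept in step with the base relation.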
26
Views Should Stay Fresh
  • Views (sometimes called intensional relations)
    behave, from the perspective of a query language,
    exactly like base relations (extensional
    relations)
  • But there's an association that should be
    maintained
  • If tuples change in the base relation, they
    should change in the view (whether it's
    materialized or not)
  • If tuples change in the view, that should reflect
    in the base relation(s)

27
View Maintenance and the View Update Problem
  • There exist algorithms to incrementally recompute
    a materialized view when the base relations
    change
  • We can try to propagate view changes to the base
    relations
  • However, there are lots of views that aren't
    easily updatable
  • We can ensure views are updatable by enforcing
    certain constraints (e.g., no aggregation), but
    this limits the kinds of views we can have!

[Figure: relations R and S and the join view R ⋈ S, with a tuple of the view marked "delete?"]
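
Both halves of the problem can be sketched in a few lines of Python, assuming the view is a join of two relations held as sets of tuples. The relations, their contents, and the helper name join are invented for illustration.

```python
def join(r, s):
    # Join R(a, b) with S(b, c) on the shared b column.
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}


R = {(1, "x"), (2, "y")}
S = {("x", 10), ("y", 20)}
view = join(R, S)

# View maintenance: when tuples are inserted into R, join only the
# new tuples with S instead of recomputing the whole view.
delta_r = {(3, "x")}
view |= join(delta_r, S)
R |= delta_r
incremental_matches_recompute = (view == join(R, S))

# View update problem: remove (1, "x", 10) from the view by changing
# a base relation. There are two candidate translations.
target = (1, "x", 10)
wanted = view - {target}

after_deleting_from_r = join(R - {(1, "x")}, S)
after_deleting_from_s = join(R, S - {("x", 10)})

delete_from_r_is_exact = (after_deleting_from_r == wanted)
delete_from_s_is_exact = (after_deleting_from_s == wanted)

print(incremental_matches_recompute, delete_from_r_is_exact, delete_from_s_is_exact)
```

Deleting from S has a side effect here: it also removes the view tuple for (3, "x"), so that translation does not produce the requested view. In general neither choice may be safe, which is why updatable views are restricted.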
28
Next Time
  • Can we have views in XML over tables in
    relations?
  • Or vice versa?
  • What other things can we use views for?