Querying XML - PowerPoint PPT Presentation



1
Querying XML
  • Zachary G. Ives
  • University of Pennsylvania
  • CIS 455 / 555 Internet and Web Systems
  • February 3, 2009

2
Today
  • DTDs
  • XPath and XSLT
  • Reminders
  • Assignment 1 milestone 1 due tonight
  • Assignment 1 milestone 2 due Feb 17

3
Integrating XML: What If We Have Multiple
Sources with the Same Tags?
  • Namespaces allow us to specify a context for
    different tags
  • Two parts
  • Binding of namespace to URI
  • Qualified names
  • <tag xmlns="http://www.fictitious.com/mypath"
    xmlns:myns="http://www.fictitious.com/mypath">
  • <thistag>is in namespace myns</thistag>
  • <myns:thistag>is the same</myns:thistag><otherns:thistag>
    is a different tag</otherns:thistag>
  • </tag>
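To see what the namespace binding buys us, here is a small sketch using Python's standard `xml.etree.ElementTree`, which reports every qualified name in expanded `{URI}local-name` form, so prefixes disappear and only the bound URI matters. The document is modeled on the slide; the `otherns` URI (`http://www.example.org/otherpath`) is made up for illustration, and declaring the same URI as the default namespace is an assumption that makes the unprefixed `<thistag>` really belong to `myns`.

```python
import xml.etree.ElementTree as ET

# Document modeled on the slide. The default-namespace declaration puts the
# unprefixed <thistag> in myns's URI; the otherns URI is hypothetical.
DOC = """<tag xmlns="http://www.fictitious.com/mypath"
     xmlns:myns="http://www.fictitious.com/mypath"
     xmlns:otherns="http://www.example.org/otherpath">
  <thistag>is in namespace myns</thistag>
  <myns:thistag>is the same</myns:thistag>
  <otherns:thistag>is a different tag</otherns:thistag>
</tag>"""

root = ET.fromstring(DOC)

# The parser resolves each prefix through its binding and reports the
# qualified name as "{namespace URI}local-name"; the prefix itself is gone.
expanded = [child.tag for child in root]
for name in expanded:
    print(name)
```

The first two children print the same expanded name, while the `otherns` child differs even though all three share the local name `thistag`.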

4
XML Isn't Enough on Its Own
  • It's too unconstrained for many cases!
  • How will we know when we're getting garbage?
  • How will we query?
  • How will we understand what we got?

5
Document Type Definitions (DTDs)
  • DTD is an EBNF grammar defining XML structure
  • XML document specifies an associated DTD, plus
    the root element
  • DTD specifies children of the root (and so on)
  • DTD defines special significance for attributes
  • IDs: special attributes that are analogous to
    keys for elements
  • IDREFs: references to IDs
  • IDREFS: space-delimited list of IDREFs

6
An Example DTD
  • Example DTD
  • <!ELEMENT dblp ((mastersthesis | article)*)>
  • <!ELEMENT mastersthesis (author, title, year, school,
    committeemember*)>
  • <!ATTLIST mastersthesis mdate CDATA #REQUIRED
    key ID #REQUIRED
    advisor CDATA #IMPLIED>
  • <!ELEMENT author (#PCDATA)>
  • Example use of DTD in XML file
  • <?xml version="1.0" encoding="ISO-8859-1" ?>
  • <!DOCTYPE dblp SYSTEM "my.dtd">
  • <dblp>
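Python's standard library does not validate documents against a DTD, so the sketch below hand-translates the `mastersthesis` declarations into a regular expression over the sequence of child tag names, plus a check for the `#REQUIRED` attributes. It only illustrates that a content model is a grammar over children; it is not a DTD validator, and the `committeemember*` quantifier is an assumption.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation (an assumption of this sketch, not a DTD parser) of
#   <!ELEMENT mastersthesis (author, title, year, school, committeemember*)>
#   <!ATTLIST mastersthesis mdate CDATA #REQUIRED key ID #REQUIRED ...>
# The content model becomes a regular expression over child tag names.
CONTENT_MODEL = re.compile(r"author,title,year,school(,committeemember)*")
REQUIRED_ATTRIBUTES = ("mdate", "key")


def conforms(elem: ET.Element) -> bool:
    """Check one <mastersthesis> element against the translated declarations."""
    if elem.tag != "mastersthesis":
        return False
    if any(name not in elem.attrib for name in REQUIRED_ATTRIBUTES):
        return False
    child_sequence = ",".join(child.tag for child in elem)
    return CONTENT_MODEL.fullmatch(child_sequence) is not None


good = ET.fromstring(
    '<mastersthesis mdate="2002-01-03" key="ms/Brown92">'
    "<author>Kurt P. Brown</author><title>PRPL</title>"
    "<year>1992</year><school>Univ. of Wisconsin-Madison</school>"
    "</mastersthesis>")
wrong_order = ET.fromstring(
    '<mastersthesis mdate="2002-01-03" key="x">'
    "<title>T</title><author>A</author><year>1992</year><school>S</school>"
    "</mastersthesis>")
missing_key = ET.fromstring(
    '<mastersthesis mdate="2002-01-03">'
    "<author>A</author><title>T</title><year>1992</year><school>S</school>"
    "</mastersthesis>")

print(conforms(good), conforms(wrong_order), conforms(missing_key))  # True False False
```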

7
Representing Graphs in XML
  • <?xml version="1.0" encoding="ISO-8859-1" ?>
  • <!DOCTYPE graph SYSTEM "special.dtd">
  • <graph>
  • <author id="author1">
  • <name>John Smith</name>
  • </author>
  • <article>
  • <author ref="author1" /> <title>Paper1</title>
  • </article>
  • <article>
  • <author ref="author1" /> <title>Paper2</title>
  • </article>
  • </graph>
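A minimal sketch of how an application might turn this tree into a graph: index every element by its `id`, then follow each `ref`. It assumes, as the slide's naming suggests, that `id` plays the ID role and `ref` the IDREF role; `ElementTree` does not read the DTD, so it cannot work that out by itself (the DOCTYPE line is omitted for the same reason).

```python
import xml.etree.ElementTree as ET

# The graph document from the slide, without the DOCTYPE line.
DOC = """<graph>
  <author id="author1"><name>John Smith</name></author>
  <article><author ref="author1"/><title>Paper1</title></article>
  <article><author ref="author1"/><title>Paper2</title></article>
</graph>"""

root = ET.fromstring(DOC)

# Assumption: attributes literally named "id" act as IDs and "ref" as IDREFs.
by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}

# Following each ref turns the tree into a graph: both articles reach the
# same shared author element.
edges = []
for article in root.findall("article"):
    target = by_id[article.find("author").get("ref")]
    edges.append((article.findtext("title"), target.findtext("name")))

print(edges)  # [('Paper1', 'John Smith'), ('Paper2', 'John Smith')]
```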

8
Graph Data Model
[Figure: the document above drawn as a graph. Root → ?xml, !DOCTYPE, graph; graph → an author (id = author1, name "John Smith") and two article nodes, each with an author (ref = author1) and a title (Paper1, Paper2).]
9
Graph Data Model
[Figure: the same graph data model as on the previous slide (Root, ?xml, !DOCTYPE, graph, the author with id and name John Smith, two articles with an author ref and titles Paper1 and Paper2).]
10
DTDs Are Very Limited
  • DTDs capture grammatical structure, but have some
    drawbacks
  • Not themselves in XML: inconvenient to build
    tools for them
  • Don't capture types of scalars
  • Global ID/reference space is inconvenient
  • No way of defining OO-like inheritance

11
XML Schema: DTDs Rethought
  • Features:
  • XML syntax
  • Better way of defining keys using XPaths
  • Type subclassing
  • And, of course, built-in datatypes

12
Basic Constructs of Schema
  • Separation of elements (and attributes) from
    types
  • complexType is a structured type
  • It can have sequences or choices
  • element and attribute have name and type
  • Elements may also have minOccurs and maxOccurs
  • Subtyping, most commonly using
  • <complexContent><extension base="prevType">
    … </extension></complexContent>

13
Simple Schema Example
  • <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  • <xsd:element name="mastersthesis" type="ThesisType"/>
  • <xsd:complexType name="ThesisType">
  •   <xsd:sequence>
  •     <xsd:element name="author" type="xsd:string"/>
  •     <xsd:element name="title" type="xsd:string"/>
  •     <xsd:element name="year" type="xsd:integer"/>
  •     <xsd:element name="school" type="xsd:string"/>
  •     <xsd:element name="committeemember"
        type="CommitteeType" minOccurs="0"/>
  •   </xsd:sequence>
  •   <xsd:attribute name="mdate" type="xsd:date"/>
  •   <xsd:attribute name="key" type="xsd:string"/>
  •   <xsd:attribute name="advisor" type="xsd:string"/>
  • </xsd:complexType>
  • </xsd:schema>

14
Embedding XML Schema
  • <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="s1.xsd"> <grade>a</grade> </root>
  • <s1:root xmlns:s1="http://www.schemaValid.com/s1ns"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.schemaValid.com/s1ns s1ns.xsd">
    <s1:grade>a</s1:grade> </s1:root>
  • But the XML parser is actually free to ignore
    these hints: the schema is typically specified from
    outside the document
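Because `xsi:schemaLocation` and `xsi:noNamespaceSchemaLocation` are ordinary attributes in the schema-instance namespace (`http://www.w3.org/2001/XMLSchema-instance` in the XML Schema Recommendation), an application can read the hints itself and decide what to do with them. This sketch does only that with `ElementTree`, which performs no schema validation at all.

```python
import xml.etree.ElementTree as ET

# Expanded-name prefix for attributes in the schema-instance namespace.
XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

doc = ET.fromstring(
    '<s1:root xmlns:s1="http://www.schemaValid.com/s1ns" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.schemaValid.com/s1ns s1ns.xsd">'
    "<s1:grade>a</s1:grade></s1:root>")

# To the parser the hint is just an attribute. xsi:schemaLocation holds
# whitespace-separated (namespace URI, schema location) pairs.
tokens = doc.get(XSI + "schemaLocation", "").split()
hints = dict(zip(tokens[0::2], tokens[1::2]))
print(hints)  # {'http://www.schemaValid.com/s1ns': 's1ns.xsd'}
```

Whether to fetch `s1ns.xsd` and validate against it is left entirely to the application.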

15
Designing an XML Schema/DTD
  • Often we are given a DTD/Schema; if not, we need
    to design one
  • We orient the XML tree around the central
    objects in a particular application

16
Manipulating XML
  • Sometimes
  • Need to restructure an XML document
  • Or simply need to retrieve certain parts that
    satisfy a constraint, e.g.
  • All books
  • All books by author XYZ

17
Document Object Model (DOM) vs. Queries
  • Build a DOM tree (as we saw earlier) and access
    it via Java (etc.) DOM Node objects
  • DOM objects have methods like getFirstChild(),
    getNextSibling()
  • Common way of traversing the tree
  • Can also modify the DOM tree to alter the XML,
    via insertBefore(), appendChild(), etc.
  • Alternate approach: a query language
  • Define some sort of a template describing
    traversals from the root of the directed graph
  • In XML, the basis of this template is called an
    XPath
  • Can also declare some constraints on the values
    you want
  • The XPath returns a node set of matches
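Python's `xml.dom.minidom` implements the same DOM interfaces (exposed as attributes such as `firstChild` and `nextSibling` rather than Java-style getters), so the traversal-and-update style can be sketched as follows. The document contents and the helper name `collect_titles` are made up for illustration.

```python
from xml.dom.minidom import parseString

doc = parseString("<dblp><article><title>Paper1</title></article>"
                  "<article><title>Paper2</title></article></dblp>")


def collect_titles(node):
    """Depth-first walk that follows only firstChild / nextSibling links."""
    found = []
    child = node.firstChild
    while child is not None:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == "title":
            found.append(child.firstChild.data)  # the title's text node
        found.extend(collect_titles(child))
        child = child.nextSibling
    return found


# Modify the tree: the DOM provides insertBefore() (there is no insertAfter()),
# so put a new article ahead of the current first one.
dblp = doc.documentElement
new_title = doc.createElement("title")
new_title.appendChild(doc.createTextNode("Paper0"))
new_article = doc.createElement("article")
new_article.appendChild(new_title)
dblp.insertBefore(new_article, dblp.firstChild)

print(collect_titles(doc))  # ['Paper0', 'Paper1', 'Paper2']
```

Every step of the traversal is spelled out by hand, which is exactly the work a query language lets us avoid.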

18
XPaths
  • In its simplest form, an XPath is like a path in
    a file system
  • /mypath/subpath//morepath
  • The XPath returns a node set representing the XML
    nodes (and their subtrees) at the end of the path
  • XPaths can have node tests at the end, returning
    only particular node types, e.g., text(),
    processing-instruction(), comment(), element(),
    attribute()
  • XPath is fundamentally an ordered language: it
    can query in an order-aware fashion, and it returns
    nodes in document order

19
Sample XML
  • <?xml version="1.0" encoding="ISO-8859-1" ?>
  • <dblp>
  • <mastersthesis mdate="2002-01-03"
    key="ms/Brown92">
  •   <author>Kurt P. Brown</author>
  •   <title>PRPL: A Database Workload
    Specification Language</title>
  •   <year>1992</year>
  •   <school>Univ. of Wisconsin-Madison</school>
  •   </mastersthesis>
  • <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
  •   <editor>Paul R. McJones</editor>
  •   <title>The 1995 SQL Reunion</title>
  •   <journal>Digital System Research Center
    Report</journal>
  •   <volume>SRC1997-018</volume>
  •   <year>1997</year>
  •   <ee>db/labs/dec/SRC1997-018.html</ee>
  •   <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
  •   </article>
  • </dblp>

20
XML Data Model Visualized
[Figure: the sample document drawn as a tree, with a legend for root, p-i (processing instruction), element, attribute and text nodes. Root → ?xml and dblp; dblp → mastersthesis and article, each with mdate and key attributes; mastersthesis → author, title, year, school; article → editor, title, year, journal, volume, ee, ee; text leaves hold the values.]
21
Some Example XPath Queries
  • /dblp/mastersthesis/title
  • /dblp//editor
  • //title
  • //title/text()
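Python's `ElementTree` supports only a limited XPath subset (paths are evaluated relative to an element, and `text()` is not available), but that is enough to approximate these four queries on a trimmed copy of the sample document. Mapping each absolute `/dblp/...` path to a path relative to the `dblp` element is this sketch's assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample dblp document.
SAMPLE = """<dblp>
  <mastersthesis mdate="2002-01-03" key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
    <school>Univ. of Wisconsin-Madison</school>
  </mastersthesis>
  <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <journal>Digital System Research Center Report</journal>
    <year>1997</year>
  </article>
</dblp>"""

dblp = ET.fromstring(SAMPLE)  # the element that /dblp selects

thesis_titles = dblp.findall("mastersthesis/title")   # /dblp/mastersthesis/title
editors = dblp.findall(".//editor")                    # /dblp//editor
all_titles = dblp.findall(".//title")                  # //title (dblp itself is no title)
title_text = [t.text for t in all_titles]              # //title/text(), read via .text

print(title_text)
# ['PRPL: A Database Workload Specification Language', 'The 1995 SQL Reunion']
```

As with XPath, the matches come back in document order.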

22
Context Nodes and Relative Paths
  • XPath has a notion of a context node: it's
    analogous to a current directory
  • . represents this context node
  • .. represents the parent node
  • We can express relative paths
  • subpath/sub-subpath/../.. gets us back to the
    context node
  • By default, the document root is the context node
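A small `ElementTree` sketch of relative paths: `find()` treats the element it is called on as the context node, so choosing that element plays the role of changing directory. One difference from XPath: `ElementTree` refuses to let `..` climb above the element `find()` was called on and returns `None` instead.

```python
import xml.etree.ElementTree as ET

dblp = ET.fromstring(
    '<dblp><mastersthesis key="ms/Brown92">'
    "<author>Kurt P. Brown</author><year>1992</year>"
    "</mastersthesis></dblp>")

# Pick a context node, much like cd-ing into a directory.
thesis = dblp.find("mastersthesis")

year = thesis.find("year").text                       # path relative to the context node
itself = thesis.find(".")                             # "." is the context node
round_trip = dblp.find("mastersthesis/author/../..")  # down two steps, up two
above = thesis.find("..")                             # ElementTree will not leave the start element

print(year, itself.get("key"), round_trip.tag)        # 1992 ms/Brown92 dblp
print(above)                                          # None
```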

23
Predicates: Filtering Operations
  • A predicate allows us to filter the node set
    based on selection-like conditions over
    sub-XPaths
  • /dblp/article[title = "Paper1"]
  • which is equivalent to
  • /dblp/article[./title/text() = "Paper1"]
  • because of type coercion. What does this do?
  • /dblp/article[@key = 123 and ./title/text() = "Paper1"
    and ./author//element()]
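`ElementTree` understands simple predicates such as `[tag='text']` and `[@attr='value']`, but not `and`. Stacking predicates has the same effect as a conjunction, and the `./author//element()` test can be finished in Python. The three articles and their key values are hypothetical data for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical articles; the key values are made up.
dblp = ET.fromstring("""<dblp>
  <article key="123"><author><name>John Smith</name></author><title>Paper1</title></article>
  <article key="456"><author>Jane Doe</author><title>Paper1</title></article>
  <article key="123"><author>Jane Doe</author><title>Paper2</title></article>
</dblp>""")

# /dblp/article[title = "Paper1"]: compares the child's full text content.
by_title = dblp.findall("article[title='Paper1']")

# No "and" inside a predicate, but stacked predicates filter in sequence:
# /dblp/article[@key = "123" and title = "Paper1"]
by_key_and_title = dblp.findall("article[@key='123'][title='Paper1']")

# The last conjunct, ./author//element(), asks whether author has any
# element descendants; that is outside ElementTree's subset, so test it here.
full_query = [a for a in by_key_and_title if a.find("author//*") is not None]

print([a.get("key") for a in by_title], len(full_query))  # ['123', '456'] 1
```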

24
Axes: More Complex Traversals
  • Thus far, we've seen XPath expressions that go
    down the tree (and up one step)
  • But we might want to go up, left, right, etc.
  • These are expressed with so-called axes:
  • self::path-step
  • child::path-step    parent::path-step
  • descendant::path-step    ancestor::path-step
  • descendant-or-self::path-step    ancestor-or-self::path-step
  • preceding-sibling::path-step    following-sibling::path-step
  • preceding::path-step    following::path-step
  • The previous XPaths we saw were in abbreviated
    form
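`ElementTree` has no axis syntax and keeps no parent pointers, so this sketch computes several axes by hand from a precomputed parent map and document-order index. It is restricted to element nodes (real XPath axes also cover text and other node kinds), and each result is returned in document order.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<dblp>"
    "<mastersthesis><author/><title/><year/><school/></mastersthesis>"
    "<article><editor/><title/><year/></article>"
    "</dblp>")

# Precomputed helpers (an assumption of this sketch).
ORDER = list(DOC.iter())                   # elements in document order
POS = {e: i for i, e in enumerate(ORDER)}
PARENT = {c: p for p in ORDER for c in p}


def ancestor(e):
    out = []
    while e in PARENT:
        e = PARENT[e]
        out.append(e)
    return sorted(out, key=POS.get)


def descendant(e):
    return list(e.iter())[1:]              # iter() yields e itself first


def preceding_sibling(e):
    sibs = list(PARENT[e]) if e in PARENT else []
    return sibs[:sibs.index(e)] if sibs else []


def following_sibling(e):
    sibs = list(PARENT[e]) if e in PARENT else []
    return sibs[sibs.index(e) + 1:] if sibs else []


def preceding(e):
    # Everything before e in document order, minus e's ancestors.
    above = set(ancestor(e))
    return [n for n in ORDER[:POS[e]] if n not in above]


def following(e):
    # Everything after e in document order, minus e's descendants.
    inside = set(e.iter())
    return [n for n in ORDER[POS[e] + 1:] if n not in inside]


def tags(nodes):
    return [n.tag for n in nodes]


year = DOC.find("mastersthesis/year")
print(tags(ancestor(year)), tags(preceding_sibling(year)), tags(following_sibling(year)))
# ['dblp', 'mastersthesis'] ['author', 'title'] ['school']
print(tags(preceding(year)), tags(following(year)))
# ['author', 'title'] ['school', 'article', 'editor', 'title', 'year']
```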

25
Users of XPath
  • XML Schema uses simple XPaths in defining keys
    and uniqueness constraints
  • XLink and XPointer: hyperlinks for XML
  • XSLT: useful for converting from XML to other
    representations (e.g., HTML, PDF, SVG)
  • XQuery: useful for restructuring an XML document
    or combining multiple documents
  • Might well turn into the glue between Web
    Services, etc.

26
XSLT: Transforming an XML Document
  • XSLT: XSL (Extensible Stylesheet Language) Transformations
  • Companion to XSL-FO, formatting for XML
  • A language for substituting structured fragments
    for XML content
  • Transforms single document → single document
  • Useful for XML → XML conversions, XML → HTML
  • Runs on the server side (e.g., Apache Cocoon) or
    client-side (modern browsers)

27
A Functional Language for XML
  • XSLT is based on a series of templates that match
    different parts of an XML document
  • There's a policy for what rule or template is
    applied if more than one matches (it's not what
    you'd think!)
  • XSLT templates can invoke other templates
  • XSLT templates can be nonterminating (beware!)
  • XSLT templates are based on XPath matches, and
    we can also apply other templates (potentially to
    selected XPaths)
  • Within each template, directly describe what
    should be output

28
An XSLT Template
  • Is itself an XML document
  • XML tags either create output OR are XSL operations
  • All XSL tags are prefixed with the xsl namespace
  • All non-XSL tags are part of the XML output
  • Common XSL operations:
  • template with a match XPath
  • Recursive call to apply-templates, which may also
    select where it should be applied
  • Attach to XML document with a processing-instruction
  • <?xml version="1.0" ?><?xml-stylesheet
    type="text/xsl" href="http://www.com/my.xsl" ?>

29
An Example XSLT Stylesheet
  • <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  • <xsl:template match="/dblp">
  • <html><head><title>This is DBLP</title></head>
  • <body>
  • <xsl:apply-templates />
  • </body>
  • </html>
  • </xsl:template>
  • <xsl:template match="inproceedings">
  • <h2><xsl:apply-templates select="title" /></h2>
  • <p><xsl:apply-templates select="author"/></p>
  • </xsl:template>
  • </xsl:stylesheet>

30
XSLT Processing Model
  • List of source nodes → result tree fragment(s)
  • Start with root
  • Find all template rules with matching patterns
    from root
  • Find best match according to some heuristics
  • Set the current node list to be the set of things
    it matches
  • Iterate over each node in the current node list
  • Apply the operations of the template
  • Append the results of the matching template
    rule to the result tree structure
  • Repeat recursively if specified to by
    apply-templates
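The processing model can be imitated with a short toy in Python (not a real XSLT processor): each template is a pair of a match test and a body, `apply_templates` walks the current node list in order and concatenates the output, and nodes with no matching template fall back to XSLT's built-in rules. The two templates are hand translations of the example stylesheet's `/dblp` and `inproceedings` rules; the helper names and the input document are hypothetical, and conflict resolution is reduced to "first match wins".

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def is_element(node, name):
    return node.nodeType == Node.ELEMENT_NODE and node.tagName == name


def child_elements(node, name):
    """Stand-in for select="name": child elements with that tag."""
    return [c for c in node.childNodes if is_element(c, name)]


def apply_templates(nodes, templates):
    """Process the current node list in order and concatenate the output."""
    out = []
    for node in nodes:
        for matches, body in templates:
            # Simplification: the first matching rule wins; real XSLT picks
            # the best match by import precedence, then priority.
            if matches(node):
                out.append(body(node, lambda sel: apply_templates(sel, templates)))
                break
        else:
            out.append(builtin_rule(node, templates))
    return "".join(out)


def builtin_rule(node, templates):
    """Built-in rules: recurse into the root and elements, copy text."""
    if node.nodeType in (Node.DOCUMENT_NODE, Node.ELEMENT_NODE):
        return apply_templates(node.childNodes, templates)
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return ""  # comments and processing instructions produce no output


TEMPLATES = [
    # <xsl:template match="/dblp">
    (lambda n: is_element(n, "dblp") and n.parentNode.nodeType == Node.DOCUMENT_NODE,
     lambda n, recurse: ("<html><head><title>This is DBLP</title></head><body>"
                         + recurse(n.childNodes) + "</body></html>")),
    # <xsl:template match="inproceedings">
    (lambda n: is_element(n, "inproceedings"),
     lambda n, recurse: ("<h2>" + recurse(child_elements(n, "title")) + "</h2>"
                         + "<p>" + recurse(child_elements(n, "author")) + "</p>")),
]

doc = parseString("<dblp><inproceedings><author>A. Author</author>"
                  "<title>Some Paper</title></inproceedings></dblp>")
html = apply_templates([doc], TEMPLATES)  # processing starts at the root node
print(html)
```

The `title` and `author` elements have no template of their own, so the built-in rules copy their text into the `<h2>` and `<p>` output.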

31
What If There's More than One Match?
  • Eliminate rules of lower precedence due to
    importing
  • Break a rule whose pattern has alternatives (|)
    into its branches and consider each separately
  • Choose rule with highest computed or specified
    priority
  • Simple rules for computing priority based on
    precision:
  • QName preceded by a child or attribute axis
    specifier: priority 0
  • NCName:* preceded by a child or attribute axis
    specifier: priority -0.25
  • NodeTest preceded by a child or attribute axis
    specifier: priority -0.5
  • else priority 0.5
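The default-priority rules can be sketched as a hypothetical helper that classifies one pattern alternative with regular expressions, following the XSLT 1.0 rules (which also give `processing-instruction('literal')` priority 0). The name grammar is simplified and splitting on `|` ignores predicates that contain `|`, so treat this as an illustration rather than a pattern parser.

```python
import re

# Simplified building blocks (assumptions: ASCII-style names, no whitespace
# inside steps, no predicates containing "|").
NCNAME = r"[A-Za-z_][\w.-]*"
QNAME = rf"(?:{NCNAME}:)?{NCNAME}"
LITERAL = r"""(?:'[^']*'|"[^"]*")"""
AXIS = r"(?:child::|attribute::|@)?"  # optional child or attribute axis specifier
NODE_TYPE_TEST = r"(?:\*|node\(\)|text\(\)|comment\(\)|processing-instruction\(\))"


def default_priority(alternative: str) -> float:
    """Default priority of one pattern alternative."""
    alternative = alternative.strip()
    if re.fullmatch(rf"{AXIS}(?:{QNAME}|processing-instruction\({LITERAL}\))", alternative):
        return 0.0
    if re.fullmatch(rf"{AXIS}{NCNAME}:\*", alternative):
        return -0.25
    if re.fullmatch(rf"{AXIS}{NODE_TYPE_TEST}", alternative):
        return -0.5
    return 0.5


def priorities(pattern: str) -> list:
    # A pattern with | alternatives is treated as one rule per alternative.
    return [default_priority(alt) for alt in pattern.split("|")]


print(priorities("inproceedings | /dblp"))  # [0.0, 0.5]
print(priorities("text()|@*"))              # [-0.5, -0.5]
```

So a template matching `inproceedings` outranks one matching `*`, and a path such as `/dblp` or `article/title` beats both.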

32
Other Common Operations
  • Iteration
  • <xsl:for-each select="path"></xsl:for-each>
  • Conditionals
  • <xsl:if test="./text() &lt; abc"></xsl:if>
  • Copying current node and children to the result
    tree
  • <xsl:copy> <xsl:apply-templates /></xsl:copy>

33
Creating Output Nodes
  • Return text/attribute data (this is a default
    rule)
  • <xsl:template match="text()|@*"> <xsl:value-of
    select="."/></xsl:template>
  • Create an element from text (attribute is
    similar)
  • <xsl:element name="{text()}"> <xsl:apply-templates
    /></xsl:element>
  • Copy nodes matching a path
  • <xsl:copy-of select="/"/>

34
Embedding Stylesheets
  • You can import or include one stylesheet from
    another
  • <xsl:import href="http://www.com/my.xsl"/>
  • <xsl:include href="http://www.com/my.xsl"/>
  • Include: the rules get the same precedence as those
    in the including stylesheet
  • Import: the imported rules are given lower precedence

35
XSLT Summary
  • A very powerful, template-based transformation
    language for XML document → other structured
    document
  • Commonly used to convert XML → PDF, SVG, GraphViz
    DOT format, HTML, WML, etc.
  • Primarily useful for presentation of XML or for
    very simple conversions
  • But sometimes we need more complex operations
    when converting data from one source to another:
  • Joins: combining and correlating information
    from multiple sources
  • Aggregation: computing averages, counts, etc.
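To make the two operations concrete, here is a procedural sketch in Python over two hypothetical source documents (the authors, titles, and years are made up): a join that correlates articles with authors by id, then per-author counts and average years. It only illustrates what join and aggregation mean; it is not how XSLT or XQuery would express them.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two hypothetical source documents.
AUTHORS = ET.fromstring("""<authors>
  <author id="a1"><name>John Smith</name></author>
  <author id="a2"><name>Jane Doe</name></author>
</authors>""")
ARTICLES = ET.fromstring("""<articles>
  <article author="a1"><title>Paper1</title><year>1997</year></article>
  <article author="a1"><title>Paper2</title><year>1999</year></article>
  <article author="a2"><title>Paper3</title><year>2002</year></article>
</articles>""")

# Join: correlate each article with the author whose id matches.
names = {a.get("id"): a.findtext("name") for a in AUTHORS.findall("author")}
joined = [(names[art.get("author")], art.findtext("title"))
          for art in ARTICLES.findall("article")]

# Aggregation: article count and average publication year per author.
counts = Counter(name for name, _ in joined)
years = {}
for art in ARTICLES.findall("article"):
    years.setdefault(names[art.get("author")], []).append(int(art.findtext("year")))
avg_year = {name: sum(ys) / len(ys) for name, ys in years.items()}

print(counts["John Smith"], avg_year["John Smith"])  # 2 1998.0
```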

36
Why XSLT Isn't Enough
  • XSLT is focused on reformatting documents
  • Stylesheets are focused around one XML file
  • XML file must reference the stylesheet
  • What if we want to
  • Manage and combine collections of XML documents?
  • Make Web service requests for XML?
  • Glue together different Web service requests?
  • Query for keywords within documents, with ranked
    answers?
  • This is where XQuery plays a role