Module 3: XML Processing (XPath, XQuery, XUpdate)

Provided by: donaldk
1
Module 3: XML Processing (XPath, XQuery, XUpdate)
2
Managing XML data
  • Huge amounts of XML information, and growing
  • We need to manage it
  • Store it efficiently
  • Verify the correctness
  • Filter, search, select, join, aggregate
  • Create new data
  • Update it
  • Take actions based on the existing data
  • No conceptual organization like for relational
    databases (applications are too heterogeneous)

3
Frequent solutions to XML data management
  • Map it to generic programming APIs
  • Manually map it to non-generic APIs
  • Automatically map it to non-generic structures
  • Use XML extensions of existing languages
  • Shredding for relational stores
  • Native XML processing through XSLT and XQuery

4
1. Mapping to generic structures
  • Represent the data
  • Original UNICODE form or
  • Some binary representation
  • Store it
  • Directly on a file system or
  • On a transacted file system (e.g. SleepyCat, or
    a relational database)
  • Map the XML data to generic XML programmatic APIs
  • E.g. DOM, SAX, StAX (JSR 173), XMLReader
  • Use the native programming languages (e.g. Java,
    C) to manipulate the data
  • Re-serialize it at the end
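
The parse / manipulate / re-serialize cycle described above can be sketched with a generic DOM API. This is only an illustration in Python's standard `xml.dom.minidom`; the `price` attribute and the `raise_prices` helper are assumptions made up for the example, not part of the original slides.

```python
from xml.dom import minidom


def raise_prices(xml_text: str, factor: float) -> str:
    """Parse XML into a generic DOM tree, edit it through generic
    node calls, and re-serialize the result."""
    doc = minidom.parseString(xml_text)
    # Generic navigation: the API knows nothing about purchase orders.
    for item in doc.getElementsByTagName("lineItem"):
        price = float(item.getAttribute("price"))  # hypothetical attribute
        item.setAttribute("price", f"{price * factor:.2f}")
    return doc.documentElement.toxml()


order = (
    '<purchaseOrder>'
    '<lineItem price="10.00"/><lineItem price="2.50"/>'
    '</purchaseOrder>'
)
updated = raise_prices(order, 2.0)
print(updated)
```

The application logic is expressed entirely in terms of element and attribute names passed as strings, which is what "generic" means here.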

5
1. Manual mapping to generic structures (example)
  • <purchaseOrder>
  •   <lineItem>
  •     ..
  •   </lineItem>
  •   <lineItem>
  •     ..
  •   </lineItem>
  • </purchaseOrder>
  • <book>
  •   <author></author>
  •   <title>.</title>
  •   ..
  • </book>

class DomNode
  public String getNodeName()
  public String getNodeValue()
  public void setNodeValue(nodeValue)
  public short getNodeType()
Hard-coded mappings
6
2. Manual mapping to non-generic structures
  • <purchaseOrder>
  •   <lineItem>
  •     ..
  •   </lineItem>
  •   <lineItem>
  •     ..
  •   </lineItem>
  • </purchaseOrder>
  • <book>
  •   <author></author>
  •   <title>.</title>
  •   ..
  • </book>

class PurchaseOrder
  public List getLineItems()
  ..
class Book
  public List getAuthor()
  public String getTitle()
Hard-coded mappings
7
3. Automatic mapping to non-generic structures
  • <type name="book-type">
  •   <sequence>
  •     <attribute name="year" type="xs:integer">
  •     <element name="title" type="xs:string">
  •     <sequence minoccurs="0">
  •       <element name="author" type="xs:string">
  •     </sequence>
  •   </sequence>
  • </type>
  • <element name="book" type="book-type">

class Book-type
  public integer getYear()
  public string getTitle()
  public List getAuthors()
  ..
Automatic mapping, e.g. XMLBeans
8
4. XML extensions of existing procedural languages
  • Examples
  • C-omega, ECMAScript, PHP extensions,
  • Python extensions, etc.
  • Most of them define
  • A way of importing XML data into their native
    type system
  • A rich API for XML data manipulation
  • A way of navigating/searching/querying the XML
    data via their extensions (XPath-based or
    XPath-inspired)

9
5. Native XML processing: XSLT and XQuery
  • Most promising alternative for the future ()
  • The only alternative such that
  • the data is modeled only once
  • is well integrated with XML Schema type system
  • it preserves the logical/physical data
    independence
  • the code deals with non-generic structures
  • Data is stored
  • in plain file systems or in sophisticated data
    stores (e.g. XML extensions of relational stores)
  • Currently an incomplete solution
  • No procedural logic
  • Other disadvantages (hopefully temporary)
  • Perceived complexity
  • Perceived loss of performance

10
Why XQuery ?
  • Why a query language for XML ?
  • Need to process XML data
  • Preserve logical/physical data independence
  • The semantics is described in terms of an
    abstract data model, independent of the physical
    data storage
  • Declarative programming
  • Such programs should describe the what, not the
    how
  • Why a native query language ? Why not SQL ?
  • We need to deal with the specificities of XML
    (hierarchical, ordered, textual, potentially
    schema-less structure)
  • Why another XML processing language ? Why not
    XSLT?
  • The template nature of XSLT was not appealing to
    the database people. Not declarative enough.

11
What is XQuery ?
  • A programming language that can express
    arbitrary XML to XML data transformations
  • Logical/physical data independence
  • Declarative
  • High level
  • Side-effect free
  • Strongly typed language
  • An expression language for XML.
  • Commonalities with functional programming,
    imperative programming and query languages
  • The query part might be a misnomer ()

12
XQuery Use Case Scenarios
  • XML transformation language in Web Services
  • Large and very complex queries
  • Input message plus external data sources
  • Small and medium size data sets (xK -> xM)
  • Transient and streaming data (no indexes)
  • With or without schema validation
  • XML message brokers
  • Simple path expressions, single input message
  • Small data sets
  • Transient and streaming data (no indexes)
  • Mostly non schema validated data
  • Semantic data verification
  • Mostly messages
  • Potentially complex (but small) queries
  • Streaming and multiquery optimization required

13
XQuery Usage Scenarios (ctd.)
  • Data Integration
  • Complex but smaller queries (FLWORs, aggregates,
    constructors)
  • Large, persistent, external data repositories
  • Dynamic data (via Web Services invocations)
  • Large volumes of centralized XML data
  • Logs and archives
  • Complex queries (statistics, analytics)
  • Mostly read only
  • Large content repositories
  • Large volume of data (books, manuals, etc)
  • With or without schema validation
  • Full text essential, update required

14
XQuery Usage Scenarios (ctd.)
  • Large volumes of distributed textual data
  • XML search engines
  • High volume of data sources
  • Full text, semantic search crucial
  • RSS data
  • High number of input data channels
  • Data is pushed, not pulled
  • Structure of the data very simple, each item
    bounded size
  • Aggregators using mostly full-text search

15
XQuery Implementations
  • Open Source
  • Saxon (Michael Kay)
  • Galax (AT&T, Mary Fernandez)
  • Commercial
  • BEA Systems (WebLogic Integration)
  • IBM, Microsoft, Oracle (with DB products)
  • Some freelancers
  • Visit www.w3c.org/xquery

16
Roadmap for XQuery
  • XML Data Model
  • XML Type System
  • XQuery Environment
  • XQuery Expressions
  • XUpdate
  • Examples

17
XML Data Model
  • Abstract (i.e. logical) data model for XML data
  • Same role for XQuery as the relational data model
    for SQL
  • Purely logical --- no standard storage or access
    model (on purpose)
  • XQuery is closed with respect to the Data Model

[Figure: XQuery, XPath 2.0, and XSLT 2.0 shown together with the XML Data Model, the Infoset, and the PSVI]
18
XML Data model life cycle
[Figure: life cycle of an XQuery Data Model instance. Labels: .xml, parse, validate (.xsd), XQuery Data Model, XQuery / XPath 2.0 / XSLT 2.0, XQuery Data Model, serialize, .xml, application-dependent]
19
XML Data Model
Remember Lisp ?
  • Instance of the data model
  • a sequence composed of zero or more items
  • The empty sequence often considered as the null
    value
  • Items
  • nodes or atomic values
  • Nodes
  • document, element, attribute, text,
    namespaces, PI, comment
  • Atomic values
  • Instances of all XML Schema atomic types
  • string, boolean, ID, IDREF, decimal, QName, URI,
    ...
  • untyped atomic values
  • Typed (i.e. schema validated) and untyped (i.e.
    non schema validated) nodes and values


20
Sequences
  • Can be heterogeneous (nodes and atomic values)
  • (<a/>, 3)
  • Can contain duplicates (by value and by identity)
  • (1,1,1)
  • Are not necessarily ordered in document order
  • Nested sequences are automatically flattened
  • ( 1, 2, (3, 4) ) = (1, 2, 3, 4)
  • Single items and singleton sequences are the same
  • 1 = (1)
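
The flattening rule can be mimicked in a few lines of Python. This is a rough sketch that models a sequence as a flat list; the `seq` helper is a hypothetical name introduced only for this illustration.

```python
def seq(*items):
    """Build a flat sequence: nested sequences are spliced in,
    so a sequence never contains another sequence."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(seq(*item))  # splice nested sequence
        else:
            flat.append(item)        # single item
    return flat


nested = seq(1, 2, [3, 4])
singleton = seq(1)
duplicates = seq(1, 1, 1)
```

Under this model a single item and a singleton sequence are indistinguishable, and duplicates are kept.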

21
Atomic values
  • The values of the 19 atomic types available in
    XML Schema
  • E.g. xs:integer, xs:boolean, xs:date
  • All the user defined derived atomic types
  • E.g. myNS:ShoeSize
  • xdt:untypedAtomic
  • Atomic values carry their type together with the
    value
  • (8, myNS:ShoeSize) is not the same as (8,
    xs:integer)

22
XML nodes
  • 7 types of nodes
  • document, element, attribute, text,
    namespaces, PI, comment
  • Every node has a unique node identifier
  • Scope of node identifier uniqueness is
    implementation dependent
  • Nodes have children and an optional parent
  • conceptual tree
  • Nodes are ordered based on the topological order
    in the tree (document order)

23
Node accessors
  • base-uri: xs:anyURI?
  • document-uri: xs:anyURI?
  • node-kind: xs:string
  • node-name: xs:QName?
  • parent: node()?
  • string-value: xs:string
  • typed-value: xdt:anyAtomicType*
  • type-name: xs:QName?
  • children: node()*
  • attributes: attribute()*
  • namespaces: node()*
  • nilled: xs:boolean?

24
Example of well formed XML
  • <book year="1967">
  •   <title>The politics of experience</title>
  •   <author>R.D. Laing</author>
  • </book>
  • 3 element nodes, 1 attribute node, 2 text nodes
  • name(book element) -book
  • In the absence of schema validation
  • type(book element) = xdt:untyped
  • type(author element) = xdt:untyped
  • type(year attribute) = xdt:untypedAtomic
  • typed-value(author element) = ("R.D. Laing",
    xdt:untypedAtomic)
  • typed-value(year attribute) = ("1967",
    xdt:untypedAtomic)

25
XML schema example
  • <type name="book-type">
  •   <sequence>
  •     <attribute name="year" type="xs:integer">
  •     <element name="title" type="xs:string">
  •     <sequence minoccurs="0">
  •       <element name="author" type="xs:string">
  •     </sequence>
  •   </sequence>
  • </type>
  • <element name="book" type="book-type">

26
Schema validated XML data
  • <book year="1967">
  •   <title>The politics of experience</title>
  •   <author>R.D. Laing</author>
  • </book>
  • After schema validation
  • type(book element) = uri:book-type
  • type(author element) = xs:string
  • type(year attribute) = xs:integer
  • typed-value(author element) = ("R.D. Laing",
    xs:string)
  • typed-value(year attribute) = (1967, xs:integer)
  • Schema validation impacts the data model
    representation and therefore the XQuery
    semantics!

27
Lexical and binary aspect
  • Every node holds (logically) redundant
    information
  • <a xsi:type="xs:integer">001</a>
  • dm:string-value() = "001" as xs:string
  • dm:typed-value() =
  • "001" as an xdt:untypedAtomic before validation
  • 1 as an xs:integer after validation
  • Implementations can store
  • The string value
  • Retrieve the typed value dynamically based on the
    type, every time it is needed
  • The typed value
  • Retrieve an acceptable lexical value for that
    type every time this is required
  • Both
  • In case of unvalidated data the two are the same
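
The first storage strategy (keep the string value, derive the typed value on demand) might look like the sketch below. The `Node` class and its two fields are assumptions invented for this illustration; only `xs:integer` is handled, and everything else is treated as its lexical form.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Stores only the lexical (string) value plus a type annotation."""
    string_value: str
    type_name: str = "xdt:untypedAtomic"  # state before validation

    def typed_value(self):
        # The typed value is recomputed from the string value each time.
        if self.type_name == "xs:integer":
            return int(self.string_value)
        return self.string_value


before = Node("001")                # unvalidated
after = Node("001", "xs:integer")   # after validation
```

Before validation the string value and the typed value coincide; after validation the same stored text `001` yields the integer `1`.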

28
XML queries
  • An XQuery basic structure
  • a prolog + an expression
  • Role of the prolog
  • Populate the context where the expression is
    compiled and evaluated
  • Prolog contains
  • namespace definitions
  • schema imports
  • default element and function namespace
  • function definitions
  • collations declarations
  • function library imports
  • global and external variables definitions
  • etc

29
XQuery expressions
  • XQuery Expr := Constants | Variable |
    FunctionCalls | PathExpr |
  • ComparisonExpr | ArithmeticExpr | LogicExpr |
  • FLWRExpr | ConditionalExpr |
    QuantifiedExpr |
  • TypeSwitchExpr | InstanceofExpr | CastExpr |
  • UnionExpr | IntersectExceptExpr |
  • ConstructorExpr | ValidateExpr
  • Expressions can be nested with full generality !
  • Functional programming heritage.

30
Constants
  • XQuery grammar has built-in support for
  • Strings: "125.0" or '125.0'
  • Integers: 150
  • Decimal: 125.0
  • Double: 125.e2
  • 19 other atomic types available via XML Schema
  • Values can be constructed
  • with constructors in the F&O doc: fn:true(),
    fn:date("2002-5-20")
  • by casting
  • by schema validation

31
Variables
  • $ + QName
  • bound, not assigned
  • XQuery does not allow variable assignment
  • created by let, for, some/every, typeswitch
    expressions, function parameters
  • example
  • let $x := ( 1, 2, 3 )
  • return count($x)
  • above scoping ends at conclusion of return
    expression

32
A built-in function sampler
  • fn:document(xs:anyURI) => document?
  • fn:empty(item*) => boolean
  • fn:index-of(item*, item) => xs:unsignedInt?
  • fn:distinct-values(item*) => item*
  • fn:distinct-nodes(node*) => node*
  • fn:union(node*, node*) => node*
  • fn:except(node*, node*) => node*
  • fn:string-length(xs:string?) => xs:integer?
  • fn:contains(xs:string, xs:string) => xs:boolean
  • fn:true() => xs:boolean
  • fn:date(xs:string) => xs:date
  • fn:add-date(xs:date, xs:duration) => xs:date
  • See the Functions and Operators W3C
    specification

33
Constructing sequences
  • (1, 2, 2, 3, 3, <a/>, <b/>)
  • "," is the sequence concatenation operator
  • Nested sequences are flattened
  • (1, 2, 2, (3, 3)) => (1, 2, 2, 3, 3)
  • range expressions: (1 to 3) => (1, 2, 3)

34
Combining sequences
  • Union, Intersect, Except
  • Work only for sequences of nodes, not atomic
    values
  • Eliminate duplicates and reorder to document
    order
  • $x := <a/>, $y := <b/>, $z := <c/>
  • ($x, $y) union ($y, $z) => (<a/>, <b/>, <c/>)
  • The F&O specification provides other functions and
    operators, e.g. fn:distinct-values() and
    fn:distinct-nodes() are particularly useful

35
Arithmetic expressions
  • 1 + 4            $a div 5
  • 5 div 6          $b mod 10
  • 1 - (4 * 8.5)    -55.5
  • <a>42</a> + 1    <a>baz</a> + 1
  • validate {<a xsi:type="xs:integer">42</a>} + 1
  • validate {<a xsi:type="xs:string">42</a>} + 1
  • Apply the following rules
  • atomize all operands. If either operand is (), =>
    ()
  • if an operand is untyped, cast to xs:double (if
    unable, => error)
  • if the operand types differ but can be promoted
    to a common type, do so (e.g. xs:integer can be
    promoted to xs:double)
  • if the operator is consistent with the types, apply
    it; the result is either an atomic value or an error
  • if the type is not consistent, throw a type exception
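
A small Python sketch of these rules for addition follows. It is a simplified model, not an XQuery implementation: `None` stands for the empty sequence, the hypothetical `Untyped` class stands for an untyped atomic value, and a plain Python `str` stands for a value validated as a string.

```python
class Untyped(str):
    """Stand-in for an untyped atomic value (hypothetical helper)."""


def add(left, right):
    # Empty operand (modelled as None) gives the empty result.
    if left is None or right is None:
        return None
    operands = []
    for value in (left, right):
        if isinstance(value, Untyped):
            # Untyped operands are cast to double; float() raises
            # ValueError when the cast is impossible.
            value = float(value)
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            # Operator is not defined for this type: type error.
            raise TypeError(f"cannot add a value of type {type(value).__name__}")
        operands.append(value)
    # Python promotes int to float here, mirroring integer -> double.
    return operands[0] + operands[1]


untyped_sum = add(Untyped("42"), 1)  # untyped element content 42, plus 1
typed_sum = add(42, 1)               # content validated as an integer
empty_sum = add(None, 1)             # () + 1
```

The untyped case yields a double, the validated integer case stays an integer, and a value validated as a string is rejected with a type error.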

36
Atomization
  • If every item in the input sequence is either an
    atomic value or a node whose typed value is a
    sequence of atomic values, then return them
  • Otherwise, raise a type error.
  • fn:data(node) extracts the typed value of a node.
  • Often implicit
  • In arithmetic, comparisons, function calls, node
    constructors, sorting, etc

37
Logical expressions
  • expr1 and expr2
  • expr1 or expr2
  • fn:not() as a function
  • return true, false
  • Different from SQL
  • two-valued logic, not three-valued logic
  • Different from imperative languages
  • and, or are commutative
  • Rules
  • first compute the Effective Boolean Value (EBV)
    for each operand
  • if (), "", NaN, 0, then return false
  • if the operand is of type xs:boolean, return it
  • if the operand is a sequence with first item a node,
    return true
  • else raise an error
  • then use standard two-valued Boolean logic on the
    two EBVs as appropriate
  • false and error => false or error!
    (non-deterministically)
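
The Boolean value computation can be sketched as below. Sequences are modelled as Python lists and `Node` is a hypothetical placeholder class. One assumption goes beyond the slide: non-empty strings and non-zero numbers are mapped to true, as the XQuery 1.0 specification defines, whereas the slide only lists the false cases.

```python
import math


class Node:
    """Minimal stand-in for an XML node (hypothetical)."""


def effective_boolean_value(seq):
    if len(seq) == 0:
        return False                      # ()
    first = seq[0]
    if isinstance(first, Node):
        return True                       # first item is a node
    if len(seq) == 1:
        if isinstance(first, bool):
            return first                  # xs:boolean: return it
        if isinstance(first, str):
            return first != ""            # "" is false
        if isinstance(first, (int, float)):
            return not (first == 0 or math.isnan(first))  # 0 and NaN are false
    raise TypeError("no effective boolean value for this sequence")


def logical_and(left, right):
    """Two-valued 'and' over the Boolean values of both operands."""
    return effective_boolean_value(left) and effective_boolean_value(right)
```

Note that this sketch always evaluates the left operand first; a real processor is free to pick either order, which is why `false and error` may produce either false or an error.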

38
Comparisons
39
Value and general comparisons
  • <a>42</a> eq "42" => true
  • <a>42</a> eq 42 => error
  • <a>42</a> eq "42.0" => false
  • <a>42</a> eq 42.0 => error
  • <a>42</a> = 42 => true
  • <a>42</a> = 42.0 => true
  • <a>42</a> eq <b>42</b> => true
  • <a>42</a> eq <b> 42</b> => false
  • <a>baz</a> eq 42 => error
  • () eq 42 => ()
  • () = 42 => false
  • (<a>42</a>, <b>43</b>) = 42.0 => true
  • (<a>42</a>, <b>43</b>) = "42" => true
  • ns:shoesize(5) eq ns:hatsize(5) => true
  • (1,2) = (2,3) => true
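
The existential flavour of the general comparison can be illustrated with a short sketch. It models only already-atomized, same-typed items; the casting rules for untyped data are deliberately left out, and `general_eq` is a name chosen for this example.

```python
def general_eq(left, right):
    """General '=': true if ANY item on the left equals ANY item on the right."""
    return any(a == b for a in left for b in right)


overlap = general_eq([1, 2], [2, 3])      # share the item 2
empty_left = general_eq([], [42])         # nothing to compare
no_overlap = general_eq([1, 2], [3, 4])
```

Because only one matching pair is needed, the relation is not transitive: `(1,2)` matches `(2,3)` and `(2,3)` matches `(3,4)`, yet `(1,2)` does not match `(3,4)`.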

40
Conditional expressions
  • if ( $book/@year < 1980 )
  • then ns:WS(<old>{$x/title}</old>)
  • else ns:WS(<new>{$x/title}</new>)
  • Only one branch allowed to raise execution errors
  • Impacts scheduling and parallelization

41
Path Expressions by Example
  • Names of all family members: /family/member/name
    (Navigation, Projection)
  • Names of four-year-olds: /family/member[@age = 4]/name
    (Selection)
  • Name of the second eldest: /family/member[2]/name
    (Selection, Ranking)
  • Names of members who have a hobby:
    /family/member[hobby]/name (Selection by Type)
  • All names (of anything): //name
    (Transitive Closure, Recursion)
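
These five queries can be tried out with the limited XPath subset in Python's standard `xml.etree.ElementTree`. The sample family document and the `names` helper are invented for this sketch; paths are written relative to the root `family` element because that is the context ElementTree starts from, and members are assumed to be listed eldest first.

```python
import xml.etree.ElementTree as ET

FAMILY = """
<family>
  <member age="40"><name>Ann</name><hobby>chess</hobby></member>
  <member age="38"><name>Bob</name></member>
  <member age="4"><name>Cy</name><hobby>lego</hobby></member>
</family>
""".strip()

family = ET.fromstring(FAMILY)


def names(path):
    """Evaluate a path against the family element and return the text of each hit."""
    return [node.text for node in family.findall(path)]


all_members = names("member/name")               # navigation, projection
four_year_olds = names("member[@age='4']/name")  # selection on an attribute
second_member = names("member[2]/name")          # selection by position
with_hobby = names("member[hobby]/name")         # selection on child existence
all_names = names(".//name")                     # descendants at any depth
```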

42
Path expressions
  • Second order expression
  • expr1 / expr2
  • Semantics
  • Evaluate expr1 => sequence of nodes
  • Bind . to each node in this sequence
  • Evaluate expr2 with this binding => sequence of
    nodes
  • Concatenate the partial sequences
  • Eliminate duplicates
  • Sort by document order
  • Implicit iteration
  • A standalone step is an expression
  • step = (axis, nodeTest) where
  • nodeTest = (node kind, node name, node type)
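
The semantics listed above (bind the context item, evaluate, concatenate, eliminate duplicates, sort by document order) can be sketched in Python. This is a simplified model over `xml.etree.ElementTree`; the `path` function and the lambda-based steps are assumptions of this illustration, and the parent map is needed because ElementTree elements have no parent pointer.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<r><a><b/><b/></a><a><b/></a></r>")

# Document order: position of each node in a pre-order traversal.
order = {node: i for i, node in enumerate(doc.iter())}
# Parent lookup table built once from the tree.
parent = {child: p for p in doc.iter() for child in p}


def path(context_nodes, step):
    """expr1 / expr2, where context_nodes is the result of expr1
    and step(node) evaluates expr2 with '.' bound to node."""
    results = []
    for dot in context_nodes:        # bind "." to each node
        results.extend(step(dot))    # evaluate and concatenate
    unique = {id(n): n for n in results}.values()  # eliminate duplicates by identity
    return sorted(unique, key=order.__getitem__)   # sort by document order


bs = path([doc], lambda n: n.findall("a/b"))   # three b elements
parents = path(bs, lambda n: [parent[n]])      # their parents
```

Three `b` nodes have only two distinct parents, so the second step returns two `a` elements rather than three.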

43
More on XPath expressions
  • A stand-alone step is an expression
  • Any kind of expression can be a step!
  • Two syntaxes for steps: abbreviated or not
  • Step in the non-abbreviated syntax
  • axis :: nodeTest
  • Axes control the navigation direction in the tree
  • attribute, child, descendant, descendant-or-self,
    parent, self
  • The other XPath 1.0 axes are optional
  • Node test by
  • Name (e.g. publisher, myNS:publisher,
    *:publisher, myNS:*, *)
  • Kind of item (e.g. node(), comment(), text() )
  • Type test (e.g. element(ns:PO, ns:PoType),
    attribute(*, xs:integer))

44
Long syntax of XPath
  • document("bibliography.xml")/child::bib
  • $x/child::bib/child::book/attribute::year
  • $x/parent::*
  • $x/child::*/descendant::comment()
  • $x/child::element(*, ns:PoType)
  • $x/attribute::attribute(*, xs:integer)
  • $x/ancestor::document(schema-element(ns:PO))
  • $x/(child::element(*, xs:date) |
    attribute::attribute(*, xs:date))
  • $x/f(.)

45
XPath abbreviated syntax
  • Axis can be missing
  • By default the child axis
  • $x/child::person -> $x/person
  • Short-hands for common axes
  • Descendant-or-self
  • $x/descendant-or-self::*/child::comment() ->
    $x//comment()
  • Parent
  • $x/parent::* -> $x/..
  • Attribute
  • $x/attribute::year -> $x/@year
  • Self
  • $x/self::* -> $x/.

46
XPath filter predicates
  • Syntax
  • expression1 [ expression2 ]
  • [ ] is an overloaded operator
  • Filtering by position (if numeric value)
  • /book[3]
  • /book[3]/author[1]
  • /book[3]/author[1 to 2]
  • Filtering by predicate
  • //book [author/firstname = "ronald"]
  • //book [@price < 25]
  • //book [count(author [@gender = "female"]) > 0]
  • Classical XPath mistake
  • $x/a/b[1] means $x/a/(b[1]) and not ($x/a/b)[1]
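
The precedence pitfall noted above is easy to reproduce with Python's `xml.etree.ElementTree`, whose positional predicates bind to the step in the same way. The sample document is made up for this sketch.

```python
import xml.etree.ElementTree as ET

x = ET.fromstring("<x><a><b>1</b><b>2</b></a><a><b>3</b><b>4</b></a></x>")

# a/b[1]: the first b child of EACH a element.
first_b_of_each_a = [b.text for b in x.findall("a/b[1]")]

# (a/b)[1]: the first item of the whole result sequence.
first_b_overall = [b.text for b in x.findall("a/b")[:1]]
```

The first form returns one `b` per `a` parent, the second returns a single node.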

47
Simple iteration expression
  • Syntax
  • for variable in expression1
  • return expression2
  • Example
  • for $x in document("bib.xml")/bib/book
  • return $x/title
  • Semantics
  • bind the variable to each root node of the
    forest returned by expression1
  • for each such binding evaluate expression2
  • concatenate the resulting sequences
  • nested sequences are automatically flattened

48
Static context
  • XPath 1.0 compatibility mode
  • Statically known namespaces
  • Default element/type namespace
  • Default function namespace
  • In-scope schema definitions
  • In-scope variables
  • In scope function signatures
  • Statically known collations
  • Default collation
  • Construction mode
  • Ordering mode
  • Boundary space policy
  • Copy namespace mode
  • Base URI
  • Statically known documents and collections
  • change XQuery expression semantics
  • impact compilation
  • can be set by application or by
  • prolog declarations

49
Dynamic context
  • Values for external variables
  • Values for the current item, current position and
    size
  • Implementation for external functions
  • Current date and time
  • Implicit timezone
  • Available documents and collections

50
XQuery processing
51
Local variable declaration
  • Syntax
  • let variable := expression1
  • return expression2
  • Example
  • let $x := document("bib.xml")/bib/book
  • return count($x)
  • Semantics
  • bind the variable to the result of the
    expression1
  • add this binding to the current environment
  • evaluate and return expression2

52
FLW(O)R expressions
  • Syntactic sugar that combines FOR, LET, IF
  • Example
  • for $x in //bib/book
    /* similar to FROM in SQL */
  • let $y := $x/author
    /* no analogy in SQL */
  • where $x/title = "The politics of experience"
    /* similar to WHERE in SQL */
  • return count($y)
    /* similar to SELECT in SQL */

This slide is not up-to-date, it omits ORDER BY.
53
FLWR expression semantics
  • FLWR expression
  • for $x in //bib/book
  • let $y := $x/author
  • where $x/title = "Ulysses"
  • return count($y)
  • Equivalent to
  • for $x in //bib/book
  • return (let $y := $x/author
  •         return
  •           if ($x/title = "Ulysses")
  •           then count($y)
  •           else ()
  •        )
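
The equivalence between the FLWR form and its desugared form can be mirrored in Python, with a comprehension playing the role of for/let/where/return and an explicit loop playing the role of the nested for, let and if. The two-book sample data and both function names are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><title>Ulysses</title><author>Joyce</author></book>"
    "<book><title>Dubliners</title><author>Joyce</author></book>"
    "</bib>"
)


def flwr(bib):
    """for / let / where / return as a single comprehension."""
    return [
        len(y)                                  # return count($y)
        for x in bib.findall("book")            # for $x in ...
        for y in [x.findall("author")]          # let $y := $x/author
        if x.findtext("title") == "Ulysses"     # where ...
    ]


def desugared(bib):
    """Same query written as for + let + if/else ()."""
    out = []
    for x in bib.findall("book"):
        y = x.findall("author")
        # The else branch contributes the empty sequence.
        out.extend([len(y)] if x.findtext("title") == "Ulysses" else [])
    return out
```

Both functions return the same flat list, because the empty sequence produced by the `else` branch disappears when the per-iteration results are concatenated.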

54
More FLWR expression examples
  • Selections
  • for $b in document("bib.xml")//book
  • where $b/publisher = "Springer Verlag" and
  •       $b/@year = "1998"
  • return $b/title
  • Joins
  • for $b in document("bib.xml")//book,
  •     $p in //publisher
  • where $b/publisher = $p/name
  • return ( $b/title , $p/address )
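
The join can be read as a nested loop over both variables with the `where` clause as filter. The sketch below shows that reading in Python; the sample document (books and publisher records in one file) and the function name are invented for illustration, and a real processor would be free to choose a smarter join strategy.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<bib>"
    "<book><title>T1</title><publisher>P1</publisher></book>"
    "<book><title>T2</title><publisher>P2</publisher></book>"
    "<publisher><name>P1</name><address>Addr1</address></publisher>"
    "</bib>"
)


def join_books_with_publishers(doc):
    pairs = []
    for b in doc.iter("book"):              # first for variable
        for p in doc.iter("publisher"):     # second for variable: every publisher element
            name = p.findtext("name")
            # where clause: publisher elements without a name never match.
            if name is not None and b.findtext("publisher") == name:
                pairs.append((b.findtext("title"), p.findtext("address")))
    return pairs


joined = join_books_with_publishers(doc)
```

Only the book whose publisher has a matching record appears in the result; the second book is dropped, as in an inner join.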

55
The O in FLW(O)R expressions
  • Syntactic sugar that combines FOR, LET, IF
  • Syntax
  • for $x in //bib/book
    /* similar to FROM in SQL */
  • let $y := $x/author
    /* no analogy in SQL */
  • stable order by ( expr empty-handling?
    asc-vs-desc? collation? )
    /* similar to ORDER BY in SQL */
  • return count($y)
    /* similar to SELECT in SQL */

56
Node constructors
  • Constructing new nodes
  • elements
  • attributes
  • documents
  • processing instructions
  • comments
  • text
  • Side-effect operation
  • Affects optimization and expression rewriting
  • Element constructors create local scopes for
    namespaces
  • Affects optimization and expression rewriting

57
Literal vs. evaluated element content
  • <result>
  •   literal text content
  • </result>
  • <result>
  •   { $x/name } {-- evaluated content --}
  • </result>
  • <result>
  •   some content here { $x/name } and some more
      here
  • </result>
  • Braces "{}" used to delineate evaluated content
  • Same works for attributes

58
Nested scopes
  • declare namespace ns="uri1"
  • for $x in fn:doc(uri)/ns:a
  • where $x/ns:b eq 3
  • return
  •   <result xmlns:ns="uri2">
  •     { for $x in fn:doc(uri)/ns:a
  •       return $x/ns:b }
  •   </result>

Local scopes impact optimization and rewriting !
59
Operators on datatypes
  • expression instance of sequenceType
  • returns true if its first operand is an instance
    of the type named in its second operand
  • expression castable as singleType
  • returns true if the first operand can be cast to
    the given single type
  • expression cast as singleType
  • used to convert a value from one datatype to
    another
  • expression treat as sequenceType
  • treats an expr as if its datatype is a subtype of
    its static type (down cast)
  • typeswitch
  • case-like branching based on the type of an input
    expression

60
Schema validation
  • Explicit syntax
  • validate [validation mode] { expression }
  • Validation mode strict or lax
  • Semantics
  • Translate XML Data Model to Infoset
  • Apply XML Schema validation
  • Ignore identity constraints checks
  • Map resulting PSVI to a new XML Data Model
    instance
  • It is not a side-effect operation

61
Ignoring order
  • In the original application XML was totally
    ordered
  • XPath 1.0 preserves the document order through
    implicit expensive sorting operations
  • In many cases the order is not semantically
    meaningful
  • The evaluation can be optimized if the order is
    not required
  • ordered { expr } and unordered { expr }
  • Affect path expressions, FLWR without order
    clause, union, intersect, except
  • Leads to non-determinism
  • Semantics of expressions is again context
    sensitive
  • let $x := (//a)[1]          unordered { (//a)[1]/b }
  • return unordered { $x/b }

62
Functions in XQuery
  • In-place XQuery functions
  • declare function ns:foo($x as xs:integer) as
    element()
  • { <a> {$x + 1} </a> }
  • Can be recursive and mutually recursive
  • External functions

XQuery functions as database views
63
How to pass input data to a query ?
  • External variables (bound through an external
    API)
  • declare variable $x as xs:integer external
  • Current item (bound through an external API)
  • .
  • External functions (bound through an external
    API)
  • declare function ora:sql($x as xs:string) as
    node()* external
  • Specific built-in functions
  • fn:doc(uri), fn:collection(uri)

64
XQuery optional features
  • Schema import feature
  • Static typing feature
  • Full axis feature
  • Module feature

65
Typed vs. untyped XML Data
  • Untyped data (non XML Schema validated)
  • <a>3</a> eq 3
  • <a>3</a> eq "3"
  • Typed data (after XML Schema validation)
  • <a xsi:type="xs:integer">3</a> eq 3
  • <a xsi:type="xs:string">3</a> eq 3
  • <a xsi:type="xs:integer">3</a> eq "3"
  • <a xsi:type="xs:string">3</a> eq "3"

66
XML data equivalence
  • XQuery has multiple notions of data equality
  • =, eq, is, fn:deep-equal()
  • Expected properties
  • Transitivity, reflexivity and symmetry
  • Necessary for grouping, indexing and hashing
  • Additional property
  • if ( data1 equal data2 ) then ( f(data1) equal
    f(data2) )
  • Necessary for memoization, caching
  • None of the equality relationships above (except
    is) satisfies those properties
  • The is relationship only applies to nodes
  • Careful implementations for indexes, hashing,
    caches

67
XQuery type system
  • XQuery has a powerful (and complex!) type system
  • XQuery types are imported from XML Schemas
  • Every XML data model instance has a dynamic type
  • Every XQuery expression has a static type
  • Pessimistic static type inference
  • The goal of the type system is
  • statically detect errors in the queries
  • infer the type of the result of valid queries
  • ensure statically that the result of a given
    query is of a given (expected) type if the input
    dataset is guaranteed to be of a given type

68
XQuery type system components
  • Atomic types
  • xdt:untypedAtomic
  • All 19 primitive XML Schema types
  • All user defined atomic types
  • Empty, None
  • Type constructors (simplification!)
  • Elements: element name {type}
  • Attributes: attribute name {type}
  • Alternation: type1 | type2
  • Sequence: type1, type2
  • Repetition: type*
  • Interleaved product: type1 & type2
  • type1 intersect type2 ?
  • type1 subtype of type2 ?
  • type1 equals type2 ?

69
SQL vs. XQuery
  • XQuery has implicit Operations
  • casts, exists, duplicate elimination, sorting,
    ...
  • Important for heterogeneous data
  • Important for queries if the schema is unknown
  • XQuery has Constructors
  • Important for Transformations of Messages (Info
    Hubs)
  • XQuery can be used for Documents
  • Important for natural-language processing, CMS
  • XQuery is Turing-complete
  • Can be extended to be a full-fledged PL
  • XQuery has formal semantics
  • Easier to implement, optimize, and teach (???)

70
  • Give Company of Authors (implicit exists)
  • for $a in <address>
  •             <name>Chamberlin</name>
  •             <company>IBM</company>
  •           </address> ...
  • for $b in <article>
  •             <author>Chamberlin</author>
  •             <author>Florescu</author> ...
  •           </article>
  • where $a//name = $b//author
  • return $a//company

71
SQL vs. XQuery
  • SELECT auttor
  • FROM article
  • ERROR!
  • article/auttor
  • () or ERROR!

72
Constructors / Transformation
  • This is legal XQuery
  • <book>
  •   <author>Chamberlin</author>
  •   <title>DB2 Universal Database</title>
  • </book>
  • This is also legal XQuery
  • <book>
  •   <author>{ $address[company = "IBM"]/name }
  •   </author>
  •   <title>DB2 Universal Database</title>
  • </book>
  • SQL needs DDL Operations (Administrator) for this!

73
Transformation
  • Group Books by Author
  • for $a in distinct-values($bib//author)
  • let $t := $bib//*[author = $a]//title
  • return
  •   <bib>
  •     <author name="{$a}"> { $t } </author>
  •   </bib>

74
Transformation
  • Group Books by Author
  • <bib>
  •   <book>
  •     <author>Chamberlin</author>
  •     <title>DB2 Universal database</title>
  •   </book>
  •   ...
  • </bib>

75
Transformation
  • Group Books by Author
  • <bib>
  •   <author name="Chamberlin">
  •     <title>DB2 Universal database</title>
  •     <title>Quilt: An XML Query...</title>
  •     ...
  •   </author>
  •   ...
  • </bib>
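
The regrouping from books to authors can also be sketched procedurally in Python, which makes the two steps (collect distinct author values, then gather the titles for each) explicit. The sample input, the shortened second title, and the `group_by_author` name are assumptions of this illustration; the sketch emits a single enclosing `bib` element, matching the output shape shown above.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><author>Chamberlin</author><title>DB2 Universal Database</title></book>"
    "<book><author>Chamberlin</author><author>Florescu</author><title>Quilt</title></book>"
    "</bib>"
)


def group_by_author(bib):
    # Step 1: distinct author values, in order of first appearance.
    seen = []
    for a in bib.iter("author"):
        if a.text not in seen:
            seen.append(a.text)
    # Step 2: one author element per value, holding that author's titles.
    result = ET.Element("bib")
    for name in seen:
        author = ET.SubElement(result, "author", name=name)
        for book in bib.findall("book"):
            if any(x.text == name for x in book.findall("author")):
                title = ET.SubElement(author, "title")
                title.text = book.findtext("title")
    return result


grouped = group_by_author(bib)
```

A book with several authors shows up under each of them, which is why the grouping has to start from distinct values rather than from the books.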

76
Library modules (example)
Library module
  • module namespace mod="moduleURI"
  • declare namespace ns="URI1"
  • define variable $mod:zero as xs:integer {0}
  • define function mod:add($x as xs:integer, $y as
    xs:integer)
  •   as xs:integer
  •   { $x + $y }

Importing module
  • import module namespace ns="moduleURI"
  • ns:add(2, $ns:zero)
77
Some missing functionalities
  • Standard semantics for Web services invocation
  • Try-catch mechanism
  • Window-based aggregates
  • Group by
  • Distinct by
  • Eval () function
  • Full text search ()
  • Updates()
  • Integrity constraints / assertions
  • Metadata introspection

78
XQuery Full Text Search Extension
  • Complete specification
  • Current W3C Working Draft
  • Examples
  • /book[@year = "2004" and ./title ftcontains
    "Expert"]
  • for $book in /book[.//author ftcontains "Laing"]
  • let $score := ft:score($book/title ftcontains
    "Web Site Usability")
  • where $score > 0.8
  • order by $score descending return $book/@number

79
A fraction of a real customer XQuery
80
let $wlc := document("tests/ebsample/data/ebSample.xml")
let $ctrlPackage := "foo.pkg"
let $wfPath := "test"
let $tp-list :=
  for $tp in $wlc/wlc/trading-partner
  return
    <trading-partner
      name="{$tp/@name}"
      business-id="{$tp/party-identifier/@business-id}"
      description="{$tp/@description}"
      notes="{$tp/@notes}"
      type="{$tp/@type}"
      email="{$tp/@email}"
      phone="{$tp/@phone}"
      fax="{$tp/@fax}"
      username="{$tp/@user-name}"
81
      for $tp-ad in $tp/address
      return $tp-ad
      for $eps in $wlc/extended-property-set
      where $tp/@extended-property-set-name eq $eps/@name
      return $eps
      for $client-cert in $tp/client-certificate
      return
        <client-certificate name="{$client-cert/@name}">
        </client-certificate>
82
      for $server-cert in $tp/server-certificate
      return
        <server-certificate name="{$server-cert/@name}">
        </server-certificate>
      for $sig-cert in $tp/signature-certificate
      return
        <signature-certificate name="{$sig-cert/@name}">
        </signature-certificate>
      for $enc-cert in $tp/encryption-certificate
      return
        <encryption-certificate name="{$enc-cert/@name}">
        </encryption-certificate>
83
      for $eb-dc in $tp/delivery-channel
      for $eb-de in $tp/document-exchange
      for $eb-tp in $tp/transport
      where $eb-dc/@document-exchange-name eq $eb-de/@name
        and $eb-dc/@transport-name eq $eb-tp/@name
        and $eb-de/@business-protocol-name eq "ebXML"
      return
        <ebxml-binding
          name="{$eb-dc/@name}"
          business-protocol-name="{$eb-de/@business-protocol-name}"
          business-protocol-version="{$eb-de/@protocol-version}"
          is-signature-required="{$eb-dc/@nonrepudiation-of-origin}"
          is-receipt-signature-required="{$eb-dc/@nonrepudiation-of-receipt}"
          signature-certificate-name="{$eb-de/EBXML-binding/@signature-certificate-n}"
          delivery-semantics="{$eb-de/EBXML-binding/@delivery-semantics}"
          if (xf:empty($eb-de/EBXML-binding/@ttl))
          then ()
          else attribute persist-duration
            { concat(($eb-de/EBXML-binding/@ttl div 1000), " seconds") }
84
          if (xf:empty($eb-de/EBXML-binding/@retries))
          then ()
          else $eb-de/EBXML-binding/@retries
          if (xf:empty($eb-de/EBXML-binding/@retry-interval))
          then ()
          else attribute retry-interval
            { concat(($eb-de/EBXML-binding/@retry-interval div 1000), " seconds") }
          <transport
            protocol="{$eb-tp/@protocol}"
            protocol-version="{$eb-tp/@protocol-version}"
            endpoint="{$eb-tp/endpoint[1]/@uri}"
          >
85
            for $ca in $wlc/wlc/collaboration-agreement
            for $p1 in $ca/party[1]
            for $p2 in $ca/party[2]
            for $tp1 in $wlc/wlc/trading-partner
            for $tp2 in $wlc/wlc/trading-partner
            where $p1/@delivery-channel-name eq $eb-dc/@name
              and $tp1/@name eq $p1/@trading-partner-name
              and $tp2/@name eq $p2/@trading-partner-name
              or $p2/@delivery-channel-name eq $eb-dc/@name
              and $tp1/@name eq $p1/@trading-partner-name
              and $tp2/@name eq $p2/@trading-partner-name
86
          return
            if ($p1/@trading-partner-name = $tp/@name)
            then
              <authentication
                client-partner-name="{$tp2/@name}"
                client-certificate-name="{$tp2/client-certificate/@name}"
                client-authentication="{
                  if (xf:empty($tp2/client-certificate))
                  then "NONE"
                  else "SSL_CERT_MUTUAL"
                }"
                server-certificate-name="{
                  if ($tp1/@type = "REMOTE")
                  then $tp1/server-certificate/@name
                  else ""
                }"
                server-authentication="{
                  if ($eb-tp/@protocol = "http")
                  then "NONE"
                  else "SSL_CERT"
                }"
87
              >
              </authentication>
            else
              <authentication
                client-partner-name="{$tp1/@name}"
                client-certificate-name="{$tp1/client-certificate/@name}"
                client-authentication="{
                  if (xf:empty($tp1/client-certificate))
                  then "NONE"
                  else "SSL_CERT_MUTUAL"
                }"
                server-certificate-name="{
                  if ($tp2/@type = "REMOTE")
                  then $tp2/server-certificate/@name
                  else ""
                }"
                server-authentication="{
                  if ($eb-tp/@protocol = "http")
                  then "NONE"
                  else "SSL_CERT"
                }"
              >
              </authentication>
        }

88
      </transport>
    </ebxml-binding> }
{-- RosettaNet Binding --}
{ for $eb-dc in $tp/delivery-channel
  for $eb-de in $tp/document-exchange
  for $eb-tp in $tp/transport
  where $eb-dc/@document-exchange-name eq $eb-de/@name
    and $eb-dc/@transport-name eq $eb-tp/@name
    and $eb-de/@business-protocol-name eq "RosettaNet"
  return
    <rosettanet-binding
      name="{$eb-dc/@name}"
      business-protocol-name="{$eb-de/@business-protocol-name}"
      business-protocol-version="{$eb-de/@protocol-version}"
89
      is-signature-required="{$eb-dc/@nonrepudiation-of-origin}"
      is-receipt-signature-required="{$eb-dc/@nonrepudiation-of-receipt}"
      signature-certificate-name="{$eb-de/RosettaNet-binding/@signature-certificate-name}"
      encryption-certificate-name="{$eb-de/RosettaNet-binding/@encryption-certificate-name}"
      cipher-algorithm="{$eb-de/RosettaNet-binding/@cipher-algorithm}"
      encryption-level="{
        if ($eb-de/RosettaNet-binding/@encryption-level = 0)
        then "NONE"
        else if ($eb-de/RosettaNet-binding/@encryption-level = 1)
        then "PAYLOAD"
        else "ENTIRE_PAYLOAD"
      }"
      {-- process-timeout="{$eb-de/RosettaNet-binding/@time-out}" --}
    >
      { if (xf:empty($eb-de/RosettaNet-binding/@retries))
        then ()
        else $eb-de/RosettaNet-binding/@retries }

90
      { if (xf:empty($eb-de/RosettaNet-binding/@retry-interval))
        then ()
        else attribute retry-interval
             { concat(($eb-de/RosettaNet-binding/@retry-interval div 1000), " seconds") } }
      { if (xf:empty($eb-de/RosettaNet-binding/@time-out))
        then ()
        else attribute process-timeout
             { concat(($eb-de/RosettaNet-binding/@time-out div 1000), " seconds") } }
      <transport
        protocol="{$eb-tp/@protocol}"
        protocol-version="{$eb-tp/@protocol-version}"
        endpoint="{$eb-tp/endpoint[1]/@uri}"
      >
91
        { for $ca in $wlc/wlc/collaboration-agreement
          for $p1 in $ca/party[1]
          for $p2 in $ca/party[2]
          for $tp1 in $wlc/wlc/trading-partner
          for $tp2 in $wlc/wlc/trading-partner
          where $p1/@delivery-channel-name eq $eb-dc/@name
            and $tp1/@name eq $p1/@trading-partner-name
            and $tp2/@name eq $p2/@trading-partner-name
             or $p2/@delivery-channel-name eq $eb-dc/@name
            and $tp1/@name eq $p1/@trading-partner-name
            and $tp2/@name eq $p2/@trading-partner-name
          return
            if ($p1/@trading-partner-name = $tp/@name)
            then
92
              <authentication
                client-partner-name="{$tp2/@name}"
                client-certificate-name="{$tp2/client-certificate/@name}"
                client-authentication="{
                  if (xf:empty($tp2/client-certificate))
                  then "NONE"
                  else "SSL_CERT_MUTUAL"
                }"
                server-certificate-name="{
                  if ($tp1/@type = "REMOTE")
                  then $tp1/server-certificate/@name
                  else ""
                }"
                server-authentication="{
                  if ($eb-tp/@protocol = "http")
                  then "NONE"
                  else "SSL_CERT"
                }"
              >
              </authentication>
93
            else
              <authentication
                client-partner-name="{$tp1/@name}"
                client-certificate-name="{$tp1/client-certificate/@name}"
                client-authentication="{
                  if (xf:empty($tp1/client-certificate))
                  then "NONE"
                  else "SSL_CERT_MUTUAL"
                }"
                server-certificate-name="{
                  if ($tp2/@type = "REMOTE")
                  then $tp2/server-certificate/@name
                  else ""
                }"
                server-authentication="{
                  if ($eb-tp/@protocol = "http")
                  then "NONE"
                  else "SSL_CERT"
                }"
              >
              </authentication>
        }
94
      </transport>
    </rosettanet-binding> }
</trading-partner>
let $sv :=
  for $cd in $wlc/wlc/conversation-definition
  for $role in $cd/role
  where xf:not(xf:empty($role/@wlpi-template) or $role/@wlpi-template = "")
    and $cd/@business-protocol-name = "ebXML"
     or $cd/@business-protocol-name = "RosettaNet"
  return
    <servicePair>
      <service
        name="{xf:concat($wfPath, $role/@wlpi-template, '.jpd')}"
        description="{$role/@description}"
        note="{$role/@note}"
        service-type="WORKFLOW"
        business-protocol="{xf:upper-case($cd/@business-protocol-name)}"
      >
95
. . . (60 )
96
XQuery Use Case Scenarios
  • XML transformation language in Web Services
  • Large and very complex queries
  • Input message + external data sources
  • Small and medium size data sets (xK -> xM)
  • Transient and streaming data (no indexes)
  • With or without schema validation
  • XML message brokers
  • Simple path expressions, single input message
  • Small data sets
  • Transient and streaming data (no indexes)
  • Mostly non schema validated data
  • Semantic data verification
  • Mostly messages
  • Potentially complex (but small) queries
  • Streaming and multiquery optimization required

97
XQuery Usage Scenarios (cont.)
  • Data Integration
  • Complex but smaller queries (FLWORs, aggregates,
    constructors)
  • Large, persistent, external data repositories
  • Dynamic data (via Web Services invocations)
  • Large volumes of centralized XML data
  • Logs and archives
  • Complex queries (statistics, analytics)
  • Mostly read only
  • Large content repositories
  • Large volume of data (books, manuals, etc)
  • With or without schema validation
  • Full text essential, update required

98
XQuery Usage Scenarios (cont.)
  • Large volumes of distributed textual data
  • XML search engines
  • High volume of data sources
  • Full text, semantic search crucial
  • RSS data
  • High volume of input data channels
  • Data is pushed, not pulled
  • Structure of the data very simple, each item
    bounded size
  • Aggregators using mostly full-text search

99
XQuery Usage Scenarios (cont.)
  • Data Integration
  • Complex but smaller queries (FLWORs, aggregates,
    constructors)
  • Large, persistent, external data repositories
  • Dynamic data (via Web Services invocations)
  • Large volumes of centralized XML data
  • Logs and archives
  • Mostly read only
  • Large volumes of distributed textual data
  • XML data sources scattered on the Web
  • BLOGS
  • Lots (e.g. millions) of input data channels
  • Data is pushed, not pulled
  • Structure of the data very simple, each item
    bounded size
  • Aggregators using mostly full-text search

100
Criteria for XQuery usages
  • Type of queries (e.g. simple, complex,
    construction-intensive, full text search
    intensive)
  • Volume of queries
  • Native XML or virtual XML views of other forms of
    data
  • XML Schema validated data or not
  • Volume of data per query
  • Number of data sources
  • Transient data vs. persistent data
  • Push data vs. pull data
  • Typed vs. untyped data
  • Read only data vs. updatable data
  • Distributed vs. centralized data sets
  • Data compressed/encrypted or not
  • Target architectures
  • Customer expectation

Each scenario requires different processing
techniques.
101
XUpdate
102
Update XQuery extension
  • Activity in W3C is just beginning
  • W3C Requirements document
  • Use as transformation DB operation
    (side-effect)
  • Preserve Ids of affected nodes! (No
    NodeConstruction!)
  • Tentative examples
  • delete //book[@year < 1968]
  • insert <author/> into //book[@ISBN = 34556]
  • for $x in //book
  • where $x/year < 2000 and $x/price > 100
  • do replace value of $x/price with
    $x/price - 0.3 * $x/price
  • if ($book/price > 200) then do rename $book as
    expensive-book
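
The effect of the four tentative statements above can be emulated outside an XQuery engine. What follows is a minimal Python sketch using the standard library's xml.etree.ElementTree, not the W3C update syntax. The sample document, its values, and the function name apply_updates are assumptions made for illustration; only the element and attribute names come from the slide. All changes are made in place, so nodes that are merely modified keep their identity.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the names book, year, ISBN, price and
# author are taken from the slide.
SAMPLE = """
<bib>
  <book year="1960" ISBN="11111"><year>1960</year><price>50</price></book>
  <book year="1999" ISBN="34556"><year>1999</year><price>150</price></book>
  <book year="2005" ISBN="77777"><year>2005</year><price>250</price></book>
</bib>
"""


def apply_updates(root):
    """Apply the four example updates to the tree rooted at `root`."""
    # delete //book[@year < 1968]
    # ElementTree keeps no parent pointers, so visit each parent explicitly.
    for parent in list(root.iter()):
        for book in list(parent.findall("book")):
            if int(book.get("year")) < 1968:
                parent.remove(book)

    # insert <author/> into //book[@ISBN = 34556]
    for book in list(root.iter("book")):
        if book.get("ISBN") == "34556":
            ET.SubElement(book, "author")

    # for $x in //book where $x/year < 2000 and $x/price > 100
    # do replace value of $x/price with $x/price - 0.3 * $x/price
    for book in list(root.iter("book")):
        price = book.find("price")
        value = float(price.text)
        if int(book.findtext("year")) < 2000 and value > 100:
            price.text = str(value - 0.3 * value)

    # if ($book/price > 200) then do rename $book as expensive-book
    for book in list(root.iter("book")):
        if float(book.findtext("price")) > 200:
            book.tag = "expensive-book"
    return root


updated = apply_updates(ET.fromstring(SAMPLE))
```

Renaming by assigning to `tag` and replacing a value by assigning to `text` both reuse the existing element object, which mirrors the requirement to preserve the Ids of affected nodes.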

103
Overview
  • Insert
  • Insert new XML instances
  • Delete
  • Delete XML instances
  • Replace, Rename
  • Replace/Rename XML Instances
  • Empty Update
  • No operation
  • FLWUpdate
  • bulk update (For-Loop)
  • Conditional Update
  • Conditional update (If)

104
INSERT - Variant 1
  • Insert a new element into a document:
    insert UpdateContent into TargetNode
  • UpdateContent: any sequence of items (nodes,
    values)
  • TargetNode: exactly one document or element
  • Otherwise: ERROR
  • Specify whether to insert at the beginning or end
  • as last: Content becomes last child of Target
    (default)
  • as first: Content becomes first child of Target
  • Nodes in Content assume a new Id.
  • Whitespace, text conventions as in
    element construction in XQuery
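
The insertion rules above can be sketched in Python with xml.etree.ElementTree. This is an illustration only, assuming that "as last" appends and "as first" prepends; the function name insert_into and its position parameter are hypothetical, and values (non-node items) in UpdateContent are not handled.

```python
import copy
import xml.etree.ElementTree as ET


def insert_into(target, content, position="last"):
    """Insert copies of the content elements as children of one target.

    position="last" appends (assumed default), position="first" prepends.
    """
    if not isinstance(target, ET.Element):
        raise TypeError("TargetNode must be exactly one element")
    if position not in ("first", "last"):
        raise ValueError("position must be 'first' or 'last'")
    # Deep copies model the rule that inserted nodes assume a new Id.
    copies = [copy.deepcopy(node) for node in content]
    if position == "first":
        for offset, node in enumerate(copies):
            target.insert(offset, node)
    else:
        target.extend(copies)
    return copies


bib = ET.fromstring("<bib><book><title>A</title></book></bib>")
new_book = ET.fromstring("<book><title>Die wilde Wutz</title></book>")
insert_into(bib, [new_book])                    # as last (default)
insert_into(bib, [new_book], position="first")  # as first
titles = [book.findtext("title") for book in bib]
```

Because the content is copied before insertion, the same source element can be inserted several times without the tree sharing nodes.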

105
INSERT - Variant 1
  • Insert new book at the end of the library
  • insert <book> <title>Die wilde Wutz</title>
    </book>
  • into document("www.uni-bib.de")//bib
  • Insert new book at the beginning of the
    library
  • insert <book> <title>Die wilde Wutz</title> </book>
  • as first into document("www.uni-bib.de")//bib
  • Insert new attribute into an element
  • insert (attribute age { 13 }, <parents xsi:nil="true"/>)
  • into document("ewm.de")//person[@name = "KD"]

106
INSERT - Variant 2
  • Insert at a particular point in the
    document: insert UpdateContent (after | before)
    TargetNode
  • UpdateContent: no attributes allowed!
  • TargetNode: one element, comment or PI.
  • Otherwise: ERROR
  • Specify whether before or after the target
  • Before vs. After
  • Nodes in Content assume a new identity
  • Whitespace, text conventions as in
    element constructors of XQuery

107
Insert - Variant 2
  • Add an author to a book

insert <author>Florescu</author> before
//article[title = "XL"]/author[. = "Grünhagen"]
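
A sibling-relative insert of this kind can be sketched in Python with xml.etree.ElementTree. The sketch is a hedged illustration: the function name insert_relative, its parameters, and the sample document are assumptions, and the target is located with plain Python rather than XPath. Only elements are accepted as content, which matches the rule that attributes are not allowed here.

```python
import copy
import xml.etree.ElementTree as ET


def insert_relative(root, target, content, where="before"):
    """Insert copies of content as siblings just before or after target."""
    if where not in ("before", "after"):
        raise ValueError("where must be 'before' or 'after'")
    # ElementTree keeps no parent pointers, so search for the parent.
    for parent in root.iter():
        for index, child in enumerate(list(parent)):
            if child is target:
                start = index if where == "before" else index + 1
                for offset, node in enumerate(content):
                    # Copies model the new identity of inserted nodes.
                    parent.insert(start + offset, copy.deepcopy(node))
                return
    raise LookupError("target has no parent under root")


# Hypothetical document containing the article from the example.
doc = ET.fromstring(
    "<articles><article><title>XL</title>"
    "<author>Grünhagen</author></article></articles>"
)
# Locate the author "Grünhagen" of the article titled "XL".
target = next(
    author
    for article in doc.iter("article")
    if article.findtext("title") == "XL"
    for author in article.findall("author")
    if author.text == "Grünhagen"
)
insert_relative(doc, target, [ET.fromstring("<author>Florescu</author>")])
authors = [author.text for author in doc.iter("author")]
```

Passing where="after" places the new nodes immediately after the target instead.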
108
INSERT - Open Questions
  • Insert in Schema-validated Documents?
  • When and how to validate types?