XQuery Implementation in a Relational Database System - PowerPoint PPT Presentation

1 / 31
About This Presentation
Title:

XQuery Implementation in a Relational Database System

Description:

Istvan Cseri, Oliver Seeliger, Michael Rys, Gideon Schaller, Wei Yu, Dragan ... Parser is XmlReader (not DOM) Evaluate simple XPath (without branching) during parsing ... – PowerPoint PPT presentation

Number of Views:268
Avg rating:3.0/5.0
Slides: 32
Provided by: shan72
Category:

less

Transcript and Presenter's Notes

Title: XQuery Implementation in a Relational Database System


1
XQuery Implementation in a Relational Database
System
  • Shankar Pal
  • Istvan Cseri, Oliver Seeliger, Michael Rys,
    Gideon Schaller, Wei Yu, Dragan Tomic, Adrian
    Baras, Brandon Berg, Denis Churin, Eugene Kogan
  • SQL Server
  • Microsoft Corp

2
Overview
  • Background
  • XML Support in SQL Server 2005
  • OrdPath labeling of XML nodes
  • XML indexes PATH, VALUE, PROPERTY
  • Main topic XQuery compilation
  • Architecture
  • XML operators
  • Mapping XML operators to relational ops
  • Conclusions

3
Background XML Support in SQL Server 2005
  • Create table DOCS ( ID int primary key,
  • XDOC xml)
  • XML stored in an internal, binary form (blob)
  • Optionally typed by a collection of XML schemas
  • Used for storage and query optimizations
  • 3 of 5 methods on XML data type
  • query() returns XML type
  • value() returns scalar value
  • exist() checks conditions on XML nodes
  • XML indexing
  • More information at http//msdn.microsoft.com/xml

4
Background XQuery embedded in SQL
  • Retrieve section titles from ltbookgt wrapped in
    new lttopicgt elements
  • SELECT ID, XDOC
  • .query('
  • for s in /BOOK/SECTION
  • return lttopicgt
  • data(s/TITLE)
  • lt/topicgt
  • ')
  • FROM DOCS

5
Background XQuery supported features
  • XQuery clauses for, where, return and
    order by
  • XPath axes child, descendant, parent,
    attribute, self and descendant-or-self
  • Functions numeric, string, Boolean, nodes,
    context, sequences, aggregate, constructor, data
    accessor
  • SQL Server extension functions to access SQL
    variable and column data within XQuery
  • Numeric operators (, -, , div, mod)
  • Value comparison operators (eq, ne, lt, gt, le,
    ge)
  • General comparison operators (, !, lt, gt, lt, gt)

6
Background SIGMOD04 ORDPATH Label of Nodes
node1 is ancestor of node2 ?? ORDPATH
(node1) is prefix of ORDPATH
(node2)
BOOK 1
Section 1.3
Section 1.5
_at_ISBN 1.1
Figure 1.3.3
Title 1.3.1
Title 1.5.1
Figure 1.5.3
node1 precedes node2 in document order ??
ORDPATH (node1) lt ORDPATH (node2)
ORDPATH(1.3) id lt Descendant_Limit (1.3) 1.4
7
Background VLDB 2004Indexing XML column
  • Primary XML index on an XML column
  • Creates Btree tree on data model content of the
    XML nodes
  • Adds column Path_ID for the reversed, encoded
    path from each XML node to root of XML tree
  • OrdPath labeling schema is used for XML nodes
  • Relative order of nodes
  • Document hierarchy

8
Background XML example
  • INSERT INTO myTable VALUES (7,
  • ltBook xmlns"myns" ISBN "1-55860-3612"gt
  • ltSectiongt
  • ltTitlegtBad Bugslt/Titlegt
  • lt/Sectiongt
  • ltSectiongt
  • ltTitlegt Tree frogs lt/Titlegt
  • ltFiguregtlt/Figuregt
  • lt/Sectiongt
  • lt/Bookgt)

9
Background Primary XML Index Entries
  • Encoding of tags types stored in system
    meta-data - Additional details not shown

Clustering key
10
Background Secondary XML indexes
  • To speed up different classes of commonly
    occurring queries
  • Statistics created on key columns of the primary
    and secondary XML indexes
  • Used for cost-based selection of secondary XML
    indexes

11
Background Handling Types
  • If XML column is typed
  • Values are stored in XML blob and XML indexes
    with appropriate typing
  • Untyped XML
  • Values are stored as strings
  • Convert to appropriate types for operations
  • SQL typed values stored in primary XML index
  • Most SQL types are compatible with XQuery types
    (integer)
  • Value comparisons on XML index columns suffice
  • Some types (e.g. xsdatetime) are stored in
    internal format and processed specially

12
XQuery Processing Architecture
  • XQuery Compiler
  • Parses XQuery expr
  • Checks static type correctness
  • Type annotations
  • Applies static optimiztns
  • Path collapsing
  • Rewrites using XML schemas
  • XML Operator Mapper
  • Recursively traverses XML algebra tree
  • Converts each XmlOp to reln operator sub-tree
  • Mapping depends upon existence of primary XML
    index

XQuery expression
XQuery Compiler
XML algebra tree (XmlOp ops)
XML Operator Mapper
Relational Operator Tree (relational operators)
Reln Query Processor
13
Examples of XML Operators
14
XML Operator Mapping Overview
PATH Index
XQUERY
PK
OrdPath
XML
PK
Primary XML Index
VALUE Index
PROPERTY Index
Special handling for SELECT XDOC
REL tree
15
New operators
  • Some produce N rows from M (? N) rows
  • XML_Reader streaming, pull-model XML parser
  • XML_Serializer to serialize query result as XML
  • Some are for efficiency
  • Contains to evaluate XQuery contains()
  • TextAdd to evaluate the XQuery function
    string()
  • Data to evaluate XQuery data() function
  • Some are for specific needs
  • Check validate XML during insertion or
    modification

16
XML Operator Mapping
  • Following categories
  • Mapping of XPath expressions
  • Mapping of XQuery expressions
  • Mapping of XQuery built-in functions

17
XPath Expressions
  • Two cases
  • Fully known, forward paths without branching
    after path collapsing
  • Paths without branching that are not fully known
    after path collapsing
  • Segments of the path cannot be collapsed or a
    path is split into multiple segments
  • Occurs most commonly for paths containing
    wildcard steps, //, self and parent axes
  • Evaluated using LIKE operator on XML idx

18
Non-indexed XML, Full Path
  • XML_Reader produces subtrees of ltSECTIONgt
  • Node table rows
  • Contains OrdPath
  • No PK or PATH_ID
  • XML_Serialize reassembles those row into XML data
    type
  • To output result

XML operator tree XmlOp_Path PATH
/BOOK/SECTION Rel operator tree XML_Serializ
e XML_Reader (XDOC, /BOOK/SECTION)
19
Query Execution on XML Blob
  • XDOC column value in each row parsed at runtime
  • Parser is XmlReader (not DOM)
  • Evaluate simple XPath (without branching) during
    parsing
  • Rest of processing done in memory using
    relational operators
  • // and are also pushed into XML_Reader

SELECT ID, XDOC.query ('
/BOOK/SECTION 2 ') FROM DOCS
20
Sample query execution using Primary XML Index
  • /Book/Section ? 31 (by XML Op Mapper)

Clustering key
21
Indexed XML, Full Path
  • XmlOp_Path mapped to SELECT
  • GET(PXI) rows from primary XML index
  • Match PATH_ID
  • Not shown
  • JOIN with base table on PK

XML_Serialize
Assemble Subtree
Apply
Select (b)
Select
b.OrdP OrdPlt DL(b)
GET (PXI)
GET (PXI)
Path_IDSECTIONBOOK
22
XML index PATH
  • Speeds up path evaluations
  • Example /Book/Section ? 31

23
Indexed XML, Imprecise Paths
  • /BOOK/SECTION//
  • TITLE
  • Matched using LIKE operator on Path_ID

XML_Serialize
Apply
Assemble subtree of ltTITLEgt
Select (s)
GET (PXI)
Path_ID LIKE TITLESECTIONBOOK
24
Predicate Evaluation
  • /BOOK_at_ISBN 12
  • Search value compared with VALUE column in PXI
  • Collapsed path /BOOK/_at_ISBN
  • Induce index seeks
  • Reduce intermediate result size
  • Parent check Par(b)
  • Using OrdPath
  • Value conversion might be needed

XML_Serialize
Apply
Apply
Assemble subtree of ltBOOKgt
Select (b)
Select
Path_ID_at_ISBNBOOK VALUE12 Par(b)
GET (PXI)
GET (PXI)
Path_IDBOOK
25
Ordinal Predicate
  • /BOOKn
  • Adds ranking column to the rows for ltBOOKgt
    elements
  • Retrieves the nth ltBOOKgt node
  • Special optimizations
  • 1 ? TOP 1 ascending
  • last() ? TOP 1 descending
  • Avoids sorting when input is sorted
  • Example in XML_Serializer

26
Error handling
  • Static type errors at compilation time
  • Raises static type errors if an expression could
    fail at runtime due to type safety violation
  • Addition of string to integer
  • Querying non-existent node name in typed XML
  • Non-singleton in eq
  • Some can be fixed using explicit cast or ordinal
    specification
  • Dynamic error converted to empty sequence
  • Yields correct result in predicates without
    negations

27
for Iterator
XML_Serialize
  • for s in /BOOK//SECTION
  • where s/_at_num gt 3
  • return s/TITLE
  • XML op for for is XmlOp_Apply
  • Maps to APPLY
  • Binds s and iterates over ltSECTIONgt
  • Determines its ltTITLEgt children
  • Nested for and for with multiple bindings
    turn into nested APPLY
  • Each APPLY binds to a different variable

Apply
Assemble ltSECTIONgt
Apply (s)
Path_ID LIKE TITLESECTIONBOOK Par(s)
Select
Exists
Select (s)
Select
GET(PXI)
GET (PXI)
Path_ID LIKE _at_numSECBK VALUE gt 3 Par(s)
Path_ID LIKE SECTIONBOOK
28
XQuery order by and where
  • Order by
  • Sorts rows based on order-by expression
  • Adds a ranking column to these rows
  • Ranking column converted into OrdPath values
  • Yield the new order of the rows
  • Fits rest of query processing framework
  • Where
  • Becomes SELECT on input sequence
  • Filters rows satisfying specified condition

29
XQuery return
  • Return nodes sequence in document order
  • Use OrdPath values and
  • XML_Serialize operator
  • New element and sequence constructions
  • Merge constructed and existing nodes into a
    single sequence (SWITCH_UNION)

30
XQuery Functions Operators
  • Built-in fn and op are mapped to relational fn
    and op if possible
  • fncount() ? count()
  • Additional support for XQuery types, functions
    and operators that cannot be mapped directly
  • Intrinsics

31
Optimizations
  • Exploiting Ordered Sets
  • Sorting information (OrdPath) made available to
    further relational operators
  • XML_Serialize is an example
  • Using static type information
  • Eliminates CONVERT() in operations
  • Allows range scan on VALUE index

32
Conclusions
  • Built-up infrastructure for query processing
    framework
  • Other XQuery features (such as let and
    typeswitch) can be implemented
  • Data modification language
  • Fits into relational query processing framework
  • XQuery features can be implemented using rel
    operators
  • Optimizations pose the biggest challenges
  • More cost-based optimizations can be done
  • Enhanced costing model (e.g. choice of PXI)
  • Matching materialized views

33
Thank you!
Write a Comment
User Comments (0)
About PowerShow.com