Title: Query Processing with XML
1. Query Processing with XML
- CSE 350 Advanced Database Topics
- Jeffrey R. Ellis
2. Query Processing Topics
- Why?
- Java and Other Programming Languages
- XPath/XSLT
- XQuery (W3C-sponsored Query Language)
- Current Research
- Other Query Languages
- XISS (XML Indexing and Storage System)
3. FIRST: Distinction between XML and HTML/Web Technologies
- XML spotlight is analogous to Java
- Immediate benefits applied to World Wide Web
- Long-range, more exciting benefits in applications
- XML IS NOT AN HTML REPLACEMENT
- HTML marks pages up for presentation on the web
- XML marks text for semantic information purposes
- XML can encode HTML pages, but HTML works well on the Web
4. XML Data Storage
- XML Documents
- Data is delineated semantically
- Schemas/DTDs control contents of elements
- Semi-structured attitude allows flexibility
- Text is human-readable and machine-parsable
- Open standards work with common tools
- File data storage allows for easy sharing
- Can queries control access to data?
5. Traditional Database Storage
- Databases
- Data is delineated semantically
- Schemas control contents of rows
- No flexibility from semi-structured storage
- Data is not human-readable, but only machine-parsable
- Proprietary standards prevent interoperability
- Proprietary storage prevents data sharing
- Queries control access to data
6. XML for Query Processing
- If we can get efficient query processing, XML document storage provides many benefits over traditional database storage.
- Sample application
- Employee database document
- XML Schema assumed to exist
- Employee information queried as per standard HR processing
7. Employee database document
<?xml version="1.0"?>
<!DOCTYPE employees SYSTEM "employee.xsd">
<employees>
  <emp gender='m'>
    <name>
      <last>Bissell</last>
      <first>Brian</first>
    </name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name>
      <last>Pham</last>
      <first>Hung</first>
      <mi>Q</mi>
    </name>
    <position>Senior IT Specialist</position>
8. Tree Structure of XML Document
- Remember that XML documents are trees
emp
  gender
  name
    last
    first
    mi
  position
  salary
  location
9. Query Processing - Programming Languages
- XML Documents are flat files
- Any language with file I/O can read an XML document
- Any language with string parsing capabilities can use XML data
- Query processing done through language syntax
- Obvious result: different from traditional databases
10. Query Processing - Programming Languages
- Strategy
- Basic File I/O through language
- Basic String matching to identify elements
- Processing possible, but not necessarily efficient
- Languages have gathered XML processing tools in libraries
- Xerces: Apache library for Java and C++
- Two methods for parsing XML data
- DOM
- SAX
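The "basic string matching" strategy above can be sketched in a few lines. This is a minimal illustration in Python (chosen here for brevity; the slides mention Java and C++), and the helper name `element_texts` and the trimmed sample document are assumptions made for the example, not part of the original material.

```python
import re

# Trimmed copy of the employee sample; only fields shown on the slides are used.
SAMPLE = """<employees>
  <emp gender="m">
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
  </emp>
  <emp gender="m">
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
  </emp>
</employees>"""


def element_texts(xml_text, tag):
    """Return the text between every literal <tag> and </tag> pair.

    Hypothetical helper: plain pattern matching, no real XML parsing.
    """
    pattern = re.compile(r"<{0}>(.*?)</{0}>".format(re.escape(tag)), re.DOTALL)
    return [match.strip() for match in pattern.findall(xml_text)]


if __name__ == "__main__":
    print(element_texts(SAMPLE, "last"))
    print(element_texts(SAMPLE, "emp"))
```

The second call finds nothing, because `<emp gender="m">` carries an attribute and no longer matches the literal `<emp>` pattern. That fragility is one reason the work is usually handed to a parsing library.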
11. DOM
- Document Object Model
- Defined by W3C for XML, HTML, and stylesheets
- Provides a hierarchical object view of the document
- DOMParser parses through file, then provides access to nodes
- Key: Every item in an XML document is a node
12. DOM Example
Node (Element): name=emp, attribute1, child1
Node (Attr): name=gender, value=m, parent
Node (Element): name=name, parent, child1
Node (Element): name=last, parent, child1
Node (Text): value=Bissell, parent
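The node chain in this example can be reproduced with a DOM parser. The sketch below uses Python's standard `xml.dom.minidom` rather than the Xerces DOMParser named on the slides; the function name `walk_first_branch` and the compact one-employee document are assumptions for illustration.

```python
from xml.dom import minidom, Node

# One employee, written without whitespace so that firstChild is always
# the element we expect (whitespace would otherwise appear as Text nodes).
SAMPLE = ('<employees><emp gender="m"><name><last>Bissell</last>'
          '<first>Brian</first></name></emp></employees>')

KIND = {
    Node.ELEMENT_NODE: "Element",
    Node.ATTRIBUTE_NODE: "Attr",
    Node.TEXT_NODE: "Text",
}


def walk_first_branch(xml_text):
    """Follow emp -> @gender, emp -> name -> last -> text and describe each node."""
    doc = minidom.parseString(xml_text)
    emp = doc.documentElement.firstChild      # <emp>
    gender = emp.getAttributeNode("gender")   # attribute node on <emp>
    name = emp.firstChild                     # <name>
    last = name.firstChild                    # <last>
    text = last.firstChild                    # text node "Bissell"
    return [(KIND[n.nodeType], n.nodeName, n.nodeValue)
            for n in (emp, gender, name, last, text)]


if __name__ == "__main__":
    for kind, node_name, value in walk_first_branch(SAMPLE):
        print(kind, node_name, value)
```

Elements report no value of their own; the data lives in the attribute and text nodes beneath them, which is the point of "every item is a node".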
13. SAX
- Simple API for XML
- Defined by XML-DEV mailing list
- Provides event-driven processing of the document
- XMLReader parses through file and activates different methods and functions based on the elements retrieved
- Key: Methods are defined in interface, implemented in user code
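The event-driven style can be sketched with Python's standard `xml.sax` module, which follows the same callback design. The handler class `LastNameHandler`, the wrapper `last_names`, and the trimmed sample are assumptions for this example.

```python
import xml.sax

SAMPLE = """<employees>
  <emp gender="m"><name><last>Bissell</last><first>Brian</first></name></emp>
  <emp gender="m"><name><last>Pham</last><first>Hung</first><mi>Q</mi></name></emp>
</employees>"""


class LastNameHandler(xml.sax.ContentHandler):
    """User code implementing the callbacks that the parser invokes."""

    def __init__(self):
        super().__init__()
        self.in_last = False
        self.buffer = []
        self.last_names = []

    def startElement(self, name, attrs):
        if name == "last":
            self.in_last = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self.in_last:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "last":
            self.last_names.append("".join(self.buffer))
            self.in_last = False


def last_names(xml_text):
    """Stream through the document once and collect every <last> value."""
    handler = LastNameHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.last_names


if __name__ == "__main__":
    print(last_names(SAMPLE))
```

Nothing but the handler's own fields is kept in memory: the parser never builds a tree, so the document is processed as it is read.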
14. DOM versus SAX
- SAX is primarily Java-based; DOM is defined for most languages
- DOM requires storage of the entire document in memory; SAX processes as it reads
- DOM mirrors a document that can be revisited: suited for document processing
- SAX mirrors object lifecycles: suited for data processing
15. Query Processing - XPath/XSLT
- Standard XML technologies XPath and XSLT provide a ready-made querying infrastructure
- XPath identifies the location of various document elements
- XSL Stylesheets provide methods for transforming data from one format to another
- Combining XPath and XSLT provides easy generation of result sets based on queries
16. XPath
- Provides element, value, and attribute identification
- employees/emp/name/first -> Brian, Hung, Sara, Brian
- //salary -> 35,000, 40,000, 35,000, 60,000
- count(/employees/emp) -> 4
- //mi -> Q
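These four lookups can be tried with Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. As an assumption of this sketch, `count()` is replaced by `len()` on the match list, and `//x` is written `.//x` relative to the root element. The sample repeats only the names and salaries listed above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<employees>
  <emp><name><last>Bissell</last><first>Brian</first></name><salary>35,000</salary></emp>
  <emp><name><last>Pham</last><first>Hung</first><mi>Q</mi></name><salary>40,000</salary></emp>
  <emp><name><last>Menillo</last><first>Sara</first></name><salary>35,000</salary></emp>
  <emp><name><last>Chicos</last><first>Brian</first></name><salary>60,000</salary></emp>
</employees>"""


def run_paths(xml_text):
    """Evaluate the four path lookups against the <employees> root."""
    root = ET.fromstring(xml_text)
    return {
        "first": [e.text for e in root.findall("emp/name/first")],
        "salary": [e.text for e in root.findall(".//salary")],
        "count": len(root.findall("emp")),   # stands in for count(/employees/emp)
        "mi": [e.text for e in root.findall(".//mi")],
    }


if __name__ == "__main__":
    for label, value in run_paths(SAMPLE).items():
        print(label, value)
```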
17. XSLT
- Stylesheet transforms data from one form into another
<xsl:template match="name">
  <xsl:value-of select="first"/>
  <xsl:value-of select="last"/>
</xsl:template>
- Brian Bissell, Hung Pham, Sara Menillo, Brian Chicos
18. Combine XPath and XSLT for Queries
- Query: Find the last name and position of each employee named Brian
<xsl:template match='employees'>
  <xsl:for-each select='emp'>
    <xsl:if test='name/first="Brian"'>
      <xsl:value-of select='name/last'/>
      <xsl:text></xsl:text>
      <xsl:value-of select='position'/>
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:for-each>
</xsl:template>
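Python's standard library has no XSLT processor, so the sketch below does not run the stylesheet. It mirrors the same for-each / if / value-of steps with `xml.etree.ElementTree`. The function name `brians` is hypothetical, and the position given to Brian Chicos is an assumption, since the slides do not show it.

```python
import xml.etree.ElementTree as ET

# Chicos's position ("Manager") is assumed for illustration only.
SAMPLE = """<employees>
  <emp><name><last>Bissell</last><first>Brian</first></name><position>IT Specialist</position></emp>
  <emp><name><last>Pham</last><first>Hung</first><mi>Q</mi></name><position>Senior IT Specialist</position></emp>
  <emp><name><last>Chicos</last><first>Brian</first></name><position>Manager</position></emp>
</employees>"""


def brians(xml_text):
    """Return (last name, position) for each employee whose first name is Brian."""
    root = ET.fromstring(xml_text)
    rows = []
    for emp in root.findall("emp"):                     # xsl:for-each select='emp'
        if emp.findtext("name/first") == "Brian":       # xsl:if test='name/first="Brian"'
            rows.append((emp.findtext("name/last"),     # xsl:value-of select='name/last'
                         emp.findtext("position")))     # xsl:value-of select='position'
    return rows


if __name__ == "__main__":
    for last, position in brians(SAMPLE):
        print(last, position)
```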
19. Combine XPath and XSLT for Queries
- Query: Find the average salary of all non-managers
<xsl:template match='employees'>
  <xsl:variable name='running_sum'>
    <xsl:value-of select='sum(emp/salary[../position!="Manager"])'/>
  </xsl:variable>
  <xsl:variable name='running_count'>
    <xsl:value-of select='count(emp[position!="Manager"])'/>
  </xsl:variable>
  <xsl:value-of select='$running_sum div $running_count'/>
</xsl:template>
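The same sum / count / divide computation can be sketched in Python. Two points are assumptions of this example rather than facts from the slides: Brian Chicos is treated as the only manager, and the thousands separator in values such as "35,000" is stripped before conversion, since the text is not a plain number as written.

```python
import xml.etree.ElementTree as ET

# The "Manager" position for Chicos is assumed for illustration only.
SAMPLE = """<employees>
  <emp><name><last>Bissell</last></name><position>IT Specialist</position><salary>35,000</salary></emp>
  <emp><name><last>Pham</last></name><position>Senior IT Specialist</position><salary>40,000</salary></emp>
  <emp><name><last>Chicos</last></name><position>Manager</position><salary>60,000</salary></emp>
</employees>"""


def average_non_manager_salary(xml_text):
    """Average the salary of every emp whose position is not "Manager"."""
    root = ET.fromstring(xml_text)
    salaries = [
        int(emp.findtext("salary").replace(",", ""))   # "35,000" -> 35000
        for emp in root.findall("emp")
        if emp.findtext("position") != "Manager"       # [position != "Manager"]
    ]
    running_sum = sum(salaries)        # plays the role of $running_sum
    running_count = len(salaries)      # plays the role of $running_count
    return running_sum / running_count


if __name__ == "__main__":
    print(average_non_manager_salary(SAMPLE))
```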
20. Results: XSLT/XPath
- Many SQL queries can be accomplished
- XPath provides element (data) access
- XPath provides basic functions (e.g., sum() )
- XPath provides WHERE functionality
- XSLT provides SELECT functionality
- XSLT provides ORDER BY functionality (sort)
- XSLT provides result set formatting
- UNION functionality provided ..?
21. Querying with XPath and XSLT
- Important questions
- Is it sufficient?
- Is it efficient?
- Is there a better way?
- XML community needs to design a full query language
- XQuery: Working draft published 7 June 2001
22. Query Processing - XQuery
- XML provides flexibility in representing many kinds of information
- Good query language must be likewise flexible
- Pre-XQuery languages are good for specific types of data
- Goal: Small, easily implementable language in which queries are concise and easily understood.
23. XQuery Forms
- Path expressions
- Element constructors
- FLWR expressions
- Operator/Function expressions
- Conditional expressions
- Quantified expressions
- Data Type expressions
24. XQuery Path Expressions
- Contribution of XPath
- XQuery 1.0 and XPath 2.0 Data Model
- document("sample1.xml")//emp/salary
- /employees/emp/name[../@gender="f"]
- //emp[1 TO 3]/name/first
25. XQuery Element Constructors
- Queries can generate new elements
- Similar to XSLT abilities
<worker>
  name/last
  position
</worker>
26. XQuery FLWR Expressions
- For clause/Let clause/Where clause/Return clause
- Similar to SQL
FOR $e IN document("sample1.xml")//emp
WHERE $e/salary > 38000
AND $e/@gender = "f"
RETURN $e/name
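The For / Where / Return shape maps naturally onto a list comprehension. The sketch below is a Python analogue, not an XQuery engine. The function name `flwr` is hypothetical, Sara Menillo's gender is assumed, and "Jane Doe" is an invented employee added so that the query has a match.

```python
import xml.etree.ElementTree as ET

# Assumptions: Menillo's gender is not shown on the slides, and Jane Doe
# is a made-up record used only to give the query a result.
SAMPLE = """<employees>
  <emp gender="m"><name><last>Pham</last><first>Hung</first></name><salary>40,000</salary></emp>
  <emp gender="f"><name><last>Menillo</last><first>Sara</first></name><salary>35,000</salary></emp>
  <emp gender="f"><name><last>Doe</last><first>Jane</first></name><salary>45,000</salary></emp>
</employees>"""


def flwr(xml_text):
    """Return (first, last) for female employees earning more than 38000."""
    root = ET.fromstring(xml_text)
    return [
        (e.findtext("name/first"), e.findtext("name/last"))       # RETURN $e/name
        for e in root.iter("emp")                                 # FOR $e IN ...//emp
        if int(e.findtext("salary").replace(",", "")) > 38000     # WHERE $e/salary > 38000
        and e.get("gender") == "f"                                # AND $e/@gender = "f"
    ]


if __name__ == "__main__":
    print(flwr(SAMPLE))
```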
27. XQuery Operator/Function Expressions
- Pre-defined and user-defined operators and functions
- Still under development: Union, Intersect, Except
FOR $e IN //employees/emp
WHERE not(empty($e//mi))
RETURN $e/name
28. XQuery Conditional Expressions
- If-then-else expressions are not yet limited to boolean (ongoing discussion)
FOR $e IN /employees/emp
RETURN
<worker>
  name
  IF ($e/position = "Manager")
  THEN <manager />
</worker>
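Element construction combined with a condition can be sketched as follows. This is a Python analogue built on `xml.etree.ElementTree`, not XQuery itself. The function name `build_workers` is hypothetical, and the "Manager" position for Brian Chicos is an assumption.

```python
import copy
import xml.etree.ElementTree as ET

# Chicos's position ("Manager") is assumed for illustration only.
SAMPLE = (
    "<employees>"
    "<emp><name><last>Bissell</last><first>Brian</first></name>"
    "<position>IT Specialist</position></emp>"
    "<emp><name><last>Chicos</last><first>Brian</first></name>"
    "<position>Manager</position></emp>"
    "</employees>"
)


def build_workers(xml_text):
    """Build one <worker> per emp, adding an empty <manager/> for managers."""
    root = ET.fromstring(xml_text)
    workers = []
    for e in root.findall("emp"):                        # FOR $e IN /employees/emp
        worker = ET.Element("worker")                    # element constructor
        worker.append(copy.deepcopy(e.find("name")))     # copy <name> into the result
        if e.findtext("position") == "Manager":          # IF (... = "Manager")
            ET.SubElement(worker, "manager")             # THEN <manager />
        workers.append(worker)
    return workers


if __name__ == "__main__":
    for worker in build_workers(SAMPLE):
        print(ET.tostring(worker, encoding="unicode"))
```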
29. Quantified Expressions
- Some/Every conditions
- Some/Every evaluates to True or False
FOR $e IN //employees
WHERE SOME $p IN $e//emp/position SATISFIES $p = "Manager"
RETURN $e
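Some/Every correspond closely to Python's built-in `any()` and `all()`. In this sketch the function names are hypothetical and the "Manager" position for Brian Chicos is an assumption.

```python
import xml.etree.ElementTree as ET

# Chicos's position ("Manager") is assumed for illustration only.
SAMPLE = """<employees>
  <emp><name><last>Bissell</last></name><position>IT Specialist</position><salary>35,000</salary></emp>
  <emp><name><last>Chicos</last></name><position>Manager</position><salary>60,000</salary></emp>
</employees>"""


def some_position_is(root, title):
    """SOME: true if at least one position equals the given title."""
    return any(p.text == title for p in root.iter("position"))


def every_salary_at_least(root, floor):
    """EVERY: true only if all salaries meet the floor."""
    return all(int(s.text.replace(",", "")) >= floor for s in root.iter("salary"))


if __name__ == "__main__":
    employees = ET.fromstring(SAMPLE)
    print(some_position_is(employees, "Manager"))
    print(every_salary_at_least(employees, 40000))
```

Both helpers return a single boolean for the whole sequence, matching the statement that Some/Every evaluates to True or False.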
30. Data Types
- Data Types based on those available from XML Schema
- Data types can be literal ("Brian"), from constructor functions (date("2001-10-11")), or from casting (CAST AS xsd:integer(24))
- User-defined data types are also allowable and parsable
31. XQuery
- More choices than XSLT/XPath combination
- Work in progress
- Current W3C efforts into query language
- Influencing the future design of the core XML technologies (XPath)
- Hopes to be fully flexible for all future XML applications
32. Query Processing - Research
- XQuery specification continues to undergo review and change
- 6 of 7 specification documents released since June
- All specifications released in 2001
- Other avenues of research
- Other Query languages
- Indexing strategies
- Implementation
33. Query Processing - Other Query Languages
- Many query languages exist
- Quilt (basis for XQuery)
- W3C early languages (XML-QL, XQL)
- Adopted traditional languages (OQL, XSQL)
- Research papers (XML-GL, YATL, Lorel)
- Other query languages often optimized for a particular subset of XML documents
- Query language field MAY be standardizing to XQuery
34. Query Processing - Indexing Strategy
- Query language is less important; better indexing techniques lead to efficiency
- XISS (XML Indexing and Storage System)
- Published September 19, 2001
- Builds sets of indexes on XML data elements and attributes on initial parse of XML document
- Lookup becomes constant-time through the various built indexes
- Demonstrated successes in test runs
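The general idea of building indexes during one initial parse can be sketched with plain dictionaries. This is a hypothetical simplification, not the actual XISS data structures, which the slides do not describe. The function name `build_indexes` and the index layout are assumptions.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

SAMPLE = """<employees>
  <emp gender="m"><name><last>Bissell</last><first>Brian</first></name></emp>
  <emp gender="m"><name><last>Pham</last><first>Hung</first><mi>Q</mi></name></emp>
</employees>"""


def build_indexes(xml_text):
    """Walk the document once, indexing nodes by element name and by attribute.

    Hypothetical layout: by_element maps a tag to its elements, by_attribute
    maps an (attribute name, value) pair to the elements carrying it.
    """
    root = ET.fromstring(xml_text)
    by_element = defaultdict(list)
    by_attribute = defaultdict(list)
    for node in root.iter():                       # single pass over every element
        by_element[node.tag].append(node)
        for key, value in node.attrib.items():
            by_attribute[(key, value)].append(node)
    return by_element, by_attribute


if __name__ == "__main__":
    elements, attributes = build_indexes(SAMPLE)
    print([n.text for n in elements["last"]])
    print(len(attributes[("gender", "m")]))
```

After the one-time build, finding every `last` element is a dictionary lookup (constant time on average) instead of a scan of the whole document.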
35. Query Processing - Implementation
- XML is currently in a state of flux
- Standards are still being revised
- Industry is cautious before embracing a new technology
- Economic slowdown may prevent new research and development efforts
- XML is still waiting for its Killer App, the application that forces immediate acceptance
36. XML Query Processing
- XML is a functional database storage language
- Efficient query language needed to turn XML into a viable database
- Query language solutions are being developed
- Java/C++ hooks first developed: OK
- XSLT/XPath implemented: GOOD
- XQuery being designed: GREAT?
- Future additions: ????