Query Processing with XML

1
Query Processing with XML
  • CSE 350 Advanced Database Topics
  • Jeffrey R. Ellis

2
Query Processing Topics
  • Why?
  • Java and Other Programming Languages
  • XPath/XSLT
  • XQuery (W3C-sponsored Query Language)
  • Current Research
  • Other Query Languages
  • XISS (XML Indexing and Storage System)

3
FIRST: Distinction between XML and HTML/Web Technologies
  • XML spotlight is analogous to Java
  • Immediate benefits applied to World Wide Web
  • Long-range, more exciting benefits in
    applications
  • XML IS NOT AN HTML REPLACEMENT
  • HTML marks pages up for presentation on the web
  • XML marks text for semantic information purposes
  • XML can encode HTML pages, but HTML works well on
    the Web

4
XML Data Storage
  • XML Documents
  • Data is delineated semantically
  • Schemas/DTDs control contents of elements
  • Semi-structured attitude allows flexibility
  • Text is human-readable and machine-parsable
  • Open standards work with common tools
  • File data storage allows for easy sharing
  • Can queries control access to data?

5
Traditional Database Storage
  • Databases
  • Data is delineated semantically
  • Schemas control contents of rows
  • No flexibility from semi-structured storage
  • Data is not human-readable, but only
    machine-parsable
  • Proprietary standards prevent interoperability
  • Proprietary storage prevents data sharing
  • Queries control access to data

6
XML for Query Processing
  • If we can get efficient query processing, XML
    document storage provides many benefits over
    traditional database storage.
  • Sample application
  • Employee database document
  • XML Schema assumed to exist
  • Employee information queried as per standard HR
    processing

7
  <?xml version="1.0"?>
  <!DOCTYPE employees SYSTEM "employee.xsd">
  <employees>
    <emp gender='m'>
      <name>
        <last>Bissell</last>
        <first>Brian</first>
      </name>
      <position>IT Specialist</position>
      <salary>35,000</salary>
      <location>CT</location>
    </emp>
    <emp gender='m'>
      <name>
        <last>Pham</last>
        <first>Hung</first>
        <mi>Q</mi>
      </name>
      <position>Senior IT Specialist</position>

8
Tree Structure of XML Document
  • Remember that XML documents are trees

  emp
    gender (attribute)
    name
      last
      first
      mi
    position
    salary
    location

9
Query Processing: Programming Languages
  • XML documents are flat files
  • Any language with file I/O can read an XML document
  • Any language with string parsing capabilities can
    use XML data
  • Query processing is done through language syntax
  • Obvious result: querying differs from traditional
    databases
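
As a rough illustration of the point above, the following sketch treats XML as plain text and pulls out element contents with string pattern matching. The fragment and the helper name `element_texts` are assumptions made for this example; a real program would read the text from a file.

```python
import re

# Hypothetical fragment modeled on the employee sample; a real program would
# obtain it with open(path).read().
xml_text = """<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <salary>35,000</salary>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <salary>40,000</salary>
  </emp>
</employees>"""


def element_texts(text, tag):
    """Return the text between every <tag> and </tag> found by pattern matching."""
    pattern = r"<{0}>(.*?)</{0}>".format(re.escape(tag))
    return re.findall(pattern, text, flags=re.DOTALL)


last_names = element_texts(xml_text, "last")
salaries = element_texts(xml_text, "salary")
# The naive pattern does not understand attributes, so <emp gender='m'> is missed.
emps = element_texts(xml_text, "emp")

print(last_names, salaries, emps)
```

The missed `emp` elements show why plain string matching is possible but fragile, which is the motivation for dedicated parsing libraries.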

10
Query Processing: Programming Languages
  • Strategy
  • Basic file I/O through the language
  • Basic string matching to identify elements
  • Processing is possible, but not necessarily
    efficient
  • Languages have gathered XML processing tools in
    libraries
  • Xerces: Apache library for Java and C++
  • Two methods for parsing XML data
  • DOM
  • SAX

11
DOM
  • Document Object Model
  • Defined by W3C for XML, HTML, and stylesheets
  • Provides a hierarchical object view of the
    document
  • DOMParser parses through the file, then provides
    access to nodes
  • Key: Every item in an XML document is a node
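
A minimal sketch of the "everything is a node" idea, using Python's standard `xml.dom.minidom` rather than the Java DOMParser named on the slide. The one-employee fragment is an assumption cut down from the sample document.

```python
from xml.dom import minidom, Node

# Fragment written without whitespace between tags so that firstChild is
# always the element or text we expect.
doc = minidom.parseString(
    "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name></emp>"
)

emp = doc.documentElement                # Element node
gender = emp.getAttributeNode("gender")  # Attr node
name = emp.firstChild                    # Element node, child of emp
last = name.firstChild                   # Element node, child of name
text = last.firstChild                   # Text node holding "Bissell"

for node in (emp, gender, name, last, text):
    print(node.nodeType, node.nodeName, node.nodeValue)
```

The parser reads the whole document first; after that, every element, attribute, and piece of text can be reached and revisited through parent and child links.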

12
DOM Example
  Node (Element): name=emp, attribute1, child1
  Node (Attr): name=gender, value=m, parent
  Node (Element): name=name, parent, child1
  Node (Element): name=last, parent, child1
  Node (Text): value=Bissell, parent

13
SAX
  • Simple API for XML
  • Defined by XML-DEV mailing list
  • Provides an event-driven processing of the
    document
  • XMLReader parses through file and activates
    different methods and functions based on the
    elements retrieved
  • Key: Methods are defined in an interface and
    implemented in user code
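
The event-driven style can be sketched with Python's standard `xml.sax` module (the slide describes the Java XMLReader; the handler class name and the fragment below are assumptions for illustration). The parser calls the handler's methods as it meets each start tag, run of text, and end tag.

```python
import xml.sax


class LastNameHandler(xml.sax.ContentHandler):
    """Collects the text of every <last> element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.in_last = False
        self.buffer = []
        self.last_names = []

    def startElement(self, name, attrs):
        if name == "last":
            self.in_last = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_last:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "last":
            self.last_names.append("".join(self.buffer))
            self.in_last = False


handler = LastNameHandler()
xml.sax.parseString(
    b"<employees>"
    b"<emp><name><last>Bissell</last><first>Brian</first></name></emp>"
    b"<emp><name><last>Pham</last><first>Hung</first></name></emp>"
    b"</employees>",
    handler,
)
print(handler.last_names)
```

Nothing but the current element's text is held in memory, which is the trade-off against DOM.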

14
DOM versus SAX
  • SAX is primarily Java-based; DOM is defined for most
    languages
  • DOM requires storage of the entire document in
    memory; SAX processes as it reads
  • DOM mirrors a document that can be revisited;
    suited for document processing
  • SAX mirrors object lifecycles; suited for data
    processing

15
Query Processing - XPath/XSLT
  • Standard XML technologies XPath and XSLT provide
    a ready-made querying infrastructure
  • XPath identifies the location of various document
    elements
  • XSL stylesheets provide methods for transforming
    data from one format to another
  • Combining XPath and XSLT provides easy generation
    of result sets based on queries

16
XPath
  • Provides element, value, and attribute
    identification
  • employees/emp/name/first: Brian, Hung,
    Sara, Brian
  • //salary: 35,000, 40,000, 35,000, 60,000
  • count(/employees/emp): 4
  • //mi: Q
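
The path expressions above can be tried out with Python's `xml.etree.ElementTree`, which supports only a subset of XPath (there is no `count()` function, so `len()` stands in for it). The names and salaries follow the slides; because the document shown earlier is cut off, the remaining fields are left out here rather than guessed.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<employees>
  <emp><name><last>Bissell</last><first>Brian</first></name><salary>35,000</salary></emp>
  <emp><name><last>Pham</last><first>Hung</first><mi>Q</mi></name><salary>40,000</salary></emp>
  <emp><name><last>Menillo</last><first>Sara</first></name><salary>35,000</salary></emp>
  <emp><name><last>Chicos</last><first>Brian</first></name><salary>60,000</salary></emp>
</employees>"""

root = ET.fromstring(SAMPLE)  # root is the <employees> element

# Paths are written relative to the root element.
firsts = [e.text for e in root.findall("emp/name/first")]
salaries = [e.text for e in root.findall(".//salary")]
emp_count = len(root.findall("emp"))
initials = [e.text for e in root.findall(".//mi")]

print(firsts, salaries, emp_count, initials)
```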

17
XSLT
  • Stylesheet transforms data from one form into
    another
      <xsl:template match="name">
        <xsl:value-of select="first"/>
        <xsl:value-of select="last"/>
      </xsl:template>
  • Result: Brian Bissell, Hung Pham, Sara Menillo, Brian
    Chicos

18
Combine XPath and XSLT for Queries
  • Query: Find the last name and position of each
    employee named Brian
      <xsl:template match='employees'>
        <xsl:for-each select='emp'>
          <xsl:if test='name/first="Brian"'>
            <xsl:value-of select='name/last'/>
            <xsl:text></xsl:text>
            <xsl:value-of select='position'/>
            <xsl:text> </xsl:text>
          </xsl:if>
        </xsl:for-each>
      </xsl:template>
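
For comparison, the same query can be sketched in Python with `xml.etree.ElementTree`: the loop plays the role of `xsl:for-each`, the `if` plays the role of `xsl:if`, and the tuple plays the role of the two `xsl:value-of` calls. This is not an XSLT processor. The positions of Menillo and Chicos do not appear on the slides and are assumed here.

```python
import xml.etree.ElementTree as ET

# Positions for the last two employees are hypothetical placeholders.
SAMPLE = """<employees>
  <emp><name><last>Bissell</last><first>Brian</first></name><position>IT Specialist</position></emp>
  <emp><name><last>Pham</last><first>Hung</first></name><position>Senior IT Specialist</position></emp>
  <emp><name><last>Menillo</last><first>Sara</first></name><position>IT Specialist</position></emp>
  <emp><name><last>Chicos</last><first>Brian</first></name><position>Manager</position></emp>
</employees>"""

root = ET.fromstring(SAMPLE)

results = [
    (emp.findtext("name/last"), emp.findtext("position"))  # value-of name/last, position
    for emp in root.findall("emp")                          # for-each select='emp'
    if emp.findtext("name/first") == "Brian"                # if test='name/first="Brian"'
]

for last, position in results:
    print(last, position)
```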

19
Combine XPath and XSLT for Queries
  • Query: Find the average salary of all
    non-managers
      <xsl:template match='employees'>
        <xsl:variable name='running_sum'>
          <xsl:value-of select='sum(emp/salary[../position!="Manager"])'/>
        </xsl:variable>
        <xsl:variable name='running_count'>
          <xsl:value-of select='count(emp[position!="Manager"])'/>
        </xsl:variable>
        <xsl:value-of select='$running_sum div $running_count'/>
      </xsl:template>
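
The arithmetic behind this query can be checked with a small Python sketch. Two assumptions are made: the positions of Menillo and Chicos are placeholders that do not come from the slides, and the thousands separators are stripped before conversion, since a string such as "35,000" is not a valid number for XPath's `sum()` either.

```python
import xml.etree.ElementTree as ET

# Salaries follow the slides; the last two positions are hypothetical.
SAMPLE = """<employees>
  <emp><position>IT Specialist</position><salary>35,000</salary></emp>
  <emp><position>Senior IT Specialist</position><salary>40,000</salary></emp>
  <emp><position>IT Specialist</position><salary>35,000</salary></emp>
  <emp><position>Manager</position><salary>60,000</salary></emp>
</employees>"""


def to_number(text):
    """Convert a salary such as '35,000' to a float."""
    return float(text.replace(",", ""))


root = ET.fromstring(SAMPLE)
non_managers = [e for e in root.findall("emp") if e.findtext("position") != "Manager"]

running_sum = sum(to_number(e.findtext("salary")) for e in non_managers)
running_count = len(non_managers)
average = running_sum / running_count

print(running_sum, running_count, round(average, 2))
```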

20
Results: XSLT/XPath
  • Many SQL queries can be accomplished
  • XPath provides element (data) access
  • XPath provides basic functions (e.g., sum() )
  • XPath provides WHERE functionality
  • XSLT provides SELECT functionality
  • XSLT provides ORDER BY functionality (sort)
  • XSLT provides result set formatting
  • UNION functionality provided ..?

21
Querying with XPath and XSLT
  • Important questions
  • Is it sufficient?
  • Is it efficient?
  • Is there a better way?
  • The XML community needs to design a full query
    language
  • XQuery: Working Draft published 7 June 2001

22
Query Processing - XQuery
  • XML provides flexibility in representing many
    kinds of information
  • Good query language must be likewise flexible
  • Pre-XQuery languages are good for specific types
    of data
  • Goal: A small, easily implementable language in
    which queries are concise and easily understood.

23
XQuery Forms
  1. Path expressions
  2. Element constructors
  3. FLWR expressions
  4. Operator/Function expressions
  5. Conditional expressions
  6. Quantified expressions
  7. Data Type expressions

24
XQuery Path Expressions
  • Contribution of XPath
  • XQuery 1.0 and XPath 2.0 Data Model
  • document("sample1.xml")//emp/salary
  • /employees/emp/name[../@gender="f"]
  • //emp[1 TO 3]/name/first

25
XQuery Element Constructors
  • Queries can generate new elements
  • Similar to XSLT abilities
      <worker>
        name/last
        position
      </worker>

26
XQuery FLWR Expressions
  • For clause/Let clause/Where clause/Return
  • Similar to SQL
      FOR $e IN document("sample1.xml")//emp
      WHERE $e/salary > 38000
        AND $e/@gender = "f"
      RETURN $e/name
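
The FOR/WHERE/RETURN shape maps closely onto a list comprehension, as this Python sketch suggests. It is not an XQuery engine. The helper names `salary_of` and `flwr` are invented for the example, and the genders of Menillo and Chicos are assumptions, since the slides show only the first two employees in full.

```python
import xml.etree.ElementTree as ET

# Salaries follow the slides; the last two gender values are hypothetical.
SAMPLE = """<employees>
  <emp gender='m'><name><last>Bissell</last></name><salary>35,000</salary></emp>
  <emp gender='m'><name><last>Pham</last></name><salary>40,000</salary></emp>
  <emp gender='f'><name><last>Menillo</last></name><salary>35,000</salary></emp>
  <emp gender='m'><name><last>Chicos</last></name><salary>60,000</salary></emp>
</employees>"""


def salary_of(emp):
    """Read <salary> as an integer, dropping the thousands separator."""
    return int(emp.findtext("salary").replace(",", ""))


def flwr(root, min_salary, gender):
    return [
        e.find("name")                   # RETURN $e/name
        for e in root.findall(".//emp")  # FOR $e IN ...//emp
        if salary_of(e) > min_salary     # WHERE $e/salary > min_salary
        and e.get("gender") == gender    # AND $e/@gender = gender
    ]


root = ET.fromstring(SAMPLE)
slide_query = flwr(root, 38000, "f")
male_query = [n.findtext("last") for n in flwr(root, 38000, "m")]

print(slide_query, male_query)
```

Under the assumed data, the query from the slide returns no names, because the only employee marked `f` earns 35,000; switching the gender test to `m` returns Pham and Chicos.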

27
XQuery Operator/Function Expressions
  • Pre-defined and user-defined operators and
    functions
  • Still under development: Union, Intersect, Except
      FOR $e IN //employees/emp
      WHERE not(empty($e//mi))
      RETURN $e/name

28
XQuery Conditional Expressions
  • If-then-else expressions are not yet limited to
    boolean (ongoing discussion)
      FOR $e IN /employees/emp
      RETURN
        <worker>
          name
          IF ($e/position = "Manager")
          THEN <manager />
        </worker>
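
A rough Python analogue of constructing new elements with a conditional part: each `emp` yields a `worker` element that copies the employee's `name` and gains an empty `manager` child only when the position is Manager. The positions in the sample are assumptions, and this sketch does not follow XQuery evaluation rules.

```python
import copy
import xml.etree.ElementTree as ET

# Positions are hypothetical placeholders for illustration.
SAMPLE = """<employees>
  <emp><name><last>Bissell</last></name><position>IT Specialist</position></emp>
  <emp><name><last>Chicos</last></name><position>Manager</position></emp>
</employees>"""

root = ET.fromstring(SAMPLE)

workers = []
for e in root.findall("emp"):                       # FOR $e IN /employees/emp
    worker = ET.Element("worker")                   # <worker> ... </worker>
    worker.append(copy.deepcopy(e.find("name")))    # copy of the name element
    if e.findtext("position") == "Manager":         # IF ... THEN <manager />
        ET.SubElement(worker, "manager")
    workers.append(worker)

for w in workers:
    print(ET.tostring(w, encoding="unicode"))
```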

29
Quantified Expressions
  • Some/Every conditions
  • Some/Every evaluates to True or False
      FOR $e IN //employees
      WHERE SOME $p IN $e//emp/position = "Manager"
      RETURN $e
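
Python's built-in `any()` and `all()` behave like the Some and Every conditions: each reduces a test over a sequence to a single True or False. The positions in this sketch are assumed for illustration.

```python
import xml.etree.ElementTree as ET

# Positions are hypothetical placeholders.
SAMPLE = """<employees>
  <emp><position>IT Specialist</position></emp>
  <emp><position>Senior IT Specialist</position></emp>
  <emp><position>Manager</position></emp>
</employees>"""

root = ET.fromstring(SAMPLE)
positions = [p.text for p in root.findall(".//emp/position")]

some_manager = any(p == "Manager" for p in positions)   # SOME
every_manager = all(p == "Manager" for p in positions)  # EVERY

print(some_manager, every_manager)
```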

30
Data Types
  • Data Types based on those available from XML
    Schema
  • Data types can be literal ("Brian"), from
    constructor functions ( date("2001-10-11") ), or
    from casting ( CAST AS xsd:integer(24) )
  • User-defined data types are also allowable and
    parsable

31
XQuery
  • More choices than XSLT/XPath combination
  • Work in progress
  • Current W3C efforts into query language
  • Influencing the future design of the core XML
    technologies (XPath)
  • Hopes to be fully flexible for all future XML
    applications

32
Query Processing: Research
  • XQuery specification continues to undergo review
    and change
  • 6 of 7 specification documents released since
    June
  • All specifications released in 2001
  • Other avenues of research
  • Other Query languages
  • Indexing strategies
  • Implementation

33
Query Processing: Other Query Languages
  • Many query languages exist
  • Quilt (basis for XQuery)
  • W3C early languages (XML-QL, XQL)
  • Adopted traditional languages (OQL, XSQL)
  • Research papers (XML-GL, YATL, Lorel)
  • Other query languages often optimized for a
    particular subset of XML documents
  • Query language field MAY be standardizing to
    XQuery

34
Query Processing: Indexing Strategy
  • Query language is less important; better indexing
    techniques lead to efficiency
  • XISS (XML Indexing and Storage System)
  • Published September 19, 2001
  • Builds sets of indexes on XML data elements and
    attributes on initial parse of XML document
  • Lookup becomes constant-time through the various
    built indexes
  • Demonstrated successes in test runs
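
The general idea of building indexes during the initial parse can be sketched with ordinary dictionaries. This is only an illustration of index-then-lookup; it does not reproduce the actual index structures of XISS, and the function name `build_indexes` and the sample fragment are assumptions.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

SAMPLE = """<employees>
  <emp gender='m'><name><last>Bissell</last><first>Brian</first></name></emp>
  <emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name></emp>
</employees>"""


def build_indexes(root):
    """Walk the tree once, recording where each element and attribute name occurs."""
    element_index = defaultdict(list)
    attribute_index = defaultdict(list)
    for node in root.iter():  # includes the root element itself
        element_index[node.tag].append(node)
        for attr_name in node.attrib:
            attribute_index[attr_name].append(node)
    return element_index, attribute_index


root = ET.fromstring(SAMPLE)
element_index, attribute_index = build_indexes(root)

# After the single pass, each lookup is a dictionary access instead of a tree walk.
middle_initials = [n.text for n in element_index["mi"]]
gendered = attribute_index["gender"]

print(middle_initials, len(element_index["emp"]), len(gendered))
```

The cost of the full traversal is paid once at load time, and later queries by element or attribute name avoid rescanning the document.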

35
Query Processing - Implementation
  • XML is currently in state of flux
  • Standards are still being revised
  • Industry cautious before embracing a new
    technology
  • Economic slowdown may prevent new research and
    development efforts
  • XML is still waiting for its "killer app," the
    application that forces immediate acceptance

36
XML Query Processing
  • XML is a functional database storage language
  • Efficient query language needed to turn XML into
    a viable database
  • Query language solutions are being developed
  • Java/C++ hooks first developed: OK
  • XSLT/XPath implemented: GOOD
  • XQuery being designed: GREAT?
  • Future additions: ????