Introduction to XML Algebra - PowerPoint PPT Presentation

About This Presentation
Title:

Introduction to XML Algebra

Description:

Goals of Niagara Algebra. Be independent of schema information ... Optimization with Niagara. Optimizer based on Niagara algebra: ... – PowerPoint PPT presentation

Slides: 37
Provided by: webC
Learn more at: http://web.cs.wpi.edu
Transcript and Presenter's Notes

Title: Introduction to XML Algebra


1
Introduction to XML Algebra
  • Based on talk prepared for CS561 by Wan Liu and
    Bintou Kane

2
Data Model
  • Data model: the core data structures and data types
    supported by a DBMS
  • The relational model is a table-based (set-oriented)
    data model
  • The XML format is a tree-structured hierarchical
    model

3
Why XML Algebra?
  • It is common to translate a query language into
    an algebra.
  • First, the algebra is used to give a semantics
    for the query language.
  • Second, the algebra is used to support query
    optimization.

4
XML Algebra History
  • Lore Algebra (August 1999)
    -- Stanford University
  • IBM Algebra (September 1999)
    -- Oracle, IBM, Microsoft Corp.
  • YAT Algebra (May 2000)
  • AT&T Algebra (June 2000)
    -- AT&T Bell Labs
  • Niagara Algebra (2001)
    -- University of Wisconsin-Madison

5
NIAGARA
  • Title: Following the Paths of XML Data: An
    Algebraic Framework for XML Query Evaluation
  • By Leonidas Galanis, Efstratios Viglas, David
    J. DeWitt, Jeffrey F. Naughton, and David Maier
  • Univ. of Wisconsin

6
Outline
  • Concepts of Niagara Algebra
  • Operations
  • Optimization

7
Goals of Niagara Algebra
  • Be independent of schema information
  • Query on both structure and content
  • Generate simple, flexible, yet powerful algebraic
    expressions
  • Allow re-use of traditional optimization
    techniques

8
Example XML Source Documents
Invoice.xml
<Invoice_Document>
  <invoice No="1">
    <account_number>2</account_number>
    <carrier>ATT</carrier>
    <total>0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>1.20</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>ATT</carrier>
    <total>0.75</total>
  </invoice>
</Invoice_Document>

Customer.xml
<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>

9
XML Data Model and Tree Graph
  • Example

<Invoice_Document>
  <invoice>
    <number>2</number>
    <carrier>ATT</carrier>
    <total>0.25</total>
  </invoice>
  <invoice>
    <number>1</number>
    <carrier>Sprint</carrier>
    <total>1.20</total>
  </invoice>
</Invoice_Document>

[Figure: Ordered tree graph of the semistructured data. The root Invoice_Document has two Invoice children; each has number, carrier and total leaves, with values (2, ATT, 0.25) and (1, Sprint, 1.20).]

10
XML Data Model [GVDNM01]
  • Collection of bags of vertices.
  • Vertices in a bag have no order.
  • Example

Bag: [Root invoice.xml, invoice, invoice.account_number]
  Root invoice.xml
  invoice                  <invoice> invoice-element-content </invoice>
  invoice.account_number   <account_number> element-content </account_number>

11
Data Model
  • Bag elements are reachable by path expressions.
  • A path expression consists of two parts:
  • An entry point
  • A relative forward part
  • Example: account_number:invoice (entry point invoice,
    relative forward part account_number)

12
Operators
  • Source S, Follow Φ, Select σ, Join ⋈, Rename
    ρ, Expose ε, Vertex ν, Group γ, Union ∪,
    Intersection ∩, Difference −, Cartesian Product
    ×.

13
Source Operator S
  • Input: a list of documents
  • Output: a collection of singleton bags
  • Examples:
  • S(*): all known XML documents
  • S(invoice.xml): all XML documents whose
    filename matches invoice.xml
  • S(*, schema.dtd): all known XML documents that
    conform to schema.dtd

14
Follow operator Φ
  • Input: a path expression in entry point notation
  • Functionality: extracts the vertices reachable by
    the path expression
  • Output: a new bag that consists of the extracted
    vertex plus all contents of the original bag (in
    the case of unnesting follow)
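
The unnesting behaviour described above can be sketched in Python. This is only an illustration, not Niagara's implementation: a bag is modelled as a plain dict from entry-point name to an ElementTree element, and the helper name `follow` and its `name` parameter are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

INVOICE_XML = """
<Invoice_Document>
  <invoice><account_number>2</account_number><carrier>ATT</carrier><total>0.25</total></invoice>
  <invoice><account_number>1</account_number><carrier>Sprint</carrier><total>1.20</total></invoice>
  <invoice><account_number>1</account_number><carrier>ATT</carrier><total>0.75</total></invoice>
</Invoice_Document>
"""

def follow(bags, forward, entry, name):
    """Hypothetical unnesting follow: for every vertex reached from
    bag[entry] through the child tag `forward`, emit a copy of the bag
    with that vertex added under the key `name`."""
    out = []
    for bag in bags:
        for vertex in bag[entry].findall(forward):
            new_bag = dict(bag)      # keep all contents of the original bag
            new_bag[name] = vertex   # add the extracted vertex
            out.append(new_bag)
    return out

# A singleton bag holding the document root, as a source operator would produce.
root_bags = [{"Root invoice.xml": ET.fromstring(INVOICE_XML.strip())}]
invoice_bags = follow(root_bags, "invoice", "Root invoice.xml", "invoice")
carrier_bags = follow(invoice_bags, "carrier", "invoice", "invoice.carrier")
carriers = [bag["invoice.carrier"].text for bag in carrier_bags]
```

Each resulting bag holds the root, the invoice and the invoice.carrier vertex, matching the shape [Root invoice.xml, invoice, invoice.carrier].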

15
Follow operator (Example)
Output bag: [Root invoice.xml, invoice, invoice.carrier]
  invoice           <invoice> invoice-element-content </invoice>
  invoice.carrier   <carrier> carrier-element-content </carrier>

Unnesting Follow: Φ(carrier:invoice)

Input bag: [Root invoice.xml, invoice]
  invoice           <invoice> invoice-element-content </invoice>

16
Select operator σ
  • Input: a set of bags
  • Functionality: filters the bags of a collection
    using a predicate
  • Output: the set of bags that satisfy the
    predicate
  • Predicate: logical operators (∧, ∨, ¬) or simple
    qualifications (>, <, ≥, ≤, =, ≠)
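
A small Python sketch of this filtering step follows. It assumes bags are dicts from entry-point name to ElementTree element; `select` and `carrier_is` are hypothetical helper names, not part of Niagara.

```python
import xml.etree.ElementTree as ET

def select(bags, predicate):
    """Keep only the bags for which the predicate holds."""
    return [bag for bag in bags if predicate(bag)]

def carrier_is(name):
    """Build a simple qualification: invoice.carrier = name."""
    def predicate(bag):
        carrier = bag["invoice"].findtext("carrier")
        return carrier is not None and carrier.strip() == name
    return predicate

doc = ET.fromstring(
    "<Invoice_Document>"
    "<invoice><account_number>2</account_number><carrier>ATT</carrier><total>0.25</total></invoice>"
    "<invoice><account_number>1</account_number><carrier>Sprint</carrier><total>1.20</total></invoice>"
    "<invoice><account_number>1</account_number><carrier>ATT</carrier><total>0.75</total></invoice>"
    "</Invoice_Document>"
)
invoice_bags = [{"Root invoice.xml": doc, "invoice": inv} for inv in doc.findall("invoice")]
sprint_bags = select(invoice_bags, carrier_is("Sprint"))
sprint_totals = [bag["invoice"].findtext("total") for bag in sprint_bags]
```

Logical operators can be layered on top by combining predicates with ordinary `and`, `or` and `not`.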

17
Select operator (Example)
Output bag: [Root invoice.xml, invoice, ...]
  invoice   <invoice> invoice-element-content </invoice>

Select: σ invoice.carrier = Sprint

Input bags: [Root invoice.xml, invoice, ...], [Root invoice.xml, invoice, ...]
  invoice   <invoice> invoice-element-content </invoice>
  invoice   <invoice> invoice-element-content </invoice>

18
Join operator
  • Input: two collections of bags
  • Functionality: joins the two collections based on
    a predicate
  • Output: the concatenation of the pairs of bags that
    satisfy the predicate
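
As a rough illustration, the join can be sketched as a nested loop over two collections of bags. The nested loop is a stand-in chosen for brevity, not a claim about how Niagara evaluates joins; bags are again assumed to be dicts, and `join` and `same_account` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def join(left, right, predicate):
    """Concatenate every (left, right) pair of bags that satisfies the predicate."""
    return [{**l, **r} for l in left for r in right if predicate(l, r)]

inv_doc = ET.fromstring(
    "<Invoice_Document>"
    "<invoice><account_number>2</account_number><carrier>ATT</carrier><total>0.25</total></invoice>"
    "<invoice><account_number>1</account_number><carrier>Sprint</carrier><total>1.20</total></invoice>"
    "<invoice><account_number>1</account_number><carrier>ATT</carrier><total>0.75</total></invoice>"
    "</Invoice_Document>"
)
cust_doc = ET.fromstring(
    "<Customer_Document>"
    "<customer><account>1</account><name>Tom</name></customer>"
    "<customer><account>2</account><name>George</name></customer>"
    "</Customer_Document>"
)
invoice_bags = [{"invoice": inv} for inv in inv_doc.findall("invoice")]
customer_bags = [{"customer": c} for c in cust_doc.findall("customer")]

def same_account(l, r):
    """Join predicate: the invoice account number equals the customer account."""
    return (l["invoice"].findtext("account_number").strip()
            == r["customer"].findtext("account").strip())

joined = join(invoice_bags, customer_bags, same_account)
pairs = [(b["invoice"].findtext("total"), b["customer"].findtext("name")) for b in joined]
```

Each joined bag contains both the invoice and the customer vertex, so later operators can reach either side.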

19
Join operator (Example)
Output bag: [Root invoice.xml, invoice, Root customer.xml, customer]
  invoice    <invoice> invoice-element-content </invoice>
  customer   <customer> customer-element-content </customer>

Join: account_number:invoice = number:customer

Input bag (left): [Root invoice.xml, invoice]
  invoice    <invoice> invoice-element-content </invoice>
Input bag (right): [Root customer.xml, customer]
  customer   <customer> customer-element-content </customer>

20
Expose operator ε
  • Input: a list of path expressions of the vertices
    to be exposed
  • Output: a set of bags that contain the vertices in
    the parameter list, in the same order
21
Expose operator (Example)
Output bag: [Root invoice.xml, invoice.bill_period, invoice.carrier]
  invoice.bill_period   <bill_period> bill_period-element-content </bill_period>
  invoice.carrier       <carrier> carrier-element-content </carrier>

Expose: ε(bill_period, carrier)

Input bag: [Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]
  invoice               <invoice> invoice-element-content </invoice>
  invoice.carrier       <carrier> carrier-element-content </carrier>
  invoice.bill_period   <bill_period> bill_period-element-content </bill_period>

22
Vertex operator ν
  • Creates the actual XML vertex that will encompass
    everything created by an expose operator
  • Example:

ν(Customer_invoice) ε(ν(account) invoice.account_number,
                      ν(inv_total) invoice.total)

23
Other operators
  • Group γ is used for arbitrary grouping of
    elements based on their values
  • Aggregate functions can be used with the group
    operator (e.g., average)
  • Rename ρ changes the entry point annotation of
    the elements of a bag.
  • Example: ρ(invoice.bill_period, date)

24
Example XML Source Documents
Invoice.xml
<Invoice_Document>
  <invoice>
    <account_number>2</account_number>
    <carrier>ATT</carrier>
    <total>0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>1.20</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <total>0.75</total>
  </invoice>
  <auditor>maria</auditor>
</Invoice_Document>

Customer.xml
<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>

25
XQuery Example
  • List the account number, customer name, and invoice
    total for all invoices that have carrier
    Sprint.

FOR $i IN (invoices.xml)//invoice,
    $c IN (customers.xml)//customer
WHERE $i/carrier = "Sprint" AND
      $i/account_number = $c/account
RETURN
  <Sprint_invoices>
    $i/account_number,
    $c/name,
    $i/total
  </Sprint_invoices>

26
Example XQuery Output
<Sprint_invoices>
  <account_number>1</account_number>
  <name>Tom</name>
  <total>1.20</total>
</Sprint_invoices>

27
Algebra Tree Execution
Expose (.account_number, .name, .total)                 result: account_number, name, total
  Join (.invoice.account_number = .customer.account)    result: invoice(2) customer(1)
    Select (carrier = Sprint)                           result: invoice(2)
      Follow (.invoice)                                 result: invoice(1), invoice(2), invoice(3)
        Source (invoices.xml)
    Follow (.customer)                                  result: customer(1), customer(2)
      Source (customers.xml)
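
The plan above can be traced end to end with a short Python sketch. It is illustrative only: bags are modelled as dicts, every helper name (`source`, `follow`, `select`, `join`, `expose_as_vertex`, `text`) is an assumption made for this example, and expose and vertex are folded into one step for brevity.

```python
import xml.etree.ElementTree as ET

INVOICES = """<Invoice_Document>
  <invoice><account_number>2</account_number><carrier>ATT</carrier><total>0.25</total></invoice>
  <invoice><account_number>1</account_number><carrier>Sprint</carrier><total>1.20</total></invoice>
  <invoice><account_number>1</account_number><total>0.75</total></invoice>
  <auditor>maria</auditor>
</Invoice_Document>"""
CUSTOMERS = """<Customer_Document>
  <customer><account>1</account><name>Tom</name></customer>
  <customer><account>2</account><name>George</name></customer>
</Customer_Document>"""

def source(filename, xml_text):
    """Source: a collection with one singleton bag holding the document root."""
    return [{"Root " + filename: ET.fromstring(xml_text)}]

def follow(bags, forward, entry, name):
    """Unnesting follow: one output bag per child `forward` of bag[entry]."""
    return [{**bag, name: v} for bag in bags for v in bag[entry].findall(forward)]

def text(bag, entry, child):
    """Trimmed text of a child element, or None when the child is absent."""
    value = bag[entry].findtext(child)
    return value.strip() if value is not None else None

def select(bags, predicate):
    return [bag for bag in bags if predicate(bag)]

def join(left, right, predicate):
    return [{**l, **r} for l in left for r in right if predicate(l, r)]

def expose_as_vertex(bags, tag, paths):
    """Expose the listed children in order and wrap them in a new vertex."""
    results = []
    for bag in bags:
        wrapper = ET.Element(tag)
        for entry, child in paths:
            ET.SubElement(wrapper, child).text = text(bag, entry, child)
        results.append(wrapper)
    return results

invoices = follow(source("invoices.xml", INVOICES), "invoice", "Root invoices.xml", "invoice")
customers = follow(source("customers.xml", CUSTOMERS), "customer", "Root customers.xml", "customer")
sprint = select(invoices, lambda b: text(b, "invoice", "carrier") == "Sprint")
joined = join(sprint, customers,
              lambda l, r: text(l, "invoice", "account_number") == text(r, "customer", "account"))
result = expose_as_vertex(joined, "Sprint_invoices",
                          [("invoice", "account_number"), ("customer", "name"), ("invoice", "total")])
rows = [[(child.tag, child.text) for child in vertex] for vertex in result]
```

With the example documents, `rows` holds a single entry for account 1, Tom and 1.20, which agrees with the example XQuery output.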
28
Optimization with Niagara
  • Optimizer based on Niagara algebra
  • Use the operators more efficiently
  • Produce simpler expressions by combining
    operations

29
Language Convention
  • A and B are path expressions
  • A < B: path expression A is a prefix of B
  • A ∩ B: the greatest common prefix of paths A and B
  • - denotes the null path expression

30
Heuristics using Rewrite Rules
  • Allow optimization based on path selectivity
  • Apply to the unnesting follow operation Φµ

31
Interchangeability of Follow operation
  • Φµ(A) Φµ(B) = Φµ(B) Φµ(A)
  • TRUE when there exists C such that C < A, C < B and
    C = A ∩ B
  • or when A ∩ B = -
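
The side condition of this rule can be checked mechanically. The sketch below assumes paths are written as dotted strings starting at the entry point (for example `invoice.carrier` for the forward part carrier relative to invoice); that representation and the function names are assumptions for illustration.

```python
def common_prefix(a, b):
    """Greatest common prefix of two dotted paths ('' stands for the null path)."""
    steps = []
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        steps.append(x)
    return ".".join(steps)

def is_strict_prefix(c, a):
    """True when path c is a proper prefix of path a."""
    c_steps, a_steps = c.split("."), a.split(".")
    return len(c_steps) < len(a_steps) and a_steps[:len(c_steps)] == c_steps

def follows_commute(a, b):
    """Two unnesting follows may be swapped when the paths share no prefix,
    or when their greatest common prefix is a strict prefix of both."""
    c = common_prefix(a, b)
    if c == "":
        return True
    return is_strict_prefix(c, a) and is_strict_prefix(c, b)

same_entry = follows_commute("invoice.acc_Num", "invoice.carrier")
disjoint = follows_commute("invoice.acc_Num", "customer.acc_Num")
dependent = follows_commute("invoice", "invoice.carrier")
```

The third call returns False because one path is a prefix of the other, so the second follow depends on the vertex produced by the first.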

32
Application of Rule on Invoice
  • Φµ(acc_Num:invoice) Φµ(carrier:invoice)
  • ≡
  • Φµ(carrier:invoice) Φµ(acc_Num:invoice)

33
Application of Rule on Invoice
  • Φµ(acc_Num:invoice) Φµ(carrier:invoice)
  • ≡
  • Φµ(carrier:invoice) Φµ(acc_Num:invoice)
  • Equivalent because both share the common prefix
    invoice.
  • Case: A ∩ B = invoice

34
Benefit of Rule Application
  • NOTE: let us assume that acc_Num is required for
    each invoice element, while
  • carrier is not required for an invoice element
  • THEN
  • Φµ(acc_Num:invoice) Φµ(carrier:invoice)
  • ≡
  • Φµ(carrier:invoice) Φµ(acc_Num:invoice)
  • Which algebra tree do we then prefer?
  • Φµ(acc_Num:invoice) Φµ(acc_Num:customer)
  • make more sense than Why?

35
Discussion
  • Reduction of input size by the first
    sub-operation
  • Φµ(carrier:invoice)
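
The effect on intermediate result size can be seen with a small Python sketch. It uses the example invoice data in which one invoice has no carrier; the dict-based bags and the `follow` helper are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Invoice_Document>"
    "<invoice><account_number>2</account_number><carrier>ATT</carrier><total>0.25</total></invoice>"
    "<invoice><account_number>1</account_number><carrier>Sprint</carrier><total>1.20</total></invoice>"
    "<invoice><account_number>1</account_number><total>0.75</total></invoice>"
    "</Invoice_Document>"
)

def follow(bags, forward, entry):
    """Hypothetical unnesting follow: bags with no matching child are dropped."""
    return [{**bag, entry + "." + forward: v}
            for bag in bags for v in bag[entry].findall(forward)]

invoices = [{"invoice": inv} for inv in doc.findall("invoice")]

# Order 1: follow the required element first, then the optional one.
acc_first = follow(invoices, "account_number", "invoice")
acc_then_carrier = follow(acc_first, "carrier", "invoice")

# Order 2: follow the optional element first, then the required one.
carrier_first = follow(invoices, "carrier", "invoice")
carrier_then_acc = follow(carrier_first, "account_number", "invoice")

intermediate_sizes = (len(acc_first), len(carrier_first))
final_sizes = (len(acc_then_carrier), len(carrier_then_acc))
```

Both orders end with the same number of bags, but following the optional carrier element first leaves fewer bags for the second follow to process.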

36
  • Should we, and can we, apply the rule to the expression below?
  • Φµ(acc_Num:invoice) Φµ(acc_Num:customer)

37
  • acc_Num:invoice and
  • acc_Num:customer
  • are two totally different paths
  • Case: A ∩ B = -
  • So yes, the rule is valid.