XAL: An XML ALgebra for Query Optimization

ADC 2002

1
XAL - An XML ALgebra for Query Optimization
  • Flavius Frasincar
  • Geert-Jan Houben
  • Cristian Pau

Databases and Hypermedia Group, Division of Computer Science
2
Contents
  • Motivation
  • XML Query Algebra Goals
  • XML Query Algebras
  • XAL
  • XAL Optimization Laws
  • XAL Heuristic Optimization Algorithm
  • XAL Query Example
  • Conclusion and Future Work

3
1. Motivation
  • Hera project: automatic hypermedia presentation
    of data residing in the heterogeneous deep web
  • Use XML technologies for querying, transforming,
    and integrating large amounts of Web data
  • Optimization of XML queries is important, hence the
    need for an XML algebra for query optimization

4
2. XML Query Algebra Goals
  • Based on W3C XML Query Data Model
  • Genericity: logical operators independent of the
    underlying storage representation
  • Optimizability: support query optimizations
  • Expressivity: express a large class of queries
  • Composability: operators are closed on the same
    data type
  • Flexibility: support various data types

5
3. XML Query Algebras
  • Lore (Stanford): specific set of logical operators
  • Beech et al. (industry): logical model, no
    optimization strategies
  • YATL (INRIA): specific data model, focus on data
    integration
  • XOM (Zhang & Dong): complete and closed, no
    optimization support
  • SAL (Beeri & Tzaban): focus on semistructured data,
    limited optimization support
  • XQuery (W3C): weak support for optimization
    (unordered forests)

6
4. XAL
  • Based on W3C XML Query Data Model
  • Reduces the impedance mismatch between databases
    and XML (query languages) by allowing a mix of
    ordered/unordered operators
  • Support for optimization (reuse the query
    optimization heuristics from relational systems)
  • Fine-grained algebra of vertices and edges
    (Genericity)
  • Composability, Flexibility, XQuery Compatibility

7
4.1. XAL Data Model
  • Rooted connected directed graph with a partial
    order relation on edges
  • Acyclic (lexical view)
  • Cyclic (semantic view)
  • Formally,

8
Properties for Vertex
9
Properties for Edge
Note: the derived properties apply to E and D edges
10
4.2. XAL Operators
  • All operators have the following form:
    o[f](x1, x2, …, xn : expression)
  • Unary operators evaluate the input to a
    collection of vertices and use the implicit map
    operation to evaluate the result
  • Closedness: all operators are closed on
    collections (supports composability)

11
Operator Semantics
  • o[f](x : expression)
  • Variable x is bound to each vertex in the
    input collection. For each such binding, f(x) is
    evaluated
  • The semantics of the operator o defines how
    the partial result (resulting from one variable
    binding) is computed from f(x)
  • The operator result is built by concatenating
    all the partial results
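
The evaluation scheme on this slide can be sketched in Python. This is only an illustration under assumptions: collections are modelled as plain lists, and `apply_operator`, `select`, and `map_op` are hypothetical names, not part of XAL.

```python
def apply_operator(f, partial_result, collection):
    """Evaluate an operator of the form o[f](x : expression).

    f              -- the function evaluated for every binding of x
    partial_result -- operator-specific rule that turns (x, f(x)) into a list
    collection     -- the input collection (a plain list in this sketch)
    """
    result = []
    for x in collection:                          # bind x to each vertex
        result.extend(partial_result(x, f(x)))    # concatenate partial results
    return result


def select(condition, collection):
    # Selection: the partial result is [x] when the condition holds, else empty.
    return apply_operator(condition, lambda x, holds: [x] if holds else [], collection)


def map_op(f, collection):
    # Map: the partial result is f(x) itself, wrapped as a singleton.
    return apply_operator(f, lambda x, fx: [fx], collection)
```

For instance, `select(lambda n: n > 2, [1, 2, 3, 4])` keeps the last two elements, and `map_op` applies its function to every element in input order.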

12
Collection
  • Generalization of list and set (collections have
    a boolean order property)
  • Similar to the mathematician's monad and the
    functional programmer's (list) comprehension
  • Monad<M>, where M is a type, is a triplet of
    functions (map<M>, unit<M>, join<M>)
  • XAL has map and join (called union) but no unit
    operator (the singleton collection is written as
    the singleton itself)
  • Collections have elements of arbitrary types

13
Operator Types
  • Extraction operators: retrieve the needed
    information from XML documents
  • Meta-operators: control the evaluation of
    expressions
  • Construction operators: build new XML documents
    from the extracted data
  • Note: two vertices are equal if they have the
    same value

14
Extraction Operators
  • Projection: π[type, name](e : expr)
  • Selection: σ[condition](e : expr)
  • Unorder: ?(e : expr)
  • Join: (x : expr) ⋈[condition] (y : expr)
  • Cartesian Product: (x : expr) × (y : expr)
  • Union: (x : expr) ∪ (y : expr)
  • Difference: (x : expr) − (y : expr)
  • Intersection: (x : expr) ∩ (y : expr)

Note (Flexibility): x and y do not have to be
union-compatible, as they must be in relational algebra
15
Projection
  • π[type, name](e : expression)
  • type: E, A, R, D, or disjunctions (|) of these
  • name: regular expression over strings
  • Example: π[E, (P|p)ainters?](e) produces all
    the target vertices of element containment (E)
    edges that have names starting with Painter,
    painter, Painters, or painters, and that
    originate from the vertices in e
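
Projection can be sketched over a graph stored as a list of edges. Everything here is an assumption made for illustration: the `Edge` record, the `project` function, the vertex identifiers, and the regular expression `(P|p)ainters?`, which is one reading of the example on this slide. `re.match` anchors at the start of the name, which matches the slide's "names starting with" wording.

```python
import re
from typing import List, NamedTuple


class Edge(NamedTuple):
    kind: str      # edge type: "E", "A", "R", or "D"
    name: str
    parent: str    # source vertex identifier
    child: str     # target vertex identifier


def project(kinds: str, name: str, vertices: List[str], edges: List[Edge]) -> List[str]:
    """Return target vertices of edges that leave `vertices`, have one of the
    given types (separated by "|"), and a name matched by the pattern."""
    allowed = set(kinds.split("|"))
    pattern = re.compile(name)
    return [
        e.child
        for v in vertices
        for e in edges
        if e.parent == v and e.kind in allowed and pattern.match(e.name)
    ]


EDGES = [
    Edge("E", "painters", "root", "v1"),
    Edge("E", "Painter", "root", "v2"),
    Edge("A", "painter", "root", "v3"),
    Edge("E", "item", "root", "v4"),
]

element_hits = project("E", r"(P|p)ainters?", ["root"], EDGES)
element_or_attribute_hits = project("E|A", r"(P|p)ainters?", ["root"], EDGES)
```

With the sample edges, the E-only projection skips the attribute edge, while the `E|A` disjunction also returns its target.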

16
Meta-operators and Construction Operators
  • Map
  • map[f](e : expression)
  • Kleene Star
  • *[f](e : expression)
  • Note: e is included in the result
  • Create vertex
  • vertex[type](value)
  • Note: for element vertices the value
    (identifier) is given by the system
  • Create edge
  • edge[type, name, parent](child)

17
An Example
  • Copy a complete graph starting from the vertex v
  • map[edge[type(e), name(e),
            vertex[type(parent(e))](value(parent(e)))]
        (vertex[type(child(e))](value(child(e))))]
    (e)
  • where e = *[parentedge(π[E|A|D, *](child(x)))]
              (x : parentedge(π[E|A|D, *](v)))

18
5. XAL Optimization Laws
  • The main factor in the execution cost of algebra
    expressions is the iteration (explicit or
    implicit map operator) over collections
  • The proposed set of optimization laws aims at
    reducing iteration size for the data extraction
    expressions
  • The laws are inspired by monad laws and
    relational algebraic optimization rules

19
  • Law 1 (Left unit)
  • If e1 is of unit type (singleton
    collection), then
  • e2(e1) = e2(v := e1)
  • Law 2 (Right unit)
  • If e2 is the identity function, i.e. e2(v) = v,
    then
  • e2(e1) = e1
  • Law 3 (Associativity)
  • (e1 o e2) o e3 = e1 o (e2 o e3)
  • Law 4 (Empty collection)
  • If e2 is the empty function, i.e. e2(v) = (),
    then
  • e2(e1) = ()
  • Law 5 (Decomposition of join)
  • e1 ⋈[condition] e2 = σ[condition](e1 × e2)
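
Laws 1, 2, 4, and 5 can be checked on ordinary Python lists. In this sketch, `iterate` stands in for the implicit map (evaluate e2 for every vertex of e1 and concatenate), and join, selection, and Cartesian product are modelled as list operations. All function names and sample data are illustrative assumptions.

```python
def iterate(e2, e1):
    """Implicit map: evaluate e2 for every v in e1 and concatenate the results."""
    return [r for v in e1 for r in e2(v)]


def cartesian(e1, e2):
    return [(x, y) for x in e1 for y in e2]


def select(cond, e):
    return [t for t in e if cond(t)]


def join(cond, e1, e2):
    """Nested-loop join that tests the condition while pairing."""
    return [(x, y) for x in e1 for y in e2 if cond((x, y))]


def double(v):
    return [v, v]


def same(pair):
    return pair[0] == pair[1]


e1 = [1, 2, 3]
e2 = [2, 3, 4]

law1 = iterate(double, [7]) == double(7)           # singleton input: no iteration needed
law2 = iterate(lambda v: [v], e1) == e1            # identity function returns the input
law4 = iterate(lambda v: [], e1) == []             # empty function yields the empty collection
law5 = join(same, e1, e2) == select(same, cartesian(e1, e2))
```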

20
  • Law 6 (Decomposition of projection)
  • If name is a regular expression that can be
    decomposed into several regular expressions n1,
    n2, …, nn and e is an unordered collection, then
  • π[name](e) = π[n1](e) ∪ π[n2](e) ∪ … ∪ π[nn](e)
  • Law 7 (Cascading of selection)
  • σ[c1 ∧ c2 ∧ … ∧ cn](e) = σ[c1](σ[c2](…(σ[cn](e))…))
  • Law 8 (Commutativity of selection)
  • σ[c1](σ[c2](e)) = σ[c2](σ[c1](e))
  • Law 9 (Commutativity of selection with
    projection)
  • If the condition c involves solely vertices
    that have incoming edges named by the regular
    expression name, then
  • π[name](σ[c(π[name])](e)) = σ[c](π[name](e))
  • Law 10 (Commutativity of selection with cartesian
    product)
  • If the condition c involves solely vertices
    from e1, then
  • σ[c](e1 × e2) = σ[c](e1) × e2
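
The selection laws 7, 8, and 10 can be checked with list-based stand-ins for selection and Cartesian product. The helper names and the sample data are assumptions for illustration only. The last two lines show why Law 10 matters: selecting before the product shrinks the collection that has to be iterated.

```python
def select(cond, e):
    return [t for t in e if cond(t)]


def cartesian(e1, e2):
    return [(x, y) for x in e1 for y in e2]


numbers = list(range(1, 21))
c1 = lambda n: n > 5
c2 = lambda n: n % 2 == 0

# Law 7: a conjunction of conditions equals a cascade of selections.
law7 = select(lambda n: c1(n) and c2(n), numbers) == select(c1, select(c2, numbers))

# Law 8: selections commute.
law8 = select(c1, select(c2, numbers)) == select(c2, select(c1, numbers))

# Law 10: a condition on e1 only can be applied before the product.
painters = ["Rembrandt", "Vermeer", "Hals"]
prices = [500, 1_500_000, 2_000_000]
starts_with_r = lambda name: name.startswith("R")
late = select(lambda pair: starts_with_r(pair[0]), cartesian(painters, prices))
early = cartesian(select(starts_with_r, painters), prices)
law10 = late == early

pairs_built_late = len(cartesian(painters, prices))                           # full product
pairs_built_early = len(cartesian(select(starts_with_r, painters), prices))   # reduced product
```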

21
  • Law 11 (Commutativity of selection with binary
    operators)
  • If θ is one of the set operators ∪, ∩, or −,
    then
  • σ[c](e1 θ e2) = σ[c](e1) θ σ[c](e2)
  • Law 12 (Commutativity of binary operators)
  • If θ is one of the set operators ×, ∪, or ∩
    and e1 and e2 are unordered collections, then
  • e1 θ e2 = e2 θ e1
  • Law 13 (Commutativity of projection with
    cartesian product)
  • If name is a regular expression that can be
    decomposed into two regular expressions name1 and
    name2, name1 involves solely vertices in e1, and
    name2 involves solely vertices in e2, then
  • π[name](e1 × e2) = π[name1](e1) × π[name2](e2)
  • Law 14 (Commutativity of projection with union)
  • π[name](e1 ∪ e2) = π[name](e1) ∪ π[name](e2)

22
6. XAL Heuristic Optimization Algorithm
  • S1. Eliminate unnecessary iterations (use Laws 1,
    2, and 4). After each following step, S1 is
    applied again.
  • S2. Unorder collections (use unorder operator).
    Collections for which order is not relevant are
    unordered.
  • S3. Decompose joins (use Law 5).
  • S4. Decompose selections (use Law 7). Break down
    selections into a cascade of selections. It
    enables moving select operations down in the
    query tree.
  • S5. Move selections down as far as possible (use
    Laws 8, 9, 10, and 11). Based on the
    commutativity of selection with other operators
    move selections down in the query tree as far as
    it is permitted by the selection condition.
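
Step S5 can be sketched on a tiny query tree, using only the rule of Law 10 (a selection moves into the side of a Cartesian product that supplies all the inputs its condition mentions). The node classes `Source`, `Product`, and `Select`, the `uses` field, and `push_down` are hypothetical names for this illustration, not XAL's own representation.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Source:
    name: str


@dataclass(frozen=True)
class Product:
    left: object
    right: object


@dataclass(frozen=True)
class Select:
    uses: FrozenSet[str]   # names of the sources the condition refers to
    label: str             # human-readable condition
    child: object


def sources(e) -> set:
    """Collect the source names that occur below a node."""
    if isinstance(e, Source):
        return {e.name}
    if isinstance(e, Product):
        return sources(e.left) | sources(e.right)
    return sources(e.child)


def push_down(e):
    """Move every selection as far down the tree as its condition permits."""
    if isinstance(e, Product):
        return Product(push_down(e.left), push_down(e.right))
    if isinstance(e, Select):
        child = push_down(e.child)
        if isinstance(child, Product):
            if e.uses <= sources(child.left):
                return Product(push_down(Select(e.uses, e.label, child.left)), child.right)
            if e.uses <= sources(child.right):
                return Product(child.left, push_down(Select(e.uses, e.label, child.right)))
        return Select(e.uses, e.label, child)
    return e


tree = Select(
    frozenset({"item"}),
    "price > 1000000",
    Product(Product(Source("painter"), Source("painting")), Source("item")),
)
optimized = push_down(tree)
```

In the sample tree the price condition mentions only `item`, so the selection ends up directly above that source and the two products sit above it.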

23
  • S6. Apply the most restrictive selections first
    (use Laws 3 and 12). Based on the commutativity
    and associativity of binary operators rearrange
    the leaf vertices so that the most restrictive
    selections apply first.
  • Note: as a selectivity criterion one can use
    the size of the collection.
  • The most restrictive selections are the
    selections that produce collections with the
    fewest elements.
  • S7. Decompose projections (use Law 6). Break down
    projections into a union of projections. It
    enables moving the project operations down in the
    query tree.
  • S8. Move projections down as far as possible (use
    Laws 13 and 14). Based on the commutativity of
    projection with other operators, move projections
    down in the query tree as far as possible.
  • S9. Identify combined operations (use composition
    laws). Identify subtrees that group operations
    that can be executed by a single program.

24
7. XAL Query Example
  • XML repository with three documents

painters.xml
<painters>
  <painter>
    <name>Rembrandt</name>
    <description>Dutch painter</description>
  </painter>
</painters>

catalogue.xml
<items>
  <item>
    <paintingid>Painting_ID01</paintingid>
    <price>1500000</price>
  </item>
</items>

paintings.xml
<paintings>
  <painting>
    <id>Painting_ID01</id>
    <name>The Stone Bridge</name>
    <author>Rembrandt</author>
  </painting>
</paintings>
25
  • Query
  • Return in alphabetical order the names of the
    painters that have a painting priced over 1 000 000
  • (the name of a painter appears in the
    <result> element as many times as the number of
    their paintings that fulfill the above condition)
  • XQuery 1.0

    <result>
      FOR $i IN document("painters.xml")/painters/painter,
          $j IN document("paintings.xml")/paintings/painting[author = $i/name],
          $k IN document("catalogue.xml")/items/item[paintingid = $j/id]
      WHERE $k/price/data() > 1000000
      RETURN $i/name
      SORTBY ./data()
    </result>
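
The meaning of the query can be illustrated by evaluating it naively in Python with the standard `xml.etree.ElementTree` module. This is a sketch of the unoptimized nested-loop reading, not of XAL; the function name `expensive_painters` and the inlined documents (taken from the three-document repository shown earlier) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

PAINTERS_XML = (
    "<painters><painter><name>Rembrandt</name>"
    "<description>Dutch painter</description></painter></painters>"
)
PAINTINGS_XML = (
    "<paintings><painting><id>Painting_ID01</id>"
    "<name>The Stone Bridge</name><author>Rembrandt</author></painting></paintings>"
)
CATALOGUE_XML = (
    "<items><item><paintingid>Painting_ID01</paintingid>"
    "<price>1500000</price></item></items>"
)


def expensive_painters(painters_xml, paintings_xml, catalogue_xml, threshold=1_000_000):
    """Names of painters, one per qualifying painting, in alphabetical order."""
    painters = ET.fromstring(painters_xml).findall("painter")
    paintings = ET.fromstring(paintings_xml).findall("painting")
    items = ET.fromstring(catalogue_xml).findall("item")
    names = [
        i.findtext("name")
        for i in painters
        for j in paintings
        if j.findtext("author") == i.findtext("name")
        for k in items
        if k.findtext("paintingid") == j.findtext("id")
        if int(k.findtext("price")) > threshold
    ]
    return sorted(names)


result = expensive_painters(PAINTERS_XML, PAINTINGS_XML, CATALOGUE_XML)
```

The three nested loops mirror the three FOR bindings, which is exactly the triple Cartesian product that the optimization steps try to shrink.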

26
  • Input
  • painters.xml: 3 painters (1, 2, 3)
  • paintings.xml: 100 paintings for painter 1,
    150 paintings for painter 2, and
    100 paintings for painter 3
  • catalogue.xml: only painter 1 has paintings (20 of
    them) more expensive than 1 000 000; all the other
    paintings are below 1 000 000

27
  • Initial Query Tree
  • Output is alphabetically ordered!
  • Cartesian product:
    3 x 350 x 350 = 367 500 elements

28
  • Optimization I
  • Step 2: Unorder collections
    (commutativity of XAL binary operators)
  • Step 4: Decompose selections
  • Step 5: Move selections down as far as possible
  • Cartesian products:
    3 x 350 + 350 x 20 = 8 050 elements

29
  • Optimization II
  • Step 6: Apply the most restrictive selections
    first
    (switch positions of painter and item)
  • Cartesian products:
    20 x 350 + 20 x 3 = 7 060 elements
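
The element counts for the three query trees can be re-derived with a few lines of arithmetic. The sketch assumes that the catalogue holds one item per painting (350 items, as the 3 x 350 x 350 figure implies) and that each join keeps only the matching pairs before the next product is built; the variable names are illustrative.

```python
PAINTERS = 3
PAINTINGS = 100 + 150 + 100      # paintings of painters 1, 2, and 3
ITEMS = PAINTINGS                # assumption: one catalogue item per painting
EXPENSIVE_ITEMS = 20             # items priced above 1 000 000

# Initial tree: one product over all three collections.
initial = PAINTERS * PAINTINGS * ITEMS

# Optimization I: painter x painting first (every painting matches one painter,
# so 350 pairs survive), then those pairs x the 20 selected items.
optimization_1 = PAINTERS * PAINTINGS + PAINTINGS * EXPENSIVE_ITEMS

# Optimization II: the 20 selected items x painting first (20 pairs survive),
# then those pairs x the 3 painters.
optimization_2 = EXPENSIVE_ITEMS * PAINTINGS + EXPENSIVE_ITEMS * PAINTERS
```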

30
8. Conclusion and Future Work
  • XAL provides an elegant way (by applying the
    unorder operator) to reuse the heuristic
    optimization algorithm from relational queries
  • Investigate new optimization laws that take
    advantage of the XML specific features (e.g. tree
    structure, internal references)
  • Build a translation scheme from XQuery to XAL,
    exploring the power of expression of XAL