Title: Optimization in XSLT and XQuery


1
Optimization in XSLT and XQuery
  • Michael Kay

2
Challenges
  • XSLT/XQuery are high-level declarative languages:
    performance depends on good optimization
  • Performance also depends on good programming!
  • How can users write good programs if they don't
    know what the optimizer is doing?

3
What is optimization?
  • Widest sense
    • Everything that's done to make your query go fast
  • Narrower sense
    • Expression rewriting: replacing the code that you
      write with equivalent, faster code that has the
      same effect

4
Main performance contributors
  • Efficient internal coding
  • Tree model for documents
  • Streamed execution (pipelining)
  • Lazy evaluation
  • Rewrite optimizations
    • Including join optimization
  • Tail recursion
  • XSLT template rule matching

5
Databases vs. in-memory processors
  • Databases
    • 90% of optimization is about finding and using
      indexes
    • You can spend more time building the data to
      reduce query costs
    • Indexes are long-lived
    • Queries may be repeatable or one-off
  • In-memory processors
    • Loading the data is a significant part of the
      overall cost
    • Memory utilization needs to be minimized
    • Indexes, if used, are transient
    • Queries/stylesheets may be repeatable or one-off

6
The Saxon TinyTree Model
  • Requirements
    • Low memory footprint
    • Fast construction
    • Fast access paths
    • Support for document order
  • Non-requirement
    • In-situ update

7
TinyTree example
    <root>
      <a>12</a>
      <b>Prague</b>
    </root>
  • Assume whitespace is stripped

8
TinyTree key points
  • No object-per-node overhead
  • Names held as integer codes
  • Fast child navigation
  • Fast document order comparison
  • Extra information added dynamically if needed
    • Preceding-sibling pointers
    • Base URI, line numbers, etc.
    • Indexes
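
To make the "no object-per-node" and "names held as integer codes" points concrete, here is a minimal sketch of a tree held in parallel int arrays, with node numbers assigned in document order. The class name `ArrayTree`, its fields and its methods are illustrative assumptions, not Saxon's actual TinyTree layout; only element nodes are modelled, and sibling navigation is done by scanning rather than through stored pointers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical array-backed tree: one slot per node in parallel int arrays
// instead of one Java object per node. Not Saxon's real data layout.
final class ArrayTree {
    private final Map<String, Integer> nameCodes = new HashMap<>();
    private final List<String> names = new ArrayList<>();
    private int[] nameCode = new int[16];
    private int[] depth = new int[16];
    private int size = 0;

    // Appends an element node; nodes must be added in document order.
    int addElement(String name, int nodeDepth) {
        if (size == nameCode.length) {
            nameCode = Arrays.copyOf(nameCode, size * 2);
            depth = Arrays.copyOf(depth, size * 2);
        }
        Integer code = nameCodes.get(name);
        if (code == null) {
            code = names.size();          // allocate a new integer name code
            nameCodes.put(name, code);
            names.add(name);
        }
        nameCode[size] = code;
        depth[size] = nodeDepth;
        return size++;
    }

    String nameOf(int node) {
        return names.get(nameCode[node]);
    }

    // Document order comparison is a plain integer comparison.
    boolean precedes(int a, int b) {
        return a < b;
    }

    // The first child, if any, is the next node and is exactly one level deeper.
    int firstChild(int node) {
        int next = node + 1;
        return (next < size && depth[next] == depth[node] + 1) ? next : -1;
    }

    // Next sibling: scan forward past this node's descendants.
    int nextSibling(int node) {
        for (int i = node + 1; i < size && depth[i] >= depth[node]; i++) {
            if (depth[i] == depth[node]) return i;
        }
        return -1;
    }
}
```

For the example document, adding `root` at depth 0 and then `a` and `b` at depth 1 gives node numbers 0, 1 and 2, so `firstChild(0)` is node 1 and `nextSibling(1)` is node 2.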

9
Streaming (Pipelining)
  • Common practice in set-based languages
    • Functional programming languages
    • SQL
  • Each node in the expression tree can deliver its
    results incrementally to the parent node
  • Can be implemented as pull or push (Saxon uses
    both)

10
Example: filter expressions
  • Expression: $nodes[x = 1]
  • Expression tree: filter($nodes, =(x, 1))

    class FilterExpressionIterator {
        public Item next() {
            while (true) {
                Item item = base.next();
                if (item == EOS) return EOS;   // base sequence is exhausted
                if (matches(item, predicate)) return item;
            }
        }
    }
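
The same pull-mode idea can be written as a self-contained class on top of the standard `java.util.Iterator` interface. This is a sketch under assumptions: `FilterIterator` is a hypothetical name, and it uses `hasNext()` lookahead in place of the slide's `EOS` sentinel, so it is not Saxon's actual iterator class.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Illustrative pull-mode filter: items are pulled from the base iterator
// only as far as needed to find the next match.
final class FilterIterator<T> implements Iterator<T> {
    private final Iterator<T> base;
    private final Predicate<T> predicate;
    private T pending;            // next matching item, once found
    private boolean hasPending;

    FilterIterator(Iterator<T> base, Predicate<T> predicate) {
        this.base = base;
        this.predicate = predicate;
    }

    @Override
    public boolean hasNext() {
        while (!hasPending && base.hasNext()) {
            T item = base.next();
            if (predicate.test(item)) {
                pending = item;
                hasPending = true;
            }
        }
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        hasPending = false;
        return pending;
    }
}
```

Because the class is itself an `Iterator`, filters can be stacked on top of one another without building an intermediate list between them.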

11
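Example: Many-to-One Comparisons
  • Expression: x = 1
  • Expression tree: =(x, 1)

    class ManyToOneComparisonEvaluator {
        public boolean evaluate() {
            while (true) {
                Item item = lhs.next();
                if (item == EOS) return false;       // no item on the left matched
                if (item.equals(rhs)) return true;   // early exit on the first match
            }
        }
    }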

12
Benefits of Streaming
  • Saves memory
    • No memory for intermediate results
    • Allocating and de-allocating memory takes time
  • Early exit, for example in
    • (a/b/c/d)[1]
    • book[author = "Smith"]
    • exists(//@xml:space)
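
Early exit can be observed by counting how many items a consumer actually pulls from its input. The sketch below is illustrative only: `EarlyExit`, `CountingIterator` and `existsMatch` are hypothetical names, and a list of author strings stands in for the nodes that an XPath engine would be iterating over.

```java
import java.util.Iterator;
import java.util.function.Predicate;

final class EarlyExit {
    // Wraps an iterator and counts how many items have been pulled from it.
    static final class CountingIterator<T> implements Iterator<T> {
        private final Iterator<T> base;
        int pulled = 0;

        CountingIterator(Iterator<T> base) {
            this.base = base;
        }

        @Override
        public boolean hasNext() {
            return base.hasNext();
        }

        @Override
        public T next() {
            pulled++;
            return base.next();
        }
    }

    // Analogue of an existence test over a filtered sequence:
    // returns as soon as the first matching item is seen.
    static <T> boolean existsMatch(Iterator<T> items, Predicate<T> predicate) {
        while (items.hasNext()) {
            if (predicate.test(items.next())) return true;
        }
        return false;
    }
}
```

With the input Jones, Smith, Brown, Green and a test for "Smith", `existsMatch` returns true after pulling two items; the remaining two are never read.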

13
Lazy Evaluation
  • Closely associated with streaming
  • Variables and function arguments are not
    evaluated until the value is needed
  • Benefits
    • The value might never be needed
    • Only part of the value might be needed (early
      exit)
    • Memory is used for the minimum time
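
A common way to get this behaviour in Java is a memoizing thunk: the computation is wrapped in a `Supplier` and run only on first use. The `Lazy` class below is a hypothetical sketch of that general technique, not Saxon's mechanism. It is not thread-safe, and it covers only the "might never be needed" case, not partial evaluation of a sequence.

```java
import java.util.function.Supplier;

// Hypothetical memoizing thunk: the supplier runs at most once,
// and only if get() is actually called.
final class Lazy<T> implements Supplier<T> {
    private Supplier<T> compute;   // null once the value has been computed
    private T value;

    Lazy(Supplier<T> compute) {
        this.compute = compute;
    }

    @Override
    public T get() {
        if (compute != null) {
            value = compute.get();
            compute = null;        // drop the closure so it can be garbage-collected
        }
        return value;
    }
}
```

A variable bound to `new Lazy<>(() -> expensiveQuery())`, where `expensiveQuery` is a placeholder, costs nothing if no code path ever calls `get()`.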

14
Compile-time Expression Rewrites
  • General approach
    • Parse the source code into an expression tree
    • Resolve references (variables, functions)
    • Decorate the tree with attributes
      • Type of an expression
      • Dependencies of an expression
      • Other properties, e.g. whether a node-set is
        sorted
    • Scan the tree repeatedly to identify expressions
      that can be replaced by faster equivalents
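
The "scan the tree and replace" step can be sketched with a tiny expression tree and a single bottom-up rewrite. Constant folding is used here only because it is the simplest rewrite to show. The types `Expr`, `Literal`, `VarRef` and `Add` are illustrative assumptions and do not correspond to Saxon's expression classes.

```java
// Hypothetical mini expression tree with one bottom-up rewrite pass.
interface Expr {
    // Returns an equivalent expression that is no slower than this one.
    Expr optimize();
}

final class Literal implements Expr {
    final long value;

    Literal(long value) {
        this.value = value;
    }

    public Expr optimize() {
        return this;
    }
}

final class VarRef implements Expr {
    final String name;

    VarRef(String name) {
        this.name = name;
    }

    public Expr optimize() {
        return this;   // value is not known at compile time
    }
}

final class Add implements Expr {
    final Expr left;
    final Expr right;

    Add(Expr left, Expr right) {
        this.left = left;
        this.right = right;
    }

    public Expr optimize() {
        Expr l = left.optimize();
        Expr r = right.optimize();
        if (l instanceof Literal && r instanceof Literal) {
            // Both operands are constants: replace the node by its value.
            return new Literal(((Literal) l).value + ((Literal) r).value);
        }
        return new Add(l, r);
    }
}
```

Optimizing the tree for `$x + (2 + 3)` leaves the outer addition in place but replaces its right operand with the literal 5.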

15
Two kinds of rewrites
  • Rewrites that could have been done by the
    programmer
    • count(A) > 3 → exists(A[4])
  • Rewrites that use constructs not available to the
    programmer
    • A[position() = last()] → A[isLast()]
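
The first rewrite pays off because counting must read the whole sequence, whereas testing for a fourth item can stop early. A minimal Java analogue over iterators follows; the class and method names are hypothetical.

```java
import java.util.Iterator;

final class CountRewrite {
    // Direct translation of count(A) > 3: reads the entire sequence.
    static <T> boolean countGreaterThanThree(Iterator<T> a) {
        int count = 0;
        while (a.hasNext()) {
            a.next();
            count++;
        }
        return count > 3;
    }

    // Equivalent of exists(A[4]): stops as soon as a fourth item is seen.
    static <T> boolean hasFourthItem(Iterator<T> a) {
        for (int seen = 0; seen < 4; seen++) {
            if (!a.hasNext()) return false;
            a.next();
        }
        return true;
    }
}
```

Both methods return the same answer for any input, but on a ten-item sequence `hasFourthItem` consumes four items and leaves the other six unread.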

16
Some important rewrites
  • Sort removal
    • Not sorting path expressions where the result is
      already sorted
  • Constant subexpressions
    • Evaluated at compile time where possible
  • Extracting subexpressions from loops
  • Distributing WHERE conditions
  • Many ad hoc rewrites

17
Some rewrites that Saxon doesn't yet do
  • Inline expansion of variable references
  • Inline expansion of function calls
  • Detecting common subexpressions
  • Creating new global variables

18
Type Checking and its effect on performance
  • XQuery and XSLT 2.0 allow you to declare types of
    variables and functions
    • But it's not mandatory
  • Main benefit is better error detection
  • Type information can also be used by the
    optimizer
  • With Saxon, this rarely makes a big difference

19
Optimistic static type checking
  • The static type of an expression S is compared
    with the required type R
  • Possible outcomes
    • S is a subtype of R: no action needed
    • S overlaps with R: run-time type-checking code is
      generated
    • S and R are disjoint: static error reported
  • Special case
    • Types such as integer* and string* overlap (both
      allow the empty sequence)
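
The three outcomes can be sketched with a deliberately simplified type model: an atomic item type name plus an occurrence indicator, with no item-type hierarchy. `SeqType`, `CheckOutcome` and `TypeChecker` are hypothetical names; a real XPath type system has far more cases than this.

```java
// Simplified sequence type: item type name plus occurrence flags.
// (false, false) = exactly one, (true, false) = ?, (true, true) = *,
// (false, true) = +. Every such type admits a single item.
final class SeqType {
    final String itemType;
    final boolean allowsEmpty;
    final boolean allowsMany;

    SeqType(String itemType, boolean allowsEmpty, boolean allowsMany) {
        this.itemType = itemType;
        this.allowsEmpty = allowsEmpty;
        this.allowsMany = allowsMany;
    }
}

enum CheckOutcome { NO_ACTION, RUNTIME_CHECK, STATIC_ERROR }

final class TypeChecker {
    static CheckOutcome check(SeqType supplied, SeqType required) {
        boolean sameItem = supplied.itemType.equals(required.itemType);
        boolean cardinalityFits =
                (!supplied.allowsEmpty || required.allowsEmpty)
                && (!supplied.allowsMany || required.allowsMany);
        if (sameItem && cardinalityFits) {
            return CheckOutcome.NO_ACTION;       // S is a subtype of R
        }
        // Same item type: a single item belongs to both types.
        // Different item types: only the empty sequence can belong to both.
        if (sameItem || (supplied.allowsEmpty && required.allowsEmpty)) {
            return CheckOutcome.RUNTIME_CHECK;   // S overlaps with R
        }
        return CheckOutcome.STATIC_ERROR;        // S and R are disjoint
    }
}
```

In this model an integer sequence that may be empty, supplied where a string sequence that may be empty is required, yields `RUNTIME_CHECK` rather than a static error, which is the special case noted above.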

20
Join Optimization
  • Less important in XQuery than in SQL
  • Except that some people write XQuery as if it
    were SQL
  • General strategy in Saxon-SA
    • Distribute the join predicates (turn WHERE
      clauses into filter expressions)
    • Use indexed lookup for predicates where
      appropriate

21
Indexes in Saxon-SA
  • Explicit user-defined indexes
    • xsl:key
  • Implicit document-level indexes
    • //a/b/c[@id = $param]
  • Implicit sequence-level indexes
    • $abc[@id = $param]
  • Hash tables for many-to-many
    • book[keyword = $keywords]
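
The idea behind a transient index is to pay once for building a hash table over a sequence and then answer each equality predicate by lookup instead of a rescan. The sketch below shows that general technique with a hypothetical `SequenceIndex` class; it makes no claim about how Saxon-SA builds or chooses its indexes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative transient index: built once over a sequence, then used for
// repeated key lookups instead of scanning the sequence each time.
final class SequenceIndex<K, T> {
    private final Map<K, List<T>> byKey = new HashMap<>();

    SequenceIndex(Iterable<T> items, Function<T, K> keyOf) {
        for (T item : items) {
            byKey.computeIfAbsent(keyOf.apply(item), k -> new ArrayList<>()).add(item);
        }
    }

    // Items whose key equals the given value, in their original order.
    List<T> lookup(K key) {
        return byKey.getOrDefault(key, Collections.<T>emptyList());
    }
}
```

Building the index is one pass over the sequence, so it is worthwhile only when the same sequence is probed more than once, for example inside a loop or a join.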

22
Some tips for effective indexing
  • Declare your types
  • Avoid untypedAtomic
    • Use a schema
  • Use eq rather than =

23
Tail Recursion
  • See the printed paper

24
Conclusions
  • Optimization techniques are similar for XSLT and
    XQuery
    • But vary between database products and in-memory
      processors
  • Compile-time techniques
    • Type analysis
    • Expression rewriting
  • Run-time techniques
    • Streaming/pipelining
    • etc.