RankSQL: Query Algebra and Optimization for Relational Top-k Queries

1
RankSQL: Query Algebra and Optimization for
Relational Top-k Queries
  • Authors: Chengkai Li, Kevin Chen-Chuan Chang,
    Ihab F. Ilyas, Sumin Song
  • Presenter: Roman Yarovoy
  • October 3, 2007

2
Before RankSQL
  • Ranking (top-k) queries: the query result is sorted
    by rank and limited to the top k results.
  • RDBMSs lacked support for ranking.
  • Previous work studied only isolated cases of top-k
    query processing.
  • There was no way to integrate top-k operations with
    other relational operations.

3
Previous (traditional) approach
  • Query processing without ranking support:
  • Evaluate the select-project-join (SPJ) query and
    materialize the result.
  • Sort the result according to a given ranking
    function.
  • Take only the top k tuples.
  • Associated problems:
  • A total order over all results is computed, although
    only the top k are of interest.
  • Evaluating the ranking function(s) can be expensive.
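
The three steps above can be sketched in a few lines of Python to make the cost visible. This is only an illustration: the toy relations, the attribute names (`a`, `b`, `p1`, `p2`), the ranking function `score`, and the helper `naive_top_k` are all hypothetical and do not come from the paper.

```python
from itertools import product

score_calls = 0  # counts how often the ranking function is evaluated


def score(r, t):
    """Hypothetical ranking function F = p1 + p2."""
    global score_calls
    score_calls += 1
    return r["p1"] + t["p2"]


def naive_top_k(R, T, join_cond, k):
    """Traditional plan: materialize the join, sort everything, keep k rows."""
    joined = [(r, t) for r, t in product(R, T) if join_cond(r, t)]  # full SPJ result
    ranked = sorted(joined, key=lambda rt: score(*rt), reverse=True)  # total order
    return ranked[:k]


# Hypothetical toy relations.
R = [{"a": 1, "p1": 0.9}, {"a": 2, "p1": 0.4}, {"a": 3, "p1": 0.7}]
T = [{"b": 1, "p2": 0.2}, {"b": 2, "p2": 0.8}, {"b": 3, "p2": 0.1}]

top2 = naive_top_k(R, T, lambda r, t: r["a"] == t["b"], k=2)
print([r["a"] for r, _ in top2], "score evaluations:", score_calls)
```

Every joined row is scored and sorted even though only two rows are returned, which is the overhead the slide points at.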

4
Key contribution
  • Li et al. proposed:
  • Extending relational algebra to support ranking
    as a first-class database construct.
  • Consequence: a rank-aware relational query engine
    → rank-aware query optimization.

5
Top-k query Example 1
  • Input relations R and T (table contents are not
    included in the transcript).

6
Example 1 (contd)
  • SELECT *
  • FROM R r, T t
  • WHERE r.a1 = t.b1 AND r.a2 > t.b2
  • ORDER BY p1 + p2 + p3 + p4 + p5
  • LIMIT 2
  • (where F = p1 + p2 + p3 + p4 + p5)

7
Rank-relational algebra
  • There was no way to express such a query in
    relational algebra.
  • Extend relational algebra by adding rank as a
    first-class operation.
  • Based on observations of existing first-class
    constructs (e.g. selection), two requirements must
    be met to support ranking:
  • Splitting: predicate-by-predicate rank
    evaluation.
  • Interleaving: swapping the rank operator with other
    operators (i.e. ranking is not only applied after
    filtering).

8
Ranking Principle
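  • Def: Given a ranking function $F$ and a set of
    evaluated predicates $P = \{p_1, p_2, \dots, p_n\}$, the
    maximal-possible score of a tuple $t$, written
    $F_P[t]$, is the value of $F$ when each evaluated
    predicate takes its actual score on $t$ and each
    unevaluated predicate takes its maximal possible
    score.
  • Ranking Principle: If $F_P[t_1] > F_P[t_2]$, then $t_1$
    must be ranked before $t_2$.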
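
The definition above can be written out as a formula. The version below is a reconstruction rather than a copy of the slide, and it rests on two assumptions: $F$ is defined over all predicates $p_1, \dots, p_n$ of the query with $P$ a subset of them, and predicate scores are normalized to $[0, 1]$, so the maximal possible score of a single predicate is 1.

```latex
\[
F_P[t] \;=\; F(\hat p_1, \dots, \hat p_n),
\qquad
\hat p_i =
\begin{cases}
p_i[t] & \text{if } p_i \in P,\\[2pt]
1      & \text{otherwise.}
\end{cases}
\]
% Worked instance (illustrative numbers): F = p_1 + p_2 + p_3, P = \{p_1\}, p_1[t] = 0.7
\[
F_{\{p_1\}}[t] \;=\; 0.7 + 1 + 1 \;=\; 2.7
\]
```

Under these assumptions $F_P[t]$ is an upper bound on the final score of $t$, which is why ranking by it is safe.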

9
Rank-Relation
  • Def: For a monotonic scoring function
    $F(p_1, \dots, p_n)$ and a subset $P$ of
    $\{p_1, \dots, p_n\}$, a relation $R$ augmented with the
    ranking induced by $P$ is called a rank-relation,
    denoted by $R_P$.
  • An implicit attribute of $R_P$ is the score of tuple
    $t$, that is $F_P[t]$.
  • Order relationship of $R_P$:
  • For all $t_1, t_2 \in R_P$:
    $t_1 <_{R_P} t_2 \iff F_P[t_1] < F_P[t_2]$

10
Operators of rank-relations
  • The rank (or µ) operator adds a predicate p to the
    set P,
  • i.e. $\mu_p(R_P) \equiv R_{P \cup \{p\}}$.
  • Example 2: $\mu_{p_1}(R_{\{p_2\}}) \equiv R_{\{p_1, p_2\}}$, where
    $F$ is a scoring function over $(p_1, p_2, p_3)$.
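
A small Python sketch of Example 2 may help make the operator concrete. It is definitional only (it re-sorts the whole relation instead of working incrementally), and it assumes F is a sum and that scores lie in [0, 1]. The rows, the constant `PREDS`, and the helpers `bound`, `rank_relation`, and `mu` are hypothetical names introduced for illustration.

```python
PREDS = ("p1", "p2", "p3")  # hypothetical predicates, scores assumed in [0, 1]


def bound(t, evaluated, max_score=1.0):
    """F_P[t] for F = sum: real scores for predicates in P, max_score otherwise."""
    return sum(t[p] if p in evaluated else max_score for p in PREDS)


def rank_relation(rows, evaluated):
    """R_P: the rows of R in non-increasing order of F_P[t]."""
    return sorted(rows, key=lambda t: bound(t, evaluated), reverse=True)


def mu(ranked_rows, evaluated, p):
    """mu_p(R_P) = R_{P ∪ {p}}; returns the reordered rows and the new P."""
    new_evaluated = evaluated | {p}
    return rank_relation(ranked_rows, new_evaluated), new_evaluated


# Hypothetical rows of R with precomputed predicate scores.
R = [
    {"id": "t1", "p1": 0.9, "p2": 0.3, "p3": 0.5},
    {"id": "t2", "p1": 0.2, "p2": 0.8, "p3": 0.9},
    {"id": "t3", "p1": 0.5, "p2": 0.6, "p3": 0.1},
]

R_p2 = rank_relation(R, {"p2"})        # R_{p2}
R_p1p2, P = mu(R_p2, {"p2"}, "p1")     # mu_{p1}(R_{p2}) = R_{p1,p2}
print([t["id"] for t in R_p2], [t["id"] for t in R_p1p2])
```

Applying µ changes only the order (and the set P), never the membership of the relation.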

11
Extended operators

12
Example 3 Extended Join
  • $\pi_{a_1, a_2, b_2}(\sigma_c(R_{\{p_1, p_2, p_3\}} \bowtie T_{\{p_4, p_5\}}))$
  • SELECT r.a1, r.a2, t.b1
  • FROM R r, T t
  • WHERE c
  • ORDER BY F
  • LIMIT 2
  • (F is the ranking function over the predicates in
    P, and c is r.a1 + r.a2 < t.b1)

13
Extended operators (contd)
  • Note:
  • Cartesian product is defined similarly to join,
    but is not discussed in the paper.
  • The projection operator π has not changed.
  • Computation is based on both Boolean and ranking
    logical properties.
  • Perform the Boolean operations and maintain the
    order induced by all given ranking predicates.

14
Equivalence relations
  • In the extended rank-relational model, ranking is
    a first-class construct.
  • Algebraic equivalences can be derived from the
    definitions of the operators (proofs are omitted).
  • Example 4:
  • $\sigma_c(R_P) \equiv (\sigma_c R)_P$
  • $R_{P_1} \cap T_{P_2} \equiv (R \cap T)_{P_1 \cup P_2}$
  • Thus, we can interleave the rank operator with
    other operators (i.e. push µ down across
    operators).
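
The first equivalence of Example 4 can be checked on toy data. The sketch below assumes F is a sum over two predicates with scores in [0, 1]; the rows, the condition `c`, and the helpers `rank` and `select` are hypothetical and only illustrate the law on one instance, they do not prove it.

```python
def rank(rows, evaluated, preds=("p1", "p2"), max_score=1.0):
    """Order rows by non-increasing F_P[t], with F assumed to be a sum."""
    def bound(t):
        return sum(t[p] if p in evaluated else max_score for p in preds)
    return sorted(rows, key=bound, reverse=True)


def select(rows, cond):
    """Boolean selection that preserves the order of its input."""
    return [t for t in rows if cond(t)]


# Hypothetical relation R and selection condition c.
R = [
    {"a": 1, "p1": 0.9, "p2": 0.1},
    {"a": 2, "p1": 0.4, "p2": 0.7},
    {"a": 3, "p1": 0.8, "p2": 0.2},
    {"a": 4, "p1": 0.1, "p2": 0.6},
]
c = lambda t: t["a"] > 1

lhs = select(rank(R, {"p1"}), c)   # sigma_c(R_P): rank first, then filter
rhs = rank(select(R, c), {"p1"})   # (sigma_c R)_P: filter first, then rank
print(lhs == rhs, [t["a"] for t in lhs])
```

Both plans return the same ordered result, but the second one ranks fewer tuples, which is the kind of choice a rank-aware optimizer can exploit.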

15
Equivalence relations (contd)

16
Equivalence relations (contd)
  • Note:
  • Proposition 1 states that ranking can be done in
    stages (i.e. one predicate at a time).
  • By Propositions 2, 3, and 4, the operators obey
    commutative and associative laws.
  • By Propositions 4 and 5, µ can be swapped with
    other operators.

17
Incremental execution
  • Blocking operators (e.g. sort) lead to
    materialization of intermediate results.
  • Goal: avoid materialization and implement a
    pipelined execution strategy.
  • We want to split rank computation into stages and
    reduce the number of tuples considered in later
    stages.
  • We can output (i.e. advance to the next stage) a
    tuple t whenever its score is greater than or equal
    to the score of any future tuple t''.

18
Incremental execution (contd)
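  • Apply $\mu_p$ to $R_P$ and maintain a priority queue
    ordered by $F_{P \cup \{p\}}$.
  • Let $X$ be the set of tuples from the preceding stage.
  • Draw $t'$ from $X$.
  • If $F_{P \cup \{p\}}[t] \ge F_P[t']$ for the tuple $t$ at
    the head of the queue, and $F_P[t'] \ge F_P[t'']$ for
    any future $t''$ drawn from $X$,
  • then $F_{P \cup \{p\}}[t] \ge F_{P \cup \{p\}}[t'']$, because
    evaluating one more predicate cannot raise a
    maximal-possible score ($F_P[t''] \ge F_{P \cup \{p\}}[t'']$),
    and $t$ can be output (proceed to the next stage).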
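
The output rule above can be sketched as a Python generator. This is a minimal illustration under stated assumptions, not the authors' implementation: F is taken to be a sum, scores lie in [0, 1], the input arrives in non-increasing order of F_P, and the relation `W`, the constant `PREDS`, and the helpers `bound`, `idx_scan`, and `mu` are hypothetical names.

```python
import heapq
from itertools import count, islice

PREDS = ("p6", "p7", "p8")  # hypothetical predicates, scores assumed in [0, 1]


def bound(t, evaluated, max_score=1.0):
    """Maximal-possible score F_P[t], with F assumed to be a sum."""
    return sum(t[q] if q in evaluated else max_score for q in PREDS)


def idx_scan(rows, p):
    """Stand-in for an index scan: yields rows in non-increasing order of p."""
    return iter(sorted(rows, key=lambda t: t[p], reverse=True))


def mu(source, p, evaluated):
    """Incremental rank operator: consumes tuples ordered by F_P and
    emits them ordered by F_{P ∪ {p}} without sorting the whole input."""
    new_evaluated = evaluated | {p}
    heap, tie = [], count()  # max-heap via negated scores; tie avoids comparing dicts
    for t_new in source:
        threshold = bound(t_new, evaluated)  # F_P[t']: no future tuple can exceed it
        heapq.heappush(heap, (-bound(t_new, new_evaluated), next(tie), t_new))
        while heap and -heap[0][0] >= threshold:
            yield heapq.heappop(heap)[2]     # head of queue is safe to output
    while heap:                              # input exhausted: drain in order
        yield heapq.heappop(heap)[2]


# Hypothetical relation W with precomputed predicate scores.
W = [
    {"id": "w3", "p6": 0.6, "p7": 0.8, "p8": 0.9},
    {"id": "w1", "p6": 0.9, "p7": 0.2, "p8": 0.3},
    {"id": "w4", "p6": 0.3, "p7": 0.5, "p8": 0.4},
    {"id": "w2", "p6": 0.8, "p7": 0.9, "p8": 0.75},
]

plan = mu(mu(idx_scan(W, "p6"), "p7", {"p6"}), "p8", {"p6", "p7"})
top2 = [t["id"] for t in islice(plan, 2)]
print(top2)
```

Because each stage is a generator, the top operator pulls tuples on demand and the pipeline stops as soon as k results have been produced.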

19
Example 5 Top 2 of W
  • Given $F = \mathrm{AVG}(p_6, p_7, p_8)$
  • Plan: $\mathrm{idxScan}_{p_6}(W) \to \mu_{p_7} \to \mu_{p_8}$

20
Different evaluation plans
  • There exist algorithms to implement rank-aware
    operators as well as incremental evaluation.
  • Efficiency of query evaluation will now depend
    not only on the regular operators, but also on
    the rank-aware operators.
  • Due to algebraic equivalence laws, we can define
    additional evaluation plans.
  • Hence, we want a query optimizer to take
    additional execution plans into consideration.

21
Rank-aware optimizer
  • Extended algebra → extended search space.
  • Impact on the enumeration algorithm:
  • Li et al. designed a two-dimensional enumeration
    algorithm: dimension 1 is join size, dimension 2 is
    ranking predicates.
  • The algorithm is exponential in both dimensions.
  • Heuristics are applied to reduce the search space.
  • Impact on the cost model:
  • For ranking queries, it is more difficult to
    estimate the cardinality of intermediate results,
    and the accuracy of that estimate is the core of
    the cost model.
  • The authors proposed estimating cardinality by
    randomly sampling tuples.
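
To illustrate the general idea of sampling-based cardinality estimation, here is a generic Python sketch that estimates a join size from random samples of both inputs. It is not the estimator from the paper; the relations, the join condition, and the helper `estimate_join_size` are hypothetical.

```python
import random


def estimate_join_size(R, T, join_cond, n_r, n_t, rng):
    """Estimate |R join T| from simple random samples of both inputs."""
    sample_r = rng.sample(R, n_r)
    sample_t = rng.sample(T, n_t)
    matches = sum(1 for r in sample_r for t in sample_t if join_cond(r, t))
    scale = (len(R) / n_r) * (len(T) / n_t)  # undo the two sampling fractions
    return matches * scale


rng = random.Random(0)  # fixed seed so the sketch is repeatable
R = [{"a": i % 10} for i in range(200)]
T = [{"b": i % 10} for i in range(100)]
cond = lambda r, t: r["a"] == t["b"]

exact = sum(1 for r in R for t in T if cond(r, t))
estimate = estimate_join_size(R, T, cond, n_r=40, n_t=20, rng=rng)
print("exact:", exact, "estimate:", estimate)
```

The estimate only touches 40 x 20 tuple pairs instead of 200 x 100, at the price of sampling error; sampling the full inputs reproduces the exact count.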

22
Critique
  • Erroneous examples.
  • No example of tie-breaking function.
  • Bad explanation of incremental evaluation.

23
Future research directions
  • Cardinality estimation: new or improved techniques
    for random sampling over joins.
  • Dynamically determined/chosen k.
  • Exploring physical properties of rank-aware
    execution plans.