RankSQL: Query Algebra and Optimization for Relational Top-k Queries

1
RankSQL: Query Algebra and Optimization for Relational Top-k Queries

Authors: Chengkai Li, Kevin Chen-Chuan Chang, Ihab F. Ilyas, Sumin Song
Presenter: Roman Yarovoy
October 3, 2007

2
Before RankSQL

Ranking (top-k) queries: the query result is sorted by rank and limited to the top k results.
Support for ranking was lacking in RDBMSs.
Previously, only isolated cases of top-k query processing were studied.
There was no way to integrate top-k operations with other relational operations.

3
Previous (traditional) approach

Query processing without ranking support:
Evaluate the select-project-join (SPJ) query and materialize the result.
Sort the result according to a given ranking function.
Take only the top k tuples.
Associated problems:
There is no interest in a total order of all the results.
Evaluating the ranking function(s) can be expensive.
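The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy relations, the join condition `r.a1 == t.b1`, and the `score` function are hypothetical stand-ins, not taken from the paper. The counter shows that the ranking function is evaluated on every joined tuple even though only two results are kept.

```python
# Hypothetical toy relations; attribute names follow the slides' R(a1, a2) and T(b1, b2).
R = [{"a1": 1, "a2": 5}, {"a1": 2, "a2": 7}, {"a1": 3, "a2": 9}]
T = [{"b1": 1, "b2": 4}, {"b1": 2, "b2": 8}, {"b1": 3, "b2": 2}]

score_calls = 0

def score(pair):
    """Stand-in ranking function (assumed); counts how often it is evaluated."""
    global score_calls
    score_calls += 1
    r, t = pair
    return r["a2"] + t["b2"]

def traditional_top_k(k):
    # 1. Evaluate the SPJ query and materialize the result.
    joined = [(r, t) for r in R for t in T if r["a1"] == t["b1"]]
    # 2. Sort the whole result by the ranking function.
    ranked = sorted(joined, key=score, reverse=True)
    # 3. Keep only the top k tuples.
    return ranked[:k], len(joined)

top2, materialized = traditional_top_k(2)
print([r["a1"] for r, _ in top2], materialized, score_calls)
```

Every materialized tuple is scored and sorted, which is the cost the rank-aware approach tries to avoid.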

4
Key contribution

Li et al. proposed:
Extending relational algebra to support ranking as a first-class database construct.
Consequence: a rank-aware relational query engine → rank-aware query optimization.

5
Top-k query: Example 1

6
Example 1 (contd)

SELECT *
FROM R r, T t
WHERE r.a1 = t.b1 AND r.a2 > t.b2
ORDER BY p1 + p2 + p3 + p4 + p5
LIMIT 2
(where F = p1 + p2 + p3 + p4 + p5)

7
Rank-relational algebra

There was no way to express such a query in relational algebra.
Extend relational algebra by adding rank as a first-class operation.
Based on observations of existing first-class constructs (e.g. selection), two requirements must be met to support ranking:
Splitting: predicate-by-predicate rank evaluation.
Interleaving: swapping the rank operator with other operators (i.e. ranking is not only applied after filtering).

8
Ranking Principle

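Def: Given a ranking function $F(p_1, \ldots, p_n)$ and a set of evaluated predicates $P \subseteq \{p_1, p_2, \ldots, p_n\}$, the maximal-possible score of a tuple $t$ is defined as
$\bar{F}_P[t] = F(p_1, \ldots, p_n)$, where $p_i = p_i[t]$ if $p_i \in P$ and $p_i = 1$ (the maximal predicate score) otherwise.
Ranking Principle: If $\bar{F}_P[t_1] > \bar{F}_P[t_2]$, then $t_1$ must be ranked before $t_2$.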
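A small sketch of the maximal-possible score may help here. It assumes that predicate scores lie in [0, 1], so an unevaluated predicate is replaced by 1.0, and that F is a plain sum; the helper name `max_possible_score` and the sample tuples are hypothetical.

```python
def max_possible_score(F, predicates, evaluated, t, max_pred_score=1.0):
    """Upper bound on t's final score given only the predicates in `evaluated`."""
    values = [
        pred(t) if name in evaluated else max_pred_score
        for name, pred in predicates.items()
    ]
    return F(values)

# Hypothetical predicates that simply read a precomputed score from the tuple.
predicates = {name: (lambda t, n=name: t[n]) for name in ("p1", "p2", "p3")}

t1 = {"p1": 0.75, "p2": 0.25, "p3": 0.5}
t2 = {"p1": 0.5, "p2": 0.75, "p3": 0.75}

# With only p1 evaluated, t1 has the higher bound and must be ranked first.
partial = [max_possible_score(sum, predicates, {"p1"}, t) for t in (t1, t2)]
# With every predicate evaluated, the bound equals the true score.
final = [max_possible_score(sum, predicates, {"p1", "p2", "p3"}, t) for t in (t1, t2)]
print(partial, final)
```

The order of t1 and t2 flips once all predicates are known, which is why the ranking is always stated with respect to the evaluated set P.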

9
Rank-Relation

Def: For a monotonic scoring function $F(p_1, \ldots, p_n)$ and a subset $P$ of $\{p_1, \ldots, p_n\}$, a relation $R$ augmented with the ranking induced by $P$ is called a rank-relation, denoted by $R_P$.
The implicit attribute of $R_P$ is the score of tuple $t$, that is $\bar{F}_P[t]$.
Order relationship of $R_P$:
For all $t_1, t_2 \in R_P$: $t_1 <_{R_P} t_2 \iff \bar{F}_P[t_1] < \bar{F}_P[t_2]$.

10
Operators of rank-relations

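The rank (or µ) operator adds a predicate $p$ to the set $P$,
i.e. $\mu_p(R_P) \equiv R_{P \cup \{p\}}$.
Example 2: $\mu_{p_1}(R_{\{p_2\}}) \equiv R_{\{p_1, p_2\}}$, where $F$ is a scoring function over $(p_1, p_2, p_3)$.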
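The effect of the rank operator in Example 2 can be sketched as follows. This is a non-incremental illustration only: it assumes F is the sum of p1, p2, p3, that unevaluated predicates count as 1.0, and that a rank-relation is modelled as a pair (ordered tuples, evaluated predicate set). The data and function names are hypothetical.

```python
ALL_PREDS = ("p1", "p2", "p3")  # F is assumed to be the sum of these three

def upper_bound(t, P):
    """Maximal-possible score of t when only the predicates in P are evaluated."""
    return sum(t[p] if p in P else 1.0 for p in ALL_PREDS)

def mu(p, rank_relation):
    """Add predicate p to the evaluated set and re-order by the new bound."""
    tuples, P = rank_relation
    new_P = P | {p}
    ordered = sorted(tuples, key=lambda t: upper_bound(t, new_P), reverse=True)
    return ordered, new_P

# R ranked by p2 only (bounds 3.0, 2.75, 2.5).
R_p2 = (
    [
        {"id": "a", "p1": 0.25, "p2": 1.0, "p3": 0.5},
        {"id": "b", "p1": 0.75, "p2": 0.75, "p3": 0.5},
        {"id": "c", "p1": 0.5, "p2": 0.5, "p3": 0.5},
    ],
    frozenset({"p2"}),
)

ordered, new_P = mu("p1", R_p2)
ids_before = [t["id"] for t in R_p2[0]]
ids_after = [t["id"] for t in ordered]
print(ids_before, ids_after, sorted(new_P))
```

Membership is unchanged; only the order and the evaluated predicate set differ between the input and the output.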

11
Extended operators
12
Example 3: Extended join

$\pi_{a_1, a_2, b_2}(\sigma_c(R_{\{p_1, p_2, p_3\}} \bowtie T_{\{p_4, p_5\}}))$
SELECT r.a1, r.a2, t.b1
FROM R r, T t
WHERE c
ORDER BY F
LIMIT 2
(where $F$ is defined over the predicate set $P$, and $c$ is r.a1 + r.a2 < t.b1)
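As a rough illustration of the extended join, the sketch below joins two rank-relations on a Boolean condition and orders the result by the union of their evaluated predicates. It assumes F is the sum of p1 through p5 and that the condition is r.a1 + r.a2 < t.b1; the data, the `rank_join` helper, and the choice to project the SQL select list (a1, a2, b1) are all assumptions made for the example.

```python
ALL_PREDS = ("p1", "p2", "p3", "p4", "p5")  # F is assumed to be their sum

def upper_bound(t, P):
    """Maximal-possible score with unevaluated predicates counted as 1.0."""
    return sum(t[p] if p in P else 1.0 for p in ALL_PREDS)

def rank_join(left, right, cond):
    """Boolean membership from the join condition, order from P1 union P2."""
    (R, P1), (T, P2) = left, right
    P = P1 | P2
    joined = [{**r, **t} for r in R for t in T if cond(r, t)]
    joined.sort(key=lambda rt: upper_bound(rt, P), reverse=True)
    return joined, P

R_ranked = (
    [
        {"a1": 3, "a2": 4, "p1": 1.0, "p2": 0.75, "p3": 0.75},
        {"a1": 1, "a2": 2, "p1": 0.5, "p2": 0.5, "p3": 0.5},
    ],
    frozenset({"p1", "p2", "p3"}),
)
T_ranked = (
    [
        {"b1": 10, "b2": 1, "p4": 1.0, "p5": 0.5},
        {"b1": 5, "b2": 0, "p4": 0.5, "p5": 0.25},
    ],
    frozenset({"p4", "p5"}),
)

joined, P = rank_join(R_ranked, T_ranked, lambda r, t: r["a1"] + r["a2"] < t["b1"])
# Projection and LIMIT 2, following the SQL select list.
top2 = [(rt["a1"], rt["a2"], rt["b1"]) for rt in joined[:2]]
print(top2)
```

A real rank-aware join would produce this order incrementally rather than by sorting; the sketch only shows what the operator's output looks like.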

13
Extended operators (contd)

Note:
Cartesian product is defined similarly to join, but is not discussed in the paper.
The projection operator π is unchanged.
Computation is based on both Boolean and ranking logical properties:
perform the Boolean operations and maintain the order induced by all given ranking predicates.

14
Equivalence relations

In the extended rank-relational model, ranking is a first-class construct.
Algebraic equivalences can be derived from the definitions of the operators (proofs are omitted).
Example 4:
$\sigma_c(R_P) \equiv (\sigma_c R)_P$
$R_{P_1} \cap T_{P_2} \equiv (R \cap T)_{P_1 \cup P_2}$
Thus, we can interleave the rank operator with other operators (i.e. push µ down across operators).
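The selection equivalence, and the fact that µ can be pushed below a selection, can be checked on a toy relation. The sketch assumes F is the sum of two predicates p1 and p2, counts unevaluated predicates as 1.0, and models a rank-relation as a pair (ordered tuples, evaluated predicate set); all names and data are hypothetical.

```python
ALL_PREDS = ("p1", "p2")  # F is assumed to be p1 + p2

def upper_bound(t, P):
    return sum(t[p] if p in P else 1.0 for p in ALL_PREDS)

def rank(tuples, P):
    """Build the rank-relation R_P from an unordered relation."""
    P = frozenset(P)
    return sorted(tuples, key=lambda t: upper_bound(t, P), reverse=True), P

def select(cond, rank_relation):
    """Selection keeps the existing order and filters membership."""
    tuples, P = rank_relation
    return [t for t in tuples if cond(t)], P

def mu(p, rank_relation):
    tuples, P = rank_relation
    return rank(tuples, P | {p})

R = [
    {"id": 1, "a": 5, "p1": 0.5, "p2": 1.0},
    {"id": 2, "a": 1, "p1": 1.0, "p2": 0.25},
    {"id": 3, "a": 7, "p1": 0.75, "p2": 0.0},
]
c = lambda t: t["a"] > 2

# sigma_c(R_P) versus (sigma_c R)_P with P = {p1}
lhs = select(c, rank(R, {"p1"}))
rhs = rank([t for t in R if c(t)], {"p1"})

# mu applied above or below the selection
mu_above = mu("p2", select(c, rank(R, {"p1"})))
mu_below = select(c, mu("p2", rank(R, {"p1"})))
print(lhs == rhs, mu_above == mu_below)
```

Both plans return the same tuples in the same order, so an optimizer is free to pick whichever placement of µ is cheaper.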

15
Equivalence relations (contd)
16
Equivalence relations (contd)

Note:
Proposition 1 states that ranking can be done in stages (i.e. one predicate at a time).
By Propositions 2, 3, and 4, the operators obey commutative and associative laws.
By Propositions 4 and 5, µ can be swapped with other operators.

17
Incremental execution

Blocking operators (e.g. sort) lead to materialization of intermediate results.
Goal: avoid materialization and implement a pipelined execution strategy.
We want to split rank computation into stages and reduce the number of tuples considered in the upcoming stages.
We can output (i.e. advance to the next stage) a tuple t whenever t has a score greater than or equal to the score of any future tuple t''.

18
Incremental execution (contd)

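Apply $\mu_p$ to $R_P$ and maintain a priority queue ordered by $\bar{F}_{P \cup \{p\}}$.
Let $X$ be the set of tuples from the preceding stage.
Draw $t'$ from $X$.
If, for the tuple $t$ at the top of the queue, $\bar{F}_{P \cup \{p\}}[t] \ge \bar{F}_P[t']$ and $\bar{F}_P[t'] \ge \bar{F}_P[t'']$ for any future $t''$ drawn from $X$,
then $\bar{F}_{P \cup \{p\}}[t] \ge \bar{F}_{P \cup \{p\}}[t'']$ (evaluating $p$ can only lower a tuple's maximal-possible score, so $\bar{F}_P[t''] \ge \bar{F}_{P \cup \{p\}}[t'']$) and $t$ can be output (proceed to the next stage).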
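The output condition above can be sketched as a Python generator that keeps a priority queue and emits a tuple as soon as its new bound is at least the bound of the tuple just drawn. This is a minimal sketch, not the paper's implementation: it assumes F = AVG(p6, p7, p8), counts unevaluated predicates as 1.0, reads predicate scores from precomputed fields, and uses a hypothetical relation W. Two µ stages are chained over an index scan on p6, and the `evaluations` dictionary records how often each predicate was evaluated.

```python
import heapq
from itertools import count, islice

ALL_PREDS = ("p6", "p7", "p8")

def upper_bound(scores):
    """AVG over all predicates; unevaluated ones are assumed to score at most 1.0."""
    return sum(scores.get(p, 1.0) for p in ALL_PREDS) / len(ALL_PREDS)

def idx_scan(rows, p):
    """Yield (row, scores) in descending order of p, as an index on p would."""
    for row in sorted(rows, key=lambda r: r[p], reverse=True):
        yield row, {p: row[p]}

def mu(p, source, evaluations):
    """Incremental rank operator: evaluate p lazily, emit tuples in the new order."""
    queue, tie = [], count()
    for row, scores in source:
        threshold = upper_bound(scores)  # bounds this and every future input tuple
        while queue and -queue[0][0] >= threshold:
            _, _, out_row, out_scores = heapq.heappop(queue)
            yield out_row, out_scores
        evaluations[p] = evaluations.get(p, 0) + 1  # "evaluate" p on the drawn tuple
        new_scores = {**scores, p: row[p]}
        heapq.heappush(queue, (-upper_bound(new_scores), next(tie), row, new_scores))
    while queue:  # input exhausted: drain in order
        _, _, out_row, out_scores = heapq.heappop(queue)
        yield out_row, out_scores

# Hypothetical relation W with precomputed predicate scores.
W = [
    {"id": "w1", "p6": 1.0, "p7": 0.25, "p8": 0.5},
    {"id": "w2", "p6": 0.875, "p7": 1.0, "p8": 0.875},
    {"id": "w3", "p6": 0.75, "p7": 0.875, "p8": 1.0},
    {"id": "w4", "p6": 0.5, "p7": 0.5, "p8": 0.5},
    {"id": "w5", "p6": 0.125, "p7": 1.0, "p8": 1.0},
]

evaluations = {}
plan = mu("p8", mu("p7", idx_scan(W, "p6"), evaluations), evaluations)
top2 = [row["id"] for row, _ in islice(plan, 2)]
print(top2, evaluations)
```

Because the stages are generators, a predicate is evaluated only when a downstream consumer asks for another tuple, so the top 2 are produced without evaluating p7 and p8 on every row of W.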

19
Example 5: Top 2 of W

Given $F = \mathrm{AVG}(p_6, p_7, p_8)$
Plan: $\mathrm{idxScan}_{p_6}(W) \to \mu_{p_7} \to \mu_{p_8}$

20
Different evaluation plans

There exist algorithms to implement rank-aware
operators as well as incremental evaluation.
Efficiency of query evaluation will now depend
not only on the regular operators, but also on
the rank-aware operators.
Due to algebraic equivalence laws, we can define
additional evaluation plans.
Hence, we want a query optimizer to take
additional execution plans into consideration.

21
Rank-aware optimizer

Extended algebra → extended search space.
Impact on enumeration algorithm:
Li et al. designed a two-dimensional enumeration algorithm. Dimension 1: join size. Dimension 2: ranking predicates.
The algorithm is exponential in both dimensions.
Heuristics are applied to reduce the search space.
Impact on cost model:
For ranking queries, it is more difficult to estimate the cardinality of intermediate results, and the accuracy of that estimate is the core of the cost model.
The authors proposed estimating cardinality by randomly sampling tuples.
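To give a feel for why the search space grows in both dimensions, the sketch below enumerates candidate subplan signatures as (set of joined relations, set of evaluated ranking predicates) pairs. This is an assumption-laden illustration rather than the authors' algorithm: the signature representation, the rule that a predicate is usable only once its relation is joined, and the assignment of p1 to p3 to R and p4, p5 to T are all chosen for the example.

```python
from itertools import combinations

def subsets(items):
    """All subsets of items, smallest first."""
    items = sorted(items)
    return [
        frozenset(c)
        for n in range(len(items) + 1)
        for c in combinations(items, n)
    ]

def signatures(pred_relation):
    """Enumerate (relation set, predicate set) pairs: join size first, then predicates."""
    relations = set(pred_relation.values())
    result = []
    for rels in subsets(relations):
        if not rels:
            continue
        # A predicate can only be evaluated once its relation is part of the subplan.
        usable = [p for p, r in pred_relation.items() if r in rels]
        for preds in subsets(usable):
            result.append((rels, preds))
    return result

# Hypothetical mapping from ranking predicate to the relation it scores.
pred_relation = {"p1": "R", "p2": "R", "p3": "R", "p4": "T", "p5": "T"}
sigs = signatures(pred_relation)
print(len(sigs))
```

Even with two relations and five predicates there are 44 signatures, each of which may admit several physical plans, which is why heuristics are needed to prune the space.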

22
Critique