Incrementally Computing Ordered Answers of Acyclic Conjunctive Queries

1
Incrementally Computing Ordered Answers of
Acyclic Conjunctive Queries
Benny Kimelfeld and Yehoshua Sagiv
The Selim and Rachel Benin School of Engineering
and Computer Science
The Hebrew University of Jerusalem
2
Introduction
3
Order in SQL
Properties of apartments in each city
SELECT DISTINCT A.city, bedrooms, price
FROM Apartments A, Climates C, Distances D
WHERE A.city = C.city AND C.city = fromCity AND toCity = 'London'
ORDER BY avgTemp*10 - distance + price/1000 DESC
Order of appearance
In this talk: evaluation of queries with ORDER BY, where
  • some ORDER BY attributes are projected (appear in the SELECT clause)
  • some are not

4
Naïve Evaluation
  1. Compute w/o projection (FROM + WHERE)
     • Non-projected tuples are needed for sorting
  2. Sort (ORDER BY)
  3. Project (SELECT)
  4. Remove duplicates (DISTINCT)
     • Only the first occurrence of each answer is left
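The four steps above can be sketched in Python on a toy instance. The tuples and the ranking expression (avgTemp*10 - distance + price/1000, higher first) are hypothetical, chosen only so that the duplicate-elimination step is visible: apartments 51 and 52 produce the same answer.

```python
# Naive ordered evaluation: join, sort, project, then remove duplicates.
# Toy relations and the rank formula are hypothetical illustrations.

apartments = [  # Apartments(id, city, price, bedrooms)
    (51, "Leeds", 1020, 3),
    (52, "Leeds", 1020, 3),
    (53, "Leeds", 900, 2),
    (60, "Bath", 1500, 3),
]
climates = [("Leeds", 10.1), ("Bath", 11.0)]  # Climates(city, avgTemp)
distances = [  # Distances(fromCity, toCity, distance)
    ("Leeds", "London", 274),
    ("Bath", "London", 185),
]


def rank(row):
    """Assumed rank: avgTemp*10 - distance + price/1000 (higher comes first)."""
    _id, _city, price, _beds, temp, dist = row
    return temp * 10 - dist + price / 1000


def naive_evaluation():
    # 1. FROM + WHERE, without projection: avgTemp and distance are kept
    #    because they are needed for sorting.
    joined = [
        (i, c, p, b, t, d)
        for (i, c, p, b) in apartments
        for (c2, t) in climates
        for (c3, to, d) in distances
        if c == c2 == c3 and to == "London"
    ]
    # 2. ORDER BY ... DESC: the whole result is sorted before anything is returned
    joined.sort(key=rank, reverse=True)
    # 3. SELECT city, bedrooms, price
    projected = [(c, b, p) for (_i, c, p, b, _t, _d) in joined]
    # 4. DISTINCT: keep only the first occurrence of each answer
    seen, answers = set(), []
    for ans in projected:
        if ans not in seen:
            seen.add(ans)
            answers.append(ans)
    return answers


print(naive_evaluation())
```

Running it prints [('Bath', 3, 1500), ('Leeds', 3, 1020), ('Leeds', 2, 900)]; the second copy of ('Leeds', 3, 1020) is discarded only after the entire join has been computed and sorted.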

5
Incremental Evaluation
  • An incremental evaluation is needed
    • Generate tuples in sorted order
    • With a small delay between successive tuples
  • Frequently,
    • Using ORDER BY indicates how the tuples of the
      result will be processed by the application
    • For example, transforming chunks of tuples into
      pages of a Web browser
    • Users phrase queries that return many tuples,
      whereas only the first few tuples are actually
      needed

How much time is needed to get the next page?
How long does it take to generate the first k
tuples?
Total evaluation time
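One generic way to return the first k tuples before the rest are sorted is a binary heap. The sketch assumes the whole (unprojected) result is already in memory, which is exactly the cost criticized on the next slide; it only illustrates the difference between the delay per page and the total evaluation time. The function names are hypothetical.

```python
import heapq
from itertools import islice


def sorted_lazily(tuples, key):
    """Yield tuples in descending key order, one at a time.

    heapify is O(n); every subsequent tuple costs O(log n), so the first
    k tuples cost O(n + k log n) rather than a full O(n log n) sort.
    The entire input still has to be materialized up front.
    """
    # heapq is a min-heap, so the key is negated; the position i breaks
    # ties so that the tuples themselves are never compared.
    heap = [(-key(t), i, t) for i, t in enumerate(tuples)]
    heapq.heapify(heap)
    while heap:
        _neg_key, _i, t = heapq.heappop(heap)
        yield t


def page(tuples, key, page_size, page_no=0):
    """Return one page of results; tuples after that page are never popped."""
    start = page_no * page_size
    return list(islice(sorted_lazily(tuples, key), start, start + page_size))


rows = [("a", 3), ("b", 9), ("c", 1), ("d", 7), ("e", 5)]
print(page(rows, key=lambda r: r[1], page_size=2))             # first page
print(page(rows, key=lambda r: r[1], page_size=2, page_no=1))  # second page
```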
6
The Naïve Evaluation is Inefficient
Generating the whole result before returning the
first (page of) tuples
Many duplicates to eliminate (e.g., many
apartments with similar properties)
7
Existing Techniques
  • Techniques (e.g., the threshold algorithm) for
    minimizing database accesses
    • Evaluating the query (naively) over increasingly
      larger parts of the relations
    • Duplicate elimination due to projection
  • In worst-case scenarios, as inefficient as the
    naïve approach
    • Even for simple queries (e.g., acyclic) and orders

From a theoretical point of view, existing
approaches are heuristics
8
The Questions we Addressed
Are there evaluation algorithms that are truly
(i.e., provably) incremental? (or is it necessary
to use heuristics?)
Which are the tractable cases?
9
Our Results (Informally)
How long does it take to find just the first
tuple?
For conjunctive queries, that is all that matters!
If your setting allows the first tuple to be
found efficiently, then you can evaluate the
whole query incrementally (with small delays
between tuples)
10
The Formal Setting
11
Conjunctive Queries
We consider the class of conjunctive queries
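$Q(\mathbf{u}) \leftarrow R_1(\mathbf{u}_1), R_2(\mathbf{u}_2), \ldots, R_k(\mathbf{u}_k)$
Each $R_i$ is a relation symbol
Each $\mathbf{u}_i$ is a list of terms
$\mathbf{u}$ is a list of variables from the $\mathbf{u}_i$'s
A term is either a constant value or a variable
Each $R_i(\mathbf{u}_i)$ is a conjunct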
12
The Example as a Conjunctive Query
SELECT DISTINCT A.city, bedrooms, price
FROM Apartments A, Climates C, Distances D
WHERE A.city = C.city AND C.city = fromCity AND toCity = 'London'
ORDER BY avgTemp*10 - distance + price/1000 DESC
$Q(C,B,P) \leftarrow \mathit{Apartments}(I,C,P,B), \mathit{Climates}(C,T), \mathit{Distances}(C,\text{'London'},D)$
ORDER BY is not modeled yet
13
Homomorphisms
$Q(C,B,P) \leftarrow \mathit{Apartments}(I,C,P,B), \mathit{Climates}(C,T), \mathit{Distances}(C,\text{'London'},D)$
  • A homomorphism from the query to the database
    • assigns a value to each variable, such that
    • all resulting facts are contained in the database

Apartments(51,Leeds,1020,3)
Climates(Leeds,10.1)
Distances(Leeds,London,274)
  • An answer is obtained from a homomorphism by
    applying the assignment to the head

(Leeds,3,1020)
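A minimal sketch of homomorphisms for the example query, assuming a naive backtracking search (exponential in general, adequate for illustration). The `Var` wrapper and the helper names are hypothetical; the three facts are the ones on the slide.

```python
from typing import Dict, Iterator, NamedTuple


class Var(NamedTuple):
    """A query variable (anything else in a conjunct is a constant)."""
    name: str


# Q(C,B,P) <- Apartments(I,C,P,B), Climates(C,T), Distances(C,'London',D)
I, C, P, B, T, D = (Var(n) for n in "ICPBTD")
head = (C, B, P)
body = [
    ("Apartments", (I, C, P, B)),
    ("Climates", (C, T)),
    ("Distances", (C, "London", D)),
]

# The three facts from the slide.
db = {
    "Apartments": [(51, "Leeds", 1020, 3)],
    "Climates": [("Leeds", 10.1)],
    "Distances": [("Leeds", "London", 274)],
}


def homomorphisms(body, db) -> Iterator[Dict[Var, object]]:
    """Backtracking search for assignments that map every conjunct to a fact."""
    def extend(i, h):
        if i == len(body):
            yield h
            return
        rel, terms = body[i]
        for fact in db[rel]:
            new, ok = dict(h), True
            for term, value in zip(terms, fact):
                if isinstance(term, Var):
                    # bind a fresh variable, or check an existing binding
                    if new.setdefault(term, value) != value:
                        ok = False
                        break
                elif term != value:  # a constant must match exactly
                    ok = False
                    break
            if ok:
                yield from extend(i + 1, new)

    yield from extend(0, {})


def answer(head, h):
    """An answer is obtained by applying the assignment to the head."""
    return tuple(h[v] for v in head)


for h in homomorphisms(body, db):
    print(answer(head, h))
```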
14
Orders over Homomorphisms
  • Order in SQL can be defined over attributes that
    are not in the result (not in the SELECT clause)
  • Hence, for a proper model,

We assume an underlying order $\preceq$ over the
homomorphisms from the query to the database
(rather than over the answers)
$\preceq$ is reflexive, transitive and total
For example, a lexicographic order:
by $H(X_1)$, then by $H(X_2)$, then by $\ldots$, then by $H(X_k)$,
where $X_1, \ldots, X_k$ are variables from the query (in some
order)
15
Orders Defined by Ranking
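Orders can be obtained by ranking homomorphisms:
$H_1 \preceq H_2 \iff \mathrm{rank}(H_1) \geq \mathrm{rank}(H_2)$
For example:
  • Linear combinations / monomials

$\mathrm{rank}(H) = a_1 H(X_1) + a_2 H(X_2) + \cdots + a_k H(X_k)$
$\mathrm{rank}(H) = H(X_1)^{m_1} \cdot H(X_2)^{m_2} \cdots H(X_k)^{m_k}$
  • Maximum / minimum

$\mathrm{rank}(H) = \max\big(f(H(X_1)), f(H(X_2)), \ldots, f(H(X_k))\big)$
$\mathrm{rank}(H) = \min\big(f(H(X_1)), f(H(X_2)), \ldots, f(H(X_k))\big)$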
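The orders on these slides can be written as sort keys and rank functions over homomorphisms. In this sketch a homomorphism is modeled as a plain dict from variable names to numbers (an assumption), and a higher rank is taken to come first, consistent with ORDER BY ... DESC. All function names are hypothetical.

```python
from typing import Callable, Dict, Sequence

Homomorphism = Dict[str, float]  # variable name -> assigned value
Rank = Callable[[Homomorphism], float]


def lexicographic_key(xs: Sequence[str]) -> Callable[[Homomorphism], tuple]:
    """By H(X1), then by H(X2), ..., then by H(Xk) (as a sort key)."""
    return lambda h: tuple(h[x] for x in xs)


def linear_rank(coeffs: Dict[str, float]) -> Rank:
    """rank(H) = a1*H(X1) + a2*H(X2) + ... + ak*H(Xk)."""
    return lambda h: sum(a * h[x] for x, a in coeffs.items())


def monomial_rank(exponents: Dict[str, int]) -> Rank:
    """rank(H) = H(X1)^m1 * H(X2)^m2 * ... * H(Xk)^mk."""
    def rank(h: Homomorphism) -> float:
        r = 1.0
        for x, m in exponents.items():
            r *= h[x] ** m
        return r
    return rank


def max_rank(xs: Sequence[str], f: Callable[[float], float] = lambda v: v) -> Rank:
    """rank(H) = max(f(H(X1)), ..., f(H(Xk)))."""
    return lambda h: max(f(h[x]) for x in xs)


def min_rank(xs: Sequence[str], f: Callable[[float], float] = lambda v: v) -> Rank:
    """rank(H) = min(f(H(X1)), ..., f(H(Xk)))."""
    return lambda h: min(f(h[x]) for x in xs)


def precedes_or_equal(rank: Rank, h1: Homomorphism, h2: Homomorphism) -> bool:
    """H1 precedes or ties with H2 iff rank(H1) >= rank(H2)."""
    return rank(h1) >= rank(h2)


homs = [{"X1": 2, "X2": 3}, {"X1": 2, "X2": 5}, {"X1": 1, "X2": 9}]
r = linear_rank({"X1": 10, "X2": -1})
print(sorted(homs, key=r, reverse=True))  # the order induced by r
```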
16
Ordering Answers
The order over the answers is the one obtained
from the following (inefficient) process
1. Generate all homomorphisms in a sorted order
2. Obtain an answer from each homomorphism
3. Remove duplicate answers
Only the first occurrence of each answer is left
17
The Implied Order on Answers
The goal: generate all the answers in the
implied order
  • The process (from the previous slide) defines
    how an order over homomorphisms implies an order
    over answers

Given an order $\preceq$ over the homomorphisms and two
answers $A_1$ and $A_2$, $A_1 \preceq A_2$ holds if for each
homomorphism $H_2$ producing $A_2$, there is a
homomorphism $H_1$ producing $A_1$, such that $H_1 \preceq H_2$
In other words, $A_1 \preceq A_2$ if the best
homomorphism producing $A_1$ is at least as good as the best
homomorphism producing $A_2$
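The implied order can be computed directly from the best homomorphism of each answer. The sketch compares this with the defining sort, project, and deduplicate process on a hypothetical list of homomorphisms; both give the same order whenever different answers have distinct best ranks (with ties, either order is a valid linearization of the implied order).

```python
from typing import Callable, Dict, Hashable, Iterable, List


def implied_order(homs: Iterable[dict], answer_of: Callable[[dict], Hashable],
                  rank: Callable[[dict], float]) -> List[Hashable]:
    """Order answers by the rank of their best homomorphism (best first)."""
    best: Dict[Hashable, float] = {}
    for h in homs:
        a, r = answer_of(h), rank(h)
        if a not in best or r > best[a]:
            best[a] = r
    return sorted(best, key=best.__getitem__, reverse=True)


def sort_project_dedup(homs, answer_of, rank):
    """The (inefficient) defining process: sort, project, keep first copies."""
    seen, out = set(), []
    for h in sorted(homs, key=rank, reverse=True):
        a = answer_of(h)
        if a not in seen:
            seen.add(a)
            out.append(a)
    return out


homs = [
    {"C": "Leeds", "B": 3, "score": 5},
    {"C": "Leeds", "B": 3, "score": 9},
    {"C": "Bath", "B": 2, "score": 7},
    {"C": "York", "B": 1, "score": 1},
]


def answer_of(h):
    return (h["C"], h["B"])


def score(h):
    return h["score"]


print(implied_order(homs, answer_of, score))
```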
18
The Formal Requirement for Efficiency
Yardstick of efficiency: polynomial delay. That
is, polynomial time between generating successive
answers, under query-and-data complexity
Top-k Algorithm
19
So what is the Problem?
Exponential number of answers
Exponential number of duplicates to eliminate
20
Task Formulation
Input
  • A database
  • A conjunctive query
  • An order over the homomorphisms

Goal
Enumerate all the answers in the implied order
Performance
Polynomial delay (under combined complexity)
21
Our Results
22
Intractable Cases
It is not always possible to obtain an efficient
ranked enumeration, for at least two reasons:
  • Sometimes, generating any answer is intractable
    (regardless of the order)
    • Non-emptiness of conjunctive queries is
      NP-complete
  • For some ranking functions, finding just the
    first tuple is intractable (even if non-emptiness
    is tractable)

SELECT * FROM R1, R2, ..., Rn ORDER BY
ABS(R1.A1 + ... + Rn.An - K)
A Cartesian product (non-emptiness is trivial),
yet finding the first tuple solves subset sum
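The reduction behind this example can be sketched under the assumption that relation $R_i$ contains exactly the two values $0$ and $a_i$. Each tuple of the Cartesian product then picks a subset of $\{a_1, \ldots, a_n\}$, so the first tuple has ABS(R1.A1 + ... + Rn.An - K) = 0 exactly when some subset sums to $K$. The query is never empty, yet producing its first tuple decides SUBSET SUM, which is NP-complete. The brute-force search below only demonstrates the encoding; the helper names are hypothetical.

```python
from itertools import product


def subset_sum_relations(weights):
    """Relation R_i holds two unary tuples, 0 and a_i (a hypothetical encoding)."""
    return [[0, a] for a in weights]


def first_tuple(relations, k):
    """Top tuple of SELECT * FROM R1,...,Rn ORDER BY ABS(R1.A1 + ... + Rn.An - K).

    Brute force over the whole Cartesian product (2^n tuples here).
    """
    return min(product(*relations), key=lambda t: abs(sum(t) - k))


def has_subset_with_sum(weights, k):
    """Decide SUBSET SUM by asking only for the first tuple of the ranked query."""
    return sum(first_tuple(subset_sum_relations(weights), k)) == k


print(has_subset_with_sum([3, 5, 9], 14))  # 5 + 9 = 14
print(has_subset_with_sum([3, 5, 9], 6))   # no subset sums to 6
```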
23
Acyclic Conjunctive Queries
A conjunctive query is acyclic if the conjuncts
can be placed on some tree T, such that for each
variable X, the conjuncts containing X form a
subtree of T
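Whether such a tree exists can be tested with the standard GYO reduction: repeatedly drop variables that occur in a single conjunct, and drop conjuncts whose variables are contained in another conjunct. The query is acyclic exactly when everything reduces away. A minimal sketch, representing each conjunct only by its set of variables (the function name is hypothetical):

```python
from typing import Iterable, List, Set


def is_acyclic(conjunct_vars: Iterable[Iterable[str]]) -> bool:
    """GYO reduction: True iff the query hypergraph is (alpha-)acyclic.

    Each conjunct is represented only by the set of variables it contains
    (constants are irrelevant for acyclicity).
    """
    edges: List[Set[str]] = [set(e) for e in conjunct_vars]
    changed = True
    while changed:
        changed = False
        # Rule 1: drop variables that occur in exactly one conjunct.
        for e in edges:
            lonely = {v for v in e if sum(v in f for f in edges) == 1}
            if lonely:
                e -= lonely  # in-place set difference
                changed = True
        # Rule 2: drop one conjunct whose variables are contained in another.
        for i, e in enumerate(edges):
            if any(j != i and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    # Acyclic iff everything was reduced away (at most one empty conjunct left).
    return len(edges) <= 1


# Apartments(I,C,P,B), Climates(C,T), Distances(C,'London',D)
print(is_acyclic([{"I", "C", "P", "B"}, {"C", "T"}, {"C", "D"}]))
# R(X,Y), S(Y,Z), T(Z,X): a triangle
print(is_acyclic([{"X", "Y"}, {"Y", "Z"}, {"Z", "X"}]))
```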
24
Tractability of Acyclic Queries
Recall that for general conjunctive queries,
testing non-emptiness (which is necessary for
efficient incremental evaluation) is intractable
Acyclic conjunctive queries are among the largest
known classes that can be evaluated in polynomial
total time [Yannakakis, 1981]
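At the core of Yannakakis's algorithm are two semijoin passes over a join tree: bottom-up (parent ⋉ child), then top-down (child ⋉ parent). Afterwards every remaining row participates in some answer, so non-emptiness can be read off the root. The sketch assumes a hand-built join tree, with the constant 'London' applied as a selection beforehand; the data and function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Relation = Tuple[Tuple[str, ...], Set[tuple]]  # (variables, rows)


def semijoin(left: Relation, right: Relation) -> Relation:
    """left ⋉ right: keep the rows of left that agree with some row of right."""
    lvars, lrows = left
    rvars, rrows = right
    shared = [v for v in lvars if v in rvars]
    keys = {tuple(r[rvars.index(v)] for v in shared) for r in rrows}
    kept = {row for row in lrows if tuple(row[lvars.index(v)] for v in shared) in keys}
    return lvars, kept


def full_reduce(nodes: Dict[str, Relation], parent: Dict[str, Optional[str]],
                order: List[str]) -> Dict[str, Relation]:
    """Two semijoin passes; `order` lists every child before its parent."""
    rels = dict(nodes)
    for n in order:  # bottom-up: parent := parent ⋉ child
        if parent[n] is not None:
            rels[parent[n]] = semijoin(rels[parent[n]], rels[n])
    for n in reversed(order):  # top-down: child := child ⋉ parent
        if parent[n] is not None:
            rels[n] = semijoin(rels[n], rels[parent[n]])
    return rels


def non_empty(nodes, parent, order) -> bool:
    """The query has an answer iff the root is non-empty after the reduction."""
    root = next(n for n in order if parent[n] is None)
    return bool(full_reduce(nodes, parent, order)[root][1])


# Join tree: Apartments is the root; Climates and Distances are its children.
nodes = {
    "Apartments": (("I", "C", "P", "B"), {(51, "Leeds", 1020, 3), (70, "Paris", 2000, 2)}),
    "Climates": (("C", "T"), {("Leeds", 10.1)}),
    "Distances": (("C", "D"), {("Leeds", 274), ("Paris", 344)}),
}
parent = {"Apartments": None, "Climates": "Apartments", "Distances": "Apartments"}
order = ["Climates", "Distances", "Apartments"]
reduced = full_reduce(nodes, parent, order)
print(reduced["Distances"][1])  # the dangling Paris row is removed
```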
25
Simplification
  • For simplicity of presentation, the next theorem
    is less general than the one in the proceedings
  • In particular, only acyclic conjunctive queries
    are considered
  • Furthermore, we consider orders that are defined
    globally over all assignments of variables
  • And, in particular, on homomorphisms

26
Characterizing Incremental Evaluation
Theorem: The following two are equivalent for an
order $\preceq$, in the case of acyclic conjunctive
queries:
1. Given a database and a query, a maximal
homomorphism can be found in polynomial time
2. Given a database and a query, answers can be
enumerated in sorted order with polynomial delay
27
Extending the Theorem
  • In the proceedings, the theorem is stated for
    queries that are more general than just acyclic
    conjunctive queries
  • We only require closure under the (rather
    trivial) operation illustrated below
  • Furthermore, the order can be defined per
    families of databases and queries, rather than on
    all assignments
  • Hence, more general types of orders are possible

28
Specific Types of Orders
  • Next, we identify orders for which the first
    tuple can be computed efficiently, in the case of
    acyclic conjunctive queries

By the theorem, for these types of orders,
answers of acyclic conjunctive queries can be
enumerated in sorted order with polynomial delay
29
Monotonic Orders
  • Intuitively, an order is monotonic if replacing
    a part of an assignment with a better part can
    only increase the rank
  • The exact definition is in the proceedings

Lemma: Monotonic orders satisfy the first
condition of the theorem
Hence, monotonic orders have an efficient ordered
evaluation
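For a rank that is a sum of per-variable terms (for example, a linear combination), a top homomorphism of an acyclic query can be found by a standard dynamic program over a join tree. This is a textbook technique shown under that assumption, not necessarily the construction in the proceedings; the join tree, data, and weight function are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Relation = Tuple[Tuple[str, ...], List[tuple]]  # (variables, rows)


def top_homomorphism(nodes: Dict[str, Relation], children: Dict[str, List[str]],
                     root: str, weight: Callable[[str, object], float]
                     ) -> Optional[Tuple[float, dict]]:
    """Maximize rank(H) = sum over variables X of weight(X, H(X)) on a join tree.

    Each variable is scored once, at the highest tree node containing it.
    Returns (best rank, best homomorphism), or None if there is none.
    """
    def consistent(vs1, r1, vs2, r2):
        a = dict(zip(vs1, r1))
        return all(a[v] == x for v, x in zip(vs2, r2) if v in a)

    def solve(n, above):
        # best[row] = (best score of the subtree of n given row, its assignment)
        vs, rows = nodes[n]
        subs = [(nodes[c][0], solve(c, above | set(vs))) for c in children.get(n, [])]
        best = {}
        for r in rows:
            score = sum(weight(v, x) for v, x in zip(vs, r) if v not in above)
            h = dict(zip(vs, r))
            for cvs, table in subs:
                options = [table[cr] for cr in table if consistent(vs, r, cvs, cr)]
                if not options:
                    break  # row r cannot be extended into this subtree
                cscore, ch = max(options, key=lambda o: o[0])
                score += cscore
                h.update(ch)
            else:
                best[r] = (score, h)
        return best

    table = solve(root, frozenset())
    return max(table.values(), key=lambda o: o[0]) if table else None


def weight(var, value):
    """Hypothetical additive rank: avgTemp*10 - distance + price/1000."""
    if var == "T":
        return 10 * value
    if var == "D":
        return -value
    if var == "P":
        return value / 1000
    return 0


nodes = {
    "Apartments": (("I", "C", "P", "B"),
                   [(51, "Leeds", 1020, 3), (53, "Leeds", 900, 2), (60, "Bath", 1500, 3)]),
    "Climates": (("C", "T"), [("Leeds", 10.1), ("Bath", 11.0)]),
    "Distances": (("C", "D"), [("Leeds", 274), ("Bath", 185)]),
}
children = {"Apartments": ["Climates", "Distances"]}
score, h = top_homomorphism(nodes, children, "Apartments", weight)
print(score, h)
```

With these weights the top homomorphism binds C to Bath. Each node keeps one table entry per row, so the work stays polynomial in the size of the query and the database.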
30
Examples of Monotonic Orders
  • Lexicographic orders

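by $H(X_1)$, then by $H(X_2)$, then by $\ldots$, then by $H(X_k)$
  • Linear combinations / monomials

$\mathrm{rank}(H) = a_1 H(X_1) + a_2 H(X_2) + \cdots + a_k H(X_k)$
$\mathrm{rank}(H) = H(X_1)^{m_1} \cdot H(X_2)^{m_2} \cdots H(X_k)^{m_k}$
  • Maximum / minimum

$\mathrm{rank}(H) = \max\big(f(H(X_1)), f(H(X_2)), \ldots, f(H(X_k))\big)$
$\mathrm{rank}(H) = \min\big(f(H(X_1)), f(H(X_2)), \ldots, f(H(X_k))\big)$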
31
c-Determined Orders
  • c is a fixed positive integer
  • An order is c-determined if the rank of each
    assignment is determined by some c variables
  • The ranks of two different assignments are not
    necessarily determined by the same c variables
  • Extends the ranking functions used by [Cohen &
    Sagiv, 2005] in the context of ranked full
    disjunctions

Lemma: c-determined orders satisfy the first
condition of the theorem
Hence, c-determined orders have an efficient ordered
evaluation
32
c-Determined Orders Examples
  • Maximum is 1-determined

$\mathrm{rank}(H) = \max\big(f(H(X_1)), f(H(X_2)), \ldots, f(H(X_k))\big)$
  • Minimum is not c-determined for any constant c

$\mathrm{rank}(H) = \min\big(f(H(X_1)), f(H(X_2)), \ldots, f(H(X_k))\big)$
  • An example of a 3-determined order

by $H(X_1)$, then by $H(X_2)/H(X_3)$
33
Proof Techniques (Overview)
Ranked evaluation with polynomial delay can be
obtained by adapting two different techniques:
  • Iterative binding of variables
    • Limited to lexicographic and some c-determined
      orders
    • All attributes determining the order must be in
      the result
    • More efficient w.r.t. space usage (does not
      collect info.)
  • Lawler's method [Management Science, 1972]
    • A general procedure for finding the top-k answers
      to discrete optimization problems
    • Need to fill in missing parts for the specific
      setting
    • Much more general than iterative binding of
      variables
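Lawler's method repeatedly splits the unexplored part of the search space into disjoint subspaces and asks an oracle for the best solution in each. Applied to answers, a subspace can fix a prefix of the head variables and exclude some values for the next one, so every answer is produced exactly once and no duplicate elimination is needed. In this sketch the oracle is a brute-force scan over a precomputed list of homomorphisms (a hypothetical stand-in, as are all names); replacing it by a polynomial-time procedure for finding a top homomorphism under such constraints is, presumably, where the first condition of the theorem enters. The head is assumed to be non-empty.

```python
import heapq
from itertools import count
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


def lawler_ranked_answers(homs: List[dict], head: Sequence[str],
                          rank: Callable[[dict], float]) -> Iterator[tuple]:
    """Enumerate distinct answers, best first, by Lawler-style partitioning.

    A subspace is (fixed, excl): head[:len(fixed)] must equal `fixed`, and
    head[len(fixed)] must avoid the values in `excl`.
    """
    def oracle(fixed: tuple, excl: frozenset) -> Optional[dict]:
        # Hypothetical brute-force oracle: best homomorphism in the subspace.
        p = len(fixed)
        cands = [h for h in homs
                 if all(h[head[j]] == fixed[j] for j in range(p))
                 and h[head[p]] not in excl]
        return max(cands, key=rank) if cands else None

    heap: List[Tuple[float, int, dict, tuple, frozenset]] = []
    tie = count()  # tie-breaker, so dicts are never compared

    def push(fixed: tuple, excl: frozenset) -> None:
        h = oracle(fixed, excl)
        if h is not None:
            heapq.heappush(heap, (-rank(h), next(tie), h, fixed, excl))

    push((), frozenset())
    while heap:
        _, _, h, fixed, excl = heapq.heappop(heap)
        a = tuple(h[v] for v in head)
        yield a
        # Split the rest of this subspace into disjoint subspaces without a.
        p = len(fixed)
        push(fixed, excl | {a[p]})
        for i in range(p + 1, len(head)):
            push(a[:i], frozenset({a[i]}))


homs = [
    {"C": "Leeds", "B": 3, "P": 1020, "r": 5},
    {"C": "Leeds", "B": 3, "P": 1020, "r": 5},
    {"C": "Leeds", "B": 2, "P": 900, "r": 3},
    {"C": "Bath", "B": 3, "P": 1500, "r": 10},
]
print(list(lawler_ranked_answers(homs, ["C", "B", "P"], lambda h: h["r"])))
```

Because the subspaces partition the answers that have not been reported yet, the duplicate homomorphism for ('Leeds', 3, 1020) never surfaces as a second answer.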

34
Conclusion
35
A Summary
  • Evaluation of conjunctive queries with order
    has been considered
  • Formal model
    • Order over homomorphisms
    • Implied order over answers
    • Polynomial delay as a yardstick of efficiency

36
A Summary (contd)
We have shown that for acyclic conjunctive queries:
Finding the first tuple in the given order in
polynomial time
$\iff$
Enumerating all answers in sorted order with
polynomial delay
As a corollary, acyclic conjunctive queries have
an efficient ordered evaluation if the order is
either monotonic or c-determined
In the proceedings, the result is extended to
more general queries and orders
37
Ongoing and Future Work
  • Practical considerations
    • Our algorithms require novel optimization
      techniques
    • Implementation of an algorithm for finding the
      top answer (the bottleneck of the computation)
  • Querying XML
    • This work has been extended to effective querying
      of graph-structured XML by twig joins (Web and
      Databases, 2006)

38
Thank You.
  • Questions?