Efficient Detection of Empty Result Queries presentation

About This Presentation

Transcript and Presenter's Notes

Title: Efficient Detection of Empty Result Queries

1
Efficient Detection of Empty Result Queries

Gang Luo
IBM T.J. Watson Research Center
luog_at_us.ibm.com

2
Empty Result Problem

Query returns an empty result set
User gets lost about where to look at next
Frequently encountered in interactive exploration
of massive data sets
Our contribution method for quickly detecting
empty result sets

3
Example Percentages of Empty Result Queries

In a Customer Relationship Management (CRM)
application developed by IBM
18.07 (3,396 empty result queries in 18,793
queries)
In a real estate application developed by IBM
5.75
In a digital library application JCM00
10.53
In a bioinformatics application RCP98
38

4
Empty Result Queries May Not Finish Execution
Quickly

Consider a query joining two relations
Query execution time is longer than join time, no
matter whether or not query result set is empty
Even if a query finishes in a few seconds in a
lightly loaded RDBMS, it may last longer than one
minute in a heavily loaded RDBMS

5
Outline

Limitations of previous approaches
Fast detection method for empty result queries
Some experiments

6
Existing Solutions to the Empty Result Problem

Explain what leads to the empty result set
Automatically generalize the query so that the
generalized query will return some answers

7
Limitations of Existing Solutions

Require domain specific knowledge
Only apply to a restricted form of queries
Require an excessive amount of time
Give too many reasons why the result set is empty
Users cannot reuse each others query results

8
Outline

Limitations of previous approaches
Fast detection method for empty result queries
Some experiments

9
Our Solution

Only consider read-only environment
From previous queries execution, remember the
query parts that lead to empty result sets
When a new query Q comes, match it with the
remembered query parts. If such a match exists,
report that Q will return an empty result set
without executing Q
Utilize special properties of empty result sets
and thus often more powerful than traditional
materialized view method

10
Definitions

Empty result propagating operator An operator
whose output is empty if any input is empty
Empty result propagating query A query whose
query plan only contains empty result propagating
operators (our focus)
Query part A sub-tree of a query plan
Atomic query part An ordered pair (relation
names RN, selection condition SC)
Corresponds to a relational algebra formula
first product join all relations in RN, then
apply SC
SC is a conjunction of primitive terms, where
each primitive term is a comparison

11
Definitions Cont.

Cover Atomic query part P1(RN1, SC1) covers
atomic query part P2(RN2, SC2) if
RN1?RN2
Whenever SC2 is true, SC1 is true
Property Suppose atomic query part P1 covers
atomic query part P2. For a given database, if
the output of P1 is empty, the output of P2 is
also empty.

12
Given an Empty Result Query

Find the lowest-level query part P whose output
is empty

13
Transforming P into a Simplified Query Part Ps

Drop all operators (e.g., projection, hash, sort)
that have no influence on the emptiness of the
output
Replace each physical join operator with a
logical join operator
Replace each index-scan operator with a
table-scan operator followed by a selection
operator, where the selection condition is the
index-scan condition

14
Transforming P into a Simplified Query Part Ps
Cont.

Corresponding relational algebra formula
(?50ltA.alt100 ? A.b200 (A)) ?A.cB.d (?B.elt40 ?
B.e50 (B))

15
Breaking Ps into Atomic Query Parts

Get all selection conditions in the
selection/join operators
Rewrite the conjunction of these selection
conditions into a disjunctive normal form (DNF)
Negations on numeric or string attributes are
removed using complementary operators
Interval-based comparison is treated as a single
primitive term
Generate a set of atomic query parts (RN, SC)
RN input relations of all table-scan operators
in Ps
SC a term in the DNF

16
Breaking Ps into Atomic Query Parts Cont.
(?50ltA.alt100 (A)) ?A.cB.d (?B.elt40
(B)) (?A.b200 (A)) ?A.cB.d (?B.elt40
(B)) (?50ltA.alt100 (A)) ?A.cB.d (?B.e50
(B)) (?A.b200 (A)) ?A.cB.d (?B.e50 (B))

Property The following three assertions are
equivalent to each other
The output of the query part P is empty
The output of the simplified query part Ps is
empty
The output of each generated atomic query part is
empty

17
Storing the Generated Atomic Query Parts

For each generated atomic query part Pa
Insert Pa into a collection Caqp of atomic query
parts
Remove from Caqp all previously stored atomic
query parts that are covered by Pa
See paper for details of the coverage checking
algorithm

18
When Getting a New Query Q

Break Q into a set of atomic query parts
For each such atomic query part Pa, check whether
some atomic query part Ai in Caqp covers Pa
If such an Ai exists for each Pa, report that Q
will return an empty result set without executing
Q

19
Outline

Limitations of previous approaches
Fast detection method for empty result queries
Some experiments

20
Setup

Testing environment
PostgreSQL 7.3.4
Windows XP OS
Dell Inspiron 8500 PC with one 2.2GHz CPU, 512MB
memory, one 40GB disk
TPC-R benchmark
See paper for detection probability analysis

21
Overhead Experiment

Query Q1 Find the information about certain
parts that were sold on certain days

select from orders o, lineitem lwhere
o.orderkeyl.orderkey and (o.orderdated1 or
or o.orderdatede) and (l.partkeyp1 or or
l.partkeypf)
22
Overhead Experiment Cont.

Query Q2 Find the information about certain
parts that were sold to certain customers on
certain days

select from orders o, lineitem l, customer
cwhere o.orderkeyl.orderkey and
o.custkeyc.custkey and (o.orderdated1 or or
o.orderdatede) and (l.partkeyp1 or or
l.partkeypf) and (c.nationkeyn1 or or
c.nationkeyng)
23
Overhead Experiment Cont.

The overhead of our method increases with both
query complexity and the number of atomic query
parts stored in Caqp
When check fails, the overhead of our method is
higher than that when check succeeds

24
Overhead Experiment Cont.

The overhead of our method is trivial compared to
query execution overhead

25
Summary

Provide a fast detection method for empty result
queries
Low overhead
High detection probability once enough
information has been accumulated

26
Open Issues

In the presence of update, correctly preserve as
much stored information as possible
A hybrid method that can combine the advantages
of both our method and the existing solutions
More aggressive storage saving technique

Write a Comment

User Comments (0)

About PowerShow.com

Efficient Detection of Empty Result Queries PowerPoint PPT Presentation