Efficient Detection of Empty Result Queries - PowerPoint PPT Presentation

1 / 26
About This Presentation
Title:

Efficient Detection of Empty Result Queries

Description:

From previous queries' execution, remember the query parts that lead to empty result sets ... report that Q will return an empty result set without executing Q ... – PowerPoint PPT presentation

Number of Views:37
Avg rating:3.0/5.0
Slides: 27
Provided by: tor45
Category:

less

Transcript and Presenter's Notes

Title: Efficient Detection of Empty Result Queries


1
Efficient Detection of Empty Result Queries
  • Gang Luo
  • IBM T.J. Watson Research Center
  • luog_at_us.ibm.com

2
Empty Result Problem
  • Query returns an empty result set
  • User gets lost about where to look at next
  • Frequently encountered in interactive exploration
    of massive data sets
  • Our contribution method for quickly detecting
    empty result sets

3
Example Percentages of Empty Result Queries
  • In a Customer Relationship Management (CRM)
    application developed by IBM
  • 18.07 (3,396 empty result queries in 18,793
    queries)
  • In a real estate application developed by IBM
  • 5.75
  • In a digital library application JCM00
  • 10.53
  • In a bioinformatics application RCP98
  • 38

4
Empty Result Queries May Not Finish Execution
Quickly
  • Consider a query joining two relations
  • Query execution time is longer than join time, no
    matter whether or not query result set is empty
  • Even if a query finishes in a few seconds in a
    lightly loaded RDBMS, it may last longer than one
    minute in a heavily loaded RDBMS

5
Outline
  • Limitations of previous approaches
  • Fast detection method for empty result queries
  • Some experiments

6
Existing Solutions to the Empty Result Problem
  • Explain what leads to the empty result set
  • Automatically generalize the query so that the
    generalized query will return some answers

7
Limitations of Existing Solutions
  • Require domain specific knowledge
  • Only apply to a restricted form of queries
  • Require an excessive amount of time
  • Give too many reasons why the result set is empty
  • Users cannot reuse each others query results

8
Outline
  • Limitations of previous approaches
  • Fast detection method for empty result queries
  • Some experiments

9
Our Solution
  • Only consider read-only environment
  • From previous queries execution, remember the
    query parts that lead to empty result sets
  • When a new query Q comes, match it with the
    remembered query parts. If such a match exists,
    report that Q will return an empty result set
    without executing Q
  • Utilize special properties of empty result sets
    and thus often more powerful than traditional
    materialized view method

10
Definitions
  • Empty result propagating operator An operator
    whose output is empty if any input is empty
  • Empty result propagating query A query whose
    query plan only contains empty result propagating
    operators (our focus)
  • Query part A sub-tree of a query plan
  • Atomic query part An ordered pair (relation
    names RN, selection condition SC)
  • Corresponds to a relational algebra formula
    first product join all relations in RN, then
    apply SC
  • SC is a conjunction of primitive terms, where
    each primitive term is a comparison

11
Definitions Cont.
  • Cover Atomic query part P1(RN1, SC1) covers
    atomic query part P2(RN2, SC2) if
  • RN1?RN2
  • Whenever SC2 is true, SC1 is true
  • Property Suppose atomic query part P1 covers
    atomic query part P2. For a given database, if
    the output of P1 is empty, the output of P2 is
    also empty.

12
Given an Empty Result Query
  • Find the lowest-level query part P whose output
    is empty

13
Transforming P into a Simplified Query Part Ps
  • Drop all operators (e.g., projection, hash, sort)
    that have no influence on the emptiness of the
    output
  • Replace each physical join operator with a
    logical join operator
  • Replace each index-scan operator with a
    table-scan operator followed by a selection
    operator, where the selection condition is the
    index-scan condition

14
Transforming P into a Simplified Query Part Ps
Cont.
  • Corresponding relational algebra formula
  • (?50ltA.alt100 ? A.b200 (A)) ?A.cB.d (?B.elt40 ?
    B.e50 (B))

15
Breaking Ps into Atomic Query Parts
  • Get all selection conditions in the
    selection/join operators
  • Rewrite the conjunction of these selection
    conditions into a disjunctive normal form (DNF)
  • Negations on numeric or string attributes are
    removed using complementary operators
  • Interval-based comparison is treated as a single
    primitive term
  • Generate a set of atomic query parts (RN, SC)
  • RN input relations of all table-scan operators
    in Ps
  • SC a term in the DNF

16
Breaking Ps into Atomic Query Parts Cont.
(?50ltA.alt100 (A)) ?A.cB.d (?B.elt40
(B)) (?A.b200 (A)) ?A.cB.d (?B.elt40
(B)) (?50ltA.alt100 (A)) ?A.cB.d (?B.e50
(B)) (?A.b200 (A)) ?A.cB.d (?B.e50 (B))
  • Property The following three assertions are
    equivalent to each other
  • The output of the query part P is empty
  • The output of the simplified query part Ps is
    empty
  • The output of each generated atomic query part is
    empty

17
Storing the Generated Atomic Query Parts
  • For each generated atomic query part Pa
  • Insert Pa into a collection Caqp of atomic query
    parts
  • Remove from Caqp all previously stored atomic
    query parts that are covered by Pa
  • See paper for details of the coverage checking
    algorithm

18
When Getting a New Query Q
  • Break Q into a set of atomic query parts
  • For each such atomic query part Pa, check whether
    some atomic query part Ai in Caqp covers Pa
  • If such an Ai exists for each Pa, report that Q
    will return an empty result set without executing
    Q

19
Outline
  • Limitations of previous approaches
  • Fast detection method for empty result queries
  • Some experiments

20
Setup
  • Testing environment
  • PostgreSQL 7.3.4
  • Windows XP OS
  • Dell Inspiron 8500 PC with one 2.2GHz CPU, 512MB
    memory, one 40GB disk
  • TPC-R benchmark
  • See paper for detection probability analysis

21
Overhead Experiment
  • Query Q1 Find the information about certain
    parts that were sold on certain days

select from orders o, lineitem lwhere
o.orderkeyl.orderkey and (o.orderdated1 or
or o.orderdatede) and (l.partkeyp1 or or
l.partkeypf)
22
Overhead Experiment Cont.
  • Query Q2 Find the information about certain
    parts that were sold to certain customers on
    certain days

select from orders o, lineitem l, customer
cwhere o.orderkeyl.orderkey and
o.custkeyc.custkey and (o.orderdated1 or or
o.orderdatede) and (l.partkeyp1 or or
l.partkeypf) and (c.nationkeyn1 or or
c.nationkeyng)
23
Overhead Experiment Cont.
  • The overhead of our method increases with both
    query complexity and the number of atomic query
    parts stored in Caqp
  • When check fails, the overhead of our method is
    higher than that when check succeeds

24
Overhead Experiment Cont.
  • The overhead of our method is trivial compared to
    query execution overhead

25
Summary
  • Provide a fast detection method for empty result
    queries
  • Low overhead
  • High detection probability once enough
    information has been accumulated

26
Open Issues
  • In the presence of update, correctly preserve as
    much stored information as possible
  • A hybrid method that can combine the advantages
    of both our method and the existing solutions
  • More aggressive storage saving technique
Write a Comment
User Comments (0)
About PowerShow.com