Efficient Query Rewriting for Inconsistent Databases - PowerPoint PPT Presentation

1 / 51
About This Presentation
Title:

Efficient Query Rewriting for Inconsistent Databases

Description:

The query we used in the example only has joins between a non-key and a key attribute ... Get all employees who have the same salary as two other employees ... – PowerPoint PPT presentation

Number of Views:21
Avg rating:3.0/5.0
Slides: 52
Provided by: a15433
Category:

less

Transcript and Presenter's Notes

Title: Efficient Query Rewriting for Inconsistent Databases


1
Efficient Query Rewriting for Inconsistent
Databases
  • Ariel D. Fuxman, Renée J. Miller
  • Dept. of Computer Science
  • University of Toronto

2
Motivation
and well after it is created, we realize that
some constraint is missing
  • Suppose we create a database

Institute ! City
3
Semantics
  • How do we query an inconsistent database?
  • Repairs of the database ABC99.
  • Certain answers, where the possible worlds are
    the repairs of the database.

4
Repairs
Repair 1
Inconsistent database
Repair 2
Institute ! City
5
Consistent Query Answering
  • qGet cities where there is some institute

Repair 1
q
Repair 2
q
consistentS(q,I)(San Jose)
q(I)(Markham), (Toronto), (San Jose)
6
Computation
  • Goal Efficient computation of consistent query
    answers.
  • avoid the explicit construction of the repairs.
  • query the inconsistent database directly.
  • reuse existing database technology.

7
Query Rewriting
q,?
First-Order
Rewriting Algorithm
Q
RDBMS
  • t 2 Q(I) iff t 2 consistent?(q,I), for every
    instance I

8
Problems
  • Retrieving consistent answers is a hard problem
  • coNP-hard in general for conjunctive queries and
    one f.d. per relation. CM02,CLR03
  • First-order query rewriting algorithms have been
    given for quantifier-free queries ABC99
  • Quantifier-free queries do not allow projections.

9
Our Goals
  • Develop a first-order query rewriting algorithm.
  • Characterize a class of queries for which the
    algorithm is correct.
  • Main requirement the class must capture most
    queries that arise in practice.

10
Setting
  • The database is inconsistent wrt at most one key
    dependency (primary key) per relation of the
    schema.
  • Convention the key attributes of a relation are
    underlined in the query.
  • q9 in,ci. R1(in,ci) Æ R2(ci,canada)

11
Query Rewriting Example 1
  • q(in) R1(in,toronto)
  • Get institutes in Toronto
  • Q(in) R1(in,toronto) Æ
  • (8 ci.R1(in,ci) ! citoronto)

Institute ! City
12
Query Rewriting Example 2
  • q(in)9 ci. R1(in,ci) Æ R2(ci,canada)
  • Get institutes in Canada

R2
R1
Institute ! City
City ! Country
  • consistentS(q,I)(TO Lab)

13
Query Rewriting Example 2
  • q(in)9 ci. R1(in,ci) Æ R2(ci,canada)

R2
R1
  • Q(in)9 ci.R1(in,ci) Æ R2(ci,canada) Æ
  • (8 ci.R1(in,ci) !(9 co.R2(ci,co) Æ
  • (8 co.R2(ci,co) ! cocanada)))

14
Query Rewriting Example 2
  • q(in)9 ci. R1(in,ci) Æ R2(ci,canada)

R2
R1
  • Q(in)9 ci.R1(in,ci) Æ R2(ci,canada) Æ
  • (8 ci.R1(in,ci) !(9 co.R2(ci,co) Æ
  • (8 co.R2(ci,co) ! cocanada)))

15
SQL Rewriting
  • Schemas R1(IN,CI) R2(CI,CO)
  • SELECT r1.IN
  • FROM R1 r1, R2 r2
  • WHERE r1.CIr2.CI AND r2.COcanada
  • AND NOT EXISTS
  • (SELECT FROM R1 r1 WHERE r1.INr1.IN AND
  • (NOT EXISTS (SELECT FROM R2 r2
  • WHERE r1.CIr2.CI)
  • OR (EXISTS (SELECT FROM R2 r2
  • WHERE r1.CI r2.CI
  • AND r2.CO ltgtcanada))))

16
Class of queries
  • For what class of conjunctive queries does the
    algorithm work?

17
Join graph
  • q9 x,y,z,w R1(x,y) Æ R2(y,z) Æ R3(z,w) Æ R4(y,a)

R1
R2
R3
R4
Condition The join graph of the query is a tree.
18
A practical class
  • Condition join graph is a tree
  • Allows key-to-key and non-key to key joins
  • The query must be acyclic
  • We capture many queries that arise in practice.
  • 20 out of 22 queries in the TPC-H standard are
    within the class.

19
Future Work
  • Theory
  • Incompleteness inconsistency
  • Find a dichotomy for the class of conjunctive
    queries.
  • Find a rewriting for queries in more expressive
    query languages.
  • Inclusion dependencies
  • Practice
  • Implement the rewriting for SQL.
  • Experiment with the performance of the
    translation
  • Provide optimization techniques
  • Integrate the techniques into a data exchange
    environment

20
Bibliography
J. Chomicki and J. Marcinkowski. On the
Computational Complexity of Consistent Query
Answers. coRR cs.DB/0204010, 2002. M. Arenas, L.
Bertossi, and J. Chomicki. Consistent Query
Answers in Inconsistent Databases, Proc. ACM
PODS, 1999. Andrea Calì, Domenico Lembo,
Riccardo Rosati. On the decidability and
complexity of query answering over inconsistent
and incomplete databases, Proc. ACM PODS,
2003. Oliver Duschka, Michael Genesereth. Query
Planning with Disjunctive Sources. AAAI Workshop
on AI and Information Integration.
21
Rewritable queries
  • The problem of computing consistentS(q,I) is
    not tractable for the following queries.
  • q9 x,z,y R(x,y,a) Æ R(z,y,b)
  • q9 x,y,z R(x,z) Æ R(x,y) Æ R(z,y)
  • Notice that both queries have joins between
    non-key attributes.

22
Rewritable queries
  • The query we used in the example only has joins
    between a non-key and a key attribute
  • q(in)9 ci R1(in,ci) Æ R2(ci,US)
  • This is the most common situation in practice.

23
A class of rewritable queries
  • Theorem Let q be a conjunctive query containing
    only joins between non-key and key attributes.
    Then, RewriteConjunctive produces a first-order
    rewriting Q of q. Furthermore,
    RewriteConjunctive runs in polynomial time in the
    size of q.

24
Applications
  • Data Integration
  • Data Exchange (Clio,)
  • Data Cleaning (dynamic data cleaning)
  • Peer-to-peer data exchange

25
Example A tractable query
Are there two employees with the same salary?
Inconsistent instance
Graph of the inconsistent instance
1000
John
Mary
2000
Anna
3000
Employee ! Salary
26
Example A tractable query
)
1000
John
Mary
2000
Anna
3000
27
Example A tractable query
)
1000
John
Mary
2000
Anna
3000
28
Example A tractable query
(
1000
John
Mary
2000
Anna
3000
29
Example A tractable query
(
1000
John
Mary
2000
Anna
3000
30
Generalization
  • Arbitrary conjunctive queries with inequalities
    on one binary relation.
  • Reduction to degree-constrained subgraph problem,
    which can be solved by network-flow techniques.
    The running time of the algorithm is O(nk2).

31
Practical Considerations (I)
Conflicts are usually confined to a small portion
of the database
Robert
4000
Fred
5000
Paul
6000
7000
Peter
1000
John
2000
Mary
Anna
3000
32
Practical Considerations (I)
Conflicts are usually confined to a small portion
of the database
1000
John
2000
Mary
Anna
3000
33
Practical Considerations (II)
Reasonable assumption constant number of
conflicts per key.
Financial
Global Database
Employee ! Salary
Human Resources
Employee ! Salary
Employee ! Salary
34
Practical Considerations (II)
  • Assuming bounded number of conflicts per key,
  • queries with some free variables are easier than
    boolean queries!
  • Are there three employees with the same salary?
  • Get all employees who have the same salary as two
    other employees
  • Q can be solved in O(n2), as opposed to O(n3)
    for Q

35
Motivation (Data Exchange)
Source 1 Institute City
Target Institute City
Institute ! City
Institute ! City
Source 2 Institute City
Institute ! City
36
Motivation (Data Exchange)
Source1
Target
Institute ! City
Source2
Institute ! City
Mapping Source1(in,ci) µ Target(in,ci) Source2(in,
ci) µ Target(in,ci)
Institute ! City
37
Motivation (Data Exchange)
Source1
Target
Institute ! City
Source2
Institute ! City
Institute ! City
38
Motivation (Data Exchange)
  • Approaches in data exchange
  • Say that there is no target database for the
    given sources.
  • Construct a target database, disregarding the
    target constraints. The database may be
    inconsistent wrt the constraints.

Institute ! City
39
Inconsistent Databases
  • Both examples have in common that we end up with
    a database that is inconsistent wrt a set of
    constraints.

40
Repairs
  • Definition Let S be set of integrity
    constraints, and I be database instance. We say
    that a database instance I is a repair of I wrt
    S iff the following conditions hold
  • I ² S, and
  • D(I,I) is minimal under set inclusion in the
    class of instances that satisfy S, that is, there
    is no instance I' such that I' ² S and
    D(I,I) ½ D(I,I)

41
Consistent Query Answers
  • Definition Let I be a database instance,
    possibly not satisfying a set S of integrity
    constraints. Let q be a query. We say that a
    tuple t is a consistent answer wrt S, denoted t 2
    consistentS(q,I), if for every repair I of I wrt
    S, t 2 q(I).

42
Query Rewriting An Example
  • q(in)9 ci R1(in,ci) Æ R2(ci,US)
  • Q(in)9 ci R1(in,ci)

R2
R1
in
43
Query Rewriting An Example
  • q(in)9 ci R1(in,ci) Æ R2(ci,US)
  • Q(in)9 ci R1(in,ci) Æ
  • (8 ciR1(in,ci) !

R2
R1
ci
in
44
Query Rewriting An Example
  • q(in)9 ci R1(in,ci) Æ R2(ci,US)
  • Q(in)9 ci R1(in,ci) Æ
  • (8 ciR1(in,ci) !(9 coR2(ci,co)

R2
R1
ci
in
45
Query Rewriting An Example
R2
R1
ci
co
in
46
Motivation (Data Cleaning)
  • We could clean the database

but we might miss some answers. Is TO Lab an
IBM institute?
YES
NO
Institute ! City
47
Motivation (Data Cleaning)
  • Or we may not have enough information to do the
    cleaning

Institute ! City
48
Repairs
Repairs must differ minimally from the
inconsistent database
The following is not a repair
Inconsistent database
Institute ! City
49
Consistent Query Answers
  • qGet all IBM institutes

Repair 1
q
Repair 2
q
  • consistentS(q,I)(TO Lab),(Almaden)

50
Consistent Query Answering
  • Based on the notion of certain answers.
  • But the possible worlds are the repairs.
  • Intuition
  • Input Query q.
  • Apply q to every repair, and obtain a query
    result from each one.
  • We say that a tuple t is in the consistent answer
    if it appears in every query answer.

51
SQL Rewriting
  • Schemas R1(IN,CI) R2(CI,CO)
  • SELECT r1.IN
  • FROM R1 r1 JOIN R2 r2 ON r1.CIr2.CI
  • WHERE r2.CO canada AND
  • NOT EXISTS
  • (SELECT
  • FROM R1 r1 JOIN R2 r2
  • ON r1.CI r2.CI
  • WHERE r1.INr1.IN AND r2.CO ltgtcanada)
Write a Comment
User Comments (0)
About PowerShow.com