Title: On Answering Queries in the Presence of Limited Access Patterns
1. On Answering Queries in the Presence of Limited Access Patterns
- Chen Li
- Stanford University
- joint work with Edward Chang, UC Santa Barbara
2. A movie database
Q(Award) :- r(henry fonda, Movie), s(Movie, Award)
[Figure: example tables, including s(Movie, Award)]
3. Limited access patterns
- r(Star, Movie): a star must be provided.
- s(Movie, Award): a movie must be provided.
4. Answering Q given the restrictions
Q(Award) :- r(henry fonda, Movie), s(Movie, Award)
[Figure: example tables, including s(Movie, Award)]
5. The answer is complete
- We did not retrieve all the tuples from the relations.
- Still, we computed all tuples in the answer to the query.
Q(Award) :- r(henry fonda, Movie), s(Movie, Award)
[Figure: example tables, including s(Movie, Award)]
6. Change the restriction
- We cannot compute the complete answer to Q.
- There can always be some tuples that are not retrievable.
Q(Award) :- r(henry fonda, Movie), s(Movie, Award)
[Figure: example tables, including s(Movie, Award)]
7. General questions
- Given a query on relations with limited access patterns, can we compute its complete answer by accessing the relations with legal patterns?
- Stable queries
- Different classes of queries
- Another problem studied: testing query containment in the presence of binding patterns.
8. Rest of the talk
- Binding patterns, query stability
- Testing stability of queries
- Conjunctive queries
- Unions of conjunctive queries
- Conjunctive queries with arithmetic comparisons
- Datalog queries
- Dynamic computability of the complete answer to conjunctive queries
- Conclusion and related work
9. (I) Binding patterns
- Attributes with adornments
- b: bound
- f: free
- Example
- r(Star^b, Movie^f), s(Movie^b, Award^f)
- A relation can have multiple binding patterns.
10.
- Reasons for the restrictions
- Web search forms
- Legacy databases
- Security concerns
- Observations
- If a relation does not have an all-free
binding pattern, then after certain queries are
sent to this relation, there can always be some
tuples that have not been retrieved.
11. Query stability
- A query Q on relations with binding patterns is stable if, for any database, we can compute Q's complete answer by accessing the relations with legal patterns.
- The complete answer is the answer we could compute if we could retrieve all the tuples from the relations.
- Using partial tuples to derive the complete answer: we need reasoning.
12. Assumptions about bindings
- Use values from Q and results from the relations as bindings
- The definition says "for any database"
- Relations not in the query can be assumed to be empty
- Not allowed: trying arbitrary strings as bindings to access the relations
- Does not terminate
- Impractical
13. (II) Testing stability of queries
- Conjunctive query (CQ)
- q(X) :- g1(X1), ..., gn(Xn)
- Feasible order of some subgoals of a CQ Q
- Each subgoal in the order is executable
- That is, we have enough bound variables to satisfy one binding pattern of the relation
- Example
- Q(Award) :- r(henry fonda, Movie), s(Movie, Award)
14. Feasible CQs
- A CQ is feasible if it has a feasible order of all its subgoals.
- Lemma: A feasible CQ is stable.
- Testing feasibility of a CQ
- A greedy algorithm: Inflationary
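The greedy feasibility test can be sketched roughly as follows. This is only an illustration of the idea, not the talk's actual Inflationary procedure: the names `is_variable`, `executable`, and `inflationary`, the encoding of binding patterns as strings of `b`/`f`, and the convention that capitalized terms are variables are all assumptions made for this sketch.

```python
from typing import Dict, List, Sequence, Set, Tuple

Subgoal = Tuple[str, Tuple[str, ...]]  # (relation name, arguments)


def is_variable(term: str) -> bool:
    # Assumed convention (as in the slides' examples): variables are capitalized.
    return term[:1].isupper()


def executable(subgoal: Subgoal, bound: Set[str],
               patterns: Dict[str, List[str]]) -> bool:
    """True if some binding pattern of the relation has every 'b' position
    filled by a constant or by an already-bound variable."""
    relation, args = subgoal
    return any(
        all(adorn == "f" or not is_variable(arg) or arg in bound
            for adorn, arg in zip(pattern, args))
        for pattern in patterns[relation]
    )


def inflationary(body: Sequence[Subgoal], patterns: Dict[str, List[str]]):
    """Greedily grow an executable order; return (order, leftover subgoals)."""
    bound: Set[str] = set()
    order: List[Subgoal] = []
    remaining = list(body)
    progress = True
    while remaining and progress:
        progress = False
        for subgoal in list(remaining):  # iterate over a copy while removing
            if executable(subgoal, bound, patterns):
                order.append(subgoal)
                remaining.remove(subgoal)
                bound.update(a for a in subgoal[1] if is_variable(a))
                progress = True
    return order, remaining


PATTERNS = {"r": ["bf"], "s": ["bf"]}  # r(Star^b, Movie^f), s(Movie^b, Award^f)
Q = [("s", ("Movie", "Award")), ("r", ("henry fonda", "Movie"))]
order, leftover = inflationary(Q, PATTERNS)
print(order)     # r first, then s
print(leftover)  # empty list: the CQ is feasible
```

An empty leftover list means every subgoal found a place in the order, so the CQ is feasible under the given patterns.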
15. What if Q is not feasible?
- Q'(Award) :- r(henry fonda, Movie), s(Movie, Award), r(Star, Movie)
- Not feasible: variable Star cannot be bound
- Equivalent to the old query
- Q(Award) :- r(henry fonda, Movie), s(Movie, Award)
- The new query Q' is stable!
16. Testing stability of a CQ
- Theorem
- A CQ Q is stable iff its minimal equivalent Qm is feasible.
- Minimal equivalent query Qm
- Qm is unique (up to renaming of variables)
17. Main idea of the proof
- Construct two databases of the relations
- They have the same observable tuples, but yield different answers to the query
- Thus, we cannot tell whether the computed answer is complete or not
[Figure: two databases with the same observable tuples but different answers to Q]
18. Two algorithms for CQs
- Algorithm CQStable
- Minimize Q to get its minimal equivalent Qm
- Test feasibility of Qm by calling Inflationary
- Algorithm CQStable*
- Compute all executable subgoals of Q
- If all subgoals become executable, then Q is stable
- Otherwise, test equivalence between Q and the new query with the executable subgoals
- CQStable* is more efficient than CQStable
- Testing stability of a CQ is NP-complete.
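The second algorithm listed above (the one that first computes the executable subgoals) might look roughly like the sketch below. It is not the authors' code. It assumes that Q is reported stable exactly when the equivalence test succeeds, which the slide implies but does not state, and it uses a brute-force containment-mapping search for that test. All function names, and the convention that capitalized terms are variables, are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Subgoal = Tuple[str, Tuple[str, ...]]  # (relation name, arguments)


def is_variable(term: str) -> bool:
    return term[:1].isupper()  # assumed convention: capitalized = variable


def executable_subgoals(body: Sequence[Subgoal],
                        patterns: Dict[str, List[str]]) -> List[Subgoal]:
    """Greedily collect every subgoal that can become executable."""
    bound, done, remaining = set(), [], list(body)
    progress = True
    while remaining and progress:
        progress = False
        for rel, args in list(remaining):
            if any(all(ad == "f" or not is_variable(a) or a in bound
                       for ad, a in zip(p, args)) for p in patterns[rel]):
                done.append((rel, args))
                remaining.remove((rel, args))
                bound.update(a for a in args if is_variable(a))
                progress = True
    return done


def extend(mapping: dict, src_args, dst_args) -> Optional[dict]:
    """Extend a variable mapping so src_args maps onto dst_args, or None."""
    if len(src_args) != len(dst_args):
        return None
    new = dict(mapping)
    for s, d in zip(src_args, dst_args):
        if is_variable(s):
            if new.setdefault(s, d) != d:
                return None
        elif s != d:  # constants must map to themselves
            return None
    return new


def has_containment_mapping(src_head, src_body, dst_head, dst_body) -> bool:
    """Backtracking search; a mapping exists iff the dst query is contained
    in the src query."""
    def search(i: int, mapping: dict) -> bool:
        if i == len(src_body):
            return True
        rel, args = src_body[i]
        for dst_rel, dst_args in dst_body:
            if dst_rel == rel:
                new = extend(mapping, args, dst_args)
                if new is not None and search(i + 1, new):
                    return True
        return False
    start = extend({}, src_head, dst_head)
    return start is not None and search(0, start)


def cq_stable_star(head, body, patterns) -> bool:
    exe = executable_subgoals(body, patterns)
    if len(exe) == len(body):
        return True  # feasible, hence stable
    exe_vars = {a for _, args in exe for a in args if is_variable(a)}
    if any(is_variable(h) and h not in exe_vars for h in head):
        return False  # the reduced query would not bind the head
    # Q is trivially contained in the reduced query; check the other direction.
    return has_containment_mapping(head, body, head, exe)


PATTERNS = {"r": ["bf"], "s": ["bf"]}
body = [("r", ("henry fonda", "Movie")), ("s", ("Movie", "Award")),
        ("r", ("Star", "Movie"))]
print(cq_stable_star(("Award",), body, PATTERNS))  # True
```

On the three-subgoal movie query, the subgoal r(Star, Movie) is never executable, but it maps onto r(henry fonda, Movie), so the reduced query is equivalent and the sketch reports stability. The backtracking search is exponential in the worst case, in line with the NP-completeness noted on the slide.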
19. Other classes of queries
- Unions of CQs: two algorithms
- CQs with arithmetic comparisons
- An algorithm for testing stability
- Datalog queries
- Undecidable
- Give a sufficient condition for stability of Datalog queries
20. (III) Dynamic computability of complete answer to CQs
- For a nonstable CQ Q, Q's complete answer might still be computable for certain databases.
21. An example
- Q1: ans(B) :- r(a, B, C), s(C, D)
- Not stable
- For the following database, we can still compute Q1's complete answer {b1, b2}.
r(A^b, B^f, C^f): (a, b1, c1), (a, b2, c2), (a, b2, c3)
s(C^f, D^b): (c1, d1), (c2, d2)
p(D^f): (d1), (d2)
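To make the example concrete, here is a small simulation of the restricted accesses. It assumes the flattened table on this slide reads as r = {(a, b1, c1), (a, b2, c2), (a, b2, c3)}, s = {(c1, d1), (c2, d2)}, p = {d1, d2}, with r requiring A, s requiring D, and p freely accessible. The helper names are hypothetical, and the completeness check is one plausible line of reasoning rather than the authors' procedure.

```python
# Source contents as read from the slide (an assumption; the table was flattened).
R = {("a", "b1", "c1"), ("a", "b2", "c2"), ("a", "b2", "c3")}
S = {("c1", "d1"), ("c2", "d2")}
P = {("d1",), ("d2",)}


def access_r(a):
    """r(A^b, B^f, C^f): the caller must supply a value for A."""
    return {t for t in R if t[0] == a}


def access_s(d):
    """s(C^f, D^b): the caller must supply a value for D."""
    return {t for t in S if t[1] == d}


def access_p():
    """p(D^f): no binding required."""
    return set(P)


def answer_q1(a="a"):
    """Q1: ans(B) :- r(a, B, C), s(C, D), using only legal accesses."""
    r_seen = access_r(a)                   # every r tuple with A = a
    d_values = {d for (d,) in access_p()}  # D bindings obtained from p
    s_seen = set()
    for d in d_values:
        s_seen |= access_s(d)
    found = {b for (_, b, c) in r_seen for (c2, _) in s_seen if c == c2}
    # Any answer B must occur in some r(a, B, C) tuple, and those were all
    # retrieved, so the answer is provably complete when nothing is missing.
    candidates = {b for (_, b, _) in r_seen}
    return found, found == candidates


found, complete = answer_q1()
print(sorted(found), complete)  # ['b1', 'b2'] True
```

The tuple (a, b2, c3) has no matching s tuple among the retrieved ones, but b2 is already in the answer through c2, so any hidden s tuple for c3 could not change the result.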
22. Change the head argument
- Q2: ans(D) :- r(a, B, C), s(C, D)
- Still not stable
- For the same database, we cannot compute Q2's complete answer.
r(A^b, B^f, C^f): (a, b1, c1), (a, b2, c2), (a, b2, c3)
s(C^f, D^b): (c1, d1), (c2, d2)
p(D^f): (d1), (d2)
23. Difference between Q1 and Q2
- Binding patterns: r(A^b, B^f, C^f), s(C^f, D^b)
- Q1: ans(B) :- r(a, B, C), s(C, D)
- Q2: ans(D) :- r(a, B, C), s(C, D)
- Q1's head argument B is bound by the executable subgoal r(a, B, C).
- Q2's head argument D is not bound by the executable subgoal r(a, B, C).
24. Generalization
- q(X) :- g1(X1), ..., gk(Xk), gk+1(Xk+1), ..., gn(Xn)
- Executable subgoals E: g1(X1), ..., gk(Xk)
- If all arguments in X are bound in E
- we might compute its complete answer.
- The computability is database dependent.
- If some arguments in X are not bound in E
- we can never compute its complete answer.
- Unless the relation after the subgoals in E is
empty.
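The case split above can be sketched as a small classifier. The function name `classify`, the three return labels, and the convention that capitalized terms are variables are assumptions made for this illustration. The sketch also ignores the empty-relation exception mentioned in the last bullet, since that depends on the data rather than on the query.

```python
from typing import Dict, List, Sequence, Tuple

Subgoal = Tuple[str, Tuple[str, ...]]  # (relation name, arguments)


def is_variable(term: str) -> bool:
    return term[:1].isupper()  # assumed convention: capitalized = variable


def classify(head: Sequence[str], body: Sequence[Subgoal],
             patterns: Dict[str, List[str]]) -> str:
    """Classify a CQ by whether its head arguments are bound in E."""
    bound, remaining = set(), list(body)
    progress = True
    while remaining and progress:  # greedily compute the executable subgoals E
        progress = False
        for rel, args in list(remaining):
            if any(all(ad == "f" or not is_variable(a) or a in bound
                       for ad, a in zip(p, args)) for p in patterns[rel]):
                remaining.remove((rel, args))
                bound.update(a for a in args if is_variable(a))
                progress = True
    if not remaining:
        return "feasible"            # every subgoal is executable
    if all(h in bound for h in head if is_variable(h)):
        return "database-dependent"  # the complete answer might be computable
    return "not-computable"          # some head argument is never bound in E


PATTERNS = {"r": ["bff"], "s": ["fb"]}  # r(A^b, B^f, C^f), s(C^f, D^b)
BODY = [("r", ("a", "B", "C")), ("s", ("C", "D"))]
print(classify(("B",), BODY, PATTERNS))  # database-dependent
print(classify(("D",), BODY, PATTERNS))  # not-computable
```

With these patterns only r(a, B, C) is executable, so a head over B falls in the database-dependent case while a head over D does not.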
25. A decision tree
- It guides the planning process of computing the complete answer to a query.
- Two approaches while traversing the tree
- optimistic
- pessimistic
26. Conclusion
- Stability of queries with binding patterns
- Various classes of queries
- CQs (two algorithms)
- Unions of CQs (two algorithms)
- CQs with arithmetic comparisons (one algorithm)
- Datalog (undecidable)
- Dynamic computability of a CQ's complete answer
- Another contribution: a decidability result for testing relative query containment with binding restrictions
27. Related work
- Answering queries using views with binding patterns [RSU95]
- Query optimization [YLUGM99, FLMS99]
- Computing maximal answer to queries [DL97, LC00]
- Our work considers whether the complete answer to a query is computable.