Title: A Scalable Algorithm for Answering Queries Using Views
1 A Scalable Algorithm for Answering Queries Using Views
- Rachel Pottinger
- Qualifying Exam
- October 29, 1999
- Advisor: Alon Levy
2 Answering Queries Using Views
- Problem: access views instead of the original relations
- Useful in data integration and query optimization
- NP-Complete
- Many papers on the subject
- No empirical testing of algorithms
3 Data Integration: Query Reformulation
- Data sources are pre-calculated views
- Views are not complete
- Get the most answers possible given the views
- Many data sources
[Figure: "Car sale information" with three data sources: Ford cars (dealer prices, sticker prices, inventory); Cheap cars (prices, manufacturer); Used cars (prices, dealer, year)]
4 Data Integration Example
Query: find the prices of cars that we can buy at cost
- Query over the database relations:
- Q(cost) :- dealercost(car, cost), stickerprice(car, cost)
- Views:
- V1(price1, price2) :- dealercost(car, price1), stickerprice(car, price2), maker(car, Ford)
- V2(cost) :- dealercost(car, cost), stickerprice(car, cost), cheap(car)
- In each view, the head variables are distinguished; car appears only in the body, so it is existential
- Conjunctive rewritings, whose union is the maximally contained rewriting:
- Q1(cost) :- V1(cost, cost)
- Q2(cost) :- V2(cost)
5 Outline
- Previous algorithms
- Bucket Algorithm [Levy, Rajaraman, Ordille, 1996]
- Inverse Rules [Duschka, Genesereth, 1997]
- Minimum Necessary Connections (MiniCon) Algorithm
- Experimental evaluation
- Extension to arithmetic comparisons
- Conclusions and future work
6 The Bucket Algorithm
- Introduced as part of the Information Manifold system
- Treats subgoals individually
7 Bucket Algorithm: Populating Buckets
- For each subgoal in the query, place the relevant views in that subgoal's bucket
- Inputs:
- Q(x) :- r1(x,y), r2(y,x)
- V1(a) :- r1(a,b)
- V2(d) :- r2(c,d)
- V3(f) :- r1(f,g), r2(g,f)
Buckets: r1(x,y): V1(x), V3(x); r2(y,x): V2(x), V3(x)
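A minimal Python sketch of bucket population for the inputs above, assuming a hypothetical tuple encoding (an atom is `(predicate, args)`, a query or view is `(head_vars, body)`) and a hypothetical function name `populate_buckets`. The relevance test is simplified: a view subgoal qualifies if it has the same predicate and every distinguished query variable lands on a distinguished view variable.

```python
from itertools import count

# Hypothetical encoding: an atom is (predicate, args); a query or view is (head_vars, body).
QUERY = (("x",), [("r1", ("x", "y")), ("r2", ("y", "x"))])
VIEWS = {
    "V1": (("a",), [("r1", ("a", "b"))]),
    "V2": (("d",), [("r2", ("c", "d"))]),
    "V3": (("f",), [("r1", ("f", "g")), ("r2", ("g", "f"))]),
}


def populate_buckets(query, views):
    """Return one bucket (a list of view atoms) per query subgoal."""
    q_head, q_body = query
    fresh = count()
    buckets = []
    for pred, q_args in q_body:
        bucket = []
        for name, (v_head, v_body) in views.items():
            for v_pred, v_args in v_body:
                if v_pred != pred or len(v_args) != len(q_args):
                    continue
                mapping = {}  # view variable -> query variable
                consistent = all(mapping.setdefault(v, q) == q for q, v in zip(q_args, v_args))
                # A distinguished query variable must land on a distinguished view variable.
                exported = all(v in v_head for q, v in zip(q_args, v_args) if q in q_head)
                if consistent and exported:
                    # View head variables this subgoal does not touch get fresh names.
                    args = tuple(mapping[h] if h in mapping else f"_{next(fresh)}" for h in v_head)
                    if (name, args) not in bucket:
                        bucket.append((name, args))
        buckets.append(bucket)
    return buckets


for (pred, args), bucket in zip(QUERY[1], populate_buckets(QUERY, VIEWS)):
    print(f"{pred}({','.join(args)}):", [f"{n}({','.join(a)})" for n, a in bucket])
```

The result matches the buckets listed above. Each bucket is filled independently, which is exactly why the next step has to examine every combination.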
8 Combining Buckets
- For every combination in the Cartesian product of the buckets, check containment in the query
- Candidate rewritings:
- Q1(x) :- V1(x), V2(x) (not contained)
- Q2(x) :- V1(x), V3(x) (contained)
- Q3(x) :- V3(x), V2(x) (contained)
- Q4(x) :- V3(x), V3(x) (contained)
The Bucket Algorithm checks all possible combinations
Buckets: r1(x,y): V1(x), V3(x); r2(y,x): V2(x), V3(x)
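The combination step can be sketched in the same hypothetical encoding. The sketch forms the Cartesian product of the buckets, unfolds each candidate into its view definitions (renaming existential view variables apart), and searches for a containment mapping from the query into that expansion. The helper names are illustrative, not taken from the original system.

```python
from itertools import product

# Hypothetical encoding: an atom is (predicate, args); a view is (head_vars, body).
VIEWS = {
    "V1": (("a",), [("r1", ("a", "b"))]),
    "V2": (("d",), [("r2", ("c", "d"))]),
    "V3": (("f",), [("r1", ("f", "g")), ("r2", ("g", "f"))]),
}
QUERY = (("x",), [("r1", ("x", "y")), ("r2", ("y", "x"))])
BUCKETS = [[("V1", ("x",)), ("V3", ("x",))], [("V2", ("x",)), ("V3", ("x",))]]


def expand(candidate, views):
    """Unfold each view atom into its definition, renaming existential view variables apart."""
    body = []
    for i, (name, args) in enumerate(candidate):
        v_head, v_body = views[name]
        subst = dict(zip(v_head, args))
        for pred, v_args in v_body:
            body.append((pred, tuple(subst.get(v, f"{v}_{i}") for v in v_args)))
    return body


def maps_into(q_body, target, mapping):
    """Backtracking search for a containment mapping of q_body into the target atoms."""
    if not q_body:
        return True
    (pred, args), rest = q_body[0], q_body[1:]
    for t_pred, t_args in target:
        if t_pred == pred and len(t_args) == len(args):
            m = dict(mapping)
            if all(m.setdefault(a, b) == b for a, b in zip(args, t_args)) and maps_into(rest, target, m):
                return True
    return False


def contained(candidate, head, query, views):
    """True if the expansion of `head :- candidate` is contained in the query."""
    q_head, q_body = query
    return maps_into(q_body, expand(candidate, views), dict(zip(q_head, head)))


for candidate in product(*BUCKETS):
    body = ", ".join(f"{n}({','.join(a)})" for n, a in candidate)
    print(f"Q(x) :- {body}", contained(candidate, ("x",), QUERY, VIEWS))
```

The four candidates come out in the same order as Q1 through Q4. Only the first fails. The algorithm still pays for a full containment check on Q2, Q3 and Q4, even though each of them is contained in the single-atom rewriting Q(x) :- V3(x).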
9 Inverse Rules
- Part of the InfoMaster system
- Inverse rules show how to get database tuples from the views
- Cannot be extended to interpreted predicates
- Stops earlier than the Bucket Algorithm
10 Creating Inverse Rules
For each view V(X) :- r1(X1), ..., rn(Xn) and each j = 1, ..., n, form an inverse rule rj(Xj) :- V(X), replacing each existential variable in Xj with a Skolem term over X
- Inverse Rules:
- IR1: r1(a, sfV1(a)) :- V1(a)
- IR2: r2(sfV2(d), d) :- V2(d)
- IR3: r1(f, sfV3(f)) :- V3(f)
- IR4: r2(sfV3(f), f) :- V3(f)
- Inputs:
- V1(a) :- r1(a,b)
- V2(d) :- r2(c,d)
- V3(f) :- r1(f,g), r2(g,f)
sfV1, sfV2 and sfV3 are Skolem functions
11 Combining Inverse Rules
At query time, evaluate the query over the inverse rules
- Inverse Rules:
- IR1: r1(a, sfV1(a)) :- V1(a)
- IR2: r2(sfV2(d), d) :- V2(d)
- IR3: r1(f, sfV3(f)) :- V3(f)
- IR4: r2(sfV3(f), f) :- V3(f)
- Tuples:
- V1(g)
- V2(h)
- V3(j)
- V3(m)
Query: Q(x) :- r1(x,y), r2(y,x)
Expansion: r1(g, sfV1(g)), r2(sfV2(h), h), r1(j, sfV3(j)), r2(sfV3(j), j), r1(m, sfV3(m)), r2(sfV3(m), m)
Only the V3 tuples satisfy both subgoals, so the answers are x = j and x = m
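To make the query-time step concrete, this sketch hand-encodes IR1 to IR4, writing sfV1_b, sfV2_c and sfV3_g for sfV1, sfV2 and sfV3. It applies the rules to the four view tuples, then evaluates Q(x) :- r1(x,y), r2(y,x) over the derived facts and discards any answer containing a Skolem term. The query evaluation is hard-coded for this one query, and all names are illustrative.

```python
# Inverse rules, hand-encoded as ((head_pred, head_args), (view, view_vars)).
# A Skolem term is (function_name, args).
RULES = [
    (("r1", ("a", ("sfV1_b", ("a",)))), ("V1", ("a",))),
    (("r2", (("sfV2_c", ("d",)), "d")), ("V2", ("d",))),
    (("r1", ("f", ("sfV3_g", ("f",)))), ("V3", ("f",))),
    (("r2", (("sfV3_g", ("f",)), "f")), ("V3", ("f",))),
]
VIEW_TUPLES = [("V1", ("g",)), ("V2", ("h",)), ("V3", ("j",)), ("V3", ("m",))]


def ground(term, env):
    """Instantiate a variable or Skolem term with the bindings from one view tuple."""
    if isinstance(term, str):
        return env[term]
    name, args = term
    return (name, tuple(env[a] for a in args))


def derive_facts(rules, tuples):
    """Apply every inverse rule to every view tuple of the matching view."""
    facts = set()
    for (pred, args), (view, view_vars) in rules:
        for t_view, values in tuples:
            if t_view == view:
                env = dict(zip(view_vars, values))
                facts.add((pred, tuple(ground(a, env) for a in args)))
    return facts


def answer(facts):
    """Evaluate Q(x) :- r1(x,y), r2(y,x), keeping only answers that are not Skolem terms."""
    r2 = {args for pred, args in facts if pred == "r2"}
    return {x for pred, (x, y) in facts if pred == "r1" and (y, x) in r2 and isinstance(x, str)}


print(sorted(answer(derive_facts(RULES, VIEW_TUPLES))))  # ['j', 'm']
```

The V1 and V2 tuples produce facts with different Skolem functions in the join position, so they can never join.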
12 Unfolding rules before tuples
Buckets: r1(x,y): IR1, IR3; r2(y,x): IR2, IR4
- Use unification to see if a rewriting is contained in the query
- No containment check necessary
13 The MiniCon Algorithm
- Concentrate on variables rather than subgoals to create MiniCon Descriptions (MCDs)
- Combine MCDs that only overlap on distinguished view variables
- No containment check!
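14 MiniCon Description Formation
- Form all MiniCon Descriptions (MCDs), each mapping together all query variables that have to be mapped together: if a query variable maps to an existential view variable, every query subgoal containing it must be covered by the same MCD
- Inputs:
- Q(x) :- r1(x,y), r2(y,x)
- V1(a) :- r1(a,b)
- V2(d) :- r2(c,d)
- V3(f) :- r1(f,g), r2(g,f)
MCDs: one MCD for V3, mapping x to f and y to g and covering both subgoals (V1 and V2 yield none, because y would map to an existential variable, b or c, while the view lacks the other subgoal containing y)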
15 MiniCon Combination
- Take all combinations of MCDs that:
- map disjoint sets of subgoals
- map all subgoals of the query
- MCDs: the single V3 MCD covers both subgoals
Rewriting: Q(x) :- V3(x)
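The combination step can be sketched starting from the single MCD of the running example, hand-encoded as `(view, covered subgoal indices, query-to-view mapping)`. The hypothetical `combine` function enumerates sets of MCDs whose subgoal sets are pairwise disjoint and together cover the query. The hypothetical `to_rewriting` function names each view head variable after a query variable that maps to it. Mapping several query variables to one view variable, which MiniCon handles with extra equalities, is not modeled.

```python
from itertools import count

# The running example's MCD: (view, covered subgoal indices, query -> view variable mapping).
VIEW_HEADS = {"V3": ("f",)}
MCDS = [("V3", frozenset({0, 1}), (("x", "f"), ("y", "g")))]


def combine(mcds, n_subgoals):
    """All sets of MCDs with pairwise disjoint subgoal sets that together cover the query."""
    full, results = frozenset(range(n_subgoals)), []

    def search(start, covered, chosen):
        if covered == full:
            results.append(chosen)
            return
        for k in range(start, len(mcds)):
            if not mcds[k][1] & covered:  # disjointness: no subgoal is covered twice
                search(k + 1, covered | mcds[k][1], chosen + [mcds[k]])

    search(0, frozenset(), [])
    return results


def to_rewriting(q_head, chosen, view_heads):
    """Build one conjunctive rewriting from a set of MCDs."""
    fresh = count()
    atoms = []
    for view, _, phi in chosen:
        back = {}
        for q, v in phi:
            back.setdefault(v, q)  # name each view variable after a query variable mapped to it
        args = [back[h] if h in back else f"_{next(fresh)}" for h in view_heads[view]]
        atoms.append(f"{view}({', '.join(args)})")
    return f"Q({', '.join(q_head)}) :- {', '.join(atoms)}"


for chosen in combine(MCDS, 2):
    print(to_rewriting(("x",), chosen, VIEW_HEADS))  # Q(x) :- V3(x)
```

No containment check appears anywhere. Disjoint coverage of the query subgoals is enough for the combined MCDs to form a valid rewriting.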
16 Experimental Evaluation
- Tested performance and scale-up of:
- Bucket Algorithm
- Inverse Rules extended with unification
- MiniCon Algorithm
- MiniCon at least as good in all cases tested, much better in some
- Show results for chain queries:
- Q(a) :- r1(a,b), r2(b,c), r3(c,d), r4(d,e)
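The chain-query shape shown above can be reproduced with a tiny hypothetical helper. It only builds the query text and is not the generator used in the experiments, whose parameters are not given here.

```python
def chain_query(n):
    """Build the chain query Q(a) :- r1(a,b), r2(b,c), ..., rn(...) with only `a` distinguished."""
    if not 1 <= n <= 25:
        raise ValueError("this sketch names variables a..z, so 1 <= n <= 25")
    vs = [chr(ord("a") + i) for i in range(n + 1)]
    body = ", ".join(f"r{i + 1}({vs[i]},{vs[i + 1]})" for i in range(n))
    return f"Q(a) :- {body}"


print(chain_query(4))  # Q(a) :- r1(a,b), r2(b,c), r3(c,d), r4(d,e)
```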
17 Many Rewritings
18 Few rewritings, very structured query and views
19 Few rewritings, less structured views
20 Extension: Interpreted Predicates
- Problem is in general undecidable
- We looked at subgoals of the form:
- var < constant or var > constant
- If a query variable with such a comparison maps to an existential view variable, require that the view's interpreted predicates imply the query's
- Ex: Q(x) :- r1(x,y), y > 17
- V1(a) :- r1(a,b), b > 18 (b > 18 implies y > 17, so Q'(x) :- V1(x) is a valid rewriting)
- Guaranteed to be sound
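A sketch of the implication test for the restricted comparisons (var < constant, var > constant), assuming a hypothetical encoding of a comparison as `(op, constant)`. The test is conservative: it accepts a mapping only when a single view bound already implies the query bound. It may therefore reject some mappings that an exact test would accept, but it never accepts an unsound one.

```python
# A comparison on one variable is (op, constant) with op in {"<", ">"}.
def implies(view_preds, query_pred):
    """Conservative test: some single view bound on the variable implies the query bound."""
    op, c = query_pred
    if op == ">":
        return any(vop == ">" and vc >= c for vop, vc in view_preds)
    return any(vop == "<" and vc <= c for vop, vc in view_preds)


def can_map(query_pred, view_preds, view_var_is_existential):
    """May a query variable carrying query_pred map to this view variable?"""
    if not view_var_is_existential:
        # A distinguished view variable is visible, so the rewriting can apply the comparison itself.
        return True
    # An existential view variable is hidden, so the view must already enforce the comparison.
    return implies(view_preds, query_pred)


# Slide example: Q(x) :- r1(x,y), y > 17 and V1(a) :- r1(a,b), b > 18
print(can_map((">", 17), [(">", 18)], view_var_is_existential=True))  # True
print(can_map((">", 17), [(">", 16)], view_var_is_existential=True))  # False
```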
21 Interpreted Predicate Results
22 Future Work
- Query Optimization
- Look for the fastest way to answer the query
- Assume that all views are complete
- Require equivalent rewritings
- Need to allow MCDs to overlap on the subgoals they map
- A fuller comparison of the algorithms on interpreted predicates
23 Conclusions
- Scalability of previous algorithms understood
- MiniCon Algorithm invented
- First experimental comparison of algorithms for answering queries using views
- Extensions to binding patterns and interpreted predicates
- New maximally contained rewriting form
24 Maximally Contained Rewritings
- $Q'$ is a maximally contained rewriting of a query $Q$ using the views $\mathcal{V} = \{V_1, \ldots, V_n\}$ if:
- For any database $D$ and extensions $v_1, \ldots, v_n$ of the views such that $v_i \subseteq V_i(D)$ for $1 \le i \le n$, we have $Q'(v_1, \ldots, v_n) \subseteq Q(D)$
- There is no other query $Q_1$ such that, for every such $D$ and $v_1, \ldots, v_n$:
- (1) $Q'(v_1, \ldots, v_n) \subseteq Q_1(v_1, \ldots, v_n)$
- (2) $Q_1(v_1, \ldots, v_n) \subseteq Q(D)$
- and there exists at least one database for which the containment in (1) is strict
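25 Containment Checks
- $Q_1 \subseteq Q_2$ if, on every database, the answer to $Q_1$ is a subset of the answer to $Q_2$
- $m$ is a containment mapping from $Vars(Q_2)$ to $Vars(Q_1)$ if:
- $m$ maps every subgoal in the body of $Q_2$ to a subgoal in the body of $Q_1$
- $m$ maps the head of $Q_2$ to the head of $Q_1$
- $Q_1 \subseteq Q_2$ if and only if there is a containment mapping from $Vars(Q_2)$ to $Vars(Q_1)$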
26 Inverse Rules With Unification
- Find all inverse rules that match each query subgoal; place them in the bucket for that subgoal
- For each rule in the first bucket:
- For each other subgoal i, attempt to unify the rules so far with every element in the bucket for i
- If we cannot unify with anything in that bucket, break out of the loop; otherwise, recurse
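The steps above are sketched below on the running example's inverse rules, which are hand-encoded with sfV1_b, sfV2_c and sfV3_g standing for sfV1, sfV2 and sfV3. Each query subgoal's bucket holds the inverse rules whose head predicate matches. A branch ends as soon as no rule in the next bucket unifies, so no separate containment check runs. The unifier is a standard Robinson-style one with an occurs check, and the function names are hypothetical.

```python
# Terms: a variable is a str; a Skolem term is (function_name, tuple_of_terms).
RULES = [  # (name, head_atom, (view, view_vars))
    ("IR1", ("r1", ("a", ("sfV1_b", ("a",)))), ("V1", ("a",))),
    ("IR2", ("r2", (("sfV2_c", ("d",)), "d")), ("V2", ("d",))),
    ("IR3", ("r1", ("f", ("sfV3_g", ("f",)))), ("V3", ("f",))),
    ("IR4", ("r2", (("sfV3_g", ("f",)), "f")), ("V3", ("f",))),
]
QUERY = (("x",), [("r1", ("x", "y")), ("r2", ("y", "x"))])


def walk(t, s):
    while isinstance(t, str) and t in s:
        t = s[t]
    return t


def resolve(t, s):
    t = walk(t, s)
    return t if isinstance(t, str) else (t[0], tuple(resolve(a, s) for a in t[1]))


def occurs(v, t, s):
    t = walk(t, s)
    return t == v or (not isinstance(t, str) and any(occurs(v, a, s) for a in t[1]))


def unify(t1, t2, s):
    """Unify two terms under substitution s; return the extended substitution or None."""
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if isinstance(t2, str):
        t1, t2 = t2, t1  # make t1 the variable when either side is one
    if isinstance(t1, str):
        return None if occurs(t1, t2, s) else {**s, t1: t2}
    if t1[0] != t2[0] or len(t1[1]) != len(t2[1]):
        return None  # different Skolem functions never unify
    for a, b in zip(t1[1], t2[1]):
        s = unify(a, b, s)
        if s is None:
            return None
    return s


def rename(t, suffix):
    return t + suffix if isinstance(t, str) else (t[0], tuple(rename(a, suffix) for a in t[1]))


def rewritings(query, rules):
    head, body = query
    buckets = [[r for r in rules if r[1][0] == pred and len(r[1][1]) == len(args)]
               for pred, args in body]
    found = []

    def search(i, s, chosen):
        if i == len(body):
            q_head = tuple(resolve(h, s) for h in head)
            atoms = list(dict.fromkeys((v, tuple(resolve(x, s) for x in xs)) for v, xs in chosen))
            # Discard unifiers that bind a head or view argument to a Skolem term.
            if all(isinstance(t, str) for t in q_head + tuple(a for _, xs in atoms for a in xs)):
                found.append((q_head, atoms))
            return
        for _, (_, h_args), (view, v_vars) in buckets[i]:
            s2 = s
            for q, h in zip(body[i][1], h_args):
                s2 = unify(q, rename(h, f"_{i}"), s2)
                if s2 is None:
                    break
            if s2 is not None:  # otherwise this branch stops without visiting later buckets
                search(i + 1, s2, chosen + [(view, tuple(v + f"_{i}" for v in v_vars))])

    search(0, {}, [])
    return found


for q_head, atoms in rewritings(QUERY, RULES):
    body = ", ".join(f"{v}({', '.join(args)})" for v, args in atoms)
    print(f"Q({', '.join(q_head)}) :- {body}")  # Q(x) :- V3(x)
```

IR1 fails to unify with both IR2 and IR4 because their Skolem functions differ. Only the IR3/IR4 pair survives, giving the same rewriting that MiniCon produces.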
27 Correctness requirements
- We need both soundness and completeness
- A rewriting is sound if there is a valid containment mapping from the variables of the query to the variables of the rewriting's expansion (its view definitions unfolded)
- For completeness, we need only check rewritings whose length is less than or equal to that of the query
28 Extensions to XML
- Need to choose a query language
- Containment checks should still hold
- Need to make sure that restructured elements are distinguished
- May be even more scalable than Inverse Rules and the Bucket Algorithm