
On the Inverse rules algorithm

- It is guaranteed to compute the certain answers
- But what about its efficiency?
- As presented, it computes tuples using views that cannot contribute to the rewriting, and then discards these tuples
- We show examples, and then how to address the problems

Example
- A db: parenthood relation par(c, p)
- A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
- A query Q: q(X, Y) :- par(X, Z), par(Z, Y) // find grandchildren

The algorithm inverts the view:

par(C, f(C, G)) :- v(C, G)
par(f(C, G), G) :- v(C, G)

Given n tuples in the view, it produces 2n par tuples, then joins them, then discards the results that contain f(-,-).

The bucket algorithm will spend more time on rewriting, find

Q: q(X, Y) :- v(X, Y)

and then output the n results.
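The wasted work is easy to see by running the inverse rules directly. Below is a minimal Python sketch, assuming Skolem terms f(C, G) are encoded as tuples ("f", C, G) and using a hypothetical two-tuple view extent; it builds the 2n par facts, joins them as Q does, and then drops every answer that still contains f(-,-).

```python
# Minimal sketch (assumptions: Skolem terms f(C, G) are encoded as tuples
# ("f", C, G); constants are plain strings; the view extent is hypothetical).

def is_skolem(term):
    """True if the term was built with the function symbol f."""
    return isinstance(term, tuple)


def invert(view_tuples):
    """Apply the inverse rules
         par(C, f(C, G)) :- v(C, G)
         par(f(C, G), G) :- v(C, G)
    to every view tuple, producing 2n par facts from n tuples."""
    par = set()
    for c, g in view_tuples:
        p = ("f", c, g)  # Skolem term for the unknown middle generation
        par.add((c, p))
        par.add((p, g))
    return par


def evaluate_query(par):
    """q(X, Y) :- par(X, Z), par(Z, Y), as a naive nested-loop join."""
    return {(x, y) for (x, z) in par for (z2, y) in par if z == z2}


def certain_answers(view_tuples):
    par = invert(view_tuples)
    joined = evaluate_query(par)
    # Discard every answer that still contains a Skolem term f(-,-).
    answers = {t for t in joined if not any(is_skolem(a) for a in t)}
    return par, joined, answers


if __name__ == "__main__":
    v = {("ann", "carl"), ("carl", "eve")}  # hypothetical view contents
    par, joined, answers = certain_answers(v)
    print(len(par), len(joined), sorted(answers))
    # prints: 4 3 [('ann', 'carl'), ('carl', 'eve')]
```

With these two view tuples, 4 par facts are built and the join yields 3 tuples, one of which mentions f and is thrown away; the 2 surviving answers are exactly the view tuples, which the bucket rewriting q(X, Y) :- v(X, Y) returns directly.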

Example (university db)
- Views
  - v1(s, c, q, t) :- registered(s, c, q), course(c, t), c ≥ 500, q ≥ a98
  - v2(s, p, c, q) :- registered(s, c, q), teaches(p, c, q)
  - v3(s, c) :- registered(s, c, q), q …
  - v4(p, c, t, q) :- registered(s, c, q), teaches(p, c, q), course(c, t), q …
- Query
  - q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c ≥ 300, q ≥ a95

Inverting v3:

registered(s, c, f(s, c)) :- v3(s, c)

This may produce any number of facts for registered, but for this query none can be used. Why?

- v3(s, c) :- registered(s, c, q), q …
- q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c ≥ 300, q ≥ a95
- How should the constraint on q in v3 be represented?
  - Could export it as a constraint on f(s, c), which conflicts with f(s, c) ≥ a95 in the query (how is q in the query transformed to f(s, c)?)
  - But what if the view contained no constraint?
  - ⇒ The view must export variables constrained in the query
- The query has a join on q with teaches; teaches facts are derived only from other views, so q will be exported as a different function symbol, or as q (which of these here?)
  - ⇒ a join will fail (cannot join f1(-,-) with f2(-,-) or with a regular variable)
  - ⇒ The view must export join variables of the query

The factors that determine usability of a view are the same as in the bucket algorithm, but the inverse rules algorithm tries to use all views anyway.

Solution: compose the query with the inverse rules, to obtain a new query that uses the views directly.

Composition
- Consider the heads of the inverse rules as a db (a collection of facts)
- Look for valuations (mappings of query variables) that map the query atoms to this db
- Then replace the query goals by views

Example
- A db: parenthood relation par(c, p)
- A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
- A query Q: q(X, Y) :- par(X, Z), par(Z, Y) // find grandchildren

The algorithm inverts the view:

par(C, f(C, G)) :- v(C, G)
par(f(C, G), G) :- v(C, G)

Two candidate valuation mappings:
- X → C, Z → f(C, G), Y → G ⇒ q(C, G) :- v(C, G), v(C, G)
- X → f(C, G), Z → G, Y → f(C, G) (assuming we add C = G) ⇒ q(f(G, G), f(G, G)) :- v(G, G), v(G, G)

The 2nd is discarded: no function symbols are allowed in the result. Minimization of the 1st gives q(C, G) :- v(C, G), the same as bucket.
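Composition can be sketched as unification of each query atom with the head of some inverse rule. The Python below is an illustration under assumptions: variables are capitalized strings, atoms and function terms are tuples whose first element is the predicate or functor, and each use of an inverse rule gets its own renamed copy of the view variables (so the second valuation above appears with separate variables instead of C = G, and is discarded for the same reason). Only literally repeated body atoms are removed, the simplest case of minimization.

```python
import itertools

# Assumed encoding: variables are capitalized strings; atoms and function
# terms are tuples whose first element is the predicate/functor name.
INVERSE_RULES = [
    # (head of inverse rule, view atom in its body)
    (("par", "C", ("f", "C", "G")), ("v", "C", "G")),
    (("par", ("f", "C", "G"), "G"), ("v", "C", "G")),
]


def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def walk(t, s):
    """Follow variable bindings in the substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])


def unify(a, b, s):
    """Most general unifier extending s, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return None if occurs(b, a, s) else {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


def resolve(t, s):
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, s) for a in t[1:])
    return t


def rename(t, i):
    """Give the view variables of the i-th rule use fresh names."""
    if is_var(t):
        return f"{t}{i}"
    if isinstance(t, tuple):
        return (t[0],) + tuple(rename(a, i) for a in t[1:])
    return t


def has_function(atom):
    return any(isinstance(a, tuple) for a in atom[1:])


def compose(q_head, q_body, rules):
    """Map each query atom onto some inverse-rule head, then use the views."""
    rewritings = []
    for choice in itertools.product(range(len(rules)), repeat=len(q_body)):
        s, body = {}, []
        for i, (goal, r) in enumerate(zip(q_body, choice)):
            head, view_atom = rules[r]
            s = unify(goal, rename(head, i), s)
            if s is None:
                break
            body.append(rename(view_atom, i))
        if s is None:
            continue
        new_head = resolve(q_head, s)
        new_body = [resolve(a, s) for a in body]
        if has_function(new_head) or any(has_function(a) for a in new_body):
            continue  # function symbols in the result: discard this valuation
        rewritings.append((new_head, list(dict.fromkeys(new_body))))
    return rewritings


def show(t):
    if isinstance(t, tuple):
        return f"{t[0]}({', '.join(show(a) for a in t[1:])})"
    return t


if __name__ == "__main__":
    q_body = [("par", "X", "Z"), ("par", "Z", "Y")]
    for head, body in compose(("q", "X", "Y"), q_body, INVERSE_RULES):
        print(show(head), ":-", ", ".join(show(a) for a in body))
    # prints: q(C1, G1) :- v(C1, G1)
```

Of the four assignments of query atoms to inverse rules, only one yields a result free of function symbols, and it matches q(C, G) :- v(C, G) up to variable names.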


- q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c ≥ 300, q ≥ a95
- registered(s, c, f(s, c)), f(s, c) … :- v3(s, c)
- Any valuation that uses this fact must map q → f(s, c)
- The constraint on f(s, c) exported by v3 conflicts with f(s, c) ≥ a95
- But what if there is no constraint to export?
  - The mapping q → f(s, c) cannot be used to map teaches to any fact derived from other views
  - ⇒ v3 cannot be used

- A mapping will fail to define a valuation if
  - a view does not export a join variable, and does not contain the join (why?)
  - the view does not export a variable that is constrained in the query (the constraint cannot be checked in the db)
- Thus, the results (for a CQ query, possibly with constraints) will be the same as for bucket (assuming it is correct and complete)
- The amount of work invested will probably be similar
- Composition can also be performed for Datalog queries, but weeding out useless mappings is more difficult

The MiniCon algorithm --- the final one?

- Motivation
- Preliminaries
- The MiniCon algorithm

Motivation

- The previous algorithms (bucket, inverse rules) may be quite expensive to use, especially for systems with many views
- The bucket algorithm has a narrow peephole in the 1st stage: each bucket is for a single atom
  - ⇒ global constraints are treated only in the 2nd stage
  - ⇒ many useless combinations may be examined
- The inverse rules algorithm, improved by composition, seems to perform similar work
- The motivation: find an algorithm that will do more work in preliminary filtering, and will scale up to hundreds of views

Preliminaries

- The idea
  - Once a view is put in the bucket of a query atom, switch to considering join variables, and find which other atoms are necessarily covered by the view
  - Along the way, also find out which view head variables need to be equated
  - Given coverage by views, combine views with disjoint covers
- Expected gain
  - more filtering in the 1st stage
  - better representation of information
  - ⇒ a smaller number of combinations, and a reduced number of containment checks in the 2nd stage

Example
- A db: parenthood relation par(c, p)
- A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
- A query Q: q(X, Y) :- par(X, Z), par(Z, Y)

Bucket: one view in each bucket
- par(X, Z): v(X, G)
- par(Z, Y): v(P, Y)

When the two view atoms are combined, a containment check discovers that G = Y ⇒ containment, and redundancy of the 2nd atom.

Alternative: given par(X, Z) → v(X, G), since Z (a join variable) occurs in the 2nd atom of the query, add par(Z, Y) to the coverage of v(X, G), with G = Y.

In the 2nd stage, just use v(X, Y).

- Assumptions, terminology
  - CQ queries and views; for now, no constants or constraints in the query or views
  - View definitions use variables different from those in the query or in other views (disjoint sets of variables)
  - b(Q): body atoms of Q; b(V): body atoms of view V
  - A mapping from vars(Q) to vars(V) is interesting only if it maps a non-empty subset of b(Q) to b(V)
  - Considered mappings always map Q head vars to V head vars: head var preservation (hvp)
  - If h maps x in vars(Q) to an existential var in some V, then all atoms of b(Q) that contain x must be mapped to the same V: join variable condition (jvc)
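Both conditions are easy to check mechanically. A minimal Python sketch, assuming a hypothetical encoding in which atoms are (predicate, args) pairs, a mapping h is a dict from query variables to view variables, and the covered atoms are given as a set of indices into b(Q):

```python
# Hypothetical encoding of the running example.
Q_HEAD = {"X", "Y"}                                   # q(X, Y)
Q_BODY = [("par", ("X", "Z")), ("par", ("Z", "Y"))]   # par(X, Z), par(Z, Y)
V_HEAD = {"C", "G"}                                   # v(C, G)
V_BODY = [("par", ("C", "P")), ("par", ("P", "G"))]   # par(C, P), par(P, G)


def maps_into(h, q_body, v_body, covered):
    """Each covered query atom is sent by h onto some atom of b(V)."""
    return all(
        (q_body[i][0], tuple(h.get(a) for a in q_body[i][1])) in v_body
        for i in covered
    )


def satisfies_hvp(h, q_head, v_head):
    """Head var preservation: Q head vars in the domain go to V head vars."""
    return all(h[x] in v_head for x in q_head if x in h)


def satisfies_jvc(h, q_body, v_head, covered):
    """Join variable condition: if h(x) is existential in V, every atom
    of b(Q) that contains x must be among the covered atoms."""
    for x, y in h.items():
        if y in v_head:
            continue  # x is mapped to a head variable: no requirement
        for i, (_, args) in enumerate(q_body):
            if x in args and i not in covered:
                return False
    return True


if __name__ == "__main__":
    h1 = {"X": "C", "Z": "P"}              # covers only par(X, Z)
    print(satisfies_jvc(h1, Q_BODY, V_HEAD, {0}))
    h2 = {"X": "C", "Z": "P", "Y": "G"}    # covers both query atoms
    print(satisfies_jvc(h2, Q_BODY, V_HEAD, {0, 1}))
```

Mapping only par(X, Z) violates (jvc), because Z goes to the existential P while par(Z, Y) is left uncovered; covering both atoms repairs it.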

- Given Q(X), assume Q' is a rewriting in terms of views
  - Q': q(X) :- v1(X1), …, vn(Xn)
  - (some vi, vj may be occurrences of the same view v)
- There exists a containment mapping h from Q to exp(Q') (it satisfies hvp)
- Let
  - Gi be the set of atoms of b(Q) mapped to b(exp(vi))
  - h/i = h restricted to vars(Gi)
- Then Gi satisfies (jvc):
  - if h/i maps x of vars(Gi) to an existential variable of vi, then every atom g in b(Q) that contains x is in Gi

The occurrence of vi in the rewriting may have some head variables equated. Example: the original head might be vi(A, B, C), and the head in the rewriting vi(X, X, Z). These equalities are given by a unique least set of equality constraints Ei (vi/Ei: the view vi, with head variables equated as specified by Ei).

Summary (so far): the containment mapping can be decomposed into disjoint components (vi, Ei, h/i, Gi).
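Computing Ei from an occurrence is a grouping step: positions of the definition head that receive the same argument in the occurrence must hold equal variables. A short Python sketch, assuming the definition head lists distinct variables; both function names are hypothetical:

```python
from collections import defaultdict


def head_equalities(def_head, occ_head):
    """Least set of equalities Ei on the definition's head variables, as a
    list of equivalence classes: positions that receive the same argument
    in the occurrence must hold equal variables.
    Assumes the definition head lists distinct variables."""
    if len(def_head) != len(occ_head):
        raise ValueError("arity mismatch between definition and occurrence")
    by_arg = defaultdict(set)
    for var, arg in zip(def_head, occ_head):
        by_arg[arg].add(var)
    return [cls for cls in by_arg.values() if len(cls) > 1]


def apply_equalities(def_head, eqs):
    """Head of vi/Ei: each variable replaced by the smallest one in its class."""
    rep = {v: min(cls) for cls in eqs for v in cls}
    return tuple(rep.get(v, v) for v in def_head)


if __name__ == "__main__":
    eqs = head_equalities(("A", "B", "C"), ("X", "X", "Z"))
    print(eqs, apply_equalities(("A", "B", "C"), eqs))
```

For the occurrence vi(X, X, Z) of vi(A, B, C) this gives the single class {A, B}, so vi/Ei has the head vi(A, A, C).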

All we need to do is find such components, then combine them. What is the condition for a successful combination? Does a combination (s.t. …) ever fail?

- To find such components, we must use the given view definitions (whose variables are different from those of Q or of the expansion of the rewriting)
- Answer: a component and its mapping can be expressed in terms of the given view definition. Here
  - hi is a mapping from Q to the given view definition for vi
  - Ei is the least set of equalities that makes hi a good mapping
  - the remaining step, from vi/Ei to exp(vi(Xi)), is a variable renaming
- Ei and hi depend only on Q and the definition of vi
- We can find the components as mappings from Q to the view definitions, then combine them and rename, possibly equating more head vars

(Diagram: Gi is mapped by h/i into exp(vi(Xi)), and by hi into vi/Ei.)

- One more step
  - A component (vi, Ei, hi, Gi) may be further decomposed into smaller components (vi, Ei1, hi1, Gi1), (vi, Ei2, hi2, Gi2), provided that
    - each of Gi1, Gi2 satisfies (jvc), and they are disjoint
    - each of Ei1, Ei2 is a subset of Ei, the least set for the mapping hi1 (resp. hi2) to be ok
  - When these are combined, Ei1 ∪ Ei2 is augmented with the remaining equalities of Ei
- Minimal such components
  - are easier to find
  - can be re-used for different combinations

- What is a minimal component?
  - C = (vi, Ei, hi, Gi) is minimal if
    - hi satisfies (hvp) and (jvc) (assuming the equalities in Ei)
    - there is no component C1 whose last three components are contained in C's last three components (with at least one proper containment)
- Such a component is called a MiniCon (mini containment) description: MCD
- The algorithm constructs and combines minimal MCDs

The MiniCon Algorithm

- Minimal MCD construction algorithm
  - For each g in b(Q), and each k in each b(vi):
    - Let E(g,k) be the least set of equalities s.t. a mapping h(g,k) from g to k that satisfies (hvp) exists
      // E(g,k) and h(g,k), if they exist, are uniquely determined by g, k
    - If E(g,k) and h(g,k) exist, find all minimal MCDs that extend them
      - (vi, Ei, hi, Gi) extends them if Ei contains E(g,k), hi contains h(g,k), and Gi contains g
  - For the final set of MCDs, remove duplicates

- How do we find minimal MCDs that extend a given mapping?
- I. Extension to one more query atom, one view atom
  extend(vi, E, h, g, k)
    // E: equalities on head vars of vi
    // h: vars(Q) → vars(vi), partial, hvp with E
    // g in b(Q), k in b(vi)
    try to extend h to map g to k, with hvp, by adding equalities to E
    return fail, or the (uniquely determined) E, h
  (The first step in the algorithm on the previous page is this one, given empty E and h)
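A Python sketch of extend follows. It assumes a hypothetical encoding: atoms are (predicate, args) pairs, E is kept as a union-find dictionary over the head variables of vi, and h is a dict from query variables to view variables. Existential variables of vi are never equated, since E only relates head variables.

```python
def find(E, v):
    """Representative of v under the equalities E (a union-find dict)."""
    while E.get(v, v) != v:
        v = E[v]
    return v


def extend(view_head, q_head, E, h, g, k):
    """Try to extend h so that query atom g maps onto view atom k.

    E holds equalities between head variables of the view; h is a partial
    map from query variables to view variables. Returns the new (E, h), or
    None if no extension satisfying hvp exists."""
    (g_pred, g_args), (k_pred, k_args) = g, k
    if g_pred != k_pred or len(g_args) != len(k_args):
        return None
    E, h = dict(E), dict(h)  # never mutate the caller's copies
    for x, y in zip(g_args, k_args):
        if x in h:
            a, b = find(E, h[x]), find(E, y)
            if a == b:
                continue
            if a in view_head and b in view_head:
                E[a] = b  # equate two head variables of the view
            else:
                return None  # existential variables cannot be equated
        elif x in q_head and y not in view_head:
            return None  # hvp: a Q head var must go to a view head var
        else:
            h[x] = y
    return E, h


if __name__ == "__main__":
    v_head = {"C", "G"}  # v(C, G) :- par(C, P), par(P, G)
    print(extend(v_head, {"X", "Y"}, {}, {}, ("par", ("X", "Z")), ("par", ("C", "P"))))
    print(extend(v_head, {"X", "Y"}, {}, {}, ("par", ("X", "Z")), ("par", ("P", "G"))))
```

Mapping par(X, Z) to the first view atom succeeds with E empty; mapping it to the second fails, because the head variable X would go to the existential P.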

- II. Extend repeatedly, as long as needed and successful
  Given vi, g, k, E(g,k) and h(g,k):
    Let C = {(vi, E(g,k), h(g,k), {g})}, MC = ∅
      // C: initial component; (jvc) possibly not satisfied
    While C is not empty
      remove some c = (vi, E, h, G) from C
      if (jvc) is satisfied, put c in MC
      if not, there exist x in vars(Q) s.t. h(x) is existential, and g' that contains x, g' not in G:
        for each k' in b(vi)
          if extend(vi, E, h, g', k') succeeds, put the extension (with G ∪ {g'}) in C
    Remove duplicates from MC

- Example
  - A db: parenthood relation par(c, p)
  - A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
  - A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
- MCDs
  - 1st query atom, 1st view atom: h(1,1) = {X → C, Z → P}, E(1,1) = ∅
    - need to extend to par(Z, Y); it can only be mapped to the 2nd view atom
    - MCD: (v, ∅, {X → C, Z → P, Y → G}, b(Q))
  - 1st query atom, 2nd view atom: no mapping
  - The only MCD is the above

Comment: In the paper, if (vi, Ei1, hi1, Gi1) and (vi, Ei2, hi2, Gi2) are both minimal extensions, and Gi1 is contained in Gi2, then the 2nd is thrown away (another minimization). I do not know how to explain this optimization, or how to prove that the algorithm is still complete with it.

2nd phase: MCD combination, and variable renaming

A set of MCDs (vi, Ei, hi, Gi) is a candidate if the Gi are pairwise disjoint and together cover b(Q). For each candidate set, rename variables:
- for each view variable y:
  - if hi(x) = y (y a view variable), rename y to x
  - else rename y to a fresh distinct variable

Note: if x is in the domain of both hi and hj, then hi(x), hj(x) are head variables of vi, vj (by the definition of MCD) ⇒ the renaming makes them equal.
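The combination phase can be sketched as follows, again as an illustration under assumptions: each MCD is a dict with hypothetical keys (view, head, eqs, h, G), candidates are the subsets whose G sets are pairwise disjoint and cover b(Q), and when several query variables map to the same view head variable the alphabetically first one supplies the new name (a simplification; a full implementation would also have to equate those query variables).

```python
from itertools import combinations, count

# Hypothetical MCD encoding: view name, view head variables (in order),
# head-variable equalities Ei (a list of sets), mapping h, covered atoms G.
A1 = {"view": "v", "head": ("C", "P"), "eqs": [], "h": {"X": "C", "Z": "P"}, "G": frozenset({0})}
A2 = {"view": "v", "head": ("C", "P"), "eqs": [], "h": {"Z": "C", "Y": "P"}, "G": frozenset({1})}


def candidates(mcds, n_atoms):
    """Sets of MCDs whose G sets are pairwise disjoint and cover b(Q)."""
    result = []
    for r in range(1, len(mcds) + 1):
        for combo in combinations(mcds, r):
            covers = [m["G"] for m in combo]
            union = frozenset().union(*covers)
            if len(union) == n_atoms == sum(len(c) for c in covers):
                result.append(combo)
    return result


def rewrite(q_head, combo):
    """Rename view head variables: y becomes x if h(x) = y, else fresh."""
    fresh = count(1)
    body = []
    for m in combo:
        rep = {v: min(cls) for cls in m["eqs"] for v in cls}  # apply Ei
        back = {}
        for x, y in sorted(m["h"].items()):
            back.setdefault(rep.get(y, y), x)  # simplification: first x wins
        names, args = {}, []
        for y in m["head"]:
            r = rep.get(y, y)
            if r not in names:
                names[r] = back[r] if r in back else f"_{next(fresh)}"
            args.append(names[r])
        body.append((m["view"], tuple(args)))
    return q_head, body


def show(rule):
    head, body = rule

    def atom(pred, args):
        return f"{pred}({', '.join(args)})"

    return f"{atom('q', head)} :- " + ", ".join(atom(p, a) for p, a in body)


if __name__ == "__main__":
    for combo in candidates([A1, A2], 2):
        print(show(rewrite(("X", "Y"), combo)))
    # prints: q(X, Y) :- v(X, Z), v(Z, Y)
```

Because Z is in the domain of both h maps and goes to a head variable in each, both occurrences of v receive the name Z, which produces the join.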

Example (contd)
- A db: parenthood relation par(c, p)
- A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
- A query Q: q(X, Y) :- par(X, Z), par(Z, Y)

MCD: (v, ∅, {X → C, Z → P, Y → G}, b(Q))

Rename in v: C to X, G to Y. Rewriting: q(X, Y) :- v(X, Y)

- Example
  - A db: parenthood relation par(c, p)
  - A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
  - A query Q: q(X, X) :- par(X, Z), par(Z, X) // I am my own grandpa
- MCDs
  - 1st query atom, 1st view atom: h(1,1) = {X → C, Z → P}, E(1,1) = ∅
    - need to extend to par(Z, X); it can only be mapped to the 2nd view atom
    - MCD: (v, {C = G}, {X → C, Z → P}, b(Q))
  - 1st query atom, 2nd view atom: no mapping
  - The only MCD is the above

- Example
  - A db: parenthood relation par(c, p)
  - A view: v(C, P) :- par(C, P), par(P, G) // parents where grandparents exist
  - A query Q: q(X, Y) :- par(X, Z), par(Z, Y)
- MCDs
  - h(1,1) = {X → C, Z → P}, E(1,1) = ∅
    - ⇒ MCD A1 = (v(C, P), ∅, h(1,1), {par(X, Z)})
  - h(1,2) = {X → P, Z → G}, E(1,2): fails (why?)
  - h(2,1) = {Z → C, Y → P}, E(2,1) = ∅
    - ⇒ MCD A2 = (v(C, P), ∅, h(2,1), {par(Z, Y)})
  - h(2,2) = {Z → P, Y → G}: fails (why?)

A view: v(C, P) :- par(C, P), par(P, G). A query Q: q(X, Y) :- par(X, Z), par(Z, Y)

MCDs:
- A1 = (v(C, P), ∅, h(1,1), {par(X, Z)})
- A2 = (v(C, P), ∅, h(2,1), {par(Z, Y)})

Rewritings (rename the views to have distinct vars):
- A1, A2: X → C1, Z → P1, Z → C2, Y → P2; add P1 = C2 (P1 in the 1st v, C2 in the 2nd v)
- rewriting: v(C1, P1), v(P1, P2)
- renaming: v(X, Z), v(Z, Y), a correct rewriting

- When Q or views contain constants
  - MCD formation
    - a constant a of Q must be mapped to a head variable of vi, or to itself
    - if x is in headvar(Q), it can be mapped to headvar(vi) or to a constant a
    - whenever x is mapped to a, hi records this fact
  - MCD combination
    - if A1, A2 are both defined on x, then also allow:
      - both map x to a
      - one maps x to a, the other to a head var of its view
    - in either case, rename x to a in the rewriting

- When Q or views contain comparisons
  - If views contain comparisons, there is no change to the algorithm (it finds contained rewritings anyway)
  - If Q contains comparisons, then there may be no Datalog program that computes the certain answers (comparisons can express x ≠ y)
  - But we can expect that extending the algorithm to comparisons will be a good heuristic, and will find the certain answers in many cases

- When Q or views contain comparisons
  - C(Q): the constraints of Q (closed under inference)
  - MCD formation (vi, Ei, hi, Gi) (extends the join variable condition)
    - If hi(x) is an existential variable of vi, and c(x, y) is in C(Q), then hi(y) is defined
    - C(vi) must imply all constraints in hi(C(Q)) that involve at least one existential variable of vi
  - MCD combination
    - Add all constraints of C(Q) not covered by those of the views