Pushing Aggregate Constraints by Divide-and-Approximate - PowerPoint PPT Presentation

1
Pushing Aggregate Constraints by
Divide-and-Approximate
  • Ke Wang, Yuelong Jiang, Jeffrey Xu Yu,
  • Guozhu Dong and Jiawei Han

2
No Easy-to-Push Constraints
  • There exists a gap between the interestingness
    criterion and the techniques used in mining
    patterns from a large amount of data.
  • Anti-monotonicity is too loose as a pruning
    strategy.
  • Anti-monotonicity is too restricted as an
    interestingness criterion.
  • Should we design new algorithms to mine those
    patterns that can only be found using
    anti-monotonicity?
  • Mining patterns with general constraints

3
Iceberg-Cube Mining
  • An iceberg-cube mining query: select A, B, C,
    count(*) from R cube by A, B, C having
    count(*) > 2
  • count(*) > 2 is an anti-monotone constraint.

4
Iceberg-Cube Mining
[Tables R1 and R2 shown on slide]
  • Another query: select A, B, C, sum(M) from R
    cube by A, B, C having sum(M) > 150
  • sum(M) > 150 is an anti-monotone constraint
    when all values in M are positive.
  • sum(M) > 150 is not an anti-monotone constraint
    when some values in M are negative.
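The loss of anti-monotonicity can be seen on a tiny example. This is an illustrative sketch (the relation and measure values below are invented, not the slide's R1/R2): with a negative measure, a cell can fail sum(M) > 150 while one of its super-cells passes, so failing cells cannot be pruned the way count(*) > 2 allows.

```python
# Hypothetical relation: two tuples, one with a negative measure.
R = [
    {"A": "a1", "B": "b1", "M": 200},
    {"A": "a1", "B": "b2", "M": -100},
]

def cell_sum(cell, rows):
    """sum(M) over the tuples matching all values in the cell."""
    return sum(r["M"] for r in rows if all(r[d] == v for d, v in cell.items()))

sub = cell_sum({"A": "a1"}, R)              # 200 + (-100) = 100, fails > 150
sup = cell_sum({"A": "a1", "B": "b1"}, R)   # 200, passes > 150
print(sub, sup)
```

Pruning the failing sub-cell a1 would wrongly discard its passing super-cell a1b1.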
5
The Main Idea
  • Study iceberg-cube mining.
  • Consider f(v) θ σ:
  • f is a function with SQL-like aggregates and
    arithmetic operators (+, −, ×, /); v is a
    variable; σ is a constant; and θ is either ≥ or
    ≤.
  • Can we push constraints into iceberg-cube
    mining that are not anti-monotone or monotone? If
    so, what is a pushing method that is not specific
    to a particular constraint?
  • Divide-and-Approximate: find a stronger
    approximator for the constraint in a subspace.

6
Some Definitions
  • A relation with many dimensions Di and one or
    more measures Mi.
  • A cell is ⟨di, …, dk⟩, with values from Di, …, Dk.
  • Use c as a cell variable.
  • Use di…dk for a cell value (representative).
  • SAT(d1…dk) (or SAT(c)) contains all tuples that
    contain all values in d1…dk (or c).
  • c′ is a super-cell of c, or c is a sub-cell of
    c′, if c′ contains all the values in c.
  • Let C be a constraint (f(v) θ σ). CUBE(C) denotes
    the set of cells that satisfy C.
  • A constraint C′ is weaker than C if CUBE(C) ⊆
    CUBE(C′).

7
An Example
  • Iceberg-Cube Mining: select A, B, C, sum(M) from
    R cube by A, B, C having sum(M) > 150
  • sum(c) > 150 is neither anti-monotone nor
    monotone.
  • Let the space be S = {ABC, AB, AC, BC, A, B, C}
  • Let sum(c) = psum(c) − nsum(c) > 150.
  • psum(c) is the profit, and nsum(c) is the cost.
  • Push an anti-monotone approximator:
  • Use psum(c) > 150, and ignore nsum(c).
  • If nsum(c) is large, there are many false
    positives.
  • Use a min nsum in S: psum(c) − nsummin(ABC) >
    150.
  • nsummin(ABC) is the minimum nsum in S.
  • Use a min nsum in a subspace of S (a stronger
    constraint).
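The divide step above can be sketched in a few lines. The two-cell space and its aggregate values are invented for illustration; psum/nsum follow the slide's profit/cost reading of sum(c) = psum(c) − nsum(c) > 150.

```python
# Per cell: (positive M values, magnitudes of negative M values).
# "abc" is a super-cell of "ab", so both aggregates shrink on it.
cells = {
    "abc": ([120, 80], [30]),
    "ab":  ([120, 80, 40], [30, 60]),
}

def psum(c): return sum(cells[c][0])   # profit
def nsum(c): return sum(cells[c][1])   # cost

# Weakest approximator: drop the cost entirely.
approx1 = {c: psum(c) > 150 for c in cells}

# Stronger approximator: subtract the minimum cost over the space.
nsum_min = min(nsum(c) for c in cells)
approx2 = {c: psum(c) - nsum_min > 150 for c in cells}

# The approximator is weaker than the constraint: it never
# rejects a cell that truly satisfies sum(c) > 150.
for c in cells:
    true_sat = psum(c) - nsum(c) > 150
    assert not true_sat or approx2[c]

print(approx1, approx2)
```

A failing approximator is therefore safe to use for pruning, while a smaller subspace yields a larger nsum_min and fewer false positives.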

8
The Search Strategy (using a lexicographic tree)
[Figure: lexicographic tree over dimensions A–E, rooted at the empty group-by, down to leaf ABCDE]
  • A node represents a group-by
  • BUC (Bottom-Up Cube)
  • Partition the database in the depth-first order
    of the lexicographic tree.
9
Another Example
  • Iceberg-Cube Mining: select A, B, C, D, E,
    sum(M) from R cube by A, B, C, D, E having
    sum(M) > 200
  • At node ABCDE, sum(12345) = psum(12345) −
    nsum(12345) = 250 − 50 = 200. (fails)
  • Backtracking to ABC: psum(123) − nsummin(12345) =
    290 − 100 = 190. (fails)
  • Then, at node ABCE, cell 1235 must fail.
    Therefore, all tuples matching 1235 can be pruned.
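The arithmetic on this slide can be checked directly (the aggregate values are taken from the slide; the constraint is sum(M) > 200):

```python
# At the leaf ABCDE, the cell 12345 itself fails the constraint:
psum_12345, nsum_12345 = 250, 50
assert psum_12345 - nsum_12345 == 200   # 200 is not > 200: cell 12345 fails

# Backtracking to ABC, the anti-monotone approximator also fails:
psum_123, nsum_min = 290, 100
assert psum_123 - nsum_min == 190       # 190 is not > 200, so every cell in
# the pruning scope, e.g. 1235 at node ABCE, must fail as well
```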

10
[Figure: lexicographic tree with the ancestor uk and the left-most leaf u0 highlighted; tree(uk) is the subtree rooted at uk]
  • A node in tree(uk) is a set of group-by attributes.
  • A cell in tree(uk, p) is a tuple of group-by values.
  • Find a cell p at u0 that fails C, and then extract an
    anti-monotone approximator Cp.
  • Consider an ancestor uk of u0, where u0 is the
    left-most leaf in tree(uk).
  • p↓u denotes p projected onto u (a cell of u).
  • tree(uk, p) = { p↓u : u is a node in tree(uk) }.
  • p is the max cell in tree(uk, p), and p↓uk is the
    min cell in tree(uk, p).
  • If p↓uk fails Cp, all cells in tree(uk, p)
    fail.
  • Note tree(uk, p′) ⊆ tree(uk, p) if p′ ⊆ p.
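The family tree(uk, p) can be illustrated with a small sketch, assuming uk = ABC and u0 = ABCDE as in the surrounding slides (so the nodes of tree(ABC) are ABC, ABCD, ABCE, ABCDE):

```python
# Cell p at the leaf u0 = ABCDE; letters stand for the slide's values.
p = dict(zip("ABCDE", ["a", "b", "c", "d", "e"]))

tree_uk = ["ABC", "ABCD", "ABCE", "ABCDE"]   # the nodes of tree(ABC)

def project(p, u):
    """p projected onto node u (the slide's p 'down-arrow' u):
    keep only u's dimensions."""
    return tuple(p[d] for d in u)

tree_uk_p = {u: project(p, u) for u in tree_uk}
print(tree_uk_p)

# p itself is the max cell; p projected onto uk = ABC is the min cell.
assert tree_uk_p["ABCDE"] == ("a", "b", "c", "d", "e")
assert tree_uk_p["ABC"] == ("a", "b", "c")
```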

11
The Pruning

[Figure: lexicographic tree showing the backtracking path from u0 up to uk]
  • On the backtracking from u0 to uk:
  • Check if u0 is on the left-most path in tree(uk).
  • Check if p↓uk can use the same anti-monotone
    approximator as p↓u0.
  • Check if p↓uk fails Cp.
  • If all conditions are met, then:
  • For every unexplored child ui of uk, we prune all
    the tuples that match p on tail(ui), because such
    tuples generate only cells in tree(uk, p), which
    fail Cp.
  • tail(u): the set of all dimensions appearing in
    tree(u).

12

[Figure: lexicographic tree with the pruning anchor p at leaf u0 = ABCDE, the ancestor uk, and uk's unexplored children ui]
  • Given a leaf node u0 and a cell p at u0.
  • Let uk…u0 be the leftmost path in tree(uk), k > 0.
  • p is a pruning anchor wrt (uk, u0).
  • Tree(uk, p) is the pruning scope.
  • Suppose that a cell p at ABCDE fails.
  • On the backtracking from ABCDE to ABC:
  • If the conditions are met (p↓ABC fails),
  • Prune tuples t such that t↓ABCE = p↓ABCE.
  • On the backtracking from ABC to AB:
  • If the conditions are met (p↓AB fails),
  • Prune tuples t such that t↓ABDE = p↓ABDE from
    tree(ABD).
  • Prune tuples t such that t↓ABE = p↓ABE from
    tree(ABE).

13
The DA Algorithm
  • Modify BUC.
  • Push up a pruning anchor p along the leftmost
    path from u0 to uk.
  • Partition the pruning anchors pushed up to the
    current node, in addition to partitioning the
    tuples.
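A heavily simplified sketch of the idea on top of a BUC-style depth-first enumeration. It uses only the simplest anti-monotone approximator psum(c) > σ from the earlier example (the full DA algorithm anchors failed cells and pushes them up the leftmost path; here a failing approximator simply cuts the whole subtree of the current partition). Dimension names, data, and the threshold are invented.

```python
def buc_da(rows, dims, sigma):
    """BUC-style depth-first cube enumeration that prunes a subtree once
    the anti-monotone approximator psum(c) > sigma fails: since
    sum(c) = psum(c) - nsum(c) <= psum(c) and psum only shrinks on
    super-cells, everything below a failing cell must fail too."""
    out = {}

    def recurse(rows, cell, remaining):
        for i, d in enumerate(remaining):
            parts = {}                      # partition rows by dimension d
            for r in rows:
                parts.setdefault(r[d], []).append(r)
            for val, part in parts.items():
                new_cell = cell + ((d, val),)
                ps = sum(r["M"] for r in part if r["M"] > 0)
                if ps <= sigma:             # approximator fails: prune subtree
                    continue
                f = sum(r["M"] for r in part)
                if f > sigma:               # the real constraint
                    out[new_cell] = f
                recurse(part, new_cell, remaining[i + 1:])

    recurse(rows, (), tuple(dims))
    return out

rows = [
    {"A": "a1", "B": "b1", "M": 120},
    {"A": "a1", "B": "b2", "M": 90},
    {"A": "a1", "B": "b2", "M": -40},
]
print(buc_da(rows, "AB", 100))
```

Note that a cell whose approximator passes but whose real sum fails is still expanded, since its super-cells may satisfy the constraint.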

14
With Min-Support
[Figure: lexicographic tree; min-sup = 3 and sum(M) > 100]
  • Suppose cell abcd is frequent, but cell abcde is
    infrequent. (Should stop at abcd.)
  • If cell abcd is anchored at node A, we cannot prune
    ae, abe, ace, ade in tree(A, abcd).
15
Rollback tree
[Figure: rollback tree over dimensions A–E; min-sup = 3 and sum(M) > 100]
  • RBtree(AD), RBtree(AC), RBtree(ABD), RBtree(D),
    RBtree(C), and RBtree(B) do not have E.
  • If abcd is anchored at the root, we can prune
    tuples from RBtree(D), RBtree(C), and RBtree(B).
16
Constraint/Function Monotonicity
  • A constraint C is a-monotone if whenever a cell
    is not in CUBE(C), neither is any of its super-cells.
  • A constraint C is m-monotone if whenever a cell
    is in CUBE(C), so is its every super-cell.
  • A function x(y) is a-monotone wrt y if x
    decreases as y grows (for cell-valued y) or as y
    increases (for real-valued y).
  • A function x(y) is m-monotone wrt y if x
    increases as y grows (for cell-valued y) or as y
    increases (for real-valued y).
  • An example: sum(v) = psum(v) − nsum(v)
  • sum(v) is m-monotone wrt psum(v)
  • sum(v) is a-monotone wrt nsum(v)

17
Constraint/Function Monotonicity
  • Let ā denote m, and m̄ denote a. Let t denote
    either a or m.
  • Example: psum(v) ≥ σ is a-monotone; then psum(v)
    ≤ σ is m-monotone.
  • If psum(c1) ≥ σ does not hold, then psum(c2) ≥ σ
    does not hold either, where c2 is a super-cell of
    c1 (say c1 is a cell of ABC, and c2 is a cell of
    ABCD).
  • f(v) ≥ σ is t-monotone if and only if f(v) is
    t-monotone wrt v.
  • f(v) ≤ σ is t̄-monotone if and only if f(v) is
    t-monotone wrt v.
  • An example: sum(v) = psum(v) − nsum(v) ≥ σ.
  • sum(v) ≥ σ is m-monotone with psum(v), because
    sum(v) is m-monotone wrt psum(v).
  • sum(v) ≥ σ is a-monotone with nsum(v), because
    sum(v) is a-monotone wrt nsum(v).

18
Find Approximators
  • Consider f(v) θ σ.
  • Divide the aggregates in f(v) into two groups:
  • A+: as cell v grows (becomes a super-cell), f
    monotonically increases.
  • A−: as cell v grows (becomes a super-cell), f
    monotonically decreases.
  • Consider sum(v) = psum(v) − nsum(v) θ σ:
  • A+ = {nsum(v)}
  • A− = {psum(v)}
  • f(A+, A−/cmin) ≥ σ and f(A+/cmin, A−) ≤ σ are
    m-monotone approximators in a subspace Si, where
    cmin is the min cell instantiation in Si.
  • f(A+/cmax, A−) ≥ σ and f(A+, A−/cmax) ≤ σ are
    a-monotone approximators in a subspace Si, where
    cmax is the max cell instantiation in Si.
  • For sum: psum(v) − nsum(cmax) ≥ σ.
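The a-monotone approximator f(A+/cmax, A−) can be sketched for sum = psum − nsum. The subspace and its per-cell aggregates below are invented; the point is that nsum(cmax) is the smallest cost in the subspace, so psum(v) − nsum(cmax) is an upper bound on sum(v) that only shrinks as the cell grows.

```python
# Subspace S, listed from the min cell to the max cell c_max = abc;
# the (psum, nsum) pairs are invented but respect monotonicity:
# both shrink as the cell grows.
S = {
    "a":   (300, 120),
    "ab":  (220, 70),
    "abc": (180, 40),
}
sigma = 160
nsum_at_cmax = S["abc"][1]      # nsum(c_max): the smallest cost in S

for cell, (ps, ns) in S.items():
    f = ps - ns                 # the real aggregate sum(v)
    g = ps - nsum_at_cmax       # the a-monotone approximator
    assert g >= f               # weaker: never rejects a satisfying cell

print([(c, S[c][0] - nsum_at_cmax > sigma) for c in S])
```

Since g inherits anti-monotonicity from psum, a cell failing g safely prunes all of its super-cells in the subspace.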

19
Separate Monotonicity
  • Consider function rewriting:
  • rewrite (E1 + E2) × E into E1 × E + E2 × E.
  • Consider space division:
  • divide a space into subspaces Si.
  • Find approximators using equation rewriting
    techniques for each subspace Si.

20
Experimental Studies
  • Consider sum(v) = psum(v) − nsum(v)
  • Three algorithms:
  • BUC: push only the minimum support.
  • BUC+: push approximators and minimum support.
  • DA: push approximators and minimum support.

21
Vary minimum support
22
Without minimum support
[Chart: constraint psum(v) > σ]
23
Scalability
24
Conclusion
  • General aggregate constraints, rather than only
    well-behaved constraints.
  • SQL-like tuple-based aggregates, rather than
    item-based aggregates.
  • Constraint-independent techniques, rather than
    constraint-specific techniques.
  • A new push strategy: divide-and-approximate.