Pushing Aggregate Constraints by Divide-and-Approximate - PowerPoint PPT Presentation

1
Pushing Aggregate Constraints by
Divide-and-Approximate
  • Ke Wang, Yuelong Jiang, Jeffrey Xu Yu,
  • Guozhu Dong and Jiawei Han

2
No Easy-to-Push Constraints
  • There exists a gap between the interestingness
    criterion and the techniques used in mining
    patterns from a large amount of data.
  • Anti-monotonicity is too loose as a pruning
    strategy.
  • Anti-monotonicity is too restricted as an
    interestingness criterion.
  • Should we design new algorithms to mine those
    patterns that can only be found using
    anti-monotonicity?
  • Mining patterns with general constraints

3
Iceberg-Cube Mining
  • An iceberg-cube mining query: select A, B, C,
    count(*) from R cube by A, B, C having
    count(*) > 2
  • count(*) > 2 is an anti-monotone constraint.

4
Iceberg-Cube Mining
[Tables R1 and R2 shown on slide]
  • Another query: select A, B, C, sum(M) from R
    cube by A, B, C having sum(M) > 150
  • sum(M) > 150 is an anti-monotone constraint
    when all values in M are positive.
  • sum(M) > 150 is not an anti-monotone constraint
    when some values in M are negative.
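The loss of anti-monotonicity can be seen on a tiny example. This is an illustrative sketch (the relation and measure values below are invented, not the slide's R1/R2): with a negative measure, a cell can fail sum(M) > 150 while one of its super-cells passes, so failing cells cannot be pruned the way count(*) > 2 allows.

```python
# Hypothetical relation: two tuples, one with a negative measure.
R = [
    {"A": "a1", "B": "b1", "M": 200},
    {"A": "a1", "B": "b2", "M": -100},
]

def cell_sum(cell, rows):
    """sum(M) over the tuples matching all values in the cell."""
    return sum(r["M"] for r in rows if all(r[d] == v for d, v in cell.items()))

sub = cell_sum({"A": "a1"}, R)              # 200 + (-100) = 100, fails > 150
sup = cell_sum({"A": "a1", "B": "b1"}, R)   # 200, passes > 150
print(sub, sup)
```

Pruning the failing sub-cell a1 would wrongly discard its passing super-cell a1b1.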
5
The Main Idea
  • Study iceberg-cube mining.
  • Consider f(v) θ σ:
  • f is a function with SQL-like aggregates and
    arithmetic operators (+, −, ×, /); v is a
    variable; σ is a constant; and θ is either ≥ or
    ≤.
  • Can we push constraints into iceberg-cube
    mining that are not anti-monotone or monotone? If
    so, what is a pushing method that is not specific
    to a particular constraint?
  • Divide-and-Approximate: find a stronger
    approximator for the constraint in a subspace.

6
Some Definitions
  • A relation with many dimensions Di and one or
    more measures Mi.
  • A cell is ⟨di, …, dk⟩, with values from Di, …, Dk.
  • Use c as a cell variable.
  • Use di…dk for a cell value (representative).
  • SAT(d1…dk) (or SAT(c)) contains all tuples that
    contain all values in d1…dk (or c).
  • c′ is a super-cell of c, or c is a sub-cell of
    c′, if c′ contains all the values in c.
  • Let C be a constraint (f(v) θ σ). CUBE(C) denotes
    the set of cells that satisfy C.
  • A constraint C′ is weaker than C if CUBE(C) ⊆
    CUBE(C′).

7
An Example
  • Iceberg-Cube Mining: select A, B, C, sum(M) from
    R cube by A, B, C having sum(M) > 150
  • sum(c) > 150 is neither anti-monotone nor
    monotone.
  • Let the space be S = {ABC, AB, AC, BC, A, B, C}
  • Let sum(c) = psum(c) − nsum(c) > 150.
  • psum(c) is the profit, and nsum(c) is the cost.
  • Push an anti-monotone approximator:
  • Use psum(c) > 150, and ignore nsum(c).
  • If nsum(c) is large, there are many false
    positives.
  • Use a min nsum in S: psum(c) − nsummin(ABC) >
    150.
  • nsummin(ABC) is the minimum nsum in S.
  • Use a min nsum in a subspace of S (a stronger
    constraint).
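The divide step above can be sketched in a few lines. The two-cell space and its aggregate values are invented for illustration; psum/nsum follow the slide's profit/cost reading of sum(c) = psum(c) − nsum(c) > 150.

```python
# Per cell: (positive M values, magnitudes of negative M values).
# "abc" is a super-cell of "ab", so both aggregates shrink on it.
cells = {
    "abc": ([120, 80], [30]),
    "ab":  ([120, 80, 40], [30, 60]),
}

def psum(c): return sum(cells[c][0])   # profit
def nsum(c): return sum(cells[c][1])   # cost

# Weakest approximator: drop the cost entirely.
approx1 = {c: psum(c) > 150 for c in cells}

# Stronger approximator: subtract the minimum cost over the space.
nsum_min = min(nsum(c) for c in cells)
approx2 = {c: psum(c) - nsum_min > 150 for c in cells}

# The approximator is weaker than the constraint: it never
# rejects a cell that truly satisfies sum(c) > 150.
for c in cells:
    true_sat = psum(c) - nsum(c) > 150
    assert not true_sat or approx2[c]

print(approx1, approx2)
```

A failing approximator is therefore safe to use for pruning, while a smaller subspace yields a larger nsum_min and fewer false positives.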

8
The Search Strategy (using a lexicographic tree)
[Figure: lexicographic tree over dimensions A–E, rooted at the empty group-by, down to leaf ABCDE]
  • A node represents a group-by
  • BUC (Bottom-Up Cube)
  • Partition the database in the depth-first order
    of the lexicographic tree.
9
Another Example
  • Iceberg-Cube Mining: select A, B, C, D, E,
    sum(M) from R cube by A, B, C, D, E having
    sum(M) > 200
  • At node ABCDE, sum(12345) = psum(12345) −
    nsum(12345) = 250 − 50 = 200. (fails)
  • Backtracking to ABC: psum(123) − nsummin(12345) =
    290 − 100 = 190. (fails)
  • Then, at node ABCE, cell 1235 must fail.
    Therefore, all tuples matching 1235 can be pruned.
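The arithmetic on this slide can be checked directly (the aggregate values are taken from the slide; the constraint is sum(M) > 200):

```python
# At the leaf ABCDE, the cell 12345 itself fails the constraint:
psum_12345, nsum_12345 = 250, 50
assert psum_12345 - nsum_12345 == 200   # 200 is not > 200: cell 12345 fails

# Backtracking to ABC, the anti-monotone approximator also fails:
psum_123, nsum_min = 290, 100
assert psum_123 - nsum_min == 190       # 190 is not > 200, so every cell in
# the pruning scope, e.g. 1235 at node ABCE, must fail as well
```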

10
[Figure: lexicographic tree with the ancestor uk and the left-most leaf u0 highlighted; tree(uk) is the subtree rooted at uk]
  • A node in tree(uk) is a set of group-by attributes.
  • A cell in tree(uk, p) is a tuple of group-by values.
  • Find a cell p at u0 that fails C, and then extract an
    anti-monotone approximator Cp.
  • Consider an ancestor uk of u0, where u0 is the
    left-most leaf in tree(uk).
  • p↓u denotes p projected onto u (a cell of u).
  • tree(uk, p) = { p↓u : u is a node in tree(uk) }.
  • p is the max cell in tree(uk, p), and p↓uk is the
    min cell in tree(uk, p).
  • If p↓uk fails Cp, all cells in tree(uk, p)
    fail.
  • Note tree(uk, p′) ⊆ tree(uk, p) if p′ ⊆ p.
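The family tree(uk, p) can be illustrated with a small sketch, assuming uk = ABC and u0 = ABCDE as in the surrounding slides (so the nodes of tree(ABC) are ABC, ABCD, ABCE, ABCDE):

```python
# Cell p at the leaf u0 = ABCDE; letters stand for the slide's values.
p = dict(zip("ABCDE", ["a", "b", "c", "d", "e"]))

tree_uk = ["ABC", "ABCD", "ABCE", "ABCDE"]   # the nodes of tree(ABC)

def project(p, u):
    """p projected onto node u (the slide's p 'down-arrow' u):
    keep only u's dimensions."""
    return tuple(p[d] for d in u)

tree_uk_p = {u: project(p, u) for u in tree_uk}
print(tree_uk_p)

# p itself is the max cell; p projected onto uk = ABC is the min cell.
assert tree_uk_p["ABCDE"] == ("a", "b", "c", "d", "e")
assert tree_uk_p["ABC"] == ("a", "b", "c")
```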

11
The Pruning

[Figure: lexicographic tree showing the backtracking path from u0 up to uk]
  • On the backtracking from u0 to uk:
  • Check if u0 is on the left-most path in tree(uk).
  • Check if p↓uk can use the same anti-monotone
    approximator as p↓u0.
  • Check if p↓uk fails Cp.
  • If all conditions are met, then:
  • For every unexplored child ui of uk, we prune all
    the tuples that match p on tail(ui), because such
    tuples generate only cells in tree(uk, p), which
    fail Cp.
  • tail(u): the set of all dimensions appearing in
    tree(u).

12

[Figure: lexicographic tree with the pruning anchor p at leaf u0 = ABCDE, the ancestor uk, and uk's unexplored children ui]
  • Given a leaf node u0 and a cell p at u0.
  • Let uk…u0 be the leftmost path in tree(uk), k > 0.
  • p is a pruning anchor wrt (uk, u0).
  • Tree(uk, p) is the pruning scope.
  • Suppose that a cell p at ABCDE fails.
  • On the backtracking from ABCDE to ABC:
  • If the conditions are met (p↓ABC fails),
  • Prune tuples t such that t↓ABCE = p↓ABCE.
  • On the backtracking from ABC to AB:
  • If the conditions are met (p↓AB fails),
  • Prune tuples t such that t↓ABDE = p↓ABDE from
    tree(ABD).
  • Prune tuples t such that t↓ABE = p↓ABE from
    tree(ABE).

13
The DA Algorithm
  • Modify BUC.
  • Push up a pruning anchor p along the leftmost
    path from u0 to uk.
  • Partition the pruning anchors pushed up to the
    current node, in addition to partitioning the
    tuples.
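A heavily simplified sketch of the idea on top of a BUC-style depth-first enumeration. It uses only the simplest anti-monotone approximator psum(c) > σ from the earlier example (the full DA algorithm anchors failed cells and pushes them up the leftmost path; here a failing approximator simply cuts the whole subtree of the current partition). Dimension names, data, and the threshold are invented.

```python
def buc_da(rows, dims, sigma):
    """BUC-style depth-first cube enumeration that prunes a subtree once
    the anti-monotone approximator psum(c) > sigma fails: since
    sum(c) = psum(c) - nsum(c) <= psum(c) and psum only shrinks on
    super-cells, everything below a failing cell must fail too."""
    out = {}

    def recurse(rows, cell, remaining):
        for i, d in enumerate(remaining):
            parts = {}                      # partition rows by dimension d
            for r in rows:
                parts.setdefault(r[d], []).append(r)
            for val, part in parts.items():
                new_cell = cell + ((d, val),)
                ps = sum(r["M"] for r in part if r["M"] > 0)
                if ps <= sigma:             # approximator fails: prune subtree
                    continue
                f = sum(r["M"] for r in part)
                if f > sigma:               # the real constraint
                    out[new_cell] = f
                recurse(part, new_cell, remaining[i + 1:])

    recurse(rows, (), tuple(dims))
    return out

rows = [
    {"A": "a1", "B": "b1", "M": 120},
    {"A": "a1", "B": "b2", "M": 90},
    {"A": "a1", "B": "b2", "M": -40},
]
print(buc_da(rows, "AB", 100))
```

Note that a cell whose approximator passes but whose real sum fails is still expanded, since its super-cells may satisfy the constraint.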

14
With Min-Support
[Figure: lexicographic tree; min-sup = 3 and sum(M) > 100]
  • Suppose cell abcd is frequent, but cell abcde is
    infrequent. (Should stop at abcd.)
  • If cell abcd is anchored at node A, we cannot prune
    ae, abe, ace, ade in tree(A, abcd).
15
Rollback tree
[Figure: rollback tree over dimensions A–E; min-sup = 3 and sum(M) > 100]
  • RBtree(AD), RBtree(AC), RBtree(ABD), RBtree(D),
    RBtree(C), and RBtree(B) do not have E.
  • If abcd is anchored at the root, we can prune
    tuples from RBtree(D), RBtree(C), and RBtree(B).
16
Constraint/Function Monotonicity
  • A constraint C is a-monotone if whenever a cell
    is not in CUBE(C), neither is any of its super-cells.
  • A constraint C is m-monotone if whenever a cell
    is in CUBE(C), so is its every super-cell.
  • A function x(y) is a-monotone wrt y if x
    decreases as y grows (for cell-valued y) or as y
    increases (for real-valued y).
  • A function x(y) is m-monotone wrt y if x
    increases as y grows (for cell-valued y) or as y
    increases (for real-valued y).
  • An example: sum(v) = psum(v) − nsum(v)
  • sum(v) is m-monotone wrt psum(v)
  • sum(v) is a-monotone wrt nsum(v)

17
Constraint/Function Monotonicity
  • Let ā denote m, and m̄ denote a. Let t denote
    either a or m.
  • Example: psum(v) ≥ σ is a-monotone; then psum(v)
    ≤ σ is m-monotone.
  • If psum(c1) ≥ σ does not hold, then psum(c2) ≥ σ
    does not hold either, where c2 is a super-cell of
    c1 (say c1 is a cell of ABC, and c2 is a cell of
    ABCD).
  • f(v) ≥ σ is t-monotone if and only if f(v) is
    t-monotone wrt v.
  • f(v) ≤ σ is t̄-monotone if and only if f(v) is
    t-monotone wrt v.
  • An example: sum(v) = psum(v) − nsum(v) ≥ σ.
  • sum(v) ≥ σ is m-monotone with psum(v), because
    sum(v) is m-monotone wrt psum(v).
  • sum(v) ≥ σ is a-monotone with nsum(v), because
    sum(v) is a-monotone wrt nsum(v).

18
Find Approximators
  • Consider f(v) θ σ.
  • Divide the aggregates in f(v) into two groups:
  • A+: as cell v grows (becomes a super-cell), f
    monotonically increases.
  • A−: as cell v grows (becomes a super-cell), f
    monotonically decreases.
  • Consider sum(v) = psum(v) − nsum(v) θ σ:
  • A+ = {nsum(v)}
  • A− = {psum(v)}
  • f(A+, A−/cmin) ≥ σ and f(A+/cmin, A−) ≤ σ are
    m-monotone approximators in a subspace Si, where
    cmin is the min cell instantiation in Si.
  • f(A+/cmax, A−) ≥ σ and f(A+, A−/cmax) ≤ σ are
    a-monotone approximators in a subspace Si, where
    cmax is the max cell instantiation in Si.
  • For sum: psum(v) − nsum(cmax) ≥ σ.
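The a-monotone approximator f(A+/cmax, A−) can be sketched for sum = psum − nsum. The subspace and its per-cell aggregates below are invented; the point is that nsum(cmax) is the smallest cost in the subspace, so psum(v) − nsum(cmax) is an upper bound on sum(v) that only shrinks as the cell grows.

```python
# Subspace S, listed from the min cell to the max cell c_max = abc;
# the (psum, nsum) pairs are invented but respect monotonicity:
# both shrink as the cell grows.
S = {
    "a":   (300, 120),
    "ab":  (220, 70),
    "abc": (180, 40),
}
sigma = 160
nsum_at_cmax = S["abc"][1]      # nsum(c_max): the smallest cost in S

for cell, (ps, ns) in S.items():
    f = ps - ns                 # the real aggregate sum(v)
    g = ps - nsum_at_cmax       # the a-monotone approximator
    assert g >= f               # weaker: never rejects a satisfying cell

print([(c, S[c][0] - nsum_at_cmax > sigma) for c in S])
```

Since g inherits anti-monotonicity from psum, a cell failing g safely prunes all of its super-cells in the subspace.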

19
Separate Monotonicity
  • Consider function rewriting:
  • rewrite (E1 + E2) × E into E1 × E + E2 × E.
  • Consider space division:
  • divide a space into subspaces Si.
  • Find approximators using equation rewriting
    techniques for each subspace Si.

20
Experimental Studies
  • Consider sum(v) = psum(v) − nsum(v)
  • Three algorithms:
  • BUC: push only the minimum support.
  • BUC+: push approximators and minimum support.
  • DA: push approximators and minimum support.

21
Vary minimum support
22
Without minimum support
[Chart: constraint psum(v) > σ]
23
Scalability
24
Conclusion
  • General aggregate constraints, rather than only
    well-behaved constraints.
  • SQL-like tuple-based aggregates, rather than
    item-based aggregates.
  • Constraint-independent techniques, rather than
    constraint-specific techniques.
  • A new push strategy: divide-and-approximate.