Mining for Empty Rectangles in Large Data Sets

Slides: 28
Provided by: jarek
Transcript and Presenter's Notes

1
Mining for Empty Rectangles in Large Data Sets
Jeff Edmonds, Jarek Gryz, Dongming Liang, Renee Miller
2
Matrix representation
π_{A,B}(R ⋈ S)
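One way to read this slide is that the matrix has one cell per (A, B) value pair, holding 1 when the pair appears in the projection onto (A, B) of the join of R and S, and 0 otherwise. A minimal sketch under that reading follows. The join column C, the tuple layouts, the choice of A for columns and B for rows, and the name `join_matrix` are all assumptions made for illustration.

```python
def join_matrix(r_rows, s_rows, a_values, b_values):
    """Build a 0/1 matrix for the (A, B) pairs produced by joining R and S.

    r_rows: iterable of (c, a) tuples (hypothetical layout of R).
    s_rows: iterable of (c, b) tuples (hypothetical layout of S).
    a_values, b_values: ordered domains; A indexes columns, B indexes rows.
    """
    # Index S by the join column so each R row finds its partners directly.
    s_by_c = {}
    for c, b in s_rows:
        s_by_c.setdefault(c, set()).add(b)

    # Collect every (a, b) pair that survives the join and the projection.
    pairs = set()
    for c, a in r_rows:
        for b in s_by_c.get(c, ()):
            pairs.add((a, b))

    return [[1 if (a, b) in pairs else 0 for a in a_values] for b in b_values]


if __name__ == "__main__":
    r = [(1, "x"), (2, "y")]
    s = [(1, 10), (2, 20), (3, 30)]
    for row in join_matrix(r, s, ["x", "y"], [10, 20, 30]):
        print(row)
```

Cells that stay 0 are the value combinations that never occur together, which is what the rest of the talk mines for.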
3
Find All Maximal 0-Rectangles
π_{A,B}(R ⋈ S)
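To make the goal concrete, here is a brute-force reference that lists every maximal 0-rectangle of a small matrix. It is not the algorithm of the talk, only a slow checker. It assumes the matrix is a list of rows indexed as `m[y][x]` and reports rectangles as `(x1, y1, x2, y2)` with inclusive corners; both conventions are choices made for this sketch.

```python
def is_zero_rect(m, x1, y1, x2, y2):
    """True if every cell in the inclusive rectangle is 0."""
    return all(m[y][x] == 0
               for y in range(y1, y2 + 1)
               for x in range(x1, x2 + 1))


def maximal_zero_rectangles_bruteforce(m):
    """Enumerate all 0-rectangles that cannot grow by one row or column."""
    height = len(m)
    width = len(m[0]) if height else 0
    result = []
    for y1 in range(height):
        for y2 in range(y1, height):
            for x1 in range(width):
                for x2 in range(x1, width):
                    if not is_zero_rect(m, x1, y1, x2, y2):
                        continue
                    # Try to grow the rectangle by one line on each side.
                    left = x1 > 0 and is_zero_rect(m, x1 - 1, y1, x1 - 1, y2)
                    right = x2 < width - 1 and is_zero_rect(m, x2 + 1, y1, x2 + 1, y2)
                    up = y1 > 0 and is_zero_rect(m, x1, y1 - 1, x2, y1 - 1)
                    down = y2 < height - 1 and is_zero_rect(m, x1, y2 + 1, x2, y2 + 1)
                    if not (left or right or up or down):
                        result.append((x1, y1, x2, y2))
    return result


if __name__ == "__main__":
    example = [[0, 0, 1],
               [0, 0, 0],
               [1, 0, 0]]
    print(maximal_zero_rectangles_bruteforce(example))
```

A checker like this is only practical for tiny inputs, but it is handy for validating a faster implementation.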
4
Example
π_{A,B}(R ⋈ S)

[Matrix figure: columns are the years 95, 96, 97; rows are the car models BMW Z3, Honda L2, Toyota 6A; each cell is 0 or 1.]


First BMW Z3 series cars were made in 1997.
5
Relation to Previous Work
Columns: Namaad, Hsu, Lee | Our Work | Lui, Ku, Hsu; Orlowski
Rows: Problem | Purpose | # of maximal 0-rectangles
6
Relation to Previous Work
Columns: Namaad, Hsu, Lee | Our Work | Lui, Ku, Hsu; Orlowski
Rows: Time | Space
7
Relation to Previous Work
Columns: Namaad, Hsu, Lee | Our Work | Lui, Ku, Hsu; Orlowski
Practical Implementation
Scalable
Practical?
8
Structure of Algorithm
  • loop y = 1..Y
    • loop x = 1..X
      • Construct staircase(x,y)
      • Output all maximal 0-rectangles with <x,y> as bottom-right corner

Timing: O(1) amortized time per <x,y>
[Figure: X-by-Y matrix of 0s and 1s, axes labelled 1..X and 1..Y, with the cell <x,y> marked.]
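The loop order matters because later slides build staircase(x,y) from values computed at <x-1,y> and <x,y-1>. A small sketch of the visiting order, using the slide's 1-based coordinates (the generator name `visit_order` is made up for this illustration):

```python
def visit_order(width, height):
    """Yield cells <x,y> row by row, left to right, with 1-based coordinates."""
    for y in range(1, height + 1):
        for x in range(1, width + 1):
            yield (x, y)


if __name__ == "__main__":
    seen = set()
    for x, y in visit_order(3, 2):
        # Both neighbours that the incremental construction relies on
        # have already been visited (or lie outside the matrix).
        assert x == 1 or (x - 1, y) in seen
        assert y == 1 or (x, y - 1) in seen
        seen.add((x, y))
    print(list(visit_order(3, 2)))
```

Any order with this property would work for the incremental updates; row-major order is simply the one the slide uses.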
9
Structure of Algorithm
  • loop y = 1..Y
    • loop x = 1..X
      • Construct staircase(x,y)
      • Output all maximal 0-rectangles with <x,y> as bottom-right corner

Query Optimization Experimental Results
[Figure: X-by-Y matrix of 0s and 1s, axes labelled 1..X and 1..Y, with the cell <x,y> marked.]
10
Staircase(x,y)
[Figure: staircase(x,y) drawn in the matrix with axes 1..X and 1..Y, with the cell <x,y> marked.]
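The slides show the staircase only as a picture. One plausible reading is that staircase(x,y) is the outline of all 0-rectangles whose bottom-right corner is <x,y>, stored as a list of steps (x_i, y_i), each the top-left corner of a rectangle that cannot grow left or up. The sketch below computes it directly from that definition, without the incremental trick of the later slides. It assumes a 0-indexed matrix `m[y][x]` with row 0 at the top; the function names are illustrative.

```python
def column_height(m, x, y):
    """Number of consecutive 0s in column x ending at row y (0 if m[y][x] is 1)."""
    h = 0
    while y - h >= 0 and m[y - h][x] == 0:
        h += 1
    return h


def staircase(m, x, y):
    """Steps (x_i, y_i) of the staircase at <x,y>, leftmost (widest) step first."""
    steps = []      # built right to left, reversed at the end
    limit = None    # height of the tallest rectangle that still fits
    cx = x
    while cx >= 0:
        h = column_height(m, cx, y)
        if h == 0:
            break                      # a 1 in row y blocks everything further left
        if limit is None or h < limit:
            limit = h                  # rectangle must get shorter: new step
            steps.append([cx, y - limit + 1])
        else:
            steps[-1][0] = cx          # same height still fits: widen current step
        cx -= 1
    return [tuple(s) for s in reversed(steps)]


if __name__ == "__main__":
    m = [[1, 1, 0],
         [1, 0, 0],
         [0, 0, 0]]
    print(staircase(m, 2, 2))
```

Each step (x_i, y_i) stands for the rectangle spanning columns x_i..x and rows y_i..y.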
11
Constructing Maximal Rectangles
[Figure: diagram with the cell <x,y> marked.]
12
Constructing Maximal Rectangles
  • Too narrow
  • Maximal
  • Too short

[Figure: candidate rectangles with bottom-right corner <x,y>, labelled too narrow, maximal, or too short.]
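A possible reading of the three labels: a step's rectangle is "too narrow" if it could still be extended downward, "too short" if it could still be extended to the right, and maximal otherwise. The sketch below encodes that reading with 0-indexed coordinates. The parameters `x_star` (leftmost column of the run of 0s in row y+1 ending at column x) and `y_star` (topmost row of the run of 0s in column x+1 ending at row y) are assumed definitions, as is the function name.

```python
def classify_steps(steps, x_star, y_star):
    """Label each staircase step (x_i, y_i) for a fixed bottom-right corner <x,y>.

    x_star: leftmost column such that row y+1 is all 0 from x_star to x
            (x + 1 if the cell below <x,y> is 1 or outside the matrix).
    y_star: topmost row such that column x+1 is all 0 from y_star to y
            (y + 1 if the cell right of <x,y> is 1 or outside the matrix).
    """
    labels = []
    for xi, yi in steps:
        if xi >= x_star:
            labels.append("too narrow")   # fits above the 0-run below: can grow down
        elif yi >= y_star:
            labels.append("too short")    # fits beside the 0-run to the right
        else:
            labels.append("maximal")      # blocked below and on the right
    return labels


if __name__ == "__main__":
    steps = [(0, 2), (1, 1), (2, 0)]
    print(classify_steps(steps, x_star=2, y_star=2))
```

A step can fail both tests at once; this sketch reports only the first failing test in that case.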
13
Constructing staircase(x,y) from staircase(x-1,y)
Case 1
[Figure: 0/1 matrix showing the staircase ending at the cell <x-1,y>.]
14
Constructing staircase(x,y) from staircase(x-1,y)
Case 2
[Figure: 0/1 matrix showing the staircase ending at the cell <x-1,y>.]
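The two cases can be sketched as a stack update. This is an interpretation, since the slides show the cases only as pictures: in one case the new column of 0s is taller than every existing step and a step is added; in the other, steps taller than the new column are deleted. The sketch assumes steps are (x_i, y_i) pairs kept with the widest step at the bottom of the stack, and that `y_r` is the top row of the run of 0s in column x ending at row y (`None` when <x,y> is a 1). All names are illustrative.

```python
def update_staircase(stack, x, y_r):
    """Turn staircase(x-1, y) into staircase(x, y) in place and return it."""
    if y_r is None:
        stack.clear()              # <x,y> is a 1: no 0-rectangle ends here
        return stack

    start = x
    # Delete steps that reach higher than the new column allows.
    while stack and stack[-1][1] < y_r:
        start, _ = stack.pop()     # the replacement step inherits the widest x

    # Add a step unless one with exactly this top row is already present.
    if not stack or stack[-1][1] > y_r:
        stack.append((start, y_r))
    return stack


if __name__ == "__main__":
    s = []
    for x, y_r in enumerate([2, 1, 0]):    # columns get taller: only additions
        update_staircase(s, x, y_r)
    print(s)
    print(update_staircase(s, 3, 2))       # short column: steps get deleted
```

At most one step is added per call, which is the property the timing slides rely on.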
15
Constructing staircase(x,y) from staircase(x-1,y)
  • Too narrow
  • Maximal
  • Too short

[Figure: 0/1 matrix with axes 1..X and 1..Y; the cells <x-1,y> and (x_r, y_r) are marked.]
16
Constructing x(x,y) and y(x,y)
[Figure: 0/1 matrix with the cells <x-1,y> and (x_r, y_r) marked and the value x(x-1,y) labelled.]
17
Constructing x(x,y) and y(x,y) from x(x-1,y) and y(x,y-1)
[Figure: 0/1 matrix with the cells <x-1,y> and (x_r, y_r) marked; x(x-1,y) is labelled, and y(x,y-1) is labelled as saved.]
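The slides do not define the two quantities in text, so the sketch below fixes one plausible meaning and labels it as an assumption. Here `x_star` for cell <x,y> is the leftmost column of the run of 0s in row y+1 that ends at column x, and `y_star` is the topmost row of the run of 0s in column x+1 that ends at row y. Each is updated in constant time from the neighbouring value, with the per-column `y_star` value saved from the previous row. Coordinates are 0-indexed and the function names are illustrative.

```python
def next_x_star(m, x, y, prev_x_star):
    """x_star for <x,y>, given x_star for <x-1,y> (None at the start of a row)."""
    if y + 1 >= len(m) or m[y + 1][x] == 1:
        return x + 1                       # nothing to extend into below <x,y>
    if prev_x_star is not None and prev_x_star <= x - 1:
        return prev_x_star                 # the run of 0s below continues
    return x                               # a new run of 0s starts at column x


def next_y_star(m, x, y, saved_y_star):
    """y_star for <x,y>, given the saved y_star for <x,y-1> (None in the first row)."""
    if x + 1 >= len(m[0]) or m[y][x + 1] == 1:
        return y + 1                       # nothing to extend into right of <x,y>
    if saved_y_star is not None and saved_y_star <= y - 1:
        return saved_y_star                # the run of 0s to the right continues
    return y                               # a new run of 0s starts at row y


if __name__ == "__main__":
    m = [[1, 1, 0],
         [1, 0, 0],
         [0, 0, 0]]
    prev, row = None, []
    for x in range(3):
        prev = next_x_star(m, x, 0, prev)
        row.append(prev)
    print(row)
```

The values `x + 1` and `y + 1` act as sentinels meaning "no extension possible".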
18
Structure of Algorithm
  • loop y = 1..Y
    • loop x = 1..X
      • Construct staircase(x,y)
      • Output all maximal 0-rectangles with <x,y> as bottom-right corner

Timing: O(1) amortized time per <x,y>
[Figure: X-by-Y matrix of 0s and 1s, axes labelled 1..X and 1..Y, with the cell <x,y> marked.]
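Putting the pieces together, the loop can be sketched end to end as follows. This is an illustrative reconstruction, not the authors' code: it uses 0-indexed coordinates with `m[y][x]`, keeps the staircase as a stack of (x_i, y_i) steps, and tests each step against two helper values named `x_star` and `y_star` here (how far the row below and the column to the right stay 0). For simplicity it scans the whole stack when reporting, so it does not reproduce the O(1) amortized bound for the output step.

```python
def maximal_zero_rectangles(m):
    """Return maximal 0-rectangles as (x1, y1, x2, y2), inclusive corners."""
    height = len(m)
    width = len(m[0]) if height else 0
    y_r = [None] * width      # per column: top row of the 0-run ending at the current row
    y_star = [None] * width   # per column x: top row of the 0-run in column x+1 (saved)
    found = []
    for y in range(height):
        stack = []            # staircase for the current row, widest step at the bottom
        x_star = None         # leftmost column of the 0-run in row y+1 ending at x
        for x in range(width):
            # Update x_star from its value at <x-1,y>.
            if y + 1 >= height or m[y + 1][x] == 1:
                x_star = x + 1
            elif x_star is None or x_star > x - 1:
                x_star = x
            # Update y_star from its saved value at <x,y-1>.
            if x + 1 >= width or m[y][x + 1] == 1:
                y_star[x] = y + 1
            elif y_star[x] is None or y_star[x] > y - 1:
                y_star[x] = y
            # A 1 ends every rectangle: empty the staircase.
            if m[y][x] == 1:
                y_r[x] = None
                stack.clear()
                continue
            if y_r[x] is None:
                y_r[x] = y
            # Construct staircase(x,y) from staircase(x-1,y).
            start = x
            while stack and stack[-1][1] < y_r[x]:
                start, _ = stack.pop()
            if not stack or stack[-1][1] > y_r[x]:
                stack.append((start, y_r[x]))
            # Report steps that can grow neither down nor to the right.
            for xi, yi in stack:
                if xi < x_star and yi < y_star[x]:
                    found.append((xi, yi, x, y))
    return found


if __name__ == "__main__":
    example = [[0, 0, 1],
               [0, 0, 0],
               [1, 0, 0]]
    print(sorted(maximal_zero_rectangles(example)))
```

Each maximal rectangle is reported exactly once, at its bottom-right corner.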
19
Timing
Only work that is not constant time: deleting steps
  • Too narrow
  • Maximal
  • Too short

[Figure: 0/1 matrix with axes 1..X and 1..Y; the cells <x,y> and (x_r, y_r) are marked.]
20
Timing
Amortized # of steps deleted (per <x,y>) ≤ # of steps created (per <x,y>) ≤ 1
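The amortized bound can be written out as a short counting argument. The symbols c(x,y) and d(x,y) are introduced here for illustration: they stand for the number of staircase steps created and deleted while processing cell <x,y>. Assuming at most one step is created per cell, and that a step can be deleted only after it has been created:

```latex
\sum_{y=1}^{Y}\sum_{x=1}^{X} d(x,y)
\;\le\;
\sum_{y=1}^{Y}\sum_{x=1}^{X} c(x,y)
\;\le\;
X \cdot Y
\quad\Longrightarrow\quad
\frac{1}{X\,Y}\sum_{y=1}^{Y}\sum_{x=1}^{X} d(x,y) \;\le\; 1 .
```

A single cell may trigger many deletions, but averaged over all X·Y cells the deletion work is at most one step per cell.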
21
Number of Maximal Rectangles
# of maximal 0-rectangles
  • O((# 1s)^2) [Namaad, Hsu, Lee]
  • Running time of algorithm: O(# 0s)


22
How many empty rectangles are there?
Tests were done on 4 pairs of attributes with numerical domains that appear in typical joins in a real-world workload of a health insurance company.
23
How big are the rectangles?
24
Query rewrite simple case
Original:  select from R, S, ... where R.C = S.C and 60 < R.A < 80 and 20 < S.B < 80 and ...
Rewritten: select from R, S, ... where R.C = S.C and 60 < R.A < 80 and 20 < S.B < 60 and ...
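The simple case can be sketched as trimming one range predicate when a known empty rectangle spans the whole range on the other attribute. The sketch assumes integer domains and inclusive (lo, hi) ranges, so 60 < R.A < 80 becomes (61, 79). The empty rectangle used in the example is hypothetical, chosen only to reproduce the change from B < 80 to B < 60.

```python
def trim_b_range(query_a, query_b, empty_a, empty_b):
    """Shrink the B range of a query using one empty rectangle.

    All arguments are inclusive integer ranges (lo, hi).
    Returns the new B range, or None if no B value can qualify.
    """
    qa_lo, qa_hi = query_a
    qb_lo, qb_hi = query_b
    ea_lo, ea_hi = empty_a
    eb_lo, eb_hi = empty_b

    # The rectangle helps only if it spans the query's entire A range.
    if not (ea_lo <= qa_lo and qa_hi <= ea_hi):
        return query_b
    # No overlap on B: nothing to trim.
    if eb_hi < qb_lo or eb_lo > qb_hi:
        return query_b

    covers_low = eb_lo <= qb_lo
    covers_high = eb_hi >= qb_hi
    if covers_low and covers_high:
        return None                      # the whole query region is empty
    if covers_high:
        return (qb_lo, eb_lo - 1)        # cut off the upper part of the range
    if covers_low:
        return (eb_hi + 1, qb_hi)        # cut off the lower part of the range
    return query_b                       # hole in the middle: not the simple case


if __name__ == "__main__":
    # 60 < A < 80 and 20 < B < 80, with no tuples in A 61..79, B 60..79.
    print(trim_b_range((61, 79), (21, 79), (61, 79), (60, 79)))
```

The result (21, 59) corresponds to the predicate 20 < S.B < 60.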
25
Query rewrite complex case
Original:  select from R, S, ... where R.C = S.C and 60 < R.A < 80 and 20 < S.B < 80 and ...
Rewritten: select from R, S, ... where R.C = S.C and ((... and ...) or (... and ...) or (... and ...) or ...)
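When the empty rectangle sits inside the query region, the remainder is no longer a single pair of ranges, which is presumably why the rewritten predicate becomes a disjunction. A sketch of one way to compute the disjuncts, assuming integer domains and inclusive boxes `(a_lo, a_hi, b_lo, b_hi)`; the function name and the example rectangle are hypothetical.

```python
def subtract_rectangle(query, empty):
    """Split query-minus-empty into disjoint boxes (a_lo, a_hi, b_lo, b_hi)."""
    qa_lo, qa_hi, qb_lo, qb_hi = query
    ea_lo, ea_hi, eb_lo, eb_hi = empty

    # Clip the empty rectangle to the query region.
    ca_lo, ca_hi = max(qa_lo, ea_lo), min(qa_hi, ea_hi)
    cb_lo, cb_hi = max(qb_lo, eb_lo), min(qb_hi, eb_hi)
    if ca_lo > ca_hi or cb_lo > cb_hi:
        return [query]                   # no overlap: query is unchanged

    pieces = []
    if qa_lo < ca_lo:                    # slab with smaller A values
        pieces.append((qa_lo, ca_lo - 1, qb_lo, qb_hi))
    if ca_hi < qa_hi:                    # slab with larger A values
        pieces.append((ca_hi + 1, qa_hi, qb_lo, qb_hi))
    if qb_lo < cb_lo:                    # strip with smaller B values
        pieces.append((ca_lo, ca_hi, qb_lo, cb_lo - 1))
    if cb_hi < qb_hi:                    # strip with larger B values
        pieces.append((ca_lo, ca_hi, cb_hi + 1, qb_hi))
    return pieces


if __name__ == "__main__":
    for box in subtract_rectangle((61, 79, 21, 79), (65, 70, 40, 50)):
        a_lo, a_hi, b_lo, b_hi = box
        print(f"({a_lo} <= R.A <= {a_hi} and {b_lo} <= S.B <= {b_hi})")
```

Each returned box maps to one "(... and ...)" term; whether the extra disjuncts pay off is a separate cost question that this sketch does not address.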
26
How much do the rectangles overlap with queries?
27
Query optimization experiments
  • real-world workload of 26 queries
  • 5 of the queries qualified for the rewrite
  • only simple rewrites were considered
  • all rewrites led to improved performance