Title: An Optimal and Progressive Algorithm for Skyline Queries
- Presenter: Jongwuk Lee
- Information Database Systems Lab.
- POSTECH
2Contents
- Introduction
- Skyline queries
- Existing solutions
- Motivation
- Algorithm: BBS
- Other discussions
3Finding the Cheapest Closest Hotels
- Which one is better?
- i and h?
- i, because its price and distance dominate those of h.
- i and k?
- I do not know: neither one dominates the other.
[Figure: hotels (points a, b, c, ...) plotted by distance (x-axis, 1 to 10) and price (y-axis, 1 to 10)]
4Skyline Objects
- A set of objects not dominated by any other object.
- Dominating region
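The definition above can be illustrated with a minimal Python sketch. It assumes that smaller values are better on both axes (distance, price). The names `dominates` and `naive_skyline` are made up for this illustration, and the coordinates are hypothetical rather than read off the slide figure.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (distance, price); smaller is better on both axes


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q on every axis and strictly better on one."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def naive_skyline(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates (quadratic time)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels, NOT the coordinates from the slide figure.
hotels = [(1, 9), (3, 2), (4, 4), (9, 1), (6, 6)]
skyline = naive_skyline(hotels)
print(skyline)
```

In this made-up data set, (3, 2) dominates (4, 4) and (6, 6), while (1, 9), (3, 2), and (9, 1) are mutually incomparable, so those three form the skyline.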
5Existing Solutions
- Elementary skyline algorithms: Block Nested Loop (BNL), Divide-and-Conquer (DC)
- Progressive skyline algorithms: Bitmap method, Index method, Nearest Neighbor (NN)
6Existing Solutions
- Block Nested Loop (BNL)
- Scan the dataset and keep a list of candidate skyline points.
- Compare each point p with every point in the list.
- Advantages
- Wide applicability
- Disadvantages
- Numerous comparisons; unsuitable for on-line processing
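The scan-and-compare idea on this slide can be sketched as follows. This is a simplified in-memory version: it assumes the whole candidate list fits in memory and does not model any disk-based overflow handling. The function name `bnl_skyline` and the sample data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # smaller is better on both axes


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def bnl_skyline(points: List[Point]) -> List[Point]:
    """Single scan that maintains a list of candidate skyline points."""
    candidates: List[Point] = []
    for p in points:
        # Discard p if an existing candidate dominates it.
        if any(dominates(c, p) for c in candidates):
            continue
        # Otherwise p evicts every candidate it dominates, then joins the list.
        candidates = [c for c in candidates if not dominates(p, c)]
        candidates.append(p)
    return candidates


# Hypothetical data, not taken from the slides.
result = bnl_skyline([(4, 4), (1, 9), (3, 2), (6, 6), (9, 1)])
print(result)
```

Note that the result is only known to be final once the scan has finished, which is why this approach is not progressive.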
7Existing Solutions
- Divide-and-Conquer (DC)
- Divide the dataset into several partitions.
- Compute partial skylines in each partition.
- Compute the global skyline by merging the partial skylines.
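The three steps above can be sketched in Python. This is a minimal illustration under simplifying assumptions: the data set is split in half after sorting, and the merge step is a plain dominance filter over the two partial skylines, not an optimized merge. The name `dc_skyline` and the data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # smaller is better on both axes


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def dc_skyline(points: List[Point]) -> List[Point]:
    """Divide the data, compute partial skylines, then merge them."""
    if len(points) <= 1:
        return list(points)
    ordered = sorted(points)              # divide: split at the median position
    mid = len(ordered) // 2
    left = dc_skyline(ordered[:mid])      # partial skyline of the first half
    right = dc_skyline(ordered[mid:])     # partial skyline of the second half
    merged = left + right                 # merge: drop points dominated across halves
    return [p for p in merged if not any(dominates(q, p) for q in merged)]


# Hypothetical data, not taken from the slides.
result = dc_skyline([(4, 4), (1, 9), (3, 2), (6, 6), (9, 1)])
print(result)
```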
8Existing Solutions
- Nearest Neighbor (NN)
- Find the nearest neighbor of the origin; it is a skyline point.
- Divide the space by the nearest neighbor point.
- Recurse on each partition until it is empty.
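A two-dimensional sketch of this recursion is shown below. It assumes the L1 distance to the origin as the nearest-neighbor metric and replaces the index-based nearest-neighbor search with a linear scan, so it illustrates the partitioning logic only. The name `nn_skyline_2d`, the representation of a region as a pair of exclusive upper bounds, and the data are all assumptions made for this example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # smaller is better on both axes


def nn_skyline_2d(points: List[Point]) -> List[Point]:
    """Repeatedly take the nearest neighbor of the origin inside a region."""
    skyline: List[Point] = []
    # To-do list of regions; each region is {x < x_bound and y < y_bound}.
    todo = [(math.inf, math.inf)]
    while todo:
        x_bound, y_bound = todo.pop()
        region = [p for p in points if p[0] < x_bound and p[1] < y_bound]
        if not region:
            continue  # empty partition: nothing left to find here
        # Linear scan standing in for an index-based nearest-neighbor query.
        nearest = min(region, key=lambda p: p[0] + p[1])
        skyline.append(nearest)  # can be reported immediately (progressive)
        # Split the space by the nearest neighbor into two partitions.
        todo.append((nearest[0], y_bound))  # points with smaller x
        todo.append((x_bound, nearest[1]))  # points with smaller y
    return skyline


# Hypothetical data, not taken from the slides.
result = nn_skyline_2d([(1, 9), (3, 2), (4, 4), (9, 1), (6, 6)])
print(result)
```

In two dimensions the two partitions cannot share a data point, because any point with both a smaller x and a smaller y would have been the nearest neighbor instead.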
9Existing Solutions
- NN over three or more dimensions
- The divided subspaces overlap.
- Needs duplicate elimination.
[Figure: NN partitions for 3 dimensions]
10Motivation
- Advantages of the NN algorithm
- Returns the first result quickly
- Progressiveness
- Disadvantages of the NN algorithm
- Redundant I/O
- Explosive growth of the to-do list
- Goal: improve the NN algorithm and offer useful variations.
Is there a more efficient and more useful skyline algorithm?
11Contents
- Introduction
- Algorithm: BBS
- Preliminary: R-tree
- How BBS works
- Example
- Other Discussions
12R-Tree Clustering by Proximity
[Figure: R-tree with a root node, entries E1 to E7, and data points a to m]
13R-Tree
15Branch-and-Bound Skyline (BBS)
- Assume all points are indexed in an R-tree.
- Top-down approach
- mindist: for an MBR, the L1 distance between its lower-left corner and the origin.
f(x, y) = x + y
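The mindist definition can be written out as a small sketch. It assumes non-negative coordinates and an MBR given by its lower-left and upper-right corners. The function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def mindist_point(p: Point) -> float:
    """L1 distance from the origin to a point: f(x, y) = x + y."""
    return p[0] + p[1]


def mindist_mbr(lower_left: Point, upper_right: Point) -> float:
    """L1 distance from the origin to an MBR: only the lower-left corner matters."""
    return mindist_point(lower_left)


# Hypothetical values: the mindist of an MBR never exceeds that of a point inside it.
box_dist = mindist_mbr((2, 1), (5, 4))
inner_dist = mindist_point((3, 2))
print(box_dist, inner_dist)
```

Because an MBR's mindist is a lower bound for every point it contains, visiting entries in ascending mindist order never reports a point before a point that could dominate it.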
16Branch-and-Bound Skyline (BBS)
- Data structures
- Heap ordered by mindist
- List to maintain the current skyline
- Dominance check condition
- Before expanding or inserting an entry, compare it with the current skyline points.
- Before inserting an object, also check for internal objects.
- Stop condition: empty heap
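Putting the heap, the skyline list, and the dominance checks together gives the following sketch. It uses a tiny hand-built in-memory tree in place of a real R-tree; the `Node` class, its `lower_left` helper, and the sample tree are hypothetical and chosen only so that the example runs on its own. An MBR is treated as dominated when its lower-left corner is dominated.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # smaller is better on both axes


@dataclass
class Node:
    """Stand-in for an R-tree node: holds child nodes and/or data points."""
    children: List["Node"] = field(default_factory=list)
    points: List[Point] = field(default_factory=list)

    def lower_left(self) -> Point:
        corners = self.points + [c.lower_left() for c in self.children]
        return (min(c[0] for c in corners), min(c[1] for c in corners))


def dominates(p: Point, q: Point) -> bool:
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def corner_of(entry) -> Point:
    """A data point is its own corner; a node uses its MBR's lower-left corner."""
    return entry if isinstance(entry, tuple) else entry.lower_left()


def bbs_skyline(root: Node) -> List[Point]:
    skyline: List[Point] = []
    serial = 0  # tie-breaker so the heap never has to compare entries directly
    heap = [(sum(corner_of(root)), serial, root)]
    while heap:  # stop condition: empty heap
        _, _, entry = heapq.heappop(heap)
        # Check again before expanding: the skyline may have grown since insertion.
        if any(dominates(s, corner_of(entry)) for s in skyline):
            continue
        if isinstance(entry, tuple):
            skyline.append(entry)  # an undominated point is a skyline point
            continue
        for child in entry.children + entry.points:
            corner = corner_of(child)
            # Check before inserting: skip entries that are already dominated.
            if not any(dominates(s, corner) for s in skyline):
                serial += 1
                heapq.heappush(heap, (sum(corner), serial, child))
    return skyline


# Hypothetical tree and points, not the R-tree from the slides.
leaf_a = Node(points=[(1, 9), (2, 10)])
leaf_b = Node(points=[(3, 2), (4, 4)])
leaf_c = Node(points=[(9, 1), (6, 6)])
tree = Node(children=[Node(children=[leaf_a, leaf_b]), leaf_c])
result = bbs_skyline(tree)
print(result)
```

Skyline points come out in ascending mindist order, so the first result is available as soon as the closest undominated point reaches the top of the heap.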
17Example of BBS
- Each heap entry keeps the mindist of the MBR.
- Process entries in ascending order of their mindists.
- Actions, one added per slide (slides 17 to 23):
access root
expand e7
expand e3
remove e6
remove e5
expand e1
expand e4
24Contents
- Introduction
- Algorithm: BBS
- Other Discussions
- Constrained skyline queries
- K-dominating queries
25Constrained Skyline Queries
30K-dominating Queries
- Retrieve 3 points that dominate the largest number of other points.
num(i) = 9, num(a) = 2, num(k) = 2
3-dominating result (so far): i
h and m may dominate at most 7 points (num(i) - 2).
31K-dominating Queries
num(h) = 7, num(m) = 5, num(a) = 2, num(k) = 2
c and g may dominate at most 5 points (num(h) - 2).
3-dominating result: i, h, m
2-dominating result: i, h
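The meaning of num(p) and of a K-dominating result can be checked with a brute-force sketch. It counts dominated points for every candidate and therefore ignores the skyline-based pruning used in the slides. The function names and the data are hypothetical and do not reproduce the counts shown above.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # smaller is better on both axes


def dominates(p: Point, q: Point) -> bool:
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def num(p: Point, points: List[Point]) -> int:
    """Number of points in the data set that p dominates."""
    return sum(1 for q in points if dominates(p, q))


def k_dominating(points: List[Point], k: int) -> List[Point]:
    """Brute force: rank all points by num(p), highest first, and keep k of them."""
    return sorted(points, key=lambda p: num(p, points), reverse=True)[:k]


# Hypothetical data, not the points from the slides.
data = [(1, 9), (3, 2), (4, 4), (9, 1), (6, 6), (5, 5)]
top3 = k_dominating(data, 3)
print(top3, [num(p, data) for p in top3])
```

In this made-up data set only the first result is a skyline point; the second and third are themselves dominated by it, which mirrors how h and m follow i in the slides.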
32That's it.
Q&A
Thank you!