Efficient Computation of the Skyline Cube

Description:

The University of New South Wales & NICTA. Sydney, Australia. Joint Work: Xuemin Lin (UNSW), Qing Liu (UNSW), Wei Wang (UNSW), Jeffrey Xu Yu (CUHK) ...

Slides: 31
Provided by: yid

Transcript and Presenter's Notes

1
Efficient Computation of the Skyline Cube
  • Yidong Yuan
  • School of Computer Science & Engineering
  • The University of New South Wales & NICTA
  • Sydney, Australia
  • Joint Work: Xuemin Lin (UNSW), Qing Liu (UNSW),
    Wei Wang (UNSW), Jeffrey Xu Yu
    (CUHK), Qing Zhang (UNSW & CSIRO)

2
Outline
  • Introduction
  • Skycube Computation Techniques
  • Experiments
  • Summary

3
Skyline Query
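$(x_1, x_2, \dots, x_d)$ dominates $(y_1, y_2, \dots, y_d)$ $\iff$ $\forall i,\ x_i \le y_i$ and $\exists k,\ x_k < y_k$
  • A real estate example
  • P5 dominates P1
  • skyline returns data points not dominated by
    others

[Figure: "Skyline on price, dist"; points P1 to P5 plotted on price (x-axis) vs. dist (y-axis); table: Properties and Values]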
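
The dominance definition on this slide translates almost directly into code. The following is a minimal sketch, assuming smaller values are better on every dimension; the `houses` coordinates are hypothetical and are not the values from the slide's figure.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(x: Point, y: Point) -> bool:
    """x dominates y: no worse on every dimension, strictly better on at least one.

    Assumption: smaller values are better (cheaper, closer).
    """
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (price, dist) pairs; not the coordinates from the slide's figure.
houses = [(3, 4), (1, 5), (4, 1), (5, 5), (2, 2)]
print(skyline(houses))  # [(1, 5), (4, 1), (2, 2)]
```

Here (3, 4) and (5, 5) drop out because (2, 2) and (3, 4) respectively are at least as good on both dimensions and strictly better on one.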
4
Skyline Cube
Skycube Example
  • Skycube
  • Skyline on price, dist, age
  • Skyline on price, dist
  • Skyline on price, age
  • The union of the skyline results on all $2^d - 1$
    non-empty subsets of the d-dimensional set

Dataset
Lattice Structure of a Skycube
5
Motivation
  • How to compute Skycube efficiently?
  • existing skyline techniques are applicable
  • no sharing computation → Not efficient!
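
The no-sharing baseline that this slide argues against can be sketched as follows: run an independent skyline computation on each of the $2^d - 1$ non-empty subspaces. The function names and the three-dimensional `data` (read as price, dist, age) are illustrative assumptions, not taken from the presentation.

```python
from itertools import combinations


def dominates_on(x, y, dims):
    """x dominates y when only the dimensions in dims are considered."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline_on(points, dims):
    """Quadratic skyline restricted to the subspace dims."""
    return [p for p in points if not any(dominates_on(q, p, dims) for q in points)]


def naive_skycube(points, d):
    """One independent skyline per non-empty subspace; nothing is shared."""
    cube = {}
    for k in range(1, d + 1):
        for dims in combinations(range(d), k):
            cube[dims] = skyline_on(points, dims)
    return cube


# Hypothetical (price, dist, age) tuples.
data = [(1, 6, 3), (2, 4, 5), (4, 1, 6), (5, 5, 1), (6, 7, 7)]
cube = naive_skycube(data, 3)
print(len(cube))     # 7, i.e. 2**3 - 1 subspaces
print(cube[(0, 1)])  # [(1, 6, 3), (2, 4, 5), (4, 1, 6)]
```

Every subspace rescans the full dataset, which is the redundancy the sharing strategies later in the deck aim to remove.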

6
Motivation (cont.)
  • nested-loop-based alg.
  • BNL [ICDE 01]
  • redundant comparisons → Not efficient!
  • SFS [ICDE 03]: presort the dataset → keep the
    candidate list minimal
  • repeated sorting → Not efficient!

[Figure: points P1 to P5 plotted on dimensions A (x-axis) and B (y-axis)]
7
Motivation (cont.)
  • divide-and-conquer-based alg. (DC [ICDE 01])
  • repeats the same divide/merge steps → Not efficient!

[Figure: two panels, "Divide Step of Skyline on A and B" and "Divide Step of Skyline on A, B, and C"; each shows points P1 to P5 divided at the median mA on dimension A]
8
Outline
  • Introduction
  • Skycube Computation Techniques
  • Bottom-Up Skycube Algorithm (BUS)
  • Top-Down Skycube Algorithm (TDS)
  • Experiments
  • Summary

9
Property of Skycube
  • Distinct Value Condition
  • no two data points have the same value on the same
    dimension
  • $SKY_U(S)$: skyline on sub-dimension set $U$
  • $SKY_U(S) \subseteq SKY_V(S)$ if $U \subseteq V$
  • General Case
  • Keep track of the bad guys
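
The containment property under the distinct value condition follows from a short argument. The proof sketch below is a reconstruction from the definitions on the earlier slides, not text from the presentation; $p_i$ denotes the value of point $p$ on dimension $i$.

```latex
\textbf{Claim.} Under the distinct value condition, if $U \subseteq V$
(with $U \neq \emptyset$) then $SKY_U(S) \subseteq SKY_V(S)$.

\textbf{Proof sketch.} Let $p \in SKY_U(S)$ and suppose, for contradiction,
that some $q \in S$ with $q \neq p$ dominates $p$ on $V$. Then
$q_i \le p_i$ for all $i \in V$, and hence for all $i \in U$.
By the distinct value condition, $q_i \neq p_i$ on every dimension,
so in fact $q_i < p_i$ for all $i \in U$.
Thus $q$ dominates $p$ on $U$, contradicting $p \in SKY_U(S)$.
Therefore no point dominates $p$ on $V$, i.e. $p \in SKY_V(S)$.
```

The step from $\le$ to $<$ is exactly where distinct values are needed, which is why the general case requires separate handling.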

10
Basic Idea
  • compute the Skycube in a level-wise and
    bottom-up manner
  • each skyline is computed by a nested-loop-based
    algorithm

[Figure: lattice of subspaces over A, B, C with levels {A, B, C}, {AB, AC, BC}, {ABC}]
11
Sharing Strategies
  • share-results: $SKY_U(S) \subseteq SKY_V(S)$
  • reduce the size of input
  • reduce the number of dominance tests
  • share-sorting: sort the dataset on each dimension
  • keep the candidate list minimal
  • reduce the number of sorts from $2^d - 1$ to $d$

[Figure: lattice fragment with AB above A and B]
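
The two sharing strategies can be combined in a small level-wise routine. This is a simplified sketch under the distinct value condition, not the authors' BUS implementation: points are referred to by index, child skylines are accepted without a dominance test (share-results), and each subspace scans the data in an order that was sorted once per dimension (share-sorting). All names and the sample `data` are hypothetical.

```python
from itertools import combinations


def dominates_on(x, y, dims):
    """x dominates y when only the dimensions in dims are considered."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def bus_sketch(points, d):
    """Bottom-up skycube over point indices; assumes distinct values per dimension."""
    # share-sorting: d sorts in total, reused by every subspace.
    order = {a: sorted(range(len(points)), key=lambda i: points[i][a])
             for a in range(d)}
    cube = {}
    for k in range(1, d + 1):                      # level-wise, bottom-up
        for dims in combinations(range(d), k):
            # share-results: members of a child skyline are members here too.
            known = set()
            if k > 1:
                for child in combinations(dims, k - 1):
                    known |= cube[child]
            sky = []
            # With distinct values, any dominator of a point precedes it in
            # the order sorted on dims[0], so testing against sky suffices.
            for i in order[dims[0]]:
                if i in known or not any(
                        dominates_on(points[j], points[i], dims) for j in sky):
                    sky.append(i)
            cube[dims] = set(sky)
    return cube


# Hypothetical data with distinct values on every dimension.
data = [(1, 6, 3), (2, 4, 5), (4, 1, 6), (5, 5, 1), (6, 7, 7)]
cube = bus_sketch(data, 3)
print(sorted(cube[(0, 1)]))     # [0, 1, 2]
print(sorted(cube[(0, 1, 2)]))  # [0, 1, 2, 3]
```

Because the candidate list only ever holds confirmed skyline points, it stays as small as possible, which matches the "keep the candidate list minimal" bullet.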
12
Filtering
  • Effective Dominance Test
  • filter function $f(p)$: sum of p's coordinates
  • no false negatives: $f(p) \le f(q) \Rightarrow$ q does not
    dominate p
  • maintain the candidate list in non-decreasing
    order of filter values (e.g., AVL tree)

[Figure: "Skyline on A and B"; points P1 to P5 plotted on dimensions A (x-axis) and B (y-axis)]
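
The filtering idea can be illustrated with a candidate list ordered by the sum of coordinates. This sketch swaps the AVL tree mentioned on the slide for a sorted Python list with `bisect`, purely to keep the example short; the class name and sample points are hypothetical.

```python
import bisect


def dominates(x, y):
    """x dominates y: no worse everywhere, strictly better somewhere."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


class FilteredCandidates:
    """Candidate list kept in non-decreasing order of the filter value sum(p)."""

    def __init__(self):
        self._keys = []    # filter values, non-decreasing
        self._points = []  # candidates, aligned with _keys

    def add(self, p):
        pos = bisect.bisect_right(self._keys, sum(p))
        self._keys.insert(pos, sum(p))
        self._points.insert(pos, p)

    def is_dominated(self, p):
        # A candidate q with sum(q) >= sum(p) cannot dominate p, so only
        # the prefix with strictly smaller filter values is tested.
        hi = bisect.bisect_left(self._keys, sum(p))
        return any(dominates(q, p) for q in self._points[:hi])


cands = FilteredCandidates()
for point in [(2, 2), (4, 1), (1, 5)]:
    cands.add(point)

print(cands.is_dominated((3, 4)))  # True: (2, 2) dominates it
print(cands.is_dominated((1, 1)))  # False: no candidate has a smaller sum
```

For (1, 1) the prefix is empty, so the answer is returned without a single dominance test.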
13
DC Algorithm
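[Figure: two panels, "Divide Step" and "Merge Step", over points P1 to P5 on dimensions A (x-axis) and B (y-axis); the median mA on A divides the data into S1 and S2, and the median mB on B divides these into S11, S12, S21, S22]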
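
As a rough illustration of the divide and merge steps, the sketch below splits the data at the median of the first dimension, solves both halves recursively, and then discards points of the second half that are dominated by the first half's skyline. The merge here is a plain nested loop rather than the recursive partitioning on mB shown in the figure, and the function names and `houses` data are hypothetical.

```python
def dominates(x, y):
    """x dominates y: no worse everywhere, strictly better somewhere."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def _dc(ordered):
    """Skyline of tuples already sorted lexicographically."""
    if len(ordered) <= 1:
        return list(ordered)
    mid = len(ordered) // 2          # divide step: split at the median
    s1 = _dc(ordered[:mid])          # smaller values on the first dimension
    s2 = _dc(ordered[mid:])
    # Merge step: a point of the second half cannot dominate a point of the
    # first half (it sorts later), so only the second half is filtered.
    return s1 + [p for p in s2 if not any(dominates(q, p) for q in s1)]


def dc_skyline(points):
    """Divide-and-conquer skyline; points must be tuples of equal length."""
    return _dc(sorted(points))


houses = [(3, 4), (1, 5), (4, 1), (5, 5), (2, 2)]
print(dc_skyline(houses))  # [(1, 5), (2, 2), (4, 1)]
```

Sorting lexicographically once up front keeps the one-directional merge valid even when two points tie on the first dimension.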
14
Sharing Opportunities
  • share-partitioning

[Figure: two panels, "skyline on A and B" and "skyline on A, B, and C"; both divide points P1 to P5 into S1 and S2 at the same median mA on dimension A; additional labels: mi, mj]
15
Sharing Opportunities (cont.)
  • share-merging

[Figure: two panels, "skyline on A and B" and "skyline on A, B, and C", with points P1 to P5 divided into S1 and S2 at the median mA on dimension A; label: decompose merge step]
16
TDS Algorithm
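  • Basic Idea
  • compute skylines on a path simultaneously
  • find a minimal set of paths
  • share-parent: use the parent's skyline result as
    the input

[Figure: lattice of subspaces over A, B, C (ABC; AB, AC, BC; A, B, C) with a highlighted path ABC, AB, A; the dataset S is the input for $SKY_{ABC}(S)$, whose result feeds the subspaces below it]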
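
The share-parent idea can be sketched on its own: under the distinct value condition, a subspace skyline is contained in each parent's skyline, so the parent's result is a sufficient input. The code below is only an illustration of that one strategy; it does not compute skylines along a path simultaneously, and picking the parent with the smallest skyline is an assumption made here, not the presentation's path selection. Names and data are hypothetical.

```python
from itertools import combinations


def dominates_on(x, y, dims):
    """x dominates y when only the dimensions in dims are considered."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline_on(points, dims):
    """Quadratic skyline restricted to the subspace dims."""
    return [p for p in points if not any(dominates_on(q, p, dims) for q in points)]


def tds_share_parent(points, d):
    """Top-down skycube; assumes distinct values on every dimension."""
    full = tuple(range(d))
    cube = {full: skyline_on(points, full)}
    for k in range(d - 1, 0, -1):                  # level-wise, top-down
        for dims in combinations(range(d), k):
            parents = [tuple(sorted(dims + (a,)))
                       for a in range(d) if a not in dims]
            # Assumption: use whichever parent has the smallest skyline.
            smallest = min(parents, key=lambda v: len(cube[v]))
            cube[dims] = skyline_on(cube[smallest], dims)
    return cube


data = [(1, 6, 3), (2, 4, 5), (4, 1, 6), (5, 5, 1), (6, 7, 7)]
cube = tds_share_parent(data, 3)
print(cube[(0, 1, 2)])  # [(1, 6, 3), (2, 4, 5), (4, 1, 6), (5, 5, 1)]
print(cube[(0, 1)])     # [(1, 6, 3), (2, 4, 5), (4, 1, 6)]
```

Only the full-space skyline touches the whole dataset; every other subspace works on an already reduced input.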
17
Outline
  • Introduction
  • Skycube Computation Techniques
  • Experiments
  • Summary

18
Experiment Setting
19
Effect of Dimensionality
[Figure: independent data; x-axis: dimensionality (n = 500k)]
20
Effect of Dimensionality (cont.)
[Figure: two panels, correlated and anti-correlated data; x-axis: dimensionality (n = 500k)]
21
Effect of Cardinality
[Figure: anti-correlated data; x-axis: cardinality (×100K), d = 8]
22
Effect of Duplicate Values
[Figure: independent data (d = 8)]
23
Outline
  • Introduction
  • Skycube Computation Techniques
  • Experiments
  • Summary

24
Summary
  • A novel concept: Skycube
  • Skycube computation techniques
  • Bottom-Up Skycube algorithm
  • share-results, share-sorting
  • Top-Down Skycube algorithm
  • share-partition-and-merging, share-parent
  • Future Work
  • I/O based techniques
  • multiple skyline queries

25
Q&A
  • Thank you.

26
Preliminaries
  • Existing Skyline Computation Algorithms
  • nested-loop-based
  • Block-Nested-Loop (BNL) algorithm [BKS, ICDE 01]
  • Sort-Filter-Skyline (SFS) algorithm [CGG, ICDE 03]
  • divide-and-conquer-based
  • Divide-and-Conquer (DC) algorithm [BKS, ICDE 01]
  • index-based
  • Bitmap, Index-Method [TEO, VLDB 01]
  • R-tree Index Based [KRR, VLDB 02; PTF, SIGMOD 03]

27
Preliminaries: BNL and SFS Algorithms
  • BNL algorithm
  • SFS algorithm
  • entropy value (indicator of the dominance power)
  • pre-sort the dataset (e.g., P5, P2, P3, P1, P4)
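
The presorting idea behind SFS can be sketched as follows. This version orders points by the plain sum of coordinates instead of the entropy value mentioned on the slide; that substitution is an assumption made for brevity and works because any monotone score places a dominating point before the points it dominates. Function names and data are hypothetical.

```python
def dominates(x, y):
    """x dominates y: no worse everywhere, strictly better somewhere."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def sfs_sketch(points):
    """Presort by a monotone score, then filter in a single pass."""
    # A dominating point always has a strictly smaller coordinate sum,
    # so it is visited before every point it dominates.
    ordered = sorted(points, key=sum)
    window = []  # holds confirmed skyline points only
    for p in ordered:
        if not any(dominates(s, p) for s in window):
            window.append(p)
    return window


houses = [(3, 4), (1, 5), (4, 1), (5, 5), (2, 2)]
print(sfs_sketch(houses))  # [(2, 2), (4, 1), (1, 5)]
```

Unlike a plain nested loop, a point never has to be evicted from the window once it has been admitted.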

28
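Preliminaries: DC Algorithm
[Figure: "Divide Step" and "Merge Step" over points P1 to P5 on dimensions A (x-axis) and B (y-axis); the median mA on A divides the data into S1 and S2, and the median mB on B divides these into S11, S12, S21, S22]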
29
General Case
  • Issue: $SKY_U(S) \subseteq SKY_V(S)$ does not necessarily
    hold
  • Solution
  • share-results: re-examine $SKY_U(S)$ on $V$

$SKY_B(S) = \{P3, P4, P5\}$, $SKY_{AB}(S) = \{P3\}$
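
A small dataset with duplicate values reproduces the situation on this slide. The (A, B) coordinates below are hypothetical, chosen so that P3, P4, and P5 tie on B; the last line shows one reading of "re-examine": the points of the child skyline are treated as candidates and tested again on the larger subspace.

```python
def dominates_on(x, y, dims):
    """x dominates y when only the dimensions in dims are considered."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


# Hypothetical (A, B) coordinates; P3, P4, P5 share the smallest B value.
S = {"P1": (4, 2), "P2": (5, 3), "P3": (1, 1), "P4": (2, 1), "P5": (3, 1)}


def skyline_names(names, dims):
    """Names of the points in names that no other point in names dominates on dims."""
    names = list(names)
    return sorted(n for n in names
                  if not any(dominates_on(S[m], S[n], dims) for m in names))


sky_b = skyline_names(S, (1,))     # skyline on B only
sky_ab = skyline_names(S, (0, 1))  # skyline on A and B
print(sky_b)   # ['P3', 'P4', 'P5']
print(sky_ab)  # ['P3']

# The containment fails here, so SKY_B(S) is re-tested on {A, B}.
print(skyline_names(sky_b, (0, 1)))  # ['P3']
```

P4 and P5 survive on B alone because a tie is not a strict improvement, but P3 beats both once A is taken into account.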
30
Motivation (cont.)
  • other techniques
  • Index method [VLDB 01]
  • R-tree based index [VLDB 02, SIGMOD 03]
  • Goal
  • Maximizing sharing computation!

pre-computation (e.g., index) is not reusable → repeat pre-computation → Not efficient!