SUBSKY: Efficient Computation of Skylines in Subspaces

Slides: 32
Provided by: XiaoXi
Transcript and Presenter's Notes

1
SUBSKY: Efficient Computation of Skylines in Subspaces
  • Yufei Tao, City University of Hong Kong
  • Xiaokui Xiao, City University of Hong Kong
  • Jian Pei, Simon Fraser University

2
Outline
  • Motivation
  • Existing solutions
  • Our technique
  • Experiments
  • Conclusion

3
Skyline
  • Dominate: p1 dominates p2 if p1 is no worse than
    p2 on every dimension and strictly better on at
    least one.
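
The dominance test can be sketched in a few lines of Python. This is a minimal illustration, assuming smaller values are preferred on every dimension (consistent with the later slides, where good points lie near the origin). The function name `dominates`, the `dims` parameter, and the sample points are hypothetical.

```python
def dominates(p1, p2, dims=None):
    """Return True if p1 dominates p2 on the given dimensions.

    Assumption: smaller is better on every dimension.
    dims holds 0-based dimension indexes; None means all dimensions.
    """
    dims = list(range(len(p1))) if dims is None else list(dims)
    no_worse = all(p1[i] <= p2[i] for i in dims)
    strictly_better = any(p1[i] < p2[i] for i in dims)
    return no_worse and strictly_better


# Hypothetical houses as (price, distance to subway), both scaled to [0, 1].
p1 = (0.2, 0.3)
p2 = (0.4, 0.3)
print(dominates(p1, p2))  # True
print(dominates(p2, p1))  # False
```

Passing `dims` restricts the comparison to a subspace, which is the case the rest of the talk is concerned with.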

4
Motivation
  • The housing relation may contain many other
    attributes:
      • size
      • distances to the nearest supermarket / school /
        subway
      • security / pollution / traffic ratings of the
        neighborhood

5
Motivation (cont.)
  • A user wants to retrieve the skyline only in a
    subspace involving a small number of dimensions:
      • price and size
      • price and distance to the supermarket
      • price and security rating
  • Why small?
      • Skylines become increasingly meaningless as the
        dimensionality grows.

6
Existing Solutions
  • Non-index-based
      • require scanning the dataset at least once
      • e.g., block-nested-loop (BNL)
  • Index-based
      • much lower query time
      • state of the art: BBS [Papadias, Tao, Fu,
        Seeger, ACM TODS 2005]
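
To make the non-index-based category concrete, here is a simplified, in-memory sketch in the spirit of block-nested-loop. It is an assumption-laden illustration: the real algorithm bounds the window size and spills to temporary files, which is omitted here, and smaller values are assumed to be better. The name `bnl_skyline` and the sample data are hypothetical.

```python
def bnl_skyline(points, dims):
    """Skyline of `points` on the 0-based dimensions `dims`.

    Simplified block-nested-loop: one scan, with an unbounded
    in-memory window of current skyline candidates.
    """
    def dominates(a, b):
        return (all(a[i] <= b[i] for i in dims)
                and any(a[i] < b[i] for i in dims))

    window = []
    for p in points:  # every point is read once
        if any(dominates(w, p) for w in window):
            continue  # p is dominated, discard it
        # p survives: evict the candidates it dominates
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


# Hypothetical 3-d points; the query asks for the skyline on dimensions 0 and 1.
pts = [(0.5, 0.5, 0.9), (0.2, 0.8, 0.1), (0.8, 0.2, 0.5),
       (0.6, 0.6, 0.0), (0.2, 0.9, 0.3)]
print(sorted(bnl_skyline(pts, dims=[0, 1])))
```

The loop touches every point regardless of the subspace, which is the full-scan cost that index-based methods try to avoid.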

7
Our Solution SUBSKY
  • Supports any subspace with a single B-tree.
  • Can be implemented in relational databases.

8
Basic SUBSKY
  • Optimized for uniform data.
  • Without loss of generality, assume that each
    dimension has the domain [0, 1].
  • Define the maximal corner as the point which has
    coordinate 1 on all dimensions.
  • We convert a d-dimensional point p to a single
    value f(p)
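
The formula for f(p) did not survive in this transcript. The sketch below assumes f(p) is the L-infinity distance from p to the maximal corner, that is, the largest value of 1 - p[i] over all d dimensions. This definition is an assumption, not text from the slide, and the function name `f` simply mirrors the slide's notation.

```python
def f(p):
    """Map a d-dimensional point in [0, 1]^d to a single value.

    Assumed definition: the L-infinity distance to the maximal
    corner (1, 1, ..., 1), i.e. max over i of (1 - p[i]).
    """
    return max(1.0 - x for x in p)


# A point that is small on at least one dimension gets a large f value.
print(f((0.2, 0.3, 0.25)))
print(f((0.9, 0.95, 0.9)))
```

Under this assumption a single number per point is enough to order the whole dataset, which is what allows one B-tree to serve every subspace.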

9
Basic Property

10
Lemma 1
  • An example
  • d = 3
  • SUB = {1, 2}, psky = (0.05, 0.1, --)
  • f(p)
  • e.g., p = (0.2, 0.3, 0.25)
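
The statement of Lemma 1 is missing from the transcript, so the check below rests on two assumptions: f(p) is the largest 1 - p[i] over all dimensions, and a point p can be discarded when f(p) is strictly smaller than the smallest 1 - psky[i] over the dimensions in SUB. Under those assumptions every coordinate of p exceeds every SUB coordinate of psky, so psky dominates p in the subspace. The helper names are hypothetical, and dimensions are 0-based in the code (SUB = {1, 2} on the slide becomes `[0, 1]`).

```python
def f(p):
    # Assumed definition: L-infinity distance to the maximal corner.
    return max(1.0 - x for x in p)


def prune_threshold(psky, sub):
    # Smallest gap between psky and the maximal corner inside the subspace.
    return min(1.0 - psky[i] for i in sub)


def can_prune(p, psky, sub):
    # True means p is certainly dominated by psky on `sub`.
    return f(p) < prune_threshold(psky, sub)


sub = [0, 1]               # SUB = {1, 2} on the slide
psky = (0.05, 0.1, None)   # third coordinate is not given and not needed
p = (0.2, 0.3, 0.25)
print(can_prune(p, psky, sub))  # True
```

For the slide's numbers the threshold is about 0.9 and f(p) is about 0.8, so p is pruned without comparing it to psky coordinate by coordinate.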

11
Algorithm
  • Find the skyline in SUB = {1, 2}
  • Access the points in descending order of their
    f(p):
    p3, p4, p5, p1, p6, p2, p8, p7
  • Skyline: (empty)

12
Algorithm (cont.)
  • Remaining: p4, p5, p1, p6, p2, p8, p7
  • Skyline: p3
  • Threshold: 0.5

13
Algorithm (cont.)
  • Remaining: p5, p1, p6, p2, p8, p7
  • Skyline: p3, p4
  • Threshold: 0.5

14
Algorithm (cont.)
  • Remaining: p1, p6, p2, p8, p7
  • Skyline: p3, p4, p5
  • Threshold: 0.5

15
Algorithm (cont.)
  • Remaining: p6, p2, p8, p7
  • Skyline: p1, p4, p5
  • Threshold: 0.8

16
Algorithm (cont.)
  • Remaining: p2, p8, p7
  • Skyline: p1, p4, p5
  • Threshold: 0.8
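
The walkthrough on these slides can be sketched as a short loop. This is a minimal in-memory sketch, not the authors' implementation: it assumes f(p) is the largest 1 - p[i] over all dimensions, that the threshold is the largest value of min(1 - s[i]) over SUB among the skyline points found so far, and that the scan stops once f(p) falls strictly below the threshold. A sorted list stands in for the B-tree, and all function names and sample points are hypothetical.

```python
def f(p):
    # Assumed definition: L-infinity distance to the maximal corner.
    return max(1.0 - x for x in p)


def dominates(a, b, sub):
    # Smaller is better; comparison restricted to the subspace.
    return (all(a[i] <= b[i] for i in sub)
            and any(a[i] < b[i] for i in sub))


def basic_subsky(points, sub):
    """Subspace skyline via a descending-f scan with early termination."""
    skyline = []
    threshold = 0.0
    for p in sorted(points, key=f, reverse=True):
        if f(p) < threshold:
            break  # p and every later point is dominated in `sub`
        if any(dominates(s, p, sub) for s in skyline):
            continue
        # p joins the skyline and evicts whatever it dominates
        skyline = [s for s in skyline if not dominates(p, s, sub)]
        skyline.append(p)
        threshold = max(threshold, min(1.0 - p[i] for i in sub))
    return skyline


# Hypothetical 3-d data, queried on dimensions 0 and 1.
pts = [(0.1, 0.2, 0.9), (0.6, 0.7, 0.5), (0.3, 0.1, 0.4), (0.9, 0.9, 0.3)]
print(sorted(basic_subsky(pts, sub=[0, 1])))
```

As on the slides, the threshold only grows: a newly found skyline point that dominates an older one has a threshold at least as large, so the old point can be dropped without lowering the bound.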

17
Why Does it Work?
  • Assume a 15D dataset with 100k points.
  • We want to compute the subspace skyline on two
    dimensions.
  • There is a high chance that a skyline point lies
    very close to the origin in the subspace.
  • For ε = 0.001, the probability is 90%.
  • Such a point prunes the points lying in a
    15-dimensional hypercube with side length 0.999.
  • I.e., 0.999^15 ≈ 98.5% of the dataset!
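
The last figure is easy to reproduce. The snippet below is a small check, assuming points are uniformly and independently distributed on every dimension, so the fraction falling inside a hypercube is its side length raised to the dimensionality. The function name `pruned_fraction` is hypothetical.

```python
def pruned_fraction(side, d):
    """Expected fraction of uniform points inside a d-dimensional
    hypercube with the given side length (unit data space assumed)."""
    return side ** d


print(f"{pruned_fraction(0.999, 15):.1%}")  # 98.5%
```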

18
SUBSKY for Arbitrary Distribution
  • Anchor

19
Assigning a Point to an Anchor

20
Selecting the Anchors

21
Index Structure
  • Use a single B-tree
  • Each object is indexed with a composite key
    (j, f(p))
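
A composite key of this kind orders entries first by j and then by f(p), so all entries sharing the same j form one contiguous run that can be scanned in descending f(p). The sketch below emulates that leaf order with a sorted Python list and the standard `bisect` module. It assumes j is an integer identifying the anchor a point is assigned to; how points are assigned and how f(p) is computed per anchor is not covered by this transcript, so the f values here are made-up inputs. All names are hypothetical.

```python
from bisect import bisect_left


def build_index(entries):
    """entries: iterable of (j, f_value, point_id) tuples.

    Tuples sort lexicographically, which mimics the leaf order
    of a B-tree on the composite key (j, f(p)).
    """
    return sorted(entries)


def scan_anchor_desc(index, j):
    """Entries of anchor j, in descending order of f_value."""
    lo = bisect_left(index, (j,))      # first entry with anchor >= j
    hi = bisect_left(index, (j + 1,))  # first entry with anchor >= j + 1
    return [index[k] for k in range(hi - 1, lo - 1, -1)]


# Made-up entries: (anchor id, f value, point id).
index = build_index([(1, 0.3, "a"), (0, 0.9, "b"), (1, 0.7, "c"), (0, 0.2, "d")])
print(scan_anchor_desc(index, 1))
```

In a relational database the same effect would come from an ordinary B-tree index on two columns, which fits the earlier claim that the method can be implemented in relational systems.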

22
Experiments
  • Real datasets
      • NBA: 13 dimensions, 17k points
      • Household: 6 dimensions, 127k points
      • Color: 9 dimensions, 68k points
  • Synthetic data
      • uniform, clustered
  • Page size: 4k bytes

23
Experiments (cont.)
  • Competitor: Branch-and-Bound Skyline (BBS)
      • state-of-the-art index-based method
      • utilizes an R-tree
      • page size: 4k bytes
  • Queries
      • 100 random subspace queries
      • measure the number of page accesses on the
        B-tree (SUBSKY) and the R-tree (BBS)

24-27
(No Transcript)

28
Cost vs. Cardinality

29
Cost vs. Full Space Dimensionality

30
Conclusion
  • Subspace skyline computation is often more
    important than retrieval in the full space.
  • We developed a fast relational approach for
    processing subspace queries.

31
  • Thank you!