Handling Worst-Case in Skyline

1
Handling Worst-Case in Skyline
- Romil Jain
2
Introduction (Hotel Example)
Query: Find hotels that are best on stars,
distance, and price.
3
Formal Definition
In simple words, a skyline is the set of all
non-dominated tuples.
$T$: a set of $n$ tuples over $k$ dimensions (or columns); for each tuple $t$, $t_i$ is the value of $t$ on dimension $i$.
For $t, t' \in T$, $t$ dominates $t'$ iff $\forall i \in \{1, \dots, k\}: t_i \ge t'_i \;\wedge\; \exists i \in \{1, \dots, k\}: t_i > t'_i$.
The skyline of $T$ is $\{\, t \in T \mid \nexists\, t' \in T : t' \text{ dominates } t \,\}$.
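The definition translates directly into a brute-force check. The sketch below is a minimal Python illustration, assuming the "higher is better" reading of dominance; `dominates` and `naive_skyline` are hypothetical helper names, and the hotel rows are invented for illustration (they are not the data from the hotel slide). Lower price and distance are better, so those columns are negated before the dominance test.

```python
def dominates(t, u):
    """True if t dominates u: t >= u on every dimension and t > u on at least one."""
    return all(a >= b for a, b in zip(t, u)) and any(a > b for a, b in zip(t, u))


def naive_skyline(T):
    """Brute-force skyline: keep each tuple that no other tuple dominates.

    Uses O(k * n^2) value comparisons for n tuples with k dimensions.
    """
    return [t for t in T if not any(dominates(u, t) for u in T)]


# Hypothetical hotels: (name, price in Rs., distance, stars). Not the slide's data.
hotels = [
    ("A", 2000, 20, 4),
    ("B", 3000, 25, 3),
    ("C", 1000, 30, 5),
    ("D", 5000, 15, 5),
]
# Negate price and distance so that "higher is better" holds on every dimension.
vectors = {(-price, -dist, stars): name for name, price, dist, stars in hotels}
print(sorted(vectors[v] for v in naive_skyline(list(vectors))))  # ['A', 'C', 'D']
```

Hotel B is dropped because A is cheaper, closer, and has more stars; the quadratic number of comparisons in this version is what the algorithms in the rest of the talk try to avoid.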
4
Previous Results [GSG07]
Other prominent algorithms: FLET, LDC, SDC,
BBS, NN
5
Research Motivation
  • Create a skyline algorithm that:
    • improves worst-case complexity
    • is external (i.e., has low I/O cost)

Note: vector, point, and tuple are used interchangeably.
6
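SFS [CGGL03]
[Figure: point set P in the x-y plane]
  • Basic approach:
    • Sort the input with some topographical function F (e.g., volume).
    • Compare each point only with the window of skyline points found so far, which have the highest F; for a monotone F such as volume, a point can never be dominated by a point with a lower F.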
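A minimal in-memory sketch of the SFS idea, assuming positive values and volume (the product of a point's values) as F; `sfs_skyline` is a hypothetical name, and the window here is an unbounded Python list rather than the paged, multi-pass window of the actual algorithm.

```python
import math


def dominates(t, u):
    """True if t >= u on every dimension and t > u on at least one."""
    return all(a >= b for a, b in zip(t, u)) and any(a > b for a, b in zip(t, u))


def sfs_skyline(points):
    """Sort-filter skyline with F = volume, assuming all values are positive."""
    # Sorting by descending volume places every dominator before the points it dominates.
    ordered = sorted(points, key=math.prod, reverse=True)
    window = []  # skyline points found so far (no spilling to disk in this sketch)
    for p in ordered:
        if not any(dominates(w, p) for w in window):
            window.append(p)  # no later point can dominate p, so p is final
    return window
```

Because volume is strictly monotone for positive values, each point's dominators have a larger F and are processed first, so a point that enters the window never has to be removed again.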

7
Z-Order [LZLL07]
  • Z-address: interleave the bits of all values, e.g., <1, 6> → 010110
  • Index-based algorithm:
    • Assign a Z-address to every vector and store the vectors in a B-tree (SRC).
    • Maintain another B-tree for storing the skyline points (SKY).
    • Compare points from SRC with points in SKY, updating SKY.
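The bit interleaving can be sketched as follows, assuming every coordinate fits in a fixed number of bits (`bits`, a hypothetical parameter) and that the first dimension contributes the leading bit of each group; with 3 bits it reproduces the <1, 6> → 010110 example.

```python
def z_address(point, bits):
    """Interleave the bits of all coordinates, most significant bit first."""
    out = []
    for b in range(bits - 1, -1, -1):      # from the highest bit position down
        for v in point:                    # first dimension first within each position
            out.append(str((v >> b) & 1))
    return "".join(out)


print(z_address((1, 6), bits=3))  # 010110
```

Interleaving is monotone with respect to dominance: if t dominates u, the first interleaved bit where they differ must be 1 in t, so t's Z-address is the larger one.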

8
Key Lessons
  • Scan-based algorithms use less I/O.
  • Sorting by a topographical function F and comparing the input against the vectors with the highest F can eliminate a large number of vectors.
  • Pairwise comparisons can be reduced by partitioning the input into regions and comparing regions first.

9
A Framework
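[Figure: points in the x-y plane partitioned into cells]
  • Partition the points into cells according to some strategy.
  • Compare cells mutually to eliminate dominated cells.
  • Compare points only from dependent cells.

The key idea is that the number of cells is much lower than the number of points, so comparing cells first is much cheaper than comparing all pairs of points.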
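The three framework steps can be sketched with a simple grid, assuming values normalized to 1..n and `s` equal-width slices per dimension; the grid is only one possible partitioning strategy, and `cell_of`, `cell_skyline`, and `s` are hypothetical names, not the author's. A cell whose index is strictly lower on every dimension than some non-empty cell is eliminated, and a point is compared only with cells whose index is at least as high on every dimension, since any other cell is strictly worse on some dimension.

```python
from collections import defaultdict


def dominates(t, u):
    """True if t >= u on every dimension and t > u on at least one."""
    return all(a >= b for a, b in zip(t, u)) and any(a > b for a, b in zip(t, u))


def cell_of(p, n, s):
    """Grid cell of p when each dimension (values 1..n) is cut into s equal ranges."""
    return tuple((v - 1) * s // n for v in p)


def cell_skyline(points, n, s=2):
    # 1. Partition the points into grid cells.
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p, n, s)].append(p)
    # 2. Eliminate a cell if a non-empty cell is strictly higher on every dimension:
    #    every point of that cell dominates every point of this one.
    alive = [c for c in cells
             if not any(all(a > b for a, b in zip(d, c)) for d in cells)]
    # 3. Compare points only with dependent cells (index >= c on every dimension).
    result = []
    for c in alive:
        dependent = [d for d in cells if all(a >= b for a, b in zip(d, c))]
        for p in cells[c]:
            if not any(dominates(q, p) for d in dependent for q in cells[d]):
                result.append(p)
    return result
```

With s slices per dimension there are at most s^k cells, so the cell-level work in step 2 depends on the number of non-empty cells rather than on n.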
10
Contributions
  • Worked on five different heuristics:
    • Sorts
    • Pivot
    • Lattice
    • Cubes
    • Spider
  • For simplicity, assume that points are:
    • distinct on any dimension
    • uniformly distributed on any dimension
    • normalized (i.e., values ranging from 1 to n)
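Under these assumptions each dimension is simply a permutation of 1..n. A minimal generator, assuming the dimensions are also independent of each other (the uniform, independent "UI" setting); `uniform_independent` is a hypothetical helper name:

```python
import random


def uniform_independent(n, k, seed=None):
    """n points in k dimensions: distinct, uniform, normalized values on every dimension."""
    rng = random.Random(seed)
    # Each column is an independent random permutation of 1..n.
    columns = [rng.sample(range(1, n + 1), n) for _ in range(k)]
    return list(zip(*columns))
```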

11
Lattice
  • Choose Best-Low as a pivot.
  • Best-Low is the point whose lowest value across all dimensions is the highest among the lowest values of all points.
  • Best-Low is very effective if it's close to
    • <n, n, ..., n>
    • <n - n/k, n - n/k, ..., n - n/k>
    • <n/k, n/k, ..., n/k>
  • Requires a dependency table.
  • Under UI (uniform, independent data), Best-Low is estimated to be close to <b, b, ..., b>, where $b \approx n\left(1 - 1/n^{1/k}\right)$.
  • With strict assumptions, Lattice is $O(n^{1.58})$ in the worst case.
  • tbdl ? n/k
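A small sketch of Best-Low as a pivot, assuming "higher is better" and distinct values per dimension; `best_low`, `estimated_best_low_value`, and `prune_with_pivot` are hypothetical names, and the dependency table and the lattice itself are not modeled. The estimate comes from asking for the largest b such that about one of the n UI points has all k values at least b, i.e., $n(1 - b/n)^k \approx 1$.

```python
import random


def best_low(points):
    """Best-Low: the point whose smallest coordinate is the largest of all points."""
    return max(points, key=min)


def estimated_best_low_value(n, k):
    """Estimate b of <b, ..., b> near Best-Low under UI: n(1 - 1/n^(1/k))."""
    return n * (1 - n ** (-1 / k))


def prune_with_pivot(points, pivot):
    """Drop every point the pivot dominates (assumes distinct values per dimension)."""
    return [p for p in points
            if p == pivot or not all(a >= b for a, b in zip(pivot, p))]


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 10_000, 3
    # UI data: every dimension is an independent random permutation of 1..n.
    pts = list(zip(*[rng.sample(range(1, n + 1), n) for _ in range(k)]))
    pivot = best_low(pts)
    print("observed min(Best-Low):", min(pivot))
    print("estimated b:", round(estimated_best_low_value(n, k)))
    print(len(prune_with_pivot(pts, pivot)), "of", n, "points survive the pivot")
```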

12
Spider
  • Very similar to DDC
  • Apply a modified form of SFS
  • Split the points into cells
  • Compare cells to eliminate dominated cells.
  • Solve each cell individually.
  • Compare points from comparable cells by
    reapplying Spider over reduced dimensions.
  • Split factor f = 2
  • With strict assumptions, Spider is $O(n^{1.58})$ in the worst case.
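Spider's exact recursion is not spelled out on the slide, so the sketch below instead shows the generic divide-and-conquer pattern it is compared to (DDC), under the slide's assumption of distinct values per dimension: split on one dimension with f = 2, solve both halves, then filter the lower half's skyline using only the remaining dimensions, recursing on fewer dimensions inside the merge. The names are hypothetical, and this is not the author's Spider.

```python
def filter_dominated(A, B, dims):
    """Points of B that no point of A weakly dominates (>=) on `dims`.

    Precondition: every point of A already beats every point of B on all
    dimensions not listed in `dims`.
    """
    if not A or not B:
        return list(B)
    if len(dims) == 1:  # one dimension left: compare against A's maximum
        d = dims[0]
        best = max(a[d] for a in A)
        return [b for b in B if b[d] > best]
    d = dims[0]
    values = sorted(p[d] for p in A + B)
    pivot = values[len(values) // 2]  # median split on dimension d
    A_hi = [a for a in A if a[d] >= pivot]
    A_lo = [a for a in A if a[d] < pivot]
    B_hi = [b for b in B if b[d] >= pivot]
    B_lo = [b for b in B if b[d] < pivot]
    if not A_lo and not B_lo:  # only possible with duplicate values: fall back to pairwise
        return [b for b in B if not any(all(a[i] >= b[i] for i in dims) for a in A)]
    # A_lo is below every B_hi point on d, so it cannot dominate B_hi.
    B_hi = filter_dominated(A_hi, B_hi, dims)
    B_lo = filter_dominated(A_lo, B_lo, dims)
    # A_hi beats every B_lo point on d, so d can be dropped: recurse on fewer dimensions.
    B_lo = filter_dominated(A_hi, B_lo, dims[1:])
    return B_hi + B_lo


def dc_skyline(points):
    """Divide-and-conquer skyline (higher is better), distinct values per dimension."""
    if len(points) <= 1:
        return list(points)
    k = len(points[0])
    if k == 1:
        return [max(points)]
    ordered = sorted(points, key=lambda p: p[0], reverse=True)
    half = len(ordered) // 2
    high, low = ordered[:half], ordered[half:]  # split factor 2 on dimension 0
    sky_high, sky_low = dc_skyline(high), dc_skyline(low)
    # Every high point beats every low point on dimension 0, so only the
    # remaining k - 1 dimensions are compared in the merge.
    return sky_high + filter_dominated(sky_high, sky_low, list(range(1, k)))
```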

13
Results (1)
14
Results (2)
15
Results (3)
16
Results (4)
17
Conclusions
  • Created an algorithm that attempts to incorporate the best features of other algorithms.
  • An EF (elimination-filter) window, similar to LESS, is used to solve the best and average cases efficiently.
  • A divide-and-conquer technique, similar to DDC, is used to solve the worst case efficiently.
  • Partitioning, similar to Z-order, is used to reduce pairwise comparisons.
  • A scan-based approach is used, which is conducive to externalization.

18
Future Work
  • Determine the exact cause of Spider's failure with higher numbers of dimensions and fix it if possible.
  • Conduct an experimental analysis of the Lattice algorithm.
  • Develop a more reliable theoretical analysis of Lattice and Spider.

19
References (1)
  • [BCL90] J.L. Bentley, K.L. Clarkson and D.B. Levine. Fast Linear Expected-time Algorithms for Computing Maxima and Convex Hulls. In Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 179-187, 1990.
  • [BKS01] Stephan Borzsonyi, Donald Kossmann and Konrad Stocker. The Skyline Operator. In Proceedings of the 17th International Conference on Data Engineering, pages 421-430, 2001.
  • [BKST78] J.L. Bentley, H.T. Kung, M. Schkolnick and C.D. Thompson. On the Average Number of Maxima in a Set of Vectors and Applications. In Journal of the Association for Computing Machinery (ACM), 25(4), pages 536-543, 1978.
  • [Buc89] C. Buchta. On the Average Number of Maxima in a Set of Vectors. In Information Processing Letters, 33, pages 63-65, 1989.
  • [CGGL03] Jan Chomicki, Parke Godfrey, Jarek Gryz and Dongming Liang. Skyline with Presorting. In Proceedings of the 19th International Conference on Data Engineering (ICDE), pages 717-719. Bangalore, India, 2003.
  • [CGGL05] Jan Chomicki, Parke Godfrey, Jarek Gryz and Dongming Liang. Skyline with Presorting: Theory and Optimization. In Proceedings of the Intelligent Information Systems Conference (IIS): New Trends in Intelligent Information Processing and Web Mining, pages 593-602. Gdansk, Poland, 2005.
  • [God04] Parke Godfrey. Skyline Cardinality for Relational Processing. In Proceedings of the 3rd International Symposium on Foundations of Information and Knowledge Systems, pages 78-97. Springer, Wilhelminenberg Castle, Austria, 2004.

Continued
20
References (2)
  • [KLP75] H.T. Kung, F. Luccio and F.P. Preparata. On Finding the Maxima of a Set of Vectors. In Journal of the Association for Computing Machinery (ACM), 22(4), pages 469-476, 1975.
  • [KRR02] D. Kossmann, F. Ramsak and S. Rost. Shooting Stars in the Sky: An Online Algorithm for Skyline Queries. In Very Large Data Bases Conference (VLDB), pages 275-286, 2002.
  • [LZLL07] Ken C. K. Lee, Baihua Zheng, Huajing Li and Wang-Chien Lee. Approaching the Skyline in Z Order. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB), pages 279-290. Vienna, Austria, 2007.
  • [PTFS05] D. Papadias, Y. Tao, G. Fu and B. Seeger. Progressive Skyline Computation in Database Systems. In Association for Computing Machinery (ACM) TODS, 30(1), pages 41-82, 2005.
  • [TC02] Riccardo Torlone and Paolo Ciaccia. Which Are My Preferred Items? In Workshop on Recommendation and Personalization in eCommerce, pages 1-9. Malaga, Spain, 2002.