CSIS 7101: Spatial Data (Part 1) The R*-tree: An Efficient and Robust Access Method for Points and Rectangles

1
CSIS 7101 Spatial Data (Part 1): The R*-tree: An Efficient and Robust Access Method for Points and Rectangles
  • Rollo Chan
  • Chu Chung Man
  • Mak Wai Yip
  • Vivian Lee
  • Eric Lo
  • Sindy Shou
  • Hugh Wang

2
Spatial Access Method (SAM)
  • Handle spatial data efficiently
    • Query
    • Build index
  • Retrieve data items from a database system quickly
  • Dynamic → supports updates
  • Why not use a B-tree?
    • A B-tree indexes 1-dimensional keys
    • A SAM is designed for multi-dimensional points
    • E.g. 2D for maps

3
R-tree and R*-tree
  • R-tree [Guttman84]
  • R*-tree [Beckmann90]
  • Height-balanced tree (similar to a B-tree)
  • Leaf nodes contain entries of the form <I, tuple-identifier>
  • I is the Minimum Bounding Rectangle (MBR) of a spatial object
  • Tuple-identifier → id used to retrieve the spatial object in the database (name, address, etc.)
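To make the leaf-entry format concrete, here is a minimal Python sketch that computes the MBR of an object given as a list of (x, y) vertices and pairs it with a tuple identifier. The names `Rect`, `LeafEntry`, `mbr_of_points` and the integer `tuple_id` are illustrative assumptions, not part of the original slides.

```python
# Illustrative sketch; all class and function names are hypothetical.
from dataclasses import dataclass
from typing import Iterable, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle [xmin, xmax] x [ymin, ymax]."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


@dataclass(frozen=True)
class LeafEntry:
    """Leaf entry <I, tuple-identifier> (hypothetical representation)."""
    mbr: Rect
    tuple_id: int  # hypothetical key used to fetch the object's record


def mbr_of_points(points: Iterable[Point]) -> Rect:
    """Minimum bounding rectangle of a non-empty set of 2D vertices."""
    pts = list(points)
    if not pts:
        raise ValueError("the MBR of an empty object is undefined")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return Rect(min(xs), min(ys), max(xs), max(ys))


# A triangle whose descriptive record has the (hypothetical) id 7
triangle = [(1.0, 2.0), (4.0, 3.0), (2.0, 6.0)]
entry = LeafEntry(mbr_of_points(triangle), tuple_id=7)
print(entry.mbr)  # Rect(xmin=1.0, ymin=2.0, xmax=4.0, ymax=6.0)
```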

4
The Spatial Data
  • Figure: Minimum Bounding Box

5
R-tree and R*-tree properties
  • Leaf entry: <I, tuple-identifier>
  • Non-leaf entry: <I, child-pointer>
  • I covers all rectangles in the entries of the child node
  • Parameters
    • M: maximum number of entries per node
    • m: minimum number of entries per node
    • m ≤ M/2
  • The root has at least two children (unless it is a leaf)
  • All leaves are on the same level
  • 1 node → 1 disk page (minimizes the number of I/Os)
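The properties above can be checked mechanically. The sketch below is a hypothetical validator, assuming rectangles stored as (xmin, ymin, xmax, ymax) tuples and simple in-memory `Node`/`Entry` classes (a real implementation would map each node to a disk page). It only tests that each directory rectangle covers its child's rectangles, not that it is the minimal one.

```python
# Illustrative sketch; class and function names are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None  # set in non-leaf entries
    tuple_id: Optional[int] = None  # set in leaf entries


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)


def covers(outer: Rect, inner: Rect) -> bool:
    """True if `outer` contains `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def check_rtree(root: Node, m: int, M: int) -> bool:
    """Return True, or raise ValueError on the first violated property."""
    if m < 1 or 2 * m > M:
        raise ValueError("parameters must satisfy 1 <= m <= M/2")
    leaf_levels: Set[int] = set()

    def visit(node: Node, level: int, is_root: bool) -> None:
        n = len(node.entries)
        if n > M:
            raise ValueError("node holds more than M entries")
        if is_root and not node.leaf and n < 2:
            raise ValueError("a non-leaf root needs at least two children")
        if not is_root and n < m:
            raise ValueError("a non-root node holds fewer than m entries")
        if node.leaf:
            leaf_levels.add(level)
            return
        for e in node.entries:
            if e.child is None:
                raise ValueError("directory entry without a child pointer")
            if not all(covers(e.rect, c.rect) for c in e.child.entries):
                raise ValueError("directory rectangle does not cover its child")
            visit(e.child, level + 1, False)

    visit(root, 0, True)
    if len(leaf_levels) > 1:
        raise ValueError("leaves are not all on the same level")
    return True


leaf1 = Node(True, [Entry((0, 0, 1, 1), tuple_id=1), Entry((1, 1, 2, 2), tuple_id=2)])
leaf2 = Node(True, [Entry((5, 5, 6, 6), tuple_id=3), Entry((6, 5, 7, 7), tuple_id=4)])
root = Node(False, [Entry((0, 0, 2, 2), child=leaf1), Entry((5, 5, 7, 7), child=leaf2)])
print(check_rtree(root, m=2, M=4))  # True
```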

6
Outline
  • Introduction
  • Motivation
  • R-tree and R*-tree structure
  • Searching of R-tree
  • Construction of R*-tree
  • Conclusions
  • References

7
Searching
  • May search more than one sub-tree (why?)
  • Try to search for a rectangle S
  • Search(S)
  • Start from the root
  • Find all index records whose rectangles overlap S
  • If the node is not a leaf, check each entry for overlap with S; if it overlaps → Search(subtree)
  • Else the node is a leaf: check all entries in that leaf and report those that overlap S
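A recursive sketch of Search(S), assuming rectangles as (xmin, ymin, xmax, ymax) tuples and hypothetical in-memory `Node`/`Entry` classes. The two directory rectangles in the example overlap, so a query inside the overlapping region must descend into both subtrees, which is the answer to the "why?" above.

```python
# Illustrative sketch; class and function names are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    leaf: bool
    entries: List["Entry"] = field(default_factory=list)


@dataclass
class Entry:
    rect: Rect
    child: Optional[Node] = None     # non-leaf entries
    tuple_id: Optional[int] = None   # leaf entries


def overlaps(a: Rect, b: Rect) -> bool:
    """Closed rectangles overlap unless they are separated along some axis."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node: Node, s: Rect) -> List[int]:
    """Return the tuple ids of all leaf entries whose rectangle overlaps s."""
    if node.leaf:
        return [e.tuple_id for e in node.entries if overlaps(e.rect, s)]
    result: List[int] = []
    for e in node.entries:
        if overlaps(e.rect, s):  # several subtrees may qualify
            result.extend(search(e.child, s))
    return result


leaf1 = Node(True, [Entry((0, 0, 2, 2), tuple_id=1), Entry((3, 3, 4, 4), tuple_id=2)])
leaf2 = Node(True, [Entry((3, 1, 5, 2), tuple_id=3), Entry((6, 6, 7, 7), tuple_id=4)])
# The two directory rectangles overlap in the region [3, 4] x [1, 4]
root = Node(False, [Entry((0, 0, 4, 4), child=leaf1), Entry((3, 1, 7, 7), child=leaf2)])
print(search(root, (1.5, 1.5, 3.5, 3.5)))  # [1, 2, 3]
```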

8
Searching examples

9
Spatial Data
  • Introduction
  • Motivation
  • R-tree and R*-tree structure
  • Searching of R-tree
  • Construction of R*-tree
  • Conclusions
  • References

10
R*-tree
  • Optimization criteria
  • Minimize the area covered by a directory rectangle
  • Minimize overlap between directory rectangles
    • Less overlap means fewer paths to be traversed during a search
  • Minimize the margin of a directory rectangle
    • Square-like (quadratic) rectangles pack more easily and create less overlap for the same area
    • Allows better, more structured clustering
  • Optimize storage utilization
    • Nodes in the tree should be filled as much as possible
  • It is sometimes impossible to optimize all of the above criteria at the same time!
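These criteria are stated in terms of a few rectangle measures. The helpers below are an illustrative sketch (function names are assumptions; rectangles are (xmin, ymin, xmax, ymax) tuples, and the margin is taken as the perimeter). The example shows why minimizing margin favours square-like rectangles: a 4x1 strip and a 2x2 square cover the same area, but the square has the smaller margin.

```python
# Illustrative sketch; function names are hypothetical.
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    """Area of r; degenerate or inverted extents count as 0."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def margin(r: Rect) -> float:
    """Sum of edge lengths; in 2D this is the perimeter."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def overlap(a: Rect, b: Rect) -> float:
    """Area of the intersection, 0 if the rectangles are disjoint."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(inter)


def area_enlargement(r: Rect, new: Rect) -> float:
    """Extra area r needs in order to also enclose `new`."""
    return area(union(r, new)) - area(r)


long_thin = (0.0, 0.0, 4.0, 1.0)
square = (0.0, 0.0, 2.0, 2.0)
print(area(long_thin), area(square))      # 4.0 4.0
print(margin(long_thin), margin(square))  # 10.0 8.0
print(overlap(long_thin, square))         # 2.0
```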

11
R*-tree Insertion
  • To insert a new entry, you need to choose the leaf in which to insert it
  • ChooseSubtree: select a leaf in which to place a new index entry E
  • Start from the root
  • If a non-leaf node whose children are leaves, choose the entry using the following criteria (each one breaks ties of the previous):
    • 1) Least overlap enlargement
    • 2) Least area enlargement
    • 3) Smallest area
  • If a non-leaf node whose children are not leaves, use 2) and 3)
  • Invoke ChooseSubtree recursively on the chosen child
  • If a leaf, return this node as the one to insert into
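A minimal sketch of ChooseSubtree in Python, assuming tuple rectangles and hypothetical in-memory `Node`/`Entry` classes. It always descends to a leaf, so the level parameter needed when reinserting directory entries is omitted. Python compares the cost tuples lexicographically, so each later criterion only breaks ties of the earlier ones. The overlap of an entry is measured against its siblings in the same node.

```python
# Illustrative sketch; class and function names are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    leaf: bool
    entries: List["Entry"] = field(default_factory=list)


@dataclass
class Entry:
    rect: Rect
    child: Optional[Node] = None


def area(r: Rect) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def overlap(a: Rect, b: Rect) -> float:
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def choose_subtree(root: Node, new: Rect) -> Node:
    """Descend from the root to the leaf that should receive `new`."""
    node = root
    while not node.leaf:
        entries = node.entries
        children_are_leaves = entries[0].child.leaf

        def cost(e: Entry) -> Tuple[float, ...]:
            grown = union(e.rect, new)
            area_enl = area(grown) - area(e.rect)
            if children_are_leaves:
                others = [o.rect for o in entries if o is not e]
                overlap_enl = (sum(overlap(grown, o) for o in others)
                               - sum(overlap(e.rect, o) for o in others))
                return (overlap_enl, area_enl, area(e.rect))  # criteria 1, 2, 3
            return (area_enl, area(e.rect))                   # criteria 2, 3

        node = min(entries, key=cost).child
    return node


leaf_a = Node(True, [Entry((0.0, 0.0, 4.0, 4.0))])
leaf_b = Node(True, [Entry((5.0, 0.0, 9.0, 4.0))])
root = Node(False, [Entry((0.0, 0.0, 4.0, 4.0), leaf_a),
                    Entry((5.0, 0.0, 9.0, 4.0), leaf_b)])
print(choose_subtree(root, (6.0, 1.0, 7.0, 2.0)) is leaf_b)  # True
```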

12
Splitting Node
  • What if a new entry E is to be added to a node N that is already full?
  • Split the full node?
  • Reinsert?
  • How to split?
  • Determine the axis
  • Distribute the entries into 2 groups along that axis
  • The distribution may not be even!

13
1. Determine the split axis (M + 1 entries)
  • For each axis (i.e. the x and y axes)
  • Sort the entries by their lower value, then (separately) by their upper value
  • E.g. x-axis sorted by lower value: generate M - 2m + 2 = 3 distributions (M = 3, m = 1)
  • k-th distribution: the first (m - 1) + k entries form group 1, the rest form group 2
  • E.g. 2nd distribution: (1 - 1) + 2 = 2 → {E1, E2} | {E3, E4}
  • 3rd distribution: (1 - 1) + 3 = 3 → {E1, E2, E3} | {E4}
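The rule for the k-th distribution can be checked in a few lines. This sketch (the function name `distributions` is an assumption) reproduces the slide's example with M = 3, m = 1 and four entries E1 to E4 already sorted along one axis.

```python
# Illustrative sketch; the function name is hypothetical.
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def distributions(sorted_entries: Sequence[T], m: int, M: int) -> List[Tuple[List[T], List[T]]]:
    """Return the M - 2m + 2 distributions of M + 1 sorted entries.

    The k-th distribution (k = 1 .. M - 2m + 2) puts the first (m - 1) + k
    entries in group 1 and the remaining entries in group 2, so both groups
    always hold at least m entries."""
    if len(sorted_entries) != M + 1:
        raise ValueError("a split operates on exactly M + 1 entries")
    result = []
    for k in range(1, M - 2 * m + 3):
        cut = (m - 1) + k
        result.append((list(sorted_entries[:cut]), list(sorted_entries[cut:])))
    return result


for group1, group2 in distributions(["E1", "E2", "E3", "E4"], m=1, M=3):
    print(group1, group2)
# ['E1'] ['E2', 'E3', 'E4']
# ['E1', 'E2'] ['E3', 'E4']
# ['E1', 'E2', 'E3'] ['E4']
```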

14
1. Determining the split axis (cont.)
  • Compute S = sum of the margin-values of all (1, 2, ..., M - 2m + 2) distributions, for both sorts
  • Margin-value = margin(bb(group 1)) + margin(bb(group 2)), where the margin of a rectangle is its perimeter
  • Choose the axis with the lower S
  • E.g. the S of the 6 x-axis distributions (3 sorted by lower value, 3 sorted by upper value) < that of the y-axis
  • Return the x-axis as the split axis
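The axis choice can be sketched as follows, assuming tuple rectangles, two sorts per axis (by lower value, then by upper value) and the perimeter as margin. All function names are hypothetical. In the example the four rectangles are spread out along x, so the x-axis distributions have the smaller margin sum S.

```python
# Illustrative sketch; function names are hypothetical.
from typing import Iterator, List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(rects: Sequence[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def margin(r: Rect) -> float:
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def distributions(entries: List[Rect], m: int, M: int) -> Iterator[Tuple[List[Rect], List[Rect]]]:
    """Yield the M - 2m + 2 (group 1, group 2) splits of M + 1 sorted entries."""
    for k in range(1, M - 2 * m + 3):
        cut = (m - 1) + k
        yield entries[:cut], entries[cut:]


def choose_split_axis(rects: List[Rect], m: int, M: int) -> int:
    """Return 0 (x-axis) or 1 (y-axis): the axis with the smaller margin sum S."""
    best_axis, best_s = 0, float("inf")
    for axis in (0, 1):
        lo, hi = axis, axis + 2  # tuple positions of the lower/upper value
        s = 0.0
        # Sort by lower value, then (separately) by upper value
        for key in (lambda r: (r[lo], r[hi]), lambda r: (r[hi], r[lo])):
            ordered = sorted(rects, key=key)
            for g1, g2 in distributions(ordered, m, M):
                s += margin(mbr(g1)) + margin(mbr(g2))
        if s < best_s:
            best_axis, best_s = axis, s
    return best_axis


rects = [(0, 0, 1, 1), (2, 0.5, 3, 1.5), (4, 0, 5, 1), (6, 0.5, 7, 1.5)]
print(choose_split_axis(rects, m=1, M=3))  # 0, i.e. split along x
```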

15
2. Distribute entries along axis
  • How to split?
  • Determine the axis
  • Distribute the entries into 2 groups along that axis
  • The distribution may not be even!
  • Along that axis, choose the distribution (out of the 6 computed for it) with the minimum overlap-value; resolve ties by the minimum area-value
  • Overlap-value = area(bb(group 1) ∩ bb(group 2))
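Continuing the same hypothetical setup, the sketch below picks the distribution along the chosen axis with the minimum overlap-value and breaks ties by the smaller total area (area-value). For the example rectangles all x-axis distributions have zero overlap, and the tie on area is won by a 1/3 split. This illustrates that the chosen distribution need not be even.

```python
# Illustrative sketch; function names are hypothetical.
from typing import Iterator, List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(rects: Sequence[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r: Rect) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def overlap(a: Rect, b: Rect) -> float:
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def distributions(entries: List[Rect], m: int, M: int) -> Iterator[Tuple[List[Rect], List[Rect]]]:
    for k in range(1, M - 2 * m + 3):
        cut = (m - 1) + k
        yield entries[:cut], entries[cut:]


def choose_split_index(rects: List[Rect], axis: int, m: int,
                       M: int) -> Tuple[List[Rect], List[Rect]]:
    """Along `axis`, return the distribution with minimum overlap-value;
    ties are resolved by minimum area-value (sum of both group areas)."""
    lo, hi = axis, axis + 2
    best, best_cost = None, None
    for key in (lambda r: (r[lo], r[hi]), lambda r: (r[hi], r[lo])):
        ordered = sorted(rects, key=key)
        for g1, g2 in distributions(ordered, m, M):
            b1, b2 = mbr(g1), mbr(g2)
            cost = (overlap(b1, b2), area(b1) + area(b2))
            if best_cost is None or cost < best_cost:
                best, best_cost = (g1, g2), cost
    return best


rects = [(0, 0, 1, 1), (2, 0.5, 3, 1.5), (4, 0, 5, 1), (6, 0.5, 7, 1.5)]
g1, g2 = choose_split_index(rects, axis=0, m=1, M=3)
print(g1)       # [(0, 0, 1, 1)]
print(len(g2))  # 3
```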

16
Who calls Split? R*-tree Insertion
  • Algorithm Insert: add a new entry E at the specified level
  • Begin
    1. Find appropriate node: invoke ChooseSubtree to find the node N in which to place the new entry E.
    2. Check for space in the node: if N has fewer than M entries, insert E. Else
    3. Split or reinsert: invoke OverflowTreatment.
    4. Propagate changes upward: if a split was performed, propagate it upward; if the root node was split, create a new root.
    5. Adjust covering rectangles: adjust all rectangles on the insertion path to be minimum bounding rectangles.
  • End
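Step 5 can be sketched as a bottom-up pass over the insertion path. This illustrative fragment assumes hypothetical in-memory `Node`/`Entry` classes and tuple rectangles. It also assumes no split occurred, so step 4 (propagating a split) is not shown.

```python
# Illustrative sketch; class and function names are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    leaf: bool
    entries: List["Entry"] = field(default_factory=list)


@dataclass
class Entry:
    rect: Rect
    child: Optional[Node] = None


def mbr(rects: Sequence[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def adjust_covering_rectangles(path: List[Entry]) -> None:
    """`path` lists the directory entries followed from the root down to the
    entry pointing at the leaf that received E. Working bottom-up, each
    entry's rectangle becomes the MBR of its child node's entries."""
    for e in reversed(path):
        e.rect = mbr([c.rect for c in e.child.entries])


# Leaf that has just received the entry (5, 5, 6, 6)
leaf = Node(True, [Entry((0, 0, 1, 1)), Entry((5, 5, 6, 6))])
b = Entry((0, 0, 1, 1), child=leaf)  # stale: predates the insertion
sibling = Entry((2, 0, 3, 1), child=Node(True, [Entry((2, 0, 3, 1))]))
a = Entry((0, 0, 3, 1), child=Node(False, [b, sibling]))  # root entry, also stale
adjust_covering_rectangles([a, b])
print(b.rect, a.rect)  # (0, 0, 6, 6) (0, 0, 6, 6)
```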
17
R*-tree Insertion (cont.)
  • Algorithm OverflowTreatment: determine whether to split the current node or try reinsertion.
  • Begin
    1. Check condition: if the level is not the root level and this is the first call of OverflowTreatment at this level during the insertion of one data rectangle,
    2. Do reinsert: invoke ReInsert. Else
    3. Do split: invoke Split.
  • End
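The decision rule of OverflowTreatment is small enough to sketch directly. The level numbering (leaves at 0, root at the highest level) and the `treated_levels` set, which the caller resets for each new data rectangle, are assumptions made for illustration.

```python
# Illustrative sketch; the function name and level convention are hypothetical.
from typing import Set


def overflow_treatment(level: int, root_level: int, treated_levels: Set[int]) -> str:
    """Decide how to handle an overflowing node at `level`.

    `treated_levels` records the levels at which OverflowTreatment has already
    been called while inserting the current data rectangle; the caller starts
    with an empty set for each new data rectangle."""
    first_call_at_level = level not in treated_levels
    treated_levels.add(level)
    if level != root_level and first_call_at_level:
        return "reinsert"
    return "split"


levels: Set[int] = set()                 # fresh for one data rectangle
print(overflow_treatment(0, 2, levels))  # reinsert (first overflow at leaf level)
print(overflow_treatment(0, 2, levels))  # split (second overflow at that level)
print(overflow_treatment(2, 2, levels))  # split (the root is never reinserted)
```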
18
R*-tree Insertion (cont.)
  • Algorithm ReInsert
  • Begin
    1. Compute distances: for all M + 1 entries of node N, compute the distance between the center of each entry's rectangle and the center of the bounding rectangle of N.
    2. Sort entries: sort the entries in decreasing order of the distances computed in step 1.
    3. Remove entries: remove the first p entries from N and adjust the bounding rectangle of N.
    4. Reinsert entries: in the order defined in step 2, invoke Insert on the removed entries, starting with either the maximum distance (far reinsert) or the minimum distance (close reinsert).
  • End
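Steps 1 to 3 of ReInsert can be sketched as below, assuming tuple rectangles and Euclidean distance between centers (`math.dist`, Python 3.8+). The function name and the rounding used to turn 30% of M into an integer p are assumptions. Step 4 would then call Insert on `removed`, either in the returned order (far reinsert) or reversed (close reinsert).

```python
# Illustrative sketch; function and variable names are hypothetical.
import math
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(rects: Sequence[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def select_for_reinsert(entries: List[Rect], p: int) -> Tuple[List[Rect], List[Rect], Rect]:
    """Steps 1-3 of ReInsert on the M + 1 entries of an overflowing node N.

    Returns (kept, removed, new bounding rectangle of N). `removed` is in
    decreasing distance order, i.e. the far-reinsert order."""
    c = center(mbr(entries))                        # step 1: center of N's MBR
    ordered = sorted(entries, key=lambda r: math.dist(center(r), c),
                     reverse=True)                  # step 2: farthest first
    removed, kept = ordered[:p], ordered[p:]        # step 3: remove p entries
    return kept, removed, mbr(kept)                 # ... and shrink N's MBR


M = 4
p = max(1, round(0.3 * M))  # p = 30% of M; the rounding rule is an assumption
entries = [(0, 0, 2, 1), (1, 0, 2, 1), (2, 0, 3, 1), (3, 0, 4, 1), (9, 0, 10, 1)]
kept, removed, new_mbr = select_for_reinsert(entries, p)
print(removed, new_mbr)  # [(9, 0, 10, 1)] (0, 0, 4, 1)
```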
19
R-tree Split Example
  • Figure: R-tree (quadratic split, m = 40%) vs. R*-tree (m = 40%)

20
R*-tree
  • Forced Reinsert
  • When a node N overflows, instead of splitting N immediately, check whether some entries in N would fit better in another node
  • Rationale: splitting only contributes to a local reorganization of the directory rectangles
  • Reinsertion slightly increases construction time, BUT results in less overlap → improves query response time
  • Removing p = 30% of M entries yields the best performance

21
Performance Comparison
  • Using forced reinsert increases storage utilization, decreases overlap, causes fewer splits, and makes rectangles more quadratic (square-like).
  • CPU cost is higher with forced reinsert, but because there are fewer splits, the increase in disk accesses for insertions is only 4% (insertion cost remains the lowest of all R-tree variants)!

22
Outline
  • Introduction
  • Motivation
  • R-tree and R*-tree structure
  • Searching of R-tree
  • Construction of R*-tree
  • Conclusions
  • References

23
Conclusions
  • R*-trees perform significantly better than the other R-tree variants.
  • It is the most robust of the trees and requires fewer disk accesses
  • The gain is higher for smaller query rectangles, because storage utilization is more important for larger query rectangles
  • 400% gain over the linear split and 180% gain over the quadratic split R-tree
  • The best storage utilization
  • Even with forced reinsertion, insertion cost is decreased, due to fewer splits
  • Spatial join has the highest gain

24
References
  • Guttman, A. R-Trees: A Dynamic Index Structure for Spatial Searching. Proceedings, ACM SIGMOD, pp. 47-57, June 1984.
  • Beckmann, N., Kriegel, H.-P., Schneider, R., Seeger, B. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. Proceedings, ACM SIGMOD International Conference on Management of Data, May 23-25, 1990.