R-TREES: A Dynamic Index Structure for Spatial Searching by A. Guttman, SIGMOD 1984.

1
R-TREES: A Dynamic Index Structure for Spatial
Searching, by A. Guttman, SIGMOD 1984.
  • Shahram Ghandeharizadeh
  • Computer Science Department
  • University of Southern California

2
Motivating Example
  • Type in your street address in Google

3
Example (Cont)
  • Show me all the pizza places close by

4
Terminology
  • The example query is termed a spatial query.
  • R-tree is a spatial index structure.
  • K-D-B trees are useful for point data only.
  • Exact-point lookup only, e.g., "Show me the USC
    Salvatori Computer Science building."
  • R-tree represents data objects by intervals in
    several dimensions.
  • Exact-point and range lookups, e.g., "Show me all
    pizza places within a 2-mile radius of the USC
    Salvatori Computer Science building."
  • R-tree is a height-balanced tree similar to a
    B-tree, with index records in its leaf nodes
    containing pointers to data objects.
  • A node is a disk page.
  • Assumes each tuple has a unique identifier, RID.

5
R-Tree Leaf Nodes
  • Leaf nodes contain index records
  • (I, tuple-identifier)
  • tuple-identifier is RID,
  • I is an n-dimensional rectangle that bounds the
    indexed spatial object
  • I = (I_0, I_1, ..., I_{n-1}), where n is the number of
    dimensions.
  • I_i is a closed bounded interval [a, b] describing
    the extent of the object along dimension i.
  • Values for a and b might be infinity, indicating
    an unbounded object along dimension i.

6
R-Tree Non-leaf nodes
  • Non-leaf nodes contain entries of the form
  • (I, child-pointer)
  • Child-pointer is the address of a lower node in
    the R-Tree.
  • I covers all rectangles in the lower node's
    entries.
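
The leaf and non-leaf entries described above can be sketched in Python. This is only an illustrative in-memory model: the names Rect, Entry, Node and cover are assumptions, and a child-pointer is an ordinary object reference here rather than a disk page address.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A rectangle I = (I_0, ..., I_{n-1}); each I_i is a closed interval (a, b).
# An unbounded side could be written with float("inf").
Rect = Tuple[Tuple[float, float], ...]

def cover(rects: List[Rect]) -> Rect:
    """Smallest rectangle that contains every rectangle in rects."""
    return tuple(
        (min(iv[0] for iv in dim), max(iv[1] for iv in dim))
        for dim in zip(*rects)
    )

@dataclass
class Node:
    is_leaf: bool
    entries: List["Entry"] = field(default_factory=list)

@dataclass
class Entry:
    rect: Rect                     # I
    rid: Optional[int] = None      # tuple-identifier (leaf entries only)
    child: Optional[Node] = None   # child-pointer (non-leaf entries only)

# Two data objects in a leaf, and the non-leaf entry that covers them.
leaf = Node(is_leaf=True, entries=[
    Entry(rect=((0.0, 2.0), (0.0, 1.0)), rid=1),
    Entry(rect=((1.0, 4.0), (3.0, 5.0)), rid=2),
])
parent_entry = Entry(rect=cover([e.rect for e in leaf.entries]), child=leaf)
print(parent_entry.rect)  # ((0.0, 4.0), (0.0, 5.0))
```

The parent rectangle spans x from 0 to 4 and y from 0 to 5, which is the tightest box around both leaf rectangles.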

7
R-Tree: A 2-D (n = 2) Example

8
R-Tree Non-leaf nodes: Questions?
  • Non-leaf nodes contain entries of the form
  • (I, child-pointer)
  • What is the child-pointer? A disk page address!
  • What is I? An n-dimensional rectangle,
    I = (I_0, I_1, ..., I_{n-1}).

13
R-tree Properties
  • Assume
  • M: maximum number of entries in a node.
  • m ≤ M/2: minimum number of entries in a node.
  • N: number of index records.
  • R-tree has the following properties
  • Every leaf node contains between m and M index
    records. Root node is the exception.
  • For each index record (I, tuple-identifier) in a
    leaf node, I is the smallest rectangle that
    spatially contains the n dimensional data object
    represented in the indicated tuple.
  • Every non-leaf node has between m and M children.
    Root node is the exception.
  • For each entry (I, child-pointer) in a non-leaf
    node, I is the smallest rectangle that spatially
    contains the rectangles in the child node.
  • The root node has at least two children unless it
    is a leaf.
  • All leaves appear on the same level.
  • Height of a tree with N index records is at most
    ⌈log_m N⌉ - 1.
  • Worst case utilization for all nodes except the
    root is m/M.
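
As a worked example of the height bound and the worst-case utilization above, the following sketch evaluates both for sample values. The function name max_height and the sample values of N, m and M are assumptions chosen for illustration.

```python
def max_height(num_records: int, m: int) -> int:
    """Upper bound ceil(log_m(N)) - 1 on the height, using integer arithmetic."""
    if m < 2 or num_records < 1:
        raise ValueError("need m >= 2 and at least one record")
    levels, capacity = 0, 1
    while capacity < num_records:
        capacity *= m
        levels += 1          # levels == ceil(log_m(N)) when the loop ends
    return max(levels - 1, 0)

N, m, M = 1_000_000, 50, 100
print(max_height(N, m))      # 3, since 50**3 < 1,000,000 <= 50**4
print(m / M)                 # 0.5, worst-case utilization of non-root nodes
```

Integer multiplication is used instead of math.log so that exact powers of m are not affected by floating-point rounding.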

14
Searching
  • Descend from root to leaf in a B-tree manner.
  • If multiple sub-trees contain the point of
    interest then follow all.
  • Assume
  • EI denotes the rectangle part of an index entry
    E,
  • Ep denotes the tuple-identifier or child-pointer.
  • Search (T: root of the R-tree, S: search
    rectangle)
  • If T is not a leaf, check each entry E to
    determine whether EI overlaps S. For all
    overlapping entries, invoke Search(Ep, S).
  • If T is a leaf, check all entries E to determine
    whether EI overlaps S. If so, E is a qualifying
    record.
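
The Search procedure above can be sketched as a short recursive function. This is a minimal in-memory illustration, not the paper's disk-based implementation: the Node class and the (EI, Ep) tuple layout are assumptions made for the example.

```python
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]

def overlaps(a: Rect, b: Rect) -> bool:
    """True if closed rectangles a and b intersect in every dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a, b))

class Node:
    def __init__(self, is_leaf, entries):
        self.is_leaf = is_leaf
        # Each entry is (EI, Ep): Ep is a RID in a leaf, a child Node otherwise.
        self.entries = entries

def search(t: Node, s: Rect) -> List[int]:
    """Return the RIDs of all index records whose rectangle overlaps s."""
    result = []
    for rect, ptr in t.entries:
        if overlaps(rect, s):
            if t.is_leaf:
                result.append(ptr)              # qualifying record
            else:
                result.extend(search(ptr, s))   # follow every overlapping subtree
    return result

leaf1 = Node(True, [(((0, 2), (0, 2)), 1), (((3, 5), (1, 4)), 2)])
leaf2 = Node(True, [(((6, 9), (6, 8)), 3), (((4, 7), (5, 9)), 4)])
root = Node(False, [(((0, 5), (0, 4)), leaf1), (((4, 9), (5, 9)), leaf2)])
print(search(root, ((4, 6), (3, 6))))  # [2, 3, 4]
```

Both root entries overlap the search rectangle here, so both subtrees are visited. This is the case the slide describes as following all sub-trees.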

15
Insertion
  • Similar to B-trees, new index records are added
    to the leaves, nodes that overflow are split, and
    splits propagate up the tree.
  • Insert (T: root of the R-tree, E: new index
    entry)
  • Find position for new record: Invoke ChooseLeaf
    to select a leaf node L in which to place E.
  • Add record to leaf node: If L has room for E
    then insert E and return. Otherwise, invoke
    SplitNode to obtain L and LL containing E and all
    the old entries of L.
  • Propagate changes upwards: Invoke AdjustTree on
    L, also passing LL if a split was performed.
  • Grow tree taller: If node split propagation
    caused the root to split, create a new root whose
    children are the two resulting nodes.

16
Insertion: ChooseLeaf
  • ChooseLeaf (E: new index entry)
  • Initialize: Set N to be the root node.
  • Leaf check: If N is a leaf, return N.
  • Choose subtree: Let F be the entry in N whose
    rectangle FI needs the least enlargement to include
    EI. Resolve ties by choosing the entry with the
    rectangle of smallest area.
  • Descend until a leaf is reached: Set N to be the
    child node pointed to by Fp and repeat from the
    leaf check.
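
The ChooseLeaf steps above can be sketched as follows. The helper names (area, union, enlargement) and the in-memory Node class are assumptions for illustration; entries are (FI, Fp) pairs.

```python
from math import prod

def area(r):
    """Area (volume in n dimensions) of rectangle r."""
    return prod(hi - lo for lo, hi in r)

def union(a, b):
    """Smallest rectangle containing both a and b."""
    return tuple((min(l1, l2), max(h1, h2))
                 for (l1, h1), (l2, h2) in zip(a, b))

def enlargement(r, new):
    """Extra area r needs in order to include new."""
    return area(union(r, new)) - area(r)

class Node:
    def __init__(self, is_leaf, entries):
        self.is_leaf = is_leaf
        self.entries = entries  # (FI, Fp) pairs

def choose_leaf(root, e_rect):
    n = root                                  # Initialize
    while not n.is_leaf:                      # Leaf check
        # Choose subtree: least enlargement, ties broken by smallest area.
        _, child = min(n.entries,
                       key=lambda f: (enlargement(f[0], e_rect), area(f[0])))
        n = child                             # Descend
    return n

left = Node(True, [(((0, 4), (0, 4)), 1)])
right = Node(True, [(((6, 10), (6, 10)), 2)])
root = Node(False, [(((0, 4), (0, 4)), left), (((6, 10), (6, 10)), right)])
chosen = choose_leaf(root, ((5, 6), (5, 6)))
print(chosen is right)  # True: left would grow by 20, right by 9
```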

17
SplitNode: Node Splitting
  • A full node contains M entries. Divide the
    collection of M + 1 entries between 2 nodes.
  • Objective: Make it as unlikely as possible for
    the resulting two new nodes to be examined on
    subsequent searches.
  • Heuristic: The total area of two covering
    rectangles after a split should be minimized.

Total area is larger!
19
Node Splitting: How?
  • How to find the minimum area node split?
  • Exhaustive algorithm,
  • Quadratic-cost algorithm,
  • Linear cost algorithm.

20
Exhaustive Algorithm
  • Generate all possible groupings and choose the one
    with minimum total area.
  • Number of possibilities: approximately 2^(M-1).
  • M = 50 gives roughly 600 trillion possibilities.
  • US deficit pales!
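
The figure quoted above can be checked with a line of arithmetic. The function name approx_split_count is an assumption for illustration; it simply evaluates 2^(M-1).

```python
def approx_split_count(max_entries: int) -> int:
    """Approximate number of candidate groupings, 2**(M - 1)."""
    return 2 ** (max_entries - 1)

print(f"{approx_split_count(50):,}")  # 562,949,953,421,312
```

That is about 5.6 × 10^14, which the slide rounds to 600 trillion.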

22
Quadratic-Cost algorithm
  • A heuristic to find a small-area split.
  • Cost is quadratic in M and linear in the number
    of dimensions.
  • Pick two of the M + 1 entries to be the first
    elements of the two new groups.
  • Choose the pair that would waste the most area
    if both were put in the same group.
  • Assign remaining entries to groups one at a time.
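
The quadratic split can be sketched as below. The slide names only the seed choice and the one-at-a-time assignment. The rule for picking the next entry (largest difference in area growth between the two groups) and the tie-breaking order follow the PickNext step as described in Guttman's paper, and should be read as this sketch's interpretation. Function names and the sample rectangles are assumptions.

```python
from itertools import combinations
from math import prod

def area(r):
    return prod(hi - lo for lo, hi in r)

def union(a, b):
    return tuple((min(l1, l2), max(h1, h2))
                 for (l1, h1), (l2, h2) in zip(a, b))

def pick_seeds(rects):
    """Indices of the pair that wastes the most area if grouped together."""
    def waste(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)

def quadratic_split(rects, m):
    """Split the M + 1 rectangles into two index groups of at least m each."""
    s1, s2 = pick_seeds(rects)
    groups = [[s1], [s2]]
    covers = [rects[s1], rects[s2]]
    remaining = [i for i in range(len(rects)) if i not in (s1, s2)]

    def growth(i, g):
        return area(union(covers[g], rects[i])) - area(covers[g])

    while remaining:
        # If a group needs every remaining entry to reach m, give it all of them.
        short = [g for g in (0, 1) if len(groups[g]) + len(remaining) <= m]
        if short:
            groups[short[0]].extend(remaining)
            break
        # PickNext: the entry with the strongest preference for one group.
        nxt = max(remaining, key=lambda i: abs(growth(i, 0) - growth(i, 1)))
        # Least growth wins; ties go to smaller area, then to fewer entries.
        g = min((0, 1),
                key=lambda k: (growth(nxt, k), area(covers[k]), len(groups[k])))
        groups[g].append(nxt)
        covers[g] = union(covers[g], rects[nxt])
        remaining.remove(nxt)
    return groups

# M = 4, so an overflowing node holds M + 1 = 5 entries; m = 2.
rects = [((0, 1), (0, 1)), ((1, 2), (0, 1)), ((8, 9), (8, 9)),
         ((9, 10), (8, 9)), ((0, 1), (1, 2))]
print(quadratic_split(rects, m=2))  # [[0, 1, 4], [3, 2]]
```

The seeds are entries 0 and 3, the pair whose joint covering rectangle wastes the most area. The three rectangles near the origin end up together, and the two near (9, 9) form the other group.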

25
Linear Cost Algorithm
  • Identical to Quadratic with the following
    differences:
  • Uses a different version of PickSeeds.
  • PickNext simply chooses any of the remaining
    entries.

Linear: Choose two objects that are furthest
apart. Quadratic: Choose two objects that create
as much empty space as possible.
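
One way to realize "furthest apart" in linear time is sketched below. It follows the linear PickSeeds described in Guttman's paper: for each dimension, take the rectangle with the highest low side and the one with the lowest high side, then normalize their separation by the width of the whole set. The function name, the fallback for degenerate input, and the sample rectangles are assumptions.

```python
def linear_pick_seeds(rects):
    """Return indices of two seed rectangles using one pass per dimension."""
    best = None
    for d in range(len(rects[0])):
        lows = [r[d][0] for r in rects]
        highs = [r[d][1] for r in rects]
        hi_low = max(range(len(rects)), key=lambda i: lows[i])    # highest low side
        lo_high = min(range(len(rects)), key=lambda i: highs[i])  # lowest high side
        width = max(highs) - min(lows)
        if hi_low == lo_high or width == 0:
            continue  # this dimension cannot separate two distinct entries
        separation = (lows[hi_low] - highs[lo_high]) / width      # normalized
        if best is None or separation > best[0]:
            best = (separation, lo_high, hi_low)
    if best is None:
        return 0, 1   # assumption: fall back to the first two entries
    return best[1], best[2]

rects = [((0, 1), (0, 1)), ((1, 2), (0, 1)), ((8, 9), (8, 9)),
         ((9, 10), (8, 9)), ((0, 1), (1, 2))]
print(linear_pick_seeds(rects))  # (0, 3)
```

For this input the x dimension gives a normalized separation of 0.8 and the y dimension about 0.78, so the seeds come from the x dimension.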
26
Comparison
  • Linear node-split is simple, fast, and as good as
    quadratic for search performance!
  • The quality of its splits is slightly worse, but
    not enough to affect searches noticeably.

28
AdjustTree
  • Ascend from a leaf node L to the root, adjusting
    covering rectangles and propagating node splits.
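
The upward pass can be sketched as below, assuming an in-memory tree with parent pointers. The Node class, the [rect, pointer] entry layout, and the split_node callback (any function that splits an overfull node and returns the new sibling) are assumptions made for this example, not the paper's interface.

```python
def cover(node):
    """Smallest rectangle containing every entry rectangle in node."""
    rects = [rect for rect, _ in node.entries]
    return tuple((min(iv[0] for iv in dim), max(iv[1] for iv in dim))
                 for dim in zip(*rects))

class Node:
    def __init__(self, is_leaf, entries=None, parent=None):
        self.is_leaf = is_leaf
        self.entries = entries if entries is not None else []  # [rect, pointer]
        self.parent = parent

def adjust_tree(l, ll, max_entries, split_node):
    """Ascend from leaf l; ll is l's split-off sibling or None.

    Returns (root, split_off_root_or_None)."""
    n, nn = l, ll
    while n.parent is not None:
        p = n.parent
        for entry in p.entries:          # tighten the entry that points to n
            if entry[1] is n:
                entry[0] = cover(n)
        pp = None
        if nn is not None:               # n was split: register its new sibling
            nn.parent = p
            p.entries.append([cover(nn), nn])
            if len(p.entries) > max_entries:
                pp = split_node(p)       # the split propagates one level up
        n, nn = p, pp
    return n, nn

# A record was just added to the leaf, so the root's rectangle is stale.
leaf = Node(True, [[((0, 1), (0, 1)), 1]])
root = Node(False, [[((0, 1), (0, 1)), leaf]])
leaf.parent = root
leaf.entries.append([((2, 3), (2, 3)), 2])
top, extra = adjust_tree(leaf, None, max_entries=4, split_node=None)
print(root.entries[0][0])  # ((0, 3), (0, 3))
```

If the second return value is not None, the root itself was split and the caller would create a new root above the two returned nodes.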

29
Deletes
  • Straightforward. The only complication is
    under-flows.
  • An under-full node can be merged with whichever
    sibling will have its area increased least.
  • Alternatively, the under-full node is removed and
    its orphaned entries are inserted back into the
    R-Tree.

30
R-Tree
31
R-tree Variations
  • R+-tree enhances retrieval performance by
    avoiding visits to multiple paths when searching
    for point queries.
  • No overlap between minimum bounding rectangles at
    the same level.
  • A specific object's entry might be duplicated.
  • Insertions might lead to a series of update
    operations in a chain reaction.
  • Under certain circumstances, the structure may
    lead to a deadlock, e.g., every rectangle
    encloses a smaller one.

32
R*-tree 1990
  • Node split is more sophisticated.
  • Does not obey the limitation of the number of
    pairs per node.
  • When a node overflows, p entries are extracted
    and reinserted in the tree (p might be 25).
  • Considers minimization of:
  • the overlap between minimum bounding
    rectangles at the same level,
  • the perimeter of the produced minimum bounding
    rectangles.
  • Insertion is more expensive while retrievals are
    faster.
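
The two quantities named above, overlap and perimeter, can be computed for a pair of minimum bounding rectangles as sketched here. The function names are assumptions. margin is the sum of the side lengths, which is half the perimeter for a 2-D rectangle, so minimizing one minimizes the other.

```python
def overlap_area(a, b):
    """Area shared by rectangles a and b (0 if they do not overlap)."""
    total = 1
    for (lo1, hi1), (lo2, hi2) in zip(a, b):
        side = min(hi1, hi2) - max(lo1, lo2)
        if side <= 0:
            return 0
        total *= side
    return total

def margin(r):
    """Sum of the side lengths of rectangle r."""
    return sum(hi - lo for lo, hi in r)

a = ((0, 4), (0, 4))
b = ((2, 6), (3, 5))
print(overlap_area(a, b))    # 2: the shared region is 2 wide and 1 high
print(margin(a), margin(b))  # 8 6
```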

33
Static R-trees
  • Assumes the dataset is known in advance.
  • Static R-trees are more efficient than dynamic
    ones:
  • Tree structure is more compact,
  • Contains fewer nodes,
  • Overlap between minimum bounding rectangles is
    reduced.

34
Summary
  • R-tree is a spatial index structure that provides
    competitive average performance.
  • Many different variations in the literature:
  • Spatio-temporal access methods, 3-d R-tree.
  • Historical R-trees and Time-Parameterized R-tree
    for spatiotemporal applications.
  • Have been used to speed up operations in OLAP
    applications, data warehouses and data mining.