R-TREES: A Dynamic Index Structure for Spatial Searching by A. Guttman, SIGMOD 1984.

1

Shahram Ghandeharizadeh
Computer Science Department
University of Southern California

2
Motivating Example

Type in your street address in Google

3
Example (Cont)

Show me all the pizza places close by

4
Terminology

The example query is termed a spatial query.
R-tree is a spatial index structure.
K-D-B trees are useful for point data only:
Exact-point lookup!
Show me the USC Salvatori Computer Science building.
R-tree represents data objects by intervals in several dimensions:
Exact-point and range lookups!
Show me all pizza places within a 2-mile radius of the USC Salvatori Computer Science building.
R-tree is:
A height-balanced tree similar to a B-tree, with index records in its leaf nodes containing pointers to data objects.
A node is a disk page.
It assumes each tuple has a unique identifier, the RID.

5
R-Tree Leaf Nodes

Leaf nodes contain index records of the form (I, tuple-identifier):
tuple-identifier is the RID of a tuple.
I is an n-dimensional rectangle that bounds the indexed spatial object:
$I = (I_0, I_1, \ldots, I_{n-1})$, where n is the number of dimensions.
$I_i$ is a closed bounded interval $[a, b]$ describing the extent of the object along dimension i.
Alternatively, a and/or b may be infinite, indicating that the object is unbounded along dimension i.

6
R-Tree Non-leaf nodes

Non-leaf nodes contain entries of the form (I, child-pointer):
child-pointer is the address of a lower node in the R-tree.
I covers all rectangles in the lower node's entries.
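
The two entry layouts can be mirrored in a few Python types. This is a minimal in-memory sketch, assuming that a child-pointer is an ordinary object reference rather than a disk page address and that a tuple-identifier is an integer RID; the class names `Rect`, `Entry`, and `Node` are hypothetical. An unbounded interval is encoded with `float("inf")`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Rect:
    """I = (I_0, ..., I_{n-1}) with I_i = [lo[i], hi[i]]; bounds may be +/-inf."""
    lo: Tuple[float, ...]
    hi: Tuple[float, ...]

    def union(self, other: Rect) -> Rect:
        """Smallest rectangle that covers both self and other."""
        return Rect(tuple(map(min, self.lo, other.lo)), tuple(map(max, self.hi, other.hi)))

    def contains(self, other: Rect) -> bool:
        """True if other lies inside self along every dimension."""
        return all(a <= c and d <= b for a, b, c, d in zip(self.lo, self.hi, other.lo, other.hi))


@dataclass
class Entry:
    rect: Rect              # I
    ref: Union[int, Node]   # tuple-identifier (RID) in a leaf, child-pointer otherwise


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)

    def cover(self) -> Rect:
        """Rectangle the parent entry must hold: it covers every entry of this node."""
        out = self.entries[0].rect
        for e in self.entries[1:]:
            out = out.union(e.rect)
        return out


# Usage: one leaf with two index records, referenced by a single root entry.
leaf = Node(leaf=True, entries=[Entry(Rect((0, 0), (1, 1)), 101), Entry(Rect((2, 3), (4, 5)), 102)])
root = Node(leaf=False, entries=[Entry(leaf.cover(), leaf)])
road = Rect((0, float("-inf")), (1, float("inf")))  # unbounded along dimension 1
print(leaf.cover())  # Rect(lo=(0, 0), hi=(4, 5))
```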

7
R-Tree: A 2-D (n = 2) Example

8
R-Tree Non-leaf nodes: Questions?

Non-leaf nodes contain entries of the form (I, child-pointer).
What is the child-pointer? A disk page address!
What is I? An n-dimensional rectangle $I = (I_0, I_1, \ldots, I_{n-1})$.

13
R-tree Properties

Assume:
M = maximum number of entries in a node.
m = minimum number of entries in a node, where $m \le M/2$.
N = number of index records.
The R-tree has the following properties:
Every leaf node contains between m and M index records. The root node is the exception.
For each index record (I, tuple-identifier) in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object represented by the indicated tuple.
Every non-leaf node has between m and M children. The root node is the exception.
For each entry (I, child-pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node.
The root node has at least two children unless it is a leaf.
All leaves appear on the same level.
The height of an R-tree with N index records is at most $\lceil \log_m N \rceil - 1$.
Worst-case space utilization for all nodes except the root is m/M.
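
The structural properties can be checked mechanically, and the height bound evaluated for concrete numbers. A minimal sketch, assuming 2-D rectangles stored as `(xmin, ymin, xmax, ymax)` tuples and nodes holding `(rect, ref)` pairs; `check_rtree` and `max_height` are hypothetical helpers. The per-record rule (each leaf I is the smallest rectangle around its object) cannot be checked without the objects themselves, so it is skipped.

```python
class Node:
    def __init__(self, leaf, entries):
        self.leaf = leaf
        self.entries = entries  # (rect, rid) in leaves, (rect, child Node) otherwise


def mbr(rects):
    """Smallest (xmin, ymin, xmax, ymax) rectangle covering all rects."""
    xs0, ys0, xs1, ys1 = zip(*rects)
    return (min(xs0), min(ys0), max(xs1), max(ys1))


def check_rtree(node, m, M, is_root=True):
    """Raise ValueError if a structural property fails; return the height of node (leaves = 0)."""
    n = len(node.entries)
    if n > M or (not is_root and n < m):
        raise ValueError(f"node holds {n} entries, expected {m}..{M}")
    if is_root and not node.leaf and n < 2:
        raise ValueError("a non-leaf root needs at least two children")
    if node.leaf:
        return 0
    heights = set()
    for rect, child in node.entries:
        heights.add(check_rtree(child, m, M, is_root=False))  # checks child's fill first
        if rect != mbr([r for r, _ in child.entries]):
            raise ValueError("parent rectangle is not the smallest cover of its child")
    if len(heights) != 1:
        raise ValueError("leaves are not all on the same level")
    return heights.pop() + 1


def max_height(n_records, m):
    """ceil(log_m N) - 1, computed with integers to avoid floating-point rounding."""
    k, reach = 0, 1
    while reach < n_records:
        reach *= m
        k += 1
    return k - 1


# Usage: two leaves under one root, with m = 2 and M = 4.
a = Node(True, [((0, 0, 1, 1), 1), ((1, 1, 2, 2), 2)])
b = Node(True, [((8, 8, 9, 9), 3), ((9, 9, 10, 10), 4)])
root = Node(False, [((0, 0, 2, 2), a), ((8, 8, 10, 10), b)])
print(check_rtree(root, m=2, M=4))   # 1
print(max_height(1_000_000, m=25))   # 4
```

For example, with m = 25 a tree holding a million records is at most 4 levels above its leaves, and with M = 50 the worst-case utilization m/M is 50%.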

14
Searching

Descend from the root to the leaves in a B-tree manner.
If multiple subtrees contain the point of interest, follow all of them.
Assume:
E.I denotes the rectangle part of an index entry E.
E.p denotes the tuple-identifier or child-pointer.
Search(T = root of the R-tree, S = search rectangle)
If T is not a leaf, check each entry E to determine whether E.I overlaps S. For all overlapping entries, invoke Search(E.p, S).
If T is a leaf, check all entries E to determine whether E.I overlaps S. If so, E is a qualifying record.
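
Search translates almost line for line into a recursive function. A minimal sketch, assuming 2-D closed rectangles stored as `(xmin, ymin, xmax, ymax)` tuples; the RIDs `"pizza-a"` and so on are made-up placeholders. A radius query such as the pizza example can be run by searching with the circle's bounding square and then discarding candidates beyond the exact distance.

```python
def overlaps(a, b):
    """Closed rectangles overlap iff their intervals overlap on every axis."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Node:
    def __init__(self, leaf, entries):
        self.leaf = leaf
        self.entries = entries  # (rect, rid) in leaves, (rect, child Node) otherwise


def search(t, s, out=None):
    """Return the RIDs of all index records below t whose rectangle overlaps s."""
    if out is None:
        out = []
    for rect, ref in t.entries:
        if overlaps(rect, s):
            if t.leaf:
                out.append(ref)       # qualifying record
            else:
                search(ref, s, out)   # descend into every overlapping subtree
    return out


leaf1 = Node(True, [((0, 0, 1, 1), "pizza-a"), ((2, 2, 3, 3), "pizza-b")])
leaf2 = Node(True, [((5, 5, 6, 6), "pizza-c")])
root = Node(False, [((0, 0, 3, 3), leaf1), ((5, 5, 6, 6), leaf2)])
print(search(root, (0.5, 0.5, 2.5, 2.5)))  # ['pizza-a', 'pizza-b']
```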

15
Insertion

Similar to B-trees, new index records are added to the leaves, nodes that overflow are split, and splits propagate up the tree.
Insert(T = root of the R-tree, E = new index entry)
Find position for new record: Invoke ChooseLeaf to select a leaf node L in which to place E.
Add record to leaf node: If L has room for E, install E. Otherwise, invoke SplitNode to obtain L and LL containing E and all the old entries of L.
Propagate changes upwards: Invoke AdjustTree on L, also passing LL if a split was performed.
Grow tree taller: If node split propagation caused the root to split, create a new root whose children are the two resulting nodes.
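
The four steps can be sketched as a recursive insert in which the AdjustTree work happens as the recursion unwinds. This is a minimal sketch assuming 2-D `(xmin, ymin, xmax, ymax)` rectangles and a deliberately tiny, hypothetical `M = 4`; the `split` function is a simple stand-in (sort by x-center, cut in half), not one of Guttman's split algorithms.

```python
M = 4  # hypothetical maximum number of entries per node, tiny so that splits happen


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


class Node:
    def __init__(self, leaf, entries=None):
        self.leaf = leaf
        self.entries = entries if entries is not None else []  # (rect, rid) or (rect, child)


def cover(node):
    """Smallest rectangle covering all of node's entries."""
    out = node.entries[0][0]
    for rect, _ in node.entries[1:]:
        out = union(out, rect)
    return out


def split(node):
    """Stand-in split (not Guttman's): sort by x-center, move the upper half to a sibling."""
    node.entries.sort(key=lambda e: e[0][0] + e[0][2])
    mid = len(node.entries) // 2
    sibling = Node(node.leaf, node.entries[mid:])
    node.entries = node.entries[:mid]
    return sibling


def _insert(node, rect, rid):
    """Insert below node; return node's new sibling if node had to split, else None."""
    if node.leaf:
        node.entries.append((rect, rid))                     # add record to leaf node
    else:
        # ChooseLeaf step: least enlargement, ties broken by smaller area.
        k = min(range(len(node.entries)), key=lambda i: (
            area(union(node.entries[i][0], rect)) - area(node.entries[i][0]),
            area(node.entries[i][0])))
        child = node.entries[k][1]
        partner = _insert(child, rect, rid)
        node.entries[k] = (cover(child), child)             # AdjustTree: tighten rectangle
        if partner is not None:
            node.entries.append((cover(partner), partner))  # propagate the split upwards
    return split(node) if len(node.entries) > M else None


def insert(root, rect, rid):
    """Insert an index record and return the root, which is new if the old root split."""
    partner = _insert(root, rect, rid)
    if partner is None:
        return root
    return Node(False, [(cover(root), root), (cover(partner), partner)])  # grow tree taller


root = Node(leaf=True)
for i in range(10):
    root = insert(root, (i, i, i + 1, i + 1), f"rid{i}")
```

Because every split creates a sibling on the same level and only a root split adds a level, all leaves stay at the same depth.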

16
Insertion: ChooseLeaf

ChooseLeaf(E = new index entry)
Initialize: Set N to be the root node.
Leaf check: If N is a leaf, return N.
Choose subtree: Let F be the entry in N whose rectangle F.I needs least enlargement to include E.I. Resolve ties by choosing the entry with the rectangle of smallest area.
Descend until a leaf is reached: Set N to be the child node pointed to by F.p and repeat from step 2.
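
ChooseLeaf as a loop, in a minimal sketch assuming 2-D `(xmin, ymin, xmax, ymax)` rectangles; the step numbers in the comments refer to the four steps above, and the leaf contents are made-up placeholders.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


class Node:
    def __init__(self, leaf, entries):
        self.leaf = leaf
        self.entries = entries  # (rect, rid) in leaves, (rect, child Node) otherwise


def choose_leaf(root, e_rect):
    """Return the leaf in which an entry with rectangle e_rect should be placed."""
    n = root                                  # step 1: start at the root
    while not n.leaf:                         # step 2: stop once N is a leaf
        # Step 3: least enlargement of F.I to include E.I; ties go to the smaller area.
        _, f_child = min(
            n.entries,
            key=lambda f: (area(union(f[0], e_rect)) - area(f[0]), area(f[0])),
        )
        n = f_child                           # step 4: descend and repeat
    return n


big = Node(True, [((0, 0, 1, 1), "r1"), ((3, 3, 4, 4), "r2")])
small = Node(True, [((0, 0, 1, 1), "r3"), ((1, 1, 2, 2), "r4")])
far = Node(True, [((5, 5, 6, 6), "r5"), ((9, 9, 10, 10), "r6")])
root = Node(False, [((0, 0, 4, 4), big), ((0, 0, 2, 2), small), ((5, 5, 10, 10), far)])
print(choose_leaf(root, (1, 1, 1, 1)) is small)  # True: no enlargement either way, smaller area wins
```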

17
SplitNode: Node Splitting

A full node contains M entries. Divide the collection of M+1 entries between two nodes.
Objective: Make it as unlikely as possible for the resulting two new nodes to be examined on subsequent searches.
Heuristic: The total area of the two covering rectangles after a split should be minimized.

Total area is larger!
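
The heuristic can be made concrete by scoring a candidate split by the total area of its two covering rectangles. A minimal sketch with 2-D `(xmin, ymin, xmax, ymax)` tuples; `split_cost` is a hypothetical name. Grouping nearby rectangles gives a far smaller total than mixing near and far ones.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def mbr(rects):
    """Smallest rectangle covering every rectangle in the group."""
    xs0, ys0, xs1, ys1 = zip(*rects)
    return (min(xs0), min(ys0), max(xs1), max(ys1))


def split_cost(group_a, group_b):
    """Total area of the two covering rectangles; smaller is better under the heuristic."""
    return area(mbr(group_a)) + area(mbr(group_b))


near = [(0, 0, 1, 1), (1, 1, 2, 2)]
far = [(8, 8, 9, 9), (9, 9, 10, 10)]
good = split_cost(near, far)                            # 4 + 4
bad = split_cost([near[0], far[0]], [near[1], far[1]])  # 81 + 81
print(good, bad)  # 8 162
```
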
19
Node Splitting: How?

How do we find the minimum-area node split?
Exhaustive algorithm,
Quadratic-cost algorithm,
Linear-cost algorithm.

20
Exhaustive Algorithm

Generate all possible groupings and choose the one with the minimum total area.
Number of possibilities: about $2^{M-1}$.
For M = 50, that is $2^{49} \approx 5.6 \times 10^{14}$, roughly 600 trillion possibilities.
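
The exhaustive algorithm is a loop over every assignment of entries to two groups. A minimal sketch, assuming 2-D `(xmin, ymin, xmax, ymax)` rectangles and a minimum fill of m entries per group; `exhaustive_split` is a hypothetical name. Pinning the first entry to group A avoids generating each pair of groups twice, but the loop still runs 2^(n-1) times for n entries, so it is only usable for tiny nodes.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def mbr(rects):
    xs0, ys0, xs1, ys1 = zip(*rects)
    return (min(xs0), min(ys0), max(xs1), max(ys1))


def exhaustive_split(rects, m):
    """Try every division into two groups of at least m entries; keep the minimum total area."""
    n = len(rects)
    best_cost, best = float("inf"), None
    for mask in range(2 ** (n - 1)):
        # Entry 0 always joins group A; bit i-1 of mask sends entry i to group A as well.
        a = [rects[0]] + [rects[i] for i in range(1, n) if (mask >> (i - 1)) & 1]
        b = [rects[i] for i in range(1, n) if not (mask >> (i - 1)) & 1]
        if len(a) < m or len(b) < m:
            continue
        cost = area(mbr(a)) + area(mbr(b))
        if cost < best_cost:
            best_cost, best = cost, (a, b)
    return best_cost, best


rects = [(0, 0, 1, 1), (1, 1, 2, 2), (8, 8, 9, 9), (9, 9, 10, 10)]
cost, (a, b) = exhaustive_split(rects, m=1)
print(cost, a, b)      # 8 [(0, 0, 1, 1), (1, 1, 2, 2)] [(8, 8, 9, 9), (9, 9, 10, 10)]
print(f"{2 ** 49:,}")  # 562,949,953,421,312: the slide's 2^(M-1) for M = 50
```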

22
Quadratic-Cost algorithm

A heuristic to find a small-area split.
Cost is quadratic in M and linear in the number of dimensions.
Pick two of the M+1 entries to be the first elements of the two new groups.
Choose them so that they would waste the most area if both were put in the same group.
Assign the remaining entries to groups one at a time.
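
The quadratic split can be sketched as follows, under stated assumptions: 2-D `(xmin, ymin, xmax, ymax)` rectangles, a minimum fill of m entries per group, and ties broken by smaller covering area and then by fewer entries. The seed and next-entry choices follow the description above; the function names are hypothetical.

```python
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pick_seeds(rects):
    """Indices of the pair that would waste the most area if put in the same group."""
    def waste(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)


def quadratic_split(rects, m):
    """Divide rects (the M+1 entries) into two groups of at least m entries each."""
    i, j = pick_seeds(rects)
    groups, covers = [[rects[i]], [rects[j]]], [rects[i], rects[j]]
    rest = [r for k, r in enumerate(rects) if k not in (i, j)]

    def growth(r, g):
        """Area increase of group g's covering rectangle if r joins it."""
        return area(union(covers[g], r)) - area(covers[g])

    while rest:
        for g in (0, 1):
            if len(groups[g]) + len(rest) == m:  # g needs every remaining entry to reach m
                groups[g].extend(rest)
                return groups
        # PickNext: the entry with the strongest preference for one group.
        r = max(rest, key=lambda r: abs(growth(r, 0) - growth(r, 1)))
        rest.remove(r)
        # Least enlargement, then smaller area, then fewer entries.
        g = min((0, 1), key=lambda g: (growth(r, g), area(covers[g]), len(groups[g])))
        groups[g].append(r)
        covers[g] = union(covers[g], r)
    return groups


rects = [(0, 0, 1, 1), (1, 1, 2, 2), (8, 8, 9, 9), (9, 9, 10, 10), (0, 1, 1, 2)]
a, b = quadratic_split(rects, m=2)
print(a)  # [(0, 0, 1, 1), (0, 1, 1, 2), (1, 1, 2, 2)]
print(b)  # [(9, 9, 10, 10), (8, 8, 9, 9)]
```

PickSeeds examines all pairs and each PickNext round rescans the remaining entries, which is where the quadratic cost in M comes from.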

25
Linear Cost Algorithm

Identical to Quadratic, with the following differences:
Uses a different version of PickSeeds.
PickNext simply chooses any of the remaining entries.

Linear: choose the two objects that are furthest apart. Quadratic: choose the two objects that create as much empty space as possible.
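
A sketch of the linear PickSeeds, assuming 2-D `(xmin, ymin, xmax, ymax)` rectangles and the normalized-separation rule from Guttman's paper: in each dimension, take the rectangle with the lowest high side and the rectangle with the highest low side, divide their separation by the width of the whole set, and keep the pair with the largest normalized separation. Forcing the two seeds to be distinct entries and guarding against a zero width are our own assumptions for degenerate inputs.

```python
def linear_pick_seeds(rects):
    """Return indices (i, j) of the two seed rectangles; rects are (xmin, ymin, xmax, ymax)."""
    best, seeds = float("-inf"), (0, 1)
    for lo, hi in ((0, 2), (1, 3)):  # x extent, then y extent
        # Rectangle with the lowest high side ...
        low_high = min(range(len(rects)), key=lambda k: rects[k][hi])
        # ... and, among the others, the one with the highest low side.
        high_low = max((k for k in range(len(rects)) if k != low_high), key=lambda k: rects[k][lo])
        width = max(r[hi] for r in rects) - min(r[lo] for r in rects)
        separation = (rects[high_low][lo] - rects[low_high][hi]) / (width or 1)  # avoid / 0
        if separation > best:
            best, seeds = separation, (low_high, high_low)
    return seeds


rects = [(0, 0, 1, 1), (1, 1, 2, 2), (8, 8, 9, 9), (9, 9, 10, 10), (0, 1, 1, 2)]
print(linear_pick_seeds(rects))  # (0, 3)
```

Each dimension needs only a constant number of passes over the entries, so seed selection is linear in M.
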
26
Comparison

Linear node-split is simple and fast, and performed as well as quadratic in Guttman's experiments!
The quality of its splits is slightly worse, but this did not noticeably affect search performance.

28
AdjustTree

Ascend from a leaf node L to the root, adjusting
covering rectangles and propagating node splits.
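
A sketch of AdjustTree that walks back up the root-to-leaf path recorded during ChooseLeaf, assuming 2-D `(xmin, ymin, xmax, ymax)` rectangles and in-memory nodes. The overflow split is passed in as a parameter; `halve` is a stand-in used only for this demonstration, not one of the split algorithms above, and `M = 4` is a hypothetical capacity.

```python
def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


class Node:
    def __init__(self, leaf, entries):
        self.leaf = leaf
        self.entries = entries  # (rect, rid) in leaves, (rect, child Node) otherwise


def cover(node):
    out = node.entries[0][0]
    for rect, _ in node.entries[1:]:
        out = union(out, rect)
    return out


def halve(node):
    """Stand-in split: sort by x-center and move the upper half into a new sibling."""
    node.entries.sort(key=lambda e: e[0][0] + e[0][2])
    mid = len(node.entries) // 2
    sibling = Node(node.leaf, node.entries[mid:])
    node.entries = node.entries[:mid]
    return sibling


def adjust_tree(path, ll=None, split=halve, M=4):
    """Ascend from leaf path[-1] to root path[0]; ll is the leaf's split-off sibling, if any.
    Returns the root, which is a new node if the split propagated through the old root."""
    n, nn = path[-1], ll
    for parent in reversed(path[:-1]):
        # Tighten the parent's entry for n around n's current entries.
        k = next(i for i, (_, child) in enumerate(parent.entries) if child is n)
        parent.entries[k] = (cover(n), n)
        if nn is not None:                       # n was split: add an entry for its partner
            parent.entries.append((cover(nn), nn))
            nn = split(parent) if len(parent.entries) > M else None
        n = parent
    if nn is not None:                           # the root itself split: grow the tree taller
        return Node(False, [(cover(n), n), (cover(nn), nn)])
    return n


a = Node(True, [((0, 0, 1, 1), "a"), ((1, 0, 2, 1), "b")])
b = Node(True, [((8, 8, 9, 9), "c"), ((9, 9, 10, 10), "d")])
root = Node(False, [((0, 0, 2, 1), a), ((8, 8, 10, 10), b)])

a.entries.append(((3, 0, 4, 1), "e"))            # leaf has room: only rectangles change
root = adjust_tree([root, a])
print(root.entries[0][0])                        # (0, 0, 4, 1)

a.entries += [((4, 0, 5, 1), "f"), ((5, 0, 6, 1), "g")]  # 5 > M entries: split the leaf
aa = halve(a)
root = adjust_tree([root, a], aa)
print([rect for rect, _ in root.entries])        # [(0, 0, 2, 1), (8, 8, 10, 10), (3, 0, 6, 1)]
```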

29
Deletes

Straightforward. The only complication is underflow:
Either an under-full node is merged with whichever sibling will have its area increased least,
or its orphaned entries are reinserted into the R-tree.
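
A sketch of deletion using the reinsertion option, assuming 2-D `(xmin, ymin, xmax, ymax)` rectangles and in-memory nodes. It locates the leaf holding the record, removes it, walks back up eliminating under-full nodes and tightening rectangles, and finally shortens the tree if the root is left with a single child. Reinsertion itself is delegated to a caller-supplied `reinsert(root, entry, height)` function, a hypothetical parameter; the `record` stub in the usage only logs what would be reinserted.

```python
def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def contains(a, b):
    """True if rectangle b lies inside rectangle a."""
    return a[0] <= b[0] and a[1] <= b[1] and b[2] <= a[2] and b[3] <= a[3]


class Node:
    def __init__(self, leaf, entries):
        self.leaf = leaf
        self.entries = entries  # (rect, rid) in leaves, (rect, child Node) otherwise


def cover(node):
    out = node.entries[0][0]
    for rect, _ in node.entries[1:]:
        out = union(out, rect)
    return out


def find_leaf(node, rect, rid, path=()):
    """Return the root-to-leaf path to the leaf holding (rect, rid), or None."""
    path = list(path) + [node]
    if node.leaf:
        return path if (rect, rid) in node.entries else None
    for r, child in node.entries:
        if contains(r, rect):                      # only subtrees that could hold it
            found = find_leaf(child, rect, rid, path)
            if found:
                return found
    return None


def delete(root, rect, rid, m, reinsert):
    """Remove (rect, rid); reinsert(root, entry, height) -> root re-adds each orphaned entry,
    where height is the level of the node that must receive it (0 = leaf level)."""
    path = find_leaf(root, rect, rid)
    if path is None:
        return root                                # nothing to delete
    path[-1].entries.remove((rect, rid))
    orphans = []
    # Condense: walk upward from the leaf, stopping below the root.
    for height, (node, parent) in enumerate(zip(reversed(path[1:]), reversed(path[:-1]))):
        k = next(i for i, (_, child) in enumerate(parent.entries) if child is node)
        if len(node.entries) < m:                  # under-full: eliminate the node
            del parent.entries[k]
            orphans += [(height, e) for e in node.entries]
        else:                                      # still valid: tighten its rectangle
            parent.entries[k] = (cover(node), node)
    for height, entry in orphans:
        root = reinsert(root, entry, height)
    if not root.leaf and len(root.entries) == 1:   # root with one child: shorten the tree
        root = root.entries[0][1]
    return root


a = Node(True, [((0, 0, 1, 1), "a"), ((1, 0, 2, 1), "b")])
b = Node(True, [((8, 8, 9, 9), "c"), ((9, 9, 10, 10), "d")])
c = Node(True, [((4, 4, 5, 5), "e"), ((5, 5, 6, 6), "f")])
root = Node(False, [((0, 0, 2, 1), a), ((8, 8, 10, 10), b), ((4, 4, 6, 6), c)])
reinserted = []


def record(tree, entry, height):
    """Hypothetical stand-in for a real Insert: it only records what would be reinserted."""
    reinserted.append((height, entry))
    return tree


root = delete(root, (0, 0, 1, 1), "a", m=2, reinsert=record)
print(reinserted)                          # [(0, ((1, 0, 2, 1), 'b'))]
print([rect for rect, _ in root.entries])  # [(8, 8, 10, 10), (4, 4, 6, 6)]
```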

30
R-Tree
31
R-tree Variations

The R+-tree enhances retrieval performance by avoiding visits to multiple paths when searching for point queries.
No overlap between minimum bounding rectangles at the same level.
A specific object's entry might be duplicated.
Insertions might lead to a series of update operations in a chain reaction.
Under certain circumstances, the structure may lead to a deadlock, e.g., when every rectangle encloses a smaller one.

32
R*-tree (1990)

Node split is more sophisticated.
Does not obey the limitation on the number of pairs per node.
When a node overflows, p entries are extracted and reinserted into the tree (p might be 25).
Considers minimization of:
the overlap between minimum bounding rectangles at the same level;
the perimeter of the produced minimum bounding rectangles.
Insertion is more expensive, while retrievals are faster.
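
The slide does not say which p entries are extracted. The sketch below assumes the rule described in the R*-tree paper: sort the overflowing node's entries by the distance between each entry's center and the center of the node's covering rectangle, and remove the p farthest. Rectangles are 2-D `(xmin, ymin, xmax, ymax)` tuples, `pick_for_reinsert` is a hypothetical name, and the reinsertion itself (one Insert call per removed entry) is left out.

```python
def mbr(rects):
    xs0, ys0, xs1, ys1 = zip(*rects)
    return (min(xs0), min(ys0), max(xs1), max(ys1))


def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def pick_for_reinsert(rects, p):
    """Split an overflowing node's rectangles into (kept, removed), where removed holds the
    p rectangles whose centers lie farthest from the center of the node's covering rectangle."""
    cx, cy = center(mbr(rects))

    def dist2(r):
        x, y = center(r)
        return (x - cx) ** 2 + (y - cy) ** 2  # squared distance preserves the ordering

    ordered = sorted(rects, key=dist2, reverse=True)  # farthest first
    return ordered[p:], ordered[:p]


node = [(0, 0, 4, 4), (1, 1, 3, 3), (2, 2, 4, 4), (19, 19, 20, 20)]
kept, removed = pick_for_reinsert(node, p=1)
print(removed)  # [(19, 19, 20, 20)]
```

Removing the outlier lets the remaining entries keep a much tighter covering rectangle, which is the motivation for reinserting rather than splitting immediately.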

33
Static R-trees

Assumes the dataset is known in advance.
Static R-trees are more efficient than dynamic ones:
The tree structure is more compact,
It contains fewer nodes,
Overlap between minimum bounding rectangles is reduced.

34
Summary

The R-tree is a spatial index structure that provides competitive average performance.
There are many different variations in the literature:
Spatio-temporal access methods, e.g., the 3-D R-tree.
Historical R-trees and Time-Parameterized R-trees for spatiotemporal applications.
R-trees have been used to speed up operations in OLAP applications, data warehouses, and data mining.
