Chapter 3: Data Storage and Access Methods - PowerPoint PPT Presentation

Provided by: spatia

1
Chapter 3: Data Storage and Access Methods
  • Title: The R*-tree: An Efficient and Robust
    Access Method for Points and Rectangles
  • Authors: N. Beckmann, H. Kriegel, R. Schneider
    and B. Seeger
  • Pages: 207-216

2
The R*-tree: An Efficient and Robust Access
Method for Points and Rectangles
  • Problem
    • Problem statement
    • Why is this problem important?
    • Why is this problem hard?
  • Approaches
    • Approach description, key concepts
    • Contributions (novelty, improvements)
    • Assumptions

3
Problem Statement: R*-tree
  • Given
    • Data containing points and rectangles
    • Spatial operations (point query, range query,
      insert, delete)
  • Find: an access method (data structure)
    • A hierarchical organization of rectangles
    • Example (figure from Wikipedia)
  • Objectives
    • Efficiency of spatial queries
  • Constraints
    • Balanced tree
    • Each node is a disk page; every non-root node
      holds at least m entries (m = minimum number of
      entries per node)
    • The root has at least two children unless it is
      a leaf
  • Efficiency metric: number of disk pages accessed
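The structure and constraints on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation. The names `Rect`, `Entry`, `Node`, `range_search` and `check_invariants` are hypothetical. Nodes live in memory rather than on disk pages, and `stats["pages"]` counts visited nodes as a stand-in for the disk-page-access metric.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; a point is a rectangle with zero width and height."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: Rect) -> bool:
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


@dataclass
class Entry:
    mbr: Rect
    child: Optional[Node] = None  # set for directory (inner-node) entries
    oid: Optional[str] = None     # set for leaf entries: id of the data object


@dataclass
class Node:
    """One node stands for one disk page (hypothetical in-memory model)."""
    leaf: bool
    entries: list[Entry] = field(default_factory=list)


def range_search(node: Node, query: Rect, stats: dict) -> list[str]:
    """Ids of all data objects whose MBR intersects `query`.

    Every directory entry that intersects the query must be followed,
    so overlapping directory rectangles cost extra page visits."""
    stats["pages"] = stats.get("pages", 0) + 1  # proxy for one disk-page access
    hits: list[str] = []
    for e in node.entries:
        if e.mbr.intersects(query):
            if node.leaf:
                hits.append(e.oid)
            else:
                hits.extend(range_search(e.child, query, stats))
    return hits


def check_invariants(root: Node, m: int, M: int) -> bool:
    """Balanced tree, non-root nodes hold m..M entries, inner root has >= 2 children."""
    leaf_depths: set[int] = set()

    def walk(node: Node, depth: int, is_root: bool) -> bool:
        n = len(node.entries)
        if n > M or (not is_root and n < m):
            return False
        if is_root and not node.leaf and n < 2:
            return False
        if node.leaf:
            leaf_depths.add(depth)
            return True
        return all(walk(e.child, depth + 1, False) for e in node.entries)

    return walk(root, 0, True) and len(leaf_depths) == 1


# Hypothetical two-level tree with m = 2, M = 4.
leaf1 = Node(True, [Entry(Rect(0, 0, 1, 1), oid="a"), Entry(Rect(2, 2, 3, 3), oid="b")])
leaf2 = Node(True, [Entry(Rect(10, 10, 11, 11), oid="c"), Entry(Rect(12, 12, 13, 13), oid="d")])
root = Node(False, [Entry(Rect(0, 0, 3, 3), child=leaf1), Entry(Rect(10, 10, 13, 13), child=leaf2)])
stats = {}
print(range_search(root, Rect(0, 0, 2.5, 2.5), stats), stats["pages"])  # ['a', 'b'] 2
```

In this example the query touches only the first directory rectangle. It therefore costs two page visits (root plus one leaf) instead of three.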

4
Why is this problem important?
  • Multi-dimensional applications
    • Large geographic data, e.g., map objects like
      countries occupy regions of non-zero size in two
      dimensions.
    • Common real-world usage: "Find all museums within
      2 miles of my current location."
    • CAD
  • Many DBMS servers support spatial indices,
    e.g., Oracle, IBM DB2

5
Why is this problem hard?
  • B-tree split methods are ineffective in two
    dimensions
    • e.g., sorting: 2-D data has no natural total
      order that preserves spatial proximity
  • Size variation across data rectangles
    • Large rectangles limit split options!
  • Non-uniform data distribution over space
  • Dynamic access method
    • Insertions and deletions
  • Overlapping directory rectangles ⇒ multiple
    search paths
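The last point can be made concrete. A point query must descend into every child whose directory rectangle contains the point, so overlap directly multiplies the number of paths (and pages) searched. The two directory layouts below are made-up examples, with rectangles written as `(xmin, ymin, xmax, ymax)` tuples. The helper names are hypothetical.

```python
def contains_point(rect, x, y):
    """True if point (x, y) lies inside rect = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax


def paths_followed(directory, x, y):
    """Number of subtrees a point query must descend into at one directory node."""
    return sum(1 for rect in directory if contains_point(rect, x, y))


# Hypothetical sibling rectangles over roughly the same 10 x 10 region.
overlapping = [(0, 0, 6, 6), (4, 4, 10, 10), (0, 4, 6, 10), (4, 0, 10, 6)]
disjoint = [(0, 0, 5, 5), (5.01, 5.01, 10, 10), (0, 5.01, 5, 10), (5.01, 0, 10, 5)]

print(paths_followed(overlapping, 5, 5))  # 4: every child must be searched
print(paths_followed(disjoint, 5, 5))     # 1
```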

6
Novelty of Contribution
  • Related work
    • Traditional one-dimensional indexing structures
      (e.g., hash, B-tree) are not appropriate for
      multi-dimensional range search
  • B+-tree
    • Represents sorted data in a way that allows for
      efficient insertion and removal of elements.
    • Dynamic, multilevel index with maximum and
      minimum bounds on the number of keys in each
      node.
    • Leaf nodes are linked together as a linked list
      to make range queries easy.
  • R-tree
    • The R-tree is the foundation of many spatial
      access methods
    • A complex spatial object is represented by its
      minimum bounding rectangle, which preserves
      essential geometric properties
    • Overlapping regions are allowed
    • Heuristic: minimize the area of each enclosing
      rectangle in the inner nodes.
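Here is a minimal sketch of how a complex object is approximated by its minimum bounding rectangle (MBR). It assumes a polygon given as a list of `(x, y)` vertices, and the helper names are hypothetical. The gap between the MBR area and the polygon area is "dead space": a query can hit that region without touching the object. This is one reason the area of enclosing rectangles is worth minimizing.

```python
def mbr(vertices):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a list of (x, y) points."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


def rect_area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    s = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


country = [(1, 2), (4, 1), (5, 3), (2, 6)]  # hypothetical outline
box = mbr(country)
print(box)                                     # (1, 1, 5, 6)
print(rect_area(box) - polygon_area(country))  # 9.0 units of dead space
```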

7
Principles of R-tree
  • Height-balanced tree similar to a B-tree, with
    index records in its leaf nodes containing
    pointers to data objects.
  • Heuristic optimization: minimize the area of each
    enclosing rectangle in the inner nodes.

Reference: A. Guttman, "R-trees: A Dynamic Index
Structure for Spatial Searching," 1984
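The area heuristic is applied during insertion. Guttman's R-tree descends into the child whose rectangle needs the least area enlargement to include the new entry, and resolves ties by choosing the child with the smaller area. The minimal sketch below uses rectangles as `(xmin, ymin, xmax, ymax)` tuples. The function names are hypothetical.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(r, new):
    """Extra area r must grow by to also cover `new`."""
    return area(union(r, new)) - area(r)


def choose_subtree(child_rects, new):
    """Index of the child needing least area enlargement; ties -> smallest area."""
    return min(range(len(child_rects)),
               key=lambda i: (enlargement(child_rects[i], new), area(child_rects[i])))


children = [(0, 0, 4, 4), (5, 5, 6, 6)]
print(choose_subtree(children, (3, 3, 5, 5)))  # 1: grows by 8 instead of 9
```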
8
Performance Parameters beyond the R-tree
  • (Q1) The area covered by a directory rectangle
    should be minimized.
  • (Q2) The overlap between directory rectangles
    should be minimized.
  • (Q3) The margin of a directory rectangle should
    be minimized.
  • (Q4) Storage utilization should be optimized.
  • Intuitions
    • Reduce overlap between sibling nodes.
    • Reduce traversal of multiple branches for a
      point query.
    • Reinserting old entries moves them between
      neighboring nodes and thus decreases overlap.
    • Due to more restructuring, fewer splits occur.
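Criteria Q1 to Q3 can be expressed as small helper functions. This sketch assumes 2-D rectangles as `(xmin, ymin, xmax, ymax)` tuples. The margin is taken as the sum of edge lengths, which is the perimeter in 2-D. Overlap for a whole node is assumed here to be the summed pairwise intersection area of sibling rectangles. All names are hypothetical.

```python
from itertools import combinations


def area(r):
    """Q1: area of a rectangle."""
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r):
    """Q3: sum of edge lengths; in 2-D this is the perimeter."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def intersection_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def node_overlap(rects):
    """Q2 (assumed aggregate): total pairwise intersection area of siblings."""
    return sum(intersection_area(a, b) for a, b in combinations(rects, 2))


# Same area, different shape: minimizing margin favours square-like rectangles.
square, sliver = (0, 0, 4, 4), (0, 0, 16, 1)
print(area(square), area(sliver))      # 16 16
print(margin(square), margin(sliver))  # 16 34
print(node_overlap([(0, 0, 4, 4), (2, 2, 6, 6), (10, 10, 11, 11)]))  # 4
```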

9
Difference between R-tree and R*-tree
  • Minimization of area, margin, and overlap is
    crucial to the performance of the R-tree and the
    R*-tree.
  • The R*-tree attempts to reduce both coverage and
    overlap, using a combination of a revised node
    split algorithm and the concept of forced
    reinsertion at node overflow. This is based on the
    observation that R-tree structures are highly
    susceptible to the order in which their entries
    are inserted, so an insertion-built (rather than
    bulk-loaded) structure is likely to be
    sub-optimal. Deletion and reinsertion of entries
    allows them to "find" a place in the tree that may
    be more appropriate than their original location.
    ⇒ Improved retrieval performance
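The revised split can be sketched as the two steps usually described for the R*-tree. First, ChooseSplitAxis picks the axis whose candidate distributions have the smallest total margin. Then ChooseSplitIndex picks, on that axis, the distribution with the least overlap between the two groups, breaking ties by total area. Entries are sorted once by lower and once by upper coordinate. With M + 1 entries and minimum m, each sort yields M - 2m + 2 candidate distributions. This is a simplified 2-D sketch with rectangles as `(xmin, ymin, xmax, ymax)` tuples, and the helper names are hypothetical.

```python
def bbox(rects):
    """Bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def distributions(entries, axis, m):
    """Candidate (group1, group2) splits along one axis (0 = x, 1 = y).

    Entries are sorted by lower and by upper coordinate; every cut point
    leaves at least m entries in each group."""
    lo, hi = axis, axis + 2
    result = []
    for key in (lambda r: (r[lo], r[hi]), lambda r: (r[hi], r[lo])):
        ordered = sorted(entries, key=key)
        for k in range(m, len(ordered) - m + 1):
            result.append((ordered[:k], ordered[k:]))
    return result


def rstar_split(entries, m):
    """Split an overflowing node's entries into two groups."""
    def margin_sum(axis):  # ChooseSplitAxis criterion
        return sum(margin(bbox(g1)) + margin(bbox(g2))
                   for g1, g2 in distributions(entries, axis, m))

    axis = min((0, 1), key=margin_sum)

    def cost(split):  # ChooseSplitIndex: overlap first, then total area
        b1, b2 = bbox(split[0]), bbox(split[1])
        return (overlap(b1, b2), area(b1) + area(b2))

    return min(distributions(entries, axis, m), key=cost)


# Five entries (M = 4, m = 2): three low rectangles and two high ones.
low = [(0, 0, 1, 1), (2, 0, 3, 1), (4, 0, 5, 1)]
high = [(1, 10, 2, 11), (3, 10, 4, 11)]
g1, g2 = rstar_split(low + high, m=2)
print(g1)  # the three low rectangles
print(g2)  # the two high rectangles
```

In this example the y axis wins because its distributions have a total margin of 116, against 216 for x. The chosen cut separates the two clusters with zero overlap.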

10
Example

[Figure: rectangles R1 to R5 grouped in two alternative ways; one grouping is labeled "Preferred by R-tree", the other "Preferred by R*-tree"]

11
Validation Methodology
  • Methodology
    • Experiments with simulated workloads
    • Evaluation of design decisions
  • Results
    • The R*-tree outperforms variants of the R-tree
      and the 2-level grid file.
    • The R*-tree is robust against non-uniform data
      distributions.

12
Summary
  • Paper's focus
    • R*-tree implementation and performance
  • Ideas
    • Heuristic optimizations (p. 208)
      • Reduction of area, margin, and overlap of the
        directory rectangles
    • Better storage utilization (p. 211)
      • Forced reinsertion (splits can be prevented)
  • Experimental comparison
    • Using many data distributions
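Forced reinsertion can be sketched as a selection step. When a node overflows for the first time on a level, the entries whose rectangle centers lie farthest from the center of the node's bounding rectangle are removed and inserted again, instead of splitting the node immediately. In this sketch `p_frac` is a hypothetical parameter name for the fraction of M to reinsert. The value 30% of M is usually quoted for the R*-tree. The rounding rule is an assumption, and removed entries are returned closest-first for re-insertion.

```python
import math


def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def bbox(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def pick_reinsert(entries, M, p_frac=0.3):
    """Split the M + 1 entries of an overflowing node into (kept, to_reinsert).

    The p entries whose centers are farthest from the node's center are
    removed; they are returned closest-first ("close reinsert" order)."""
    p = max(1, round(p_frac * M))  # assumed rounding rule
    node_center = center(bbox(entries))
    by_distance = sorted(entries,
                         key=lambda r: math.dist(center(r), node_center),
                         reverse=True)
    removed, kept = by_distance[:p], by_distance[p:]
    return kept, removed[::-1]


# M = 4, so the node overflows with 5 entries; three sit near the middle.
entries = [(4, 4, 6, 6), (0, 0, 1, 1), (4.5, 4.5, 5.5, 5.5), (3, 3, 7, 7), (8, 9, 10, 10)]
kept, again = pick_reinsert(entries, M=4, p_frac=0.5)
print(again)  # [(8, 9, 10, 10), (0, 0, 1, 1)]
```

The outlying entries leave the node, so its bounding rectangle can shrink. When they are reinserted, they may land in neighboring nodes that fit them better.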

13
Assumptions; Rewrite Today
  • Assumptions
    • Indexing data in two-dimensional space
    • Bulk load and bulk reorganization not available
    • Concurrency control and recovery costs are
      negligible
    • Reinserts during split!
  • Rewrite today
    • Bulk-load of rectangles
    • Compare with newer methods
      • R+-tree (disjoint siblings), Hilbert R-tree
    • Analytical results
      • Formally compare the R*-tree with alternatives
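One well-known option for bulk-loading rectangles is Hilbert packing. It is not part of the R*-tree paper, which assumes bulk loading is unavailable. Rectangles are sorted by the Hilbert-curve index of their centers, and leaf pages of capacity M are filled in that order. The sketch below builds only the leaf level. Upper levels would be built by repeating the packing on the leaf MBRs. The grid resolution `order` and the function names are hypothetical.

```python
def hilbert_index(n, x, y):
    """Position of grid cell (x, y) along a Hilbert curve filling an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the curve stays continuous
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_pack(rects, M, order=10):
    """Group rectangles into leaf pages of at most M entries in Hilbert order."""
    n = 1 << order
    xmin = min(r[0] for r in rects)
    ymin = min(r[1] for r in rects)
    width = (max(r[2] for r in rects) - xmin) or 1.0
    height = (max(r[3] for r in rects) - ymin) or 1.0

    def key(r):
        cx, cy = (r[0] + r[2]) / 2, (r[1] + r[3]) / 2
        gx = min(n - 1, int((cx - xmin) / width * n))
        gy = min(n - 1, int((cy - ymin) / height * n))
        return hilbert_index(n, gx, gy)

    ordered = sorted(rects, key=key)
    return [ordered[i:i + M] for i in range(0, len(ordered), M)]


lower_left = [(0, 0, 1, 1), (1, 1, 2, 2)]
upper_left = [(0, 8, 1, 9), (1, 9, 2, 10)]
upper_right = [(8, 8, 9, 9), (9, 9, 10, 10)]
lower_right = [(8, 0, 9, 1), (9, 1, 10, 2)]
pages = hilbert_pack(lower_right + upper_right + lower_left + upper_left, M=2)
print([sorted(p) for p in pages] == [lower_left, upper_left, upper_right, lower_right])  # True
```

Because the Hilbert curve visits each quadrant completely before moving on, nearby rectangles end up on the same page. In this example each leaf page holds exactly one cluster.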