Title: CSIS 7101: Spatial Data (Part 2) Efficient Processing of Spatial Joins Using R-trees
1CSIS 7101: Spatial Data (Part 2) Efficient Processing of Spatial Joins Using R-trees
- Rollo Chan
- Chu Chung Man
- Mak Wai Yip
- Vivian Lee
- Eric Lo
- Sindy Shou
- Hugh Wang
2Efficient Processing of Spatial Join Using R-trees
- What is Spatial Data?
- Consists of points, lines, rectangles, polygons, surfaces
- Two types of queries in DBS
- Single-scan and multiple-scan queries
- How to retrieve spatial objects in GIS efficiently?
- Spatial Access Method (SAM), e.g. R-tree
3What is Spatial Access Method?
- Designed to support single scan query
- e.g. window query
- Find all objects which intersect a given window
- Attempts to store objects which are close together in the data space on a common page
- Reduces number of disk accesses
4How is window query processed by SAM?
- 1) Filter step
- Find all objects whose minimum bounding rectangles intersect the query rectangle
- 2) Refinement step
- Check whether the objects fulfill the query condition
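The two steps above can be sketched as follows. This is a minimal illustration, not the presenters' code: the objects are assumed to be circles given as (cx, cy, radius), and the names `circle_mbr`, `filter_step`, `refinement_step` and `circle_hits_window` are hypothetical. Rectangles are tuples (xl, yl, xu, yu).

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]    # (xl, yl, xu, yu)
Circle = Tuple[float, float, float]         # (cx, cy, radius), assumed object type


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-parallel rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def circle_mbr(c: Circle) -> Rect:
    """Minimum bounding rectangle of a circle."""
    cx, cy, r = c
    return (cx - r, cy - r, cx + r, cy + r)


def circle_hits_window(c: Circle, w: Rect) -> bool:
    """Exact test: distance from the centre to the nearest window point <= radius."""
    cx, cy, r = c
    nx = min(max(cx, w[0]), w[2])
    ny = min(max(cy, w[1]), w[3])
    return (cx - nx) ** 2 + (cy - ny) ** 2 <= r * r


def filter_step(objects: Dict[str, Circle], window: Rect) -> List[str]:
    """1) Filter step: keep objects whose MBR intersects the window."""
    return [oid for oid, c in objects.items() if intersects(circle_mbr(c), window)]


def refinement_step(objects: Dict[str, Circle], candidates: List[str],
                    window: Rect) -> List[str]:
    """2) Refinement step: run the exact geometry test on the candidates only."""
    return [oid for oid in candidates if circle_hits_window(objects[oid], window)]


# Invented sample data: "c" passes the filter but fails the refinement.
objects = {"a": (1.0, 1.0, 0.5), "b": (5.0, 5.0, 1.0), "c": (2.9, 2.9, 1.0)}
window = (0.0, 0.0, 2.0, 2.0)
candidates = filter_step(objects, window)
answers = refinement_step(objects, candidates, window)
```

In this sample the MBR of "c" touches the window but the circle itself does not, which is the kind of false hit the refinement step removes.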
5What is Spatial Join?
- To combine two sets of spatial objects according to some spatial properties
- It is an important type of multiple-scan query in spatial DBS
6Example of Spatial Join
- Two relations: forests, cities
- (Assume an attribute in each relation represents the borders of forests and cities)
- Example query would be
- Find all forests which are in a city
7Problems when performing Spatial Join
- It is too expensive in terms of CPU time and I/O time
- Traditional index structures are not efficient for spatial joins
- How to make it more efficient?
- R-tree
8Why use R-trees for Spatial Join?
- To optimize CPU time and I/O time
- Fewer comparisons than a simple nested loop
- Other algorithms cannot be efficiently applied to spatial join
9R-tree Approach for Spatial Join
- Suppose there are two R-trees
- R, S
- Idea
- To use the property that directory rectangles form the minimum bounding box of data rectangles in the corresponding subtrees.
- Only if the rectangles of two directory entries ER and ES have a common intersection can there be a pair (rectR, rectS) of intersecting data rectangles in their subtrees; otherwise the two subtrees can be skipped
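The idea above can be sketched as a synchronized traversal of both trees. This is a simplified illustration under stated assumptions: both trees are assumed to have the same height, the classes `Entry` and `Node` are hypothetical stand-ins for R-tree pages, and no tuning is applied yet.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]    # (xl, yl, xu, yu)


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None   # set for directory entries
    oid: Optional[str] = None        # set for data entries


@dataclass
class Node:
    entries: List[Entry]
    is_leaf: bool


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def rtree_join(r: Node, s: Node, out: List[Tuple[str, str]]) -> None:
    """Descend both trees together, following only intersecting entry pairs."""
    for er in r.entries:
        for es in s.entries:
            if not intersects(er.rect, es.rect):
                continue   # no data rectangles below ER and ES can intersect
            if r.is_leaf and s.is_leaf:
                out.append((er.oid, es.oid))
            else:
                rtree_join(er.child, es.child, out)


# Invented two-level trees R and S.
r_root = Node(is_leaf=False, entries=[
    Entry((0, 0, 5, 2), child=Node(is_leaf=True, entries=[
        Entry((0, 0, 2, 2), oid="r1"), Entry((3, 0, 5, 2), oid="r2")])),
    Entry((10, 10, 12, 12), child=Node(is_leaf=True, entries=[
        Entry((10, 10, 12, 12), oid="r3")])),
])
s_root = Node(is_leaf=False, entries=[
    Entry((1, 1, 4, 3), child=Node(is_leaf=True, entries=[
        Entry((1, 1, 4, 3), oid="s1")])),
    Entry((20, 20, 21, 21), child=Node(is_leaf=True, entries=[
        Entry((20, 20, 21, 21), oid="s2")])),
])
pairs: List[Tuple[str, str]] = []
rtree_join(r_root, s_root, pairs)
```

Only one of the four directory-entry pairs intersects here, so only one pair of leaf pages is ever compared.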
10Minimum Bounding Box
11Is there any way to be more efficient?
- There are two areas we need to take into account in order to be more efficient
- CPU Time Tuning
- I/O Time Tuning
12CPU Time Tuning
- Two ways to improve CPU time
- Restricting the search space
- Spatial sorting and plane sweep
13Restricting the search space
- Idea
- Scan through each of the two nodes and mark all entries which are required for performing the join (i.e. which intersect the intersection rectangle of the two nodes).
- Then, each marked entry of one node is tested against all marked entries of the other node.
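A minimal sketch of this restriction, assuming each node is given as a list of (id, rectangle) pairs; the helper names `mbr`, `intersection` and `restricted_join` are hypothetical and the coordinates are invented.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]    # (xl, yl, xu, yu)
NodeEntries = List[Tuple[str, Rect]]


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr(entries: NodeEntries) -> Rect:
    """Bounding rectangle of all entries of a node."""
    return (min(r[0] for _, r in entries), min(r[1] for _, r in entries),
            max(r[2] for _, r in entries), max(r[3] for _, r in entries))


def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    xl, yl = max(a[0], b[0]), max(a[1], b[1])
    xu, yu = min(a[2], b[2]), min(a[3], b[3])
    return (xl, yl, xu, yu) if xl <= xu and yl <= yu else None


def restricted_join(r_entries: NodeEntries, s_entries: NodeEntries):
    """Mark entries that touch the common area, then join only the marked ones."""
    common = intersection(mbr(r_entries), mbr(s_entries))
    if common is None:
        return [], [], []
    marked_r = [e for e in r_entries if intersects(e[1], common)]   # one scan of R
    marked_s = [e for e in s_entries if intersects(e[1], common)]   # one scan of S
    pairs = [(rid, sid) for rid, rr in marked_r for sid, sr in marked_s
             if intersects(rr, sr)]
    return marked_r, marked_s, pairs


r_node = [("r1", (0, 0, 2, 2)), ("r2", (7, 1, 9, 3)), ("r3", (9, 6, 10, 10))]
s_node = [("s1", (8, 2, 11, 4)), ("s2", (15, 5, 18, 10))]
marked_r, marked_s, pairs = restricted_join(r_node, s_node)
```

With these sample nodes, 2 entries of R and 1 entry of S are marked, so 2 pairwise tests replace the 6 a plain nested loop would need, at the cost of scanning the 5 entries once.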
14Restricting the search space (contd)
(Figure: two intersecting nodes of R and S with numbered entries)
- Original: 7 of R against 7 of S, 49 joins
- Now: 3 of R against 2 of S, 6 joins
- Plus scanning 7 of R and 7 of S, 14 times
15Spatial sorting and plane sweep
- Idea
- Sort the entries in a node of the R-tree according to the spatial location of the corresponding rectangles.
- Then move the Sweep-Line perpendicular to one of the axes from left to right to compute the intersections.
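One way to sketch the sorted intersection test is shown below. It assumes entries are (id, rectangle) pairs sorted by their lower x-coordinate xl, with the sweep-line moving along the x-axis; the function names and all coordinates are illustrative assumptions.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]    # (xl, yl, xu, yu)
NodeEntries = List[Tuple[str, Rect]]


def sorted_intersection_test(rs: NodeEntries, ss: NodeEntries) -> List[Tuple[str, str]]:
    """Report all intersecting (r, s) pairs using a sweep along the x-axis."""
    rs = sorted(rs, key=lambda e: e[1][0])   # sort by xl
    ss = sorted(ss, key=lambda e: e[1][0])
    out: List[Tuple[str, str]] = []

    def internal_loop(t, others, start, t_is_r):
        tid, trect = t
        k = start
        # Only rectangles starting before t ends (xl <= t.xu) can overlap t in x.
        while k < len(others) and others[k][1][0] <= trect[2]:
            oid, orect = others[k]
            if trect[1] <= orect[3] and orect[1] <= trect[3]:   # overlap in y
                out.append((tid, oid) if t_is_r else (oid, tid))
            k += 1

    i = j = 0
    while i < len(rs) and j < len(ss):
        if rs[i][1][0] < ss[j][1][0]:
            internal_loop(rs[i], ss, j, True)    # sweep-line stops at an R rectangle
            i += 1
        else:
            internal_loop(ss[j], rs, i, False)   # sweep-line stops at an S rectangle
            j += 1
    return out


r_node = [("r1", (0, 0, 2, 2)), ("r2", (3, 2, 9, 4)), ("r3", (7, 0, 9, 2))]
s_node = [("s1", (1, 1, 4, 3)), ("s2", (5, 3, 6, 5)), ("s3", (8, 1, 10, 3))]
pairs = sorted_intersection_test(r_node, s_node)
# pairs == [("r1","s1"), ("r2","s1"), ("r2","s2"), ("r2","s3"), ("r3","s3")]
```

Each rectangle is compared only with rectangles of the other node that begin before it ends along the x-axis, so many non-overlapping pairs are never tested.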
16Example of Sorted Intersection Test
- t = r1: r1 <--> s1
- t = s1: s1 <--> r2
- t = r2: r2 <--> s2, r2 <--> s3
- t = s2: -
- t = r3: r3 <--> s3
(Figure: Sweep-Line with labels s1.xl and r1.xu; test condition s1.xl < r1.xu)
17I/O Time Tuning
- To achieve good I/O performance with a buffer size as small as possible
- R-tree might occupy only a small portion of the LRU buffer
- Compute a read schedule of the pages to minimize the number of disk accesses
- Local optimization policy based on spatial locality
- Idea of Read Schedule: If a frequently used page always resides in the buffer, the number of disk accesses can be reduced considerably
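The effect of a read schedule on an LRU buffer can be illustrated with a small simulation. This is a sketch under simple assumptions: a schedule is a list of page names, every miss costs one disk access, and the function name `count_disk_accesses` and both schedules are invented for illustration.

```python
from collections import OrderedDict
from typing import List


def count_disk_accesses(schedule: List[str], buffer_size: int) -> int:
    """Count page misses when the schedule is read through an LRU buffer."""
    buffer: "OrderedDict[str, bool]" = OrderedDict()
    accesses = 0
    for page in schedule:
        if page in buffer:
            buffer.move_to_end(page)          # hit: mark as most recently used
        else:
            accesses += 1                     # miss: page is read from disk
            buffer[page] = True
            if len(buffer) > buffer_size:
                buffer.popitem(last=False)    # evict the least recently used page
    return accesses


# Same page pairs, two different orders, buffer of 2 pages.
good = ["s1", "r1", "s1", "r2", "s1", "r3"]   # s1 is reused before it is evicted
bad = ["s1", "r1", "r2", "s1", "r3", "s1"]    # s1 is evicted and read again
good_cost = count_disk_accesses(good, 2)      # 4
bad_cost = count_disk_accesses(bad, 2)        # 5
```

Both schedules touch the same pages, but the first keeps the frequently used page s1 in the buffer and saves a disk access.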
18Three such techniques
- Local plane sweep
- Local plane sweep with pinning
- Local z-order
19Local Plane-Sweep Order
- Idea
- Based on spatial ordering, the plane-sweep algorithm creates a sequence of pairs of intersecting rectangles.
- This sequence can be used to determine the read schedule of the spatial join.
20Local Plane-Sweep Order (contd)
(Figure: rectangles r1, r2, r3, r4 and s1, s2 with labels 1 to 6 and the resulting read schedule)
21Local Plane-Sweep Order w/ Pinning
- Idea
- Determine a pair (Er, Es) of entries w.r.t. local plane-sweep order. Compute the degree of the rectangles of both entries
- Deg(E.rect) = number of intersections between E.rect and the rectangles which belong to entries of the other tree that are not yet processed
- Pin the page in the buffer whose corresponding rectangle has maximal degree
- Perform spatial join on the pinned page with all other pages
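The degree computation and the pinning decision might look like the sketch below. The function names, the tie-breaking rule (prefer the entry of R when both degrees are equal) and all coordinates are assumptions made for illustration only.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]    # (xl, yl, xu, yu)
NodeEntry = Tuple[str, Rect]


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def degree(rect: Rect, other_unprocessed: List[NodeEntry]) -> int:
    """Number of not-yet-processed rectangles of the other tree that rect intersects."""
    return sum(1 for _, other in other_unprocessed if intersects(rect, other))


def choose_pinned(er: NodeEntry, es: NodeEntry,
                  r_unprocessed: List[NodeEntry],
                  s_unprocessed: List[NodeEntry]) -> Tuple[str, str, int]:
    """Return (tree, entry id, degree) of the page to pin in the buffer."""
    deg_r = degree(er[1], s_unprocessed)
    deg_s = degree(es[1], r_unprocessed)
    # Assumed tie-break: prefer the entry of R when both degrees are equal.
    return ("R", er[0], deg_r) if deg_r >= deg_s else ("S", es[0], deg_s)


# Current pair (Er, Es) and the entries that still have to be processed.
er = ("r1", (0, 0, 2, 2))
es = ("s2", (1, 1, 6, 6))
r_unprocessed = [("r3", (4, 4, 7, 7)), ("r4", (5, 0, 6, 2))]
s_unprocessed = [("s1", (10, 10, 11, 11))]
pinned = choose_pinned(er, es, r_unprocessed, s_unprocessed)
```

In this sample the page of s2 would be pinned, because s2 still intersects two unprocessed rectangles of R while r1 intersects none of S.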
22Local Plane-Sweep Order w/ Pinning (contd)
Er.rect = r1, Es.rect = s2
(Figure: rectangles r1, r2, r3, r4 and s1, s2 with Deg(r1) and Deg(s2) annotated)
23Local Z-Order
- Idea
- Compute the intersections between each rectangle of the one node and all rectangles of the other node
- Sort the rectangles according to the spatial location of their centers
- Decompose the underlying space into cells of equal size and provide an ordering on this set of cells
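A cell ordering of this kind can be sketched with bit interleaving (often called Morton or z-order). The grid size, the choice to put x in the even bits, and the helper names below are assumptions; the numbering of cells used in the slides may follow a different convention.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]    # (xl, yl, xu, yu)
NodeEntry = Tuple[str, Rect]


def z_value(ix: int, iy: int, bits: int = 16) -> int:
    """Interleave the bits of the cell indices (x in even bits, y in odd bits)."""
    z = 0
    for i in range(bits):
        z |= ((ix >> i) & 1) << (2 * i)
        z |= ((iy >> i) & 1) << (2 * i + 1)
    return z


def cell_of(rect: Rect, space: Rect, n: int) -> Tuple[int, int]:
    """Grid cell (of an n x n decomposition of space) containing the centre of rect."""
    cx, cy = (rect[0] + rect[2]) / 2, (rect[1] + rect[3]) / 2
    w, h = (space[2] - space[0]) / n, (space[3] - space[1]) / n
    ix = min(int((cx - space[0]) / w), n - 1)
    iy = min(int((cy - space[1]) / h), n - 1)
    return ix, iy


def z_order_sort(entries: List[NodeEntry], space: Rect, n: int) -> List[NodeEntry]:
    """Sort rectangles by the z-value of the cell that holds their centre."""
    return sorted(entries, key=lambda e: z_value(*cell_of(e[1], space, n)))


space = (0, 0, 8, 8)                     # invented data space, split into 2 x 2 cells
entries = [("a", (5, 5, 7, 7)), ("b", (0, 0, 2, 2)),
           ("c", (5, 0, 7, 2)), ("d", (0, 5, 2, 7))]
ordered_ids = [oid for oid, _ in z_order_sort(entries, space, 2)]
# ordered_ids == ["b", "c", "d", "a"]
```

Rectangles whose centres fall in neighbouring cells end up close together in the sorted sequence, which is what makes the order useful for a read schedule.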
24Local Z-Order (contd)
(Figure: rectangles r1, r2, r3, r4 and s1, s2 over cells I, II, III, IV)
Read schedule: <s1, r2, r1, s2, r4, r3>
25Number of Disk Access
(Figure: number of disk accesses versus size of LRU buffer; data labels 5384, 5290, 2392, 2373)
26Number of Disk Access (contd)
(Figure: number of disk accesses versus size of LRU buffer)
27Q & A
- That's it for the presentation
- Any questions?
28Reference
- Brinkhoff T., Kriegel H.-P., Seeger B. (1993). Efficient Processing of Spatial Joins Using R-trees. Institute of Computer Science, University of Munich. ACM SIGMOD, Washington, DC, USA.