Title: CIS750
1CIS750 Seminar in Advanced Topics in Computer Science
Advanced topics in databases
Multimedia Databases
- V. Megalooikonomou
- Spatial Access Methods (SAMs) I
- (some slides are based on notes by C. Faloutsos)
2General Overview
- Multimedia Indexing
- Spatial Access Methods (SAMs)
- k-d trees
- Point Quadtrees
- MX-Quadtree
- z-ordering
- R-trees
3SAMs - Detailed outline
- spatial access methods
- problem dfn
- k-d trees
- point quadtrees
- MX-quadtrees
- z-ordering
- R-trees
4Spatial Access Methods - problem
- Given a collection of geometric objects (points,
lines, polygons, ...)
- organize them on disk, to answer spatial queries
(like??)
5Spatial Access Methods - problem
- Given a collection of geometric objects (points,
lines, polygons, ...)
- organize them on disk, to answer
- point queries
- range queries
- k-nn queries
- spatial joins (all pairs queries)
9Spatial Access Methods - problem
- Given a collection of geometric objects (points,
lines, polygons, ...)
- organize them on disk, to answer
- point queries
- range queries
- k-nn queries
- spatial joins (all pairs within distance ε)
10SAMs - motivation
[Figure: two panels, "traditional DB" (axes: age, salary) and "GIS"]
13SAMs - motivation
- CAD/CAM: find elements too close to each other
15SAMs - motivation
[Figure: time sequences S1, ..., Sn (x axis: day, 1 to 365) mapped to
feature points F(S1), ..., F(Sn); feature axes: e.g., avg and e.g., std]
16SAMs solutions
- k-d trees
- point quadtrees
- MX-quadtrees
- z-ordering
- R-trees
- (grid files)
- Q: how would you organize, e.g., n-dim points, on
disk? (C points per disk page)
18k-d trees
- Used to store k-dimensional point data
- It is not used to store region data
- A 2-d tree (i.e., for k = 2) stores 2-dimensional
point data, while a 3-d tree stores 3-dimensional
point data, etc.
19 2-d trees: node structure
- Binary trees
- Info: information field
- Xval, Yval: coordinates of a point associated with
the node
- Llink, Rlink: pointers to children
- Properties (N: node)
- If level(N) is even ->
- for all nodes M in the subtree rooted at N.Llink:
M.Xval < N.Xval
- for all nodes P in the subtree rooted at N.Rlink:
P.Xval >= N.Xval
- If level(N) is odd ->
- Similarly, use Yvals
20 2-d trees: Example
21 2-d trees: Insertion/Search
- To insert a node N into the tree pointed to by T:
- If N and T agree on Xval, Yval, then overwrite T
- Else, branch left if N.Xval < T.Xval, right
otherwise (even levels)
- Similarly for odd levels (branching on Yvals)
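The insertion and search rules above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the names `Node`, `insert` and `search` are made up here, the root is assumed to be at level 0, and ties on the split coordinate are sent to the right subtree (as the "right otherwise" rule suggests).

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    info: Any                          # Info: information field
    xval: float                        # Xval, Yval: point stored at this node
    yval: float
    llink: Optional["Node"] = None     # Llink, Rlink: pointers to children
    rlink: Optional["Node"] = None


def goes_left(t, x, y, level):
    """Even levels compare Xvals, odd levels compare Yvals (root = level 0)."""
    return x < t.xval if level % 2 == 0 else y < t.yval


def insert(t, info, x, y, level=0):
    """Insert (x, y) into the tree rooted at t and return the root."""
    if t is None:
        return Node(info, x, y)
    if (t.xval, t.yval) == (x, y):
        t.info = info                  # same point: overwrite
        return t
    if goes_left(t, x, y, level):
        t.llink = insert(t.llink, info, x, y, level + 1)
    else:
        t.rlink = insert(t.rlink, info, x, y, level + 1)
    return t


def search(t, x, y):
    """Return the node that stores (x, y), or None."""
    level = 0
    while t is not None and (t.xval, t.yval) != (x, y):
        t = t.llink if goes_left(t, x, y, level) else t.rlink
        level += 1
    return t


# The five cities of the insertion example, in the order listed there.
root = None
for name, x, y in [("Banja Luka", 19, 45), ("Derventa", 40, 50),
                   ("Toslic", 38, 38), ("Tuzla", 54, 35), ("Sinj", 4, 4)]:
    root = insert(root, name, x, y)
```

With this insertion order, Banja Luka becomes the root, Sinj its left child and Derventa its right child; a different order would give a differently shaped tree.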
22 2-d trees: Example of Insertion
City (Xval, Yval)
Banja Luka (19, 45)
Derventa (40, 50)
Toslic (38, 38)
Tuzla (54, 35)
Sinj (4, 4)
Splitting of region by Banja Luka
Splitting of region by Derventa
Splitting of region by Toslic
Splitting of region by Sinj
23 2-d trees: Deletion
- Deletion of point (x,y) from T: let N be the node
that stores (x,y)
- If N is a leaf node: easy
- Otherwise, either Tl (left subtree) or Tr (right
subtree) is non-empty
- Find a candidate replacement node R in Tl or Tr
- Replace all of N's non-link fields by those of R
- Recursively delete R from Ti (the subtree, Tl or
Tr, in which R was found)
- Recursion guaranteed to terminate - Why?
24 2-d trees: Deletion
- Finding candidate replacement nodes for deletion:
- Replacement node R must bear the same spatial
relation to all nodes in Tl and Tr as node N
25 2-d trees: Range Queries
- Q: Given a point (xc, yc) and a distance r, find
all points in the 2-d tree that lie within the
circle
- A: Each node N in a 2-d tree implicitly
represents a region RN. If the circle (specified
by the query) has no intersection with RN, then
there is no point in searching the subtree rooted
at node N
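A hedged sketch of this pruning rule follows. The names are hypothetical, the region RN is tracked as a bounding box passed down the recursion, and the box for a left subtree is taken as closed, which is slightly conservative but never misses a point.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    xval: float
    yval: float
    llink: Optional["Node"] = None
    rlink: Optional["Node"] = None


def circle_hits_region(xc, yc, r, xlo, xhi, ylo, yhi):
    """True if the query circle intersects the box [xlo, xhi] x [ylo, yhi]."""
    dx = max(xlo - xc, 0, xc - xhi)    # gap between centre and box along x
    dy = max(ylo - yc, 0, yc - yhi)    # gap between centre and box along y
    return dx * dx + dy * dy <= r * r


def range_query(n, xc, yc, r, level=0,
                xlo=-math.inf, xhi=math.inf, ylo=-math.inf, yhi=math.inf):
    """Points within distance r of (xc, yc) in the subtree rooted at n."""
    if n is None or not circle_hits_region(xc, yc, r, xlo, xhi, ylo, yhi):
        return []                      # RN misses the circle: skip the subtree
    found = []
    if (n.xval - xc) ** 2 + (n.yval - yc) ** 2 <= r * r:
        found.append((n.xval, n.yval))
    if level % 2 == 0:                 # N splits RN with the line x = N.xval
        found += range_query(n.llink, xc, yc, r, level + 1, xlo, n.xval, ylo, yhi)
        found += range_query(n.rlink, xc, yc, r, level + 1, n.xval, xhi, ylo, yhi)
    else:                              # N splits RN with the line y = N.yval
        found += range_query(n.llink, xc, yc, r, level + 1, xlo, xhi, ylo, n.yval)
        found += range_query(n.rlink, xc, yc, r, level + 1, xlo, xhi, n.yval, yhi)
    return found


# Tree of the five example cities (Banja Luka at the root).
root = Node(19, 45,
            Node(4, 4),
            Node(40, 50, Node(38, 38, None, Node(54, 35))))
hits = range_query(root, 40, 40, 12)
```

For the query circle centred at (40, 40) with r = 12, the whole left subtree of the root is skipped, because its region (x <= 19) is more than 12 units away from the centre.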
27Point Quadtrees
- Represent point data
- Always split regions into 4 parts
- 2-d tree: a node N splits a region into two by
drawing one line through the point (N.xval,
N.yval)
- Point quadtree: a node N splits a region by
drawing a horizontal and a vertical line through
the point (N.xval, N.yval)
- Four parts: NW, SW, NE, and SE quadrants
- Q: Quadtree nodes have 4 children?
28Point Quadtrees
- Nodes in point quadtrees represent regions
29Point quadtrees - Insertion
City (Xval, Yval)
Banja Luka (19, 45)
Derventa (40, 50)
Toslic (38, 38)
Tuzla (54, 35)
Sinj (4, 4)
Splitting of region by Banja Luka
Splitting of region by Derventa
Splitting of region by Toslic
Splitting of region by Sinj
Splitting of region by Tuzla
30Point Quadtrees - Insertion
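A minimal sketch of point-quadtree insertion, using the five cities listed above. The class and function names are illustrative, and the tie-breaking rule (points on a dividing line go to the N or E side) is an assumption, since the slides do not state one.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


def empty_children():
    return {"NW": None, "SW": None, "NE": None, "SE": None}


@dataclass
class QNode:
    info: Any
    xval: float
    yval: float
    child: Dict[str, Optional["QNode"]] = field(default_factory=empty_children)


def quadrant(n, x, y):
    """Quadrant of n's region that holds (x, y); ties go N / E (assumption)."""
    ns = "N" if y >= n.yval else "S"
    ew = "E" if x >= n.xval else "W"
    return ns + ew


def insert(t, info, x, y):
    """Insert (x, y) into the point quadtree rooted at t and return the root."""
    if t is None:
        return QNode(info, x, y)
    if (t.xval, t.yval) == (x, y):
        t.info = info                  # same point: overwrite
        return t
    q = quadrant(t, x, y)
    t.child[q] = insert(t.child[q], info, x, y)
    return t


root = None
for name, x, y in [("Banja Luka", 19, 45), ("Derventa", 40, 50),
                   ("Toslic", 38, 38), ("Tuzla", 54, 35), ("Sinj", 4, 4)]:
    root = insert(root, name, x, y)
```

Each node has four child slots, but any of them may be empty: here Banja Luka has NE, SE and SW children and no NW child.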
32Point quadtrees Deletion
- Deletion of point (x,y) from T: let N be the node
that stores (x,y)
- If N is a leaf node: easy
- Otherwise, a subtree (N.NW, N.SW, N.NE, N.SE) is
non-empty
- Find a candidate replacement node R in one of
the subtrees such that
- Every other node R1 in N.NW is to the NW of R
- Every other node R2 in N.SW is to the SW of R
- etc.
- Replace all of N's non-link fields by those of R
- Recursively delete R from Ti
- In general, it may not always be possible to find
such a replacement node
- Q: What happens in the worst case? A: May require
all nodes to be reinserted
33Point quadtrees Range Searches
- Each node in a point quadtree represents a region
- Do not search regions that do not intersect the
circle defined by the query
35MX-Quadtrees
- Drawbacks of 2-d trees, point quadtrees:
- shape of tree depends upon the order in which
objects are inserted into the tree
- splits may be uneven, depending upon where the
point (N.xval, N.yval) is located inside the
region (represented by N)
- MX-quadtrees: shape (and height) of tree
independent of number of nodes and order of
insertion
36MX-Quadtrees
- Assumption: the map is represented as a grid of
size (2^k x 2^k) for some k
- When a region gets split, it splits down the
middle
37MX-Quadtrees - Insertion
[Figure: the MX-quadtree after insertion of A, B, C, and D, respectively]
39MX-Quadtrees - Deletion
- Fairly easy: why?
- All points are represented at the leaf level
- Total time for deletion: O(k)
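The sketch below shows why both operations take O(k) steps. It assumes k >= 1 and integer cell coordinates in 0 .. 2^k - 1, stores internal nodes as plain dictionaries, and uses a made-up class name, `MXQuadtree`; none of these choices come from the slides.

```python
class MXQuadtree:
    """Points live only at depth k; every split is down the middle."""

    def __init__(self, k):
        self.k = k                     # the map is a 2^k x 2^k grid, k >= 1
        self.root = {}                 # internal node: quadrant -> child

    def _path(self, x, y):
        """The k quadrant labels leading from the root to cell (x, y)."""
        path = []
        for bit in range(self.k - 1, -1, -1):
            ns = "N" if (y >> bit) & 1 else "S"
            ew = "E" if (x >> bit) & 1 else "W"
            path.append(ns + ew)
        return path

    def insert(self, x, y, info):
        path = self._path(x, y)
        node = self.root
        for q in path[:-1]:            # create internal nodes on the way down
            node = node.setdefault(q, {})
        node[path[-1]] = info          # the point itself sits at the leaf level

    def delete(self, x, y):
        """Remove the point at (x, y); return False if it is not stored."""
        path = self._path(x, y)
        trail, node = [], self.root
        for q in path[:-1]:
            if q not in node:
                return False
            trail.append((node, q))
            node = node[q]
        if path[-1] not in node:
            return False
        del node[path[-1]]
        for parent, q in reversed(trail):   # collapse ancestors left empty
            if parent[q]:
                break
            del parent[q]
        return True
```

Both methods walk exactly one root-to-leaf path of length k, independent of how many points are stored and of the order in which they were inserted.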
40MX-Quadtrees Range Queries
- Same as in point quadtrees
- One difference
- Checking to see if a point is in the circle
defined by the range query needs to be performed
at the leaf level (points are stored at the leaf
level)
42z-ordering
- Q: how would you organize, e.g., n-dim points, on
disk? (C points per disk page)
- Hint: reduce the problem to 1-d points (!!)
- Q1: why?
- A: B-trees!
- Q2: how?
44z-ordering
- Q2: how?
- A: assume finite granularity; z-ordering =
bit-shuffling = N-trees = Morton keys =
geo-coding = ...
45z-ordering
- Q2: how?
- A: assume finite granularity (e.g., 2^32 x 2^32; 4x4
here)
- Q2.1: how to map n-d cells to 1-d cells?
46z-ordering
- Q2.1: how to map n-d cells to 1-d cells?
- A: row-wise
- Q: is it good?
- A: great for x axis; bad for y axis
49z-ordering
- Q: How about the snake curve?
- A: still problems
[Figure: snake curve on a 2^32 x 2^32 grid]
51z-ordering
- Q: Why are those curves bad?
- A: no distance preservation (~ clustering)
- Q: solution?
52z-ordering
- Q: solution? (w/ good clustering, and easy to
compute, for 2-d and n-d?)
- A: z-ordering/bit-shuffling/linear-quadtrees
- looks better:
- few long jumps
- scoops out the whole quadrant before leaving it
- a.k.a. space filling curves
54z-ordering
- z-ordering/bit-shuffling/linear-quadtrees
- Q: How to generate this curve (z = f(x,y))?
- A: 3 (equivalent) answers!
55z-ordering
- A1: 'z' (or 'N') shapes, RECURSIVELY
[Figure: order-1, order-2, ..., order-(n+1) curves]
56z-ordering
- Notice:
- self similar (we'll see about fractals, soon)
- method is hard to use: z =? f(x,y)
57z-ordering
- Method 2?
58z-ordering
[Figure: 4x4 grid; x and y axes each labeled 00, 01, 10, 11]
- How about the reverse: (x,y) = g(z)?
- How about n-d spaces?
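Method 2 (bit-shuffling) and the two follow-up questions can be sketched as below. The function names are hypothetical. The bit order, with the first coordinate contributing the more significant bit of each pair, is an assumption chosen so that shuffling x = 11 and y = 10 gives (1110)2 = 14, as in the drill later in these slides.

```python
def z_value(coords, bits):
    """z = f(x, y, ...): interleave the coordinate bits, MSB first."""
    z = 0
    for b in range(bits - 1, -1, -1):
        for c in coords:               # first coordinate gets the higher bit
            z = (z << 1) | ((c >> b) & 1)
    return z


def z_inverse(z, dims, bits):
    """(x, y, ...) = g(z): deal the bits of z back to the coordinates."""
    coords = [0] * dims
    total = dims * bits
    for i in range(total):
        bit = (z >> (total - 1 - i)) & 1
        coords[i % dims] = (coords[i % dims] << 1) | bit
    return tuple(coords)


print(z_value((0b11, 0b10), bits=2))      # 2-d cell x = 11, y = 10
print(z_inverse(14, dims=2, bits=2))      # and back again
print(z_value((1, 2, 3), bits=2))         # the same code handles 3-d cells
```

Nothing in the code is specific to two dimensions: for n-d spaces the loop simply cycles through n coordinates instead of two.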
61z-ordering
- Method 3?
62z-ordering
- linear-quadtrees: assign N->1, S->0, etc.
[Figure: grid halves labeled W = 0, E = 1, S = 0, N = 1]
63z-ordering
- ... and repeat recursively. E.g., z of the gray cell:
- WN;WN = (0101)2 = 5
64z-ordering
- Drill: z-value of grey cell, with the three
methods?
- method 1: 14
- method 2: shuffle(11, 10) = (1110)2 = 14
- method 3: ENES = ... = 14
[Figure: 4x4 grid; halves labeled W = 0, E = 1, S = 0, N = 1]
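To check that the three methods really agree, here is a small sketch that computes the z-value of the drill cell three ways. It assumes the grey cell is at x = 11, y = 10 (read off the shuffle in the drill), and all function names are illustrative.

```python
def z_curve(order, x0=0, y0=0):
    """Method 1: the cells in visiting order, built recursively."""
    if order == 0:
        return [(x0, y0)]
    half = 1 << (order - 1)
    cells = []
    for dx, dy in [(0, 0), (0, 1), (1, 0), (1, 1)]:   # SW, NW, SE, NE
        cells += z_curve(order - 1, x0 + dx * half, y0 + dy * half)
    return cells


def shuffle(x, y, bits):
    """Method 2: interleave the bits of x and y (x bit first)."""
    z = 0
    for b in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> b) & 1) << 1) | ((y >> b) & 1)
    return z


BIT = {"W": 0, "E": 1, "S": 0, "N": 1}


def z_from_path(path):
    """Method 3: linear quadtree, one W/E and one S/N letter per level."""
    z = 0
    for letter in path:
        z = (z << 1) | BIT[letter]
    return z


grey = (0b11, 0b10)                       # assumed position of the grey cell
method1 = z_curve(2).index(grey)          # position along the order-2 curve
method2 = shuffle(*grey, bits=2)
method3 = z_from_path("ENES")
```

Method 1 is the only one that needs the whole curve, which is why the slides call it hard to use as a function z = f(x,y).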
67z-ordering - Detailed outline
- spatial access methods
- z-ordering
- main idea - 3 methods
- use w/ B-trees; algorithms (range, knn queries
...)
- non-point (e.g., region) data
- analysis; variations
- R-trees
68z-ordering - usage algos
- Q1: How to store on disk?
- A: treat z-value as primary key; feed to B-tree
[Figure: map grid with two cities, PGH and SF]
70z-ordering - usage algos
- MAJOR ADVANTAGES w/ B-tree:
- already inside commercial systems (no coding
/debugging!)
- concurrency & recovery is ready
71z-ordering - usage algos
- Q2: queries? (e.g., find city at (0,3))?
- A: find z-value; search B-tree
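As a rough illustration of "treat z-value as primary key", the sketch below keeps (z-value, name) pairs in a sorted Python list and uses `bisect` as a stand-in for the B-tree search. The class `ZIndex` and the city coordinates are made up for the example; a real system would hand the z-values to its existing B-tree.

```python
import bisect


def z_value(x, y, bits):
    """Interleave the bits of x and y (x bit first)."""
    z = 0
    for b in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> b) & 1) << 1) | ((y >> b) & 1)
    return z


class ZIndex:
    """Sorted (z, name) list; bisect plays the role of the B-tree search."""

    def __init__(self, bits):
        self.bits = bits
        self.entries = []

    def insert(self, name, x, y):
        bisect.insort(self.entries, (z_value(x, y, self.bits), name))

    def range_scan(self, zlo, zhi):
        """All names whose z-value lies in [zlo, zhi]."""
        i = bisect.bisect_left(self.entries, (zlo, ""))
        j = bisect.bisect_left(self.entries, (zhi + 1, ""))
        return [name for _, name in self.entries[i:j]]

    def point_query(self, x, y):
        """Find z-value, then search the index for exactly that key."""
        z = z_value(x, y, self.bits)
        return self.range_scan(z, z)


index = ZIndex(bits=2)
index.insert("PGH", 0, 3)      # hypothetical grid positions
index.insert("SF", 3, 1)
```

A point query for (0,3) becomes a lookup of the single key z = 5, and a range of z-values becomes one sequential scan.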
74z-ordering - usage algos
- Q2: range queries?
- A: compute ranges of z-values; use B-tree
[Figure: map grid with PGH and SF; the query covers z-values 9, 11-15]
75z-ordering - usage algos
- Q2: range queries - how to reduce # of
qualifying ranges?
- A: Augment the query!
[Figure: query 9, 11-15 augmented to 8-15]
77z-ordering - usage algos
- Q2: range queries - how to break a query into
ranges?
- A: recursively, quadtree-style; decompose only
non-full quadrants
[Figure: query 9, 11-15 decomposed into 12-15 (full quadrant) and 9, 11]
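The recursive decomposition can be sketched as follows. The function name `z_ranges` and its argument layout are assumptions; the query is taken to be an axis-aligned rectangle of grid cells with inclusive bounds, and the x bit is assumed to be the more significant one of each pair.

```python
def z_ranges(qx, qy, bits):
    """Break the query qx = (xlo, xhi), qy = (ylo, yhi) into z-value ranges."""
    ranges = []

    def visit(x0, y0, size, zlo):
        x1, y1 = x0 + size - 1, y0 + size - 1
        if x1 < qx[0] or x0 > qx[1] or y1 < qy[0] or y0 > qy[1]:
            return                                 # block misses the query
        if qx[0] <= x0 and x1 <= qx[1] and qy[0] <= y0 and y1 <= qy[1]:
            zhi = zlo + size * size - 1            # full block: one z-range
            if ranges and ranges[-1][1] == zlo - 1:
                ranges[-1] = (ranges[-1][0], zhi)  # glue to the previous range
            else:
                ranges.append((zlo, zhi))
            return
        half = size // 2                           # non-full block: decompose
        for i, (dx, dy) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
            visit(x0 + dx * half, y0 + dy * half, half, zlo + i * half * half)

    visit(0, 0, 1 << bits, 0)
    return ranges


print(z_ranges((2, 3), (1, 3), bits=2))   # the query of the example
print(z_ranges((2, 3), (0, 3), bits=2))   # the augmented query
```

On the 4x4 grid, the query x in 2..3, y in 1..3 decomposes into the full quadrant 12-15 plus the cells 9 and 11, giving the ranges 9 and 11-15. Augmenting it by one row (y in 0..3) leaves the single range 8-15.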
81z-ordering - usage algos
- Q3: k-nn queries? (say, 1-nn)?
- A: traverse B-tree; find nn wrt z-values and ...
[Figure: map grid with PGH and SF; the "nn wrt z-value" is marked; z-values 12, 5, 3 are shown]
85z-ordering - usage algos
- Q4: all-pairs queries? (all pairs of cities
within 10 miles from each other?)
- (we'll see spatial joins later: find all PA
counties that intersect a lake)
87z-ordering - regions
- Q: z-value for a region?
- A: 1 or more z-values, by quadtree decomposition
[Figure: grid with regions A, B and C; halves labeled W = 0, E = 1, S = 0, N = 1]
- zB = 11** ("*" = don't care)
- zC = {0010, 1000}
91z-ordering - regions
- Q: How to store in B-tree?
- A: sort (* < 0 < 1)
- Q: How to search (range etc queries)?
- e.g., the 'red' range query (shown in the figure)
- A: break query in z-values; check B-tree
95z-ordering - regions
- Almost identical to range queries for point data,
except for the "don't cares", i.e.,
- z1 = 1100 ?? 11** = z2
- Specifically: does z1 contain/avoid/intersect z2?
- Q: what is the criterion to decide?
- A: Prefix property: let r1, r2 be the
corresponding regions, and let r1 be the smallest
(=> z1 has fewest '*'s). Then:
- r2 will either contain completely, or avoid
completely, r1
- it will contain r1 if z2 is the prefix of z1
- e.g., 1100 vs 11**: region of z1 completely
contained in region of z2
99z-ordering - regions
- Drill (True/False). Given
- z1 = 011001
- z2 = 01
- z3 = 0100
- T/F: r2 contains r1 - TRUE (prefix property)
- T/F: r3 contains r1 - FALSE (disjoint)
- T/F: r3 contains r2 - FALSE (r2 contains r3)
[Figure: grid showing the regions of z2 and z3]
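The prefix test is a one-liner once each z-value is written as a bit string with its trailing don't-cares dropped (so 11** is written "11"). That string encoding and the function name `relation` are assumptions made for this sketch.

```python
def relation(z1, z2):
    """How the regions r1, r2 of two quadtree blocks relate (prefix property)."""
    if z1 == z2:
        return "equal"
    if z1.startswith(z2):
        return "r2 contains r1"        # z2 is a prefix of z1
    if z2.startswith(z1):
        return "r1 contains r2"        # z1 is a prefix of z2
    return "disjoint"                  # blocks never partially overlap


z1, z2, z3 = "011001", "01", "0100"
print(relation(z1, z2))   # r2 contains r1
print(relation(z1, z3))   # disjoint
print(relation(z2, z3))   # r1 contains r2, i.e. the region of z2 contains that of z3
```

Because quadtree blocks are either nested or disjoint, "intersect" can only ever mean "one contains the other".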
103z-ordering - regions
- Spatial joins: find (quickly) all
counties intersecting lakes
- Naive algorithm: O(N * M)
- Something faster?
- Solution: merge the lists of (sorted) z-values,
looking for the prefix property
- footnote 1: '*' needs careful treatment
- footnote 2: need dup. elimination
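One possible way to do the merge is sketched below. It is not necessarily the algorithm the lecture has in mind. It assumes each object has already been decomposed into quadtree blocks, each block given as a bit string with the don't-cares dropped, so that plain string sorting puts a block before every block it contains. A set handles the duplicate elimination. All names and sample data are hypothetical.

```python
def spatial_join(counties, lakes):
    """Pairs (county, lake) that share area; inputs are lists of (z, object id)."""
    tagged = [(z, 0, oid) for z, oid in counties]
    tagged += [(z, 1, oid) for z, oid in lakes]
    tagged.sort()                      # one merged pass over both sorted lists
    open_blocks = ([], [])             # per input: blocks enclosing the current one
    pairs = set()                      # a set eliminates duplicate pairs
    for z, side, oid in tagged:
        for stack in open_blocks:      # close blocks that are not prefixes of z
            while stack and not z.startswith(stack[-1][0]):
                stack.pop()
        for _, other in open_blocks[1 - side]:     # prefix property holds
            pairs.add((oid, other) if side == 0 else (other, oid))
        open_blocks[side].append((z, oid))
    return pairs


counties = [("01", "C1"), ("10", "C1"), ("1100", "C2")]
lakes = [("011001", "L1"), ("0100", "L1"), ("11", "L2"), ("00", "L3")]
result = spatial_join(counties, lakes)
```

Here county C1 meets lake L1 through two different blocks, but the pair is reported once; each block is pushed and popped at most once, so the cost is dominated by the sort rather than by N * M comparisons.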
108z-ordering - variations
- Q: is z-ordering the best we can do?
- A: probably not - occasional long jumps
- Q: then?
- A1: Gray codes
111z-ordering - variations
- A2: Hilbert curve! (a.k.a. Hilbert-Peano curve)
- Looks better (never long jumps). How to derive
it?
[Figure: Hilbert curves of order-1, order-2, ..., order-(n+1)]
114z-ordering - variations
- Q: function for the Hilbert curve (h = f(x,y))?
- A: bit-shuffling, followed by post-processing,
to account for rotations. Linear on # bits.
- See textbook, for pointers to
code/algorithms (e.g., [Jagadish, 90])
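For concreteness, here is a sketch of one widely circulated iterative formulation of h = f(x,y). It is not necessarily the algorithm of the cited reference. It processes one bit of x and y per step (so it is linear in the number of bits) and applies a rotation/flip after each step. The function name and the orientation of the curve are choices made for this example.

```python
def hilbert_value(x, y, order):
    """Position of cell (x, y) along a Hilbert curve on a 2^order x 2^order grid."""
    n = 1 << order
    h = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        h += s * s * ((3 * rx) ^ ry)   # which quadrant, in curve order
        if ry == 0:                    # post-processing: rotate / flip
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return h


# Visit an 8x8 grid in Hilbert order and measure the longest step taken.
order = 3
cells = sorted(((x, y) for x in range(1 << order) for y in range(1 << order)),
               key=lambda c: hilbert_value(c[0], c[1], order))
longest_step = max(abs(a[0] - c[0]) + abs(a[1] - c[1])
                   for a, c in zip(cells, cells[1:]))
```

The longest step between consecutive cells is a single grid move, which is the "never long jumps" property.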
115z-ordering - variations
- Q: how about Hilbert curve in 3-d? n-d?
- A: Exists (and is not unique!). E.g., 3-d, order-1
Hilbert curves (Hamiltonian paths on cube)
[Figure: two such curves, labeled #1 and #2]
117z-ordering - analysis
- Q: How many pieces (quad-tree blocks) per
region?
- A: proportional to perimeter (surface etc)
118z-ordering - analysis
- (How long is the coastline, say, of England?
- Paradox: The answer changes with the yard-stick
-> fractals ...)
119z-ordering - analysis
- Q: Should we decompose a region to full detail
(and store in B-tree)?
- A: NO! approximation with 1-3 pieces/z-values is
best [Orenstein90]
121z-ordering - analysis
- Q: how to measure the goodness of a curve?
- A: e.g., avg. # of runs, for range queries
[Figure: two example range queries, one needing 4 runs and one needing 3 runs]
(# runs ~ # disk accesses on B-tree)
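Counting runs is easy to sketch. The helper names below are hypothetical, and the query is an arbitrary example on a 4x4 grid, not the one drawn on the slide. It compares the row-wise mapping with z-ordering (x bit first).

```python
def count_runs(keys):
    """Number of maximal runs of consecutive integers among the keys."""
    keys = sorted(keys)
    return sum(1 for i, k in enumerate(keys) if i == 0 or k != keys[i - 1] + 1)


def row_wise(x, y, bits):
    return y * (1 << bits) + x


def z_value(x, y, bits):
    z = 0
    for b in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> b) & 1) << 1) | ((y >> b) & 1)
    return z


def runs_for_query(curve, xs, ys, bits):
    """Runs needed to cover the cells xs x ys under the given mapping."""
    return count_runs(curve(x, y, bits) for x in xs for y in ys)


# Query covering x in 2..3 and y in 1..3 on a 4x4 grid.
print(runs_for_query(row_wise, range(2, 4), range(1, 4), bits=2))
print(runs_for_query(z_value, range(2, 4), range(1, 4), bits=2))
```

For this query the row-wise mapping needs one run per row (3 runs), while z-ordering needs 2 (the ranges 9 and 11-15). Averaging this count over many queries gives the goodness measure described above.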
123z-ordering - analysis
- Q: So, is Hilbert really better?
- A: 27% fewer runs, for 2-d (similar for 3-d)
- Q: are there formulas for # runs, # of quadtree
blocks etc?
- A: Yes (Jagadish; Moon; etc.; see textbook)
124z-ordering - fun observations
- Hilbert and z-ordering curves: space filling
curves: eventually, they visit every point
in n-d space - therefore
- ... they show that the plane has as many points
as a line (-> headaches for 1900's
mathematics/topology). (fractals, again!)
126z-ordering - fun observations
- Observation 2: Hilbert(-like) curve for video
encoding [Y. Matias, CRYPTO 87]
- Given a frame, visit its pixels in randomized
Hilbert order; compress; and transmit
127z-ordering - fun observations
- In general, Hilbert curve is great for preserving
distances, clustering, vector quantization etc
128Conclusions
- z-ordering is a great idea (n-d points -> 1-d
points; feed to B-trees)
- used by TIGER system and (most probably) by other
GIS products
- works great with low-dim points