CIS750 - PowerPoint PPT Presentation

About This Presentation
Title:

CIS750

Description:

Spatial Access Methods (SAMs) I (some slides are based on notes by C. Faloutsos) ... SAMs - Detailed outline. spatial access methods. problem dfn. k-d trees ...

Slides: 129
Provided by: Vas111
Learn more at: https://cis.temple.edu

Transcript and Presenter's Notes

Title: CIS750


1
CIS750 Seminar in Advanced Topics in Computer
Science: Advanced topics in databases
Multimedia Databases
  • V. Megalooikonomou
  • Spatial Access Methods (SAMs) I
  • (some slides are based on notes by C. Faloutsos)

2
General Overview
  • Multimedia Indexing
  • Spatial Access Methods (SAMs)
  • k-d trees
  • Point Quadtrees
  • MX-Quadtree
  • z-ordering
  • R-trees

3
SAMs - Detailed outline
  • spatial access methods
  • problem dfn
  • k-d trees
  • point quadtrees
  • MX-quadtrees
  • z-ordering
  • R-trees

4
Spatial Access Methods - problem
  • Given a collection of geometric objects (points,
    lines, polygons, ...)
  • organize them on disk, to answer spatial queries
    (like??)

5
Spatial Access Methods - problem
  • Given a collection of geometric objects (points,
    lines, polygons, ...)
  • organize them on disk, to answer
  • point queries
  • range queries
  • k-nn queries
  • spatial joins (all pairs queries)

9
Spatial Access Methods - problem
  • Given a collection of geometric objects (points,
    lines, polygons, ...)
  • organize them on disk, to answer
  • point queries
  • range queries
  • k-nn queries
  • spatial joins (all pairs within ε)

10
SAMs - motivation
  • Q: applications?

11
SAMs - motivation
traditional DB
GIS
age
salary
13
SAMs - motivation
CAD/CAM
find elements too close to each other
14
SAMs - motivation
CAD/CAM
15
SAMs - motivation
[Figure: time sequences S1 ... Sn over days 1 to 365, each mapped to a feature point F(S1) ... F(Sn); example features: avg, std]
16
SAMs solutions
  • K-d trees
  • point quadtrees
  • MX-quadtrees
  • z-ordering
  • R-trees
  • (grid files)
  • Q: how would you organize, e.g., n-dim points, on
    disk? (C points per disk page)

17
SAMs - Detailed outline
  • spatial access methods
  • problem dfn
  • k-d trees
  • point quadtrees
  • MX-quadtrees
  • z-ordering
  • R-trees

18
k-d trees
  • Used to store k dimensional point data
  • It is not used to store region data
  • A 2-d tree (i.e., for k = 2) stores 2-dimensional
    point data while a 3-d tree stores 3-dimensional
    point data, etc.

19
2-d trees: node structure
  • Binary trees
  • Info: information field
  • Xval, Yval: coordinates of a point associated with
    the node
  • Llink, Rlink: pointers to children
  • Properties (N: node)
  • If level(N) is even ->
  • for all nodes M in the subtree rooted at N.Llink:
    M.Xval < N.Xval
  • for all nodes P in the subtree rooted at N.Rlink:
    P.Xval >= N.Xval
  • If level(N) is odd ->
  • Similarly, use Yvals

20
2-d trees Example
21
2-d trees Insertion/Search
  • To insert a node N into the tree pointed to by T:
  • If N and T agree on Xval and Yval, then overwrite T
  • Else, branch left if N.Xval < T.Xval, right
    otherwise (even levels)
  • Similarly for odd levels (branching on Yvals)
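
The insertion rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the names `Node`, `insert` and `search` are assumptions, and the level counter is passed explicitly, with the root at level 0.

```python
class Node:
    """A 2-d tree node: Info, (Xval, Yval), and Llink/Rlink as on the slides."""
    def __init__(self, info, xval, yval):
        self.info, self.xval, self.yval = info, xval, yval
        self.llink = None
        self.rlink = None

def insert(t, n, level=0):
    """Insert node n into the tree rooted at t; return the (possibly new) root."""
    if t is None:
        return n
    if (n.xval, n.yval) == (t.xval, t.yval):
        t.info = n.info                      # same point: overwrite
        return t
    # even levels branch on Xval, odd levels branch on Yval
    go_left = n.xval < t.xval if level % 2 == 0 else n.yval < t.yval
    if go_left:
        t.llink = insert(t.llink, n, level + 1)
    else:
        t.rlink = insert(t.rlink, n, level + 1)
    return t

def search(t, x, y, level=0):
    """Follow the same branching rule; return the node at (x, y) or None."""
    while t is not None:
        if (x, y) == (t.xval, t.yval):
            return t
        go_left = x < t.xval if level % 2 == 0 else y < t.yval
        t = t.llink if go_left else t.rlink
        level += 1
    return None

# The five cities of the insertion example, in the order listed there.
root = None
for name, x, y in [("Banja Luka", 19, 45), ("Derventa", 40, 50),
                   ("Toslic", 38, 38), ("Tuzla", 54, 35), ("Sinj", 4, 4)]:
    root = insert(root, Node(name, x, y))
```

Inserting the cities in a different order would give a differently shaped tree, which is the drawback the MX-quadtree slides return to.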

22
2-d trees Example of Insertion
City (Xval, Yval)
Banja Luka (19, 45)
Derventa (40, 50)
Toslic (38, 38)
Tuzla (54, 35)
Sinj (4, 4)
Splitting of region by Banja Luka
Splitting of region by Derventa
Splitting of region by Toslic
Splitting of region by Sinj
23
2-d trees Deletion
  • Deletion of point (x,y) from T: let N be the node
    holding (x,y)
  • If N is a leaf node: easy
  • Otherwise, either Tl (left subtree) or Tr (right
    subtree) is non-empty
  • Find a candidate replacement node R in Tl or
    Tr
  • Replace all of N's non-link fields by those of R
  • Recursively delete R from Ti (the subtree, Tl or
    Tr, in which R was found)
  • Recursion guaranteed to terminate - why?

24
2-d trees Deletion
  • Finding candidate replacement nodes for deletion
  • Replacement node R must bear same spatial
    relation to all nodes in Tl and Tr as node N

25
2-d trees Range Queries
  • Q: Given a point (xc, yc) and a distance r, find
    all points in the 2-d tree that lie within the
    circle
  • A: Each node N in a 2-d tree implicitly
    represents a region R_N. If the circle (specified
    by the query) has no intersection with R_N, then
    there is no point in searching the subtree rooted
    at node N
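
A minimal sketch of this pruning idea follows. The region of each node is carried down the recursion as a bounding box `(x0, y0, x1, y1)`; the helper names, the box representation and the simplified `insert` (no duplicate handling) are assumptions made for illustration.

```python
import math

class Node:
    def __init__(self, x, y):
        self.x, self.y, self.left, self.right = x, y, None, None

def insert(t, x, y, level=0):
    """Plain 2-d tree insertion (duplicates are not handled in this sketch)."""
    if t is None:
        return Node(x, y)
    key, tkey = (x, t.x) if level % 2 == 0 else (y, t.y)
    if key < tkey:
        t.left = insert(t.left, x, y, level + 1)
    else:
        t.right = insert(t.right, x, y, level + 1)
    return t

def circle_hits_box(xc, yc, r, box):
    """True if the circle intersects the (closed) box x0 <= x <= x1, y0 <= y <= y1."""
    x0, y0, x1, y1 = box
    dx = max(x0 - xc, 0, xc - x1)   # distance from the centre to the box along x
    dy = max(y0 - yc, 0, yc - y1)   # distance from the centre to the box along y
    return dx * dx + dy * dy <= r * r

def range_query(t, xc, yc, r, box=(-math.inf, -math.inf, math.inf, math.inf),
                level=0, out=None):
    """Collect all points within distance r of (xc, yc), skipping subtrees
    whose region does not intersect the query circle."""
    if out is None:
        out = []
    if t is None or not circle_hits_box(xc, yc, r, box):
        return out
    if (t.x - xc) ** 2 + (t.y - yc) ** 2 <= r * r:
        out.append((t.x, t.y))
    x0, y0, x1, y1 = box
    if level % 2 == 0:   # node splits its region with a vertical line at t.x
        lbox, rbox = (x0, y0, t.x, y1), (t.x, y0, x1, y1)
    else:                # node splits its region with a horizontal line at t.y
        lbox, rbox = (x0, y0, x1, t.y), (x0, t.y, x1, y1)
    range_query(t.left, xc, yc, r, lbox, level + 1, out)
    range_query(t.right, xc, yc, r, rbox, level + 1, out)
    return out

pts = [(19, 45), (40, 50), (38, 38), (54, 35), (4, 4)]
root = None
for x, y in pts:
    root = insert(root, x, y)
hits = sorted(range_query(root, 40, 45, 10))
```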

26
SAMs - Detailed outline
  • spatial access methods
  • problem dfn
  • k-d trees
  • point quadtrees
  • z-ordering
  • R-trees

27
Point Quadtrees
  • Represent point data
  • Always split regions into 4 parts
  • 2-d tree: a node N splits a region into two by
    drawing one line through the point (N.xval,
    N.yval)
  • Point quadtree: a node N splits a region by
    drawing a horizontal and a vertical line through
    the point (N.xval, N.yval)
  • Four parts: NW, SW, NE, and SE quadrants
  • Q: Quadtree nodes have 4 children?
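
As a rough sketch, a point quadtree node can hold up to four children keyed by quadrant name. The tie-breaking convention below (a point on the dividing line counts as north or east) is an assumption, as are the names `QNode`, `quadrant` and `insert`.

```python
class QNode:
    def __init__(self, info, x, y):
        self.info, self.x, self.y = info, x, y
        # up to four children, one per quadrant
        self.child = {"NW": None, "SW": None, "NE": None, "SE": None}

def quadrant(n, x, y):
    """Quadrant of (x, y) relative to the lines drawn through node n."""
    return ("N" if y >= n.y else "S") + ("W" if x < n.x else "E")

def insert(t, info, x, y):
    """Insert a point; returns the (possibly new) root of the subtree."""
    if t is None:
        return QNode(info, x, y)
    if (x, y) == (t.x, t.y):
        t.info = info            # same point: overwrite
        return t
    q = quadrant(t, x, y)
    t.child[q] = insert(t.child[q], info, x, y)
    return t

root = None
for name, x, y in [("Banja Luka", 19, 45), ("Derventa", 40, 50),
                   ("Toslic", 38, 38), ("Tuzla", 54, 35), ("Sinj", 4, 4)]:
    root = insert(root, name, x, y)
```

With this insertion order the tree has height 3, against 4 for the 2-d tree on the same data, because each node discriminates on both coordinates at once.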

28
Point Quadtrees
  • Nodes in point quadtrees represent regions

29
Point quadtrees - Insertion
City (Xval, Yval)
Banja Luka (19, 45)
Derventa (40, 50)
Toslic (38, 38)
Tuzla (54, 35)
Sinj (4, 4)
Splitting of region by Banja Luka
Splitting of region by Derventa
Splitting of region by Toslic
Splitting of region by Sinj
Splitting of region by Tuzla
30
Point Quadtrees - Insertion
31
Point quadtrees Deletion
  • Deletion of point (x,y) from T: let N be the node
    holding (x,y)
  • If N is a leaf node: easy
  • Otherwise a subtree (N.NW, N.SW, N.NE, N.SE) is
    non-empty
  • Find a candidate replacement node R in one of
    the subtrees such that
  • Every other node R1 in N.NW is to the NW of R
  • Every other node R2 in N.SW is to the SW of R
  • etc.
  • Replace all of N's non-link fields by those of R
  • Recursively delete R from Ti (the subtree in which
    R was found)
  • In general, it may not always be possible to find
    such a replacement node
  • Q: What happens in the worst case?

32
Point quadtrees Deletion
  • Deletion of point (x,y) from T: let N be the node
    holding (x,y)
  • If N is a leaf node: easy
  • Otherwise a subtree (N.NW, N.SW, N.NE, N.SE) is
    non-empty
  • Find a candidate replacement node R in one of
    the subtrees such that
  • Every other node R1 in N.NW is to the NW of R
  • Every other node R2 in N.SW is to the SW of R
  • etc.
  • Replace all of N's non-link fields by those of R
  • Recursively delete R from Ti (the subtree in which
    R was found)
  • In general, it may not always be possible to find
    such a replacement node
  • Q: What happens in the worst case? A: May require
    all nodes to be reinserted

33
Point quadtrees Range Searches
  • Each node in a point quadtree represents a region
  • Do not search regions that do not intersect the
    circle defined by the query

34
SAMs - Detailed outline
  • spatial access methods
  • problem dfn
  • k-d trees
  • point quadtrees
  • MX-quadtrees
  • z-ordering
  • R-trees

35
MX-Quadtrees
  • Drawbacks of 2-d trees, point quadtrees:
  • shape of tree depends upon the order in which
    objects are inserted into the tree
  • splits may be uneven depending upon where the
    point (N.xval, N.yval) is located inside the
    region (represented by N)
  • MX-quadtrees: shape (and height) of tree
    independent of number of nodes and order of
    insertion

36
MX-Quadtrees
  • Assumption: the map is represented as a grid of
    size (2^k x 2^k) for some k
  • When a region gets split, it splits down the
    middle
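
A small sketch of insertion under this assumption: every point sits at depth exactly k, and each step halves the current square. The names `MXNode` and `mx_insert`, and the use of integer cell coordinates with the origin in the SW corner, are illustrative assumptions.

```python
class MXNode:
    def __init__(self):
        self.child = {}    # quadrant name -> MXNode, created on demand
        self.info = None   # set only on leaf-level nodes

def mx_insert(root, k, x, y, info):
    """Insert the integer cell (x, y), 0 <= x, y < 2**k.
    Returns the number of levels descended, which is always k."""
    node, x0, y0, size = root, 0, 0, 1 << k
    depth = 0
    while size > 1:
        half = size // 2                      # regions split down the middle
        east, north = x >= x0 + half, y >= y0 + half
        q = ("N" if north else "S") + ("E" if east else "W")
        if east:
            x0 += half
        if north:
            y0 += half
        node = node.child.setdefault(q, MXNode())
        size = half
        depth += 1
    node.info = info
    return depth

root = MXNode()
depth_a = mx_insert(root, 2, 1, 3, "A")   # 4x4 grid, cell (1, 3)
depth_b = mx_insert(root, 2, 0, 0, "B")
```

Because the split positions depend only on the grid, the resulting tree is the same for any insertion order.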

37
MX-Quadtrees - Insertion
After insertion of A, B, C, and D respectively
38
MX-Quadtrees - Insertion
After insertion of A, B, C, and D respectively
39
MX-Quadtrees - Deletion
  • Fairly easy: why?
  • All points are represented at the leaf level
  • Total time for deletion: O(k)

40
MX-Quadtrees Range Queries
  • Same as in point quadtrees
  • One difference
  • Checking to see if a point is in the circle
    defined by the range query needs to be performed
    at the leaf level (points are stored at the leaf
    level)

41
SAMs - Detailed outline
  • spatial access methods
  • problem dfn
  • k-d trees
  • point quadtrees
  • MX-quadtrees
  • z-ordering
  • R-trees

42
z-ordering
  • Q: how would you organize, e.g., n-dim points, on
    disk? (C points per disk page)
  • Hint: reduce the problem to 1-d points (!!)
  • Q1: why?
  • A:
  • Q2: how?

43
z-ordering
  • Q: how would you organize, e.g., n-dim points, on
    disk? (C points per disk page)
  • Hint: reduce the problem to 1-d points (!!)
  • Q1: why?
  • A: B-trees!
  • Q2: how?

44
z-ordering
  • Q2: how?
  • A: assume finite granularity; z-ordering =
    bit-shuffling = N-trees = Morton keys =
    geo-coding = ...

45
z-ordering
  • Q2: how?
  • A: assume finite granularity (e.g., 2^32 x 2^32; 4x4
    here)
  • Q2.1: how to map n-d cells to 1-d cells?

46
z-ordering
  • Q2.1: how to map n-d cells to 1-d cells?

47
z-ordering
  • Q2.1: how to map n-d cells to 1-d cells?
  • A: row-wise
  • Q: is it good?

48
z-ordering
  • Q: is it good?
  • A: great for x axis; bad for y axis

49
z-ordering
  • Q: How about the snake curve?

50
z-ordering
  • Q: How about the snake curve?
  • A: still problems

[Figure: grid with sides 2^32 and 2^32]
51
z-ordering
  • Q: Why are those curves bad?
  • A: no distance preservation (~ clustering)
  • Q: solution?

[Figure: grid with sides 2^32 and 2^32]
52
z-ordering
  • Q: solution? (w/ good clustering, and easy to
    compute, for 2-d and n-d?)

53
z-ordering
  • Q: solution? (w/ good clustering, and easy to
    compute, for 2-d and n-d?)
  • A: z-ordering/bit-shuffling/linear-quadtrees
  • looks better:
  • few long jumps
  • scoops out the whole quadrant
    before leaving it
  • a.k.a. space filling curves

54
z-ordering
  • z-ordering/bit-shuffling/linear-quadtrees
  • Q: How to generate this curve (z = f(x,y))?
  • A: 3 (equivalent) answers!

55
z-ordering
  • z-ordering/bit-shuffling/linear-quadtrees
  • Q: How to generate this curve (z = f(x,y))?
  • A1: z (or N) shapes, RECURSIVELY

[Figure: z-curves of order-1, order-2, ..., order (n+1)]
56
z-ordering
  • Notice:
  • self similar (we'll see about fractals, soon)
  • method is hard to use: z =? f(x,y)

[Figure: z-curves of order-1 and order-2]
57
z-ordering
  • z-ordering/bit-shuffling/linear-quadtrees
  • Q: How to generate this curve (z = f(x,y))?
  • A: 3 (equivalent) answers!

Method 2?
58
z-ordering
  • bit-shuffling

[Figure: 4x4 grid; x and y axes labeled 00, 01, 10, 11]
59
z-ordering
  • bit-shuffling

How about the reverse: (x,y) = g(z)?
[Figure: 4x4 grid; x and y axes labeled 00, 01, 10, 11]
60
z-ordering
  • bit-shuffling

How about n-d spaces?
[Figure: 4x4 grid; x and y axes labeled 00, 01, 10, 11]
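
Bit-shuffling, its reverse, and the n-d generalization can be sketched as below. The function names are assumptions. The x-bit-first interleaving order (z = x1 y1 x2 y2 ...) is also an assumption, chosen because it agrees with the worked z-values 5 and 14 that appear on the linear-quadtree and drill slides.

```python
def shuffle(x, y, k):
    """z = f(x, y): interleave the k-bit integers x and y as x1 y1 x2 y2 ..."""
    z = 0
    for i in range(k - 1, -1, -1):          # most significant bit first
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z

def unshuffle(z, k):
    """(x, y) = g(z): pull the interleaved bits apart again."""
    x = y = 0
    for i in range(k - 1, -1, -1):
        x = (x << 1) | ((z >> (2 * i + 1)) & 1)
        y = (y << 1) | ((z >> (2 * i)) & 1)
    return x, y

def shuffle_nd(coords, k):
    """n-d case: take one bit from each coordinate in turn."""
    z = 0
    for i in range(k - 1, -1, -1):
        for c in coords:
            z = (z << 1) | ((c >> i) & 1)
    return z
```

For example, `shuffle(0b11, 0b10, 2)` interleaves to binary 1110, and `unshuffle` maps that value back to the pair (3, 2).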
61
z-ordering
  • z-ordering/bit-shuffling/linear-quadtrees
  • Q: How to generate this curve (z = f(x,y))?
  • A: 3 (equivalent) answers!

Method 3?
62
z-ordering
  • linear-quadtrees: assign N->1, S->0 etc.

[Figure: grid with W/E labeled 0/1 and N/S labeled 1/0]
63
z-ordering
  • ... and repeat recursively. Eg.: z_gray-cell =
    WN;WN = (0101)_2 = 5

[Figure: grid with W/E labeled 0/1 and N/S labeled 1/0; quadrants labeled 00 and 11]
64
z-ordering
  • Drill: z-value of grey cell, with the three
    methods?

[Figure: grid with W/E labeled 0/1 and N/S labeled 1/0]
65
z-ordering
  • Drill: z-value of grey cell, with the three
    methods?
  • method 1: 14
  • method 2: shuffle(11, 10) = (1110)_2 = 14

[Figure: grid with W/E labeled 0/1 and N/S labeled 1/0]
66
z-ordering
  • Drill: z-value of grey cell, with the three
    methods?
  • method 1: 14
  • method 2: shuffle(11, 10) = (1110)_2 = 14
  • method 3: EN;ES = ... = 14

[Figure: grid with W/E labeled 0/1 and N/S labeled 1/0]
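
Method 3 can be checked mechanically. The sketch below turns a quadrant path such as EN;ES into a z-value, assuming the W->0, E->1, S->0, N->1 assignment implied by the examples and the x-bit-first order; the function name `path_to_z` is hypothetical.

```python
def path_to_z(path):
    """Convert a quadrant path like 'ENES' or 'WN;WN' to its z-value.
    Each level contributes one W/E letter (x bit) and one N/S letter (y bit)."""
    bit = {"W": "0", "E": "1", "S": "0", "N": "1"}
    letters = [c for c in path if c in bit]   # ignore separators such as ';'
    bits = ""
    for i in range(0, len(letters), 2):
        pair = letters[i:i + 2]
        ew = next(c for c in pair if c in "WE")
        ns = next(c for c in pair if c in "NS")
        bits += bit[ew] + bit[ns]             # x bit first, then y bit
    return int(bits, 2)
```

Both spellings of a level, `EN` and `NE`, give the same bits, since the letters are sorted into x-then-y order before being emitted.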
67
z-ordering - Detailed outline
  • spatial access methods
  • z-ordering
  • main idea - 3 methods
  • use w/ B-trees algorithms (range, knn queries
    ...)
  • non-point (eg., region) data
  • analysis variations
  • R-trees

68
z-ordering - usage algos
  • Q1: How to store on disk?
  • A:
  • Q2: How to answer range queries etc.

69
z-ordering - usage algos
  • Q1: How to store on disk?
  • A: treat z-value as primary key; feed to B-tree

PGH
SF
70
z-ordering - usage algos
  • MAJOR ADVANTAGES w/ B-tree:
  • already inside commercial systems (no
    coding/debugging!)
  • concurrency & recovery are ready

71
z-ordering - usage algos
  • Q2: queries? (eg. find city at (0,3))?

PGH
SF
72
z-ordering - usage algos
  • Q2: queries? (eg. find city at (0,3))?
  • A: find z-value; search B-tree

PGH
SF
73
z-ordering - usage algos
  • Q2: range queries?

PGH
SF
74
z-ordering - usage algos
  • Q2: range queries?
  • A: compute ranges of z-values; use B-tree

PGH
9,11-15
SF
75
z-ordering - usage algos
  • Q2: range queries - how to reduce the # of
    qualifying ranges?

PGH
9,11-15
SF
76
z-ordering - usage algos
  • Q2: range queries - how to reduce the # of
    qualifying ranges?
  • A: Augment the query!

PGH
9,11-15 -> 8-15
SF
77
z-ordering - usage algos
  • Q2: range queries - how to break a query into
    ranges?

9,11-15
78
z-ordering - usage algos
  • Q2: range queries - how to break a query into
    ranges?
  • A: recursively, quadtree-style; decompose only
    non-full quadrants

12-15
9,11-15
79
z-ordering - usage algos
  • Q2: range queries - how to break a query into
    ranges?
  • A: recursively, quadtree-style; decompose only
    non-full quadrants

12-15
9,11-15
9, 11
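
The quadtree-style decomposition can be sketched as a short recursion. The function name `z_ranges`, the inclusive rectangle convention and the x-bit-first child order are assumptions. The rectangle x in [2,3], y in [1,3] on the 4x4 grid is chosen here because, under that bit order, it reproduces the ranges 9, 11-15 shown on the slides.

```python
def z_ranges(qx0, qy0, qx1, qy1, k):
    """Break the inclusive query rectangle [qx0, qx1] x [qy0, qy1] on a
    2^k x 2^k grid into a sorted list of inclusive z-value ranges."""
    out = []

    def visit(x0, y0, size, zlo):
        # This block covers [x0, x0+size) x [y0, y0+size) and the
        # z-values zlo .. zlo + size*size - 1.
        if x0 > qx1 or x0 + size - 1 < qx0 or y0 > qy1 or y0 + size - 1 < qy0:
            return                                   # disjoint: skip
        if qx0 <= x0 and x0 + size - 1 <= qx1 and qy0 <= y0 and y0 + size - 1 <= qy1:
            lo, hi = zlo, zlo + size * size - 1      # full quadrant: one range
            if out and out[-1][1] + 1 == lo:
                out[-1] = (out[-1][0], hi)           # merge with adjacent range
            else:
                out.append((lo, hi))
            return
        half = size // 2                             # non-full: decompose further
        quarter = half * half
        # children in z order: (x bit, y bit) = 00, 01, 10, 11
        for i, (dx, dy) in enumerate(((0, 0), (0, 1), (1, 0), (1, 1))):
            visit(x0 + dx * half, y0 + dy * half, half, zlo + i * quarter)

    visit(0, 0, 1 << k, 0)
    return out

ranges = z_ranges(2, 1, 3, 3, 2)      # the query: two ranges
augmented = z_ranges(2, 0, 3, 3, 2)   # the augmented query: a single range
```

Each returned range corresponds to one contiguous scan of the B-tree, so augmenting the query trades a few false hits for fewer scans.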
80
z-ordering - Detailed outline
  • spatial access methods
  • z-ordering
  • main idea - 3 methods
  • use w/ B-trees algorithms (range, knn queries
    ...)
  • non-point (eg., region) data
  • analysis variations
  • R-trees

81
z-ordering - usage algos
  • Q3: k-nn queries? (say, 1-nn)?

PGH
SF
82
z-ordering - usage algos
  • Q3: k-nn queries? (say, 1-nn)?
  • A: traverse B-tree; find nn wrt z-values and ...

PGH
SF
83
z-ordering - usage algos
  • ... ask a range query.

PGH
SF
nn wrt z-value
12
5
3
84
z-ordering - usage algos
  • ... ask a range query.

PGH
SF
nn wrt z-value
12
5
3
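
A rough sketch of this two-step 1-nn search follows. A sorted Python list stands in for the B-tree, and the final range query is done by a plain scan rather than by z-ranges. Both are simplifications, and the names `shuffle` and `nearest` are assumptions.

```python
import bisect
import math

def shuffle(x, y, k):
    """Bit-interleave x and y (x bit first) into a z-value."""
    z = 0
    for i in range(k - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z

def nearest(points, q, k):
    """1-nn of the query cell q among a non-empty list of (x, y) cells."""
    keyed = sorted((shuffle(x, y, k), (x, y)) for x, y in points)
    zq = shuffle(q[0], q[1], k)
    i = bisect.bisect_left(keyed, (zq, q))
    # Step 1: neighbours of q in z-order (predecessor and successor).
    cands = [keyed[j][1] for j in (i - 1, i) if 0 <= j < len(keyed)]
    r = min(math.dist(q, p) for p in cands)
    # Step 2: range query with radius r; the true nn must lie inside it.
    in_range = [p for _, p in keyed if math.dist(q, p) <= r]
    return min(in_range, key=lambda p: math.dist(q, p))

cities = [(0, 3), (2, 2), (1, 0), (3, 0)]
nn = nearest(cities, (1, 2), 2)
```

In this example the z-order neighbours of the query are (0,3) and (3,0), yet the nearest point is (2,2), which is why the follow-up range query is needed.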
85
z-ordering - usage algos
  • Q4: all-pairs queries? (all pairs of cities
    within 10 miles from each other?)

PGH
SF
(we'll see spatial joins later: find all PA
counties that intersect a lake)
86
z-ordering - Detailed outline
  • spatial access methods
  • z-ordering
  • main idea - 3 methods
  • use w/ B-trees algorithms (range, knn queries
    ...)
  • non-point (eg., region) data
  • analysis variations
  • R-trees
  • ...

87
z-ordering - regions
  • Q: z-value for a region?

z_B = ??, z_C = ??
B
A
C
88
z-ordering - regions
  • Q: z-value for a region?
  • A: 1 or more z-values, by quadtree decomposition

z_B = ??, z_C = ??
89
z-ordering - regions
  • Q: z-value for a region?

z_B = 11** ("*" = don't care), z_C = ??
[Figure: grid with W/E labeled 0/1 and N/S labeled 1/0; quadrants labeled 00 and 11]
90
z-ordering - regions
  • Q: z-value for a region?

z_B = 11** ("*" = don't care), z_C = {0010, 1000}
[Figure: grid with W/E labeled 0/1 and N/S labeled 1/0; quadrants labeled 00 and 11]
91
z-ordering - regions
  • Q: How to store in B-tree?
  • Q: How to search (range etc. queries)?

92
z-ordering - regions
  • Q: How to store in B-tree?
  • A: sort (* < 0 < 1)
  • Q: How to search (range etc. queries)?

93
z-ordering - regions
  • Q: How to search (range etc. queries)?
  • eg.: red range query

94
z-ordering - regions
  • Q: How to search (range etc. queries)?
  • eg.: red range query
  • A: break query into z-values; check B-tree

95
z-ordering - regions
  • Almost identical to range queries for point data,
    except for the don't cares - i.e.,

1100 ?? 11**
96
z-ordering - regions
  • Almost identical to range queries for point data,
    except for the don't cares - i.e.,
  • z1 = 1100 ?? 11** = z2
  • Specifically: does z1 contain/avoid/intersect z2?
  • Q: what is the criterion to decide?

97
z-ordering - regions
  • z1 = 1100 ?? 11** = z2
  • Specifically: does z1 contain/avoid/intersect z2?
  • Q: what is the criterion to decide?
  • A: Prefix property: let r1, r2 be the
    corresponding regions, and let r1 be the smallest
    (=> z1 has fewest *s). Then:

98
z-ordering - regions
  • r2 will either contain completely, or avoid
    completely, r1.
  • it will contain r1 if z2 is the prefix of z1

1100 ?? 11**
region of z1 completely contained in region of z2
99
z-ordering - regions
  • Drill (True/False). Given:
  • z1 = 011001
  • z2 = 01
  • z3 = 0100
  • T/F r2 contains r1
  • T/F r3 contains r1
  • T/F r3 contains r2

100
z-ordering - regions
  • Drill (True/False). Given:
  • z1 = 011001
  • z2 = 01
  • z3 = 0100
  • T/F r2 contains r1 - TRUE (prefix property)
  • T/F r3 contains r1 - FALSE (disjoint)
  • T/F r3 contains r2 - FALSE (r2 contains r3)
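
The drill can be verified with a two-line prefix test. In this sketch z-values are written as bit strings with the trailing don't-care symbols dropped; that representation, and the names `contains` and `disjoint`, are assumptions.

```python
def contains(za, zb):
    """True if the region of za contains the region of zb,
    i.e. za is a prefix of zb (trailing '*'s are omitted)."""
    return zb.startswith(za)

def disjoint(za, zb):
    """By the prefix property, two quadtree blocks either nest or avoid each other."""
    return not contains(za, zb) and not contains(zb, za)

z1, z2, z3 = "011001", "01", "0100"
answers = (contains(z2, z1), contains(z3, z1), contains(z3, z2))
```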

101
z-ordering - regions
  • Drill (True/False). Given:
  • z1 = 011001
  • z2 = 01
  • z3 = 0100

[Figure: region of z2]
102
z-ordering - regions
  • Drill (True/False). Given:
  • z1 = 011001
  • z2 = 01
  • z3 = 0100
  • T/F r2 contains r1 - TRUE (prefix property)
  • T/F r3 contains r1 - FALSE (disjoint)
  • T/F r3 contains r2 - FALSE (r2 contains r3)

[Figure: regions of z2 and z3]
103
z-ordering - regions
  • Spatial joins: find (quickly) all
    counties intersecting lakes

104
z-ordering - regions
  • Spatial joins: find (quickly) all
    counties intersecting lakes
  • Naive algorithm: O(N * M)
  • Something faster?

105
z-ordering - regions
  • Spatial joins: find (quickly) all
    counties intersecting lakes

106
z-ordering - regions
  • Spatial joins: find (quickly) all
    counties intersecting lakes
  • Solution: merge the lists of (sorted) z-values,
    looking for the prefix property
  • footnote 1: '*' needs careful treatment
  • footnote 2: need dup. elimination
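
One way to sketch the merge is a single pass over both sorted lists with a stack of currently "open" prefixes. This is an illustrative variant, not necessarily the algorithm the slides have in mind. z-values are bit strings without trailing '*'s, so ordinary string sorting already places a prefix before its extensions, and a set handles duplicate elimination. The names and the sample data are hypothetical.

```python
def spatial_join(counties, lakes):
    """counties, lakes: dict name -> list of z-value strings (quadtree blocks).
    Returns the set of (county, lake) pairs whose blocks nest."""
    items = sorted([(z, 0, name) for name, zs in counties.items() for z in zs] +
                   [(z, 1, name) for name, zs in lakes.items() for z in zs])
    stack, pairs = [], set()
    for z, side, name in items:
        # Drop open blocks that are not a prefix of z; in sorted order they
        # cannot be a prefix of any later z-value either.
        while stack and not z.startswith(stack[-1][0]):
            stack.pop()
        # Every block still on the stack contains the current block.
        for pz, pside, pname in stack:
            if pside != side:
                pairs.add((pname, name) if side == 1 else (name, pname))
        stack.append((z, side, name))
    return pairs

counties = {"A": ["01"], "B": ["1100", "1101"]}
lakes = {"L1": ["0110", "0111"], "L2": ["11"], "L3": ["00"]}
result = spatial_join(counties, lakes)
```

After the initial sort, the pass itself touches each z-value once, apart from the work spent walking the stack of nested blocks.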

107
z-ordering - Detailed outline
  • spatial access methods
  • z-ordering
  • main idea - 3 methods
  • use w/ B-trees algorithms (range, knn queries
    ...)
  • non-point (eg., region) data
  • analysis variations
  • R-trees

108
z-ordering - variations
  • Q: is z-ordering the best we can do?

109
z-ordering - variations
  • Q: is z-ordering the best we can do?
  • A: probably not - occasional long jumps
  • Q: then?

110
z-ordering - variations
  • Q: is z-ordering the best we can do?
  • A: probably not - occasional long jumps
  • Q: then? A1: Gray codes

111
z-ordering - variations
  • A2: Hilbert curve! (a.k.a. Hilbert-Peano curve)

112
z-ordering - variations
  • Looks better (never long jumps). How to derive
    it?

113
z-ordering - variations
  • Looks better (never long jumps). How to derive
    it?

[Figure: Hilbert curves of order-1, order-2, ..., order (n+1)]
114
z-ordering - variations
  • Q: function for the Hilbert curve (h = f(x,y))?
  • A: bit-shuffling, followed by post-processing,
    to account for rotations. Linear on # bits.
  • See textbook, for pointers to
    code/algorithms (eg., [Jagadish, 90])
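
For concreteness, here is a sketch of one widely circulated iterative conversion from (x, y) to a Hilbert index. It is not claimed to be the algorithm of [Jagadish, 90]; the function name `hilbert_index` is an assumption, and n (the grid side) must be a power of two. At each bit position it picks the quadrant, then rotates or reflects the remaining bits, which matches the "bit-shuffling plus post-processing" description.

```python
def hilbert_index(n, x, y):
    """Hilbert value h = f(x, y) on an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)   # which quadrant, in curve order
        # Post-processing: rotate/reflect so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Cells of an 8x8 grid listed in Hilbert order.
order = sorted(((x, y) for x in range(8) for y in range(8)),
               key=lambda p: hilbert_index(8, p[0], p[1]))
```

Consecutive cells in `order` are always grid neighbours, which is the "never long jumps" property.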

115
z-ordering - variations
  • Q: how about Hilbert curve in 3-d? n-d?
  • A: Exists (and is not unique!). Eg., 3-d, order-1
    Hilbert curves (Hamiltonian paths on cube)

[Figure: two different 3-d, order-1 Hilbert curves, labeled 1 and 2]
116
z-ordering - Detailed outline
  • spatial access methods
  • z-ordering
  • main idea - 3 methods
  • use w/ B-trees algorithms (range, knn queries
    ...)
  • non-point (eg., region) data
  • analysis variations
  • R-trees
  • ...

117
z-ordering - analysis
  • Q: How many pieces (quad-tree blocks) per
    region?
  • A: proportional to perimeter (surface etc.)

118
z-ordering - analysis
  • (How long is the coastline, say, of England?
  • Paradox: The answer changes with the yard-stick
    -> fractals ...)

119
z-ordering - analysis
  • Q: Should we decompose a region to full detail
    (and store in B-tree)?

120
z-ordering - analysis
  • Q: Should we decompose a region to full detail
    (and store in B-tree)?
  • A: NO! approximation with 1-3 pieces/z-values is
    best [Orenstein90]

121
z-ordering - analysis
  • Q: how to measure the goodness of a curve?

122
z-ordering - analysis
  • Q: how to measure the goodness of a curve?
  • A: e.g., avg. # of runs, for range queries

[Figure: the same range query covered by 4 runs on one curve and 3 runs on the other]
(# runs ~ # disk accesses on B-tree)
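
Counting runs is easy to sketch: map every cell of the query to its 1-d value and count the maximal stretches of consecutive values. The example below compares row-wise order with z-order for one rectangle on a 4x4 grid; the rectangle, the function names and the x-bit-first shuffle are assumptions for illustration.

```python
def shuffle(x, y, k):
    """Bit-interleave x and y (x bit first) into a z-value."""
    z = 0
    for i in range(k - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z

def count_runs(values):
    """Number of maximal runs of consecutive integers in values."""
    vs = sorted(values)
    return sum(1 for i, v in enumerate(vs) if i == 0 or v != vs[i - 1] + 1)

# Query rectangle x in [2, 3], y in [1, 3] on a 4x4 grid.
cells = [(x, y) for x in range(2, 4) for y in range(1, 4)]
runs_z = count_runs(shuffle(x, y, 2) for x, y in cells)
runs_row = count_runs(y * 4 + x for x, y in cells)
```

Averaging such counts over many query rectangles gives the kind of measure the slide refers to.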
123
z-ordering - analysis
  • Q: So, is Hilbert really better?
  • A: 27% fewer runs, for 2-d (similar for 3-d)
  • Q: are there formulas for # runs, # of quadtree
    blocks etc.?
  • A: Yes (Jagadish, Moon, etc.; see textbook)

124
z-ordering - fun observations
  • Hilbert and z-ordering curves: space filling
    curves: eventually, they visit every point
    in n-d space - therefore

125
z-ordering - fun observations
  • ... they show that the plane has as many points
    as a line (-> headaches for 1900s
    mathematics/topology). (fractals, again!)

126
z-ordering - fun observations
  • Observation 2: Hilbert(-like) curve for video
    encoding [Y. Matias, CRYPTO 87]
  • Given a frame, visit its pixels in randomized
    Hilbert order; compress and transmit

127
z-ordering - fun observations
  • In general, Hilbert curve is great for preserving
    distances, clustering, vector quantization etc

128
Conclusions
  • z-ordering is a great idea (n-d points -> 1-d
    points; feed to B-trees)
  • used by TIGER system and (most probably) by other
    GIS products
  • works great with low-dim points