Title: I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis
1I/O-Efficient Batched Union-Find and Its
Applications to Terrain Analysis
- Pankaj K. Agarwal, Lars Arge, Ke Yi
- Duke University
- University of Aarhus
2The Union-Find Problem
- A universe of N elements x1, x2, ..., xN
- Initially N singleton sets {x1}, {x2}, ..., {xN}
- Each set has a representative
- Maintain the partition under:
- Union(xi, xj): joins the sets containing xi and xj
- Find(xi): returns the representative of the set containing xi
3The Solution
[Figure: sets stored as rooted trees, with the representatives marked. Panels: Union(d, h), illustrating link-by-rank, and Find(n), illustrating path compression.]
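The two heuristics named on this slide can be sketched in a few lines. This is a minimal in-memory sketch, not the I/O-efficient algorithm of the talk; the class name `UnionFind` and its method names are chosen here for illustration.

```python
class UnionFind:
    """In-memory union-find with link-by-rank and path compression."""

    def __init__(self, elements):
        # Every element starts as the representative of its own singleton set.
        self.parent = {x: x for x in elements}
        self.rank = {x: 0 for x in elements}

    def find(self, x):
        # First pass: walk up to the root, which is the representative.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): repoint every node on the path.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx  # redundant union: both operands already share a set
        # Link-by-rank: hang the root of smaller rank below the other root.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx


uf = UnionFind("dhipbjaflsrczkegmn")  # the element labels that appear on the slide
uf.union("d", "h")
print(uf.find("d") == uf.find("h"))
```

Every `find` and `union` follows parent pointers to arbitrary nodes, which is the access pattern the later slides identify as a source of non-locality.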
4Complexity
- O(N α(N)) for a sequence of N union and find operations [Tarjan 75]
- α(·): inverse Ackermann function (grows very slowly!)
- Optimal in the worst case [Tarjan 79, Fredman and Saks 89]
- Batched (off-line) version
- Entire sequence known in advance
- Can be improved to linear on RAM [Gabow and Tarjan 85]
- Not possible on a pointer machine [Tarjan 79]
5Simple and Good, as long as
- The entire data structure fits in memory
6The I/O Model
Main memory of size M
One I/O transfers B items between memory and disk
Disk of infinite size
7Sources of Non-Locality
- Two operands in a union
- Nodes on a leaf-to-root path
- Operands in consecutive operations
- Cannot remove for the on-line case
- Need to eliminate all of them in order to get
less than one I/O per operation!
8Our Results
- An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O((N/B) log_{M/B}(N/B)) I/Os
- Same as sorting
- Optimal in the worst case
- A practical algorithm using O(sort(N) log(N/M)) I/Os
- Implemented
- Applications to terrain analysis
- Topological persistence: O(sort(N)) I/Os
- Implemented
- Contour trees: O(sort(N)) I/Os
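To get a feel for the bounds quoted above, the following sketch evaluates them numerically with all hidden constants dropped. The function names and the values of N, M, and B are hypothetical, and the base-2 logarithm in log(N/M) is an assumption, since the slide does not state the base.

```python
import math


def sort_io_bound(n, m, b):
    """(N/B) * log_{M/B}(N/B): the sort(N) bound with its constant dropped."""
    blocks = n / b
    fanout = m / b
    # At least one pass over the data is always needed.
    return blocks * max(1.0, math.log(blocks, fanout))


def practical_io_bound(n, m, b):
    """sort(N) * log(N/M), with the logarithm taken base 2 (an assumption)."""
    return sort_io_bound(n, m, b) * max(1.0, math.log2(n / m))


# Hypothetical sizes: N items, memory of M items, blocks of B items.
N, M, B = 2**30, 2**20, 2**10
print(sort_io_bound(N, M, B), practical_io_bound(N, M, B), N)
```

For these sizes both bounds are far below N, that is, well under one I/O per operation, which is what an algorithm paying one I/O per pointer access cannot achieve.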
9I/O-Efficient Batched Union-Find
- Assumption: no redundant unions
- Each union must join two different sets
- Will be removed later
- Two-stage algorithm
- Convert to interval union-find
- Compute an order on the elements s.t. each union joins two adjacent sets
- Solve batched interval union-find
10Union Tree
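Sequence of unions (time stamp: operation):
- 1: Union(d, g)
- 2: Union(a, c)
- 3: Union(r, b)
- 4: Union(a, e)
- 5: Union(e, i)
- 6: Union(r, a)
- 7: Union(a, d)
- 8: Union(d, h)
- 9: Union(b, f)
[Figure: two union trees rooted at r over the elements a, b, c, d, e, f, g, h, i, with edges labeled by the time stamps 1 to 9.]
Equivalent union trees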
11Transforming the Union Tree
[Figure: the union tree rooted at r, redrawn in several steps, with edges labeled by the time stamps 1 to 9.]
Weights along root-to-leaf path decrease
12Formulating as a Batched Problem
[Figure: the union tree rooted at r before and after the transformation, with edges labeled by the time stamps 1 to 9.]
For each edge, find the lowest ancestor edge with a higher weight
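The query stated on this slide can be made concrete with a brute-force, in-memory sketch that simply walks up the tree. It assumes a union tree with one edge per union from the Union Tree slide, rooted at r, with each edge identified by its child endpoint; that rooting and the names `PARENT` and `lowest_heavier_ancestor_edge` are illustrative assumptions, and the talk itself solves the problem in batch rather than by pointer chasing.

```python
# child -> (parent, weight), where the weight is the time stamp of the union.
# Assumed shape: one edge per union listed on the Union Tree slide, rooted at r.
PARENT = {
    "g": ("d", 1), "c": ("a", 2), "b": ("r", 3),
    "e": ("a", 4), "i": ("e", 5), "a": ("r", 6),
    "d": ("a", 7), "h": ("d", 8), "f": ("b", 9),
}


def lowest_heavier_ancestor_edge(child):
    """Weight of the nearest ancestor edge heavier than edge (child, parent).

    Returns None when no ancestor edge has a higher weight.
    """
    node, weight = PARENT[child]
    while node in PARENT:  # the root r has no parent edge
        parent, ancestor_weight = PARENT[node]
        if ancestor_weight > weight:
            return ancestor_weight
        node = parent
    return None


for child, (parent, weight) in sorted(PARENT.items(), key=lambda kv: kv[1][1]):
    print(weight, (child, parent), lowest_heavier_ancestor_edge(child))
```

Each query here may touch every edge on a leaf-to-root path, so the sketch only pins down what is being asked, not how to answer it I/O-efficiently.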
13Cast in a Geometry Setting
[Figure: the union tree with weighted edges, and the segments obtained from its Euler tour.]
Euler tour
x: weight; y: positions in the tour
In O(sort(N)) I/Os [Chiang et al. 95]
14Cast in a Geometry Setting
[Figure: the same union tree and segments as on the previous slide.]
For each edge, find the lowest ancestor edge with a higher weight
For each segment, find the shortest segment above and containing it
15Distribution Sweeping
M/B vertical slabs
[Figure: the plane divided into vertical slabs, with parts labeled "checked here" and "checked recursively".]
Total cost: O(sort(N))
16In-Order Traversal
[Figure: the transformed union tree rooted at r, with edges labeled by the time stamps 1 to 9.]
Weights along root-to-leaf path decrease
- At u, with children u1, ..., uk (in increasing order of weight):
- Recursively visit the subtree at u1
- Return u
- For i = 2, ..., k: recursively visit the subtree at ui
[Figure: the elements laid out in the order b r a c e i g d h f.]
Claim: this traversal produces the right order
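What "the right order" means can be checked directly: in that order, every union must join two sets that occupy adjacent intervals. The sketch below tests this property for the order shown on this slide against the nine unions listed on the Union Tree slide. The function name is hypothetical, and the check is a plain in-memory simulation, not part of the algorithm.

```python
UNIONS = [("d", "g"), ("a", "c"), ("r", "b"), ("a", "e"), ("e", "i"),
          ("r", "a"), ("a", "d"), ("d", "h"), ("b", "f")]
ORDER = "b r a c e i g d h f".split()


def joins_adjacent_intervals(order, unions):
    """True if every union merges two sets that are adjacent intervals in `order`."""
    pos = {x: i for i, x in enumerate(order)}
    # start[p] and end[p] bound the interval that currently contains position p.
    start = list(range(len(order)))
    end = list(range(len(order)))
    for x, y in unions:
        lo_x, hi_x = start[pos[x]], end[pos[x]]
        lo_y, hi_y = start[pos[y]], end[pos[y]]
        if lo_x > lo_y:
            lo_x, hi_x, lo_y, hi_y = lo_y, hi_y, lo_x, hi_x
        if hi_x + 1 != lo_y:
            return False  # operands share a set, or their sets are not adjacent
        for p in range(lo_x, hi_y + 1):
            start[p], end[p] = lo_x, hi_y
    return True


print(joins_adjacent_intervals(ORDER, UNIONS))
print(joins_adjacent_intervals(sorted(ORDER), UNIONS))
```

The second call uses alphabetical order as a contrast: there the very first union, Union(d, g), already joins two sets that are not adjacent.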
17Solving Interval Union-Find
Union: x = two operands, y = time stamp
Find: x = operand, y = time stamp
Four instances of batched ray shooting: O(sort(N))
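One way to read the interval formulation: once the elements are ordered, each union removes the boundary between two adjacent intervals at its time stamp, and a find at time t looks left and right from its operand for the nearest boundary that still exists. The brute-force sketch below does that scan in memory. It reuses the order and unions from the earlier slides; the function names are hypothetical, and reporting the interval by its two end elements is an assumption, since the slide does not say which element serves as representative.

```python
import math

UNIONS = [("d", "g"), ("a", "c"), ("r", "b"), ("a", "e"), ("e", "i"),
          ("r", "a"), ("a", "d"), ("d", "h"), ("b", "f")]
ORDER = "b r a c e i g d h f".split()


def boundary_removal_times(order, unions):
    """gap[p] = time stamp of the union that merges across positions p and p+1."""
    pos = {x: i for i, x in enumerate(order)}
    n = len(order)
    start, end = list(range(n)), list(range(n))
    gap = [math.inf] * (n - 1)
    for t, (x, y) in enumerate(unions, start=1):
        lo_x, hi_x = start[pos[x]], end[pos[x]]
        lo_y, hi_y = start[pos[y]], end[pos[y]]
        if lo_x > lo_y:
            lo_x, hi_x, lo_y, hi_y = lo_y, hi_y, lo_x, hi_x
        assert hi_x + 1 == lo_y, "each union must join two adjacent intervals"
        gap[hi_x] = t
        for p in range(lo_x, hi_y + 1):
            start[p], end[p] = lo_x, hi_y
    return gap


def find_interval(order, gap, x, t):
    """End elements of the interval containing x after all unions stamped <= t."""
    lo = hi = order.index(x)
    while lo > 0 and gap[lo - 1] <= t:  # scan left for a surviving boundary
        lo -= 1
    while hi < len(order) - 1 and gap[hi] <= t:  # scan right likewise
        hi += 1
    return order[lo], order[hi]


GAP = boundary_removal_times(ORDER, UNIONS)
print(GAP)
print(find_interval(ORDER, GAP, "e", 5))
```

The left and right scans play the role of the ray-shooting queries; the batched version on the slide answers all of them together instead of one scan at a time.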
18Handling Redundant Unions
- Union tree becomes a graph
- Compute the minimum spanning tree
- O(sort(N)) I/Os (randomized) [Chiang et al. 95]
- O(sort(N) log log B) I/Os (deterministic) [Arge et al. 04]
- Deterministic O(sort(N)) I/Os if graph is planar
- Only MST edges are non-redundant
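The last bullet can be illustrated with Kruskal's algorithm: treat each union as an edge weighted by its time stamp, and an edge enters the minimum spanning tree exactly when its endpoints are still in different components, that is, when the union is non-redundant. The sketch below is an in-memory stand-in for the I/O-efficient MST algorithms cited on the slide; the function name and the four-operation input are hypothetical.

```python
def non_redundant_unions(unions):
    """Time stamps of the unions that are MST edges (Kruskal, weight = time stamp)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    # The unions arrive in time-stamp order, i.e. already sorted by weight.
    for t, (x, y) in enumerate(unions, start=1):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[rx] = ry
            kept.append(t)
    return kept


# Hypothetical sequence: the third union joins two elements already in one set.
print(non_redundant_unions([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]))
```

Once the redundant unions are filtered out this way, the remaining edges form a tree again and the no-redundant-unions assumption holds.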
19A Practical Algorithm
- Previous algorithm too complicated
- 2 Euler tours
- 4 instances of batched ray shooting
- MST
- A simple and practical algorithm
- Divide-and-conquer
- O(sort(N) log(N/M)) I/Os
- Implemented
20Applications
- Topological Persistence
- Contour Trees
21Topological Persistence
22Formulated as Batched Union-Find
- Terrain represented as a triangulated mesh
- Consider minimum-saddle pairs
- When we reach:
- A minimum or maximum: do nothing
- A regular point u: issue union(u, v) for a lower neighbor v
- A saddle u: let v and w be nodes from u's two connected pieces in its lower link; issue find(v), find(w), union(u, v), union(u, w)
[Figure label: lower link]
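The rules above can be sketched as a generator of the batched operation sequence. The sketch assumes vertices are reached in increasing height, takes the connected pieces of the lower link to be the connected components of the subgraph induced by a vertex's lower neighbors, and handles only simple saddles with exactly two pieces. The function names, the choice of the lowest vertex of each piece as v and w, and the octahedron-shaped example mesh are all hypothetical.

```python
def lower_link_pieces(lower, adj):
    """Connected components of the subgraph induced by the lower neighbors."""
    pieces, seen = [], set()
    for s in lower:
        if s in seen:
            continue
        piece, stack = [], [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            piece.append(v)
            for w in adj[v]:
                if w in lower and w not in seen:
                    seen.add(w)
                    stack.append(w)
        pieces.append(piece)
    return pieces


def persistence_ops(height, adj):
    """Emit the batched find/union sequence, sweeping vertices bottom-up."""
    ops = []
    for u in sorted(height, key=height.get):
        lower = {v for v in adj[u] if height[v] < height[u]}
        if not lower or len(lower) == len(adj[u]):
            continue  # minimum or maximum: do nothing
        pieces = lower_link_pieces(lower, adj)
        # Pick the lowest vertex of each piece (an illustrative choice).
        reps = sorted((min(p, key=height.get) for p in pieces), key=height.get)
        if len(reps) == 1:  # regular point
            ops.append(("union", u, reps[0]))
        else:  # simple saddle: exactly two pieces assumed
            v, w = reps[0], reps[1]
            ops += [("find", v), ("find", w), ("union", u, v), ("union", u, w)]
    return ops


# Hypothetical mesh: an octahedron with poles s and o and ring a, b, c, d.
ADJ = {"s": ["a", "b", "c", "d"], "o": ["a", "b", "c", "d"],
       "a": ["b", "d", "s", "o"], "b": ["a", "c", "s", "o"],
       "c": ["b", "d", "s", "o"], "d": ["a", "c", "s", "o"]}
HEIGHT = {"a": 1, "c": 2, "s": 3, "b": 4, "d": 5, "o": 6}
OPS = persistence_ops(HEIGHT, ADJ)
print(OPS)
```

In this example a and c are minima, s is the saddle where their components meet, b and d are regular points, and o is the maximum; the two finds issued at s identify the components the saddle joins.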
23Contour Trees
24Previous Results
- Directly maintain contours
- O(N log N) time [van Kreveld et al. 97]
- Needs union-split-find for circular lists
- Does not extend to higher dimensions
- Two sweeps by maintaining components, then merge
- O(N log N) time [Carr et al. 03]
- Extends to arbitrary dimensions
25Join Tree and Split Tree
[Figure: join tree and split tree over nodes numbered 1 to 9, with the qualified nodes marked.]
26Final Contour Tree
[Figure: join tree, split tree, and the resulting contour tree over nodes numbered 1 to 9.]
Hard to BATCH!
27Another Characterization
Let w be the highest node that is a descendant of v in the join tree and an ancestor of u in the split tree; then (u, w) is a contour tree edge.
[Figure: join tree, split tree, and contour tree over nodes numbered 1 to 9, with the nodes u, v, and w marked.]
Now can BATCH!
28Experiment 1: Random Union-Find
29Experiment 2: Topological Persistence on Terrain Data
Neuse River Basin of NC
30Experiment 2: Topological Persistence on Terrain Data
31Summary
- An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O((N/B) log_{M/B}(N/B)) I/Os
- Optimal in the worst case
- A practical algorithm using O(sort(N) log(N/M)) I/Os
- Applications to terrain analysis
- Topological persistence: O(sort(N)) I/Os
- Contour trees: O(sort(N)) I/Os
- Open question: on-line case
- Can we get below O(N α(N)) I/Os?
32Thank you!