Title: Fast N-Body Algorithms for Massive Datasets
1. Fast N-Body Algorithms for Massive Datasets
- Alexander Gray
- Georgia Institute of Technology
2. Is science in 2007 different from science in 1907?
Instruments
Szalay & Gray, Science, 2001
3. Is science in 2007 different from science in 1907?
Instruments
Data: CMB Maps
1990 COBE         1,000
2000 Boomerang   10,000
2002 CBI         50,000
2003 WMAP     1 Million
2008 Planck  10 Million
Data: Local Redshift Surveys
1986 CfA      3,500
1996 LCRS    23,000
2003 2dF    250,000
2005 SDSS   800,000
Data: Angular Surveys
1970 Lick   1M
1990 APM    2M
2005 SDSS 200M
2008 LSST   2B
4. Sloan Digital Sky Survey (SDSS)
5. 1 billion objects, 144 dimensions (250M galaxies in 5 colors, 1M 2000-D spectra)
- Size matters! Now possible:
- low noise → subtle patterns
- global properties and patterns
- rare objects and patterns
- more info: 3d, deeper/earlier, bands
- in parallel: more accurate simulations
- 2008 LSST: time-varying phenomena
6. Happening everywhere!
microarray chips → molecular biology
nuclear mag. resonance → drug discovery
satellite topography → earth sciences
microprocessor → physical simulation
functional MRI → neuroscience
fiber optics → Internet
7.
- How did galaxies evolve?
- What was the early universe like?
- Does dark energy exist?
- Is our model (GR + inflation) right?
Astrophysicist:
R. Nichol, Inst. Cosmol. Gravitation; A. Connolly, U. Pitt Physics; C. Miller, NOAO; R. Brunner, NCSA; G. Kulkarni, Inst. Cosmol. Gravitation; D. Wake, Inst. Cosmol. Gravitation; R. Scranton, U. Pitt Physics; M. Balogh, U. Waterloo Physics; I. Szapudi, U. Hawaii Inst. Astronomy; G. Richards, Princeton Physics; A. Szalay, Johns Hopkins Physics
Machine learning / statistics guy
8.
- How did galaxies evolve?
- What was the early universe like?
- Does dark energy exist?
- Is our model (GR + inflation) right?
Astrophysicist
- Kernel density estimator  O(N^2)
- n-point spatial statistics  O(N^n)
- Nonparametric Bayes classifier  O(N^2)
- Support vector machine  O(N^2)
- Nearest-neighbor statistics  O(N^2)
- Gaussian process regression  O(N^3)
- Bayesian inference  O(c^D T(N))
Machine learning / statistics guy
10. (The same slide, with the astrophysicist's reply:) But I have 1 million points!
11. The Stack
Apps (User, Science)
Perception: Computer Vision, NLP, Machine Translation, Bibleome, Autonomous vehicles
ML / Opt: Machine Learning / Optimization / Linear Algebra / Privacy
Data Abstractions: DBMS, MapReduce, VOTables,
Clustering / Threading: Programming with 1000s of powerful compute nodes
O/S
Network
Motherboards / Datacenter
ICs
12. The Stack
Apps (User, Science)
Perception: Computer Vision, NLP, Machine Translation, Bibleome, Autonomous vehicles
ML / Opt: Machine Learning / Optimization / Linear Algebra / Privacy
Data Abstractions: DBMS, MapReduce, VOTables, Data structures
Clustering / Threading: Programming with 1000s of powerful compute nodes
O/S
Network
Motherboards / Datacenter
ICs
13. Making fast algorithms
- There are many large datasets. There are many questions we want to ask of them.
- Why we must not get obsessed with one specific dataset.
- Why we must not get obsessed with one specific question.
- The activity I'll describe is about accelerating computations which occur commonly across many ML methods.
14. Scope
- Nearest neighbor
- K-means
- Hierarchical clustering
- N-point correlation functions
- Kernel density estimation
- Locally-weighted regression
- Mean shift tracking
- Mixtures of Gaussians
- Gaussian process regression
- Manifold learning
- Support vector machines
- Affinity propagation
- PCA
- …
15. Scope
- ML methods with distances underneath
- Distances only
- Continuous kernel functions
- ML methods with counting underneath
16. Scope
- Computational ideas in this tutorial:
- Data structures
- Monte Carlo
- Series expansions
- Problem/solution abstractions
- Challenges:
- Don't introduce error, if possible
- Don't introduce tweak parameters, if possible
17. Two canonical problems
- Nearest-neighbor search: x* = argmin_x ||x_q − x||
- Kernel density estimation: p(x_q) = (1/N) Σ_r K_h(||x_q − x_r||)
18. Ideas
- Data structures and how to use them
- Monte Carlo
- Series expansions
- Problem/solution abstractions
19. Nearest Neighbor: Naïve Approach
- Given a query point X.
- Scan through each point Y:
- Calculate the distance d(X,Y)
- If d(X,Y) < best_seen, then Y is the new nearest neighbor.
- Takes O(N) time for each query!
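The naïve scan above is a few lines of code; a minimal sketch, reusing the three example points from the construction slides (the function name is ours):

```python
import math

def nearest_neighbor(query, points):
    """Naive scan: one distance computation per point, O(N) per query."""
    best, best_dist = None, float("inf")
    for y in points:
        d = math.dist(query, y)     # d(X, Y)
        if d < best_dist:           # Y is the new nearest neighbor
            best, best_dist = y, d
    return best, best_dist

pts = [(0.0, 0.0), (1.0, 4.31), (0.13, 2.85)]
print(nearest_neighbor((0.2, 2.5), pts))   # nearest is (0.13, 2.85)
```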
20. Speeding Up Nearest Neighbor
- We can speed up the search for the nearest neighbor:
- Examine nearby points first.
- Ignore any points that are farther than the nearest point found so far.
- Do this using a KD-tree:
- Tree-based data structure
- Recursively partitions points into axis-aligned boxes.
21. KD-Tree Construction
Pt   X     Y
1    0.00  0.00
2    1.00  4.31
3    0.13  2.85
We start with a list of n-dimensional points.
22. KD-Tree Construction
Split on X > .5?
NO:  Pt 1 (0.00, 0.00); Pt 3 (0.13, 2.85)
YES: Pt 2 (1.00, 4.31)
We can split the points into 2 groups by choosing a dimension X and value V and separating the points into X > V and X < V.
23. KD-Tree Construction
Split on X > .5?
NO:  Pt 1 (0.00, 0.00); Pt 3 (0.13, 2.85)
YES: Pt 2 (1.00, 4.31)
We can then consider each group separately and possibly split again (along the same or a different dimension).
24. KD-Tree Construction
Split on X > .5?
YES: Pt 2 (1.00, 4.31)
NO: split again on Y > .1?
  YES: Pt 3 (0.13, 2.85)
  NO:  Pt 1 (0.00, 0.00)
We can then consider each group separately and possibly split again (along the same or a different dimension).
25. KD-Tree Construction
We can keep splitting the points in each set to
create a tree structure. Each node with no
children (leaf node) contains a list of points.
26. KD-Tree Construction
We will keep around one additional piece of
information at each node. The (tight) bounds of
the points at or below this node.
27. KD-Tree Construction
- Use heuristics to make splitting decisions:
- Which dimension do we split along? The widest.
- Which value do we split at? The median of the values of that split dimension for the points.
- When do we stop? When there are fewer than m points left OR the box has hit some minimum width.
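The splitting heuristics above can be sketched as a small recursive builder that also stores the tight bounds from slide 26; a hedged illustration (the class name and the leaf size m are ours, not from the slides):

```python
from statistics import median

class Node:
    """kd-tree node storing the tight bounding box of its points."""
    def __init__(self, points, m=2):
        dims = range(len(points[0]))
        self.lo = tuple(min(p[d] for p in points) for d in dims)  # tight lower bounds
        self.hi = tuple(max(p[d] for p in points) for d in dims)  # tight upper bounds
        self.points = self.left = self.right = None
        widths = [h - l for l, h in zip(self.lo, self.hi)]
        d = widths.index(max(widths))          # split along the widest dimension
        v = median(p[d] for p in points)       # at the median value
        left = [p for p in points if p[d] <= v]
        right = [p for p in points if p[d] > v]
        if len(points) <= m or not left or not right:
            self.points = points               # leaf node: keep the list of points
        else:
            self.left, self.right = Node(left, m), Node(right, m)

def leaves(n):
    """All points stored at or below node n."""
    return n.points if n.points is not None else leaves(n.left) + leaves(n.right)

pts = [(0.0, 0.0), (1.0, 4.31), (0.13, 2.85), (2.0, 1.0)]
tree = Node(pts, m=1)
print(tree.lo, tree.hi)   # root bounds cover all the points
```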
28. Exclusion and inclusion, using point-node kd-tree bounds. O(D) bounds on distance minima/maxima.
30. Nearest Neighbor with KD Trees
We traverse the tree looking for the nearest
neighbor of the query point.
31. Nearest Neighbor with KD Trees
Examine nearby points first: explore the branch of the tree that is closest to the query point first.
33. Nearest Neighbor with KD Trees
When we reach a leaf node, compute the distance to each point in the node.
35. Nearest Neighbor with KD Trees
Then we can backtrack and try the other branch at each node visited.
36. Nearest Neighbor with KD Trees
Each time a new closest point is found, we can update the distance bounds.
37. Nearest Neighbor with KD Trees
Using the distance bounds and the bounds of the data below each node, we can prune parts of the tree that could NOT include the nearest neighbor.
40. Simple recursive algorithm (k=1 case)

NN(xq, R, dlo, xsofar, dsofar):
  if dlo > dsofar, return.
  if leaf(R), {xsofar, dsofar} = NNBase(xq, R, dsofar).
  else, {R1, d1, R2, d2} = orderByDist(xq, R.l, R.r).
    NN(xq, R1, d1, xsofar, dsofar).
    NN(xq, R2, d2, xsofar, dsofar).
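The recursion above can be made concrete; a self-contained, runnable sketch with the dlo pruning bound computed from the node boxes (the names build, d_lo, and nn are illustrative, not from any library):

```python
import math

def build(points, m=2):
    """Minimal 2-d kd-tree node: (lo, hi, left, right, leaf_points)."""
    lo = tuple(min(p[d] for p in points) for d in range(2))
    hi = tuple(max(p[d] for p in points) for d in range(2))
    widths = [h - l for l, h in zip(lo, hi)]
    d = widths.index(max(widths))                       # widest dimension
    v = sorted(p[d] for p in points)[len(points) // 2]  # median-ish split value
    left = [p for p in points if p[d] < v]
    right = [p for p in points if p[d] >= v]
    if len(points) <= m or not left or not right:
        return (lo, hi, None, None, points)             # leaf
    return (lo, hi, build(left, m), build(right, m), None)

def d_lo(q, node):
    """O(D) lower bound on the distance from q to any point in node's box."""
    return math.sqrt(sum(max(l - x, x - h, 0.0) ** 2
                         for x, l, h in zip(q, node[0], node[1])))

def nn(q, node, best):
    """Recursive search: closer child first, prune when d_lo >= best dist."""
    if d_lo(q, node) >= best[1]:
        return best                                     # pruned!
    if node[4] is not None:                             # leaf: scan its points
        for p in node[4]:
            d = math.dist(q, p)
            if d < best[1]:
                best = (p, d)
        return best
    for child in sorted((node[2], node[3]), key=lambda c: d_lo(q, c)):
        best = nn(q, child, best)
    return best

pts = [(0.0, 0.0), (1.0, 4.31), (0.13, 2.85), (2.0, 1.0), (0.5, 0.4)]
tree = build(pts, m=1)
print(nn((0.2, 2.5), tree, (None, float("inf"))))   # nearest is (0.13, 2.85)
```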
41. Nearest Neighbor with KD Trees
- Instead, some animations showing real data:
- kd-tree with cached sufficient statistics
- nearest-neighbor with kd-trees
- range-count with kd-trees
For animations, see http://www.cs.cmu.edu/~awm/animations/kdtree
42.-57. Range-count example (animation frames: the recursion steps through the tree; along the way some nodes are Pruned! by inclusion and others Pruned! by exclusion).
58. Some questions
- Asymptotic runtime analysis?
- In a rough sense, O(log N)
- But only under some regularity conditions
- How high in dimension can we go?
- Roughly exponential in intrinsic dimension
- In practice, in less than 100 dimensions, still big speedups
59. Another kind of tree
- Ball-trees, metric trees
- Use balls instead of hyperrectangles
- Can often be more efficient in high dimension (though not always)
- Can work with a non-Euclidean metric (you only need to respect the triangle inequality)
- Many non-metric similarity measures can be bounded by metric quantities.
60. A set of points in a metric space
61. Ball tree: root node
62.-66. A ball tree (frames showing successive levels of balls)
- J. Uhlmann, 1991
- S. Omohundro, NIPS 1991
67. Ball-tree properties
- Let Q be any query point and let x be a point inside ball B:
- ||x − Q|| ≥ ||Q − B.center|| − B.radius
- ||x − Q|| ≤ ||Q − B.center|| + B.radius
(Figure: ball B with center B.center, query Q, point x.)
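These two triangle-inequality bounds are easy to check numerically; a purely illustrative sketch that samples points inside a ball and verifies that every distance lands between the bounds:

```python
import math, random

random.seed(0)
center, radius = (0.0, 0.0), 1.0
q = (3.0, 4.0)                       # query Q, so ||Q - B.center|| = 5

def sample_in_ball():
    """Rejection-sample a point x inside ball B."""
    while True:
        x = (random.uniform(-1, 1), random.uniform(-1, 1))
        if math.dist(x, center) <= radius:
            return x

ds = [math.dist(q, sample_in_ball()) for _ in range(1000)]
lo = math.dist(q, center) - radius   # = 4: lower bound on ||x - Q||
hi = math.dist(q, center) + radius   # = 6: upper bound on ||x - Q||
print(min(ds), max(ds))              # every sampled distance lies in [4, 6]
```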
68. How to build a metric tree, exactly?
- Must balance quality vs. build-time
- Anchors hierarchy (farthest-points heuristic; 2-approx used in OR)
- Omohundro: Five ways to build a ball-tree
- Which is the best? A research topic
69. Some other trees
- Cover-tree
- Provable worst-case O(log N) under an assumption (bounded expansion constant)
- Like a non-binary ball-tree
- Learning trees
- In this conference
70. All-type problems
- Nearest-neighbor search
- All-nearest-neighbor (bichromatic)
- Kernel density estimation
- All version (bichromatic)
71. Almost always all-type problems:
- Kernel density estimation
- Nadaraya-Watson locally-weighted regression
- Gaussian process prediction
- Radial basis function networks
Always all-type problems:
- Monochromatic all-nearest-neighbor (e.g. LLE)
- n-point correlation (n-tuples)
72. Dual-tree idea
- If all the queries are available simultaneously, then it is faster to:
- 1. Build a tree on the queries as well
- 2. Effectively process the queries in chunks rather than individually → work is shared between similar query points
73. Single-tree
74. Single-tree; Dual-tree (symmetric)
75. Exclusion and inclusion, using point-node kd-tree bounds. O(D) bounds on distance minima/maxima.
77. Exclusion and inclusion, using kd-tree node-node bounds. O(D) bounds on distance minima/maxima. (Analogous to point-node bounds.)
Also needed: nodewise bounds.
79. Single-tree: simple recursive algorithm (k=1 case)

NN(xq, R, dlo, xsofar, dsofar):
  if dlo > dsofar, return.
  if leaf(R), {xsofar, dsofar} = NNBase(xq, R, dsofar).
  else, {R1, d1, R2, d2} = orderByDist(xq, R.l, R.r).
    NN(xq, R1, d1, xsofar, dsofar).
    NN(xq, R2, d2, xsofar, dsofar).
80. Single-tree → Dual-tree
- xq → Q
- dlo(xq, R) → dlo(Q, R)
- xsofar → {xsofar}, dsofar → {dsofar}
- store Q.dsofar = maxQ dsofar
- 2-way recursion → 4-way recursion
- N × O(log N) → O(N)
81. Dual-tree: simple recursive algorithm (k=1)

AllNN(Q, R, dlo, {xsofar}, {dsofar}):
  if dlo > Q.dsofar, return.
  if leaf(Q) and leaf(R),
    {xsofar, dsofar} = AllNNBase(Q, R, {dsofar}).
    Q.dsofar = maxQ dsofar.
  else if !leaf(Q) and leaf(R), …
  else if leaf(Q) and !leaf(R), …
  else if !leaf(Q) and !leaf(R),
    {R1, d1, R2, d2} = orderByDist(Q.l, R.l, R.r).
    AllNN(Q.l, R1, d1, {xsofar}, {dsofar}).
    AllNN(Q.l, R2, d2, {xsofar}, {dsofar}).
    {R1, d1, R2, d2} = orderByDist(Q.r, R.l, R.r).
    AllNN(Q.r, R1, d1, {xsofar}, {dsofar}).
    AllNN(Q.r, R2, d2, {xsofar}, {dsofar}).
    Q.dsofar = max(Q.l.dsofar, Q.r.dsofar).
82. Dual-tree traversal (depth-first)
(Figure: the reference-point tree and the query-point tree.)
83.-102. Dual-tree traversal (animation frames stepping through pairs of query and reference nodes).
103. Meta-idea: higher-order divide-and-conquer
Generalizes divide-and-conquer of a single set to divide-and-conquer of multiple sets.
Break each set into pieces. Solving the sub-parts of the problem and combining these sub-solutions appropriately might be easier than doing this over only one set.
104. Ideas
- Data structures and how to use them
- Monte Carlo
- Series expansions
- Problem/solution abstractions
105. Characterization of an entire distribution?
2-point correlation: how many pairs have distance < r?
(Figure: the 2-point correlation function, as a function of r.)
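The 2-point count can be stated in a few lines of naïve code; the fast algorithms in the rest of the deck compute the same quantity (the function name is ours):

```python
import itertools, math

def two_point_count(points, r):
    """Naive O(N^2) count of pairs with pairwise distance < r."""
    return sum(1 for x, y in itertools.combinations(points, 2)
               if math.dist(x, y) < r)

pts = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.2, 0.1)]
print(two_point_count(pts, 1.0))   # -> 2
```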
106. The n-point correlation functions
- Spatial inferences: filaments, clusters, voids, homogeneity, isotropy, 2-sample testing, …
- Foundation for the theory of point processes [Daley & Vere-Jones 1972]; unifies spatial statistics [Ripley 1976]
- Used heavily in biostatistics, cosmology, particle physics, statistical physics
(Formulas: 2pcf definition; 3pcf definition.)
107. 3-point correlation
Standard model: n > 0 terms should be zero!
How many triples have pairwise distances < r?
(Figure: triangle with side lengths r1, r2, r3.)
108. How can we count n-tuples efficiently?
How many triples have pairwise distances < r?
109. Use n trees! [Gray & Moore, NIPS 2000]
110. How many valid triangles a-b-c (where …) could there be?
count{A,B,C} = ?
(Figure: nodes A, B, C; scale r.)
111. How many valid triangles a-b-c (where …) could there be?
count{A,B,C} = count{A,B,C.left} + count{A,B,C.right}
113. How many valid triangles a-b-c (where …) could there be?
count{A,B,C} = ?
114. How many valid triangles a-b-c (where …) could there be?
count{A,B,C} = 0! (Exclusion)
115. How many valid triangles a-b-c (where …) could there be?
count{A,B,C} = ?
116. How many valid triangles a-b-c (where …) could there be?
count{A,B,C} = |A| × |B| × |C| (Inclusion)
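The exclusion/inclusion recursion is easiest to see for pairs (n = 2); a hedged sketch of a two-tree range count checked against the naïve answer (the 3-point version adds a third tree; all names here are illustrative):

```python
import math, random

def build(points, m=4):
    """kd-tree node: (lo, hi, count, children, leaf_points)."""
    lo = tuple(min(p[d] for p in points) for d in range(2))
    hi = tuple(max(p[d] for p in points) for d in range(2))
    d = 0 if hi[0] - lo[0] >= hi[1] - lo[1] else 1
    v = sorted(p[d] for p in points)[len(points) // 2]
    L = [p for p in points if p[d] < v]
    R = [p for p in points if p[d] >= v]
    if len(points) <= m or not L or not R:
        return (lo, hi, len(points), None, points)
    return (lo, hi, len(points), (build(L, m), build(R, m)), None)

def dmin(a, b):   # smallest possible distance between the two boxes
    return math.sqrt(sum(max(al - bh, bl - ah, 0.0) ** 2
                         for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1])))

def dmax(a, b):   # largest possible distance between the two boxes
    return math.sqrt(sum(max(ah - bl, bh - al) ** 2
                         for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1])))

def count_pairs(A, B, r):
    if dmin(A, B) >= r:
        return 0                        # Exclusion: no pair can be within r
    if dmax(A, B) < r:
        return A[2] * B[2]              # Inclusion: |A| x |B|, no point scanned
    if A[3] is None and B[3] is None:   # both leaves: compare points directly
        return sum(1 for a in A[4] for b in B[4] if math.dist(a, b) < r)
    if A[3] is None or (B[3] is not None and B[2] > A[2]):
        return count_pairs(A, B[3][0], r) + count_pairs(A, B[3][1], r)
    return count_pairs(A[3][0], B, r) + count_pairs(A[3][1], B, r)

random.seed(2)
As = [(random.random(), random.random()) for _ in range(60)]
Bs = [(random.random(), random.random()) for _ in range(60)]
print(count_pairs(build(As), build(Bs), 0.3))
```

The result is exact: pruned node pairs are either counted wholesale (inclusion) or discarded (exclusion), never approximated.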
117. Key idea (combinatorial proximity problems)
- for n-tuples: n-tree recursion
118. 3-point runtime
(biggest previous: 20K)
VIRGO simulation data, N = 75,000,000
naïve: 5×10^9 sec. (150 years); multi-tree: 55 sec. (exact)
n=2: O(N); n=3: O(N^(log 3)); n=4: O(N^2)
119. But…
Depends on r^(D−1). Slow for large radii.
VIRGO simulation data, N = 75,000,000: naïve 150 years; multi-tree, large h: 24 hrs
Let's develop a method for large radii.
120. c = p · T
c: hard. T: known. p: EASIER?
no dependence on N! but it does depend on p
121.-124. c = p · T (frames repeating: no dependence on N! but it does depend on p)
125. c = p · T. This is junk, don't bother.
126. c = p · T. This is promising.
127. Basic idea:
- 1. Remove some junk (run the exact algorithm for a while) → make p larger
- 2. Sample from the rest
128. Get disjoint sets from the recursion tree
(Figure: all possible n-tuples; some pruned.)
129. Now do stratified sampling
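The c = p · T idea can be sketched with plain (unstratified) Monte Carlo, a simplification of the stratified scheme on the slides: T, the total number of pairs, is known, and p is estimated by sampling, so the work depends on the sample count and on p, not on N (the function name and the normal-approximation 99% half-width are ours):

```python
import math, random

random.seed(1)

def mc_pair_count(points, r, samples=2000):
    """Estimate c = p * T: T = total pairs (known), p estimated by
    uniform pair sampling. No dependence on N, but it does depend on p."""
    n = len(points)
    T = n * (n - 1) // 2
    hits = 0
    for _ in range(samples):
        i, j = random.sample(range(n), 2)
        if math.dist(points[i], points[j]) < r:
            hits += 1
    p = hits / samples
    se = math.sqrt(p * (1 - p) / samples)   # normal-approximation std. error
    return T * p, T * 2.58 * se             # estimate and ~99% half-width

points = [(random.random(), random.random()) for _ in range(500)]
est, half = mc_pair_count(points, 0.1)
print(est, half)
```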
130. Speedup Results
VIRGO simulation data, N = 75,000,000
naïve: 150 years
multi-tree, large h: 24 hrs
multi-tree + Monte Carlo (99% confidence): 96 sec
131. Ideas
- Data structures and how to use them
- Monte Carlo
- Multipole methods
- Problem/solution abstractions
132. Kernel density estimation: p(x_q) = (1/N) Σ_r K_h(||x_q − x_r||)
133. How to use a tree: 1. How to approximate? 2. When to approximate?
- Barnes and Hut, Science, 1987
(Figure: node R of size s at distance r from the query.)
Approximate if s/r < θ.
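A sketch of this Barnes-Hut-style rule for a kernel sum: if a node subtends a small angle from the query (s/r < θ), one centroid evaluation stands in for all its points (the tree layout, names, and Gaussian kernel with bandwidth h are our assumptions; θ is the tweak parameter that slide 137 worries about):

```python
import math

THETA = 0.5    # opening-angle threshold: the "tweak parameter"

def kernel(d, h=3.0):
    """Gaussian kernel (bandwidth h chosen for the demo)."""
    return math.exp(-d * d / (2 * h * h))

def contrib(q, node, h=3.0):
    """If node R looks small from q (s / r < THETA), approximate the whole
    node by count * K(distance to centroid); otherwise open the node."""
    r = math.dist(q, node["centroid"])
    if node["children"] is None or (r > 0 and node["width"] / r < THETA):
        return node["count"] * kernel(r, h)
    return sum(contrib(q, child, h) for child in node["children"])

# A tiny hand-built tree: one far-away cluster under a root node.
cluster = [(10.0, 10.0), (10.2, 9.9), (9.9, 10.1)]
centroid = tuple(sum(c) / len(cluster) for c in zip(*cluster))
leaf = {"width": 0.3, "count": 3, "children": None, "centroid": centroid}
root = {"width": 15.0, "count": 3, "children": [leaf], "centroid": centroid}

exact = sum(kernel(math.dist((0.0, 0.0), p)) for p in cluster)
approx = contrib((0.0, 0.0), root)
print(exact, approx)   # the approximation is within a few percent here
```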
134. How to use a tree: 3. How to know potential error?
- Let's maintain bounds on the true kernel sum
(Figure: node R; the bounds at the beginning.)
135. How to use a tree: 4. How to do the all problem?
Single-tree; Dual-tree (symmetric)
[Gray & Moore 2000]
136. How to use a tree: 4. How to do the all problem?
(Figure: nodes Q and R, with radii rQ and rR, separated by distance s.)
Approximate if … (the node-node analogue of the s/r criterion)
- Generalizes Barnes-Hut to dual-tree
137. BUT…
We have a tweak parameter.
Case 1: alg. gives no error bounds.
Case 2: alg. gives error bounds, but must be rerun.
Case 3: alg. automatically achieves error tolerance.
So far we have case 2; let's try for case 3.
Let's try to make an automatic stopping rule.
138. Finite-difference function approximation.
Taylor expansion: …
Gregory-Newton finite form: …
139. Finite-difference function approximation.
(assumes monotonic decreasing kernel)
approximate {Q,R} if …
140. Speedup Results (KDE)
N       naïve (s)   dual-tree (s)
12.5K        7          .12
25K         31          .31
50K        123          .46
100K       494         1.0
200K      1976         2
400K      7904         5
800K     31616        10
1.6M    35 hrs        23
One order-of-magnitude speedup over single-tree at 2M points. (5500× at 1.6M: 35 hrs vs. 23 sec.)
141. Ideas
- Data structures and how to use them
- Monte Carlo
- Multipole methods
- Problem/solution abstractions
142. These are all examples of generalized N-body problems
All-NN, 2-point, 3-point, KDE, SPH
General theory and toolkit for designing algorithms for such problems
143. For more
- In this conference:
- Learning trees
- Monte Carlo for statistical summations
- Large-scale learning workshop:
- EMST
- GNPs and MapReduce-like parallelization
- Monte Carlo SVD
- agray@cc.gatech.edu