Fast N-Body Algorithms for Massive Datasets - Presentation Transcript
1
Fast N-Body Algorithms for Massive Datasets
  • Alexander Gray
  • Georgia Institute of Technology

2
Is science in 2007 different from science in 1907?
Instruments
(Szalay & J. Gray, Science, 2001)
3
Is science in 2007 different from science in 1907?
Instruments
(Szalay & J. Gray, Science, 2001)

Data: CMB Maps
1990 COBE: 1,000
2000 Boomerang: 10,000
2002 CBI: 50,000
2003 WMAP: 1 million
2008 Planck: 10 million

Data: Local Redshift Surveys
1986 CfA: 3,500
1996 LCRS: 23,000
2003 2dF: 250,000
2005 SDSS: 800,000

Data: Angular Surveys
1970 Lick: 1M
1990 APM: 2M
2005 SDSS: 200M
2008 LSST: 2B
4
Sloan Digital Sky Survey (SDSS)
5
1 billion objects, 144 dimensions (250M galaxies in 5 colors, 1M 2000-D spectra)
  • Size matters! Now possible:
  • low noise → subtle patterns
  • global properties and patterns
  • rare objects and patterns
  • more info: 3D, deeper/earlier, bands
  • in parallel: more accurate simulations
  • 2008 LSST: time-varying phenomena

6
Happening everywhere!
Molecular biology: microarray chips
Drug discovery: nuclear magnetic resonance
Earth sciences: satellite topography
Physical simulation: microprocessors
Neuroscience: functional MRI
Internet: fiber optics
7
  1. How did galaxies evolve?
  2. What was the early universe like?
  3. Does dark energy exist?
  4. Is our model (GR + inflation) right?

Astrophysicist:
R. Nichol, Inst. Cosmol. Gravitation; A. Connolly, U. Pitt Physics; C. Miller, NOAO; R. Brunner, NCSA; G. Kulkarni, Inst. Cosmol. Gravitation; D. Wake, Inst. Cosmol. Gravitation; R. Scranton, U. Pitt Physics; M. Balogh, U. Waterloo Physics; I. Szapudi, U. Hawaii Inst. Astronomy; G. Richards, Princeton Physics; A. Szalay, Johns Hopkins Physics

Machine learning / statistics guy
8
  1. How did galaxies evolve?
  2. What was the early universe like?
  3. Does dark energy exist?
  4. Is our model (GR + inflation) right?

Astrophysicist
(same collaborator list as the previous slide)

Machine learning / statistics guy:
  • Kernel density estimator: O(N²)
  • n-point spatial statistics: O(Nⁿ)
  • Nonparametric Bayes classifier: O(N²)
  • Support vector machine: O(N²)
  • Nearest-neighbor statistics: O(N²)
  • Gaussian process regression: O(N³)
  • Bayesian inference: O(c^D T(N))
9
(Same as slide 8; animation frame.)
10
(Same as slide 8, plus the punchline:)
But I have 1 million points
11
Data: The Stack
Apps (User, Science)
Perception: Computer Vision, NLP, Machine Translation, Bibleome, Autonomous vehicles
ML / Opt: Machine Learning / Optimization / Linear Algebra / Privacy
Data Abstractions: DBMS, MapReduce, VOTables, …
Clustering / Threading: Programming with 1000s of powerful compute nodes
O/S
Network
Motherboards / Datacenter
ICs
12
(Same stack, with Data structures added to the Data Abstractions layer:)
Data Abstractions: DBMS, MapReduce, VOTables, Data structures
13
Making fast algorithms
  • There are many large datasets. There are many
    questions we want to ask of them.
  • Why we must not get obsessed with one specific
    dataset.
  • Why we must not get obsessed with one specific
    question.
  • The activity I'll describe is about accelerating
    computations which occur commonly across many ML
    methods.

14
Scope
  • Nearest neighbor
  • K-means
  • Hierarchical clustering
  • N-point correlation functions
  • Kernel density estimation
  • Locally-weighted regression
  • Mean shift tracking
  • Mixtures of Gaussians
  • Gaussian process regression
  • Manifold learning
  • Support vector machines
  • Affinity propagation
  • PCA
  • …

15
Scope
  • ML methods with distances underneath
  • Distances only
  • Continuous kernel functions
  • ML methods with counting underneath

16
Scope
  • Computational ideas in this tutorial
  • Data structures
  • Monte Carlo
  • Series expansions
  • Problem/solution abstractions
  • Challenges
  • Don't introduce error, if possible
  • Don't introduce tweak parameters, if possible

17
Two canonical problems
  • Nearest-neighbor search
  • Kernel density estimation

18
Ideas
  1. Data structures and how to use them
  2. Monte Carlo
  3. Series expansions
  4. Problem/solution abstractions

19
Nearest Neighbor - Naïve Approach
  • Given a query point X.
  • Scan through each point Y
  • Calculate the distance d(X,Y)
  • If d(X,Y) < best_seen, then Y is the new nearest
    neighbor.
  • Takes O(N) time for each query!
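
A minimal runnable sketch of this naïve scan (Python; names like naive_nn are illustrative, not from the slides):

    import numpy as np

    def naive_nn(x, points):
        """Scan every point Y, tracking the best (nearest) seen so far."""
        best, best_seen = None, np.inf
        for y in points:
            d = np.linalg.norm(x - y)      # d(X, Y)
            if d < best_seen:              # Y is the new nearest neighbor
                best, best_seen = y, d
        return best, best_seen             # O(N) work per query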

20
Speeding Up Nearest Neighbor
  • We can speed up the search for the nearest
    neighbor
  • Examine nearby points first.
  • Ignore any points that are farther than the
    nearest point found so far.
  • Do this using a KD-tree
  • Tree-based data structure
  • Recursively partitions points into axis-aligned
    boxes.

21
KD-Tree Construction
Pt X Y
1 0.00 0.00
2 1.00 4.31
3 0.13 2.85

We start with a list of n-dimensional points.
22
KD-Tree Construction
X > .5
YES
NO
Pt X Y
1 0.00 0.00
3 0.13 2.85

Pt X Y
2 1.00 4.31

We can split the points into 2 groups by choosing
a dimension X and value V, and separating the
points into X > V and X < V.
23
KD-Tree Construction
X > .5
YES
NO
Pt X Y
1 0.00 0.00
3 0.13 2.85

Pt X Y
2 1.00 4.31

We can then consider each group separately and
possibly split again (along same/different
dimension).
24
KD-Tree Construction
X > .5
YES
NO
Y > .1
Pt X Y
2 1.00 4.31

NO
YES
Pt X Y
3 0.13 2.85

Pt X Y
1 0.00 0.00

We can then consider each group separately and
possibly split again (along same/different
dimension).
25
KD-Tree Construction
We can keep splitting the points in each set to
create a tree structure. Each node with no
children (leaf node) contains a list of points.
26
KD-Tree Construction
We will keep around one additional piece of
information at each node. The (tight) bounds of
the points at or below this node.
27
KD-Tree Construction
  • Use heuristics to make splitting decisions
  • Which dimension do we split along? Widest
  • Which value do we split at? Median of value of
    that split dimension for the points.
  • When do we stop? When there are fewer than m
    points left OR the box has hit some minimum width.
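
A compact sketch of this construction in Python, assuming the points are rows of a NumPy array (the dict-based node layout is an illustrative choice, not the tutorial's):

    import numpy as np

    def build_kdtree(points, m=10, min_width=1e-8):
        """Recursively partition points into axis-aligned boxes, storing
        the tight bounds of the points at or below each node."""
        lo, hi = points.min(axis=0), points.max(axis=0)
        node = {'lo': lo, 'hi': hi}              # tight bounding box
        widths = hi - lo
        dim = int(np.argmax(widths))             # split the widest dimension
        if len(points) < m or widths[dim] < min_width:
            node['points'] = points              # stop: make a leaf
            return node
        val = np.median(points[:, dim])          # split at the median
        mask = points[:, dim] > val
        if not mask.any() or mask.all():         # degenerate split: make a leaf
            node['points'] = points
            return node
        node['dim'], node['val'] = dim, val
        node['left'] = build_kdtree(points[~mask], m, min_width)
        node['right'] = build_kdtree(points[mask], m, min_width)
        return node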

28
Exclusion and inclusion, using point-node
kd-tree bounds. O(D) bounds on distance
minima/maxima
29
(Same as slide 28; animation frame.)
30
Nearest Neighbor with KD Trees
We traverse the tree looking for the nearest
neighbor of the query point.
31
Nearest Neighbor with KD Trees
Examine nearby points first Explore the branch
of the tree that is closest to the query point
first.
32
Nearest Neighbor with KD Trees
Examine nearby points first Explore the branch
of the tree that is closest to the query point
first.
33
Nearest Neighbor with KD Trees
When we reach a leaf node compute the distance
to each point in the node.
34
Nearest Neighbor with KD Trees
When we reach a leaf node compute the distance
to each point in the node.
35
Nearest Neighbor with KD Trees
Then we can backtrack and try the other branch at
each node visited.
36
Nearest Neighbor with KD Trees
Each time a new closest node is found, we can
update the distance bounds.
37
Nearest Neighbor with KD Trees
Using the distance bounds and the bounds of the
data below each node, we can prune parts of the
tree that could NOT include the nearest neighbor.
38
Nearest Neighbor with KD Trees
Using the distance bounds and the bounds of the
data below each node, we can prune parts of the
tree that could NOT include the nearest neighbor.
39
Nearest Neighbor with KD Trees
Using the distance bounds and the bounds of the
data below each node, we can prune parts of the
tree that could NOT include the nearest neighbor.
40
Simple recursive algorithm (k = 1 case)

NN(xq, R, dlo, x_sofar, d_sofar):
  if dlo > d_sofar, return.
  if leaf(R), {x_sofar, d_sofar} = NNBase(xq, R, d_sofar).
  else,
    {R1, d1}, {R2, d2} = orderByDist(xq, R.l, R.r).
    NN(xq, R1, d1, x_sofar, d_sofar).
    NN(xq, R2, d2, x_sofar, d_sofar).
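
The same recursion as runnable Python against the build_kdtree nodes sketched earlier; NNBase is the leaf scan, and all names remain illustrative:

    import numpy as np

    def box_dist(node, x):
        """Lower bound dlo: distance from x to the node's bounding box."""
        excess = np.maximum(node['lo'] - x, 0) + np.maximum(x - node['hi'], 0)
        return np.linalg.norm(excess)

    def nn(xq, node, d_lo, best):
        """best = [x_sofar, d_sofar], updated in place."""
        if d_lo > best[1]:                       # prune this subtree
            return
        if 'points' in node:                     # leaf: NNBase, a direct scan
            for y in node['points']:
                d = np.linalg.norm(xq - y)
                if d < best[1]:
                    best[0], best[1] = y, d
        else:                                    # explore the nearer child first
            children = sorted([node['left'], node['right']],
                              key=lambda c: box_dist(c, xq))
            for child in children:
                nn(xq, child, box_dist(child, xq), best)

    # usage: best = [None, np.inf]; nn(query, tree, 0.0, best)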
41
Nearest Neighbor with KD Trees
  • Instead, some animations showing real data
  • kd-tree with cached sufficient statistics
  • nearest-neighbor with kd-trees
  • range-count with kd-trees

For animations, see http://www.cs.cmu.edu/~awm/animations/kdtree
42–57
Range-count example (animation over sixteen slides: the traversal visits nodes one by one; partway through a node is "Pruned! (inclusion)", and later another is "Pruned! (exclusion)").
58
Some questions
  • Asymptotic runtime analysis?
  • In a rough sense, O(logN)
  • But only under some regularity conditions
  • How high in dimension can we go?
  • Roughly exponential in intrinsic dimension
  • In practice, in less than 100 dimensions, still
    big speedups

59
Another kind of tree
  • Ball-trees, metric trees
  • Use balls instead of hyperrectangles
  • Can often be more efficient in high dimension
    (though not always)
  • Can work with non-Euclidean metric (you only need
    to respect the triangle inequality)
  • Many non-metric similarity measures can be
    bounded by metric quantities.

60
A Set of Points in a metric space
61
Ball Tree root node
62
A Ball Tree
63
A Ball Tree
64
A Ball Tree
65
A Ball Tree
66
A Ball Tree
  • J. Uhlmann, 1991
  • S. Omohundro, NIPS 1991

67
Ball-trees properties
  • Let Q be any query point and let x be a point
    inside ball B
  • ‖x − Q‖ ≥ ‖Q − B.center‖ − B.radius
  • ‖x − Q‖ ≤ ‖Q − B.center‖ + B.radius

(Diagram: ball with center B.center; query Q outside; point x inside.)
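
These are just the triangle inequality, and cost O(D) to evaluate; a one-function sketch:

    import numpy as np

    def ball_bounds(Q, center, radius):
        """Bounds on ||x - Q|| for any point x inside the ball."""
        d = np.linalg.norm(Q - center)
        return max(d - radius, 0.0), d + radius   # (min possible, max possible)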
68
How to build a metric tree, exactly?
  • Must balance quality vs. build-time
  • Anchors hierarchy (farthest-points heuristic, the
    2-approximation used in operations research; see the sketch below)
  • Omohundro: "Five ways to build a ball-tree"
  • Which is the best? A research topic
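
A sketch of the farthest-points heuristic mentioned above (greedy 2-approximation; the function name and array layout are assumptions):

    import numpy as np

    def farthest_point_anchors(points, k):
        """Greedily pick k anchors, each as far as possible from the
        anchors chosen so far."""
        anchors = [0]                                      # arbitrary seed
        d = np.linalg.norm(points - points[0], axis=1)     # dist to anchor set
        for _ in range(k - 1):
            nxt = int(np.argmax(d))                        # farthest point
            anchors.append(nxt)
            d = np.minimum(d, np.linalg.norm(points - points[nxt], axis=1))
        return anchors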

69
Some other trees
  • Cover-tree
  • Provable worst-case O(logN) under an assumption
    (bounded expansion constant)
  • Like a non-binary ball-tree
  • Learning trees
  • In this conference

70
All-type problems
  • Nearest-neighbor search → all-nearest-neighbor (bichromatic)
  • Kernel density estimation → "all" version (bichromatic)

71
Almost always all-type problems
  • Kernel density estimation
  • Nadaraya-Watson locally-weighted regression
  • Gaussian process prediction
  • Radial basis function networks

Always all-type problems
  • Monochromatic all-nearest neighbor (e.g. LLE)
  • n-point correlation (n-tuples)

72
Dual-tree idea
  • If all the queries are available simultaneously,
    then it is faster to
  • 1. Build a tree on the queries as well
  • 2. Effectively process the queries in chunks
    rather than individually → work is shared between
    similar query points

73
Single-tree
74
Single-tree
Dual-tree (symmetric)
75
Exclusion and inclusion, using point-node
kd-tree bounds. O(D) bounds on distance
minima/maxima
76
(Same as slide 75; animation frame.)
77
Exclusion and inclusion, using kd-tree node-node
bounds. O(D) bounds on distance minima/maxima.
(Analogous to point-node bounds.)
Also needed: node-wise bounds.
78
(Same as slide 77; animation frame.)
79
Single-tree simple recursive algorithm (k = 1 case)

NN(xq, R, dlo, x_sofar, d_sofar):
  if dlo > d_sofar, return.
  if leaf(R), {x_sofar, d_sofar} = NNBase(xq, R, d_sofar).
  else,
    {R1, d1}, {R2, d2} = orderByDist(xq, R.l, R.r).
    NN(xq, R1, d1, x_sofar, d_sofar).
    NN(xq, R2, d2, x_sofar, d_sofar).
80
Single-tree ? Dual-tree
  • xq → Q
  • dlo(xq, R) → dlo(Q, R)
  • x_sofar → {x_sofar}, d_sofar → {d_sofar} (one per query)
  • store Q.d_sofar = max over q in Q of d_sofar(q)
  • 2-way recursion → 4-way recursion
  • N × O(log N) → O(N)

81
Dual-tree simple recursive algorithm (k = 1)

AllNN(Q, R, dlo, {x_sofar}, {d_sofar}):
  if dlo > Q.d_sofar, return.
  if leaf(Q) and leaf(R),
    {x_sofar, d_sofar} = AllNNBase(Q, R, {d_sofar}).
    Q.d_sofar = max over q in Q of d_sofar(q).
  else if !leaf(Q) and leaf(R), …
  else if leaf(Q) and !leaf(R), …
  else if !leaf(Q) and !leaf(R),
    {R1, d1}, {R2, d2} = orderByDist(Q.l, R.l, R.r).
    AllNN(Q.l, R1, d1, {x_sofar}, {d_sofar}).
    AllNN(Q.l, R2, d2, {x_sofar}, {d_sofar}).
    {R1, d1}, {R2, d2} = orderByDist(Q.r, R.l, R.r).
    AllNN(Q.r, R1, d1, {x_sofar}, {d_sofar}).
    AllNN(Q.r, R2, d2, {x_sofar}, {d_sofar}).
    Q.d_sofar = max(Q.l.d_sofar, Q.r.d_sofar).

(The two mixed cases, elided on the slide, recurse on the non-leaf side only.)
82
Dual-tree traversal
(depth-first)
Reference points
Query points
83–102
(Animation frames: the depth-first dual-tree traversal steps through pairs of query nodes and reference nodes.)
103
Meta-idea: Higher-order divide-and-conquer
Generalizes divide-and-conquer of a single set
to divide-and-conquer of multiple sets: break
each set into pieces; solving the sub-parts of
the problem and combining these sub-solutions
appropriately might be easier than doing this
over only one set.
104
Ideas
  1. Data structures and how to use them
  2. Monte Carlo
  3. Series expansions
  4. Problem/solution abstractions

105
Characterization of an entire distribution? The 2-point correlation.
How many pairs have distance < r?
(Diagram: the 2-point correlation function, with pair radius r.)
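
For reference, the O(N²) baseline being accelerated, as a brute-force sketch:

    import numpy as np

    def two_point_count(points, r):
        """Count pairs with pairwise distance < r (naive O(N^2) scan)."""
        count = 0
        for i in range(len(points)):
            for j in range(i + 1, len(points)):
                if np.linalg.norm(points[i] - points[j]) < r:
                    count += 1
        return count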
106
The n-point correlation functions
  • Spatial inferences: filaments, clusters, voids,
    homogeneity, isotropy, 2-sample testing, …
  • Foundation for the theory of point processes
    [Daley & Vere-Jones 1972]; unifies spatial
    statistics [Ripley 1976]
  • Used heavily in biostatistics, cosmology,
    particle physics, statistical physics

(The 2pcf and 3pcf definitions appear on the slide as equations.)
107
3-point correlation
Standard model: n>0 terms should be zero!
How many triples have pairwise distances < r?
(Diagram: a triple with side lengths r1, r2, r3.)
108
How can we count n-tuples efficiently?
How many triples have pairwise distances < r?
109
Use n trees!
Gray & Moore, NIPS 2000
110
How many valid triangles a-b-c (where all pairwise distances < r) could there be?
count{A,B,C} = ?
(Diagram: tree nodes A, B, C.)
111
How many valid triangles a-b-c (where all pairwise distances < r) could there be?
count{A,B,C} = count{A,B,C.left} + count{A,B,C.right}
112
(Same as slide 111; animation frame.)
113
How many valid triangles a-b-c (where all pairwise distances < r) could there be?
count{A,B,C} = ?
114
How many valid triangles a-b-c (where all pairwise distances < r) could there be?
count{A,B,C} = 0!  (Exclusion: some pair of nodes is entirely farther apart than r.)
115
How many valid triangles a-b-c (where all pairwise distances < r) could there be?
count{A,B,C} = ?
116
How many valid triangles a-b-c (where all pairwise distances < r) could there be?
count{A,B,C} = |A| × |B| × |C|  (Inclusion: every pair of nodes lies entirely within r.)
117
Key idea (combinatorial proximity problems):
  • for n-tuples: n-tree recursion (sketched below)
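
A sketch of the n-tree recursion for triples (n = 3). The helpers min_dist, max_dist (box-to-box distance bounds), base_case (direct leaf scan), and the node fields are assumptions, not spelled out on the slides:

    def count_triples(A, B, C, r):
        """Count triples (a, b, c), one point from each node, with all
        pairwise distances < r."""
        pairs = [(A, B), (B, C), (A, C)]
        # Exclusion: some node pair is entirely farther apart than r.
        if any(min_dist(X, Y) >= r for X, Y in pairs):
            return 0
        # Inclusion: every node pair lies entirely within r.
        if all(max_dist(X, Y) < r for X, Y in pairs):
            return A.count * B.count * C.count
        # Otherwise split the largest non-leaf node and recurse.
        splittable = [n for n in (A, B, C) if not n.is_leaf]
        if not splittable:
            return base_case(A, B, C, r)        # direct triple loop over leaves
        X = max(splittable, key=lambda n: n.count)
        Y, Z = [n for n in (A, B, C) if n is not X]
        return count_triples(Y, Z, X.left, r) + count_triples(Y, Z, X.right, r)

In the monochromatic case the recursion must also avoid counting the same triple more than once; that bookkeeping is omitted here.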

118
3-point runtime
(Biggest previous: 20K.) VIRGO simulation data, N = 75,000,000:
naïve: 5×10⁹ sec. (150 years)
multi-tree: 55 sec. (exact)

Observed scaling: n = 2: O(N); n = 3: O(N^(log 3)); n = 4: O(N²)
119
But…
Depends on r^(D−1). Slow for large radii.
VIRGO simulation data, N = 75,000,000: naïve: 150 years; multi-tree, large h: 24 hrs.
Let's develop a method for large radii.
120
c = p · T
c: hard.  T: known.  p: EASIER?
No dependence on N! (but it does depend on p)
121–124
(Animation frames: c = p · T; no dependence on N, but it does depend on p.)
125
c = p · T
This is junk: don't bother.
126
c = p · T
This is promising
127
  • Basic idea
  • 1. Remove some junk
  • (Run exact algorithm for a while)
  • → make p larger
  • 2. Sample from the rest

128
Get disjoint sets from the recursion tree
(Diagram: the set of all possible n-tuples, with pruned regions removed.)
129
Now do stratified sampling
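
A hedged sketch of the stratified estimator: each surviving node-tuple from the pruned recursion is a stratum of known size, sampled independently (the sampling interface is invented for illustration):

    import numpy as np

    def stratified_estimate(strata, samples_per_stratum):
        """strata: list of (num_tuples, sample_is_valid) pairs, where
        sample_is_valid() draws one random tuple from the stratum and
        returns True if it satisfies the matcher."""
        total, variance = 0.0, 0.0
        for size, sample_is_valid in strata:
            hits = sum(sample_is_valid() for _ in range(samples_per_stratum))
            p_hat = hits / samples_per_stratum
            total += size * p_hat                              # stratum count
            variance += size**2 * p_hat * (1 - p_hat) / samples_per_stratum
        return total, np.sqrt(variance)    # estimate and its standard error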



130
Speedup Results
VIRGO simulation data, N = 75,000,000:
naïve: 150 years
multi-tree, large h: 24 hrs
multi-tree Monte Carlo, 99% confidence: 96 sec
131
Ideas
  1. Data structures and how to use them
  2. Monte Carlo
  3. Multipole methods
  4. Problem/solution abstractions

132
Kernel density estimation
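
(The estimator itself appears on the slide as an equation; in its standard form, a fact about KDE rather than anything specific to these slides, it is f̂(x_q) = (1/N) Σ_{r=1..N} K_h(‖x_q − x_r‖): a sum over all N reference points for each query, hence the O(N²) cost.)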
133
How to use a tree: 1. How to approximate? 2. When to approximate?
  • Barnes and Hut, Nature, 1986

(Diagram: node R of size s at distance r from the query; approximate if s/r is small enough.)
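
A sketch of a Barnes-Hut-style kernel sum under that rule, assuming each node caches its centroid and point count (the cached sufficient statistics mentioned with the animations earlier):

    import numpy as np

    def approx_kernel_sum(q, node, kernel, theta=0.5):
        """If the node looks small from q (s/r < theta), approximate its
        whole contribution by one kernel evaluation at the centroid;
        otherwise recurse (or scan directly, at a leaf)."""
        r = np.linalg.norm(q - node['centroid'])
        s = np.linalg.norm(node['hi'] - node['lo'])     # node size
        if r > 0 and s / r < theta:                     # far-field: approximate
            return node['count'] * kernel(r)
        if 'points' in node:                            # leaf: exact sum
            return sum(kernel(np.linalg.norm(q - x)) for x in node['points'])
        return (approx_kernel_sum(q, node['left'], kernel, theta) +
                approx_kernel_sum(q, node['right'], kernel, theta))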
134
How to use a tree: 3. How to know potential error?
  • Let's maintain bounds on the true kernel sum

(Diagram: node R, with the bounds at the beginning.)
135
How to use a tree: 4. How to do the "all" problem?
Single-tree
Dual-tree (symmetric)
Gray & Moore 2000
136
How to use a tree: 4. How to do the "all" problem?
(Diagram: query node Q of radius rQ and reference node R of radius rR, separation s; approximate when the node-pair criterion holds.)
  • Generalizes Barnes-Hut to dual-tree

137
BUT…
We have a tweak parameter.
Case 1: alg. gives no error bounds.
Case 2: alg. gives error bounds, but must be rerun.
Case 3: alg. automatically achieves error tolerance.
So far we have case 2; let's try for case 3.
Let's try to make an automatic stopping rule.
138
Finite-difference function approximation.
Taylor expansion: f(x) ≈ f(a) + f′(a)(x − a)
Gregory-Newton finite form: f(x) ≈ f(x₁) + [(f(x₂) − f(x₁)) / (x₂ − x₁)] (x − x₁)
139
Finite-difference function approximation.
Assumes a monotonically decreasing kernel.
Approximate the pair (Q, R) if the kernel varies little enough over the pair's distance range (the threshold appears as an equation on the slide; a sketch follows below).
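
A minimal sketch of the resulting per-pair stopping rule for a monotonically decreasing kernel K (threshold handling is illustrative):

    def pair_approximation(K, d_lo, d_hi, allowed_error):
        """All pairwise distances between nodes Q and R lie in [d_lo, d_hi],
        so each kernel value lies in [K(d_hi), K(d_lo)].  If that bracket
        is narrow, use its midpoint; otherwise signal 'recurse'."""
        k_max, k_min = K(d_lo), K(d_hi)        # K is decreasing
        worst_error = (k_max - k_min) / 2      # per-pair error bound
        if worst_error <= allowed_error:
            return (k_max + k_min) / 2         # finite-difference estimate
        return None                            # caller should recurse instead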

140
Speedup Results (KDE)

    N        naïve (sec)   dual-tree (sec)
    12.5K    7             0.12
    25K      31            0.31
    50K      123           0.46
    100K     494           1.0
    200K     1,976         2
    400K     7,904         5
    800K     31,616        10
    1.6M     35 hrs        23

5500x speedup at 1.6M points. One order-of-magnitude speedup over single-tree at 2M points.
141
Ideas
  1. Data structures and how to use them
  2. Monte Carlo
  3. Multipole methods
  4. Problem/solution abstractions

142
These are all examples of Generalized N-body
problems
All-NN, 2-point, 3-point, KDE, SPH
General theory and toolkit for designing
algorithms for such problems
143
For more
  • In this conference
  • Learning trees
  • Monte Carlo for statistical summations
  • Large-scale learning workshop
  • EMST
  • GNPs and MapReduce-like parallelization
  • Monte Carlo SVD
  • agray@cc.gatech.edu