Title: Parallel Programming for Particle Dynamics and Systems of Ordinary Differential Equations
1 Parallel Programming for Particle Dynamics and Systems of Ordinary Differential Equations
- Spring Semester 2005
- Geoffrey Fox
- Community Grids Laboratory
- Indiana University
- 505 N Morton
- Suite 224
- Bloomington IN
- gcf_at_indiana.edu
2 Abstract of Parallel Programming for Particle Dynamics
- This uses the simple O(N²) particle dynamics problem as a motivator to discuss the solution of ordinary differential equations
- We discuss Euler, Runge-Kutta and predictor-corrector methods
- Various message-parallel O(N²) algorithms are described with performance comments
- The initial discussion covers the types of problems that use these algorithms
- Advanced topics include fast multipole methods and their application to earthquake simulations
3 Classes of Physical Simulations
- Mathematical (numerical) formulations of simulations fall into a few key classes which have their own distinctive algorithmic and parallelism issues
- The most common formalism is that of a field theory, where quantities of interest are represented by densities defined over a 1, 2, 3 or 4 dimensional space
- Such a description could be fundamental, as in electromagnetism or relativity for the gravitational field, or approximate, as in CFD where a fluid density averages over a particle description
- Our Laplace example is of this form, where the field φ could either be fundamental (as in electrostatics) or approximate if it comes from the Euler equations for CFD
4 Applications Reducing to a Coupled Set of Ordinary Differential Equations
- Another set of models of physical systems represent them as a coupled set of discrete entities evolving over time
- Instead of φ(x,t) one gets φi(t), labeled by an index i
- Discretizing x in the continuous case leads to the discrete case, but in many cases the discrete formulation is fundamental
- Within the coupled discrete system class, one has two important approaches
- Classic time-stepped simulations -- loop over all i at fixed t, updating φi(t) to φi(t+δt)
- Discrete event simulations -- loop over all events representing changes of state of φi(t)
5 Particle Dynamics or Equivalent Problems
- Particles are sets of entities -- sometimes fixed (atoms in a crystal) or sometimes moving (galaxies in a universe)
- They are characterized by a force Fij on particle i due to particle j
- Forces are characterized by their range r: Fij(xi,xj) is zero if the distance |xi − xj| is greater than r
- Examples
- The universe
- A globular star cluster
- The atoms in a crystal vibrating under interatomic forces
- Molecules in a protein rotating and flexing under interatomic forces
- Laws of motion are typically ordinary differential equations
- Ordinary means differentiation with respect to one variable -- typically time
6 Classes of Particle Problems
- If the range r is small (as in a crystal), then one gets numerical formulations and parallel computing considerations similar to those in the Laplace example, with local communication
- We showed in the Laplace module that efficiency increases as the range of the force increases
- If r is infinite (no cut-off for the force), as in the gravitational problem, one finds rather different issues, which we will discuss in this module
- There are several non-particle problems discussed later that reduce to the long-range force problem, characterized by every entity interacting with every other entity
- Characterized by a calculation where updating entity i involves all other entities j
7 Circuit Simulations I
- An electrical or electronic network has the same structure as a particle problem, where the particles are components (transistor, resistance, inductance etc.) and the force between components i and j is nonzero if and only if i and j are linked in the circuit
- For simulations of electrical transmission networks (the electrical grid), one would naturally use classic time-stepped simulation, updating each component i from its state at time t to its state at time t+δt
- If one is simulating something like a chip, then the time-stepped approach is very wasteful, as 99.99% of the components are doing nothing (i.e. remain in the same state) at any given time step!
- Here is where discrete event simulations (DES) are useful, as one only computes where the action is
- Biological simulations are often formulated as networks where each component (say a neuron or a cell) is described by an ODE and the network couples components
8 Circuit Simulations II
- Discrete event simulations are clearly preferable on sequential machines, but parallel algorithms are hard due to the need for dynamic load balancing (events are dynamic and not uniform throughout the system) and synchronization (which events can be executed in parallel?)
- There are several important approaches to DES, of which the best known is the Time Warp method originally proposed by David Jefferson -- here one optimistically executes events in parallel and rolls back to an earlier state if this is found to be inconsistent
- Conservative methods (only execute those events you are certain cannot be impacted by earlier events) have little parallelism
- e.g. there is only one event with the lowest global time
- DES do not exhibit the classic loosely synchronous compute-communicate structure, as there is no uniform global time
- Typically, even with Time Warp, there is no scalable parallelism
9 Discrete Event Simulations
- Suppose we try to execute in parallel events E1 and E2 at times t1 and t2 with t1 < t2
- We show the timelines of several (4) objects in the system and our two events E1 and E2
- If E1 generates no interfering events, or one event E12 at a time greater than t2, then our parallel execution of E2 is consistent
- However, if E1 generates E12 before t2, then the execution of E2 has to be rolled back and E12 should be executed first
[Figure: timelines of four objects in the system plotted against time, showing events E1 and E2 and the generated events E11, E12, E21, E22]
10 Matrices and Graphs I
- Especially in cases where the force is linear in the φi(t), it is convenient to think of the force as being specified by a matrix M whose elements mij are nonzero if and only if the force between i and j is nonzero. A typical force law is Fi = Σj mij φj(t)
- In the Laplace equation example, the matrix M is sparse (most elements are zero), and this is an especially common case where one can and needs to develop efficient algorithms
- We discuss in another talk the matrix formulation in the case of partial differential equation solvers
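The linear force law above can be sketched in a few lines. This is my illustration, not code from the talk; the row-list representation of the sparse matrix M is an assumption chosen for simplicity.

```python
# Sketch of the linear force law F_i = sum_j m_ij * phi_j with a sparse M:
# each row of M is stored as a list of (j, m_ij) pairs for its nonzero entries.

def sparse_force(M_rows, phi):
    """M_rows[i]: list of (j, m_ij) pairs; returns the force on each entity i."""
    return [sum(m_ij * phi[j] for j, m_ij in row) for row in M_rows]

# Tiny example: entity 0 coupled to 1 and 2, entity 1 to 0, entity 2 isolated.
M_rows = [[(1, 2.0), (2, -1.0)],   # row 0
          [(0, 2.0)],              # row 1 (coupling back to 0)
          []]                      # row 2: no couplings
phi = [1.0, 3.0, 5.0]
print(sparse_force(M_rows, phi))   # [2*3 - 1*5, 2*1, 0] = [1.0, 2.0, 0.0]
```

Only the nonzero mij are stored and visited, which is exactly why the sparse (short-range) case admits efficient algorithms; a dense M would make every row length N.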
11 Matrices and Graphs II
- Another way of looking at these problems is as graphs G, where the nodes of the graph are labeled by the particles i, and one has an edge linking i to j if and only if the force Fij is nonzero
- In this language, long-range force problems correspond to a dense matrix M (all elements nonzero) and a fully connected graph G
[Figure: example graph with nodes labeled 1-12 and edges linking interacting particles]
12 Other N-Body Like Problems - I
- The characteristic structure of an N-body problem is an observable that depends on all pairs of entities from a set of N entities
- This structure is seen in diverse applications
- 1) Look at a database of items and calculate some form of correlation between all pairs of database entries
- 2) This was first used in studies of measurements of a "chaotic dynamical system" with points xi which are vectors of length m. Put rij = distance between xi and xj in m-dimensional space. Then the probability p(rij = r) is proportional to r^(d−1)
- where d (not equal to m) is the dynamical dimension of the system
- Calculate by forming all the rij (for i and j running over observable points from our system -- usually a time series) and accumulating them in a histogram of bins in r
- Parallel algorithm in a nutshell: store histograms replicated in all processors, distribute the vectors equally among the processors, and just pipeline the xj through the processors; as they pass through, accumulate the rij, and add the histograms together at the end
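The histogram algorithm in the last bullet can be simulated serially. A minimal sketch, assuming a round-robin distribution of points over P simulated processors; function and variable names are mine, and each unordered pair is counted twice (once from each endpoint's home slice), as in the pipeline where every xj passes every processor.

```python
# Sketch: all-pairs distance histogram via a simulated pipeline of P processors,
# each holding a slice of the points and a replicated histogram summed at the end.
import math

def histogram_pairs(points, nbins, rmax, P=4):
    slices = [points[p::P] for p in range(P)]     # distribute vectors equally
    hists = [[0] * nbins for _ in range(P)]       # replicated histograms
    for p, local in enumerate(slices):
        for xj in points:                          # every x_j passes through p
            for xi in local:
                r = math.dist(xi, xj)
                if 0 < r < rmax:                   # skip self-pairs (r == 0)
                    hists[p][int(nbins * r / rmax)] += 1
    return [sum(col) for col in zip(*hists)]       # add histograms together

pts = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]         # pair distances 5, 5, 10
print(histogram_pairs(pts, 12, 12.0, P=2))         # bin 5 -> 4, bin 10 -> 2
```

In the real message-parallel version the inner `for xj in points` loop is replaced by xj arriving in pipelined messages from the neighboring processor.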
13 Other N-Body Like Problems - II
- 3) The Green's function approach to simple partial differential equations gives solutions as integrals of known Green's functions times "source" or "boundary" terms
- For the simulation of earthquakes in the GEM project, the source terms are strains in the faults, and the stresses in any fault segment are the integral over the strains in all other segments
- Compared to particle dynamics, the force law is replaced by a Green's function, but in each case the total stress/force is a sum over contributions associated with the other entities in the formulation
- 4) In the so-called vortex method in CFD (Computational Fluid Dynamics), one models the Navier-Stokes equations as long-range interactions between entities which are the vortices
- 5) Chemistry uses molecular dynamics, so the particles are molecules, but the force is usually not Newton's law of gravitation but rather Van der Waals forces, which are long range but fall off faster than 1/r²
18 Euler's Method for ODEs
- Euler's method involves a simple linear extrapolation to proceed from point to point, with the ODE providing the slope
- Grid: ti = t0 + h·i, with X0 = initial value
- Xi+1 = Xi + h·f(ti, Xi)
- In practice it is not used, as it is not accurate enough
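The update rule above is a one-liner in a loop. A minimal sketch (my code, not from the slides):

```python
# Euler's method: X_{i+1} = X_i + h * f(t_i, X_i), stepped nsteps times.
import math

def euler(f, x0, t0, h, nsteps):
    t, x = t0, x0
    for _ in range(nsteps):
        x = x + h * f(t, x)   # linear extrapolation; the ODE provides the slope
        t = t + h
    return x

# Example: dX/dt = X with X(0) = 1, whose exact solution is e^t.
approx = euler(lambda t, x: x, 1.0, 0.0, 0.001, 1000)  # integrate to t = 1
print(approx, math.e)  # close to e, but only first-order accurate
```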
19 Estimate of Local Error in Euler's Method
- Let X(ti) be the numerical solution and Y(ti) the exact solution. We have the Taylor expansion Y(ti+h) = Y(ti) + h·Y'(ti) + O(h²)
- By assumption X(ti) = Y(ti), and in Euler's method we use the exact derivative at ti. Thus X(ti+h) = Y(ti+h) + O(h²), and so the local error is O(h²)
- Accumulating this error over 1/h steps gives a global error of order h
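The order-h global error can be checked numerically: halving h should roughly halve the error at a fixed final time. A sketch of that check (my code; the test problem dX/dt = X is an assumption chosen because its exact solution e^t is known):

```python
# Numerical check that Euler's global error is O(h).
import math

def euler(f, x0, h, nsteps):
    x, t = x0, 0.0
    for _ in range(nsteps):
        x += h * f(t, x)
        t += h
    return x

f = lambda t, x: x                                  # exact solution: e^t
err_h  = abs(math.e - euler(f, 1.0, 0.01,  100))    # step h = 0.01 to t = 1
err_h2 = abs(math.e - euler(f, 1.0, 0.005, 200))    # step h/2 to t = 1
print(err_h / err_h2)  # ~2: halving h halves the error, i.e. first order
```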
20 Relationship of Error to Computational Approach
- Whenever f satisfies certain smoothness conditions, there is always a sufficiently small step size h such that the difference between the real function value Yi+1 at ti+1 and the approximation Xi+1 is less than some required error magnitude ε (e.g. Burden and Faires)
- Euler's method is very quick, as there is one computation of the derivative function f at each step
- Other methods require more computation per step size h but can produce the specified error ε with fewer steps, as one can use a larger value of h. Thus Euler's method is not best.
21 Simple Example of Euler's Method
- The CSEP book (http://csep.hpcc.nectec.or.th/CSEP/) has lots of useful material. Consider a simple example from there
- With the trivial and uninteresting (as the step size is too large) choice h = 0.25, we get the values shown on the slide
[Worked-example equations shown on the original slide]
23 What's Wrong with Euler's Method?
- To extrapolate from ti to ti+1 one should (if using just a straight line) obviously best use the average slope (of the tangent) and NOT the value at the start
- This immediately gives an error smaller by a factor of h, but it isn't trivial, because you don't seem to know f(ti + 0.5h, X(ti + 0.5h))
24 Runge-Kutta Methods: Modified Euler I
- In the Runge-Kutta methods, one uses intermediate values to calculate such midpoint derivatives
- The key idea is to use an approximation for X(ti + 0.5h), as this is an argument of f, which is multiplied by h. Thus the error is (at least) one factor of h better than the approximation
- So if one wishes to do just one factor of h better than Euler, one can use Euler to estimate the value of X(ti + 0.5h)
25 Runge-Kutta Methods: Modified Euler II
- Midpoint method: k1 = f(ti, X(ti)); k2 = f(ti + 0.5h, X(ti + 0.5h)) with X(ti + 0.5h) estimated by Euler as Xi + 0.5h·k1; then Xi+1 = Xi + h·k2
- Global error is O(h²): a second-order method
[Figure: the tangent k1 at ti and the approximate tangent k2 at the midpoint, comparing the exact Yi+1 with the midpoint approximation Xi+1 over the interval from ti to ti+1]
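The midpoint step is barely more code than Euler. A sketch (mine, not from the slides), using the same test problem dX/dt = X so the improvement over Euler is visible:

```python
# Midpoint (modified Euler) method: second-order accurate.
import math

def midpoint(f, x0, h, nsteps):
    x, t = x0, 0.0
    for _ in range(nsteps):
        k1 = f(t, x)                            # slope at t_i
        k2 = f(t + 0.5 * h, x + 0.5 * h * k1)   # slope at Euler-estimated midpoint
        x += h * k2                             # step with the midpoint slope
        t += h
    return x

err = abs(math.e - midpoint(lambda t, x: x, 1.0, 0.01, 100))
print(err)  # ~4e-5, versus Euler's ~1.3e-2 at the same h = 0.01
```

The only extra cost is a second evaluation of f per step, which buys one more power of h in the global error.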
30 Time Discretization
- In ordinary differential equations, one focuses on different ways of representing dφi(t)/dt in some difference form, such as Euler's form dφi(t)/dt ≈ (φi(t+δt) − φi(t)) / δt
- Or more accurate Runge-Kutta and other formulae
- The same time discretization methods are applicable to (time derivatives in) partial differential equations involving time, and are used to approximate ∂φ(x,t)/∂t
34 The Mathematical Problem
- We have 3 long vectors of size N
- X = (X1, X2, X3, ..., XN)
- V = (V1, V2, V3, ..., VN)
- M = (M1, M2, M3, ..., MN), which is constant
- The critical equation is dVi/dt = sum over all other particles j of the force due to the j-th particle, which is proportional to Mj (Xj − Xi)/(distance between i and j)³; we term this sum Grav(X, M)
- We call the message-parallel (MPI) implementation of the calculation of Grav the function MPGrav
35 The Problem
- d(X,V)/dt = function of X, V, M
- Where each of X, V and M is an N-dimensional array
- And there are some scalars, such as the gravitational constant
36 Essential Ideas
- The array size N ranges from 100,000 (chemistry) to 100,000,000 (cosmology)
- In the simplest case of Euler's method
- X(t+dt) = X(t) + h·V(t)
- V(t+dt) = V(t) + h·Grav(vectors X and M)
- Runge-Kutta is more complicated, but this doesn't affect the essential issues for parallelism
- The X equation needs to be calculated for each component i; it takes time of order N
- The V equation also needs to be calculated for each component i, but now each component is complicated and needs a computation itself of O(N); the total time is of order N² and dominates
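The O(N) versus O(N²) split is easy to see in code. A serial sketch of the two update equations (my code; MPGrav is the talk's message-parallel version, and this plain `grav` is a 1D stand-in with G = 1):

```python
# Serial sketch of one Euler step for the N-body problem.

def grav(X, M):
    """O(N^2): acceleration on each particle, sum over j != i of
    M[j]*(X[j]-X[i]) / |X[j]-X[i]|^3 (1D, units with G = 1)."""
    N = len(X)
    A = [0.0] * N
    for i in range(N):
        for j in range(N):
            if i != j:
                d = X[j] - X[i]
                A[i] += M[j] * d / abs(d) ** 3
    return A

def euler_step(X, V, M, h):
    A = grav(X, M)                              # O(N^2) part: dominates
    Xn = [x + h * v for x, v in zip(X, V)]      # X equation: O(N)
    Vn = [v + h * a for v, a in zip(V, A)]      # V equation: O(N) once A is known
    return Xn, Vn
```

The X update and the V update proper are both O(N); all the cost, and all the communication in the parallel version, lives inside `grav`.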
37 Form of Parallel Computation
- The computation of the numerical method is inherently iterative: at each time step, the solution depends on the immediately preceding one
- At each time step, MPGrav is called (several times, as we are using Runge-Kutta)
- For each particle i, one must sum over the forces due to all other particles j
- This requires communication, as information about other particles is stored in other processors
- We will use fourth-order Runge-Kutta to integrate in time, and the program is designed around an overall routine looping over time, with parallelism hidden in the MPGrav routines and so identical between parallel and sequential versions
39 Status of Parallelism in Various N-Body Cases
- The data parallel approach is really only useful for the simple O(N²) case, and even here it is quite tricky to express the algorithm so that it is
- both space efficient and
- captures the factor of 2 savings from Newton's law of action and reaction, Fij = −Fji
- We have discussed these issues in a different foilset
- The shared memory approach is effective for a modest number of processors in both algorithms
- It is only likely to scale properly for the O(N²) case, as the compiler will find it hard to capture the clever ideas needed to make fast multipole efficient
- The message parallel approach gives you very efficient algorithms in both the O(N²) and O(N log N) cases
- The O(N²) case has very low communication
- The O(N log N) case has quite low communication
51 Relation to General Speed Up and Efficiency Analysis
- We discussed, in the case of the Jacobi example, that in many problems there is an elegant formula fcomm = constant · tcomm / (n^(1/d) · tfloat)
- d is the system information dimension, which is equal to the geometric dimension in problems like Jacobi iteration, where communication is a surface effect and calculation a volume effect
- This geometric type of formula arises in any case where we are solving a partial differential equation by a local algorithm (locality corresponds to geometry)
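The formula above can be evaluated directly. A tiny sketch (mine; the constant is taken as 1 and the n, tcomm, tfloat values are illustrative, not from the talk):

```python
# f_comm = constant * t_comm / (n^(1/d) * t_float), n = grain size per processor.
def f_comm(n, d, t_comm, t_float, const=1.0):
    return const * t_comm / (n ** (1.0 / d) * t_float)

# Surface/volume effect: at fixed grain size n, a higher information dimension d
# means a smaller n^(1/d) and hence relatively more communication overhead.
print(f_comm(n=10000, d=2, t_comm=10.0, t_float=1.0))  # 10/10000^(1/2) = 0.1
print(f_comm(n=10000, d=3, t_comm=10.0, t_float=1.0))  # 10/10000^(1/3) ~ 0.46
```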
54 Factor of 2 in N-Body Algorithm
- Symmetry of the force on particles: Fij = −Fji (Newton's law of action and reaction!)
- Only half the forces need to be computed
- Sequentially this can easily be taken into account: one just loops with a sum over particles i, then sums for the force calculation over particles j < i, calculates the algebraic form of Fij, and accumulates -- the force on i is incremented by Fij and the force on j is decremented by Fij
- The parallel version has two issues
- Firstly, one cannot use the "owner-computes" rule directly, as typically i and j are stored in different processors and one must calculate Fij in the wrong processor for either i or j (in the algorithm above, the force on the particle with the larger index is calculated in the wrong processor)
- Secondly, one must worry about load balancing, as processors responsible for low-numbered particles will have more work
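The sequential half-sum loop described above looks like this (my sketch, 1D gravity with G = 1):

```python
# Factor-of-2 force loop: compute F_ij only for j < i and apply it with both
# signs, using F_ij = -F_ji (Newton's law of action and reaction).

def forces_half(X, M):
    N = len(X)
    F = [0.0] * N
    for i in range(N):
        for j in range(i):                           # j < i: each pair once
            d = X[j] - X[i]
            f_ij = M[i] * M[j] * d / abs(d) ** 3     # force on i due to j
            F[i] += f_ij                             # force on i incremented
            F[j] -= f_ij                             # force on j decremented
    return F
```

Note the load imbalance the slide warns about: in this loop, particle i's row does i force evaluations, so a block decomposition over i gives high-numbered rows far more work, which motivates the scattered decomposition used later.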
59 Best Message Parallel N-Body Algorithm - III
- So the owner-computes rule is preserved, as the home processor for particle i accumulates the force on i and calculates its acceleration and the change in its velocity and position
- Ignoring blocking, we send a message for each particle around in a pipeline that visits each other processor
- The message contains the mass, the position, and the off-processor contribution to the force, which is incremented as the message travels through the processors
- The message returns to the home processor for each particle, and the travelling force is added into the part of the force that had been accumulated in the home processor
60 Best Message Parallel N-Body Algorithm - IV
- The key characteristic of the algorithm is rotating messages containing data about, and updating the force for, each particle
- The essential idea for efficiency is that when a message arrives in a new processor, it is used to update all the particles that it should in that processor, so the message visits each processor once and once only
- On average a message only updates 50% of the particles in each new processor
- Load balancing requires some sort of scattered decomposition, so each processor has as many high- as low-numbered particles
- Insensitivity to message latency requires that we block messages and send several at a time
- This algorithm is too hard to be found by a compiler unless you build it in
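The rotating-message scheme of slides 59-60 can be simulated in plain Python. This is my serial sketch, not real MPI code: each "processor" owns a scattered slice of particles, and the message for particle i (mass, position, accumulated force) visits every other processor once before coming home. The factor-of-2 and message-blocking optimizations are deliberately omitted for clarity.

```python
# Serial simulation of the pipelined ring of messages (1D gravity, G = 1).

def pairwise(m1, x1, m2, x2):
    d = x2 - x1
    return m1 * m2 * d / abs(d) ** 3               # force on particle 1 due to 2

def ring_forces(X, M, P):
    N = len(X)
    owner = [i % P for i in range(N)]               # scattered decomposition
    total = [0.0] * N
    for i in range(N):
        m, x, f = M[i], X[i], 0.0                   # the message for particle i
        for hop in range(1, P):                     # visit each other processor once
            p = (owner[i] + hop) % P
            for j in range(N):
                if owner[j] == p:                   # update against p's particles
                    f += pairwise(m, x, M[j], X[j])
        for j in range(N):                          # message home: add the force
            if owner[j] == owner[i] and j != i:     # accumulated among home particles
                f += pairwise(m, x, M[j], X[j])
        total[i] = f                                # owner-computes: home has it all
    return total
```

In a real MPI version the `hop` loop becomes a shift of message blocks to the neighboring processor, with all processors' messages rotating simultaneously, so the O(N²) work stays evenly spread while communication per step is only O(N/P) items.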