Parallel Programming for Particle Dynamics and Systems of Ordinary Differential Equations - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Parallel Programming for Particle Dynamics and
Systems of Ordinary Differential Equations
  • Spring Semester 2005
  • Geoffrey Fox
  • Community Grids Laboratory
  • Indiana University
  • 505 N Morton
  • Suite 224
  • Bloomington IN
  • gcf@indiana.edu

2
Abstract of Parallel Programming for Particle
Dynamics
  • This uses the simple O(N²) particle dynamics
    problem as a motivator to discuss the solution of
    ordinary differential equations
  • We discuss Euler, Runge-Kutta and predictor-corrector
    methods
  • Various message-parallel O(N²) algorithms are
    described with performance comments
  • The initial discussion covers the type of
    problems that use these algorithms
  • Advanced topics include fast multipole methods
    and their application to earthquake simulations

3
Classes of Physical Simulations
  • Mathematical (Numerical) formulations of
    simulations fall into a few key classes which
    have their own distinctive algorithmic and
    parallelism issues
  • Most common formalism is that of a field theory
    where quantities of interest are represented by
    densities defined over a 1,2,3 or 4 dimensional
    space.
  • Such a description could be fundamental, as in
    electromagnetism or relativity for the gravitational
    field, or approximate, as in CFD where a fluid
    density averages over a particle description.
  • Our Laplace example is of this form, where the field φ
    could either be fundamental (as in electrostatics) or
    approximate if it comes from the Euler equations for CFD

4
Applications reducing to Coupled set of Ordinary
Differential Equations
  • Another set of models of physical systems
    represents them as a coupled set of discrete
    entities evolving over time
  • Instead of φ(x,t) one gets φi(t), labeled by an
    index i
  • Discretizing x in the continuous case leads to the
    discrete case, but in many cases the discrete
    formulation is fundamental
  • Within the coupled discrete system class, one has two
    important approaches
  • Classic time-stepped simulations -- loop over all
    i at fixed t, updating φi(t) to φi(t + Δt)
  • Discrete event simulations -- loop over all
    events representing changes of state of φi(t)

5
Particle Dynamics or Equivalent Problems
  • Particles are sets of entities -- sometimes fixed
    (atoms in a crystal) or sometimes moving
    (galaxies in a universe)
  • They are characterized by force Fij on particle i
    due to particle j
  • Forces are characterized by their range r:
    Fij(xi, xj) is zero if the distance |xi − xj| is
    greater than r
  • Examples
  • The universe
  • A globular star cluster
  • The atoms in a crystal vibrating under
    interatomic forces
  • Molecules in a protein rotating and flexing under
    interatomic force
  • Laws of Motion are typically ordinary
    differential equations
  • Ordinary means differentiate wrt one variable --
    typically time

6
Classes of Particle Problems
  • If the range r is small (as in a crystal), then
    one gets numerical formulations and parallel
    computing considerations similar to those in the
    Laplace example, with local communication
  • We showed in Laplace module that efficiency
    increases as range of force increases
  • If r is infinite (no cut-off for the force), as in the
    gravitational problem, one finds rather different
    issues, which we will discuss in this module
  • There are several non-particle problems,
    discussed later, that reduce to the long-range force
    problem characterized by every entity interacting
    with every other entity
  • This is characterized by a calculation where updating
    entity i involves all other entities j

7
Circuit Simulations I
  • An electrical or electronic network has the same
    structure as a particle problem where particles
    are components (transistor, resistance,
    inductance etc.) and force between components i
    and j is nonzero if and only if i and j are
    linked in the circuit
  • For simulations of electrical transmission
    networks (the electrical grid), one would
    naturally use classic time-stepped simulations,
    updating each component i from its state at time t to
    its state at time t + Δt.
  • If one is simulating something like a chip, then the
    time-stepped approach is very wasteful, as 99.99%
    of the components are doing nothing (i.e. remain
    in the same state) at any given time step!
  • Here is where discrete event simulations (DES)
    are useful as one only computes where the action
    is
  • Biological simulations often are formulated as
    networks where each component (say a neuron or a
    cell) is described by an ODE and the network
    couples components

8
Circuit Simulations II
  • Discrete Event Simulations are clearly preferable
    on sequential machines but parallel algorithms
    are hard due to need for dynamic load balancing
    (events are dynamic and not uniform throughout
    system) and synchronization (which events can be
    executed in parallel?)
  • There are several important approaches to DES of
    which best known is Time Warp method originally
    proposed by David Jefferson -- here one
    optimistically executes events in parallel and
    rolls back to an earlier state if this is found
    to be inconsistent
  • Conservative methods (only execute those events
    you are certain cannot be impacted by earlier
    events) have little parallelism
  • e.g. there is only one event with lowest global
    time
  • DES do not exhibit the classic loosely
    synchronous compute-communicate structure as
    there is no uniform global time
  • typically even with time warp, no scalable
    parallelism

9
Discrete Event Simulations
  • Suppose we try to execute in parallel events E1
    and E2 at times t1 and t2 with t1 < t2.
  • We show the timelines of several (4) objects in
    the system and our two events E1 and E2
  • If E1 generates no interfering events or one E12
    at a time greater than t2 then our parallel
    execution of E2 is consistent
  • However if E1 generates E12 before t2 then
    execution of E2 has to be rolled back and E12
    should be executed first

[Figure: timelines of four objects in the system, showing events E1 at t1 and E2 at t2 and the generated events E11, E12, E21, E22 as time advances.]
10
Matrices and Graphs I
  • Especially in cases where the force is linear
    in the φi(t), it is convenient to think of the force
    being specified by a matrix M whose elements mij
    are nonzero if and only if the force between i
    and j is nonzero. A typical force law is
    Fi = Σj mij φj(t) (a small numerical sketch follows
    at the end of this slide)
  • In the Laplace equation example, the matrix M is
    sparse (most elements are zero), and this is an
    especially common case where one can and needs to
    develop efficient algorithms
  • We discuss in another talk the matrix formulation
    in the case of partial differential solvers
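
A minimal sketch of the linear force law and the sparse case above; a hypothetical illustration using SciPy's sparse matrices, not code from the slides.

    import numpy as np
    from scipy.sparse import random as sparse_random

    n = 6
    M = sparse_random(n, n, density=0.3, format="csr")  # sparse coupling matrix m_ij
    phi = np.ones(n)                                     # field values phi_j(t)
    F = M @ phi                                          # F_i = sum_j m_ij phi_j(t)
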

11
Matrices and Graphs II
  • Another way of looking at these problems is as
    graphs G where the nodes of the graphs are
    labeled by the particles i, and one has edges
    linking i to j if and only if the force Fij is
    non zero
  • In this language, long-range force problems
    correspond to a dense matrix M (all elements
    nonzero) and a fully connected graph G

[Figure: example graph with 12 numbered nodes; edges link nodes i and j where the force Fij is nonzero.]
12
Other N-Body Like Problems - I
  • The characteristic structure of the N-body problem is
    an observable that depends on all pairs of
    entities from a set of N entities.
  • This structure is seen in diverse applications
  • 1) Look at a database of items and calculate some
    form of correlation between all pairs of database
    entries
  • 2) This was first used in studies of measurements
    of a "chaotic dynamical system" with points xi
    which are vectors of length m. Put rij = distance
    between xi and xj in m-dimensional space. Then the
    probability density for rij is proportional to r^(d−1)
  • where d (not equal to m) is the dynamical dimension
    of the system
  • One calculates d by forming all the rij (for i and j
    running over observed points from our system --
    usually a time series) and accumulating them in a
    histogram of bins in r
  • Parallel algorithm in a nutshell: store the
    histogram replicated in all processors,
    distribute the vectors equally across the processors,
    and just pipeline the xj through the processors; as they
    pass through, accumulate the rij, and add the histograms
    together at the end (a sequential sketch of the
    histogram step follows below).
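
A minimal sequential NumPy sketch of the histogram accumulation described above (the quantity each processor would build for its local pairs in the parallel version). The function name and bin settings are illustrative, not from the slides.

    import numpy as np

    def correlation_histogram(points, n_bins=50, r_max=1.0):
        """Accumulate a histogram of all pairwise distances r_ij.

        points : array of shape (N, m) -- N observed vectors of length m.
        The small-r behaviour of the histogram gives the dynamical dimension d,
        since the density of r_ij near r behaves like r^(d-1).
        """
        n = len(points)
        counts = np.zeros(n_bins)
        edges = np.linspace(0.0, r_max, n_bins + 1)
        for i in range(n):
            # distances from point i to all later points j > i (each pair once)
            r_ij = np.linalg.norm(points[i + 1:] - points[i], axis=1)
            hist, _ = np.histogram(r_ij, bins=edges)
            counts += hist
        return edges, counts
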

13
Other N-Body Like Problems - II
  • 3) Green's Function Approach to simple Partial
    Differential equations gives solutions as
    integrals of known Green's functions times
    "source" or "boundary" terms.
  • For the simulation of earthquakes in the GEM project,
    the source terms are strains in faults, and the
    stresses in any fault segment are the integral
    over strains in all other segments
  • Compared to particle dynamics, the force law is replaced
    by the Green's function, but in each case the total
    stress/force is a sum over contributions associated
    with the other entities in the formulation
  • 4) In the so-called vortex method in CFD
    (Computational Fluid Dynamics), one models the
    Navier-Stokes equation as the long-range
    interactions between entities which are the
    vortices
  • 5) Chemistry uses molecular dynamics, so the
    particles are molecules, but the force is usually not
    Newton's gravitational law but rather Van der Waals
    forces, which are long range but fall off faster than 1/r²

14
(No Transcript)
15
(No Transcript)
16
(No Transcript)
17
(No Transcript)
18
Euler's Method for ODEs
  • Euler's method involves a simple linear
    extrapolation to proceed from point to point, with
    the ODE providing the slope
  • Grid: ti = t0 + h·i, with X0 = the initial value
  • Xi+1 = Xi + h f(ti, Xi)
  • In practice it is not used, as it is not accurate
    enough (a code sketch follows below)
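
A minimal Python sketch of the Euler step above, assuming the ODE dX/dt = f(t, X) is supplied as a Python function; all names here are illustrative.

    def euler(f, t0, x0, h, n_steps):
        """Integrate dX/dt = f(t, X) by Euler's method from X0 at t0.

        Each step applies X_{i+1} = X_i + h * f(t_i, X_i).
        Returns the lists of times and solution values.
        """
        ts, xs = [t0], [x0]
        t, x = t0, x0
        for _ in range(n_steps):
            x = x + h * f(t, x)   # linear extrapolation with slope f(t_i, X_i)
            t = t + h
            ts.append(t)
            xs.append(x)
        return ts, xs

    # Example: dX/dt = -X with X(0) = 1; the exact solution is exp(-t)
    ts, xs = euler(lambda t, x: -x, 0.0, 1.0, 0.25, 8)
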

19
Estimate of Local Error in Euler's Method
  • Let X(ti) be the numerical solution and Y(ti) the
    exact solution. We have the Taylor expansion
    Y(ti + h) = Y(ti) + h Y'(ti) + O(h²)
  • By assumption X(ti) = Y(ti), and in Euler's
    method we use the exact derivative at ti. Thus
    X(ti + h) = Y(ti + h) + O(h²), and so the local error
    is O(h²) (the steps are written out below)
  • Accumulating this error over 1/h steps gives a
    global error of order h
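
A compact LaTeX version of the error estimate above, reconstructed from the surrounding text; the remainder form of the Taylor expansion is the standard one and was not on the transcribed slide.

    \begin{aligned}
    Y(t_i + h) &= Y(t_i) + h\,Y'(t_i) + \tfrac{h^2}{2}\,Y''(\xi),
                 \qquad \xi \in (t_i, t_i + h) \\
    X(t_i + h) &= X(t_i) + h\,f(t_i, X(t_i)) = Y(t_i) + h\,Y'(t_i)
                 \qquad \text{since } X(t_i) = Y(t_i) \\
    X(t_i + h) - Y(t_i + h) &= -\tfrac{h^2}{2}\,Y''(\xi) = O(h^2)
    \end{aligned}
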

20
Relationship of Error to Computational Approach
  • Whenever f satisfies certain smoothness
    conditions, there is always a sufficiently small
    step size h such that the difference between the
    exact function value Yi+1 at ti+1 and the
    approximation Xi+1 is less than some required
    error magnitude ε (see e.g. Burden and Faires)
  • Euler's method is very quick, as it needs one
    evaluation of the derivative function f at each step.
  • Other methods require more computation per step of
    size h but can produce the specified error ε with
    fewer steps, as they can use a larger value of h. Thus
    Euler's method is not the best.

21
Simple Example of Euler's Equation
  • The CSEP book (http://csep.hpcc.nectec.or.th/CSEP/)
    has lots of useful material. Consider a simple
    example from there
  • With the trivial and uninteresting (the step size is
    too large) choice h = 0.25, we get the values shown
    on the next slide
22
(No Transcript)
23
What's wrong with Euler's Method?
  • To extrapolate from ti to ti+1 one should
    obviously best use (if just a straight line) the
    average slope of the tangent, and NOT the value at
    the start
  • This immediately gives an error smaller by a factor of
    h, but it isn't trivial because you don't seem to
    know f(ti + 0.5h, X(ti + 0.5h))

24
Runge-Kutta Methods: Modified Euler I
  • In the Runge-Kutta methods, one uses intermediate
    values to calculate such midpoint derivatives
  • The key idea is to use an approximation for
    X(ti + 0.5h), as this is an argument of f which is
    multiplied by h. Thus the error is (at least) one
    factor of h better than that of the approximation
  • So if one wishes just to do one factor of h
    better than Euler, one can use Euler to estimate the
    value of X(ti + 0.5h)

25
Runge-Kutta Methods: Modified Euler II
  • The Midpoint Method is
    ξ1 = f(ti, X(ti))
    ξ2 = f(ti + 0.5h, X(ti + 0.5h)), with X(ti + 0.5h)
    estimated by an Euler half-step as Xi + 0.5h ξ1
    Xi+1 = Xi + h ξ2
    (a code sketch follows at the end of this slide)

Global error is O(h²): a second-order method
[Figure: midpoint-method geometry -- ξ1 is the tangent at ti, ξ2 the approximate tangent at the midpoint; the midpoint approximation Xi+1 is compared with the exact Yi+1 over the interval from ti to ti+1, starting from Xi = Yi.]
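
A minimal Python sketch of the midpoint (modified Euler) step above; as on the previous slide, the midpoint value X(ti + 0.5h) is estimated with an Euler half-step. Names are illustrative.

    def midpoint_step(f, t, x, h):
        """One step of the midpoint (modified Euler) method for dX/dt = f(t, X)."""
        xi1 = f(t, x)                # slope at the start of the interval
        x_mid = x + 0.5 * h * xi1    # Euler estimate of X(t + 0.5h)
        xi2 = f(t + 0.5 * h, x_mid)  # approximate slope at the midpoint
        return x + h * xi2           # X_{i+1} = X_i + h * xi2

    def midpoint(f, t0, x0, h, n_steps):
        """Repeat the midpoint step; the global error is O(h^2)."""
        t, x = t0, x0
        for _ in range(n_steps):
            x = midpoint_step(f, t, x, h)
            t += h
        return t, x
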
26
(No Transcript)
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
Time Discretization
  • In ordinary differential equations, one focuses
    on different ways of representing dφi(t)/dt in some
    difference form, such as Euler's form
    dφi(t)/dt ≈ (φi(t + Δt) − φi(t)) / Δt
  • Or the more accurate Runge-Kutta and other formulae
  • The same time discretization methods are
    applicable to (time derivatives in) partial
    differential equations involving time and are
    used to approximate ∂φ(x,t) / ∂t

31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
The Mathematical Problem
  • We have 3 long vectors of size N
  • X = (X1, X2, X3, ..., XN)
  • V = (V1, V2, V3, ..., VN)
  • M = (M1, M2, M3, ..., MN), which is constant
  • The critical equation is dVi/dt = sum over all
    other particles j of the force due to the j-th
    particle, which is proportional to (Xj − Xi) / (distance
    between i and j)³; we term this sum Grav(X, M)
  • We call the message-parallel (MPI) implementation
    of the calculation of Grav the function MPGrav
    (a sequential sketch of Grav follows below)
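
A minimal sequential NumPy sketch of the O(N²) Grav computation. It assumes the standard gravitational form with a G·Mj factor multiplying (Xj − Xi)/r³ (only the geometric factor appears on the slide); the function and variable names are illustrative.

    import numpy as np

    def grav(X, M, G=1.0):
        """O(N^2) right-hand side dV_i/dt for every particle i.

        X : (N, 3) positions, M : (N,) masses.  Row i of the result is
        sum over j != i of G * M_j * (X_j - X_i) / |X_j - X_i|^3.
        """
        n = len(X)
        acc = np.zeros_like(X)
        for i in range(n):
            dX = X - X[i]                    # (X_j - X_i) for all j
            r = np.linalg.norm(dX, axis=1)
            r[i] = 1.0                       # avoid division by zero at j = i
            contrib = G * M[:, None] * dX / r[:, None] ** 3
            contrib[i] = 0.0                 # no self-force
            acc[i] = contrib.sum(axis=0)
        return acc
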

35
The Problem
  • d(X, V)/dt = function of (X, V, M)
  • where each of X, V and M is an N-dimensional array
  • and there are some scalars such as the gravitational
    constant

36
Essential Ideas
  • The array size N is from 100,000 (chemistry) to
    100,000,000 (cosmology)
  • In the simplest case of Euler's method
  • X(t + dt) = X(t) + h V(t)
  • V(t + dt) = V(t) + h Grav(X, M)
  • Runge-Kutta is more complicated but this doesn't
    affect the essential issues for parallelism
  • The X equation needs to be calculated for each
    component i; it takes time of order N
  • The V equation also needs to be calculated for each
    component i, but now each component is complicated
    and needs a computation that is itself O(N); the
    total time is of order N² and dominates (see the
    sketch below)
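
A minimal sketch of this Euler update, with the O(N²) force routine passed in (for example the grav sketch from the previous slide); the slides themselves use fourth-order Runge-Kutta, so this is only an illustration of the cost structure.

    def euler_nbody_step(X, V, M, h, force):
        """One Euler step of the N-body system.

        The X update costs O(N); the V update calls force(X, M), which
        costs O(N^2) and therefore dominates the run time.
        """
        X_new = X + h * V             # X(t + dt) = X(t) + h V(t)
        V_new = V + h * force(X, M)   # V(t + dt) = V(t) + h Grav(X, M)
        return X_new, V_new
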

37
Form of Parallel Computation
  • Computation of the numerical method is inherently
    iterative: at each time step, the solution
    depends on the immediately preceding one.
  • At each time step, MPGrav is called (several
    times, as we are using Runge-Kutta)
  • For each particle i, one must sum over the forces
    due to all other particles j.
  • This requires communication, as information about
    other particles is stored in other processors
  • We will use 4th-order Runge-Kutta to integrate in
    time, and the program is designed around an
    overall routine looping over time, with
    parallelism hidden in the MPGrav routine and so
    identical between the parallel and sequential
    versions (the time loop is sketched below)
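
A minimal sketch of such a driver, assuming a force routine with the same interface as the grav sketch above (in the parallel code it would be MPGrav); this is the classical RK4 update for the combined (X, V) state and an illustration, not the original program.

    def rk4_step(X, V, M, h, force):
        """One 4th-order Runge-Kutta step for dX/dt = V, dV/dt = force(X, M).

        The force routine is evaluated four times per step; in the parallel
        version it is MPGrav, and all parallelism is hidden inside that call.
        """
        k1x, k1v = V,                 force(X, M)
        k2x, k2v = V + 0.5 * h * k1v, force(X + 0.5 * h * k1x, M)
        k3x, k3v = V + 0.5 * h * k2v, force(X + 0.5 * h * k2x, M)
        k4x, k4v = V + h * k3v,       force(X + h * k3x, M)
        X_new = X + (h / 6.0) * (k1x + 2 * k2x + 2 * k3x + k4x)
        V_new = V + (h / 6.0) * (k1v + 2 * k2v + 2 * k3v + k4v)
        return X_new, V_new

    def evolve(X, V, M, h, n_steps, force):
        """Overall time loop, identical in the sequential and parallel versions."""
        for _ in range(n_steps):
            X, V = rk4_step(X, V, M, h, force)
        return X, V
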

38
(No Transcript)
39
Status of Parallelism in Various N-Body Cases
  • The data-parallel approach is really only useful for
    the simple O(N²) case, and even here it is quite
    tricky to express the algorithm so that it is
  • both space efficient and
  • captures the factor-of-2 savings from Newton's law of
    action and reaction, Fij = −Fji
  • We have discussed these issues in a different
    foilset
  • The shared-memory approach is effective for a
    modest number of processors in both algorithms.
  • It is only likely to scale properly for the O(N²)
    case, as the compiler will find it hard to capture the
    clever ideas needed to make fast multipole
    efficient
  • The message-parallel approach gives you very
    efficient algorithms in both the O(N²) and O(N log N)
    cases
  • The O(N²) case has very low communication
  • O(N log N) has quite low communication

40
(No Transcript)
41
(No Transcript)
42
(No Transcript)
43
(No Transcript)
44
(No Transcript)
45
(No Transcript)
46
(No Transcript)
47
(No Transcript)
48
(No Transcript)
49
(No Transcript)
50
(No Transcript)
51
Relation to General Speed-Up and Efficiency Analysis
  • We discussed, in the case of the Jacobi example, that in
    many problems there is an elegant formula
    fcomm = constant · tcomm / (n^(1/d) · tfloat)
  • d is the system information dimension, which is equal
    to the geometric dimension in problems like Jacobi
    iteration where communication is a surface and
    calculation a volume effect
  • This geometric type of formula arises in any case
    where we are solving a partial differential
    equation by a local algorithm (locality corresponds
    to geometry)

52
(No Transcript)
53
(No Transcript)
54
Factor of 2 in N-body Algorithm
  • Symmetry of the force on particles: Fij = −Fji
    (Newton's law of action and reaction!)
  • Only half of the forces need to be computed
  • Sequentially this can easily be taken into
    account: one just loops with an outer sum over
    particles i and an inner sum for the force calculation
    over particles j < i, calculates the
    algebraic form of Fij and then accumulates --
    the force on i is incremented by Fij and the force on j
    is decremented by Fij (see the sketch below)
  • The parallel version has two issues
  • Firstly, one cannot use the "owner-computes" rule
    directly, as typically i and j are stored in
    different processors and one must calculate Fij
    in the wrong processor for either i or j (in the
    algorithm above, the force on the particle with
    the larger index is calculated in the wrong
    processor)
  • Secondly, one must worry about load balancing, as
    processors responsible for low-numbered
    particles will have more work
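
A minimal sequential sketch of the factor-of-2 loop structure above, assuming a pairwise routine force_ij (a hypothetical name) that returns the force on particle i due to particle j.

    import numpy as np

    def accumulate_forces(X, M, force_ij):
        """Accumulate forces using F_ij = -F_ji, so each pair is computed once.

        force_ij(X, M, i, j) returns the force on particle i due to particle j.
        """
        n = len(X)
        F = np.zeros_like(X)
        for i in range(n):
            for j in range(i):          # only pairs with j < i
                f = force_ij(X, M, i, j)
                F[i] += f               # force on i incremented by F_ij
                F[j] -= f               # force on j decremented (action-reaction)
        return F
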

55
(No Transcript)
56
(No Transcript)
57
(No Transcript)
58
(No Transcript)
59
Best Message Parallel N Body Algorithm - III
  • So the owner-computes rule is preserved, as the home
    processor for particle i accumulates the force on i
    and calculates its acceleration and change in
    velocity and position
  • Ignoring blocking, we send a message for each
    particle around in a pipeline that visits each
    other processor
  • The message contains the mass, position and the
    off-processor contribution to the force, which is
    incremented as the message travels through the
    processors
  • The message returns to the home processor for each
    particle, and the travelling force is added into
    the part of the force that had been accumulated
    in the home processor.

60
Best Message Parallel N Body Algorithm - IV
  • The key characteristic of the algorithm is rotating
    messages containing data about, and updating the
    force for, each particle
  • The essential idea for efficiency is that when a
    message arrives in a new processor, it is used to
    update all the particles that it should in that
    processor, so each message visits each processor
    once and once only
  • On average a message only updates 50% of the particles
    in each new processor
  • Load balancing requires some sort of scattered
    decomposition so that each processor has as many
    high- as low-numbered particles
  • Insensitivity to message latency requires that we
    block messages and send several at a time
  • This algorithm is too hard to be found by a
    compiler unless you build it in (a message-passing
    sketch follows below).
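
A minimal mpi4py sketch of the rotating-message pipeline, assuming each processor owns a block of particles. For clarity it sends one message per processor (a whole local block) rather than per particle, and it omits the factor-of-2 pairing rule, the scattered decomposition and the message blocking discussed above, so every visited processor adds the full force on the travelling particles. All names are illustrative.

    # Run with e.g.:  mpiexec -n 4 python nbody_pipeline.py
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    def partial_grav(X_target, X_source, M_source, G=1.0):
        """Force on each target particle due to all source particles."""
        F = np.zeros_like(X_target)
        for i in range(len(X_target)):
            dX = X_source - X_target[i]
            r = np.linalg.norm(dX, axis=1)
            r[r == 0.0] = np.inf              # drop the self-interaction term
            F[i] = np.sum(G * M_source[:, None] * dX / r[:, None] ** 3, axis=0)
        return F

    def mp_grav(X_local, M_local):
        """Total force on local particles via a pipeline visiting every processor."""
        # Travelling message: positions, masses and the force accumulated so far.
        travel = {"X": X_local, "M": M_local, "F": np.zeros_like(X_local)}
        for _ in range(size - 1):
            # Pass the current message to the next processor around the ring.
            travel = comm.sendrecv(travel, dest=(rank + 1) % size,
                                   source=(rank - 1) % size)
            # Increment the travelling force with this processor's contribution.
            travel["F"] += partial_grav(travel["X"], X_local, M_local)
        # One more hop returns each message to its home processor.
        travel = comm.sendrecv(travel, dest=(rank + 1) % size,
                               source=(rank - 1) % size)
        # The home processor adds the part of the force it accumulated locally.
        return travel["F"] + partial_grav(X_local, X_local, M_local)

    if __name__ == "__main__":
        rng = np.random.default_rng(rank)
        X_local, M_local = rng.random((8, 3)), np.ones(8)
        F_local = mp_grav(X_local, M_local)
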

61
(No Transcript)
62
(No Transcript)
63
(No Transcript)
64
(No Transcript)
65
(No Transcript)