Title: Efficient Parallel Implementation of Molecular Dynamics with Embedded Atom Method on Multi-core Platforms
- Reporter: Jilin Zhang
- Authors: Changjun Hu, Yali Liu, and Jianjiang Li
- Information Engineering School, University of Science and Technology Beijing, Beijing, P.R. China
Outline
- 1 Motivation
- 2 Related Works
- 3 Spatial Decomposition Coloring (SDC) Approach
- 4 Short-Range Forces Calculations of EAM using SDC method
- 5 Experiments and Discussion
- 6 Conclusion and Future Directions
1 Motivation
- The process of molecular dynamics simulations
Fig. 1 The process of molecular dynamics simulations (flowchart, including a "set init_state" step).
1 Motivation
- The most intensive computation in MD simulations occurs in the short-range force calculations.
- The neighbor-list method greatly reduces this computation: each atom interacts only with the atoms in its neighbor region.
- Newton's third law can halve the force computations, but it introduces reduction operations on irregular arrays.
Fig. 2 Code of the force calculations.
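The original code of Fig. 2 did not survive extraction. As a stand-in, here is a minimal sketch of why Newton's third law turns the force loop into a reduction on an irregular array. It assumes a 1-D toy system, a made-up linear pair force, and a half neighbor list in which `half_neighbors[i]` holds only the neighbors `j > i`; all names are hypothetical and not taken from the slides.

```python
def pair_force(xi, xj):
    """Hypothetical toy pair force on atom i due to atom j (linear spring)."""
    return xj - xi


def forces_half_list(x, half_neighbors):
    """Visit each pair (i, j) once and update both atoms of the pair."""
    force = [0.0] * len(x)
    for i, neighbors in enumerate(half_neighbors):
        for j in neighbors:
            f = pair_force(x[i], x[j])
            force[i] += f  # index i follows the outer loop
            force[j] -= f  # index j comes from the neighbor list: irregular access
    return force


x = [0.0, 1.0, 3.0]
half_neighbors = [[1, 2], [2], []]
print(forces_half_list(x, half_neighbors))  # prints [4.0, 1.0, -5.0]
```

If the outer loop is split across threads, two threads can reach the same `force[j]` at the same time, which is the conflict the related work below tries to resolve.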
2 Related Works --- parallel reduction operations on irregular arrays
- Some types of solutions
- enclosing the reduction operation in a critical section
- privatizing the reduction array
- using a redundant computations strategy
2 Related Works --- parallel reduction operations on irregular arrays
- Enclosing the reduction operation in a critical section
- Create a critical section in the inner loop.
- Straightforward and easy to parallelize.
- High synchronization cost, caused by the critical region, atomic operation, or lock in the inner loop.
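A minimal sketch of the critical-section approach, with Python threads standing in for OpenMP threads and a `threading.Lock` standing in for the critical region. The toy pair force and the function name are assumptions for illustration only.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def forces_critical(x, half_neighbors, n_threads=4):
    """Half-list force loop whose shared updates sit inside a lock."""
    force = [0.0] * len(x)
    lock = threading.Lock()

    def work(i):
        for j in half_neighbors[i]:
            f = x[j] - x[i]  # hypothetical toy pair force
            with lock:       # critical section, entered once per pair
                force[i] += f
                force[j] -= f

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(len(x))))  # outer loop split across threads
    return force


print(forces_critical([0.0, 1.0, 3.0], [[1, 2], [2], []]))
```

The lock is taken in the innermost loop, so the number of synchronizations grows with the number of pairs, which matches the cost noted on the slide.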
2 Related Works --- parallel reduction operations on irregular arrays
- Privatizing the reduction array
- Each thread has to update the shared array in a critical region according to the values in its private array.
- It reduces the number of entries into the critical region and therefore the synchronization cost.
- High memory overhead of the private arrays
- limits the number of particles allowed in simulations
- competes for cache space and decreases program speed
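A minimal sketch of array privatization under the same toy assumptions (Python threads, a made-up linear pair force, hypothetical names). Each thread accumulates into its own full-size copy and merges it into the shared array once.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def forces_privatized(x, half_neighbors, n_threads=2):
    """Each thread fills a private force array, then merges it under a lock."""
    n = len(x)
    force = [0.0] * n
    lock = threading.Lock()

    def work(t):
        private = [0.0] * n  # full-size copy: n_threads * n extra storage in total
        for i in range(t, n, n_threads):  # round-robin share of the outer loop
            for j in half_neighbors[i]:
                f = x[j] - x[i]  # hypothetical toy pair force
                private[i] += f
                private[j] -= f
        with lock:  # critical region, entered once per thread
            for k in range(n):
                force[k] += private[k]

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(n_threads)))
    return force


print(forces_privatized([0.0, 1.0, 3.0], [[1, 2], [2], []]))
```

The lock is now taken once per thread rather than once per pair, at the price of one private array of length `n` per thread.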
2 Related Works --- parallel reduction operations on irregular arrays
- Redundant computations strategy
- Does not use Newton's third law, so each pair interaction has to be calculated twice.
- High parallelizability, since the data dependence between loop iterations has been removed.
- The computations are doubled, and the neighbor list requires more memory space.
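A minimal sketch of the redundant-computation strategy, assuming a full neighbor list (every atom lists all of its neighbors) and the same made-up linear pair force. The names are hypothetical. The loop is written serially; the point is that iteration `i` writes only `force[i]`.

```python
def forces_redundant(x, full_neighbors):
    """Each atom accumulates only its own force, so iterations are independent."""
    force = [0.0] * len(x)
    evaluations = 0
    for i, neighbors in enumerate(full_neighbors):
        for j in neighbors:
            force[i] += x[j] - x[i]  # hypothetical toy pair force; no write to force[j]
            evaluations += 1
    return force, evaluations


x = [0.0, 1.0, 3.0]
full_neighbors = [[1, 2], [0, 2], [0, 1]]  # twice as many entries as a half list
print(forces_redundant(x, full_neighbors))  # prints ([4.0, 1.0, -5.0], 6)
```

Three pairs cost six force evaluations and six neighbor-list entries, which is the doubled work and memory listed above.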
3 Spatial Decomposition Coloring (SDC) Approach
- Spatial Decomposition (SD) method
- Distributed-memory multiprocessors involving several hundred processors
- Requires changing all array declarations and all loop bounds, and explicitly coding the periodic transfer of boundary data between processors.
- It is simple to implement SD in OpenMP.
3 Spatial Decomposition Coloring (SDC) Approach
- The SD method places a restriction on parallelism in OpenMP.
- Synchronization is required to ensure that multiple threads do not attempt to update the same atom simultaneously.
Fig. 3 SD method.
3 Spatial Decomposition Coloring (SDC) Approach
- SDC method
- SDC method consists of the following steps
- Step 1) Split domain
- Step 2) Coloring subdomains
- Step 3) Parallel Computing
3 Spatial Decomposition Coloring (SDC) Approach
- SDC method
- SDC method consists of the following steps
- Step 1) Split domain
- Split the spatial domain into subdomains.
- The length of a subdomain must be longer than the diameter.
- The number of subdomains in each decomposed dimension should be even.
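The two constraints of Step 1 can be sketched as a small helper. The slides do not define "diameter"; this sketch assumes it means the diameter of an atom's neighbor region, that is, twice the cutoff radius. The function name and parameters are hypothetical.

```python
def subdomains_per_dimension(box_length, cutoff):
    """Largest even subdomain count whose subdomain length exceeds 2 * cutoff."""
    diameter = 2.0 * cutoff  # assumption: "diameter" = twice the cutoff radius
    n = int(box_length // diameter)
    if n * diameter >= box_length:  # length must be strictly longer than the diameter
        n -= 1
    n -= n % 2  # round down to an even count
    if n < 2:
        raise ValueError("box too small for an even number of subdomains")
    return n


print(subdomains_per_dimension(100.0, 5.0))  # prints 8 (subdomain length 12.5 > 10)
print(subdomains_per_dimension(57.0, 5.0))   # prints 4 (subdomain length 14.25 > 10)
```

One reading of the "even" requirement is that, under periodic boundary conditions, an even count lets the colors alternate cleanly across the wrap-around boundary.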
3 Spatial Decomposition Coloring (SDC) Approach
- SDC method
- SDC method consists of the following steps
- Step 2) Coloring subdomains
- The number of subdomains with each color must be equal.
- Each subdomain is surrounded only by subdomains with different colors.
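One coloring that satisfies both conditions of Step 2 is a parity coloring: the color is built from the parity of the subdomain index in each decomposed dimension, giving 2, 4, or 8 colors for one-, two-, or three-dimensional decomposition. This is a sketch of that idea, not necessarily the authors' exact scheme; the function names are hypothetical.

```python
from itertools import product


def color_of(index):
    """Color from the parity of each coordinate: 2**d colors in d dimensions."""
    color = 0
    for k, i in enumerate(index):
        color |= (i % 2) << k
    return color


def color_subdomains(counts):
    """Group the subdomain indices of a grid with `counts` cells per dimension by color."""
    groups = {}
    for index in product(*(range(n) for n in counts)):
        groups.setdefault(color_of(index), []).append(index)
    return groups


groups = color_subdomains((4, 4))
for color in sorted(groups):
    print(color, groups[color])
```

With an even count in every decomposed dimension, each color receives the same number of subdomains, and every adjacent subdomain (including diagonal and periodic wrap-around neighbors) differs in at least one parity bit.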
3 Spatial Decomposition Coloring (SDC) Approach
- SDC method
- SDC method consists of the following steps
- Step 3) Parallel Computing
- Calculations of forces on subdomains with one color can be run in parallel.
- A barrier is needed to wait for all threads to complete the computation on this color.
- Calculations on subdomains with different colors must run serially.
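The schedule of Step 3 can be sketched with a thread pool: colors are processed one after another, the subdomains of the current color are handed to the pool, and waiting for all of them to finish plays the role of the barrier. Python threads stand in for OpenMP threads here, and `run_by_color` and `work` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_by_color(subdomains_by_color, work, n_threads=4):
    """Run work(subdomain) in parallel within a color, serially across colors."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for color in sorted(subdomains_by_color):  # different colors: serial
            tasks = pool.map(work, subdomains_by_color[color])  # same color: parallel
            list(tasks)  # consuming the results waits for every task: the barrier


visited = []
run_by_color({0: ["A", "C"], 1: ["B", "D"]}, visited.append)
print(visited)  # the order inside one color may vary between runs
```

Only one synchronization point per color is needed, regardless of how many atoms or pairs each subdomain contains.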
3 Spatial Decomposition Coloring (SDC) Approach
- SDC method
- Advantages
- The neighbor list is usually not updated in every time-step → the cost of the SDC method is very low.
- A higher-dimensional decomposition creates more subdomains → scalable and suitable for multi-core and many-core architectures.
- Disadvantage
- Spatial Decomposition method → load imbalance
- → requires a simulation system with uniform density
4 Short-Range Forces Calculations of EAM using SDC method
- EAM method
- short-range forces
- the intensive computation
- three computational phases
- the most time-consuming phases are 1 and 3
Fig. 4 Short-range forces in the EAM method.
4 Short-Range Forces Calculations of EAM using SDC method
- The parallel procedure of short-range force calculations using the SDC method
- 1) Run the electron density computations using the SDC method
- 2) Calculate the embedding function values and their derivatives in parallel
- 3) Run the force calculations using the SDC method
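A serial sketch of the three phases for a 1-D, single-species toy system. It assumes the standard EAM form, energy = sum of F(rho_i) plus the sum of phi(r_ij) over pairs, with rho_i the sum of f(r_ij) over the neighbors of atom i. The functional forms below are invented for illustration and are not the potential used in the talk; all names are hypothetical.

```python
import math


def density_fn(r):     return math.exp(-r)           # toy f(r)
def density_deriv(r):  return -math.exp(-r)          # f'(r)
def pair_fn(r):        return 1.0 / r                # toy phi(r)
def pair_deriv(r):     return -1.0 / (r * r)         # phi'(r)
def embed_fn(rho):     return -math.sqrt(rho)        # toy F(rho)
def embed_deriv(rho):  return -0.5 / math.sqrt(rho)  # F'(rho)


def eam_energy_and_forces(x, half_neighbors):
    n = len(x)
    # Phase 1: electron densities, a pairwise reduction (SDC in the talk)
    rho = [0.0] * n
    for i in range(n):
        for j in half_neighbors[i]:
            d = density_fn(abs(x[i] - x[j]))
            rho[i] += d
            rho[j] += d
    # Phase 2: embedding values and derivatives, independent per atom
    embed = [embed_fn(r) for r in rho]
    d_embed = [embed_deriv(r) for r in rho]
    # Phase 3: forces, again a pairwise reduction (SDC in the talk)
    energy = sum(embed)
    force = [0.0] * n
    for i in range(n):
        for j in half_neighbors[i]:
            dx = x[i] - x[j]
            r = abs(dx)
            energy += pair_fn(r)
            de_dr = (d_embed[i] + d_embed[j]) * density_deriv(r) + pair_deriv(r)
            f = -de_dr * dx / r  # force on atom i from this pair
            force[i] += f
            force[j] -= f
    return energy, force


energy, force = eam_energy_and_forces([0.0, 1.0, 2.5], [[1, 2], [2], []])
print(energy, force)
```

Phases 1 and 3 both write to `rho[j]` or `force[j]` through the neighbor list, which is why they need the SDC treatment, while phase 2 touches only per-atom data and can be split across threads directly.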
4 Short-Range Forces Calculations of EAM using SDC method
- Force calculations based on the SDC method
- L1: computations on subdomains with different colors
- L2: computations on subdomains with the same color
- L3: deals with all atoms that constitute a subdomain
- L4: deals with the neighbors of an atom
Fig. 5 Force calculations using SDC.
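The code of Fig. 5 is not recoverable, so the following is only a sketch of how the four loop levels could nest. It is written serially, with a made-up linear pair force and hypothetical names; in an OpenMP version, L2 would be the loop distributed across threads.

```python
def forces_sdc_loop_nest(x, half_neighbors, atoms_in_subdomain, subdomains_by_color):
    force = [0.0] * len(x)
    for color in sorted(subdomains_by_color):       # L1: colors, one after another
        for sub in subdomains_by_color[color]:      # L2: subdomains of the same color
            for i in atoms_in_subdomain[sub]:       # L3: atoms of this subdomain
                for j in half_neighbors[i]:         # L4: neighbors of atom i
                    f = x[j] - x[i]                 # hypothetical toy pair force
                    force[i] += f
                    force[j] -= f
    return force


# 1-D toy: box [0, 12), cutoff 1.0, four subdomains of length 3 (> 2 * cutoff)
x = [0.5, 2.75, 3.25, 7.0]
half_neighbors = [[], [2], [], []]                  # only atoms 1 and 2 are within the cutoff
atoms_in_subdomain = {0: [0, 1], 1: [2], 2: [3], 3: []}
subdomains_by_color = {0: [0, 2], 1: [1, 3]}
print(forces_sdc_loop_nest(x, half_neighbors, atoms_in_subdomain, subdomains_by_color))
# prints [0.0, 0.5, -0.5, 0.0]
```

Because same-color subdomains are separated by more than the interaction range, the atoms written inside one L2 iteration cannot overlap with those of another, so no lock is needed inside L3 or L4.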
5 Experiments and Discussion
- Experimental environment
- Four Intel Xeon(R) quad-core E7320 processors (4 MB L2 cache), 16 GB memory
- OS is Fedora release 9 with kernel 2.6.25. The compiler is gcc 4.3.0.
- Experimental cases
- Observe the micro-deformation behaviors of pure Fe metal material
- --- taken from the XMD program
- under periodic boundary conditions
- initial state: body-centered cubic (bcc) lattice arrangement
- Test cases
- Small-scale case (1): 54,000 atoms
- Medium-scale case (2): 265,302 atoms
- Large-scale case (3): 1,062,882 atoms
- Large-scale case (4): 3,456,000 atoms
Table 1. The Speedups of Spatial Decomposition Coloring (SDC) Methods on 2-16 cores

Small case (1)
Cores             2      3      4      8      12     16
SDC (one-dim)     1.71   2.46   3.07   4.17   -      -
SDC (two-dim)     1.70   2.46   3.07   4.74   5.90   6.43
SDC (three-dim)   1.66   2.40   2.99   4.61   5.74   6.30

Medium case (2)
Cores             2      3      4      8      12     16
SDC (one-dim)     1.84   2.64   3.37   6.24   6.33   -
SDC (two-dim)     1.84   2.65   3.39   6.20   8.89   10.90
SDC (three-dim)   1.82   2.65   3.36   6.16   8.76   10.78

Large case (3)
Cores             2      3      4      8      12     16
SDC (one-dim)     1.86   2.76   3.67   6.82   9.76   9.59
SDC (two-dim)     1.87   2.78   3.64   6.74   9.73   12.31
SDC (three-dim)   1.86   2.75   3.64   6.64   9.65   12.29

Large case (4)
Cores             2      3      4      8      12     16
SDC (one-dim)     1.88   2.79   3.66   6.30   9.97   9.82
SDC (two-dim)     1.87   2.80   3.65   6.77   9.84   12.42
SDC (three-dim)   1.87   2.80   3.67   6.74   9.82   12.34
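To read the speedups as parallel efficiency (speedup divided by core count), a short worked calculation helps. The numbers are the two-dimensional SDC row for large case (4) from Table 1; the helper name is hypothetical.

```python
cores = [2, 3, 4, 8, 12, 16]
speedup_2d_case4 = [1.87, 2.80, 3.65, 6.77, 9.84, 12.42]  # Table 1, SDC (two-dim), case (4)


def parallel_efficiency(speedup, n_cores):
    """Fraction of ideal linear speedup that was achieved."""
    return speedup / n_cores


for p, s in zip(cores, speedup_2d_case4):
    print(f"{p:2d} cores: speedup {s:5.2f}, efficiency {parallel_efficiency(s, p):.2f}")
```

For this row the efficiency falls from about 0.94 on 2 cores to about 0.78 on 16 cores.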
5 Experiments and Discussion
- Scalability of our SDC method: the performance of the multi-dimensional SDC methods improves as the number of cores and the number of atoms increase.
- Performance of the SDC methods: the two-dimensional SDC method achieves the highest efficiency.
- The two-dimensional decomposition algorithm strives to make subdomains with small surface area and large volume, which results in better cache locality than the one-dimensional decomposition strategy.
- The three-dimensional SDC method slightly degrades performance because of the higher overhead of thread fork-join and scheduling.
Fig. 6 The speedup of the two-dimensional Spatial Decomposition Coloring (SDC) method, Critical Section (CS) method, Share Array Privatization (SAP) method and Redundant Computations (RC) method.
5 Experiments and Discussion
- The SDC method achieves a nearly linear speedup, and a higher speedup than the other methods.
- The reason for the nearly linear speedup is that the low synchronization cost of the implicit barriers in our method can be amortized over a large amount of computation.
- CS method
- Achieves the lowest efficiency. The CS method encloses the reduction operations on irregular arrays in a critical section.
- SAP method
- Performance degrades as the number of executing cores increases: memory overhead, synchronization overhead.
- RC vs. SDC
- The RC method does nearly twice the computation of the SDC method for the short-range force calculations, so its efficiency is lower than that of the SDC method.
Conclusion and Future Directions
- A scalable spatial decomposition coloring (SDC) method
- Solves a class of short-range force calculation problems on shared-memory multi-core platforms
- Scales not only to large simulation systems but also to many-core architectures
- Future directions
- To study the SDC method on NUMA memory architectures
- To implement the SDC method using MPI+OpenMP on multi-core clusters