Efficient Parallel Implementation of Molecular Dynamics with Embedded Atom Method on Multi-core Platforms - PowerPoint PPT Presentation

Slides: 26
Provided by: lyl
Learn more at: https://www.mcs.anl.gov

Transcript and Presenter's Notes

1
Efficient Parallel Implementation of Molecular
Dynamics with Embedded Atom Method on Multi-core
Platforms
  • Reporter: Jilin Zhang
  • Authors: Changjun Hu, Yali Liu, and Jianjiang Li
  • Information Engineering School, University of
    Science and Technology Beijing, Beijing, P.R. China

2
Outline
  • 1 Motivation
  • 2 Related Works
  • 3 Spatial Decomposition Coloring (SDC) Approach
  • 4 Short-Range Forces Calculations of EAM using
    SDC method
  • 5 Experiments and Discussion
  • 6 Conclusion and Future Directions

3
1 Motivation
  • The process of molecular dynamics simulations

Fig. 1 The process of molecular dynamics
simulations.
4
1 Motivation
  • The intensive computation occurs in the
    short-range force calculation procedure of MD
    simulations.
  • The neighbor-list method greatly reduces this
    computation: each atom interacts only with the
    atoms in its neighbor region.
  • Newton's third law can halve the force
    computations, but it introduces reduction
    operations on irregular arrays.
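
The irregular reduction mentioned above can be sketched as follows. This is a minimal illustration, not the code from Fig. 2: the names `pair_force` and `forces_half_list` are hypothetical, and a Lennard-Jones force in reduced units is assumed only to make the example runnable.

```python
import numpy as np

def pair_force(rij):
    """Hypothetical pair force on atom i due to atom j (Lennard-Jones, reduced units)."""
    r2 = np.dot(rij, rij)
    inv6 = 1.0 / r2**3
    return 24.0 * (2.0 * inv6 * inv6 - inv6) / r2 * rij

def forces_half_list(pos, half_list):
    """half_list[i] holds only neighbors j > i, so each pair is visited once."""
    f = np.zeros_like(pos)
    for i, neighbors in enumerate(half_list):
        for j in neighbors:
            fij = pair_force(pos[i] - pos[j])
            f[i] += fij   # regular update: indexed by the loop variable
            f[j] -= fij   # irregular update: j comes from the neighbor list
    return f

pos = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.2, 0.0]])
half_list = [[1, 2], [2], []]
f = forces_half_list(pos, half_list)
```

The `f[j] -= fij` line is what makes the loop hard to parallelize: two iterations with different `i` can write to the same `j`.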

Fig. 2 Code of the force calculations.
5
2 Related Works --- parallel reduction
operations on irregular arrays
  • Some types of solutions
  • enclosing the reduction operation in a critical
    section
  • privatizing the reduction array
  • using a redundant computation strategy

6
2 Related Works --- parallel reduction
operations on irregular arrays
  • Enclosing the reduction operation in a critical
    section
  • Create a critical section in the inner loop.
  • Straightforward and easy to parallelize.
  • High synchronization cost arises from the
    critical region, atomic operation, or lock in the
    inner loop.
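
A minimal sketch of the critical-section approach, under stated assumptions: Python threads and a `threading.Lock` stand in for OpenMP threads and a critical region, so the example shows structure only, not performance. `pair_force` and `forces_critical_section` are hypothetical names, and a unit-stiffness spring force is assumed for brevity.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def pair_force(rij):
    """Hypothetical pair force on atom i due to atom j: a unit-stiffness spring."""
    return -rij

def forces_critical_section(pos, half_list, n_threads=4):
    f = np.zeros_like(pos)
    lock = threading.Lock()          # plays the role of the critical section

    def work(i):
        for j in half_list[i]:
            fij = pair_force(pos[i] - pos[j])
            with lock:               # entered once per pair, inside the inner loop
                f[i] += fij
                f[j] -= fij

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(len(pos))))
    return f

rng = np.random.default_rng(0)
pos = rng.random((50, 3))
half_list = [list(range(i + 1, len(pos))) for i in range(len(pos))]  # all pairs, for brevity
f = forces_critical_section(pos, half_list)
```

The lock is acquired once per pair interaction, which is the synchronization cost the slide refers to.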

7
2 Related Works --- parallel reduction
operations on irregular arrays
  • Privatizing the reduction array
  • Each thread has to update the shared array in a
    critical region according to the values of its
    private array.
  • This reduces the number of entries into the
    critical region and so reduces the
    synchronization cost.
  • High memory overhead of the private arrays
  • limits the number of particles allowed in
    simulations
  • competes for cache space and decreases program
    speed
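
A minimal sketch of array privatization, again with Python threads standing in for OpenMP threads. The names `forces_privatized` and `extra_bytes` are hypothetical, and the spring force is an assumption made only so the example runs. Each worker accumulates into its own full-size copy, and the copies are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def pair_force(rij):
    """Hypothetical pair force on atom i due to atom j: a unit-stiffness spring."""
    return -rij

def forces_privatized(pos, half_list, n_threads=4):
    chunks = np.array_split(np.arange(len(pos)), n_threads)

    def work(chunk):
        private = np.zeros_like(pos)     # one full-size copy per thread
        for i in chunk:
            for j in half_list[i]:
                fij = pair_force(pos[i] - pos[j])
                private[i] += fij
                private[j] -= fij
        return private

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        privates = list(pool.map(work, chunks))

    f = np.zeros_like(pos)
    for private in privates:             # merge step: the only synchronized part
        f += private
    extra_bytes = n_threads * pos.nbytes  # memory spent on the private copies
    return f, extra_bytes

rng = np.random.default_rng(0)
pos = rng.random((50, 3))
half_list = [list(range(i + 1, len(pos))) for i in range(len(pos))]  # all pairs, for brevity
f, extra_bytes = forces_privatized(pos, half_list)
```

The extra memory grows with both the thread count and the number of atoms, which is the overhead listed above.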

8
2 Related Works --- parallel reduction
operations on irregular arrays
  • Redundant computation strategy
  • Does not use Newton's third law, so each pair
    interaction has to be calculated twice.
  • High parallelizability, since the data dependence
    between loop iterations has been removed.
  • The computation is doubled, and the neighbor
    list requires more memory space.
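
A minimal sketch of the redundant computation strategy. Python threads stand in for OpenMP threads, and `forces_redundant`, `full_list`, and the spring force are hypothetical choices for illustration. Each atom uses a full neighbor list and writes only its own entry, so no lock or private copy is needed.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def pair_force(rij):
    """Hypothetical pair force on atom i due to atom j: a unit-stiffness spring."""
    return -rij

def forces_redundant(pos, full_list, n_threads=4):
    f = np.zeros_like(pos)

    def work(i):
        for j in full_list[i]:
            f[i] += pair_force(pos[i] - pos[j])   # only row i is ever written

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(len(pos))))
    return f

rng = np.random.default_rng(0)
pos = rng.random((50, 3))
n = len(pos)
half_list = [list(range(i + 1, n)) for i in range(n)]            # each pair once
full_list = [[j for j in range(n) if j != i] for i in range(n)]  # each pair twice
f = forces_redundant(pos, full_list)
half_entries = sum(len(row) for row in half_list)
full_entries = sum(len(row) for row in full_list)
```

Here `full_entries` is twice `half_entries`, which reflects both the doubled computation and the larger neighbor list.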

9
3 Spatial Decomposition Coloring (SDC) Approach
  • Spatial Decomposition (SD) method
  • Used on distributed-memory multiprocessors
    involving several hundred processors.
  • There, the programmer must change all array
    declarations and all loop bounds, and explicitly
    code the periodic transfer of boundary data
    between processors.
  • It is simple to implement SD in OpenMP.

10
3 Spatial Decomposition Coloring (SDC) Approach
  • The SD method places a restriction on parallelism
    in OpenMP.
  • Synchronization is required to ensure that
    multiple threads do not attempt to update the
    same atom simultaneously.

Fig. 3 SD method.
11
3 Spatial Decomposition Coloring (SDC) Approach
  • SDC method
  • SDC method consists of the following steps
  • Step 1) Split domain
  • Step 2) Coloring subdomains
  • Step 3) Parallel Computing

12
3 Spatial Decomposition Coloring (SDC) Approach
  • SDC method
  • SDC method consists of the following steps
  • Step 1) Split domain
  • Split the spatial domain into subdomains.
  • The length of a subdomain must be greater than
    the diameter.
  • The number of subdomains in each decomposed
    dimension should be even.
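
The two constraints of the split step can be sketched as follows. The slide does not say which diameter is meant; this sketch assumes it is the diameter of an atom's neighbor region, that is, twice the cutoff radius. The function names `subdomain_count` and `subdomain_index` are hypothetical.

```python
import math

def subdomain_count(box_length, cutoff):
    """Largest even subdomain count along one dimension such that each
    subdomain is longer than 2 * cutoff (the assumed 'diameter')."""
    n = math.ceil(box_length / (2.0 * cutoff)) - 1   # largest n with box_length / n > 2 * cutoff
    n -= n % 2                                       # round down to an even count
    if n < 2:
        raise ValueError("box too small to decompose along this dimension")
    return n

def subdomain_index(x, box_length, n):
    """Index of the subdomain that owns coordinate x, for 0 <= x <= box_length."""
    return min(int(x / (box_length / n)), n - 1)

n_sub = subdomain_count(40.0, 2.5)       # 40 / 5 = 8 is not strictly longer, so 7, then even: 6
owner = subdomain_index(39.9, 40.0, n_sub)
```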

13
3 Spatial Decomposition Coloring (SDC) Approach
  • SDC method
  • SDC method consists of the following steps
  • Step 2) Coloring subdomains
  • The number of subdomains with each color must be
    equal.
  • Each subdomain is surrounded only by subdomains
    with different colors.
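
One coloring that satisfies both conditions is a parity scheme with 2^d colors for a d-dimensional decomposition. The slides do not state which scheme the authors use, so the sketch below is an assumption, and `color_of` and `check_coloring` are hypothetical names. The check uses periodic wrapping, which is why an even subdomain count per dimension is needed.

```python
from collections import Counter
from itertools import product

def color_of(index):
    """Color from the parity of the grid index in each decomposed dimension."""
    return sum((k % 2) << d for d, k in enumerate(index))

def check_coloring(shape):
    """True if every color is used equally often and no subdomain touches
    a periodic neighbor (faces, edges, corners) of the same color."""
    colors = {idx: color_of(idx) for idx in product(*(range(n) for n in shape))}
    for idx, color in colors.items():
        for offset in product((-1, 0, 1), repeat=len(shape)):
            neighbor = tuple((k + o) % n for k, o, n in zip(idx, offset, shape))
            if neighbor != idx and colors[neighbor] == color:
                return False
    counts = Counter(colors.values())
    return len(set(counts.values())) == 1

ok_even = check_coloring((6, 4))   # even counts in both dimensions
ok_odd = check_coloring((5,))      # odd count: subdomains 4 and 0 share a color
```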

14
3 Spatial Decomposition Coloring (SDC) Approach
  • SDC method
  • SDC method consists of the following steps
  • Step 3) Parallel Computing
  • Force calculations on subdomains with the same
    color can run in parallel.
  • A barrier is needed to wait for all threads to
    complete the computation on this color.
  • Calculations on subdomains with different colors
    must run serially.
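
The three steps can be put together in a small one-dimensional sketch. Assumptions: Python threads stand in for OpenMP threads, a spring force and a brute-force half neighbor search are used for brevity, and the subdomain length is required to exceed twice the cutoff. The name `sdc_forces` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def sdc_forces(x, box, cutoff, n_sub, n_threads=4):
    """1-D periodic SDC sketch with two colors (even and odd subdomains)."""
    n = len(x)
    sub_len = box / n_sub
    assert n_sub % 2 == 0 and sub_len > 2 * cutoff
    owner = np.minimum((x / sub_len).astype(int), n_sub - 1)
    members = [np.flatnonzero(owner == s) for s in range(n_sub)]
    f = np.zeros(n)                  # shared array: no lock, no private copies

    def work(s):
        for i in members[s]:
            for j in range(i + 1, n):            # brute-force half list (j > i)
                d = x[i] - x[j]
                d -= box * np.round(d / box)     # minimum-image convention
                if abs(d) < cutoff:
                    f[i] -= d        # hypothetical spring force
                    f[j] += d        # j lies in s or in an adjacent subdomain

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for color in (0, 1):                     # colors run one after another
            same_color = range(color, n_sub, 2)
            list(pool.map(work, same_color))     # returns when all are done: the barrier
    return f

rng = np.random.default_rng(1)
x = rng.random(200) * 60.0
f = sdc_forces(x, box=60.0, cutoff=2.5, n_sub=8)
```

Two subdomains of the same color are separated by a subdomain longer than twice the cutoff, so the atoms they update never overlap and the writes to `f` need no synchronization within a color.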

15
3 Spatial Decomposition Coloring (SDC) Approach
  • SDC method
  • Advantages
  • The neighbor list is usually not updated in every
    time step, so the cost of the SDC method is very
    low.
  • A higher-dimensional decomposition creates more
    subdomains, so the method is scalable and suited
    to multi-core and many-core architectures.
  • Disadvantage
  • The spatial decomposition can cause load
    imbalance.
  • The method therefore assumes that the simulated
    system has a uniform density.

16
4 Short-Range Forces Calculations of EAM using
SDC method
  • EAM method
  • short-range forces
  • the intensive computation
  • three computational phases
  • the most time-consuming phases are 1 and 3

Fig. 4 short-range forces in EAM method.
17
4 Short-Range Forces Calculations of EAM using
SDC method
  • The parallel procedure for short-range force
    calculations using the SDC method
  • 1) Run the electron density computations using
    the SDC method
  • 2) Calculate the embedding function values and
    their derivatives in parallel
  • 3) Run the force calculations using the SDC method
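
A serial sketch of the three phases, to show where the two irregular reductions occur. The functional forms for the density, embedding energy, and pair potential are hypothetical and were chosen only so the example runs; they are not the Fe potential used in the experiments. The function name `eam_energy_forces` is also hypothetical.

```python
import numpy as np

# Hypothetical single-species functional forms.
def rho(r): return np.exp(-r)                 # density contribution of one neighbor
def drho(r): return -np.exp(-r)
def embed(d): return -np.sqrt(d)              # embedding energy F(rho_bar)
def dembed(d): return -0.5 / np.sqrt(d)
def phi(r): return np.exp(-2.0 * r)           # pair potential
def dphi(r): return -2.0 * np.exp(-2.0 * r)

def eam_energy_forces(pos, pairs):
    """pairs lists each interacting pair (i, j) once, with i < j."""
    n = len(pos)
    # Phase 1: electron density at each atom (first irregular reduction).
    rho_bar = np.zeros(n)
    for i, j in pairs:
        r = np.linalg.norm(pos[i] - pos[j])
        rho_bar[i] += rho(r)
        rho_bar[j] += rho(r)
    # Phase 2: embedding energy and its derivative, independent per atom.
    e_embed = embed(rho_bar)
    de = dembed(rho_bar)
    # Phase 3: forces (second irregular reduction).
    f = np.zeros_like(pos)
    e_pair = 0.0
    for i, j in pairs:
        rij = pos[i] - pos[j]
        r = np.linalg.norm(rij)
        e_pair += phi(r)
        coeff = (de[i] + de[j]) * drho(r) + dphi(r)   # dE/dr for this pair
        f[i] -= coeff * rij / r
        f[j] += coeff * rij / r
    return e_embed.sum() + e_pair, f

pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.1, 0.0], [0.2, 1.1, 0.3], [1.0, 1.0, 1.0]])
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
energy, f = eam_energy_forces(pos, pairs)
```

Phases 1 and 3 both scatter into entries indexed by `j`, which is why both are run with the SDC method, while phase 2 touches each atom independently.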

18
4 Short-Range Forces Calculations of EAM using
SDC method
  • Force calculations based on the SDC method
  • L1: computations on subdomains with different
    colors
  • L2: computations on subdomains with the same color
  • L3: deals with all atoms that constitute a
    subdomain
  • L4: deals with the neighbors of an atom
Fig. 5 Force calculations using SDC.
19
5 Experiments and Discussion
  • Experimental environment
  • Four Intel Xeon(R) Quad-core E7320 (L2 Cache 4MB)
    processors, 16 GB memory
  • OS is Fedora release 9 with kernel 2.6.25. The
    compiler is gcc 4.3.0.
  • Experimental cases
  • Observe the micro-deformation behavior of pure Fe
    metal
  • The cases come from the XMD program
  • Under periodic boundary conditions
  • Initial state: body-centered cubic (bcc)
    lattice arrangement
  • Test cases
  • Small-scale case (1): 54,000 atoms
  • Medium-scale case (2): 265,302 atoms
  • Large-scale case (3): 1,062,882 atoms
  • Large-scale case (4): 3,456,000 atoms

20
Table 1. Speedups of the Spatial Decomposition
Coloring (SDC) methods on 2-16 cores

                 Small case (1)                      Medium case (2)
Cores            2     3     4     8     12    16    2     3     4     8     12    16
SDC (one-dim)    1.71  2.46  3.07  4.17  -     -     1.84  2.64  3.37  6.24  6.33  -
SDC (two-dim)    1.70  2.46  3.07  4.74  5.90  6.43  1.84  2.65  3.39  6.20  8.89  10.90
SDC (three-dim)  1.66  2.40  2.99  4.61  5.74  6.30  1.82  2.65  3.36  6.16  8.76  10.78

                 Large case (3)                      Large case (4)
Cores            2     3     4     8     12    16    2     3     4     8     12    16
SDC (one-dim)    1.86  2.76  3.67  6.82  9.76  9.59  1.88  2.79  3.66  6.30  9.97  9.82
SDC (two-dim)    1.87  2.78  3.64  6.74  9.73  12.31 1.87  2.80  3.65  6.77  9.84  12.42
SDC (three-dim)  1.86  2.75  3.64  6.64  9.65  12.29 1.87  2.80  3.67  6.74  9.82  12.34
21
5 Experiments and Discussion
  • Scalability of our SDC method: the performance of
    the multi-dimensional SDC method improves as the
    number of cores and the number of atoms increase.
  • Performance of the SDC methods: the
    two-dimensional SDC method achieves the highest
    efficiency.
  • The two-dimensional decomposition algorithm
    strives to make subdomains with a small surface
    area and a large volume, which results in better
    cache locality than the one-dimensional
    decomposition strategy.
  • The three-dimensional SDC method slightly degrades
    performance because of the greater overhead of
    fork-join threads and scheduling.
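
As a worked check of the efficiency comparison, the sketch below converts the 16-core speedups for large case (4) in Table 1 into parallel efficiency, taken here as speedup divided by the number of cores. The variable names are hypothetical.

```python
# Speedups on 16 cores for large case (4), copied from Table 1.
speedup_16 = {"one-dim": 9.82, "two-dim": 12.42, "three-dim": 12.34}

cores = 16
efficiency = {name: s / cores for name, s in speedup_16.items()}
best = max(efficiency, key=efficiency.get)

for name, e in efficiency.items():
    print(f"SDC ({name}): efficiency {e:.3f}")
```

The two-dimensional and three-dimensional variants are close, with the two-dimensional one slightly ahead, which matches the discussion above.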

22
Fig. 6 The speedup of two-dimensional Spatial
Decomposition Coloring (SDC) method, Critical
Section (CS) method, Share Array Privatization
(SAP) method and Redundant Computations (RC)
method.
23
5 Experiments and Discussion
  • The SDC method achieves a nearly linear speedup,
    higher than that of the other methods.
  • The reason for the nearly linear speedup is that
    the low synchronization cost of the implicit
    barriers in our method can be amortized over a
    large amount of computation.
  • CS method
  • Achieves the lowest efficiency, because the CS
    method encloses the reduction operations on the
    irregular array in a critical section.
  • SAP method
  • Performance degrades as the number of executing
    cores increases, because of memory overhead and
    synchronization overhead.
  • RC vs. SDC
  • The RC method performs nearly twice as much
    computation for the short-range force
    calculations as the SDC method, so the efficiency
    of the RC method is lower than that of the SDC
    method.

24
6 Conclusion and Future Directions
  • A scalable spatial decomposition coloring (SDC)
    method
  • Solves a class of short-range force calculation
    problems on shared-memory multi-core platforms
  • Scalable not only to large simulation systems but
    also to many-core architectures
  • Future directions
  • Study the SDC method on NUMA memory architectures
  • Implement the SDC method using MPI/OpenMP on
    multi-core clusters

25
  • Thank You !