NAMD2: Greater Scalability for Parallel Molecular Dynamics

1
NAMD2: Greater Scalability for Parallel Molecular Dynamics
  • Scott Callaghan
  • CSCI 653
  • Spring 2006

2
Overview
  • Why?
  • Applications
  • Current models
  • Algorithm
  • Pieces
  • Computations
  • Implementation
  • Coding
  • Load balancing
  • Performance

3
Why yet another approach?
  • Biomolecular systems
  • 10^4 to 10^5 atoms
  • 10^6 timesteps
  • Bonded and non-bonded computation
  • Modular

4
Current Methods
  • Force Decomposition
    • Not scalable
  • Spatial Decomposition
    • Scalable
    • High overhead for low N/P
  • Combine both!
    • Scalable
    • Keep N/P high while having high P

5
Algorithm Components
  • Patches
  • Compute Objects
  • Proxies
  • Sequencers

6
Patches
  • Spatial decomposition components
  • Responsible for
    • Checking and migrating atoms (every 4-8 timesteps)
    • Forces on those atoms
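
As a minimal sketch of the spatial decomposition idea, the following groups atoms into cubic patches by dividing each coordinate by the patch edge length. The function name, the patch size, and the sample coordinates are assumptions made for illustration; they are not taken from NAMD2 itself.

```python
import math
from collections import defaultdict


def assign_to_patches(positions, patch_size):
    """Group atom indices by the (i, j, k) index of the patch containing them."""
    patches = defaultdict(list)
    for atom_id, (x, y, z) in enumerate(positions):
        key = (math.floor(x / patch_size),
               math.floor(y / patch_size),
               math.floor(z / patch_size))
        patches[key].append(atom_id)
    return dict(patches)


# Hypothetical coordinates and patch edge length, for illustration only.
positions = [(1.0, 2.0, 3.0), (11.0, 2.0, 3.0), (5.0, 9.5, 13.0)]
patches = assign_to_patches(positions, patch_size=10.0)
```

Re-running the assignment every few timesteps and comparing each atom's old and new patch key is one simple way to find atoms that need to migrate.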

7
Compute Objects
  • Force decomposition components
  • Responsible for
    • Bonding and non-bonding forces
    • Relaying forces to the respective patch
  • Modular

8
Algorithm so far
  • This seems good, but what about this?
[Two diagrams, each showing Patch (0, 0, 0), Patch (0, 0, 1), and a Compute non-bonding object; the first is labeled P0, the second is labeled P0 and P1 with a "?" between the objects]

9
Proxies
  • Act as go-betweens
  • Responsible for
    • Getting atoms from home patch
    • Giving atoms to compute node
    • Getting forces from compute node
    • Giving forces to home patch
  • Better than direct communication
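
The four proxy responsibilities can be sketched with two small classes. This is an illustrative model only: the class and method names are hypothetical, forces are reduced to one scalar per atom, and message passing is replaced by direct method calls.

```python
class HomePatch:
    """Owns the atoms of one patch and accumulates the forces on them."""

    def __init__(self, positions):
        self.positions = list(positions)
        self.forces = [0.0] * len(self.positions)

    def add_forces(self, forces):
        for i, f in enumerate(forces):
            self.forces[i] += f


class ProxyPatch:
    """Stand-in for a remote home patch on the processor running the computes."""

    def __init__(self, home):
        self.home = home
        self.positions = []
        self.forces = []

    def receive_positions(self):
        # Get atoms from the home patch once; local computes read this copy.
        self.positions = list(self.home.positions)
        self.forces = [0.0] * len(self.positions)

    def deposit(self, forces):
        # Local compute objects add their force contributions here.
        for i, f in enumerate(forces):
            self.forces[i] += f

    def send_forces(self):
        # Give the combined forces back to the home patch in one step.
        self.home.add_forces(self.forces)


home = HomePatch([0.0, 1.5])
proxy = ProxyPatch(home)
proxy.receive_positions()
proxy.deposit([1.0, -1.0])   # contribution from one compute object
proxy.deposit([0.5, 0.5])    # contribution from another compute object
proxy.send_forces()
```

In this model, several compute objects on one processor share a single copy of the atoms and a single combined reply, which is one plausible reason for preferring proxies over direct communication.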

10
Sequencers
  • Manages the home patch
  • Responsible for
    • Sending compute requests for a subset
    • Updating velocities
    • Notifying home patch of position updates
  • Calculates energies
  • Changes to logic go here
  • Basic pseudocode on pg. 293
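
The pseudocode referenced on pg. 293 is not reproduced in this transcript. As a rough stand-in, the sketch below shows the kind of loop a sequencer might drive: request forces, update velocities, update positions, and compute energies. The velocity Verlet integrator, the single 1-D particle, and the harmonic spring force are all assumptions chosen to keep the example runnable.

```python
def run_sequencer(x, v, mass, dt, steps, compute_force):
    """Advance one particle with velocity Verlet (an assumed integrator)."""
    f = compute_force(x)              # stands in for the initial compute request
    energies = []
    for _ in range(steps):
        v += 0.5 * dt * f / mass      # first half-step velocity update
        x += dt * v                   # position update reported to the home patch
        f = compute_force(x)          # request forces at the new position
        v += 0.5 * dt * f / mass      # second half-step velocity update
        kinetic = 0.5 * mass * v * v
        potential = 0.5 * x * x       # matches spring_force below (k = 1)
        energies.append(kinetic + potential)
    return x, v, energies


def spring_force(x):
    """Hypothetical force field: a harmonic spring with k = 1."""
    return -x


x, v, energies = run_sequencer(1.0, 0.0, 1.0, 0.01, 1000, spring_force)
```

Keeping the integration logic in one function like this mirrors the slide's point that changes to the simulation logic are made in the sequencer.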

11
Let's pause for a summary
  • Sequencer updates
  • Notifies home patch
  • Migrate atoms
  • Populate proxies
  • Notify compute objects
  • Calculate forces
  • Update forces
  • Update velocities
  • Calculate energies
  • Repeat!

[Diagram: Sequencer, Patch (0, 0, 0), Patch (0, 0, 1), Proxy, and Compute non-bonding objects across processors P0 and P1]

12
Non-bonded interactions
  • 3 types
    • Self
    • Pair
    • Exclusion
      • Ignored for 1-2 and 1-3 atoms
      • Modified for 1-4
  • Pair-list of atoms within cutoff distance
  • Compute all and exclude later

13
Self and Pair algorithm
  • For all pairs in this and neighboring patches
    • If the atoms are inside the cutoff
      • If the atoms are inside the exclusion cutoff
        • If the atoms are 1-4 excluded
          • calculate modified interaction
        • else if the atoms are not excluded
          • calculate normal interaction
      • else
        • calculate normal interaction
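
The branching above can be sketched in a few lines. This is a simplified illustration, not NAMD2's code: it loops over all pairs in one list instead of separate self and pair objects, uses a Lennard-Jones energy in reduced units as the "normal" interaction, and scales it by an assumed factor of 0.5 for the "modified" 1-4 case. Exclusions are given as sets of index pairs `(i, j)` with `i < j`.

```python
import math


def lj(r):
    """Lennard-Jones energy in reduced units (epsilon = sigma = 1)."""
    sr6 = (1.0 / r) ** 6
    return 4.0 * (sr6 * sr6 - sr6)


def pair_energy(positions, cutoff, excl_cutoff, excluded, modified_14,
                scale_14=0.5):
    """Sum pair energies, checking exclusions only inside excl_cutoff."""
    total = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            if r >= cutoff:
                continue                      # outside the cutoff: skip
            if r < excl_cutoff:
                # Only close pairs pay for the exclusion lookup.
                if (i, j) in modified_14:
                    total += scale_14 * lj(r)
                elif (i, j) not in excluded:
                    total += lj(r)
            else:
                # Farther pairs are computed without checking exclusions.
                total += lj(r)
    return total


# Hypothetical four-atom chain along the x axis.
positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
total = pair_energy(positions, cutoff=4.0, excl_cutoff=2.0,
                    excluded={(0, 1)}, modified_14={(1, 2)})
```

Here pair (0, 1) is skipped as excluded, pair (1, 2) gets the scaled interaction, pair (0, 2) gets the normal interaction without any lookup, and every pair involving atom 3 falls outside the cutoff.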

14
Exclusion
  • For all pairs of excluded atoms
    • If the atoms are beyond the exclusion cutoff
      • calculate and subtract normal interaction
      • If the atoms are 1-4 excluded
        • calculate modified interaction
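
A hedged sketch of this correction pass follows. It assumes the 1-4 check applies only to pairs beyond the exclusion cutoff, since closer pairs would already have been handled by the self and pair pass; it also assumes every excluded pair lies inside the main cutoff. The Lennard-Jones form and the 0.5 scale factor are illustrative choices, not values from the presentation.

```python
import math


def lj(r):
    """Lennard-Jones energy in reduced units (epsilon = sigma = 1)."""
    sr6 = (1.0 / r) ** 6
    return 4.0 * (sr6 * sr6 - sr6)


def exclusion_correction(positions, excl_cutoff, excluded, modified_14,
                         scale_14=0.5):
    """Correction for excluded pairs that the pair pass computed unchecked."""
    correction = 0.0
    for (i, j) in excluded | modified_14:
        r = math.dist(positions[i], positions[j])
        if r >= excl_cutoff:
            correction -= lj(r)                 # undo the unchecked normal term
            if (i, j) in modified_14:
                correction += scale_14 * lj(r)  # add the modified term instead
    return correction


# Hypothetical three-atom chain with both special pairs 2.5 apart.
positions = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
correction = exclusion_correction(positions, excl_cutoff=2.0,
                                  excluded={(0, 1)}, modified_14={(1, 2)})
```

Because the list of excluded pairs is short compared with the list of all pairs inside the cutoff, this pass stays cheap even though it touches every exclusion.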

15
Why this complex approach?
  • Greater force decomposition
  • Expensive to check exclusion
  • Don't add/subtract L-J potential for
  • Add cutoffs to provide some speedup
  • Exclusion
  • Hydrogen group
  • Perform cutoffs together

16
Bonded interactions
  • 5 compute objects (bond interactions) per node
  • Computed by proxies on min(x, y, z) patch
  • Minimizes communication: 7 down, 7x up
  • Can use same proxies for non-bonded
  • Uses lists of atoms in tuples

17
Bonded algorithm
  • For all home or upstream (proxy) patches
    • For all atoms in this patch
      • For all tuples with this atom as the first atom
        • look up the patches with atoms in this tuple
        • calculate the common downstream patch
        • If the downstream patch is you
          • add this tuple to the list to calculate
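
The selection loop can be sketched as follows, taking the "common downstream patch" to be the component-wise minimum of the patch indices, in line with the min(x, y, z) rule on the previous slide. The function names and the dictionary mapping atoms to patches are hypothetical.

```python
def downstream_patch(patch_indices):
    """Component-wise minimum of the (x, y, z) indices of the given patches."""
    return tuple(min(axis) for axis in zip(*patch_indices))


def tuples_to_compute(my_patch, visible_patches, atom_patch, tuples):
    """Select the bonded tuples that my_patch is responsible for."""
    selected = []
    for patch in visible_patches:              # home or upstream (proxy) patches
        for atom, home in atom_patch.items():  # atoms in this patch
            if home != patch:
                continue
            for tup in tuples:
                if tup[0] != atom:             # only tuples led by this atom
                    continue
                owners = [atom_patch[a] for a in tup]
                if downstream_patch(owners) == my_patch:
                    selected.append(tup)
    return selected


# Hypothetical atoms, patches, and bonded tuples.
atom_patch = {0: (0, 0, 0), 1: (0, 0, 1), 2: (1, 0, 0), 3: (1, 1, 1)}
tuples = [(0, 1), (1, 2, 3), (3, 2)]
all_patches = [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 1, 1)]
mine = tuples_to_compute((0, 0, 0), all_patches, atom_patch, tuples)
```

Since each tuple is visited only through its first atom and has exactly one downstream patch, every tuple is computed exactly once across all patches.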

18
Implementation
  • Objects which send messages
  • Uses Converse
  • Charm++ (execute and communicate)
  • Other strange packages
  • Type hierarchy for easy modification
  • Encapsulation

19
Load Balancing
  • First on startup
    • Distribute patches
    • 1 bond force compute object per processor
    • Non-bonding
      • Self on the same processor as the patch
      • Patch-pair on upstream

20
Load Balancing, cont.
  • After n steps
  • Calculate t for non-migratable objects
  • Determine average load
  • Iterate through migratable objects, long to short
  • Put on proc so new load
  • Examine 2-proxy, then 1, then 0 (self objects)
  • Assign and update loads
  • Do second pass with smaller overload
  • Repeat second pass after m more steps
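
One way to read these steps is as a greedy assignment, sketched below. The slide's load criterion is truncated, so the rule used here (new load must stay under `overload` times the average load) is an assumption, as are the function name, the data layout, and the fallback to the least loaded processor when nothing fits.

```python
def greedy_balance(background, objects, available, overload=1.2):
    """Greedily place migratable objects, heaviest first.

    background: per-processor load from non-migratable work
    objects:    list of (name, load, set of patches the object needs)
    available:  per-processor set of patches already present there
    """
    loads = list(background)
    available = [set(s) for s in available]     # do not mutate the caller's sets
    average = (sum(background) + sum(o[1] for o in objects)) / len(loads)
    limit = overload * average
    placement = {}
    for name, load, patches in sorted(objects, key=lambda o: o[1], reverse=True):
        best = None
        for proc in range(len(loads)):
            if loads[proc] + load > limit:
                continue
            # Prefer processors already holding more of the needed patches
            # (fewer new proxies), then the lighter processor.
            score = (-len(patches & available[proc]), loads[proc])
            if best is None or score < best[0]:
                best = (score, proc)
        if best is None:
            # Nothing fits under the limit: use the least loaded processor.
            proc = min(range(len(loads)), key=loads.__getitem__)
        else:
            proc = best[1]
        placement[name] = proc
        loads[proc] += load
        available[proc] |= patches              # missing proxies get created
    return placement, loads


# Hypothetical loads and objects on two processors.
placement, loads = greedy_balance(
    background=[2.0, 1.0],
    objects=[("ab", 3.0, {"A", "B"}), ("bc", 2.0, {"B", "C"}), ("self_c", 1.0, {"C"})],
    available=[{"A"}, {"B", "C"}],
)
```

A second pass with a smaller overload would correspond to calling the function again with a lower `overload` value once new load measurements are available.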

21
Performance
  • 80% efficiency on P = 128
  • 10% communication overhead
  • 10% idle time
  • Independent comparisons can be found at
    http://www.ks.uiuc.edu/Research/namd/performance.html
  • Best for very large processor counts
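
As a quick worked check, and reading the figures above as percentages at 128 processors, the efficiency left after the two overheads and the speedup it implies can be computed directly. Treating efficiency as one minus the communication and idle fractions is an assumption about how the slide's numbers relate.

```python
def efficiency_from_overheads(communication, idle):
    """Efficiency left after the given fractions of time are lost."""
    return 1.0 - communication - idle


def speedup(efficiency, processors):
    """Speedup implied by a parallel efficiency: efficiency * P."""
    return efficiency * processors


eff = efficiency_from_overheads(0.10, 0.10)
s = speedup(eff, 128)
```

Under these assumptions, 128 processors do the work of roughly 102 ideal ones.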

22
Questions?