NAMD2: Greater Scalability for Parallel Molecular Dynamics

1
NAMD2: Greater Scalability for Parallel Molecular Dynamics

Scott Callaghan
CSCI 653
Spring 2006

2
Overview

Why?
Applications
Current models
Algorithm
Pieces
Computations
Implementation
Coding
Load balancing
Performance

3
Why yet another approach?

Biomolecular systems
10^4 to 10^5 atoms
10^6 timesteps
Bonded and non-bonded computation
Modular

4
Current Methods

Force Decomposition
Not scalable
Spatial Decomposition
Scalable
High overhead for low N/P
Combine both!
Scalable
Keep N/P high while having high P

5
Algorithm Components

Patches
Compute Objects
Proxies
Sequencers

6
Patches

Spatial decomposition components
Responsible for
Checking and migrating atoms (4-8 timesteps)
Forces on those atoms

7
Compute Objects

Force decomposition components
Responsible for
Bonding and non-bonding forces
Relaying forces to the respective patch
Modular

8
Algorithm so far

This seems good
but what about this?
[Figure: two diagrams of Patch (0, 0, 0), Patch (0, 0, 1), and a non-bonding compute object; labels P0, P0, P1, and "?"]

9
Proxies

Act as go-betweens
Responsible for
Getting atoms from home patch
Giving atoms to compute node
Getting forces from compute node
Giving forces to home patch
Better than direct communication
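
The go-between role listed above can be sketched in a few lines of Python. This is a minimal illustration, not NAMD2 code: the class names HomePatch and Proxy, their methods, the 1-D positions, and the message counter are all assumptions made for the example. It shows one plausible reason a proxy beats direct communication: several compute objects on the same processor deposit forces locally, and the home patch receives a single combined message.

```python
class HomePatch:
    """Hypothetical home patch: owns atom positions and accumulates forces."""

    def __init__(self, positions):
        self.positions = list(positions)
        self.forces = [0.0] * len(self.positions)
        self.messages = 0  # force messages received from proxies

    def add_forces(self, forces):
        self.messages += 1
        for i, f in enumerate(forces):
            self.forces[i] += f


class Proxy:
    """Hypothetical proxy: a local stand-in for a patch living elsewhere."""

    def __init__(self, home):
        self.home = home
        self.positions = []
        self.forces = []

    def fetch_positions(self):
        # Get atoms from the home patch (one message down per timestep).
        self.positions = list(self.home.positions)
        self.forces = [0.0] * len(self.positions)

    def deposit(self, forces):
        # Get forces from a local compute object; no network traffic yet.
        for i, f in enumerate(forces):
            self.forces[i] += f

    def flush(self):
        # Give the combined forces to the home patch (one message up).
        self.home.add_forces(self.forces)


home = HomePatch([0.0, 1.0])
proxy = Proxy(home)
proxy.fetch_positions()
proxy.deposit([1.0, -1.0])  # from a first compute object
proxy.deposit([0.5, 0.5])   # from a second compute object
proxy.flush()
```

In this toy run two compute objects contribute forces, yet the home patch sees only one incoming message.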

10
Sequencers

Manages the home patch
Responsible for
Sending compute requests for a subset
Updating velocities
Notifying home patch of position updates
Calculates energies
Changes to logic go here
Basic pseudocode on pg. 293

11
Let's pause for a summary

Sequencer updates
Notifies home patch
Migrate atoms
Populate proxies
Notify compute objects
Calculate forces
Update forces
Update velocities
Calculate energies
Repeat!

[Figure: diagram with Sequencer, Patch (0, 0, 0), Patch (0, 0, 1), Proxy, and a non-bonding compute object across P0 and P1]

12
Non-bonded interactions

3 types
Self
Pair
Exclusion
Ignored for 1-2 and 1-3 atoms
Modified for 1-4
Pair-list of atoms within cutoff distance
Compute all and exclude later

13
Self and Pair algorithm

For all pairs in this and neighboring patches
  If the atoms are inside the cutoff
    If the atoms are inside the exclusion cutoff
      If the atoms are 1-4 excluded
        calculate modified interaction
      else if the atoms are not excluded
        calculate normal interaction
    else
      calculate normal interaction
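
The branching in the self and pair algorithm can be written out as a small Python function. This is a sketch under assumptions: the Exclusion enum, the function name classify_pair, and the string return values are invented for illustration and are not taken from NAMD2. It only decides which interaction to evaluate for a pair at distance r; the actual force evaluation is left out.

```python
from enum import Enum


class Exclusion(Enum):
    NONE = 0      # pair is not excluded
    FULL = 1      # 1-2 or 1-3 pair: interaction ignored
    ONE_FOUR = 2  # 1-4 pair: interaction modified


def classify_pair(r, cutoff, exclusion_cutoff, exclusion):
    """Return 'modified', 'normal', or None for one atom pair at distance r."""
    if r >= cutoff:
        return None  # outside the cutoff: nothing to compute
    if r < exclusion_cutoff:
        # Only close pairs pay for the exclusion lookup.
        if exclusion is Exclusion.ONE_FOUR:
            return "modified"
        if exclusion is Exclusion.NONE:
            return "normal"
        return None  # fully excluded (1-2 or 1-3)
    # Beyond the exclusion cutoff the lookup is skipped and the normal
    # interaction is always computed; a separate pass corrects any
    # excluded pairs that land here.
    return "normal"
```

Note the last branch: a fully excluded pair beyond the exclusion cutoff still gets a normal interaction here, which is why a later correction step is needed.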

14
Exclusion

For all pairs of excluded atoms
  If the atoms are beyond the exclusion cutoff
    calculate and subtract normal interaction
    If the atoms are 1-4 excluded
      calculate modified interaction
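
A minimal Python sketch of the exclusion pass follows. Several things are assumptions: the function name exclusion_correction, the callables normal and modified (toy stand-ins for the real potentials), and the reading that the 1-4 check is nested under the cutoff check, which the flattened slide text does not make explicit. It also assumes every excluded pair lies inside the main cutoff, so its normal interaction was in fact computed earlier.

```python
def exclusion_correction(r, exclusion_cutoff, is_one_four, normal, modified):
    """Energy correction for one excluded pair at distance r."""
    correction = 0.0
    if r >= exclusion_cutoff:
        # The self/pair pass skipped the exclusion check for this pair
        # and added the normal interaction, so take it back out.
        correction -= normal(r)
        if is_one_four:
            # 1-4 pairs keep a modified interaction instead.
            correction += modified(r)
    return correction


# Toy potentials, chosen only so the arithmetic is easy to follow.
normal = lambda r: 8.0 / r
modified = lambda r: 4.0 / r

full = exclusion_correction(8.0, 4.0, False, normal, modified)
one_four = exclusion_correction(8.0, 4.0, True, normal, modified)
close = exclusion_correction(2.0, 4.0, True, normal, modified)
```

Pairs inside the exclusion cutoff need no correction because the self and pair pass already handled their exclusion status.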

15
Why this complex approach?

Greater force decomposition
Expensive to check exclusion
Don't add/subtract L-J potential for
Add cutoffs to provide some speedup
Exclusion
Hydrogen group
Perform cutoffs together

16
Bonded interactions

5 compute objects (bond interactions) per node
Computed by proxies on min(x, y, z) patch
Minimizes communication: 7 down, 7x up
Can use same proxies for non-bonded
Uses lists of atoms in tuples

17
Bonded algorithm

For all home or upstream (proxy) patches
  For all atoms in this patch
    For all tuples with this atom as the first atom
      look up the patches with atoms in this tuple
      calculate the common downstream patch
      If the downstream patch is you
        add this tuple to the list to calculate
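
The tuple-ownership rule above can be sketched in Python. This is an illustration under assumptions: the slides say only "min(x, y, z) patch", and the sketch reads that as the component-wise minimum of the patch coordinates; periodic boundaries are ignored; the function names and the dictionary layouts are hypothetical.

```python
def downstream_patch(patches):
    """Common downstream patch, taken here as the component-wise minimum."""
    return tuple(min(axis) for axis in zip(*patches))


def tuples_to_compute(my_patch, visible_patches, atom_patch, tuples_by_first_atom):
    """Select the bonded tuples this patch is responsible for.

    visible_patches: patch coords -> atom ids (home plus upstream proxies)
    atom_patch: atom id -> coords of the patch holding that atom
    tuples_by_first_atom: atom id -> tuples that list this atom first
    """
    selected = []
    for atoms in visible_patches.values():
        for atom in atoms:
            for tup in tuples_by_first_atom.get(atom, []):
                patches = [atom_patch[a] for a in tup]
                if downstream_patch(patches) == my_patch:
                    selected.append(tup)
    return selected


atom_patch = {1: (0, 0, 0), 2: (0, 0, 1), 3: (1, 0, 0), 4: (1, 1, 1)}
tuples_by_first_atom = {1: [(1, 2)], 3: [(3, 2)], 4: [(4, 3)]}

origin_view = {(0, 0, 0): [1], (0, 0, 1): [2], (1, 0, 0): [3]}
at_origin = tuples_to_compute((0, 0, 0), origin_view, atom_patch, tuples_by_first_atom)

other_view = {(1, 0, 0): [3], (1, 1, 1): [4]}
at_other = tuples_to_compute((1, 0, 0), other_view, atom_patch, tuples_by_first_atom)
```

Because each tuple is indexed by its first atom and has exactly one downstream patch, every tuple is computed once, and the tuple (3, 2) is claimed by patch (0, 0, 0) even though neither of its atoms lives there.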

18
Implementation

Objects which send messages
Uses Converse
Charm++ (execute and communicate)
Other strange packages
Type hierarchy for easy modification
Encapsulation

19
Load Balancing

First on startup
Distribute patches
1 bond force compute object per processor
Non-bonding
Self on the same processor as the patch
Patch-pair on upstream

20
Load Balancing, cont.

After n steps
Calculate t for non-migratable objects
Determine average load
Iterate through migratable objects, long to short
Put on proc so new load
Examine 2-proxy, then 1, then 0 (self objects)
Assign and update loads
Do second pass with smaller overload
Repeat second pass after m more steps
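
A simplified greedy pass in the spirit of the steps above might look like the following Python sketch. It is not the NAMD2 balancer. The slide's placement condition is cut off after "so new load", so the rule used here (new load stays at or below overload times the average load) is an assumption, as are the function name greedy_balance, its parameters, and the fallback to the least-loaded processor. Proxy counts are taken as fixed inputs and are not updated as objects are placed.

```python
def greedy_balance(background, objects, proxies_available, overload=1.2):
    """Assign migratable compute objects to processors, longest first.

    background: per-processor load from non-migratable objects
    objects: object name -> measured time
    proxies_available: name -> {proc: needed patches/proxies already there}
    """
    loads = list(background)
    average = (sum(background) + sum(objects.values())) / len(background)
    limit = overload * average
    placement = {}
    for name in sorted(objects, key=objects.get, reverse=True):
        cost = objects[name]
        have = proxies_available.get(name, {})
        chosen = None
        # Prefer processors that already hold 2 proxies, then 1, then 0.
        for wanted in (2, 1, 0):
            candidates = [p for p in range(len(loads))
                          if have.get(p, 0) == wanted and loads[p] + cost <= limit]
            if candidates:
                chosen = min(candidates, key=lambda p: loads[p])
                break
        if chosen is None:
            # Nothing fits under the limit: fall back to the least loaded.
            chosen = min(range(len(loads)), key=lambda p: loads[p])
        placement[name] = chosen
        loads[chosen] += cost  # assign and update loads
    return placement, loads


placement, loads = greedy_balance(
    background=[1.0, 1.0],
    objects={"a": 4.0, "b": 3.0, "c": 1.0},
    proxies_available={"a": {0: 2}, "b": {0: 2, 1: 1}},
    overload=1.25,
)
```

In this toy input object "b" would prefer processor 0, where both of its proxies already sit, but that would exceed the limit, so it moves to processor 1. A second pass could call the same function with a smaller overload value.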

21
Performance

80% efficiency on P = 128
10% communication overhead
10% idle time
Independent comparisons can be found at
http://www.ks.uiuc.edu/Research/namd/performance.html
Best for very large processor counts

22
Questions?
