Memory Architectures for Protein Folding: MD on million PIM processors - PowerPoint PPT Presentation

About This Presentation

Description:

University of Illinois at Urbana-Champaign. Memory Architectures for ... simulator that allows one to run full-fledged programs on simulated architecture ... – PowerPoint PPT presentation

Slides: 22

Provided by: prvu

Learn more at: http://charm.cs.uiuc.edu

Transcript and Presenter's Notes

1
Memory Architectures for Protein Folding: MD on million PIM processors

Fort Lauderdale, May 03,

2
Overview

EIA-0081307 ITR: Intelligent Memory Architectures and Algorithms to Crack the Protein Folding Problem
PIs
Josep Torrellas and Laxmikant Kale (University of Illinois)
Mark Tuckerman (New York University)
Michael Klein (University of Pennsylvania)
Also associated: Glenn Martyna (IBM)
Period: 8/00 - 7/03

3
Project Description

Multidisciplinary project in computer architecture and software, and computational biology
Goals
Design improved algorithms to help solve the protein folding problem
Design the architecture and software of general-purpose parallel machines that speed up the solution of the problem

4
Some Recent Progress: Ideas

Developed REPSWA (Reference Potential Spatial Warping Algorithm)
A novel algorithm for accelerating conformational sampling in molecular dynamics, a key element in protein folding
Based on a "spatial warping" variable transformation
This transformation is designed to shrink barrier regions on the energy landscape and grow attractive basins without altering the equilibrium properties of the system
Result: large gains in sampling efficiency
Z. Zhu, M. E. Tuckerman, S. O. Samuelson and G. J. Martyna, "Using novel variable transformations to enhance conformational sampling in molecular dynamics," Phys. Rev. Lett. 88, 100201 (2002).
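The slide does not give REPSWA's actual transformation, so the sketch below only illustrates the general principle it relies on, under stated assumptions: a toy 1-D double-well potential V(x) and a hypothetical monotone warp x = f(u) that stretches the barrier region. Sampling u under V_eff(u) = V(f(u)) - kT ln f'(u) reproduces the Boltzmann distribution in x exactly, because exp(-V_eff/kT) du = exp(-V(x)/kT) dx. The names `warp`, `C`, `W` and the potential are illustrative, not taken from the paper.

```python
import math

# Toy parameters in reduced units (all hypothetical, chosen for illustration)
BETA = 1.0  # 1/kT
H = 5.0     # barrier height of the double well V(x) = H (x^2 - 1)^2
C = 4.0     # warp strength: f'(0) = 1 + C
W = 0.3     # width of the warped region around the barrier top at x = 0


def potential(x):
    return H * (x * x - 1.0) ** 2


def warp(u):
    # x = f(u); monotone because f'(u) = 1 + C sech^2(u/W) > 0
    return u + C * W * math.tanh(u / W)


def warp_jacobian(u):
    return 1.0 + C / math.cosh(u / W) ** 2


def effective_potential(u):
    # Jacobian correction: exp(-BETA * V_eff(u)) du == exp(-BETA * V(x)) dx
    return potential(warp(u)) - math.log(warp_jacobian(u)) / BETA


def inverse_warp(x, lo=-50.0, hi=50.0):
    # Bisection is enough because warp() is strictly increasing
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if warp(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def integrate(fn, a, b, n=20000):
    # Composite midpoint rule
    h = (b - a) / n
    return h * sum(fn(a + (i + 0.5) * h) for i in range(n))


def right_well_probability_x():
    # P(0.5 < x < 3) computed directly in the original coordinate
    def boltz(x):
        return math.exp(-BETA * potential(x))
    return integrate(boltz, 0.5, 3.0) / integrate(boltz, -3.0, 3.0)


def right_well_probability_u():
    # The same probability computed in the warped coordinate u
    def boltz(u):
        return math.exp(-BETA * effective_potential(u))
    lo, cut, hi = inverse_warp(-3.0), inverse_warp(0.5), inverse_warp(3.0)
    return integrate(boltz, cut, hi) / integrate(boltz, lo, hi)


if __name__ == "__main__":
    print("P(x > 0.5) from x:", right_well_probability_x())
    print("P(x > 0.5) from u:", right_well_probability_u())
    # Barrier region |x| < 0.5 is 1.0 wide in x but much narrower in u
    print("barrier width in u:", inverse_warp(0.5) - inverse_warp(-0.5))
```

The two probabilities agree to numerical precision, while the barrier region occupies only about a fifth of its original width in u. Any sampler run in u therefore spends less time crossing the barrier, and averages mapped back through x = f(u) are unchanged.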

5
Some Recent Progress: Tools

Developed LeanMD, a parallel molecular dynamics program that targets very large-scale parallel machines
Research-quality program based on the Charm++ parallel object-oriented language
Descendant of NAMD, another parallel molecular dynamics application, which achieved unprecedented speedup on thousands of processors
LeanMD is designed to run on next-generation parallel machines with tens of thousands or even millions of processors, such as Blue Gene/L or Blue Gene/C
This requires a new parallelization strategy that breaks up the simulation problem in a more fine-grained manner, generating enough parallelism to distribute work effectively across a million processors.
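As a rough illustration of why a finer-grained decomposition generates more parallelism, assume (this is not stated on the slide) that space is divided into cells about one cutoff radius wide and that one work object is created per interacting pair of neighboring cells, including each cell paired with itself. The hypothetical helper `cell_pair_objects` below enumerates those pair objects for a periodic box; with at least three cells per dimension each cell has 26 neighbors, which gives 14 objects per cell.

```python
from itertools import product


def cell_pair_objects(nx, ny, nz):
    """Unique unordered pairs of neighboring cells (self-pairs included)
    in a periodic nx x ny x nz grid. Hypothetical decomposition for
    illustration; assumes each dimension has at least 3 cells."""
    pairs = set()
    for cell in product(range(nx), range(ny), range(nz)):
        x, y, z = cell
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            nbr = ((x + dx) % nx, (y + dy) % ny, (z + dz) % nz)
            # Store each unordered pair once, in a canonical order
            pairs.add((cell, nbr) if cell <= nbr else (nbr, cell))
    return pairs


if __name__ == "__main__":
    for n in (4, 16):
        cells = n ** 3
        objects = len(cell_pair_objects(n, n, n))
        print(f"{n}x{n}x{n} grid: {cells} cells, {objects} pair objects")
```

Under the same assumption, a 32 x 32 x 32 grid would yield 14 x 32,768 = 458,752 independently schedulable objects, compared with 32,768 if each cell were the only unit of work.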

6
Some Recent Progress: Tools

Developed a high-performance communication library
For collective communication operations: AlltoAll personalized communication, AlltoAll multicast, and AllReduce
These operations can be complex and time-consuming on large parallel machines
Especially costly for applications that involve all-to-all patterns, such as 3-D FFT and sorting
The library optimizes collective communication operations by combining messages over an imposed virtual topology
The overhead of AlltoAll communication for 76-byte message exchanges between 2058 processors is in the low tens of milliseconds
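The slide does not say which virtual topologies the library uses. As one hedged example of message combining, the sketch below routes all-to-all personalized traffic over a hypothetical 2-D virtual mesh of rows x cols processors: each processor first sends one combined message to every other processor in its row, and each intermediate processor then sends one combined message to every other processor in its column. Every item takes two hops, but each processor sends only (rows - 1) + (cols - 1) messages instead of P - 1. The function name and data layout are illustrative.

```python
from collections import defaultdict


def mesh_alltoall(rows, cols):
    """Simulate all-to-all personalized exchange on a virtual rows x cols mesh.

    Processor p sits at (p // cols, p % cols) and owns one item (src, dst)
    for every other processor dst. Returns (inboxes, messages_sent), where
    messages_sent[p] counts the combined messages p puts on the network.
    """
    procs = rows * cols
    sent = [0] * procs

    # Phase 1: combine everything bound for column c into one message to the
    # processor in the sender's own row and column c.
    staged = defaultdict(list)
    for src in range(procs):
        row = src // cols
        for col in range(cols):
            bundle = [(src, dst) for dst in range(col, procs, cols) if dst != src]
            hop = row * cols + col
            if bundle and hop != src:
                sent[src] += 1
            staged[hop].extend(bundle)

    # Phase 2: each intermediate processor splits what it holds by destination
    # row and sends one combined message along its column per row.
    inbox = defaultdict(list)
    for mid in range(procs):
        by_row = defaultdict(list)
        for item in staged[mid]:
            by_row[item[1] // cols].append(item)
        for dest_row, bundle in by_row.items():
            dst = dest_row * cols + mid % cols
            if dst != mid:
                sent[mid] += 1
            inbox[dst].extend(bundle)
    return inbox, sent


if __name__ == "__main__":
    inbox, sent = mesh_alltoall(4, 8)
    print("messages per processor:", sent[0], "vs direct:", 4 * 8 - 1)
```

For a hypothetical 32 x 64 mesh (2,048 processors), each processor would send 94 messages instead of 2,047. The trade-off is that every byte crosses the network twice, which pays off mainly for short messages such as the 76-byte exchanges above, where per-message overhead dominates.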

7
Some Recent Progress: People

The following graduate student researchers have been supported:
Sameer Kumar (University of Illinois)
Gengbin Zheng (University of Illinois)
Jun Nakano (University of Illinois)
Zhongwei Zhu (New York University)

8
Overview

Rest of the talk
Objective: develop a molecular dynamics program that will run effectively on a million processors, each with a low memory-to-processor ratio
Method
Use the parallel objects methodology
Develop an emulator/simulator that allows one to run full-fledged programs on the simulated architecture
Presenting today
Simulator details
LeanMD simulation on BG/L and BG/C

9
Performance Prediction on Large Machines

Problem
How can we predict the performance of applications on future machines?
How can we do performance tuning without continuous access to a large machine?

Solution
Leverage virtualization
Develop a machine emulator
Simulator: accurate time modeling
Run a program on 100,000 processors using only hundreds of processors
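A minimal sketch of the virtualization idea, assuming a simple block mapping of emulated processors onto physical ones and a single FIFO scheduler queue per physical processor. The real emulator's message queues and Converse scheduler are more elaborate, and every name here is hypothetical. The example passes a token through 100,000 emulated processors while only 100 "physical" scheduler queues exist.

```python
from collections import deque


class Emulator:
    """Hosts num_virtual emulated processors on num_physical schedulers."""

    def __init__(self, num_virtual, num_physical, handler):
        self.num_virtual = num_virtual
        self.block = -(-num_virtual // num_physical)  # ceiling division
        self.handler = handler  # handler(emulator, vp, payload)
        self.queues = [deque() for _ in range(num_physical)]
        self.executed = [0] * num_physical

    def host_of(self, vp):
        # Block mapping: a contiguous range of virtual processors per host
        return vp // self.block

    def send(self, dest_vp, payload):
        self.queues[self.host_of(dest_vp)].append((dest_vp, payload))

    def run(self):
        # Round-robin over hosts; each host drains its queue before moving on
        busy = True
        while busy:
            busy = False
            for host, queue in enumerate(self.queues):
                while queue:
                    vp, payload = queue.popleft()
                    self.handler(self, vp, payload)
                    self.executed[host] += 1
                    busy = True


def forward(emu, vp, hops):
    # Pass a token to the next emulated processor until the last one
    if vp + 1 < emu.num_virtual:
        emu.send(vp + 1, hops + 1)


if __name__ == "__main__":
    emu = Emulator(num_virtual=100_000, num_physical=100, handler=forward)
    emu.send(0, 0)
    emu.run()
    print(sum(emu.executed), "messages executed; per host:", emu.executed[0])
```

The application code (`forward`) addresses emulated processors only; how many physical processors back them is purely a parameter of the emulator.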

10
Blue Gene Emulator functional view
[Figure: functional view of the emulator, showing affinity message queues, the Converse scheduler, and the Converse queue]

11
Emulator to Simulator

Emulator
Study the programming model and application development
Simulator
Performance prediction capability
Models communication latency based on a network model
Doesn't model on-chip memory access or network contention

Parallel performance is hard to model
Communication subsystem
Out-of-order messages
Communication/computation overlap
Event dependencies
Parallel Discrete Event Simulation
The emulated program executes in parallel, with event time stamp correction
Exploit the inherent determinacy of the application

12
How to simulate?

Time stamping events
Per-thread timer (sharing one physical timer)
Time stamp messages
Calculate communication latency based on a network model
Parallel event simulation
When a message is sent, calculate its predicted arrival time at the destination Blue Gene processor
When a message is received, update the current time as
currTime = max(currTime, recvTime)
Time stamp correction
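The time stamping rules above can be sketched as follows. The network model here is an assumed linear latency (a fixed per-message cost plus a per-byte cost) with invented parameter values; the slide only says that latency comes from a network model. Each emulated processor keeps its own virtual clock, stamps outgoing messages with a predicted arrival time, and on receipt advances its clock with currTime = max(currTime, recvTime).

```python
from dataclasses import dataclass

# Hypothetical linear network model (values are illustrative only)
ALPHA_US = 5.0        # per-message latency, microseconds
BETA_US_PER_B = 0.01  # per-byte cost, microseconds


def latency(nbytes):
    return ALPHA_US + BETA_US_PER_B * nbytes


@dataclass
class Message:
    dest: int
    nbytes: int
    recv_time: float  # predicted arrival time at the destination


class VirtualProcessor:
    def __init__(self, pid):
        self.pid = pid
        self.curr_time = 0.0  # per-processor virtual timer

    def compute(self, duration_us):
        self.curr_time += duration_us

    def send(self, dest, nbytes):
        # Stamp the message with its predicted arrival time
        return Message(dest, nbytes, self.curr_time + latency(nbytes))

    def receive(self, msg, handler_us):
        # currTime = max(currTime, recvTime), then charge the handler's work
        self.curr_time = max(self.curr_time, msg.recv_time)
        self.compute(handler_us)


if __name__ == "__main__":
    a, b = VirtualProcessor(0), VirtualProcessor(1)
    a.compute(20.0)
    msg = a.send(dest=1, nbytes=1000)  # arrives at 20 + 5 + 10 = 35 us
    b.compute(50.0)                    # b is already past the arrival time
    b.receive(msg, handler_us=3.0)
    print(msg.recv_time, b.curr_time)  # 35.0 53.0
```

The `max` is what makes a receiver that is "ahead" ignore an early arrival, while a receiver that is "behind" idles until the message would physically have arrived.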

13
Parallel correction algorithm

Sort message execution by receive time
Adjust time stamps when needed
Use correction messages to announce changes in an event's startTime
Send correction messages along the path the original message was sent
Events already in the timeline may have to move
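A single-processor sketch of the correction step, under simplifying assumptions: each event is a message handler with a receive time and a fixed duration, the processor runs one handler at a time, and messages may have been executed in the order they physically arrived rather than in receive-time order. Re-sorting by receive time and recomputing start times shows which events moved; in the scheme described above, each moved event would trigger correction messages along the path its outgoing messages took. The `Event` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    recv_time: float  # predicted arrival time of the triggering message
    duration: float   # execution time of the handler
    start_time: float = 0.0


def schedule(events):
    """Assign start times: an event starts once its message has arrived and
    the previous event on this processor has finished."""
    clock = 0.0
    for ev in events:
        ev.start_time = max(clock, ev.recv_time)
        clock = ev.start_time + ev.duration
    return events


def correct_timeline(executed):
    """Re-sort events by receive time and return the names of events whose
    start time changed (these would need correction messages)."""
    old = {ev.name: ev.start_time for ev in executed}
    corrected = schedule(sorted(executed, key=lambda ev: ev.recv_time))
    moved = [ev.name for ev in corrected if ev.start_time != old[ev.name]]
    return corrected, moved


if __name__ == "__main__":
    # Message "b" was stamped earlier but physically arrived second
    log = schedule([Event("a", recv_time=5.0, duration=2.0),
                    Event("b", recv_time=1.0, duration=1.0)])
    corrected, moved = correct_timeline(log)
    print([(ev.name, ev.start_time) for ev in corrected], moved)
```

Here event "b" moves from 7.0 to 1.0 while "a" keeps its start time, so only messages sent by "b" would need corrections.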

14
Timestamps Correction

15
Timestamps Correction

16
Timestamps Correction

17
Timestamps Correction

18
Predicted time vs. latency factor
Validation

19
LeanMD

LeanMD is a molecular dynamics simulation application written in Charm++
Next generation of NAMD, a Gordon Bell Award winner at SC2002
Requires a new parallelization strategy
Break up the problem in a more fine-grained manner to distribute work effectively across an extremely large number of processors

20
LeanMD Performance Analysis
Need readable graphs: one to a page is fine, but with larger fonts and thicker lines
