Title: Comparative Study of Techniques for Parallelization of the Grid Simulation Problem
1. Comparative Study of Techniques for Parallelization of the Grid Simulation Problem
- Prepared by Arthi Ramachandran
- Kansas State University
2. The Problem
- Iterative schemes, e.g.:
  - Jacobi method
  - Kinetic Monte Carlo
  - Gauss-Seidel method
- Input data has a grid/matrix topology.
- Computation of the value in a cell at time-step t requires the values of the neighboring cells at time-step t-1.
- Parallelization of this problem
3. The Problem (contd.)
[Figure: Jacobi simulation of heat flow on a grid; one boundary row of cells is held at 5.0 while the remaining cells start at 0.]
4. Outline
- Goal
- Available Solutions
- Parallel Adaptive Distributed Simulations (PADS) Architecture
- Performance Comparison Results
- Conclusions
- Future Work
5. The Goal
- A framework such that the user/programmer has to code only the problem-specific logic.
- The framework manages the parallelization of the user's application.
- Load balancing is required for some problems to achieve good performance.
- The model we build aims at good performance by moving fixed-size jobs across machines; the results later show a clear performance improvement with load balancing.
6. Solution 1: OptimalGrid
- Developed by IBM Almaden Research Lab
- Specifically built to parallelize connected problems
- Built using Java
- User only needs to supply:
  - Java code for the problem to be parallelized
  - Changes to configuration files to fine-tune the behaviour of OptimalGrid (if required)
7. OptimalGrid Architecture
- Manages the Compute Agents
- Monitors the agents, tracks their status, and performs optimizations if necessary
- Invokes the Problem Builder if necessary, which automatically partitions the problem
- Distributes the problem among the agents
8. Data Units
- Original Problem Cell (OPC)
9. Implementation of the Jacobi Method of Heat Flow in OptimalGrid (1)
- EntityJacobi extends EntityAbstract
- Data Members
  - double temperature
  - Class Name
- Methods
  - double getTemperature()
  - void setTemperature(double)
  - void initFromXML(Element entity)
  - Element getXML()
10. Implementation of the Jacobi Method of Heat Flow in OptimalGrid (2)
- OPCJacobi extends OPCAbstract
- Data Members
  - Version Identifier
  - Class Name
- Methods
  - propagate()
  - localInteraction()

localInteraction(ArrayList occupants):
    double temperature = 0.0;
    if (this.loc.x == 0) {
        // Cell is on the first row of the grid
        temperature = 5.0;
    } else if (this.loc.y == 0 || this.loc.y == Grid_Dimension - 1) {
        temperature = 0;
    } else {
        // Inner cells: compute
        for (each neighbor entry in the occupants list)
            temperature += occupantsEntry.getTemperature();
        temperature /= 4;
    }
    occupants.get(0).setTemperature(temperature);
    // Remove all neighbor entries from the occupants list

propagate():
    ArrayList newOccupants;
    ArrayList neibList = this.getAllOpcNeighbors();
    for (each entry in neibList)
        newOccupants.add(neibListEntry.getOccupants());
    return newOccupants;
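The update rule above can be sketched as a small stand-alone Java method; the class and method names here are illustrative, not the OptimalGrid API:

```java
// Illustrative stand-alone version of the localInteraction rule:
// first row held at 5.0, side columns at 0, inner cells averaged
// over their four neighbors. Not the OptimalGrid API.
public class LocalInteraction {
    static double update(int x, int y, int dim, double[][] grid) {
        if (x == 0) return 5.0;                  // first row of the grid
        if (y == 0 || y == dim - 1) return 0.0;  // side boundary columns
        return (grid[x - 1][y] + grid[x + 1][y]  // inner cell: average
              + grid[x][y - 1] + grid[x][y + 1]) / 4.0;
    }

    public static void main(String[] args) {
        int dim = 4;
        double[][] grid = new double[dim][dim];
        for (int y = 0; y < dim; y++) grid[0][y] = 5.0; // hot first row
        System.out.println(update(1, 1, dim, grid));    // (5+0+0+0)/4 = 1.25
    }
}
```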
11. MPI Solution
- Message Passing Interface: a library specification
- Various implementations of this specification are available, e.g.:
  - LAM/MPI, developed by the Ohio Supercomputer Center.
  - MPICH, developed by Argonne National Laboratory and Mississippi State University. (used in this work)
12. Features of MPI
- Flexible Send and Receive APIs
  - void Comm::Send(void* buf, int count, Datatype datatype, int destination, int tag)
  - void Comm::Recv(void* buf, int count, Datatype datatype, int source, int tag)
- Collective communications support
  - Broadcast
  - Scatter and Gather operations between a set of processes
  - Collective computation operations such as minimum, maximum, sum, etc.
13. Features of MPI (contd.)
- Virtual Topologies
- Communication Modes
  - Non-blocking versions of the Send/Receive APIs
  - Synchronous Mode
  - Buffered Mode
- Debugging and Profiling hooks
14. MPI Solution Overview
[Figure: the input grid divided into four partitions, one per process (Process 0 - Process 3).]
15. MPI Implementation - Jacobi Iteration Method
Input: Number of Processes (N), Partition Size, Number of Iterations
- All processes (0..N-1) call Create_cart(): a Cartesian matrix of processes is formed.
- CommShift(): each process obtains the ids of its left, top, right and bottom neighbor processes.
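As an illustration of what the Cartesian-topology calls provide, the neighbor bookkeeping can be sketched in plain Java (this is a hypothetical stand-in, not the MPI API; MPI itself returns MPI_PROC_NULL at a non-periodic boundary, represented here as -1):

```java
// Hypothetical sketch of the neighbor bookkeeping that the Cartesian
// topology calls (Create_cart / CommShift) provide. rows * cols == N
// processes, ranks laid out row-major, no wraparound.
public class CartGrid {
    final int rows, cols;

    CartGrid(int rows, int cols) { this.rows = rows; this.cols = cols; }

    // Returns {left, top, right, bottom} neighbor ranks, -1 if none.
    int[] neighbors(int rank) {
        int r = rank / cols, c = rank % cols;
        return new int[] {
            c > 0        ? rank - 1    : -1,  // left
            r > 0        ? rank - cols : -1,  // top
            c < cols - 1 ? rank + 1    : -1,  // right
            r < rows - 1 ? rank + cols : -1   // bottom
        };
    }

    public static void main(String[] args) {
        CartGrid g = new CartGrid(2, 2);  // 4 processes in a 2 x 2 grid
        int[] n = g.neighbors(0);         // rank 0: top-left corner
        System.out.println(n[0] + " " + n[1] + " " + n[2] + " " + n[3]);
        // prints "-1 -1 1 2"
    }
}
```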
16. MPI Implementation - Jacobi Iteration Method (contd.)
- Process 0 computes the grid co-ordinates of the partition to be assigned to each process, then sends them.
- Processes 1..N-1 wait for their partition co-ordinates from Process 0.
- Each process allocates boundary buffers for its partition.
17. MPI Implementation - Jacobi Iteration Method (contd.)
Repeat until the iterations are finished:
- Issue calls to the non-blocking Isend and Irecv methods to send/receive boundary data.
- Compute the inner cells (overlapped with the communication).
- Wait for the Isend and Irecv calls to complete.
- Compute the outer cells.
Then send the result data to Process 0.
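The inner/outer split in this loop can be sketched as follows; a minimal illustration in plain Java, with the actual MPI calls only indicated in comments (class and method names are hypothetical):

```java
// Hypothetical sketch of the inner/outer split: inner cells depend only
// on local data and can be computed while the non-blocking boundary
// exchange is in flight; outer (edge) rows are computed after the Irecv
// completes. The n x n partition is stored with a one-cell ghost border.
public class InnerOuterSplit {
    static void computeRange(double[][] old, double[][] cur,
                             int lo, int hi, int n) {
        for (int i = lo; i <= hi; i++)
            for (int j = 1; j <= n; j++)
                cur[i][j] = (old[i - 1][j] + old[i + 1][j]
                           + old[i][j - 1] + old[i][j + 1]) / 4.0;
    }

    public static void main(String[] args) {
        int n = 4;
        double[][] old = new double[n + 2][n + 2];
        double[][] cur = new double[n + 2][n + 2];
        for (int j = 0; j < n + 2; j++) old[0][j] = 5.0; // ghost row from neighbor

        computeRange(old, cur, 2, n - 1, n); // inner rows: overlap with Isend/Irecv
        // ... a real program would wait for the exchange to complete here ...
        computeRange(old, cur, 1, 1, n);     // outer rows, after boundaries arrive
        computeRange(old, cur, n, n, n);

        System.out.println(cur[1][1]); // (5+0+0+0)/4 = 1.25
    }
}
```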
18. Parallel Adaptive Distributed Simulations: A New Model
- But why?
- Experiment with bringing together the concepts of partitions, double buffers, thread pools, jobs, synchronization schemes in thread pools, and load balancing by moving fixed-size jobs across machines.
- OptimalGrid does some of this; however, it is proprietary software, hence there is no access to its source.
- It is fairly easy to code the application using MPI; however, for problems such as atomistic motion simulation, load balancing is a required feature.
- Can we do better?
19. Thread Pool
20. Jobs and Partitions
- Data Members
  - int phase
- Methods
  - bool execute(phase)

public bool execute(phase) {
    switch (phase) {
    case 0:
        // Sequential code for phase 0
        phase = 1;   // about to enter the synchronization part: advance phase by 1
        if (!synchronizationMethod(this))
            return false;   // synchronization returned false: the job has to wait on some
                            // condition, so the job thread should relinquish this job
        else
            return true;    // synchronization returned true: the job can continue
    case 1:
        ...
    }
}
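A minimal runnable sketch of this phase-based job scheme, using a standard Java thread pool; the class and method names are hypothetical illustrations, not the PADS source:

```java
import java.util.concurrent.*;

// Hypothetical sketch of the PADS-style job model: execute() runs one
// phase and returns false when the job must wait, so the worker thread
// can relinquish it and pick up other work.
public class PhasedJobDemo {
    static class Job {
        int phase = 0;

        // Returns true if the job may continue, false if it must wait.
        boolean execute() {
            switch (phase) {
                case 0:
                    phase = 1;            // about to synchronize: advance phase
                    return synchronize(); // false -> relinquish the job
                case 1:
                    phase = 2;            // final phase: job done
                    return true;
                default:
                    return true;
            }
        }

        boolean synchronize() { return true; } // stub: boundary data already available
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2); // the thread pool
        Job job = new Job();
        // Re-submit the job until it reaches its final phase.
        while (job.phase < 2) {
            Future<Boolean> f = pool.submit(job::execute);
            f.get(); // if false, a real scheduler would requeue the job when ready
        }
        pool.shutdown();
        System.out.println("final phase = " + job.phase); // prints "final phase = 2"
    }
}
```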
21. PADS - Architecture
Controller agent (per host):
- Establish communication channels with the controller as well as with other controller agents
- Initialize the thread pool on the host
- Deploy jobs received from the controller in the thread pool
- Handle the communication requirements of each job
- Respond to controller messages (load balancing)
- Send results back to the controller after the iterations are complete

Controller:
- Parsing input
- Opening and monitoring communication channels with the controller agents
- Partitioning the input grid
- Assigning the jobs to the hosts/nodes
- Co-ordinating load balancing
- Collecting results from all hosts after the iterations are completed
- Emitting output in the specified format
22. Communication between Controller Agents
23. Synchronization: Jobs and Controller Agent
- All communication between jobs is through the controller agent(s)
- Hence, synchronization is required only between the controller agent and the job
- Shared job data:
  - Time step of the job
  - Boundary buffers
  - Waiting flag
  - Frozen flag
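The shared job data listed above can be sketched as a small monitor class; a hypothetical illustration of the synchronization pattern, not the PADS source (field and method names are assumed):

```java
// Hypothetical sketch of the job state shared between a job and its
// controller agent; all fields are guarded by the object's monitor.
public class SharedJobData {
    private int timeStep = 0;
    private double[] boundaryBuffer = new double[0];
    private boolean waiting = true;   // job must wait until boundary data arrives
    private boolean frozen = false;   // job frozen, e.g. while being load-balanced

    // Controller agent: deliver boundary data and wake the waiting job.
    public synchronized void deliverBoundary(double[] data) {
        boundaryBuffer = data;
        waiting = false;
        notifyAll();
    }

    // Job: advance one time step, blocking until boundary data arrives
    // and the job is not frozen.
    public synchronized double[] beginStep() throws InterruptedException {
        while (waiting || frozen) wait();
        timeStep++;
        waiting = true; // the next step will wait for fresh boundary data
        return boundaryBuffer;
    }

    public synchronized void setFrozen(boolean f) { frozen = f; notifyAll(); }
    public synchronized int getTimeStep() { return timeStep; }

    public static void main(String[] args) throws Exception {
        SharedJobData s = new SharedJobData();
        // Controller agent delivers boundary data on its own thread.
        Thread agent = new Thread(() -> s.deliverBoundary(new double[] {1.0, 2.0}));
        agent.start();
        double[] b = s.beginStep(); // blocks until the data is delivered
        System.out.println(s.getTimeStep() + " " + b.length); // prints "1 2"
    }
}
```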
24. Load Balancer
[Figure: the Controller exchanges messages with the nodes and coordinates job movement between them.]
25. Overview of Load Balancer Module
26. Overview of Load Balancer Module (2)
27. Overview of Load Balancer Module (3)
28. Overview of Load Balancer Module (4)
29. Performance Comparison: Experiment 1
30. Performance Comparison: Experiment 2
31. Performance Comparison: Experiment 3
32. PADS: Performance comparison for varying number of threads per node (50 x 50 partition size)
33. PADS: Performance comparison for varying number of threads per node (25 x 25 partition size)
34. PADS: Performance comparison for varying number of threads per node (10 x 10 partition size)
35. Preliminary Results for Load Balancer
36. Conclusions
- OptimalGrid seems to perform better than the PADS and MPI solutions for a larger grain size (> 10 µs). (System.nanoTime() accuracy?)
- PADS and MPI perform better than OptimalGrid by an order of magnitude for small grain sizes (4-10 ns).
- OptimalGrid provides features that can be used easily by the user.
- MPI provides hooks for logging and debugging which can be used by the programmer.
- OptimalGrid and PADS allow load balancing to be done automatically. With PADS, the simulation results show a good performance improvement with load balancing. MPI does not provide dynamic load balancing.
37. Future Work
- Formulation and implementation of policies for dynamic load balancing in PADS.
- Experiment with flexibility: partitions could be allowed to have variable dimensions; however, synchronization and communication would become more complex and might give rise to more overhead.
- Heterogeneity in the duration of a time-step and in computation among jobs needs to be allowed for the implementation of certain problems.
- Develop a GUI for PADS.
38. Acknowledgements
- Dr. Virgil E. Wallentine
- Dr. Daniel A. Andresen
- Dr. Gurdip Singh
- Dr. Masaaki Mizuno
39. Questions?