Title: Comparative Study of Techniques for Parallelization of the Grid Simulation Problem
1. Comparative Study of Techniques for Parallelization of the Grid Simulation Problem
- Prepared by Arthi Ramachandran
- Kansas State University
2. The Problem
- Iterative schemes, e.g.:
  - Jacobi method
  - Kinetic Monte Carlo
  - Gauss-Seidel method
- Input data has a grid/matrix topology.
- Computation of the value in a cell at time-step t requires the values of the neighboring cells at time-step t-1.
- Parallelization of this problem
3. The Problem (contd.)
[Figure: Jacobi simulation of heat flow on a grid; one boundary row of cells is held at 5.0 while the remaining cells start at 0.]
4. Outline
- Goal
- Available Solutions
- Parallel Adaptive Distributed Simulations (PADS) Architecture
- Performance Comparison Results
- Conclusions
- Future Work
5. The Goal
- A framework such that the user/programmer has to code only the problem-specific logic.
- The framework manages the parallelization of the user's application.
- Load balancing is required for some problems to achieve good performance.
- The model we build aims at good performance by moving fixed-size jobs across machines; the results later show a clear performance improvement with load balancing.
6. Solution 1: OptimalGrid
- Developed by IBM Almaden Research Lab
- Specifically built to parallelize connected problems
- Built using Java
- User only needs to supply:
  - Java code for the problem to be parallelized
  - Changes to configuration files to fine-tune the behaviour of OptimalGrid (if required)
7. OptimalGrid Architecture
- Manages the Compute Agents
- Monitors the agents, tracks their status, and performs optimizations if necessary
- Invokes the Problem Builder if necessary, which automatically partitions the problem
- Distributes the problem among the agents
8. Data Units
- Original Problem Cell (OPC)
9. Implementation of the Jacobi Method of Heat Flow in OptimalGrid (1)
- EntityJacobi extends EntityAbstract
- Data Members
  - double temperature
  - Class Name
- Methods
  - double getTemperature()
  - void setTemperature(double)
  - void initFromXML(Element entity)
  - Element getXML()
10. Implementation of the Jacobi Method of Heat Flow in OptimalGrid (2)
- OPCJacobi extends OPCAbstract
- Data Members
  - Version Identifier
  - Class Name
- Methods
  - propagate()
  - localInteraction()

localInteraction(ArrayList occupants):
    double temperature = 0.0;
    if (this.loc.x == 0) {
        // Cell is on the first row of the grid
        temperature = 5.0;
    } else if (this.loc.y == 0 || this.loc.y == Grid_Dimension - 1) {
        temperature = 0;
    } else {
        // Inner cells: compute
        for (each neighbor entry in the occupants list)
            temperature += occupantsEntry.getTemperature();
        temperature /= 4;
    }
    occupants.get(0).setTemperature(temperature);
    // Remove all neighbor entries from the occupants list

propagate():
    ArrayList newOccupants;
    ArrayList neibList = this.getAllOpcNeighbors();
    for (each entry in neibList)
        newOccupants.add(neibListEntry.getOccupants());
    return newOccupants;
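The update rule above can be sketched as a small stand-alone Java method; the class and method names here are illustrative, not the OptimalGrid API:

```java
// Illustrative stand-alone version of the localInteraction rule:
// first row held at 5.0, side columns at 0, inner cells averaged
// over their four neighbors. Not the OptimalGrid API.
public class LocalInteraction {
    static double update(int x, int y, int dim, double[][] grid) {
        if (x == 0) return 5.0;                  // first row of the grid
        if (y == 0 || y == dim - 1) return 0.0;  // side boundary columns
        return (grid[x - 1][y] + grid[x + 1][y]  // inner cell: average
              + grid[x][y - 1] + grid[x][y + 1]) / 4.0;
    }

    public static void main(String[] args) {
        int dim = 4;
        double[][] grid = new double[dim][dim];
        for (int y = 0; y < dim; y++) grid[0][y] = 5.0; // hot first row
        System.out.println(update(1, 1, dim, grid));    // (5+0+0+0)/4 = 1.25
    }
}
```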
11. MPI Solution
- Message Passing Interface: a library specification
- Various implementations of this specification are available, e.g.:
  - LAM/MPI, developed by the Ohio Supercomputer Center.
  - MPICH, developed by Argonne National Laboratory and Mississippi State University. (used in this work)
12. Features of MPI
- Flexible Send and Receive APIs
  - void Comm::Send(void* buf, int count, Datatype datatype, int destination, int tag)
  - void Comm::Recv(void* buf, int count, Datatype datatype, int source, int tag)
- Collective communications support
  - Broadcast
  - Scatter and Gather operations between a set of processes
  - Collective computation operations such as minimum, maximum, sum, etc.
13. Features of MPI (contd.)
- Virtual Topologies
- Communication Modes
  - Non-blocking versions of the Send/Receive APIs
  - Synchronous Mode
  - Buffered Mode
- Debugging and Profiling hooks
14. MPI Solution Overview
[Figure: the input grid divided into four partitions, one per process (Process 0 - Process 3).]
15. MPI Implementation - Jacobi Iteration Method
Input: Number of Processes (N), Partition Size, Number of Iterations
- All processes (0..N-1) call Create_cart(): a Cartesian matrix of processes is formed.
- CommShift(): each process obtains the ids of its left, top, right and bottom neighbor processes.
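As an illustration of what the Cartesian-topology calls provide, the neighbor bookkeeping can be sketched in plain Java (this is a hypothetical stand-in, not the MPI API; MPI itself returns MPI_PROC_NULL at a non-periodic boundary, represented here as -1):

```java
// Hypothetical sketch of the neighbor bookkeeping that the Cartesian
// topology calls (Create_cart / CommShift) provide. rows * cols == N
// processes, ranks laid out row-major, no wraparound.
public class CartGrid {
    final int rows, cols;

    CartGrid(int rows, int cols) { this.rows = rows; this.cols = cols; }

    // Returns {left, top, right, bottom} neighbor ranks, -1 if none.
    int[] neighbors(int rank) {
        int r = rank / cols, c = rank % cols;
        return new int[] {
            c > 0        ? rank - 1    : -1,  // left
            r > 0        ? rank - cols : -1,  // top
            c < cols - 1 ? rank + 1    : -1,  // right
            r < rows - 1 ? rank + cols : -1   // bottom
        };
    }

    public static void main(String[] args) {
        CartGrid g = new CartGrid(2, 2);  // 4 processes in a 2 x 2 grid
        int[] n = g.neighbors(0);         // rank 0: top-left corner
        System.out.println(n[0] + " " + n[1] + " " + n[2] + " " + n[3]);
        // prints "-1 -1 1 2"
    }
}
```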
16. MPI Implementation - Jacobi Iteration Method (contd.)
- Process 0 computes the grid co-ordinates of the partition to be assigned to each process, then sends them.
- Processes 1..N-1 wait for their partition co-ordinates from Process 0.
- Each process allocates boundary buffers for its partition.
17. MPI Implementation - Jacobi Iteration Method (contd.)
Repeat until the iterations are finished:
- Issue calls to the non-blocking Isend and Irecv methods to send/receive boundary data.
- Compute the inner cells (overlapped with the communication).
- Wait for the Isend and Irecv calls to complete.
- Compute the outer cells.
Then send the result data to Process 0.
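The inner/outer split in this loop can be sketched as follows; a minimal illustration in plain Java, with the actual MPI calls only indicated in comments (class and method names are hypothetical):

```java
// Hypothetical sketch of the inner/outer split: inner cells depend only
// on local data and can be computed while the non-blocking boundary
// exchange is in flight; outer (edge) rows are computed after the Irecv
// completes. The n x n partition is stored with a one-cell ghost border.
public class InnerOuterSplit {
    static void computeRange(double[][] old, double[][] cur,
                             int lo, int hi, int n) {
        for (int i = lo; i <= hi; i++)
            for (int j = 1; j <= n; j++)
                cur[i][j] = (old[i - 1][j] + old[i + 1][j]
                           + old[i][j - 1] + old[i][j + 1]) / 4.0;
    }

    public static void main(String[] args) {
        int n = 4;
        double[][] old = new double[n + 2][n + 2];
        double[][] cur = new double[n + 2][n + 2];
        for (int j = 0; j < n + 2; j++) old[0][j] = 5.0; // ghost row from neighbor

        computeRange(old, cur, 2, n - 1, n); // inner rows: overlap with Isend/Irecv
        // ... a real program would wait for the exchange to complete here ...
        computeRange(old, cur, 1, 1, n);     // outer rows, after boundaries arrive
        computeRange(old, cur, n, n, n);

        System.out.println(cur[1][1]); // (5+0+0+0)/4 = 1.25
    }
}
```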
18. Parallel Adaptive Distributed Simulations: A New Model
- But why?
- Experiment with bringing together the concepts of partitions, double buffers, thread pools, jobs, synchronization schemes in thread pools, and load balancing by moving fixed-size jobs across machines.
- OptimalGrid does some of this; however, it is proprietary software, hence there is no access to its source.
- It is fairly easy to code the application using MPI; however, for problems such as atomistic motion simulation, load balancing is a required feature.
- Can we do better?
19. Thread Pool
20. Jobs and Partitions
- Data Members
  - int phase
- Methods
  - bool execute(phase)

public bool execute(phase) {
    switch (phase) {
    case 0:
        // Sequential code for phase 0
        phase = 1;   // about to enter the synchronization part: advance phase by 1
        if (!synchronizationMethod(this))
            return false;   // synchronization returned false: the job has to wait on some
                            // condition, so the job thread should relinquish this job
        else
            return true;    // synchronization returned true: the job can continue
    case 1:
        ...
    }
}
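A minimal runnable sketch of this phase-based job scheme, using a standard Java thread pool; the class and method names are hypothetical illustrations, not the PADS source:

```java
import java.util.concurrent.*;

// Hypothetical sketch of the PADS-style job model: execute() runs one
// phase and returns false when the job must wait, so the worker thread
// can relinquish it and pick up other work.
public class PhasedJobDemo {
    static class Job {
        int phase = 0;

        // Returns true if the job may continue, false if it must wait.
        boolean execute() {
            switch (phase) {
                case 0:
                    phase = 1;            // about to synchronize: advance phase
                    return synchronize(); // false -> relinquish the job
                case 1:
                    phase = 2;            // final phase: job done
                    return true;
                default:
                    return true;
            }
        }

        boolean synchronize() { return true; } // stub: boundary data already available
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2); // the thread pool
        Job job = new Job();
        // Re-submit the job until it reaches its final phase.
        while (job.phase < 2) {
            Future<Boolean> f = pool.submit(job::execute);
            f.get(); // if false, a real scheduler would requeue the job when ready
        }
        pool.shutdown();
        System.out.println("final phase = " + job.phase); // prints "final phase = 2"
    }
}
```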
21. PADS - Architecture
Controller agent (per host):
- Establish communication channels with the controller as well as with other controller agents
- Initialize the thread pool on the host
- Deploy jobs received from the controller in the thread pool
- Handle the communication requirements of each job
- Respond to controller messages (load balancing)
- Send results back to the controller after the iterations are complete

Controller:
- Parsing input
- Opening and monitoring communication channels with the controller agents
- Partitioning the input grid
- Assigning the jobs to the hosts/nodes
- Co-ordinating load balancing
- Collecting results from all hosts after the iterations are completed
- Emitting output in the specified format
22. Communication between Controller Agents
23. Synchronization: Jobs and Controller Agent
- All communication between jobs is through the controller agent(s)
- Hence, synchronization is required only between the controller agent and the job
- Shared job data:
  - Time step of the job
  - Boundary buffers
  - Waiting flag
  - Frozen flag
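The shared job data listed above can be sketched as a small monitor class; a hypothetical illustration of the synchronization pattern, not the PADS source (field and method names are assumed):

```java
// Hypothetical sketch of the job state shared between a job and its
// controller agent; all fields are guarded by the object's monitor.
public class SharedJobData {
    private int timeStep = 0;
    private double[] boundaryBuffer = new double[0];
    private boolean waiting = true;   // job must wait until boundary data arrives
    private boolean frozen = false;   // job frozen, e.g. while being load-balanced

    // Controller agent: deliver boundary data and wake the waiting job.
    public synchronized void deliverBoundary(double[] data) {
        boundaryBuffer = data;
        waiting = false;
        notifyAll();
    }

    // Job: advance one time step, blocking until boundary data arrives
    // and the job is not frozen.
    public synchronized double[] beginStep() throws InterruptedException {
        while (waiting || frozen) wait();
        timeStep++;
        waiting = true; // the next step will wait for fresh boundary data
        return boundaryBuffer;
    }

    public synchronized void setFrozen(boolean f) { frozen = f; notifyAll(); }
    public synchronized int getTimeStep() { return timeStep; }

    public static void main(String[] args) throws Exception {
        SharedJobData s = new SharedJobData();
        // Controller agent delivers boundary data on its own thread.
        Thread agent = new Thread(() -> s.deliverBoundary(new double[] {1.0, 2.0}));
        agent.start();
        double[] b = s.beginStep(); // blocks until the data is delivered
        System.out.println(s.getTimeStep() + " " + b.length); // prints "1 2"
    }
}
```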
24. Load Balancer
[Figure: the Controller exchanges messages with the nodes and coordinates job movement between them.]
25. Overview of Load Balancer Module
26. Overview of Load Balancer Module (2)
27. Overview of Load Balancer Module (3)
28. Overview of Load Balancer Module (4)
29. Performance Comparison: Experiment 1
30. Performance Comparison: Experiment 2
31. Performance Comparison: Experiment 3
32. PADS: Performance comparison for varying number of threads per node (50 x 50 partition size)
33. PADS: Performance comparison for varying number of threads per node (25 x 25 partition size)
34. PADS: Performance comparison for varying number of threads per node (10 x 10 partition size)
35. Preliminary Results for Load Balancer
36. Conclusions
- OptimalGrid seems to perform better than the PADS and MPI solutions for a larger grain size (> 10 µs). (System.nanoTime() accuracy?)
- PADS and MPI perform better than OptimalGrid by an order of magnitude for small grain sizes (4-10 ns).
- OptimalGrid provides features that can be used easily by the user.
- MPI provides hooks for logging and debugging which can be used by the programmer.
- OptimalGrid and PADS allow load balancing to be done automatically. With PADS, the simulation results show a good performance improvement with load balancing. MPI does not provide dynamic load balancing.
37. Future Work
- Formulation and implementation of policies for dynamic load balancing in PADS.
- Experiment with flexibility: partitions could be allowed to have variable dimensions; however, synchronization and communication would become more complex and might give rise to more overhead.
- Heterogeneity in the duration of a time-step and in computation among jobs needs to be allowed for the implementation of certain problems.
- Develop a GUI for PADS.
38. Acknowledgements
- Dr. Virgil E. Wallentine
- Dr. Daniel A. Andresen
- Dr. Gurdip Singh
- Dr. Masaaki Mizuno
39. Questions?