1
Comparative Study of Techniques for
Parallelization of the Grid Simulation Problem
  • Prepared by
  • Arthi Ramachandran
  • Kansas State University

2
The Problem
  • Iterative schemes, e.g.:
  • Jacobi method
  • Kinetic Monte Carlo
  • Gauss-Seidel method
  • Input data has a grid/matrix topology.
  • Computing the value of a cell at time-step t
    requires the values of the neighboring cells at
    time-step t-1 (see the update rule below).
  • Goal: parallelize this class of problems.
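For the Jacobi method, for instance, each interior cell is updated to the average of its four neighbors from the previous time step (consistent with the localInteraction() code on slide 10, which sums the neighbor temperatures and divides by 4):

    T(i,j,t) = ( T(i-1,j,t-1) + T(i+1,j,t-1) + T(i,j-1,t-1) + T(i,j+1,t-1) ) / 4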

3
The Problem (contd.)

[Figure: Jacobi simulation of heat flow. The top row of the grid is held at 5.0; the remaining cells start at 0.]
4
Outline
  • Goal
  • Available Solutions
  • Parallel Adaptive Distributed Simulations
    Architecture
  • Performance Comparison Results
  • Conclusions
  • Future Work

5
The Goal
  • A framework in which the user/programmer codes
    only the problem-specific logic.
  • The framework manages the parallelization of the
    user's application.
  • Load balancing is required for some problems to
    achieve good performance.
  • The model we build is geared towards achieving
    good performance by moving fixed-size jobs across
    machines; a later slide shows that load balancing
    yields a substantial performance improvement.

6
Solution 1: OptimalGrid
  • Developed by IBM Almaden Research Lab
  • Specifically built to parallelize connected
    problems
  • Built using Java
  • The user only needs to supply:
  • Java code for the problem to be parallelized
  • Changes to configuration files to fine-tune the
    behaviour of OptimalGrid (if required)

7
OptimalGrid Architecture
  • Manages the Compute Agents
  • Monitors the agents and tracks their status;
    performs optimizations if necessary
  • Invokes the Problem Builder if necessary, which
    automatically partitions the problem
  • Distributes the problem among the agents
8
Data Units
Original Problem Cell (OPC)
9
Implementation of Jacobi method of Heat Flow
OptimalGrid (1)
  • EntityJacobi extends EntityAbstract
  • Data Members:
  • double temperature
  • Class Name
  • Methods:
  • double getTemperature()
  • void setTemperature(double)
  • void initFromXML(Element entity)
  • Element getXML()

10
Implementation of Jacobi method of Heat Flow
OptimalGrid (2)
  • OPCJacobi extends OPCAbstract
  • Data Members:
  • Version Identifier
  • Class Name
  • Methods:
  • propagate()
  • localInteraction()

localInteraction(ArrayList occupants) {
    double temperature = 0.0;
    if (this.loc.x == 0) {
        // Cell is on the first row of the grid
        temperature = 5.0;
    } else if (this.loc.y == 0 || this.loc.y == Grid_Dimension - 1) {
        temperature = 0;
    } else {
        // Inner cells: average the four neighbors
        for (each neighbor entry in occupants list)
            temperature += occupantsEntry.getTemperature();
        temperature = temperature / 4;
    }
    occupants(0).setTemperature(temperature);
    // Remove all neighbor entries from occupantsList
}

propagate() {
    ArrayList newOccupants;
    ArrayList neibList = this.getAllOpcNeighbors();
    for (each entry in neibList)
        newOccupants.add(neibListEntry.getOccupants());
    return newOccupants;
}
11
MPI Solution
  • Message Passing Interface (MPI) is a library
    specification, not a single implementation.
  • Various implementations of this specification are
    available, e.g.:
  • LAM/MPI, developed at the Ohio Supercomputer
    Center.
  • MPICH, developed by Argonne National Laboratory
    and Mississippi State University (used in this
    work).

12
Features of MPI
  • Flexible Send and Receive APIs, e.g. (C++
    bindings):
  • void Comm::Send(void* buf, int count, Datatype
    datatype, int destination, int tag)
  • void Comm::Recv(void* buf, int count, Datatype
    datatype, int source, int tag)
  • Collective Communications support:
  • Broadcast
  • Scatter and Gather operations between a set of
    processes
  • Collective computation operations such as
    minimum, maximum, sum, etc. (see the sketch below)
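A minimal sketch of these features using the standard C bindings (the deck's MPI work used MPICH from C; this program is illustrative, not taken from the slides):

#include <mpi.h>
#include <stdio.h>

/* Sketch: point-to-point and collective communication. */
int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)rank, sum = 0.0;

    /* Collective computation: sum `local` across all processes. */
    MPI_Reduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %f\n", sum);

    /* Point-to-point: process 0 sends its value to process 1. */
    if (rank == 0 && size > 1)
        MPI_Send(&local, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(&local, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}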

13
Features of MPI (contd.)
  • Virtual Topologies
  • Communication Modes (sketched below):
  • Non-blocking versions of Send/Receive APIs
  • Synchronous Mode
  • Buffered Mode
  • Debugging and Profiling hooks
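For illustration, the same transfer can be issued under each mode with the C bindings; this is a hedged sketch, with buf, n, dest, tag, and comm assumed to be set up by the caller:

#include <mpi.h>
#include <stdlib.h>

/* Sketch (not from the slides): one transfer under different modes. */
void send_modes(double *buf, int n, int dest, int tag, MPI_Comm comm) {
    MPI_Request req;

    MPI_Send(buf, n, MPI_DOUBLE, dest, tag, comm);   /* standard mode */
    MPI_Ssend(buf, n, MPI_DOUBLE, dest, tag, comm);  /* synchronous: completes only
                                                        after the matching receive
                                                        has started */

    /* Buffered mode requires user-attached buffer space. */
    int size;
    MPI_Pack_size(n, MPI_DOUBLE, comm, &size);
    char *space = malloc(size + MPI_BSEND_OVERHEAD);
    MPI_Buffer_attach(space, size + MPI_BSEND_OVERHEAD);
    MPI_Bsend(buf, n, MPI_DOUBLE, dest, tag, comm);
    MPI_Buffer_detach(&space, &size);
    free(space);

    /* Non-blocking send, later completed with a wait. */
    MPI_Isend(buf, n, MPI_DOUBLE, dest, tag, comm, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}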

14
MPI Solution overview
[Figure: the input grid is partitioned among Processes 0-3.]
15
MPI Implementation - Jacobi Iteration method
Inputs: Number of Processes (N), Partition Size,
Number of Iterations

All processes (0..N-1) call Create_cart(), yielding
a Cartesian matrix of processes. CommShift() then
gives each process the ids of its left, top, right,
and bottom neighbor processes (sketched below).
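With the standard C bindings, the same setup might look as follows; this is a sketch, and the slide's Create_cart()/CommShift() names suggest wrapper or C++-binding calls around these routines:

#include <mpi.h>

/* Sketch: build a 2-D Cartesian topology and find the four neighbors. */
void setup_cartesian(MPI_Comm *cart,
                     int *left, int *right, int *up, int *down) {
    int nprocs, dims[2] = {0, 0}, periods[2] = {0, 0};
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 2, dims);          /* pick a 2-D process grid */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods,
                    /*reorder=*/1, cart);
    MPI_Cart_shift(*cart, 0, 1, up, down);     /* neighbors along dimension 0 */
    MPI_Cart_shift(*cart, 1, 1, left, right);  /* neighbors along dimension 1 */
}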
16
MPI Implementation - Jacobi Iteration method
(contd.)
Process 0 computes the grid co-ordinates of the
partition to be assigned to each process and sends
them; Processes 1..N-1 wait to receive their
partition co-ordinates from Process 0. Each process
then allocates boundary buffers for its partition.
17
MPI Implementation - Jacobi Iteration method
(contd.)
Each iteration:
  • Issue calls to the non-blocking Isend and Irecv
    methods to send/receive boundary data
  • Compute the inner cells (no neighbor data needed)
  • Wait for the Isend and Irecv calls to complete
  • Compute the outer cells
When the iterations are finished, send the result
data to Process 0. A sketch of this loop follows.
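A hypothetical sketch of the loop above; only the top/bottom exchange is shown for brevity, and the buffer names, neighbor ranks, and compute_* helpers are illustrative, not from the slides:

#include <mpi.h>

void compute_inner_cells(void);
void compute_outer_cells(void);

void jacobi_iterations(double *top_row, double *bottom_row,
                       double *halo_above, double *halo_below,
                       int cols, int up, int down,
                       MPI_Comm cart, int iterations) {
    for (int iter = 0; iter < iterations; iter++) {
        MPI_Request reqs[4];
        /* 1. Start the non-blocking boundary exchange. */
        MPI_Isend(top_row,    cols, MPI_DOUBLE, up,   0, cart, &reqs[0]);
        MPI_Isend(bottom_row, cols, MPI_DOUBLE, down, 0, cart, &reqs[1]);
        MPI_Irecv(halo_above, cols, MPI_DOUBLE, up,   0, cart, &reqs[2]);
        MPI_Irecv(halo_below, cols, MPI_DOUBLE, down, 0, cart, &reqs[3]);
        /* 2. Inner cells need no neighbor data, so this work
              overlaps with the communication started above. */
        compute_inner_cells();
        /* 3. Wait for the boundary exchange to complete. */
        MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
        /* 4. Outer cells use the freshly received halo rows. */
        compute_outer_cells();
    }
    /* The result data would then be sent back to Process 0. */
}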
18
Parallel Adaptive Distributed Simulations (PADS): a
new model
  • But why?
  • To experiment with bringing together the concepts
    of partitions, double buffers, thread pools, jobs,
    synchronization schemes in thread pools, and load
    balancing by moving fixed-size jobs across
    machines.
  • OptimalGrid does some of this; however, it is
    proprietary software, so there is no access to its
    source.
  • It is fairly easy to code the application using
    MPI; however, for problems such as atomistic
    motion simulation, load balancing is a required
    feature.
  • Can we do better?

19
Thread Pool
20
Jobs and partitions
  • Data Members:
  • int phase
  • Methods:
  • bool execute()

public bool execute(phase) {
    switch (phase) {
    case 0:
        // Sequential code for phase 0.
        // About to enter the synchronization part:
        // advance the phase by 1.
        phase = 1;
        if (!synchronizationMethod(this))
            // Synchronization returned false: the job
            // has to wait on some condition, and the
            // job thread should relinquish this job.
            return false;
        else
            // Synchronization returned true: the job
            // can continue.
            return true;
    case 1:
        ...
    }
}
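A minimal worker-loop sketch of how a pool thread might drive such phase-based jobs; Job, queue_take(), queue_park(), and job_execute() are assumed helpers, not PADS APIs:

#include <stdbool.h>
#include <stddef.h>

typedef struct Job Job;
bool job_execute(Job *job);   /* runs the job's current phase; false = must wait */
Job *queue_take(void);        /* blocks until a runnable job is available */
void queue_park(Job *job);    /* holds the job until its condition is signaled */

void *worker(void *arg) {
    (void)arg;
    for (;;) {
        Job *job = queue_take();
        /* If the job's synchronization method returns false, the thread
           relinquishes the job instead of blocking on the condition,
           freeing itself to run other runnable jobs in the pool. */
        if (!job_execute(job))
            queue_park(job);
    }
    return NULL;
}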

21
PADS - Architecture
Controller Agent (one per host):
  • Establish communication channels with the
    controller as well as with other controller agents
  • Initialize the Thread Pool on the host
  • Deploy jobs received from the controller in the
    Thread Pool
  • Handle the communication requirements of each job
  • Respond to controller messages (load balancing)
  • Send results back to the controller after the
    iterations are complete

Controller:
  • Parsing input
  • Opening and monitoring communication channels with
    the controller agents
  • Partitioning the input grid
  • Assigning the jobs to the hosts/nodes
  • Co-ordinating load balancing
  • Collecting results from all hosts after the
    iterations are completed
  • Emitting output in the specified format

22
Communication between Controller Agents
23
Synchronization Jobs and Controller agent
  • All communication between jobs goes through the
    controller agent(s).
  • Hence, synchronization is required only between
    the controller agent and the job.
  • Shared job data (see the sketch below):
  • Time step of the job
  • Boundary buffers
  • Waiting flag
  • Frozen flag
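An illustrative sketch of that shared state; the field and function names here are assumptions, not PADS APIs:

#include <pthread.h>
#include <stdbool.h>

/* Per-job state shared between a controller agent and a job,
   guarded by a mutex. */
typedef struct {
    int             time_step;         /* current time step of the job */
    double         *boundary_buffers;  /* boundary cells exchanged with neighbors */
    bool            waiting;           /* job is blocked awaiting neighbor data */
    bool            frozen;            /* job is held, e.g. for load balancing */
    pthread_mutex_t lock;              /* guards all fields above */
} SharedJobData;

/* The controller agent delivers neighbor boundary data and clears the
   waiting flag so the job can be rescheduled. */
void deliver_boundary(SharedJobData *d, const double *data, int n) {
    pthread_mutex_lock(&d->lock);
    for (int i = 0; i < n; i++)
        d->boundary_buffers[i] = data[i];
    d->waiting = false;
    pthread_mutex_unlock(&d->lock);
}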

24
Load Balancer
[Figure: the Controller exchanges messages with the
nodes and moves jobs between them to balance load.]
25
Overview of Load Balancer Module
26
Overview of Load Balancer Module (2)
27
Overview of Load Balancer Module (3)
28
Overview of Load Balancer Module (4)
29
Performance Comparison: Experiment 1
30
Performance Comparison: Experiment 2
31
Performance Comparison: Experiment 3
32
PADS Performance comparison for varying number
of threads per node (50 x 50 partition size)
33
PADS Performance comparison for varying number
of threads per node (25 x 25 partition size)
34
PADS Performance comparison for varying number
of threads per node (10 x 10 partition size)
35
Preliminary Results for Load Balancer
36
Conclusions
  • OptimalGrid seems to perform better than the PADS
    and MPI solutions for a larger grain size (around
    10 µs; the accuracy of System.nanoTime() is a
    caveat here).
  • PADS and MPI perform better than OptimalGrid by
    an order of magnitude for a small grain size
    (4 to 10 ns).
  • OptimalGrid provides features that the user can
    apply easily.
  • MPI provides hooks for logging and debugging that
    the programmer can use.
  • OptimalGrid and PADS allow load balancing to be
    done automatically; the simulation results for
    PADS show a good performance improvement with
    load balancing. MPI does not provide dynamic load
    balancing.

37
Future Work
  • Formulation and implementation of policies for
    dynamic load balancing in PADS.
  • Experiment with flexibility: allow partitions of
    variable dimensions; however, synchronization and
    communication become more complex and might
    introduce more overhead.
  • Allow heterogeneity in the duration of a time-step
    and in the computation among jobs, which is needed
    to implement certain problems.
  • Develop a GUI for PADS.

38
Acknowledgements
  • Dr. Virgil E. Wallentine
  • Dr. Daniel A. Andresen
  • Dr. Gurdip Singh
  • Dr. Masaaki Mizuno

39
Questions?