Parallel Programming in Computational Economics: RA-NGM as a Benchmark for MPI


1
Parallel Programming in Computational Economics: RA-NGM as a Benchmark for MPI
  • Michael Michaux
  • (still representing Wharton)

2
Outline
  • Motivation
  • Software and Hardware
  • Software Environments
  • Type of memory
  • MPI Library in detail
  • Philosophy of parallel programming
  • Example RA-NGM
  • Performance

3
Motivation
  • Large models require a lot of CPU time
  • Examples
  • Large state space (e.g. population, OLG, …)
  • High number of agents (e.g. GE with
    Cons/Prod/Gov)
  • Structural estimations (e.g. DSGE models)
  • Large scale simulations (e.g. Forecasting in
    Aiyagari)

4
Motivation
  • Remedies
  • Use fast language (e.g. Fortran, C)
  • Use a fast computer (e.g. Pentium Xeon 4GHz)
  • Then what?
  • Use a cluster: a collection of CPUs (= nodes)
  • Linux: connect PCs and use the Beowulf library
  • Windows: can use a dual Quad-core PC (8 CPUs @ 4GHz)

5
Software Environments
  • There are three major software environments which
    are used for parallel programming
  • MPI (Message Passing Interface)
  • OpenMP
  • PVM (Parallel Virtual Machine)
  • Used as an additional library to a computing
    language
  • Typically used with Fortran or C
  • Also used with Java, MATLAB, R (less popular)

6
Type of Memory
  • Shared Memory
  • Nodes share the same memory space
  • Variables have same value in all nodes (e.g. dual
    core processor)
  • Distributed Memory
  • Nodes have independent memory space
  • Variables have different values in different
    nodes (e.g. computers connected by ethernet)

7
MPI Library
  • MPI Library
  • Designed for distributed-memory environment
  • Routines enable different nodes to send and
    receive data from each other
  • Have to write exactly what each node does
  • Scalability
  • Code doesn't depend on the number of nodes
  • Speed usually increases as the number of nodes
    increases
  • Portability
  • Very standardized
  • MPI code can be used on any cluster (with MPI
    installed)

8
Philosophy of // programming
  • You only write one code, and this code runs
    simultaneously in all the nodes (i.e. processors)
  • In the code
  • Each node is assigned an id: 0, 1, …, Nproc−1
  • Explicitly tell which node does which job
  • Assign jobs to different nodes by referring to id
  • MPI is a distributed-memory environment
  • Remember what data each node owns
  • Tell the nodes to transfer data among them

9
Philosophy of // programming
  • The gain from using parallel code is large when
  • A large part of the code is parallelized
  • Transmission of data across nodes is minimal
  • Idle time is minimized for all the nodes

10
Example: RA-NGM
  • The Bellman Equation (BE) is
  • with parameters
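The equation itself was an image on the original slide and did not survive the transcript. For reference, the standard form of the RA-NGM Bellman equation (written here with a labor choice n, since a FOC for n is used on a later slide; this is the textbook form, not a reconstruction of the slide's exact notation or parameter values) is:

```latex
V(k) \;=\; \max_{c,\,n,\,k'} \Big\{\, u(c,n) \;+\; \beta\, V(k') \,\Big\}
\quad \text{s.t.} \quad c + k' \;=\; f(k,n) \;+\; (1-\delta)\,k ,
```

where β is the discount factor, δ the depreciation rate, and f the production function; the parameter values shown on the slide are not recoverable here.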

11
RA-NGM
  • The idea is to split the capital grid into Nproc
    pieces
  • Do your optimization separately
  • Send results to all the other nodes
  • Wait until data from all other nodes is received

12
Step 1: Do optimization separately
13
Step 2: Send results to all the other nodes
14
Step 3: Wait until data from other nodes is received
15
RA-NGM
  • Solved using all the tricks in the book
  • Monotonicity
  • Concavity
  • FOC for n
  • The grid for capital is

16
Main features in Fortran Code
  • Initialization of MPI
  • MPI_INIT: initialize the MPI environment
  • MPI_COMM_RANK: get id
  • MPI_COMM_SIZE: get nproc
  • Create the local/global variables
  • Allocate the memory: compute variable
    size(id, nproc)
  • Define the local and global variables

17
Main features in Fortran Code
  • Debugging
  • Use the test: if (id == 0) then
  • Communication routines
  • MPI_SEND: send data using tag(id, idnow),
    idnow = 1, …, Nproc−1
  • MPI_RECV: receive data using the same tag(·,·)
  • BE CAREFUL with tags! Easy to mess up!
  • Print results using id == 0 only
  • Finish the code with MPI_FINALIZE

18
Performance
  • I used a PC with 4 CPUs @ 3GHz and 1GB RAM
    (smallest Dell Precision 690)
  • I used the (free) library MPICH2 provided by
    Argonne National Laboratory

19
CPU History
  • 1 node
  • RAM: 395MB
  • time = 100

20
CPU History
  • 8 nodes
  • RAM: 637MB
  • time = 49

22
Conclusion
  • This problem (my code + MPI + Windows) does not
    have very good scalability
  • The best result is time = 49 (roughly a 2x
    speedup) with 4 CPUs!
  • I blame the data communication under Windows
  • In general, performance is lowered by data
    communication when the number of nodes (Nproc) is high.
  • Note that you need more RAM as Nproc is increased.

23
Thank you
  • Courtesy of