MPI - PowerPoint PPT Presentation

About This Presentation
Title:

MPI

Description:

Title: Linux Cluster Workshop Subject: MPI Author: Kadin Tseng Last modified by: Kadin Tseng Created Date: 2/18/1999 9:06:02 PM Document presentation format – PowerPoint PPT presentation

Provided by: Kadin2
Learn more at: http://www.bu.edu
Transcript and Presenter's Notes

1
High Performance Computing with MATLAB
Kadin Tseng
Scientific Computing and Visualization Group
Boston University
2
Outline
  • Performance Issues
  • Memory Access
  • Vectorization
  • Compiler
  • Other Considerations
  • Parallel MATLAB

3
Memory Access
  • Memory access patterns often affect computational
    performance. Here are some effective ways to
    improve it:
  • Allocate array memory before using it
  • For-loop ordering
  • Compute and save arrays in-place wherever possible

4
Allocate Array
  • Allocate array memory before using it.
  • MATLAB is designed primarily as an
    interactive, user-friendly environment, so no
    pre-allocation of memory is required. Often,
    however, array sizes are known a priori.
    Pre-allocating an array ensures that all its elements
    are allocated in one single, contiguous block
    right from the start.

    n = 5000;
    x(1) = 1;
    for i = 2:n
      x(i) = 2*x(i-1);
    end
Wallclock time: 0.0153 seconds

    n = 5000;
    x = ones(n,1);
    x(1) = 1;
    for i = 2:n
      x(i) = 2*x(i-1);
    end
Wallclock time: 0.0002 seconds

The timing data are recorded on Katana. The
actual times can vary significantly depending on
the processor.
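
The comparison above can be reproduced with tic and toc. The following is a minimal sketch, not the author's benchmark script: the names x_grow and x_pre are illustrative, and the multiplier is changed from 2 to 1.0001 only so that the values stay finite. Absolute timings will differ from those on the slide.

```matlab
n = 5000;

% Version 1: x_grow gains one element on every iteration,
% so MATLAB may have to re-allocate it repeatedly.
clear x_grow
tic
x_grow(1) = 1;
for i = 2:n
    x_grow(i) = 1.0001 * x_grow(i-1);
end
t_grow = toc;

% Version 2: x_pre is allocated once and then filled in place.
tic
x_pre = ones(n, 1);
for i = 2:n
    x_pre(i) = 1.0001 * x_pre(i-1);
end
t_pre = toc;

fprintf('growing: %.4f s, pre-allocated: %.4f s\n', t_grow, t_pre);
```

Both loops compute the same values; only the allocation strategy differs.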
5
For-loop Ordering
  • Best if the inner-most for-loop runs over the left-most
    index of the array, the next loop over the next index, and so on.
  • MATLAB stores a multi-dimensional array, x(i,j), in
    column-major order, so its 1D representation, x(k),
    is contiguous in memory when the left-most index varies fastest.

    n = 5000;
    x = zeros(n);
    for i = 1:n        % rows
      for j = 1:n      % columns
        x(i,j) = i+(j-1)*n;
      end
    end
Wallclock time: 0.88 seconds

    n = 5000;
    x = zeros(n);
    for j = 1:n        % columns
      for i = 1:n      % rows
        x(i,j) = i+(j-1)*n;
      end
    end
Wallclock time: 0.48 seconds
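
The value stored in the loops above, i+(j-1)*n, is the linear (1D) index of element (i,j) in MATLAB's column-major layout. A small sketch, using an illustrative n = 4 rather than the 5000 from the slide, makes the mapping visible:

```matlab
n = 4;
x = zeros(n);
for j = 1:n          % columns (outer loop)
    for i = 1:n      % rows (inner loop walks contiguous memory)
        x(i,j) = i + (j-1)*n;
    end
end

% Each element now holds its own linear index.
k = sub2ind(size(x), 3, 2);    % linear index of x(3,2)
disp(k)                        % 7
disp(isequal(x(:)', 1:n^2))    % 1 (true): memory order is 1, 2, ..., n^2
```

With the inner loop on i, consecutive iterations touch consecutive memory locations, which is why the second ordering on the slide is faster.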
6
Compute In-place
  • Computing and saving an array in-place improves
    performance, since no second array has to be allocated

    x = randn(10000);
    tic; y = x.^2; toc
Wallclock time: 1.23 seconds

    x = randn(10000);
    tic; x = x.^2; toc
Wallclock time: 0.49 seconds
7
Other Considerations
  • Use a function m-file instead of a script m-file
    whenever reasonable
  • A script m-file is loaded into memory and evaluated
    one line at a time. Subsequent uses require
    reloading.
  • A function m-file is compiled into pseudo-code
    and is loaded once. Subsequent uses of the
    function will be faster because no reloading is needed.
  • Avoid using virtual memory. Physical memory is
    much faster.
  • Avoid passing large matrices to a function and
    modifying only a handful of elements.
  • Use the MATLAB profiler (profile) to identify hot
    spots for performance enhancement.
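
As an illustration of the profiler bullet, the sketch below wraps a stand-in workload between profile on and profile off. The workload and the variable names are assumptions made for this example; only the profile commands themselves come from MATLAB.

```matlab
profile on                       % start collecting timing data

n = 200;
s = 0;
for k = 1:n
    s = s + sum(sin((1:n) * k)); % stand-in workload to be profiled
end

profile off                      % stop collecting
info = profile('info');          % struct holding the collected statistics

% In MATLAB, the command "profile viewer" opens the interactive report,
% which lists the time spent in each function and line.
```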

8
Vectorization
  • For-loops in MATLAB can, in general, be
    expensive, especially if the loop count is large
    or the loops are nested.
  • Without array allocation, for-loops are very
    costly.
  • From a performance standpoint, in general, a
    compact vector representation should be used in
    place of for-loops. Here is an example.

    i = 0;
    for t = 0:.01:10
      i = i + 1;
      y(i) = sin(t);
    end
Wallclock time: 0.0045 seconds

    t = 0:.01:10;
    y = sin(t);
Wallclock time: 0.0005 seconds
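
The two versions can be timed side by side and checked against each other. This is a sketch under the same loop bounds as the slide; the names y_loop and y_vec are illustrative, and timings will vary by machine.

```matlab
% Loop version: one sin call per element, array grown as it goes.
clear y_loop
tic
i = 0;
for t = 0:.01:10
    i = i + 1;
    y_loop(i) = sin(t);
end
t_loop = toc;

% Vectorized version: a single sin call on the whole vector.
tic
t = 0:.01:10;
y_vec = sin(t);
t_vec = toc;

fprintf('loop: %.4f s, vectorized: %.4f s\n', t_loop, t_vec);
fprintf('max difference: %g\n', max(abs(y_loop - y_vec)));
```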
9
Compiler
  • A MATLAB compiler, mcc, is available.
  • It compiles m-files into C code, object
    libraries, or stand-alone executables.
  • A stand-alone executable generated with mcc can
    run on compatible platforms without an installed
    MATLAB or a MATLAB license.
  • Many MATLAB general and toolbox licenses are
    available at BU. On special occasions, MATLAB
    access may be denied if all licenses are checked
    out. Running a stand-alone executable requires NO licenses
    and no waiting.
  • Some compiled codes may run more efficiently than
    m-files because they are not run in interpreted
    mode.
  • A stand-alone executable lets you share your program without
    revealing the source.
  • http://scv.bu.edu/documentation/tutorials/MATLAB/compiler/

10
Is Parallel MATLAB the way to go ?
  • Even in the best case, it can't compete with
    C/Fortran with MPI/OpenMP
  • It is an acceptable compromise if:
  • Converting your MATLAB code to C/Fortran requires
    too big of an effort and you don't have the time
    or inclination to do that.
  • A big job typically takes hours, rather than
    days, to run on a single processor.
  • You strongly prefer the relative ease and
    efficiency in programming a research code in
    MATLAB.
  • The appropriate multiprocessing MATLAB paradigm
    is at your disposal.

11
Multiprocessing MATLAB
  • 1 MatlabMPI
  • 2 pMatlab
  • 3 SCV's parallel MATLAB
  • 4 Distributed Computing Toolbox
  • 5 Star-P
12
1 MatlabMPI
  • MatlabMPI is a parallel MATLAB package developed
    at Lincoln Lab in Lexington, MA.
  • It does not require or make use of high speed
    interconnect for communication among cluster
    nodes. Instead, it relies on the network file
    system being visible, or shared, by all
    processors. With this, message passing is
    achieved through I/O to the file system.
  • It has a small basic set of utility routines that
    mimic those of the Message Passing Interface
    (MPI) in functionalities. While the MPI routines
    for sending and receiving messages are performed
    via high speed interconnect, the routines in this
    package accomplish the same tasks via I/O.
  • It is good for embarrassingly parallel codes
    that require only infrequent communications.
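
The idea of passing messages through a shared file system can be sketched as follows. This is only an illustration of the concept, not MatlabMPI's actual API or file format: the file names, the ".done" marker convention, and the use of a temporary directory in place of a shared network directory are all assumptions of this example. Sender and receiver run in the same MATLAB session here so that the sketch is self-contained.

```matlab
% A scratch directory stands in for the shared network file system.
shared_dir = tempname;
mkdir(shared_dir);
msg_file  = fullfile(shared_dir, 'msg_from_0_to_1.mat');
done_file = fullfile(shared_dir, 'msg_from_0_to_1.done');

% --- "Send" (what rank 0 would do) ---
data = magic(4);
save(msg_file, 'data');            % write the message payload
fclose(fopen(done_file, 'w'));     % marker created only after the payload is complete

% --- "Receive" (what rank 1 would do) ---
while ~exist(done_file, 'file')    % poll until the marker appears
    pause(0.01);
end
received = load(msg_file, 'data');

% Clean up the scratch files.
delete(msg_file);
delete(done_file);
rmdir(shared_dir);

disp(isequal(received.data, magic(4)))
```

Every message costs at least one file write and one file read, which is why this approach suits codes that communicate rarely.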

13
2 pMatlab
  • pMatlab is a parallel MATLAB package also
    developed at Lincoln Lab in Lexington, MA. It is
    built on top of MatlabMPI.
  • As such, it inherits all the properties of
    MatlabMPI. It can be thought of as providing
    higher-level wrapper functions to insulate the
    programmers from having to deal with lower-level
    function calls to perform parallel tasks.
  • It is good for embarrassingly parallel algorithms
    with very modest amount of communications.

14
3 SCV's parallel MATLAB
  • SCV has a very simple parallel MATLAB package
    that is also based on the shared network file
    system concept as with MatlabMPI.
  • It is subject to most of the same restrictions as
    MatlabMPI. However, there are two departures:
  • 1. Only one batch script and two
    function m-files need to be added to your code.
  • 2. These include a barrier function to
    synchronize work performed on multiprocessing
    nodes. This is typically required for codes that
    contain serial and parallel sections.
  • It is good for embarrassingly parallel algorithms
    with very modest amount of communications.
  • Email or call Kadin if you want to use any of the
    above three packages. An example is given next.

15
SCV parallel MATLAB Example 1
  • This example demonstrates the use of
    multiprocessors to compute C = A + B (matrix
    size is N x N)
  • Decomposition is along columns here; the matrices can also be
    decomposed along rows, or both.
  • C(:, range(rank)) = A(:, range(rank)) + B(:, range(rank))
  • In the above, range(rank) is the range of
    columns as a function of the processor rank
  • range(rank) = rank*n+1 : rank*n+n
    (0 <= rank <= nproc-1; n = N/nproc)
  • For simplicity, N is assumed to be divisible by
    nproc
    N = 8;                                        % size of global matrix A
    I = (1:N)';                                   % generate column vector
    A = I(:, ones(1,N))*10 + I(:, ones(1,N))';    % generate A on current (and all) processes
    [pbegin, pend, rank, nproc] = parallel_info(N);   % query for parallel info
    % rank (0 <= rank <= nproc-1) is the current MATLAB process
    n = N/nproc;                                  % distributed column size of matrix B
    b = I(:, ones(1,n))*10;                       % generate N x n matrix b (local B)
    c = A(:, pbegin:pend) + b;                    % compute local c from A and local b
    save matrix_c                                 % each current dir has its own individual copy of c

16
SCV parallel MATLAB Example 1 (contd)
    % Run barrier to synchronize all processors
    ierr = barrier(rank, nproc);
    % Finally, perform (serial) gather on c of all ranks into C on 0
    if (rank == 0)
      C = zeros(N);                            % allocate C
      C(:,1:n) = c;                            % starts with c from rank 0, which is already in memory
      for k = 1:nproc-1
        i = n*k+1;                             % beginning location to which c will be inserted
        j = n*k+n;                             % end location
        fk = ['../' num2str(k) '/matrix_c'];   % file name of c on process k
        load(fk, 'c');
        C(:,i:j) = c;
      end
      save('../matrixC', 'C');                 % save C to parent dir
    end
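
The save-then-gather pattern can be tried out in a single MATLAB session. The sketch below emulates the per-rank directories under a temporary directory; the directory layout, nproc = 4, and the test matrices are assumptions of this example, and the two loops stand in for the work that the barrier separates in a real run.

```matlab
N = 8; nproc = 4; n = N / nproc;
A = magic(N);                         % arbitrary test matrix
B = 10 * ones(N);                     % arbitrary test matrix
root = tempname;                      % scratch directory standing in for the shared file system
mkdir(root);

% Phase 1: each rank saves its slab c in its own directory, root/<rank>.
for r = 0:nproc-1
    d = fullfile(root, num2str(r));
    mkdir(d);
    cols = r*n+1 : r*n+n;
    c = A(:, cols) + B(:, cols);
    save(fullfile(d, 'matrix_c.mat'), 'c');
end

% (A barrier belongs here in a real run, so that rank 0 does not
%  start reading before every rank has finished writing.)

% Phase 2: rank 0 loads every slab and inserts it into C.
C = zeros(N);
for k = 0:nproc-1
    d = fullfile(root, num2str(k));
    f = fullfile(d, 'matrix_c.mat');
    s = load(f, 'c');
    C(:, n*k+1 : n*k+n) = s.c;
    delete(f);                        % clean up as we go
    rmdir(d);
end
rmdir(root);

disp(isequal(C, A + B))
```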

17
parallel MATLAB Example 1 batch script
    #!/bin/csh
    # Example SGE script for running parallel MATLAB jobs on Katana
    # Submit job with the command: qsub batch_sge.scv
    # "#$ qsub_option" is interpreted by qsub as if
    # "qsub_option" was passed to qsub on the command line.
    # Set hard runtime (wallclock) limit, default is 2 hours. Format: -l h_rt=HH:MM:SS
    #$ -l h_rt=2:00:00
    # Merge stderr into the stdout file to reduce clutter.
    #$ -j y
    # Invoke Parallel Environment for N processors.
    # No default value, it must be specified.
    # For MATLAB apps, DO NOT select omp
    #$ -pe 1_per_node 4
    # end of qsub options
    # By default, the script is executed in the directory from which it was
    # submitted with qsub. You might want to change directories
    # before invoking mpirun ...
    cd $PWD
    # Running the following script generates multiple concurrent copies of MATLAB
    # Use addpath in startup.m to add path to all necessary matlab m-files
    # batch_sge and sge_matlab should live in either $HOME/bin or $PWD
    sge_matlab $PWD scv_matlab_example.m

18
SCV parallel MATLAB Example 2
The airplane is represented with patches of
quadrilateral elements, and the integral
formulation is discretized to yield
[equation not captured in the transcript]
? is the known Neumann boundary condition. f is
the unknown to be solved for.
19
parallel MATLAB Example 2 Geometry
20
parallel MATLAB Example 2 timings
21
How slow is MATLAB compared with C ?
22
4 Distributed Computing Toolbox
  • The MathWorks has a Distributed Computing Toolbox (DCT),
    a parallel MATLAB package that uses the cluster's
    high-speed interconnect for inter-processor
    communications.
  • At present, DCT is not available on SCV machines.

23
5 Star-P
  • Star-P is a parallel MATLAB product of Interactive
    Supercomputing, Inc. It bears some resemblance to
    the pMatlab package in that it enables parallel
    MATLAB while shielding the programmers from most
    of the lower-level parallel programming.
  • Like the MathWorks DCT, Star-P is a parallel MATLAB
    package that uses the high-speed interconnect for
    inter-processor communications.
  • At present, this package is not available on SCV
    machines.

24
Useful SCV Info
  • SCV home page (http://scv.bu.edu/)
  • Resource Applications (https://acct.bu.edu/SCF)
  • Help
  • Web-based tutorials (http://scv.bu.edu/)
  • (MPI, OpenMP, MATLAB, IDL, Graphics tools)
  • HPC consultations by appointment
  • Kadin Tseng (kadin_at_bu.edu)
  • Doug Sondak (sondak_at_bu.edu)
  • help_at_twister.bu.edu, help_at_cootie.bu.edu