Introduction to Parallel Computing, MPI, and OpenMP - PowerPoint PPT Presentation

Slides: 84
Provided by: LisaAult9
Learn more at: http://www.mgnet.org
1
Introduction to Parallel Computing, MPI, and
OpenMP
Chunfang Chen, Danny Thorne, Adam Zornes
2
Outline
  • Introduction to Parallel Computing, by Danny
    Thorne
    - Basic Parallel Computing Concepts
    - Hardware Characteristics
  • Introduction to MPI, by Chunfang Chen
  • Introduction to OpenMP, by Adam Zornes

3
Parallel Computing Concepts
  • Definition
  • Types of Parallelism
  • Performance Measures
  • Parallelism Issues

4
Definition
Parallel Computing
  • Computing multiple things simultaneously.
  • Usually means computing different parts of the
    same problem simultaneously.
  • In scientific computing, it often means
    decomposing a domain into more than one
    sub-domain and computing a solution on each
    sub-domain separately and simultaneously (or
    almost separately and simultaneously).
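
As a minimal sketch of the domain decomposition idea above, the serial Fortran program below splits a one-dimensional domain of n grid points into p contiguous sub-domains of nearly equal size. The program name, the values of n and p, and the block-splitting rule are assumptions chosen for illustration; they are not taken from the slides.

```fortran
program decompose
  implicit none
  integer, parameter :: n = 10   ! number of grid points (assumed value)
  integer, parameter :: p = 3    ! number of sub-domains (assumed value)
  integer :: k, base, extra, first, last

  base  = n / p        ! every sub-domain gets at least this many points
  extra = mod(n, p)    ! the first 'extra' sub-domains get one more point

  first = 1
  do k = 0, p - 1
     last = first + base - 1
     if (k < extra) last = last + 1
     print *, 'sub-domain', k, 'owns points', first, 'to', last
     first = last + 1
  end do
end program decompose
```

With these values the sub-domains are points 1 to 4, 5 to 7, and 8 to 10. In a parallel run each processor would compute only on its own range.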

5
Types of Parallelism
  • Perfect (a.k.a. Embarrassing, Trivial)
    Parallelism
    - Monte-Carlo Methods
    - Cellular Automata
  • Data Parallelism
    - Domain Decomposition
    - Dense Matrix Multiplication
  • Task Parallelism
    - Pipelining
    - Monte-Carlo?
    - Cellular Automata?

6
Performance Measures I
  • Peak Performance: Theoretical upper bound on
    performance.
  • Sustained Performance: Highest consistently
    achievable speed.
  • MHz: Million cycles per second.
  • MIPS: Million instructions per second.
  • Mflops: Million floating point operations per
    second.
  • Speedup: Sequential run time divided by parallel
    run time.

7
Performance Measures II
  • Number of Procs: $p$.
  • Sequential Run-time: $T_{seq}$.
  • Parallel Run-time: $T_{par}$.
  • Speedup: $S = T_{seq} / T_{par}$. // Want $S = p$.
  • Efficiency: $E = S / p$. // Want $E = 1$.
  • Cost: $C = p \, T_{par}$. // Want $C = T_{seq}$.
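
The three derived measures above can be worked through with a short Fortran sketch. The processor count and the two timings below are made-up values used only to illustrate the arithmetic.

```fortran
program measures
  implicit none
  integer, parameter :: p = 8       ! number of processors (assumed)
  real, parameter :: tseq = 120.0   ! sequential run time in seconds (assumed)
  real, parameter :: tpar = 20.0    ! parallel run time in seconds (assumed)
  real :: s, e, c

  s = tseq / tpar      ! speedup,    ideally equal to p
  e = s / real(p)      ! efficiency, ideally equal to 1
  c = real(p) * tpar   ! cost,       ideally equal to tseq

  print *, 'speedup    S =', s
  print *, 'efficiency E =', e
  print *, 'cost       C =', c
end program measures
```

With these numbers S = 6, E = 0.75, and C = 160 processor-seconds, so the run falls short of the ideal speedup of 8 and costs more than the 120 seconds of the sequential run.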

8
Parallelism Issues
  • Load Balancing
  • Problem Size
  • Communication
  • Portability
  • Scalability
  • Amdahl's law: For constant problem size, speedup
    approaches a finite limit (the reciprocal of the
    serial fraction of the work), so efficiency goes
    to zero as the number of processors goes to
    infinity.
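
One common way to write Amdahl's law is sketched below. It assumes that a fraction f of the sequential run time cannot be parallelized and that the remainder parallelizes perfectly; the symbol f is introduced here for illustration and does not appear in the slides.

```latex
T_{par}(p) = f\,T_{seq} + \frac{(1-f)\,T_{seq}}{p},
\qquad
S(p) = \frac{T_{seq}}{T_{par}(p)} = \frac{1}{f + \dfrac{1-f}{p}}
\;\xrightarrow[p \to \infty]{}\; \frac{1}{f},
\qquad
E(p) = \frac{S(p)}{p} \;\xrightarrow[p \to \infty]{}\; 0 .
```

For example, with f = 0.1 the speedup can never exceed 10, no matter how many processors are used.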

9
Hardware Characteristics
  • Kinds of Processors
  • Types of Memory Organization
  • Flow of Control
  • Interconnection Networks

10
Kinds of Processors
  • A few very powerful processors.
    - Cray SV1: 8-32 procs, 1.2 Gflops per proc.
  • A whole lot of less powerful processors.
    - Thinking Machines CM-2: 65,536 procs, 7 Mflops
      per proc.
    - ASCI White, IBM SP Power3: 8192 procs, 375 MHz
      per proc.
  • A medium quantity of medium power procs.
    - Beowulf, e.g. Bunyip: 192 x Intel Pentium III/550

11
Types of Memory Organization
  • Distributed Memory
  • Shared Memory
  • Distributed Shared Memory

12
Distributed Memory
13
Shared Memory
(HP Super Dome)
14
Distributed Shared Memory
15
Flow of Control
16
Dynamic Interconnection Networks
  • a.k.a. Indirect networks.
  • Dynamic (Indirect) links between processors and
    memory.
  • Usually used for shared memory computers.

17
Static Interconnection Networks
  • a.k.a. Direct networks.
  • Point-to-point links between processors.
  • Usually for message passing (distributed memory)
    computers.

18
Summary
  • Basic Parallel Computing Concepts
    - Parallel Computing is
    - Perfect Parallelism, Data Parallelism, Task
      Parallelism
    - Peak vs. Sustained Performance, Speedup,
      Efficiency, Cost
    - Load Bal., Communication, Prob. Size,
      Scalability, Amdahl
  • Hardware Characteristics
    - Few Powerful Procs, Many Weaker Procs, Medium
    - Distributed, Shared, and Distributed-Shared
      Memory
    - Flynn's Taxonomy: SISD, SIMD, MISD, MIMD
    - Bus Network, Crossbar Switched Network,
      Multistage
    - Star, Mesh, Hypercube, Tree Networks

19
Links
  • Alliance Web Based Training for HPC -- http://webct.ncsa.uiuc.edu:8900/webct/public/home.pl
  • Kumar, Grama, Gupta, Karypis, Introduction to Parallel computing -- ftp://ftp.cs.umn.edu/dept/users/kumar/book
  • Selected Web Resources for Parallel Computing -- http://www.eecs.umich.edu/qstout/parlinks.html
  • Deep Blue -- http://www.research.ibm.com/deepblue/meet/html/d.3.html
  • Current Trends in Supercomputers and Scientific Computing -- http://www.jics.utk.edu/COLLABOR_INST/MMC/
  • Writing A Task or Pipeline Parallel Program -- http://www.epcc.ed.ac.uk/direct/VISWS/CINECA/tsld041.htm
  • HP Technical Documentation -- http://docs.hp.com
  • Linux Parallel Processing HOWTO -- http://aggregate.org/PPLINUX/19980105/pphowto.html
  • Introduction to Parallel Processing -- http://www.jics.utk.edu/I2PP/I2PPhtml/
  • Message Passing Interface MPI for users -- http://www.npac.syr.edu/users/gcf/cps615mpi95/index.html
  • Intro to Parallel Computing I -- http://archive.ncsa.uiuc.edu/Alliances/Exemplar/Training/NCSAMaterials/IntroParallel_I/index.htm
  • Thinking Machines CM-2 -- http://www.svisions.com/sv/cm-dv.html
  • The Beowulf Project -- http://www.beowulf.org
  • Bunyip (Beowulf) Project -- http://tux.anu.edu.au/Projects/Beowulf/
  • Robust Monte Carlo Methods for Light Transport Simulation -- http://graphics.stanford.edu/papers/veach_thesis/
  • An Introduction to Parallel Computing -- http://www.pcc.qub.ac.uk/tec/courses/intro/ohp/intro-ohp.html
  • Supercomputing, Parallel Processors and High Performance Computing -- http://www.compinfo-center.com/tpsupr-t.htm
  • Internet Parallel Computing Archive -- http://wotug.ukc.ac.uk/parallel/
  • IEEE Computer Society's ParaScope, A Listing of Parallel Computing Sites -- http://computer.org/parascope/
  • High Performance Computing (HPC) Wire -- http://www.tgc.com/HPCwire.html
  • KAOS Laboratory, University of Kentucky -- http://aggregate.org/KAOS/
  • Notes on Parallel Computer Architecture -- http://www.npac.syr.edu/nse/hpccsurvey/architecture/index.html
  • Nan's Parallel Computing Page -- http://www.cs.rit.edu/ncs/parallel.html
  • High Performance Computing Photos -- http://cs.calvin.edu/CS/parallel/resources/photos/
  • Parallel Networking Topologies -- http://www.cs.rit.edu/icss571/parallelwrl/cgframe.html
  • What is mixed parallelism? -- http://www.ens-lyon.fr/fsuter/pages/mixedpar.html

20
Introduction to MPI
21
Outline
  • Introduction to Parallel Computing, by Danny
    Thorne
  • Introduction to MPI, by Chunfang Chen
    - Writing MPI
    - Compiling and linking MPI programs
    - Running MPI programs
  • Introduction to OpenMP, by Adam Zornes

22
Writing MPI Programs
  • All MPI programs must include a header file:
    mpi.h in C, mpif.h in Fortran.
  • All MPI programs must call MPI_INIT as the first
    MPI call. This establishes the MPI environment.
  • All MPI programs must call MPI_FINALIZE as the
    last MPI call. This exits MPI.

23
Program - Welcome to MPI
    program Welcome
    include 'mpif.h'
    integer ierr
    call MPI_INIT(ierr)
    print *, 'Welcome to MPI'
    call MPI_FINALIZE(ierr)
    end
24
Commentary
  • Only one invocation of MPI_INIT can occur in each
    program
  • In Fortran, its only argument is an error code
    (integer)
  • MPI_FINALIZE terminates the MPI environment (no
    calls to MPI can be made after MPI_FINALIZE is
    called)
  • All non-MPI routines are local, i.e.
    print *, 'Welcome to MPI' runs on each processor

25
Compiling MPI programs
  • In many MPI implementations, the program can be
    compiled as
  • mpif90 -o executable program.f
  • mpicc -o executable program.c
  • mpif90 and mpicc transparently set the include
    paths and links to appropriate libraries

26
Compiling MPI Programs
  • mpif90 and mpicc can be used to compile small
    programs
  • For larger programs, it is ideal to make use of a
    makefile

27
Running MPI Programs
  • mpirun -np 2 executable
    - mpirun indicates that you are using the
      MPI environment.
    - -np gives the number of processors you would
      like to use (two in the present case)

28
Sample Output
  • Sample output when run on 2 processors will be
    Welcome to MPI
    Welcome to MPI
  • Since print *, 'Welcome to MPI' is a local
    statement, every processor executes it.

29
Finding More about Parallel Environment
  • The primary questions asked in a parallel program
    are
    - How many processors are there?
    - Who am I?
  • "How many?" is answered by MPI_COMM_SIZE
  • "Who am I?" is answered by MPI_COMM_RANK

30
How Many?
  • Call MPI_COMM_SIZE(mpi_comm_world, size, ierr)
    - mpi_comm_world is the communicator
    - a communicator contains a group of processors
    - size returns the total number of processors
    - declared as: integer size

31
Who am I?
  • The processors are ordered in the group
    consecutively from 0 to size-1; this number is
    known as the rank
  • Call MPI_COMM_RANK(mpi_comm_world, rank, ierr)
    - mpi_comm_world is the communicator
    - declared as: integer rank
    - for size = 4, ranks are 0, 1, 2, 3

32
Communicator
  • MPI_COMM_WORLD
    (Figure: processors with ranks 0, 1, 2, 3 inside
    the communicator)

33
Program - Welcome to MPI
    program Welcome
    include 'mpif.h'
    integer size, rank, ierr
    call MPI_INIT(ierr)
    call MPI_COMM_SIZE(mpi_comm_world, size, ierr)
    call MPI_COMM_RANK(mpi_comm_world, rank, ierr)
    print *, 'my rank is', rank, 'Welcome to MPI'
    call MPI_FINALIZE(ierr)
    end

34
Sample Output
  • Sdx1 28 mpif90 welcome.f90
  • /usr/ccs/bin/ld(warning) At least one PA2.0
    object file (welcome.o) was detected. The linked
    output may not run on a PA 1.x system.
  • Sdx1 29 mpirun -np 4 a.out
  • my rank is 2 Welcome to MPI
  • my rank is 0 Welcome to MPI
  • my rank is 1 Welcome to MPI
  • my rank is 3 Welcome to MPI

35
Sending and Receiving Messages
  • Communication between processors involves
    - identifying the sender and the receiver
    - the type and amount of data that is being
      sent
    - how the receiver is identified

36
Communication
  • Point to point communication
    - affects exactly two processors
  • Collective communication
    - affects a group of processors in the
      communicator

37
Point to point Communication
  • MPI_COMM_WORLD
    (Figure: processors with ranks 0, 1, 2, 3 inside
    the communicator)

38
Point to Point Communication
  • Communication between two processors
  • source processor sends message to destination
    processor
  • destination processor receives the message
  • communication takes place within a communicator
  • destination processor is identified by its rank
    in the communicator

39
Communication mode
  • Synchronous send (MPI_SSEND) -- only completes
    when the receive has completed
  • Buffered send (MPI_BSEND) -- always completes
    (unless an error occurs), irrespective of the
    receiver
  • Standard send (MPI_SEND) -- message sent
    (receive state unknown)
  • Receive (MPI_RECV) -- completes when a message
    has arrived
40
Standard Send
  • Call MPI_SEND(buf,count,datatype,dest,tag,comm,ierr)
    - buf is the name of the array/variable to be
      sent
    - count is the number of elements to be sent
    - datatype is the type of the data
    - dest is the rank of the destination processor
    - tag is an arbitrary number which can be used
      to distinguish among messages
    - comm is the communicator (mpi_comm_world)

41
MPI Receive
  • Call MPI_RECV(buf,count,datatype,source,tag,comm,
    status,ierr)
    - source is the rank of the processor from
      which data will be accepted (this can be the
      rank of a specific processor or a wild card,
      MPI_ANY_SOURCE)
    - tag is an arbitrary number which can be used
      to distinguish among messages (this can be a
      wild card, MPI_ANY_TAG)
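
The wild cards described above can be tried with a short sketch in which rank 0 accepts one message from every other rank in arrival order and reads the actual sender and tag from the status array. The program name, the payload values, and the choice of using the sender's rank as the tag are assumptions made for illustration; the sketch also assumes an MPI installation whose mpif.h can be included from free-form source.

```fortran
program anysource
  implicit none
  include 'mpif.h'
  integer :: size, rank, ierr, i, payload
  integer :: status(MPI_STATUS_SIZE)

  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  if (rank == 0) then
     ! Rank 0 accepts one message from each other rank, in whatever
     ! order the messages happen to arrive.
     do i = 1, size - 1
        call MPI_RECV(payload, 1, MPI_INTEGER, MPI_ANY_SOURCE, &
                      MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr)
        ! The status array records who actually sent the message.
        print *, 'received', payload, 'from rank', status(MPI_SOURCE), &
                 'with tag', status(MPI_TAG)
     end do
  else
     payload = 100 * rank      ! arbitrary data to send (assumed)
     call MPI_SEND(payload, 1, MPI_INTEGER, 0, rank, MPI_COMM_WORLD, ierr)
  end if

  call MPI_FINALIZE(ierr)
end program anysource
```

The order of the lines printed by rank 0 can change from run to run, which is exactly what accepting from MPI_ANY_SOURCE allows.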

42
Basic data type (Fortran)
  • MPI_INTEGER -- Integer
  • MPI_REAL -- Real
  • MPI_DOUBLE_PRECISION -- Double Precision
  • MPI_COMPLEX -- Complex
  • MPI_LOGICAL -- Logical
  • MPI_CHARACTER -- Character

43
Sample Code with Send/Receive
    include 'mpif.h'
    ! Run on 2 processors
    integer size, rank, ierr, tag, status(MPI_STATUS_SIZE)
    character(14) message
    call MPI_INIT(ierr)
    call MPI_COMM_SIZE(mpi_comm_world, size, ierr)
    call MPI_COMM_RANK(mpi_comm_world, rank, ierr)
    tag = 7
    if (rank.eq.0) then

44
Sample Code with Send/Receive (cont.)
      message = 'Welcome to MPI'
      call MPI_SEND(message, 14, MPI_CHARACTER, 1, tag, &
                    mpi_comm_world, ierr)
    else
      call MPI_RECV(message, 14, MPI_CHARACTER, MPI_ANY_SOURCE, tag, &
                    mpi_comm_world, status, ierr)
      print *, 'my rank is ', rank, 'Message is ', message
    endif
    call MPI_FINALIZE(ierr)
    end

45
Sample Output
  • Sdx1 30 mpif90 sendrecv.f90
  • /usr/ccs/bin/ld(warning) At least one PA2.0
    object file (sendrecv.o) was detected. The linked
    output may not run on a PA 1.x system.
  • Sdx1 31 mpirun -np 2 a.out
  • my rank is 1 Message is Welcome to MPI

46
Collective Communication
  • MPI_COMM_WORLD
    (Figure: processors with ranks 0, 1, 2, 3 inside
    the communicator)

47
Collective Communication
  • Will not interfere with point-to-point
    communication and vice-versa
  • All processors must call the collective routine
  • Synchronization not guaranteed (except for
    barrier)
  • no tags
  • receive buffer must be exactly the right size

48
Collective Routines
  • MPI_BCAST
  • MPI_REDUCE

49
Collective Routine: MPI_BCAST
  • call MPI_BCAST(buffer,count,datatype,source,comm,ierr)
    - buffer is the name of the array/variable to be
      broadcast
    - count is the number of elements to be sent
    - datatype is the type of the data
    - source is the rank of the processor from which
      data will be sent
    - comm is the communicator (mpi_comm_world)

50
Sample code using MPI_BCAST
    include 'mpif.h'
    integer size, rank, ierr
    real para
    call MPI_INIT(ierr)
    call MPI_COMM_SIZE(mpi_comm_world, size, ierr)
    call MPI_COMM_RANK(mpi_comm_world, rank, ierr)
    if (rank.eq.3) para = 23.0
    call MPI_BCAST(para, 1, MPI_REAL, 3, MPI_COMM_WORLD, ierr)

51
Sample code (cont.)
    print *, 'my rank is ', rank, ' after broadcast para is ', para
    call MPI_FINALIZE(ierr)
    end

52
Sample Output
  • Sdx1 32 mpif90 bcast.f90
  • /usr/ccs/bin/ld(warning) At least one PA2.0
    object file (bcast.o) was detected. The linked
    output may not run on a PA 1.x system.
  • Sdx1 33 mpirun -np 4 a.out
  • my rank is 3 after broadcast para is 23.0
  • my rank is 2 after broadcast para is 23.0
  • my rank is 0 after broadcast para is 23.0
  • my rank is 1 after broadcast para is 23.0

53
Collective Routine: MPI_REDUCE
  • call MPI_REDUCE(sendbuffer,recvbuffer,count,datatype,
    op,root,comm,ierr)
    - sendbuffer is the buffer/array to be sent
    - recvbuffer is the receiving buffer/array
    - count is the number of elements in the send
      buffer
    - datatype is the type of the data
    - op is the collective operation
    - root is the rank of the destination
    - comm is the communicator

54
Collective Operation
  • MPI_MAX -- maximum
  • MPI_MIN -- minimum
  • MPI_SUM -- sum
  • MPI_PROD -- product
  • MPI_MAXLOC -- maximum and location
  • MPI_MINLOC -- minimum and location
  • MPI_LOR -- logical OR
  • MPI_LXOR -- logical exclusive OR
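
As a small illustration of one of these operations, the sketch below sums one integer per rank onto rank 0 with MPI_SUM. The program name and the per-rank data (rank + 1) are assumptions made for the example; it also assumes an MPI installation whose mpif.h can be included from free-form source.

```fortran
program sumreduce
  implicit none
  include 'mpif.h'
  integer :: size, rank, ierr
  integer :: local, total

  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  local = rank + 1     ! each rank contributes one integer (assumed data)

  ! Combine the contributions with MPI_SUM; only rank 0 gets the result.
  call MPI_REDUCE(local, total, 1, MPI_INTEGER, MPI_SUM, 0, &
                  MPI_COMM_WORLD, ierr)

  if (rank == 0) print *, 'sum over', size, 'ranks is', total

  call MPI_FINALIZE(ierr)
end program sumreduce
```

Run on 4 processors, rank 0 should report a sum of 1 + 2 + 3 + 4 = 10. Swapping MPI_SUM for MPI_MAX or MPI_PROD changes only the operation argument.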

55
Sample code using MPI_REDUCE
    include 'mpif.h'
    integer size, rank, ierr
    integer in(2), out(2)
    call MPI_INIT(ierr)
    call MPI_COMM_SIZE(mpi_comm_world, size, ierr)
    call MPI_COMM_RANK(mpi_comm_world, rank, ierr)
    in(1) = rank + 1
    in(2) = rank

56
Sample code (cont.)
    ! Run on 8 processors (the roots are ranks 7 and 2)
    call MPI_REDUCE(in, out, 1, MPI_2INTEGER, MPI_MAXLOC, &
                    7, MPI_COMM_WORLD, ierr)
    if (rank.eq.7) print *, 'my rank is ', rank, ' max=', out(1), &
                            ' at rank ', out(2)
    call MPI_REDUCE(in, out, 1, MPI_2INTEGER, MPI_MINLOC, &
                    2, MPI_COMM_WORLD, ierr)
    if (rank.eq.2) print *, 'my rank is ', rank, ' min=', out(1), &
                            ' at rank ', out(2)
    call MPI_FINALIZE(ierr)
    end

57
Sample Output
  • Sdx1 36 mpif90 bcast.f90
  • /usr/ccs/bin/ld(warning) At least one PA2.0
    object file (bcast.o) was detected. The linked
    output may not run on a PA 1.x system.
  • Sdx1 37 mpirun -np 8 a.out
  • my rank is 7 max=8 at rank 7
  • my rank is 2 min=1 at rank 0

58
Basic Routines in MPI
  • Using the following MPI routines, many parallel
    programs can be written
    - MPI_INIT
    - MPI_COMM_SIZE
    - MPI_COMM_RANK
    - MPI_SEND
    - MPI_RECV
    - MPI_BCAST
    - MPI_REDUCE
    - MPI_FINALIZE

59
Resources
  • Online resources
    - http://www-unix.mcs.anl.gov/mpi
    - http://www.erc.msstate.edu/mpi
    - http://www.epm.ornl.gov/walker/mpi
    - http://www.epcc.ed.ac.uk/mpi
    - http://www.mcs.anl.gov/mpi/mpi-report-1.1/mpi-report.html
    - ftp://www.mcs.anl.gov/pub/mpi/mpi-report.html

60
OpenMP and You
61
Outline
  • Introduction to Parallel Computing, by Danny
    Thorne
  • Introduction to MPI, by Chunfang Chen
  • Introduction to OpenMP, by Adam Zornes
    - What is OpenMP
    - A Brief History
    - Nuts and Bolts
    - Example(s?)

62
What is OpenMP
  • OpenMP is a portable, multiprocessing API for
    shared memory computers
  • OpenMP is not a language
  • Instead, OpenMP specifies a set of compiler
    directives, library routines, and environment
    variables that extend an existing language
    (Fortran, C) for parallel programming on a
    shared memory machine

63
Why is OpenMP Popular?
  • No message passing
  • OpenMP directives or library calls may be
    incorporated incrementally.
  • The code is in effect a serial code.
  • Code size increase is generally smaller.
  • OpenMP-enabled codes tend to be more readable
  • Vendor involvement

64
History of OpenMP
  • Emergence of shared memory computers with
    proprietary directive driven programming
    environments in the mid-80s
  • In 1996 a group formed to create an industry
    standard
  • They called themselves...

65
History of OpenMP
  • The ARB (OpenMP Architecture Review Board)
  • Group of corporations, research groups, and
    universities
  • Original members were ASCI, DEC, HP, IBM,
    Intel, KAI, SGI
  • Has permanent and auxiliary members
  • Meet by phone and email to interpret the
    standards, answer questions, develop new
    specifications, and create publicity

66
What Did They Create?
  • OpenMP consists of three main parts
    - Compiler directives, used by the programmer to
      communicate with the compiler
    - A runtime library, which enables the setting and
      querying of parallel parameters
    - Environment variables, which can be used to
      define a limited number of runtime system
      parallel parameters

67
The Basic Idea
  • The code starts with one master thread
  • When a parallel task needs to be performed,
    additional threads are spawned
  • When the parallel tasks are finished, the
    additional threads are released
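
The fork and join steps above can be seen in a minimal sketch. It assumes a compiler that provides the omp_lib module and is run with OpenMP enabled (for example gfortran with -fopenmp); the program name is made up for the example.

```fortran
program forkjoin
  use omp_lib
  implicit none

  print *, 'serial part: only the master thread runs here'

  ! Fork: a team of threads executes the enclosed region.
  !$omp parallel
  print *, 'hello from thread', omp_get_thread_num(), &
           'of', omp_get_num_threads()
  !$omp end parallel
  ! Join: the extra threads are released and only the master continues.

  print *, 'serial part again'
end program forkjoin
```

The number of hello lines follows the thread count, which can be set through the OMP_NUM_THREADS environment variable, and their order is not fixed.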

68
The Basic Idea
69
The Illustrious OpenMP Directives
  • Control Structures: what is parallel and what
    is serial
  • Work Sharing: who does what
  • Synchronization: bring everything back together
  • Data Scope Attributes (clauses): who can use
    what and when and where
  • Orphaning: alone but not necessarily lost
70
Regions or Loops, Which is Right for You?
  • Two ways to parallelize: parallel loops
    (fine-grained) and parallel regions
    (coarse-grained)
  • Loops can be do, while, for, etc.
  • Parallel regions cut down on overhead, but
    require more complex programming (e.g. what
    happens to a thread not in use?)

71
Work Sharing Constructs
A work sharing construct divides the execution
of the enclosed code region among the participating
threads
  • The DO directive
  • The SECTIONS directive
  • The SINGLE directive

    !$omp parallel do
    do i = 1, n
       a(i) = b(i) + c(i)
    enddo

    !$omp parallel
    !$omp sections
    !$omp section
    call init_field(field)
    !$omp section
    call check_grid(grid)
    !$omp end sections
    !$omp single
    call do_some_work(a(1))
    !$omp end single
    !$omp end parallel
72
Synchronization: Getting it Together
Synchronization directives provide for process
synchronization and mutual exclusion
  • The MASTER directive
  • The BARRIER directive
  • The CRITICAL directive
  • The ORDERED directive
  • The ATOMIC directive
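
Two of these directives, ATOMIC and CRITICAL, are contrasted in the sketch below, where every loop iteration updates two shared counters. The program and variable names are made up for the example.

```fortran
program counters
  implicit none
  integer :: i, hits, total

  hits  = 0
  total = 0

  !$omp parallel do
  do i = 1, 1000
     ! ATOMIC protects a single update of one memory location.
     !$omp atomic
     hits = hits + 1

     ! CRITICAL lets only one thread at a time run the enclosed block.
     !$omp critical
     total = total + i
     !$omp end critical
  end do
  !$omp end parallel do

  print *, 'hits  =', hits     ! 1000
  print *, 'total =', total    ! 500500
end program counters
```

Without the two directives the shared updates could race and the printed values could come out too small. For a plain sum like this a REDUCTION clause is usually the cheaper choice.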

73
Data Scoping Directives
Clauses qualify and scope the variables in a
block of code
  • PRIVATE
  • SHARED
  • DEFAULT (PRIVATE | SHARED | NONE)
  • FIRSTPRIVATE
  • LASTPRIVATE
  • COPYIN
  • REDUCTION
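
A sketch of how a few of these clauses fit together on one loop is given below. The array contents and all names are assumptions made for the example.

```fortran
program scoping
  implicit none
  integer, parameter :: n = 100
  integer :: i
  real :: a(n), tmp, total

  do i = 1, n
     a(i) = real(i)
  end do

  total = 0.0
  ! a is SHARED: all threads read the same array.
  ! tmp is PRIVATE: each thread has its own scratch copy.
  ! total is a REDUCTION variable: per-thread partial sums are
  ! added together at the end of the loop.
  !$omp parallel do default(none) shared(a) private(tmp) reduction(+:total)
  do i = 1, n
     tmp   = 2.0 * a(i)
     total = total + tmp
  end do
  !$omp end parallel do

  print *, 'total =', total   ! 2 * (1 + ... + 100) = 10100
end program scoping
```

DEFAULT(NONE) is used here so that the compiler reports any variable whose scope was forgotten; the loop index i is private automatically.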

74
Orphaning
  • Directives that do not appear in the lexical
    extent of the parallel construct, but lie in the
    dynamic extent are called orphaned directives.
  • Directives in routines called from within
    parallel constructs.
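
A minimal sketch of an orphaned directive follows: the parallel construct is in the main program, while the DO directive sits in a subroutine called from inside it. The routine name fill and the data are hypothetical.

```fortran
program orphan_demo
  implicit none
  integer, parameter :: n = 8
  real :: a(n)

  ! The parallel construct is here ...
  !$omp parallel shared(a)
  call fill(a, n)
  !$omp end parallel

  print *, a
end program orphan_demo

subroutine fill(a, n)
  implicit none
  integer, intent(in) :: n
  real, intent(out) :: a(n)
  integer :: i

  ! ... but this DO directive sits in a separate routine, outside the
  ! lexical extent of the parallel construct: it is an orphaned directive.
  ! Called from inside the parallel region, it still shares the loop
  ! iterations among the threads of the team.
  !$omp do
  do i = 1, n
     a(i) = real(i) ** 2
  end do
  !$omp end do
end subroutine fill
```

If fill were called from serial code instead, the same loop would simply run on one thread.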

75
Runtime Library Routines
  • OMP_SET_NUM_THREADS (int)
  • OMP_GET_NUM_THREADS( )
  • OMP_GET_MAX_THREADS( )
  • OMP_GET_THREAD_NUM( )
  • OMP_GET_NUM_PROCS( )
  • OMP_IN_PARALLEL( )
  • OMP_SET_DYNAMIC(bool)

76
The DREADed LOCKS
  • OMP_INIT_LOCK(var)
  • OMP_DESTROY_LOCK(var)
  • OMP_SET_LOCK(var)
  • OMP_UNSET_LOCK(var)
  • OMP_TEST_LOCK(var)
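
The typical life cycle of a lock with these routines can be sketched as follows, assuming a compiler that provides the omp_lib module. The program and variable names are made up for the example.

```fortran
program lockdemo
  use omp_lib
  implicit none
  integer(kind=omp_lock_kind) :: lck
  integer :: id, done

  done = 0
  call omp_init_lock(lck)          ! create the lock before first use

  !$omp parallel private(id) shared(lck, done)
  id = omp_get_thread_num()

  call omp_set_lock(lck)           ! wait until the lock is free, then take it
  done = done + 1                  ! only one thread at a time gets here
  print *, 'thread', id, 'holds the lock'
  call omp_unset_lock(lck)         ! release it for the next thread
  !$omp end parallel

  call omp_destroy_lock(lck)       ! free the lock when finished
  print *, 'threads that took the lock:', done
end program lockdemo
```

OMP_TEST_LOCK is the non-blocking variant of OMP_SET_LOCK: it returns .true. if it obtained the lock and .false. otherwise, so a thread can do other work instead of waiting.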

77
Environment Variables
  • OMP_SCHEDULE
  • OMP_NUM_THREADS
  • OMP_DYNAMIC
  • OMP_NESTED

78
The Example(s?)
79
The Example(s?) cont.
80
The Example(s?) cont.
81
The Example(s?) cont.
82
The Example(s?) cont.
83
The Requisite Links Page
  • http://www.cs.gsu.edu/cscyip/csc4310/
  • http://www.openmp.org/
  • http://webct.ncsa.uiuc.edu:8900/webct/public/show_courses.pl
  • http://oscinfo.osc.edu/training/openmp/big/fsld.001.html
  • http://www.ccs.uky.edu/douglas

And the audience wakes up, then stumbles out of
the room...