Introduction to OpenMP - PowerPoint PPT Presentation

Slides: 30
Provided by: quinno
1
Introduction to OpenMP
  • For a more detailed tutorial see
  • http://www.openmp.org
  • Look at the presentations

2
Concepts
  • Directive based programming
  • declare properties of language structures
    (sections, loops)
  • scope variables
  • A few service routines
  • get information
  • Compiler options
  • Environment variables

3
OpenMP Programming Model
  • fork-join parallelism
  • Master thread spawns a team of threads as needed.

4
Typical OpenMP Use
  • Generally used to parallelize loops
  • Find most time consuming loops
  • Split iterations up between threads

Parallel version:

    int main() {
        double Res[1000];
        #pragma omp parallel for
        for (int i = 0; i < 1000; i++)
            do_huge_comp(Res[i]);
    }

Serial version:

    int main() {
        double Res[1000];
        for (int i = 0; i < 1000; i++)
            do_huge_comp(Res[i]);
    }

5
Thread Interaction
  • OpenMP operates using shared memory
  • Threads communicate via shared variables
  • Unintended sharing can lead to race conditions
  • output changes due to thread scheduling
  • Control race conditions using synchronization
  • synchronization is expensive
  • change the way data is stored to minimize the
    need for synchronization
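
The last point can be sketched as follows. This is a minimal illustration, not code from the slides: the function name count_even is hypothetical. Each thread accumulates into its own local counter and synchronizes only once, when it adds that counter to the shared total.

```c
#include <stddef.h>

/* Hypothetical example: count the even values in data[0..n-1]. */
long count_even(const int *data, size_t n)
{
    long total = 0;                  /* shared by all threads */

    #pragma omp parallel
    {
        long local = 0;              /* declared inside the region: one copy per thread */

        #pragma omp for
        for (long i = 0; i < (long)n; i++)
            if (data[i] % 2 == 0)
                local++;             /* no synchronization needed here */

        #pragma omp critical
        total += local;              /* one synchronized update per thread */
    }
    return total;
}
```

Incrementing total directly inside the loop would need a critical section on every iteration; storing partial counts per thread reduces that to one synchronized update per thread.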

6
Syntax format
  • Compiler directives
  • C/C++
  • #pragma omp construct [clause [clause] ...]
  • Fortran
  • C$OMP construct [clause [clause] ...]
  • !$OMP construct [clause [clause] ...]
  • *$OMP construct [clause [clause] ...]
  • Since we use directives, no changes need to be
    made to a program for a compiler that doesn't
    support OpenMP
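
A small sketch of what this means in practice. The function names openmp_enabled and scale are hypothetical; the _OPENMP macro is defined by the compiler only when OpenMP support is switched on. A compiler without OpenMP ignores the pragma and runs the loop serially, with the same result.

```c
/* Hypothetical helper: reports whether this file was compiled with OpenMP. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;      /* serial build: the directives below are simply ignored */
#endif
}

/* Multiply every element of v by factor. */
void scale(double *v, int n, double factor)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        v[i] *= factor;
}
```

Calls to the OpenMP runtime routines are ordinary function calls, not directives, so they are the one thing that would need an #ifdef _OPENMP guard in a portable program.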

7
Using OpenMP
  • Compilers can automatically place directives with
    option
  • -qsmp=auto
  • xlf_r and xlc do a good job
  • some loops may speed up, some may slow down
  • Compiler option required when you write in
    directives
  • -qsmp=omp (IBM)
  • -mp (SGI)
  • Can mix directives with automatic parallelization
  • -qsmp=auto:omp
  • Scoping variables is the hard part!
  • shared variables, thread private variables

8
OpenMP Directives
  • 5 categories
  • Parallel Regions
  • Worksharing
  • Data Environment
  • Synchronization
  • Runtime functions / environment variables
  • Basically the same between C/C++ and Fortran

9
Parallel Regions
  • Create threads with omp parallel
  • Threads share A (default behavior)
  • Threads all start at same time then synchronize
    at a barrier at the end to continue with code.

    double A[1000];
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();
        do_something(ID, A);
    }
10
Sections construct
  • The sections construct gives a different
    structured block to each thread
  • By default there is a barrier at the end. Use the
    nowait clause to turn off.

    #pragma omp parallel
    #pragma omp sections
    {
        X_calculation();
    #pragma omp section
        y_calculation();
    #pragma omp section
        z_calculation();
    }
11
Work-sharing constructs
  • the for construct splits up loop iterations
  • By default, there is a barrier at the end of the
    omp for. Use the nowait clause to turn off
    the barrier.

    #pragma omp parallel
    #pragma omp for
    for (I = 0; I < N; I++) {
        NEAT_STUFF(I);
    }
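
The nowait clause can be sketched like this, assuming a hypothetical function fill_both with two loops that do not depend on each other. Threads that finish their share of the first loop start on the second without waiting at a barrier.

```c
/* Hypothetical example: two independent loops inside one parallel region. */
void fill_both(double *a, double *b, int n)
{
    #pragma omp parallel
    {
        /* nowait removes the barrier at the end of this loop */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * i;

        /* safe only because this loop never reads a[] */
        #pragma omp for
        for (int i = 0; i < n; i++)
            b[i] = i + 1.0;
    }
}
```

If the second loop read values written by the first, the barrier would have to stay.
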
12
Short-hand notation
  • Can combine parallel and work sharing constructs
  • There is also a parallel sections construct

    #pragma omp parallel for
    for (I = 0; I < N; I++) {
        NEAT_STUFF(I);
    }
13
A Rule
  • In order to be made parallel, a loop must have
    canonical shape

    for (index = start; index OP end; UPDATE)

where OP is one of   <   <=   >=   >
and UPDATE is one of
    index++          ++index          index--          --index
    index += inc     index -= inc
    index = index + inc     index = inc + index     index = index - inc
14
An example
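    #pragma omp parallel for private(j)
    for (i = 0; i < BLOCK_SIZE(id,p,n); i++)
        for (j = 0; j < n; j++)
            a[i][j] = MIN(a[i][j], a[i][k] + tmp[j]);

By definition, private variable values are undefined at loop entry and exit. To change this behavior, you can use the firstprivate(var) and lastprivate(var) clauses: firstprivate initializes each thread's copy with the value the variable had before the loop, and lastprivate copies the value from the sequentially last iteration back to the original variable.

    x[0] = complex_function();
    #pragma omp parallel for private(j) firstprivate(x)
    for (i = 0; i < n; i++) {
        for (j = 1; j < m; j++)
            x[j] = g(i, x[j-1]);
        answer[i] = x[j] ... x[i];   /* operator not recoverable */
    }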
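
A compact sketch of both clauses together. The function last_shifted_square and its parameters are hypothetical, chosen only to make the behavior easy to check: offset enters the loop with the caller's value, and sq leaves the loop with the value from the final iteration.

```c
/* Hypothetical example: returns (n-1)*(n-1) + offset for n > 0. */
int last_shifted_square(int n, int offset)
{
    int i;
    int sq = 0;   /* placeholder; the result is only meaningful for n > 0 */

    /* firstprivate: each thread's offset starts as the caller's value.
       lastprivate:  sq from iteration i == n-1 is copied back out. */
    #pragma omp parallel for firstprivate(offset) lastprivate(sq)
    for (i = 0; i < n; i++)
        sq = i * i + offset;

    return sq;
}
```

With a plain private(sq) clause the value of sq after the loop would be undefined.
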
15
Scheduling Iterations
  • The schedule clause affects how loop iterations
    are mapped onto threads
  • schedule(static [,chunk])
  • Deal out blocks of iterations of size chunk to
    each thread.
  • schedule(dynamic [,chunk])
  • Each thread grabs chunk iterations off a queue
    until all iterations have been handled.
  • schedule(guided [,chunk])
  • Threads dynamically grab blocks of iterations.
    The size of the block starts large and shrinks
    down to size chunk as the calculation proceeds.
  • schedule(runtime)
  • Schedule and chunk size taken from the
    OMP_SCHEDULE environment variable.
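
As an illustration of why the schedule matters, here is a sketch with a deliberately unbalanced loop. The function prefix_totals and the chunk size 4 are assumptions made for this example. Iteration i does work proportional to i, so equal-sized static blocks would leave the thread holding the last block with most of the work.

```c
/* Hypothetical example: out[i] = 0 + 1 + ... + i, computed the slow way
   so that later iterations cost more than earlier ones. */
void prefix_totals(long *out, int n)
{
    int i, j;

    /* dynamic: idle threads fetch the next chunk of 4 iterations from a queue */
    #pragma omp parallel for private(j) schedule(dynamic, 4)
    for (i = 0; i < n; i++) {
        long s = 0;
        for (j = 0; j <= i; j++)
            s += j;
        out[i] = s;
    }
}
```

Replacing the clause with schedule(runtime) would defer the choice to the OMP_SCHEDULE environment variable, for example a value of dynamic,4, so different schedules can be tried without recompiling.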

16
An example
    #pragma omp parallel for private(j) schedule(static, 2)
    for (i = 0; i < n; i++)
        for (j = 0; j < m; j++)
            x[j][j] = g(i, x[j-1]);

You can tune the chunk size to address load-balancing issues, etc.
17
Scheduling considerations
  • Dynamic is most general and provides load
    balancing
  • If choice of scheduling has (big) impact on
    performance, something is wrong
  • overhead too big => work in loop too small
  • n (the chunk size) can be a specification expression, not just
    a constant

18
Reductions
  • Sometimes you want each thread to calculate part
    of a value then collapse all that into a single
    value
  • Done with reduction clause

    area = 0.0;
    #pragma omp parallel for private(x) reduction(+:area)
    for (i = 0; i < n; i++) {
        x = (i + 0.5)/n;
        area += 4.0/(1.0 + x*x);
    }
    pi = area / n;
19
Fortran Parallel Directives
  • PARALLEL / END PARALLEL
  • PARALLEL SECTIONS / SECTION / SECTION / END
    PARALLEL SECTIONS
  • DO / END DO
  • work sharing directive for DO loop immediately
    following
  • PARALLEL DO / END PARALLEL DO
  • combined parallel region and work sharing

20
Serial Directives
  • MASTER / END MASTER
  • executed by master thread only
  • DO SERIAL / END DO SERIAL
  • loop immediately following should not be
    parallelized
  • useful with -qsmp=omp:auto

21
Synchronization Directives
  • BARRIER
  • inside PARALLEL, all threads synchronize
  • CRITICAL (lock) / END CRITICAL (lock)
  • section that can be executed by only one thread
    at a time
  • lock is an optional name to distinguish several
    critical constructs from each other
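
The optional name can be sketched in C as follows. The function count_signs and the names neg_lock and pos_lock are hypothetical. Because the two critical sections have different names, a thread updating one counter does not block a thread updating the other.

```c
/* Hypothetical example: count negative and positive entries of v[0..n-1]. */
void count_signs(const int *v, int n, int *neg, int *pos)
{
    int i;
    *neg = 0;
    *pos = 0;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        if (v[i] < 0) {
            #pragma omp critical (neg_lock)
            (*neg)++;
        } else if (v[i] > 0) {
            #pragma omp critical (pos_lock)
            (*pos)++;
        }
    }
}
```

All unnamed critical sections share a single lock, so giving unrelated sections distinct names avoids unnecessary waiting. For plain counters like these, a reduction would normally be cheaper; the sketch only illustrates the naming.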

22
An example
    double area, pi, x;
    int i, n;
    area = 0.0;
    #pragma omp parallel for private(x)
    for (i = 0; i < n; i++) {
        x = (i + 0.5)/n;
    #pragma omp critical
        area += 4.0/(1.0 + x*x);
    }
    pi = area / n;
23
Scope Rules
  • Shared memory programming model
  • most variables are shared by default
  • Global variables are shared
  • But not everything is shared
  • stack variables in functions are private
  • variable set and then used in DO is PRIVATE
  • array whose subscript is constant w.r.t. PARALLEL
    DO and is set and then used within the DO is
    PRIVATE
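
The C side of these rules can be sketched as below. All names (FACTOR, scaled, scale_all) are hypothetical and serve only to show where each kind of variable lives.

```c
static const double FACTOR = 0.5;   /* global: shared, read by every thread */

/* Locals of a function called from a parallel region live on the calling
   thread's stack, so every thread has its own tmp. */
static double scaled(double v)
{
    double tmp = v * FACTOR;
    return tmp;
}

void scale_all(double *data, int n)
{
    int i;   /* loop index of the parallel for: made private automatically */

    #pragma omp parallel for
    for (i = 0; i < n; i++)
        data[i] = scaled(data[i]);  /* data is shared; each element is written once */
}
```

No clause is needed here because the default rules already give the right scoping; the explicit clauses become necessary when a variable declared outside the region is used as per-thread scratch space.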

24
Scope Clauses
  • The DO and for directives have extra clauses; the most
    important are
  • PRIVATE (variable list)
  • REDUCTION (op : variable list)
  • op is sum, min, max
  • variable is scalar, XLF allows array

25
Scope Clauses (2)
  • PARALLEL, PARALLEL DO and PARALLEL SECTIONS
    also have
  • DEFAULT (PRIVATE | SHARED | NONE)
  • scope determined by rules
  • SHARED (variable list)
  • IF (scalar logical expression)
  • directives are like a programming language
    extension, not a compiler option
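
The same clauses exist in C, where default accepts shared or none. A minimal sketch, assuming a hypothetical function add_arrays and an arbitrary threshold of 1000 elements for the if clause:

```c
/* Hypothetical example: c[i] = a[i] + b[i]. */
void add_arrays(const double *a, const double *b, double *c, int n)
{
    int i;

    /* default(none): every variable used in the region must be scoped explicitly.
       if(n > 1000):  run in parallel only when the loop is large enough
                      (the threshold is an assumption, not a recommendation). */
    #pragma omp parallel for default(none) shared(a, b, c, n) private(i) if(n > 1000)
    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

With default(none) the compiler rejects any variable that was not scoped explicitly, which turns accidental sharing into a compile-time error.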

26
    integer i, j, n
    real*8 a(n,n), b(n)
    read (1) b
    !$OMP PARALLEL DO &
    !$OMP PRIVATE (i,j) SHARED (a,b,n)
    do j = 1, n
       do i = 1, n
          a(i,j) = sqrt(1.d0 + b(j)*i)
       end do
    end do
    !$OMP END PARALLEL DO
27
Matrix Multiply
    !$OMP PARALLEL DO PRIVATE(i,j,k)
    do j = 1, n
       do i = 1, n
          do k = 1, n
             c(i,j) = c(i,j) + a(i,k) * b(k,j)
          end do
       end do
    end do
28
Analysis
  • Outer loop is parallel: columns of c
  • Not optimal for cache use
  • Can put more directives for each loop
  • Then granularity might be too fine
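
One common way to address the cache point is to reorder the loops so the innermost loop walks through memory contiguously. The sketch below is a C translation under stated assumptions, not the author's code: C arrays are row-major (Fortran is column-major), so the parallel outer loop runs over rows of c here rather than columns, and the name matmul_add and the flat-array layout are hypothetical.

```c
/* Hypothetical example: c += a * b for n x n matrices stored row-major
   in flat arrays of length n*n. */
void matmul_add(int n, const double *a, const double *b, double *c)
{
    int i, j, k;

    #pragma omp parallel for private(j, k)
    for (i = 0; i < n; i++) {              /* each thread owns whole rows of c */
        for (k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (j = 0; j < n; j++)        /* innermost loop is contiguous in b and c */
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

Only the outer loop carries a directive, which keeps the work per thread coarse; adding directives to the inner loops would create the fine granularity the slide warns about.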

29
OMP Functions
  • int omp_get_num_procs()
  • int omp_get_num_threads()
  • int omp_get_thread_num()
  • void omp_set_num_threads(int)
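
A short sketch of how these routines fit together. The helper team_size is hypothetical; the #ifdef _OPENMP guard is an addition so the file still builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: requests `wanted` threads and returns the size of
   the team that actually ran (the runtime may provide fewer). */
int team_size(int wanted)
{
    int size = 1;                           /* serial fallback */
#ifdef _OPENMP
    omp_set_num_threads(wanted);
    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)      /* master thread only */
            size = omp_get_num_threads();   /* team size, valid inside the region */
    }
#else
    (void)wanted;                           /* unused in a serial build */
#endif
    return size;
}
```

omp_get_num_threads() returns 1 when called outside a parallel region, which is why the call sits inside the region.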