Introduction to OpenMP Part I (transcript and presenter's notes)

1
Introduction to OpenMP Part I
  • White Rose Grid Computing Training Series
  • Deniz Savas, Alan Real, Mike Griffiths
  • RTP Module
  • 2007- 2010

2
Historical Perspective
  • Single Processor CPUs
  • Pipe-lined processors (Specialised pipes)
  • Vector Processors (SIMD)
  • Multi-Processors with Distributed Memory (MIMD)
  • Multi-Processors, Shared Memory (SMP,
    meaning Symmetric Multi-Processing)

3
Single Instruction Single Data Model
[Diagram: code segment, data segment, program counter]
The program counter executes instructions in sequence, unless a jump
instruction changes its position. Each instruction fetches at most two
items from the data segment.
4
Pipelining and Specialised Units
[Diagram: CPU unit containing a floating point unit, an
integer/logical unit, an instruction handling unit and a memory
management unit]
CPU functions are distributed to specialised units, all of which can
operate on data in parallel with each other. Finest-grained
optimisation is concerned with this level and is normally provided by
the compiler. The memory management unit is related to cache control
issues. Almost all CPUs have this now.
5
Vector Processors as Single Instruction Multiple
Data Engines
[Diagram: scalar unit(s) passing a single instruction, vectors of data
values, a mask vector of 0s and 1s, and a vector of results to the
vector processor units]
The scalar unit passes the single vector instruction to the vector
processor, along with the address range of the vector of values to
operate on and the address where the results are to be stored. The
vector unit performs the operations independently of the scalar unit
and signals the scalar unit when it finishes. The scalar unit can
perform other scalar instructions in parallel with the vector unit.
Examples: Cray, Cyber.
6
Multi Processors with Distributed Memory
[Diagram: CPU1, CPU2 and CPU3, each with its own memory and disk]
MPI message passing between CPUs is the most suitable form of
communication for these architectures. Data is local to each CPU.
Examples: processor farms, Intel Hypercube, Transputers.
7
Shared Memory Multi-Processors Symmetric Multi
Processing Machines
[Diagram: CPU1, CPU2, ... CPUn, all attached to a single shared
memory]
This is the most suitable hardware configuration for using OpenMP.
Examples: Sun, SGI.
8
Why use OpenMP?
  • OpenMP is suitable for shared-memory, multi-processor
    architectures. The key feature is that each processor can access
    all the memory (i.e. a single address space).
  • Increase speed by utilising all the available CPUs (reduce wall
    clock time).
  • Improve portability of parallel programs to other (hardware or
    software) platforms. Before OpenMP came along, each vendor (e.g.
    Sun, SGI, Cray) had its own non-standard compiler-steering in-line
    directives, which rendered such programs non-portable to other
    platforms.
  • Take advantage of the shared-memory hardware of multi-processor
    machines (Sun, SGI, Cray, newer Intel and AMD platforms).

9
OpenMP Philosophy
  • A program can be made up of multiple threads. A thread is a
    process which may have its own data.
  • Each thread runs under its own program control.
  • Data can be either private to a thread or shared between the
    threads.
  • Communication between the threads is achieved via the shared data
    (as they all have access to it).
  • The master thread is responsible for coordinating the team of
    threads, as shown later by the fork-and-join model diagram.
  • Serial code is simply a process with a single thread.

10
Shared and Private Data
[Diagram: threads 1 to n, each with its own private data, all
attached to a common block of shared data]
Each thread has access to its own private data and all the shared
data, as shown by the diagram above. A private variable can only be
seen (i.e. read/written) by its own thread, whereas all threads can
read/write any shared data.
11
Parallel Regions
  • An OpenMP program is a conventional program that contains parallel
    regions, defined via OMP PARALLEL constructs.
  • A program begins as a single thread (the master thread). When a
    parallel region is reached, the master thread creates multiple
    threads, and each thread executes the statements contained within
    the parallel region on its own until the end of that parallel
    region is reached. At the end of the parallel region, when all the
    threads have completed their tasks, the master thread continues
    executing the rest of the program until the end, or until another
    parallel region is reached. This is known as the fork-join model,
    as shown by the following diagram.
  • Branching in and out of a parallel region is not allowed (i.e. no
    GOTO statements or overlapping DO loops, etc.)

12
Fork and Join Model
[Diagram: a serial program running as a single master thread, next to
an OpenMP program that repeatedly forks a team of threads from the
master thread and joins them again]
13
FORTRAN program listing:

      PROGRAM xyz
      ...
!$OMP PARALLEL          <- FORK: team of threads created
      ...                  (executed by all threads)
!$OMP END PARALLEL      <- JOIN: master thread continues
      ...
!$OMP PARALLEL          <- FORK
      ...
!$OMP END PARALLEL      <- JOIN
      END

C/C++ equivalent:

      main( )
      {
        ...
        #pragma omp parallel
        { ... }              /* fork ... join */
        ...
        #pragma omp parallel
        { ... }              /* fork ... join */
      }
14
Three components of OpenMP Programming
  • OMP Directives
  • These form the major elements of OpenMP programming; they:
  • Create threads
  • Share the work amongst threads
  • Synchronise threads
  • Library Routines
  • These routines can be used to control and query the parallel
    execution environment, such as the number of processors that are
    available for use.
  • Environment Variables
  • The execution environment, such as the number of threads to be
    made available to an OpenMP program, can also be set/queried at
    the operating-system level before program execution is started (an
    alternative to calling library routines).

15
Directives and sentinels
  • A directive is a special line of source code with
    meaning only to certain compilers.
  • A directive is distinguished by a sentinel
    (special character string) at the start of the
    line.
  • The OpenMP sentinels are:
  • For Fortran: !$OMP (or C$OMP or *$OMP in fixed form)
  • For C/C++: #pragma omp
  • On serial compilers these directive sentinels (i.e. !$OMP and
    #pragma omp) will be interpreted as comments according to the
    rules of the language, and therefore simply ignored by compilers
    that do not have OpenMP features.

16
Conditional Compilation
  • As the previous slide indicates, compilers that do not support
    OpenMP will always ignore the OpenMP directives, because they
    simply appear as comments.
  • In addition, two further methods exist to ensure that code written
    to support OpenMP can also be compiled by serial compilers with
    minimal changes to the source.
  • Method 1: applies only to Fortran.
  • Any line that starts with !$ will be made visible (by replacing !$
    with two spaces) to any compiler supporting OpenMP. Compilers that
    do not support OpenMP will simply see these lines as comments.
  • Method 2: applies to C and C++ preprocessors, and to Fortran
    preprocessors that support macro definitions.
  • When using a compiler that supports OpenMP, the symbol _OPENMP
    will have been predefined.
  • Therefore conditional compilation statements of the form
    #ifdef _OPENMP can be used to bracket the pieces of code that will
    be compiled only if OpenMP features are available.
  • Example:
  • NUM_WORKERS = 1
  • #ifdef _OPENMP
  • NUM_WORKERS = omp_get_num_threads( )
  • #endif

17
OMP PARALLEL directive
  • Defines a block of code which will be executed in parallel by
    multiple threads.
  • Syntax:
  • FORTRAN:  !$OMP PARALLEL [optional extra clauses]
                 block
              !$OMP END PARALLEL
  • C/C++:    #pragma omp parallel [optional extra clauses]
                 { block }
  • A number of extra clauses control the relationship between the
    threads, such as the sharing of variables.
  • Branching in and out of these blocks is not allowed.
  • The actual number of threads to be used is controlled by:
  • the OMP_NUM_THREADS environment variable (see environment vars),
  • the NUM_THREADS clause, or
  • the library calls (see later).

18
Program execution schematic
      PROGRAM myprog
      CALL york()
!$OMP PARALLEL
      CALL leeds()
!$OMP END PARALLEL
      CALL shef()
!$OMP PARALLEL
      CALL hull()
!$OMP END PARALLEL
      CALL brad()
      END

[Diagram: execution timeline. york() and shef() run on the master
thread alone; leeds() and hull() run on all threads in parallel;
brad() runs on the master thread again.]
19
Useful functions
  • To find out how many threads are being used:
  • Fortran:
  • INCLUDE 'omp_lib.h'
  • INTEGER FUNCTION OMP_GET_NUM_THREADS()
  • C/C++:
  • #include <omp.h>
  • int omp_get_num_threads(void)
  • Returns 1 if called outside a parallel region, else returns the
    number of threads available.
  • To identify individual threads by number:
  • Fortran:
  • INCLUDE 'omp_lib.h'
  • INTEGER FUNCTION OMP_GET_THREAD_NUM()
  • C/C++:
  • #include <omp.h>
  • int omp_get_thread_num(void)
  • Returns a value between 0 and OMP_GET_NUM_THREADS() - 1.

20
OpenMP on the Sheffield grid node iceberg
  • OpenMP support is built into the PGI Fortran 90 and C compilers on
    iceberg and its workers.
  • Users are encouraged to do all their OpenMP-related work by
    starting an interactive or batch session via the qsh and qsub
    commands respectively.
  • Programs that use OpenMP work as a team of threads.
  • For maximum efficiency we should have as many threads as there are
    available processors on the system, and assign each thread to a
    different processor.
  • Some of iceberg's workers have 6 processors each, and by using
    OpenMP we can take advantage of these nodes.
  • As iceberg can only provide a maximum of six processors, we shall
    resort to simulating larger numbers of processors by declaring and
    using more threads than the available processors.
  • If the number of threads needed by an OpenMP job is greater than
    the number of processors available, the threads start sharing the
    processors, hence diminishing efficiency.

21
Interactive shell for OpenMP on iceberg
  • OpenMP code development can be done interactively by starting an
    interactive, OpenMP-friendly shell as follows:
  • While logged onto iceberg, type:
  • qsh -pe openmp 6
  • and work in the new shell. This will make a shell available on a
    multi-processor worker node to facilitate OpenMP code development.
  • We also need to set an environment variable named OMP_NUM_THREADS
    to declare the number of threads that will be required. The value
    of this variable defines the number of threads an OpenMP job will
    create by default when it starts executing:
  • export OMP_NUM_THREADS=6
  • If the number of threads requested this way is greater than the
    number of available processors, then the threads will share the
    processors amongst themselves.

22
Interactive shell for OpenMP on iceberg
  • Setting the number of threads:
  • If you are using the bash shell (this is the default shell on
    iceberg), then you can define the number of threads that will be
    made available to a running OpenMP job by setting the
    OMP_NUM_THREADS environment variable as follows:
  • export OMP_NUM_THREADS=nn
  • where nn is the number of threads you want to use (maximum 16),
  • e.g. export OMP_NUM_THREADS=12
  • If you are using the C shell (csh, tcsh), then the same can be
    done as follows:
  • setenv OMP_NUM_THREADS nn

23
Compiling on the Sheffield Grid Node
  • OpenMP support is built into the Fortran 90 and C compilers on the
    iceberg cluster.
  • To compile a program using OpenMP, specify the -mp flag for both
    Fortran and C programs.
  • EXAMPLE:
  • > pgf90 -mp prog.f90
  • or
  • > pgcc -mp prog.c
  • Compiler optimisation will be raised to -O3 automatically.
  • You can specify any additional flags on the command line, e.g.
    -fast.
  • If compiling and linking in separate stages, be
    sure to use identical compiler options for both!

24
Running the program
  • Run as you would a serial program:
  • > ./a.out   (or ./progname, as specified by the -o progname
    compiler flag)
  • e.g.
  • > f90 program.f90 -o progname
  • > ./progname
  • You can redefine the number of threads to use at any time by
    resetting the environment variable OMP_NUM_THREADS. There is no
    need to re-compile the program after changing this environment
    variable.
  • In sh and bash: export OMP_NUM_THREADS=nn
  • In csh and tcsh: setenv OMP_NUM_THREADS nn
  • (where nn is the number of threads to use)

25
Batch execution of openmp programs
  • Batch queues allow exclusive access to all CPUs of a worker node
    to run OpenMP jobs.
  • To submit a batch job, use the qsub command:
  • qsub -pe openmp <np> scriptfile
  • where:
  • <np> is the number of processors to reserve,
  • scriptfile contains the commands that will be executed by the
    batch job.
  • The environment variable OMP_NUM_THREADS should also be set inside
    the script.
  • On iceberg, do not request more than 8 processors, as 8 is the
    maximum number of processors per machine that can be made
    available.

26
Example batch submission script
  • #$ -pe openmp 6 -l h_rt=1:00:00
  • #$ -cwd
  • export OMP_NUM_THREADS=$NSLOTS
  • ./prog
  • The NSLOTS environment variable's value is set to the number of
    processors that a job is allocated. Therefore it can be used to
    set the OMP_NUM_THREADS variable for maximum efficiency.
  • Options can be included on the command line or within the script
    with the #$ line prefix.
  • The above script will request 1 hour of runtime.
  • Avoid running on more processors than NSLOTS indicates.

27
Exercises
  • The OpenMP examples are contained in the directory
  • /usr/local/courses/openmp/exercises. Copy them into your own
    directory.
  • The aim is to compile and run a few trivial OpenMP programs. Read
    the readme.txt file in the exercises directory and follow the
    instructions to compile and run the first two programs, either in
    C or in Fortran.
  • Vary the number of threads using the OMP_NUM_THREADS environment
    variable and run the code again.
  • Run the code several times. Is the output
    consistent?

28
OMP PARALLEL Directive
  • Defines a block of code which will be executed in parallel by
    multiple threads.
  • Syntax:
  • FORTRAN:  !$OMP PARALLEL (optional clauses)
                 block
              !$OMP END PARALLEL
  • C/C++:    #pragma omp parallel (optional clauses)
                 { block }

29
Optional OMP PARALLEL clauses
  • The list of clauses can be comma- or space-separated in Fortran,
    whereas in C/C++ they may only be space-separated. The clauses can
    be one or more of the following:
  • PRIVATE(var_list), FIRSTPRIVATE(var_list)
  • SHARED(var_list)
  • DEFAULT(PRIVATE|SHARED|NONE)
  • REDUCTION({operator|intrinsic_func_name} : var_list)
  • COPYIN(var_list)
  • IF(logical_expression)
  • NUM_THREADS(number)
  • All of these clauses will be explained in later
    slides.
  • Sometimes these clauses render the OMP PARALLEL
    directive just too long to fit onto a single
    line, in which case the directive can be
    continued onto the following line(s) as shown by
    the next slide.

30
Continuation lines for OMP directives
  • OMP directives can take a number of optional clauses, which may
    render the directive too long to fit onto a single line; it may
    therefore need to be split into multiple lines as described below.
  • Fortran free source form: the continued line must end with an &.
    The & after !$OMP on the continuation line(s) is optional.
  • !$OMP PARALLEL DEFAULT(NONE), &
    !$OMP PRIVATE(i,myid), SHARED(a,n)
  • C/C++: the continued line must end with a \
  • #pragma omp parallel default(none) \
    private(i,myid) shared(a,n)

31
OMP PARALLEL - If clause
  • A parallel region can be made conditional by using the IF clause
    on the PARALLEL directive. This may be useful if, for example,
    there is not enough work to make parallel execution worthwhile (as
    in the first example), or there are not enough threads available.
  • Syntax:
  • Fortran: !$OMP PARALLEL IF ( scalar_logical_expression )
  • C/C++:   #pragma omp parallel if ( scalar_expression )
  • Example:
  • Here the enclosed region will be executed serially if we have 100
    or fewer tasks to do:
  • #pragma omp parallel if( ntasks > 100 )

32
OMP PARALLEL - Num_threads clause
  • Syntax: !$OMP PARALLEL ... NUM_THREADS( scalar )
  • Normally, when a parallel region is entered, the number of threads
    to be started is determined by the last call to the
    OMP_SET_NUM_THREADS routine, if there was such a call.
  • If there was no such call, as is usually the case, then the value
    of the environment variable OMP_NUM_THREADS determines the number
    of threads to be started.
  • However, this clause allows the user to dictate the exact number
    of threads to be started for the parallel region, thus overriding
    the previous two methods. Any number specified that is not
    sensible (such as a negative number) will be ignored. Also, any
    number exceeding the number of threads allowed by the operating
    system will be pegged to that limit. It is therefore sensible to
    test the actual number of threads currently being used by the
    parallel region via a call to OMP_GET_NUM_THREADS, and not rely on
    the requested number of threads being made available. See
    Example-11.

33
Shared and private clauses
  • Within a parallel region, variables can be:
  • shared: every thread sees the same copy, or
  • private: each thread has its own copy.
  • The following optional clauses control the privacy attributes of
    the variables.
  • Syntax:
  • !$OMP PARALLEL SHARED(var_list)
  • !$OMP PARALLEL PRIVATE(var_list)
  • !$OMP PARALLEL DEFAULT(SHARED|PRIVATE|NONE)
  • The default is for all variables to be shared.
  • DEFAULT(NONE): all variables must be explicitly declared as
    private or shared.

34
Shared and private examples
  • In this example each thread initialises its own column of a shared
    array named a():

    !$OMP PARALLEL DEFAULT(NONE), &
    !$OMP PRIVATE(i,myid), SHARED(a,n)
      myid = omp_get_thread_num() + 1
      do i = 1, n
        a(i,myid) = 1.0
      end do
    !$OMP END PARALLEL

  • i is the local loop index and should be private
  • myid is the local thread number: private
  • a is the main array: shared
  • n is only read, so there is no need to store extra copies: shared
    (saves memory)

35
Which variables to share?
  • Most variables are shared
  • Loop indices are private
  • Loop temporaries are private
  • Read-only variables are shared
  • Main arrays are shared, with caution!
  • Write-before-read scalars are usually private
  • Sometimes either is OK; however, there may be performance
    implications in making the choice.
  • Note: you can have private arrays as well as scalars.

36
Initialising the private variables
  • There are no ambiguities with regard to the values of the shared
    variables at the start of a parallel region, as they continue to
    have the values they had before the start of the parallel region.
  • However, as new copies of private variables (one per thread) come
    into existence on entry into a parallel region, no such
    assumptions can be made about the values of these variables.
  • By default, the values of private variables are undefined on entry
    into a parallel region, unless they are declared in the
    FIRSTPRIVATE clause.
  • FIRSTPRIVATE clause:
  • Variables declared as firstprivate are private variables that are
    given the values that existed immediately prior to the parallel
    region.
  • Syntax: !$OMP PARALLEL ... FIRSTPRIVATE(list)

37
Firstprivate clause - example
  • In the example below, variable b is private to each thread. At the
    start of each thread it is initialised with its value in the
    master thread (i.e. 5.0). For safe programming, the value of b
    should be treated as undefined upon exit from the parallel region,
    although on most platforms b on exit from the parallel region will
    contain the thread 0 value.

    b = 5.0;
    #pragma omp parallel firstprivate(b) \
            private(i,myid) shared(c)
    {
      myid = omp_get_thread_num();
      for (i = 0; i < n; i++)
        b += c[myid][i];
      c[myid][n] = b;
    }
    bnew = b;  /* this means the value of b for thread 0 */

38
THREADPRIVATE
  • This directive makes the declared variables and common blocks
    private to each thread but global within each thread.
  • !$OMP THREADPRIVATE ( list of variables and/or common-block
    names )
  • Note that this declaration must be repeated in each
    subroutine/function that declares the same common block or
    variables.

39
WORK SHARING Constructs
  • Upon entering a parallel region, a team of threads is created and
    each thread executes the parallel region on its own. In this mode
    of operation, unless there are some controls (if statements etc.)
    based on the thread number of the process, the work is simply
    repeated number_of_threads times.
  • In many circumstances, when we would like a given set of tasks to
    be performed once by a team of threads rather than repeated on
    each thread, we use one of the following three directives, known
    as the work-sharing directives:
  • FORTRAN: !$OMP DO [clauses] ... !$OMP END DO
  • C/C++:   #pragma omp for [clauses]
               for-loop
  • FORTRAN: !$OMP SECTIONS ... !$OMP END SECTIONS
  • C/C++:   #pragma omp sections
  • FORTRAN: !$OMP WORKSHARE ... !$OMP END WORKSHARE

40
Parallel DO/for loops
  • Syntax:
  • Fortran: !$OMP DO [clauses]
               do loop
             !$OMP END DO
    (the END DO directive is optional, as the compiler will interpret
    the loop's own END DO as a sentinel)
  • C/C++:  #pragma omp for [clauses]
               for loop
    (note that there are no curly braces here, as the for-loop block
    is taken to be the block for this pragma)
41
OMP DO/for directive
  • This directive allows the enclosed do-loop or for block to be
    work-shared amongst threads.
  • Note also that the OMP DO/omp for directive itself must be
    enclosed within a parallel region for the parallel processing to
    be initiated, and it can take a number of optional clauses that
    will be listed later.
  • Example:

    !$OMP PARALLEL
    !$OMP DO
      DO i = 1, 800
        A(i) = B(i) + C(i)
      END DO
    !$OMP END DO
    !$OMP END PARALLEL

  • This distributes the do-loop over the different threads, each
    thread computing part of the iterations.
  • For example, if 4 threads are in use, then in general each thread
    computes 200 iterations of the do-loop: thread 0 computes i=1 to
    200, thread 1 from 201 to 400, and so on. This is shown
    graphically in the next slide.

42
OMP DO example
Serial region: thread 0

!$OMP PARALLEL
!$OMP DO

Parallel region:
  Thread 0: do i = 1, 200
  Thread 1: do i = 201, 400
  Thread 2: do i = 401, 600
  Thread 3: do i = 601, 800

!$OMP END DO
!$OMP END PARALLEL

Serial region: thread 0
43
OMP DO/for directive clauses
  • The OMP DO/for directive can take one or more of the following
    clauses to control the scheduling of the tasks and the behaviour
    of the private and shared variables:
  • PRIVATE, FIRSTPRIVATE, LASTPRIVATE, REDUCTION, SCHEDULE and
    ORDERED.
  • Of all these, only the ORDERED and SCHEDULE clauses are specific
    to this directive. The other clauses are general and can also be
    used with the OMP PARALLEL directive.
  • We shall therefore only study these two clauses here and leave the
    discussion of the others to later sections.

44
OMP for directives C/C restrictions
  • As the for loop in C is a general while loop, there are
    restrictions on the form it can take:
  • It has to have a determinable trip count,
  • i.e. it must be of the form
    for (var = a; var logical-op b; incr-exp) where:
  • logical-op is one of <, <=, >, >=
  • incr-exp is var = var +/- incr (or var++/var--)
  • Also, we cannot modify var within the loop body.

45
Parallel loop identification
  • How can you tell if a loop is parallel or not?
  • (1) Every iteration should be independent of the other iterations,
  • (2) i.e. it can be run in any order. (Reverse order test: a loop
    is almost certainly parallel if it can be run backwards to produce
    the same results.)
  • (3) Jumps out of the loop are not allowed.
  • For example, the loop below cannot be parallelised because it
    violates (1) and (2):

    DO i = 2, n
      a(i) = 2 * a(i-1)
    END DO

46
Parallel loop identification examples
  • Example 2 (not parallel: ix is calculated during the previous
    iteration):

    ix = base
    do i = 1, n
      a(ix) = a(ix) + b(i)
      ix = ix + stride
    end do

  • Example 3 (parallel: a() and b() are independent arrays, so
    previous iterations have no influence on the current one):

    do i = 1, n
      b(i) = (a(i) - a(i-1)) * 0.5
    end do
47
Parallel loop example
  • !$OMP PARALLEL DO
      do i = 1, n
        b(i) = (a(i) - a(i-1)) * 0.5
      end do
    !$OMP END PARALLEL DO
  • a, b and n are shared by default
  • i is private.

48
Parallelising, despite loop dependencies !
  • We have seen that loops that exhibit dependencies between
    iterations cannot normally be parallelised.
  • That is to say, if one iteration of a loop uses values that were
    updated in any of the previous iterations of that loop, that loop
    cannot be parallelised.
  • For example, loops containing either of the following expressions
    will not parallelise:
  • x(i) = x(i-1) + a
  • x[i] = x[i] + x[i+1]
  • However, there is an important exception to this restriction for a
    certain commonly occurring class of operations, namely reduction
    operations.

49
Reductions
  • Reduction operations are operations that produce a single value
    from an associative operation (e.g. addition, multiplication,
    subtraction, AND, OR) and a number of intrinsic functions such as
    MIN and MAX.
  • Example:

    sum = 0.0
    DO i = 1, N
      sum = sum + b(i)
    END DO

  • When parallelising this loop, the variable sum will need to be
    declared in a REDUCTION clause associated with the + operator.
    This allows sum to be calculated correctly by making sure that
    each thread does the accumulation in its own private copy of sum;
    at the end of the loop these partial sums are added together to
    give a final result that is stored in the shared variable sum.

50
Reductions
  • Variables can be given the REDUCTION attribute:
  • Fortran: REDUCTION(op : list)
  • C/C++:   reduction(op : list)
  • op is the reduction operator, from the following table
  • list is the list of variables

    Fortran: + * - .AND. .OR. .EQV. .NEQV. MAX MIN IAND IOR IEOR
    C/C++:   + * - & | ^ && ||

  • OpenMP allows array reductions only in Fortran.
51
Reduction example
  • b = 0
    !$OMP PARALLEL REDUCTION(+:b), &
    !$OMP PRIVATE(i,myid)
      myid = omp_get_thread_num() + 1
      do i = 1, n
        b = b + c(i,myid)
      end do
    !$OMP END PARALLEL

52
LASTPRIVATE clause
  • Sometimes we need the value that a private variable would have had
    upon exit from the do or for loop.
  • This is normally undefined.
  • A variable can be declared as LASTPRIVATE to give this behaviour.
  • Applies to both DO/for and SECTIONS (takes the variable from the
    last section).
  • Variables can be declared as both FIRSTPRIVATE and LASTPRIVATE.
  • Syntax:
  • Fortran: LASTPRIVATE(list)
  • C/C++:   lastprivate(list)

53
LASTPRIVATE example
  • !$OMP PARALLEL
    !$OMP DO LASTPRIVATE(i)
      do i = 1, func(l,m,n)
        d(i) = d(i) + e*f(i)
      end do
      ix = i - 1
    !$OMP END PARALLEL

54
Combining PARALLEL and DO/for directives
  • OMP DO/for loops enclosed immediately within an OMP PARALLEL
    directive are so common that there is a shorthand method of
    declaring them in a merged format, as shown below.
  • SYNTAX:
  • Fortran: !$OMP PARALLEL DO [clauses]
               do loop
             !$OMP END PARALLEL DO
  • C/C++:   #pragma omp parallel for [clauses]
               for loop

55
OMP SECTIONS DIRECTIVE
  • This directive is used to explicitly divide the work between
    threads. It provides a very straightforward and explicit mechanism
    for sharing the work between the team of threads.
  • Each section is executed once (and only once).
  • Each section will run on a separate thread.
  • There can be as many sections as needed; when there are more
    sections than processors, the threads do a round robin, picking up
    and executing the sections that have not yet been run.
  • Once all the sections have been allocated, the remaining threads
    wait at the END SECTIONS directive until all the threads complete
    their tasks. Only then does execution continue past the END
    SECTIONS directive, unless a NOWAIT clause is specified. See
    later.

56
OMP SECTIONS
  • Syntax:
  • Fortran: !$OMP SECTIONS [optional clauses]
             !$OMP SECTION
               structured-block
             !$OMP SECTION
               structured-block
             !$OMP END SECTIONS
  • C/C++:   #pragma omp sections [optional clauses]
             {
               #pragma omp section
                 structured-block
               #pragma omp section
                 structured-block
             }

57
Sections schematic
  • Example:

    !$OMP PARALLEL
    !$OMP SECTIONS
    !$OMP SECTION
      call init(x)
    !$OMP SECTION
      call init(y)
    !$OMP SECTION
      call init(z)
    !$OMP END SECTIONS
    !$OMP END PARALLEL

[Diagram: serial execution, then init(x), init(y) and init(z) running
on three threads while the remaining threads are idle, then serial
execution again]
58
OMP Sections directive optional clauses
  • The OMP SECTIONS directive can take the following optional
    clauses:
  • PRIVATE
  • FIRSTPRIVATE
  • LASTPRIVATE
  • REDUCTION

59
SINGLE directive
  • Within a parallel region, the SINGLE directive indicates that the
    block of code is executed on a single thread only.
  • The first thread to reach the SINGLE directive will execute the
    block.
  • The other threads wait at the end of the block before continuing
    (unless there is a NOWAIT clause).
  • Syntax:
  • Fortran: !$OMP SINGLE [clauses]
               block
             !$OMP END SINGLE
  • C/C++:   #pragma omp single [clauses]
               structured block

60
SINGLE execution schematic
  • Example:

    #pragma omp parallel
    {
      setup(x);
      #pragma omp single
      input(y);
      work(x,y);
    }

[Diagram: every thread runs setup(x); one thread runs input(y) while
the others are idle; then every thread runs work(x,y)]
61
MASTER directive
  • The code block is only executed on the master thread (thread 0).
  • Different from SINGLE:
  • The other threads skip the block and continue executing! (i.e.
    there is no implied barrier at the end of the block)
  • Most often used for I/O.
  • Fortran: !$OMP MASTER
               block
             !$OMP END MASTER
  • C/C++:   #pragma omp master
               structured block

62
Ordered execution
  • You can specify code within a loop that must be executed in the
    order it would be done if the loop were executed sequentially.
  • Fortran: !$OMP ORDERED
               structured-block
             !$OMP END ORDERED
  • C/C++:   #pragma omp ordered
               structured-block
  • Can only be used inside a DO/for directive that has the ORDERED
    clause specified.
  • ORDERED is useful for debugging, but is mainly included for
    completeness.

63
ORDERED example
  • !$OMP PARALLEL DO ORDERED
      do j = 1, n
    !$OMP ORDERED
        write(*,*) j, count(j)
    !$OMP END ORDERED
      end do
    !$OMP END PARALLEL DO

64
!OMP WORKSHARE (Fortran90 only)
  • !OMP WORKSHARE and !OMP END WORKSHARE define a
    block of region within which the work is divided
    into separate units and allocated to the
    available threads. These units of tasks are then
    executed in parallel while making sure that each
    unit is executed once and only once and the
    integrity of the program is not violated by
    ensuring that the execution of a statement
    appears to occur after the completion of the
    previous statement.
  • This is all done by the compiler without the user
    having to worry about what is a unit of work,
    however worksharing will only be possible if
  • The workshare block must only contain array or
    scalar assignment
  • The block must not contain calls to user routines
  • The block may contain FORALL and WHERE Fortran95
    statements.
  • Obviously no branch in or out of the block is
    allowed.
  • EXAMPLE Given the conformable arrays-
    a,b,c,d,e,f,g
  • a bc def ga/d
  • The first two multiplications can be carried out
    in parallel.
  • But the / operation must ensure that the updated
    values of each element of a and d are used.
    These considerations are taken care of by the
    compiler when worksharing.

65
WORKSHARE example
  • !$OMP PARALLEL WORKSHARE
      A = B + C
      WHERE (D .ne. 0) E = 1/D
      FORALL (i=1:n, j=1:m) X(i,j) = 1
    !$OMP END PARALLEL WORKSHARE

66
End of Part 1