1
OpenMP
2
OpenMP
  • OpenMP Overview
  • Goals
  • OpenMP constructs
  • Parallel Regions
  • Work Sharing Directives
  • Data Scope Attribute Clauses
  • Synchronization Directives
  • Runtime Library Routines
  • Environment Variables
  • Major Errors
  • Future of OpenMP 3.0

3
Overview
  • Stands for Open specifications for Multi
    Processing
  • A set of APIs for writing multithreaded
    applications
  • OpenMP's constructs are made up of compiler
    directives.
  • Used with the C/C++ and Fortran languages.

4
Overview
  • Thread-based parallelism: multiple threads are
    created and run on the same shared memory.
  • Uses the fork-join model: a master thread forks a
    team of slave threads, which are later joined again.
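  • A minimal sketch of the fork-join model (not from
    the original deck), compiled with OpenMP support
    (e.g. gcc -fopenmp):

    #include <omp.h>
    #include <stdio.h>

    int main()
    {
        printf("master thread, before fork\n");   /* serial part */
        #pragma omp parallel                      /* fork: a team of threads starts here */
        {
            printf("hello from thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }                                         /* join: implicit barrier, team ends */
        printf("master thread, after join\n");    /* serial part again */
        return 0;
    }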

5
OpenMP Release History
  • 1997 OpenMP Fortran 1.0
  • 1998 OpenMP C/C++ 1.0
  • 1999 OpenMP Fortran 1.1
  • 2000 OpenMP Fortran 2.0
  • 2002 OpenMP C/C++ 2.0
  • ?? OpenMP 3.0 ??

6
OpenMP
  • OpenMP Overview
  • Goals
  • OpenMP constructs
  • Parallel Regions
  • Work Sharing Directives
  • Data Scope Attribute Clauses
  • Synchronization Directives
  • Runtime Library Routines
  • Environment Variables
  • Major Errors
  • Future of OpenMP 3.0

7
Goals
  • Standardization
  • Provide a standard among a variety of shared
    memory architectures/platforms
  • Lean and Mean
  • Establish a simple and limited set of directives
    for programming shared memory machines.
    Significant parallelism can be implemented by
    using just 3 or 4 directives.

8
Goals
  • Ease of Use
  • Provide capability to incrementally parallelize a
    serial program
  • Provide the capability to implement both
    coarse-grain and fine-grain parallelism
  • Portability
  • Supports Fortran (77, 90, and 95), C, and C++

9
OpenMP
  • OpenMP Overview
  • Goals
  • OpenMP constructs
  • Parallel Regions
  • Work Sharing Directives
  • Data Scope Attribute Clauses
  • Synchronization Directives
  • Runtime Library Routines
  • Environment Variables
  • Major Errors
  • Future of OpenMP 3.0

10
OpenMP Constructs
  • The OpenMP constructs fall into six main
    categories
  • Parallel Region Directive
  • Work-sharing Directives
  • Data Scope Attribute Clauses
  • Synchronization Directives
  • Runtime Library Routines
  • Environment variables

11
OpenMP Constructs
  • Format of any OpenMP construct in C/C++
  • #pragma omp directive-name [clause, ...]
  • { parallel_code(); }

12
OpenMP Constructs
  • The OpenMP constructs fall into six main
    categories
  • Parallel Region Directive
  • Work-sharing Directives
  • Data Scope Attribute Clauses
  • Synchronization Directives
  • Runtime Library Routines
  • Environment variables

13
Parallel Regions Directive
  • Indicates a block of code that will be executed
    by multiple threads.
  • This is the fundamental OpenMP parallel
    construct.

14
Parallel Regions Directive
  • #include <omp.h>
  • int main()
  • {
  •   int x;
  •   set_of_sequential_code();
  •   #pragma omp parallel
  •   {
  •     set_of_parallel_code();
  •   }
  •   another_set_of_sequential_code();
  • }

15
OpenMP Constructs
  • The OpenMP constructs fall into six main
    categories
  • Parallel Region Directive
  • Work-sharing Directives
  • Data Scope Attribute Clauses
  • Synchronization Directives
  • Runtime Library Routines
  • Environment variables

16
Work sharing Directive
  • Includes
  • For-construct
  • Section construct
  • Single construct
  • For Construct
  • Used for splitting work among all available
    threads.
  • Splitting of jobs depends on the SCHEDULE
    clause used.

17
For Directive
  • #pragma omp parallel
  • {
  •   #pragma omp for
  •   for (int i = 0; i < n; i++)
  •     a[i] = b[i] + a[i];
  • }
  • The same program could be written without the
    for construct, but with more programming steps.

18
For Directive
  • #pragma omp parallel
  • {
  •   int id, i, Nthrds, istart, iend;
  •   id = omp_get_thread_num();
  •   Nthrds = omp_get_num_threads();
  •   istart = id * N / Nthrds;
  •   iend = (id + 1) * N / Nthrds;
  •   for (i = istart; i < iend; i++) a[i] = a[i] + b[i];
  • }

19
For Directive
  • #pragma omp parallel
  • #pragma omp for schedule(static)
  • for (i = 0; i < N; i++) a[i] = a[i] + b[i];

20
Schedule Clause
  • The schedule clause affects how loop iterations
    are mapped onto threads
  • schedule(static ,chunk)
  • Deal-out blocks of iterations of size chunk
    to each thread.
  • schedule(dynamic,chunk)
  • Each thread grabs chunk iterations off a
    queue until all iterations have been handled.

21
Schedule Clause
  • schedule(guided,chunk)
  • Threads dynamically grab blocks of iterations.
    The size of the block starts large and shrinks
    down to size chunk as the calculation proceeds.
  • schedule(runtime)
  • Schedule and chunk size taken from the
  • OMP_SCHEDULE environment variable.
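  • A sketch of schedule(runtime), assuming a
    placeholder work() function; the policy can then be
    changed without recompiling:

    #include <omp.h>
    void work(int i);                 /* placeholder */

    void run(int n)
    {
        int i;
        /* schedule and chunk size are read from OMP_SCHEDULE at run time */
        #pragma omp parallel for schedule(runtime)
        for (i = 0; i < n; i++)
            work(i);
    }

  • For example, setenv OMP_SCHEDULE "dynamic,4"
    before running selects dynamic scheduling.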

22
Section Directive
  • The SECTIONS directive is a non-iterative
    work-sharing construct.
  • It specifies that the enclosed section(s) of code
    are to be divided among the threads in the team.
  • Independent SECTION directives are nested within
    a SECTIONS directive.

23
Section Directive
  • What if the number of threads is greater than the
    number of sections?
  • What if the number of sections is greater than the
    number of threads?

24
Section Directive
  • #pragma omp parallel
  • {
  •   #pragma omp sections
  •   {
  •     #pragma omp section
  •     steps_executed_by_one();
  •     #pragma omp section
  •     steps_executed_by_another_one();
  •   }
  • }

25
Single Directive
  • One thread only will execute the single section,
    while the others will do nothing.
  • #pragma omp single
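  • A minimal sketch of SINGLE (read_input() and
    process_data() are placeholder names):

    #include <omp.h>
    void read_input(void);            /* placeholder */
    void process_data(void);          /* placeholder */

    void example(void)
    {
        #pragma omp parallel
        {
            #pragma omp single
            read_input();             /* done by exactly one thread */
                                      /* implicit barrier at end of single */
            process_data();           /* done by every thread in the team */
        }
    }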

26
Parallel Regions and work sharing Directives
  • A parallel region directive can be combined with
    a work-sharing construct
  • #pragma omp parallel for [schedule clause]
  • #pragma omp parallel sections

27
OpenMP Constructs
  • The OpenMP constructs fall into six main
    categories
  • Parallel Region Directive
  • Work-sharing Directives
  • Data Scope Attribute Clauses
  • Synchronization Directives
  • Runtime Library Routines
  • Environment variables

28
Data Scope Attribute Clauses
  • An important consideration for OpenMP programming
    is the understanding and use of data scoping
  • Because OpenMP is based upon the shared memory
    programming model, most variables are shared by
    default

29
Data Scope Attribute Clauses
  • The OpenMP Data Scope Attribute Clauses are used
    to explicitly define how variables should be
    scoped. They include
  • PRIVATE
  • FIRSTPRIVATE
  • LASTPRIVATE
  • THREADPRIVATE
  • COPYIN
  • SHARED
  • DEFAULT
  • REDUCTION

30
Data Scope Attribute Clauses
  • The SHARED clause declares variables in its list
    to be shared among all threads in the team.
  • The PRIVATE clause declares variables in its list
    to be private to each thread.
  • The FIRSTPRIVATE clause declares variables in its
    list to be private and initializes each copy to
    the value the variable had prior to entering the
    parallel region.
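  • A sketch contrasting PRIVATE and FIRSTPRIVATE
    (the values are illustrative):

    #include <omp.h>
    #include <stdio.h>

    int main()
    {
        int p = 10, fp = 10;
        #pragma omp parallel private(p) firstprivate(fp)
        {
            /* p is uninitialized in each thread; fp starts at 10 */
            p = omp_get_thread_num();
            printf("thread %d: p=%d fp=%d\n", p, p, fp + p);
        }
        /* the original p and fp are untouched (still 10) here */
        printf("after region: p=%d fp=%d\n", p, fp);
        return 0;
    }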

31
Data Scope Attribute Clauses
  • The LASTPRIVATE clause copies the value from the
    sequentially last loop iteration or section to the
    original variable object.
  • Without the LASTPRIVATE clause, the value of the
    original object at the end of the parallel
    execution is UNDEFINED.
  • The THREADPRIVATE directive is used to make
    global file-scope variables local to each thread,
    persisting through the execution of multiple
    parallel regions.

32
Data Scope Attribute Clauses
  • The COPYIN clause is used for initialization of
    THREADPRIVATE variables.
  • THREADPRIVATE differs from PRIVATE in that
    threadprivate variables persist across parallel
    regions, while private variables do not.
  • The DEFAULT clause allows the user to specify a
    default PRIVATE, SHARED, or NONE scope.
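  • A sketch of THREADPRIVATE plus COPYIN (the counter
    variable is illustrative); each thread keeps its own
    copy across regions, seeded from the master's value:

    #include <omp.h>
    #include <stdio.h>

    int counter = 0;
    #pragma omp threadprivate(counter)

    int main()
    {
        counter = 100;                        /* master's copy */
        #pragma omp parallel copyin(counter)  /* every thread starts from 100 */
        counter += omp_get_thread_num();

        #pragma omp parallel                  /* values persist into this region */
        printf("thread %d: counter=%d\n",
               omp_get_thread_num(), counter);
        return 0;
    }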

33
Data Scope Attribute Clauses
  • The REDUCTION clause performs a reduction on the
    variables that appear in its list.
  • A private copy for each list variable is created
    for each thread.
  • At the end of the reduction, the reduction
    operation is applied to all private copies of the
    shared variable, and the final result is written
    to the global shared variable.
  • Reduction variables must be declared SHARED in
    the enclosing context.

34
  • #include <omp.h>
  • #include <stdio.h>
  • int main()
  • {
  •   int i, n, chunk;
  •   float a[100], b[100], result;
  •   n = 100; chunk = 10; result = 0.0;
  •   for (i = 0; i < n; i++) {
  •     a[i] = i * 1.0;
  •     b[i] = i * 2.0;
  •   }
  •   #pragma omp parallel for \
  •     default(shared) private(i) \
  •     schedule(static,chunk) \
  •     reduction(+:result)
  •   for (i = 0; i < n; i++)
  •     result = result + (a[i] * b[i]);
  •   printf("Final result= %f\n", result);
  • }

35
OpenMP Constructs
  • The OpenMP constructs fall into six main
    categories
  • Parallel Region Directive
  • Work-sharing Directives
  • Data Scope Attribute Clauses
  • Synchronization Directives
  • Runtime Library Routines
  • Environment variables

36
Synchronization Directives
  • To avoid inconsistency between shared variables,
    accesses must be synchronized between the
    threads to ensure that the correct result is
    always produced.
  • OpenMP provides a variety of Synchronization
    Constructs that control how the execution of each
    thread proceeds relative to other team threads.

37
Synchronization Constructs
  • Barrier
  • NoWait
  • Critical
  • Atomic
  • Ordered
  • Master
  • Flush

38
Synchronization Directives
  • When a BARRIER directive is reached, a thread
    will wait at that point until all other threads
    have reached that barrier.
  • Implicit barriers are applied at
  • the end of parallel regions
  • the end of work-sharing constructs (for, sections,
    single), unless NOWAIT is specified
  • (CRITICAL regions do not add a barrier; they imply
    a flush on entry and exit, as shown on the FLUSH
    slide.)

39
Synchronization Directives
  • NOWAIT is a clause that removes the implicit
    barrier.
  • It is used with
  • Work-sharing directives (for, sections, single)
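  • A sketch of NOWAIT on a for construct; it is safe
    here because the two loops touch independent arrays
    (the names are illustrative):

    #include <omp.h>

    void example(int n, float a[], float b[], float y[], float z[])
    {
        int i;
        #pragma omp parallel
        {
            #pragma omp for nowait    /* no barrier: threads move on immediately */
            for (i = 0; i < n; i++)
                b[i] = a[i] * 2.0f;

            #pragma omp for           /* implicit barrier at the end of this one */
            for (i = 0; i < n; i++)
                z[i] = y[i] + 1.0f;
        }
    }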

40
Synchronization Directives
  • The CRITICAL directive specifies a region of code
    that must be executed by only one thread at a
    time
  • It blocks all other threads until the current
    thread exits that CRITICAL region.
  • #pragma omp critical [name]
  • The optional name enables multiple different
    CRITICAL regions to exist
  • Different CRITICAL regions with the same name are
    treated as the same region.
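  • A sketch of a named CRITICAL region protecting a
    shared sum (the name update_sum is illustrative):

    #include <omp.h>
    #include <stdio.h>

    int main()
    {
        int sum = 0;
        #pragma omp parallel
        {
            #pragma omp critical (update_sum)  /* one thread at a time */
            sum += omp_get_thread_num();
        }
        printf("sum=%d\n", sum);
        return 0;
    }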

41
Synchronization Directives
  • The ATOMIC directive specifies that a specific
    memory location must be updated atomically,
    rather than letting multiple threads attempt to
    write to it.
  • Provides a mini-CRITICAL section.
  • The MASTER directive specifies a region that is
    to be executed only by the master thread of the
    team. All other threads on the team skip this
    section of code
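  • A sketch of ATOMIC for a single memory update (the
    array names are illustrative); only the one
    statement after the directive is protected:

    #include <omp.h>

    void histogram(int n, const int bucket_of[], int counts[])
    {
        int i;
        #pragma omp parallel for
        for (i = 0; i < n; i++) {
            #pragma omp atomic        /* atomic increment of one location */
            counts[bucket_of[i]]++;
        }
    }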

42
Synchronization Directives
  • The FLUSH directive identifies a synchronization
    point at which the implementation must provide a
    consistent view of memory. Thread-visible
    variables are written back to memory at this
    point.
  • FLUSH is implied implicitly with these
    directives
  • critical - upon entry and exit
  • Barrier
  • ordered - upon entry and exit
  • parallel - upon exit
  • for - upon exit
  • sections - upon exit
  • single - upon exit

43
Synchronization Directives
  • The ORDERED directive specifies that iterations
    of the enclosed loop will be executed in the same
    order as if they were executed on a serial
    processor.
  • A loop that contains an ORDERED directive must
    have the ORDERED clause on its loop directive.
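  • A sketch of ORDERED: the work runs in parallel,
    but the printing happens in serial loop order (note
    the required ordered clause on the for directive):

    #include <omp.h>
    #include <stdio.h>

    void example(int n)
    {
        int i;
        #pragma omp parallel for ordered schedule(static)
        for (i = 0; i < n; i++) {
            int v = i * i;            /* computed in parallel */
            #pragma omp ordered       /* executed in iteration order */
            printf("%d squared is %d\n", i, v);
        }
    }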

44
OpenMP Constructs
  • The OpenMP constructs fall into six main
    categories
  • Parallel Region Directive
  • Work-sharing Directives
  • Data Scope Attribute Clauses
  • Synchronization Directives
  • Runtime Library Routines
  • Environment variables

45
Runtime Library Routines
  • The OpenMP standard defines an API for library
    calls that perform a variety of functions
  • Query the number of threads/processors, set
    number of threads to use
  • General purpose locking routines (semaphores)
  • Set execution environment functions: nested
    parallelism, dynamic adjustment of the number of
    threads.

46
Runtime Library Routines
  • sets the number of threads that will be used in
    the next parallel region.
  • void omp_set_num_threads(int num_threads)
  • returns the number of threads that are currently
    in the team executing the parallel region from
    which it is called.
  • int omp_get_num_threads(void)
  • returns the maximum value that can be returned by
    a call to the OMP_GET_NUM_THREADS function.

47
  • int omp_get_max_threads(void)
  • returns the thread number of the thread, within
    the team, making this call. This number will be
    between 0 and OMP_GET_NUM_THREADS-1. The master
    thread of the team is thread 0.
  • int omp_get_thread_num(void)
  • returns the number of processors that are
    available to the program.
  • int omp_get_num_procs(void)
  • Used to determine if the section of code which is
    executing is parallel or not.
  • int omp_in_parallel(void)
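  • A sketch exercising several of these query
    routines:

    #include <omp.h>
    #include <stdio.h>

    int main()
    {
        omp_set_num_threads(4);       /* request 4 threads for the next region */
        printf("procs=%d max=%d in_parallel=%d\n",
               omp_get_num_procs(), omp_get_max_threads(), omp_in_parallel());
        #pragma omp parallel
        {
            if (omp_get_thread_num() == 0)    /* master thread reports team size */
                printf("team of %d, in_parallel=%d\n",
                       omp_get_num_threads(), omp_in_parallel());
        }
        return 0;
    }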

48
Runtime Library Routines
  • By default, a program with multiple parallel
    regions will use the same number of threads to
    execute each region.
  • This behavior can be changed to allow the
    run-time system to dynamically adjust the number
    of threads that are created for a given parallel
    section.
  • Enables or disables dynamic adjustment (by the
    runtime system) of the number of threads
    available for execution of parallel regions.
  • void omp_set_dynamic(int dynamic_threads)

49
Runtime Library Routines
  • used to determine if dynamic thread adjustment is
    enabled or not.
  • int omp_get_dynamic(void)
  • A parallel region nested within another parallel
    region results in the creation of a new team,
    consisting of one thread, by default.
  • used to enable or disable nested parallelism.
  • void omp_set_nested(int nested)

50
Runtime Library Routines
  • used to determine if nested parallelism is
    enabled or not.
  • int omp_get_nested(void)

51
Runtime Library Routines
  • For the Lock routines/functions
  • The lock variable must be accessed only through
    the locking routines
  • The lock variable must have type omp_lock_t or
    type omp_nest_lock_t, depending on the function
    being used.
  • initializes a lock associated with the lock
    variable.
  • void omp_init_lock(omp_lock_t *lock)
  • void omp_init_nest_lock(omp_nest_lock_t *lock)

52
Runtime Library Routines
  • disassociates the given lock variable from any
    locks.
  • void omp_destroy_lock(omp_lock_t *lock)
  • void omp_destroy_nest_lock(omp_nest_lock_t *lock)
  • forces the executing thread to wait until the
    specified lock is available.
  • void omp_set_lock(omp_lock_t *lock)
  • void omp_set_nest_lock(omp_nest_lock_t *lock)
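  • A sketch of the lock routines end to end; note the
    release calls, omp_unset_lock / omp_unset_nest_lock,
    which the list above does not show:

    #include <omp.h>
    #include <stdio.h>

    int main()
    {
        omp_lock_t lck;
        int total = 0;
        omp_init_lock(&lck);               /* initialize before first use */
        #pragma omp parallel
        {
            omp_set_lock(&lck);            /* wait until the lock is free */
            total += omp_get_thread_num(); /* protected update */
            omp_unset_lock(&lck);          /* release the lock */
        }
        omp_destroy_lock(&lck);            /* disassociate when done */
        printf("total=%d\n", total);
        return 0;
    }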

53
OpenMP Constructs
  • The OpenMP constructs fall into six main
    categories
  • Parallel Region Directive
  • Work-sharing Directives
  • Data Scope Attribute Clauses
  • Synchronization Directives
  • Runtime Library Routines
  • Environment variables

54
Environment Variables
  • OMP_SCHEDULE
  • Applies only to for, parallel for directives
    which have their schedule clause set to RUNTIME.
  • The value of this variable determines how
    iterations of the loop are scheduled on
    processors.
  • For example
  • setenv OMP_SCHEDULE "guided, 4"
  • setenv OMP_SCHEDULE "dynamic"

55
  • OMP_NUM_THREADS
  • Sets the maximum number of threads to use
    during execution.
  • For example
  • setenv OMP_NUM_THREADS 8
  • OMP_DYNAMIC
  • Enables or disables dynamic adjustment of the
    number of threads available for execution of
    parallel regions.
  • Valid values are TRUE or FALSE.
  • For example
  • setenv OMP_DYNAMIC TRUE

56
  • OMP_NESTED
  • Enables or disables nested parallelism.
  • Valid values are TRUE or FALSE.
  • For example
  • setenv OMP_NESTED TRUE

57
OpenMP
  • OpenMP Overview
  • Goals
  • OpenMP constructs
  • Parallel Regions
  • Work Sharing Directives
  • Data Scope Attribute Clauses
  • Synchronization Directives
  • Runtime Library Routines
  • Environment Variables
  • Major Errors
  • Future of OpenMP 3.0

58
Major Errors
  • Race Conditions
  • The outcome of a program depends on the detailed
    timing of the threads in the team.
  • Deadlock
  • Threads lock up waiting on a locked resource that
    will never become free.

59
Race Conditions
  • #pragma omp parallel sections
  • {
  •   #pragma omp section
  •   B = A + C;
  •   #pragma omp section
  •   C = B + A;
  • }

60
DeadLock
  • omp_lock_t lcka, lckb;
  • omp_init_lock(&lcka);
  • omp_init_lock(&lckb);
  • #pragma omp parallel sections
  • {
  •   #pragma omp section
  •   {
  •     omp_set_lock(&lcka);
  •     omp_set_lock(&lckb);
  •     use_a_and_b();
  •     omp_unset_lock(&lckb);
  •     omp_unset_lock(&lcka);
  •   }
  •   #pragma omp section
  •   {
  •     omp_set_lock(&lckb);
  •     omp_set_lock(&lcka);
  •     use_b_and_a();
  •     omp_unset_lock(&lcka);
  •     omp_unset_lock(&lckb);
  •   }
  • }
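  • The usual repair, not shown on the slide, is to
    acquire the locks in one agreed order in every
    section (e.g. always lcka before lckb), so no
    thread can hold one lock while waiting on the other.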

61
OpenMP
  • OpenMP Overview
  • Goals
  • OpenMP constructs
  • Parallel Regions
  • Work Sharing Directives
  • Data Scope Attribute Clauses
  • Synchronization Directives
  • Runtime Library Routines
  • Environment Variables
  • Major Errors
  • Future of OpenMP 3.0

62
OpenMP 3.0
  • Collapsing of Multiple Parallel loops
  • Automatic Data Scoping

63
Collapsing
  • Reduces overhead relative to nested
    parallelization
  • But is more error-prone
  • Nested parallelization
  • #pragma omp parallel for
  • for (int i = 0; i < n; i++) {
  •   #pragma omp parallel for
  •   for (int j = 0; j < m; j++)
  •     functions();
  • }

64
Collapsing
  • Collapsing
  • #pragma omp parallel for collapse(2)
  • for (int i = 0; i < n; i++)
  •   for (int j = 0; j < m; j++)
  •     functions();

65
Automatic Data Scoping
  • Create a standard way to ask the compiler to
    figure out data scoping automatically.
  • #pragma omp parallel for default(autoscope)
  • for (j = 0; j < COUNT; j++)
  •   calculation();

66
END