1
Parallel Programming With OpenMP
2
Contents
  • Overview of Parallel Programming & OpenMP
  • Difference between OpenMP & MPI
  • OpenMP Programming Model
  • OpenMP Environment Variables
  • OpenMP Clauses
  • OpenMP Runtime Routines
  • General Code Structure & Sample Examples
  • Pros & Cons of OpenMP
  • Performance of one program (Serial vs Parallel)

3
Parallel Programming
  • Decomposes an algorithm or data into parts, which
    are processed by multiple processors
    simultaneously.
  • Coordinates work and communication between
    processors.
  • Threaded applications are ideal for multi-core
    processors.

OpenMP
  • Open specifications for Multi Processing, based
    on a thread paradigm.
  • Three primary components: Compiler Directives,
    Runtime Library Routines, and Environment
    Variables.
  • Extensions for Fortran, C, and C++.

4
OpenMP vs MPI
  • OpenMP
      • Shared Memory Model
      • Directive Based
      • Easier to program & debug
      • Supported by gcc 4.2 and higher
  • MPI
      • Distributed Memory Model
      • Message Passing Style
      • More flexible & scalable
      • Supported by the MPICH2 library

5
Why OpenMP
6
OpenMP Programming Model
  • Shared Memory, Thread Based Parallelism.
  • Explicit Parallelism.
  • Fork-Join Model:
      • Execution starts with a single thread, the
        master thread.
      • Parallel regions fork off a team of new
        threads on entry.
      • Threads join back together at the end of the
        region; only the master thread continues.

7
OpenMP Environment Variables
  • OMP_SCHEDULE
  • OMP_NUM_THREADS
  • OMP_DYNAMIC
  • OMP_NESTED
  • OMP_THREAD_LIMIT
  • OMP_STACKSIZE
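
These variables configure the OpenMP runtime from the shell before the program starts. Below is a minimal sketch (not from the slides) that queries the settings controlled by the first four variables; run it as, e.g., OMP_NUM_THREADS=4 ./a.out.

  #include <omp.h>
  #include <stdio.h>

  int main(void)
  {
      /* Values reflect the environment, e.g.:
         OMP_NUM_THREADS=4 OMP_DYNAMIC=false OMP_NESTED=false ./a.out */
      printf("max threads (OMP_NUM_THREADS): %d\n", omp_get_max_threads());
      printf("dynamic     (OMP_DYNAMIC):     %d\n", omp_get_dynamic());
      printf("nested      (OMP_NESTED):      %d\n", omp_get_nested());
      return 0;
  }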

8
OpenMP Clauses
  • Data Scoping Clauses (shared, private, default)
  • Initialization Clauses (firstprivate,
    lastprivate, threadprivate); see the sketch after
    this list
  • Data Copying Clauses (copyin, copyprivate)
  • Worksharing Clauses (do/for directive, sections
    directive, single directive, parallel do/for,
    parallel sections)
  • Scheduling Clauses (static, dynamic, guided)
  • Synchronization Clauses (master, critical,
    atomic, ordered, barrier, nowait, flush)
  • Reduction Clause (operator : list)
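
As a quick illustration of the initialization clauses, here is a minimal sketch (variable names are assumed, not from the slides) showing firstprivate and lastprivate:

  #include <omp.h>
  #include <stdio.h>

  int main(void)
  {
      int i;
      int offset = 100;  /* firstprivate: each thread starts with a copy of 100 */
      int last = 0;      /* lastprivate: receives the value from the final iteration */

      #pragma omp parallel for firstprivate(offset) lastprivate(last)
      for (i = 0; i < 8; i++)
          last = offset + i;

      printf("last = %d\n", last);  /* prints 107: value from iteration i == 7 */
      return 0;
  }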

9
OpenMP Runtime Routines
  • To set/get the number of threads:
      • OMP_SET_NUM_THREADS
      • OMP_GET_NUM_THREADS
  • To get the thread number of a thread within a
    team:
      • OMP_GET_THREAD_NUM
  • To get the number of processors available to the
    program:
      • OMP_GET_NUM_PROCS
  • To determine whether execution is inside a
    parallel region:
      • OMP_IN_PARALLEL
  • To enable or disable dynamic adjustment of the
    number of threads:
      • OMP_SET_DYNAMIC

10
OpenMP Runtime Routines Cont.
  • To determine whether dynamic thread adjustment is
    enabled:
      • OMP_GET_DYNAMIC
  • To initialize and disassociate a lock associated
    with a lock variable:
      • OMP_INIT_LOCK
      • OMP_DESTROY_LOCK
  • To own and release a lock (see the sketch after
    this list):
      • OMP_SET_LOCK
      • OMP_UNSET_LOCK
  • Clock timing routine (returns the timer
    resolution in seconds):
      • OMP_GET_WTICK
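
A minimal sketch of the lock routines in action (the counter and variable names are illustrative, not from the slides):

  #include <omp.h>
  #include <stdio.h>

  int main(void)
  {
      omp_lock_t lock;
      int shared_count = 0;  /* illustrative shared counter */

      omp_init_lock(&lock);            /* initialize the lock */

      #pragma omp parallel
      {
          omp_set_lock(&lock);         /* own the lock: one thread at a time */
          shared_count++;
          omp_unset_lock(&lock);       /* release the lock */
      }

      printf("count = %d\n", shared_count);
      omp_destroy_lock(&lock);         /* disassociate the lock */
      return 0;
  }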

11
General Code Structure
  #include <omp.h>

  main ()  {
    int var1, var2, var3;

    /* Serial code */

    /* Beginning of parallel section: fork a team of
       threads and specify variable scoping */
    #pragma omp parallel private(var1, var2) shared(var3)
    {
      /* Parallel section executed by all threads */

      /* All threads join master thread and disband */
    }

    /* Resume serial code */
  }

  • The omp keyword identifies the pragma as an
    OpenMP pragma, so it is processed only by
    OpenMP-aware compilers.

12
Parallel Region Example
  #include <omp.h>
  #include <stdio.h>

  main ()  {
    int nthreads, tid;

    /* Fork a team of threads */
    #pragma omp parallel private(tid)
    {
      tid = omp_get_thread_num();        /* Obtain thread id */
      printf("Hello World from thread %d\n", tid);

      if (tid == 0)  {                   /* Only master thread does this */
        nthreads = omp_get_num_threads();
        printf("Number of threads = %d\n", nthreads);
      }
    }  /* All threads join master thread and terminate */
  }

13
for Directive Example
  #include <omp.h>
  #define CHUNKSIZE 10
  #define N 100

  main ()  {
    int i, chunk;
    float a[N], b[N], c[N];

    for (i = 0; i < N; i++)
      a[i] = b[i] = i * 1.0;
    chunk = CHUNKSIZE;

    #pragma omp parallel shared(a,b,c,chunk) private(i)
    {
      #pragma omp for schedule(dynamic,chunk) nowait
      for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];
    }  /* end of parallel section */
  }

14
sections Directive Example

  #include <omp.h>
  #define N 1000

  main ()  {
    int i;
    float a[N], b[N], c[N], d[N];

    for (i = 0; i < N; i++)  {
      a[i] = i * 1.5;
      b[i] = i + 22.35;
    }

    #pragma omp parallel shared(a,b,c,d) private(i)
    {
      #pragma omp sections nowait
      {
        #pragma omp section
        for (i = 0; i < N; i++)
          c[i] = a[i] + b[i];

        #pragma omp section
        for (i = 0; i < N; i++)
          d[i] = a[i] * b[i];
      }  /* end of sections */
    }  /* end of parallel section */
  }

15
critical Directive Example
  #include <omp.h>

  main()  {
    int x;
    x = 0;

    #pragma omp parallel shared(x)
    {
      #pragma omp critical
      x = x + 1;
    }  /* end of parallel section */
  }

16
threadprivate Directive Example
  #include <omp.h>
  #include <stdio.h>

  int a, b, i, tid;
  float x;

  #pragma omp threadprivate(a, x)

  main ()  {
    /* Explicitly turn off dynamic threads */
    omp_set_dynamic(0);

    printf("1st Parallel Region:\n");
    #pragma omp parallel private(b,tid)
    {
      tid = omp_get_thread_num();
      a = tid;
      b = tid;
      x = 1.1 * tid + 1.0;
      printf("Thread %d:  a,b,x = %d %d %f\n", tid, a, b, x);
    }  /* end of parallel section */

    printf("Master thread doing serial work here\n");

    printf("2nd Parallel Region:\n");
    #pragma omp parallel private(tid)
    {
      tid = omp_get_thread_num();
      printf("Thread %d:  a,b,x = %d %d %f\n", tid, a, b, x);
    }  /* end of parallel section */
  }

17
reduction Clause Example
  #include <omp.h>
  #include <stdio.h>

  main ()  {
    int i, n, chunk;
    float a[100], b[100], result;

    n = 100;
    chunk = 10;
    result = 0.0;

    for (i = 0; i < n; i++)  {
      a[i] = i * 1.0;
      b[i] = i * 2.0;
    }

    #pragma omp parallel for default(shared) private(i) \
            schedule(static,chunk) reduction(+:result)
    for (i = 0; i < n; i++)
      result = result + (a[i] * b[i]);

    printf("Final result = %f\n", result);
  }

18
OpenMP - Pros and Cons
  • Pros
      • Simple.
      • Incremental parallelism.
      • Decomposition is handled automatically.
      • Unified code for both serial and parallel
        applications.
  • Cons
      • Runs only on shared-memory multiprocessors.
      • Scalability is limited by the memory
        architecture.
      • Reliable error handling is missing.

19
Performance of arrayUpdate.c
  • Test done on a 2 GHz Intel Core 2 Duo with 1 GB
    667 MHz DDR2 SDRAM.

    Array Size    Serial (sec)    Parallel (sec)
    1000          0.000221        0.000389
    5000          0.001060        0.000999
    10000         0.002201        0.001323
    50000         0.011266        0.005892
    100000        0.22638         0.011715
    500000        0.114033        0.068110
    1000000       0.227713        0.123106
    5000000       1.134773        0.579176
    10000000      2.307644        1.151099
    50000000      12.536466       5.772921
    100000000     194.245929      58.532328
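
The arrayUpdate.c source itself is not included in this transcript; the following is only a hypothetical reconstruction of such a benchmark, timed with omp_get_wtime(), with the array size and the per-element update formula assumed:

  #include <omp.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
      long n = 1000000;                  /* assumed array size */
      long i;
      double *a = malloc(n * sizeof *a);

      double start = omp_get_wtime();    /* wall-clock timer */
      #pragma omp parallel for shared(a) private(i)
      for (i = 0; i < n; i++)
          a[i] = (double)i * 2.0 + 1.0;  /* assumed per-element update */
      double t = omp_get_wtime() - start;

      printf("n = %ld, time = %f sec\n", n, t);
      free(a);
      return 0;
  }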

20
arrayUpdate.c Cont.
21
References
  • http://www.openmp.org/
  • Parallel Programming in OpenMP, Morgan Kaufmann
    Publishers.

22
  • Thank You