Parallel Processing (CS 730) Lecture 5: Shared Memory Parallel Programming with OpenMP*
1
Parallel Processing (CS 730) Lecture 5
Shared Memory Parallel Programming with OpenMP
  • Jeremy R. Johnson
  • Wed. Jan. 31, 2001
  • Parts of this lecture were derived from chapters
    1-2 of Chandra et al.

2
Introduction
  • Objective: To further study the shared memory
    model of parallel programming, and to introduce
    the OpenMP standard for shared memory parallel
    programming
  • Topics
  • Introduction to OpenMP
  • hello.c
  • hello.f
  • Loop level parallelism
  • Shared vs. private variables
  • Synchronization (implicit and explicit)
  • Parallel regions

3
OpenMP
  • Extension to FORTRAN, C/C++
  • Uses directives (comments in FORTRAN, pragmas in
    C/C++)
  • ignored without compiler support
  • Some library support required
  • Shared memory model
  • parallel regions
  • loop level parallelism
  • implicit thread model
  • communication via shared address space
  • private vs. shared variables (declaration)
  • explicit synchronization via directives (e.g.
    critical)
  • library routines for returning thread information
    (e.g. omp_get_num_threads(), omp_get_thread_num())
  • environment variables used to provide system info
    (e.g. OMP_NUM_THREADS); the sketch after this list
    ties these pieces together
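
A minimal C sketch (not from the original slides) illustrating the pieces above: a parallel directive, a shared variable and a private one, an explicit critical section, and the thread-information routines:

  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
      int count = 0;                      /* shared: declared outside the region */
      #pragma omp parallel
      {
          int id = omp_get_thread_num();  /* private: declared inside the region */
          #pragma omp critical            /* explicit synchronization */
          count += 1;
          printf("thread %d of %d\n", id, omp_get_num_threads());
      }
      printf("count = %d\n", count);      /* one increment per thread */
      return 0;
  }

With OMP_NUM_THREADS set to 4, count ends up as 4; without the critical directive the increments could race.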

4
Benefits
  • Provides incremental parallelism
  • Small increase in code size
  • Simpler model than message passing
  • Easier to use than a thread library
  • With hardware and compiler support, finer
    granularity than message passing.

5
Further Information
  • Adopted as a standard in 1997
  • Initiated by SGI
  • www.openmp.org
  • Chandra, Dagum, Kohr, Maydan, McDonald, Menon,
    Parallel Programming in OpenMP, Morgan Kaufmann
    Publishers, 2001.

6
Shared vs. Distributed Memory
[Figure: processors P0, P1, ..., Pn and memories M0, M1, ..., Mn joined by an interconnection network; in the shared memory organization all processors access a single shared memory, while in the distributed memory organization each processor Pi has its own local memory Mi]
7
Shared Memory Programming Model
  • Shared memory programming does not require
    physically shared memory so long as there is
    support for logically shared memory (in either
    hardware or software)
  • With logically shared memory, the cost of a
    memory access may differ depending on the
    physical location of the data.
  • UMA - uniform memory access
  • SMP - symmetric multi-processor
  • typically memory connected to processors via a
    bus
  • NUMA - non-uniform memory access
  • typically physically distributed memory connected
    via an interconnection network

8
IBM S80
  • An SMP with up to 24 processors (RS64 III
    processors)
  • http://www.rs6000.ibm.com/hardware/enterprise/s80.html
  • Name: Goopi.coe.drexel.edu
  • Machine type: S80 12-way with 8 GB RAM
  • Specifications:
  • 2 x 6-way 450 MHz RS64 III processor card,
    8 MB L2 cache
  • 2 x 4096 MB memory
  • 9 x 18.2 GB Ultra SCSI hot-swappable hard
    disk drives
  • Name: bagha.coe.drexel.edu
  • Machine type: 44P Model 270, 4-way with 2 GB RAM
  • Specifications:
  • 2 x 2-way 375 MHz POWER3-II processor,
    4 MB L2 cache
  • 4 x 512 MB SDRAM DIMMs
  • 2 x 9.1 GB Ultra SCSI HDD

9
hello.c
  #include <stdio.h>
  #include <stdlib.h>
  #include <omp.h>

  int main(int argc, char *argv[])
  {
      int n;
      n = atoi(argv[1]);
      omp_set_num_threads(n);
      /* outside a parallel region there is one thread */
      printf("Number of threads = %d\n", omp_get_num_threads());
      #pragma omp parallel
      {
          int id = omp_get_thread_num();
          if (id == 0)
              printf("Number of threads = %d\n", omp_get_num_threads());
          printf("Hello World from %d\n", id);
      }
      exit(0);
  }

10
hello.f
      program hello
      integer omp_get_thread_num, omp_get_num_threads
      print *, "Hello parallel world from threads"
      print *, "Num threads = ", omp_get_num_threads()
!$omp parallel
      print *, "Num threads = ", omp_get_num_threads()
      print *, omp_get_thread_num()
!$omp end parallel
      print *, "Back to the sequential world"
      end

11
Compiling and Executing OpenMP Programs on the
IBM S80
  • To compile a C program with OpenMP directives:
  • cc_r -qsmp=omp hello.c -o hello
  • To compile a Fortran program with OpenMP
    directives:
  • xlf_r -qsmp=omp hello.f -o hello
  • The environment variable OMP_NUM_THREADS controls
    the number of threads used in OpenMP parallel
    regions. It can be set from the C shell:
  • setenv OMP_NUM_THREADS <count>
  • where <count> is a positive integer
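
For example, a complete C-shell session for hello.c, using four threads (the thread count is illustrative), would look like:

  cc_r -qsmp=omp hello.c -o hello
  setenv OMP_NUM_THREADS 4
  ./hello 4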

12
Parallel Loop
Serial version:

      subroutine saxpy(z, a, x, y, n)
      integer i, n
      real z(n), a, x(n), y
      do i = 1, n
         z(i) = a * x(i) + y
      end do
      return
      end

Parallel version (one directive added):

      subroutine saxpy(z, a, x, y, n)
      integer i, n
      real z(n), a, x(n), y
!$omp parallel do
      do i = 1, n
         z(i) = a * x(i) + y
      end do
      return
      end
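
A sketch of the same loop in C (the pointer-based signature is an assumption); the directive becomes a pragma, and parallelization is again a one-line change:

  void saxpy(float *z, float a, float *x, float y, int n)
  {
      int i;
      #pragma omp parallel for    /* iterations of i are divided among threads */
      for (i = 0; i < n; i++)
          z[i] = a * x[i] + y;    /* y is a scalar, as in the Fortran version */
  }

In C the combined construct is spelled "parallel for" rather than "parallel do".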
13
Execution Model
[Figure: fork-join execution model: the master thread runs alone until it reaches a parallel region, where slave threads are created implicitly; master and slave threads execute the region together, synchronize at an implicit barrier at its end, and the master thread continues alone]
14
More Complicated Example

      real*8 x, y
      integer i, j, m, n, maxiter
      integer depth(*, *)
      integer mandel_val

      maxiter = 200
      do i = 1, m
         do j = 1, n
            x = i/real(m)
            y = j/real(n)
            depth(j, i) = mandel_val(x, y, maxiter)
         end do
      end do
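
mandel_val itself is not shown on the slide. A hypothetical C version (name and arguments from the slide; the body is an assumption) returns how many iterations of z = z*z + c the point survives before escaping:

  /* Hypothetical mandel_val: iterate z = z*z + c until |z| > 2
     or maxiter is reached; the iteration count is the depth. */
  int mandel_val(double cx, double cy, int maxiter)
  {
      double zx = 0.0, zy = 0.0;
      int iter = 0;
      while (zx * zx + zy * zy <= 4.0 && iter < maxiter) {
          double t = zx * zx - zy * zy + cx;
          zy = 2.0 * zx * zy + cy;
          zx = t;
          iter++;
      }
      return iter;
  }

Each call depends only on its arguments, so the iterations of the enclosing loops are independent and safe to run in parallel.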

15
Parallel Loop
!$omp parallel do private(j, x, y)
      maxiter = 200
      do i = 1, m
         do j = 1, n
            x = i/real(m)
            y = j/real(n)
            depth(j, i) = mandel_val(x, y, maxiter)
         end do
      end do
!$omp end parallel do

(Note: this placement is incorrect; the parallel do directive must immediately precede the do loop it parallelizes. The next slide moves the maxiter assignment above the directive.)

16
Parallel Loop
      maxiter = 200
!$omp parallel do private(j, x, y)
      do i = 1, m
         do j = 1, n
            x = i/real(m)
            y = j/real(n)
            depth(j, i) = mandel_val(x, y, maxiter)
         end do
      end do
!$omp end parallel do

(Here maxiter is set before the parallel region, and the directive immediately precedes the loop it parallelizes.)
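
A C sketch of the same corrected loop (assuming m, n, and depth are declared in the enclosing scope): j, x, and y must be listed in the private clause because, unlike the parallelized loop's own index i, inner scratch variables are shared by default:

  int i, j, maxiter = 200;
  double x, y;
  #pragma omp parallel for private(j, x, y)   /* i is made private automatically */
  for (i = 1; i <= m; i++)
      for (j = 1; j <= n; j++) {
          x = i / (double)m;
          y = j / (double)n;
          depth[j][i] = mandel_val(x, y, maxiter);
      }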

17
Explicit Synchronization
      maxiter = 200
      total_iters = 0
!$omp parallel do private(j, x, y)
      do i = 1, m
         do j = 1, n
            x = i/real(m)
            y = j/real(n)
            depth(j, i) = mandel_val(x, y, maxiter)
!$omp critical
            total_iters = total_iters + depth(j, i)
!$omp end critical
         end do
      end do
!$omp end parallel do
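
The same serialized update in C (a sketch under the same assumptions as the previous example); the critical section makes the shared sum correct, but threads must take turns on every iteration, which can become a bottleneck:

  int total_iters = 0;
  #pragma omp parallel for private(j, x, y)
  for (i = 1; i <= m; i++)
      for (j = 1; j <= n; j++) {
          x = i / (double)m;
          y = j / (double)n;
          depth[j][i] = mandel_val(x, y, maxiter);
          #pragma omp critical      /* one thread at a time updates the shared sum */
          total_iters += depth[j][i];
      }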

18
Reduction Variables
      maxiter = 200
      total_iters = 0
!$omp parallel do private(j, x, y)
!$omp&   reduction(+:total_iters)
      do i = 1, m
         do j = 1, n
            x = i/real(m)
            y = j/real(n)
            depth(j, i) = mandel_val(x, y, maxiter)
            total_iters = total_iters + depth(j, i)
         end do
      end do
!$omp end parallel do
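
In C the reduction clause likewise replaces the critical section (a sketch under the same assumptions); each thread accumulates into its own private copy of total_iters, and the partial sums are combined once when the loop ends, avoiding per-iteration synchronization:

  int total_iters = 0;
  #pragma omp parallel for private(j, x, y) reduction(+:total_iters)
  for (i = 1; i <= m; i++)
      for (j = 1; j <= n; j++) {
          x = i / (double)m;
          y = j / (double)n;
          depth[j][i] = mandel_val(x, y, maxiter);
          total_iters += depth[j][i];   /* private accumulation; combined at loop end */
      }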