OpenMP for Networks of SMPs Y. Charlie Hu, Honghui Lu, Alan L. Cox, Willy Zwaenepoel

1
OpenMP for Networks of SMPs
Y. Charlie Hu, Honghui Lu, Alan L. Cox, Willy Zwaenepoel

ECE1747 Parallel Programming
Vicky Tsang

2
Background

Published in the Journal of Parallel and
Distributed Computing, vol. 60 (12), pp.
1512-1530, December 2000
Work to further improve TreadMarks
Presents an alternative solution to MPI

3
Roadmap

Motivation
Solution
OpenMP API
TreadMarks
OpenMP Translator
Performance Measurement
Results
Conclusion

4
Motivation

To enable the programmer to rely on a single,
standard, shared-memory API for parallelization
within and between multiprocessors.
To provide another standard other than MPI?

5
Solution

Presents the first system that implements OpenMP
on a network of shared-memory multiprocessors
Implemented via a translator converting OpenMP
directives to calls in modified TreadMarks
Modified TreadMarks uses POSIX threads for
parallelism within an SMP node

6
Solution

Original version of TreadMarks
A Unix process was executed on each processor of
the multiprocessor node and communication between
processes was achieved through message passing
Fails to take advantage of hardware shared memory

7
Solution

Modified version of TreadMarks
POSIX threads used to implement parallelism
OpenMP threads within a multiprocessor share a
single address space
Positive
Reduces the number of changes to TreadMarks to
support multithreading on a multiprocessor
OS maintains the coherence of page mappings
automatically
Negative
More difficult to provide uniform sharing of
memory between threads on the same node and
threads on different nodes

8
OpenMP API

Three kinds of directives
Parallelism/work sharing
Data environment
Synchronization
Based on a fork-join model
Sequential code sections executed by master
thread
Parallel code sections are executed by all
threads, including the master thread

9
OpenMP API

Parallel directive: all threads perform the same
computation
Work-sharing directive: computation is divided
among the threads
Data environment directives: control the sharing
of program variables
Synchronization directives: control the
synchronization between threads
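
The kinds of directives listed above can be seen together in one small C function. This is a minimal sketch, not code from the paper: the function name stats and its parameters are made up for illustration. It uses only standard OpenMP pragmas, so a compiler without OpenMP support ignores them and runs the same code sequentially.

```c
#include <assert.h>

/* Count the elements of x above `limit` and find the largest element.
 * Hypothetical example; the name and signature are not from the paper. */
void stats(const double *x, int n, double limit, int *count_out, double *max_out)
{
    assert(n > 0);
    int count = 0;
    double max = x[0];

    /* Parallel directive: every thread executes the block below.
     * Data environment clause: the listed variables are shared. */
    #pragma omp parallel shared(x, n, limit, count, max)
    {
        /* Declared inside the region, so each thread has a private copy. */
        double local_max = x[0];
        int i;

        /* Work-sharing directive: loop iterations are divided among the
         * threads; the reduction combines the per-thread counts. */
        #pragma omp for reduction(+:count)
        for (i = 0; i < n; i++) {
            if (x[i] > limit)
                count++;
            if (x[i] > local_max)
                local_max = x[i];
        }

        /* Synchronization directive: one thread at a time updates max. */
        #pragma omp critical
        {
            if (local_max > max)
                max = local_max;
        }
    }

    *count_out = count;
    *max_out = max;
}
```

Calling stats on the array {1, 5, 3, 9, 2} with a limit of 2.5 should count three elements and report 9 as the maximum, with or without OpenMP enabled.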

10
TreadMarks

User-level SDSM system
Provides a global shared address space on top of
physically distributed memories
Key functions performed are memory coherence and
synchronization

11
TreadMarks Memory Coherence

Minimize the amount of communication performed to
maintain memory consistency by
a lazy implementation of release consistency
reducing the impact of false sharing by allowing
multiple concurrent writers to modify a page
Propagation of consistency information is
postponed until the time of an acquire
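
Multiple concurrent writers are commonly supported with twins and diffs, the mechanism TreadMarks is known for: a writer copies the page before its first write (the twin), later records only the words that changed (the diff), and other copies are brought up to date by applying that diff. The sketch below illustrates the idea on a tiny array standing in for a page. All names (PAGE_WORDS, DiffEntry, make_twin, make_diff, apply_diff) are assumptions for illustration and not the TreadMarks source; in the real system the diffs would travel between nodes lazily, at acquire time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_WORDS 8   /* tiny stand-in for a page; real pages are far larger */

/* One modified word: its position in the page and its new value. */
typedef struct {
    size_t index;
    int value;
} DiffEntry;

/* Save a pristine copy (the twin) of the page before the first local write. */
void make_twin(const int *page, int *twin)
{
    memcpy(twin, page, PAGE_WORDS * sizeof(int));
}

/* Compare the page with its twin and record only the words that changed.
 * Returns the number of entries written to diff (at most PAGE_WORDS). */
size_t make_diff(const int *page, const int *twin, DiffEntry *diff)
{
    size_t n = 0;
    for (size_t i = 0; i < PAGE_WORDS; i++) {
        if (page[i] != twin[i]) {
            diff[n].index = i;
            diff[n].value = page[i];
            n++;
        }
    }
    return n;
}

/* Bring a local copy up to date by applying another writer's diff. */
void apply_diff(int *page, const DiffEntry *diff, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(diff[i].index < PAGE_WORDS);
        page[diff[i].index] = diff[i].value;
    }
}
```

If two writers modify different words of the same page, exchanging diffs leaves both copies identical. This is how false sharing is tolerated without moving whole pages back and forth.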

12
TreadMarks - Synchronization

Barrier implemented as acquire and release
messages
Governed by a centralized manager

13
TreadMarks Modifications for OpenMP

Inclusion of two primitives
Tmk_fork
Tmk_join
All threads are created at the start of a program's
execution to minimize overhead.
Slave threads are blocked during sequential
execution until the next Tmk_fork is issued by
the master thread.
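
The blocking behaviour described above can be sketched with POSIX threads. This is an illustration under assumptions, not the TreadMarks implementation: the names pool_start, pool_fork, pool_join, pool_stop and mark_slot are hypothetical, and the slides do not give the real signatures of Tmk_fork and Tmk_join. Slave threads are created once, sleep on a condition variable during sequential execution, and wake up each time the master forks a region.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_SLAVES 3

typedef void (*region_fn)(int thread_id, void *arg);

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_ready = PTHREAD_COND_INITIALIZER;
static pthread_cond_t work_done = PTHREAD_COND_INITIALIZER;
static pthread_t slaves[NUM_SLAVES];
static int slave_ids[NUM_SLAVES];
static region_fn current_fn;
static void *current_arg;
static unsigned generation;   /* incremented by every fork */
static int pending;           /* slaves still running the current region */
static int shutting_down;

static void *slave_main(void *p)
{
    int id = *(int *)p;
    unsigned seen = 0;        /* last generation this slave executed */
    for (;;) {
        pthread_mutex_lock(&lock);
        while (generation == seen && !shutting_down)
            pthread_cond_wait(&work_ready, &lock);   /* blocked while sequential */
        if (shutting_down) {
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        seen = generation;
        region_fn fn = current_fn;
        void *arg = current_arg;
        pthread_mutex_unlock(&lock);

        fn(id, arg);          /* run the parallel region */

        pthread_mutex_lock(&lock);
        if (--pending == 0)
            pthread_cond_signal(&work_done);
        pthread_mutex_unlock(&lock);
    }
}

/* Create all slave threads once, at program start. Call only once. */
void pool_start(void)
{
    for (int i = 0; i < NUM_SLAVES; i++) {
        slave_ids[i] = i + 1;                 /* id 0 is the master */
        pthread_create(&slaves[i], NULL, slave_main, &slave_ids[i]);
    }
}

/* Wake the slaves for one parallel region; the master takes part too. */
void pool_fork(region_fn fn, void *arg)
{
    pthread_mutex_lock(&lock);
    current_fn = fn;
    current_arg = arg;
    pending = NUM_SLAVES;
    generation++;
    pthread_cond_broadcast(&work_ready);
    pthread_mutex_unlock(&lock);
    fn(0, arg);
}

/* Block the master until every slave has finished the region. */
void pool_join(void)
{
    pthread_mutex_lock(&lock);
    while (pending > 0)
        pthread_cond_wait(&work_done, &lock);
    pthread_mutex_unlock(&lock);
}

/* Release the slaves at program exit. Call only after the last join. */
void pool_stop(void)
{
    pthread_mutex_lock(&lock);
    shutting_down = 1;
    pthread_cond_broadcast(&work_ready);
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < NUM_SLAVES; i++)
        pthread_join(slaves[i], NULL);
}

/* Example region: each thread adds (id + 1) to its own slot. */
void mark_slot(int thread_id, void *arg)
{
    ((int *)arg)[thread_id] += thread_id + 1;
}
```

Because the threads outlive each region, a fork costs one broadcast rather than a round of thread creation, which is the overhead the slide refers to. Each pool_fork must be followed by a pool_join before the next fork.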

14
TreadMarks Modifications for Networks of
Multiprocessors

POSIX threads enable sharing of data between
processors. Some data structures, such as message
buffers, were placed in thread-private memory
for data that is to remain private within a
thread.
A per-page mutex was added to allow greater
concurrency in the page fault handler.
Synchronization functions in TreadMarks were
modified to use POSIX thread-based
synchronization between processors within a node
and existing TreadMarks synchronization functions
between nodes.
A second mapping was added for the memory that is
shared between nodes so shared-memory pages can
be updated while the first mapping remains
invalid until the update is complete. This
reduces the number of page protection operations
performed by TreadMarks.
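
The second-mapping idea can be sketched with standard POSIX calls. This is an illustration under assumptions, not the TreadMarks code: the function name double_map_demo and the temporary backing file are made up for the example. The same page is mapped twice. The application's view starts out inaccessible, the runtime writes the update through its own always-writable view, and a single mprotect call then makes the finished page visible.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Update a page through a second mapping, then read it through the first.
 * Copies the page's string contents into out. Returns 0 on success, -1 on
 * failure. Hypothetical helper for illustration only. */
int double_map_demo(char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    FILE *backing = tmpfile();            /* temporary file backs the page */
    if (backing == NULL)
        return -1;
    int fd = fileno(backing);
    if (ftruncate(fd, (off_t)page) != 0) {
        fclose(backing);
        return -1;
    }

    /* First mapping: the application's view, invalid (no access) for now. */
    char *app_view = mmap(NULL, page, PROT_NONE, MAP_SHARED, fd, 0);
    /* Second mapping: the runtime's view of the same page, always writable. */
    char *runtime_view = mmap(NULL, page, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (app_view == MAP_FAILED || runtime_view == MAP_FAILED) {
        fclose(backing);
        return -1;
    }

    /* Apply the update through the second mapping. The first mapping stays
     * invalid, so no application thread can observe a half-updated page. */
    strcpy(runtime_view, "updated page");

    /* One protection change makes the completed page readable. */
    if (mprotect(app_view, page, PROT_READ) != 0) {
        munmap(app_view, page);
        munmap(runtime_view, page);
        fclose(backing);
        return -1;
    }

    strncpy(out, app_view, out_len - 1);
    out[out_len - 1] = '\0';

    munmap(app_view, page);
    munmap(runtime_view, page);
    fclose(backing);
    return 0;
}
```

With a single mapping, the runtime would have to make the page writable, apply the update, and then set the final protection. The second mapping removes the intermediate protection change.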

15
OpenMP Translator

Synchronization directives translate directly to
TreadMarks synchronization operations.
The compiler translates the code sections marked
with parallel directives into fork-join code.
Data environment directives implemented to work
with both TreadMarks and POSIX threads, hiding
the interface issues from the programmer.
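
The translation of a parallel region into fork-join code can be sketched as follows. This is a hand-written illustration, not output of the paper's translator: RegionArgs, region_body and translated_region are hypothetical names, and pthread_create and pthread_join stand in for the fork and join calls (the actual system keeps its threads alive between regions). The region body is moved into its own function, the shared variables are passed through a struct, and the loop iterations are split into one block per thread.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4
#define N 1000

/* Original OpenMP source, for reference:
 *
 *     #pragma omp parallel for
 *     for (i = 0; i < N; i++)
 *         a[i] = 2 * i;
 */

/* Shared variables of the region, plus the id of the executing thread. */
typedef struct {
    int *a;
    int thread_id;
} RegionArgs;

/* The region body, outlined into a function. Each thread computes its own
 * block of the iteration space from its thread id. */
static void *region_body(void *p)
{
    RegionArgs *args = p;
    int chunk = (N + NUM_THREADS - 1) / NUM_THREADS;
    int lo = args->thread_id * chunk;
    int hi = (lo + chunk < N) ? lo + chunk : N;
    for (int i = lo; i < hi; i++)
        args->a[i] = 2 * i;
    return NULL;
}

/* Code that replaces the directive: fork, run the body, join. */
void translated_region(int *a)
{
    pthread_t tid[NUM_THREADS];
    RegionArgs args[NUM_THREADS];

    for (int t = 1; t < NUM_THREADS; t++) {        /* fork */
        args[t].a = a;
        args[t].thread_id = t;
        pthread_create(&tid[t], NULL, region_body, &args[t]);
    }

    args[0].a = a;                                  /* master takes part */
    args[0].thread_id = 0;
    region_body(&args[0]);

    for (int t = 1; t < NUM_THREADS; t++)           /* join */
        pthread_join(tid[t], NULL);
}
```

After translated_region returns, every element a[i] holds 2 * i, the same result as the sequential loop.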

16
Performance Measurement

Platform
IBM SP2 consisting of four SMP nodes
Per node
Four IBM PowerPC 604 processors
1 GB memory
Running AIX 4.2

17
Performance Measurement

Applications
SPLASH-2 Barnes-Hut
NAS 3D-FFT
SPLASH-2 CLU
SPLASH-2 Water
Red-Black SOR
TSP
Modified Gram-Schmidt (MGS)

18
Results

19
Results

20
Results

21
Results

22
Conclusion

Enables the programmer to rely on a single,
standard, shared-memory API for parallelization
within and between multiprocessors.
Using hardware shared memory reduced the amount of
data and the number of messages transmitted.
The speedups of multithreaded TreadMarks codes on
four four-way SMP SP2 nodes are within 7-30% of
the MPI versions.

23
Critique

Solution allows easier implementation of program
parallelization across multiprocessors if speedup
is not crucial
OpenMP is easier on the programmer but speedup
still not as good as MPI

24
Critique

Issues
AIX has inefficient implementation of page
protection
Paper claims that every other brand of Unix,
including Linux, uses data structures that handle
mprotect operations more efficiently
Why wasn't the solution implemented on another
platform?
Paper failed to present a big motivation for
using this solution over MPI.

25
Thank You
