Atomic Sections - PowerPoint PPT Presentation

1
Atomic Sections: A Design and Evaluation Study under OpenMP-XN
Presenters: Joseph Bryant Manzano Franco, Yuan Zhang, Guang R. Gao
Computer Architecture and Parallel Systems Laboratory, University of Delaware
In collaboration with Kemal Ebcioglu, Vivek Sarkar, and the X10 Team, IBM
2
Context: IBM PERCS/X10 project
  • Phase 2 of the DARPA HPCS program focuses on evaluating
    new technologies for productivity and performance.

PERCS Programming Tools: performance-guided parallelization and transformation, static and dynamic checking, and separation of concerns, all integrated into a single development environment (Eclipse)
Atomic Sections under OpenMP-XN (Milestone 4)
  • Focusing only on extensions to the familiar SPMD
    model is essential to the PERCS programming model.
  • Facilitate comparative studies on a large body
    of OpenMP code.
  • Permit an early study to begin before the X10
    project is underway.
  • Permit risky ideas to be studied in a more
    focused context.
  • The algorithms developed can become a basis for
    future implementation and integration under the
    X10 framework.

PERCS Programming Model (diagram layers):
  • PERCS Programming Model, OpenMP, MPI
  • Static and dynamic compilers for base languages with programming model extensions
    • Mature languages: C/C++, Fortran, Java
    • Emerging languages: UPC, StreamIt
    • Experimental language: X10
  • Language runtime, dynamic compilation, continuous optimization
  • PERCS System Software (K42, vHype)
  • PERCS System Hardware
3
Major Goals
  • We show how OpenMP can be extended with the
    concept of atomic sections.
  • We develop a methodology for implementing
    analyzable atomic sections that includes three
    steps: (1) identifying the consistency list for
    each atomic section, (2) assigning locks to
    concurrent atomic sections to expose maximum
    parallelism at minimum locking cost, and (3)
    placing fine-grained synchronization.
  • We develop an OpenMP-XN prototype implementation
    framework and report the results and analysis of
    our experiments on a selected set of benchmarks.
  • We have conducted a productivity study showing
    how OpenMP-XN with analyzable atomic sections can
    improve programming productivity, measured by the
    time to the first correct implementation of
    example programs.

4
Random Access (annotated code example; callouts in order):
  • The one-dimensional array is passed by reference.
  • Initialize the table.
  • Start a parallel region and specify the shared and private data.
  • Initialize ran for each thread with a random number seeded by the thread ID and scheduling information.
  • Each thread executes a subset of the iterations of the for loop.
  • The atomic section synchronizes accesses to the shared table by making its operations atomic and mutually exclusive.
Atomic section (AS): a section of code that is intended to be executed atomically and mutually exclusively with other conflicting atomic operations.
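The annotated kernel above can be approximated by the minimal Python sketch below; it is not the OpenMP-XN code from the slide. It assumes the HPC Challenge RandomAccess conventions (table initialized with Table[i] = i, a 64-bit shift/XOR generator with polynomial 0x7, and updates Table[ran & (size - 1)] ^= ran) and models the atomic section as a threading.Lock held around each read-modify-write. The names random_access, next_ran, and thread_seed, and the per-thread seeding rule, are hypothetical.

```python
import threading

POLY = 0x7                 # HPCC RandomAccess generator polynomial (assumed)
MASK64 = (1 << 64) - 1     # emulate 64-bit unsigned arithmetic


def next_ran(ran):
    """Advance the 64-bit shift/XOR random number generator by one step."""
    high_bit_set = (ran >> 63) & 1
    ran = (ran << 1) & MASK64
    return ran ^ POLY if high_bit_set else ran


def thread_seed(tid):
    """Hypothetical per-thread seed derived from the thread id (never zero)."""
    return ((tid + 1) * 0x9E3779B97F4A7C15) & MASK64


def random_access(log2_size, updates_per_thread, num_threads):
    size = 1 << log2_size
    table = list(range(size))       # initialize the shared table
    table_lock = threading.Lock()   # stands in for the atomic section

    def worker(tid):
        ran = thread_seed(tid)      # private per-thread generator state
        for _ in range(updates_per_thread):
            ran = next_ran(ran)
            idx = ran & (size - 1)
            with table_lock:        # atomic, mutually exclusive update
                table[idx] ^= ran

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table
```

Because XOR updates commute, the final table is the same for any interleaving of atomic updates, so a parallel run can be checked against a sequential replay of the same per-thread streams. A single global lock like this one corresponds to the critical-section baseline; an atomic-section implementation is free to use finer-grained locks.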
5
Atomic Section
  • A section of code that is intended to be
    executed atomically and mutually exclusively with
    other conflicting atomic operations.

6
OpenMP-XN Runtime Model
7
Atomic Section Implementation
An atomic section is implemented as a five-step process: (1) acquire lock, (2) refresh, (3) computation, (4) write-back, (5) release lock.
Note: The five-step process is generated automatically by the OpenMP-XN compiler. High-productivity programmers need not know about lock assignment and data replacement.
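The five steps above can be illustrated with the following Python sketch. It is a hypothetical model, not OpenMP-XN compiler output: shared data is a dictionary, the lock is a threading.Lock, refresh copies the variables in the section's consistency list into a private copy, and write-back copies back only the variables whose values changed. The names SharedStore, run_atomic_section, transfer, and demo are illustrative.

```python
import threading


class SharedStore:
    """Hypothetical shared memory: named shared variables in a dict."""

    def __init__(self, **values):
        self.values = dict(values)


def run_atomic_section(store, lock, consistency_list, body):
    """Run `body` on private copies of the listed variables, atomically."""
    lock.acquire()                                         # (1) acquire lock
    try:
        # (2) refresh: copy shared values into a private working set
        local = {name: store.values[name] for name in consistency_list}
        before = dict(local)
        body(local)                                        # (3) computation
        for name in consistency_list:                      # (4) write-back,
            if local[name] != before[name]:                #     changed values only
                store.values[name] = local[name]
    finally:
        lock.release()                                     # (5) release lock


def transfer(local):
    """Example atomic section body: move 10 units between two accounts."""
    local["checking"] -= 10
    local["savings"] += 10


def demo(num_threads=4, transfers_per_thread=100):
    store = SharedStore(checking=5000, savings=0)
    lock = threading.Lock()

    def worker():
        for _ in range(transfers_per_thread):
            run_atomic_section(store, lock, ["checking", "savings"], transfer)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.values
```

With the defaults, demo() leaves checking at 1000 and savings at 4000, since all 400 transfers are serialized by the lock. Skipping unchanged variables in step (4) is one simple way to reduce write-back traffic, in the spirit of the Generation of Consistency Actions (GCA) step.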
8
Atomic Section Implementation (cont.)
  • Assumptions
    • No nested atomic sections
    • No nested parallel regions
  • A three-step approach
    • Consistency List Analysis (CLA): given an OpenMP-XN program, analyze each atomic section and identify the shared data that might be read or written within it.
    • Lock Refinement and Assignment (LRA): given an OpenMP-XN program, assign one or more locks to guard the entrance of each atomic section, so that any pair of concurrent atomic sections that might access the same shared data is guarded by the same lock.
    • Generation of Consistency Actions (GCA): generate refresh and write_back operations in atomic sections so that the number of these operations executed at run time is minimized.
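One simple way to satisfy the LRA condition above is sketched below in Python. It assumes CLA has already produced a consistency list (a set of shared variable names) for each atomic section, and it conservatively assumes that every pair of sections may run concurrently. Sections whose lists overlap, directly or transitively, are merged with union-find and share one lock. This is an illustrative strategy, not necessarily the algorithm used by OpenMP-XN; assign_locks and the example sections are hypothetical.

```python
from typing import Dict, Set


def assign_locks(consistency_lists: Dict[str, Set[str]]) -> Dict[str, int]:
    """Give each atomic section a lock id such that any two sections whose
    consistency lists share a variable receive the same lock id."""
    parent = {section: section for section in consistency_lists}

    def find(section: str) -> str:
        # Follow parent links to the group representative, halving the path.
        while parent[section] != section:
            parent[section] = parent[parent[section]]
            section = parent[section]
        return section

    first_user: Dict[str, str] = {}   # shared variable -> first section using it
    for section, variables in consistency_lists.items():
        for var in sorted(variables):
            if var in first_user:
                root_a, root_b = find(first_user[var]), find(section)
                if root_a != root_b:
                    parent[root_b] = root_a   # merge the two lock groups
            else:
                first_user[var] = section

    # Number the groups in order of first appearance.
    lock_of_root: Dict[str, int] = {}
    assignment: Dict[str, int] = {}
    for section in consistency_lists:
        root = find(section)
        if root not in lock_of_root:
            lock_of_root[root] = len(lock_of_root)
        assignment[section] = lock_of_root[root]
    return assignment


# Hypothetical consistency lists for four atomic sections.
EXAMPLE_SECTIONS = {
    "deposit": {"balance"},
    "withdraw": {"balance", "log"},
    "audit": {"log"},
    "update_table": {"table"},
}
```

Here assign_locks(EXAMPLE_SECTIONS) puts deposit, withdraw, and audit under lock 0 and update_table under lock 1. Note that deposit and audit share no variable yet end up under the same lock because both overlap withdraw; that is the price of giving each section a single lock. Assigning several locks per section (for example, one per variable group, acquired in a fixed global order to avoid deadlock) can expose more parallelism at a higher locking cost, which is the trade-off LRA aims to balance.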

9
OpenMP-XN Experimental Testbed Structure (based
on Omni)
10
Experiments
  • A preliminary implementation of AS in OpenMP-XN
    has been completed and tested
  • Performance analysis of OpenMP-XN on DARPA HPCS
    benchmarks and other benchmarks is in progress
  • A productivity study on AS has been conducted

Benchmark | Description | Source
Micro-Benchmarks | A set of small benchmarks to test the performance of atomic sections | Delaware internal benchmarks
Delaware Banker | A simple simulator of bank transactions, implemented in parallel | Delaware internal benchmarks
TAMMP | Toy Another Molecular Mechanics Program: a kernel of ammp, the SPEC OMP molecular dynamics benchmark | -
Random Access | RandomAccess benchmark modified to run under OpenMP-XN | HPC Challenge (modified version)
Radix Sort | Implementation of the parallel integer radix sort algorithm | IBM MAMBO benchmarks
Gram-Schmidt Orthonormalization | Composed of dot products derived from IBM benchmarks | Developed based on the MAMBO/HPCS Inner-Product program
11
Preliminary Experimental Results
  • The current OpenMP-XN platform lets users collect or derive:
    • Execution time
    • Speedup curves
    • Performance statistics: cache consistency traffic, cache misses, number of memory operations, and CPU cycles of each computation unit in the program
  • Case study
    • Testbed: Sun UltraSPARC III, 4 CPUs, 400 MHz
    • Benchmark: Random Access
    • Problem size: 2^14
    • Comparison: atomic section vs. critical section
  • With the right architectural support, atomic sections will not introduce performance overhead.
  • The Delaware tool chain supports more in-depth performance studies; the observation below is one example.
  • Preliminary performance observations
    • On conventional hardware platforms, the memory wall (especially cache consistency traffic) is a performance bottleneck.
    • Realizing the performance potential of atomic sections requires architectural innovations.

Version | Number of snoops (average over 240 runs)
Critical Section | 2027.6
Atomic Section | 1185.6
Observation: The OpenMP-XN runtime model based on atomic sections appears to reduce the number of coherence transactions considerably for this example, compared with standard OpenMP critical sections. However, further study is needed to explain and explore this result.
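As a quick quantification of "considerably", the two averages correspond to roughly a 41.5% reduction in snoops. Here N_CS and N_AS are notation introduced for the critical-section and atomic-section averages:

```latex
\frac{N_{\text{AS}}}{N_{\text{CS}}} = \frac{1185.6}{2027.6} \approx 0.585,
\qquad
1 - \frac{N_{\text{AS}}}{N_{\text{CS}}}
  = \frac{2027.6 - 1185.6}{2027.6}
  = \frac{842.0}{2027.6} \approx 0.415
```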
Future results will be obtained from PowerPC
systems (work already in progress)