A Multi-platform Co-Array Fortran Compiler for High-Performance Computing
Cristian Coarfa, Yuri Dotsenko, John Mellor-Crummey
{dotsenko, ccristi, johnmc}@cs.rice.edu
Programming Ultra-scale Parallel Systems
Co-Array Fortran Language
Rice CAF Compiler
CAF Model Refinements

Parallel extension of Fortran 90
SPMD programming model
fixed number of images during execution
images operate asynchronously
Both private and shared data
real :: a(20,20)       ! private: a 20x20 array in each image
real :: a(20,20)[*]    ! shared: a 20x20 array in each image, accessible from other images
Simple one-sided communication (PUT & GET)
x(:,j:j+2) = a(r,:)[p:p+2]   ! copy rows from images p:p+2 into local columns
Flexible explicit synchronization
sync_team(team [,wait])
team = a vector of process ids to synchronize with
wait = a vector of processes to wait for
Pointers and dynamic allocation
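The co-array notation can be made concrete with a small model. The Python below is not CAF and does not reflect how the Rice runtime stores data. It keeps each image's copy of a shared array as one list entry and uses 0-based indices instead of Fortran's 1-based ones. The names `make_coarray` and `get_rows_into_columns` are hypothetical. The second function mimics `x(:,j:j+2) = a(r,:)[p:p+2]`: column `j+k` of the local `x` receives row `r` of `a` from image `p+k`, and the remote images take no part in the copy.

```python
def make_coarray(num_images, rows, cols, init):
    """Model a shared array as one rows x cols copy per image (hypothetical helper)."""
    return [[[init(img, i, j) for j in range(cols)] for i in range(rows)]
            for img in range(num_images)]


def get_rows_into_columns(x, j, a, r, p, count=3):
    """Emulate x(:, j:j+count-1) = a(r, :)[p:p+count-1] with 0-based indices.

    Column j+k of the local array x receives row r of a on image p+k.
    This is a one-sided GET: only the reading image does any work.
    """
    for k in range(count):
        remote_row = a[p + k][r]          # read row r from image p+k's copy of a
        for i, value in enumerate(remote_row):
            x[i][j + k] = value           # store it as a local column
    return x
```

A private array such as `real :: a(20,20)` needs no model here: it is simply an ordinary local variable on each image.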

Source-to-source code generation
Open source compiler
Built on the Open64/SL infrastructure
Support for core language features
Code generation
library-based communication: portable ARMCI and GASNet communication libraries, plus the CHASM array-descriptor library
load/store communication on shared-memory platforms
Operating systems
Linux IA64/IA32
Alpha Tru64
SGI IRIX64
Interconnects
Quadrics QSNet (Elan 3), QSNet II (Elan 4)
Myrinet 2000
Ethernet
Platforms
SGI Altix 3000, SGI Origin 2000
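A source-to-source compiler can lower the same co-array read in either of the two ways listed above. On a shared-memory platform, the remote copy can be addressed directly, so the read becomes a plain load. On a cluster, it becomes an explicit GET through a communication library. The sketch below only returns illustrative Fortran text for each strategy. `lower_remote_read`, `caf_get`, and the `_handle`/`_ptr` naming are hypothetical, and none of them is the Rice CAF compiler's actual generated code or runtime interface.

```python
def lower_remote_read(array, index, image, target):
    """Return illustrative Fortran text for reading array(index)[image].

    All generated names (caf_get, <array>_handle, <array>_ptr) are
    hypothetical placeholders, not the Rice CAF runtime interface.
    """
    if target == "load_store":
        # Shared-memory platform: the remote copy is directly addressable,
        # so the read becomes an ordinary load through a pointer.
        return f"tmp = {array}_ptr({image})%p({index})"
    if target == "library":
        # Cluster: the read becomes an explicit GET issued through a
        # communication library (ARMCI or GASNet underneath).
        return f"call caf_get({array}_handle, {image}, {index}, tmp)"
    raise ValueError(f"unknown target: {target}")
```

Keeping the choice inside the code generator means the source program stays the same on every platform.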

Point-to-point synchronization
sync_notify(p)
sync_wait(p)
Less restrictive memory fences at call site
Collective operations
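The point-to-point primitives can be modeled with one counter per (sender, receiver) pair. The sketch below simulates images as Python threads. It assumes counting semantics, in which each `sync_wait(p)` consumes exactly one earlier `sync_notify` from image `p`. The class name and the explicit `me` argument are hypothetical. Unlike a team-wide synchronization, only the two images involved ever block.

```python
import threading


class PointToPointSync:
    """Hypothetical model of sync_notify/sync_wait between simulated images.

    notified[src][dst] counts notifications sent from src to dst;
    consumed[dst][src] counts how many of them dst has already waited on.
    """

    def __init__(self, num_images):
        n = num_images
        self.notified = [[0] * n for _ in range(n)]
        self.consumed = [[0] * n for _ in range(n)]
        self.cond = threading.Condition()

    def sync_notify(self, me, p):
        # In CAF, a notify also implies that this image's earlier writes
        # to p have completed. Here those writes are plain assignments made
        # before this call; releasing the lock publishes them to the waiter.
        with self.cond:
            self.notified[me][p] += 1
            self.cond.notify_all()

    def sync_wait(self, me, p):
        # Block until p has sent more notifications than we have consumed,
        # then consume exactly one of them.
        with self.cond:
            self.cond.wait_for(lambda: self.notified[p][me] > self.consumed[me][p])
            self.consumed[me][p] += 1
```

In a typical producer/consumer pair, image 0 writes into image 1's buffer and calls `sync_notify(0, 1)`. Image 1 calls `sync_wait(1, 0)` before reading that buffer.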

CHALLENGES
High performance and good scalability
Programmer productivity
CAF: a promising near-term alternative
As expressive as MPI
Simpler to program than MPI
More amenable to compiler optimizations
User has control over performance-critical
factors
MPI: a library-based parallel programming model
Portable and widely used
The developer has explicit control over data and
communication placement
Difficult and error-prone to program
Most of the burden of communication optimization falls on application developers;
compiler support is underutilized

Current Optimizations

Procedure Splitting
Hints for non-blocking communication
Library-based and load/store communication
Packing of strided communication
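Packing matters because a strided section, such as a row of a column-major Fortran array, would otherwise travel as one message per element. The toy Python model below counts messages for both strategies. `ToyNetwork`, `put_strided_naive`, and `put_strided_packed` are hypothetical names, and the "network" is just a flat list per image rather than ARMCI or GASNet.

```python
def pack_strided(src, start, count, stride):
    """Gather src[start], src[start+stride], ... into one contiguous list."""
    return [src[start + k * stride] for k in range(count)]


def unpack_strided(dst, start, stride, buf):
    """Scatter a contiguous buffer back into dst[start], dst[start+stride], ..."""
    for k, value in enumerate(buf):
        dst[start + k * stride] = value


class ToyNetwork:
    """Hypothetical transport: one flat memory per image plus a message counter."""

    def __init__(self, num_images, size):
        self.memory = [[0] * size for _ in range(num_images)]
        self.messages = 0

    def put_contiguous(self, image, offset, data):
        # One message writing a contiguous run of values.
        self.messages += 1
        self.memory[image][offset:offset + len(data)] = data

    def put_packed(self, image, start, stride, buf):
        # One message carrying the packed buffer and its stride; the
        # receiving side unpacks it into the strided locations.
        self.messages += 1
        unpack_strided(self.memory[image], start, stride, buf)


def put_strided_naive(net, image, src, start, count, stride):
    # Unoptimized: every strided element travels in its own message.
    for k in range(count):
        i = start + k * stride
        net.put_contiguous(image, i, [src[i]])


def put_strided_packed(net, image, src, start, count, stride):
    # Packed: gather locally, send once, scatter on the destination image.
    net.put_packed(image, start, stride, pack_strided(src, start, count, stride))
```

Consider a 10x10 column-major array stored as 100 contiguous values. Row 3 (0-based) is `start=3, stride=10, count=10`. Sending it with the naive version costs 10 messages, while the packed version costs 1, and both leave the destination memory identical.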

Planned Optimizations

Communication vectorization and aggregation
Synchronization strength-reduction
Automatic split-phase communication
Platform-driven communication optimizations
transform one-sided communication into two-sided and collective communication, where profitable
multi-model code for hierarchical architectures
convert GETs into PUTs
Multi-buffer co-arrays for asynchrony tolerance
Employ virtualization for latency tolerance
Interoperability with other programming models
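Communication vectorization replaces fine-grained remote accesses inside a loop with one bulk transfer before the loop. The sketch below counts GET messages for a loop that sums `b(i)[p]`. It assumes the remote data is not modified while the loop runs, which a compiler would have to establish before applying the transformation. `CountingRemote` and the two sum functions are hypothetical names.

```python
class CountingRemote:
    """Hypothetical view of one array on a remote image; counts GET messages."""

    def __init__(self, data):
        self.data = list(data)
        self.gets = 0

    def get(self, lo, hi):
        # One message returning data[lo:hi] (0-based, hi exclusive).
        self.gets += 1
        return self.data[lo:hi]


def sum_elementwise(remote, n):
    # Naive translation: every b(i)[p] reference in the loop is its own GET.
    total = 0
    for i in range(n):
        total += remote.get(i, i + 1)[0]
    return total


def sum_vectorized(remote, n):
    # Vectorized: one bulk GET into a local temporary, then a purely local loop.
    tmp = remote.get(0, n)
    total = 0
    for value in tmp:
        total += value
    return total
```

Both versions compute the same sum. For n = 1000, the first issues 1000 messages and the second issues one. Aggregation extends the same idea by merging separate GETs that target the same image.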

CAF Applications and Benchmarks

Sweep3D: wavefront parallelism
Spark98: sparse matrix-vector multiply
NAS Parallel Benchmarks 2.3: MG, CG, SP, BT, LU
Random Access, STREAM
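Sweep3D sweeps a 2-D processor grid as a wavefront. A block at position (i, j) can start only after its neighbors (i-1, j) and (i, j-1) finish, so all blocks on the same anti-diagonal can proceed in parallel. The sketch below uses a hypothetical `wavefront_schedule` helper that simply enumerates those diagonals. It ignores Sweep3D's pipelining over angles and its multiple sweep directions.

```python
def wavefront_schedule(px, py):
    """Group grid positions (i, j) of a px x py grid by step t = i + j.

    With dependences on (i-1, j) and (i, j-1), every position on the same
    anti-diagonal can run concurrently once the previous step is done.
    """
    steps = []
    for t in range(px + py - 1):
        steps.append([(i, t - i) for i in range(px) if 0 <= t - i < py])
    return steps
```

For a 3x2 grid this yields 4 steps, and the widest step holds 2 blocks, so parallelism ramps up and then drains as the wavefront crosses the grid.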

[Figures not reproduced. Panels: neutron transport problem (Sweep3D); San Fernando Valley earthquake simulation (Spark98); computational fluid dynamics; mesh. Cluster platforms: NAS BT class C and NAS MG class C on Itanium2+Myrinet 2000, Sweep3D 150³ on Itanium2+Quadrics. SGI Altix 3000: Spark98, computational fluid dynamics.]

Sparse matrix-vector multiply (sf2 traces)
Performance of all CAF versions is comparable to that of MPI, and better on large numbers of CPUs
The CAF GET-based version is simpler and more natural to code, but up to 13% slower
Without considering locality, applications do not scale on NUMA architectures (Hybrid)
The ARMCI library is more efficient than MPI

[Figures not reproduced: Sweep3D 150³ on Itanium2+Myrinet; Sweep3D 150³ on SGI Altix 3000; NAS BT class B and NAS MG class C on SGI Altix 3000; partitioned mesh.]
