Combining Shared and Distributed Memory Models: Approach and Evolution of the Global Arrays Toolkit

Transcript and Presenter's Notes



1
Combining Shared and Distributed Memory Models
Approach and Evolution of the Global Arrays
Toolkit
  • Jarek Nieplocha
  • Robert Harrison, Manoj Kumar Krishnan
  • Bruce Palmer, Vinod Tipparaju, Harold Trease
  • Pacific Northwest National Laboratory

2
Overview
  • Background
  • Programming Model
  • Core Capabilities
  • Recent Work
  • Conclusions

3
Global address space and one-sided communication: the global address space is the collection of the address spaces of all processes in a parallel job; a remote datum is named by the pair (address, pid).
4
Global Arrays Data Model
Physically distributed data
  • shared memory model in context of distributed
    dense arrays
  • complete environment for parallel code
    development
  • compatible with MPI
  • data locality control similar to distributed
    memory/message passing model
  • extensible

single, shared data structure with global indexing:
e.g., A(4,3) rather than buf(7) on task 2
5
Global Array Model of Computations
6
Example Matrix Multiply
(figure: global arrays represent the matrices; ga_get copies blocks into local buffers on each processor, a local dgemm multiplies them, and ga_acc accumulates the product back into the global result array)
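The get/compute/accumulate pattern on this slide can be sketched serially with NumPy. This is a conceptual illustration, not the GA API: `blocked_matmul` and the block size `bs` are illustrative names, and the comments map each step to its GA counterpart.

```python
import numpy as np

def blocked_matmul(A, B, bs):
    """Serial sketch of the GA matrix-multiply pattern: ga_get fetches
    blocks of A and B into local buffers, a local dgemm multiplies
    them, and ga_acc accumulates into the global product array."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, bs):
        for j in range(0, n, bs):
            for k in range(0, n, bs):
                a = A[i:i+bs, k:k+bs].copy()   # ga_get: fetch block of A
                b = B[k:k+bs, j:j+bs].copy()   # ga_get: fetch block of B
                C[i:i+bs, j:j+bs] += a @ b     # local dgemm, then ga_acc
    return C
```

In the real library each task loops only over the blocks it owns, and ga_acc's atomic accumulate resolves concurrent updates to the same patch of C.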
7
Comparison to other models
8
Structure of GA
  • application interfaces: Fortran 77, C, C++, Python
  • distributed arrays layer: memory management, index translation
  • message passing: process creation, run-time environment
  • ARMCI: portable one-sided communication (put, get, locks, etc.)
  • system-specific interfaces: LAPI, GM/Myrinet, threads, VIA, ...
9
Core Capabilities
  • Distributed array library
  • dense arrays of 1-7 dimensions
  • four data types: integer, real, double precision, double complex
  • global rather than per-task view of data structures
  • user control over data distribution: regular and irregular
  • Collective and shared-memory style operations
  • Interfaces to third party parallel numerical
    libraries
  • PeIGS, ScaLAPACK, SUMMA, TAO (under development)
  • example to solve a linear system using LU
    factorization
  • call ga_lu_solve(g_a, g_b)
  • instead of
  • call pdgetrf(n,m, locA, p, q, dA, ind, info)
  • call pdgetrs(trans, n, mb, locA, p, q, dA,dB,info)
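The point of the comparison is that ga_lu_solve hides an LU factorization (cf. pdgetrf) plus triangular solves (cf. pdgetrs) behind one call on global data. A serial sketch of what that one call does, with no pivoting and an illustrative name `lu_solve` (not the GA or ScaLAPACK API):

```python
import numpy as np

def lu_solve(A, b):
    """Serial sketch: the factor/solve pair that ga_lu_solve(g_a, g_b)
    hides. No pivoting; for illustration only."""
    n = len(b)
    LU = A.astype(float).copy()
    for k in range(n - 1):                  # factor A = L*U in place
        LU[k+1:, k] /= LU[k, k]
        LU[k+1:, k+1:] -= np.outer(LU[k+1:, k], LU[k, k+1:])
    y = b.astype(float)
    for k in range(n):                      # forward solve L*y = b
        y[k] -= LU[k, :k] @ y[:k]
    x = y.copy()
    for k in range(n - 1, -1, -1):          # back solve U*x = y
        x[k] = (x[k] - LU[k, k+1:] @ x[k+1:]) / LU[k, k]
    return x
```

The ScaLAPACK interface exposes the distribution parameters (p, q, block sizes, local buffers) on every call; the GA version keeps them inside the library.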

10
Performance
  • Performance model for "shared memory" data access
  • array index translation: e.g., 1.2 µs on Linux/PIII
  • overhead of one or more ARMCI put/get/... calls
  • direct mapping to native RMA calls (e.g., 3 µs on Cray T3E), or
  • simple shared-memory access (e.g., 0.3 µs on Linux/PIII), or
  • more complex with Active Message style implementations: e.g., 12 µs (put) and 37 µs (get) on Linux/PIII with Myrinet

11
Applications Areas
  • electronic structure
  • biology
  • glass flow simulation
  • visualization and image analysis
  • thermal flow simulation
  • material sciences
  • molecular dynamics
  • others: financial security forecasting, astrophysics, geosciences
12
Major Milestones
  • 1994 - 1st public release of GA
  • 1995 - Metacomputing (grid) extensions of GA
  • 1996 - DRA, parallel I/O for GA programs
    developed
  • 1997 - development of ARMCI started
  • 1998 - GA rewritten to use ARMCI
  • 1999 - GA 3.0 released, n-dimensional arrays
  • 2000 - periodic one-sided operations
  • 2001 - support for sparse data management
  • 2002 - ghost cell operations, n-dim DRA

13
Ghost Cells
(figure: a normal global array vs. a global array with ghost cells)
  • Operations
  • NGA_Create_ghosts - creates an array with ghost
    cells
  • GA_Update_ghosts - updates with data from
    adjacent processors
  • NGA_Access_ghosts - provides access to local
    ghost cell elements
  • Embedded Synchronization - controlled by the
    user
  • Multi-protocol implementation to match platform
    characteristics
  • e.g., MPI + shared memory on the IBM SP, SHMEM on
    the Cray T3E
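The ghost-cell update can be sketched in one dimension with NumPy. This is a serial stand-in for GA_Update_ghosts, not the GA API: each block carries `width` ghost cells on each side, filled from its neighbors' edge data (periodic boundaries assumed here for simplicity).

```python
import numpy as np

def update_ghosts(blocks, width=1):
    """Serial sketch of GA_Update_ghosts in 1-D: fill each block's
    ghost cells from the adjacent blocks' interior edge data."""
    n = len(blocks)
    for i, blk in enumerate(blocks):
        left, right = blocks[(i - 1) % n], blocks[(i + 1) % n]
        blk[:width] = left[-2*width:-width]   # ghosts from left neighbor
        blk[-width:] = right[width:2*width]   # ghosts from right neighbor
    return blocks
```

In the library this exchange is the communication step; NGA_Access_ghosts then lets the local computation index straight into the filled halo.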

14
Update Algorithms
  • Standard algorithm: 3^D - 1 messages (one per neighbor,
    corners included, for a D-dimensional array)
  • Shift algorithm: 2D messages (two phased exchanges per
    dimension; corner data rides along)

1st phase
2nd phase
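The message counts above are easy to sketch; the helper names below are illustrative, not GA routines.

```python
def standard_messages(D):
    """Standard ghost update: one message per neighbor; a block in D
    dimensions has 3**D - 1 neighbors, corners included."""
    return 3**D - 1

def shift_messages(D):
    """Shift algorithm: two phased exchanges per dimension; corner
    data is carried along, so only 2*D messages are needed."""
    return 2 * D
```

For a 3-D array this is 26 messages versus 6, which is why the shift algorithm wins on latency-bound networks.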
15
Disk Resident Arrays
  • Extend GA model to disk
  • system similar to Panda (UIUC) but higher level
    APIs
  • Provide easy transfer of data between N-dim
    arrays stored on disk and stored in memory
  • Use when
  • Arrays too big to store in core
  • checkpoint/restart
  • out-of-core solvers

(figure: data moves between a global array in memory and a disk resident array, e.g., in an image processing application)
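The DRA idea — sections move on demand between an N-dim array on disk and a global array in memory — can be sketched with a memory-mapped file. numpy.memmap stands in for the DRA layer here; the file name and the section read/write are illustrative, not the DRA API.

```python
import numpy as np
import os
import tempfile

# An on-disk array stands in for a disk resident array.
path = os.path.join(tempfile.mkdtemp(), "dra.dat")
disk = np.memmap(path, dtype="float64", mode="w+", shape=(100, 100))

# An in-memory array stands in for a global array.
g_a = np.arange(16, dtype="float64").reshape(4, 4)

disk[10:14, 20:24] = g_a          # cf. writing a section to the DRA
disk.flush()

section = np.array(disk[10:14, 20:24])   # cf. reading a section back
```

The real library additionally stripes the array across multiple file systems and overlaps the transfers with computation via I/O buffers.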
16
Scalable Performance of DRA
(figure: on each SMP node, I/O buffers connect the processors to multiple file systems, so DRA bandwidth scales with both)
17
Common Component Architecture
  • A component model specifically designed for HPC
  • Three parts: components, ports, and frameworks
  • Components
  • are peers
  • interact through well-defined interfaces (ports)
  • in an OO language, a port is a class
  • in Fortran, a port is a set of subroutines
  • a component may provide a port - implement the
    class/subroutines
  • another component may use that port - call
    methods in the port
  • the framework holds the components and composes
    them into applications
  • Advantages: reusable functionality, well-defined
    interfaces, etc.

18
Global Array CCA Component
(diagram: the GA component registers its ports with the CCA Services via addProvidesPort(ga) and addProvidesPort(dadf); the application component declares its needs via registerUsesPort(ga) and registerUsesPort(dadf), then obtains the port instances - class GlobalArrayPort and class DADFPort - with getPort(ga) and getPort(dadf))
19
CCA Elements
(diagram: the application component and the GA component plug into the CCAFFEINE framework; they connect through the GlobalArrayPort, DADFPort, and GoPort, alongside the framework's well-known CCA Services ports)
20
GA++
  • GA++ is a C++ class library for Global Arrays
  • GA++ classes: GAservices, GlobalArray
  • GAservices: initialization, termination,
    inter-process synchronization, etc.
  • GlobalArray: one-sided (get/put), collective
    array, and utility operations

Typical usage:
GAservices gs;
gs.initialize();
GlobalArray *ga = gs.createGA();
// ... do work ...
ga->destroy();
gs.terminate();
21
Sparse data management
  • Sparse arrays can be implemented with
  • 1-dimensional global arrays
  • Nonzero elements, row and/or index arrays
  • Set of new operations that follow Thinking
    Machines CMSSL
  • Enumerate
  • Pack/unpack
  • Binning (NxM mapping)
  • 2-key binning/sorting functions
  • Scatter_with_OP, where OP = +, min, max
  • Segmented_scan_with_OP, where OP = +, min, max, copy
  • Adopted in NWPhys/NWGrid AMR package
  • Next step - explicit sparse format
  • need more application experience - too many
    degrees of freedom
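Two of the CMSSL-style operations above can be sketched serially with NumPy. These are conceptual stand-ins, not the GA API: a segmented scan with OP = + restarts its running sum at each segment boundary, and pack keeps only the flagged elements.

```python
import numpy as np

def segmented_scan_add(values, segment_ids):
    """Sketch of Segmented_scan_with_OP for OP = +: a running sum
    that restarts whenever the segment id changes."""
    out = np.zeros_like(values)
    total = 0
    for i, (v, s) in enumerate(zip(values, segment_ids)):
        if i > 0 and s != segment_ids[i - 1]:
            total = 0
        total += v
        out[i] = total
    return out

def pack(values, mask):
    """Sketch of pack: keep only the flagged (e.g. nonzero) elements."""
    return values[np.asarray(mask, dtype=bool)]
```

Built on 1-D global arrays of nonzeros plus row/index arrays, such primitives are enough to assemble and manipulate sparse structures without a fixed sparse format.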

22
Summary and Future
  • The basic idea has proven successful
  • efficient on a wide range of architectures
  • core operations tuned for high performance
  • library substantially extended but all original
    (1994) APIs preserved
  • increasing number of application areas
  • Ongoing and future work
  • Latency hiding on the low-end cluster networks by
    relaxed memory consistency and replication
  • Advanced data structures
  • sparse arrays and hash tables
  • Increased support for the HPC community standards
  • ESI, CCA