
Data Parallel SPMD Programming Environments
Fortran to Java
  • Han-Ku Lee
  • Department of Computer Science
  • Florida State University

  • Background - historical review of data-parallel
    languages, message-passing frameworks, and
    high-level libraries for distributed arrays
  • HPspmd programming language model: HPJava
  • The compilation strategies for HPJava
  • Related systems
  • Conclusions

  • This work was supported in part by the National
    Science Foundation (NSF) Division of Advanced
    Computational Infrastructure and Research
  • Contract number 9872125

Research Objectives
  • Data-parallel programming and languages have
    played a major role in high-performance computing
  • HPF: difficult (compilation)
  • Library-based, lower-level SPMD programming
  • HPspmd programming language model: a flexible
    hybrid of an HPF-like data-parallel language and
    the popular, library-oriented SPMD style
  • The base language for the HPspmd model should
    offer clean and simple object semantics,
    cross-platform portability, security, and
    popularity: Java
  • To power up Java in a data-parallel SPMD environment

Data Parallel Languages
  • Large data-structures, typically arrays, are
    split across nodes
  • Each node performs similar computations on a
    different part of the data structure
  • SIMD: Illiac IV and ICL DAP introduced a new
    concept, distributed arrays
  • MIMD: asynchronous, flexible, hard to program
  • SPMD: loosely synchronous model (SIMD + MIMD)
  • Each node has its own local copy of the program

HPF (High Performance Fortran)
  • By the early 90s, the value of portable,
    standardized languages was universally
    acknowledged.
  • Goal of the HPF Forum: a single language for High
    Performance programming. Effective across
    architectures: vector, SIMD, MIMD, though SPMD a
  • HPF - an extension of Fortran 90 to support the
    data parallel programming model on distributed
    memory parallel computers
  • Supported by Cray, DEC, Fujitsu, HP, IBM, Intel,
    Maspar, Meiko, nCube, Sun, and Thinking Machines

Ideal data distribution
  • Multi-processing and data distribution:
    communication and load balance
  • Introduced processor arrangement and Templates
  • Data Alignment

Message-passing for HPC
  • Processes explicitly communicate through messages
    on some classes of parallel machines with
    distributed memory
  • Early message-passing frameworks: p4, PARMACS,
    PVM, and Express
  • Message Passing Interface Forum established a
    standard API for message-passing library routines
  • Portability and scalability

High Level Libraries for Distributed Arrays
  • Distributed array: a collective object shared by
    a number of processes
  • PARTI, The Global Array (GA) Toolkit
  • Adlib: a high-level runtime library, designed to
    support translation of data-parallel languages
  • Implemented in 1994 in the shpf project at
    Southampton University and much improved during
    the Parallel Compiler Runtime Consortium (PCRC)
    project at Syracuse University
  • Initially invented for HPF
  • Currently used in the HPJava project at
    Florida State University and Indiana University

  • Built-in model of distributed arrays and
  • Equivalent to HPF 1.0 model, plus ghost
    extensions and general block distribution from
    HPF 2.0
  • Collective communication library.
  • Direct support for array section assignments,
    ghost region updates, F90 array intrinsics,
    general gather/scatter.
  • Implemented on top of MPI.
  • Adlib kernel implemented in C.
  • Object-based distributed array descriptor (DAD)
  • Interfaces: shpf Fortran interface, PCRC Fortran
    interface, ad++ interface, and HPJava interface

Features of HPJava
  • A language for parallel programming, especially
    suitable for massively parallel, distributed
    memory computers.
  • Takes various ideas from High Performance Fortran
  • HPJava has a distributed array model very similar
    to the HPF model.
  • Almost identical set of distribution and
    alignment options.
  • In other respects, HPJava is a lower-level
    parallel programming language than HPF.
  • Programming model is explicit SPMD, needing
    explicit calls to communication libraries such as
  • The HPJava system is built on Java technology.
  • The HPJava programming language is an extension
    of the Java programming language.

Benefits of HPspmd Model
  • Translators are much easier to implement than HPF
    compilers. No compiler magic needed
  • Attractive framework for library development,
    avoiding inconsistent parameterizations of
    distributed array arguments
  • Better prospects for handling irregular problems:
    easier to fall back on specialized libraries as
  • Can directly call MPI functions from within an
    HPspmd program

HPspmd Architecture
Multidimensional Arrays
  • Java is an attractive language, but needs to be
    improved for large computational tasks
  • Java provides an array of arrays => disadvantages:
  • Time consumed by out-of-bounds checking
  • The ability to alias rows of an array
  • The cost of accessing an element
  • HPJava introduces true multidimensional arrays
    and regular sections
  • For example:
  • int [[*,*]] a = new int [[5, 5]] ;
  • for (int i = 0 ; i < 4 ; i++) a [i, i + 1] = ... ;
  • foo ( a [[:, 0]] ) ;
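
The drawbacks listed above can be illustrated in plain Java. The following is only a sketch, not HPJava: `FlatArray2D` is a hypothetical class that stores a rank-2 array in one contiguous `int[]`, so rows cannot be aliased and an element access is a single offset computation. The value 19 is arbitrary.

```java
// Illustrative sketch only: FlatArray2D is a hypothetical class, not an HPJava API.
final class FlatArray2D {
    final int[] data; // one contiguous block holding every element
    final int rows, cols;

    FlatArray2D(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new int[rows * cols];
    }

    // Row-major offset of element (i, j), with explicit range checks.
    int offset(int i, int j) {
        if (i < 0 || i >= rows || j < 0 || j >= cols)
            throw new IndexOutOfBoundsException("(" + i + ", " + j + ")");
        return i * cols + j;
    }

    int get(int i, int j) { return data[offset(i, j)]; }

    void set(int i, int j, int value) { data[offset(i, j)] = value; }

    public static void main(String[] args) {
        // Java array of arrays: rows are separate objects and can be aliased.
        int[][] jagged = new int[5][5];
        jagged[1] = jagged[0]; // two "rows" now share storage
        jagged[0][2] = 7;
        System.out.println(jagged[1][2]); // same element, seen through the alias

        // Flat storage: rows cannot alias, and access is plain arithmetic.
        FlatArray2D a = new FlatArray2D(5, 5);
        for (int i = 0; i < 4; i++) a.set(i, i + 1, 19); // 19 is an arbitrary value
        System.out.println(a.get(2, 3));
    }
}
```

A layout like this is also what makes regular sections cheap: a section only needs a different base offset and stride over the same `data` array.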

  • Procs2 p = new Procs2(2, 3) ;
  • on (p) {
  • Range x = new BlockRange(N, p.dim(0)) ;
  • Range y = new BlockRange(N, p.dim(1)) ;
  • float [[-,-]] a = new float [[x, y]] ;
  • float [[-,-]] b = new float [[x, y]] ;
  • float [[-,-]] c = new float [[x, y]] ;
  • ... initialize a, b ...
  • overall (i = x for :)
  • overall (j = y for :)
  • c [i, j] = a [i, j] + b [i, j] ;
  • }
  • An HPJava program is concurrently started on all
    members of some process collection process
  • The on construct limits control to the active
    process group (APG), p
  • The class BlockRange is a subclass of Range,
    representing an index range block-distributed
    over the process dimension passed to its
    constructor
Distributed arrays
  • The most important feature of HPJava
  • A collective object shared by a number of
    processes
  • Elements of a distributed array are distributed
  • True multidimensional array
  • Forms a regular section of an array
  • When N = 8 in the previous example code, the
    distributed array a is distributed like this:
    [figure not reproduced]
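
Since the figure is missing here, the distribution for N = 8 over the 2 x 3 process grid can be worked out by hand. The sketch below assumes the HPF-style BLOCK rule (block size = ceil(N / P)); `BlockLayout` and its method names are hypothetical and are not the HPJava `BlockRange` implementation.

```java
// Illustrative sketch only: BlockLayout is hypothetical, not the HPJava BlockRange class.
final class BlockLayout {
    // Block size under the assumed HPF-style BLOCK rule: ceil(n / procs).
    static int blockSize(int n, int procs) {
        return (n + procs - 1) / procs;
    }

    // First global index held by the process with coordinate `coord`.
    static int lowerBound(int n, int procs, int coord) {
        return Math.min(n, coord * blockSize(n, procs));
    }

    // One past the last global index held by the process with coordinate `coord`.
    static int upperBound(int n, int procs, int coord) {
        return Math.min(n, (coord + 1) * blockSize(n, procs));
    }

    public static void main(String[] args) {
        int n = 8;
        int[] grid = {2, 3}; // process grid shape, as in new Procs2(2, 3)
        for (int r = 0; r < grid[0]; r++)
            for (int c = 0; c < grid[1]; c++)
                System.out.printf("process (%d,%d) holds a[%d..%d, %d..%d]%n", r, c,
                    lowerBound(n, grid[0], r), upperBound(n, grid[0], r) - 1,
                    lowerBound(n, grid[1], c), upperBound(n, grid[1], c) - 1);
    }
}
```

Under this assumption the first dimension splits into blocks of 4 and the second into blocks of 3, 3 and 2.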

Overall construct: overall (i = x for l : u : s)
  • A distributed parallel loop
  • i: distributed index whose value is a Location,
    which is a particular element of a particular
    distributed range
  • Index triplet l : u : s represents a lower bound,
    an upper bound, and a step, all of which are
    integer expressions
  • The step is optional; the default step is 1
  • The lower bound may be omitted; the default is 0
  • The upper bound may be omitted; the default is
    N - 1, where N is the extent of the range
  • An HPJava range object => a collection of
    locations
  • With a few exceptions, the subscript of a
    distributed array must be a distributed index,
    and the location should be an element of the
    range associated with the array dimension
  • This restriction is an important feature,
    ensuring that referenced array elements are
    locally held

At construct: at (i = x [4])
  • HPJava defines a distributed index when we want
    to update or access a single element of a
    distributed array rather than accessing a whole
    set of elements in parallel
  • When we want to update a [1, 4]:
  • float [[-,-]] a = new float [[x, y]] ;
  • // a [1, 4] = 19 ; <---- Not allowed,
    // since 1 and 4 are not distributed indices
    // and therefore not legal subscripts
  • at (i = x [1])
  • at (j = y [4])
  • a [i, j] = 19 ;
  • The operational semantics of the at construct are
    similar to those of the on construct
  • i` - the backquote symbol is used as a postfix
    operator on a distributed index
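
One way to picture the at construct is as a guard: every process runs the same SPMD code, but only the process that holds the named location executes the body. The sketch below simulates this in a single JVM for a one-dimensional, block-distributed array. `AtDemo`, `atUpdate` and the block-size arithmetic are assumptions made for illustration and are not what the HPJava translator emits.

```java
// Illustrative sketch only: AtDemo is hypothetical and simulates processes in one JVM.
final class AtDemo {
    // Models "at (i = x [global]) local [i] = value" as run by the process with
    // coordinate `me`; `blk` is the block size of the range.
    // Returns true if this process held the location and ran the body.
    static boolean atUpdate(float[] local, int me, int blk, int global, float value) {
        if (global / blk != me) return false; // not the owner: body is skipped
        local[global - me * blk] = value;     // global index -> local subscript
        return true;
    }

    public static void main(String[] args) {
        int n = 8, procs = 2, blk = n / procs;
        float[][] locals = new float[procs][blk]; // one local block per process
        for (int me = 0; me < procs; me++)
            System.out.println("process " + me + " ran body: "
                + atUpdate(locals[me], me, blk, 5, 19f));
    }
}
```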

Distribution format
  • HPJava provides further distribution formats for
    dimensions of distributed arrays without further
    extensions to the syntax
  • Instead, the Range class hierarchy is extended
  • BlockRange, CyclicRange, IrregRange, Dimension
  • ExtBlockRange: a BlockRange distribution
    extended with ghost regions
  • CollapsedRange: a range that is not distributed,
    i.e. all elements of the range mapped to a single
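
The design choice of adding distribution formats through subclasses rather than new syntax can be sketched in plain Java. The classes `DemoRange`, `DemoBlock` and `DemoCyclic` below are hypothetical stand-ins, not the HPJava `Range` hierarchy, and the block rule (ceil(N / P)) is an assumption borrowed from HPF.

```java
// Illustrative sketch only: these classes are hypothetical, not the HPJava Range hierarchy.
abstract class DemoRange {
    final int size;  // number of global index values
    final int procs; // extent of the process dimension

    DemoRange(int size, int procs) {
        this.size = size;
        this.procs = procs;
    }

    // Process coordinate that holds the given global index.
    abstract int owner(int global);

    public static void main(String[] args) {
        DemoRange[] formats = { new DemoBlock(8, 3), new DemoCyclic(8, 3) };
        for (DemoRange r : formats) {
            StringBuilder sb = new StringBuilder(r.getClass().getSimpleName() + ":");
            for (int g = 0; g < r.size; g++) sb.append(' ').append(r.owner(g));
            System.out.println(sb);
        }
    }
}

final class DemoBlock extends DemoRange {
    DemoBlock(int size, int procs) { super(size, procs); }

    // Contiguous blocks of ceil(size / procs) indices per process.
    @Override
    int owner(int global) { return global / ((size + procs - 1) / procs); }
}

final class DemoCyclic extends DemoRange {
    DemoCyclic(int size, int procs) { super(size, procs); }

    // Indices dealt out round-robin.
    @Override
    int owner(int global) { return global % procs; }
}
```

Code that only calls `owner` works unchanged with any new format, which is the point of extending the class hierarchy instead of the language.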

Ghost regions
  • Ghost region: extra space around the edges of
    the locally held block of a distributed array
  • This extra space can cache some of the element
    values properly belonging to adjacent processors
  • With ghost regions, the inner loop of algorithms
    for stencil updates can be written in a simple
    way, since the edges of the block don't need
    special treatment in accessing neighboring
  • Shifted indices can locate the proper values
    cached in the ghost region
  • e.g. a [i, j + 1]
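
The effect of ghost regions on a stencil loop can be sketched in plain Java. This is a single-JVM simulation under stated assumptions: a one-dimensional array, equal block sizes, ghost width 1, and a copy step standing in for the communication library call. `GhostDemo` and its methods are hypothetical names, not HPJava or Adlib APIs.

```java
import java.util.Arrays;

// Illustrative sketch only: single-JVM simulation, not the HPJava/Adlib API.
final class GhostDemo {
    // Split `global` into `procs` equal blocks, each padded with one ghost
    // cell on either side (assumes procs divides global.length).
    static double[][] scatter(double[] global, int procs) {
        int b = global.length / procs;
        double[][] local = new double[procs][b + 2];
        for (int p = 0; p < procs; p++)
            System.arraycopy(global, p * b, local[p], 1, b);
        return local;
    }

    // Fill ghost cells with the edge values of neighbouring blocks.
    // Ghost cells at the ends of the global array keep their initial 0.
    static void updateGhosts(double[][] local) {
        int b = local[0].length - 2;
        for (int p = 0; p < local.length; p++) {
            if (p > 0) local[p][0] = local[p - 1][b];                    // left neighbour's last element
            if (p < local.length - 1) local[p][b + 1] = local[p + 1][1]; // right neighbour's first element
        }
    }

    // Three-point average over the owned elements; block edges need no
    // special case because the shifted subscripts land in the ghost cells.
    static double[] smooth(double[] block) {
        int b = block.length - 2;
        double[] out = new double[b];
        for (int i = 0; i < b; i++) {
            int s = i + 1; // local subscript of owned element i
            out[i] = (block[s - 1] + block[s] + block[s + 1]) / 3.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] local = scatter(new double[] {1, 2, 3, 4, 5, 6, 7, 8}, 2);
        updateGhosts(local);
        for (double[] block : local)
            System.out.println(Arrays.toString(smooth(block)));
    }
}
```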

Array Sections
  • HPJava supports subarrays modeled on the array
    sections of Fortran 90
  • Whereas an element reference is a variable, an
    array section is an expression that represents a
    new distributed array object
  • The new array section is a subset of the elements
    of the parent array
  • Triplet subscript
  • The rank of an array section is equal to the
    number of triplet subscripts
  • e.g. float [[-,-]] a = new float [[x, y]] ;
  • float [[-]] b = a [[0, :]] ;
  • Subrange: the range of an array section
  • e.g. Range u = x [[0 : N - 1 : 2]] ;
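
The idea that a section is a new array object over a subset of the parent's elements can be sketched with a strided view. `Section1D` is a hypothetical plain-Java class (local data only, no distribution), written to show how a triplet lo : hi : step maps to a new base and stride over the same storage.

```java
// Illustrative sketch only: Section1D is hypothetical and ignores distribution.
final class Section1D {
    final float[] data; // storage shared with the parent array
    final int base, stride, length;

    Section1D(float[] data, int base, int stride, int length) {
        this.data = data;
        this.base = base;
        this.stride = stride;
        this.length = length;
    }

    float get(int i) { return data[base + i * stride]; }

    void set(int i, float v) { data[base + i * stride] = v; }

    // Triplet lo : hi : step (hi inclusive, step > 0) yields another view
    // on the same data; no elements are copied.
    Section1D section(int lo, int hi, int step) {
        int len = hi < lo ? 0 : (hi - lo) / step + 1;
        return new Section1D(data, base + lo * stride, stride * step, len);
    }

    public static void main(String[] args) {
        float[] storage = {0, 1, 2, 3, 4, 5, 6, 7};
        Section1D whole = new Section1D(storage, 0, 1, storage.length);
        Section1D u = whole.section(0, 7, 2); // analogue of x [[0 : N - 1 : 2]] with N = 8
        u.set(1, 99f);                        // writes through to storage[2]
        System.out.println(u.length + " elements, storage[2] = " + storage[2]);
    }
}
```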

Distributed Array Type
  • Type signature of a distributed array
  • T [[attr_0, ..., attr_{R-1}]] bras
  • where R is the rank of the array and each
    term attr_r is either a single hyphen, -, or a
    single asterisk, *; the term bras is a string of
    zero or more bracket pairs, []
  • T can be any Java type other than an array type.
    This signature represents the type of a
    distributed array whose elements have Java type
  • T bras
  • A distributed array type is not treated as a
    class type
  • It means that a distributed array cannot be an
    element of an ordinary Java array, nor can a
    distributed array reference be stored in a
    standard library class like Vector, which expects
    an Object
  • If we said distributed arrays have a class, it
    would commit us to either extending the
    definition of class in the Java language, or
    creating genuine Java classes for each type of
    HPJava array that might be needed: impractical

HPspmd classes and APG
  • The HPJava translator tries to distinguish HPJava
    code from Java code
  • It introduces a special interface,
    hpjava.lang.HPspmd, which must be implemented by
    any class that uses the special syntax
  • An HPspmd class is a class that implements the
    hpjava.lang.HPspmd interface. Any other class is
    a non-HPspmd class
  • Many of the special operations in HPJava rely on
    the active process group (APG)
  • The APG changes during the course of the program
    as distributed control constructs limit control
    to different subsets of the processors
  • In the current HPJava translator, the value of
    APG is passed as a hidden argument to methods and
    constructors of HPspmd classes (like this

Basic Translation Scheme
  • The HPJava system is not exactly a high-level
    parallel programming language; it is more like a
    tool to assist programmers in generating SPMD
    parallel code
  • This suggests the transformations the system
    applies should be relatively simple and
    well-documented, so programmers can exploit the
    tool more effectively
  • We don't expect the generated code to be human
    readable or modifiable, but at least the
    programmer should be able to work out what is
    going on
  • The HPJava specification defines the basic
    translation scheme as a series of schemas

Translation of a distributed array declaration
  • Source: T [[attr_0, ..., attr_{R-1}]] a ;
  • Translation:
  • ArrayBase a__bas ;
  • DIMENSION_TYPE (attr_0) a__0 ;
  • ...
  • DIMENSION_TYPE (attr_{R-1}) a__{R-1} ;
  • where DIMENSION_TYPE (attr_r) = ArrayDim if attr_r
    is a hyphen, or
  • DIMENSION_TYPE (attr_r) = SeqArrayDim if
    attr_r is an asterisk
  • e.g.
  • float [[-,*]] var ; → float [] var__dat ;
  • ArrayBase var__bas ;
  • ArrayDim var__0 ;
  • SeqArrayDim var__1 ;

Translation of the overall construct
  • SOURCE: overall (i = x for e_lo : e_hi : e_stp) S
  • TRANSLATION: Block b = x.localBlock(T [e_lo],
    T [e_hi], T [e_stp]) ;
  • Group p = ((Group) ...
  • for (int l = 0 ; l < b.count ; l++) {
  • int sub = b.sub_bas + b.sub_stp * l ;
  • int glb = b.glb_bas + b.glb_stp * l ;
  • T [S | p]
  • }
  • where i is an index name in the source
  • x is a simple expression in the
    source program,
  • e_lo, e_hi, and e_stp are
    expressions in the source,
  • S is a statement in the source
    program, and
  • b, p, l, sub and glb are names
    of new variables
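
To make the role of `count`, `sub_bas`, `sub_stp`, `glb_bas` and `glb_stp` concrete, here is a plain-Java sketch of what a local block computation could look like for a block-distributed range. The field names follow the scheme above, but the `LocalBlock` class, its constructor arguments and the arithmetic are assumptions made for illustration. The sketch leaves out the group restriction and the recursive translation of S, and it assumes no ghost regions, so the local subscript is simply the global index minus the block's first index.

```java
// Illustrative sketch only: LocalBlock is hypothetical, not the HPJava Block class.
final class LocalBlock {
    final int count;            // number of local iterations
    final int glb_bas, glb_stp; // global index = glb_bas + glb_stp * l
    final int sub_bas, sub_stp; // local subscript = sub_bas + sub_stp * l

    // This process holds global indices blockLo .. blockHi - 1;
    // the triplet is lo : hi : stp with hi inclusive and stp > 0.
    LocalBlock(int blockLo, int blockHi, int lo, int hi, int stp) {
        // First element of the triplet that is at or after blockLo.
        int first = lo >= blockLo ? lo : lo + ((blockLo - lo + stp - 1) / stp) * stp;
        int last = Math.min(hi, blockHi - 1);
        count = first > last ? 0 : (last - first) / stp + 1;
        glb_bas = first;
        glb_stp = stp;
        sub_bas = first - blockLo;
        sub_stp = stp;
    }

    public static void main(String[] args) {
        int blockLo = 4, blockHi = 8; // this process holds global indices 4..7
        float[] local = new float[blockHi - blockLo];
        // Local part of: overall (i = x for 1 : 7 : 2)
        LocalBlock b = new LocalBlock(blockLo, blockHi, 1, 7, 2);
        for (int l = 0; l < b.count; l++) {
            int sub = b.sub_bas + b.sub_stp * l; // local subscript
            int glb = b.glb_bas + b.glb_stp * l; // global index
            local[sub] = glb;
            System.out.println("local[" + sub + "] <- global index " + glb);
        }
    }
}
```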

Important features of translation scheme
  • From the last slide, the basic translation scheme
    reduces overall constructs to simple local for
    loops
  • Inside these loops, the only overhead relative
    to hand-coded local for loops is a proliferation
    of references to fields of simple classes like
    Block and ArrayDim
  • These references can easily be lifted outside
    loops, strength-reduction optimizations can be
    applied to the local subscript expressions, loops
    can be unrolled, redundant run-time checks can be
    removed, etc.
  • All of this can be done easily by a slightly
    more optimized form of the translator

Optimization Strategies
  • Here we only consider strength reduction
    optimizations on the index expression
  • Consider the nested overall and loop constructs:
  • overall (i = x for :)
  • overall (j = y for :) {
  • float sum = 0 ;
  • for (int k = 0 ; k < N ; k++)
  • sum += a [i, k] * b [k, j] ;
  • c [i, j] = sum ;
  • }

A correct but naive translation
  • Block bi = x.localBlock() ;
  • for (int lx = 0 ; lx < bi.count ; lx++) {
  • Block bj = y.localBlock() ;
  • for (int ly = 0 ; ly < bj.count ; ly++) {
  • float sum = 0 ;
  • for (int k = 0 ; k < N ; k++)
  • sum += a.dat() [a.bas() + (bi.sub_bas +
    bi.sub_stp * lx) * a.str(0) + k * a.str(1)] *
    b.dat() [b.bas() + (bj.sub_bas +
    bj.sub_stp * ly) * b.str(1) + k * b.str(0)] ;
  • c.dat() [c.bas() + (bi.sub_bas + bi.sub_stp *
    lx) * c.str(0) + (bj.sub_bas +
    bj.sub_stp * ly) * c.str(1)] = sum ;
  • }
  • }

Strength-Reduction Optimization
  • The terms in the subscript expressions are
    complex
  • The subscript expressions can be greatly
    simplified by application of strength reduction
  • Eliminate complicated expressions involving
    multiplication from expressions in inner loops by
    introducing the induction variables:
  • vai_ = a.bas() + (bi.sub_bas +
    bi.sub_stp * lx) * a.str(0)
  • vci_ = c.bas() + (bi.sub_bas +
    bi.sub_stp * lx) * c.str(0)
  • vb_j = b.bas() + (bj.sub_bas +
    bj.sub_stp * ly) * b.str(1)
  • vcij = c.bas() + (bi.sub_bas +
    bi.sub_stp * lx) * c.str(0) + (bj.sub_bas +
    bj.sub_stp * ly) * c.str(1)
  • These can be computed efficiently by incrementing
    at suitable points with the induction increments:
  • sia0 = bi.sub_stp * a.str(0),
    sic0 = bi.sub_stp * c.str(0)
  • sjb1 = bj.sub_stp * b.str(1),
    sjc1 = bj.sub_stp * c.str(1)

  • Translation of overall after applying strength
    reduction to distributed index subscripts:
  • Block bi = x.localBlock() ;
  • int vai_ = a.bas() + bi.sub_bas * a.str(0) ;
  • int vci_ = c.bas() + bi.sub_bas * c.str(0) ;
  • final int sia0 = bi.sub_stp * a.str(0),
    sic0 = bi.sub_stp * c.str(0) ;
  • for (int lx = 0 ; lx < bi.count ; lx++) {
  • Block bj = y.localBlock() ;
  • int vb_j = b.bas() + bj.sub_bas * b.str(1) ;
  • int vcij = vci_ + bj.sub_bas * c.str(1) ;
  • final int sjb1 = bj.sub_stp * b.str(1),
    sjc1 = bj.sub_stp * c.str(1) ;
  • for (int ly = 0 ; ly < bj.count ; ly++) {
  • float sum = 0 ;
  • for (int k = 0 ; k < N ; k++)
  • sum += a.dat() [vai_ + k * a.str(1)] *
    b.dat() [vb_j + k * b.str(0)] ;
  • c.dat() [vcij] = sum ;
  • vb_j += sjb1 ; vcij += sjc1 ;
  • }
  • vai_ += sia0 ; vci_ += sic0 ;
  • }
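
The same transformation can be tried out in ordinary Java on flat, row-major arrays. This is a sketch under simplifying assumptions (square n-by-n matrices, base offset 0, unit step), with hypothetical names. It mirrors the scheme above in that the multiplications in the i and j subscripts are replaced by induction variables incremented at the end of each loop body, while the k subscript is left unreduced.

```java
import java.util.Arrays;

// Illustrative sketch only: plain Java arrays stand in for the translated HPJava arrays.
final class StrengthReduction {
    // Naive subscripts: element (i, j) of an n-by-n matrix is at i * n + j.
    static float[] naive(float[] a, float[] b, int n) {
        float[] c = new float[n * n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                float sum = 0;
                for (int k = 0; k < n; k++)
                    sum += a[i * n + k] * b[k * n + j];
                c[i * n + j] = sum;
            }
        return c;
    }

    // Strength-reduced subscripts: offsets are carried in induction variables.
    static float[] reduced(float[] a, float[] b, int n) {
        float[] c = new float[n * n];
        int vai = 0; // offset of a(i, 0)
        int vci = 0; // offset of c(i, 0)
        for (int i = 0; i < n; i++) {
            int vbj = 0;    // offset of b(0, j)
            int vcij = vci; // offset of c(i, j)
            for (int j = 0; j < n; j++) {
                float sum = 0;
                for (int k = 0; k < n; k++)
                    sum += a[vai + k] * b[vbj + k * n];
                c[vcij] = sum;
                vbj += 1;  // next column of b
                vcij += 1; // next column of c
            }
            vai += n; // next row of a
            vci += n; // next row of c
        }
        return c;
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4};
        float[] b = {5, 6, 7, 8};
        System.out.println(Arrays.toString(naive(a, b, 2)));
        System.out.println(Arrays.toString(reduced(a, b, 2)));
    }
}
```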

Related Systems (1)
  • Co-Array Fortran (formerly called F--)
  • A simple and small set of extensions to Fortran
    95 for SPMD processing
  • The logical model of communication is built-in
  • HPJava follows the MPI philosophy, i.e. no
    communication primitives
  • ZPL
  • An array programming language designed from first
    principles for fast execution on both sequential
    and parallel computers
  • A := A + B (where A and B are two-dimensional
    arrays)
  • Parallelism and communication are more implicit
    than in HPJava
  • HPJava provides lower-level access to parallel
    machine using mpiJava

Related Systems (2)
  • Spar
  • A Java-based programming language for
    semi-automatic array-parallel programming
  • Multidimensional arrays, array sections and
    parallel loops
  • Similar in syntax, but semantically different to
    HPJava
  • Suitable for shared-memory computing systems
  • HPJava targets massively parallel distributed
    memory computing
  • STAPL
  • A parallel C++ library designed as a superset of
    the ANSI C++ STL and executed on uni- or multi-
    processors for SPMD programming
  • While STAPL and HPJava share an SPMD programming
    model, HPJava is more naturally suited to
    distributed memory systems since it uses the
    philosophy of distributed arrays

Java Performance
  • Benchmarked on Red Hat Linux 7.2 (Pentium IV 1.5
  • Linpack
  • Compared Java with GNU cc and Fortran 77
  • It seems we don't need loop unrolling for Java

Why is Fortran slower than C?
  • One could say the performance of Fortran and C is
    the same
  • But it depends on the compilers
  • The GNU Fortran 77 compiler generates more machine
    code than the GNU cc compiler does for the main
    loop in

  • Historical review of data-parallel languages such
    as HPF
  • Message-passing frameworks: p4, PARMACS, PVM and
    the MPI standard
  • High-level libraries for distributed arrays:
    PARTI, GA and Adlib
  • HPspmd programming language model: SPMD
    framework for using libraries based on
    distributed arrays
  • Specific syntax, new control constructs, basic
    translation schemes, and basic optimization
    strategies for HPJava
  • Related systems: Co-Array Fortran, ZPL, Spar,
    and STAPL
  • Current status of HPJava
  • Collaborated with Bryan Carpenter, Geoffrey Fox,
    Guansong Zhang, Sang Lim and Zheng Qiang
  • The first fully functional HPJava translator
    (written in Java) is now operational
  • Parser: JavaCC and JTB tools
  • Has been tested and debugged against a small test
    suite and an 800-line multigrid code
  • Next stage: implement the optimization