Platforms for HPJava: Runtime Support for Scalable Programming in Java
1
Platforms for HPJava Runtime Support for
Scalable Programming in Java
  • Sang Boem Lim
  • Florida State University
  • slim@csit.fsu.edu

2
Contents
  • Overview of HPJava
  • Library support for HPJava
  • High-level APIs
  • Low-level API
  • Applications and performance
  • Contributions
  • Conclusions

3
Goals
  • Our research is concerned with enabling parallel,
    high-performance computation, in particular the
    development of scientific software, in the
    network-aware programming language Java.
  • Issues concerned with the implementation of the
    run-time environment underlying HPJava:
  • High-level APIs (e.g. Adlib)
  • Low-level API for underlying communications (e.g.
    mpjdev)
  • Adlib is the first application-level library for
    the HPspmd model.
  • The mpjdev API is an underlying communication
    library that performs the actual communications.
  • Implementations of mpjdev:
  • mpiJava-based implementation
  • Multithreaded implementation
  • LAPI implementation

4
SPMD Parallel Computing
  • SIMD: A single control unit dispatches
    instructions to each processing unit.
  • e.g. Illiac IV, Connection Machine-2, etc.
  • introduced a new concept: distributed arrays
  • MIMD: Each processor is capable of executing a
    different program independently of the other
    processors.
  • asynchronous, flexible, but hard to program
  • e.g. Cosmic Cube, Cray T3D, IBM SP3, etc.
  • SPMD: Each processor executes the same program
    asynchronously. Synchronization takes place only
    when processors need to exchange data.
  • loosely synchronous model (between SIMD and MIMD)
  • HPF: an extension of Fortran 90 to support the
    data-parallel programming model on distributed
    memory parallel computers.

5
Motivation
  • SPMD (Single Program, Multiple Data) programming
    has been very successful for parallel computing.
  • Many higher-level programming environments and
    libraries assume the SPMD style as their basic
    model: ScaLAPACK, DAGH, Kelp, Global Array
    Toolkit.
  • But the library-based SPMD approach to
    data-parallel programming lacks the uniformity
    and elegance of HPF.
  • Compared with HPF, creating distributed arrays
    and accessing their local and remote elements is
    clumsy and error-prone.
  • Because the arrays are managed entirely in
    libraries, the compiler offers little support and
    no safety net of compile-time or
    compiler-generated run-time checking.
  • These observations motivate our introduction of
    the HPspmd model: direct SPMD programming
    supported by additional syntax for HPF-like
    distributed arrays.

6
HPspmd
  • Proposed by Fox, Carpenter, Xiaoming Li around
    1998.
  • Independent processes executing the same program,
    sharing elements of distributed arrays described
    by special syntax.
  • Processes operate directly on locally owned
    elements. Explicit communication needed in
    program to permit access to elements owned by
    other processes.
  • Envisaged bindings for base languages like
    Fortran, C, Java, etc.

7
HPJava: Overview
  • Environment for parallel programming.
  • Extends Java by adding some predefined classes
    and some extra syntax for dealing with
    distributed arrays.
  • So far the only implementation of the HPspmd
    model.
  • An HPJava program is translated to a standard
    Java program which calls communication libraries
    and the parallel runtime system.

8
HPJava Example
    Procs p = new Procs2(2, 2) ;
    on(p) {
      Range x = new ExtBlockRange(M, p.dim(0), 1),
            y = new ExtBlockRange(N, p.dim(1), 1) ;
      float [[-,-]] a = new float [[x, y]] ;
      // ... initialize edge values in 'a' (boundary conditions)
      float [[-,-]] b = new float [[x, y]],
            r = new float [[x, y]] ;          // r = residuals
      do {
        Adlib.writeHalo(a) ;
        overall (i = x for 1 : N - 2)
          overall (j = y for 1 : N - 2) {
            float newA = 0.25F * (a[i - 1, j] + a[i + 1, j] +
                                  a[i, j - 1] + a[i, j + 1]) ;
            r[i, j] = Math.abs(newA - a[i, j]) ;
            b[i, j] = newA ;
          }
        HPspmd.copy(a, b) ;                   // Jacobi relaxation
      } while (Adlib.maxval(r) > EPS) ;
    }

9
Processes and Process Grids
  • An HPJava program is started concurrently in some
    set of processes.
  • Processes are named through grid objects:
    Procs p = new Procs2(2, 3) ;
  • Assumes the program is currently executing on 6
    or more processes.
  • Specify execution on a particular process grid by
    the on construct:
    on(p) {
      . . .
    }

10
Distributed Arrays in HPJava
  • Many differences between distributed arrays and
    ordinary Java arrays: a new kind of container
    type with special syntax.
  • Type signatures, constructors use double brackets
    to emphasize distinction
    Procs2 p = new Procs2(2, 3) ;
    on(p) {
      Range x = new BlockRange(M, p.dim(0)) ;
      Range y = new BlockRange(N, p.dim(1)) ;
      float [[-,-]] a = new float [[x, y]] ;
      . . .
    }

11
2-dimensional array block-distributed over p
Figure: an 8 x 8 array (M = N = 8) distributed over the
2 x 3 grid p. Each process, indexed by its coordinates
in p.dim(0) and p.dim(1), holds one contiguous block of
the elements a[0,0] ... a[7,7].
12
The Range hierarchy of HPJava
Figure: the class hierarchy rooted at Range, with
subclasses including BlockRange, CyclicRange,
ExtBlockRange, IrregRange, CollapsedRange, and
Dimension.
13
The overall construct
  • overall: a distributed parallel loop
  • General form, parameterized by an index triplet:
    overall (i = x for l : u : s) { . . . }
  • i: distributed index,
  • l: lower bound, u: upper bound, s: step.
  • In general a subscript used in a distributed
    array element must be a distributed index in the
    array range.
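For intuition, the index arithmetic behind a block distribution, which overall loops rely on, can be sketched in plain Java. The class and method names here are hypothetical helpers, not part of the HPJava runtime:

```java
// Sketch: which global indices a process owns under a block
// distribution of N elements over P processes. Hypothetical helper.
public class BlockBounds {
    // Size of each block when N elements are split over P processes.
    static int blockSize(int N, int P) {
        return (N + P - 1) / P;   // ceiling division
    }
    // First global index owned by process rank (inclusive).
    static int lo(int N, int P, int rank) {
        return Math.min(N, rank * blockSize(N, P));
    }
    // One past the last global index owned by process rank (exclusive).
    static int hi(int N, int P, int rank) {
        return Math.min(N, (rank + 1) * blockSize(N, P));
    }
    public static void main(String[] args) {
        // 8 elements over 3 processes: blocks [0,3), [3,6), [6,8)
        for (int r = 0; r < 3; r++)
            System.out.println("rank " + r + ": [" + lo(8, 3, r)
                               + ", " + hi(8, 3, r) + ")");
    }
}
```

Each process then runs the loop body only over the intersection of the triplet's range with its own [lo, hi).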

14
Irregular distributed data structures
  • Can be described as a distributed array of Java
    arrays.

    float [[-]][] a = new float [[x]][] ;
    overall (i = x)
      a[i] = new float [f(x)] ;

Figure: the resulting ragged structure, with rows of
size 4, 2, 5, and 3 distributed over two processes.
15
Library Support for HPJava
16
Historical Adlib I
  • The Adlib library was completed in the Parallel
    Compiler Runtime Consortium (PCRC).
  • This version used C++ as an implementation
    language.
  • Initial emphasis was on High Performance Fortran
    (HPF).
  • Initially Adlib was not meant to be a user-level
    library. It was called by compiler-generated code
    when HPF translated a user application.
  • It was developed on top of portable MPI.
  • Used by two experimental HPF translators (SHPF
    and PCRC HPF).

17
Historical Adlib II
  • Initially HPJava used a JNI wrapper interface to
    the C++ kernel of the PCRC library.
  • This implementation had limitations and
    disadvantages.
  • Most importantly, it was hard and inefficient for
    this version to support Java object types.
  • It had performance disadvantages because all
    calls to C++ Adlib had to go through JNI.
  • It did not provide a set of gather/scatter buffer
    operations to better support HPC applications.

18
Collective Communication Library
  • The Java version of Adlib is the first library of
    its kind developed from scratch for
    application-level use in the HPspmd model.
  • Borrows many ideas from the PCRC library, but for
    this project we rewrote the high-level library in
    Java.
  • It is extended to support Java Object types, to
    target Java-based communication platforms, and to
    use Java exception handling, making it safe for
    Java.
  • Supports collective operations on distributed
    arrays described by HPJava syntax.
  • The Java version of the Adlib library is
    developed on top of mpjdev. The mpjdev API can be
    implemented portably on network platforms and
    efficiently on parallel hardware.

19
Java version of Adlib
  • This API is intended for an application-level
    communication library suitable for HPJava
    programming.
  • There are three main families of collective
    operation in Adlib:
  • regular collective communications
  • reduction operations
  • irregular communications
  • The complete API of Java Adlib is presented in
    Appendix A of my dissertation.

20
Regular Collective Communications I
  • remap
  • Copies the values of the elements in the source
    array to the corresponding elements in the
    destination array.
    void remap(T [[-]] dst, T [[-]] src) ;
  • T stands as shorthand for any primitive type or
    Object type of Java.
  • Destination and source must have the same size
    and shape, but they can have any, unrelated,
    distribution formats.
  • Can implement a multicast if the destination has
    a replicated distribution format.
  • shift
    void shift(T [[-]] dst, T [[-]] src, int amount,
               int dimension) ;
  • Implements a simpler pattern of communication
    than general remap.
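As a purely sequential picture of the shift pattern, treating it as a cyclic shift along one dimension, a plain-Java sketch (the real Adlib.shift moves the same data between processes; this only mirrors the index arithmetic):

```java
// Sequential sketch of a cyclic shift: dst[i] = src[(i - amount) mod n].
// Illustrative only; Adlib.shift performs this across process boundaries.
public class ShiftSketch {
    static void shift(double[] dst, double[] src, int amount) {
        int n = src.length;
        for (int i = 0; i < n; i++)
            dst[i] = src[((i - amount) % n + n) % n];  // wrap around the edges
    }
    public static void main(String[] args) {
        double[] src = {0, 1, 2, 3};
        double[] dst = new double[4];
        shift(dst, src, 1);   // shift every element one place to the right
        System.out.println(java.util.Arrays.toString(dst)); // [3.0, 0.0, 1.0, 2.0]
    }
}
```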

21
Regular Collective Communications II
  • writeHalo
    void writeHalo(T [[-]] a) ;
  • Applied to distributed arrays that have ghost
    regions. It updates those regions.
  • A more general form of writeHalo allows one to
    specify that only a subset of the available ghost
    area is to be updated:
    void writeHalo(T [[-]] a, int [] wlo, int [] whi,
                   int mode) ;
  • wlo and whi specify the widths at the lower and
    upper ends of the bands to be updated.

22
Solution of Laplace equation using ghost regions
    Range x = new ExtBlockRange(M, p.dim(0), 1) ;
    Range y = new ExtBlockRange(N, p.dim(1), 1) ;
    float [[-,-]] a = new float [[x, y]] ;
    // ... initialize values in 'a'
    float [[-,-]] b = new float [[x, y]],
          r = new float [[x, y]] ;
    do {
      Adlib.writeHalo(a) ;
      overall (i = x for 1 : N - 2)
        overall (j = y for 1 : N - 2) {
          float newA = 0.25F * (a[i - 1, j] + a[i + 1, j] +
                                a[i, j - 1] + a[i, j + 1]) ;
          r[i, j] = Math.abs(newA - a[i, j]) ;
          b[i, j] = newA ;
        }
      HPspmd.copy(a, b) ;
    } while (Adlib.maxval(r) > EPS) ;

Figure: the distributed array a with ghost regions. Each
process's segment is bordered by ghost cells holding
copies of the edge elements owned by neighboring
processes.
23
Illustration of the effect of executing the writeHalo
function. Legend: the physical segment of the array; the
declared ghost region of the array segment; the ghost
area written by writeHalo.
24
Other features of Adlib
  • Provides reduction operations (e.g. maxval() and
    sum()) and irregular communications (e.g.
    gather() and scatter()).
  • The complete API and implementation issues are
    described in depth in my dissertation.
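Sequential analogues of these operations give the flavor of their semantics; the real versions operate collectively on distributed arrays, and the names and signatures here are simplified:

```java
// Sequential sketches of maxval, sum, and gather semantics.
// Simplified stand-ins, not the Adlib signatures themselves.
public class ReductionSketch {
    static double maxval(double[] a) {
        double m = a[0];
        for (double v : a) m = Math.max(m, v);   // global maximum
        return m;
    }
    static double sum(double[] a) {
        double s = 0;
        for (double v : a) s += v;               // global sum
        return s;
    }
    // gather: dst[i] = src[subscript[i]]
    static void gather(double[] dst, double[] src, int[] subscript) {
        for (int i = 0; i < dst.length; i++)
            dst[i] = src[subscript[i]];
    }
    public static void main(String[] args) {
        double[] a = {3, 1, 4, 1, 5};
        System.out.println(maxval(a) + " " + sum(a));   // 5.0 14.0
        double[] dst = new double[3];
        gather(dst, a, new int[]{4, 0, 2});
        System.out.println(java.util.Arrays.toString(dst)); // [5.0, 3.0, 4.0]
    }
}
```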

25
Other High-level APIs
  • Java Grande Message-Passing Working Group
  • Formed as a subset of the existing Concurrency
    and Applications working group of the Java Grande
    Forum.
  • Discussion of a common API for MPI-like Java
    libraries.
  • To avoid confusion with standards published by
    the original MPI Forum, the API was called MPJ.
  • The java-mpi mailing list has about 195
    subscribers.

26
mpiJava
  • Implements a Java API for MPI suggested in late
    '97.
  • mpiJava is currently implemented as a Java
    interface to an underlying MPI implementation,
    such as MPICH or some other native MPI
    implementation.
  • The interface between mpiJava and the underlying
    MPI implementation is via the Java Native
    Interface (JNI).
  • This software is available from
    http://www.hpjava.org/mpiJava.html
  • Around 1465 people have downloaded this software.

27
Low-level API
  • One area of research is how to transfer data
    between the Java program and the network while
    reducing the overheads of the Java Native
    Interface.
  • Should do this portably on network platforms and
    efficiently on parallel hardware.
  • We developed a low-level Java API for HPC message
    passing, called mpjdev.
  • The mpjdev API is a device-level communication
    library. This library is developed with HPJava in
    mind, but it is a standalone library and could be
    used by other systems.

28
mpjdev I
  • Meant for library developers.
  • Application-level communication libraries like
    the Java version of Adlib (or potentially MPJ)
    can be implemented on top of mpjdev.
  • The API for mpjdev is small compared to MPI (it
    only includes point-to-point communications):
  • Blocking mode (like MPI_SEND, MPI_RECV)
  • Non-blocking mode (like MPI_ISEND, MPI_IRECV)
  • The sophisticated data types of MPI are omitted.
  • Provides a flexible suite of operations for
    copying data to and from the buffer (like gather-
    and scatter-style operations).
  • Buffer handling has similarities to the JDK 1.4
    new I/O.
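That buffer-handling style can be illustrated with plain JDK 1.4 java.nio. This is not the actual mpjdev Buffer class, only a sketch of gather-style packing into a flat buffer and unpacking out of it:

```java
import java.nio.ByteBuffer;

// Sketch: gather strided elements into a flat ByteBuffer, then read
// them back. Plain java.nio, shown only to illustrate the style;
// the class and method names are hypothetical.
public class BufferSketch {
    // Pack every 'stride'-th double of data into a fresh buffer.
    static ByteBuffer gather(double[] data, int stride) {
        int count = (data.length + stride - 1) / stride;
        ByteBuffer buf = ByteBuffer.allocate(8 * count);
        for (int i = 0; i < data.length; i += stride)
            buf.putDouble(data[i]);
        buf.flip();                               // switch buffer to reading mode
        return buf;
    }
    // Unpack the packed doubles back into an array.
    static double[] unpack(ByteBuffer buf) {
        double[] out = new double[buf.remaining() / 8];
        for (int i = 0; i < out.length; i++)
            out[i] = buf.getDouble();
        return out;
    }
    public static void main(String[] args) {
        double[] data = {0, 1, 2, 3, 4, 5, 6, 7};
        double[] unpacked = unpack(gather(data, 2));
        System.out.println(java.util.Arrays.toString(unpacked)); // [0.0, 2.0, 4.0, 6.0]
    }
}
```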

29
mpjdev II
  • mpjdev could be implemented on top of Java
    sockets in a portable network implementation, or,
    on HPC platforms, through a JNI interface to a
    subset of MPI.
  • Currently there are three different
    implementations:
  • The initial version was targeted to HPC
    platforms, through a JNI interface to a subset of
    MPI.
  • For SMPs, and for debugging on a single
    processor, we implemented a pure-Java,
    multithreaded version.
  • We also developed a more system-specific mpjdev
    built on the IBM SP system using LAPI.
  • A Java sockets version, which will provide a more
    portable network implementation, will be added in
    the future.

30
HPJava communication layers
Other application- level API
MPJ and
Java version of Adlib
mpjdev
Pure Java
Native MPI
SMPs or Networks of PCs
Parallel Hardware (e.g. IBM SP3, Sun HPC)
31
mpiJava-based Implementation
  • Assumes the C binding of native method calls to
    MPI from mpiJava as the basic communication
    protocol.
  • Can be divided into two parts:
  • Java APIs (Buffer and Comm classes) which are
    used to call native methods via JNI.
  • C native methods that construct the message
    vector and perform communication.
  • For elements of Object type, the serialized data
    are stored into a Java byte array,
  • copying into the existing message vector if it
    has space to hold the serialized data array,
  • or using a separate send if the original message
    vector is not large enough.
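The serialization step itself can be illustrated with standard java.io; the message-vector management around it happens in native code and is not shown, and the class name here is hypothetical:

```java
import java.io.*;

// Sketch: serializing an Object element to a byte array and back,
// as the mpiJava-based implementation does before handing the bytes
// to the native communication layer.
public class SerializeSketch {
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);             // serialized form of the element
        }
        return bytes.toByteArray();
    }
    static Object fromBytes(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();           // reconstruct the element
        }
    }
    public static void main(String[] args) throws Exception {
        byte[] wire = toBytes("element");     // stand-in for an array element
        System.out.println(fromBytes(wire));  // prints: element
    }
}
```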

32
Multithreaded Implementation
  • The processes of an HPJava program are mapped to
    the Java threads of a single JVM.
  • This allows us to debug and demonstrate HPJava
    programs without facing the ordeal of installing
    MPI or running on a network.
  • As a by-product, it also means we can run HPJava
    programs on shared-memory parallel computers,
  • e.g. high-end UNIX servers.
  • Java threads of modern JVMs are usually executed
    in parallel on these kinds of machines.
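The threads-as-processes idea can be sketched with plain Java threads. CyclicBarrier here is a modern (JDK 5) stand-in for the synchronization the multithreaded mpjdev provides internally, and all names are hypothetical:

```java
import java.util.concurrent.CyclicBarrier;

// Sketch: each "process" is a thread of one JVM; a barrier stands in
// for the synchronization that mpjdev communication would impose.
public class ThreadSpmd {
    static int runSum(int P) throws InterruptedException {
        final int[] partial = new int[P];
        final CyclicBarrier barrier = new CyclicBarrier(P);
        Thread[] procs = new Thread[P];
        for (int p = 0; p < P; p++) {
            final int rank = p;
            procs[p] = new Thread(() -> {
                partial[rank] = rank + 1;   // each rank computes its local value
                try {
                    barrier.await();        // all ranks meet here, SPMD-style
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            procs[p].start();
        }
        for (Thread t : procs) t.join();
        int sum = 0;
        for (int v : partial) sum += v;     // combine after all ranks finish
        return sum;
    }
    public static void main(String[] args) throws InterruptedException {
        System.out.println(runSum(4));      // 1 + 2 + 3 + 4 = 10
    }
}
```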

33
LAPI Implementation
  • The Low-level Application Programming Interface
    (LAPI) is a low-level communication interface for
    the IBM Scalable POWERparallel (SP) supercomputer
    switch.
  • This switch provides scalable high-performance
    communication between SP nodes.
  • LAPI functions can be divided into three
    characteristic groups:
  • Active message infrastructure, which allows
    programmers to write and install their own set of
    handlers.
  • Two Remote Memory Copy (RMC) interfaces:
  • the PUT operation copies data from the address
    space of the origin process into the address
    space of the target process;
  • the GET operation is the opposite of the PUT
    operation.
  • We produced two different implementations of
    mpjdev using LAPI:
  • Active message function (LAPI_Amsend) plus GET
    operation (LAPI_Get)
  • Active message function (LAPI_Amsend) only
34
LAPI Implementation: Active message and GET operation
35
LAPI Implementation: Active message
36
Applications and Performance
37
Environments
  • System: IBM SP3 supercomputing system with AIX
    4.3.3 operating system and 42 nodes.
  • CPU: A node has four processors (POWER3, 375 MHz)
    and 2 gigabytes of shared memory.
  • Network/MPI setting: shared css0 adapter with
    User Space (US) communication mode.
  • Java VM: IBM's JIT.
  • Java compiler: IBM J2RE 1.3.1 with -O option.
  • HPF compiler: IBM xlhpf95 with -qhot and -O3
    options.
  • Fortran 95 compiler: IBM xlf95 with -O5 option.

38
  • HPJava can out-perform sequential Java by up to
    17 times.
  • On 36 processors HPJava can get about 79% of the
    performance of HPF.

39
(No Transcript)
40
Multigrid
  • The multigrid method is a fast algorithm for the
    solution of linear and nonlinear problems. It
    uses a hierarchy of grids with restrict and
    interpolate operations between the current grid
    (fine grid) and restricted grids (coarse grid).
  • The general strategy is:
  • make the error smooth by performing a relaxation
    method;
  • restrict a smoothed version of the error term to
    a coarse grid, compute a correction term on the
    coarse grid, then interpolate this correction
    back to the original fine grid;
  • perform some steps of the relaxation method again
    to improve the original approximation to the
    solution.
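The restrict and interpolate operators above can be sketched in one dimension, assuming standard full-weighting restriction and linear interpolation; this is a simplification of the real 2D solver, and the class name is hypothetical:

```java
// 1D sketch of the two grid-transfer operators used by multigrid.
public class GridTransfer {
    // Full-weighting restriction:
    // coarse[i] = (fine[2i-1] + 2*fine[2i] + fine[2i+1]) / 4
    static double[] restrict(double[] fine) {
        int nc = (fine.length - 1) / 2 + 1;     // coarse grid size
        double[] coarse = new double[nc];
        coarse[0] = fine[0];                    // keep boundary values
        coarse[nc - 1] = fine[fine.length - 1];
        for (int i = 1; i < nc - 1; i++)
            coarse[i] = 0.25 * (fine[2*i - 1] + 2 * fine[2*i] + fine[2*i + 1]);
        return coarse;
    }
    // Linear interpolation back to the fine grid.
    static double[] interpolate(double[] coarse) {
        double[] fine = new double[2 * (coarse.length - 1) + 1];
        for (int i = 0; i < coarse.length; i++)
            fine[2*i] = coarse[i];              // copy coincident points
        for (int i = 1; i < fine.length; i += 2)
            fine[i] = 0.5 * (fine[i - 1] + fine[i + 1]);  // average neighbors
        return fine;
    }
    public static void main(String[] args) {
        double[] fine = {0, 1, 2, 3, 4};        // 5-point fine grid
        double[] coarse = restrict(fine);       // 3-point coarse grid
        System.out.println(java.util.Arrays.toString(coarse));            // [0.0, 2.0, 4.0]
        System.out.println(java.util.Arrays.toString(interpolate(coarse))); // [0.0, 1.0, 2.0, 3.0, 4.0]
    }
}
```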

41
  • Speedup is relatively modest. This seems to be
    due to the complex pattern of communication in
    this algorithm.

42
Speedup of HPJava Benchmarks
Multigrid Solver
  Processors       2      3      4      6      9
  512^2         1.90   2.29   2.39   2.96   3.03

2D Laplace Equation
  Processors       4      9     16     25     36
  256^2         2.67   3.73   4.67   6.22   6.22
  512^2         4.03   7.70  10.58  12.09  16.93
  1024^2        4.41   8.82  13.40  19.71  25.77

3D Diffusion Equation
  Processors       2      4      8     16     32
  32^3          1.58   2.72   3.45   4.75   5.43
  64^3          1.67   3.00   4.85   7.47   8.92
  128^3         1.80   3.31   5.76   9.98  13.88
43
HPJava with GUI
  • Illustrates how our HPJava can be used with a
    Java graphical user interface.
  • The Java multithreaded implementation of mpjdev
    makes it possible for HPJava to cooperate with
    the Java AWT.
  • For test and demonstration of the multithreaded
    version of mpjdev, we implemented computational
    fluid dynamics (CFD) code using HPJava.
  • Illustrates usage of Java objects in our
    communication library.
  • You can view this demonstration and source code
    at http://www.hpjava.org/demo.html

44
(No Transcript)
45
  • Removed the graphical part of the CFD code and
    did performance tests on the computational part
    only.
  • Changed a 2-dimensional Java object distributed
    array into a 3-dimensional double distributed
    array to eliminate object serialization overhead.
  • Used the HPC implementation of the underlying
    communication to run the code on an SP.

46
LAPI mpjdev Performance
  • We found that the current version of Java thread
    synchronization is not implemented with high
    performance.
  • A Java thread takes more than five times as long
    as a POSIX thread to perform wait and wake thread
    function calls:
  • 57.49 microseconds (Java thread) vs. 10.68
    microseconds (POSIX).
  • This result suggests we should look for a new
    architectural design for mpjdev using LAPI.
  • Consider using POSIX threads, by calling C
    through JNI, instead of Java threads.

47
(No Transcript)
48
Contributions
49
Contributions I
  • HPJava
  • My main contribution to HPJava was to develop the
    runtime communication libraries.
  • The Java version of Adlib has been developed as
    an application-level communication library
    suitable for data-parallel programming in Java.
  • The mpjdev API has been developed as a
    device-level communication library.
  • Other contributions: the Type-Analyzer for
    analyzing bytecode class hierarchies, parts of
    the type-checker, and the pre-translator of
    HPJava were developed.
  • Some applications of HPJava and full test codes
    for Adlib and mpjdev were developed.

50
Contributions II
  • mpiJava
  • My main contribution to the mpiJava project was
    to add support for direct communication of Java
    objects via serialization.
  • A complete set of mpiJava test cases for Java
    object types was developed.
  • Maintaining mpiJava.

51
Conclusions I
  • We have discussed in detail the design and
    development of high-level and low-level runtime
    communication libraries for HPJava.
  • The Adlib API is presented as a high-level
    communication library. This API is intended as an
    example of an application communication library
    suitable for data-parallel programming in Java.
  • It fully supports Java object types as part of
    the basic data types.
  • The API and usage of the collective
    communications were described.
  • It is based on a low-level communication library
    called mpjdev.
  • The mpjdev API is a device-level communication
    library. This library is developed with HPJava in
    mind, but it is a standalone library and could be
    used by other systems. The three different
    implementations are:
  • mpiJava-based implementation
  • Multithreaded implementation
  • LAPI implementation

52
Conclusion II
  • Some benchmark results were presented.
  • We got good performance on simple applications
    without any serious optimization. For example:
  • HPJava can out-perform sequential Java by up to
    17 times.
  • On a large number of processors HPJava can get
    about 79% of the performance of HPF.
  • The Laplace equation with size 1024^2 got about
    25 times speedup on 36 processors.