The Jacquard Programming Environment

1
The Jacquard Programming Environment
Mike Stewart
NUG User Training, 10/3/05
2
Outline
  • Compiling and Linking.
  • Optimization.
  • Libraries.
  • Debugging.
  • Porting from Seaborg and other systems.

3
Pathscale Compilers
  • Default compilers: Pathscale Fortran 90, C, and
    C++.
  • The path module is loaded by default and points to
    the current default version of the Pathscale
    compilers (currently 2.2.1).
  • Other versions available: module avail path.
  • Extensive vendor documentation available on-line
    at http://pathscale.com/docs.html.
  • Commercial product: well supported and optimized.

4
Compiling Code
  • Compiler invocation:
      • No MPI: pathf90, pathcc, pathCC.
      • MPI: mpif90, mpicc, mpicxx.
  • The MPI compiler invocation will use the
    currently loaded compiler version.
  • The MPI and non-MPI compiler invocations have the
    same options and arguments.

5
Compiler Optimization Options
  • 4 numeric levels -On, where n ranges from 0 (no
    optimization) to 3.
  • Default level: -O2 (unlike IBM).
  • -g without a -O option changes the default to -O0.

6
-O1 Optimization
  • Minimal impact on compilation time compared to
    a -O0 compile.
  • Applies only optimizations on straight-line code
    (basic blocks), such as instruction scheduling.

7
-O2 Optimization
  • Default when no optimization arguments given.
  • Optimizations that always increase performance.
  • Can significantly increase compilation time.
  • -O2 optimization examples:
      • Loop nest optimization.
      • Global optimization within a function scope.
      • 2 passes of instruction scheduling.
      • Dead code elimination.
      • Global register allocation.

8
-O3 Optimization
  • More extensive optimizations that may in some
    cases slow down performance.
  • Optimizes loop nests rather than just inner
    loops, e.g. inverts indices.
  • Safe optimizations: produce answers identical
    to those produced by -O0.
  • NERSC recommends this level, based on experience
    with benchmarks.

9
-Ofast Optimization
  • Equivalent to -O3 -ipa -fno-math-errno
    -OPT:roundoff=2:Olimit=0:div_split=ON:alias=typed.
  • -ipa: interprocedural analysis.
      • Optimizes across function boundaries.
      • Must be specified at both compile and link time.
  • Aggressive, unsafe optimizations:
      • Changes order of evaluation.
      • Deviates from IEEE 754 standard to obtain better
        performance.
  • There are some known problems with this level of
    optimization in the current release, 2.2.1.

10
NAS B Serial Benchmarks Performance (MOP/S)
11
NAS B Serial Benchmarks Compile Times (seconds)
12
NAS B Optimization Arguments Used by LNXI Benchmarkers
13
NAS C FT (32 Proc)
14
SuperLU MPI Benchmark
  • Based on the SuperLU general purpose library for
    the direct solution of large, sparse,
    nonsymmetric systems of linear equations.
  • Mostly C with some Fortran 90 routines.
  • Run on 64 processors/32 nodes.
  • Uses BLAS routines from ACML.

15
SLU (64 procs)
16
Jacquard Applications Acceptance Benchmarks
17
ACML Library
  • AMD Core Math Library: a set of numerical routines
    tuned specifically for AMD64 platform processors.
      • BLAS
      • LAPACK
      • FFT
  • To use with pathscale:
      • module load acml (built with pathscale compilers)
      • Compile and link with ACML
  • To use with gcc:
      • module load acml_gcc (built with pathscale
        compilers)
      • Compile and link with ACML

18
Matrix Multiply Optimization Example
  • 3 ways to multiply 2 dense matrices:
      • Directly in Fortran with nested loops
      • Matmul F90 intrinsic
      • dgemm from ACML
  • Example: two 1000 by 1000 double precision matrices.
  • Order of indices ijk means:
      do i=1,n
        do j=1,n
          do k=1,n

19
Fortran Matrix Multiply MFLOPs
20
Debugging
  • Etnus Totalview debugger has been installed on
    the system.
  • Still in testing mode, but it should be available
    to users soon.

21
Porting codes
  • Jacquard is a Linux system, so GNU tools like
    gmake are the defaults.
  • Pathscale compilers are good, but new, so please
    report any evident compiler bugs to consult.