Title: Challenges in Performance Evaluation and Improvement of Scientific Codes
1 Challenges in Performance Evaluation and Improvement of Scientific Codes
- Boyana Norris
- Argonne National Laboratory
- http://www.mcs.anl.gov/norris
- Ivana Veljkovic
- Pennsylvania State University
2 Outline
- Performance evaluation challenges
- Component-based approach
- Motivating example: adaptive linear system solution
- A component infrastructure for performance monitoring and adaptation of applications
- Summary and future work
3 Acknowledgments
- Ivana Veljkovic, Padma Raghavan (Penn State)
- Sanjukta Bhowmick (ANL/Columbia)
- Lois Curfman McInnes (ANL)
- TAU developers (U. Oregon)
- PERC members
- Sponsors: DOE and NSF
4 Challenges in performance evaluation
- Many tools for performance data gathering and analysis
  - PAPI, TAU, SvPablo, Kojak, ...
  - Various interfaces, levels of automation, and approaches to information presentation
- User's point of view
  - What do the different tools do? Which is most appropriate for a given application?
  - (How) can multiple tools be used in concert?
  - I have tons of performance data, now what?
  - What automatic tuning tools are available, and what exactly do they do?
  - How hard is it to install/learn/use tool X?
  - Is instrumented code portable? What's the overhead of instrumentation? How does code evolution affect the performance analysis process?
5 Incomplete list of tools
- Source instrumentation: TAU/PDT, KOJAK (MPI/OpenMP), SvPablo, Performance Assertions, ...
- Binary instrumentation: HPCToolkit, Paradyn, DyninstAPI, ...
- Performance monitoring: MetaSim Tracer (memory), PAPI, HPCToolkit, Sigma (memory), DPOMP (OpenMP), mpiP, gprof, psrun, ...
- Modeling/analysis/prediction: MetaSim Convolver (memory), DIMEMAS (network), SvPablo (scalability), Paradyn, Sigma, ...
- Source/binary optimization: Automated Empirical Optimization of Software (ATLAS), OSKI, ROSE
- Runtime adaptation: ActiveHarmony, SALSA
7 Challenges (where is the complexity?)
- More effective use => integration
- Tool developer's perspective
  - Overhead of initially implementing one-to-one interoperability
  - Managing dependencies on other tools
  - Maintaining interoperability as different tools evolve
- Individual scientist's perspective
  - Learning curve for performance tools => less time to focus on own research (modeling, physics, mathematics)
  - Potentially significant time investment needed to find out whether/how using someone else's tool would improve performance => tend to do own hand-coded optimizations (time-consuming, non-reusable)
  - Lack of tools that automate (at least partially) algorithm discovery, assembly, and configuration, and that enable runtime adaptivity
8 What can be done
- How to manage complexity? Provide:
  - Performance tools that are truly interoperable
  - Uniform, easy access to tools
  - Component implementations of software, especially software supporting numerical codes, such as linear algebra algorithms
  - New algorithms (e.g., interactive/dynamic techniques, algorithm composition)
- Implementation approach: components, both for tools and the application software
9 What is being done
- No integrated environment for performance monitoring, analysis, and optimization
- Most past efforts
  - One-to-one tool interoperability
- More recently
  - OSPAT (initial meeting at SC04), focus on common data representation and interfaces
  - Tool-independent performance databases: PerfDMF
  - Eclipse parallel tools project (LANL)
10 OSPAT
- The following areas were recommended for OSPAT to investigate:
  - A common instrumentation API for source-level, compiler-level, library-level, and binary instrumentation
  - A common probe interface for routine entry and exit events
  - A common profile database schema
  - An API to walk the callstack and examine the heap memory
  - A common API for thread creation and fork interface
  - Visualization components for drawing histograms and hierarchical displays typically used by performance tools
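To make the idea of a common probe interface for routine entry and exit events more concrete, here is a minimal Python sketch. The names `ProbeListener`, `ProfileRecorder`, `register`, and `probed` are hypothetical and are not taken from OSPAT, TAU, or any existing tool; the sketch only shows how a single pair of entry/exit hooks could feed any number of tools.

```python
import time
from collections import defaultdict
from functools import wraps


class ProbeListener:
    """Hypothetical tool-side interface: a tool implements these two hooks."""

    def on_entry(self, routine, timestamp):
        pass

    def on_exit(self, routine, timestamp):
        pass


class ProfileRecorder(ProbeListener):
    """Example listener that accumulates call counts and elapsed time."""

    def __init__(self):
        self.calls = defaultdict(int)
        # Elapsed time summed over all calls; recursive calls are counted
        # at every nesting level.
        self.total_time = defaultdict(float)
        # Per-routine stack of entry timestamps so nested calls pair correctly.
        self._starts = defaultdict(list)

    def on_entry(self, routine, timestamp):
        self._starts[routine].append(timestamp)

    def on_exit(self, routine, timestamp):
        start = self._starts[routine].pop()
        self.calls[routine] += 1
        self.total_time[routine] += timestamp - start


_listeners = []


def register(listener):
    """Attach a tool; every probed routine will notify it from now on."""
    _listeners.append(listener)


def probed(func):
    """Instrument a routine with entry and exit events (illustrative only)."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        name = func.__name__
        now = time.perf_counter()
        for listener in _listeners:
            listener.on_entry(name, now)
        try:
            return func(*args, **kwargs)
        finally:
            now = time.perf_counter()
            for listener in _listeners:
                listener.on_exit(name, now)

    return wrapper


recorder = ProfileRecorder()
register(recorder)


@probed
def axpy(a, x, y):
    """Toy numerical kernel: returns a*x + y elementwise."""
    return [a * xi + yi for xi, yi in zip(x, y)]


result = axpy(2.0, [1.0, 2.0], [3.0, 4.0])
```

Because the instrumented routine only knows about the listener interface, a second tool could be registered alongside `recorder` without touching `axpy`.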
11 Components
- Working definition: a component is a piece of software that can be composed with other components within a framework; composition can be either static (at link time) or dynamic (at run time)
  - "Plug-and-play" model for building applications
  - For more info: C. Szyperski, Component Software: Beyond Object-Oriented Programming, ACM Press, New York, 1998
- Components enable
  - Tool interoperability
  - Automation of performance instrumentation/monitoring
  - Application adaptivity (automated or user-guided)
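The working definition above can be illustrated with a small Python sketch of dynamic (run-time) composition. The `Framework` class, its `provide`/`use` methods, and the port name `"linear_solver"` are hypothetical names chosen for this example; they are not the API of CCA or any other component framework.

```python
class Framework:
    """Hypothetical minimal framework: a registry of named port implementations."""

    def __init__(self):
        self._providers = {}

    def provide(self, port_name, implementation):
        """Register (or replace) the component that provides a port."""
        self._providers[port_name] = implementation

    def use(self, port_name):
        """Look up the component currently connected to a port."""
        return self._providers[port_name]


def jacobi_solver(A, b, iterations=200):
    """Jacobi iteration; assumes A is diagonally dominant."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


def gauss_seidel_solver(A, b, iterations=200):
    """Gauss-Seidel iteration; same assumption on A as above."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x


class Application:
    """Application component: depends only on the port name, not on a solver."""

    def __init__(self, framework):
        self.framework = framework

    def run(self, A, b):
        solve = self.framework.use("linear_solver")
        return solve(A, b)


framework = Framework()
app = Application(framework)
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

framework.provide("linear_solver", jacobi_solver)
x_jacobi = app.run(A, b)

# Swap the solver component at run time without changing the application.
framework.provide("linear_solver", gauss_seidel_solver)
x_gauss_seidel = app.run(A, b)
```

The application code is identical in both runs; only the component connected to the port changes, which is the property that makes automated or user-guided adaptivity possible.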
12 Example: component infrastructure for multimethod linear solvers
- Goal: provide a framework for
  - Performance monitoring of numerical components
  - Dynamic adaptivity, based on
    - Off-line analyses of past performance information
    - Online analysis of current execution performance information
- Motivating application examples
  - Driven cavity flow (Coffey et al., 2003), nonlinear PDE solution
  - FUN3D: incompressible and compressible Euler equations
- Prior work in multimethod linear solvers
  - McInnes et al., 03; Bhowmick et al., 03 and 05; Norris et al., 05
13 Example: driven cavity flow
- Linear solver: GMRES(30), vary only fill level of ILU preconditioner
- Adaptive heuristic based on
  - Previous linear solution convergence rate, nonlinear solution convergence rate, rate of increase of linear solution iterations
- 96x96 mesh, Grashof number 10^5, lid velocity 100
- Intel P4 Xeon, dual 2.2 GHz, 4 GB RAM
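The slide names the inputs to the adaptive heuristic but not the rule itself. The Python sketch below shows one way such a rule could be structured; the thresholds (0.9, 1.5, 0.5), the fill-level bounds, and the names `StepStats` and `next_fill_level` are illustrative assumptions, not the heuristic used in the cited work.

```python
from dataclasses import dataclass


@dataclass
class StepStats:
    """Measurements from one nonlinear iteration (field names are assumptions)."""
    linear_iterations: int        # GMRES iterations used by the linear solve
    linear_residual_ratio: float  # final / initial linear residual norm
    nonlinear_residual: float     # nonlinear residual norm after the step


def next_fill_level(fill, prev, curr, min_fill=0, max_fill=3):
    """Pick the ILU fill level for the next linear solve (illustrative rule)."""
    # Average residual reduction per GMRES iteration; values near 1 mean slow.
    linear_rate = curr.linear_residual_ratio ** (1.0 / max(curr.linear_iterations, 1))
    # Growth in linear iterations relative to the previous nonlinear step.
    iteration_growth = curr.linear_iterations / max(prev.linear_iterations, 1)
    # Nonlinear residual reduction; values below 1 mean the outer solve converges.
    nonlinear_rate = curr.nonlinear_residual / prev.nonlinear_residual

    # Linear solves are getting harder: use a stronger (costlier) preconditioner.
    if linear_rate > 0.9 or iteration_growth > 1.5:
        return min(fill + 1, max_fill)
    # Linear solves are easy and the outer iteration converges: use a cheaper one.
    if linear_rate < 0.5 and iteration_growth <= 1.0 and nonlinear_rate < 1.0:
        return max(fill - 1, min_fill)
    return fill


# Linear iterations more than doubled: the fill level is raised.
harder = next_fill_level(1, StepStats(20, 1e-5, 1.0), StepStats(45, 1e-5, 0.8))
# Fast linear convergence and a shrinking nonlinear residual: the level is lowered.
easier = next_fill_level(1, StepStats(12, 1e-5, 0.8), StepStats(10, 1e-5, 0.1))
# Neither condition holds: the level is kept.
steady = next_fill_level(1, StepStats(20, 1e-5, 1.0), StepStats(22, 1e-5, 0.9))
```

The rule trades preconditioner cost against linear iteration count: a higher fill level makes each ILU factorization more expensive but usually reduces the number of GMRES iterations.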
14 Example: compressible PETSc-FUN3D
- Finite volume discretization, variable-order Roe scheme on a tetrahedral, vertex-centered mesh
- Initial discretization: first-order scheme; switch to second-order after shock position has settled down
- Large sparse linear system solution takes approximately 72% of overall solution time
Original FUN3D developer: W.K. Anderson et al., NASA Langley
Image: Dinesh Kaushik
15 PETSc-FUN3D, cont.
- A3: Nonsequence-based adaptive strategy based on polynomial interpolation (Bhowmick et al., 05)
- A3 vs. base method time: from 1% slowdown to 32% improvement
- Hand-tuned adaptive vs. base method time: 7% to 42% improvement
16 Component architecture
[Figure: component architecture diagram. Components: Off-line analysis, PerfDMF, Runtime DB, Metadata extractor, Checkpoint, TAU, Monitor, Experiment. Connection labels: extract, insert, query, checkpoint, adapt request; start, stop, trigger; adapt algorithm, parameters.]
17 Future work
- Integration of ongoing efforts in
  - Performance tools: common interfaces and data representation (leverage OSPAT, PerfDMF, TAU performance interfaces, and similar efforts)
  - Numerical components: emerging common interfaces (e.g., TOPS solver interfaces) increase choice of solution method => automated composition and adaptation strategies
- Long term
  - Is a more organized (but not too restrictive) environment for scientific software lifecycle development possible/desirable?
18 Typical application development cycle
[Figure: application development cycle diagram. Labels: Design; Implementation; Configure, make; Compilation, Linking; Ext. dependencies, Version control; Debugging; Testing; Performance evaluation; Performance tools; Deployment; Production Execution; Job management, Results.]
19 Future work
- Beyond components
  - Work flow
  - Reproducible results: associate all necessary information for reproducing a particular application instance
  - Ontology of tools and tools to guide selection and use
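As one possible reading of "associate all necessary information for reproducing a particular application instance", the Python sketch below bundles run parameters, environment details, and an input checksum into a single record. The function name `provenance_record`, the chosen fields, and the sample parameter values are assumptions made for illustration; a real record would likely also need compiler flags, library versions, and source revisions.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def provenance_record(parameters, input_bytes):
    """Collect information that helps reproduce one run (illustrative fields)."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "command_line": list(sys.argv),
        "parameters": parameters,
        # A checksum identifies the exact input without storing it in the record.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
    }


# Hypothetical run description; the parameter names are placeholders.
record = provenance_record(
    {"solver": "gmres", "restart": 30, "ilu_fill": 1},
    b"placeholder input data",
)
serialized = json.dumps(record, indent=2, sort_keys=True)
```

Storing the serialized record next to the results (or in a performance database) keeps the description of a run attached to the data it produced.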
20 Summary
- No shortage of performance evaluation, analysis, and optimization technology (and new capabilities are continuously added)
- Little shared infrastructure, limiting the utility of performance technology in scientific computing
- Components, both in performance tools and in numerical software, can be used to manage complexity and enable better performance through dynamic adaptation or multimethod solvers
- A life-cycle environment may be the best long-term solution
- Some relevant sites
  - http://www.mcs.anl.gov/norris
  - http://perc.nersc.gov (performance tools)
  - http://cca-forum.org (component specification)