Title: Performance Analysis Tools
1Performance Analysis Tools
- Nadya Williams
- Spring, 2000
- UCSD
2Outline
- Background
- Performance measurement
- SvPablo
- Autopilot
- Paradyn
- XPVM
3Background
- Goal: high-performance computing for applications that are distributed
- by design, e.g. collaborative environments, distributed data analysis, computer-enhanced instruments
- by implementation, e.g. metacomputing, high-throughput computing
- Goal: to achieve and maintain performance guarantees in heterogeneous, dynamic environments
4Background
- Performance-robust grid applications need to
- identify the resources required to meet application performance requirements
- select from problem specifications, algorithms, and code variants
- establish hierarchical performance contracts
- select and manage adaptation strategies when performance contracts are violated
5Computational grids
[Diagram: an MPP, a real-time data analysis engine, and a visualization/steering engine, connected by networks]
- Shared resources
- computation, network, and data archives
6Complexity
- Emerging applications are dynamic
- time varying resource demands
- time varying resource availability
- heterogeneous execution environments
- geographically distributed
- Display and analysis hierarchy
- code, thread, process, processor
- system and local area network
- national/international network
7Grid performance challenges
- Wide area infrastructure
- Many resource models
- Behavioral variability
- complex applications, diverse systems and networks
- irreproducible behavior
- Heterogeneous applications
- multilingual and multimodel
- real-time constraints and shared resources
- Predictive scheduling
8Outline
- Background
- Performance measurement
- SvPablo
- Autopilot
- Paradyn
- XPVM
9Performance analysis
- The ability to
- capture
- analyze
- present
- optimize
- Multiple analysis levels
- hardware
- system software
- runtime systems
- libraries
- applications
Good tools must accommodate all
10Real-time Multilevel Analysis
- Multilevel Drilldown
- multiple sites
- multiple metrics
- real-time display
- Problems
- uncertainty and perturbation
- confusion of cause and effect
11Guidelines
- Design for locality
- regardless of programming model
- threads, MPI, data parallel -- it's the same
- Recognize historical models
- large codes develop over time
- assumptions change
- Think about more than FLOPS
- I/O, memory, networking, user interfaces
12Initial steps
- Develop infrastructure for structural and performance information
- Provide instrumentation of end-user applications and communication libraries
- Study performance characteristics of real grid applications
13Peak and Sustained Performance
- Peak performance
- perfect conditions
- Actual performance
- considerably less
- Environment dictates performance
- locality really matters
- we must design for performance stability
- more of less may be better than less of more
14Instrumentation approaches
- At least four major techniques
- profiling
- counting
- interval timing
- event tracing
- Each strikes a different balance
- detail and insight
- measurement perturbation
- Understand overheads and benefits
15Measurement developments
- Hardware counters
- once rare (Cray), now common (Sun, IBM, Intel, Compaq)
- metrics
- operation types
- memory stalls
- Object code patching
- run-time instrumentation
- Compiler integration
- inverse compiler transformations
- high-level language analysis
16Correlating semantic levels
- Performance measurements
- capture behavior of executing software
- reflect output of multi-level transformations
- Performance tools
- must relate data to user semantic model
- cache miss ratios cannot help a MATLAB user
- message counts cannot help an HPF user
- should suggest possible performance remedies
17Analysis developments
- Visualization techniques
- traces and statistics
- Search and destroy
- AI suggestions and consultants
- critical paths and zeroing
- Data reduction and processing
- statistical clustering/projection pursuit
- neural net and time series classification
- Real-time control
- sensor/actuator models
18Performance tool checkpoint
- An incomplete view
- representative techniques and tools
- Major evolution
- from architectural views/post-mortem analysis
- to deeper correlation and derived metrics
- Key open problems
- adaptivity
- scale
- semantic correlation
19Representative vendor tools
- IBM VT
- ParaGraph trace display and statistical metrics
- Silicon Graphics Speedshop
- R10000, R12000 hardware counter tools
- Pallas Vampir
- event tracing and display tools
- Cray ATExpert (autotasking)
- basic AI suggestions for tuning
- Intel SPV
- ParaGraph and hardware counter displays
- TMC/SUN Prism
- data parallel and message passing analysis
20Representative research tools
- Illinois SvPablo
- performance data metaformat
- Globus integration (sensor/actuator control)
- Illinois Autopilot
- performance steering
- Wisconsin Paradyn
- runtime code patching
- performance consultant
- Oak Ridge National Lab XPVM
- X Windows based, graphical console and monitor
for PVM
21Outline
- Background
- Performance measurement
- SvPablo
- Autopilot
- Paradyn
22SvPablo: graphical source code browser for performance tuning and visualization
- Department of Computer Science
- University of Illinois at Urbana-Champaign
23SvPablo Outline
- Background
- SvPablo overview
- SvPablo model
- Automatic/Interactive instrumentation of programs
- The Pablo Self-Defining Data Format
24SvPablo Background
- Motivations
- emerging high-level languages (HPF and HPC)
- aggressive code transformations for parallelism
- large semantic gap between user and code
- Goals
- relate dynamic performance data to source
- hide semantic gap
- generate instrumented executable/simulated code
- support performance scalability predictions
25Background
- Tools should provide performance data and suggestions for performance improvements at the level of an abstract, high-level program
- Tools should integrate dynamic performance data with information recorded by the compiler that describes the mapping from the high-level source to the resulting low-level explicitly parallel code
26SvPablo overview
- A graphical user interface tool for
- source code instrumentation
- browsing runtime performance data
- Two major components
- performance instrumentation libraries
- performance analysis and presentation
- Provides
- performance data capture
- analysis
- presentation
27SvPablo overview
- Instrumentation
- automatic
- HPF (from PGI)
- interactive
- ANSI C
- Fortran 77
- Fortran 90
- Data capture
- dynamic software statistics (no traces)
- SGI R10000 counter values
28SvPablo overview
- Source code instrumentation
- HPF: the PGI runtime system invokes instrumentation at each procedure call and each HPF source line
- C and Fortran programs are interactively instrumented at outer loops and function calls
- Instrumentation maintains a statistical summary
- Summaries are correlated across processors
- The correlated summary is input to the browser
29SvPablo overview
- Architectures
- any system with the PGI HPF compiler
- any system with F77 or F90
- C applications supported on
- single processor Unix workstations
- network of Unix workstations using MPI
- Intel Paragon
- Meiko CS2
- GUI supports
- Sun (Solaris)
- SGI (IRIX)
30Statistics metrics
- For procedures
- count
- exclusive / inclusive duration
- send / receive message duration (HPF only)
- For lines
- count
- duration
- exclusive duration
- message send and message receive (HPF only)
- duration
- count
- size
- event counters (SGI)
- Mean, STD, Min, Max
31SvPablo model
[Diagram: an application comprises performance contexts; each performance context ties source files to their performance data]
32New project dialog box
33HPF performance analysis data flow
[Data-flow diagram: HPF source code is compiled to instrumented object code; the linker combines it with the SvPablo data capture library into an instrumented executable; running it on the parallel architecture produces a performance file; SvPabloCombine merges the data for the graphical performance browser]
34HPF instrumentation
- pghpf -c -Mproflines source1.F
- pghpf -c -Mproflines source2.F
- pghpf -Mstats -o prog source1.o source2.o /usr/local/SvPablo/lib/pghpf2SDDF.o
- prog -pghpf -np 8
- SvPabloCombine HPF_SDDF
35Performance visualization
[Screenshot: metrics shown include count and exclusive duration]
36Performance metric selection dialog
37C / F77 / F90 data flow
[Data-flow diagram: create or edit a project; instrument C or Fortran files; the compiler turns the instrumented source into instrumented object code; the linker combines it with the SvPablo data capture library into an instrumented executable; running it on the parallel architecture produces per-process performance files; SvPabloCombine merges them into a performance file, which SvPablo visualizes]
38Interactive instrumentation
Instrumentable Constructs (function calls and
outer loops)
39Generating an instrumented executable program
- mpicc -c file1.Context1.inst.c
- mpicc -c file2.Context1.inst.c
- mpicc -c Context1/InstrumentationInit.c
- mpicc -o instFile InstrumentationInit.o file1.Context1.inst.o file2.Context1.inst.o svPabloLib.a
40SDDF a medium of exchange
- Self-Defining Data Format
- data meta-format language for performance data description
- specifies both data record structures and data record instances
- separates data structure and semantics
- allows the definition of records containing scalars and arrays
- supported by the Pablo SDDF library
41SDDF files: classes of records
- Command: conveys an action to be taken
- Stream Attribute: gives information pertinent to the entire file
- Record Descriptor: declares record structure
- Record Data: encapsulates data values
42Record descriptors
- Describe record layout
- Each Record Descriptor contains
- A unique tag and record name
- An optional Record Attribute
- Field Descriptors, each one containing
- an optional Field Attribute
- field type specifier
- field name
- optional field dimension
43SDDF record descriptor and record data
- tag: 300
- // "description" "PGI Line-Based Profile Record"
- record name: "PGI Line Profile"
- field descriptors:
- int "Line Number"
- int "Processor Number"
- int "Procedure ID"
- int "Count"
- double "Inclusive Seconds"
- double "Exclusive Seconds"
- int "Send Data Count"
- int "Send Data Byte"
- double "Send Data Seconds"
- int "Receive Data Count"
- int "Receive Data Byte"
- double "Receive Data Seconds"
- record data: "PGI Line Profile" 359, 27, 9, 4, 399384, 31.071, 31.071, 0, 0, 0, 0, 0, 0
44SvPablo language transparency
- Meta-format for performance data
- language defined by line and byte offsets
- metrics defined by mapping to offsets
- SDDF records
- performance mapping information
- performance measurements
- Result
- language independent performance browser
- mechanism for scalability model integration
45SvPablo conclusions
- Versatility: yes
- the analysis GUI is quite versatile and provides the ability to define new modules, but has a steep learning curve
- theoretically, any type of view could be constructed from the toolkit provided
- Portability: not quite
- intended for a wide range of parallel platforms and programming languages; the reality is different (Sun, SGI)
- Scalability: some
- the Pablo trace library monitors and dynamically alters the volume, frequency, and types of event data recorded
- unclear how much happens automatically versus under low-level user control
- predictions still need to be integrated
46Outline
- Background
- Performance measurement
- SvPablo
- Autopilot
- Paradyn
- XPVM
47Autopilot: a performance steering toolkit
- Provides a flexible infrastructure for real-time adaptive control of parallel and distributed computing resources
- Department of Computer Science
- University of Illinois at Urbana-Champaign
48Autopilot outline
- Background
- Autopilot overview
- Autopilot components
- Conclusions
49Autopilot background
- HPC has moved from single parallel systems to distributed collections of heterogeneous sequential and parallel systems
- emerging applications are irregular
- have complex, data-dependent execution behavior
- are dynamic, with time-varying resource demands
- failure to recognize that resource allocation and management must evolve with applications
- Consequence: small changes in application structure can lead to large changes in observed performance
50Autopilot background
- interactions between application and system resources change
- across applications
- during a single application's execution
- Autopilot approach: create adaptable
- runtime libraries
- resource management policies
51Autopilot overview
- After the integration of
- dynamic performance instrumentation
- on-the-fly performance data reduction
- configurable, malleable resource management algorithms
- a real-time adaptive control mechanism
- we have an adaptive resource management infrastructure
- Given
- application request patterns
- observed system performance
- automatically choose and configure resource management algorithms to
- increase portability
- increase achieved performance
52Autopilot components
- Autopilot: implements the core features of the Autopilot system
- Fuzzy Library: needed to build the classes supporting the fuzzy logic decision procedure infrastructure
- Autodriver: provides a graphical user interface (written in Java)
- Performance Monitor: provides tools to retrieve and record various system performance statistics on a set of machines
53Component 1: Autopilot
- libAutopilot.a: creation, registration, and use of
- sensors
- actuators (enable and configure resource management policies)
- decision procedures
- AutopilotManager: a utility program that displays the sensors and actuators currently registered with the Autopilot Manager
54Component 2: Fuzzy library
- Fuzzy Rules to C translator
- related classes used by the Autopilot fuzzy logic decision procedure infrastructure
55Component 3: Autodriver
- Autopilot Adapter program
- provides a Java interface to Autopilot
- (must run on UNIX)
- Java GUI
- talks to Autopilot through the Adapter
- allows a user to monitor and interact with live sensors and actuators
- (runs on any platform that supports Java)
56Component 4: Performance monitor
- two kinds of processes
- Collectors
- run on the machines to be monitored
- capture quantitative application and system performance data
- Recorders
- compute performance metrics
- record or output them
- the two kinds communicate via the Autopilot component
57Closed-loop adaptive control
Illinois Autopilot Toolkit (Reed et al)
[Diagram: a closed control loop combining Globus integration and real-time measurement]
58Autopilot conclusions
- Goal is the creation of an infrastructure for building resilient, distributed and parallel applications
- allow the creation of software that can change its behavior and optimize its performance in response to real-time data on software dynamics and performance
- order-of-magnitude performance improvements
59Outline
- Background
- Performance measurement
- SvPablo
- Autopilot
- Paradyn
60Paradyn: performance measurement tool for parallel and distributed programs
- Computer Science
- University of Wisconsin
61Paradyn outline
- Motivations
- Approach
- Performance Consultant
- Conclusions
62Paradyn motivations
- provide a performance measurement tool that scales to long-running programs on large parallel and distributed systems
- automate much of the search for performance bottlenecks
- avoid the space and time overhead typically associated with trace-based tools
- go beyond post-mortem analysis
63Paradyn approach
- Dynamic instrumentation
- based on dynamically controlling what performance data is to be collected
- allows data collection instructions to be inserted into an application program during runtime
- Paradyn
- dynamically instruments the application
- automatically controls the instrumentation in search of performance problems
64Paradyn model
- the Paradyn front-end and user interface
- display performance visualizations
- use the Performance Consultant to find bottlenecks
- start and stop the application
- monitor the status of the application
- the Paradyn daemons
- monitor and instrument the application processes
65Performance consultant module
- automatically directs the placement of instrumentation
- has a knowledge base of performance bottlenecks and program structure
- can associate bottlenecks with specific causes and with specific parts of a program
66Paradyn runtime
- Concepts for performance data analysis/presentation
- metric-focus grid: the cross-product of two vectors
- a list of performance metrics (CPU time, blocking time)
- a list of program components (procedures, processors, disks)
- elements of the matrix can be single-valued (e.g., current value, average, min, or max) or time-histograms
- time-histogram: a fixed-size data structure recording the behavior of a metric as it varies over time
- Performance data granularity
- global phase
- local phase
67Performance consultant
Wisconsin Paradyn Toolkit (Miller et al)
[Screenshot: search nodes labeled unknown, true, or false]
68Performance consultant
Wisconsin Paradyn Toolkit (Miller et al)
69Outline
- Background
- Performance measurement
- SvPablo
- Autopilot
- Paradyn
- XPVM
70XPVM: graphical console and monitor for PVM
- developed at the Oak Ridge National Lab
- provides a graphical user interface to the PVM console commands
- provides several animated views to monitor the execution of PVM programs
71XPVM overview
- XPVM generates trace records during PVM program execution; the resulting trace file is used to "play back" a program's execution
- the XPVM views provide information about the interactions among tasks in a parallel PVM program, to assist in debugging and performance tuning
- XPVM writes a Pablo self-defining trace file
72XPVM menus
- Hosts menu: configure the parallel virtual machine by adding/removing hosts
- Tasks menu: spawn, signal, or kill PVM processes; monitor selected PVM system tasks, such as the group server process
73XPVM menus
- Reset menu: resets the parallel virtual machine, XPVM views, or trace file
- Help menu: provides help features
- Views menu: selects any of the five XPVM displays for monitoring program execution
74XPVM menus
- Trace file playback controls: play, step forward, stop, or reset the execution trace file
- Trace file selection window: displays the name of the current trace file
75XPVM views (5)
- Network
- displays high-level activity on each node in the virtual machine
- each host is represented by an icon showing the host name and architecture
- icons are color-illuminated to indicate status
- Active: at least one task on that host is doing useful work
- System: no tasks are doing user work and at least one task is busy executing PVM system routines
- No tasks
76Network
77Space time
- shows the status of all tasks as they execute across all hosts
- Computing: executing useful user computations
- Overhead: executing PVM system routines for communication, task control, etc.
- Waiting: waiting for messages from other tasks
- Message: indicates communication between tasks
78Space time
79Utilization
- summarizes the Space-Time view at each instant by showing the aggregate number of tasks computing, in overhead, or waiting for a message
- shares the same horizontal time scale as the Space-Time view
- supports zooming in and out
80Utilization
81Call trace
- displays each task's most recent PVM call
- changes as the program executes
- useful for debugging
- clicking on a task in the scrolling task list displays that task's full name and TID
82Call trace
83Task output
- provides a view of output (stdout) generated by tasks in a scrolling window
- can be saved to a file at any point
84Concluding remarks
- System complexity is rising fast
- computational grids
- multidisciplinary applications
- performance tools
- There are many open problems
- adaptive optimization
- performance prediction
- compiler/tool integration
- performance quality of service (QoS)
85Concluding remarks
- the software problems are large and cannot be solved in isolation
- open-source collaboration
- vendors, laboratories, and academia
- technology assessment