Performance Analysis Tools - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Performance Analysis Tools

1
Performance Analysis Tools
  • Nadya Williams
  • Spring, 2000
  • UCSD

2
Outline
  • Background
  • Performance measurement
  • SvPablo
  • Autopilot
  • Paradyn
  • XPVM

3
Background
  • Goal - high performance computing for
    applications that are distributed
  • by design, e.g. collaborative environments,
    distributed data analysis, computer-enhanced
    instruments
  • by implementation, e.g. metacomputing,
    high-throughput computing
  • Goal - to achieve and maintain performance
    guarantees in heterogeneous, dynamic environments

4
Background
  • Performance-robust grid applications need to
  • Identify resources required to meet application
    performance requirements
  • Select from problem specification, algorithm,
    and code variants
  • Establish hierarchical performance contracts
  • Select and manage adaptation strategies when
    performance contracts are violated

5
Computational grids
(Diagram: a computational grid linking an MPP, real-time
data analysis, visualization and steering, and a viz
engine over networks)
  • Shared resources
  • computation, network, and data archives

6
Complexity
  • Emerging applications are dynamic
  • time varying resource demands
  • time varying resource availability
  • heterogeneous execution environments
  • geographically distributed
  • Display and analysis hierarchy
  • code, thread, process, processor
  • system and local area network
  • national/international network

7
Grid performance challenges
  • Wide area infrastructure
  • Many resource models
  • Behavioral variability
  • complex applications, diverse systems and
    networks
  • irreproducible behavior
  • Heterogeneous applications
  • multilingual and multimodel
  • real-time constraints and shared resources
  • Prediction and scheduling

8
Outline
  • Background
  • Performance measurement
  • SvPablo
  • Autopilot
  • Paradyn
  • XPVM

9
Performance analysis
  • The ability to
  • capture
  • analyze
  • present
  • optimize
  • Multiple analysis levels
  • hardware
  • system software
  • runtime systems
  • libraries
  • applications

Good tools must accommodate all
10
Real-time Multilevel Analysis
  • Multilevel Drilldown
  • multiple sites
  • multiple metrics
  • real-time display
  • Problems
  • uncertainty and perturbation
  • confusion of cause and effect

11
Guidelines
  • Design for locality
  • regardless of programming model
  • threads, MPI, data parallel -- it's the same
  • Recognize historical models
  • large codes develop over time
  • assumptions change
  • Think about more than FLOPS
  • I/O, memory, networking, user interfaces

12
Initial steps
  • Develop infrastructure for structural and
    performance information
  • Provide instrumentation of end-user applications
    and communication libraries
  • Study performance characteristics of real grid
    applications

13
Peak and Sustained Performance
  • Peak performance
  • perfect conditions
  • Actual performance
  • considerably less
  • Environment dictates performance
  • locality really matters
  • we must design for performance stability
  • more of less may be better than less of more

14
Instrumentation approaches
  • At least four major techniques
  • profiling
  • counting
  • interval timing
  • event tracing
  • Each strikes a different balance
  • detail and insight
  • measurement perturbation
  • Understand overheads and benefits
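  • A minimal sketch of the interval-timing technique
    (illustrative code, not from any tool in these
    slides):

    /* Interval timing: bracket a region with timestamps and
       keep a running count and total duration per region. */
    #include <stdio.h>
    #include <sys/time.h>

    static long   region_count;
    static double region_seconds;

    static double now(void) {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    static void work(void) { /* region under measurement */ }

    int main(void) {
        for (int i = 0; i < 1000; i++) {
            double t0 = now();
            work();
            region_seconds += now() - t0;
            region_count++;
        }
        printf("%ld calls, %.6f s total\n", region_count,
               region_seconds);
        return 0;
    }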

15
Measurement developments
  • Hardware counters
  • once rare (Cray), now common (Sun, IBM, Intel,
    Compaq)
  • metrics
  • operation types
  • memory stalls
  • Object code patching
  • run-time instrumentation
  • Compiler integration
  • inverse compiler transformations
  • high-level language analysis
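  • Hardware counters are typically read through a
    library; this sketch uses PAPI as one such
    interface (PAPI is an assumption here - the
    slides do not name a specific counter API):

    /* Read hardware counters around a code region via PAPI. */
    #include <stdio.h>
    #include <papi.h>

    int main(void) {
        int es = PAPI_NULL;
        long long counts[2];

        PAPI_library_init(PAPI_VER_CURRENT);
        PAPI_create_eventset(&es);
        PAPI_add_event(es, PAPI_TOT_CYC);  /* total cycles         */
        PAPI_add_event(es, PAPI_L1_DCM);   /* L1 data-cache misses */

        PAPI_start(es);
        /* ... code region under measurement ... */
        PAPI_stop(es, counts);

        printf("cycles=%lld, L1 misses=%lld\n",
               counts[0], counts[1]);
        return 0;
    }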

16
Correlating semantic levels
  • Performance measurements
  • capture behavior of executing software
  • reflect output of multi-level transformations
  • Performance tools
  • must relate data to user semantic model
  • cache miss ratios cannot help a MATLAB user
  • message counts cannot help an HPF user
  • should suggest possible performance remedies

17
Analysis developments
  • Visualization techniques
  • traces and statistics
  • Search and destroy
  • AI suggestions and consultants
  • critical paths and zeroing
  • Data reduction and processing
  • statistical clustering/projection pursuit
  • neural net and time series classification
  • Real-time control
  • sensor/actuator models

18
Performance tool checkpoint
  • An incomplete view
  • representative techniques and tools
  • Major evolution
  • from architectural views/post-mortem analysis
  • to deeper correlation and derived metrics
  • Key open problems
  • adaptivity
  • scale
  • semantic correlation

19
Representative vendor tools
  • IBM VT
  • ParaGraph trace display and statistical metrics
  • Silicon Graphics Speedshop
  • R10000, R12000 hardware counter tools
  • Pallas Vampir
  • event tracing and display tools
  • Cray ATExpert (autotasking)
  • basic AI suggestions for tuning
  • Intel SPV
  • ParaGraph and hardware counter displays
  • TMC/SUN Prism
  • data parallel and message passing analysis

20
Representative research tools
  • Illinois SvPablo
  • performance data metaformat
  • Globus integration (sensor/actuator control)
  • Illinois Autopilot
  • performance steering
  • Wisconsin Paradyn
  • runtime code patching
  • performance consultant
  • Oak Ridge National Lab XPVM
  • X Windows based, graphical console and monitor
    for PVM

21
Outline
  • Background
  • Performance measurement
  • SvPablo
  • Autopilot
  • Paradyn

22
SvPablo - graphical source code browser for
performance tuning and visualization
  • Department of Computer Science
  • University of Illinois at Urbana-Champaign

23
SvPablo Outline
  • Background
  • SvPablo overview
  • SvPablo model
  • Automatic/Interactive instrumentation of programs
  • The Pablo Self-Defining Data Format

24
SvPablo Background
  • Motivations
  • emerging high-level languages (HPF and HPC)
  • aggressive code transformations for parallelism
  • large semantic gap between user and code
  • Goals
  • relate dynamic performance data to source
  • hide semantic gap
  • generate instrumented executable/simulated code
  • support performance scalability predictions

25
Background
  • Tools should provide the performance data and
    suggestions for performance improvements at the
    level of an abstract, high-level program
  • Tools should integrate dynamic performance data
    with information recorded by the compiler that
    describes the mapping from the high-level source
    to the resulting low-level explicitly parallel
    code

26
SvPablo overview
  • A graphical user interface tool for
  • source code instrumentation
  • browsing runtime performance data
  • Two major components
  • performance instrumentation libraries
  • performance analysis and presentation
  • Provides
  • performance data capture
  • analysis
  • presentation

27
SvPablo overview
  • Instrumentation
  • automatic
  • HPF (from PGI)
  • interactive
  • ANSI C
  • Fortran 77
  • Fortran 90
  • Data capture
  • dynamic software statistics (no traces)
  • SGI R10000 counter values

28
SvPablo overview
  • Source code instrumentation
  • HPF - the PGI runtime system invokes
    instrumentation at
  • each procedure call
  • each HPF source line
  • C and Fortran programs interactively
    instrumented
  • outer loops
  • function calls
  • Instrumentation maintains statistical summary
  • Summaries correlated across processors
  • Correlated summary input to browser
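  • What interactive instrumentation produces, as a
    hypothetical instrumented outer loop (the helper
    names are invented, not SvPablo's actual calls):

    /* Hypothetical instrumented outer loop; the start/end
       helpers stand in for what the instrumenter inserts. */
    extern void instr_event_start(int id);  /* assumed helper */
    extern void instr_event_end(int id);    /* assumed helper */

    void smooth(double *a, int n, int iters) {
        instr_event_start(17);          /* event 17: outer loop */
        for (int it = 0; it < iters; it++)
            for (int i = 1; i < n - 1; i++)
                a[i] = 0.5 * (a[i - 1] + a[i + 1]);
        instr_event_end(17);
    }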

29
SvPablo overview
  • Architectures
  • any system with the PGI HPF compiler
  • any system with F77 or F90
  • C applications supported on
  • single processor Unix workstations
  • network of Unix workstations using MPI
  • Intel Paragon
  • Meiko CS2
  • GUI supports
  • Sun (Solaris)
  • SGI (IRIX)

30
Statistics metrics
  • For procedures
  • count
  • exclusive / inclusive duration
  • send / receive message duration (HPF only)
  • For lines
  • count
  • duration
  • exclusive duration
  • message send and message receive (HPF only)
  • duration
  • count
  • size
  • event counters (SGI)
  • Mean, STD, Min, Max
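  • SvPablo keeps statistical summaries rather than
    traces; a standard constant-space way to maintain
    mean/STD/min/max is Welford's online update (a
    sketch, not SvPablo's code):

    /* Online mean/variance/min/max in O(1) space (Welford). */
    #include <math.h>

    typedef struct {
        long   n;
        double mean, m2, min, max;
    } Summary;

    static void summary_add(Summary *s, double x) {
        if (s->n == 0) s->min = s->max = x;
        if (x < s->min) s->min = x;
        if (x > s->max) s->max = x;
        s->n++;
        double d = x - s->mean;
        s->mean += d / s->n;
        s->m2   += d * (x - s->mean);
    }

    static double summary_std(const Summary *s) {
        return s->n > 1 ? sqrt(s->m2 / (s->n - 1)) : 0.0;
    }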

31
SvPablo model
(Diagram: an application contains performance contexts;
each context associates source files with performance
data)
32
New project dialog box
33
HPF performance analysis data flow
(Data flow: HPF source code is compiled to instrumented
object code, linked with the SvPablo data capture
library into an instrumented executable, and run on the
parallel architecture to produce performance files,
which SvPabloCombine merges for the graphical
performance browser)
34
HPF instrumentation
  • pghpf -c -Mproflines source1.F
  • pghpf -c -Mproflines source2.F
  • pghpf -Mstats -o prog source1.o source2.o
    /usr/local/SvPablo/lib/pghpf2SDDF.o
  • prog -pghpf -np 8
  • SvPabloCombine HPF_SDDF

35
Performance visualization
Metrics: count, exclusive duration
36
Performance metric selection dialog
37
C / F77 / F90 data flow
(Data flow: create or edit a project; interactively
instrument the C or Fortran files; compile the
instrumented source; link the instrumented object code
with the SvPablo data capture library into an
instrumented executable; run it on the parallel
architecture to produce per-process performance files;
merge them with SvPabloCombine; and visualize the
resulting performance file)
38
Interactive instrumentation
Instrumentable constructs (function calls and outer loops)
39
Generating an instrumented executable program
  • mpicc -c file1.Context1.inst.c
  • mpicc -c file2.Context1.inst.c
  • mpicc -c Context1/InstrumentationInit.c
  • mpicc -o instFile InstrumentationInit.o
    file1.Context1.inst.o file2.Context1.inst.o
    svPabloLib.a

40
SDDF - a medium of exchange
  • Self-Defining Data Format
  • data meta-format language for performance data
    description
  • specifies both data record structures and data
    record instances
  • separates data structure and semantics
  • allows the definition of records containing
    scalars and arrays
  • supported by the Pablo SDDF library

41
SDDF files - classes of records
  • Command - conveys an action to be taken
  • Stream Attribute - gives information pertinent to
    the entire file
  • Record Descriptor - declares record structure
  • Record Data - encapsulates data values

42
Record descriptors
  • Describe record layout
  • Each Record Descriptor contains
  • A unique tag and record name
  • An optional Record Attribute
  • Field Descriptors, each one containing
  • an optional Field Attribute
  • field type specifier
  • field name
  • optional field dimension

43
SDDF record descriptor and record data
  • Record descriptor - tag, record name, and field
    descriptors:
    tag: 300
    // "description" "PGI Line-Based Profile Record"
    record name: "PGI Line Profile"
    field descriptors:
      int    "Line Number"
      int    "Processor Number"
      int    "Procedure ID"
      int    "Count"
      double "Inclusive Seconds"
      double "Exclusive Seconds"
      int    "Send Data Count"
      int    "Send Data Byte"
      double "Send Data Seconds"
      int    "Receive Data Count"
      int    "Receive Data Byte"
      double "Receive Data Seconds"
  • Record data instance:
    "PGI Line Profile" 359, 27, 9, 4, 399384,
    31.071, 31.071, 0, 0, 0, 0, 0, 0
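  • For illustration only (not part of the Pablo
    toolkit), the descriptor above maps naturally
    onto a C structure; the type name is hypothetical:

    /* Hypothetical C mirror of the "PGI Line Profile" record. */
    typedef struct {
        int    line_number;
        int    processor_number;
        int    procedure_id;
        int    count;
        double inclusive_seconds;
        double exclusive_seconds;
        int    send_data_count;
        int    send_data_bytes;
        double send_data_seconds;
        int    receive_data_count;
        int    receive_data_bytes;
        double receive_data_seconds;
    } PGILineProfile;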
44
SvPablo language transparency
  • Meta-format for performance data
  • language defined by line and byte offsets
  • metrics defined by mapping to offsets
  • SDDF records
  • performance mapping information
  • performance measurements
  • Result
  • language independent performance browser
  • mechanism for scalability model integration

45
SvPablo conclusions
  • Versatility - yes
  • the analysis GUI is quite versatile and provides
    the ability to define new modules, but has a
    steep learning curve
  • theoretically, any type of view could be
    constructed from the toolkit provided
  • Portability - not quite
  • intended for a wide range of parallel platforms
    and programming languages; the reality is
    different (Sun, SGI)
  • Scalability - some
  • the Pablo trace library monitors and dynamically
    alters the volume, frequency, and types of event
    data recorded
  • unclear how much happens automatically and how
    much the user must control at a low level
  • predictions still need to be integrated

46
Outline
  • Background
  • Performance measurement
  • SvPablo
  • Autopilot
  • Paradyn
  • XPVM

47
Autopilot - a performance steering toolkit
Provides a flexible infrastructure for
real-time adaptive control of parallel and
distributed computing resources
  • Department of Computer Science
  • University of Illinois at Urbana-Champaign

48
Autopilot outline
  • Background
  • Autopilot overview
  • Autopilot components
  • Conclusions

49
Autopilot background
  • HPC has moved from single parallel systems to
    distributed collections of heterogeneous
    sequential and parallel systems
  • emerging applications are irregular
  • have complex, data-dependent execution behavior
  • dynamic, with time-varying resource demands
  • failure to recognize that resource allocation and
    management must evolve with applications
  • Consequence - small changes in application
    structure can lead to large changes in observed
    performance

50
Autopilot background
  • interactions between application and system
    resources change
  • across applications
  • during a single application's execution
  • Autopilot approach create adaptable
  • runtime libraries
  • resource management policies

51
Autopilot overview
  • Integrating
  • dynamic performance instrumentation
  • on-the-fly performance data reduction
  • configurable, malleable resource management
    algorithms
  • a real-time adaptive control mechanism
  • yields an adaptive resource management
    infrastructure
  • Given
  • application request patterns
  • observed system performance
  • Automatically choose and configure resource
    management algorithms to
  • increase portability
  • increase achieved performance

52
Autopilot components
  1. Autopilot - implements the core features of the
    Autopilot system.
  2. Fuzzy Library - needed to build the classes
    supporting the fuzzy logic decision procedure
    infrastructure
  3. Autodriver - provides a graphical user interface
    (written in Java)
  4. Performance Monitor - provides tools to retrieve
    and record various system performance statistics
    on a set of machines.

53
1 Autopilot component
  • libAutopilot.a - creation, registration, and use
    of
  • sensors
  • actuators (enable and configure resource
    management policies)
  • decision procedures
  • AutopilotManager - a utility program that
    displays the sensors and actuators currently
    registered with the Autopilot Manager
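  • The sensor/actuator pattern, sketched in C (names
    and signatures are hypothetical and do not
    reflect Autopilot's actual API):

    /* Hypothetical sensor/actuator loop, not Autopilot's API. */
    #include <stdio.h>

    typedef double (*sensor_fn)(void);  /* reads a metric       */
    typedef void   (*actuator_fn)(int); /* adjusts a policy knob */

    static double cache_miss_rate(void) { return 0.12; } /* stub */
    static void set_prefetch_depth(int d) { printf("depth=%d\n", d); }

    /* A decision procedure polls sensors and drives actuators. */
    static void control_step(sensor_fn s, actuator_fn a) {
        a(s() > 0.10 ? 8 : 2);  /* crude threshold policy */
    }

    int main(void) {
        control_step(cache_miss_rate, set_prefetch_depth);
        return 0;
    }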

54
2 Fuzzy library component
  • Fuzzy Rules to C translator
  • related classes used by the Autopilot fuzzy logic
    decision procedure infrastructure.
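  • A toy fuzzy decision step in C (the membership
    functions and thresholds are invented, not
    Autopilot's fuzzy rule language):

    /* Toy fuzzy controller: memberships + weighted rule firing. */
    #include <stdio.h>

    /* Triangular membership: 1 at peak, 0 outside [lo, hi]. */
    static double tri(double x, double lo, double peak, double hi) {
        if (x <= lo || x >= hi) return 0.0;
        return x < peak ? (x - lo) / (peak - lo)
                        : (hi - x) / (hi - peak);
    }

    int main(void) {
        double util = 0.85;                     /* observed load  */
        double low  = tri(util, 0.0, 0.2, 0.5); /* "load is low"  */
        double high = tri(util, 0.5, 0.9, 1.0); /* "load is high" */
        /* Defuzzify: weighted average of the rule outputs. */
        double delta = (high * 4.0 + low * -4.0)
                     / (high + low + 1e-9);
        printf("adjust worker pool by %+.1f\n", delta);
        return 0;
    }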

55
3 Autodriver component
  • Autopilot Adapter program
  • provides a Java interface to Autopilot
  • (must run on UNIX)
  • JAVA GUI
  • talks to Autopilot through the Adapter
  • allows a user to monitor and interact with live
    sensors and actuators.
  • (runs on any platform that supports Java)

56
4 Performance monitor component
  • two kinds of processes
  • Collectors
  • run on the machines to be monitored
  • capture quantitative application and system
    performance data
  • Recorders
  • compute performance metrics
  • record or output them
  • Collectors and Recorders communicate via the
    Autopilot component

57
Closed loop adaptive control
Illinois Autopilot Toolkit (Reed et al)
(Diagram: closed control loop combining Globus
integration and real-time measurement)
58
Autopilot conclusions
  • Goal is creation of an infrastructure for
    building resilient, distributed and parallel
    applications
  • allow the creation of software that can change
    its behavior and optimize its performance in
    response to real-time data on software dynamics
    and performance
  • order-of-magnitude performance improvements

59
Outline
  • Background
  • Performance measurement
  • SvPablo
  • Autopilot
  • Paradyn

60
Paradyn - performance measurement tool for
parallel and distributed programs
  • Computer Science, University of Wisconsin

61
Paradyn outline
  • Motivations
  • Approach
  • Performance Consultant
  • Conclusions

62
Paradyn motivations
  • provide a performance measurement tool that
    scales to long-running programs on large parallel
    and distributed systems
  • automate much of the search for performance
    bottlenecks
  • avoid the space and time overhead typically
    associated with trace-based tools.
  • go beyond post-mortem analysis

63
Paradyn approach
  • Dynamic instrumentation
  • based on dynamically controlling what performance
    data is to be collected.
  • allows data collection instructions to be
    inserted into an application program during
    runtime.
  • Paradyn
  • dynamically instruments the application
  • automatically controls the instrumentation in
    search of performance problems
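  • Paradyn patches the running binary; as a rough
    stand-in (a sketch, not Paradyn's mechanism),
    compiler-inserted entry/exit hooks such as GCC's
    -finstrument-functions show the kind of snippet
    that gets inserted:

    /* Build with: gcc -finstrument-functions prog.c
       Times each instrumented function (single level only,
       no nesting, for brevity). */
    #include <stdio.h>
    #include <time.h>

    void __cyg_profile_func_enter(void *fn, void *site)
            __attribute__((no_instrument_function));
    void __cyg_profile_func_exit(void *fn, void *site)
            __attribute__((no_instrument_function));

    static double t_enter;

    void __cyg_profile_func_enter(void *fn, void *site) {
        t_enter = (double)clock() / CLOCKS_PER_SEC;
    }

    void __cyg_profile_func_exit(void *fn, void *site) {
        double dt = (double)clock() / CLOCKS_PER_SEC - t_enter;
        fprintf(stderr, "fn %p: %.6f s\n", fn, dt);
    }

    int main(void) { return 0; }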

64
Paradyn model
  • the Paradyn front-end and user interface
  • display performance visualizations
  • use the Performance Consultant to find
    bottlenecks
  • start and stop the application
  • monitor the status of the application
  • the Paradyn daemons
  • monitor and instrument the application processes.

65
Performance consultant module
  • automatically directs the placement of
    instrumentation
  • has a knowledge base of performance bottlenecks
    and program structure
  • can associate bottlenecks with specific causes
    and with specific parts of a program.

66
Paradyn runtime
  • Concepts for performance data
    analysis/presentation
  • metric-focus grid - cross-product of two vectors
  • a list of performance metrics (CPU time,
    blocking time)
  • a list of program components (procedures,
    processors, disks)
  • elements of the matrix can be single-valued
    (e.g., current value, average, min, or max) or
    time-histograms
  • time-histogram - a fixed-size data structure
    recording the behavior of a metric as it varies
    over time (see the sketch below)
  • Performance data granularity
  • global phase
  • local phase
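  • A minimal sketch of such a fixed-size
    time-histogram (the fold-when-full policy is an
    assumption about one common implementation, not
    Paradyn's exact code):

    /* Fixed-size time histogram: when the bins run out, adjacent
       bins are merged and the bin width doubles, so memory use
       stays constant however long the program runs. */
    #include <string.h>

    #define NBINS 64

    typedef struct {
        double bins[NBINS];  /* accumulated value per interval */
        double bin_width;    /* seconds covered by one bin     */
    } TimeHist;

    static void th_init(TimeHist *h, double width) {
        memset(h, 0, sizeof *h);
        h->bin_width = width;
    }

    static void th_fold(TimeHist *h) { /* merge pairs, double width */
        for (int i = 0; i < NBINS / 2; i++)
            h->bins[i] = h->bins[2 * i] + h->bins[2 * i + 1];
        memset(h->bins + NBINS / 2, 0, (NBINS / 2) * sizeof(double));
        h->bin_width *= 2;
    }

    static void th_add(TimeHist *h, double t, double value) {
        int b = (int)(t / h->bin_width);
        while (b >= NBINS) { th_fold(h); b = (int)(t / h->bin_width); }
        h->bins[b] += value;
    }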

67
Performance consultant
Wisconsin Paradyn Toolkit (Miller et al)
(Screenshot: hypothesis nodes labeled unknown, true, or false)
68
Performance consultant
Wisconsin Paradyn Toolkit (Miller et al)
69
Outline
  • Background
  • Performance measurement
  • SvPablo
  • Autopilot
  • Paradyn
  • XPVM

70
XPVM - graphical console and monitor for PVM
  • developed at the Oak Ridge National Lab
  • Provides a graphical user interface to the PVM
    console commands
  • Provides several animated views to monitor the
    execution of PVM programs

71
XPVM overview
  • XPVM generates trace records during PVM program
    execution. The resulting trace file is used to
    "play back" a program's execution.
  • The XPVM views provide information about the
    interactions among tasks in a parallel PVM
    program, to assist in debugging and performance
    tuning.
  • XPVM writes a Pablo self-defining trace file
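  • For context, a minimal PVM master of the kind
    XPVM can trace (the "worker" executable and task
    count are illustrative):

    /* Minimal PVM master: spawn workers, collect one message
       each. Compile against libpvm3, e.g. cc master.c -lpvm3. */
    #include <stdio.h>
    #include "pvm3.h"

    int main(void) {
        int tids[4];
        int n = pvm_spawn("worker", NULL, PvmTaskDefault,
                          "", 4, tids);
        for (int i = 0; i < n; i++) {
            char buf[64];
            pvm_recv(tids[i], 1);  /* wait for a tag-1 message */
            pvm_upkstr(buf);       /* unpack a packed string   */
            printf("task t%x says: %s\n", tids[i], buf);
        }
        pvm_exit();                /* leave the virtual machine */
        return 0;
    }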

72
XPVM menus
  • Host menu permits configuring a parallel
    virtual machine by adding/removing hosts
  • Tasks menu enables spawning, signaling, or
    killing PVM processes, and monitoring selected
    PVM system tasks, such as the group server process

73
XPVM menus
  • Reset menu resets the parallel virtual machine,
    XPVM views, or the trace file
  • Help menu provides help features
  • Views menu permits selection of any of the five
    XPVM displays for monitoring program execution

74
XPVM menus
  • Trace file playback controls - play, step
    forward, stop, or reset the execution trace file
  • Trace file selection window - displays the name
    of the current trace file

75
XPVM views (5)
  • Network
  • Displays high-level activity on each node in the
    virtual machine
  • Each host is represented by an icon image showing
    host name and architecture
  • Icons are color-illuminated to indicate status
  • Active - at least one task on that host is doing
    useful work
  • System - no tasks are doing user work and at
    least one task is busy executing PVM system
    routines
  • No Tasks - no tasks are running on that host

76
Network
77
Space time
  • Shows status of all tasks as they execute across
    all hosts
  • Computing - executing useful user computations
  • Overhead - executing PVM system routines for
    communication, task control, etc.
  • Waiting - waiting for messages from other tasks
  • Message - indicates communications between tasks

78
Space time
79
Utilization
  • Summarizes the Space-Time view at each instant by
    showing the aggregate number of tasks computing,
    in overhead or waiting for a message.
  • Shares same horizontal time scale as the
    Space-Time view
  • Zooming-in
  • Zooming-out

80
Utilization
81
Call trace
  • Displays each task's most recent PVM call
  • Changes as program executes
  • Useful for debugging
  • Clicking on a task in the scrolling task list
    will display that task's full name and TID

82
Call trace
83
Task output
  • Provides a view of output (stdout) generated by
    tasks in a scrolling window
  • Can be saved to a file at any point

84
Concluding remarks
  • System complexity is rising fast
  • computational grids
  • multidisciplinary applications
  • performance tools
  • There are many open problems
  • adaptive optimization
  • performance prediction
  • compiler/tool integration
  • performance quality of service (QoS)

85
Concluding remarks
  • the software problems are large and cannot be
    solved in isolation
  • open source collaboration
  • vendors, laboratories, and academics
  • technology assessment