Introduction to IBM's profiling tools "HPMlib" and "PEBenchmarker"

1
Introduction to IBM's profiling tools "HPMlib"
and "PEBenchmarker"
Richard Gerber NERSC User Services
ragerber_at_nersc.gov
2
Outline

HPMlib and hpmviz
PE Benchmarker
  Performance Collection Tool
  Profile Visualization Tool
  Unified Trace Environment utilities

3
The HPM Library

The Hardware Performance Monitor (HPM) Library
provides a set of functions to collect POWER3
hardware counter data
Calls are inserted into source code
API is simple
Many different counters can be started and
stopped at arbitrary positions in your code

4
Using HPMlib

HPM library can be used to instrument code
sections
Embed calls into source code
Fortran, C, C++
Access through the hpmtoolkit module
module load hpmtoolkit
Compile with the HPMTOOLKIT environment variable
xlf -qarch=pwr3 -O2 source.F \
$HPMTOOLKIT
Execute program normally
Output is written to files, a separate one for each
task

5
HPMlib Functions

Include files
Fortran: f_hpmlib.h
C: libhpm.h
Initialize library
Fortran: f_hpminit(taskID, progName)
C: hpmInit(taskID, progName)
Start counter
Fortran: f_hpmstart(id, label)
C: hpmStart(id, label)

6
HPMlib Functions II

Stop counter
Fortran: f_hpmstop(id)
C: hpmStop(id)
Finalize library when finished
Fortran: f_hpmterminate(taskID)
C: hpmTerminate(taskID)
You can have multiple, overlapping counter
starts/stops in your code
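
The call sequence above can be sketched in C as follows. This is a minimal illustration, not NERSC's or IBM's reference code: the four hpm* calls and libhpm.h are the ones named on the slides, while HAVE_LIBHPM, the stub functions, hpm_calls, and instrumented_work are hypothetical names added so the sketch builds where the library is not installed. Sections 1 and 2 overlap: section 2 starts before section 1 stops.

```c
#include <assert.h>

#ifdef HAVE_LIBHPM            /* hypothetical macro: define when building against the real library */
#include <libhpm.h>
#else
/* Stand-in stubs, NOT part of HPMlib. They only record the order of
   calls so the sketch compiles and runs without the library. */
#define HPM_MAX_CALLS 32
static int hpm_calls[HPM_MAX_CALLS];   /* each entry is kind * 10 + id */
static int hpm_ncalls = 0;

static void hpm_record(int kind, int id)
{
    if (hpm_ncalls < HPM_MAX_CALLS)
        hpm_calls[hpm_ncalls++] = kind * 10 + id;
}
static void hpmInit(int taskID, const char *progName) { (void)progName; hpm_record(1, taskID); }
static void hpmStart(int id, const char *label)       { (void)label;    hpm_record(2, id); }
static void hpmStop(int id)                           { hpm_record(3, id); }
static void hpmTerminate(int taskID)                  { hpm_record(4, taskID); }
#endif

/* Three small loops wrapped in two overlapping instrumented sections. */
double instrumented_work(int n)
{
    double sum = 0.0, sumsq = 0.0, sumcube = 0.0;
    int i;

    assert(n > 0);

    hpmInit(0, "sketch.c");               /* once, before any section */

    hpmStart(1, "sum and squares");       /* section 1 covers loops A and B */
    for (i = 1; i <= n; i++)              /* loop A */
        sum += i;

    hpmStart(2, "squares and cubes");     /* section 2 opens while 1 is active */
    for (i = 1; i <= n; i++)              /* loop B */
        sumsq += (double)i * i;
    hpmStop(1);                           /* section 1 ends; 2 keeps counting */

    for (i = 1; i <= n; i++)              /* loop C */
        sumcube += (double)i * i * i;
    hpmStop(2);

    hpmTerminate(0);                      /* once, after the last section */
    return sum + sumsq + sumcube;
}
```

Calling instrumented_work(n) from a main program is all that is needed. With the real library, each section id and label would presumably be reported separately in the per-task output file.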

7
HPMlib Sample Code

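! Declarations...
      Z = 0.0
      CALL RANDOM_NUMBER(X)
      CALL RANDOM_NUMBER(Y)
!
! Initialize HPM Performance Library and Start Counter
!
      CALL f_hpminit(0, "ma.F")
      CALL f_hpmstart(1, "matrix-matrix multiply")
      DO J = 1, N
        DO K = 1, N
          DO I = 1, N
            Z(I,J) = Z(I,J) + X(I,K) * Y(K,J)
          END DO
        END DO
      END DO
! The slide ends before this point; the closing calls
! below follow the HPMlib function list on the earlier slides.
      CALL f_hpmstop(1)
      CALL f_hpmterminate(0)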

8
HPMlib Example Output

module load hpmtoolkit
xlf90 -o xma_hpmlib -O2 -qarch=pwr3 ma.F $HPMTOOLKIT
./xma_hpmlib
libHPM output in perfhpm0000.67880

libhpm (Version 2.4.2) summary - running on POWER3-II
Total execution time of instrumented code (wall time): 4.185484 seconds
. . .
Instrumented section: 1 - Label: matrix-matrix multiply - process: 0
Wall Clock Time: 4.18512 seconds
Total time in user mode: 4.16946747484786 seconds
. . .
PM_FPU0_CMPL (FPU 0 instructions)        : 505166645
PM_FPU1_CMPL (FPU 1 instructions)        : 6834038
PM_EXEC_FMA (FMAs executed)              : 512000683
. . .
MIPS                                     : 610.707
Instructions per cycle                   : 1.637
HW Float points instructions per Cycle   : 0.327
Floating point instructions + FMAs       : 1024.001 M
Float point instructions + FMA rate      : 243.856 Mflip/s
FMA percentage                           : 100.000
Computation intensity                    : 0.666
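
Two of the derived values in this output can be cross-checked against the raw counters. The sketch below rests on two assumptions that are not stated on the slide, though both reproduce the reported numbers: the floating point total (1024.001 M) is the sum of the three counters, and the FMA percentage counts each FMA as two operations. The function names are hypothetical.

```c
#include <assert.h>

/* Sum of the three floating point counters from the sample output.
   Assumption: this is how the 1024.001 M total is formed. */
long long fp_ops_total(long long fpu0, long long fpu1, long long fma)
{
    return fpu0 + fpu1 + fma;
}

/* Percentage of floating point operations that come from FMAs.
   Assumption: each FMA counts as two operations (multiply and add). */
double fma_percentage(long long fpu0, long long fpu1, long long fma)
{
    long long total = fp_ops_total(fpu0, fpu1, fma);

    assert(total > 0);
    return 100.0 * (2.0 * (double)fma) / (double)total;
}
```

With PM_FPU0_CMPL = 505166645, PM_FPU1_CMPL = 6834038, and PM_EXEC_FMA = 512000683, the total is 1024001366 (1024.001 M) and the percentage is 100, matching the output. The reported Mflip/s rate is not recomputed here because the slide does not show which timer it is divided by.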
9
The hpmviz tool

The hpmviz tool has a GUI to help browse HPMlib
output
Part of the hpmtoolkit module
After running a code with HPMlib calls, a .viz
file is also produced for each task.
Usage
hpmviz filename1.viz filename2.viz

10
hpmviz Screen Shot 1
11
hpmviz Screen Shot 2
Right clicking on the Label line in the previous
slide brings up a detail window.
12
Parallel hpmviz

For parallel codes, right-clicking shows each task

13
PE Benchmarker

PE Benchmarker is a suite of IBM performance
analysis applications and utilities
Performance Collection Tool (pct)
  Collect hardware counter and system info, or
  collect MPI trace info and user events
Profile Visualization Tool (pvt)
  Visualize hardware counter and system info data
Unified Trace Environment (UTE) utilities
  MPI summary info
  Convert to the format for ANL's Jumpshot utility for
  visualizing MPI events

14
Performance Collection Tool

A tool to collect either
hardware counter and OS system info, or
MPI and user event data
Built on Dynamic Probe Class Library (DPCL)
Allows insertion and deletion of instrumentation
probes while code is running
No code modification needed
GUI and command line interface
Profiles at program, file, subroutine levels

15
Preparing to Use PCT

Compile with a thread-safe compiler, e.g. mpxlf90_r
Set MPE_UTE environment variable
setenv MPE_UTE yes (csh)
export MPE_UTE=yes (ksh)
Load java module
module load java

16
Starting PCT

Example program mpi_heat2D
mpxlf_r -O2 mpi_heat2D.f draw_heat.o -o mpi_heat2D
Start PCT
pct

17
pct Options

Use full pathnames
POE arguments must specify nodes and procs
Use -retry nsecs and -retrycount ntimes to ensure
job startup

18
Select Type of Data

Select either
MPI and user events
Hardware and OS profiles
Use full pathname for Data Collection directory

19
Hardware/OS Profiles

Select processes
Select routines
Select probes
Select HPM group

20
pct Profile Data

Start program from Application menu
After the job completes, it writes netCDF output
files named basename.cdf.taskno
Files can be viewed with pvt application.

21
MPI Event Statistics

Select processes
Select routines
Select MPI events
Add user markers

22
MPI Event Data

Start program from Application menu
When job finishes, output written to AIX trace
files named basename.xx, one per node.
AIX trace files can be large
Need to convert AIX trace files to UTE format
using uteconvert utility

23
Profile Visualization Tool

Examine hardware/OS data that was collected using
pct
To run
module load java
pvt basename.cdf.

24
Examining Data with pvt

Pick data to view from drop-down menu
Expand source and function listings with mouse

25
pvt Reports

Can generate many different reports
TLB miss report shown here
Limited to data group selection made when using
pct to collect data

26
UTE Utilities

uteconvert
Converts AIX trace files to UTE interval trace
files
utemerge
Merges multiple UTE files into a single UTE file
utestats
Generates statistics tables from UTE files
slogmerge
Converts and merges UTE files to SLOG files
needed by Jumpshot
module load java mpe
Read /usr/common/usg/mpe/1.2.2/share/jumpshot-3/doc/TourStepByStep.pdf

27
Summary and Recommendations

HPMlib provides API to profile your code.
PE Benchmarker suite allows in-depth dynamic
profiling with no code modification.
My personal recommendations:
Start with hpmcount and poe to profile the entire
code.
If you want more granularity, use HPMlib to wrap
portions of code to gather performance data.
PE Benchmarker may be good for intermediate to
expert programmers with good knowledge of the
hardware and performance metrics. May have steep
learning curve for novice programmers.

28
More Information

NERSC Website: http://hpcf.nersc.gov
HPM Toolkit:
http://hpcf.nersc.gov/software/ibm/hpmcount/HPM_2_4_2.html
IBM PE Benchmarker manuals:
http://hpcf.nersc.gov/vendor_docs/ibm/pe/am103mst24.html
Compilers, general NERSC SP info:
http://hpcf.nersc.gov/computers/SP/
