Introduction to IBM's profiling tools "HPMlib" and "PEBenchmarker"

1
Introduction to IBM's profiling tools "HPMlib"
and "PEBenchmarker"
Richard Gerber NERSC User Services
ragerber_at_nersc.gov
2
Outline
  • HPMlib and hpmviz
  • PE Benchmarker
  • Performance Collection Tool
  • Profile Visualization Tool
  • Unified Trace Environment utilities

3
The HPM Library
  • The Hardware Performance Monitor (HPM) Library
    provides a set of functions to collect POWER3
    hardware counter data
  • Calls are inserted into source code
  • API is simple
  • Many different counters can be started and
    stopped at arbitrary positions in your code

4
Using HPMLIB
  • The HPM library can be used to instrument code
    sections
  • Embed calls into source code
  • Fortran, C, C++
  • Access through the hpmtoolkit module
  • module load hpmtoolkit
  • Compile with the $HPMTOOLKIT environment variable
  • xlf -qarch=pwr3 -O2 source.F \
    $HPMTOOLKIT
  • Execute the program normally
  • Output is written to files, a separate one for
    each task

5
HPMlib Functions
  • Include files
  • Fortran f_hpmlib.h
  • C libhpm.h
  • Initialize library
  • Fortran f_hpminit(taskID, progName)
  • C hpmInit(taskID, progName)
  • Start Counter
  • Fortran f_hpmstart(id,label)
  • C hpmStart(id,label)

6
HPMlib Functions II
  • Stop Counter
  • Fortran f_hpmstop(id)
  • C hpmStop(id)
  • Finalize library when finished
  • Fortran f_hpmterminate(taskID)
  • C hpmTerminate(taskID)
  • You can have multiple, overlapping counter
    stops/starts in your code
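As a sketch of how the C calls from these two slides fit together, the fragment below brackets a whole computation with section 1 and nests sections 2 and 3 inside it, one per phase. It assumes the signatures implied by the slides (integer task and section IDs, string labels). `HAVE_LIBHPM`, `work`, and `instrumented_run` are hypothetical names, and the no-op macros exist only so the sketch compiles without the toolkit.

```c
/* Sketch: nested HPMlib sections through the C API from slides 5 and 6.
 * Assumption: HAVE_LIBHPM is a hypothetical macro you define yourself
 * (e.g. -DHAVE_LIBHPM) when building with libhpm.h and $HPMTOOLKIT.
 * Without it, the no-op macros below let the sketch compile anywhere. */
#ifdef HAVE_LIBHPM
#include "libhpm.h"
#else
/* No-op stand-ins; these are NOT part of libhpm. */
#define hpmInit(task, name)  ((void)(task), (void)(name))
#define hpmStart(id, label)  ((void)(id), (void)(label))
#define hpmStop(id)          ((void)(id))
#define hpmTerminate(task)   ((void)(task))
#endif

/* Hypothetical workload: sum of i*i for i = 1..n. */
static double work(int n)
{
    double s = 0.0;
    for (int i = 1; i <= n; i++)
        s += (double)i * i;
    return s;
}

/* Hypothetical driver: section 1 covers the whole computation and
 * sections 2 and 3 are nested inside it, one per phase. */
double instrumented_run(int task_id, int n)
{
    double total = 0.0;

    hpmInit(task_id, "demo");            /* once per task, before any section */
    hpmStart(1, "whole computation");

    hpmStart(2, "phase 1");
    total += work(n / 2);
    hpmStop(2);

    hpmStart(3, "phase 2");
    total += work(n);
    hpmStop(3);

    hpmStop(1);
    hpmTerminate(task_id);               /* once per task, after the last section */
    return total;
}
```

Calling `instrumented_run(0, 1000)` from `main` exercises all three sections; the per-section counter data would only be produced when the sketch is built against the real library.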

7
HPMlib Sample Code
    ! Declarations ...
    Z = 0.0
    CALL RANDOM_NUMBER(X)
    CALL RANDOM_NUMBER(Y)
    !
    ! Initialize HPM Performance Library and Start Counter
    !
    CALL f_hpminit(0, "ma.F")
    CALL f_hpmstart(1, "matrix-matrix multiply")
    DO J = 1, N
      DO K = 1, N
        DO I = 1, N
          Z(I,J) = Z(I,J) + X(I,K) * Y(K,J)
        END DO
      END DO
    END DO
    !
    ! Stop Counter and Finalize HPM
    !
    CALL f_hpmstop(1)
    CALL f_hpmterminate(0)
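The same kernel can be sketched in C for readers instrumenting C code. This is a translation, not code from the slides: matrices are stored column-major, as Fortran stores them, and the J, K, I loop order is kept so the innermost loop walks memory with unit stride. `matmul_colmajor` is a hypothetical name, and the comments only mark where the `hpmStart`/`hpmStop` calls from slides 5 and 6 would go.

```c
#include <stddef.h>

/* Sketch (a C translation, not from the slides) of the Fortran kernel
 * Z(I,J) = Z(I,J) + X(I,K)*Y(K,J) for n-by-n matrices.
 * Storage is column-major, as in Fortran: element (i,j) is a[i + j*n]
 * with 0-based i and j. Keeping the J, K, I loop order makes the
 * innermost loop unit-stride in both z and x.
 * Returns the number of multiply-adds performed, n*n*n. */
long long matmul_colmajor(int n, const double *x, const double *y, double *z)
{
    /* hpmStart(1, "matrix-matrix multiply") would be called here */
    for (int j = 0; j < n; j++) {
        for (int k = 0; k < n; k++) {
            const double ykj = y[k + (size_t)j * n];   /* Y(K,J), invariant in i */
            for (int i = 0; i < n; i++)
                z[i + (size_t)j * n] += x[i + (size_t)k * n] * ykj;
        }
    }
    /* hpmStop(1) would be called here */
    return (long long)n * n * n;
}
```

As in the Fortran version, the caller must zero `z` first, because the kernel accumulates into it.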

8
HPMlib Example Output
  • module load hpmtoolkit
  • xlf90 -o xma_hpmlib -O2 -qarch=pwr3 ma.F \
    $HPMTOOLKIT
  • ./xma_hpmlib
  • libHPM output in perfhpm0000.67880

libhpm (Version 2.4.2) summary - running on POWER3-II
Total execution time of instrumented code (wall time): 4.185484 seconds
. . .
Instrumented section: 1 - Label: matrix-matrix multiply - process: 0
Wall Clock Time: 4.18512 seconds
Total time in user mode: 4.16946747484786 seconds
. . .
PM_FPU0_CMPL (FPU 0 instructions)        :   505166645
PM_FPU1_CMPL (FPU 1 instructions)        :     6834038
PM_EXEC_FMA (FMAs executed)              :   512000683
. . .
MIPS                                     :     610.707
Instructions per cycle                   :       1.637
HW Float points instructions per Cycle   :       0.327
Floating point instructions + FMAs       :    1024.001 M
Float point instructions + FMA rate      :     243.856 Mflip/s
FMA percentage                           :     100.000
Computation intensity                    :       0.666
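The derived lines in this summary can be cross-checked by hand. The sketch below assumes that "Floating point instructions + FMAs" is the sum of the three counters shown (PM_FPU0_CMPL + PM_FPU1_CMPL + PM_EXEC_FMA), which reproduces the 1024.001 M printed above, and compares it with the 2*N^3 operations the triple loop performs. N is not shown on the slides; N = 800 is an inference from PM_EXEC_FMA being almost exactly 800^3 = 512,000,000. The percentage formula is likewise an assumption, and all function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: total FP operations = completed FP instructions on both FPUs
 * plus the FMA count, so each FMA contributes its second operation once. */
int64_t fp_ops_from_counters(int64_t fpu0_cmpl, int64_t fpu1_cmpl, int64_t exec_fma)
{
    return fpu0_cmpl + fpu1_cmpl + exec_fma;
}

/* Analytic count for the sample's triple loop: n^3 multiply-adds,
 * i.e. 2*n^3 floating-point operations. */
int64_t matmul_flops(int64_t n)
{
    return 2 * n * n * n;
}

/* Share of completed FP instructions that were FMAs. One plausible reading
 * of the "FMA percentage" line, not taken from libhpm documentation. */
double fma_percentage(int64_t fpu0_cmpl, int64_t fpu1_cmpl, int64_t exec_fma)
{
    return 100.0 * (double)exec_fma / (double)(fpu0_cmpl + fpu1_cmpl);
}
```

With the counts above, `fp_ops_from_counters(505166645, 6834038, 512000683)` returns 1024001366 and `matmul_flops(800)` returns 1024000000, so under the N = 800 assumption the measured count exceeds the analytic one by only 1366 operations; `fma_percentage` gives 100.0, in line with the reported value.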
9
The hpmviz tool
  • The hpmviz tool has a GUI to help browse HPMlib
    output
  • Part of the hpmtoolkit module
  • After running a code with HPMlib calls, a .viz
    file is also produced for each task.
  • Usage
  • hpmviz filename1.viz filename2.viz

10
hpmviz Screen Shot 1
11
hpmviz Screen Shot 2
Right clicking on the Label line in the previous
slide brings up a detail window.
12
Parallel hpmviz
  • For parallel codes, right-clicking shows each task

13
PE Benchmarker
  • PE Benchmarker is a suite of IBM performance
    analysis applications and utilities
  • Performance Collection Tool (pct)
  • Collect hardware counter and OS system info, or
  • Collect MPI trace info and user events
  • Profile Visualization Tool (pvt)
  • Visualize hardware counter, system info data
  • Unified Trace Environment utilities
  • MPI summary info
  • Convert to the format used by ANL's Jumpshot
    utility for visualizing MPI events

14
Performance Collection Tool
  • A tool to collect either
  • Hardware counter and OS system info
  • MPI and user event data
  • Built on Dynamic Probe Class Library (DPCL)
  • Allows insertion and deletion of instrumentation
    probes while code is running
  • No code modification needed
  • GUI and command line interface
  • Profiles at program, file, subroutine levels

15
Preparing to Use PCT
  • Compile with a thread-safe compiler, e.g. mpxlf90_r
  • Set MPE_UTE environment variable
  • setenv MPE_UTE yes (csh)
  • export MPE_UTE=yes (ksh)
  • Load java module
  • module load java

16
Starting PCT
  • Example program mpi_heat2D
  • mpxlf_r -O2 mpi_heat2D.f draw_heat.o -o
    mpi_heat2D
  • Start PCT
  • pct

17
pct Options
  • Use full pathnames
  • POE arguments must specify -nodes and -procs
  • Use -retry nsecs and -retrycount ntimes to help
    ensure that the job starts

18
Select Type of Data
  • Select either
  • MPI and user events
  • Hardware and OS profiles
  • Use full pathname for Data Collection directory

19
Hardware/OS Profiles
  • Select processes
  • Select routines
  • Select probes
  • Select HPM group

20
pct Profile Data
  • Start program from Application menu
  • After job completes, it writes netCDF output
    files basename.cdf.taskno
  • Files can be viewed with pvt application.

21
MPI Event Statistics
  • Select processes
  • Select routines
  • Select MPI events
  • Add user markers

22
MPI Event Data
  • Start program from Application menu
  • When job finishes, output written to AIX trace
    files named basename.xx, one per node.
  • AIX trace files can be large
  • Need to convert AIX trace files to UTE format
    using uteconvert utility

23
Profile Visualization Tool
  • Examine hardware/OS data that was collected using
    pct
  • To run
  • module load java
  • pvt basename.cdf.

24
Examining Data with pvt
  • Pick data to view from drop-down menu
  • Expand source and function listings with mouse

25
pvt Reports
  • Can generate many different reports
  • TLB miss report shown here
  • Limited to the data group selected when collecting
    data with pct

26
UTE Utilities
  • uteconvert
  • Converts AIX trace files to UTE interval trace
    files
  • utemerge
  • Merges multiple UTE files into a single UTE file
  • utestats
  • Generates statistics tables from UTE files
  • slogmerge
  • Converts and merges UTE files into the SLOG files
    needed by Jumpshot
  • module load java mpe
  • Read /usr/common/usg/mpe/1.2.2/share/jumpshot-3/doc/TourStepByStep.pdf

27
Summary and Recommendations
  • HPMlib provides an API to profile your code.
  • The PE Benchmarker suite allows in-depth dynamic
    profiling with no code modification.
  • My personal recommendations:
  • Start with hpmcount and poe to profile the entire
    code.
  • If you want more granularity, use HPMlib to wrap
    portions of the code and gather performance data.
  • PE Benchmarker may suit intermediate to expert
    programmers with good knowledge of the hardware
    and performance metrics. It may have a steep
    learning curve for novice programmers.

28
More Information
  • NERSC Website http://hpcf.nersc.gov
  • HPM Toolkit
  • http://hpcf.nersc.gov/software/ibm/hpmcount/HPM_2_4_2.html
  • IBM PE Benchmarker manuals
  • http://hpcf.nersc.gov/vendor_docs/ibm/pe/am103mst24.html
  • Compilers, general NERSC SP info
  • http://hpcf.nersc.gov/computers/SP/