1
Performance Monitoring Tools on TCS
  • Roberto Gomez and Raghu Reddy
  • Pittsburgh Supercomputing Center
  • David O'Neal
  • National Center for Supercomputing Applications

2
Objective
  • Measure single PE performance
  • Operation counts, wall time, MFLOP rates
  • Cache utilization ratio
  • Study scalability
  • Time spent in MPI calls vs. computation
  • Time spent in OpenMP parallel sections

3
Atom Tools
  • atom(1)
  • Various tools
  • Low overhead
  • No recompiling or re-linking in some cases

4
Useful Tools
  • Flop2
  • Floating point operations count
  • Timer5
  • Wall time (inclusive/exclusive) per routine
  • Calltrace
  • Detailed statistics of calls and their arguments
  • Developed by Dick Foster at Compaq

5
Instrumentation
  • setenv ATOMTOOLPATH ~rreddy/Atom/Tools
  • nm -g a.out | awk '{if ($5 == "T") print $1}' > routines
  • Edit routines
  • place main routine first
  • remove unwanted ones.
  • Instrument executable
  • cat routines | atom -tool flop2 a.out
  • cat routines | atom -tool timer5 a.out
  • Execute a.out.flop2 or a.out.timer5 to create
    the fprof. and tprof. output files

6
Single PE Performance Analysis
Sample Timer5 output file
Procedure                Calls    Self Time   Total Time
null_evolnull_j_          3072     60596709     79880903
null_ethnull_d1_         72458     45499161     45499161
null_hyper_unull_u_       3328     39889655     44500045
null_hyper_wnull_w_       3328     19195271     33769541
...                        ...          ...          ...
Total                  1961226    248258934    248258934
7
Single PE Performance Analysis
Sample Flop2 output file
Procedure                Calls           Fops
null_evolnull_j_          3072    20406036288
null_ethnull_d1_         72458    20220926518
null_hyper_unull_u_       3328    14062774258
null_hyper_wnull_w_       3328     3823795456
...                        ...            ...
Total                  1936818    70876179927

Obtain MFLOPS = Fops / (Self Time)
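Combining the two sample outputs gives per-routine MFLOP rates. A small sketch of the arithmetic, under the assumption that Timer5 self times are reported in microseconds (the totals then correspond to a roughly 248-second run), so flops per microsecond is MFLOPS directly:

```python
# (fops, self_time) pairs transcribed from the flop2 and timer5 samples.
# Self time is assumed to be in microseconds; if so, fops / self_time
# yields MFLOPS with no further unit conversion.
routines = {
    "null_evolnull_j_":    (20406036288, 60596709),
    "null_ethnull_d1_":    (20220926518, 45499161),
    "null_hyper_unull_u_": (14062774258, 39889655),
    "null_hyper_wnull_w_": (3823795456, 19195271),
}

mflops = {name: fops / t for name, (fops, t) in routines.items()}
for name, rate in sorted(mflops.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} {rate:6.1f} MFLOPS")
```

Under that unit assumption the top routine runs at roughly 340 MFLOPS per PE; comparing such rates against the processor's peak is the usual next step in single-PE analysis.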
8
MPI calltrace
  • setenv ATOMTOOLPATH rreddy/Atom/Tools
  • cat ~rreddy/Atom/mpicalls | atom -tool \
    calltrace a.out
  • Execute a.out.calltrace to generate one trace
    file per PE
  • Gather timings for desired MPI routines
  • Repeat for increasing number of processors

9
Sample calltrace statistics
Number of processors      8 PEs    128 PEs    256 PEs
Processor grid            2x2x2      8x4x4      8x8x4
Total run time          277.028    314.857    422.170
MPI_ISEND                 1.250      1.498      2.265
MPI_RECV                  4.349     19.779     26.537
MPI_WAIT                  9.172     16.311     20.150
MPI_ALLTOALL              5.072      9.433     12.894
MPI_REDUCE                0.013      0.162      0.002
MPI_ALLREDUCE             0.391      2.073     10.313
MPI_BCAST                 0.061      1.135      1.382
MPI_BARRIER              14.959     28.694     62.028
-----------------------------------------------------
Total MPI time           35.267     79.085    135.571
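The scalability trend is easier to see as the MPI share of total run time; a quick computation from the table's totals:

```python
# {PEs: (total run time, total MPI time)} transcribed from the
# calltrace statistics table.
timings = {
    8:   (277.028, 35.267),
    128: (314.857, 79.085),
    256: (422.170, 135.571),
}

mpi_share = {pes: 100.0 * mpi / total for pes, (total, mpi) in timings.items()}
for pes, share in mpi_share.items():
    print(f"{pes:4d} PEs: {share:4.1f}% of run time spent in MPI")
```

The MPI share grows from about 13% at 8 PEs to about a third of the run at 256 PEs, with MPI_BARRIER and MPI_RECV the largest contributors in the table above.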
10
calltrace timings graph
11
DCPI
  • Digital Continuous Profiling Infrastructure
  • daemon and profiling utilities
  • Very low overhead (1-2%)
  • Aggregate or per-process data and analysis
  • No code modifications
  • Requires interactive access to compute nodes

12
DCPI Example
  • Driver script (e.g., a PBS job)
  • creates map file and host list
  • calls daemon and profiling scripts
  • Daemon startup script
  • starts daemon with selected options
  • Daemon shutdown script
  • halts daemon
  • Profiling script
  • executes post-processing utility with selected
    options

13
DCPI Driver Script
  • PBS job file
  • dcpi.pbs
  • Creates map file and host list
  • Image map generated by dcpiscan(1)
  • Host list used by dsh(1) commands
  • Executes daemon and profiling scripts
  • Start daemon, run test executable, stop daemon,
    post-process

14
DCPI Startup Script
  • C shell script
  • dcpi_start.csh
  • Three arguments defined by driver job
  • MAP, WORK, EXE
  • Creates database directory (DCPIDB)
  • Derived from WORK and the hostname
  • Starts dcpid(1) process
  • Events of interest are specified here

15
DCPI Stop Script
  • C shell script
  • dcpi_stop.csh
  • No arguments
  • dcpiquit(1) flushes buffers and halts the daemon
    process

16
DCPI Profiling Script
  • C shell script
  • dcpi_post.csh
  • Three arguments defined by driver job
  • MAP, WORK, EXE
  • Determines database location (as before)
  • Uses dcpiprof(1) to post-process database files
  • Profile selection(s) must be consistent with
    daemon startup options

17
Common DCPI Problems
  • Login denied (dsh)
  • Requires permission to login on compute nodes
  • Start the daemon in background
  • Set filemode of DCPIDB directory correctly
  • chmod 755 DCPIDB
  • Mismatches between startup configuration and
    profiling specs
  • See dcpid(1), dcpiprof(1), and dcpiprofileme(1)

18
Summary
  • Low-level interfaces provide access to hardware
    counters
  • Very effective, but requires experience
  • Minimal overhead costs
  • Report timings, flop counts, MFLOP rates for user
    code and library calls, e.g. MPI
  • More information available, e.g. message sizes,
    time variability, etc.