Pthreads Parallel Program Profiler aka PPPP aka P4 aka PiV

Slides: 21
Provided by: sitara8
1
Pthreads Parallel Program Profiler aka PPPP aka P4 aka PiV!
  • Comp422 project
  • Prabhu, Sitaram
  • with lots of help from Juan
  • 29 April 1999

2
A Profiler?
  • measures execution time
  • tries to identify performance bottlenecks
  • usually works at the level of functions
  • most are useful only for sequential programs

3
A parallel profiler?
  • Things we could do
  • charge profiling time to fine-grained units
  • use appropriate timing models
  • classify time into pthreads-specific notions

gcc -include wrap.h application.c -lwrap
export PPPP=TPLBI; a.out
4
Fine grained units
  • (option P) primitives
  • (option L) filename, linenumber
  • (option T) threads

#define pthread_cond_wait(x...) \
    WRAP_pthread_cond_wait(__FUNCTION__, __FILE__, \
                           __LINE__, pthread_self(), x)
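
The macro trick above can be tried on a primitive that does not block. The following is a minimal sketch, assuming the same pattern is applied to pthread_mutex_lock; the names WRAP_pthread_mutex_lock, struct call_site, last_site and demo_call_site are hypothetical and not taken from the slides, and a real profiler would accumulate timings per call site rather than remember only the last one.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical record of the most recent wrapped call site. */
struct call_site {
    const char *func;
    const char *file;
    int line;
    pthread_t thread;
};

static struct call_site last_site;

/* Hypothetical wrapper: remember who called, then call the real primitive.
 * It is defined before the macro below, so the call inside it still
 * reaches the real pthread_mutex_lock. */
static int WRAP_pthread_mutex_lock(const char *func, const char *file,
                                   int line, pthread_t self,
                                   pthread_mutex_t *m)
{
    last_site.func = func;
    last_site.file = file;
    last_site.line = line;
    last_site.thread = self;
    return pthread_mutex_lock(m);
}

/* From here on, every textual call goes through the wrapper. */
#define pthread_mutex_lock(m) \
    WRAP_pthread_mutex_lock(__func__, __FILE__, __LINE__, pthread_self(), (m))

/* Demo: returns the recorded line number, or -1 if it does not match
 * the line of the call below. */
int demo_call_site(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int expected_line = __LINE__ + 1;
    int rc = pthread_mutex_lock(&m);
    assert(rc == 0);
    pthread_mutex_unlock(&m);
    return last_site.line == expected_line ? last_site.line : -1;
}
```

Because the macro expands at the call site, __func__, __FILE__ and __LINE__ describe the caller, which is what lets options L and T attribute time to a source line and a thread.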
5
Timing Models
  • Assume no extra work done by the app.
  • Wallclock time: the 'time' command
  • useless beyond the roughest granularity
  • or produce ratios of the execution time with some
    base time scale.
  • 1. Cumulative time (default)
  • 2. Individual time (option I)

6
Timing models
[Figure: timing-model diagram. Surviving labels: 1, 2, 3, 4, W, C, I0, I1, I2.]
7
Experimental setup
  • Tiger mpa -sc sixplex -min 4
  • compiled with gcc -some-weird-flags
  • Application: Gaussian elimination
  • matrix size: 300x300
  • number of processors: 4
  • Everything averaged over 10 runs

8
PPPP=P
9
PPPP=PT
10
(where are we?)
  • Things we could do
  • charge profiling time to fine-grained units
  • use appropriate timing models
  • classify time into pthreads-specific notions

11
Time classification
  • Option B… blockedness…
  • primitive overhead without blocking
  • blocking in pthread_mutex_lock
  • blocking in pthread_join
  • blocking in pthread_cond_wait for:
    • receiving the signal
    • reacquiring the lock

12
pthread_mutex_lock
  • pthread_mutex_lock: blocking vs overhead
  • overhead is the time spent when there is no blocking, so
  • use pthread_mutex_trylock to grab the lock if it is
    available, else block.
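
One way to read the trylock idea is sketched below. This is an assumption about how the classification could be coded, not the project's actual wrapper: struct lock_times, now_sec and profiled_mutex_lock are hypothetical names, and CLOCK_MONOTONIC is just one possible time source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical accumulators, in seconds. */
struct lock_times {
    double overhead;   /* lock was free: cost of the primitive itself */
    double blocked;    /* lock was held: time spent waiting for it */
    long fast_calls;   /* calls that got the lock via trylock */
    long slow_calls;   /* calls that had to block */
};

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Try the non-blocking path first. If it succeeds, the elapsed time is
 * pure primitive overhead; otherwise fall back to the blocking call and
 * charge the whole interval to blocking. */
int profiled_mutex_lock(pthread_mutex_t *m, struct lock_times *t)
{
    double start = now_sec();
    int rc = pthread_mutex_trylock(m);

    if (rc == 0) {
        t->overhead += now_sec() - start;
        t->fast_calls++;
        return 0;
    }
    rc = pthread_mutex_lock(m);   /* the lock was busy: really block */
    t->blocked += now_sec() - start;
    t->slow_calls++;
    return rc;
}
```

In this sketch the caller owns one struct lock_times per thread (or per call site), so the counters need no extra locking. The blocked figure also contains the cost of the failed trylock, which is one reason the split is approximate.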

13
pthread_join
  • pthread_join: blocking vs overhead
  • pthread_create(..., start_routine, arg)
  • wrap_pthread_create(..., my_routine,
    (start_routine, arg))
  • my_routine:
  • (my_arg->start_routine)(my_arg->arg)
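
The trampoline described above might look like the sketch below. The names wrap_pthread_create and my_routine and the (start_routine, arg) pair come from the slide; everything else is an assumption made for illustration: the extra finish_time parameter, struct wrap_arg, wrap_pthread_join, and the rule that the time until the thread finished counts as blocking while the rest of the join counts as overhead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical bundle passed to the trampoline instead of the user's arg. */
struct wrap_arg {
    void *(*start_routine)(void *);
    void *arg;
    double *finish_time;   /* where to record when the user routine returned */
};

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Trampoline: run the user's routine, then note when it finished. */
static void *my_routine(void *p)
{
    struct wrap_arg *my_arg = p;
    void *result = (my_arg->start_routine)(my_arg->arg);
    *my_arg->finish_time = now_sec();
    free(my_arg);
    return result;
}

int wrap_pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                        void *(*start_routine)(void *), void *arg,
                        double *finish_time)
{
    struct wrap_arg *my_arg = malloc(sizeof *my_arg);
    int rc;

    if (my_arg == NULL)
        return EAGAIN;
    my_arg->start_routine = start_routine;
    my_arg->arg = arg;
    my_arg->finish_time = finish_time;
    rc = pthread_create(thread, attr, my_routine, my_arg);
    if (rc != 0)
        free(my_arg);
    return rc;
}

/* Assumed classification: time until the thread finished is blocking,
 * whatever remains of the join is primitive overhead. */
int wrap_pthread_join(pthread_t thread, void **retval,
                      const double *finish_time,
                      double *blocked, double *overhead)
{
    double start = now_sec();
    int rc = pthread_join(thread, retval);
    double end = now_sec();
    double wait = *finish_time - start;

    if (wait < 0.0)
        wait = 0.0;               /* thread had already finished */
    if (wait > end - start)
        wait = end - start;
    *blocked += wait;
    *overhead += (end - start) - wait;
    return rc;
}

static void *square(void *p)
{
    int *v = p;
    *v = *v * *v;
    return v;
}

/* Demo: the user routine still sees its own argument and return value. */
int demo_create_join(void)
{
    int value = 7;
    double finish = 0.0, blocked = 0.0, overhead = 0.0;
    pthread_t t;
    void *ret = NULL;

    if (wrap_pthread_create(&t, NULL, square, &value, &finish) != 0)
        return -1;
    if (wrap_pthread_join(t, &ret, &finish, &blocked, &overhead) != 0)
        return -1;
    assert(blocked >= 0.0 && overhead >= 0.0);
    return *(int *)ret;
}
```

Reading finish_time after pthread_join returns is safe because the join orders the thread's writes before the joiner's reads.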

14
pthread_cond_wait
  • pthread_cond_wait: signal-waiting vs
    lock-reacquire
  • simulate the signalling mechanism with a queue of
    waiting threads
  • assume the waiter wakes up as soon as the
    signaller finishes signalling
  • assume FIFO, else it'll be approximate...
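
The queue simulation can be sketched roughly as follows. This is a guess at one workable design under the slide's two assumptions, not the project's code: struct waiter, struct prof_cond, wrap_cond_wait, wrap_cond_signal and the demo are all hypothetical, and the sketch further assumes the signaller holds the user's mutex while it signals.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <time.h>

struct waiter {              /* hypothetical: one record per blocked thread */
    double signalled_at;     /* when the simulated signal arrived */
    int signalled;           /* set once a signaller picked this waiter */
    struct waiter *next;
};

struct prof_cond {           /* hypothetical: condvar plus simulated queue */
    pthread_cond_t cond;
    pthread_mutex_t qlock;   /* protects the queue and the totals */
    struct waiter *head;     /* oldest waiter first (FIFO assumption) */
    double signal_wait;      /* total time waiting for a signal */
    double reacquire;        /* total time reacquiring the lock */
};

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

int wrap_cond_wait(struct prof_cond *pc, pthread_mutex_t *m)
{
    struct waiter w = { 0.0, 0, NULL }, **p;
    double start = now_sec(), end, woke;
    int rc;

    pthread_mutex_lock(&pc->qlock);
    for (p = &pc->head; *p != NULL; p = &(*p)->next)
        ;                                /* walk to the tail */
    *p = &w;
    pthread_mutex_unlock(&pc->qlock);
    rc = pthread_cond_wait(&pc->cond, m);
    end = now_sec();
    pthread_mutex_lock(&pc->qlock);
    if (w.signalled) {
        woke = w.signalled_at < end ? w.signalled_at : end;
    } else {                             /* spurious or non-FIFO wakeup */
        for (p = &pc->head; *p != NULL && *p != &w; p = &(*p)->next)
            ;
        if (*p == &w)
            *p = w.next;                 /* unlink our stack record */
        woke = end;                      /* charge it all to signal-waiting */
    }
    pc->signal_wait += woke - start;
    pc->reacquire += end - woke;
    pthread_mutex_unlock(&pc->qlock);
    return rc;
}

int wrap_cond_signal(struct prof_cond *pc)
{
    int rc = pthread_cond_signal(&pc->cond);

    pthread_mutex_lock(&pc->qlock);
    if (pc->head != NULL) {              /* stamp and dequeue the oldest */
        pc->head->signalled = 1;
        pc->head->signalled_at = now_sec();
        pc->head = pc->head->next;
    }
    pthread_mutex_unlock(&pc->qlock);
    return rc;
}

static struct prof_cond demo_cond = {
    PTHREAD_COND_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, NULL, 0.0, 0.0
};
static pthread_mutex_t user_lock = PTHREAD_MUTEX_INITIALIZER;
static int ready, waiting;

static void *waiter_thread(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&user_lock);
    waiting = 1;
    while (!ready)
        wrap_cond_wait(&demo_cond, &user_lock);
    pthread_mutex_unlock(&user_lock);
    return NULL;
}

int demo_cond_wait(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, waiter_thread, NULL) != 0)
        return -1;
    pthread_mutex_lock(&user_lock);
    while (!waiting) {                   /* until the waiter is in cond_wait */
        pthread_mutex_unlock(&user_lock);
        sched_yield();
        pthread_mutex_lock(&user_lock);
    }
    ready = 1;
    wrap_cond_signal(&demo_cond);        /* signal while holding user_lock */
    pthread_mutex_unlock(&user_lock);
    pthread_join(t, NULL);
    return demo_cond.signal_wait >= 0.0 && demo_cond.reacquire >= 0.0 ? 0 : -1;
}
```

The time between the stamp and the return from pthread_cond_wait is what the waiter spent getting user_lock back. If the real wakeup order is not FIFO, a stamp can land on the wrong waiter, which is where the approximation mentioned above comes from.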

15
PPPP=PB
16
Interpretation
  • primitive overhead, overparallelization (PLB)
  • overuse of pthread_create/join (PTIB + grep)
  • lock contention (PLB)
  • overall load imbalance (TB)
  • imbalance at a barrier (PTB + grep)
  • spotting stupid mistakes: stronger semantics

17
MetaPiV
  • PiV is a parallel program (wow!)
  • so it needs synchronization
  • which incurs overhead
  • how does one measure that?
  • use PiV!! so.. MetaPiV.. reentrant!
  • Wrap all the pthreads calls that PiV makes, other
    than calls to original functions

18
PiV performance
19
Potential applications
  • parallel chess (Darin et al.): blip at 4 threads
  • performance debugging OpenMP compilers
  • called by a HPF compiler to make choices
  • 422 assignments grading!
  • and in general trying to optimize pthreads
    programs…

20
Future ideas
  • automated graphplotting
  • heuristics to spot performance bugs
  • measure actual CPU time that a process gets
    • to estimate the extra work
    • to use CPU idletime as a metric
  • standard deviation, in addition to total time