Code Optimization

Transcript and Presenter's Notes

1
Code Optimization

Orion Sky Lawlor
olawlor_at_uiuc.edu
2003/9/17

2
Roadmap

Introduction
gprof
Timer calls
Understanding Performance

3
Introduction

Scientific Performance Method
Measure (don't assume!)
Find the bottlenecks in the code
They aren't where you expect!
Fix the worst problems first
Consider stopping: is it good enough?
Fix
Improve algorithms first
Improve implementations second
Repeat (indefinitely)

4
gprof: UNIX performance tool

Compile and link with -pg flag
Adds instrumentation to code
Run serial program normally
Instrumentation runs automatically
Run gprof to analyze trace
gprof pgm gmon.out
Shows a function-level view of execution time
Heisenberg measurement error: the instrumentation itself changes the timings!
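As a rough sketch of the workflow on this slide, assuming GCC on a UNIX-like system, the C file below has one deliberately slow function and one cheap one, so a function-level profile has something to show. The file name `demo.c` and all function names are hypothetical, chosen only for illustration.

```c
/* demo.c: a toy workload for trying gprof (all names are hypothetical).
 *
 * Assumed workflow with GCC, after adding a main() that calls run_workload():
 *   gcc -pg demo.c -o demo     (compile and link with instrumentation)
 *   ./demo                     (writes gmon.out when the program exits)
 *   gprof demo gmon.out        (print the function-level profile)
 */
#include <assert.h>
#include <stddef.h>

/* Deliberately quadratic: counts pairs (i, j) with i < j and a[i] > a[j]. */
long slow_count_inversions(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long count = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (a[i] > a[j]) count++;
    return count;
}

/* Linear and cheap: sums the array. */
long fast_sum(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) sum += a[i];
    return sum;
}

/* Calls both functions on a descending array, where every pair is an inversion. */
long run_workload(void) {
    enum { N = 2000 };
    static int data[N];
    for (int i = 0; i < N; i++) data[i] = N - i;
    return slow_count_inversions(data, N) + fast_sum(data, N);
}
```

With a workload like this, the profile would be expected to attribute most of the time to the quadratic function, though the exact numbers depend on the machine and on the overhead the instrumentation adds.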

5
Timer Calls

CPU time (virtual time)
Time spent running your code
CmiCpuTimer(): 10ms resolution
Wall clock time (real time)
Includes OS interference, network delays, context-switching overhead
CmiWallTimer(): up to 1ns resolution
This is what really counts
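To make the CPU-time versus wall-clock distinction concrete, here is a minimal sketch that uses ISO C `clock()` and POSIX `clock_gettime()` as stand-ins. These are not the Charm++ timers named on the slide, and the helper names (`cpu_seconds`, `wall_seconds`, `time_a_sleep`) are assumptions made for this example.

```c
/* Stand-ins for CPU time and wall-clock time; NOT the Charm++ timers. */
#if !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <time.h>

/* CPU time consumed by this process so far, in seconds (ISO C clock()). */
double cpu_seconds(void) {
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Wall-clock time from a monotonic clock, in seconds (POSIX clock_gettime). */
double wall_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Sleeps for about 50 ms and reports how much CPU and wall time elapsed.
 * Sleeping uses almost no CPU, so the two results should differ widely. */
void time_a_sleep(double *cpu_elapsed, double *wall_elapsed) {
    assert(cpu_elapsed != NULL && wall_elapsed != NULL);
    struct timespec nap = { .tv_sec = 0, .tv_nsec = 50000000L };
    double c0 = cpu_seconds();
    double w0 = wall_seconds();
    nanosleep(&nap, NULL);
    *cpu_elapsed = cpu_seconds() - c0;
    *wall_elapsed = wall_seconds() - w0;
}
```

A sleep is the extreme case: the wall clock advances by the full delay while the CPU clock barely moves. Time spent waiting on the OS or the network shows up the same way, only in wall-clock time.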

6
How to call the timer: one time

What's wrong with this?
double s=CmiWallTimer();
foo();
double e=CmiWallTimer()-s;
CkPrintf("foo took %fs\n",e);
If CmiWallTimer takes 100ns, and foo takes 50ns, this may print 150ns!
Only a problem for very fast functions (or slow timers!)
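One way to check whether timer overhead matters for a given measurement is to estimate the cost of a timer call directly. The sketch below does this with a POSIX monotonic clock standing in for `CmiWallTimer()`. The helper names and the factor-of-100 threshold are assumptions for illustration, not part of Charm++.

```c
#if !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for CmiWallTimer(): monotonic wall time in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Estimates the cost of one timer call by timing `calls` back-to-back calls. */
double timer_overhead_seconds(int calls) {
    assert(calls > 0);
    double start = now_seconds();
    for (int i = 0; i < calls; i++) (void)now_seconds();
    return (now_seconds() - start) / calls;
}

/* Returns 1 if a single start/stop measurement of `elapsed` seconds is large
 * compared with the timer's own cost. The factor of 100 is an arbitrary,
 * illustrative threshold. */
int measurement_is_trustworthy(double elapsed, double overhead) {
    return elapsed > 100.0 * overhead;
}
```

If a single measurement fails this kind of check, the timed region is too short for one start/stop pair and needs a different measurement strategy.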

7
How to call the timer: repeat

Repetition can decrease apparent timer overhead and increase resolution
const int n=1000;
double s=CmiWallTimer();
for (int i=0;i<n;i++) foo();
double e=(CmiWallTimer()-s)/n;
CkPrintf("foo took %fs\n",e);
Problem: what if foo's performance is cache-sensitive?
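The repetition idea can be wrapped in a small helper. This is a sketch under stated assumptions: a POSIX monotonic clock stands in for `CmiWallTimer()`, and `seconds_per_call` and `sample_work` are hypothetical names. The untimed warm-up call is a deliberate choice, so the result reflects warm-cache behavior only.

```c
#if !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for CmiWallTimer(): monotonic wall time in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Average wall time per call of fn over n timed calls. */
double seconds_per_call(void (*fn)(void), int n) {
    assert(fn != NULL && n > 0);
    fn();  /* one untimed warm-up call, so the timed loop starts with warm caches */
    double start = now_seconds();
    for (int i = 0; i < n; i++) fn();
    return (now_seconds() - start) / n;
}

/* Volatile sink keeps the compiler from discarding the sample work entirely. */
static volatile long sink;

/* A small sample workload: sums 0..999 and stores the result. */
void sample_work(void) {
    long sum = 0;
    for (int i = 0; i < 1000; i++) sum += i;
    sink = sum;
}
```

Because every timed call runs right after an identical one, the average describes the function with its data already in cache. If the real program calls the function with cold caches, this number may understate its cost.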

8
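Understanding Performance: CPU

[Figure: labels placed along a logarithmic time scale with ticks at 1ns (GHz), 10ns, 100ns, 1us (MHz), 10us, 100us, and 1ms (KHz). Positions on the scale were lost in extraction.]
Labels on the figure: Integer Arithmetic; Floating Point; Branches; Cache; (int)x; Subroutine; Memory; /; / or; inline; Bitshifts and masks; (int )x; Cache-friendly algorithms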
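This slide lists "Bitshifts and masks" along with the `/` operator. One common reading is that division and remainder by a power of two can be replaced with a shift and a mask. The sketch below shows that reading for unsigned values. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x / 2^k for unsigned x, computed with a right shift. */
uint32_t div_pow2(uint32_t x, unsigned k) {
    assert(k < 32);
    return x >> k;
}

/* x % 2^k for unsigned x, computed by masking off the low k bits. */
uint32_t mod_pow2(uint32_t x, unsigned k) {
    assert(k < 32);
    return x & ((UINT32_C(1) << k) - 1u);
}
```

Compilers generally make this substitution themselves when the divisor is a compile-time constant, so the manual form mainly helps when the divisor is only known at run time to be a power of two. The equivalence is shown here for unsigned values only. For negative signed values a shift and a division can round differently.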
9
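Understanding Performance: OS

[Figure: labels placed along a logarithmic time scale with ticks at 1ns (GHz), 10ns, 100ns, 1us (MHz), 10us, 100us, and 1ms (KHz). Positions on the scale were lost in extraction.]
Labels on the figure: Timer; sin, tan,...; Malloc; Syscall; Tables, identities; Reuse buffers; Avoid; Disk; Timeslice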
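The "Reuse buffers" label sits next to "Malloc" on this slide. As a sketch of what that could mean in practice, the two functions below do the same work, one allocating a scratch buffer on every step and one allocating it once. The counting wrapper and all names are hypothetical, introduced only so the number of allocations can be compared.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counter, incremented by the wrapper below. */
static long malloc_calls = 0;

/* Thin wrapper around malloc that counts how often it is called. */
static void *counted_malloc(size_t bytes) {
    malloc_calls++;
    return malloc(bytes);
}

/* Allocates and frees a scratch buffer on every step. Returns -1 on failure. */
long process_fresh(int steps, size_t n) {
    assert(n > 0);
    long total = 0;
    for (int s = 0; s < steps; s++) {
        int *scratch = counted_malloc(n * sizeof *scratch);
        if (scratch == NULL) return -1;
        for (size_t i = 0; i < n; i++) scratch[i] = s;
        total += scratch[n - 1];
        free(scratch);
    }
    return total;
}

/* Same work, but one buffer is allocated up front and reused by every step. */
long process_reused(int steps, size_t n) {
    assert(n > 0);
    int *scratch = counted_malloc(n * sizeof *scratch);
    if (scratch == NULL) return -1;
    long total = 0;
    for (int s = 0; s < steps; s++) {
        for (size_t i = 0; i < n; i++) scratch[i] = s;
        total += scratch[n - 1];
    }
    free(scratch);
    return total;
}
```

Both versions return the same result. The difference is the allocation count: one call per step in the first version and a single call in the second.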
10
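Understanding Performance: Net

[Figure: labels placed along a logarithmic time scale with ticks at 1ns (GHz), 10ns, 100ns, 1us (MHz), 10us, 100us, and 1ms (KHz). Positions on the scale were lost in extraction.]
Labels on the figure: Message Combining; Rethink; Probe; RDMA; Barrier; Message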
11
Conclusions

Performance is the whole point of parallel programming
Painful but necessary
Asymptotics matter: find O(n)
Constants matter: count mallocs
Scientific Performance Method
Measure
Fix
Repeat
