APC523/AST523 Scientific Computation in Astrophysics

Slides: 61
Provided by: robert454

1
APC523/AST523: Scientific Computation in Astrophysics
  • Lecture 4
  • Programming for scientific computation

2
Topics covered today
  • Computer Languages
  • Good Programming Style
  • Software Engineering for Grad Students
  • Debugging
  • Optimization

3
1. Computer Languages
  • Primitive Languages
  • Compiled (normal) Languages
  • Interpreted (scripting) Languages

4
Primitive Languages
  • E.g. Machine Code, Assembler
  • Forth, PostScript
  • Require explicit instructions about how to do everything
  • Extremely powerful, in the right hands
  • Very tedious to use

5
C
    int main(void)
    {
        float x = 1;
        float y = 10;
        float z = x + y;
        return 0;
    }

N.b. an optimising compiler will ignore all of the floating point code (as it isn't actually used) and simply return 0
6
PPC Assembler
    0x2d08  0x3c003f80      N.b. 0x3f80 = 16256
            0x901e0020
            0x3c004120      N.b. 0x4120 = 16672
            0x901e001c
    0x2d18  0xc1be0020
            0xc01e001c
            0xec0d002a
            0xd01e0018

    0x2d08  lis   r0,16256        r0 = 16256 << 16
            stw   r0,32(r30)      *(r30 + 32) = r0
            lis   r0,16672        r0 = 16672 << 16
            stw   r0,28(r30)      *(r30 + 28) = r0
    0x2d18  lfs   f13,32(r30)     f13 = *(r30 + 32)
            lfs   f0,28(r30)      f0 = *(r30 + 28)
            fadds f0,f13,f0       f0 = f13 + f0
            stfs  f0,24(r30)      *(r30 + 24) = f0
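The two constants loaded by `lis` are the upper 16 bits of the IEEE-754 single-precision encodings of the C program's 1 and 10. A minimal Python sketch to check this, assuming big-endian 32-bit floats as on PowerPC (the helper name is made up for illustration):

```python
import struct


def float_from_high_halfword(high16: int) -> float:
    """Interpret (high16 << 16) as a big-endian IEEE-754 single-precision float."""
    word = high16 << 16                      # what `lis r0,high16` leaves in r0
    return struct.unpack(">f", struct.pack(">I", word))[0]


x = float_from_high_halfword(0x3f80)         # 0x3f80 == 16256
y = float_from_high_halfword(0x4120)         # 0x4120 == 16672
print(x, y, x + y)
```

The `fadds` instruction then performs the single addition that the C source asked for.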

7
Forth/Postscript
  • 1 10 +
  • 1 10 add

N.b. No concept of variable or datatype
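To see why such languages need no variables, here is a toy stack evaluator in Python. It is only a sketch of the idea: it understands integers plus the words `add` and `+`, and the function name is hypothetical rather than part of any real Forth or PostScript implementation.

```python
def eval_postfix(program: str) -> list:
    """Run a whitespace-separated postfix program and return the final stack."""
    stack = []
    operators = {"add": lambda a, b: a + b, "+": lambda a, b: a + b}
    for token in program.split():
        if token in operators:
            b = stack.pop()                  # operands come off the stack...
            a = stack.pop()
            stack.append(operators[token](a, b))   # ...and the result goes back on
        else:
            stack.append(int(token))         # anything else is pushed as a number
    return stack


print(eval_postfix("1 10"))       # both operands are left on the stack
print(eval_postfix("1 10 add"))   # the operator consumes them and pushes the sum
```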
8
Compiled (Normal) languages
  • Fortran (IV, 77, 90, 95)
  • C (K&R, 89, 99)
  • C++
  • Java (sort of)
  • Ada
  • Lisp
  • Modula (2, 3, Oberon)
  • Pascal

9
Characteristics of Compiled Languages
  • Good performance
  • Longish Edit-Compile-Link-Run cycle
  • Types defined (and checked) at compile time
  • Usually poor intrinsic support for datatypes
    beyond arrays

10
Interpreted (Scripting) languages
  • perl (4, 5, 6?)
  • python
  • Tcl (7, 8)
  • IDL
  • IRAF cl
  • MATLAB
  • lua
  • ruby
  • smalltalk

11
Characteristics of Interpreted Languages
  • Poor/Bad performance
  • Short Edit-Run cycle
  • Types often defined dynamically
  • Good intrinsic support for datatypes
  • Arrays, dictionaries, lists
  • Extensive libraries
  • FITS, SQL, XML
  • Graphics (FITS viewers, Tk)

12
Scripting + Compiled languages
  • Combine the best features of Scripting and
    Compiled languages
  • Write the compute-intensive parts in e.g. C
  • Write the rest in e.g. Python
  • Usually implemented via dynamically loaded libraries
  • The interfaces can usually be machine-generated
    using a scripting language, or SWIG (or emacs)

13
Object-Oriented Design
Once we move beyond Forth and Fortran 77, we naturally start packaging data into containers; rather than

    #define NPT 1000
    float x[NPT], y[NPT], z[NPT], vx[NPT], vy[NPT], vz[NPT];

we write

    typedef struct {
        float x, y, z;
        float vx, vy, vz;
    } POWER_POINT;
    POWER_POINT pts[NPT];

(Actually, we'd probably write something like

    POWER_POINT *pts = malloc(n*sizeof(POWER_POINT));
    assert (pts != NULL);
)
14
Is this Object-Oriented Design?
  • No

15
Object-Oriented Design
  • Associate code (methods) with data
  • Make usage of objects independent of (current)
    implementation
  • Design in terms of networks of objects, rather
    than sets of function calls
  • Support polymorphism where possible
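A small Python sketch of these points, reusing the position and velocity fields from the earlier struct. The class and function names are invented for illustration and do not come from the lecture's code.

```python
import math
from dataclasses import dataclass


@dataclass
class Particle:
    """Position and velocity packaged together with the code that uses them."""
    x: float
    y: float
    z: float
    vx: float
    vy: float
    vz: float

    def speed(self) -> float:
        return math.sqrt(self.vx**2 + self.vy**2 + self.vz**2)

    def kinetic_energy(self) -> float:
        raise NotImplementedError("subclasses decide what mass means")


@dataclass
class MassiveParticle(Particle):
    mass: float = 1.0

    def kinetic_energy(self) -> float:
        return 0.5 * self.mass * self.speed()**2


@dataclass
class TracerParticle(Particle):
    def kinetic_energy(self) -> float:
        return 0.0                           # massless tracer


def total_kinetic_energy(particles) -> float:
    # The caller relies only on the interface; each object supplies its own method
    return sum(p.kinetic_energy() for p in particles)


pts = [MassiveParticle(0, 0, 0, 3, 4, 0, mass=2.0), TracerParticle(1, 1, 1, 5, 0, 0)]
print(total_kinetic_energy(pts))
```

Because `total_kinetic_energy` never looks inside the objects, either class can change its internal representation without touching the caller.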

16
Python
    class actAtmosModel(object):
        """Describe an atmospheric model"""
        def __init__(self):
            self.skyOpticalDepth_dry = 3*actUtils.ACT_NO_VALUE
            self.CT2 = 3*actUtils.ACT_NO_VALUE

    def make_atmos(atmosFile, band, hard, alt0, az0, rand):
        """Make a model of the atmosphere, and project it onto the sky"""
        atModel = allocAndInitialize("atmos", actAtmosModel())

        if atmosFile[0] == ">":
            atmosFile = atmosFile[1:]

        size = 100                      # size of atmospheric patch, (m)
        pixscale = 0.1                  # size of pixels in simulation (m)
        npixel = int(size/pixscale + 0.5)

        at = actCalculateAtmosphere(npixel, pixscale, atModel.L_outer, rand)

17
C + SWIG
actImage.h
actUtils.i
18
Which language should I choose?
Write the compute-intensive parts in a compiled language (C, C++, F90).
For interactive code, it's best to use a scripting language. Such languages have a short debug-rerun cycle.
It is possible to mix languages.
We'll discuss later how to know what should be in C/Fortran, and what can be in any language you like.
19
2. Elements of Programming Style
  • Write code that expresses the algorithm
  • Design datatypes that either map naturally to the
    problem, or to the algorithm, or to both
  • Try to separate the program into as many
    self-contained parts as possible
  • Don't over-abstract the problem
  • Never believe that it's just a quick hack

20
Elements of style (IIa)
  • Always declare variables, and add a comment if
    they're non-trivial. If the language permits,
    initialise the variables where you declare them
  • Separate code from interfaces
  • Put all prototypes/typedefs in .h files
  • Always use module definitions
  • Document your APIs (function signatures)

21
    /*
     * Return the detected power for a given pointing. The input model of the sky is assumed
     * to have been convolved with the beam
     */
    int
    actGetSample(actFilter band,                          // the band of interest
                 const actTelstate *restrict pointing,    // where are we pointing?
                 const actHardware *restrict hardware,    // the PSFs, scales, array geometries etc.
                 const actSky *restrict sky,              // sky model, convolved with the beam
                 const actAtmos *restrict atmos,          // atmosphere model
                 actArrayNoiseModel *restrict nm,         // model of noise to add (or NULL)
                 actRandom *restrict rand)                // a source of entropy (NULL if nm is NULL)
    {
        assert (band >= 0 && band < ACT_NBAND);
        assert (pointing != NULL);
        assert (hardware != NULL);
        const actArrayGeom *arr = hardware->arrayGeoms[band];
        const int nrow = arr->nrow;
        const int ncol = arr->ncol;
        /*
         * Image the sky with the array
         */
        double alt = 0, az = 0;             // alt, az of a pixel in the array

        assert (sky != NULL && sky->values != NULL && sky->beam != NULL && sky->beam->ncomp > 0);
        const actWCS *wcs = sky->values[0][0]->wcs;
        assert (wcs != NULL && wcs->type == ACT_COE && wcs->unit == ACT_RADEC);

        const float C = sqrt(atmos->CP2[band]); // atmos->values assumes that CT2 is 1 mK^2 m^-5/3
22
Elements of style (IIb)
  • Write comments as you go, documenting what the
    code is supposed to do

    /*
     * Find which pixel that peak lies in, so (40.6, 40.5) --> (40, 40)
     * (adding 0.5 and truncating is the wrong thing to do)
     */
    rowc = objc->color[c]->rowc; colc = objc->color[c]->colc;

    if(objc->color[c]->flags & OBJECT1_DEBLENDED_AS_PSF) {
        int rad;                        // radius to mask
        i = rowc - rmin;
        if(rowc < rmin) {               // something's rotten in the state of the astrometry
            i = 0;
        }
        row = sym->ROWS[i];
        rad = 0.5*((rsize < csize) ? rsize : csize);

Don't merely restate the code:

    float x;                            // a real variable called x
    y = my_function(x);                 // call my_function with argument x
    j += 10;                            // add 10 to j
Elements of style (IIc)
  • Good variable/function names are more useful than
    comments.
  • Protect your namespace
  • Use consistent formatting (whitespace)
  • It doesn't matter which editor you use,
    provided:
  • It supports syntax colouring
  • It supports proper code indentation
  • Its name matches eacms4

24
3. Software engineering
  • We expect you to demonstrate knowledge of just
    two tools
  • make
  • cvs (or svn)

25
Make
  • You've probably typed
        cc foo.c
    or
        f77 foo.f
    and been surprised to see a file named a.out.
  • So you wrote a shell script
        cc -o foo foo.c
    or
        #!/bin/sh
        cc -o $1 $1.c

26
  • Then, after a while, you have make_foo:
        #!/bin/sh
        cc -c -g -O2 foo.c
        cc -c -g -O2 goo.c
        cc -o foo foo.o goo.o -lm
  • Used as
        /bin/rm -f *.o; make_foo
  • We expect you to use make instead. Write a
    Makefile that looks something like
        .c.o :
                $(CC) -c $(CFLAGS) $*.c
        CC = cc
        CFLAGS = -g -O2
        LIBS = -lm
        OBJS = foo.o goo.o
        foo : $(OBJS)

27
More Makefile Boilerplate
        PROGS = foo
        #
        # Update Makefile dependencies
        #
        -include .Makefile.depend
        .PHONY : depend
        depend :
                @echo Rebuilding make dependencies
                $(CC) $(CFLAGS) -MM $(OBJS:.o=.c) > .Makefile.depend
        clean :
                $(RM) $(PROGS) *.o core

28
Source Code Managers
  • There are three popular unix source code
    managers
  • cp/rsync
  • cvs
  • svn
  • We expect that you'll use cvs or svn

29
Naïve Source Code Managers
  • People have various strategies to avoid
    catastrophes when working on code
  • Pray, and rely on system backups
  • Rsync copies of your code to geographically
    distributed localities
  • Make snapshots every day/week/month/year and
    apply one of the two previous methods.

The latter helps with "I didn't change anything,
but my code stopped working"
30
CVS
  • rsync users type
  • rsync -r musebackups
  • cvs users type
  • cvs ci

In the interests of full disclosure, once upon a time they also had to type

    export CVS_RSH=ssh
    export CVSROOT=jeeves.astro.princeton.edu:/u/cvs/src
    cvs import my_project v1_0 v1_0
    cvs checkout my_project

There are many introductions to cvs on the web; the one that I recommend is
http://www.astro.princeton.edu/rhl/cvs/cvs-cookbook.html
31
Useful CVS commands
32
4. Debugging
  • There are two schools of thought on debugging
  • Sprinkle the code with print statements
  • Use a debugger
  • Sometimes the former approach is unavoidable,
    usually when you've violated the rules of the
    language, or if your favourite debugger is buggy.

33
GDB
My preferred debugger is gdb, with a nice,
powerful, command line interface and a limited
macro facility
  • Many people have implemented a graphical user
    interface on top of GDB, e.g.
  • Insight is a GUI for GDB written in tcl/tk.
  • DDD is a popular GUI for GDB and dbx (also xxgdb)
  • Code Medic is another GUI written for GDB.
  • kdbg is another GUI written for GDB, designed for
    KDE
  • HP Wildebeest (WDB) version of GDB which is
    included in new versions of HPUX.
  • GNU Visual Debugger written in Ada and uses the
    GtkAda graphical toolkit.
  • Jessie written in Java. Includes multi-thread and
    multi-process features.
  • RHIDE is yet another IDE, this one with a look
    and feel similar to the Borland 3.1 toolset.
  • (AST523 doesn't vouch for any of these)

34
Notes on using Debuggers
  • You'll have to compile with the -g flag, to
    include debugging information in the .o file
  • If you compile with -On, you'll find it harder to
    see what's going on as the compiler works harder
  • Code is moved around, e.g. out of loops
  • Flow-of-control may be confusing
  • Variables may not exist, or their values may be
    wrong (due to using registers)
  • Functions may not exist if they've been inlined

35
Why do I use a Debugger?
Because it allows me to explore hypotheses about
what went wrong as the fancy strikes me.
  • I can look at anything, not just read the output
    from compiled-in print statements
  • print object->child->psfCounts
  • I can tell the program to stop wherever I think
    that it might be interesting
  • stop in estimate_entropy when S_in > S_out

36
Debugging Memory Problems
Memory (stack or heap) brings its own problems
  • Corruption - you wrote where you shouldn't
  • Leaks - you failed to free memory that you were
    finished with
  • Not a problem for languages with Garbage
    Collection (e.g. python), but be careful when you
    mix languages with e.g. SWIG

37
Debugging Memory Problems
  • Most unix versions have debugging versions of the
    heap-management libraries that you can link to
    (or enable via environment variables).
  • There are tools such as purify (commercial) and
    valgrind that can be used to find leaks
  • I've found specialised wrappers around malloc
    very useful in SDSS and Pan-STARRS; they provide
    e.g. unique IDs for every memory transaction

38
5. Optimization
  • Only optimise code that needs to be optimised
  • Profile, don't guess, to find the bottlenecks
  • Improve algorithms before fiddling with code (but
    don't be lazy)
  • Trust the compiler to do a lot of the grunt work
  • Moore's Law trumps writing assembler

39
Profiling
  • The standard unix profiler is gprof (there are
    also hardware specific profilers of which more
    anon)
  • Compile and link all your code with -pg
  • Run your masterpiece, which will produce a file
    called gmon.out
  • Run gprof masterpiece to generate the desired
    profile

40
Profiling with gprof
  • gprof produces two types of information
  • Statistics on the call stack for every function
    call
  • (this is done by special code inserted in all
    function prologues to save sp, which is why you
    compile with -pg)
  • What is happening every tick (0.01s)
  • (this is done by the CRTL code generating SIGPROF
    interrupts every tick to save sp, which is why
    you link with -pg)

41
Sample gprof Output
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls   s/call   s/call  name
     64.91    484.05   484.05        1   484.05   536.54  spatial_convolve
     13.21    582.54    98.49        1    98.49    98.49  spreadMask
      7.04    635.04    52.50    19213     0.00     0.00  make_kernel
      6.75    685.38    50.34      200     0.25     0.27  getPsfCenters
      1.94    699.84    14.46      204     0.07     0.10  getStampStats3
      1.49    710.95    11.11     1372     0.01     0.01  xy_conv_stamp
      0.99    718.36     7.41                             main
      0.59    722.75     4.39  5906776     0.00     0.00  ran1
      0.50    726.46     3.71  8266178     0.00     0.00  get_background
      0.46    729.88     3.42        7     0.49     0.49  fset
      0.45    733.26     3.38   238835     0.00     0.00  checkPsfCenter
      0.32    735.66     2.40        4     0.60     0.60  makeNoiseImage4
      0.29    737.80     2.14        1     2.14     2.14  makeInputMask
      0.25    739.65     1.85      215     0.01     0.01  sigma_clip

42
Sample gprof Output (II)
    index % time    self  children    called           name
                                                           <spontaneous>
    [1]     99.8    7.41   736.79                      main [1]
                  484.05    52.49       1/1                spatial_convolve [2]
                   98.49     0.00       1/1                spreadMask [3]
                    0.00    74.26     100/100              buildStamps [4]
                    0.02    11.22      28/30               fillStamp [8]
                    3.71     0.00 8266128/8266178          get_background [11]
                    3.42     0.00       7/7                fset [12]
                    2.40     0.00       4/4                makeNoiseImage4 [14]
                    0.02     2.26       2/2                check_stamps [15]
                    2.14     0.00       1/1                makeInputMask [16]
                    0.00     1.43       1/1                fitKernel [19]
                    0.42     0.00       1/1                getNoiseStats3 [24]
                    0.28     0.12       4/204              getStampStats3 [7]
                    0.05     0.00       1/1                hp_fits_write_subset [32]
                    0.01     0.00       1/215              sigma_clip [17]
                    0.01     0.00       3/19213            make_kernel [6]
                    0.00     0.00     209/209              imin [45]

(To Be Continued)
43
(continued)
    index % time    self  children    called           name
                  484.05    52.49       1/1                main [1]
    [2]     71.9  484.05    52.49       1              spatial_convolve [2]
                   52.49     0.00   19208/19213            make_kernel [6]
    -----------------------------------------------
                   98.49     0.00       1/1                main [1]
    [3]     13.2   98.49     0.00       1              spreadMask [3]
    -----------------------------------------------
                    0.00    74.26     100/100              main [1]
    [4]     10.0    0.00    74.26     100              buildStamps [4]
                   50.34     3.38     200/200              getPsfCenters [5]
                   14.18     6.02     200/204              getStampStats3 [7]
                    0.34     0.00     200/200              cutStamp [25]
    -----------------------------------------------
                   50.34     3.38     200/200              buildStamps [4]
    [5]      7.2   50.34     3.38     200              getPsfCenters [5]
                    3.38     0.00  238835/238835           checkPsfCenter [13]
                    0.00     0.00      28/28               quick_sort [47]
    -----------------------------------------------

44
Profiling with vendor tools
Most modern processors have hardware counters that keep track of instructions executed, floating point operations completed, cache hits, etc.
Accessing this information requires profiling software provided (sold!) by the chip manufacturer.
Advantage: can provide much more detailed information about performance.
Example: SpeedShop on SGI Origin and Altix machines
45
Summary for execution of athena -i ../tst/2D-mhd/athinput.linear-wave time/tlim=0.2

Based on 400 MHz IP35 MIPS R12000/R14000 CPU

                                                                              Typical    Minimum    Maximum
Event Counter Name                                           Counter Value Time (sec) Time (sec) Time (sec)

 0  Cycles                                                      1332354800   3.330887   3.330887   3.330887
16  Executed prefetch instructions                                  676432   0.000000   0.000000   0.000000
21  Graduated floating point instructions                        502607856   1.256520   0.628260  65.339021
 2  Decoded loads                                                448868464   1.122171   1.122171   1.122171
18  Graduated loads                                              443549696   1.108874   1.108874   1.108874
 3  Decoded stores                                               252710176   0.631775   0.631775   0.631775
19  Graduated stores                                             252191312   0.630478   0.630478   0.630478
 4  Miss handling table occupancy                                135901872   0.339755   0.339755   0.339755
25  Primary data cache misses                                     10219840   0.217172   0.055443   0.217172
24  Mispredicted branches                                          8901440   0.162006   0.133522   0.196054
 6  Resolved conditional branches                                 63185632   0.157964   0.157964   0.157964
22  Quadwords written back from primary data cache                11542880   0.114852   0.090612   0.114852
26  Secondary data cache misses                                      44032   0.010996   0.006938   0.010996
 9  Primary instruction cache misses                                 56768   0.002414   0.000616   0.002414
 7  Quadwords written back from scache                               46080   0.000978   0.000680   0.001010
23  TLB misses                                                        3264   0.000635   0.000635   0.000635
10  Secondary instruction cache misses                                 240   0.000060   0.000038   0.000060
31  Store/prefetch exclusive to shared block in scache               12192   0.000030   0.000030   0.000030
30  Store/prefetch exclusive to clean block in scache                  288   0.000001   0.000001   0.000001
 1  Decoded instructions                                        1708899584   0.000000   0.000000   4.272249
 5  Failed store conditionals                                            0   0.000000   0.000000   0.000000
 8  Correctable scache data array ECC errors                             0   0.000000   0.000000   0.000000
11  Instruction misprediction from scache way prediction table        3824   0.000000   0.000000   0.000010
12  External interventions                                            5616   0.000000   0.000000   0.000000
13  External invalidations                                           21712   0.000000   0.000000   0.000000
14  ALU/FPU progress cycles                                              0   0.000000   0.000000   0.000000
15  Graduated instructions                                      1605826832   0.000000   0.000000   4.014567
17  Prefetch primary data cache misses                               88256   0.000000   0.000000   0.000221
20  Graduated store conditionals                                         0   0.000000   0.000000   0.000000
27  Data misprediction from scache way prediction table              77648   0.000000   0.000000   0.000194
28  State of intervention hits in scache                              5520   0.000000   0.000000   0.000000
29  State of invalidation hits in scache                              5008   0.000000   0.000000   0.000000
46
Statistics

Graduated instructions/cycle                                            1.205255
Graduated floating point instructions/cycle                             0.377233
Graduated loads & stores/cycle                                          0.522189
Graduated loads & stores/floating point instruction                     1.384262
Mispredicted branches/Resolved conditional branches                     0.140878
Graduated loads/Decoded loads (and prefetches)                          0.986664
Graduated stores/Decoded stores                                         0.997947
Data mispredict/Data scache hits                                        0.007631
Instruction mispredict/Instruction scache hits                          0.067648
L1 Cache Line Reuse                                                    67.077485
L2 Cache Line Reuse                                                   231.100291
L1 Data Cache Hit Rate                                                  0.985311
L2 Data Cache Hit Rate                                                  0.995692
Time accessing memory/Total time                                        0.590880
Time not making progress (probably waiting on memory) / Total time      1.000000
L1--L2 bandwidth used (MB/s, average per process)                     153.629036
Memory bandwidth used (MB/s, average per process)                       1.913417
MFLOPS (average per process)                                          150.893097
Cache misses in flight per cycle (average)                              0.102001
Prefetch cache miss rate                                                0.130473
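Several of these statistics follow directly from the raw counter values on the previous slide. The Python sketch below recomputes a few of them. The variable names are mine, and the formulas are inferred from the labels rather than taken from SpeedShop documentation.

```python
# Raw counter values copied from the hardware-counter summary
cycles = 1332354800
graduated_instructions = 1605826832
graduated_fp_instructions = 502607856
graduated_loads = 443549696
graduated_stores = 252191312
clock_hz = 400e6                             # "Based on 400 MHz" CPU

run_time = cycles / clock_hz                 # seconds spent executing
instr_per_cycle = graduated_instructions / cycles
fp_per_cycle = graduated_fp_instructions / cycles
loads_stores_per_cycle = (graduated_loads + graduated_stores) / cycles
mflops = graduated_fp_instructions / run_time / 1e6

print(f"time            {run_time:.6f} s")
print(f"instr/cycle     {instr_per_cycle:.6f}")
print(f"fp instr/cycle  {fp_per_cycle:.6f}")
print(f"ld+st/cycle     {loads_stores_per_cycle:.6f}")
print(f"MFLOPS          {mflops:.3f}")
```

On a 400 MHz processor, about 150 MFLOPS means fewer than 0.4 floating point instructions complete per cycle, which fits the report's note that the code is probably waiting on memory.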
47
Improving Algorithms: Sorting
  • Bubble sort: n^2
  • Insertion sort: n^2
  • Shell sort: n^(3/2)
  • Quick sort: n ln(n)
  • Heap sort: n ln(n)
  • Radix sort: n
  • Stupid sort: n!
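The difference between n^2 and n ln(n) is easy to measure by counting comparisons. This Python sketch compares insertion sort with a heap sort built on the standard `heapq` module. The `Counted` wrapper and the function names are illustrative, and reversed input is used because it is the worst case for insertion sort.

```python
import heapq


class Counted:
    """Wrap a value and count every '<' comparison made on it."""
    ncompare = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.ncompare += 1
        return self.value < other.value


def insertion_sort(items):
    a = list(items)
    for i in range(1, len(a)):
        item = a[i]
        j = i - 1
        while j >= 0 and item < a[j]:        # shift larger elements to the right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = item
    return a


def heap_sort(items):
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


def count_comparisons(sorter, values):
    Counted.ncompare = 0
    result = sorter([Counted(v) for v in values])
    assert [c.value for c in result] == sorted(values)
    return Counted.ncompare


values = list(range(1000, 0, -1))            # reversed: worst case for insertion sort
n_insertion = count_comparisons(insertion_sort, values)
n_heap = count_comparisons(heap_sort, values)
print("insertion sort:", n_insertion, "comparisons")
print("heap sort:     ", n_heap, "comparisons")
```

For n = 1000 the insertion sort needs n(n-1)/2 comparisons, while the heap sort needs a small multiple of n log2(n).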

48
Improving Algorithms: Astronomy
  • Consider a CCD with a few bad pixels.

If I want to ask "Is this pixel bad?", an
unsigned char mask might be a good representation.
If I want to return all of the bad pixels, a
struct { int x, y; } badpixels
might be a good representation.
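The two representations can be sketched in Python as follows. The detector size and bad-pixel positions are invented for illustration. The point is that each question is cheap in one representation and expensive in the other.

```python
NCOL, NROW = 8, 6                            # hypothetical detector size
bad_pixels = [(1, 2), (7, 5)]                # list representation: (x, y) pairs

# Mask representation: one byte per pixel, non-zero means bad
mask = [bytearray(NCOL) for _ in range(NROW)]
for x, y in bad_pixels:
    mask[y][x] = 1


def is_bad(x, y):
    """'Is this pixel bad?' is a single lookup in the mask."""
    return mask[y][x] != 0


def list_bad_from_mask():
    """'Give me all bad pixels' needs a scan of every pixel in the mask."""
    return [(x, y) for y in range(NROW) for x in range(NCOL) if mask[y][x]]


print(is_bad(1, 2), is_bad(2, 1))
print(list_bad_from_mask())
```

With the list, returning all bad pixels is immediate, but answering "is this pixel bad?" means searching the list.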
49
Let's look at another example

    #include <stdlib.h>
    #include <stdio.h>
    #include <math.h>
    #include "alias.h"

    bool
    AST523_calc_trajectory(
        AST523_TRAJECTORY *traj,            // object's trajectory
        float height0,                      // height above ground where egg was released [m]
        float vel0)                         // initial velocity of egg [m/s]
    {
        const float g = 9.81;               // acceleration due to gravity [m/s^2]

        for (int i = 0; i < traj->npt; i++) {
            float t = traj->time[i];
            traj->height[i] = height0 + vel0*t - g*pow(t,2)/2;
        }

        return (traj->height[traj->npt - 1] < 0) ? true : false;
    }
50
alias.h
    #if !defined(ALIAS_H)
    #define ALIAS_H
    #include <stdbool.h>

    typedef struct {
        float *time;                        // time since egg was thrown
        float *height;                      // height of egg at time time
        int npt;                            // dimension of height, time
    } AST523_TRAJECTORY;

    bool AST523_calc_trajectory(
        AST523_TRAJECTORY *traj,            // object's trajectory
        float height0,                      // initial height above ground [m]
        float vel0);                        // object's initial velocity [m/s]
    #endif

51
What does gprof tell us?
gprof --line egg_toss

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  Ts/call  Ts/call  name
     46.01      2.77     2.77                             AST523_calc_trajectory (alias.c:16)
     27.79      4.43     1.67                             main (main.c:39)
     20.05      5.64     1.21                             AST523_calc_trajectory (alias.c:14)
      5.32      5.96     0.32                             main (main.c:38)
      0.83      6.01     0.05                             AST523_calc_trajectory (alias.c:15)
      0.00      6.01     0.00        1     0.00     0.00  AST523_calc_trajectory (alias.c:11)
      0.00      6.01     0.00        1     0.00     0.00  trajDel (main.c:24)
      0.00      6.01     0.00        1     0.00     0.00  trajNew (main.c:12)

    alias.c:14    for (int i = 0; i < traj->npt; i++) {
    alias.c:15        float t = traj->time[i];
    alias.c:16        traj->height[i] = height0 + vel0*t - g*pow(t,2)/2;
    alias.c:17    }
52
    0x804869d alias.c:14  mov    (%edi),%edx
    0x804869f alias.c:14  xor    %esi,%esi
    0x80486a1 alias.c:14  cmp    $0x0,%edx
    0x80486a4 alias.c:14  jmp    0x80486e8 <alias.c:14>
    0x80486a6 alias.c:14  mov    %esi,%esi
    0x80486a8 alias.c:15  mov    0x4(%edi),%eax
    0x80486ab alias.c:15  flds   (%eax,%esi,4)
    0x80486ae alias.c:16  flds   0x10(%ebp)
    0x80486b1 alias.c:16  mov    0x8(%edi),%ebx
    0x80486b4 alias.c:16  fmul   %st(1),%st
    0x80486b6 alias.c:16  push   $0x40000000
    0x80486bb alias.c:16  fadds  0xc(%ebp)
    0x80486be alias.c:16  push   $0x0
    0x80486c0 alias.c:16  sub    $0x8,%esp
    0x80486c3 alias.c:16  fstpl  0xffffffe0(%ebp)
    0x80486c6 alias.c:16  fstpl  (%esp)
    0x80486c9 alias.c:16  call   0x8048450 <_init+88>

53
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  Ts/call  Ts/call  name
     53.60      2.78     2.78                             AST523_calc_trajectory (alias.c:16)
     32.67      4.47     1.69                             main (main.c:39)
      5.91      4.78     0.31                             AST523_calc_trajectory (alias.c:14)
      3.68      4.97     0.19                             AST523_calc_trajectory (alias.c:15)
      3.10      5.13     0.16                             main (main.c:38 @ 804868c)
      1.45      5.20     0.08                             main (main.c:38 @ 8048669)
      0.00      5.20     0.00        1     0.00     0.00  AST523_calc_trajectory (alias.c:11)

Make the obvious change

    mov    (%edi),%edx
    inc    %esi
    add    $0x10,%esp
    cmp    %esi,%edx
    jg     0x80486a8 <alias.c:15>

    inc    %esi
    cmp    0xffffffe8(%ebp),%esi
    jl     0x8048704 <alias.c:15>

    alias.c:13    const int npt = traj->npt;      // dealias traj->npt
    alias.c:14    for (int i = 0; i < npt; i++) {
    alias.c:15        float t = traj->time[i];
    alias.c:16        traj->height[i] = height0 + vel0*t - g*pow(t,2)/2;
    alias.c:17    }
54
Back to the previous example
gprof --line hotpants

    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  ns/call  ns/call  name
     10.95     81.64    81.64                             spreadMask (functions.c:1372)
      6.39    129.29    47.65                             spatial_convolve (alard.c:1287)
      5.67    171.58    42.29                             spatial_convolve (alard.c:1295)
      5.26    210.81    39.23                             spatial_convolve (alard.c:1298)
      4.97    247.84    37.03                             spatial_convolve (alard.c:1301)
      4.93    284.61    36.77                             spatial_convolve (alard.c:1294)
      4.93    321.38    36.77                             spatial_convolve (alard.c:1293)
      4.29    353.38    32.01                             spatial_convolve (alard.c:1291)

55
functions.c
  • functions.c:1372
        mData[ii + rPixX*jj] |= FLAG_OK_CONV*(!(mData[ii + rPixX*jj] & FLAG_INPUT_ISBAD));

    (gdb) b functions.c:1372
    (gdb) continue
    (gdb) x/16i $pc
    0x8059d93  imul   %eax,%ecx
    0x8059d96  add    0xffffffec(%ebp),%ecx
    0x8059d99  mov    0x8(%ebp),%eax
    0x8059d9c  mov    (%eax,%ecx,4),%eax
    0x8059d9f  test   %al,%al
    0x8059da1  mov    %ecx,0xffffffe4(%ebp)
    0x8059da4  mov    %eax,0xffffffe0(%ebp)
    0x8059da7  js     0x8059dad <functions.c:1372>
    0x8059da9  orl    $0x40,0xffffffe0(%ebp)
    0x8059dad  mov    0xffffffe0(%ebp),%ecx
    0x8059db0  mov    0xffffffe4(%ebp),%esi
    0x8059db3  mov    0x8(%ebp),%eax
    0x8059db6  mov    %ecx,(%eax,%esi,4)
    0x8059db9  mov    0x818a6d4,%ecx
    0x8059dbf  mov    %ecx,0xffffffe8(%ebp)
    0x8059dc2  mov    0x818a6e8,%ecx

globals.h: int rPixX;
56
Let's help the compiler
  • Make those globals local, and

    0x8059da8  imul   0xffffffe4(%ebp),%eax
    0x8059dac  add    0xffffffe8(%ebp),%eax
    0x8059daf  mov    0xfffffff0(%ebp),%ecx
    0x8059db2  mov    %eax,0xffffffd8(%ebp)
    0x8059db5  mov    (%ecx,%eax,4),%eax
57
alard.c
  • alard.c:1287
        for (ic = i - hwKernel; ic <= i + hwKernel; ic++)

globals.h: int hwKernel;
58
Rerun gprof
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls   s/call   s/call  name
     62.42    432.09   432.09        1   432.09   477.76  spatial_convolve
     14.88    535.07   102.98        1   102.98   102.98  spreadMask

Cf. the old results:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls   s/call   s/call  name
     64.91    484.05   484.05        1   484.05   536.54  spatial_convolve
     13.21    582.54    98.49        1    98.49    98.49  spreadMask
59
Why didn't that help with spreadMask?

    for (l = -w2; l <= w2; l++) {
        jj = j + l;
        if (jj < 0 || jj >= rPixY_l)
            continue;
        mData[ii + rPixX_l*jj] |= FLAG_OK_CONV*(!(mData[ii + rPixX_l*jj] & FLAG_INPUT_ISBAD));
    }

Memory's being addressed in the wrong order
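A short Python sketch of what "the wrong order" means for a flattened image. It assumes, for illustration only, that element (ii, jj) lives at offset ii + npix_x*jj, so that ii varies fastest in memory; the function names and the image width are hypothetical.

```python
def flat_index(ii, jj, npix_x):
    """Offset of element (ii, jj) when the image is stored as data[ii + npix_x*jj]."""
    return ii + npix_x * jj


def inner_loop_strides(inner_over_jj, npix_x, n=4):
    """Distance, in elements, between consecutive accesses made by the inner loop."""
    if inner_over_jj:
        offsets = [flat_index(0, jj, npix_x) for jj in range(n)]
    else:
        offsets = [flat_index(ii, 0, npix_x) for ii in range(n)]
    return [b - a for a, b in zip(offsets, offsets[1:])]


npix_x = 2048                                # hypothetical image width
print("inner loop over jj:", inner_loop_strides(True, npix_x))
print("inner loop over ii:", inner_loop_strides(False, npix_x))
```

With the inner loop over jj, each access jumps a whole image row ahead and so tends to land on a different cache line. With the inner loop over ii, consecutive accesses are adjacent in memory.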
60
Moral Lessons
  • Understand your computer and your languages
  • Don't be sloppy; think!
  • Go forth and multiply