A Smarter Paradyn Performance Consultant: Combining a Call Graph-Based Search with Stack Sampling

August 9, 2000. © 2000 Barton P. Miller.

1
A Smarter Paradyn Performance Consultant
Combining a Call Graph-Based Search with Stack
Sampling
  • Barton Miller and Philip Roth
  • {bart,pcroth}@cs.wisc.edu
  • Computer Sciences Department
  • University of Wisconsin
  • Madison, WI 53706-1685
  • USA

2
Paradyn
  • Uses two main Paradyn technologies:
      • Dynamic instrumentation
      • Automated bottleneck search (Performance
        Consultant)
  • The PC has been effective for both novices and
    experts . . .
  • . . . and our new call graph-based search is a
    definite win.
  • Underlying theme: automate the techniques of an
    experienced programmer.
  • But we can do better: use sampling data to
    speed the search.

3
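Paradyn Basics: Resource Hierarchies
[Figure: the three resource hierarchies. Machine contains hosts (Host1, Host2) with their processes and threads; Code contains the modules testutil.C, main.C, and vect.C with the functions printstatus, debugA, debugB, main, vectinsert, vectdelete, and vectsize; SyncObject contains Barrier, Message, Semaphore, and SpinLock, with leaf resources b1, c1, c2, t1, sem1, and spin1.]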
4
Paradyn Basics: Resource Hierarchies
[Figure: the Machine, Code, and SyncObject resource hierarchies, as on slide 3.]
Example focus: /Code/testutil.C/printstatus, /Machine, /SyncObject
5
Paradyn Basics: Resource Hierarchies
[Figure: the Machine, Code, and SyncObject resource hierarchies, as on slide 3.]
Example focus: /Code/testutil.C/printstatus, /Machine/Host1/Process1, /SyncObject
6
Paradyn Basics: Performance Metrics
  • Metrics are measurable performance
    characteristics such as CPU time, function
    calls, I/O bytes transferred, L2 cache misses
  • Performance data is collected for each metric/focus pair
  • Example metric/focus pairs:
      • cpu /Code/mod1/func1
      • msgs /Code/mod1/func1, /Machine/host1/proc4/thread2,
        /SyncObject/Message/1/0
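
A minimal Python sketch of how a metric/focus pair could be represented, assuming a focus selects one path in each of the three hierarchies and that an unconstrained hierarchy is written as its root. The class names Focus and MetricFocusPair are hypothetical and are not Paradyn's own API.

```python
from dataclasses import dataclass

# Hypothetical model for illustration; not Paradyn's actual data structures.
HIERARCHY_ROOTS = ("Code", "Machine", "SyncObject")


@dataclass(frozen=True)
class Focus:
    """One resource path per hierarchy; the default is the hierarchy root."""
    code: str = "/Code"
    machine: str = "/Machine"
    sync_object: str = "/SyncObject"

    def __post_init__(self):
        paths = (self.code, self.machine, self.sync_object)
        for path, root in zip(paths, HIERARCHY_ROOTS):
            prefix = "/" + root
            if not (path == prefix or path.startswith(prefix + "/")):
                raise ValueError(f"{path!r} is not under {prefix}")

    def __str__(self):
        return ", ".join((self.code, self.machine, self.sync_object))


@dataclass(frozen=True)
class MetricFocusPair:
    """A metric name together with the focus it is measured for."""
    metric: str
    focus: Focus

    def __str__(self):
        return f"{self.metric} {self.focus}"


# The two example pairs from the slide.
cpu_pair = MetricFocusPair("cpu", Focus(code="/Code/mod1/func1"))
msgs_pair = MetricFocusPair(
    "msgs",
    Focus(
        code="/Code/mod1/func1",
        machine="/Machine/host1/proc4/thread2",
        sync_object="/SyncObject/Message/1/0",
    ),
)
print(cpu_pair)
print(msgs_pair)
```

Unlike the slide's shorthand, this sketch always prints all three hierarchies, so the cpu example shows /Machine and /SyncObject explicitly.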

7
Performance Consultant Basics
  • Why is the application running slowly?
      • Test bottleneck hypotheses:
          • CPU bound?
          • I/O wait bound?
          • Synchronization wait bound?
          • Memory bound?
      • A performance metric is associated with each
        hypothesis
  • Which part of the application is slow?
      • Isolates the bottleneck to part of the resource
        hierarchy
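
The hypothesis test can be sketched as a comparison of each hypothesis's metric against a threshold. The metric names, the threshold values, and the function evaluate_hypotheses below are assumptions made for illustration; the slides do not give them.

```python
# Hypothetical metric names and thresholds, chosen only for illustration.
HYPOTHESES = {
    "CPUBound": ("cpu_fraction", 0.20),
    "IOWaitBound": ("io_wait_fraction", 0.20),
    "SyncWaitBound": ("sync_wait_fraction", 0.20),
    "MemoryBound": ("memory_fraction", 0.20),
}


def evaluate_hypotheses(measurements, hypotheses=HYPOTHESES):
    """Return the hypotheses whose associated metric exceeds its threshold."""
    return [
        name
        for name, (metric, threshold) in hypotheses.items()
        if measurements.get(metric, 0.0) > threshold
    ]


# Fractions of execution time for one focus (made-up numbers).
measurements = {
    "cpu_fraction": 0.65,
    "io_wait_fraction": 0.05,
    "sync_wait_fraction": 0.25,
    "memory_fraction": 0.02,
}
true_hypotheses = evaluate_hypotheses(measurements)
print(true_hypotheses)
```

Each hypothesis that tests true would then be refined to a narrower focus in the resource hierarchy.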

8
Call Graph Based Performance Consultant
  • Based on the application's call graph
  • Code hierarchy search starts at function main,
    then continues to main's children
  • Advantages: lots!
      • It's scalable: natural hierarchical refinement
        from coarse-grained search to fine-grained search
      • Uses less costly inclusive metrics
      • Functions that are not part of the call graph
        will never be instrumented
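
A sketch of the top-down refinement described above, assuming a function's callees are examined only when its inclusive metric exceeds a threshold. The call graph edges, the metric values, and the function name call_graph_search are hypothetical.

```python
from collections import deque


def call_graph_search(call_graph, inclusive_time, threshold, root="main"):
    """Refine from the root: expand a function's callees only if it tests true."""
    bottlenecks, instrumented = [], []
    queue, seen = deque([root]), {root}
    while queue:
        func = queue.popleft()
        instrumented.append(func)
        if inclusive_time.get(func, 0.0) <= threshold:
            continue  # hypothesis is false here, so the subtree is not searched
        bottlenecks.append(func)
        for callee in call_graph.get(func, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return bottlenecks, instrumented


# Hypothetical call graph and inclusive CPU fractions.
call_graph = {
    "main": ["a1", "a2", "a3", "a4"],
    "a2": ["b1", "b2", "b3"],
    "unused": ["b1"],  # never reached from main
}
inclusive_time = {
    "main": 0.9, "a1": 0.05, "a2": 0.7, "a3": 0.1, "a4": 0.05,
    "b1": 0.1, "b2": 0.55, "b3": 0.05,
}
bottlenecks, instrumented = call_graph_search(call_graph, inclusive_time, 0.2)
print(bottlenecks)
print(instrumented)
```

The function named unused is never instrumented because the search only follows edges reachable from main.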

9
Call Graph Based PC Example
[Figure: search tree with Top Level Hypothesis at the root and the hypotheses SyncWaitBound, CPUBound, and I/OWaitBound.]
10
Call Graph Based PC Example
[Figure: search tree with Top Level Hypothesis; the hypotheses SyncWaitBound, CPUBound, and I/OWaitBound; and the function main.]
11
Call Graph Based PC Example
[Figure: search tree with Top Level Hypothesis; the hypotheses SyncWaitBound, CPUBound, and I/OWaitBound; the function main; and the functions a1, a2, a3, and a4.]
12
Call Graph Based PC Example
[Figure: search tree with Top Level Hypothesis; the hypotheses SyncWaitBound, CPUBound, and I/OWaitBound; the function main; and the functions a1, a2, a3, and a4.]
13
Call Graph Based PC Example
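[Figure: search tree with Top Level Hypothesis; the hypotheses SyncWaitBound, CPUBound, and I/OWaitBound; the function main; the functions a1, a2, a3, and a4; and the functions b1, b2, and b3.]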
14
Call Graph Based PC Example
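[Figure: search tree with Top Level Hypothesis; the hypotheses SyncWaitBound, CPUBound, and I/OWaitBound; the function main; the functions a1, a2, a3, and a4; and the functions b1, b2, and b3.]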
15
Call Graph Construction
  • Problem: the targets of calls made through
    function pointers and virtual functions cannot
    be determined statically.
  • Unknown callees in the static call graph may cause
    blind spots in the new PC search
  • We resolve dynamic callee addresses at run time
  • Strategy:
      • Build the static call graph at program start
      • Fill in the dynamic call graph on demand
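
The strategy above can be sketched as a call graph that starts with static edges only and resolves a function's dynamic call sites the first time its callees are requested. The class CallGraph and the observe callback, which stands in for run-time instrumentation, are hypothetical.

```python
class CallGraph:
    """Static edges known at start; dynamic edges filled in on demand."""

    def __init__(self, static_edges, functions_with_dynamic_sites):
        self.edges = {caller: set(callees) for caller, callees in static_edges.items()}
        self.unresolved = set(functions_with_dynamic_sites)

    def callees(self, func, observe):
        """Return func's callees, resolving its dynamic call sites on first use.

        observe(func) is a stand-in for instrumenting func's dynamic call
        sites and collecting the callee names seen at run time.
        """
        if func in self.unresolved:
            self.unresolved.discard(func)
            self.edges.setdefault(func, set()).update(observe(func))
        return sorted(self.edges.get(func, ()))


resolved = []


def observe(func):
    # Simulated run-time observation: foo's function pointer targets bar.
    resolved.append(func)
    return {"bar"}


graph = CallGraph({"main": ["foo"]}, {"foo"})
print(graph.callees("main", observe))  # static edge only, nothing resolved
print(graph.callees("foo", observe))   # dynamic site resolved now
print(graph.callees("foo", observe))   # already resolved, no second lookup
```

Resolving on demand means call sites in functions the search never reaches are never instrumented.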

16
New Performance Consultant Enhancements
  • Hybrid search strategy
      • Stack sampling to find functions close to
        actual bottlenecks (deep starters)
      • Call graph search to cover the rest of the search space
  • Bi-directional search: refine searches both
    upward and downward in the call graph
  • Stack sampling data
      • Comes from current Paradyn stack walks
      • Collected each time Paradyn tries to insert or
        remove instrumentation

17
Enhancement Benefits
  • Finds bottlenecks hidden by a strict call graph
    search

[Figure: call graph with nodes A, B, C, D, and E.]

  • Finds bottlenecks more quickly and efficiently

18
Choosing Deep Starters
  • Function counts are kept in the call graph

[Figure: call graph with nodes A, B, C, D, E, F, and G.]

  • For each subgraph whose node counts are above
    the threshold, choose the node furthest from the
    root of the call graph as a deep starter
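
A sketch of the selection rule, under two assumptions that the slide does not spell out: a "subgraph" is taken to be a connected group of functions whose counts exceed the threshold, and "furthest from the root" is measured in call graph edges. The edges, counts, and function names are hypothetical.

```python
from collections import deque


def depths_from_root(call_graph, root):
    """Breadth-first distance, in edges, of every reachable function from root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        func = queue.popleft()
        for callee in call_graph.get(func, ()):
            if callee not in depth:
                depth[callee] = depth[func] + 1
                queue.append(callee)
    return depth


def choose_deep_starters(call_graph, counts, threshold, root):
    """Pick the deepest node of each connected group of above-threshold nodes."""
    depth = depths_from_root(call_graph, root)
    hot = {f for f in depth if counts.get(f, 0) > threshold}
    # Undirected adjacency restricted to hot nodes.
    neighbors = {f: set() for f in hot}
    for caller, callees in call_graph.items():
        for callee in callees:
            if caller in hot and callee in hot:
                neighbors[caller].add(callee)
                neighbors[callee].add(caller)
    starters, seen = [], set()
    for start in sorted(hot):
        if start in seen:
            continue
        seen.add(start)
        component, pending = [], [start]
        while pending:
            node = pending.pop()
            component.append(node)
            for other in neighbors[node] - seen:
                seen.add(other)
                pending.append(other)
        starters.append(max(component, key=lambda n: (depth[n], n)))
    return sorted(starters)


# Hypothetical edges and sample counts for nodes A..G.
call_graph = {"A": ["B", "C"], "B": ["D"], "C": ["E", "F"], "F": ["G"]}
counts = {"A": 2, "B": 30, "C": 3, "D": 45, "E": 1, "F": 25, "G": 40}
deep_starters = choose_deep_starters(call_graph, counts, threshold=20, root="A")
print(deep_starters)
```

Here the above-threshold nodes form two groups, {B, D} and {F, G}, and the deepest node of each is chosen.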

19
Search Algorithms
  • Deep Start
      • When starting the search in the call graph:
          • add deep starters at high priority
          • add the root of the hierarchy at normal priority
      • Has the benefits of a deep-start search but,
        because the root is still searched, won't miss
        bottlenecks that sampling overlooks due to its
        statistical nature
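
A sketch of the Deep Start ordering using a priority queue: deep starters are queued at high priority and the root at normal priority. It assumes callees inherit the priority of the node that exposed them, which the slide does not state; the edges, metric values, and function name are hypothetical.

```python
import heapq
from itertools import count

HIGH, NORMAL = 0, 1  # smaller value is served first


def deep_start_search(call_graph, inclusive_time, threshold, deep_starters, root="main"):
    """Test deep starters first, then fall back to the usual search from the root."""
    order = count()  # tie-breaker: keeps insertion order within one priority
    heap = [(HIGH, next(order), func) for func in deep_starters]
    heap.append((NORMAL, next(order), root))
    heapq.heapify(heap)
    tested, bottlenecks = [], []
    while heap:
        priority, _, func = heapq.heappop(heap)
        if func in tested:
            continue  # already reached by another route
        tested.append(func)
        if inclusive_time.get(func, 0.0) > threshold:
            bottlenecks.append(func)
            for callee in call_graph.get(func, ()):
                heapq.heappush(heap, (priority, next(order), callee))
    return tested, bottlenecks


# Hypothetical edges and inclusive CPU fractions.
call_graph = {"main": ["time_step"], "time_step": ["density", "factor"]}
inclusive_time = {"main": 0.95, "time_step": 0.9, "density": 0.1, "factor": 0.7}
tested, bottlenecks = deep_start_search(
    call_graph, inclusive_time, threshold=0.2, deep_starters=["factor"]
)
print(tested)
print(bottlenecks)
```

The deep starter factor is confirmed first, and the search from main still covers the rest of the graph in case sampling missed something.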

20
Deep Start Example
  • Call graph

[Figure: call graph under the CPUBound hypothesis with the functions main, time_step, density, and factor.]
21
Deep Start Example
  • Deep start

[Figure: search under the CPUBound hypothesis showing main and time_step.]
22
Deep Start Example
  • Deep start

[Figure: search under the CPUBound hypothesis showing main, factor, and time_step.]
23
Deep Start Example
  • Deep start

[Figure: search under the CPUBound hypothesis showing main, factor, time_step, density, and a second occurrence of factor.]
24
Search Algorithms (cont.)
  • Bi-directional Search
      • Extends Deep Start to search upward in the call graph
      • Add upward nodes from deep starters at medium
        priority
      • Prune the upward search based on residual metric
        values
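
A sketch of how the search queue could be seeded for the bi-directional search, with callers of the deep starters added at medium priority. The residual computation is an assumption: the slides do not define "residual", so it is taken here to be a caller's inclusive value minus the inclusive value of the callee already explained. All names, edges, and numbers are hypothetical.

```python
HIGH, MEDIUM, NORMAL = 0, 1, 2  # smaller value is served first


def callers_of(call_graph, func):
    """Functions with a call graph edge into func (the upward nodes)."""
    return sorted(c for c, callees in call_graph.items() if func in callees)


def seed_queue(call_graph, deep_starters, root="main"):
    """Initial (priority, function) entries for the bi-directional search."""
    entries = {(HIGH, func) for func in deep_starters}
    for func in deep_starters:
        for caller in callers_of(call_graph, func):
            if caller not in deep_starters:
                entries.add((MEDIUM, caller))
    entries.add((NORMAL, root))
    return sorted(entries)


def residual(inclusive_time, caller, explained_callee):
    """Assumed definition: caller's inclusive value not explained by the callee."""
    return inclusive_time[caller] - inclusive_time[explained_callee]


def prune_upward(inclusive_time, caller, explained_callee, threshold):
    """Stop the upward search when the unexplained remainder is too small."""
    return residual(inclusive_time, caller, explained_callee) <= threshold


# Hypothetical edges and inclusive CPU fractions.
call_graph = {"main": ["time_step"], "time_step": ["density", "factor"]}
inclusive_time = {"main": 1.0, "time_step": 0.75, "density": 0.125, "factor": 0.5}
queue = seed_queue(call_graph, ["factor"])
print(queue)
print(residual(inclusive_time, "time_step", "factor"))
```

With these numbers, time_step has a residual of 0.25 beyond factor, so whether the upward search continues depends on the threshold chosen.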

25
Upward Pruning
[Figure: upward pruning example with the functions density and factor.]
26
Preliminary Results
  • 8-node Pentium-III laptop cluster

27
Conclusion
  • Call graph based search strategy has been highly
    effective.
  • As programs scale in size, there are many
    opportunities for improvement.
  • New version of PC available in Paradyn 3.2 (next
    week!)
  • Lots of ongoing experiments.
  • http://www.cs.wisc.edu/paradyn

28
Performance Results
29
Retroactive Instrumentation
  • Problem: find the CPU time for a function when we
    are already executing in one of its children.
  • The function's entry has already passed, so when
    do we start its timer?
  • Need a mechanism to trigger the instrumentation code.
  • Retroactive instrumentation walks the stack,
    triggering outstanding timers
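
A sketch of the idea, assuming timers are simple start/stop accumulators and the current call stack is available as a list of function names. The class InclusiveTimer and the function insert_timers are hypothetical, and real timers would be driven by inserted instrumentation rather than explicit calls.

```python
class InclusiveTimer:
    """Accumulates time between start and stop events."""

    def __init__(self):
        self.started_at = None
        self.total = 0.0

    def start(self, now):
        if self.started_at is None:
            self.started_at = now

    def stop(self, now):
        if self.started_at is not None:
            self.total += now - self.started_at
            self.started_at = None


def insert_timers(functions, current_stack, now):
    """Create timers and retroactively start those whose function is on the stack."""
    timers = {func: InclusiveTimer() for func in functions}
    for frame in current_stack:  # the stack walk
        if frame in timers:
            # The entry instrumentation was missed, so trigger the timer now.
            timers[frame].start(now)
    return timers


# The application is executing in factor, called from time_step, when the
# timers for time_step and density are inserted at time 10.0.
stack = ["main", "time_step", "factor"]
timers = insert_timers(["time_step", "density"], stack, now=10.0)
timers["time_step"].stop(now=14.0)  # time_step returns at time 14.0
print(timers["time_step"].total)
print(timers["density"].started_at)
```

Without the stack walk, the timer for time_step would not start until its next entry, and the time spent in the current invocation would be lost.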

30
Dynamic Call Sites
  • Characterized by keeping the address of a callee
    in a register or memory location
  • New type of instrumentation necessary to
    determine callee
  • Examples

31
Dynamic Call Site Instrumentation
[Figure: application code in which main() sets fp = bar, foo() calls (*fp)(), and bar() is the callee. Components shown: Code Generator, Performance Consultant, Notifier, Paradyn Front-End, Paradyn Daemon, Application.]
32
Dynamic Call Site Instrumentation
1. PC requests instrumentation of the call sites of foo.
[Figure: the application code (main, foo, bar) and the Paradyn components, as on slide 31.]
33
Dynamic Call Site Instrumentation
2. Daemon instruments the call sites of foo.
[Figure: the application code (main, foo, bar) and the Paradyn components, as on slide 31.]
34
Dynamic Call Site Instrumentation
A. Application executes the call site and notifies the daemon.
[Figure: the application code (main, foo, bar) and the Paradyn components, as on slide 31.]
35
Dynamic Call Site Instrumentation
B. Daemon notifies the PC that foo called bar.
[Figure: the application code (main, foo, bar) and the Paradyn components, as on slide 31, with steps 1, 2, and A marked.]
36
Dynamic Call Site Instrumentation
C. PC requests the inclusive-time metric for bar.
[Figure: the application code (main, foo, bar) and the Paradyn components, as on slide 31, with steps 1, 2, A, and B marked.]
37
Dynamic Call Site Instrumentation
D. Daemon instruments bar.
[Figure: the application code (main, foo, bar) and the Paradyn components, as on slide 31, with steps 1, 2, A, B, and C marked.]
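
The message flow in steps 1, 2, and A through D can be sketched as two cooperating objects. This is a simulation under stated assumptions, not Paradyn code: the classes, their method names, and the dictionary that stands in for what the application does at run time are all hypothetical.

```python
class Daemon:
    """Stands in for the Paradyn daemon attached to the application."""

    def __init__(self, dynamic_calls):
        # caller -> callee reached through its dynamic call site (simulated).
        self.dynamic_calls = dynamic_calls
        self.watched_call_sites = set()
        self.timed_functions = set()

    def instrument_call_sites(self, func):       # step 2
        self.watched_call_sites.add(func)

    def instrument_inclusive_time(self, func):   # step D
        self.timed_functions.add(func)

    def application_executes(self, caller, pc):  # step A
        if caller in self.watched_call_sites and caller in self.dynamic_calls:
            pc.on_callee_discovered(caller, self.dynamic_calls[caller])  # step B


class PerformanceConsultant:
    """Stands in for the PC in the Paradyn front-end."""

    def __init__(self, daemon):
        self.daemon = daemon
        self.call_graph = {}

    def refine(self, func):                       # step 1
        self.daemon.instrument_call_sites(func)

    def on_callee_discovered(self, caller, callee):
        self.call_graph.setdefault(caller, set()).add(callee)
        self.daemon.instrument_inclusive_time(callee)  # step C


daemon = Daemon({"foo": "bar"})  # foo calls bar through a function pointer
pc = PerformanceConsultant(daemon)
pc.refine("foo")                         # steps 1 and 2
daemon.application_executes("foo", pc)   # steps A through D
print(pc.call_graph)
print(daemon.timed_functions)
```

Until foo's call site executes, the PC has no edge from foo to bar, which is the blind spot the dynamic call graph is meant to remove.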