Title: Targeted Path Profiling: Lower Overhead Path Profiling for Staged Dynamic Optimization Systems
1 Targeted Path Profiling: Lower Overhead Path Profiling for Staged Dynamic Optimization Systems
- Rahul Joshi, UIUC
- Michael Bond, UT Austin
- Craig Zilles, UIUC
2 Path information is useful
- Enlarges scope of optimizations
  - Superblock formation
  - Hyperblock formation
- Improves other optimizations
  - Code scheduling and register allocation
  - Dataflow analysis
  - Software pipelining
  - Code layout
  - Static branch prediction
6 Overhead vs. accuracy
[Figure: overhead vs. accuracy for edge profiling (SPEC 95 INT), Ball-Larus path profiling (SPEC 2000 INT), and targeted path profiling (SPEC 2000 INT); label: "Profile-guided profiling"]
7 Outline
- Background
  - Staged dynamic optimization and profile-guided profiling
  - Ball-Larus path profiling
  - Opportunities for reducing overhead
- Targeted path profiling
- Results
  - Overhead and accuracy
13 Staged dynamic optimization
- Stage 0: static optimizations
  - Hardware edge profiler collects an edge profile
- Stage 1: local optimizations (code layout), guided by the edge profile
  - Path profiling instrumentation collects a path profile
- Stage 2: global optimizations (superblock formation), guided by the path profile
14 Profile-guided profiling
- Stage 0: static optimizations; hardware edge profiler collects an edge profile
- Stage 1: local optimizations (code layout) and path profiling instrumentation, both guided by the edge profile
- Stage 2: global optimizations (superblock formation), guided by the path profile
15 Ball-Larus path profiling
- Acyclic, intraprocedural paths
- Handles cyclic CFGs
  - Paths end at loop back edges
- Each path computes a unique integer
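The unique-integer numbering can be sketched in a few lines of Python. This is a minimal sketch assuming an acyclic CFG with one entry and one exit, stored as a dict of successor lists (a hypothetical representation). Visiting successors in the listed order, it reproduces increments of 2 and 1 for the 4-path A to G example.

```python
# Minimal sketch of Ball-Larus path numbering, assuming an acyclic CFG with one
# entry and one exit, stored as a dict of successor lists (hypothetical format).
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def postorder(cfg: Dict[str, List[str]], entry: str) -> List[str]:
    """DFS post-order: in a DAG every block appears after all its successors."""
    seen: set = set()
    order: List[str] = []

    def dfs(v: str) -> None:
        seen.add(v)
        for w in cfg[v]:
            if w not in seen:
                dfs(w)
        order.append(v)

    dfs(entry)
    return order


def number_paths(cfg: Dict[str, List[str]], entry: str):
    """Return (num_paths, val): summing val over the edges of any entry-to-exit
    path gives a distinct integer in [0, num_paths[entry])."""
    num_paths: Dict[str, int] = {}
    val: Dict[Edge, int] = {}
    for v in postorder(cfg, entry):      # successors are numbered first
        if not cfg[v]:                   # exit block: exactly one path
            num_paths[v] = 1
            continue
        total = 0
        for w in cfg[v]:
            val[(v, w)] = total          # skip past paths through earlier successors
            total += num_paths[w]
        num_paths[v] = total
    return num_paths, val


# The 4-path example: A -> {B, C} -> D -> {E, F} -> G.
CFG = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E", "F"],
       "E": ["G"], "F": ["G"], "G": []}
num_paths, val = number_paths(CFG, "A")
print(num_paths["A"])                               # 4
print(sorted((e, v) for e, v in val.items() if v))  # [(('A', 'C'), 2), (('D', 'F'), 1)]
```

Processing blocks in post-order guarantees that each successor's path count is known before its predecessor is numbered.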
21 Ball-Larus path profiling
- 4 paths
- Each path computes a unique integer
  - Paths 0, 1, 2, and 3
[Figure: CFG with blocks A to G: A branches to B or C, which join at D; D branches to E or F, which join at G; the A→C edge carries increment 2 and the D→F edge carries increment 1]
22 Ball-Larus path profiling
- r: path register
- count: array of path frequencies
[Figure: the A to G CFG instrumented with r = 0 at A, r += 2 on the A→C edge, r += 1 on the D→F edge, and count[r]++ at G]
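The path register and count array can be simulated to show how each execution lands in exactly one counter. This is a minimal Python sketch that assumes the increments sit on A→C (2) and D→F (1) as the figure labels suggest, and that encodes each execution as a string of single-letter block names; real instrumentation inserts these updates as machine instructions on CFG edges.

```python
# Minimal sketch of Ball-Larus runtime instrumentation for the 4-path example.
# Increments follow the slide: A->C adds 2, D->F adds 1; other edges add 0.
from typing import Dict, List, Tuple

INCREMENTS: Dict[Tuple[str, str], int] = {("A", "C"): 2, ("D", "F"): 1}
NUM_PATHS = 4


def run_instrumented(traces: List[str]) -> List[int]:
    """Each trace is the block sequence of one entry-to-exit execution."""
    count = [0] * NUM_PATHS                # count: one counter per path number
    for blocks in traces:
        r = 0                              # r = 0 at entry block A
        for edge in zip(blocks, blocks[1:]):
            r += INCREMENTS.get(edge, 0)   # r += increment on each taken edge
        count[r] += 1                      # count[r]++ at exit block G
    return count


print(run_instrumented(["ABDEG", "ABDFG", "ACDFG", "ACDFG"]))  # [1, 1, 0, 2]
```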
24 Overhead in Ball-Larus path profiling
- Opportunities for reducing overhead?
  - When there are many paths
  - When the edge profile gives a perfect path profile
25 Routines with many paths
- Many possible paths
  - Exponential in number of edges
  - Can't use an array of counters
- Number of taken paths is small
  - Ball-Larus uses a hash table
  - Hash function call is expensive
  - Hashed path: about 5 times the overhead of a non-hashed path
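The trade-off between the two counting schemes can be sketched as a counter that switches representation on the number of possible paths. This is a minimal Python sketch; `PathCounter` and the cut-off `MAX_ARRAY_PATHS` are hypothetical, and a Python dict only stands in for the hash table used by the instrumentation.

```python
# Minimal sketch of the two ways to count paths. MAX_ARRAY_PATHS is a
# hypothetical cut-off; the talk gives no specific value.
from collections import defaultdict
from typing import DefaultDict, List

MAX_ARRAY_PATHS = 4096


class PathCounter:
    """Dense array when the routine has few possible paths, hash table otherwise."""

    def __init__(self, num_paths: int) -> None:
        self.hashed = num_paths > MAX_ARRAY_PATHS
        # One slot per *possible* path: infeasible when paths are exponential.
        self.array: List[int] = [] if self.hashed else [0] * num_paths
        # One entry per *taken* path: small, but every update pays for hashing.
        self.table: DefaultDict[int, int] = defaultdict(int)

    def record(self, r: int) -> None:
        if self.hashed:
            self.table[r] += 1
        else:
            self.array[r] += 1

    def frequency(self, r: int) -> int:
        return self.table.get(r, 0) if self.hashed else self.array[r]


small = PathCounter(4)         # e.g. the 4-path example
big = PathCounter(2 ** 20)     # e.g. 20 if-else branches in sequence
big.record(123_456)
print(small.hashed, big.hashed, big.frequency(123_456))  # False True 1
```

Twenty sequential two-way branches already give 2^20 possible paths, which is why the array is replaced by a table even though few of those paths are taken.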
28 Edge profile gives perfect path profile
- An obvious path contains an edge that is only on that path
  - Path is uniquely identified by that edge
  - Path freq = edge freq
- If all paths are obvious, the edge profile gives a perfect path profile
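The obvious-path test can be sketched by brute force: enumerate the acyclic paths, find an edge that lies on only one path, and read that path's frequency from the edge. This is a minimal Python sketch assuming single-letter block names and successor lists (a hypothetical format); the talk does not say how obvious paths are identified, and explicit enumeration is exponential, so this only illustrates the definition.

```python
# Minimal sketch: decide whether a routine is "obvious" and, if so, read its
# path profile straight from the edge profile. Assumes an acyclic CFG given as
# successor lists with single-letter block names (hypothetical format).
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]


def enumerate_paths(cfg: Dict[str, List[str]], v: str) -> List[str]:
    """All paths from v to an exit block (exponential; fine for a sketch)."""
    if not cfg[v]:
        return [v]
    return [v + rest for w in cfg[v] for rest in enumerate_paths(cfg, w)]


def path_profile_from_edges(cfg: Dict[str, List[str]], entry: str,
                            edge_freq: Dict[Edge, int]) -> Optional[Dict[str, int]]:
    paths = enumerate_paths(cfg, entry)
    on_paths: Dict[Edge, int] = {}           # number of paths containing each edge
    for p in paths:
        for e in zip(p, p[1:]):
            on_paths[e] = on_paths.get(e, 0) + 1
    profile: Dict[str, int] = {}
    for p in paths:
        unique = [e for e in zip(p, p[1:]) if on_paths[e] == 1]
        if not unique:
            return None                      # p is not obvious: instrument the routine
        profile[p] = edge_freq[unique[0]]    # path freq = freq of its unique edge
    return profile


DIAMOND = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
freqs = {("A", "B"): 30, ("A", "C"): 70, ("B", "D"): 30, ("C", "D"): 70}
print(path_profile_from_edges(DIAMOND, "A", freqs))   # {'ABD': 30, 'ACD': 70}

TWO_DIAMONDS = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E", "F"],
                "E": ["G"], "F": ["G"], "G": []}
print(path_profile_from_edges(TWO_DIAMONDS, "A", {}))  # None: every edge is on 2 paths
```

A single if-else diamond is an obvious routine, while the 4-path example is not, because each of its edges lies on two paths.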
30 Targeted path profiling
- Profile-guided profiling
  - Use existing edge profile
- Exploits opportunities for reducing overhead
  - When there are many paths
    - Remove cold edges
  - When the edge profile gives a perfect path profile
    - Don't instrument obvious routines and loops
33 Removing cold edges
- Examine the relative execution frequency of each branch
  - if (relFreq < threshold): edge is cold
[Figure: example CFG whose branch edges are labeled with relative frequencies (%): 40/60, 3/97, 100/0, 50/50]
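The relFreq test can be sketched directly on an edge profile. This is a minimal Python sketch: the threshold value 0.05 and the block names B1 to B4, Li, Ri are hypothetical (the talk does not state its threshold), and the frequencies are the branch splits from the example figure.

```python
# Minimal sketch of cold-edge detection from an edge profile. THRESHOLD is a
# hypothetical value; the talk does not state the threshold it uses.
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]
THRESHOLD = 0.05


def cold_edges(edge_freq: Dict[Edge, int], threshold: float = THRESHOLD) -> Set[Edge]:
    out_total: Dict[str, int] = {}        # executions of each branch's source block
    for (src, _dst), freq in edge_freq.items():
        out_total[src] = out_total.get(src, 0) + freq
    cold: Set[Edge] = set()
    for (src, dst), freq in edge_freq.items():
        total = out_total[src]
        rel_freq = freq / total if total else 0.0   # never-executed branch: all edges cold
        if rel_freq < threshold:
            cold.add((src, dst))
    return cold


# Branch splits (%) from the example: 40/60, 3/97, 100/0, 50/50.
profile = {("B1", "L1"): 40, ("B1", "R1"): 60, ("B2", "L2"): 3, ("B2", "R2"): 97,
           ("B3", "L3"): 100, ("B3", "R3"): 0, ("B4", "L4"): 50, ("B4", "R4"): 50}
print(sorted(cold_edges(profile)))  # [('B2', 'L2'), ('B3', 'R3')]
```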
36 Removing cold edges
- A path that contains a cold edge is a cold path
- Removing an edge may halve the number of paths
  - Number of paths: 16 → 4
- Goal: hashed → non-hashed
[Figure: the example CFG after removing the cold edges; remaining edge frequencies (%): 40/60, 97, 100, 50/50]
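The 16 → 4 reduction can be reproduced by counting acyclic paths with and without the cold edges. This is a minimal Python sketch on a hypothetical CFG: a chain of four if-else branches (B1 to B4) whose arms Li and Ri rejoin before the next branch, with the cold edges assumed to sit on two different branches.

```python
# Minimal sketch: counting acyclic paths before and after removing cold edges.
# The CFG is a hypothetical chain of four if-else branches (B1..B4), each with
# arms Li/Ri that rejoin before the next branch; X is the exit.
from functools import lru_cache
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str]


def chain_of_branches(n: int) -> Dict[str, List[str]]:
    cfg: Dict[str, List[str]] = {"X": []}
    for i in range(1, n + 1):
        join = f"B{i + 1}" if i < n else "X"
        cfg[f"B{i}"] = [f"L{i}", f"R{i}"]
        cfg[f"L{i}"] = [join]
        cfg[f"R{i}"] = [join]
    return cfg


def count_paths(cfg: Dict[str, List[str]], entry: str,
                removed: FrozenSet[Edge] = frozenset()) -> int:
    @lru_cache(maxsize=None)
    def n(v: str) -> int:
        if not cfg[v]:
            return 1
        return sum(n(w) for w in cfg[v] if (v, w) not in removed)
    return n(entry)


cfg = chain_of_branches(4)
print(count_paths(cfg, "B1"))                                          # 16
print(count_paths(cfg, "B1", frozenset({("B2", "L2"), ("B3", "R3")})))  # 4
```

Each removed edge turns a two-way branch into a straight line, halving the count; with only 4 remaining paths an array of 4 counters suffices, so no hash table is needed.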
41 Removing cold edges
- Remaining paths are potentially hot
  - 4 paths, numbered 0 to 3
- What if a cold edge is taken?
  - Cold edges poison the path
  - Instrumentation checks for a poisoned path
[Figure: pruned CFG instrumented with r = 0 at entry, r += 2 and r += 1 on two remaining edges, r = poison on each cold edge, and at exit: if (r is poisoned) cold_counter++; else count[r]++]
42 Checking for poison
if (r is poisoned) cold_counter++; else count[r]++;
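Poisoning can be sketched on a pruned four-branch chain (hypothetical blocks B1 to B4, arms Li and Ri, exit X, with B2→L2 and B3→R3 cold). The slides do not say how "poisoned" is encoded; this Python sketch assumes a large negative sentinel that stays negative after any later increments, so the exit check reduces to a single comparison `r < 0`.

```python
# Minimal sketch of cold-edge poisoning on the pruned four-branch chain.
# POISON is a hypothetical choice: a large negative sentinel that stays negative
# after any later increments, so "r is poisoned" becomes the test r < 0.
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]
NUM_PATHS = 4
POISON = -(1 << 30)
# Ball-Larus increments for the pruned CFG (B2->L2 and B3->R3 removed).
INCREMENTS: Dict[Edge, int] = {("B1", "R1"): 2, ("B4", "R4"): 1}
COLD: Set[Edge] = {("B2", "L2"), ("B3", "R3")}


def run(traces: List[str]) -> Tuple[List[int], int]:
    count = [0] * NUM_PATHS
    cold_counter = 0
    for trace in traces:
        blocks = trace.split()
        r = 0
        for edge in zip(blocks, blocks[1:]):
            if edge in COLD:
                r = POISON                   # r = poison on a cold edge
            else:
                r += INCREMENTS.get(edge, 0)
        if r < 0:                            # if (r is poisoned) cold_counter++
            cold_counter += 1
        else:                                # else count[r]++
            count[r] += 1
    return count, cold_counter


traces = ["B1 L1 B2 R2 B3 L3 B4 L4 X",   # hot path 0
          "B1 R1 B2 R2 B3 L3 B4 R4 X",   # hot path 3
          "B1 L1 B2 L2 B3 L3 B4 L4 X"]   # takes cold edge B2->L2
print(run(traces))  # ([1, 0, 0, 1], 1)
```

All cold paths share one counter, so their individual identities are lost; this is the accuracy cost of cold-path removal.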
43 Obvious routines
- All paths are obvious
- We don't instrument obvious routines
  - Edge profile gives perfect path profile
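44 Obvious loops
- Loop with obvious body
- Don't instrument obvious loops with high average trip counts
  - Edge profile yields high-accuracy path profile
  - Only paths that enter or exit the loop are ambiguous, and they are rare when trip counts are high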
46 Summary of our techniques
- Remove cold edges
  - Eliminates many cold paths
  - Count paths with an array (instead of a hash table)
- Don't instrument obvious routines and loops
  - Path profile derived from edge profile
48 Implementation
- Static profiling
  - PP: tool for path profiling
  - TPP: tool for targeted path profiling
  - Tools instrument native SPARC executables
- SPEC 95 (ref inputs)
- SPEC 2000 (ref inputs)
49 Results: SPEC 2000 INT
50 Where does the benefit come from?
- Cold path elimination alone: 60%
- Adding obvious path elimination: 40%
- Little benefit from obvious path elimination alone
51 Related work
- Dynamo [Bala et al. '00]
  - Successful online path-guided optimization
  - Bails out when there is no dominant path
- Instrumentation sampling [Arnold & Ryder '01]
  - Orthogonal to targeted path profiling
- Selective path profiling [Apiwattanapong & Harrold '02]
  - Useful when only a few paths are of interest
52 Summary
- Profile-guided profiling in a staged dynamic optimization system
- Two synergistic techniques
  - Remove cold paths
  - Don't instrument obvious routines and loops
- Reduces overhead by half (SPEC 95) to two-thirds (SPEC 2000)
- High accuracy (99%)
53 Backup slides (not part of the talk)
55 Future work
- Targeted path profiling in a staged dynamic optimization system
  - Jikes RVM
- Pseudo-obvious subgraphs
- Maintaining path profiles across program transformations
56 Staged dynamic optimization
- Stage 0: static optimizations
  - Edge profiler collects an edge profile
- Stage 1: local optimizations
  - Path profiling instrumentation collects a path profile
- Stage 2: global optimizations
57 Accuracy
- Our techniques lose path information
  - For removed cold paths (cold counter)
  - For paths that enter or exit disconnected loops
- Accuracy of targeted path profiling: 99%
- Accuracy of edge profiling: 80% on SPEC 95 (76% INT, 84% FP)
58 Why not edge profiling?
- Edge profile is a point profile
- Correlation between edge frequencies is ambiguous
[Figure: the A to G example CFG with a 50/50 split at both branches; different path profiles are consistent with these edge frequencies]
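The ambiguity is easy to demonstrate: several different path profiles for the A to G example collapse to the same edge profile. This is a minimal Python sketch assuming paths are written as strings of single-letter block names; the three path profiles are illustrative, not measured data.

```python
# Minimal sketch: different path profiles for the A-G example can produce the
# same edge profile. Paths are strings of single-letter block names.
from collections import Counter
from typing import Dict


def edge_profile(path_profile: Dict[str, int]) -> Counter:
    edges: Counter = Counter()
    for path, freq in path_profile.items():
        for edge in zip(path, path[1:]):    # consecutive blocks form an edge
            edges[edge] += freq
    return edges


correlated = {"ABDEG": 50, "ACDFG": 50}      # branch at D follows branch at A
anti = {"ABDFG": 50, "ACDEG": 50}            # branch at D opposes branch at A
uniform = {"ABDEG": 25, "ABDFG": 25, "ACDEG": 25, "ACDFG": 25}
print(edge_profile(correlated) == edge_profile(anti) == edge_profile(uniform))  # True
```

Every edge has frequency 50 in all three cases, so an optimizer given only the edge profile cannot tell which pair of paths is hot.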
61 Staged dynamic optimization
- Dynamic optimization system decides if profiling is likely to be beneficial
- Staged dynamic optimization system applies more powerful and expensive optimizations at each stage
66 Cyclic graphs
- 2 paths → 8 paths
- Acyclic paths
  - Start at A or B
  - End at E or F
- Paths enter and/or exit the loop body
[Figure: CFG with blocks A to F containing a loop over B to E; instrumentation sets r = 0 at A, performs count[r]++; r = 0 on the loop back edge, and count[r]++ at exit F]
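The jump from 2 to 8 paths follows from how Ball and Larus make a cyclic CFG acyclic: each back edge is replaced by a dummy edge from the entry to the loop header and a dummy edge from the back edge's source to the exit. This is a minimal Python sketch of that transformation; the CFG (entry A, loop header B, body through C or D, back edge E→B, exit F) is a hypothetical one chosen to match the slide's start and end blocks.

```python
# Minimal sketch of Ball-Larus handling of a loop: each back edge v->h is
# replaced by dummy edges ENTRY->h and v->EXIT, giving an acyclic multigraph.
# The CFG is a hypothetical one consistent with the slide: entry A, loop
# header B, body C or D, back edge E->B, exit F.
from functools import lru_cache
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source, target, kind)
CYCLIC: List[Edge] = [("A", "B", "real"), ("B", "C", "real"), ("B", "D", "real"),
                      ("C", "E", "real"), ("D", "E", "real"), ("E", "B", "back"),
                      ("E", "F", "real")]


def remove_back_edges(edges: List[Edge], entry: str, exit_: str) -> List[Edge]:
    dag: List[Edge] = []
    for src, dst, kind in edges:
        if kind == "back":
            dag.append((entry, dst, "dummy"))   # a path may start at the loop header
            dag.append((src, exit_, "dummy"))   # a path may end where the back edge leaves
        else:
            dag.append((src, dst, kind))
    return dag


def count_paths(edges: List[Edge], entry: str, exit_: str) -> int:
    succs: Dict[str, List[str]] = {}
    for src, dst, _kind in edges:
        succs.setdefault(src, []).append(dst)  # parallel edges are kept

    @lru_cache(maxsize=None)
    def n(v: str) -> int:
        return 1 if v == exit_ else sum(n(w) for w in succs.get(v, []))

    return n(entry)


no_loop = [e for e in CYCLIC if e[2] != "back"]
print(count_paths(no_loop, "A", "F"))                             # 2
print(count_paths(remove_back_edges(CYCLIC, "A", "F"), "A", "F"))  # 8
```

Each path now starts either at A (real edge) or at B (dummy edge, after a back edge), passes through C or D, and ends either at E (dummy edge, taking the back edge) or at F (real exit): 2 × 2 × 2 = 8.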