Partial Method Compilation using Dynamic Profile Information - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Partial Method Compilation using Dynamic Profile Information
John Whaley, Stanford University, October 17, 2001
2
Outline
  • Background and Overview
  • Dynamic Compilation System
  • Partial Method Compilation Technique
  • Optimizations
  • Experimental Results
  • Related Work
  • Conclusion

3
Dynamic Compilation
  • We want code performance comparable to static
    compilation techniques
  • However, we want to avoid long startup delays and
    slow responsiveness
  • Dynamic compiler should be fast AND good

4
Traditional approach
  • Interpreter plus optimizing compiler
  • Switch from interpreter to optimizing compiler
    via some heuristic
  • Problems
  • Interpreter is too slow! (10x to 100x slower than
    compiled code)

5
Another approach
  • Simple compiler plus optimizing compiler
    (Jalapeno, JUDO, Microsoft)
  • Switch from simple to optimizing compiler via
    some heuristic
  • Problems
  • Code from the simple compiler is still too slow!
    (30% to 100% slower than optimized code)
  • Memory footprint problems (Suganuma et al.,
    OOPSLA'01)

6
Yet another approach
  • Multi-level compilation (Jalapeno, HotSpot)
  • Use multiple compiled versions to slowly
    accelerate into optimized execution
  • Problems
  • This simply increases the delay before the
    program runs at full speed!

7
Problem with compilation
  • Compilation takes time proportional to the amount
    of code being compiled
  • Many optimizations are superlinear in the size of
    the code
  • Compilation of large amounts of code is the cause
    of undesirably long compilation times

8
Methods can be large
  • All of these techniques operate at method
    boundaries
  • Methods can be large, especially after inlining
  • Cutting back inlining too much hurts performance
    considerably (Arnold et al., Dynamo'00)
  • Even when being frugal about inlining, methods
    can still become very large

9
Methods are poor boundaries
  • Method boundaries do not correspond very well to
    the code that would most benefit from
    optimization
  • Even hot methods typically contain some code
    that is rarely or never executed

10
Example: SpecJVM db

  void read_db(String fn) {
    int n = 0, act = 0, b;
    byte buffer[] = null;
    try {
      FileInputStream sif = new FileInputStream(fn);
      n = sif.available();
      buffer = new byte[n];
      while ((b = sif.read(buffer, act, n - act)) > 0)  // hot loop
        act = act + b;
      sif.close();
      if (act != n) {
        /* lots of error handling code, rare */
      }
    } catch (IOException ioe) {
      /* lots of error handling code, rare */
    }
  }

11
Example: SpecJVM db

  void read_db(String fn) {
    int n = 0, act = 0, b;
    byte buffer[] = null;
    try {
      FileInputStream sif = new FileInputStream(fn);
      n = sif.available();
      buffer = new byte[n];
      while ((b = sif.read(buffer, act, n - act)) > 0)
        act = act + b;
      sif.close();
      if (act != n) {
        /* lots of error handling code, rare */
      }
    } catch (IOException ioe) {
      /* lots of error handling code, rare */
    }
  }

Lots of rare code!
12
Hot regions, not methods
  • The regions that are important to compile have
    nothing to do with the method boundaries
  • Using a method granularity causes the compiler to
    waste time optimizing large pieces of code that
    do not matter

13
Overview of our technique
  • Increase the precision of selective compilation
    to operate at a sub-method granularity
  • Collect basic block level profile data for hot
    methods
  • Recompile using the profile data, replacing rare
    code entry points with branches into the
    interpreter

14
Overview of our technique
  • Takes advantage of the well-known fact that a
    large amount of code is rarely or never executed
  • Simple to understand and implement, yet highly
    effective
  • Beneficial secondary effect of improving
    optimization opportunities on the common paths

15
Overview of Dynamic Compilation System
16
  Stage 1: interpreted code
      | when execution count reaches t1
      v
  Stage 2: compiled code
      | when execution count reaches t2
      v
  Stage 3: fully optimized code
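
Below is a minimal sketch of the promotion logic this staged design implies. The class and method names (StagedDispatcher, onExecute) are illustrative, not the real system's API; the thresholds are the ones reported on the methodology slide later in the deck.

  import java.util.HashMap;
  import java.util.Map;

  // Hypothetical sketch: count executions per method and promote through
  // the three stages once the counts cross the thresholds t1 and t2.
  class StagedDispatcher {
      static final int T1 = 2000;    // Stage 1 -> Stage 2 threshold
      static final int T2 = 25000;   // Stage 2 -> Stage 3 threshold
      enum Stage { INTERPRETED, COMPILED, OPTIMIZED }

      private final Map<String, Integer> counts = new HashMap<>();
      private final Map<String, Stage> stages = new HashMap<>();

      // Invoked on each method entry (or loop back edge).
      Stage onExecute(String method) {
          int c = counts.merge(method, 1, Integer::sum);
          Stage s = stages.getOrDefault(method, Stage.INTERPRETED);
          if (s == Stage.INTERPRETED && c > T1) {
              s = Stage.COMPILED;    // quick compile, with profiling hooks
          } else if (s == Stage.COMPILED && c > T2) {
              s = Stage.OPTIMIZED;   // recompile, dropping rare blocks
          }
          stages.put(method, s);
          return s;
      }
  }
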
17
Identifying rare code
  • Simple technique: any basic block executed during
    Stage 2 is said to be hot
  • Effectively ignores initialization
  • Add instrumentation to the targets of conditional
    forward branches
  • Better techniques exist, but using this we saw no
    performance degradation
  • Enabling and disabling profiling is handled
    implicitly by the stage transitions
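
As a concrete illustration of the instrumentation just described, here is a hedged sketch; BlockProfile and its methods are hypothetical names, not the system's actual interface.

  // Stage 2 code calls touch() at the target of each conditional forward
  // branch; a block never touched by the time Stage 3 compilation starts
  // is treated as rare.
  class BlockProfile {
      private final boolean[] executed;    // one flag per basic block

      BlockProfile(int numBlocks) { executed = new boolean[numBlocks]; }

      // Instrumentation stub at a conditional forward-branch target.
      void touch(int blockId) { executed[blockId] = true; }

      // Blocks not executed during Stage 2 are considered rare.
      boolean isRare(int blockId) { return !executed[blockId]; }
  }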

18
Method-at-a-time strategy
[Chart: % of basic blocks compiled vs. execution threshold]
19
Actual basic blocks executed
[Chart: % of basic blocks actually executed vs. execution threshold]
20
Partial method compilation technique
21
Technique
  • Based on profile data, determine the set of rare
    blocks.
  • Use code coverage information from the first
    compiled version

22
Technique
  • Perform live variable analysis.
  • Determine the set of live variables at rare block
    entry points

live: {x, y, z}
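
A compact sketch of that liveness computation, assuming a simple CFG representation (the BasicBlock type here is hypothetical): live-in(B) = use(B) ∪ (live-out(B) - def(B)), iterated backward to a fixed point.

  import java.util.*;

  // Hypothetical CFG node: successors, plus defs and upward-exposed uses.
  class BasicBlock {
      List<BasicBlock> successors = new ArrayList<>();
      Set<String> defs = new HashSet<>(), uses = new HashSet<>();
  }

  class Liveness {
      // Backward dataflow to a fixed point; returns live-in for each block.
      static Map<BasicBlock, Set<String>> liveIn(List<BasicBlock> blocks) {
          Map<BasicBlock, Set<String>> in = new HashMap<>();
          for (BasicBlock b : blocks) in.put(b, new HashSet<>());
          boolean changed = true;
          while (changed) {
              changed = false;
              for (BasicBlock b : blocks) {
                  Set<String> live = new HashSet<>();
                  for (BasicBlock s : b.successors)   // live-out = union of
                      live.addAll(in.get(s));         // successors' live-in
                  live.removeAll(b.defs);             // kill definitions
                  live.addAll(b.uses);                // add upward-exposed uses
                  if (!live.equals(in.get(b))) { in.put(b, live); changed = true; }
              }
          }
          return in;   // e.g. {x, y, z} at a rare-block entry point
      }
  }
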
23
Technique
  • Redirect the control flow edges that targeted
    rare blocks, and remove the rare blocks.

[Diagram: edges into rare blocks redirected to an interpreter transfer point]
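
A sketch of that edge rewrite, in the same hypothetical flavor as the earlier sketches (Block and the rare-id set are illustrative):

  import java.util.*;

  class RareBlockPruner {
      static class Block {
          final int id;
          List<Block> successors = new ArrayList<>();
          Block(int id) { this.id = id; }
      }

      // 'rare' holds ids of blocks never executed during Stage 2.
      static void prune(List<Block> blocks, Set<Integer> rare) {
          Block toInterpreter = new Block(-1);   // interpreter transfer point
          for (Block b : blocks)
              for (int i = 0; i < b.successors.size(); i++)
                  if (rare.contains(b.successors.get(i).id))
                      b.successors.set(i, toInterpreter);  // redirect edge
          blocks.removeIf(b -> rare.contains(b.id));       // drop rare blocks
      }
  }
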
24
Technique
  • Perform compilation normally.
  • Analyses treat the interpreter transfer point as
    an unanalyzable method call.

25
Technique
  • Record a map for each interpreter transfer point.
  • In code generation, generate a map that specifies
    the location, in registers or memory, of each of
    the live variables.
  • Maps are typically < 100 bytes

live: {x, y, z}
  x: sp - 4
  y: R1
  z: sp - 8
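
One plausible shape for such a map, mirroring the example above (x at sp - 4, y in R1, z at sp - 8); every name here is a hypothetical stand-in, not the system's actual data structure.

  // Records, for each live variable at an interpreter transfer point,
  // where the compiled code keeps it: a register or a stack slot.
  class TransferMap {
      enum Kind { REGISTER, STACK_SLOT }

      static class Location {
          final String variable;
          final Kind kind;
          final int where;        // register number, or offset from sp
          Location(String v, Kind k, int w) { variable = v; kind = k; where = w; }
      }

      final int bytecodeIndex;    // where the interpreter should resume
      final Location[] locations; // the whole map is typically < 100 bytes

      TransferMap(int bci, Location... locs) {
          bytecodeIndex = bci;
          locations = locs;
      }
  }
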
26
Optimizations
27
Partial dead code elimination
  • Modified dead code elimination to treat rare
    blocks specially
  • Move computation that is only live on a rare path
    into the rare block, saving computation in the
    common case

28
Partial dead code elimination
  • Optimistic approach on SSA form
  • Mark all instructions that compute essential
    values, recursively
  • Eliminate all non-essential instructions

29
Partial dead code elimination
  • Calculate necessary code, ignoring all rare
    blocks
  • For each rare block, calculate the instructions
    that are necessary for that rare block, but not
    necessary in non-rare blocks
  • If these instructions are recomputable at the
    point of the rare block, they can be safely
    copied there
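
Before the worked example on the next two slides, here is a condensed sketch of the marking phase (the Instr type is hypothetical): essential values are marked transitively from side-effecting roots on the common path first, and whatever remains unmarked is either dead or needed only on a rare path.

  import java.util.*;

  class PartialDCE {
      static class Instr {
          boolean essential;
          boolean inRareBlock;               // lives in a rare block
          boolean sideEffecting;             // stores, calls, etc. are roots
          List<Instr> operands = new ArrayList<>();
      }

      static void mark(List<Instr> all) {
          Deque<Instr> work = new ArrayDeque<>();
          for (Instr i : all)                // roots: effects on common paths
              if (i.sideEffecting && !i.inRareBlock) work.push(i);
          while (!work.isEmpty()) {
              Instr i = work.pop();
              if (i.essential) continue;
              i.essential = true;
              work.addAll(i.operands);       // operands of essential code
          }
          // Unmarked instructions are either dead or needed only by some
          // rare block; the latter are copied into that block when their
          // operands are still available (recomputable) there.
      }
  }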

30
Partial dead code example
  x = 0;
  if (rare branch 1) {
    ...
    z = x + y;
    ...
  }
  if (rare branch 2) {
    ...
    a = x + z;
    ...
  }

31
Partial dead code example
  if (rare branch 1) {
    x = 0;
    ...
    z = x + y;
    ...
  }
  if (rare branch 2) {
    x = 0;
    ...
    a = x + z;
    ...
  }

32
Pointer and escape analysis
  • Treating an entrance to the rare path as a method
    call is a conservative assumption
  • Typically does not matter because there are no
    merges back into the common path
  • However, this conservativeness hurts pointer and
    escape analysis because a single unanalyzed call
    kills all information

33
Pointer and escape analysis
  • Stack allocate objects that don't escape in the
    common blocks
  • Eliminate synchronization on objects that don't
    escape the common blocks
  • If a branch to a rare block is taken:
  • Copy stack-allocated objects to the heap and
    update pointers
  • Reapply eliminated synchronizations
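
Conceptually, the fixup on entry to a rare path looks like the following; a real VM does this in the runtime, below the Java level, so every name here is hypothetical.

  // Copy stack-allocated objects to the heap, rewrite references to point
  // at the heap copies, and reapply any synchronization the optimizer
  // eliminated, before handing control to the rare code.
  class RarePathFixup {
      static void enterRarePath(StackObject[] stackObjects, Object[] elidedLocks) {
          for (StackObject so : stackObjects) {
              Object heapCopy = so.copyToHeap();    // allocate + copy contents
              so.rewriteReferencesTo(heapCopy);     // patch frames and maps
          }
          for (Object lock : elidedLocks)
              acquireMonitor(lock);                 // redo removed monitorenters
      }
      static void acquireMonitor(Object lock) { /* VM monitorenter */ }
  }

  interface StackObject {
      Object copyToHeap();
      void rewriteReferencesTo(Object heapCopy);
  }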

34
Copying from stack to heap
[Diagram: stack objects are copied to the heap and pointers are rewritten
to refer to the heap copies]
35
Reconstructing interpreter state
  • We use a runtime glue routine
  • Construct a set of interpreter stack frames,
    initialized with their corresponding method and
    bytecode pointers
  • Iterate through each location pair in the map,
    and copy the value at the location to its
    corresponding position in the interpreter stack
    frame
  • Branch into the interpreter, and continue
    execution
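
Putting the pieces together, a sketch of the glue routine, reusing the hypothetical TransferMap from the earlier sketch; MachineState, InterpreterFrame, and Interpreter are likewise illustrative stand-ins.

  class GlueRoutine {
      static void transferToInterpreter(TransferMap map, MachineState machine,
                                        Interpreter interp) {
          // One frame per method; inlined methods would get a frame each.
          InterpreterFrame frame = new InterpreterFrame(map.bytecodeIndex);
          for (TransferMap.Location loc : map.locations) {
              long value = machine.read(loc);        // register or stack slot
              frame.setLocal(loc.variable, value);   // matching interpreter slot
          }
          interp.resume(frame);   // continue at map.bytecodeIndex
      }
  }

  interface MachineState { long read(TransferMap.Location loc); }
  interface Interpreter  { void resume(InterpreterFrame frame); }

  class InterpreterFrame {
      final int bytecodeIndex;
      final java.util.Map<String, Long> locals = new java.util.HashMap<>();
      InterpreterFrame(int bci) { bytecodeIndex = bci; }
      void setLocal(String name, long value) { locals.put(name, value); }
  }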

36
Experimental Results
37
Experimental Methodology
  • Fully implemented in a proprietary system
  • Unfortunately, cannot publish those numbers!
  • Proof-of-concept implementation in the joeq
    virtual machine (http://joeq.sourceforge.net)
  • Unfortunately, joeq does not perform significant
    optimizations!

38
Experimental Methodology
  • Also implemented as an offline step, using
    refactored class files
  • Use offline profile information to split methods
    into hot and cold parts
  • We then rely on the virtual machine's default
    method-at-a-time strategy
  • Provides a reasonable approximation of the
    effectiveness of this technique
  • Can also be used as a standalone optimizer
  • Available under LGPL as part of joeq release
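
For illustration, here is roughly what such an offline split would do to the earlier read_db example; the helper names (handleShortRead, handleIOError) are hypothetical.

  import java.io.FileInputStream;
  import java.io.IOException;

  class SplitExample {
      // Hot part: only the common path remains, so the VM's method-at-a-time
      // compiler spends its time where it matters.
      void read_db(String fn) {
          int n = 0, act = 0, b;
          byte[] buffer = null;
          try {
              FileInputStream sif = new FileInputStream(fn);
              n = sif.available();
              buffer = new byte[n];
              while ((b = sif.read(buffer, act, n - act)) > 0)
                  act = act + b;
              sif.close();
              if (act != n)
                  handleShortRead(act, n);   // cold path moved out of line
          } catch (IOException ioe) {
              handleIOError(ioe);            // cold path moved out of line
          }
      }

      // Cold methods: rarely invoked, never worth optimizing.
      private void handleShortRead(int act, int n) { /* error handling */ }
      private void handleIOError(IOException ioe)  { /* error handling */ }
  }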

39
Experimental Methodology
  • IBM JDK 1.3 cx130-20010626 on RedHat Linux 7.1
  • Pentium III 600 MHz, 512 MB RAM
  • Thresholds: t1 = 2000, t2 = 25000
  • Benchmarks: SpecJVM, SwingSet, Linpack, JavaLex,
    JavaCup

40
Run time improvement
[Bar chart: first bar = original; second bar = PMC; third bar = PMC + my
opts. Blue portion = optimized execution.]
41
Related Work
  • Dynamic techniques
  • Dynamo (Bala et al., PLDI'00)
  • Self (Chambers et al., OOPSLA'91)
  • HotSpot (JVM'01)
  • IBM JDK (Ishizaki et al., OOPSLA'00)

42
Related Work
  • Static techniques
  • Trace scheduling (Fisher, 1981)
  • Superblock scheduling (IMPACT compiler)
  • Partial redundancy elimination with cost-benefit
    analysis (Horspool, 1997)
  • Optimal compilation unit shapes (Bruening,
    FDDO'00)
  • Profile-guided code placement strategies

43
Conclusion
  • Partial method compilation technique is simple to
    implement, yet very effective
  • Compile times reduced drastically
  • Overall run times improved by an average of 10%,
    and up to 32%
  • System is available under LGPL at
    http://joeq.sourceforge.net