Partial Method Compilation using Dynamic Profile Information - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Partial Method Compilation using Dynamic Profile Information
John Whaley, Stanford University, October 17, 2001
2
Outline
  • Background and Overview
  • Dynamic Compilation System
  • Partial Method Compilation Technique
  • Optimizations
  • Experimental Results
  • Related Work
  • Conclusion

3
Dynamic Compilation
  • We want code performance comparable to static
    compilation techniques
  • However, we want to avoid long startup delays and
    slow responsiveness
  • Dynamic compiler should be fast AND good

4
Traditional approach
  • Interpreter plus optimizing compiler
  • Switch from interpreter to optimizing compiler
    via some heuristic
  • Problems
  • Interpreter is too slow! (10x to 100x slower than
    compiled code)

5
Another approach
  • Simple compiler plus optimizing compiler
    (Jalapeno, JUDO, Microsoft)
  • Switch from simple to optimizing compiler via
    some heuristic
  • Problems
  • Code from the simple compiler is still too slow!
    (30% to 100% slower than optimized code)
  • Memory footprint problems (Suganuma et al.,
    OOPSLA'01)

6
Yet another approach
  • Multi-level compilation (Jalapeno, HotSpot)
  • Use multiple compiled versions to slowly
    accelerate into optimized execution
  • Problems
  • This simply increases the delay before the
    program runs at full speed!

7
Problem with compilation
  • Compilation takes time proportional to the amount
    of code being compiled
  • Many optimizations are superlinear in the size of
    the code
  • Compilation of large amounts of code is the cause
    of undesirably long compilation times

8
Methods can be large
  • All of these techniques operate at method
    boundaries
  • Methods can be large, especially after inlining
  • Cutting back inlining too much hurts performance
    considerably (Arnold et al., Dynamo'00)
  • Even when being frugal about inlining, methods
    can still become very large

9
Methods are poor boundaries
  • Method boundaries do not correspond very well to
    the code that would most benefit from
    optimization
  • Even hot methods typically contain some code
    that is rarely or never executed

10
Example: SpecJVM db

  void read_db(String fn) {
    int n = 0, act = 0, b;
    byte buffer[] = null;
    try {
      FileInputStream sif = new FileInputStream(fn);
      n = sif.available();
      buffer = new byte[n];
      while ((b = sif.read(buffer, act, n - act)) > 0)  // hot loop
        act = act + b;
      sif.close();
      if (act != n) {
        /* lots of error handling code, rare */
      }
    } catch (IOException ioe) {
      /* lots of error handling code, rare */
    }
  }

11
Example: SpecJVM db

  void read_db(String fn) {
    int n = 0, act = 0, b;
    byte buffer[] = null;
    try {
      FileInputStream sif = new FileInputStream(fn);
      n = sif.available();
      buffer = new byte[n];
      while ((b = sif.read(buffer, act, n - act)) > 0)
        act = act + b;
      sif.close();
      if (act != n) {
        /* lots of error handling code, rare */
      }
    } catch (IOException ioe) {
      /* lots of error handling code, rare */
    }
  }

Lots of rare code!
12
Hot regions, not methods
  • The regions that are important to compile have
    nothing to do with the method boundaries
  • Using a method granularity causes the compiler to
    waste time optimizing large pieces of code that
    do not matter

13
Overview of our technique
  • Increase the precision of selective compilation
    to operate at a sub-method granularity
  • Collect basic block level profile data for hot
    methods
  • Recompile using the profile data, replacing rare
    code entry points with branches into the
    interpreter

14
Overview of our technique
  • Takes advantage of the well-known fact that a
    large amount of code is rarely or never executed
  • Simple to understand and implement, yet highly
    effective
  • Beneficial secondary effect of improving
    optimization opportunities on the common paths

15
Overview of Dynamic Compilation System
16
  Stage 1: interpreted code
      | when execution count reaches t1
      v
  Stage 2: compiled code
      | when execution count reaches t2
      v
  Stage 3: fully optimized code
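
Below is a minimal sketch of the promotion logic this staged design implies. The class and method names (StagedDispatcher, onExecute) are illustrative, not the real system's API; the thresholds are the ones reported on the methodology slide later in the deck.

  import java.util.HashMap;
  import java.util.Map;

  // Hypothetical sketch: count executions per method and promote through
  // the three stages once the counts cross the thresholds t1 and t2.
  class StagedDispatcher {
      static final int T1 = 2000;    // Stage 1 -> Stage 2 threshold
      static final int T2 = 25000;   // Stage 2 -> Stage 3 threshold
      enum Stage { INTERPRETED, COMPILED, OPTIMIZED }

      private final Map<String, Integer> counts = new HashMap<>();
      private final Map<String, Stage> stages = new HashMap<>();

      // Invoked on each method entry (or loop back edge).
      Stage onExecute(String method) {
          int c = counts.merge(method, 1, Integer::sum);
          Stage s = stages.getOrDefault(method, Stage.INTERPRETED);
          if (s == Stage.INTERPRETED && c > T1) {
              s = Stage.COMPILED;    // quick compile, with profiling hooks
          } else if (s == Stage.COMPILED && c > T2) {
              s = Stage.OPTIMIZED;   // recompile, dropping rare blocks
          }
          stages.put(method, s);
          return s;
      }
  }
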
17
Identifying rare code
  • Simple technique: any basic block executed during
    Stage 2 is said to be hot
  • Effectively ignores initialization
  • Add instrumentation to the targets of conditional
    forward branches
  • Better techniques exist, but using this we saw no
    performance degradation
  • Enabling and disabling profiling is handled
    implicitly by the stage transitions
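
As a concrete illustration of the instrumentation just described, here is a hedged sketch; BlockProfile and its methods are hypothetical names, not the system's actual interface.

  // Stage 2 code calls touch() at the target of each conditional forward
  // branch; a block never touched by the time Stage 3 compilation starts
  // is treated as rare.
  class BlockProfile {
      private final boolean[] executed;    // one flag per basic block

      BlockProfile(int numBlocks) { executed = new boolean[numBlocks]; }

      // Instrumentation stub at a conditional forward-branch target.
      void touch(int blockId) { executed[blockId] = true; }

      // Blocks not executed during Stage 2 are considered rare.
      boolean isRare(int blockId) { return !executed[blockId]; }
  }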

18
Method-at-a-time strategy
[Chart: % of basic blocks compiled vs. execution threshold]
19
Actual basic blocks executed
[Chart: % of basic blocks actually executed vs. execution threshold]
20
Partial method compilation technique
21
Technique
  • Based on profile data, determine the set of rare
    blocks.
  • Use code coverage information from the first
    compiled version

22
Technique
  • Perform live variable analysis.
  • Determine the set of live variables at rare block
    entry points

live: {x, y, z}
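
A compact sketch of that liveness computation, assuming a simple CFG representation (the BasicBlock type here is hypothetical): live-in(B) = use(B) ∪ (live-out(B) - def(B)), iterated backward to a fixed point.

  import java.util.*;

  // Hypothetical CFG node: successors, plus defs and upward-exposed uses.
  class BasicBlock {
      List<BasicBlock> successors = new ArrayList<>();
      Set<String> defs = new HashSet<>(), uses = new HashSet<>();
  }

  class Liveness {
      // Backward dataflow to a fixed point; returns live-in for each block.
      static Map<BasicBlock, Set<String>> liveIn(List<BasicBlock> blocks) {
          Map<BasicBlock, Set<String>> in = new HashMap<>();
          for (BasicBlock b : blocks) in.put(b, new HashSet<>());
          boolean changed = true;
          while (changed) {
              changed = false;
              for (BasicBlock b : blocks) {
                  Set<String> live = new HashSet<>();
                  for (BasicBlock s : b.successors)   // live-out = union of
                      live.addAll(in.get(s));         // successors' live-in
                  live.removeAll(b.defs);             // kill definitions
                  live.addAll(b.uses);                // add upward-exposed uses
                  if (!live.equals(in.get(b))) { in.put(b, live); changed = true; }
              }
          }
          return in;   // e.g. {x, y, z} at a rare-block entry point
      }
  }
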
23
Technique
  • Redirect the control flow edges that targeted
    rare blocks, and remove the rare blocks.

[Diagram: edges into rare blocks redirected to an interpreter transfer point]
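
A sketch of that edge rewrite, in the same hypothetical flavor as the earlier sketches (Block and the rare-id set are illustrative):

  import java.util.*;

  class RareBlockPruner {
      static class Block {
          final int id;
          List<Block> successors = new ArrayList<>();
          Block(int id) { this.id = id; }
      }

      // 'rare' holds ids of blocks never executed during Stage 2.
      static void prune(List<Block> blocks, Set<Integer> rare) {
          Block toInterpreter = new Block(-1);   // interpreter transfer point
          for (Block b : blocks)
              for (int i = 0; i < b.successors.size(); i++)
                  if (rare.contains(b.successors.get(i).id))
                      b.successors.set(i, toInterpreter);  // redirect edge
          blocks.removeIf(b -> rare.contains(b.id));       // drop rare blocks
      }
  }
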
24
Technique
  • Perform compilation normally.
  • Analyses treat the interpreter transfer point as
    an unanalyzable method call.

25
Technique
  • Record a map for each interpreter transfer point.
  • In code generation, generate a map that specifies
    the location, in registers or memory, of each of
    the live variables.
  • Maps are typically < 100 bytes

live: {x, y, z}
  x: sp - 4
  y: R1
  z: sp - 8
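
One plausible shape for such a map, mirroring the example above (x at sp - 4, y in R1, z at sp - 8); every name here is a hypothetical stand-in, not the system's actual data structure.

  // Records, for each live variable at an interpreter transfer point,
  // where the compiled code keeps it: a register or a stack slot.
  class TransferMap {
      enum Kind { REGISTER, STACK_SLOT }

      static class Location {
          final String variable;
          final Kind kind;
          final int where;        // register number, or offset from sp
          Location(String v, Kind k, int w) { variable = v; kind = k; where = w; }
      }

      final int bytecodeIndex;    // where the interpreter should resume
      final Location[] locations; // the whole map is typically < 100 bytes

      TransferMap(int bci, Location... locs) {
          bytecodeIndex = bci;
          locations = locs;
      }
  }
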
26
Optimizations
27
Partial dead code elimination
  • Modified dead code elimination to treat rare
    blocks specially
  • Move computation that is only live on a rare path
    into the rare block, saving computation in the
    common case

28
Partial dead code elimination
  • Optimistic approach on SSA form
  • Mark all instructions that compute essential
    values, recursively
  • Eliminate all non-essential instructions

29
Partial dead code elimination
  • Calculate necessary code, ignoring all rare
    blocks
  • For each rare block, calculate the instructions
    that are necessary for that rare block, but not
    necessary in non-rare blocks
  • If these instructions are recomputable at the
    point of the rare block, they can be safely
    copied there
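
Before the worked example on the next two slides, here is a condensed sketch of the marking phase (the Instr type is hypothetical): essential values are marked transitively from side-effecting roots on the common path first, and whatever remains unmarked is either dead or needed only on a rare path.

  import java.util.*;

  class PartialDCE {
      static class Instr {
          boolean essential;
          boolean inRareBlock;               // lives in a rare block
          boolean sideEffecting;             // stores, calls, etc. are roots
          List<Instr> operands = new ArrayList<>();
      }

      static void mark(List<Instr> all) {
          Deque<Instr> work = new ArrayDeque<>();
          for (Instr i : all)                // roots: effects on common paths
              if (i.sideEffecting && !i.inRareBlock) work.push(i);
          while (!work.isEmpty()) {
              Instr i = work.pop();
              if (i.essential) continue;
              i.essential = true;
              work.addAll(i.operands);       // operands of essential code
          }
          // Unmarked instructions are either dead or needed only by some
          // rare block; the latter are copied into that block when their
          // operands are still available (recomputable) there.
      }
  }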

30
Partial dead code example
  x = 0;
  if (rare branch 1) {
    ...
    z = x + y;
    ...
  }
  if (rare branch 2) {
    ...
    a = x + z;
    ...
  }

31
Partial dead code example
  if (rare branch 1) {
    x = 0;
    ...
    z = x + y;
    ...
  }
  if (rare branch 2) {
    x = 0;
    ...
    a = x + z;
    ...
  }

32
Pointer and escape analysis
  • Treating an entrance to the rare path as a method
    call is a conservative assumption
  • Typically does not matter because there are no
    merges back into the common path
  • However, this conservativeness hurts pointer and
    escape analysis because a single unanalyzed call
    kills all information

33
Pointer and escape analysis
  • Stack allocate objects that don't escape in the
    common blocks
  • Eliminate synchronization on objects that don't
    escape the common blocks
  • If a branch to a rare block is taken:
  • Copy stack-allocated objects to the heap and
    update pointers
  • Reapply eliminated synchronizations
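
Conceptually, the fixup on entry to a rare path looks like the following; a real VM does this in the runtime, below the Java level, so every name here is hypothetical.

  // Copy stack-allocated objects to the heap, rewrite references to point
  // at the heap copies, and reapply any synchronization the optimizer
  // eliminated, before handing control to the rare code.
  class RarePathFixup {
      static void enterRarePath(StackObject[] stackObjects, Object[] elidedLocks) {
          for (StackObject so : stackObjects) {
              Object heapCopy = so.copyToHeap();    // allocate + copy contents
              so.rewriteReferencesTo(heapCopy);     // patch frames and maps
          }
          for (Object lock : elidedLocks)
              acquireMonitor(lock);                 // redo removed monitorenters
      }
      static void acquireMonitor(Object lock) { /* VM monitorenter */ }
  }

  interface StackObject {
      Object copyToHeap();
      void rewriteReferencesTo(Object heapCopy);
  }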

34
Copying from stack to heap
[Diagram: stack objects are copied to the heap and pointers are rewritten
to refer to the heap copies]
35
Reconstructing interpreter state
  • We use a runtime glue routine
  • Construct a set of interpreter stack frames,
    initialized with their corresponding method and
    bytecode pointers
  • Iterate through each location pair in the map,
    and copy the value at the location to its
    corresponding position in the interpreter stack
    frame
  • Branch into the interpreter, and continue
    execution
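
Putting the pieces together, a sketch of the glue routine, reusing the hypothetical TransferMap from the earlier sketch; MachineState, InterpreterFrame, and Interpreter are likewise illustrative stand-ins.

  class GlueRoutine {
      static void transferToInterpreter(TransferMap map, MachineState machine,
                                        Interpreter interp) {
          // One frame per method; inlined methods would get a frame each.
          InterpreterFrame frame = new InterpreterFrame(map.bytecodeIndex);
          for (TransferMap.Location loc : map.locations) {
              long value = machine.read(loc);        // register or stack slot
              frame.setLocal(loc.variable, value);   // matching interpreter slot
          }
          interp.resume(frame);   // continue at map.bytecodeIndex
      }
  }

  interface MachineState { long read(TransferMap.Location loc); }
  interface Interpreter  { void resume(InterpreterFrame frame); }

  class InterpreterFrame {
      final int bytecodeIndex;
      final java.util.Map<String, Long> locals = new java.util.HashMap<>();
      InterpreterFrame(int bci) { bytecodeIndex = bci; }
      void setLocal(String name, long value) { locals.put(name, value); }
  }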

36
Experimental Results
37
Experimental Methodology
  • Fully implemented in a proprietary system
  • Unfortunately, cannot publish those numbers!
  • Proof-of-concept implementation in the joeq
    virtual machine (http://joeq.sourceforge.net)
  • Unfortunately, joeq does not perform significant
    optimizations!

38
Experimental Methodology
  • Also implemented as an offline step, using
    refactored class files
  • Use offline profile information to split methods
    into hot and cold parts
  • We then rely on the virtual machine's default
    method-at-a-time strategy
  • Provides a reasonable approximation of the
    effectiveness of this technique
  • Can also be used as a standalone optimizer
  • Available under LGPL as part of joeq release
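
For illustration, here is roughly what such an offline split would do to the earlier read_db example; the helper names (handleShortRead, handleIOError) are hypothetical.

  import java.io.FileInputStream;
  import java.io.IOException;

  class SplitExample {
      // Hot part: only the common path remains, so the VM's method-at-a-time
      // compiler spends its time where it matters.
      void read_db(String fn) {
          int n = 0, act = 0, b;
          byte[] buffer = null;
          try {
              FileInputStream sif = new FileInputStream(fn);
              n = sif.available();
              buffer = new byte[n];
              while ((b = sif.read(buffer, act, n - act)) > 0)
                  act = act + b;
              sif.close();
              if (act != n)
                  handleShortRead(act, n);   // cold path moved out of line
          } catch (IOException ioe) {
              handleIOError(ioe);            // cold path moved out of line
          }
      }

      // Cold methods: rarely invoked, never worth optimizing.
      private void handleShortRead(int act, int n) { /* error handling */ }
      private void handleIOError(IOException ioe)  { /* error handling */ }
  }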

39
Experimental Methodology
  • IBM JDK 1.3 cx130-20010626 on RedHat Linux 7.1
  • Pentium III 600 MHz, 512 MB RAM
  • Thresholds: t1 = 2000, t2 = 25000
  • Benchmarks: SpecJVM, SwingSet, Linpack, JavaLex,
    JavaCup

40
Run time improvement
[Bar chart: first bar = original; second bar = PMC; third bar = PMC + my
opts. Blue portion = optimized execution.]
41
Related Work
  • Dynamic techniques
  • Dynamo (Bala et al., PLDI'00)
  • Self (Chambers et al., OOPSLA'91)
  • HotSpot (JVM'01)
  • IBM JDK (Ishizaki et al., OOPSLA'00)

42
Related Work
  • Static techniques
  • Trace scheduling (Fisher, 1981)
  • Superblock scheduling (IMPACT compiler)
  • Partial redundancy elimination with cost-benefit
    analysis (Horspool, 1997)
  • Optimal compilation unit shapes (Bruening,
    FDDO'00)
  • Profile-guided code placement strategies

43
Conclusion
  • Partial method compilation technique is simple to
    implement, yet very effective
  • Compile times reduced drastically
  • Overall run times improved by an average of 10%,
    and up to 32%
  • System is available under LGPL at
    http://joeq.sourceforge.net