Buffered dynamic run-time profiling of arbitrary data for Virtual Machines which employ interpreter and Just-In-Time (JIT) compiler - PowerPoint PPT Presentation

1
Buffered dynamic run-time profiling of arbitrary
data for Virtual Machines which employ
interpreter and Just-In-Time (JIT) compiler
  • Compiler workshop 08
  • Nikola Grcevski, IBM Canada Lab

2
Agenda
  • The motivation and the importance of profiling
  • Design and implementation of J9 VM interpreter
    profiler
  • Performance results and start-up overhead

3
The static vs. dynamic compiler
  • Static compilers can take their time to analyze
    the code, e.g. to perform inter-procedural analysis
  • Dynamic Just-In-Time compilers don't have this
    luxury: compilation happens while the application
    is running
  • Can dynamic compilers ever produce quality
    optimized code comparable to static compilers?

4
Why profile?
  • An entire category of speculative optimizations
    relies on some form of profiling information
  • Opens up opportunities for new code and memory
    optimizations
  • Critical for high-performance dynamic compiler
    systems

5
What could we profile?
  • Pretty much anything that we expect will provide
    repeatable information that we can use to
    optimize
  • Profiling can be done at the Java level, or at the
    CPU level if the OS supports it

6
What kinds of profilers does J9 have?
  • JIT profiler
      • Instruments methods with various profiling hooks
      • Targeted only at methods that are very hot
      • Temporary, and slows down execution while active
  • Interpreter profiler
      • The topic of this presentation

7
What kinds of data do we collect with the
interpreter profiler?
  • Branch direction
  • Virtual/Interface call targets
  • Switch statement index
  • Instanceof and checkcast runtime types

8
Interpreter profiler design
  • Buffered approach to data collection on the
    application threads

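[Figure: Application Thread 1 … Application Thread N executing bytecodes (div, vcall, if, icall, mul, add, switch, …); the profiled ones (if, vcall, icall, switch) are recorded in the thread's profiling buffer]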
9
Interpreter profiler design
  • Buffer full event triggers processing of the data
    by the JIT

[Figure: on a buffer full event, the profiling buffer of Application Thread 1 (if, vcall, if, switch, if, …) is handed to the JIT runtime]
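The buffered collection and the buffer full event described above can be sketched in Python as follows. This is an illustration under stated assumptions, not J9's implementation: the names `ThreadProfilingBuffer`, `capacity` and `on_full` are hypothetical, capacity is counted in entries rather than bytes, and `on_full` merely stands in for handing the buffer to the JIT runtime.

```python
from typing import Callable, List, Tuple

# One profiling entry: (bytecode PC, raw payload bytes). Hypothetical layout.
Entry = Tuple[int, bytes]


class ThreadProfilingBuffer:
    """Profiling buffer owned by a single application thread.

    Only the owning thread appends, so appends need no lock (an assumption
    of this sketch). When the buffer fills up it is handed to `on_full`,
    which stands in for the JIT runtime, and a fresh buffer is started.
    """

    def __init__(self, capacity: int, on_full: Callable[[List[Entry]], None]) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity  # entries per buffer (a real VM would count bytes)
        self.on_full = on_full
        self.entries: List[Entry] = []

    def record(self, pc: int, payload: bytes) -> None:
        """Called by the interpreter after executing a profiled bytecode."""
        self.entries.append((pc, payload))
        if len(self.entries) >= self.capacity:
            # "Buffer full event": swap in an empty buffer, then hand off the full one.
            full, self.entries = self.entries, []
            self.on_full(full)


# Usage: two application threads, each with its own buffer.
handed_to_jit: List[List[Entry]] = []
t1 = ThreadProfilingBuffer(capacity=3, on_full=handed_to_jit.append)
t2 = ThreadProfilingBuffer(capacity=3, on_full=handed_to_jit.append)
t1.record(0x1000, b"\x01")                         # if: taken
t1.record(0x1004, (0x7F00).to_bytes(8, "little"))  # vcall: receiver class
t2.record(0x2000, b"\x00")                         # if on thread N: not taken
t1.record(0x1010, b"\x00")                         # third entry fills t1's buffer
```

Swapping in an empty buffer before calling the handler lets the application thread keep recording right away; a real VM would also have to manage the buffer memory and the handoff between threads.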
10
Interpreter profiler design
  • The JIT parses the application thread's profiling
    buffer and builds an internal profiling data
    structure

[Figure: the JIT runtime reads each profiling buffer entry (bytecode program counter + data) and stores it in the JIT profiling hashtable, using a hash function based on the bytecode PC]
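A minimal Python sketch of a PC-keyed hashtable like the one the JIT builds while parsing a buffer. The class name, the bucket count and the hash function (shift out the low bits of the PC, then reduce modulo the bucket count) are illustrative assumptions, not the J9 data structure; the usage lines process decoded branch entries of the form (pc, taken).

```python
from typing import Callable, List, Optional, Tuple


class ProfilingHashtable:
    """Chained hashtable keyed by bytecode PC.

    The hash function (drop the low two bits, then reduce modulo the bucket
    count) is an illustrative choice, not J9's actual function.
    """

    def __init__(self, num_buckets: int = 1024) -> None:
        self.buckets: List[List[Tuple[int, object]]] = [[] for _ in range(num_buckets)]

    def _hash(self, pc: int) -> int:
        return (pc >> 2) % len(self.buckets)

    def find(self, pc: int) -> Optional[object]:
        """Return the record stored for this PC, or None."""
        for entry_pc, record in self.buckets[self._hash(pc)]:
            if entry_pc == pc:
                return record
        return None

    def find_or_create(self, pc: int, make_record: Callable[[], object]) -> object:
        """Return the existing record for this PC, creating it on first sight."""
        record = self.find(pc)
        if record is None:
            record = make_record()
            self.buckets[self._hash(pc)].append((pc, record))
        return record


# Usage: the JIT walks a buffer of decoded branch entries (pc, taken).
table = ProfilingHashtable()
for pc, taken in [(0x40, True), (0x40, False), (0x40, True), (0x84, False)]:
    counts = table.find_or_create(pc, lambda: [0, 0])  # [taken, not taken]
    counts[0 if taken else 1] += 1
```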
11
What's in the data we collect?
  • Bytecode program counter
  • Variable-size data packet
      • 1 byte for branch direction
      • Word size for call targets and runtime types
      • 4 bytes for switch index
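The packet format above can be illustrated with a small Python encoder and decoder. The payload sizes come from the slide; the 64-bit word, the little-endian layout, the kind names, and the `kind_at` helper are assumptions of this sketch. In this sketch packets carry no size or type tag, so `kind_at(pc)` stands in for the JIT looking at the bytecode at that PC to learn how large the payload is.

```python
import struct
from typing import Callable, Iterator, Tuple

# Payload sizes follow the slide; the 64-bit word size and the value
# encodings (1 = taken, class pointer as an integer) are assumptions.
PC_FORMAT = "<Q"  # bytecode PC stored as one 64-bit word
PAYLOAD_FORMAT = {
    "branch": "<B",  # 1 byte: branch direction
    "call": "<Q",    # word: virtual/interface call target class
    "type": "<Q",    # word: instanceof/checkcast runtime type
    "switch": "<i",  # 4 bytes: switch index
}


def encode(buf: bytearray, pc: int, kind: str, value: int) -> None:
    """Append one (PC, payload) packet to a thread's profiling buffer."""
    buf.extend(struct.pack(PC_FORMAT, pc))
    buf.extend(struct.pack(PAYLOAD_FORMAT[kind], value))


def decode(buf: bytes, kind_at: Callable[[int], str]) -> Iterator[Tuple[int, str, int]]:
    """Walk a full buffer packet by packet.

    `kind_at(pc)` is a hypothetical helper standing in for the JIT
    inspecting the bytecode at that PC to find the payload size.
    """
    offset = 0
    while offset < len(buf):
        (pc,) = struct.unpack_from(PC_FORMAT, buf, offset)
        offset += struct.calcsize(PC_FORMAT)
        kind = kind_at(pc)
        fmt = PAYLOAD_FORMAT[kind]
        (value,) = struct.unpack_from(fmt, buf, offset)
        offset += struct.calcsize(fmt)
        yield pc, kind, value


# Usage: three packets of different sizes in one buffer.
kinds = {0x10: "branch", 0x20: "call", 0x30: "switch"}
buf = bytearray()
encode(buf, 0x10, "branch", 1)
encode(buf, 0x20, "call", 0xCAFE)
encode(buf, 0x30, "switch", -1)
packets = list(decode(bytes(buf), lambda pc: kinds[pc]))
```

Keeping the payload variable-size means a branch costs only one byte beyond its PC, which matters when every profiled bytecode writes to the buffer.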

12
Processing the buffered branch information
  • We create an object to hold the bytecode PC and
    the branch counts, using 4 bytes to store the
    branch information (taken and not-taken counts).

[Figure: record layout: pc | taken | not taken]
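A sketch of a 4-byte branch record in Python. Splitting the 4 bytes into two 16-bit counters and halving both counters on overflow are assumptions; the slide only states the total size.

```python
import struct


class BranchRecord:
    """Branch profile for one bytecode PC, packed into 4 bytes.

    The 16/16 split between the taken and not-taken counters and the
    halve-on-overflow policy are assumptions; the slide only says 4 bytes.
    """

    MAX_COUNT = 0xFFFF  # largest value of a 16-bit counter

    def __init__(self, pc: int) -> None:
        self.pc = pc
        self.taken = 0
        self.not_taken = 0

    def update(self, taken: bool) -> None:
        if max(self.taken, self.not_taken) == self.MAX_COUNT:
            # Halve both counters instead of wrapping, keeping their ratio.
            self.taken >>= 1
            self.not_taken >>= 1
        if taken:
            self.taken += 1
        else:
            self.not_taken += 1

    def packed(self) -> bytes:
        """The 4-byte form: taken count, then not-taken count."""
        return struct.pack("<HH", self.taken, self.not_taken)

    def taken_ratio(self) -> float:
        total = self.taken + self.not_taken
        return self.taken / total if total else 0.5


# Usage: three taken outcomes and one not-taken outcome at the same PC.
r = BranchRecord(pc=0x40)
for outcome in (True, True, True, False):
    r.update(outcome)
```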
13
What does the JIT do with the call information?
  • We keep up to 3 call targets with their counts, as
    well as a residue count for all other targets

[Figure: record layout: pc | residue | Class A, count | Class B, count | Class C, count]
  • We use the same approach for checkcast and
    instanceof
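The call-target record can be sketched as below, in Python. Class identifiers are plain integers standing in for class pointers, and both the no-replacement policy once three slots are full and the `dominant()` threshold are hypothetical choices, not taken from the presentation.

```python
from typing import List, Optional


class CallTargetProfile:
    """Up to three (class, count) entries plus a residue count.

    Classes are plain ints standing in for class pointers. Once all three
    slots are taken, further classes only bump the residue (no replacement
    policy is assumed).
    """

    MAX_TARGETS = 3

    def __init__(self, pc: int) -> None:
        self.pc = pc
        self.targets: List[List[int]] = []  # [class_id, count] pairs
        self.residue = 0

    def add(self, class_id: int) -> None:
        for entry in self.targets:
            if entry[0] == class_id:
                entry[1] += 1
                return
        if len(self.targets) < self.MAX_TARGETS:
            self.targets.append([class_id, 1])
        else:
            self.residue += 1

    def dominant(self, threshold: float = 0.9) -> Optional[int]:
        """Most frequent class if it covers at least `threshold` of all
        observations; the threshold is a hypothetical tuning knob."""
        total = sum(count for _, count in self.targets) + self.residue
        if total == 0:
            return None
        class_id, count = max(self.targets, key=lambda entry: entry[1])
        return class_id if count / total >= threshold else None


# Usage: receiver classes observed at one virtual call site.
p = CallTargetProfile(pc=0x50)
for receiver_class in [1, 1, 1, 2, 3, 4, 1]:
    p.add(receiver_class)
```

A JIT could use a result like `dominant()` to decide whether to devirtualize a call and inline the common target behind a class test; the same record shape serves checkcast and instanceof runtime types.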
14
What does the JIT do with the switch information?
  • We create a data structure to hold the bytecode
    PC and the counts for each switch index. The index
    data is 8 bytes wide, split into 4 records: the
    top 3 indices and the rest.

[Figure: record layout: pc | record 1 | record 2 | record 3 | the rest; each record is split into 2 portions, a 1-byte count and a 1-byte switch index]
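A Python sketch of the 8-byte switch record. Because the fields are one byte wide, this sketch assumes that counts saturate at 255 and that indices which do not fit in a byte are counted in "the rest"; it also lets the first three distinct indices claim the slots instead of maintaining a true top 3.

```python
class SwitchProfile:
    """Switch-index profile packed into 8 bytes: three (count, index) byte
    pairs plus a fourth record counting all other indices ("the rest").

    Assumptions: counts saturate at 255, indices outside 0..255 go to
    "the rest", and the first three distinct indices claim the slots
    (no re-sorting into a true top 3).
    """

    BYTE_MAX = 255

    def __init__(self, pc: int) -> None:
        self.pc = pc
        self.records = [[0, 0] for _ in range(3)]  # [count, index]; count 0 = free slot
        self.rest = 0

    def add(self, index: int) -> None:
        for rec in self.records:
            if rec[0] > 0 and rec[1] == index:
                rec[0] = min(rec[0] + 1, self.BYTE_MAX)
                return
        if 0 <= index <= self.BYTE_MAX:
            for rec in self.records:
                if rec[0] == 0:
                    rec[0], rec[1] = 1, index
                    return
        self.rest = min(self.rest + 1, self.BYTE_MAX)

    def packed(self) -> bytes:
        """8 bytes: count/index for records 1-3, then the rest (index byte unused)."""
        out = bytearray()
        for count, index in self.records:
            out += bytes([count, index])
        out += bytes([self.rest, 0])
        return bytes(out)


# Usage: indices observed at one switch; 9 and 300 end up in "the rest".
sp = SwitchProfile(pc=0x60)
for idx in [2, 2, 0, 7, 2, 9, 300]:
    sp.add(idx)
```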
15
Storing the profiling data
  • Each data record is stored in a global hashtable,
    using the bytecode PC as input to the hash function
  • On subsequent encounters of the same PC with
    profiling data the records are updated.
  • Branch and switch counts are incremented
  • Call targets and runtime types are added and
    counts incremented.

16
Using the profiling information
  • The profiler database only knows about bytecode PCs
  • Wherever the compiler needs profiling
    information, it computes the bytecode PC from the
    method information and the bytecode index
  • The compiler has to make sense of the
    information in the hashtable
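How the compiler might rebuild the profiler's key and query it, sketched in Python. `MethodInfo`, its `bytecode_start` field, `bytecode_pc` and `branch_bias` are hypothetical names; the sketch assumes the bytecode PC is the method's bytecode start address plus the bytecode index, and uses a plain dictionary in place of the profiling hashtable.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MethodInfo:
    """Hypothetical stand-in for the compiler's view of a method."""
    name: str
    bytecode_start: int  # address of the method's first bytecode


def bytecode_pc(method: MethodInfo, bytecode_index: int) -> int:
    """Rebuild the absolute bytecode PC the profiler used as its key."""
    return method.bytecode_start + bytecode_index


def branch_bias(table: Dict[int, Tuple[int, int]], method: MethodInfo,
                bytecode_index: int) -> Optional[float]:
    """Taken fraction for the branch at (method, bytecode_index), or None if
    there is no usable profile. `table` maps PC -> (taken, not_taken)."""
    counts = table.get(bytecode_pc(method, bytecode_index))
    if counts is None:
        return None
    taken, not_taken = counts
    total = taken + not_taken
    return taken / total if total else None


# Usage: one profiled branch at bytecode index 12 of Foo.bar.
m = MethodInfo("Foo.bar", bytecode_start=0x7000)
profile = {0x7000 + 12: (90, 10)}
```

Returning `None` when there is no record leaves the compiler free to fall back to its usual heuristics for that branch.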

17
Interpreter profiler design
  • JIT compiler consults the profiling hashtable in
    various stages of method compilation

[Figure: the Compilation Thread consults the JIT profiling hashtable from the inliner, code ordering, …, and codegen]
18
Performance results
  • Up to 30% improvement on various applications
  • EJB and other middleware applications benefit
    mostly from code ordering and from devirtualization
    for the purpose of inlining
  • Benchmarks typically benefit from other
    optimizations enabled by the ability to
    devirtualize virtual and interface calls
  • With various tweaks we managed to drive the
    start-up overhead below 10%

19
How do we manage the profiling overhead?
  • We turn the profiler off in -Xquickstart mode
  • No locking on the hashtable
  • We detect the start-up phase of the application
    and skip records to reduce the data collection
    overhead
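Skipping records during start-up could look like the Python sketch below. The class name, the keep-one-in-N policy, and the externally set `in_startup` flag are assumptions; the presentation does not say how records are chosen or how the start-up phase is detected.

```python
class StartupRecordFilter:
    """Keeps only every `keep_every`-th profiling event while the application
    is flagged as starting up, and every event afterwards.

    How start-up is detected and the value of `keep_every` are assumptions;
    the slide only says that records are skipped during start-up.
    """

    def __init__(self, keep_every: int = 8) -> None:
        if keep_every < 1:
            raise ValueError("keep_every must be >= 1")
        self.keep_every = keep_every
        self.in_startup = True  # cleared by some start-up phase detector
        self._seen = 0

    def should_record(self) -> bool:
        if not self.in_startup:
            return True
        self._seen += 1
        return self._seen % self.keep_every == 0


# Usage: during start-up only one event in four reaches the buffer.
f = StartupRecordFilter(keep_every=4)
kept_during_startup = sum(f.should_record() for _ in range(100))
f.in_startup = False  # start-up phase detected as over
```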

20
Turning the profiler ON and OFF
  • The profiler is ON by default
  • The sampler thread turns the profiler OFF or back
    ON
  • A number of consecutive sampling ticks in
    JIT-generated code turns the profiler OFF
  • A number of consecutive ticks in the interpreter
    turns the profiler back ON
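The ON/OFF rule above maps naturally onto a small state machine driven by sampler ticks, sketched here in Python. The class name and threshold values are hypothetical; the slide gives the rule, not the numbers.

```python
from typing import Optional


class ProfilerSwitch:
    """Turns the interpreter profiler OFF/ON from sampler-thread ticks.

    `off_after` and `on_after` are hypothetical thresholds for the number of
    consecutive ticks in JIT-generated code / in the interpreter.
    """

    def __init__(self, off_after: int = 100, on_after: int = 10) -> None:
        self.enabled = True  # the profiler is ON by default
        self.off_after = off_after
        self.on_after = on_after
        self._last_in_jit: Optional[bool] = None
        self._streak = 0

    def tick(self, in_jit_code: bool) -> None:
        """Called once per sampler tick with where the sampled thread was."""
        if in_jit_code == self._last_in_jit:
            self._streak += 1
        else:
            self._last_in_jit, self._streak = in_jit_code, 1
        if self.enabled and in_jit_code and self._streak >= self.off_after:
            self.enabled = False
        elif not self.enabled and not in_jit_code and self._streak >= self.on_after:
            self.enabled = True


# Usage: three JIT ticks in a row turn the profiler OFF.
sw = ProfilerSwitch(off_after=3, on_after=2)
for in_jit in (True, True, True):
    sw.tick(in_jit)
```

Requiring consecutive ticks means a single interpreter sample in a mostly compiled phase does not flip the profiler back ON.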

21
Some of the problems we encountered
  • Tuning for the optimal balance between start-up
    overhead and throughput performance wasn't easy
  • Detecting application phase changes wasn't easy
  • Class unloading created lots of problems

22
Summary
  • Profiling is critical for performance of run-time
    systems
  • Using a buffered approach to data collection can
    help build efficient profilers
  • Tuning for the optimal balance between start-up
    overhead and throughput performance is challenging