Buffered dynamic run-time profiling of arbitrary data for Virtual Machines which employ interpreter and Just-In-Time (JIT) compiler

Provided by: eecgTo

Learn more at: https://www.eecg.toronto.edu

Transcript and Presenter's Notes

1
Buffered dynamic run-time profiling of arbitrary
data for Virtual Machines which employ
interpreter and Just-In-Time (JIT) compiler

Compiler workshop 08
Nikola Grcevski, IBM Canada Lab

2
Agenda

The motivation and the importance of profiling
Design and implementation of J9 VM interpreter
profiler
Performance results and start-up overhead

3
The static vs. dynamic compiler

Static compilers can take their time to analyze
the code - perform intra procedural analysis
Dynamic Just-In-Time compilers don't have this
luxury: compilation happens during application
runtime
Can dynamic compilers ever produce quality
optimized code comparable to static compilers?

4
Why profile?

The whole category of speculative optimizations
relies on some type of profiling information
Opens up opportunities for new code and memory
optimizations
Critical for high performance dynamic compiler
systems

5
What could we profile?

Pretty much anything that we expect will provide
repeatable information that we can use to
optimize
The profiling can be at the Java level or CPU
level if the OS supports it.

6
What kinds of profilers does J9 have?

JIT profiler
  Instruments methods with various profiling hooks
  Targeted only to methods that are very hot
  Temporal and slows down execution
Interpreter profiler
  The topic of this presentation

7
What kinds of data do we collect with the
interpreter profiler?

Branch direction
Virtual/Interface call targets
Switch statement index
Instanceof and checkcast runtime types

8
Interpreter profiler design

Buffered approach to data collection on the
application threads

[Diagram: Application Thread 1 through Application Thread N, each executing a bytecode stream (div, vcall, if, icall, mul, add, vcall, if, if, if, switch, ...) and buffering profiling data]

9
Interpreter profiler design

Buffer full event triggers processing of the data
by the JIT

[Diagram: the buffer of Application Thread 1 (if, vcall, if, switch, if, ...) raises a buffer full event that hands the data to the JIT runtime]

10
Interpreter profiler design

JIT parses the application thread profiling
buffer and builds internal profiling data
structure

[Diagram: the JIT runtime reads data from the profiling buffer and stores it in the JIT profiling hashtable, using a hash function based on the bytecode program counter (PC)]
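The buffered flow on the last three slides can be sketched roughly as follows. This is an illustration only, not the J9 implementation: the class names, the tiny BUFFER_CAPACITY, and the use of a Python dict in place of the JIT hashtable are all assumptions.

```python
from collections import defaultdict

BUFFER_CAPACITY = 4  # hypothetical; kept tiny so the example flushes quickly


class ProfilingTable:
    """Stands in for the JIT profiling hashtable, keyed by bytecode PC."""

    def __init__(self):
        self.records = defaultdict(list)

    def process(self, buffer):
        # Runs on the buffer full event: parse every (pc, data) entry.
        for pc, data in buffer:
            self.records[pc].append(data)


class ThreadBuffer:
    """Per-application-thread buffer filled by the interpreter."""

    def __init__(self, table):
        self.table = table
        self.entries = []

    def record(self, pc, data):
        self.entries.append((pc, data))
        if len(self.entries) == BUFFER_CAPACITY:
            # Buffer full: hand the whole batch to the JIT side, then reuse.
            self.table.process(self.entries)
            self.entries = []


table = ProfilingTable()
thread1 = ThreadBuffer(table)
for taken in (1, 0, 1):
    thread1.record(0x1000, taken)         # three branch outcomes at one PC
entries_before_flush = len(table.records)  # nothing processed yet
thread1.record(0x2000, 7)                  # fourth entry triggers the flush
```

The point of the design is that the interpreter only appends to a thread-local buffer; the more expensive hashtable work happens once per full buffer.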
11
What's in the data we collect?

Bytecode program counter
Variable size data packet
  1 byte for branch direction
  Word size for call targets and runtime types
  4 bytes for switch index
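A minimal sketch of such variable-size records, under stated assumptions: 64-bit words, little-endian packing with no padding, and a hypothetical kind_at map that tells the parser what kind of bytecode sits at each PC. The slides do not say how the real parser determines the packet size.

```python
import struct

# Assumption: 64-bit words, little-endian, no padding between fields.
PC = struct.Struct("<Q")
PAYLOAD = {
    "branch": struct.Struct("<B"),  # 1 byte: branch direction
    "call": struct.Struct("<Q"),    # word: call target or runtime type
    "switch": struct.Struct("<i"),  # 4 bytes: switch index
}


def append_record(buf, pc, kind, value):
    """Append one record: the bytecode PC followed by its data packet."""
    buf.extend(PC.pack(pc) + PAYLOAD[kind].pack(value))


def parse(buf, kind_at):
    """Walk the buffer; kind_at maps a PC to the kind of bytecode there."""
    offset, out = 0, []
    while offset < len(buf):
        (pc,) = PC.unpack_from(buf, offset)
        offset += PC.size
        payload = PAYLOAD[kind_at[pc]]
        (value,) = payload.unpack_from(buf, offset)
        offset += payload.size
        out.append((pc, value))
    return out


kind_at = {0x10: "branch", 0x20: "call", 0x30: "switch"}  # hypothetical PCs
buf = bytearray()
append_record(buf, 0x10, "branch", 1)
append_record(buf, 0x20, "call", 0xDEADBEEF)
append_record(buf, 0x30, "switch", 5)
decoded = parse(buf, kind_at)
```

With these assumed sizes the three records occupy 9, 16 and 12 bytes, so small branch records do not pay for the widest payload.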

12
Processing the buffered branch information

We create an object to hold the bytecode PC and
branch counts. We are using 4 bytes to store the
branch information.

[Record layout: pc | taken | not taken]
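One way to fit both branch counts into 4 bytes is sketched below. The split into two 16-bit saturating counters is an assumption; the slide only states that 4 bytes hold the branch information. The function names are hypothetical.

```python
MAX16 = 0xFFFF  # assumed counter width: 16 bits each


def update_branch(packed, taken):
    """Return the 32-bit record with the matching counter incremented."""
    taken_count = packed >> 16
    not_taken_count = packed & MAX16
    if taken:
        taken_count = min(taken_count + 1, MAX16)        # saturate, never wrap
    else:
        not_taken_count = min(not_taken_count + 1, MAX16)
    return (taken_count << 16) | not_taken_count


def taken_ratio(packed):
    """Fraction of executions in which the branch was taken, or None."""
    taken_count, not_taken_count = packed >> 16, packed & MAX16
    total = taken_count + not_taken_count
    return taken_count / total if total else None


record = 0
for outcome in (True, True, True, False):
    record = update_branch(record, outcome)
```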
13
What does the JIT do with the call information?

We keep up to 3 call targets with their counts, as
well as a residue count

[Record layout: pc | residue | Class A, count | Class B, count | Class C, count]
We use the same approach for checkcast and
instanceof
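A rough sketch of a call-site record with three target slots and a residue count. The policy shown here (the first three distinct targets keep their slots, everything else goes to the residue) is an assumption, since the slides do not describe how slots are assigned; the class and method names are hypothetical.

```python
class CallSiteRecord:
    MAX_TARGETS = 3

    def __init__(self, pc):
        self.pc = pc
        self.targets = {}  # receiver class -> count, at most MAX_TARGETS keys
        self.residue = 0   # calls whose target did not get a slot

    def add(self, target):
        if target in self.targets:
            self.targets[target] += 1
        elif len(self.targets) < self.MAX_TARGETS:
            self.targets[target] = 1
        else:
            self.residue += 1  # assumed policy: slots are never replaced

    def dominant(self):
        """Return (target, share of all calls) for the most frequent target."""
        if not self.targets:
            return None
        total = sum(self.targets.values()) + self.residue
        target, count = max(self.targets.items(), key=lambda kv: kv[1])
        return target, count / total


site = CallSiteRecord(pc=0x40)
for receiver in ["A", "A", "B", "C", "D", "A", "E"]:
    site.add(receiver)
best, share = site.dominant()
```

The residue matters when deciding whether to devirtualize: a dominant target is only a safe bet if its count is large relative to everything else, including the untracked calls.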
14
What does the JIT do with the switch information?

We create a data structure to hold the bytecode
PC and counts for switch index. The index data is
8 bytes wide, split into 4 records: the top 3 and
the rest.

[Record layout: pc | record 1 | record 2 | record 3 | the rest]
Each record is split into 2 portions: a 1-byte
count and a 1-byte switch index (count | index)
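The 8-byte switch layout might be modelled as below. Several details are assumptions, not taken from the slides: the first three distinct indices claim the slots, counts saturate at 255, indices that do not fit in one byte are counted under "the rest", and the index byte of "the rest" is left as zero.

```python
import struct


class SwitchRecord:
    def __init__(self, pc):
        self.pc = pc
        self.slots = []  # up to 3 [count, index] pairs
        self.rest = 0    # count for every other index

    def add(self, index):
        for slot in self.slots:
            if slot[1] == index:
                slot[0] = min(slot[0] + 1, 255)  # 1-byte count saturates
                return
        if len(self.slots) < 3 and 0 <= index <= 255:
            self.slots.append([1, index])
        else:
            self.rest = min(self.rest + 1, 255)

    def pack(self):
        """Serialize to 8 bytes: 3 x (count, index), then (rest count, 0)."""
        padded = self.slots + [[0, 0]] * (3 - len(self.slots))
        flat = [byte for slot in padded for byte in slot]
        return struct.pack("<8B", *flat, self.rest, 0)


rec = SwitchRecord(pc=0x50)
for idx in (2, 2, 7, 2, 9, 11, 7):
    rec.add(idx)
packed = rec.pack()
```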
15
Storing the profiling data

Each data record is stored in a global hashtable,
using the PC for the hash function
On subsequent encounters of the same PC with
profiling data the records are updated
  Branch and switch counts are incremented
  Call targets and runtime types are added and
  counts incremented

16
Using the profiling information

The profiler database only knows about bytecode PCs
At all points where the compiler is interested in
profiling information it generates the bytecode
PC from the method information and the bytecode
index
The compiler has to make sense out of the
information in the hashtable
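A sketch of that lookup, assuming the bytecode PC is simply the address of the method's first bytecode plus the bytecode index. The Method type, its bytecode_start field, and the table contents are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    name: str
    bytecode_start: int  # hypothetical: address of the method's first bytecode


def bytecode_pc(method, bytecode_index):
    """Rebuild the key the profiler used: start address plus index."""
    return method.bytecode_start + bytecode_index


# Stand-in for the profiling hashtable, keyed by bytecode PC only.
profile_table = {0x7F000010: {"taken": 90, "not_taken": 10}}


def branch_profile(method, bytecode_index):
    """Return the branch record for this bytecode, or None if never seen."""
    return profile_table.get(bytecode_pc(method, bytecode_index))


method = Method("Foo.bar", 0x7F000000)
hot = branch_profile(method, 0x10)      # a record exists for this PC
missing = branch_profile(method, 0x20)  # never profiled: compiler gets None
```

The compiler has to handle the missing case, because code that was never interpreted with the profiler on leaves no record behind.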

17
Interpreter profiler design

JIT compiler consults the profiling hashtable in
various stages of method compilation

[Diagram: stages on the compilation thread (inliner, order code, ..., codegen) consult the JIT profiling hashtable]

18
Performance results

Up to 30% improvement on various applications
EJB and other middleware applications benefit
mostly from code ordering and devirtualization
for the purpose of inlining
Benchmarks typically benefit from other
optimizations enabled by the ability to
devirtualize virtual and interface calls
With various tweaks we managed to drive the
start-up overhead to below 10%

19
How do we manage the profiling overhead?

We turn the profiler off in -Xquickstart mode
No locking on the hashtable
We detect startup phase of the application and
skip records to ease off the data collection
overhead
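Skipping records during startup could look something like this. The fixed stride and the in_startup flag are assumptions for illustration; the slides do not describe how startup is detected or how many records are skipped.

```python
class RecordFilter:
    """Decides which buffered records are worth processing."""

    def __init__(self, startup_stride=4):
        self.startup_stride = startup_stride  # hypothetical: keep 1 in 4
        self.in_startup = True                # assumed to be cleared elsewhere
        self.seen = 0

    def should_process(self):
        self.seen += 1
        if not self.in_startup:
            return True  # steady state: process every record
        return self.seen % self.startup_stride == 0


flt = RecordFilter()
kept_startup = sum(flt.should_process() for _ in range(8))
flt.in_startup = False  # startup phase over
kept_steady = sum(flt.should_process() for _ in range(3))
```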

20
Turning the profiler ON and OFF

The profiler is ON by default
The sampler thread turns the profiler OFF or back
ON
Number of consecutive ticks in JIT generated code
turns the profiler OFF
Number of consecutive ticks in interpreter turns
the profiler back ON
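The ON/OFF rule can be read as a small state machine driven by sampler ticks. In this sketch the thresholds off_after and on_after are made-up values and the class name is hypothetical; only the rule itself (consecutive ticks in JIT code turn the profiler off, consecutive ticks in the interpreter turn it back on) comes from the slide.

```python
class ProfilerSwitch:
    def __init__(self, off_after=3, on_after=2):
        self.enabled = True  # the profiler is ON by default
        self.off_after = off_after  # hypothetical threshold
        self.on_after = on_after    # hypothetical threshold
        self.jit_run = 0     # consecutive ticks seen in JIT generated code
        self.interp_run = 0  # consecutive ticks seen in the interpreter

    def tick(self, in_jit_code):
        """Called by the sampler thread; returns the new profiler state."""
        if in_jit_code:
            self.jit_run += 1
            self.interp_run = 0
            if self.enabled and self.jit_run >= self.off_after:
                self.enabled = False
        else:
            self.interp_run += 1
            self.jit_run = 0
            if not self.enabled and self.interp_run >= self.on_after:
                self.enabled = True
        return self.enabled


switch = ProfilerSwitch()
samples = [True, True, False, True, True, True, False, False]
states = [switch.tick(in_jit) for in_jit in samples]
```

Requiring consecutive ticks gives some hysteresis, so a single stray sample in the other mode does not flip the profiler.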

21
Some of the problems we encountered

Tuning for optimal balance between startup
overhead and throughput performance wasn't easy
Application phase change detection wasn't easy
Class unloading created lots of problems

22
Summary

Profiling is critical for performance of run-time
systems
Using a buffered approach to data collection can
help build efficient profilers
Tuning for optimal balance of startup overhead
and throughput performance is challenging
