Title: Sampling profiler for Rotor as part of optimizing compilation system
1. Sampling profiler for Rotor as part of optimizing compilation system
Sofia Chilingarova, St-Petersburg, Russia
Prof. Vladimir O. Safonov, St-Petersburg, Russia
2. Agenda
- Problem Statement
- Rotor Sampling Profiler Implementation
- Results
3. Problem Statement
- Rotor does not implement optimizations in its JIT compiler
- Implementing optimizations requires runtime profiling
- A sampling-based profiler is the best option: fairly complete information at low cost
4. Typical Optimizing Dynamic Compilation Subsystem Architecture
[Diagram. Components shown: IL (bytecode, CIL); Base Compiler/Interpreter; Executable Code; Multilevel Optimizing Compiler; Profiling Data; Compilation Queue; Profiler; Controller; Methods list; Profiling plan]
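The controller's role in such an architecture can be illustrated with a minimal sketch: it reads profile data and fills the compilation queue with hot methods. The threshold, the function name `plan_recompilation`, and the method names are hypothetical and do not come from Rotor or from the slides.

```python
from collections import Counter, deque

# Hypothetical threshold: number of samples before a method counts as hot.
HOT_THRESHOLD = 3

def plan_recompilation(samples, already_optimized, threshold=HOT_THRESHOLD):
    """Return a compilation queue of hot methods, hottest first.

    `samples` is a list of method names, one per profiler sample.
    Methods that are already optimized are skipped.
    """
    counts = Counter(samples)
    hot = [
        method
        for method, hits in counts.most_common()
        if hits >= threshold and method not in already_optimized
    ]
    return deque(hot)

samples = ["A.f", "B.g", "A.f", "A.f", "C.h", "B.g", "B.g", "B.g"]
queue = plan_recompilation(samples, already_optimized={"C.h"})
```

Here `queue` holds `B.g` (4 samples) ahead of `A.f` (3 samples); a real controller would also decide the optimization level and update the profiling plan.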
5. Rotor Sampling Profiler Implementation
- Goals
- Profiling Subsystem Architecture
- Data Storage Structure
- Self-Tuning
- Integration with Rotor
6. Goals
- To estimate the call frequency of individual methods
- To construct a Call Graph
- To achieve a reasonably low cost
  - small total profiling overhead
  - no long suspension of user threads
- To make good use of existing Rotor facilities
7. Profiling Subsystem Architecture
[Diagram. Components shown: SSCLI threads (the profiler marks managed threads); buffer; Profiler Marking-Thread; local queue; raw samples data; Manager-Thread; global queue; Data Storage]
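One possible reading of this architecture is a producer/consumer pipeline: a marking thread produces raw samples into a queue, and a manager thread drains the queue into data storage. The sketch below assumes that reading; it models only a single queue (the separate global queue is omitted), assumes non-empty stacks, and all function names are hypothetical.

```python
import queue
import threading
from collections import Counter

STOP = object()  # sentinel telling the consumer to finish

def marking_thread(stacks, local_queue):
    """Producer: push raw samples (tuples of frames, topmost first)."""
    for stack in stacks:
        local_queue.put(tuple(stack))
    local_queue.put(STOP)

def manager_thread(local_queue, storage):
    """Consumer: drain raw samples and count the topmost frame of each."""
    while True:
        sample = local_queue.get()
        if sample is STOP:
            break
        storage[sample[0]] += 1

def run_pipeline(stacks):
    """Run one producer and one consumer to completion; return the storage."""
    local_queue = queue.Queue()
    storage = Counter()
    producer = threading.Thread(target=marking_thread, args=(stacks, local_queue))
    consumer = threading.Thread(target=manager_thread, args=(local_queue, storage))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return storage

profile = run_pipeline([["f", "main"], ["g", "f", "main"], ["f", "main"]])
```

Separating the thread that takes samples from the thread that processes them keeps the time spent with user threads stopped short, which matches the stated goal of not suspending user threads for long.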
8. Data Storage Structure: previous approaches
- A bunch of samples
- DCG (Dynamic Call Graph)
- PCCT (Partial Call Context Tree)
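To make the difference between these structures concrete, the sketch below builds a call graph and a calling context tree from the same raw stack samples. It is a simplified illustration, not the data structures of any cited system: the DCG here counts only the topmost caller/callee edge of each sample, and the tree is built from whatever portion of the stack each sample contains.

```python
from collections import Counter

def build_dcg(samples):
    """Dynamic call graph: weight per (caller, callee) edge.

    Each sample is a list of frames, topmost (currently executing) first.
    Only the topmost call edge of each sample is counted.
    """
    edges = Counter()
    for stack in samples:
        if len(stack) >= 2:
            edges[(stack[1], stack[0])] += 1
    return edges

def build_cct(samples):
    """Calling context tree: nested dicts, one node per distinct call path."""
    root = {"count": 0, "children": {}}
    for stack in samples:
        node = root
        for method in reversed(stack):  # walk from the outermost caller inward
            node = node["children"].setdefault(
                method, {"count": 0, "children": {}}
            )
        node["count"] += 1  # credit the sample to its full calling context
    return root

samples = [["g", "f", "main"], ["g", "h", "main"], ["g", "f", "main"]]
dcg = build_dcg(samples)
cct = build_cct(samples)
```

The call graph stores one edge per caller/callee pair regardless of how the caller was reached, while the tree keeps `main -> f -> g` and `main -> h -> g` as separate paths with their own counters.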
9. Data Storage Structure: our approach
10. Self-Tuning
- When a sample is taken, the stack walk stops as soon as an already visited frame is encountered
  - The sample is then marked as visited
- When samples are processed, a special repetitions counter is incremented for every marked sample that contains data for only one frame (the topmost frame)
- After a fixed number of samples has been processed, the profiling interval is tuned based on the value of the repetitions counter
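The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: frames are modeled as hashable `(method, frame_id)` pairs, the visited frame itself is included in the sample, and the tuning rule (double the interval when more than half of a window repeats, halve it when fewer than 10% repeat) is hypothetical, since the slides do not give the actual policy. Removing frames from `visited` when they leave the stack is omitted.

```python
def take_sample(stack, visited):
    """Walk frames topmost-first; stop at the first already visited frame.

    Returns (frames, marked), where marked is True if the walk was cut short.
    """
    frames = []
    for frame in stack:
        frames.append(frame)
        if frame in visited:
            return frames, True
        visited.add(frame)
    return frames, False

def tune_interval(interval_ms, repetitions, window, high=0.5, low=0.1):
    """Hypothetical policy: sample less often when most samples repeat."""
    ratio = repetitions / window
    if ratio > high:
        return interval_ms * 2
    if ratio < low:
        return max(1, interval_ms // 2)
    return interval_ms

def process(samples, interval_ms, window):
    """Count repetitions and retune the interval after every `window` samples."""
    repetitions = 0
    for i, (frames, marked) in enumerate(samples, start=1):
        if marked and len(frames) == 1:
            repetitions += 1  # same topmost activation as an earlier sample
        if i % window == 0:
            interval_ms = tune_interval(interval_ms, repetitions, window)
            repetitions = 0
    return interval_ms

visited = set()
stack = [("g", 3), ("f", 2), ("main", 1)]
first = take_sample(stack, visited)   # full walk, not marked
repeat = take_sample(stack, visited)  # only the topmost frame, marked
shifted = take_sample([("h", 4), ("f", 2), ("main", 1)], visited)
```

A marked one-frame sample means the thread was still in the same method activation as at an earlier sample, so a high repetition count suggests the interval is shorter than it needs to be.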
11. Integration with Rotor
- Threads are stopped at safe points to take a profile sample
  - Just as they are stopped for GC or debugging
- The built-in SSCLI Stack Walk mechanism is used to collect managed stack samples
- Internal SSCLI VM hash tables and synchronization locks are used to store and maintain profile data
12. Results: testing environment
- Tests from the Rotor test suite were used: sscli\tests\bcl\threadsafety
  - Many threads execute the same code
- Measures used
  - statistical correlation of the total call counters of individual methods
  - Arnold and Ryder's Tree Overlap Percentage measure
- Self-tuning was turned off for simplicity of measurement
  - But the best results were obtained with the same interval that had been set automatically (100 ms)
- The average value over 10 consecutive runs is reported
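For reference, the two measures can be computed as in the sketch below. It assumes the Pearson coefficient for the statistical correlation and the usual form of the overlap measure, in which each weight is expressed as a percentage of its profile's total and the minimum of the two percentages is summed over shared entries. The slides do not state which pair of profiles was compared, so the example data are purely illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)

def overlap_percentage(weights_a, weights_b):
    """Sum of min(percent in A, percent in B) over entries present in both."""
    total_a, total_b = sum(weights_a.values()), sum(weights_b.values())
    return sum(
        min(100 * weights_a[key] / total_a, 100 * weights_b[key] / total_b)
        for key in weights_a.keys() & weights_b.keys()
    )

# Illustrative data only: keys stand for call paths or edges.
profile_a = {"a": 50, "b": 30, "c": 20}
profile_b = {"a": 6, "b": 4}
overlap = overlap_percentage(profile_a, profile_b)
correlation = pearson([1, 2, 3], [2, 4, 6])
```

With these inputs the shared entries contribute min(50, 60) + min(30, 40), so `overlap` is 80 percent; two identical profiles give 100 percent.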
13. Results
Test                               Correlation  Overlap
co8545int32                        0.99         0.97
co8546int16                        0.99         0.92
co8547sbyte                        0.99         0.94
co8548intptr                       0.99         0.98
co8549uint16                       0.99         0.95
co8550uint32                       0.99         0.95
co8502multiplereaders              0.96         0.85
co8503singlewritermultiplereaders  0.96         0.80
14. Questions
- Author: Sofia Chilingarova, e-mail: sofie-chil_at_hotmail.ru