JIT Instrumentation

1
JIT Instrumentation: A Novel Approach to
Dynamically Instrument Operating Systems
Marek Olszewski, Keir Mierle, Adam Czajkowski, Angela Demke Brown
University of Toronto
2
Instrumenting Operating Systems
  • Operating systems are growing in complexity
  • Kernel instrumentation can help
      • Used for debugging, profiling, monitoring, and
        security auditing...
  • Dynamic instrumentation
      • No recompilation, no reboot
      • Good for debugging systemic problems
      • Feasible in production settings

3
Current Approach: Probe-Based
  • Dynamic instrumentation tools for OSs are
    probe-based
      • Overwrite existing code with a jump/trap
  • Efficient on fixed-length architectures
  • Slow on variable-length architectures
      • Not safe to overwrite multiple instructions with
        a jump
          • A branch to a point between the instructions might exist
          • A thread might be sleeping in between the
            instructions
      • Must use a trap instruction
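To make the first hazard concrete, here is a minimal C sketch (all names are hypothetical, not from JIFL or any probe tool) that checks whether a 5-byte `jmp rel32` could replace the code at the start of a region. It counts how many variable-length instructions the jump would cover and rejects the patch if any known branch target falls inside that span. It cannot detect the second hazard, a thread sleeping between the overwritten instructions, which is one reason probe-based tools fall back to the one-byte `int3` trap.

```c
#include <stdbool.h>
#include <stddef.h>

#define JMP_REL32_LEN 5   /* opcode E9 + 32-bit displacement */

/* insn_len[]: lengths of consecutive instructions starting at offset 0.
 * branch_targets[]: offsets (same origin) that some branch may jump to.
 * Returns true if a 5-byte jump written at offset 0 leaves no branch
 * target pointing into the overwritten bytes. */
bool jump_patch_is_safe(const size_t *insn_len, size_t n_insns,
                        const size_t *branch_targets, size_t n_targets)
{
    /* How many whole instructions does the jump (partly) overwrite? */
    size_t covered = 0, i = 0;
    while (covered < JMP_REL32_LEN) {
        if (i == n_insns)
            return false;   /* not enough code to hold the jump */
        covered += insn_len[i++];
    }
    /* A target strictly inside (0, covered) would land in the jump's
     * bytes or in the leftover tail of a partially overwritten insn. */
    for (size_t t = 0; t < n_targets; t++)
        if (branch_targets[t] > 0 && branch_targets[t] < covered)
            return false;
    return true;
}
```

For example, with instruction lengths {1, 2, 3} the jump covers 6 bytes, so a branch to offset 3 makes the patch unsafe, while a branch to offset 0 or 6 does not.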

4
Current Approach: Trap-Based
[Figure: an int3 placed in the area of interest transfers control to the trap handler, which runs the instrumentation code]
Area of interest:
    sub $6c,esp
    mov $ffffe000,edx
    and esp,edx
    inc 14(edx)
    int3
    mov 28(edi),eax
    mov 2c(edi),ebx
    mov 30(edi),ebp
    add $1,eax
    and $3,eax
    or $c,eax
    mov eax,(ebx)
    add $2,ebp
    or $f,ebp
    mov ebp,4(ebx)
Instrumentation code:
    add $1,count_l
    adc $0,count_h
Trap handler:
  • Save processor state
  • Look up which instrumentation to call
  • Call instrumentation
  • Emulate overwritten instruction
  • Restore processor state
Very expensive!
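The instrumentation code on this slide (an `add` to `count_l` followed by an `adc` into `count_h`) maintains a 64-bit counter as two 32-bit words on 32-bit x86: the `add` sets the carry flag when the low word wraps, and the `adc` folds that carry into the high word. A C rendering of the same carry logic, with hypothetical names `counter64` and `counter_increment`:

```c
#include <stdint.h>

/* 64-bit counter stored as two 32-bit halves, as on the slide. */
struct counter64 {
    uint32_t lo;   /* count_l */
    uint32_t hi;   /* count_h */
};

static void counter_increment(struct counter64 *c)
{
    uint32_t old = c->lo;
    c->lo = old + 1u;                  /* add $1,count_l */
    uint32_t carry = (c->lo < old);    /* CF: set iff the low word wrapped */
    c->hi = c->hi + 0u + carry;        /* adc $0,count_h */
}

static uint64_t counter_value(const struct counter64 *c)
{
    return ((uint64_t)c->hi << 32) | c->lo;
}
```

Like the two instructions it mirrors, this update is not atomic as a pair, so concurrent increments on an SMP machine could be lost.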
5
Alternative: JIT Instrumentation
  • We propose just-in-time (JIT) dynamic
    instrumentation
  • Rewrite code to insert new instructions in
    between existing ones
  • More efficient
  • More powerful; supports
      • Instrumenting branch directions
      • Basic block-level instrumentation
      • Per-execution-path instrumentation
  • Proven in user space (Pin, Valgrind)

6
JIT Instrumentation
[Figure: panels Instrumentation Code, Area of Interest, and Code Cache]
Area of interest:
    sub $6c,esp
    mov $ffffe000,edx
    and esp,edx
    inc 14(edx)
    mov 28(edi),eax
    mov 2c(edi),ebx
    mov 30(edi),ebp
    add $1,eax
    and $3,eax
    or $c,eax
    mov eax,(ebx)
    add $2,ebp
    or $f,ebp
    mov ebp,4(ebx)
Instrumentation code:
    add $1,count_l
    adc $0,count_h
7
JIT Instrumentation
[Figure: panels Instrumentation Code, Area of Interest, and Code Cache; the area of interest is copied into the code cache, and a call to the instrumentation code, wrapped in pushf/popf, is inserted between existing instructions]
Code cache:
    sub $6c,esp
    mov $ffffe000,edx
    and esp,edx
    inc 14(edx)
    pushf
    call instrmtn
    popf
    mov 28(edi),eax
    mov 2c(edi),ebx
    mov 30(edi),ebp
    add $1,eax
    and $3,eax
    or $c,eax
    mov eax,(ebx)
    add $2,ebp
    or $f,ebp
    mov ebp,4(ebx)
8
JIT Instrumentation
[Figure: later step of the same animation; the code cache holds the copied instructions with pushf and call instrmtn inserted before mov 28(edi),eax]
9
Dynamic Binary Rewriting
  • Use dynamic binary rewriting to insert the new
    instructions
      • Interleaves binary rewriting with execution
      • Performed by a runtime system
      • Typically at basic block granularity
  • Code is rewritten into a code cache
  • Rewritten code must be
      • Efficient
      • Unaware of its new location

10
Dynamic Binary Rewriting
[Figure: Original Code (basic blocks bb1 to bb4), Code Cache, and Runtime System; the runtime system copies bb1 into the code cache]
11
Dynamic Binary Rewriting
[Figure: Original Code (basic blocks bb1 to bb4), Code Cache, and Runtime System; bb2 is copied into the code cache after bb1]
12
Dynamic Binary Rewriting
[Figure: Original Code (basic blocks bb1 to bb4), Code Cache, and Runtime System; bb1, bb2, and bb4 are in the code cache, bb3 is not]
No longer need to enter the runtime system
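A toy C model of the three slides above, under loudly stated assumptions: blocks are numbered 0 to 3 (bb1 to bb4), each block has a single successor on the executed path, and `run`, `cache_block`, and `runtime_stats` are hypothetical names. A block is translated the first time control reaches it, and a cached block's exit is then patched ("chained") to jump straight to its successor's copy, so after warm-up the runtime system is no longer entered.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_BLOCKS 4

struct cache_block {
    bool translated;    /* a copy exists in the code cache */
    bool exit_linked;   /* exit jumps directly to the successor's copy */
};

struct runtime_stats {
    int runtime_entries;  /* times control went through the runtime system */
    int translations;     /* blocks copied into the code cache */
};

/* Execute `steps` basic blocks starting at `start`, following successor[]. */
struct runtime_stats run(const int successor[N_BLOCKS], int start, int steps)
{
    struct cache_block cache[N_BLOCKS] = {{false, false}};
    struct runtime_stats st = {0, 0};
    int cur = start;

    /* The first block is always reached through the runtime system. */
    st.runtime_entries++;
    cache[cur].translated = true;
    st.translations++;

    for (int i = 1; i < steps; i++) {
        int next = successor[cur];
        if (!cache[cur].exit_linked) {
            st.runtime_entries++;              /* exit stub enters runtime */
            if (!cache[next].translated) {
                cache[next].translated = true; /* translate on demand */
                st.translations++;
            }
            cache[cur].exit_linked = true;     /* chain: patch the exit */
        }
        cur = next;                            /* chained blocks run directly */
    }
    return st;
}
```

With `successor = {1, 3, 3, 0}` (bb1 to bb2 to bb4 and back to bb1) and 30 steps, the runtime system is entered only 4 times and 3 blocks are translated; bb3 is never copied because it is never reached.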
13
Dynamic Binary Rewriting
  • Has been used to rewrite operating systems
      • Virtualization (VMware)
      • Emulation (QEMU)
  • Never used to instrument operating systems
  • Never used to rewrite the host OS in a general manner
  • Allows instrumentation of a live system

14
Outline
  • Prototype (JIFL)
      • Design
      • OS issues
  • Performance comparison
      • Kprobes vs. JIFL
  • Example plugin
      • Checking branch hint directions

15
Prototype Design
  • JIFL: JIT Instrumentation Framework for Linux
  • Instruments code reachable from system calls

16
JIFL Software Architecture
[Figure: JIFL software architecture]
User space:
  • JIFL Plugin Starter
Kernel space:
  • JIFL Plugin (loadable kernel module)
  • Linux kernel (all code reachable from system calls)
  • JIFL (loadable kernel module)
      • Code cache
      • Runtime system: dispatcher, JIT compiler
      • Heap and memory allocator
17
Gaining Control
  • The runtime system must gain control before it can
    start rewriting/instrumenting the OS
  • Update the system call table entry to point to a
    dynamically emitted entry stub, which
      • Calls per-system-call instrumentation
      • Calls the dispatcher, passing the original
        system call pointer
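The control hand-off can be sketched in user-space C. This is a simulation under explicit assumptions: `sys_call_table` here is an ordinary array standing in for the kernel's table, handlers take one `long` argument, the per-system-call instrumentation is just a counter, and `dispatcher` simply calls the original handler (JIFL's would run it from the code cache). Because C cannot emit code at run time portably, the per-entry stubs are pre-generated with a macro; all names are hypothetical, and locating and writing the real kernel table involves extra steps this sketch ignores.

```c
#include <stddef.h>

#define NR_SYSCALLS 4

typedef long (*syscall_fn)(long arg);

syscall_fn sys_call_table[NR_SYSCALLS];          /* stand-in table */
static syscall_fn saved_original[NR_SYSCALLS];   /* original entries */
static unsigned long syscall_count[NR_SYSCALLS]; /* per-syscall instrumentation */

/* Hypothetical dispatcher: would translate/run the handler from the code
 * cache; here it just calls the original handler. */
static long dispatcher(syscall_fn original, long arg)
{
    return original(arg);
}

/* Body shared by every entry stub. */
static long stub_body(int nr, long arg)
{
    syscall_count[nr]++;                          /* per-syscall instrumentation */
    return dispatcher(saved_original[nr], arg);   /* pass the original pointer */
}

/* Stand-ins for dynamically emitted stubs, one per table entry. */
#define DEFINE_STUB(n) \
    static long entry_stub_##n(long arg) { return stub_body(n, arg); }
DEFINE_STUB(0)
DEFINE_STUB(1)
DEFINE_STUB(2)
DEFINE_STUB(3)
static const syscall_fn entry_stubs[NR_SYSCALLS] = {
    entry_stub_0, entry_stub_1, entry_stub_2, entry_stub_3
};

/* Redirect one table entry to its entry stub. */
void jifl_take_control(int nr)
{
    saved_original[nr] = sys_call_table[nr];
    sys_call_table[nr] = entry_stubs[nr];
}

/* Example handler used to exercise the sketch. */
static long sys_double(long arg) { return 2 * arg; }
```

Callers keep invoking `sys_call_table[nr]` exactly as before; only the entry's target changes, which is what lets the runtime system gain control without modifying kernel code.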

18
Dispatcher
  • Saves registers and condition code state
  • Checks whether the target basic block is in the
    code cache
      • If so, jumps to this basic block
      • Otherwise, invokes the JIT to compile and
        instrument the new basic block
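The dispatcher's lookup step can be sketched as a hash table from original basic-block addresses to code cache addresses. In this C sketch, open addressing with linear probing is an assumed design (the slides do not say how JIFL indexes its cache), `jit_compile` is a hypothetical stand-in that merely hands out fake cache addresses, register saving is not modeled, and the table is assumed never to fill.

```c
#include <stdint.h>
#include <stddef.h>

#define MAP_SLOTS 64   /* power of two; sketch assumes it never fills */

struct map_slot { uintptr_t orig; uintptr_t cached; };  /* orig == 0: empty */

static struct map_slot block_map[MAP_SLOTS];
static uintptr_t next_cache_addr = 0x1000;  /* hypothetical cache cursor */
static int jit_invocations;

/* Hypothetical JIT stand-in: "translates" the block at orig and returns
 * where its instrumented copy lives in the code cache. */
static uintptr_t jit_compile(uintptr_t orig)
{
    (void)orig;
    jit_invocations++;
    uintptr_t where = next_cache_addr;
    next_cache_addr += 0x40;     /* pretend each block takes 64 bytes */
    return where;
}

static size_t slot_for(uintptr_t orig)
{
    return (size_t)(orig ^ (orig >> 6)) & (MAP_SLOTS - 1);
}

/* Return the code cache address for the block at orig, JITing on a miss. */
uintptr_t dispatch(uintptr_t orig)
{
    size_t i = slot_for(orig);
    while (block_map[i].orig != 0) {
        if (block_map[i].orig == orig)
            return block_map[i].cached;   /* hit: jump to cached copy */
        i = (i + 1) & (MAP_SLOTS - 1);    /* linear probing */
    }
    uintptr_t cached = jit_compile(orig); /* miss: compile and instrument */
    block_map[i].orig = orig;
    block_map[i].cached = cached;
    return cached;
}
```

Calling `dispatch` twice with the same address invokes the JIT only once; the second call is a table hit.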

19
JIT Compiler
  • Like a conventional JIT compiler, except that its
    input and output are x86 machine code
  • Compiles at a dynamic basic block granularity
  • All instructions except the final control flow
    instruction are copied directly into the code cache
  • Control flow instructions are modified to account
    for the new location of the code
  • Communicates with the JIFL plugin to determine
    what instrumentation to insert
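Modifying a control flow instruction for its new location comes down to displacement arithmetic. For a 5-byte `call rel32` (E8) or `jmp rel32` (E9), the target is the address of the next instruction plus the signed 32-bit displacement, so moving the instruction means recomputing that displacement. This C sketch (hypothetical helper names, 32-bit addresses as on the i386 kernel in the slides) decodes the original target and re-encodes the instruction to reach a chosen destination. In a rewriter, that destination would typically be the cached copy of the target or an exit stub into the dispatcher so control stays in the code cache; which one JIFL uses is not stated on the slide. Short branches with 8-bit displacements may no longer reach and are not handled here.

```c
#include <stdint.h>

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Absolute target of a 5-byte rel32 branch located at insn_addr.
 * Unsigned arithmetic wraps mod 2^32, matching two's-complement rel32. */
uint32_t rel32_target(const uint8_t insn[5], uint32_t insn_addr)
{
    return insn_addr + 5u + read_le32(insn + 1);
}

/* Rewrite the displacement so the branch, placed at new_addr, reaches dest. */
void set_rel32_target(uint8_t insn[5], uint32_t new_addr, uint32_t dest)
{
    write_le32(insn + 1, dest - (new_addr + 5u));
}
```

For example, `call +0x10` at 0x1000 targets 0x1015; copied to 0x2000 and re-targeted at 0x1015, its displacement becomes 0xFFFFF010 (that is, -0xFF0).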

20
JIT Inserting Instrumentation
  • Instrumentation is added by inserting a call
    instruction into the basic block
  • Additional instructions are also needed to
      • Push/pop instrumentation parameters
      • Save/restore volatile registers (eax, edx, ecx)
      • Save/restore the condition code register
  • Several optimizations can be performed to reduce
    instrumentation cost
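The unoptimized sequence the slide describes can be sketched as a small x86 code emitter in C. It writes `pushf`, saves eax/ecx/edx, pushes one 32-bit parameter, emits `call rel32`, pops the parameter with `add $4,%esp`, and restores everything in reverse. The `emitter` struct and function names are hypothetical, and the sketch assumes the instrumentation routine takes its argument on the stack (cdecl-style), as the slide's push/pop wording suggests; the byte encodings are the standard i386 ones.

```c
#include <stddef.h>
#include <stdint.h>

struct emitter {
    uint8_t *buf;     /* code cache memory being filled */
    uint32_t base;    /* run-time address of buf[0] */
    size_t len;       /* bytes emitted so far */
};

static void emit8(struct emitter *e, uint8_t b) { e->buf[e->len++] = b; }

static void emit32(struct emitter *e, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        emit8(e, (uint8_t)(v >> (8 * i)));   /* little-endian */
}

/* Emit a call to fn(arg), preserving flags and caller-saved registers. */
void emit_instrumentation_call(struct emitter *e, uint32_t fn, uint32_t arg)
{
    emit8(e, 0x9C);                    /* pushf          */
    emit8(e, 0x50);                    /* push %eax      */
    emit8(e, 0x51);                    /* push %ecx      */
    emit8(e, 0x52);                    /* push %edx      */
    emit8(e, 0x68); emit32(e, arg);    /* push $arg      */
    emit8(e, 0xE8);                    /* call fn        */
    /* rel32 is relative to the end of the call: current pos + 4 bytes */
    emit32(e, fn - (e->base + (uint32_t)e->len + 4u));
    emit8(e, 0x83); emit8(e, 0xC4); emit8(e, 0x04);  /* add $4,%esp */
    emit8(e, 0x5A);                    /* pop %edx       */
    emit8(e, 0x59);                    /* pop %ecx       */
    emit8(e, 0x58);                    /* pop %eax       */
    emit8(e, 0x9D);                    /* popf           */
}
```

The call itself is 5 of the 21 bytes emitted; the other 16 are pure overhead, which is what the optimizations on the following slides attack.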

21
Eliminating Redundant State Saving
  • Eliminate code that saves dead registers and
    condition codes
      • Perform liveness analysis
  • Reduce state-saving overhead for per-basic-block
    instrumentation
      • Search for the cheapest place in the block to insert it
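Both bullets can be illustrated with one backward liveness pass over a basic block. In this C sketch each instruction is reduced to hypothetical use/def bitmasks over the state an instrumentation call would clobber (eax, ecx, edx, flags); the set live before instruction i is its uses plus whatever is live after it and not redefined by it. The cheapest insertion point is the one with the fewest live values to save. The encoding, function names, and the conservative live-out set are assumptions of the sketch, not JIFL's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* State an instrumentation call may clobber (hypothetical encoding). */
enum { R_EAX = 1u << 0, R_ECX = 1u << 1, R_EDX = 1u << 2, R_FLAGS = 1u << 3 };

struct insn_info {
    uint8_t use;   /* state read by the instruction */
    uint8_t def;   /* state written by the instruction */
};

static int count_bits(unsigned m)
{
    int n = 0;
    for (; m; m &= m - 1)
        n++;
    return n;
}

/* live[i] = state live just before instruction i; live[n] = live_out.
 * Returns the insertion point (0..n) needing the fewest saves. */
size_t cheapest_insertion_point(const struct insn_info *code, size_t n,
                                uint8_t live_out, uint8_t *live)
{
    live[n] = live_out;
    for (size_t i = n; i-- > 0; )
        live[i] = (uint8_t)(code[i].use | (live[i + 1] & ~code[i].def));

    size_t best = 0;
    for (size_t p = 1; p <= n; p++)
        if (count_bits(live[p]) < count_bits(live[best]))
            best = p;
    return best;
}
```

Applied to the ten instructions of the area of interest from the earlier slides, conservatively treating eax, ecx, edx, and the flags as live at block exit, the cheapest point is before `mov 28(edi),eax`, where only ecx and edx need saving: eax is about to be overwritten there, and the flags are overwritten by the following `add` before anything reads them.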

22
Inlining Instrumentation
  • Small instrumentation can be inlined into the
    basic block
      • Removes the call and ret instructions
      • Constant parameters are propagated to remove
        stack accesses
      • Copy propagation and dead-code elimination are
        applied to specialize the instrumentation routine
        for its context
  • All done on native x86 code. No IR!

23
Effect of Optimizations
  • Average system call latencies with per-basic-block
    instrumentation
[Chart: normalized execution time]
24
Prototype
  • Operating System Issues

25
Memory Allocator
  • While JITing, JIFL often needs to allocate dynamic
    memory
  • Cannot rely on the Linux kmalloc and vmalloc routines,
    as they are not reentrant
  • Instead, we created our own memory allocator
      • Pre-allocates a heap when JIFL starts up
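One simple way to serve allocations from a heap reserved at start-up is a fixed-size-block free list. The slide does not describe JIFL's allocator design, so this C sketch swaps in that technique under stated assumptions: a static arena stands in for the pre-allocated heap, every request must fit in one 64-byte block, and there is no locking (a real SMP kernel allocator would need per-CPU pools or a lock). Names are hypothetical.

```c
#include <stddef.h>

#define POOL_BLOCK_SIZE 64
#define POOL_BLOCKS 128

union pool_block {
    union pool_block *next;                  /* link while free */
    unsigned char bytes[POOL_BLOCK_SIZE];    /* payload while allocated */
};

static union pool_block pool[POOL_BLOCKS];   /* reserved up front */
static union pool_block *free_list;

/* Thread every block onto the free list; call once at start-up. */
void pool_init(void)
{
    free_list = NULL;
    for (size_t i = POOL_BLOCKS; i-- > 0; ) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* O(1) allocation; never calls into the kernel's own allocators. */
void *pool_alloc(size_t size)
{
    if (size > POOL_BLOCK_SIZE || free_list == NULL)
        return NULL;
    union pool_block *b = free_list;
    free_list = b->next;
    return b;
}

void pool_free(void *p)
{
    union pool_block *b = p;
    b->next = free_list;
    free_list = b;
}
```

Because all memory is reserved in `pool_init`, allocating while JITing never re-enters kmalloc or vmalloc, which is the property the slide asks for.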

26
Releasing Control
  • Calls to schedule() have to be redirected
      • Otherwise, JIFL keeps control even after a context
        switch
  • The redirected call has to
      • Save the return address in a hash table
      • Call schedule()
      • Look up the saved return address and call the dispatcher
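The three steps can be sketched in C with a small per-thread hash table. Assumptions, all hypothetical: threads are identified by a `uintptr_t` key (for example, a task pointer), `schedule` and `dispatch` are passed in as function pointers so the sketch runs outside the kernel, `fake_schedule` and `record_dispatch` are test doubles, and there is no locking or overflow handling. Keying by thread matters because other threads may pass through the same wrapper while this one is switched out.

```c
#include <stdint.h>
#include <stddef.h>

#define RET_SLOTS 32   /* power of two; sketch assumes it never fills */

struct ret_slot { uintptr_t thread; uintptr_t ret; int used; };
static struct ret_slot ret_table[RET_SLOTS];

static size_t ret_hash(uintptr_t thread)
{
    return (size_t)(thread >> 4) & (RET_SLOTS - 1);
}

/* Insert or update the saved return address for a thread. */
void ret_save(uintptr_t thread, uintptr_t ret)
{
    size_t i = ret_hash(thread);
    while (ret_table[i].used && ret_table[i].thread != thread)
        i = (i + 1) & (RET_SLOTS - 1);           /* linear probing */
    ret_table[i].thread = thread;
    ret_table[i].ret = ret;
    ret_table[i].used = 1;
}

/* Return the saved address, or 0 if the thread has none. */
uintptr_t ret_lookup(uintptr_t thread)
{
    size_t i = ret_hash(thread);
    while (ret_table[i].used) {
        if (ret_table[i].thread == thread)
            return ret_table[i].ret;
        i = (i + 1) & (RET_SLOTS - 1);
    }
    return 0;
}

typedef void (*schedule_fn)(void);
typedef void (*dispatch_fn)(uintptr_t orig_addr);

/* Replacement for a call to schedule() found in translated code. */
void jifl_schedule_wrapper(uintptr_t thread, uintptr_t ret_addr,
                           schedule_fn schedule, dispatch_fn dispatch)
{
    ret_save(thread, ret_addr);        /* 1. remember where to resume */
    schedule();                        /* 2. run the real scheduler natively */
    dispatch(ret_lookup(thread));      /* 3. re-enter the code cache */
}

/* Test doubles. */
static uintptr_t last_dispatched;
static void fake_schedule(void) { }
static void record_dispatch(uintptr_t a) { last_dispatched = a; }
```

Since schedule() runs natively, whichever thread it switches to continues in its own code rather than in JIFL's code cache, and this thread re-enters the cache only when it resumes.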

27
Performance Comparison
  • JIFL vs. Kprobes

28
Performance Evaluation
  • Instrument every system call with three types of
    instrumentation
      • System call monitoring (coarse-grained)
      • Call tracing (medium-grained)
      • Basic block counting (fine-grained)
  • LMbench and ApacheBench2 benchmarks
  • Test setup
      • 4-way Intel Pentium 4 Xeon SMP, 2.8 GHz
      • Linux 2.6.17.13
          • With SMP support and no preemption

29
System Call Monitoring
[Chart: normalized execution time]
30
Call Tracing
[Chart: normalized execution time, log scale]
31
Basic Block Counting
[Chart: normalized execution time, log scale]
32
Apache Throughput
[Chart: normalized requests per second]
33
Example Plugin
  • Checking Correctness of Branch Hints

34
Example Plugin Checking Branch Hints
int correct_count ← 0
int incorrect_count ← 0

// Called for every newly discovered basic block.
procedure Basic_Block_Callback
    if last instruction is not a hinted branch
        return
    if hinted in the branch-not-taken direction
        call Insert_Branch_Not_Taken_Instrumentation(Increment_Counter, correct_count)
        call Insert_Branch_Taken_Instrumentation(Increment_Counter, incorrect_count)
    else
        // Insert the same instrumentation, but for the
        // reverse branch directions

// Executed for every instrumented branch.
// Counter is passed by reference.
procedure Increment_Counter(int Counter)
    Counter ← Counter + 1
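A runnable C version of the plugin pseudocode, loosely following its names. The plugin logic (skip unhinted blocks, count the hinted direction as correct and the other as incorrect) comes from the slide; everything about the framework is an assumption: `basic_block`, the two `insert_branch_*_instrumentation` functions, and `execute_branch` are hypothetical stand-ins that record which routine to run on each edge and then simulate the instrumented branch executing.

```c
#include <stdbool.h>
#include <stddef.h>

static unsigned long correct_count, incorrect_count;

typedef void (*instr_fn)(unsigned long *counter);

/* Executed for every instrumented branch; counter passed by reference. */
static void increment_counter(unsigned long *counter) { (*counter)++; }

/* Hypothetical view of a block's final branch plus attached hooks. */
struct basic_block {
    bool ends_in_hinted_branch;
    bool hint_says_taken;
    instr_fn on_taken;      unsigned long *taken_arg;
    instr_fn on_not_taken;  unsigned long *not_taken_arg;
};

static void insert_branch_taken_instrumentation(struct basic_block *bb,
                                                instr_fn f, unsigned long *arg)
{
    bb->on_taken = f;
    bb->taken_arg = arg;
}

static void insert_branch_not_taken_instrumentation(struct basic_block *bb,
                                                    instr_fn f, unsigned long *arg)
{
    bb->on_not_taken = f;
    bb->not_taken_arg = arg;
}

/* Called for every newly discovered basic block. */
void basic_block_callback(struct basic_block *bb)
{
    if (!bb->ends_in_hinted_branch)
        return;
    if (!bb->hint_says_taken) {   /* hinted not taken */
        insert_branch_not_taken_instrumentation(bb, increment_counter, &correct_count);
        insert_branch_taken_instrumentation(bb, increment_counter, &incorrect_count);
    } else {                      /* same, for the reverse directions */
        insert_branch_taken_instrumentation(bb, increment_counter, &correct_count);
        insert_branch_not_taken_instrumentation(bb, increment_counter, &incorrect_count);
    }
}

/* Simulates the block's final branch executing in one direction. */
void execute_branch(const struct basic_block *bb, bool taken)
{
    instr_fn f = taken ? bb->on_taken : bb->on_not_taken;
    unsigned long *arg = taken ? bb->taken_arg : bb->not_taken_arg;
    if (f)
        f(arg);
}
```

The misprediction rate for the hinted branches is then incorrect_count / (correct_count + incorrect_count).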
35
Example Plugin Checking Branch Hints
  • 5 system calls with poor branch hint performance
      • Misprediction rates > 75%
      • Contained > 30% of hinted branches executed
  • Examined using a second plugin
      • Monitored individual branches
      • Found the 4 greatest contributors
      • Mapped them back to source code
  • Can't fix: not hinted by the programmer!

36
Conclusions
  • JIT instrumentation is viable for operating systems
  • Developed a prototype for the Linux kernel (JIFL)
  • Results are very competitive
      • JIFL outperforms Kprobes by orders of magnitude
  • Enables more powerful instrumentation
      • e.g., checking branch hints