Performance Tools in Managed Runtime Environments

1
Performance Tools in Managed Runtime Environments
  • Padma Apparao
  • Performance Architect
  • Intel Corporation
  • March 23rd, 2003

2
Outline
  • Motivation
  • Overview of Run-Time Workloads
  • Characterization/Optimization Methodology
  • Profiling Techniques and Tools
  • Examples of Use
  • Limitations and Desired Enhancements

3
Introduction and Motivation
  • Runtime environments introduce a level of
    indirection between the user code and the
    underlying hardware architecture
  • Usage of tools in runtimes has subtle
    differences when compared to static apps
  • Just-in-time compilation
  • Code generation and layout are different between
    runs and within the same run
  • For example, profiling tools need the ability to
    resolve an address down to its method and offset.
  • A whole new world of profiling the heap has
    opened up
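
The address-resolution requirement above can be sketched as an ordered map from code start addresses to methods. This is only an illustration, not a tool from the talk: the class JitSymbolTable and its methods methodLoaded, methodUnloaded and resolve are hypothetical names, and a real profiler would receive load and unload notifications from the runtime's profiling interface.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical symbol table for JIT-compiled code regions. */
class JitSymbolTable {
    /** One compiled method body occupying [start, start + length). */
    private static final class Region {
        final long start;
        final long length;
        final String method;

        Region(long start, long length, String method) {
            this.start = start;
            this.length = length;
            this.method = method;
        }
    }

    // Regions keyed by start address so the closest preceding start can be found.
    private final TreeMap<Long, Region> regions = new TreeMap<>();

    /** Record that a method was compiled (or recompiled) and placed at 'start'. */
    void methodLoaded(String method, long start, long length) {
        regions.put(start, new Region(start, length, method));
    }

    /** Record that the compiled code starting at 'start' was discarded. */
    void methodUnloaded(long start) {
        regions.remove(start);
    }

    /** Returns "method+0xOFFSET", or null if the address lies in no known region. */
    String resolve(long address) {
        Map.Entry<Long, Region> entry = regions.floorEntry(address);
        if (entry == null) {
            return null; // address is below every known region
        }
        Region r = entry.getValue();
        long offset = address - r.start;
        if (offset >= r.length) {
            return null; // address falls in a gap after the nearest region
        }
        return r.method + "+0x" + Long.toHexString(offset);
    }
}
```

Because code layout can change within a run, a sample is only resolved correctly against the table as it stood when the sample was taken; a fuller version would also keep timestamps per region.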

4
Runtime App Characteristics
  • Non-Steady State
  • Non-uniformity of the app itself is a problem:
    every transaction may not behave the same way
    from start to finish
  • Static applications may also have problems with
    steady state.
  • Managed runtimes may have additional steady
    state issues due to:
  • Garbage collection characteristics that modify
    the behavior of the app
  • Dynamic jitting
  • Characterizing the homogeneity of the application
    is important.
  • Java programs tend to be overly synchronized
  • Most locks are not contended (less than 10%)
  • Locks are expensive (10% of the time is spent in
    lock-related instructions; varies by workload)

5
Runtime App Characteristics
  • Workloads tend to be very branchy
  • 1 branch in every 5 instructions or so.
  • Pointer chasing, short methods.
  • Large number of small methods, typical of OO
    code => many calls and returns
  • For Java apps, approx. 50 instructions per call.
  • Tradeoffs between in-lining (code bloat) vs.
    extensive calls.

Other names and brands may be claimed as the
property of others.
6
Workload Homogeneity
Correlating performance metrics in non-steady
workloads is difficult
7
Performance Methodology
  • Magnitude of improvements depends on
  • Maturity of application
  • Previous level of performance tuning
  • Performance Methodology
  • Define and understand the workload: this is key
  • Follow a systematic tuning approach
  • Study effects at various levels: system,
    application, micro-architecture
  • Use the right tools
  • Use the closed loop cycle: let the results of
    one iteration direct the next
  • Make one change at a time

8
Top-Down Closed Loop Methodology
Top-Down Approach
Closed Loop
9
Types of Tools
  • Hardware and Software
  • Non-intrusive hardware counters
  • Operating system or application code counters
  • Profiling and Instrumentation
  • System level profiling
  • Application call-tree information via
    instrumentation
  • Event-Based and Time-Based
  • Sampling based on occurrence of particular events
    within the processor, e.g. cache misses,
    instructions retired
  • Sampling based on clock ticks
  • Select the appropriate kind of tool for your
    application
  • Should be minimally intrusive
  • Should provide relevant and accurate information
    to optimize your application
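
Time-based sampling, as listed above, can be illustrated with a minimal in-process sampler. This is a sketch under assumptions, not how VTune or any tool named in the talk works: StackSampler is a hypothetical class, and it relies on the standard Thread.getAllStackTraces() call, which only observes threads at points the JVM chooses and is therefore more biased than a hardware or OS timer.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical time-based sampler: tallies the top stack frame of every thread. */
class StackSampler {
    private final Map<String, Integer> hits = new HashMap<>();

    /** Takes one sample of all live threads except the sampler's own thread. */
    void sampleOnce() {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] stack = e.getValue();
            if (e.getKey() == Thread.currentThread() || stack.length == 0) {
                continue; // skip ourselves and threads with no Java frames
            }
            String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
            hits.merge(frame, 1, Integer::sum);
        }
    }

    /** Samples 'count' times, sleeping 'intervalMillis' between samples. */
    Map<String, Integer> run(int count, long intervalMillis) {
        for (int i = 0; i < count; i++) {
            sampleOnce();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                break;
            }
        }
        return hits;
    }
}
```

Methods with the highest hit counts are where the sampled threads were most often found. A longer interval lowers intrusiveness at the cost of resolution, which is the trade-off the slide points to.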

10
Tools Hierarchy
  • System Level Monitoring
  • Processor
  • Memory
  • Network
  • Disk
  • Application Level Profiling
  • Lock contention
  • Heap contention
  • Threading
  • Good vs. bad APIs
  • Micro-Architecture Level Event monitoring
  • Branch prediction
  • Cache performance
  • Data alignment

11
Windows vs. Linux
Linux tools depend on the kernels supported
12
System Level Tools
  • Perfmon (Windows) / iostat, sar (Linux)
  • Counters arranged by object (subsystem)
  • Processor, Memory, Disk, Network, File System
  • Derive standard formulas (ratios) for good vs.
    bad subsystem performance
  • File system usage, CPI, cache behaviour
  • Disk and network I/O bottlenecks
  • Memory latency
  • Advantages
  • Low hanging fruit
  • Helps identify obvious problems early
  • Low intrusiveness on the system
  • These problems are usually easy to fix

13
System Level Tools
  • Perfmon on Windows
  • Well integrated and complete
  • New objects added to the registry are
    automatically detected
  • Counters available for the runtime in Microsoft .NET
  • CLR Memory object: GC counters are exposed
  • On Linux, extensive information is available in
    /proc, but tools are needed to extract it
  • New drivers may put information into /proc

14
IOstat Example
iostat on Linux can report queue sizes, wait time
in the queue, and service time
15
Application Level Tools
  • Profilers
  • VTune, Visual Quantify, Metrowerks CodeWarrior,
    JProbe, Optimizeit, strace, ltrace
  • Show where the time is being spent
  • Ntoskrnl.exe (kernel time)
  • Hal.dll (hardware drivers)
  • Ntdll.dll (synchronization, heap/memory mgt)
  • vmlinux
  • Analyze results to determine
  • Kernel vs. User time
  • Lock implementation latency
  • Efficiency of protected resource
  • Good vs. Bad APIs
  • Thread and lock contention issues

16
Application Level Profilers: Heap Profiling
  • Memory and Heap profiling
  • Heap is a tangled web of object references
  • Sizes of heap generations, heap expansion and
    shrinkage
  • How many objects created, and of what type/class
  • Nature of object graph connectivity and depth
  • How much heap is used, how much is transient
  • Identification of memory leaks
  • Hprof (with Java HAT and HPjmeter) used for heap
    profiling
  • JProbe Memory Debugger
  • http://www.sitraka.com/software/jprobe/jprobedebugger.html
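
As a low-overhead complement to the heap profilers above, the questions "how much heap is used" and "sizes of heap generations" can be approximated from inside the application. The sketch below assumes a JVM that provides the standard java.lang.management API (added in Java 5, so after this talk was given). HeapSnapshot is a hypothetical helper name, and pool names depend on the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that reports heap occupancy per memory pool. */
class HeapSnapshot {
    /** Used bytes for each heap pool; with generational collectors, pools map roughly to generations or spaces. */
    static Map<String, Long> usedBytesByPool() {
        Map<String, Long> used = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // ignore code cache, metaspace and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                used.put(pool.getName(), usage.getUsed());
            }
        }
        return used;
    }

    /** Total heap bytes currently in use, as reported by the runtime. */
    static long totalUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }
}
```

Polling these values over time shows heap expansion and shrinkage, but not which objects are responsible. For object types, allocation sites and leaks, a heap profiler is still needed.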

17
Heap Profiling Example (using Hprof)
18
Residual Objects (Hprof data displayed by
HPjmeter)
Shows the objects still lingering in the heap and
which method allocated them
19
Call Graph Data showing CPU Time distribution
(Hprof data displayed by HPjmeter)
CPU time spent in each method and call graph tree
20
Garbage Collection
  • Garbage Collection
  • How often does GC kick in?
  • Which method invoked GC?
  • What is the GC pause time?
  • How many objects were reclaimed during each GC?
  • Use verbosegc to gather GC stats
  • CLR Allocation Profiler (AP)
  • Details of object allocation inside the CLR's
    heap
  • Sizes and frequency of collection of each
    generation
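
Two of the questions above, how often GC kicks in and how much time it takes, can also be answered programmatically. This is a sketch assuming the standard java.lang.management API (Java 5 and later, so newer than this talk); GcStats is a hypothetical helper name. The reported time is accumulated collection time per collector, which is not necessarily the same as application pause time for concurrent collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that sums GC activity across all collectors in the JVM. */
class GcStats {
    /** Returns {total collections so far, total accumulated collection time in milliseconds}. */
    static long[] totals() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, millis};
    }
}
```

Calling totals() before and after a measurement interval and subtracting gives the number of collections and the GC time for that interval. Per-collection detail such as objects reclaimed still requires GC logging or a profiler.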

21
Garbage Collection Data with 6 GC threads
GCViewer (tagtraum industries)
22
Garbage Collection Data with 10 GC threads
23
Object Profiling
  • Object profiling: can track objects, their sizes,
    lifetimes, memory allocations
  • Object references and scope, hot objects
  • Object access patterns
  • Can start from an object and walk down its path
    of references
  • Can start from a class and look at all its object
    allocations
  • Allocator methods

24
Object Lifetime Profiling
25
Also possible: JIT and Lock Profiling
  • Track JIT optimizations for profile information
  • Rejitted code: how often does a method get jitted?
  • Inlined functions
  • Function splitting
  • Lock profiling
  • Useful for scalability/synchronization
  • Thin and inflated lock statistics
  • Contended locks
  • with average contention, max contention
  • average and maximum hold times, acquisitions.
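
A coarse view of lock contention is available without a dedicated lock profiler. The sketch below assumes the standard java.lang.management API (Java 5 and later, newer than this talk); ContentionStats is a hypothetical helper name. It only sees live threads and only counts blocking on synchronized monitors, so it gives none of the thin versus inflated lock detail or hold times listed above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper that totals monitor contention over all live threads. */
class ContentionStats {
    /** Returns {times threads blocked entering a monitor, total blocked ms or -1 if timing is unavailable}. */
    static long[] totals() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (threads.isThreadContentionMonitoringSupported()) {
            // Blocked-time measurement is off by default because it adds overhead.
            threads.setThreadContentionMonitoringEnabled(true);
        }
        long blockedCount = 0;
        long blockedMillis = 0;
        boolean timed = false;
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info == null) {
                continue; // thread terminated between the two calls
            }
            blockedCount += info.getBlockedCount();
            long t = info.getBlockedTime(); // -1 when contention monitoring is disabled
            if (t >= 0) {
                blockedMillis += t;
                timed = true;
            }
        }
        return new long[] {blockedCount, timed ? blockedMillis : -1};
    }
}
```

Comparing the blocked count across runs with different thread counts gives a first indication of whether synchronization limits scalability.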

26
Thread Profiling
  • Thread Analyzer: useful for multithreaded
    programs
  • Detect race conditions
  • Detect deadlocks and predict them
  • Display status of threads: running, blocked, etc.
  • Point to source code where contentions occur
  • JProbe Thread Analyzer
  • http://www.sitraka.com/software/jprobe/jprobethreadalyzer.html
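
Two of the capabilities above, displaying thread status and detecting deadlocks, can be sketched with the standard java.lang.management API (Java 5 and later; findDeadlockedThreads was added in Java 6, both newer than this talk). ThreadStatus is a hypothetical helper name. Unlike the thread analyzers described here, this only detects deadlocks that have already occurred; it does not predict them or find data races.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical helper that summarizes thread states and checks for deadlocks. */
class ThreadStatus {
    /** Counts live threads by state (RUNNABLE, BLOCKED, WAITING, ...). */
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info != null) { // null means the thread has already terminated
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Number of threads currently deadlocked on monitors or ownable synchronizers. */
    static int deadlockedThreadCount() {
        // May throw UnsupportedOperationException on JVMs that cannot monitor synchronizer usage.
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length; // null means no deadlock was found
    }
}
```

A large share of BLOCKED threads in repeated snapshots suggests lock contention worth investigating with a lock or thread profiler.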

27
Examples of Profiling Tools
  • Optimizeit
  • Has a Java Performance Suite with Profiler, Code
    Coverage tool and a thread debugger.
  • Optimizeit Profiler has two profilers
  • CPU profiler handles sampling and
    instrumentation techniques and provides
    information on execution time.
  • Memory profiler helps identify classes that
    consume the most memory or have the most
    instances.
  • JProbe
  • Has Memory Debugger, Java Profiler, Thread
    Analyzer and Coverage.
  • Memory: object lifetime analysis, identifying
    loitering objects and short-lived objects.
  • Thread profiler/analyzer identifies data race
    problems, detects deadlocks and predicts
    them.
  • VTune Performance Analyzer
  • Well Integrated with Java and .NET

28
µArchitecture Tools
  • Micro-Architecture Performance
  • VTune and Emon (Intel Architecture specific
    tools)
  • Aware of CPU performance counters
  • Cache hit/miss ratios
  • CPI (path length)
  • Branch performance
  • Misaligned data
  • Use after all other system and application level
    issues have been resolved
  • Bigger investment to optimize
  • Restructuring code / fundamental changes to
    design and architecture, specific to Intel
    architecture
  • Usually yields no more than a 5-10% increase

29
Desired Enhancements to Existing Tools
  • Current Profiling Tools use JVMPI to profile
    applications
  • Need a cheaper way to capture frequently
    generated events
  • Selection of appropriate methods to track
  • Bytecode instrumentation also available in some
    tools
  • Heap Profiling
  • Extremely intrusive: about 30-50x slower
  • Less expensive heap profiling needed
  • Profiling of JIT events is least intrusive, but
    requires more effort
  • A one-stop shop for all platforms and operating
    systems would reduce the learning curve for tool
    usage