Cache-aware Cross-profiling for Java Processors

1
Cache-aware Cross-profiling for Java Processors
  • P. Moret, W. Binder, A. Villazón
  • University of Lugano, Switzerland
  • M. Schoeberl
  • Vienna University of Technology, Austria

2
Overview
  • Motivation
  • Cross-profiling principles
  • Supported targets
  • Software architecture
  • Accuracy assessment
  • Overhead evaluation
  • Implementation issues
  • Ongoing research

3
Profiling
  • Analysis of program behavior
  • Detection of hot spots
  • Two general approaches
    • Exact profiling
    • Sampling
  • Calling context profiling
    • Dynamic metrics for each calling context
    • Data structure: Calling Context Tree (CCT)

4
Calling Context Tree (CCT)
  void f() {
    int i;
    for (i = 1; i <= 10; i++) {
      h();
      g(i);
    }
  }

  void g(int i) {
    int j;
    for (j = 1; j <= i; j++) {
      h();
    }
  }

  void h() { return; }

  Resulting CCT:

  f()         Invocations 1,  Cycles 10600
    h()       Invocations 10, Cycles 210
    g(int)    Invocations 10, Bytecodes 8100
      h()     Invocations 55, Bytecodes 1155

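
The invocation counts in this CCT can be reproduced with a minimal Java sketch. It assumes a hand-instrumented version of f, g and h in which every method receives the CCT node of its caller; the names CctNode and CctDemo are illustrative and are not part of CProf. Only invocation counts are tracked here; cycle or bytecode counts would be accumulated on the same nodes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** One CCT node: a method as reached through one specific chain of callers. */
final class CctNode {
    final String method;
    final Map<String, CctNode> callees = new LinkedHashMap<>();
    long invocations;

    CctNode(String method) {
        this.method = method;
    }

    /** Returns the child node for the given callee, creating it on first use. */
    CctNode callee(String name) {
        return callees.computeIfAbsent(name, CctNode::new);
    }
}

/** Hand-instrumented version of the slide's example (hypothetical, for illustration only). */
final class CctDemo {
    static void f(CctNode caller) {
        CctNode self = caller.callee("f()");
        self.invocations++;
        for (int i = 1; i <= 10; i++) {
            h(self);
            g(self, i);
        }
    }

    static void g(CctNode caller, int i) {
        CctNode self = caller.callee("g(int)");
        self.invocations++;
        for (int j = 1; j <= i; j++) {
            h(self);
        }
    }

    static void h(CctNode caller) {
        caller.callee("h()").invocations++;
    }

    /** Runs f() once under a fresh root and returns the root of the resulting CCT. */
    static CctNode run() {
        CctNode root = new CctNode("<root>");
        f(root);
        return root;
    }

    public static void main(String[] args) {
        CctNode f = run().callee("f()");
        CctNode g = f.callee("g(int)");
        System.out.println("f()            " + f.invocations);
        System.out.println("f() > h()      " + f.callee("h()").invocations);
        System.out.println("f() > g(int)   " + g.invocations);
        System.out.println("f() > g > h()  " + g.callee("h()").invocations);
    }
}
```

The two h() nodes stay separate because they are reached through different callers: h() called directly from f() is counted 10 times, while h() called from g(int) is counted 1 + 2 + ... + 10 = 55 times.
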
5
Profiling Support in Java
  • JVMPI
    • Experimental interface
    • Inflexible, limited set of events
  • JVMTI
    • Standard interface since JDK 1.5
    • Improved flexibility
  • Limitations of both interfaces
    • Native profiling agents, limited portability
    • Prevailing profilers are very slow
    • Measurement perturbation

6
Profiling of Embedded Applications
  • Application has to get by with limited resources
    • Concomitant performance analysis is essential
  • Application must be deployed on the target system
    • Hardware must be available in an early
      development phase
    • Deployment wastes development time
  • JVMs for embedded systems provide limited
    profiling support
    • E.g., subset of the JVMTI
  • Profiling on the embedded system is slow
    • JVMTI events
    • Slow CPU
    • Significant measurement perturbations
  • Calling context profiling may be impossible
    because of memory constraints

7
Deployment on Embedded Target
8
Our Approach
  • Cross-profiling
    • Application is profiled on any host
    • Profile represents the execution time on the
      embedded target
  • CProf: portable and extensible cross-profiling
    framework based on program transformations
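
To illustrate what "based on program transformations" can mean, here is a source-level sketch of a method after instrumentation. It is an assumption-laden illustration, not CProf's actual output: CProf rewrites bytecode, whereas this sketch shows the equivalent effect in Java source. The class names (CycleCounter, InstrumentedExample) and the per-block cycle values are made up, and the cost of the loop-condition block is ignored for brevity.

```java
/** Hypothetical per-thread accumulator that instrumented code updates. */
final class CycleCounter {
    private static final ThreadLocal<long[]> CYCLES =
            ThreadLocal.withInitial(() -> new long[1]);

    static void add(long estimate) {
        CYCLES.get()[0] += estimate;
    }

    static long get() {
        return CYCLES.get()[0];
    }

    static void reset() {
        CYCLES.get()[0] = 0;
    }
}

final class InstrumentedExample {
    // Made-up static estimates per basic block; a real tool would derive
    // them from the target processor's per-bytecode cycle counts.
    static final long ENTRY_BLOCK = 4;
    static final long LOOP_BODY = 12;
    static final long EXIT_BLOCK = 3;

    /**
     * Original method:
     *   int sum(int n) { int s = 0; for (int k = 0; k < n; k++) s += k; return s; }
     * The transformation adds one counter update at the start of each basic block.
     */
    static int sum(int n) {
        CycleCounter.add(ENTRY_BLOCK);
        int s = 0;
        for (int k = 0; k < n; k++) {
            CycleCounter.add(LOOP_BODY);
            s += k;
        }
        CycleCounter.add(EXIT_BLOCK);
        return s;
    }

    public static void main(String[] args) {
        CycleCounter.reset();
        int result = sum(5);
        System.out.println("result = " + result
                + ", estimated target cycles = " + CycleCounter.get());
    }
}
```

Because the estimates are added by the program itself, the profile depends only on which blocks execute, not on how fast the host runs them. That property is what lets the host stand in for the target.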

9
Cross-Profiling on Host
10
Cross-Profiling Target
  • Assumptions
    • Accurate, constant CPU cycle estimate for most
      bytecodes
    • Invoke/return bytecodes may consume a variable
      number of cycles
      • Depending on method size
      • Depending on instruction cache
    • Possibly inaccurate cycle estimates for some less
      frequently executed bytecodes
  • Assumptions met by some recent Java processors
  • Our target: JOP, the Java Optimized Processor
    • FPGA core
    • Designed to ease WCET analysis
    • Method cache
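
The variable cost of invoke/return bytecodes can be sketched with a toy method-cache model. This is a simplified illustration under stated assumptions, not JOP's actual cache: whole methods are cached, replacement is FIFO, and the constants HIT_CYCLES and LOAD_CYCLES_PER_WORD are invented. The class name MethodCacheModel is likewise hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy model of a cache that holds whole methods and evicts them in FIFO order. */
final class MethodCacheModel {
    static final int HIT_CYCLES = 4;            // invented constant
    static final int LOAD_CYCLES_PER_WORD = 2;  // invented constant

    private final int capacityWords;
    private final Deque<String> fifo = new ArrayDeque<>();
    private final Map<String, Integer> resident = new HashMap<>();
    private int usedWords;

    MethodCacheModel(int capacityWords) {
        this.capacityWords = capacityWords;
    }

    /**
     * Estimates the cycles for transferring control into the given method,
     * on an invoke (callee) or on a return (caller). A miss loads the whole
     * method, so its cost grows with the method size.
     */
    int access(String method, int sizeWords) {
        if (resident.containsKey(method)) {
            return HIT_CYCLES;
        }
        if (sizeWords > capacityWords) {
            throw new IllegalArgumentException("method larger than cache");
        }
        // Evict the oldest methods until the new one fits.
        while (usedWords + sizeWords > capacityWords) {
            String victim = fifo.removeFirst();
            usedWords -= resident.remove(victim);
        }
        fifo.addLast(method);
        resident.put(method, sizeWords);
        usedWords += sizeWords;
        return HIT_CYCLES + sizeWords * LOAD_CYCLES_PER_WORD;
    }

    public static void main(String[] args) {
        MethodCacheModel cache = new MethodCacheModel(64);
        System.out.println("invoke f: " + cache.access("f", 40)); // miss
        System.out.println("invoke g: " + cache.access("g", 20)); // miss
        System.out.println("return f: " + cache.access("f", 40)); // hit
        System.out.println("invoke h: " + cache.access("h", 30)); // miss, evicts f
        System.out.println("return f: " + cache.access("f", 40)); // miss again
    }
}
```

In this model the same return into f costs 4 cycles when f is still cached and 84 cycles after it has been evicted, which is why a cross-profiler that wants accurate totals has to track cache state on the host rather than use a constant per invoke/return.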

11
CProf
[Figure: CProf diagram; surviving labels: Basic Block Analysis, Method Bytecode]

12
JBE Accuracy (Percent Error)
13
Reasons for Inaccuracies
  • Imprecise estimates for certain bytecodes
  • Differences in the Java class libraries
    • Target: based on GNU Classpath
    • Host: Sun JDK 1.7
  • Automated memory management of the target not
    simulated on the host
  • Non-determinism
    • Thread scheduling
    • Hash codes
    • Etc.

14
JBE Overhead (Factor)
15
JVM98 Overhead (Factor)
16
Implementation Challenges
  • Instrumentation with complete bytecode coverage
  • Standard class library
  • Dynamically generated or loaded code
  • FERRARI (Framework for Exhaustive Rewriting and
    Reification with Advanced Runtime
    Instrumentation)
  • Bootstrapping support
  • Dynamic bypass for skipping instrumented code
  • Efficient calling context reification
  • Preserving accurate calling context across native
    method invocations
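
The "dynamic bypass" item can be illustrated with a small sketch. It assumes a per-thread flag that instrumented code checks before calling into the profiler; FERRARI's real mechanism may differ, and the names Bypass and BypassDemo are hypothetical. The point is that once the standard class library is instrumented too, the profiler's own use of library code would otherwise re-enter the profiler.

```java
/** Hypothetical per-thread bypass flag; while active, instrumentation is skipped. */
final class Bypass {
    private static final ThreadLocal<Boolean> ACTIVE =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    static boolean active() {
        return ACTIVE.get();
    }

    /** Runs profiler code with the bypass switched on, restoring the old state afterwards. */
    static void run(Runnable profilerCode) {
        boolean previous = ACTIVE.get();
        ACTIVE.set(Boolean.TRUE);
        try {
            profilerCode.run();
        } finally {
            ACTIVE.set(previous);
        }
    }
}

final class BypassDemo {
    static int recorded;

    /** Stands in for an instrumented method of the class library. */
    static void instrumentedLibraryMethod() {
        if (!Bypass.active()) {
            Bypass.run(BypassDemo::recordEvent);
        }
        // The original method body would follow here.
    }

    /** Stands in for profiler code that itself relies on instrumented library code. */
    static void recordEvent() {
        recorded++;
        // Without the bypass check this call would recurse without end.
        instrumentedLibraryMethod();
    }

    public static void main(String[] args) {
        instrumentedLibraryMethod();
        instrumentedLibraryMethod();
        System.out.println("recorded events: " + recorded);
    }
}
```

Each application-level call is recorded exactly once, and the nested call made from inside the profiler takes the uninstrumented path.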

17
Ongoing Research
  • Simulation of some aspects of the target on the
    host
    • Data cache
    • Automated memory management
  • Cross-profiling for embedded systems with
    just-in-time compilers
    • CACAO / YARI on an FPGA
  • Aspect-oriented programming (AOP) for the rapid
    development of profilers and debuggers

18
Conclusions
  • Concomitant performance analysis of embedded Java
    software is crucial
  • Cross-profiling is a promising approach
    • Target need not be available
    • No software deployment on target
    • Resource constraints of the target are avoided
    • Calling context profiling is possible
  • Targeting embedded Java processors
    • High accuracy
    • Very fast when compared with alternatives, such
      as VHDL simulation
  • On the web: www.inf.unisi.ch/phd/moret/cprof/