CS380 C lecture 20 - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
CS380 C lecture 20
  • Last time
  • Linear scan register allocation
  • Classic compilation techniques
  • On to a modern context
  • Today
  • Jenn Sartor
  • Experimental evaluation for managed languages
    with JIT compilation and garbage collection

2
Wake Up and Smell the Coffee: Performance
Analysis Methodologies for the 21st Century
  • Kathryn S. McKinley
  • Department of Computer Sciences
  • University of Texas at Austin

3
Shocking News!
  • In 2000, Java overtook C and C++ as the most
    popular programming language
  • TIOBE 2000--2008

4
Systems Research in Industry and Academia
  • ISCA 2006
  • 20 papers use C and/or C++
  • 5 papers are orthogonal to the programming
    language
  • 2 papers use specialized programming languages
  • 2 papers use Java and C from SPEC
  • 1 paper uses only Java from SPEC

5
What is Experimental Computer Science?
6
What is Experimental Computer Science?
  • An idea
  • An implementation in some system
  • An evaluation

7
The success of most systems innovation hinges on
evaluation methodologies.
  1. Benchmarks reflect current and ideally, future
    reality
  2. Experimental design is appropriate
  3. Statistical data analysis
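Point 3, statistical data analysis, typically means reporting the mean of repeated runs together with a confidence interval rather than a single "best" number. A minimal sketch in Java; the z-value 1.96 assumes enough runs for a normal approximation and is an illustrative choice, not prescribed by the talk:

```java
import java.util.Arrays;

// Sketch: report the mean of repeated timing runs with a ~95%
// confidence interval instead of a single "best" number.
public class ConfidenceInterval {
    // Returns {mean, halfWidth}; report mean +/- halfWidth.
    public static double[] meanAndCI95(double[] times) {
        int n = times.length;
        double mean = Arrays.stream(times).average().orElse(0.0);
        double variance = Arrays.stream(times)
                .map(t -> (t - mean) * (t - mean))
                .sum() / (n - 1);                  // sample variance
        double halfWidth = 1.96 * Math.sqrt(variance / n);
        return new double[] { mean, halfWidth };
    }

    public static void main(String[] args) {
        double[] runs = { 10.1, 9.8, 10.3, 10.0, 9.9 };  // e.g., seconds
        double[] ci = meanAndCI95(runs);
        System.out.printf("%.2f +/- %.2f%n", ci[0], ci[1]);
    }
}
```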

8
The success of most systems innovation hinges on
experimental methodologies.
  1. Benchmarks reflect current and ideally, future
    reality [DaCapo Benchmarks 2006]
  2. Experimental design is appropriate.
  3. Statistical data analysis [Georges et al. 2006]

9
Experimental Design
  • We're not in Kansas anymore!
  • JIT compilation, GC, dynamic checks, etc.
  • Methodology has not adapted
  • Needs to be updated and institutionalized

"this sophistication provides a significant
challenge to understanding complete system
performance, not found in traditional languages
such as C or C++" [Hauswirth et al., OOPSLA '04]
10
Experimental Design
  • Comprehensive comparison
  • 3 state-of-the-art JVMs
  • Best of 5 executions
  • 19 benchmarks
  • Platform: 2GHz Pentium-M, 1GB RAM, Linux 2.6.15

11
Experimental Design
12
Experimental Design
13
Experimental Design
14
Experimental Design
[Charts: performance for the first, second, and third iterations]
15
Experimental Design
  • Another Experiment
  • Compare two garbage collectors
  • Semispace Full Heap Garbage Collector
  • Marksweep Full Heap Garbage Collector

16
Experimental Design
  • Another Experiment
  • Compare two garbage collectors
  • Semispace Full Heap Garbage Collector
  • Marksweep Full Heap Garbage Collector
  • Experimental Design
  • Same JVM, same compiler settings
  • Second iteration for both
  • Best of 5 executions
  • One benchmark: SPEC _209_db
  • Platform: 2GHz Pentium-M, 1GB RAM, Linux 2.6.15
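The protocol above (time the second iteration, take the best of 5 executions) can be sketched as a small harness. One simplification to note: the five "executions" here are in-process repetitions, whereas the methodology in the talk uses separate JVM invocations so executions do not share JIT state:

```java
// Sketch of the timing protocol: time the second of two iterations
// (the first absorbs class loading and most JIT compilation), and
// report the best of five such measurements.
public class BestOfFive {
    public interface Workload { void run(); }

    public static long secondIterationNanos(Workload w) {
        w.run();                          // first iteration: warm-up
        long start = System.nanoTime();
        w.run();                          // second iteration: measured
        return System.nanoTime() - start;
    }

    public static long bestOfFive(Workload w) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < 5; i++) {
            best = Math.min(best, secondIterationNanos(w));
        }
        return best;
    }

    public static void main(String[] args) {
        long t = bestOfFive(() -> {
            double x = 0;
            for (int i = 1; i <= 1_000_000; i++) x += 1.0 / i;
        });
        System.out.println("best second-iteration time: " + t + " ns");
    }
}
```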

17
Marksweep vs Semispace
18
Marksweep vs Semispace
19
Marksweep vs Semispace
20
Experimental Design
21
Experimental Design: Best Practices
  • Measuring JVM innovations
  • Measuring JIT innovations
  • Measuring GC innovations
  • Measuring Architecture innovations

22
JVM Innovation: Best Practices
  • Examples
  • Thread scheduling
  • Performance monitoring
  • Workload triggers differences
  • real workloads, perhaps microbenchmarks
  • e.g., force frequency of thread switching
  • Measure and report multiple iterations
  • start up
  • steady state (aka server mode)
  • never configure the VM to use completely
    unoptimized code!
  • Use a modest or multiple heap sizes computed as a
    function of maximum live size of the application
  • Use and report multiple architectures
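The heap-size guideline above can be made concrete: measure the application's maximum live size, then derive the heap sizes to test as multiples of it. A minimal sketch; the multipliers are illustrative, not prescribed by the talk:

```java
// Sketch: derive a range of heap sizes (as -Xmx settings) from the
// application's maximum live size. Multipliers are illustrative.
public class HeapSizes {
    public static long[] heapSizes(long maxLiveBytes, double[] multipliers) {
        long[] sizes = new long[multipliers.length];
        for (int i = 0; i < multipliers.length; i++) {
            sizes[i] = (long) (maxLiveBytes * multipliers[i]);
        }
        return sizes;
    }

    public static void main(String[] args) {
        long maxLive = 60L << 20;  // e.g., a 60 MB maximum live size
        for (long s : heapSizes(maxLive, new double[] {1.5, 2.0, 3.0})) {
            System.out.println("-Xmx" + (s >> 20) + "m");
        }
    }
}
```

`-Xmx` is the standard JVM flag for the maximum heap size, so each derived size maps directly onto one experimental configuration.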

23
Best Practices
24
JIT Innovation: Best Practices
  • Example: new compiler optimization
  • Code quality: Does it improve the application
    code?
  • Compile time: How much compile time does it add?
  • Total time: compiler and application time
    together
  • Problem: adaptive compilation responds to
    compilation load
  • Question: How do we tease all these effects apart?

25
JIT Innovation: Best Practices
  • Teasing apart compile time and code quality
    requires multiple experiments
  • Total time: Mix methodology
  • Run adaptive system as intended
  • Result: mixture of optimized and unoptimized code
  • First and second iterations (that include compile
    time)
  • Set and/or report the heap size as a function of
    maximum live size of the application
  • Report average and show statistical error
  • Code quality
  • OK: Run iterations until performance stabilizes,
    report best, or
  • Better: Run several iterations of the benchmark,
    turn off the compiler, and measure a run
    guaranteed to have no compilation
  • Best: Replay mix compilation
  • Compile time
  • Requires the compiler to be deterministic
  • Replay mix compilation
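The "OK" option for code quality (iterate until performance stabilizes) can be sketched as follows; the 2% agreement threshold and the iteration cap are assumptions for illustration, not values from the talk:

```java
// Sketch: iterate until two successive measurements agree within 2%,
// approximating steady state once the JIT has compiled the hot code.
public class StabilizedTiming {
    public interface Workload { void run(); }

    public static long stableIterationNanos(Workload w, int maxIters) {
        long prev = -1;
        for (int i = 0; i < maxIters; i++) {
            long start = System.nanoTime();
            w.run();
            long t = System.nanoTime() - start;
            if (prev >= 0 && Math.abs(t - prev) < 0.02 * prev) {
                return t;          // two successive runs agree: stable
            }
            prev = t;
        }
        return prev;               // never stabilized: report last run
    }

    public static void main(String[] args) {
        long t = stableIterationNanos(() -> {
            double x = 0;
            for (int i = 1; i < 100_000; i++) x += Math.sqrt(i);
        }, 30);
        System.out.println("steady-state estimate: " + t + " ns");
    }
}
```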

26
Replay Compilation
  • Force the JIT to produce a deterministic result
  • Make a compilation profiler and replayer
  • Profiler
  • Profile first or later iterations with adaptive
    JIT, pick best or average
  • Record profiling information used in compilation
    decisions, e.g., dynamic profiles of edges,
    paths, and/or dynamic call graph
  • Record compilation decisions, e.g., compile
    method bar at level two, inline method foo into
    bar
  • Mix of optimized and unoptimized, or all
    optimized/unoptimized
  • Replayer
  • Reads in profile
  • As the system loads each class, apply profile +/-
    innovation
  • Result
  • controlled experiments with deterministic
    compiler behavior
  • reduces statistical variance in measurements
  • Still not a perfect methodology for inlining
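The profiler/replayer split above can be sketched as a data structure: the profiler records each compilation decision, and the replayer looks decisions up as classes load. The record format and method names below are illustrative, not taken from any particular VM:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a replay-compilation profile: the profiler side records
// compilation decisions; the replayer side looks them up on class load.
public class CompilationProfile {
    // One decision, e.g., "compile bar at level 2, inlining foo".
    public record Decision(String method, int optLevel, List<String> inlined) {}

    private final List<Decision> decisions = new ArrayList<>();

    // Profiler side: called when the adaptive system compiles a method.
    public void record(String method, int optLevel, List<String> inlined) {
        decisions.add(new Decision(method, optLevel, inlined));
    }

    // Replayer side: look up what to do with a method as its class
    // loads; null means the method stays unoptimized.
    public Decision lookup(String method) {
        for (Decision d : decisions) {
            if (d.method().equals(method)) return d;
        }
        return null;
    }

    public static void main(String[] args) {
        CompilationProfile profile = new CompilationProfile();
        profile.record("bar", 2, List.of("foo"));
        System.out.println(profile.lookup("bar"));
    }
}
```

Because every run replays the same recorded decisions, compiler behavior is deterministic across experiments, which is what reduces the statistical variance noted above.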

27
GC Innovation: Best Practices
  • Requires more than one experiment...
  • Use and report a range of fixed heap sizes
  • Explore the space-time tradeoff
  • Measure heap size with respect to the maximum
    live size of the application
  • VMs should report total memory, not just
    application memory
  • Different GC algorithms vary in the meta-data
    they require
  • JIT and VM use memory...
  • Measure time with a constant workload
  • Do not measure throughput
  • Best: run two experiments
  • mix with adaptive methodology: what users are
    likely to see in practice
  • replay: hold the compiler activity constant
  • Choose a profile with "best" application
    performance in order to keep from hiding mutator
    overheads in bad code.

28
Architecture Innovation: Best Practices
  • Requires more than one experiment...
  • Use more than one VM
  • Set a modest heap size and/or report heap size as
    a function of maximum live size
  • Use a mixture of optimized and uncompiled code
  • Simulator needs the same code in many cases to
    perform comparisons
  • Best for microarchitecture-only changes
  • Multiple traces from live system with adaptive
    methodology
  • start up and steady state with compiler turned
    off
  • what users are likely to see in practice
  • Won't work if architecture change requires
    recompilation, e.g., new sampling mechanism
  • Use replay to make the code as similar as possible

29
"There are lies, damn lies, and benchmarks"
  • after Disraeli ("statistics")

30
Conclusions
  • Methodology includes
  • Benchmarks
  • Experimental design
  • Statistical analysis [Georges et al., OOPSLA 2007]
  • Poor Methodology
  • can focus or misdirect innovation and energy
  • We have a unique opportunity
  • Transactional memory, multicore performance,
    dynamic languages
  • What we can do
  • Enlist VM builders to include replay
  • Fund and broaden participation in benchmarking
  • Research and industrial partnerships
  • Funding through NSF, ACM, SPEC, industry or ??
  • Participate in building community workloads

31
CS380 C
  • More on Java Benchmarking
  • www.dacapobench.org
  • Alias analysis
  • Read A. Diwan, K. S. McKinley, and J. E. B.
    Moss, "Using Types to Analyze and Optimize
    Object-Oriented Programs," ACM Transactions on
    Programming Languages and Systems, 23(1):30-72,
    January 2001.

32
Suggested Readings: Performance Evaluation of JVMs
  • How Java Programs Interact with Virtual Machines
    at the Microarchitectural Level, Lieven Eeckhout,
    Andy Georges and Koen De Bosschere, The 18th
    Annual ACM SIGPLAN Conference on Object-Oriented
    Programming, Systems, Languages and Applications
    (OOPSLA'03), Oct. 2003
  • Method-Level Phase Behavior in Java Workloads,
    Andy Georges, Dries Buytaert, Lieven Eeckhout and
    Koen De Bosschere, The 19th Annual ACM SIGPLAN
    Conference on Object-Oriented Programming,
    Systems, Languages and Applications (OOPSLA'04),
    Oct. 2004
  • Myths and Realities: The Performance Impact of
    Garbage Collection, S. M. Blackburn, P. Cheng,
    and K. S. McKinley, ACM SIGMETRICS Conference on
    Measurement and Modeling of Computer Systems, pp.
    25--36, New York, NY, June 2004.
  • The DaCapo Benchmarks: Java Benchmarking
    Development and Analysis, S. M. Blackburn, et
    al., The ACM SIGPLAN Conference on Object
    Oriented Programming Systems, Languages and
    Applications (OOPSLA), Portland, OR, pp.
    191--208, October 2006.
  • Statistically Rigorous Java Performance
    Evaluation, A. Georges, D. Buytaert, and L.
    Eeckhout, The ACM SIGPLAN Conference on Object
    Oriented Programming Systems, Languages and
    Applications (OOPSLA), Montreal, Canada, Oct
    2007. To appear.