Statistical Compiler Tuning

1
Statistical Compiler Tuning

M. Haneda
P.M.W. Knijnenburg
H.A.G. Wijshoff

2
Motivation

An optimal compiler optimization setting can be
obtained by considering the interaction between
applications, architectures, and compiler
optimizations.
Profiling is a (the best?) way to understand this
interaction.
However, a huge number of optimization settings
are possible.

3
Example: gcc 3.3.1

42 options
defer-pop, force-mem, force-addr,
optimize-sibling-calls, inline-functions,
merge-constants, strength-reduce, thread-jumps,
cse-follow-jumps, cse-skip-blocks,
rerun-cse-after-loop, rerun-loop-opt, gcse,
loop-optimize, crossjumping, if-conversion,
if-conversion2, delete-null-pointer-checks,
expensive-optimizations, optimize-register-move,
schedule-insns, sched-interblock, sched-spec,
schedule-insns2, sched-spec-load,
sched-spec-load-dangerous, caller-saves,
move-all-movables, reduce-all-givs, peephole,
peephole2, reorder-blocks, reorder-functions,
strict-aliasing, align-functions, align-labels,
align-loops, align-jumps, cprop-registers,
function-sections, data-sections, unroll-loops

4
Example (Contd)

42 options, all on/off switches, lead to 2^42 ≈
4.4 x 10^12 settings.
Each profile takes 10 runs, each taking
approximately 10 sec. Total time per profile: 100
sec.
Total time for a full profile:
4.4 x 10^14 sec ≈ 7.3 x 10^8 weeks ≈ 1.4 x 10^7 years
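
The arithmetic on this slide can be reproduced in a few lines of Python. This is only a sketch of the calculation: the variable names are ours, and the 365-day year is an assumption.

```python
# Size of the search space and the cost of profiling all of it.
n_options = 42            # on/off switches in gcc 3.3.1 (from the slide)
runs_per_profile = 10     # runs per setting (from the slide)
seconds_per_run = 10      # approximate time per run (from the slide)

settings = 2 ** n_options
total_seconds = settings * runs_per_profile * seconds_per_run
total_weeks = total_seconds / (7 * 24 * 3600)
total_years = total_seconds / (365 * 24 * 3600)   # assumes 365-day years

print(f"{settings:.1e} settings")
print(f"{total_seconds:.1e} sec, {total_weeks:.1e} weeks, {total_years:.1e} years")
```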

5
Challenge

How to find the optimal configuration with
a limited amount of profiling?

6
Three Approaches

Statistical approach
Using main effects
Using the Mann-Whitney test
Random approach
Approach which focuses on the interaction between
compiler optimizations

7
Statistical Approach

Start with an appropriate initial representation
of the full search space based on orthogonal
arrays.
Each time after data collection:
Approach 1: Compute main effects of compiler
options from the profiling data.
Approach 2: Apply inferential statistics
(Mann-Whitney test) to the profiling data to
detect effective compiler options.
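
The slides do not give the formula for a main effect. A common definition from the design of experiments, assumed here, is the mean response with the option on minus the mean response with it off. The sketch below applies that definition to a small 4-run design; the function name and the runtimes are hypothetical.

```python
def main_effects(design, times):
    """Per option: mean runtime with the option on minus mean runtime with it off."""
    effects = []
    for col in range(len(design[0])):
        on = [t for row, t in zip(design, times) if row[col] == 1]
        off = [t for row, t in zip(design, times) if row[col] == 0]
        effects.append(sum(on) / len(on) - sum(off) / len(off))
    return effects

# A small 4-run design for 3 options (every column pair shows 00, 01, 10, 11 once).
design = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
times = [10.0, 9.0, 7.0, 6.0]   # hypothetical runtimes in seconds

effects = main_effects(design, times)
print(effects)
```

Under this definition a negative value means that turning the option on reduced the runtime in the sampled settings, and a value near zero suggests the option had little influence.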

8
Orthogonal Arrays

Orthogonal arrays (OAs) are well-chosen
fractional factorial designs.
An OA is expressed as an N x k matrix of 0s and
1s.
The columns are interpreted as factors (compiler
options).
Each row of an array defines a compiler setting.
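
To make the row-to-setting mapping concrete, the sketch below turns one OA row into gcc command-line flags. It relies on gcc's -f<name> / -fno-<name> convention for switching an optimization on or off. The helper name row_to_flags and the three-option subset are illustrative only, not taken from the presentation.

```python
def row_to_flags(row, option_names):
    """Map a 0/1 row to gcc flags: 1 -> -f<name>, 0 -> -fno-<name>."""
    return [("-f" if bit else "-fno-") + name
            for bit, name in zip(row, option_names)]

# Three of the 42 options listed earlier, chosen only for illustration.
names = ["defer-pop", "force-mem", "gcse"]
flags = row_to_flags((1, 0, 1), names)
print(" ".join(flags))
```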

9
Orthogonal Arrays (contd)

An OA has the property that for any two arbitrary
columns the patterns
00 01 10 11
occur equally often.
It follows from this property that
each compiler option is turned on and off equally
often.
When we drop columns of an OA, the array is still
an orthogonal array.

10
Example

0 0 0 0 0
1 0 0 1 1
0 1 0 1 0
0 0 1 0 1
1 1 0 0 1
1 0 1 1 0
0 1 1 1 1
1 1 1 0 0

      O1   O2   O3   O4   O5
Run1  off  off  off  off  off
Run2  on   off  off  on   on
Run3  off  on   off  on   off
Run4  off  off  on   off  on
Run5  on   on   off  off  on
Run6  on   off  on   on   off
Run7  off  on   on   on   on
Run8  on   on   on   off  off

Interpreted as Compiler Settings

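
The pairwise property can be checked mechanically for the 8 x 5 example array. The following is a minimal sketch; the function name is_orthogonal is ours, and it tests only the two-column property stated on the previous slide.

```python
from collections import Counter
from itertools import combinations

# The 8 x 5 example array from this slide.
OA = [
    (0, 0, 0, 0, 0),
    (1, 0, 0, 1, 1),
    (0, 1, 0, 1, 0),
    (0, 0, 1, 0, 1),
    (1, 1, 0, 0, 1),
    (1, 0, 1, 1, 0),
    (0, 1, 1, 1, 1),
    (1, 1, 1, 0, 0),
]

def is_orthogonal(array):
    """True if every pair of columns shows 00, 01, 10, 11 equally often."""
    n_rows = len(array)
    for i, j in combinations(range(len(array[0])), 2):
        counts = Counter((row[i], row[j]) for row in array)
        if len(counts) != 4 or any(c * 4 != n_rows for c in counts.values()):
            return False
    return True

print(is_orthogonal(OA))                         # the full example array
print(is_orthogonal([row[:3] for row in OA]))    # after dropping two columns
```

The second call illustrates the earlier remark that dropping columns leaves an orthogonal array.
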
11
Inferential Statistics

Inferential statistics is used to predict whether
a factor of an experiment has a significant
effect in the presence of other factors.
Inferential statistics is based on a null
hypothesis and test statistics.

12
Null Hypothesis

The null hypothesis denies the effect of a factor
in an experiment, for example:
"Compiler option A is not effective in
optimizing application B."
The Mann-Whitney test is used to compute the test
statistic that evaluates the likelihood of the null
hypothesis.
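
As an illustration, the Mann-Whitney U statistic can be computed from the ranks of the pooled runtimes. The sketch below uses the normal approximation for the p-value without tie or continuity correction. That choice, the 0.05 significance level, and the runtimes are all assumptions of this example; the slides do not say which variant of the test was used.

```python
import math

def mann_whitney(x, y):
    """Return (U statistic of x, two-sided p-value by normal approximation)."""
    values = sorted(x + y)
    # Give each distinct value the average of the ranks it occupies (ranks start at 1).
    rank = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank[values[i]] = (i + 1 + j) / 2
        i = j
    n1, n2 = len(x), len(y)
    u = sum(rank[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # no tie correction
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p

# Hypothetical runtimes (seconds) with option A turned on and off.
times_on = [9.1, 9.3, 9.0, 9.2]
times_off = [10.4, 10.1, 10.6, 10.3]

u, p = mann_whitney(times_on, times_off)
reject_null = p < 0.05   # 0.05 is an assumed significance level
print(u, p, reject_null)
```

A small p-value speaks against the null hypothesis, that is, it suggests that option A does affect the runtime of application B.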

13
Iterative Algorithm

Input: list of compiler options, OA, target
application, input dataset.
Compile the application according to the compiler
settings from the OA and collect profiling data.
Apply the Mann-Whitney test to the profiling data to
find the significant options.
Remove the significant options from the option list.
Repeat with the new option list.

14
Iterative Algorithm (Contd)

Until
All options are set, or
No options with a significant effect are detected
anymore, or
The experimental data does not have enough variation
(low standard deviation) to apply the
Mann-Whitney test meaningfully.
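
The loop and its three stopping conditions can be sketched as follows. Several parts are assumptions made to keep the example self-contained: a full factorial design stands in for the OA, the runtime model is synthetic, the significance test is passed in as a function (a crude median-difference check here, where the presentation uses the Mann-Whitney test), and a significant option is fixed to whichever value has the lower median runtime.

```python
import itertools
import statistics

def tune(options, measure, significant, min_stdev=1e-9):
    """Iteratively fix options whose on/off runtimes differ significantly.

    options     -- names of the on/off options still to be decided
    measure     -- callable(setting: dict) -> runtime in seconds
    significant -- callable(on_times, off_times) -> bool
    Returns (fixed settings, options left undecided).
    """
    fixed = {}
    remaining = list(options)
    while remaining:                       # stop: all options are set
        # Full factorial over the remaining options (stand-in for an OA).
        design = list(itertools.product([False, True], repeat=len(remaining)))
        times = []
        for row in design:
            setting = dict(fixed)
            setting.update(zip(remaining, row))
            times.append(measure(setting))
        if statistics.pstdev(times) <= min_stdev:
            break                          # stop: not enough variation
        selected = {}
        for col, name in enumerate(remaining):
            on = [t for row, t in zip(design, times) if row[col]]
            off = [t for row, t in zip(design, times) if not row[col]]
            if significant(on, off):
                # Assumption: keep the value with the lower median runtime.
                selected[name] = statistics.median(on) < statistics.median(off)
        if not selected:
            break                          # stop: no significant option found
        fixed.update(selected)
        remaining = [n for n in remaining if n not in selected]
    return fixed, remaining

def fake_runtime(setting):
    """Synthetic model: 'a' helps, 'b' hurts, 'c' has no effect."""
    t = 10.0
    if setting["a"]:
        t -= 2.0
    if setting["b"]:
        t += 1.5
    return t

def crude_test(on, off):
    """Placeholder for a real test such as Mann-Whitney."""
    return abs(statistics.median(on) - statistics.median(off)) > 0.5

fixed, undecided = tune(["a", "b", "c"], fake_runtime, crude_test)
print(fixed, undecided)
```

In this toy run the first iteration fixes two options, and the second iteration stops because the remaining runtimes no longer vary.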

15
Application to GCC

Compiler version: 3.3.1
Number of options: 42
Architecture: Pentium 4 at 2.8GHz
Applications: 7 programs from the SPECint 2000
benchmark suite
Measurement: Unix time command
Improvement of the configured setting Onew
Obase setting: optimization level O with all
options explicitly turned off

16
Case study (parser, SPECint2000)

1st iteration of the experiment using the
benchmark Parser
We use an OA of order 48, which yields 48
compiler settings.
Option 5 is selected.

17
Case study (parser, SPECint2000)

2nd iteration of the experiment using the
benchmark Parser
Options 3 and 13 are selected.

18
Case study (parser, SPECint2000)

3rd iteration of the experiment using the
benchmark Parser
Options 4, 19, and 33 are selected.

19
Overall Results

20
Profiling Time

21
Different Architectures

We apply the Mann-Whitney algorithm to two other
architectures, the IA64 dual core Itanium2
1.296GHz and the SUN SPARC dual core 1.28GHz, to
check the robustness of the approach.
We apply only the Mann-Whitney test, and use only
5 of the 12 SPECint benchmarks because of
compilation errors.

22
Different Architectures

Figure panels: IA64, SPARC

The performance on the IA64 is comparable to the
performance of O3.
The results for the SPARC are better.

23
Different Input Sets (1)

Improvements using the resulting
settings and the reference input datasets.
The improvement is modest on all
architectures, but it seems that
the resulting settings are
applicable to different datasets.

24
Different Input Sets (2)

25
Conclusion

The Mann-Whitney test can be used to achieve a
fully automated method to determine optimal
compiler settings for a single application.
The resulting compiler settings are applicable to
different input datasets; however, the results are
better when we use the target datasets.
The same methodology can be applied to code
size reduction.
