Title: Extending Global Optimizations in the OpenUH Compiler for OpenMP
1 Extending Global Optimizations in the OpenUH Compiler for OpenMP
2 Goals
- Exploit compiler analysis and optimizations for OpenMP programs
- Enable high-level optimizations by taking OpenMP semantics into consideration
- Build a general framework for OpenMP compiler optimizations
3 OpenUH Compiler based on Open64
- Input: source code w/ OpenMP directives
- FRONTENDS (C/C++, Fortran 90, OpenMP)
- Open64 compiler infrastructure
  - IPA (Inter Procedural Analyzer)
  - OMP_PRELOWER (preprocess OpenMP)
  - LNO (Loop Nest Optimizer)
  - LOWER_MP (transformation of OpenMP)
  - WOPT (global scalar optimizer)
  - WHIRL2C and WHIRL2F (IR-to-source for non-Itanium)
  - CG (code for IA-32, IA-64, Opteron)
- Source code with runtime library calls, compiled by a native compiler
- Object files
- Linking with a portable OpenMP runtime library
5 Motivation
Compiler flags    -O3       -O3 mp3
PRE-example       7.42      46.8
NAS FT            18.45     26.17
NAS UA            130.31    220.15
Why does the performance differ?
6 A PRE Example
7 A PRE Example
[Figure annotations: "no copy propagation!" and "copy propagation"]
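The code shown on these slides is not recoverable from the transcript. As a stand-in, here is a generic sketch of what partial redundancy elimination (PRE) does, written in Python only so that it can be run. It is not the example from the talk: the function names and the multiplication counter are hypothetical and exist only to make the effect visible.

```python
# Generic illustration of partial redundancy elimination (PRE).
# NOT the example from the slides; all names here are hypothetical.

def before_pre(a, b, cond):
    """a * b is computed on the 'cond' path and again after the join."""
    mults = 0
    if cond:
        x = a * b
        mults += 1
    else:
        x = 0
    y = a * b          # partially redundant: already computed when cond is true
    mults += 1
    return x + y, mults


def after_pre(a, b, cond):
    """PRE inserts the computation on the path that lacked it, then reuses it."""
    mults = 0
    if cond:
        t = a * b
        mults += 1
        x = t
    else:
        x = 0
        t = a * b      # inserted so that t is available on every path
        mults += 1
    y = t              # the recomputation after the join is gone
    return x + y, mults
```

Both versions return the same value; the transformed one performs a single multiplication on every path, while the original performs two when `cond` is true.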
8 Parallel Data Flow Analysis
- Compilers need to further optimize OpenMP codes
- Most current OpenMP compilers perform optimizations after OpenMP constructs have been lowered to threaded code
- Traditional optimizations then have to be restricted to the inside of an OpenMP construct and cannot cross synchronizations
- Need to enable global optimizations
- Missed opportunities to perform high-level OpenMP optimizations, such as barrier elimination
9 Solution Method
- Based on the OpenMP Memory Model
- Relaxed Consistency
- Flush is the key operation!
- Design a Parallel Control Flow Graph (PCFG) to represent an OpenMP program
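A minimal sketch of how such a graph could be represented, using the node and edge kinds named in the deck's PCFG figure (basic, composite, and super nodes; sequential, parallel, and conflict edges). The class and method names are hypothetical, and the reading of a parallel edge as "branch to code that may run concurrently" and of a conflict edge as "link between flushes in concurrent code" is an assumption, not the OpenUH implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    BASIC = "basic"
    COMPOSITE = "composite"   # listed in the figure legend; not used below
    SUPER = "super"           # listed in the figure legend; not used below


class EdgeKind(Enum):
    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"     # assumption: fork into concurrently running code
    CONFLICT = "conflict"     # assumption: links flushes in concurrent code


@dataclass
class PCFG:
    """Hypothetical container: nodes by name, edges as (src, dst, kind)."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, name, kind=NodeKind.BASIC):
        self.nodes[name] = kind

    def add_edge(self, src, dst, kind=EdgeKind.SEQUENTIAL):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown node in edge {src} -> {dst}")
        self.edges.append((src, dst, kind))

    def successors(self, name, kind=None):
        """Targets of edges leaving `name`, optionally filtered by edge kind."""
        return [d for s, d, k in self.edges
                if s == name and (kind is None or k == kind)]


def build_fork_join_example():
    """Two concurrent sections, one flush each, joined at a barrier."""
    g = PCFG()
    for name in ("entry", "s1_flush", "s2_flush", "barrier"):
        g.add_node(name)
    g.add_edge("entry", "s1_flush", EdgeKind.PARALLEL)
    g.add_edge("entry", "s2_flush", EdgeKind.PARALLEL)
    g.add_edge("s1_flush", "barrier")
    g.add_edge("s2_flush", "barrier")
    g.add_edge("s1_flush", "s2_flush", EdgeKind.CONFLICT)
    g.add_edge("s2_flush", "s1_flush", EdgeKind.CONFLICT)
    return g
```

Keeping the edge kind on each edge lets a sequential analysis ignore parallel and conflict edges, while a parallel data flow analysis can follow all three.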
10 An OpenMP section example and the corresponding PCFG
Legend: basic node, composite node, super node; sequential edge, parallel edge, conflict edge

A. An OpenMP section example
    a = 0; b = 0
    #pragma omp parallel sections
      #pragma omp section
        a = 1
        #pragma omp flush(a,b)
        IF (b == 0)
          Critical1
          a = 0
          #pragma omp flush(a)
        ELSE
          else1
      #pragma omp section
        b = 1
        #pragma omp flush(a,b)
        IF (a == 0)
          Critical2
          b = 0
          #pragma omp flush(b)
        ELSE
          else2

B. The corresponding PCFG
[Figure: Entry node; first section with nodes a = 1, Flush(a,b), If (b == 0), a = 0, Flush(a), Else; second section with nodes b = 1, Flush(a,b), If (a == 0), b = 0, Flush(b), Else; Barrier node]
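One way to picture where conflict edges could come from in this example is to pair up the flush operations of the two sections. The sketch below assumes a simple rule that the slides do not state: two flushes in sections that may run concurrently are connected when their flush sets share a variable. The node names and the `conflict_edges` helper are hypothetical.

```python
from itertools import product

# Flush operations of the two sections in the example, in program order.
# Node names are hypothetical labels; the flush sets come from the pragmas.
SECTION_FLUSHES = {
    "section1": [("s1_flush_ab", {"a", "b"}), ("s1_flush_a", {"a"})],
    "section2": [("s2_flush_ab", {"a", "b"}), ("s2_flush_b", {"b"})],
}


def conflict_edges(section_flushes):
    """Assumed rule: connect flushes of different sections whose sets overlap.

    Returns a set of (flush_in_one_section, flush_in_other, shared_variables).
    """
    edges = set()
    names = sorted(section_flushes)
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            pairs = product(section_flushes[left], section_flushes[right])
            for (n1, v1), (n2, v2) in pairs:
                shared = v1 & v2
                if shared:
                    edges.add((n1, n2, frozenset(shared)))
    return edges
```

Under this rule the example yields three candidate conflict edges; `flush(a)` in the first section and `flush(b)` in the second share no variable, so they are not connected.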
11 Optimizer phases: CFG-based flow and PCFG-based flow
CFG-based flow
- Input WHIRL tree
- CFG
  - Construct CFG
  - Control Flow Analyses
  - Flow Free Alias Analysis
- HSSA
  - Construct HSSA representation
  - Points-to and Pointer Alias Analysis
  - Create CODEMAP representation
- IVR
- CP, DCE
- Flow free copy propagation
- Emit
  - Emit new WHIRL from optimized CFG/SSA
- Output WHIRL tree
PCFG-based flow
- Input WHIRL tree
- PCFG
  - Construct CFG
  - Control Flow Analyses
  - Parallel Control Flow Analysis
  - Flow Free Alias Analysis
- HSSA
  - Construct HSSA representation
  - Phi insertion for conflict edges
  - Points-to and Pointer Alias Analysis
  - Create CODEMAP representation
- IVR
- PREOPT SSA-based optimizations
  - CP, DCE
  - Flow free copy propagation
- SSAPRE
  - Perform PRE on OpenMP code
- Emit
  - Emit new WHIRL from optimized CFG/SSA
- Output WHIRL tree
12 Conclusion
- Implementing in the OpenUH compiler
- Improve the scalability of OpenMP programs
- A framework for conducting more aggressive optimizations for Cluster OpenMP
- Can be used in conjunction with data race detection tools