Extending Global Optimizations in the OpenUH Compiler for OpenMP

1
Extending Global Optimizations in the OpenUH
Compiler for OpenMP
  • Open64 Workshop, CGO 08

2
Goals
  • Exploit the compiler analysis and optimizations
    for OpenMP programs
  • Enable high level optimizations by taking OpenMP
    semantics into consideration
  • Build a general framework for OpenMP compiler
    optimizations

3
OpenUH Compiler based on Open64
  • Input: source code w/ OpenMP directives
  • Open64 compiler infrastructure: FRONTENDS (C/C++,
    Fortran 90, OpenMP), IPA (Inter Procedural Analyzer),
    OMP_PRELOWER (preprocess OpenMP), LNO (Loop Nest
    Optimizer), LOWER_MP (transformation of OpenMP),
    WOPT (global scalar optimizer)
  • Code generation: CG (code for IA-32, IA-64, Opteron)
    produces object files
  • Source-to-source path: WHIRL2C, WHIRL2F (IR-to-source
    for non-Itanium) emit source code with runtime
    library calls, which a native compiler turns into
    object files
  • Linking: object files are linked with a portable
    OpenMP runtime library into executables
5
Motivation
Compiler flags    -O3       -O3 mp3
PRE-example       7.42      46.8
NAS FT            18.45     26.17
NAS UA            130.31    220.15
Why different performance?
6
A PRE Example
7
A PRE Example
no copy propagation!
copy propagation
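The figure for this slide did not survive in the transcript. As a generic illustration of partial redundancy elimination (PRE), and not the example from the slide, the sketch below applies the transformation by hand to a small function. The names `before_pre`, `after_pre` and `check_pre_equivalence` are hypothetical. Note that PRE leaves copies such as `x = t` and `y = t` behind, which is why it relies on copy propagation to clean up afterwards.

```c
#include <assert.h>

/* Before PRE: a + b is evaluated on the 'then' path and again after the
   join, so the second evaluation is redundant whenever flag is non-zero. */
int before_pre(int a, int b, int flag)
{
    int x = 0;
    if (flag) {
        x = a + b;
    }
    int y = a + b;          /* recomputed even when flag != 0 */
    return x + y;
}

/* After PRE (applied by hand): a + b is computed exactly once on every
   path into a temporary t, and the later use becomes a copy of t. */
int after_pre(int a, int b, int flag)
{
    int x = 0;
    int t;
    if (flag) {
        t = a + b;
        x = t;              /* copy left behind by PRE */
    } else {
        t = a + b;          /* inserted so the later use is fully redundant */
    }
    int y = t;              /* copy instead of a second evaluation */
    return x + y;
}

/* Sanity check: the transformation must not change the result. */
void check_pre_equivalence(void)
{
    assert(before_pre(2, 3, 1) == after_pre(2, 3, 1));
    assert(before_pre(2, 3, 0) == after_pre(2, 3, 0));
}
```

If copy propagation is disabled for OpenMP code, copies like these stay in the generated code, which is one plausible reading of the slide's "no copy propagation!" annotation.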
8
Parallel Data Flow Analysis
  • Compilers need to further optimize OpenMP codes
  • Most current OpenMP compilers perform
    optimizations after OpenMP constructs have been
    lowered to threaded code
  • Traditional optimizations then have to be restricted
    to the inside of an OpenMP construct and cannot cross
    synchronizations, so global optimizations still
    need to be enabled
  • The opportunity to perform high-level OpenMP
    optimizations, such as barrier elimination, is
    missed
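To make barrier elimination concrete, here is a minimal hand-written sketch of the result such an optimization could produce. It is not taken from OpenUH. The function name `fill_independent` is hypothetical, and the code assumes that `a` and `b` do not overlap in memory. The `nowait` clause is standard OpenMP and removes the implicit barrier at the end of a worksharing loop.

```c
#include <assert.h>

/* Fills a[i] = 2*i and b[i] = i + 1 inside one parallel region.
   Assumption: a and b point to non-overlapping arrays of length n.
   The two loops touch disjoint data, so no thread has to wait for the
   first loop to finish everywhere before it starts the second loop. */
void fill_independent(int *a, int *b, int n)
{
    assert(a != 0 && b != 0 && n >= 0);

    #pragma omp parallel
    {
        /* nowait drops the implicit barrier at the end of this loop. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            a[i] = 2 * i;

        /* The implicit barrier after this loop is kept, so both arrays
           are complete once the parallel region ends. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            b[i] = i + 1;
    }
}
```

A compiler can only make this change automatically if it knows, at the level of the OpenMP constructs, that the two loops are independent. After the constructs are lowered to runtime library calls, the barrier is just another call and that structure is harder to see.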

9
Solution Method
  • Based on the OpenMP memory model: relaxed
    consistency, where flush is the key operation
  • Design a Parallel Control Flow Graph (PCFG) to
    represent an OpenMP program

10
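PCFG elements: basic node, composite node, super node; sequential edge, parallel edge, conflict edge

(a) An OpenMP sections example

    a = 0; b = 0
    #pragma omp parallel sections
    {
      #pragma omp section
      {
        a = 1
        #pragma omp flush(a,b)
        IF (b == 0) {
          Critical1
          a = 0
          #pragma omp flush(a)
        } ELSE {
          else1
        }
      }
      #pragma omp section
      {
        b = 1
        #pragma omp flush(a,b)
        IF (a == 0) {
          Critical2
          b = 0
          #pragma omp flush(b)
        } ELSE {
          else2
        }
      }
    }

(b) The corresponding PCFG

[Figure: PCFG of the example. From Entry, one branch per section: a = 1, Flush(a,b), If (b == 0) leading to a = 0 and Flush(a) or to Else; and b = 1, Flush(a,b), If (a == 0) leading to b = 0 and Flush(b) or to Else. Both branches end at Barrier.]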
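As a rough sketch of how conflict edges in a PCFG might be derived, the code below models the shared-variable accesses of the sections example and counts pairs of accesses that could interfere. This is a simplified assumption, not the OpenUH data structure: here two nodes conflict when they sit in different sections, touch the same variable, and at least one of them writes. All type and function names (`pcfg_node`, `conflicts`, `count_conflict_edges`, `slide_example`) are hypothetical, and flush operations are left out of the model.

```c
#include <assert.h>
#include <string.h>

enum access_kind { ACCESS_READ, ACCESS_WRITE };

/* One basic node, reduced to a single shared-variable access. */
struct pcfg_node {
    int section;            /* index of the enclosing omp section */
    const char *var;        /* name of the shared variable accessed */
    enum access_kind kind;  /* read or write */
};

/* Simplified conflict test: different sections (may run concurrently),
   same variable, and at least one of the two accesses is a write. */
int conflicts(const struct pcfg_node *x, const struct pcfg_node *y)
{
    return x->section != y->section
        && strcmp(x->var, y->var) == 0
        && (x->kind == ACCESS_WRITE || y->kind == ACCESS_WRITE);
}

/* Counts unordered pairs of nodes that would be joined by a conflict edge. */
int count_conflict_edges(const struct pcfg_node *nodes, int n)
{
    assert(nodes != 0 && n >= 0);
    int edges = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (conflicts(&nodes[i], &nodes[j]))
                edges++;
    return edges;
}

/* Shared accesses of the two sections in the example. */
const struct pcfg_node slide_example[] = {
    {0, "a", ACCESS_WRITE},   /* section 1: a = 1 */
    {0, "b", ACCESS_READ},    /* section 1: IF test on b */
    {0, "a", ACCESS_WRITE},   /* section 1: a = 0 */
    {1, "b", ACCESS_WRITE},   /* section 2: b = 1 */
    {1, "a", ACCESS_READ},    /* section 2: IF test on a */
    {1, "b", ACCESS_WRITE},   /* section 2: b = 0 */
};
```

Under this simplified rule the example yields four conflicting pairs: each write to `a` in the first section against the read of `a` in the second, and the read of `b` in the first section against each write to `b` in the second. A real analysis would also take the flush operations into account when deciding which values can actually be observed.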
11
PCFG-based flow
  • Input WHIRL tree
  • PCFG: construct CFG, control flow analyses, parallel
    control flow analysis, flow free alias analysis
  • HSSA: construct HSSA representation, phi insertion
    for conflict edges, points-to and pointer alias
    analysis, create CODEMAP representation
  • SSA-based optimizations (PREOPT): IVR, CP, DCE, flow
    free copy propagation
  • SSAPRE: perform PRE on OpenMP code
  • Emit: emit new WHIRL from optimized CFG/SSA
  • Output WHIRL tree

CFG-based flow
  • Input WHIRL tree
  • CFG: construct CFG, control flow analyses, flow free
    alias analysis
  • HSSA: construct HSSA representation, points-to and
    pointer alias analysis, create CODEMAP
    representation
  • SSA-based optimizations (PREOPT): IVR, CP, DCE, flow
    free copy propagation
  • Emit: emit new WHIRL from optimized CFG/SSA
  • Output WHIRL tree
12
Conclusion
  • Implementing in the OpenUH compiler
  • Improve the scalability of OpenMP programs
  • A framework for conducting more aggressive
    optimizations for Cluster OpenMP
  • Can be used in conjunction with data race
    detection tools