Hardware/Compiler Co-Design and Compiler Optimizations

Provided by: Intr1

1
Topic 8
Optimization for Parallel Computation
2
Reading List
  • Slides Topic 8x
  • Other readings as assigned in class or homework

3
Outline
  • Basic Concepts
  • Parallelism
  • Locality
  • Loop Nest Optimization
  • Summary

4
Parallelism
  • What is parallelism?
  • Parallelism in computer architecture
    • Instruction-Level Parallelism (ILP)
    • Thread-Level Parallelism (TLP)
  • Parallelism in programs/applications
    • Statement-level parallelism
    • Loop-level parallelism
    • Task-level parallelism

5
General Compiler Framework
  • Good IPO
  • Good LNO
  • Good global optimization
  • Good integration of IPO/LNO/OPT
  • Smooth information passing between FE and CG
  • Complete and flexible support of inner-loop
    scheduling (SWP), instruction scheduling and
    register allocation

[Figure: compiler pipeline. Source -> middle end (ME):
Inter-Procedural Optimization (IPA), Loop Nest Optimization (LNO),
Global Optimization (OPT) -> back end / code generator (BE/CG):
global instruction scheduling, innermost-loop scheduling,
register allocation, local instruction scheduling, with
architecture models -> Executable]
6
A Multiprocessor Architecture
  • A generic modern multiprocessor
  • Node processor(s), memory system, plus
    communication assist
  • Network interface and communication controller
  • Scalable network

7
Locality
  • Temporal locality
    • The same data is used several times within a
      short time period
  • Spatial locality
    • Different data elements that are located near
      each other are used within a short period of time
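The two kinds of locality can be made concrete with a small Python sketch. It assumes a 2-D array stored row by row in one flat block of memory (as in C; Fortran stores column by column, so the roles of the two traversals would swap). The function names are hypothetical and only serve the illustration.

```python
def address_trace(nrows, ncols, row_wise=True):
    """Flat (row-major) addresses touched by a full 2-D traversal."""
    if row_wise:
        return [i * ncols + j for i in range(nrows) for j in range(ncols)]
    return [i * ncols + j for j in range(ncols) for i in range(nrows)]


def unit_stride_steps(trace):
    """Count consecutive accesses that move to the very next address."""
    return sum(1 for a, b in zip(trace, trace[1:]) if b - a == 1)


row_trace = address_trace(4, 8, row_wise=True)
col_trace = address_trace(4, 8, row_wise=False)

# Row-wise traversal walks neighbouring addresses (good spatial locality);
# column-wise traversal jumps by a whole row on almost every step.
print(unit_stride_steps(row_trace), unit_stride_steps(col_trace))
```

Both traversals touch exactly the same addresses; only the order differs, which is what a loop transformation such as interchange changes.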

8
Loop Nest Transformation and Optimization
  • Simple Loop Transformations
  • Unimodular Loop Transformations
  • Beyond Unimodular Transformations
  • Combining Loop Transformations
  • Summary

9
Simple Loop Transformation
  • Loop unrolling
  • Loop peeling
  • ...
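As a minimal sketch of the two transformations named above, the Python functions below show a loop unrolled by a factor of 4 (with a remainder loop) and a loop whose first iteration is peeled off to remove a branch from the body. The function names and the unroll factor are assumptions chosen for illustration, not taken from the slides.

```python
def total_rolled(xs):
    """Reference loop: one element per iteration."""
    total = 0
    for i in range(len(xs)):
        total += xs[i]
    return total


def total_unrolled_by_4(xs):
    """Same sum with the loop body replicated four times per iteration."""
    n = len(xs)
    main_end = n - n % 4              # largest multiple of 4 not exceeding n
    total = 0
    for i in range(0, main_end, 4):
        total += xs[i]
        total += xs[i + 1]
        total += xs[i + 2]
        total += xs[i + 3]
    for i in range(main_end, n):      # remainder loop: leftover 0..3 elements
        total += xs[i]
    return total


def deltas_with_branch(xs):
    """out[0] = xs[0], out[i] = xs[i] - xs[i-1]; branch tested every iteration."""
    out = [0] * len(xs)
    for i in range(len(xs)):
        if i == 0:
            out[i] = xs[i]
        else:
            out[i] = xs[i] - xs[i - 1]
    return out


def deltas_peeled(xs):
    """Same result with the first iteration peeled out of the loop."""
    out = [0] * len(xs)
    if xs:
        out[0] = xs[0]                # peeled iteration
    for i in range(1, len(xs)):       # remaining iterations, branch-free
        out[i] = xs[i] - xs[i - 1]
    return out
```

Unrolling trades code size for fewer loop-control operations and more scheduling freedom; peeling removes a special case from the steady-state body.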

10
Unimodular Loop Transformation
  • Loop interchange
  • Loop reversal
  • Loop skewing

11
Loop Interchange
Why do we wish to perform loop interchange?
12
Safety of Loop Interchange
DO J = 1, M
  DO I = 1, N
    A(I, J+1) = A(I+1, J) + B
  ENDDO
ENDDO
Is it legal to interchange I and J?
13
Legality of Loop Interchange
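DO J = 1, M
  DO I = 1, N
    A(I, J+1) = A(I+1, J) + B
  ENDDO
ENDDO
Note: interchange here is illegal! The value written to A(I, J+1)
is read one J iteration later and one I iteration earlier, so the
dependence has distance vector (1, -1) in (J, I) order.
Interchanging the loops would turn it into (-1, 1), which reverses
the dependence.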
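The illegality can be checked by running both loop orders. The Python sketch below assumes the loop body reads A(I, J+1) = A(I+1, J) + B (the "=" and "+" signs are missing from the transcript and are restored here as an assumption), uses small bounds, and fills A with arbitrary distinct values; all names are hypothetical.

```python
def run_nest(j_outer, n=4, m=4, b=1):
    """Execute A(I, J+1) = A(I+1, J) + B with J or I as the outer loop."""
    # Indices 0..n+1 and 0..m+1 so that I+1 and J+1 stay in range.
    a = [[10 * i + j for j in range(m + 2)] for i in range(n + 2)]
    if j_outer:
        for j in range(1, m + 1):
            for i in range(1, n + 1):
                a[i][j + 1] = a[i + 1][j] + b
    else:  # interchanged order
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                a[i][j + 1] = a[i + 1][j] + b
    return a


def lex_positive(distance):
    """True if the first nonzero component of the vector is positive."""
    for component in distance:
        if component != 0:
            return component > 0
    return False


original = run_nest(j_outer=True)
interchanged = run_nest(j_outer=False)
print(original[1][3], interchanged[1][3])
```

In the original order A(1, 3) is computed from the already updated A(2, 2); after interchange it is computed from the old value, so the two runs disagree. The same conclusion follows from the distance vector: (1, -1) is lexicographically positive, its permutation (-1, 1) is not.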
14
Loop Reversal: An Example
15
Loop Reversal: An Example (Cont'd)
Interchange
DO J = M, 1, -1
  DO I = 1, N
    A(I+1, J) = A(I, J+1) + B
  ENDDO
ENDDO
16
Skewing - An Example
17
Skewing - An Example (Cont'd)
DO j = 2, N+N
  DO I = max(1, j-N), min(N, j-1)
    A(I, j-I) = A(I-1, j-I) + A(I, j-I-1)
  ENDDO
ENDDO
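The skewed loop can be checked against an unskewed nest. The Python sketch below assumes the original loop was A(I, J) = A(I-1, J) + A(I, J-1) for I, J = 1..N (the slide showing it did not survive in the transcript, so this is an assumption), and that the skewed body indexes A with j-I. Boundary values and function names are hypothetical.

```python
def make_grid(n):
    """(n+1) x (n+1) grid whose row 0 and column 0 hold boundary values."""
    grid = [[0] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        grid[0][k] = 1
        grid[k][0] = 1
    return grid


def sweep_original(n):
    """Assumed original nest: A(I, J) = A(I-1, J) + A(I, J-1)."""
    a = make_grid(n)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def sweep_skewed(n):
    """Skewed and interchanged nest: outer loop over wavefronts j = I + J."""
    a = make_grid(n)
    for j in range(2, n + n + 1):
        for i in range(max(1, j - n), min(n, j - 1) + 1):
            a[i][j - i] = a[i - 1][j - i] + a[i][j - i - 1]
    return a


def wavefront_lengths(n):
    """Trip count of the inner loop for each wavefront j."""
    return [min(n, j - 1) - max(1, j - n) + 1 for j in range(2, n + n + 1)]


print(sweep_original(5) == sweep_skewed(5))
print(wavefront_lengths(4))
```

All iterations on one wavefront are independent of each other, which is what makes the inner loop parallel; the varying trip counts are the cost of that, since the inner loop bounds now depend on j.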
18
Disadvantage of Loop Skewing
  • Loop bounds must be recomputed
  • Loop bounds change
  • Average vector length changes

19
Unimodular Transformations
  • Motivation
  • Easy to represent compound transformations
  • Elegant formulation of objective functions under
    compound loop transformations
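A minimal Python sketch of the matrix view follows. Interchange, reversal and skewing of a two-deep nest are each written as a 2x2 integer matrix with determinant +1 or -1; a compound transformation is the matrix product, and legality is tested by applying the matrix to each dependence distance vector and checking that the result is lexicographically positive. The constant and function names are hypothetical, and the distance vector (1, -1) is taken from the interchange example under the assumption that its body is A(I, J+1) = A(I+1, J) + B.

```python
INTERCHANGE = ((0, 1), (1, 0))       # swap outer and inner loops
REVERSE_INNER = ((1, 0), (0, -1))    # run the inner loop backwards
SKEW_INNER = ((1, 0), (1, 1))        # new inner index = outer + inner


def matmul(p, q):
    """2x2 product p*q: apply q first, then p."""
    return tuple(
        tuple(sum(p[r][k] * q[k][c] for k in range(2)) for c in range(2))
        for r in range(2)
    )


def apply(t, d):
    """Transform a distance vector d by the matrix t."""
    return tuple(sum(t[r][c] * d[c] for c in range(2)) for r in range(2))


def det(t):
    return t[0][0] * t[1][1] - t[0][1] * t[1][0]


def lex_positive(d):
    """True if the first nonzero component of d is positive."""
    for component in d:
        if component != 0:
            return component > 0
    return False


distance = (1, -1)
reverse_then_interchange = matmul(INTERCHANGE, REVERSE_INNER)
skew_then_interchange = matmul(INTERCHANGE, SKEW_INNER)

print(apply(INTERCHANGE, distance))               # plain interchange
print(apply(reverse_then_interchange, distance))  # reversal, then interchange
print(apply(skew_then_interchange, distance))     # skewing, then interchange
```

Plain interchange produces a lexicographically negative vector and is rejected, while both compound transformations keep the dependence positive. Each compound transformation stays a single unimodular matrix, which is what makes it easy to represent and to test.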

20
Beyond Unimodular Loop Transformations
  • Loop Strip-Mining
  • Loop Tiling
  • Loop Fusion
  • Loop Fission
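The four transformations listed above can be sketched in Python as follows. Strip-mining and tiling are shown through the iteration order they produce; fusion and fission are shown as two equivalent forms of the same computation. Strip and tile sizes, the toy computation and all names are assumptions made for illustration.

```python
def order_strip_mined(n, strip):
    """One loop over 0..n-1 split into an outer loop over strips."""
    order = []
    for start in range(0, n, strip):
        for i in range(start, min(start + strip, n)):
            order.append(i)
    return order


def order_tiled(n, m, tile):
    """Two-deep nest strip-mined in both loops, strip loops moved outward."""
    order = []
    for ii in range(0, n, tile):
        for jj in range(0, m, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, m)):
                    order.append((i, j))
    return order


def two_loops(xs):
    """Fissioned form: each statement in its own loop."""
    ys = [0] * len(xs)
    zs = [0] * len(xs)
    for i in range(len(xs)):
        ys[i] = 2 * xs[i]
    for i in range(len(xs)):
        zs[i] = ys[i] + 1
    return ys, zs


def one_loop(xs):
    """Fused form: both statements share one loop, ys[i] is reused at once."""
    ys = [0] * len(xs)
    zs = [0] * len(xs)
    for i in range(len(xs)):
        ys[i] = 2 * xs[i]
        zs[i] = ys[i] + 1
    return ys, zs


print(order_strip_mined(10, 4))
print(order_tiled(4, 4, 2)[:4])
```

Strip-mining alone does not change the iteration order; tiling does, visiting the iteration space block by block so that data reused within a block stays close in time.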

21
Advanced Topics: Toward a Framework for
Combining Loop Transformations
22
An Example
  • Assume a multi-issue architecture with resource
    constraints to be considered
    • caches,
    • registers,
    • instruction scheduling
  • Question: What loop transformations to apply and
    in what order?
    • Unimodular (e.g. permutation)?
    • Loop unrolling?
    • Both?
    • Others?

SUBROUTINE nest (a, b, c)
REAL*8 a(1000)
REAL*8 b(1000, 1000), c(1000)
DO j = 1, 1000
  DO i = 1, 1000
    a(j) = a(j) + b(j, i) * c(j)
  END DO
END DO
END

23
Motivating Example (Cont'd)
Original:
SUBROUTINE nest (a, b, c)
REAL*8 a(1000)
REAL*8 b(1000, 1000), c(1000)
DO j = 1, 1000
  DO i = 1, 1000
    a(j) = a(j) + b(j, i) * c(j)
  END DO
END DO
END

Transformed:
DO i = 1, 1000, 4
  DO j = 1, 1000
    a(j) = a(j) + b(j, i) * c(j)
    a(j) = a(j) + b(j, i+1) * c(j)
    a(j) = a(j) + b(j, i+2) * c(j)
    a(j) = a(j) + b(j, i+3) * c(j)
  END DO
END DO
END

Question: Is the above a good combination?
  • Loop interchange
  • Outer loop unrolling
  • Inner loop fusion
Why do this? (Cache effect? Number of loads/stores?
Register allocation?)
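That the transformed nest computes the same result can be checked with a small Python sketch. It assumes the statement is a(j) = a(j) + b(j, i) * c(j) (the operators are missing from the transcript, so "=", "+" and "*" are an assumption), uses 0-based indices, a small size divisible by 4 instead of 1000, and integer data so that equality is exact. Function names are hypothetical.

```python
def nest_original(a, b, c):
    """j outer, i inner, as in the original subroutine."""
    a = list(a)
    n = len(a)
    for j in range(n):
        for i in range(n):
            a[j] = a[j] + b[j][i] * c[j]
    return a


def nest_transformed(a, b, c):
    """Interchanged, outer loop i unrolled by 4, the four inner loops fused."""
    a = list(a)
    n = len(a)
    assert n % 4 == 0, "sketch assumes the trip count is a multiple of 4"
    for i in range(0, n, 4):
        for j in range(n):
            a[j] = a[j] + b[j][i] * c[j]
            a[j] = a[j] + b[j][i + 1] * c[j]
            a[j] = a[j] + b[j][i + 2] * c[j]
            a[j] = a[j] + b[j][i + 3] * c[j]
    return a


n = 8
a0 = [j + 1 for j in range(n)]
b0 = [[(3 * j + 5 * i) % 7 for i in range(n)] for j in range(n)]
c0 = [2 * j - 3 for j in range(n)]

print(nest_original(a0, b0, c0) == nest_transformed(a0, b0, c0))
```

Each a(j) still accumulates its terms in increasing order of i, so the transformation preserves the result. In Fortran's column-major layout, making j the innermost loop also means b(j, i) is walked through contiguous memory, and the four fused statements reuse a(j) and c(j) within one iteration, which relates to the cache and load/store questions the slide raises.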
24
What We Need?
  • A good cost model
  • A way to enumerate the space of possible loop
    transformations
  • An intelligent way to search through the space
  • Modularity of each individual transformation, so
    as to facilitate their combination
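As a toy illustration of how these ingredients fit together, the Python sketch below enumerates all loop permutations of a nest, keeps those that leave every dependence distance vector lexicographically non-negative, and picks the cheapest under a deliberately simplistic cost model. The cost model (prefer the loop that walks contiguous memory in the innermost position) and all names are assumptions for this sketch, not a method taken from the slides; an exhaustive scan stands in for an intelligent search.

```python
from itertools import permutations


def lex_nonnegative(d):
    """True if d is all zero or its first nonzero component is positive."""
    for component in d:
        if component != 0:
            return component > 0
    return True


def is_legal(perm, distances):
    """A permutation is legal if every permuted distance vector stays valid."""
    return all(lex_nonnegative(tuple(d[k] for k in perm)) for d in distances)


def toy_cost(perm, stride_one_loop):
    """Hypothetical cost: 0 if the contiguous-memory loop is innermost, else 1."""
    return 0 if perm[-1] == stride_one_loop else 1


def best_order(n_loops, distances, stride_one_loop):
    """Exhaustively enumerate permutations and return the cheapest legal one."""
    legal = [p for p in permutations(range(n_loops)) if is_legal(p, distances)]
    return min(legal, key=lambda p: toy_cost(p, stride_one_loop))


# Loops are numbered in original order: 0 = outer, 1 = inner.
# A reduction carried by the inner loop has distance (0, 1).
print(best_order(2, [(0, 1)], stride_one_loop=0))
# A dependence with distance (1, -1) forbids interchange regardless of cost.
print(best_order(2, [(1, -1)], stride_one_loop=0))
```

Because each transformation is expressed only through its effect on the loop order and on the distance vectors, further transformations could be added to the enumeration without changing the legality test or the search.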

25
It is still an open research problem