CMPUT680 - Winter 2006 - PowerPoint PPT Presentation

1
CMPUT680 - Winter 2006
  • Topic B: Loop Restructuring
  • José Nelson Amaral
  • http://www.cs.ualberta.ca/~amaral/courses/680

2
Reading
Wolfe, Michael. High Performance Compilers for Parallel Computing. Addison-Wesley, 1996, Chapter 9.
Allen, Randy and Kennedy, Ken. Optimizing Compilers for Modern Architectures. Morgan Kaufmann, 2002, Chapter 8.
3
Unswitching
Remove loop-independent conditionals from a loop.
4
Unswitching
Constraints: The conditional tested must be completely independent of the loop.
Legality: It is always legal.
Advantages: Reduces the frequency of execution of the conditional statement.
Disadvantages: The loop structure is more complex. Code size expansion. Might prevent data reuse.
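A minimal Python sketch of unswitching, assuming a hypothetical loop whose only conditional tests a flag `neg` that the loop never modifies (Python and all names are illustrative choices, not taken from the slides). The test is hoisted and the loop is copied into each branch, which is where the code-size expansion comes from:

```python
def original(a, neg):
    out = []
    for x in a:
        # `neg` never changes inside the loop, yet it is tested on every iteration
        if neg:
            out.append(-x)
        else:
            out.append(x)
    return out


def unswitched(a, neg):
    out = []
    # The invariant test runs once; each branch gets its own copy of the loop
    if neg:
        for x in a:
            out.append(-x)
    else:
        for x in a:
            out.append(x)
    return out


# Usage: both versions agree for either value of the flag
for flag in (True, False):
    assert original([3, -1, 4], flag) == unswitched([3, -1, 4], flag)
```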
5
Loop Peeling
Move the first (or last) iteration of the loop out into separate code.
6
Loop Peeling
Constraints: If the compiler does not know that the trip count is always positive, the peeled code must be protected by a zero-trip test.
Advantages: Used to enable loop fusion or to remove conditionals on the index variable from inside the loop. Allows loop-invariant code to be executed only in the first iteration.
Disadvantage: Code size expansion.
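A small Python sketch of peeling the first iteration, assuming a hypothetical loop that special-cases `i == 1`; 1-based arrays are emulated by leaving index 0 unused. The peeled copy sits behind the zero-trip test described above:

```python
def original(a, n):
    total = 0
    for i in range(1, n + 1):
        if i == 1:               # conditional on the index variable
            total = a[i] * 10
        else:
            total += a[i]
    return total


def peeled(a, n):
    total = 0
    if n >= 1:                   # zero-trip test: the peeled copy must not run if N < 1
        total = a[1] * 10        # peeled first iteration, no test on i needed
        for i in range(2, n + 1):
            total += a[i]        # remaining iterations are branch-free
    return total
```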
7
Index Set Splitting
Divides the index set of a loop into two portions, each executed by its own loop.
8
Index Set Splitting
Advantages: Used to enable loop fusion or to remove conditionals that test the index variable from inside the loop.
Disadvantage: Code size expansion.
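A Python sketch of index set splitting, assuming a hypothetical loop that tests `i <= m` (arrays 1-based, index 0 unused). The split point is clamped to [0, N] so the two loops together still cover exactly 1..N:

```python
def original(a, b, n, m):
    for i in range(1, n + 1):
        if i <= m:               # test on the index variable
            a[i] = b[i] + 1
        else:
            a[i] = b[i] - 1
    return a


def split(a, b, n, m):
    k = max(0, min(m, n))        # clamp the split point into the index set
    for i in range(1, k + 1):    # first portion: 1..k, no test needed
        a[i] = b[i] + 1
    for i in range(k + 1, n + 1):  # second portion: k+1..N
        a[i] = b[i] - 1
    return a
```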
9
Scalar Expansion
In the following loop, the scalar variable T creates (1) a flow dependence from the first to the second assignment, and (2) a loop-carried anti-dependence from the second to the first assignment. This anti-dependence can prevent some loop transformations.
for i = 1 to N do
    T = A[i] + B[i]
    C[i] = T + 1/T
endfor
10
Scalar Expansion
Breaks anti-dependence relations by expanding, or promoting, a scalar into an array.
11
Scalar Expansion
Constraints: The loop must be countable and the scalar must have no upward-exposed uses.
If the scalar is live on loop exit, the last value assigned in the array must be copied into the scalar upon exit.
Flow dependences for the scalar in the loop must be loop independent.
Advantages: Eliminates anti-dependences and output dependences.
Disadvantage: In nested loops the size of the array might be prohibitive.
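A Python sketch of scalar expansion on the T loop, assuming its stripped operators are T = A[i] + B[i] and C[i] = T + 1/T, and that T starts at 0.0. The promoted array `tx` (a hypothetical name) gives each iteration its own copy of T, and the last element is copied back because T may be live on exit:

```python
def original(a, b, n):
    c = [0.0] * (n + 1)            # 1-based: index 0 unused
    t = 0.0
    for i in range(1, n + 1):
        t = a[i] + b[i]            # writes T
        c[i] = t + 1 / t           # reads T; the next iteration overwrites it (anti-dependence)
    return c, t


def expanded(a, b, n):
    c = [0.0] * (n + 1)
    tx = [0.0] * (n + 1)           # T promoted to an array: one slot per iteration
    for i in range(1, n + 1):
        tx[i] = a[i] + b[i]
        c[i] = tx[i] + 1 / tx[i]   # iterations no longer share storage
    t = tx[n] if n >= 1 else 0.0   # copy the last value back into the scalar on exit
    return c, t
```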
12
Loop Fusion
Takes two adjacent loops and generates a single loop.
Before Loop Fusion:
(1) for i = 1 to N do
(2)     A[i] = B[i] + 1
(3) endfor
(4) for i = 1 to N do
(5)     C[i] = A[i] / 2
(6) endfor
(7) for i = 1 to N do
(8)     D[i] = 1 / C[i+1]
(9) endfor
But is this fusion legal?
13
Loop Fusion
Assume N = 4
Before Loop Fusion:
(1) for i = 1 to N do
(2)     A[i] = B[i] + 1
(3) endfor
(4) for i = 1 to N do
(5)     C[i] = A[i] / 2
(6) endfor
(7) for i = 1 to N-1 do
(8)     D[i] = 1 / C[i+1]
(9) endfor
14
Loop Fusion
Assume N = 4
[Figure: contents of arrays A, C, and D after the first, second, and third loops]

15
Loop Fusion
To be legal, a loop fusion must preserve all
the dependence relations of the original loops.
16
Loop Fusion
The original loops have the flow dependences S2 δf S5 and S5 δf S8.
In the fused loop, S8 in iteration i reads C[i+1] before S5 writes it in iteration i+1, so the dependences become S2 δf S5 and S8 δa S5.
Fusion reversed the dependence between S5 and S8! Thus it is illegal.
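A Python sketch that makes the reversed dependence visible, assuming the N-1 bound of the N = 4 example and assuming (hypothetically) that C holds the stale value 1.0 before the loops run, so a premature read shows up without a division by zero. Fusing all three loops makes S8 read those stale values:

```python
def unfused(b, n):
    a = [0.0] * (n + 1)
    c = [1.0] * (n + 1)   # hypothetical stale contents of C before the loops run
    d = [0.0] * (n + 1)
    for i in range(1, n + 1):
        a[i] = b[i] + 1          # S2
    for i in range(1, n + 1):
        c[i] = a[i] / 2          # S5
    for i in range(1, n):        # S8, N-1 bound keeps C[i+1] in range
        d[i] = 1 / c[i + 1]
    return d


def fused_all(b, n):
    a = [0.0] * (n + 1)
    c = [1.0] * (n + 1)
    d = [0.0] * (n + 1)
    for i in range(1, n + 1):
        a[i] = b[i] + 1          # S2
        c[i] = a[i] / 2          # S5
        if i <= n - 1:           # S8 keeps its original N-1 bound
            d[i] = 1 / c[i + 1]  # reads C[i+1] before S5 writes it in iteration i+1
    return d


b = [0.0, 1.0, 2.0, 3.0, 4.0]    # B[i] = i, index 0 unused
print(unfused(b, 4))    # D[1..3] = 1/1.5, 1/2, 1/2.5
print(fused_all(b, 4))  # D[1..3] = 1.0: every read saw the stale value
```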
17
Loop Fusion
Takes two adjacent loops and generates a single loop.
Before Loop Fusion:
(1) for i = 1 to N do
(2)     A[i] = B[i] + 1
(3) endfor
(4) for i = 1 to N do
(5)     C[i] = A[i] / 2
(6) endfor
(7) for i = 1 to N do
(8)     D[i] = 1 / C[i+1]
(9) endfor
After Loop Fusion:
(1) for i = 1 to N do
(2)     A[i] = B[i] + 1
(5)     C[i] = A[i] / 2
(6) endfor
(7) for i = 1 to N do
(8)     D[i] = 1 / C[i+1]
(9) endfor
This is a legal fusion!
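A Python check of the legal fusion, under the same assumptions as the N = 4 example (N-1 bound for the third loop, hypothetical stale value 1.0 in C). Only the first two loops are fused, so S2 δf S5 stays loop independent and the third loop still starts after every C[i] is final:

```python
def run(b, n, fuse_first_two):
    a = [0.0] * (n + 1)
    c = [1.0] * (n + 1)   # hypothetical stale contents of C
    d = [0.0] * (n + 1)
    if fuse_first_two:
        for i in range(1, n + 1):
            a[i] = b[i] + 1      # S2
            c[i] = a[i] / 2      # S5 reads A[i] written earlier in the same iteration
    else:
        for i in range(1, n + 1):
            a[i] = b[i] + 1
        for i in range(1, n + 1):
            c[i] = a[i] / 2
    # The third loop stays separate, so it only runs once all of C is computed
    for i in range(1, n):        # N-1 bound keeps C[i+1] in range
        d[i] = 1 / c[i + 1]
    return a, c, d


b = [0.0, 1.0, 2.0, 3.0, 4.0]
assert run(b, 4, True) == run(b, 4, False)
```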
18
Loop Fusion
Initially, only data-independent loops were fused.
Now we also try to fuse data-dependent loops, to increase data locality and benefit from caches.
Loop fusion increases the size of the loop body, which reduces instruction temporal locality (a problem only in machines with tiny instruction caches).
Larger loop bodies enable more effective scalar optimizations (common subexpression elimination and instruction scheduling).
19
Loop Fusion (Complications)
To be fused, two loops must be compatible, i.e.:
(1) they iterate the same number of times;
(2) they are adjacent or can be reordered to become adjacent;
(3) the compiler must be able to use the same induction variable in both loops.
Compilers use other transformations to make loops meet the conditions above.
20
Loop Fusion (Another Example)
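Original:
(1) for i = 1 to 99 do
(2)     A[i] = B[i] + 1
(3) endfor
(4) for i = 1 to 98 do
(5)     C[i] = A[i+1] * 2
(6) endfor
After peeling the first iteration of the first loop:
(2) A[1] = B[1] + 1
(1) for i = 2 to 99 do
(2)     A[i] = B[i] + 1
(3) endfor
(4) for i = 1 to 98 do
(5)     C[i] = A[i+1] * 2
(6) endfor
Both loops now iterate 98 times, so they become compatible for fusion once the index of one of them is shifted by one.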
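A Python sketch of where peeling leads, assuming the second loop reads C[i] = A[i+1] * 2: after peeling A[1], the second loop is shifted by one (it writes C[i-1] and reads A[i]) so both bodies share the induction variable i = 2..99. The function names are hypothetical:

```python
N_A, N_C = 99, 98


def original(b):
    a = [0] * (N_A + 1)             # 1-based, index 0 unused
    c = [0] * (N_C + 1)
    for i in range(1, N_A + 1):
        a[i] = b[i] + 1
    for i in range(1, N_C + 1):
        c[i] = a[i + 1] * 2
    return a, c


def peeled_and_fused(b):
    a = [0] * (N_A + 1)
    c = [0] * (N_C + 1)
    a[1] = b[1] + 1                 # peeled first iteration of the A loop
    for i in range(2, N_A + 1):     # 98 iterations, same trip count as the C loop
        a[i] = b[i] + 1
        c[i - 1] = a[i] * 2         # shifted C loop reads the A[i] just written
    return a, c
```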
21
Loop Fission (or Loop Distribution)
Breaks a loop into two or more smaller loops.
Original Loop:
(1) for i = 1 to N do
(2)     A[i] = A[i] + B[i-1]
(3)     B[i] = C[i-1]*X + Z
(4)     C[i] = 1/B[i]
(5)     D[i] = sqrt(C[i])
(6) endfor
22
Loop Fission (or Loop Distribution)
Breaks a loop into two or more smaller loops.
Original Loop:
(1) for i = 1 to N do
(2)     A[i] = A[i] + B[i-1]
(3)     B[i] = C[i-1]*X + Z
(4)     C[i] = 1/B[i]
(5)     D[i] = sqrt(C[i])
(6) endfor
After Loop Fission:
(1) for ib = 0 to N-1 do
(3)     B[ib+1] = C[ib]*X + Z
(4)     C[ib+1] = 1/B[ib+1]
(6) endfor
(1) for ib = 0 to N-1 do
(2)     A[ib+1] = A[ib+1] + B[ib]
(6) endfor
(1) for ib = 0 to N-1 do
(5)     D[ib+1] = sqrt(C[ib+1])
(6) endfor
(1) i = N+1
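A Python check that the distributed loops compute the same arrays as the original loop. Index 0 of A, B, and C holds hypothetical initial values, and X = Z = 1.0 are arbitrary choices that keep C positive for `sqrt`:

```python
import math


def original(a, b, c, n, x, z):
    a, b, c = a[:], b[:], c[:]     # work on copies; index 0 holds initial values
    d = [0.0] * (n + 1)
    for i in range(1, n + 1):
        a[i] = a[i] + b[i - 1]     # S2
        b[i] = c[i - 1] * x + z    # S3
        c[i] = 1 / b[i]            # S4
        d[i] = math.sqrt(c[i])     # S5
    return a, b, c, d


def distributed(a, b, c, n, x, z):
    a, b, c = a[:], b[:], c[:]
    d = [0.0] * (n + 1)
    for ib in range(n):            # S3 and S4 form a recurrence, so they stay together
        b[ib + 1] = c[ib] * x + z
        c[ib + 1] = 1 / b[ib + 1]
    for ib in range(n):            # S2 only needs B, which is now fully computed
        a[ib + 1] = a[ib + 1] + b[ib]
    for ib in range(n):            # S5 only needs C
        d[ib + 1] = math.sqrt(c[ib + 1])
    return a, b, c, d
```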
23
Loop Fission (or Loop Distribution)
All statements that form a strongly connected component in the original loop's dependence graph must remain in the same loop after fission.
When finding strongly connected components for loop fission, the compiler can ignore loop-carried anti-dependences and output dependences for scalars that are expanded by loop fission.
24
Loop Fission (or Loop Distribution)
To find a legal order of the loops after fission,
we compute the acyclic condensation of the
dependence graph.
[Figure: acyclic condensation of the dependence graph, with S3 and S4 collapsed into the single node S3-S4]
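A Python sketch of the condensation step using Kosaraju's algorithm (one standard SCC algorithm; the slides do not say which one a compiler uses). The edge list is our own reading of the dependences in the fission example and should be treated as an assumption:

```python
from collections import defaultdict

# Hypothetical encoding of the fission example: edge u -> v means v depends on u
EDGES = [
    ("S3", "S4"),  # flow: B[i] written by S3, read by S4
    ("S4", "S3"),  # loop-carried flow: C[i-1] read by S3
    ("S3", "S2"),  # loop-carried flow: B[i-1] read by S2
    ("S4", "S5"),  # flow: C[i] read by S5
]
NODES = ["S2", "S3", "S4", "S5"]


def condensation_order(nodes, edges):
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1: order vertices by DFS finishing time on the original graph
    seen, finish = set(), []

    def dfs1(u):
        seen.add(u)
        for v in succ[u]:
            if v not in seen:
                dfs1(v)
        finish.append(u)

    for u in nodes:
        if u not in seen:
            dfs1(u)

    # Pass 2: DFS on the reversed graph in decreasing finishing time.
    # Each tree is one SCC, and SCCs are emitted in a topological order
    # of the acyclic condensation, i.e. a legal order for the new loops.
    assigned, sccs = set(), []

    def dfs2(u, comp):
        assigned.add(u)
        comp.append(u)
        for v in pred[u]:
            if v not in assigned:
                dfs2(v, comp)

    for u in reversed(finish):
        if u not in assigned:
            comp = []
            dfs2(u, comp)
            sccs.append(sorted(comp))
    return sccs


print(condensation_order(NODES, EDGES))  # → [['S3', 'S4'], ['S5'], ['S2']]
```

S2 and S5 depend only on the S3-S4 node, so either may follow it; the fission slide emits the S2 loop second.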
25
Loop Fission (or Loop Distribution)
Uses of loop fission:
- it can improve cache use in machines with very small caches;
- it can be required for other transformations, such as loop interchanging.
26
Loop Reversal
Run a loop backward. All dependence directions are reversed. It is only legal for loops that have no loop-carried dependences.
Can be used to allow fusion:
Original loops:
(1) for i = 1 to N do
(2)     A[i] = B[i] + 1
(3)     C[i] = A[i]/2
(4) endfor
(5) for i = 1 to N do
(6)     D[i] = 1/C[i+1]
(7) endfor
After reversal:
(1) for i = N downto 1 do
(2)     A[i] = B[i] + 1
(3)     C[i] = A[i]/2
(4) endfor
(5) for i = N downto 1 do
(6)     D[i] = 1/C[i+1]
(7) endfor
After reversal and fusion:
(1) for i = N downto 1 do
(2)     A[i] = B[i] + 1
(3)     C[i] = A[i]/2
(6)     D[i] = 1/C[i+1]
(7) endfor
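A Python check that the reversed, fused loop matches the original pair of loops. C gets one extra slot because D[N] reads C[N+1], which neither version writes; the initial value 1.0 in C is a hypothetical choice:

```python
def original(b, n):
    a = [0.0] * (n + 2)
    c = [1.0] * (n + 2)          # hypothetical initial contents; C[n+1] is read, never written
    d = [0.0] * (n + 1)
    for i in range(1, n + 1):
        a[i] = b[i] + 1
        c[i] = a[i] / 2
    for i in range(1, n + 1):
        d[i] = 1 / c[i + 1]
    return a, c, d


def reversed_fused(b, n):
    a = [0.0] * (n + 2)
    c = [1.0] * (n + 2)
    d = [0.0] * (n + 1)
    for i in range(n, 0, -1):    # i = N downto 1
        a[i] = b[i] + 1
        c[i] = a[i] / 2
        d[i] = 1 / c[i + 1]      # C[i+1] was written in the previous (higher) iteration
    return a, c, d
```

Fusing the forward loops instead would read C[i+1] before it is written, which is the illegal case from the fusion slides.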
27
Loop Interchanging
Reverses the nesting order of nested loops.
If the outer loop iterates many times and the inner loop iterates only a few times, interchanging reduces the startup cost of the original inner loop.
Interchanging can change the spatial locality of memory references.
28
Loop Interchanging
Original:
(1) for j = 2 to M do
(2)     for i = 1 to N do
(3)         A[i,j] = A[i,j-1] + B[i,j]
(4)     endfor
(5) endfor
After interchanging:
(1) for i = 1 to N do
(2)     for j = 2 to M do
(3)         A[i,j] = A[i,j-1] + B[i,j]
(4)     endfor
(5) endfor
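A Python check that interchanging this nest preserves the result: the only dependence is carried by j (A[i,j] reads A[i,j-1]), so either nesting order respects it. Which order has better spatial locality depends on the array layout: with column-major storage (Fortran), i innermost walks contiguous memory; with row-major storage (C), j innermost does. The `make` helper and its data are hypothetical:

```python
def make(n, m):
    # (n+1) x (m+1) arrays so indices can start at 1
    a = [[0] * (m + 1) for _ in range(n + 1)]
    b = [[i * j for j in range(m + 1)] for i in range(n + 1)]
    for i in range(1, n + 1):
        a[i][1] = i              # initial column read by the j = 2 iterations
    return a, b


def j_outer(a, b, n, m):
    a = [row[:] for row in a]
    for j in range(2, m + 1):
        for i in range(1, n + 1):
            a[i][j] = a[i][j - 1] + b[i][j]
    return a


def i_outer(a, b, n, m):
    a = [row[:] for row in a]
    for i in range(1, n + 1):
        for j in range(2, m + 1):
            a[i][j] = a[i][j - 1] + b[i][j]
    return a
```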
29
Other Loop Restructuring
Loop Skewing: Unnormalize iteration vectors to change the shape of the iteration space to allow loop interchanging.
Strip Mining: Decompose a single loop into two nested loops (the inner loop computes a strip of the data computed by the original loop). Used for vector processors.
Loop Tiling: The loop space is divided into tiles, with the tile boundaries parallel to the iteration space axes.
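A Python sketch of strip mining on a hypothetical loop that doubles every element. The strip length is a parameter; on a vector machine it would typically match the vector register length, and the value 4 used below is arbitrary. Tiling applies the same decomposition to several loops of a nest and then interchanges the resulting loops:

```python
def strip_mined(a, strip):
    out = list(a)
    n = len(out)
    strips = []
    for start in range(0, n, strip):        # outer loop: one iteration per strip
        end = min(start + strip, n)         # the last strip may be shorter
        strips.append((start, end))
        for i in range(start, end):         # inner loop: at most `strip` elements
            out[i] = 2 * out[i]
    return out, strips


result, strips = strip_mined(range(10), 4)
print(strips)  # → [(0, 4), (4, 8), (8, 10)]
```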