Code Generation

Slides: 103

Provided by: deep97

1
Code Generation
2
Introduction
Figure: Position of code generator. Source program -> Front end -> Intermediate code -> Code optimizer -> Intermediate code -> Code generator -> Target program; all three phases consult the symbol table.
3

The target program generated must preserve the
semantic meaning of the source program and be of
high quality.
It must make effective use of the available
resources of the target machine.
The code generator itself must run efficiently.

4
A code generator has 3 primary tasks

Instruction Selection
Register Allocation and Assignment
Instruction Ordering

5
Issues in the Design of a Code Generator

The most important criterion for a code generator is
that it produces correct code.
The design depends on
Input to the code generator (IR)
Target program (language)
Operating System
Memory management
Instruction Selection
Register allocation and assignment
Evaluation order

6
Issues in the Design of a Code Generator: 1) Input to the Code Generator

We assume that
the front end has scanned, parsed and translated the source program into a reasonably detailed intermediate representation (IR);
type checking, type conversion and detection of obvious semantic errors have already been done;
the symbol table can provide the run-time addresses of the data objects.
Intermediate representations may be
Three-address representations: quadruples, triples, indirect triples
Linear representations: postfix notation
Virtual machine representations: bytecode, stack machine code
Graphical representations: syntax trees, DAGs

7
Issues in the Design of a Code Generator: 2) Target Programs (1)

The instruction-set architecture of the target
machine has a significant impact on the
difficulty of constructing a good code generator
that produces high-quality machine code.
The most common target-machine architectures are
RISC
CISC
Stack-based.

8
Issues in the Design of a Code Generator: 2) Target Programs (2) RISC machine

many registers,
three-address instructions,
simple addressing modes,
and a relatively simple instruction-set
architecture.

9
Issues in the Design of a Code Generator: 2) Target Programs (3) CISC machine

few registers,
two-address instructions,
a variety of addressing modes,
several register classes,
variable-length instructions,
and instructions with side effects.

10
Issues in the Design of a Code Generator: 2) Target Programs (4) Stack-based machine

Operations are done by pushing operands onto a
stack and then performing the operations on the
operands at the top of the stack.
To achieve high performance the top of the stack
is typically kept in registers.
Stack-based machines almost disappeared because
it was felt that the stack organization was too
limiting and required too many swap and copy
operations.

11
Issues in the Design of a Code Generator: 2) Target Programs (5) Stack-based machine

Stack-based architectures were revived with the introduction of the Java Virtual Machine (JVM).
The JVM is a software interpreter for Java bytecodes, an intermediate language produced by Java compilers.
The interpreter provides software compatibility across multiple platforms.
Just-in-time (JIT) compilers translate bytecodes during run time to the native hardware instruction set of the target machine.

12
Issues in the Design of a Code Generator: 2) Target Programs (6)

The output of the code generator is the target program.
The target program may be
Absolute machine language
It can be placed in a fixed location of memory and immediately executed.
Relocatable machine language
It allows subprograms to be compiled separately.
A set of relocatable object modules can be linked together and loaded for execution by a linker.
Assembly language
It makes the process of code generation somewhat easier.

13
Issues in the Design of a Code Generator: 2) Target Programs (7) Assumptions in this chapter

A very simple RISC machine
A few CISC-like addressing modes
For readability, assembly code as the target language

14
Issues in the Design of a Code Generator: 3) Instruction Selection (1)

The code generator must map the IR program into a code sequence that can be executed by the target machine.
The complexity of performing this mapping is determined by factors such as
the level of the IR
the nature of the instruction-set architecture
the desired quality of the generated code.

15
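Issues in the Design of a Code Generator: 3) Instruction Selection (2) 1) Level of the IR

If the IR is high level,
the code generator may translate each IR statement into a sequence of machine instructions using code templates; such statement-by-statement code generation often produces poor code that needs further optimization.
If the IR is low level (it reflects details of the underlying machine),
the code generator can use this information to generate more efficient code sequences.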

16
Issues in the Design of a Code Generator: 3) Instruction Selection (3) 2) Nature of the instruction-set architecture

For example, the uniformity and completeness of
the instruction set are important factors.
If the target machine does not support each data
type in a uniform manner, then each exception to
the general rule requires special handling.
On some machines, for example, floating-point
operations are done using separate registers.

17
Issues in the Design of a Code Generator: 3) Instruction Selection (4) 3) The quality of the generated code

The quality of the generated code is determined by its speed and size.
For example, the three-address statement a = a + 1 can be translated into the sequence
LD R0, a
ADD R0, R0, #1
ST a, R0
Instead, if the target machine has an increment instruction (INC), we can write the single, more efficient instruction
INC a
We need to know instruction costs in order to design good code sequences,
but accurate cost information is often difficult to obtain.

18
Issues in the Design of a Code Generator: 3) Instruction Selection (5) 4) Instruction speeds and machine idioms

For example, every three-address statement of the form x = y + z, where x, y, and z are statically allocated, can be translated into the code sequence
LD R0, y
ADD R0, R0, z
ST x, R0
This strategy often produces redundant loads and stores.
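The statement-by-statement strategy is easy to mechanize. The Python sketch below is a minimal illustration under stated assumptions, not the method of any particular compiler: statements arrive as hypothetical (x, y, op, z) tuples of statically allocated names, the opcode table is assumed, and every statement uses R0, exactly as the template does.

```python
# Minimal sketch: statement-by-statement translation of x = y op z
# with the fixed template  LD R0, y / OP R0, R0, z / ST x, R0.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}  # assumed opcode names


def translate_naive(stmts):
    """Translate each (x, y, op, z) tuple independently of its neighbours."""
    code = []
    for x, y, op, z in stmts:
        code.append(f"LD R0, {y}")                 # load the first operand
        code.append(f"{OPCODES[op]} R0, R0, {z}")  # combine with the second
        code.append(f"ST {x}, R0")                 # store the result at once
    return code


if __name__ == "__main__":
    for line in translate_naive([("a", "b", "+", "c"), ("d", "a", "+", "e")]):
        print(line)
```

Because each statement is translated in isolation, nothing stops the generator from reloading a value it has just stored; running it on a = b + c followed by d = a + e yields the six-instruction sequence with the reload of a.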

19
Issues in the Design of a Code Generator: 3) Instruction Selection (6) 4) Instruction speeds and machine idioms (2)

For example, the sequence of three-address statements
a = b + c
d = a + e
can be translated as
LD R0, b
ADD R0, R0, c
ST a, R0
LD R0, a
ADD R0, R0, e
ST d, R0
Here, the fourth statement is redundant since it loads a value that has just been stored, and so is the third if a is not subsequently used.
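The redundant load in this example can be caught by a tiny peephole pass. The sketch below assumes a hypothetical textual format "ST x, Rn" / "LD Rn, x" with plain variable names, and straight-line code without labels, since a labeled load could be reached from elsewhere with a different value in the register.

```python
import re

# Assumed instruction syntax: "ST x, Rn" and "LD Rn, x" with simple names.
STORE = re.compile(r"^ST (\w+), (R\d+)$")
LOAD = re.compile(r"^LD (R\d+), (\w+)$")


def drop_redundant_loads(code):
    """Delete 'LD r, x' when it directly follows 'ST x, r'.

    After the store, register r still holds the value of x, so reloading
    it is redundant. Assumes straight-line code (the load is never the
    target of a jump)."""
    out = []
    for instr in code:
        load = LOAD.match(instr)
        prev = STORE.match(out[-1]) if out else None
        # load.groups() is (register, name); the store has (name, register).
        if load and prev and load.groups() == (prev.group(2), prev.group(1)):
            continue  # r already contains x
        out.append(instr)
    return out


if __name__ == "__main__":
    code = ["LD R0, b", "ADD R0, R0, c", "ST a, R0",
            "LD R0, a", "ADD R0, R0, e", "ST d, R0"]
    print(drop_redundant_loads(code))
```

Deleting the store ST a, R0 as well would require knowing that a is dead afterwards, which a two-instruction window cannot see; that needs next-use information.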

20
Issues in the Design of a Code Generator: 4) Register allocation (1)

A key problem in code generation is deciding what values to hold in what registers.
Instructions involving register operands are usually shorter and faster than those involving operands in memory, which are larger and comparatively slow.
Efficient utilization of registers is therefore particularly important in code generation.
The use of registers is subdivided into two subproblems:
register allocation, during which we select the set of variables that will reside in registers at each point in the program;
register assignment, during which we pick the specific register that a variable will reside in.

21
Issues in the Design of a Code Generator: 4) Register allocation (2)

For example, certain machines require register pairs for some operands and results.
M x, y: the multiplication instruction,
where x, the multiplicand, is the even register of an even/odd register pair and
y, the multiplier, is the odd register.
The product occupies the entire even/odd register pair.

22
Issues in the Design of a Code Generator: 4) Register allocation (3)

D x, y: the division instruction,
where the dividend occupies an even/odd register pair whose even register is x, and
the divisor is y.
After division, the even register holds the remainder and the odd register the quotient.

23
Issues in the Design of a Code Generator: 4) Register allocation (4)

Now, consider the two three-address code sequences below, in which the only difference is the operator in the second statement.
(a)              (b)
t = a + b        t = a + b
t = t * c        t = t + c
t = t / d        t = t / d

24
Issues in the Design of a Code Generator: 4) Register allocation (5)

The shortest assembly-code sequences for (a) and (b) are
(a)              (b)
L   R1, a        L    R0, a
A   R1, b        A    R0, b
M   R0, c        A    R0, c
D   R0, d        SRDA R0, 32
ST  R1, t        D    R0, d
                 ST   R1, t
where SRDA stands for Shift-Right-Double-Arithmetic, and
SRDA R0, 32 shifts the dividend into R1 and clears R0 so that all its bits equal the sign bit.

25
Issues in the Design of a Code Generator: 5) Evaluation order

The order in which computations are performed affects the efficiency of the target code.
Some computation orders require fewer registers to hold intermediate results than others.
Picking a best order in the general case is a
difficult NP-complete problem.
Initially, we shall avoid the problem by
generating code for the three-address statements
in the order in which they have been produced by
the intermediate code generator.

26
The Target Language

Familiarity with the target machine and its instruction set is a
prerequisite for designing a good code generator.
In this chapter, we shall use as a target
language assembly code for a simple computer that
is representative of many register machines.

27
A Simple Target Machine Model

It is a three-address machine with load and store operations, computation operations, jump operations, and conditional jumps.
It is a byte-addressable machine with n general-purpose registers, R0, R1, ..., Rn-1.
It has a very limited set of instructions.
Assume that all operands are integers.
A label may precede an instruction.
Most instructions consist of an operator, followed by a target, followed by a list of source operands.

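We assume the following kinds of instructions are available:
Load operations: LD dst, addr loads the value in location addr into location dst (the assignment dst = addr).
LD r, x loads the value in location x into register r.
LD r1, r2 copies the contents of register r2 into register r1.
Store operations: ST x, r stores the value in register r into location x (the assignment x = r).

Computation operations: OP dst, src1, src2, e.g.
SUB r1, r2, r3 computes r1 = r2 - r3.
Unconditional jumps: BR L causes control to branch to the machine instruction with label L.
Conditional jumps: Bcond r, L, where r is a register and L is a label, e.g.
BLTZ r, L jumps to label L if the value in register r is less than zero, and otherwise continues with the next instruction.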

30
Assume the target machine has a variety of addressing modes:

1. Direct memory addressing - a location can be a variable name x, referring to the memory location that is reserved for x.
2. Indexed addressing - useful in accessing arrays.
A location of the form a(r), where a is a variable and r is a register, refers to the memory location whose address is the address of a plus the contents of r.
For example, the instruction LD R1, a(R2) has the effect
R1 = contents(a + contents(R2))

3. Register indexed addressing - useful for pointers.
For example, LD R1, 100(R2) has the effect
R1 = contents(100 + contents(R2))
4. Indirect addressing -
*r means the memory location found in the location represented by the contents of register r, and
*100(r) means the memory location found in the location obtained by adding 100 to the contents of r.
For example, LD R1, *100(R2) has the effect
R1 = contents(contents(100 + contents(R2)))

5. Immediate addressing
The constant is prefixed by #.
LD R1, #100 loads the integer 100 into register R1, and
ADD R1, R1, #100 adds the integer 100 to register R1.

The three-address statement x = y - z can be
implemented by the machine instructions
LD R1, y
LD R2, z
SUB R1, R1, R2
ST x, R1

Suppose a is an array whose elements are 8-byte values, perhaps real numbers.
Also assume elements of a are indexed starting at 0.
We may execute the three-address instruction b = a[i] by the machine instructions
LD R1, i // R1 = i
MUL R1, R1, 8 // R1 = R1 * 8
LD R2, a(R1) // R2 = contents(a + contents(R1))
ST b, R2 // b = R2

The assignment into the array a represented by the three-address instruction
a[j] = c is implemented by
LD R1, c // R1 = c
LD R2, j // R2 = j
MUL R2, R2, 8 // R2 = R2 * 8
ST a(R2), R1 // contents(a + contents(R2)) = R1
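Both array patterns scale the index by the element width and then use the indexed mode a(r). The generator below is a hypothetical sketch with R1 and R2 hard-wired as in the examples (it performs no register allocation); the width parameter makes it reusable for arrays whose elements are not 8 bytes.

```python
def load_element(dst, array, index, width, ra="R1", rb="R2"):
    """Code for dst = array[index]; elements are `width` bytes, indexed from 0."""
    return [
        f"LD {ra}, {index}",           # ra = index
        f"MUL {ra}, {ra}, {width}",    # ra = index * width (byte offset)
        f"LD {rb}, {array}({ra})",     # rb = contents(array + contents(ra))
        f"ST {dst}, {rb}",             # dst = rb
    ]


def store_element(array, index, src, width, ra="R1", rb="R2"):
    """Code for array[index] = src."""
    return [
        f"LD {ra}, {src}",             # ra = src
        f"LD {rb}, {index}",           # rb = index
        f"MUL {rb}, {rb}, {width}",    # rb = index * width
        f"ST {array}({rb}), {ra}",     # contents(array + contents(rb)) = ra
    ]


if __name__ == "__main__":
    print(load_element("b", "a", "i", 8))
    print(store_element("a", "j", "c", 8))
```

With width=8 the two functions reproduce the sequences for b = a[i] and a[j] = c; passing a different width changes only the MUL operand.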

To implement a simple pointer indirection, such as the three-address statement
x = *p, we can use machine instructions like
LD R1, p // R1 = p
LD R2, 0(R1) // R2 = contents(0 + contents(R1))
ST x, R2 // x = R2

The assignment through a pointer *p = y is
similarly implemented in machine code by
LD R1, p // R1 = p
LD R2, y // R2 = y
ST 0(R1), R2 // contents(0 + contents(R1)) = R2

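A conditional-jump three-address instruction like
if x < y goto L
has a machine-code equivalent something like
LD R1, x
LD R2, y
SUB R1, R1, R2
BLTZ R1, M
Here, M is the label that represents the first machine instruction generated from the three-address instruction that has label L.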

39
Generate code for the following three-address statements, assuming a and b are arrays whose elements are 4-byte values.

x = a[i]
y = b[j]
a[i] = y
b[j] = x

40
Program and Instruction Costs

There are costs associated with compiling and running a program.
Some common cost measures are the length of compilation time and the size, running time and power consumption of the target program.
Determining the actual cost of compiling and running a program is a complex problem.

Finding an optimal target program for a given source program is an undecidable problem, and many of the subproblems involved are NP-hard.
In code generation we must often be content with heuristic techniques that produce good but not necessarily optimal target programs.

Assume each target-language instruction has an
associated cost.
For simplicity, we take the cost of an
instruction to be one plus the costs associated
with the addressing modes of the operands.
This cost corresponds to the length in words of
the instruction.

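Addressing modes involving
registers have zero additional cost, while those involving a
memory location or constant have an additional cost of one, because such operands have to be stored in the words following the instruction.
Some examples
LD R0, R1 cost 1
LD R0, M cost 2
LD R1, 100(R2) cost 2 (the constant 100 is stored in the word following the instruction)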

the cost of a target-language program on a given
input is the sum of costs of the individual
instructions executed when the program is run on
that input.
Good code-generation algorithms seek to minimize
the sum of the costs of the instructions executed
by the generated target program on typical inputs.
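The cost rule above can be written down directly. This is a sketch under two assumptions: instructions are plain strings such as "LD R0, M", and a register-indirect operand *r is treated like a register (no extra word), which the slides do not state explicitly.

```python
import re

REGISTER = re.compile(r"R\d+")


def operand_cost(operand):
    """0 for a register (or *r, assumed to need no extra word); 1 otherwise:
    memory names, constants and indexed/indirect forms such as 100(R2)
    occupy an extra word after the instruction."""
    op = operand.strip().lstrip("*")
    return 0 if REGISTER.fullmatch(op) else 1


def instruction_cost(instr):
    """Cost = 1 for the instruction word + the cost of each operand."""
    _opcode, _, rest = instr.strip().partition(" ")
    operands = [o for o in rest.split(",") if o.strip()]
    return 1 + sum(operand_cost(o) for o in operands)


def program_cost(instrs):
    """Cost of a straight-line run: the sum over the instructions executed."""
    return sum(instruction_cost(i) for i in instrs)


if __name__ == "__main__":
    print(instruction_cost("LD R0, R1"))  # 1
    print(instruction_cost("LD R0, M"))   # 2
```

For loops and branches, program_cost would have to be applied to the sequence of instructions actually executed on a given input, not to the static listing.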

Determine the costs of the following instruction
sequence
LD R0, y
LD R1, z
ADD R0, R0, R1
ST x, R0

46
Basic Blocks and Flow Graphs
Basic block: a basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halting or branching except at the end.

Flow graph: a graph representation of three-address statements, called a flow graph.
Nodes in the flow graph represent computations (basic blocks).
Edges represent the flow of control.
It is used to do a better job of register allocation and instruction selection.

47
Basic Blocks (2)

Algorithm: Partitioning three-address instructions into basic blocks
Input: a sequence of three-address instructions.
Output: a list of the basic blocks for that sequence, in which each instruction is assigned to exactly one basic block.
Method:
We first determine the leaders (the first instructions of basic blocks):
1) The first instruction is a leader.
2) Any instruction that is the target of a conditional or unconditional goto is a leader.
3) Any instruction that immediately follows a conditional or unconditional goto instruction is a leader.
For each leader, its basic block consists of the
leader and all the instructions up to but not
including the next leader or the end of the
program.
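A compact version of the leader-finding algorithm, assuming (hypothetically) that instructions are strings and every jump contains the text goto (n), with n the 1-based number of the target instruction:

```python
import re

# Assumed syntax: a jump is any instruction containing "goto (n)",
# where n is the 1-based number of the target instruction.
JUMP = re.compile(r"goto \((\d+)\)")


def find_leaders(instrs):
    """Return the sorted 1-based numbers of the leaders in `instrs`."""
    leaders = {1}                                  # rule 1: first instruction
    for num, text in enumerate(instrs, start=1):
        m = JUMP.search(text)
        if m:
            leaders.add(int(m.group(1)))           # rule 2: target of a jump
            if num < len(instrs):
                leaders.add(num + 1)               # rule 3: follows a jump
    return sorted(leaders)


def partition(instrs):
    """Blocks as lists of instruction numbers: each block runs from its
    leader up to, but not including, the next leader or the end."""
    leaders = find_leaders(instrs)
    ends = leaders[1:] + [len(instrs) + 1]
    return [list(range(start, end)) for start, end in zip(leaders, ends)]


if __name__ == "__main__":
    # The identity-matrix intermediate code (instructions 1-17).
    prog = ["i = 1", "j = 1", "t1 = 10 * i", "t2 = t1 + j", "t3 = 8 * t2",
            "t4 = t3 - 88", "a[t4] = 0.0", "j = j + 1", "if j <= 10 goto (3)",
            "i = i + 1", "if i <= 10 goto (2)", "i = 1", "t5 = i - 1",
            "t6 = 88 * t5", "a[t6] = 1.0", "i = i + 1", "if i <= 10 goto (13)"]
    print(find_leaders(prog))   # [1, 2, 3, 10, 12, 13]
```

The three conditional jumps, to (3), (2) and (13), supply the rule-2 leaders, and the instructions after the first two jumps supply the rule-3 leaders 10 and 12.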

48
Basic Blocks(3)

Example: Consider the source code in which a 10 x 10 matrix a is converted into an identity matrix.
for i from 1 to 10 do
    for j from 1 to 10 do
        a[i, j] = 0.0;
for i from 1 to 10 do
    a[i, i] = 1.0;
In generating the intermediate code, we have assumed that the real-valued array elements take 8 bytes each, and that the matrix a is stored in row-major form.

49
Intermediate code to set a 10 x 10 matrix to an
identity matrix

1) i = 1
2) j = 1
3) t1 = 10 * i
4) t2 = t1 + j
5) t3 = 8 * t2
6) t4 = t3 - 88
7) a[t4] = 0.0
8) j = j + 1
9) if j <= 10 goto (3)
10) i = i + 1
11) if i <= 10 goto (2)
12) i = 1
13) t5 = i - 1
14) t6 = 88 * t5
15) a[t6] = 1.0
16) i = i + 1
17) if i <= 10 goto (13)

50
Basic Blocks (5)

The leaders are instructions-
1) By rule 1 of the algorithm
2) By rule 2 of the algorithm
3) By rule 2 of the algorithm
10) By rule 3 of the algorithm
12) By rule 3 of the algorithm
13) By rule 2 of the algorithm
We conclude that the leaders are instructions 1,
2, 3, 10, 12, and 13.

51
Basic Blocks(6)

The basic block of each leader contains all the
instructions from itself until just before the
next leader.
Thus, basic block B1 consists of instruction 1 alone,
B2 of instruction 2 alone,
B3 of instructions 3 to 9,
B4 of instructions 10 and 11,
B5 of instruction 12 alone, and
B6 of instructions 13 to 17.

52
Flow Graphs

Once an intermediate-code program is partitioned
into basic blocks, we represent the flow of
control between them by a flow graph.
The nodes of the flow graph are the basic blocks.
We add two nodes, called the entry and exit, that
do not correspond to executable intermediate
instructions.
There is an edge from the entry to the first
executable node of the flow graph, that is, to
the basic block that comes from the first
instruction of the intermediate code.
There is an edge to the exit from any basic
block that contains an instruction that could be
the last executed instruction of the program.
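Given the blocks, the edges follow from the last instruction of each block: an edge to the block that starts at the jump target, and a fall-through edge unless the jump is unconditional. A minimal sketch, with the block and jump encodings below as assumptions:

```python
def build_flow_graph(blocks, jumps, unconditional=frozenset()):
    """Successor lists for a flow graph.

    blocks: (first, last) instruction numbers of each block, in program
            order; blocks are numbered B1, B2, ... in that order.
    jumps: instruction number -> instruction its goto targets (always a
           leader, hence the first instruction of some block).
    unconditional: instructions whose goto is unconditional, so control
                   cannot fall through to the next block."""
    block_of = {first: b for b, (first, _last) in enumerate(blocks, start=1)}
    succ = {"ENTRY": [1]}              # entry -> block of the first instruction
    for b, (_first, last) in enumerate(blocks, start=1):
        targets = []
        if last in jumps:
            targets.append(block_of[jumps[last]])     # edge to the jump target
        if last not in unconditional:
            # Fall through to the next block, or leave the program.
            targets.append(b + 1 if b < len(blocks) else "EXIT")
        succ[b] = targets
    return succ


if __name__ == "__main__":
    blocks = [(1, 1), (2, 2), (3, 9), (10, 11), (12, 12), (13, 17)]
    print(build_flow_graph(blocks, {9: 3, 11: 2, 17: 13}))
```

For the identity-matrix example this gives B3 -> B3, B4; B4 -> B2, B5; and B6 -> B6, exit, with single fall-through successors for B1, B2 and B5.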

53
Flow Graphs(2)
54
Representation of Flow Graphs

Flow graphs, being quite ordinary graphs, can be
represented by any of the data structures
appropriate for graphs.
It is likely to be more efficient to create a
linked list of instructions for each basic block.

55
Loops

Since virtually every program spends most of its time executing its loops, it is especially important for a compiler to generate good code for loops.
Many code transformations depend upon the
identification of "loops" in a flow graph.

56
Loops(2)

We say that a set of nodes L in a flow graph is a
loop if
1. There is a node in L called the loop entry
with the property that no other node in L has a
predecessor outside L. That is, every path from
the entry of the entire flow graph to any node in
L goes through the loop entry.
2. Every node in L has a nonempty path,
completely within L, to the entry of L.
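The two conditions can be checked mechanically for a candidate set of nodes. The function below is an illustrative sketch (the successor-dictionary encoding is an assumption); when no node of L has an outside predecessor, it tries every node of L as the entry.

```python
def is_loop(nodes, succ):
    """nodes: a set L of flow-graph nodes; succ: node -> list of successors.
    Checks the two conditions of the loop definition above."""
    preds = {}
    for u, vs in succ.items():
        for v in vs:
            preds.setdefault(v, set()).add(u)

    def reaches(start, target):
        # Is there a nonempty path start -> ... -> target staying inside L?
        stack = [s for s in succ.get(start, []) if s in nodes]
        seen = set()
        while stack:
            n = stack.pop()
            if n == target:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(s for s in succ.get(n, []) if s in nodes)
        return False

    # Condition 1: only the entry may have a predecessor outside L.
    outside = [n for n in nodes if preds.get(n, set()) - nodes]
    if len(outside) > 1:
        return False
    candidates = outside or list(nodes)
    # Condition 2: every node of L has a nonempty path within L to the entry.
    return any(all(reaches(n, e) for n in nodes) for e in candidates)


if __name__ == "__main__":
    succ = {"ENTRY": [1], 1: [2], 2: [3], 3: [3, 4], 4: [2, 5], 5: [6], 6: [6, "EXIT"]}
    print(is_loop({2, 3, 4}, succ), is_loop({2, 3}, succ))
```

With the identity-matrix flow graph encoded this way, {3}, {6} and {2, 3, 4} pass, while {2, 3} fails condition 2 because B3 cannot get back to B2 without leaving the set.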

57
Loops(3)

Example: The flow graph of the identity-matrix example has three loops:
1. B3 by itself.
2. B6 by itself.
3. B2, B3, B4.

58
Flow Graphs(3)

The successor of B1 is B2, and the successor of B2 is B3.
The successors of B3 are B3 and B4.
The successors of B4 are B2 and B5.
The successor of B5 is B6.
The successors of B6 are B6 and the exit.

59
Next-Use Information

If the value of a variable that is currently in a
register will never be referenced subsequently,
then that register can be assigned to another
variable.
Suppose three-address statement i assigns a value
to x. If statement j has x as an operand, and
control can flow from statement i to j along a
path that has no intervening assignments to x,
then we say statement j uses the value of x
computed at statement i. We further say that x is
live at statement i.
We wish to determine for each three-address statement x = y + z what the next uses of x, y, and z are.

60
Next-Use Information: Algorithm for determining the liveness and next-use information for each statement in a basic block

INPUT: A basic block B of three-address statements. We assume that the symbol table initially shows all nontemporary variables in B as being live on exit.
OUTPUT: At each statement i: x = y + z in B, we attach to i the liveness and next-use information of x, y, and z.
METHOD: We start at the last statement in B and scan backwards to the beginning of B. At each statement i: x = y + z in B, we do the following:
1. Attach to statement i the information currently found in the symbol table regarding the next use and liveness of x, y, and z.
2. In the symbol table, set x to "not live" and "no next use."
3. In the symbol table, set y and z to "live" and the next uses of y and z to i.
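The backward scan translates almost line for line into Python. In this sketch, which is not a definitive implementation, the symbol table is a dictionary from names to (live, next_use) pairs, statements are hypothetical (x, y, z) triples numbered by their list index, and None stands for "no next use within the block".

```python
def next_use_info(block, live_on_exit):
    """block: list of (x, y, z) for statements i: x = y op z (z may be None);
    statements are numbered by their 0-based list index.
    live_on_exit: names assumed live when control leaves the block
    (e.g. all nontemporaries).
    Returns, for each statement, {name: (live, next_use)} for its x, y, z."""
    names = {v for stmt in block for v in stmt if v is not None}
    table = {v: (v in live_on_exit, None) for v in names}  # None: no next use
    info = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):          # scan backwards
        x, y, z = block[i]
        # Step 1: attach what the table currently says about x, y and z.
        info[i] = {v: table[v] for v in (x, y, z) if v is not None}
        # Step 2: x is overwritten here, so its old value is dead.
        table[x] = (False, None)
        # Step 3: y and z are used here (after step 2, since x may be y or z).
        for v in (y, z):
            if v is not None:
                table[v] = (True, i)
    return info


if __name__ == "__main__":
    # 0: t1 = a + b   1: t2 = t1 * c   2: a = t2 + t1
    block = [("t1", "a", "b"), ("t2", "t1", "c"), ("a", "t2", "t1")]
    for i, attached in enumerate(next_use_info(block, {"a", "b", "c"})):
        print(i, attached)
```

In the example, statement 0 learns that t1 is next used at statement 1, while a is not live after statement 0 because statement 2 redefines it before any further use.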

61
Next-Use Information

Here we have used + as a symbol representing any operator. If the three-address statement i is of the form x = + y or x = y, the steps are the same as above, ignoring z.
Note that the order of steps (2) and (3) may not be interchanged because x may be y or z.
For example, for quadruple i: x = y op z
Record next uses of x, y, z into quadruple
Mark x dead (previous value has no next use)
Next use of y is i; next use of z is i; y and z are live

62
Transformation on Basic Block

A basic block computes a set of expressions.
Transformations are useful for improving the
quality of code.
Two important classes of local optimizations that can be applied to basic blocks are
Structure Preserving Transformations
Algebraic Transformations

63
The DAG Representation of Basic Blocks

Many important techniques for local optimization
begin by transforming a basic block into a DAG
(directed acyclic graph).
Construction of a DAG for a basic block is as
follows
1. There is a node in the DAG for each of the
initial values of the variables appearing in the
basic block.
2. There is a node N associated with each
statement s within the block. The children of N
are those nodes corresponding to statements that
are the last definitions, prior to s, of the
operands used by s.

3. Node N is labeled by the operator applied at
s, and also attached to N is the list of
variables for which it is the last definition
within the block.
4. Certain nodes are designated output nodes.
These are the nodes whose variables are live on
exit from the block that is, their values may be
used later, in another block of the flow graph.
Calculation of these "live variables" is a matter
for global flow analysis.

The DAG representation of a basic block lets us
perform several code improving transformations on
the code represented by the block.
a) We can eliminate local common subexpressions,
that is, instructions that compute a value that
has already been computed.
b) We can eliminate dead code, that is,
instructions that compute a value that is never
used.
c) We can reorder statements that do not depend
on one another such reordering may reduce the
time a temporary value needs to be preserved in a
register.
d) We can apply algebraic laws to reorder
operands of three-address instructions, and
sometimes thereby simplify the computation.

66
Finding Local Common Subexpressions

Common subexpressions can be detected by using
"value-number" method.
As a new node M is about to be added, we check whether there is an existing node N with the same children, in the same order, and with the same operator.
If so, N computes the same value as M and may be used in its place.
Consider the block
a = b + c
b = a - d
c = b + c
d = a - d

67
The DAG for the basic block is
68

The node corresponding to the fourth statement d = a - d has the operator - and the nodes with attached variables a and d0 as children.
Since the operator and the children are the same as those for the node corresponding to statement two, we do not create this node, but add d to the list of definitions for the node labeled -.
In fact, if b is not live on exit from the block, then we do not need to compute that variable, and can use d to receive the value represented by the node labeled -.
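The value-number check, together with the label bookkeeping (attach x to the node that computes it and detach it from its previous node), can be sketched as follows. The tuple encoding is an assumption, and nodes match only when operator and children agree in the same order, as the rule requires, so commutativity is not exploited.

```python
def build_dag(block):
    """block: list of (x, op, y, z) tuples for statements x = y op z.

    Returns (nodes, current, labels):
      nodes   - node id -> key; the leaf for b's initial value is ("leaf", "b0", None)
      current - variable -> id of the node holding its latest value
      labels  - node id -> set of variables attached to that node
    """
    nodes, table, current, labels = [], {}, {}, {}

    def node_for(key):
        # Value-number lookup: reuse a node with the same operator and the
        # same children in the same order; otherwise create a new one.
        if key not in table:
            table[key] = len(nodes)
            nodes.append(key)
        return table[key]

    def operand(name):
        # Last definition inside the block, else a leaf for the initial value.
        if name in current:
            return current[name]
        return node_for(("leaf", name + "0", None))

    for x, op, y, z in block:
        n = node_for((op, operand(y), operand(z)))
        if x in current:
            labels[current[x]].discard(x)   # x no longer labels its old node
        current[x] = n
        labels.setdefault(n, set()).add(x)
    return nodes, current, labels


if __name__ == "__main__":
    block = [("a", "+", "b", "c"), ("b", "-", "a", "d"),
             ("c", "+", "b", "c"), ("d", "-", "a", "d")]
    nodes, current, labels = build_dag(block)
    print(current["b"] == current["d"])   # True: d = a - d reuses b's node
```

For the block a = b + c, b = a - d, c = b + c, d = a - d, the fourth statement finds the existing node (-, a, d0), so b and d end up attached to the same node and only six nodes are created.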

The block then becomes
a = b + c
d = a - d
c = d + c
However, if both b and d are live on exit, then a fourth statement must be used to copy the value from one to the other.

Now consider the block
a = b + c
b = b - d
c = c + d
e = b + c

When we look for common subexpressions we really
are looking for expressions that are guaranteed
to compute the same value, no matter how that
value is computed.
Thus, the DAG method will miss the fact that the expression computed by the first and fourth statements in the sequence is the same, namely b0 + c0.

That is, even though b and c both change between the first and last statements, their sum remains the same, because b + c = (b - d) + (c + d).
The DAG does not exhibit any common subexpressions.
However, algebraic identities applied to the DAG may expose the equivalence.

73
Dead Code Elimination

Delete from a DAG any root (node with no
ancestors) that has no live variables attached.
Repeated application of this transformation will
remove all nodes from the DAG that correspond to
dead code.
Example

In the DAG for the block above, if a and b are live on exit but c and e are not, we can immediately remove the root labeled e.
Then, the node labeled c becomes a root and can
be removed. The roots labeled a and b remain,
since they each have live variables attached.
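The dead-code rule can be sketched over an explicit DAG. The node names and dictionary encoding are hypothetical; roots are recomputed after each round because deleting a root can expose its children as new roots.

```python
def eliminate_dead(children, labels, live):
    """children: node -> list of child nodes (every node appears as a key);
    labels: node -> set of variables attached to it;
    live: variables live on exit from the block.

    Repeatedly deletes roots (nodes without parents) that carry no live
    variable, and returns the set of nodes that survive."""
    alive = set(children)
    while True:
        has_parent = {c for n in alive for c in children[n]}
        dead_roots = {n for n in alive
                      if n not in has_parent and not (labels.get(n, set()) & live)}
        if not dead_roots:
            return alive
        alive -= dead_roots   # their children may have just become roots


if __name__ == "__main__":
    # DAG for a = b + c; b = b - d; c = c + d; e = b + c (node names hypothetical).
    children = {"b0": [], "c0": [], "d0": [],
                "na": ["b0", "c0"], "nb": ["b0", "d0"],
                "nc": ["c0", "d0"], "ne": ["nb", "nc"]}
    labels = {"na": {"a"}, "nb": {"b"}, "nc": {"c"}, "ne": {"e"}}
    print(sorted(eliminate_dead(children, labels, live={"a", "b"})))
    # ['b0', 'c0', 'd0', 'na', 'nb']
```

The first round deletes the node labeled e, the second deletes the node labeled c, and the third finds no dead roots, matching the two-step removal described above.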

75
Structure Preserving Transformations

Dead code elimination
Renaming temporary variables
Say, t = b + c where t is a temporary variable.
If we change this statement to u = b + c, where u is a new temporary, then we must change all uses of this instance of t to u.
Interchange of statements
t1 = b + c
t2 = x + y
We can interchange these two statements if and only if neither x nor y is t1 and neither b nor c is t2.

Say x is dead, that is, never subsequently used, at the point where the statement x = y + z appears in a block. We can safely remove this statement.
76
Algebraic Transformations

Replace expensive expressions by cheaper ones
x = x + 0: eliminate
x = x * 1: eliminate
x = y^2 (why expensive? Answer: it is normally implemented by a function call)
can be replaced by x = y * y
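A sketch of these three rewrites on single statements. It assumes statements are hypothetical (x, op, y, z) tuples with constants written as strings and exponentiation written as "**"; a real optimizer would apply many more identities.

```python
def simplify(stmt):
    """stmt = (x, op, y, z) meaning x = y op z.
    Returns None if the statement can be deleted, else a possibly cheaper
    equivalent statement."""
    x, op, y, z = stmt
    if op == "+" and y == x and z == "0":    # x = x + 0 does nothing
        return None
    if op == "*" and y == x and z == "1":    # x = x * 1 does nothing
        return None
    if op == "**" and z == "2":              # y squared: one multiply, no call
        return (x, "*", y, y)
    return stmt


if __name__ == "__main__":
    print(simplify(("x", "**", "y", "2")))   # ('x', '*', 'y', 'y')
```

Note that x = y + 0 is left alone here: it is not a no-op, although it could be turned into the copy x = y.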
Flow graph
We can add flow of control information to the set
of basic blocks making up a program by
constructing directed graph called flow graph.
There is a directed edge from block B1 to block
B2 if
There is conditional or unconditional jump from
the last statement of B1 to the first statement
of B2 or
B2 immediately follows B1 in the order of the
program, and B1 does not end in an unconditional
jump.

77
Loops

A loop is a collection of nodes in a flow graph
such that
All nodes in the collection are strongly
connected, that is from any node in the loop to
any other, there is a path of length one or more,
wholly within the loop, and
The collection of nodes has a unique entry, that
is, a node in the loop such that, the only way to
reach a node from a node outside the loop is to
first go through the entry.

78
The DAG representation of Basic Block

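1) A = 4 * i
2) B = a[A]
3) C = 4 * i
4) D = b[C]
5) E = B * D
6) F = prod + E
7) prod = F
8) G = i + 1
9) i = G
10) if i <= 20 goto (1)

(Figure: DAG for this block; its leaves include a, b, 4, i0, 1 and 20, and its nodes carry labels such as prod and <=.)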
79
Exercise

given the code fragment
draw the dependency graph before and after
common subexpression elimination.

x = a*a + 2*a*b + b*b
y = a*a + 2*a*b + b*b
80
Answers

dependency graph before CSE

81
Answers

dependency graph after CSE

82
Answers

dependency graph after CSE

83
Better code generation requires greater context

Over expressions
optimal ordering of subtrees
Over basic blocks
Common subexpression elimination
Register tracking with last-use information
Over procedures
global register allocation, register coloring
Over the program
Interprocedural flow analysis

84
Basic blocks

Better code generation requires information about
points of definition and points of use of
variables
In the presence of flow of control, value of a
variable can depend on multiple points in the
program
y := 12;
x := y * 2;   -- here x = 24
label1:
x := y * 2;   -- 24? Can't tell, y may be different
A basic block is a single-entry, single-exit code fragment: values that are computed within a basic block have a single origin, which permits more constant folding and common subexpression elimination, and better register use.

85
Finding basic blocks

To partition a program into basic blocks
Call the first instruction (quadruple) in a basic
block its leader
The first instruction in the program is a leader
Any instruction that is the target of a jump is a
leader
Any instruction that follows a jump is a leader
In the presence of procedures with side-effects,
every procedure call ends a basic block
A basic block includes the leader and all
instructions that follow, up to but not including
the next leader

86
Transformations on basic blocks

Common subexpression elimination: recognize redundant computations, replace with a single temporary
Dead-code elimination: recognize computations not used subsequently, remove quadruples
Interchange statements, for better scheduling
Renaming of temporaries, for better register usage
All of the above require symbolic execution of
the basic block, to obtain definition/use
information

87
Simple symbolic interpretation: next-use information

If x is computed in quadruple i, and is an operand of quadruple j, j > i, its value must be preserved (register or memory) until j.
If x is recomputed at k, k > i, the value computed at i has no use beyond k, and can then be discarded (i.e., its register reused).
Next-use information is an annotation over quadruples and the symbol table.
It is computed in one backward pass over the quadruples.

88
Computing next-use

Use symbol table to annotate status of variables
Each operand in a quadruple carries additional
information
Operand liveness (boolean)
Operand next use (later quadruple)
On exit from block, all temporaries are dead (no
next-use)
For quadruple q: x := y op z
Record next uses of x, y, z into quadruple
Mark x dead (previous value has no next use)
Next use of y is q; next use of z is q; y and z are live

89
Register allocation over basic block: tracking

Goal is to minimize use of registers and memory
references
Doubly linked data structure
For each register, indicate its current contents (set of variables): register descriptor.
For each variable, indicate the location of its current value (memory and/or registers): address descriptor.
Procedure getreg determines a good choice of register to hold the result of the next quadruple

90
Getreg heuristics

For quadruple x := y op z
if y is in Ri, Ri contains no other variable, y
is not live, and there is no next use of y, use
Ri
Else if there is an available register Rj, use it
Else if there is a register Rk that holds a dead
variable, use it
If y is in Ri, Ri contains no other variable, and
y is also in memory, use Ri.
Else find a register that holds a live variable,
store variable in memory (spill), and use
register
Choose variable whose next use is farthest away

91
Using getreg

For x := y op z
Call getreg to obtain target register R
Find current location of y, generate load into
register if in memory, update address descriptor
for y
Ditto for z
Emit instruction
Update register descriptor for R, to indicate it
holds x
Update address descriptor for x to indicate it
resides in R
For x := y
Single load, register descriptor indicates that
both x and y are in R.
On block exit, store registers that contain live
values

92
Computing dependencies in a basic block: the dag

Use directed acyclic graph (dag) to recognize
common subexpressions and remove redundant
quadruples.
Intermediate code optimization
basic block -> dag -> improved block -> assembly
Leaves are labeled with identifiers and
constants.
Internal nodes are labeled with operators and
identifiers

93
Dag construction

Forward pass over basic block
For x := y op z
Find node labeled y, or create one
Find node labeled z, or create one
Create new node for op, or find an existing one
with descendants y, z (need hash scheme)
Add x to list of labels for new node
Remove label x from node on which it appeared
For x := y
Add x to list of labels of node which currently
holds y

94
Example: dot product

prod := 0;
for j in 1 .. 20 loop
   prod := prod + a (j) * b (j);   -- assume 4-byte integer
end loop;
Quadruples
       prod := 0              -- basic block leader
       j := 1
start: T1 := 4 * j            -- basic block leader
       T2 := a (T1)
       T3 := 4 * j            -- redundant
       T4 := b (T3)
       T5 := T2 * T4
       T6 := prod + T5
       prod := T6
       T7 := j + 1
       j := T7
       if j <= 20 goto start

95
Dag for body of loop

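Common subexpression identified

(Figure: DAG for the loop body. The node for 4 * j carries both labels T1 and T3; the other labeled nodes are T2, T4, T5, (T6, prod), (T7, j) and the <= test that branches to start; the leaves are a, b, 4, j0, 1, 20 and prod0.)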
96
From dag to improved block

Any topological sort of the dag is a legal
evaluation order
A node without a label is a dead value
Choose the label of a live variable over a
temporary
start: T1 := 4 * j
       T2 := a (T1)
       T4 := b (T1)
       T5 := T2 * T4
       prod := prod + T5
       j := j + 1
       if j <= 20 goto start
Fewer quadruples, fewer temporaries

97
Programmers don't produce common subexpressions,
code generators do!

A, B : matrix (lo1 .. hi1, lo2 .. hi2);   -- component size w bytes
A (j, k) is at location
base_a + ((j - lo1) * (hi2 - lo2 + 1) + k - lo2) * w
The following requires 19 quadruples
for k in lo .. hi loop
   A (j, k) := 1 + B (j, k);
end loop;
Can reduce to 11 with a dag
base_a + (j - lo1) * (hi2 - lo2 + 1) * w is loop invariant (loop optimization)
w is often a power of two (peephole optimization)

98
Beyond basic blocks: data flow analysis

Basic blocks are nodes in the flow graph
Can compute global properties of program as
iterative algorithms on graph
Constant folding
Common subexpression elimination
Live-dead analysis
Loop invariant computations
Requires complex data structures and algorithms

99
Using global information: register coloring

Optimal use of registers in a subprogram: keep all variables in registers throughout
To reuse registers, we need to know the lifetime of each variable (the set of instructions in the program over which it is live)
Two variables cannot be assigned the same register if their lifetimes overlap
Lifetime information is translated into
interference graph
Each variable is a node in a graph
There is an edge between two nodes if the
lifetimes of the corresponding variables overlap
Register assignment is equivalent to graph
coloring

100
Graph coloring

Given a graph and a set of N colors, assign a color to each vertex so that two vertices connected by an edge have different colors
The problem is NP-complete
A fast heuristic algorithm (Chaitin) is usually linear
Any node with fewer than N neighbors (at most N - 1) is colorable, so it can be deleted from the graph. Start with the node with the smallest number of neighbors.
Iterate until the graph is empty, then assign colors in inverse order
If at some point every remaining node has N or more neighbors, a register must be freed (spill); the spilled node can then be removed and the process continues.
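A simplified version of the simplify/select scheme just described. It is a sketch, not Chaitin's full algorithm: the spill choice is simply the node with the fewest neighbors (real allocators weigh spill costs), and a spilled node is dropped rather than rewritten into loads and stores.

```python
def color_graph(adj, k):
    """adj: node -> set of neighbors in a symmetric interference graph.
    k: number of colors (registers).
    Returns (colors, spilled): colors maps nodes to 0..k-1."""
    graph = {n: set(vs) for n, vs in adj.items()}
    stack, spilled = [], []
    while graph:
        # Simplify: take the node with the fewest neighbors (ties by name).
        node = min(graph, key=lambda n: (len(graph[n]), str(n)))
        if len(graph[node]) >= k:
            spilled.append(node)   # no node has fewer than k neighbors: spill
        else:
            stack.append(node)     # guaranteed colorable in the select phase
        for other in graph.pop(node):
            graph[other].discard(node)
    colors = {}
    while stack:                   # Select: color in reverse order of removal
        node = stack.pop()
        used = {colors[n] for n in adj[node] if n in colors}
        colors[node] = min(c for c in range(k) if c not in used)
    return colors, spilled


if __name__ == "__main__":
    # Edges implied by the coloring constraints listed in the example.
    adj = {"A": {"B", "C", "E", "F"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
           "D": {"C", "E", "F"}, "E": {"A", "D"}, "F": {"A", "D"}}
    print(color_graph(adj, 3))
```

The adjacency in the usage block is read off the example's constraint lists (for instance, A must differ from F and E). With 3 colors nothing is spilled; with 2 colors the triangle A, B, C forces a spill.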

101
Example

(Figure: interference graph on nodes A, B, C, D, E and F, drawn before and after the nodes are removed.)
Order of removal B, C, A, E, F, D
Assume 3 colors are available. Assign colors in reverse order of removal, each node constrained by its already-colored neighbors:
D (no constraint), F (D), E (D), A (F, E), C (D, A), B (A, C)