Advanced Computer Architecture 5MD00 5Z033 MIPS Pipelining presentation

1
Advanced Computer Architecture 5MD00 / 5Z033
MIPS Pipelining

Henk Corporaal
www.ics.ele.tue.nl/heco/courses/aca
TU Eindhoven
2009

2
Topics

Pipelining
Pipelined datapath
Pipelined control
Hazards
Structural
Data
Control
Exceptions
Scheduling
For details see the book (chapter 6)

3
Pipelining

Improve performance by increasing instruction
throughput

4
Pipelining

Ideal speedup = number of stages
Do we achieve this?
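
The ideal-speedup claim can be checked with a little arithmetic. The sketch below assumes perfectly balanced stages, no hazards, and an unpipelined machine that spends one stage time per stage for every instruction; the function names are illustrative and not from the slides.

```python
def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed on an ideal pipeline without any stalls."""
    # The first instruction takes n_stages cycles to drain through the pipe;
    # every later instruction completes one cycle after its predecessor.
    return n_stages + n_instructions - 1


def ideal_speedup(n_instructions: int, n_stages: int) -> float:
    """Speedup over an unpipelined machine with the same stage delay."""
    unpipelined = n_instructions * n_stages  # each instruction uses all stages alone
    return unpipelined / pipelined_cycles(n_instructions, n_stages)


# The speedup approaches the number of stages only for long instruction streams.
for n in (3, 100, 1_000_000):
    print(n, round(ideal_speedup(n, 5), 3))
```

Even in this idealized model the speedup stays below the stage count because of pipeline fill time; hazards reduce it further.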

5
Pipelining

What makes it easy?
all instructions are the same length
just a few instruction formats
memory operands appear only in loads and stores
What makes it hard?
structural hazards: suppose we had only one memory
control hazards: need to worry about branch instructions
data hazards: an instruction depends on a previous instruction
We'll build a simple pipeline and look at these issues
We'll talk about modern processors and what really makes it hard:
exception handling
trying to improve performance with out-of-order execution, etc.

6
Basic idea: start from the single-cycle implementation

What do we need to add to actually split the
datapath into stages?

7
Pipelined Datapath

Can you find a problem even if there are no
dependencies? What instructions can we execute
to manifest the problem?

8
Corrected Datapath
[Figure: corrected pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB; visible labels include PC, Instruction memory, Add, Shift left 2 and Mux]

9
Graphically Representing Pipelines

Can help with answering questions like:
how many cycles does it take to execute this code?
what is the ALU doing during cycle 4?
Use this representation to help understand datapaths
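
Both questions can be answered mechanically from the diagram. The following is a minimal sketch assuming the five MIPS stages IF, ID, EX, MEM, WB, one instruction issued per cycle and no stalls; the three-instruction program is a made-up example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(instr_index: int, cycle: int):
    """Stage held by instruction instr_index (0-based) in a cycle (1-based), or None."""
    pos = cycle - 1 - instr_index  # instruction i enters IF in cycle i + 1
    return STAGES[pos] if 0 <= pos < len(STAGES) else None


def total_cycles(n_instructions: int) -> int:
    """Cycles until the last instruction leaves WB (no stalls)."""
    return len(STAGES) + n_instructions - 1


def diagram(program):
    """Render one row per instruction and one column per clock cycle."""
    rows = []
    for i, text in enumerate(program):
        cells = [(stage_at(i, c) or "").ljust(5)
                 for c in range(1, total_cycles(len(program)) + 1)]
        rows.append(f"{text:<18}" + "".join(cells).rstrip())
    return "\n".join(rows)


# Hypothetical program, chosen without dependences between the instructions.
program = ["lw $10, 20($1)", "sub $11, $2, $3", "add $12, $3, $4"]
print(diagram(program))
print("total cycles:", total_cycles(len(program)))
# The ALU works for whichever instruction is in EX during cycle 4.
in_ex = [text for i, text in enumerate(program) if stage_at(i, 4) == "EX"]
print("in EX during cycle 4:", in_ex)
```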

10
Pipeline Control
11
Pipeline Control
(compare single cycle control!)

Pass control signals along
just like the data

12
Datapath with Control
13
Hazards
14
Hazards

Hazards: problems due to pipelining
Hazard types:
Structural: the same resource is needed multiple times in the same cycle
Data: data dependencies limit pipelining
Control: the next executed instruction may not be the next specified instruction

15
Structural hazards

Examples:
Two accesses to a single-ported memory
Two operations need the same function unit at the same time
Two operations need the same function unit in successive cycles, but the unit is not pipelined
Solutions:
stalling
add more hardware

16
Structural hazards

Simple pipelining diagram (but not MIPS!)
IF: instruction fetch
ID: instruction decode
OF: operand fetch
EX: execute stage(s)
WB: write back

[Figure: instruction stream plotted against time; pipeline stalls due to lack of resources]
17
Structural hazards

Non-pipelined units

[Figure: instruction stream plotted against time (stages IF, ID, OF, EX, WB); instructions that need the same non-pipelined FU cause a stall cycle]
18
Structural hazards on MIPS

Q: Do we have structural hazards on our simple MIPS pipeline?

19
Data hazards

Data dependencies:
RaW (read-after-write)
WaW (write-after-write)
WaR (write-after-read)
Hardware solution:
Forwarding / Bypassing
Detection logic
Stalling
Software solution: scheduling

20
Data dependences

Three types: RaW, WaR and WaW
add r1, r2, 5    ; r1 := r2+5
sub r4, r1, r3   ; RaW of r1
add r1, r2, 5
sub r2, r4, 1    ; WaR of r2
add r1, r2, 5
sub r1, r1, 1    ; WaW of r1
st r1, 5(r2)     ; M[r2+5] := r1
ld r5, 0(r4)     ; RaW if 5+r2 = 0+r4

WaW and WaR do not occur in simple pipelines, but they limit scheduling freedom! Problems for your compiler and Pentium! => use register renaming to solve this!
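
The register cases on this slide can be classified mechanically by comparing destination and source registers. The sketch below is illustrative only: the `Instr` type and `dependences` function are hypothetical names, immediates are ignored, and the memory case (st followed by ld) is not modelled because it needs address comparison.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]      # register written, or None
    srcs: Tuple[str, ...]    # registers read


def dependences(first: Instr, second: Instr) -> List[str]:
    """Register dependences of `second` on the earlier instruction `first`."""
    found = []
    if first.dest is not None and first.dest in second.srcs:
        found.append(f"RaW of {first.dest}")   # second reads what first writes
    if second.dest is not None and second.dest in first.srcs:
        found.append(f"WaR of {second.dest}")  # second overwrites what first reads
    if first.dest is not None and first.dest == second.dest:
        found.append(f"WaW of {first.dest}")   # both write the same register
    return found


add = Instr("add", "r1", ("r2",))              # add r1, r2, 5
print(dependences(add, Instr("sub", "r4", ("r1", "r3"))))
print(dependences(add, Instr("sub", "r2", ("r4",))))
print(dependences(add, Instr("sub", "r1", ("r1",))))
```

Note that the third pair (add r1, r2, 5 followed by sub r1, r1, 1) carries a RaW dependence on r1 in addition to the WaW dependence the slide highlights.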
21
RaW dependence
add r1, r2, 5    ; r1 := r2+5
sub r4, r1, r3   ; RaW of r1
Without bypass circuitry
[Figure: add r1, r2, 5 followed by sub r4, r1, r3 plotted against time (stages IF, ID, OF, EX, WB)]
With bypass circuitry
[Figure: the same two instructions plotted against time; bypassing saves two cycles]
22
RaW on MIPS pipeline
23
Forwarding

Use temporary results, don't wait for them to be written:
register file forwarding to handle read/write to same register
ALU forwarding

24
Forwarding hardware
ALU forwarding circuitry principle

Note: there are two options:
buf - ALU - bypass mux - buf
buf - bypass mux - ALU - buf

25
Forwarding
26
Forwarding check

Check for matching register-ids
For each source-id of operation in the EX-stage
check if there is a matching pending dest-id

Q. How many comparators do we need?
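
The check can be sketched as a small selection function. This assumes the textbook five-stage MIPS arrangement, where pending destinations sit in the EX/MEM and MEM/WB pipeline registers, a RegWrite flag marks instructions that actually write a register, and register 0 is hardwired to zero; the function name and return labels are made up for illustration.

```python
def forward_select(src: int,
                   ex_mem_rd: int, ex_mem_regwrite: bool,
                   mem_wb_rd: int, mem_wb_regwrite: bool) -> str:
    """Pick where one ALU source operand of the EX-stage operation comes from."""
    # The most recent producer (EX/MEM) takes priority over the older one (MEM/WB).
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
        return "EX/MEM"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
        return "MEM/WB"
    return "REGFILE"  # no pending write matches: use the value read in ID


# Both older instructions write register 1: the younger result must win.
print(forward_select(1, ex_mem_rd=1, ex_mem_regwrite=True,
                     mem_wb_rd=1, mem_wb_regwrite=True))
```

Under these assumptions each of the two source ids is compared against two pending destination ids, which gives four register-id comparisons.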
27
Can't always forward

Load word can still cause a hazard:
an instruction tries to read register r following a load to the same r
Need a hazard detection unit to stall the instruction that follows the load
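
The load-use condition can be written as a single boolean test. The sketch uses the signal names that label the hazard detection unit in the datapath figure (ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt); the Python function itself is only an illustration, not the hardware description from the book.

```python
def must_stall(id_ex_mem_read: bool, id_ex_rt: int,
               if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in ID needs the result of a load that is in EX."""
    # A load writes register rt; its data is only available after the MEM stage,
    # so an immediately following reader of that register has to wait one cycle.
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)


# lw $2, 20($1) followed directly by and $4, $2, $5: the reader must stall.
print(must_stall(True, id_ex_rt=2, if_id_rs=2, if_id_rt=5))
```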

28
Stalling

We can stall the pipeline by keeping an
instruction in the same stage

29
Hazard Detection Unit
[Figure: pipelined datapath with hazard detection unit and forwarding unit; visible labels include ID/EX.MemRead, IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, ID/EX.RegisterRt, EX/MEM.RegisterRd, MEM/WB.RegisterRd, Control, Registers, ALU, Instruction memory and Data memory]

30
Software only solution?

Have compiler guarantee that no hazards occur
Example: where do we insert the NOPs?
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $13, 100($2)
Problem: this really slows us down!

sub $2, $1, $3
nop
nop
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
nop
sw $13, 100($2)
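
The NOP placement can be derived with a simple pass over the program. The sketch below assumes no forwarding and a register file that is written in the first half of a cycle and read in the second half, so a consumer must sit at least three slots after its producer; `insert_nops` and the tuple layout are hypothetical helpers, not part of the course material.

```python
from typing import Dict, List, Optional, Tuple

# (assembly text, destination register or None, source registers)
Instr = Tuple[str, Optional[str], Tuple[str, ...]]

MIN_DISTANCE = 3  # assumption: two instructions between producer and consumer


def insert_nops(program: List[Instr]) -> List[str]:
    """Pad the program with NOPs so that every register read sees a completed write."""
    out: List[str] = []
    last_write: Dict[str, int] = {}  # register -> slot of its most recent writer
    for text, dest, srcs in program:
        # Earliest slot at which all source registers are available.
        earliest = max((last_write[r] + MIN_DISTANCE for r in srcs if r in last_write),
                       default=0)
        while len(out) < earliest:
            out.append("nop")
        if dest is not None:
            last_write[dest] = len(out)
        out.append(text)
    return out


program: List[Instr] = [
    ("sub $2, $1, $3", "$2", ("$1", "$3")),
    ("and $12, $2, $5", "$12", ("$2", "$5")),
    ("or $13, $6, $2", "$13", ("$6", "$2")),
    ("add $14, $2, $2", "$14", ("$2",)),
    ("sw $13, 100($2)", None, ("$13", "$2")),
]
scheduled = insert_nops(program)
print("\n".join(scheduled))
```

Under these assumptions the pass places two NOPs after the sub and one before the sw, the same placement as the slide's answer: three wasted slots for five useful instructions.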
31
Control hazards

Control operations may change the sequential flow
of instructions
branch
jump
call (jump and link)
return
(exception/interrupt and rti / return from
interrupt)

32
Control hazard: Branch

Branch actions:
Compute new address
Determine condition
Perform the actual branch (if taken): PC := new address

33
Branch example
34
Branching

Squash pipeline
When we decide to branch, other instructions are
in the pipeline!
We are predicting branch not taken
need to add hardware for flushing instructions if
we are wrong

35
Branch with predict not taken
[Figure: pipeline diagram over clock cycles (stages IF, ID, EX, MEM, WB) showing a branch to L, the instructions fetched under predict not taken, and the instruction at target L]
36
Branch speedup

Earlier address computation
Earlier condition calculation
Put both in the ID pipeline stage
adder
comparator

37
Improved branching / flushing IF/ID

38
Exception support

Types of exceptions
Overflow
I/O device request
Operating system call
Undefined instruction
Hardware malfunction
Page fault
Precise exception
finish previous instructions (which are still in
the pipeline)
flush excepting and following instructions, redo
them after handling the exception(s)

39
Exceptions

Changes needed for handling overflow exception of an operation in EX stage (see book for details):
Extend PC input mux with extra entry with fixed address
Add EPC register recording the ID/EX stage PC
(this is the address of the next instruction!)
Cause register recording exception type
E.g., in case of overflow exception insert 3 bubbles: flush the following stages
IF/ID stage
ID/EX stage
EX/MEM stage

40
Scheduling, why?

Let's look at the execution time:
Texecution = Ncycles x Tcycle
= Ninstructions x CPI x Tcycle
Scheduling may reduce Texecution:
Reduce CPI (cycles per instruction)
early scheduling of long latency operations
avoid pipeline stalls due to structural, data and control hazards
allow Nissue > 1 and therefore CPI < 1
Reduce Ninstructions
compact many operations into each instruction (VLIW)

41
Scheduling data hazards: example 1

Try to avoid RaW stalls
E.g., reorder these instructions:

lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)   ; store has to wait for $t2
sw $t0, 4($t1)

Reordered:

lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)
sw $t2, 0($t1)

Note: if you study the MIPS pipeline carefully, you may observe that HW forwarding support could have avoided this as well.
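
The benefit of the reordering can be counted with a small stall model. This sketch assumes a one-cycle stall whenever the instruction directly after a load reads the loaded register, and it treats store data like any other operand (no extra forwarding path into the MEM stage); `load_use_stalls` and the tuple layout are illustrative names.

```python
from typing import List, Optional, Tuple

# (opcode, destination register or None, source registers)
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def load_use_stalls(program: List[Instr]) -> int:
    """Count stall cycles caused by using a loaded value in the very next instruction."""
    stalls = 0
    for (prev_op, prev_dest, _), (_, _, cur_srcs) in zip(program, program[1:]):
        if prev_op == "lw" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


original: List[Instr] = [
    ("lw", "$t0", ("$t1",)),
    ("lw", "$t2", ("$t1",)),
    ("sw", None, ("$t2", "$t1")),  # needs $t2 right after its load
    ("sw", None, ("$t0", "$t1")),
]
reordered: List[Instr] = [original[0], original[1], original[3], original[2]]

print("original stalls: ", load_use_stalls(original))
print("reordered stalls:", load_use_stalls(reordered))
```

Swapping the two stores leaves the result unchanged because they write different addresses, while it gives the second load an extra cycle to complete.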
42
Scheduling data hazards: example 2
Avoiding RaW stalls
Reordering instructions for following program (by you or the compiler)
Code: a = b + c; d = e - f
43
Scheduling control hazards

Texecution = Ninstructions x CPI x Tcycle
CPI = CPIideal + fbranch x Pbranch
where Pbranch = Ndelayslots x miss_rate
Modern processors tend to have large branch
penalty, Pbranch, due to
many pipeline stages
multi-issue
Note that penalties have much larger impact when
CPIideal is low
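
The last remark is easy to see with numbers. The sketch below evaluates the CPI formula from this slide; the branch frequency, miss rate, delay-slot count and the two ideal CPI values are made-up illustration values, not measurements.

```python
def branch_penalty(n_delay_slots: int, miss_rate: float) -> float:
    """Pbranch = Ndelayslots x miss_rate (average cycles lost per branch)."""
    return n_delay_slots * miss_rate


def effective_cpi(cpi_ideal: float, f_branch: float, p_branch: float) -> float:
    """CPI = CPIideal + fbranch x Pbranch."""
    return cpi_ideal + f_branch * p_branch


# Hypothetical workload: 20% branches, 10% mispredicted, 15 delay slots (deep pipeline).
p = branch_penalty(n_delay_slots=15, miss_rate=0.10)
for cpi_ideal in (1.0, 0.25):  # single-issue versus an assumed 4-issue machine
    cpi = effective_cpi(cpi_ideal, f_branch=0.20, p_branch=p)
    print(f"CPIideal={cpi_ideal}: CPI={cpi:.2f}, slowdown x{cpi / cpi_ideal:.2f}")
```

The absolute penalty per instruction is the same in both cases, but relative to a low CPIideal it weighs much more heavily.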

44
Scheduling control hazards

What can we do about control hazards and CPI penalty?
Keep penalty Pbranch low:
Early computation of new PC
Early determination of condition
Visible branch delay slots filled by compiler (MIPS)
Branch prediction
Reduce control dependencies (control height reduction) [Schlansker and Kathail, Micro'95]
Remove branches: if-conversion
Conditional instructions: CMOVE, cond skip next
Guarding all instructions: TriMedia

45
Branch delay slot

Add a branch delay slot
the next instruction after a branch is always
executed
rely on compiler to fill the slot with
something useful
Is this a good idea?
let's look at how it works

46
Branch delay slot scheduling
Q. What to put in the delay slot?
op 1
beq r1, r2, L
.............
op 2             ; 'fall-through'
.............
L: op 3          ; branch target
.............
47
Summary

Modern processors are (deeply) pipelined, to reduce Tcycle and aim at CPI = 1
Hazards increase CPI
Several software and hardware measures are taken to avoid or reduce hazards
Not discussed (you'll see these later):
Multi-issue further reduces CPI
Branch prediction to avoid high branch penalties
Dynamic scheduling
In all cases a scheduling compiler is needed
