Advanced Computer Architecture 5MD00 5Z033 MIPS Pipelining - PowerPoint PPT Presentation

About This Presentation
Title:

Advanced Computer Architecture 5MD00 5Z033 MIPS Pipelining

Description:

structural hazards: suppose we had only one memory. control hazards: need to worry about ... Basic idea: start from single cycle impl. What do we need to add to ... – PowerPoint PPT presentation

Slides: 48
Provided by: HECO2
Transcript and Presenter's Notes


1
Advanced Computer Architecture 5MD00 / 5Z033
MIPS Pipelining
  • Henk Corporaal
  • www.ics.ele.tue.nl/heco/courses/aca
  • TU Eindhoven
  • 2009

2
Topics
  • Pipelining
  • Pipelined datapath
  • Pipelined control
  • Hazards
  • Structural
  • Data
  • Control
  • Exceptions
  • Scheduling
  • For details see the book (chapter 6)

3
Pipelining
  • Improve performance by increasing instruction
    throughput

4
Pipelining
  • Ideal speedup = number of stages
  • Do we achieve this?
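A rough way to explore this question is to count cycles. The sketch below assumes an ideal pipeline with no stalls, equal stage latencies, and a non-pipelined reference machine that spends one cycle per stage on every instruction at the same clock period. The function names are illustrative and do not come from the course material.

```python
def pipeline_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed by an ideal, stall-free pipeline."""
    # The first instruction needs n_stages cycles to drain through the pipe;
    # every later instruction completes one cycle after its predecessor.
    return n_stages + (n_instructions - 1)


def speedup(n_instructions: int, n_stages: int) -> float:
    """Speedup over a machine that takes n_stages cycles per instruction."""
    unpipelined_cycles = n_instructions * n_stages
    return unpipelined_cycles / pipeline_cycles(n_instructions, n_stages)


if __name__ == "__main__":
    for n in (1, 5, 100, 1_000_000):
        print(f"n = {n:>7}: speedup = {speedup(n, 5):.3f}")
```

Under these assumptions the speedup only approaches the number of stages for long instruction streams. Hazards, discussed on later slides, lower it further.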

5
Pipelining
  • What makes it easy
  • all instructions are the same length
  • just a few instruction formats
  • memory operands appear only in loads and stores
  • What makes it hard?
  • structural hazards: suppose we had only one
    memory
  • control hazards: need to worry about branch
    instructions
  • data hazards: an instruction depends on a
    previous instruction
  • We'll build a simple pipeline and look at these
    issues
  • We'll talk about modern processors and what
    really makes it hard
  • exception handling
  • trying to improve performance with out-of-order
    execution, etc.

6
Basic idea: start from single cycle impl.
  • What do we need to add to actually split the
    datapath into stages?

7
Pipelined Datapath
  • Can you find a problem even if there are no
    dependencies? What instructions can we execute
    to manifest the problem?

8
Corrected Datapath
[Figure: corrected pipelined datapath. Recoverable labels: pipeline registers IF/ID, EX/MEM, MEM/WB; PC; Instruction memory (Address, Instruction); Add (+4); Add result; Shift left 2; Mux.]

9
Graphically Representing Pipelines
  • Can help with answering questions like
  • how many cycles does it take to execute this
    code?
  • what is the ALU doing during cycle 4?
  • use this representation to help understand
    datapaths
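The two questions above can be answered mechanically for a stall-free pipeline. The sketch below assumes the five MIPS stages IF, ID, EX, MEM, WB and one instruction entering IF per cycle. The three-instruction program in the usage note is a made-up example, not taken from the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Map instruction index -> {cycle: stage} for an ideal pipeline.

    Cycles are numbered from 1; instruction i enters IF in cycle i + 1.
    """
    return {
        i: {i + 1 + s: stage for s, stage in enumerate(STAGES)}
        for i in range(len(instructions))
    }


def total_cycles(instructions):
    """Number of cycles until the last instruction leaves WB."""
    return len(instructions) + len(STAGES) - 1


def who_is_in(stage, cycle, instructions):
    """Return the instruction occupying `stage` during `cycle`, or None."""
    for i, row in pipeline_diagram(instructions).items():
        if row.get(cycle) == stage:
            return instructions[i]
    return None


def render(instructions):
    """Return the diagram as text: one row per instruction, one column per cycle."""
    n = total_cycles(instructions)
    width = max(len(text) for text in instructions)
    lines = [" " * width + "".join(f"{c:>5}" for c in range(1, n + 1))]
    for i, row in pipeline_diagram(instructions).items():
        cells = "".join(f"{row.get(c, ''):>5}" for c in range(1, n + 1))
        lines.append(f"{instructions[i]:<{width}}" + cells)
    return "\n".join(lines)
```

For the hypothetical program `["lw $10, 20($1)", "sub $11, $2, $3", "add $12, $3, $4"]`, `total_cycles` gives 7, and `who_is_in("EX", 4, program)` reports that the ALU is working on the `sub` during cycle 4.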

10
Pipeline Control
11
Pipeline Control
(compare single cycle control!)
  • Pass control signals along
  • just like the data

12
Datapath with Control
13
Hazards
14
Hazards
  • Hazards: problems due to pipelining
  • Hazard types
  • Structural
  • same resource is needed multiple times in the
    same cycle
  • Data
  • data dependencies limit pipelining
  • Control
  • next executed instruction may not be the next
    specified instruction

15
Structural hazards
  • Examples
  • Two accesses to a single ported memory
  • Two operations need the same function unit at the
    same time
  • Two operations need the same function unit in
    successive cycles, but the unit is not pipelined
  • Solutions
  • stalling
  • add more hardware
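The single-ported memory example can be made concrete with a small model of the "stalling" solution. This is a sketch under stated assumptions: a five-stage IF/ID/EX/MEM/WB pipeline, one memory shared by instruction fetch and data access, a load or store using the memory three cycles after its own fetch, data accesses taking priority over fetches, and no other stalls. The function name is illustrative.

```python
def fetch_cycles(is_mem_op):
    """Return the fetch cycle of each instruction with one shared memory.

    is_mem_op: list of booleans, True if the instruction is a load or store.
    """
    busy = set()   # cycles in which the memory serves a MEM-stage access
    fetch = []
    cycle = 1
    for mem in is_mem_op:
        # Stall the fetch while the memory is taken by an older load/store.
        while cycle in busy:
            cycle += 1
        fetch.append(cycle)
        if mem:
            busy.add(cycle + 3)   # IF -> ID -> EX -> MEM
        cycle += 1
    return fetch


def stall_cycles(is_mem_op):
    """Extra cycles compared with a machine that has separate memories."""
    fetch = fetch_cycles(is_mem_op)
    return fetch[-1] - len(is_mem_op) if fetch else 0
```

In this model a single load followed by three other instructions delays the fourth fetch by one cycle. Separate instruction and data memories ("add more hardware") remove the conflict.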

16
Structural hazards
  • Simple pipelining diagram (but not MIPS!)
  • IF: instruction fetch
  • ID: instruction decode
  • OF: operand fetch
  • EX: execute stage(s)
  • WB: write back

[Figure: pipeline diagram of the instruction stream over time; pipeline stalls due to lack of resources.]
17
Structural hazards
  • Non-pipelined units

[Figure: pipeline diagram of the instruction stream over time, with stages IF, ID, OF, EX, WB. Some instructions occupy EX for two cycles; instructions that need the same non-pipelined FU cause a stall cycle.]
18
Structural hazards on MIPS
  • Q: Do we have structural hazards on our simple
    MIPS pipeline?

19
Data hazards
  • Data dependencies
  • RaW (read-after-write)
  • WaW (write-after-write)
  • WaR (write-after-read)
  • Hardware solution
  • Forwarding / Bypassing
  • Detection logic
  • Stalling
  • Software solution: Scheduling

20
Data dependences
  • Three types: RaW, WaR and WaW
  • add r1, r2, 5      ; r1 := r2 + 5
  • sub r4, r1, r3     ; RaW of r1
  • add r1, r2, 5
  • sub r2, r4, 1      ; WaR of r2
  • add r1, r2, 5
  • sub r1, r1, 1      ; WaW of r1
  • st r1, 5(r2)       ; M[r2 + 5] := r1
  • ld r5, 0(r4)       ; RaW if 5 + r2 = 0 + r4

WaW and WaR do not occur in simple pipelines, but
they limit scheduling freedom! Problems for
your compiler and Pentium! => use register
renaming to solve this!
21
RaW dependence
add r1, r2, 5      ; r1 := r2 + 5
sub r4, r1, r3     ; RaW of r1
[Figure: pipeline timing (stages IF, ID, OF, EX, WB) of add r1, r2, 5 followed by sub r4, r1, r3, without and with bypass circuitry. With bypass circuitry: saves two cycles.]
22
RaW on MIPS pipeline
23
Forwarding
  • Use temporary results, don't wait for them to be
    written
  • register file forwarding to handle read/write to
    same register
  • ALU forwarding

24
Forwarding hardware
ALU forwarding circuitry principle
  • Note: there are two options
  • buf - ALU - bypass mux - buf
  • buf - bypass mux - ALU - buf

25
Forwarding
26
Forwarding check
  • Check for matching register-ids
  • For each source-id of operation in the EX-stage
    check if there is a matching pending dest-id

Q. How many comparators do we need?
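The check can be sketched in software. The code below assumes the textbook five-stage MIPS pipeline: the operation in EX has two source registers (Rs, Rt), results may be pending in the EX/MEM and MEM/WB pipeline registers, register 0 is hardwired to zero, and the most recent pending result takes priority. The `RegWrite` flags and the function names are assumptions of this sketch, not taken from the slides.

```python
def forward_select(src, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Choose where one ALU source operand should come from."""
    # The youngest pending result wins, so EX/MEM is checked first.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
        return "EX/MEM"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
        return "MEM/WB"
    return "REGFILE"


def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_regwrite, ex_mem_rd,
                    mem_wb_regwrite, mem_wb_rd):
    """Return the operand sources for (Rs, Rt) of the operation in EX."""
    pending = (ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd)
    return (forward_select(id_ex_rs, *pending),
            forward_select(id_ex_rt, *pending))
```

In this sketch each of the two source ids is compared against each of the two pending destination ids, which gives four register-id equality comparisons.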
27
Can't always forward
  • Load word can still cause a hazard
  • an instruction tries to read register r following
    a load to the same r
  • Need a hazard detection unit to stall the
    instruction that follows the load
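The load-use condition can be written as a single predicate. This sketch assumes the textbook MIPS pipeline, where the load sits in EX (its fields held in the ID/EX register) and the dependent instruction sits in ID (its fields held in IF/ID). The function name is illustrative, and the check ignores whether the instruction in ID really uses both register fields as sources.

```python
def load_use_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True if the pipeline must insert one bubble this cycle.

    id_ex_memread: the instruction in EX is a load
    id_ex_rt:      destination register of that load
    if_id_rs/rt:   source register fields of the instruction in ID
    """
    return bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)
```

When the predicate is true, the PC and the IF/ID register are held for one cycle and a bubble is sent down the pipeline. After that, forwarding from MEM/WB can supply the loaded value.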

28
Stalling
  • We can stall the pipeline by keeping an
    instruction in the same stage

29
Hazard Detection Unit
[Figure: pipelined datapath with hazard detection unit and forwarding unit. Recoverable labels: ID/EX.MemRead, IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, ID/EX.RegisterRt, EX/MEM.RegisterRd, MEM/WB.RegisterRd; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB; PC, Instruction memory, Registers, ALU, Data memory, Control, Mux.]
30
Software only solution?
  • Have compiler guarantee that no hazards occur
  • Example: where do we insert the NOPs?
        sub $2, $1, $3
        and $12, $2, $5
        or  $13, $6, $2
        add $14, $2, $2
        sw  $13, 100($2)
  • Problem: this really slows us down!

With NOPs inserted:
        sub $2, $1, $3
        nop
        nop
        and $12, $2, $5
        or  $13, $6, $2
        add $14, $2, $2
        nop
        sw  $13, 100($2)
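The NOP placement can be derived mechanically. The sketch below assumes there is no forwarding hardware except that the register file writes in the first half of a cycle and reads in the second half, so a consumer must sit at least three instruction slots after its producer. The tuple encoding of instructions and the name `insert_nops` are illustrative.

```python
MIN_DISTANCE = 3   # assumed producer-to-consumer distance, in instruction slots


def insert_nops(program, min_distance=MIN_DISTANCE):
    """Pad `program` with nops so every RaW distance is at least min_distance.

    program: list of (text, dest_register_or_None, source_registers).
    Returns the list of instruction texts including the inserted nops.
    """
    out = []          # scheduled instruction texts
    last_write = {}   # register -> slot index of its most recent writer
    for text, dest, sources in program:
        # Earliest slot at which all source registers can be read safely.
        earliest = max(
            (last_write[r] + min_distance for r in sources if r in last_write),
            default=0,
        )
        while len(out) < earliest:
            out.append("nop")
        out.append(text)
        if dest is not None:
            last_write[dest] = len(out) - 1
    return out


example = [
    ("sub $2, $1, $3", "$2", ("$1", "$3")),
    ("and $12, $2, $5", "$12", ("$2", "$5")),
    ("or $13, $6, $2", "$13", ("$6", "$2")),
    ("add $14, $2, $2", "$14", ("$2", "$2")),
    ("sw $13, 100($2)", None, ("$13", "$2")),
]

if __name__ == "__main__":
    print("\n".join(insert_nops(example)))
```

For the example program this yields two nops after the `sub` and one before the `sw`: three wasted slots for five useful instructions.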
31
Control hazards
  • Control operations may change the sequential flow
    of instructions
  • branch
  • jump
  • call (jump and link)
  • return
  • (exception/interrupt and rti / return from
    interrupt)

32
Control hazard: Branch
  • Branch actions
  • Compute new address
  • Determine condition
  • Perform the actual branch (if taken): PC := new
    address

33
Branch example
34
Branching
  • Squash pipeline
  • When we decide to branch, other instructions are
    in the pipeline!
  • We are predicting branch not taken
  • need to add hardware for flushing instructions if
    we are wrong

35
Branch with predict not taken
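[Figure: pipeline diagram over clock cycles (stages IF, ID, EX, MEM, WB) for "Branch L" with predict not taken: the instructions after the branch enter the pipeline before the instruction at label L.]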
36
Branch speedup
  • Earlier address computation
  • Earlier condition calculation
  • Put both in the ID pipeline stage
  • adder
  • comparator

37
Improved branching / flushing IF/ID

38
Exception support
  • Types of exceptions
  • Overflow
  • I/O device request
  • Operating system call
  • Undefined instruction
  • Hardware malfunction
  • Page fault
  • Precise exception
  • finish previous instructions (which are still in
    the pipeline)
  • flush excepting and following instructions, redo
    them after handling the exception(s)

39
Exceptions
  • Changes needed for handling overflow exception of
    an operation in EX stage (see book for details)
  • Extend PC input mux with extra entry with fixed
    address
  • Add EPC register recording the ID/EX stage PC
  • this is the address of the next instruction !
  • Cause register recording exception type
  • E.g., in case of overflow exception: insert 3
    bubbles; flush the following stages:
  • IF/ID stage
  • ID/EX stage
  • EX/MEM stage

40
Scheduling, why?
  • Let's look at the execution time
  • Texecution = Ncycles x Tcycle
  • Texecution = Ninstructions x CPI x Tcycle
  • Scheduling may reduce Texecution
  • Reduce CPI (cycles per instruction)
  • early scheduling of long latency operations
  • avoid pipeline stalls due to structural, data and
    control hazards
  • allow Nissue > 1 and therefore CPI < 1
  • Reduce Ninstructions
  • compact many operations into each instruction
    (VLIW)

41
Scheduling data hazards: example 1
  • Try to avoid RaW stalls
  • E.g., reorder these instructions

Original order (store has to wait for $t2):
    lw $t0, 0($t1)
    lw $t2, 4($t1)
    sw $t2, 0($t1)
    sw $t0, 4($t1)
Reordered:
    lw $t0, 0($t1)
    lw $t2, 4($t1)
    sw $t0, 4($t1)
    sw $t2, 0($t1)
Note: if you study the MIPS pipeline carefully,
you may observe that HW forwarding support could
have avoided this as well.
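The effect of the reordering can be checked by counting load-use pairs. This sketch assumes a pipeline with ALU forwarding in which an instruction that reads a register loaded by the immediately preceding `lw` costs one stall cycle, stores included. The tuple encoding and the function name are illustrative.

```python
def count_load_use_stalls(program):
    """Count stalls caused by using a loaded register in the next instruction.

    program: list of (opcode, dest_register_or_None, source_registers).
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls


# Order in which the first store directly follows the load of $t2.
original = [
    ("lw", "$t0", ("$t1",)),
    ("lw", "$t2", ("$t1",)),
    ("sw", None, ("$t2", "$t1")),
    ("sw", None, ("$t0", "$t1")),
]
# Same instructions with the two stores swapped.
reordered = [original[0], original[1], original[3], original[2]]

if __name__ == "__main__":
    print(count_load_use_stalls(original), count_load_use_stalls(reordered))
```

Swapping the two stores places an independent instruction between the second load and the store that needs its result, which removes the stall in this model.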
42
Scheduling data hazards: example 2
Avoiding RaW stalls
Reordering instructions for following program (by
you or the compiler)
Code: a = b + c; d = e - f
43
Scheduling control hazards
  • Texecution = Ninstructions x CPI x Tcycle
  • CPI = CPIideal + fbranch x Pbranch
  • where Pbranch = Ndelayslots x miss_rate
  • Modern processors tend to have large branch
    penalty, Pbranch, due to
  • many pipeline stages
  • multi-issue
  • Note that penalties have much larger impact when
    CPIideal is low
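The last remark can be checked numerically. The sketch below implements the CPI model from this slide, with fbranch read as the fraction of instructions that are branches. All numeric values in the usage note are made-up illustrations, not measurements.

```python
def branch_penalty(n_delay_slots, miss_rate):
    """Pbranch: average penalty cycles per branch."""
    return n_delay_slots * miss_rate


def effective_cpi(cpi_ideal, f_branch, p_branch):
    """CPI = CPIideal + fbranch x Pbranch."""
    return cpi_ideal + f_branch * p_branch


def execution_time(n_instructions, cpi, t_cycle):
    """Texecution = Ninstructions x CPI x Tcycle."""
    return n_instructions * cpi * t_cycle


def slowdown(cpi_ideal, f_branch, p_branch):
    """Relative increase in execution time caused by branch penalties."""
    return effective_cpi(cpi_ideal, f_branch, p_branch) / cpi_ideal


if __name__ == "__main__":
    p = branch_penalty(n_delay_slots=3, miss_rate=0.1)
    for cpi_ideal in (1.0, 0.25):
        print(f"CPIideal = {cpi_ideal}: slowdown = {slowdown(cpi_ideal, 0.2, p):.2f}")
```

With these hypothetical numbers the same branch penalty costs 6% on a single-issue machine (CPIideal = 1) and 24% on a four-issue machine (CPIideal = 0.25).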

44
Scheduling control hazards
  • What can we do about control hazards and CPI
    penalty?
  • Keep penalty Pbranch low
  • Early computation of new PC
  • Early determination of condition
  • Visible branch delay slots filled by compiler
    (MIPS)
  • Branch prediction
  • Reduce control dependencies (control height
    reduction) [Schlansker and Kathail, Micro'95]
  • Remove branches: if-conversion
  • Conditional instructions: CMOVE, cond skip next
  • Guarding all instructions: TriMedia

45
Branch delay slot
  • Add a branch delay slot
  • the next instruction after a branch is always
    executed
  • rely on compiler to fill the slot with
    something useful
  • Is this a good idea?
  • let's look how it works

46
Branch delay slot scheduling
Q. What to put in the delay slot?
        op 1
        beq r1, r2, L
        .............
        op 2              ('fall-through')
        .............
    L:  op 3              (branch target)
        .............
47
Summary
  • Modern processors are (deeply) pipelined, to
    reduce Tcycle and aim at CPI = 1
  • Hazards increase CPI
  • Several software and hardware measures are taken
    to avoid or reduce hazards
  • Not discussed (you'll see later):
  • Multi-issue further reduces CPI
  • Branch prediction to avoid high branch penalties
  • Dynamic scheduling
  • In all cases a scheduling compiler is needed