CS152 Computer Architecture and Engineering Lecture 14 Pipelining Control Continued Introduction to - PowerPoint PPT Presentation

Description:

Register File's Read ports (bus A and busB) for the Reg/Dec stage. ALU for the Exec stage ... Reg/Dec: Registers Fetch and Instruction Decode. Exec: ALU ... – PowerPoint PPT presentation

Slides: 57
Provided by: johnkubi

Transcript and Presenter's Notes


1
CS152 Computer Architecture and Engineering
Lecture 14: Pipelining Control Continued,
Introduction to Advanced Pipelining
2
Recap: Summary of Pipelining Basics
  • 5 stages:
  • Fetch: Fetch instruction from memory
  • Decode: Get register values and decode control
    information
  • Execute: Execute arithmetic operations/calculate
    addresses
  • Memory: Do memory ops (load or store)
  • Writeback: Write results back to registers (i.e.,
    COMMIT)
  • Pipelines pass control information down the pipe
    just as data moves down the pipe
  • Forwarding/stalls handled by local control
  • Balancing length of instructions makes pipelining
    much smoother
  • Increasing length of pipe increases impact of
    hazards; pipelining helps instruction bandwidth,
    not latency

3
Recap: Can pipelining get us into trouble?
  • Yes: Pipeline Hazards
  • structural hazards: attempt to use the same
    resource two different ways at the same time
  • E.g., combined washer/dryer would be a structural
    hazard, or folder busy doing something else
    (watching TV)
  • data hazards: attempt to use item before it is
    ready
  • E.g., one sock of pair in dryer and one in
    washer; can't fold until you get the sock from
    the washer through the dryer
  • instruction depends on result of prior
    instruction still in the pipeline
  • control hazards: attempt to make a decision
    before condition is evaluated
  • E.g., washing football uniforms and need to get
    proper detergent level; need to see after dryer
    before next load in
  • branch instructions
  • Can always resolve hazards by waiting:
  • pipeline control must detect the hazard
  • take action (or delay action) to resolve hazards

4
Pipelining the Load Instruction
[Figure: pipeline timing diagram across clock Cycles 1-7 for successive lw instructions (labels: 2nd lw, 3rd lw)]
  • The five independent functional units in the
    pipeline datapath are:
  • Instruction Memory for the Ifetch stage
  • Register File's Read ports (bus A and bus B) for
    the Reg/Dec stage
  • ALU for the Exec stage
  • Data Memory for the Mem stage
  • Register File's Write port (bus W) for the Wr
    stage

5
The Four Stages of R-type
[Figure: timing diagram of one R-type instruction across Cycles 1-4]
  • Ifetch: Instruction Fetch
  • Fetch the instruction from the Instruction Memory
  • Reg/Dec: Registers Fetch and Instruction Decode
  • Exec:
  • ALU operates on the two register operands
  • Update PC
  • Wr: Write the ALU output back to the register file

6
Pipelining the R-type and Load Instruction
[Figure: timing diagram across clock Cycles 1-9 for the sequence R-type, R-type, Load, R-type, R-type, annotated "Oops! We have a problem!"]
  • We have a structural hazard:
  • Two instructions try to write to the register
    file at the same time!
  • Only one write port

7
Important Observation
  • Each functional unit can only be used once per
    instruction
  • Each functional unit must be used at the same
    stage for all instructions
  • Load uses Register File's Write Port during its
    5th stage
  • R-type uses Register File's Write Port during its
    4th stage
  • There are 2 ways to solve this pipeline hazard.

8
Solution 1: Insert "Bubble" into the Pipeline
[Figure: timing diagram across clock Cycles 1-9 showing a Load, R-type instructions, and a pipeline bubble]
  • Insert a "bubble" into the pipeline to prevent 2
    writes in the same cycle
  • The control logic can be complex.
  • Lose instruction fetch and issue opportunity.
  • No instruction is started in Cycle 6!

9
Solution 2: Delay R-type's Write by One Cycle
  • Delay R-type's register write by one cycle:
  • Now R-type instructions also use Reg File's write
    port at Stage 5
  • Mem stage is a NOOP stage: nothing is being done.

[Figure: R-type stages numbered 1-5, including Exec and Mem, and a timing diagram across clock Cycles 1-9 for the sequence R-type, R-type, Load, R-type, R-type]
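The write-port conflict and the effect of Solution 2 can be checked with a small sketch. It assumes one instruction issues per cycle starting at Cycle 1; the function name and the two stage tables are illustrative, not part of the lecture.

```python
from collections import defaultdict

# Stage (1-based) in which each instruction class uses the register
# file's write port. These tables are illustrative assumptions.
WRITE_STAGE_ORIGINAL = {"R-type": 4, "Load": 5}
WRITE_STAGE_DELAYED = {"R-type": 5, "Load": 5}  # Solution 2


def write_port_conflicts(program, write_stage):
    """Return {cycle: [instruction indices]} for cycles with more than one write."""
    writers = defaultdict(list)
    for index, kind in enumerate(program):
        issue_cycle = index + 1  # one instruction issued per cycle
        write_cycle = issue_cycle + write_stage[kind] - 1
        writers[write_cycle].append(index)
    return {cycle: idxs for cycle, idxs in writers.items() if len(idxs) > 1}


program = ["R-type", "R-type", "Load", "R-type", "R-type"]
print(write_port_conflicts(program, WRITE_STAGE_ORIGINAL))
print(write_port_conflicts(program, WRITE_STAGE_DELAYED))
```

With the original stage assignment, the Load and the R-type issued right after it both reach the write port in Cycle 7; once every instruction writes in Stage 5, no cycle has two writers.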
10
Modified Control Datapath
IR <- Mem[PC]; PC <- PC + 4
A <- R[rs]; B <- R[rt]
S <- A + B
S <- A + SX
S <- A or ZX
S <- A + SX
if Cond: PC <- PC + SX
M <- Mem[S]
Mem[S] <- B
M <- S
M <- S
R[rd] <- M
R[rd] <- M
R[rt] <- M
[Figure: datapath blocks labeled Next PC, PC, Inst. Mem, IR, Reg. File, Exec (Equal, S), Mem Access, Data Mem, Reg File]
11
The Four Stages of Store
[Figure: timing diagram of one Store across Cycles 1-4, with the Wr stage label also shown]
  • Ifetch: Instruction Fetch
  • Fetch the instruction from the Instruction Memory
  • Reg/Dec: Registers Fetch and Instruction Decode
  • Exec: Calculate the memory address
  • Mem: Write the data into the Data Memory

12
The Three Stages of Beq
[Figure: timing diagram of one Beq across Cycles 1-4, with the Mem and Wr stage labels also shown]
  • Ifetch: Instruction Fetch
  • Fetch the instruction from the Instruction Memory
  • Reg/Dec:
  • Registers Fetch and Instruction Decode
  • Exec:
  • compare the two register operands,
  • select the correct branch target address,
  • latch it into the PC

13
Control Diagram
IR <- Mem[PC]; PC <- PC + 4
A <- R[rs]; B <- R[rt]
S <- A + B
S <- A + SX
S <- A or ZX
S <- A + SX
If Cond: PC <- PC + SX
M <- Mem[S]
Mem[S] <- B
M <- S
M <- S
R[rd] <- S
R[rd] <- M
R[rt] <- S
[Figure: datapath blocks labeled Next PC, PC, Inst. Mem, IR, Reg. File, Exec (Equal), Mem Access, Data Mem, Reg File]
14
The Big Picture: Where Are We Now?
  • The Five Classic Components of a Computer

15
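Recall: Single-Cycle Control
[Figure: single-cycle processor. Control receives the Instruction (from Ideal Instruction Memory, addressed by Instruction Address) and Conditions, and drives Control Signals into the Datapath. The Datapath contains 32 32-bit registers (ports Rw, Ra, Rb selected by the 5-bit Rd, Rs, Rt fields; 32-bit buses A and B), Next Address logic, and Ideal Data Memory (Data Address, Data In, Data Out), all clocked by Clk]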
16
Data Stationary Control
  • The Main Control generates the control signals
    during Reg/Dec
  • Control signals for Exec (ExtOp, ALUSrc, ...) are
    used 1 cycle later
  • Control signals for Mem (MemWr, Branch) are used 2
    cycles later
  • Control signals for Wr (MemtoReg, RegWr) are used
    3 cycles later

[Figure: Main Control in Reg/Dec feeding the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr pipeline registers. ExtOp, ALUSrc, ALUOp, and RegDst are consumed in Exec; MemWr and Branch continue to Mem; MemtoReg and RegWr continue to Wr]
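The idea that control is generated once in Reg/Dec and then travels with the instruction can be sketched as follows. The signal values in `main_control` are a hypothetical decode table for two instructions, and the function names are assumptions made for illustration.

```python
def main_control(op):
    """Hypothetical decode table: signals grouped by the stage that uses them."""
    table = {
        "lw": {"ex": {"ExtOp": 1, "ALUSrc": 1},
               "mem": {"MemWr": 0, "Branch": 0},
               "wr": {"MemtoReg": 1, "RegWr": 1}},
        "add": {"ex": {"ExtOp": 0, "ALUSrc": 0},
                "mem": {"MemWr": 0, "Branch": 0},
                "wr": {"MemtoReg": 0, "RegWr": 1}},
    }
    return table[op]


def trace_control(ops):
    """Return (cycle, stage, op, signals) for every use of a control group.

    Cycle 1 is the Reg/Dec cycle of the first instruction.
    """
    id_ex = ex_mem = mem_wr = None  # contents of the pipeline registers
    used = []
    decode_stream = list(ops) + [None] * 3  # extra cycles drain the pipe
    for cycle, op in enumerate(decode_stream, start=1):
        # Each stage consumes only its own group from its input register.
        if id_ex:
            used.append((cycle, "Exec", id_ex["op"], id_ex["ex"]))
        if ex_mem:
            used.append((cycle, "Mem", ex_mem["op"], ex_mem["mem"]))
        if mem_wr:
            used.append((cycle, "Wr", mem_wr["op"], mem_wr["wr"]))
        # Clock edge: every control word moves one register down the pipe.
        decoded = dict(main_control(op), op=op) if op else None
        mem_wr, ex_mem, id_ex = ex_mem, id_ex, decoded
    return used


for entry in trace_control(["lw", "add"]):
    print(entry)
```

For an instruction decoded in cycle 1, its Exec group is used in cycle 2, its Mem group in cycle 3, and its Wr group in cycle 4, matching the 1, 2, and 3 cycle delays listed above.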
17
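Datapath + Data Stationary Control
[Figure: pipelined datapath (Next PC, Inst. Mem, IR, Decode, Reg. File, Exec, Mem Access, Data Mem, Reg File). Decode produces the fields op, fun, rs, rt, and im plus the control groups ex, me, and wb; the me, wb, rw, and valid (v) fields are carried through the pipeline registers to Mem Ctrl and WB Ctrl]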
18
Let's Try It Out
10   lw   r1, r2(35)
14   addI r2, r2, 3
20   sub  r3, r4, r5
24   beq  r6, r7, 100
30   ori  r8, r9, 17
34   add  r10, r11, r12
100  and  r13, r14, 15
(these addresses are octal)
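Before stepping through the snapshots, it can help to compute which address sits in which stage on a given cycle. The sketch below covers only the straight-line part of the program, before the branch redirects fetch; the stage names and the `occupancy` helper are illustrative assumptions.

```python
STAGES = ["Fetch", "Decode", "Exec", "Mem", "WB"]


def occupancy(fetch_order, cycle):
    """Map stage name -> address held in that stage during `cycle` (1-based)."""
    held = {}
    for depth, stage in enumerate(STAGES):
        index = cycle - 1 - depth  # an instruction is one stage deeper each cycle
        if 0 <= index < len(fetch_order):
            held[stage] = fetch_order[index]
    return held


# Octal addresses of the first five instructions in the listing.
fetch_order = [0o10, 0o14, 0o20, 0o24, 0o30]

for cycle in range(1, 6):
    stages = occupancy(fetch_order, cycle)
    print(cycle, {name: format(addr, "o") for name, addr in stages.items()})
```

Cycle 4 of this sketch corresponds to the snapshot titled "Fetch 24, Decode 20, Exec 14, Mem 10".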
19
Start: Fetch 10
[Figure: pipelined datapath snapshot (Inst. Mem, IR, Decode, Reg. File, Exec, Mem Access, Data Mem, Mem Ctrl, WB Ctrl); no instruction has been decoded yet and all later stages are empty (n)]
20
Fetch 14, Decode 10
[Figure: datapath snapshot; lw r1, r2(35) is in Decode (rs = 2) and the later stages are empty (n)]
21
Fetch 20, Decode 14, Exec 10
[Figure: datapath snapshot; addI r2, r2, 3 is in Decode (rs = 2), lw r1 is in Exec with operands r2 and 35, and the later stages are empty (n)]
22
Fetch 24, Decode 20, Exec 14, Mem 10
[Figure: datapath snapshot; sub r3, r4, r5 is in Decode, addI r2, r2, 3 is in Exec (operand r2), and lw r1 is in Mem (address r2+35)]
23
Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
[Figure: datapath snapshot; addI r2 is in Mem (result r2+3) and lw r1 is in WB (value M[r2+35])]
24
Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
[Figure: datapath snapshot; beq r6, r7, 100 is in Decode (registers 6 and 7), sub r3 is in Exec (operand r4), addI r2 is in Mem (result r2+3), and lw r1 is in WB (value M[r2+35])]
Note: Delayed Branch: always execute ori after beq
25
Fetch 34, Dcd 30, Ex 24, Mem 20, WB 14
[Figure: datapath snapshot with the pipeline fields marked x; r1 = M[r2+35] has been written to the register file]
Take the branch: r6 - r7 = 0
26
Fetch 34, Dcd 30, Ex 24, Mem 20, WB 14
[Figure: datapath snapshot; ori r8, r9, 17 is in Decode, beq is in Exec (operand r6, target 100), sub r3 is in Mem (result r4-r5), and addI r2 is in WB (result r2+3); the register file holds r1 = M[r2+35]]
Take the branch: r6 - r7 = 0
27
Fetch 100, Dcd 34, Ex 30, Mem 24, WB 20
[Figure: datapath snapshot; add r10, r11, r12 is in Decode, ori r8 is in Exec (operands r9 and 17), beq is in Mem, and sub r3 is in WB (result r4-r5); the register file holds r1 = M[r2+35] and r2 = r2+3; Next PC and PC show 100]
Do we have a problem here?
28
Fetch 100, Dcd 34, Ex 30, Mem 24, WB 20
[Figure: same datapath snapshot as the previous slide; add r10, r11, r12 is in Decode behind ori r8, and Next PC and PC show 100]
Oops, we should have only one delayed instruction
29
Fetch 104, Dcd 100, Ex 34, Mem 30, WB 24
[Figure: datapath snapshot; and r13, r14, r15 is in Decode, add r10 is in Exec (operand r11), ori r8 is in Mem (result r9 or 17), and beq is in WB; the register file holds r1 = M[r2+35], r2 = r2+3, and r3 = r4-r5; Next PC and PC show 104]
Squash the extra instruction
30
Fetch 108, Dcd 104, Ex 100, Mem 34, WB 30
[Figure: datapath snapshot; and r13 is in Exec (operand r14), add r10 is in Mem (result r11+r12), and ori r8 is in WB (result r9 or 17); the register file holds r1 = M[r2+35], r2 = r2+3, and r3 = r4-r5; Next PC and PC show 108]
31
Fetch 112, Dcd 108, Ex 104, Mem 100, WB 34
[Figure: datapath snapshot; and r13 is in Mem (operands r14 and r15) and add r10 is in WB (result r11+r12), marked "NO WB, NO Ovflow"; the register file holds r1 = M[r2+35], r2 = r2+3, r3 = r4-r5, and r8 = r9 or 17; Next PC and PC show 114]
Squash the extra instruction in the branch shadow!
32
Pipelined Processor
  • Separate control at each stage
  • Stalls propagate backwards to freeze previous
    stages
  • Bubbles are introduced into the pipeline by placing
    no-ops into the local stage and stalling previous
    stages.

33
Pipeline Hazards Again
  • Structural Hazard
  • Control Hazard
  • RAW (read after write) Data Hazard
  • WAW (write after write) Data Hazard
  • WAR (write after read) Data Hazard

[Figure: staggered stage diagrams (I-Fetch, DCD, OpFetch/OF, Exec, Mem, Store/RS, WB, Jump) illustrating each hazard type]
34
Recap: Data Hazards
  • Avoid some by design
  • eliminate WAR by always fetching operands early
    (DCD) in pipe
  • eliminate WAW by doing all WBs in order (last
    stage, static)
  • Detect and resolve remaining ones
  • stall or forward (if possible)

35
Hazard Detection
  • Suppose instruction i is about to be issued and
    a predecessor instruction j is in the
    instruction pipeline.
  • A RAW hazard exists on register r if
    r ∈ Rregs(i) ∩ Wregs(j)
  • Keep a record of pending writes (for inst's in
    the pipe) and compare with operand regs of
    current instruction.
  • When an instruction issues, reserve its result
    register as a write reservation.
  • When an operation completes, remove its write
    reservation.
  • A WAW hazard exists on register r if
    r ∈ Wregs(i) ∩ Wregs(j)
  • A WAR hazard exists on register r if
    r ∈ Wregs(i) ∩ Rregs(j)
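The three set conditions and the write-reservation bookkeeping above can be written out directly. This is a minimal sketch; the names `classify_hazards` and `WriteReservations` are illustrative assumptions, and registers are plain strings.

```python
from collections import Counter


def classify_hazards(i_reads, i_writes, j_reads, j_writes):
    """Hazards between instruction i (about to issue) and predecessor j."""
    return {
        "RAW": set(i_reads) & set(j_writes),
        "WAW": set(i_writes) & set(j_writes),
        "WAR": set(i_writes) & set(j_reads),
    }


class WriteReservations:
    """Record of pending writes for instructions still in the pipe."""

    def __init__(self):
        self.pending = Counter()

    def has_raw_hazard(self, operand_regs):
        # Compare the operand registers of the current instruction
        # against every pending write.
        return any(self.pending[reg] > 0 for reg in operand_regs)

    def issue(self, result_reg):
        self.pending[result_reg] += 1  # reserve on issue

    def complete(self, result_reg):
        self.pending[result_reg] -= 1  # release on completion


# j: add r1, r2, r3   followed by   i: sub r4, r1, r3
found = classify_hazards({"r1", "r3"}, {"r4"}, {"r2", "r3"}, {"r1"})
print(found)

board = WriteReservations()
board.issue("r1")
print(board.has_raw_hazard(["r1", "r3"]))
board.complete("r1")
print(board.has_raw_hazard(["r1", "r3"]))
```

In the example pair, only the RAW set is non-empty (it contains r1), and the reservation on r1 blocks the dependent instruction until the write completes.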

36
Record of Pending Writes In Pipeline Registers
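  • Current operand registers
  • Pending writes
  • hazard <=
  • ((rs == rwex) & regWex) OR
  • ((rs == rwmem) & regWme) OR
  • ((rs == rwwb) & regWwb) OR
  • ((rt == rwex) & regWex) OR
  • ((rt == rwmem) & regWme) OR
  • ((rt == rwwb) & regWwb)

[Figure: pipeline (IAU, npc, PC, I mem, Regs, alu, D mem, Regs). The decode register holds op, rw, rs, rt, and im; each later pipeline register carries op, rw, and a status bit (n) alongside its data (A, B, S, m)]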
37
Resolve RAW by forwarding (or bypassing)
  • Detect nearest valid write op operand register
    and forward into op latches, bypassing remainder
    of the pipe
  • Increase muxes to add paths from pipeline
    registers
  • Data Forwarding = Data Bypassing

[Figure: pipeline (IAU, npc, PC, I mem, Regs, alu, D mem, Regs) with a Forward mux ahead of the A and B operand latches; each pipeline register carries op, rw, and a valid bit (v)]
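"Detect nearest valid write" amounts to a priority search over the pending writes, ordered from the stage closest to the reader to the farthest. The sketch below assumes each pipeline register exposes a (valid, rw, value) triple; that layout and the function name are illustrative.

```python
def forward_operand(reg, regfile_value, pipeline_writes):
    """Select the value for operand register `reg`.

    pipeline_writes is ordered nearest stage first, and each entry is
    (valid, rw, value). The first valid entry whose destination matches
    wins; otherwise the register file value is used.
    """
    for valid, rw, value in pipeline_writes:
        if valid and rw == reg:
            return value
    return regfile_value


# Two older instructions both write register 1; the nearest result is 7.
writes = [(True, 1, 7), (True, 1, 3), (False, 1, 9)]
print(forward_operand(1, 0, writes))
print(forward_operand(2, 5, writes))
```

Ordering the list nearest-first is what makes the mux pick the most recent write when several in-flight instructions target the same register.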
38
What about memory operations?
  • If instructions are initiated in order and
    operations always occur in the same stage, there
    can be no hazards between memory operations!
  • What about data dependence on loads?
    R1 <- R4 + R5
    R2 <- Mem[R2 + I]
    R3 <- R2 + R1
    => "Delayed Loads"
  • Can recognize this in decode stage and introduce
    bubble while stalling fetch stage
  • Tricky situation:
    R1 <- Mem[R2 + I]
    Mem[R3 + 34] <- R1
    Handle with bypass in memory stage!

[Figure: two instruction registers (op Rd Ra Rb), operand latches A and B, Rd fields, and Mem, with a path labeled "to reg file"]
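The decode-stage check for a load-use dependence can be sketched as below. It assumes instructions are described by small dictionaries with hypothetical fields `dest`, `alu_srcs` (registers the ALU needs in Exec), and `store_data` (the register a store writes to memory); none of these names come from the lecture.

```python
def needs_load_stall(in_decode, in_exec):
    """True when the instruction in decode must wait for a load in exec.

    Only ALU source operands force a bubble. A value needed solely as
    store data can instead be bypassed in the memory stage.
    """
    if in_exec is None or in_exec["op"] != "lw":
        return False
    return in_exec["dest"] in in_decode["alu_srcs"]


# R2 <- Mem[R2 + I] followed by R3 <- R2 + R1: the add needs R2 in Exec.
load_r2 = {"op": "lw", "dest": "R2", "alu_srcs": {"R2"}, "store_data": None}
add_r3 = {"op": "add", "dest": "R3", "alu_srcs": {"R2", "R1"}, "store_data": None}
print(needs_load_stall(add_r3, load_r2))

# R1 <- Mem[R2 + I] followed by Mem[R3 + 34] <- R1: R1 is only store data.
load_r1 = {"op": "lw", "dest": "R1", "alu_srcs": {"R2"}, "store_data": None}
store = {"op": "sw", "dest": None, "alu_srcs": {"R3"}, "store_data": "R1"}
print(needs_load_stall(store, load_r1))
```

The first pair requires a bubble, while the second is the "tricky situation" that the memory-stage bypass covers without stalling.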
39
Compiler: Avoiding Load Stalls
40
What about Interrupts, Traps, Faults?
  • External Interrupts
  • Allow pipeline to drain, Fill with NOPs
  • Load PC with interrupt address
  • Faults (within instruction, restartable)
  • Force trap instruction into IF
  • disable writes till trap hits WB
  • must save multiple PCs or PC state
  • Recall: Precise Exceptions => State of the machine
    is preserved as if program executed up to the
    offending instruction
  • All previous instructions completed
  • Offending instruction and all following
    instructions act as if they have not even started
  • Same system code will work on different
    implementations

41
Exception/Interrupts: Implementation questions
  • 5 instructions, executing in 5 different pipeline
    stages!
  • Who caused the interrupt?
  • Stage: Problem interrupts occurring
  • IF: Page fault on instruction fetch; misaligned
    memory access; memory-protection violation
  • ID: Undefined or illegal opcode
  • EX: Arithmetic exception
  • MEM: Page fault on data fetch; misaligned memory
    access; memory-protection violation; memory
    error
  • How do we stop the pipeline? How do we restart
    it?
  • Do we interrupt immediately or wait?
  • How do we sort all of this out to maintain
    preciseness?

42
Exception Handling
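[Figure: pipeline (IAU, npc, PC, I mem, Regs, alu, D mem, Regs) carrying an exception (Excp) field through each pipeline register, with lw 2,20(5) as the example instruction]
  • I mem: detect bad instruction address
  • Decode: detect bad instruction
  • alu: detect overflow
  • D mem: detect bad data address
  • Final stage: allow exception to take effect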
43
Another look at the exception problem
[Figure: program flow versus time, with exceptions labeled Data TLB, Bad Inst, Inst TLB fault, and Overflow]
  • Use pipeline to sort this out!
  • Pass exception status along with instruction.
  • Keep track of PCs for every instruction in
    pipeline.
  • Don't act on exception until it reaches WB stage
  • Handle interrupts through "faulting no-op" in IF
    stage
  • When instruction reaches end of MEM stage:
  • Save PC => EPC, Interrupt vector addr => PC
  • Turn all instructions in earlier stages into
    no-ops!
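The end-of-MEM rule can be sketched as a single function over the in-flight instructions. The `Slot` record, the stage names used as dictionary keys, and the vector address in the example are all assumptions for illustration; the sketch also squashes the offending instruction itself, following the precise-exception definition that it must act as if it never started.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    pc: int
    excp: Optional[str] = None  # exception status carried with the instruction
    valid: bool = True          # False means the slot is now a no-op


def end_of_mem(pipe, vector_addr):
    """Act on an exception only when its instruction reaches the end of MEM.

    pipe maps "IF", "ID", "EX", "MEM" to Slot objects. Returns
    (epc, new_pc) when an exception is taken, otherwise None.
    """
    mem = pipe["MEM"]
    if not (mem.valid and mem.excp):
        return None
    for stage in ("IF", "ID", "EX"):
        pipe[stage].valid = False  # younger instructions become no-ops
    mem.valid = False              # offending instruction must not commit
    return mem.pc, vector_addr


pipe = {
    "IF": Slot(pc=28),
    "ID": Slot(pc=24),
    "EX": Slot(pc=20, excp="overflow"),
    "MEM": Slot(pc=16, excp="bad data address"),
}
print(end_of_mem(pipe, vector_addr=0x80))
```

Because the check happens at one fixed stage, the older instruction in MEM is handled first even if a younger instruction in EX raised its own exception in the same cycle.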

44
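Resolution: Freeze Above, Bubble Below
[Figure: pipeline (IAU, npc, PC, I mem, Regs, alu, D mem, Regs); the stages above the stall point are marked "freeze", a "bubble" is inserted below it, and each pipeline register carries op, rw, and a status bit (n)]
  • Flush accomplished by setting "invalid" bit in
    pipeline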
45
FYI: MIPS R3000 clocking discipline
  • 2-phase non-overlapping clocks (phi1, phi2)
  • Pipeline stage is two (level sensitive) latches

[Figure: latches clocked by phi1 and phi2, shown alongside an edge-triggered register]
46
MIPS R3000 Instruction Pipeline
[Figure: R3000 pipeline stages Inst Fetch, Decode / Reg. Read, ALU / E.A., Memory, and Write Reg, annotated with the resources used: TLB, I-Cache, RF, Operation, E.A., TLB, D-Cache, WB]
Write in phase 1, read in phase 2 => eliminates
bypass from WB
47
Recall: Data Hazard on r1
[Figure: pipeline diagram over time (clock cycles) with stages IF, ID/RF, EX, MEM, WB (drawn as Im, Reg, ALU, Dm, Reg) for the instructions below, listed in program order]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
With MIPS R3000 pipeline, no need to forward from
WB stage
48
MIPS R3000 Multicycle Operations
  • Ex: Multiply, Divide, Cache Miss
  • Use control word of local stage to step through
    multicycle operation
  • Stall all stages above multicycle operation in the
    pipeline
  • Drain (bubble) stages below it
  • Alternatively, launch multiply/divide to autonomous
    unit, only stall pipe if attempt to get result
    before ready
  • This means stall mflo/mfhi in decode stage if
    multiply/divide still executing
49
Is CPI = 1 for our pipeline?
  • Remember that CPI is an average: cycles/inst
  • CPI here is 1, since the average throughput is 1
    instruction every cycle.
  • What if there are stalls or multi-cycle
    execution?
  • Usually CPI > 1. How close can we get to 1?

50
Recall: Compute CPI?
  • Start with Base CPI
  • Add stalls
  • Suppose:
  • CPI_base = 1
  • Freq_branch = 20%, freq_load = 30%
  • Suppose branches always cause 1 cycle stall
  • Loads cause a 100 cycle stall 1% of time
  • Then CPI = 1 + (1 × 0.20) + (100 × 0.30 × 0.01) = 1.5
  • Multicycle? Could treat as
    CPI_stall = (CYCLES - CPI_base) × freq_inst
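The arithmetic above can be reproduced with a few lines. The function name and the (frequency, stall cycles) pair layout are illustrative assumptions; the numbers are the ones from the slide.

```python
def cpi_with_stalls(cpi_base, stall_sources):
    """Base CPI plus the sum of frequency x stall cycles for each source.

    stall_sources is a list of (events per instruction, stall cycles) pairs.
    """
    return cpi_base + sum(freq * cycles for freq, cycles in stall_sources)


stall_sources = [
    (0.20, 1),           # branches: 20% of instructions, 1 cycle stall each
    (0.30 * 0.01, 100),  # loads: 30% of instructions, 100 cycle stall 1% of the time
]
cpi = cpi_with_stalls(1.0, stall_sources)
print(round(cpi, 3))
```

Each stall source contributes its per-instruction frequency times its penalty, so the branch term adds 0.2 and the load term adds 0.3 to the base CPI of 1.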

51
Case Study: MIPS R4000 (200 MHz)
  • 8 Stage Pipeline:
  • IF: first half of fetching of instruction; PC
    selection happens here as well as initiation of
    instruction cache access.
  • IS: second half of access to instruction cache.
  • RF: instruction decode and register fetch, hazard
    checking and also instruction cache hit
    detection.
  • EX: execution, which includes effective address
    calculation, ALU operation, and branch target
    computation and condition evaluation.
  • DF: data fetch, first half of access to data
    cache.
  • DS: second half of access to data cache.
  • TC: tag check, determine whether the data cache
    access hit.
  • WB: write back for loads and register-register
    operations.
  • 8 Stages: What is impact on Load delay? Branch
    delay? Why?

52
Case Study MIPS R4000
IF
IS IF
RF IS IF
EX RF IS IF
DF EX RF IS IF
DS DF EX RF IS IF
TC DS DF EX RF IS IF
WB TC DS DF EX RF IS IF
TWO Cycle Load Latency
IF
IS IF
RF IS IF
EX RF IS IF
DF EX RF IS IF
DS DF EX RF IS IF
TC DS DF EX RF IS IF
WB TC DS DF EX RF IS IF
THREE Cycle Branch Latency
(conditions evaluated during EX phase)
Delay slot plus two stalls
Branch likely cancels delay slot if not taken
53
MIPS R4000 Floating Point
  • FP Adder, FP Multiplier, FP Divider
  • Last step of FP Multiplier/Divider uses FP Adder
    HW
  • 8 kinds of stages in FP units
  • Stage (Functional unit): Description
  • A (FP adder): Mantissa ADD stage
  • D (FP divider): Divide pipeline stage
  • E (FP multiplier): Exception test stage
  • M (FP multiplier): First stage of multiplier
  • N (FP multiplier): Second stage of multiplier
  • R (FP adder): Rounding stage
  • S (FP adder): Operand shift stage
  • U: Unpack FP numbers

54
MIPS FP Pipe Stages
  • FP Instr: 1 2 3 4 5 6 7 8
  • Add, Subtract: U, S+A, A+R, R+S
  • Multiply: U, E+M, M, M, M, N, N+A, R
  • Divide: U, A, R, D^28, D+A, D+R, D+R, D+A, D+R, A, R
  • Square root: U, E, (A+R)^108, A, R
  • Negate: U, S
  • Absolute value: U, S
  • FP compare: U, A, R
  • Stages:
  • A: Mantissa ADD stage
  • D: Divide pipeline stage
  • E: Exception test stage
  • M: First stage of multiplier
  • N: Second stage of multiplier
  • R: Rounding stage
  • S: Operand shift stage
  • U: Unpack FP numbers
55
R4000 Performance
  • Not ideal CPI of 1:
  • Load stalls (1 or 2 clock cycles)
  • Branch stalls (2 cycles + unfilled slots)
  • FP result stalls: RAW data hazard (latency)
  • FP structural stalls: Not enough FP hardware
    (parallelism)

56
Summary
  • Hazards limit performance
  • Structural: need more HW resources
  • Data: need forwarding, compiler scheduling
  • Control: early evaluation & PC, delayed branch,
    prediction
  • Data hazards must be handled carefully:
  • RAW data hazards handled by forwarding
  • WAW and WAR hazards don't exist in 5-stage
    pipeline
  • MIPS I instruction set architecture made pipeline
    visible (delayed branch, delayed load)
  • Exceptions in 5-stage pipeline recorded when
    they occur, but acted on only at WB (end of MEM)
    stage
  • Must flush all previous instructions
  • More performance from deeper pipelines,
    parallelism