Title: 14:332:331 Computer Architecture and Assembly Language Fall 2003 Lecture 18 Introduction to Pipelined Datapath
1 14:332:331 Computer Architecture and Assembly Language, Fall 2003
Lecture 18: Introduction to Pipelined Datapath
- Adapted from Dave Patterson's UCB CS152 slides and Mary Jane Irwin's PSU CSE331 slides
2 Heads Up
- This week's material
  - Introduction to pipelining
  - Reading assignment: PH 6.1
- Reminders
  - HW6 deadline???
- Next week's material
  - I/O, exceptions, and interrupts
  - Reading assignment: PH 5.6, 8.5, and A.7 through A.8
3 Review: Multicycle Data and Control Path
[Figure: multicycle datapath with its control path. The Control FSM, driven by Instr[31-26], produces PCWriteCond, PCWrite, PCSource, ALUOp, IorD, MemRead, MemWrite, ALUSrcA, ALUSrcB, MemtoReg, RegWrite, IRWrite, and RegDst. Datapath elements: PC, Memory, IR, Register File, A, B, ALU (with zero output), ALUout, Sign Extend, two Shift left 2 units, and ALU control (fed by Instr[5-0]). Other labeled fields: PC[31-28], Instr[25-0], Instr[15-0].]
4 Review: RTL Summary
Step | R-type | Mem Ref | Branch | Jump
Instr fetch | (all instructions) IR = Memory[PC]; PC = PC + 4
Decode | (all instructions) A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Execute | ALUOut = A op B | ALUOut = A + sign-extend(IR[15-0]) | if (A == B) PC = ALUOut | PC = PC[31-28] || (IR[25-0] << 2)
Memory access | Reg[IR[15-11]] = ALUOut | MDR = Memory[ALUOut] or Memory[ALUOut] = B | (none) | (none)
Write-back | (none) | Reg[IR[20-16]] = MDR | (none) | (none)
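As a rough illustration of the RTL summary, the following Python sketch counts how many steps (one clock cycle each) every instruction class needs in the multicycle design. The names `RTL_STEPS` and `cycles_for` are hypothetical, and the Mem Ref column is split into lw and sw as an assumption made for this example.

```python
# Steps each instruction class passes through, following the RTL summary.
# RTL_STEPS and cycles_for are illustrative names, not from the slides.
RTL_STEPS = {
    "R-type": ["Instr fetch", "Decode", "Execute", "Memory access"],
    "lw":     ["Instr fetch", "Decode", "Execute", "Memory access", "Write-back"],
    "sw":     ["Instr fetch", "Decode", "Execute", "Memory access"],
    "beq":    ["Instr fetch", "Decode", "Execute"],
    "j":      ["Instr fetch", "Decode", "Execute"],
}


def cycles_for(instr_class: str) -> int:
    """Return the number of clock cycles (one per RTL step) for a class."""
    return len(RTL_STEPS[instr_class])


if __name__ == "__main__":
    for name in RTL_STEPS:
        print(f"{name:7s} {cycles_for(name)} cycles")
```

Under these assumptions lw is the only class that uses all five steps, which is why it sets the worst case for the single cycle design.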
5 Review: Multicycle Datapath FSM
[Figure: state diagram of the multicycle control FSM, summarized below.]
Unless otherwise assigned: PCWrite, IRWrite, MemWrite, RegWrite = 0; others = X
- Instr Fetch
  - State 0 (Start): IorD=0, MemRead, IRWrite, ALUSrcA=0, ALUSrcB=01, PCSource=00, ALUOp=00, PCWrite
- Decode
  - State 1: ALUSrcA=0, ALUSrcB=11, ALUOp=00, PCWriteCond=0
- Execute
  - State 2 (Op = lw or sw): ALUSrcA=1, ALUSrcB=10, ALUOp=00, PCWriteCond=0
  - State 6 (Op = R-type): ALUSrcA=1, ALUSrcB=00, ALUOp=10, PCWriteCond=0
  - State 8 (Op = beq): ALUSrcA=1, ALUSrcB=00, ALUOp=01, PCSource=01, PCWriteCond
  - State 9 (Op = j): PCSource=10, PCWrite
- Memory Access
  - State 3 (Op = lw): MemRead, IorD=1, PCWriteCond=0
  - State 5 (Op = sw): MemWrite, IorD=1, PCWriteCond=0
  - State 7 (from state 6): RegDst=1, RegWrite, MemtoReg=0, PCWriteCond=0
- Write Back
  - State 4 (from state 3): RegDst=0, RegWrite, MemtoReg=1, PCWriteCond=0
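The state transitions of this FSM can be sketched as a small next-state function. This is only an illustration: the state numbers follow the slide, but the function names `next_state` and `trace` and the opcode strings are assumptions made for the example.

```python
# Next-state function for the ten-state multicycle control FSM.
# State numbers follow the slide; names and opcode strings are hypothetical.
def next_state(state: int, op: str) -> int:
    if state == 0:                      # Instr Fetch -> Decode
        return 1
    if state == 1:                      # Decode dispatches on the opcode
        return {"lw": 2, "sw": 2, "R-type": 6, "beq": 8, "j": 9}[op]
    if state == 2:                      # address computed: lw reads, sw writes
        return 3 if op == "lw" else 5
    if state == 3:                      # lw memory read -> write back
        return 4
    if state == 6:                      # R-type execute -> R-type completion
        return 7
    return 0                            # states 4, 5, 7, 8, 9 return to fetch


def trace(op: str) -> list:
    """List the states visited by one instruction, starting at state 0."""
    states, s = [0], next_state(0, op)
    while s != 0:
        states.append(s)
        s = next_state(s, op)
    return states


if __name__ == "__main__":
    for op in ("lw", "sw", "R-type", "beq", "j"):
        print(op, trace(op))            # e.g. lw [0, 1, 2, 3, 4]
```

The length of each trace equals the number of clock cycles the instruction takes in the multicycle implementation.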
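6 Review: FSM Implementation
[Figure: FSM implementation. Combinational control logic takes as inputs Op0 through Op5 (Inst[31-26]) and the current state held in the State Reg, which is clocked by the System Clock. It produces the Next State and the outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSourceB, ALUSourceA, RegWrite, and RegDst.]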
7 Single Cycle Disadvantages & Advantages
- Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
- Is wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle
but
- Is simple and easy to understand
[Figure: Single Cycle Implementation timing. Clk with Cycle 1 (lw) and Cycle 2 (sw, followed by Waste).]
8 Multicycle Advantages & Disadvantages
- Uses the clock cycle efficiently: the clock cycle is timed to accommodate the slowest instruction step
  - balance the amount of work to be done in each step
  - restrict each step to use only one major functional unit
- Multicycle implementations allow
  - functional units to be used more than once per instruction as long as they are used on different clock cycles
  - faster clock rates
  - different instructions to take a different number of clock cycles
but
- Requires additional internal state registers, muxes, and more complicated (FSM) control
9 The Five Stages of Load Instruction
[Figure: lw occupying Cycle 1 through Cycle 5, one stage per cycle.]
- IFetch: Instruction Fetch and Update PC
- Dec: Registers Fetch and Instruction Decode
- Exec: Execute R-type; calculate memory address
- Mem: Read/write the data from/to the Data Memory
- WB: Write the data back to the register file
10 Single Cycle vs. Multiple Cycle Timing
[Figure: timing comparison. Single Cycle Implementation: Clk with Cycle 1 (lw) and Cycle 2 (sw, followed by Waste). Multiple Cycle Implementation: Clk with Cycle 1 through Cycle 10, covering lw, sw, and the start of an R-type instruction.]
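A small calculation can make the timing comparison concrete. The sketch below assumes every step takes one time unit (`STEP_TIME`, a made-up value) and uses step counts of 5 for lw and 4 for sw and R-type; the function names are hypothetical.

```python
STEP_TIME = 1  # assumed: every step takes one time unit (hypothetical value)
STEPS = {"lw": 5, "sw": 4, "R-type": 4}  # steps per instruction, multicycle design


def single_cycle_time(program):
    """Every instruction takes one long cycle sized for the slowest one (lw)."""
    period = max(STEPS.values()) * STEP_TIME
    return len(program) * period


def multicycle_time(program):
    """Each instruction takes only as many short cycles as it has steps."""
    return sum(STEPS[instr] for instr in program) * STEP_TIME


if __name__ == "__main__":
    program = ["lw", "sw", "R-type"]
    print(single_cycle_time(program))  # 15 time units
    print(multicycle_time(program))    # 13 time units
```

The difference of two time units is the "Waste" in the single cycle diagram: sw and the R-type instruction each finish one step before the lw-sized cycle ends.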
11 Pipelined MIPS Processor
- Start the next instruction while still working on the current one
  - improves throughput: total amount of work done in a given time
  - instruction latency (execution time, delay time, response time) is not reduced: time from the start of an instruction to its completion
[Figure: lw, sw, and R-type overlapped in time across Cycle 1 through Cycle 8.]
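The throughput versus latency distinction can be checked with a short calculation. This sketch assumes an ideal five-stage pipeline with no stalls; `pipelined_cycles` and `throughput` are illustrative names.

```python
def pipelined_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Cycles to finish n instructions on an ideal pipeline with no stalls."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later one finishes
    # one cycle after its predecessor.
    return n_stages + (n_instructions - 1)


def throughput(n_instructions: int, n_stages: int = 5) -> float:
    """Average instructions completed per cycle."""
    return n_instructions / pipelined_cycles(n_instructions, n_stages)


if __name__ == "__main__":
    print(pipelined_cycles(1))     # 5: latency of one instruction is unchanged
    print(pipelined_cycles(3))     # 7: lw, sw, R-type overlapped
    print(throughput(1000))        # close to 1 instruction per cycle
```

Each instruction still takes five cycles from start to completion, but the average rate approaches one instruction per cycle as the program grows.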
12 Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: three timing diagrams. Single Cycle Implementation: Clk with Cycle 1 (Load) and Cycle 2 (Store, followed by Waste). Multiple Cycle Implementation: Clk with Cycle 1 through Cycle 10, covering lw, sw, and R-type. Pipeline Implementation: lw, sw, and R-type overlapped.]
13 Pipelining the MIPS ISA
- What makes it easy
  - all instructions are the same length (32 bits)
  - few instruction formats (three) with symmetry across formats
  - memory operations can occur only in loads and stores
  - operands must be aligned in memory so a single data transfer requires only one memory access
- What makes it hard
  - structural hazards: what if we had only one memory?
  - control hazards: what about branches?
  - data hazards: what if an instruction's input operands depend on the output of a previous instruction?
14 MIPS Pipeline Datapath Modifications
- What do we need to add/modify in our MIPS datapath?
  - State registers between pipeline stages to isolate them
[Figure: pipelined datapath with stages IFetch, Dec, Exec, Mem, and WB, separated by the state registers IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB. Elements: PC, Instruction Memory, two Add units, the constant 4, Shift left 2, Register File, Sign Extend (16 to 32 bits), ALU, Data Memory, and the System Clock.]
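To see what the state registers between stages do, here is a minimal sketch that moves instructions one stage forward on every clock. It models only which instruction sits in which stage, not the data or control values the real registers hold; `simulate` and its return format are assumptions made for this example.

```python
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]


def simulate(program):
    """Return one dict per clock cycle mapping stage name -> instruction or None."""
    pipeline = [None] * len(STAGES)   # pipeline[i] is the instruction in STAGES[i]
    pending = list(program)
    history = []
    while pending or any(slot is not None for slot in pipeline):
        # On the clock edge every instruction moves into the next stage,
        # and a new instruction (if any) enters IFetch.
        pipeline = [pending.pop(0) if pending else None] + pipeline[:-1]
        if any(slot is not None for slot in pipeline):
            history.append(dict(zip(STAGES, pipeline)))
    return history


if __name__ == "__main__":
    for cycle, state in enumerate(simulate(["lw", "sw", "add"]), start=1):
        print(cycle, state)
```

With three instructions the list has seven entries, and in cycle 4 the lw is in Mem while sw is in Exec and add is in Dec.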
15 MIPS Pipeline Control Path Modifications
- All control signals are determined during Decode
  - and held in the state registers between pipeline stages
[Figure: the pipelined datapath of the previous slide (stages IFetch, Dec, Exec, Mem, WB; state registers IFetch/Dec, Dec/Exec, Exec/Mem, Mem/WB) with a Control unit added in the Dec stage.]
16 Graphically Representing MIPS Pipeline
- Can help with answering questions like
  - how many cycles does it take to execute this code?
  - what is the ALU doing during cycle 4?
  - is there a hazard, why does it occur, and how can it be fixed?
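Questions such as "what is the ALU doing during cycle 4?" can also be answered by arithmetic on the diagram. The sketch below assumes an ideal five-stage pipeline with no stalls, one instruction issued per cycle starting in cycle 1; `stage_at`, `who_is_in`, and the sample program are hypothetical.

```python
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]


def stage_at(instr_index: int, cycle: int):
    """Stage occupied in `cycle` (1-based) by the instruction at 0-based
    position `instr_index`, or None if it is not in the pipeline."""
    offset = cycle - 1 - instr_index
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


def who_is_in(stage: str, cycle: int, program):
    """Instruction occupying `stage` during `cycle`, or None if it is idle."""
    for i, instr in enumerate(program):
        if stage_at(i, cycle) == stage:
            return instr
    return None


if __name__ == "__main__":
    program = ["lw", "sw", "add", "sub"]
    print(who_is_in("Exec", 4, program))  # sw: the ALU serves the 2nd instruction
```

The ALU belongs to the Exec stage, so in cycle 4 it works on the second instruction of the program.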
17 Why Pipeline? For Throughput!
[Figure: pipeline diagram of Inst 0 through Inst 4, instruction order (vertical) against time in clock cycles (horizontal).]
Once the pipeline is full, one instruction is completed every cycle
18 Can pipelining get us into trouble?
- Yes: Pipeline Hazards
  - structural hazards: attempt to use the same resource by two different instructions at the same time
  - data hazards: attempt to use item before it is ready
    - instruction depends on result of prior instruction still in the pipeline
  - control hazards: attempt to make a decision before condition is evaluated
    - branch instructions
- Can always resolve hazards by waiting
  - pipeline control must detect the hazard
  - take action (or delay action) to resolve hazards
19 A Unified Memory Would Be a Structural Hazard
[Figure: pipeline diagram of lw followed by Inst 1 through Inst 4, instruction order against time in clock cycles.]
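The conflict on this slide can be located with a short check. The sketch assumes a five-stage pipeline with no stalls, one instruction issued per cycle starting in cycle 1, and a single memory shared by IFetch and Mem; `memory_conflicts` is a hypothetical helper.

```python
def memory_conflicts(program):
    """Return (cycle, fetching_instr, mem_instr) triples in which a single
    shared memory would be needed by two instructions at once."""
    conflicts = []
    for i, instr in enumerate(program):
        if instr.split()[0] not in ("lw", "sw"):
            continue                      # only loads and stores use Mem
        mem_cycle = i + 4                 # IFetch in cycle i+1, Mem in cycle i+4
        fetch_index = mem_cycle - 1       # instruction being fetched that cycle
        if fetch_index < len(program):
            conflicts.append((mem_cycle, program[fetch_index], instr))
    return conflicts


if __name__ == "__main__":
    program = ["lw", "Inst 1", "Inst 2", "Inst 3", "Inst 4"]
    print(memory_conflicts(program))      # [(4, 'Inst 3', 'lw')]
```

In cycle 4 the lw reads its data while Inst 3 is being fetched, so one unified memory would have to serve both; separate instruction and data memories avoid the clash.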
20 How About Register File Access?
[Figure: pipeline diagram of add, Inst 1, Inst 2, add, Inst 4, instruction order against time in clock cycles.]
Can fix register file access hazard by doing reads in the second half of the cycle and writes in the first half.
21 Branch Instructions Cause Control Hazards
- Dependencies backward in time cause hazards
[Figure: pipeline diagram of add, beq, lw, Inst 3, Inst 4, instruction order against time.]
22 One Way to Fix a Control Hazard
Can fix branch hazard by waiting (stall), but this affects throughput
[Figure: pipeline diagram of add followed by beq, instruction order against time.]
23 Register Usage Can Cause Data Hazards
- Dependencies backward in time cause hazards
[Figure: pipeline diagram, instruction order against time, for the sequence below.]
add r1,r2,r3
sub r4,r1,r5
and r6,r1,r7
or r8,r1,r9
xor r4,r1,r5
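A simple check can find which instructions in this sequence read r1 too early. The sketch assumes a five-stage pipeline without forwarding, three-register instructions only, and a register file that writes in the first half of a cycle and reads in the second half, so a reader three or more instructions after the writer is safe. `parse` and `raw_hazards` are hypothetical helpers.

```python
def parse(instr: str):
    """Split 'op rd,rs,rt' into (op, destination, [sources])."""
    op, regs = instr.split(maxsplit=1)
    dest, *srcs = [r.strip() for r in regs.split(",")]
    return op, dest, srcs


def raw_hazards(program):
    """Return (producer, consumer) pairs where the consumer reads a register
    before the producer has written it back."""
    hazards = []
    for i, producer in enumerate(program):
        _, dest, _ = parse(producer)
        # Only the next two instructions decode before the write-back cycle.
        for j in range(i + 1, min(i + 3, len(program))):
            _, _, srcs = parse(program[j])
            if dest in srcs:
                hazards.append((producer, program[j]))
    return hazards


if __name__ == "__main__":
    program = ["add r1,r2,r3", "sub r4,r1,r5", "and r6,r1,r7",
               "or r8, r1, r9", "xor r4,r1,r5"]
    for producer, consumer in raw_hazards(program):
        print(f"{consumer!r} needs the result of {producer!r} too early")
```

Under these assumptions sub and and are flagged, while or reads r1 in the same cycle add writes it and xor reads it afterwards.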
24 One Way to Fix a Data Hazard
Can fix data hazard by waiting (stall), but this affects throughput
[Figure: pipeline diagram, instruction order against time, beginning with add r1,r2,r3.]
25 Loads Can Cause Data Hazards
- Dependencies backward in time cause hazards
[Figure: pipeline diagram, instruction order against time, for the sequence below.]
lw r1,100(r2)
sub r4,r1,r5
and r6,r1,r7
or r8,r1,r9
xor r4,r1,r5
26 Stores Can Cause Data Hazards
- Dependencies backward in time cause hazards
[Figure: pipeline diagram, instruction order against time, for the sequence below.]
add r1,r2,r3
sw r1,100(r5)
and r6,r1,r7
or r8,r1,r9
xor r4,r1,r5
27 Other Pipeline Structures Are Possible
- What about (slow) multiply operation?
  - let it take two cycles
- What if the data memory access is twice as slow as the instruction memory?
  - make the clock twice as slow or
  - let data memory access take two cycles (and keep the same clock rate)
[Figure: alternative pipeline structures; surviving stage labels are MUL, Reg, DM1, Reg, and DM2.]
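The two options for a slow data memory can be compared with a rough calculation. The sketch assumes an ideal pipeline with no stalls and a base clock period of one time unit; `total_time` and the instruction count of 100 are made up for illustration.

```python
def total_time(n_instructions: int, n_stages: int, cycle_time: float) -> float:
    """Time to run n instructions on an ideal pipeline with no stalls."""
    return (n_stages + n_instructions - 1) * cycle_time


if __name__ == "__main__":
    n = 100
    # Option 1: keep five stages but make the clock twice as slow.
    slow_clock = total_time(n, n_stages=5, cycle_time=2)
    # Option 2: split data memory access into DM1 and DM2 (six stages),
    # keeping the original clock period.
    extra_stage = total_time(n, n_stages=6, cycle_time=1)
    print(slow_clock, extra_stage)  # 208 105
```

Under these assumptions the extra stage costs one additional fill cycle, while the slower clock roughly doubles the run time. A deeper pipeline can add hazards that this sketch ignores.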
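28 Sample Pipeline Alternatives
[Figure: two sample pipelines; surviving stage labels are IM1, IM2, Reg, SHFT, DM1, DM2, and Reg. The work done per stage is listed below.]
- Short pipeline
  - PC update, IM access
  - decode, reg access
  - ALU op, DM access, shift/rotate, commit result (write back)
- Longer pipeline
  - PC update, BTB access, start IM access
  - IM access
  - decode, reg 1 access
  - shift/rotate, reg 2 access
  - ALU op
  - start DM access, exception
  - DM write, reg write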
29 Summary
- All modern day processors use pipelining
- Pipelining doesn't help latency of single task, it helps throughput of entire workload
  - Multiple tasks operating simultaneously using different resources
- Potential speedup = Number of pipe stages
- Pipeline rate limited by slowest pipeline stage
  - Unbalanced lengths of pipe stages reduce speedup
  - Time to fill pipeline and time to drain it reduce speedup
- Must detect and resolve hazards
  - Stalling negatively affects throughput
- To learn (much) more take CSE 431
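The speedup claims in the summary can be illustrated numerically. This sketch compares an unpipelined design, where each instruction takes the sum of all stage times, with an ideal pipeline clocked at the slowest stage; the stage times and the name `speedup` are hypothetical.

```python
def speedup(stage_times, n_instructions: int) -> float:
    """Unpipelined time divided by pipelined time, ignoring hazards."""
    unpipelined = n_instructions * sum(stage_times)
    cycle = max(stage_times)                       # slowest stage sets the clock
    pipelined = (len(stage_times) + n_instructions - 1) * cycle
    return unpipelined / pipelined


if __name__ == "__main__":
    balanced = [1, 1, 1, 1, 1]
    unbalanced = [1, 1, 2, 1, 1]
    print(speedup(balanced, 1))        # 1.0: a single task gains nothing
    print(speedup(balanced, 1000))     # just under 5, the number of stages
    print(speedup(unbalanced, 1000))   # just under 3: slow stage limits the rate
```

The balanced case approaches the number of pipe stages only for long instruction streams, because fill time is amortized; the unbalanced case stays well below it.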