14:332:331 Computer Architecture and Assembly Language, Fall 2003, Lecture 18: Introduction to Pipelined Datapath

Slides: 30

Provided by: jani177

Learn more at: https://www.ece.rutgers.edu

1
14:332:331 Computer Architecture and Assembly Language
Fall 2003
Lecture 18: Introduction to Pipelined Datapath

Adapted from Dave Patterson's UCB CS152 slides and Mary Jane Irwin's PSU CSE331 slides

2
Heads Up

This week's material
  Introduction to pipelining
  Reading assignment: PH 6.1
Reminders
  HW6 deadline???
Next week's material
  I/O, exceptions, and interrupts
  Reading assignment: PH 5.6, 8.5, and A.7 through A.8

3
Review: Multicycle Data and Control Path
[Figure: multicycle datapath (PC, Memory, IR, Register File, A and B registers, Sign Extend, Shift left 2, ALU, ALU control, ALUout) driven by a Control FSM that takes Instr[31-26] and produces PCWriteCond, PCWrite, PCSource, ALUOp, IorD, MemRead, MemWrite, MemtoReg, IRWrite, ALUSrcA, ALUSrcB, RegWrite, and RegDst.]
4
Review: RTL Summary
Instr fetch (all instructions):
  IR = Memory[PC]; PC = PC + 4
Decode (all instructions):
  A = Reg[IR[25-21]]; B = Reg[IR[20-16]]
  ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Execute:
  R-type: ALUOut = A op B
  Mem Ref: ALUOut = A + sign-extend(IR[15-0])
  Branch: if (A == B) PC = ALUOut
  Jump: PC = PC[31-28] || (IR[25-0] << 2)
Memory access:
  R-type: Reg[IR[15-11]] = ALUOut
  Mem Ref: MDR = Memory[ALUOut] or Memory[ALUOut] = B
Write-back:
  Mem Ref (lw): Reg[IR[20-16]] = MDR
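The branch and jump target computations in the Decode and Execute steps can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the function names and the sample instruction words are made up for illustration, and the PC passed in is assumed to be the already incremented PC + 4.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc_plus_4, instr):
    """Decode step: ALUOut = PC + (sign-extend(IR[15-0]) << 2)."""
    offset = sign_extend16(instr) << 2
    return (pc_plus_4 + offset) & 0xFFFFFFFF


def jump_target(pc_plus_4, instr):
    """Jump: PC = PC[31-28] concatenated with (IR[25-0] << 2)."""
    return (pc_plus_4 & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)


if __name__ == "__main__":
    # Hypothetical instruction words: a beq with offset -1 and a j to word 0x100000.
    print(hex(branch_target(0x00400008, 0x1000FFFF)))
    print(hex(jump_target(0x00400008, 0x08100000)))
```

A negative 16-bit offset moves the target backward, which is how a loop branch reaches an earlier instruction.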
5
Review: Multicycle Datapath FSM
Unless otherwise assigned: PCWrite, IRWrite, MemWrite, RegWrite = 0; others = X
Start -> state 0
State 0 (Instr Fetch): IorD=0, MemRead, IRWrite, ALUSrcA=0, ALUSrcB=01, PCSource=00, ALUOp=00, PCWrite
State 1 (Decode): ALUSrcA=0, ALUSrcB=11, ALUOp=00, PCWriteCond=0
  (Op = lw or sw) -> state 2; (Op = R-type) -> state 6; (Op = beq) -> state 8; (Op = j) -> state 9
Execute
  State 2: ALUSrcA=1, ALUSrcB=10, ALUOp=00, PCWriteCond=0
    (Op = lw) -> state 3; (Op = sw) -> state 5
  State 6: ALUSrcA=1, ALUSrcB=00, ALUOp=10, PCWriteCond=0
  State 8: ALUSrcA=1, ALUSrcB=00, ALUOp=01, PCSource=01, PCWriteCond
  State 9: PCSource=10, PCWrite
Memory Access
  State 3: MemRead, IorD=1, PCWriteCond=0
  State 5: MemWrite, IorD=1, PCWriteCond=0
  State 7: RegDst=1, RegWrite, MemtoReg=0, PCWriteCond=0
Write Back
  State 4: RegDst=0, RegWrite, MemtoReg=1, PCWriteCond=0
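The state numbering above can be turned into a small next-state function to count how many clock cycles each instruction class takes. This is an illustrative sketch only: the function names are invented here, and the transitions 6 -> 7, 3 -> 4, and the return to state 0 after the final state of each instruction are assumed from the usual form of this FSM, since the arrows are not preserved in the transcript.

```python
def next_state(state, op):
    """Next FSM state for the current state and instruction class."""
    if state == 0:                      # Instr Fetch
        return 1
    if state == 1:                      # Decode: dispatch on the opcode
        return {"lw": 2, "sw": 2, "R-type": 6, "beq": 8, "j": 9}[op]
    if state == 2:                      # memory address computed
        return 3 if op == "lw" else 5
    if state == 3:                      # lw memory read, then write back
        return 4
    if state == 6:                      # R-type execute, then register write
        return 7
    return 0                            # states 4, 5, 7, 8, 9 end the instruction


def cycles_for(op):
    """Count clock cycles from state 0 until the FSM returns to state 0."""
    state, cycles = 0, 0
    while True:
        cycles += 1
        state = next_state(state, op)
        if state == 0:
            return cycles


if __name__ == "__main__":
    for op in ("lw", "sw", "R-type", "beq", "j"):
        print(op, cycles_for(op))
```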
6
Review: FSM Implementation
[Figure: combinational control logic. Inputs: Op0 through Op5 (Inst[31-26]) and the current state from the State Reg, which is clocked by the System Clock. Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSourceB, ALUSourceA, RegWrite, RegDst, and Next State.]
7
Single Cycle Disadvantages & Advantages

Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
Is wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle
but
Is simple and easy to understand

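[Figure: Single Cycle Implementation timing. Clk: Cycle 1, Cycle 2. lw occupies Cycle 1; sw occupies part of Cycle 2, and the rest of that cycle is marked Waste.]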
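The cost of timing the clock to the slowest instruction can be made concrete with a short calculation. The sketch below is only an illustration: the unit latencies (200 ps for memory and ALU, 100 ps for register file access) and all names are assumptions chosen for the example, not values from the slides.

```python
# Assumed functional-unit latencies in picoseconds (illustrative values).
UNIT_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Functional units each instruction class uses, in order, within its single cycle.
PATHS = {
    "lw":     ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw":     ["imem", "reg_read", "alu", "dmem"],
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "beq":    ["imem", "reg_read", "alu"],
}


def instruction_latency(op):
    """Time the instruction actually needs to pass through its units."""
    return sum(UNIT_PS[unit] for unit in PATHS[op])


def single_cycle_clock():
    """The clock period must cover the slowest instruction."""
    return max(instruction_latency(op) for op in PATHS)


def wasted_time(op):
    """Idle time at the end of the cycle when op runs."""
    return single_cycle_clock() - instruction_latency(op)


if __name__ == "__main__":
    print("clock period:", single_cycle_clock(), "ps")
    for op in PATHS:
        print(op, "wastes", wasted_time(op), "ps per cycle")
```

Under these assumed latencies the load sets the clock period, and every other instruction leaves part of its cycle unused.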
8
Multicycle Advantages & Disadvantages

Uses the clock cycle efficiently: the clock cycle is timed to accommodate the slowest instruction step
  balance the amount of work to be done in each step
  restrict each step to use only one major functional unit
Multicycle implementations allow
  functional units to be used more than once per instruction as long as they are used on different clock cycles
  faster clock rates
  different instructions to take a different number of clock cycles
but
Requires additional internal state registers, muxes, and more complicated (FSM) control
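A rough calculation shows how the multicycle clock and the per-instruction cycle counts combine. This is a hedged sketch: the step latencies and the instruction mix are invented for illustration, and the cycle counts (lw 5, sw 4, R-type 4, beq 3, j 3) are the ones implied by the multicycle FSM reviewed earlier in the lecture.

```python
# Assumed latency of each step in picoseconds (illustrative values).
STEP_PS = {"IFetch": 200, "Dec": 100, "Exec": 200, "Mem": 200, "WB": 100}

# Clock cycles per instruction class in the multicycle design.
CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3, "j": 3}


def multicycle_clock():
    """The clock period only has to cover the slowest step."""
    return max(STEP_PS.values())


def instruction_time(op):
    """Execution time of one instruction: its cycle count times the clock period."""
    return CYCLES[op] * multicycle_clock()


def average_cpi(mix):
    """Average cycles per instruction for a mix given as {op: fraction}."""
    return sum(fraction * CYCLES[op] for op, fraction in mix.items())


if __name__ == "__main__":
    # Hypothetical instruction mix, used only to exercise the function.
    mix = {"lw": 0.25, "sw": 0.10, "R-type": 0.50, "beq": 0.10, "j": 0.05}
    print("clock period:", multicycle_clock(), "ps")
    print("lw time:", instruction_time("lw"), "ps")
    print("average CPI:", round(average_cpi(mix), 2))
```

With unbalanced steps, a five-cycle load can take longer than it would in a single long cycle, which is why the slide asks for balanced work in each step.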

9
The Five Stages of Load Instruction
[Figure: lw passes through IFetch, Dec, Exec, Mem, and WB in Cycle 1 through Cycle 5.]

IFetch: Instruction Fetch and Update PC
Dec: Registers Fetch and Instruction Decode
Exec: Execute R-type; calculate memory address
Mem: Read/write the data from/to the Data Memory
WB: Write the data back to the register file

10
Single Cycle vs. Multiple Cycle Timing
[Figure: timing comparison. Single Cycle Implementation: Clk with Cycle 1 and Cycle 2; lw, then sw with the remainder of its cycle marked Waste. Multiple Cycle Implementation: Clk with Cycle 1 through Cycle 10; lw, sw, and R-type execute one after another.]
11
Pipelined MIPS Processor

Start the next instruction while still working on the current one
  improves throughput: total amount of work done in a given time
  instruction latency (execution time, delay time, response time) is not reduced: time from the start of an instruction to its completion

[Figure: pipelined execution over Cycle 1 through Cycle 8; lw, sw, and R-type overlap in time.]
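The difference between throughput and latency can be sketched numerically. The code below assumes an ideal five-stage pipeline with no stalls, where each instruction starts one cycle after the previous one; the function names are illustrative.

```python
def pipelined_cycles(n_instructions, n_stages=5):
    """Total cycles for n instructions: fill the pipeline, then finish one per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def completion_cycle(index, n_stages=5):
    """Cycle (1-based) in which instruction number index (0-based) completes."""
    return index + n_stages


if __name__ == "__main__":
    program = ["lw", "sw", "R-type"]
    print("total cycles:", pipelined_cycles(len(program)))
    for i, op in enumerate(program):
        # Each instruction still spends 5 cycles in the pipeline (latency unchanged).
        print(op, "starts in cycle", i + 1, "and completes in cycle", completion_cycle(i))
```

Every instruction still takes five cycles from start to finish, but after the first one completes, another completes in each following cycle.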
12
Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: timing comparison of three implementations. Single Cycle Implementation: Clk with Cycle 1 and Cycle 2; Load, then Store with the remainder of its cycle marked Waste. Multiple Cycle Implementation: Clk with Cycle 1 through Cycle 10; lw, sw, and R-type execute one after another. Pipeline Implementation: lw, sw, and R-type overlap.]
13
Pipelining the MIPS ISA

What makes it easy
  all instructions are the same length (32 bits)
  few instruction formats (three) with symmetry across formats
  memory operations can occur only in loads and stores
  operands must be aligned in memory so a single data transfer requires only one memory access
What makes it hard
  structural hazards: what if we had only one memory?
  control hazards: what about branches?
  data hazards: what if an instruction's input operands depend on the output of a previous instruction?

14
MIPS Pipeline Datapath Modifications

What do we need to add/modify in our MIPS
datapath?
State registers between pipeline stages to
isolate them

[Figure: pipelined datapath. Stages IFetch, Dec, Exec, Mem, and WB are separated by the state registers IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB. Components shown: PC, Instruction Memory, two adders (one adding 4), Register File, Sign Extend (16 to 32 bits), Shift left 2, ALU, Data Memory, multiplexors, and the System Clock.]
15
MIPS Pipeline Control Path Modifications

All control signals are determined during Decode
and held in the state registers between pipeline
stages

[Figure: the pipelined datapath (stages IFetch, Dec, Exec, Mem, WB with state registers IFetch/Dec, Dec/Exec, Exec/Mem, Mem/WB) with a Control unit added in the Dec stage. Other components shown: PC, Instruction Memory, two adders, Register File, Sign Extend (16 to 32 bits), Shift left 2, ALU, Data Memory, multiplexors, and the System Clock.]
16
Graphically Representing MIPS Pipeline

Can help with answering questions like:
  how many cycles does it take to execute this code?
  what is the ALU doing during cycle 4?
  is there a hazard, why does it occur, and how can it be fixed?
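Questions like these can also be answered with a small table generator. The following is a sketch under the assumption of an ideal five-stage pipeline without stalls; the helper names and the three-instruction program are invented for the example.

```python
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]


def stage_of(instr_index, cycle):
    """Stage occupied by instruction instr_index (0-based) in cycle (1-based), or None."""
    position = cycle - 1 - instr_index
    return STAGES[position] if 0 <= position < len(STAGES) else None


def who_is_in(stage, cycle, program):
    """Instruction occupying the given stage in the given cycle, or None."""
    for i, instr in enumerate(program):
        if stage_of(i, cycle) == stage:
            return instr
    return None


def diagram(program):
    """Text pipeline diagram: one row per instruction, one column per cycle."""
    n_cycles = len(program) + len(STAGES) - 1
    rows = []
    for i, instr in enumerate(program):
        cells = [(stage_of(i, c) or "").ljust(6) for c in range(1, n_cycles + 1)]
        rows.append(instr.ljust(8) + " ".join(cells).rstrip())
    return "\n".join(rows)


if __name__ == "__main__":
    program = ["lw", "sw", "add"]
    print(diagram(program))
    # The ALU is used in the Exec stage, so this answers the cycle 4 question.
    print("ALU in cycle 4 is working on:", who_is_in("Exec", 4, program))
```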

17
Why Pipeline? For Throughput!
Once the pipeline is full, one instruction is completed every cycle
[Figure: pipeline diagram with Time (clock cycles) on the horizontal axis and Instr. Order on the vertical axis, showing Inst 0 through Inst 4.]
18
Can pipelining get us into trouble?

Yes: Pipeline Hazards
  structural hazards: attempt to use the same resource by two different instructions at the same time
  data hazards: attempt to use item before it is ready
    instruction depends on result of prior instruction still in the pipeline
  control hazards: attempt to make a decision before condition is evaluated
    branch instructions
Can always resolve hazards by waiting
  pipeline control must detect the hazard
  take action (or delay action) to resolve hazards

19
A Unified Memory Would Be a Structural Hazard
[Figure: pipeline diagram with Time (clock cycles) on the horizontal axis and Instr. Order on the vertical axis, showing lw followed by Inst 1 through Inst 4.]
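The conflict can be located by checking, cycle by cycle, when a data access and an instruction fetch would need the same memory. This sketch assumes an ideal five-stage pipeline (IFetch, Dec, Exec, Mem, WB) with one instruction entering per cycle and a single shared memory; the function name is hypothetical.

```python
def memory_conflicts(program):
    """List (cycle, data_access_instr, fetched_instr) clashes on one shared memory."""
    uses_data_memory = {"lw", "sw"}
    conflicts = []
    for i, op in enumerate(program):
        if op not in uses_data_memory:
            continue
        mem_cycle = i + 4            # instruction i (0-based) is in Mem in cycle i + 4
        fetch_index = mem_cycle - 1  # instruction being fetched in that same cycle
        if fetch_index < len(program):
            conflicts.append((mem_cycle, op, program[fetch_index]))
    return conflicts


if __name__ == "__main__":
    program = ["lw", "Inst 1", "Inst 2", "Inst 3", "Inst 4"]
    for cycle, accessor, fetched in memory_conflicts(program):
        print(f"cycle {cycle}: {accessor} accesses data while {fetched} is fetched")
```

Separate instruction and data memories remove this conflict, since the fetch and the data access then use different resources.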
20
How About Register File Access?
Can fix register file access hazard by doing reads in the second half of the cycle and writes in the first half.
[Figure: pipeline diagram with Time (clock cycles) on the horizontal axis and Instr. Order on the vertical axis, showing add, Inst 1, Inst 2, add, Inst 4.]
21
Branch Instructions Cause Control Hazards

Dependencies backward in time cause hazards

[Figure: pipeline diagram, Instr. Order: add, beq, lw, Inst 3, Inst 4.]
22
One Way to Fix a Control Hazard
Can fix branch hazard by waiting (stall), but this affects throughput
[Figure: pipeline diagram, Instr. Order: add, beq.]
23
Register Usage Can Cause Data Hazards

Dependencies backward in time cause hazards

Instr. Order:
add r1,r2,r3
sub r4,r1,r5
and r6,r1,r7
or r8,r1,r9
xor r4,r1,r5
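Which of these instructions actually read r1 too early can be checked mechanically. The sketch below assumes a five-stage pipeline with no forwarding, and a register file that writes in the first half of a cycle and reads in the second half, as described on the register file slide. Under those assumptions only consumers one or two instructions after the producer see a stale value. The tuple encoding and function name are made up for the example.

```python
# Each instruction is (opcode, destination register, source registers).
PROGRAM = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r5")),
    ("and", "r6", ("r1", "r7")),
    ("or",  "r8", ("r1", "r9")),
    ("xor", "r4", ("r1", "r5")),
]


def raw_hazards(program):
    """Return (producer_index, consumer_index, register) for each read-after-write hazard."""
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        for distance in (1, 2):  # distance 3 is safe: write, then read, in the same cycle
            j = i + distance
            if j < len(program) and dest in program[j][2]:
                hazards.append((i, j, dest))
    return hazards


if __name__ == "__main__":
    for producer, consumer, reg in raw_hazards(PROGRAM):
        print(f"{PROGRAM[consumer][0]} reads {reg} before {PROGRAM[producer][0]} writes it")
```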
24
One Way to Fix a Data Hazard
Can fix data hazard by waiting (stall), but this affects throughput
Instr. Order:
add r1,r2,r3
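The throughput cost of stalling can be estimated by delaying each dependent instruction until its operand has been written. This is a simplified sketch, not the pipeline's actual control logic: it assumes no forwarding, a register file that can be written and then read in the same cycle, and the five-instruction sequence from the data hazard slide; the names are illustrative.

```python
# Each instruction is (opcode, destination register, source registers).
PROGRAM = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r5")),
    ("and", "r6", ("r1", "r7")),
    ("or",  "r8", ("r1", "r9")),
    ("xor", "r4", ("r1", "r5")),
]


def schedule_with_stalls(program, safe_distance=3):
    """Start cycle of each instruction when hazards are resolved only by waiting."""
    start = []
    for j, (_, _, sources) in enumerate(program):
        cycle = 1 if j == 0 else start[j - 1] + 1
        for i in range(j):
            if program[i][1] in sources:
                # A consumer must start at least safe_distance cycles after its producer.
                cycle = max(cycle, start[i] + safe_distance)
        start.append(cycle)
    return start


if __name__ == "__main__":
    starts = schedule_with_stalls(PROGRAM)
    print("start cycles:", starts)
    print("total cycles:", starts[-1] + 4, "instead of", len(PROGRAM) + 4)
```

Under these assumptions sub waits two extra cycles for r1, and every later instruction is pushed back by the same amount.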
25
Loads Can Cause Data Hazards

Dependencies backward in time cause hazards

Instr. Order:
lw r1,100(r2)
sub r4,r1,r5
and r6,r1,r7
or r8,r1,r9
xor r4,r1,r5
26
Stores Can Cause Data Hazards

Dependencies backward in time cause hazards

Instr. Order:
add r1,r2,r3
sw r1,100(r5)
and r6,r1,r7
or r8,r1,r9
xor r4,r1,r5
27
Other Pipeline Structures Are Possible

What about a (slow) multiply operation?
  let it take two cycles
What if the data memory access is twice as slow as the instruction memory?
  make the clock twice as slow, or
  let data memory access take two cycles (and keep the same clock rate)

[Figure: pipeline stage diagrams; labeled stages include MUL, Reg, DM1, and DM2.]
28
Sample Pipeline Alternatives

ARM7
  PC update, IM access
  decode, reg access
  ALU op, DM access, shift/rotate, commit result (write back)
StrongARM-1
XScale (stage labels shown: IM1, IM2, Reg, SHFT, DM1, Reg/DM2)
  PC update, BTB access, start IM access
  IM access
  decode, reg 1 access
  shift/rotate, reg 2 access
  ALU op
  start DM access, exception
  DM write, reg write
29
Summary

All modern day processors use pipelining
Pipelining doesn't help latency of a single task, it helps throughput of the entire workload
Multiple tasks operating simultaneously using different resources
Potential speedup = Number of pipe stages
Pipeline rate limited by slowest pipeline stage
  Unbalanced lengths of pipe stages reduce speedup
  Time to fill pipeline and time to drain it reduce speedup
Must detect and resolve hazards
  Stalling negatively affects throughput
To learn (much) more take CSE 431
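The speedup claims in the summary can be checked with a short calculation. This is a sketch under stated assumptions: no stalls, an unpipelined baseline whose time per instruction is the sum of the stage latencies, and a pipelined clock set by the slowest stage. The stage latencies are invented for the example.

```python
def speedup(n_instructions, stage_ps):
    """Speedup of an ideal pipeline over unpipelined execution of the same stages."""
    n_stages = len(stage_ps)
    unpipelined = n_instructions * sum(stage_ps)
    # The clock is limited by the slowest stage; n_stages - 1 extra cycles fill the pipe.
    pipelined = (n_stages + n_instructions - 1) * max(stage_ps)
    return unpipelined / pipelined


if __name__ == "__main__":
    balanced = [200, 200, 200, 200, 200]      # assumed latencies in ps
    unbalanced = [200, 100, 200, 200, 100]    # assumed latencies in ps
    print("balanced, 1000 instructions:", round(speedup(1000, balanced), 2))
    print("balanced, 10 instructions:", round(speedup(10, balanced), 2))
    print("unbalanced, 1000 instructions:", round(speedup(1000, unbalanced), 2))
```

With balanced stages and a long instruction stream the speedup approaches the number of stages; short streams and unbalanced stages both pull it below that limit.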