14:332:331 Computer Architecture and Assembly Language Fall 2003 Lecture 18 Introduction to Pipelined Datapath - PowerPoint PPT Presentation

Provided by: jani177

1
14:332:331 Computer Architecture and Assembly Language
Fall 2003
Lecture 18: Introduction to Pipelined Datapath
  • Adapted from Dave Patterson's UCB CS152 slides
    and Mary Jane Irwin's PSU CSE331 slides

2
Heads Up
  • This week's material
      • Introduction to pipelining
      • Reading assignment: PH 6.1
  • Reminders
      • HW6 deadline???
  • Next week's material
      • I/O, exceptions, and interrupts
      • Reading assignment: PH 5.6, 8.5, and A.7 through
        A.8

3
Review: Multicycle Data and Control Path
[Figure: multicycle datapath driven by the Control FSM. Control signals: PCWriteCond, PCWrite, PCSource, ALUOp, IorD, MemRead, MemWrite, MemtoReg, IRWrite, ALUSrcA, ALUSrcB, RegWrite, RegDst. Datapath elements: PC, Memory, IR, Register File, A, B, ALU (with zero output), ALUout, ALU control, Sign Extend, Shift left 2.]
4
Review: RTL Summary
Step: Instr fetch (all instructions)
  • IR = Memory[PC]; PC = PC + 4
Step: Decode (all instructions)
  • A = Reg[IR[25-21]]; B = Reg[IR[20-16]];
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step: Execute
  • R-type: ALUOut = A op B
  • Mem Ref: ALUOut = A + sign-extend(IR[15-0])
  • Branch: if (A == B) PC = ALUOut
  • Jump: PC = PC[31-28] || (IR[25-0] << 2)
Step: Memory access
  • R-type: Reg[IR[15-11]] = ALUOut
  • Mem Ref: MDR = Memory[ALUOut] or Memory[ALUOut] = B
Step: Write-back
  • Mem Ref: Reg[IR[20-16]] = MDR
5
Review: Multicycle Datapath FSM
Unless otherwise assigned: PCWrite, IRWrite, MemWrite, RegWrite = 0; others = X
  • State 0 (Instr Fetch, entered from Start): IorD=0, MemRead,
    IRWrite, ALUSrcA=0, ALUSrcB=01, PCSource,ALUOp=00, PCWrite
  • State 1 (Decode): ALUSrcA=0, ALUSrcB=11, ALUOp=00,
    PCWriteCond=0
  • State 2 (Execute, Op = lw or sw): ALUSrcA=1, ALUSrcB=10,
    ALUOp=00, PCWriteCond=0
  • State 3 (Memory Access, Op = lw): MemRead, IorD=1,
    PCWriteCond=0
  • State 4 (Write Back): RegDst=0, RegWrite, MemtoReg=1,
    PCWriteCond=0
  • State 5 (Memory Access, Op = sw): MemWrite, IorD=1,
    PCWriteCond=0
  • State 6 (Execute, Op = R-type): ALUSrcA=1, ALUSrcB=00,
    ALUOp=10, PCWriteCond=0
  • State 7 (Memory Access): RegDst=1, RegWrite, MemtoReg=0,
    PCWriteCond=0
  • State 8 (Execute, Op = beq): ALUSrcA=1, ALUSrcB=00,
    ALUOp=01, PCSource=01, PCWriteCond
  • State 9 (Execute, Op = j): PCSource=10, PCWrite
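
The FSM transitions on this slide can be sketched as a small next-state function. This is a minimal Python sketch, assuming the usual state numbering for this FSM (0 = fetch, 1 = decode, 2 = memory address, 3 = lw read, 4 = lw write back, 5 = sw write, 6/7 = R-type, 8 = beq, 9 = j). The names `DECODE_TARGETS`, `next_state`, and `trace`, and the opcode strings, are hypothetical and chosen only for illustration.

```python
# Sketch of the multicycle control FSM's next-state logic.
# State numbers follow the slide's FSM diagram; opcode strings are illustrative.
DECODE_TARGETS = {"lw": 2, "sw": 2, "R-type": 6, "beq": 8, "j": 9}

def next_state(state, op):
    """Return the state that follows `state` for an instruction of class `op`."""
    if state == 0:            # Instr Fetch always goes to Decode
        return 1
    if state == 1:            # Decode dispatches on the opcode
        return DECODE_TARGETS[op]
    if state == 2:            # address computed: lw reads memory, sw writes it
        return 3 if op == "lw" else 5
    if state == 3:            # lw memory read is followed by write back
        return 4
    if state == 6:            # R-type execute is followed by the register write
        return 7
    return 0                  # states 4, 5, 7, 8, 9 finish the instruction

def trace(op):
    """List the states one instruction of class `op` passes through."""
    states = [0]
    while True:
        nxt = next_state(states[-1], op)
        if nxt == 0:
            return states
        states.append(nxt)
```

The length of each trace is the number of clock cycles that instruction class takes in the multicycle design: five for lw, four for sw and R-type, three for beq and j.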
6
Review: FSM Implementation
[Figure: FSM implemented as combinational control logic plus a State Reg clocked by the System Clock. Inputs: Op0 through Op5 (Inst[31-26]) and the current state. Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSourceB, ALUSourceA, RegWrite, RegDst, and Next State.]
7
Single Cycle Disadvantages & Advantages
  • Uses the clock cycle inefficiently: the clock
    cycle must be timed to accommodate the slowest
    instruction
  • Is wasteful of area, since some functional units
    (e.g., adders) must be duplicated because they
    cannot be shared during a clock cycle
  • but
  • Is simple and easy to understand

[Figure: Single Cycle Implementation timing against Clk. lw in Cycle 1, sw in Cycle 2, followed by a segment marked Waste.]
8
Multicycle Advantages & Disadvantages
  • Uses the clock cycle efficiently: the clock
    cycle is timed to accommodate the slowest
    instruction step
      • balance the amount of work to be done in each
        step
      • restrict each step to use only one major
        functional unit
  • Multicycle implementations allow
      • functional units to be used more than once per
        instruction as long as they are used on different
        clock cycles
      • faster clock rates
      • different instructions to take a different number
        of clock cycles
  • but
  • Requires additional internal state registers,
    muxes, and more complicated (FSM) control

9
The Five Stages of the Load Instruction
[Figure: lw occupying Cycle 1 through Cycle 5, one stage per cycle.]
  • IFetch: Instruction Fetch and Update PC
  • Dec: Registers Fetch and Instruction Decode
  • Exec: Execute R-type; calculate memory address
  • Mem: Read/write the data from/to the Data Memory
  • WB: Write the data back to the register file

10
Single Cycle vs. Multiple Cycle Timing
[Figure: timing comparison against Clk. Single Cycle Implementation: lw in Cycle 1, sw in Cycle 2, followed by a segment marked Waste. Multiple Cycle Implementation: lw, sw, and R-type executed one after another across Cycle 1 through Cycle 10.]
11
Pipelined MIPS Processor
  • Start the next instruction while still working on
    the current one
      • improves throughput: total amount of work done
        in a given time
      • instruction latency (execution time, delay time,
        response time), the time from the start of an
        instruction to its completion, is not reduced

[Figure: lw, sw, and R-type overlapped in the pipeline across Cycle 1 through Cycle 8.]
12
Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: timing comparison against Clk. Single Cycle Implementation: Load in Cycle 1, Store in Cycle 2, followed by a segment marked Waste. Multiple Cycle Implementation: lw, sw, and R-type executed one after another across Cycle 1 through Cycle 10. Pipeline Implementation: lw, sw, and R-type overlapped.]
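
The comparison in this figure can be approximated with a few lines of arithmetic. The following is a minimal Python sketch, not taken from the slides. It assumes every step takes the same time (`step_time`), that the single-cycle clock is as long as all five steps of lw, that the multicycle step counts are the ones from the RTL summary, and that the pipeline sees no hazards. All function and constant names are hypothetical.

```python
STAGES = 5  # IFetch, Dec, Exec, Mem, WB

# Steps per instruction class in the multicycle design (from the RTL summary).
MULTICYCLE_STEPS = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3, "j": 3}

def single_cycle_time(program, step_time=1):
    """Every instruction takes one long cycle sized for the slowest one (lw)."""
    return len(program) * STAGES * step_time

def multicycle_time(program, step_time=1):
    """Each instruction takes only as many short cycles as it needs."""
    return sum(MULTICYCLE_STEPS[instr] for instr in program) * step_time

def pipelined_time(program, step_time=1):
    """First instruction takes STAGES cycles; each later one finishes a cycle after."""
    if not program:
        return 0
    return (STAGES + len(program) - 1) * step_time

program = ["lw", "sw", "R-type"]
times = (single_cycle_time(program), multicycle_time(program), pipelined_time(program))
```

For the three-instruction sequence in the figure, `times` holds the single-cycle, multicycle, and pipelined totals in units of one step.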
13
Pipelining the MIPS ISA
  • What makes it easy
      • all instructions are the same length (32 bits)
      • few instruction formats (three) with symmetry
        across formats
      • memory operations can occur only in loads and
        stores
      • operands must be aligned in memory so a single
        data transfer requires only one memory access
  • What makes it hard
      • structural hazards: what if we had only one
        memory?
      • control hazards: what about branches?
      • data hazards: what if an instruction's input
        operands depend on the output of a previous
        instruction?

14
MIPS Pipeline Datapath Modifications
  • What do we need to add/modify in our MIPS
    datapath?
      • State registers between pipeline stages to
        isolate them

[Figure: pipelined datapath divided into IFetch, Dec, Exec, Mem, and WB stages, separated by the IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB state registers and driven by the System Clock. Elements: PC, Instruction Memory, two adders (one adding 4), Shift left 2, Register File, Sign Extend (16 to 32 bits), ALU, Data Memory.]
15
MIPS Pipeline Control Path Modifications
  • All control signals are determined during Decode
      • and held in the state registers between pipeline
        stages

[Figure: the pipelined datapath (IFetch, Dec, Exec, Mem, WB stages with the IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB state registers) with a Control unit added in the Dec stage.]
16
Graphically Representing MIPS Pipeline
  • Can help with answering questions like
      • how many cycles does it take to execute this
        code?
      • what is the ALU doing during cycle 4?
      • is there a hazard, why does it occur, and how can
        it be fixed?
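
The first two questions can also be answered programmatically from the same stage-per-cycle picture. This is a minimal Python sketch, assuming one instruction enters the pipeline per cycle and nothing stalls; the helper names `stage_at`, `in_stage`, and `diagram` are hypothetical.

```python
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]

def stage_at(index, cycle):
    """Stage occupied in `cycle` (1-based) by the instruction at `index` (0-based)."""
    pos = cycle - 1 - index
    return STAGES[pos] if 0 <= pos < len(STAGES) else None

def in_stage(program, stage, cycle):
    """Return the instruction that occupies `stage` during `cycle`, if any."""
    for i, instr in enumerate(program):
        if stage_at(i, cycle) == stage:
            return instr
    return None

def diagram(program):
    """One text row per instruction, one column per clock cycle."""
    n_cycles = len(program) + len(STAGES) - 1
    rows = []
    for i, instr in enumerate(program):
        cells = [(stage_at(i, c) or "").ljust(6) for c in range(1, n_cycles + 1)]
        rows.append(f"{instr:<8}" + " ".join(cells).rstrip())
    return rows

program = ["lw", "sw", "add"]
for row in diagram(program):
    print(row)
```

Here `in_stage(program, "Exec", 4)` identifies which instruction is using the ALU during cycle 4, and the number of columns in the diagram is the cycle count for the code.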

17
Why Pipeline? For Throughput!
Once the pipeline is full, one instruction is
completed every cycle
[Figure: pipeline diagram, instruction order (Inst 0 through Inst 4) versus time (clock cycles).]
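
The throughput claim can be checked with a quick calculation. The sketch below is Python under stated assumptions: a k-stage pipeline with no stalls, compared against an unpipelined machine that spends k cycles of the same length on every instruction. The function names are hypothetical.

```python
def pipelined_cycles(n, k=5):
    """k cycles to finish the first instruction, then one more per instruction."""
    return k + n - 1 if n > 0 else 0

def unpipelined_cycles(n, k=5):
    """Each instruction runs all k steps before the next one starts."""
    return n * k

def speedup(n, k=5):
    """Ratio of unpipelined to pipelined cycle counts for n instructions."""
    return unpipelined_cycles(n, k) / pipelined_cycles(n, k)
```

For a single instruction the ratio is 1, and it approaches the number of stages as the instruction count grows, because the fill time is amortized over more work.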
18
Can pipelining get us into trouble?
  • Yes: Pipeline Hazards
      • structural hazards: attempt to use the same
        resource by two different instructions at the
        same time
      • data hazards: attempt to use item before it is
        ready
          • instruction depends on result of prior
            instruction still in the pipeline
      • control hazards: attempt to make a decision
        before condition is evaluated
          • branch instructions
  • Can always resolve hazards by waiting
      • pipeline control must detect the hazard
      • take action (or delay action) to resolve hazards

19
A Unified Memory Would Be a Structural Hazard
[Figure: pipeline diagram, instruction order (lw, Inst 1, Inst 2, Inst 3, Inst 4) versus time (clock cycles).]
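
To see where the conflict arises, the sketch below lists the cycles in which a load or store would access memory for data while a later instruction is being fetched. It is a minimal Python illustration, assuming a five-stage pipeline with one instruction issued per cycle and a single memory shared by the IFetch and Mem stages; `memory_conflicts` is a hypothetical helper name.

```python
def memory_conflicts(program):
    """Return (cycle, data_access_instr, fetched_instr) for each shared-memory clash."""
    conflicts = []
    for i, instr in enumerate(program):
        if instr.split()[0] not in ("lw", "sw"):
            continue              # only loads and stores use memory in the Mem stage
        cycle = i + 4             # instruction i is fetched in cycle i+1, so Mem is cycle i+4
        fetching = cycle - 1      # index of the instruction fetched in that same cycle
        if fetching < len(program):
            conflicts.append((cycle, instr, program[fetching]))
    return conflicts

conflicts = memory_conflicts(["lw", "Inst 1", "Inst 2", "Inst 3", "Inst 4"])
```

With the sequence from the figure, the load's data access lands in the same cycle as the fetch of the instruction three positions behind it.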
20
How About Register File Access?
Can fix register file access hazard by doing
reads in the second half of the cycle and writes
in the first half.
[Figure: pipeline diagram, instruction order (add, Inst 1, Inst 2, add, Inst 4) versus time (clock cycles).]
21
Branch Instructions Cause Control Hazards
  • Dependencies backward in time cause hazards

[Figure: pipeline diagram, instruction order (add, beq, lw, Inst 3, Inst 4) versus time.]
22
One Way to Fix a Control Hazard
Can fix branch hazard by waiting (stall), but
this affects throughput
[Figure: pipeline diagram, instruction order (add, beq) versus time.]
23
Register Usage Can Cause Data Hazards
  • Dependencies backward in time cause hazards

[Figure: pipeline diagram, instruction order versus time, for the following instructions.]
add r1,r2,r3
sub r4,r1,r5
and r6,r1,r7
or r8,r1,r9
xor r4,r1,r5
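
The dependences in this sequence can be found mechanically. The following Python sketch assumes a five-stage pipeline with no forwarding, registers written in WB and read in Dec, and the split-cycle register file described earlier (write in the first half, read in the second), so only the two instructions right after a producer can read a stale value. It handles only the three-register format used on this slide; `parse` and `raw_hazards` are hypothetical names.

```python
def parse(instr):
    """Split 'op rd,rs,rt' into (op, destination, [sources])."""
    op, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return op, regs[0], regs[1:]

def raw_hazards(program, window=2):
    """Return (producer, consumer, distance) for reads that happen too early."""
    hazards = []
    for i, producer in enumerate(program):
        _, dest, _ = parse(producer)
        for d in range(1, window + 1):
            if i + d >= len(program):
                break
            consumer = program[i + d]
            _, consumer_dest, sources = parse(consumer)
            if dest in sources:
                hazards.append((producer, consumer, d))
            if consumer_dest == dest:
                break             # a later write supersedes this producer
    return hazards

program = ["add r1,r2,r3", "sub r4,r1,r5", "and r6,r1,r7",
           "or r8,r1,r9", "xor r4,r1,r5"]
hazards = raw_hazards(program)
```

Under these assumptions the sub and the and are flagged, while the or (three instructions after the add) reads r1 in the same cycle the add writes it and is safe.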
24
One Way to Fix a Data Hazard
Can fix data hazard by waiting (stall), but
this affects throughput
[Figure: pipeline diagram, instruction order versus time, starting with add r1,r2,r3.]
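
The cost of waiting can be estimated by scheduling each instruction's Dec stage no earlier than the cycle in which its source registers are written. This is a minimal Python sketch, assuming no forwarding, a write in WB (three cycles after Dec), and a register file that writes in the first half of a cycle and reads in the second. Instructions are given as hypothetical `(dest, sources)` tuples, here for the add/sub/and/or/xor sequence from the data hazard slide.

```python
def schedule(program):
    """Return the Dec-stage cycle of each (dest, sources) instruction, with stalls."""
    ready = {}        # register -> first cycle in which Dec can read the new value
    dec_cycles = []
    prev_dec = 1      # so the first instruction decodes in cycle 2
    for dest, sources in program:
        dec = prev_dec + 1
        for src in sources:
            dec = max(dec, ready.get(src, 0))   # wait for pending writes
        dec_cycles.append(dec)
        ready[dest] = dec + 3                   # WB is three cycles after Dec
        prev_dec = dec
    return dec_cycles

def stall_cycles(program):
    """Extra cycles compared with a stall-free pipeline."""
    dec = schedule(program)
    return dec[-1] - (len(program) + 1) if dec else 0

program = [("r1", ("r2", "r3")),   # add r1,r2,r3
           ("r4", ("r1", "r5")),   # sub r4,r1,r5
           ("r6", ("r1", "r7")),   # and r6,r1,r7
           ("r8", ("r1", "r9")),   # or  r8,r1,r9
           ("r4", ("r1", "r5"))]   # xor r4,r1,r5
```

In this model the sub waits two cycles for r1, and every instruction behind it is delayed by the same amount, which is the throughput loss the slide refers to.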
25
Loads Can Cause Data Hazards
  • Dependencies backward in time cause hazards

[Figure: pipeline diagram, instruction order versus time, for the following instructions.]
lw r1,100(r2)
sub r4,r1,r5
and r6,r1,r7
or r8,r1,r9
xor r4,r1,r5
26
Stores Can Cause Data Hazards
  • Dependencies backward in time cause hazards

[Figure: pipeline diagram, instruction order versus time, for the following instructions.]
add r1,r2,r3
sw r1,100(r5)
and r6,r1,r7
or r8,r1,r9
xor r4,r1,r5
27
Other Pipeline Structures Are Possible
  • What about (slow) multiply operation?
      • let it take two cycles
  • What if the data memory access is twice as slow
    as the instruction memory?
      • make the clock twice as slow or
      • let data memory access take two cycles (and keep
        the same clock rate)

[Figure: alternative pipeline structures; recovered stage labels: MUL, Reg, DM1, DM2.]
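
The two options for a slow data memory can be compared with the same fill-plus-one-per-instruction arithmetic. This is a minimal Python sketch, assuming no hazards in either design (the extra memory stage would in practice lengthen some dependences, which is not modeled here); the function names are hypothetical.

```python
def total_time(n_instructions, n_stages, clock_period):
    """Time to run n instructions through a stall-free pipeline."""
    if n_instructions == 0:
        return 0
    return (n_stages + n_instructions - 1) * clock_period

def compare(n_instructions, base_period=1):
    """Return (slow-clock time, split-memory time) for the two options."""
    slow_clock = total_time(n_instructions, 5, 2 * base_period)   # five stages, clock twice as slow
    split_memory = total_time(n_instructions, 6, base_period)     # DM1 and DM2 stages, same clock
    return slow_clock, split_memory
```

For a long instruction stream the split-memory pipeline takes about half the time of the slow-clock one, at the cost of one extra cycle of fill.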
28
Sample Pipeline Alternatives
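  • ARM7
  • StrongARM-1
  • XScale

[Figure: pipeline stage diagrams for ARM7, StrongARM-1, and XScale. ARM7 stages: PC update, IM access; decode, reg access; ALU op, DM access, shift/rotate, commit result (write back). Other recovered stage labels: IM1, IM2, Reg, SHFT, DM1, Reg DM2. Other recovered stage descriptions: PC update, BTB access, start IM access; IM access; decode, reg 1 access; shift/rotate, reg 2 access; ALU op; start DM access, exception; DM write, reg write.]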
29
Summary
  • All modern day processors use pipelining
  • Pipelining doesn't help latency of a single task;
    it helps throughput of the entire workload
  • Multiple tasks operating simultaneously using
    different resources
  • Potential speedup = Number of pipe stages
  • Pipeline rate limited by slowest pipeline stage
  • Unbalanced lengths of pipe stages reduce speedup
  • Time to fill pipeline and time to drain it
    reduce speedup
  • Must detect and resolve hazards
  • Stalling negatively affects throughput
  • To learn (much) more take CSE 431
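
The effect of unbalanced stages on speedup can be illustrated numerically. The sketch below is Python with made-up stage latencies (the picosecond values are hypothetical, not from the lecture). It assumes the unpipelined design needs the sum of all stage latencies per instruction, the pipelined clock period equals the slowest stage, and the pipeline is already full.

```python
def steady_state_speedup(stage_latencies):
    """Unpipelined time per instruction divided by the pipelined clock period."""
    return sum(stage_latencies) / max(stage_latencies)

unbalanced = [200, 100, 200, 200, 100]   # hypothetical latencies in ps
balanced = [160, 160, 160, 160, 160]     # same total work, evenly split

unbalanced_speedup = steady_state_speedup(unbalanced)
balanced_speedup = steady_state_speedup(balanced)
```

With evenly split stages the speedup equals the number of stages; with the unbalanced latencies it falls short, because every stage has to wait for the slowest one.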