CpE 242 Computer Architecture and Engineering: Designing a Multiple Cycle Processor

Slides: 39
Provided by: csee1
1
CpE 242: Computer Architecture and Engineering
Designing a Multiple Cycle Processor
2
A Single Cycle Processor
[Figure: single cycle datapath. Main Control decodes op = Instr<31:26> and drives RegDst, ALUSrc, MemtoReg, RegWr, MemWr, Branch, Jump, ExtOp, and ALUop. ALU Control combines ALUop with func = Instr<5:0> to produce the 3-bit ALUctr. The datapath contains the Instruction Fetch Unit, a register file of 32 32-bit registers (Ra, Rb, Rw; busA, busB, busW), the Extender for imm16 = Instr<15:0>, the ALU with its Zero output, and the Data Memory (WrEn, Adr, Data In).]
3
Instruction Fetch Unit
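[Figure: Instruction Fetch Unit. A 30-bit PC supplies Addr<31:2> to the Instruction Memory, with Addr<1:0> tied to 00. The next PC is chosen among the incremented PC, the branch address formed by adding SignExt(imm16) from Instruction<15:0> (selected by Branch and Zero), and the jump Target formed from PC<31:28> and Instruction<25:0> (selected by Jump). The unit outputs Instruction<31:0>.]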
4
The Main Control
[Figure: Main Control outputs: RegWrite, ALUSrc, RegDst, MemtoReg, MemWrite, Branch, Jump, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>.]
5
Outline of Today's Lecture
  • Recap and Introduction (5 minutes)
  • Introduction to the Concept of Multiple Cycle
    Processor (15 minutes)
  • Multiple Cycle Implementation of R-type
    Instructions (15 minutes)
  • What is a Multiple Cycle Delay Path and Why is it
    Bad? (10 minutes)
  • Multiple Cycle Implementation of Or Immediate (5
    minutes)
  • Multiple Cycle Implementation of Load and Store
    (15 minutes)
  • Putting it all Together (5 minutes)

6
Drawbacks of this Single Cycle Processor
  • Long cycle time
    • Cycle time must be long enough for the load
      instruction, which is the sum of:
      • PC's Clock-to-Q
      • Instruction Memory Access Time
      • Register File Access Time
      • ALU Delay (address calculation)
      • Data Memory Access Time
      • Register File Setup Time
      • Clock Skew
  • Cycle time is much longer than needed for all
    other instructions. Examples:
    • R-type instructions do not require data memory
      access
    • Jump requires neither an ALU operation nor a data
      memory access
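
To make the cycle-time argument concrete, here is a minimal sketch that adds up the load path. The delay values are hypothetical placeholders chosen for illustration; the lecture gives no numbers.

```python
# Hypothetical component delays in nanoseconds (NOT from the lecture).
LOAD_PATH_NS = {
    "PC clock-to-Q": 1,
    "instruction memory access": 10,
    "register file access": 5,
    "ALU (address calculation)": 8,
    "data memory access": 10,
    "register file setup": 1,
    "clock skew": 1,
}


def path_delay(path, skip=()):
    """Sum the delays along a path, leaving out any named stages."""
    return sum(ns for stage, ns in path.items() if stage not in skip)


load_ns = path_delay(LOAD_PATH_NS)
# An R-type instruction follows the same path but never touches data memory.
rtype_ns = path_delay(LOAD_PATH_NS, skip=("data memory access",))

# The single cycle must fit the slowest instruction, i.e. the load.
cycle_time_ns = load_ns
wasted_on_rtype_ns = cycle_time_ns - rtype_ns
print(cycle_time_ns, rtype_ns, wasted_on_rtype_ns)
```

With these assumed numbers every R-type instruction idles for the data memory access time, which is the waste the multiple cycle design targets.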

7
Overview of a Multiple Cycle Implementation
  • The root of the single cycle processor's
    problems:
    • The cycle time has to be long enough for the
      slowest instruction
  • Solution:
    • Break the instruction into smaller steps
    • Execute each step (instead of the entire
      instruction) in one cycle
      • Cycle time = time it takes to execute the longest
        step
      • Keep all the steps at a similar length
  • This is the essence of the multiple cycle
    processor

8
Overview of a Multiple Cycle Implementation
  • The advantages of the multiple cycle processor:
    • Cycle time is much shorter
    • Different instructions take different numbers of
      cycles to complete
      • Load takes five cycles
      • Jump takes only three cycles
    • Allows a functional unit to be used more than
      once per instruction
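
Because instructions now take different numbers of cycles, the average cycles per instruction (CPI) depends on the instruction mix. The sketch below uses the cycle counts from this lecture (load 5, jump 3, and the per-instruction walkthroughs in later slides: R-type 4, ori 4, store 4, branch 3). The instruction mix is a hypothetical assumption for illustration only.

```python
# Cycles per instruction class in the multiple cycle design.
CYCLES = {"load": 5, "store": 4, "rtype": 4, "ori": 4, "branch": 3, "jump": 3}

# Hypothetical dynamic instruction mix (fractions sum to 1); not from the lecture.
MIX = {"load": 0.25, "store": 0.10, "rtype": 0.40,
       "ori": 0.05, "branch": 0.15, "jump": 0.05}


def average_cpi(cycles, mix):
    """Weighted average of cycle counts over the instruction mix."""
    return sum(cycles[name] * fraction for name, fraction in mix.items())


cpi = average_cpi(CYCLES, MIX)
print(round(cpi, 2))
```

Average time per instruction is then CPI times the (shorter) cycle time, so the design wins only if the cycle time shrinks by more than the CPI grows.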

9
The Five Steps of a Load Instruction
[Timing diagram: the load instruction split into five steps: Instruction Fetch, Instr Decode / Reg. Fetch, Address, Data Memory, and Reg Wr. Each signal moves from Old Value to New Value in turn: PC after Clk-to-Q; Rs, Rt, Rd, Op, Func after the Instruction Memory Access Time; ALUctr, ExtOp, ALUSrc, and RegWr after the Delay through Control Logic; busA after the Register File Access Time; busB after the Delay through Extender and Mux; Address after the ALU Delay; busW after the Data Memory Access Time; followed by the Register File Write Time.]
10
Register File & Memory Write Timing: Ideal vs.
Reality
  • In previous lectures, register file and memory
    are simplified:
    • Write happens at the clock tick
    • Address, data, and write enable must be stable
      one set-up time before the clock tick
  • In real life:
    • Neither the register file nor the memory has a
      clock input
    • The write path is a combinational logic delay
      path:
      • Write enable goes to 1 and Din settles down
      • Memory write access delay
      • Din is written into mem[address]
    • Important: Address and Data must be stable BEFORE
      Write Enable goes to 1

11
Race Condition Between Address and Write Enable
  • This real (no clock input) register file may
    not work reliably in the single cycle processor
    because:
    • We cannot guarantee Rw will be stable BEFORE
      RegWr = 1
    • There is a race between Rw (address) and RegWr
      (write enable)
  • The real (no clock input) memory may not
    work reliably in the single cycle processor
    because:
    • We cannot guarantee Address will be stable BEFORE
      WrEn = 1
    • There is a race between Adr and WrEn

12
How to Avoid this Race Condition?
  • Solution for the multiple cycle implementation:
    • Make sure Address is stable by the end of Cycle N
    • Assert Write Enable signal ONE cycle later, at
      Cycle (N + 1)
    • Address cannot change until Write Enable is
      deasserted
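
The difference between the racy and the safe ordering can be sketched with a toy model. LevelSensitiveMemory and its observe method are hypothetical names, and the model assumes the simplest possible behavior: while the write enable is 1, whatever sits on the address lines is written.

```python
class LevelSensitiveMemory:
    """Toy memory with no clock input (a simplifying assumption):
    while wr_en is 1, the current address is written with din."""

    def __init__(self):
        self.cells = {}

    def observe(self, adr, din, wr_en):
        if wr_en:
            self.cells[adr] = din


def run(schedule):
    """Apply a sequence of (adr, din, wr_en) samples and return the contents."""
    mem = LevelSensitiveMemory()
    for adr, din, wr_en in schedule:
        mem.observe(adr, din, wr_en)
    return mem.cells


OLD, NEW = 0x10, 0x20  # hypothetical stale and intended addresses

# Racy: write enable rises while the address lines still carry the old value.
racy = run([(OLD, 7, 1), (NEW, 7, 1), (NEW, 7, 0)])

# Safe: address settles in cycle N, write enable is asserted in cycle N + 1,
# and the address is held until write enable is deasserted.
safe = run([(NEW, 7, 0), (NEW, 7, 1), (NEW, 7, 0)])

print(racy, safe)
```

In the racy schedule the stale location is corrupted as well; in the safe schedule only the intended location is written.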

13
Dual-Port Ideal Memory
  • Dual Port Ideal Memory
    • Independent Read (RAdr, Dout) and Write (WrAdr,
      Din) ports
    • Read and write (to different locations) can occur
      in the same cycle
  • Read Port is a combinational path:
    • Read Address Valid -->
    • Memory Read Access Delay -->
    • Data Out Valid
  • Write Port is also a combinational path:
    • MemWrite = 1 -->
    • Memory Write Access Delay -->
    • Data In is written into location[WrAdr]
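
A minimal behavioral sketch of such a memory follows. The class name and method signatures are assumptions made for illustration; access delays are not modeled, and unwritten locations are assumed to read as 0.

```python
class DualPortIdealMemory:
    """Behavioral model with independent read and write ports.
    Timing (access delays) is deliberately not modeled."""

    def __init__(self):
        self.words = {}

    def read(self, radr):
        """Read port: combinational, Dout follows RAdr."""
        return self.words.get(radr, 0)  # assumption: unwritten words read as 0

    def write(self, wradr, din, mem_write):
        """Write port: Din is stored at WrAdr only while MemWrite = 1."""
        if mem_write:
            self.words[wradr] = din


mem = DualPortIdealMemory()
mem.write(0x40, 99, mem_write=1)
# A read of one location and a write to another can share a cycle.
dout = mem.read(0x40)
mem.write(0x44, 5, mem_write=0)  # MemWrite = 0: nothing is stored
print(dout, mem.read(0x44))
```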

14
Instruction Fetch Cycle: In the Beginning
  • Every cycle begins right AFTER the clock tick
  • mem[PC]; PC<31:0> + 4

[Figure: fetch datapath at the start of one logic clock cycle ("You are here!"). PC drives RAdr of the Ideal Memory, whose Dout feeds the Instruction Reg, while the ALU adds 4 to the PC. PCWr, MemWr, IRWr, and ALUop are still to be determined.]
15
Instruction Fetch Cycle: The End
  • Every cycle ends AT the next clock tick (storage
    element updates)
  • IR <-- mem[PC]; PC<31:0> <-- PC<31:0> + 4

[Figure: fetch datapath at the end of one logic clock cycle ("You are here!"). Control settings: PCWr=1, MemWr=0, IRWr=1, ALUOp=Add.]
16
Instruction Fetch Cycle, Overall Picture: Cycle 1
[Datapath figure. Control settings: PCWr=1, PCWrCond=x, PCSrc=0, BrWr=0, IorD=0, MemWr=0, IRWr=1, ALUSelA=0, ALUSelB=00, ALUOp=Add.]
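
The fetch cycle can be sketched as a function from the current PC and IR to the values latched at the closing clock tick. The function name and the dictionary-based memory are hypothetical; the reading of ALUSelA=0 as "PC" and ALUSelB=00 as "the constant 4" is inferred from the control settings on this slide.

```python
MASK32 = 0xFFFFFFFF


def instruction_fetch(pc, ir, memory, pc_wr=1, ir_wr=1):
    """Return (new_pc, new_ir) as latched at the clock tick ending cycle 1."""
    dout = memory.get(pc, 0)          # IorD=0: RAdr = PC, combinational read
    alu_out = (pc + 4) & MASK32       # ALUSelA=0, ALUSelB=00, ALUOp=Add
    new_ir = dout if ir_wr else ir    # IRWr=1: IR <-- mem[PC]
    new_pc = alu_out if pc_wr else pc # PCWr=1, PCSrc=0: PC <-- PC + 4
    return new_pc, new_ir


# Hypothetical memory image: one instruction word at address 0.
program = {0: 0x8C220008}
print(instruction_fetch(0, 0, program))
```

With both write enables at 0 the function returns its inputs unchanged, which is how later cycles keep PC and IR stable.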
17
Register Fetch / Instruction Decode: Cycle 2
  • busA <- RegFile[rs]; busB <- RegFile[rt]
  • ALU is not being used: ALUctr = xx

[Datapath figure: Op and Func from the Instruction Reg go to the Control. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, IorD=x, MemWr=0, IRWr=0, RegWr=0, RegDst=x, ALUSelA=x, ALUSelB=xx, ALUOp=xx.]
18
Register Fetch / Instruction Decode: Cycle 2
(Continued)
  • busA <- Reg[rs]; busB <- Reg[rt]
  • Target <- PC + SignExt(Imm16)*4

[Datapath figure: Control dispatches on Op to Beq, Rtype, Ori, or Memory. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=1, IorD=x, MemWr=0, IRWr=0, RegWr=0, RegDst=x, ALUSelA=0, ALUSelB=10, ALUOp=Add, ExtOp=1.]
19
Branch Completion: Cycle 3
  • if (busA == busB)
    • PC <- Target

[Datapath figure. Control settings: PCWr=0, PCWrCond=1, PCSrc=1, BrWr=0, IorD=x, MemWr=0, IRWr=0, RegWr=0, RegDst=x, ALUSelA=1, ALUSelB=01, ALUOp=Sub, ExtOp=x.]
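
The branch takes two ALU passes: the Target is computed during decode (cycle 2) and the comparison happens in cycle 3. A minimal sketch under those assumptions follows; the function names are hypothetical, and pc here stands for the already-incremented PC left by the fetch cycle.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """Cycle 2: Target <- PC + SignExt(Imm16)*4 (latched because BrWr=1)."""
    return (pc + sign_extend16(imm16) * 4) & MASK32


def branch_completion(pc, target, bus_a, bus_b):
    """Cycle 3: ALUOp=Sub sets Zero; PCWrCond=1 loads Target only if Zero."""
    zero = ((bus_a - bus_b) & MASK32) == 0
    return target if zero else pc


target = branch_target(8, 3)
print(target, branch_completion(8, target, 5, 5), branch_completion(8, target, 5, 6))
```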
20
Instruction Decode: Cycle 2, We have an R-type!
  • Next Cycle: R-type Execution

[Datapath figure: Control dispatches on Op to Beq, Rtype, Ori, or Memory. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=1, IorD=x, MemWr=0, IRWr=0, RegWr=0, RegDst=x, ALUSelA=0, ALUSelB=10, ALUOp=Add, ExtOp=1.]
21
R-type Execution: Cycle 3
  • ALU Output <- busA op busB

[Datapath figure. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, IorD=x, MemWr=0, IRWr=0, RegWr=0, RegDst=1, MemtoReg=x, ALUSelA=1, ALUSelB=01, ALUOp=Rtype, ExtOp=x.]
22
R-type Completion: Cycle 4
  • R[rd] <- ALU Output

[Datapath figure. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, IorD=x, MemWr=0, IRWr=0, RegWr=1, RegDst=1, MemtoReg=0, ALUSelA=1, ALUSelB=01, ALUOp=Rtype, ExtOp=x.]
23
A Multiple Cycle Delay Path
  • There is no register to save the results between:
    • Register Fetch: busA <- Reg[rs]; busB <-
      Reg[rt]
    • R-type Execution: ALU output <- busA op busB
    • R-type Completion: Reg[rd] <- ALU output

[Figure: Instruction Reg -> Reg File (Ra, Rb, Rw; busA, busB, busW) -> ALU input muxes (ALUselA, ALUselB) -> ALU (ALUOp), annotated with two questions: "Register here to save outputs of Rfetch?" after the register file and "Register here to save outputs of RExec?" after the ALU.]
24
A Multiple Cycle Delay Path (Continued)
  • Register is NOT needed to save the outputs of
    Register Fetch:
    • IRWr = 0: busA and busB will not change after
      Register Fetch
  • Register is NOT needed to save the outputs of
    R-type Execution:
    • busA and busB will not change after Register
      Fetch
    • Control signals ALUSelA, ALUSelB, and ALUOp will
      not change after R-type Execution
    • Consequently, ALU output will not change after
      R-type Execution
  • In theory (P. 378, PH), you need a register to
    hold a signal value if:
    • (1) The signal is computed in one clock cycle and
      used in another,
    • (2) AND the inputs to the functional block that
      computes this signal can change before the
      signal is written into a state element.
  • You can save a register if Cond 1 is true BUT
    Cond 2 is false
    • But in practice, this will introduce a multiple
      cycle delay path:
      • A logic delay path that takes multiple cycles to
        propagate from one storage element to the next
        storage element
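
The two-condition rule can be written down as a small predicate. This is only an illustrative sketch: needs_register is a hypothetical helper, and the two example signals are classified by reading the control settings in the surrounding slides.

```python
def needs_register(used_in_later_cycle, inputs_can_change_first):
    """A holding register is required only when BOTH conditions hold."""
    return used_in_later_cycle and inputs_can_change_first


# busA / busB: produced in Register Fetch and used in later cycles, but
# IRWr = 0 keeps Rs and Rt fixed, so condition (2) is false.
bus_needs_reg = needs_register(True, False)

# Branch Target: computed by the ALU during decode and used one cycle
# later, after the ALU inputs have been switched to busA and busB,
# so both conditions hold (hence BrWr = 1 in the decode cycle).
target_needs_reg = needs_register(True, True)

print(bus_needs_reg, target_needs_reg)
```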

25
Pros and Cons of a Multiple Cycle Delay Path
  • A 3-cycle path example:
    • IR (storage) -> Reg File Read -> ALU -> Reg
      File Write (storage)
  • Advantages:
    • Register savings
    • We can share time among cycles:
      • If the ALU takes longer than one cycle, that is
        still OK as long as the entire path takes less
        than 3 cycles to finish
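
The time-sharing advantage amounts to checking the total path delay against the whole cycle budget instead of checking each stage against one cycle. The delays and the cycle time below are hypothetical values chosen so that the ALU alone exceeds one cycle.

```python
def fits_multicycle_path(stage_delays_ns, cycles, cycle_time_ns):
    """True if the storage-to-storage path settles within the allotted cycles."""
    return sum(stage_delays_ns) < cycles * cycle_time_ns


# Hypothetical delays: register file read, ALU, register file write.
STAGES_NS = [4, 13, 3]
CYCLE_NS = 10

alu_exceeds_one_cycle = STAGES_NS[1] > CYCLE_NS
path_ok = fits_multicycle_path(STAGES_NS, cycles=3, cycle_time_ns=CYCLE_NS)
print(alu_exceeds_one_cycle, path_ok)
```

With registers between every cycle the same ALU would force a longer clock, since each stage would have to fit in one cycle on its own.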

26
Pros and Cons of a Multiple Cycle Delay Path
(Continued)
  • Disadvantage:
    • A static timing analyzer, which ONLY looks at the
      delay between two storage elements, will report
      this as a timing violation
    • You have to ignore the static timing analyzer's
      warnings
    • But you may end up ignoring real timing
      violations
  • We always TRY to put in registers between cycles
    to avoid MCDP
    • Assume we add registers A, B, ALUOut, and a Mem
      Data register

[Figure: added registers A, B, ALUOut (can also be used instead of the Target reg.), and Mem Data.]
27
Instruction Decode: Cycle 2, We have an Ori!
  • Next Cycle: Ori Execution

[Datapath figure: Control dispatches on Op to Beq, Rtype, Ori, or Memory. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=1, IorD=x, MemWr=0, IRWr=0, RegWr=0, RegDst=x, ALUSelA=0, ALUSelB=10, ALUOp=Add, ExtOp=1.]
28
Ori Execution: Cycle 3
  • ALU output <- busA or ZeroExt[Imm16]

[Datapath figure. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, IorD=x, MemWr=0, IRWr=0, RegWr=0, RegDst=0, MemtoReg=x, ALUSelA=1, ALUSelB=11, ALUOp=Or, ExtOp=0.]
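
Ori is the one instruction here that runs the extender with ExtOp=0. A minimal sketch of the extender and the Ori execution step follows; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def extend(imm16, ext_op):
    """ExtOp=0: zero-extend to 32 bits. ExtOp=1: sign-extend to 32 bits."""
    imm16 &= 0xFFFF
    if ext_op == 1 and imm16 & 0x8000:
        return imm16 | 0xFFFF0000
    return imm16


def ori_execute(bus_a, imm16):
    """ALU output <- busA or ZeroExt[Imm16] (ALUOp=Or, ExtOp=0)."""
    return (bus_a | extend(imm16, ext_op=0)) & MASK32


print(hex(extend(0x8000, 0)), hex(extend(0x8000, 1)), hex(ori_execute(0x0F0F0000, 0x8001)))
```

Sign-extending the immediate instead would set the upper 16 bits of the result whenever bit 15 of the immediate is 1, which is why Ori needs ExtOp=0.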
29
Ori Completion: Cycle 4
  • Reg[rt] <- ALU output

[Datapath figure. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, IorD=x, MemWr=0, IRWr=0, RegWr=1, RegDst=0, MemtoReg=0, ALUSelA=1, ALUSelB=11, ALUOp=Or, ExtOp=0.]
30
Instruction Decode: Cycle 2, We have a Memory
Access!
  • Next Cycle: Memory Address Calculation

[Datapath figure: Control dispatches on Op to Beq, Rtype, Ori, or Memory. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=1, IorD=x, MemWr=0, IRWr=0, RegWr=0, RegDst=x, ALUSelA=0, ALUSelB=10, ALUOp=Add, ExtOp=1.]
31
Memory Address Calculation: Cycle 3
  • ALU output <- busA + SignExt[Imm16]

[State bubble AdrCal. 1: ExtOp, ALUSelA; ALUSelB=11; ALUOp=Add; x: MemtoReg, PCSrc.]
[Datapath figure. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, IorD=x, MemWr=0, IRWr=0, RegWr=0, RegDst=x, MemtoReg=x, ALUSelA=1, ALUSelB=11, ALUOp=Add, ExtOp=1.]
32
Memory Access for Store: Cycle 4
  • mem[ALU output] <- busB

[State bubble SWmem. 1: ExtOp, MemWr, ALUSelA; ALUSelB=11; ALUOp=Add; x: PCSrc, RegDst, MemtoReg.]
[Datapath figure. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, IorD=x, MemWr=1, IRWr=0, RegWr=0, RegDst=x, MemtoReg=x, ALUSelA=1, ALUSelB=11, ALUOp=Add, ExtOp=1.]
33
Memory Access for Load: Cycle 4
  • Mem Dout <- mem[ALU output]

[Datapath figure. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, IorD=1, MemWr=0, IRWr=0, RegWr=0, RegDst=0, MemtoReg=x, ALUSelA=1, ALUSelB=11, ALUOp=Add, ExtOp=1.]
34
Write Back for Load: Cycle 5
  • Reg[rt] <- Mem Dout

[Datapath figure. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, IorD=x, MemWr=0, IRWr=0, RegWr=1, RegDst=0, MemtoReg=1, ALUSelA=1, ALUSelB=11, ALUOp=Add, ExtOp=1.]
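
Putting the five load steps together, the register-transfer statements from the preceding slides can be traced in order. This is a behavioral sketch only: execute_lw is a hypothetical function, memory is modeled as a dictionary of words keyed by byte address, and the field positions are the standard MIPS I-format ones (rs in bits 25:21, rt in bits 20:16, imm16 in bits 15:0).

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_lw(regs, memory, pc):
    """Trace one lw through its five cycles; returns the updated PC."""
    ir = memory[pc]                        # Cycle 1: IR <- mem[PC]
    pc = (pc + 4) & MASK32                 #          PC <- PC + 4
    rs = (ir >> 21) & 0x1F                 # Cycle 2: decode, register fetch
    rt = (ir >> 16) & 0x1F
    imm16 = ir & 0xFFFF
    bus_a = regs[rs]
    address = (bus_a + sign_extend16(imm16)) & MASK32  # Cycle 3: address calc
    mem_dout = memory.get(address, 0)      # Cycle 4: Mem Dout <- mem[ALU output]
    regs[rt] = mem_dout                    # Cycle 5: Reg[rt] <- Mem Dout
    return pc


regs = [0] * 32
regs[1] = 0x100
# 0x8C220008 encodes lw $2, 8($1); the data word at 0x108 is hypothetical.
memory = {0: 0x8C220008, 0x108: 42}
new_pc = execute_lw(regs, memory, 0)
print(new_pc, regs[2])
```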
35
Putting it all together: Multiple Cycle Datapath
[Datapath figure: PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, Reg File (Ra, Rb, Rw; busA, busB, busW), extender, and ALU with Zero output. Control inputs: PCWr, PCWrCond, PCSrc, BrWr, IorD, MemWr, IRWr, RegWr, RegDst, MemtoReg, ALUSelA, ALUSelB, ALUOp, ExtOp.]
36
Summary
  • Disadvantages of the Single Cycle Processor:
    • Long cycle time
    • Cycle time is too long for all instructions
      except the Load
  • Multiple Cycle Processor:
    • Divide the instructions into smaller steps
    • Execute each step (instead of the entire
      instruction) in one cycle
  • Do NOT confuse Multiple Cycle Processor with
    Multiple Cycle Delay Path:
    • Multiple Cycle Processor: executes each
      instruction in multiple clock cycles
    • Multiple Cycle Delay Path: a combinational logic
      path between two storage elements that takes more
      than one clock cycle to complete
  • It is possible (and desirable) to build a MC
    Processor without MCDP:
    • Use a register to save a signal's value whenever
      a signal is generated in one clock cycle and used
      in a later cycle

37
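Putting it all together: Control State Diagram
[State diagram, partially recovered. After decode, control branches on the opcode: beq, Rtype, Ori, or lw/sw. Recovered states and outputs:
  AdrCal (lw or sw). 1: ExtOp, ALUSelA; ALUSelB=11; ALUOp=Add; x: MemtoReg, PCSrc
  SWMem (sw). 1: ExtOp, MemWr, ALUSelA; ALUSelB=11; ALUOp=Add; x: PCSrc, RegDst, MemtoReg
  LWmem (lw). 1: ExtOp, ALUSelA, IorD; ALUSelB=11; ALUOp=Add; x: MemtoReg, PCSrc
  OriExec, OriFinish, LWwr: outputs not recovered]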
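
The state sequencing can be sketched as a next-state function. The state names AdrCal, SWMem, LWmem, LWwr, OriExec, and OriFinish come from the diagram; Fetch, Decode, BrComplete, RExec, and RFinish are assumed names for the states that did not survive in the transcript.

```python
# Dispatch out of the decode state, keyed by instruction class.
DISPATCH = {"beq": "BrComplete", "rtype": "RExec", "ori": "OriExec",
            "lw": "AdrCal", "sw": "AdrCal"}

# States with a single unconditional successor.
FOLLOW = {"RExec": "RFinish", "OriExec": "OriFinish", "LWmem": "LWwr"}


def next_state(state, op):
    if state == "Fetch":
        return "Decode"
    if state == "Decode":
        return DISPATCH[op]
    if state == "AdrCal":
        return "LWmem" if op == "lw" else "SWMem"
    # Every finishing state returns to Fetch for the next instruction.
    return FOLLOW.get(state, "Fetch")


def cycles(op):
    """Count the states visited by one instruction, starting at Fetch."""
    state, count = "Fetch", 1
    while True:
        state = next_state(state, op)
        if state == "Fetch":
            return count
        count += 1


print({op: cycles(op) for op in DISPATCH})
```

The counts match the walkthroughs: three cycles for beq, four for R-type, ori, and sw, and five for lw.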
38
Where to get more information?
  • Next two lectures:
    • Multiple Cycle Controller: Appendix C of your
      text book.
    • Microprogramming: Section 5.5 of your text book.
  • D. Patterson, "Microprogramming," Scientific
    American, March 1983.
  • D. Patterson and D. Ditzel, "The Case for the
    Reduced Instruction Set Computer," Computer
    Architecture News 8, 6 (October 15, 1980)