Title: CpE 242 Computer Architecture and Engineering: Designing a Multiple Cycle Processor
2. A Single Cycle Processor
[Figure: single cycle datapath and control. Main Control takes op = Instr<31:26> (6 bits) and produces RegDst, ALUSrc, MemtoReg, RegWr, MemWr, Branch, Jump, ExtOp, and ALUop (3 bits). ALU Control takes func = Instr<5:0> (6 bits) and ALUop and produces ALUctr (3 bits). The Instruction Fetch Unit supplies Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. Datapath blocks: a RegDst mux selecting Rd or Rt, a register file of 32 32-bit registers (Rw, Ra, Rb, busW, busA, busB), an Extender (imm16 to 32 bits), an ALUSrc mux, the ALU (with Zero output), Data Memory (WrEn, Adr, Data In), and a MemtoReg mux.]
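3. Instruction Fetch Unit
[Figure: Instruction Fetch Unit. Labels: 30-bit PC driving Addr<31:2> of the Instruction Memory, with Addr<1:0> = 00; jump Target built from PC<31:28> and Instruction<25:0> (26 bits), selected by Jump; SignExt of imm16 = Instruction<15:0> (16 to 30 bits) for the branch address, selected by Branch and Zero; output Instruction<31:0>.]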
4. The Main Control
[Figure: Main Control outputs: RegWrite, ALUSrc, RegDst, MemtoReg, MemWrite, Branch, Jump, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>.]
5. Outline of Today's Lecture
- Recap and Introduction (5 minutes)
- Introduction to the Concept of Multiple Cycle Processor (15 minutes)
- Multiple Cycle Implementation of R-type Instructions (15 minutes)
- What is a Multiple Cycle Delay Path and Why is it Bad? (10 minutes)
- Multiple Cycle Implementation of Or Immediate (5 minutes)
- Multiple Cycle Implementation of Load and Store (15 minutes)
- Putting it all Together (5 minutes)
6. Drawbacks of this Single Cycle Processor
- Long cycle time:
  - Cycle time must be long enough for the load instruction:
    - PC's Clock-to-Q
    - Instruction Memory Access Time
    - Register File Access Time
    - ALU Delay (address calculation)
    - Data Memory Access Time
    - Register File Setup Time
    - Clock Skew
- Cycle time is much longer than needed for all other instructions. Examples:
  - R-type instructions do not require data memory access
  - Jump does not require ALU operation nor data memory access
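The load critical path listed above can be summed to get the single cycle processor's cycle time. The following is a minimal sketch; the delay values are hypothetical numbers chosen for illustration and do not come from the lecture.

```python
# Hypothetical delays in nanoseconds; none of these values are from the lecture.
LOAD_PATH_NS = {
    "pc_clk_to_q": 1,
    "instruction_memory_access": 10,
    "register_file_access": 5,
    "alu_delay": 8,
    "data_memory_access": 10,
    "register_file_setup": 1,
    "clock_skew": 1,
}


def single_cycle_time(path):
    """Cycle time must cover every delay on the load path."""
    return sum(path.values())


def rtype_needed_time(path):
    """An R-type instruction skips the data memory access."""
    return sum(d for name, d in path.items() if name != "data_memory_access")


if __name__ == "__main__":
    print("cycle time set by load:", single_cycle_time(LOAD_PATH_NS), "ns")
    print("time an R-type needs:  ", rtype_needed_time(LOAD_PATH_NS), "ns")
```

With these assumed numbers, every R-type instruction wastes the difference between the two totals in each cycle.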
7. Overview of a Multiple Cycle Implementation
- The root of the single cycle processor's problems:
  - The cycle time has to be long enough for the slowest instruction
- Solution:
  - Break the instruction into smaller steps
  - Execute each step (instead of the entire instruction) in one cycle
    - Cycle time = time it takes to execute the longest step
    - Keep all the steps to have similar length
  - This is the essence of the multiple cycle processor
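The rule "cycle time = time it takes to execute the longest step" can be sketched as follows. The step names and delays are assumptions made up for this example; only the rule itself is from the slide.

```python
# Hypothetical per-step delays in nanoseconds (illustrative only).
STEP_DELAYS_NS = {
    "instruction_fetch": 10,
    "decode_register_fetch": 6,
    "execute": 8,
    "memory_access": 10,
    "write_back": 5,
}


def multi_cycle_time(steps):
    """The clock must accommodate the slowest step."""
    return max(steps.values())


def imbalance(steps):
    """Gap between the slowest and fastest step; smaller means better balance."""
    return max(steps.values()) - min(steps.values())


if __name__ == "__main__":
    print("cycle time:", multi_cycle_time(STEP_DELAYS_NS), "ns")
    print("imbalance: ", imbalance(STEP_DELAYS_NS), "ns")
```

A large imbalance means short steps idle for part of every cycle, which is why the slide asks for steps of similar length.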
8. Overview of a Multiple Cycle Implementation
- The advantages of the multiple cycle processor:
  - Cycle time is much shorter
  - Different instructions take different number of cycles to complete
    - Load takes five cycles
    - Jump only takes three cycles
  - Allows a functional unit to be used more than once per instruction
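Because instructions take different numbers of cycles, total cycles depend on the instruction mix. The sketch below uses the cycle counts stated or walked through in this lecture (load 5, jump 3, and 4, 4, 4, 3 for store, R-type, ori, and branch as shown on the later cycle-by-cycle slides). The instruction mix is a hypothetical example, not measured data.

```python
# Cycles per instruction class, as presented in this lecture.
CYCLES = {"load": 5, "store": 4, "rtype": 4, "ori": 4, "branch": 3, "jump": 3}

# Hypothetical instruction counts per 100 instructions (assumed for illustration).
MIX_PER_100 = {"load": 25, "store": 10, "rtype": 40, "ori": 10, "branch": 10, "jump": 5}


def total_cycles(mix, cycles):
    """Total clock cycles needed to run the given instruction counts."""
    return sum(count * cycles[kind] for kind, count in mix.items())


def average_cpi(mix, cycles):
    """Average cycles per instruction for the mix."""
    return total_cycles(mix, cycles) / sum(mix.values())


if __name__ == "__main__":
    print("total cycles:", total_cycles(MIX_PER_100, CYCLES))
    print("average CPI: ", average_cpi(MIX_PER_100, CYCLES))
```

Multiplying the average CPI by the (shorter) multiple cycle clock period gives the time per instruction to compare against the single cycle design.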
9. The Five Steps of a Load Instruction
[Figure: timing diagram of a load divided into five steps: Instruction Fetch, Instr Decode / Reg. Fetch, Address, Data Memory, Reg Wr. Each signal moves from Old Value to New Value after the labeled delay: PC (Clk-to-Q); Rs, Rt, Rd, Op, Func (Instruction Memory Access Time); ALUctr, ExtOp, ALUSrc, RegWr (Delay through Control Logic); busA (Register File Access Time); busB (Delay through Extender Mux); Address (ALU Delay); busW (Data Memory Access Time); followed by Register File Write Time.]
10. Register File & Memory Write Timing: Ideal vs. Reality
- In previous lectures, register file and memory are simplified:
  - Write happens at the clock tick
  - Address, data, and write enable must be stable one set-up time before the clock tick
- In real life:
  - Neither register file nor ideal memory has the clock input
  - The write path is a combinational logic delay path:
    - Write enable goes to 1 and Din settles down
    - Memory write access delay
    - Din is written into mem[address]
  - Important: Address and Data must be stable BEFORE Write Enable goes to 1
11. Race Condition Between Address and Write Enable
- This real (no clock input) register file may not work reliably in the single cycle processor because:
  - We cannot guarantee Rw will be stable BEFORE RegWr = 1
  - There is a race between Rw (address) and RegWr (write enable)
- The real (no clock input) memory may not work reliably in the single cycle processor because:
  - We cannot guarantee Address will be stable BEFORE WrEn = 1
  - There is a race between Adr and WrEn
12. How to Avoid this Race Condition?
- Solution for the multiple cycle implementation:
  - Make sure Address is stable by the end of Cycle N
  - Assert Write Enable signal ONE cycle later, at Cycle (N + 1)
  - Address cannot change until Write Enable is deasserted
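The rule above can be expressed as a check over a cycle-by-cycle trace. This is a minimal sketch: the trace format, a list of (address, write_enable) pairs with one entry per cycle, is an assumption made for this example.

```python
def write_is_safe(trace):
    """Return True if every cycle with write enable = 1 uses an address that
    was already present (stable) in the previous cycle.

    trace: list of (address, write_enable) tuples, one per clock cycle.
    """
    for n, (addr, wr_en) in enumerate(trace):
        if not wr_en:
            continue
        # The address must have been set up in cycle n - 1 and left unchanged.
        if n == 0 or trace[n - 1][0] != addr:
            return False
    return True


if __name__ == "__main__":
    safe = [(8, 0), (8, 1), (12, 0)]   # address 8 set in cycle N, written in N + 1
    racy = [(8, 0), (12, 1)]           # address changes in the same cycle as the write
    print(write_is_safe(safe), write_is_safe(racy))
```

The check also covers the last bullet: while write enable stays at 1 over several cycles, each of those cycles must see the same address as the one before it.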
13. Dual-Port Ideal Memory
- Dual Port Ideal Memory:
  - Independent Read (RAdr, Dout) and Write (WAdr, Din) ports
  - Read and write (to different locations) can occur in the same cycle
- Read Port is a combinational path:
  - Read Address Valid -->
  - Memory Read Access Delay -->
  - Data Out Valid
- Write Port is also a combinational path:
  - MemWrite = 1 -->
  - Memory Write Access Delay -->
  - Data In is written into location [WrAdr]
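A behavioral sketch of the dual-port ideal memory described above follows. The class and method names are hypothetical, access delays are not modeled, and unwritten locations are assumed to read as 0.

```python
class DualPortMemory:
    """Behavioral model with an independent read port and write port.
    Access delays are ignored; this only captures the functional behavior."""

    def __init__(self):
        self.cells = {}

    def read(self, radr):
        """Read port: Dout follows RAdr combinationally (no clock, no enable)."""
        return self.cells.get(radr, 0)  # assumption: unwritten cells read as 0

    def write(self, wadr, din, mem_write):
        """Write port: Din is stored at WrAdr only while MemWrite = 1."""
        if mem_write:
            self.cells[wadr] = din


if __name__ == "__main__":
    mem = DualPortMemory()
    mem.write(wadr=4, din=99, mem_write=1)
    # Read one location while writing a different one.
    mem.write(wadr=8, din=7, mem_write=1)
    print(mem.read(4), mem.read(8))
```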
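14. Instruction Fetch Cycle: In the Beginning
- Every cycle begins right AFTER the clock tick:
  - mem[PC]; PC<31:0> + 4
[Figure: one logic clock cycle with a "You are here!" marker just after the clock tick. Datapath: PC feeding RAdr of the Ideal Memory (RAdr, WrAdr, Din, Dout), Dout feeding the Instruction Reg, and the ALU adding 4 to the PC. Control signals still to be determined: PCWr=?, MemWr=?, IRWr=?, ALUop=?.]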
15. Instruction Fetch Cycle: The End
- Every cycle ends AT the next clock tick (storage element updates):
  - IR <-- mem[PC]; PC<31:0> <-- PC<31:0> + 4
[Figure: one logic clock cycle with a "You are here!" marker at the next clock tick. Same datapath as the previous slide. Control signals: PCWr=1, MemWr=0, IRWr=1, ALUOp=Add.]
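The register transfers of the fetch cycle can be sketched as a function that computes the next values of the storage elements at the clock tick. The dictionary-based state and memory are assumptions for this example.

```python
def instruction_fetch(state, memory):
    """Cycle 1: IR <-- mem[PC]; PC <-- PC + 4.

    Both next values are computed from the OLD PC, mirroring the fact that
    storage elements only update at the clock tick that ends the cycle.
    """
    ir_next = memory[state["pc"]]             # IRWr = 1, MemWr = 0
    pc_next = (state["pc"] + 4) & 0xFFFFFFFF  # PCWr = 1, ALUOp = Add
    return {"pc": pc_next, "ir": ir_next}


if __name__ == "__main__":
    memory = {0: 0x12345678}  # hypothetical instruction word at address 0
    print(instruction_fetch({"pc": 0, "ir": 0}, memory))
```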
16. Instruction Fetch Cycle: Overall Picture (Cycle 1)
[Figure: multiple cycle datapath (PC, Ideal Memory, Instruction Reg, ALU with busA/busB input muxes and Zero output). Control signals: PCWr=1, PCWrCond=x, PCSrc=0, BrWr=0, ALUSelA=0, MemWr=0, IRWr=1, IorD=0, ALUSelB=00, ALUOp=Add.]
17. Register Fetch / Instruction Decode (Cycle 2)
- busA <- RegFile[rs]; busB <- RegFile[rt]
- ALU is not being used: ALUctr = xx
[Figure: multiple cycle datapath with the Reg File (Ra=Rs, Rb=Rt, Rw, busA, busB, busW); Op (6 bits) and Func (6 bits) go to the Control, Imm is 16 bits. Control signals: PCWr=0, PCWrCond=0, PCSrc=x, ALUSelA=x, MemWr=0, IRWr=0, RegWr=0, RegDst=x, IorD=x, ALUSelB=xx, ALUOp=xx.]
18. Register Fetch / Instruction Decode (Cycle 2, Continued)
- busA <- Reg[rs]; busB <- Reg[rt]
- Target <- PC + SignExt(Imm16)*4
[Figure: multiple cycle datapath; the Control decodes Op and Func and dispatches to Beq, Rtype, Ori, or Memory. Control signals: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=1, ALUSelA=0, MemWr=0, IRWr=0, RegWr=0, RegDst=x, IorD=x, ALUSelB=10, ALUOp=Add, ExtOp=1.]
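The target computation Target <- PC + SignExt(Imm16)*4 can be sketched as below. The function names are hypothetical. Note that PC was already incremented by 4 at the end of Cycle 1, so `pc` here is the updated value.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a two's complement signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """Target <- PC + SignExt(Imm16) * 4, wrapped to 32 bits."""
    return (pc + sign_extend16(imm16) * 4) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(branch_target(0x100, 3)))       # forward branch by 3 words
    print(hex(branch_target(0x100, 0xFFFF)))  # backward branch by 1 word
```

The target is computed for every instruction in Cycle 2, before the control knows whether it is a branch; it is simply left unused otherwise.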
19. Branch Completion (Cycle 3)
- if (busA == busB)
  - PC <- Target
[Figure: multiple cycle datapath. Control signals: PCWr=0, PCWrCond=1, PCSrc=1, BrWr=0, ALUSelA=1, MemWr=0, IRWr=0, RegWr=0, RegDst=x, IorD=x, ALUSelB=01, ALUOp=Sub, ExtOp=x.]
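Branch completion can be sketched as follows: the ALU subtracts busB from busA, and the PC is written with Target only when the result is zero. The function name and argument order are assumptions for this example.

```python
def branch_completion(pc, target, bus_a, bus_b):
    """Cycle 3 of beq: ALUOp = Sub, PCWrCond = 1, PCSrc = 1.

    Returns the value PC holds after the clock tick.
    """
    zero = ((bus_a - bus_b) & 0xFFFFFFFF) == 0  # ALU Zero output
    # PCWr = 0, so PC only changes when PCWrCond and Zero are both 1.
    return target if zero else pc


if __name__ == "__main__":
    print(hex(branch_completion(0x104, 0x200, 7, 7)))  # taken
    print(hex(branch_completion(0x104, 0x200, 7, 8)))  # not taken
```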
20. Instruction Decode (Cycle 2): We have an R-type!
- Next Cycle: R-type Execution
[Figure: multiple cycle datapath; the Control decodes Op and Func (Beq, Rtype, Ori, Memory) and selects Rtype. Control signals: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=1, ALUSelA=0, MemWr=0, IRWr=0, RegWr=0, RegDst=x, IorD=x, ALUSelB=10, ALUOp=Add, ExtOp=1.]
21. R-type Execution (Cycle 3)
- ALU Output <- busA op busB
[Figure: multiple cycle datapath. Control signals: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, MemWr=0, IRWr=0, RegWr=0, RegDst=1, IorD=x, ALUSelB=01, ALUOp=Rtype, MemtoReg=x, ExtOp=x.]
22. R-type Completion (Cycle 4)
[Figure: multiple cycle datapath. Control signals: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, MemWr=0, IRWr=0, RegWr=1, RegDst=1, IorD=x, ALUSelB=01, ALUOp=Rtype, MemtoReg=0, ExtOp=x.]
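The two R-type cycles can be sketched as separate functions: execution computes the ALU output without writing anything, and completion writes it to Reg[rd]. The operation table is a hypothetical stand-in, since the slides only write "busA op busB".

```python
# Hypothetical operation table; the lecture only writes "busA op busB".
ALU_OPS = {
    "add": lambda a, b: (a + b) & 0xFFFFFFFF,
    "sub": lambda a, b: (a - b) & 0xFFFFFFFF,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
}


def rtype_execute(regs, rs, rt, op):
    """Cycle 3: ALU output <- busA op busB (RegWr = 0, nothing is written)."""
    bus_a, bus_b = regs[rs], regs[rt]
    return ALU_OPS[op](bus_a, bus_b)


def rtype_complete(regs, rd, alu_out):
    """Cycle 4: RegWr = 1, RegDst = 1, MemtoReg = 0, so Reg[rd] <- ALU output."""
    updated = dict(regs)
    updated[rd] = alu_out
    return updated


if __name__ == "__main__":
    regs = {1: 5, 2: 7, 3: 0}
    alu_out = rtype_execute(regs, rs=1, rt=2, op="add")
    print(rtype_complete(regs, rd=3, alu_out=alu_out))
```

Splitting the write into its own cycle follows the earlier race-avoidance rule: Rw is stable in Cycle 3 and RegWr is asserted one cycle later.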
23. A Multiple Cycle Delay Path
- There is no register to save the results between:
  - Register Fetch: busA <- Reg[rs]; busB <- Reg[rt]
  - R-type Execution: ALU output <- busA op busB
  - R-type Completion: Reg[rd] <- ALU output
[Figure: path from the Instruction Reg through the Reg File (busA, busB) and the ALU back to busW, annotated "Register here to save outputs of Rfetch?" at the Reg File outputs and "Register here to save outputs of RExec?" at the ALU output. Signals shown: PCWr, ALUselA, ALUselB, ALUOp, Zero.]
24. A Multiple Cycle Delay Path (Continued)
- Register is NOT needed to save the outputs of Register Fetch:
  - IRWr = 0: busA and busB will not change after Register Fetch
- Register is NOT needed to save the outputs of R-type Execution:
  - busA and busB will not change after Register Fetch
  - Control signals ALUSelA, ALUSelB, and ALUOp will not change after R-type Execution
  - Consequently, ALU output will not change after R-type Execution
- In theory (P. 378, PH), you need a register to hold a signal value if:
  - (1) The signal is computed in one clock cycle and used in another.
  - (2) AND the inputs to the functional block that computes this signal can change before the signal is written into a state element.
- You can save a register if Cond 1 is true BUT Cond 2 is false
- But in practice, this will introduce a multiple cycle delay path:
  - A logic delay path that takes multiple cycles to propagate from one storage element to the next storage element
25. Pros and Cons of a Multiple Cycle Delay Path
- A 3-cycle path example:
  - IR (storage) -> Reg File Read -> ALU -> Reg File Write (storage)
- Advantages:
  - Register savings
  - We can share time among cycles:
    - If ALU takes longer than one cycle, it is still OK as long as the entire path takes less than 3 cycles to finish
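The time-sharing argument for the 3-cycle path can be sketched as two checks: one that allows the path its full budget of cycles, and one that applies a single-cycle budget between storage elements. The delay values and function names are hypothetical.

```python
def multicycle_path_ok(path_delays_ns, cycle_time_ns, cycles_allowed=3):
    """The whole path must finish within the cycles budgeted for it."""
    return sum(path_delays_ns) <= cycles_allowed * cycle_time_ns


def single_cycle_check_passes(path_delays_ns, cycle_time_ns):
    """A check that assumes one cycle between two storage elements."""
    return sum(path_delays_ns) <= cycle_time_ns


if __name__ == "__main__":
    # Hypothetical delays: Reg File Read, ALU, Reg File Write (ns).
    delays = [5, 12, 6]
    cycle = 10
    # The ALU alone (12 ns) exceeds one cycle, yet the path fits in 3 cycles.
    print(multicycle_path_ok(delays, cycle))         # True
    print(single_cycle_check_passes(delays, cycle))  # False
```

The second result illustrates why a tool that only budgets one cycle per storage-to-storage path would flag this path even though the design works.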
26. Pros and Cons of a Multiple Cycle Delay Path (Continued)
- Disadvantage:
  - Static timing analyzer, which ONLY looks at delay between two storage elements, will report this as a timing violation
  - You have to ignore the static timing analyzer's warnings
  - But you may end up ignoring real timing violations
- We always TRY to put in registers between cycles to avoid MCDP
  - Assume we add registers A, B, ALUOut, and Mem Data register
[Figure: datapath with added registers A, B, ALUOut (can also be used instead of the Target reg.), and Mem Data.]
27. Instruction Decode (Cycle 2): We have an Ori!
[Figure: multiple cycle datapath; the Control decodes Op and Func (Beq, Rtype, Ori, Memory) and selects Ori. Control signals: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=1, ALUSelA=0, MemWr=0, IRWr=0, RegWr=0, RegDst=x, IorD=x, ALUSelB=10, ALUOp=Add, ExtOp=1.]
28. Ori Execution (Cycle 3)
- ALU output <- busA or ZeroExt[Imm16]
[Figure: multiple cycle datapath. Control signals: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, MemWr=0, IRWr=0, RegWr=0, RegDst=0, IorD=x, ALUSelB=11, ALUOp=Or, MemtoReg=x, ExtOp=0.]
29. Ori Completion (Cycle 4)
[Figure: multiple cycle datapath. Control signals: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, MemWr=0, IRWr=0, RegWr=1, RegDst=0, IorD=x, ALUSelB=11, ALUOp=Or, MemtoReg=0, ExtOp=0.]
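The Ori execution and completion cycles can be sketched together: the immediate is zero-extended (ExtOp = 0), ORed with busA, and the result is written to Rt (RegDst = 0). The function name and register-dictionary representation are assumptions for this example.

```python
def ori(regs, rs, rt, imm16):
    """Cycle 3: ALU output <- busA or ZeroExt[Imm16].
    Cycle 4: RegWr = 1, RegDst = 0, so Reg[rt] <- ALU output."""
    zero_ext = imm16 & 0xFFFF          # upper 16 bits are zero, never sign bits
    alu_out = regs[rs] | zero_ext
    updated = dict(regs)
    updated[rt] = alu_out
    return updated


if __name__ == "__main__":
    regs = {1: 0xF0F00000, 2: 0}
    # 0x8001 has its top bit set, but zero extension leaves the upper half alone.
    print(hex(ori(regs, rs=1, rt=2, imm16=0x8001)[2]))
```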
30. Instruction Decode (Cycle 2): We have a Memory Access!
- Next Cycle: Memory Address Calculation
[Figure: multiple cycle datapath; the Control decodes Op and Func (Beq, Rtype, Ori, Memory) and selects Memory. Control signals: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=1, ALUSelA=0, MemWr=0, IRWr=0, RegWr=0, RegDst=x, IorD=x, ALUSelB=10, ALUOp=Add, ExtOp=1.]
31. Memory Address Calculation (Cycle 3)
- ALU output <- busA + SignExt[Imm16]
[Figure: multiple cycle datapath with the AdrCal state bubble. AdrCal: 1: ExtOp, ALUSelA; ALUSelB=11; ALUOp=Add; x: MemtoReg, PCSrc. Control signals: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, MemWr=0, IRWr=0, RegWr=0, RegDst=x, IorD=x, ALUSelB=11, ALUOp=Add, MemtoReg=x, ExtOp=1.]
32. Memory Access for Store (Cycle 4)
[Figure: multiple cycle datapath with the SWmem state bubble. SWmem: 1: ExtOp, MemWr, ALUSelA; ALUSelB=11; ALUOp=Add; x: PCSrc, RegDst, MemtoReg. Control signals: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, MemWr=1, IRWr=0, RegWr=0, RegDst=x, IorD=x, ALUSelB=11, ALUOp=Add, MemtoReg=x, ExtOp=1.]
33. Memory Access for Load (Cycle 4)
- Mem Dout <- mem[ALU output]
[Figure: multiple cycle datapath. Control signals: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, MemWr=0, IRWr=0, RegWr=0, RegDst=0, IorD=1, ALUSelB=11, ALUOp=Add, MemtoReg=x, ExtOp=1.]
34. Write Back for Load (Cycle 5)
[Figure: multiple cycle datapath. Control signals: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, MemWr=0, IRWr=0, RegWr=1, RegDst=0, IorD=x, ALUSelB=11, ALUOp=Add, MemtoReg=1, ExtOp=1.]
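The load and store sequences walked through on the preceding slides can be sketched functionally: both compute the address as busA + SignExt[Imm16], then store writes busB to memory while load reads memory and writes the data to Rt. Function names and the dictionary-based register file and memory are assumptions for this example; unwritten memory is assumed to read as 0.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a two's complement signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def address_calc(regs, rs, imm16):
    """Cycle 3 (AdrCal): ALU output <- busA + SignExt[Imm16]."""
    return (regs[rs] + sign_extend16(imm16)) & 0xFFFFFFFF


def store_word(regs, memory, rs, rt, imm16):
    """Cycle 4 (SWmem): MemWr = 1, so mem[ALU output] <- busB (Reg[rt])."""
    updated = dict(memory)
    updated[address_calc(regs, rs, imm16)] = regs[rt]
    return updated


def load_word(regs, memory, rs, rt, imm16):
    """Cycle 4 (LWmem): Mem Dout <- mem[ALU output] with IorD = 1.
    Cycle 5 (LWwr): MemtoReg = 1, RegDst = 0, so Reg[rt] <- Mem Dout."""
    dout = memory.get(address_calc(regs, rs, imm16), 0)
    updated = dict(regs)
    updated[rt] = dout
    return updated


if __name__ == "__main__":
    regs = {1: 0x100, 2: 0xAB, 3: 0}
    memory = store_word(regs, {}, rs=1, rt=2, imm16=0xFFFC)   # offset -4
    print(hex(load_word(regs, memory, rs=1, rt=3, imm16=0xFFFC)[3]))
```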
35. Putting it all together: Multiple Cycle Datapath
[Figure: complete multiple cycle datapath: PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, Reg File (Ra=Rs, Rb=Rt, Rw selected from Rt or Rd, busA, busB, busW), Extender for Imm, and ALU with Zero output. Control signals: PCWr, PCWrCond, PCSrc, BrWr, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, ExtOp, MemtoReg.]
36. Summary
- Disadvantages of the Single Cycle Processor:
  - Long cycle time
  - Cycle time is too long for all instructions except the Load
- Multiple Cycle Processor:
  - Divide the instructions into smaller steps
  - Execute each step (instead of the entire instruction) in one cycle
- Do NOT confuse Multiple Cycle Processor with Multiple Cycle Delay Path:
  - Multiple Cycle Processor: executes each instruction in multiple clock cycles
  - Multiple Cycle Delay Path: a combinational logic path between two storage elements that takes more than one clock cycle to complete
- It is possible (desirable) to build a MC Processor without MCDP:
  - Use a register to save a signal's value whenever a signal is generated in one clock cycle and used in another cycle later
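37. Putting it all together: Control State Diagram
[Figure: control state diagram. Transition labels out of decode: beq, Rtype, Ori, lw or sw. States named in the diagram: AdrCal, SWMem, LWmem, LWwr, OriExec, OriFinish. State outputs shown:
- AdrCal (reached on lw or sw): 1: ExtOp, ALUSelA; ALUSelB=11; ALUOp=Add; x: MemtoReg, PCSrc
- SWMem (reached from AdrCal on sw): 1: ExtOp, MemWr, ALUSelA; ALUSelB=11; ALUOp=Add; x: PCSrc, RegDst, MemtoReg
- LWmem (reached from AdrCal on lw): 1: ExtOp, ALUSelA, IorD; ALUSelB=11; ALUOp=Add; x: MemtoReg, PCSrc]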
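The state diagram can be sketched as a next-state function. The state names AdrCal, SWMem, LWmem, LWwr, OriExec, and OriFinish appear in the diagram; the names Ifetch, Rfetch, BrComplete, RExec, and Rfinish are hypothetical labels chosen here for the states described on the earlier cycle-by-cycle slides.

```python
# Dispatch out of the decode state, keyed by instruction class.
DISPATCH = {
    "beq": "BrComplete",
    "rtype": "RExec",
    "ori": "OriExec",
    "lw": "AdrCal",
    "sw": "AdrCal",
}


def next_state(state, op):
    """Return the state entered at the next clock tick."""
    if state == "Ifetch":
        return "Rfetch"
    if state == "Rfetch":
        return DISPATCH[op]
    if state == "RExec":
        return "Rfinish"
    if state == "OriExec":
        return "OriFinish"
    if state == "AdrCal":
        return "LWmem" if op == "lw" else "SWMem"
    if state == "LWmem":
        return "LWwr"
    # BrComplete, Rfinish, OriFinish, SWMem, and LWwr all return to fetch.
    return "Ifetch"


def cycles_for(op):
    """Count the cycles one instruction spends before fetch starts again."""
    state, cycles = "Ifetch", 1
    while True:
        state = next_state(state, op)
        if state == "Ifetch":
            return cycles
        cycles += 1


if __name__ == "__main__":
    for op in ("lw", "sw", "rtype", "ori", "beq"):
        print(op, cycles_for(op))
```

Walking the machine reproduces the cycle counts from the slides: load takes five cycles, store, R-type, and ori take four, and branch takes three.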
38. Where to get more information?
- Next two lectures:
  - Multiple Cycle Controller: Appendix C of your text book.
  - Microprogramming: Section 5.5 of your text book.
- D. Patterson, "Microprogramming," Scientific American, March 1983.
- D. Patterson and D. Ditzel, "The Case for the Reduced Instruction Set Computer," Computer Architecture News 8, 6 (October 15, 1980)