Title: EEM 486: Computer Architecture Lecture 4 Designing a Multicycle Processor
1EEM 486: Computer Architecture, Lecture 4: Designing a Multicycle Processor
2The Big Picture
- Designing a Multiple Clock Cycle Datapath
3Single-Cycle Processor
- In our single-cycle processor, each instruction is realized by exactly one control command or microinstruction
4Abstract View of Single Cycle-Processor
5What's Wrong with a CPI=1 Processor?
[Figure: components on the path of each instruction type]
Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, mux, setup
Load (Critical Path): PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup
Store: PC, Inst Memory, Reg File, mux, ALU, Data Mem
Branch: PC, Inst Memory, Reg File, cmp, mux
- Long Cycle Time
- All instructions take as much time as the slowest
- Real memory is not as nice as our idealized memory
- Cannot always get the job done in one (short) cycle
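As a rough illustration of why every instruction pays for the slowest one, the sketch below adds up component delays along each instruction type's path and takes the maximum as the single-cycle clock period. The delay values are hypothetical numbers chosen only for illustration; they are not from the lecture. The component lists follow the paths shown on the slide.

```python
# Hypothetical component delays in nanoseconds (illustrative only, not from the lecture).
DELAY_NS = {
    "PC": 1, "Inst Memory": 10, "Reg File": 5, "mux": 1,
    "ALU": 8, "Data Mem": 10, "cmp": 4, "setup": 1,
}

# Components on each instruction type's path, as listed on the slide.
PATHS = {
    "Arithmetic & Logical": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "mux", "setup"],
    "Load": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem", "mux", "setup"],
    "Store": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem"],
    "Branch": ["PC", "Inst Memory", "Reg File", "cmp", "mux"],
}


def path_delay(components):
    """Sum the (assumed) delays of the components along one path."""
    return sum(DELAY_NS[c] for c in components)


path_delays = {name: path_delay(parts) for name, parts in PATHS.items()}

# In a single-cycle design the clock period must cover the slowest path.
critical_type = max(path_delays, key=path_delays.get)
cycle_time_ns = path_delays[critical_type]

for name, delay in path_delays.items():
    print(f"{name}: needs {delay} ns, is given {cycle_time_ns} ns")
```

With these assumed delays the load path is the longest, so every other instruction type is given more time than it needs.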
6Memory Access Time
- Physics => fast memories are small (large memories are slow)
- => Use a hierarchy of memories
7Multicycle Approach
- Break up the instructions into steps
- Let each step take one smaller clock cycle
  - Balance the amount of work to be done
  - Restrict each cycle to use only one major functional unit
  - Major functional units: Memory, Register File, and ALU
- Let different instructions take different numbers of cycles
- Use a functional unit more than once within execution of one instruction (less hardware)
  - A single memory unit for both instructions and data
  - A single ALU, rather than an ALU and two adders
- At the end of a cycle
- store values for use in later cycles
- introduce additional internal registers
8Partitioning the CPI=1 Datapath
- Add registers between smallest steps
[Figure: the single-cycle datapath split by registers into stages: Instruction Fetch (PC, Next PC), Decode and Operand Fetch (Reg. File), Execution (Exec), Memory Access (Data Mem), and Write Back. Control signals shown: nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr.]
9Recall: Step-by-step Processor Design
- Step 1: ISA => Logical Register Transfers
- Step 2: Components of the Datapath
- Step 3: RTL + Components => Datapath
- Step 4: Datapath + Logical RTs => Physical RTs
- Step 5: Physical RTs => Control
10Step 4: R-type (add, sub, . . .)
inst    Logical Register Transfers
ADDU    R[rd] <- R[rs] + R[rt]; PC <- PC + 4
Step 1. Instruction Fetch: IR <- MEM[PC], PC <- PC + 4
Step 2. Instruction Decode and Register Fetch: A <- R[rs], B <- R[rt]
Step 3. Execution: ALUOut <- A op B
Step 4. Write-back: R[rd] <- ALUOut
11R-type - Fetch
12R-type Decode/Register Fetch
13R-type - Execution
14R-type Write Back
15Step 4: Logical immed
inst    Logical Register Transfers
ORI     R[rt] <- R[rs] OR ZExt(Im16); PC <- PC + 4
Step 1. Instruction Fetch: IR <- MEM[PC], PC <- PC + 4
Step 2. Instruction Decode and Register Fetch: A <- R[rs]
Step 3. Execution: ALUOut <- A OR ZExt(Im16)
Step 4. Write-back: R[rt] <- ALUOut
16Logical immediate - Execution
[Figure: multicycle datapath (PC, Memory, Instruction register, Registers, A, B, ALU, ALU Out) during the execution step of a logical immediate instruction. Instruction 15-0 is zero-extended from 16 to 32 bits and fed to the ALU.]
Control settings: ALUSrcA=1, ALUSrcB=2, ALUctr=Or, RegWrite=0, nPCWrite=0, MemRead=0, IRWrite=0
17Logical immediate Write Back
18Step 4: Load
inst    Logical Register Transfers
LW      R[rt] <- MEM[R[rs] + SExt(Im16)]; PC <- PC + 4
Step 1. Instruction Fetch: IR <- MEM[PC], PC <- PC + 4
Step 2. Instruction Decode and Register Fetch: A <- R[rs]
Step 3. Memory address computation: ALUOut <- A + SExt(Im16)
Step 4. Memory access: MDR <- Memory[ALUOut]
Step 5. Load completion: R[rt] <- MDR
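The five load steps can be traced with a small sketch, one statement group per clock cycle. This is only an illustration under stated assumptions: the instruction is stored as an already-decoded Python dict with hypothetical keys rs, rt, and imm16, and one dict plays the role of the single memory that holds both instructions and data. The names run_lw and sign_extend16 are made up for this example.

```python
def sign_extend16(imm16):
    """SExt(Im16): interpret a 16-bit field as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_lw(mem, regs, pc):
    """Execute one LW in five steps; returns the updated PC."""
    # Step 1. Instruction Fetch: IR <- MEM[PC], PC <- PC + 4
    ir = mem[pc]
    pc = pc + 4
    # Step 2. Instruction Decode and Register Fetch: A <- R[rs]
    a = regs[ir["rs"]]
    # Step 3. Memory address computation: ALUOut <- A + SExt(Im16)
    alu_out = (a + sign_extend16(ir["imm16"])) & 0xFFFFFFFF
    # Step 4. Memory access: MDR <- Memory[ALUOut]
    mdr = mem[alu_out]
    # Step 5. Load completion: R[rt] <- MDR
    regs[ir["rt"]] = mdr
    return pc


# Usage: lw r10, -4(r11) with r11 = 0x104, so the word at 0x100 is loaded.
regs = [0] * 32
regs[11] = 0x104
mem = {0: {"rs": 11, "rt": 10, "imm16": 0xFFFC}, 0x100: 42}
new_pc = run_lw(mem, regs, 0)
print(regs[10], new_pc)
```

The internal registers IR, A, ALUOut, and MDR appear here as local variables; in hardware they are what carries a value from one cycle to the next.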
19Load Address Calculation
[Figure: multicycle datapath during the load address calculation step. Register A and the sign-extended Instruction 15-0 (16 to 32 bits) feed the ALU, which writes ALU Out.]
Control settings: ALUSrcA=1, ALUSrcB=2, ALUctr=Add, ExtOp=Sign, RegDst=x, RegWrite=0, nPCWrite=0, MemRead=0, IRWrite=0
20Load Memory Read
[Figure: multicycle datapath during the load memory read step. ALU Out supplies the memory address and the data read is latched into MDR.]
Control settings: IorD=1, MemRead=1, RegDst=x, ALUSrcA=x, ALUSrcB=x, ALUctr=x, ExtOp=x, RegWrite=0, nPCWrite=0, IRWrite=0
21Load Write Back
22Step 4: Store
inst    Logical Register Transfers
SW      MEM[R[rs] + SExt(Im16)] <- R[rt]; PC <- PC + 4
Step 1. Instruction Fetch: IR <- MEM[PC], PC <- PC + 4
Step 2. Instruction Decode and Register Fetch: A <- R[rs], B <- R[rt]
Step 3. Memory address computation: ALUOut <- A + SExt(Im16)
Step 4. Memory access: Memory[ALUOut] <- B
23Store Address Calculation
24Store Memory Write
25Step 4: Branch
inst    Logical Register Transfers
BEQ     if R[rs] == R[rt] then PC <- PC + 4 + (SExt(Im16) || 00) else PC <- PC + 4
Step 1. Instruction Fetch: IR <- MEM[PC], PC <- PC + 4
Step 2. Instruction Decode and Register Fetch: A <- R[rs], B <- R[rt], ALUOut <- PC + (SExt(Im16) || 00)
Step 3. Branch completion: if A == B, PC <- ALUOut
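A minimal sketch of the three branch steps follows. It assumes that appending 00 to the sign-extended immediate is the same as shifting it left by two bits, and that the target is computed from the already incremented PC, as in step 2. The function names and the choice to pass register values directly are illustrative assumptions.

```python
def sign_extend16(imm16):
    """SExt(Im16): interpret a 16-bit field as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_beq(pc, rs_value, rt_value, imm16):
    """Execute one BEQ in three steps; returns the next PC."""
    # Step 1. Instruction Fetch: PC <- PC + 4 (the IR load is omitted here)
    pc = pc + 4
    # Step 2. Decode and Register Fetch: A <- R[rs], B <- R[rt],
    #         ALUOut <- PC + (SExt(Im16) || 00)
    a, b = rs_value, rt_value
    alu_out = (pc + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    # Step 3. Branch completion: if A == B, PC <- ALUOut
    if a == b:
        pc = alu_out
    return pc


taken = run_beq(0x100, 7, 7, 3)           # equal operands: branch taken
not_taken = run_beq(0x100, 7, 8, 3)       # unequal operands: fall through
backward = run_beq(0x100, 1, 1, 0xFFFF)   # offset of -1 word
print(hex(taken), hex(not_taken), hex(backward))
```

Computing the target during decode, before the comparison result is known, is what lets the branch finish in its third cycle.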
26Branch Address Calculation
27Branch - Execution
28Multicycle Processor
29Summary of Instruction Steps
30Simple Questions
- How many cycles will it take to execute this code?
        lw  $t2, 0($t3)
        lw  $t3, 4($t3)
        beq $t2, $t3, Label   # assume not
        add $t5, $t2, $t3
        sw  $t5, 8($t3)
  Label: ...
- What is going on during the 8th cycle of execution?
- In what cycle does the actual addition of $t2 and $t3 take place?
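One way to check answers to these questions is to lay the instructions end to end on a cycle timeline. The sketch below assumes the per-instruction cycle counts used elsewhere in this lecture (load 5, store 4, arithmetic 4, branch 3) and that the branch is not taken, so all five instructions execute in order. The helper names are hypothetical.

```python
# Cycles per instruction in the multicycle design (from the lecture's CPI table).
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

program = ["lw", "lw", "beq", "add", "sw"]  # branch assumed not taken


def schedule(instructions):
    """Return (mnemonic, first_cycle, last_cycle) tuples, cycles numbered from 1."""
    timeline = []
    cycle = 1
    for op in instructions:
        length = CYCLES[op]
        timeline.append((op, cycle, cycle + length - 1))
        cycle += length
    return timeline


def locate(timeline, cycle):
    """Return (instruction index, mnemonic, step number) active in a given cycle."""
    for index, (op, first, last) in enumerate(timeline):
        if first <= cycle <= last:
            return index, op, cycle - first + 1
    raise ValueError("cycle is past the end of the program")


timeline = schedule(program)
total_cycles = timeline[-1][2]
eighth = locate(timeline, 8)
# The ALU operation of an R-type instruction happens in its third step.
add_exec_cycle = timeline[3][1] + 2

print(total_cycles, eighth, add_exec_cycle)
```

Under these assumptions the program takes 21 cycles, cycle 8 is the third step (memory address computation) of the second lw, and the add reaches its execution step in cycle 16.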
31Finite State Machine (FSM) Controller
- State specifies control points for Register Transfer
- Transfer occurs upon exiting state (same falling edge)
32FSM for Control
33Step 4 => Control Specification
34Step 5 => (datapath + state diagram => control)
- Translate RTs into control points
- Assign states
- Then go build the controller
35Mapping RTs to Control Points
36Assigning States
37Control Logic Datapath Control Outputs
38Control Logic Next State Function
39PLA Implementation
40Performance Evaluation
- What is the average CPI?
- State diagram gives CPI for each instruction type
- Workload gives frequency of each type
Type          CPIi for type   Frequency   CPIi x freqIi
Arith/Logic   4               40%         1.6
Load          5               30%         1.5
Store         4               10%         0.4
branch        3               20%         0.6
Average CPI                               4.1
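The average is a frequency-weighted sum of the per-type CPIs. A short sketch using the numbers from the table, with the frequencies read as percentages of the workload:

```python
# (CPI for the type, fraction of the workload), taken from the table.
workload = {
    "Arith/Logic": (4, 0.40),
    "Load": (5, 0.30),
    "Store": (4, 0.10),
    "branch": (3, 0.20),
}

# Average CPI = sum over types of CPI_i * freq_i
contributions = {name: cpi * freq for name, (cpi, freq) in workload.items()}
average_cpi = sum(contributions.values())

print(f"Average CPI = {average_cpi:.1f}")
```

Changing the mix changes the result; a workload with more loads pushes the average toward 5.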
41Another Implementation Style
42Address Select Logic
43Address Select Logic
44Dispatch ROMs
Dispatch ROM 1
Op       Opcode name   Value
000000   R-format      0110
000010   jmp           1001
000100   beq           1000
100011   lw            0010
101011   sw            0010

Dispatch ROM 2
Op       Opcode name   Value
100011   lw            0011
101011   sw            0101
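The two dispatch ROMs are plain lookup tables indexed by the 6-bit opcode, so they can be sketched as dictionaries. The table contents below are copied from the slide; the dispatch function and its name are illustrative assumptions, and the rest of the address select logic is not modeled.

```python
# Opcode -> next-state value, copied from the Dispatch ROM tables.
DISPATCH_ROM_1 = {
    0b000000: 0b0110,  # R-format
    0b000010: 0b1001,  # jmp
    0b000100: 0b1000,  # beq
    0b100011: 0b0010,  # lw
    0b101011: 0b0010,  # sw
}

DISPATCH_ROM_2 = {
    0b100011: 0b0011,  # lw
    0b101011: 0b0101,  # sw
}


def dispatch(rom, opcode):
    """Look up the next-state value for an opcode in one dispatch ROM."""
    return rom[opcode]


LW, SW = 0b100011, 0b101011
print(dispatch(DISPATCH_ROM_1, LW), dispatch(DISPATCH_ROM_1, SW))
print(dispatch(DISPATCH_ROM_2, LW), dispatch(DISPATCH_ROM_2, SW))
```

Note that lw and sw map to the same value in ROM 1 and to different values in ROM 2, which is consistent with the two instructions sharing the address computation step and diverging afterwards.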
45Microprogramming
Microprogramming: designing the control as a program that implements the machine instructions in terms of microinstructions
46Microinstruction ???
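[Figure: Main Memory holds the user program plus data (this can change!): ADD, SUB, AND, . . ., DATA. Inside the CPU, the control memory holds microsequences that drive the execution unit. Each instruction in main memory (one of these) is mapped into one microsequence in control memory (one of these).]
AND microsequence, e.g.: Fetch, Calc Operand Addr, Fetch Operand(s), Calculate, Save Answer(s)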
47Microprogramming a Multicycle Processor
- 1) Choose datapath and sequencer architecture
- 2) Assign states and sequence of each (multicycle) instruction (i.e., define the controller FSM)
- 3) Choose microinstruction format (minimum bits to describe all allowable functions of sequencer and datapath)
- 4) Map instructions into microinstruction sequences
48Designing a Microinstruction Set
- 1) Start with list of control signals
- 2) Group signals together that make sense (called fields)
- 3) Place fields in some logical order (e.g., ALU operation and ALU operands first and microinstruction sequencing last)
- 4) Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals
- 5) To minimize the width, encode operations that will never be used at the same time
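Point 5 can be made concrete by comparing the width of a field that spends one bit per operation with the width of an encoded field. The sketch below is only an illustration: the list of field values is hypothetical and is not the lecture's actual microinstruction format.

```python
def encoded_width_bits(num_values):
    """Bits needed to encode num_values mutually exclusive choices."""
    return max(1, (num_values - 1).bit_length())


# Hypothetical values of one field; only one can be selected per microinstruction.
alu_field_values = ["Add", "Subt", "Or", "Func code", "None"]

one_bit_per_value_width = len(alu_field_values)          # one control bit each
encoded_width = encoded_width_bits(len(alu_field_values))

print(one_bit_per_value_width, encoded_width)
```

The saving comes at the cost of a decoder between the control memory and the datapath, which is why encoding is applied only to operations that never occur together.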
49Multicycle Processor
50Microinstruction fields
51Sequencer
52Microinstructions
Fetch and Decode
R-type instructions
53Microinstructions
Memory-reference
Branch
54Microprogram
55Summary
- Disadvantages of the Single Cycle Processor
  - Long cycle time
  - Cycle time is too long for all instructions except the Load
- Multiple Cycle Processor
  - Divide the instructions into smaller steps
  - Execute each step (instead of the entire instruction) in one cycle
- Partition datapath into equal size chunks to minimize cycle time
- Follow same 5-step method for designing a real processor