Title: Computer Organization and Architecture Chapter 5: The Processor: Datapath and Control
1Computer Organization and Architecture, Chapter 5: The Processor: Datapath and Control
- Yu-Lun Kuo
- Computer Sciences and Information Engineering
- University of Tunghai, Taiwan
- sscc6991_at_gmail.com
25.1 Introduction
- The performance of a machine
- Instruction count
- Clock cycle time
- Clock cycles per instruction (CPI)
- The compiler and the instruction set architecture
- Determine the instruction count required for a given program
35.1 Introduction
- Both the clock cycle time and the number of clock cycles per instruction (CPI)
- Are determined by the implementation of the processor
- We construct the datapath and control unit for two different implementations of the MIPS instruction set
- Single-cycle implementation
- Multi-cycle implementation
45.1 Introduction
- We are going to see how the processor is implemented
- Starting with a very simple processor, and adding some more complexity
5Basic MIPS Implementation
- Include a subset of the MIPS instruction set
- Memory-reference instructions lw and sw
- The ALU instructions add, sub, and, or, slt
- Control flow instructions beq and j
- Generic Implementation
- Use the program counter (PC) to supply the instruction address
- Fetch the instruction from memory
- Read one or two registers
- Use the instruction to decide exactly what to do
6Basic MIPS Implementation
- All instructions use the ALU after reading the registers (except jump)
- Memory-reference instructions use the ALU for address calculation
- Arithmetic-logical instructions use it for the operation execution
- Branches use it for comparison
7Our Processor, sort of
- What's missing?
- How to combine inputs that are joined together
- How to tell each component what to do
9Multiplexers and Controllers
- In the previous figure we have two or more wires going into the input of a component
- This is because, depending on the instruction being executed, different inputs should be provided
- So, based on the instruction, we need to decide which input should be selected
- This is done with a multiplexer
Figure: a multiplexer (MUX) with inputs 1 to n, one selected output, and ceil(log2(n)) control bits
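The multiplexer behavior described above can be sketched in a few lines of Python. This is only a behavioral model, not hardware; the names `select_bits` and `mux` are hypothetical and chosen for illustration.

```python
def select_bits(n_inputs: int) -> int:
    """Control bits needed to pick one of n_inputs, i.e. ceil(log2(n))."""
    if n_inputs < 1:
        raise ValueError("a multiplexer needs at least one input")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floats
    return (n_inputs - 1).bit_length()


def mux(inputs, control: int):
    """Return the input selected by the control value."""
    if not 0 <= control < len(inputs):
        raise ValueError("control value does not name an input")
    return inputs[control]
```

For example, a 2-input multiplexer needs 1 control bit, and a 5-input one needs 3.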
10What about the Control?
- So great, now we can control multiplexers
- We need a controller that sends the appropriate control bits to all the multiplexers and the components
- Besides, there are other things to control
- Example: the ALU has a bunch of control bits that tell it what to do
- 2-bit control: 00 = ADD, 01 = SUB, 10 = MUL, 11 = SHIFT
11Control Unit (Simplified)
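Figure: a multiplexer with a 0-or-1 control signal selects between input 0 and input 1 to produce the next PC; the figure also shows an adder (Add) with the constant 4 and an offset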
12A More Complete Picture
145.2 Logic Design Conventions
- The functional units in the MIPS implementation consist of two different types of logic elements
- Elements that operate on data values (combinational)
- Outputs depend only on the current inputs
- Always produce the same output for the same input
- Have no internal storage
- Elements that contain state (sequential)
- Have at least two inputs and one output
- Input: the data value to be written into the element
- Input: the clock, which determines when the data value is written
- Output: the value that was written in a previous clock cycle
15Clocking Methodology
- Clocking methodology
- Defines when signals can be read and when they can be written
- If a signal is written at the same time it is read, the value read is unpredictable; computer designs cannot tolerate such unpredictability
- The clock cycle/period is divided into two portions
- high clock
- low clock
Figure: clock waveform marking the rising edge, the falling edge, and one clock cycle
16Edge-triggered Clocking
- Edge-triggered clocking
- Means that state changes (in state elements) occur only at a clock edge
- Using either the rising edge or the falling edge
- Typical execution
- Read contents of some state elements
- Send values through some combinational logic
- Write results to one or more state elements
Figure: state element 1 feeds combinational logic, which feeds state element 2, all within one clock cycle
17The Clock
Figure: state element 1 (stable) feeds a combinatorial circuit whose output is stable by the clock edge; state element 2 is updated on the edge, one clock cycle later
- In the above, we want to use the value in state element 1 to modify the value in state element 2
- It takes one cycle
- We need all signals to be stabilized by the clock edge
19Read/Write in a Clock Cycle
- A great implication of edge-triggered clocking
- A state element can be read and written in the same clock cycle
- We will say things like "reads happen in the first half of the clock cycle, writes happen in the second half"
Figure: state element 1 (stable) feeds a combinatorial circuit whose output is stable by the clock edge; state element 2 is updated on the edge
21Write Control Signal (p.291)
- Both the clock signal and the write control signal are inputs
- The state element is changed only when
- The write control signal is asserted
- A clock edge occurs
- Assuming a rising edge update
- While the control bit stays at 0, nothing happens
- If we set the control bit to 1, the state element will be updated at the next rising edge
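A minimal behavioral sketch of a state element with a write control signal, assuming a rising-edge update. The class name `StateElement` and its attribute names are hypothetical; the point is only that the stored value changes when both conditions hold, a clock edge and an asserted write signal.

```python
class StateElement:
    """Edge-triggered storage with a write control input (behavioral model)."""

    def __init__(self, value: int = 0):
        self.value = value   # stored state, always visible on the output
        self.data_in = 0     # value waiting on the data input
        self.write = 0       # write control signal (1 = asserted)

    def rising_edge(self) -> None:
        """Model one rising clock edge: update only if write is asserted."""
        if self.write:
            self.value = self.data_in


reg = StateElement(5)
reg.data_in = 9
reg.rising_edge()            # write is 0, so the element keeps its value
value_without_write = reg.value
reg.write = 1
value_before_edge = reg.value  # still unchanged until the next edge
reg.rising_edge()
value_after_edge = reg.value
```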
22Busses and bus width
- Many of the state elements and combinational elements take multi-bit inputs (often 32-bit inputs)
- The term bus refers to a wire that carries more than one bit
- Multiple 1-bit wires, really
- We simply indicate the width of the busses as follows
Figure: example busses labeled with widths 16 and 8, plus a control signal
23Building a Datapath
- A datapath element is a unit in the processor that operates on or holds data
- Instruction memory, data memory, register file, ALU, adders
- Let's re-examine the datapath elements we only barely introduced earlier
24Building a Datapath
- Start by looking at which datapath elements each instruction needs
- Also show their control signals
- Instruction memory
- Memory unit to store the instructions of a program and supply instructions given an address
- Program Counter (PC)
- 32-bit register that is written at the end of every clock cycle (so it does not need a write control signal)
- Adder
- Increments the PC to the address of the next instruction
- Combinational; built from the ALU
25The Three Elements
- Two state elements are needed to store and access instructions
- The instruction memory only provides read access
- Its output at any time reflects the contents of the location specified by the address input
- An adder is needed to compute the next instruction address (PC + 4 bytes)
- An ALU wired to always perform an add
26Fetching Instructions
Figure: the 32-bit PC supplies the read address to the Instruction Memory, which outputs the 32-bit instruction; an adder computes PC + 4, which is latched into the PC
The PC gets updated in 1 clock cycle because we use edge-triggered clocking
27Register File
- The processor's 32 general-purpose registers
- Stored in a structure called the register file
- Register file
- Collection of registers in which any register can be read or written by specifying the number of the register in the file
Figure: register file with a clock input, three 5-bit register-number inputs, a 32-bit write-data input, two 32-bit read-data outputs, and a write control signal
28Datapath Instruction Store/Fetch PC Increment
Three elements used to store and fetch
instructions and increment the PC
Datapath
29Animating the Datapath
Instruction <- MEM[PC]; PC <- PC + 4
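The fetch step (read the instruction at the PC, then advance the PC by 4) can be sketched as follows. This is a simplified model: the instruction memory is represented as a Python dictionary keyed by byte address, and the addresses and the name `fetch` are assumptions made for illustration.

```python
def fetch(instruction_memory: dict, pc: int):
    """Return (instruction, next_pc) for one fetch step."""
    instruction = instruction_memory[pc]   # Instruction <- MEM[PC]
    next_pc = (pc + 4) & 0xFFFFFFFF        # PC <- PC + 4, kept to 32 bits
    return instruction, next_pc


# Two example words: the encodings of lw $t0, 0($s0) and lw $t1, 4($s0)
imem = {0x00400000: 0x8E080000, 0x00400004: 0x8E090004}
first_instruction, pc_after_first = fetch(imem, 0x00400000)
second_instruction, pc_after_second = fetch(imem, pc_after_first)
```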
30What about R-type instructions
- These instructions take 3 registers as arguments
- 1 output register
- 2 input registers
- Example: add $t1, $t2, $t3
- Which reads $t2 and $t3 and writes $t1
- We need an input that contains the data to be written into the output register
- Typically comes from the ALU
- We need a Write signal to trigger the register write on the next clock edge
- A write anytime during the clock cycle could lead to race conditions if that register is also read
31Datapath R-Type Instruction
Two elements used to implement R-type instructions
Datapath
32Register File and ALU
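Figure: Register File with three 5-bit register-number inputs (Read register 1, Read register 2, Write register) extracted from the 32-bit instruction code, a 32-bit Write data input, 32-bit outputs Read data 1 and Read data 2, and the RegWrite control; the ALU takes two 32-bit operands and a 4-bit Operation control and produces a 32-bit result and a zero output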
33add $t1, $t1, $t2 (sketch)
Figure: the instruction supplies $t1 as Read register 1, $t2 as Read register 2, and $t1 as Write register; Read data 1 and Read data 2 feed the ALU (4-bit Operation control), and the 32-bit result returns to Write data; RegWrite must be set only at the next edge
34Animating the Datapath (R-type)
add rd, rs, rt
R[rd] <- R[rs] + R[rt]
35What about the Load/Store
- Ex. lw $t1, offset($t2)
- The memory address is computed by adding the 16-bit signed offset to the input register
- The offset is 16 bits, but memory addresses are 32 bits
- Therefore, the offset must be sign-extended into a 32-bit value before being added to the input register
- The memory has both read and write control
- MemWrite control signal
- MemRead control signal
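The address calculation for lw and sw can be illustrated with a short sketch. The function names `sign_extend16` and `effective_address` are hypothetical; the sketch assumes 32-bit wraparound arithmetic, as in the hardware adder.

```python
def sign_extend16(value: int) -> int:
    """Extend a 16-bit two's-complement value to 32 bits."""
    value &= 0xFFFF
    if value & 0x8000:               # sign bit set: fill the upper 16 bits with 1s
        return value | 0xFFFF0000
    return value


def effective_address(base: int, offset16: int) -> int:
    """Base register value plus the sign-extended 16-bit offset, modulo 2**32."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF
```

With a base of 0x10010000, an offset field of 0x0008 gives 0x10010008, while 0xFFFC (that is, -4) gives 0x1000FFFC.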
36Datapath Load/Store Instruction
Two additional elements used to implement load/stores
Datapath
37Implementing Load/Store
Figure: Data Memory Unit with 32-bit Address and Write data inputs, a 32-bit Read data output, and MemWrite and MemRead controls; Sign-extension Unit that extends a 16-bit input to 32 bits
39Implementing lw $s1, offset($s2)
Figure: the instruction supplies $s2 as Read register 1 and $s1 as Write register; Read data 1 and the offset form the Data Memory Address; the memory's Read data output goes to the register file's Write data input; MemRead is set, MemWrite is not set, and RegWrite is set on the next edge
40Animating the Datapath (Load)
lw rt, offset(rs)
R[rt] <- MEM[R[rs] + s_extend(offset)]
41Animating the Datapath (Store)
sw rt, offset(rs)
MEM[R[rs] + sign_extend(offset)] <- R[rt]
42What about the Branch (beq)
- 2 registers that are compared
- To do a branch we must
- Compute the branch's target address based on its offset
- Decide whether the branch is taken or not taken
- Taken: the branch target address becomes the new PC
- PC = (PC + 4) + 4 x (sign-extended offset field)
- Not taken: if the operands are not equal,
- PC = PC + 4 as usual
43Branch Datapath
No shift hardware required: simply connect wires from input to output, each shifted left 2 bits
Datapath
45Animating the Datapath (branch)
beq rs, rt, offset
if (R[rs] == R[rt]) then PC <- PC + 4 + (s_extend(offset) << 2)
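The beq decision can be sketched as below, assuming the comparison is done by subtracting the two register values and testing for zero, as the ALU does. The function names are hypothetical.

```python
def sign_extend16(value: int) -> int:
    """Extend a 16-bit two's-complement value to 32 bits."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if (value & 0x8000) else value


def next_pc_beq(pc: int, rs_value: int, rt_value: int, offset16: int) -> int:
    """Next PC for beq: branch target if the registers are equal, else PC + 4."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # The offset counts words, so it is shifted left by 2 to get bytes
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0   # ALU subtract, zero test
    return target if zero else pc_plus_4
```

Note that the offset is relative to PC + 4, not to the address of the branch itself.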
46Putting it altogether
- The simplest design is one in which
- All instructions are executed in a single clock cycle
- In this case, every element of the datapath is used only once per clock cycle
- No duplication of hardware needed
- Or only of a few adders perhaps, here and there
- And we need separate Data and Instruction memories
- Let's first put together the pieces for the R-type (ALU) instructions and the memory instructions, as they are quite similar.
47Altogether (not quite)
Combining the datapaths for R-type instructions
and load/stores using two multiplexors
We simply add multiplexers for choosing between the datapath for the ALU instructions and the memory instructions
49Animating the Datapath R-type Instruction
add rd,rs,rt
50Animating the Datapath Load Instruction
lw rt,offset(rs)
51Animating the Datapath Store Instruction
sw rt,offset(rs)
52Separate adder as ALU operations and PC increment
occur in the same clock cycle
Separate instruction memory as instruction and
data read occur in the same clock cycle
Adding instruction fetch
54Complete Altogether
New multiplexor
Extra adder needed as both adders operate in each
cycle
Instruction address is either PC4 or branch
target address
Adding branch capability and
another multiplexor
Important note: in a single-cycle implementation, data cannot be stored during an instruction; it only moves through combinational logic
Question: is the MemRead signal really needed? Think of RegWrite!
555.4 What now?
- At this point we've identified most of the components for an almost full datapath for a very simple implementation of the MIPS ISA
- Let us now design the logic that makes it all work
- i.e., how we set the control signals
56Datapath Executing add
add rd, rs, rt
57Datapath Executing lw
lw rt,offset(rs)
58Datapath Executing sw
sw rt,offset(rs)
59Datapath Executing beq
beq r1,r2,offset
60Control Unit
- Let's go through the types of control signals that need to be generated
- An important set of signals is for the ALU
- Our ALU has four control signals
ALU control | Function
0000 | AND
0001 | OR
0010 | add
0110 | subtract
0111 | set on less than
1100 | NOR
61Controlling the ALU
- Depending on the instruction, the ALU will need to perform one of the first five of these functions
- For load/store, the ALU needs to add
- For R-type instructions, it depends on the 6-bit function field in the low-order bits of the instruction (remember Chapter 2)
- For branch, the ALU needs to subtract
62Controlling the ALU
- We can generate the 4-bit ALU control using a small control unit that takes
- 2 control bits called ALUOp
- add (00), sub (01), depends on the function field (10)
- The instruction's function field
- ALU control inputs are based on
- 2-bit ALUOp control
- 6-bit function code
64Determining ALU Control Bits
Inst. opcode | ALUOp | Inst. operation | Funct field | Desired ALU action | ALU control input
lw | 00 | load | xxxxxx | add | 0010
sw | 00 | store | xxxxxx | add | 0010
beq | 01 | branch | xxxxxx | subtract | 0110
R-type | 10 | add | 100000 | add | 0010
R-type | 10 | subtract | 100010 | subtract | 0110
R-type | 10 | and | 100100 | and | 0000
R-type | 10 | or | 100101 | or | 0001
R-type | 10 | set on less than | 101010 | set on less than | 0111
(x = don't care)
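The table above translates directly into a small lookup. The sketch below is a behavioral model of the ALU control unit, not a gate-level design; the names `FUNCT_TO_ALU` and `alu_control` are hypothetical.

```python
# Function field -> 4-bit ALU control, used only when ALUOp = 10 (R-type)
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # and
    0b100101: 0b0001,  # or
    0b101010: 0b0111,  # set on less than
}


def alu_control(alu_op: int, funct: int) -> int:
    """Return the 4-bit ALU control input for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:        # lw / sw: always add; funct is a don't-care
        return 0b0010
    if alu_op == 0b01:        # beq: always subtract; funct is a don't-care
        return 0b0110
    if alu_op == 0b10:        # R-type: decided by the function field
        return FUNCT_TO_ALU[funct]
    raise ValueError("ALUOp value not used in this design")
```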
65Design ALU Control Unit
- Designing logic
- Useful to create a truth table for the
interesting combinations of the function code
field and the ALUOp bits - It can be optimized and then turned into gates
66The Three Instruction Classes
- R-type, load and store, and branch formats
- Need to add a multiplexor to select which field of the instruction is used to indicate the destination register
- Bits 20-16 (rt) for load
- Bits 15-11 (rd) for R-type instructions
R-type: 0 (bits 31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0)
Load/store: 35 or 43 (bits 31-26) | rs (25-21) | rt (20-16) | address (15-0)
Branch: 4 or 5 (bits 31-26) | rs (25-21) | rt (20-16) | address (15-0)
67New Control Signals
- RegDst: destination comes from rt vs. rd
- RegWrite: register should be written
- ALUSrc: ALU operand from register vs. instruction
- PCSrc: PC from adder vs. branch target
- MemRead: for lw
- MemWrite: for store
- MemtoReg: register write from ALU vs. memory
75The Seven Control Signals
Signal name | Effect when deasserted | Effect when asserted
RegDst | The register destination number comes from the rt field (bits 20-16) | The register destination number comes from the rd field (bits 15-11)
RegWrite | None | The write register is written with the value on the Write data input
ALUSrc | The second ALU operand comes from Read data 2 | The second ALU operand is the sign-extended lower 16 bits of the instruction
PCSrc | The PC is replaced by the output of the adder that computes PC + 4 | The PC is replaced by the output of the adder that computes the branch target
MemRead | None | Data memory contents designated by the address are put on the Read data output
MemWrite | None | Data memory contents designated by the address are replaced by the Write data input
Mem2Reg | The register Write data input comes from the ALU | The register Write data input comes from the data memory
77PCSrc cannot be set directly from the opcode: the zero test outcome is required
Determining control signals for the MIPS datapath based on instruction opcode
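As a rough sketch of how the opcode determines the control signals, the table below uses the standard settings for this simple MIPS subset (R-type, lw, sw, beq). The dictionary layout and the names `MAIN_CONTROL`, `main_control`, and `pc_src` are assumptions for illustration; `None` stands for a don't-care (X) value.

```python
# Opcode -> control signal settings (0 = R-type, 35 = lw, 43 = sw, 4 = beq)
MAIN_CONTROL = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
             MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    43: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    4:  dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def main_control(opcode: int) -> dict:
    """Look up the control signals for a 6-bit opcode."""
    return MAIN_CONTROL[opcode]


def pc_src(opcode: int, zero: int) -> int:
    """PCSrc = Branch AND Zero: the opcode alone cannot decide it."""
    return main_control(opcode)["Branch"] & zero
```

The `pc_src` helper shows why PCSrc is not a direct output of the opcode decoder: it also needs the ALU's zero output.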
78Control Signals R-Type Instruction
Figure: single-cycle datapath with the control signal values for an R-type instruction; control signals shown in blue
79Control Signals lw Instruction
Figure: single-cycle datapath with the control signal values for the lw instruction; control signals shown in blue
80Control Signals sw Instruction
Figure: single-cycle datapath with the control signal values for the sw instruction (X marks a don't-care); control signals shown in blue
81Control Signals beq Instruction
Figure: single-cycle datapath with the control signal values for the beq instruction (X marks a don't-care); control signals shown in blue
82Single-Cycle Design Problems (p.314)
- Assuming a fixed-period clock, every instruction uses one clock cycle, which implies
- CPI = 1
- Cycle time determined by the length of the longest instruction path (load)
- But several instructions could run in a shorter clock cycle: waste of time
- Resources used more than once in the same cycle need to be duplicated
- Waste of hardware and chip area
83Performance of Single-Cycle
- Memory units: 200 ps
- ALU and adders: 100 ps
- Register file (read/write): 50 ps
- Multiplexors, control unit, PC accesses, sign extension, wires: no delay
- Assume instruction mix as follows
- All loads take the same time and comprise 25%
- All stores take the same time and comprise 10%
- R-format instructions comprise 45%
- Branches comprise 15%
- Jumps comprise 5%
- Compare the performance of
- (a) a single-cycle implementation using a fixed-period clock with
- (b) one using a variable-period clock where each instruction executes in one clock cycle that is only as long as it needs to be (not really practical, but pretend it's possible!)
84Solution (1/3)
- CPU time = Instruction_count x CPI x clock_cycle
- CPU time = Instruction_count x clock_cycle (CPI = 1)
- We need only find the clock cycle time, since instruction count and CPI are the same for both implementations
Instruction class | Functional units used by the instruction class
R-type | Inst. fetch, Reg. access, ALU, Reg. access
Load word | Inst. fetch, Reg. access, ALU, Memory access, Reg. access
Store word | Inst. fetch, Reg. access, ALU, Memory access
Branch | Inst. fetch, Reg. access, ALU
Jump | Inst. fetch
85Solution (2/3)
Instruction class | Inst. memory | Reg. read | ALU operation | Data memory | Reg. write | Total
R-type | 200 | 50 | 100 | 0 | 50 | 400 ps
Load word | 200 | 50 | 100 | 200 | 50 | 600 ps
Store word | 200 | 50 | 100 | 200 | | 550 ps
Branch | 200 | 50 | 100 | 0 | | 350 ps
Jump | 200 | | | | | 200 ps
- Machine with a single clock for all instructions
- Clock cycle is determined by the longest instruction: 600 ps
- Machine with a variable clock
- Find the average clock cycle length
- 400 x 45% + 600 x 25% + 550 x 10% + 350 x 15% + 200 x 5% = 447.5 ps
- It is clearly faster: 600 / 447.5 is about 1.34 times
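The comparison above can be reproduced with a short calculation. The unit delays and instruction mix are the ones stated in the example; the dictionary and variable names are hypothetical.

```python
# Functional unit delays in picoseconds, as given in the example
UNIT_DELAY_PS = {"imem": 200, "reg_read": 50, "alu": 100,
                 "dmem": 200, "reg_write": 50}

# Functional units on the critical path of each instruction class
UNITS_USED = {
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "load":   ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "store":  ["imem", "reg_read", "alu", "dmem"],
    "branch": ["imem", "reg_read", "alu"],
    "jump":   ["imem"],
}

# Instruction mix in percent
MIX_PERCENT = {"R-type": 45, "load": 25, "store": 10, "branch": 15, "jump": 5}


def instruction_time(cls: str) -> int:
    """Total delay of the functional units used by one instruction class."""
    return sum(UNIT_DELAY_PS[unit] for unit in UNITS_USED[cls])


# Fixed clock: every instruction pays for the slowest class
fixed_clock_ps = max(instruction_time(c) for c in UNITS_USED)
# Variable clock: weighted average over the instruction mix
variable_clock_ps = sum(instruction_time(c) * MIX_PERCENT[c]
                        for c in UNITS_USED) / 100
speedup = fixed_clock_ps / variable_clock_ps
```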
86Solution (3/3)
- Unfortunately, implementing a variable-speed clock for each instruction class is extremely difficult
- Overhead for such an approach could be larger than any advantage gained
87Example Practice
- Consider a machine with an additional floating point unit. Assume functional unit delays as follows
- Memory: 2 ns, ALU and adders: 2 ns, FPU add: 8 ns, FPU multiply: 16 ns, register file access (read or write): 1 ns
- Multiplexors, control unit, PC accesses, sign extension, wires: no delay
- Assume instruction mix as follows
- All loads take the same time and comprise 31%
- All stores take the same time and comprise 21%
- R-format instructions comprise 27%
- Branches comprise 5%
- Jumps comprise 2%
- FP adds and subtracts take the same time and together comprise 7%
- FP multiplies and divides take the same time and together comprise 7%
- Compare the performance of (a) a single-cycle implementation using a fixed-period clock with (b) one using a variable-period clock where each instruction executes in one clock cycle that is only as long as it needs to be (not really practical, but pretend it's possible!)
88Solution
Instruction class | Instr. mem. | Register read | ALU oper. | Data mem. | Register write | FPU add/sub | FPU mul/div | Total time (ns)
Load word | 2 | 1 | 2 | 2 | 1 | | | 8
Store word | 2 | 1 | 2 | 2 | | | | 7
R-format | 2 | 1 | 2 | 0 | 1 | | | 6
Branch | 2 | 1 | 2 | | | | | 5
Jump | 2 | | | | | | | 2
FP mul/div | 2 | 1 | | | 1 | | 16 | 20
FP add/sub | 2 | 1 | | | 1 | 8 | | 12
- Clock period for fixed-period clock = longest instruction time = 20 ns
- Average clock period for variable-period clock
- = 8 x 31% + 7 x 21% + 6 x 27% + 5 x 5% + 2 x 2% + 20 x 7% + 12 x 7%
- = 8.1 ns
- Therefore, performance(var-period) / performance(fixed-period) = 20 / 8.1, about 2.5
895.5 Multi-Cycle Implementation
- The design of a multi-cycle implementation
- The idea is to have the functional units and a set of additional registers
- To hold important values in between the cycles of a single instruction
- This way a functional unit can be shared between cycles of the same instruction
- Provided some multiplexers are added to decide where the input should come from
- This sharing can help reduce the amount of hardware required
90Multi-cycle Design
- Major advantages
- Allows instructions to take different numbers of clock cycles
- Shares functional units within the execution of a single instruction
- Compared with the single-cycle version
- A single memory unit is used for both instructions and data
- A single ALU (not an ALU and two adders)
- One or more registers are added after every functional unit to hold the output
- Until the value is used in a subsequent clock cycle
91Multi-cycle Design
- The clock cycle can accommodate at most one of the following operations
- Memory access
- Register file access (two reads or one write)
- ALU operation
- So, data produced by one of these three functional units must be saved
- Into a temporary register for use on a later cycle
92Temporary Register
- Instruction register (IR)
- Saves the output of the memory for an instruction read
- Memory data register (MDR)
- Saves the output of the memory for a data read
- A and B registers
- Hold the register operand values read from the register file
- ALUOut register
- Holds the output of the ALU
93 Multi-cycle vs. single-cycle
- Single memory for data and instructions
- Single ALU, no extra adders
- Extra registers to hold data between clock cycles
Figure: single-cycle datapath compared with the multicycle datapath (high-level view)
94Multicycle Datapath
Basic multicycle MIPS datapath: handles R-type instructions and load/stores; new internal registers in red ovals, new multiplexors in blue ovals
95Breaking Instructions into Steps
- Our goal is to break up the instructions into steps so that
- Each step takes one clock cycle
- The amount of work to be done in each step/cycle is about equal
- Each cycle uses each major functional unit at most once, so that such units do not have to be replicated
- Functional units can be shared between different cycles within one instruction
- Data at the end of one cycle that is to be used in the next must be stored!
96Breaking Instructions into Steps
- For MIPS, we can think of the instruction running in five 1-cycle stages
- Instruction fetch and PC increment (IF)
- Instruction decode and register fetch (ID)
- Execution, memory address computation, or branch completion (EX)
- Memory access or R-type instruction completion (MEM)
- Memory read completion (WB)
- Each MIPS instruction takes from 3 to 5 cycles (steps)
98Step 1: Instruction Fetch and PC Increment (IF)
- IR = Memory[PC]; PC = PC + 4
- Use the PC to get the instruction and write the instruction into the instruction register (IR)
- Increment the PC by 4 and put the result back in the PC
- The new value of the PC is not visible until the next clock cycle (stored into ALUOut)
- In this step we don't know yet what the instruction does
100Step 2: Instruction Decode and Register Fetch (ID)
- Read registers rs and rt in case we need them
- Read them from the register file and store the values into the temporary registers A and B
- Compute the branch address with the ALU and save it in a temporary register
- A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2)
102Step 3: Execution, Address Computation or Branch Completion (EX)
- Action to be taken depends on the instruction class
- Memory reference (lw and sw, rs + offset)
- ALUOut = A + sign-extend(IR[15-0])
- Arithmetic-logical instruction (R-type)
- ALUOut = A op B
- Branch (A - B == 0?)
- if (A == B) PC = ALUOut
- Jump
- PC = PC[31-28] concat (IR[25-0] << 2)
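The jump address formation in step 3 can be checked with a small sketch. The function name `jump_target` is hypothetical, and `pc` here is assumed to be the already incremented PC, since step 1 has updated it.

```python
def jump_target(pc: int, instruction: int) -> int:
    """Concatenate PC[31-28] with the 26-bit jump field shifted left by 2."""
    upper = pc & 0xF0000000                      # PC[31-28]
    lower = (instruction & 0x03FFFFFF) << 2      # IR[25-0] << 2, a 28-bit value
    return upper | lower


# A j instruction (opcode 2) whose 26-bit field encodes address 0x00401024
j_instruction = (2 << 26) | (0x00401024 >> 2)
target = jump_target(0x00401020, j_instruction)
```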
104Step 4: Memory Access or R-type Instruction Completion (MEM)
- A load or store instruction accesses memory, and an arithmetic-logical instruction writes its result
- If the instruction is a load
- The value is retrieved from memory and stored into the memory data register (MDR)
- If the instruction is a store
- Data is written to memory
- If the instruction is an R-type instruction
- The ALU result held in the temporary register (ALUOut) is written to rd
106Step 5: Memory Read Completion (WB)
- Loads complete by writing back the value from memory
- Write the load data, which was stored into MDR
- Write back into the register rt
- Reg[IR[20-16]] = MDR
107Summary of Instruction Execution
Step
1 IF
2 ID
3 EX
4 MEM
5 WB
108The schematic view
- IF uses the memory
- ID uses the register file
- EX uses the ALU
- MEM uses the memory
- WB uses the register file
Very important to remember the content of this slide
109Multicycle Execution Step (1)Instruction Fetch
Figure: multicycle datapath during instruction fetch, with the PC + 4 value highlighted
110Multicycle Execution Step (2)Instruction Decode
Register Fetch
- A = Reg[IR[25-21]] (A = Reg[rs])
- B = Reg[IR[20-16]] (B = Reg[rt])
- ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Branch Target Address
111Multicycle Execution Step (3)Memory Reference
Instructions
- ALUOut = A + sign-extend(IR[15-0])
112Multicycle Execution Step (3)ALU Instruction
(R-Type)
113Multicycle Execution Step (3)Branch Instructions
Branch Target Address
114Multicycle Execution Step (3)Jump Instruction
- PC = PC[31-28] concat (IR[25-0] << 2)
Jump Address
115Multicycle Execution Step (4)Memory Access -
Read (lw)
Mem. Data
116Multicycle Execution Step (4)Memory Access -
Write (sw)
117Multicycle Execution Step (4)ALU Instruction
(R-Type)
118Multicycle Execution Step (5)Memory Read
Completion (lw)
119Multicycle Datapath with Control I
with control lines and the ALU control block
added not all control lines are shown
120Multicycle Datapath with Control II
New gates
New multiplexor
For the jump address
Complete multicycle MIPS datapath (with branch
and jump capability) and showing the main control
block and all control lines
121Action of the Control Signals
- Action of the 1-bit control signals
- RegDst, RegWrite
- ALUSrcA
- MemRead, MemWrite, MemtoReg
- IorD
- IRWrite
- PCWrite, PCWriteCond
- Action of the 2-bit control signals
- ALUOp
- ALUSrcB
- PCSource
122Multicycle Control Step (1) Fetch
Figure: multicycle datapath with the control signal values for the instruction fetch step
123Multicycle Control Step (2)Instruction Decode
Register Fetch
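- A = Reg[IR[25-21]] (A = Reg[rs])
- B = Reg[IR[20-16]] (B = Reg[rt])
- ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Figure: multicycle datapath with the control signal values for the instruction decode and register fetch step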
124Multicycle Control Step (3)Memory Reference
Instructions
- ALUOut = A + sign-extend(IR[15-0])
Figure: multicycle datapath with the control signal values for the memory address computation step
125Multicycle Control Step (3)ALU Instruction
(R-Type)
Figure: multicycle datapath with the control signal values for the R-type execution step
126Multicycle Control Step (3)Branch Instructions
Figure: multicycle datapath with the control signal values for the branch completion step; the PC is written only if Zero = 1
127Multicycle Execution Step (3)Jump Instruction
- PC = PC[31-28] concat (IR[25-0] << 2)
Figure: multicycle datapath with the control signal values for the jump completion step
128Multicycle Control Step (4)Memory Access - Read
(lw)
Figure: multicycle datapath with the control signal values for the memory read access step of lw
129Multicycle Execution Steps (4)Memory Access -
Write (sw)
Figure: multicycle datapath with the control signal values for the memory write access step of sw
130Multicycle Control Step (4)ALU Instruction
(R-Type)
- Reg[IR[15-11]] = ALUOut (Reg[rd] = ALUOut)
Figure: complete multicycle datapath (PC, Memory, instruction register, Registers, A and B, ALU, ALUOut, jump address logic) with the control signal values for the R-type completion step
131Multicycle Execution Steps (5)Memory Read
Completion (lw)
Figure: complete multicycle datapath (PC, Memory, instruction register, Registers, A and B, ALU, ALUOut, jump address logic) with the control signal values for the memory read completion step of lw
132CPI in a Multicycle CPU
- What is the CPI assuming each step requires 1 clock cycle?
- An instruction mix of 25% loads, 10% stores, 11% branches, 2% jumps, and 52% ALU
- Solution
- Number of clock cycles from previous slide for each instruction class
- Loads 5, stores 4, ALU 4, branches 3, jumps 3
- CPI = CPU clock cycles / instruction count
- = Σ (instruction count of class i x CPI of class i) / instruction count
- = Σ (instruction count of class i / instruction count) x CPI of class i
- = 0.25 x 5 + 0.10 x 4 + 0.52 x 4 + 0.11 x 3 + 0.02 x 3
- = 4.12
- Better than the worst-case CPI of 5.0
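The CPI figure above is a weighted average, which a few lines of Python can confirm. The cycle counts and mix come from the example; the variable names are hypothetical, and percentages are kept as integers to avoid rounding noise.

```python
# Clock cycles needed by each instruction class in the multicycle design
CYCLES = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}

# Instruction mix in percent
MIX_PERCENT = {"load": 25, "store": 10, "alu": 52, "branch": 11, "jump": 2}

# CPI = sum over classes of (fraction of instructions) x (cycles per class)
cpi = sum(CYCLES[c] * MIX_PERCENT[c] for c in CYCLES) / 100
worst_case_cpi = max(CYCLES.values())
```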
133Conclusion
- If instructions take different amounts of time, multi-cycle is better
- We haven't dived into the gory details of implementing a multi-cycle processor
- What we've talked about covers Sections 5.1, 5.2, 5.3, 5.4, and a small subset of Section 5.5
- This is all you need to read in the book
- Don't worry about most of the stuff in Section 5.5
- We are now ready to talk about our big topic: Pipelining
134 135
- Chapter 5: Datapath and Control
- Single-Cycle Implementation vs. Multi-Cycle Implementation
- MIPS instruction types and formats
- What is a datapath? What are the datapath elements of MIPS?
- What are the five steps of the MIPS datapath?
- Control unit design
- What are the two kinds of control unit design? Describe their implementations and compare them.
- Exceptions and interrupts
- Definitions
- Operations
136Example
- Assume the base address of word array A is stored in register $s0. The following code is used for the calculation A[2] = |A[0] + A[1]|.
- Highlight the running path of the following instructions in blue in the simple datapath and mark the control signals. Assume the first instruction is stored at address 0040 1000 hex.
- lw $t0, 0($s0)
- lw $t1, 4($s0)
- add $t1, $t1, $t0
- slt $t0, $t1, $zero
- beq $t0, $zero, Label
- sub $t1, $zero, $t1
- sw $t1, 8($s0)
- j Exit
- Label: sw $t1, 8($s0)
- Exit:
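To see what the instruction sequence above computes, it can be mirrored line by line in a high-level language. The following Python sketch is an illustration only: it assumes unbounded integers (32-bit overflow is ignored), models the array as a three-element list, and uses the hypothetical names `run_example` and `addresses`. It also lists the address of each instruction, given the stated start address and 4-byte instructions.

```python
def run_example(a0: int, a1: int) -> int:
    """Mirror the MIPS sequence; return the word stored to A[2]."""
    mem = [a0, a1, 0]            # word array A, base address in $s0
    t0 = mem[0]                  # lw  $t0, 0($s0)
    t1 = mem[1]                  # lw  $t1, 4($s0)
    t1 = t1 + t0                 # add $t1, $t1, $t0
    t0 = 1 if t1 < 0 else 0      # slt $t0, $t1, $zero
    if t0 == 0:                  # beq $t0, $zero, Label (taken when sum >= 0)
        mem[2] = t1              # Label: sw $t1, 8($s0)
    else:
        t1 = 0 - t1              # sub $t1, $zero, $t1
        mem[2] = t1              # sw  $t1, 8($s0)
                                 # j   Exit
    return mem[2]


program = [
    "lw $t0, 0($s0)", "lw $t1, 4($s0)", "add $t1, $t1, $t0",
    "slt $t0, $t1, $zero", "beq $t0, $zero, Label", "sub $t1, $zero, $t1",
    "sw $t1, 8($s0)", "j Exit", "Label: sw $t1, 8($s0)",
]
BASE = 0x00401000                # address of the first instruction
addresses = [BASE + 4 * i for i in range(len(program))]

for addr, text in zip(addresses, program):
    print(f"{addr:08x}  {text}")
print(run_example(3, -10), run_example(3, 4))
```

Both paths end with a store to A[2]; the branch only decides whether the sum is negated first.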
137The Simple Datapath with Controls
- (Figure: the simple single-cycle datapath with its control unit, showing the PC, instruction memory, register file, sign-extend unit, ALU and ALU control, data memory, and four multiplexers, with the control lines RegDst, ALUSrc, Mem2Reg, RegWrite, MemRead, MemWrite, Branch and ALUOp)
138lw $t0, 0($s0) / lw $t1, 4($s0)
- (Figure: the simple datapath with controls, with the path used by lw highlighted)
139The Setting of Control Lines
Instruction  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1        1         1        0         0       0       0
140add $t1, $t1, $t0 / slt $t0, $t1, $zero / sub $t1, $zero, $t1
- (Figure: the simple datapath with controls, with the path used by R-type instructions highlighted)
141The Setting of Control Lines
Instruction  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1        1         1        0         0       0       0
R-type       1       0       0        1         0        0         0       1       0
142beq $t0, $zero, Label (the case $t0 = $zero)
- (Figure: the simple datapath with controls, with the path used by beq highlighted for the case where the branch is taken)
143beq $t0, $zero, Label (the case $t0 != $zero)
- (Figure: the simple datapath with controls, with the path used by beq highlighted for the case where the branch is not taken)
144The Setting of Control Lines
Instruction  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1        1         1        0         0       0       0
R-type       1       0       0        1         0        0         0       1       0
beq          x       0       x        0         0        0         1       0       1
145sw $t1, 8($s0)
- (Figure: the simple datapath with controls, with the path used by sw highlighted)
146The Setting of Control Lines
Instruction  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1        1         1        0         0       0       0
R-type       1       0       0        1         0        0         0       1       0
beq          x       0       x        0         0        0         1       0       1
sw           x       1       x        0         0        1         0       0       0
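The completed table can be read as a lookup from instruction class to control line settings. A minimal sketch in Python, assuming the table is encoded as one string per row in column order; the names `SIGNALS`, `CONTROL` and `control_lines` are illustrative, and `x` marks a don't-care value as in the table.

```python
# Column order of the control line table.
SIGNALS = ("RegDst", "ALUSrc", "Mem2Reg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# One character per control line, in the order given by SIGNALS.
CONTROL = {
    "lw":     "011110000",
    "R-type": "100100010",
    "beq":    "x0x000101",
    "sw":     "x1x001000",
}


def control_lines(instr_class: str) -> dict:
    """Return a signal-name -> value mapping for one instruction class."""
    return dict(zip(SIGNALS, CONTROL[instr_class]))


for name in CONTROL:
    lines = control_lines(name)
    asserted = [s for s, v in lines.items() if v == "1"]
    print(f"{name:7s} asserts: {', '.join(asserted)}")
```

The rows with `x` entries are exactly those with RegWrite = 0: when no register is written, the choice of destination register (RegDst) and of write data (Mem2Reg) does not matter.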