Computer Organization and Architecture Chapter 5: The Processor: Datapath and Control - PowerPoint PPT Presentation


1
Computer Organization and Architecture
Chapter 5: The Processor: Datapath and Control
  • Yu-Lun Kuo
  • Computer Sciences and Information Engineering
  • University of Tunghai, Taiwan
  • sscc6991_at_gmail.com

2
5.1 Introduction
  • The performance of a machine is determined by
  • Instruction count
  • Clock cycle time
  • Clock cycles per instruction (CPI)
  • The compiler and the instruction set architecture
    determine the instruction count required for a
    given program

3
5.1 Introduction
  • Both the clock cycle time and the CPI are
    determined by the implementation of the processor
  • We construct the datapath and control unit for
    two different implementations of the MIPS
    instruction set
  • Single-cycle implementation
  • Multi-cycle implementation

4
5.1 Introduction
  • We are going to see how the processor is
    implemented
  • starting with a very simple processor, and adding
    some more complexity

5
Basic MIPS Implementation
  • We implement a subset of the MIPS instruction set
  • Memory-reference instructions lw and sw
  • The ALU instructions add, sub, and, or, slt
  • Control flow instructions beq and j
  • Generic implementation
  • Use the program counter (PC) to supply the
    instruction address
  • Fetch the instruction from memory
  • Read one or two registers
  • Use the instruction to decide exactly what to do

6
Basic MIPS Implementation
  • All instructions except jump use the ALU after
    reading the registers
  • Memory-reference instructions use the ALU for
    address calculation
  • Arithmetic-logical instructions use it to execute
    the operation
  • Branches use it for comparison

7
Our Processor, sort of
  • What's missing?
  • How to combine inputs that are joined together
  • How to tell each component what to do

9
Multiplexers and Controllers
  • In the previous figure, two or more wires go
    into the same input of a component
  • This is because, depending on the instruction
    being executed, a different input should be
    provided
  • So, based on the instruction, we need to decide
    which input should be selected
  • This is done with a multiplexer

[Figure: n-input multiplexer (MUX) with inputs 1 ... n, one selected output, and ceil(log2(n)) control bits]
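As an illustrative software model (not a hardware description), the sketch below treats an n-input multiplexer as a function that passes through the input named by its select value, and computes the ceil(log2(n)) control bits it needs. The helper names `mux` and `select_bits` are hypothetical.

```python
def select_bits(n):
    """Control bits needed to choose among n inputs: ceil(log2(n)), for n >= 2."""
    if n < 2:
        raise ValueError("a multiplexer needs at least two inputs")
    return (n - 1).bit_length()   # exact integer form of ceil(log2(n))

def mux(inputs, select):
    """Model an n-input MUX: pass through the input chosen by `select`."""
    if not 0 <= select < len(inputs):
        raise ValueError("select value does not name an input")
    return inputs[select]

print(select_bits(2), select_bits(3), select_bits(4), select_bits(5))  # 1 2 2 3
print(mux([10, 20, 30, 40], 2))  # 30 (select = binary 10)
```

Using `bit_length` avoids floating-point `log2` entirely, so the bit count is exact for any n.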
10
What about the Control?
  • So now we can control multiplexers
  • We need a controller that sends the appropriate
    control bits to all the multiplexers and the
    other components
  • Besides, there are other things to control
  • Example: the ALU has a set of control bits that
    tell it what to do

Example 2-bit ALU control: 00 ADD, 01 SUB, 10 MUL, 11 SHIFT
11
Control Unit (Simplified)
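[Figure: simplified PC control: a MUX with a 0-or-1 control signal chooses the next PC from two inputs (input 0, input 1) built from the PC, an adder with the constant 4, and an offset]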
12
A More Complete Picture
14
5.2 Logic Design Conventions
  • The functional units in the MIPS implementation
    consist of two different types of logic elements
  • Elements that operate on data values
    (combinational)
  • Outputs depend only on the current inputs
  • Given the same inputs, always produces the same
    output
  • It has no internal storage
  • Elements that contain state (sequential)
  • Has at least two inputs and one output
  • Inputs: the data value to be written into the
    element, and the clock, which determines when
    the data value is written
  • Output: the value that was written in a previous
    clock cycle

15
Clocking Methodology
  • Clocking methodology
  • Defines when signals can be read and when they
    can be written
  • If a signal is written at the same time it is
    read, the value read could be the old value, the
    new value, or a mix of the two; computer designs
    cannot tolerate such unpredictability
  • The clock cycle (period) is divided into two
    portions
  • High clock
  • Low clock

[Figure: clock waveform showing one clock cycle with its rising and falling edges]
16
Edge-triggered Clocking
  • Edge-triggered clocking
  • State changes (in state elements) occur only at
    a clock edge
  • Using either the rising edge or the falling edge
  • Typical execution
  • Read the contents of some state elements
  • Send the values through some combinational logic
  • Write the results to one or more state elements

[Figure: State element 1 -> combinational logic -> State element 2, all within one clock cycle]
17
The Clock
[Figure: state element 1 (stable) -> combinational circuit (stable by the edge) -> state element 2 (updated on the edge), within one clock cycle]
  • In the figure, we use the value in state element
    1 to modify the value in state element 2; this
    takes one clock cycle
  • All signals must be stable before the clock edge
    that updates state element 2

19
Read/Write in a Clock Cycle
  • An important implication of edge-triggered
    clocking
  • A state element can be read and written in the
    same clock cycle without creating a race
  • We will say things like "reads happen in the
    first half of the clock cycle, writes happen in
    the second half"

[Figure: state element 1 (stable) -> combinational circuit (stable by the edge) -> state element 2 (updated on the edge)]
21
Write Control Signal (p.291)
  • Both the clock signal and the write control
    signal are inputs
  • The state element is changed only when both
  • The write control signal is asserted, and
  • A clock edge occurs
  • Assuming a rising-edge update
  • While the control bit stays at 0, nothing happens
  • If we set the control bit to 1, the state element
    will be updated at the next rising edge
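To make the write-enable rule concrete, here is a minimal Python sketch of a state element, assuming rising-edge updates as on this slide. The class name `Register` and its `tick` method are illustrative assumptions; real hardware would be described in an HDL rather than simulated like this.

```python
class Register:
    """Edge-triggered state element with a write control signal (sketch)."""

    def __init__(self, value=0):
        self.value = value   # output: stays stable for the whole clock cycle
        self._clk = 0        # last clock level seen, used to detect edges

    def tick(self, clk, write, data):
        """Apply a new clock level; update only on a rising edge with write asserted."""
        rising = self._clk == 0 and clk == 1
        if rising and write:
            self.value = data
        self._clk = clk
        return self.value

r = Register()
r.tick(1, write=0, data=5)   # rising edge, write deasserted: value stays 0
r.tick(0, write=1, data=5)   # write asserted but no edge: value stays 0
r.tick(1, write=1, data=5)   # rising edge with write asserted: value becomes 5
print(r.value)               # 5
```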

22
Busses and bus width
  • Many of the state elements and combinational
    elements take multi-bit inputs (often 32-bit
    inputs)
  • The term bus refers to a wire that carries more
    than one bit
  • multiple 1-bit wires, really
  • We simply indicate the width of the busses as
    follows

[Figure: buses drawn with their widths written next to them (e.g., 16, 8), alongside a control signal]
23
Building a Datapath
  • A datapath element is a unit in the processor
    that operates on or holds data
  • Examples: instruction memory, data memory,
    register file, ALU, adders
  • Let's re-examine the datapath elements we only
    barely introduced earlier

24
Building a Datapath
  • Start by looking at which datapath elements each
    instruction needs
  • Also show their control signals
  • Instruction memory
  • Memory unit that stores the instructions of a
    program and supplies the instruction at a given
    address
  • Program counter (PC)
  • 32-bit register that is written at the end of
    every clock cycle (so it does not need a write
    control signal)
  • Adder
  • Increments the PC to the address of the next
    instruction
  • Combinational; built from an ALU

25
The Three Elements
  • Two state elements are needed to store and access
    instructions
  • The instruction memory only provides read access
  • Its output at any time reflects the contents of
    the location specified by the address input
  • An adder is needed to compute the next
    instruction address (PC + 4 bytes)
  • An ALU wired to always perform an add

26
Fetching Instructions
[Figure: the 32-bit PC supplies the read address to the instruction memory, which returns the 32-bit instruction; an adder computes PC + 4, which is latched into the PC]
The PC can be read and updated within one clock cycle because we use edge-triggered clocking.
27
Register File
  • The processor's 32 general-purpose registers are
    stored in a structure called the register file
  • Register file
  • A collection of registers in which any register
    can be read or written by specifying the number
    of the register in the file

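[Figure: register file with two 5-bit read register numbers, a 5-bit write register number, 32-bit write data, two 32-bit read data outputs, a write control signal, and the clock]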
28
Datapath: Instruction Store/Fetch and PC Increment

[Figure: three elements used to store and fetch instructions and increment the PC]
29
Animating the Datapath
Instruction <- MEM[PC]; PC <- PC + 4
30
What about R-type instructions
  • These instructions take 3 registers as arguments
  • 1 output register
  • 2 input registers
  • Example: add $t1, $t2, $t3
  • Reads $t2 and $t3 and writes $t1
  • We need an input that carries the data to be
    written into the output register
  • Typically comes from the ALU
  • We need a write signal (RegWrite) to trigger the
    register write on the next clock edge
  • A write at an arbitrary time during the clock
    cycle could lead to race conditions if that
    register is also read

31
Datapath: R-Type Instructions

[Figure: two elements used to implement R-type instructions]
32
Register File and ALU
[Figure: register file (5-bit Read register 1, Read register 2, and Write register numbers extracted from the 32-bit instruction; 32-bit Read data 1, Read data 2, and Write data; RegWrite control) feeding a 32-bit ALU with a 4-bit operation input and a zero output]
33
add $t1, $t1, $t2 (sketch)
[Figure: the instruction supplies $t1 and $t2 as the read register numbers and $t1 as the write register; Read data 1 and Read data 2 feed the ALU, whose result goes to Write data; the register write (RegWrite) takes effect only at the next clock edge]
34
Animating the Datapath (R-type)
add rd, rs, rt
R[rd] <- R[rs] + R[rt]
35
What about the Load/Store
  • Example: lw $t1, offset($t2)
  • The memory address is computed by adding the
    16-bit signed offset to the base register
  • The offset is 16 bits, but memory addresses are
    32 bits
  • Therefore, the offset must be sign-extended to a
    32-bit value before being added to the base
    register
  • The memory has both read and write controls
  • MemWrite control signal
  • MemRead control signal
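A minimal sketch of the sign extension and address computation described above, assuming 32-bit unsigned arithmetic that wraps around. The function names are hypothetical, chosen for illustration.

```python
def sign_extend16(imm):
    """Sign-extend a 16-bit field to a 32-bit two's-complement bit pattern."""
    imm &= 0xFFFF
    if imm & 0x8000:           # bit 15 is the sign bit
        imm |= 0xFFFF0000      # replicate it into bits 31-16
    return imm

def effective_address(base, offset16):
    """lw/sw address: R[rs] + sign_extend(offset), wrapped to 32 bits."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF

print(hex(sign_extend16(0x0004)))              # 0x4
print(hex(sign_extend16(0xFFFC)))              # 0xfffffffc, i.e. -4
print(hex(effective_address(0x1000, 0xFFFC)))  # 0xffc
```

The last line shows why sign extension matters: an offset field of 0xFFFC means "4 bytes below the base", not "65,532 bytes above it".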

36
Datapath: Load/Store Instructions

[Figure: two additional elements used to implement loads and stores]
37
Implementing Load/Store
[Figure: data memory unit (32-bit Address, 32-bit Write data, 32-bit Read data; MemWrite and MemRead controls) and sign-extension unit (16-bit input, 32-bit output)]
39
Implementing lw $s1, offset($s2)
[Figure: $s2 is read from the register file and, with the sign-extended offset, forms the data memory address; MemRead is set and MemWrite is not; the data read is written to $s1, with RegWrite taking effect on the next edge]
40
Animating the Datapath (Load)
lw rt, offset(rs)
R[rt] <- MEM[R[rs] + s_extend(offset)]
41
Animating the Datapath (Store)
sw rt, offset(rs)
MEM[R[rs] + sign_extend(offset)] <- R[rt]
42
What about the Branch (beq)
  • Two registers are compared
  • To do a branch we must
  • Compute the branch's target address from its
    offset
  • Decide whether the branch is taken or not taken
  • Taken: the branch target address becomes the new
    PC
  • PC = (PC + 4) + 4 × (sign-extended offset field)
  • Not taken (the operands are not equal)
  • PC = PC + 4 as usual
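The branch rule above can be sketched in Python as follows; `beq_next_pc` and `signed16` are hypothetical names, and addresses are assumed to wrap at 32 bits.

```python
def signed16(value):
    """Interpret a 16-bit field as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def beq_next_pc(pc, rs_value, rt_value, offset16):
    """Next PC for beq rs, rt, offset (single-cycle view)."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Target = (PC + 4) + (sign-extended offset << 2): the offset counts words.
    target = (pc_plus_4 + (signed16(offset16) << 2)) & 0xFFFFFFFF
    # Taken only when the operands are equal (the ALU subtracts and tests for zero).
    return target if rs_value == rt_value else pc_plus_4

print(hex(beq_next_pc(0x00400000, 7, 7, 3)))       # 0x400010 (taken)
print(hex(beq_next_pc(0x00400000, 7, 8, 3)))       # 0x400004 (not taken)
print(hex(beq_next_pc(0x00400010, 1, 1, 0xFFFF)))  # 0x400010 (offset -1: branches to itself)
```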

43
Branch Datapath
No shift hardware is required: simply connect the wires from input to output, each shifted left 2 bits
45
Animating the Datapath (branch)
beq rs, rt, offset
if (R[rs] == R[rt]) then PC <- PC + 4 + (s_extend(offset) << 2)
46
Putting It All Together
  • The simplest design is one in which all
    instructions are executed in a single clock cycle
  • In this case, every element of the datapath can
    be used only once per clock cycle
  • Little duplication of hardware is needed: perhaps
    only a few adders here and there
  • And we need separate data and instruction
    memories
  • Let's first put together the pieces for the
    R-type (ALU) instructions and the memory
    instructions, as they are quite similar

47
Altogether (Not Quite)
[Figure: combining the datapaths for R-type instructions and loads/stores using two multiplexors]
We simply add multiplexors for choosing between the datapath for the ALU instructions and the one for the memory instructions.
49
Animating the Datapath R-type Instruction
add rd,rs,rt
50
Animating the Datapath Load Instruction
lw rt,offset(rs)
51
Animating the Datapath Store Instruction
sw rt,offset(rs)
52
Adding Instruction Fetch
  • A separate adder is needed because the ALU
    operation and the PC increment occur in the same
    clock cycle
  • A separate instruction memory is needed because
    the instruction read and the data read occur in
    the same clock cycle
54
Complete Altogether: Adding Branch Capability and Another Multiplexor
  • New multiplexor: the instruction address is
    either PC + 4 or the branch target address
  • An extra adder is needed because both adders
    operate in every cycle
  • Important note: in a single-cycle implementation,
    data cannot be stored during an instruction; it
    only moves through combinational logic
  • Question: is the MemRead signal really needed?
    Think of RegWrite!
55
5.4 What now?
  • At this point we've identified most of the
    components of an almost complete datapath for a
    very simple implementation of the MIPS ISA
  • Let us now design the logic that makes it all
    work
  • i.e., how we set the control signals

56
Datapath Executing add
add rd, rs, rt
57
Datapath Executing lw
lw rt,offset(rs)
58
Datapath Executing sw
sw rt,offset(rs)
59
Datapath Executing beq
beq r1,r2,offset
60
Control Unit
  • Let's go through the types of control signals
    that need to be generated
  • An important set of signals is for the ALU
  • Our ALU has four control lines

ALU control lines | Function
0000 | AND
0001 | OR
0010 | add
0110 | subtract
0111 | set on less than
1100 | NOR
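A behavioral sketch of an ALU that accepts the 4-bit control codes in this table and also produces the zero output used by branches. It assumes 32-bit operands and a signed comparison for set on less than; `alu` is an illustrative name, not part of any library.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def alu(control, a, b):
    """Return (result, zero) for the 4-bit ALU control codes in the table."""
    a &= MASK32
    b &= MASK32
    if control == 0b0000:      # AND
        result = a & b
    elif control == 0b0001:    # OR
        result = a | b
    elif control == 0b0010:    # add
        result = (a + b) & MASK32
    elif control == 0b0110:    # subtract
        result = (a - b) & MASK32
    elif control == 0b0111:    # set on less than (signed)
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif control == 0b1100:    # NOR
        result = ~(a | b) & MASK32
    else:
        raise ValueError(f"unused ALU control {control:04b}")
    return result, result == 0

print(alu(0b0010, 2, 3))           # (5, False)
print(alu(0b0110, 7, 7))           # (0, True): equal operands set zero
print(alu(0b0111, 0xFFFFFFFF, 1))  # (1, False): -1 < 1 as signed values
```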
61
Controlling the ALU
  • Depending on the instruction, the ALU will need
    to perform one of the first five functions (NOR
    is not needed for our subset)
  • For load/store, the ALU needs to add
  • For R-type instructions, the operation depends on
    the 6-bit function (funct) field in the low-order
    bits of the instruction (remember Chapter 2)
  • For branch, the ALU needs to subtract

62
Controlling the ALU
  • We can generate the 4-bit ALU control using a
    small control unit that takes
  • 2 control bits called ALUOp
  • add (00), subtract (01), depends on the funct
    field (10)
  • The instruction's function field
  • ALU control inputs are based on
  • The 2-bit ALUOp control
  • The 6-bit function code

64
Determining ALU Control Bits (xxxxxx = don't care)
Inst. opcode | ALUOp | Inst. operation | Funct field | Desired ALU action | ALU control input
lw | 00 | load | xxxxxx | add | 0010
sw | 00 | store | xxxxxx | add | 0010
beq | 01 | branch | xxxxxx | subtract | 0110
R-type | 10 | add | 100000 | add | 0010
R-type | 10 | subtract | 100010 | subtract | 0110
R-type | 10 | and | 100100 | and | 0000
R-type | 10 | or | 100101 | or | 0001
R-type | 10 | set on less than | 101010 | set on less than | 0111
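The table translates directly into a lookup. This sketch, with the hypothetical function name `alu_control`, returns the 4-bit ALU control input from ALUOp and the funct field, and rejects combinations outside the subset covered here.

```python
def alu_control(alu_op, funct):
    """4-bit ALU control input from the 2-bit ALUOp and 6-bit funct field."""
    if alu_op == 0b00:            # lw, sw: add to form the address
        return 0b0010
    if alu_op == 0b01:            # beq: subtract to compare
        return 0b0110
    if alu_op == 0b10:            # R-type: decided by the funct field
        by_funct = {
            0b100000: 0b0010,     # add
            0b100010: 0b0110,     # subtract
            0b100100: 0b0000,     # and
            0b100101: 0b0001,     # or
            0b101010: 0b0111,     # set on less than
        }
        try:
            return by_funct[funct & 0x3F]
        except KeyError:
            raise ValueError(f"funct {funct:06b} not in this subset") from None
    raise ValueError(f"ALUOp {alu_op:02b} is not used")

print(f"{alu_control(0b00, 0b101010):04b}")  # 0010: funct ignored for lw/sw
print(f"{alu_control(0b10, 0b101010):04b}")  # 0111: slt
```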
65
Design ALU Control Unit
  • Designing logic
  • Useful to create a truth table for the
    interesting combinations of the function code
    field and the ALUOp bits
  • It can be optimized and then turned into gates

66
The Three Instruction Classes
  • R-type, load and store, and branch formats
  • We need to add a multiplexor to select which field
    of the instruction is used to indicate the
    destination register
  • Bits 20-16 (rt) for a load
  • Bits 15-11 (rd) for an R-type instruction

R-type:     0 (31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0)
Load/store: 35 or 43 (31-26) | rs (25-21) | rt (20-16) | address (15-0)
Branch:     4 or 5 (31-26) | rs (25-21) | rt (20-16) | address (15-0)
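Extracting these fields is just shifting and masking. In the sketch below (the function names are illustrative), `decode` splits a 32-bit word by the bit positions shown, and `write_register` plays the role of the RegDst multiplexor. The example encodings assume the standard MIPS register numbers ($t0 = 8, $t1 = 9, $t2 = 10, $t3 = 11, $s0 = 16).

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op":    (word >> 26) & 0x3F,  # bits 31-26
        "rs":    (word >> 21) & 0x1F,  # bits 25-21
        "rt":    (word >> 16) & 0x1F,  # bits 20-16
        "rd":    (word >> 11) & 0x1F,  # bits 15-11 (R-type)
        "shamt": (word >> 6) & 0x1F,   # bits 10-6  (R-type)
        "funct": word & 0x3F,          # bits 5-0   (R-type)
        "imm":   word & 0xFFFF,        # bits 15-0  (load/store/branch)
    }

def write_register(fields, reg_dst):
    """RegDst multiplexor: 0 selects rt (load), 1 selects rd (R-type)."""
    return fields["rd"] if reg_dst else fields["rt"]

r_fields = decode(0x014B4820)   # add $t1, $t2, $t3
lw_fields = decode(0x8E080008)  # lw  $t0, 8($s0)
print(r_fields["op"], r_fields["rs"], r_fields["rt"], write_register(r_fields, 1))  # 0 10 11 9
print(lw_fields["op"], lw_fields["rs"], write_register(lw_fields, 0), lw_fields["imm"])  # 35 16 8 8
```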
67
New Control Signals
  • RegDst: destination comes from rt vs. rd
  • RegWrite: register should be written
  • ALUSrc: ALU operand from register vs.
    instruction
  • PCSrc: PC from adder vs. branch target
  • MemRead: for lw
  • MemWrite: for sw
  • MemtoReg: register write data from ALU vs.
    memory

69
The Seven Control Signals
Signal name | Effect when deasserted | Effect when asserted
RegDst | The register destination number comes from the rt field (bits 20-16) | The register destination number comes from the rd field (bits 15-11)
RegWrite | None | The register on the Write register input is written with the value on the Write data input
ALUSrc | The second ALU operand comes from Read data 2 | The second ALU operand is the sign-extended lower 16 bits of the instruction
PCSrc | The PC is replaced by the output of the adder that computes PC + 4 | The PC is replaced by the output of the adder that computes the branch target
MemRead | None | Data memory contents designated by the address are put on the Read data output
MemWrite | None | Data memory contents designated by the address are replaced by the value on the Write data input
MemtoReg | The value fed to the register Write data input comes from the ALU | The value fed to the register Write data input comes from the data memory
77
[Figure: determining the control signals for the MIPS datapath from the instruction opcode]
Note: PCSrc cannot be set directly from the opcode; the outcome of the ALU zero test is also required.
78
Control Signals R-Type Instruction
[Figure: datapath executing an R-type instruction, with the control signal values shown in blue]
79
Control Signals lw Instruction
[Figure: datapath executing lw, with the control signal values shown in blue]
80
Control Signals sw Instruction
[Figure: datapath executing sw, with the control signal values shown in blue]
81
Control Signals beq Instruction
[Figure: datapath executing beq, with the control signal values shown in blue]
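The control values in these figures can be summarized as a table indexed by opcode. The sketch below uses the values commonly given for this single-cycle subset (treat them as an assumption to check against the figures); `None` marks a don't-care (X). It also introduces a separate `Branch` signal, an assumption not among the seven signals listed earlier, which is ANDed with the ALU zero output to form PCSrc, matching the note that PCSrc cannot come from the opcode alone.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp")

# opcode -> signal values in SIGNALS order; None = don't care (X)
CONTROL_TABLE = {
    0:  (1,    0, 0,    1, 0, 0, 0, 0b10),  # R-type
    35: (0,    1, 1,    1, 1, 0, 0, 0b00),  # lw
    43: (None, 1, None, 0, 0, 1, 0, 0b00),  # sw
    4:  (None, 0, None, 0, 0, 0, 1, 0b01),  # beq
}

def main_control(opcode):
    """Decode the 6-bit opcode into the main control signals (subset only)."""
    if opcode not in CONTROL_TABLE:
        raise ValueError(f"opcode {opcode} not in this subset")
    return dict(zip(SIGNALS, CONTROL_TABLE[opcode]))

def pc_src(signals, zero):
    """PCSrc = Branch AND Zero: it needs the ALU result, not just the opcode."""
    return 1 if signals["Branch"] == 1 and zero else 0

beq_signals = main_control(4)
print(pc_src(beq_signals, zero=True), pc_src(beq_signals, zero=False))  # 1 0
```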
82
Single-Cycle Design Problems (p.314)
  • Assuming a fixed-period clock, every instruction
    uses one clock cycle, which implies
  • CPI = 1
  • Cycle time is determined by the length of the
    longest instruction path (load)
  • But several instructions could run in a shorter
    clock cycle: a waste of time
  • Resources used more than once in the same cycle
    need to be duplicated
  • A waste of hardware and chip area

83
Performance of Single-Cycle
  • Memory units: 200 ps
  • ALU and adders: 100 ps
  • Register file (read or write): 50 ps
  • Multiplexors, control unit, PC accesses, sign
    extension, wires: no delay
  • Assume the following instruction mix
  • All loads take the same time and comprise 25%
  • All stores take the same time and comprise 10%
  • R-format instructions comprise 45%
  • Branches comprise 15%
  • Jumps comprise 5%
  • Compare the performance of
  • (a) a single-cycle implementation using a
    fixed-period clock with
  • (b) one using a variable-period clock where each
    instruction executes in one clock cycle that is
    only as long as it needs to be (not really
    practical, but pretend it's possible!)

84
Solution (1/3)
  • CPU time = Instruction_count × CPI × clock_cycle
  • CPU time = Instruction_count × clock_cycle
    (since CPI = 1)
  • We need only find the clock cycle time, since
    the instruction count and CPI are the same for
    both implementations

Instruction class | Functional units used by the instruction class
R-type | Inst. fetch | Reg. access | ALU | Reg. access
Load word | Inst. fetch | Reg. access | ALU | Memory access | Reg. access
Store word | Inst. fetch | Reg. access | ALU | Memory access
Branch | Inst. fetch | Reg. access | ALU
Jump | Inst. fetch
85
Solution (2/3)
Instruction class | Inst. memory | Reg. read | ALU operation | Data memory | Reg. write | Total
R-type | 200 | 50 | 100 | 0 | 50 | 400 ps
Load word | 200 | 50 | 100 | 200 | 50 | 600 ps
Store word | 200 | 50 | 100 | 200 | | 550 ps
Branch | 200 | 50 | 100 | 0 | | 350 ps
Jump | 200 | | | | | 200 ps
  • Machine with a single clock for all instructions
  • The clock cycle is determined by the longest
    instruction: 600 ps
  • Machine with a variable clock
  • Find the average clock cycle length
  • 400 × 45% + 600 × 25% + 550 × 10% + 350 × 15%
    + 200 × 5% = 447.5 ps
  • The variable-clock machine is faster, by a factor
    of 600 / 447.5 ≈ 1.34
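The arithmetic of this solution can be re-checked with a few lines of Python. The dictionaries simply restate the delays, paths, and instruction mix given above; the variable names are illustrative.

```python
DELAY_PS = {"imem": 200, "reg_read": 50, "alu": 100, "dmem": 200, "reg_write": 50}

# Functional units on each instruction class's path (from the tables above).
PATH = {
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "load":   ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "store":  ["imem", "reg_read", "alu", "dmem"],
    "branch": ["imem", "reg_read", "alu"],
    "jump":   ["imem"],
}
MIX_PERCENT = {"R-type": 45, "load": 25, "store": 10, "branch": 15, "jump": 5}

path_ps = {cls: sum(DELAY_PS[u] for u in units) for cls, units in PATH.items()}
fixed_ps = max(path_ps.values())  # the slowest instruction sets the fixed clock
variable_ps = sum(MIX_PERCENT[c] * path_ps[c] for c in MIX_PERCENT) / 100

print(path_ps)  # {'R-type': 400, 'load': 600, 'store': 550, 'branch': 350, 'jump': 200}
print(fixed_ps, variable_ps, round(fixed_ps / variable_ps, 2))  # 600 447.5 1.34
```

Keeping the mix as integer percentages and dividing once at the end avoids floating-point noise in the weighted sum.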

86
Solution (3/3)
  • Unfortunately, implementing a variable-speed
    clock for each instruction class is extremely
    difficult
  • Overhead for such an approach could be larger
    than any advantage gained

87
Example Practice
  • Consider a machine with an additional floating
    point unit. Assume the following functional unit
    delays
  • Memory: 2 ns; ALU and adders: 2 ns; FPU add:
    8 ns; FPU multiply: 16 ns; register file access
    (read or write): 1 ns
  • Multiplexors, control unit, PC accesses, sign
    extension, wires: no delay
  • Assume the following instruction mix
  • All loads take the same time and comprise 31%
  • All stores take the same time and comprise 21%
  • R-format instructions comprise 27%
  • Branches comprise 5%
  • Jumps comprise 2%
  • FP adds and subtracts take the same time and
    together comprise 7%
  • FP multiplies and divides take the same time and
    together comprise 7%
  • Compare the performance of (a) a single-cycle
    implementation using a fixed-period clock with
    (b) one using a variable-period clock where each
    instruction executes in one clock cycle that is
    only as long as it needs to be (not really
    practical, but pretend it's possible!)

88
Solution
Instruction class | Instr. mem. | Register read | ALU oper. | Data mem. | Register write | FPU add/sub | FPU mul/div | Total time (ns)
Load word | 2 | 1 | 2 | 2 | 1 | | | 8
Store word | 2 | 1 | 2 | 2 | | | | 7
R-format | 2 | 1 | 2 | 0 | 1 | | | 6
Branch | 2 | 1 | 2 | | | | | 5
Jump | 2 | | | | | | | 2
FP mul/div | 2 | 1 | | | 1 | | 16 | 20
FP add/sub | 2 | 1 | | | 1 | 8 | | 12
  • Clock period for the fixed-period clock = longest
    instruction time = 20 ns
  • Average clock period for the variable-period
    clock
  • 8 × 31% + 7 × 21% + 6 × 27% + 5 × 5% + 2 × 2%
    + 20 × 7% + 12 × 7%
  • = 8.1 ns
  • Therefore, performance(var-period) /
    performance(fixed-period) = 20 / 8.1 ≈ 2.5
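The same kind of script can re-check this practice problem. It restates the delays and instruction mix from the problem statement (variable names are illustrative) and computes the weighted average used in the solution.

```python
DELAY_NS = {"mem": 2, "alu": 2, "reg": 1, "fp_add": 8, "fp_mul": 16}

# Units on each class's path; "mem" is used for both instruction and data memory.
PATH = {
    "load":       ["mem", "reg", "alu", "mem", "reg"],
    "store":      ["mem", "reg", "alu", "mem"],
    "R-format":   ["mem", "reg", "alu", "reg"],
    "branch":     ["mem", "reg", "alu"],
    "jump":       ["mem"],
    "FP add/sub": ["mem", "reg", "fp_add", "reg"],
    "FP mul/div": ["mem", "reg", "fp_mul", "reg"],
}
MIX_PERCENT = {"load": 31, "store": 21, "R-format": 27, "branch": 5,
               "jump": 2, "FP add/sub": 7, "FP mul/div": 7}
assert sum(MIX_PERCENT.values()) == 100  # the mix should cover all instructions

path_ns = {cls: sum(DELAY_NS[u] for u in units) for cls, units in PATH.items()}
fixed_ns = max(path_ns.values())
variable_ns = sum(MIX_PERCENT[c] * path_ns[c] for c in MIX_PERCENT) / 100

print(path_ns)
print(fixed_ns, variable_ns, round(fixed_ns / variable_ns, 2))  # 20 8.1 2.47
```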

89
5.5 Multi-Cycle Implementation
  • The design of a multi-cycle implementation
  • The idea is to have the functional units plus a
    set of additional registers to hold important
    values between the cycles of a single instruction
  • This way, a functional unit can be shared between
    cycles of the same instruction
  • Provided some multiplexers are added to decide
    where the input should come from
  • This sharing can help reduce the amount of
    hardware required

90
Multi-cycle Design
  • Major advantages
  • Instructions can take different numbers of clock
    cycles
  • Functional units can be shared within the
    execution of a single instruction
  • Compared with the single-cycle version
  • A single memory unit is used for both
    instructions and data
  • A single ALU (rather than an ALU and two adders)
  • One or more registers are added after every major
    functional unit to hold its output until the
    value is used in a subsequent clock cycle

91
Multi-cycle Design
  • The clock cycle can accommodate at most one of
    the following operations
  • Memory access
  • Register file access (two reads or one write)
  • ALU operation
  • So, data produced by any of these three
    functional units must be saved into a temporary
    register for use on a later cycle

92
Temporary Register
  • Instruction register (IR)
  • Save the output of the memory for an instruction
    read
  • Memory data register (MDR)
  • Save the output of the memory for a data read
  • A and B registers
  • Hold the register operand values read from the
    register file
  • ALUOut register
  • Hold the output of the ALU

93
Multi-cycle vs. single-cycle
  • Single memory for data and instructions
  • Single ALU, no extra adders
  • Extra registers to hold data between clock
    cycles

[Figure: single-cycle datapath vs. multicycle datapath (high-level view)]
94
Multicycle Datapath
[Figure: basic multicycle MIPS datapath that handles R-type instructions and loads/stores; new internal registers are in red ovals and new multiplexors in blue ovals]
95
Breaking Instructions into Steps
  • Our goal is to break up the instructions into
    steps so that
  • Each step takes one clock cycle
  • The amount of work to be done in each step/cycle
    is about equal
  • Each cycle uses each major functional unit at
    most once, so that such units do not have to be
    replicated
  • Functional units can be shared between different
    cycles within one instruction
  • Data produced at the end of one cycle that is
    needed in the next must be stored!

96
Breaking Instructions into Steps
  • For MIPS, we can think of each instruction as
    running in five 1-cycle stages
  • Instruction fetch and PC increment (IF)
  • Instruction decode and register fetch (ID)
  • Execution, memory address computation, or branch
    completion (EX)
  • Memory access or R-type instruction completion
    (MEM)
  • Memory read completion (WB)
  • Each MIPS instruction takes 3 to 5 cycles (steps)

98
Step 1: Instruction Fetch and PC Increment (IF)
  • IR <- Memory[PC]; PC <- PC + 4
  • Use the PC to get the instruction and write it
    into the instruction register (IR)
  • Increment the PC by 4 and put the result back in
    the PC
  • The new value of the PC is not visible until the
    next clock cycle (stored into ALUOut)
  • In this step we don't know yet what the
    instruction does

100
Step 2: Instruction Decode and Register Fetch (ID)
  • Read registers rs and rt in case we need them
  • Read them from the register file and store the
    values in temporary registers A and B
  • Compute the branch target address with the ALU
    and save it in a temporary register (ALUOut), in
    case the instruction turns out to be a branch
  • A <- Reg[IR[25-21]]; B <- Reg[IR[20-16]];
    ALUOut <- PC + (sign-extend(IR[15-0]) << 2)

102
Step 3: Execution, Address Computation, or Branch Completion (EX)
  • The action taken depends on the instruction
    class
  • Memory reference (lw and sw: rs + offset)
  • ALUOut <- A + sign-extend(IR[15-0])
  • Arithmetic-logical instruction (R-type)
  • ALUOut <- A op B
  • Branch (is A - B = 0?)
  • if (A == B) PC <- ALUOut
  • Jump
  • PC <- PC[31-28] concat (IR[25-0] << 2)

104
Step 4: Memory Access or R-type Instruction Completion (MEM)
  • A load or store instruction accesses memory, and
    an arithmetic-logical instruction writes its
    result
  • If the instruction is a load
  • The value retrieved from memory is stored into
    the memory data register (MDR)
  • MDR <- Memory[ALUOut]
  • If the instruction is a store
  • The data is written to memory
  • Memory[ALUOut] <- B
  • If the instruction is an R-type instruction
  • The ALU result, held in the temporary register
    ALUOut, is written to rd
  • Reg[IR[15-11]] <- ALUOut

106
Step 5: Memory Read Completion (WB)
  • Loads complete by writing back the value from
    memory
  • The load data was stored into MDR
  • Write it back into register rt
  • Reg[IR[20-16]] <- MDR
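To tie the five steps together, here is a small Python sketch that executes one instruction at a time using the register transfers from Steps 1 to 5 and reports how many cycles each took. It is an illustration under stated assumptions, not the datapath itself: it covers only lw, sw, add, sub, beq, and j; memory is a hypothetical dict from byte address to 32-bit word (one memory for instructions and data); and the example program is made up, encoded with the standard MIPS register numbers.

```python
MASK = 0xFFFFFFFF

def sext16(value):
    """Interpret bits 15-0 as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def run_one(s, mem, reg):
    """Run one instruction through the multicycle steps; return the cycles used."""
    # Step 1 (IF): IR <- Memory[PC]; PC <- PC + 4
    s["IR"] = mem[s["PC"]]
    s["PC"] = (s["PC"] + 4) & MASK
    ir = s["IR"]
    op, rs, rt = ir >> 26, (ir >> 21) & 0x1F, (ir >> 16) & 0x1F
    rd, funct = (ir >> 11) & 0x1F, ir & 0x3F

    # Step 2 (ID): A <- Reg[rs]; B <- Reg[rt]; ALUOut <- PC + (sign-extend(imm) << 2)
    s["A"], s["B"] = reg[rs], reg[rt]
    s["ALUOut"] = (s["PC"] + (sext16(ir) << 2)) & MASK

    # Step 3 (EX): branch and jump complete here
    if op == 4:                                   # beq
        if s["A"] == s["B"]:
            s["PC"] = s["ALUOut"]
        return 3
    if op == 2:                                   # j
        s["PC"] = (s["PC"] & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
        return 3
    if op in (35, 43):                            # lw / sw: address computation
        s["ALUOut"] = (s["A"] + sext16(ir)) & MASK
    elif op == 0 and funct == 0x20:               # add
        s["ALUOut"] = (s["A"] + s["B"]) & MASK
    elif op == 0 and funct == 0x22:               # sub
        s["ALUOut"] = (s["A"] - s["B"]) & MASK
    else:
        raise ValueError("instruction not covered by this sketch")

    # Step 4 (MEM): sw and R-type complete here
    if op == 43:
        mem[s["ALUOut"]] = s["B"]                 # Memory[ALUOut] <- B
        return 4
    if op == 0:
        if rd != 0:                               # register 0 stays zero
            reg[rd] = s["ALUOut"]                 # Reg[rd] <- ALUOut
        return 4
    s["MDR"] = mem[s["ALUOut"]]                   # MDR <- Memory[ALUOut]

    # Step 5 (WB): Reg[rt] <- MDR
    if rt != 0:
        reg[rt] = s["MDR"]
    return 5

mem = {
    0x0: 0x8E080008,    # lw  $t0, 8($s0)
    0x4: 0x010A4820,    # add $t1, $t0, $t2
    0x8: 0xAE09000C,    # sw  $t1, 12($s0)
    0x108: 5,           # data word loaded by the lw
}
reg = [0] * 32
reg[16], reg[10] = 0x100, 7           # $s0 = 0x100, $t2 = 7
s = {"PC": 0}
cycles = [run_one(s, mem, reg) for _ in range(3)]
print(cycles, reg[8], reg[9], mem[0x10C])  # [5, 4, 4] 5 12 12
```

Returning early at the end of Step 3 or Step 4 is what lets branches and jumps finish in 3 cycles and stores and R-type instructions in 4, while only loads need all 5.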

107
Summary of Instruction Execution
  • Step 1: IF
  • Step 2: ID
  • Step 3: EX
  • Step 4: MEM
  • Step 5: WB
108
The schematic view
  • IF: uses the memory
  • ID: uses the register file
  • EX: uses the ALU
  • MEM: uses the memory
  • WB: uses the register file
It is very important to remember the content of this slide.
109
Multicycle Execution Step (1): Instruction Fetch
  • IR <- Memory[PC]
  • PC <- PC + 4

[Figure label: PC + 4]
110
Multicycle Execution Step (2): Instruction Decode and Register Fetch
  • A <- Reg[IR[25-21]] (A <- Reg[rs])
  • B <- Reg[IR[20-16]] (B <- Reg[rt])
  • ALUOut <- PC + (sign-extend(IR[15-0]) << 2)

[Figure label: branch target address]
111
Multicycle Execution Step (3): Memory Reference Instructions
  • ALUOut <- A + sign-extend(IR[15-0])

112
Multicycle Execution Step (3): ALU Instruction (R-Type)
  • ALUOut <- A op B

113
Multicycle Execution Step (3): Branch Instructions
  • if (A == B) PC <- ALUOut

[Figure label: branch target address]
114
Multicycle Execution Step (3): Jump Instruction
  • PC <- PC[31-28] concat (IR[25-0] << 2)

[Figure label: jump address]
115
Multicycle Execution Step (4): Memory Access - Read (lw)
  • MDR <- Memory[ALUOut]

[Figure label: memory data]
116
Multicycle Execution Step (4): Memory Access - Write (sw)
  • Memory[ALUOut] <- B

117
Multicycle Execution Step (4): ALU Instruction (R-Type)
  • Reg[IR[15-11]] <- ALUOut

118
Multicycle Execution Step (5): Memory Read Completion (lw)
  • Reg[IR[20-16]] <- MDR

119
Multicycle Datapath with Control I
[Figure: multicycle datapath with the control lines and the ALU control block added; not all control lines are shown]
120
Multicycle Datapath with Control II
[Figure: complete multicycle MIPS datapath (with branch and jump capability) showing the main control block and all control lines; new gates and a new multiplexor for the jump address are highlighted]
121
Action of the Control Signals
  • Action of the 1-bit control signals
  • RegDst, RegWrite
  • ALUSrcA
  • MemRead, MemWrite, MemtoReg
  • IorD
  • IRWrite
  • PCWrite, PCWriteCond
  • Action of the 2-bit control signals
  • ALUOp
  • ALUSrcB
  • PCSource

122
Multicycle Control Step (1): Fetch
  • IR <- Memory[PC]
  • PC <- PC + 4

[Figure: datapath with the control signal values for this step]
123
Multicycle Control Step (2): Instruction Decode and Register Fetch
  • A <- Reg[IR[25-21]] (A <- Reg[rs])
  • B <- Reg[IR[20-16]] (B <- Reg[rt])
  • ALUOut <- PC + (sign-extend(IR[15-0]) << 2)

[Figure: datapath with the control signal values for this step]
124
Multicycle Control Step (3): Memory Reference Instructions
  • ALUOut <= A + sign-extend(IR[15-0])

(Figure: multicycle datapath annotated with the control-signal values for this step)
125
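Multicycle Control Step (3): ALU Instruction (R-Type)
  • ALUOut <= A op B

(Figure: multicycle datapath annotated with the control-signal values for this step; the ALU operation depends on the funct field)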
126
Multicycle Control Step (3): Branch Instructions
  • if (A == B) PC <= ALUOut

(Figure: multicycle datapath annotated with the control-signal values for this step; the PC is written only if Zero = 1)
127
Multicycle Control Step (3): Jump Instruction
  • PC <= PC[31-28] concat (IR[25-0] << 2)

(Figure: multicycle datapath annotated with the control-signal values for this step)
128
Multicycle Control Step (4): Memory Access - Read (lw)
  • MDR <= Memory[ALUOut]

(Figure: multicycle datapath annotated with the control-signal values for this step)
129
Multicycle Control Step (4): Memory Access - Write (sw)
  • Memory[ALUOut] <= B

(Figure: multicycle datapath annotated with the control-signal values for this step)
130
Multicycle Control Step (4): ALU Instruction (R-Type)
  • Reg[IR[15-11]] <= ALUOut  (Reg[rd] <= ALUOut)

(Figure: complete multicycle datapath annotated with the control-signal values for this step)
131
Multicycle Control Step (5): Memory Read Completion (lw)
  • Reg[IR[20-16]] <= MDR

(Figure: complete multicycle datapath annotated with the control-signal values for this step)
132
CPI in a Multicycle CPU
  • What is the CPI, assuming each step requires 1 clock cycle?
  • Instruction mix: 25% loads, 10% stores, 11% branches, 2% jumps, and 52% ALU instructions
  • Solution
  • Number of clock cycles for each instruction class, from the previous slides
  • loads 5, stores 4, ALU 4, branches 3, jumps 3
  • $\text{CPI} = \dfrac{\text{CPU clock cycles}}{\text{instruction count}}$
  • $= \dfrac{\sum_i \left(\text{instruction count}_i \times \text{CPI}_i\right)}{\text{instruction count}}$
  • $= \sum_i \left(\dfrac{\text{instruction count}_i}{\text{instruction count}}\right) \times \text{CPI}_i$
  • $= 0.25 \times 5 + 0.10 \times 4 + 0.52 \times 4 + 0.11 \times 3 + 0.02 \times 3$
  • $= 4.12$
  • Better than the worst-case CPI of 5.0, which would result if every instruction took as many cycles as lw
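The weighted sum is easy to check mechanically. A minimal Python sketch (the function name `weighted_cpi` is hypothetical) takes the instruction mix as percentages and the per-class cycle counts from the slide:

```python
def weighted_cpi(mix, class_cycles):
    """CPI = sum over classes of (fraction of instructions) x (cycles for that class)."""
    total = sum(mix.values())
    return sum(count / total * class_cycles[cls] for cls, count in mix.items())


mix = {"load": 25, "store": 10, "branch": 11, "jump": 2, "alu": 52}      # percent
class_cycles = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}
print(f"CPI = {weighted_cpi(mix, class_cycles):.2f}")  # CPI = 4.12
```

Dividing by the total rather than assuming the percentages sum to 100 lets the same function accept raw instruction counts.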

133
Conclusion
  • If instructions take different amounts of time, a multicycle implementation is better than a single-cycle one, because each instruction uses only the cycles it needs
  • We haven't dived into the gory details of implementing a multicycle processor
  • What we've covered spans Sections 5.1, 5.2, 5.3, 5.4, and a small subset of Section 5.5
  • This is all you need to read in the book
  • Don't worry about most of the material in Section 5.5
  • We are now ready to talk about our next big topic: pipelining

134
  • Q & A

135
  • Chapter 5: Datapath and Control
  • Single-cycle implementation vs. multicycle implementation
  • MIPS instruction types and formats
  • What is a datapath? What are the datapath elements of MIPS?
  • What are the five steps of the MIPS datapath?
  • Control unit design
  • What are the two kinds of control unit design? Describe their implementations and compare them.
  • Exceptions and interrupts
  • Definitions
  • Operations

136
Example
  • Assume the base address of word array A is stored in register $s0. The following code is used for the calculation
  • A[2] = |A[0] + A[1]|
  • Highlight the execution path of each of the following instructions in blue on the simple datapath, and mark the control signals. Assume the first instruction is stored at address 0x00401000.
  • lw $t0, 0($s0)
  • lw $t1, 4($s0)
  • add $t1, $t1, $t0
  • slt $t0, $t1, $zero
  • beq $t0, $zero, Label
  • sub $t1, $zero, $t1
  • sw $t1, 8($s0)
  • j Exit
  • Label: sw $t1, 8($s0)
  • Exit:
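At the level of the source computation, the program above behaves like the following Python function. It sketches only what the instructions compute, not the datapath: a Python list stands in for word array A (byte offsets 0, 4, and 8 become indices 0, 1, and 2), and values are assumed small enough that add does not overflow (a real MIPS add would raise an overflow exception). The name `run_example` is hypothetical.

```python
def run_example(A):
    """Mirror the MIPS sequence; A[2] ends up as |A[0] + A[1]|."""
    t0 = A[0]                   # lw  $t0, 0($s0)
    t1 = A[1]                   # lw  $t1, 4($s0)
    t1 = t1 + t0                # add $t1, $t1, $t0
    t0 = 1 if t1 < 0 else 0     # slt $t0, $t1, $zero
    if t0 == 0:                 # beq $t0, $zero, Label (taken when the sum >= 0)
        A[2] = t1               # Label: sw $t1, 8($s0)
    else:
        t1 = 0 - t1             # sub $t1, $zero, $t1
        A[2] = t1               # sw  $t1, 8($s0), then j Exit
    return A


print(run_example([3, -10, 0]))  # [3, -10, 7]
```

The slt/beq pair is how the code branches on a sign: slt turns "sum < 0" into 1 or 0, and beq then compares that flag with $zero.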

137
The Simple Datapath with Controls
(Figure: single-cycle datapath with the control unit and its control lines RegDst, Branch, MemRead, Mem2Reg, ALUOp, MemWrite, ALUSrc, and RegWrite)
138
lw $t0, 0($s0) / lw $t1, 4($s0)
(Figure: the simple datapath with the path used by lw highlighted)
139
The Setting of Control Lines
Instruction  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1        1         1        0         0       0       0



140
add $t1, $t1, $t0 / slt $t0, $t1, $zero / sub $t1, $zero, $t1
(Figure: the simple datapath with the path used by R-type instructions highlighted)
141
The Setting of Control Lines
Instruction  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1        1         1        0         0       0       0
R-type       1       0       0        1         0        0         0       1       0
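For R-type instructions (ALUOp1 = 1), the ALU control unit decodes the funct field to pick the actual ALU operation. A sketch in Python, assuming the 3-bit ALU operation encoding of the textbook ALU (010 add, 110 subtract, 000 AND, 001 OR, 111 set on less than) and the standard MIPS funct codes; `alu_control` and `FUNCT_TO_ALU_OP` are hypothetical names.

```python
# funct field (IR[5-0]) -> 3-bit ALU operation
FUNCT_TO_ALU_OP = {
    0b100000: 0b010,  # add -> add
    0b100010: 0b110,  # sub -> subtract
    0b100100: 0b000,  # and -> AND
    0b100101: 0b001,  # or  -> OR
    0b101010: 0b111,  # slt -> set on less than
}


def alu_control(alu_op1, alu_op0, funct):
    """Return the 3-bit ALU operation for the given ALUOp bits and funct field."""
    if alu_op1 == 0 and alu_op0 == 0:
        return 0b010                  # lw/sw: add base + offset
    if alu_op1 == 0 and alu_op0 == 1:
        return 0b110                  # beq: subtract, Zero flags equality
    return FUNCT_TO_ALU_OP[funct]     # ALUOp1 = 1 (R-type): decode funct


print(format(alu_control(1, 0, 0b101010), "03b"))  # 111
```

This two-level split keeps the main control small: it only needs to say "add", "subtract", or "look at funct".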


142
beq $t0, $zero, Label (the case $t0 == $zero)
(Figure: the simple datapath with the path used by a taken beq highlighted)
143
beq $t0, $zero, Label (the case $t0 != $zero)
(Figure: the simple datapath with the path used by a not-taken beq highlighted)
144
The Setting of Control Lines
Instruction  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1        1         1        0         0       0       0
R-type       1       0       0        1         0        0         0       1       0
beq          x       0       x        0         0        0         1       0       1
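The two beq cases differ only in how the next PC is chosen: the branch target replaces PC + 4 when Branch AND Zero is 1. A Python sketch with hypothetical names (`next_pc`, `sign_extend16`). The addresses follow from the example's assumption that the program starts at 0x00401000, which places beq at 0x00401010 and Label at 0x00401020, so the 16-bit offset field is (0x00401020 - 0x00401014) / 4 = 3.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret a 16-bit field as a two's-complement signed integer."""
    return value - (1 << 16) if value & 0x8000 else value


def next_pc(pc, imm16, branch, zero):
    """Select PC + 4, or the branch target PC + 4 + (sext(imm) << 2) when Branch AND Zero."""
    pc_plus_4 = (pc + 4) & MASK32
    target = (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32
    return target if (branch and zero) else pc_plus_4


beq_addr = 0x00401010
print(hex(next_pc(beq_addr, 3, branch=1, zero=1)))  # 0x401020 ($t0 == $zero)
print(hex(next_pc(beq_addr, 3, branch=1, zero=0)))  # 0x401014 ($t0 != $zero)
```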

145
sw $t1, 8($s0)
(Figure: the simple datapath with the path used by sw highlighted)
146
The Setting of Control Lines
Instruction  RegDst  ALUSrc  Mem2Reg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
lw           0       1       1        1         1        0         0       0       0
R-type       1       0       0        1         0        0         0       1       0
beq          x       0       x        0         0        0         1       0       1
sw           x       1       x        0         0        1         0       0       0
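The complete table is what the single-cycle main control unit implements: a function from the opcode field IR[31-26] to the nine control lines. A Python sketch, assuming the standard MIPS opcodes (0 for R-type, 35 for lw, 43 for sw, 4 for beq) and using None for the don't-care (x) entries; `main_control` and the table names are hypothetical.

```python
# opcode field IR[31-26] for the instruction subset
OPCODES = {0b000000: "R-type", 0b100011: "lw", 0b101011: "sw", 0b000100: "beq"}

SIGNAL_NAMES = ("RegDst", "ALUSrc", "Mem2Reg", "RegWrite", "MemRead",
                "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# Rows of the control-line table; None marks a don't-care (x)
CONTROL_TABLE = {
    "R-type": (1, 0, 0, 1, 0, 0, 0, 1, 0),
    "lw":     (0, 1, 1, 1, 1, 0, 0, 0, 0),
    "sw":     (None, 1, None, 0, 0, 1, 0, 0, 0),
    "beq":    (None, 0, None, 0, 0, 0, 1, 0, 1),
}


def main_control(opcode):
    """Map an opcode to a dict of control-line values."""
    row = CONTROL_TABLE[OPCODES[opcode]]
    return dict(zip(SIGNAL_NAMES, row))


print(main_control(0b100011))  # control lines for lw
```

The x entries for sw and beq are safe because neither instruction writes a register (RegWrite = 0), so RegDst and Mem2Reg have no visible effect.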