Title: Computer Architecture Lecture Notes Spring 2005 Dr. Michael P. Frank
- Competency Area 5
- Processor Datapath Control
2Introduction
- We have discussed
- Performance
- Instruction Sets
- Computer Arithmetic
- Now, processor implementation (i.e. hardware for
implementing instructions) through study of the
datapath and control components of a computer.
3Introduction
- Typical MIPS implementation includes the
following components
- For every instruction, the first two steps are the same
- - Instruction Fetch: fetch the instruction from memory at the address in the PC
- - Read Registers: select which register(s) to read (loads and immediate ops read only one register)
- Use the ALU to
- - Calculate an address (memory-reference instructions)
- - Execute operations (arithmetic-logic)
- - Compare registers (branches)
4Introduction
- If the instruction is arithmetic-logical, the result from the ALU is written to a register.
- If the instruction is a load/store, use a path from memory to registers (for reading memory) and from registers to memory (for writing memory).
- Branches will use the ALU output to determine the next instruction. We'll look at more details later.
- Clocking Methodologies
- - It is important to understand logic implementations and clocking when designing machines.
- - We'll introduce some terminology used for this next lecture, most of which involves an understanding of combinational logic.
5Timing Considerations
- Clocking Methodologies
- - It is important to understand logic implementations and clocking when designing machines. A clocking methodology defines when signals can be read and written.
- Some Terminology
- - A logically asserted signal indicates a logic true.
- - To assert a signal means to drive it to true.
- Any processor consists of two types of elements
- - Combinational elements: given the same set of inputs, they always produce the same set of outputs. No internal storage (e.g. ALU).
- - Sequential or state elements: have internal storage, which allows values to be saved and synchronized (e.g. register file, instruction and data memories).
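The distinction between the two element types can be sketched in a few lines of Python. This is only an illustration: the names `alu_add`, `Register`, `set_input`, and `clock_edge` are made up for this sketch and do not come from the slides.

```python
def alu_add(a: int, b: int) -> int:
    """Combinational element: the output depends only on the current inputs."""
    return (a + b) & 0xFFFFFFFF  # wrap to 32 bits


class Register:
    """State element: holds its value until the next active clock edge."""

    def __init__(self) -> None:
        self.value = 0   # output: the value captured on an earlier edge
        self._next = 0   # input waiting to be captured

    def set_input(self, data: int) -> None:
        self._next = data & 0xFFFFFFFF

    def clock_edge(self) -> None:
        # The stored value changes only here, never when the input changes.
        self.value = self._next


# A PC register fed by an adder: state element -> combinational logic -> state element.
pc = Register()
for _ in range(3):
    pc.set_input(alu_add(pc.value, 4))
    pc.clock_edge()
print(pc.value)
```

Calling `alu_add` twice with the same arguments always gives the same result, whereas reading `pc.value` depends on how many clock edges have occurred.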
6State Elements
- A state element has at least two inputs (the value to be written and the clock) and one output (the value that was written in an earlier clock cycle).
- For edge-triggered clocking methodologies, values that are stored in the machine are updated on a clock edge.
7State Elements
- Combinational logic elements must have their data coming from state elements.
- - Inputs are values written in the previous clock cycle; outputs are values that can be used in the following clock cycle.
A clocked system is also called a synchronous system, wherein the signals that are written into state elements must be valid when the active clock edge occurs (i.e. a signal is valid if it is stable or unchanging).
8State Elements
- How do we construct state elements?
- - We use latches and flip-flops.
- - Latches: state changes whenever the input changes and the clock is asserted.
- - Flip-flops: state changes only on a clock edge.
- The simplest memory elements are unclocked, which means that they don't have any clock input.
- - Example: the set-reset (S-R) latch has an output that depends on present and past inputs (not on a clock signal).
9D Latch
- In computers we use clocked memory storage elements.
- - In particular, we use the D latch and D flip-flops.
- Consider the D Latch
- - Two inputs
- - - the data value to be stored (D)
- - - the clock signal (C) indicating when to read and store D
- - Two outputs
- - - the value of the internal state (Q) and its complement
- We use flip-flops to build registers, which become the basic building blocks of smaller memories.
10D Latch
When the clock input, C, is asserted, the latch
is open and the Q output assumes the value of the
D input
(Logical Equation)
11D Flip-flop
- Falling edge-triggered flip-flop where output
changes only on the clock edge
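As a behavioral illustration of the difference between the two, the sketch below models a D latch and builds a falling-edge flip-flop from two latches in a master-slave arrangement. The master-slave construction is an assumption made for this sketch (it is one common way to obtain edge-triggered behavior), and the class names are hypothetical.

```python
class DLatch:
    """Level-sensitive: while C is asserted the latch is open and Q follows D."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, c: int, d: int) -> int:
        if c:              # latch open: Q takes the value of D
            self.q = d
        return self.q      # latch closed: Q holds its old value


class DFlipFlop:
    """Falling-edge triggered flip-flop made of a master and a slave latch."""

    def __init__(self) -> None:
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, c: int, d: int) -> int:
        self.master.update(c, d)                         # master open while C = 1
        return self.slave.update(1 - c, self.master.q)   # slave open while C = 0


latch = DLatch()
print([latch.update(c, d) for c, d in [(1, 1), (0, 0), (1, 0)]])

ff = DFlipFlop()
samples = [(1, 1), (0, 1), (0, 0), (1, 0), (0, 0)]
print([ff.update(c, d) for c, d in samples])
```

In the flip-flop trace, D changes while the clock is low have no effect on Q; the output only picks up the value the master captured just before the clock fell.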
12Register File
- The register file contains a set of registers that can be read and written by supplying a register number to be accessed.
- The register file could be built using D flip-flops.
- - In practice, simpler clocked storage elements are used instead
- - E.g., SRAM cells
- Since reading a register does not change the state, we need only supply the register number as input and the output is the data contained in that register.
13Register File
- The read port can be implemented using a pair of
multiplexors
14Register File
- Writing to a register is a little more complicated.
- - In the write port, we use a decoder to determine which register to write to.
- - When the write signal is asserted, the clock input to only the selected register is asserted.
- - An active edge on the C input only occurs for the selected register.
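The read and write ports described above can be mimicked with a small behavioral model. This is a sketch under assumptions: the class and method names are invented here, 32 registers of 32 bits are assumed, and the decoder is modeled as a one-hot enable list rather than real gates.

```python
class RegisterFile:
    """Behavioral model: two read ports (multiplexors) and one write port (decoder)."""

    def __init__(self, count: int = 32) -> None:
        self.regs = [0] * count

    def read(self, read_reg1: int, read_reg2: int) -> tuple[int, int]:
        # Each read port is a multiplexor: the register number selects one output.
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg: int, data: int, reg_write: bool) -> None:
        # Decoder: exactly one enable line is true, and only if RegWrite is asserted.
        enables = [reg_write and i == write_reg for i in range(len(self.regs))]
        for i, enabled in enumerate(enables):
            if enabled:  # only the selected register sees the active clock edge
                self.regs[i] = data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(17, 5, reg_write=True)
rf.write(18, 7, reg_write=False)   # write signal deasserted: no state change
print(rf.read(17, 18))
```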
15Building the Datapath
- Our ultimate goal is to understand how to build a datapath (i.e. the processor component that performs arithmetic operations) in MIPS hardware, as illustrated below.
16Simple Implementation
- These are some of the functional units that we
need for our instructions.
17Let's Design
- Path for instruction fetch and PC increment
(Figure: instruction fetch datapath; labels: 4, 32)
18Let's Design
- Datapath for R-type instructions (add, sub, etc.)
(Figure: R-type datapath; labels: Inst[25-21], Inst[20-16], Inst[15-11], Inst[31-0])
19Let's Design
- Datapath for load word/store word (lw/sw)
(Figure: lw/sw datapath; labels: Inst[25-21], Inst[20-16], Inst[15-0])
20Let's Design
- Datapath for BEQ instructions
(Figure: beq datapath; labels: 4, Inst[15-0])
21Let's Design
- Datapath for J (jump) instruction
(Figure: jump datapath; labels: control, MUX, 4, 32, Inst[15-0], Inst[25-0], JDest[27-2], JDest[1-0] = 00, JDest[31-28] = PC[31-28])
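The jump destination labels in the figure can be read as a bit-level recipe: the low two bits are 00, bits 27-2 come from instruction bits 25-0, and bits 31-28 come from the PC. A minimal sketch follows, assuming the upper bits are taken from PC + 4 (as in the multicycle PCSource table later in these notes); the function name is hypothetical.

```python
def jump_target(pc: int, instruction: int) -> int:
    """Build the 32-bit jump destination from the PC and a J-format instruction."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    low_28 = (instruction & 0x03FFFFFF) << 2   # Inst[25-0] shifted left by 2, so JDest[1-0] = 00
    high_4 = pc_plus_4 & 0xF0000000            # JDest[31-28] taken from the incremented PC
    return high_4 | low_28


# A jump (opcode 2) whose 26-bit field is 0x100000, fetched at address 0x00400000.
print(hex(jump_target(0x00400000, 0x08100000)))
```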
22Simple Implementation (10/28)
- Recall from last time that we designed a datapath sequence for instruction fetch, R-type instructions (add, sub, and, etc.), load and store word instructions, and branch-equal and jump instructions.
- We used the following elements for individual designs.
- To build a complete datapath, we need to combine the separate datapaths and add some control signals to create a single datapath for instructions.
23Simple Implementation
- Many instructions use the same functional units in their datapath construction. We can use this information to share datapaths for different instructions.
- When we build a single datapath, we can use a mux to select different source inputs.
- Consider the datapath for R-type instructions
24Simple Implementation
- The datapath for memory-reference instructions
- We can combine these instructions by using a mux to select which source data to use (either the sign-extended input or the Read data 2 input).
- Also, we need a mux to select whether data is written to memory or to a register.
25Simple Implementation
- If we include the instruction fetch hardware, the
modified datapath for R-type and
memory-reference instructions is
26Simple Implementation
- The datapath hardware implementation for all 3
instruction classes (R-type, memory-references,
and branches/jumps) is given as
27Control
- We have designed a single datapath for all instructions. How do we determine which instruction gets executed?
- We design control units to specify desired instructions.
- Recall that the ALU operation is selected by 3 control input bits
ALU Control Input   Function
000                 AND
001                 OR
010                 ADD
110                 SUB
111                 SLT
- To design the ALU control unit, we use as inputs the function field of the instruction and a 2-bit control field called ALUOp.
28Control
- Recall that the instruction formats for the 3 different instruction classes are
29Control
- The figure illustrates the ALU control unit, with instruction bits 5-0 identified as the function field for R-type instructions as input to the ALU Control.
30Control
- The following table illustrates how to set the ALU inputs for desired instructions.
Instruction Opcode | ALUOp | Instruction Operation | Funct Field | Desired ALU Action | ALU Control Input
Lw | 00 | Load word | XXXXXX | Add | 010
Sw | 00 | Store word | XXXXXX | Add | 010
beq | 01 | Branch equal | XXXXXX | Subtract | 110
R-type | 10 | Add | 100000 | Add | 010
R-type | 10 | Subtract | 100010 | Subtract | 110
R-type | 10 | AND | 100100 | And | 000
R-type | 10 | OR | 100101 | Or | 001
R-type | 10 | SLT | 101010 | Set less than | 111
- Note that the ALUOp bits are determined by the main control unit: 00 for loads/stores, 01 for beq, and 10 for R-type instructions, which indicates that the operation is encoded in the function field.
- Only for ALUOp = 10 is the function field used to determine the desired ALU action.
31Control
- We must generate a mapping of the 2-bit ALUOp and the 6-bit function code inputs of the ALU control unit to the 3-bit ALU operation.
- We can use a truth table. Note that an ALUOp of 11 is not used, so we can substitute a don't care entry.
- From this truth table, we can generate a hardware implementation of the ALU Control unit using basic logic gates.
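The mapping from ALUOp and the function field to the 3-bit ALU control input can be written out as a lookup, which is a convenient way to check the table before reducing it to gates. This is a behavioral sketch only; the function and dictionary names are made up here, and the unused ALUOp value 11 is rejected rather than treated as a don't care.

```python
# Function field -> 3-bit ALU control input, used only when ALUOp = 10.
FUNCT_TO_ALU = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # subtract
    0b100100: 0b000,  # AND
    0b100101: 0b001,  # OR
    0b101010: 0b111,  # set less than
}


def alu_control(alu_op: int, funct: int) -> int:
    """Return the 3-bit ALU control input for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:      # lw / sw: add to form the address, funct ignored
        return 0b010
    if alu_op == 0b01:      # beq: subtract to compare, funct ignored
        return 0b110
    if alu_op == 0b10:      # R-type: operation encoded in the function field
        return FUNCT_TO_ALU[funct]
    raise ValueError("ALUOp 11 is not used")


for name, op, funct in [("lw", 0b00, 0), ("beq", 0b01, 0), ("slt", 0b10, 0b101010)]:
    print(name, format(alu_control(op, funct), "03b"))
```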
32Control
- Let's consider some examples
- Given the following instruction: lw $s3, 10($s2)
- - i) Identify the machine code for this instruction.
- - ii) Determine the 2-bit ALUOp, 6-bit Function Field, and the 3-bit ALU operation for this instruction.
- - iii) Using the given figure, identify the appropriate datapath for the given instruction.
33Example 1
- There were 9 different examples, (a)-(i). We'll look at 4 of them.
- Example 1: lw $s3, 10($s2)
- - i) Identify the machine code for this instruction.
- - ii) Determine the 2-bit ALUOp, 6-bit Function Field, and the 3-bit ALU operation for this instruction
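For part (i), the machine code follows directly from the MIPS field layout. The sketch below packs the fields for the lw example and for the sub $s2, $s4, $t1 example used later; the helper names `encode_i` and `encode_r` and the small `REG` table are assumptions of this sketch, while the opcode, funct, and register numbers are the standard MIPS values.

```python
# Standard MIPS register numbers for the registers used in the examples.
REG = {"$t1": 9, "$s0": 16, "$s1": 17, "$s2": 18, "$s3": 19, "$s4": 20}


def encode_i(opcode: int, rs: int, rt: int, imm: int) -> int:
    """I-format: opcode[31-26] rs[25-21] rt[20-16] immediate[15-0]."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_r(rs: int, rt: int, rd: int, funct: int, shamt: int = 0) -> int:
    """R-format: opcode 0, rs[25-21] rt[20-16] rd[15-11] shamt[10-6] funct[5-0]."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# lw $s3, 10($s2): opcode 35, base register rs = $s2, destination rt = $s3.
lw_word = encode_i(35, REG["$s2"], REG["$s3"], 10)
# sub $s2, $s4, $t1: rd = $s2, rs = $s4, rt = $t1, funct = 100010.
sub_word = encode_r(REG["$s4"], REG["$t1"], REG["$s2"], 0b100010)

print(f"{lw_word:032b}")
print(f"{sub_word:032b}")
```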
34Example 1
- Example 1: lw $s3, 10($s2)
- - iii) Using the figure given, identify the correct datapath for this instruction.
Control Signals: RegDst = 1, ALUOp = 00, ALUSrc = 0, MemtoReg = 1, PCSrc = 1
35Example 2
- Example 2: addi $s1, $s2, 144
- - i) Identify the machine code for this instruction.
- - ii) Determine the 2-bit ALUOp, 6-bit Function Field, and the 3-bit ALU operation for this instruction
(Figure: machine code fields in decimal and binary)
36Example 2
- Example 2: addi $s1, $s2, 144
- - iii) Using the figure given, identify the correct datapath for this instruction.
Control Signals: RegDst = 1, ALUOp = 00, ALUSrc = 0, MemtoReg = 0, PCSrc = 1
37Example 3
- Example 3: sub $s2, $s4, $t1
- - i) Identify the machine code for this instruction.
- - ii) Determine the 2-bit ALUOp, 6-bit Function Field, and the 3-bit ALU operation for this instruction
(Figure: machine code fields in decimal and binary)
38Example 3
- Example 3: sub $s2, $s4, $t1
- - iii) Using the figure given, identify the correct datapath for this instruction.
Control Signals: RegDst = 0, ALUOp = 10, ALUSrc = 1, MemtoReg = 0, PCSrc = 1
39Example 4
- Example 4: beq $s0, $s1, exit (assume exit is located at 30,000)
- - i) Identify the machine code for this instruction.
- - ii) Determine the 2-bit ALUOp, 6-bit Function Field, and the 3-bit ALU operation for this instruction
(Figure: machine code fields in decimal and binary)
40Example 4
- Example 4: beq $s0, $s1, exit (assume exit is located at 30,000)
- - iii) Using the figure given, identify the correct datapath for this instruction.
Control Signals: RegDst = 1, ALUOp = 01, ALUSrc = 1, MemtoReg = X (set from previous instruction), PCSrc = 1 if branch not taken, 0 if branch taken
41Main Control Unit (11/02)
- We are now ready to discuss the main control unit.
42Main Control Unit
- Consider the control signals for the main control unit.
- There are 7 control signals that can be set in the main control unit (9 in all, if we include the 2-bit ALUOp)
Signal Name | If signal bit = 0 (deasserted) | If signal bit = 1 (asserted)
RegDst | Destination register is given by the rt field (bits 20-16) | Destination register for Write register is given by the rd field (bits 15-11)
RegWrite | No effect | Write data value is written to the register given by the Write register input
ALUSrc | Second ALU operand comes from the register 2 output | Second ALU operand is the sign-extended lower 16 bits of the instruction
PCSrc | PC = PC + 4 (points to next instruction) | PC is replaced by the calculated branch target address
MemRead | No effect | Data memory contents at the specified address are sent to the Read data output
MemWrite | No effect | Data memory contents at the specified address are replaced by the write data
MemtoReg | ALU output is fed back to the write data input | Data memory value is fed to the write data input
43Main Control Unit
- All but one of the control signals are completely determined by the opcode bits 31-26. Do you know which one?
- The following table illustrates the truth table for the control signals for different instruction classes
Instruction RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp
R-TYPE 1 0 0 1 0 0 0 10
LW 0 1 1 1 1 0 0 00
SW X 1 X 0 0 1 0 00
BEQ X 0 X 0 0 0 1 01
44Main Control Unit
- Note that instruction bits 31-26 are the input to the main control unit.
- Also note that for branches, if the zero detect signal is asserted then the PC is updated with the branch target address, hence the need for the AND gate.
45Main Control Unit
- Since the opcode completely characterizes the control unit (with the exception of the PCSrc signal), we can create a truth table that maps the opcode into control signals.
Name    Opcode (decimal)  Opcode in binary (instruction bits 31-26 = Op5-0)
                          Op5 Op4 Op3 Op2 Op1 Op0
R-type  0                 0   0   0   0   0   0
LW      35                1   0   0   0   1   1
SW      43                1   0   1   0   1   1
BEQ     4                 0   0   0   1   0   0
46Main Control Unit
Input or Output Signal R-type LW SW BEQ
Inputs Op5 0 1 1 0
Inputs Op4 0 0 0 0
Inputs Op3 0 0 1 0
Inputs Op2 0 0 0 1
Inputs Op1 0 1 1 0
Inputs Op0 0 1 1 0
Outputs RegDst 1 0 X X
Outputs ALUSrc 0 1 1 0
Outputs MemtoReg 0 1 X X
Outputs RegWrite 1 1 0 0
Outputs MemRead 0 1 0 0
Outputs MemWrite 0 0 1 0
Outputs Branch 0 0 0 1
Outputs ALUOp1 1 0 0 0
Outputs ALUOp0 0 0 0 1
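The truth table above can be transcribed into a lookup keyed by the 6-bit opcode, with PCSrc formed separately by the AND of Branch and the ALU zero signal. This is a behavioral sketch, not gate-level logic; the names `CONTROL`, `main_control`, and `pc_src` are invented for it, and `None` stands for a don't care (X) entry.

```python
X = None  # don't care

CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),   # R-type
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),   # lw
    0b101011: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),   # sw
    0b000100: dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),   # beq
}


def main_control(opcode: int) -> dict:
    """Map instruction bits 31-26 to the control signals of the table."""
    return CONTROL[opcode]


def pc_src(branch: int, zero: int) -> int:
    """PCSrc is the one signal not set by the opcode alone: Branch AND Zero."""
    return branch & zero


signals = main_control(0b000100)            # beq
print(signals["ALUOp"], pc_src(signals["Branch"], zero=1))
```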
47Logic for Control Units
- Simple combinational logic (truth tables)
48Single Cycle Implementation
- Recall the basic single-cycle datapath implementation, as given below.
49Single-cycle versus Multicycle
- Remember that single-cycle hardware implementations have many drawbacks, including
- - (1) Functional unit delay increases as program complexity increases
- - (2) Violation of Design Principle 1: Simplicity favors regularity
- - (3) Inefficient in performance, cost, and hardware utilization
- - - Multiple redundant memory units, adders, etc.
- There are two main alternatives to single-cycle implementations
- - multicycle implementation
- - pipelining (will cover later, if time)
- Multicycle implementations improve performance by breaking instructions into short steps, each of which is executed in a shorter clock cycle.
- - Instructions that require fewer steps can then finish in less time.
- We must now define what the steps of the instruction are...
50Multicycle Implementation (11/04)
- Here is a basic datapath for a multicycle implementation.
- - Control signals are omitted for now, for simplicity.
(Figure: multicycle datapath. Intra-cycle logic: Memory (instructions or data), Registers, ALU. Inter-cycle clocked registers: PC, Instruction register, Memory data register, A register, B register, ALUOut register. Annotations: Cycles 1,4; Cycles 2,5; Cycles 1,2,3.)
51Multicycle Implementation
- Break up the instructions into steps; each step takes a cycle
- - balance the amount of work to be done
- - restrict each cycle to use only one major functional unit
- At the end of a cycle
- - store values for use in later cycles (easiest thing to do)
- - introduce additional internal registers
- ALU is also used to compute addresses and to increment PC
- Memory unit is used for both instructions and data
- The control signals are not solely functions of the instruction
- - Because they must be different in different clock cycles.
- We'll use a finite state machine (FSM) for control
52Multicycle Implementation
- We need multiplexors to specify
- - instruction address or data memory address
- - Destination Register (rt or rd fields)
- - Memory to register output
- - ALUSrcA (either for updating PC or executing instruction)
- - ALUSrcB (ALU input is 1 of 4 specified inputs)
53Control Signals
- Here we've added the major control signals needed for datapath implementations
54Control Signals
Signal Name (1-bit) | Deasserted effect (bit = 0) | Asserted effect (bit = 1)
RegDst | Write register comes from the rt field | Write register is specified by the rd field
RegWrite | None | Write data input is written to the write register
ALUSrcA | First ALU operand is the PC | First ALU operand comes from the A register
MemRead | None | Memory contents specified by the address are sent to the data output
MemWrite | None | Memory contents specified by the address are replaced by the data value
MemtoReg | Register file Write data comes from ALUOut | Register file Write data comes from the MDR (memory)
IorD | PC supplies the address to the memory unit | ALUOut supplies the address to the memory unit
IRWrite | None | Output of memory is written to the IR
PCWrite | None | PC is written; PCSource controls source selection
PCWriteCond | None | PC is written if the zero signal is active (beq)
55Control Signals
Signal Name (2-bit) | Value | Effect
ALUOp | 00 | ALU performs add
ALUOp | 01 | ALU performs subtract
ALUOp | 10 | Function field determines ALU operation
ALUSrcB | 00 | 2nd input to ALU is from the B register
ALUSrcB | 01 | 2nd input to ALU is the constant 4
ALUSrcB | 10 | 2nd input to ALU is the sign-extended lower 16 bits of the IR
ALUSrcB | 11 | 2nd input to ALU is the sign-extended lower 16 bits of the IR, left shifted by 2
PCSource | 00 | Output of the ALU (PC + 4) is sent to PC
PCSource | 01 | ALUOut (branch target address) value is sent to PC
PCSource | 10 | Jump target address is sent to PC (IR[25-0] left shifted by 2, concatenated with (PC + 4)[31-28])
56Control Signals
57Five Execution Steps
- Instruction Fetch
- Instruction Decode and Register Fetch
- Execution, Memory Address Computation, or Branch Completion
- Memory Access or R-type instruction completion
- Write-back step
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
58Step 1 Instruction Fetch
- Use PC to get the instruction and put it in the Instruction Register.
- Increment the PC by 4 and put the result back in the PC.
- Can be described succinctly using RTL ("Register-Transfer Language")
IR <= Memory[PC]
PC <= PC + 4
59Step 2 Instruction Decode Register Fetch
- Read registers rs and rt in case we need them
- Compute the branch address in case the instruction is a branch
- RTL
A <= Reg[IR[25-21]]
B <= Reg[IR[20-16]]
ALUOut <= PC + (sign-extend(IR[15-0]) << 2)
- We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)
60Step 3 (Instruction Dependent)
- ALU is performing one of three functions, based on instruction type
- Memory Reference: ALUOut <= A + sign-extend(IR[15-0])
- R-type: ALUOut <= A op B
- Branch: if (A == B) PC <= ALUOut
61Step 4 R-type or Memory Access
- Loads and stores access memory: MDR <= Memory[ALUOut] or Memory[ALUOut] <= B
- R-type instructions finish: Reg[IR[15-11]] <= ALUOut
The write actually takes place at the end of the cycle, on the edge
62Step 5 Write back Step
- Load data is written back to the register file
- Reg[IR[20-16]] <= MDR
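The five steps can be traced with a tiny behavioral model that applies the register transfers in order and counts cycles. This is a sketch under assumptions: memory is a Python dict keyed by byte address, only lw, sw, beq, and the R-type add/sub are handled, and the function names are hypothetical.

```python
def sign_extend16(value: int) -> int:
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_instruction(pc: int, reg: list, mem: dict) -> tuple[int, int]:
    """Execute one instruction step by step; return (new PC, cycles used)."""
    # Step 1: IR <= Memory[PC]; PC <= PC + 4
    ir = mem[pc]
    pc += 4
    # Step 2: A <= Reg[IR[25-21]]; B <= Reg[IR[20-16]]; ALUOut <= branch target
    opcode = ir >> 26
    a = reg[(ir >> 21) & 31]
    b = reg[(ir >> 16) & 31]
    alu_out = pc + (sign_extend16(ir) << 2)
    # Step 3 onward depends on the instruction type
    if opcode == 4:                       # beq: if (A == B) PC <= ALUOut
        return (alu_out if a == b else pc), 3
    if opcode == 0:                       # R-type: ALUOut <= A op B
        funct = ir & 0x3F
        if funct == 0b100000:
            alu_out = a + b
        elif funct == 0b100010:
            alu_out = a - b
        else:
            raise ValueError("funct not modeled in this sketch")
        reg[(ir >> 11) & 31] = alu_out & 0xFFFFFFFF   # Step 4: Reg[IR[15-11]] <= ALUOut
        return pc, 4
    alu_out = a + sign_extend16(ir)       # memory reference: address computation
    if opcode == 43:                      # sw, Step 4: Memory[ALUOut] <= B
        mem[alu_out] = b
        return pc, 4
    if opcode == 35:                      # lw, Step 4: MDR <= Memory[ALUOut]
        mdr = mem[alu_out]
        reg[(ir >> 16) & 31] = mdr        # Step 5: Reg[IR[20-16]] <= MDR
        return pc, 5
    raise ValueError("opcode not modeled in this sketch")


reg = [0] * 32
reg[18], reg[20], reg[9] = 90, 7, 2                    # $s2, $s4, $t1
mem = {0: 0x8E53000A, 4: 0x02899022, 100: 42}          # lw $s3, 10($s2); sub $s2, $s4, $t1
pc, lw_cycles = run_instruction(0, reg, mem)
pc, sub_cycles = run_instruction(pc, reg, mem)
print(reg[19], reg[18], lw_cycles, sub_cycles)
```

The cycle counts returned match the 3 to 5 cycle range stated for the five execution steps.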
63Summary
64Finite State Machine for Control (11/18)
- (Brief Review of FSMs)
- Recall that when we want to design a sequential circuit, we first develop a finite state machine (FSM) model.
- An FSM's behavior depends on states and inputs.
- - State and inputs determine next state and possible outputs.
- From the FSM transition diagram, we produce a state table.
- - A.k.a. transition table.
- From the transition table, we can produce a sequential circuit.
- Consider the JK Flip-flop Example
- - Recall: J = 1 sets the flip-flop output to 1, K = 1 sets the flip-flop to 0, and J = K = 1 inverts the flip-flop. The characteristic table
J K Q(next)
0 0 Q
0 1 0
1 0 1
1 1 Q'
FSM Review is adopted from www.csis.gvsu.edu/Notes/Architecture/fsm.html
65Finite State Machine for Control
- There are two states defined by the characteristic table, Y and Z. The state table becomes
Present State J K Q Next State
Y 0 0 0 Y
Y 0 1 0 Y
Y 1 0 1 Z
Y 1 1 1 Z
Z 0 0 1 Z
Z 0 1 0 Y
Z 1 0 1 Z
Z 1 1 0 Y
- Consider present state Y
- - To remain in state Y -> J'K' + J'K = J'
- - To change to state Z -> JK' + JK = J
- Consider present state Z
- - To remain in state Z -> J'K' + JK' = K'
- - To change to state Y -> J'K + JK = K
66Finite State Machine for Control
Present State J K Q Next State
Y 0 X 0 Y
Y 1 X 1 Z
Z X 0 1 Z
Z X 1 0 Y
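The reduced table can be checked against the full JK behavior by brute force over all eight combinations of present state and inputs. A minimal sketch, with state Y encoded as Q = 0 and state Z as Q = 1 (an encoding assumed here) and hypothetical function names:

```python
from itertools import product


def jk_next(q: int, j: int, k: int) -> int:
    """Full JK characteristic: hold, reset, set, or toggle."""
    if (j, k) == (0, 0):
        return q
    if (j, k) == (0, 1):
        return 0
    if (j, k) == (1, 0):
        return 1
    return 1 - q


def reduced_next(q: int, j: int, k: int) -> int:
    """Reduced table: in state Y (Q = 0) only J matters; in state Z (Q = 1) only K matters."""
    return j if q == 0 else 1 - k


agree = all(jk_next(q, j, k) == reduced_next(q, j, k)
            for q, j, k in product((0, 1), repeat=3))
print(agree)
```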
- Moore versus Mealy FSMs
- - Moore machines associate outputs with states (i.e. an output symbol is assigned to each state).
- - Mealy machines associate outputs with transitions (i.e. an output is defined by a pair of state and input symbols).
- For multicycle control lines we tend to use the Moore FSM
- - because output depends only on current state
- - less hardware is required to implement it.
67Graphical Specifications of FSM
- The complete control FSM for multicycle
datapath implementation
68Multicycle Implementation
- Recall that for multicycle implementations, we break up an instruction into five basic steps
- - 1) Instruction Fetch
- - 2) Instruction Decode/Register Fetch
- - 3) Execution, address computation, branch/jump completion
- - 4) Memory access or R-type completion
- - 5) Memory Read Completion
- We also determined that the control unit for a multicycle datapath is determined not by instruction classes, but rather by the signals to be set in any step and the next step in the sequence.
- Therefore, we can design our multicycle control using finite state machines (FSMs).
69Multicycle Control Review
- A finite state machine is a set of states and directions on how to change states.
- - The directions are defined by the next state function, which maps the current state and the inputs to a new state.
- - Each state specifies a set of output signals that are asserted when the machine is in that state. Note that if an output is not explicitly asserted, then it is assumed to be deasserted rather than a don't care value.
- The FSM corresponds to the 5 steps of instruction execution.
- - Recall that for the multicycle implementation, each step will execute in one clock cycle, which is also true for the control FSM.
- - Each state in the FSM will execute in a single clock cycle as well.
70Multicycle Control
- A high-level view of the FSM control is illustrated.
- Notice that for all instructions, the first two steps are always the same. The next steps are determined by the instruction opcode and are used to complete the instruction. Upon completion, the control returns to fetch a new instruction.
71Multicycle Control FSM
- How many state bits will we need?
- 10 states are needed for the complete multicycle control FSM.
- Each state is represented by a circle.
- The labels on the arcs are the conditions that are tested.
- The signals shown for each state represent the output signals.
72Graphical Representation of FSM
- Consider the instruction fetch and decode portion
of the multicycle control.
The control sequence for
- Instruction fetch (State 0): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
- Instruction decode (State 1): ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
73Graphical Representation of FSM
- Consider the memory-reference portion of the
multicycle control.
The control sequence for memory reference (States 2-5)
- Address Calculation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- Memory Access (LW): MemRead, IorD = 1
- Memory Access (SW): MemWrite, IorD = 1
- Write back Step (for lw only): RegWrite, MemtoReg = 1, RegDst = 0
74Graphical Representation of FSM
- Consider the R-type instructions in the
multicycle control.
The control sequence for R-type instructions (States 6-7)
- Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- R-type Completion: RegDst = 1, RegWrite, MemtoReg = 0
- Return to IF State
75Graphical Representation of FSM
- Consider the branch instructions in the
multicycle control.
The control sequence for Branch instructions (State 8)
- Branch Completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
- Return to IF State
76Graphical Representation of FSM
- Consider the jump instructions in the multicycle
control.
The control sequence for Jump instructions (State 9)
- Jump Completion: PCWrite, PCSource = 10
- Return to IF State
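The state-by-state control sequences above can be collected into a Moore-style table (outputs per state) plus a next-state function. This is a sketch under assumptions: the slides give states 0, 1, 6-7, 8, and 9 explicitly, while the numbering 2 = address calculation, 3 = lw memory access, 4 = write back, 5 = sw memory access inside "States 2-5" is assumed here, as are the Python names. The jump opcode 2 is the standard MIPS value.

```python
# Signals asserted (or set) in each state; anything not listed is deasserted.
OUTPUTS = {
    0: dict(MemRead=1, ALUSrcA=0, IorD=0, IRWrite=1, ALUSrcB=0b01,
            ALUOp=0b00, PCWrite=1, PCSource=0b00),                   # instruction fetch
    1: dict(ALUSrcA=0, ALUSrcB=0b11, ALUOp=0b00),                    # decode / register fetch
    2: dict(ALUSrcA=1, ALUSrcB=0b10, ALUOp=0b00),                    # address calculation
    3: dict(MemRead=1, IorD=1),                                      # memory access (lw)
    4: dict(RegWrite=1, MemtoReg=1, RegDst=0),                       # write back (lw)
    5: dict(MemWrite=1, IorD=1),                                     # memory access (sw)
    6: dict(ALUSrcA=1, ALUSrcB=0b00, ALUOp=0b10),                    # R-type execution
    7: dict(RegDst=1, RegWrite=1, MemtoReg=0),                       # R-type completion
    8: dict(ALUSrcA=1, ALUSrcB=0b00, ALUOp=0b01,
            PCWriteCond=1, PCSource=0b01),                           # branch completion
    9: dict(PCWrite=1, PCSource=0b10),                               # jump completion
}

RTYPE, J, BEQ, LW, SW = 0, 2, 4, 35, 43   # opcodes


def next_state(state: int, opcode: int) -> int:
    """Next-state function: only states 1 and 2 look at the opcode."""
    if state == 0:
        return 1
    if state == 1:
        return {LW: 2, SW: 2, RTYPE: 6, BEQ: 8, J: 9}[opcode]
    if state == 2:
        return 3 if opcode == LW else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0          # states 4, 5, 7, 8, 9 return to instruction fetch


def state_sequence(opcode: int) -> list:
    """States visited by one instruction, starting at fetch."""
    states = [0]
    while (following := next_state(states[-1], opcode)) != 0:
        states.append(following)
    return states


for name, opcode in [("lw", LW), ("sw", SW), ("R-type", RTYPE), ("beq", BEQ), ("j", J)]:
    print(name, state_sequence(opcode))
```

Because the outputs depend only on the current state, this is a Moore machine; the length of each sequence is the number of clock cycles the instruction takes.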
77Putting it all together
- Graphical Specification of Multicycle control FSM
- How many state bits will we need?
78Finite State Machine for Control
- Implementation for multicycle control FSM
79Programmable Logic Array
PLA Implementation
- Could you explain the control functions of the machine represented by the highlighted vertical lines?
80PLA Implementation
- Vertical Line 1 indicates
- - S3 = 0, S2 = 1, S1 = 1, S0 = 1 -> State 7, which we know is the control for the R-type completion step.
- - Also, RegWrite is asserted and RegDst is asserted.
- Vertical Line 2 indicates
- - Op5 = 1, Op4 = 0, Op3 = 1, Op2 = 0, Op1 = 1, Op0 = 1 (which instruction does this opcode represent?)
- - S3 = 0, S2 = 0, S1 = 1, S0 = 0 -> State 2, which we know is the control for the memory address computation step.