Computer Architecture Lecture Notes Spring 2005 Dr. Michael P. Frank presentation

1

Competency Area 5
Processor Datapath Control

2
Introduction

We have discussed:
- Performance
- Instruction sets
- Computer arithmetic
Now we turn to processor implementation (i.e., the hardware that implements instructions) by studying the datapath and control components of a computer.

3
Introduction

A typical MIPS implementation includes the following components.

For every instruction, the first two steps are the same:
- Instruction fetch: fetch the instruction from memory at the address in the PC.
- Register read: select which register(s) to read (loads and immediate operations read only one register).
The ALU is then used to:
- calculate an address (memory-reference instructions)
- execute the operation (arithmetic-logical instructions)
- compare registers (branches)

4
Introduction

If the instruction is arithmetic-logical, the result from the ALU is written to a register.
If the instruction is a load or store, it uses a path from memory to the registers (for reading memory) or from the registers to memory (for writing memory).
Branches use the ALU output to determine the next instruction. We'll look at more details later.

Clocking Methodologies
It is important to understand logic implementations and clocking when designing machines.
We'll introduce some terminology for this in the next lecture, most of which involves an understanding of combinational logic.

5
Timing Considerations

Clocking Methodologies
It is important to understand logic implementations and clocking when designing machines. A clocking methodology defines when signals can be read and written.
Some Terminology
- An asserted signal is one that is logically true.
- To assert a signal means to drive it to logically true.
Any processor consists of two types of elements:

Combinational elements: given the same set of inputs, they always produce the same set of outputs. They have no internal storage (e.g., the ALU).
Sequential (state) elements: they have internal storage, which allows values to be saved and synchronized (e.g., the register file and the instruction and data memories).
6
State Elements

A state element has at least two inputs (the value to be written and the clock) and one output (the value that was written in an earlier clock cycle).
With an edge-triggered clocking methodology, values stored in the machine are updated only on a clock edge.

7
State Elements

Combinational logic elements must take their data from state elements.
Their inputs are values written in the previous clock cycle; their outputs are values that can be used in the following clock cycle.

A clocked system is also called a synchronous system: the signals written into state elements must be valid when the active clock edge occurs (i.e., a signal is valid if it is stable, or unchanging).
8
State Elements

How do we construct state elements?
- We use latches and flip-flops.
- Latches: the state changes whenever the inputs change while the clock is asserted.
- Flip-flops: the state changes only on a clock edge.
The simplest memory elements are unclocked, which means that they don't have any clock input.
Example: the set-reset (S-R) latch has an output that depends on present and past inputs (not on a clock signal).
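To make the S-R latch concrete, here is a minimal Python sketch that models it as two cross-coupled NOR gates (an assumed gate-level structure; the slide does not show the circuit) and goes around the feedback loop until the outputs stop changing. With S = R = 0 the output keeps its previous value, which is the dependence on past inputs described above. The function names are illustrative only.

```python
def nor(a: int, b: int) -> int:
    """Two-input NOR gate on 0/1 values."""
    return 0 if (a or b) else 1


def sr_latch(s: int, r: int, q: int) -> int:
    """Settle a cross-coupled NOR S-R latch, starting from output q; return the new Q.

    S = R = 1 is not allowed for this latch (it drives both Q and Q' to 0).
    """
    q_bar = 1 - q
    for _ in range(4):  # a few passes around the feedback loop are enough to settle
        new_q = nor(r, q_bar)
        new_q_bar = nor(s, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break  # outputs are stable
        q, q_bar = new_q, new_q_bar
    return q


q = 0
q = sr_latch(1, 0, q)   # set
print(q)                # 1
q = sr_latch(0, 0, q)   # hold: the output depends on the past input
print(q)                # 1
q = sr_latch(0, 1, q)   # reset
print(q)                # 0
```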

9
D Latch

In computers we use clocked memory storage
elements.
In particular, we use the D latch and D
flip-flops.
Consider the D latch:
Two inputs:
- the data value to be stored (D)
- the clock signal (C), indicating when to read and store D
Two outputs:
- the value of the internal state (Q) and its complement
We use flip-flops to build registers, which become the basic building blocks of smaller memories.

10
D Latch
When the clock input, C, is asserted, the latch is open and the Q output assumes the value of the D input; when C is deasserted, Q holds its value.
As a logical equation: Q(next) = C·D + C'·Q
11
D Flip-flop

A falling-edge-triggered flip-flop: the output changes only on the (falling) clock edge.
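The difference between the level-sensitive D latch and the edge-triggered D flip-flop shows up clearly when both are driven by the same sampled (C, D) sequence. This Python sketch assumes the flip-flop is built as a master-slave pair of D latches (master open while C = 1, slave updated when C falls), which is one common construction; the sampling model, the initial Q = 0, and the function names are assumptions for illustration.

```python
def d_latch(samples):
    """Level-sensitive D latch: Q follows D while C = 1 and holds while C = 0."""
    q, out = 0, []
    for c, d in samples:
        if c:
            q = d
        out.append(q)
    return out


def d_flip_flop_falling(samples):
    """Falling-edge D flip-flop modeled as a master-slave pair of D latches."""
    q, master, prev_c, out = 0, 0, 0, []
    for c, d in samples:
        if c:
            master = d      # master latch is open while C = 1
        elif prev_c:
            q = master      # C just fell: the slave copies what the master captured
        prev_c = c
        out.append(q)
    return out


# (C, D) pairs, one per sample time; D changes while the clock is high.
samples = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 1)]
print(d_latch(samples))              # [1, 0, 0, 0, 1, 1]
print(d_flip_flop_falling(samples))  # [0, 0, 0, 0, 0, 1]
```

The latch output tracks every change of D while C is high, whereas the flip-flop output changes only once, at the second falling edge.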

12
Register File

The register file contains a set of registers
that can be read and written by supplying a
register number to be accessed.
The register file could be built using D
flip-flops.
In practice, simpler clocked storage elements are used instead, e.g., SRAM cells.
Since reading a register does not change the
state, we need only supply the register number as
input and the output is the data contained in
that register.

13
Register File

The two read ports can be implemented using a pair of multiplexors, each selecting one register by its register number.

14
Register File

Writing to a register is a little more
complicated.
In the write port, we use a decoder to determine
which register to write to.
When the write signal is asserted, the clock input of only the selected register is asserted, so an active edge on the C input occurs only for the selected register.
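A behavioral Python sketch of the register file described above, assuming 32 registers of 32 bits, two read ports, and one write port. Each read acts like a multiplexor indexed by register number; the write port uses a one-hot decoder so that only the selected register sees the write on the clock edge. The class and method names are made up for illustration, and MIPS's hard-wired $zero register is not modeled.

```python
class RegisterFile:
    """Behavioral sketch: 32 registers x 32 bits, two read ports, one write port."""

    def __init__(self, num_regs: int = 32):
        self.regs = [0] * num_regs

    def read(self, read_reg1: int, read_reg2: int) -> tuple:
        # Each read port behaves like an n-to-1 multiplexor whose select input is
        # the register number; reading does not change any state.
        return self.regs[read_reg1], self.regs[read_reg2]

    def decode(self, write_reg: int) -> list:
        # Decoder: one output per register, with a 1 only for the selected one.
        return [1 if i == write_reg else 0 for i in range(len(self.regs))]

    def clock_edge(self, reg_write: int, write_reg: int, write_data: int) -> None:
        # A register sees an active clock edge only if RegWrite AND its decoder
        # output are both 1, so at most one register changes.
        for i, selected in enumerate(self.decode(write_reg)):
            if reg_write and selected:
                self.regs[i] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(reg_write=1, write_reg=5, write_data=42)
print(rf.read(5, 0))    # (42, 0)
rf.clock_edge(reg_write=0, write_reg=5, write_data=7)   # write not asserted: no change
print(rf.read(5, 5))    # (42, 42)
```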

15
Building the Datapath

Our ultimate goal is to understand how to build a datapath (i.e., the processor component that performs arithmetic operations) in MIPS hardware, as illustrated below.

16
Simple Implementation

These are some of the functional units that we
need for our instructions.

17
Let's Design

Path for instruction fetch and PC increment

(Figure: an adder adds the constant 4 to the PC.)
18
Let's Design

Datapath for R-type instructions (add, sub, etc.)

(Figure labels: register fields Inst[25-21], Inst[20-16], and Inst[15-11] of the instruction Inst[31-0].)
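The register numbers that the R-type datapath sends to the register file are bit slices of the 32-bit instruction word. A small Python sketch of that slicing; the function name is hypothetical and the field names follow the usual MIPS op/rs/rt/rd/shamt/funct convention.

```python
def decode_fields(inst: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields (bit ranges inclusive)."""
    return {
        "op":    (inst >> 26) & 0x3F,   # Inst[31-26]
        "rs":    (inst >> 21) & 0x1F,   # Inst[25-21] -> Read register 1
        "rt":    (inst >> 16) & 0x1F,   # Inst[20-16] -> Read register 2
        "rd":    (inst >> 11) & 0x1F,   # Inst[15-11] -> Write register (R-type)
        "shamt": (inst >> 6) & 0x1F,    # Inst[10-6]
        "funct": inst & 0x3F,           # Inst[5-0]
        "imm":   inst & 0xFFFF,         # Inst[15-0] (offset for lw/sw/beq)
    }


fields = decode_fields(0x02899022)   # encoding of sub $s2, $s4, $t1
print(fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))  # 20 9 18 0x22
```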
19
Let's Design

Datapath for load word/store word (lw/sw)

(Figure labels: register fields Inst[25-21] and Inst[20-16]; offset field Inst[15-0].)
20
Let's Design

Datapath for BEQ instructions

(Figure labels: the constant 4; offset field Inst[15-0].)
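For beq, the datapath computes the target as (PC + 4) plus the sign-extended 16-bit offset shifted left by 2, and takes the branch only when the ALU's subtraction gives zero. A minimal Python sketch with hypothetical function names; the instruction words in the usage lines are illustrative encodings of beq $s0, $s1 with offsets +1 and -1.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, inst: int) -> int:
    """beq target: (PC + 4) + (sign-extend(Inst[15-0]) << 2), kept to 32 bits."""
    return (pc + 4 + (sign_extend16(inst) << 2)) & MASK32


def next_pc(pc: int, inst: int, zero: int) -> int:
    """Take the branch only when the ALU's Zero output is 1 (the registers are equal)."""
    return branch_target(pc, inst) if zero else (pc + 4) & MASK32


print(hex(branch_target(0x1000, 0x12110001)))  # offset +1 word: 0x1008
print(hex(branch_target(0x1000, 0x1211FFFF)))  # offset -1 word: 0x1000
```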
21
Let's Design

Datapath for J (jump) instruction

(Figure labels: control, MUX, the constant 4, Inst[15-0], Inst[25-0]; the jump destination is JDest[31-28] = PC[31-28], JDest[27-2] = Inst[25-0], JDest[1-0] = 00.)
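The jump-destination labels amount to concatenating the upper 4 PC bits, the 26-bit Inst[25-0] field, and two zero bits. This Python sketch assumes the upper bits come from PC + 4, as in the multicycle PCSource table later in these notes (they differ from PC[31-28] only when PC + 4 crosses a 256 MB boundary). The function name and sample words are hypothetical.

```python
def jump_target(pc: int, inst: int) -> int:
    """J target: (PC + 4)[31-28], then Inst[25-0], then 00."""
    upper = (pc + 4) & 0xF0000000                # JDest[31-28]
    return upper | ((inst & 0x03FFFFFF) << 2)    # JDest[27-2] = Inst[25-0], JDest[1-0] = 00


print(hex(jump_target(0x00400000, 0x08100000)))  # 0x400000
print(hex(jump_target(0x90000000, 0x08000004)))  # 0x90000010
```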
22
Simple Implementation (10/28)

Recall from last time that we designed a datapath
sequence for instruction fetch, R-type
instructions (add, sub, and, etc), load and store
word instructions, and branch-equal and jump
instructions.
We used the following elements for individual
designs.
To build a complete datapath, we need to combine
the separate datapaths and add some control
signals to create a single datapath for
instructions.

23
Simple Implementation

Many instructions use the same functional units in their datapaths. We can use this to share datapath hardware among different instructions.
When we build a single datapath, we can use a mux to select among different source inputs.
Consider the datapath for R-type instructions.

24
Simple Implementation

The datapath for memory-reference instructions:
We can combine these instructions by using a mux to select which source data to use (either the sign-extended input or the Read data 2 input).
We also need a mux to select whether the data written to the register file comes from the ALU or from data memory.

25
Simple Implementation

If we include the instruction fetch hardware, the
modified datapath for R-type and
memory-reference instructions is

26
Simple Implementation

The datapath hardware implementation for all 3
instruction classes (R-type, memory-references,
and branches/jumps) is given as

27
Control

We have designed a single datapath for all instructions. How do we determine which operation the datapath performs for a given instruction?
We design control units that generate the control signals for each instruction.
Recall that the ALU has a 3-bit control input:

ALU Control Input Function
000 AND
001 OR
010 ADD
110 SUB
111 SLT

To design the ALU control unit, we use as inputs the 6-bit function field of the instruction and a 2-bit control field called ALUOp.

28
Control

Recall that the instruction formats for the 3
different instruction
classes are

29
Control

The figure illustrates the ALU control unit, with instruction bits 5-0 (the function field for R-type instructions) as an input to the ALU control.

30
Control

The following table illustrates how to set the
ALU inputs for desired instructions.

Instruction opcode | ALUOp | Instruction operation | Funct field | Desired ALU action | ALU control input
lw | 00 | Load word | XXXXXX | Add | 010
sw | 00 | Store word | XXXXXX | Add | 010
beq | 01 | Branch equal | XXXXXX | Subtract | 110
R-type | 10 | Add | 100000 | Add | 010
R-type | 10 | Subtract | 100010 | Subtract | 110
R-type | 10 | AND | 100100 | And | 000
R-type | 10 | OR | 100101 | Or | 001
R-type | 10 | SLT | 101010 | Set less than | 111

Note that the ALUOp bits are set by the main control unit: 00 for loads/stores (add), 01 for beq (subtract), and 10 for R-type instructions, which indicates that the operation is encoded in the function field.
Only when ALUOp = 10 is the function field used to determine the desired ALU action.

31
Control

We must generate a mapping from the 2-bit ALUOp and the 6-bit function code inputs of the ALU control unit to the 3-bit ALU operation.
We can use a truth table. Since the ALUOp value 11 is never used, we can treat it as a don't-care entry.

From this truth table, we can generate a
hardware implementation of the
ALU Control unit using basic logic gates.
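The mapping just described can be checked in a few lines of Python. The sketch encodes the ALU control table from the previous slide and, next to it, one possible two-level gate realization that treats ALUOp = 11 as a don't-care. The equations used (Op2 = ALUOp0 + ALUOp1·F1, Op1 = ALUOp1' + F2', Op0 = ALUOp1·(F0 + F3)) are an assumption here, and the loop at the end verifies them against every row of the table. Function names are hypothetical.

```python
# ALU control truth table from the slides: (ALUOp, funct) -> 3-bit ALU control input.
# None for funct means "don't care" (XXXXXX).
ALU_TABLE = [
    (0b00, None,     0b010),  # lw / sw -> add
    (0b01, None,     0b110),  # beq     -> subtract
    (0b10, 0b100000, 0b010),  # add
    (0b10, 0b100010, 0b110),  # sub
    (0b10, 0b100100, 0b000),  # and
    (0b10, 0b100101, 0b001),  # or
    (0b10, 0b101010, 0b111),  # slt
]


def alu_control_table(alu_op: int, funct: int) -> int:
    """Look up the ALU control input by scanning the truth table."""
    for op, f, ctl in ALU_TABLE:
        if op == alu_op and (f is None or f == funct):
            return ctl
    raise ValueError(f"unsupported ALUOp={alu_op:02b}, funct={funct:06b}")


def alu_control_gates(alu_op: int, funct: int) -> int:
    """One gate-level realization, treating ALUOp = 11 as don't-care."""
    a1, a0 = (alu_op >> 1) & 1, alu_op & 1
    f3, f2, f1, f0 = (funct >> 3) & 1, (funct >> 2) & 1, (funct >> 1) & 1, funct & 1
    op2 = a0 | (a1 & f1)
    op1 = (1 - a1) | (1 - f2)
    op0 = a1 & (f0 | f3)
    return (op2 << 2) | (op1 << 1) | op0


# The gate equations agree with the table on every row it specifies.
for alu_op, f, ctl in ALU_TABLE:
    funct = 0 if f is None else f
    assert alu_control_gates(alu_op, funct) == ctl == alu_control_table(alu_op, funct)

print(f"{alu_control_gates(0b10, 0b101010):03b}")  # 111 (slt)
```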

32
Control

Let's consider some examples.
Given the following instruction: lw $s3, 10($s2)
i) Identify the machine code for this instruction.
ii) Determine the 2-bit ALUOp, 6-bit Function Field, and the 3-bit ALU operation for this instruction.
iii) Using the given figure, identify the appropriate datapath for this instruction.

33
Example 1

There were 9 different examples, (a) through (i). We'll look at 4 of them.
Example 1: lw $s3, 10($s2)
i) Identify the machine code for this instruction.
ii) Determine the 2-bit ALUOp, 6-bit Function Field, and the 3-bit ALU operation for this instruction.
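For part (i), the machine code comes from packing the fields into the R- and I-formats. A Python sketch under the standard MIPS conventions (register numbers $t1 = 9 and $s0-$s4 = 16-20; opcodes lw = 35, addi = 8, R-type = 0; funct for sub = 34); the helper names are hypothetical. The same helpers also cover Examples 2 and 3. Example 4 is left out because its offset depends on the address of the beq itself, which the example does not give.

```python
# Standard MIPS register numbers for the registers used in the examples.
REG = {"$t1": 9, "$s0": 16, "$s1": 17, "$s2": 18, "$s3": 19, "$s4": 20}


def encode_r(op, rs, rt, rd, shamt, funct):
    """R-format: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op, rs, rt, imm):
    """I-format: op(6) rs(5) rt(5) immediate(16); imm is kept to 16 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def fields_i(word):
    """Binary fields of an I-format word, for comparing with a hand encoding."""
    return f"{word >> 26:06b} {(word >> 21) & 31:05b} {(word >> 16) & 31:05b} {word & 0xFFFF:016b}"


lw = encode_i(35, REG["$s2"], REG["$s3"], 10)                  # lw   $s3, 10($s2)
addi = encode_i(8, REG["$s2"], REG["$s1"], 144)                # addi $s1, $s2, 144
sub = encode_r(0, REG["$s4"], REG["$t1"], REG["$s2"], 0, 34)   # sub  $s2, $s4, $t1
print(f"{lw:#010x} {addi:#010x} {sub:#010x}")  # 0x8e53000a 0x22510090 0x02899022
print(fields_i(lw))                            # 100011 10010 10011 0000000000001010
```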

34
Example 1

Example 1: lw $s3, 10($s2)
iii) Using the figure given, identify the correct datapath for this instruction.

Control signals: RegDst = 1, ALUOp = 00, ALUSrc = 0, MemtoReg = 1, PCSrc = 1
35
Example 2

Example 2: addi $s1, $s2, 144
i) Identify the machine code for this instruction.
ii) Determine the 2-bit ALUOp, 6-bit Function Field, and the 3-bit ALU operation for this instruction.

36
Example 2

Example 2: addi $s1, $s2, 144
iii) Using the figure given, identify the correct datapath for this instruction.

Control signals: RegDst = 1, ALUOp = 00, ALUSrc = 0, MemtoReg = 0, PCSrc = 1
37
Example 3

Example 3: sub $s2, $s4, $t1
i) Identify the machine code for this instruction.
ii) Determine the 2-bit ALUOp, 6-bit Function Field, and the 3-bit ALU operation for this instruction.

38
Example 3

Example 3: sub $s2, $s4, $t1
iii) Using the figure given, identify the correct datapath for this instruction.

Control signals: RegDst = 0, ALUOp = 10, ALUSrc = 1, MemtoReg = 0, PCSrc = 1
39
Example 4

Example 4: beq $s0, $s1, exit
(assume exit is located at address 30,000)
i) Identify the machine code for this instruction.
ii) Determine the 2-bit ALUOp, 6-bit Function Field, and the 3-bit ALU operation for this instruction.

40
Example 4

Example 4: beq $s0, $s1, exit
(assume exit is located at address 30,000)
iii) Using the figure given, identify the correct datapath for this instruction.

Control signals: RegDst = 1, ALUOp = 01, ALUSrc = 1, MemtoReg = X (set by the previous instruction), PCSrc = 1 if the branch is not taken, 0 if the branch is taken
41
Main Control Unit (11/02)

We are now ready to discuss the main control unit.

42
Main Control Unit

Consider the control signals for the main control
unit.
There are 7 control signals that can be set in
the main control unit (9 in all, if we include
the 2-bit ALUOp)

Signal name | If deasserted (signal = 0) | If asserted (signal = 1)
RegDst | Destination register number comes from the rt field (bits 20-16) | Destination register number comes from the rd field (bits 15-11)
RegWrite | No effect | The Write data value is written to the register given by the Write register input
ALUSrc | Second ALU operand comes from the second register file output (Read data 2) | Second ALU operand is the sign-extended lower 16 bits of the instruction
PCSrc | PC is replaced by PC + 4 (points to the next instruction) | PC is replaced by the calculated branch target address
MemRead | No effect | Data memory contents at the specified address are sent to the Read data output
MemWrite | No effect | Data memory contents at the specified address are replaced by the Write data input
MemtoReg | ALU output is fed to the register file's Write data input | Data memory value is fed to the register file's Write data input
43
Main Control Unit

All but one of the control signals are completely
determined by the opcode bits 31-26. Do you
know which one?
The following table illustrates the truth table
for the control signals for different instruction
classes

Instruction RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp
R-TYPE 1 0 0 1 0 0 0 10
LW 0 1 1 1 1 0 0 00
SW X 1 X 0 0 1 0 00
BEQ X 0 X 0 0 0 1 01
44
Main Control Unit

Note that instruction bits 31-26 are the input to the main control unit.
Also note that for branches, the PC is updated with the branch target address only if the zero-detect (Zero) signal is asserted, hence the need for the AND gate.

45
Main Control Unit

Since the opcode completely determines the control signals (with the exception of PCSrc), we can create a truth table that maps the opcode to the control signals.

Opcode in binary: instruction bits 31-26 (Op5-Op0)
Name | Opcode (decimal) | Op5 | Op4 | Op3 | Op2 | Op1 | Op0
R-type | 0 | 0 | 0 | 0 | 0 | 0 | 0
LW | 35 | 1 | 0 | 0 | 0 | 1 | 1
SW | 43 | 1 | 0 | 1 | 0 | 1 | 1
BEQ | 4 | 0 | 0 | 0 | 1 | 0 | 0
46
Main Control Unit
Input or Output Signal R-type LW SW BEQ
Inputs Op5 0 1 1 0
Inputs Op4 0 0 0 0
Inputs Op3 0 0 1 0
Inputs Op2 0 0 0 1
Inputs Op1 0 1 1 0
Inputs Op0 0 1 1 0
Outputs RegDst 1 0 X X
Outputs ALUSrc 0 1 1 0
Outputs MemtoReg 0 1 X X
Outputs RegWrite 1 1 0 0
Outputs MemRead 0 1 0 0
Outputs MemWrite 0 0 1 0
Outputs Branch 0 0 0 1
Outputs ALUOp1 1 0 0 0
Outputs ALUOp0 0 0 0 1
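The truth table above maps directly onto two-level logic: one AND term recognizes each opcode, and each output is an OR of those terms. The Python sketch below writes out that structure (the OR terms are read off the table, with don't-cares set to 0) and checks it against every specified table entry. It also computes PCSrc as Branch AND Zero, since PCSrc is the one signal the opcode alone does not fix. All names are illustrative.

```python
# Main control truth table from the slides, keyed by opcode (Instruction[31-26]).
X = None  # don't-care entry
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-type
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw (35)
    0b101011: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0, MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # sw (43)
    0b000100: dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0, MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # beq (4)
}


def control_gates(op: int) -> dict:
    """Two-level logic: AND terms recognize each opcode, OR terms form each output."""
    r_format = int(op == 0b000000)   # Op5'Op4'Op3'Op2'Op1'Op0'
    lw = int(op == 0b100011)         # Op5 Op4'Op3'Op2'Op1 Op0
    sw = int(op == 0b101011)         # Op5 Op4'Op3 Op2'Op1 Op0
    beq = int(op == 0b000100)        # Op5'Op4'Op3'Op2 Op1'Op0'
    return dict(
        RegDst=r_format,
        ALUSrc=lw | sw,
        MemtoReg=lw,
        RegWrite=r_format | lw,
        MemRead=lw,
        MemWrite=sw,
        Branch=beq,
        ALUOp=(r_format << 1) | beq,   # ALUOp1 = R-format, ALUOp0 = beq
    )


def pc_src(branch: int, zero: int) -> int:
    """PCSrc is not fixed by the opcode alone: it is Branch AND the ALU's Zero output."""
    return branch & zero


# Every specified (non-X) entry of the table matches the gate equations.
for opcode, row in CONTROL.items():
    gates = control_gates(opcode)
    assert all(v is X or gates[k] == v for k, v in row.items())

print(control_gates(0b101011))   # control signals for sw
```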
47
Logic for Control Units

Simple combinational logic (truth tables)

48
Single Cycle Implementation

Recall the basic single-cycle datapath implementation, given below.

49
Single-cycle versus Multicycle

Remember that single-cycle hardware implementations have many drawbacks, including:
(1) Functional-unit delay: the clock cycle must accommodate the longest delay path, which grows with instruction complexity.
(2) Violation of Design Principle 1: Simplicity favors regularity.
(3) Inefficiency in performance, cost, and hardware utilization
- Multiple redundant memory units, adders, etc.
There are two main alternatives to single-cycle implementations:
- multicycle implementation
- pipelining (will cover later, if time)
Multicycle implementations improve performance by breaking instructions into short steps, each of which is executed in one (shorter) clock cycle.
- Instructions that require fewer steps can then finish in less time.
We must now define what the steps of the instruction are...

50
Multicycle Implementation (11/04)

Here is a basic datapath for a multicycle
implementation.
Control signals are omitted for now, for
simplicity.

(Figure: multicycle datapath showing the PC, a single memory for instructions or data, the Instruction register, the Memory data register, the register file, the A and B registers, the ALU, and ALUOut. Labels distinguish intra-cycle logic from inter-cycle clocked registers; other labels read "Cycles 1,4", "Cycles 2,5", and "Cycles 1,2,3".)
51
Multicycle Implementation

Break up the instructions into steps; each step takes one cycle:
- balance the amount of work to be done in each step
- restrict each cycle to use only one major functional unit
At the end of a cycle:
- store values for use in later cycles (the easiest thing to do)
- introduce additional internal registers to hold them
The ALU is also used to compute addresses and to increment the PC.
A single memory unit is used for both instructions and data.
The control signals are not solely functions of the instruction, because they must be different in different clock cycles.
We'll use a finite state machine (FSM) for control.

52
Multicycle Implementation

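We need multiplexors to select:
- the memory address (instruction address from the PC or data address from ALUOut)
- the destination register (rt or rd field)
- the value written to the register file (ALUOut or memory data)
- ALUSrcA: the first ALU input (the PC, for updating the PC, or register A, for executing the instruction)
- ALUSrcB: the second ALU input (1 of 4 possible inputs)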

53
Control Signals

Here we've added the major control signals needed for the datapath implementation.

54
Control Signals
Signal name (1-bit) | Effect when deasserted (0) | Effect when asserted (1)
RegDst | Write register number comes from the rt field | Write register number comes from the rd field
RegWrite | None | Write data input is written to the Write register
ALUSrcA | First ALU operand is the PC | First ALU operand comes from the A register
MemRead | None | Memory contents at the specified address are sent to the memory data output
MemWrite | None | Memory contents at the specified address are replaced by the Write data value
MemtoReg | Register file Write data comes from ALUOut | Register file Write data comes from the MDR (memory data register)
IorD | PC supplies the address to the memory unit | ALUOut supplies the address to the memory unit
IRWrite | None | Memory output is written to the IR
PCWrite | None | PC is written; PCSource selects the source
PCWriteCond | None | PC is written if the ALU's Zero output is active (beq)
55
Control Signals
Signal name (2-bit) | Value | Effect
ALUOp | 00 | ALU performs an add
ALUOp | 01 | ALU performs a subtract
ALUOp | 10 | Function field determines the ALU operation
ALUSrcB | 00 | Second ALU input comes from the B register
ALUSrcB | 01 | Second ALU input is the constant 4
ALUSrcB | 10 | Second ALU input is the sign-extended IR[15-0]
ALUSrcB | 11 | Second ALU input is the sign-extended IR[15-0], shifted left by 2
PCSource | 00 | Output of the ALU (PC + 4) is sent to the PC
PCSource | 01 | ALUOut (branch target address) is sent to the PC
PCSource | 10 | Jump target address is sent to the PC: (PC + 4)[31-28] concatenated with IR[25-0] shifted left by 2
56
Control Signals
57
Five Execution Steps

1) Instruction Fetch
2) Instruction Decode and Register Fetch
3) Execution, Memory Address Computation, or Branch Completion
4) Memory Access or R-type Instruction Completion
5) Write-back Step
INSTRUCTIONS TAKE FROM 3 TO 5 CYCLES!
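Because each instruction class stops after a different step, the cycle counts are lw 5, sw 4, R-type 4, and beq/jump 3 (branches and jumps complete in step 3, stores and R-type instructions in step 4, and only loads need step 5). The Python sketch below turns these counts into an average CPI; the instruction mix is a hypothetical example, not measured data, and the names are illustrative.

```python
# Clock cycles per instruction class in the multicycle implementation.
CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3, "j": 3}


def average_cpi(mix: dict) -> float:
    """Weighted average of cycle counts; mix maps class -> fraction (summing to 1)."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(CYCLES[cls] * frac for cls, frac in mix.items())


# Hypothetical instruction mix, for illustration only.
mix = {"lw": 0.25, "sw": 0.10, "R-type": 0.45, "beq": 0.15, "j": 0.05}
print(round(average_cpi(mix), 2))   # 4.05
```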

58
Step 1 Instruction Fetch

Use the PC to get the instruction and put it in the Instruction Register.
Increment the PC by 4 and put the result back in the PC.
This can be described succinctly using RTL ("Register-Transfer Language"):
IR <= Memory[PC]
PC <= PC + 4

59
Step 2 Instruction Decode Register Fetch

Read registers rs and rt in case we need them
Compute the branch address in case the
instruction is a branch
RTL:
A <= Reg[IR[25-21]]
B <= Reg[IR[20-16]]
ALUOut <= PC + (sign-extend(IR[15-0]) << 2)
We aren't setting any control lines based on the
instruction type (we are busy "decoding" it in
our control logic)

60
Step 3 (Instruction Dependent)

The ALU performs one of three functions, based on the instruction type:
Memory reference: ALUOut <= A + sign-extend(IR[15-0])
R-type: ALUOut <= A op B
Branch: if (A == B) PC <= ALUOut

61
Step 4 R-type or Memory Access

Loads and stores access memory:
MDR <= Memory[ALUOut]   (load)
Memory[ALUOut] <= B     (store)
R-type instructions finish:
Reg[IR[15-11]] <= ALUOut
The write actually takes place at the end of the cycle, on the clock edge.

62
Step 5 Write back Step

Load data is written back to the register file:
Reg[IR[20-16]] <= MDR
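To see the five steps working together, here is a toy Python model that executes the RTL above one instruction at a time and reports how many cycles each instruction used. It is a sketch under simplifying assumptions: only lw, sw, add, sub, and beq are supported, memory is a Python dict from byte address to 32-bit word (no alignment checks), and the class name and program are hypothetical (the three words encode lw $s3, 10($s2); sub $s2, $s4, $t1; and beq $s0, $s1 with an offset of +1).

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's-complement number."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


class MultiCycleCPU:
    """Toy model of the five RTL steps; supports only lw, sw, add, sub and beq."""

    def __init__(self, mem: dict):
        self.mem = mem                  # one memory for instructions and data: address -> word
        self.reg = [0] * 32
        self.pc = 0
        self.ir = self.a = self.b = self.alu_out = self.mdr = 0

    def write_reg(self, n: int, value: int) -> None:
        if n != 0:                      # $zero stays 0
            self.reg[n] = value & MASK32

    def execute(self) -> int:
        """Run one instruction and return the number of clock cycles (steps) it took."""
        # Step 1: IR <= Memory[PC]; PC <= PC + 4
        self.ir = self.mem[self.pc]
        self.pc = (self.pc + 4) & MASK32
        # Step 2: A <= Reg[IR[25-21]]; B <= Reg[IR[20-16]];
        #         ALUOut <= PC + (sign-extend(IR[15-0]) << 2), in case this is a branch
        op, rt, rd = self.ir >> 26, (self.ir >> 16) & 0x1F, (self.ir >> 11) & 0x1F
        self.a = self.reg[(self.ir >> 21) & 0x1F]
        self.b = self.reg[rt]
        self.alu_out = (self.pc + (sign_extend16(self.ir) << 2)) & MASK32
        if op == 4:                                          # beq
            if self.a == self.b:                             # Step 3: if (A == B) PC <= ALUOut
                self.pc = self.alu_out
            return 3
        if op in (35, 43):                                   # lw / sw
            self.alu_out = (self.a + sign_extend16(self.ir)) & MASK32   # Step 3: address
            if op == 43:
                self.mem[self.alu_out] = self.b              # Step 4: Memory[ALUOut] <= B
                return 4
            self.mdr = self.mem.get(self.alu_out, 0)         # Step 4: MDR <= Memory[ALUOut]
            self.write_reg(rt, self.mdr)                     # Step 5: Reg[IR[20-16]] <= MDR
            return 5
        if op == 0:                                          # R-type
            funct = self.ir & 0x3F
            if funct == 0x20:
                self.alu_out = (self.a + self.b) & MASK32    # Step 3: ALUOut <= A op B
            elif funct == 0x22:
                self.alu_out = (self.a - self.b) & MASK32
            else:
                raise NotImplementedError(f"funct {funct:#x}")
            self.write_reg(rd, self.alu_out)                 # Step 4: Reg[IR[15-11]] <= ALUOut
            return 4
        raise NotImplementedError(f"opcode {op}")


cpu = MultiCycleCPU({0: 0x8E53000A,    # lw  $s3, 10($s2)
                     4: 0x02899022,    # sub $s2, $s4, $t1
                     8: 0x12110001,    # beq $s0, $s1, +1 word
                     110: 1234})       # data word at address 100 + 10
cpu.reg[18], cpu.reg[20], cpu.reg[9] = 100, 50, 8
cycles = [cpu.execute() for _ in range(3)]
print(cycles, cpu.reg[19], cpu.reg[18], cpu.pc)   # [5, 4, 3] 1234 42 16
```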

63
Summary
64
Finite State Machine for Control (11/18)

(Brief Review of FSMs)
Recall that when we want to design a sequential circuit, we first develop a finite state machine (FSM) model.
An FSM's behavior depends on its states and inputs: the current state and inputs determine the next state and any outputs.
From the FSM transition diagram, we produce a state table (a.k.a. transition table).
From the transition table, we can produce a sequential circuit.
Consider the JK flip-flop example.
Recall that J = 1 (with K = 0) sets the flip-flop output to 1, K = 1 (with J = 0) resets it to 0, and J = K = 1 inverts (toggles) it. The characteristic table is:

J K Q(next)
0 0 Q
0 1 0
1 0 1
1 1 Q' (complement of Q)
FSM review is adapted from www.csis.gvsu.edu/Notes/Architecture/fsm.html
65
Finite State Machine for Control

There are two states defined by the characteristic table: Y (Q = 0) and Z (Q = 1). The state table becomes:

Present State J K Q Next State
Y 0 0 0 Y
Y 0 1 0 Y
Y 1 0 1 Z
Y 1 1 1 Z
Z 0 0 1 Z
Z 0 1 0 Y
Z 1 0 1 Z
Z 1 1 0 Y
- Present state Y: remain in state Y when J'K' + J'K = J'; change to state Z when JK' + JK = J.
- Present state Z: remain in state Z when J'K' + JK' = K'; change to state Y when J'K + JK = K.
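The two reduced conditions above are the JK flip-flop's characteristic equation, Q(next) = J·Q' + K'·Q, read separately for Q = 0 (state Y) and Q = 1 (state Z). A short Python sketch, assuming Y means Q = 0 and Z means Q = 1 as in the state table, reproduces the table row by row; the function names are hypothetical.

```python
def jk_next(q: int, j: int, k: int) -> int:
    """JK characteristic equation: Q(next) = J·Q' + K'·Q."""
    return (j & (1 - q)) | ((1 - k) & q)


def next_state(state: str, j: int, k: int) -> str:
    """State Y is Q = 0 and state Z is Q = 1."""
    q = 0 if state == "Y" else 1
    return "Z" if jk_next(q, j, k) else "Y"


# Reproduce the state table: present state, J, K, next state.
for state in ("Y", "Z"):
    for j in (0, 1):
        for k in (0, 1):
            print(state, j, k, next_state(state, j, k))
```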
66
Finite State Machine for Control

Alternatively,

Present State J K Q Next State
Y 0 X 0 Y
Y 1 X 1 Z
Z X 0 1 Z
Z X 1 0 Y

Moore versus Mealy FSMs
Moore machines associate outputs with states (i.e., an output symbol is assigned to each state).
Mealy machines associate outputs with transitions (i.e., an output symbol is determined by a pair of state and input symbols).
For multicycle control lines we tend to use a Moore FSM, because its outputs depend only on the current state, so less hardware is required to implement it.

67
Graphical Specifications of FSM
- The complete control FSM for multicycle
datapath implementation
68
Multicycle Implementation

Recall that for multicycle implementations, we break up an instruction into five basic steps:
1) Instruction Fetch
2) Instruction Decode/Register Fetch
3) Execution, address computation, branch/jump completion
4) Memory access or R-type completion
5) Memory Read Completion
We also determined that the control signals for a multicycle datapath are not determined by the instruction class alone, but by the signals to be set in each step and by the next step in the sequence.
Therefore, we can design our multicycle control using finite state machines (FSMs).

69
Multicycle Control Review

A finite state machine consists of a set of states and directions on how to change states.
The directions are defined by the next-state function, which maps the current state and the inputs to a new state.
Each state specifies a set of output signals that are asserted when the machine is in that state.
Note that if an output is not explicitly asserted, it is assumed to be deasserted rather than a don't-care value.
The FSM corresponds to the 5 steps of instruction execution.
Recall that in the multicycle implementation each step executes in one clock cycle; the same is true for the control FSM, where each state executes in a single clock cycle.

70
Multicycle Control

A high-level view of the FSM control is
illustrated.

Notice that for all instructions, the first two
steps are always the
same. The next steps are determined by the
instruction opcode and
are used to complete the instruction. Upon
completion, the control
returns to fetch a new instruction.

71
Multicycle Control FSM

How many state bits will we need?

Ten states are needed for the complete multicycle control FSM.
Each state is represented by a circle.
The labels on the arcs are the conditions that are tested.
The signals shown for each state are the output signals asserted in that state.

72
Graphical Representation of FSM

Consider the instruction fetch and decode portion
of the multicycle control.

The control sequence for:
Instruction fetch (State 0): MemRead, ALUSrcB = 01, ALUSrcA = 0, ALUOp = 00, IorD = 0, PCWrite, IRWrite, PCSource = 00
Instruction decode (State 1): ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
73
Graphical Representation of FSM

Consider the memory-reference portion of the
multicycle control.

The control sequence for memory reference (States 2-5):
Address calculation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
Memory access (lw): MemRead, IorD = 1
Memory access (sw): MemWrite, IorD = 1
Write-back step (lw only): RegWrite, MemtoReg = 1, RegDst = 0

74
Graphical Representation of FSM

Consider the R-type instructions in the
multicycle control.

The control sequence for R-type instructions (States 6-7):
Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
R-type completion: RegDst = 1, RegWrite, MemtoReg = 0
Return to IF state

75
Graphical Representation of FSM

Consider the branch instructions in the
multicycle control.

The control sequence for branch instructions (State 8):
Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
Return to IF state
76
Graphical Representation of FSM

Consider the jump instructions in the multicycle
control.

The control sequence for jump instructions (State 9):
Jump completion: PCWrite, PCSource = 10
Return to IF state
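Putting the control sequences for States 0-9 together, the whole controller is a Moore machine: outputs depend only on the state, and the next state depends on the state and, in States 1 and 2, on the opcode. The Python sketch below assumes the usual transitions for this FSM (State 1: lw/sw to 2, R-type to 6, beq to 8, j to 9; State 2: lw to 3, sw to 5; 3 to 4; 6 to 7) and the standard MIPS opcodes (R-type 0, j 2, beq 4, lw 35, sw 43). Ten states need 4 state bits. Names are illustrative.

```python
# Opcodes (Instruction[31-26]) that the FSM dispatches on.
RTYPE, J, BEQ, LW, SW = 0, 2, 4, 35, 43

# Moore outputs: the signals set in each state. Anything not listed is deasserted.
OUTPUTS = {
    0: dict(MemRead=1, ALUSrcA=0, IorD=0, IRWrite=1, ALUSrcB=0b01, ALUOp=0b00, PCWrite=1, PCSource=0b00),
    1: dict(ALUSrcA=0, ALUSrcB=0b11, ALUOp=0b00),
    2: dict(ALUSrcA=1, ALUSrcB=0b10, ALUOp=0b00),                 # address calculation
    3: dict(MemRead=1, IorD=1),                                   # memory access (lw)
    4: dict(RegWrite=1, MemtoReg=1, RegDst=0),                    # write-back (lw)
    5: dict(MemWrite=1, IorD=1),                                  # memory access (sw)
    6: dict(ALUSrcA=1, ALUSrcB=0b00, ALUOp=0b10),                 # R-type execution
    7: dict(RegDst=1, RegWrite=1, MemtoReg=0),                    # R-type completion
    8: dict(ALUSrcA=1, ALUSrcB=0b00, ALUOp=0b01, PCWriteCond=1, PCSource=0b01),  # branch completion
    9: dict(PCWrite=1, PCSource=0b10),                            # jump completion
}

# Enough bits to give each of the 10 states a distinct code.
STATE_BITS = (len(OUTPUTS) - 1).bit_length()   # 4


def next_state(state: int, opcode: int) -> int:
    """Next-state function: depends on the opcode only in States 1 and 2."""
    if state == 0:
        return 1
    if state == 1:
        return {LW: 2, SW: 2, RTYPE: 6, BEQ: 8, J: 9}[opcode]
    if state == 2:
        return 3 if opcode == LW else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0   # States 4, 5, 7, 8 and 9 return to instruction fetch


def trace(opcode: int) -> list:
    """States visited by one instruction, starting from fetch (State 0)."""
    states = [0]
    while (s := next_state(states[-1], opcode)) != 0:
        states.append(s)
    return states


for name, op in [("lw", LW), ("sw", SW), ("R-type", RTYPE), ("beq", BEQ), ("j", J)]:
    print(name, trace(op))
```

The length of each trace matches the 3-to-5-cycle counts from the five execution steps.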

77
Putting it all together

Graphical Specification of
Multicycle control FSM

How many state bits will we need?

78
Finite State Machine for Control

Implementation for multicycle control FSM

79
Programmable Logic Array
PLA Implementation

Could you explain the control functions of the machine represented by the highlighted vertical lines?

80
PLA Implementation

Vertical line 1 indicates
S3 = 0, S2 = 1, S1 = 1, S0 = 1 → State 7, which we know is the control for the R-type completion step.
Also, RegWrite is asserted and RegDst is asserted.
Vertical line 2 indicates
Op5 = 1, Op4 = 0, Op3 = 1, Op2 = 0, Op1 = 1, Op0 = 1 (which instruction does this opcode represent?)
S3 = 0, S2 = 0, S1 = 1, S0 = 0 → State 2, which we know is the control for the memory address computation step.
