Title: We're ready to look at an implementation of the MIPS
1The Processor Datapath Control
- We're ready to look at an implementation of the MIPS
- Simplified to contain only:
- memory-reference instructions: lw, sw
- arithmetic-logical instructions: add, sub, and, or, slt
- control flow instructions: beq, j
- Generic Implementation
- use the program counter (PC) to supply the instruction address
- get the instruction from memory
- read operand registers
- use the instruction to decide exactly what to do (the actions are largely the same)
- All instructions use the ALU after reading the registers
- memory-reference: address calculation
- arithmetic: operation execution
- control flow: comparison
2Implementation Details
- Abstract / Simplified View
- Two types of functional units
- elements that operate on data values (combinational)
- elements that contain state (sequential)
3State Elements
- Unclocked vs. Clocked
- Clocks are used in synchronous logic
- edge-triggered clocking methodology
[Figure: clock waveform marking the rising edge, the falling edge, and the cycle time]
4Latches and Flip-flops
- Output is equal to the stored value inside the element (don't need to ask for permission to look at the value)
- Change of state (value) is based on the inputs
- Latches: the same inputs determine both the new state and its timing
- Flip-flops: state changes only on a clock edge; the other inputs determine the new state
- D-latch and D flip-flop are the simplest ones to use
5D-latch
- Two inputs
- the data value to be stored (D)
- the control signal (C) indicating when to read and store D
- Two outputs
- the value of the internal state (Q) and its complement
SR latch
6D flip-flop
- Output changes only on the clock edge
- Master-slave structure
- The other possibility is the edge-triggered
structure
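The difference between the two elements can be sketched behaviourally. This is a minimal model, not a gate-level design: the class names are hypothetical, and the choice of a falling-edge master-slave arrangement (master transparent while the clock is high, slave while it is low) is an assumption made for the sketch.

```python
class DLatch:
    """Level-sensitive D latch: Q follows D while C is 1, and holds otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, d, c):
        if c:
            self.q = d
        return self.q


class MasterSlaveDFF:
    """D flip-flop built from two latches (assumed falling-edge triggered)."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, clk):
        # Master is transparent while clk is high; slave while clk is low.
        m = self.master.update(d, clk)
        return self.slave.update(m, 1 - clk)


ff = MasterSlaveDFF()
ff.update(1, 1)          # clock high: master captures 1, output still 0
q = ff.update(1, 0)      # clock falls: slave copies the master, output becomes 1
```

While the clock stays high the output does not move even if D changes, which is the property the edge-triggered methodology relies on.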
7Our Implementation
- An edge triggered methodology
- Typical execution
- read contents of some state elements,
- send values through some combinational logic
- write results to one or more state elements
- Feedback from a state element to itself is
possible
8Register File
9Register File
10Register File
- Note we still use the real clock to determine
when to write
11Building the Datapath
- Datapath for fetching instructions and
incrementing the program counter
12Building the Datapath
- Datapath for R-type instructions
13Building the Datapath
- Datapath for a load or store
14Building the Datapath
15Building the Datapath
- Use multiplexors to stitch everything together
16Control
- Selecting the operations to perform (ALU, read/write, etc.)
- Controlling the flow of data (multiplexor inputs)
- Information comes from the 32 bits of the instruction
- Example: add $8, $17, $18
- Instruction format:
000000  10001  10010  01000  00000  100000
op      rs     rt     rd     shamt  funct
- ALU's operation based on instruction type and function code
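The field split in the example can be checked with a few shifts and masks. This is a small sketch; the function name `decode_fields` is hypothetical, and the bit positions are the standard R-type layout (op 31-26, rs 25-21, rt 20-16, rd 15-11, shamt 10-6, funct 5-0).

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its R-type fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# The bit pattern from the slide: add $8, $17, $18
word = 0b000000_10001_10010_01000_00000_100000
fields = decode_fields(word)
```

Here `fields` has rs = 17, rt = 18 and rd = 8, matching the operands of the add.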
17Control
- ALU control input (Bnegate and Operation)
- 000 AND, 001 OR, 010 add, 110 subtract, 111 set-on-less-than
- Must describe hardware to compute the ALU control input
- given instruction type: 00 = lw, sw; 01 = beq; 10 = arithmetic
- function code for arithmetic
- ALUOp computed from instruction type
18ALU Control Input
Instruction  ALUOp  Instruction  Funct   Desired           ALU control
opcode              operation    field   ALU action        input
LW           00     load word    XXXXXX  add               010
SW           00     store word   XXXXXX  add               010
Beq          01     branch eq    XXXXXX  subtract          110
R-type       10     add          100000  add               010
R-type       10     subtract     100010  subtract          110
R-type       10     AND          100100  and               000
R-type       10     OR           100101  or                001
R-type       10     slt          101010  set on less than  111
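The table above can be written as a small lookup. This is a sketch of the mapping only, not of the gate-level logic; the function and constant names are hypothetical.

```python
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b000, 0b001, 0b010, 0b110, 0b111

# Funct field -> ALU control input, used only when ALUOp says "R-type".
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,
    0b100010: ALU_SUB,
    0b100100: ALU_AND,
    0b100101: ALU_OR,
    0b101010: ALU_SLT,
}


def alu_control(alu_op, funct):
    """Return the 3-bit ALU control input for a given ALUOp and funct field."""
    if alu_op == 0b00:        # lw, sw: add to form the address (funct ignored)
        return ALU_ADD
    if alu_op == 0b01:        # beq: subtract to compare (funct ignored)
        return ALU_SUB
    if alu_op == 0b10:        # R-type: the funct field selects the operation
        return FUNCT_TO_ALU[funct]
    raise ValueError("ALUOp value not listed in the table")
```

For example, `alu_control(0b10, 0b101010)` gives 0b111 (set on less than), while any funct value with ALUOp 00 gives 0b010 (add).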
19ALU Control Input
- Truth table for ALU control bits
20Instruction formats
- R-type instruction
- Load or store instruction
- Branch instruction
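- Opcode bits: Op[5-0]
bits           31-26     25-21  20-16  15-11  10-6   5-0
R-type         0         rs     rt     rd     shamt  funct
Load or store  35 or 43  rs     rt     address (bits 15-0)
Branch         4         rs     rt     address (bits 15-0)
[Figure annotations: R-type: operands, result; load or store: register, address; branch: condition]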
21Control
22Control Signals
23ALU Control Bits
24Control Signals
- Opcodes
- R-type 000000
- lw 100011
- sw 101011
- beq 000100
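As a rough illustration of how the opcode drives control, the four opcodes above can be mapped to the 2-bit ALUOp values used for ALU control (00 for lw and sw, 01 for beq, 10 for R-type). The names below are hypothetical, and the remaining control signals are left out because they appear only in the figures.

```python
OPCODES = {
    "R-type": 0b000000,
    "lw": 0b100011,
    "sw": 0b101011,
    "beq": 0b000100,
}

ALU_OP = {"R-type": 0b10, "lw": 0b00, "sw": 0b00, "beq": 0b01}


def alu_op_for(instruction):
    """Derive ALUOp from the opcode field (bits 31-26) of an instruction word."""
    opcode = (instruction >> 26) & 0x3F
    for name, bits in OPCODES.items():
        if bits == opcode:
            return ALU_OP[name]
    raise ValueError("opcode not handled by this simplified control")
```

Opcode 100011 is decimal 35 and 101011 is decimal 43, which matches the "35 or 43" entry in the load or store format.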
25Implementing Jumps
- Concatenate the upper 4 bits of PC+4 to the 26-bit address.
- The low-order 2 bits are always 00.
- Control signal Jump is asserted only when the opcode is 2.
field  2      address
bits   31-26  25-0
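The jump target described here can be sketched with shifts and masks. The function names are hypothetical and the addresses in the example are made up for illustration.

```python
def jump_asserted(instruction):
    """The Jump control signal: asserted only when the opcode field is 2."""
    return ((instruction >> 26) & 0x3F) == 2


def jump_target(pc, instruction):
    """Upper 4 bits of PC+4, then the 26-bit address field, then two 0 bits."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    address26 = instruction & 0x03FFFFFF
    return (pc_plus_4 & 0xF0000000) | (address26 << 2)


# Illustrative values: a jump at 0x00400018 with address field 0x0100000.
instruction = (2 << 26) | 0x0100000
target = jump_target(0x00400018, instruction)   # 0x00400000
```

Shifting the address field left by 2 is what supplies the two low-order 0 bits.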
26Implementing Jumps
27Our Simple Control Structure
- All of the logic is combinational
- We wait for everything to settle down, and the right thing to be done
- ALU might not produce right answer right away
- we use write signals along with clock to determine when to write
- Cycle time determined by length of the longest path
We are ignoring some details like setup and hold times.
28Single Cycle Implementation
- The clock cycle is determined by the longest path.
- Calculate cycle time assuming negligible delays except
- memory (2ns), ALU and adders (2ns), register file access (1ns)

Instruction  Instr  Reg   ALU   Data  Reg    Total
class        mem    read  oper  mem   write  (ns)
ALU type     2      1     2     -     1      6
Load word    2      1     2     2     1      8
Store word   2      1     2     2     -      7
Branch       2      1     2     -     -      5
Jump         2      -     -     -     -      2
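The totals in this table follow directly from the three stated delays. As a quick check, the sketch below sums the units each instruction class uses and takes the maximum as the single-cycle clock period; the dictionary names are hypothetical.

```python
DELAY_NS = {"mem": 2, "alu": 2, "reg": 1}

# Functional units on each instruction class's path, in order of use.
PATHS = {
    "ALU type": ["mem", "reg", "alu", "reg"],
    "Load word": ["mem", "reg", "alu", "mem", "reg"],
    "Store word": ["mem", "reg", "alu", "mem"],
    "Branch": ["mem", "reg", "alu"],
    "Jump": ["mem"],
}

totals = {name: sum(DELAY_NS[unit] for unit in units)
          for name, units in PATHS.items()}

# In a single-cycle design every instruction pays for the slowest one.
cycle_time_ns = max(totals.values())
```

With these delays the load word path (8 ns) sets the cycle time, so a jump that needs only 2 ns still occupies a full 8 ns cycle.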
29Single Cycle Implementation
30Where we are headed
- Single Cycle Problems
- what if we had a more complicated instruction like floating point?
- wasteful of area
- One Solution
- use a smaller cycle time
- have different instructions take different numbers of cycles
- a multicycle datapath
31Multicycle datapath
32Multicycle Datapath
- We will be reusing functional units
- ALU used to compute address and to increment PC
- Memory used for instructions and data
- Our control signals will not be determined solely by instruction
- We'll use a finite state machine for control
33Review finite state machines
- Finite state machines
- a set of states and
- next state function (determined by current state and the input)
- output function (determined by current state and possibly input)
- We'll use a Moore machine (output based only on current state).
[Figure: finite state machine block diagram with labels Inputs, Clock, Current state, Next-state function, Next state, Output function, Outputs]
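The pieces of a Moore machine can be sketched as two lookup tables plus a state variable. The class below and the two-state example are hypothetical and only illustrate the structure; they are not the processor's control machine.

```python
class MooreMachine:
    """Finite state machine whose output depends only on the current state."""

    def __init__(self, start, next_state, output):
        self.state = start
        self.next_state = next_state   # (current state, input) -> next state
        self.output = output           # current state -> output

    def current_output(self):
        return self.output[self.state]

    def clock(self, value):
        """Advance one clock cycle using the next-state function."""
        self.state = self.next_state[(self.state, value)]
        return self.state


# Hypothetical example: output is 1 exactly when the last input was 1.
machine = MooreMachine(
    start="idle",
    next_state={("idle", 0): "idle", ("idle", 1): "busy",
                ("busy", 0): "idle", ("busy", 1): "busy"},
    output={"idle": 0, "busy": 1},
)
machine.clock(1)
```

The input never appears in `current_output`, which is the property that distinguishes a Moore machine from one whose output also depends on the input.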
34Multicycle Approach
- Break up the instructions into steps, each step takes a cycle
- balance the amount of work to be done
- restrict each cycle to use only one major functional unit
- At the end of a cycle
- store values for use in later cycles (easiest thing to do)
- introduce additional internal registers (IR, MDR, A, B, ALUOut)
- introduce additional multiplexors and expand existing ones
35Multicycle Approach
36Five Execution Steps
- Instruction Fetch
- Instruction Decode and Register Fetch
- Execution, Memory Address Computation, or Branch Completion
- Memory Access or R-type instruction completion
- Write-back step
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
37Multicycle Implementation
38Multicycle Control Signals
39Step 1 Instruction Fetch
- Use PC to get instruction and put it in the Instruction Register.
- Increment the PC by 4 and put the result back in the PC.
- Can be described succinctly using RTL "Register-Transfer Language":
IR = Memory[PC];
PC = PC + 4;
40Step 2 Instruction Decode and Register Fetch
- Read registers rs and rt in case we need them
- Compute the branch address in case the instruction is a branch
- RTL:
A = Reg[IR[25-21]];
B = Reg[IR[20-16]];
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
- We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)
41Step 3 (instruction dependent)
- ALU is performing one of three functions, based on instruction type
- Memory Reference: ALUOut = A + sign-extend(IR[15-0]);
- R-type: ALUOut = A op B;
- Branch: if (A==B) PC = ALUOut;
- Jump: PC = PC[31-28] || (IR[25-0] << 2);
42Step 4 (R-type or memory-access)
- Loads and stores access memory:
MDR = Memory[ALUOut];
or
Memory[ALUOut] = B;
- R-type instructions finish:
Reg[IR[15-11]] = ALUOut;
The write actually takes place at the end of the cycle on the edge!
43Step 5 Memory read completion
44Summary
- Instructions take from three to five execution
steps.
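The execution steps can be traced in software to see where each instruction class finishes. The sketch below is a behavioural model under several assumptions: memory is a dictionary keyed by byte address that holds both instructions and data, writes to register 0 are not treated specially, the function names are hypothetical, and the final load step (writing MDR into register IR[20-16]) follows the usual textbook treatment because that slide's body is not reproduced here.

```python
MASK = 0xFFFFFFFF


def sign_extend16(value):
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def to_signed32(value):
    return value - (1 << 32) if value & 0x80000000 else value


R_TYPE_OPS = {
    0b100000: lambda a, b: a + b,                                  # add
    0b100010: lambda a, b: a - b,                                  # sub
    0b100100: lambda a, b: a & b,                                  # and
    0b100101: lambda a, b: a | b,                                  # or
    0b101010: lambda a, b: int(to_signed32(a) < to_signed32(b)),   # slt
}


def run_instruction(pc, reg, mem):
    """Execute one instruction; return (new PC, number of steps used)."""
    # Step 1: instruction fetch.
    ir = mem[pc]
    pc = (pc + 4) & MASK
    # Step 2: decode, register fetch, speculative branch target.
    rs, rt, rd = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F, (ir >> 11) & 0x1F
    a, b = reg[rs], reg[rt]
    imm = sign_extend16(ir)
    alu_out = (pc + (imm << 2)) & MASK
    opcode = (ir >> 26) & 0x3F
    # Step 3: execution, address computation, or branch/jump completion.
    if opcode == 0b000100:                      # beq
        if a == b:
            pc = alu_out
        return pc, 3
    if opcode == 0b000010:                      # j
        return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2), 3
    if opcode == 0b000000:                      # R-type
        alu_out = R_TYPE_OPS[ir & 0x3F](a, b) & MASK
        reg[rd] = alu_out                       # Step 4: R-type completion.
        return pc, 4
    if opcode in (0b100011, 0b101011):          # lw, sw
        alu_out = (a + imm) & MASK
        if opcode == 0b101011:
            mem[alu_out] = b                    # Step 4: store.
            return pc, 4
        mdr = mem[alu_out]                      # Step 4: load reads memory.
        reg[rt] = mdr                           # Step 5: write-back (assumed).
        return pc, 5
    raise ValueError("opcode outside the simplified subset")
```

In this model branches and jumps finish in three steps, R-type instructions and stores in four, and loads in five, which is where the three-to-five range comes from.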
45Implementing the Multicycle Control
- Value of control signals is dependent upon
- what instruction is being executed
- which step is being performed
- Use the information we've accumulated to specify a finite state machine
- specify the finite state machine graphically, or
- use microprogramming
- Implementation can be derived from specification
46High-level view of the control
47Graphical Specification of FSM
48Finite State Machine for Control
49PLA Implementation
50ROM Implementation
- ROM "Read Only Memory"
- values of memory locations are fixed ahead of time
- A ROM can be used to implement a truth table
- if the address is m-bits, we can address 2^m entries in the ROM.
- Our outputs are the bits of data that the address points to.
- Often wasteful, since for lots of the entries, the outputs are the same.
address  data
0 0 0    0 0 1 1
0 0 1    1 1 0 0
0 1 0    1 1 0 0
0 1 1    1 0 0 0
1 0 0    0 0 0 0
1 0 1    0 0 0 1
1 1 0    0 1 1 0
1 1 1    0 1 1 1
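A ROM used this way is just an indexed table. The sketch below assumes the digits on this slide form eight rows of a 3-bit address followed by a 4-bit data word, read row by row; the names are hypothetical.

```python
ADDRESS_BITS = 3   # m

# One 4-bit output word per address, in address order 000 through 111.
ROM = [0b0011, 0b1100, 0b1100, 0b1000, 0b0000, 0b0001, 0b0110, 0b0111]


def rom_read(address):
    """Return the data word stored at an m-bit address."""
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address does not fit in m bits")
    return ROM[address]


entries = len(ROM)                    # 2**m entries
distinct_words = len(set(ROM))        # fewer than 2**m when outputs repeat
```

Addresses 001 and 010 store the same word, a small instance of the waste mentioned above: every address gets its own entry even when the outputs are identical.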
51Another Implementation Style
- Complex instructions: the "next state" is often current state + 1
52Microprogramming
- A specification methodology
- appropriate if hundreds of opcodes, modes, cycles, etc.
- signals specified symbolically using microinstructions
- Each microinstruction defines the set of control signals that must be asserted in a given state.
- Also sequencing must be specified.
53Microprogramming
54MIPS Microinstruction Fields
- ALU control: specifies the operation done by the ALU.
- SRC1: specifies the source for the first ALU operand.
- SRC2: specifies the source for the second ALU operand.
- Register control: specifies read or write for the register file, and the source of the value for a write.
- Memory control: specifies read or write, and the source for the memory. For a read, it specifies the destination register.
- PCWrite control: specifies the writing of the PC.
- Sequencing: specifies how to choose the next microinstruction to be executed.
55MIPS Microinstruction Fields
- Dispatch jumps are based on the IR
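The sequencing choice can be pictured as a small function that picks the next microinstruction address. This is a sketch under assumptions: the three sequencing values ("seq", "fetch", "dispatch") and the dispatch table entries are hypothetical placeholders, not values taken from the slides.

```python
# Hypothetical dispatch table: opcode (from the IR) -> microinstruction address.
DISPATCH = {
    0b000000: 6,   # R-type
    0b100011: 2,   # lw
    0b101011: 2,   # sw
    0b000100: 8,   # beq
    0b000010: 9,   # j
}


def next_microinstruction(current, sequencing, opcode=None):
    """Choose the address of the next microinstruction."""
    if sequencing == "seq":        # fall through: current state + 1
        return current + 1
    if sequencing == "fetch":      # instruction done: back to the first step
        return 0
    if sequencing == "dispatch":   # jump chosen by the opcode held in the IR
        return DISPATCH[opcode]
    raise ValueError("unknown sequencing value")
```

Most steps use the "current + 1" case, so only the dispatch points need a table lookup.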
56Microinstruction format
57Minimally vs. Maximally Encoded
- No encoding
- 1 bit for each datapath operation
- faster, requires more memory (logic)
- Lots of encoding
- send the microinstructions through logic to get control signals
- uses less memory, slower
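The trade-off can be made concrete by counting bits. The sketch assumes a hypothetical set of four datapath operations of which at most one is asserted per microinstruction; that restriction is what makes full encoding possible, and it is an assumption of the example rather than something the slides state.

```python
OPERATIONS = ["MemRead", "MemWrite", "RegWrite", "PCWrite"]   # hypothetical


def unencoded_width(n_ops):
    """No encoding: one bit per datapath operation."""
    return n_ops


def encoded_width(n_ops):
    """Maximal encoding: just enough bits to number the operations."""
    return max(1, (n_ops - 1).bit_length())


def decode(code, n_ops):
    """The extra logic an encoded field needs: code -> one control line each."""
    if not 0 <= code < n_ops:
        raise ValueError("code does not name an operation")
    return [int(index == code) for index in range(n_ops)]


lines = decode(2, len(OPERATIONS))   # asserts only the third operation
```

The encoded form stores 2 bits instead of 4 per microinstruction, but the control lines are only available after the decode step, which is the source of the extra delay.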
58Exceptions and Interrupts
- Exceptions are unexpected events from within the processor
- arithmetic overflow
- invoke the operating system from user program
- using an undefined instruction
- hardware malfunction
- Interrupts cause unexpected changes in control flow but come from outside the processor
- I/O device request
- Exception handling
- recovery, service to the user program, abort operation or process, reboot
- Interrupt handling
- service to the I/O device
59Exceptions
- Exceptions that our current implementation can generate are
- undefined instruction
- arithmetic overflow
- Exception detection
- undefined instruction: check the value of the opcode field
- arithmetic overflow: detected by the ALU
- Action
- save the address of the offending instruction in the exception program counter (EPC)
- save the cause of the exception in the Cause register
- transfer control to the operating system at some specified address
- OS can terminate the program or may continue its execution
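Detection and the resulting action can be sketched as follows. The handler address, the numeric cause codes, and the function names are all hypothetical choices for illustration; overflow is checked only for add to keep the example short.

```python
HANDLER_ADDRESS = 0xC0000000          # hypothetical OS entry point
CAUSE_UNDEFINED_INSTRUCTION = 0       # hypothetical encoding
CAUSE_ARITHMETIC_OVERFLOW = 1         # hypothetical encoding

# Opcodes of the simplified subset: R-type, lw, sw, beq, j.
KNOWN_OPCODES = {0b000000, 0b100011, 0b101011, 0b000100, 0b000010}


def to_signed32(value):
    return value - (1 << 32) if value & 0x80000000 else value


def add_overflows(a, b):
    """True when the signed 32-bit sum of a and b does not fit in 32 bits."""
    total = to_signed32(a) + to_signed32(b)
    return not (-(1 << 31) <= total < (1 << 31))


def detect_exception(instruction, a=0, b=0):
    """Return a cause code, or None when no exception is raised."""
    opcode = (instruction >> 26) & 0x3F
    if opcode not in KNOWN_OPCODES:
        return CAUSE_UNDEFINED_INSTRUCTION
    is_add = opcode == 0 and (instruction & 0x3F) == 0b100000
    if is_add and add_overflows(a, b):
        return CAUSE_ARITHMETIC_OVERFLOW
    return None


def take_exception(cpu, cause, instruction_address):
    """Save EPC and Cause, then transfer control to the handler address."""
    cpu["EPC"] = instruction_address
    cpu["Cause"] = cause
    cpu["PC"] = HANDLER_ADDRESS
```

Because EPC holds the address of the offending instruction, the operating system can inspect it and decide whether to terminate the program or resume it.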
60Multicycle Datapath with Exceptions
61FSM with Exception Detection