Chapter 5: Datapath and Control

Provided by: admi49
Learn more at: https://cse.sc.edu

1
Chapter 5: Datapath and Control
  • CS 447
  • Jason Bakos

2
Review of Digital Logic
  • Review AND, OR, NOT, and XOR gates
  • Review negative-logic (inverted) inputs and
    outputs
  • NAND, NOR, XNOR
  • Sum-of-products with NAND gates
  • Product-of-sums with NOR gates
  • Double-bubble cancellation
  • DeMorgan's Law
  • Completeness of NAND and NOR gates
  • Review of muxes and decoders
  • Boolean algebra equations vs. digital logic gate
    schematics
  • Review of truth tables
  • Product-of-sums

3
Review of Digital Logic
  • Logic minimization
  • Boolean algebra
  • Identity Law
  • A + 0 = A and A · 1 = A
  • Zero and One Laws
  • A + 1 = 1 and A · 0 = 0
  • Inverse Laws
  • A + (not A) = 1 and A · (not A) = 0
  • Commutative Laws
  • A + B = B + A and A · B = B · A
  • Associative Laws
  • A + (B + C) = (A + B) + C and A · (B · C) = (A · B) · C
  • Distributive Laws
  • A · (B + C) = A · B + A · C and A + (B · C) = (A + B) · (A + C)
  • DeMorgan's Law
  • not (A · B) = (not A) + (not B) and not (A + B) =
    (not A) · (not B)
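
Each of the laws on this slide can be checked by brute force over every input combination. The sketch below does that in Python, writing OR as `|` and AND as `&`; the names `LAWS` and `law_holds` are made up for this illustration.

```python
from itertools import product

# Each law is a list of (left side, right side) pairs, as functions of (a, b, c).
LAWS = {
    "identity": [(lambda a, b, c: a | False, lambda a, b, c: a),
                 (lambda a, b, c: a & True, lambda a, b, c: a)],
    "zero and one": [(lambda a, b, c: a | True, lambda a, b, c: True),
                     (lambda a, b, c: a & False, lambda a, b, c: False)],
    "inverse": [(lambda a, b, c: a | (not a), lambda a, b, c: True),
                (lambda a, b, c: a & (not a), lambda a, b, c: False)],
    "commutative": [(lambda a, b, c: a | b, lambda a, b, c: b | a),
                    (lambda a, b, c: a & b, lambda a, b, c: b & a)],
    "associative": [(lambda a, b, c: a | (b | c), lambda a, b, c: (a | b) | c),
                    (lambda a, b, c: a & (b & c), lambda a, b, c: (a & b) & c)],
    "distributive": [(lambda a, b, c: a & (b | c), lambda a, b, c: (a & b) | (a & c)),
                     (lambda a, b, c: a | (b & c), lambda a, b, c: (a | b) & (a | c))],
    "DeMorgan": [(lambda a, b, c: not (a & b), lambda a, b, c: (not a) | (not b)),
                 (lambda a, b, c: not (a | b), lambda a, b, c: (not a) & (not b))],
}

def law_holds(name):
    """True if both forms of the named law agree on all 8 input rows."""
    return all(lhs(*row) == rhs(*row)
               for lhs, rhs in LAWS[name]
               for row in product([False, True], repeat=3))

for name in LAWS:
    print(name, law_holds(name))
```

An exhaustive check like this is only practical because there are three variables; it is a sanity check, not a proof technique for wide circuits.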

4
Review of Digital Logic
  • Review Karnaugh Map logic minimization
  • mux2 example
  • Review don't-care logic minimization
  • mux2 example
  • Review Boolean algebra logic minimization
  • mux2 example

5
Memory Devices
  • Consider cross-coupled NOR gates
  • This is the simplest memory device, called an
    SR latch (sometimes called an SR flip-flop)

Let's eliminate the S input and provide a clock
input. In this configuration, the clock acts as an
enable and is a level-sensitive clock.
6
Memory Devices
  • Clocked memory devices are divided into two
    categories
  • Latches are level-sensitive devices where the
    output samples the input the entire time the
    clock signal is high
  • Latches are transparent: they are open whenever
    the clock is asserted
  • Flip-flops only sample the input on the rising or
    falling edge of the clock
  • We only want state changes on one of the edges of
    the clock
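
The difference between the two categories shows up clearly in a small behavioral model. The sketch below is an illustration only: it steps through a list of (clock, D) samples, and the function names and sample sequence are made up here. The flip-flop is modeled as rising-edge triggered.

```python
def d_latch(samples, q=0):
    """Level-sensitive: Q follows D for as long as the clock is 1."""
    out = []
    for clk, d in samples:
        if clk:
            q = d
        out.append(q)
    return out

def d_flip_flop(samples, q=0):
    """Edge-triggered: Q takes the value of D only when the clock goes 0 -> 1."""
    out = []
    prev_clk = 0
    for clk, d in samples:
        if clk and not prev_clk:
            q = d
        prev_clk = clk
        out.append(q)
    return out

# (clock, D) pairs; D changes while the clock is still high in samples 2 and 6.
samples = [(0, 1), (1, 1), (1, 0), (0, 0), (0, 1), (1, 1), (1, 0)]
print(d_latch(samples))      # [0, 1, 0, 0, 0, 1, 0]
print(d_flip_flop(samples))  # [0, 1, 1, 1, 1, 1, 1]
```

The latch output drops when D falls during the high phase, while the flip-flop holds the value it captured at the edge.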

7
Memory Devices
  • Here's a master-slave approach to designing a
    falling-edge triggered FF
  • Here's a timing diagram for this device

8
Memory Devices
  • Flip-flops, depending on their design and
    technology, have set-up and hold times
  • Set-up time is the amount of time the input
    signal (D) must be stable prior to the clock edge
    that samples it
  • Hold time is the amount of time the input signal
    (D) must be stable after the clock edge
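
One way to read these two definitions: around every sampling edge there is a window, from (edge - set-up time) to (edge + hold time), in which D must not change. The sketch below checks a list of D transition times against that window. The function name, time units, and numbers are hypothetical.

```python
def timing_violations(d_changes, clock_edges, t_setup, t_hold):
    """Return (edge, change_time) pairs where D changed inside the
    set-up/hold window around a sampling clock edge."""
    violations = []
    for edge in clock_edges:
        for t in d_changes:
            if edge - t_setup < t < edge + t_hold:
                violations.append((edge, t))
    return violations

# Sampling edges at t=10 and t=20, set-up time 2, hold time 1 (arbitrary units).
print(timing_violations([3, 9, 20.5], [10, 20], t_setup=2, t_hold=1))
# [(10, 9), (20, 20.5)]
```

The change at t=9 is a set-up violation for the edge at t=10, and the change at t=20.5 is a hold violation for the edge at t=20.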

9
Memory Devices
  • For the master-slave design, the set-up time was
    very long, which is why we need a better design
  • We won't get into other ways to design
    edge-triggered flip-flops, but there are many
    with varying numbers of gates
  • Usually the classic SR-latch acts as a building
    block for such devices
  • Flip-flops also have asynchronous sets/resets and
    sometimes enables
  • Some textbooks refer to the last design as a
    pulse-triggered flip-flop, since the input must
    be stable for the entire clock pulse

10
Finite State Machines (FSM)
  • So far we've mainly done circuit design with
    combinational logic systems
  • Combinational logic circuits have an output that
    is some function of the inputs
  • Next we're going to start using sequential
    systems
  • Sequential circuits have an output that is some
    function of the inputs and the input history
  • The first example of these is the state machine

11
Finite State Machines (FSM)
  • State machines can be either synchronous or
    asynchronous
  • Synchronous state machines only change state with
    a clock event (edge)
  • Asynchronous state machines do not have this
    restriction
  • We'll start by building a synchronous state
    machine
  • We'll assume we have access to good
    positive-edge-triggered D flip-flop cells

12
Finite State Machines
  • Here are two different representations of the FSM
    in digital logic

13
Finite State Machines
  • There are two different ways of designing state
    machines: Mealy and Moore
  • In all state machines, the next state (which will
    be the current state after the next clock edge)
    is computed as a combinational function of the
    current state and the inputs
  • The outputs, on the other hand, are computed
    either as a function of the current state or as a
    function of the current state AND the inputs
    (hence Moore vs. Mealy)
  • Note: "Moore is less", because Moore machines are
    restricted to synchronous outputs (outputs that
    only change on a clock edge); Mealy machines do
    not have this restriction
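
To make the Moore/Mealy distinction concrete, here is a small sketch of one behavior built both ways: a detector that signals when the input has been 1 for two clock cycles in a row. The detector and the function names are assumptions chosen for illustration; they are not taken from the slides.

```python
def moore_detect_11(inputs):
    """Moore: the output is a function of the current state only."""
    state = 0  # consecutive 1s seen so far, saturating at 2
    outputs = []
    for x in inputs:
        outputs.append(1 if state == 2 else 0)
        state = min(state + 1, 2) if x else 0  # next-state logic
    return outputs

def mealy_detect_11(inputs):
    """Mealy: the output is a function of the current state AND the input."""
    state = 0  # 1 if the previous input was 1
    outputs = []
    for x in inputs:
        outputs.append(1 if (state == 1 and x) else 0)
        state = 1 if x else 0  # next-state logic
    return outputs

stream = [0, 1, 1, 1, 0, 1, 1]
print(moore_detect_11(stream))  # [0, 0, 0, 1, 1, 0, 0]
print(mealy_detect_11(stream))  # [0, 0, 1, 1, 0, 0, 1]
```

The Mealy version needs one fewer state and responds in the same cycle as the second 1; the Moore version produces the same pattern one clock later, because its output can only change after a state change.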

14
Finite State Machines
  • In order to build a state machine, we must first
    have our input signals and output signals
  • Then we start adding states and transitions
  • For a Mealy machine, the outputs will be on the
    transitions
  • For a Moore machine, the outputs will be in the
    states

15
Finite State Machines
  • Next, we need to encode state values for each of
    our states
  • Try to minimize bit changes on state transitions
  • Recall: we'll need at least ⌈lg n⌉ flip-flops if
    we have n states
  • Then, use Karnaugh maps to minimize our
    next-state and output logic
  • Note: we could use a state machine table (truth
    table)
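
The two encoding rules of thumb can be checked numerically. The sketch below computes the minimum flip-flop count for n states and, for a machine that walks its states in a fixed cycle (such as a counter), compares the total bit flips of a plain binary encoding with a Gray-code encoding. The helper names are invented here, and Gray code is just one possible way to reduce bit changes.

```python
def flip_flops_needed(n_states):
    """Smallest number of bits that gives n_states distinct binary codes."""
    return (n_states - 1).bit_length()

def gray(i):
    """Standard reflected Gray code of i."""
    return i ^ (i >> 1)

def bit_changes(codes):
    """Total bits that flip walking the codes in order and wrapping around."""
    nxt = codes[1:] + codes[:1]
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, nxt))

binary_codes = list(range(8))
gray_codes = [gray(i) for i in range(8)]

print(flip_flops_needed(5), flip_flops_needed(8))  # 3 3
print(bit_changes(binary_codes))                   # 14
print(bit_changes(gray_codes))                     # 8
```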

16
Finite State Machine Examples
  • First, let's tackle an example
  • 3-bit counter
  • Outputs: 3 counter bits (no inputs)
  • Here's another example
  • Let's design a combination lock with 2-bit
    combination inputs and an enter key
  • The output will be an unlock signal
  • Next, let's do a Coke machine example (where a
    Coke is 35 cents)
  • Inputs: quarter, dime, nickel
  • Output: release_coke
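
A behavioral sketch of the Coke machine follows. The state is the amount deposited so far, and `release_coke` is asserted in the cycle where the total reaches 35 cents. The slide does not say what happens on overpayment, so this sketch assumes no change is returned and the machine simply goes back to zero; `COIN_VALUE`, `PRICE`, and `coke_machine` are names invented here.

```python
COIN_VALUE = {"nickel": 5, "dime": 10, "quarter": 25}
PRICE = 35  # cents, from the slide

def coke_machine(coins):
    """Return the release_coke output for each coin inserted.

    Assumption: overpayment is kept and the machine returns to the
    zero-cents state after every release.
    """
    total = 0  # current state: cents deposited so far
    outputs = []
    for coin in coins:
        total += COIN_VALUE[coin]
        release = total >= PRICE
        outputs.append(1 if release else 0)
        if release:
            total = 0
    return outputs

print(coke_machine(["quarter", "dime"]))                  # [0, 1]
print(coke_machine(["dime", "dime", "dime", "nickel"]))   # [0, 0, 0, 1]
```

Because the output depends on the current coin as well as the stored total, this is a Mealy-style model; a Moore version would add a separate "release" state.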

17
Registers
  • A register is simply an array of D-flip-flops
    (8-bit, 32-bit, etc.)
  • The important distinction between flip-flops and
    registers is that it is VERY important for
    registers to have enable inputs

18
Wide Multiplexors
  • A wide multiplexor (not an official name) is
    simply an array of single-bit muxes
  • For example, if we want a 32-bit 4-to-1 mux, we
    need to array 32 4-to-1 muxes
  • Using state machine controllers, registers, and
    muxes, we can very easily implement control for a
    digital system
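
The "array of single muxes" idea can be sketched directly: a 32-bit 4-to-1 mux is 32 one-bit 4-to-1 muxes that all share the same select lines, one per bit position. The function names below are made up for this illustration.

```python
def mux4(sel, d0, d1, d2, d3):
    """Single-bit 4-to-1 mux: sel (0..3) picks one of four data bits."""
    return (d0, d1, d2, d3)[sel]

def wide_mux4(sel, words, width=32):
    """width-bit 4-to-1 mux built from `width` single-bit muxes."""
    out = 0
    for bit in range(width):
        # Slice the same bit position out of each of the four input words.
        bits = [(w >> bit) & 1 for w in words]
        out |= mux4(sel, *bits) << bit
    return out

inputs = [0x11111111, 0x22222222, 0xDEADBEEF, 0x00000000]
print(hex(wide_mux4(2, inputs)))  # 0xdeadbeef
```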

19
Example Checksummer
  • You are to design a device that accepts a data
    packet composed of a series of 8-bit words. The
    packet format is the following:
  • Each 8-bit word is valid on the falling edge of
    each clock. The synch. characters signal the
    beginning of a new packet. Synch. character 1 is
    00110011 and synch. character 2 is 11001100.
    The length field specifies how many words are
    contained in the data portion of the packet. The
    data payload is the actual data carried by the
    packet (which can be anything). Your device will
    keep a running modulo 256 sum of these data words
    and compare that value to the value of the
    checksum field at the end of the packet.

20
Example Checksummer
  • Your device has the following input signals
  • Clock: clock input
  • DataIn: 8-bit bus that puts a new character out
    on every falling edge of the clock
  • Reset: active-high reset
  • The device will have the following output
    signals
  • ChecksumError: this signal will be asserted for
    one clock cycle following the data input if there
    is a checksum error in the data packet. It must be
    valid on the rising edge that defines the end of
    the checksum word.
  • DataValid: this signal goes high on the
    rising edge that defines the beginning of the
    payload and goes low on the rising edge that
    defines the beginning of the checksum word.
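
Before designing hardware, it can help to write the checksummer as a software reference model. The sketch below is word-level only (no clock edges, Reset, or DataValid timing). It assumes the field order synch 1, synch 2, length, payload, checksum, and a one-word length field; the packet-format figure is not in the transcript, so both are assumptions. The state names and `check_packets` are invented here.

```python
SYNC1, SYNC2 = 0b00110011, 0b11001100  # values from the slide

def check_packets(words):
    """Return one ChecksumError flag (True = error) per packet seen."""
    errors = []
    state = "WAIT_SYNC1"
    remaining = 0
    running_sum = 0
    for w in words:
        if state == "WAIT_SYNC1":
            if w == SYNC1:
                state = "WAIT_SYNC2"
        elif state == "WAIT_SYNC2":
            if w == SYNC2:
                state = "LENGTH"
            elif w != SYNC1:          # a repeated SYNC1 keeps us waiting
                state = "WAIT_SYNC1"
        elif state == "LENGTH":
            remaining, running_sum = w, 0
            state = "PAYLOAD" if remaining else "CHECKSUM"
        elif state == "PAYLOAD":
            running_sum = (running_sum + w) % 256  # modulo-256 running sum
            remaining -= 1
            if remaining == 0:
                state = "CHECKSUM"
        else:  # CHECKSUM
            errors.append(running_sum != w)
            state = "WAIT_SYNC1"
    return errors

good = [0x33, 0xCC, 3, 200, 100, 1, 45]   # 200 + 100 + 1 = 301, 301 mod 256 = 45
bad = [0x33, 0xCC, 1, 7, 8]               # payload sum 7, checksum field 8
print(check_packets(good + bad))          # [False, True]
```

The `state` variable maps onto the controller FSM, while `remaining` and `running_sum` correspond to a down-counter and an accumulator register in the datapath.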

21
Example Checksummer
  • First, what type of components do we need for
    this device?
  • How do we design the state machine control?
  • There are too many signals to actually implement
    the controller on the board
  • How do we interconnect this device?

22
Chapter 5: Datapath and Control (Part 2)
  • CS 447
  • Jason Bakos

23
Building a Datapath
  • Which components do we need for the A/L, load,
    and branch classes of MIPS instructions?
  • First, we need a memory to hold our instructions
  • Assume it has an address input, a data output, and
    MemRead and MemWrite control signals
  • A Program Counter (PC) register to hold the
    address of the next instruction
  • Typical register (clk, en, rst, D, and Q)
  • ALU (the one we built in Chap. 4)
  • A, B, ALUOp, and Out
  • Register file
  • Dual-port (ReadAddr1, ReadAddr2, WriteReg,
    WriteData, RegWrite, ReadData1, ReadData2)
  • Instruction Register
  • Like the PC, but holds the current instruction
    word

24
Building a Datapath
25
Datapaths
  • Assuming our instruction is already fetched,
    using our components we need to build datapaths
    for the following
  • PC = PC + 4
  • Executing A/L R-type instruction and writing back
    result
  • Executing load/store effective address
    calculation
  • We need a sign extender for this
  • Computing a branch target address and determining
    whether or not a branch should be taken (for beq)
  • We need a sign extender and a shift-left-by-2
    unit for this
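
The two address calculations in this list can be written out as arithmetic. The sketch below models the sign extender and the shift-left-by-2 for 32-bit MIPS addresses; the function names are made up for illustration.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def effective_address(base, imm16):
    """Load/store address: base register + sign-extended offset."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF

def branch_target(pc, imm16):
    """beq target: (PC + 4) + (sign-extended offset shifted left by 2)."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF

print(hex(effective_address(0x10010000, 0xFFFC)))  # 0x1000fffc (offset -4)
print(hex(branch_target(0x00400010, 3)))           # 0x400020
print(hex(branch_target(0x00400010, 0xFFFF)))      # 0x400010 (offset -1 word)
```

The shift by 2 reflects that branch offsets count instruction words, while load/store offsets count bytes.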

26
Datapaths
PC + 4 datapath
R-type A/L datapath
27
Datapaths
Load/Store Datapath
28
Datapaths
Branch (beq) Datapath
29
Simple CPU Implementation
  • We want to implement the simplest possible
    implementation of our MIPS subset of instructions
  • lw/sw
  • beq
  • add, sub, and, or, and slt

30
Combining Datapaths
  • Let's combine the datapaths that we looked at
    into a single datapath
  • Let's assume that we want to execute all our
    instructions in a single clock cycle
  • This means that we can only use each datapath
    component once per instruction
  • We need separate instruction and data memories
  • We may need to duplicate some components (but we
    can share components across different instruction
    types)
  • We need multiplexors for this

31
Integrated Datapaths
  • Here we combine all our datapaths
  • We also add our fetch hardware
  • Next we'll need a control unit to assert the
    control signals

32
Control Signals
  • Recall the ALU control table
  • Let's create a small control lookup table for
    the ALU...

33
Control Signals
  • Note that ALUOp will come from the main control
    unit
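
The ALU control table itself did not survive in the transcript. The sketch below follows the usual textbook convention, which is an assumption about what the slide showed: a 2-bit ALUOp of 00 forces an add (lw/sw), 01 forces a subtract (beq), and 10 means "look at the funct field" for R-type instructions. The funct values are the standard MIPS encodings; operations are returned as names rather than bit patterns, since the ALU's select encoding from Chap. 4 is not shown here.

```python
# MIPS funct field -> ALU operation for the R-type subset used in this chapter.
FUNCT_TO_OP = {
    0x20: "add",
    0x22: "sub",
    0x24: "and",
    0x25: "or",
    0x2A: "slt",
}

def alu_control(alu_op, funct=0):
    """Small lookup: ALUOp from the main control unit, funct from IR[5..0]."""
    if alu_op == 0b00:      # lw / sw: compute base + offset
        return "add"
    if alu_op == 0b01:      # beq: subtract and test the Zero output
        return "sub"
    if alu_op == 0b10:      # R-type: the funct field decides
        return FUNCT_TO_OP[funct & 0x3F]
    raise ValueError("unused ALUOp encoding")

print(alu_control(0b00))          # add
print(alu_control(0b01))          # sub
print(alu_control(0b10, 0x2A))    # slt
```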

34
Designing the Main Control Unit
  • First, let's take a look at all our current
    control signals and their effect...

35
CPU with Control Unit
36
R-type Control
  • For an R-type instruction, let's decide what
    needs to be done (note: this is done in parallel)
  • Fetch instruction and increment PC by 4
  • Read two registers
  • ALU does computation
  • Result is written back to register file

37
Load/Store Control
  • Let's decide what needs to be done for a lw
    instruction
  • Fetch/increment PC
  • Read base register from reg. file
  • ALU computes effective address (base + offset)
  • Data from memory is written back to register file

38
Branch-on-Equal Control
  • Finally, let's decide what needs to be done in
    order to perform the beq instruction
  • Fetch/increment PC
  • Read two registers
  • ALU subtracts
  • ALU computes effective branch target
    (PC + 4 + offset, with the sign-extended offset
    shifted left by 2)
  • Zero result from ALU decides if we should write
    the new value to the PC
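
The per-instruction decisions on the last three slides can be summarized as one row of control-signal values per opcode. The table below follows the common textbook single-cycle design; signal names such as RegDst, ALUSrc, MemtoReg, and Branch are assumptions here, since the transcript only names MemRead, MemWrite, RegWrite, and ALUOp. `None` marks a don't-care.

```python
# opcode -> control signal values (None = don't care)
CONTROL = {
    0x00: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,        # R-type
               MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    0x23: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,        # lw
               MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    0x2B: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,  # sw
               MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    0x04: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,  # beq
               MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

def main_control(instruction):
    """Look up the control signals from the opcode field IR[31..26]."""
    return CONTROL[(instruction >> 26) & 0x3F]

def take_branch(signals, alu_zero):
    """The PC is loaded with the branch target only for beq with Zero set."""
    return bool(signals["Branch"] and alu_zero)

beq = (0x04 << 26) | (8 << 21) | (9 << 16) | 3
print(main_control(beq)["ALUOp"])              # 1
print(take_branch(main_control(beq), True))    # True
```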

39
Control Signals
40
Control
  • Next time we'll find out why a single-cycle CPU
    like this is not practical
  • We need a FSM to handle control in order to reuse
    components during a single instruction execution

41
Chapter 5: Datapath and Control (Part 3)
  • CS 447
  • Jason Bakos

42
Single-Cycle CPU
  • The single-cycle CPU from the last lecture
    had a CPI of 1
  • Clock cycle is determined by the longest possible
    path in the machine
  • Loads are the worst: they use 5 functional units
    in series
  • Performance, utilization, and efficiency are not
    going to be good, because most instructions don't
    need such a long clock cycle
  • A variable-speed clock could be used to solve
    this problem, but hinders parallelism
  • Pipelining overlaps instruction executions
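
A quick calculation shows why the load sets the clock. The latencies below are hypothetical round numbers chosen for illustration (they are not from the slides), and the unit lists are one reasonable reading of which functional units each instruction class uses in series.

```python
# Hypothetical functional-unit latencies in nanoseconds (assumed values).
LATENCY_NS = {"mem": 2, "reg": 1, "alu": 2}

# Functional units used in series by each instruction class.
PATHS = {
    "lw":    ["mem", "reg", "alu", "mem", "reg"],  # fetch, read, add, load, write back
    "sw":    ["mem", "reg", "alu", "mem"],
    "rtype": ["mem", "reg", "alu", "reg"],
    "beq":   ["mem", "reg", "alu"],
}

def path_delay(instr):
    """Total delay through the units this instruction class needs."""
    return sum(LATENCY_NS[unit] for unit in PATHS[instr])

# A single-cycle clock must be long enough for the slowest instruction.
single_cycle_period = max(path_delay(i) for i in PATHS)

for instr in PATHS:
    wasted = single_cycle_period - path_delay(instr)
    print(instr, path_delay(instr), "ns needed,", wasted, "ns idle")
```

With these numbers the period is 8 ns, set by lw, and a beq leaves 3 ns of every cycle unused.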

43
Multicycle Implementation
  • Break instructions into steps, where each step
    requires one clock cycle
  • We want to reuse functional units within an
    instruction instead of just across instructions
  • Reduces hardware
  • Use single memory for instructions and data
  • Single ALU instead of one ALU and two adders
  • Add registers to functional units to hold
    intermediate results (state data) for future
    cycles
  • Use within instruction executions
  • Register file and memory hold state data to be
    used across instruction executions
  • These are programmer-visible
  • We will need a FSM to control CPU

44
Registers
  • The locations of registers are determined by the
    following
  • Which combinational units will fit in one clock
    cycle
  • Assume memory access, regfile access (two reads
    or one write), or ALU operation
  • Any data needed by these operations must be
    stored in a temporary register
  • Instruction Register, Memory Data Register, A, B,
    and ALUOut registers added to design
  • All these except IR only need to hold data
    between two adjacent clock cycles
  • What data are needed in later cycles implementing
    the instruction

45
Multiplexors
  • Need to add extra multiplexors (or expand
    existing muxes) to facilitate the reuse of the
    ALU within instructions
  • Add mux to first ALU input
  • Expand mux to second ALU input

46
Multicycle CPU
47
Breaking Instruction Execution into Clock Cycles
  • Goal is to balance the latency of the operations
    performed during each clock cycle
  • At most one of the following can occur in series
  • One ALU operation
  • One register file access (or multiple in
    parallel)
  • One memory access (this is a joke, but we'll
    accept this for now)

48
Execution Stages
  • In order to clearly define the CPU operation for
    each step in the operation, we'll use RTL
    (register transfer language)
  • Architecture research has defined 5 standard
    phases of instruction execution
  • Instruction fetch
  • Decode
  • Fetch register values from register file
  • Execute
  • Perform arithmetic/logic operation
  • Memory
  • Load/Store memory
  • Write back
  • Write register result back to register file

49
Execution Stages
  • Fetch
  • IR = Memory[PC]
  • PC = PC + 4
  • Decode
  • A = Reg[IR[25..21]]
  • B = Reg[IR[20..16]]
  • ALUOut = PC + (sign_extend(IR[15..0]) << 2)
50
Execution Stages
  • Execute
  • Memory access
  • ALUOut = A + sign_extend(IR[15..0])
  • R-type
  • ALUOut = A op B
  • Branch (beq)
  • if (A == B) PC = ALUOut
  • Jump
  • PC = PC[31..28] || (IR[25..0] << 2)

51
Execution Stages
  • Memory Access/Write Back
  • Load
  • MDR = Memory[ALUOut]
  • Store
  • Memory[ALUOut] = B
  • R-type
  • Reg[IR[15..11]] = ALUOut
  • Memory Read Completion
  • Load
  • Reg[IR[20..16]] = MDR
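
The register transfers for the execution stages can be traced in software. The sketch below runs one instruction through those steps and returns how many clock cycles it took. It is a simplified model under stated assumptions: memory is a Python dict from byte address to 32-bit word, only add, sub, and, and or are handled for R-type, and the names `cpu`, `bits`, and `step_instruction` are invented here.

```python
def sign_extend16(x):
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def bits(word, hi, lo):
    """Extract word[hi..lo], e.g. bits(ir, 25, 21) is the rs field."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

R_OPS = {0x20: lambda a, b: a + b, 0x22: lambda a, b: a - b,
         0x24: lambda a, b: a & b, 0x25: lambda a, b: a | b}

def step_instruction(cpu):
    """Execute one instruction; return the number of clock cycles used."""
    reg, mem = cpu["reg"], cpu["mem"]
    # Cycle 1, fetch: IR = Memory[PC]; PC = PC + 4
    ir = mem[cpu["pc"]]
    cpu["pc"] = (cpu["pc"] + 4) & 0xFFFFFFFF
    # Cycle 2, decode: read A and B, precompute the branch target
    a, b = reg[bits(ir, 25, 21)], reg[bits(ir, 20, 16)]
    alu_out = (cpu["pc"] + (sign_extend16(bits(ir, 15, 0)) << 2)) & 0xFFFFFFFF
    opcode = bits(ir, 31, 26)
    # Cycle 3, execute
    if opcode == 0x04:                      # beq
        if a == b:
            cpu["pc"] = alu_out
        return 3
    if opcode == 0x02:                      # j
        cpu["pc"] = (cpu["pc"] & 0xF0000000) | (bits(ir, 25, 0) << 2)
        return 3
    if opcode == 0x00:                      # R-type
        alu_out = R_OPS[bits(ir, 5, 0)](a, b) & 0xFFFFFFFF
        reg[bits(ir, 15, 11)] = alu_out     # cycle 4: write back
        return 4
    alu_out = (a + sign_extend16(bits(ir, 15, 0))) & 0xFFFFFFFF
    if opcode == 0x2B:                      # sw
        mem[alu_out] = b                    # cycle 4: memory access
        return 4
    if opcode == 0x23:                      # lw
        mdr = mem[alu_out]                  # cycle 4: memory access
        reg[bits(ir, 20, 16)] = mdr         # cycle 5: read completion
        return 5
    raise ValueError("instruction not in the supported subset")

# lw $8, 0x100($0) at address 0, with the value 42 stored at 0x100.
cpu = {"pc": 0, "reg": [0] * 32,
       "mem": {0: (0x23 << 26) | (8 << 16) | 0x100, 0x100: 42}}
print(step_instruction(cpu), cpu["reg"][8], cpu["pc"])  # 5 42 4
```

The cycle counts that fall out (5 for lw, 4 for sw and R-type, 3 for beq and j) are what the control FSM has to sequence.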

52
Control Signals
  • Control Unit signals
  • Refer to figure 5.34 (pg. 384) in the book
  • ALU Control signals
  • Provide an appropriate ALUOp signal based on what
    the ALU is being used for (if for an R-type,
    perform lookup based on function code)

53
Control Signals
  • All that's left is for us to build the control
    unit as a FSM and the ALU control as a lookup
    table

54
Control Unit
  • The fetch and decode stages are the same for
    every instruction...

55
Control Unit
  • Here are the states and transitions for the
    memory-reference instructions

56
Control Unit
  • Here are the states and transitions for R-type,
    branch, and jump instructions

57
Control Unit
  • Final control unit FSM...

58
Problems to Think About
  • How could we add bne, blt, and bgez instructions
    to our CPU?
  • How do you calculate CPI for our CPU if we are
    given instruction-type distributions?
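
As a starting point for the CPI question, average CPI is the sum over instruction types of (fraction of instructions) × (cycles for that type). The cycle counts below follow from the multicycle execution stages (5 for loads, 4 for stores and R-type, 3 for branches and jumps); the instruction mix is a hypothetical example, not data from the course.

```python
# Clock cycles per instruction type in the multicycle design.
CYCLES = {"load": 5, "store": 4, "rtype": 4, "branch": 3, "jump": 3}

def average_cpi(mix):
    """mix maps instruction type -> fraction of executed instructions."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(fraction * CYCLES[kind] for kind, fraction in mix.items())

# Hypothetical instruction-type distribution (assumed for illustration).
mix = {"load": 0.22, "store": 0.11, "rtype": 0.49, "branch": 0.16, "jump": 0.02}
print(round(average_cpi(mix), 2))  # 4.04
```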