The Processor Data Path



1
The Processor: Data Path and Control
Chapter 5, Part 2 - Multi-Clock Cycle Design
  • N. Guydosh
  • 2/29/04

2
A Multicycle Design
  • As pointed out earlier, a single clock cycle
    design has a performance bottleneck: the
    instruction requiring the longest time
    determines the clock period for all other
    instructions, even simple ones such as jump (j).
  • The offending instruction is the load word
    memory instruction (lw), which uses five
    functional units in series:
  • Instruction memory (fetch)
  • The register file (read)
  • The ALU (compute address)
  • Data memory (read)
  • The register file (write)
  • Several instruction classes could fit into a
    shorter clock cycle, so overall performance is
    compromised: memory instructions may not be
    used frequently, yet lw determines the timing
    for all the simpler instructions.
  • In addition, the single cycle design needs more
    hardware: functional units cannot be multiplexed
    in time, so units such as memories and adders
    must be duplicated.
  • See the performance example on pp. 373-375.
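To make the bottleneck concrete, here is a minimal sketch that sizes a single-cycle clock as the sum of the delays lw sees in series. The unit delays (in picoseconds) are hypothetical values chosen for illustration, not the figures from the book's example, and treating j as needing only the instruction fetch is a simplifying assumption.

```python
# Hypothetical functional-unit delays in picoseconds (illustrative values,
# not the figures from the textbook's performance example).
DELAYS_PS = {
    "inst_mem": 200,   # instruction memory (fetch)
    "reg_read": 150,   # register file (read)
    "alu": 200,        # ALU (compute address)
    "data_mem": 200,   # data memory (read)
    "reg_write": 150,  # register file (write)
}

# The five units lw uses in series in the single-cycle design.
LW_PATH = ["inst_mem", "reg_read", "alu", "data_mem", "reg_write"]

# Simplifying assumption: a jump really only needs the instruction fetch.
J_PATH = ["inst_mem"]


def path_delay(path):
    """Total delay of functional units used one after another."""
    return sum(DELAYS_PS[unit] for unit in path)


# The single clock must cover the slowest instruction (lw), so every
# instruction, including j, occupies a full cycle of this length.
single_cycle_period = path_delay(LW_PATH)
idle_time_for_j = single_cycle_period - path_delay(J_PATH)

print(f"single-cycle period: {single_cycle_period} ps")  # 900 ps
print(f"time a jump sits idle: {idle_time_for_j} ps")     # 700 ps
```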

3
Multiple Clock Cycle Design: Overview
  • Allows a shorter clock cycle
  • Clock cycle is derived from the longest functional
    unit delay, not the longest total data path delay
  • Multiple clock pulses per instruction
  • Use one clock pulse for each functional unit
    (memory, register file, ALU, ...), so units are
    multiplexed (shared) in time rather than in
    space (as with the single clock design)
  • Instead of a single long clock pulse, use a
    sequence of short pulses. Average instruction
    time will be shorter: short instructions will
    not have to sit idle waiting for the clock to
    time out.
  • Another advantage: hardware is reduced.
    Instructions and data are stored in the same
    memory, a single ALU does all the arithmetic,
    and functional units are time-shared.
  • This approach is easily extended to pipelining,
    which allows multiple instructions to execute at
    one time and will further enhance performance
    (chapter 6).
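A rough comparison of the two approaches, as a sketch under stated assumptions: the delays and the instruction mix below are hypothetical, buffer-register overhead is ignored, and the cycle counts (3 for beq and j, 4 for sw and R-type, 5 for lw) come from the phase breakdown later in these notes.

```python
# Hypothetical delays in picoseconds, deliberately fairly balanced.
DELAYS_PS = {"mem": 200, "reg": 150, "alu": 200}

# Clock cycles (phases) per instruction class in the multicycle design.
CYCLES = {"lw": 5, "sw": 4, "rtype": 4, "beq": 3, "j": 3}

# Hypothetical dynamic instruction mix, in percent (sums to 100).
MIX_PCT = {"lw": 25, "sw": 10, "rtype": 45, "beq": 15, "j": 5}

# Single cycle: one long cycle sized for lw's serial path
# (instruction memory, register read, ALU, data memory, register write).
single_cycle_time = 2 * DELAYS_PS["mem"] + 2 * DELAYS_PS["reg"] + DELAYS_PS["alu"]

# Multicycle: the clock only has to cover the slowest single functional unit.
# (Buffer-register overhead is ignored in this sketch.)
multi_clock = max(DELAYS_PS.values())

# Average time per instruction = average CPI x clock period.
weighted_cycles = sum(MIX_PCT[k] * CYCLES[k] for k in MIX_PCT)  # CPI x 100
multi_avg_time = weighted_cycles * multi_clock / 100

print(f"single cycle: {single_cycle_time} ps per instruction")  # 900 ps
print(f"multicycle: {weighted_cycles / 100} CPI x {multi_clock} ps = {multi_avg_time} ps")
```

With these made-up numbers the multicycle design averages 810 ps per instruction against 900 ps. If the unit delays were badly unbalanced (for example, a register file much faster than memory), the gain would shrink or vanish, which is why the phases are chosen to balance the work.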

4
Multiple Clock Cycle Design: Design Details
  • We will use the single clock design as a starting
    point (fig. 5.29, p. 372).
  • This design will be compressed into what you
    see in fig. 5.30, p. 378: a single memory for
    instructions and data, and one ALU for all
    arithmetic (see next slide).
  • Single memory unit for both instructions and data
  • Single ALU instead of ALU and two adders
  • One or more buffer registers are added after
    every major functional unit to hold its output
    until the next clock cycle.
  • Because functional units are now shared by
    various phases of execution, we must add MUXs
    and extend some existing MUXs. See fig. 5.31,
    page 380 (see later).
  • A few additional hardware elements will also be
    added to resolve situations where stored data may
    change in one execution phase before a downstream
    phase gets to use it.
  • For now we add an instruction register (IR) to
    hold the instruction for later phases of
    execution, because the PC is changed during the
    fetch stage and the memory may be reused during
    execution.

5
Multi-Clock Cycle Design: Design Details, High-Level View
Fig 5.30: Add register buffers between functional
stages. A single memory holds instructions and
data; one ALU does all the arithmetic.
6
Multi-Clock Cycle Design: Support for Basic Instructions
Fig 5.31
7
Multiple Clock Cycle Design: Design Details (continued)
  • We will also need more control lines and must
    modify existing ones. Control is now sequential
    and dynamic:
  • Time- and opcode-dependent
  • Things start to get hairy when we design the
    controller ... but "Don't Panic!", as Douglas
    Adams would say.
  • The first cut at the data path design with
    control lines shown is fig 5.32, page 381. All
    storage elements will need a separate write
    signal, and a read signal is needed for memory.
    The old ALU controller from the single clock
    design will be reused.
  • See fig 5.34 p. 384 for a complete summary of
    the control line functions.

8
Multi-Clock Cycle Design: Control Lines Shown
Fig 5.32: Add and extend MUXs for hardware
sharing. Add the control lines, which must be
generated.
9
Multi-Clock Cycle Design: Control Units To Be Designed Are Depicted
[Figure annotation: write PC if (beq and ALU result is 0) or j instruction or PC+4]
Fig 5.33: Shows the control units to be designed. Compare to fig. 5.29 for the single
clock cycle design.
10
Multi-Clock Cycle Design: Control Lines Defined, Part 1
Fig 5.34 part 1. Notes:
  • lw uses rt as its destination register.
  • When RegWrite is de-asserted nothing is written;
    the register file is read by default.
  • MemtoReg chooses between memory and the ALU as
    the source of register write data.
  • IorD chooses between instruction-fetch and
    data-access addresses.
  • Although IRWrite causes the memory output to go
    to the IR, it also benignly goes to the MDR.
  • PCWrite changes the PC only for PC+4 or j
    instructions; it is de-asserted for beq.
  • PCWriteCond is for the beq instruction.
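The two PC write signals combine into a single load enable for the PC. A minimal sketch of that gate, using the signal names from Fig 5.34 as Python booleans (the function name is hypothetical):

```python
def pc_write_enable(pc_write: bool, pc_write_cond: bool, zero: bool) -> bool:
    """Load the PC when PCWrite is asserted (PC + 4 during fetch, or a jump),
    or when PCWriteCond is asserted and the ALU's Zero output is set (taken beq)."""
    return pc_write or (pc_write_cond and zero)


# Print the full truth table: PCWrite PCWriteCond Zero -> enable.
for pc_write in (False, True):
    for pc_write_cond in (False, True):
        for zero in (False, True):
            print(int(pc_write), int(pc_write_cond), int(zero), "->",
                  int(pc_write_enable(pc_write, pc_write_cond, zero)))
```

Keeping PCWrite and PCWriteCond separate is what lets beq leave the PC untouched when the comparison fails.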
11
Multi-Clock Cycle Design: Control Lines Defined, Part 2
Fig 5.34 part 2 (covers beq and the j instruction)
12
Multi-Clock Cycle Design: Splitting the Instruction Execution into Clock Cycle Phases
  • Goal: break up execution into phases in such a
    way as to balance the amount of work done in
    each phase.
  • Each phase will be one clock cycle and correspond
    to one of the functional units in the single
    clock cycle design
  • Restrict each phase to contain at most one ALU
    operation, or one register file (or other
    register) access, or one memory access.
  • All operations for one phase occur in parallel
    within one clock cycle.
  • This strategy also will put us in a good position
    for pipelining later.
  • NOTE: In the control signal descriptions that
    follow, we assume that if a control signal is
    not mentioned or assigned a value, it is
    de-asserted by default.
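The "de-asserted unless mentioned" convention can be captured in a small helper that fills in every unnamed signal with 0. This is a sketch only: the helper name `control_word` is hypothetical, and multi-bit fields are modeled as small integers.

```python
# Control signal names from the notes. Every signal defaults to 0
# (de-asserted) unless a phase explicitly sets it.
SIGNALS = (
    "PCWriteCond", "PCWrite", "IorD", "MemRead", "MemWrite", "MemtoReg",
    "IRWrite", "PCSource", "ALUOp", "ALUSrcB", "ALUSrcA", "RegWrite", "RegDst",
)


def control_word(**asserted: int) -> dict:
    """Build a complete set of control values for one clock cycle.

    Signals that are not named are de-asserted (0); multi-bit fields such
    as ALUOp, ALUSrcB and PCSource are given as small integers.
    """
    unknown = set(asserted) - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown control signal(s): {sorted(unknown)}")
    word = dict.fromkeys(SIGNALS, 0)
    word.update(asserted)
    return word


# Phase 1 (instruction fetch), listing only what the notes mention.
FETCH = control_word(MemRead=1, IRWrite=1, IorD=0, ALUSrcA=0,
                     ALUSrcB=0b01, ALUOp=0b00, PCSource=0b00, PCWrite=1)

# Phase 3 for an R-type instruction.
R_EXECUTE = control_word(ALUSrcA=1, ALUSrcB=0b00, ALUOp=0b10)

print(FETCH["MemWrite"], FETCH["RegWrite"])  # 0 0: never mentioned, so de-asserted
```

Rejecting unknown names catches typos such as `MemToReg`, which would otherwise silently leave the real signal de-asserted.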

13
Multi-Clock Cycle Design: Generic Phases, Phase 1
  • Phase 1: Instruction Fetch
  • IR <= Memory[PC]
  • PC <= PC + 4
  • Assert IRWrite and MemRead, set IorD to 0
    (select PC), and bump the PC by 4:
  • ALUSrcA = 0 (select PC), ALUSrcB = 01 (select
    the constant 4), ALUOp = 00 (add).
  • Store the incremented address back to the PC by
    setting PCSource to 00 and PCWrite to 1. Note that
    the incremented PC is also stored in ALUOut, which
    is redundant and benign. Note: setting PCSource to
    00 is not explicitly mentioned in the book.
  • Comment: accessing memory with the PC and
    incrementing the PC in the same cycle is allowed
    because of our edge-triggering assumption (see the
    examples in slides 3 and 4 of the 1st set of
    Chapter 5 PPT notes, single clock cycle). We
    assume that the PC value is captured by the memory
    unit before it gets updated, a consequence of edge
    triggering. This value is available at the very
    beginning of the clock cycle, and because of
    delays, the PC is updated a little later in the
    cycle.
  • Comment: it is also assumed that the memory can
    be read and its output set into the IR on this
    same clock cycle edge. Assume a faster
    point-to-point memory-to-IR connection rather than
    a bus. See the Elaboration on page 382.
  • Comment: it appears that the MDR also benignly
    gets the instruction, because the MDR is written
    unconditionally. If the instruction is lw, this
    value in the MDR will be correctly overwritten
    later in the instruction (phase 4). This is not
    mentioned in the book.
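The edge-triggering argument can be mimicked in software by computing every new register value from the old state and committing them all at once. A minimal sketch, assuming a byte-addressed memory modeled as a Python dict and a hypothetical `State` class holding only the registers phase 1 touches:

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Registers touched by phase 1 (a hypothetical model, not the full data path)."""
    pc: int = 0
    ir: int = 0
    mdr: int = 0
    memory: dict = field(default_factory=dict)  # byte address -> 32-bit word


def fetch(s: State) -> State:
    """Phase 1: IR <= Memory[PC]; PC <= PC + 4.

    Every right-hand side is computed from the *current* state and the new
    values are committed together, which is how an edge-triggered clock
    lets the memory use the old PC while the ALU produces PC + 4.
    """
    word = s.memory.get(s.pc, 0)       # memory read uses the old PC (IorD = 0)
    next_pc = (s.pc + 4) & 0xFFFFFFFF  # ALU: PC + 4 (ALUSrcA = 0, ALUSrcB = 01)
    # The MDR is also loaded with the same word; this is benign.
    return State(pc=next_pc, ir=word, mdr=word, memory=s.memory)


before = State(pc=0, memory={0: 0x8C820004})  # lw $2, 4($4)
after = fetch(before)
print(hex(after.ir), after.pc)  # 0x8c820004 4
```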

14
Multi-Clock Cycle Design: Generic Phases, Phase 2
  • Phase 2: Instruction Decode and Register Fetch
  • Optimistic (perhaps premature) actions are done
    here: downstream phases may not use all of the
    results, but this is faster and causes no problem
    if a result goes unused.
  • A <= Reg[IR[25-21]]  /* rs field */
    B <= Reg[IR[20-16]]  /* rt field */
    ALUOut <= Target_addr = PC + (sign_ext(IR[15-0]) << 2)
    /* target_addr calc is optimistic */
  • Target_addr is used if the instruction is a
    conditional branch; otherwise it is harmlessly
    discarded. It is easier to calculate it early and
    throw it away if not needed than to have to
    recalculate it later.
  • ALUSrcA set to 0 (choose PC), ALUSrcB set to
    11 (choose the offset field, which is both sign
    extended and shifted left 2 bits to a byte
    offset), and ALUOp set to 00 (add)
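A sketch of the phase 2 register reads and the optimistic target calculation, assuming a register file modeled as a list of 32 ints and an IR holding a MIPS I-format word; `decode` and `sign_extend16` are hypothetical helper names. Note that `pc` is the value already incremented in phase 1.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Sign-extend the low 16 bits of value to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode(pc: int, ir: int, regs: list) -> tuple:
    """Phase 2: A <= Reg[rs], B <= Reg[rt], ALUOut <= PC + (sign_ext(imm) << 2).

    `pc` is the value already incremented by 4 in phase 1.
    """
    rs = (ir >> 21) & 0x1F  # IR[25-21]
    rt = (ir >> 16) & 0x1F  # IR[20-16]
    a, b = regs[rs], regs[rt]
    target = (pc + (sign_extend16(ir) << 2)) & MASK32  # optimistic branch target
    return a, b, target


regs = list(range(32))       # register i holds the value i, for illustration
beq_back = 0x1022FFFF        # beq $1, $2, -1 (offset field = 0xFFFF)
a, b, target = decode(0x104, beq_back, regs)
print(a, b, hex(target))     # 1 2 0x100
```

A negative offset works because the 16-bit field is sign-extended before the shift, so a branch at 0x100 with offset -1 targets itself.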

15
Multi-Clock Cycle Design: Instruction-Content-Dependent Phases, Phase 3
  • Memory address computation or arithmetic-logical
    (R-type) execution
  • Memory reference for data (lw/sw):
  • ALUOut <= A + sign_extend(IR[15-0])  /* ALU
    operation */
  • ALUSrcA set to 1, ALUSrcB set to 10 (use the
    sign-extended value), ALUOp set to 00 (add)
  • Arithmetic-logical R-type instruction:
  • ALUOut <= A op B
  • ALUSrcA set to 1, ALUSrcB set to 00, ALUOp set to
    10; the funct field is used to determine the ALU
    control settings
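The reused ALU controller turns the 2-bit ALUOp (plus the funct field for R-type) into ALU control lines. Below is a sketch; the 4-bit encodings and funct values follow the usual MIPS textbook table and should be treated as an assumption if your edition differs, and the small `alu` function is a hypothetical stand-in for "A op B".

```python
# 4-bit ALU control encodings (assumed to match the textbook's ALU table).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# funct field (IR[5-0]) of the basic R-type instructions.
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}


def alu_control(alu_op: int, funct: int) -> int:
    """Map the 2-bit ALUOp from the main controller (plus funct) to the ALU lines."""
    if alu_op == 0b00:  # PC + 4, branch target, lw/sw address: add
        return ALU_ADD
    if alu_op == 0b01:  # beq: subtract so the Zero output can be tested
        return ALU_SUB
    if alu_op == 0b10:  # R-type: the funct field decides
        return FUNCT_TO_ALU[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:#04b}")


def signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control: int, a: int, b: int) -> int:
    """A tiny 32-bit ALU for the operations above (slt compares as signed)."""
    mask = 0xFFFFFFFF
    ops = {
        ALU_AND: lambda: a & b,
        ALU_OR: lambda: a | b,
        ALU_ADD: lambda: (a + b) & mask,
        ALU_SUB: lambda: (a - b) & mask,
        ALU_SLT: lambda: int(signed32(a) < signed32(b)),
    }
    return ops[control]()


# ALUOut <= A op B for "sub" (ALUOp = 10, funct = 100010).
print(hex(alu(alu_control(0b10, 0b100010), 5, 7)))  # 0xfffffffe
```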

16
Multi-Clock Cycle Design: Instruction-Content-Dependent Phases, Phase 3 (continued)
  • Branch completion
  • Branch (beq)
  • if (A == B) PC <= ALUOut
  • Equality compare of registers A and B: if they
    are equal, the ALU's Zero output is set, which
    means the branch is successful (taken).
  • ALUSrcA set to 1, ALUSrcB set to 00, ALUOp set
    to 01 (subtract), PCWriteCond asserted, PCSource
    set to 01  /* PC taken from ALUOut */.
    PCWrite is de-asserted (by default), which
    prevents the beq address from being written on an
    unsuccessful branch: the ALU Zero output must be
    set to use the beq address.
  • For successful beqs, we write the PC twice: once
    from the direct ALU output during fetch (PC + 4),
    and once from ALUOut as above in this step. The
    last write is the one used.
  • Jump (j)
  • PC <= PC[31-28] || (IR[25-0] << 2)  /* concatenate
    26-bit offset shifted to 28 bits */
    /* to PC high 4 bits */
  • Set PCSource to 10  /* selects jump address */
    and assert PCWrite to overwrite the current
    contents of the PC with the jump address. (01 is
    the beq setting that selects ALUOut.)
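Both phase 3 completions can be sketched as pure functions on 32-bit values. Assumptions: `pc` is the already-incremented PC from phase 1, `alu_out` holds the target computed in phase 2, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def beq_complete(pc: int, a: int, b: int, alu_out: int) -> int:
    """Phase 3 for beq. ALUOp = 01 makes the ALU compute A - B; Zero is set
    when the result is 0. PCWriteCond is asserted and PCSource = 01 selects
    ALUOut (the target computed in phase 2), so the PC changes only on a
    taken branch; otherwise it keeps the PC + 4 written in phase 1."""
    zero = ((a - b) & MASK32) == 0
    return alu_out if zero else pc


def jump_complete(pc: int, ir: int) -> int:
    """Phase 3 for j: PC <= PC[31-28] || (IR[25-0] << 2), with PCSource = 10
    and PCWrite asserted. `pc` is the already-incremented PC."""
    upper4 = pc & 0xF0000000           # keep PC bits 31-28
    offset28 = (ir & 0x03FFFFFF) << 2  # 26-bit field shifted to 28 bits
    return upper4 | offset28


print(hex(beq_complete(0x104, 7, 7, 0x200)))     # 0x200 (taken)
print(hex(beq_complete(0x104, 7, 8, 0x200)))     # 0x104 (not taken)
print(hex(jump_complete(0x90000008, 0x08000003)))  # 0x9000000c
```

Because the upper four bits come from the PC, a j instruction can only reach addresses within the same 256 MB region as the instruction following it.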

17
Multi-Clock Cycle Design: Instruction-Content-Dependent Phases, Phase 4
  • Memory access
  • MDR <= Memory[ALUOut]  /* for lw */
    Memory[ALUOut] <= B  /* for sw; source operand
    saved in B */
  • MemRead (for lw) or MemWrite (for sw) is
    asserted; IorD is set to 1 to use the data
    address instead of the PC
  • Arithmetic-logical R-type instruction completion
  • Reg[IR[15-11]] <= ALUOut  /* set rd from ALUOut */
  • RegDst set to 1 to pick rd and not rt; assert
    RegWrite; set MemtoReg to 0 to write from the ALU
    and not from memory
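A sketch of the three phase 4 actions, assuming memory is a Python dict keyed by byte address and the register file is a list of 32 ints; the helper names are hypothetical. For sw the sketch simply leaves the MDR unchanged, since sw never uses it.

```python
def memory_access(op: str, memory: dict, alu_out: int, b: int, mdr: int) -> int:
    """Phase 4 for lw/sw. IorD = 1, so ALUOut (not the PC) is the address.

    Returns the MDR value after the cycle.
    """
    if op == "lw":  # MemRead: MDR <= Memory[ALUOut]
        return memory.get(alu_out, 0)
    if op == "sw":  # MemWrite: Memory[ALUOut] <= B
        memory[alu_out] = b
        return mdr  # simplification: sw does not use the MDR, so leave it alone
    raise ValueError(f"not a memory instruction: {op}")


def rtype_complete(regs: list, ir: int, alu_out: int) -> None:
    """Phase 4 for R-type: Reg[IR[15-11]] <= ALUOut.

    RegDst = 1 picks rd, MemtoReg = 0 selects the ALU result, RegWrite is asserted.
    """
    rd = (ir >> 11) & 0x1F
    if rd != 0:  # MIPS register $0 always reads as zero
        regs[rd] = alu_out


memory = {}
memory_access("sw", memory, alu_out=0x40, b=123, mdr=0)
print(memory_access("lw", memory, alu_out=0x40, b=0, mdr=0))  # 123

regs = [0] * 32
rtype_complete(regs, ir=9 << 11, alu_out=55)  # rd field = 9
print(regs[9])  # 55
```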

18
Multi-Clock Cycle Design: Instruction-Content-Dependent Phases, Phase 5
  • Memory read completion step (for lw, the longest
    instruction)
  • Reg[IR[20-16]] <= MDR  /* write back to register
    from memory for lw; writes to the rt register */
  • Set MemtoReg to 1 to write from memory and not
    the ALU; assert RegWrite to cause a write to the
    register file; set RegDst to 0 to choose the rt
    register and not rd
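Putting the five phases together for lw gives a compact trace of the longest instruction. This is a sketch, assuming dict-based memory, a list-based register file, and hypothetical helper names; each phase corresponds to one clock cycle.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Sign-extend the low 16 bits of value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_lw(pc: int, memory: dict, regs: list) -> tuple:
    """Execute one lw through the five phases; return (new PC, cycles used)."""
    cycles = 0
    # Phase 1: IR <= Memory[PC]; PC <= PC + 4
    ir = memory[pc]
    pc = (pc + 4) & MASK32
    cycles += 1
    # Phase 2: A <= Reg[rs]; B <= Reg[rt]; ALUOut <= branch target (unused by lw)
    a = regs[(ir >> 21) & 0x1F]
    b = regs[(ir >> 16) & 0x1F]  # read anyway; lw does not need it
    alu_out = (pc + (sign_extend16(ir) << 2)) & MASK32
    cycles += 1
    # Phase 3: ALUOut <= A + sign_extend(IR[15-0])
    alu_out = (a + sign_extend16(ir)) & MASK32
    cycles += 1
    # Phase 4: MDR <= Memory[ALUOut]  (MemRead, IorD = 1)
    mdr = memory.get(alu_out, 0)
    cycles += 1
    # Phase 5: Reg[IR[20-16]] <= MDR  (RegDst = 0 picks rt, MemtoReg = 1)
    rt = (ir >> 16) & 0x1F
    if rt != 0:  # $0 is hard-wired to zero in MIPS
        regs[rt] = mdr
    cycles += 1
    return pc, cycles


memory = {0: 0x8C820004, 0x1004: 42}  # lw $2, 4($4) at address 0
regs = [0] * 32
regs[4] = 0x1000
print(run_lw(0, memory, regs), regs[2])  # (4, 5) 42
```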

19
Multi-Clock Cycle Design: Summary of Steps in Each Phase
Fig. 5.35
20
High Level View of Finite State Machine Control
Fig. 5.36
21
Instruction Fetch and Decode
Fig. 5.37
22
Memory Reference Instructions
Fig. 5.38
23
R-type Instruction
Fig. 5.39
24
Branch and Jump Instructions
Fig. 5.40: Branch (beq) instruction
Fig. 5.41: Jump instruction
25
Complete State Machine for Multi-cycle Controller
(fig 5.33)
Fig. 5.42
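The complete controller is a finite state machine whose next state depends on the current state and, after decode, on the opcode. A sketch of its next-state function follows. The state numbering is assumed to follow the common textbook figure (0 fetch, 1 decode, 2 address computation, 3/4 lw memory access and write-back, 5 sw memory access, 6/7 R-type execution and completion, 8 branch, 9 jump); check it against Fig. 5.42. The opcodes are the standard MIPS values.

```python
# MIPS opcodes (bits 31-26) for the instructions this controller supports.
OP_RTYPE, OP_J, OP_BEQ, OP_LW, OP_SW = 0x00, 0x02, 0x04, 0x23, 0x2B


# State numbers (assumed to match the textbook's figure):
# 0 fetch, 1 decode/register fetch, 2 memory address computation,
# 3 memory access (lw), 4 memory read completion (lw), 5 memory access (sw),
# 6 R-type execution, 7 R-type completion, 8 branch completion, 9 jump completion.
def next_state(state: int, opcode: int) -> int:
    """Next-state function of the multicycle control FSM."""
    if state == 0:
        return 1
    if state == 1:  # dispatch on the opcode after decode
        if opcode in (OP_LW, OP_SW):
            return 2
        if opcode == OP_RTYPE:
            return 6
        if opcode == OP_BEQ:
            return 8
        if opcode == OP_J:
            return 9
        raise ValueError(f"unsupported opcode {opcode:#04x}")
    if state == 2:
        return 3 if opcode == OP_LW else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    if state in (4, 5, 7, 8, 9):
        return 0  # instruction finished: go fetch the next one
    raise ValueError(f"unknown state {state}")


def cycles_for(opcode: int) -> int:
    """Count clock cycles from fetch until the FSM returns to state 0."""
    state, cycles = 0, 0
    while True:
        state = next_state(state, opcode)
        cycles += 1
        if state == 0:
            return cycles


for name, op in [("lw", OP_LW), ("sw", OP_SW), ("R-type", OP_RTYPE),
                 ("beq", OP_BEQ), ("j", OP_J)]:
    print(name, cycles_for(op))
```

Walking the FSM reproduces the cycle counts implied by the phase breakdown: 5 for lw, 4 for sw and R-type, and 3 for beq and j.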
26
A Possible Implementation of the Multi-cycle
Control Unit
See Appendix C for implementation details.
Fig. 5.43