Appendix A. Pipelining: Basic and Intermediate Concept - PowerPoint PPT Presentation

1
Appendix A. Pipelining: Basic and Intermediate Concept
Rung-Bin Lin
  • What is Pipelining?
  • Pipelining is an implementation technique whereby
    multiple instructions are overlapped in execution.
  • Pipe stage (pipe segment)
  • Throughput
  • Machine cycle: the time required to move an
    instruction one step down the pipeline. This
    time is equal to the time required for the
    slowest pipe stage.
  • In a computer, the machine cycle is usually one
    clock cycle.
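  • The pipeline designer's goal is to balance the
    length of each pipe stage.
  • If the stages are perfectly balanced, the time per
    instruction on the pipelined processor equals the
    time per instruction on the unpipelined machine
    divided by the number of pipe stages.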

2
A Simple Implementation of a RISC ISA
  • Five-cycle implementation
  • Instruction fetch cycle (IF)
  • Instruction decode/register fetch cycle (ID)
  • Operand fetches
  • Sign-extending the immediate field
  • Decoding is done in parallel with reading
    registers. This technique is known as fixed-field
    decoding
  • Test the branch condition and compute the branch
    target address, so the branch can complete by the
    end of this cycle.
  • Execution/effective address cycle (EX)
  • Memory reference
  • Register-Register ALU instruction
  • Register-Immediate ALU instruction
  • Memory access/branch completion cycle (MEM)
  • Write-back cycle (WB)
  • Register-Register ALU instruction
  • Register-Immediate ALU instruction
  • Load instruction

3
Performance of the Five-Cycle Implementation
  • CPI = 4.54
  • Branch instructions (12%) take 2 cycles
  • Store instructions (10%) require 4 cycles
  • All other instructions take 5 cycles
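
The CPI figure above is a weighted average over the instruction mix. A minimal sketch of that arithmetic follows; the 78% share for "other" instructions is not stated on the slide and is assumed here as the remainder after branches and stores.

```python
# Instruction mix: (fraction of instructions, cycles each).
# The 0.78 share for "other" is an assumption: 1 - 0.12 - 0.10.
mix = {
    "branch": (0.12, 2),
    "store": (0.10, 4),
    "other": (0.78, 5),
}


def average_cpi(mix):
    """Weighted average of cycles per instruction over the mix."""
    return sum(fraction * cycles for fraction, cycles in mix.values())


cpi = average_cpi(mix)
print(f"CPI = {cpi:.2f}")
```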

4
The Classic Five-Stage Pipeline for a RISC
Processor
5
The RISC Pipeline with Registers
6
Instruction Issue
  • The process of letting an instruction move from
    the instruction decode stage (ID) into the
    execution stage (EX) of this pipeline.

7
Basic Performance Issues in Pipelining
  • Pipelining increases instruction throughput, but
    it does not reduce the execution time of an
    individual instruction. In fact, pipeline overhead
    usually lengthens it slightly. Sources of overhead:
  • Register delay
  • Clock skew
  • The depth of a pipeline is limited by:
  • Pipeline latency
  • Pipe stage imbalance
  • Pipeline overhead
  • Example in A-10.
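
To make the throughput-versus-latency point concrete, here is a rough sketch. The stage delays and the 1 ns overhead are hypothetical numbers chosen for illustration; they do not come from the slides or from the example on page A-10.

```python
# Hypothetical stage delays in ns for IF, ID, EX, MEM, WB (assumed values).
stage_ns = [10, 8, 10, 10, 7]
overhead_ns = 1  # assumed register delay plus clock skew per stage

# Unpipelined: one instruction finishes before the next one starts.
unpipelined_time = sum(stage_ns)

# Pipelined: the slowest stage plus overhead sets the clock cycle.
cycle_time = max(stage_ns) + overhead_ns

# Each instruction still passes through every stage, one cycle per stage.
pipelined_latency = cycle_time * len(stage_ns)

# In steady state with no stalls, one instruction completes per cycle.
speedup = unpipelined_time / cycle_time

print(unpipelined_time, cycle_time, pipelined_latency, round(speedup, 2))
```

With these assumed numbers, the latency of a single instruction grows from 45 ns to 55 ns, while throughput improves by roughly a factor of four. Stage imbalance and overhead keep the speedup below the pipeline depth of five.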

8
The Major Hurdle of Pipelining - Pipeline
Hazards
  • A hazard is a situation that prevents the next
    instruction in the instruction stream from
    executing during its designated clock cycle.
  • Three classes of hazards
  • Structural hazards: arise from resource conflicts.
  • Data hazards: arise when an instruction depends on
    the results of a previous instruction.
  • Control hazards: arise from branches and other
    instructions that change the PC.
  • A pipeline can be stalled by a hazard. While an
    instruction is stalled to clear a hazard:
  • Instructions issued later than the stalled
    instruction are also stalled.
  • Instructions issued earlier than the stalled one
    must continue.
  • Note that a cache miss stalls the whole pipeline.

9
Performance of Pipeline with Stalls
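  • When pipelining is thought of as decreasing the
    CPI, speedup = CPI unpipelined / (1 + pipeline
    stall cycles per instruction), assuming an ideal
    pipelined CPI of 1 and equal clock cycle times.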

10
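  • When pipelining is thought of as improving the
    clock cycle time, speedup = (clock cycle
    unpipelined / clock cycle pipelined) / (1 +
    pipeline stall cycles per instruction), assuming
    the unpipelined CPI is 1.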
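
The two views of pipelined speedup with stalls can be sketched as below. This assumes the usual textbook model: an ideal pipelined CPI of 1, with stalls adding cycles on top of it. The function names and the sample numbers are illustrative only.

```python
def speedup_cpi_view(cpi_unpipelined, stall_cycles_per_instr):
    """Pipelining as a CPI reduction, with equal clock cycle times."""
    return cpi_unpipelined / (1.0 + stall_cycles_per_instr)


def speedup_clock_view(clock_unpipelined, clock_pipelined,
                       stall_cycles_per_instr):
    """Pipelining as a shorter clock cycle, with unpipelined CPI of 1."""
    clock_ratio = clock_unpipelined / clock_pipelined
    return clock_ratio / (1.0 + stall_cycles_per_instr)


# Illustrative numbers: a depth-5 pipeline with 0.25 stall cycles
# per instruction.
print(speedup_cpi_view(5, 0.25))
print(speedup_clock_view(5.0, 1.0, 0.25))
```

When the stages are perfectly balanced, both views reduce to pipeline depth / (1 + stall cycles per instruction).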

11
Structural Hazards
  • Due to resource conflicts (Example in A-14)
  • Arise when some functional unit is not fully
    pipelined.
  • Arise when some resource has not been duplicated
    enough.

12
Data Hazards
  • A memory access depends on the results of
    instructions that have not yet finished.

13
Forwarding (Bypassing) ALU Results To Minimize
Hazards
14
Forwarding (Bypassing) Results to Store
15
Bypassing Results of LOAD
16
Data Hazard Classification
  • Consider two instructions i and j, with i
    occurring before j. The possible hazards are:
  • RAW (read after write): j tries to read a source
    before i writes it.
  • WAW (write after write): j tries to write an
    operand before it is written by i. For example,
    LW   R1, 0(R2)     IF  ID  EX  MEM1  MEM2  WB
    DADD R1, R2, R3        IF  ID  EX    WB
  • WAR (write after read): j tries to write a
    destination before it is read by i. For example,
    if the read is done in the second half of MEM2 and
    the write is done in the first half of WB:
    SW   0(R1), R2     IF  ID  EX  MEM1  MEM2  WB
    DADD R2, R3, R4        IF  ID  EX    WB
  • RAR (read after read): not a hazard.
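
The classification above depends only on which registers each instruction reads and writes. The sketch below lists the register dependences between an earlier instruction i and a later instruction j. The `Instr` type and `dependences` function are hypothetical names. Whether a dependence becomes a real hazard still depends on the pipeline timing, which this sketch does not model.

```python
from typing import FrozenSet, List, NamedTuple


class Instr(NamedTuple):
    name: str
    writes: FrozenSet[str]  # destination registers
    reads: FrozenSet[str]   # source registers


def dependences(i: Instr, j: Instr) -> List[str]:
    """Dependences between earlier instruction i and later instruction j."""
    found = []
    if i.writes & j.reads:
        found.append("RAW")  # j reads what i writes
    if i.writes & j.writes:
        found.append("WAW")  # both write the same register
    if i.reads & j.writes:
        found.append("WAR")  # j overwrites what i still has to read
    return found


lw = Instr("LW R1, 0(R2)", frozenset({"R1"}), frozenset({"R2"}))
dadd1 = Instr("DADD R1, R2, R3", frozenset({"R1"}), frozenset({"R2", "R3"}))
sw = Instr("SW 0(R1), R2", frozenset(), frozenset({"R1", "R2"}))
dadd2 = Instr("DADD R2, R3, R4", frozenset({"R2"}), frozenset({"R3", "R4"}))

print(dependences(lw, dadd1))
print(dependences(sw, dadd2))
```

Two reads of the same register produce no entry, matching the RAR case.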

17
Data Hazards Requiring Stalls
  • Pipeline interlock
  • A piece of hardware that detects a hazard and
    stalls the pipeline until the hazard is cleared.
  • Load interlock
  • Example (Fig. A.10 at A-21)

18
Control Hazards
  • Caused by instructions that change the PC.
  • Some basics
  • If a branch changes the PC to its target address,
    it is a taken branch. If it does not change the
    PC, it falls through or it is not taken.
  • Recall that if an instruction i is a taken
    branch, the PC is normally not changed until the
    end of ID. A stall cycle is required.
    Branch instruction     IF   ID   EX   MEM  WB
    Branch successor            IF   IF   ID   EX   MEM  WB
    Branch successor + 1                  IF   ID   EX   MEM  WB
    Branch successor + 2                       IF   ID   EX   MEM  WB

19
Branch Penalty
  • Branch delay: the length of a control hazard.
  • Branch penalty: the branch delay, unless it is
    dealt with, turns into a branch penalty.
  • The deeper the pipeline, the worse the branch
    penalty.
  • The number of branch stalls can be reduced by two
    steps
  • Find out whether the branch is taken or not taken
    earlier in the pipeline.
  • Compute the taken PC (i.e., the address of the
    branch target) earlier.
  • Branch behavior in programs
  • Average frequency of taken branches: 67%
  • 60% of the forward branches are taken.
  • 85% of the backward branches are taken.

20
Reducing Pipeline Branch Penalties
  • Static branch prediction methods (Compile-time
    guess).
  • Freeze or flush the pipeline
  • Holding or deleting any instructions after the
    branch until the branch destination is known.
  • Predict-not-taken (untaken) (Fig. A.12 in A-23)
  • Predict-taken
  • Does it have any advantage? Answer: no, because in
    this pipeline the target address is not known any
    earlier than the branch outcome.
  • Delayed branch
  • The execution order with a branch delay of length
    n is
  • Branch instruction
  • Sequential successor 1
  • Sequential successor 2
  • Sequential successor n (n = 1 for MIPS)
  • Branch target if taken
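
A rough way to compare the first two schemes is to add the expected branch stall cycles to an ideal CPI of 1. The sketch below reads the slide figures as a 12% branch frequency and a 67% taken rate, and assumes the one-cycle branch delay from the control-hazard slide. The function names are hypothetical, and other stall sources are ignored.

```python
BRANCH_FREQUENCY = 0.12  # fraction of instructions that are branches
TAKEN_FRACTION = 0.67    # fraction of branches that are taken
BRANCH_DELAY = 1         # stall cycles when the fetched successor is unusable


def cpi_freeze_or_flush(freq=BRANCH_FREQUENCY, delay=BRANCH_DELAY):
    """Every branch pays the full branch delay."""
    return 1.0 + freq * delay


def cpi_predict_not_taken(freq=BRANCH_FREQUENCY, taken=TAKEN_FRACTION,
                          delay=BRANCH_DELAY):
    """Only taken branches pay the delay; untaken ones fall through."""
    return 1.0 + freq * taken * delay


print(round(cpi_freeze_or_flush(), 4))
print(round(cpi_predict_not_taken(), 4))
```

A delayed branch could be modeled the same way, with the penalty paid only when the delay slot cannot be filled with useful work. That fill rate is not given in the slides, so it is left out here.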

21
Scheduling the Branch Delay Slot
22
Effectiveness of Scheduling Branch Delay Slots
  • Requirements for being effective
  • Scheduling from before: always effective
  • Scheduling from target: effective when the branch
    is taken
  • Scheduling from fall-through: effective when the
    branch is not taken
  • The limitation on delayed-branch scheduling
    arises from
  • The restrictions on the instructions that are
    scheduled into the delay slots.
  • The ability to predict at compile time whether a
    branch is likely to be taken or not.
  • Using a canceling (nullifying) branch to relieve
    these limits
  • In a canceling branch, the instruction includes
    the direction in which the branch was predicted.
    When the branch behaves as predicted, the
    instruction in the branch delay slot is simply
    executed. Otherwise, the instruction in the
    branch delay slot is turned into a no-op.

23
How Is Pipelining Implemented?
  • Unpipelined 5-cycle implementation

24
Simple Pipelining Implementation for MIPS
25
Implementing the Control for MIPS Pipeline
  • Implementing the control focuses on detecting
    hazards and generating control signals for
    forwarding.
  • Hazard detection
  • All the data hazards can be checked and
    forwarding control signals can be set during the
    ID phase. If a data hazard exists, the
    instruction is stalled before it is issued.
  • Alternatively, hazards and forwarding are checked
    at the beginning of a clock cycle that uses an
    operand (EX and MEM for the MIPS pipeline).
  • Implementing the logic for hazard detection
  • Hazard detection by comparing the destination and
    sources of adjacent instructions (fig. A.20 on
    page A-34).
  • An example shows the detection of all load
    interlocks while the instruction using the load
    result is in the ID stage (fig. A.21 on page A-34).
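
The load interlock check compares the destination of a load that has reached EX with the sources of the instruction still in ID. A simplified sketch follows. The `Latch` record and its field names are hypothetical stand-ins for the ID/EX and IF/ID pipeline registers, and the special case of a hardwired zero register is ignored.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Latch:
    """Hypothetical view of the instruction held in a pipeline register."""
    opcode: str
    dest: Optional[str] = None
    sources: Tuple[str, ...] = ()


def load_interlock(id_ex: Latch, if_id: Latch) -> bool:
    """True when the instruction in ID needs the result of a load in EX."""
    return (
        id_ex.opcode == "LD"
        and id_ex.dest is not None
        and id_ex.dest in if_id.sources
    )


load = Latch("LD", dest="R1", sources=("R2",))
user = Latch("DSUB", dest="R4", sources=("R1", "R5"))
other = Latch("DADD", dest="R6", sources=("R7", "R8"))

print(load_interlock(load, user))
print(load_interlock(load, other))
```

When the check fires, the control logic would hold the instruction in ID for one cycle and send a no-op down the pipeline in its place.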

26
Implementing Forwarding Logic
  • Forwarding sources: ALU or data memory output.
  • Forwarding destinations: ALU input, data memory
    input, or zero detection unit (for BRANCH).
  • The forwarding can be implemented by checking the
    following conditions:
  • EX/MEM.IR.destination = ID/EX.IR.source?
  • MEM/WB.IR.destination = ID/EX.IR.source?
  • MEM/WB.IR.destination = EX/MEM.IR.source?
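
The first two conditions decide what feeds an ALU input. A minimal sketch of that selection is shown below, with the hypothetical function `forward_select` standing in for the multiplexer control. It gives priority to the EX/MEM value because that is the more recent result. The third condition (forwarding store data) is not modeled.

```python
from typing import Optional


def forward_select(id_ex_source: str,
                   ex_mem_dest: Optional[str],
                   mem_wb_dest: Optional[str]) -> str:
    """Choose the origin of one ALU operand for the instruction in EX."""
    if ex_mem_dest is not None and ex_mem_dest == id_ex_source:
        return "EX/MEM"   # result of the immediately preceding instruction
    if mem_wb_dest is not None and mem_wb_dest == id_ex_source:
        return "MEM/WB"   # result from two instructions back
    return "REGFILE"      # no forwarding needed


print(forward_select("R1", "R1", None))
print(forward_select("R1", "R3", "R1"))
print(forward_select("R1", "R1", "R1"))
print(forward_select("R1", None, None))
```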

27
Forwarding Data to the Two ALU Inputs
28
Dealing with Branches in the Pipeline
29
What Makes Pipelining Hard to Implement
  • Exceptions (interrupts, faults) make pipelining
    difficult to implement.
  • Instruction set complications

30
Types of Exceptions
  • Types
  • I/O device request
  • Invoking an OS service from a user program
  • Tracing instruction execution
  • Breakpoint
  • Integer arithmetic overflow or underflow
  • FP arithmetic anomaly
  • Page fault
  • Misaligned memory access
  • Memory-protection violation
  • Using an undefined instruction
  • Hardware malfunction
  • Power failure
  • Exceptions for different architectures (fig. A.26
    on page A-40).

31
Classification of Exceptions
  • Synchronous versus asynchronous
  • If the event occurs at the same place every time
    that the program is executed with the same data
    and memory allocation, the event is called
    synchronous.
  • User requested versus coerced
  • User maskable versus nonmaskable
  • Within versus between instruction
  • Depends on whether the event prevents instruction
    completion by occurring in the middle of
    execution or whether it is recognized between
    instructions.
  • Resume versus terminate (fig. 3.40 on page 182).

32
Action Requirements for Different Exception Types
(Fig. A.27 on page A-42)
  • Actions
  • Resume
  • Terminate
  • The most difficult exceptions have two
    properties
  • They occur within instructions (i.e. at EX or MEM
    stages).
  • They must be restartable (must save the PC of the
    instruction at which to restart).

33
Exception Handling
  • Stopping and restarting execution
  • Force a trap instruction on the next IF
  • Until the trap is taken, turn off all writes for
    the faulting instruction and for all instructions
    that follow in the pipeline.
  • After the exception-handling routine in the
    operating system receives control, it immediately
    saves the PC of the faulting instruction.
    IF   ID   EX   MEM  WB    <--- Faulting instruction
         IF   ID   EX   MEM  WB
              IF   ID   EX   MEM  WB
                   IF   ID   EX   MEM  WB
                        IF   ID   EX   MEM
    Trap instruction -->     IF   ID   EX
  • If delayed branch is used, we need to save and
    restore as many PCs as the length of the branch
    delay plus one.

34
Precise Interrupt
  • A pipeline has precise interrupts if it can be
    stopped so that the instructions just before the
    faulting instruction are completed and those after
    it can be restarted from scratch.
  • Supporting precise interrupts is a requirement in
    many systems.
  • Exceptions in DLX
  • With pipelining, multiple exceptions may occur in
    the same clock cycle. (fig. A.28 on page A-44).

35
Implementations of Precise Exceptions
  • Principle
  • The pipeline should be able to handle the
    exceptions caused by instruction i prior to the
    exceptions caused by instruction i+1.
  • Implementation
  • Hardware posts all exceptions caused by a given
    instruction in a status vector associated with
    that instruction.
  • Once an exception indication is set in the
    exception status vector, any control signal that
    may cause a data value to be written is turned
    off.
  • When an instruction enters WB, the exception
    status vector is checked. If any exceptions are
    posted, they are handled in the order in which
    they would occur in time on an unpipelined
    machine.
  • This guarantees that all exceptions will be
    seen on instruction i before any are seen on i+1.
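
The status-vector scheme above can be sketched as follows. The class and function names are hypothetical and the pipeline itself is not simulated. The sketch only shows that checking the vectors in program order at WB yields the exception of instruction i before that of i+1, even when i+1 faulted earlier in time.

```python
STAGE_ORDER = ["IF", "ID", "EX", "MEM"]  # order of stages on an unpipelined machine


class InFlight:
    """An instruction in the pipeline with its exception status vector."""

    def __init__(self, name):
        self.name = name
        self.exceptions = []        # posted (stage, cause) pairs
        self.writes_enabled = True

    def post(self, stage, cause):
        self.exceptions.append((stage, cause))
        self.writes_enabled = False  # turn off any state-changing write


def check_at_wb(instr):
    """Return the exception to handle first for this instruction, or None."""
    if not instr.exceptions:
        return None
    return min(instr.exceptions, key=lambda e: STAGE_ORDER.index(e[0]))


def first_exception(program_order):
    """Instructions reach WB in program order, so scan them in that order."""
    for instr in program_order:
        exc = check_at_wb(instr)
        if exc is not None:
            return instr.name, exc
    return None


ld = InFlight("LD")
dadd = InFlight("DADD")
dadd.post("IF", "page fault")  # posted first in time (i+1 is in IF)
ld.post("MEM", "page fault")   # posted later in time (i is in MEM)

print(first_exception([ld, dadd]))
```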

36
Instruction Committed
  • When an instruction is guaranteed to complete, it
    is called committed.
  • In the MIPS pipeline, all instructions are
    committed when they reach the end of the MEM
    stage and no instruction updates the state before
    that stage. Thus precise exceptions are
    straightforward.

37
Instruction Set Complications
  • Some machines have instructions that change the
    state in the middle of the instruction execution.
  • VAX Autoincrement addressing mode.
  • VAX or IBM 360 String copy.
  • Implicitly set condition code.
  • This causes difficulties in scheduling any
    pipeline delays between setting the condition code
    and the branch.
    ADD XXX      <--- Sets condition code C.
    ...          <--- Cannot place instructions that
                      change C here.
    BR C, YYY    <--- Uses C for the branch.
  • In fact, the condition code must be treated as an
    operand that requires hazard detection for RAW
    hazards with branches, regardless of whether the
    condition code is set implicitly or explicitly.
  • Multicycle operations in VAX

38
Extending the MIPS Pipeline to Handle Multi-Cycle
Operations
  • Assuming four separate functional units in our
    MIPS implementation
  • Integer unit
  • Handle loads and stores, ALU operations and
    branches.
  • FP and integer multiplier
  • FP adder
  • FP and integer divider
  • If an instruction cannot proceed to the EX stage,
    the entire pipeline behind that instruction
    will be stalled.

39
MIPS Pipeline with Multi-cycle Functional Units
40
Pipelining Multi-cycle Functional Units
41
Latency and Initiation (Repeat) Interval
  • Latency
  • The number of intervening cycles between an
    instruction that produces a result and an
    instruction that uses the result.
  • Initiation (repeat) interval
  • The number of cycles that must elapse between
    issuing two operations of a given type.
  • Latency and initiation interval for pipelining
    multi-cycle functional units
    Functional unit          Latency   Initiation interval
    Integer ALU              0         1
    Data memory access       1         1
    FP add                   3         1
    FP (integer) multiply    6         1
    FP (integer) divide      24        25
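
Under the definitions above, the table translates directly into stall counts. This is a simplified sketch: the helper names are hypothetical, and special cases such as a store consuming an FP result are ignored.

```python
LATENCY = {
    "Integer ALU": 0,
    "Data memory access": 1,
    "FP add": 3,
    "FP multiply": 6,
    "FP divide": 24,
}
INITIATION_INTERVAL = {
    "Integer ALU": 1,
    "Data memory access": 1,
    "FP add": 1,
    "FP multiply": 1,
    "FP divide": 25,
}


def raw_stalls(producer_unit, independent_between=0):
    """Stalls for a consumer issued after `independent_between` unrelated
    instructions that follow the producer."""
    return max(0, LATENCY[producer_unit] - independent_between)


def structural_stalls(unit, cycles_since_last_issue):
    """Stalls before the same unit can accept another operation."""
    return max(0, INITIATION_INTERVAL[unit] - cycles_since_last_issue)


print(raw_stalls("FP add"))               # dependent use right after an FP add
print(raw_stalls("FP multiply", 2))       # two independent instructions between
print(structural_stalls("FP divide", 1))  # back-to-back divides
```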

42
Hazards and Forwarding in Longer Latency Pipelines
  • Hazard detection and forwarding for a pipeline as
    before.
  • Structural hazards can occur because the divide
    unit is not fully pipelined.
  • The number of register writes required in a cycle
    can be larger than 1 because the instructions have
    varying running times.
  • WAW hazards are possible, but WAR hazards are not
    possible.
  • Instructions can complete in a different order
    than they were issued, causing problems with
    exceptions.
  • Stalls for RAW hazards will be more frequent
    because of longer latency.
  • Assuming all hazard detection is done in ID,
    three checks must be done before issuing an
    instruction
  • Check for structural hazards
  • Check for a RAW data hazard
  • Check for a WAW data hazard

43
RAW Hazards Caused by Longer Pipeline
  • Fig. A.33

44
Structural Hazards in Longer Pipeline
  • Fig. A.34

45
Maintaining Precise Exceptions (1)
  • Problems caused by out-of-order completion
  • DIV.D F0, F2, F4
  • ADD.D F10, F10, F8
  • SUB.D F12, F12, F14
  • Four possible approaches
  • Ignore the problem and settle for imprecise
    exceptions
  • Buffer the results of an operation until all the
    operations that were issued earlier are
    completed.
  • History file approach: buffer the original
    register values.
  • Future file approach: keep the newer values of
    registers.
  • Allow the exceptions to become somewhat
    imprecise, but keep enough information so that
    the trap-handling routines can create a precise
    sequence for exceptions. This means knowing what
    operations were in the pipeline and their PCs.

46
Maintaining Precise Exceptions (2)
  • Worst-case scenario
  • Instruction 1: a long-running instruction that
    interrupts.
  • Instruction 2: not completed.
  • ...
  • Instruction n-1: not completed.
  • Instruction n: completed. <-- The latest
    completed instruction.
  • The software must simulate instruction 1
    through instruction n-1 and restart the execution
    at instruction n+1.
  • Allow instruction issue to continue only if
    it is certain that all the instructions before
    the issuing instruction will complete without
    causing an exception. This sometimes means
    stalling the machine to maintain precise
    exceptions.

47
Number of Stalls per FP Operation
48
Performance of a MIPS FP Pipeline
49
Overview of The MIPS R4000 Pipeline
  • An implementation of MIPS64
  • Eight pipeline stages (superpipelining)

50
Load Delay in MIPS R4000
51
Branch Delay in MIPS R4000
52
CPI of MIPS R4000
53
Concluding Remarks
  • We can spend a little money to buy a very
    powerful computer today.