Computer Architecture Lecture Notes, Spring 2005, Dr. Michael P. Frank

Learn more at: https://eng.fsu.edu
1
Computer Architecture Lecture Notes, Spring 2005
Dr. Michael P. Frank
  • (New) Competency Area 6
  • Introduction to Pipelining

2
Basic Pipelining Concepts
  • PH 3rd ed., Chapter 6
  • HP 3rd ed. A.1

3
Pipelining - The Basic Concept
  • In early CPUs, deep combinational logic networks
    were used in between state updates.
  • Signal delays may vary widely across different
    paths.
  • New input cannot be provided to the network until
    the slowest paths have finished.
  • Slow clock speed, slow overall processing rates.
  • In pipelined design, deep logic networks are
    subdivided into relatively shallow slices
    (pipeline stages).
  • Delays through the network are made uniform.
  • A new input can be provided to each slice as soon
    as its quick, shallow network has finished.
  • Multiple inputs are processed simultaneously
    across stages.
  • Clock cycle is only as long as the slowest
    pipeline stage.
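
The timing argument above can be sketched numerically. This is a minimal illustration, not part of the original notes: the slice delays and the per-latch overhead are hypothetical values, and the function name `pipeline_timing` is made up for this example.

```python
def pipeline_timing(stage_delays_ns, latch_overhead_ns=0.0):
    """Compare an unpipelined logic block with a pipelined version.

    stage_delays_ns: delay of each slice of the logic network (hypothetical).
    latch_overhead_ns: extra delay added by each pipeline register (assumed).
    """
    # Unpipelined: a new input must wait for the whole network to settle.
    unpipelined_cycle = sum(stage_delays_ns)
    # Pipelined: the clock cycle is set by the slowest stage, plus latch overhead.
    pipelined_cycle = max(stage_delays_ns) + latch_overhead_ns
    speedup = unpipelined_cycle / pipelined_cycle
    return unpipelined_cycle, pipelined_cycle, speedup


delays = [10.0, 8.0, 10.0, 9.0, 7.0]  # five slices, made-up delays in ns
unpiped, piped, speedup = pipeline_timing(delays, latch_overhead_ns=1.0)
print(unpiped, piped, speedup)
```

With these made-up numbers the five-stage version runs at a 4x faster clock rather than 5x, because the stages are not perfectly balanced and the latches add delay.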

4
Generic Pipelining Illustration
  • Let [gate symbol] represent any of a variety of logic
    gates
  • Initial, non-pipelined design for some random
    block of complex logic

[Figure: non-pipelined logic block between two latches]
5
Pipelining Illustration cont.
  • Aggressively pipelined version of same logic
  • Insert extra pipeline registers periodically
  • Here, after every 1-2 logic layers
  • This design can process 5x as much data at once!

[Figure: pipelined logic block between two latches]
6
Another View of Pipelining
  • Space-time diagrams
  • Here, each colored area shows which parts of the
    logic network are occupied with data computed
    from a given input item, at which times.

[Figure: space-time diagrams (depth in logic network vs. time) for Data 1 and Data 2; panels: Non-Pipelined, Pipelined (depth 6)]
7
Simple Multicycle RISC Datapath
[Figure: datapath with stages IF, ID, EX, MEM, WB; labels: Next PC, Program Counter, Inst. Reg., Load fr. Mem. Data]
8
Basic RISC Execution Pipeline
  • Basic idea of instruction-execution pipelining
  • Each instruction spends 1 clock cycle in each of
    the execution stages (in our example, there are
    5).
  • So during 1 clock cycle, the pipeline can be
    processing (different stages of) 5 different
    instructions simultaneously!
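
A small sketch of this overlap, assuming an ideal 5-stage pipeline with no stalls. The helper name `pipeline_diagram` is hypothetical; it builds one row per instruction and one column per clock cycle.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions, stages=STAGES):
    """Return a stall-free space-time table: rows are instructions, columns are cycles."""
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        # Instruction i enters the first stage in cycle i and advances one stage per cycle.
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows


rows = pipeline_diagram(5)
for i, row in enumerate(rows):
    print(f"i{i + 1}: " + " ".join(f"{cell:>3}" for cell in row))
```

Reading any single column of the printed table shows which instruction occupies each stage in that cycle; the fifth column is the first cycle in which all five stages are busy.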

[Figure: pipeline diagram, stage vs. time]
9
Different Visualizations
[Figure: pipeline visualizations, annotated: same time, different places; same instruction, different steps; same time, different data item/instruction; skew; same place, different times]
10
More Graphical Detail
11
Adding Pipeline Registers
12
Description of Pipe Stages
13
Dependences
  • (from HP 3rd ed. 3.1)

14
Dependences
  • A dependence is a way in which one instruction
    can depend on (be impacted by) another for
    scheduling purposes.
  • Three major dependence types
  • Data dependence
  • Name dependence
  • Control dependence
  • I'll sometimes use the word "dependency" for a
    particular instance of one instruction depending
    on another.
  • The instructions can't be effectively (as opposed
    to just syntactically) fully parallelized, or
    reordered.

15
Data Dependence
  • Recursive definition
  • Instruction B is data dependent on instruction A
    iff
  • B uses a data result produced by instruction A,
    or
  • There is another instruction C such that B is
    data dependent on C, and C is data dependent on
    A.
  • When a data dependence is present, there is a
    potential RAW hazard.
  • Loop: LD F0,0(R1)
  • ADDD F4,F0,F2
  • SD 0(R1),F4
  • SUBI R1,R1,8
  • BNEZ R1,Loop

[Figure: direct data dependencies in a simple example code fragment]
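
The recursive definition can be sketched as a direct-dependence pass followed by a transitive closure. This is an illustrative sketch only: the tuple encoding of instructions and the function name `data_dependences` are assumptions, only register dependences within one loop iteration are tracked, and memory and loop-carried dependences are ignored.

```python
def data_dependences(program):
    """program: list of (text, dest_reg_or_None, [source_regs]) in program order.

    Returns (direct, closure), each mapping an instruction index to the set of
    earlier instruction indices it is data dependent on.
    """
    last_writer = {}  # register -> index of the most recent instruction writing it
    direct = {i: set() for i in range(len(program))}
    for i, (_, dest, sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:
                direct[i].add(last_writer[reg])  # B uses a result produced by A
        if dest is not None:
            last_writer[dest] = i

    # Recursive case: B depends on C, and C depends on A.
    # Producers always precede consumers, so one in-order pass is enough.
    closure = {}
    for i in range(len(program)):
        deps = set(direct[i])
        for j in direct[i]:
            deps |= closure[j]
        closure[i] = deps
    return direct, closure


loop_body = [
    ("LD   F0,0(R1)", "F0", ["R1"]),
    ("ADDD F4,F0,F2", "F4", ["F0", "F2"]),
    ("SD   0(R1),F4", None, ["R1", "F4"]),
    ("SUBI R1,R1,8", "R1", ["R1"]),
    ("BNEZ R1,Loop", None, ["R1"]),
]
direct, closure = data_dependences(loop_body)
for i, (text, _, _) in enumerate(loop_body):
    print(text, "depends on", sorted(closure[i]))
```

Here SD depends directly on ADDD and, through the recursive rule, also on LD.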
16
Name Dependence
  • When two instructions access the same data
    storage location, but are not data dependent.
  • Also, at least one of the accesses must be a
    write.
  • Two sub-types (for inst. B after inst. A):
  • Antidependence: A reads, then B writes.
  • Potential for a WAR hazard.
  • Output dependence: A writes, then B writes.
  • Potential for a WAW hazard.
  • Note: Name dependencies can be avoided by
    changing instructions to use different locations
  • (Rather than reusing 1 location for 2 purposes.)
  • This fix is called renaming.

[Figure: timelines of instructions A and B for the two sub-types]
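
As a rough sketch of these definitions, the function below classifies the relationship between an earlier instruction A and a later instruction B from their read and write sets, and shows renaming removing an antidependence. The dictionary encoding, the function name `classify`, and the example instructions are hypothetical; the check looks only at the pair itself and ignores any instructions in between.

```python
def classify(a, b):
    """a, b: dicts with 'reads' and 'writes' sets of locations; a precedes b."""
    kinds = set()
    if a["writes"] & b["reads"]:
        kinds.add("data (RAW)")    # B reads what A wrote: true data dependence
    if a["reads"] & b["writes"]:
        kinds.add("anti (WAR)")    # A reads, then B writes the same location
    if a["writes"] & b["writes"]:
        kinds.add("output (WAW)")  # A writes, then B writes the same location
    return kinds


# Hypothetical pair: A = ADDD F4,F0,F2 ; B = LD F0,8(R1)
a = {"reads": {"F0", "F2"}, "writes": {"F4"}}
b = {"reads": {"R1"}, "writes": {"F0"}}
print(classify(a, b))

# Renaming: let B write a different register (F6) instead of reusing F0.
b_renamed = {"reads": {"R1"}, "writes": {"F6"}}
print(classify(a, b_renamed))
```

After renaming, the two instructions share no location with a write involved, so they can be reordered freely as far as this pair is concerned.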
17
Control Dependence
  • Occurs when the execution of an instruction
  • (as in, will it be executed, or not?)
  • depends on the outcome of some earlier,
    conditional branch instruction.
  • We generally can't easily change which branches
    an instruction depends on w/o ruining the
    program's functional behavior.
  • However, there are exceptions.

18
Hazards, Stalls, Forwarding
  • HP 3rd ed. A.2-3

19
Hazards
  • Hazards are circumstances which may lead to
    stalls in the pipeline if not addressed.
  • Stalls are delays, and may be called "bubbles".
  • There are three major types of hazards:
  • Structural hazards
  • Not enough HW resources to keep all instrs.
    moving.
  • Data hazards
  • Data results of earlier instrs. not yet avail.
    when needed.
  • Control hazards
  • Control decisions resulting from earlier instrs.
    (branches) not yet made, so we don't know which
    new instrs. to execute.

20
Structural Hazard Example
Suppose you had a combined instruction/data
memory with only 1 read port
21
Hazards Produce Bubbles
[Figure labels: Bubble rises; Progress through pipe; Time; Unskew]
22
Textual View
A pipeline stalled for a structural hazard: a
load with only one memory port
23
Example Data Hazards
24
Forwarding for Data Hazards
25
Another Forwarding Example
26
Three Types of Data Hazards
  • Let i be an earlier instruction, j a later one.
  • RAW (read after write)
  • j is supposed to Read a value After i Writes it,
  • But instead j tries to read the value before i
    has written it
  • WAW (write after write)
  • j should Write to a given place After i Writes
    there,
  • But they end up writing in the wrong order.
  • Only occurs if >1 pipeline stage can write.
  • WAR (write after read)
  • j should Write a new value After i Reads the old,
  • But instead j writes the new value before i has
    read the old one.
  • Only occurs if writes can happen before reads in
    pipeline.

27
An Unavoidable Stall
28
Stalling in midst of instruction
29
Data Hazard Prevention
  • A clever compiler can often reschedule
    instructions to avoid a stall.
  • A simple example
  • Original code:
      lw  r2, 0(r4)
      add r1, r2, r3    <-- Note: Stall happens here!
      lw  r5, 4(r4)
  • Transformed code:
      lw  r2, 0(r4)
      lw  r5, 4(r4)
      add r1, r2, r3    <-- No stall needed!
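
The effect of this reordering can be checked with a small counting sketch. It assumes a pipeline with forwarding in which the only remaining data stall is one cycle when an instruction uses the result of the immediately preceding load; the tuple encoding and the name `load_use_stalls` are made up for this illustration.

```python
def load_use_stalls(program):
    """program: list of (opcode, dest_reg, [source_regs]) in program order.

    Counts one stall for each instruction that reads the register loaded by
    the immediately preceding lw (assumed 1-cycle load delay with forwarding).
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls


original = [
    ("lw", "r2", ["r4"]),
    ("add", "r1", ["r2", "r3"]),  # uses r2 right after it is loaded
    ("lw", "r5", ["r4"]),
]
transformed = [
    ("lw", "r2", ["r4"]),
    ("lw", "r5", ["r4"]),         # independent load fills the delay
    ("add", "r1", ["r2", "r3"]),
]
print(load_use_stalls(original), load_use_stalls(transformed))
```

The reordering is legal here because the second load neither reads nor writes any register that the add touches.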

30
Simple RISC Pipeline Stall Statistics
Note that 1 in 5 loads causes a stall in many
programs!
[Figure: percentage of loads that cause a stall, by benchmark]
31
Data Hazard Detection
32
Hazard Detection Logic
  • Example: Detecting whether an instruction that
    has just been fetched needs to be stalled 1 cycle
    because of an immediately preceding load.

[Figure: pipeline stages IF, ID, EX, ME, WB with pipeline registers IF/ID, ID/EX, EX/ME, ME/WB]
33
Forwarding Situations in DLX
34
Implementing Forwarding in HW
35
Control Hazards, Branch Prediction, Delayed Branches
  • HP 3rd ed., A.2-3, 4.2

36
Control Hazards
  • Suppose the new PC value was not computed until
    the MEM stage (like orig. RISC design).
  • Then we must stall 3 clocks after every branch!
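
To get a feel for what a 3-clock stall after every branch costs, the sketch below computes the effective CPI as the base CPI plus branch frequency times stall cycles. The 20% branch frequency and the ideal base CPI of 1 are assumptions chosen for illustration, not figures from these notes; the 1-cycle case is likewise a hypothetical shorter penalty for comparison.

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles, base_cpi=1.0):
    """Effective CPI when every branch adds a fixed number of stall cycles."""
    return base_cpi + branch_fraction * stall_cycles


branch_fraction = 0.20  # hypothetical share of dynamic instructions that are branches

cpi_mem_stage = cpi_with_branch_stalls(branch_fraction, stall_cycles=3)
cpi_one_cycle = cpi_with_branch_stalls(branch_fraction, stall_cycles=1)
print(round(cpi_mem_stage, 2), round(cpi_one_cycle, 2))
```

Under these assumptions, resolving branches in MEM slows the machine by about 60% relative to the ideal CPI, which is why reducing the branch penalty matters.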

37
Early Branch Resolution
38
New Pipeline Logic
39
Control Instruction Statistics
  • 10% of dynamic insts. are fwd. cond. branches
  • only 3% are backwards cond. branches
  • similar percentage are unconditional branches

40
Stats on Taken Branches
67% of cond. branches are taken
41
Predict-Not-Taken
42
Delayed Branches
Machine code sequence: Branch instruction; Delay
slot instruction(s); Post-branch instructions.
[Figure annotation: Branch is taken (if taken) at this point]
43
Filling the Branch-Delay Slot
44
Static Branch Prediction
  • Earlier we discussed the predict-taken and
    predict-not-taken static prediction strategies
  • Applied uniformly across all branches in program
  • Static analysis in compiler may be able to do
    better, if it can non-uniformly predict whether
    each specific branch is likely to be taken or not
  • One way: Backwards taken, forwards not taken.
  • If we can do better, it can help with static code
    scheduling to reduce data hazard stalls
  • Also may assist later dynamic prediction

45
Prediction Helps Static Scheduling
Some data dependences
  • LD R1,0(R2)
  • DSUBU R1,R1,R3
  • BEQZ R1,else
  • OR R4,R5,R6
  • DADDU R10,R4,R3
  • J after
  • else: DADDU R7,R8,R9
  • after:

[Figure annotations: code movements to consider; potential load delay to fill; which way will this branch go?; if-then-else control flow with "if" case and "else" case]
46
Some Static Prediction Schemes
  • Always predict taken
  • 34% mispredict rate on SPEC (range 9-54%)
  • Backwards predict taken, forwards not taken
  • In SPEC, more than ½ of forwards are taken!
  • This does worse than the always-predict-taken
    strategy
  • Usu. not better than 30-40% misprediction rate
  • Better than either: Use profile information!
  • Collect statistics on earlier program runs.
  • Works well because individual branches tend to be
    strongly biased (taken or not) given average data
  • Bias tends to remain stable across multiple runs
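
The three schemes can be compared on a toy branch trace. Everything below is a hypothetical sketch: the trace is made up, the function names are invented for this example, and the profile is collected from the same trace it is evaluated on, which is an optimistic simplification of "earlier program runs".

```python
from collections import Counter


def always_taken(pc, target):
    return True


def backward_taken_forward_not(pc, target):
    # A branch whose target is at a lower address is a backward branch.
    return target < pc


def build_profile(trace):
    """Record each branch's majority direction (ties count as not taken)."""
    net = Counter()
    for pc, _, taken in trace:
        net[pc] += 1 if taken else -1
    return {pc: count > 0 for pc, count in net.items()}


def mispredict_rate(trace, predict):
    wrong = sum(1 for pc, target, taken in trace if predict(pc, target) != taken)
    return wrong / len(trace)


# Made-up trace of (branch_pc, target_pc, taken) records.
trace = [
    (24, 36, True), (40, 16, True), (60, 80, False),
    (24, 36, True), (40, 16, True), (60, 80, False),
    (24, 36, False), (40, 16, True), (60, 80, False),
    (24, 36, True), (40, 16, False), (60, 80, True),
]

profile = build_profile(trace)


def profile_based(pc, target):
    # Fall back to predict-taken for branches never seen while profiling.
    return profile.get(pc, True)


for name, predictor in [("always taken", always_taken),
                        ("backward taken / forward not", backward_taken_forward_not),
                        ("profile-based", profile_based)]:
    print(name, mispredict_rate(trace, predictor))
```

On this toy trace the per-branch profile wins because each branch is individually biased, including a forward branch that is usually taken, which the direction-based rule gets wrong.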

47
Profile-Based Predictor Statistics
[Figure: profile-based predictor statistics; surviving label: Floating-Point]
48
Predict-Taken vs. Profile-Based
[Figure: instructions executed in between mispredictions (log scale); surviving label: Floating-point]