Title: Computer Architecture Lecture Notes, Spring 2005, Dr. Michael P. Frank
- (New) Competency Area 6
- Introduction to Pipelining
Basic Pipelining Concepts
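- PH 3rd ed. (Patterson & Hennessy, Computer Organization and Design), Chapter 6
- HP 3rd ed. (Hennessy & Patterson, Computer Architecture: A Quantitative Approach), Appendix A.1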
Pipelining - The Basic Concept
- In early CPUs, deep combinational logic networks were used in between state updates.
  - Signal delays may vary widely across different paths.
  - New input cannot be provided to the network until the slowest paths have finished.
  - Slow clock speed, slow overall processing rates.
- In a pipelined design, deep logic networks are subdivided into relatively shallow slices (pipeline stages).
  - Delays through the network are made (roughly) uniform.
  - A new input can be provided to each slice as soon as its quick, shallow network has finished.
  - Multiple inputs are processed simultaneously across stages.
  - The clock cycle is only as long as the slowest pipeline stage (plus the pipeline-latch overhead).
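The clock-period argument above can be checked with a little arithmetic. This is a minimal sketch assuming hypothetical slice delays and a fixed per-latch overhead (none of these numbers come from the notes): the non-pipelined clock must cover the whole network, while the pipelined clock only covers the slowest slice plus its latch.

```python
def clock_periods(stage_delays_ns, latch_overhead_ns):
    """Return (non-pipelined period, pipelined period) in ns.

    Assumes the non-pipelined design has one latch around the whole network,
    and the pipelined design has one latch per slice, each adding the same
    fixed overhead."""
    # Non-pipelined: the clock must cover every slice back to back.
    unpipelined = sum(stage_delays_ns) + latch_overhead_ns
    # Pipelined: the slowest slice (plus its latch) sets the clock.
    pipelined = max(stage_delays_ns) + latch_overhead_ns
    return unpipelined, pipelined


# Hypothetical slice delays (ns) and latch overhead, chosen for illustration.
delays = [2.0, 1.5, 2.5, 2.0, 2.0]
t_single, t_pipe = clock_periods(delays, latch_overhead_ns=0.5)
print(f"non-pipelined clock: {t_single} ns")
print(f"pipelined clock:     {t_pipe} ns")
print(f"throughput gain once the pipe is full: {t_single / t_pipe:.2f}x")
```

With these made-up numbers the gain is 3.5x rather than 5x, because the slices are not perfectly balanced and every slice pays the latch overhead.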
Generic Pipelining Illustration
- Let the gate symbol in the figure represent any of a variety of logic gates.
- Initial, non-pipelined design for some random block of complex logic
[Figure: a deep block of logic gates between two latches]
Pipelining Illustration (cont.)
- Aggressively pipelined version of the same logic
- Insert extra pipeline registers periodically
  - Here, after every 1-2 logic layers
- This design can process 5x as much data at once!
[Figure: the same logic with extra pipeline registers inserted between the two latches]
Another View of Pipelining
- Space-time diagrams
- Here, each colored area shows which parts of the
logic network are occupied with data computed
from a given input item, at which times.
[Figure: space-time diagrams, depth in logic network vs. time, for Data 1 and Data 2; panels: Non-Pipelined and Pipelined (depth 6)]
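Simple Multicycle RISC Datapath
[Figure: multicycle RISC datapath divided into IF, ID, EX, MEM, and WB; labeled elements include the Program Counter, Next PC, Instruction Register, and Load from Memory Data]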
Basic RISC Execution Pipeline
- Basic idea of instruction-execution pipelining
  - Each instruction spends 1 clock cycle in each of the execution stages (in our example, there are 5).
  - Therefore, during 1 clock cycle, the pipeline can be processing (different stages of) 5 different instructions simultaneously!
[Figure: pipeline diagram, stage vs. time]
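The stage-vs-time picture can be generated directly. This sketch assumes an ideal 5-stage pipeline with no stalls, so instruction i simply enters IF in cycle i; the stage names are the ones used in these notes.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(n_instructions, stages=STAGES):
    """Row i lists the stage instruction i occupies in each clock cycle
    (empty string when it is not in the pipe). Assumes no stalls."""
    total_cycles = len(stages) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = []
        for cycle in range(total_cycles):
            k = cycle - i  # instruction i enters IF in cycle i
            row.append(stages[k] if 0 <= k < len(stages) else "")
        rows.append(row)
    return rows


chart = pipeline_chart(5)
for i, row in enumerate(chart):
    print(f"instr {i + 1}: " + " ".join(f"{s:>3}" for s in row))
```

Reading down the fifth column shows all five stages busy with five different instructions in the same cycle; n instructions finish in 5 + n - 1 cycles.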
Different Visualizations
[Figure: alternative pipeline diagrams; panel labels: "Same time, different places", "Same instruction, different steps", "Same time, different data item/instruction", "Skew", "Same place, different times"]
More Graphical Detail
Adding Pipeline Registers
Description of Pipe Stages
Dependences
- A dependence is a way in which one instruction can depend on (be impacted by) another for scheduling purposes.
- Three major dependence types:
  - Data dependence
  - Name dependence
  - Control dependence
- I'll sometimes use the word "dependency" for a particular instance of one instruction depending on another.
  - Such dependent instructions can't be effectively (as opposed to just syntactically) fully parallelized or reordered.
Data Dependence
- Recursive definition:
  - Instruction B is data dependent on instruction A if and only if
    - B uses a data result produced by instruction A, or
    - There is another instruction C such that B is data dependent on C, and C is data dependent on A.
- When a data dependence is present, there is a potential RAW hazard.

    Loop: LD    F0,0(R1)
          ADDD  F4,F0,F2
          SD    0(R1),F4
          SUBI  R1,R1,8
          BNEZ  R1,Loop

[Figure: direct data dependencies in a simple example code fragment; within one iteration: LD -> ADDD via F0, ADDD -> SD via F4, SUBI -> BNEZ via R1]
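The recursive definition amounts to "direct register dependences, then transitive closure". Below is a minimal sketch over the loop body above, assuming each instruction is hand-encoded as (text, destination register, source registers); only register dependences within one pass through the body are modeled (no memory dependences, no loop-carried edges).

```python
# Hand-encoded loop body: (text, destination register or None, source registers)
LOOP = [
    ("LD   F0,0(R1)", "F0", ["R1"]),
    ("ADDD F4,F0,F2", "F4", ["F0", "F2"]),
    ("SD   0(R1),F4", None, ["R1", "F4"]),  # stores write memory, not a register
    ("SUBI R1,R1,8", "R1", ["R1"]),
    ("BNEZ R1,Loop", None, ["R1"]),
]


def direct_deps(code):
    """Pairs (a, b): instruction b reads a register last written by instruction a."""
    deps = set()
    last_writer = {}
    for b, (_, dest, srcs) in enumerate(code):
        for r in srcs:
            if r in last_writer:
                deps.add((last_writer[r], b))
        if dest is not None:
            last_writer[dest] = b
    return deps


def all_deps(code):
    """Transitive closure: b depends on a directly, or via some c that
    depends on a and that b depends on (the recursive case)."""
    deps = direct_deps(code)
    changed = True
    while changed:
        changed = False
        for a, c in list(deps):
            for c2, b in list(deps):
                if c == c2 and (a, b) not in deps:
                    deps.add((a, b))
                    changed = True
    return deps


names = [text for text, _, _ in LOOP]
for a, b in sorted(all_deps(LOOP)):
    print(f"{names[b]!r} is data dependent on {names[a]!r}")
```

SD reads R1 before SUBI writes it, so that pair is not a data dependence; it is a name dependence (antidependence).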
Name Dependence
- Occurs when two instructions access the same data storage location but are not data dependent.
  - Also, at least one of the accesses must be a write.
- Two sub-types (for instruction B after instruction A):
  - Antidependence: A reads, then B writes.
    - Potential for a WAR hazard.
  - Output dependence: A writes, then B writes.
    - Potential for a WAW hazard.
- Note: Name dependencies can be avoided by changing instructions to use different locations (rather than reusing 1 location for 2 purposes).
  - This fix is called renaming.
[Figure: timelines of instruction A followed by instruction B for each sub-type]
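Renaming can be sketched as giving every result a fresh name and rewriting later reads to use the newest name. This assumes an unlimited supply of hypothetical physical registers P0, P1, ... and a made-up instruction encoding (op, destination, sources); real hardware has a finite register pool and must recycle names.

```python
def rename(code):
    """Give each destination a fresh physical register so that no two
    instructions write the same name (removes WAR and WAW dependences)."""
    mapping = {}  # architectural name -> current physical name
    out = []
    for n, (op, dest, srcs) in enumerate(code):
        # Read the sources through the current mapping *before* redefining dest.
        new_srcs = [mapping.get(s, s) for s in srcs]
        phys = f"P{n}"  # fresh name per result (unbounded supply assumed)
        mapping[dest] = phys
        out.append((op, phys, new_srcs))
    return out


code = [
    ("ADD", "R1", ["R2", "R3"]),  # writes R1
    ("SUB", "R4", ["R1", "R5"]),  # reads R1: true data dependence on ADD
    ("ADD", "R1", ["R6", "R7"]),  # writes R1 again: WAR with SUB, WAW with the first ADD
    ("OR", "R8", ["R1", "R1"]),   # reads the new R1
]
for instr in rename(code):
    print(instr)
```

After renaming, the second ADD writes P2 while SUB still reads P0, so the two ADDs can complete in either order; only the true data dependences remain.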
Control Dependence
- Occurs when the execution of an instruction (as in, will it be executed or not?) depends on the outcome of some earlier conditional branch instruction.
- We generally can't easily change which branches an instruction depends on without ruining the program's functional behavior.
  - However, there are exceptions.
Hazards, Stalls, Forwarding
Hazards
- Hazards are circumstances that may lead to stalls in the pipeline if not addressed.
  - Stalls are delays, and may also be called bubbles.
- There are three major types of hazards:
  - Structural hazards
    - Not enough HW resources to keep all instructions moving.
  - Data hazards
    - Data results of earlier instructions are not yet available when needed.
  - Control hazards
    - Control decisions resulting from earlier instructions (branches) have not yet been made, so we don't know which new instructions to execute.
Structural Hazard Example
Suppose you had a combined instruction/data memory with only 1 read port.
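A small simulation of this scenario, under simplifying assumptions: a classic 5-stage pipe (IF, ID, EX, MEM, WB), one shared memory port, a load's or store's MEM access wins over a later instruction's IF, and no other hazards. The instruction kinds "load", "store", and "alu" are an illustrative encoding.

```python
def fetch_cycles(instrs):
    """Return the cycle in which each instruction is fetched (IF).

    Assumes one shared memory port, MEM accesses take priority over IF,
    and no other stalls, so every fetched instruction advances one stage
    per cycle."""
    port_busy = set()  # cycles in which an earlier load/store's MEM stage holds the port
    fetched = []
    cycle = 1
    for kind in instrs:
        while cycle in port_busy:  # structural hazard: IF must wait (a bubble)
            cycle += 1
        fetched.append(cycle)
        if kind in ("load", "store"):
            port_busy.add(cycle + 3)  # IF, ID, EX, MEM: MEM is 3 cycles after IF
        cycle += 1
    return fetched


print(fetch_cycles(["load", "alu", "alu", "alu", "alu"]))
```

The fourth instruction would normally be fetched in cycle 4, but the load is using the port for its MEM access then, so the fetch slips to cycle 5 and a bubble enters the pipe.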
Hazards Produce Bubbles
[Figure: a bubble rising through the pipeline; axes: progress through pipe vs. time, shown skewed and unskewed]
Textual View
A pipeline stalled for a structural hazard: a load with only one memory port.
Example Data Hazards
Forwarding for Data Hazards
Another Forwarding Example
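A rough model of why forwarding helps with RAW hazards. This sketch assumes the usual textbook 5-stage pipe: without forwarding, the register file is written in the first half of WB and read in the second half of ID; with forwarding, an ALU result can be bypassed from the end of EX and a load result from the end of MEM into the consumer's EX stage. The function name and the "alu"/"load" labels are illustrative.

```python
# Stage offsets from an instruction's own IF cycle in the classic 5-stage pipe.
IF, ID, EX, MEM, WB = range(5)


def raw_stalls(producer, distance, forwarding):
    """Stall cycles for a consumer issued `distance` instructions after its
    producer ('alu' or 'load'), assuming no other stalls intervene."""
    if distance < 1:
        raise ValueError("distance must be >= 1")
    if not forwarding:
        # Value is usable in the producer's WB cycle (split-cycle register
        # file) and is needed in the consumer's ID cycle.
        return max(0, WB - (distance + ID))
    # Value exists at the end of EX (ALU op) or MEM (load) and is needed at
    # the start of the consumer's EX stage.
    ready = EX if producer == "alu" else MEM
    return max(0, (ready + 1) - (distance + EX))


for producer in ("alu", "load"):
    for fwd in (False, True):
        print(producer, "forwarding" if fwd else "no forwarding",
              [raw_stalls(producer, d, fwd) for d in (1, 2, 3)])
```

Under these assumptions forwarding removes ALU-to-ALU stalls entirely, but a load followed immediately by a user of its result still needs 1 stall cycle.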
Three Types of Data Hazards
- Let i be an earlier instruction, j a later one.
- RAW (read after write)
  - j is supposed to Read a value After i Writes it,
  - But instead j tries to read the value before i has written it.
- WAW (write after write)
  - j should Write to a given place After i Writes there,
  - But they end up writing in the wrong order.
  - Only occurs if more than 1 pipeline stage can write.
- WAR (write after read)
  - j should Write a new value After i Reads the old one,
  - But instead j writes the new value before i has read the old one.
  - Only occurs if writes can happen before reads in the pipeline.
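The three cases can be checked mechanically from each instruction's read and write sets. This is a minimal sketch assuming each instruction is summarized as a (reads, writes) pair of register-name sets; it reports *potential* hazards only, since whether one actually occurs depends on the pipeline, as the "Only occurs if..." notes say.

```python
def hazards(i, j):
    """Potential data hazards when instruction j follows instruction i.
    Each instruction is a (reads, writes) pair of sets of locations."""
    i_reads, i_writes = i
    j_reads, j_writes = j
    found = []
    if i_writes & j_reads:
        found.append("RAW")  # j reads what i writes
    if i_reads & j_writes:
        found.append("WAR")  # j overwrites what i reads
    if i_writes & j_writes:
        found.append("WAW")  # both write the same place
    return found


add = ({"R2", "R3"}, {"R1"})   # ADD R1,R2,R3
sub = ({"R1", "R4"}, {"R2"})   # SUB R2,R1,R4
add2 = ({"R5", "R6"}, {"R1"})  # ADD R1,R5,R6
print(hazards(add, sub))   # RAW on R1, WAR on R2
print(hazards(add, add2))  # WAW on R1
```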
An Unavoidable Stall
Stalling in Midst of Instruction
Data Hazard Prevention
- A clever compiler can often reschedule instructions to avoid a stall.
- A simple example:
  - Original code:

        lw  r2, 0(r4)
        add r1, r2, r3    # Note: stall happens here!
        lw  r5, 4(r4)

  - Transformed code:

        lw  r2, 0(r4)
        lw  r5, 4(r4)
        add r1, r2, r3    # No stall needed!
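The effect of the reordering can be counted with a tiny checker. This sketch assumes full forwarding, so the only remaining stall is a load immediately followed by a reader of the loaded register (1 cycle); the (op, destination, sources) encoding is made up for illustration.

```python
def load_use_stalls(code):
    """Count 1-cycle load-use stalls, assuming full forwarding so that only a
    load immediately followed by a reader of its destination must stall."""
    stalls = 0
    for (op, dest, _), (_, _, next_srcs) in zip(code, code[1:]):
        if op == "lw" and dest in next_srcs:
            stalls += 1
    return stalls


original = [
    ("lw", "r2", ["r4"]),
    ("add", "r1", ["r2", "r3"]),  # uses r2 right after it is loaded
    ("lw", "r5", ["r4"]),
]
transformed = [
    ("lw", "r2", ["r4"]),
    ("lw", "r5", ["r4"]),          # independent load fills the delay
    ("add", "r1", ["r2", "r3"]),
]
print(load_use_stalls(original), load_use_stalls(transformed))  # prints: 1 0
```

The swap is legal here because the second lw neither reads r1 nor writes r2 or r3, so moving it above the add changes no data dependence.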
Simple RISC Pipeline Stall Statistics
Note that 1 in 5 loads causes a stall in many programs!
[Figure: percentage of loads that cause a stall, by benchmark]
Data Hazard Detection
Hazard Detection Logic
- Example: Detecting whether an instruction that has just been fetched needs to be stalled 1 cycle because of an immediately preceding load.
[Figure: hazard detection logic in the five-stage pipeline (IF, ID, EX, ME, WB) with pipeline registers IF/ID, ID/EX, EX/ME, and ME/WB]
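The detection condition can be written as one boolean expression over two pipeline registers. This sketch borrows the MIPS-style field names used in PH (rs, rt, MemRead); the DLX datapath in these notes names its fields differently, so treat the names and the IFID/IDEX classes as assumptions.

```python
from dataclasses import dataclass


@dataclass
class IFID:
    """Hypothetical IF/ID contents: source fields of the just-fetched instruction."""
    rs: int
    rt: int


@dataclass
class IDEX:
    """Hypothetical ID/EX contents for the instruction one stage ahead."""
    mem_read: bool  # True if it is a load
    rt: int         # a load's destination register


def must_stall(if_id: IFID, id_ex: IDEX) -> bool:
    """Load-use hazard: the previous instruction is a load whose destination
    matches either source register of the instruction now being decoded."""
    return id_ex.mem_read and id_ex.rt in (if_id.rs, if_id.rt)


print(must_stall(IFID(rs=2, rt=3), IDEX(mem_read=True, rt=2)))  # load r2, then use r2
```

When the condition is true, the hardware holds the PC and the IF/ID register for one cycle and injects a bubble (all-zero control signals) into ID/EX.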
Forwarding Situations in DLX
Implementing Forwarding in HW
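A forwarding unit is essentially a priority multiplexer select for each ALU operand. This sketch follows the usual PH-style rules (forward from EX/MEM first, then MEM/WB, never for register 0); the parameter names and returned labels are illustrative, not actual DLX signal names.

```python
def forward_select(src_reg, exmem_regwrite, exmem_rd, memwb_regwrite, memwb_rd):
    """Choose where one ALU operand of the instruction in EX comes from.

    EX/MEM is checked first so that the most recent result wins when both
    later pipeline registers hold a write to the same register."""
    if exmem_regwrite and exmem_rd != 0 and exmem_rd == src_reg:
        return "EX/MEM"
    if memwb_regwrite and memwb_rd != 0 and memwb_rd == src_reg:
        return "MEM/WB"
    return "REGFILE"


# The unit evaluates this twice per cycle, once per source register.
print(forward_select(2, True, 2, True, 2))   # EX/MEM: the newest value wins
print(forward_select(2, False, 2, True, 2))  # MEM/WB
```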
Control Hazards, Branch Prediction, Delayed Branches
Control Hazards
- Suppose the new PC value was not computed until the MEM stage (as in the original RISC design).
  - Then we must stall 3 clocks after every branch!
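The cost of stalling on every branch follows from a one-line CPI formula. This sketch assumes an ideal base CPI of 1, a hypothetical branch frequency of 20%, and, for comparison, a 1-cycle penalty if the branch is resolved earlier (in ID); none of these parameters come from the notes.

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles, base_cpi=1.0):
    """Average CPI when every branch stalls the pipe for `stall_cycles` clocks."""
    return base_cpi + branch_fraction * stall_cycles


branch_fraction = 0.2  # hypothetical fraction of instructions that are branches
for label, stalls in [("resolve in MEM", 3), ("resolve in ID", 1)]:
    print(f"{label}: CPI = {cpi_with_branch_stalls(branch_fraction, stalls):.2f}")
```

With these assumptions, resolving branches in MEM costs 60% over the ideal CPI, which is why it pays to compute the branch outcome earlier.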
Early Branch Resolution
New Pipeline Logic
Control Instruction Statistics
- 10% of dynamic instructions are forward conditional branches
- Only 3% are backward conditional branches
- A similar percentage are unconditional branches
Stats on Taken Branches
67% of conditional branches are taken.
Predict-Not-Taken
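Under predict-not-taken, only taken branches pay the penalty, so the expected cost per conditional branch scales with the taken fraction. This sketch uses the 67% taken rate quoted above and assumes a 1-cycle taken-branch penalty (as with early resolution in ID); the penalty value is an assumption.

```python
def avg_branch_penalty(taken_fraction, taken_penalty, not_taken_penalty=0):
    """Expected stall cycles per conditional branch under predict-not-taken:
    taken branches squash the wrongly fetched instruction(s); untaken ones
    continue down the predicted path for free."""
    return taken_fraction * taken_penalty + (1 - taken_fraction) * not_taken_penalty


taken_fraction = 0.67  # from the statistics slide
penalty = 1            # assumed taken-branch penalty in cycles
print(f"predict-not-taken: {avg_branch_penalty(taken_fraction, penalty):.2f} cycles/branch")
print(f"always stall:      {penalty:.2f} cycles/branch")
```

Because most conditional branches are taken, predict-not-taken saves only about a third of the stall cycles here.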
Delayed Branches
[Figure: machine code sequence: branch instruction, then delay-slot instruction(s), then post-branch instructions; the branch takes effect (if taken) after the delay slot(s)]
Filling the Branch-Delay Slot
Static Branch Prediction
- Earlier we discussed the predict-taken and predict-not-taken static prediction strategies.
  - Applied uniformly across all branches in the program
- Static analysis in the compiler may be able to do better, if it can non-uniformly predict whether each specific branch is likely to be taken or not.
  - One way: backwards taken, forwards not taken.
- If we can do better, it can help with static code scheduling to reduce data hazard stalls.
  - It may also assist later dynamic prediction.
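"Backwards taken, forwards not taken" needs only the branch address and its target. This sketch scores it on a small hypothetical trace of (branch address, target address, actually taken) records; the addresses and outcomes are made up, so the rates below say nothing about real programs.

```python
def btfn_predict(branch_pc, target_pc):
    """Predict taken for backward branches (e.g., loop closers), not taken for forward ones."""
    return target_pc < branch_pc


def always_taken(branch_pc, target_pc):
    return True


def mispredict_rate(trace, predict):
    """trace: list of (branch_pc, target_pc, actually_taken) records."""
    wrong = sum(predict(pc, tgt) != taken for pc, tgt, taken in trace)
    return wrong / len(trace)


# Hypothetical trace: a loop-closing branch taken 9 of 10 times,
# and a forward if-branch taken 3 of 10 times.
trace = ([(0x40, 0x10, True)] * 9 + [(0x40, 0x10, False)]
         + [(0x20, 0x30, True)] * 3 + [(0x20, 0x30, False)] * 7)
print("BTFN:        ", mispredict_rate(trace, btfn_predict))
print("always taken:", mispredict_rate(trace, always_taken))
```

On this toy trace BTFN wins, but only because the forward branch is usually not taken; the SPEC data that follows shows the opposite, since more than half of forward branches there are taken.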
Prediction Helps Static Scheduling

          LD    R1,0(R2)
          DSUBU R1,R1,R3
          BEQZ  R1,else
          OR    R4,R5,R6
          DADDU R10,R4,R3
          J     after
    else: DADDU R7,R8,R9
    after:

[Figure annotations: some data dependences; code movements to consider; potential load delay to fill; which way will this branch go?; if-then-else control flow with an if case and an else case]
Some Static Prediction Schemes
- Always predict taken
  - 34% misprediction rate on SPEC (range 9% to 54%)
- Backwards predict taken, forwards not taken
  - In SPEC, more than ½ of forward branches are taken!
  - This does worse than the always-predict-taken strategy.
  - Usually not better than a 30-40% misprediction rate
- Better than either: use profile information!
  - Collect statistics on earlier program runs.
  - Works well because individual branches tend to be strongly biased (taken or not) given average data.
  - Bias tends to remain stable across multiple runs.
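Profile-based prediction records each static branch's majority direction on a training run and reuses it on later runs. This sketch assumes traces of (branch address, taken) pairs; both traces are hypothetical and chosen only to show strongly biased branches keeping their bias across runs.

```python
from collections import Counter


def train_profile(trace):
    """Majority direction per static branch from a profiling run.
    trace: list of (branch_pc, taken) pairs. Ties are predicted taken."""
    taken, total = Counter(), Counter()
    for pc, t in trace:
        total[pc] += 1
        taken[pc] += t  # bool counts as 0 or 1
    return {pc: 2 * taken[pc] >= total[pc] for pc in total}


def profile_mispredict_rate(profile, trace, default=True):
    """Branches absent from the profile fall back to `default`."""
    wrong = sum(profile.get(pc, default) != t for pc, t in trace)
    return wrong / len(trace)


# Hypothetical runs: branch 0x100 is mostly taken, branch 0x200 mostly not.
train = [(0x100, i < 9) for i in range(10)] + [(0x200, i < 1) for i in range(10)]
test = [(0x100, i < 8) for i in range(10)] + [(0x200, i < 2) for i in range(10)]

profile = train_profile(train)
print("profile-based:", profile_mispredict_rate(profile, test))
print("always taken: ", profile_mispredict_rate({}, test))  # empty profile -> default True
```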
Profile-Based Predictor Statistics
[Figure: profile-based predictor statistics; includes a floating-point benchmark panel]
Predict-Taken vs. Profile-Based
[Figure: instructions executed in between mispredictions (log scale!) for predict-taken vs. profile-based prediction; includes floating-point benchmarks]
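The metric in this comparison is the reciprocal of mispredictions per instruction. This sketch assumes a hypothetical conditional-branch frequency of 20% and compares the 34% always-taken misprediction rate quoted earlier with a hypothetical 10% profile-based rate.

```python
def instructions_per_mispredict(branch_fraction, mispredict_rate):
    """Average instructions executed between mispredictions:
    1 / (mispredictions per instruction)."""
    return 1.0 / (branch_fraction * mispredict_rate)


branch_fraction = 0.2  # hypothetical conditional-branch frequency
for label, rate in [("predict taken", 0.34), ("profile-based (assumed)", 0.10)]:
    print(f"{label}: {instructions_per_mispredict(branch_fraction, rate):.1f} instructions")
```

Because the metric is inversely proportional to the misprediction rate, cutting mispredictions by a factor of 3 to 4 stretches the run between mispredictions by the same factor.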