Type of Conditional Branches - PowerPoint PPT Presentation

About This Presentation
Title:

Type of Conditional Branches

Description:

To minimize the decline in the Pipeline Throughput caused by stalls caused due ... Predictions are: T, NT, NT, NT, T, T. 2 wrong (in red), 4 correct = 66% accuracy ... – PowerPoint PPT presentation

Slides: 55
Provided by: parthasa

Transcript and Presenter's Notes

Title: Type of Conditional Branches


1
Type of Conditional Branches
  • Forward Branches: JMP, Conditional CALL, Conditional Return, Exception.
  • Backward Branches: normally loop-closing branches.

2
Branch Prediction: Key Idea
  • Doing something is better than waiting around doing nothing.
  • A conditional branch stalls the pipeline until its outcome is known, which
    lowers pipeline throughput. To limit that loss, the processor must correctly
    predict / determine the successor / target instruction to execute
    immediately after the conditional branch.

3
Branch Prediction: The Principle
  • Guess / predict the branch target immediately after the Instruction Decode
    stage, then start fetching instructions from the predicted location.
  • Execute the branch and verify (check) the prediction.
  • Minimizes the penalty / stalls if the prediction is right (not necessarily
    to zero ?!).
  • May increase the penalty for wrong predictions.
  • Predicting the correct conditional branch target a priori is the most
    effective way to tackle the pipeline hazard posed by conditional branches.

4
Branch Prediction Methods - 1
  • A. Static: done prior to execution. Can be done by the compiler ?!
  • A1. Fixed
  • a) Predict never taken: 47% actually not taken (common for forward
    conditional branches).
  • 1. Predict / guess an unresolved conditional branch always as not taken.
  • 2. Continue executing the sequential path but, in preparation for a wrong
    guess / misprediction, start executing the taken path in parallel (?!).
  • 3. Do not change any state (no write back / register write (?)) until the
    branch instruction has executed (may introduce stalls in a deep pipeline).
  • 4. When the condition can be evaluated, check the prediction. On a
    misprediction, i.e. if the branch is taken, turn the fetched instructions
    into no-ops / stall the pipeline and restart fetch at the target address:
    at least 1 cycle penalty.

5
Conditional Branch Execution: Effect in MIPS
  • Instruction i happens to be the conditional branch instruction.
  • Its immediate successor, instruction i+1, may have to be discarded /
    squashed if the branch is taken.
  • The issue is to predict whether the next instruction to be executed is the
    immediate successor i+1 (branch NOT taken) or the actual target T at the
    target address (branch taken). This target address is assumed to be
    computed during the Instruction Decode stage itself.

6
Predict Never / Not Taken: MIPS 5 Stage Pipeline Example - 1
  • A. Correct Guess Scenario (untaken branch)

    Clock              K1   K2   K3   K4   K5   K6   K7
    ---------------------------------------------------
    Instr i (branch)   IF   ID   EX   MEM  WB
    Instr i+1               IF   ID   EX   MEM  WB
    Instr i+2                    IF   ID   EX   MEM  WB

7
Predict Never / Not Taken: MIPS 5 Stage Pipeline Example - 2
  • B. Incorrect Guess Scenario (taken branch)

    Clock              K1   K2     K3     K4     K5     K6   K7
    -----------------------------------------------------------
    Instr i (branch)   IF   ID     EX     MEM    WB
    Instr i+1               IF     Stall  Stall  Stall  Stall
    Instr i+2                      IF     ID     EX     MEM  WB

8
Predict Never Taken: The Issues
  • Which one is preferable in case of a misprediction?
  • i. Stall the mispredicted / offending instructions IMMEDIATELY.
  • ii. Allow them to proceed, and stall only before the final Write Back
    stage.
  • Option i is the better choice in the general case, since Instruction
    Decode (ID), Execute (EX) and Memory Access (MEM) may introduce unwanted
    system changes.
  • What is the effect on pipeline throughput of a misprediction?
  • Minimum: 1 cycle (10 ?), provided the correct target can be predicted at
    the Instruction Decode (ID) stage.
  • Maximum: M cycles, i.e. M instructions have to be FLUSHED from the
    pipeline / STALLED (?!), where M is the number of stages between the
    Instruction Fetch stage and the Execution stage, at which point the
    correct target is known.

9
Branch Prediction Methods - 2
  • A. Static: aided by the compiler (?!).
  • A2. Fixed (contd.)
  • a) Predict always taken: 53% actually taken.
  • Common for backward conditional branches / conditional call and return
    (?!).
  • 1. Must know the actual branch target at the Instruction Decode stage
    itself (not normally possible in MIPS), except for conditional CALL /
    RETURN instructions.
  • 2. Stalls are inevitable, since the target address computation may involve
    the ALU.
  • 3. On a misprediction, the guessed instruction has to be replaced by the
    instruction immediately following the conditional branch. Needs additional
    instruction block storage / a localized instruction frame store inside the
    CPU ?

10
Always Not Taken vs Always Taken
11
Always not taken Penalty Figures
12
Penalty Figures for the Always taken Prediction
Approach
13
Performance Measures of Fixed Prediction of
Branch Processing
  • $P_t$: branch penalty for taken branches
  • $P_{nt}$: branch penalty for not-taken branches
  • $f_t$: frequency of taken branches
  • $f_{nt}$: frequency of not-taken branches
  • $P$: effective penalty of branch processing
  • $P = f_t P_t + f_{nt} P_{nt}$
  • e.g. 80386: $P = 0.75 \times 8 + 0.25 \times 2 = 6.5$ cycles
  • e.g. i486: $P = 0.75 \times 2 + 0.25 \times 0 = 1.5$ cycles
  • With branch prediction, the two cases are correctly predicted (c) and
    mispredicted (m) branches:
  • $P = f_c P_c + f_m P_m$
  • e.g. Pentium: $P = 0.9 \times 0 + 0.1 \times 3.5 = 0.35$ cycles
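
The effective-penalty figures on this slide can be rechecked with a few lines of Python. This is only a sketch: the function name `effective_penalty` and the (frequency, penalty) pair layout are assumptions made for illustration, and the numbers are the ones quoted above.

```python
def effective_penalty(cases):
    """Weighted average branch penalty: sum of frequency * penalty.

    `cases` is a list of (frequency, penalty_in_cycles) pairs whose
    frequencies must add up to 1.
    """
    total_freq = sum(freq for freq, _ in cases)
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    return sum(freq * penalty for freq, penalty in cases)


# Figures quoted on the slide: (taken, not taken) for the 80386 and i486,
# (correctly predicted, mispredicted) for the Pentium.
p_80386 = effective_penalty([(0.75, 8), (0.25, 2)])
p_i486 = effective_penalty([(0.75, 2), (0.25, 0)])
p_pentium = effective_penalty([(0.9, 0), (0.1, 3.5)])

print(f"80386: {p_80386:.2f}  i486: {p_i486:.2f}  Pentium: {p_pentium:.2f}")
```

The same function covers both formulas because each is a frequency-weighted sum over two mutually exclusive cases.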

14
Static Branch Prediction Methods

15
Static Branch Prediction Op Code Based
implemented in the MC88110
16
Direction Based Prediction
  • Simple to implement (say, branch is taken).
  • However, branch behaviour is often variable (dynamic). Misprediction rates
    vary from 59% to 9% (average 34%).
  • Can't capture such behaviour at compile time with simple direction-based
    prediction!
  • Need history-based (aka profile-based) prediction.

17
Compiler Aided Branch Prediction Hints: Taken / NOT Taken Switch
  • Individual branches tend to be strongly bimodal.
  • Set a bit in the opcode, i.e. change the instruction encoding pattern.
  • Instruction fetch is steered accordingly.
  • Good for loops.

18
Profile Guided Static Prediction
  • Consider the MIPS instruction
  • BEQ r1, r2, L1 (a backward branch)
  • The earliest stage at which the TARGET address L1 can be determined is the
    2nd stage, Instruction Decode (ID).
  • Suppose the BRANCH bit in the encoded instruction is set to 1 (assuming
    the generally adopted branch prediction policy).
  • This enables fetching of the instruction at the TARGET address L1
    immediately after the ID stage, so only a single successor instruction is
    stalled / flushed.
  • But the actual branch condition is known only one stage later, in the
    EXECUTION stage.
  • Hence any misprediction / wrong instruction encoding (the branch should
    NOT have been taken) forces the system to stall / flush this fetched
    target instruction as well.

19
Profile Guided Static Prediction -2
20
Profile Guided Static Prediction - 1
21
Heuristic Based Static Branch Prediction: Ball / Larus
  • The Basis
  • void *p = malloc(numBytes);
  • if (p == NULL)
  •     Error_Handling_Function();
  • Ref: Thomas Ball and James Larus, "Branch Prediction for Free", ACM
    SIGPLAN Conference on Programming Language Design and Implementation
    (PLDI), pages 300-313, June 1993.

22
Summary of Heuristic Based Static Branch Prediction - 1
  • Heuristic name: description
  • Loop Branch: If the branch target is back to the head of a loop, predict
    taken.
  • Pointer: If a branch compares a pointer with NULL, or if two pointers are
    compared, predict in the direction that corresponds to the pointer being
    not NULL, or the two pointers not being equal.
  • Opcode: If a branch is testing that an integer is less than zero, less
    than or equal to zero, or equal to a constant, predict in the direction
    that corresponds to the test evaluating to false.
  • Guard: If the operand of the branch instruction is a register that gets
    used before being redefined in the successor block, predict that the
    branch goes to the successor block.
23
Summary of Heuristic Based Static Branch Prediction - 2
  • Heuristic name: description
  • Loop Exit: If a branch occurs inside a loop, and neither of the targets is
    the loop head, then predict that the branch does not go to the successor
    that is the loop exit.
  • Loop Header: Predict that the successor block of a branch that is a loop
    header or a loop pre-header is taken.
  • Call: If a successor block contains a subroutine call, predict that the
    branch goes to that successor block.
  • Store: If a successor block contains a store instruction, predict that the
    branch does not go to that successor block.
  • Return: If a successor block contains a return from subroutine
    instruction, predict that the branch does not go to the successor block.

24
Static Branch Prediction (Summary)
  • Fixed prediction
  • 1. Predict NOT taken.
  • 2. Predict ALWAYS taken.
  • Profile-based
  • 1. Instrument the program binary.
  • 2. Run with a representative (?) input set.
  • 3. Recompile the program:
  • a. Annotate branch opcodes with hint bits, OR
  • b. Restructure code to match predict not-taken.
  • Best performance: 75-80% accuracy
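
The profile-based steps can be sketched as follows. Everything here is hypothetical and for illustration only: the branch labels, the profile counts, and the helper names `hint_bits` and `static_accuracy` are assumptions, not something the slides define. The idea is simply to set each branch's hint bit to its majority direction in the profile and to measure how often that fixed hint would have been right.

```python
def hint_bits(profile):
    """Choose a static hint per branch: True (predict taken) when the branch
    was taken in at least half of its profiled executions."""
    return {
        branch: taken >= not_taken
        for branch, (taken, not_taken) in profile.items()
    }


def static_accuracy(profile, hints):
    """Fraction of profiled branch executions that the fixed hints get right."""
    correct = sum(
        taken if hints[branch] else not_taken
        for branch, (taken, not_taken) in profile.items()
    )
    total = sum(taken + not_taken for taken, not_taken in profile.values())
    return correct / total


# Hypothetical profile: branch label -> (times taken, times not taken)
profile = {"loop_back": (95, 5), "null_check": (1, 99), "mixed": (55, 45)}
hints = hint_bits(profile)
acc = static_accuracy(profile, hints)
print(hints, f"accuracy on the profiled run: {acc:.2%}")
```

Strongly biased branches are predicted almost perfectly by one bit, while a branch such as the hypothetical `mixed` one caps the achievable accuracy, which is why static schemes level off.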

25
Dynamic Branch Prediction The Key Issues
26
Dynamic Branch Prediction
  • Use past behaviour to predict the future.
  • Main advantages
  • Learn branch behaviour autonomously
  • No compiler analysis, heuristics, or profiling
  • Adapt to changing branch behaviour
  • Program phase changes alter branch behaviour
  • First proposed in 1980
  • US Patent 4,370,711, "Branch predictor using random access memory",
    James E. Smith
  • Continually refined since then.

27
History-based / State Based Branch Prediction: Temporal Locality ?
  • Needs 2 parts
  • Predictor: bits to guess whether the instruction will branch (and to
    where).
  • Recovery mechanism: a way to fix mistakes / handle mispredicted branch
    situations.

28
History-based Branch Prediction
  • One-bit predictor
  • Use the result from the last time this instruction executed.
  • Problem
  • Even if a branch is almost always taken, each not-taken occurrence costs
    two mispredictions: one when it happens and one on the next execution.
  • If the branch alternates between taken and not taken, we get 0% accuracy.

29
Branch Prediction Buffer / Branch History Table
  • A small (compared to the system cache), high-speed, cache-like memory.
  • Indexed by the lower bits of the branch instruction address (PC).
  • Contains branch predictor / history bits for the most recently executed
    branch instructions (?!).
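
A minimal sketch of such a table, assuming 4-byte instructions, a 16-entry table, one predictor bit per entry and no tags. The class name and its parameters are illustrative assumptions, not taken from the slides.

```python
class BranchHistoryTable:
    """Untagged table of 1-bit predictors indexed by the low bits of the PC."""

    def __init__(self, index_bits=4, instr_bytes=4):
        # instr_bytes is assumed to be a power of two
        self.shift = instr_bytes.bit_length() - 1  # drop the byte-offset bits
        self.mask = (1 << index_bits) - 1          # keep the low index bits
        self.bits = [False] * (1 << index_bits)    # False = predict not taken

    def index(self, pc):
        return (pc >> self.shift) & self.mask

    def predict(self, pc):
        return self.bits[self.index(pc)]

    def update(self, pc, taken):
        self.bits[self.index(pc)] = taken


bht = BranchHistoryTable()
bht.update(0x1000, True)       # the branch at 0x1000 was taken
print(bht.predict(0x1000))     # uses the entry just written
print(bht.predict(0x1040))     # a different branch mapping to the same entry
print(bht.predict(0x1004))     # a neighbouring entry, still at its default
```

Because only the low PC bits select the entry and nothing checks which branch wrote it, two branches whose addresses share those bits read and overwrite the same history. That aliasing is the price of keeping the table small and fast.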

30
Dynamic Branch Prediction: The Smith Hardware
James E. Smith, "A Study of Branch Prediction Strategies", International
Symposium on Computer Architecture, pages 135-148, May 1981. Widely employed:
Intel Pentium, PowerPC 604, PowerPC 620, etc.
31
Typical Branch History Table Organization
32
Simplest Dynamic Branch Predictor
33
1 Bit Branch Predictor Structure

34
FSM of the 1-Bit Predictor
35
Example using 1 Bit Branch Predictor History Table: 60% Accuracy
36
Example
  • Let the initial value be T; the actual branch outcomes are NT, NT, NT, T,
    T, T.
  • Predictions are T, NT, NT, NT, T, T.
  • 2 wrong (the 1st and the 4th), 4 correct: about 67% accuracy.
  • 2-bit predictors can do even better.
  • In general, one can have k-bit predictors.
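
The example can be replayed with a short simulation of a single 1-bit predictor. This is a sketch; the helper names `one_bit_predict` and `accuracy` are assumptions for illustration.

```python
def one_bit_predict(outcomes, initial=True):
    """Predictions made by one 1-bit predictor; True means taken."""
    predictions = []
    state = initial
    for taken in outcomes:
        predictions.append(state)  # predict whatever happened last time
        state = taken              # then remember only the latest outcome
    return predictions


def accuracy(predictions, outcomes):
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)


T, NT = True, False

# The outcome sequence from the example, starting with the predictor at T
actual = [NT, NT, NT, T, T, T]
preds = one_bit_predict(actual, initial=T)
example_acc = accuracy(preds, actual)

# A strictly alternating branch, starting with the predictor at NT
alternating = [T, NT] * 4
alt_acc = accuracy(one_bit_predict(alternating, initial=NT), alternating)

print(preds, example_acc, alt_acc)
```

The predictor is wrong exactly where the outcome changes direction, so a branch that changes direction on every execution defeats it completely.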

37
2-bit Dynamic Branch Prediction Scheme
  • Change prediction only if twice mispredicted
  • Adds hysteresis to decision making process

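State diagram (2-bit saturating counter: incremented if taken, decremented if
not taken):

    State   Prediction           On T   On NT
    -----------------------------------------
    11      Predict Taken        11     10
    10      Predict Taken        11     01
    01      Predict Not Taken    10     00
    00      Predict Not Taken    01     00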
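
A sketch of the 2-bit saturating counter scheme, assuming the counter starts at 11 and that values 10 and 11 predict taken. The function name `two_bit_predict` and the loop pattern are illustrative assumptions.

```python
def two_bit_predict(outcomes, counter=3):
    """Predictions of one 2-bit saturating counter (0..3); True means taken.

    Counter values 2 and 3 predict taken, 0 and 1 predict not taken.
    """
    predictions = []
    for taken in outcomes:
        predictions.append(counter >= 2)
        if taken:
            counter = min(counter + 1, 3)  # saturate at 11
        else:
            counter = max(counter - 1, 0)  # saturate at 00
    return predictions


# A loop-closing branch: taken three times, then not taken at loop exit,
# with the whole loop executed three times.
loop = ([True] * 3 + [False]) * 3
preds = two_bit_predict(loop)
mispredictions = sum(p != o for p, o in zip(preds, loop))
print(mispredictions, "mispredictions in", len(loop), "branches")
```

Each loop exit moves the counter from 11 to 10, which still predicts taken, so the first iteration of the next pass is predicted correctly. A 1-bit predictor would mispredict that iteration as well.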
38
Branch Prediction Flowchart
39
Branch Prediction State Diagram
40
2- Bit Saturating Up/Down Counter Predictor - 1

41
2 Bit Counter Predictor ( Another Scheme)
42
Improved Performance using 2 Bit Predictor
43
2-bit Predictor
  • What is the prediction accuracy of a 4096-entry 2-bit branch predictor for
    a typical application?
  • 99% to 80%, depending on the application.
  • Can an n-bit (n > 2) predictor do better?
  • 2-bit predictors do almost as well as any n-bit predictor.
  • Can the accuracy of branch prediction be improved?
  • Correlating branch predictor.

44
Software-based Scheduling vs. Hardware-based
Scheduling
  • Disadvantage with compilers
  • In many cases, much information cannot be extracted from the code
  • Examples
  • Pointers to the same memory location
  • Value of the induction variable of a loop
  • It is still possible to assist the hardware by exposing more ILP
  • Rearrange instructions for increased performance

45
An Example of Computing Performance
  • Program assumptions
  • 23% loads, and in ½ of the cases the next instruction uses the load value
  • 13% stores
  • 19% conditional branches
  • 2% unconditional branches
  • 43% other

46
Example
  • Machine Assumptions
  • 5 stage pipe
  • Penalty of 1 cycle on use of load value
    immediately after a load.
  • Jumps are resolved in ID stage for a 1 cycle
    branch penalty.
  • 75% branch prediction accuracy.
  • 1 cycle delay on misprediction.

47
Example
  • CPI penalty calculation
  • Loads
  • 50% of the 23% of loads have a 1 cycle penalty:
    $0.5 \times 0.23 \times 1 = 0.115$
  • Jumps
  • All of the 2% of jumps have a 1 cycle penalty: $0.02 \times 1 = 0.02$
  • Conditional Branches
  • 25% of the 19% are mispredicted and have a 1 cycle penalty:
    $0.25 \times 0.19 \times 1 = 0.0475$
  • Total Penalty $= 0.115 + 0.02 + 0.0475 = 0.1825$
  • Average CPI $= 1 + 0.1825 = 1.1825$
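
The CPI calculation can be reproduced from the stated program and machine assumptions. This is a sketch: the variable names are assumptions for illustration, and the base CPI of 1 is the ideal 5 stage pipeline figure used in the example.

```python
# Instruction mix from the program assumptions (fractions of all instructions)
mix = {
    "load": 0.23,
    "store": 0.13,
    "cond_branch": 0.19,
    "jump": 0.02,
    "other": 0.43,
}

load_use_fraction = 0.5     # half of the loads are followed by a use
prediction_accuracy = 0.75  # conditional branch prediction accuracy
penalty_cycles = 1          # every hazard in this example costs one cycle

# Stall cycles contributed per instruction by each hazard source
load_stall = mix["load"] * load_use_fraction * penalty_cycles
jump_stall = mix["jump"] * penalty_cycles
branch_stall = mix["cond_branch"] * (1 - prediction_accuracy) * penalty_cycles

total_penalty = load_stall + jump_stall + branch_stall
cpi = 1 + total_penalty

print(f"penalty = {total_penalty:.4f}, CPI = {cpi:.4f}")
```

Changing `prediction_accuracy` shows how much of the CPI is attributable to conditional branches alone; stores and other instructions add no stall cycles under these assumptions.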

48
Some Discussions on State-Based Predictor
  • If an instruction is decoded as a branch
  • If the branch is predicted taken, fetching begins
    as soon as the target address is known.
  • The branch-taken prediction technique is of little use in the MIPS 5
    stage pipeline.
  • Why?
  • Useful in deeper pipelines.
  • What are the pros and cons of using a large BPB?

49
Predictors in Simple Pipelines
  • Initial pipelined processors, e.g. MIPS, SPARC, etc.
  • Did only trivial branch predictions.
  • Possible reasons could be
  • The penalty of mispredictions was not as severe as in deeper pipelined
    processors.
  • Sophisticated branch predictors did not exist.
  • Advanced branch prediction techniques have now become very important with
  • Use of deeper pipelines.
  • Introduction of superscalar processors.

50
Handling Control Hazards Branch Predictions
  • Unless satisfactory resolution mechanisms are in
    place
  • Branches can significantly degrade the
    performance of a pipeline
  • So far we have looked at some very simple branch prediction techniques
  • Yet they yielded reasonably good performance benefits, of the order of 50%
    to 100%.
  • Can we do better by deploying more advanced
    branch prediction techniques?

51
Multi Level Branch Prediction: Capturing Global Behaviour
52
Correlating Branch Predictor
  • It may be possible to improve the accuracy of
    branch prediction
  • By observing the recent behavior of other
    branches.
  • Example

if (a == 2) b = 2; if (b == 2) b = 0;
53
Correlating Branch Predictor
  • An (m,n) predictor
  • Makes use of the outcomes observed for the last m branches.
  • Uses $2^m$ n-bit predictors.
  • The behaviour of a branch is predicted by choosing one of these $2^m$
    branch predictors according to that history.
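
A sketch of an (m,n) correlating predictor, assuming a small untagged table indexed by low PC bits, 4-byte instructions, and counters that start at zero. The class name, the table size and the test address are illustrative assumptions rather than a description of any particular processor.

```python
class CorrelatingPredictor:
    """(m, n) predictor: the outcomes of the last m branches select one of
    2**m n-bit saturating counters within each table entry."""

    def __init__(self, m=2, n=2, index_bits=4):
        self.history_mask = (1 << m) - 1
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)   # counter >= threshold predicts taken
        self.history = 0                # global history, newest outcome in bit 0
        self.index_mask = (1 << index_bits) - 1
        # One row per table entry, 2**m counters per row
        self.counters = [[0] * (1 << m) for _ in range(1 << index_bits)]

    def _row(self, pc):
        return self.counters[(pc >> 2) & self.index_mask]

    def predict(self, pc):
        return self._row(pc)[self.history] >= self.threshold

    def update(self, pc, taken):
        row = self._row(pc)
        count = row[self.history]
        row[self.history] = (
            min(count + 1, self.max_count) if taken else max(count - 1, 0)
        )
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


# An alternating branch defeats a 1-bit predictor, but one bit of history
# is enough to separate its two cases.
predictor = CorrelatingPredictor(m=1, n=2)
pattern = [True, False] * 10
wrong = 0
for taken in pattern:
    wrong += predictor.predict(0x400) != taken
    predictor.update(0x400, taken)
print(wrong, "mispredictions out of", len(pattern))
```

With m = 1 the predictor keeps one counter for "previous branch taken" and one for "previous branch not taken", so after a short warm-up each counter sees a constant outcome.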

54
Correlating Branch Predictor
  • Why does the outcome of one branch depend on the
    outcome of another branch?
  • Depending on whether some preceding branch is
    taken or not
  • Some variable may be set to some value or not.