Title: Type of Conditional Branches
1 Type of Conditional Branches
- Forward branches: JMP, conditional CALL, conditional RETURN, exception.
- Backward branches: normally loop-closing branches.
2 Branch Prediction: Key Idea
- Doing something is better than waiting around doing nothing.
- To minimize the loss of pipeline throughput caused by stalls due to a conditional branch instruction, one must correctly predict the successor (target) instruction to be executed immediately after that branch.
3 Branch Prediction: The Principle
- Guess (predict) the branch target immediately after the Instruction Decode stage, then start fetching instructions from the predicted location.
- Execute the branch and verify (check) the prediction.
- Minimizes the penalty / stalls if the prediction is right (not necessarily to zero?!).
- May increase the penalty for wrong predictions.
- Predicting the correct conditional branch target a priori is the most effective way to tackle the pipeline hazard posed by conditional branches.
4 Branch Prediction Methods - 1
- A. Static: done prior to execution. Can be done by the compiler?!
- A1. Fixed
- a) Predict never taken: 47% actually not taken (common for forward conditional branches).
- 1. Predict (guess) an unresolved conditional branch as always not taken.
- 2. Continue executing the sequential path, but in preparation for a wrong guess (misprediction) start executing the taken path in parallel (?!).
- 3. Do not change any state (no write-back / register write(?)) until the branch instruction has executed (may introduce stalls in a deep pipeline).
- 4. When the condition can be evaluated, check the prediction. On a misprediction, i.e. if the branch is taken, turn the fetched instructions into no-ops / stall the pipeline and restart fetch at the target address: at least a 1-cycle penalty.
5 Conditional Branch Execution: Effect in MIPS
- Instruction i is the conditional branch instruction.
- Its immediate successor, instruction i+1, may have to be discarded (squashed) if the branch is taken.
- The issue is to predict whether the next instruction to be executed is the immediate successor i+1 (branch NOT taken) or the actual target T at the target address (branch taken). This target address is assumed to be computed during the Instruction Decode stage itself.
6 Predict Never / Not Taken: MIPS 5-Stage Pipeline Example - 1
- A. Correct Guess Scenario (untaken branch)

Clock        K1   K2   K3   K4   K5   K6   K7
Instr i      IF   ID   EX   MEM  WB
Instr i+1         IF   ID   EX   MEM  WB
Instr i+2              IF   ID   EX   MEM  WB
7 Predict Never / Not Taken: MIPS 5-Stage Pipeline Example - 2
- B. Incorrect Guess Scenario (taken branch)

Clock        K1     K2     K3     K4     K5     K6     K7
Instr i      IF     ID     EX     MEM    WB
Instr i+1           IF     Stall  Stall  Stall  Stall
Instr i+2                  IF     ID     EX     MEM    WB
8 Predict Never Taken: The Issues
- Which one is preferable in case of a misprediction?
- i. Stall the mispredicted (offending) instructions IMMEDIATELY.
- ii. Allow them to proceed, and stall only before the final Write Back stage.
- Option i. is the better choice in the general case, since Instruction Decode (ID), Execute (EX), and Memory Access (MEM) may introduce unwanted system changes.
- What is the effect of a misprediction on pipeline throughput?
- Minimum: 1 cycle (10 ?), provided the correct target can be predicted at the Instruction Decode (ID) stage.
- Maximum: M cycles, i.e. M instructions have to be FLUSHED from the pipeline / STALLED (?!), where M is the number of stages between the Instruction Fetch stage and the Execution stage, the point at which the correct target is known.
9 Branch Prediction Methods - 2
- A. Static: aided by the compiler (?!).
- A2. Fixed (contd.)
- a) Predict always taken: 53% actually taken.
- Common for backward conditional branches / conditional CALL and RETURN (?!).
- 1. Must know the actual branch target at the Instruction Decode stage itself (normally not possible in MIPS), except for conditional CALL / RETURN instructions.
- 2. Stalls are inevitable, since the target address computation may involve the ALU.
- 3. On a misprediction, the guessed instruction has to be replaced by the instruction immediately following the conditional branch. Needs additional instruction block storage / a localized instruction frame store inside the CPU?
10 Always Not Taken vs Always Taken
11 Always Not Taken: Penalty Figures
12 Penalty Figures for the Always Taken Prediction Approach
13 Performance Measures of Fixed Prediction of Branch Processing
- Pt: branch penalty for taken branches
- Pnt: branch penalty for not-taken branches
- ft: frequency of taken branches
- fnt: frequency of not-taken branches
- P: effective penalty of branch processing
- P = ft × Pt + fnt × Pnt
- e.g. 80386: P = 0.75 × 8 + 0.25 × 2 = 6.5 cycles
- e.g. i486: P = 0.75 × 2 + 0.25 × 0 = 1.5 cycles
- With prediction, the two cases are correctly predicted and mispredicted branches: P = fc × Pc + fm × Pm, where fc, Pc are the frequency and penalty of correct predictions and fm, Pm those of mispredictions.
- e.g. Pentium: P = 0.9 × 0 + 0.1 × 3.5 = 0.35 cycles
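The weighted-penalty figures on this slide can be checked numerically. The following is a minimal Python sketch; the function name `effective_penalty` and the list-of-pairs input format are illustrative assumptions, while the frequencies and penalties are the ones quoted on the slide.

```python
def effective_penalty(cases):
    """Return the effective branch penalty in cycles.

    `cases` is a list of (frequency, penalty_cycles) pairs whose
    frequencies sum to 1, e.g. the taken and not-taken cases.
    """
    total_freq = sum(freq for freq, _ in cases)
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    return sum(freq * penalty for freq, penalty in cases)


# Figures quoted on the slide: (frequency, penalty in cycles).
p_80386 = effective_penalty([(0.75, 8), (0.25, 2)])     # taken, not taken
p_i486 = effective_penalty([(0.75, 2), (0.25, 0)])      # taken, not taken
p_pentium = effective_penalty([(0.9, 0), (0.1, 3.5)])   # correct, mispredicted
print(p_80386, p_i486, round(p_pentium, 2))
```

The same function covers both forms of the formula, because each is a frequency-weighted sum over two mutually exclusive cases.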
14 Static Branch Prediction Methods
15 Static Branch Prediction: Opcode Based (implemented in the MC88110)
16 Direction Based Prediction
- Simple to implement (say, branch is taken).
- However, branch behaviour is often variable (dynamic). Misprediction rates vary from 59% to 9% (average 34%).
- Can't capture such behaviour at compile time with simple direction-based prediction!
- Need history (aka profile)-based prediction.
17 Compiler Aided Branch Prediction: Hints (Taken / NOT Taken Switch)
- Individual branches tend to be strongly bimodal.
- Set a bit in the opcode, i.e. change the instruction encoding pattern.
- Instruction fetch is steered accordingly.
- Good for loops.
18 Profile Guided Static Prediction
- Consider the MIPS instruction
- BEQ r1, r2, L1 (backward branch)
- The earliest stage at which the TARGET address L1 can be detected is the 2nd stage, Instruction Decode (ID).
- Suppose the BRANCH bit in the encoded instruction is set to 1 (assuming the generally adopted branch prediction policy).
- This enables fetching of the instruction at the TARGET address L1 immediately after the ID stage, thereby stalling / flushing out only a single successor instruction.
- But the actual branch condition is known only one stage later, i.e. after the EXECUTION stage.
- Hence any misprediction / wrong instruction encoding (the branch should NOT have been taken) will force the system to stall / flush out this fetched target instruction as well.
19 Profile Guided Static Prediction - 2
20 Profile Guided Static Prediction - 1
21 Heuristic Based Static Branch Prediction: Ball / Larus
- The Basis
    void *p = malloc(numBytes);
    if (p == NULL)
        Error_Handling_Function();
- Ref: Thomas Ball and James Larus, "Branch Prediction for Free", ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 300-313, June 1993.
22 Summary of Heuristic Based Static Branch Prediction - 1
Heuristic name: Description
- Loop Branch: If the branch target is back to the head of a loop, predict taken.
- Pointer: If a branch compares a pointer with NULL, or if two pointers are compared, predict in the direction that corresponds to the pointer being not NULL, or the two pointers not being equal.
- Opcode: If a branch is testing that an integer is less than zero, less than or equal to zero, or equal to a constant, predict in the direction that corresponds to the test evaluating to false.
- Guard: If the operand of the branch instruction is a register that gets used before being redefined in the successor block, predict that the branch goes to the successor block.
23 Summary of Heuristic Based Static Branch Prediction - 2
Heuristic name: Description
- Loop Exit: If a branch occurs inside a loop, and neither of the targets is the loop head, then predict that the branch does not go to the successor that is the loop exit.
- Loop Header: Predict that the successor block of a branch that is a loop header or a loop pre-header is taken.
- Call: If a successor block contains a subroutine call, predict that the branch goes to that successor block.
- Store: If a successor block contains a store instruction, predict that the branch does not go to that successor block.
- Return: If a successor block contains a return from subroutine instruction, predict that the branch does not go to that successor block.
24 Static Branch Prediction (Summary)
- Fixed prediction
- 1. Predict NOT taken.
- 2. Predict ALWAYS taken.
- Profile-based
- 1. Instrument the program binary.
- 2. Run with a representative (?) input set.
- 3. Recompile the program:
- a. Annotate branch opcodes with hint bits, OR
- b. Restructure code to match predict-not-taken.
- Best performance: 75-80% accuracy.
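The profile-based flow (step 3a, annotating branches with hint bits) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the profile format (branch label mapped to taken and not-taken counts), the function names, and the made-up counts. It only shows the idea of picking the majority direction per branch.

```python
def assign_hint_bits(profile):
    """Map each branch to a static hint: True = predict taken.

    `profile` maps a branch label to (taken_count, not_taken_count)
    gathered from an instrumented run (hypothetical data format).
    """
    return {branch: taken >= not_taken
            for branch, (taken, not_taken) in profile.items()}


def static_accuracy(profile, hints):
    """Fraction of profiled branch executions that the hints get right."""
    correct = total = 0
    for branch, (taken, not_taken) in profile.items():
        correct += taken if hints[branch] else not_taken
        total += taken + not_taken
    return correct / total


# Made-up profile counts for three branches, for illustration only.
profile = {
    "loop_back": (990, 10),    # loop-closing branch, almost always taken
    "null_check": (5, 495),    # error path, rarely taken
    "data_dep": (260, 240),    # data-dependent, close to 50/50
}
hints = assign_hint_bits(profile)
accuracy = static_accuracy(profile, hints)
print(hints, round(accuracy, 3))
```

In this toy profile the strongly biased branches are predicted almost perfectly, while the near 50/50 branch limits the overall accuracy; a fixed hint cannot follow a branch whose direction changes at run time.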
25 Dynamic Branch Prediction: The Key Issues
26 Dynamic Branch Prediction
- Use past behaviour to predict the future.
- Main advantages
- Learns branch behaviour autonomously: no compiler analysis, heuristics, or profiling.
- Adapts to changing branch behaviour: program phase changes alter branch behaviour.
- First proposed in 1980
- US Patent 4,370,711, "Branch predictor using random access memory", James E. Smith.
- Continually refined since then.
27 History-based / State-based Branch Prediction: Temporal Locality?
- Needs 2 parts
- Predictor bits: to guess whether the instruction will branch (and to where).
- Recovery mechanism: a way to fix mistakes, i.e. to handle mispredicted branches.
28 History-based Branch Prediction
- One-bit predictor
- Use the result from the last time this instruction executed.
- Problem
- Even if a branch is almost always taken, we will be wrong twice each time it is not taken: once on the not-taken outcome itself and once on the next taken outcome.
- If a branch alternates between taken and not taken, we get 0% accuracy.
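Both problems named on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the function name, the initial prediction of "taken", and the two test sequences are assumptions chosen for illustration.

```python
def one_bit_mispredictions(outcomes, initial=True):
    """Count mispredictions of a 1-bit (last-outcome) predictor.

    `outcomes` is a sequence of booleans, True = taken.
    `initial` is the assumed starting prediction.
    """
    prediction = initial
    wrong = 0
    for taken in outcomes:
        if prediction != taken:
            wrong += 1
        prediction = taken  # remember only the most recent outcome
    return wrong


# A loop branch: taken 9 times, then not taken once, executed 3 times over.
loop = ([True] * 9 + [False]) * 3
# A branch that strictly alternates, starting with not taken.
alternating = [False, True] * 5

loop_wrong = one_bit_mispredictions(loop)
alt_wrong = one_bit_mispredictions(alternating)
print(loop_wrong, len(loop), alt_wrong, len(alternating))
```

With these inputs the loop branch is mispredicted 5 times in 30 executions (at each of the three loop exits, and again on the first iteration of the two re-entries), and the alternating branch is mispredicted on every one of its 10 executions.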
29 Branch Prediction Buffer / Branch History Table
- A small (compared to the system cache), high-speed, cache-like memory.
- Indexed by the lower bits of the branch instruction address (PC).
- Contains branch predictor / history bits for the most recently executed branch instructions (?!).
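Indexing by the lower PC bits can be sketched as follows. This Python model makes several assumptions that the slide does not state: one history bit per entry, no tags, 4-byte instructions (so the two lowest PC bits are dropped), and a 16-entry table. The class and method names are hypothetical.

```python
class BranchHistoryTable:
    """1-bit-per-entry history table indexed by low-order PC bits."""

    def __init__(self, index_bits=4, initial=False):
        self.mask = (1 << index_bits) - 1
        self.bits = [initial] * (1 << index_bits)

    def _index(self, pc):
        # Assumes 4-byte instructions, so the two lowest PC bits are dropped.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.bits[self._index(pc)]

    def update(self, pc, taken):
        self.bits[self._index(pc)] = taken


bht = BranchHistoryTable(index_bits=4)
bht.update(0x1000, True)            # branch at 0x1000 was taken
same_branch = bht.predict(0x1000)   # predicted taken next time
# 0x1040 differs from 0x1000 only above the index bits, so the two
# branches share an entry (aliasing): there is no tag check.
aliased = bht.predict(0x1040)
other = bht.predict(0x1004)         # different entry, still the initial value
print(same_branch, aliased, other)
```

Because the table is untagged in this sketch, a prediction is always returned, but it may have been trained by a different branch that maps to the same entry.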
30 Dynamic Branch Prediction: The Smith Hardware
Jim E. Smith. A Study of Branch Prediction Strategies. International Symposium on Computer Architecture, pages 135-148, May 1981.
Widely employed: Intel Pentium, PowerPC 604, PowerPC 620, etc.
31 Typical Branch History Table Organization
32 Simplest Dynamic Branch Predictor
33 1-Bit Branch Predictor Structure
34 FSM of the 1-Bit Predictor
35 Example Using 1-Bit Branch Predictor History Table: 60% Accuracy
36 Example
- Let the initial value be T; the actual outcomes of the branch are NT, NT, NT, T, T, T.
- Predictions are T, NT, NT, NT, T, T.
- 2 wrong (the first and the fourth), 4 correct: about 67% accuracy.
- 2-bit predictors can do even better.
- In general, can have k-bit predictors.
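The prediction sequence in this example can be traced mechanically. The sketch below uses the slide's initial value and outcomes; the function name and the string encoding ("T" / "NT") are illustrative choices.

```python
def one_bit_predictions(outcomes, initial):
    """Return the prediction made before each outcome by a 1-bit predictor."""
    predictions = []
    state = initial
    for outcome in outcomes:
        predictions.append(state)
        state = outcome  # the next prediction repeats the latest outcome
    return predictions


outcomes = ["NT", "NT", "NT", "T", "T", "T"]
predictions = one_bit_predictions(outcomes, initial="T")
correct = sum(p == o for p, o in zip(predictions, outcomes))
print(predictions, correct, len(outcomes))
```

The trace gives T, NT, NT, NT, T, T with 4 of 6 predictions correct; the two misses fall where the branch changes direction.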
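37 2-bit Dynamic Branch Prediction Scheme
- Change the prediction only after two consecutive mispredictions.
- Adds hysteresis to the decision-making process.
- The counter is incremented if the branch is taken and decremented if it is not taken.
[State diagram: 2-bit saturating counter]
State  Prediction          On T   On NT
11     Predict Taken       11     10
10     Predict Taken       11     01
01     Predict Not Taken   10     00
00     Predict Not Taken   01     00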
38 Branch Prediction Flowchart
39 Branch Prediction State Diagram
40 2-Bit Saturating Up/Down Counter Predictor - 1
41 2-Bit Counter Predictor (Another Scheme)
42 Improved Performance Using 2-Bit Predictor
43 2-bit Predictor
- What is the prediction accuracy of a 4096-entry 2-bit branch predictor for a typical application?
- 99% to 80%, depending upon the application.
- Can an n-bit (n > 2) predictor do better?
- 2-bit predictors do almost as well as any n-bit predictor.
- Can the accuracy of branch prediction be improved?
- Correlating branch predictor.
44 Software-based Scheduling vs. Hardware-based Scheduling
- Disadvantage with compilers
- In many cases, much information cannot be extracted from the code.
- Examples
- Pointers to the same memory location.
- Value of the induction variable of a loop.
- It is still possible to assist the hardware by exposing more ILP.
- Rearrange instructions for increased performance.
45 An Example of Computing Performance
- Program assumptions
- 23% loads, and in ½ of cases the next instruction uses the load value
- 13% stores
- 19% conditional branches
- 2% unconditional branches
- 43% other
46 Example
- Machine assumptions
- 5-stage pipeline.
- Penalty of 1 cycle on use of a load value immediately after the load.
- Jumps are resolved in the ID stage, for a 1-cycle branch penalty.
- 75% branch prediction accuracy.
- 1-cycle delay on misprediction.
47 Example
- CPI penalty calculation
- Loads
- 50% of the 23% of loads have a 1-cycle penalty: 0.5 × 0.23 = 0.115
- Jumps
- All of the 2% of jumps have a 1-cycle penalty: 0.02 × 1 = 0.02
- Conditional branches
- 25% of the 19% are mispredicted and have a 1-cycle penalty: 0.25 × 0.19 × 1 = 0.0475
- Total penalty: 0.115 + 0.02 + 0.0475 = 0.1825
- Average CPI: 1 + 0.1825 = 1.1825
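The CPI arithmetic of this example can be reproduced in a few lines of Python. The variable names are illustrative; the instruction mix, the 75% accuracy, and the 1-cycle penalties come from the example's program and machine assumptions, and an ideal CPI of 1 is assumed for the pipeline without stalls.

```python
# Instruction mix and machine assumptions taken from the example slides.
mix = {"load": 0.23, "store": 0.13, "cond_branch": 0.19,
       "jump": 0.02, "other": 0.43}
load_use_fraction = 0.5      # loads whose value is used by the next instruction
prediction_accuracy = 0.75   # conditional branch prediction accuracy
penalty_cycles = 1           # load-use, jump, and misprediction penalty

load_penalty = mix["load"] * load_use_fraction * penalty_cycles
jump_penalty = mix["jump"] * penalty_cycles
branch_penalty = mix["cond_branch"] * (1 - prediction_accuracy) * penalty_cycles

total_penalty = load_penalty + jump_penalty + branch_penalty
cpi = 1 + total_penalty      # ideal CPI of 1 plus average stall cycles
print(round(total_penalty, 4), round(cpi, 4))
```

Changing `prediction_accuracy` or `penalty_cycles` shows how quickly the branch term grows in deeper pipelines, where a misprediction costs more than one cycle.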
48 Some Discussions on State-Based Predictor
- If an instruction is decoded as a branch and the branch is predicted taken, fetching begins as soon as the target address is known.
- The branch-taken prediction technique is of little use in the MIPS 5-stage pipeline.
- Why?
- Useful in deeper pipelines.
- What are the pros and cons of using a large BPB?
49 Predictors in Simple Pipelines
- Early pipelined processors, e.g. MIPS, SPARC, etc., did only trivial branch prediction.
- Possible reasons:
- The penalty of mispredictions was not as severe as in deeper pipelined processors.
- Sophisticated branch predictors did not exist.
- Advanced branch prediction techniques have now become very important with
- the use of deeper pipelines,
- the introduction of superscalar processors.
50 Handling Control Hazards: Branch Predictions
- Unless satisfactory resolution mechanisms are in place, branches can significantly degrade the performance of a pipeline.
- So far we have looked at some very simple branch prediction techniques.
- Yet these yielded reasonably good performance benefits, of the order of 50% to 100%.
- Can we do better by deploying more advanced branch prediction techniques?
51 Multi-Level Branch Prediction: Capturing Global Behaviour
52 Correlating Branch Predictor
- It may be possible to improve the accuracy of branch prediction by observing the recent behavior of other branches.
- Example
    if (a == 2) b = 2;
    if (b == 2) b = 0;
53 Correlating Branch Predictor
- An (m,n) predictor
- Makes use of the outcomes observed for the last m branches.
- Uses 2^m n-bit predictors.
- The behavior of a branch is predicted by choosing from among the 2^m branch predictors.
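The selection step of an (m,n) predictor can be sketched in Python as follows. This is a simplified model under stated assumptions: only a single branch entry is modelled, the n-bit predictors are saturating counters that start at 0, and the class and attribute names are hypothetical. Since only one branch runs in the demonstration, the "last m branches" are that branch's own previous outcomes.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m bits of global history select one of 2**m
    n-bit saturating counters. A single branch entry is modelled here."""

    def __init__(self, m, n):
        self.m = m
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)        # counter >= threshold predicts taken
        self.history = 0                     # outcomes of the last m branches
        self.counters = [0] * (1 << m)       # 2**m counters, all start at 0

    def predict(self):
        return self.counters[self.history] >= self.threshold

    def update(self, taken):
        i = self.history
        if taken:
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
        # Shift the new outcome into the m-bit global history register.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


# An alternating branch defeats a plain 1-bit predictor, but a (1,1)
# predictor learns it because the previous outcome selects the counter.
predictor = CorrelatingPredictor(m=1, n=1)
outcomes = [True, False] * 10
wrong = 0
for taken in outcomes:
    wrong += predictor.predict() != taken
    predictor.update(taken)
print(wrong, len(outcomes))
```

In this run only the very first prediction is wrong; after that, each history value has its own counter trained for the outcome that follows it.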
54 Correlating Branch Predictor
- Why does the outcome of one branch depend on the outcome of another branch?
- Depending on whether some preceding branch is taken or not, some variable may or may not be set to some value.