Title: CSE 420/598 Computer Architecture, Lec 9, Chapter 2: Branch Prediction
CSE 420/598 Computer Architecture Lec 9
Chapter 2 - Branch Prediction
- Sandeep K. S. Gupta
- School of Computing and Informatics
- Arizona State University
Based on Slides by David Patterson, Al Davis, and
Luddy Harrison
Agenda
- Dynamic Branch Prediction
- 1-Bit Predictor
- 2-Bit Predictor
- Correlating Predictor
- Tournament Predictor
- Programming Assignment 1: Case Study 2 on pg. 149, modeling a branch predictor in C or Java
Need for Better than Static Branch Prediction Techniques
Dynamic Branch Prediction
- Why does prediction work?
- Underlying algorithm has regularities
- Data that is being operated on has regularities
- Instruction sequence has redundancies that are artifacts of the way that humans/compilers think about problems
- Is dynamic branch prediction better than static branch prediction?
- Seems to be
- There are a small number of important branches in programs which have dynamic behavior
Control Hazard (Recap)
- In the 5-stage in-order processor: assume always taken or assume always not taken; if the branch goes the other way, squash mis-fetched instructions
- Modern out-of-order processors: dynamic branch prediction
- Branch predictor: a cache of recent branch outcomes
Pipeline without Branch Predictor
[Figure: the IF (br) stage fetches from PC or PC + 4; the next stage performs Reg Read, Compare, and Br-target]
In the 5-stage pipeline, a branch completes in two cycles. If the branch went the wrong way, one incorrect instruction is fetched, so there is one stall cycle per incorrect branch.
Pipeline with Branch Predictor
[Figure: the IF (br) stage fetches from the PC supplied by the Branch Predictor; the next stage performs Reg Read, Compare, and Br-target]
In the 5-stage pipeline, a branch completes in two cycles. If the branch went the wrong way, one incorrect instruction is fetched, so there is one stall cycle per incorrect branch.
Branch Mispredict Penalty
- Performance = f(accuracy, cost of misprediction)
- Assume: no data or structural hazards, only control hazards; every 5th instruction is a branch; branch predictor accuracy is 90%
- Slowdown = 1 / (1 + stalls per instruction)
- Stalls per instruction = % branches x % mispreds x penalty
- = 20% x 10% x 1
- = 0.02
- Slowdown = 1/1.02; if penalty = 20, slowdown = 1/1.4
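The arithmetic on this slide can be checked with a short script. This is only a sketch: the function names are made up for illustration, and it assumes an ideal base CPI of 1, as the slide does.

```python
def stalls_per_instruction(branch_fraction, mispredict_rate, penalty_cycles):
    """Average stall cycles added per instruction by mispredicted branches."""
    return branch_fraction * mispredict_rate * penalty_cycles


def slowdown(branch_fraction, mispredict_rate, penalty_cycles):
    """Performance relative to an ideal pipeline whose base CPI is 1."""
    stalls = stalls_per_instruction(branch_fraction, mispredict_rate, penalty_cycles)
    return 1.0 / (1.0 + stalls)


# Every 5th instruction is a branch (20%); 90% accuracy means 10% mispredicted.
print(round(slowdown(0.20, 0.10, 1), 3))    # penalty of 1 cycle:  1/1.02
print(round(slowdown(0.20, 0.10, 20), 3))   # penalty of 20 cycles: 1/1.4
```

The same predictor accuracy costs far more on a deep pipeline, which is why the penalty term matters as much as the accuracy term.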
Dynamic Branch Prediction: 1-Bit Prediction
- Branch History Table (BHT): lower bits of PC address index a table of 1-bit values
- Says whether or not the branch was taken last time
- No address check
- For each branch, keep track of what happened last time and use that outcome as the prediction
1-bit BHT, a.k.a. Branch Prediction Buffer (BPB)
Predict:
- If BPB entry is 0, fetch PC + 1
- If BPB entry is 1, fetch L
Update:
- If branch is taken, BPB = 1
- If branch is not taken, BPB = 0
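A minimal sketch of the predict/update rules above. The class name, the 10-bit index width, and the example PC values are assumptions chosen for illustration; the slides do not fix a table size.

```python
class OneBitPredictor:
    """1-bit branch prediction buffer indexed by the low bits of the PC."""

    def __init__(self, index_bits=10):         # table size is an assumption
        self.mask = (1 << index_bits) - 1
        self.table = [0] * (1 << index_bits)   # 0 = not taken, 1 = taken

    def predict(self, pc):
        # No address check: branches whose low PC bits match share an entry.
        return self.table[pc & self.mask] == 1

    def update(self, pc, taken):
        # Remember only the most recent outcome.
        self.table[pc & self.mask] = 1 if taken else 0


bpb = OneBitPredictor()
first = bpb.predict(0x40)                 # entry starts at 0: predict not taken
bpb.update(0x40, True)                    # the branch was actually taken
second = bpb.predict(0x40)                # now predicts taken
aliased = bpb.predict(0x40 + (1 << 10))   # different PC, same low bits, same entry
print(first, second, aliased)
```

The last lookup shows the cost of skipping the address check: a different branch inherits the history of whichever branch last wrote that entry.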
State Diagram of 1-bit Predictor
Twice Mispredicted Loop Branches
M: ADD R1, R2, R3
L: ADD R4, R5, R6
   MUL R7, R8, R9
   SUB R11, R11, 1
   BNE L
   SUB R10, R10, 1
   BNE M
Sequence of Predictions
Problem with 1-bit BHT
- What are prediction accuracies for branches 1 and 2?
    while (1) {
        for (i = 0; i < 10; i++) {    /* branch-1 */
            ...
        }
        for (j = 0; j < 20; j++) {    /* branch-2 */
            ...
        }
    }
- Problem: in a loop, a 1-bit BHT will cause two mispredictions (avg is 9 iterations before exit):
- End of loop case, when it exits instead of looping as before
- First iteration of the loop the next time the code is reached, when it predicts exit instead of looping
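The question on this slide can be explored with a small simulation. This is a sketch under stated assumptions: each loop is closed by a single branch that is taken to repeat the loop and falls through on exit, the branch executes once per iteration, and no other branch aliases to its entry. The function name and the number of outer passes are illustrative.

```python
def one_bit_accuracy(iterations, outer_passes=1000):
    """Accuracy of a 1-bit predictor on one loop-closing branch."""
    last_outcome = False            # the single history bit, initially not taken
    correct = total = 0
    for _ in range(outer_passes):
        for i in range(iterations):
            taken = i < iterations - 1      # taken until the final iteration
            correct += (last_outcome == taken)
            total += 1
            last_outcome = taken            # 1-bit update: keep the latest outcome
    return correct / total


print(one_bit_accuracy(10))   # branch-1
print(one_bit_accuracy(20))   # branch-2
```

Under these assumptions each pass through a loop costs two mispredictions, so the 10-iteration loop sees 8 of 10 predictions correct and the 20-iteration loop sees 18 of 20.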
2-Bit Prediction
- For each branch, maintain a 2-bit saturating counter:
- if the branch is taken: counter = min(3, counter + 1)
- if the branch is not taken: counter = max(0, counter - 1)
- If (counter >= 2), predict taken, else predict not taken
- Advantage: a few atypical branches will not influence the prediction (a better measure of "the common case")
- Especially useful when multiple branches share the same counter (some bits of the branch PC are used to index into the branch predictor)
- Can be easily extended to N bits (in most processors, N = 2)
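The counter rules above translate almost directly into code. The sketch below is illustrative only: the class and helper names are invented, and the starting value of 3 and the 9-taken-then-exit loop pattern are assumptions made for the demonstration.

```python
class TwoBitCounter:
    """2-bit saturating counter: values 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self, value=0):
        self.value = value

    def predict(self):
        return self.value >= 2

    def update(self, taken):
        if taken:
            self.value = min(3, self.value + 1)
        else:
            self.value = max(0, self.value - 1)


def accuracy(outcomes, counter):
    """Fraction of outcomes the counter predicts correctly, updating as it goes."""
    correct = 0
    for taken in outcomes:
        correct += (counter.predict() == taken)
        counter.update(taken)
    return correct / len(outcomes)


# Loop-closing branch: taken 9 times, then not taken once, repeated 100 times.
loop = ([True] * 9 + [False]) * 100
print(accuracy(loop, TwoBitCounter(value=3)))
```

The single not-taken outcome at loop exit only moves the counter from 3 to 2, so the next entry into the loop is still predicted taken and only the exit itself is mispredicted.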
Dynamic Branch Prediction
- Solution: a 2-bit scheme that changes the prediction only after two mispredictions in a row
- Red: stop, not taken
- Green: go, taken
- Adds hysteresis to the decision-making process
Bimodal Predictor
- 14 bits of the branch PC index into a table of 16K entries of 2-bit saturating counters
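A possible model of the bimodal table, focusing on indexing and storage. Several details are assumptions not stated on the slide: the class name, the initial counter value, and dropping the two lowest PC bits (which presumes 4-byte aligned instructions).

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by low bits of the branch PC."""

    def __init__(self, index_bits=14):
        self.mask = (1 << index_bits) - 1
        # Initial value 1 (weakly not taken) is an assumption.
        self.counters = [1] * (1 << index_bits)

    def index(self, pc):
        # Assumes 4-byte aligned instructions, so the two lowest PC bits are skipped.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

    def storage_bits(self):
        return 2 * len(self.counters)


bimodal = BimodalPredictor()
print(len(bimodal.counters), bimodal.storage_bits())   # 16384 entries, 32768 bits
```

With 14 index bits the table holds 16K counters, or 32 Kbits of state.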
BHT Accuracy
- Mispredict because either:
- Wrong guess for that branch
- Got branch history of wrong branch when indexing the table
- 4096-entry table
[Figure: misprediction rates, grouped into Integer and Floating Point benchmarks]
Correlating Predictors
- Basic branch prediction: maintain a 2-bit saturating counter for each entry (or use 10 branch PC bits to index into one of 1024 counters); captures the recent "common case" for each branch
- Can we take advantage of additional information?
- If a branch recently went 01111, expect 0; if it recently went 11101, expect 1; can we have a separate counter for each case?
- If the previous branches went 01, expect 0; if the previous branches went 11, expect 1; can we have a separate counter for each case?
- Hence, build correlating predictors
Local/Global Predictors
- Instead of maintaining a counter for each branch to capture the common case,
- Maintain a counter for each branch and surrounding pattern
- If the surrounding pattern belongs to the branch being predicted, the predictor is referred to as a local predictor
- If the surrounding pattern includes neighboring branches, the predictor is referred to as a global predictor
Global Predictor
- A single register keeps track of recent history for all branches (e.g., 00110101)
- 8 bits of global history and 6 bits of the branch PC together index into a table of 16K entries of 2-bit saturating counters
- Also referred to as a two-level predictor
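One way to model this two-level global predictor is sketched below. The bit ordering in the index (PC bits above history bits), the initial counter value, and the alternating test branch are all assumptions made for illustration.

```python
class GlobalPredictor:
    """Two-level predictor: global history plus PC bits index 2-bit counters."""

    def __init__(self, history_bits=8, pc_bits=6):
        self.history_bits = history_bits
        self.history_mask = (1 << history_bits) - 1
        self.pc_mask = (1 << pc_bits) - 1
        self.history = 0     # one shift register shared by all branches
        self.counters = [1] * (1 << (history_bits + pc_bits))

    def _index(self, pc):
        # Concatenate PC bits (upper part) with the global history (lower part).
        return ((pc & self.pc_mask) << self.history_bits) | self.history

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the newest outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


# A branch that strictly alternates taken / not taken.
gp = GlobalPredictor()
correct_late = 0
for k in range(200):
    taken = (k % 2 == 0)
    if k >= 100:
        correct_late += (gp.predict(0x10) == taken)
    gp.update(0x10, taken)
print(len(gp.counters), correct_late)
```

A plain 2-bit counter cannot track an alternating branch, but here the history pattern selects a different counter for each phase, so after warm-up every prediction in the second half of the run is correct.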
Local Predictor
- Also a two-level predictor that only uses local histories at the first level
- Use 6 bits of branch PC to index into the local history table: a table of 64 entries, each a 14-bit history for a single branch (e.g., 10110111011001)
- The 14-bit history indexes into the next level: a table of 16K entries of 2-bit saturating counters
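The two levels of the local predictor can be sketched as two tables. The class name, the initial values, and the 4-iteration loop used to exercise it are assumptions made for illustration.

```python
class LocalPredictor:
    """Two-level predictor using per-branch (local) histories at the first level."""

    def __init__(self, pc_bits=6, history_bits=14):
        self.pc_mask = (1 << pc_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.histories = [0] * (1 << pc_bits)       # level 1: 64 local histories
        self.counters = [1] * (1 << history_bits)   # level 2: 16K 2-bit counters

    def predict(self, pc):
        history = self.histories[pc & self.pc_mask]
        return self.counters[history] >= 2

    def update(self, pc, taken):
        slot = pc & self.pc_mask
        history = self.histories[slot]
        c = self.counters[history]
        self.counters[history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into this branch's own history.
        self.histories[slot] = ((history << 1) | int(taken)) & self.history_mask


# Loop-closing branch of a 4-iteration loop: taken, taken, taken, not taken.
lp = LocalPredictor()
outcomes = [True, True, True, False] * 250
correct_late = 0
for k, taken in enumerate(outcomes):
    if k >= 600:
        correct_late += (lp.predict(0x24) == taken)
    lp.update(0x24, taken)
print(len(lp.histories), len(lp.counters), correct_late)
```

Because the 14-bit history covers more than one full trip through the loop, the predictor learns the exit as well as the loop-back, which a single saturating counter cannot do.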
Correlated Branch Prediction
- Idea: record m most recently executed branches as taken or not taken, and use that pattern to select the proper n-bit branch history table
- In general, (m,n) predictor means record last m branches to select between 2^m history tables, each with n-bit counters
- Thus, old 2-bit BHT is a (0,2) predictor
- Global Branch History: m-bit shift register keeping T/NT status of last m branches
- Each entry in table has 2^m n-bit predictors
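A generic (m,n) predictor might be modeled as follows. This is a sketch: the class name, the 4-bit default index width, the initial counter values, and the storage helper are assumptions, not part of the slides.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m global history bits pick one of 2**m n-bit counters per entry."""

    def __init__(self, m=2, n=2, index_bits=4):
        self.m, self.n = m, n
        self.max_count = (1 << n) - 1
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << m) - 1
        self.history = 0      # m-bit shift register of recent T/NT outcomes
        self.table = [[0] * (1 << m) for _ in range(1 << index_bits)]

    def predict(self, pc):
        counter = self.table[pc & self.index_mask][self.history]
        return counter >= (1 << (self.n - 1))     # upper half of the range: taken

    def update(self, pc, taken):
        row = self.table[pc & self.index_mask]
        c = row[self.history]                     # update only the selected counter
        row[self.history] = min(self.max_count, c + 1) if taken else max(0, c - 1)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask

    def storage_bits(self):
        return len(self.table) * (1 << self.m) * self.n


print(CorrelatingPredictor(m=2, n=2, index_bits=4).storage_bits())    # 16 x 4 x 2
print(CorrelatingPredictor(m=0, n=2, index_bits=12).storage_bits())   # plain 2-bit BHT
```

Setting m = 0 leaves one counter per entry and an always-zero history, which is exactly the plain 2-bit BHT described as a (0,2) predictor.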
Correlating Branches
- (2,2) predictor: behavior of recent branches selects between four predictions of next branch, updating just that prediction
[Figure: 4 bits of the branch address select a table entry holding four 2-bit per-branch predictors; the 2-bit global branch history selects which of the four supplies the prediction]
Accuracy of Different Schemes
[Figure: frequency of mispredictions (axis from 0 to 20) on nasa7, matrix300, doducd, spice, fpppp, gcc, espresso, eqntott, li, and tomcatv for three predictors: 4,096 entries with 2 bits per entry; unlimited entries with 2 bits per entry; 1,024 entries (2,2)]
Tournament Predictors
- A local predictor might work well for some branches or programs, while a global predictor might work well for others
- Provide one of each and maintain another predictor to identify which predictor is best for each branch
[Figure: the branch PC indexes the tournament predictor, a table of 2-bit saturating counters, whose output drives a MUX that selects between the local predictor and the global predictor]
Global Predictor Example
What is the total capacity of this branch predictor?
- A single register keeps track of recent history for all branches (shown as 00110101 in the figure)
- 10 bits of global history and 4 bits of the branch PC together index into a table of 2-bit saturating counters
- Also referred to as a two-level predictor
Local Predictor Example
What is the total capacity of this branch predictor?
- Use 8 bits of branch PC to index into the local history table: a table of 8-bit histories, each for a single branch (e.g., 10110111)
- The 8-bit history indexes into a table of 2-bit saturating counters
Example
- Consider the following tournament branch predictor. Fourteen bits of the PC are used to index into a table of 3-bit saturating counters that predict whether we should use a local or global prediction. The global predictor concatenates 8 bits of branch PC and 6 bits of global history to index into 2-bit saturating counters. The local predictor uses 8 bits of branch PC to select an 8-bit local history that then indexes into a table of 2-bit saturating counters. What is the capacity of each structure in this branch predictor?
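One way to work this example is to count entries as 2 to the power of the index width and multiply by the bits per entry. The helper and variable names below are illustrative, and the totals exclude the small global history register itself.

```python
def table_bits(index_bits, bits_per_entry):
    """Storage of a table with 2**index_bits entries of bits_per_entry bits each."""
    return (1 << index_bits) * bits_per_entry


selector = table_bits(14, 3)             # 14 PC bits -> 16K 3-bit counters
global_counters = table_bits(8 + 6, 2)   # 8 PC bits + 6 history bits -> 16K 2-bit counters
local_histories = table_bits(8, 8)       # 8 PC bits -> 256 8-bit histories
local_counters = table_bits(8, 2)        # 8-bit history -> 256 2-bit counters

for name, bits in [("selector", selector),
                   ("global counters", global_counters),
                   ("local histories", local_histories),
                   ("local counters", local_counters)]:
    print(f"{name}: {bits} bits")
```

Under this counting the selector is 48 Kbits, the global counter table 32 Kbits, the local history table 2 Kbits, and the local counter table 512 bits.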
Tournament Predictors
- Multilevel branch predictor
- Use n-bit saturating counter to choose between predictors
- Usual choice between global and local predictors
Tournament Predictors
- Tournament predictor using, say, 4K 2-bit counters indexed by local branch address. Chooses between:
- Global predictor
- 4K entries indexed by history of last 12 branches (2^12 = 4K)
- Each entry is a standard 2-bit predictor
- Local predictor
- Local history table: 1024 10-bit entries recording last 10 branches, indexed by branch address
- The pattern of the last 10 occurrences of that particular branch is used to index a table of 1K entries with 3-bit saturating counters
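The configuration above could be modeled roughly as follows. Several choices are assumptions rather than anything the slide specifies: the chooser polarity (2 or 3 means trust the global predictor), all initial values, indexing by PC modulo the table size, and training the chooser only when the two components disagree.

```python
def bump(counter, up, max_value):
    """Saturating increment (up) or decrement (not up)."""
    return min(max_value, counter + 1) if up else max(0, counter - 1)


class TournamentPredictor:
    def __init__(self):
        self.chooser = [2] * 4096           # 2-bit: >= 2 selects the global predictor
        self.global_history = 0             # outcomes of the last 12 branches
        self.global_counters = [1] * 4096   # 2-bit counters, indexed by global history
        self.local_histories = [0] * 1024   # 10-bit history per branch address
        self.local_counters = [3] * 1024    # 3-bit counters: >= 4 predicts taken

    def _components(self, pc):
        g = self.global_counters[self.global_history] >= 2
        l = self.local_counters[self.local_histories[pc % 1024]] >= 4
        return g, l

    def predict(self, pc):
        g, l = self._components(pc)
        return g if self.chooser[pc % 4096] >= 2 else l

    def update(self, pc, taken):
        g, l = self._components(pc)
        if g != l:   # assumption: train the chooser only on disagreement
            self.chooser[pc % 4096] = bump(self.chooser[pc % 4096], g == taken, 3)
        gh = self.global_history
        self.global_counters[gh] = bump(self.global_counters[gh], taken, 3)
        slot = pc % 1024
        lh = self.local_histories[slot]
        self.local_counters[lh] = bump(self.local_counters[lh], taken, 7)
        self.local_histories[slot] = ((lh << 1) | int(taken)) & 0x3FF
        self.global_history = ((gh << 1) | int(taken)) & 0xFFF


# Loop-closing branch of a 4-iteration loop: taken three times, then not taken.
tp = TournamentPredictor()
outcomes = [True, True, True, False] * 250
correct_late = 0
for k, taken in enumerate(outcomes):
    if k >= 600:
        correct_late += (tp.predict(0x80) == taken)
    tp.update(0x80, taken)
print(correct_late, "of 400 late predictions correct")
```

With only one branch in the trace both components learn the pattern, so the chooser has little to do; its value shows up when different branches favor different components.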
Comparing Predictors (Fig. 2.8)
- Advantage of tournament predictor is ability to select the right predictor for a particular branch
- Particularly crucial for integer benchmarks
- A typical tournament predictor will select the global predictor almost 40% of the time for the SPEC integer benchmarks and less than 15% of the time for the SPEC FP benchmarks
Pentium 4 Misprediction Rate (per 1000 instructions, not per branch)
- ≈6% misprediction rate per branch for SPECint (19% of INT instructions are branches)
- ≈2% misprediction rate per branch for SPECfp (5% of FP instructions are branches)
[Figure: mispredictions per 1000 instructions for the SPECint2000 and SPECfp2000 benchmarks]
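The per-branch and per-1000-instruction views are related by the branch frequency. The short sketch below converts the approximate per-branch figures quoted above; the function name is illustrative and the results are only as precise as those rounded inputs.

```python
def mispredicts_per_1000_instructions(branch_fraction, mispredict_rate_per_branch):
    """Expected mispredictions in 1000 instructions."""
    return 1000 * branch_fraction * mispredict_rate_per_branch


specint = mispredicts_per_1000_instructions(0.19, 0.06)   # about 11.4
specfp = mispredicts_per_1000_instructions(0.05, 0.02)    # about 1
print(round(specint, 1), round(specfp, 1))
```

The integer codes suffer roughly an order of magnitude more mispredictions per instruction, since they both branch more often and mispredict more often per branch.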
Branch Target Prediction
- In addition to predicting the branch direction, we must also predict the branch target address
- Branch PC indexes into a predictor table; indirect branches might be problematic
- Most common indirect branch: return from a procedure; can be easily handled with a stack of return addresses
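The stack of return addresses mentioned in the last bullet might look like the sketch below. The class name, the depth of 16, and the policy of discarding the oldest entry on overflow are assumptions made for illustration.

```python
class ReturnAddressStack:
    """Predicts return targets by pairing each return with the most recent call."""

    def __init__(self, depth=16):   # depth is an assumption
        self.depth = depth
        self.stack = []

    def on_call(self, return_address):
        if len(self.stack) == self.depth:
            self.stack.pop(0)       # overflow: discard the oldest entry
        self.stack.append(return_address)

    def predict_return(self):
        # Returns None when there is nothing to predict from.
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack()
ras.on_call(0x104)             # outer call; its return address is 0x104
ras.on_call(0x204)             # nested call
inner = ras.predict_return()   # innermost return comes first
outer = ras.predict_return()
empty = ras.predict_return()
print(hex(inner), hex(outer), empty)
```

Calls and returns nest, so a last-in, first-out structure predicts return targets that a PC-indexed table would get wrong whenever a procedure is called from more than one site.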
Summary
- When comparing branch predictors, ensure that they are of the same size
- Correlating predictors predict branch direction based on behavior of neighboring branches
- Tournament predictors select between global and local predictors
- Integer benchmarks benefit greatly from global and correlating predictors
- Next class: BTB, Dynamic Scheduling of Instructions