CSE 420/598 Computer Architecture Lec 9 Chapter 2 Branch Prediction

1
CSE 420/598 Computer Architecture Lec 9
Chapter 2 - Branch Prediction
  • Sandeep K. S. Gupta
  • School of Computing and Informatics
  • Arizona State University

Based on Slides by David Patterson, Al Davis, and
Luddy Harrison
2
Agenda
  • Dynamic Branch Prediction
  • 1-Bit Predictor
  • 2-Bit Predictor
  • Correlating Predictor
  • Tournament Predictor
  • Programming Assignment 1: Case Study 2 on pg. 149,
    Modeling a Branch Predictor in C or Java.

3
Need for Better than Static Branch Prediction
Techniques

4
Dynamic Branch Prediction
  • Why does prediction work?
  • Underlying algorithm has regularities
  • Data that is being operated on has regularities
  • Instruction sequence has redundancies that are
    artifacts of the way that humans/compilers think
    about problems
  • Is dynamic branch prediction better than static
    branch prediction?
  • Seems to be
  • There are a small number of important branches in
    programs which have dynamic behavior

5
Control Hazard (Recap)
  • In the 5-stage in-order processor: assume always
    taken or assume always not taken; if the branch
    goes the other way, squash mis-fetched
    instructions
  • Modern out-of-order processors: dynamic branch
    prediction
  • Branch predictor: a cache of recent branch
    outcomes

6
Pipeline without Branch Predictor
[Figure: the branch is fetched in IF (br); the next stage performs Reg Read, Compare, and Br-target; the fetch PC defaults to PC + 4]
In the 5-stage pipeline, a branch completes in
two cycles. If the branch went the wrong way,
one incorrect instr is fetched: one stall cycle
per incorrect branch
7
Pipeline with Branch Predictor
[Figure: the same pipeline (IF (br), then Reg Read, Compare, Br-target), with a Branch Predictor supplying the fetch PC]
In the 5-stage pipeline, a branch completes in
two cycles. If the branch went the wrong way,
one incorrect instr is fetched: one stall cycle
per incorrect branch
8
Branch Mispredict Penalty
  • Performance = f(accuracy, cost of misprediction)
  • Assume: no data or structural hazards, only
    control hazards; every 5th instruction is a
    branch; branch predictor accuracy is 90%
  • Slowdown = 1 / (1 + stalls per instruction)
  • Stalls per instruction = % branches x % mispreds x
    penalty
  • = 20% x 10% x 1
  • = 0.02
  • Slowdown = 1/1.02; if penalty = 20, slowdown =
    1/1.4
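The arithmetic on this slide can be checked with a short sketch. The function names are illustrative, and the inputs are the slide's own assumptions (20% branches, 10% mispredictions, penalty of 1 or 20 cycles).

```python
def stalls_per_instruction(branch_fraction, mispredict_rate, penalty_cycles):
    # Stalls per instruction = % branches x % mispredictions x penalty
    return branch_fraction * mispredict_rate * penalty_cycles


def slowdown(stalls):
    # Slowdown = 1 / (1 + stalls per instruction)
    return 1.0 / (1.0 + stalls)


for penalty in (1, 20):
    s = stalls_per_instruction(0.20, 0.10, penalty)
    print(f"penalty={penalty:2d}  stalls/instr={s:.2f}  slowdown=1/{1 + s:.2f}")
```

With a 20-cycle penalty the same 90% accuracy costs 0.4 stall cycles per instruction, which is why deeper pipelines need more accurate predictors.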

9
Dynamic Branch Prediction: 1-Bit Prediction
  • Branch History Table (BHT): lower bits of PC
    address index table of 1-bit values
  • Says whether or not branch taken last time
  • No address check
  • For each branch, keep track of what happened last
    time and use that outcome as the prediction

10
1-bit BHT a.k.a Branch Prediction Buffer (BPB)
Predict: If BPB entry is 0, fetch PC+1. If BPB
entry is 1, fetch L.
Update: If branch is taken, BPB = 1. If branch is
not taken, BPB = 0.
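A minimal sketch of the predict/update rules above, assuming a hypothetical 16-entry table and treating `pc` as an instruction index (so the low bits are used directly). Because there is no address check, two branches whose low PC bits match share one entry.

```python
class OneBitBPB:
    """1-bit branch prediction buffer indexed by the low bits of the PC."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.table = [0] * (1 << index_bits)  # 0 = predict not taken

    def predict(self, pc):
        # Entry 1 -> predict taken (fetch target L); 0 -> fall through
        return self.table[pc & self.mask] == 1

    def update(self, pc, taken):
        # Remember only what the branch did last time
        self.table[pc & self.mask] = 1 if taken else 0


bpb = OneBitBPB(index_bits=4)
bpb.update(0x24, True)
print(bpb.predict(0x24))  # True: this branch was taken last time
print(bpb.predict(0x34))  # True: 0x34 aliases to the same entry as 0x24
```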
11
State Diagram of 1-bit Predictor
12
Twice Mispredicted Loop Branches
    M: ADD R1, R2, R3
    L: ADD R4, R5, R6
       MUL R7, R8, R9
       SUB R11, R11, 1
       BNE L
       SUB R10, R10, 1
       BNE M

13
Sequence of Predictions
14
Problem with 1-bit BHT
  • What are prediction accuracies for branches 1 and
    2?
  • while (1)
  •   for (i=0; i<10; i++)    branch-1
  •   for (j=0; j<20; j++)    branch-2
  • Problem: in a loop, 1-bit BHT will cause two
    mispredictions (avg is 9 iterations before exit):
  • End of loop case, when it exits instead of
    looping as before
  • First time through the loop on the next pass
    through the code, when it predicts exit instead
    of looping
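The question on this slide can be explored with a small simulation. This is a sketch under one assumption not stated on the slide: each loop branch executes once per iteration, taken on every iteration except the last. The function name is illustrative.

```python
def one_bit_accuracy(trip_count, passes=1000):
    """Accuracy of a private 1-bit predictor on a loop branch that is
    taken trip_count - 1 times and then not taken, repeated `passes` times."""
    state = 1  # last outcome seen; start as "taken" (assumption)
    correct = total = 0
    for _ in range(passes):
        for k in range(trip_count):
            taken = k < trip_count - 1
            correct += (state == 1) == taken
            state = 1 if taken else 0  # remember only the last outcome
            total += 1
    return correct / total


print(one_bit_accuracy(10))  # branch-1: close to 0.80
print(one_bit_accuracy(20))  # branch-2: close to 0.90
```

Each pass through a loop costs two mispredictions (the exit, then the first iteration of the next pass), so the accuracy approaches (N - 2) / N for a loop of N iterations.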

15
2-Bit Prediction
  • For each branch, maintain a 2-bit saturating
    counter:
  • if the branch is taken: counter =
    min(3, counter+1)
  • if the branch is not taken: counter =
    max(0, counter-1)
  • If (counter >= 2), predict taken, else predict
    not taken
  • Advantage: a few atypical branches will not
    influence the prediction (a better measure of
    the common case)
  • Especially useful when multiple branches share
    the same counter (some bits of the branch PC are
    used to index into the branch predictor)
  • Can be easily extended to N bits (in most
    processors, N=2)
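The counter rules above translate directly into code. The sketch below applies them to a loop branch that is taken 9 times and then not taken; the starting counter value of 3 and the helper names are assumptions for illustration.

```python
def update(counter, taken):
    # Saturating increment on taken, saturating decrement on not taken
    return min(3, counter + 1) if taken else max(0, counter - 1)


def predict(counter):
    # Counter values 2 and 3 predict taken; 0 and 1 predict not taken
    return counter >= 2


def accuracy(outcomes, counter=3):
    correct = 0
    for taken in outcomes:
        correct += predict(counter) == taken
        counter = update(counter, taken)
    return correct / len(outcomes)


loop_branch = ([True] * 9 + [False]) * 1000
print(accuracy(loop_branch))
```

The single not-taken outcome at loop exit only moves the counter from 3 to 2, so the first iteration of the next pass is still predicted taken: one misprediction per pass instead of two.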

16
Dynamic Branch Prediction
  • Solution: 2-bit scheme where the prediction
    changes only after two mispredictions in a row
  • Red: stop, not taken
  • Green: go, taken
  • Adds hysteresis to decision making process

17
Bimodal Predictor
[Figure: 14 bits of the Branch PC index a table of 16K entries of 2-bit saturating counters]
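A bimodal predictor of this shape might be sketched as follows. The class name, the initial counter value, and the choice to drop the two low PC bits (4-byte instructions) are assumptions, not taken from the slide.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by branch PC bits."""

    def __init__(self, index_bits=14):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start weakly not taken

    def _index(self, pc):
        return (pc >> 2) & self.mask  # assumes 4-byte aligned instructions

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)

    def storage_bits(self):
        return 2 * len(self.counters)


bp = BimodalPredictor()
print(len(bp.counters), bp.storage_bits())  # 16384 entries, 32768 bits
```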
18
BHT Accuracy
  • Mispredict because either:
  • Wrong guess for that branch
  • Got branch history of wrong branch when indexing
    the table
  • 4096 entry table

[Figure: results shown separately for Integer and Floating Point benchmarks]
19
Correlating Predictors
  • Basic branch prediction: maintain a 2-bit
    saturating counter for each entry (or use 10
    branch PC bits to index into one of 1024
    counters); captures the recent common case for
    each branch
  • Can we take advantage of additional information?
  • If a branch recently went 01111, expect 0; if it
    recently went 11101, expect 1; can we have a
    separate counter for each case?
  • If the previous branches went 01, expect 0; if
    the previous branches went 11, expect 1; can we
    have a separate counter for each case?
  • Hence, build correlating predictors

20
Local/Global Predictors
  • Instead of maintaining a counter for each branch
    to capture the common case,
  • Maintain a counter for each branch and
    surrounding pattern
  • If the surrounding pattern belongs to the branch
    being predicted, the predictor is referred to as
    a local predictor
  • If the surrounding pattern includes neighboring
    branches, the predictor is referred to as a
    global predictor

21
Global Predictor
A single register that keeps track of recent
history for all branches
[Figure: a global history register (shown as 00110101) and Branch PC bits, labeled 8 bits and 6 bits, together index a table of 16K entries of 2-bit saturating counters]
Also referred to as a two-level predictor
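One way to sketch such a global predictor is shown below. The split into 6 PC bits and 8 history bits is an assumption (only the 14-bit total follows from the 16K-entry table), as are the class name and initial counter value.

```python
class GlobalPredictor:
    """Two-level predictor: PC bits concatenated with a global history
    register index a table of 2-bit saturating counters."""

    def __init__(self, pc_bits=6, history_bits=8):
        self.pc_bits = pc_bits
        self.history_bits = history_bits
        self.history = 0  # one shift register shared by all branches
        self.counters = [1] * (1 << (pc_bits + history_bits))

    def _index(self, pc):
        pc_part = pc & ((1 << self.pc_bits) - 1)
        return (pc_part << self.history_bits) | self.history

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into the global history register
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.history_bits) - 1)


gp = GlobalPredictor()
hits = 0
for i in range(200):
    taken = i % 2 == 0  # strictly alternating branch
    hits += gp.predict(5) == taken
    gp.update(5, taken)
print(hits / 200)
```

A plain 2-bit counter does poorly on an alternating branch, while the history-indexed counters learn the pattern after a short warm-up.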
22
Local Predictor
Also a two-level predictor that only uses local
histories at the first level
[Figure: 6 bits of the Branch PC index into a local history table of 64 entries, each a 14-bit history for a single branch; the selected 14-bit history (e.g., 10110111011001) indexes into the next level, a table of 16K entries of 2-bit saturating counters]
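The two levels described here could be modeled roughly as below. This is a sketch with the slide's sizes as defaults; the class name, the initial values, and the use of the raw low PC bits are assumptions.

```python
class LocalPredictor:
    """Level 1: per-branch history table indexed by PC bits.
    Level 2: 2-bit counters indexed by the selected local history."""

    def __init__(self, pc_bits=6, history_bits=14):
        self.pc_mask = (1 << pc_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.histories = [0] * (1 << pc_bits)      # 64 x 14-bit histories
        self.counters = [1] * (1 << history_bits)  # 16K x 2-bit counters

    def predict(self, pc):
        return self.counters[self.histories[pc & self.pc_mask]] >= 2

    def update(self, pc, taken):
        slot = pc & self.pc_mask
        h = self.histories[slot]
        c = self.counters[h]
        self.counters[h] = min(3, c + 1) if taken else max(0, c - 1)
        self.histories[slot] = ((h << 1) | int(taken)) & self.hist_mask


lp = LocalPredictor()
hits = 0
for i in range(500):
    taken = i % 5 != 4  # loop branch: taken 4 times, then exits
    hits += lp.predict(3) == taken
    lp.update(3, taken)
print(hits / 500)
```

Because the 14-bit history is longer than the 5-iteration loop, the pattern leading up to the exit is distinct, so after warm-up even the loop exit is predicted correctly.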
23
Correlated Branch Prediction
  • Idea: record m most recently executed branches
    as taken or not taken, and use that pattern to
    select the proper n-bit branch history table
  • In general, (m,n) predictor means record last m
    branches to select between 2^m history tables,
    each with n-bit counters
  • Thus, old 2-bit BHT is a (0,2) predictor
  • Global Branch History: m-bit shift register
    keeping T/NT status of last m branches.
  • Each entry in table has 2^m n-bit predictors.
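The storage cost of an (m,n) predictor follows from the definition: each entry selected by the branch address holds 2^m counters of n bits. A small sketch, with an illustrative function name:

```python
def predictor_bits(m, n, entries):
    """Total bits in an (m,n) predictor with `entries` entries
    selected by the branch address."""
    return (2 ** m) * n * entries


print(predictor_bits(0, 2, 4096))  # 8192: plain 2-bit BHT with 4096 entries
print(predictor_bits(2, 2, 1024))  # 8192: (2,2) predictor with 1024 entries
```

A 4096-entry 2-bit BHT and a 1024-entry (2,2) predictor therefore use the same number of bits, which makes them a fair pair to compare.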

24
Correlating Branches
(2,2) predictor: behavior of recent branches
selects between four predictions of next branch,
updating just that prediction
[Figure: 4 bits of the branch address select a row of 2-bit per-branch predictors; the 2-bit global branch history selects which of them supplies the prediction]
25
Accuracy of Different Schemes
[Figure: bar chart of Frequency of Mispredictions (axis ticks 0 to 20) for nasa7, matrix300, doduc, spice, fpppp, gcc, espresso, eqntott, li, and tomcatv under three schemes: 4,096 entries 2-bits per entry; unlimited entries 2-bits/entry; 1,024 entries (2,2)]
26
Tournament Predictors
  • A local predictor might work well for some
    branches or programs, while a global predictor
    might work well for others
  • Provide one of each and maintain another
    predictor to identify which predictor is best
    for each branch

[Figure: the Branch PC indexes the Tournament Predictor (a table of 2-bit saturating counters), which drives a MUX selecting between the Local Predictor and the Global Predictor]
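The selection mechanism can be sketched as follows. Table sizes, names, and initial values are illustrative assumptions. Training the chooser only when the two components disagree is a common policy, assumed here rather than taken from the slide.

```python
def bump(counter, up, top=3):
    # Saturating counter update
    return min(top, counter + 1) if up else max(0, counter - 1)


class TournamentPredictor:
    def __init__(self, bits=10):
        n = 1 << bits
        self.mask = n - 1
        self.ghist = 0
        self.global_ctr = [1] * n  # indexed by global history
        self.local_hist = [0] * n  # indexed by PC
        self.local_ctr = [1] * n   # indexed by local history
        self.chooser = [1] * n     # indexed by PC; >= 2 means "use global"

    def _components(self, pc):
        g = self.global_ctr[self.ghist] >= 2
        l = self.local_ctr[self.local_hist[pc & self.mask]] >= 2
        return g, l

    def predict(self, pc):
        g, l = self._components(pc)
        return g if self.chooser[pc & self.mask] >= 2 else l

    def update(self, pc, taken):
        slot = pc & self.mask
        g, l = self._components(pc)
        if g != l:  # move the chooser toward whichever component was right
            self.chooser[slot] = bump(self.chooser[slot], g == taken)
        self.global_ctr[self.ghist] = bump(self.global_ctr[self.ghist], taken)
        lh = self.local_hist[slot]
        self.local_ctr[lh] = bump(self.local_ctr[lh], taken)
        self.ghist = ((self.ghist << 1) | int(taken)) & self.mask
        self.local_hist[slot] = ((lh << 1) | int(taken)) & self.mask


tp = TournamentPredictor()
hits = 0
for i in range(600):
    taken = i % 3 != 2  # repeating taken, taken, not-taken pattern
    hits += tp.predict(7) == taken
    tp.update(7, taken)
print(hits / 600)
```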
27
Global Predictor Example
What is the total capacity of this branch
predictor?
A single register that keeps track of recent
history for all branches
[Figure: a global history register (shown as 00110101) and Branch PC bits, labeled 10 bits and 4 bits, together index a table of 2-bit saturating counters]
Also referred to as a two-level predictor
28
Local Predictor Example
What is the total capacity of this branch
predictor?
[Figure: 8 bits of the Branch PC index into a local history table of 8-bit histories, each for a single branch; the selected history (e.g., 10110111) indexes a table of 2-bit saturating counters]
29
Example
  • Consider the following tournament branch
    predictor: Fourteen bits of the PC are used to
    index into a table of 3-bit saturating counters
    that predict whether we should use a local or
    global prediction. The global predictor
    concatenates 8 bits of branch PC and 6 bits of
    global history to index into 2-bit saturating
    counters. The local predictor uses 8 bits of
    branch PC to select an 8-bit local history that
    then indexes into a table of 2-bit saturating
    counters. What is the capacity of each structure
    in this branch predictor?
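One way to work this example is to count entries times bits per entry for each structure. The variable names are illustrative, and the sketch assumes each index of k bits addresses exactly 2^k entries.

```python
# Selector: 14 PC bits index a table of 3-bit counters
selector_bits = (1 << 14) * 3

# Global predictor: 8 PC bits + 6 history bits index 2-bit counters
global_counter_bits = (1 << (8 + 6)) * 2

# Local predictor, level 1: 8 PC bits select one of 2^8 8-bit histories
local_history_bits = (1 << 8) * 8

# Local predictor, level 2: an 8-bit history indexes 2-bit counters
local_counter_bits = (1 << 8) * 2

for name, bits in [
    ("selector", selector_bits),
    ("global counters", global_counter_bits),
    ("local histories", local_history_bits),
    ("local counters", local_counter_bits),
]:
    print(f"{name:16s} {bits:6d} bits = {bits / 1024:g} Kb")
```

Under these assumptions the selector is the largest structure (48 Kb), followed by the global counters (32 Kb), while the local predictor needs only 2 Kb of histories plus 0.5 Kb of counters.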

30
Tournament Predictors
  • Multilevel branch predictor
  • Use n-bit saturating counter to choose between
    predictors
  • Usual choice between global and local predictors

31
Tournament Predictors
  • Tournament predictor using, say, 4K 2-bit
    counters indexed by local branch address.
    Chooses between:
  • Global predictor
      • 4K entries indexed by history of last 12
        branches (2^12 = 4K)
      • Each entry is a standard 2-bit predictor
  • Local predictor
      • Local history table: 1024 10-bit entries
        recording last 10 branches, indexed by branch
        address
      • The pattern of the last 10 occurrences of that
        particular branch is used to index a table of
        1K entries with 3-bit saturating counters

32
Comparing Predictors (Fig. 2.8)
  • Advantage of tournament predictor is ability to
    select the right predictor for a particular
    branch
  • Particularly crucial for integer benchmarks.
  • A typical tournament predictor will select the
    global predictor almost 40% of the time for the
    SPEC integer benchmarks and less than 15% of the
    time for the SPEC FP benchmarks

33
Pentium 4 Misprediction Rate (per 1000
instructions, not per branch)
≈6% misprediction rate per branch SPECint (19%
of INT instructions are branches)
≈2% misprediction rate per branch SPECfp (5% of
FP instructions are branches)
[Figure: misprediction rates for SPECint2000 and SPECfp2000]
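The two ways of quoting a misprediction rate are related by the branch frequency. A short sketch of the conversion, using the approximate percentages on this slide (the function name is illustrative):

```python
def mispredictions_per_1000_instructions(branch_fraction, per_branch_rate):
    # 1000 instructions contain 1000 * branch_fraction branches,
    # and per_branch_rate of those are mispredicted
    return 1000 * branch_fraction * per_branch_rate


print(mispredictions_per_1000_instructions(0.19, 0.06))  # SPECint: about 11.4
print(mispredictions_per_1000_instructions(0.05, 0.02))  # SPECfp: about 1.0
```

The per-instruction figure is the one that feeds directly into stall-cycle estimates, since it already accounts for how often branches occur.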
34
Branch Target Prediction
  • In addition to predicting the branch direction,
    we must also predict the branch target address
  • Branch PC indexes into a predictor table;
    indirect branches might be problematic
  • Most common indirect branch: return from a
    procedure; can be easily handled with a stack of
    return addresses

35
Summary
  • When comparing branch predictors, ensure that
    they are of the same size.
  • Correlating predictors predict branch direction
    based on behavior of neighboring branches
  • Tournament predictors select between global and
    local predictors
  • Integer benchmarks benefit greatly from global
    and correlating predictors
  • Next class: BTB, Dynamic Scheduling of
    Instructions.