1
A 256 Kbits L-TAGE branch predictor

André Seznec
IRISA/INRIA/HIPEAC

Directly derived from
"A case for (partially) tagged branch predictors", A. Seznec and P. Michaud, JILP, Feb. 2006
Tricks
Loop predictor
Kernel/user histories

3
TAGE: TAgged GEometric history length predictors
The genesis
4
Back around 2003

2bcgskew was state-of-the-art, but it was lagging behind neural-inspired predictors on a few benchmarks
Just wanted to get the best of both behaviors and maintain:
Reasonable implementation cost
Use only global history
Medium number of tables
In-time response

5
The basis: a multiple-length global history predictor
[Figure: tables T0 to T4, history lengths L(0) to L(4), and a combining stage marked "?"]
6
GEometric History Length predictor
The set of history lengths forms a geometric series
Capture correlation on very long histories
0, 2, 4, 8, 16, 32, 64, 128
But most of the storage is devoted to short histories!
What is important: L(i) - L(i-1) increases drastically with i
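A minimal sketch of how such a series could be generated. The rounding formula, the function name and the choice of 12 tables are assumptions made for illustration; only the minimum (4) and maximum (640) history lengths come from the CBP2 design slide later in this transcript.

```python
def geometric_history_lengths(min_len, max_len, n_tables):
    """Return n_tables history lengths forming a rounded geometric series.

    Assumed formula: L(i) = int(min_len * (max_len / min_len) ** (i / (n - 1)) + 0.5)
    """
    ratio = max_len / min_len
    return [
        int(min_len * ratio ** (i / (n_tables - 1)) + 0.5)
        for i in range(n_tables)
    ]


# Hypothetical configuration: 12 tables spanning history lengths 4 to 640.
lengths = geometric_history_lengths(4, 640, 12)

# The gap between consecutive lengths, L(i) - L(i-1), grows with i.
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
print(lengths)
print(gaps)
```

Most tables end up with short histories, while the last few gaps cover most of the distance to the maximum length.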
7
Combining multiple predictions?

Classical solution
Use of a meta-predictor
Wasting storage!?!
Choosing among 5 or 10 predictions??
Neural-inspired predictors, Jimenez and Lin 2001
Use an adder tree instead of a meta-predictor
Partial matching
Use tagged tables and the longest matching history
Chen et al. 96, Michaud 2005

8
CBP-1 (2004): O-GEHL. Final computation through a sum
[Figure: component tables, starting at history length L(0), feeding a sum; Prediction = Sign]
12 components: 3.670 misp/KI
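The "prediction = sign of a sum" idea can be sketched as follows. This is an illustration only: the function name is hypothetical, and treating a sum of zero as "taken" is an assumption, not something stated on the slide.

```python
def sum_predict(counters):
    """Combine one signed counter per component table through a sum.

    Returns (predicted_taken, total). Assumption: a non-negative total
    means "taken".
    """
    total = sum(counters)
    return total >= 0, total


# Four hypothetical counter values read from four tables.
print(sum_predict([3, -1, -4, 1]))
print(sum_predict([2, -1]))
```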
9
TAGE: geometric history length + PPM-like + optimized update policy
Tagless base predictor
11
Prediction computation

General case
Longest matching component provides the prediction
Special case
Many mispredictions on newly allocated entries (weak Ctr)
On many applications, Altpred is more accurate than Pred for these entries
Property dynamically monitored through a single 4-bit counter
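The selection rule above can be sketched as below. The names (`Hit`, `tage_predict`, `update_use_alt`), the signed counter encoding, the test for "newly allocated" (weak Ctr and U equal to zero) and the threshold of 8 on the 4-bit counter are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    ctr: int  # assumed signed 3-bit counter in [-4, 3]; >= 0 means taken
    u: int    # useful counter


def tage_predict(base_pred, hits, use_alt_on_na):
    """hits: matching tagged entries, ordered from shortest to longest history."""
    if not hits:
        return base_pred                      # only the base predictor answers
    provider = hits[-1]                       # longest matching component
    pred = provider.ctr >= 0
    altpred = hits[-2].ctr >= 0 if len(hits) > 1 else base_pred
    # Assumption: "newly allocated" means a weak counter and U == 0.
    newly_allocated = provider.ctr in (-1, 0) and provider.u == 0
    if newly_allocated and use_alt_on_na >= 8:
        return altpred
    return pred


def update_use_alt(counter, altpred, pred, taken):
    """Monitor (4-bit, 0..15) whether Altpred beats Pred on new entries.

    Meant to be called only when the provider entry is newly allocated.
    """
    if pred != altpred:
        if altpred == taken:
            return min(counter + 1, 15)
        return max(counter - 1, 0)
    return counter
```

For example, with a confident shorter-history hit and a freshly allocated longer-history hit, the result depends on the monitor counter: `tage_predict(False, [Hit(2, 1), Hit(-1, 0)], 8)` returns the alternate prediction, while the same call with a counter value of 7 returns the provider's own prediction.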

12
TAGE update policy

General principle
Minimize the footprint of the prediction
Just update the longest history matching component and allocate at most one entry on mispredictions

13
A tagged table entry

Ctr: 3-bit prediction counter
U: 2-bit useful counter
Was the entry recently useful?
Tag: partial tag
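As a sketch, the entry layout could be modelled like this. The class name, the signed encoding of Ctr and the helper methods are assumptions; the field widths are the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class TaggedEntry:
    tag: int = 0  # partial tag
    ctr: int = 0  # 3-bit prediction counter, modelled here as signed -4..3
    u: int = 0    # 2-bit useful counter, 0..3

    def predicts_taken(self):
        return self.ctr >= 0

    def is_weak(self):
        # The two values closest to the taken/not-taken boundary.
        return self.ctr in (-1, 0)

    def train(self, taken):
        """Saturating update of the prediction counter."""
        if taken:
            self.ctr = min(self.ctr + 1, 3)
        else:
            self.ctr = max(self.ctr - 1, -4)


def entry_bits(tag_width):
    """Storage per entry: 3 bits of Ctr + 2 bits of U + the partial tag."""
    return 3 + 2 + tag_width
```

With a 7-bit tag, `entry_bits(7)` gives 12 bits per entry; with a 15-bit tag it gives 20.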

14
Updating the U counter

If (Altpred ≠ Pred) then
Pred = taken: U = U + 1
Pred ≠ taken: U = U - 1

Graceful aging
Periodic shift of all U counters, implemented through the reset of a single bit
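A small sketch of both rules. The function names are hypothetical, and the aging step is one possible reading of "reset of a single bit": it clears either the high or the low bit of every 2-bit U counter, and letting the caller alternate between the two is an assumption.

```python
def update_useful(u, pred, altpred, taken):
    """Update a 2-bit useful counter (0..3) of the provider entry."""
    if pred != altpred:
        if pred == taken:
            return min(u + 1, 3)  # the entry made the difference and was right
        return max(u - 1, 0)      # the entry made the difference and was wrong
    return u                      # Altpred agreed: no information


def age_useful(counters, clear_msb):
    """Clear one bit of every U counter (assumed aging scheme)."""
    keep_mask = 0b01 if clear_msb else 0b10
    return [u & keep_mask for u in counters]


print(update_useful(1, pred=True, altpred=False, taken=True))
print(age_useful([3, 2, 1, 0], clear_msb=True))
```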

15
Allocating a new entry on a misprediction

Find a single useless entry with a longer history
Privilege the shortest possible history
To minimize footprint
But not too much
To avoid ping-pong phenomena
Initialize Ctr as weak and U as zero
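One way to sketch this allocation step. Everything beyond the bullets above is an assumption: the names, the 1/2 probability used to favor shorter histories without always picking the shortest, and the decision to decrement U on the longer-history candidates when none is free.

```python
import random
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int = 0
    ctr: int = 0
    u: int = 0


def allocate(candidates, provider, tag, taken, rng=random):
    """Try to allocate one entry after a misprediction.

    candidates[i] is the entry indexed in tagged table i for this branch
    (0 = shortest history). provider is the index of the table that gave
    the prediction, or -1 for the base predictor.
    Returns the index of the allocated table, or None.
    """
    longer = range(provider + 1, len(candidates))
    free = [i for i in longer if candidates[i].u == 0]
    if not free:
        # Assumed fallback: age the candidates so a later attempt can succeed.
        for i in longer:
            candidates[i].u -= 1
        return None
    chosen = free[-1]
    for i in free[:-1]:
        if rng.random() < 0.5:  # favor short histories, but not always
            chosen = i
            break
    entry = candidates[chosen]
    entry.tag = tag
    entry.ctr = 0 if taken else -1  # weak counter in the observed direction
    entry.u = 0
    return chosen
```

The random choice is what keeps two branches from repeatedly evicting each other from the same shortest-history slot.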

16
Improve the global history

Address + conditional branch history
Path confusion on short histories
Address + path
Direct hashing leads to path confusion
Represent all branches in branch history
Also use path history (1 bit per branch, limited to 16 bits)
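A sketch of keeping a direction history together with a short path history. The function name, which address bit feeds the path history, and the 640-bit cap on the direction history are assumptions; the 1 bit per branch and the 16-bit limit come from the slide.

```python
def update_histories(ghist, phist, pc, taken,
                     hist_bits=640, path_bits=16, pc_shift=0):
    """Shift one branch into the global and path histories.

    ghist: direction history, one bit per branch (conditional or not).
    phist: path history, one address bit per branch, limited to path_bits.
    pc_shift selects which address bit is recorded (hypothetical choice).
    """
    ghist = ((ghist << 1) | int(taken)) & ((1 << hist_bits) - 1)
    phist = ((phist << 1) | ((pc >> pc_shift) & 1)) & ((1 << path_bits) - 1)
    return ghist, phist


print(update_histories(0b10, 0b1, 0x402, True))
```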

17
Design tradeoff for CBP2 (1)

13 components
Bring the best accuracy on distributed traces
8 components: not very far!
History length
Min = 4, Max = 640
Could use any Min in [2, 6] and any Max in [300, 2000]

18
Design tradeoff for CBP2 (2)

Tag width tradeoff
(Destructive) false match is better tolerated on shorter history
7 bits on T1 to 15 bits on T12
Tuning the number of table entries
Smaller number for very long histories
Smaller number for very short histories

19
Adding a loop predictor

The loop predictor captures the number of iterations of a loop
When the same number of iterations has been encountered 4 times in a row, the loop predictor provides the prediction
Advantages
Very reliable
Small storage budget: 256 52-bit entries
Complexity?
Might be difficult to manage speculative iteration numbers on deep pipelines
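The behavior described above can be sketched with a single entry. The class and field names are hypothetical, the loop branch is assumed to be taken on every iteration except the exit, and the sketch ignores tags, entry replacement and speculative state.

```python
class LoopEntry:
    CONFIDENT = 4  # same iteration count seen 4 times in a row

    def __init__(self):
        self.trip = 0   # taken outcomes observed before the last loop exit
        self.count = 0  # taken outcomes seen in the current loop execution
        self.conf = 0   # consecutive executions with the same count

    def predict(self):
        """Return True/False when confident, None otherwise."""
        if self.conf < self.CONFIDENT:
            return None
        return self.count < self.trip

    def update(self, taken):
        if taken:
            self.count += 1
            if self.count > self.trip:
                self.conf = 0  # loop ran longer than learned: lose confidence
            return
        # Loop exit: compare this execution with the learned iteration count.
        if self.count == self.trip:
            self.conf = min(self.conf + 1, self.CONFIDENT)
        else:
            self.trip = self.count
            self.conf = 1
        self.count = 0
```

Feeding it a loop that is taken 5 times and then exits, the entry stays silent for the first four executions and predicts every outcome of the fifth, including the exit.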

20
Using a kernel history and a user history

Traces mix user and kernel activities
Kernel activity after exception
Global history pollution
Solution: use two separate global histories
User history is updated only in user mode
Kernel history is updated in both modes
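A sketch of the two-history scheme. The class name and the 640-bit width are assumptions, and so is the rule that kernel-mode predictions read the kernel history while user-mode predictions read the user history; the update rules are the two stated above.

```python
class SplitHistory:
    def __init__(self, bits=640):
        self.mask = (1 << bits) - 1
        self.user = 0
        self.kernel = 0

    def update(self, taken, kernel_mode):
        bit = int(taken)
        # Kernel history is updated in both modes.
        self.kernel = ((self.kernel << 1) | bit) & self.mask
        # User history is updated only in user mode.
        if not kernel_mode:
            self.user = ((self.user << 1) | bit) & self.mask

    def current(self, kernel_mode):
        """History used to index the predictor (assumed selection rule)."""
        return self.kernel if kernel_mode else self.user
```

After a burst of kernel branches, the user history is unchanged, so user-mode correlation survives the exception.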

21
L-TAGE submission accuracy (distributed traces)
3.314 misp/KI
22
Reducing L-TAGE complexity

Included 241.5 Kbits TAGE predictor
3.368 misp/KI
Loop predictor beneficial only on gzip
Might not be worth the extra complexity
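As a quick check of the storage figures quoted in these slides, the arithmetic below adds the loop predictor (256 entries of 52 bits) to the 241.5 Kbits TAGE predictor. It assumes 1 Kbit = 1024 bits and does not account for any other state.

```python
LOOP_ENTRIES = 256
LOOP_ENTRY_BITS = 52
TAGE_KBITS = 241.5
BITS_PER_KBIT = 1024  # assumption: binary Kbits

loop_kbits = LOOP_ENTRIES * LOOP_ENTRY_BITS / BITS_PER_KBIT
total_kbits = TAGE_KBITS + loop_kbits

print(loop_kbits)   # storage of the loop predictor alone
print(total_kbits)  # TAGE + loop predictor, to compare with the 256 Kbits budget
```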

23
Using fewer tables

8 components 256 Kbits TAGE predictor
3.446 misp/KI

24
TAGE prediction computation time ?

3 successive steps
Index computation
Table read
Partial match multiplexor
Does not fit in a single cycle
But can be ahead pipelined!

25
Ahead pipelining a global history branch predictor (principle)

Initiate branch prediction X+1 cycles in advance to provide the prediction in time
Use information available
X-block ahead instruction address
X-block ahead history
To ensure accuracy
Use intermediate path information

26
Practice
[Figure labels: A, C, B, bc, Ha, A]
Ahead pipelined TAGE: 4 parallel prediction computations
27
3-branch ahead pipelined 8-component 256 Kbits TAGE
3.552 misp/KI
28
A final case for the Geometric History Length predictors