1
A 256 Kbits L-TAGE branch predictor
  • André Seznec
  • IRISA/INRIA/HIPEAC

2
  • Directly derived from
  • A case for (partially) tagged branch
    predictors,
  • A. Seznec and P. Michaud, JILP, Feb. 2006
  • Tricks
  • Loop predictor
  • Kernel/user histories

3
TAGE: TAgged GEometric history length predictors
The genesis
4
Back around 2003
  • 2bcgskew was state-of-the-art, but it was
    lagging behind neural-inspired predictors
    on a few benchmarks
  • Goal: get the best of both behaviors while
    maintaining
  • Reasonable implementation cost
  • Use only global history
  • Medium number of tables
  • In-time response

5
The basis: a multiple-length global history
predictor
[Figure: tables T0 to T4, indexed with history lengths L(0) to L(4); the way their outputs are combined is marked "?"]
6
GEometric History Length predictor
  • The set of history lengths forms a geometric
    series: 0, 2, 4, 8, 16, 32, 64, 128
  • Captures correlation on very long histories
  • Most of the storage is devoted to short
    histories!
  • What matters: L(i) - L(i-1) increases
    drastically with i
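
The geometric series above can be sketched in a few lines. This is a minimal illustration, not the author's code: the function name and parameters are hypothetical, and the rounding rule `int(x + 0.5)` is an assumption. It reproduces the series quoted on the slide and shows how the gap L(i) - L(i-1) grows.

```python
def geometric_history_lengths(l_min, l_max, n_tables):
    """Return [L(0), L(1), ..., L(n_tables)], with L(0) = 0 for the base table.

    L(1) = l_min, L(n_tables) = l_max, and the lengths in between follow a
    geometric series. Names and rounding are assumptions for illustration.
    """
    if n_tables < 2:
        raise ValueError("need at least two history-indexed tables")
    alpha = (l_max / l_min) ** (1.0 / (n_tables - 1))  # common ratio
    lengths = [0]
    for i in range(1, n_tables + 1):
        lengths.append(int(l_min * alpha ** (i - 1) + 0.5))
    return lengths


lengths = geometric_history_lengths(l_min=2, l_max=128, n_tables=7)
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
print(lengths)  # the series from the slide
print(gaps)     # L(i) - L(i-1) for each i
```

With a ratio of 2, half of the history-indexed tables use 8 bits of history or fewer, which is the "most of the storage for short history" point.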
7
Combining multiple predictions ?
  • Classical solution
  • Use of a meta predictor
  • wasting storage !?!
  • choosing among 5 or 10 predictions ??
  • Neural inspired predictors, Jimenez and Lin 2001
  • Use an adder tree instead of a meta-predictor
  • Partial matching
  • Use tagged tables and the longest matching
    history
  • Chen et al 96, Michaud 2005

8
CBP-1 (2004): OGEHL, final computation through a
sum
[Figure: counters read from the tables (history lengths starting at L(0)) are summed; Prediction = Sign of the sum]
12 components: 3.670 misp/KI
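
As a rough illustration of "final computation through a sum", the sketch below adds the signed counters read from each table and takes the sign. It is a simplification under stated assumptions: the function name is hypothetical, the tie at zero is resolved as taken, and any constant offset or update threshold used by the real predictor is left out.

```python
def gehl_predict(counters):
    """Predict from the signed counters read out of each table.

    Assumption: a non-negative sum means "taken". The magnitude of the sum
    can serve as a rough confidence estimate.
    """
    total = sum(counters)
    return total >= 0, total


taken, total = gehl_predict([3, -1, -2, 1, 0])
print(taken, total)
```

No meta-predictor storage is needed: every table contributes to every prediction through the adder tree.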
9
TAGE: Geometric history length + PPM-like +
optimized update policy
[Figure: TAGE predictor organization, with a tagless base predictor]
11
Prediction computation
  • General case
  • Longest matching component provides the
    prediction
  • Special case
  • Many mispredictions on newly allocated entries
    (weak Ctr)
  • On many applications, Altpred is more accurate
    than Pred
  • Property dynamically monitored through a single
    4-bit counter
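
The selection rule can be sketched as follows. Several details are assumptions made for illustration and are not stated on the slide: the names `Hit`, `tage_predict` and `use_alt_on_na`, the test "weak Ctr and U equal to 0" for a newly allocated entry, and the threshold of 8 on the unsigned 4-bit monitoring counter.

```python
from dataclasses import dataclass

WEAK_VALUES = (3, 4)  # 3-bit counter 0..7: 3 = weakly not-taken, 4 = weakly taken


@dataclass
class Hit:
    table: int  # component number; larger means longer history
    ctr: int    # 3-bit prediction counter, taken when ctr >= 4
    u: int      # 2-bit useful counter


def tage_predict(hits, base_taken, use_alt_on_na):
    """Return (final_prediction, altpred).

    hits: tag-matching entries, in any order.
    base_taken: prediction of the tagless base predictor.
    use_alt_on_na: 4-bit counter (0..15); high values mean Altpred has
    recently been better than Pred on newly allocated entries (assumption).
    """
    ordered = sorted(hits, key=lambda h: h.table, reverse=True)
    if not ordered:
        return base_taken, base_taken
    provider = ordered[0]  # longest matching component
    pred = provider.ctr >= 4
    altpred = ordered[1].ctr >= 4 if len(ordered) > 1 else base_taken
    newly_allocated = provider.ctr in WEAK_VALUES and provider.u == 0
    if newly_allocated and use_alt_on_na >= 8:
        return altpred, altpred
    return pred, altpred
```

For example, with a weak, not-yet-useful entry in table 5 and a confident entry in table 2, the result follows table 2 only when the monitoring counter is in its upper half.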

12
TAGE update policy
  • General principle
  • Minimize the footprint of the prediction.
  • Just update the longest history matching
    component and allocate at most one entry on
    mispredictions

13
A tagged table entry
  • Ctr: 3-bit prediction counter
  • U: 2-bit useful counter
  • Was the entry recently useful?
  • Tag: partial tag

14
Updating the U counter
  • If (Altpred ≠ Pred) then
  • Pred = taken: U = U + 1
  • Pred ≠ taken: U = U - 1
  • Graceful aging
  • Periodic shift of all U counters
  • implemented through the reset of a single bit
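
A minimal sketch of the U counter rule and of the aging step, assuming a saturating 2-bit counter. The function names are hypothetical, and the aging is modeled as a plain right shift; the single-bit hardware mechanism mentioned on the slide is not modeled.

```python
U_MAX = 3  # 2-bit useful counter


def update_useful(u, pred_taken, altpred_taken, outcome_taken):
    """Update U only when Pred and Altpred disagree."""
    if pred_taken == altpred_taken:
        return u  # the entry made no difference, leave U unchanged
    if pred_taken == outcome_taken:
        return min(u + 1, U_MAX)  # the entry was useful
    return max(u - 1, 0)          # the entry was harmful


def age_all(u_counters):
    """Graceful aging: halve every U counter instead of clearing it."""
    return [u >> 1 for u in u_counters]


print(update_useful(1, True, False, True))
print(age_all([3, 2, 1, 0]))
```

Halving rather than clearing keeps a strongly useful entry (U = 3) protected for one more aging period.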

15
Allocating a new entry on a misprediction
  • Find a single useless entry with a longer
    history
  • Favor the smallest possible history
  • To minimize footprint
  • But not too much
  • To avoid ping-pong phenomena
  • Initialize Ctr as weak and U as zero
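
One way to sketch this allocation policy is shown below. The details are assumptions, not taken from the slide: the function name, the representation of the tables as a list of U values, the probability of 1/2 used to favor shorter histories without always picking the shortest, and the decrement of U counters when no useless entry is found.

```python
import random

WEAK_TAKEN, WEAK_NOT_TAKEN = 4, 3  # weak states of a 3-bit counter


def allocate(tables_u, provider, taken, rng):
    """Try to allocate one entry after a misprediction.

    tables_u[j]: U counter of the candidate entry in tagged table j
                 (index 0 = shortest history).
    provider:    index of the table that provided the prediction, or -1.
    Returns (chosen_table_or_None, initial_ctr_or_None, updated_u_list).
    """
    new_u = list(tables_u)
    longer = range(provider + 1, len(new_u))
    candidates = [j for j in longer if new_u[j] == 0]
    if not candidates:
        # Assumption: make room for a later allocation by aging these entries.
        for j in longer:
            new_u[j] = max(new_u[j] - 1, 0)
        return None, None, new_u
    # Walk from the shortest history; take each candidate with probability
    # 1/2 and fall back to the last one. Shorter histories are favored,
    # "but not too much".
    chosen = candidates[-1]
    for j in candidates[:-1]:
        if rng.random() < 0.5:
            chosen = j
            break
    ctr = WEAK_TAKEN if taken else WEAK_NOT_TAKEN
    return chosen, ctr, new_u  # new_u[chosen] is already 0


print(allocate([2, 0, 3], provider=0, taken=True, rng=random.Random(0)))
```

The randomization is what avoids the ping-pong case where two branches repeatedly evict each other from the same shortest-history entry.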

16
Improve the global history
  • Address + conditional branch history
  • path confusion on short histories
  • Address + path
  • Direct hashing leads to path confusion
  • Represent all branches in branch history
  • Also use path history (1 bit per branch, limited
    to 16 bits)
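
The two histories can be sketched as shift registers updated on every branch. This is an illustration under assumptions: the function name is hypothetical, the path bit is taken from the lowest bit of the branch address, and the global history is left unbounded here, whereas hardware would keep only the longest history length in use.

```python
PATH_BITS = 16  # path history is limited to 16 bits


def update_histories(ghist, phist, pc, taken):
    """Shift one branch into the global and path histories.

    ghist, phist: ints used as shift registers, bit 0 = most recent branch.
    Called for every branch, conditional or not (an unconditional branch
    would be passed with taken=True).
    """
    ghist = (ghist << 1) | int(taken)
    phist = ((phist << 1) | (pc & 1)) & ((1 << PATH_BITS) - 1)
    return ghist, phist


g, p = 0, 0
g, p = update_histories(g, p, pc=0b101, taken=True)
g, p = update_histories(g, p, pc=0b100, taken=False)
print(bin(g), bin(p))
```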

17
Design tradeoff for CBP2 (1)
  • 13 components
  • Gives the best accuracy on the distributed traces
  • 8 components is not far behind!
  • History length
  • Min = 4, Max = 640
  • Could use any Min in [2, 6] and any Max in [300,
    2000]

18
Design tradeoff for CBP2 (2)
  • Tag width tradeoff
  • (destructive) false match is better tolerated on
    shorter history
  • 7 bits on T1 to 15 bits on T12
  • Tuning the number of table entries
  • Smaller number for very long histories
  • Smaller number for very short histories

19
Adding a loop predictor
  • The loop predictor captures the number of
    iterations of a loop
  • After encountering the same number of
    iterations 4 times in a row, the loop predictor
    provides the prediction.
  • Advantages
  • Very reliable
  • Small storage budget: 256 52-bit entries
  • Complexity ?
  • Might be difficult to manage speculative
    iteration numbers on deep pipelines
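
A single loop predictor entry might behave as in the sketch below. It is a simplified model under assumptions: the class and field names are hypothetical, the loop branch is assumed to be taken on every iteration except the last, and tags, entry replacement and speculative iteration counts are not modeled.

```python
from dataclasses import dataclass

CONF_THRESHOLD = 4  # same iteration count seen 4 times in a row


@dataclass
class LoopEntry:
    trip_count: int = 0    # iteration count of the last completed loop run
    current_iter: int = 0  # executions of the branch in the current run
    confidence: int = 0    # consecutive runs with the same trip_count

    def predict(self):
        """Return True/False when confident, else None (defer to TAGE)."""
        if self.confidence < CONF_THRESHOLD:
            return None
        return self.current_iter + 1 < self.trip_count

    def update(self, taken):
        if taken:
            self.current_iter += 1
            return
        count = self.current_iter + 1  # the loop exits on this execution
        if count == self.trip_count:
            self.confidence = min(self.confidence + 1, CONF_THRESHOLD)
        else:
            self.trip_count = count
            self.confidence = 1
        self.current_iter = 0


entry = LoopEntry()
for _ in range(4):                       # four runs of a 3-iteration loop
    for outcome in (True, True, False):
        entry.update(outcome)
print(entry.predict())                   # first iteration of the fifth run
```

After a run with a different iteration count, confidence drops back to 1 and the entry stops providing predictions until the count has repeated again.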

20
Using a kernel history and a user history
  • Traces mix user and kernel activities
  • Kernel activity after exception
  • Global history pollution
  • Solution: use two separate global histories
  • User history is updated only in user mode
  • Kernel history is updated in both modes
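
The update rule for the two histories can be sketched as follows. The class and method names are hypothetical, and selecting the kernel history for predictions made in kernel mode is an assumption that the slide does not state.

```python
class SplitHistory:
    """Two global histories, kept as ints (bit 0 = most recent branch)."""

    def __init__(self):
        self.user = 0
        self.kernel = 0

    def update(self, taken, in_kernel):
        bit = int(taken)
        self.kernel = (self.kernel << 1) | bit  # updated in both modes
        if not in_kernel:
            self.user = (self.user << 1) | bit  # updated in user mode only

    def current(self, in_kernel):
        # Assumption: each mode predicts with its own history.
        return self.kernel if in_kernel else self.user


h = SplitHistory()
h.update(True, in_kernel=False)
h.update(False, in_kernel=True)   # kernel activity after an exception
h.update(True, in_kernel=True)
h.update(True, in_kernel=False)   # back in user mode
print(bin(h.user), bin(h.kernel))
```

When execution returns to user mode, the user history contains no kernel branches, so the correlation built before the exception is still usable.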

21
L-TAGE submission accuracy (distributed traces)
3.314 misp/KI
22
Reducing L-TAGE complexity
  • Included 241.5 Kbits TAGE predictor
  • 3.368 misp/KI
  • Loop predictor beneficial only on gzip
  • Might not be worth the extra complexity

23
Using fewer tables
  • 8-component 256 Kbits TAGE predictor
  • 3.446 misp/KI

24
TAGE prediction computation time ?
  • 3 successive steps
  • Index computation
  • Table read
  • Partial match multiplexor
  • Does not fit in a single cycle
  • But can be ahead pipelined !
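
The three steps can be made concrete with a small software model. This is only a sketch: the hash functions (XOR-folding the history and mixing it with the address) and all names are hypothetical, the tables are modeled as dictionaries, and a loop stands in for what hardware would do in parallel across the tables.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedTable:
    hist_len: int  # L(i), in bits
    log_size: int  # the table has 2**log_size entries
    tag_bits: int  # partial tag width
    entries: dict = field(default_factory=dict)  # index -> (tag, ctr)


def fold(history, length, width):
    """XOR-fold the `length` most recent history bits down to `width` bits."""
    h = history & ((1 << length) - 1)
    mask = (1 << width) - 1
    folded = 0
    while h:
        folded ^= h & mask
        h >>= width
    return folded


def index_and_tag(table, pc, history):
    # Step 1: index (and tag) computation, with hypothetical hash functions.
    index_mask = (1 << table.log_size) - 1
    tag_mask = (1 << table.tag_bits) - 1
    index = (pc ^ fold(history, table.hist_len, table.log_size)) & index_mask
    tag = ((pc >> table.log_size)
           ^ fold(history, table.hist_len, table.tag_bits)) & tag_mask
    return index, tag


def predict(tables, pc, history, base_taken):
    """tables are ordered from shortest to longest history."""
    prediction = base_taken
    for table in tables:
        index, tag = index_and_tag(table, pc, history)
        entry = table.entries.get(index)           # Step 2: table read
        if entry is not None and entry[0] == tag:  # Step 3: partial match;
            prediction = entry[1] >= 4             # longer history overrides
    return prediction


tables = [TaggedTable(4, 4, 7), TaggedTable(16, 4, 9)]
pc, history = 0x4A3, 0b1011001110001111
for table, ctr in zip(tables, (6, 1)):  # short table: taken, long: not taken
    index, tag = index_and_tag(table, pc, history)
    table.entries[index] = (tag, ctr)
result = predict(tables, pc, history, base_taken=True)
print(result)
```

Each step depends on the previous one, which is why the whole lookup is hard to fit in one cycle.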

25
Ahead pipelining a global history branch
predictor (principle)
  • Initiate branch prediction X+1 cycles in advance
    to provide the prediction in time
  • Use information available
  • X-block ahead instruction address
  • X-block ahead history
  • To ensure accuracy
  • Use intermediate path information

26
Practice
[Figure: ahead pipelined TAGE, 4 parallel prediction computations; labels: A, B, C, bc, Ha]
27
3-branch ahead pipelined 8-component 256 Kbits
TAGE
3.552 misp/KI
28
A final case for the Geometric History Length
predictors
  • delivers state-of-the-art accuracy
  • uses only global information
  • Very long history: 300 bits !!
  • can be ahead pipelined
  • many effective design points
  • OGEHL or TAGE ?
  • Number of tables, history lengths

29
The End ?