Predicting Conditional Branches With Fusion-Based Hybrid Predictors


1
Predicting Conditional Branches With Fusion-Based
Hybrid Predictors
Gabriel H. Loh Yale University Dept. of Computer Science
Dana S. Henry Yale University Depts. of Elec. Eng. & Comp. Sci.
This research was funded by NSF Grant MIP-9702281
2
The Branch Prediction Problem
[Figure: pipeline stages from PC compute to branch resolution]

1 out of 5 instructions is a branch
A branch may require many cycles to resolve
Pentium 4 (P4) has a 20-cycle branch resolution pipeline
Future pipeline depths are likely to increase [Sprangle02]
Predict branches to keep the pipeline full

3
Bigger Predictors More Accurate
(but bigger predictors slower)

Larger predictors tend to yield more accurate predictions
Faster cycle times force smaller branch predictors
Overriding predictor couples a small, fast predictor with a large, multi-cycle predictor [Jiménez2000]
Performs close to an ideal large, fast predictor

4
Hybrid Predictors

Wide variety of branch prediction algorithms available
Hybrid combines more than one stand-alone or component predictor [McFarling93]

[Figure: component predictors P1 and P2 feed a meta-predictor, which selects the final prediction]
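The selection scheme on this slide can be sketched in a few lines. This is a minimal illustration, assuming a table of 2-bit chooser counters indexed by low branch-address bits; the class name, table size, and update rule are assumptions made for the example, not details given on the slide.

```python
class SelectionHybrid:
    """Meta-predictor that selects one of two component predictions."""

    def __init__(self, table_bits=10):
        self.mask = (1 << table_bits) - 1
        # 2-bit chooser counters: 0-1 favour P1, 2-3 favour P2.
        self.chooser = [1] * (1 << table_bits)

    def predict(self, pc, p1, p2):
        # The final prediction is always one of the component predictions.
        return p2 if self.chooser[pc & self.mask] >= 2 else p1

    def update(self, pc, p1, p2, outcome):
        i = pc & self.mask
        if p1 == p2:
            return  # nothing to learn when the components agree
        if p2 == outcome:
            self.chooser[i] = min(3, self.chooser[i] + 1)
        else:
            self.chooser[i] = max(0, self.chooser[i] - 1)
```

Note that `predict` can only ever return `p1` or `p2`, which is the limitation that prediction fusion is meant to remove.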
5
Multi-Hybrids
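[Figure: two multi-component organizations.
Multi-Hybrid [Evers96]: P1, P2, ..., Pn feed a priority encoder (Pr. Encoder) that produces the final prediction.
Quad-Hybrid [Evers00]: P1 and P2 feed M1, P3 and P4 feed M2, and M3 produces the final prediction.]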
6
Our Idea: Prediction Fusion
[Figure: prediction selection compared with prediction fusion over component predictors P1, P2, P3, ..., Pn]
7
Early Attempt from ML
[Figure: eight component predictors P1 to P8 with weighted vote totals 0.487 and 0.513]
P2, P6 and P7 say not-taken
P1, P3, P4, P5 and P8 say taken

Weighted Majority algorithm [LW94]
Better predictors get assigned larger weights
Make final prediction with larger sum
Predictor with largest weight not always correct
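A minimal sketch of weighted-majority voting over component predictions follows. It assumes the common multiplicative update (each wrong component's weight is scaled by a factor `beta`) and renormalized weights; `beta`, the tie-breaking rule, and all names are assumptions for illustration, not parameters taken from the slides.

```python
class WeightedMajority:
    """Fuse n component predictions by comparing weighted vote totals."""

    def __init__(self, n, beta=0.5):
        self.weights = [1.0 / n] * n
        self.beta = beta  # penalty factor for a wrong component (assumed)

    def predict(self, votes):
        taken = sum(w for w, v in zip(self.weights, votes) if v)
        not_taken = sum(w for w, v in zip(self.weights, votes) if not v)
        # Ties are broken toward taken (an arbitrary choice).
        return 1 if taken >= not_taken else 0

    def update(self, votes, outcome):
        # Shrink the weight of every component that was wrong.
        scaled = [w * self.beta if v != outcome else w
                  for w, v in zip(self.weights, votes)]
        total = sum(scaled)
        self.weights = [w / total for w in scaled]


# Votes from the slide: P2, P6 and P7 say not-taken, the rest say taken.
slide_votes = [1, 0, 1, 1, 1, 0, 0, 1]
```

With equal weights this reduces to a plain majority vote. After training, the weights diverge, but the final prediction still depends only on a weighted sum and cannot learn that a particular combination of votes is systematically wrong.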

8
Outline

COLT Predictor
Choosing parameters and components
Performance
Prediction distributions, component choice

9
COLT Organization
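[Figure: COLT organization. The branch address and branch history select a mapping table from the vector of mapping tables (VMT). The component predictions P1, P2, P3, ..., Pn (for example 1 0 1 0) index an entry in that mapping table, which gives the final prediction.]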
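The organization above can be sketched as follows. This is an illustrative model under stated assumptions, not the authors' implementation: the mapping table is chosen by concatenating low branch-address bits with recent global history, counters start just below the taken threshold, and every name is hypothetical.

```python
class COLT:
    """Sketch of a fusion predictor built from a vector of mapping tables."""

    def __init__(self, n_components, addr_bits, hist_bits, counter_bits=4):
        self.addr_bits = addr_bits
        self.hist_bits = hist_bits
        self.max_count = (1 << counter_bits) - 1
        self.threshold = 1 << (counter_bits - 1)
        n_tables = 1 << (addr_bits + hist_bits)
        # One saturating counter per combination of component predictions.
        self.vmt = [[self.threshold - 1] * (1 << n_components)
                    for _ in range(n_tables)]
        self.history = 0

    def _locate(self, pc, preds):
        addr = pc & ((1 << self.addr_bits) - 1)
        hist = self.history & ((1 << self.hist_bits) - 1)
        table = self.vmt[(addr << self.hist_bits) | hist]  # assumed indexing
        entry = 0
        for p in preds:  # component predictions form the entry index
            entry = (entry << 1) | (p & 1)
        return table, entry

    def predict(self, pc, preds):
        table, entry = self._locate(pc, preds)
        return 1 if table[entry] >= self.threshold else 0

    def update(self, pc, preds, outcome):
        table, entry = self._locate(pc, preds)
        if outcome:
            table[entry] = min(self.max_count, table[entry] + 1)
        else:
            table[entry] = max(0, table[entry] - 1)
        mask = (1 << self.hist_bits) - 1
        self.history = ((self.history << 1) | outcome) & mask
```

Because the counter is indexed by the whole vector of component predictions, the final prediction need not equal any single component's prediction.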
10
Pathological Example
P1, P2 and P3 all predict 0 (not-taken)
Actual outcome: 1 (taken)
11
Example (contd)
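Selection: P1, P2 and P3 all predict 0, so whichever component is selected the final prediction is 0
Outcome is always wrong
COLT: the component predictions 0 0 0 index a mapping table entry in the VMT
Can recognize and remember this pattern, so the final prediction is 1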
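The pathological case can be replayed with a short simulation. It is a sketch under simplifying assumptions: a single mapping table of 2-bit counters stands in for the full VMT, and the function name and counter initialization are invented for the example.

```python
def run_pathological(n_branches=10):
    """All three components say 0 while the branch is always taken."""
    preds = (0, 0, 0)
    outcome = 1
    table = [1] * 8  # 2-bit counters indexed by (P1, P2, P3), weakly not-taken
    selection_hits = 0
    fusion_hits = 0
    for _ in range(n_branches):
        # Selection: choosing P2 or P3 instead of P1 gives the same 0.
        selected = preds[0]
        selection_hits += int(selected == outcome)
        # Fusion: the vector of predictions indexes a trained counter.
        idx = (preds[0] << 2) | (preds[1] << 1) | preds[2]
        fused = 1 if table[idx] >= 2 else 0
        fusion_hits += int(fused == outcome)
        table[idx] = min(3, table[idx] + 1) if outcome else max(0, table[idx] - 1)
    return selection_hits, fusion_hits
```

In this run selection never predicts correctly, while the fused predictor mispredicts once and is correct for the remaining nine branches.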
12
COLT Lookup Delay
[Figure: timeline of a COLT lookup, showing the component predictors P1, P2, ..., Pn, their output bits, and the final prediction]
13
Design Choices

# of branch address bits
# of branch history bits
(these determine the number of mapping tables)
# of components
(determines the size of individual MTs)
Choice of components
gshare, PAs, gskewed, ...
History length, PHT size, ...
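How the parameters above translate into storage can be worked out with a small helper. It assumes the mapping table is selected by concatenating the address and history bits, so there are 2^(address bits + history bits) tables, each holding 2^(number of components) counters. The function name and the example parameters are hypothetical and are not meant to reproduce any configuration from the talk.

```python
def colt_storage(addr_bits, hist_bits, n_components, counter_bits):
    """Return (number of mapping tables, entries per table, total bits)."""
    n_tables = 2 ** (addr_bits + hist_bits)   # assumes concatenated index
    entries_per_table = 2 ** n_components     # one entry per prediction vector
    total_bits = n_tables * entries_per_table * counter_bits
    return n_tables, entries_per_table, total_bits


# Hypothetical example: 4 address bits, 6 history bits, 4 components,
# 4-bit counters.
tables, entries, bits = colt_storage(4, 6, 4, 4)
```

Each extra component doubles the size of every mapping table, which is one reason the number of components has to be chosen carefully.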
14
Predictor Components

Global History
gshare [McFarling93]
Bi-Mode [Lee97]
Enhanced gskewed [Michaud97]
YAGS [Eden98]
Local History
PAs [Yeh94]
pskewed [Evers96]
Other
2bC (bimodal) [Smith81]
Loop [Chang95]
alloyed Perceptron [Jiménez02]

History lengths optimized on test data sets
Total of 59 configurations; sizes vary up to 64KB
15
Huge Search Space

2^59 ways to choose components
? ways to choose COLT parameters
We use a genetic search

Gene format:

One bit per candidate component: bit k = 0 means don't include Pk; bit k = 1 means do include Pk
VMT size
History length
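The gene format can be illustrated with a small encoding sketch. Only the overall layout (one inclusion bit per candidate component, plus VMT size and history length) comes from the slide; the dictionary representation, the parameter ranges, and the mutation operator are assumptions made for the example and do not describe the search actually used.

```python
import random

N_CANDIDATES = 59  # candidate component configurations, as stated on the slides


def random_gene(rng):
    """Build a random gene; the parameter ranges here are hypothetical."""
    return {
        "include": [rng.randint(0, 1) for _ in range(N_CANDIDATES)],
        "vmt_size_log2": rng.randint(8, 15),
        "history_length": rng.randint(0, 16),
    }


def selected_components(gene):
    """Indices k of the components Pk whose inclusion bit is 1."""
    return [k for k, bit in enumerate(gene["include"]) if bit == 1]


def mutate(gene, rng, rate=0.02):
    """Flip each inclusion bit independently with probability `rate`."""
    child = dict(gene)
    child["include"] = [bit ^ int(rng.random() < rate)
                        for bit in gene["include"]]
    return child


# Number of possible component subsets: 2**59 = 576460752303423488
n_subsets = 2 ** N_CANDIDATES
```

Exhaustively evaluating that many subsets is infeasible, which motivates a heuristic search over genes.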
16
Methodology

SPEC2000 integer benchmarks
For tuning/optimization: 10M branches from the test inputs
For evaluation: 500M branches from the train inputs
Skipped first 100M branches
Compiled with cc -arch ev6 -O4 -fast -non_shared
SimpleScalar simulator
sim-safe for trace collection
MASE for ILP simulations

17
Genetic Search COLT Results
Name  Size (KB)  VMT size  Counter width  History length  Components
α     16         2048      4              8               alpct(34/10), gskewed(12), gshare(8)
β     32         8192      4              7               alpct(34/10), gshare(15), gshare(9), PAs(7)
γ     64         16384     4              10              alpct(40/14), gshare(16), YAGS(11), pskewed(6)
δ     128        16384     4              7               alpct(40/14), alpct(38/14), gshare(16), gskewed(13), YAGS(12), PAs(8)
η     256        32768     4              4               alpct(50/18), alpct(34/10), gshare(18), Bi-Mode(16), gskewed(15), PAs(8)
18
Overall Predictor Performance
19
Per-Benchmark Performance
20
ILP Performance

Simulated CPU
6-issue
20-cycle pipeline
Same functional units, latencies, caches as Intel P4/NetBurst microarchitecture

1-cycle 2bC
4-cycle OR alpct
4-cycle OR COLT
Ideal 1-cycle COLT
21
ILP Impact
22
COLT Parameter Sensitivity

Mapping table counter widths
Number of mapping tables
Number of history bits for VMT index

23
Counter Width
24
VMT Size
25
History Length
26
Explaining Choice of Components

Parameter sensitivity results show the GA performed well for the COLT parameters
Why did it choose the component predictors that it did?

27
Classifying COLT Predictions

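We examined the β (32KB) COLT config.
For each mapping table lookup, we examine the neighboring entries

[Figure: component predictions P1 P2 P3 P4 = 1 0 0 1 select entry 1001 (counter 1111, T). Neighboring entries shown: entry 0001 (counter 0010, NT) and entry 1101 (counter 1001, T).]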
28
Classifying Predictions (contd)
Components of the 32KB COLT: gshare(9), gshare(14), PAs(7), alpct(34/10)
Classes:

easy: all neighboring entries agree
short: only gshare(9) distinguishes
long: only gshare(14) distinguishes
local: only PAs(7) distinguishes
perceptron: only alpct(34/10) distinguishes
multi-length: mix of gshare(9), gshare(14) or alpct
mixed: both global and local components
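One way to read these classes is sketched below. It assumes that a component "distinguishes" when flipping only its prediction bit moves the lookup to a neighboring entry with the opposite direction, and that alpct counts on the global side for the multi-length class. Both are interpretations of the slide, and the function, the bit ordering, and the 4-bit threshold are hypothetical.

```python
COMPONENTS = ["gshare(9)", "gshare(14)", "PAs(7)", "alpct(34/10)"]
SINGLE_CLASS = {"gshare(9)": "short", "gshare(14)": "long",
                "PAs(7)": "local", "alpct(34/10)": "perceptron"}
GLOBAL_SIDE = {"gshare(9)", "gshare(14)", "alpct(34/10)"}  # assumed grouping


def classify(table, preds, threshold=8):
    """Classify one lookup in a 16-entry mapping table of 4-bit counters."""
    def direction(bits):
        idx = 0
        for b in bits:
            idx = (idx << 1) | b
        return table[idx] >= threshold

    here = direction(preds)
    distinguishing = set()
    for k, name in enumerate(COMPONENTS):
        neighbor = list(preds)
        neighbor[k] ^= 1  # entry that differs only in component k
        if direction(neighbor) != here:
            distinguishing.add(name)

    if not distinguishing:
        return "easy"
    if len(distinguishing) == 1:
        return SINGLE_CLASS[next(iter(distinguishing))]
    if distinguishing <= GLOBAL_SIDE:
        return "multi-length"
    return "mixed"
```

For example, a table whose direction depends only on the first index bit classifies every lookup as "short", since only gshare(9) changes the fused outcome.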

29
Prediction Classifications
30
Related Work/Issues

Alloyed history [Skadron00]
Variable path history length [Stark98]
Dynamic history length fitting [Juan98]
Interference reduction [lots]
COLT handles all of these cases
Doesn't support partial update policies

31
Open Research

Better individual components
Augment with SBI [Manne99], agree [Sprangle97]
Better fusion algorithms
Hybrid fusion/selection algorithms
Other domains (branch confidence prediction, value prediction, memory dependence prediction, instruction criticality prediction, ...)

32
Summary

Fusion is more powerful than selection
Combines multiple sources of information
Branch behavior is very varied
Need long, short, global and local histories, multiple simultaneous lengths and types of history
COLT is one possible fusion-based predictor
Combines multiple types of information
Current best purely dynamic predictor

33
Questions?
