Title: Glitch Reduction for Altera Stratix II devices
1Glitch Reduction for Altera Stratix II devices
- Tomasz S. Czajkowski
- PhD Candidate
- University of Toronto
- Supervisor: Professor Stephen D. Brown
2Outline
- Motivation
- Power Model
- Glitch Reduction Algorithm
- Results
- Conclusion
3Motivation
- Glitches
- Undesirable logic transitions that occur due to delay imbalance in the logic circuit
- Waste power and do not provide any useful functionality
- Can increase the average toggle rate of a net by as much as a factor of 2
- Glitches can be filtered out by strategically inserting negative edge triggered FFs
4Glitches in FPGAs
- Due to unequal arrival time of signals at the inputs of LUTs
- Glitches can be propagated through LUTs
5Reducing Glitches
- Insert a negative edge triggered FF after a LUT
that produces or propagates glitches
[Figure: two 4LUTs and a clock signal, with the label "No glitches"]
6Alternatives
- Gated D-latch
- Implement a gated D-latch in a LUT
- Input signal is transparent during the latter half of the clock period
- Gated LUT
- Gate the output of a LUT with the clock input using an AND or an OR gate
- Similar effect to a gated D-latch
- Can generate glitches too
- When implemented
- Gated D-latch consumes 50% more power than a FF and double that of a gated LUT
- Neither alternative is very effective
7Background on Dynamic Power
- Average Net Dynamic Power Dissipation
- Pavg is average power
- V is supply voltage
- fclock is the clock frequency
- si is the average per cycle toggle rate of a net
- Ci is the capacitance of a net
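The symbols above combine in the standard switching-power relation Pavg = 1/2 · V² · fclock · Σ si · Ci. A minimal Python sketch of that relation follows; the function name and the numeric values in the usage note are illustrative assumptions, not figures from this work.

```python
def average_dynamic_power(v_supply, f_clock_hz, nets):
    """Average net dynamic power: 0.5 * V^2 * f_clock * sum(s_i * C_i).

    nets is an iterable of (toggle_rate_per_cycle, capacitance_farads) pairs.
    """
    switched_cap = sum(s * c for s, c in nets)
    return 0.5 * v_supply ** 2 * f_clock_hz * switched_cap
```

For example, with hypothetical nets `[(0.5, 1e-12), (0.25, 2e-12)]`, a 1.2 V supply and a 200 MHz clock, the function returns about 1.44e-4 W. Halving every toggle rate halves the result, which is why removing glitches translates directly into power savings.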
8Power Model
- Goal
- To be able to compute the change in dynamic power dissipation in the logic elements affected by a negative edge triggered FF insertion
- Power dissipated by a LUT and a FF
- Toggle Rate of logic signals (si)
- Net capacitance (Ci)
9LUT Power
- The LUT itself dissipates a non-trivial amount of power when its inputs toggle
- We look at how the power dissipated by a LUT relates to the frequency of its output transitions
10LUT Power Model
11FF Power
- How much power would it cost to insert a FF into a circuit?
- What about the power cost of alternatives to a FF?
- Gated LUT
- Gated D-latch
12Clocked Element Power Comparison
13Wire Properties
Name                                  Notation      Description
Static Probability                    Py            Probability that a wire assumes the logic value 1 in any given clock cycle.
Transition Probability                Pt(y)         The average number of state transitions per cycle, excluding glitches.
Low to High Transition Probability    Py1|y0        Probability that a wire will change state to logic value 1, given that it is at logic value 0 at present.
High to Low Transition Probability    Py0|y1        Probability that a wire will change state to logic value 0, given that it is at logic value 1 at present.
Transition Density                    D(y)          The average number of logic value transitions per cycle. Includes glitches.
Average Number of Glitches per cycle  D(y) - Pt(y)  The average number of useless transitions per clock cycle.
14Examples of Wires
Py    Pt(y)  Py1|y0  Py0|y1  D(y)  D(y) - Pt(y)
½     1      1       1       1     0
½     ½      0.4     0.4     ½     0
1/8   ¼      1/8     1       ¼     0
1/8   ¼      1/8     1       ½     ¼
[Figure: waveforms of Clock and wires A, B, C and D]
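As an illustration of how these quantities relate, the following Python sketch estimates them from an observed trace. The trace format (one settled value and one edge count per clock cycle) and the function name are assumptions made for this sketch, not part of the original model.

```python
from fractions import Fraction

def wire_properties(initial_value, cycles):
    """Estimate per-cycle wire statistics from an observed trace.

    cycles is a list of (final_value, transitions) pairs, one per clock cycle:
    the settled logic value at the end of the cycle and the total number of
    edges (useful or not) seen during that cycle.
    """
    n = len(cycles)
    ones = useful = total = 0
    prev = initial_value
    for final_value, transitions in cycles:
        ones += final_value
        useful += int(final_value != prev)  # settled value changed: counts toward Pt(y)
        total += transitions                # every edge counts toward D(y)
        prev = final_value
    p_y = Fraction(ones, n)
    p_t = Fraction(useful, n)
    density = Fraction(total, n)
    return {"Py": p_y, "Pt": p_t, "D": density, "glitches": density - p_t}
```

A wire that toggles once every cycle, `wire_properties(0, [(1, 1), (0, 1), (1, 1), (0, 1)])`, gives Py = 1/2, Pt = 1, D = 1 and no glitches. A cycle recorded as `(0, 2)` after a settled 0 contributes two edges to D(y) but nothing to Pt(y), so it shows up entirely as glitches.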
15Example 1
[Figure: example circuit with inputs x1 and x2 and output y; figure labels: 0, 2, 1, 1]
Initial state x1x2  Final state x1x2  Transitions on y (Trans(x1x2,x1x2))
00                  00                0
00                  01                0
00                  10                0
00                  11                1
01                  00                0
01                  01                0
01                  10                2
01                  11                1
10                  00                0
10                  01                0
10                  10                0
10                  11                1
11                  00                1
11                  01                1
11                  10                1
11                  11                0
Name  Py  Pt(y)  Py1|y0  Py0|y1  D(y)
x1    ½   ½      ½       ½       ½
x2    ½   ½      ½       ½       ½
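The Trans column is consistent with y = x1 AND x2 when a change on x1 reaches the gate before a change on x2. The Python sketch below reproduces the column under exactly those two assumptions; the arrival times 0 and 2 are hypothetical values used only to order the events, and the function name is illustrative.

```python
from itertools import product

def count_output_transitions(initial, final, arrival=(0, 2), gate=lambda a, b: a & b):
    """Count transitions on y when the inputs move from `initial` to `final`.

    initial and final are (x1, x2) bit tuples. arrival[i] is the assumed time
    at which a change on input i reaches the gate; an input that does not
    change generates no event.
    """
    state = list(initial)
    y = gate(*state)
    count = 0
    # Apply the input changes in order of arrival and watch the output.
    events = sorted((arrival[i], i) for i in range(2) if initial[i] != final[i])
    for _, i in events:
        state[i] = final[i]
        new_y = gate(*state)
        if new_y != y:
            count += 1
            y = new_y
    return count

states = list(product((0, 1), repeat=2))
trans = {(a, b): count_output_transitions(a, b) for a in states for b in states}
```

The only entry with two transitions is 01 → 10: x1 rises first, y briefly becomes 1, then x2 falls and y returns to 0. The opposite case, 10 → 01, produces no transition because x1 has already fallen when x2 rises.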
16Static Probability
17Probability of a specific Transition
- Compute the probability of a specific transition by using the static probability and the 1→0 and 0→1 transition probabilities of each wire
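A minimal Python sketch of this step follows. It treats the LUT inputs as statistically independent, which is an assumption of the sketch; the function names are illustrative.

```python
from fractions import Fraction

def wire_transition_prob(p1, p01, p10, start, end):
    """P(wire is `start` now and `end` next cycle) from Py, P(0->1) and P(1->0)."""
    if start == 0:
        return (1 - p1) * (p01 if end == 1 else 1 - p01)
    return p1 * (p10 if end == 0 else 1 - p10)

def joint_transition_prob(wires, initial, final):
    """Probability of one (initial -> final) transition of all LUT inputs.

    wires is a list of (Py, P01, P10) tuples, one per input.
    ASSUMPTION: the inputs are independent, so per-wire terms multiply.
    """
    prob = Fraction(1)
    for (p1, p01, p10), start, end in zip(wires, initial, final):
        prob *= wire_transition_prob(p1, p01, p10, start, end)
    return prob
```

With two inputs whose properties are all 1/2, every one of the 16 (initial, final) pairs has probability 1/16, and the probabilities over all pairs sum to 1.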
18Transition Probability
19Transition Density
20 0→1 Transition Probability
21 1→0 Transition Probability
22Properties of wire y in Example 1
[Figure: example circuit with output y; figure labels: 0, 2, 1, 1]
Name  Py  Pt(y)  Py1|y0  Py0|y1  D(y)
y     ¼   3/8    ¼       ¾       ½
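These values can be checked directly from the Example 1 transition table. The Python sketch below does so under two assumptions: y = x1 AND x2 (consistent with the table), and independent inputs whose properties are all 1/2, so that each of the 16 (initial, final) input pairs has probability 1/16.

```python
from fractions import Fraction

# Trans(x1x2 -> x1x2) from the Example 1 table; pairs not listed have 0 transitions.
TRANS = {("00", "11"): 1, ("01", "10"): 2, ("01", "11"): 1, ("10", "11"): 1,
         ("11", "00"): 1, ("11", "01"): 1, ("11", "10"): 1}
STATES = ["00", "01", "10", "11"]
P_PAIR = Fraction(1, 16)  # every (initial, final) pair is equally likely here

def y_of(state):
    return int(state == "11")  # ASSUMPTION: y = x1 AND x2

pairs = [(a, b) for a in STATES for b in STATES]
p_y = sum(P_PAIR for a, b in pairs if y_of(b) == 1)           # static probability
p_t = sum(P_PAIR for a, b in pairs if y_of(a) != y_of(b))     # settled value changes
density = sum(P_PAIR * TRANS.get((a, b), 0) for a, b in pairs)  # all edges, glitches included
p_rise = sum(P_PAIR for a, b in pairs if (y_of(a), y_of(b)) == (0, 1)) / (1 - p_y)
p_fall = sum(P_PAIR for a, b in pairs if (y_of(a), y_of(b)) == (1, 0)) / p_y
glitches = density - p_t
```

The sketch yields Py = 1/4, Pt(y) = 3/8, Py1|y0 = 1/4, Py0|y1 = 3/4 and D(y) = 1/2, matching the table. The difference D(y) - Pt(y) = 1/8 comes entirely from the 01 → 10 row, which contributes two transitions without changing the settled value of y.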
23Example 2
Name  Py  Pt(y)  Py1|y0  Py0|y1  D(y)
x3    ½   ½      ½       ½       ½
y     ¼   3/8    ¼       ¾       ½
24Computing Properties of wire z
- Same computations as in Example 1.
- Increase D(z) to account for glitches that occur
on wire y (Dglitch(z)). Do so only when x3
remains at constant 1 for the duration of the
clock cycle.
25Minimum Pulse Width
- When using the table to compute the number of transitions on a wire given the initial and final state of the LUT inputs, we can compute intermediate transitions and their duration
- Some intermediate pulses will be too short to cause a full logic change at the logic output
- This parameter depends on the target device used
- We remove those pulses from computation
- Any pulse with duration less than 0.25 ns is removed
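One simple way to apply such a threshold is sketched below in Python. The 0.25 ns default comes from the slide; representing the output as a sorted list of edge times and cancelling edges pairwise is an assumed simplification for illustration, not necessarily the procedure used in this work.

```python
def filter_short_pulses(edge_times_ns, min_pulse_ns=0.25):
    """Drop pairs of edges that bound a pulse narrower than min_pulse_ns.

    edge_times_ns: sorted times (in ns) at which the output would toggle.
    Returns the surviving edges; the length of the result is the transition count.
    """
    kept = []
    for t in edge_times_ns:
        if kept and t - kept[-1] < min_pulse_ns:
            kept.pop()  # the two edges bound a too-short pulse: remove both
        else:
            kept.append(t)
    return kept
```

For instance, edges at 0.0 ns and 0.1 ns cancel completely, while edges at 0.0 ns and 0.5 ns both survive. Because edges are always removed in pairs, the parity of the count, and therefore the final settled value, is preserved.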
26Estimate Error
27Particular Example mux64_16bit
28Particular Example des_perf_opt
29Particular Example cf_fir_24_8_8
30Particular Example huffman
31Net Capacitance
- We need to be able to estimate net capacitance to figure out the difference in dynamic power dissipation due to a change in the transition density of a net
- Relate net capacitance (unavailable directly) to net delay (available through timing report)
- Distinguish between nets of different fanout
32Fanout 1 Net Capacitance
33Fanout 2 Net Capacitance
34Fanout 3 Net Capacitance
35Fanout 4 Net Capacitance
36Higher Fanout Net Capacitance
- In our benchmark set fewer than 5% of the nets had fanout greater than 4
- Clock net is excluded from calculation
- Approximate capacitance of net with fanout n > 4 as
- Not exact, but supports the fact that glitches on nets with high fanout are bad
- Average estimate error of 22%
37Algorithm
- Scan all nets in a logic circuit to determine if negative edge FF insertion can be applied
- Analyze the resulting set of nets to determine the benefit of applying the optimization to each net (determined by the cost function)
- Apply the optimization to the net on which the most power could be saved
- Repeat until no beneficial choices are found
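The loop described above can be sketched in Python as follows. The callbacks `cost_of` and `apply_insertion` are hypothetical stand-ins for the cost function and the netlist modification, and the toy numbers are made up purely to exercise the loop.

```python
def greedy_ff_insertion(candidate_nets, cost_of, apply_insertion):
    """Repeatedly insert a FF on the net with the lowest (most negative) cost."""
    remaining = list(candidate_nets)
    chosen = []
    while remaining:
        # Costs are re-evaluated on every pass because earlier insertions change them.
        best = min(remaining, key=cost_of)
        if cost_of(best) >= 0:
            break  # no beneficial choice left
        apply_insertion(best)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy model (made-up numbers): inserting on n1 also removes the glitches seen by n2,
# so n2 stops being a beneficial candidate afterwards.
costs = {"n1": -5.0, "n2": -3.0, "n3": 1.0}

def apply_insertion(net):
    if net == "n1":
        costs["n2"] = 0.5

picked = greedy_ff_insertion(costs, costs.get, apply_insertion)
```

In the toy model only n1 is picked. Re-evaluating the cost after each insertion matters because a FF filters glitches for everything in its transitive fanout, which changes the benefit of the remaining candidates.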
38Cost Function
- Compute change in power (ΔP)
- ΔP = cost of adding a FF − power saved on the modified net − power saved on nets and LUTs in the transitive fanout of the added FF
- Compute the change in the minimum clock period (ΔT)
- Specify ΔT allowed (ΔTa)
- The cost ΔC is computed from ΔP, ΔT and ΔTa, where u(x) is the step function
- Accept change when ΔC < 0
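The Python sketch below shows one plausible shape for such a cost function. The ΔP terms follow the list above, but the way ΔP is combined with the timing terms (a large penalty switched on by the step function once ΔT exceeds ΔTa) is an assumption of this sketch and not the equation from the slide; the `penalty` parameter and all function names are hypothetical.

```python
def step(x):
    """Unit step u(x): 1 when x > 0, otherwise 0 (the value at 0 is an assumption)."""
    return 1 if x > 0 else 0

def delta_power(ff_cost, saved_on_net, saved_in_fanout):
    """Change in power: cost of the added FF minus the power it saves."""
    return ff_cost - saved_on_net - saved_in_fanout

def delta_cost(d_power, d_period, d_period_allowed, penalty=1e9):
    """ASSUMED form: power change plus a large penalty once the timing budget is exceeded."""
    return d_power + penalty * step(d_period - d_period_allowed)

def accept(d_power, d_period, d_period_allowed):
    """Accept the insertion only when the cost is negative."""
    return delta_cost(d_power, d_period, d_period_allowed) < 0
```

Under this form, an insertion that saves power but pushes the clock period past the allowed change is rejected, and an insertion that stays within the timing budget is accepted only if it saves more power than the FF costs.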
39Example
[Figure: example circuit built from LUTs and FFs]
40Example Inserted FF
[Figure: the example circuit with a negative edge triggered FF (Neg FF) inserted]
41Example Compute change in the number of glitches
[Figure: the example circuit with a negative edge triggered FF (Neg FF) inserted]
42Example Compute change in the number of glitches
[Figure: the example circuit with a negative edge triggered FF (Neg FF) inserted]
43Example Compute change in LUT power dissipation
[Figure: the example circuit with a negative edge triggered FF (Neg FF) inserted]
44Experimental Results
- 8 benchmark circuits taken from QUIP package
- Synthesize, place, route and analyze timing of a circuit using Quartus II 5.1
- Apply algorithm to reduce glitches in a circuit
- Aim to increase the minimum clock period by no more than 5%
- Perform timing analysis once the circuit has been modified
- Use ModelSim-Altera 6.0c for simulation
- Simulate a circuit both pre- and post-modification using the same clock frequency
- Use PowerPlay Power Analyzer to estimate the average dynamic power dissipation of each circuit
45Experimental Results
                                          Simulation   Minimum Clock Period                  Dynamic Power Dissipation
Circuit name                              Clock (MHz)  Initial (ns)  Final (ns)  Change (%)  Initial (mW)  Final (mW)  Change (%)
Barrel64                                  200          4.386         4.806       8.74        229.94        189.7       -17.50
mux64_16bit                               275          3.052         3.052       0           389.24        389.24      0.00
fip_cordic_rca                            125          7.551         7.851       3.82        43.28         39.49       -8.76
oc_des_perf_opt                           290          2.989         3.07        2.64        1058.8        796.7       -24.75
oc_video_compression_systems_huffman_enc  260          3.626         3.626       0           94.88         95.19       0.33
cf_fir_24_8_8                             170          5.375         5.71        5.87        290.41        292.9       0.84
aes128_fast                               140          6.251         6.569       4.84        879.24        870.6       -0.99
rsacypher                                 140          6.376         6.563       2.85        50.73         48.22       -4.95
Average                                                                          3.6                                   -7.0
46Observations (1)
- oc_des_perf_opt
- Large number of XOR gates present
- Removing glitches from one node removes a lot of glitches on the nodes in its transitive fanout (up to the next FF)
- mux64_16bit
- The cost function determined that no net was a good candidate for optimization
- Very few glitches were present in the circuit and the power they dissipate was not large enough to warrant the insertion of FFs
47Observations (2)
- cf_fir_24_8_8
- Overestimated toggle rate caused the algorithm to apply negative edge triggered FF insertion too aggressively
- Need to include spatial correlation in the toggle rate model
- aes128_fast
- Toggle rate is 50% higher than in oc_des_perf_opt
- Most nets use local LAB connections, causing little power dissipation
- Insertion of 173 FFs only achieved a 1% power reduction
- Saved 35.14 mW in routing alone, because toggle rate on all affected wires was reduced by 50-70%
- Added 24.6 mW due to FF insertion
- Added 1.86 mW to the power dissipated by the clock network, because new LABs were connected to the clock network
- Net win of 8.68 mW
48Conclusion
- Negative edge triggered FF insertion can work well to reduce glitches in a circuit
- Unlike retiming, our approach only needs to ensure that exactly one negative edge triggered FF is on any given combinational path
- Retiming may require the translation of more than a single FF to be valid
49Future Work
- Better toggle rate prediction algorithm that includes spatial correlation
- Having FFs that can be negative edge triggered without using an additional LAB clock line would make the cost of this optimization lower
- Silicon area cost vs. frequency of use trade-off
50Acknowledgement
- We'd like to express our gratitude to Altera for funding this research
- We'd like to thank Altera Toronto in particular for dedicating some of their time to answer our questions and provide insight throughout the course of this work
51Questions?