Glitch Reduction for Altera Stratix II devices - PowerPoint PPT Presentation

About This Presentation

Title:

Glitch Reduction for Altera Stratix II devices

Description:

Glitch Reduction for Altera Stratix II devices Tomasz S. Czajkowski PhD Candidate University of Toronto Supervisor: Professor Stephen D. Brown Outline Motivation ... – PowerPoint PPT presentation

Slides: 52

Transcript and Presenter's Notes

Title: Glitch Reduction for Altera Stratix II devices

1
Glitch Reduction for Altera Stratix II devices

Tomasz S. Czajkowski
PhD Candidate
University of Toronto
Supervisor: Professor Stephen D. Brown

2
Outline

Motivation
Power Model
Glitch Reduction Algorithm
Results
Conclusion

3
Motivation

Glitches
Undesirable logic transitions that occur due to
delay imbalance in the logic circuit
Waste power and do not provide any useful
functionality
Can increase the average toggle rate of a net by
as much as a factor of 2
Glitches can be filtered out by strategically
inserting negative edge triggered FFs

4
Glitches in FPGAs

Due to unequal arrival time of signals at the
inputs of LUTs
Glitches can be propagated through LUTs

5
Reducing Glitches

Insert a negative edge triggered FF after a LUT
that produces or propagates glitches

(Figure labels: 4LUT, 4LUT, No glitches, clock)
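
The filtering effect can be illustrated with a small sampled-waveform sketch. It assumes that all glitches settle during the first half of the clock period and that the falling edge sits at mid-cycle; the function names and the sample waveform are hypothetical and only serve the illustration.

```python
def count_transitions(samples):
    """Count value changes between consecutive samples."""
    return sum(a != b for a, b in zip(samples, samples[1:]))


def negative_edge_ff(samples, samples_per_cycle, initial_state=0):
    """Hold the input value captured at the falling clock edge (mid-cycle)."""
    falling_edge = samples_per_cycle // 2
    held = initial_state  # assumed FF state before the first falling edge
    output = []
    for index, value in enumerate(samples):
        if index % samples_per_cycle == falling_edge:
            held = value  # capture on the falling edge
        output.append(held)
    return output


# Two clock cycles, 8 samples each; the LUT output glitches early in each cycle.
lut_output = [0, 1, 0, 1, 1, 1, 1, 1,
              1, 0, 1, 0, 0, 0, 0, 0]
ff_output = negative_edge_ff(lut_output, samples_per_cycle=8)

print(count_transitions(lut_output))  # 6
print(count_transitions(ff_output))   # 2
```

In this toy waveform the FF output changes at most once per cycle, so the logic downstream of the FF sees only the settled value.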
6
Alternatives

Gated D-latch
Implement a gated D-latch in a LUT
Input signal is transparent during the latter
half of the clock period
Gated LUT
Gate the output of a LUT with the clock input
using an AND or an OR gate
Similar effect as gated D-latch
Can generate glitches too
When implemented, a gated D-latch consumes 50% more
power than a FF and double that of a gated LUT
Neither alternative is very effective

7
Background on Dynamic Power

Average Net Dynamic Power Dissipation
Pavg is average power
V is supply voltage
fclock is the clock frequency
si is the average per cycle toggle rate of a net
Ci is the capacitance of a net
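
A minimal numeric sketch of the quantities listed above, assuming the commonly used form P_avg = 1/2 * V^2 * f_clock * sum(s_i * C_i). The 1/2 factor, the function name, and all numeric values are assumptions for illustration, not values from the presentation.

```python
def average_dynamic_power(v_supply, f_clock, nets):
    """Average net dynamic power in watts.

    nets: iterable of (toggle_rate_per_cycle, capacitance_farads) pairs.
    Assumes P_avg = 0.5 * V^2 * f_clock * sum(s_i * C_i).
    """
    return 0.5 * v_supply ** 2 * f_clock * sum(s * c for s, c in nets)


# Illustrative values only: 1.2 V supply, 200 MHz clock, two nets.
nets = [(0.5, 1e-12), (0.25, 2e-12)]
power = average_dynamic_power(1.2, 200e6, nets)

# Doubling every toggle rate (the worst case quoted for glitches) doubles the power.
glitchy_nets = [(2 * s, c) for s, c in nets]
glitchy_power = average_dynamic_power(1.2, 200e6, glitchy_nets)

print(power, glitchy_power)
```

Because power is linear in the toggle rate s_i, any reduction in glitches on a net translates directly into a proportional saving on that net.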

8
Power Model

Goal
To be able to compute the change in dynamic power
dissipation in the logic elements affected by a
negative edge triggered FF insertion
Power dissipated by a LUT and a FF
Toggle Rate of logic signals (si)
Net capacitance (Ci)

9
LUT Power

The LUT itself dissipates a non-trivial amount
of power when its inputs toggle
We look at how the power dissipated by a LUT
relates to the frequency of its output transitions

10
LUT Power Model
11
FF Power

How much power would it cost to insert a FF into
a circuit?
What about the power cost of alternatives to a
FFs?
Gated LUT
Gated D-latch

12
Clocked Element Power Comparison
13
Wire Properties
Name                                  Notation    Description
Static Probability                    Py          Probability that a wire assumes the logic value 1 in any given clock cycle.
Transition Probability                Pt(y)       The average number of state transitions, excluding glitches.
Low to High Transition Probability    Py1|y0      Probability that a wire will change state to logic value 1, given that it is at a logic value 0 at present.
High to Low Transition Probability    Py0|y1      Probability that a wire will change state to logic value 0, given that it is at a logic value 1 at present.
Transition Density                    D(y)        The average number of logic value transitions per cycle. Includes glitches.
Average Number of Glitches per cycle  D(y)-Pt(y)  The average number of useless transitions per clock cycle.
14
Examples of Wires
(Waveform figure: Clock, A, B, C, D)
Wire  Py   Pt(y)  Py1|y0  Py0|y1  D(y)  D(y)-Pt(y)
A     ½    1      1       1       1     0
B     ½    ½      0.4     0.4     ½     0
C     1/8  ¼      1/8     1       ¼     0
D     1/8  ¼      1/8     1       ½     ¼
15
Example 1
(Circuit figure; surviving labels: 0, 2, 1, 1)
Initial state x1x2   Final state x1'x2'   Transitions on y, Trans(x1x2, x1'x2')
00                   00                   0
00                   01                   0
00                   10                   0
00                   11                   1
01                   00                   0
01                   01                   0
01                   10                   2
01                   11                   1
10                   00                   0
10                   01                   0
10                   10                   0
10                   11                   1
11                   00                   1
11                   01                   1
11                   10                   1
11                   11                   0
Name  Py  Pt(y)  Py1|y0  Py0|y1  D(y)
x1    ½   ½      ½       ½       ½
x2    ½   ½      ½       ½       ½
16
Static Probability

Let y = f(x1, x2) = x1·x2
For independent inputs, P(y) = P(x1)·P(x2) = ½·½ = ¼

17
Probability of a specific Transition

Compute the probability of a specific transition
by using the static probability, 1→0 and 0→1
transition probability of each wire
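
As a rough sketch of this step, the probability of one input transition can be formed as a product over the wires, assuming the wires are statistically independent (no spatial correlation). The dictionary keys `p1`, `p01`, and `p10` are hypothetical names for the static probability and the 0→1 and 1→0 transition probabilities.

```python
from fractions import Fraction
from itertools import product


def transition_probability(initial, final, wires):
    """Probability that the inputs move from `initial` to `final` in one cycle.

    initial, final: tuples of 0/1 values, one per wire.
    wires: list of dicts with keys p1 (static), p01 (0->1), p10 (1->0).
    Assumes the wires are independent of each other.
    """
    prob = Fraction(1)
    for start, end, wire in zip(initial, final, wires):
        p_start = wire["p1"] if start == 1 else 1 - wire["p1"]
        if start == 0:
            p_move = wire["p01"] if end == 1 else 1 - wire["p01"]
        else:
            p_move = wire["p10"] if end == 0 else 1 - wire["p10"]
        prob *= p_start * p_move
    return prob


half = Fraction(1, 2)
x1 = {"p1": half, "p01": half, "p10": half}
x2 = {"p1": half, "p01": half, "p10": half}

# Probability of the glitch-producing transition x1x2 = 01 -> 10.
print(transition_probability((0, 1), (1, 0), [x1, x2]))  # 1/16
```

With the input properties of Example 1, every one of the 16 (initial, final) pairs has the same probability, 1/16.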

18
Transition Probability
(Transition table as in Example 1, slide 15)
19
Transition Density
(Transition table as in Example 1, slide 15)
20
0→1 Transition Probability
(Transition table as in Example 1, slide 15)
21
1→0 Transition Probability
(Transition table as in Example 1, slide 15)
22
Properties of wire y in Example 1
(Circuit figure; surviving labels: 0, 2, 1, 1)
Name  Py  Pt(y)  Py1|y0  Py0|y1  D(y)
y     ¼   3/8    ¼       ¾       ½
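
The values for wire y can be reproduced from the Example 1 transition table with a short sketch. It assumes y = x1 AND x2 and that all 16 (initial, final) input pairs are equally likely, which follows from the input properties (every probability is ½) if the inputs are independent. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Nonzero entries of the Example 1 table: (initial x1x2, final x1x2) -> transitions on y.
TRANS = {
    ("00", "11"): 1, ("01", "10"): 2, ("01", "11"): 1, ("10", "11"): 1,
    ("11", "00"): 1, ("11", "01"): 1, ("11", "10"): 1,
}


def f(state):
    """LUT function assumed for Example 1: y = x1 AND x2."""
    return int(state == "11")


states = ["00", "01", "10", "11"]
pairs = list(product(states, repeat=2))
p_pair = Fraction(1, 16)  # every (initial, final) pair is equally likely here

p_y = sum(Fraction(1, 4) * f(s) for s in states)               # static probability
p_t = sum(p_pair for a, b in pairs if f(a) != f(b))            # transitions, no glitches
density = sum(p_pair * TRANS.get(pair, 0) for pair in pairs)   # transitions incl. glitches
p_rise = sum(p_pair for a, b in pairs if (f(a), f(b)) == (0, 1)) / (1 - p_y)
p_fall = sum(p_pair for a, b in pairs if (f(a), f(b)) == (1, 0)) / p_y
glitches = density - p_t                                       # useless transitions per cycle

print(p_y, p_t, p_rise, p_fall, density, glitches)  # 1/4 3/8 1/4 3/4 1/2 1/8
```

The only glitch-producing entry is 01 → 10, which contributes two transitions with probability 1/16, giving 1/8 glitches per cycle.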
23
Example 2
Name  Py  Pt(y)  Py1|y0  Py0|y1  D(y)
x3    ½   ½      ½       ½       ½
y     ¼   3/8    ¼       ¾       ½
24
Computing Properties of wire z

Same computations as in Example 1.
Increase D(z) to account for glitches that occur
on wire y (Dglitch(z)). Do so only when x3
remains at constant 1 for the duration of the
clock cycle.

25
Minimum Pulse Width

When using the table to compute the number of
transitions on a wire given the initial and final
state of the LUT inputs, we can compute intermediate
transitions and their duration
Some intermediate pulses will be too short to
cause a full logic change at the logic output
This parameter depends on the target device used
We remove those pulses from computation
Any pulse with duration less than 0.25 ns is
removed
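
One simple way to apply such a threshold is sketched below. It assumes the output of a LUT within one cycle is described by the sorted times at which it changes value; whenever two consecutive changes are closer than the minimum pulse width, both edges are dropped. This is an illustrative filter, not necessarily the procedure used in the presentation.

```python
MIN_PULSE_NS = 0.25  # threshold quoted on the slide


def filter_short_pulses(change_times_ns, min_pulse_ns=MIN_PULSE_NS):
    """Remove pulses shorter than min_pulse_ns from a binary signal.

    change_times_ns: sorted times (ns) at which the signal toggles.
    Returns the toggle times that survive the filter.
    """
    kept = []
    for time in change_times_ns:
        if kept and time - kept[-1] < min_pulse_ns:
            kept.pop()  # pulse too short: drop its leading and trailing edge
        else:
            kept.append(time)
    return kept


# A 0.1 ns pulse is removed, leaving a single real transition at 2.0 ns.
print(len(filter_short_pulses([1.0, 1.1, 2.0])))  # 1
# A 0.5 ns pulse is wide enough to count as two transitions.
print(len(filter_short_pulses([1.0, 1.5])))       # 2
```

The number of surviving toggle times is then the transition count used for that entry of the table.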

26
Estimate Error
27
Particular Example: mux64_16bit
28
Particular Example: des_perf_opt
29
Particular Example: cf_fir_24_8_8
30
Particular Example: huffman
31
Net Capacitance

We need to be able to estimate net capacitance to
figure out the difference in dynamic power
dissipation due to a change in the transition
density of a net
Relate net capacitance (unavailable directly) to
net delay (available through timing report)
Distinguish between nets of different fanout

32
Fanout 1 Net Capacitance
33
Fanout 2 Net Capacitance
34
Fanout 3 Net Capacitance
35
Fanout 4 Net Capacitance
36
Higher Fanout Net Capacitance

In our benchmark set fewer than 5% of the nets
had fanout greater than 4
Clock net is excluded from calculation
Approximate capacitance of net with fanout n > 4
as
Not exact, but supports the fact that glitches on
nets with high fanout are bad
Average estimate error of 22%

37
Algorithm

Scan all nets in a logic circuit to determine if
negative edge FF insertion can be applied
Analyze the resulting set of nets to determine
the benefit of applying the optimization to each
net (determined by the cost function)
Apply the optimization to a net on which the most
power could be saved
Repeat until no beneficial choices are found
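
The loop described above can be sketched as a greedy selection. The sketch assumes the candidate nets have already been screened for legality, and it takes two hypothetical callbacks: `cost_of(net)`, returning the cost change for inserting a FF on that net (negative means beneficial), and `apply_insertion(net)`, which modifies the circuit.

```python
def reduce_glitches(candidate_nets, cost_of, apply_insertion):
    """Greedily insert negative edge triggered FFs while any insertion helps."""
    remaining = list(candidate_nets)
    inserted = []
    while remaining:
        # Costs are re-evaluated every pass because an insertion changes
        # the toggle rates of nets in its transitive fanout.
        best = min(remaining, key=cost_of)
        if cost_of(best) >= 0:
            break  # no beneficial choice is left
        apply_insertion(best)
        inserted.append(best)
        remaining.remove(best)
    return inserted


# Toy circuit: inserting on n1 also removes the glitches that made n2 attractive.
costs = {"n1": -5.0, "n2": -3.0, "n3": 1.0}


def toy_insert(net):
    if net == "n1":
        costs["n2"] = 0.5


result = reduce_glitches(["n1", "n2", "n3"], costs.get, toy_insert)
print(result)  # ['n1']
```

In the toy run only one FF is inserted, because the benefit on n2 disappears once its upstream glitches are filtered.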

38
Cost Function

Compute change in power (ΔP)
cost of adding a FF
- power saved on the modified net
- power saved on nets and LUTs in the transitive
fanout of the added FF
Compute the change in the minimum clock period
(ΔT)
Specify ΔT allowed (ΔTa)
where u(x) is the step function
Accept change when ΔC < 0
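
The expression for ΔC is not preserved in this transcript, so the sketch below is only one plausible way to combine the listed ingredients: ΔP is computed as described, and a large penalty is added through the step function whenever ΔT exceeds ΔTa. The penalty form, its value, and the function names are assumptions.

```python
def step(x):
    """Unit step function u(x): 1 for x > 0, otherwise 0."""
    return 1.0 if x > 0 else 0.0


def delta_power(ff_cost_mw, saved_on_net_mw, saved_in_fanout_mw):
    """Change in power: FF cost minus the savings on the net and its fanout."""
    return ff_cost_mw - saved_on_net_mw - saved_in_fanout_mw


def delta_cost(dp_mw, dt_ns, dt_allowed_ns, penalty_mw=1e9):
    """Assumed cost: power change plus a penalty if the period grows too much."""
    return dp_mw + penalty_mw * step(dt_ns - dt_allowed_ns)


dp = delta_power(ff_cost_mw=2.0, saved_on_net_mw=3.5, saved_in_fanout_mw=1.0)
print(dp)                          # -2.5
print(delta_cost(dp, 0.1, 0.2))    # -2.5, accepted because it is below 0
print(delta_cost(dp, 0.3, 0.2) < 0)  # False, rejected by the timing penalty
```

Any form with the same behaviour (negative only when power is saved and the timing budget is respected) would fit the acceptance rule ΔC < 0.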

39
Example
(Circuit figure: network of LUTs and FFs)
40
Example: Inserted FF
(Circuit figure: network of LUTs and FFs with one negative edge triggered FF inserted)
41
Example: Compute change in the number of glitches
(Circuit figure: network of LUTs and FFs with one negative edge triggered FF inserted)
42
Example: Compute change in the number of glitches
(Circuit figure: network of LUTs and FFs with one negative edge triggered FF inserted)
43
Example: Compute change in LUT power dissipation
(Circuit figure: network of LUTs and FFs with one negative edge triggered FF inserted)
44
Experimental Results

8 benchmark circuits taken from QUIP package
Synthesize, place, route and analyze timing of a
circuit using Quartus II 5.1
Apply algorithm to reduce glitches in a circuit
Aim to increase the minimum clock period by no
more than 5%
Perform timing analysis once the circuit has been
modified
Use ModelSIM-Altera 6.0c for simulation
Simulate a circuit both pre- and post-
modification using the same clock frequency
Use PowerPlay Power analyzer to estimate the
average dynamic power dissipation of each circuit

45
Experimental Results
Circuit name                              Simulation clock  Minimum clock period        Dynamic power dissipation
                                          frequency (MHz)   Initial  Final   Change     Initial  Final   Change
                                                            (ns)     (ns)    (%)        (mW)     (mW)    (%)
Barrel64                                  200               4.386    4.806   8.74       229.94   189.7   -17.50
mux64_16bit                               275               3.052    3.052   0          389.24   389.24  0.00
fip_cordic_rca                            125               7.551    7.851   3.82       43.28    39.49   -8.76
oc_des_perf_opt                           290               2.989    3.07    2.64       1058.8   796.7   -24.75
oc_video_compression_systems_huffman_enc  260               3.626    3.626   0          94.88    95.19   0.33
cf_fir_24_8_8                             170               5.375    5.71    5.87       290.41   292.9   0.84
aes128_fast                               140               6.251    6.569   4.84       879.24   870.6   -0.99
rsacypher                                 140               6.376    6.563   2.85       50.73    48.22   -4.95
Average                                                                      3.6                         -7.0
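
The averages in the last row can be checked directly from the two change columns. The sketch also recomputes the Barrel64 entries; the tabulated period change appears consistent with a change taken relative to the final period, and the power change with a change relative to the initial power. That reading is an inference from the numbers, not something stated in the presentation.

```python
def mean(values):
    return sum(values) / len(values)


def percent_change(initial, final, reference):
    """Change from initial to final, as a percentage of `reference`."""
    return (final - initial) / reference * 100.0


# "Change (%)" columns copied from the results table, in row order.
period_change_pct = [8.74, 0, 3.82, 2.64, 0, 5.87, 4.84, 2.85]
power_change_pct = [-17.50, 0.00, -8.76, -24.75, 0.33, 0.84, -0.99, -4.95]

avg_period = round(mean(period_change_pct), 1)  # 3.6
avg_power = round(mean(power_change_pct), 1)    # -7.0

# Barrel64 row: period relative to the final value, power relative to the initial value.
barrel_period = round(percent_change(4.386, 4.806, reference=4.806), 2)  # 8.74
barrel_power = round(percent_change(229.94, 189.7, reference=229.94), 2)  # -17.5

print(avg_period, avg_power, barrel_period, barrel_power)
```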
46
Observations (1)

oc_des_perf_opt
Large number of XOR gates present
Removing glitches from one node removes a lot of
glitches on the nodes in its transitive fanout
(up to the next FF)
mux64_16bit
The cost function determined that no net was a
good candidate for optimization
Very few glitches were present in the circuit and
the power they dissipate was not large enough to
warrant the insertion of FFs

47
Observations (2)

cf_fir_24_8_8
Overestimated toggle rate caused the algorithm to
apply negative edge triggered FF insertion too
aggressively
Need to include spatial correlation in the toggle
rate model
aes128_fast
Toggle rate is 50% higher than in oc_des_perf_opt
Most nets use local LAB connections, causing
little power dissipation
Insertion of 173 FFs only achieved 1% power
reduction
Saved 35.14 mW in routing alone, because toggle
rate on all affected wires was reduced by 50-70%
Added 24.6 mW due to FF insertion
Added 1.86 mW to the power dissipated by the
clock network, because new LABs were connected to
the clock network
Net win of 8.68 mW
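
The power accounting for aes128_fast can be verified with exact decimal arithmetic. The figures are the ones quoted above, plus the initial power of 879.24 mW from the results table; variable names are illustrative.

```python
from decimal import Decimal

routing_saved_mw = Decimal("35.14")  # saved in routing
ff_added_mw = Decimal("24.6")        # added by the inserted FFs
clock_added_mw = Decimal("1.86")     # added to the clock network

net_saving_mw = routing_saved_mw - ff_added_mw - clock_added_mw
print(net_saving_mw)  # 8.68

# Expressed against the initial dynamic power of the circuit.
initial_power_mw = Decimal("879.24")
saving_pct = net_saving_mw / initial_power_mw * 100
print(round(saving_pct, 2))  # 0.99
```

Most of the routing saving is consumed by the added FFs and clock connections, which is why the net reduction is only about 1%.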

48
Conclusion

Negative edge triggered FF insertion can work
well to reduce glitches in a circuit
Unlike retiming, our approach only needs to
ensure that exactly one negative edge triggered
FF is on any given combinational path
Retiming may require the translation of more than
a single FF to be valid

49
Future Work

Better toggle rate prediction algorithm that
includes spatial correlation
Having FFs that can be negative edge triggered
without using an additional LAB clock line would
make the cost of this optimization lower
Silicon area cost vs. frequency of use trade-off

50
Acknowledgement

We'd like to express our gratitude to Altera for
funding this research
We'd like to thank Altera Toronto in particular
for dedicating some of their time to answer our
questions and provide insight throughout the
course of this work

51
Questions?
