1
Glitch Reduction for Altera Stratix II devices
  • Tomasz S. Czajkowski
  • PhD Candidate
  • University of Toronto
  • Supervisor: Professor Stephen D. Brown

2
Outline
  • Motivation
  • Power Model
  • Glitch Reduction Algorithm
  • Results
  • Conclusion

3
Motivation
  • Glitches
  • Undesirable logic transitions that occur due to
    delay imbalance in the logic circuit
  • Waste power and do not provide any useful
    functionality
  • Can increase the average toggle rate of a net by
    as much as a factor of 2
  • Glitches can be filtered out by strategically
    inserting negative edge triggered FFs

4
Glitches in FPGAs
  • Caused by unequal arrival times of signals at the
    inputs of LUTs
  • Glitches can be propagated through LUTs

5
Reducing Glitches
  • Insert a negative edge triggered FF after a LUT
    that produces or propagates glitches

[Figure: two 4LUTs with a negative edge triggered FF, driven by the clock, inserted between them; no glitches past the FF]
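To make the filtering effect concrete, here is a small Python simulation. It is a sketch under stated assumptions, not the authors' tool: the LUT is reduced to a 2-input AND, its inputs settle at hypothetical arrival times t1 < t2 after the rising clock edge, and the negative edge triggered FF samples the LUT output at period/2 (a 50% duty cycle is assumed). Every change on the LUT output counts as a toggle, while the FF output can change at most once per cycle.

```python
from typing import List, Tuple

State = Tuple[int, int]  # (x1, x2)


def and_events(prev: State, nxt: State, t1: float, t2: float) -> List[Tuple[float, int]]:
    """Output changes (time_ns, new_value) of y = x1 AND x2 within one cycle.

    x1 reaches its new value at t1 and x2 at t2 after the rising clock edge;
    both arrival times are hypothetical stand-ins for routing delay imbalance.
    """
    x = list(prev)
    y = x[0] & x[1]
    events = []
    for t, idx in sorted([(t1, 0), (t2, 1)]):
        x[idx] = nxt[idx]
        new_y = x[0] & x[1]
        if new_y != y:  # every output change costs power, glitch or not
            events.append((t, new_y))
            y = new_y
    return events


def count_toggles(inputs: List[State], t1: float, t2: float, period: float) -> Tuple[int, int]:
    """Return (toggles on the LUT output, toggles on the negative edge FF output)."""
    lut_toggles = ff_toggles = 0
    q = inputs[0][0] & inputs[0][1]  # FF starts out holding the settled LUT value
    for prev, nxt in zip(inputs, inputs[1:]):
        events = and_events(prev, nxt, t1, t2)
        lut_toggles += len(events)
        # The FF captures the LUT output at the falling clock edge, period / 2.
        sampled = prev[0] & prev[1]
        for t, v in events:
            if t < period / 2:
                sampled = v
        if sampled != q:
            ff_toggles += 1
            q = sampled
    return lut_toggles, ff_toggles


print(count_toggles([(0, 1), (1, 0), (0, 1), (1, 0)], t1=0.5, t2=1.0, period=5.0))  # (4, 0)
```

Here the LUT output carries two 0→1→0 glitches (four toggles) while the FF output never changes. If the LUT output has not settled by the falling edge (for example t2 = 3.0 ns with the same 5 ns period), the FF captures the glitch, so the logic feeding the inserted FF must settle within half a clock period.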
6
Alternatives
  • Gated D-latch
  • Implement a gated D-latch in a LUT
  • The latch is transparent (its input passes
    through) during the latter half of the clock
    period
  • Gated LUT
  • Gate the output of a LUT with the clock input
    using an AND or an OR gate
  • Similar effect to a gated D-latch
  • Can itself generate glitches
  • When implemented:
  • A gated D-latch consumes 50% more power than a
    FF and double the power of a gated LUT
  • Neither alternative is very effective

7
Background on Dynamic Power
  • Average net dynamic power dissipation:
    $P_{avg} = \frac{1}{2} V^2 f_{clock} \sum_i s_i C_i$
  • P_avg is the average power
  • V is the supply voltage
  • f_clock is the clock frequency
  • s_i is the average per-cycle toggle rate of net i
  • C_i is the capacitance of net i
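The formula can be evaluated directly. A minimal Python sketch; the supply voltage, clock frequency and net capacitance below are illustrative values, not figures from the slides. Because P_avg is linear in s_i, a net whose toggle rate is doubled by glitches (the factor-of-2 case from the Motivation slide) dissipates twice the dynamic power.

```python
def avg_dynamic_power(vdd: float, f_clock: float, nets) -> float:
    """P_avg = 1/2 * V^2 * f_clock * sum_i(s_i * C_i), in watts.

    nets: iterable of (s_i, C_i) pairs, with s_i in toggles per clock cycle
    and C_i in farads.
    """
    return 0.5 * vdd ** 2 * f_clock * sum(s * c for s, c in nets)


# Illustrative (hypothetical) values: one 20 fF net, 200 MHz clock, 1.2 V supply.
clean = avg_dynamic_power(1.2, 200e6, [(0.5, 20e-15)])    # toggles every other cycle
glitchy = avg_dynamic_power(1.2, 200e6, [(1.0, 20e-15)])  # glitches double s_i
print(clean, glitchy)
```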

8
Power Model
  • Goal
  • To be able to compute the change in dynamic power
    dissipation in the logic elements affected by a
    negative edge triggered FF insertion
  • Power dissipated by a LUT and a FF
  • Toggle Rate of logic signals (si)
  • Net capacitance (Ci)

9
LUT Power
  • The LUT itself dissipates a non-trivial amount
    of power when its inputs toggle
  • We look at how the power dissipated by a LUT
    relates to the frequency of its output transitions

10
LUT Power Model
11
FF Power
  • How much power would it cost to insert a FF into
    a circuit?
  • What is the power cost of the alternatives to a
    FF?
  • Gated LUT
  • Gated D-latch

12
Clocked Element Power Comparison
13
Wire Properties
  • Static probability, P_y: probability that a wire
    assumes the logic value 1 in any given clock cycle
  • Transition probability, P_t(y): average number of
    state transitions per clock cycle, excluding
    glitches
  • Low-to-high transition probability, P_y1|y0:
    probability that a wire changes to logic 1, given
    that it is currently at logic 0
  • High-to-low transition probability, P_y0|y1:
    probability that a wire changes to logic 0, given
    that it is currently at logic 1
  • Transition density, D(y): average number of logic
    transitions per clock cycle, including glitches
  • Average number of glitches per cycle,
    D(y) - P_t(y): average number of useless
    transitions per clock cycle
14
Examples of Wires
P_y   P_t(y)   P_y1|y0   P_y0|y1   D(y)   D(y) - P_t(y)
½     1        1         1         1      0
½     ½        0.4       0.4       ½      0
1/8   ¼        1/8       1         ¼      0
1/8   ¼        1/8       1         ½      ¼
[Figure: waveforms for Clock and wires A, B, C and D]
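The definitions above can be turned into estimates from a trace of a single wire. A hedged Python sketch, assuming the trace is periodic (the last cycle wraps back to the first) and that the total number of transitions in each cycle, glitches included, is known, for example from simulation. The second trace below, a wire that is 1 in one cycle out of eight and glitches once while at 0, reproduces the P_y, P_t(y), P_y0|y1, D(y) and D(y) - P_t(y) values in the last row of the table.

```python
from fractions import Fraction
from typing import List


def wire_properties(values: List[int], toggles: List[int]) -> dict:
    """Estimate the wire properties from a periodic per-cycle trace.

    values[k]  : settled logic value of the wire in clock cycle k
    toggles[k] : all transitions (useful ones plus glitches) seen while the
                 wire moves from values[k] to values[k + 1]; the trace is
                 assumed to repeat, so the last cycle wraps to cycle 0.
    """
    n = len(values)
    nxt = values[1:] + values[:1]
    ones = sum(values)
    zeros = n - ones
    rises = sum(1 for a, b in zip(values, nxt) if (a, b) == (0, 1))
    falls = sum(1 for a, b in zip(values, nxt) if (a, b) == (1, 0))
    p_t = Fraction(rises + falls, n)  # settled changes only
    d = Fraction(sum(toggles), n)     # every transition, glitches included
    return {
        "P_y": Fraction(ones, n),
        "P_t": p_t,
        "P_y1|y0": Fraction(rises, zeros) if zeros else None,
        "P_y0|y1": Fraction(falls, ones) if ones else None,
        "D": d,
        "glitches": d - p_t,
    }


toggling = wire_properties([0, 1], [1, 1])  # toggles once every cycle
sparse = wire_properties([0] * 7 + [1], [0, 2, 0, 0, 0, 0, 1, 1])
print(toggling)
print(sparse)
```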
15
Example 1
Initial state x1x2 | Final state x1'x2' | Transitions on y, Trans(x1x2, x1'x2')
00 | 00 | 0
00 | 01 | 0
00 | 10 | 0
00 | 11 | 1
01 | 00 | 0
01 | 01 | 0
01 | 10 | 2
01 | 11 | 1
10 | 00 | 0
10 | 01 | 0
10 | 10 | 0
10 | 11 | 1
11 | 00 | 1
11 | 01 | 1
11 | 10 | 1
11 | 11 | 0
Name   P_y   P_t(y)   P_y1|y0   P_y0|y1   D(y)
x1     ½     ½        ½         ½         ½
x2     ½     ½        ½         ½         ½
16
Static Probability
  • Let y = f(x1, x2) = x1 · x2 (a 2-input AND)
  • With independent inputs, P_y = P_x1 · P_x2
    = ½ · ½ = ¼

17
Probability of a specific Transition
  • Compute the probability of a specific transition
    from the static probability and the 1→0 and 0→1
    transition probabilities of each wire
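Assuming the input wires are statistically independent (the toggle rate model does not include spatial correlation), the probability of an input transition a → b factors into one term per input. For a single wire w with static probability P_w and transition probabilities P_w1|w0 and P_w0|w1:

```latex
\begin{aligned}
P(a \to b) &= \prod_{i} P_{x_i}(a_i \to b_i) \\
P_{w}(0 \to 0) &= (1 - P_w)\,(1 - P_{w1|w0}) \\
P_{w}(0 \to 1) &= (1 - P_w)\,P_{w1|w0} \\
P_{w}(1 \to 0) &= P_w\,P_{w0|w1} \\
P_{w}(1 \to 1) &= P_w\,(1 - P_{w0|w1})
\end{aligned}
```

For Example 1, every input has P_w = P_w1|w0 = P_w0|w1 = ½, so each single-wire factor is ¼ and each of the 16 rows of the transition table occurs with probability 1/16.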

18
Transition Probability
  • P_t(y) is the sum of the probabilities of the
    rows of the Example 1 table in which the settled
    value of y changes; the glitch-only row 01→10 is
    excluded. With each row at probability 1/16,
    P_t(y) = 6/16 = 3/8
19
Transition Density
  • D(y) weights every row of the Example 1 table by
    its number of transitions on y, glitches
    included: D(y) = (1 + 2 + 1 + 1 + 1 + 1 + 1)/16
    = ½
20
0→1 Transition Probability
  • P_y1|y0 is the probability of the rows in which y
    settles from 0 to 1, divided by the probability
    that y is 0: P_y1|y0 = (3/16)/(3/4) = ¼
21
1→0 Transition Probability
  • P_y0|y1 is the probability of the rows in which y
    settles from 1 to 0, divided by P_y:
    P_y0|y1 = (3/16)/(¼) = ¾
22
Properties of wire y in Example 1
Name   P_y   P_t(y)   P_y1|y0   P_y0|y1   D(y)
y      ¼     3/8      ¼         ¾         ½
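The whole Example 1 derivation can be checked mechanically. A minimal Python sketch, assuming independent inputs and using exact fractions; the helper names are hypothetical. It weights each row of the transition table by its probability: rows where the settled value of y changes count towards P_t(y), and the Trans column counts towards D(y), so the 01 → 10 glitch raises D(y) but not P_t(y).

```python
from fractions import Fraction
from itertools import product

# Trans(a, b) for y = x1 AND x2, copied from the Example 1 table
# (x1 arrives before x2, so 01 -> 10 produces a 0 -> 1 -> 0 glitch).
TRANS = {
    ((0, 0), (0, 0)): 0, ((0, 0), (0, 1)): 0, ((0, 0), (1, 0)): 0, ((0, 0), (1, 1)): 1,
    ((0, 1), (0, 0)): 0, ((0, 1), (0, 1)): 0, ((0, 1), (1, 0)): 2, ((0, 1), (1, 1)): 1,
    ((1, 0), (0, 0)): 0, ((1, 0), (0, 1)): 0, ((1, 0), (1, 0)): 0, ((1, 0), (1, 1)): 1,
    ((1, 1), (0, 0)): 1, ((1, 1), (0, 1)): 1, ((1, 1), (1, 0)): 1, ((1, 1), (1, 1)): 0,
}


def bit_prob(p, p01, p10, a, b):
    """P(wire goes a -> b) from P_w = p, P_w1|w0 = p01 and P_w0|w1 = p10."""
    if a == 0:
        return (1 - p) * (p01 if b == 1 else 1 - p01)
    return p * (p10 if b == 0 else 1 - p10)


def output_properties(f, trans, inputs):
    """Properties of y = f(x), assuming statistically independent inputs.

    inputs: one (P_w, P_w1|w0, P_w0|w1) tuple per input wire.
    """
    states = list(product((0, 1), repeat=len(inputs)))
    py = pt = d = rise = fall = Fraction(0)
    for a, b in product(states, states):
        p_ab = Fraction(1)
        for (p, p01, p10), ai, bi in zip(inputs, a, b):
            p_ab *= bit_prob(p, p01, p10, ai, bi)
        ya, yb = f(*a), f(*b)
        py += p_ab * ya                # summing over b marginalises to P(a)
        pt += p_ab * (ya != yb)        # settled changes only: glitches excluded
        rise += p_ab * (ya == 0 and yb == 1)
        fall += p_ab * (ya == 1 and yb == 0)
        d += p_ab * trans[(a, b)]      # every transition, glitches included
    return {"P_y": py, "P_t": pt, "P_y1|y0": rise / (1 - py),
            "P_y0|y1": fall / py, "D": d}


half = Fraction(1, 2)
y_props = output_properties(lambda x1, x2: x1 & x2, TRANS, [(half, half, half)] * 2)
print(y_props)
```

Running it yields P_y = 1/4, P_t(y) = 3/8, P_y1|y0 = 1/4, P_y0|y1 = 3/4 and D(y) = 1/2, matching the table above.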
23
Example 2
Name   P_y   P_t(y)   P_y1|y0   P_y0|y1   D(y)
x3     ½     ½        ½         ½         ½
y      ¼     3/8      ¼         ¾         ½
24
Computing Properties of wire z
  • Compute the properties of z as in Example 1
  • Then increase D(z) to account for glitches
    arriving on wire y (D_glitch(z)), but only when
    x3 remains at a constant 1 for the whole clock
    cycle
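The function computed by the second LUT is not given in this transcript. Assuming, for illustration only, that z = y · x3 (so z follows y, glitches included, exactly when x3 stays at 1) and that y and x3 are independent, the glitch term for the example values works out as follows, where P_x3 0|x3 1 denotes the high-to-low transition probability of x3 (the P_y0|y1 column of the table above):

```latex
\begin{aligned}
D_{glitch}(z) &= \bigl(D(y) - P_t(y)\bigr)\,P(x_3 \text{ stays at } 1) \\
P(x_3 \text{ stays at } 1) &= P_{x_3}\,\bigl(1 - P_{x_3 0|x_3 1}\bigr)
  = \tfrac{1}{2}\cdot\bigl(1 - \tfrac{1}{2}\bigr) = \tfrac{1}{4} \\
D_{glitch}(z) &= \bigl(\tfrac{1}{2} - \tfrac{3}{8}\bigr)\cdot\tfrac{1}{4} = \tfrac{1}{32}
\end{aligned}
```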

25
Minimum Pulse Width
  • When the table is used to compute the
    transitions on a wire from the initial and final
    states of the LUT inputs, the intermediate
    transitions and their durations can be computed
    as well
  • Some intermediate pulses are too short to cause a
    full logic transition at the LUT output
  • This threshold depends on the target device
  • We remove those pulses from the computation
  • Any pulse shorter than 0.25 ns is removed
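The pulse filter can be sketched as follows. The 0.25 ns threshold is from the slide; representing a wire's activity within one cycle as a time-ordered list of transitions, and cancelling both edges of any pulse narrower than the threshold, are assumptions about how the removal is carried out.

```python
from typing import List, Tuple

MIN_PULSE_NS = 0.25  # minimum pulse width quoted on the slide


def filter_short_pulses(events: List[Tuple[float, int]],
                        min_width: float = MIN_PULSE_NS) -> List[Tuple[float, int]]:
    """Drop pulses narrower than min_width from one cycle's transitions.

    events: time-ordered (t_ns, new_value) transitions of a single wire.
    A pulse is the interval between two consecutive kept transitions; when it
    is too short, both of its edges are removed, so the wire keeps its value.
    """
    kept: List[Tuple[float, int]] = []
    for t, v in events:
        if kept and t - kept[-1][0] < min_width:
            kept.pop()  # cancel the short pulse's leading edge and skip this one
        else:
            kept.append((t, v))
    return kept


print(filter_short_pulses([(1.0, 1), (1.1, 0), (2.0, 1)]))  # [(2.0, 1)]
```

Here a 0.1 ns glitch at 1.0 ns is discarded while the real transition at 2.0 ns survives.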

26
Estimate Error
27
Particular Example mux64_16bit
28
Particular Example des_perf_opt
29
Particular Example cf_fir_24_8_8
30
Particular Example huffman
31
Net Capacitance
  • Net capacitance must be estimated to compute the
    change in dynamic power dissipation caused by a
    change in the transition density of a net
  • Relate net capacitance (not directly available)
    to net delay (available from the timing report)
  • Distinguish between nets of different fanout
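The per-fanout plots on the following slides are not preserved in this transcript, so the sketch below makes two loud assumptions: capacitance is taken to be linear in net delay, and the calibration points are hypothetical. It fits one least-squares line per fanout value, mirroring the idea of treating each fanout separately, and refuses to estimate fanouts it has no model for (such as n > 4, which the slides handle with a separate approximation).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Model = Tuple[float, float]  # (slope in fF/ns, intercept in fF)


def fit_line(points: List[Tuple[float, float]]) -> Model:
    """Ordinary least-squares fit of capacitance = slope * delay + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def build_models(samples: Iterable[Tuple[int, float, float]]) -> Dict[int, Model]:
    """samples: hypothetical (fanout, delay_ns, capacitance_fF) calibration points."""
    by_fanout: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for fanout, delay, cap in samples:
        by_fanout[fanout].append((delay, cap))
    return {f: fit_line(pts) for f, pts in by_fanout.items()}


def estimate_capacitance(models: Dict[int, Model], fanout: int, delay_ns: float) -> float:
    """Capacitance estimate (fF) from a net's fanout and timing-report delay."""
    if fanout not in models:
        raise KeyError(f"no model fitted for fanout {fanout}")
    slope, intercept = models[fanout]
    return slope * delay_ns + intercept


models = build_models([(1, 0.1, 12.0), (1, 0.2, 22.0), (1, 0.3, 32.0)])
print(estimate_capacitance(models, 1, 0.4))
```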

32
Fanout 1 Net Capacitance
33
Fanout 2 Net Capacitance
34
Fanout 3 Net Capacitance
35
Fanout 4 Net Capacitance
36
Higher Fanout Net Capacitance
  • In our benchmark set, fewer than 5% of the nets
    had a fanout greater than 4
  • The clock net is excluded from the calculation
  • Approximate the capacitance of a net with fanout
    n > 4 as [equation not preserved]
  • Not exact, but captures the fact that glitches on
    high-fanout nets are costly
  • Average estimate error of 22%

37
Algorithm
  1. Scan all nets in a logic circuit to find those
    to which negative edge FF insertion can be applied
  2. Analyze the resulting set of nets to determine
    the benefit of applying the optimization to each
    net (as determined by the cost function)
  3. Apply the optimization to the net on which the
    most power can be saved
  4. Repeat until no beneficial choices remain
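The four steps map onto a greedy loop. A minimal Python sketch, where `delta_cost` is a hypothetical callback standing in for the cost function (negative means beneficial). Every remaining net is re-evaluated after each insertion, because an inserted FF changes the glitches seen in its transitive fanout.

```python
from typing import Callable, Iterable, List


def greedy_ff_insertion(candidates: Iterable[str],
                        delta_cost: Callable[[str, List[str]], float]) -> List[str]:
    """Greedy negative edge FF insertion following steps 1-4.

    candidates : nets on which FF insertion can be applied (step 1)
    delta_cost : hypothetical callback returning the cost change of modifying
                 `net` given the nets already modified (step 2)
    """
    inserted: List[str] = []
    remaining = sorted(set(candidates))  # sorted so ties break deterministically
    while remaining:
        # Re-evaluate every net: earlier insertions change downstream glitches.
        costs = {net: delta_cost(net, inserted) for net in remaining}
        best = min(remaining, key=costs.__getitem__)
        if costs[best] >= 0:  # step 4: stop when nothing is beneficial
            break
        inserted.append(best)  # step 3: apply the most beneficial change
        remaining.remove(best)
    return inserted


base = {"a": -3.0, "b": -5.0, "c": 1.0}
independent = greedy_ff_insertion(base, lambda net, done: base[net])
# Hypothetical interaction: once "b" is filtered, "a" no longer saves anything.
interacting = greedy_ff_insertion(
    base, lambda net, done: 0.0 if net == "a" and "b" in done else base[net])
print(independent, interacting)  # ['b', 'a'] ['b']
```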

38
Cost Function
  • Compute the change in power (ΔP) as
    ΔP = (cost of adding a FF)
    - (power saved on the modified net)
    - (power saved on nets and LUTs in the transitive
    fanout of the added FF)
  • Compute the change in the minimum clock period
    (ΔT)
  • Specify the allowed ΔT (ΔT_a)
  • Cost change ΔC: [equation not preserved], where
    u(x) is the unit step function
  • Accept the change when ΔC < 0
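The ΔC equation itself is not preserved in this transcript. The sketch below assumes one plausible form, ΔC = ΔP + M·u(ΔT - ΔT_a), in which a large hypothetical penalty M switches on through the step function whenever the period increase exceeds the allowance. It illustrates how a step function can gate an acceptance test; it is not the authors' formula.

```python
def step(x: float) -> float:
    """Unit step u(x): 1 for x > 0, otherwise 0 (the value at 0 is an assumption)."""
    return 1.0 if x > 0 else 0.0


def delta_power(ff_cost: float, saved_on_net: float, saved_in_fanout: float) -> float:
    """Delta P as listed on the slide (all values in mW); negative means savings."""
    return ff_cost - saved_on_net - saved_in_fanout


def delta_cost(dp: float, dt: float, dt_allowed: float, penalty: float = 1e6) -> float:
    """ASSUMED form of Delta C: add a large penalty when Delta T exceeds Delta T_a."""
    return dp + penalty * step(dt - dt_allowed)


def accept(dp: float, dt: float, dt_allowed: float) -> bool:
    """Accept an insertion only when the cost change is negative."""
    return delta_cost(dp, dt, dt_allowed) < 0


dp = delta_power(ff_cost=0.2, saved_on_net=0.5, saved_in_fanout=0.3)  # saves power
print(accept(dp, dt=0.05, dt_allowed=0.15))  # True: within the timing allowance (ns)
print(accept(dp, dt=0.25, dt_allowed=0.15))  # False: exceeds the allowance
```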

39
Example
[Figure: example circuit of LUTs and FFs]
40
Example: Inserted FF
[Figure: the example circuit with a negative edge triggered FF (Neg FF) inserted after one of the LUTs]
41
Example: Compute change in the number of glitches
42
Example: Compute change in the number of glitches
43
Example: Compute change in LUT power dissipation
44
Experimental Results
  • 8 benchmark circuits taken from the QUIP package
  • Synthesize, place, route and analyze the timing
    of each circuit using Quartus II 5.1
  • Apply the algorithm to reduce glitches in each
    circuit
  • Aim to increase the minimum clock period by no
    more than 5%
  • Perform timing analysis once the circuit has been
    modified
  • Use ModelSim-Altera 6.0c for simulation
  • Simulate each circuit both pre- and
    post-modification at the same clock frequency
  • Use the PowerPlay Power Analyzer to estimate the
    average dynamic power dissipation of each circuit

45
Experimental Results
Circuit name | Simulation clock frequency (MHz) | Min. clock period, initial (ns) | Min. clock period, final (ns) | Min. clock period, change (%) | Dynamic power, initial (mW) | Dynamic power, final (mW) | Dynamic power, change (%)
Barrel64 | 200 | 4.386 | 4.806 | 8.74 | 229.94 | 189.7 | -17.50
mux64_16bit | 275 | 3.052 | 3.052 | 0 | 389.24 | 389.24 | 0.00
fip_cordic_rca | 125 | 7.551 | 7.851 | 3.82 | 43.28 | 39.49 | -8.76
oc_des_perf_opt | 290 | 2.989 | 3.07 | 2.64 | 1058.8 | 796.7 | -24.75
oc_video_compression_systems_huffman_enc | 260 | 3.626 | 3.626 | 0 | 94.88 | 95.19 | 0.33
cf_fir_24_8_8 | 170 | 5.375 | 5.71 | 5.87 | 290.41 | 292.9 | 0.84
aes128_fast | 140 | 6.251 | 6.569 | 4.84 | 879.24 | 870.6 | -0.99
rsacypher | 140 | 6.376 | 6.563 | 2.85 | 50.73 | 48.22 | -4.95
Average | | | | 3.6 | | | -7.0
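The percentage columns and averages can be re-derived from the initial and final values. A short Python check; the helper names are hypothetical. The clock-period change column matches (T_final - T_initial)/T_final, which is the percentage drop in maximum clock frequency, while the power change is relative to the initial power.

```python
# (circuit, T_initial ns, T_final ns, P_initial mW, P_final mW), from the table
ROWS = [
    ("Barrel64", 4.386, 4.806, 229.94, 189.7),
    ("mux64_16bit", 3.052, 3.052, 389.24, 389.24),
    ("fip_cordic_rca", 7.551, 7.851, 43.28, 39.49),
    ("oc_des_perf_opt", 2.989, 3.07, 1058.8, 796.7),
    ("oc_video_compression_systems_huffman_enc", 3.626, 3.626, 94.88, 95.19),
    ("cf_fir_24_8_8", 5.375, 5.71, 290.41, 292.9),
    ("aes128_fast", 6.251, 6.569, 879.24, 870.6),
    ("rsacypher", 6.376, 6.563, 50.73, 48.22),
]


def fmax_drop_pct(t_init: float, t_final: float) -> float:
    """Percentage drop in maximum clock frequency: (T_final - T_init) / T_final."""
    return (t_final - t_init) / t_final * 100


def power_change_pct(p_init: float, p_final: float) -> float:
    """Percentage change in dynamic power relative to the initial value."""
    return (p_final - p_init) / p_init * 100


period_changes = [fmax_drop_pct(t0, t1) for _, t0, t1, _, _ in ROWS]
power_changes = [power_change_pct(p0, p1) for _, _, _, p0, p1 in ROWS]
avg_period = sum(period_changes) / len(ROWS)
avg_power = sum(power_changes) / len(ROWS)

for (name, *_), dt, dp in zip(ROWS, period_changes, power_changes):
    print(f"{name:42s} {dt:6.2f} {dp:7.2f}")
print(f"{'Average':42s} {avg_period:6.1f} {avg_power:7.1f}")
```

The recomputed averages round to 3.6 and -7.0. The recomputed power changes for cf_fir_24_8_8 (0.86) and aes128_fast (-0.98) differ from the table in the second decimal place.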
46
Observations (1)
  • oc_des_perf_opt
  • Large number of XOR gates present
  • Removing glitches from one node removes a lot of
    glitches on the nodes in its transitive fanout
    (up to the next FF)
  • mux64_16bit
  • The cost function determined that no net was a
    good candidate for optimization
  • Very few glitches were present in the circuit and
    the power they dissipate was not large enough to
    warrant the insertion of FFs

47
Observations (2)
  • cf_fir_24_8_8
  • Overestimated toggle rates caused the algorithm
    to apply negative edge triggered FF insertion too
    aggressively
  • Need to include spatial correlation in the toggle
    rate model
  • aes128_fast
  • Toggle rate is 50% higher than in oc_des_perf_opt
  • Most nets use local LAB connections, which
    dissipate little power
  • Insertion of 173 FFs achieved only a 1% power
    reduction
  • Saved 35.14 mW in routing alone, because the
    toggle rate on all affected wires was reduced by
    50-70%
  • Added 24.6 mW due to FF insertion
  • Added 1.86 mW to the power dissipated by the
    clock network, because new LABs were connected to
    the clock network
  • Net win of 8.68 mW

48
Conclusion
  • Negative edge triggered FF insertion can work
    well to reduce glitches in a circuit
  • Unlike retiming, our approach only needs to
    ensure that exactly one negative edge triggered
    FF is on any given combinational path
  • Retiming may require moving more than a single FF
    to remain valid

49
Future Work
  • Better toggle rate prediction algorithm that
    includes spatial correlation
  • Having FFs that can be negative edge triggered
    without using an additional LAB clock line would
    make the cost of this optimization lower
  • Silicon area cost vs. frequency of use trade-off

50
Acknowledgement
  • We'd like to express our gratitude to Altera for
    funding this research
  • We'd like to thank Altera Toronto in particular
    for dedicating time to answering our questions
    and providing insight throughout the course of
    this work

51
Questions?