Title: Investigating The Robust Design of Finite State Machines Using MVSIS: A Comparison of Error-Correction Schemes
1Investigating The Robust Design of Finite State
Machines Using MVSIS: A Comparison of
Error-Correction Schemes
Ruth Wang EE290N Project 5/20/2004 ruthwang_at_eecs
2Outline
- Motivation
- Current Solutions
- Project Proposal Using MVSIS
- Experimental Setup
- Results
- Conclusion
3Motivation
- VLSI designs for wireless applications are
trending toward ultra-low cost, power, and
energy dissipation (PicoRadio, Smart Dust, etc.)
- Energy dissipation: E ∝ Vdd^2
- Lower Vdd -> quadratic energy savings!
- BUT these energy savings come at a great cost
[Figure: 1000 Monte Carlo simulations of a 4-bit static CMOS adder in 130nm technology]
- As Vdd scales down, the effects of process and
operating variations worsen drastically, causing
gate delays to become not only longer but also
harder to predict!
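The quadratic relation on this slide can be written out as a short worked ratio. The halved supply voltage below is only an illustrative value, not a figure from the project, and the subscripts "nom" and "low" are labels introduced here for the nominal and lowered supply.

```latex
E \propto V_{dd}^{2}
\quad\Longrightarrow\quad
\frac{E_{\text{low}}}{E_{\text{nom}}}
  = \left(\frac{V_{dd,\text{low}}}{V_{dd,\text{nom}}}\right)^{2},
\qquad
\text{e.g. }
\frac{V_{dd,\text{low}}}{V_{dd,\text{nom}}} = \frac{1}{2}
\;\Rightarrow\;
\frac{E_{\text{low}}}{E_{\text{nom}}} = \frac{1}{4}
```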
4Implications For Circuit Design
- Generalized representation of a digital system
[Figure: generalized digital system with supply Vdd. Assume latches are error-free and errors only manifest in logic circuitry.]
- How is the clock period determined for nominal Vdd?
  - Use standard worst-case timing methodology
  - Clock period is set by the worst-case (critical) path delay
  - All delay paths (critical and non-critical) are guaranteed to finish evaluating within this clock period
- How is the clock period determined for lowered Vdd?
  - Standard worst-case timing methodology is NOT reasonable
  - The critical path delay is prohibitively long and hard to predict
  - Set the clock period at some point and accept the non-zero probability of logic evaluation errors
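The tradeoff between clock period and error probability can be sketched numerically. This is a minimal Monte Carlo illustration, not the simulation behind the slides: the lognormal delay model, its parameters, the number of paths, and the function name `timing_error_probability` are all assumptions chosen for illustration.

```python
import random

def timing_error_probability(clock_period, n_trials=10_000, n_paths=8,
                             mu=0.0, sigma=0.25, seed=0):
    """Estimate P(at least one path delay exceeds the clock period).

    Assumption: every path delay is an independent lognormal sample.
    Real delay distributions depend on the process and on Vdd.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_trials):
        # The slowest path in this trial decides whether logic finishes in time.
        worst = max(rng.lognormvariate(mu, sigma) for _ in range(n_paths))
        if worst > clock_period:
            errors += 1
    return errors / n_trials

if __name__ == "__main__":
    for period in (1.0, 1.5, 2.0, 3.0):
        print(period, timing_error_probability(period))
```

With a fixed seed the same delay samples are reused for every period, so the estimate can only fall as the clock period grows. A very long period drives it to zero, which corresponds to worst-case timing.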
5Problem Statement
- Errors propagate through design abstraction layers!
  - The errors caused by over-clocking occur at the transistor level but will affect higher-level abstraction layers of the system
  - These are static errors: once the chip is fabricated, each transistor has a fixed set of manufactured process parameters that determine its speed
  - This performance is fixed and will not change at run-time
[Figure: abstraction layers from transistors to logic gates (0/1 values) to the FSM]
- Question: Can we compensate for the erroneous state transitions using fault-tolerant design techniques?
- Consider the tradeoff: hardware overhead vs. energy savings
6Current Solutions (1)
- Triple Modular Redundancy [1] (for dynamic errors)
  - Make 3 exact copies of the same FSM and take a majority vote as the correct behavior
[Figure: input "in" feeds FSM copy 1, copy 2, and copy 3, whose outputs go to a majority voter]
- Disadvantages
  - 200% extra overhead due to 2 extra FSMs, plus majority voting hardware
  - No error detection capability
  - Not practical for static errors
  - Used mainly for extremely critical systems (e.g. space-borne electronics)
[1] S. Niranjan, J. Frenzel, "A Comparison of Fault-Tolerant State Machine Architectures for Space-Borne Electronics," IEEE Transactions on Reliability, March 1996.
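The majority vote on this slide can be sketched in a few lines. The FSM copies below are hypothetical stand-ins (simple functions producing a 4-bit output word), and the fault patterns are made up for illustration. The point is only that a voter masks an error in one copy but not the same error in two copies.

```python
def majority(a, b, c):
    """Bitwise majority vote of three equal-width output words."""
    return (a & b) | (b & c) | (a & c)

def tmr_output(copies, inputs):
    """Run three copies on the same inputs and vote on their outputs."""
    outs = [copy(inputs) for copy in copies]
    return majority(*outs)

# Hypothetical output function standing in for one error-free FSM copy.
def good(x):
    return x & 0b1111

# Hypothetical copy with one flipped output bit (an error in a single copy).
def upset(x):
    return good(x) ^ 0b0100

# Hypothetical copy with a different flipped bit, used below in two copies at once.
def slow_path(x):
    return good(x) ^ 0b0001

if __name__ == "__main__":
    print(bin(tmr_output([good, good, upset], 0b1010)))           # one bad copy is outvoted
    print(bin(tmr_output([good, slow_path, slow_path], 0b1010)))  # two bad copies win the vote
```

The second call illustrates the concern with errors that are likely in every copy: once two copies are wrong in the same way, the voter passes the wrong value through.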
7Current Solutions (2)
- Adaptive Body-Biasing [3] (for static errors)
  - A transistor-level compensation technique
  - Uses a transistor's body bias voltage as a control knob to adjust its speed
[Figure: measure transistor speed, calculate the appropriate body bias voltage to meet a certain delay requirement, apply the body bias voltage to the transistor]
- Disadvantages
  - Range of speed improvement is limited and values of available bias voltages must be discretized
  - Low-level approach with very fine granularity
  - Requires an exhaustive search of slow paths at the transistor level
[3] J. Tschanz, et al., "Adaptive Body Bias for Reducing Impacts of Die-to-Die and Within-Die Parameter Variations on Microprocessor Frequency and Leakage," JSSC, November 2002.
8Proposed Solution
- Use MVSIS unknown component solving capability to
explore possible error correction schemes
[Figure: block diagram with labels Spec, F (Faulty FSM), Soln X (Error Detector/Compensator), inputs (i), outputs (o), ctrl signals (v), and (u)]
- Add an error detector module to compensate for errors
- Builds fault tolerance into the design based upon a-priori knowledge of likely faulty transitions
- Advantage: high-level approach (coarse granularity)
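For context, the unknown-component problem is commonly written as a language equation. The form below is the standard statement from the language-equation literature, given as background and not taken from the slides. Here F is the fixed (faulty) component, X the unknown compensator, S the specification, and the bullet denotes composition of the two components.

```latex
F \bullet X \subseteq S
\qquad\Longrightarrow\qquad
X_{\mathrm{MGS}} = \overline{F \bullet \overline{S}}
```

The largest solution is the most general solution (MGS); any particular compensator is a sub-behavior of it.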
9Error Control Scheme
- Compare the effectiveness of adding enable signals at the behavioral (FSM) and structural (binary gate) levels
- In both cases the system is modeled as a Mealy machine: the output variables can be set independently of the state transition
- Behavioral level: enabling state transitions
- Structural level: enabling state transitions (toggle state bits)
- Enabling output values (toggle output)
[Figure: enable signal E shown with state bit s0 and with output out1]
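The structural-level toggle can be sketched as a single XOR, which matches the conclusion slide's description of the scheme as an addition of XORs. The function name `toggle_state_bit` and the example state values are hypothetical.

```python
def toggle_state_bit(state, bit_index, enable):
    """XOR the chosen state bit with the enable signal.

    enable = 0 leaves the state unchanged; enable = 1 flips the bit.
    """
    return state ^ (enable << bit_index)

if __name__ == "__main__":
    faulty_next_state = 0b101    # hypothetical value produced by the faulty logic
    required_next_state = 0b100  # hypothetical value required by the spec
    # The compensator would need to assert the enable exactly when bit 0 is wrong.
    enable = (faulty_next_state ^ required_next_state) & 1
    print(bin(toggle_state_bit(faulty_next_state, 0, enable)))
```

An enable on a single bit can only repair errors confined to that bit, which is one reason a scheme of this kind fixes some injected errors and not others.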
10The FSM Under Study
Rx Controller from PicoRadio Charm Chip
- 5 states
- 3 inputs
- 4 outputs
11Behavioral Level: fixed_orig.aut
- This is the same file as spec.aut: it describes the desired I/O behavior without errors or enable signals
- Existence of self-loops in the spec causes some solutions of the unknown component to allow deadlocked states, because self-loops are considered acceptable behavior even if they are infinite
- Must manually discard solutions that do not eventually return to the idle state
  - e.g. this is not an acceptable solution
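The manual check for deadlocked states could be automated with a reachability test. This is a minimal sketch under stated assumptions: the automaton is given as a plain dictionary of successor lists (not the .aut format), the state names are made up, and "deadlocked" is taken to mean "reachable from idle with no path back to idle". That is weaker than requiring that every run eventually returns to idle.

```python
from collections import deque

def reachable(transitions, start):
    """Return the set of states reachable from start (start included)."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def has_deadlock(transitions, idle):
    """True if some state reachable from idle has no path back to idle."""
    for state in reachable(transitions, idle):
        if idle not in reachable(transitions, state):
            return True
    return False

if __name__ == "__main__":
    ok = {"idle": ["idle", "rx"], "rx": ["rx", "idle"]}
    bad = {"idle": ["rx"], "rx": ["stuck"], "stuck": ["stuck"]}
    print(has_deadlock(ok, "idle"), has_deadlock(bad, "idle"))
```

In the second automaton the self-loop on "stuck" is locally acceptable behavior, yet the state can never return to idle, which is the kind of solution the slide says must be discarded.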
12Structural Level: fixed_orig.blif
- Algorithm for mapping automata into a binary network
  - Call the mvsis.rugged script for node optimization
  - Call strash to map the entire network into 2-input AND gates
- 3 latches (one per state bit)
- 4 outputs
- 23 AND gates
13Experimental Setup
- Consider an error successfully fixed if the solution is non-empty and contains no deadlocked states
- Choose the particular solution to be the most general solution (MGS) with the don't-care (DC) state removed, for simplicity (not optimal)
- Map the solution into binary gates and calculate the added overhead
- Compare cost vs. effectiveness of both schemes
[Figure: experimental flow for the structural level and the behavioral level, with steps "add enable", "map to binary gates", and "inject errors" feeding the MVSIS language solving script]
One error is injected per iteration; there is one iteration per node in the binary network.
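The injection loop (one error per iteration, one iteration per node) can be sketched on a toy network. Everything in this block is a hypothetical stand-in: the two-node AND network, the signal names, and the choice to model an error as an inverted node output. It is not the Rx controller netlist or the MVSIS flow.

```python
from itertools import product

# Hypothetical 2-input AND network: node -> ((source, inverted), (source, inverted)).
NETWORK = {
    "n1": (("a", False), ("b", False)),  # n1 = a AND b
    "n2": (("n1", True), ("c", False)),  # n2 = (NOT n1) AND c
}
INPUTS = ("a", "b", "c")
OUTPUT = "n2"

def evaluate(vector, faulty_node=None):
    """Evaluate nodes in insertion (topological) order, flipping faulty_node."""
    values = dict(vector)
    for node, fanins in NETWORK.items():
        result = all(values[src] ^ inv for src, inv in fanins)
        if node == faulty_node:
            result = not result  # the injected error: one toggled node output
        values[node] = result
    return values[OUTPUT]

def count_mismatches(faulty_node):
    """Count input vectors on which the faulty network differs from the original."""
    count = 0
    for bits in product([False, True], repeat=len(INPUTS)):
        vector = dict(zip(INPUTS, bits))
        if evaluate(vector, faulty_node) != evaluate(vector):
            count += 1
    return count

if __name__ == "__main__":
    # One iteration per node, one injected error per iteration.
    for node in NETWORK:
        print(node, count_mismatches(node))
```

In the actual experiment each faulty network would be handed to the language solving script instead of being compared by exhaustive simulation; the loop structure is the part this sketch is meant to show.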
14An Example of a Successful Result
- Attempted fix: at the structural level, add 1 enable signal to toggle state bit NS0
- Result: at an average cost of 192% overhead, fixes 33% (7/18) of errors
- One example of a faulty Rx controller
  - 6 states (1 extra, unwanted)
  - Faulty transitions (e.g. s5 -> s2)
[Figure: state diagram of the faulty Rx controller with labels "s2", "s5", and "unwanted"]
- fixed_error_40.aut: 25 AND gates in binary mapping
  - This is the modification of the original Rx controller with 1 enable signal added at the structural level and one error injected
15An Example of a Successful Result
Inputs: the same inputs seen by the faulty Rx, as well as the outputs produced by the faulty Rx
Output: enable signal to fix any errors as necessary
- x_40_nodc.aut: 49 AND gates in binary mapping
  - This is the error control module that sends the appropriate sequence of enable signal values into the faulty version of the Rx controller (fixed_error_40.aut) to compensate for the errors
16Results: Enabling State Transitions
- Behavioral level: an enable on an arc allows an unconditional transition
- Structural level: an enable signal allows toggling a state bit's value
Example previously shown
Structural level error correction is more
effective and less costly!
17Results: Enabling Outputs
- Behavioral level: an enable signal allows output 1 regardless of state transition
- Structural level: an enable signal allows toggling of the output value
- One enable signal added in all cases
Again, structural level error correction wins in the majority of cases!
18Another Idea
- The error detection scheme with the least added area penalty still required 192% extra overhead and was only 33% successful
- The most successful error detection scheme fixed only 44% of errors and cost 236% extra overhead
- Triple Modular Redundancy uses just over 200% extra overhead, but how effective would it be for the Rx controller example?
[Figure: TMR arrangement with majority voter]
19Another Idea
- Question
  - How many of the 18 possible error manifestations still meet the behavior specification?
- Answer: 0
  - TMR is not a feasible solution if an error is likely to happen each time!
[Figure: flow with steps "map to binary gates", "inject errors (x 18, one per binary node)", and "MVSIS language solving script", ending in "Empty Solution Space"]
20Conclusion
- Building error correction into the structural level is easy (just the addition of XORs) and has been shown to be more effective than enabling arcs at the behavioral level
- Adding the ability to toggle state bits is a far-reaching control scheme, much more so than enabling individual transitions
- Understanding the nature of the errors is vital to the design of a smart correction scheme
  - TMR as a brute-force approach will not suffice for these types of static faults
- Directions for future work
  - Explore methods for optimizing the particular solution; currently it is simply the MGS with the DC state removed
  - Investigate other error injection simulation methods besides one toggled bit at a time
  - Examine a more complex FSM with more state transitions: will error correction incur more or less of a hardware penalty?
  - Explore Büchi automata as a method to exclude deadlocked states from the spec