Investigating The Robust Design of Finite State Machines Using MVSIS: A Comparison of Error-Correction Schemes

1
Investigating The Robust Design of Finite State
Machines Using MVSIS: A Comparison of
Error-Correction Schemes
Ruth Wang, EE290N Project, 5/20/2004, ruthwang_at_eecs
2
Outline
  • Motivation
  • Current Solutions
  • Project Proposal Using MVSIS
  • Experimental Setup
  • Results
  • Conclusion

3
Motivation
  • The emerging trend of VLSI designs in wireless
    applications is tending toward ultra-low cost,
    power, and energy dissipation (PicoRadio, Smart
    Dust, etc.)
  • Energy dissipation scales quadratically with
    supply voltage: E ∝ Vdd^2
  • Lower Vdd -> quadratic energy savings!
  • BUT these energy savings come at a great cost
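
As a rough illustration of the quadratic relationship above, the following sketch computes the energy at a lowered supply voltage as a fraction of the energy at nominal Vdd. The voltage values and the function name are illustrative assumptions, not figures from the presentation.

```python
# Relative switching energy under the scaling rule "E is proportional to Vdd^2".
# The voltages used below are hypothetical, chosen only to show the trend.

def relative_energy(vdd: float, vdd_nominal: float) -> float:
    """Energy at `vdd` as a fraction of the energy at `vdd_nominal`."""
    return (vdd / vdd_nominal) ** 2


if __name__ == "__main__":
    nominal = 1.2  # volts, assumed nominal supply
    for vdd in (1.2, 0.9, 0.6):
        frac = relative_energy(vdd, nominal)
        print(f"Vdd = {vdd:.1f} V -> {frac:.0%} of nominal energy")
```

Halving the supply voltage cuts the energy to one quarter under this rule, which is the saving the rest of the talk tries to keep while coping with the resulting timing errors.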

[Figure: 1000 Monte Carlo simulations of a 4-bit
static CMOS adder in 130 nm technology]
  • As Vdd scales down, process and operating
    variations worsen drastically, causing gate
    delays to become not only longer but also harder
    to predict!

4
Implications For Circuit Design
  • Generalized representation of a digital system

[Figure: generalized digital system; logic blocks
supplied by Vdd]
Assume latches are error-free, and errors only
manifest in logic circuitry.
  • How is the clock period determined for nominal
    Vdd?
  • Use standard worst-case timing methodology
  • Clock period is set by the worst-case delay
    (critical) path
  • All delay paths (critical and non-critical) are
    guaranteed to finish evaluating within this clock
    period
  • How is the clock period determined for lowered
    Vdd?
  • Standard worst-case timing methodology is NOT
    reasonable
  • The critical path delay is prohibitively long and
    hard to predict
  • Set the clock period at some point and accept the
    non-zero probability of logic evaluation errors
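
The trade-off between worst-case timing and an aggressive clock period can be sketched with a small Monte Carlo experiment. Everything numeric here is an assumption made for illustration: the lognormal delay model, the spread `sigma`, and the choice of the 95th percentile as the relaxed clock period are not taken from the presentation.

```python
import random


def simulate_path_delays(n_trials, nominal_delay, sigma, seed=0):
    """Draw path delays from an assumed lognormal variation model."""
    rng = random.Random(seed)
    return [nominal_delay * rng.lognormvariate(0.0, sigma) for _ in range(n_trials)]


def error_probability(delays, clock_period):
    """Fraction of samples whose logic has not finished by the clock edge."""
    late = sum(1 for d in delays if d > clock_period)
    return late / len(delays)


# 1000 samples, mirroring the sample count mentioned earlier in the talk.
delays = simulate_path_delays(1000, nominal_delay=1.0, sigma=0.3)

# Worst-case methodology: the clock period covers the slowest sample.
worst_case_period = max(delays)

# Relaxed methodology: pick a shorter period (here the 95th percentile)
# and accept that some evaluations will be late.
relaxed_period = sorted(delays)[int(0.95 * len(delays)) - 1]

print("worst-case period:", round(worst_case_period, 3),
      "error probability:", error_probability(delays, worst_case_period))
print("relaxed period:   ", round(relaxed_period, 3),
      "error probability:", error_probability(delays, relaxed_period))
```

The relaxed period is shorter than the worst-case one, at the price of a non-zero fraction of late evaluations; the later slides ask whether those errors can be compensated at a higher level.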

5
Problem Statement
  • Errors propagate through design abstraction
    layers!
  • The errors caused by over-clocking occur at the
    transistor level but will affect higher-level
    abstraction layers of the system
  • These are static errors: once the chip is
    fabricated, each transistor has a fixed set of
    manufactured process parameters that determine
    its speed
  • This performance is fixed and will not change at
    run-time

[Figure: abstraction layers, from FSM down to logic
gates and transistors]
  • Question: Can we compensate for the erroneous
    state transitions using fault-tolerant design
    techniques?
  • Consider the tradeoff: hardware overhead vs.
    energy savings

6
Current Solutions (1)
  • Triple Modular Redundancy [1] (for dynamic
    errors)
  • Make 3 exact copies of the same FSM and take a
    majority vote as the correct behavior

[Figure: input "in" drives three copies of the FSM,
whose outputs feed a majority voter]
  • Disadvantages
  • 200% extra overhead due to 2 extra FSMs, plus
    majority voting hardware
  • No error detection capability
  • Not practical for static errors
  • Used mainly for extremely critical systems (e.g.
    space borne electronics)
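
A minimal sketch of the majority vote, assuming each FSM copy produces its outputs as a small bit vector packed into an integer. The function name and the example values are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority of three equal-width words: each output bit
    takes the value that at least two of the three copies agree on."""
    return (a & b) | (a & c) | (b & c)


correct = 0b1010            # output of a fault-free copy (made-up value)
upset = correct ^ 0b0100    # the same word with one bit flipped

# One erroneous copy is outvoted by the two good ones.
masked = majority(correct, correct, upset)

# If two copies produce the same wrong value, the voter passes it on.
not_masked = majority(correct, upset, upset)

print(bin(masked), bin(not_masked))
```

The voter only masks an error when a majority of the copies are right at that moment, and it gives no indication that a disagreement occurred, which matches the "no error detection capability" point above.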

[1] S. Niranjan, J. Frenzel, "A Comparison of
Fault-Tolerant State Machine Architectures for
Space-Borne Electronics," IEEE Transactions on
Reliability, March 1996.
7
Current Solutions (2)
  • Adaptive Body-Biasing [3] (for static errors)
  • A transistor-level compensation technique
  • Uses a transistor's body bias voltage as a
    control knob to adjust its speed

[Figure: transistor with its body bias voltage]
  • Disadvantages
  • Range of speed improvement is limited and values
    of available bias voltages must be discretized
  • Low-level approach with very fine granularity
  • Requires an exhaustive search of slow paths at
    the transistor level

[Figure: compensation loop: measure transistor
speed, then calculate the appropriate body bias
voltage to meet a certain delay requirement for
the transistor]
[3] J. Tschanz, et al., "Adaptive Body Bias for
Reducing Impacts of Die-to-Die and Within-Die
Parameter Variations on Microprocessor Frequency
and Leakage," JSSC, November 2002.
8
Proposed Solution
  • Use MVSIS's unknown-component solving capability
    to explore possible error correction schemes

[Figure: unknown-component topology. The Spec maps
inputs (i) to outputs (o). The fixed part F is the
faulty FSM. The unknown Soln X is the error
detector/compensator, connected to F through
signals (u) and ctrl signals (v).]
  • Add an error detector module to compensate for
    errors
  • Builds fault tolerance into the design based upon
    a priori knowledge of likely faulty transitions
  • Advantage: high-level approach (coarse
    granularity)

9
Error Control Scheme
  • Compare the effectiveness of adding enable
    signals at the behavioral (FSM) and structural
    (binary gate) levels
  • In both cases the system is modeled as a Mealy
    machine: the output variables can be set
    independently of the state transition

Behavioral Level
  • Enabling state transitions
  • Enabling output values

Structural Level
  • Enabling state transitions (toggle state bits)
  • Enabling output values (toggle output)

[Figures: enable signal E applied to state bit s0;
enable signal E applied to output out1]
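
At the structural level, the conclusion of this talk describes the enable mechanism as "just addition of XORs". A minimal sketch of that idea follows; the three-bit state encoding, the bit ordering, and the example values are assumptions for illustration.

```python
def toggle(bit: int, enable: int) -> int:
    """XOR gate: pass `bit` through when enable is 0, invert it when enable is 1."""
    return bit ^ enable


def corrected_next_state(faulty_bits, enables):
    """Apply one enable signal per next-state bit."""
    return tuple(toggle(b, e) for b, e in zip(faulty_bits, enables))


# Hypothetical case: the specification calls for next state (0, 1, 0),
# but the faulty logic computes (0, 1, 1). Bits are ordered (NS2, NS1, NS0).
faulty = (0, 1, 1)
enables = (0, 0, 1)   # the compensator asserts the enable on NS0 only

fixed = corrected_next_state(faulty, enables)
print(fixed)
```

With all enables at 0 the state bits pass through unchanged, so the added XORs are transparent whenever the compensator sees no error to fix. The same gate applied to an output bit gives the "toggle output" variant.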
10
The FSM Under Study
Rx Controller from PicoRadio Charm Chip
  • 5 states
  • 3 inputs
  • 4 outputs

11
Behavioral Level: fixed_orig.aut
  • This is the same file as spec.aut: it describes
    the desired I/O behavior without errors or enable
    signals
  • Self-loops in the spec cause some solutions of
    the unknown component to allow deadlocked states,
    because a self-loop is considered acceptable
    behavior even if it is taken forever
  • Must manually discard solutions that do not
    eventually return to the idle state
  • e.g. this is not an acceptable solution
    [Figure: example of an unacceptable solution]
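
The manual screening step could be approximated by a simple graph check: flag any solution in which some reachable state has no path back to the idle state. This is a sketch under assumptions: the state names and the dictionary representation are hypothetical, and the check only finds states from which idle is unreachable. It does not catch a run that could leave a self-loop but never does.

```python
from collections import deque


def reachable_from(transitions, start):
    """Forward breadth-first search over the state graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def states_that_can_reach(transitions, target):
    """Backward breadth-first search: every state with a path to `target`."""
    reverse = {}
    for src, dsts in transitions.items():
        for dst in dsts:
            reverse.setdefault(dst, set()).add(src)
    seen = {target}
    queue = deque([target])
    while queue:
        state = queue.popleft()
        for prev in reverse.get(state, ()):
            if prev not in seen:
                seen.add(prev)
                queue.append(prev)
    return seen


def has_deadlock(transitions, idle="idle"):
    """True if some state reachable from idle can never get back to idle."""
    can_return = states_that_can_reach(transitions, idle)
    return any(s not in can_return for s in reachable_from(transitions, idle))


# Hypothetical solutions: the first always has a way back to idle,
# the second contains a trap state whose only transition is a self-loop.
acceptable = {"idle": {"idle", "rx"}, "rx": {"rx", "idle"}}
trapped = {"idle": {"rx"}, "rx": {"stuck"}, "stuck": {"stuck"}}

print(has_deadlock(acceptable), has_deadlock(trapped))
```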

12
Structural Level: fixed_orig.blif
  • Algorithm for mapping automata into binary
    network
  • Call mvsis.rugged script for node optimization
  • Call strash to map entire network into 2-input
    AND gates
  • 3 latches (one per state bit)
  • 4 outputs
  • 23 AND gates
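
Counting 2-input AND gates presupposes a representation in which every function is built from ANDs and inversions, with identical gates shared. One common way to get that is structural hashing over an AND-inverter graph. The sketch below illustrates the general idea only; the class and its methods are hypothetical and are not the MVSIS implementation of strash.

```python
class Aig:
    """Tiny AND-inverter graph with structural hashing.

    A literal is 2 * node_id + complement_bit. Node 0 is the constant,
    so literal 0 means false and literal 1 means true.
    """

    def __init__(self):
        self.num_nodes = 1      # node 0 is reserved for the constant
        self.and_table = {}     # (lit_a, lit_b) -> literal of that AND node

    def new_input(self):
        lit = 2 * self.num_nodes
        self.num_nodes += 1
        return lit

    @staticmethod
    def neg(lit):
        return lit ^ 1          # inversion is free: flip the complement bit

    def and_(self, a, b):
        if a > b:
            a, b = b, a         # canonical operand order
        if a == 0:
            return 0            # false AND x = false
        if a == 1:
            return b            # true AND x = x
        if a == b:
            return a            # x AND x = x
        if a == (b ^ 1):
            return 0            # x AND (NOT x) = false
        key = (a, b)
        if key not in self.and_table:   # reuse a gate that already exists
            self.and_table[key] = 2 * self.num_nodes
            self.num_nodes += 1
        return self.and_table[key]

    def or_(self, a, b):
        return self.neg(self.and_(self.neg(a), self.neg(b)))

    def xor(self, a, b):
        return self.or_(self.and_(a, self.neg(b)), self.and_(self.neg(a), b))

    @property
    def num_ands(self):
        return len(self.and_table)


g = Aig()
a, b = g.new_input(), g.new_input()
x1 = g.xor(a, b)
ands_after_first_xor = g.num_ands
x2 = g.xor(a, b)                # rebuilt from scratch, but every gate is shared
print(ands_after_first_xor, g.num_ands, x1 == x2)
```

Because inverters cost nothing in this representation, the number of AND nodes is a convenient single measure for comparing the size of the original controller with the size of any added correction logic.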

13
Experimental Setup
  • Consider this a successfully fixed error if the
    solution is non-empty and contains no deadlocked
    states
  • Choose the particular solution to be the most
    general solution (MGS) with the don't-care (DC)
    state removed, for simplicity (not optimal)
  • Map the solution into binary gates and calculate
    the added overhead
  • Compare cost vs. effectiveness of both schemes

[Figure: experimental flow with a Structural Level
branch and a Behavioral Level branch. Each branch
has "add enable", "map to binary gates", and
"inject errors" steps and feeds the MVSIS language
solving script.]
One error is injected per iteration. There is one
iteration per node in the binary network.
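
The injection loop (one error per iteration, one iteration per node) can be sketched on a toy combinational network of 2-input AND gates. The network, its node names, and the choice to model an error as an inverted node output are assumptions for illustration. The actual experiment works on the sequential Rx controller and hands each faulty version to the MVSIS solving script, which is not modeled here.

```python
from itertools import product

# Each AND node: name -> ((source_a, invert_a), (source_b, invert_b)),
# listed in topological order. This toy network is made up.
NETWORK = {
    "n1": (("a", 0), ("b", 0)),     # n1  = a AND b
    "n2": (("b", 1), ("c", 0)),     # n2  = (NOT b) AND c
    "out": (("n1", 1), ("n2", 1)),  # out = (NOT n1) AND (NOT n2)
}
INPUTS = ("a", "b", "c")


def evaluate(network, assignment, faulty_node=None):
    """Evaluate the network; if `faulty_node` is given, toggle that node's output."""
    values = dict(assignment)
    for name, ((src_a, inv_a), (src_b, inv_b)) in network.items():
        result = (values[src_a] ^ inv_a) & (values[src_b] ^ inv_b)
        if name == faulty_node:
            result ^= 1             # the injected error: one toggled bit
        values[name] = result
    return values["out"]


def fault_is_observable(network, node):
    """True if toggling `node` changes the output for at least one input vector."""
    for bits in product((0, 1), repeat=len(INPUTS)):
        assignment = dict(zip(INPUTS, bits))
        if evaluate(network, assignment, node) != evaluate(network, assignment):
            return True
    return False


# One iteration per node, one injected error per iteration.
results = {node: fault_is_observable(NETWORK, node) for node in NETWORK}
print(results)
```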
14
An Example of a Successful Result
  • Attempted fix: at the structural level, add 1
    enable signal to toggle state bit NS0
  • Result: at an average cost of 192% overhead,
    fixes 33% (7/18) of errors

  • One example of a faulty Rx Controller
  • 6 states (1 extra unwanted)
  • Faulty transitions
  • (e.g. s5 -> s2)

[Figure: state diagram of the faulty Rx Controller;
labels include s2, s5, and "unwanted"]
  • fixed_error_40.aut: 25 AND gates in binary
    mapping
  • This is the modification of the original Rx
    Controller with 1 enable signal added at the
    structural level, and one error injected

15
An Example of a Successful Result
Inputs: the same inputs seen by the faulty Rx, as
well as the outputs produced by the faulty Rx
Output: enable signal to fix any errors as
necessary
  • x_40_nodc.aut: 49 AND gates in binary mapping
  • This is the error control module that sends the
    appropriate sequence of enable signal values into
    the faulty version of the Rx Controller
    (fixed_error_40.aut) to compensate for the errors

16
Results: Enabling State Transitions
  • Behavioral level: an enable on an arc allows an
    unconditional transition
  • Structural level: an enable signal allows
    toggling a state bit's value

[Figure: results comparison; one entry is annotated
"Example previously shown"]
Structural-level error correction is more
effective and less costly!
17
Results: Enabling Outputs
  • Behavioral level: enable signal allows output 1
    regardless of state transition
  • Structural level: enable signal allows toggling
    of the output value
  • One enable signal added in all cases

Again, structural-level error correction wins in
the majority of cases!
18
Another Idea
  • The error detection scheme with the least added
    area penalty still required 192% extra overhead
    and was only 33% successful
  • The most successful error detection scheme only
    fixed 44% of errors and cost 236% extra overhead
  • Triple Modular Redundancy uses just over 200%
    extra overhead, but how effective would it be for
    the Rx Controller example?

[Figure: TMR arrangement with a majority voter]
19
Another Idea
  • Question
  • How many of the 18 possible error manifestations
    still meet the behavior specification?
  • Answer: 0
  • TMR is not a feasible solution if an error is
    likely to happen each time!

[Figure: flow: map to binary gates, inject errors
(x 18, one per binary node), run the MVSIS language
solving script; result: Empty Solution Space]
20
Conclusion
  • Building error correction into the structural
    level is easy (just addition of XORs) and has
    been shown to be more effective than enabling
    arcs at the behavioral level.
  • Adding the ability to toggle state bits is a
    far-reaching control scheme, much more so than
    enabling individual transitions
  • Understanding the nature of the errors is vital
    to the design of a smart correction scheme:
    TMR as a brute-force approach will not suffice
    for these types of static faults
  • Directions for future work
  • Explore methods for optimization of the
    particular solution; currently it is simply MGS
    with DC state removed
  • Investigate other error injection simulation
    methods besides one toggled bit at a time
  • Examine a more complex FSM with more state
    transitions: will error correction incur more or
    less of a hardware penalty?
  • Explore Büchi automata as a method to exclude
    deadlocked states from the spec