Investigating The Robust Design of Finite State Machines Using MVSIS: A Comparison of Error-Correction Schemes

1
Investigating The Robust Design of Finite State Machines Using MVSIS: A Comparison of Error-Correction Schemes
Ruth Wang, EE290N Project, 5/20/2004, ruthwang_at_eecs
2
Outline

Motivation
Current Solutions
Project Proposal Using MVSIS
Experimental Setup
Results
Conclusion

3
Motivation

VLSI designs for wireless applications are trending toward ultra-low cost, power, and energy dissipation (PicoRadio, Smart Dust, etc.)

Energy dissipation: E ∝ Vdd^2
Lower Vdd -> quadratic energy savings!
BUT this energy savings comes at a great cost
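
As a quick numeric check of the quadratic relation, the sketch below compares switching energy at two supply voltages. The helper name `relative_energy` and the example voltages are illustrative assumptions, not values from the project.

```python
def relative_energy(vdd_new: float, vdd_nominal: float) -> float:
    """Energy at vdd_new relative to vdd_nominal, assuming E is proportional to Vdd^2."""
    if vdd_nominal <= 0 or vdd_new < 0:
        raise ValueError("supply voltages must be positive")
    return (vdd_new / vdd_nominal) ** 2


# Hypothetical example: halving the supply voltage.
ratio = relative_energy(0.6, 1.2)
print(f"energy ratio: {ratio:.2f}")
```

Under this proportionality, halving Vdd leaves one quarter of the original energy per operation.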

[Figure: 1000 Monte Carlo simulations of a 4-bit static CMOS adder in 130 nm technology]

As Vdd scales down, the effects of process and operating variations worsen drastically, making gate delays not only longer but also harder to predict!

4
Implications For Circuit Design

[Figure: generalized representation of a digital system (supply labeled Vdd)]
Assume latches are error-free, and errors only manifest in logic circuitry

How is the clock period determined for nominal Vdd?
Use the standard worst-case timing methodology
Clock period is set by the worst-case delay (critical) path
All delay paths (critical and non-critical) are guaranteed to finish evaluating within this clock period

How is the clock period determined for lowered Vdd?
Standard worst-case timing methodology is NOT reasonable
The critical path delay is prohibitively long and hard to predict
Set the clock period at some point and accept the non-zero probability of logic evaluation errors
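
The two timing policies can be contrasted with a small Monte Carlo sketch. The Gaussian delay model, its parameters, and the function names are assumptions made for illustration only; they are not the 130 nm simulation data from the slides.

```python
import random


def sample_critical_delays(n_trials: int, mean: float, sigma: float, seed: int = 0) -> list[float]:
    """Draw hypothetical critical-path delays (arbitrary units) from a Gaussian model."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mean, sigma)) for _ in range(n_trials)]


def worst_case_period(delays: list[float]) -> float:
    """Worst-case methodology: the clock period covers the slowest sampled path."""
    return max(delays)


def error_probability(delays: list[float], period: float) -> float:
    """Fraction of samples whose delay exceeds the chosen clock period."""
    late = sum(1 for d in delays if d > period)
    return late / len(delays)


delays = sample_critical_delays(1000, mean=1.0, sigma=0.3)
t_worst = worst_case_period(delays)
t_chosen = 1.2  # hypothetical shorter clock period

print("worst-case period:", round(t_worst, 3))
print("error probability at worst-case period:", error_probability(delays, t_worst))
print("error probability at chosen period:", error_probability(delays, t_chosen))
```

With a wide delay spread, the worst-case period sits far out in the tail, so a shorter period trades a non-zero error probability for speed.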

5
Problem Statement

Errors propagate through design abstraction layers!
The errors caused by over-clocking occur at the transistor level but affect higher-level abstraction layers of the system
These are static errors: once the chip is fabricated, each transistor has a fixed set of manufactured process parameters that determine its speed
This performance is fixed and will not change at run-time

[Figure: abstraction layers (transistors, logic gates, FSM), annotated with 0/1 values]

Question: Can we compensate for the erroneous state transitions using fault-tolerant design techniques?
Consider the tradeoff: hardware overhead vs. energy savings

6
Current Solutions (1)

Triple Modular Redundancy [1] (for dynamic errors)
Make 3 exact copies of the same FSM and take a majority vote as the correct behavior

[Figure: input "in" feeds copy 1, copy 2, and copy 3, whose outputs go to a majority voter]

Disadvantages
200% extra overhead due to 2 extra FSMs, plus majority voting hardware
No error detection capability
Not practical for static errors
Used mainly for extremely critical systems (e.g. space-borne electronics)
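
A minimal sketch of majority voting over three FSM copies follows. The 2-bit counter, the fault model, and all names are hypothetical; the second case assumes that the same static fault is present in every copy, which is one way a static error can defeat the vote.

```python
from typing import Callable, Sequence

State = int
NextStateFn = Callable[[State, int], State]


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def tmr_step(copies: Sequence[NextStateFn], state: State, inp: int) -> State:
    """Advance three FSM copies from a shared state and vote on the next state."""
    n1, n2, n3 = (f(state, inp) for f in copies)
    return majority(n1, n2, n3)


def good(state: State, inp: int) -> State:
    """Hypothetical 2-bit counter that advances when inp is 1."""
    return (state + inp) & 0b11


def faulty(state: State, inp: int) -> State:
    """Same counter with one fault: bit 0 of the next state is flipped."""
    return good(state, inp) ^ 0b01


# A fault confined to one copy is outvoted by the other two.
masked = tmr_step([good, faulty, good], state=2, inp=1)
# The same fault in every copy wins the vote.
unmasked = tmr_step([faulty, faulty, faulty], state=2, inp=1)
print(masked, unmasked)
```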
[1] S. Niranjan, J. Frenzel, "A Comparison of Fault-Tolerant State Machine Architectures for Space-Borne Electronics," IEEE Transactions on Reliability, March 1996.
7
Current Solutions (2)

Adaptive Body-Biasing [3] (for static errors)
A transistor-level compensation technique
Uses a transistor's body bias voltage as a control knob to adjust its speed

[Figure: feedback loop that measures transistor speed, calculates the appropriate body bias voltage to meet a certain delay requirement, and applies that body bias voltage to the transistor]

Disadvantages
Range of speed improvement is limited, and values of available bias voltages must be discretized
Low-level approach with very fine granularity
Requires an exhaustive search of slow paths at the transistor level
[3] J. Tschanz, et al., "Adaptive Body Bias for Reducing Impacts of Die-to-Die and Within-Die Parameter Variations on Microprocessor Frequency and Leakage," JSSC, November 2002.
8
Proposed Solution

Use the MVSIS unknown-component solving capability to explore possible error correction schemes

[Figure: the faulty FSM F, with inputs (i) and outputs (o), is connected through signals (u) and ctrl signals (v) to the unknown component X (Soln), the Error Detector/Compensator; the Spec describes the required behavior]

Add an error detector module to compensate for errors
Builds fault tolerance into the design based upon a-priori knowledge of likely faulty transitions
Advantage: high-level approach (coarse granularity)

9
Error Control Scheme

Compare the effectiveness of adding enable signals at the behavioral (FSM) and structural (binary gate) levels
In both cases the system is modeled as a Mealy machine: the output variables can be set independently of the state transition

Behavioral Level
Enabling state transitions
Enabling output values

Structural Level
Enabling state transitions (toggle state bits)
Enabling output values (toggle output)

[Figure: enable signal E applied to state bit s0]
[Figure: enable signal E applied to output out1]
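
The conclusion slide describes the structural-level scheme as "just addition of XORs". A minimal sketch of that idea, with hypothetical function names and bit values, XORs each state bit with its enable so that an enable of 1 toggles the bit and an enable of 0 leaves it unchanged.

```python
def apply_enable(bit: int, enable: int) -> int:
    """Structural-level enable: XOR toggles the bit when enable is 1."""
    return bit ^ enable


def correct_state(next_state: int, enable_mask: int) -> int:
    """Toggle every state bit selected by enable_mask."""
    return next_state ^ enable_mask


# Hypothetical example: the faulty logic computed 0b010 where the spec expects 0b011.
faulty_next = 0b010
expected_next = 0b011
enable_mask = faulty_next ^ expected_next  # the bits that must be toggled
corrected = correct_state(faulty_next, enable_mask)
print(bin(corrected))
```

The same XOR applies to an output bit when the enable is placed on an output instead of a state bit.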
10
The FSM Under Study
Rx Controller from PicoRadio Charm Chip

5 states
3 inputs
4 outputs

11
Behavioral Level: fixed_orig.aut

This is the same file as spec.aut: it describes the desired I/O behavior without errors or enable signals
Self-loops in the spec cause some solutions for the unknown component to allow deadlocked states, because a self-loop is considered acceptable behavior even if it is taken forever
Solutions that do not eventually return to the idle state must be discarded manually

e.g. this is not an acceptable solution
[Figure: example of an unacceptable solution]
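
The manual check could be approximated with a reachability test, sketched below. The graph encoding (state name mapped to successor states) and the example automata are assumptions; the sketch only checks that a path back to idle exists and cannot rule out a self-loop that is taken forever.

```python
from collections import deque

Graph = dict[str, set[str]]


def reachable(graph: Graph, start: str) -> set[str]:
    """States reachable from start (including start) via breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in graph.get(state, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def deadlocked_states(graph: Graph, idle: str) -> set[str]:
    """Reachable states from which no path leads back to idle."""
    return {s for s in reachable(graph, idle) if idle not in reachable(graph, s)}


# Hypothetical solutions: the first always has a way back to idle, the second does not.
ok = {"idle": {"idle", "rx"}, "rx": {"rx", "idle"}}
bad = {"idle": {"rx"}, "rx": {"idle", "stuck"}, "stuck": {"stuck"}}

print(deadlocked_states(ok, "idle"))
print(deadlocked_states(bad, "idle"))
```

A solution would be kept only when `deadlocked_states` returns an empty set.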

12
Structural Level: fixed_orig.blif

Algorithm for mapping automata into a binary network
Call the mvsis.rugged script for node optimization
Call strash to map the entire network into 2-input AND gates

3 latches (one per state bit)
4 outputs
23 AND gates

13
Experimental Setup
Consider this a successfully fixed error if the solution is non-empty and contains no deadlocked states
Choose the particular solution to be the MGS (most general solution) with the DC state removed, for simplicity (not optimal)
Map the solution into binary gates and calculate the added overhead
Compare cost vs. effectiveness of both schemes

[Figure: experimental flow for the Structural Level and the Behavioral Level; each path uses the steps "add enable", "map to binary gates", and "inject errors", and feeds the MVSIS language solving script]
One error is injected per iteration; there is one iteration per node in the binary network
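
The injection loop can be sketched on a toy network of 2-input AND gates with optional input inversions. The three-node topology and all names are hypothetical, the network is purely combinational (no latches), and the fault model (inverting one node's output) is an assumption based on the "one toggled bit at a time" description in the conclusion.

```python
from itertools import product
from typing import Optional

# Each node is a 2-input AND: (input_a, invert_a, input_b, invert_b).
# Hypothetical topology; this is not the Rx controller netlist.
Node = tuple[str, bool, str, bool]

NETWORK: dict[str, Node] = {
    "n1": ("a", False, "b", False),   # n1 = a AND b
    "n2": ("b", True, "c", False),    # n2 = (NOT b) AND c
    "out": ("n1", True, "n2", True),  # out = (NOT n1) AND (NOT n2)
}
ORDER = ["n1", "n2", "out"]           # topological evaluation order
INPUTS = ["a", "b", "c"]


def evaluate(inputs: dict[str, int], flipped: Optional[str] = None) -> int:
    """Evaluate the network; if flipped names a node, invert that node's output."""
    values = dict(inputs)
    for name in ORDER:
        a, inv_a, b, inv_b = NETWORK[name]
        result = (values[a] ^ inv_a) & (values[b] ^ inv_b)
        if name == flipped:
            result ^= 1
        values[name] = result
    return values["out"]


def mismatches(flipped: str) -> int:
    """Count input vectors where the faulty network disagrees with the fault-free one."""
    count = 0
    for bits in product((0, 1), repeat=len(INPUTS)):
        vec = dict(zip(INPUTS, bits))
        if evaluate(vec) != evaluate(vec, flipped):
            count += 1
    return count


# One fault per iteration, one iteration per node.
report = {node: mismatches(node) for node in ORDER}
print(report)
```

In the actual flow, each faulty network would be handed to the language solving script instead of being compared by exhaustive simulation.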
14
An Example of a Successful Result

Attempted fix: at the structural level, add 1 enable signal to toggle state bit NS0
Result: at an average cost of 192% overhead, fixes 33% (7/18) of errors

One example of a faulty Rx Controller
6 states (1 extra, unwanted)
Faulty transitions (e.g. s5 -> s2)
[Figure: state diagram of the faulty Rx controller, with labels "s5" and "unwanted"]

fixed_error_40.aut: 25 AND gates in binary mapping
This is the modification of the original Rx controller with 1 enable signal added at the structural level and one error injected

15
An Example of a Successful Result
Inputs: the same inputs seen by the faulty Rx, as well as the outputs produced by the faulty Rx
Output: enable signal to fix any errors as necessary

x_40_nodc.aut: 49 AND gates in binary mapping
This is the error control module that sends the appropriate sequence of enable signal values into the faulty version of the Rx controller (fixed_error_40.aut) to compensate for the errors
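
The slides do not state how overhead is computed. One plausible reading, assumed here, is the number of AND gates in the error control module relative to the AND gates of the faulty controller it corrects. The sketch applies that assumed definition to the two gate counts of this example (49 and 25); the function name is hypothetical.

```python
def overhead_percent(added_gates: int, base_gates: int) -> float:
    """Added gates as a percentage of the base circuit (assumed definition)."""
    if base_gates <= 0:
        raise ValueError("base_gates must be positive")
    return 100.0 * added_gates / base_gates


# Gate counts from this example: x_40_nodc.aut (49) and fixed_error_40.aut (25).
print(f"{overhead_percent(49, 25):.0f}%")
```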

16
Results: Enabling State Transitions

Behavioral level: an enable on an arc allows an unconditional transition

Structural level: an enable signal allows toggling a state bit's value

[Figure: results; one entry is marked "Example previously shown"]
Structural-level error correction is more effective and less costly!
17
Results: Enabling Outputs

Behavioral level: an enable signal allows an output of 1 regardless of the state transition
Structural level: an enable signal allows toggling of the output value
One enable signal added in all cases

Again, structural-level error correction wins in the majority of cases!
18
Another Idea

The error detection scheme with the least added area penalty still required 192% extra overhead and was only 33% successful
The most successful error detection scheme only fixed 44% of errors and cost 236% extra overhead
Triple Modular Redundancy uses just over 200% extra overhead, but how effective would it be for the Rx controller example?

[Figure: TMR with majority voter]
19
Another Idea

Question: How many of the 18 possible error manifestations still meet the behavior specification?

[Figure: flow: map to binary gates, inject errors (x 18, one per binary node), run the MVSIS language solving script; the result is an empty solution space]

Answer: 0
TMR is not a feasible solution if an error is likely to happen each time!
20
Conclusion

Building error correction into the structural level is easy (just the addition of XORs) and has been shown to be more effective than enabling arcs at the behavioral level
Adding the ability to toggle state bits is a far-reaching control scheme, much more so than enabling individual transitions
Understanding the nature of the errors is vital to the design of a smart correction scheme
TMR as a brute-force approach will not suffice for these types of static faults

Directions for future work
Explore methods for optimizing the particular solution; currently it is simply the MGS with the DC state removed
Investigate other error injection simulation methods besides one toggled bit at a time
Examine a more complex FSM with more state transitions: will error correction incur more or less of a hardware penalty?
Explore Büchi automata as a method to exclude deadlocked states from the spec