ED4I: Error Detection by Diverse Data and Duplicated Instructions - PowerPoint PPT Presentation

Provided by: csCornell

Learn more at: http://www.cs.cornell.edu

Transcript and Presenter's Notes

Title: ED4I: Error Detection by Diverse Data and Duplicated Instructions

1
ED4I: Error Detection by Diverse Data and
Duplicated Instructions

Greg Bronevetsky

2
ED4I Background

A code transformation system developed at the
Stanford Center for Reliable Computing.
Authors: Nahmsuk Oh, Subhasish Mitra,
Edward J. McCluskey
ED4I allows us to run a program on two slightly
different inputs and still be able to compare
results at the end.

3
Motivation

The simplest way to detect Byzantine Faults is to
run the same program on multiple processors and
compare results.
ED4I is Byzantine Fault detection for
uniprocessors.
Must take into account both temporary and
permanent faults.

4
Definitions

Temporary Faults: any fault that temporarily
affects a processor, long enough to execute
several instructions.
Ex: Radiation hitting wires, frayed wires.
Permanent Faults: a fault that affects a
processor for a long period of time.
Ex: Spilling Coke on the chip, cut wires.

5
Problem Statement

We can detect Byzantine Failures by running each
program or procedure twice and comparing the
results.
However, this does not guard against permanent
faults since the results of both runs will be the
same.
Need to make the two runs different so that the
same fault will affect the results differently.
Overhead: 100%.

6
Key Idea

Let's feed into the program two different sets of
data and then compare the results.
Key Insight
If the program only uses arithmetic operations,
we can alter the input by multiplying all input
numbers by a constant.
Then the modified output will be (the real
output) x (the constant).
Thus, you can verify that the two computations
succeeded AND the two computations will be
affected by errors differently.
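
The idea can be sketched in a few lines of Python. This is only an illustration, not code from the ED4I authors: the function names and the choice k = -2 are assumptions made here, and the program is restricted to addition and subtraction so that no changes to the program body are needed.

```python
K = -2  # diversity factor, chosen here only for illustration


def alternating_sum(values):
    """A small program that uses only addition and subtraction."""
    total = 0
    for index, value in enumerate(values):
        if index % 2 == 0:   # branch depends on position, not on the data
            total += value
        else:
            total -= value
    return total


def run_with_check(values, k=K):
    """Run the program on the real inputs and on k-multiplied inputs."""
    original = alternating_sum(values)
    diverse = alternating_sum([k * v for v in values])
    # On fault-free hardware the second result is exactly k times the first.
    if diverse != k * original:
        raise RuntimeError("results disagree: possible hardware fault")
    return original


print(run_with_check([7, 3, 10, 4]))
```

A fault that corrupts the same bit in both runs now changes two different values, so the two results are unlikely to stay in the ratio k.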

7
New Program

If we alter the input to the program, we must
alter the program to work with this modified
input.
The transformation is given the constant k
(called the diversity factor) and it creates
the k-factor diverse program.
The new program will have the same control flow
graph as the old program but all the variables
will be k-multiples of the original ones.

8
Transformations

If k < 0, branches flip direction (> becomes <, >= becomes <=).
All constants in code get multiplied by k.
Addition and subtraction of variables: unchanged.
Multiplication: v1 * v2 * ... * vn ->
(v1 * v2 * ... * vn) / k^(n-1)
Division: v1 / v2 -> (v1 / v2) * k
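
A hand-transformed example may make these rules concrete. The program below is hypothetical (it does not come from the slides); `diverse` applies each rule to `original` by hand, using Python integers so that the divisions by k are exact.

```python
def original(x, y):
    area = x * y
    area = area + 3
    if area > 20:
        return area // 4
    return area - 1


def diverse(xk, yk, k):
    """k-factor diverse version; xk and yk are the inputs already times k."""
    area = (xk * yk) // k        # multiplication of n = 2 values: divide by k^(n-1)
    area = area + 3 * k          # constant scaled by k; the addition is unchanged
    if k < 0:
        is_large = area < 20 * k     # branch direction flips for negative k
    else:
        is_large = area > 20 * k
    if is_large:
        return (area // (4 * k)) * k # division: multiply the quotient by k
    return area - 1 * k


for k in (-2, 3):
    for x, y in ((5, 6), (2, 3)):
        assert diverse(k * x, k * y, k) == k * original(x, y)
```

Both versions follow the same path through the control flow graph; every variable in `diverse` holds k times the value of its counterpart in `original`.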

9
Fault Detection Probability

For functional unit h_i (such as the adder), fault
f and diversity factor k:
X_i: the set of inputs to h_i.
E_i: subset of X_i containing the inputs that will
result in erroneous output due to the fault.
E'_i: subset of E_i that will escape detection.
C_i(k): probability of catching an error in h_i.

10
Data Integrity Probability

For functional unit h_i, fault f and diversity
factor k:
X_i: the set of inputs to h_i.
E_i: subset of X_i containing the inputs that will
result in erroneous output due to the fault.
E'_i: subset of E_i that will escape detection.
D_i(k): probability of missing no errors in h_i.
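
The sets above can be counted by brute force for a very simple unit. The sketch below is a toy model built on assumptions, not the authors' analysis: the unit is a 12-bit two's-complement bus with one line stuck, an error "escapes" when the faulty transformed value equals k times the faulty original value, and the ratios C = 1 - |E'|/|E| and D = 1 - |E'|/|X| are assumed forms, since the slide's own formulas did not survive in the transcript.

```python
WIDTH = 12                                   # bus width in bits
MASK = (1 << WIDTH) - 1
LOW, HIGH = -(1 << (WIDTH - 1)), (1 << (WIDTH - 1)) - 1


def through_bus(value, bit, stuck_at):
    """Send a signed value over the bus with one line stuck at 0 or 1."""
    word = value & MASK                      # two's-complement encoding
    if stuck_at == 1:
        word |= 1 << bit
    else:
        word &= ~(1 << bit)
    # Decode the received word back to a signed integer.
    return word - (1 << WIDTH) if word & (1 << (WIDTH - 1)) else word


def detection_stats(k, bit, stuck_at):
    """Return (C, D) for one fault under the assumed ratio definitions."""
    # X: inputs for which both x and k*x fit on the bus.
    inputs = [x for x in range(LOW, HIGH + 1) if LOW <= k * x <= HIGH]
    erroneous = escaped = 0
    for x in inputs:
        seen = through_bus(x, bit, stuck_at)
        if seen == x:
            continue                         # this input does not activate the fault
        erroneous += 1                       # x belongs to E
        seen_k = through_bus(k * x, bit, stuck_at)
        if seen_k == k * seen:
            escaped += 1                     # x belongs to E': both runs agree on a wrong value
    c = 1 - escaped / erroneous if erroneous else 1.0
    d = 1 - escaped / len(inputs)
    return c, d


print(detection_stats(1, 3, 1))    # plain duplication, bit 3 stuck at 1
print(detection_stats(-2, 3, 1))   # diverse run with k = -2
```

With k = 1 the second run is a plain duplicate, so every activated error escapes; with k = -2 none do in this model. The numbers are not meant to match the results reported on the following slides.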

11
Choosing the value of k

For some functional units we can derive Ci(k) and
Di(k) analytically for each k.
This is too hard in general so we resort to
trying out a range of k's empirically to
determine Ci(k) and Di(k).

12
Bus Signal Line

Bus wire stuck at either 0 or 1.
Derived results for a 12-bit bus

13
Adder

Experimental results for a 12-bit ripple carry
adder
Experimental results for a 12-bit carry
look-ahead adder

14
Multiplier and Divider

Experimental results for:
12-bit array multiplier
8-bit Wallace Tree multiplier
SRT divider

15
Shifter

Experimental Results for 16-bit multiplexer-based
shifter

16
Using Benchmarks to pick k

Need to determine how much each functional unit
is used in the average program.
Add, sub, mult and shift use the obvious
functional units.
Memory access uses the memory bus.
Branch uses a carry-lookahead adder.

17
Benchmarked Data Integrity

Calculated data integrity D_i(k) given the usage
statistics above (high D_i(k) is the top priority).
Highlighted columns provide the best data
integrity for each benchmark.

18
Benchmarked Detection Probability

Calculated detection probability C_i(k) given
the usage statistics above.
Highlighted columns provide the best detection
probability for each benchmark.

19
Optimum k

Optimum k selected
Must maximize the data integrity D_i(k).
Given maximum D_i(k), maximize C_i(k).
For each program, should get an estimate of how
it uses the different functional units and pick k
accordingly.
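
The selection rule (data integrity first, detection probability as the tie-breaker) amounts to a lexicographic maximum. A minimal sketch follows; the function name is hypothetical and the scores are made-up placeholders, not measurements from the benchmarks.

```python
def pick_k(scores):
    """Pick the diversity factor with the best (D, C) pair.

    scores maps each candidate k to a tuple (data_integrity, detection_probability).
    Tuples compare element by element, so D is maximized first and C breaks ties.
    """
    return max(scores, key=lambda k: scores[k])


# Placeholder numbers for illustration only; real values would come from
# per-program usage statistics of the functional units.
example_scores = {
    2: (0.97, 0.99),
    -2: (0.99, 0.95),
    3: (0.99, 0.90),
}
print(pick_k(example_scores))
```

Here k = 2 has the highest detection probability but loses on data integrity, and k = -2 beats k = 3 only on the tie-breaker.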

20
Dealing with Overflow

By multiplying all variables by k, we may cause
them to overflow.
Can scale variables up to the next larger type.
Can scale variables down by dividing by k; must then
check only the higher-order bits when comparing new
results to the results of the original program.
Can use compile-time range checking to determine
vulnerability to overflow and pick k accordingly.
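
The range-checking option can be sketched as a simple filter over candidate factors. The helper names, the 16-bit width, and the value range below are assumptions for illustration; a real compiler pass would derive the range of each variable itself.

```python
def fits(k, lo, hi, bits):
    """True if every value in [lo, hi] times k fits a signed integer of this width."""
    type_min = -(1 << (bits - 1))
    type_max = (1 << (bits - 1)) - 1
    scaled = (k * lo, k * hi)          # the extremes of the scaled range
    return type_min <= min(scaled) and max(scaled) <= type_max


def safe_factors(candidates, lo, hi, bits=32):
    """Keep only the diversity factors that cannot overflow the type."""
    return [k for k in candidates if fits(k, lo, hi, bits)]


# A variable known to stay within [-10000, 12000], stored in 16 bits.
print(safe_factors([-2, 2, 3, -3, 5], -10000, 12000, bits=16))
```

For this range only k = -2 and k = 2 keep the scaled values inside a 16-bit signed integer; larger factors would require the next larger type.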

21
Floating Point Numbers

Above technique fails for floating point numbers.
IEEE 754 format
k = -2 will only change the sign bit and some bits
in the exponent.
Solution: pick separate k's for the exponent and
the mantissa and run the program once with each
k.
Overhead: 200%.
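
The problem with a power-of-two factor is easy to see by unpacking the IEEE 754 single-precision fields. The helper below is an illustrative sketch (its name and the sample value 5.75 are choices made here); it uses only the standard `struct` module.

```python
import struct


def float32_fields(value):
    """Split a number, rounded to IEEE 754 single precision, into its bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF     # 8-bit biased exponent
    mantissa = bits & 0x7FFFFF         # 23-bit fraction field
    return sign, exponent, mantissa


x = 5.75
print(float32_fields(x))          # original value
print(float32_fields(-2 * x))     # sign and exponent change, mantissa bits do not
print(float32_fields(1.5 * x))    # a factor such as 3/2 changes the mantissa bits
```

Because multiplying by -2 leaves the mantissa field untouched, a stuck bit in the mantissa corrupts both runs in the same way, which is why a separate factor is needed for the mantissa.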

22
Picking k for the mantissa

To find errors in the mantissa, pick k to be 3/2.
A stuck-at-1 fault
In the original program, variable x's value is corrupted
to
In the transformed program, Since However, the
mantissa must be < 2, so if
the mantissa is right-shifted by 1 and
normalized.

23
Transformed variables

So now, the value in transformed program is
Value in original program is

24
Fault Detection in Mantissa

If there is a stuck-at-1 fault
Value in transformed program
Value in original program x k (for checking)

25
We can detect Mantissa errors!

Note that the error values for the original and
the transformed programs are different!
We actually use k = -3/2 in order to flip the sign
bit for improved detection capability.

26
k for exponents

In order to flip all the bits of the exponent,
need to transform program to use k
and k
If a fault invalidates a bit of the exponent, the
fault will be detected by comparing to the
exponents of one of the two transformed programs.

27
Effectiveness for Mantissa

Effectiveness of k
(for IEEE 754 single precision)

28
Effectiveness for Exponent

Effectiveness of k
(for IEEE 754 single precision)

29
Summary

ED4I effectively detects Byzantine Failures in
numerical applications on uniprocessors.
Purely software solution using Data Diversity.
Detects permanent and temporary faults.
Works with fixed-point and floating-point
numbers.
Compatible with arithmetic and logical operations
(probably with any bitwise logical operation if
it can be recast into arithmetic).
High overhead: 100% or 200%.