ED4I: Error Detection by Diverse Data and Duplicated Instructions

Provided by: csCornell

1
ED4I: Error Detection by Diverse Data and
Duplicated Instructions
  • Greg Bronevetsky

2
ED4I Background
  • A code transformation system developed at the
    Stanford Center for Reliable Computing.
  • Authors: Nahmsuk Oh, Subhasish Mitra,
    Edward J. McCluskey
  • ED4I allows us to run a program on two slightly
    different inputs and still be able to compare
    results at the end.

3
Motivation
  • The simplest way to detect Byzantine Faults is to
    run the same program on multiple processors and
    compare results.
  • ED4I is Byzantine Fault detection for
    uniprocessors.
  • Must take into account both temporary and
    permanent faults.

4
Definitions
  • Temporary Faults: any fault that temporarily
    affects a processor, long enough to execute
    several instructions.
  • Ex: Radiation hitting wires, frayed wires.
  • Permanent Faults: any fault that affects a
    processor for a long period of time.
  • Ex: Spilling Coke on the chip, cut wires.

5
Problem Statement
  • We can detect Byzantine Failures by running each
    program or procedure twice and comparing the
    results.
  • However, this does not guard against permanent
    faults since the results of both runs will be the
    same.
  • Need to make the two runs different so that the
    same fault will affect the results differently.
  • Overhead: 100%.

6
Key Idea
  • Let's feed into the program two different sets of
    data and then compare the results.
  • Key Insight:
  • If the program only uses arithmetic operations,
    we can alter the input by multiplying all input
    numbers by a constant.
  • Then the modified output will be (the real
    output) × (the constant).
  • Thus, you can verify that the two computations
    succeeded AND the two computations will be
    affected by errors differently.
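
A minimal sketch of why diverse data helps, assuming a hypothetical fault model in which bit 0 of the adder output is stuck at 1. The names `faulty_add` and `program` and the factor K = 2 are illustrative choices, not taken from the slides.

```python
def faulty_add(a, b, stuck_bit=0):
    """Adder whose output bit `stuck_bit` is stuck at 1 (hypothetical fault)."""
    return (a + b) | (1 << stuck_bit)

def program(x, y, add):
    """A toy program that uses only addition."""
    return add(x, y)

K = 2  # illustrative diversity factor
x, y = 4, 6

# Plain duplication: both runs hit the same permanent fault in the same way.
run1 = program(x, y, faulty_add)
run2 = program(x, y, faulty_add)
duplication_detects = (run1 != run2)

# Diverse data: the second run works on K*x and K*y, so the same fault
# corrupts a different value and the K-scaled comparison fails.
diverse = program(K * x, K * y, faulty_add)
diversity_detects = (diverse != K * run1)
```

Here the two identical runs agree on the same wrong answer, while the diverse run disagrees with K times the original result, which exposes the fault.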

7
New Program
  • If we alter the input to the program, we must
    alter the program to work with this modified
    input.
  • The transformation is given the constant k
    (called the diversity factor) and it creates
    the k-factor diverse program.
  • The new program will have the same control flow
    graph as the old program but all the variables
    will be k-multiples of the original ones.

8
Transformations
  • If k < 0, branches flip directions (> → <, ≥ → ≤).
  • All constants in code get multiplied by k.
  • Addition and subtraction of variables: unchanged.
  • Multiplication: v1·v2·...·vn →
    (v1·v2·...·vn)/k^(n-1)
  • Division: v1/v2 → (v1/v2)·k
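
The rules above can be sketched by hand on a toy integer program. This is only an illustration, not the authors' transformation tool: the functions `original` and `k_diverse` and the choice K = -2 are assumptions made for the example, and integer floor division stands in for division.

```python
K = -2  # diversity factor (illustrative)

def original(x, y):
    z = x * y + 6          # multiplication plus a constant
    if z > 20:             # branch on a constant
        return z // 3      # division by a constant
    return z - x           # subtraction

def k_diverse(xk, yk, k=K):
    """Same control flow; every variable holds k times its original value."""
    # Product of two variables is divided by k^(2-1); the constant is scaled.
    zk = (xk * yk) // k + 6 * k
    # The branch constant is scaled; for k < 0 the comparison flips.
    taken = zk < 20 * k if k < 0 else zk > 20 * k
    if taken:
        # Divisor constant is scaled, then the quotient is multiplied by k.
        return (zk // (3 * k)) * k
    return zk - xk         # addition and subtraction are unchanged

# Fault-free check: the diverse result equals K times the original result.
agree = k_diverse(K * 4, K * 5) == K * original(4, 5)
```

On fault-free hardware the comparison holds for every input, so any mismatch between the two results points to an error in one of the runs.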

9
Fault Detection Probability
  • For functional unit hi (such as the adder), fault
    f and diversity factor k:
  • Xi: the set of inputs to hi.
  • Ei: the subset of Xi containing the inputs that
    result in erroneous output due to the fault.
  • E'i: the subset of Ei that escapes detection.
  • Ci(k): the probability of catching an error in hi.

10
Data Integrity Probability
  • For functional unit hi, fault f and diversity
    factor k:
  • Xi: the set of inputs to hi.
  • Ei: the subset of Xi containing the inputs that
    result in erroneous output due to the fault.
  • E'i: the subset of Ei that escapes detection.
  • Di(k): the probability of missing no errors in hi.
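
The slide formulas did not survive in this transcript, so the sketch below rests on an assumed reading of the definitions with all inputs equally likely: Ci(k) = 1 - |E'i|/|Ei| and Di(k) = 1 - |E'i|/|Xi|. It enumerates every input of a small bus with one stuck-at line; the 8-bit width and the helper names `stuck_at` and `detection_metrics` are hypothetical.

```python
def stuck_at(word, bit, value):
    """Force one bit of an unsigned word to 0 or 1 (stuck-at bus line)."""
    mask = 1 << bit
    return word | mask if value else word & ~mask

def detection_metrics(k, bit, value, width=8):
    """Return (C, D) for one stuck-at fault under the assumed formulas."""
    modulus = 1 << width
    half = modulus >> 1

    def to_signed(w):
        return w - modulus if w >= half else w

    inputs = erroneous = escaped = 0
    for x in range(-half, half):
        if not -half <= k * x < half:
            continue  # skip inputs whose k-multiple would overflow
        inputs += 1
        seen = to_signed(stuck_at(x % modulus, bit, value))
        seen_k = to_signed(stuck_at((k * x) % modulus, bit, value))
        if seen != x:                 # x belongs to E
            erroneous += 1
            if seen_k == k * seen:    # comparison passes anyway: x is in E'
                escaped += 1
    c = 1 - escaped / erroneous if erroneous else 1.0
    d = 1 - escaped / inputs
    return c, d
```

With k = 1 (plain duplication) every erroneous input escapes, so C is 0; with k = -1 and bit 0 stuck at 1, no erroneous input escapes in this model.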

11
Choosing the value of k
  • For some functional units we can derive Ci(k) and
    Di(k) analytically for each k.
  • This is too hard in general so we resort to
    trying out a range of k's empirically to
    determine Ci(k) and Di(k).

12
Bus Signal Line
  • Bus wire stuck at either 0 or 1.
  • Derived results for a 12-bit bus

13
Adder
  • Experimental results for a 12-bit ripple carry
    adder
  • Experimental results for a 12-bit carry
    look-ahead adder

14
Multiplier and Divider
  • Experimental results for:
  • 12-bit array multiplier
  • 8-bit Wallace Tree multiplier
  • SRT divider

15
Shifter
  • Experimental Results for 16-bit multiplexer-based
    shifter

16
Using Benchmarks to pick k
  • Need to determine how much each functional unit
    is used in the average program.
  • Add, sub, mult and shift use the obvious
    functional units.
  • Memory access uses the memory bus.
  • Branch uses a carry-lookahead adder.

17
Benchmarked Data Integrity
  • Calculated data integrity Di(k) given the above usage
    statistics (high Di(k) is the top priority).
  • Highlighted columns provide the best data
    integrity for each benchmark.

18
Benchmarked Detection Probability
  • Calculated detection probability Ci(k) given the
    above usage statistics.
  • Highlighted columns provide the best detection
    probability for each benchmark.

19
Optimum k
  • Selecting the optimum k:
  • Must maximize the data integrity Di(k).
  • Given maximum Di(k), maximize Ci(k).
  • For each program, should get an estimate of how
    it uses the different functional units and pick k
    accordingly.
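
The selection rule can be sketched as a two-level maximization. Everything numeric below is an invented placeholder, not a result from the presentation, and weighting each functional unit by its share of executed instructions is an assumption about how the per-benchmark figures were combined. The names `unit_metrics`, `usage`, `program_metrics` and `pick_k` are hypothetical.

```python
# Placeholder per-unit metrics {unit: {k: (C, D)}}. Invented values.
unit_metrics = {
    "adder": {-2: (0.90, 0.99), 3: (0.95, 0.98)},
    "bus":   {-2: (0.80, 0.97), 3: (0.70, 0.99)},
}

# Placeholder fraction of instructions that exercise each unit.
usage = {"adder": 0.75, "bus": 0.25}

def program_metrics(k):
    """Usage-weighted (C, D) for one program (assumed combination rule)."""
    c = sum(usage[u] * unit_metrics[u][k][0] for u in usage)
    d = sum(usage[u] * unit_metrics[u][k][1] for u in usage)
    return c, d

def pick_k(candidates):
    """Maximize data integrity D first, then detection probability C."""
    def rank(k):
        c, d = program_metrics(k)
        return (d, c)
    return max(candidates, key=rank)

best_k = pick_k([-2, 3])
```

Ranking on the tuple (D, C) makes C matter only when two candidates tie on D, which matches the priority order stated on the slide.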

20
Dealing with Overflow
  • By multiplying all variables by k, we may cause
    them to overflow.
  • Can scale variables up to the next larger type.
  • Alternatively, scale variables down by dividing by
    k; then only the higher-order bits are checked when
    comparing new results to results of the original program.
  • Can use compile-time range checking to determine
    vulnerability to overflow and pick k accordingly.
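
A minimal sketch of the range-checking idea, assuming two's-complement integer types and a known value range for a variable. The helpers `fits` and `narrowest_type` and the list of candidate widths are hypothetical.

```python
def fits(value_range, k, bits):
    """True if k times every value in the range fits a signed `bits`-bit type."""
    lo, hi = value_range
    type_min = -(1 << (bits - 1))
    type_max = (1 << (bits - 1)) - 1
    scaled = (k * lo, k * hi)  # a negative k swaps the two ends
    return type_min <= min(scaled) and max(scaled) <= type_max

def narrowest_type(value_range, k, widths=(8, 16, 32, 64)):
    """Smallest width that holds the k-scaled variable, or None if none does."""
    for bits in widths:
        if fits(value_range, k, bits):
            return bits
    return None

# A variable known to stay in [-100, 100] fits 8 bits,
# but its k = -2 multiple needs the next larger type.
needed_bits = narrowest_type((-100, 100), -2)
```

If no wider type is available, the same check can be used the other way round: try several candidate values of k and keep only those for which `fits` succeeds.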

21
Floating Point Numbers
  • Above technique fails for floating point numbers.
  • IEEE 754 format
  • k = -2 will only change the sign bit and some bits
    in the exponent.
  • Solution pick separate k's for the exponent and
    the mantissa and run the program once with each
    k.
  • Overhead: 200%.
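
A short sketch of why a power-of-two factor leaves the mantissa untouched, using Python's `struct` module to expose the IEEE 754 single-precision fields. The helper `float_fields` and the sample value 5.75 are illustrative.

```python
import struct

def float_fields(x):
    """Split a value stored as IEEE 754 single precision into its fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa

x = 5.75
original_fields = float_fields(x)       # sign, biased exponent, mantissa bits
scaled_fields = float_fields(-2 * x)    # same value multiplied by k = -2

# Only the sign and the exponent differ; the mantissa bits are identical,
# so a fault in the mantissa corrupts both runs in the same way.
mantissa_unchanged = original_fields[2] == scaled_fields[2]
```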

22
Picking k for the mantissa
  • To find errors in the mantissa, pick k to be 3/2.
  • A stuck-at-1 fault
  • In original program, variable x's value corrupted
    to
  • In the transformed program, the mantissa must be
    < 2, so if multiplying by k pushes it to 2 or above,
    the mantissa is right shifted by 1 and
    normalized.

23
Transformed variables
  • So now, the value in transformed program is
  • Value in original program is

24
Fault Detection in Mantissa
  • If there is a stuck-at-1 fault
  • Value in transformed program
  • Value in original program × k (for checking)

25
We can detect Mantissa errors!
  • Note that the error values for the original and
    the transformed programs are different!
  • We actually use a negative k in order to flip the
    sign bit for improved detection capability.
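
The mantissa argument can be tried out with a simulated fault. This is a sketch under assumptions: the fault forces one stored bit to 1 in IEEE 754 single precision, the check compares the transformed result against k times the original result, and the helpers `stuck_at_1` and `fault_detected` as well as the sample value are hypothetical.

```python
import struct

def stuck_at_1(x, bit):
    """Store x as IEEE 754 single precision with one bit forced to 1."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits | (1 << bit)))[0]

def fault_detected(x, k, bit):
    """Compare the faulty k-diverse value with k times the faulty original."""
    original = stuck_at_1(x, bit)         # value seen by the original program
    transformed = stuck_at_1(k * x, bit)  # value seen by the k-diverse program
    return transformed != k * original    # a mismatch reveals the fault

x = 5.75
MANTISSA_MSB = 22  # most significant stored mantissa bit

missed_with_minus_two = not fault_detected(x, -2.0, MANTISSA_MSB)
caught_with_three_halves = fault_detected(x, 1.5, MANTISSA_MSB)
caught_with_negative_three_halves = fault_detected(x, -1.5, MANTISSA_MSB)
```

For this input, k = -2 lets the fault through because both runs carry the same mantissa, while a factor of 3/2 (with either sign) changes the mantissa bits and makes the two corrupted values disagree.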

26
k for exponents
  • In order to flip all the bits of the exponent,
    need to transform the program to use two different
    values of k.
  • If a fault invalidates a bit of the exponent, the
    fault will be detected by comparing to the
    exponents of one of the two transformed programs.

27
Effectiveness for Mantissa
  • Effectiveness of k
  • (for IEEE 754 single precision)

28
Effectiveness for Exponent
  • Effectiveness of k
  • (for IEEE 754 single precision)

29
Summary
  • ED4I effectively detects Byzantine Failures in
    numerical applications on uniprocessors.
  • Purely software solution using Data Diversity.
  • Detects permanent and temporary faults.
  • Works with fixed-point and floating point
    numbers.
  • Compatible with arithmetic and logical operations
    (probably with any bitwise logical operation if
    it can be recast into arithmetic)
  • High overhead: 100% or 200%.