SWIFT: Software Implemented Fault Tolerance George A. Reis, Jonathan Chang, Neil Vachharajani, Ram Rangan, David I. August Princeton University International Symposium on Code Generation and Optimization CGO

1
SWIFT: Software Implemented Fault Tolerance
George A. Reis, Jonathan Chang, Neil Vachharajani,
Ram Rangan, David I. August
Princeton University
International Symposium on Code Generation and
Optimization (CGO '05)
  • Nihan Özman - 2005700452

2
Outline
  • Introduction
  • Prior Work
  • Software Fault Detection
  • Control Flow Checking
  • SWIFT
  • Implementation Details
  • Evaluation
  • Conclusion

3
Introduction
  • In recent decades, microprocessor performance has
    been increasing exponentially due to
  • smaller and faster transistors with low threshold
    voltages
  • tighter noise margins enabled by improved
    fabrication technology
  • While these devices yield performance
    enhancements, they will
  • be less reliable
  • make processors that use them more susceptible to
    transient faults

4
Properties of Transient Faults
  • Known as soft errors
  • Unlike manufacturing or design faults, they do not
    occur consistently.
  • Caused by external events
  • such as energetic particles striking the chip
  • do not cause permanent physical damage to the
    processor
  • alter signal transfers or stored values and thus
    cause incorrect program execution

5
Hardware Solutions for Transient Faults
  • To counter transient faults, designers typically
    introduce redundant hardware
  • Some storage structures, such as caches and
    memory, include error correcting codes (ECC) and
    parity bits; the redundant bits can be used to
    detect or correct the fault.
  • Combinational logic within the processor can be
    protected by duplication; the outputs of the
    duplicated combinational logic blocks can be
    compared to detect faults.

6
Examples of Advanced Hardware Solutions
  • High-availability systems need more hardware
    redundancy than ECC and parity bits provide.
    Examples:
  • IBM has added additional logic within its
    mainframe processors for fault tolerance.
  • During the design of the S/390 G5, IBM fully
    replicated the processor's execution units to
    avoid various performance pitfalls of its
    previous fault tolerance approach.
  • Fujitsu used a form of error protection that
    includes ALU parity generation and a
    multiply/divide residue check.
  • Boeing designed its 777 aircraft system with
    three different processors and data busses using
    a majority voting scheme to achieve both fault
    detection and recovery.

7
Disadvantages of Hardware Solutions
  • Too expensive for many processor markets,
    including highly price-competitive desktop and
    laptop markets.
  • Processors in these markets may have ECC or
    parity in the memory subsystem, but they
    certainly do not possess double- or
    triple-redundant execution cores.
  • Transient faults in both memory and combinational
    logic will need to be addressed in all aggressive
    processor designs, not only in high-availability
    applications.

8
Proposed Software Solution
  • To achieve redundancy and fault tolerance, a
    software-based, single-threaded approach, SWIFT,
    is proposed.
  • It performs fault detection in a manner
    compatible with most reporting and recovery
    mechanisms (can be easily extended to incorporate
    complete fault tolerance)
  • It is a compiler-based transformation that
  • duplicates the instructions in a program
  • inserts comparison instructions at strategic
    points during code generation.

9
Desirable Features of Software Solution
  • The technique does not require any hardware
    changes.
  • The compiler is free to make use of slack in a
    program's schedule to minimize performance
    degradation.
  • Programmers are free to vary the transient fault
    policy within a program.
  • A compiler-orchestrated relationship between the
    duplicated instructions allows for simple methods
    to deal with exception handling, interrupt
    handling, and shared memory.

10
Improvements of SWIFT
  • Requires no hardware beyond ECC in memory
    subsystem
  • Eliminates the need to double the memory
    requirement by acknowledging the use of ECC in
    caches and memory
  • Increases protection at no additional performance
    cost by introducing a new control-flow checking
    mechanism
  • Reduces performance overhead by eliminating
    branch validation code made unnecessary by this
    enhanced control-flow mechanism.
  • Performs better than all known single-threaded
    full software detection techniques.
  • Deployable in both uniprocessor and
    multiprocessor environments (methods to deal with
    exception-handling, interrupt-handling, shared
    memory programs)

11
Implementation of SWIFT
  • SWIFT can be implemented on any architecture and
    can protect individual code segments to varying
    degrees.
  • A full program implementation running on Itanium
    2 is evaluated.
  • In experiments, SWIFT demonstrates
  • exceptional fault-coverage with a reasonable
    performance cost
  • a 14% average speedup compared to the best known
    single-threaded approach utilizing an ECC memory
    system

12
Prior Work: Hardware-Based Redundancy
  • Mahmood and McCluskey proposed using a watchdog
    processor to compare and validate the outputs
    against the main running processor.
  • Austin proposed DIVA, which uses a main,
    high-performance, out-of-order processor core
    that executes instructions and a second, simpler
    core that validates the execution.
  • The Compaq NonStop Himalaya is a real system
    implementation that replicates part or all of the
    processor and uses checkers to validate the
    redundant computations.
  • Rotenberg expanded the SMT (Simultaneous
    MultiThreading) redundancy concept with AR-SMT
    (Active Stream/Redundant Stream Simultaneous
    Multithreading).

13
Prior Work: Hardware-Based Redundancy
  • Reinhardt and Mukherjee proposed simultaneous
    Redundant MultiThreading (RMT), which increases
    the performance of AR-SMT and compares redundant
    streams before data is stored in memory.
  • Mukherjee proposed a Chip-level Redundantly
    Threaded multiprocessor (CRT)
  • Gomaa expanded the CRT approach with CRTR to
    enable recovery.
  • Ray proposed modifying an out-of-order
    superscalar processor's microarchitectural
    components to implement redundancy.
  • All HW-based approaches require the addition of
    new hardware logic to meet redundancy
    requirements.

14
Comparison of Various Redundancy Approaches

15
Prior Work: Software-Based Redundancy
  • Software-only approaches to redundancy require no
    additional hardware, so they come essentially
    free of cost
  • Oh and McCluskey proposed a novel software
    redundancy approach (EDDI: Error Detection by
    Duplicated Instructions) wherein all
    instructions are duplicated and appropriate
    check instructions are inserted for validation
  • Oh et al. developed a pure software control-flow
    checking scheme (CFCSS) wherein each control
    transfer generates a run-time signature that is
    validated by error checking code generated by the
    compiler for every block
  • Venkatasubramanian et al. proposed Assertions for
    Control Flow Checking (ACFC), which assigns an
    execution parity to each basic block and detects
    faults based on parity errors.

16
Prior Work: Software-Based Redundancy
  • A sphere of replication (SoR) is the logical
    domain of redundant execution.
  • SWIFT
  • makes several key refinements to EDDI
  • incorporates a software-only signature-based
    control-flow checking scheme to achieve
    exceptional fault coverage
  • The main difference between EDDI and SWIFT:
  • EDDI's SoR includes the entire processor core and
    the memory subsystem
  • SWIFT moves memory out of the SoR (memory
    structures are already well protected by hardware
    schemes like parity and ECC, with or without
    scrubbing)

17
Comparison of Various Redundancy Approaches

18
Software Fault Detection
  • This section explains
  • the foundation of SWIFT
  • extending EDDI with control-flow checking using
    software signatures
  • the novel extensions that comprise SWIFT
  • The following assumptions apply
  • a Single-Event Upset (SEU) fault model, in which
    exactly one bit is flipped throughout the entire
    program.
  • the memory subsystem, including processor caches,
    is already adequately protected using techniques
    like parity and ECC
  • the transformations are used to detect faults
    (efficacious and cost-effective fault detection
    is of primary concern)

19
EDDI
  • Software-only fault detection system
  • Operates by duplicating program instructions and
    using this redundant execution to achieve fault
    tolerance.
  • Program instructions
  • duplicated by the compiler
  • intertwined with the original program
    instructions
  • Each copy of the program uses different registers
    and different memory locations so that the copies
    do not interfere with one another.
  • Check instructions are inserted at certain
    synchronization points by the compiler
  • to ensure that the original instructions and
    their redundant copies agree on the computed
    values.

20
EDDI
  • Program correctness is defined by the output of a
    program
  • Assuming memory-mapped I/O, a program has
    executed correctly if all stores in the program
    have executed correctly.
  • Two types of instructions should be used as
    synchronization points for comparing redundant
    values
  • Store instructions
  • Branch instructions (misdirected branches can
    cause stores to be skipped, incorrect stores to
    be executed, or incorrect values to ultimately
    feed a store)

21
EDDI Fault Detection
  • 1: The load from a global constant address is
    duplicated
  • 2: The add instruction is duplicated (to create a
    redundant chain of computation)
  • 3, 4: The store's operands are compared to their
    redundant copies.
  • 5: If any difference is detected, an error is
    reported
  • 6: If no difference is detected, the stores are
    executed to non-conflicting addresses.
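
The numbered steps above can be sketched as a toy Python model. The register-style variable names, the dictionary used as memory, the `SHADOW_OFFSET` constant, and the `FaultDetected` exception are illustrative assumptions that do not come from the presentation; real EDDI code is emitted by the compiler as machine instructions.

```python
class FaultDetected(Exception):
    """Raised when an original value and its redundant copy disagree."""


SHADOW_OFFSET = 1000  # hypothetical distance between a location and its shadow copy


def eddi_load_add_store(memory, global_addr, r13, r23, flip_bit=None):
    """Toy model of an EDDI-style duplicated load/add/compare/store sequence."""
    r12 = memory[global_addr]                   # original load
    r22 = memory[global_addr + SHADOW_OFFSET]   # step 1: duplicated load
    if flip_bit is not None:
        r12 ^= 1 << flip_bit                    # simulate a transient fault in r12
    r11 = r12 + r13                             # original add
    r21 = r22 + r23                             # step 2: duplicated add
    if r11 != r21 or r12 != r22:                # steps 3, 4: compare the store operands
        raise FaultDetected("store operands disagree")  # step 5: report the error
    memory[r11] = r12                           # original store
    memory[r21 + SHADOW_OFFSET] = r22           # step 6: store to the shadow location
    return r11
```

Calling the function with `flip_bit` set models a single transient fault in the original copy; the comparison then raises before either store executes.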

22
EDDI Fault Detection
  • An optimizing compiler (or dynamic hardware
    scheduler) is free to schedule the instructions
    to use additional available ILP (minimizing the
    performance penalty of the transformation).
  • Two different types of redundancy are exploited
  • Temporal Redundancy
  • The redundant duplicates are executed
    sequentially
  • Computes the same data value at two different
    times, usually on the same hardware
  • Spatial Redundancy
  • The redundant duplicates are executed in parallel
  • Computes the same data value in two different
    pieces of hardware, usually at the same time

23
Eliminating the Memory Penalty
  • EDDI is able to effectively detect transient
    faults at the cost of significant memory
    overhead.
  • Each memory location needs a shadow memory
    location for use with redundant duplicate. This
    duplication incurs
  • a significant hardware cost
  • a significant performance cost (since cache sizes
    are effectively halved and additional memory
    traffic is created)
  • The paper proposes eliminating the use of two
    distinct memory locations for all memory values,
    which eliminates the duplicate store
    instructions. (Duplicate load instructions are
    still necessary.)
  • Modifications will not reduce the fault detection
    coverage of the system, but will make the
    protected code execute more efficiently and
    require less memory.

24
Eliminating the Memory Penalty
  • EDDI with the memory penalty eliminated is
    referred to as
  • EDDI+ECC

25
Control Flow Checking
  • EDDI also suffers from incomplete protection for
    control flow faults.
  • A program's control flow can be erroneously
    misdirected without detection. The corruption can
    happen
  • during the execution of the branch
  • during register corruption after branch check
    instructions
  • due to a fault in the instruction pointer update
    logic
  • To make EDDI more robust, additional checks can
    be inserted to ensure control flow is being
    transferred properly

26
Control Flow Checking
  • EDDI+ECC with control flow validation is referred
    to as
  • EDDI+ECC+CF

27
Control Flow Checking
  • Each block is assigned a signature in order to
    verify that control is transferred to the
    appropriate control block.
  • GSR (General Signature Register), a designated
    general purpose register, holds the signatures
    and is used to detect faults.
  • The procedure works as follows
  • GSR contains the signature of the currently
    executing block
  • Upon entry to any block, GSR is xored with a
    statically determined constant to transform the
    previous block's signature into the current
    block's signature
  • After the transformation, GSR can be compared to
    the statically assigned signature for the block
    to ensure that a legal control transfer has
    occurred

28
Control Flow Checking
  • Using a statically determined constant forces
    two blocks which both jump to a common block (a
    control flow merge) to share the same signature
  • undesirable, since faults which transfer control
    to or from blocks that share the same signature
    will go undetected.
  • A run-time adjusting signature can be used
    instead
  • it is assigned to another designated register
  • at entry of a block, this signature, GSR, and the
    predetermined constant are all xored together to
    form the new GSR
  • It can be different depending on the source of
    the control transfer, so it can be used to
    compensate for differences in signatures between
    source blocks
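
The signature scheme with a run-time adjusting signature can be illustrated with a small Python sketch. The diamond-shaped control flow graph, the signature values, and the names `SIG`, `DIFF`, `ADJUST`, and `path_is_valid` are all hypothetical choices made for this example; they are not taken from the presentation.

```python
# Hypothetical control flow graph: A -> B, A -> C, B -> D, C -> D.
# D is a control flow merge with two predecessors.
SIG = {"A": 0b0001, "B": 0b0010, "C": 0b0100, "D": 0b1000}

# Static constant per block: signature of one chosen predecessor xor own signature.
DIFF = {
    "B": SIG["A"] ^ SIG["B"],
    "C": SIG["A"] ^ SIG["C"],
    "D": SIG["B"] ^ SIG["D"],   # computed as if D were always entered from B
}

# Run-time adjusting signature set by each predecessor of the merge block D.
# It compensates for the difference between the predecessors' signatures.
ADJUST = {"B": 0, "C": SIG["B"] ^ SIG["C"]}


def path_is_valid(path):
    """Follow `path` (block names, starting at A) and check every signature."""
    gsr = SIG[path[0]]
    adjust = 0
    for block in path[1:]:
        gsr ^= DIFF[block]          # transform the previous signature
        if block == "D":
            gsr ^= adjust           # merge block also xors the adjusting signature
        if gsr != SIG[block]:       # compare against the static signature
            return False            # illegal control transfer detected
        adjust = ADJUST.get(block, 0)
    return True
```

In this sketch both legal paths into `D` pass the check even though `B` and `C` keep distinct signatures, while an illegal jump such as `A` directly to `D` leaves the wrong value in `gsr`.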

29
Control Flow Checking
  • 1, 2: Redundant duplicates for the add and compare
    instructions
  • 3 to 7: Compare the predicate p11 to its
    redundant duplicate p21 and branch to error code
    if a fault is detected
  • 8: Transforms the GSR from the previous block to
    the signature for this block (the control flow
    additions begin here)
  • 9, 10: Ensure that the signature is correct
    (otherwise error code is invoked)
  • 11 to 13: Handle the synchronization point
    induced by the later store instruction

30
Control Flow Checking
  • The transformation
  • detects any fault that causes a control transfer
    between two blocks that should not jump to one
    another (which yields incorrect signatures even
    if the erroneous transfer jumps to the middle of
    a basic block)
  • ensures only that the control flow is diverted to
    the taken or untaken path
  • does not ensure that the correct direction of the
    conditional branch is taken
  • The base EDDI transformation
  • provides reasonable guarantees (the branch's
    input operands are verified prior to its
    execution)
  • does not detect faults that occur during the
    execution of a branch instruction and influence
    the branch direction

31
Enhanced Control Flow Checking
  • To extend fault detection coverage to cases where
    branch instruction execution is concerned, an
    enhanced control flow checking transformation is
    proposed
  • EDDI+ECC+CFE
  • similar for blocks using run-time adjusting
    signatures
  • increases the reliability of the control flow
    checking
  • The enhanced mechanism uses a dynamic equivalent
    of a run-time adjusting signature for all blocks,
    even those that are not control flow merges
  • Each block asserts its target using this
    signature and each target confirms the transfer
    by checking GSR.
  • This signature combined with the GSR serves as a
    redundant duplicate for the program counter.

32
Enhanced Control Flow Checking
  • 1, 2: Redundant duplicates for the add and compare
    instructions
  • The synchronization check before the branch
    instruction is omitted
  • 3: Computes the run-time signature (RTS) for the
    target of the branch by xoring the signature of
    the current block with the signature of the
    target block
  • The branch is predicated, so the assignment to RTS
    is predicated using the redundant duplicate of
    the predicate register
  • 4: Equivalent of 3 for the fall-through control
    transfer

33
Enhanced Control Flow Checking
  • 5: Xors RTS with the GSR to compute the signature
    of the new block, at the target of a control
    transfer
  • 6: Compares the signature from 5 with the
    statically assigned signature
  • 7: Error code is invoked if there is a mismatch
    in 6
  • 8, 9: Implement the synchronization checks for
    the store instruction
  • 10: Error code is invoked if there is a mismatch
    in 8 or 9
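
A minimal Python sketch of the enhanced check, under stated assumptions: the three block names, their signature values, and the function `branch_with_rts` are hypothetical, and the two predicate arguments stand in for the original predicate register and its redundant duplicate.

```python
SIG = {"entry": 0b001, "taken": 0b010, "fall": 0b100}  # hypothetical signatures


def branch_with_rts(p_original, p_redundant):
    """Model one conditional branch guarded by enhanced control flow checking.

    p_original drives the real branch; p_redundant drives the RTS assignment.
    Returns the name of the block reached, or raises on a signature mismatch.
    """
    gsr = SIG["entry"]
    # Steps 3, 4: assert the intended target using the redundant predicate.
    if p_redundant:
        rts = SIG["entry"] ^ SIG["taken"]
    else:
        rts = SIG["entry"] ^ SIG["fall"]
    # The branch itself is decided by the original predicate.
    target = "taken" if p_original else "fall"
    # Step 5: at the target, fold RTS into GSR.
    gsr ^= rts
    # Steps 6, 7: compare with the target's static signature.
    if gsr != SIG[target]:
        raise RuntimeError("control flow fault detected")
    return target
```

If a fault makes the two predicates disagree, the branch goes one way while RTS asserts the other, so the check at the target fails. This is how the sketch captures a wrong branch direction, which the simpler signature scheme would miss.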

34
Enhanced Control Flow Checking
  • Even if a branch is incorrectly executed, the
    fault will be detected since the RTS register
    will have the incorrect value
  • more robustly protects against transient
    faults
  • The EDDI+ECC+CF control flow checking
  • ensures that execution is transferred to a valid
    control block
  • does not ensure that the correct conditional
    control path is taken
  • The enhanced control flow checking detects this
    case by
  • Dynamically updating the target signature based
    on the redundant conditional instructions (3)
  • Checking at the beginning of each control block
    (5, 6, 7)

35
SWIFT
  • The following optimizations applied to the
    EDDI+ECC+CFE transformation comprise SWIFT
  • Control flow checking at blocks with stores
  • Redundancy in branch/control flow checking

36
Control Flow Checking at Blocks with Stores
  • Only store instructions ultimately send data out
    of the SoR
  • they should execute only if they are meant to
  • they should write the correct data to the correct
    address
  • This observation can be used to restrict enhanced
    control flow checking only to blocks which have
    stores in them.
  • the updates to GSR and RTS are performed in all
    blocks
  • signature comparisons are restricted to blocks
    with stores (any deviation from valid control
    flow path to that point will be detected before
    memory and output is corrupted)
  • signature check instructions are removed from
    blocks without stores (SCFopti)
  • By this optimization, performance is increased
    and static size is reduced for no reduction in
    reliability.

37
Redundancy in Branch/Control Flow Checking
  • Branch checking: ensures branches are taken in
    the proper direction
  • Enhanced control flow checking: ensures all
    control transfers are made to the proper address
  • Verifying all control flow subsumes the notion of
    branching in the right direction
  • By removing branch checking (BRopti)
  • performance and static size overhead are reduced
  • there is no reduction in reliability

38
Undetected Errors: Points of Failure
  • Redundancy is introduced solely via software
    instructions
  • there is a delay between validation and use of
    the validated register values
  • any strikes during this gap might corrupt state
  • bit flips in store address or data registers
    during this gap are uncaught
  • incorrect store value or address → incorrect
    writes going outside the SoR → incorrect program
    execution
  • An instruction opcode can be changed to a store
    instruction by a transient fault
  • The compiler never saw this instruction, so such
    stores are unprotected
  • The store will be free to execute and its value
    will leave the SoR

39
Multibit Errors
  • Code transformations are less effective at
    detecting multibit faults, which can cause
    problems
  • when the same bit is flipped in both the original
    and redundant computation (Case 1)
  • when a bit is flipped in either the original or
    redundant computation and the comparison is also
    flipped such that it does not branch to the error
    code (Case 2)
  • These patterns of multibit errors are unlikely
    enough to be safely ignored
  • A dual-upset fault model, wherein two faults are
    injected into each program with a uniformly
    random distribution, is used in the probability
    calculations

40
Probability of Multibit Errors: Case 1
  • The same bit is flipped in both the original and
    redundant computation
  • Assumption
  • The same fault must occur in the same bit of the
    same instruction for the fault to go undetected
  • Probability of Error
  • Probability of that particular instruction being
    chosen (the average SPEC benchmark has on the
    order of 10^9 to 10^11 dynamic instructions)
  • times
  • Probability of a particular bit being chosen
    (64-bit registers)
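
The product described above can be worked through numerically. This is only a back-of-the-envelope sketch: the function name is hypothetical, and it assumes the second fault is drawn uniformly over dynamic instructions and over the 64 bits of a register, as the slide states.

```python
def case1_undetected_probability(dynamic_instructions, register_bits=64):
    """Chance that a second uniformly random fault hits the same bit of the
    matching instruction, so both copies are corrupted identically."""
    p_instruction = 1 / dynamic_instructions   # that particular instruction is chosen
    p_bit = 1 / register_bits                  # that particular bit is chosen
    return p_instruction * p_bit


for n in (10**9, 10**11):
    print(f"{n} dynamic instructions: {case1_undetected_probability(n):.2e}")
```

For 10^9 dynamic instructions the product is 1/(64 × 10^9), roughly 1.6 × 10^-11, and it shrinks by two further orders of magnitude at 10^11 instructions.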

41
Probability of Multibit Errors: Case 2
  • A bit is flipped in either the original or
    redundant computation and the comparison is also
    flipped such that it does not branch to the error
    code
  • Assumption
  • There is only one comparison for every possible
    fault
  • Probability of Error
  • P(error_comparison | error_original) = 1 /
    #instructions
  • This is a gross overestimate because in reality,
    there may be many checks on a faulty value.

42
Implementation Details
  • Details specific to the implementation and
    deployment of SWIFT
  • Different options for calling convention
  • Implementations on multiprocessor systems
  • The effects of using an ISA with predication (IA-64)

43
Function Calls
  • Function calls are treated as synchronization
    points
  • Before a function call, all input operands are
    checked against their redundant copies
  • if they mismatch, a fault is detected
  • otherwise the original versions are passed as
    parameters to the function
  • At the beginning of the function, parameters must
    be duplicated
  • into redundant and original versions
  • Only one version of the return value is returned
    (it must be duplicated into redundant versions
    for the remaining redundant code to function)
  • Adds performance overhead and introduces
    vulnerability
  • Faults that occur on the parameters after the
    checks by caller and before the duplication by
    the callee will not be caught

44
Function Calls
  • The calling convention should be altered
  • to pass multiple sets of computed arguments to a
    function
  • to return multiple return values from a function
  • Arguments passed in the registers need to be
    duplicated, not the ones in memory (memory is
    outside the SoR)
  • Multiple return values require that an extra
    register be reserved for the replicated return
    value
  • additional pressure of twice as many input and
    output registers
  • fault detection is preserved across function
    calls

45
Shared Memory, Interrupts, and Exceptions
  • When multiple processes communicate with each
    other using shared memory, the compiler cannot
    possibly enforce an ordering of reads and writes
    across processes.
  • There is always the possibility of intervening
    writes from other processes, so the two loads of
    a duplicated pair are not guaranteed to return
    the same value
  • this does not reduce the fault coverage of the
    system in any way
  • it does increase the detected fault count by
    reporting faults that would not have caused a
    failure
  • The situation is similar when an interrupt or
    exception occurs between the two loads of a
    duplicated pair and the interrupt or exception
    handler changes the contents at the load address

46
Shared Memory, Interrupts, and Exceptions
  • Hardware Solutions
  • Safe hardware-based load value duplication
    techniques (the Active Load Address Buffer (ALAB)
    or the Load Value Queue in RMT machines) adapted
    to a SWIFT system (costly)
  • No Duplication for Loads
  • The compiler performs only one load (instead of
    two) and duplicates the loaded value for the
    consumers in the original and redundant versions
  • Removes redundancy from the load execution

47
Shared Memory, Interrupts, and Exceptions
  • Dealing with Potentially-Excepting Instructions
  • The compiler knows a priori that certain
    instructions may raise exceptions, and enforces a
    schedule in which pairs of loads are not split
    across them
  • Prevents most exceptions from being raised
    between the two versions of a load instruction
  • Keeps redundancy in load execution
  • Asynchronous signals and interrupts cannot be
    handled this way and still need one of
  • Hardware solution
  • Single-load solution

48
Logical Masking from Predication
  • Consider the branch br r1 != r2
  • if, in the absence of a fault, the branch were to
    be taken, then even after a strike to either r1
    or r2 the condition would most likely still hold
    true
  • the error can be safely ignored
  • Logical masking
  • allows the fault detection mechanism to be less
    conservative in detecting errors
  • reduces the overall count of falsely detected
    unrecoverable faults
  • Special checks are needed to check logical
    masking
  • Predicated architectures naturally provide logical
    masking
  • In the IA-64 ISA, conditional branches are
    executed based on a predicate value computed by
    prior predicate-defining instructions (no
    validation before them)
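
The masking effect for a branch on `r1 != r2` can be checked by brute force over all single-bit flips. This sketch is illustrative only; the function name and the 64-bit register width are assumptions.

```python
def masked_flips(r1, r2, bits=64):
    """Count single-bit flips of r1 that leave the outcome of `r1 != r2`
    unchanged, i.e. flips that are logically masked by the comparison."""
    outcome = r1 != r2
    return sum(((r1 ^ (1 << k)) != r2) == outcome for k in range(bits))
```

When the two registers differ in more than one bit, every one of the 64 possible flips of `r1` is masked. When they differ in exactly one bit, only the flip of that bit changes the branch outcome. When the registers are equal, no flip is masked, because any change makes them unequal.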

49
Evaluation - Performance
  • A pre-release version of OpenIMPACT compiler
    (modified to add redundancy) targeted at Intel
    Itanium 2 processors running RedHat Advanced
    Workstation 2.1 with 4Gb
  • A version was created for each of the
  • EDDI+ECC+CFE
  • SWIFT techniques
  • Versions are also created with each of the
    specific optimizations removed, to see the
    effects individually
  • SWIFT-SCFopti to analyze the control-flow
    checking only at blocks with stores
  • SWIFT-BRopti to analyze branch checking
    optimization

50
Evaluation - Performance
  • Compilers used to evaluate techniques on
    benchmarks
  • SPEC CINT2000, SPEC FP2000, SPEC CINT95,
    MediaBench
  • Executions compared against binaries generated by
    the original OpenIMPACT compiler (have no fault
    detection)
  • The fault detection code was inserted into the
    low level code immediately before register
    allocation and scheduling
  • Optimizations that would have interfered with the
    duplicated and detection code, such as Common
    Subexpression Elimination, were modified to
    respect the fault detecting code

51
Evaluation - Performance
  • Normalized execution times
  • EDDI+ECC+CFE: geometric mean of 1.62
  • normalized to IMPACT binaries with no fault
    detection
  • does comparisons of the values used before every
    branch
  • SWIFT: geometric mean of 1.41
  • Indicates that the methods are exploiting the
    unused processor resources present during the
    execution of the baseline program
  • The optimization due to control flow checking
    accounts for the difference

52
Evaluation - Performance
  • Normalized IPC
  • EDDI+ECC+CFE: geometric mean of 1.53
  • the additional branch checks enable more
    independent work and increase IPC
  • SWIFT: geometric mean of 1.48
  • Scheduling both versions of the program together
    enables a normalized IPC of about 1.5 (compared
    with non-detecting executions)

53
Evaluation - Performance
  • Static sizes of the binaries normalized to a
    baseline with no detection ability
  • EDDI+ECC+CFE: 2.83x larger
  • SWIFT: 2.40x larger (does not generate the extra
    instructions eliminated by the optimizations)
  • Control block optimization reduces static size by
    2%, branch checking optimization by 13%
  • Techniques duplicate all instructions except for
    NOPs, stores, and branches, and then insert
    detection code

54
Evaluation - Performance
  • Light-grey region: fraction of total dynamic
    instructions that are NOPs
  • The dynamic instruction counts normalized to the
    baseline
  • EDDI+ECC+CFE: geometric mean of 2.73
  • SWIFT: geometric mean of 2.23
  • follows the same trend as the static binary size
    numbers
  • however, grows disproportionately to the static
    binary size increases
  • programs spend, on balance, more of their
    execution time in branch-heavy or store-heavy
    routines.

55
Evaluation - Fault Detection
  • Pin is used to instrument binaries
  • Binaries are profiled to determine how many times
    each static instruction is executed
  • The libc rand function is used to select the
    number of faults to insert into the program
  • The fault injection rate per dynamic instruction
    is normalized (exactly one fault per run on
    baseline builds)
  • For each fault, a number between zero and the
    number of total dynamic instructions is chosen
  • This number is used to choose a static
    instruction using the weights from the profile
  • A specific instance of this static instruction in
    the dynamic instruction stream to instrument is
    chosen
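
The selection procedure above can be sketched in a few lines of Python. The `profile` dictionary (static instruction to execution count), the function name, and the use of Python's `random.Random` in place of the libc `rand` function are all assumptions made for illustration.

```python
import bisect
import itertools
import random


def pick_fault_site(profile, rng):
    """Pick a dynamic instruction uniformly and map it to a static instruction.

    profile maps each static instruction to its execution count.
    Returns (static_instruction, instance_index_within_that_instruction).
    """
    instrs = list(profile)
    cumulative = list(itertools.accumulate(profile[i] for i in instrs))
    n = rng.randrange(cumulative[-1])          # number in [0, total dynamic instructions)
    idx = bisect.bisect_right(cumulative, n)   # static instruction whose range contains n
    start = cumulative[idx] - profile[instrs[idx]]
    return instrs[idx], n - start              # which dynamic instance to instrument


site = pick_fault_site({"ld": 3, "add": 1, "st": 6}, random.Random(0))
```

Because the draw is uniform over dynamic instructions, a static instruction that executes more often is proportionally more likely to be chosen, which matches the profile-weighted selection the slide describes.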

56
Evaluation - Fault Detection
  • In the experiments, only the following registers
    are modified
  • general purpose registers
  • floating point registers
  • predicate registers
  • Dynamic instruction is instrumented as follows
  • one of the outputs of the instruction chosen at
    random
  • a random bit of this output register is flipped
  • predicates are considered 1-bit entities
  • execution continues normally and result is
    recorded
  • execution output is also recorded and compared
    against known good outputs
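
The bit-flip step can be modeled as follows. This is a sketch under assumptions: the function name is hypothetical, register contents are represented as plain integers, and `random.Random` stands in for whatever random source the instrumentation actually used.

```python
import random


def flip_random_bit(value, width, rng):
    """Flip one uniformly chosen bit of a `width`-bit register value.

    Returns the corrupted value and the index of the flipped bit.
    """
    bit = rng.randrange(width)
    return value ^ (1 << bit), bit


rng = random.Random(0)
corrupted_gpr, gpr_bit = flip_random_bit(0x1234, 64, rng)   # general purpose register
corrupted_pred, pred_bit = flip_random_bit(1, 1, rng)       # predicate: a 1-bit entity
```

Treating a predicate as a 1-bit entity means its only possible fault is an inversion, so `corrupted_pred` here is 0.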

57
Evaluation - Fault Detection
  • Correct: both the execution and the check are
    successful
  • Segfault: the execution fails due to
  • SIGSEGV: segmentation fault due to the access of
    an illegal address
  • SIGILL: illegal instruction due to the
    consumption of a NaT bit generated by a faulting
    speculative load instruction
  • SIGBUS: bus error due to unaligned access
  • Fault Detected: the execution fails due to a
    fault being detected
  • Incorrect: runs that do not satisfy any of the
    above
58
Evaluation - Fault Detection
  • NOFT: no fault tolerance (incorrect results up to
    50% of the time)
  • EDDI+ECC+CFE and SWIFT
  • detect all but the most pathological single-upset
    faults → detect all of the faults which yield
    incorrect outputs
  • some faults which would have resulted in
    segfaults are detected before the segfault can
    actually occur → the number of segfaults is
    reduced
  • despite fault injection into every run, binaries
    still ran successfully, though with lower rates
    of success (the number of injected faults is
    higher for builds with fault detection due to
    their higher dynamic instruction count)

59
Evaluation - Fault Detection
  • EDDI+ECC+CFE and SWIFT are a bit overzealous
    in fault detection
  • big difference in the correct rates (when
    compared to NOFT)
  • in 129.compress, the rate of correct execution is
    1% (NOFT: 63%)
  • the balance largely being made up of faults
    detected

60
Evaluation - Fault Detection
  • Reasons for the fall in correct execution
  • a large portion of execution time is spent
    initializing a hash table (orders of magnitude
    larger than the input)
  • many of the stores are superfluous (they do not
    affect the output)
  • SWIFT nevertheless detects faults on these stores
    (it cannot know a priori whether or not the
    output will depend on them)

61
Evaluation - Fault Detection
  • There is a statistically significant difference
    between the fault detection rate of EDDI+ECC+CFE
    (1) versus SWIFT (2)
  • Faults are injected on the extra comparison
    instructions generated by (1)
  • a fault on these instructions always generates a
    fault detected
  • whereas
  • a fault on the general population of instructions
    has a nonzero probability of generating correct
    output (or a segmentation fault) instead of a
    fault detected
  • SWIFT has a slightly lower fault detection rate
    because SWIFT binaries do not have the extra
    comparison instructions to fault on
  • Larger segmentation fault rate in SWIFT binaries
  • Large number of speculative loads in EDDI+ECC+CFE
    binaries (from the large number of branches
    around which the compiler must schedule loads)
  • An injected fault
  • causes a segmentation fault in (2)
  • in (1), it inserts a NaT bit on the output
    register; this bit is checked at the comparison
    code and detected as a fault (rather than a
    segfault)

62
Conclusion
  • Detection of most transient faults can be
    accomplished without the need for specialized
    hardware.
  • SWIFT is introduced
  • the best performing single-threaded
    software-based approach for full out fault
    detection
  • exploits unused instruction-level parallelism
    resources to efficiently manage fault detection
  • realizes a performance increase through enhanced
    control flow checking (validation points are
    unnecessary)
  • achieves a 14% speedup
  • SWIFT can be integrated into production compilers
    to provide fault detection on today's commodity
    hardware

64
  • Questions?

65
  • Thank You