Design of High Availability Systems and Networks: Software Fault Tolerance

1
Design of High Availability Systems and Networks
Software Fault Tolerance
Ravi K. Iyer
Center for Reliable and High-Performance Computing
Department of Electrical and Computer Engineering and Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
iyer@crhc.uiuc.edu
http://www.crhc.uiuc.edu/DEPEND
2
Outline
  • Motivation for software fault tolerance
  • N-Version programming
  • Recovery blocks
  • IBM server example
  • Process pairs
  • Robust data structures

3
Motivation for Software Fault Tolerance
  • Usual method of software reliability is fault
    avoidance using good software engineering
    methodologies
  • Large and complex systems ⇒ fault avoidance not
    successful
  • Redundancy in software needed to detect, isolate,
    and recover software failures
  • Use static redundancy or dynamic redundancy
  • Hardware fault tolerance easier to assess
  • Software is difficult to prove correct

    HARDWARE FAULTS                    SOFTWARE FAULTS
 1. Faults time-dependent              Faults time-invariant
 2. Duplicate hardware detects         Duplicate software not effective
 3. Random failure is main cause       Complexity is main cause
4
Consequences of Software Failure
  • General Accounting Office reports $4.2 million
    lost annually due to software errors
  • Launch failure of Mariner I (1962)
  • Destruction of French satellite (1988)
  • Problems with Space Shuttle and Apollo missions
  • STAR WARS (SDI) funding billions of dollars for
    correct software development
  • AT&T blockages (error in recovery-recognition
    software) (1990)
  • SS7 (signaling system) protocol implementation:
    untested patch (mistyped character) (1997)
  • Therac-25 (overdose of medical radiation: 1000s
    of rads in excess of prescribed dosage)

5
Experiences with Current Software
  • Many computer crashes are due to software
  • Even though one expects software to be correct,
    it never is
  • Software exhibits fairly constant failure
    frequency
  • Number of failures is correlated with:
  • Execution time
  • Code density
  • Software timing, synchronization points

6
Experiences with Current Software (cont.)
Key parameters and variables (with defect reintroduction)

 Parameter                            Symbol   Value      Unit
 Defect Detection Time Constant       s        17.2       Weeks
 Defect Repair Time Constant          t        4.7        Weeks
 Code Delivery                                 589810     Lines
 Initial Error Density                ?        0.00387    Defects per Line
 Defect Reintroduction Rate           ?        33         Percent
 Deployment Time                      T        Week 100
 Estimated Remaining Defects          ERDT     664        Defects
 Estimated Current Defects            ECDT     445        Defects
 Testing Process Quality              TPQT     90         Percent
 Testing Process Efficiency           TPET     60         Percent
7
Difficulties
  • Improvements in software development
    methodologies reduce the incidence of faults,
    yielding fault avoidance
  • Need for test and verification
  • Formal verification techniques, such as proof of
    correctness, can be applied to rather small
    programs
  • Potential exists of faulty translation of user
    requirements
  • Conventional testing is hit-or-miss. "Program
    testing can show the presence of bugs but never
    show their absence" (Dijkstra, 1972).
  • There is a lack of good fault models.

8
Approaches to Software Fault Tolerance
  • ROBUSTNESS: The extent to which software
    continues to operate despite introduction of
    invalid inputs.
  • Example: 1. Check input data
  • => ask for new input
  • => use default value and raise flag
  • 2. Self-checking software
  • FAULT CONTAINMENT: Faults in one module should
    not affect other modules.
  • Example: Reasonableness checks
  • Watchdog timers
  • Overflow/divide-by-zero detection
  • Assertion checking
  • FAULT TOLERANCE: Provides uninterrupted
    operation in presence of program fault through
    multiple implementations of a given function
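
The "check input data, use default value and raise flag" idea can be sketched as follows. This is only an illustration: the function name `read_speed`, the default, and the valid range are assumptions, not part of the slides.

```python
def read_speed(raw, default=0.0, low=0.0, high=300.0):
    """Return (value, flag); flag is True when the default was substituted.

    Hypothetical robustness check: invalid or unreasonable input never
    propagates, and the caller is told that a default was used.
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return default, True          # not a number: use default, raise flag
    if not (low <= value <= high):    # reasonableness check (also rejects NaN)
        return default, True
    return value, False
```

A caller that prefers the "ask for new input" option can simply loop until the returned flag is False.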

9
N-Version Programming: Basic Model
[Figure: The N-version software (NVS) model with n = 3, producing consensus results]
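
A minimal sketch of the NVS idea, assuming the decision algorithm is a simple majority vote over hashable results. The function name `n_version_execute` and the choice to treat a crashed version as "no vote" are assumptions made for illustration.

```python
from collections import Counter


def n_version_execute(versions, x):
    """Run every version on x and return the strict-majority result."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            pass  # a crashed version contributes no vote
    if not results:
        raise RuntimeError("no version produced a result")
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= len(versions):    # need more than half of all versions
        raise RuntimeError("no consensus among versions")
    return value


# Three hypothetical versions of "square"; the third one is faulty.
versions = [lambda v: v * v, lambda v: v ** 2, lambda v: v + v]
```

With these versions, the faulty third version is outvoted for most inputs, while an input on which all three disagree raises an error instead of returning a wrong value.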
10
Recovery Blocks: Basic Model
[Figure: The Recovery Block (RB) model. Labels: Execution Environment (EE); j-th recovery block software unit; input xi; Alternate 1, Alternate 2, ...; result xij; acceptance test (yes: accepted results; no: take next alternate); recovery cache; execution support functions]
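
The recovery block model can be sketched in a few lines, assuming the protected state is a dictionary and that a full copy stands in for the recovery cache. The names `recovery_block`, `buggy_sort`, `safe_sort`, and `ordered_with_length` are hypothetical.

```python
import copy


def recovery_block(state, alternates, acceptance_test):
    """Try each alternate in order until one passes the acceptance test."""
    checkpoint = copy.deepcopy(state)          # stand-in for the recovery cache
    for alternate in alternates:
        try:
            alternate(state)
            if acceptance_test(state):
                return state                   # accepted results
        except Exception:
            pass                               # a crash counts as a failed test
        state.clear()                          # roll back before the next alternate
        state.update(copy.deepcopy(checkpoint))
    raise RuntimeError("all alternates failed the acceptance test")


def buggy_sort(state):                         # primary alternate with a fault
    state["data"] = state["data"][:-1]


def safe_sort(state):                          # slower but trusted alternate
    state["data"] = sorted(state["data"])


def ordered_with_length(n):
    """Cheap acceptance test: right length and non-decreasing order."""
    def test(state):
        d = state["data"]
        return len(d) == n and all(a <= b for a, b in zip(d, d[1:]))
    return test
```

The acceptance test here does not prove the output is a correct sort; it only checks inexpensive necessary conditions, which matches the cost/complexity trade-off discussed for acceptance tests.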
11
Execution Models for Software Fault-Tolerance
Approaches
[Flowchart: Recovery blocks]
  • Start software execution
  • Primary alternate execution; end primary alternate execution
  • Acceptance test execution
  • Acceptance test passed: acceptable result provided; end software execution
  • Acceptance test not passed: alternate selection
  • Alternate i selected (i = 2, ..., N): alternate i execution; end alternate i execution; acceptance test again
  • No alternate any more available: no acceptable result provided; failed software

[Flowchart: N-version programming]
  • Start software execution
  • Version 1, Version 2, ..., Version N execution; end execution of each version
  • Gathering versions' results
  • Start decision algorithm execution; decision algorithm execution
  • Acceptable result provided: end software execution
  • No acceptable result provided: failed software
12
Execution Models for Software Fault-Tolerance
Approaches (cont.)
[Flowchart: N self-checking programming]
  • Start software execution
  • Self-checking component 1, 2, ..., N execution
  • Each component ends with either "acceptable result provided" or "no acceptable result"
  • Result selected: end software execution
  • No result selected: failed software
13
Software Fault-Tolerance Approaches and Their
Equivalent Hardware Counterparts
  • RB is equivalent to stand-by sparing (or
    passive dynamic redundancy) in HW fault-tolerant
    architectures
  • NVP is equivalent to N-modular redundancy (static
    redundancy) in HW fault-tolerant architectures
  • NSCP is equivalent to active dynamic redundancy
  • A self-checking component results either from:
  • The association of an acceptance test with a
    version
  • The association of two variants with a comparison
    algorithm
  • Fault tolerance is provided by the parallel
    execution of N ≥ 2 self-checking components

14
Concepts of N-Version Programming
  • N ≥ 2 versions of functionally equivalent
    programs
  • Independent generation of programs ⇒ carried
    out by N groups of individuals who do not talk to
    each other with respect to the programming process
    (different algorithms, different programming
    languages, translation)
  • Initial specification formally done in some
    formal spec. language
  • states unambiguously the functional requirements
  • leaves widest possible choice of implementation
  • By making the development process diverse it is
    hoped that the versions will contain diverse
    faults
  • The inventors of NVP emphasized that
  • the definition of NVP has never postulated an
    assumption of independence and that NVP is a
    rigorous process of software development

15
Assumption of Independence in N-Version
Programming
  • Do the N versions of a program fail
    independently? Are faults unrelated?
  • Does Prob(failure of N-version system) =
    [Prob(failure of one version)]^N ?
  • If so, then the system reliability can be very
    high
  • Why might this assumption be false?
  • People make the same mistakes, e.g., incorrect
    treatment of boundary conditions
  • Some parts of a problem are more difficult than
    others
  • Statistics show similarity in programmers' view
    of difficult regions
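
As a worked form of the independence question, assume (hypothetically) that every version fails independently with the same probability p per execution. The all-versions-fail case and the 2-of-3 majority-vote case then read:

```latex
\[
P(\text{all } N \text{ versions fail}) = p^{N},
\qquad
P(\text{2-of-3 majority vote fails}) = 3p^{2}(1-p) + p^{3}.
\]
\[
p = 10^{-3}:\quad 3p^{2}(1-p) + p^{3}
  = 2.997\times 10^{-6} + 10^{-9} \approx 3\times 10^{-6}.
\]
```

These figures hold only under independence; correlated faults between versions make the real failure probability higher.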

16
Observation from Experiments
  • Assumption of independence of failures of
    versions DOES NOT hold
  • This does not mean N-version programming is
    useless
  • The reliability of the system will not be as high
    as in the case when the faults in different
    versions are independent
  • Example: PODS (Project on Diverse Software)
  • All faults were caused by omissions and
    ambiguities in the requirement specifications
  • Two common faults were found in two versions
  • Three different versions of software with failure
    rates 1.5 × 10^-6, 0.8 × 10^-3, and 0.8 × 10^-3
    resulted in a failure rate of 0.8 × 10^-3 after
    majority voting
  • The common/coincident faults could not be
    excluded by majority voting

17
Limitation of N-Version Programming
  • All N versions originate from the same initial
    specifications, whose correctness, completeness,
    and unambiguity should be assumed
  • Use formal correctness proofs on specs, rather
    than proofs on implementations
  • Exhaustive validation
  • Based on an assumption that software faults are
    distinguishable
  • faults that will cause disagreement between
    versions at specified voting points might be a
    result of independent programming efforts to
    remove identical software defects

18
Concepts of Recovery Blocks
  • Characteristics
  • Incorporates general solution to the problem of
    switching to spare
  • Explicitly structures a software system so that
    extra software for spares and error detection
    does not reduce system reliability
  • First considered for a single sequential process;
    later extended to:
  • Multiple processes within one system
  • Multiple processes in multiple systems =>
    distributed recovery blocks
  • Can view progress as a sequence of basic
    operations: assignments to stored variables
  • Structured program has BLOCKS of code to simplify
    understanding of the functional description
  • Choose blocks as units for error detection and
    recovery.

19
Alternates
  • Primary alternate is the one that is to be used
    normally
  • Other alternates attempt less desirable options
  • One source of alternates is earlier release of
    primary alternates
  • Gracefully degraded alternates
  • E.g., ensure consistent sequence (S)
  • by extend S with (i)
  • else by concatenate to S
  • else by S (empty sequence)
  • else error

20
Acceptance Tests
  • Function: ensure that the operation of recovery
    blocks is satisfactory
  • Should access variables in the program, NOT local
    to the recovery block, since these cannot have
    effect after exit. Also, different alternates
    use different local variables.
  • Need not check for absolute correctness -
    cost/complexity trade-off
  • Run-time overheads should be LOW
  • NO RESIDUAL EFFECTS should be present, since
    variables, if updated, might result in passing of
    successive alternates

21
Restoration of System State
  • Restoring system state is automatic
  • Taking a copy of entire system state on entry to
    each recovery block is too costly
  • Use Recovery Caches or Recursive Caches
  • When a process is to be backed up, it is to a
    state just before entry to primary alternate
  • Only NONLOCAL variables that have been MODIFIED
    have to be reset
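
A minimal sketch of a recovery cache, assuming the nonlocal variables live in a dictionary and that all writes go through the cache. The class name `RecoveryCache` and its methods are hypothetical; only the first write to each variable saves the old value, so the cost is proportional to what was modified rather than to the whole state.

```python
_MISSING = object()   # marks a variable that did not exist on entry


class RecoveryCache:
    def __init__(self, variables):
        self.variables = variables    # dict of NONLOCAL variables
        self.saved = {}               # original values of MODIFIED variables only

    def write(self, name, value):
        if name not in self.saved:    # save the old value on first modification
            self.saved[name] = self.variables.get(name, _MISSING)
        self.variables[name] = value

    def restore(self):
        """Back up to the state just before entry to the primary alternate."""
        for name, old in self.saved.items():
            if old is _MISSING:
                del self.variables[name]
            else:
                self.variables[name] = old
        self.saved.clear()

    def commit(self):
        """Acceptance test passed: discard the saved values."""
        self.saved.clear()
```

Note that the sketch stores references, so a value mutated in place (for example a list) would need to be copied before being saved.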

22
Process Conversations
  • A systematic methodology of extending recovery
    blocks across processes by taking process
    interactions into consideration (considers
    time/space)
  • Prevents domino effect

[Figure: timelines of processes P1 and P2, each with marks (X) along it]
23
Process Conversations (cont.)
  • Recovery block spanning two or more processes is
    called a conversation
  • Within a conversation, processes communicate
    among themselves, NOT with others
  • Operations of a conversation
  • Within a conversation, communication is only
    among participants, not external
  • On entry, a process establishes a checkpoint
  • If an error is detected by any process, then all
    processes restore their checkpoints
  • Next, ALL processes execute their available
    alternates
  • All processes leave the conversation together
    (perform their acceptance tests just prior to
    leaving)
  • At the end of the conversation, ALL processes
    must satisfy their respective acceptance tests,
    and none may proceed otherwise

24
Nested Conversations
[Figure: nested conversations. Legend: checkpoint, inter-process communication, acceptance test, conversation boundary]
25
Comparison of Recovery Blocks vs. N-Version
Programming
  • Advantages of Recovery Block
  • Most software systems evolve by replacement of
    some modules by new ones - can be used as
    alternates
  • Nice hierarchical design - structured approach
  • Disadvantages of Recovery Block
  • System state must be saved before entry to
    recovery block -- excessive storage
  • Difficult to handle multiple processes -- might
    have domino effect
  • Difficult to undo effects in real-time systems
  • Effectiveness of acceptance test
  • Higher coverage is more complex
  • Lack of formal method to check

26
Comparison of Recovery Blocks vs. N-Version
Programming (cont.)
  • Advantages of N-Version Programming
  • Immediate masking of software faults -- no delay
    in operation
  • Self-checking (acceptance tests) not required
  • Conventional fault-tolerant systems (HW and SW)
    have redundant hardware, e.g., TMR (easier to
    include N-version software on redundant hardware)
  • Disadvantages of N-Version Programming
  • How to get N-versions?
  • Impose design diversity, since randomness does
    not give uncorrelated software faults
  • Extremely dependent on input specifications
    (formal correctness proofs)

27
High-Availability System Design: IBM Mainframe
30xx
28
IBM 30xx Simplified System Model
[Figure: simplified system model. Components: expanded storage, central memory, system controller, processor controller, CPUs, power distribution and cooling, channel control, channel adapters and servers]
29
Fault Isolation Using Hardware Checkers
  • Error checker placement determined by Fault
    Isolation Domains (FID)
  • Checkers define the boundary of fault
    containment

If checker 2 is triggered and if register C is
input to register B ⇒ implicated set of FRUs is 3
and 4
[Figure: FRU 1 through FRU 5 with Checker 1, Checker 2, Checker 3, a cable, and a decoder. Legend: red = Fault Isolation Domain; blue = Field Replaceable Unit]
30
Mapping of Fault Isolation Domains to Field
Replaceable Units
 Function          FID   FRU   Syndrome
 Memory array 1    1     1     C1
 Register A        1     1     C1
 Checker 1         1     1     C1
 Drivers           2     1     C2
 Cable             2     2     C2
 Memory array 2    2     3     C2
 Register B        2     3     C2
 Checker 2         2     3     C2
 Register C        2     4     C2
 Decoder           3     5     C3
 Checker 3         3     5     C3
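
Reading the mapping above as rows of (function, FID, FRU, syndrome), fault isolation becomes a table lookup: the checker that fired names a syndrome, and the syndrome names the candidate FRUs. The sketch below is illustrative; the names `FID_TABLE` and `implicated_frus` are assumptions.

```python
# (function, fault isolation domain, field replaceable unit, syndrome)
FID_TABLE = [
    ("Memory array 1", 1, 1, "C1"),
    ("Register A",     1, 1, "C1"),
    ("Checker 1",      1, 1, "C1"),
    ("Drivers",        2, 1, "C2"),
    ("Cable",          2, 2, "C2"),
    ("Memory array 2", 2, 3, "C2"),
    ("Register B",     2, 3, "C2"),
    ("Checker 2",      2, 3, "C2"),
    ("Register C",     2, 4, "C2"),
    ("Decoder",        3, 5, "C3"),
    ("Checker 3",      3, 5, "C3"),
]


def implicated_frus(syndrome):
    """Return the sorted FRUs that may need replacement for a syndrome."""
    return sorted({fru for _, _, fru, s in FID_TABLE if s == syndrome})
```

The lookup gives the widest candidate set for a checker; extra knowledge about the active data path, as in the register C example, can narrow it further.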
31
IBM 30XX Data Path Overview
[Figure: IBM 30XX data path. Components: expanded storage and central storage (each with a storage controller with hardware-assisted memory tester), system controller, processor controller, CPU (cache, instruction fetch/decode, instruction execution, vector execution, control storage), channel control element, channel adapter, channel server. Paths are marked ECC or P. Legend: P = parity; LSSG = Logic Support Station Group]
32
Hardware-Based Retry
Instruction Execution
Errors are detected by parity checks on
register contents and on data buses and by
pattern validity checks in control logic
circuits.
[Flowchart: hardware-based retry]
  • Instruction execution; operands go into retry buffers
  • Error detected? No: execution continues
  • Yes: instruction and execution elements freeze execution (stop on error and restore operands)
  • Communicate back to processor controller through LSS
  • Get instructions/data from retry buffers
  • Test: retry permitted?
  • Yes: instruction retry; restart execution
  • No retry or threshold crossed: signal OS for SW recovery
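
A software analogy of the retry loop, offered only as a sketch: operands are saved in a retry buffer, a detected error triggers re-execution from the buffer, and crossing a retry threshold escalates to software recovery. The exception classes, the function name, and the default threshold are all hypothetical.

```python
class DetectedError(Exception):
    """Stands in for an error raised by a hardware checker."""


class SoftwareRecoveryNeeded(Exception):
    """Stands in for signaling the OS for software recovery."""


def execute_with_retry(instruction, operands, retry_threshold=3):
    retry_buffer = tuple(operands)           # operands saved before execution
    for _ in range(1 + retry_threshold):     # first attempt plus retries
        try:
            return instruction(*retry_buffer)
        except DetectedError:
            continue                         # restore operands and retry
    raise SoftwareRecoveryNeeded("retry threshold crossed")
```

Retry of this kind helps only with transient errors; a permanent fault fails every attempt and ends in escalation.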
33
Checkers in The Central Processor
  • Byte parity on data path registers
  • Parity checks on input/output of adders
  • Parity on microstore
  • Parity on microstore addresses
  • Encoder/decoder checks
  • Single-bit error detection in cache for data
    received from memory
  • Additional illegal pattern checks

34
Levels of Error Recovery
[Flowchart: system operation is interrupted by a machine check interruption, which is handled at four levels; each level is tried when the previous one is unsuccessful]
  • 1. Functional recovery: perform instruction retry. Successful: system continues
  • 2. System recovery: terminate affected task and continue system operation. Successful: system continues, task terminated
  • 3. System-supported restart: restart system operation, stop for repair not required. Successful: system reloaded
  • 4. System repair: stop, repair, restart. Notify operator; external repair
35
System Level Facilities for Error Detection and
Recovery
  • Installation error detection capability
  • Tools to build profiles of system software
    modules and inspect correct usage of system
    resources.
  • Software facilities to detect the occurrence of
    selected events, e.g., appendages allow user
    control of I/O; SLIP (serviceability level
    indication processing) aids in error detection
    and diagnosis (e.g., access to traps that cause a
    program interruption).
  • User-defined detection mechanisms detect
    programmer-defined exceptions, e.g., an incorrect
    address or an attempt to execute privileged instructions.
  • The operator detects evident error conditions,
    e.g., loop conditions, endless wait states
  • The data management and supervisor routines
    ensure valid data is processed and
    non-conflicting requests are made

36
Recovery Processing Overview: Handling Hardware
and Software Errors
[Figure: ABEND (Abnormal Termination), control, recovery termination manager, program, termination routines, retry routines, recovery routines]
37
IBM's S/390 G5 Microprocessor
  • A non-superscalar processor in IBM's CMOS
    technology
  • Four logical units
  • The L1-cache, or buffer control element (BCE),
    contains the cache data arrays, cache directory,
    translation-lookaside buffer (TLB), and address
    translation logic.
  • The I-unit handles instruction fetching,
    decoding, and address generation and contains
    the queue of instructions awaiting execution.
  • The E-unit contains the various execution units,
    along with the local working copy of the general,
    access, and floating point registers.
  • The R-unit is the recovery unit that holds a
    checkpointed copy of the entire microarchitected
    state of the processor

38
IBM G5 Microprocessor Recovery Support
  • R-unit
  • For every clock cycle in which the E-unit
    produces a result, that value is also written
    into the R-unit copy.
  • The R-unit checks whether the result is correct
    and then it generates ECC on that result.
  • The checkpointed result is written into the
    R-unit registers along with ECC.
  • The contents of R-unit registers represent the
    complete checkpointed state of the processor
    during any given cycle, should it be necessary to
    recover from a hardware error.
  • Millicode
  • Millicode is used to implement instructions that
    are either more complex or relatively
    infrequently used
  • The millicode has complete read/write access to
    all R-unit registers.
  • Millicode also performs various service functions:
    logging data associated with any hardware errors
    that may have occurred, scrubbing memory for
    correctable errors, supporting operator console
    functions, and controlling low-level I/O
    operations.

39
IBM G5 Microprocessor Recovery Support
  • Full duplication of the I-unit and E-unit.
  • On every clock cycle, signals coming from these
    units, including instruction results, are
    cross-compared in the R-unit and the L1-cache.
  • If the signals do not match, hardware error
    recovery is invoked.
  • All arrays in the L1-cache unit are protected
    with parity except for the store buffers, which
    are protected with ECC.
  • If the R-unit or L1-cache detects an error, the
    processor automatically enters an error recovery
    mode of operation.

40
IBM G5 Microprocessor Recovery Procedure
  1. The R-unit freezes its checkpoint state and does
     not allow any pending instructions to update it.
  2. The L1-cache forwards any store data to the L2
     for instructions that have already been
     checkpointed.
  3. All arrays in the L1 cache unit and the BTB are
     reset.
  4. Each R-unit register is read out in sequence,
     with ECC logic correcting any errors it may find,
     and the corrected values are written back into
     the register file and to all shadow copies of
     these registers in the I-unit, E-unit, and
     L1-cache.
  5. All R-unit registers are read a second time to
     ensure there are no solid correctable errors. If
     there are, the processor is check-stopped, i.e.,
     that chip is no longer available for system
  6. The E-unit forces a serialization interrupt,
     which restarts instruction fetching and
     execution.
  7. An asynchronous interrupt tells millicode to log
     trace array and other data for later analysis by
     IBM product engineering.
  • Two conditions may cause recovery to fail:
    an uncorrectable error during step 4, or another
    error occurring during step 6 before an
    instruction is successfully completed.
  • Both cases result in a check-stop condition.

41
IBM G5 Microprocessor System Recovery Features
  • System recovery features are used when the
    processor goes into a check-stopped state.
  • Processor availability facility (PAF).
  • The service element scans out the latches from
    the check-stopped processor and extracts the
    processor architectural state.
  • The data are stored in an area set aside for
    machine check interrupt.
  • The operating system uses the saved data to
    resume executing the job on another processor.
  • Concurrent processor sparing
  • Uses spare processors not visible to the user.
  • Upon a processor check-stop, the user can issue a
    command on the console that lets the operating
    system use one of the spare processors
  • Transparent processor sparing
  • Moves the microarchitected state (checkpointed in
    R-unit) of a failed processor to a spare
    processor in the system.
  • The spare processor begins fetching and executing
    instructions where the failed processor stopped.

42
Process Pairs
  • Applicability
  • Permanent and transient hardware and software
    failures
  • Loosely coupled redundant architectures
  • Message passing process communication
  • Well suited for maintaining data integrity in a
    transactional type of system
  • Can be used to replicate a critical system
    function or user application
  • Assumptions
  • Hardware and software modules are designed to
    fail fast, i.e., to rapidly detect errors and
    subsequently terminate processing
  • Errors can be corrected by re-executing the same
    software copy in changed environment

43
Process Pairs - Overview
  • The user application is replicated on two
    processors as primary and backup processes, i.e.,
    as process pairs
  • Normally, only the primary process provides
    service
  • The primary sends checkpoints to the backup
  • The backup can take over the function when the
    primary fails
  • The operating system halts the processor when it
    detects non-recoverable errors
  • The "I am alive" message protocol allows the
    other processors to detect the halt and to take
    over the primaries that were running on the
    halted processor

44
Process Pairs Mechanism in Tandem Guardian OS
  1. The application executes as Primary
  2. Primary starts a Backup in another processor
  3. Duplicated file images are also created
  4. Primary periodically sends checkpoint information
     to Backup
  5. Backup reads checkpoint messages and updates its
     data, file status, and program counter
     - the checkpoint information is inserted in the
       corresponding memory locations of the Backup
  7. Backup loads and executes if the system reports
     that Primary processor is down
     - the error detection is done by Primary OS, or
     - Primary fails to respond to "I am alive" message
  8. All file activities by Primary are performed on
     both the primary and backup file copies
  9. Primary periodically asks the OS if a Backup exists
     - if there is no Backup, the Primary can request
       the creation of a copy of both the process and
       file structure

[Figure: Primary and Backup processes, each on its own operating system. Labels: checkpoint (data, file status, PC); "Backup exists?"; "I am alive"; I/O; mirrored disks]
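
The checkpoint-and-takeover idea can be sketched in-process, assuming the "work" is summing a list and the checkpoint is a small dictionary holding data and a program-counter-like index. The classes `Primary` and `Backup` and the `fail_at` parameter are hypothetical, and real process pairs run on separate processors with message passing.

```python
class Backup:
    def __init__(self):
        self.state = {"total": 0, "next_index": 0}

    def receive_checkpoint(self, state):
        self.state = dict(state)              # copy data and position

    def take_over(self, work):
        """Resume from the last checkpoint after the primary halts."""
        state = dict(self.state)
        for item in work[state["next_index"]:]:
            state["total"] += item
            state["next_index"] += 1
        return state["total"]


class Primary:
    def __init__(self, backup, fail_at=None):
        self.backup = backup
        self.fail_at = fail_at                # index at which to simulate a halt

    def run(self, work):
        state = {"total": 0, "next_index": 0}
        for i, item in enumerate(work):
            if i == self.fail_at:
                raise RuntimeError("primary halted (fail-fast)")
            state["total"] += item
            state["next_index"] = i + 1
            self.backup.receive_checkpoint(state)   # checkpoint after each step
        return state["total"]
```

Checkpointing after every step keeps the backup exact but costs a message per step; a real system trades checkpoint frequency against the amount of work redone at takeover.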
45
Process Pairs Transaction
  • A major issue in the design of loosely coupled
    duplicated systems is how both copies can be kept
    consistent in the face of errors

[Table: actions of each participant over steps 1 to 6]
  • Requester: SeqNo = 0; issue request to write
    record; checkpoint results
  • Requester Backup: SeqNo = 0; SeqNo = 1
  • Server: SeqNo = 0; if SeqNo < MySeqNo, then
    return saved status; otherwise, read disk,
    perform operation, checkpoint request; write to
    disk; SeqNo = 1; checkpoint result; return results
  • Server Backup: SeqNo = 0; saves request; saves
    result; SeqNo = 1
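
The sequence-number rule (if SeqNo < MySeqNo, return the saved status) is what makes a retried request harmless. A minimal sketch, assuming an in-memory dictionary stands in for the disk and a second dictionary for the server backup; the class and attribute names are hypothetical.

```python
class Server:
    def __init__(self):
        self.my_seq_no = 0
        self.saved_status = None
        self.disk = {}
        self.write_count = 0
        self.backup = {}                       # stands in for the server backup

    def handle(self, seq_no, key, value):
        if seq_no < self.my_seq_no:
            return self.saved_status           # duplicate: do not redo the write
        self.backup["request"] = (seq_no, key, value)   # checkpoint request
        self.disk[key] = value                 # write to disk
        self.write_count += 1
        self.my_seq_no = seq_no + 1
        self.saved_status = ("ok", key, value)
        self.backup["result"] = self.saved_status       # checkpoint result
        return self.saved_status
```

If the requester fails after the write but before checkpointing the result, its backup reissues the request with the old sequence number and receives the saved status instead of causing a second write.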
46
Process Pairs: Advantages and Disadvantages
  • Advantages
  • Extremely successful in Tandem OLTP applications
  • Tolerates hardware, operating system, and
    application failures
  • High coverage (> 90%) of hardware and software
    faults
  • The backup does not significantly reduce the
    performance
  • Disadvantages
  • Necessity of error detection checks and signaling
    techniques to make a process fail-fast
  • Process pairs are difficult to construct for
    non-transaction-based applications

47
Robust Data Structures
  • The goal is to find storage structures that are
    robust in the face of errors and failures
  • What do we want to preserve?
  • Semantic integrity - the data meaning is not
    corrupted
  • Structural integrity - the correct data
    representation is preserved
  • Focus on techniques for preserving the structural
    integrity
  • A robust data structure contains redundant data
    which allow erroneous changes to be detected, and
    possibly corrected
  • a change is defined as an elementary (e.g., a
    single-word) modification to the encoded form of
    a data structure instance (its representation on
    a storage medium)
  • structural redundancy:
  • a stored count of the number of nodes in a
    structure instance
  • identifier fields
  • additional pointers

48
Robust Data Structures (cont.)
  • Consider data structure which consists of a
    header and a set of nodes
  • the header contains
  • pointers to certain nodes of the instance or to
    parts of itself
  • counts
  • identifier fields
  • a node contains
  • data items
  • structural information pointers and node type
    identifier fields
  • Error detection and correction
  • in-line checks may be introduced into normal
    system code to perform error detection and
    possibly correction, during regular operation

49
Linked Lists
  • Non-robust data structure
  • in each node store a pointer to the next node of
    the list
  • place a null pointer in the last node

[Figure: a header followed by nodes (data, next); the last node's next pointer is NULL]
0-detectable and 0-correctable: changing one
pointer to NULL can reduce any list to an empty list
50
Robust Data Structures: Single-Linked List
Implementation
  • Additions for improving robustness
  • an identifier field in each node
  • replace the NULL pointer in the last node by a
    pointer to the header of the list
  • a stored count of the number of nodes

[Figure: a header (H-ID, count = 3, next) followed by nodes (ID, data, next); the last node's next pointer leads back to the header]
  • 1-detectable and 0-correctable
  • a change to the count can be detected by comparing
    it against the number of nodes found by
    following pointers
  • a change to a pointer may be detected by a
    mismatch in the count, or because the new pointer
    points to a foreign node (which cannot have a
    valid identifier)
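
The detection rules above can be sketched as a single consistency check. This is an illustration under assumptions: the identifier values "H" and "N", the class names, and the convention that the count covers data nodes only are all choices made for the example.

```python
class Node:
    def __init__(self, ident, data=None):
        self.ident = ident
        self.data = data
        self.next = None


class RobustList:
    HEADER_ID = "H"
    NODE_ID = "N"

    def __init__(self, items=()):
        self.header = Node(self.HEADER_ID)
        self.header.next = self.header        # empty list points to itself
        self.count = 0
        tail = self.header
        for item in items:
            node = Node(self.NODE_ID, item)
            node.next = self.header           # last node points back to header
            tail.next = node
            tail = node
            self.count += 1

    def check(self):
        """Return True if identifiers, pointers, and count all agree."""
        seen = 0
        node = self.header.next
        while node is not self.header:
            if getattr(node, "ident", None) != self.NODE_ID:
                return False                  # NULL or foreign node reached
            if seen >= self.count:
                return False                  # more nodes than the stored count
            seen += 1
            node = node.next
        return seen == self.count
```

The check detects a single erroneous change but cannot tell which field is wrong, which is why the structure is 1-detectable yet 0-correctable.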

51
Robust Data Structures: Double-Linked List
Implementation
  • Additions for improving robustness
  • a pointer added to each node, pointing to the
    predecessor of the node on the list

[Figure: a header (H-ID, count = 3, next, previous) followed by nodes (ID, data, next, previous)]
2-detectable and 1-correctable: the data structure
has two independent, disjoint sets of pointers,
each of which may be used to reconstruct the
entire list
52
Error Correcting in Double-Linked List
  • Scan the list in the forward direction until an
    identifier field error or forward/backward
    pointer mismatch is detected
  • When this happens scan the list in the reverse
    direction until a similar error is detected
  • Repair the data structure

The forward scan detects a mismatch in Node B and sets
  Local_PtrB = B (local node's pointer)
  Next_PtrB = F (pointer to the next node)
The reverse scan detects a mismatch in Node C and sets
  Local_PtrC = C (local node's pointer)
  Back_PtrC = B (pointer to the previous node)
Correction: (Local_PtrB = Back_PtrC) ⇒ Next_PtrB = Local_PtrC,
i.e., (Next_PtrB = C)
[Figure: header (H-ID, count = 3) and nodes A, B, C (ID, data); node B's forward pointer holds F instead of C and leads to a foreign node whose fields are unknown (?)]
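
The forward-scan, reverse-scan, and repair steps can be sketched as follows, assuming a circular double-linked list with a header, identifier values "H" and "N", and exactly one corrupted forward pointer. All names are hypothetical.

```python
class Node:
    def __init__(self, ident, data=None):
        self.ident = ident
        self.data = data
        self.next = self
        self.prev = self


def build(items):
    header = Node("H")
    tail = header
    for item in items:
        node = Node("N", item)
        node.prev, node.next = tail, header
        tail.next = node
        header.prev = node
        tail = node
    return header


def _valid(node):
    return isinstance(node, Node) and node.ident in ("H", "N")


def _scan(header, step, back):
    """Walk via `step`; return the node where a mismatch shows, else None."""
    node = header
    while True:
        nxt = getattr(node, step)
        if not _valid(nxt) or getattr(nxt, back) is not node:
            return node                 # identifier error or pointer mismatch
        node = nxt
        if node is header:
            return None                 # full circle without any mismatch


def repair_forward_pointer(header):
    b = _scan(header, "next", "prev")   # forward scan stops at node B
    if b is None:
        return False                    # nothing to repair
    c = _scan(header, "prev", "next")   # reverse scan stops at node C
    if c is not None and c.prev is b:   # Local_PtrB == Back_PtrC
        b.next = c                      # Next_PtrB := Local_PtrC
        return True
    return False
```

A corrupted backward pointer would be repaired symmetrically; the sketch handles only the forward-pointer case shown on the slide.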
53
Robust Data Structures: Concluding Remarks
  • Commonly used techniques for supporting robust
    data structures
  • techniques which preserve structural integrity of
    data
  • binary trees, heaps, fifos, queues, stacks
  • linked data structures
  • content-based techniques
  • checksums, encoding
  • Limitations
  • not transparent to the application
  • best in tolerating errors which corrupt the
    structure of the data (not the semantic)
  • increased complexity of the update routines may
    make them error prone
  • erroneous changes to the data structure may be
    propagated by correct update routines
  • faulty update routines may provoke correlated
    erroneous changes to several fields