332:437 Lecture 2 Fault Tolerance Examples - PowerPoint PPT Presentation

Provided by: guofe

1
332:437 Lecture 2: Fault Tolerance Examples
  • Active Redundancy Techniques
  • Hardware for Active Redundancy Systems
  • Fault Tolerance Applications
  • Lucent Technologies 5 ESS
  • NASA Space Shuttle
  • Galileo Interplanetary Probe
  • Hardware Design Methodology
  • System Partitioning
  • Summary

Material from Design and Analysis of Fault-Tolerant
Digital Systems, by Barry W. Johnson,
Addison-Wesley Publishers.
2
Redundancy Techniques
  • Active: let the error happen and disrupt the system
  • Detect the error with test hardware
  • Reconfigure the system and restart
  • Example: communications satellite
  • TMR too expensive
  • Duplication with Comparison
  • Problems
  • Even in a fault-free system, digital words may not
    agree
  • Solution: ignore the K least significant bits
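
The comparison rule on this slide can be sketched in a few lines. This is only an illustration, assuming non-negative integer words; the function name `words_agree` and the sample values are hypothetical and do not come from the lecture.

```python
def words_agree(a: int, b: int, k: int) -> bool:
    """Compare two non-negative digital words while ignoring the k
    least significant bits (k is the 'K' from the slide)."""
    # Shifting right by k discards the k low-order bits of each word.
    return (a >> k) == (b >> k)


# Two redundant readings that differ only in their two lowest bits.
reading_a = 0b10110101
reading_b = 0b10110110

exact_match = words_agree(reading_a, reading_b, 0)     # strict comparison
tolerant_match = words_agree(reading_a, reading_b, 2)  # ignore 2 LSBs
```

Note that dropping low-order bits does not make all nearby values agree: 0b0111 and 0b1000 differ by one count but still compare unequal with k = 2, because the difference carries into the higher bits.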

3
Active Fault Tolerance
4
Active Redundancy Techniques
  • Standby Replacement/Sparing
  • Hot Standby Sparing: process control
  • Cold Standby Sparing: satellite
  • Needs time to power up and initialize the spare
  • Pair-and-a-Spare
  • Duplication with Comparison + Standby Sparing
  • Uses comparator error information
  • Disconnects the broken module and inserts the spare
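
A rough software model of pair-and-a-spare may help fix the idea. It assumes modules are plain functions and that some diagnostic can tell a good module from a bad one once the comparator reports a mismatch; the class `PairAndASpare` and the `is_healthy` callback are hypothetical names introduced only for this sketch.

```python
from typing import Callable, List

Module = Callable[[int], int]


class PairAndASpare:
    """Two modules run in parallel and are compared; on a mismatch the
    broken module is disconnected and a spare is switched in."""

    def __init__(self, pair: List[Module], spares: List[Module],
                 is_healthy: Callable[[Module], bool]) -> None:
        self.pair = list(pair)        # the two active modules
        self.spares = list(spares)    # standby modules
        self.is_healthy = is_healthy  # assumed diagnostic (hypothetical)

    def step(self, x: int) -> int:
        a, b = (m(x) for m in self.pair)
        if a == b:
            return a
        # Comparator reported an error: locate and replace the bad module.
        for idx, module in enumerate(self.pair):
            if not self.is_healthy(module):
                if not self.spares:
                    raise RuntimeError("no spare left")
                self.pair[idx] = self.spares.pop(0)
        a, b = (m(x) for m in self.pair)  # retry after reconfiguration
        if a != b:
            raise RuntimeError("pair still disagrees after sparing")
        return a


def good(x: int) -> int:
    return x + 1


def bad(x: int) -> int:
    return x + 2  # models a faulty module


system = PairAndASpare([good, bad], [good], lambda m: m(0) == 1)
result = system.step(5)
```

The retry after switching corresponds to the restart step of active redundancy: the error is allowed to occur, then the system reconfigures and repeats the operation.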

5
Redundancy Techniques
  • Hybrid: Active + Passive
  • N-modular Redundancy with Spares
  • Triple-Duplex
  • Active → Passive → Hybrid: increasing hardware cost

6
N-Modular Redundancy with Spares -- Hybrid
7
N-Modular Redundancy with Spares
System Inputs
8
Software Implementation of Duplication with
Comparison
9
Triple-Duplex Redundancy
10
Switch in Self-Purging System
11
Switch for Self-Purging Full Adder
12
Disagreement Detector Circuit
13
Sift-Out Modular Redundancy Unit
  • Collector combines output to produce system
    output
  • Contributions from faulty modules are ignored

14
Hardware to Identify Faulty Modules
  • Disagreement signals drive JK flip-flops
  • Disagreements identify faulty modules
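
The comparator, detector, and collector roles can be modeled in software as follows. This is a sketch under stated assumptions, not the circuit from the slides: a module is flagged when it disagrees with every other module still in service, the latched `faulty` list stands in for the flip-flops, and the class name `SiftOutUnit` is hypothetical.

```python
from itertools import combinations
from typing import List


class SiftOutUnit:
    def __init__(self, n_modules: int) -> None:
        self.n = n_modules
        # Latched fault flags; once set they stay set (flip-flop role).
        self.faulty = [False] * n_modules

    def step(self, outputs: List[int]) -> int:
        active = [i for i in range(self.n) if not self.faulty[i]]
        # Comparator: one disagreement signal per pair of active modules.
        disagree = {(i, j): outputs[i] != outputs[j]
                    for i, j in combinations(active, 2)}
        # Detector: with only two modules left, a mismatch cannot be
        # attributed to either one, so flags are set only for 3 or more.
        if len(active) > 2:
            for i in active:
                if all(disagree[(min(i, j), max(i, j))]
                       for j in active if j != i):
                    self.faulty[i] = True
        # Collector: ignore contributions from flagged modules.
        good = [i for i in range(self.n) if not self.faulty[i]]
        if not good:
            raise RuntimeError("all modules flagged as faulty")
        return outputs[good[0]]


unit = SiftOutUnit(3)
first = unit.step([1, 1, 0])   # module 2 disagrees with both others
second = unit.step([0, 0, 1])  # module 2 stays sifted out
```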

15
Triple-Duplex Architecture
  • Triplication for fault masking
  • Duplication with comparison detects faults
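
One way to picture how the two mechanisms combine: each of the three channels is a duplex pair, a pair whose halves disagree is excluded, and the remaining pair outputs are voted. The sketch below assumes exactly that arrangement; the function name `triple_duplex_output` is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def triple_duplex_output(pair_outputs: List[Tuple[int, int]]) -> int:
    """pair_outputs holds one (a, b) tuple per duplex pair."""
    # Duplication with comparison: keep only pairs whose halves agree.
    valid = [a for a, b in pair_outputs if a == b]
    if not valid:
        raise RuntimeError("every duplex pair reported a mismatch")
    # Vote over the surviving pairs. With an even split the result is
    # ambiguous; this sketch simply returns the first value seen.
    value, _count = Counter(valid).most_common(1)[0]
    return value


one_pair_faulty = triple_duplex_output([(1, 1), (1, 0), (1, 1)])
two_pairs_faulty = triple_duplex_output([(1, 1), (0, 1), (1, 0)])
```

Because faulty pairs identify themselves through their own comparators, the output in this model stays correct even when only one pair is still healthy.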

16
Sift-Out Modular Redundancy
  • Collector sifts out defective modules

17
Case Studies
  • Lucent Technologies 5 ESS
  • 100% duplication of hardware
  • ESS System Design
  • Time-Space-Time (TST) Switch
  • Signals translated to pulse-code-modulated
    (PCM) signals
  • Information routed over lines
  • Interchange time slots (time)
  • Switch buses (space)
  • Interchange time slots (time)

18
Installed AT&T Electronic Switching Systems
19
Probability of Operational Outage Due to Various
Causes
20
5 ESS Switching Block Diagram
21
5 ESS
  • Uses Duplication-with-Comparison
  • Assume perfect switchover to working computer
  • $\mathrm{MTTF} = \dfrac{\mu}{2\lambda^2}$ (Mean Time to Failure)
  • $\lambda$ = computer failure rate
  • $\mu$ = repair rate
  • Handles 65,000 metropolitan phone lines
  • Also uses Information Redundancy
  • Processor 1 too unreliable (1 ESS)
  • Broken into 6 subsystems to improve reliability
  • Multiplied MTTF by 6
  • PU, CC, CS, PS, CSB, PSB subsystems
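
A short numeric check of the MTTF expression on this slide, read here as μ / (2λ²) with λ the computer failure rate and μ the repair rate. The rate values are made up purely for illustration. For comparison, the standard two-unit Markov model with repair gives (3λ + μ) / (2λ²), which reduces to the slide's form when μ is much larger than λ.

```python
def duplex_mttf_approx(lam: float, mu: float) -> float:
    """Slide's expression: MTTF = mu / (2 * lam**2)."""
    return mu / (2.0 * lam ** 2)


def duplex_mttf_markov(lam: float, mu: float) -> float:
    """Two-unit Markov model with repair: (3*lam + mu) / (2 * lam**2)."""
    return (3.0 * lam + mu) / (2.0 * lam ** 2)


def simplex_mttf(lam: float) -> float:
    """Single unrepaired computer, for comparison: 1 / lam."""
    return 1.0 / lam


# Hypothetical rates, per hour (not taken from the lecture).
lam = 1e-4  # one failure per 10,000 hours
mu = 0.1    # average repair takes 10 hours

approx = duplex_mttf_approx(lam, mu)
markov = duplex_mttf_markov(lam, mu)
gain = approx / simplex_mttf(lam)  # improvement over one computer
```

With these numbers the duplex MTTF is on the order of millions of hours versus 10,000 hours for a single computer, which is why fast repair matters as much as duplication.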

22
Duplex Configuration for Switch
(Figure labels: PU0, PU1, PSB0, PSB1, CSB0, CSB1)
23
NASA Space Shuttle
24
Space Shuttle Computer
  • Tasks
  • Guidance
  • Navigation
  • Pre-flight checkout
  • Software voting in a 5-computer complex
  • Use 4 computers as a redundant set during
    critical mission phases
  • 5th computer does non-critical tasks and acts as
    a backup

25
Voting Method
  • Vote on control outputs of 4 computers at control
    actuators
  • Each computer compares outputs of 3 others to its
    own
  • If there is a disagreement, signal the disagreeing computer
  • Each computer votes on the disagreement signals
  • If defective, removes itself from service
  • Tolerates up to 2 computer failures
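
The compare-and-vote scheme above can be sketched as follows. The slides do not state the voting threshold, so this sketch assumes a computer is declared failed when more than half of the other computers signal a disagreement with it; the function name and the computer labels are hypothetical.

```python
from typing import Dict, Set


def find_failed_computers(outputs: Dict[str, int]) -> Set[str]:
    """Return the computers that a majority of their peers disagree with."""
    ids = list(outputs)
    if len(ids) < 3:
        # With two computers a mismatch is detected but cannot be
        # attributed to either one by comparison alone.
        raise ValueError("need at least 3 computers to localize a fault")
    failed = set()
    for suspect in ids:
        # Each other computer compares the suspect's output to its own
        # and raises a disagreement signal on a mismatch.
        signals = [outputs[peer] != outputs[suspect]
                   for peer in ids if peer != suspect]
        # Vote on the disagreement signals (assumed simple majority).
        if sum(signals) * 2 > len(signals):
            failed.add(suspect)
    return failed


# Redundant set of four; C3 produces a bad control output.
first_failure = find_failed_computers(
    {"C1": 10, "C2": 10, "C3": 99, "C4": 10})

# C3 has removed itself; now C2 fails among the remaining three.
second_failure = find_failed_computers({"C1": 10, "C2": 7, "C4": 10})
```

After the second removal only two computers remain, which is the point where comparison can still detect a mismatch but can no longer decide which computer is wrong.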

26
Reconfiguration After 2nd Failure
  • Converts to a duplex computer system
  • Can survive one more failure because of
    comparison and self-tests
  • 2 vendors to minimize the chance of a common software
    error
  • Primary software: IBM
  • Backup software: Rockwell

27
Spacecraft Systems
  • Sub-systems
  • Propulsion
  • Power
  • Data Communications
  • Attitude Control
  • Command, Control, Payload

28
Fault Tolerance Maintenance Procedures
  • When a failure is detected, enter safe/hold mode
  • Shed all non-essential power loads
  • Stop mission sequencing and solar array tracking
  • Orient for maximum solar power
  • Ground personnel diagnose failure from prior
    outputs of 5 subsystems
  • Select spacecraft system reconfiguration
  • Send workaround commands to spacecraft

29
Fault Detection Mechanisms
  • Self-tests of sub-systems
  • Cross-checking between duplicated sensors
  • Ground-initiated tests to diagnose/isolate
    failures
  • Ground trend analysis to find degraded/worn-out
    units

30
NASA Long-Life Galileo Jupiter Fly-By Mission
  • 19 8-bit μprocessors, 320 Kbyte ROM
  • Uses block redundancy
  • Command Data Subsystem (CDS)
  • Active redundancy: each block can issue
    independent commands, or both blocks work in
    parallel on a critical activity
  • All other systems: active/standby pairs
  • Few Hardware Fault Detection Mechanisms
  • Harsh Jupiter Environment
  • Radiation
  • Electrostatic Discharge

31
Galileo Orbiter Block Diagram: Active/Standby Pair
32
Galileo Error Detection Mechanisms
  • Test event durations (watchdog timer) and data
    transfers; parity/checksums on messages
  • Unexpected command codes
  • Check for loss of heartbeat between AACS and CDS
    (watchdog timer)
  • Mixture of spinning and non-spinning scientific
    experiments: check for spin rate above/below set
    values
  • Loss of Sun/star identification: no pulse from
    acquisition sensor
  • Too large an error between control setting for
    sub-module and its response
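
Three of these checks (heartbeat watchdog, message parity, and limit checks) are simple enough to sketch. The code below is a generic illustration, not Galileo flight software; the class and function names, the timeout, and the limit values are all hypothetical.

```python
class HeartbeatWatchdog:
    """Flags a lost heartbeat when no beat arrives within timeout_s."""

    def __init__(self, timeout_s: float, start: float) -> None:
        self.timeout_s = timeout_s
        self.last_beat = start

    def beat(self, now: float) -> None:
        self.last_beat = now  # peer subsystem reported in

    def expired(self, now: float) -> bool:
        return now - self.last_beat > self.timeout_s


def even_parity_bit(word: int) -> int:
    """Parity bit that makes the total number of 1 bits even."""
    return bin(word).count("1") % 2


def parity_ok(word: int, parity_bit: int) -> bool:
    return even_parity_bit(word) == parity_bit


def within_limits(value: float, low: float, high: float) -> bool:
    """Range check, e.g. spin rate against its set values."""
    return low <= value <= high


watchdog = HeartbeatWatchdog(timeout_s=2.0, start=0.0)
watchdog.beat(now=1.5)
alive_at_3 = not watchdog.expired(now=3.0)  # 1.5 s since last beat
lost_at_4 = watchdog.expired(now=4.0)       # 2.5 s since last beat

sent_word = 0b1011
sent_parity = even_parity_bit(sent_word)
corrupted_word = sent_word ^ 0b0001         # single-bit error in transit
```

Time is passed in explicitly rather than read from a clock so the behavior is easy to reason about; a single parity bit catches any odd number of flipped bits but misses an even number.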

33
Summary
  • Active Redundancy Techniques
  • Hardware for Active Redundancy Systems
  • Fault Tolerance Applications
  • Lucent Technologies 5 ESS
  • NASA Space Shuttle
  • Galileo Interplanetary Probe
  • Hardware Design Methodology
  • System Partitioning