1
From Anonymity to Ubiquity: A Study of Our Increasing Reliance on Fault-Tolerant Computing
  • Elwin Ong
  • MIT SERL
  • NASA Goddard OLD
  • December 9, 2003

2
Abstract
This presentation will introduce the role of fault tolerance in major computing systems. A literature review will be conducted, outlining some fundamental elements of the field. A comparison and discussion of the application of fault tolerance in three safety-critical domains will follow. Aerospace systems to be discussed in addition to those already mentioned include the Space Shuttle, Hubble Space Telescope, Galileo, Landsat 7, ST-5, New Horizons, and C-17. There will also be a short overview of the Time-Triggered protocols TTP/C and FlexRay to be used in automotive drive-by-wire systems.
3
Background
  • How I came to be at Goddard and OLD
  • Educational Background
  • UCLA Aerospace Engineering
  • Boeing Satellite Systems
  • MIT Aero/Astro
  • Systems Engineering Research Lab
  • Nancy Leveson
  • Safety-Critical Systems
  • Fault Tolerant Systems

4
Purpose of Study
  • What I hope to gain for myself
  • In-depth review of fault tolerance
  • Catch up on State-of-the-Art
  • Investigate applications of fault tolerance
  • Become more familiar with spacecraft design
    process

5
Purpose of Study
  • What I hope you will gain
  • A review of fault tolerance
  • An overview of fault tolerance in various
    safety-critical industries
  • Opportunities to learn and improve upon existing
    techniques

6
Purpose of Study
  • What I hope to gain from you
  • An active discussion of fault tolerance as it is
    currently practiced in your projects
  • What are good practices? What works? What
    doesn't?
  • Suggestions for advancements in the field

7
Presentation Outline
  • Literature Review
  • Spacecraft Fault Tolerance
  • Aircraft Fault Tolerance
  • Automotive Fault Tolerance
  • Discussion & Conclusion

8
Literature Review Outline
  • What is Fault Tolerance?
  • Define scope of study
  • Fault Tolerance Techniques
  • Fault Intolerance
  • Fault Detection and Reconfiguration
  • Fault Masking and Reconfiguration
  • What about Software?

9
What is a Fault?
  • There are various definitions
  • Must first identify scope
  • Computationally intensive systems
  • Real Time and Safety-Critical (and Distributed)
  • Spacecraft
  • Modern Aircraft Systems
  • Automotive x-by-Wire, drive train controllers
  • Nuclear and Chemical Processing, Maritime
    systems, IT Networks, etc.

10
Definition of a Fault
Fault: An incorrect state of hardware or software resulting from failures of components, physical interference from the environment, operator error, or incorrect design.
Error: The manifestation of a fault.
Failure: A result of the delivered service deviating from the specified service, caused by an error or fault.
11
Fault Classifications
There are various classification methods.
Based on Lala & Harper, IEEE 1994
12
Fault Classifications
13
Fault Distribution Models
  • Permanent Fault Distribution Models
  • Exponential Distribution
  • Weibull Distribution
  • Geometric Distribution
  • Must match sampled data to distribution models
  • MIL-HDBK-217 Model
  • Various Intermittent and Transient Fault Models

14
How to Defeat Faults
  • Fault Intolerance/Prevention Methods
  • Fault Tolerant Methods
  • Redundancy
  • Fault Detection and Reconfiguration
  • Fault Masking
  • Software Fault Tolerance

15
Fault Tolerance Taxonomy
16
Fault Intolerant Techniques
  • Increase Signal to Noise Ratio
  • Lower Power Dissipation
  • Burn in Testing
  • Factors that most affect failure rates
  • Environment
  • Quality
  • Complexity
  • See MIL-HDBK-217E, NASA Standards

17
Fault Tolerant Systems
  • Redundancy
  • Fault Detection & Reconfiguration
  • Duplication, Error Detecting Codes,
    Self-tests, Self-Checking Pairs, etc.
  • Fault Masking & Reconfiguration
  • Error Correcting Codes, TMR, NMR
  • Issues related to Fault Tolerant Systems

18
Redundancy
  • All Fault Tolerant Systems employ redundancy
  • Forms of Redundancy
  • Temporal (Retry, Restart)
  • Physical (Duplication)
  • Functional (Analytical Modeling)
  • "The only thing (redundancy) guarantees is a
    higher fault arrival rate compared to a
    non-redundant system" (Lala & Harper, IEEE 1994)

19
Fault Detection & Reconfig.
  • Based on simplex systems with active or passive
    backups.
  • Requires accurate fault detection
  • Employs all 3 types of redundancy
  • Common on unmanned spacecraft

20
Duplication
  • Simplest technique
  • Compare two identical copies
  • Fault identified when copies diverge
  • Does not identify which copy has failed
  • Use in conjunction with other techniques

21
Error Detecting Codes
  • Employ physical redundancy
  • Use extra bits in transmission
  • Hamming Distance
  • The number of bit positions on which two code
    words differ.
  • Minimum distance, d, of a code is defined as the
    minimum Hamming distance found between any 2 code
    words.
  • Number of errors detectable: t < d

22
Hamming Distance
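The figure from this slide is not reproduced here. As an illustrative sketch of the definitions above (not from the presentation; the helper names are hypothetical):

```python
from itertools import combinations

def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length code words differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def minimum_distance(code: list[str]) -> int:
    """Minimum Hamming distance d over all pairs of code words."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))

code = ["0000", "0111", "1011"]
d = minimum_distance(code)                       # here d = 2
print(d, "-> detects up to", d - 1, "bit errors")  # t < d
```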
23
Parity Checks
  • Use 1 extra bit at the end of a word
  • Simplest and least expensive
  • Detects all single bit errors and all errors that
    involve an odd number of bits
  • Odd parity or even parity check
  • All 0s failure
  • All 1s failure
  • Ex. MIL-STD-1553
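A minimal even-parity sketch (illustrative only, not from the presentation; function names are hypothetical):

```python
def add_even_parity(word: int, nbits: int = 8) -> int:
    """Append one parity bit so the total number of 1s is even."""
    ones = bin(word & ((1 << nbits) - 1)).count("1")
    return (word << 1) | (ones & 1)          # parity bit = 1 if count was odd

def check_even_parity(coded: int, nbits: int = 9) -> bool:
    """True if the coded word still has an even number of 1s."""
    return bin(coded & ((1 << nbits) - 1)).count("1") % 2 == 0

w = 0b1011_0010
c = add_even_parity(w)
assert check_even_parity(c)
assert not check_even_parity(c ^ 0b10)       # any single-bit flip is detected
```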

24
Checksums
  • Form a checksum for a block of s words by adding
    together all of the words in the block modulo n
    (n is arbitrary).
  • Takes a long time to detect faults, not well
    suited to online processing.
  • Low diagnostic resolution, fault can be in the
    block of s words, the stored checksum, or the
    checking circuitry.
  • Ex. Hard Drives

25
Checksum Example
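The worked example from this slide is not reproduced here; in its place, a minimal modulo-n checksum sketch (illustrative only, with made-up data words):

```python
def checksum(block: list[int], n: int = 2**16) -> int:
    """Sum of all s words in the block, modulo n."""
    return sum(block) % n

block = [0x1234, 0xBEEF, 0x0042]
stored = checksum(block)

# Later: recompute and compare. A mismatch flags a fault somewhere in the
# block, the stored checksum, or the checking logic (low diagnostic resolution).
block[1] ^= 0x0100                    # corrupt one word
assert checksum(block) != stored
```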
26
Cyclic Codes
  • Cyclic Redundancy Check (CRC)
  • Easy to Implement with XOR gates
  • Detects all single errors, all burst errors of
    length b ≤ (n-k)
  • Ex. CDs, TTP/C, FlexRay Protocols
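A minimal CRC sketch using long division over GF(2) (illustrative only; hardware implementations use XOR shift registers or table lookups):

```python
def crc_remainder(data_bits: str, poly_bits: str) -> str:
    """Divide the message (shifted left by n-k) by the generator polynomial
    over GF(2); the (n-k)-bit remainder is the CRC appended to the message."""
    nk = len(poly_bits) - 1                    # number of check bits, n - k
    reg = list(data_bits + "0" * nk)
    for i in range(len(data_bits)):
        if reg[i] == "1":                      # XOR the generator in
            for j, p in enumerate(poly_bits):
                reg[i + j] = str(int(reg[i + j]) ^ int(p))
    return "".join(reg[-nk:])

msg = "11010011101100"
gen = "1011"                                   # x^3 + x + 1
crc = crc_remainder(msg, gen)                  # "100"
codeword = msg + crc
assert crc_remainder(codeword, gen) == "000"   # a valid codeword divides cleanly
```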

27
Control Flow Monitoring
  • Used to detect Sequential Errors

28
Self-Tests
  • Built-in-Tests
  • Exercise part or all of circuit and logic and
    compare to oracle
  • Extensive use in aerospace systems
  • Consistency & Sanity Checks
  • Capability Checks
  • Watchdog Timers
  • Implemented in Hardware or Software
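A software watchdog-timer sketch (illustrative only; the class name and timeout values are hypothetical):

```python
import threading, time

class WatchdogTimer:
    """Raise an alarm if the monitored task fails to 'kick' within the timeout."""
    def __init__(self, timeout_s: float, on_expire):
        self.timeout_s = timeout_s
        self.on_expire = on_expire
        self._timer = None
        self.kick()

    def kick(self):
        """Called periodically by healthy software to prove liveness."""
        if self._timer:
            self._timer.cancel()
        self._timer = threading.Timer(self.timeout_s, self.on_expire)
        self._timer.daemon = True
        self._timer.start()

wd = WatchdogTimer(0.5, lambda: print("watchdog expired: reconfigure/restart"))
for _ in range(3):
    time.sleep(0.2)        # simulated healthy control cycle
    wd.kick()
time.sleep(1.0)            # simulated hang -> watchdog fires
```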

29
Self-Checking Pairs
  • Combination of Duplication and Self Tests

30
Self-Checking Variations
31
Model-Based Diagnosis
  • Employs Analytic Redundancy
  • Compare actual components with an analytic model
    (mathematical model)
  • Depends on the validity of the model, and the
    ability to accurately model a system
  • Relatively straightforward for linear systems,
    difficult for nonlinear systems (most
    software-based systems)

32
Analytical Redundancy
33
Model-Based Diagnosis
  • Residual Generation & Decision-Making

34
Parameter Estimation
  • Based on assumption that faults are reflected in
    the physical system parameters such as friction,
    mass, viscosity, resistance, capacitance, etc.
  • Compare online estimations and measurements with
    parameters of model to identify faults.
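A toy parameter-estimation sketch (illustrative only; the resistance example, data, and 20% threshold are assumptions, not from the presentation):

```python
# Estimate a resistance from noisy V = I*R samples (least squares), then
# compare the online estimate against the nominal model parameter.
currents = [0.5, 1.0, 1.5, 2.0]
voltages = [5.1, 9.8, 15.2, 19.9]          # measurements; nominal R = 10 ohms

r_hat = sum(i * v for i, v in zip(currents, voltages)) / sum(i * i for i in currents)
r_nominal = 10.0
fault = abs(r_hat - r_nominal) / r_nominal > 0.2   # 20% deviation threshold
print(f"estimated R = {r_hat:.2f} ohms, fault = {fault}")
```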

35
Livingstone Engine
  • Developed at NASA Ames
  • Livingstone accepts a model of the components of
    a complex system such as a spacecraft or chemical
    plant and infers from them the overall behavior
    of the system.

36
Fault Masking Techniques
  • Mask faults by out-voting failed components
  • Error Correcting Codes
  • Triple Modular Redundancy (TMR)
  • NMR
  • Extensive applications in aircraft and manned
    spacecraft

37
Error Correcting Codes
  • Hamming SEC/DED Codes
  • Extensive usage in memories
  • High performance vs. cost ratio
  • Reed-Solomon
  • There are other, more advanced ECCs employed,
    including convolutional codes (communication,
    coding theory)

38
Hamming SEC Code
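The figure from this slide is not reproduced here; in its place, a minimal Hamming (7,4) single-error-correcting sketch (illustrative only, not the presentation's example):

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Hamming (7,4): 3 parity bits protect 4 data bits (single-error correcting)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]        # bit positions 1..7

def hamming74_correct(c: list[int]) -> list[int]:
    """Recompute parity; the syndrome gives the 1-based position of a flipped bit."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1                   # correct the single-bit error
    return [c[2], c[4], c[5], c[6]]            # extract the data bits

word = [1, 0, 1, 1]
coded = hamming74_encode(word)
coded[4] ^= 1                                  # inject a single bit-flip
assert hamming74_correct(coded) == word
```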
39
TMR & NMR
  • Very simple concept, includes many different
    variations
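A minimal majority-voter sketch for TMR/NMR (illustrative only; the function name is hypothetical):

```python
from collections import Counter

def majority_vote(values):
    """Return the value produced by a majority of the N redundant channels,
    masking a minority of faulty channels (e.g., 1 of 3 for TMR)."""
    value, count = Counter(values).most_common(1)[0]
    if count <= len(values) // 2:
        raise RuntimeError("no majority: too many divergent channels")
    return value

assert majority_vote([42, 42, 7]) == 42      # TMR masks one faulty channel
assert majority_vote([5, 5, 5, 9, 9]) == 5   # NMR with N = 5 masks two
```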

40
TMR & NMR Variations
41
Redundancy Issues
  • Large Overhead?
  • More difficult to validate
  • Asynchronous vs. Synchronous
  • Near Coincidence Errors
  • Generic Faults

42
Asynchronous Issues
  • Voted value is mean, median, or some other
    heuristic-based value.
  • Must set thresholds so that failures are caught,
    but also limit false alarms
  • Can be very difficult to guarantee robustness
  • Requires extensive analyses and testing
  • Ex. F-16B FBW

43
Synchronous Issues
  • Inputs must be the same for each channel
  • Each channel must be synchronized
  • Fault detection is simple, unless
  • Interactive Consistency
  • Near Coincidence
  • Generic Faults
  • Most systems are what are termed loosely
    synchronous

44
Byzantine Generals
  • Affects inputs to synchronous system as well as
    cross-channel voting
  • Stop and restart errors
  • Babbling Idiot Problem
  • Failed component sends different outputs to
    voting elements, confuses good components.
  • Intentional or intelligent malicious attacks
  • See Lamport et al. ACM 1982

45
Interactive Consistency
46
Byzantine Resiliency
  • Fault Containment Region (FCR)
  • An FCR is a collection of components that operate
    correctly regardless of any arbitrary logical
    fault outside the region.
  • Each FCR requires at least an independent power
    supply and clock signal.
  • May also need to be physically separated

47
Byzantine Resiliency
  • To tolerate f Byzantine faults requires:
  • 3f+1 FCRs
  • FCRs must be interconnected through 2f+1 disjoint
    paths
  • Inputs must be exchanged f+1 times between FCRs
  • FCRs must be synchronized to bounded skew
  • A simple TMR majority voter circuit is not
    Byzantine Resilient
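The bounds above, restated as a small helper (illustrative only; the function name is hypothetical):

```python
def byzantine_requirements(f: int) -> dict:
    """Classic lower bounds to tolerate f Byzantine faults (Lamport et al., 1982)."""
    return {
        "fault_containment_regions": 3 * f + 1,   # 3f+1 FCRs
        "disjoint_paths": 2 * f + 1,              # 2f+1 interconnect paths
        "exchange_rounds": f + 1,                 # f+1 rounds of input exchange
    }

print(byzantine_requirements(1))   # {'fault_containment_regions': 4, ...}
```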

48
Near Coincidence
  • Possibility that a second fault will occur before
    the system can recover from the first fault.
  • Must be accounted for in the design of redundancy
    management, e.g. 777 FBW

49
Generic Faults
  • Externally Induced
  • Physical damage
  • Lightning strike
  • Power transients
  • Internally Induced
  • Hardware & Firmware defects, COTS O/S
  • Latent failures
  • Clock anomalies
  • Bad Design?

50
What about Software?
  • Software faults are much more difficult to
    characterize
  • Software is
  • an abstract mathematical object or
  • a concept of how to make a group of hardware
    (system) work together in order to perform a
    specified function
  • includes Hardware design as well
  • Software fault = Design fault

51
Software Fault Tolerance
  • No substitute for good design
  • Common software-implemented fault tolerance
    schemes
  • Watchdog Timers
  • Exception Handlers
  • Consistency, Sanity Checks
  • Formal Proofs
  • Testing (Various Methods)
  • Forward and Backward Error Recovery

52
N-Version & Recovery Blocks
  • N-Version Programming requires 3 or more separate
    programming teams
  • 3 versions of software are voted on in real-time
  • Studies conducted showed that results are not as
    positive as envisioned
  • Most errors occur during requirements and
    specification phase.
  • See Knight & Leveson, IEEE 1986

53
High-Level Programming
  • Autocode Generation
  • Generate code from specification
  • Have seen limited applications
  • Requires certification
  • Some common tools
  • Matlab/Simulink
  • MatrixX
  • SpecTRM, Statecharts, Giotto, etc.

54
Hazard Analysis
  • FMEA, FMECA, FTA are standards
  • Hazard Analysis should be integral with design,
    starting from requirements.
  • Fault tolerance is inherently a safety feature.
  • Design of fault tolerance should result from
    requirements.
  • Requirements should identify the "shalls" and
    "shall nots" of the system

55
Spacecraft Fault Tolerance
  • Manned Spacecraft
  • Gemini, Apollo, Skylab, Shuttle
  • Large Unmanned Spacecraft
  • Galileo, Hubble
  • Other Spacecraft
  • LandSat 7, ST-5

56
Spacecraft Requirements
  • Spacecraft have unique requirements
  • Harsh environment
  • Radiation (Total dose and SEUs)
  • Micro-meteoroids
  • Pressure & Temperature
  • Autonomous Operation
  • Communication delay
  • Unmanned probes

57
Gemini Program
58
Gemini Computer
  • Mercury relied on its Atlas booster for orbit
    insertion, and on ground computers to compute
    deorbit burn information.
  • IBM built a digital computer for the Gemini program.
  • First used on March 23, 1965 with Gus Grissom and
    John Young on board

59
Gemini Computer
  • Functioned in 6 mission phases
  • prelaunch, ascent backup, insertion, catch-up,
    rendezvous, and re-entry
  • Received data from Titan booster at launch
  • Computer controlled re-entry

60
Gemini Fault Tolerance
  • No hardware redundancy
  • Used nondestructive core memory
  • Engineered quality control
  • Software self-checks prior to launch
  • Tape memory was introduced in Gemini
  • Programs recorded on tape were triplicated
  • Used TMR on readout and input into core memory

61
Gemini Software
  • Considered use of FORTRAN
  • Programmed in machine code using a 16 instruction
    set
  • Software engineering process had not been
    developed
  • Strict standards of documentation
  • Configuration Control
  • Modular software design

62
Apollo Program
63
Apollo Computer System
  • The main motivation for an onboard computer was
    the 1.5-second communication delay between
    Earth and the Moon.
  • Still relied on ground computers, but onboard
    computers could get astronauts home independently
    if needed.
  • Used 2 identical computers for the Command Module
    (CM) and Lunar Excursion Module (LEM)
  • LEM had additional backup computer, the Abort
    Guidance System (AGS)

64
Apollo Fault Tolerance
  • Still no hardware redundancy
  • Ground served as backup
  • Extensive testing
  • AGS had ability to abort lunar landing, but was
    never used.
  • Incorporated use of Integrated Circuits, improved
    reliability
  • Advancements in hardware quality

65
Apollo Software
  • Designed and written at MIT
  • NASA served as overseer
  • Created standing committees, acceptance cycles
  • Software Configuration Board created in 1967
  • Apollo software difficult to verify and validate

66
Apollo Software
  • MIT developed a special higher order language
    that translated programs into a series of
    subroutine linkages, which were interpreted at
    execution time.
  • Based on executive and job waitlist
  • Software restarts were developed
  • A restart occurred during the Apollo 11 lunar landing.

67
Apollo Lessons Learned
  • Documentation is crucial
  • Verification must proceed through several levels
  • Requirements must be clearly defined and
    carefully managed
  • Good development plans should be created and
    executed
  • More programmers do not mean faster development.

68
Skylab Program
69
Skylab Computer
  • Very successful program
  • Used a pair of "off-the-shelf" IBM 4Pi series
    processors
  • Standby Pair Configuration
  • Transfer register and timer used TMR
  • No in-orbit failures

70
Skylab Fault Tolerance
  • First to use onboard redundancy management
    software
  • Computers ran self-test
  • Ran diagnostic routines to detect and reconfigure
    for failures in non-computer components, e.g.
    gyros, sun sensors
  • Computer switchover handled by
  • TMR watchdog timer
  • Self-test program

71
Skylab Software
  • Developed by IBM, 16k & 8k
  • Based on executive and application modules with
    priority interrupts
  • Highly modular design
  • Redundancy Management Software
  • Performed self-tests
  • Maintained transfer register (TMR)
  • Extensive simulations on ground

72
The Space Shuttle
73
Space Shuttle Computers
  • Arguably the most complex system
  • Quad-Redundant General Purpose Computers (GPCs)
  • COTS IBM AP-101 Processor with special Input
    Output Processor (IOP)
  • 5th GPC used as Backup
  • Based on Fail Op/Fail Safe Requirement
  • 24 data buses, 8 flight critical
  • GPCs transmit on only one bus

74
Space Shuttle Computers
75
Shuttle Redundancy
76
GPC Synchronization
  • 4 GPCs are synchronous
  • Computers are synchronized at every input,
    output, or whenever a new process is started
  • Dedicated hardware is used to complete a 2-round
    exchange of a 3-bit status register
  • A sumword is exchanged 6.25 times per second
  • Each GPC waits 4 milliseconds, then compares
    registers to check the health of the other GPCs

77
GPC Fault Tolerance
  • Employs dedicated hardware
  • Duplex Fault Tolerance (after 2 failures)
  • GPC BITs (hardware) cover 95% of faults
  • Watchdog Timers
  • Quad/Triplex Fault Tolerance
  • Check sumwords
  • Bus channel timeout
  • Crew must manually turn off a failed GPC

78
Shuttle Effectors/Sensors
  • Shuttle effectors are hydraulically voted
  • Any single erroneous command resulting in full
    deflection will not affect correct output.
  • Pressure Transducers measure offset and remove
    failed actuator.
  • Vital sensors are quad redundant
  • Use mid-value selection with threshold monitoring
  • Thresholds calculated based on performance
    requirements and tested with simulation
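A mid-value-selection sketch with threshold monitoring (illustrative only; the sample values and threshold are made up, not Shuttle parameters):

```python
import statistics

def mid_value_select(samples, threshold):
    """Pick the median of redundant sensor samples, then flag any channel
    whose reading deviates from the selected value by more than the threshold."""
    selected = statistics.median(samples)
    failed = [i for i, s in enumerate(samples) if abs(s - selected) > threshold]
    return selected, failed

value, failed = mid_value_select([10.1, 10.2, 10.0, 17.9], threshold=0.5)
print(value, failed)   # median ~10.15; channel 3 flagged as failed
```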

79
Shuttle Redundancy
80
Shuttle Sensor Redundancy
81
Space Shuttle Software
  • Very expensive and underestimated
  • Primary Avionics Software System (PASS)
  • System Software: O/S, user interface
  • Operating System written in assembly
  • Application Software: guidance, navigation, etc.
  • Application software written in HAL/S
  • NASA set up an independent V&V Team

82
Shuttle Backup Software
  • Independent contractor works on backup software
    for 5th computer (All 5 GPCs use identical
    hardware).
  • Backup software uses time slice operating system
    instead of asynchronous priority driven system on
    PASS.
  • Employs independent verification team.
  • Backup software has never been used in flight.

83
Shuttle Engine Controller
84
Shuttle Engine Controller
  • Independent set of computers to control space
    shuttle main engines
  • Each engine has separate controller
  • Based on Honeywell HDC-601 Airborne Computer
  • Each engine controller is dual redundant in
    standby pair configuration
  • Input and output electronics are cross-strapped
  • Uses fixed cyclic execution schedule
  • Each major cycle starts and ends with a self-test

85
Shuttle Engine Controller
86
Engine Controller Upgrade
  • Replaced Dual Standby with TMR
  • Fail Op/Fail Safe, 2nd failure results in engine
    shutdown
  • Introduced 1553 bus
  • Modular Design, modules are functionally divided
    into separate electronics
  • Digital Computer Unit, Interchannel Communication
    Unit, etc.
  • Used Ada and Object-Oriented Design

87
Engine Controller Upgrade
88
Shuttle Fault Tolerance
  • Many more topics related to shuttle fault
    tolerance
  • Communication Subsystem
  • Orbit Maneuvering Subsystem
  • Crew Interface Subsystem
  • Data Processing
  • Power Distribution Subsystem

89
Unmanned Spacecraft
  • Different requirements than manned missions
  • Less stringent short term reliability
    requirements
  • Longer mission span
  • Large variations in design and requirements
  • More autonomy (no immediate human controller)
  • Strategies depend on type of spacecraft
  • Earth orbiting (safing)
  • Planetary probes (more advanced techniques)

90
Radiation Effects
  • More susceptible to radiation effects
  • Missions travel farther from the Earth's atmosphere
  • Smaller electronic components
  • Mitigation strategies must be pre-planned
  • Methods for radiation effects mitigation
  • Use radiation hardened parts
  • Redundancy & sparing
  • Fault tolerant design (good design)

91
Hubble Space Telescope
92
Hubble Space Telescope
  • Designed to function like multi-purpose
    ground-based telescope
  • Designed for 15 year mission
  • Can be divided into 2 major sections
  • Optical Telescope Assembly (OTA)
  • Support System Module (SSM)

93
Hubble Space Telescope
94
Hubble Fault Tolerance
  • Employs multiple levels of redundancy and other
    fault tolerant features
  • Extensive standby spares, cross-strapping
  • Error Correcting Codes (Reed-Solomon 224-10)
  • Multiple power routing capabilities
  • Multiple communication links
  • Passive thermal design
  • Extensive safe-mode design considerations

95
Hubble Fault Tolerance
  • Designed for in-orbit maintenance
  • Incorporates modular design
  • Includes over 25 different replaceable units
  • Extra devices to aid astronaut EVA
  • There have been a few well-documented Hubble
    servicing missions, and a few more are planned.
  • Upgrades already made
  • Installed new Science Instruments (SI)

96
Hubble Computers
  • SSM - Data Management System (DMS)
  • DF224 Computer
  • Data Management Unit
  • Data Interface Unit (4)
  • Engineering/Science Recorders (3)
  • Oven Controlled Crystal Oscillators (2)
  • OTA - Science Instrument Control & Data Handling (SI C&DH)
  • Based on NSSC-I

97
Hubble SSM Computers
  • DF-224
  • Attitude control, executes stored commands,
    telemetry formatting, etc.
  • Consists of 3 CPUs, 6 Memory Units, 3 I/O
    Units, 3 Internal Data buses, 2 External Data
    buses, 6 (2-channel) Power Control Units.
  • Configured as single string, with 2 spare backups
  • Has been updated with 486 processor utilizing
    same architecture

98
Hubble SSM Computers
  • Data Management Unit (DMU)
  • Decodes and forwards uploaded messages from
    ground to various subsystems onboard
  • Serves as interface between the DF-224 and all
    other subsystems onboard, e.g. star trackers,
    gyros, etc.
  • Incoming data is error corrected
  • 2 separately redundant channels

99
Hubble SSM Computers
  • Other DMS Components
  • 4 Data Interface Units, each dual-redundant
  • 3 tape recorders (have since been replaced)
  • A Redundant Oscillator for spacecraft timing

100
Hubble DF-224
101
Hubble DMU
102
Hubble OTA Computers
  • SI C&DH Computer
  • Interface for all SI and DMU
  • 2 NSSC-I Computers
  • Includes SI specific application software
  • Encodes SI data in Reed-Solomon
  • Internal error detection with parity bit

103
Hubble SI C&DH
104
NSSC-1 Computer
  • NASA Standard Spacecraft Computer (NSSC-1)
  • Designed at Goddard in 1970s for reuse in
    multiple missions
  • Has proven reliability, one of the first
    computers designed to be radiation-hardened
  • Includes Stored Command Processor (SCP)
  • Can accept various memory configurations
  • Software based on main executive and application
    specific software (re-programmable)

105
NSSC-1 Applications
106
Hubble Safe Mode
  • Designed to operate for 72 hours without ground
    intervention
  • Position spacecraft in thermal safe and positive
    power configuration
  • Fault detection based on various performance
    monitors, watchdog timers, and command buffer.
  • Safe mode dedicated hardware

107
Hubble Safe Mode
  • Designed with several progressive modes
  • Inertial Hold (software)
  • Sun Pointing (software)
  • Sun Pointing (hardware)
  • Gravity Gradient (hardware enabled by ground)

108
Galileo Spacecraft
109
Galileo Overview
  • Voyager & Galileo were the first to implement
    distributed computing systems.
  • Separate computers for C&DH and AACS subsystems
  • Total of 19 microprocessors, 6 for C&DH, 2 for
    AACS.
  • Spun and de-spun sections added to system
    complexity.
  • Sections communicate through serial slip-ring
    interface.

110
Galileo AACS
  • Separated into ACE and DEUCE
  • Each had 2 separate strings of processing
    electronics.
  • ACE Hardware
  • Processors, 1K ROM, 31K RAM, Sun sensor, Star
    Scanner, I/O electronics, Propulsion and Power
    Modules, data bus
  • DEUCE Hardware
  • Gyros and accelerometers

111
Galileo AACS Block Diagram
112
Galileo Fault Protection
  • Software Implemented
  • Detect faults from telemetry
  • Responses
  • Switch H/W
  • Switch communication path
  • Revert to Simpler Mode
  • Algorithms are toggled by C&DH

113
Galileo Software
  • Partially coded in HAL/S
  • Memory constraints
  • GRACOS: the Galileo AACS O/S
  • Manages scheduling, program dispatch
  • Up to 17 concurrent processes

114
LandSat 7
115
LandSat 7 Computer
  • C&DH subsystem components
  • Standard Control Processor (SCP)
  • Controls Interface Unit (CIU)
  • Telemetry Data Formatter (TDF)
  • S-Band Transponder (SBT)
  • Solid State Recorder (SSR)
  • All major components are dual redundant
  • Extensive cross-strapping employed

116
LandSat 7 Computer
  • Standard Controls Processor (SCP)
  • Command Processor
  • Compliant with MIL-STD-1750A
  • 64KB ROM, 196KB RAM
  • RAM EDAC: SEC/DED
  • 2 SCPs in Dual Standby configuration
  • Uses watchdog timer, write to nonvolatile memory
    every 0.5 seconds

117
LandSat 7 Computer
  • Controls Interface Unit (CIU)
  • Interface between SCPs and rest of satellite
  • Redundant I/O and clock channels
  • Special hardware for placing SCP in control
    (MEOK)
  • FMEA ensured that no single failure can leave both
    SCPs out of control.
  • Radiation effects analysis conducted for all C&DH
    components. Designed to withstand 2x anticipated
    total dose. No memory writes over the South Atlantic
    Anomaly.

118
LandSat 7 Computer
119
LandSat 7 Fault Tolerance
  • 72 hour autonomous safe mode operation
  • Requirement is recovery from any single failure
    onboard.
  • Redundancy Management performed mostly in
    Software
  • Hazard Analysis during requirements stage
  • "All FD&C hardware and software requirements
    were reviewed to get a complete understanding of
    what FD&C was and was not supposed to do."
    (Scott, et al., AAS 1999)

120
LandSat 7 Software
  • Each SCP has 2 copies of Flight Software
  • Flight Load Package (FLP)
  • Safehold Package (SHP)
  • Boot code in ROM
  • SHP in ROM & RAM
  • FLP in RAM
  • Software based on Priority Interrupt
  • Executive & Fault Management
  • Application Software

121
LandSat 7 Software
122
LandSat 7 Software
  • Provides majority of fault identification and
    reconfiguration
  • Redundancy Management (REDMN)
  • Interconnect Monitor(ICMON)
  • Status Monitor (SMON)
  • Sun Pointing Attitude Mode (SPAM)

123
LandSat 7 REDMN
  • REDMN in charge of failure detection & switching
    of ESA, IMU, RWAs, CSS, etc.
  • Detection is based on telemetry data, using sanity
    & consistency checks
  • 4 step process for reconfiguration
  • Switch Hardware Side
  • Switch Bus
  • Switch Time Reference
  • Go to Standby

124
LandSat 7 ICMON
  • ICMON used to identify failure of data path items
    before REDMN executes unnecessary component
    switching.
  • Based on simple weighted-sum algorithm
  • Can switch 6 data path items
  • Clock and Data bus controller, Standard Controls
    Processor (SCP), Telemetry Unit, Remote and
    Telemetry Command Units.
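The presentation does not give the ICMON algorithm details, so the sketch below is a hypothetical illustration of a weighted-sum monitor, not the actual Landsat 7 logic:

```python
def weighted_sum_monitor(indications, weights, threshold):
    """Hypothetical weighted-sum vote: each error indication that implicates a
    data-path item contributes its weight; an item whose score reaches the
    threshold is declared failed and switched before components are swapped."""
    score = sum(w for hit, w in zip(indications, weights) if hit)
    return score >= threshold

# Three monitors implicate the clock path with weights 2, 1, 2; threshold 4.
print(weighted_sum_monitor([True, False, True], [2, 1, 2], 4))  # True -> switch item
```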

125
LandSat 7 ICMON
126
LandSat 7 SMON
  • SMON used to inform ground controllers of REDMN
    and ICMON status
  • Sends message when
  • Anomalous event detected
  • Corrective action issued
  • Corrective action taken

127
LandSat 7 SPAM
  • SPAM used to position spacecraft in safe mode as
    a last resort.
  • Designed to be simple and robust
  • Clean copy of S/W is loaded from non-volatile ROM
  • 6 well-defined triggering conditions
  • CSSs positioned so that sun detection can be
    achieved from any attitude.
  • Thrusters are not used in this mode

128
ST-5
129
ST-5 Computer
  • Single string Mongoose V Processor
  • 2MB nonvolatile EEPROM and 40MB DRAM
  • Hamming EDAC (Single Correct, Double Detect)
  • CULPRiT
  • Flight-validate CMOS Ultra-Low Power Radiation
    Tolerant logic in the Downlink RS Encoder
  • FPGAs employed in magnetometer & transponder

130
ST-5 Fault Tolerance
  • Fault Intolerance
  • GSFC Level 2 Parts Requirement
  • Hardware based
  • Processor watchdog timer, uplink command watchdog
    timer
  • Software based
  • Limit checking, watchdog timers, memory
    checksumming, thruster monitoring
  • Load Shedding, and Sun Safe Mode

131
ST-5 Fault Tolerance
  • FMEA conducted for spacecraft
  • Resets & Restarts
  • Primary method of reconfiguration
  • FSW Warm Restart vs. Cold Restart, triggered by
    memory errors & watchdog timers
  • Sun Acquisition invoked upon FSW restart
  • Sun Sensors & Transponders are reset by power
    switching through FSW

132
Aircraft Computing
  • Aircraft dependence on computers
  • Autopilot
  • Navigation Systems
  • Cabin Monitoring
  • Engine Control
  • Single Axis Flight Control
  • Automatic Landing System
  • Full Fly-by-Wire (FBW)

133
Aircraft FBW
  • Aircraft FBW systems began in 1950s
  • F-8 FBW at NASA Dryden (1972)
  • Born of necessity: lower weight, performance
    enhancements
  • First all digital FBW with no mechanical backup
  • Phase 1 used Apollo computers
  • Phase 2 used Shuttle Computers

134
Aircraft FBW
135
Aircraft FBW
  • FBW is now commonplace
  • Military Aircraft
  • F-16 was the first operational all-FBW aircraft
  • F-117, F-22, B-1, B-2, JSF, C-17, Gripen,
    Eurofighter, Mirage, etc.
  • Civil Aircraft
  • A320/330/340, A380, 777, 7E7
  • Must pass FAA certification

136
A320/330/340 FBW
137
A320 Overview
  • The A320 was the first commercial aircraft with FBW
  • Concorde had an analog full-authority augmentation
    system with mechanical backup
  • A310 used computers to control flaps and slats
  • Certified and entered service in 1988
  • A330/340 have closely related systems

138
A320 Computers
  • 5 Self-Checking Pair Sets
  • 1 control channel, 1 monitoring channel
  • Each pair controls a set of control surfaces
  • 2 Elevator and Aileron Computers (ELAC)
  • 3 Spoiler and Elevator Computers (SEC)
  • 3 pairs needed for safe flight (pitch, roll)
  • ELAC and SEC have dissimilar processors

139
A320 Computers
140
A320 Sensors Actuators
  • Each sensor is at least duplicated
  • Stick, inertial system, autopilot computer
  • Sensor information is compared across different
    sources
  • Control surfaces are triplicated, powered by 3
    separate hydraulic lines and actuators
  • Actuators are monitored by computers (both
    channels)

141
A320 Power Hydraulics
  • 5 Electric Power Sources
  • 3 Hydraulic Lines (Green, Blue, Yellow)
  • 2 powered by engines
  • 3rd powered by RAT, batteries, or auxiliary
    generator

142
A320 Fault Tolerance
  • Computers run self-test during power up
  • Self-Checking pairs are disconnected if the
    comparison of channels falls outside thresholds
    (asynchronous operation)
  • Once disconnected, the axis controlled by the
    failed computer is handed over to another pair,
    e.g. pitch control by ELAC1 is handed to ELAC2.
  • Actuators are made passive in the event of
    computer or actuator failure.

143
A320 Fault Tolerance
  • Flight Envelope Protection
  • G-load factor, speed, stall protection
  • Flight control laws are reconfigured upon failure
    detection
  • Physical separation of computers, hydraulic
    lines, and electrical power lines

144
A320 Software
  • 4 different software packages
  • Control & Monitoring Channels of ELAC
  • Control & Monitoring Channels of SEC
  • Cyclic execution
  • Used high-level specification language
  • Part of A340 code auto-generated from high-level
    specification language

145
Boeing 777 Aircraft
146
777 FBW Overview
  • First full FBW Boeing commercial aircraft
  • 737 Yaw Damper
  • 747 Auto Landing System
  • First commercial flight in 1995
  • Research on FBW prototypes began in 1986
  • Led by GE & Allied Signal

147
777 FBW Complex
  • 777 FBW main components
  • 3 Primary Flight Computers (PFCs)
  • Pilot interfaces (control columns, levers, etc)
  • Sensor suites, Air Data & Inertial Reference
  • 4 Actuator Control Electronics (ACEs)
  • 3 ARINC 629 Databuses
  • Airplane Information Management System
  • Hydraulics, control surfaces, power generators

148
777 FBW Control Modes
149
777 FBW Architecture
  • Pilot inputs go to ACEs (analog)
  • ACEs transmit commands to PFCs via ARINC 629
  • PFCs receive data from sensor suites via databus
  • PFCs calculate control surface commands and send
    to ACEs
  • ACEs command control surfaces via hydraulic
    actuators (analog)

150
777 Control Surfaces
151
777 Hydraulics
152
777 FBW Block Diagram
153
777 FBW Databus
  • There are 3 lanes of ARINC 629 buses
  • Used to ensure a complete set of redundant
    resources is available to each lane.
  • Minimize complexity of input interface
  • Each bus is physically & electrically separated,
    labeled Left, Center, Right
  • ARINC 629 Terminal Controller
  • Each word is encoded with CRC
  • Employs dedicated hardware for error detection
    and correction

154
777 FBW ACE
155
777 FBW PFC
  • Each PFC is made up of 3 internal lanes

156
777 FBW PFC
  • Each PFC has 3 different processing lanes with
    internal bus.
  • Inputs received from ARINC 629 databus at 2MHz
    rate, every 20 microseconds
  • Lanes are synchronized by frame data input
    every few microseconds.
  • Each PFC lane is either in Command Mode or
    Monitor Mode
  • Only 1 lane in Command Mode

157
777 FBW PFC
  • PFC Command Lane
  • Calculates surface command
  • Receives proposed surface command from other 2
    PFC channels
  • Performs median select on output values (dedicated
    hardware)
  • PFC Monitor Lanes
  • Calculate surface command
  • Compare with command lane output

158
777 FBW Software
  • PFC functions divided into 7 major processes

159
777 FBW Software
  • Software coded in Ada
  • 3 different compilers used
  • Flight envelope protection
  • Stall, Over-speed protection
  • Thrust Asymmetry Compensation
  • Tail strike protection (new feature)

160
Automotive Computing
  • Traditional application with Controller Area
    Network (CAN) protocol
  • Engine & body control, multimedia, etc.
  • Steer-by-Wire & Brake-by-Wire
  • Offer increase in vehicle control performance
  • Implementation will be safety-critical, but
    remain cost-effective
  • Push for Time-Triggered Architecture
  • Currently no certification process available

161
X-by-Wire Example
162
Time-Triggered Architecture
  • Developed by Kopetz & colleagues at the Technical
    University of Vienna in 1979
  • Recent adoption by automotive industry as well as
    Honeywell in Flight Control
  • Several related protocols
  • SAFEbus (Honeywell, Boeing)
  • TTP/C (TTTech)
  • FlexRay (BMW, DaimlerChrysler, Motorola)
  • Spider (NASA Langley)

163
TTA Overview
  • Global Sparse Time Base
  • Real time is divided into discrete global ticks
  • Events occurring within the same tick duration
    are considered simultaneous

164
TTA Overview
  • TTA Nodes & Bus service
  • Each node consists of a Communication Controller
    (CC), an application-specific host, and a
    communication network interface (CNI)
  • Each node should form a fault containment region
    (separate power, clock, etc)
  • CC is in charge of clock synchronization, data
    transmission, and fault detection (enforced by
    separate bus guardians).
  • TTA can be implemented on a redundant bus or star
    topology

165
TTA Cluster
166
TTA Synchronization
  • All nodes are synchronized to a global time
  • All nodes have a known a priori instant to send
    and receive messages on the bus (TDMA)
  • The schedule is kept in a table on each node
  • Synchronization is performed by analyzing the
    time a message is received and the predicted time
    a message should be received according to the
    global time base stored on each node.
  • Re-transmission is not allowed in TTA
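A minimal TDMA lookup and clock-correction sketch (illustrative only; the slot lengths, node names, and function names are made up):

```python
def tdma_slot(global_time_us: int, schedule):
    """Look up which node owns the bus at a given global time.
    schedule: list of (node, slot_length_us) making up one TDMA round."""
    round_us = sum(length for _, length in schedule)
    t = global_time_us % round_us
    for node, length in schedule:
        if t < length:
            return node
        t -= length

def clock_correction(expected_arrival_us: int, actual_arrival_us: int) -> int:
    """A node resynchronizes by comparing a message's actual arrival time
    with the arrival time predicted by the static schedule."""
    return expected_arrival_us - actual_arrival_us   # correction to local clock

schedule = [("A", 100), ("B", 100), ("C", 100), ("D", 100)]
assert tdma_slot(250, schedule) == "C"
print(clock_correction(200, 203))   # local clock is 3 us fast -> -3
```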

167
TTA Fault Tolerance
  • Fault Tolerant Units (FTU)
  • FTU is made up of multiple nodes that perform the
    same function (physical redundancy)
  • FTUs transmit in successive TDMA slots
  • Voting occurs after each cycle; the voted value is
    transmitted in the following cycle
  • All messages are (time) protected by a temporal
    firewall
  • Message contents (values) encoded with CRC

168
TTP/C Protocol
  • Messages are identified by global time only
  • Guardians prevent nodes from sending messages
    outside their assigned slots
  • 2 guardians per node, using separate clocks
  • 16-bit CRC, Hamming-distance-6 encoding
  • Weak Byzantine agreement through CRC and a clique
    avoidance algorithm

169
FlexRay Protocol
  • Execution round based on a priori time-triggered
    slots and on-demand dynamic slots
  • Currently NOT Byzantine resilient

170
Protocol Comparison
171
General Notes Discussion
  • Hardware will continue to become more reliable,
    but software may not; there is no silver bullet.
  • If hardware becomes more reliable, we won't need as
    much redundancy
  • We may never eradicate generic faults, especially
    with increasing system complexity
  • Need to base designs on proven methods and take
    incremental risks
  • The root of the problem is in specification and
    knowledge transfer

172
General Notes Discussion
  • Good Design Practices
  • As much process- and management-oriented as
    technical
  • Simplify designs & use fail-safe devices, e.g.
    omnidirectional antennas, passive thermal
    protection, etc.
  • Plus many more available at klabs.org

173
General Notes Discussion
  • Future technologies
  • Fiber Optics
  • Nanoscale technology
  • Wireless & more advanced communication-theory
    applications
  • Model-based design
  • Open Systems Architecture, IPs

174
Acknowledgements
  • Many thanks to
  • Rich Katz & Igor Kleyner
  • Steve Scott
  • Steve McAveety (Hubble)
  • Eric Finnegan (ST-5)
  • Damon, Alex, George, Ellen, Joan
  • This research was partially supported by NSF ITR
    Grant CCR-0085829, and by NASA Engineering for
    Complex Systems Program grant NAG2-1543.

175
Bibliography - Fault Tolerance
General Fault Tolerance
Audsley, N.C., Burke, M., "Distributed Fault-Tolerant Avionics Systems - A Real-Time Perspective," IEEE, 1998.
Avizienis, A., Mathur, F., Rennels, D., "Automatic Maintenance of Aerospace Computers and Spacecraft Information and Control Systems," AIAA Systems Conference, 1969.
Avizienis, A., "Toward Systematic Design of Fault-Tolerant Systems," IEEE, 1997.
Butler, R.W., "Fault-Tolerant Clock Synchronization Techniques for Avionics Systems," AIAA, 1988.
Castro, M., Liskov, B., "Byzantine fault tolerance can be fast," IEEE, 2001.
Chen, Jie, and Patton, R.J., Robust Model-Based Fault Diagnosis for Dynamic Systems, Kluwer Academic Publishers, 1999.
Driscoll, Kevin, and Hoyme, Kenneth, "SAFEbus," IEEE AES Systems Magazine, 1993.
Frison, Steven G., and Wensley, John H., "Interactive Consistency and Its Impact on the Design of TMR Systems," IEEE, 1982.
Geffroy, Jean-Claude, and Motet, Gilles, Design of Dependable Computing Systems, Kluwer Academic Publishers, 2002.
Gertler, Janos, Fault Detection and Diagnosis in Engineering Systems, New York: Marcel Dekker, Inc., 1998.
Hall, B., Sellner, B., Maier, R., "Automated safety critical software development for distributed control systems: A COTS approach," SAE, 2001.
Hamiter, L., "The History of Space Quality EEE Parts in the United States," ESTEC, 1990.
Hammett, Robert, "Design by Extrapolation: An Evaluation of Fault Tolerant Avionics," IEEE AESS Systems Magazine, 2002.
176
Bibliography - Fault Tolerance
General Fault Tolerance
Harper, Christopher, and Winfield, Alan, "A Behaviour-Based Approach to the Design of Safety-Critical Systems," The Institution of Electrical Engineers, 1994.
Hills, Andy D., and Mirza, Nisar A., "Fault Tolerant Avionics," DASC, 1988.
Hitt, E.F., "Avionics Cost of Ownership," IEEE, 1997.
Kopetz, H., "Fault Containment and Error Detection in the Time-Triggered Architecture," IEEE, 2003.
Kopetz, H., "Why Time-Triggered Architectures will Succeed in Large Hard Real-Time Systems," IEEE, 1995.
Kopetz, Hermann, and Bauer, Gunther, "The Time-Triggered Architecture," IEEE, 2003.
Krol, T., "Interactive consistency algorithms based on voting and error-correcting codes," IEEE, 1995.
Lala, Jaynarayan H., and Harper, Richard E., "Architectural Principles for Safety-Critical Real-Time Applications," IEEE, 1994.
Lamport, Leslie, Shostak, Robert, and Pease, Marshall, "The Byzantine Generals Problem," SRI International / ACM, 1982.
Laurvick, C., Singaraju, "Nanotechnology in Aerospace Systems," IEEE, 2003.
McGough, John, "Effects of Near-Coincident Faults in Multiprocessor Systems," IEEE, 1983.
Murdock, John K., "Open Systems Avionics Network to Replace MIL-STD-1553," IEEE, 2000.
Osder, Stephen S., "Generic Faults and Architecture Design Considerations in Flight-Critical Systems," AIAA Guidance and Control Conference, 1982.
Papadopoulos, G.M., "Design Issues in Data Synchronous Systems," AGARD Lecture Series, 1987.
177
Bibliography - Fault Tolerance
General Fault Tolerance
Rushby, J., "A Comparison of Bus Architectures for Safety-Critical Embedded Systems," SRI, 2001.
Schor, A.L., Leong, F.J., Babcock, P.S., "Impact of Fault-Tolerant Avionics on Life-Cycle Costs," IEEE, 1989.
Siewiorek, Daniel, and Swarz, Robert, Reliable Computer Systems, A K Peters, 1998.
Simpson, T., Henderson, R., Crawley, E., "The Technical Issues with Implementing Spacecraft Open Avionics Platforms," AIAA, 2002.
Srinivasan, Jayakanth, and Lundqvist, Kristina, "Real-Time Architecture Analysis: A COTS Perspective," DASC, 2002.
Thambidurai, P., You-keun Park, "Interactive consistency with multiple failure modes," IEEE, 1988.
Walter, C.J., "Identifying the cause of detected errors," IEEE, 1990.
Williams, Ronald D., Johnson, Barry W., Roberts, Thomas E., "An Operating System for a Fault-Tolerant Multiprocessor Controller," IEEE, 1988.
Zhang, J., Pervez, A., Sharma, A.B., "Avionics Data Buses: An Overview," IEEE AESS Systems Magazine, February 2003.
178
Bibliography - Software
Software
Basili, V.R., McGarry, F.R., Pajerski, R., Zelkowitz, M.V., "Lessons Learned from 25 Years of Process Improvement: The Rise and Fall of the NASA Software Engineering Laboratory," ICSE, 2002.
Boussinot, F., de Simone, R., "The ESTEREL Language," IEEE, 1991.
Brooks, Frederick P., The Mythical Man-Month, Anniversary Edition, Addison Wesley, 1995.
Brown, T., Pasetti, A., Pree, W., Henzinger, T.A., Kirsch, C.M., "A reusable and platform-independent framework for distributed control systems," IEEE, 2001.
Pellerin, D., Taylor, D., VHDL Made Easy!, Prentice Hall PTR, 1996.
Henzinger, T.A., Horowitz, B., Kirsch, C.M., "Giotto: a time-triggered language for embedded programming," IEEE, 2003.
Leveson, Nancy G., "The Challenge of Building Process-Control Software," IEEE, 1990.
Leveson, Nancy G., "The Role of Software in Spacecraft Accidents," 2003.
Leveson, Nancy G., Safeware, Addison Wesley, 1995.
NASA Conference Publication 2222, Production of Reliable Flight-Crucial Software: Validation Methods Research for Fault Tolerant Avionics and Control Systems Sub-Working-Group Meeting, North Carolina, 1981.
Reese, John Damon, Leveson, Nancy G., "Software Deviation Analysis," IEEE, 1997.
Sharma, Ashok, Programmable Logic Handbook, McGraw Hill, 1998.
Shimeall, T.J., Leveson, N.G., "An empirical comparison of software fault tolerance and fault elimination," IEEE, 1991.
Srinivasan, J.K., Leveson, N.G., "Automated Testing from Specifications," IEEE, 2002.
179
Bibliography - Spacecraft
Spacecraft Applications
Alkalai, Leon, "An Overview of Flight Computer Technologies for Future NASA Space Exploration Missions," Acta Astronautica, 2003.
Bearden, David A., "A complexity-based risk assessment of low-cost planetary missions: when is a mission too fast and too cheap?," Acta Astronautica, 2002.
Carlow, G.D., "Architecture of the Space Shuttle Primary Avionics Software System," ACM, 1984.
Castell, K., Hernandez-Pellerano, A., Wismer, M., "Closed loop software control of the MIDEX power system," IEEE, 1998.
Cooper, A.E., Chow, W.T., "Shuttle Computer Complex," IFAC, 1975.
Elfving, A., Stagnaro, L., Winto, A., "SMART-1: key technologies and autonomy implementation," Acta Astronautica, 2002.
Hammett, Robert, Schwartz, Gary, and Smithgall, William T., "Preventing Data Pollution in the Space Shuttle Cockpit," DASC, 2003.
Hanaway, J., and Moorehead, R., Space Shuttle Avionics System, NASA, 1989.
Hecht, H., Fault-Tolerant Computers for Spacecraft, AIAA, 1977.
Lala, Jaynarayan H., Harper, Richard E., Jaskoiak, Kenneth R., Rosch, Gene, Alger, Linda S., Schor, Andrei L., "Advanced Information Processing System (AIPS)-Based Fault Tolerant Avionics Architecture for Launch Vehicles," IEEE, 1990.
Liu, Chung-Yu, "A Study of Flight-Critical Computer System Recovery from Space Radiation-Induced Error," IEEE AESS Systems Magazine, July 2002.
180
Bibliography - Spacecraft
Spacecraft Applications
Lockheed Missiles & Space Company, Inc., Space Telescope Systems Description Handbook, ST/SE-02, 1985.
Lovellete, M.N., Wood, K.S., Wood, D.L., and Beall, J.H., "Strategies for Fault-Tolerant, Space-Based Computing: Lessons Learned from the ARGOS Testbed."
Madden, W.A., Rone, K.Y., "Design, Development, Integration: Space Shuttle Primary Flight Software System," ACM, 1984.
Mattox, Russell, and White, J.B., "Space Shuttle Main Engine Controller," NASA Technical Paper, 1981.
Moulinier, P., Faye, F., Lair, J.C., Maliet, E., "Mars Express spacecraft design and development solutions for affordable planetary missions," Acta Astronautica, 2002.
Pasetti, A., Pree, W., "A Component Framework for Satellite On-Board Software," IEEE, 1999.
Price, C.E., "Fault Tolerant Avionics for the Space Shuttle," IEEE, 1991.
Reichmuth, D.M., Gage, M.L., Paterson, E.S., Kramer, D.D., "A Fault Tolerant 80960 Engine Controller," AIAA, 1993.
Ruffa, J.A., Castell, K., Flatley, T., Lin, M., "MIDEX advanced modular and distributed spacecraft avionics architecture," IEEE, 1998.
Scott, S., Sabelhaus, P., et al., "LANDSAT-7 Failure Detection and Correction," AAS, 1995.
Sklaroff, J.R., "Redundancy Management Technique for Space Shuttle Computers," IBM.
Spector, A., Gifford, D., "The Space Shuttle Primary Computer System," ACM, 1984.
Tai, Ann T., Chau, Savio N., Alkalai, Leon, "COTS-Based Fault Tolerance in Deep Space: Qualitative and Quantitative Analyses of a Bus Network Architecture," IEEE International Symposium on High Assurance Systems Engineering, 1999.
181
Bibliography - Spacecraft
Spacecraft Applications
Tomayko, James E., Computers in Spaceflight: The NASA Experience.
Trevathan, C.E., Taylor, T.D., Hartenstein, R.G., Merwarth, A.C., and Stewart, W.N., "Development and Application of NASA's First Standard Spacecraft Computer," ACM, 1984.
Underwood, C.I., and Oldfield, M.K., "Observations on the Reliability of COTS-Device-Based Solid State Data Recorders Operating in Low-Earth Orbit."
Underwood, Craig, "18 Years of Flight Experience with the UoSAT Microsatellites."
Whitcomb, G.P., "The ESA approach to low-cost planetary missions," Acta Astronautica, 2002.
182
Bibliography - Aircraft
Aircraft Applications
Ahlstrom, Kristina, et al., "Redundancy Management in Distributed Flight Control Systems: Experience & Simulations," IEEE, 2002.
Bleeg, Robert J., "Commercial Jet Transport Fly-By-Wire Architecture Considerations," DASC, 1988.
Borinski, J.W., Schetz, J., "Aircraft Health Monitoring Using Optical Fiber Sensors," IEEE, 2000.
Boskovic, J.D., Li, S., Mehra, R.K., "A Decentralized Fault-Tolerant Scheme for Flight Control Applications," ACC, 2000.
Briere, Dominique, and Traverse, Pascal, "Airbus A320/A330/A340 Electrical Flight Controls: A Family of Fault Tolerant Systems," IEEE, 1993.
Driscoll, Kevin, and Hoyme, Kenneth, "The Airplane Information Management System: An Integrated Real-time Flight-deck Control System," IEEE, 1992.
Favre, C., "Fly-by-wire for commercial aircraft: the Airbus experience," Int. Journal of Control, 1994.
Glista, S., "Lessons Learned from the F-22 Avionics Integrity Program," IEEE, 1998.
Hammond, Ronald A., Newman, David S., and Yeh, Y.C., "On Fly-by-Wire Control System and statistical analysis of system performance," Simulation, October 1989.
Kowal, B.W., Scherz, C.J., Quinlivan, R., "C-17 flight control system," IEEE, 1992.
Kowal, Brian W., Scherz, Carl J., Quinlivan, Richard, "C-17 Flight Control System Overview," IEEE, 1992.
Miller, R.J., and McGlone, M.E., "Development of an Integrated Fault Tolerant Engine Control," AIAA, 1981.
Popp, D.J., Kahler, R.L., "C-17 flight control systems software design," IEEE, 1992.
183
Bibliography - Aircraft
Aircraft Applications
Schrage, D.P., Vachtsevanos, G., "Software Enabled Control (SEC) for Intelligent UAVs," AIAA, 2002.
Sudolsky, M.D., "C-17 O-level fault detection and isolation BIT improvement concepts," IEEE, 1996.
Tomayko, James E., Computers Take Flight: A History of NASA's Pioneering Digital Fly-By-Wire Project, NASA SP-2000-4224.
Tuttle, F.L., Kisslinger, R.L., "Verification and Validation of F-15 S/MTD Unique Software."
Tuttle, F.L., Kisslinger, R.L., Ritzema, D.F., "F-15 S/MTD IFPC Fault Tolerant Design," IEEE, 1990.
Uczekaj, John S., "Reusable Avionics Software: Evolution of the Flight Management System," IEEE, 1995.
Walter, Chris J., "MAFT: An Architecture for Reliable Fly-By-Wire Flight Control," DASC, 1988.
Yeh, Y.C., "Safety Critical Avionics for the 777 Primary Flight Controls System," IEEE, 2001.
Yeh, Y.C., "Triple-Triple Redundant 777 Primary Flight Computer," IEEE, 1996.
Yeh, Y.C., "Design considerations in Boeing 777 fly-by-wire computers," IEEE, 1995.
184
Bibliography - Automotive
Automotive Applications
Fuehrer, T., Hugel, R., Hartwich, F., Weiler, H., "FlexRay - The Communication System for Future Control Systems in Vehicles," SAE, 2003.
Poledna, S., Glück, M., and Tanzer, C., "OSEKtime: A Dependable Real-Time Fault-Tolerant Operating System and Communication Layer as an Enabling Technology for By-Wire Applications," SAE, 2000.
Quigley, C.P., Tan, F.H., Tang, K.H., McLaughlin, R.T., "An Investigation into the Future of Automotive In-Vehicle Control Networking Technology," SAE, 2001.