Safety-Critical Systems 3 Hardware/Software - PowerPoint PPT Presentation

Slides: 35
Provided by: CT5

1
Safety-Critical Systems 3 Hardware/Software
  • T 79.232
  • Ilkka Herttua

2
Current situation / critical systems
  • Based on the data on recent failures of critical
    systems, the following can be concluded:
  • Failures become more and more distributed and
    often nation-wide (e.g. commercial systems like
    credit card denial of authorisation)
  • The source of failure is more rarely in hardware
    (physical faults), and more frequently in system
    design or end-user operation / interaction
    (software).
  • The harm caused by failures is mostly economical,
    but sometimes health and safety concerns are also
    involved.
  • Failures can impact many different aspects of
    dependability (dependability: the ability to deliver
    service that can justifiably be trusted).

3
Examples of computer failures in critical systems
4
Driving force: federation
  • Safety-related systems have traditionally been
    based on the idea of federation. This means that a
    failure of any equipment should be confined and
    should not cause the collapse of the entire
    system.
  • When computers were introduced to safety-critical
    systems, the principle of federation was in most
    cases kept in force.
  • Applying federation means that the Boeing 757/767
    flight management control system has 80 distinct
    microprocessors (300 if redundancy is taken into
    account). Although this number of
    microprocessors is no longer too expensive, the
    principle of federation causes other problems.

5
Hardware Faults
  • Intermittent faults
  • Fault occurs and recurs over time (loose
    connector)
  • Transient faults
  • Fault occurs and may not recur (lightning)
  • Electromagnetic interference
  • Permanent faults
  • Fault persists / physical processor failure
    (design fault, overcurrent)

6

Fault Tolerance
  • Fault-tolerant hardware: achieved mainly by
    redundancy
  • Redundancy: adds cost, weight, power
    consumption and complexity
  • Other means: improved maintenance, a single
    system with better materials (higher MTBF)

7
Redundancy types
  • Active Redundancy
  • Redundant units are always operating.
  • Dynamic Redundancy (standby)
  • Failure has to be detected
  • Changeover to another module

8
Hardware redundancy techniques
  • Active techniques
  • Parallel (k of N)
  • Voting (majority/simple)
  • Standby
  • Operating: hot standby
  • Non-operating: cold standby
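The parallel (k of N) and voting techniques above can be illustrated with a short Python sketch. It assumes the textbook model: N identical units that fail independently, each with the same reliability r. Real redundant channels can share common-mode failures, so treat the result as an optimistic bound. The function names are hypothetical.

```python
from collections import Counter
from math import comb


def k_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n independent units, each with
    reliability r, are working (binomial k-of-N model)."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


def majority_vote(outputs):
    """Return the value reported by a strict majority of channels,
    or None when no majority exists (the voter cannot decide)."""
    if not outputs:
        return None
    value, votes = Counter(outputs).most_common(1)[0]
    return value if votes * 2 > len(outputs) else None


# Triple modular redundancy (2-of-3) with units of reliability 0.9
print(round(k_of_n_reliability(2, 3, 0.9), 3))  # 0.972
print(majority_vote([42, 42, 17]))  # 42: the single faulty channel is outvoted
```

For 2-of-3 voting the system reliability is 3r² - 2r³. This is higher than a single unit's r only while r > 0.5, so voting pays off only with reasonably reliable units.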

9
Reliability prediction
  • Electronic Component
  • Based on probability and statistics
  • MIL-Handbook 217: experimental data on actual
    device behaviour
  • Manufacturer information and allocated circuit
    types
  • Bathtub curve: burn-in, useful life, wear-out

10
Reliability calculation for system
  • MTTF (mean time to failure): average time for which
    the system would operate before the first failure
  • MTTR (mean time to repair): time to get the system
    back in service again
  • MTBF (mean time between failures)
  • MTBF = MTTF + MTTR
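These quantities combine as in the small Python sketch below. The figures are hypothetical. The availability formula MTTF / (MTTF + MTTR) is the standard steady-state definition and is not on the slide. The R(t) formula assumes a constant failure rate λ = 1/MTTF, which only holds on the flat, useful-life part of the bathtub curve.

```python
import math


def mtbf(mttf: float, mttr: float) -> float:
    """MTBF = MTTF + MTTR (all in the same time unit, e.g. hours)."""
    return mttf + mttr


def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability: fraction of time the system is in service."""
    return mttf / (mttf + mttr)


def reliability(t: float, mttf: float) -> float:
    """R(t) = exp(-t / MTTF), valid only for a constant failure rate
    lambda = 1 / MTTF (useful-life region of the bathtub curve)."""
    return math.exp(-t / mttf)


mttf_h, mttr_h = 990.0, 10.0  # hypothetical figures in hours
print(mtbf(mttf_h, mttr_h))  # 1000.0
print(availability(mttf_h, mttr_h))  # 0.99
print(round(1 - availability(mttf_h, mttr_h), 6))  # 0.01 (unavailability)
print(round(reliability(mttf_h, mttf_h), 4))  # 0.3679
```

The last line is a useful reminder: with a constant failure rate, only about 37% of units are still working at t = MTTF.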

11
Safety-Critical Hardware
  • Fault Detection
  • Routines to check that the hardware works
  • Signal comparisons
  • Information redundancy: parity checks etc.
  • Watchdog timers
  • Bus monitoring: check that the processor is alive
  • Power monitoring
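Information redundancy by parity can be sketched in Python as follows. A parity bit is computed when a word is stored and checked again when it is read. This assumes even parity over a non-negative integer word, and the helper names are hypothetical.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit: makes the total number of 1 bits (data + parity) even."""
    if word < 0:
        raise ValueError("word must be a non-negative bit pattern")
    return bin(word).count("1") % 2


def parity_ok(word: int, stored_parity: int) -> bool:
    """Recompute parity on read and compare it with the stored bit."""
    return parity_bit(word) == stored_parity


stored = 0b1011_0010
p = parity_bit(stored)            # computed when the word is written
corrupted = stored ^ 0b0000_1000  # single-bit upset while stored
print(parity_ok(stored, p))       # True
print(parity_ok(corrupted, p))    # False: error detected
```

A single parity bit detects any odd number of flipped bits, but it misses double-bit errors and cannot locate the faulty bit. This is why error-correcting codes (e.g. Hamming codes) are used where correction is needed.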

12
Safety-Critical Hardware
  • Possible hardware
  • COTS Microprocessors
  • No safety firmware, least assurance
  • Redundancy makes them better, but common-mode
    failures are possible
  • Fabrication failures, microcode and
    documentation errors
  • Use components which have a service history and
    failure statistics.

13
Safety-Critical Hardware
  • Specialist Microprocessors
  • Collins Avionics/Rockwell AAMP2
  • Used in the Boeing 747-400 (30 units)
  • High cost: bench testing, documentation, formal
    verification
  • Other models: SPARC V7, TSC695E, ERC32 (ESA,
    radiation-tolerant), 68HC908GP32 (airbag)

14
Safety-Critical Hardware
  • Programmable Logic Controllers (PLC)
  • Contains power supply, interface and one or more
    processors.
  • Designed for high MTBFs
  • Firmware
  • Program stored in EEPROMs
  • Programmed with ladder or function block
    diagrams
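A PLC program runs in a repeating scan cycle: read all inputs, solve the logic, write all outputs. The Python sketch below models one classic ladder rung, a start/stop seal-in (latching) circuit. It is only an illustration, not a real PLC language such as IEC 61131-3 ladder or function block diagram. The I/O point names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Inputs:
    """Input image read at the start of one scan (hypothetical I/O points)."""
    start: bool  # normally-open "start" push button
    stop: bool   # "stop" push button, True while pressed


def scan(inputs: Inputs, motor_on: bool) -> bool:
    """Evaluate one seal-in rung and return the new output state:

        |--[ start ]--+--[/ stop ]--( motor )--|
        |--[ motor ]--+
    """
    return (inputs.start or motor_on) and not inputs.stop


def run(input_sequence: List[Inputs]) -> List[bool]:
    """Cyclic execution: read inputs, solve the rung, write the output, repeat."""
    motor = False  # output de-energised at power-up
    history = []
    for inputs in input_sequence:
        motor = scan(inputs, motor)  # output updated once per scan
        history.append(motor)
    return history


print(run([Inputs(False, False), Inputs(True, False), Inputs(False, False),
           Inputs(False, True), Inputs(False, False)]))
# [False, True, True, False, False]
```

The motor's own contact keeps the rung energised after "start" is released. Because "stop" is wired as a normally-closed condition, pressing it always wins.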

15
Safety-Critical Software
  • Correct Program
  • Normally, iteration (writing code, testing and
    modification) is needed to develop a working
    solution.
  • In a non-critical environment, code is accepted
    when the tests are passed.
  • Testing is not enough for safety-critical
    applications, which need an assessment process:
    dynamic/static testing, simulation, code analysis
    and formal verification.

16
Safety-Critical Software
  • Dependable Software
  • Process for development
  • Work discipline
  • Well documented
  • Quality management
  • Validated/verified

17
Safety-Critical Software
  • Safety-Critical Programming Language
  • Logical soundness: unambiguous definition of the
    language, without dialects (unlike C)
  • Simple definition: complexity can lead to errors
    in compilers or other support tools
  • Expressive power: the language shall support
    expressing domain features efficiently and easily
  • Security of definition: violations of the
    language definition shall be detected
  • Verification: the language supports verification,
    i.e. proving that the produced code is consistent
    with the specification.
  • Memory/time constraints: stack, register and
    memory usage are controlled.

18
Safety-Critical Software
  • Software faults
  • Requirements defects: failure of the software
    requirements to specify the environment in which
    the software will be used, or ambiguous
    requirements
  • Design defects: not satisfying the requirements,
    or documentation defects
  • Code defects: failure of the code to conform to
    the software design.

19
Safety-Critical Software
  • Software faults
  • Subprogram side effects: a called subprogram
    may change the definition of a variable.
  • Definition aliasing: different names refer to the
    same storage location.
  • Initialisation failures: variables are used before
    they are assigned values.
  • Memory management: buffer, stack and memory
    overflows
  • Expression evaluation errors: divide-by-zero,
    arithmetic overflow
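Expression evaluation errors can be caught rather than allowed to produce silently wrong values. Below is a minimal Python sketch of checked 16-bit signed arithmetic. Python integers never overflow, so the 16-bit range is emulated. Operands are assumed to be in range already, and the names are hypothetical. Division truncates toward zero, as integer division does in C99 and Ada.

```python
INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1


class ArithmeticFault(Exception):
    """Raised instead of silently returning a wrapped or undefined value."""


def _check_range(value: int, what: str) -> int:
    if not INT16_MIN <= value <= INT16_MAX:
        raise ArithmeticFault(f"16-bit overflow in {what}: {value}")
    return value


def checked_add16(a: int, b: int) -> int:
    return _check_range(a + b, "addition")


def checked_div16(a: int, b: int) -> int:
    """Integer division truncating toward zero, with explicit fault checks."""
    if b == 0:
        raise ArithmeticFault("division by zero")
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return _check_range(q, "division")  # catches INT16_MIN / -1


print(checked_add16(30000, 2767))  # 32767
print(checked_div16(-7, 2))        # -3
try:
    checked_add16(32767, 1)
except ArithmeticFault as fault:
    print(fault)  # 16-bit overflow in addition: 32768
```

The design choice here is to raise an exception so that a caller (or a fail-safe handler) must deal with the fault. Saturating at the range limits is a common alternative when continuing with a bounded value is safer than stopping.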

20
Safety-Critical Software
  • Language comparison
  • Structured assembler (wild jumps, exhaustion of
    memory, well understood)
  • Ada (wild jumps, data typing, exception
    handling, separate compilation)
  • Subset languages: CORAL, SPADE and Ada (Alsys
    CSMART Ada kernel)
  • Validated compilers for Pascal and Ada
  • Available expertise with common languages gives
    higher productivity and fewer mistakes, but C is
    still not appropriate.

21
(No Transcript)
22
Safety-Critical Software
  • Languages used
  • Boeing uses mostly Ada, but about 75 languages
    were still used for the 747-400.
  • ESA mandated Ada for mission-critical systems.
  • NASA: Space Station in Ada, some systems in C
    and assembler.
  • Car ABS systems in assembler
  • Train control systems in Ada
  • Medical systems in Ada and assembler
  • Nuclear reactor core and shutdown systems in
    assembler, migrating to Ada.

23
Safety-Critical Software
  • Tools
  • Highly reliable, validated tools are
    required: faults in the tool can result in faults
    in the safety-critical software.
  • Widespread tools are better tested
  • Use a confirmed process for using the tool
  • Analyse the output of the tool: static analysis of
    the object code
  • Use alternative products and compare results
  • Use different tools (diversity) to reduce the
    likelihood of wrong test results.

24
Safety-Critical Software
  • Designing Principles
  • Use hardware interlocks before computer/software
  • New software features add complexity; try to
    keep software simple
  • Plan for avoiding human error: an unambiguous
    human-computer interface
  • Removal of hazardous modules (e.g. the unused
    code in Ariane 5)

25
Safety-Critical Software
  • Designing Principles
  • Add barriers: hardware/software locks for critical
    parts
  • Minimise single-point failures: increase safety
    margins, exploit redundancy and allow recovery.
  • Isolate failures: don't let things get worse.
  • Fail-safe: panic shutdowns, watchdog code
  • Avoid common-mode failures: use diversity
    (different programmers, n-version programming)
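The Python sketch below illustrates watchdog code serving as a fail-safe. It models a software watchdog: the monitored task must call kick() within a timeout, and an independent checker triggers the fail-safe action otherwise. A hardware watchdog timer is normally preferred because it does not depend on the processor being monitored. The class name, the injectable clock and the timings are assumptions made to keep the example deterministic.

```python
import time
from typing import Callable


class Watchdog:
    """If the monitored task does not call kick() within `timeout` seconds,
    check() runs the fail-safe action exactly once and stays tripped
    (later kicks do not re-arm it)."""

    def __init__(self, timeout: float, on_expire: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.on_expire = on_expire
        self.clock = clock
        self.last_kick = clock()
        self.tripped = False

    def kick(self) -> None:
        """Called by the healthy task to show it is still making progress."""
        self.last_kick = self.clock()

    def check(self) -> bool:
        """Called periodically from an independent context (timer, other task)."""
        if not self.tripped and self.clock() - self.last_kick > self.timeout:
            self.tripped = True
            self.on_expire()  # e.g. drive outputs to a safe state
        return self.tripped


# Simulated clock so the example is deterministic (hypothetical timings)
now = [0.0]
events = []
wd = Watchdog(0.5, lambda: events.append("safe state"), clock=lambda: now[0])
now[0] = 0.4
wd.kick()          # the monitored task is alive
now[0] = 0.8
print(wd.check())  # False: only 0.4 s since the last kick
now[0] = 1.0
print(wd.check())  # True: 0.6 s without a kick, fail-safe action taken
print(events)      # ['safe state']
```

Latching the tripped state mirrors a panic shutdown. Once the system has gone to its safe state, it should not silently resume just because the faulty task starts kicking again.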

26
Safety-Critical Software
  • Designing Principles
  • Fault tolerance: recovery blocks (if one module
    fails, execute an alternative module).
  • Don't rely on run-time systems
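A recovery block has three parts: a checkpoint, a list of alternates (a primary module followed by alternative modules), and an acceptance test. Each alternate is tried in turn from the checkpointed state until one passes the test. The Python sketch below is one possible rendering under those assumptions. The function names and the integer-square-root example, including the deliberately faulty primary, are hypothetical.

```python
import copy
import math
from typing import Any, Callable, Sequence


class RecoveryBlockError(Exception):
    """Raised when every alternate fails its acceptance test."""


def recovery_block(state: dict,
                   alternates: Sequence[Callable[[dict], Any]],
                   acceptance_test: Callable[[dict, Any], bool]) -> Any:
    """ensure <acceptance_test> by <alternates[0]> else by <alternates[1]> ...
    else error. Each alternate runs on a copy of the checkpointed state, so a
    rejected attempt leaves no trace (backward error recovery)."""
    checkpoint = copy.deepcopy(state)
    for alternate in alternates:
        trial = copy.deepcopy(checkpoint)
        try:
            result = alternate(trial)
        except Exception:
            continue  # a crash counts as a failed alternate
        if acceptance_test(trial, result):
            state.clear()
            state.update(trial)  # commit only an accepted state
            return result
    raise RecoveryBlockError("all alternates rejected")


def primary(s: dict) -> int:
    return math.isqrt(s["x"]) + 1  # hypothetical faulty module: off by one


def secondary(s: dict) -> int:
    return math.isqrt(s["x"])


def accept(s: dict, r: int) -> bool:
    return r * r <= s["x"] < (r + 1) * (r + 1)


print(recovery_block({"x": 50}, [primary, secondary], accept))  # 7
```

The acceptance test checks the result without recomputing it the same way, so it is not simply a third copy of the algorithm. Its quality limits how many faults the block can catch.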

27
Safety-Critical Software
  • Techniques/Tools
  • Fault prevention: preventing the introduction or
    occurrence of faults by using design-support
    tools (UML with a CASE tool)
  • Fault removal: testing, debugging and code
    modification

28
Safety-Critical Software
  • Software faults
  • Faults in software tools (development/modelling)
    can result in system faults.
  • Techniques for software development
    (language/design notation) can have a great
    impact on the performance of the people involved
    and also determine the likelihood of faults.
  • The characteristics of the programming systems
    and their runtime determine how great the impact
    of possible faults on the overall software
    subsystem can be.

29
Safety-Critical Software
  • Architectural design
  • Layered structure
  • 1 - High-level command and control functions
  • 2 - Intermediate-level routines
  • 3 - I/O routines and device driver
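The three-layer structure can be sketched in Python as follows. Each layer calls only the layer directly below it, so only the driver layer touches the hardware. The valve example, the class names and the simulated register are hypothetical.

```python
class ValveDriver:
    """Layer 3: I/O routine / device driver; the only code touching 'hardware'."""

    def __init__(self) -> None:
        self._register = 0  # simulated output register standing in for real I/O

    def write(self, value: int) -> None:
        if value not in (0, 1):
            raise ValueError("register accepts 0 or 1 only")
        self._register = value

    def read(self) -> int:
        return self._register


class ValveControl:
    """Layer 2: intermediate routines built only on the driver interface."""

    def __init__(self, driver: ValveDriver) -> None:
        self._driver = driver

    def open(self) -> None:
        self._driver.write(1)

    def close(self) -> None:
        self._driver.write(0)

    def is_open(self) -> bool:
        return self._driver.read() == 1


class Supervisor:
    """Layer 1: high-level command and control; never calls the driver directly."""

    def __init__(self, valve: ValveControl) -> None:
        self._valve = valve

    def handle(self, command: str) -> bool:
        if command == "vent":
            self._valve.open()
        elif command == "seal":
            self._valve.close()
        else:
            raise ValueError(f"unknown command: {command}")
        return self._valve.is_open()


supervisor = Supervisor(ValveControl(ValveDriver()))
print(supervisor.handle("vent"))  # True
print(supervisor.handle("seal"))  # False
```

Keeping hardware access in one layer supports the partitioning and fault-limiting goals on the following slides. A driver fault is confined to, and tested at, one well-defined interface.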

30
Safety-Critical Software
  • Architectural design
  • Design is done after partitioning the required
    functions between hardware and software.
  • Complete specification of the architecture with
    components, data structures and interfaces
    (messages/protocols)

31
Safety-Critical Software
  • Architectural design
  • Test plan for each module (testability)
  • Human-computer interface
  • A change control system is needed for
    inconsistencies and inadequacies within the
    specification.
  • Verification of the architectural design against
    the specification
  • Software partitioning: modularity aids
    comprehension and isolation (fault limiting)

32
Safety-Critical Software
  • Reduction of Hazardous Conditions: summary
  • Simplify: code contains only the minimum features
    and no unnecessary or undocumented features or
    unused executable code
  • Diversity: data and control redundancy
  • Multi-version programming: a shared specification
    leads to common-mode failures, and the
    synchronisation code increases complexity
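Multi-version (n-version) programming runs independently written versions on the same inputs and votes on their outputs. Diverse floating-point code rarely agrees bit for bit, so the Python sketch below uses an inexact voter that accepts a result within a tolerance. The three "mean of readings" versions, the faulty version and the tolerance are all hypothetical.

```python
import statistics
from typing import Callable, List, Optional, Sequence


def version_a(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def version_b(xs: List[float]) -> float:
    return statistics.fmean(xs)


def version_c(xs: List[float]) -> float:
    mean = 0.0
    for i, x in enumerate(xs, start=1):  # running (incremental) mean
        mean += (x - mean) / i
    return mean


def faulty(xs: List[float]) -> float:
    return sum(xs) / (len(xs) - 1)  # hypothetical version with a design fault


def n_version_run(versions: Sequence[Callable[[List[float]], float]],
                  inputs: List[float], tol: float = 1e-9) -> Optional[float]:
    """Return a result that agrees (within tol) with a strict majority of
    ALL versions, or None if no such majority exists."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(inputs))
        except Exception:
            pass  # a crashed version casts no vote
    for candidate in outputs:
        agreeing = sum(1 for o in outputs if abs(o - candidate) <= tol)
        if agreeing * 2 > len(versions):
            return candidate
    return None


readings = [1.0, 2.0, 3.0, 4.0]
print(n_version_run([version_a, version_b, version_c], readings))  # 2.5
print(n_version_run([version_a, faulty, version_c], readings))     # 2.5
```

The voter cannot help when versions share a fault. If two versions contain the same design fault, they outvote the correct one, which is the common-mode problem the slide attributes to a shared specification.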

33
Safety-Critical Software
  • Home assignments 3
  • 6.42 (fault-tolerant system)
  • 7.15 (reliability model)
  • 9.17 (reuse of software)
  • Please email to herttua_at_eurolock.org by
    24 February 2004

34
Home assignments 1-2
  • 1.12 (primary, functional and indirect safety)
  • 2.4 (unavailability)
  • 3.23 (fault tree)
  • 4.18 (tolerable risk)
  • 5.10 (incompleteness within specification)
  • Email before 24 February to herttua_at_eurolock.org
  • 11 and 18 February: case studies / Teemu Tynjälä