SEFI Mitigation Techniques for Microprocessors

1
SEFI Mitigation Techniques for Microprocessors
Space Micro Inc.
Author: David Czajkowski, (760) 815-5330, dcz@spacemicro.com
2
MSFC & Space Micro Meeting Agenda
  1. Background: need for SEFI mitigation
  2. Hardened Core SEFI Mitigation Description
  3. Hardened Core Test Setup
  4. Proton Radiation Test Results
  5. Hardened Core Design
  6. Hardened Core Roadmap
  7. Conclusions

Paper P15
3
Background: Need for SEFI Mitigation
  • Hardened Core (aka SEFI Watchdog Controller)

4
Single Event Functional Interrupt: Microprocessors
PowerPC SEFI Data
  • SEFI (aka hangs)
    • Processor hangs caused by SEU
    • Induced by protons or heavy ions
    • All CPUs susceptible
  • CPU hangs result from
    • Illegal branching
    • Upsets in program counter
    • Undefined state machines
  • Approximate rate: 1 every 100 days for SOI PPC (10 days for CMOS)
  • SEFI problem is severe and not easily solvable
  • Power-down is the current industry solution

5
Single Event Functional Interrupt: SDRAMs
  • No known SDRAM without SEFI
  • SEFI problem greater than SEU problem
  • SEFI causes >5,000 errors and loss of memory
  • SEFI not correctable with Hamming EDAC
  • Reed Solomon EDAC bad for random access
  • No known solution

EDAC scheme                      SEUs/year   SEFIs/year
No correction (Elpida 2 Gbit)    4.1         0.96
Single-bit EDAC                  0.006       1.08
Reed-Solomon 1-nibble EDAC       1.7E-7      0.18
Reed-Solomon 2-nibble EDAC       1.5E-20     1.6E-7
Note: Data provided above from Maxwell Technologies
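
The point that a SEFI is not correctable with Hamming EDAC can be illustrated with a small sketch. It uses a textbook Hamming(7,4) code, which corrects exactly one flipped bit per code word; the code size, function names and bit positions are assumptions for illustration only, since the slides do not describe the EDAC implementation used in the SDRAM study.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming code word."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    # Bit positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(word):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the suspected bad bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def flip(word, positions):
    """Return a copy of word with the bits at the given 0-based positions inverted."""
    out = list(word)
    for pos in positions:
        out[pos] ^= 1
    return out


data = [1, 0, 1, 1]
word = hamming74_encode(data)
single_upset = hamming74_decode(flip(word, [4]))      # one flipped bit (SEU-like)
multi_upset = hamming74_decode(flip(word, [2, 4]))    # several flipped bits (SEFI-like)
print(single_upset == data, multi_upset == data)
```

A single upset is repaired, but as soon as two bits in the same word are wrong the decoder "corrects" a third bit and returns bad data. A SEFI that corrupts thousands of bits at once is far outside what such a code can handle.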
6
Shuttle Upgrade SEFI Problem
  • CAU program
  • 36 Intel flash parts
  • No replacement
  • SEFI driving system reliability over spec limit
  • Even with system changes, CAU over spec limit
  • Improving flash SEFI problem allows CAU to meet
    system reliability requirements
  • Caused major redesign of 3 subsystems

Flash Parts
7
Hardened Core SEFI Mitigation
  • Description of the Technique

8
New SEE Mitigation Techniques
  • SEFI Hardened Core detects and corrects SEFI
    faults in microprocessor
  • Time-Triple Modular Redundancy corrects SEU
    faults in microprocessor
  • Both enable the use of advanced commercial
    microprocessors in space computers
  • Enables space computers with >1,500 MIPS
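
The slides do not describe how Time-Triple Modular Redundancy is implemented. As a rough illustration of the general idea of redundancy in time, the sketch below runs the same computation three times in sequence and takes a majority vote, so a single upset result is outvoted. The function names and the simulated upset are hypothetical and are not Space Micro's TTMR design.

```python
from collections import Counter


def run_with_time_redundancy(task, *args):
    """Run task three times in sequence and return the majority result.

    Results must be hashable. Raises RuntimeError if all three runs disagree.
    """
    results = [task(*args) for _ in range(3)]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority: all three runs disagree")
    return value


def make_upset_adder(upset_on_call):
    """Hypothetical adder whose result is corrupted on one chosen call."""
    calls = {"n": 0}

    def add(a, b):
        calls["n"] += 1
        result = a + b
        if calls["n"] == upset_on_call:
            result ^= 0x10  # simulate a single-bit upset in the result
        return result

    return add


voted = run_with_time_redundancy(make_upset_adder(upset_on_call=2), 2, 3)
print(voted)
```

Voting of this kind masks a wrong value, but it cannot help if the processor stops executing altogether, which is why a separate SEFI mechanism is listed alongside it.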

9
Hardened Core System
  • More than a watchdog
  • H-Core generates periodic signal
  • If OK, CPU responds
  • If SEFI, H-Core
    • Toggles interrupt
    • S/W reboot
    • H/W reset
    • Power cycle
  • Post-SEFI status flags
  • Recovery software code
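
The challenge/response behaviour listed above can be sketched as a small model. This is only an illustration under stated assumptions: the escalation order (mildest action first) is taken from the bullet order on the slide, and the class names, the simulated CPU and the way status flags are recorded are all hypothetical, not the actual H-Core logic.

```python
from enum import Enum


class Action(Enum):
    INTERRUPT = "toggle interrupt"
    SOFTWARE_REBOOT = "S/W reboot"
    HARDWARE_RESET = "H/W reset"
    POWER_CYCLE = "power cycle"


# Assumed escalation order, mildest action first (follows the slide's bullet order).
ESCALATION = [Action.INTERRUPT, Action.SOFTWARE_REBOOT,
              Action.HARDWARE_RESET, Action.POWER_CYCLE]


class HardenedCoreModel:
    """Toy model of the periodic challenge/response loop."""

    def __init__(self, cpu_responds, apply_action):
        self.cpu_responds = cpu_responds  # callable: True if CPU answered the periodic signal
        self.apply_action = apply_action  # callable: drives one recovery action
        self.status_flags = []            # actions tried during the most recent recovery

    def poll(self):
        """Run one watchdog period; return the action that recovered the CPU, or None if healthy."""
        if self.cpu_responds():
            return None
        self.status_flags = []
        for action in ESCALATION:
            self.apply_action(action)
            self.status_flags.append(action)
            if self.cpu_responds():
                return action
        raise RuntimeError("CPU did not recover after power cycle")


class SimulatedCpu:
    """Hypothetical CPU stand-in that only recovers on one chosen action."""

    def __init__(self, recovers_on):
        self.hung = False
        self.recovers_on = recovers_on

    def responds(self):
        return not self.hung

    def apply(self, action):
        if action is self.recovers_on:
            self.hung = False
```

After a recovery, `status_flags` holds the actions that were tried, which is the kind of information recovery software could read to decide how much state needs to be rebuilt.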

10
Technical Objectives
  1. Determine the characteristics of SEFI on a CPU
  2. Develop software prototype of Hardened Core.
    Verify performance in radiation environment
  3. Develop Hardened Core architecture and initial
    product design
  4. Determine SEFI rate in combination with TTMR SEU rate
  5. Determine performance of TTMR computer with SEFI
    Watchdog

11
Hardened Core Test Setup
  • SEFI Mitigation Radiation Test Set using Pentium
    III in Versalogic VSBC-8 Computer

12
Test Set Challenges
  • Finding a processor that is not plastic, flip-chip and has known SEFI became difficult and caused schedule risk
    • Selected a plastic, flip-chip Pentium III (850 MHz)
    • Changed to proton radiation source to penetrate plastic
    • Solved de-lidding and thinning issues
  • Beam availability and high cost lowered available beam time
    • Resulting in less information on SEFI signatures
  • Found partial hardware watchdog in VSBC-8d
    • Provided unexpected additional prototype data

13
H-Core SEU Test System
VSBC-8d Computer
RS-232
Communication Link - Ethernet
Interrupt Lines
Reset Line
  • Software and hardware include
    • SEFI test loop
    • Diagnostic self-test routine
    • Hardware watchdog to Reset
    • Linux software watchdog
    • Local APIC (PIII) routines
    • Diagnostic self-test
    • Recovery code display to screen

Monitor Computer
  • Software routines
    • Mode control
    • Data collection
    • SEFI identification
    • Diagnostic self-test

14
Pentium III SEFI Test Set
VSBC Video
Monitor PC
VSBC Hardware - PIII
15
VSBC Pentium Hardware
External Fan
IDE Drive
Pentium w/ Heat Sink
Network Switch
Multiplex Card
VSBC Computer
16
SEFI Test Software
  • VSBC Linux OS
  • VSBC SEFI test loop
    • Ethernet and serial communication
    • Math test
    • Timer test
    • Network test
    • IDE test
  • Monitor
    • Communication
    • Mode control
    • Datalog
    • Parallel port control software
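
As a rough sketch of how a test loop and a monitor like those listed above could work together, the example below has the device side run a known-answer math test and the monitor side classify incoming reports. The report format, the timeout value and the labels are assumptions for illustration; the slides do not give the actual test software.

```python
EXPECTED = {"math": 328350}  # sum of the squares 0..99, the known answer


def run_self_tests():
    """Device side: run known-answer tests and return their results."""
    return {"math": sum(i * i for i in range(100))}


def classify_reports(reports, expected, timeout_s):
    """Monitor side: label each report and flag gaps between reports.

    reports is a list of (timestamp_s, results) tuples in arrival order.
    A gap longer than timeout_s is logged as a suspected SEFI at the time
    of the last report received before the gap.
    """
    events = []
    previous_time = None
    for timestamp, results in reports:
        if previous_time is not None and timestamp - previous_time > timeout_s:
            events.append((previous_time, "suspected SEFI"))
        label = "ok" if results == expected else "data error"
        events.append((timestamp, label))
        previous_time = timestamp
    return events
```

The split matters for the data log: a wrong answer with the loop still running looks like an SEU-type data error, while a loop that stops reporting looks like a hang.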

17
Selected Pentium Control Signals
  • BINIT - bus state machine reset
  • INIT - resets integer registers
  • LINT0 - INTR interrupt (no available interrupt vector)
  • IRQ5 - INTR hardware signal through PCI bus
  • LINT1 - non-maskable interrupt (NMI)
  • RESET - PIII hardware reset
  • SMI - system management interrupt (not tested, no available interrupt vector on VSBC)

18
How to Connect to PIII Signals?
  • NOT EASY
  • Multiplex with VSBC's signals
  • De-populate PIII pins and hardwire to MUX circuit
  • Have good technicians

19
Proton Radiation Test Results
  • Tested Hardened Core using Intel Pentium III in
    Proton Environment

20
Hardened Core Radiation Test
  • Tested at UC Davis with 51 MeV protons
  • Test CPU was an 850 MHz Intel Pentium III
  • Summary results
    • 21 SEFIs induced
    • 21 recoveries by SEFI Watchdog functions
    • IRQ, NMI and Reset brought back Pentium III
  • Patent pending
  • RESULT: Hardened Core proven with protons
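
A quick statistical reading of the 21-for-21 result, offered as a sketch and not as part of the original analysis: when every one of n independent trials succeeds, the one-sided lower confidence bound on the true success probability is alpha^(1/n) (the zero-failure case of the Clopper-Pearson interval). The 95% confidence level and the independence of trials are assumptions.

```python
import math


def success_lower_bound(trials, confidence=0.95):
    """One-sided lower bound on success probability when all trials succeeded."""
    alpha = 1.0 - confidence
    return alpha ** (1.0 / trials)


def trials_needed(target_probability, confidence=0.95):
    """Number of consecutive successes needed to demonstrate target_probability."""
    alpha = 1.0 - confidence
    return math.ceil(math.log(alpha) / math.log(target_probability))


bound = success_lower_bound(21)
print(round(bound, 3))
```

Under these assumptions, 21 recoveries out of 21 supports a recovery probability of roughly 0.87 or better at 95% confidence, and demonstrating 0.99 would take on the order of 300 SEFIs without a failure.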

21
Detailed Test Results
22
H-Core Success Rate by Signal
23
Hardened Core Design
24
Hardened Core is More Than a Chip!
  • Timer code when NO SEFI
  • KILL threads post-SEFI
  • Read H-Core status flags
  • Flush cache registers
  • Recovery routines
  • Rollback software routines
    • Rollback data stored in memory
    • Store critical variables periodically
    • Store instruction pointer locations
  • Software and hardware
  • Software allows for post-SEFI recovery
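
The rollback idea in the list above (store critical variables and a resume location periodically, then restore them after a SEFI) can be sketched as follows. This is a minimal illustration under assumptions: `resume_point` stands in for a stored instruction pointer, the checkpoint lives in ordinary process memory, and the SEFI is simulated by corrupting the state at a chosen step.

```python
import copy


class CheckpointStore:
    """Holds the most recent copy of the critical variables and resume point."""

    def __init__(self):
        self._saved = None

    def save(self, variables, resume_point):
        self._saved = (copy.deepcopy(variables), resume_point)

    def rollback(self):
        if self._saved is None:
            raise RuntimeError("no checkpoint saved")
        variables, resume_point = self._saved
        return copy.deepcopy(variables), resume_point


def run_job(steps, checkpoint_every, sefi_at_step=None):
    """Sum 0..steps-1, checkpointing periodically and surviving one simulated SEFI."""
    store = CheckpointStore()
    state = {"total": 0}
    step = 0
    store.save(state, step)
    sefi_pending = sefi_at_step is not None
    while step < steps:
        if sefi_pending and step == sefi_at_step:
            sefi_pending = False
            state["total"] = -999              # simulate state corrupted by the hang
            state, step = store.rollback()     # restore last good variables and position
            continue
        state["total"] += step
        step += 1
        if step % checkpoint_every == 0:
            store.save(state, step)
    return state["total"]
```

The checkpoint interval trades overhead against lost work: a shorter interval means less recomputation after a rollback but more time spent saving state.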

25
Programmable Hardened Core Block Diagram
  • Usable for all CPUs
  • Minimum of 8 interrupt signals
  • MOSFET driver output for power cycle control
  • Variable pulse width
  • Variable timer length
    • 1 ms, 1 s, 1 min, etc.
  • Status of CPU saved
    • Flags available
  • External ON/OFF control
  • External H-Core reset
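
To make the programmable parameters above concrete, here is a hypothetical configuration record. The field names, the validation rules and the reference clock used to convert a timer length into ticks are all assumptions for illustration; the slide does not define a register map or configuration interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HCoreConfig:
    """Hypothetical settings for a programmable watchdog block."""

    timer_period_s: float     # variable timer length, e.g. 1e-3, 1.0 or 60.0
    pulse_width_s: float      # variable width of the pulse driven on an output line
    interrupt_lines: int = 8  # the slide lists a minimum of 8 interrupt signals
    enabled: bool = True      # models the external ON/OFF control

    def __post_init__(self):
        if self.timer_period_s <= 0 or self.pulse_width_s <= 0:
            raise ValueError("timer period and pulse width must be positive")
        if self.interrupt_lines < 8:
            raise ValueError("at least 8 interrupt lines are expected")

    def timer_ticks(self, clock_hz):
        """Timer length expressed in ticks of an assumed reference clock."""
        return round(self.timer_period_s * clock_hz)


fast = HCoreConfig(timer_period_s=1e-3, pulse_width_s=1e-6)
print(fast.timer_ticks(1_000_000))
```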

26
Predicted SEU/SEFI Rates: Proton100k Computer
  • SEFI: 1E-2 corrected resets/day
  • Using Hardened Core
  • 2,400 MIPS, 64 bits @ 400 MHz
  • >1,440 MIPS SEU corrected
  • SEU: < 1E-5 uncorrected errors/day
  • No SEL
  • Total dose > 100 krad
  • 4.9 W CPU, 8 W total power
  • VxWorks and Linux OS s/w
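
The per-day rates above are easier to read as intervals and mission probabilities. The sketch below assumes events follow a Poisson process with a constant rate and uses a one-year mission purely as an example; neither assumption comes from the slides.

```python
import math


def mean_days_between_events(rate_per_day):
    """Average interval between events for a constant event rate."""
    return 1.0 / rate_per_day


def prob_at_least_one(rate_per_day, mission_days):
    """Probability of one or more events during the mission (Poisson model)."""
    return 1.0 - math.exp(-rate_per_day * mission_days)


sefi_interval = mean_days_between_events(1e-2)   # corrected SEFI resets
seu_interval = mean_days_between_events(1e-5)    # uncorrected SEU errors (upper bound on rate)
p_sefi_year = prob_at_least_one(1e-2, 365)
p_seu_year = prob_at_least_one(1e-5, 365)
print(sefi_interval, seu_interval, p_sefi_year, p_seu_year)
```

On these assumptions a corrected SEFI reset is expected about once every 100 days, so at least one in a year is very likely (about 97%), while an uncorrected SEU error is expected less than once in 100,000 days, or under a 0.4% chance per year.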

27
Hardened Core Roadmap
  • From Inception to Availability

28
Hardened Core Roadmap
Roadmap diagram labels: Chip Design, Preliminary Design, H-Core Inception, Benchtop Model, Radiation Verification, Software Design
  • Effort is complete
    • Verification of H-Core complete
    • Preliminary H-Core design complete
  • Future
    • Design and manufacture as rad-hard chip
    • Improve H-Core software routines

29
Future Research Options
  • Collect additional microprocessor recovery data
    • SEFI test additional processors (PowerPC, BSP-15, TI DSP)
    • SEFI test more samples (statistical improvement)
  • Radiation test simpler microprocessor structures
    • State machine logic (in FPGA)
    • Instruction pointer to software (in simple micro-controller)
    • Memory cells
    • Embedded test logic
  • Radiation test improved recovery software routines
    • Thread kill cleanup routines
    • H-Core status flag check, used as pointer to restart routines
    • Restart routines
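
The idea of using the H-Core status flag as a pointer to restart routines can be sketched as a dispatch table. The flag names, the routines and their order are hypothetical; they only illustrate the lookup pattern and are not the recovery software described in the presentation.

```python
def kill_hung_threads(log):
    log.append("kill hung threads")


def resume_from_checkpoint(log):
    log.append("resume from checkpoint")


def restart_application(log):
    log.append("restart application")


def full_restart(log):
    log.append("full restart")


# Hypothetical mapping from the recorded recovery action to restart routines.
RESTART_ROUTINES = {
    "interrupt": [kill_hung_threads, resume_from_checkpoint],
    "hardware_reset": [restart_application, resume_from_checkpoint],
    "power_cycle": [full_restart],
}


def recover(status_flag):
    """Run the routines selected by the status flag and return a log of what ran."""
    log = []
    for routine in RESTART_ROUTINES.get(status_flag, [full_restart]):
        routine(log)
    return log


print(recover("interrupt"))
```

Falling back to the most conservative routine for an unknown flag is a deliberate choice in this sketch: the flag itself could have been upset, so an unrecognised value is treated as the worst case.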

30
Hardened Core Planned Availability
  • Hardened Core has been added to Space Micro's Proton100k computer product
  • Circuit for H-Core in Actel FPGAs available now
  • Stand-alone H-Core IC product available in 2004
  • Application software kernels will be made
    available to customers

31
Conclusions
  • SEFI is a growing problem for microprocessors
  • New Hardened Core H/W and S/W solution
  • Hardened Core benchtop model radiation tested
    • 850 MHz Intel Pentium III test device
    • Proton radiation testing completed
    • Results show 100% success rate
  • Preliminary design of H-Core complete
  • Added to Proton100k satellite computer
  • Space Micro plans to design and manufacture a rad-hard chip for commercial availability