Rose Hill - PowerPoint PPT Presentation

About This Presentation
Title:

Rose Hill

Description:

Using environmental tests as dependability benchmarking tools ... on Dependable Systems and Networks, Yokohama, Japan, June 2005, pp. 754-759. ... – PowerPoint PPT presentation


Transcript and Presenter's Notes

Title: Rose Hill


1
Dependability Benchmarking of VLSI Circuits
Cristian Constantinescu
cristian.constantinescu_at_intel.com
Intel Corporation
2
Outline
  • Neutron SER characterization of microprocessors
  • SER scaling trends
  • Experimental set-up
  • Experimental Results
  • Other sources of errors
  • Memory intermittent faults
  • Front side bus intermittent faults
  • Using environmental tests as dependability
    benchmarking tools
  • Temperature and Voltage Operating Test
  • ESD Operating Test
  • Summary
  • Backup
  • Linpack benchmark
  • References
  • Acknowledgement
  • Neutron SER characterization: Bruce Takala, Steve
    Wander (LANSCE), Nelson Tam, Pat Armstrong (Intel
    Corp.)
  • Environmental testing: John Blair, Scott
    Scheuneman (Intel Corp.)

3
Neutron SER Characterization of Microprocessors
4
Single Event Upsets
  • Single event upsets (SEU) are induced by:
  • Alpha particles generated during radioactive
    decay of the package and interconnect materials
  • Neutrons, protons, and pions generated by cosmic
    rays penetrating the atmosphere
  • SEU may induce errors both in storage elements
    and combinational logic
  • Frequency of occurrence of the particle-induced
    errors: soft error rate (SER)

5
SER Scaling Trends
  • Figure panels: SRAM SER per bit and per chip;
    latch SER per bit and per chip

Assumption: SRAM/latch count increases 2x per
generation
6
Hadron Cascades
Main constituents of atmospheric hadron cascades
  • Neutrons represent 94% of the hadrons reaching
    sea level
  • For terrestrial applications it makes sense to
    benchmark the impact of neutron SER

7
LANSCE Neutron Beam
  • Los Alamos Neutron Science Center (LANSCE)
  • Generates high-energy neutrons by spallation: a
    linear accelerator generates a pulsed proton beam
    that strikes a tungsten target

Energy dependence of the natural cosmic-ray
neutron flux and the LANSCE neutron flux
8
Experimental Set Up
  • Itanium processor based server
  • Windows NT 4.0 operating system
  • Linpack benchmark
  • Performs matrix computations
  • Derives residues, which can detect silent data
    corruption (SDC)
  • Fission ion chamber to determine neutron fluence
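The residue check listed above can be illustrated with a minimal sketch. This is not the Linpack code itself: it solves a small random system with plain Gaussian elimination and computes a normalized residual in the spirit of that check. The function names, the matrix size, the injected perturbation, and the threshold are all assumptions chosen for illustration.

```python
import random
import sys


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def normalized_residual(a, x, b):
    """max|Ax - b| scaled by n * max|A| * max|x| * machine epsilon."""
    n = len(b)
    resid = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i])
                for i in range(n))
    norm_a = max(abs(v) for row in a for v in row)
    norm_x = max(abs(v) for v in x)
    return resid / (n * norm_a * norm_x * sys.float_info.epsilon)


random.seed(0)
n = 50  # illustrative size; the experiment used 800 and 1000
a = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
b = [sum(row) for row in a]  # the exact solution is a vector of ones

x = solve(a, b)
clean = normalized_residual(a, x, b)

# Emulate silent data corruption: perturb one element of the result.
x_bad = x[:]
x_bad[7] += 1e-3
corrupted = normalized_residual(a, x_bad, b)

THRESHOLD = 1e3  # illustrative cut-off, not Linpack's own criterion
print(f"clean residual: {clean:.3g}, corrupted residual: {corrupted:.3g}")
print("SDC suspected" if corrupted > THRESHOLD else "ok")
```

A correct solve leaves a normalized residual of order one, while a small corruption of a single result element inflates it by many orders of magnitude. This is why a residue check can expose errors that no hardware or OS mechanism reported.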

9
Deriving MTTF
  • MTTF = Tua / U
  • Tua: duration of an equivalent experiment
    taking place in unaccelerated conditions [h]
  • U: total number of upsets (failures) over the
    duration of the experiment
  • Tua = (Fcp × Nc) / Nf
  • Fcp: total number of fission chamber pulses
    over the duration of the experiment
  • Nc: average neutron conversion factor
    [neutrons/fission pulse/cm2]
  • Nf: cosmic-ray induced neutron flux at the
    desired geographical location and altitude
    [neutrons/cm2/h]
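As a worked example of the two formulas on this slide, the sketch below computes Tua and MTTF. All input values are hypothetical placeholders chosen to give round numbers; they are not the measurements from the experiment, and the function names are assumptions.

```python
def unaccelerated_hours(fcp, nc, nf):
    """Tua = (Fcp * Nc) / Nf, in hours.

    fcp: fission chamber pulses counted during the experiment
    nc:  neutron conversion factor [neutrons / fission pulse / cm2]
    nf:  natural neutron flux at the target site [neutrons / cm2 / h]
    """
    if nf <= 0:
        raise ValueError("neutron flux must be positive")
    return fcp * nc / nf


def mttf_hours(fcp, nc, nf, upsets):
    """MTTF = Tua / U, in hours."""
    if upsets <= 0:
        raise ValueError("at least one upset is needed to estimate MTTF")
    return unaccelerated_hours(fcp, nc, nf) / upsets


# Hypothetical inputs, for illustration only.
fcp = 2_000_000   # fission chamber pulses
nc = 50_000.0     # neutrons / fission pulse / cm2
nf = 20.0         # neutrons / cm2 / h
upsets = 25       # failures observed under the beam

tua = unaccelerated_hours(fcp, nc, nf)
mttf = mttf_hours(fcp, nc, nf, upsets)
print(f"Tua  = {tua:.3g} h")
print(f"MTTF = {mttf:.3g} h ({mttf / 8760:.0f} years)")
```

The units confirm the formula: Fcp × Nc is the total fluence in neutrons/cm2, and dividing by a flux in neutrons/cm2/h yields the number of hours of natural exposure that the beam time represents.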

10
Experimental Results
  • Run Linpack benchmark for square matrices of size
    800 and 1000
  • Completed 40 runs
  • Duration of one run: 10 s to 5 min
  • Failure types:
  • Blue screen
  • Hang
  • Silent data corruption (SDC)

11
Experimental Results
  • Itanium processor MTTF due to neutrons, as a
    function of number of runs

12
Experimental Results
  • MTTF confidence intervals
  • SDC: one event, insufficient for statistical
    analysis
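The slide does not say how the confidence intervals were derived. One common choice, assumed here, is to treat the upset count as Poisson distributed and compute an exact two-sided interval for its mean; the MTTF bounds then follow by dividing the equivalent unaccelerated duration by those limits. The sketch below uses only the standard library, and the duration and upset count are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _bisect(f, lo, hi, iters=200):
    """Root of a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poisson_mean_interval(k, confidence=0.90):
    """Exact two-sided interval for the Poisson mean given k observed events."""
    alpha = 1 - confidence
    hi_bound = k + 10 * math.sqrt(k + 1) + 20  # generous search limit
    if k == 0:
        lower = 0.0
    else:
        lower = _bisect(lambda mu: poisson_cdf(k - 1, mu) - (1 - alpha / 2),
                        0.0, hi_bound)
    upper = _bisect(lambda mu: poisson_cdf(k, mu) - alpha / 2, 0.0, hi_bound)
    return lower, upper


# Hypothetical experiment: 25 upsets over an equivalent 5e9 hours.
tua_hours = 5e9
upsets = 25
lo, hi = poisson_mean_interval(upsets)
mttf_low, mttf_high = tua_hours / hi, tua_hours / lo
print(f"MTTF 90% interval: {mttf_low:.3g} h to {mttf_high:.3g} h")

# A single event, as in the SDC case, gives a very wide interval.
sdc_lo, sdc_hi = poisson_mean_interval(1)
print(f"1 event: mean between {sdc_lo:.3g} and {sdc_hi:.3g}")
```

With one observed event the 90% limits on the mean differ by a factor of roughly 90, which illustrates why a single SDC event does not support a meaningful MTTF estimate.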

13
Practical Considerations
  • Error handling techniques differ greatly from one
    manufacturer to another
  • HW error detection and correction, e.g. ECC, is
    faster
  • FW/SW implemented recovery may be overwhelmed by
    an accelerated test (near coincident faults
    scenario)
  • Acceleration factor is an important variable
  • Failure prediction and automatic deconfiguration
    may lead to misleading results
  • Multiple experiments
  • Beam divergence
  • Beam attenuation

14
Other Sources of Errors
15
Memory Intermittent Faults
  • Intermittent faults are induced by unstable or
    marginal hardware
  • Intermittent shorts/opens
  • Manufacturing residuals
  • Timing faults

Number of memory single-bit errors reported by
193 systems over 16 months
Daily number of memory single-bit errors
reported by one system over 16 months
16
Front Side Bus Intermittent Faults
  • Front side bus (FSB) errors
  • Bursts of single-bit errors (SBE) on data path
  • SBE detected and corrected (data path protected
    by ECC)
  • Failure analysis results
  • Intermittent contacts at solder joints
  • Fault injection showed that similar faults
    experienced by control signals induce SDC

17
Using Environmental Tests as Dependability
Benchmarking Tools
18
Temperature and Voltage Operating Test
  • Ten systems were tested
  • Workload: Linpack benchmark
  • Profile of the test (figure): temperature levels of
    70 °C, 25 °C, and -10 °C
  • 9 systems experienced SDC
  • SDC events: 134 (90.5%)
  • Detected errors: 14 (9.5%)
  • SDC preceded detected errors
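The figures in parentheses above are the shares of each event type among all recorded events, which a quick check confirms. The variable names are illustrative.

```python
sdc_events = 134
detected_errors = 14
total = sdc_events + detected_errors  # 148 events in all

sdc_share = 100 * sdc_events / total
detected_share = 100 * detected_errors / total
print(f"SDC: {sdc_share:.1f}%  detected: {detected_share:.1f}%")
```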

19
Temperature and Voltage Operating Test
  • Distribution of the SDC events
  • Failure analysis results
  • Memory controller setup and hold-time violations

20
ESD Operating Test
  • 4 servers from 2 manufacturers
  • Workload: Linpack benchmark
  • 30 test points per server
  • 20 positive and 20 negative discharges per test
    point
  • Air discharge: 4 kV to 15 kV
  • Contact discharge: 8 kV
  • One server experienced SDC
  • 8 of the discharges targeted to the disk bay
    area (15 kV, air)
  • First ESD operating test to reveal SDC in a
    commercially available server

21
Summary
  • The need for dependability benchmarking is
    increasing
  • Wider use of COTS components in critical
    applications
  • Technology is a double-edged sword
  • Higher performance
  • Higher rates of occurrence of the transient and
    intermittent faults
  • SDC is a real threat
  • We take for granted the correctness of the
    computer data
  • Dependability benchmarks should determine whether
    the circuits/systems under evaluation experience
    SDC
  • Fault injection techniques require in-depth
    knowledge of the evaluated system
  • Appropriate for designers and manufacturers
  • Accelerated neutron tests and environmental tests
    are a black box approach
  • Capable of unveiling SDC
  • In-depth knowledge of the system under test is
    not required
  • Linpack benchmark is available for free
  • Can be used both by manufacturers and independent
    evaluators

22
Backup
23
Linpack Benchmark
  • Example of Linpack output: large residues
    indicate SDC

24
References
  • Neutron SER characterization of
    microprocessors, Proc. of the International
    Conference on Dependable Systems and Networks,
    Yokohama, Japan, June 2005, pp. 754-759.
  • Dependability benchmarking using environmental
    test tools, Proc. of the Reliability and
    Maintainability Symposium, Alexandria, VA, USA,
    January 2005, pp. 567-571.
  • Impact of deep submicron technology on
    dependability of VLSI circuits, Proc. of the
    International Conference on Dependable Systems
    and Networks, Washington, DC, USA, June 2002, pp.
    205-209.