An Architectural framework for evaluating impact of soft errors in arithmetic units

1
An Architectural framework for evaluating impact
of soft errors in arithmetic units
  • Rajaraman R
  • Jie Hu

2
Overview
  • Introduction
  • Circuit level estimation (Qcritical)
  • Single bit adders
  • Four bit adders
  • Results and optimizations
  • Converting Qcritical to SER
  • Architectural simulations
  • Results and solutions
  • Conclusion and future work

3
Introduction
  • Data paths and combinational logic are inherently
    resistant to soft errors due to three masking
    effects [Shivakumar02]
      • Logical masking (effect stays the same as
        technology scales)
      • Electrical masking (effect diminishes with scaling)
      • Latching-window masking (effect diminishes with
        scaling)
  • Their susceptibility is increasing as
      • Pipeline depth increases
      • Devices scale

4
Introduction
  • In this work we present
      • Circuit-level estimation of soft errors for
          • Single-bit adders
          • Four-bit adders
      • A discussion of solutions based on concurrent error
        detection and other techniques
      • Architectural simulations (Jie Hu)
      • Architectural solutions for data path error
        detection and correction

5
Circuit level estimation
  • Qcritical is estimated with a flip-flop (FF) at the
    adder output
  • Here we estimate Qcritical for
      • Single-bit adders
          • Mirror adder
          • Transmission-gate-based adder
          • Half-adder-based full adder
          • XOR-based full adder
      • Four-bit adders
          • Ripple carry adder
          • Carry skip adder
          • Prefix adder (Brent-Kung)

6
Single bit adders
[Figure: mirror adder and transmission gate adder schematics, marking the nodes evaluated for Qcritical]
7
Single bit adders
[Figure: half-adder-based FA and XOR-based FA schematics with inputs A, B, Cin and internal signals G, P, marking the nodes evaluated for Qcritical]
8
Four bit adders
  • Ripple carry adder
      • A flip at the lowest FA cell takes a very high
        Qcritical to affect the MSB
      • But in the worst case it affects all sum bits
      • It may often result in multi-bit errors
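
The carry-propagation argument above can be illustrated with a small bit-level model. This is only a sketch under assumptions not in the slides: the fault is modelled as an inverted carry out of one full-adder cell, and the helper names (`ripple_carry_add`, `corrupted_bits`, `flip_carry_at`) are hypothetical. It says nothing about Qcritical itself, only about how many sum bits a latched carry flip can corrupt.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a, b, width=4, flip_carry_at=None):
    """Add two `width`-bit integers cell by cell.

    If `flip_carry_at` is a cell index i, the carry out of FA cell i is
    inverted (assumed fault model: a transient latched on that carry node).
    Returns the `width`-bit sum; the final carry out is dropped.
    """
    carry = 0
    total = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        if flip_carry_at == i:
            carry ^= 1  # inject the fault on this cell's carry output
        total |= s << i
    return total


def corrupted_bits(a, b, flip_carry_at, width=4):
    """Count sum bits that differ between the clean and the faulty addition."""
    good = ripple_carry_add(a, b, width)
    bad = ripple_carry_add(a, b, width, flip_carry_at)
    return bin(good ^ bad).count("1")
```

With operands where every higher cell propagates its carry, `corrupted_bits(0b1110, 0b0000, flip_carry_at=0)` returns 3 (S1, S2 and S3 all flip), while `corrupted_bits(0b0000, 0b0000, flip_carry_at=0)` returns 1 because the wrong carry is absorbed at cell 1.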

9
Four bit adders
  • Carry skip (bypass) adder
      • Has a faster block-propagate signal and logic,
        which will have a lower Qcritical
      • But fewer multi-bit errors

10
Four bit adders
  • Brent-Kung adder
      • Qcritical for S3 might be lower than in the ripple
        carry adder (RCA) but higher than in the carry skip
        adder (CSA) in the worst case
      • The trade-off between multi-bit errors and Qcritical
        could be studied for different prefix adder designs
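
For readers unfamiliar with prefix adders, the following is a minimal functional sketch of a 4-bit Brent-Kung carry tree with the carry-in fixed to 0. It is an illustration, not the circuit evaluated in the slides; the function names are hypothetical. It shows why shared prefix nodes matter: the node for span 1:0 feeds the carries into bits 2 and 3 as well as the carry out, so an upset there can reach several outputs.

```python
def prefix_op(left, right):
    """Combine (generate, propagate) of a more significant span (`left`)
    with the adjacent less significant span (`right`)."""
    g_hi, p_hi = left
    g_lo, p_lo = right
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def brent_kung_add4(a, b):
    """4-bit addition using a Brent-Kung prefix tree, carry-in assumed 0.

    Returns (sum, carry_out).
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]  # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # bit propagate
    gp = list(zip(g, p))

    # Up-sweep: build group signals for spans 1:0, 3:2, then 3:0.
    gp10 = prefix_op(gp[1], gp[0])
    gp32 = prefix_op(gp[3], gp[2])
    gp30 = prefix_op(gp32, gp10)
    # Down-sweep: fill in the remaining span 2:0.
    gp20 = prefix_op(gp[2], gp10)

    # Carry into bit i is the group generate of span (i-1):0.
    carries = [0, gp[0][0], gp10[0], gp20[0]]
    total = 0
    for i in range(4):
        total |= (p[i] ^ carries[i]) << i
    return total, gp30[0]
```

Checking the function against ordinary integer addition for all 256 operand pairs is a quick way to confirm the tree wiring.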

[Figure: 4-bit Brent-Kung prefix adder with inputs (A0, B0) to (A3, B3) and outputs S0 to S3]
11
Results
[Figure: results plot comparing the HA-based FA, mirror, and TG-based adders]
12
Results
[Figure: results plot comparing the mirror, HA-based FA, and TG-based adders]
13
Optimization techniques
  • Concurrent error detection (CED) techniques will work
    well for these adder designs
  • [Mitra00] proposes that design diversity results in
    more robust designs
  • Given the existing trade-offs among the various adder
    designs, diversity could be used to build a robust
    CED design
  • Other techniques include
      • Arithmetic coding techniques such as carry
        checking/parity prediction adders [Nicolaidis03]
      • Other redundancy techniques such as time redundancy
        [Nicolaidis99]
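
As a rough illustration of the parity prediction idea, the sketch below uses the basic identity parity(S) = parity(A) xor parity(B) xor parity(carry-ins), which follows from S_i = A_i xor B_i xor C_i. It is not the specific scheme of the cited paper; the function names are hypothetical, and the checker assumes the carry vector itself is fault-free.

```python
def parity(x):
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def add_with_carries(a, b, width=4):
    """Ripple addition that also returns the vector of carry-ins.

    Bit i of the returned `carries` is the carry into sum bit i.
    """
    carry = 0
    total = 0
    carries = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carries |= carry << i
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carries


def parity_check_ok(a, b, observed_sum, carries):
    """Compare the predicted sum parity against the observed sum parity."""
    predicted = parity(a) ^ parity(b) ^ parity(carries)
    return predicted == parity(observed_sum)
```

A single flipped sum bit fails the check, but an even number of flipped bits passes it. That is one reason the multi-bit error behaviour of each adder design matters when choosing a CED scheme.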

14
Converting Qcritical to SER
  • We know [Hazucha, 2000]
      • $\mathrm{SER} \propto N_{flux} \cdot CS \cdot \exp(-Q_{critical} / Q_s)$
      • $N_{flux}$: neutron flux (difficult to find)
      • $CS$: cross-sectional area
      • $Q_{critical}$: critical charge necessary for a bit
        flip
      • $Q_s$: charge collection efficiency (difficult to
        find)
  • Thus Qcritical is the easiest term to determine
  • Working on finding other metrics to estimate SER
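
Even without absolute values for the flux and cross section, the exponential model lets two designs be compared. The sketch below assumes the form SER proportional to Nflux × CS × exp(−Qcritical / Qs), with equal flux and area for both designs; the function names and any numbers passed in are placeholders, not measured data.

```python
import math


def relative_ser(q_critical, q_s, n_flux=1.0, cross_section=1.0):
    """SER up to an unknown proportionality constant.

    n_flux and cross_section default to 1.0 because only ratios are
    meaningful here; q_critical and q_s must share the same unit.
    """
    return n_flux * cross_section * math.exp(-q_critical / q_s)


def ser_ratio(q_crit_a, q_crit_b, q_s):
    """SER of design A divided by SER of design B, assuming equal flux
    and cross section, so those factors cancel."""
    return math.exp(-(q_crit_a - q_crit_b) / q_s)
```

The ratio depends only on the Qcritical difference and on Qs, so Qs is still needed to turn a Qcritical comparison into an SER comparison.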

15
References (Circuits)
  • [Nicolaidis99] Nicolaidis, M., "Time redundancy
    based soft-error tolerance to rescue nanometer
    technologies," Proceedings of the 17th IEEE VLSI
    Test Symposium, 25-29 April 1999, pp. 86-94.
  • [Nicolaidis03] Nicolaidis, M., "Carry
    checking/parity prediction adders and ALUs," IEEE
    Transactions on Very Large Scale Integration
    (VLSI) Systems, Vol. 11, No. 1, Feb. 2003,
    pp. 121-128.
  • [Mitra00] Mitra, S. and McCluskey, E.J., "Which
    concurrent error detection scheme to choose?"
    Proceedings of International Test Conference,
    3-5 Oct. 2000, pp. 985-994.
  • [Shivakumar02] Shivakumar, P., Kistler, M.,
    Keckler, S.W., Burger, D., and Alvisi, L., "Modeling
    the effect of technology trends on the soft error
    rate of combinational logic," Proceedings of the
    International Conference on Dependable Systems
    and Networks, 23-26 June 2002, pp. 389-398.
  • [Hazucha, 2000] Hazucha, P. and Svensson, C.,
    "Impact of CMOS Technology Scaling on the
    Atmospheric Neutron Soft Error Rate," IEEE
    Transactions on Nuclear Science, Vol. 47, No. 6,
    Dec. 2000.

16
Evaluating the Impact of Soft Errors on the Processor
Datapath: A Focus on Int. FUs
17
Motivation
  • Soft errors are a major reliability problem in
    processor design
  • Processor components are more susceptible to soft
    errors in new technologies
  • Cache structures can be well protected by parity,
    ECC, etc.
  • Combinational logic: time/space redundancy
  • Plenty of work on error detection/recovery
    [6, 5, 4]
  • How do soft errors in combinational logic affect the
    system?
  • Are there better cost-effective, reliable designs?

18
Superscalar Processor Core
19
A Focus on Functional Units
  • Integer functional units have a wide range of
    impact on program execution
      • Conditions of branches
      • Addresses of data references
      • Addresses of function references
  • Floating-point functional units
      • Mostly for numerical operations
      • Less impact on program execution
  • Today's focus: Int. ALU (adder, logic), Int.
    MULT/DIV

20
Experimental Setup
21
Error Injection Scheme
  • Error injection based on hardware
      • Needs circuit details of the functional units
      • Different processors may use different design styles
      • Difficult to get the error-infected results at the
        architectural level
  • A more effective way
      • Introduce soft errors at one of the operation's
        source operands
      • Restore the original source operand value if the
        result register number differs from that source
        register
  • Experimental scheme
      • Only consider SEUs (single event upsets)
      • Simulate at most 0.5 billion committed instructions
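
The operand-based injection scheme can be sketched as follows. This is a simplified model, not the simulator used in the experiments: the register file is a plain dictionary, the operation is a Python callable, and the names `flip_random_bit` and `execute_with_injection` are hypothetical. The choice of a uniformly random bit position is also an assumption.

```python
import random


def flip_random_bit(value, width=32, rng=random):
    """Return `value` with one randomly chosen bit inverted (a single event upset)."""
    bit = rng.randrange(width)
    return value ^ (1 << bit)


def execute_with_injection(regs, op, dest, src1, src2, width=32, rng=random):
    """Execute regs[dest] = op(regs[src1], regs[src2]) with an SEU on src1.

    The corrupted operand is visible only to this operation: if the
    destination is a different register, the original source value is
    restored afterwards. If dest == src1, the result overwrites it anyway.
    """
    mask = (1 << width) - 1
    original = regs[src1]
    regs[src1] = flip_random_bit(original, width, rng)
    result = op(regs[src1], regs[src2]) & mask
    regs[dest] = result
    if dest != src1:
        regs[src1] = original  # undo the corruption of the source register
    return result
```

Corrupting an operand rather than the unit's internals avoids needing circuit details, at the cost of not modelling faults that would only affect part of the result.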

22
Addition Operations
  • Inject soft errors into addition operations at a
    fixed interval (10,000 cycles) until program
    execution crashes
  • Error accumulation results in program crashes
  • Different applications have different resistance
    to errors
  • Additional experiment: a single error injected at
    different cycle times did not crash the program

23
Addition Uniform Error Rate
  • Inject soft errors with a uniformly distributed
    probability, at several different error rates
  • For all benchmarks, an error rate of 0.0001 is the
    most sensitive point
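
One way to read "uniform error rate" is that each addition is independently corrupted with a fixed probability. The sketch below follows that reading, which is an assumption, and applies it to a list of operand pairs rather than to a real instruction stream; `run_with_error_rate` is a hypothetical name.

```python
import random


def run_with_error_rate(operations, error_rate, width=32, seed=0):
    """Perform additions, flipping one operand bit with probability `error_rate`.

    operations: iterable of (a, b) operand pairs.
    Returns (results, injected_count).
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    mask = (1 << width) - 1
    results = []
    injected = 0
    for a, b in operations:
        if rng.random() < error_rate:
            a ^= 1 << rng.randrange(width)  # single-bit upset on operand a
            injected += 1
        results.append((a + b) & mask)
    return results, injected
```

Sweeping `error_rate` over several values (for example 0.0001 and its neighbours) and comparing program outcomes would mirror the experiment described above.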

24
Logic Uniform Error Rate
  • 175.vpr (FPGA placement and routing) is more
    sensitive to errors that occur during logic
    operations
  • 256.bzip2 (compression) can survive a large
    number of errors

25
ALU Uniform Error Rate
  • A combined effect of errors in both addition
    and logic operations
  • All benchmarks show exacerbated behavior
    except 175.vpr

26
MULT/DIV Uniform Error Rate
  • In general, programs can still survive errors in
    MULT/DIV operations, because these operations are
    fewer and less closely tied to program execution
    control.

27
Conclusions and Ongoing Work
  • Conclusions
      • Errors in different Int. operations have
        different impact on program execution
      • Different programs behave differently under
        error injection
      • Control-intensive (lower IPB) applications are
        more sensitive to logic operation errors
      • Multiplication/division operations have less
        impact on program execution
  • Future work
      • More detailed characterization of program
        behavior under error impact
      • Modeling the soft error rate from Qcritical for
        arithmetic units
      • Use the above information to develop
        selective error protection/detection/recovery
        schemes

28
References
  • [1] Ghani A. Kanawati, Nasser A. Kanawati, and
    Jacob A. Abraham. FERRARI: A Flexible
    Software-Based Fault and Error Injection System.
    IEEE Transactions on Computers, 44(2):248-260,
    February 1995.
  • [2] S. Mitra and E. J. McCluskey. Which concurrent
    error detection scheme to choose? In Proceedings
    of International Test Conference, pages 985-994,
    October 2000.
  • [4] Nahmsuk Oh, Subhasish Mitra, and Edward J.
    McCluskey. ED4I: Error Detection by Diverse Data
    and Duplicated Instructions. IEEE Transactions on
    Computers, 51(2):180-199, February 2002.
  • [5] Joydeep Ray, James C. Hoe, and Babak Falsafi.
    Dual Use of Superscalar Datapath for
    Transient-Fault Detection and Recovery. In Proc.
    of the 34th Annual International Symposium on
    Microarchitecture, 2001.
  • [6] E. Rotenberg. AR-SMT: A microarchitectural
    approach to fault tolerance in microprocessors.
    In Proceedings of the 29th Fault-Tolerant
    Computing Symposium, June 1999.
  • [8] J. F. Ziegler et al. IBM experiments in soft
    fails in computer electronics (1978-1994). IBM
    Journal of Research and Development, 40(1):3-18,
    1996.