Computational Limits of Reliability Evaluation - PowerPoint PPT Presentation

1
Computational Limits of Reliability Evaluation
  • Smita Krishnaswamy, George F. Viamontes,
  • Igor L. Markov, and John P. Hayes
  • Univ. of Michigan, Advanced Computer Architecture
    Lab
  • Los Alamos National Laboratory

2
Motivation
  • Problem addressed:
  • Given a probabilistic characterization of gate
    behavior, propagate single-event upset (SEU)
    information to whole circuits
  • E.g., compute the overall error rate,
    averaged over all inputs (perhaps from an input
    distribution)
  • E.g., find worst-case/best-case inputs
  • How difficult are such computations?
    (exact/approximate)
  • What information can realistically be computed?
  • How much accuracy can be achieved?

3
Loss of Accuracy Seems Inevitable
  • Computing/tracking everything is too hard
  • Accurate models/data are hard to obtain
  • Hard computations required
  • Optimization is at least as hard as evaluation
  • Applications matter: rough estimation may not
    need as much accuracy as optimization
  • E.g., given a limited area budget, which gates to
    harden?
  • Sensitivity: fidelity versus accuracy
  • Approximate modeling vs. approximate computation
  • Skip complicated models or skip hard computations?

4
Discussion
  • Minimal modeling of gates
  • Probability of being hit by a particle with Qcrit
  • Dependencies on input values
  • Simplified computation
  • Consider one path at a time
  • Consider one input at a time (sampling)
  • In this work: computational aspects
  • Is it possible to handle all paths and all
    inputs?
  • Can we get enough fidelity to improve reliability?

5
Prior Work
  • Factors for latching transient errors
  • Electrical, logical, and latching-window masking
    [Shivakumar 2002]
  • Calculation of transient error probabilities for
    gates
  • Parameters include gate area, neutron flux,
    switching voltage, and altitude [Mohanram & Touba
    2002]
  • In SERA, error rates of circuits are approximated
    with user-supplied inputs [Zhang & Shanbhag, ICCAD
    2004]
  • Fault-tolerant architectures
  • NAND-multiplexing [Von Neumann 1956]
  • Reliability improvement by selectively adding
    redundancy, TMR [Mohanram & Touba, ITC 2003]
  • However, applying these methods still requires
    careful, accurate analysis

6
Probabilistic Transfer Matrix
  • Ideal transfer matrix (ITM): the function of a
    correct gate expressed as a matrix
  • We perturb the 0s and 1s and interpret them as
    probabilities
  • Probabilistic Transfer Matrix (PTM)
  • A matrix whose (j,k)th entry represents
    P(output = k | input = j)
  • [Levin, Engin. Cybernetics 1964]
  • [Patel, Markov & Hayes, IWLS 2003]
  • Valid PTMs are stochastic matrices

[Figure: PTM of a 2-input, 1-output gate. Rows are indexed by input values 00, 01, 10, 11 and columns by output values 0, 1; the highlighted entry is P(output = 1 | input = 10).]
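
To make the ITM and PTM definitions concrete, here is a minimal sketch for a 2-input NAND gate. The choice of NAND, the single output-flip probability `p`, and the helper names are assumptions made for illustration; the slides do not fix a particular error model.

```python
def nand_itm():
    """ITM of a 2-input NAND. Rows: inputs 00, 01, 10, 11. Columns: outputs 0, 1."""
    rows = []
    for a in (0, 1):
        for b in (0, 1):
            out = 1 - (a & b)
            rows.append([1.0 if out == k else 0.0 for k in (0, 1)])
    return rows

def perturb(itm, p):
    """Assumed error model: the correct output is flipped with probability p."""
    return [[(1 - p) * x + p * (1 - x) for x in row] for row in itm]

def is_stochastic(m, tol=1e-12):
    """A valid PTM has entries in [0, 1] and every row sums to 1."""
    return all(
        abs(sum(row) - 1.0) < tol and all(0.0 <= x <= 1.0 for x in row)
        for row in m
    )

itm = nand_itm()
ptm = perturb(itm, 0.1)
# ptm[2][1] is P(output = 1 | input = 10)
```

Under this model every row of `ptm` still sums to 1, which is the stochastic-matrix property stated on the slide.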
7
Error Representation
  • PTMs can describe different error behavior for
    each input combination
  • Indeed, the incidence of errors in practice may
    depend on input values
  • For some technologies zero-to-one errors are more
    common than one-to-zero
  • Deterministic (permanent) errors can also be
    represented by PTMs
  • Stuck-at faults
  • Wrong gates

8
Examples of PTMs
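[Figure: four example matrices for a 2-input gate, each with rows indexed by inputs 00, 01, 10, 11 and columns by outputs 0, 1: the ITM; PTM 1, a stuck-at-1 (S-a-1) fault; PTM 2, a stochastic (one-way) stuck-at-1 fault; PTM 3, a wrong gate (NAND → AND).]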
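
The numeric entries of the example matrices did not survive in this transcript. The sketch below shows what matrices of these four kinds could look like, assuming the intended gate is a 2-input NAND and picking an illustrative one-way error probability `q`; all values and names are hypothetical.

```python
# Rows: inputs 00, 01, 10, 11. Columns: outputs 0, 1. All values are illustrative.
ITM_NAND = [[0, 1], [0, 1], [0, 1], [1, 0]]

# Permanent stuck-at-1 fault: the output is 1 for every input.
STUCK_AT_1 = [[0, 1] for _ in range(4)]

def one_way_stuck_at_1(itm, q):
    """Stochastic stuck-at-1: a correct 0 becomes 1 with probability q,
    while a correct 1 is never corrupted (errors go one way only)."""
    return [[row[0] * (1 - q), row[1] + row[0] * q] for row in itm]

# Wrong gate: an AND placed where a NAND was intended.
WRONG_GATE_AND = [[1, 0], [1, 0], [1, 0], [0, 1]]

ptm2 = one_way_stuck_at_1(ITM_NAND, 0.2)
```

Only the row for input 11 changes in `ptm2`, because that is the only input for which the correct NAND output is 0.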
9
Circuit PTMs
  • Circuit PTMs are created from gate PTMs with matrix
    algebra
  • Serial composition: matrix product
  • Parallel composition: tensor product
  • Given two matrices M (m by n) and N (o by p),
    the tensor product M ⊗ N is an mo-by-np matrix
    whose entries are given by
    p(k|j) = p(k1|j1) p(k2|j2)
  • Gives joint probabilities of all possible
    combinations of independent signal probabilities
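
The two composition rules can be sketched with plain Python lists. The example circuit (two inverters feeding a NAND, which together compute OR) and the helper names `kron` and `matmul` are assumptions chosen for illustration.

```python
def kron(A, B):
    """Tensor (Kronecker) product: parallel composition of two PTMs."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul(A, B):
    """Matrix product: serial composition (A feeds B)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

NOT = [[0, 1], [1, 0]]                      # rows: input 0, 1
NAND = [[0, 1], [0, 1], [0, 1], [1, 0]]     # rows: inputs 00, 01, 10, 11

# Two inverters side by side, followed by a NAND: the circuit computes OR.
level1 = kron(NOT, NOT)        # 4 x 4
circuit = matmul(level1, NAND) # 4 x 2

# With noisy inverters, each tensor-product entry is a product of the
# independent per-gate probabilities: p(k|j) = p(k1|j1) * p(k2|j2).
NOISY_NOT = [[0.1, 0.9], [0.9, 0.1]]
noisy_level1 = kron(NOISY_NOT, NOISY_NOT)
```

Here `noisy_level1[0][3]` is the probability that both inverters output 1 when both inputs are 0, that is 0.9 × 0.9.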

10
Fanouts and Wire Permutations
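  • Fanout PTM (0 → 00, 1 → 11)
  • Wire swap (01 → 10), (10 → 01)

[Figure: the fanout PTM, with rows 0, 1 and columns 00, 01, 10, 11, and the wire-swap PTM, with rows and columns 00, 01, 10, 11.]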
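
Both matrices follow directly from the mappings on this slide. The sketch below writes them out and applies them to a small assumed example in which one signal fans out to an inverter and a plain wire; the helper names are hypothetical.

```python
def kron(A, B):
    """Tensor (Kronecker) product of two matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

FANOUT = [[1, 0, 0, 0],   # 0 -> 00
          [0, 0, 0, 1]]   # 1 -> 11

SWAP = [[1, 0, 0, 0],     # 00 -> 00
        [0, 0, 1, 0],     # 01 -> 10
        [0, 1, 0, 0],     # 10 -> 01
        [0, 0, 0, 1]]     # 11 -> 11

NOT = [[0, 1], [1, 0]]
WIRE = [[1, 0], [0, 1]]   # identity: a wire passing its value through

# x fans out to an inverter (first output) and a wire (second output).
branches = matmul(FANOUT, kron(NOT, WIRE))   # 2 x 4
# Swapping the two output wires exchanges the roles of the outputs.
swapped = matmul(branches, SWAP)
```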
11
Example: Computing Circuit PTM
12
Complexity of PTM Calculations
  • Exponential worst-case space complexity
  • An n-input, m-output PTM takes space O(2^(m+n))
  • Store PTMs as algebraic decision diagrams
    (ADDs) using the QuIDDPro library for lossless
    compression
  • [Viamontes et al., Quant. Inf. Proc. 2003]
  • ADDs are variants of BDDs, used in synthesis
    [Bahar 1997]
  • Operations done on compressed forms, results
    come out compressed
  • Scalability (purely combinational circuits)
  • Current implementation scales up to circuit width
    50
  • Sufficient to handle regular fabrics (FPGAs,
    structured ASICs, etc)
  • Sufficient to handle many deeply pipelined
    circuits
  • Greater scaling: ongoing work

13
ADD Representation of PTM
  • r_i: row variables
  • c_i: column variables
  • Interleaved ordering is beneficial for tensor
    products

[Figure: ADD of a PTM with row variables r0, r1 (rows 00, 01, 10, 11) and column variable c1 (columns 0, 1).]
14
Problem: Handling Non-Square Matrices
  • ADDs usually represent square matrices
  • PTMs are generally not square (gates have fewer
    outputs than inputs)
  • The obvious extension (skipping DD variables)
    does not work
  • It causes ambiguity (two matrices below have the
    same ADD)
  • Multiplication algorithms choose the second
    interpretation, but the resulting matrix is not a
    valid PTM (not stochastic)

15
Padding Non-square Matrices with 0s
  • To facilitate matrix multiplication, use zero
    padding
  • The product of two zero-padded matrices is also a
    zero-padded matrix
  • However, tensor products do not preserve
    zero-padding
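
A small sketch of both claims, using dense lists in place of ADDs. Ideal NAND and NOT gates are used so the arithmetic is exact; the `pad` and `nonzero_cols` helpers and the padded sizes are assumptions for illustration.

```python
def kron(A, B):
    """Tensor (Kronecker) product of two matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def pad(M, rows, cols):
    """Zero-pad M to rows x cols by appending zero columns and zero rows."""
    padded = [list(row) + [0] * (cols - len(row)) for row in M]
    return padded + [[0] * cols for _ in range(rows - len(M))]

def nonzero_cols(M):
    """Indices of columns that contain at least one nonzero entry."""
    return [c for c in range(len(M[0])) if any(row[c] != 0 for row in M)]

NAND = [[0, 1], [0, 1], [0, 1], [1, 0]]   # 4 x 2
NOT = [[0, 1], [1, 0]]                    # 2 x 2

A_pad, B_pad = pad(NAND, 4, 4), pad(NOT, 4, 4)

# Products preserve padding: padding first or multiplying first agree.
product_preserved = matmul(A_pad, B_pad) == pad(matmul(NAND, NOT), 4, 4)

# Tensor products do not: the zero columns end up interleaved.
got = nonzero_cols(kron(A_pad, A_pad))            # columns of the padded tensor
want = nonzero_cols(pad(kron(NAND, NAND), 16, 16)) # columns of the padded result
```

In this example `got` is `[0, 1, 4, 5]` while `want` is `[0, 1, 2, 3]`, so the zero columns of the tensor product are no longer contiguous.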

16
Solution 1: Permutation Method
  • The columns of the incorrect matrix can be
    permuted to obtain the correct tensor product
  • The permutation matrix itself is too large
  • The permutation matrix can be decomposed as a tensor
    product of
  • I (identity) matrices
  • Larger identity matrices: tensor products of
    smaller ones
  • Rperms: I with row variables permuted (same as a
    wire permutation matrix)
  • Number of row variables is log2(number of rows)
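
The idea can be sketched on a small dense example: for two zero-padded NAND matrices, swapping the two middle column variables moves the nonzero columns back together. Writing that swap as I ⊗ SWAP ⊗ I is one possible decomposition chosen for this example, and the helper names are hypothetical; the slides do not give the exact construction.

```python
def kron(A, B):
    """Tensor (Kronecker) product of two matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def pad(M, rows, cols):
    """Zero-pad M to rows x cols by appending zero columns and zero rows."""
    padded = [list(row) + [0] * (cols - len(row)) for row in M]
    return padded + [[0] * cols for _ in range(rows - len(M))]

I2 = [[1, 0], [0, 1]]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
NAND = [[0, 1], [0, 1], [0, 1], [1, 0]]

A_pad = pad(NAND, 4, 4)
incorrect = kron(A_pad, A_pad)             # nonzero columns 0, 1, 4, 5

# Column index bits are (a1 a0 b1 b0); swapping a0 and b1 gathers the two
# padding bits a1, b1 at the front, making the nonzero columns contiguous.
perm = kron(I2, kron(SWAP, I2))            # 16 x 16 wire permutation
fixed = matmul(incorrect, perm)

correct = pad(kron(NAND, NAND), 16, 16)    # zero-padded tensor product
```

The permutation is built here as an explicit 16 × 16 matrix only because the example is tiny; the point of the decomposition on the slide is that the factors stay small.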

17
Permutation Method
  • For gates with a higher input-to-output ratio, use
    a series of these permutations
  • Recursively cut the number of non-contiguous
    zero columns in half

[Figure: the identity matrix I, the matrix Rperm, and the resulting permutation.]

18
Solution 2: Dummy Output Method
  • Add dummy outputs to make the number of row and
    column variables equal
  • Tensor with an identity and apply
    remove_redundant on an input variable
  • This adds an output but not an input
  • Use fan-in matrices to eliminate dummy variables
    and perform zero-padding
  • Fan-in matrix (abstracted identity matrix)
  • Abstraction: summing over a variable

[Figure: fan-in matrix with rows 00, 01, 10, 11. The sum of columns 0 and 1 is the new column 1; columns 0 and 1 differ only in c1.]
19
Other Operations for PTM Manipulation
  • Abstraction: rows/columns corresponding to the
    variable being zero and the variable being one
    are added together
  • Remove_redundant: fanouts to the same level need
    not be represented twice
  • If two input signals are identical, then delete
    rows where the two variables have different
    values (these rows are meaningless)
  • Can be implemented as a variation on abstraction
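
Both operations can be sketched on dense matrices. The function signatures, the variable-numbering convention (variable 0 is the most significant index bit), and the example of one signal feeding an inverter and a wire are assumptions for illustration; the actual implementation works on ADDs.

```python
def kron(A, B):
    """Tensor (Kronecker) product of two matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def abstract_column_var(M, var, nvars):
    """Add together the columns that differ only in column variable `var`."""
    shift = nvars - 1 - var
    out = [[0] * (len(M[0]) // 2) for _ in M]
    for r, row in enumerate(M):
        for c, x in enumerate(row):
            high = c >> (shift + 1)
            low = c & ((1 << shift) - 1)
            out[r][(high << shift) | low] += x
    return out

def remove_redundant(M, v1, v2, nvars):
    """Delete rows in which row variables v1 and v2 take different values."""
    def bit(index, v):
        return (index >> (nvars - 1 - v)) & 1
    return [row for r, row in enumerate(M) if bit(r, v1) == bit(r, v2)]

NOT = [[0, 1], [1, 0]]
WIRE = [[1, 0], [0, 1]]
level = kron(NOT, WIRE)   # row variables (x1, x2), column variables (y1, y2)

# If x1 and x2 are the same fanned-out signal, rows 01 and 10 are meaningless.
shared_input = remove_redundant(level, 0, 1, 2)

# Summing over y2 leaves the distribution of y1 alone.
only_y1 = abstract_column_var(level, 1, 2)
```

In this example `shared_input` equals the fanout PTM multiplied by `level`, which is why the slide describes remove_redundant as a way to avoid representing the fanout twice.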

20
Example: Dummy Output Method
  • Adding a dummy output: add the first input
    variable also as an output
  • Remove rows with different values for the 1st and
    3rd index
  • Tensoring with I adds an input AND an output
  • Added input is redundant

[Figure: matrix with rows 000, 001, 010, 011, 100, 101, 110, 111, and the resultant matrix.]
21
Example (continued)

[Figure: the 3-2 fan-in matrix and the zero-padded result.]
22
Evaluation Algorithm
CurrSigs = primary outputs
While (CurrSigs != Primary Inputs)
    For (i = 0; i < CurrSigs.size(); i++)
        Gate = gate_lookup(CurrSigs[i])  // only returns gate if all sinks to a sig are done
        CurrLevel = tensor(CurrLevel, Gate)
    zero_track(CurrLevel)  // either permutation or dummy output method
    remove_redundant(CurrLevel)
    CircuitPTM = CircuitPTM * CurrLevel
    CurrSigs = CircuitPTM.inputs()
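
The following is a deliberately simplified variant of the level-by-level idea, not the algorithm above. It assumes the circuit is already split into levels listed from inputs to outputs, with pass-through wires written as explicit identity matrices, so no zero tracking or redundancy removal is needed. The function name `circuit_ptm` and the example circuit NAND(NAND(a, b), c) are hypothetical.

```python
from functools import reduce

def kron(A, B):
    """Tensor (Kronecker) product of two matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def circuit_ptm(levels):
    """levels: list of levels, inputs first; each level is a list of the
    gate matrices that sit side by side at that depth."""
    level_ptms = [reduce(kron, gates) for gates in levels]  # parallel composition
    return reduce(matmul, level_ptms)                       # serial composition

NAND = [[0, 1], [0, 1], [0, 1], [1, 0]]
WIRE = [[1, 0], [0, 1]]

# Level 1: NAND(a, b) next to a wire carrying c. Level 2: NAND of the two.
ptm = circuit_ptm([[NAND, WIRE], [NAND]])   # 8 x 2, rows indexed by (a, b, c)
```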
23
Circuit Reliability
  • For a circuit with ITM J, PTM M, and input
    distribution p(i):
  • reliability = Σ_{i,j} p(i) M(i,j) J(i,j)
  • This measure can be used to evaluate the
    reliability of a circuit made of components of
    varying robustness
  • Can efficiently implement this operation using
    ADDs
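
On dense matrices the reliability measure is a short weighted sum. The sketch below evaluates it for a single NAND gate with an assumed output-flip probability of 0.1 and a uniform input distribution; these numbers and the function name are illustrative.

```python
def reliability(ptm, itm, input_dist):
    """Sum over inputs i and outputs j of p(i) * M(i, j) * J(i, j)."""
    return sum(
        p * m * j
        for p, m_row, j_row in zip(input_dist, ptm, itm)
        for m, j in zip(m_row, j_row)
    )

ITM = [[0, 1], [0, 1], [0, 1], [1, 0]]                       # ideal NAND
PTM = [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.9, 0.1]]       # assumed 10% flips
UNIFORM = [0.25, 0.25, 0.25, 0.25]

r = reliability(PTM, ITM, UNIFORM)
```

Because J is 1 only at the correct output of each input, the sum picks out the probability that the circuit produces the correct output, averaged over the input distribution.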

24
Experiment 1: Reliability Evaluation
  1. Calculate the ITM of standard benchmark circuits
    in BLIF
  2. Alter the individual gate PTMs by adding a
    probability of error to each input
  3. Recalculate the circuit PTM
  4. Calculate the reliability of the circuit by
    comparing the PTM to the ITM

26
Experiment 2: Gate Susceptibility
  • Find the most critical gates in a circuit by
    calculating the susceptibility of each gate
  • Calculating susceptibility:
  • Add an error to the gate being evaluated
  • Leave all other gates ideal
  • Calculate the probability of error
    (1 - reliability) of the entire circuit with only
    this gate having error
  • Find the most critical gates and reduce their
    error probability (from 0.5 to .005), then
    calculate the improvement in reliability
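
The susceptibility procedure can be sketched on a two-gate circuit, NAND(NAND(a, b), c). The circuit, the output-flip error model with probability 0.1, the uniform input distribution, and all names are assumptions for illustration; they are not the benchmark setup used in the experiment.

```python
def kron(A, B):
    """Tensor (Kronecker) product of two matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def flip(itm, p):
    """Assumed error model: the gate output is flipped with probability p."""
    return [[(1 - p) * x + p * (1 - x) for x in row] for row in itm]

NAND = [[0, 1], [0, 1], [0, 1], [1, 0]]
WIRE = [[1, 0], [0, 1]]

def circuit(g1, g2):
    """PTM of g2(g1(a, b), c): g1 beside a wire for c, followed by g2."""
    return matmul(kron(g1, WIRE), g2)

def error_probability(ptm, itm):
    """1 - reliability, assuming a uniform input distribution."""
    p_in = 1 / len(itm)
    rel = sum(p_in * m * j for mr, jr in zip(ptm, itm) for m, j in zip(mr, jr))
    return 1 - rel

ideal = circuit(NAND, NAND)
p = 0.1
# Give one gate an error at a time and leave the other gate ideal.
susceptibility = {
    "g1": error_probability(circuit(flip(NAND, p), NAND), ideal),
    "g2": error_probability(circuit(NAND, flip(NAND, p)), ideal),
}
most_critical = max(susceptibility, key=susceptibility.get)
```

In this toy circuit an error in the first gate is masked whenever c = 0, so the output gate comes out as the more critical of the two.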

27
Gate Susceptibility Data
Circuit  Orig   Top 3  Imp. (%)  Top 5  Imp. (%)
C17      .864   .959   11        .98    13.4
Mux      .907   .974   7.39      .985   8.6
Parity   .603   .637   5.64      .666   10.4
xor5     .047   .068   46.2      .070   50.5
pm1      .375   .429   14.4      .469   25.1
28
Experiment 3: Analyzing von Neumann's NAND-MUX
Architecture
  • Each signal is repeated n times
  • The NAND levels act as simple majority
    gates between levels of random permutations
  • Can relax assumptions used in analytical
    treatments and evaluate with PTMs

29
Numerical Evaluation of Fault Tolerance
  • PTM evaluation can be used to determine:
  • threshold error values
  • the number of levels required for NAND-MUX to be
    functional

        Number of Levels
Error   2       4       6       8       10
.05     .8075   .778    .747    .719    .574
.02     .916    .9144   .9074   .9005   .8175
.005    .9741   .9795   .9789   .9784   .9544
30
Conclusions
  • It is possible to handle all inputs and all
    paths in reliability evaluation
  • So far, small circuits only (may be sufficient
    for memories, FPGAs, structured ASICs, etc.)
  • So far, for simple gate models only
  • Within reach: time-dependent reliability
  • Applications: quantifiable approximation
  • Deliberate simplifications to improve scalability
  • Bootstrapping faster methods
  • Applications: analysis and optimization
  • Finding the most critical components in small
    circuits and regular fabrics
  • Hardening a small number of gates
  • Applications: probabilistic test

31
Selected References
  • R. I. Bahar et al., "Algebraic Decision Diagrams
    and their Applications," J. of Formal Methods in
    Sys. Design 10, no. 2/3, April-May 1997, pp.
    171-206.
  • V. L. Levin, "Probability Analysis of Combination
    Systems and their Reliability," Engin.
    Cybernetics, no. 6, Nov-Dec. 1964, pp. 78-84.
  • K. Mohanram and N. A. Touba, "Cost-Effective
    Approach for Reducing Soft Error Failure Rate in
    Logic Circuits," ITC, 2003, pp. 893-901.
  • K. N. Patel, J. P. Hayes, and I. L. Markov,
    "Evaluating Circuit Reliability Under
    Probabilistic Gate-Level Fault Models," IWLS, May
    2003, pp. 59-64.
  • P. Shivakumar, M. Kistler, et al., "Modeling the
    Effect of Technology Trends on Soft Error Rate of
    Combinational Logic," Intl. Conf. on Dependable
    Systems and Networks, 2002, pp. 389-398.
  • G. F. Viamontes, I. L. Markov, and J. P. Hayes,
    "Improving Gate-Level Simulation of Quantum
    Circuits," Quantum Information Processing, vol.
    2(5), October 2003, pp. 347-380.