Static and Dynamic Fault Diagnosis - PowerPoint PPT Presentation

Provided by: richard171
Learn more at: https://cis.temple.edu
Transcript and Presenter's Notes

1
Static and Dynamic Fault Diagnosis
  • Richard Beigel
  • Univ. Illinois at Chicago
  • and DIMACS

2
Nonstandard computing architectures
  • Perceptrons and small-depth circuits
  • Optically interconnected multiprocessors
  • DNA computing

Self-diagnosing Systems
3
Brief history of system-level fault diagnosis
  • Preparata et al. 67: static, nonadaptive
  • Nakajima 81: static, adaptive, serial
  • Hakimi, Nakajima 84: static, adaptive, parallel

4
Recent advances in system-level diagnosis
  • Distributed diagnosis
  • Diagnosing intermittent faults
  • Diagnosis with errors
  • Fast parallel diagnosis of static faults
  • Ongoing diagnosis and repair of dynamic faults

5
Fault diagnosis problem
  • Given n processors
  • a primitive by which each processor can test any
    other
  • a reliable external controller that observes test
    results
  • Determine which are good and which are faulty
  • Assume perfect communication in a complete network

6
What's so hard about that?
Say Ah
Ha Ha!
OK, you pass
Faulty processors may give incorrect test results
7
Possible test results
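The table from this slide did not survive in the transcript. As a minimal sketch of the test model the surrounding slides appear to assume, a good tester reports the testee's true status, while a faulty tester may report anything. The function name `run_test` and the random coin used for faulty testers are illustrative assumptions, not part of the talk.

```python
import random

def run_test(tester_good: bool, testee_good: bool, rng: random.Random) -> str:
    """Return the verdict ('good' or 'faulty') that the tester reports."""
    if tester_good:
        # A good tester always reports the testee's true status.
        return "good" if testee_good else "faulty"
    # A faulty tester is unreliable. Answering at random is an assumption made
    # for this sketch; in the worst case an adversary picks the answer.
    return rng.choice(["good", "faulty"])

rng = random.Random(0)
for tester_good in (True, False):
    for testee_good in (True, False):
        verdict = run_test(tester_good, testee_good, rng)
        print(f"tester good={tester_good}, testee good={testee_good}: {verdict}")
```

Only the two rows with a good tester are informative; the other two can come out either way.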
8
A majority of processors must be good for
diagnosis to be possible
"We're all good. They're all faulty."
"We're all good. They're all faulty."
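To see why a majority is needed, the following sketch builds the full set of test results for two opposite scenarios with exactly half of the processors faulty and checks that the results are identical. The helper names `syndrome` and `mimic`, and the choice of four processors, are assumptions made for illustration.

```python
def syndrome(n, good, faulty_report):
    """Map (tester, testee) -> verdict for every ordered pair of processors."""
    results = {}
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            if a in good:
                # Good testers tell the truth.
                results[(a, b)] = "good" if b in good else "faulty"
            else:
                # Faulty testers say whatever the adversary wants.
                results[(a, b)] = faulty_report(a, b)
    return results

def mimic(own_group):
    # Faulty processors praise their own group and accuse everyone else,
    # exactly as good processors in that group would.
    return lambda a, b: "good" if b in own_group else "faulty"

n = 4
group_a, group_b = {0, 1}, {2, 3}

world_1 = syndrome(n, good=group_a, faulty_report=mimic(group_b))
world_2 = syndrome(n, good=group_b, faulty_report=mimic(group_a))
print(world_1 == world_2)  # True
```

Since both scenarios produce the same test results, no algorithm can tell which half is the good one.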
9
Serial diagnosis of static faults
  • n processors, at most t faults, t < n/2
  • Nonadaptive diagnosis
  • n(t+1) tests are necessary and sufficient
  • Preparata et al 67
  • Adaptive diagnosis
  • n+t-1 tests are necessary and sufficient
  • Nakajima 81

10
Distributed diagnosis of static faults
  • In the distributed diagnosis model there is no
    central controller, and all good processors must
    learn the status of the other processors.
  • Distributed diagnosis is reducible to the
    cooperative collect problem, and can be solved
    with tests Aspnes-Hurwood 96

11
INTERMITTENT FAULTS AND ERRORS
  • Work in progress by Beigel and Fu

12
Intermittent faults
  • An intermittent fault may appear faulty in some
    tests and good in others
  • We cannot hope to diagnose intermittent faults as
    such because they might exhibit consistent
    behavior in all tests
  • Goal: correctly diagnose all other processors

13
Errors
  • An error is a misdiagnosis by a good processor.
  • Note the similarity to an intermittent fault

14
Results
  • In rounds, we can perform static diagnosis
    assuming that a majority of the processors are
    good and at most t of them are intermittently
    faulty.
  • In rounds, we can perform static diagnosis in
    the presence of errors. Assuming at most t
    errors per round, the results will be within
    of a correct diagnosis.

15
PARALLEL DIAGNOSIS OF STATIC FAULTS
  • Perform many tests simultaneously

16
Parallel diagnosis of static faults
  • 84 Hakimi, Schmeichel: O(n/log n) rounds
  • 90 Schmeichel, Hakimi, Otsuka, Sullivan: O(log n) rounds
  • 89 Beigel, Kosaraju, Sullivan: O(1) rounds
  • 93 Beigel, Margulis, Spielman: 32 rounds
  • 94 Beigel, Hurwood, Kahale: 10 rounds
  • best lower bound: 5 rounds

17
Digraphs
  • edge: tester → testee
  • testing round = directed matching

18
SHOS 90 generates a large mutual admiration society (MAS)
  • MAS: a strongly connected component in which all
    edges are good
  • Either all nodes are good, or all nodes are faulty
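As a rough illustration of the definition, the sketch below keeps only the tests that came back "good" and groups processors that can reach each other along those edges. The function name `mutual_admiration_societies` and the small example are assumptions; this is a simple reachability-based computation for small inputs, not the construction used in the papers cited in the talk.

```python
def mutual_admiration_societies(n, results):
    """Strongly connected components of the graph of 'good' test results.

    results maps (tester, testee) -> 'good' or 'faulty'.
    """
    likes = {v: set() for v in range(n)}
    for (a, b), verdict in results.items():
        if verdict == "good":
            likes[a].add(b)

    def reachable(src):
        # Depth-first search along 'good' edges only.
        seen, stack = {src}, [src]
        while stack:
            v = stack.pop()
            for w in likes[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    reach = {v: reachable(v) for v in range(n)}
    societies, assigned = [], set()
    for v in range(n):
        if v in assigned:
            continue
        # v and w are in the same society if each can reach the other.
        component = {w for w in reach[v] if v in reach[w]}
        assigned |= component
        societies.append(component)
    return societies

# Processors 0-2 are good and vouch for each other in a cycle;
# processors 3-4 are faulty and vouch for each other.
results = {
    (0, 1): "good", (1, 2): "good", (2, 0): "good",
    (3, 4): "good", (4, 3): "good",
    (2, 3): "faulty",
}
print([sorted(s) for s in mutual_admiration_societies(5, results)])
# [[0, 1, 2], [3, 4]]
```

If any member of a society is good, everyone it vouches for is good, and so on around the component. That is why a society is either entirely good or entirely faulty.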

19
SHOS 90: O(log n) pairing algorithm
  • Pair up processors
  • Pair up pairs
  • Pair up fours

20
What about processors that don't like each other?
  • Build one chain for each good processor we found
    (4 rounds)
  • Most chains must have a good processor in each
    level (count!)
  • Total 4 1 rounds

21
Beigel-Margulis-Spielman 94
  • nonconstructive (32 rounds)
  • Find several MASs of size including
    at least one good MAS
  • Large MASs test each other and all remaining
    processors in 4 rounds
  • constructive (84 rounds)
  • Find several MASs of size including
    at least one good MAS
  • Large MASs test each other and all remaining
    processors in 6 rounds

22
Expander graphs guarantee a good big MAS
  • In the Cayley graphs of Margulis and LPS with
    p = 37, every n/2-node induced subgraph contains a
    strong component of size
  • (cf. Alon, Chung 88, who find long paths)
  • degree of undirected graph = 38
  • 78 directed matchings cover graph
  • 78 + 6 = 84 rounds

23
Random graphs guarantee a good big MAS
  • If G consists of 14 directed Hamiltonian paths on
    n vertices then, whp, every n/2-node induced
    subgraph contains a strong component of size
  • 28 directed matchings cover graph
  • 28 + 4 = 32 rounds
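One plausible reading of the count above is that each directed Hamiltonian path splits into two directed matchings (its odd-numbered and even-numbered edges), so 14 paths give 28 matchings, one per testing round. The sketch below checks that arithmetic on random paths. The helper names and n = 10 are assumptions, and the sketch says nothing about the strong-component property itself.

```python
import random

def path_to_matchings(path):
    """Split a directed Hamiltonian path into two directed matchings."""
    edges = list(zip(path, path[1:]))  # (tester, testee) pairs along the path
    return edges[0::2], edges[1::2]

def is_matching(edges):
    # No processor may take part in two tests in the same round.
    used = [v for edge in edges for v in edge]
    return len(used) == len(set(used))

rng = random.Random(0)
n, num_paths = 10, 14
matchings = []
for _ in range(num_paths):
    path = list(range(n))
    rng.shuffle(path)  # a random directed Hamiltonian path on n vertices
    matchings.extend(path_to_matchings(path))

print(len(matchings), all(is_matching(m) for m in matchings))  # 28 True
```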

24
Beigel-Hurwood-Kahale 95 speeds up BMS 94
  • In k+1 rounds build MASs of size
  • also build one chain of don't-likes
  • each MAS can be in simultaneous tests
  • Perform G's directed matchings in 1 round
  • Process chain in 2 or 3 more rounds
  • Constructive: 13 rounds. Nonconstructive: 10 rounds.

25
Lower bound / upper bound for smaller t
  • n processors, at most t faults
  • If 5 rounds are necessary
  • If 4 rounds suffice
  • algorithm uses lower-degree expanders

26
DIAGNOSIS AND REPAIR OF DYNAMIC FAULTS
  • Processors fail each round,
  • but algorithm may order repairs

27
Ongoing diagnosis and repair of dynamic faults
  • Processors may fail each round, but algorithm may
    order repairs
  • In each round
  • 1. perform tests
  • 2. direct that up to t processors are repaired
  • 3. at most t processors fail
  • Goal: bound the number of faults at all times
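The round structure above can be sketched as a small simulation loop. Everything here is an illustrative assumption: the function `simulate`, the random choice of which processors fail, and especially the `oracle` repair policy, which is handed the true set of faulty processors. A real algorithm has to infer that set from unreliable test results, which is the hard part.

```python
import random

def simulate(n, t, rounds, choose_repairs, rng):
    """Run the test / repair / fail loop and return the worst fault count seen."""
    faulty = set()
    worst = 0
    for _ in range(rounds):
        # 1. Perform tests (abstracted away in this sketch: the policy below is
        #    given the true state instead of test results).
        # 2. The algorithm directs that up to t processors are repaired.
        repairs = choose_repairs(faulty)[:t]
        faulty -= set(repairs)
        # 3. At most t processors fail (here: exactly t, chosen at random).
        healthy = [p for p in range(n) if p not in faulty]
        faulty |= set(rng.sample(healthy, min(t, len(healthy))))
        worst = max(worst, len(faulty))
    return worst

oracle = lambda faulty: sorted(faulty)    # idealized policy: knows who is faulty
never_repair = lambda faulty: []          # baseline: faults simply accumulate

print(simulate(n=100, t=3, rounds=50, choose_repairs=oracle, rng=random.Random(0)))      # 3
print(simulate(n=20, t=2, rounds=5, choose_repairs=never_repair, rng=random.Random(1)))  # 10
```

With perfect knowledge the fault count never exceeds t; without repairs it grows by t every round.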

28
Results for n processors, at most t failures per
round
  • When t > 70 and n > 376 t log t + 50t, we can
    maintain n - 64 t log t - 10t good processors at all
    times
  • This works even if the number of faults exceeds
    n/2
  • When n = 640 and t = 1, we can maintain 520 good
    processors at all times.

29
Why's this hard?
  • We can't determine the status of a chosen
    processor because its testers might fail right
    before we choose them
  • Mutual admiration societies don't work either

30
SIFT and WINNOW
  • SIFT finds a large set G consisting of processors
    that were good when SIFT started running, and a
    small set F containing some faulty processors
  • WINNOW uses G to diagnose most of the faulty
    processors in F
  • Algorithm: SIFT, WINNOW, repair, repeat

31
SIFT algorithm
  • Let r = 2 log t
  • In 2r rounds form undirected hypercubes of size
  • Put MASs into G, others into F
  • MASs must have been entirely good at start of
    SIFT, and are still mostly good

32
WINNOW algorithm
  • Choose a processor P in F
  • For 2 log t rounds,
  • test P and every processor that has tested P so
    far, using testers in G
  • If the tests always call P faulty but don't call
    any of the others faulty then we can be sure that
    P really is faulty
  • Most old faults are diagnosed, but 4t log t new
    ones could accumulate.
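The decision rule in the third bullet can be written down directly. The sketch below covers only that rule; how testers are drawn from G, and why 2 log t rounds suffice, are not modeled. The function name `winnow_verdict` and the data layout are hypothetical.

```python
def winnow_verdict(rounds):
    """Decide whether P can be declared faulty.

    rounds: one entry per round, each a pair
        (verdict_on_p, verdicts_on_earlier_testers)
    where every verdict is the string 'good' or 'faulty'.
    """
    for verdict_on_p, verdicts_on_testers in rounds:
        if verdict_on_p != "faulty":
            return "unknown"   # some tester did not call P faulty
        if any(v != "good" for v in verdicts_on_testers):
            return "unknown"   # an earlier tester of P was itself called faulty
    return "faulty"

# P is accused in every round and none of its earlier testers is ever accused.
consistent = [("faulty", []), ("faulty", ["good"]), ("faulty", ["good", "good"])]
# In round 2 the first tester of P is itself called faulty.
doubtful = [("faulty", []), ("faulty", ["faulty"])]

print(winnow_verdict(consistent))  # faulty
print(winnow_verdict(doubtful))    # unknown
```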

33
Summary
  • We have efficient algorithms for
  • diagnosis in the presence of a small number of
    intermittent faults
  • diagnosis with a small number of diagnosis errors
  • parallel fault diagnosis
  • ongoing diagnosis of dynamic faults