Review last week - PowerPoint PPT Presentation

About This Presentation
Title:

Review last week

Slides: 41
Provided by: danielort

1
Review last week
  • The software problem
  • Robust SW coding techniques
  • Regression testing
  • Reliability models for software
  • Redundancy in Software
  • Reliability of N-versioning
  • Software rejuvenation

2
Today
  • Reliability of networks
  • Hardware related FTC techniques
  • Watchdog Techniques
  • Redundancy in time (Re-execution)
  • RESO
  • Processes, threads
  • Superscalar, CMP, SMT
  • Research on FT microarchitectures
  • AR-stream, DIVA
  • Other error detection mechanisms in HW
  • BIST

3
Reliability of Networks
  • Based on graph theory: nodes represent computers,
    branches represent communication links
  • The simplest model assumes that nodes do not fail but
    links do
  • Link failures may be due to traffic congestion or
    physical failures
  • A path is a collection of branches that provides
    communication between a specific pair of nodes
  • In general we are interested in knowing
  • R_all = P(all nodes are connected)
  • R_st = P(nodes s and t are connected)
  • R_k = P(k nodes are connected)

4
Reliability of Networks
Simple state space enumeration
[Figure: example network with nodes a, b, c, d and links numbered 1 to 6]
Represent all possible ways to go from node a to
node b, considering that links fail
Prob. of 1 link failure
Prob. of 2 link failures
If all links are equal, with p = prob. of a link being
up and q = prob. of it being down:
if p = 0.9 and q = 0.1, then R_ab = 0.997
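The state space enumeration above can be sketched in a few lines of Python. The slide's figure did not survive, so the topology here is an assumption: a fully connected network on nodes a, b, c, d, with a link numbering chosen to agree with the cut sets listed on the Cut Sets slide. The names `LINKS`, `connected` and `two_terminal_reliability` are illustrative only.

```python
from itertools import product

# ASSUMPTION: fully connected 4-node network; this link numbering is a guess
# that is consistent with the cut sets C1..C4 given on the Cut Sets slide.
LINKS = {
    1: ("a", "b"),
    2: ("b", "d"),
    3: ("c", "d"),
    4: ("a", "c"),
    5: ("a", "d"),
    6: ("b", "c"),
}

def connected(up_links, s, t):
    """Return True if t is reachable from s using only the links that are up."""
    reached, frontier = {s}, [s]
    while frontier:
        node = frontier.pop()
        for link in up_links:
            u, v = LINKS[link]
            for here, there in ((u, v), (v, u)):
                if here == node and there not in reached:
                    reached.add(there)
                    frontier.append(there)
    return t in reached

def two_terminal_reliability(p, s="a", t="b"):
    """Sum the probability of every link state in which s and t stay connected."""
    q = 1.0 - p
    total = 0.0
    for state in product((True, False), repeat=len(LINKS)):
        up = [link for link, is_up in zip(LINKS, state) if is_up]
        prob = p ** len(up) * q ** (len(LINKS) - len(up))
        if connected(up, s, t):
            total += prob
    return total

print(f"R_ab = {two_terminal_reliability(0.9):.6f}")  # prints R_ab = 0.997848
```

Under these assumptions the result agrees with the 0.997 quoted on the slide (apparently truncated). The enumeration visits 2^6 = 64 states, which is why the slides go on to mention more efficient methods for larger networks.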
5
Reliability of Networks
  • To improve network reliability we can increase p
    or add more branches to the network
  • There are other more efficient methods to compute
    network reliability
  • Cut sets
  • Graph Reduction

6
Cut Sets
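[Figure: example network with nodes a, b, c, d and links numbered 1 to 6]
Cut set: a group of links that breaks all paths
between s and t when they are removed from the
graph (s = a, t = b in the example graph)
C1 = {1, 4, 5}   C2 = {1, 6, 2}   C3 = {1, 5, 6, 3}   C4 = {1, 2, 3, 4}
Here P(145) denotes the probability that links 1, 4 and 5 are all down
R = 1 - P(C1 or C2 or C3 or ... or CJ)
R_ab = 1 - P(C1 or C2 or C3 or C4)
R_ab = 1 - P(145 or 162 or 1563 or 1234)
R_ab = 1 - [P(145) + P(162) + P(1563) + P(1234)
       - P(12456) - P(13456) - P(12345) - P(12356) - P(12346)
       + 2 P(123456)]

using P(A or B) = P(A) + P(B) - P(A and B)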
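As a hedged check of the cut-set expansion, the sketch below applies inclusion-exclusion to the four cut sets of the example. It assumes that links fail independently with the same probability q, so the probability that a given set of links is down is q raised to the size of that set. The function name `unreliability` is illustrative.

```python
from itertools import combinations

# Cut sets between a and b as listed on the slide (link numbers).
CUT_SETS = [{1, 4, 5}, {1, 6, 2}, {1, 5, 6, 3}, {1, 2, 3, 4}]

def unreliability(cut_sets, q):
    """P(at least one cut set has all of its links down), by inclusion-exclusion.

    ASSUMPTION: links fail independently, each with probability q.
    """
    total = 0.0
    for k in range(1, len(cut_sets) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for group in combinations(cut_sets, k):
            links_down = set().union(*group)   # links that must all be down
            total += sign * q ** len(links_down)
    return total

r_ab = 1.0 - unreliability(CUT_SETS, q=0.1)
print(f"R_ab = {r_ab:.6f}")  # prints R_ab = 0.997848
```

With q = 0.1 this gives the same value as enumerating all link states of a fully connected four-node network, which suggests that these four cut sets are complete for the example.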
7
Primary Graph Reductions
  • Graph reductions facilitate calculation
  • Series: a sequence of edges that are all required
    simultaneously; combine with the axiom of
    probability
  • P(A ∩ B) = P(A) P(B)
  • Parallel: the network is operational if any of these
    edges is operational; combine with the axiom of
    probability
  • P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

[Figure: two parallel paths from S to T, one through A and one through B; every edge has reliability .9]
Serial reduction: each path has reliability .9 × .9 = .81
Parallel reduction: P(A ∪ B) = .81 + .81 - (.81 × .81) = .9639
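The two reductions can be written as small helper functions. This is a minimal sketch that assumes independent edge failures; `series` and `parallel` are illustrative names, and the numbers reproduce the S-A-T / S-B-T example with edge reliability 0.9.

```python
import math

def series(*probs):
    """All edges are required: multiply the edge reliabilities."""
    return math.prod(probs)

def parallel(*probs):
    """Any edge suffices: 1 minus the probability that every edge is down."""
    return 1.0 - math.prod(1.0 - p for p in probs)

path_via_a = series(0.9, 0.9)              # S -> A -> T
path_via_b = series(0.9, 0.9)              # S -> B -> T
r_st = parallel(path_via_a, path_via_b)
print(f"{path_via_a:.2f} {r_st:.4f}")      # prints 0.81 0.9639
```

For two branches, `parallel(a, b)` equals a + b - a*b, which is the P(A) + P(B) - P(A and B) form used on the slide.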
8
Watchdog Techniques
  • Key concept
  • A process or processor is monitored by another
    unit (normally hardware) that checks its actions,
    for example whether the process is still alive and
    whether it is executing a valid path

9
Watchdog Timers
  • Check for aliveness
  • Processor resets the timer at certain intervals or
    on certain conditions
  • Timer raises error flag if not reset before it
    overruns
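The liveness check can be modelled in software as follows. This is only a sketch: a real watchdog timer is normally a hardware counter, and the class name `WatchdogTimer`, its methods and the integer time steps are assumptions made for illustration.

```python
class WatchdogTimer:
    """Software model of a watchdog timer driven by an explicit clock value."""

    def __init__(self, timeout, now=0):
        self.timeout = timeout
        self.deadline = now + timeout
        self.error = False

    def kick(self, now):
        """Called by the monitored processor to show that it is still alive."""
        if not self.error:
            self.deadline = now + self.timeout

    def tick(self, now):
        """Called on every clock step; latches the error flag on overrun."""
        if now >= self.deadline:
            self.error = True
        return self.error

def first_error_time(kick_times, timeout, horizon):
    """Return the first time step at which the error flag is raised, or None."""
    wd = WatchdogTimer(timeout)
    for t in range(horizon):
        if t in kick_times:
            wd.kick(t)
        if wd.tick(t):
            return t
    return None

# The processor resets the timer at t = 1, 2 and 4, then hangs.
print(first_error_time({1, 2, 4}, timeout=3, horizon=9))  # prints 7
```

The last reset at t = 4 moves the deadline to t = 7, so the flag is raised three steps after the processor stops responding.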

10
Watchdog Timers (contd.)
  • Check for timeout
  • Processor sends a message and starts a timer; the
    second processor must reply within this time
    (hardware/software implementation)

11
Watchdog Timers (contd.)
  • Applications
  • Processor control systems (chemical, mechanical
    and other control systems)
  • Switching systems: messages that are sent often
    wait a certain length of time for a reply before
    they are repeated
  • Networks: email messages often have timeouts
    associated with them

12
Watchdog Processors
  • Consider the following simple architecture

Watchdog can
  • Observe the address bus
  • Observe the data
  • Observe instructions
  • Check the flow of program control
Need to know what kind of errors can occur
13
Watchdog: Control flow checking
  • Some studies have found that 60% of all transient
    faults could be detected by monitoring control
    flow
  • Control flow checking: basic principle
  • Analyze the program and extract control
    information
  • Branch-free intervals
  • Subroutine calls
  • Assign signatures to branch-free intervals and
    provide these signatures to the watchdog
    processor, which checks them at run time
  • Signatures can be checksums of instruction opcodes
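A minimal sketch of signature-based control flow checking follows. The opcode bytes, block names and the 8-bit additive checksum are all hypothetical choices for illustration; the slides only say that signatures can be checksums of instruction opcodes.

```python
def signature(opcodes):
    """Checksum of the opcodes in one branch-free interval (8-bit sum here)."""
    return sum(opcodes) & 0xFF

# HYPOTHETICAL program: each branch-free interval is a list of opcode bytes.
PROGRAM = {
    "entry": [0x10, 0x22, 0x31],
    "loop": [0x40, 0x41, 0x70],
    "exit": [0x05, 0x90],
}

# Computed offline (for example at compile time) and handed to the watchdog.
STORED = {name: signature(ops) for name, ops in PROGRAM.items()}

def watchdog_check(block, observed_opcodes):
    """Recompute the signature from the observed instruction stream and compare."""
    return signature(observed_opcodes) == STORED[block]

ok = watchdog_check("loop", [0x40, 0x41, 0x70])
# Transient fault: one bit of the second opcode flips (0x41 -> 0x43).
faulty = watchdog_check("loop", [0x40, 0x43, 0x70])
print(ok, faulty)  # prints True False
```

A simple sum detects any single bit flip in one opcode, but it can miss compensating errors or reordered instructions, which is one reason stronger signatures are sometimes preferred.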

14
Watchdog: Control flow checking (contd.)
Watchdog: receive start, observe instruction
flow, calculate signature, check against stored
signature
Program: start of branch-free code, end of branch-free
code
15
Watchdog: Memory access
  • What to do about memory/data errors?
  • Use ECC
  • AMD Opteron and Intel Pentium D multicore processors
    use ECC techniques to protect against transient
    errors in memory access
  • A few other methods use watchdog techniques
  • Check for nonexistent memory addresses
  • Check for out-of-range addresses

16
Fault Detection in Complex Processors
  • High density and complexity of current processors
    increases the probability of occurrence of
    transient, intermittent and permanent faults
  • Diverse techniques are used to detect these
    faults
  • RESO
  • Re-execution
  • BIST

17
Recomputing with Shifted Operands (RESO)
  • Re-execute the same arithmetic operations, but
    with the operands shifted
  • Goal: detect errors in the ALU
  • Example: shift left by 2
  • 1 0 1 0 1 0 X X
  • 1 0 0 1 0 1 X X
  • 0 0 1 0 1 1 X X
  • Output bit 0 of the first execution and output
    bit 2 of the shifted re-execution should be
    equal; if they differ, an error in the ALU has
    been detected

[Figure annotation: error]
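The idea can be illustrated with a small simulation. The adder model, its 12-bit width and the stuck-at-0 fault are assumptions for the sketch; the point is only that a faulty bit position corrupts different result bits in the two executions, so the comparison exposes it.

```python
SHIFT = 2
WIDTH = 12                      # ASSUMPTION: ALU is wider than the 8-bit operands
MASK = (1 << WIDTH) - 1

def alu_add(x, y, stuck_at_zero_bit=None):
    """Adder model; optionally forces one output bit to 0 (permanent fault)."""
    result = (x + y) & MASK
    if stuck_at_zero_bit is not None:
        result &= ~(1 << stuck_at_zero_bit)
    return result

def reso_add(a, b, stuck_at_zero_bit=None):
    """Return (result, error_detected) by recomputing with shifted operands."""
    first = alu_add(a, b, stuck_at_zero_bit)
    shifted = alu_add(a << SHIFT, b << SHIFT, stuck_at_zero_bit)
    second = shifted >> SHIFT   # realign before comparing
    return first, first != second

print(reso_add(42, 37))                       # prints (79, False)
print(reso_add(42, 37, stuck_at_zero_bit=0))  # prints (78, True)
```

In the faulty run, the stuck bit corrupts bit 0 of the first result, while in the shifted run that ALU position only carries a padding zero, so the two results disagree.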
18
Re-execution
  • Replicate the actions of a module either on the
    same module (temporal redundancy) or on spare
    modules (temporal + spatial redundancy)
  • Good for detecting and/or correcting transient
    faults
  • Transient error will only affect one execution
  • Can implement this policy at many different
    levels
  • ALU
  • Thread context
  • Processor
  • System
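Temporal redundancy on a single module can be sketched as follows. The squaring module, the injected bit flip and the third, tie-breaking execution are assumptions chosen for illustration; the slides only state that a transient error affects one execution.

```python
from collections import Counter

def make_module(faulty_calls):
    """Return a squaring module that gives a wrong answer on the listed call numbers."""
    state = {"calls": 0}

    def module(x):
        state["calls"] += 1
        result = x * x
        if state["calls"] in faulty_calls:
            result ^= 0x4          # injected transient bit flip
        return result

    return module

def run_with_reexecution(module, x):
    """Execute twice and compare; on mismatch, execute once more and vote.

    Returns (value, fault_detected).
    """
    first, second = module(x), module(x)
    if first == second:
        return first, False
    third = module(x)
    value, votes = Counter([first, second, third]).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority: the fault does not look transient")
    return value, True

print(run_with_reexecution(make_module(set()), 7))   # prints (49, False)
print(run_with_reexecution(make_module({2}), 7))     # prints (49, True)
```

Two executions are enough to detect a transient fault; the third execution is what allows correction, at the cost of extra time.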

19
Race Conditions
  • In concurrent applications race conditions may
    happen
  • A race condition is a bug that occurs when the
    outcome of a program depends on which of two or
    more threads reaches a particular block of code
    first. Running the program many times produces
    different results, and the result of any given
    run cannot be predicted.
  • Re-execution of the same threads may be used to
    detect a race condition.

20
Break
21
Re-execution with Processes
  • Idea: use redundant processes to detect errors
  • Problem: in a uniprocessor, execution is serialized,
    giving a slowdown factor of 2
  • In a multicore/multiprocessor, we can execute
    multiple copies of the same process
    simultaneously on 2 processors and have them
    periodically compare their results
  • Almost no slowdown, except for comparisons
  • Disadvantage: the other processor is not available
    to perform non-redundant work

[Figure: two copies of a process on one CPU with error checking, and two copies on two CPUs with error checking]
22
Current Multi-Core Processors
  • A multi-core CPU combines independent processors
    (cores) onto a single silicon chip.
  • Intel: distinguishes between logical and
    physical processors
  • Logical refers to the Hyper-Threading side;
    physical means core.
  • An Intel dual-core processor has two physical
    processors in the same chip package (Paxville)
  • AMD: uses the concept of logical processor count
    to refer to multiple cores existing within the
    same chip package.
  • Dual-core Opteron and dual-core AMD64 (X2)

23
Shared Memory Multiprocessor Architectures
Athlon 64FX2
Pentium D
24
Past, Present and the Future?
[Figure: three organizations built from processing elements (PE) and memories]
  • Traditional multiprocessor
  • Basic multicore: IBM Power5
  • Integrated multicore: 16-tile MIT Raw
25
Re-execution of microinstructions
Superscalar uniprocessor microarchitecture
[Figure: pipeline stages. In order: IF, ID, RD, dispatch buffer. Out of order: EX stage with functional units ALU, MEM1/MEM2, FP1/FP2/FP3, BR. In order: reorder buffer, WB]
  • Re-execute instructions on different functional
    units
  • Drawback: tests only the FUs, not the whole pipeline
26
Re-execution with Threads
  • Use redundant threads to detect errors
  • Many current superscalar microprocessors are
    multithreaded (Intel Pentium 4, IBM Power5,
    Compaq's Alpha 21464, Sun's UltraSPARC 3)
  • Each processor can run multiple processes or
    multiple threads of the same process
  • Can re-execute a program on multiple thread
    contexts, just like with multiple processors
  • Better performance than re-execution with
    multiple processors, since the comparison can be
    performed on-chip
  • Lower cost to use an extra thread context rather
    than extra processor

27
SMT
  • Simultaneous multithreading (SMT) is the same
    concept as Intel's Pentium 4 Hyper-Threading
  • Main idea of SMT
  • Improve efficiency of a superscalar processor by
    exploiting thread level parallelism (TLP) and
    instruction level parallelism (ILP) at the same time
  • Threads are generated by a compiler or by the OS
    (processes)
  • According to Intel's data, SMT provides a 30%
    improvement at the cost of 5% more chip area

28
SMT - Flow of Instructions
Thread 1
Thread 2
Thread 3
Thread 4
29
Re-execution with Simultaneous Multithreading (SMT)
  • Motivation (Rotenberg '99)
  • Increasingly high clock rates and chip density
    may cause transient errors in high performance
    microprocessors
  • High cost of multiprocessors (at that time)
  • AR-SMT: Active stream/Redundant stream Simultaneous
    Multithreading
  • Low overhead, broad coverage of transient faults
    and some permanent faults
  • In AR-SMT, two explicit copies of the program run
    concurrently on the same processor resources

30
Re-execution with Simultaneous Multithreading (SMT)
  • The A-stream is executed on the SMT and its results
    are committed to the delay buffer
  • The R-stream executes on the SMT, delayed from the
    A-stream by no more than the size of the delay
    buffer
  • R-stream results are compared to A-stream results
    in the delay buffer; a fault is detected if the
    results differ
  • SMT pipeline
  • Time-shared: in any given cycle, a pipeline
    stage is consumed entirely by one thread.
  • Space-shared: in every cycle, a fraction of the
    bandwidth is allocated to each thread.
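The interplay of A-stream, R-stream and delay buffer can be sketched with a bounded queue. This is a simplified model under stated assumptions: each "instruction" is one call to a function `program`, the A-stream stalls when the buffer is full, and a transient fault is injected into the R-stream at a chosen instruction index. None of the names come from the AR-SMT work itself.

```python
from collections import deque

def run_ar_smt(program, inputs, delay_buffer_size, r_stream_fault_at=None):
    """Simulate A-stream / R-stream execution with a bounded delay buffer.

    Returns the index of the first mismatching instruction, or None.
    """
    delay_buffer = deque()
    r_index = 0

    def retire_one_from_r_stream():
        nonlocal r_index
        a_result = delay_buffer.popleft()
        r_result = program(inputs[r_index])
        if r_index == r_stream_fault_at:
            r_result ^= 1           # injected transient fault
        mismatch = r_index if r_result != a_result else None
        r_index += 1
        return mismatch

    for value in inputs:
        if len(delay_buffer) == delay_buffer_size:
            # The A-stream cannot run further ahead than the buffer allows.
            fault = retire_one_from_r_stream()
            if fault is not None:
                return fault
        delay_buffer.append(program(value))   # A-stream commits its result

    while delay_buffer:                       # R-stream drains the buffer
        fault = retire_one_from_r_stream()
        if fault is not None:
            return fault
    return None

step = lambda x: 3 * x + 1
print(run_ar_smt(step, list(range(10)), delay_buffer_size=4))                       # prints None
print(run_ar_smt(step, list(range(10)), delay_buffer_size=4, r_stream_fault_at=6))  # prints 6
```

The buffer size bounds how far the R-stream lags behind, and therefore also how long a fault can stay undetected.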

31
DIVA: Dynamic Implementation Verification
Architecture
  • Permits detection of, and recovery from, all
    functional and electrical faults
  • Extends the speculative mechanism to fault
    detection and recovery
  • Addresses recovery from permanent faults that
    may be caused by design faults

32
A high level view of processor
DIVA processor
33
DIVA Overview
  • The processor is divided into a deeply
    speculative core and a functionally and
    electrically robust DIVA checker
  • Core has all the stages except the retirement
    stage
  • DIVA checker verifies correctness of the
    computations before saving in architected storage
  • Incorporates a watchdog timer that is used to
    restart the core if no forward progress is being
    made

34
DIVA - Architecture
  • Two pipelines
  • CHKcomp: verifies the integrity of all functional
    unit computations
  • CHKcomm: verifies register and memory
    communications between the instructions
  • CT: commit stage. Instructions are committed if
    both CHKcomp and CHKcomm pass

35
DIVA
  • EX: the results of the instruction are recomputed
  • CMP: the recomputed results are compared with the
    ones from the core
  • RD: reads register/memory values from
    architected storage
  • CHK: compares the values read to the input
    operands from the core
  • A bypass is provided for the case where the values
    being checked are still being written by the
    immediately preceding instruction
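The two checker pipelines can be sketched as functions over a toy register file. This is a loose illustration and not the DIVA design itself: the instruction tuple layout, the three operations and the recovery step (recomputing from architected state instead of flushing and restarting the core) are simplifying assumptions.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def chk_comp(op, src_values, core_result):
    """CHKcomp: recompute the result from the core's operands and compare."""
    return OPS[op](*src_values) == core_result

def chk_comm(src_regs, src_values, arch_regs):
    """CHKcomm: the operands the core used must match architected storage."""
    return all(arch_regs[r] == v for r, v in zip(src_regs, src_values))

def commit(instr, arch_regs):
    """CT: commit the core's result only if both checks pass.

    ASSUMPTION: on failure, the checker writes its own result computed
    from architected state (a stand-in for the real recovery sequence).
    """
    op, dest, src_regs, src_values, core_result = instr
    ok = chk_comm(src_regs, src_values, arch_regs) and chk_comp(op, src_values, core_result)
    if ok:
        arch_regs[dest] = core_result
    else:
        arch_regs[dest] = OPS[op](*(arch_regs[r] for r in src_regs))
    return ok

regs = {"r1": 5, "r2": 7, "r3": 0}
print(commit(("add", "r3", ("r1", "r2"), (5, 7), 12), regs), regs["r3"])  # prints True 12
print(commit(("mul", "r3", ("r1", "r2"), (5, 7), 36), regs), regs["r3"])  # prints False 35
```

The first instruction passes both checks. The second carries a wrong functional-unit result, so CHKcomp fails and the architected register receives the checker's value instead.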

36
Other Error Detection Mechanisms
  • Testing techniques are used to detect errors in
    critical components of a processor
  • BIST (Built-In Self-Test): random test
    patterns of bits are applied to the circuit under
    test and the output is checked for errors
  • ATPG (Automatic Test Pattern Generation)
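A toy BIST loop is sketched below: a pattern generator, a circuit under test and a response analyzer that compares a compacted response with a known-good value. Everything concrete here is an assumption for illustration: the 4-bit LFSR, the 2-bit adder used as the circuit under test, and the plain checksum used for compaction (hardware would more typically use a signature register).

```python
def lfsr_patterns(seed, count):
    """4-bit LFSR with feedback from bits 3 and 2; period 15 for a nonzero seed."""
    state = seed & 0xF
    for _ in range(count):
        yield state
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        state = ((state << 1) | feedback) & 0xF

def compact(responses):
    """Output response analyzer: compress all responses into one checksum."""
    return sum(responses) & 0xFFFF

def adder_2bit(x):
    """Fault-free circuit under test: adds the two 2-bit halves of x."""
    return (x & 0b11) + (x >> 2)

def faulty_adder_2bit(x):
    """Same circuit with output bit 1 stuck at 1."""
    return adder_2bit(x) | 0b10

PATTERNS = list(lfsr_patterns(seed=0b0001, count=15))
GOLDEN = compact(adder_2bit(p) for p in PATTERNS)   # known-good signature

def run_bist(circuit):
    """Test controller: apply every pattern and report pass (True) or fail."""
    return compact(circuit(p) for p in PATTERNS) == GOLDEN

print(run_bist(adder_2bit), run_bist(faulty_adder_2bit))  # prints True False
```

Because the patterns come from an on-chip generator and only a pass/fail result leaves the circuit, no external test vectors are needed.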

37
Basic BIST Architecture
BIST Start
BIST Done
Test Controller
Pass/Fail
Output Response Analyzer (ORA)
Test Pattern Generator
System Outputs
Input Isolation circuitry
Circuit Under Test
System Inputs
38
Advantages of BIST
  • Can be used at all levels of testing
  • System level testing in field
  • No need for external test machines
  • Fewer I/O pins needed for testing
  • Burn-in Test made easy
  • No need for test vector development

39
Disadvantages of BIST
  • Area overhead and added susceptibility to
    manufacturing defects
  • Performance penalties
  • Extra effort to design and verify proper
    operation of the BIST logic at design level
  • Additional risk in project

40
Summary
  • What were the main points of the lecture?