Self-calibrating Online Wearout Detection

Description:

Electrical Engineering and Computer Science. University of Michigan ... OBD HSPICE Model. Post-breakdown leakage modeling ...

Slides: 24
Provided by: shugua
Learn more at: https://microarch.org

Transcript and Presenter's Notes


1
Self-calibrating Online Wearout Detection
Authors: Jason Blome, Shuguang Feng, Shantanu Gupta, Scott Mahlke
MICRO-40, December 3, 2007
2
Motivation
  • "Designing Reliable Systems from Unreliable
    Components" (Shekhar Borkar, Intel)

Failures will be wearout induced
More failures to come
3
Current Approaches
  • Traditional
      • Design margins
      • Burn-in
  • Detection based on replication of computation
      • TMR (Tandem/HP NonStop servers)
      • DIVA (Bower, MICRO05)
  • Prediction utilizes precise analytical models
    and/or sensors
      • Canary circuits (Sentinel Silicon, Ridgetop)
      • RAMP (Srinivasan, UIUC/IBM)

Impractical
Static
Costly
4
Wearout Mechanisms
  • Many failure mechanisms have been shown to be
    progressive
  • Hot carrier injection (HCI)
  • Negative Bias Temperature Instability (NBTI)
  • Electromigration (EM)
  • Oxide Breakdown (OBD)

5
Objective
  • Propose a failure prediction technique that
    exploits the progressive nature of wearout
      • Monitor impact on path delays
  • Prediction
      • Monitors evolution of wearout
      • Proactive: enables failure avoidance/mitigation
      • Continuous feedback
      • False negatives and positives
  • Detection
      • Identifies existing fault
      • Reactive: enables failure recovery
      • End-of-life feedback
      • False negatives

6
Oxide Breakdown (OBD)
  • Accumulation of defects leads to a conductive
    path

Percolation model (Stathis, JAP06)
7
OBD HSPICE Model
  • Post-breakdown leakage modeling

(BSIM4.6.0, 06)
8
Characterization Testbench
  • 90nm standard cell library

[Figure: characterization testbench; labels: t_circuit, t_cell]
9
Impact on Propagation Delay
10
Delay Profiling Unit (DPU)
[Figure: DPU block diagram; labels: input signal, uArch Module, Latency Sampling]
11
TRIX Analysis
The magnitude of the divergence between TRIX_global and
TRIX_local reflects the amount of degradation
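One way to read the divergence statement above is as a relative gap between the two estimates. The sketch below is only illustrative: the function names, the choice of a relative (rather than absolute) gap, and the 5% threshold are assumptions, not values taken from the slides.

```python
def trix_divergence(trix_local: float, trix_global: float) -> float:
    """Relative gap between the short-term (local) and long-term (global) estimates."""
    if trix_global <= 0:
        raise ValueError("trix_global must be positive")
    return (trix_local - trix_global) / trix_global


def wearout_suspected(trix_local: float, trix_global: float,
                      threshold: float = 0.05) -> bool:
    # threshold is a hypothetical tuning knob, not a number from the presentation
    return trix_divergence(trix_local, trix_global) > threshold
```

For example, a local estimate of 110 against a global estimate of 100 gives a divergence of about 0.10, which this hypothetical threshold would flag.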
12
TRIX Analysis Details
  • Exponential Moving Average (EMA)
  • Triple-smoothed Exponential Moving Average
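
The two bullets above can be sketched in a few lines. This assumes the common EMA recurrence EMA_t = alpha * x_t + (1 - alpha) * EMA_{t-1}, seeded with the first sample, and takes "triple-smoothed" to mean the EMA applied three times in a row. The function names and the smoothing factor are illustrative assumptions; the slides do not give them.

```python
from typing import Iterable, List


def ema(samples: Iterable[float], alpha: float) -> List[float]:
    """Exponential moving average, seeded with the first sample."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out: List[float] = []
    prev = None
    for x in samples:
        # Blend the new sample with the running average.
        prev = x if prev is None else alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out


def triple_ema(samples: Iterable[float], alpha: float) -> List[float]:
    """Apply the EMA three times; each pass removes more sample-to-sample noise."""
    return ema(ema(ema(samples, alpha), alpha), alpha)
```

A smaller alpha gives a slower, smoother trace, so running the same routine with two different alpha values is one plausible way to obtain a fast "local" and a slow "global" trend.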

13
Noisy Latency Profile
[Figure: noisy latency profile; axes: percent nominal delay (%) vs. increasing age]
14
DPU with TRIX Hardware
[Figure: DPU with TRIX hardware; blocks: input signal, Latency Sampling, TRIXl Calculation, TRIXg Calculation, Prediction]
15
Wearout Detection Unit (WDU)
[Figure: WDU block diagram; blocks: Latency Sampling, TRIXl Calculation, TRIXg Calculation, Prediction]
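
The pipeline named on this slide (latency sampling feeding a local and a global TRIX calculation) might be modelled in software roughly as follows. This is a sketch under stated assumptions, not the hardware from the talk: it assumes smoothing factors that are powers of two, so each update needs only a subtract, a shift and an add, and it assumes 8 fractional bits of fixed-point precision. All class names, shift amounts and FRAC_BITS are hypothetical.

```python
from typing import Tuple

FRAC_BITS = 8  # hypothetical fixed-point precision (fractional bits)


class ShiftEma:
    """Integer EMA with alpha = 2**-shift (assumed, for shift-only arithmetic)."""

    def __init__(self, shift: int) -> None:
        self.shift = shift
        self.state = None  # fixed-point running average

    def update(self, x: int) -> int:
        if self.state is None:
            self.state = x  # seed with the first sample
        else:
            self.state += (x - self.state) >> self.shift
        return self.state


class ShiftTrix:
    """Three chained ShiftEma stages, i.e. a triple-smoothed EMA."""

    def __init__(self, shift: int) -> None:
        self.stages = [ShiftEma(shift) for _ in range(3)]

    def update(self, x: int) -> int:
        for stage in self.stages:
            x = stage.update(x)
        return x


class WearoutTracker:
    """Feeds each latency sample to a fast (local) and a slow (global) tracker."""

    def __init__(self, local_shift: int = 2, global_shift: int = 6) -> None:
        self.local = ShiftTrix(local_shift)
        self.global_ = ShiftTrix(global_shift)

    def sample(self, latency: int) -> Tuple[int, int]:
        x = latency << FRAC_BITS  # convert to fixed point
        return self.local.update(x), self.global_.update(x)
```

With a steady latency both outputs agree; once the sampled latency starts to rise, the local value moves first and the gap to the global value widens, which is the divergence a prediction stage would watch.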
16
Evaluation Framework
Gate-level Processor Simulator
OR1200 Verilog
Synthesis and Place and Route
90nm Library
Timing, Power, and Temperature Simulations
MediaBench Suite
Workload Simulator
OBD Wearout Model
HSPICE Simulations
Wearout Simulator
17
WDU Accuracy
18
WDU Overhead
19
WDU Overhead
20
Long-term Vision
  • Introspective Reliability Management (IRM)
      • Intelligent reliability management directed by
        on-chip sensor feedback
  • Prospective sensors
      • Delay (WDU)
      • Leakage/Vt
      • Temperature

21
Introspective Reliability Management
22
Conclusions
  • Many progressive wearout phenomena impact
    device-level performance.
  • It's possible to characterize this impact and
    anticipate failures
  • WDU performance
      • Failure predicted within 20% of end of life
        (tunable)
      • Area overhead < 3% (hybrid)
  • Low-level sensors can be used to enable
    intelligent reliability management

23
Questions?