Title: Opportunities and Challenges for Better Than Worst-Case Design
1. Opportunities and Challenges for Better Than Worst-Case Design
- Todd Austin (presenter)
- Valeria Bertacco
- David Blaauw
- Trevor Mudge
- University of Michigan
- razor_at_eecs.umich.edu
2. Traditional Worst-Case Design
3. Better Than Worst-Case Design
4. Addressing Challenges in the Nanometer Regime
- Design complexity
  - Billions and billions of transistors lead to untenable designs
- Soft error upsets in logic and memory
  - Cosmic rays, alpha particles, neutrons, etc.
- Uncertainty in design parameters
  - Process and temperature variation, supply noise
- Power/performance demands
  - Bounding performance, area, and battery life
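5. Example BTWC Design: DIVA Checker
[Figure: DIVA pipeline. The performance-oriented core (IF, ID, REN, REG, SCHEDULER, EX/MEM) passes speculative instructions in order, with PC, inst, inputs, and addr, to the correctness-oriented checker (CHK), which precedes commit (CT).]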
- All core computation is validated by the checker
  - A simple checker detects and corrects faulty results, then restarts the core
- The checker relaxes the burden of correctness on the core processor
  - Tolerates design errors, electrical faults, defects, and failures
- The core carries the burden of accurate prediction, as the checker is 15x slower
  - The core does the heavy lifting, removing hazards that would slow the checker
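The checker role described above can be illustrated in software. The following is only a sketch of the check-and-commit idea, not the DIVA hardware: the three-operation ISA (`OPS`), the `CoreResult` record, and the restart rule of "next PC = PC + 4" are all hypothetical. It also simply trusts the operand values the core supplies; it does not model how a real checker would validate them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

MASK32 = 0xFFFFFFFF

# Hypothetical toy ISA: each operation maps two 32-bit inputs to a 32-bit result.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda x, y: (x + y) & MASK32,
    "sub": lambda x, y: (x - y) & MASK32,
    "and": lambda x, y: x & y,
}


@dataclass
class CoreResult:
    pc: int                  # program counter of the instruction
    op: str                  # operation name in the toy ISA
    inputs: Tuple[int, int]  # operand values supplied by the core
    result: int              # value the (possibly faulty) core computed


def check_and_commit(stream: List[CoreResult]) -> Tuple[List[int], Optional[int]]:
    """Re-execute core results in program order.

    Returns the committed values and, if a mismatch was found, the PC at
    which the core must restart (None if every result checked out).
    """
    committed: List[int] = []
    for inst in stream:
        expected = OPS[inst.op](*inst.inputs)  # simple, trusted recomputation
        committed.append(expected)             # the checker's value is what commits
        if expected != inst.result:
            # Fault detected: keep the corrected value, drop the rest of the
            # speculative stream, and restart the core after this instruction
            # (assuming fixed 4-byte instructions).
            return committed, inst.pc + 4
    return committed, None
```

For example, if the core reports `sub 10, 4 = 7`, the checker commits the corrected value 6 and asks the core to restart at the following PC; later speculative results are discarded rather than committed.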
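6. Another BTWC Design: Razor Logic
[Figure: Razor flip-flop in the MEM stage; the main flip-flop is clocked by clk and a shadow latch by the delayed clock clk_del.]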
- Double-sampling, metastability-tolerant latches detect timing errors
  - The second sample is correct by design
- Microarchitectural support restores state
  - Timing errors are treated like branch mispredictions
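The double-sampling idea above can be modeled behaviorally. The following is a minimal sketch, not a circuit description: it assumes a stage whose output settles `delay` time units after launch, a main flip-flop that samples at `period`, and a shadow latch that samples `skew` later (on clk_del). All names and the time units are hypothetical.

```python
from typing import NamedTuple


class RazorSample(NamedTuple):
    value: int   # value forwarded to the next stage after any correction
    error: bool  # True when main and shadow samples disagreed


def razor_capture(old: int, new: int, delay: float, period: float, skew: float) -> RazorSample:
    """Behavioral model of one Razor flip-flop capture.

    old   -- value the stage held before this cycle
    new   -- value the stage computes this cycle
    delay -- time for the new value to settle at the flip-flop input
    """
    # The shadow latch is correct by design: paths must settle before clk_del.
    if delay > period + skew:
        raise ValueError("path violates shadow-latch timing")
    main = new if delay <= period else old  # main flip-flop samples at the clk edge
    shadow = new                            # shadow latch samples at clk_del
    if main != shadow:
        # Timing error: restore the correct value from the shadow latch and
        # signal the pipeline to recover, much like a branch misprediction.
        return RazorSample(shadow, True)
    return RazorSample(main, False)
```

A late-arriving result (`delay` between `period` and `period + skew`) is caught and corrected, while a late path whose value did not change raises no error, since both samples agree.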
7. Distributed Pipeline Recovery
[Figure: pipeline timing diagram over cycles 0 to 9 for instructions inst1 to inst8 in the IF, ID, EX, MEM (read-only), and WB (reg/mem) stages, showing error, bubble, recover, and flushID signals from the Flush Control unit.]
- Builds on the existing branch prediction framework
- Multiple-cycle penalty for each timing failure
- Scalable design, as all communication is local
8. Opportunities for CAD
- Key observation
  - Infrequent faults in the core design are tolerable
- Opportunities
  - Focus only on the critical components; no need to verify ad infinitum
  - Optimize performance/power for the most common scenarios (typical-case optimization)
9. Razor Opportunity: Typical-Case Energy Reduction
- Energy reduction can be realized with a simple proportional control function
  - Control algorithm implemented in software
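The slide does not give the control law, so the following is only a sketch of a proportional controller under assumed numbers. `error_rate` is a hypothetical toy model standing in for the measured Razor error rate (zero at or above 1.4 V, rising linearly below it), and the gain, target error rate, and voltage limits are illustrative, not values from this work.

```python
def error_rate(voltage: float) -> float:
    """Hypothetical stand-in for a measured Razor error rate."""
    return min(1.0, max(0.0, 0.5 * (1.4 - voltage)))


def control_step(voltage: float, measured: float, target: float,
                 gain: float, v_min: float, v_max: float) -> float:
    """One proportional update: raise the supply when errors exceed the
    target, lower it when they fall short, and clamp to the legal range."""
    v = voltage + gain * (measured - target)
    return min(max(v, v_min), v_max)


def run_controller(v0: float, target: float, gain: float, steps: int,
                   v_min: float = 1.0, v_max: float = 1.8) -> float:
    """Iterate the controller from an initial (worst-case) supply voltage."""
    v = v0
    for _ in range(steps):
        v = control_step(v, error_rate(v), target, gain, v_min, v_max)
    return v
```

Starting from 1.8 V with a 5% target error rate, this toy loop walks the supply down through the error-free region and settles near the voltage where `error_rate` equals the target (1.3 V under the assumed model). Because the update is proportional to the error-rate mismatch, it stays simple enough to run in software, as the slide suggests.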
10. Energy/Performance Characteristics
[Figure: energy and IPC as the supply voltage decreases.]
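The energy/IPC trade-off in this figure can be reasoned about with a simple model. The following sketch assumes, hypothetically, that dynamic energy per cycle scales with V², that the error rate grows exponentially once the supply drops toward a critical voltage of 1.0 V, and that each error costs a fixed number of recovery cycles. None of these constants come from the measurements behind the figure.

```python
import math


def error_rate(v: float) -> float:
    """Hypothetical error rate: grows exponentially as v drops, capped at 1."""
    return min(1.0, math.exp((1.0 - v) / 0.03))


def cycles_per_instruction(v: float, penalty_cycles: int = 10) -> float:
    """Each timing error adds penalty_cycles of recovery (IPC = 1 / this)."""
    return 1.0 + penalty_cycles * error_rate(v)


def energy_per_instruction(v: float) -> float:
    """Dynamic energy per cycle is taken as proportional to V^2."""
    return v * v * cycles_per_instruction(v)


# Sweep the supply from 1.00 V to 1.50 V in 10 mV steps.
voltages = [1.0 + 0.01 * i for i in range(51)]
best_v = min(voltages, key=energy_per_instruction)
```

Under these assumptions, energy first falls with V² as the supply is lowered, then rises again once recovery cycles dominate, so the minimum lies below the error-free voltage but above the point where errors become frequent. This is the balance the voltage controller aims for.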
11. Razor Opportunity: Typical-Case Optimized Adder
12. Carry Propagations for Random Data
[Figure: probability of carry propagation by bit position and carry distance.]
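Carry-distance statistics like those plotted here can be gathered by counting, for each bit that generates a carry, how far that carry ripples. The following is a sketch under one assumed definition of "carry distance" (bit positions travelled, including the step out of the generating bit); the slide does not state its exact definition, and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import List


def carry_distances(a: int, b: int, width: int) -> List[int]:
    """For each bit i that generates a carry (a_i AND b_i), count the bit
    positions the carry travels: one step into bit i+1 (or out of the word),
    plus one more for every consecutive propagate bit (a_j XOR b_j) above it."""
    g, p = a & b, a ^ b
    dists = []
    for i in range(width):
        if (g >> i) & 1:
            d, j = 1, i + 1
            while j < width and (p >> j) & 1:
                d, j = d + 1, j + 1
            dists.append(d)
    return dists


def distance_histogram(samples: int, width: int = 32, seed: int = 0) -> Counter:
    """Tally carry distances over uniformly random operand pairs."""
    rng = random.Random(seed)
    hist: Counter = Counter()
    for _ in range(samples):
        a, b = rng.getrandbits(width), rng.getrandbits(width)
        hist.update(carry_distances(a, b, width))
    return hist
```

For instance, `carry_distances(0b0111, 0b0001, 4)` returns `[3]`: the carry from bit 0 ripples through bits 1 and 2 into bit 3. Feeding operand pairs from a program trace instead of `getrandbits` would give the "typical data" view.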
13. Carry Propagations for Typical Data
[Figure: probability of carry propagation by bit position and carry distance.]
14. Typical-Case Optimized Adder
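The adder design on this slide is not recoverable from the transcript, so the following only illustrates the general principle of typical-case optimization. It is a hypothetical speculative adder that computes each carry from a limited window of lower bits, which is fast when carries are short. Comparing against the exact sum stands in for the Razor detection path; real hardware would not compute the full sum this way.

```python
from typing import Tuple


def speculative_add(a: int, b: int, width: int, window: int) -> Tuple[int, bool]:
    """Add two width-bit operands, looking back at most `window` bits per carry.

    Returns (speculative_sum, mis_speculated). Operands are assumed to lie
    in [0, 2**width).
    """
    mask = (1 << width) - 1
    result = 0
    for i in range(width):
        lo = max(0, i - window)
        seg = (1 << i) - (1 << lo)  # mask selecting bits lo .. i-1
        # Carry into bit i, assuming no carry enters bit lo from below.
        carry = (((a & seg) + (b & seg)) >> i) & 1
        result |= (((a >> i) ^ (b >> i) ^ carry) & 1) << i
    # Stand-in for error detection: a long carry chain makes the result wrong.
    mis_speculated = result != ((a + b) & mask)
    return result, mis_speculated
```

With a 4-bit window, `3 + 5` is computed correctly, while `0xFF + 1` (an 8-bit carry chain) yields a wrong speculative sum that is flagged for recovery. The bet is that such long chains are rare in typical data.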
15. Benefits of Typical-Case Optimization
- Typical-case performance is much better than worst-case performance
  - Especially for a typical-case optimized design
16. Core CAD Requirement: Observability of Circuit-Level Characteristics
- A circuit-aware architectural simulator efficiently melds circuit simulation with architectural simulation
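The simulator described on this slide is not specified here, so the following is only a sketch of the general idea of feeding a circuit-level characteristic into an architectural cycle count. It assumes a hypothetical delay model in which an add's delay grows linearly with its longest carry chain, and a timing error costs a fixed recovery penalty. The parameter names and units are illustrative.

```python
from typing import Iterable, Tuple


def longest_carry_chain(a: int, b: int, width: int) -> int:
    """Longest distance any generated carry travels (0 if no carry)."""
    g, p = a & b, a ^ b
    longest = 0
    for i in range(width):
        if (g >> i) & 1:
            d, j = 1, i + 1
            while j < width and (p >> j) & 1:
                d, j = d + 1, j + 1
            longest = max(longest, d)
    return longest


def simulate_adds(ops: Iterable[Tuple[int, int]], width: int, fixed_delay: float,
                  gate_delay: float, period: float, penalty: int) -> Tuple[int, int]:
    """Count cycles and timing errors for a trace of add operands.

    Each add takes one cycle; if its data-dependent delay (hypothetical linear
    model) exceeds the clock period, a timing error adds `penalty` cycles.
    """
    cycles = errors = 0
    for a, b in ops:
        delay = fixed_delay + gate_delay * longest_carry_chain(a, b, width)
        cycles += 1
        if delay > period:
            errors += 1
            cycles += penalty
    return cycles, errors
```

Running the same trace at different `period` values shows how circuit timing and architectural throughput interact, which is the kind of observability the slide calls for.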
17. Additional CAD Opportunities
- For synthesis
  - Typical-case library characterization (e.g., probability density function of delay)
  - Synthesize the design for target performance, power, etc.
  - TCO-style optimizations possible for macro-modules
- For verification
  - Full formal verification for checker components
  - Profile-directed, simulation-based verification for the core
- For testing
  - The checker component can facilitate software-based manufacturing test of core components
18. Conclusions
- Better than worst-case design abandons traditional worst-case design constraints
  - Couples complex designs with checkers
- Enables CAD opportunities for typical-case optimization
  - Requires tool support for observability, synthesis, and verification
- For more information
  - http://www.eecs.umich.edu/razor
  - First tutorial at DATE, Munich, March 2005
19. Example BTWC Design: Razor Logic
- Goal: reduce voltage margins with in-situ circuit timing error detection and correction
- Approach
  - Tune processor voltage based on the error rate
  - Eliminate margins and run below the critical voltage
  - Trade off power savings against the overhead of correction
- The technique is targeted toward embedded CPUs
20. Kogge-Stone Adder
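The Kogge-Stone adder is a standard parallel-prefix adder. The slide's figure is not in this transcript, so the following is a generic bit-level sketch of the textbook structure rather than the specific design shown. Generate/propagate pairs are combined at doubling distances, so the prefix loop runs ceil(log2(width)) levels. The function name is hypothetical, and the carry-in is assumed to be 0.

```python
def kogge_stone_add(a: int, b: int, width: int) -> int:
    """Add two width-bit operands with a Kogge-Stone prefix network."""
    mask = (1 << width) - 1
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]  # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagate
    G, P = g[:], p[:]
    dist = 1
    while dist < width:  # one prefix level per doubling of dist
        new_g, new_p = G[:], P[:]
        for i in range(dist, width):
            # Combine span ending at i with the span ending at i - dist.
            new_g[i] = G[i] | (P[i] & G[i - dist])
            new_p[i] = P[i] & P[i - dist]
        G, P = new_g, new_p
        dist *= 2
    # G[i] is now the group generate over bits 0..i, i.e. the carry out of bit i.
    s = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        s |= (p[i] ^ carry_in) << i
    return s & mask
```

Every carry is resolved after the same logarithmic number of levels regardless of operand values, which makes this a natural worst-case-optimized baseline against which a typical-case design can be compared.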