Timing Analysis Challenges for High speed CPU's at 90nm and below

Provided by: avief
1
Timing Analysis Challenges for High speed CPU's
at 90nm and below
Agenda
  • ITRS Predictions: Design Challenges
  • Timing Analysis at Intel
  • Current issues and solutions
  • Mid-term challenges
  • Summary

Avi Efrati, Moshe Kleyner
2
The VLSI Chip in 2010...
  • Process technology: 25 nm gate length
  • Transistors: 1,546 M
  • Logic transistors: 300 M
  • Size: 280 mm²
  • Clock frequency: 11.5 GHz
  • Chip I/Os: 3,840
  • Wiring levels (metals): 9 - 10
  • Voltage: 0.8 - 1.0 V
  • Power: 120-218 Watts
  • Supply current: 160 Amps
Source: ITRS '01 roadmap
3
Timing verification for Intel CPUs
  • Synchronous design style, mostly
  • Multiple synchronized clocks, GHz range
  • NO trend to asynchronous design in near future
  • Deep pipelining
  • Internal static timer: Tango
  • Cell-based, using abstract models for custom
    blocks
  • Handles transparent latches and sequential
    transparent loops, both BFS and DFS timing
    propagation options
  • Generates and uses proprietary abstract timing
    model for hierarchical timing
  • At each level an abstract timing model can be
    created for next level
  • Typically 2-3 timing hierarchy levels
  • PathMill used at device-level, produces same
    abstract model
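
A minimal sketch of the BFS propagation option, assuming the timing graph is an acyclic gate-level graph with fixed edge delays. It does not show how Tango works internally: the function name, the data layout and the delay values are all hypothetical, and transparent-latch handling is left out.

```python
from collections import deque

def propagate_arrivals(edges, sources):
    """Breadth-first (levelized) max-arrival propagation over a timing DAG.

    edges:   dict mapping node -> list of (successor, delay)
    sources: dict mapping start node -> arrival time at that node
    """
    # Count incoming edges so a node is processed only after all its fan-in.
    indegree = {}
    for node, outs in edges.items():
        indegree.setdefault(node, 0)
        for succ, _ in outs:
            indegree[succ] = indegree.get(succ, 0) + 1

    arrival = dict(sources)
    ready = deque(n for n, d in indegree.items() if d == 0)
    while ready:
        node = ready.popleft()
        for succ, delay in edges.get(node, []):
            if node in arrival:
                candidate = arrival[node] + delay
                # Keep the latest (max) arrival seen so far at the successor.
                if candidate > arrival.get(succ, float("-inf")):
                    arrival[succ] = candidate
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    return arrival

# Hypothetical example graph: two inputs converging on node z.
EXAMPLE_EDGES = {
    "a": [("x", 2), ("y", 1)],
    "b": [("x", 4)],
    "x": [("z", 3)],
    "y": [("z", 7)],
}
EXAMPLE_ARRIVALS = propagate_arrivals(EXAMPLE_EDGES, {"a": 0, "b": 0})
```

Each node is visited once, after all of its fan-in, which is what makes the breadth-first order convenient when several arrivals at a node must be considered together.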

4
What's under the hood?
  • Handling transparent loops
  • False paths
  • Hierarchical Analysis
  • Shell models

5
Loops
[Figure: loop example; clock labels clk and clk2]
  • Combinational loops are disallowed
  • Local self-resetting circuitry may exist
  • Sequential loops exist
  • Formed by combinational paths and transparent
    latches
  • Actually form SCCs (strongly connected components),
    handled automatically
  • Typical for FSM implemented with Latches
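
One way to find such sequential loops is to run a strongly-connected-component search over the latch-to-latch graph. The sketch below uses Tarjan's algorithm. The choice of algorithm, the graph encoding and the latch names are assumptions made for illustration; the slides do not say how the internal timer detects SCCs.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps node -> list of successor nodes."""
    index = {}
    lowlink = {}
    stack = []
    on_stack = set()
    components = []
    counter = [0]

    def visit(node):
        index[node] = lowlink[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, []):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        # A node whose lowlink equals its own index is the root of an SCC.
        if lowlink[node] == index[node]:
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components

def sequential_loops(graph):
    """Return only SCCs that contain a cycle (several nodes, or a self-loop)."""
    return [sorted(c) for c in strongly_connected_components(graph)
            if len(c) > 1 or c[0] in graph.get(c[0], [])]

# Hypothetical latch graph: L1 -> L2 -> L3 -> L1 forms a loop, L4 hangs off it.
LATCH_GRAPH = {"L1": ["L2"], "L2": ["L3"], "L3": ["L1", "L4"], "L4": []}
LOOPS = sequential_loops(LATCH_GRAPH)
```

The recursive form is fine for a small illustration; a production graph would need an iterative version to avoid deep recursion.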

6
False Paths
  • Manual marking of false paths, considered in
    timing analysis
  • Automatic SAT-based false paths
  • Work done with K. Sakallah, U. Mich.
  • Applied in combinational logic

[Figure: false-path example; signal labels b0, c1, d0, e1, c0]
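
To illustrate what a false path is, the sketch below checks static sensitization by brute-force enumeration of the input assignments, which stands in for the SAT solver mentioned above. The two-multiplexer circuit, the signal names and the sensitization criterion are assumptions chosen for the example; they are not taken from the slides.

```python
from itertools import product

def evaluate(gates, inputs):
    """Evaluate gates (listed in topological order) for one input assignment."""
    values = dict(inputs)
    for out, (func, ins) in gates.items():
        values[out] = func(*(values[i] for i in ins))
    return values

def is_statically_sensitizable(gates, primary_inputs, path):
    """True if some input assignment lets a transition travel along `path`."""
    for bits in product([0, 1], repeat=len(primary_inputs)):
        values = evaluate(gates, dict(zip(primary_inputs, bits)))
        ok = True
        for src, dst in zip(path, path[1:]):
            func, ins = gates[dst]
            args = [values[i] for i in ins]
            # Flip only the on-path input; side inputs keep their values.
            flipped = [1 - v if i == src else v for i, v in zip(ins, args)]
            if func(*args) == func(*flipped):
                ok = False
                break
        if ok:
            return True
    return False

def mux(sel, when1, when0):
    return when1 if sel else when0

# Hypothetical circuit: both muxes share select s, so a -> m1 -> out needs
# s=1 at the first mux and s=0 at the second, which cannot happen together.
GATES = {
    "m1": (mux, ["s", "a", "b"]),
    "out": (mux, ["s", "c", "m1"]),
}
INPUTS = ["s", "a", "b", "c"]
PATH_A_IS_TRUE = is_statically_sensitizable(GATES, INPUTS, ["a", "m1", "out"])
PATH_B_IS_TRUE = is_statically_sensitizable(GATES, INPUTS, ["b", "m1", "out"])
```

Enumeration is exponential in the number of inputs, which is why a SAT formulation is the practical choice for real combinational blocks.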
7
Hierarchical Analysis
  • Cannot analyze full-chip at transistor or gate
    level
  • Huge data, impractical run-time
  • Abstract blocks as compact models
  • Hide internal details not relevant at chip level,
    assume pre-defined clocks
  • As accurate as possible electrical interface and
    timing model
  • Abstract model also supports timing transparency
    (BLUE BOX)
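
The idea of abstracting a block can be sketched as collapsing its internal timing graph into pin-to-pin maximum delays. This is a heavily simplified assumption: the proprietary abstract model described here also carries the electrical interface and timing transparency, neither of which is modeled below. All names and delay values are hypothetical.

```python
def build_abstract_model(edges, inputs, outputs):
    """Collapse a block's internal timing DAG into pin-to-pin max delays.

    edges: dict mapping node -> list of (successor, delay)
    Returns a dict mapping (input_pin, output_pin) -> longest delay.
    """
    def longest(node, target, memo):
        if node == target:
            return 0.0
        if node not in memo:
            best = None
            for succ, delay in edges.get(node, []):
                tail = longest(succ, target, memo)
                if tail is not None and (best is None or delay + tail > best):
                    best = delay + tail
            memo[node] = best  # None means the target is unreachable
        return memo[node]

    model = {}
    for out in outputs:
        memo = {}  # memo is valid for one target output only
        for inp in inputs:
            delay = longest(inp, out, memo)
            if delay is not None:
                model[(inp, out)] = delay
    return model

# Hypothetical block with two inputs, two internal nodes and two outputs.
BLOCK_EDGES = {
    "in1": [("n1", 1.0)],
    "in2": [("n1", 2.0), ("n2", 1.0)],
    "n1": [("out1", 3.0)],
    "n2": [("out2", 0.5)],
}
BLOCK_MODEL = build_abstract_model(BLOCK_EDGES, ["in1", "in2"], ["out1", "out2"])
```

The chip-level run then sees only the three pin-to-pin arcs in `BLOCK_MODEL` instead of the internal nodes.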

8
Shell Model
Core
  • Interface cells and interconnect are preserved
  • User may select deeper than 1 shell
  • User may expose some transparent latches
  • Balance core complexity versus amount of cells
    exposed in full-chip, Deep Shell Model
  • Cores are abstract timing models
  • Full-chip analysis uses shell models of blocks

9
Current and near-term challenges
  • CrossTalk impact on timing
  • Active interconnect
  • Mixed abstraction, device to full-chip
  • Use of domino as characterized cells
  • SoC challenges

10
CrossTalk impact on Timing
  • CrossTalk has noise and timing impact
  • Search for highest peak noise while:
  • Victim transitions, for timing
  • Victim is stable, for functional noise
  • CrossTalk timing effect may be approximated as a
    Miller Xcap multiplier (MCF), but:
  • Default MCF may over- or under-estimate the effect
  • MCF is slope dependent, difficult to set upfront
  • AWE superposition gives good results but may be
    too costly to apply everywhere
  • Accuracy vs. run-time tradeoff is key
  • Timing filtering followed by local logic
    filtering
  • SMCF (smart MCF) or AWE-based peak
  • Timing iterations to converge CrossTalk impact
  • Very active research in last few years !!
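
As a rough illustration of the MCF approximation combined with timing filtering, the sketch below scales each coupling capacitance by a multiplier only when the aggressor's switching window overlaps the victim's. It assumes the common idealization of MCF = 2 for an opposite-direction aggressor and MCF = 1 for a quiet one. The function name, window format and capacitance values are hypothetical.

```python
def effective_cap(c_ground, couplings, victim_window, default_mcf=2.0):
    """Effective victim load for max-delay analysis.

    couplings:     list of (c_couple, (aggressor_early, aggressor_late))
    victim_window: (early, late) switching window of the victim
    """
    total = c_ground
    for c_couple, (agg_early, agg_late) in couplings:
        # Timing filter: an aggressor matters only if the windows can overlap.
        overlaps = agg_early <= victim_window[1] and victim_window[0] <= agg_late
        mcf = default_mcf if overlaps else 1.0
        total += mcf * c_couple
    return total

# Hypothetical victim: 10 units to ground, one overlapping and one
# non-overlapping aggressor.
EXAMPLE_CAP = effective_cap(
    10.0,
    [(2.0, (0.0, 5.0)), (3.0, (20.0, 30.0))],
    (3.0, 8.0),
)
```

Because the victim window itself depends on the delays computed with these capacitances, the calculation has to be repeated until the windows stop changing, which is the role of the timing iterations listed above.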

11
Fitting SMCF to experimental data
  • Physically, MCF depends on L = Tvic/Tagg
  • Experimentally fitted with the equation MCF = a - b*exp(-L)
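
Since the fitted form a - b*exp(-L) is linear in a and b once exp(-L) is taken as the regressor, an ordinary least-squares fit is enough. The sketch below assumes L is the ratio of victim to aggressor transition times and uses synthetic samples; the coefficients 2.0 and 1.5 are made up for the example and are not measured data.

```python
import math

def fit_smcf(samples):
    """Least-squares fit of MCF = a - b*exp(-L) to (L, mcf) samples."""
    xs = [math.exp(-l) for l, _ in samples]
    ys = [m for _, m in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx            # slope of mcf versus exp(-L) equals -b
    a = mean_y - slope * mean_x
    return a, -slope

def smcf(a, b, ratio):
    """Evaluate the fitted multiplier for a given ratio L."""
    return a - b * math.exp(-ratio)

# Synthetic samples generated from hypothetical coefficients a=2.0, b=1.5.
SAMPLES = [(l, 2.0 - 1.5 * math.exp(-l)) for l in (0.25, 0.5, 1.0, 2.0, 4.0)]
FIT_A, FIT_B = fit_smcf(SAMPLES)
```

With real characterization data the residuals of this fit would show how well a single exponential captures the slope dependence.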

12
Active Interconnect
  • Interconnect has not been negligible for quite
    some time; now it becomes active!
  • Repeaters may be buffers, inverters, latches,
    flops
  • Virtual (early design) or real repeaters
  • Interconnect may be
  • Simple wire
  • Buffered (inverted or not)
  • Pipelined (and buffered)
  • Pipelining the interconnect is considered
    simultaneously in RTL, Floor Plan and early
    timing
  • Mutual Inductance impact being assessed
  • Asynchronous long-distance on-chip communication ?

13
Mixed Abstraction
  • Layout becomes more cell-based, but circuit
    families in cells are more complex
  • Some circuits may be characterized as cells, some
    may require device-level analysis
  • Fluid cells: device-level optimization
  • Comprehend devices, cells and abstract models in
    same run
  • Single timing graph
  • May need on-the-fly dynamic analysis on parts of
    circuit
  • Use circuit recognition capabilities
  • Requires stimuli generation
  • More detailed waves, not only slope
  • Sophisticated timing checks for domino
  • Propagate pulses as well, not only arrival times

14
Mixed-level Timing
  • Cell, abstracts and devices co-exist at analysis
    level
  • Choose flexible abstraction/accuracy trade-off

Mixed device/cells/abstracts
15
Domino characterization
  • Regular or footless domino as characterized cells
  • Will be supported in cell-based timing
  • Additional domino latches, etc
  • Delay similar to static cells and latches
  • Checks are more complex!! (see next page)

[Figure: Domino And2 and Footless And2 circuits; labels: inputs, clk, keeper, domino node, output]
See Van Campenhout, Sakallah, Mudge paper 1999
16
Pulse Width Checks
  • Need sufficiently wide pulse at domino node
  • Ensure pulse width to next stage
  • Ensure feedback can hold data
  • Modeling issues
  • Slopes of inputs
  • Pulse width per discharge path
  • Translating inputs intersection into pulse at
    domino node
  • Dis-allowing min-transparency converts pulse
    width to setup check
  • Non-transparency hold check

17
SoC challenges
  • Multi-core CPUs or high-integration SoC
  • New integration level in all areas: RTL, timing,
    layout, testing, etc.
  • Timing challenges
  • New level of hierarchical timing, more need for
    functionality aware timing, better abstract
    models
  • Optimize interfaces without core re-design
  • Integrative approach, zoom-in from abstract to
    detailed in same environment
  • Multiple clocks, possibly asynchronous to each
    other
  • Inter-module communication, protocols, early spec
    and accurate verification
  • More in-die variation, instances of same module
    may operate at different Vcc/temperature etc

18
Mid-term challenges
  • MIS: Multiple Input Switching
  • Process and environment variability
  • Voltage and Temperature
  • Process variability
  • Timing challenges due to leakage reduction
    techniques
  • Sleep transistors usage methodology and support
    in timing

19
MIS: Multiple Input Switching
  • More MIS situations as frequency increases
  • Fewer stages in clock cycle
  • Slope steepness increases slower than frequency
  • Broad range of effects
  • Single stage well known
  • Impact across stages more subtle
  • Load stage may present different effective load
    due to Miller coupling
  • Either slow-down or speed-up
  • Holding side input by real driver versus ideal
    voltage has accuracy impact
  • Characterization/modeling issues

20
One gate slow-down/speed-up
[Figure: one gate, MIS compared with single input switching; 39.7 speedup]
21
Two gates, Fanout pull-in
  • c with a or b or both MIS
  • Miller coupling c,o
  • Position dependent
  • No generic model

[Figure: waveforms of o and o2; Miller coupling and droop cause speedup on o compared with single input switching; 15.6 speedup]
Mitigate with legging, or by pushing the signal down the stack if only
one signal is critical
22
Fanout Signal Location
  • c with a, b or both MIS
  • Either speedup or pushout based on connection
  • Connected to pin a: -15.6 to 12.6 variation
  • Connected to pin b: -0.8 to 0.3 variation

[Figure: fanout connection example; signal labels o and o2]
23
MIS Modeling issues
  • Not so easy to model in CBD (Cell-Based Design)
  • Min/Max timing window provides a range of
    switching times
  • Window overlap of two inputs allows MIS but
    doesn't guarantee it
  • Assuming full MIS leads to over-design
  • Most important to check MIS effect on min-delay
    which may lead to chip failure
  • Max delay MIS may only reduce operating frequency
  • Possibly consider max-delay MIS as random
    variable over overlap window
  • Easier to consider MIS in BFS timing propagation

24
Process and Environment Variability
  • Both deterministic and random variation
  • The absolute σ of CD does not decrease at same
    pace as channel length
  • Thus relative value of L and Vt variation
    increases
  • Lower voltages, higher currents
  • Non-uniform Vdd on chip, consider ΔVdd in timing
  • Big drivers may starve neighbors
  • Are variations causing significant critical path
    re-ordering ?
  • Nominal timing is not good enough to accurately
    predict silicon
  • Worst-casing all effects reduces design space or
    makes design impossible
  • Consider chip map for deterministic variations
  • Need statistical approach in STA for random
    effects

25
Reducing leakage power
  • Most important for mobile and internet servers,
    as important as speed !
  • Standby leakage
  • power consumed when whole chip is idle, Tj is NOT
    high (Spec temp. for mobile at 50°C)
  • impact on battery life for portable devices
  • Active leakage
  • power consumed due to device leakage when chip is
    working, and Tj is high (110°C)
  • Subthreshold and Gate leakage significantly
    higher
  • impact on overall chip thermal design power and
    frequency
  • Ptot = Pswitch + Pleak

26
Leakage Gating with Sleep Transistor
  • Leakage is a main concern below 90nm
  • Partition the chip to allow individual control of
    the sleep transistors
  • Sleep transistor is on while the block is working
  • Sleep transistor is off while the block is idle

27
Sleep transistors in timing
  • Difficult to comprehend in STA
  • Many cells share same virtual ground through one
    sleep transistor (legged/distributed in reality)
  • Voltage of virtual ground depends on current
    drawn by all active gates on same sleep
    transistor
  • Need to guarantee max/min voltage on virtual
    ground
  • How to statically verify min/max GND voltage?
  • Need cell models and interaction models for cells
    on different virtual ground
  • Logic grouping, by time of common switching
  • Estimate current needed in worst case
  • Lack of support in timing tools is main limiting
    factor for using this technique
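
A rough sketch of the grouping idea: cells that switch at the same time are grouped, their peak currents are summed, and the virtual-ground rise is estimated for the worst group. The sleep transistor is modeled as a fixed linear resistance, which is a strong simplifying assumption. The group names, currents and resistance are hypothetical.

```python
def worst_case_virtual_ground(cells, r_sleep):
    """Estimate the worst virtual-ground voltage rise on one sleep transistor.

    cells:   list of (switch_group, peak_current_amps) for cells sharing it
    r_sleep: on-resistance of the sleep transistor in ohms (linear model)
    Returns (worst_group, voltage_rise_volts).
    """
    group_current = {}
    for group, current in cells:
        group_current[group] = group_current.get(group, 0.0) + current
    # Cells in the same group switch together, so their currents add.
    worst = max(group_current, key=group_current.get)
    return worst, group_current[worst] * r_sleep

# Hypothetical cells: two switch in phase1, one in phase2.
EXAMPLE_CELLS = [("phase1", 0.002), ("phase1", 0.003), ("phase2", 0.004)]
WORST_GROUP, WORST_RISE = worst_case_virtual_ground(EXAMPLE_CELLS, 10.0)
```

The resulting voltage would then have to be compared against the max allowed virtual-ground level assumed when the cells were characterized.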

28
Summary
  • STA is a key component of chip design
  • New VDSM and high frequency challenges
  • Hierarchical models cope with full chip
    complexity
  • Electrical interaction across logical hierarchy
    boundaries
  • CrossTalk, MIS, variability and more phenomena
    need efficient solutions
  • Will require more dynamic device-level analysis
    within static timing tools
  • Closer interaction with Logic/Satisfiability

29
Contributors
  • Noel Menezes
  • Florentin Dartu
  • Ken Stevens
  • Vladi Tsipenyuk
  • Uri First
  • Igor Keller
  • Abhijit Dharchoudhury