Variation Tolerant Analog and Digital Design Methodologies - PowerPoint PPT Presentation


Title: Variation Tolerant Analog and Digital Design Methodologies


1
Variation Tolerant Analog and Digital Design
Methodologies
  • Larry Pileggi
  • Carnegie Mellon
  • pileggi_at_ece.cmu.edu

2
Preview
  • Controlling the dominant (systematic) variations
  • Regular and restricted design rule (RDR) logic
    fabrics
  • Methodologies and circuits that are optimized for
    regular fabrics
  • Analog/RF regularity
  • Stochastic design methods for random variations
  • Modeling circuit-level variability
  • SRAM statistical modeling and design
  • Analog/RF stochastic design

3
Sub-65nm CMOS Challenges
  • Design and manufacturing costs are now
    prohibitive
  • Printability limited by sub-wavelength
    lithography
  • Standard layout rules become insufficient
  • First eliminate systematic variability, then
    address random variability

4
Gridded RDRs
  • Return to the past? → gridded, fixed-pitch
    layouts
  • The translation of stick-layouts to gds2 patterns
    dictates the required rules and layout density

5
Example: Gridded M1 Patterns
  • Example rules for contacts/vias at line-ends
  • Which micro-regular pattern is more
    manufacturable?

[Figure: Pattern A and Pattern B; 100nm spacing]
6
Example: Gridded M1 Patterns
  • Exploiting the slightly tighter line-ends with
    pattern B can improve area particularly with
    gridded layout
  • Relies on ability to characterize all possible
    patterns

[Figure: Pattern B and the desired process window]
7
Reduced Number of Patterns
  • Micro-regular patterning can reduce number of
    unique patterns and reduce systematic variability
  • But the macro-regularity, or the way we group
    subtle patterns, can be equally important
  • At what node do we benefit from limiting both
    micro-regular patterns and macro-regular
    groupings?
  • How can we best utilize this regularity?

8
Macro-Regular Predictability
  • Ex: SRAM-layout-specific SPICE models are
    required for design closure of CMOS SRAMs
  • Statistical transistor models (90nm) based on
    all possible patterns produce a much wider
    noise margin distribution

[Figure: noise margin distributions from SRAM-layout-specific models and DR-compliant-layout models; σ = 0.060 and σ = 0.026]
9
Regularity Simplifies Rules Qualification
  • Standard design rules are created for the worst
    case; SRAM rules are created for specific patterns
  • Rules can be simplified (pushed) for regular
    patterns with known neighborhoods
  • If we can pre-qualify all regular patterns, there
    is less need to pre-qualify all logic cells
  • Can now derive application-domain-specific logic
    for improved logic efficiency and density
  • Requires methodology based on micro- and
    macro-regularity

10
Logic Bricks for Macro-Regularity
  • To control the number of geometry patterns that
    must be pre-qualified we can implement logic from
    larger cells (bricks)
  • Reduces the number of edge patterns

11
CMU Experimental Brick Flow
12
ARM926EJ Example
  • 65nm Low Power CMOS
  • Std Cell Spec Design
  • 16KB D cache, 32KB I cache
  • 250MHz worst case
  • Area: 1.1323 mm²
  • Bricks derived from 7 fixed-size primitives
  • AO22, AO12, Nand2, Nor2, And2, 4:1 Mux
  • 3 Flip Flop types
  • Various INV sizes for buffering
  • 16 fixed-size application-specific bricks
  • Compatible boundary INVs, NANDs and NORs
  • Identical to std cell footprint
  • 25 man-weeks of design time
  • First-pass working silicon

[Figure: Unidirectional micro-regular fabric]
13
ARM926EJ Results
  • Std cells based on sizing and resynthesis using
    complete library
  • Results do not reflect improved control of
    variations, or possible improvement with
    brick-specific synthesis and design flow
  • Normalized Leff comparison based on ACLV
    simulations at nominal process conditions for DFF
    cell vs. brick

14
Regular Bricks vs. Non-Regular Std Cells
  • DR-compliant, unidirectional FEOL pattern bricks
    incur a 15-75% area penalty vs. non-regular
    standard cells at 65nm
  • Simple patterns allow pushed line-end rules for
    area improvement
  • Can merge diffusions within large brick functions

[Figure: normalized area vs. brick index for pushed-rule bricks, normalized to non-regular-pattern std cells; area axis from 0.5 to 1.2]
15
Regular Bricks vs. Non-Regular Std Cells
  • DR-compliant, unidirectional FEOL pattern bricks
    incur a 15-75% area penalty vs. non-regular
    standard cells
  • Simple patterns allow pushed line-end rules for
    area improvement
  • Can merge diffusions within large brick functions
  • Transistor-level optimization (TLO) of large
    brick functions offers further improvement

[Figure: area normalized to non-regular-pattern std cells, for pushed-rule bricks and bricks w/ manual TLO]
16
Baking Bricks
  • Can derive application-specific bricks that are
    constructed from pre-qualified regular patterns
  • TLO can provide significant improvement for large
    non-traditional logic functions
  • Some very efficient transistor-level
    implementations are possible for certain
    application-domain specific designs

Example brick: F = ab + bc + bd + acd. Synthesis
with a standard cell library requires 18
transistors vs. 10 for this implementation
17
Mapping to Bricks
  • Difficult to add all possible large logic
    functions to standard cell libraries; technology
    mapping algorithms would struggle

Function ABC(DEFG): 20 transistors (2 AO22), 4
stages of logic
Function ABC(DEFG): 16 transistors, 2 stages of
logic
18
Logic/Pattern Co-Optimization
[Figure: gate-based micro-regular gridded layout vs. micro-regular layout of gridded TLO brick]
19
Application-Specific TLO
  • Can also attempt to extract functions that are
    particularly efficient for a specific fabric or
    application
  • Example: b0 = p0p1 + p2p3 + p1p2p4 + p0p3p4

12 transistors, 2 logic stages
26 transistors, 4 logic stages
20
Logic BRIX
  • Greater advantages below 65nm as methodologies
    and mapping algorithms accommodate bricks
  • Beta version of commercial flow has demonstrated
    the benefits of pre-qualified patterns and TLO

Courtesy of PDF Solutions pdBRIX
21
Analog and Mixed-Signal
  • Same lithography setup must work for analog and
    mixed-signal components (SRAM)
  • SRAM has always been macro-regular, now becoming
    more micro-regular
  • Analog layout has always been regular to control
    systematic mismatch
  • Random variations now become dominant

22
Random Variations
  • Random variations most prominent for min-sized
    FETs
  • E.g., line edge roughness is most dominant for
    min-length FETs
  • Wider FETs reduce variation via the Central Limit
    Theorem
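
The averaging effect can be sketched numerically. This is a minimal illustration, not the presenter's model: it assumes a wide FET behaves like `n_segments` independent minimum-width segments whose length deviations are Gaussian, and the names `effective_length_sigma` and `sigma_segment` are hypothetical.

```python
import random
import statistics

def effective_length_sigma(n_segments, sigma_segment=1.0, trials=2000, seed=0):
    """Std. dev. of the average length deviation over n_segments independent segments."""
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        # Assumed: each minimum-width segment has an independent Gaussian deviation
        segs = [rng.gauss(0.0, sigma_segment) for _ in range(n_segments)]
        averages.append(sum(segs) / n_segments)
    return statistics.pstdev(averages)

sigma_narrow = effective_length_sigma(1)   # minimum-width FET
sigma_wide = effective_length_sigma(16)    # FET that is 16x wider
print(sigma_narrow, sigma_wide)
```

Under these assumptions the spread of the average length shrinks roughly as 1/sqrt(n_segments), which is why the wide device shows a much tighter distribution.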

[Figure: distribution of ΔL variation and distribution of avg. length, for FET widths W0 and W; L (nm) axis from 45 to 55]
23
Stochastic Design: SRAMs
  • SRAM timing is determined by small FETs in
    bit-cells

[Figure: SRAM read path with core cells, WL, BL/BL_, column mux, replica path, sense amp (SA), SAEN, and OUT/OUT_. Waveforms sampled from 90nm CMOS low-swing bitline SRAM testchip (in collaboration with Prof. Ken Mai, CMU)]
24
Replica Bitline (RBL)
  • Conventional RBL chooses a fixed number of driver
    cells to partially average out the randomness
  • Increasingly difficult as random mismatch becomes
    more dominant

25
Configurable Replica Bitline (CRBL)
  • Instead select a subset of potential driver cells
    (post-manufacturing) that best average out
    randomness

26
Configurable Replica Bitline (CRBL)
  • Post manufacturing selection provided a 100ps
    tuning range using 3 cells selected from 10
    candidates in 90nm testchip
  • Randomness provides for wider tuning range
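
A post-manufacturing selection of this kind can be sketched as a small search. This is only an illustration under stated assumptions: the selection criterion (mean drive current closest to nominal), the 10% current spread, and the names `select_driver_cells` and `mean_error` are hypothetical, and a real CRBL would be tuned from measured timing rather than known currents.

```python
import itertools
import random

def select_driver_cells(cell_currents, k, target_mean):
    """Return indices of the k cells whose mean current is closest to target_mean."""
    return min(
        itertools.combinations(range(len(cell_currents)), k),
        key=lambda idx: abs(sum(cell_currents[i] for i in idx) / k - target_mean),
    )

rng = random.Random(1)
nominal = 1.0  # normalized nominal cell read current (assumed)
candidates = [rng.gauss(nominal, 0.1) for _ in range(10)]  # 10 candidate cells

def mean_error(idx):
    """Deviation of the selected cells' mean current from nominal."""
    return abs(sum(candidates[i] for i in idx) / len(idx) - nominal)

fixed = (0, 1, 2)                                    # conventional RBL: fixed driver cells
chosen = select_driver_cells(candidates, 3, nominal) # CRBL: best 3 of 10
print(mean_error(fixed), mean_error(chosen))
```

Because the search covers every 3-cell subset, including the fixed one, the configured replica can never track worse than the fixed replica in this toy model.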

27
RBL vs. CRBL
  • Simulations of read path for a commercial 65nm
    SRAM design

[Figure: replica path vs. read path delay, for global variations only (0.91) and global + local variations (0.41); RBL vs. CRBL (3 of 5 cells) and RBL vs. CRBL (3 of 10 cells), comparing RBL delay w/o mismatch with configurable RBL delay w/ mismatch]
28
Capturing the System Level Impact
  • Build statistical response surface models (RSMs)
    to compare and optimize designs
  • Example: SRAM self-timing
  • Self-timing circuit must track bitcell delay
  • Self-timing delay is part of READ delay
  • Buffer chain (BUF): insensitive to intra-die
    variations, but poor tracking of inter-die and
    environmental variations
  • Replica bitline (RBL): better tracking of
    inter-die and environmental variations, but more
    sensitive to mismatch
  • Configurable replica bitline (C-RBL)

29
Monte Carlo (Statistical) Analysis
  • Monte Carlo analysis
  • Randomly select M samples for e1, e2,...
  • Evaluate circuit performance at each sampling
    point
  • Estimate performance distribution using the M
    samples
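
The three steps above can be sketched as follows. The function `delay_model` is a hypothetical stand-in for a circuit simulation, and its coefficients are invented for illustration; in practice each evaluation would be a SPICE run.

```python
import random
import statistics

def delay_model(e1, e2):
    """Hypothetical stand-in for a simulator: delay (ps) vs. two variation variables."""
    return 100.0 + 8.0 * e1 + 5.0 * e2 + 1.5 * e1 * e2

def monte_carlo(performance, m, seed=0):
    """Draw m random samples of (e1, e2) and evaluate the performance at each."""
    rng = random.Random(seed)
    results = []
    for _ in range(m):
        e1 = rng.gauss(0.0, 1.0)   # assumed standard normal variation variables
        e2 = rng.gauss(0.0, 1.0)
        results.append(performance(e1, e2))
    return results

samples = monte_carlo(delay_model, 5000)
mean_delay = statistics.mean(samples)    # estimated mean of the distribution
sigma_delay = statistics.stdev(samples)  # estimated spread of the distribution
print(mean_delay, sigma_delay)
```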

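[Figure: Monte Carlo flow. Random samples e1, e2, e3, ... are applied to the statistical device model, e.g.
model NMOS bsim4 type=n, tox = 4e-9 + 1e-10·e1 + 1.3e-10·e2 + ..., vth0 = 0.6 + 0.24·e1 + 0.3·e2 + ..., ...
and passed to the simulator, which produces the performance distribution]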
30
Monte Carlo Samples
  • Applying MC at system level can be run-time
    costly
  • 1k-10k sampling points are typically required
    to achieve reasonable accuracy
  • Even with 10k sampling points, an accurate result
    is not guaranteed!
  • MC analysis is random, and you can be unlucky
    with samples (especially for results beyond the
    +/-3 sigma range)
  • Controlling sampling points is often important,
    especially for circuits like SRAMs
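
The sample-count concern can be quantified with the standard binomial error formula for an estimated probability. This is a generic statistical sketch rather than anything taken from the slides; the function names are hypothetical.

```python
import math

def mc_relative_error(p_fail, m):
    """Relative standard error of an MC estimate of probability p_fail from m samples."""
    return math.sqrt((1.0 - p_fail) / (p_fail * m))

def samples_for_relative_error(p_fail, rel_err):
    """Samples needed so that the relative standard error is at most rel_err."""
    return math.ceil((1.0 - p_fail) / (p_fail * rel_err ** 2))

# One-sided 3-sigma tail of a Gaussian has probability of about 1.35e-3
print(mc_relative_error(1.35e-3, 10_000))
print(samples_for_relative_error(1.35e-3, 0.10))
```

For a one-sided 3-sigma tail, 10k samples leave a relative error of about 27% on the estimated failure probability, which illustrates why 10k points do not guarantee an accurate tail estimate.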

31
Importance Sampling
  • Brute-force MC simulation is impractical for rare
    events
  • If Pr[Performance < SPEC] < 10^-6, at least a
    million samples are required to observe this
    event in MC simulation
  • Idea: bias the random sample generation in such a
    way as to observe rare events with a much smaller
    number of samples
  • Build a response surface model (RSM) to identify
    the failure space for MC analysis
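
A minimal importance-sampling sketch for a one-dimensional rare event follows. It assumes a standard normal variable, a known failure boundary `spec`, and a sampling distribution shifted onto that boundary; in the flow described above the failure region would instead be located with an RSM. All names are hypothetical.

```python
import math
import random

def tail_probability_exact(t):
    """Exact Pr[X > t] for a standard normal X, used here only as a reference."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def tail_probability_is(t, m, seed=0):
    """Importance-sampling estimate of Pr[X > t] using samples drawn from N(t, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        x = rng.gauss(t, 1.0)  # biased sampling, centered on the failure boundary
        if x > t:
            # Likelihood ratio of N(0, 1) to N(t, 1) corrects for the bias
            total += math.exp(-t * x + 0.5 * t * t)
    return total / m

spec = 4.75  # failure boundary in sigma units (probability on the order of 1e-6)
exact = tail_probability_exact(spec)
estimate = tail_probability_is(spec, 20_000)
print(exact, estimate)
```

About half of the biased samples land in the failure region, so a few thousand samples suffice where brute-force MC would need millions just to see one failure.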

32
Response Surface Modeling (RSM)
  • Approximate the performance of interest (e.g.,
    delay, power, gain, etc.) as an analytic
    function of process parameters
  • Can cover local variations of process parameters
    (+/-30%)
  • Use linear or quadratic functions to approximate
    the corresponding local variation
  • Fitting RSM to samples, then performing MC on
    RSM, can be more efficient than direct MC if the
    number of variables (N) is small
  • The number of sampling points must be equal to or
    greater than the number of unknown model
    coefficients to fit RSM
  • Linear RSM contains N + 1 model coefficients
  • Quadratic RSM contains N(N+1)/2 + N + 1 model
    coefficients
  • PWL RSMs stitched together can be used to cover a
    larger space
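
The coefficient counts can be checked by enumerating the basis terms of each model. This is a small sketch with hypothetical helper names; it builds the constant, linear, and second-order (square and cross) terms for a point `x` with N variables.

```python
import itertools

def linear_rsm_terms(n):
    """Number of coefficients in a linear RSM: constant plus N linear terms."""
    return n + 1

def quadratic_rsm_terms(n):
    """Number of coefficients in a quadratic RSM with squares and cross terms."""
    return n * (n + 1) // 2 + n + 1

def quadratic_basis(x):
    """Basis terms [1, x_i, x_i*x_j for i <= j] evaluated at point x."""
    terms = [1.0]
    terms.extend(x)
    terms.extend(a * b for a, b in itertools.combinations_with_replacement(x, 2))
    return terms

print(len(quadratic_basis([1.0, 2.0, 3.0])), quadratic_rsm_terms(3))
print(linear_rsm_terms(300), quadratic_rsm_terms(300))
```

With N = 300 variables a linear model needs 301 samples at minimum, while a quadratic model needs 45,451, which shows how quickly the fitting cost grows with N.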

[Figure: local RSM, p(X)]
33
Statistical Response Surface Models
  • Given set of correlated inter-die variations and
    set of spatially correlated intra-die variations,
    build statistical RSM
  • Fitted analytical performance model based on
    well-chosen simulation samples
  • Accuracy depends on model complexity: linear,
    quadratic, piecewise-linear, ...
  • Include uniform distribution or corner models for
    VDD and temperature

34
Model Explosion
  • Statistical device models can be extremely
    complex
  • Over 300 random variables (inter-die) for a 65nm
    process
  • Mismatch modeling can require 10-20 additional
    random variables for every transistor
  • If the number of variables is large, first
    convert set of correlated random variables to
    independent set of random variables
  • Simple example

ΔVTH,NL and ΔVTH,NR are correlated:
ΔVTH,NL = y1 + y3, ΔVTH,NR = y2 + y3
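
A quick numerical check shows how a shared independent component produces correlation. The sketch assumes the two threshold shifts are written as y1 + y3 and y2 + y3, with y1, y2, y3 independent and of unit variance; the unit variances and the helper name `correlation` are assumptions made for illustration.

```python
import math
import random
import statistics

def correlation(a, b):
    """Sample correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(0)
dvth_nl, dvth_nr = [], []
for _ in range(20_000):
    y1, y2, y3 = (rng.gauss(0.0, 1.0) for _ in range(3))  # independent variables
    dvth_nl.append(y1 + y3)  # y3 is the component shared by both devices
    dvth_nr.append(y2 + y3)

rho = correlation(dvth_nl, dvth_nr)
print(rho)
```

With equal variances the shared term accounts for half of each total variance, so the correlation comes out near 0.5.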
35
PCA
  • Principal Component Analysis (PCA) does this in a
    generalized way for jointly normal random
    variables
  • Apply eigen decomposition to produce a new
    (possibly smaller) set of parameters that are
    uncorrelated
  • Similar to finding the orthogonal basis of a
    vector space
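
For two correlated parameters the eigen decomposition has a closed form, which makes a compact illustration. The covariance values below (variance 2 for each parameter, covariance 1) are assumed for the example, and `pca_2x2` is a hypothetical helper, not a library call.

```python
import math

def pca_2x2(cov):
    """Eigenvalues (descending) and unit eigenvectors of a symmetric 2x2 covariance."""
    (a, b), (_, d) = cov
    mid = 0.5 * (a + d)
    rad = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    eigenvalues = [mid + rad, mid - rad]
    if b == 0.0:
        # Already uncorrelated: the principal axes are the coordinate axes
        vectors = [(1.0, 0.0), (0.0, 1.0)] if a >= d else [(0.0, 1.0), (1.0, 0.0)]
    else:
        vectors = []
        for lam in eigenvalues:
            norm = math.hypot(b, lam - a)
            vectors.append((b / norm, (lam - a) / norm))  # solves (A - lam*I) v = 0
    return eigenvalues, vectors

cov = [[2.0, 1.0], [1.0, 2.0]]  # assumed covariance of two correlated parameters
eigenvalues, vectors = pca_2x2(cov)
print(eigenvalues, vectors)
```

Projecting the correlated parameters onto the two eigenvectors gives uncorrelated variables whose variances are the eigenvalues.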

ΔX: correlated parameters; ΔY: uncorrelated
parameters
36
Reducing Number of Variables
  • If some of the eigenvalues are small, they can be
    removed to reduce the random space dimension
  • Allows us to use a compact set of independent
    random variables to approximate the original
    high-dimensional space
  • Most large problems tend to be rank deficient
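
One common way to decide which eigenvalues are "small" is to keep the leading components that explain a chosen fraction of the total variance. The 98% threshold and the example spectrum below are assumptions for illustration; the slides do not specify a cutoff rule.

```python
def components_to_keep(eigenvalues, energy=0.98):
    """Smallest number of leading eigenvalues whose sum reaches energy * total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, lam in enumerate(ordered, start=1):
        running += lam
        if running >= energy * total:
            return count
    return len(ordered)

spectrum = [6.0, 3.0, 0.9, 0.05, 0.03, 0.02]  # hypothetical eigenvalue spectrum
kept = components_to_keep(spectrum)
print(kept, "of", len(spectrum))
```

Here three of the six components carry the required variance, so the random space is halved with little loss of accuracy.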

J. Friedman and W. Stuetzle, "Projection pursuit
regression," Journal of the American Statistical
Association, vol. 76, no. 376, pp. 817-823, 1981.
X. Li, J. Le, L. Pileggi and A. Strojwas,
"Projection-based performance modeling for
inter/intra-die variations," ICCAD, pp. 721-727,
2005.
37
Statistical Modeling Example
  • Constructed PWL RSMs to compare designs
  • Buffer chain (BUF)
  • Replica bit line (RBL)
  • Configurable Replica bit-line (C-RBL)
  • Applied MC analysis to the region of most likely
    failures

38
M-C Simulation Results for 65nm CMOS
  • Comparison of self timing architectures (results
    based on 0.98 success rate at chosen frequency)

39
Optimizing Designs
  • Can we use RSMs to optimize the designs over
    local statistical parameter space?
  • Formally: find the optimum set of design
    variables which minimizes a cost function and
    meets a set of specifications
  • Both the objective and the constraint become
    stochastic
  • Choice of optimization algorithm would depend on
    the objective and constraint functions
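
One common way to write such a problem is as a chance-constrained program. The form below is an assumed illustration, not the formulation used in the talk: x denotes the design variables, ε the random process variations, f the cost, g_i the specifications, and η_i a required probability of meeting each one.

```latex
\min_{x} \; \mathbb{E}_{\varepsilon}\!\left[ f(x, \varepsilon) \right]
\quad \text{subject to} \quad
\Pr\!\left[ g_i(x, \varepsilon) \le 0 \right] \ge \eta_i,
\qquad i = 1, \dots, K
```

Both the expectation and the probabilities depend on the distribution of ε, which is where an RSM can replace repeated simulation.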

40
Example: Sense Amp Optimization
  • Random offset impacts the self-timing of the READ
  • Build RSM of offset that is dominated by VTHn
    variations
  • Simulations suggest a linear relationship between
    offset and VTHn

65nm Latch type sense amp
Based on 1000 MC samples
41
Random Input Offset
  • There is little or no correlation for VTHp and
    other variables
  • Offset is less sensitive to these other
    variations even across different precharge
    voltages

65nm Latch type sense amp
Based on 1000 M-C samples
42
Optimization
  • Since dominant variation parameter shows linear
    relationship, a simple linear RSM model can be
    used to optimize sizing of NFETs (N1 and N2) and
    PFETs (P1 and P2)
  • Voffset = a·Diff(Vtn) + b·Diff(Vtp) + c
  • Model has less than 3% error
  • Vtn and Vtp are incorporated as independent and
    Gaussian
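
With independent Gaussian inputs, a linear model gives the offset spread in closed form. The sketch below assumes the model reads Voffset = a·Diff(Vtn) + b·Diff(Vtp) + c; the coefficient and sigma values are invented for illustration and the function names are hypothetical.

```python
import math
import random
import statistics

def offset_sigma(a, b, sigma_dvtn, sigma_dvtp):
    """Analytic sigma of a*Diff(Vtn) + b*Diff(Vtp) + c for independent Gaussian inputs."""
    return math.sqrt((a * sigma_dvtn) ** 2 + (b * sigma_dvtp) ** 2)

def offset_sigma_mc(a, b, c, sigma_dvtn, sigma_dvtp, m=20_000, seed=0):
    """Monte Carlo check of the same quantity, run on the linear model itself."""
    rng = random.Random(seed)
    offsets = [
        a * rng.gauss(0.0, sigma_dvtn) + b * rng.gauss(0.0, sigma_dvtp) + c
        for _ in range(m)
    ]
    return statistics.pstdev(offsets)

# Hypothetical coefficients and mismatch sigmas (mV)
analytic = offset_sigma(0.9, 0.1, 30.0, 25.0)
sampled = offset_sigma_mc(0.9, 0.1, 0.0, 30.0, 25.0)
print(analytic, sampled)
```

Since the transistor mismatch sigmas depend on device sizing, a sizing optimizer can evaluate the analytic expression directly instead of rerunning Monte Carlo for every candidate.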

43
Measurement Results
  • Large input offset voltage variation for 65nm
  • Optimized circuit: 10% larger gate area, 25%
    lower offset
  • Measured data based on 14K SAs from 20 different
    chips

44
Simulation Results
  • Comparison of simulation and measurement as a
    function of precharge voltage, Vpc
  • Optimized circuit has been desensitized to
    variations, including precharge

45
Pelgrom Model
  • It is well known [Pelgrom] that increasing the
    device sizes will tend to average out random
    variations
  • For random threshold variation, Pelgrom showed
    that the (uncorrelated) variance improves in
    proportion to the gate area WL (variance ∝ 1/WL)
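
In the notation of the Pelgrom paper cited on this slide, the area-dependent part of the threshold mismatch between two identically drawn transistors can be written as below. A_VT is a technology-dependent matching coefficient; the distance-dependent term of the full model is omitted here, matching the "uncorrelated" case on the slide.

```latex
\sigma^2(\Delta V_{TH}) = \frac{A_{VT}^2}{W L}
\quad \Longrightarrow \quad
\sigma(\Delta V_{TH}) = \frac{A_{VT}}{\sqrt{W L}}
```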

Pelgrom et al., "Matching Properties of MOS
Transistors," IEEE JSSC, vol. 24, no. 5, Oct. 1989.
46
Results Comparison
  • How does the Pelgrom model compare as a function
    of precharge?
  • Accuracy of the model depends on the region of
    operation
  • Pelgrom model only applies if performance
    variation is dominated by mismatch of two
    transistors

47
Analog/RF Design in Scaled CMOS
  • As CMOS continues to scale, oversizing
    transistors can potentially cancel any benefit of
    moving to the next generation technology
  • Example: Pelgrom model analysis of a 65nm
    differential pair
  • Mismatch improves slowly with increasing
    transistor size: proportional to 1/sqrt(area)
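
The cost of the 1/sqrt(area) law is easy to tabulate: the required area grows with the square of the desired mismatch reduction. A minimal sketch, with a hypothetical function name:

```python
def area_multiplier(sigma_now, sigma_target):
    """Area scale factor needed to move mismatch from sigma_now to sigma_target,
    assuming mismatch sigma is proportional to 1/sqrt(area)."""
    return (sigma_now / sigma_target) ** 2

for reduction in (2, 4, 8):
    print(f"{reduction}x lower mismatch -> {area_multiplier(reduction, 1):.0f}x area")
```

Halving the mismatch costs four times the area, and an 8x reduction costs 64x, which is how oversizing can erase the density gain of a new node.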

48
Sizing via Selection of Elements
  • Start with regular fabric of analog
    sub-components but select only a subset of
    them for precision matching
  • Ex: open-loop amp for pipeline ADC, mismatch in
    65nm CMOS
  • Select some (1/2) rather than all subcomponents
    to minimize offset
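
The benefit of selecting a subset can be sketched with a small Monte Carlo experiment. The model is an assumption made for illustration: each element contributes an independent Gaussian offset, the offsets of the elements in use average together, and the best half is found by exhaustive search. The names `best_half_offset` and `compare` are hypothetical.

```python
import itertools
import random
import statistics

def best_half_offset(element_offsets):
    """Offset of the half-size subset whose averaged offset is closest to zero."""
    k = len(element_offsets) // 2
    return min(
        (sum(combo) / k for combo in itertools.combinations(element_offsets, k)),
        key=abs,
    )

def compare(n_elements=8, dies=300, sigma=1.0, seed=0):
    """Offset spread across dies: all elements used vs. best half selected."""
    rng = random.Random(seed)
    all_used, selected = [], []
    for _ in range(dies):
        offsets = [rng.gauss(0.0, sigma) for _ in range(n_elements)]
        all_used.append(sum(offsets) / n_elements)   # conventional sizing
        selected.append(best_half_offset(offsets))   # post-silicon selection
    return statistics.pstdev(all_used), statistics.pstdev(selected)

sigma_all, sigma_selected = compare()
print(sigma_all, sigma_selected)
```

In this toy model the selected half gives a much smaller offset spread than using every element, because the number of candidate subsets grows combinatorially with the number of elements.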

49
Post-Silicon Element Selection for Mismatch
  • Some circuit overhead required to implement
    post-silicon tuning
  • But with further scaling, post-silicon tuning
    might be the only way to meet specs and reap the
    benefits of next gen technology
  • Example: exponential vs. sqrt improvement
    (Pelgrom model) with area for 65nm open-loop
    amplifier

50
Conclusions
  • Regular patterning for logic, memory and analog
    becomes increasingly important below 65nm
  • New circuits and methodologies can exploit this
    regularity for improved performance
  • As systematic variations are better controlled,
    random variations will become dominant
  • Stochastic design methods will be needed to
    produce competitive chips
  • Configurable and tunable circuits will become
    more imperative, particularly for analog and
    mixed-signal
