SOC Design: From System to Transistor - PowerPoint PPT Presentation


1
SOC Design From System to Transistor
Zoran Stamenkovic
2
Outline
  • Modeling Systems
  • Simulation and Verification
  • Analog Integrated Circuits
  • Digital Integrated Circuits
  • Embedded Memories
  • Logic Synthesis
  • Design for Testability
  • Layout Generation
  • Design for Manufacturability
  • SOC Example

3
Modeling Systems
  • Domains and Levels
  • ESL Design
  • Basics of HDL
  • Gate Modeling
  • Delay Modeling
  • Power Modeling
  • Effects of Parasitics
  • Logic Optimization

4
Domains and Levels
  • Open Systems Interconnection (OSI) model of
    network communication
  • Local area network (LAN) technologies are defined
    by standards that describe unique functions at
    both the Physical and the Data Link layers

5
Domains and Levels
  • 802.11 Wireless LAN modem
  • Modulates outgoing digital signals from a
    computer or other digital device to an analogue
    (radio) signal
  • Demodulates the incoming analogue (radio) signal
    and converts it to a digital signal for the
    digital device

6
Domains and Levels
MIMO and MIMAX WLAN Modems
Signal processing performed in the analogue RF domain; the number of digital basebands is reduced to a single one
Signal processing performed in the digital baseband
7
Domains and Levels
8
Behavioral Domain
9
Structural Domain
10
Physical Domain
11
Electronic System Level Design
  • The point of a system level model is to capture
    the intent of the design
  • Design does exactly what it is defined to do, and
    the model is the definition of what the design
    does
  • It allows software developers to test their code
    on a working model
  • The value of system level modeling is in helping
    us to understand the implications of our intent
  • It lets us explore the design's responses to stimuli in a
    useful way
  • ESL Languages
  • UML, SystemC, SystemVerilog
  • ESL Verification
  • "No amount of experimentation can ever prove me
    right; a single experiment can prove me wrong."
    (Albert Einstein)
  • The system level testbench languages and
    methodologies that exist today are woefully
    inadequate
  • If one tries to capture enough information in ESL
    to verify RTL, then one might as well write RTL

12
Electronic System Level Design
  • The environment that provides models of
    memories, connectors, and queues that can be
    interconnected with configured processors into an
    overall system model
  • Processor and device interfaces are at the
    transaction level
  • Transaction-level modeling calls for SOC
    architecture assembly and simulation tools
  • If RTL IP blocks are present, HW/SW co-verification
    tools are needed

13
Electronic System Level Design
14
Hardware Description Languages
  • Motivation for HDL
  • Increased hardware complexity
  • Design space exploration
  • Inexpensive alternative to prototyping
  • General features
  • Support for describing circuit connectivity
  • High-level programming language support for
    describing behavior
  • Support for timing information (constraints,
    etc.)
  • Support for concurrency
  • VHDL
  • IEEE Standard 1076-1987
  • IEEE Standard 1076-1993
  • Extension VHDL-AMS-1999
  • Verilog
  • IEEE Standard 1364-1995
  • IEEE Standard 1364-2000

15
Modeling Interfaces
  • Entity (VHDL) or Module (Verilog) declaration
  • Describes the input/output ports of a module

16
Modeling Behavior
  • Architecture Body (VHDL)
  • Describes an implementation of an entity
  • May be several per entity
  • Module (Verilog)
  • Is unique
  • Behavioral Architecture
  • Describes the algorithm performed by the module
  • Contains
  • Procedural Statements, each containing
  • Sequential Statements, including
  • Assignment Statements and
  • Wait Statements

17
Behavior Example
VHDL
    entity reg3 is
      port ( d0, d1, d2, en, clk : in bit;
             q0, q1, q2 : out bit );
    end;

    architecture behav of reg3 is
    begin
      process ( d0, d1, d2, en, clk )
      begin
        if en = '1' and clk = '1' then
          q0 <= d0 after 5 ns;
          q1 <= d1 after 5 ns;
          q2 <= d2 after 5 ns;
        end if;
      end process;
    end;
Verilog
    `timescale 1ns/10ps
    module reg3 ( d0, d1, d2, en, clk, q0, q1, q2 );
      input d0, d1, d2, en, clk;
      output q0, q1, q2;
      reg q0, q1, q2;
      always @( d0 or d1 or d2 or en or clk )
        if ( en & clk ) begin
          q0 <= #5 d0;
          q1 <= #5 d1;
          q2 <= #5 d2;
        end
    endmodule
18
Modeling Structure
  • Structural Architecture
  • Implements the module as a composition of
    components
  • Contains
  • Signal Declarations (entity ports are also
    signals)
  • Declare internal connections
  • Component Instances
  • Instantiate previously declared
    entity/architecture pairs
  • Port Maps in component instances
  • Connect signals to component ports
  • Wait Statements
  • Suspend a process or procedure

19
Structure Example
20
Structure Example
21
Structure Example
22
Mixing Behavior and Structure
  • An architecture can contain both behavioral and
    structural parts
  • Process Statements and Component Instances
  • Collectively called Concurrent Statements
  • Processes can read and assign to signals
  • Example: register-transfer-language model
  • Data-path described structurally
  • Control section described behaviorally

23
Mixed Example
24
Mixed Example
    entity multiplier is
      port ( clk, reset : in bit;
             multiplicand, multiplier : in integer;
             product : out integer );
    end;

    architecture mixed of multiplier is
      signal partial_product, full_product : integer;
      signal arith_control, result_en, mult_bit, mult_load : bit;
    begin
      arith_unit : entity work.shift_adder(behavior)
        port map ( addend => multiplicand, augend => full_product,
                   sum => partial_product, add_control => arith_control );
      result : entity work.reg(behavior)
        port map ( d => partial_product, q => full_product,
                   en => result_en, reset => reset );
      ...
25
Mixed Example
      multiplier_sr : entity work.shift_reg(behavior)
        port map ( d => multiplier, q => mult_bit,
                   load => mult_load, clk => clk );
      product <= full_product;
      control_section : process is
        -- variable declarations for control_section
        -- ...
      begin
        -- sequential statements to assign values to control signals
        -- ...
        wait on clk, reset;
      end process control_section;
    end;
26
Logic Functions
  • Function
  • $f = \bar{a}b + a\bar{b}$: a is a variable, a and $\bar{a}$ are
    literals, $a\bar{b}$ is a term
  • Irredundant Function
  • No literal can be removed without changing its
    value
  • Implementing logic functions is non-trivial
  • The library does not contain a logic gate for every
    logic expression
  • A logic expression may map into gates that
    consume a lot of area, time, or power
  • A set of functions f1, f2, ... is complete if
    every Boolean function can be generated by a
    combination of the functions from the set
  • NAND is a complete set
  • NOR is a complete set
  • AND and OR are not complete
  • Transmission gates are not complete
  • Incomplete set of logic gates
  • No way to design arbitrary logic
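The completeness of NAND can be checked exhaustively. The Python sketch below is an illustration (not from the slides): it builds NOT, AND, OR and XOR from a two-input NAND alone and compares their truth tables with Python's bit operators. AND and OR by themselves cannot do this, because both are monotone and so can never produce an inversion.

```python
from itertools import product

def nand(a: int, b: int) -> int:
    """Two-input NAND on 0/1 values."""
    return 1 - (a & b)

# Every gate below is built from nand() only.
def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    # De Morgan: a + b = NOT(NOT a AND NOT b)
    return nand(not_(a), not_(b))

def xor_(a: int, b: int) -> int:
    # Four-NAND XOR
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))

def nand_builds_basis() -> bool:
    """True if the NAND-only gates match the reference operators on all inputs."""
    for a, b in product((0, 1), repeat=2):
        if not_(a) != 1 - a:
            return False
        if (and_(a, b), or_(a, b), xor_(a, b)) != (a & b, a | b, a ^ b):
            return False
    return True

print(nand_builds_basis())
```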

27
Inverter
28
Inverter
29
Switches
  • Complementary switch produces full-supply
    voltages for both logic 0 and logic 1
  • n-type transistor conducts logic 0
  • p-type transistor conducts logic 1

30
NAND Gate
31
NOR Gate
32
AOI/OAI Gates
  • AOI: and/or/invert
  • OAI: or/and/invert
  • Implement larger functions
  • Pull-up and pull-down networks are compact
  • Smaller area, higher speed than NAND/NOR network
    equivalents
  • AOI312
  • AND of 3 inputs
  • AND of 1 input (dummy)
  • AND of 2 inputs
  • OR together these terms
  • Invert
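The AOI312 grouping can be written as a one-line Boolean function. This is a behavioral sketch with hypothetical pin names (a1..a3 for the 3-input AND, b1 for the single input, c1..c2 for the 2-input AND); the slides do not name the pins.

```python
from itertools import product

def aoi312(a1: int, a2: int, a3: int, b1: int, c1: int, c2: int) -> int:
    """AOI312: NOT(a1*a2*a3 + b1 + c1*c2). Pin names are hypothetical."""
    term3 = a1 & a2 & a3          # 3-input AND
    term1 = b1                    # 1-input "AND" (dummy): just the input
    term2 = c1 & c2               # 2-input AND
    return 1 - (term3 | term1 | term2)   # OR the terms, then invert

# Number of the 64 input combinations that drive the output high.
ones = sum(aoi312(*bits) for bits in product((0, 1), repeat=6))
print(ones)   # 21: 7 (a-term off) * 1 (b1 = 0) * 3 (c-term off)
```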

33
Logic Levels
  • Solid logic 0/1 defined by VSS/VDD
  • Inner bounds of logic values VL/VH are not
    directly determined by circuit properties, as in
    some other logic families
  • Levels at output of one gate must be sufficient
    to drive next gate

34
Inverter Transfer Curve
  • Choose threshold voltages at points where slope
    of transfer curve is -1
  • Inverter has
  • High gain between VIL and VIH points
  • Low gain at outer regions of transfer curve
  • Note that logic 0 and 1 regions are not equally
    sized
  • In this case, high pull-up resistance leads to
    smaller logic 1 range
  • Noise margins are VDD-VIH and VIL-VSS
  • Noise must exceed noise margin to make second
    gate produce wrong output

35
Inverter Delay
  • Only one transistor is on at a time
  • Rise time (pull-up on)
  • Fall time (pull-up off)
  • Resistor model of transistor
  • Ignores saturation region
  • Mischaracterizes linear region
  • Gives acceptable results

36
RC Model for Delay
  • Delay
  • Time required for the gate's output to reach 50% of
    its final value
  • Transition time
  • Time required for the gate's output to reach 10%
    (logic 0) or 90% (logic 1) of its final value
  • Gate delay based on RC time constant
  • $V_{out}(t) = V_{DD}\, e^{-t/[(R_n + R_L) C_L]}$
  • $t_d = 0.69\, R_n C_L$
  • $t_f = 2.3\, R_n C_L$
  • 0.5 µm process
  • $R_n = 3.9$ kΩ
  • $C_L = 0.68$ fF
  • $t_d = 0.69 \times 3.9\ \text{k}\Omega \times 0.68\ \text{fF} \approx 1.8$ ps
  • $t_f = 2.3 \times 3.9\ \text{k}\Omega \times 0.68\ \text{fF} \approx 6.1$ ps
  • For pull-up time, use pull-up resistance
  • Current source model (in power/delay studies)
  • $t_f = \dfrac{C_L (V_{DD} - V_{SS})}{0.5\, k' (W/L) (V_{DD} - V_{SS} - V_t)^2}$
  • Fitted model
  • Fit curve to measured circuit characteristics
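The constants 0.69 and 2.3 are ln 2 and ln 10: the times for an RC exponential to cover 50% of its swing and to decay to 10% of its starting value. A small Python sketch, using the slide's 0.5 µm values, reproduces the worked numbers:

```python
import math

# Values quoted on the slide for a 0.5 um process.
R_N = 3.9e3      # pull-down resistance, ohms
C_L = 0.68e-15   # load capacitance, farads

def delay_50(r: float, c: float) -> float:
    """Time for an RC exponential to cover 50% of its swing: ln(2) * RC ~ 0.69 RC."""
    return math.log(2) * r * c

def fall_to_10(r: float, c: float) -> float:
    """Time for an RC discharge to fall to 10% of its start value: ln(10) * RC ~ 2.3 RC."""
    return math.log(10) * r * c

td = delay_50(R_N, C_L)
tf = fall_to_10(R_N, C_L)
print(f"td = {td * 1e12:.2f} ps, tf = {tf * 1e12:.2f} ps")
```

With the exact logarithms this gives about 1.84 ps and 6.11 ps; the slide's 1.8 ps and 6.1 ps come from the rounded constants 0.69 and 2.3.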

37
Step Input (VGS = VDD) Approximation
38
Body Effect
  • Source voltage of transistors in the middle of a network
    may not equal substrate voltage
  • Difference between source and substrate voltages
    causes body effect
  • To minimize body effect
  • Put early arriving signals at transistors closest
    to power supply

39
Power Consumption
  • Clock frequency
  • $f = 1/t$
  • Energy
  • $E = C_L (V_{DD} - V_{SS})^2$
  • Power
  • $P = E \times f = f\, C_L (V_{DD} - V_{SS})^2$
  • Almost all power consumption comes from switching
    behavior
  • A single cycle requires one charge and one
    discharge of capacitor
  • Static power dissipation
  • Comes from leakage currents
  • Surprising result
  • Resistance of the pull-up/pull-down transistor
    drops out of energy calculation
  • Power consumption is independent of the sizes of
    the pull-up and pull-down transistors
  • Static CMOS power-delay product is independent of
    frequency
  • Voltage scaling depends on this fact
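The "surprising result" that transistor resistance drops out can be checked numerically. The sketch below integrates $i(t)^2 R$ while a node charges through a resistor and compares it with $\tfrac{1}{2} C_L V^2$ for two very different resistances; the load, swing and clock values are hypothetical. Charging dissipates $\tfrac{1}{2} C_L V^2$ in the pull-up and discharging dissipates the stored $\tfrac{1}{2} C_L V^2$ in the pull-down, giving $C_L V^2$ per cycle.

```python
import math

def resistor_energy(v: float, r: float, c: float, steps: int = 100_000) -> float:
    """Energy dissipated in the resistor while an RC node charges from 0 to v.

    Integrates i(t)**2 * r with the trapezoidal rule over 20 time constants,
    where i(t) = (v / r) * exp(-t / (r * c)).
    """
    tau = r * c
    h = 20.0 * tau / steps

    def power(t: float) -> float:
        i = (v / r) * math.exp(-t / tau)
        return i * i * r

    total = 0.5 * (power(0.0) + power(steps * h))
    for k in range(1, steps):
        total += power(k * h)
    return total * h

C_L, V, F = 1e-12, 3.3, 100e6        # hypothetical load (F), swing (V), clock (Hz)
for r in (1e3, 1e5):                  # two very different transistor resistances
    print(r, resistor_energy(V, r, C_L), 0.5 * C_L * V**2)

energy_per_cycle = C_L * V**2         # one charge plus one discharge
print("P =", F * energy_per_cycle, "W")
```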

40
Effects of Parasitics
  • Capacitance on power supply is not bad
  • Can be good in absence of inductance
  • Resistance slows down static gates
  • May cause pseudo-nMOS circuits to fail
  • Increasing capacitance/resistance
  • Reduces input slope
  • Resistance near source is more damaging
  • It must charge more capacitance

41
Optimal Sizing
  • Sometimes, large loads must be driven
  • Off-chip or by long wires on-chip
  • Sizing up the driver transistors only pushes back
    the problem
  • Driver now presents larger capacitance to earlier
    stage
  • Use a chain of inverters
  • Each stage has transistors larger than previous
    stage
  • $\alpha$ is the driver size ratio: $C_{big}/C_d = \alpha^n$,
    so $\ln(C_{big}/C_d) = n \ln\alpha$
  • Minimize total delay through the driver chain
  • $t_{tot} = \ln(C_{big}/C_d)\,(\alpha/\ln\alpha)\, t_d$
  • Optimal driver size ratio is $\alpha_{opt} = e$
  • Optimal number of stages is $n_{opt} = \ln(C_{big}/C_d)$
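The optimum can be confirmed numerically. In the delay model $t_{tot} = n\,\alpha\,t_d$ with $\alpha^n = C_{big}/C_d$, scanning $\alpha$ should land on $e$; because real chains need an integer number of stages, the sketch also compares whole stage counts. The load ratio of 1000 is a hypothetical value.

```python
import math

def chain_delay(load_ratio: float, alpha: float, t_d: float = 1.0) -> float:
    """Driver-chain delay for stage ratio alpha (continuous stage count).

    n = ln(load_ratio) / ln(alpha), each stage costing alpha * t_d,
    so t_tot = ln(load_ratio) * (alpha / ln(alpha)) * t_d.
    """
    n = math.log(load_ratio) / math.log(alpha)
    return n * alpha * t_d

def integer_chain(load_ratio: float, n: int, t_d: float = 1.0) -> float:
    """Same model with a whole number of stages; alpha follows from n."""
    alpha = load_ratio ** (1.0 / n)
    return n * alpha * t_d

LOAD_RATIO = 1000.0                                   # hypothetical C_big / C_d

alphas = [1.5 + 0.001 * k for k in range(8500)]       # scan 1.5 .. ~10
best_alpha = min(alphas, key=lambda a: chain_delay(LOAD_RATIO, a))
best_n = min(range(1, 20), key=lambda n: integer_chain(LOAD_RATIO, n))
print(best_alpha, math.log(LOAD_RATIO), best_n)
```

Here $\ln 1000 \approx 6.9$, and rounding to 7 stages (ratio about 2.68 each) gives the lowest delay among whole-stage chains.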

42
Driving Large Fan-Out
  • Fan-out adds capacitance
  • Increase sizes of driver transistors
  • Must take into account rules for driving large
    loads
  • Add intermediate buffers
  • This may require/allow restructuring of the logic

43
Path Delay
  • Network delay is measured over paths through
    network
  • Can trace a causality chain from inputs to
    worst-case output
  • Critical path creates longest delay
  • Can trace transitions which cause delays that are
    elements of the critical path delay
  • To reduce circuit delay, speed up the critical
    path
  • Reducing delay off the path doesn't help
  • There may be more than one path of the same delay
  • Must speed up all equivalent paths to speed up
    circuit
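Finding the critical path amounts to a longest-path search over the gate network, ordered by causality. The Python sketch below uses a hypothetical netlist format (gate name mapped to delay and fan-in list). It records which fan-in arrives last at each gate and traces back from the worst output. Like any purely topological analysis, it ignores Boolean conditions, so it can report a false path.

```python
def critical_path(gates, outputs):
    """Longest (worst-case) arrival-time path through a combinational DAG.

    gates maps a gate name to (delay, [fan-in names]); any name that is not
    a key of gates is treated as a primary input arriving at time 0.
    """
    arrival, pred = {}, {}

    def visit(node):
        if node not in arrival:
            if node not in gates:                 # primary input
                arrival[node], pred[node] = 0.0, None
            else:
                delay, fanins = gates[node]
                worst = max(fanins, key=visit)    # latest-arriving fan-in
                arrival[node] = visit(worst) + delay
                pred[node] = worst
        return arrival[node]

    end = max(outputs, key=visit)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = pred[node]
    return arrival[end], path[::-1]

# Hypothetical netlist with outputs g3 and g4.
GATES = {
    "g1": (1.0, ["a", "b"]),
    "g2": (2.0, ["b", "c"]),
    "g3": (1.5, ["g1", "g2"]),
    "g4": (1.0, ["g1", "c"]),
}
print(critical_path(GATES, ["g3", "g4"]))
```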

44
False Paths
  • Logic gates are not simple nodes
  • Some input changes don't cause output changes
  • A false path is a path which cannot be exercised
    due to Boolean gate conditions
  • False paths cause pessimistic delay estimates

45
Logic Transformations
  • Rewrite by using sub-expressions
  • Logic rewrites may affect gate placement
  • Flattening logic
  • Increases gate fan-in
  • Logic synthesis programs
  • Transform Boolean expressions into logic gate
    networks in a particular library

Deep Logic
Shallow Logic
46
Logic Optimization
  • Optimization goals
  • Minimize area, meet delay constraint
  • Technology-independent optimization
  • Works on Boolean expression equivalent
  • Estimates size based on number of literals
  • Uses factorization, resubstitution, minimization,
    etc.
  • Uses simple delay models
  • Technology-dependent optimization
  • Maps Boolean expressions into a particular cell
    library
  • May perform some optimizations in addition to
    simple mapping
  • Allows more accurate delay models

47
Simulation and Verification
  • Simulation
  • Verification
  • Annotation

48
Simulation
  • Simulation
  • Tests the functionality of a design's elaborated
    model
  • Needs a test bench and a simulation tool
  • Advances in discrete time steps
  • Test Bench
  • Includes an instance of the design under test
  • Applies sequences of test values to inputs
  • Monitors signal values on outputs using simulator
  • Simulation Tools
  • NCSIM (Cadence)
  • VSIM (Mentor Graphics)
  • VCS (Synopsys)

49
Event-Driven Simulation
  • Event-driven simulation is designed for digital
    circuit characteristics
  • Small number of signal values
  • Relatively sparse activity over time
  • Event-driven simulators try to update only those
    signals which change in order to reduce CPU time
    requirements
  • An event is a change in a signal value
  • A time-wheel is a queue of events
  • Simulator traces structure of circuit to
    determine causality of events
  • Event at input of one gate may cause new event at
    gate's output
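The event-driven loop above can be sketched in a few lines of Python. One assumption to note: a binary heap stands in for the time-wheel, because it orders pending events by time the same way. A new value counts as an event only if it differs from the current value, and only the fan-out gates of a changed signal are re-evaluated. The two-gate circuit and its delays are hypothetical.

```python
import heapq

def simulate(gates, initial, stimuli, t_end):
    """Event-driven simulation with transport delays.

    gates:   output signal -> (function, [input signals], delay)
    initial: signal -> starting value (must be consistent)
    stimuli: list of (time, signal, value) applied to primary inputs
    Returns the list of (time, signal, new_value) events that occurred.
    """
    values = dict(initial)
    fanout = {}
    for out, (_, ins, _) in gates.items():
        for s in ins:
            fanout.setdefault(s, []).append(out)

    queue, seq = [], 0                       # (time, seq, signal, value)
    for t, sig, val in stimuli:
        heapq.heappush(queue, (t, seq, sig, val))
        seq += 1

    trace = []
    while queue and queue[0][0] <= t_end:
        t, _, sig, val = heapq.heappop(queue)
        if values[sig] == val:
            continue                         # no change: not an event
        values[sig] = val
        trace.append((t, sig, val))
        for g in fanout.get(sig, []):        # re-evaluate only affected gates
            fn, ins, delay = gates[g]
            new = fn(*(values[s] for s in ins))
            heapq.heappush(queue, (t + delay, seq, g, new))
            seq += 1
    return trace

# Hypothetical circuit: y = a AND b (delay 2), z = NOT y (delay 1).
GATES = {"y": (lambda a, b: a & b, ["a", "b"], 2),
         "z": (lambda y: 1 - y, ["y"], 1)}
trace = simulate(GATES, {"a": 0, "b": 1, "y": 0, "z": 1},
                 [(5, "a", 1), (10, "b", 0), (15, "b", 0)], t_end=20)
print(trace)
```

The last stimulus writes a value that b already holds, so it generates no event and nothing downstream is evaluated.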

50
Switch Simulation
  • Special type of event-driven simulation optimized
    for MOS transistors
  • Treats the transistor as a switch
  • Takes capacitance into account to model charge
    sharing
  • Can also be enhanced to model the transistor as a
    resistive switch

51
Test Bench Example
    entity test_bench is
    end;

    architecture test_reg3 of test_bench is
      signal d0, d1, d2, en, clk, q0, q1, q2 : bit;
    begin
      dut : entity work.reg3(behav)
        port map ( d0, d1, d2, en, clk, q0, q1, q2 );
      stimulus : process is
      begin
        d0 <= '1'; d1 <= '1'; d2 <= '1'; wait for 20 ns;
        en <= '0'; clk <= '0'; wait for 20 ns;
        en <= '1'; wait for 20 ns;
        clk <= '1'; wait for 20 ns;
        d0 <= '0'; d1 <= '0'; d2 <= '0'; wait for 20 ns;
        wait;
      end process stimulus;
    end;
52
Verification
  • To test a refinement of a design
  • Low-level structural model must be functionally
    the same as a corresponding behavioral model
  • To include two instances of a design in the test
    bench
  • To stimulate both with same test values on inputs
  • To compare values of outputs for equality
  • To take account of timing differences
  • Zero delay
  • Unit delay
  • Gate delay
  • RC delay

53
Verification Example
    architecture regression of test_bench is
      signal d0, d1, d2, d3, en, clk : bit;
      signal q0a, q1a, q2a, q3a, q0b, q1b, q2b, q3b : bit;
    begin
      dut_a : entity work.reg4(struct)
        port map ( d0, d1, d2, d3, en, clk, q0a, q1a, q2a, q3a );
      dut_b : entity work.reg4(behav)
        port map ( d0, d1, d2, d3, en, clk, q0b, q1b, q2b, q3b );
      stimulus : process is
      begin
        d0 <= '1'; d1 <= '1'; d2 <= '1'; d3 <= '1'; wait for 20 ns;
        en <= '0'; clk <= '0'; wait for 20 ns;
        en <= '1'; wait for 20 ns;
        clk <= '1'; wait for 20 ns;
        wait;
      end process stimulus;
      ...
54
Verification Example
      verify : process is
      begin
        wait for 10 ns;
        assert q0a = q0b and q1a = q1b and q2a = q2b and q3a = q3b
          report "implementations have different outputs"
          severity error;
        wait on d0, d1, d2, d3, en, clk;
      end process verify;
    end architecture regression;
55
Annotation
  • Standard Delay Format (SDF) annotation
  • Design timing is stored in an SDF file
  • Used to iteratively improve design
  • Updates a more-abstract design with information
    from later design stages
  • Annotation of logic schematic with extracted
    parasitic resistances and capacitances
  • Back annotation requires tools to know more about
    each other
  • Simulation tools
  • Synthesis tools
  • Layout tools

56
Standard Delay Format
    (DELAYFILE
     (SDFVERSION "OVI 1.0")
     (DESIGN "tcp_1_chip")
     (DATE "Fri Apr 30 09:48:22 2004")
     (VENDOR "cdr3synPwcslV225T125")
     (PROGRAM "Synopsys Design Compiler cmos")
     (VERSION "2003.06")
     (DIVIDER /)
     (VOLTAGE 2.25:2.25:2.25)
     (PROCESS)
     (TEMPERATURE 125.00:125.00:125.00)
     (TIMESCALE 1ns)
     (CELL
      (CELLTYPE "tcp_1_chip")
      (INSTANCE)
      (DELAY
       (ABSOLUTE
        (INTERCONNECT U5/x U81/a (0.000:0.000:0.000))
        (INTERCONNECT U73/x U74/a (0.000:0.000:0.000))
        ...
     ...
     (CELL
      (CELLTYPE "exnor2_1")
      (INSTANCE i_aes_wr/U_ALG/U6533)
      (DELAY
       (ABSOLUTE
        (IOPATH a x (0.662:1.045:1.045) (0.682:1.076:1.076))
        (IOPATH b x (1.379:1.416:1.416) (1.454:1.492:1.492))
       )
      )
     )
     ...
     (CELL
      (CELLTYPE "mux2_2")
      (INSTANCE i_mips/u0/ejt_tap\/pa_addr_reg_next\/bit_00i/U1)
      (DELAY
       (ABSOLUTE
        (IOPATH d0 x (0.395:0.395:0.395) (0.464:0.464:0.464))
        (IOPATH d1 x (0.387:0.403:0.403) (0.447:0.477:0.477))
        (IOPATH sl x (1.768:1.781:1.781) (1.879:1.892:1.892))
       )
      )
     )
    )
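In SDF, each IOPATH entry lists a rising and a falling delay, and each of those is a min:typ:max triple in the file's TIMESCALE unit (1 ns here). The Python sketch below extracts those triples from the mux2_2 cell shown above. It is a minimal regular-expression reader for this exact layout, not a general SDF parser: conditional paths, single-value delays and other constructs are not handled.

```python
import re

# One IOPATH entry: input pin, output pin, rise triple, fall triple (min:typ:max).
IOPATH_RE = re.compile(
    r"\(IOPATH\s+(\S+)\s+(\S+)\s+"
    r"\(([\d.]+):([\d.]+):([\d.]+)\)\s+"
    r"\(([\d.]+):([\d.]+):([\d.]+)\)\)"
)

def parse_iopaths(text):
    """Map (input_pin, output_pin) -> {'rise': (min, typ, max), 'fall': (...)}."""
    paths = {}
    for m in IOPATH_RE.finditer(text):
        rise = tuple(float(v) for v in m.group(3, 4, 5))
        fall = tuple(float(v) for v in m.group(6, 7, 8))
        paths[(m.group(1), m.group(2))] = {"rise": rise, "fall": fall}
    return paths

SAMPLE = r"""
(CELL (CELLTYPE "mux2_2")
 (INSTANCE i_mips/u0/ejt_tap\/pa_addr_reg_next\/bit_00i/U1)
 (DELAY (ABSOLUTE
  (IOPATH d0 x (0.395:0.395:0.395) (0.464:0.464:0.464))
  (IOPATH d1 x (0.387:0.403:0.403) (0.447:0.477:0.477))
  (IOPATH sl x (1.768:1.781:1.781) (1.879:1.892:1.892))
 ))
)
"""

paths = parse_iopaths(SAMPLE)
for (src, dst), d in paths.items():
    print(f"{src}->{dst}: rise typ {d['rise'][1]} ns, fall typ {d['fall'][1]} ns")
```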

57
Analog Integrated Circuits
  • Filters
  • Amplifiers
  • Phase Lock Loop
  • Voltage Control Oscillator
  • Modulator/Demodulator

58
Fairchild Semiconductor µA741 Op-Amp
  • In 1963, a 26-year-old engineer named Robert
    Widlar designed the first monolithic op-amp IC,
    the µA702
  • Price at the beginning was $300
  • Fairchild and competitors have sold it in the
    hundreds of millions
  • Now, for $300 you can get about a thousand of
    today's 741 chips

59
Signetics NE555 Timer
  • A simple IC from 1971 that could function as a
    timer or an oscillator
  • It would become a best seller in analog
    semiconductors
  • Kitchen appliances
  • Toys
  • Spacecraft
  • A few thousand other things
  • Many billions have been sold

60
Intersil ICL8038 Waveform Generator
  • A generator of sine, square, triangular,
    sawtooth, and pulse waveforms from 1983
  • Countless applications
  • Music synthesizers
  • Blue boxes
  • Hundreds of millions sold
  • Intersil discontinued the production in 2002

61
LNA in BiCMOS Technology
62
PLL for 802.11a WLAN
63
Oscillator
64
Modulator
65
Digital Integrated Circuits
  • Adders
  • Multipliers
  • Shifters
  • Carry Units
  • Arithmetic-Logic Units

66
Full Adder
  • Computes one-bit sum and carry
  • $s_i = a_i \oplus b_i \oplus c_{in}$
  • $c_{out} = a_i b_i + a_i c_{in} + b_i c_{in}$
  • Ripple-carry adder: n-bit adder built from full
    adders
  • Delay of ripple-carry adder goes through all
    carry bits
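The full-adder equations and the ripple structure translate directly into a bit-level model. The Python sketch below chains one full adder per bit, so each stage's carry input is the previous stage's carry output. That serial dependence is the delay problem the faster adders address.

```python
def full_adder(a: int, b: int, cin: int) -> tuple:
    """One-bit full adder using the sum and carry equations above."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_carry_add(x: int, y: int, n: int, cin: int = 0) -> tuple:
    """n-bit ripple-carry adder: each stage waits for the previous carry."""
    carry, result = cin, 0
    for i in range(n):                        # LSB first; carry ripples upward
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry                      # n-bit sum and final carry-out

print(ripple_carry_add(0b1011, 0b0110, 4))    # 11 + 6 = 17 -> (1, 1)
```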

67
Combinational Multiplier
              0 1 1 0     multiplicand
            x 1 0 0 1     multiplier
              0 1 1 0
            0 0 0 0
            0 0 1 1 0
          0 0 0 0
          0 0 0 1 1 0
        0 1 1 0
        0 1 1 0 1 1 0
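The pencil-and-paper example (0110 × 1001, that is 6 × 9 = 54) is a sum of shifted partial products, one per multiplier bit. A short Python sketch reproduces the rows above:

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int):
    """Unsigned n-bit x n-bit multiply by summing shifted partial products."""
    partials = []
    for i in range(n):
        bit = (multiplier >> i) & 1
        # Partial product i: multiplicand AND multiplier bit i, shifted left i places.
        partials.append(multiplicand << i if bit else 0)
    return sum(partials), partials

prod, partials = shift_add_multiply(0b0110, 0b1001, 4)
for p in partials:
    print(format(p, "07b"))
print(format(prod, "07b"), prod)   # 0110110 54
```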

68
Array Multiplier
  • Array multiplier is an efficient layout of a
    combinational multiplier
  • Array multipliers may be pipelined to decrease
    clock period at the expense of latency

69
Wallace Tree
  • Reduces depth of adder chain
  • Built from carry-save adders
  • Three inputs a, b, c
  • Produces two outputs y, z
  • $y + 2z = a + b + c$ (z carries weight one bit position higher)
  • Carry-save equations
  • yi parity (ai,bi,ci)
  • zi majority (ai,bi,ci)
  • At each stage, i numbers are combined to form
    about 2i/3 sums
  • Final adder completes the summation
  • Wiring is more complex
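A carry-save adder applied to whole words gives the parity word y and the majority word z, with $a + b + c = y + 2z$. The Python sketch below reduces a list of operands layer by layer, turning every three numbers into two until only two remain, and then does one ordinary addition for the final adder. The operand values are arbitrary examples.

```python
def carry_save(a: int, b: int, c: int):
    """Carry-save adder on whole words: bitwise parity and majority."""
    y = a ^ b ^ c                          # y_i = parity(a_i, b_i, c_i)
    z = (a & b) | (a & c) | (b & c)        # z_i = majority(a_i, b_i, c_i)
    return y, z                            # a + b + c == y + 2 * z

def wallace_sum(numbers):
    """Reduce operands with carry-save layers, then do one final add."""
    nums = list(numbers)
    stages = 0
    while len(nums) > 2:
        groups = len(nums) // 3
        nxt = []
        for k in range(groups):
            y, z = carry_save(*nums[3 * k:3 * k + 3])
            nxt += [y, z << 1]             # shift carries to their weight
        nxt += nums[3 * groups:]           # 0-2 leftover operands pass through
        nums = nxt
        stages += 1
    return sum(nums), stages               # final carry-propagate adder

print(wallace_sum([6, 12, 5, 9, 3, 7]))    # six operands: 6 -> 4 -> 3 -> 2
```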

70
Serial-Parallel Multiplier
  • Used in serial-arithmetic operations
  • Multiplicand can be held in place by register
  • Multiplier is shifted into array

71
Barrel Shifter
  • Can perform n-bit shifts in a single cycle
  • Accepts 2n data inputs and n control signals,
    producing n data outputs
  • Selects arbitrary contiguous n bits out of 2n
    input bits
  • Examples
  • Right shift data into top, 0 into bottom
  • Left shift 0 into top, data into bottom
  • Rotate data into top and bottom
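The three modes differ only in what is fed into the two halves of the 2n-bit input; the shifter then selects one n-bit window. The Python sketch below models that behavior, not the transmission-gate array. Bits are stored LSB-first, so the "data half" and "zero half" here may be labeled differently from the top/bottom of the slide's figure.

```python
def window(word_2n, start, n):
    """Select n contiguous bits from a 2n-bit input (LSB-first list)."""
    assert len(word_2n) == 2 * n and 0 <= start <= n
    return word_2n[start:start + n]

def shift_right(bits, k):
    n = len(bits)
    return window(bits + [0] * n, k, n)        # data in low half, zeros in high half

def shift_left(bits, k):
    n = len(bits)
    return window([0] * n + bits, n - k, n)    # zeros in low half, data in high half

def rotate_right(bits, k):
    n = len(bits)
    return window(bits + bits, k, n)           # data in both halves

def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]

def to_int(bits):
    return sum(b << i for i, b in enumerate(bits))

v = to_bits(13, 4)                             # 1101
print(to_int(shift_right(v, 1)), to_int(shift_left(v, 1)), to_int(rotate_right(v, 1)))
# 6 10 14
```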

72
Barrel Shifter
  • Two-dimensional array of 2n vertical X n
    horizontal cells
  • Input data travels diagonally upward
  • Output wires travel horizontally
  • Control signals run vertically
  • Exactly one control signal is set to 1, turning
    on all transmission gates in that column
  • Large number of cells, but each one is small
  • Delay is large, considering long wires and
    transmission gates

73
Carry-Lookahead Unit
  • First computes carry propagate and generate
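  • $P_i = a_i + b_i$
  • $G_i = a_i b_i$
  • Computes sum and carry from P and G
  • $s_i = c_i \oplus P_i \oplus G_i$
  • $c_{i+1} = G_i + P_i c_i$
  • Can recursively expand carry formula
  • $c_{i+1} = G_i + P_i (G_{i-1} + P_{i-1} c_{i-1})$
  • $c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} (G_{i-2} + P_{i-2} c_{i-2})$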
  • Expanded formula does not depend on intermediate
    carries
  • Allows carry for each bit to be computed
    independently
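The fully expanded carry formula can be evaluated directly. The Python sketch below uses the OR form of propagate, $P_i = a_i + b_i$, which pairs with $s_i = c_i \oplus P_i \oplus G_i$. It computes every carry from P, G and the input carry only, never from another intermediate carry, which is the property that allows the carries to be computed in parallel.

```python
def cla_add(x: int, y: int, n: int, c0: int = 0):
    """n-bit carry-lookahead addition using fully expanded carry equations."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    p = [ai | bi for ai, bi in zip(a, b)]     # propagate (OR form)
    g = [ai & bi for ai, bi in zip(a, b)]     # generate
    carries = [c0]
    for i in range(n):
        # c_{i+1} = G_i + P_i G_{i-1} + ... + P_i ... P_0 c_0
        c, prod = 0, 1
        for j in range(i, -1, -1):
            c |= prod & g[j]
            prod &= p[j]
        c |= prod & c0
        carries.append(c)
    s = [carries[i] ^ p[i] ^ g[i] for i in range(n)]   # (a|b)^(ab) == a^b
    return sum(bit << i for i, bit in enumerate(s)), carries[n]

print(cla_add(0b1011, 0b0110, 4))   # (1, 1)
```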

74
Depth-4 Carry-Lookahead Unit
  • Deepest carry expansion requires gates with large
    fan-in
  • Large and slow
  • Carry-lookahead unit requires complex wiring
    between adders and lookahead unit
  • Values must be routed back from lookahead unit to
    adder

75
Carry-Skip Adder
  • Looks for cases in which carry out of a set of
    bits is identical to carry in
  • Typically organized into m-bit stages
  • If $a_i \neq b_i$ for every bit in the stage, then the bypass
    gate sends the stage's carry input directly to its carry
    output
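Functionally, a carry-skip adder computes the same sum as a ripple adder. The skip path only shortens the worst-case delay. The Python sketch below ripples inside each m-bit stage and, when every bit of a stage has $a_i \neq b_i$, checks that the carry-out equals the carry-in and counts the stage as bypassed. Stage width and operands are hypothetical.

```python
def carry_skip_add(x: int, y: int, n: int, m: int):
    """n-bit adder from m-bit ripple stages, each with a skip (bypass) path.

    Returns (sum, carry_out, number_of_stages_whose_carry_was_bypassed).
    """
    carry, result, skipped = 0, 0, 0
    for base in range(0, n, m):
        width = min(m, n - base)
        a = [(x >> (base + i)) & 1 for i in range(width)]
        b = [(y >> (base + i)) & 1 for i in range(width)]
        c = carry
        for i in range(width):                # ordinary ripple inside the stage
            result |= (a[i] ^ b[i] ^ c) << (base + i)
            c = (a[i] & b[i]) | (a[i] & c) | (b[i] & c)
        if all(ai != bi for ai, bi in zip(a, b)):
            # Every bit propagates, so carry-out equals carry-in and the
            # bypass gate can forward the stage's carry input immediately.
            assert c == carry
            skipped += 1
        carry = c
    return result, carry, skipped

print(carry_skip_add(0b101010, 0b010101, 6, 3))   # both stages bypassed
```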

76
Carry-Select Adder
  • Computes two results in parallel, each for
    different carry input assumptions
  • Uses actual carry in to select correct result
  • Reduces the carry delay per stage to that of a multiplexer
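A carry-select adder can be modeled block by block. Each block sum is precomputed for carry-in 0 and for carry-in 1, and the real carry only drives a selection. In the Python sketch below, Python's `+` stands in for each block's internal adder, and the block width is a hypothetical parameter.

```python
def carry_select_add(x: int, y: int, n: int, m: int):
    """n-bit carry-select adder built from m-bit blocks."""
    carry, result = 0, 0
    for base in range(0, n, m):
        width = min(m, n - base)
        mask = (1 << width) - 1
        a = (x >> base) & mask
        b = (y >> base) & mask
        sum0 = a + b                      # precomputed assuming carry-in = 0
        sum1 = a + b + 1                  # precomputed assuming carry-in = 1
        chosen = sum1 if carry else sum0  # multiplexer driven by the real carry
        result |= (chosen & mask) << base
        carry = chosen >> width           # block carry-out feeds the next select
    return result, carry

print(carry_select_add(45, 30, 6, 3))     # 75 -> (11, 1)
```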

77
Manchester Carry Chain
  • Precharged carry chain which uses P and G signals
  • Propagate signal connects adjacent carry bits
  • Generate signal discharges carry bit
  • Worst-case discharge path goes through entire
    carry chain

78
Serial Adder
  • May be used in signal-processing arithmetic where
    fast computation is important but latency is
    unimportant
  • LSB control signal clears the carry shift register

79
Arithmetic-Logic Unit
  • Computes a variety of logical and arithmetic
    functions based on opcode
  • May offer complete set of functions of two
    variables or a subset
  • Built around adder, since carry chain determines
    delay
  • Function block may be used to compute required
    intermediate signals for a full-function ALU
  • Transmission gates may introduce significant delay

80
Arithmetic-Logic Unit
  • P and G compute intermediate values from inputs
  • May not correspond to carry lookahead P and G for
    non-addition functions
  • Add unit is adder of choice
  • Output unit computes from sum, propagate signal

81
Acorn Computers ARM1 Processor
  • 32-bit RISC microprocessor from 1985
  • The simplicity made all the difference
  • Small, low power, and easy to program
  • ARM architecture has become the dominant embedded
    processor
  • More than 10 billion ARM cores have been used in
    all sorts of gadgetry, including the iPhone

82
Computer Cowboys Sh-Boom Processor
  • In 1988, Russell Fish and Chuck Moore found a way to
    have the processor run its own super-fast
    internal clock while still staying synchronized
    with the rest of the computer
  • In the years since Sh-Boom's invention, the speed
    of processors has far surpassed that of
    motherboards, and so practically every maker of
    computers and consumer electronics wound up using
    the same solution
  • Since 2006, Patriot Scientific (and Moore) have
    reaped over US $125 million in licensing fees
    from Intel, AMD, Sony, Olympus, and others

83
8-bit Microprocessors
  • Microchip Technology PIC16C84 8-bit
    microcontroller in 1993
  • Incorporates EEPROM
  • Unlike EPROM, does not need UV light to be erased
  • Radiation-hardened RCA CDP 1802 8-bit
    microprocessor in 1976
  • One of the first, if not the first, CMOS
    processors
  • Low power consumption, wide range of operating
    voltages and military operating temperature range

84
Embedded Memories
  • Read-Only Memory
  • Static Random-Access Memory
  • Dynamic Random-Access Memory
  • Memory Generators

85
Memory Architecture
  • Address is divided into row and column
  • Row may contain full word or more than one word
  • Selected row drives/senses bit lines in columns
  • Amplifiers/drivers read/write bit lines
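The row/column split is simple bit-field extraction. This is a hypothetical example: it assumes the low-order address bits choose the word within a row (the column multiplexer) and the high-order bits drive the row decoder. Real memories may order or interleave the fields differently.

```python
def split_address(addr: int, col_bits: int):
    """Split a flat word address into (row, column).

    Assumes the low col_bits pick a word within the selected row (column
    multiplexer) and the remaining high bits drive the row decoder.
    """
    row = addr >> col_bits
    col = addr & ((1 << col_bits) - 1)
    return row, col

# Hypothetical 1024-word memory organized as 256 rows x 4 words per row.
print(split_address(0x2A7, col_bits=2))   # (169, 3)
```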

86
Read-Only Memory (ROM)
  • ROM core is organized as an array of NOR gates
  • Pull-down transistors of NOR determine
    programming
  • Erasable ROMs require special processing that is
    not typically available
  • ROMs on digital ICs are generally mask-programmed
  • Placement of pull-downs determines ROM contents

87
Static Random-Access Memory (SRAM)
  • Core cell uses six-transistor circuit to store
    value
  • Value is stored symmetrically
  • Both true and complement are stored on
    cross-coupled transistors
  • SRAM retains value as long as power is applied to
    circuit
  • Read
  • Precharge bit and $\overline{\text{bit}}$ high
  • Set select line high from row decoder
  • One bit line will be pulled down
  • Write
  • Set bit/$\overline{\text{bit}}$ to desired (complementary) values
  • Set select line high
  • Drive on bit lines will flip state if necessary

88
SRAM Sense Amplifier
  • Differential pair
  • Takes advantage of complementarity of bit lines
  • One bit line goes low
  • One arm of the diff pair reduces its current, causing a
    compensating increase in the current of the other arm
  • Sense amp can be cross-coupled to increase speed

89
Dynamic Random-Access Memory (DRAM)
  • Cell can easily be made with a CMOS digital
    technology process
  • Dynamic RAM loses value due to charge leakage
  • Must be refreshed
  • Value is stored on gate capacitance of transistor
    t1
  • Read
  • read = 1, write = 0; read_data is precharged
  • t1 will pull down read_data if 1 is stored
  • Write
  • read = 0, write = 1, write_data = value
  • Guard transistor writes value onto gate
    capacitance
  • Modern commercial DRAMs use one-transistor cell

90
Toshiba NAND Flash Memory
  • In 1980, Fujio Masuoka recruited four engineers
    to a project aimed at designing a memory chip
    that could store lots of data and would be
    affordable
  • Team came up with a variation of EEPROM that
    featured a memory cell consisting of a single
    transistor (at the time, conventional EEPROM
    needed two transistors per cell)
  • Why is it named flash?
  • Because of the chip's ultrafast erasing
    capability
  • In 1984 Masuoka presented a paper at the IEEE
    International Electron Devices Meeting
  • In 1988 Intel developed a type of flash based on
    NOR logic gates (a 256-kilobit chip)
  • Toshiba's first NAND flash (greater storage
    densities but trickier to manufacture) hit the
    market in 1989

91
Memory Generators
  • A software tool which can create memories (ROM or
    RAM blocks) in a range of sizes as needed
  • The customer usually wants a particular number of
    words (depth) and bits (width) for each memory
    ordered
  • Each of the final building blocks (physical
    layout) will be implemented as a stand-alone,
    densely packed, pitch-matched array
  • Complex layout generators and state-of-the-art
    logic and circuit design techniques offer
  • Embedded memories of extreme density and
    performance
  • Each memory generator is a set of various,
    parameterized generators
  • Layout generator generates an array of custom,
    pitch-matched leaf cells
  • Schematic generator and net-lister extract a
    net-list used for both layout vs. schematic and
    functional verification
  • Function and Timing model generators create
    models for gate level simulation, dynamic/static
    timing analysis and synthesis
  • Symbol generator generates schematic
  • Critical Path generator is used for both circuit
    design and timing characterization

92
Logic Synthesis
  • Logic Synthesis Flow
  • Optimization
  • Technology Mapping
  • Low-Power Techniques

93
Logic Synthesis Flow
  • Goal is to create a logic gate network which
    performs a given set of functions
  • Input is Boolean formulae
  • Output is gates implementing Boolean functions
  • Several iterations needed for generation of the
    optimized gate-level description
  • Logic synthesis
  • Maps onto available gates
  • Restructures for delay, area, testability, power,
    etc.
  • Automated logic synthesis has enabled
  • Enormous reduction of the time needed for
    conversion of a design from high-level to
    gate-level description
  • Saving of designer resources for architectural
    and RTL descriptions, and optimization of the
    standard cell library

94
High-Level Synthesis
  • Scheduling determines
  • Number of clock cycles required
  • As-soon-as-possible (ASAP) schedule puts every
    operation as early in time as possible
  • As-late-as-possible (ALAP) schedule puts every
    operation as late in schedule as possible
  • Binding determines
  • Area and cycle time
  • Area tradeoffs must consider
  • Shared function units vs. multiplexers and
    control
  • Delay tradeoffs must consider
  • Cycle time vs. number of cycles
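ASAP and ALAP schedules are mirror-image traversals of the dataflow graph, assuming unit-delay operations. ASAP places each operation one step after its latest predecessor. ALAP, given a latency bound, places each operation one step before its earliest successor. The difference between the two (mobility) shows which operations the scheduler can move. The graph is hypothetical.

```python
def asap(preds):
    """ASAP: each operation starts one step after its latest predecessor."""
    t = {}
    def visit(op):
        if op not in t:
            t[op] = 1 + max((visit(p) for p in preds[op]), default=0)
        return t[op]
    for op in preds:
        visit(op)
    return t

def alap(preds, latency):
    """ALAP: each operation ends one step before its earliest successor."""
    succs = {op: [] for op in preds}
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)
    t = {}
    def visit(op):
        if op not in t:
            t[op] = min((visit(s) for s in succs[op]), default=latency + 1) - 1
        return t[op]
    for op in preds:
        visit(op)
    return t

# Hypothetical dataflow graph: operation -> list of predecessor operations.
GRAPH = {"m1": [], "m2": [], "a1": ["m1", "m2"], "m3": [], "a2": ["a1", "m3"]}
early = asap(GRAPH)
late = alap(GRAPH, latency=max(early.values()))
mobility = {op: late[op] - early[op] for op in GRAPH}
print(early, late, mobility)
```

Only m3 has slack here, so a binder could share a multiplier between m3 and m1/m2 by moving m3 to step 2.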

95
Logic Synthesis Phases
  • Technology-independent optimizations
  • A Boolean network is the main representation of
    the logic functions
  • Each node can be represented as sum-of-products
    (or product-of-sums)
  • Functions in the network need not correspond to
    logic gates
  • Technology mapping (library binding)
  • Design transformation from technology-independent
    to technology-dependent
  • Technology-dependent optimizations
  • Work in the available set of logic gates

96
Technology-Independent Optimization
  • Area is estimated by number of literals
  • Literal is true or complement form of a variable
  • Simplification
  • Rewrites a node to reduce the number of literals
    in it
  • Network restructuring
  • Introduces new nodes for common factors
  • Collapses several nodes into one new node
  • Delay restructuring
  • Changes factorization to reduce path length

97
Covers and Cubes
  • Function is defined by
  • On-set: set of inputs for which output is 1
  • Off-set: set of inputs for which output is 0
  • Don't-care set: set of inputs for which output is
    don't-care
  • Each way to write a function as a sum-of-products
    is a cover
  • It covers the on-set
  • A cover is composed of cubes
  • Cubes are product terms that define a subspace
    cube in the function space

98
Covers and Optimizations
  • Larger cover
  • x1 x2 x3 x1 x2 x3 x1 x2 x3 x1 x2 x3
  • Requires four cubes (12 literals)
  • Smaller cover
  • x2 x3 x1 x3 x1 x2 x3
  • Requires three cubes (7 literals)
  • x1 x2 x3 is covered by two cubes
  • Don't-cares
  • Can be implemented in either on-set or off-set
  • Provide the greatest opportunities for
    minimization in many cases
  • Espresso
  • A two-level logic optimizer
  • Expands, makes irredundant and reduces
  • Optimization loop refines cover to reduce its size
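Covers can be compared mechanically by writing cubes in positional notation (1, 0, or - for an absent variable). The Python sketch below expands each cube into the minterms it covers, checks that two covers have the same on-set, and counts literals. The function is a hypothetical example chosen to have the same shape as the slide's (a 4-cube, 12-literal cover versus a 3-cube, 7-literal cover, with one minterm covered by two cubes); it is not the slide's own function.

```python
from itertools import product

def minterms(cube):
    """Expand a cube such as '1-1' (x1 x3) into the minterms it covers."""
    choices = [("0", "1") if ch == "-" else (ch,) for ch in cube]
    return {"".join(bits) for bits in product(*choices)}

def covered(cover):
    """Union of the minterms covered by every cube in the cover."""
    out = set()
    for cube in cover:
        out |= minterms(cube)
    return out

def literal_count(cover):
    """Each specified (non '-') position in a cube is one literal."""
    return sum(ch != "-" for cube in cover for ch in cube)

# Hypothetical function of x1 x2 x3, written as two equivalent covers.
larger = ["011", "101", "111", "110"]          # four minterm cubes
smaller = ["-11", "1-1", "110"]                # x2x3 + x1x3 + x1 x2 x3'
print(covered(larger) == covered(smaller),
      literal_count(larger), literal_count(smaller))
```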

99
Factorization
  • Based on division
  • Formulate candidate divisor
  • Test how it divides into the function
  • If $g = f/c$, we can use c as an intermediate
    function
  • Algebraic division
  • Doesn't take into account Boolean simplification
  • Less expensive than Boolean division
  • Three steps
  • Generate potential common factors and compute
    literal savings if used
  • Choose factors to substitute into network
  • Restructure the network to use the new factors
  • Algebraic/Boolean division is used to implement
    first step
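Algebraic (weak) division treats a sum-of-products as a set of cubes and ignores Boolean identities such as $aa = a$. The Python sketch below divides f by a divisor d and returns the quotient q and remainder r with $f = q \cdot d + r$. Cubes are encoded as sets of single-letter literals, and the function and divisor are hypothetical examples.

```python
def cubes(*terms):
    """Build a sum-of-products from strings of single-letter literals."""
    return {frozenset(t) for t in terms}

def algebraic_divide(f, d):
    """Weak (algebraic) division f / d -> (quotient, remainder), f = q*d + r."""
    quotient = None
    for dc in d:
        # Cubes of f that contain divisor cube dc, with dc's literals removed.
        q_j = {cube - dc for cube in f if dc <= cube}
        quotient = q_j if quotient is None else quotient & q_j
    quotient = quotient or set()
    product = {qc | dc for qc in quotient for dc in d}
    return quotient, f - product

def literals(sop):
    return sum(len(cube) for cube in sop)

f = cubes("ac", "ax", "bc", "bx", "e")        # f = ac + ax + bc + bx + e
d = cubes("a", "b")                            # candidate divisor a + b
q, r = algebraic_divide(f, d)
print(sorted("".join(sorted(c)) for c in q), sorted("".join(sorted(c)) for c in r))
# ['c', 'x'] ['e']  ->  f = (a + b)(c + x) + e
```

The literal count drops from 9 to 5 (2 in the divisor, 2 in the quotient, 1 in the remainder), which is the saving estimated in the first step.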

100
Technology Mapping
  • Rewrites Boolean network
  • In terms of available logic functions
  • Optimizes for
  • Area
  • Delay
  • Can be viewed as a pattern matching problem
  • Find pattern match which minimizes area/delay
    cost
  • Procedure
  • Write Boolean network in canonical NAND form
  • Write each library gate in canonical NAND form
  • Assign cost to each library gate
  • Use dynamic programming to select minimum-cost
    cover of network by library gates
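The procedure above can be sketched as tree covering by dynamic programming. This Python sketch assumes the subject graph is already a tree in canonical NAND/INV form. The four library gates and their area costs are hypothetical: each gate's pattern is matched at a node, and the cost of a match is the gate cost plus the best costs of the subtrees bound to the pattern's leaves.

```python
from functools import lru_cache

# Subject tree in canonical NAND/INV form:
#   ("IN", name) | ("INV", child) | ("NAND", left, right)
# Hypothetical library: gate -> (pattern in NAND/INV form, area cost); "*" is a pattern leaf.
LIBRARY = {
    "INV":   (("INV", "*"), 2),
    "NAND2": (("NAND", "*", "*"), 3),
    "AND2":  (("INV", ("NAND", "*", "*")), 4),
    "OR2":   (("NAND", ("INV", "*"), ("INV", "*")), 4),
}

def match(pattern, node, leaves):
    """True if pattern matches the subtree at node; collects the nodes bound to '*'."""
    if pattern == "*":
        leaves.append(node)
        return True
    if node[0] != pattern[0] or len(node) != len(pattern):
        return False
    return all(match(p, n, leaves) for p, n in zip(pattern[1:], node[1:]))

@lru_cache(maxsize=None)
def best_cover(node):
    """Dynamic programming: (minimum cost, gates used) for the tree rooted at node."""
    if node[0] == "IN":
        return 0, ()
    best = None
    for gate, (pattern, cost) in LIBRARY.items():
        leaves = []
        if match(pattern, node, leaves):
            sub = [best_cover(leaf) for leaf in leaves]
            total = cost + sum(c for c, _ in sub)
            if best is None or total < best[0]:
                best = (total, (gate,) + tuple(g for _, gates in sub for g in gates))
    return best

a, b, c = ("IN", "a"), ("IN", "b"), ("IN", "c")
x = ("INV", ("NAND", a, b))                        # a AND b
print(best_cover(x))                               # (4, ('AND2',))
print(best_cover(("INV", ("NAND", x, ("INV", c)))))  # (9, ('INV', 'OR2', 'NAND2'))
```

In the second call the best cover does not reuse the AND2 found for the subtree x on its own: absorbing x's inverter into an OR2 is cheaper. This is why the cost is computed per node over all matching patterns rather than greedily.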

101
Breaking into Trees
not optimal, but reasonable cuts usually work well
102
Mapping Example
after three levels of matching
103
Mapping Example
after four levels of matching
104
Low Power Techniques
  • Architecture-driven supply voltage scaling
  • Add extra logic to increase parallelism so that
    the system can run at a lower frequency, and hence
    at a lower supply voltage
  • Normalized power of n parallel units operating at
    V, relative to a single unit at Vref
  • $P_n(n) = \left[1 + \frac{C_i(n)}{n\,C_{ref}} + \frac{C_x(n)}{C_{ref}}\right]\left(\frac{V}{V_{ref}}\right)^2$
  • Dynamic voltage and frequency scaling
  • Voltage and frequency are decreased in parts of
    the circuit where this does not adversely affect
    performance
  • Dynamic scaling is regulated by software based on
    system load
  • Reducing capacitances
  • Parasitic capacitances of the transistors
  • Parasitic capacitances of the wires
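A quick numeric check of the parallelism formula for normalized power, P_n(n) = [1 + C_i(n)/(n C_ref) + C_x(n)/C_ref] (V/V_ref)^2, written as a Python sketch. We read C_i(n) and C_x(n) as the overhead capacitances added by parallelization (communication between units and extra merging logic). All numeric values are hypothetical.

```python
def normalized_power(n, c_i, c_x, c_ref, v, v_ref):
    """P_n(n) = [1 + C_i(n)/(n*C_ref) + C_x(n)/C_ref] * (V/V_ref)**2."""
    return (1 + c_i / (n * c_ref) + c_x / c_ref) * (v / v_ref) ** 2

# Hypothetical case: two parallel units, overhead capacitances relative to C_ref = 1,
# supply lowered from V_ref = 1.0 V to 0.6 V because each unit may run at half speed.
p = normalized_power(n=2, c_i=0.2, c_x=0.15, c_ref=1.0, v=0.6, v_ref=1.0)
print(round(p, 3))   # 0.45
```

The capacitance overhead (here 25%) is more than paid back by the quadratic voltage term (0.36), which is the argument for architecture-driven voltage scaling.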

105
Low Power Techniques
  • Reducing switching activity
  • Deactivate the clock to unused registers (clock
    gating)
  • Deactivate signals if not used (signal gating)
  • Deactivate VDD for unused hardware blocks (power
    gating)
  • Distributed clocks: Globally Asynchronous Locally
    Synchronous (GALS) design
  • Eliminates the central synchronous clock and
    uses local clocks instead
  • Distinct local clocks, possibly running at
    different frequencies

106
Design for Testability
  • DFT Methods
  • Scan Design
  • Test Pattern Generation
  • Built-In Self-Test

107
Design for Testability Methods
  • Make the system as testable as possible
  • Keep hardware and test-time costs to a minimum
  • Use knowledge of architecture to help in
    selection of testability points
  • Modify architecture to improve testability
  • DFT for digital circuits
  • Ad-hoc methods
  • Avoid asynchronous feedback
  • Make flip-flops initializable
  • Avoid redundant gates, large fan-in gates and
    gated clocks
  • Provide test control for difficult-to-control
    signals
  • Consider ATE requirements (tri-states, etc.)
  • Structured methods
  • Scan Design
  • Built-in self-test (BIST)
  • Boundary scan

108
Scan Design
  • Circuit is designed using pre-specified design
    rules
  • Test structure (hardware) is added to the
    verified design
  • Add a test control (TC) primary input
  • Replace flip-flops by scan flip-flops (SFF) and
    connect to form one or more shift registers
    (scan-chains) in the test mode
  • Make input/output of each scan-chain
    controllable/observable from primary
    input/primary output
  • Use combinational ATPG to obtain tests for all
    testable faults in the combinational logic
  • Add shift register tests and convert ATPG tests
    into scan sequences for use in manufacturing test
  • Full scan is expensive
  • Must roll out and roll in state many times during
    a set of tests
  • Partial scan selects some registers (not all) for
    scan to reduce the chain length
  • Analysis is required to choose which registers
    are best for scan
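The scan flow above (shift a vector in, apply one normal-mode clock, shift the response out) can be modelled behaviourally. This Python sketch is hypothetical in its details: one chain with SCANIN feeding the first flip-flop and SCANOUT taken from the last. It uses this section's convention that TC = 0 selects scan (shift) mode, and a made-up combinational block between the flip-flops.

```python
class ScanChain:
    """Behavioral model of one scan chain: SCANIN feeds ffs[0], SCANOUT is ffs[-1]."""

    def __init__(self, n_sff):
        self.ffs = [0] * n_sff

    def clock(self, tc, scan_in=0, comb=None):
        """One clock period: shift when tc == 0, capture comb(ffs) when tc == 1."""
        if tc == 0:
            scan_out = self.ffs[-1]
            self.ffs = [scan_in] + self.ffs[:-1]
            return scan_out
        self.ffs = list(comb(self.ffs))
        return None

    def shift(self, pattern):
        """Shift pattern in (ffs[i] == pattern[i] afterwards); return the old contents."""
        out = [self.clock(0, bit) for bit in reversed(pattern)]
        return out[::-1]

def comb(s):
    """Hypothetical combinational logic between the scan flip-flops."""
    return [s[0] & s[1], s[0] | s[2], s[1] ^ s[2]]

chain = ScanChain(3)
chain.shift([1, 1, 0])          # scan in a combinational test vector
chain.clock(1, comb=comb)       # one normal-mode clock captures the response
print(chain.shift([0, 0, 0]))   # [1, 1, 1]: response unloaded while the next vector loads
```

Overlapping the unload of one response with the load of the next vector is what keeps the total test length close to one chain length per combinational vector.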

109
Scannable Flip-Flop
110
Level-Sensitive Scannable Flip-Flop
111
Scan Structure
112
Combinational Test Vectors
113
Testing Scan Chain
  • Scan-chain must be tested prior to application of
    scan test sequences
  • A shift sequence 00110011 . . . of length
    $n_{sff} + 4$ in scan mode (TC = 0)
  • Produces 00, 01, 11 and 10 transitions in all
    flip-flops
  • Observes the result at SCANOUT output
  • Total scan test length
  • $(n_{comb} + 2)\,n_{sff} + n_{comb} + 4$ clock periods
  • Example
  • 2,000 scan flip-flops, 500 comb. vectors, total
    scan test length (502 × 2,000) + 504 = 1,004,504,
    i.e. about $10^6$ clocks
  • Multiple scan-chains reduce test length
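The test-length formula and the chain-test sequence can be checked with a short Python sketch. The multiple-chain variant is our assumption: it supposes balanced chains shifted in parallel, so n_sff in the formula is replaced by the length of the longest chain.

```python
import math

def scan_test_length(n_comb, n_sff, n_chains=1):
    """Clock periods: (n_comb + 2) * L + n_comb + 4, L = longest scan-chain length."""
    longest = math.ceil(n_sff / n_chains)   # assumes balanced chains
    return (n_comb + 2) * longest + n_comb + 4

def flush_sequence(n_sff):
    """Scan-chain test sequence 0011 0011 ... of length n_sff + 4."""
    return ("0011" * (n_sff // 4 + 2))[: n_sff + 4]

print(scan_test_length(500, 2000))               # 1004504, about 10**6
print(scan_test_length(500, 2000, n_chains=4))   # 251504
print(flush_sequence(4))                         # 00110011
```

Under this assumption, splitting the same 2,000 flip-flops into four chains cuts the test time roughly by four, at the cost of extra scan-in/scan-out pins.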

114
Testing and Faults
  • Errors are introduced during manufacturing
  • Testing weeds out infant mortality
  • Varieties of testing
  • Functional testing
  • Performance testing
  • Fault model
  • Possible locations of faults
  • I/O behavior produced by the fault
  • With a fault model, we can test the network for
    every possible instantiation of that type of
    fault
  • It is difficult to enumerate all types of
    manufacturing faults
  • Testing procedure
  • Set inputs
  • Observe output
  • Compare fault-free and observed output

115
Stuck-At-0/1 Faults
  • Logic gate output is always stuck at 0 or 1,
    independently of the input values
  • Correspondence to manufacturing defects depends
    on logic family
  • Experiments show that 100% stuck-at-0/1 fault
    coverage corresponds to high overall fault
    coverage
  • Testing a two-input NAND
  • Three ways to test it for stuck-at-0
  • Only one way to test it for stuck-at-1
  • Testing a two-input NOR
  • Three ways to test it for stuck-at-1
  • Only one way to test it for stuck-at-0
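The "three ways / one way" counts follow from a simple rule: an output stuck-at-v fault is detected by any input vector whose fault-free output is not v. This Python sketch enumerates those vectors exhaustively for two-input NAND and NOR gates.

```python
from itertools import product

def stuck_at_tests(gate, n_inputs=2):
    """Input vectors that detect output stuck-at-0 and stuck-at-1 faults of a gate."""
    tests = {0: [], 1: []}
    for vector in product((0, 1), repeat=n_inputs):
        good = gate(vector)
        # A stuck-at-v fault is visible only when the fault-free output differs from v.
        tests[1 - good].append(vector)
    return tests

def nand(v):
    return 1 - all(v)

def nor(v):
    return 1 - any(v)

print(stuck_at_tests(nand))  # {0: [(0, 0), (0, 1), (1, 0)], 1: [(1, 1)]}
print(stuck_at_tests(nor))   # {0: [(0, 0)], 1: [(0, 1), (1, 0), (1, 1)]}
```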

116
Multiple Test Example
  • Can test both NANDs for stuck-at-0 simultaneously
  • abc = 000
  • Cannot test both NANDs for stuck-at-1
    simultaneously due to inverter
  • Must use two vectors
  • Must also test inverter

117
Stuck-At-Open/Closed Model
  • Transistors always on/off
  • t1 is stuck open (switch cannot be closed)
  • No path from VDD to output capacitance
  • Testing requires two cycles
  • Must discharge capacitor
  • Try to operate t1 to charge capacitor
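Why a stuck-open fault needs two cycles can be shown with a switch-level Python sketch. Because the slide's figure is missing, we assume for illustration that t1 is the pMOS pull-up of a CMOS inverter. When neither network conducts, the output capacitance simply keeps its previous value.

```python
def inverter(a, prev_out, t1_stuck_open=False):
    """Switch-level CMOS inverter; t1 is assumed to be the pMOS pull-up."""
    if a == 1:
        return 0              # nMOS pull-down discharges the output
    if t1_stuck_open:
        return prev_out       # no path from VDD: output capacitance keeps its charge
    return 1                  # t1 charges the output to VDD

def two_cycle_test(t1_stuck_open):
    out = inverter(1, prev_out=None, t1_stuck_open=t1_stuck_open)   # cycle 1: discharge
    return inverter(0, prev_out=out, t1_stuck_open=t1_stuck_open)   # cycle 2: charge via t1

print(two_cycle_test(False), two_cycle_test(True))   # 1 0
print(inverter(0, prev_out=1, t1_stuck_open=True))   # 1: one vector alone can miss the fault
```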

118
Combinational Testing Example
  • Two parts of testing
  • Controlling the inputs of (possibly interior)
    gates
  • Observing the outputs of (possibly interior)
    gates
  • Delay faults
  • Gate delay model assumes that all delays are
    lumped into one gate
  • Path delay model takes into account the delay of
    a path through network
  • Performance problems
  • Functional problems in some types of circuits

119
Testing Procedure
  • Goal
  • Test gate D for stuck-at-0 fault
  • First step
  • Justify 0 values on gate inputs
  • Work backward from gate to primary inputs
  • w1 = 0 (A output = 0)
  • i1 = i2 = 1
  • Observe the fault at a primary output
  • o1 gives different values if D is true/faulty
  • Work forward and backward
  • F's other input must be 0 to distinguish the true
    and faulty values
  • Justify 0 at E's output
  • In general, may have to propagate fault through
    multiple levels of logic to primary outputs

120
Redundancy and Testing
  • Redundant logic can mask faults
  • Testing NOR for SA0 requires setting both inputs
    to 0
  • Network topology ensures that the two NOR inputs
    are never both 0 (whenever b = 0, the other input
    is 1)
  • Function reduces to 0
  • $f = \overline{\overline{ab} + b} = ab\,\overline{b} = 0$
  • Redundant logic can introduce delay faults and
    other problems

121
Sequential Testing
  • Much harder than combinational testing
  • Can't set memory element values directly
  • Must apply sequences
  • To put machine in proper state for test
  • To observe value of test
  • Testing of NAND for stuck-at-1
  • Set both NAND inputs to 1
  • Primary input i1 can be controlled directly
  • Lower input is 1 if ps0/ps1 = 1

122
Time-Frame Expansion
  • A model for sequential test
  • Unroll machine in time
  • A single stuck-at fault in the sequential machine
    appears as a multiple stuck-at fault (one copy per
    time frame) in the unrolled model
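A Python sketch of time-frame expansion for a hypothetical 1-bit machine. The machine has next state s XOR x and output s AND x. Each loop iteration is one copy of the combinational logic, and a stuck-at fault on the next-state line is injected into every copy. This is why one physical fault behaves like a multiple fault in the unrolled circuit.

```python
def simulate(inputs, s0=0, fault=None):
    """Unroll the hypothetical machine over len(inputs) time frames.

    Each frame computes output = s AND x and next_state = s XOR x.
    fault = 0 or 1 forces the next-state line to that value in every frame.
    """
    s, outputs = s0, []
    for x in inputs:
        outputs.append(s & x)
        next_state = s ^ x
        if fault is not None:
            next_state = fault    # the same fault appears in this frame's copy
        s = next_state
    return outputs

good = simulate([1, 1, 1])
faulty = simulate([1, 1, 1], fault=0)
print(good, faulty)   # [0, 1, 0] [0, 0, 0]: detected only in the second frame
```

The fault cannot be observed in the first frame at all, so a sequence of at least two input vectors is needed to set up the state and then observe it.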

123
Test Pattern Generation
  • Automatic test pattern generator (ATPG) generates
    a set of test vectors
  • Boolean network (combinational ATPG)
  • Sequential machine (sequential ATPG)
  • D (from Discrepancy) notation lets us describe a
    fault's effect compactly
  • D value on a node means that good and faulty
    circuits have different values at that point
  • If a test for a particular fault exists, the
    D-algorithm will find it by an exhaustive search
    of all sensitized paths
  • Start at the faulty gate
  • Initially, assume a stuck-at fault on the gate
    output
  • Primitive D-cube of failure (PDCF) of a gate
    summarizes the minimal assignment of input values
    that highlights the fault
  • Propagation D-cube (PDC) has D or D̄ on the output
    and on at least one input
  • Summarizes non-controlling values for the other
    inputs to allow propagation of the D signal
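The five values 0, 1, D, D̄ and X can be modelled as (good, faulty) pairs, with D = 1 in the good circuit and 0 in the faulty one. The Python sketch below shows how a non-controlling side input lets D propagate while a controlling one blocks it, which is what a propagation D-cube records. The helper names are our own.

```python
# Each value is a (good, faulty) pair; None stands for an unknown bit.
ZERO, ONE = (0, 0), (1, 1)
D, D_BAR = (1, 0), (0, 1)     # D: good 1 / faulty 0; D_BAR: good 0 / faulty 1
X = (None, None)

def _and(a, b):
    """Three-valued AND on single bits (0, 1, None)."""
    if a == 0 or b == 0:
        return 0
    if a is None or b is None:
        return None
    return 1

def _norm(v):
    """Collapse partially known pairs to X so the algebra stays five-valued."""
    return X if None in v else v

def v_and(p, q):
    return _norm((_and(p[0], q[0]), _and(p[1], q[1])))

def v_not(p):
    return _norm(tuple(None if b is None else 1 - b for b in p))

def v_nand(p, q):
    return v_not(v_and(p, q))

print(v_and(D, ONE) == D)         # True: 1 is non-controlling, D propagates
print(v_and(D, ZERO) == ZERO)     # True: 0 is controlling, D is blocked
print(v_nand(D, ONE) == D_BAR)    # True: propagation through an inverting gate
print(v_and(D, X) == X)           # True: an unassigned side input leaves the result unknown
```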

124
PODEM Algorithm
  • PODEM stands for Path-Oriented DEcision Making
  • Circuit-based, fault-oriented ATPG algorithm
  • Goal
  • Propagate D value to primary outputs
  • Signal values are explicitly assigned at primary
    inputs only
  • Other values are computed by implication
  • Backtracking means reassigning primary inputs
    when a contradiction occurs
  • Uses implicit enumeration
  • Uses five values 0, 1, D, D̄ and X
  • Start all values at X
  • In worst case, must examine all possible inputs
  • Can be implemented to run quickly

125
Fault Propagation Example
126
Built-In Self-Test (BIST)
  • Includes on-chip machine responsible for
  • Generating tests
  • Evaluating correctness of tests
  • Allows many tests to be applied
  • Can't afford a large memory for test results
  • Rely on compression and statistical analysis
  • Uses a linear-feedback shift register (LFSR) to
    generate a pseudo-random sequence of bit vectors
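A minimal LFSR pattern generator in Python, assuming a 4-bit Fibonacci register whose feedback is bit3 XOR bit2. That gives the recurrence a_n = a_(n-3) XOR a_(n-4), i.e. the primitive polynomial x^4 + x + 1, so the register steps through all 15 nonzero states before repeating. The width and taps are illustrative choices; real BIST generators are usually much wider.

```python
def lfsr(seed=0b0001, n=15):
    """4-bit Fibonacci LFSR with feedback bit3 XOR bit2; returns n successive states."""
    state, out = seed, []
    for _ in range(n):
        out.append(state)
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        state = ((state << 1) | feedback) & 0xF
    return out

patterns = lfsr()
print(len(set(patterns)))   # 15: every nonzero 4-bit vector appears exactly once
print(lfsr(n=16)[-1])       # 1: the sequence repeats after 15 clocks
```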

127
BIST Architecture
  • One LFSR generates test sequence
  • Another LFSR captures/compresses results
  • Can store a small number of signatures which
    contain expected compressed results for valid
    system
  • Usually used for testing memory blocks
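The compressing side of BIST can be sketched as a serial-input signature register in Python. It uses the same hypothetical 4-bit LFSR (feedback bit3 XOR bit2), with each response bit XORed into the feedback. The whole response stream is reduced to one 4-bit signature that is compared with a stored golden value. The response bits are made up for illustration.

```python
def signature(bits):
    """Serial signature register: 4-bit LFSR, response bit XORed into the feedback."""
    state = 0
    for b in bits:
        feedback = ((state >> 3) ^ (state >> 2) ^ b) & 1
        state = ((state << 1) | feedback) & 0xF
    return state

good_response = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]   # hypothetical circuit response
golden = signature(good_response)                       # expected signature stored on chip

# Any single-bit error in the response changes the signature.
for i in range(len(good_response)):
    faulty = list(good_response)
    faulty[i] ^= 1
    assert signature(faulty) != golden
```

Multiple-bit errors can alias to the golden signature (with a probability of roughly 2^-4 for a 4-bit register). This is the "statistical" part of the compression trade-off.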

128
Layout Generation
  • Layout Generation Flow
  • Design Rules
  • Layout Tools
  • Standard Cells
  • Floorplanning
  • Placement
  • Routing
  • Clock Tree
  • Pads

129
Layout Generation Flow
  • Library Exchange Format (LEF) files
  • To create a library database (standard cells, I/O
    cells, and macro blocks)
  • Timing Library Format (TLF) file
  • Timing constraints
  • General Constraints Format (GCF) file
  • Design constraints
  • Verilog net-list
  • To create a design database

130
Layout Generation Flow
  • Floorplanning
  • To create a core area with rows (or columns) and
    I/O rows around the core area
  • Power planning and routing
  • To plan, modify and route power paths, power
    rings and power stripes
  • Placement
  • An I/O constraints file may be used to place the
    I/O pads
  • Block placement
  • Cell placement
  • Size adjustment
  • To estimate the die size
  • To resize the design to make it routable

131
Layout Generation Flow
  • Generating clock trees
  • The clock buffer space and clock net must be
    defined
  • Generating clock trees is an iterative process
  • At this point, the physical net-list differs from
    the logical (original) net-list
  • Placement optimization
  • To resize gates and insert buffers to correct
    timing and electrical violations
  • Routing
  • To perform both global and final route on a
    placed design
  • Verification
  • To check for shorts and design rule violations

132
Design Rules
  • Masks are tools for manufacturing
  • Manufacturing processes have inherent limitations
    in accuracy
  • Design rules specify geometry of masks which will
    provide reasonable yields
  • Design rules are determined by experience
  • MOSIS SCMOS
  • Designed to scale across a wide range of
    technologies
  • Designed to support multiple vendors
  • Designed for educational use
  • Fairly conservative
  • Lambda (λ) design rules
  • Size of a minimum feature defines λ
  • Specifying λ particularizes the scalable rules
  • Parasitics are generally not specified in λ units

133
Wires
134
Transistors
135
Vias
  • Types of via
  • Metal1/diff
  • Metal1/poly
  • Metal2/metal1
  • Metal3/metal2
  • ...
  • Highest via (dimensions in λ)
  • Cut: 3 x 3
  • Overlap by metal2: 1
  • Minimum spacing: 3
  • Minimum spacing to via1: 2

136
Spacings
  • Diffusion/diffusion: 3λ
  • Poly/poly: 2λ
  • Poly/diffusion: 1λ
  • Via/via: 2λ
  • Metal1/metal1: 3λ
  • Metal2/metal2: 4λ
  • Metal3/metal3: 4λ

137
Overglass
  • Cut in passivation layer
  • Connection for bonding wire
  • Minimum bonding pad: 100 µm
  • Pad overlap of glass opening: 6 µm
  • Minimum pad spacing to unrelated metal2/3: 30 µm
  • Minimum pad spacing to unrelated metal1, poly,
    active: 15 µm

138
Layout Tools
  • Layout editors are interactive tools
  • Design rule checkers identify errors on the
    layout
  • Circuit extractors extract the net-list from the
    layout
  • Connectivity verification systems (CVS) compare
    extracted and original net-lists
  • CADENCE Virtuoso's Layout-versus-Schematic (LVS)
    tool
  • Standard cell layouts are created from
    pre-designed cells using automatic placement and
    routing
  • Silicon Ensemble (CADENCE)
  • Encounter (CADENCE)
  • Physical Compiler (SYNOPSYS)

139
Standard Cell Layout
  • Layout made of small cells
  • Gates, flip-flops, etc.
  • Cells are hand-designed
  • Assembly of cells is automatic
  • Cells arranged in rows
  • Wires routed between and through cells
  • Pitch is the height of a cell
  • All cells have same pitch, may have different
    widths
  • VDD/VSS connections are designed to run through
    cells
  • A feedthrough area allows wires to be routed over
    the cell

140
Floorplanning Strategy
  • Floorplanning must take into account
  • Blocks of varying function, size, and shape
  • Space allocation
  • Signal routing
  • Power supply routing
  • Clock distribution

141
Floorplanning Tips
  • Develop a wiring plan
  • Think about how layers will be used to distribute
    important wires
  • Draw separate wiring plans for power and clocking
  • These are important design tasks which should be
    tackled early
  • Sweep small components into larger blocks
  • A floorplan with a single NAND gate in the middle
    will be hard to work with
  • Design wiring that looks simple
  • If it looks complicated, it is complicated
  • Design planar wiring
  • Planarity is the essence of simplicity
  • Do it where feasible (and where it doesn't
    introduce unacceptable delay)

142
Placement Metrics
  • Placement of components interacts with routing of
    wires
  • Quality metrics for layout
  • Area and delay
  • Area and delay determined in part by
  • Wiring
  • How do we judge a placement without wiring?
  • Estimate wire length without actually performing
    routing

bad placement
good placement
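One common way to estimate wire length without routing is the half-perimeter wire length (HPWL): the sum, over all nets, of half the perimeter of the bounding box of the net's pins. The slides do not name a specific estimator, so treat this Python sketch as one reasonable choice. The net-list and both placements are hypothetical.

```python
def hpwl(placement, nets):
    """Half-perimeter wire length: sum of bounding-box half-perimeters of all nets."""
    total = 0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

nets = [("a", "b"), ("b", "c", "d"), ("a", "d")]           # hypothetical net-list
bad = {"a": (0, 0), "b": (3, 3), "c": (0, 3), "d": (3, 0)}
good = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (1, 1)}
print(hpwl(bad, nets), hpwl(good, nets))                    # 15 5
```

HPWL is exact for two- and three-pin nets routed with a minimal Steiner tree and a lower bound otherwise. It is cheap enough to re-evaluate after every move in an iterative placer.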
143
Placement Techniques
  • To construct an initial solution
  • To improve an existing solution
  • Pairwise interchange is a simple improvement
    method
  • Interchange a pair, keep the swap if it helps
    wire length
  • Heuristic determines which two components to swap
  • Placement by partitioning
  • Works well for components of fairly uniform size
  • Partition net-list to minimize total wire length
    using min-cut criterion
  • Kernighan-Lin Algorithm
  • Computes the min-cut criterion by counting the
    change in total net cut
  • Exchanges sets of nodes to perform hill-climbing,
    finding improvements where no single swap will
    improve the cut
  • Recursively subdivide to determine placement
    detail
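A compact Python sketch of one Kernighan-Lin pass. For simplicity it assumes a graph with two-pin edges rather than a net-list with multi-pin nets. Each step tentatively swaps the unlocked pair with the best gain D_a + D_b - 2c_ab, even if that gain is negative, and locks it. At the end, only the prefix of swaps with the largest positive cumulative gain is kept; that is the hill-climbing behaviour described above. D values are recomputed on the tentatively swapped partition instead of being updated incrementally.

```python
def cut_size(edges, side):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if side[u] != side[v])

def kl_pass(nodes, edges, side):
    """One Kernighan-Lin pass; side maps each node to 0 or 1 (balanced halves)."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    work = dict(side)              # partition with the tentative swaps applied
    locked, history = set(), []

    def d(n):
        """D value: external minus internal edge count of node n."""
        external = sum(1 for m in adj[n] if work[m] != work[n])
        return external - (len(adj[n]) - external)

    while True:
        free_a = [n for n in nodes if work[n] == 0 and n not in locked]
        free_b = [n for n in nodes if work[n] == 1 and n not in locked]
        if not free_a or not free_b:
            break
        # Gain of swapping a and b: D_a + D_b - 2 c_ab (may be negative).
        gain, a, b = max(
            ((d(a) + d(b) - 2 * (b in adj[a]), a, b) for a in free_a for b in free_b),
            key=lambda t: t[0],
        )
        work[a], work[b] = 1, 0
        locked.update((a, b))
        history.append((gain, a, b))

    # Keep the prefix of swaps with the largest positive cumulative gain.
    best_k, best_total, total = 0, 0, 0
    for k, (gain, _, _) in enumerate(history, start=1):
        total += gain
        if total > best_total:
            best_k, best_total = k, total
    result = dict(side)
    for _, a, b in history[:best_k]:
        result[a], result[b] = result[b], result[a]
    return result

# Hypothetical graph: two triangles joined by one edge, badly partitioned.
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
start = {1: 0, 2: 0, 4: 0, 3: 1, 5: 1, 6: 1}
better = kl_pass(nodes, edges, start)
print(cut_size(edges, start), cut_size(edges, better))   # 5 1
```

Recursively applying such bisections to each half is what turns the partitioner into a placement method.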

144
Routing
  • Major phases in routing
  • Global routing assigns nets to routing areas
  • Detailed routing designs the routing areas
  • Net ordering determines quality of result
  • Net ordering is a heuristic
  • Blocks and wiring
  • Blocks divide wiring area into routing channels
  • Large wiring areas may force rearrangement of
    block placement
  • Channel routing
  • Channel grows in one dimension to accommodate
    wires
  • Pins generally on only two sides
  • Switchbox routing
  • Box cannot grow in any dimension
  • Pins are on all four sides

145
Routing Channels
  • Tracks form a grid for routing
  • Spacing between tracks is center-to-center
    distance between wires
  • Track spacing depends on wire layer used
  • Density (vertical and horizontal)
  • Gives the number of wire segments crossing a
    vertical/horizontal grid segment
  • Different layers are used for horizontal and
    vertical wires
  • Horizontal and vertical wires can be routed
    relatively independently
  • Placement of cells determines placement of pins
  • Pin placement determines difficulty of routing
    problem
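Channel density can be computed directly from the horizontal extent of each net. This Python sketch assumes each net is given as an inclusive (left, right) column span, so two nets that end and start in the same column count as overlapping there. The spans are hypothetical. The density is a lower bound on the number of horizontal tracks the channel needs.

```python
def channel_density(spans):
    """Maximum number of net spans crossing any single column of the channel."""
    columns = range(min(l for l, _ in spans), max(r for _, r in spans) + 1)
    return max(sum(1 for l, r in spans if l <= c <= r) for c in columns)

print(channel_density([(1, 4), (2, 6), (5, 8), (7, 9)]))   # 2
print(channel_density([(1, 3), (2, 5), (3, 4)]))           # 3: all three cross column 3
```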

146
Left-Edge Algorithm
  • Assumes one horizontal segment per net
  • Sweep pins from left to right
  • Assign horizontal segment to lowest available
    track
  • Limitations
  • Some combinations of nets require more than one
    horizontal segment per net (a dog-leg wire)
  • Aligned pins form vertical constraints
  • Wire to lower pin must be on lower track
  • Wire to upper pin must be above lower pin's wire
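The basic left-edge sweep can be sketched in Python as follows. It assumes one horizontal segment per net, given as an inclusive (left, right) column span, and it deliberately ignores the vertical constraints and dog-legs listed as limitations above. Segments are taken in order of their left edge, and each goes to the lowest track whose last segment ends before it starts.

```python
def left_edge(nets):
    """nets: dict name -> (left, right) span. Returns dict name -> track (0 = lowest)."""
    track_end = []                 # rightmost occupied column on each track
    assignment = {}
    for name, (left, right) in sorted(nets.items(), key=lambda item: item[1][0]):
        for t, end in enumerate(track_end):
            if end < left:         # lowest track that is free at this segment's left edge
                track_end[t] = right
                assignment[name] = t
                break
        else:
            track_end.append(right)    # no free track: open a new one above
            assignment[name] = len(track_end) - 1
    return assignment

nets = {"a": (1, 4), "b": (2, 6), "c": (5, 8), "d": (7, 9)}   # hypothetical channel
tracks = left_edge(nets)
print(tracks)                        # {'a': 0, 'b': 1, 'c': 0, 'd': 1}
print(max(tracks.values()) + 1)      # 2 tracks, equal to the channel density here
```

Without vertical constraints this sweep uses exactly as many tracks as the channel density. Aligned pins are what force extra tracks or dog-legs in practice.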

147
Global and Detailed Routing
  • Global routing
  • Assign wires to paths through channels
  • Don't worry about exact routing of wires within
    channel
  • Can estimate channel height using congestion
  • Detailed routing
  • Dog-leg router breaks net into m