Title: EECS 150 - Components and Design Techniques for Digital Systems Lec 27
1. EECS 150 - Components and Design Techniques for Digital Systems, Lec 27: Summary (whirlwind), 12-9-04
- David Culler
- Electrical Engineering and Computer Sciences
- University of California, Berkeley
- http://www.eecs.berkeley.edu/culler
- http://www-inst.eecs.berkeley.edu/cs150
2. Background
3. Course Content
- Components and Design Techniques for Digital Systems
- Synchronous Digital Hardware Systems
- Synchronous: clocked; all changes in the system are controlled by a global clock and happen at the same time (not asynchronous).
- Digital: all inputs, outputs, and internal values (signals) take on discrete values (not analog).
4. Trick you into building an extreme project
- FPGA/SDRAM provides full game logic
- Court, obstructions
- Moving paddles
- Moving, colliding ball
- All the physics
- Court displayed to NTSC (TV) video output
- Real-time sound effects???
- N64 controller (and switches) for input
- How to make it multiplayer?
- The network
5. Levels of Digital Design
6. What makes Digital Systems tick?
[Figure: combinational logic between clocked (clk) state elements, shown against time]
What determines the system's performance?
7. The 150 stuff
- Building blocks of computer systems
- ICs (chips), PCBs, chassis, cables, connectors
- CMOS transistors
- Voltage-controlled switches
- Complementary forms (nMOS, pMOS)
- Logic gates from CMOS transistors
- Logic gates implement particular Boolean functions
- N inputs, 1 output
- Serial and parallel switches
- Dual structure
- P-type: pull up, transmit 1
- N-type: pull down, transmit 0
- Complex gates: mux
- Synchronous sequential elements
- D flip-flops
8. Combinational Logic (CL) Defined
- $y_i = f_i(x_0, \ldots, x_{n-1})$, where $x_j, y_i \in \{0, 1\}$.
- Y is a function of only X.
- If we change X, Y will change immediately (well, almost!).
- There is an implementation-dependent delay from X to Y.
9. Transistor-level Logic Circuits - NAND
- NAND gate
- Logic function: out = 0 iff both a AND b = 1, therefore $\text{out} = \overline{a \cdot b}$
- pFET network and nFET network are duals of one another.
nand (out, a, b)
How about an AND gate?
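A switch-level sketch in Python can make the duality concrete. It assumes ideal switches (a pFET conducts when its gate is 0, an nFET when its gate is 1) and ignores delay; the function names are illustrative, not from the course material. It also answers the AND question the usual way: a NAND followed by an inverter.

```python
from itertools import product


def pfet_on(gate: int) -> bool:
    """An ideal pFET conducts when its gate is 0."""
    return gate == 0


def nfet_on(gate: int) -> bool:
    """An ideal nFET conducts when its gate is 1."""
    return gate == 1


def nand(a: int, b: int) -> int:
    pull_up = pfet_on(a) or pfet_on(b)      # two pFETs in parallel: Vdd to out
    pull_down = nfet_on(a) and nfet_on(b)   # two nFETs in series: out to GND (the dual)
    # In a complementary gate exactly one network conducts for every input.
    assert pull_up != pull_down
    return 1 if pull_up else 0


def inverter(a: int) -> int:
    pull_up, pull_down = pfet_on(a), nfet_on(a)
    assert pull_up != pull_down
    return 1 if pull_up else 0


def and_gate(a: int, b: int) -> int:
    # Static CMOS gates are inverting, so AND = NAND followed by an inverter.
    return inverter(nand(a, b))


for a, b in product((0, 1), repeat=2):
    print(f"a={a} b={b} nand={nand(a, b)} and={and_gate(a, b)}")
```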
10. Combinational logic summary
- Logic functions, truth tables, and switches
- NOT, AND, OR, NAND, NOR, XOR, . . ., minimal set
- Axioms and theorems of Boolean algebra
- Proofs by re-writing and perfect induction
- Gate logic
- Networks of Boolean functions and their time behavior
- Canonical forms
- Two-level and incompletely specified functions
- Optimization
- Two-level simplification using K-maps
- Automation of simplification
- Multi-level logic
- Later
- Design case studies
- Time behavior
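Proof by perfect induction simply means checking every input combination. A minimal Python sketch (function names are our own) applies it to De Morgan's law, to a non-theorem, and to the three-variable consensus theorem:

```python
from itertools import product


def equivalent(f, g, n_inputs: int) -> bool:
    """Perfect induction: f == g iff they agree on all 2**n_inputs input combinations."""
    return all(f(*xs) == g(*xs) for xs in product((0, 1), repeat=n_inputs))


def nor(x, y):
    return 1 - (x | y)


def and_of_complements(x, y):
    return (1 - x) & (1 - y)


def or_of_complements(x, y):
    return (1 - x) | (1 - y)


def consensus_lhs(x, y, z):
    return (x & y) | ((1 - x) & z) | (y & z)


def consensus_rhs(x, y, z):
    return (x & y) | ((1 - x) & z)


print(equivalent(nor, and_of_complements, 2))       # True: De Morgan, (x + y)' = x'y'
print(equivalent(nor, or_of_complements, 2))        # False: (x + y)' != x' + y'
print(equivalent(consensus_lhs, consensus_rhs, 3))  # True: xy + x'z + yz = xy + x'z
```

The cost grows as 2^n, which is why rewriting with axioms and theorems is preferred for functions of many variables.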
11. Transistor-level Logic Circuits - Latch
- Positive level-sensitive latch
- Transistor level
D Flip-Flop
- Positive edge-triggered flip-flop built from two level-sensitive latches
[Figure: latch and master-slave flip-flop schematics, each driven by clk]
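The two-latch construction can be illustrated with a zero-delay behavioral model (not a transistor-level one). This Python sketch assumes the master latch is transparent while clk is low and the slave while clk is high; the class names are hypothetical.

```python
class DLatch:
    """Positive level-sensitive D latch: transparent while en = 1, holds while en = 0."""

    def __init__(self) -> None:
        self.q = 0

    def eval(self, d: int, en: int) -> int:
        if en:
            self.q = d
        return self.q


class EdgeTriggeredDFF:
    """Positive edge-triggered D flip-flop built from two latches (master-slave).

    The master is transparent while clk = 0 and the slave while clk = 1, so the
    slave only ever passes the value the master held at the rising edge.
    """

    def __init__(self) -> None:
        self.master = DLatch()
        self.slave = DLatch()

    def eval(self, d: int, clk: int) -> int:
        m = self.master.eval(d, 1 - clk)  # master enabled by the inverted clock
        return self.slave.eval(m, clk)


# (clk, d) samples: D changes while clk is high in the third sample.
wave = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
latch, dff = DLatch(), EdgeTriggeredDFF()
print([latch.eval(d, clk) for clk, d in wave])  # [0, 1, 0, 0, 0]: latch follows D while clk = 1
print([dff.eval(d, clk) for clk, d in wave])    # [0, 1, 1, 1, 0]: changes only at rising edges
```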
12. D Flip-Flop
- Make S and R complements of each other in the master stage
- Eliminates the 1s-catching problem
- Input only needs to settle by the clock edge
- Can't just hold the previous value (must have a new value ready every clock period)
- The value of D just before the clock goes low is what is stored in the flip-flop
- Can make an R-S flip-flop by adding logic to make $D = S + R'Q$
[Figure: master-slave D flip-flop, 10 gates]
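The next-state logic $D = S + R'Q$ can be checked against the R-S behavior with a few lines of Python (the function name is hypothetical):

```python
def rs_next_state(s: int, r: int, q: int) -> int:
    """Logic in front of a D flip-flop that makes it act as an R-S flip-flop: D = S + R'Q."""
    return s | ((1 - r) & q)


for s, r, name in ((0, 0, "hold"), (0, 1, "reset"), (1, 0, "set")):
    print(f"S={s} R={r} ({name}): Q=0 -> {rs_next_state(s, r, 0)}, Q=1 -> {rs_next_state(s, r, 1)}")
```

For S = R = 1, which is normally disallowed for an R-S flip-flop, this particular equation yields 1, so set takes priority.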
13. Timing Methodologies
- Rules for interconnecting components and clocks
- Guarantee proper operation of system when strictly followed
- Approach depends on building blocks used for memory elements
- Focus on systems with edge-triggered flip-flops
- Found in programmable logic devices
- Many custom integrated circuits focus on level-sensitive latches
- Basic rules for correct timing:
- (1) Correct inputs, with respect to time, are provided to the flip-flops
- (2) No flip-flop changes state more than once per clocking event
14. Timing Methodologies (cont'd)
- Definition of terms
- Clock: periodic event, causes state of memory element to change; can be rising or falling edge, or high or low level
- Setup time: minimum time before the clocking event by which the input must be stable (Tsu)
- Hold time: minimum time after the clocking event until which the input must remain stable (Th)
- There is a timing "window" around the clocking event during which the input must remain stable and unchanged in order to be recognized
[Figure: data and clock waveforms marking where data is changing and where it must be stable]
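The setup/hold window can be expressed as a simple check. In this Python sketch the function name and the numbers are hypothetical, and a transition exactly on a window boundary is treated as legal (an assumption):

```python
def meets_setup_hold(edge: float, data_changes: list[float], t_su: float, t_h: float) -> bool:
    """True if no data transition falls strictly inside (edge - t_su, edge + t_h)."""
    return not any(edge - t_su < t < edge + t_h for t in data_changes)


# Illustrative numbers (ns): clock edge at 10, Tsu = 2, Th = 1.
print(meets_setup_hold(10.0, [5.0], 2.0, 1.0))   # True: stable well before the edge
print(meets_setup_hold(10.0, [8.5], 2.0, 1.0))   # False: setup violation
print(meets_setup_hold(10.0, [10.5], 2.0, 1.0))  # False: hold violation
print(meets_setup_hold(10.0, [11.5], 2.0, 1.0))  # True: changes after the window
```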
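15. What's an FSM?
- Next state is a function of state and input
- Moore machine: output is a function of the state
- Mealy machine: output is a function of state and input
[Figure: Moore state drawn as "State / output" with transitions on inputA and inputB; Mealy state drawn as "State" with transitions labeled inputA/outputA and inputB/outputB]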
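To make the Moore/Mealy distinction concrete, here is a hedged Python sketch of a rising-edge detector written both ways (the example machine and the state names are our own choice, not from the slide). The Mealy version reacts in the same cycle as the input; the Moore version needs an extra state and its output appears one cycle later.

```python
def mealy_edge_detector(bits: list[int]) -> list[int]:
    """Mealy: output = f(state, input). The state is simply the previous input."""
    prev = 0
    outputs = []
    for x in bits:
        outputs.append(x & (1 - prev))  # 1 when the input rises 0 -> 1
        prev = x                        # next state
    return outputs


def moore_edge_detector(bits: list[int]) -> list[int]:
    """Moore: output = f(state). Needs a separate EDGE state whose output is 1."""
    output = {"ZERO": 0, "EDGE": 1, "ONE": 0}
    next_state = {
        ("ZERO", 0): "ZERO", ("ZERO", 1): "EDGE",
        ("EDGE", 0): "ZERO", ("EDGE", 1): "ONE",
        ("ONE", 0): "ZERO", ("ONE", 1): "ONE",
    }
    state = "ZERO"
    outputs = []
    for x in bits:
        outputs.append(output[state])  # depends only on the current state
        state = next_state[(state, x)]
    return outputs


stimulus = [0, 1, 1, 0, 1]
print(mealy_edge_detector(stimulus))  # [0, 1, 0, 0, 1]
print(moore_edge_detector(stimulus))  # [0, 0, 1, 0, 0]
```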
16. Formal Design Process for FSMs
- Review of design steps
- 1. Circuit functional specification
- 2. State transition diagram
- 3. Symbolic state transition table
- 4. Encoded state transition table
- 5. Derive logic equations
- 6. Circuit diagram
- FFs for state
- CL for NS and OUT
- Logic equations from table: OUT = PS, NS = PS xor IN
- Circuit diagram
- XOR gate for NS calculation
- DFF to hold present state
- No logic needed for output
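Assuming the functional specification is an odd-parity checker (which is what NS = PS xor IN with OUT = PS computes), the finished circuit behaves like this Python sketch; the function name is hypothetical.

```python
def odd_parity_checker(bits: list[int]) -> list[int]:
    """Moore machine from the encoded table: NS = PS xor IN, OUT = PS.

    Returns OUT for each clock period, plus the value after the last edge.
    OUT = 1 means an odd number of 1s has been seen so far.
    """
    ps = 0                  # DFF reset to the "even" state
    outputs = []
    for x in bits:
        outputs.append(ps)  # OUT = PS: no output logic needed
        ps ^= x             # XOR gate computes NS; the DFF loads it at the clock edge
    outputs.append(ps)
    return outputs


print(odd_parity_checker([1, 0, 1, 1]))  # [0, 1, 1, 0, 1]
```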
17. Composing FSMs into larger designs
[Figure: FSMs and combinational logic (CL) blocks composed into a larger design]
18. Sequential Synchronous Elements
- Basic registers
- Common control, MUXes
- Simple, important FSMs
- simple internal feedback
- Ring counters, Pattern detectors
- Binary Counters
- Universal Shift Register
- Using Counters to build controllers
- Simplify control by controlling simpler FSM
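A universal shift register and a ring counter built from it can be sketched behaviorally in Python. The mode names and the bit ordering (q[0] is the leftmost bit) are assumptions for illustration; in hardware each flip-flop would be fed by a 4:1 mux selected by the mode.

```python
def usr_next(q, mode, load=None, left_in=0, right_in=0):
    """Next state of an n-bit universal shift register (q[0] is the leftmost bit)."""
    if mode == "hold":
        return list(q)
    if mode == "shift_right":        # every bit moves one place right; left_in enters at q[0]
        return [left_in] + list(q[:-1])
    if mode == "shift_left":         # every bit moves one place left; right_in enters at q[-1]
        return list(q[1:]) + [right_in]
    if mode == "load":               # parallel load
        return list(load)
    raise ValueError(f"unknown mode: {mode!r}")


def ring_counter(n, steps):
    """Ring counter: a shift register whose rightmost output feeds back to its left input."""
    q = [1] + [0] * (n - 1)
    states = [q]
    for _ in range(steps):
        q = usr_next(q, "shift_right", left_in=q[-1])
        states.append(q)
    return states


for state in ring_counter(4, 4):
    print(state)
```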
19. 150 and the changing times
- Advancing technology changes the trade-offs and design techniques
- 2x transistors per chip every 18 months
- ASIC, programmable logic, microprocessor
- Programmable logic invests chip real estate to reduce design time and time to market
- FPGA
- Programmable interconnect
- Configurable logic blocks
- LUT storage
- Block RAM
- IO blocks
- PLAs
- General devices for SoP or PoS logic
20. Virtex-E Configurable Logic Block (CLB)
- CLB = 4 logic cells (LC) in two slices
- LC = 4-input function generator, carry logic, storage element
- 80 x 120 CLB array on the 2000E
[Figure labels: FF or latch; 16x1 synchronous RAM]
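A 4-input function generator is a 16x1 memory addressed by its inputs, which is why it can implement any 4-input function. A hedged Python sketch follows; the address ordering (k = 8a + 4b + 2c + d) and the function names are assumptions, not the actual Virtex-E bitstream format.

```python
def lut4_config(func) -> int:
    """16-bit LUT contents for a 4-input Boolean function.

    Bit k of the result holds func(a, b, c, d) where k = 8a + 4b + 2c + d.
    """
    config = 0
    for k in range(16):
        a, b, c, d = (k >> 3) & 1, (k >> 2) & 1, (k >> 1) & 1, k & 1
        if func(a, b, c, d):
            config |= 1 << k
    return config


def lut4_eval(config: int, a: int, b: int, c: int, d: int) -> int:
    """A LUT is a 16x1 memory: the four inputs form the read address."""
    address = (a << 3) | (b << 2) | (c << 1) | d
    return (config >> address) & 1


xor4 = lut4_config(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = lut4_config(lambda a, b, c, d: a & b & c & d)
print(hex(xor4), hex(and4))  # 0x6996 0x8000
```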
21. HDLs
- Basic idea
- Language constructs describe circuits with two basic forms:
- Structural descriptions: similar to a hierarchical netlist.
- Behavioral descriptions: use higher-level constructs (similar to conventional programming).
- Originally designed to help in abstraction and simulation.
- Now logic synthesis tools exist to automatically convert from behavioral descriptions to a gate netlist.
- Greatly improves designer productivity.
- However, this may lead you to falsely believe that hardware design can be reduced to writing programs!
- Structural example (Verilog):
    module decoder (output x0, x1, x2, x3, input a, b);
      wire abar, bbar;
      not (bbar, b);
      not (abar, a);
      nand (x0, abar, bbar);
      nand (x1, abar, b);
      nand (x2, a, bbar);
      nand (x3, a, b);
    endmodule
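- Behavioral example (Verilog; outputs are active-low, matching the NAND netlist):
    module decoder (output reg x0, x1, x2, x3, input a, b);
      always @(*)
        case ({a, b})
          2'b00: {x0, x1, x2, x3} = 4'b0111;
          2'b01: {x0, x1, x2, x3} = 4'b1011;
          2'b10: {x0, x1, x2, x3} = 4'b1101;
          2'b11: {x0, x1, x2, x3} = 4'b1110;
        endcase
    endmodule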
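The structural decoder uses inverters and NAND gates, so its outputs are active-low: exactly one output is 0. A small Python check, with hypothetical helper names, can confirm by enumeration that a table-driven behavioral description matches the gate netlist, which is the same question a simulator answers when comparing the two HDL styles.

```python
from itertools import product


def inv(a: int) -> int:
    return 1 - a


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def decoder_structural(a: int, b: int) -> tuple[int, int, int, int]:
    """The gate netlist: two inverters and four NAND gates."""
    abar, bbar = inv(a), inv(b)
    return nand(abar, bbar), nand(abar, b), nand(a, bbar), nand(a, b)


# Behavioral description as a table: output i is driven low when (a, b) encodes i.
DECODER_TABLE = {
    (0, 0): (0, 1, 1, 1),
    (0, 1): (1, 0, 1, 1),
    (1, 0): (1, 1, 0, 1),
    (1, 1): (1, 1, 1, 0),
}

for a, b in product((0, 1), repeat=2):
    print((a, b), decoder_structural(a, b), DECODER_TABLE[(a, b)])
```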
22. Finite State Machines in Verilog
[Figure: inputs and current state feed next-state combinational logic and output combinational logic; outputs labeled Mealy outputs and Moore outputs]
23. Design Methodology in Detail
Postsynthesis Design Validation
Design Specification
Postsynthesis Timing Verification
Design Partition
Design Entry Behavioral Modeling
Test Generation and Fault Simulation
Simulation/Functional Verification
Cell Placement/Scan Insertion/Routing
Verify Physical and Electrical Rules
Design Integration And Verification
Synthesize and Map Gate-level Net List
Pre-Synthesis Sign-Off
Design Sign-Off
Synthesize and Map Gate-level Net List
24. Configuring CLBs
[Figure: CLB configuration, output labeled out]
25. Configuring Routes
[Figure: LUT bit patterns and routing configuration with inputs A1, A2, A3, a flip-flop (FF), input in, and output out]
nextstate = A2 xor A1
out (A1 A2 A3)
26. Timing for Synchronous Circuits
- In general, for correct operation, $T \ge \tau_{clk \to Q} + \tau_{CL} + \tau_{setup}$ for all paths.
- How do we enumerate all paths?
- Any circuit input or register output to any register input or circuit output.
- Setup time for circuit outputs depends on what it connects to.
- Clk-to-Q time for circuit inputs depends on where it comes from.
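The inequality can be checked mechanically once every path's delays are known. This hedged Python sketch uses made-up delays (all names and numbers are hypothetical) and picks the path that sets the minimum clock period:

```python
from dataclasses import dataclass


@dataclass
class TimingPath:
    name: str
    t_launch: float   # clk->Q of the source register (or arrival time of a circuit input)
    t_comb: float     # worst-case combinational delay along the path
    t_capture: float  # setup time of the destination register (or requirement at an output)

    def required_period(self) -> float:
        return self.t_launch + self.t_comb + self.t_capture


def min_clock_period(paths: list[TimingPath]) -> tuple[str, float]:
    """T >= clk->Q + CL + setup must hold on every path, so the slowest path sets T."""
    worst = max(paths, key=TimingPath.required_period)
    return worst.name, worst.required_period()


# Hypothetical delays in ns.
paths = [
    TimingPath("in -> R1", 2.0, 2.0, 0.5),
    TimingPath("R1 -> R2", 1.0, 4.5, 0.5),
    TimingPath("R2 -> R3", 1.0, 6.0, 0.5),
]
name, period = min_clock_period(paths)
print(name, period, f"{1e3 / period:.0f} MHz")
```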
27. Typical SRAM Organization: 16-word x 4-bit
[Figure: array of SRAM cells, one row per word (Word 0, Word 1, ..., Word 15), four cells per word; address inputs A0-A3 feed an address decoder that selects one word; data inputs Din 0-3 and write enable WrEn drive the columns; outputs Dout 0-3]
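A behavioral Python sketch of the 16 x 4 organization, assuming the address is given as (A3, A2, A1, A0) and that a write cycle also returns the written word on Dout (a simplification); the class and method names are hypothetical.

```python
class SRAM16x4:
    """Behavioral model of a 16-word x 4-bit SRAM: one row of cells per word."""

    def __init__(self) -> None:
        self.cells = [[0] * 4 for _ in range(16)]  # Word 0 .. Word 15, 4 cells each

    @staticmethod
    def decode(a3: int, a2: int, a1: int, a0: int) -> list[int]:
        """Address decoder: A3..A0 -> one-hot word-select lines."""
        index = (a3 << 3) | (a2 << 2) | (a1 << 1) | a0
        return [1 if word == index else 0 for word in range(16)]

    def cycle(self, address, wr_en, din=None):
        """One access. With WrEn = 1, Din 0..3 are written into the selected word.
        Returns Dout 0..3 of the selected word."""
        select = self.decode(*address)
        word = select.index(1)
        if wr_en:
            self.cells[word] = list(din)
        return list(self.cells[word])


mem = SRAM16x4()
mem.cycle((0, 1, 0, 1), wr_en=1, din=[1, 0, 1, 1])  # write word 5
print(mem.cycle((0, 1, 0, 1), wr_en=0))             # read word 5
print(mem.cycle((0, 0, 0, 0), wr_en=0))             # word 0 is still all zeros
```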
28. Classical DRAM Organization (Square)
[Figure: RAM cell array; a row decoder driven by the row address asserts word (row) select lines; bit (data) lines run to a column selector and I/O circuits driven by the column address; each intersection represents a 1-T DRAM cell]
- Square keeps the wires short: power and speed advantages. Less RC means faster precharge and discharge, which means faster access time!
- Row and column address together select 1 bit at a time
29. DRAM with Column Buffer
[Figure: 11 address bits (A0-A10) drive a row decoder into a 2,048 x 2,048 storage array; a word line selects a row of cells, which pass through sense amps into column latches and a MUX]
- Pull column into fast buffer storage; access a sequence of bits from there
30. Digital Arithmetic
- Circuit design for unsigned addition
- Full adder per bit slice
- Delay limited by carry propagation
- Ripple is algorithmically slow, but wires are short
- Carry select
- Simple, resource-intensive
- Excellent layout
- Carry look-ahead
- Excellent asymptotic behavior
- Great at the board level, but wire-length effects are significant on chip
- Digital number systems
- How to represent negative numbers
- Simple operations
- Clean algorithmic properties
- 2's complement is most widely used
- Circuit for unsigned arithmetic
- Subtract by complement and carry in
- Overflow when cin xor cout of the sign bit is 1
31. 2's Complement Adder/Subtractor
$A - B = A + (-B) = A + \overline{B} + 1$
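The adder/subtractor and the overflow rule can be checked with a hedged bit-level Python sketch: a ripple-carry chain of full adders on 4-bit, LSB-first bit lists. The helper names are our own.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def add_sub(a_bits: list[int], b_bits: list[int], sub: int) -> tuple[list[int], int]:
    """n-bit ripple-carry adder/subtractor on LSB-first bit lists.

    sub = 1 inverts B (one XOR gate per bit) and sets the carry-in to 1,
    so the circuit computes A - B = A + B' + 1.
    Overflow = carry into the sign bit XOR carry out of the sign bit.
    """
    carry = sub
    carry_into_sign = 0
    result = []
    for i, (a, b) in enumerate(zip(a_bits, b_bits)):
        if i == len(a_bits) - 1:
            carry_into_sign = carry
        s, carry = full_adder(a, b ^ sub, carry)
        result.append(s)
    return result, carry_into_sign ^ carry


def to_bits(x: int, n: int) -> list[int]:
    return [(x >> i) & 1 for i in range(n)]  # two's complement, LSB first


def to_int(bits: list[int]) -> int:
    value = sum(bit << i for i, bit in enumerate(bits))
    return value - (1 << len(bits)) if bits[-1] else value


for x, y, sub in ((5, 3, 1), (7, 1, 0), (-8, 1, 1)):
    bits, overflow = add_sub(to_bits(x, 4), to_bits(y, 4), sub)
    print(x, "-" if sub else "+", y, "=", to_int(bits), "overflow" if overflow else "")
```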
32. Digital design - as we've seen it
System specification (in words)
Datapath specification
Controller specification
FSM generation
Comb. logic operations
Verilog dataflow
STT / STD / Encoding
Logic nextstate/outputs
Gates / LUTs
Verilog behavior
Gates / LUTs / FF
33. Final Example: Ant Brain (Ward, MIT)
- Sensors: L and R antennae, 1 if touching a wall
- Actuators: F - forward step, TL/TR - turn left/right slightly
- Goal: find a way out of the maze
- Strategy: keep the wall on the right
34. Serial Line TX/RX: dealing with I/O
35. The GAME
- CP1: N64 interface
- CP2: Digital video encoder
- CP3: SDRAM controller
- CP4: IEEE 802.15.4 (cc2420) interface
- Project CP: game engine
- Endgame
[Figure: system block diagram. Inside the FPGA: N64 controller interface and joystick interface (player-0 and player-1 input), game physics, render engine, SDRAM control, and video encode. External: SDRAM (32-bit data) and an ADV7194 video encoder (8-bit ITU 601/656 input, composite video output).]
36. Computer Organization
- Computer design as an application of digital logic design procedures
- Computer = processing unit + memory system
- Processing unit = control + datapath
- Control = finite state machine
- Inputs = machine instruction, datapath conditions
- Outputs = register transfer control signals, ALU operation codes
- Instruction interpretation = instruction fetch, decode, execute
- Datapath = functional units + registers + interconnect
- Functional units = ALU, multipliers, dividers, etc.
- Registers = program counter, shifters, storage registers
- Interconnect = busses and wires
- Instruction interpreter vs. fixed-function device
37. Design hierarchy
[Figure: hierarchy from system down to control and data-path, then to code registers, state registers, combinational logic, multiplexer, comparator, register, and logic, and finally to switching networks]
38. Datapath vs. Control
[Figure: datapath and controller connected through control points and signals]
- Datapath: storage, FUs, and interconnect sufficient to perform the desired functions
- Inputs are control points
- Outputs are signals
- Controller: state machine to orchestrate operation on the datapath
- Based on desired function and signals
39. Datapath Design
- Datapath consists of state (registers, register file), function units (adders, ALUs), and interconnect (mux, tri-state bus)
- It can perform certain register transfers: from source registers, through function units and interconnect, to the destination register
- A set of register transfers occurs on each cycle
- Each datapath element has control points
- Reg (LD), FU (op), MUX (sel), tri-state (OE)
- Controller asserts the proper control points to cause the datapath to carry out the requested register transfers
- The RTL associated with each step in the high-level algorithm determines the STD of the controller
- Controller inputs are datapath outputs (conditions)
- Controller outputs are datapath inputs (control points)
40. Array Multiplier
Generates all n partial products simultaneously.
Each row: n-bit adder with AND gates
What is the critical path?
41. Shift and Add Multiplier
- Sums each partial product, one at a time.
- In binary, each partial product is a shifted version of A or 0.
- Control algorithm:
- 1. P ← 0, A ← multiplicand, B ← multiplier
- 2. If LSB of B = 1 then add A to P, else add 0
- 3. Shift [P | B] right 1
- 4. Repeat steps 2 and 3 n-1 times.
- 5. [P | B] has product.
- Cost ~ n, ~ n clock cycles.
- What is the critical path for determining the min clock period?
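A Python sketch of the control algorithm for unsigned n-bit operands. It models the 2n-bit [P | B] register as two integers and assumes the adder's carry-out shifts into P's MSB during step 3 (the slide does not say how the carry is handled); the function name is hypothetical.

```python
def shift_add_multiply(a: int, b: int, n: int) -> int:
    """Unsigned n-bit shift-and-add multiply; returns the 2n-bit product from [P | B]."""
    mask = (1 << n) - 1
    a &= mask                                # A <- multiplicand
    p, b = 0, b & mask                       # step 1: P <- 0, B <- multiplier
    for _ in range(n):                       # steps 2 and 3, performed n times in total
        total = p + (a if b & 1 else 0)      # step 2: add A or 0 (up to n + 1 bits with carry)
        b = ((total & 1) << (n - 1)) | (b >> 1)  # step 3: P's LSB shifts into B's MSB
        p = total >> 1                       #         carry shifts into P's MSB
    return (p << n) | b                      # step 5: [P | B] holds the product


print(shift_add_multiply(3, 5, 4), shift_add_multiply(15, 15, 4))  # 15 225
```

The adder sits in the loop every cycle, which is where the critical-path question on the slide comes from.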
42. DIVIDE HARDWARE Version 2
- 32-bit Divisor register, 32-bit ALU, 64-bit Remainder register, 32-bit Quotient register
[Figure: Divisor (32 bits) feeds a 32-bit add/sub ALU operating on the Remainder (64 bits, shift left, write); Quotient (32 bits, shift left); Control]
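The datapath above supports restoring division. This Python sketch (function name hypothetical) follows that style at a configurable width n; Python integers are unbounded, so it sidesteps register-width details that the real 32-bit ALU must handle.

```python
def restoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Unsigned n-bit restoring division, modeled on the version-2 datapath.

    The 2n-bit Remainder register starts with the dividend in its right half.
    Each iteration: shift Remainder left, subtract Divisor from the left half,
    and keep the difference (quotient bit 1) or restore (quotient bit 0).
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    mask = (1 << n) - 1
    remainder = dividend & mask
    quotient = 0
    for _ in range(n):
        remainder <<= 1                          # shift Remainder left 1
        difference = (remainder >> n) - divisor  # ALU: left half - Divisor
        if difference >= 0:
            remainder = (difference << n) | (remainder & mask)
            quotient = (quotient << 1) | 1       # shift Quotient left, insert 1
        else:
            quotient <<= 1                       # restore: left half unchanged, insert 0
    return quotient, remainder >> n              # remainder ends in the left half


print(restoring_divide(7, 2, 4))  # (3, 1)
```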
43. Register Transfers - Interconnect
- Point-to-point connection
- Dedicated wires
- Muxes on inputs of each register
- Common input from multiplexer
- Load enables for each register
- Control signals for multiplexer
- Common bus with output enables
- Output enables and load enables for each register
44. Register Transfer Level Descriptions
- RTL comprises a set of register transfers with optional operators as part of the transfer.
- Example:
- regA ← regB
- regC ← regA + regB
- if (start = 1) regA ← regC
- Personal style:
- Use ";" to separate transfers that occur on separate cycles.
- Use "," to separate transfers that occur on the same cycle.
- Example (2 cycles):
- regA ← regB, regB ← 0;
- regC ← regA;
- A standard high-level representation for describing systems.
- It follows from the fact that all synchronous digital systems can be described as a set of state elements connected by combinational logic (CL) blocks.
45. List Processor Example
- RTL gives us a framework for making high-level optimizations.
- Fixed-function unit
- Approach extends to instruction interpreters
- General design procedure outline:
- 1. Problem, constraints, and component library spec.
- 2. Algorithm selection
- 3. Micro-architecture specification
- 4. Analysis of cost, performance, power
- 5. Optimizations, variations
- 6. Detailed design
46. 3. Architecture 1
Direct implementation of RTL description
[Figure: datapath and controller]
If (START = 1) NEXT ← 0, SUM ← 0;
repeat {
  SUM ← SUM + Memory[NEXT + 1];
  NEXT ← Memory[NEXT];
} until (NEXT = 0);
R ← SUM, DONE ← 1;
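A software model of the list-sum RTL can help check the algorithm before building the datapath. This Python sketch assumes the layout the RTL implies: the list starts at address 0, each node is [next pointer, value], and a next pointer of 0 ends the list. The function name and example memory contents are hypothetical.

```python
def list_sum(memory: list[int]) -> int:
    """Sum the values of a linked list stored as [next, value] pairs in memory."""
    nxt, total = 0, 0              # NEXT <- 0, SUM <- 0
    while True:                    # repeat
        total += memory[nxt + 1]   #   SUM <- SUM + Memory[NEXT + 1]
        nxt = memory[nxt]          #   NEXT <- Memory[NEXT]
        if nxt == 0:               # until (NEXT = 0)
            return total           # R <- SUM, DONE <- 1


# Nodes at addresses 0, 4, and 6 hold the values 10, 20, and 30.
print(list_sum([4, 10, 0, 0, 6, 20, 0, 30]))  # 60
```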
47. Approaching an ISA
- Instruction Set Architecture
- Defines set of operations, instruction format, hardware-supported data types, named storage, addressing modes, sequencing
- Meaning of each instruction is described by RTL on architected registers and memory
- Given technology constraints, assemble adequate datapath
- Architected storage mapped to actual storage
- Function units to do all the required operations
- Possible additional storage (e.g., MAR, MBR, ...)
- Interconnect to move information among regs and FUs
- Map each instruction to a sequence of RTLs
- Collate sequences into symbolic controller STD
- Lower symbolic STD to control points
- Implement controller
48. Instruction Sequencing
- Example: an instruction to add the contents of two registers (Rx and Ry) and place the result in a third register (Rz)
- Step 1: Fetch the ADD instruction from memory into an instruction register
- Step 2: Decode instruction
- Instruction in IR has the code of an ADD instruction
- Register indices used to generate output enables for registers Rx and Ry
- Register index used to generate load signal for register Rz
- Step 3: Execute instruction
- Enable Rx and Ry outputs and direct to ALU
- Set up ALU to perform ADD operation
- Direct result to Rz so that it can be loaded into the register
49. Instruction Execution
- Control State Diagram (for each diagram)
- Reset
- Fetch instruction
- Decode
- Execute
- Instructions partitioned into three classes
- Branch
- Load/store
- Register-to-register
- Different sequence through diagram for each instruction type
- Controller manipulates the datapath to perform the instruction
[Figure: control state diagram with states Reset, Init (Initialize Machine), Fetch Instr., XEQ Instr., Load/Store, Branch (Branch Taken, Branch Not Taken), Register-to-Register, and Incr. PC]
50. Networking Layers
[Figure: layered stack from Application (send data to dest) down to the actual analog transmitter and analog receiver, shown against time]
51. What the PHY does
- Code, transmit, receive, and decode frames
- Activation and deactivation of the radio transceiver
- Energy detection (ED) within the current channel
- Link quality indication (LQI) for received packets
- Channel selection
- Clear channel assessment (CCA) for CSMA-CA
52. CSMA
- Carrier Sense Multiple Access with Collision Avoidance (CSMA-CA)
- Listen for a period of time to hear if the channel is free (CCA)
- If you hear traffic, back off for a random period of time
- Typically exponentially increasing backoff
- Try again
- May also do a random delay before the first CCA
- If the channel is clear, transmit
- Ethernet does CSMA-CD (collision detection)
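The backoff loop can be sketched in Python, loosely modeled on IEEE 802.15.4's unslotted CSMA-CA. The parameter names and default values are illustrative assumptions, and `channel_busy` stands in for the PHY's CCA.

```python
import random


def csma_ca(channel_busy, min_be=3, max_be=5, max_backoffs=4, rng=None):
    """Unslotted CSMA-CA with binary exponential backoff (parameters are illustrative).

    Before each clear channel assessment (CCA), wait a random number of backoff
    periods in [0, 2**be - 1]. If the channel is busy, grow the window (up to max_be)
    and try again. Returns the list of delays waited, or None if access failed.
    """
    rng = rng if rng is not None else random.Random()
    be = min_be
    delays = []
    for _ in range(max_backoffs + 1):
        delay = rng.randrange(2 ** be)   # random delay before the CCA
        delays.append(delay)
        if not channel_busy():           # CCA says the channel is clear: transmit
            return delays
        be = min(be + 1, max_be)         # heard traffic: back off with a larger window
    return None                          # too many busy CCAs: report failure


busy_pattern = iter([True, True, False])  # busy, busy, then clear
print(csma_ca(lambda: next(busy_pattern), rng=random.Random(150)))
```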
53. Error Correction Codes (ECC)
- Memory systems generate errors (accidentally flipped bits)
- DRAMs store very little charge per bit
- Soft errors occur occasionally when cells are struck by alpha particles or other environmental upsets.
- Less frequently, hard errors can occur when chips permanently fail.
- Problem gets worse as memories get denser and larger
- Where is perfect memory required?
- Servers, spacecraft/military computers, eBay, ...
- Memories are protected against failures with ECCs
- Extra bits are added to each data word
- Used to detect and/or correct faults in the memory system
- In general, each possible data word value is mapped to a unique code word. A fault changes a valid code word to an invalid one, which can be detected.
54. Correcting Code Concept
Space of possible bit patterns ($2^N$)
- Detection: bit pattern fails codeword check
- Correction: map to nearest valid code word
- Example: parity bit
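55. SECDED
Position   1    2    3    4    5    6    7
Binary     001  010  011  100  101  110  111
Role       P1   P2   d1   P3   d2   d3   d4
- Position of error = C3C2C1, where Ci is the parity of group i (the positions whose binary index has a 1 in bit i, counting the least significant bit as bit 1)
- You receive:
- 1111110
- 0000010
- 1010010
- What is the correct value?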
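The syndrome computation can be sketched in Python for the 7-bit single-error-correcting part of the code (SECDED adds an overall parity bit, not modeled here). It assumes each received word lists positions 1 through 7 from left to right; the function names are hypothetical.

```python
def hamming74_correct(word: str) -> tuple[str, int]:
    """Correct a single-bit error in a 7-bit Hamming code word.

    `word` lists positions 1..7 left to right (P1 P2 d1 P3 d2 d3 d4).
    Returns (corrected word, syndrome); syndrome 0 means no error detected.
    """
    bits = [int(ch) for ch in word]
    syndrome = 0
    for i in range(3):                                        # C1, C2, C3
        parity = 0
        for pos in range(1, 8):
            if pos & (1 << i):                                # position belongs to group i + 1
                parity ^= bits[pos - 1]
        syndrome |= parity << i                               # syndrome = C3C2C1
    if syndrome:
        bits[syndrome - 1] ^= 1                               # flip the bit at that position
    return "".join(map(str, bits)), syndrome


def data_bits(word: str) -> str:
    return "".join(word[pos - 1] for pos in (3, 5, 6, 7))   # d1 d2 d3 d4


for received in ("1111110", "0000010", "1010010"):
    corrected, syndrome = hamming74_correct(received)
    print(received, "syndrome", syndrome, "->", corrected, "data", data_bits(corrected))
```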
56. Concept: Redundant Check
- Send a message M and a check word C
- Simple function on <M,C> to determine if both were received correctly (with high probability)
- Example: XOR all the bytes in M and append the "checksum" byte, C, at the end
- Receiver XORs <M,C>
- What should the result be?
- What errors are caught?
Bit i of C is the XOR of the ith bit of each byte
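The XOR checksum takes only a few lines of Python. This sketch uses hypothetical function names and an arbitrary sample message, and it shows one error pattern that the check catches and one that it misses.

```python
from functools import reduce
from operator import xor


def xor_checksum(data: bytes) -> int:
    """Bit i of the result is the XOR of bit i of every byte."""
    return reduce(xor, data, 0)


def make_frame(message: bytes) -> bytes:
    """Append the checksum byte C to the message M."""
    return message + bytes([xor_checksum(message)])


def frame_ok(frame: bytes) -> bool:
    """The receiver XORs all of <M, C>; the result is 0 if no detectable error occurred."""
    return xor_checksum(frame) == 0


frame = make_frame(b"EECS150")
one_flip = bytearray(frame)
one_flip[0] ^= 0x01                 # single-bit error: detected
two_flips = bytearray(frame)
two_flips[0] ^= 0x04
two_flips[3] ^= 0x04                # two errors in the same bit column: they cancel
print(frame_ok(frame), frame_ok(bytes(one_flip)), frame_ok(bytes(two_flips)))  # True False True
```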
57. CRC concept
- I have a message polynomial M(x) of degree m
- We both have a generator polynomial G(x) of degree n
- Let r(x) = remainder of $M(x)\,x^n / G(x)$
- $M(x)\,x^n = G(x)\,p(x) + r(x)$
- r(x) is of degree less than n
- What is $(M(x)\,x^n - r(x)) / G(x)$? (Over binary coefficients, subtraction is XOR, the same as addition.)
- So I send you $M(x)\,x^n - r(x)$
- An (m+n)-degree polynomial
- You divide by G(x) to check
- M(x) is just the most significant coefficients, r(x) the lower n
- The message bits are viewed as the coefficients of M(x), a polynomial over binary numbers
- Multiplying by $x^n$ puts n bits of zero at the end
- Tack on the n bits of remainder instead of the zeros
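Polynomial division over binary coefficients is just shift-and-XOR, so the CRC steps fit in a short Python sketch. Polynomials are stored as integers (bit k is the coefficient of x^k). The function names are hypothetical, and the example generator $x^4 + x + 1$ is a common textbook choice, not one specified on the slide.

```python
def poly_mod(dividend: int, generator: int) -> int:
    """Remainder of polynomial division with binary coefficients."""
    g_deg = generator.bit_length() - 1
    while dividend.bit_length() - 1 >= g_deg:
        shift = dividend.bit_length() - 1 - g_deg
        dividend ^= generator << shift      # subtraction is XOR
    return dividend


def crc_encode(message: int, generator: int) -> int:
    n = generator.bit_length() - 1          # degree of G(x)
    shifted = message << n                  # M(x) * x^n: n zeros at the end
    return shifted | poly_mod(shifted, generator)  # tack on r(x) instead of the zeros


def crc_check(codeword: int, generator: int) -> bool:
    return poly_mod(codeword, generator) == 0


G = 0b10011                                 # x^4 + x + 1
codeword = crc_encode(0b1101011011, G)
print(bin(codeword))                        # 0b11010110111110
print(crc_check(codeword, G), crc_check(codeword ^ (1 << 5), G))  # True False
```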
58. Controlling Energy Consumption
What control do you have as a designer?
- Largest contributing component to CMOS power consumption is switching power
- Factors influencing power consumption:
- n: total number of nodes in circuit
- α: activity factor (probability of each node switching)
- f: clock frequency (does this affect energy consumption?)
- Vdd: power supply voltage
- What control do you have over each factor?
- How does each affect the total energy?
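A common first-order model is $P = n \cdot \alpha \cdot C \cdot V_{dd}^2 \cdot f$, where C, the average switched capacitance per node, is an extra parameter not listed on the slide. Some texts also include a factor of 1/2 in α. This Python sketch uses hypothetical numbers and ignores leakage. It shows why lowering f reduces power but not the energy for a fixed task, while lowering Vdd reduces both.

```python
def switching_power(n_nodes: float, activity: float, c_node: float, vdd: float, f: float) -> float:
    """Dynamic switching power P = n * alpha * C * Vdd**2 * f, in watts."""
    return n_nodes * activity * c_node * vdd ** 2 * f


def task_energy(n_nodes, activity, c_node, vdd, f, cycles):
    """Energy for a task of `cycles` clock cycles: E = P * (cycles / f), so f cancels."""
    return switching_power(n_nodes, activity, c_node, vdd, f) * cycles / f


# Hypothetical circuit: 1e6 nodes, 10% activity, 10 fF per node, 1000-cycle task.
base = task_energy(1e6, 0.1, 10e-15, 1.2, 100e6, 1000)
slower = task_energy(1e6, 0.1, 10e-15, 1.2, 50e6, 1000)    # half the clock frequency
lower_v = task_energy(1e6, 0.1, 10e-15, 0.6, 100e6, 1000)  # half the supply voltage
print(base, slower, lower_v)
```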
59. Digital Design
- Given a functional description and performance, cost, and power constraints, come up with an implementation using a set of primitives.
- How do we learn how to do this?
- 1. Learn about the primitives and how to generate them.
- 2. Learn about design representation.
- 3. Learn formal methods to optimally manipulate the representations.
- 4. Look at design examples.
- 5. Use trial and error: CAD tools and prototyping.
- Digital design is in some ways more an art than a science. The creative spirit is critical in combining primitive elements and other components in new ways to achieve a desired function.
- However, unlike art, we have objective measures of a design: performance, cost, power.
60. Traversing Digital Design
[Figure labels: CS61C, EE 40]
61. So what's on the final?
- 5 questions (one full design problem)
- Focused on the latter third, but builds upon everything we've done
- Digital arithmetic
- Datapath / control / computer organization
- RTL
- Error coding
- But also:
- Combinational logic, timing and delays, controller design
- Partly recalling what was presented, partly putting your knowledge to work to solve a new problem
62. Maintaining the Digital Abstraction (in an analog world)
- Circuit design with very sharp transitions
- Noise margin for logical values
- Carefully designed storage elements (SE)
- Internal feedback
- Structured system design
- SE + CL; cycles must cross SE
- Timing methodology
- All SE advance state together
- All inputs stable across state change
- Channel coding, framing, encapsulation
- Error coding, detection, correction
63. Moore's Law: 2x stuff per year or so
64. Bell's Law: new computer class per 10 years
[Figure: log (people per computer) vs. year]
- Streaming information to/from the physical world
- Enabled by technological opportunities
- Smaller, more numerous, and more intimately connected
- Ushers in a new kind of application
- Ultimately used in many ways not previously imagined
65. What to take away from EECS 150
- Hands-on understanding of digital design techniques and their relationship to the underlying technology
- Experience with the fundamental process of the design of digital systems
- Components, DP, RTL, FSM, controller
- An intellectual toolbox for a changing world