Title: EECS 150 - Components and Design Techniques for Digital Systems Lec 26 - WrapUp
- David Culler
- Electrical Engineering and Computer Sciences
- University of California, Berkeley
- http://www.eecs.berkeley.edu/culler
- http://inst.eecs.berkeley.edu/cs150
http://www.youtube.com/watch?v=Tb2Q1GGEYA4
2. Announcements
- Final Exam
- Tuesday, December 18, 2007, 5-8 PM
- Location: 106 Stanley
- Course Control Number: 26455
- Final Exam Group: 15
- TA office hours Tuesday AM
- Review: Sunday 12/16, 5-7, 125 Cory
- Project partner forms into HW box
- Project presentations Friday as per sign-up
- No lecture Thursday, no labs, no discussion
- Office hours
- HW 10 in box Wednesday
3. Recall Day 1
4. Congratulations
- You have accomplished a phenomenal task.
5. Day 1: What is EECS150 about?
6. Day 1: We Will Learn in EECS 150
- Language of logic design
- Logic optimization, state, timing, CAD tools
- Concept of state in digital systems
- Analogous to variables and program counters in software systems
- Hardware system building
- Datapath + control = digital systems
- Hardware system design methodology
- Hardware description languages: Verilog
- Tools to simulate design behavior: output = function(inputs)
- Logic compilers synthesize hardware blocks of our designs
- Mapping onto programmable hardware (code generation)
- Contrast with software design
- Both map specifications to physical devices
- Both must be flawless
7. Day 26: Ready to tackle ANY digital design
8. Tackling complex digital designs
- Step 1: Decompose the system into a collection of subsystems
- Each has top-down requirements and bottom-up constraints
- Interconnected through interfaces
- Often with particular protocols
- Potentially different clock domains
- Rate matching, buffering, timing
- For example
9. For Example
- Encodings
- Protocols
- Synchronization
- Commands
- Formats
- Specifications
- Datasheets
(Figure: example system with Display, Camera (optional), Video encoder, Audio, and Hand input (limited) blocks)
10. Traversing Digital Design
(Figure: levels of digital design, with labels CS61C and EE 40)
11. In Each: Datapath and Control
(Figure: the Controller drives the Control Points of the Datapath; the Datapath returns signals to the Controller)
- Datapath: storage, FUs, and interconnect sufficient to perform the desired functions
- Inputs are control points
- Outputs are signals
- Controller: state machine to orchestrate operation on the datapath
- Based on desired function and signals
12. Tackling complex digital designs
- Step 1: Decompose the system into a collection of subsystems
- Each has top-down requirements and bottom-up constraints
- Interconnected through interfaces
- Often with particular protocols
- Potentially different clock domains
- Rate matching, buffering, timing
- For Each Subsystem
- Step 2: Design the Datapath
13. What makes Digital Systems tick?
(Figure: combinational logic between clocked state elements; clk shown against time)
14. Register Transfer Level Descriptions
- RTL comprises a set of register transfers with optional operators as part of the transfer.
- Example
- regA ← regB
- regC ← regA + regB
- if (start = 1) regA ← regC
- Personal style
- Use ";" to separate transfers that occur on separate cycles.
- Use "," to separate transfers that occur on the same cycle.
- Example (2 cycles)
- regA ← regB, regB ← 0;
- regC ← regA;
- A standard high-level representation for describing systems.
- It follows from the fact that all synchronous digital systems can be described as a set of state elements connected by combinational logic (CL) blocks.
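In this notation, transfers joined by "," all read the register values from before the clock edge and update together, while ";" moves on to the next cycle. The minimal Python sketch below shows that semantics. It assumes registers are held in a dict and each cycle is a list of (destination, expression) pairs. This representation is hypothetical and chosen only for illustration; it is not an HDL.

```python
from typing import Callable, Dict, List, Tuple

Regs = Dict[str, int]
Transfer = Tuple[str, Callable[[Regs], int]]  # (destination, value computed from old regs)


def clock_cycle(regs: Regs, transfers: List[Transfer]) -> Regs:
    """Apply all transfers of one cycle at once (the ',' separator).

    Every right-hand side reads the values from *before* the edge, like
    edge-triggered flip-flops fed by combinational logic.
    """
    updates = {dst: rhs(regs) for dst, rhs in transfers}  # evaluate everything first
    new_regs = dict(regs)
    new_regs.update(updates)                               # then commit together
    return new_regs


def run(regs: Regs, cycles: List[List[Transfer]]) -> Regs:
    """Apply cycles in order (the ';' separator)."""
    for transfers in cycles:
        regs = clock_cycle(regs, transfers)
    return regs


# Example (2 cycles): regA <- regB, regB <- 0;  regC <- regA;
program = [
    [("regA", lambda r: r["regB"]), ("regB", lambda r: 0)],
    [("regC", lambda r: r["regA"])],
]
final = run({"regA": 1, "regB": 5, "regC": 0}, program)
print(final)  # {'regA': 5, 'regB': 0, 'regC': 5}

# Same-cycle transfers read old values, so this swaps the two registers.
swapped = clock_cycle({"regA": 1, "regB": 2},
                      [("regA", lambda r: r["regB"]), ("regB", lambda r: r["regA"])])
print(swapped)  # {'regA': 2, 'regB': 1}
```

The swap is the telling case. A sequential software reading of "regA ← regB, regB ← regA" would copy one value twice, but a clocked register transfer exchanges them.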
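15. A Register Transfer
- C ← A: Sel ← 0, Ld ← 1
- C ← B: Sel ← 1, Ld ← 1
- One of potentially many source registers goes on the bus to one or more destination registers
- Register transfer on the clock: Ld loads C from the bus
(Figure: registers A and B drive the bus under control of Sel (through a decoder); C loads from the bus when Ld is asserted. Timing diagram of Clk, Sel, and Ld: A on Bus, then B on Bus, with C loaded from the bus on the clock edge.)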
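The control-point view of this slide can be simulated directly. Sel picks which source drives the bus, and Ld decides whether C captures the bus value on the clock edge. The sketch below assumes a two-source bus (A, B) with a single destination C. The function and argument names are illustrative, not taken from the course materials.

```python
def bus_cycle(a: int, b: int, c: int, sel: int, ld: int) -> int:
    """One clock edge of the A/B -> Bus -> C register transfer.

    sel chooses the bus source (0 -> A, 1 -> B), modeling the mux/decoder.
    ld is C's load enable: when 0, C keeps its old value.
    Returns C's value after the clock edge.
    """
    bus = a if sel == 0 else b
    return bus if ld == 1 else c


A, B = 0x3, 0x9
C = 0
C = bus_cycle(A, B, C, sel=0, ld=1)  # C <- A
after_a = C
C = bus_cycle(A, B, C, sel=1, ld=0)  # B is on the bus, but no load: C unchanged
held = C
C = bus_cycle(A, B, C, sel=1, ld=1)  # C <- B
after_b = C
print(after_a, held, after_b)  # 3 3 9
```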
16. Register Transfers: interconnect
- Point-to-point connection
- Dedicated wires
- Muxes on inputs of each register
- Common input from multiplexer
- Load enables for each register
- Control signals for multiplexer
- Common bus with output enables
- Output enables and load enables for each register
17. Data Path (Bit-slice)
- Bit-slice concept: iterate to build n-bit-wide datapaths
- Data bit buses run through the slice
(Figure: example slices 1 bit wide and 2 bits wide)
18. Approaching an ISA
- Instruction Set Architecture
- Defines set of operations, instruction format, hardware-supported data types, named storage, addressing modes, sequencing
- Meaning of each instruction is described by RTL on architected registers and memory
- Given technology constraints, assemble adequate datapath
- Architected storage mapped to actual storage
- Function units to do all the required operations
- Possible additional storage (e.g., MAR, MBR, ...)
- Interconnect to move information among registers and FUs
- Map each instruction to a sequence of RTLs
- Collate sequences into symbolic controller STD
- Lower symbolic STD to control points
- Implement controller
19. Instruction Types
- Data Manipulation
- Add, subtract
- Increment, decrement
- Multiply
- Shift, rotate
- Immediate operands
- Data Staging
- Load/store data to/from memory
- Register-to-register move
- Control
- Conditional/unconditional branches in program flow
- Subroutine call and return
20. Hardware Necessary To Implement Instructions
- Standard FSM elements
- State register
- Next-state logic
- Output logic (datapath/control signaling)
- Moore or synchronous Mealy machine to avoid loops unbroken by FFs
- Plus additional "control" registers (in DP)
- Instruction register (IR)
- Program counter (PC)
- Inputs/Outputs
- Outputs control elements of data path
- Inputs from data path used to alter flow of program (test if zero)
21. FSM Controller for CPU
- Putting it all together and closing the loop
- The famous instruction fetch-decode-execute cycle
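A behavioral sketch can make the fetch-decode-execute loop concrete. The controller steps through FETCH, DECODE, and EXECUTE states, and each state performs register transfers on PC, IR, an accumulator (ACC), and memory. The five-instruction accumulator ISA and its 4-bit opcode / 4-bit address encoding are hypothetical, invented only for this example; they are not the course processor.

```python
# Hypothetical toy ISA: 8-bit instruction = 4-bit opcode | 4-bit address.
LOAD, ADD, STORE, BRZ, HALT = range(5)


def run_cpu(mem: list[int], max_cycles: int = 1000) -> list[int]:
    """Step a FETCH -> DECODE -> EXECUTE controller until HALT.

    Each state performs register transfers on PC, IR, ACC and memory.
    ACC is 8 bits wide, so ADD wraps modulo 256. Returns the final memory.
    """
    pc, ir, acc = 0, 0, 0
    opcode, addr = 0, 0
    state = "FETCH"
    for _ in range(max_cycles):
        if state == "FETCH":
            ir, pc = mem[pc], pc + 1              # IR <- M[PC], PC <- PC + 1
            state = "DECODE"
        elif state == "DECODE":
            opcode, addr = ir >> 4, ir & 0xF      # split IR into fields
            state = "EXECUTE"
        else:  # EXECUTE
            if opcode == LOAD:
                acc = mem[addr]                   # ACC <- M[addr]
            elif opcode == ADD:
                acc = (acc + mem[addr]) & 0xFF    # ACC <- ACC + M[addr]
            elif opcode == STORE:
                mem[addr] = acc                   # M[addr] <- ACC
            elif opcode == BRZ:
                if acc == 0:                      # "test if zero" signal from datapath
                    pc = addr                     # PC <- addr
            elif opcode == HALT:
                return mem
            state = "FETCH"
    raise RuntimeError("program did not halt")


# M[13] <- M[14] + M[15]
prog1 = [0x0E, 0x1F, 0x2D, 0x40] + [0] * 12
prog1[14], prog1[15] = 20, 22
print(run_cpu(prog1)[13])  # 42

# The branch is taken because ACC == 0, so the STORE to M[14] is skipped.
prog2 = [0x0F, 0x33, 0x2E, 0x0D, 0x2C, 0x40] + [0] * 10
prog2[13], prog2[14], prog2[15] = 7, 99, 0
out2 = run_cpu(prog2)
print(out2[12], out2[14])  # 7 99
```

Each branch of the EXECUTE state corresponds to one instruction's RTL sequence. Merging those sequences into a single state machine is the "collate sequences into symbolic controller STD" step.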
22. Representing Numbers
- What can be represented in N bits?
- $2^N$ distinct symbols => values
- Unsigned: $0$ to $2^N - 1$
- 2's Complement: $-2^{N-1}$ to $2^{N-1} - 1$
- ASCII: $-10^{N/8-2} - 1$ to $10^{N/8-1} - 1$
- But, what about...
- Very large numbers? (seconds/century) $3{,}155{,}760{,}000_{10}$ ($3.15576_{10} \times 10^{9}$)
- Very small numbers? (seconds/nanosecond) $0.000000001_{10}$ ($1.0_{10} \times 10^{-9}$)
- Bohr radius: $0.000000000052917710$ m ($5.2917710 \times 10^{-11}$ m)
- Rationals: 2/3 (0.666666666...)
- Irrationals: $2^{1/2}$ (1.414213562373...)
- Transcendentals: e (2.718...), π (3.141...)
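The unsigned and 2's complement ranges above can be computed directly. The sketch below also checks the "very large numbers" example. It assumes 36,525 days per century, which is 100 years of 365.25 days and matches the slide's 3,155,760,000 seconds. That count fits in a 32-bit unsigned word but not in a 32-bit 2's complement word.

```python
def unsigned_range(n: int) -> tuple[int, int]:
    """Smallest and largest n-bit unsigned values: 0 .. 2^n - 1."""
    return 0, 2**n - 1


def twos_complement_range(n: int) -> tuple[int, int]:
    """Smallest and largest n-bit 2's complement values: -2^(n-1) .. 2^(n-1) - 1."""
    return -(2 ** (n - 1)), 2 ** (n - 1) - 1


for n in (4, 8, 32):
    print(n, "bits:", 2**n, "symbols", unsigned_range(n), twos_complement_range(n))

# Seconds per century (36,525 days): fits in 32 bits unsigned, but not signed.
secs_per_century = 36525 * 24 * 3600
lo_u, hi_u = unsigned_range(32)
lo_s, hi_s = twos_complement_range(32)
fits_u32 = lo_u <= secs_per_century <= hi_u
fits_s32 = lo_s <= secs_per_century <= hi_s
print(secs_per_century, fits_u32, fits_s32)  # 3155760000 True False
```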
23. 2's Complement Overflow
How can you tell an overflow occurred?
Add two positive numbers to get a negative number, or two negative numbers to get a positive number.
(Figure: two copies of the 4-bit 2's complement number wheel: 0000 = 0, 0001 = 1, 0010 = 2, 0011 = 3, 0100 = 4, 0101 = 5, 0110 = 6, 0111 = 7, 1000 = -8, 1001 = -7, 1010 = -6, 1011 = -5, 1100 = -4, 1101 = -3, 1110 = -2, 1111 = -1)
-7 + (-2) = +7!
5 + 3 = -8!
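The sign rule above translates directly into a check. The sketch below wraps the sum to n bits the way the hardware does and then flags overflow when both operands share a sign that the result does not have. It assumes the operands are already in range for n bits; the function names are illustrative.

```python
def to_signed(x: int, n: int) -> int:
    """Interpret the low n bits of x as an n-bit 2's complement value."""
    x &= (1 << n) - 1
    return x - (1 << n) if x & (1 << (n - 1)) else x


def add_with_overflow(a: int, b: int, n: int = 4) -> tuple[int, bool]:
    """Add two n-bit 2's complement values; flag overflow by the sign rule.

    Overflow iff both operands have the same sign and the result's sign differs.
    """
    s = to_signed(a + b, n)  # keep only n bits, like an n-bit adder
    overflow = (a >= 0) == (b >= 0) and (s >= 0) != (a >= 0)
    return s, overflow


print(add_with_overflow(-7, -2))  # (7, True)
print(add_with_overflow(5, 3))    # (-8, True)
print(add_with_overflow(5, -3))   # (2, False)
```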
24. Computer Arithmetic
- Circuit design for unsigned addition
- Full adder per bit slice
- Delay limited by carry propagation
- Ripple is algorithmically slow, but wires are short
- Carry select
- Simple, resource-intensive
- Excellent layout
- Carry look-ahead
- Excellent asymptotic behavior
- Great at the board level, but wire length effects are significant on chip
- Digital number systems
- How to represent negative numbers
- Simple operations
- Clean algorithmic properties
- 2's complement is most widely used
- Circuit for unsigned arithmetic
- Subtract by complement and carry in
- Overflow when cin xor cout of the sign bit is 1
25. 2's Complement Adder/Subtractor
$A - B = A + (-B) = A + \overline{B} + 1$
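A bit-slice model can check both the identity $A - B = A + \overline{B} + 1$ and the overflow rule stated above, where overflow is the carry into the sign bit XOR the carry out of it. The sketch below is a behavioral ripple-carry chain of full adders. It assumes n-bit operands are passed as unsigned bit patterns, and its names are illustrative.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One bit slice: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def add_sub(a: int, b: int, sub: int, n: int = 4) -> tuple[int, int]:
    """Ripple-carry n-bit adder/subtractor on bit patterns.

    sub = 1 inverts B (XOR with sub) and injects carry-in 1, so A - B = A + ~B + 1.
    Returns (result_bits, overflow) with overflow = c_in(MSB) XOR c_out(MSB).
    """
    carry = sub
    result = 0
    carry_into_msb = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ sub      # conditional complement of B
        if i == n - 1:
            carry_into_msb = carry
        si, carry = full_adder(ai, bi, carry)
        result |= si << i
    return result, carry_into_msb ^ carry


print(add_sub(0b0101, 0b0011, sub=0))  # 5 + 3:  (8, 1), i.e. 1000 = -8, overflow
print(add_sub(0b0101, 0b0011, sub=1))  # 5 - 3:  (2, 0)
print(add_sub(0b1001, 0b0010, sub=1))  # -7 - 2: (7, 1), overflow
```

The same XOR gates that conditionally complement B turn one adder into an adder/subtractor. That is why subtraction costs almost nothing extra in hardware.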
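26. Combinational Multiplier: accumulation of partial products
(Figure: A3 A2 A1 A0 × B3 B2 B1 B0, with the partial products $A_iB_j$ arranged in columns by weight $i + j$)
Product bit Sk sums the partial products in column k, plus carries from the column to its right:
- S0: A0B0
- S1: A1B0, A0B1
- S2: A2B0, A1B1, A0B2
- S3: A3B0, A2B1, A1B2, A0B3
- S4: A3B1, A2B2, A1B3
- S5: A3B2, A2B3
- S6: A3B3
- S7: final carry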
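The array sums the partial products $A_iB_j$ at weight $2^{i+j}$. The sketch below accumulates the same 4 x 4 array column by column, passing each column's excess on as a carry, and checks the result against ordinary multiplication. Operands are assumed to be unsigned.

```python
def array_multiply(a: int, b: int, n: int = 4) -> int:
    """Unsigned n x n multiply by accumulating partial products column by column.

    Column k collects the AND terms A_i & B_j with i + j == k, plus the carry
    from column k - 1; each column emits one product bit S_k.
    """
    product = 0
    carry = 0
    for k in range(2 * n):
        column = carry
        for i in range(n):
            j = k - i
            if 0 <= j < n:
                column += ((a >> i) & 1) & ((b >> j) & 1)  # partial product A_i B_j
        product |= (column & 1) << k   # S_k
        carry = column >> 1            # remaining weight moves to column k + 1
    return product


assert all(array_multiply(x, y) == x * y for x in range(16) for y in range(16))
print(array_multiply(13, 11))  # 143
```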
27. Another Representation
(Figure: building block (full adder and AND gate) and a 4 x 4 array of building blocks)
28. Digital Number Systems
- Positional notation
- $D_{n-1} D_{n-2} \ldots D_0$ represents $D_{n-1}B^{n-1} + D_{n-2}B^{n-2} + \cdots + D_0 B^0$, where $D_i \in \{0, \ldots, B-1\}$
- 2's Complement
- $D_{n-1} D_{n-2} \ldots D_0$ represents $-D_{n-1}2^{n-1} + D_{n-2}2^{n-2} + \cdots + D_0 2^0$
- MSB has negative weight
- Binary point is effectively at the far right of the word
(Figure: 4-bit 2's complement number wheel, 0000 = 0 through 0111 = 7 and 1000 = -8 through 1111 = -1)
29. Circuits for Fixed-Point Arithmetic
- Adders
- Identical circuit
- Position of the binary point is entirely in the interpretation
- Be sure the interpretations match, i.e., binary points line up
- Subtractors
- Multipliers
- Position of the binary point just as you learned by hand
- Multiplying two n-bit numbers yields a 2n-bit result with the binary point determined by the binary points of the inputs
- $2^{-k} \times 2^{-m} = 2^{-(k+m)}$
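A fixed-point value is just an integer with an implied scale of $2^{-k}$. Multiplying the raw integers therefore multiplies the scales, which is the $2^{-k} \times 2^{-m} = 2^{-(k+m)}$ rule. Adding requires aligning the binary points first. The sketch below assumes signed values stored as Python ints and uses `fractions.Fraction` only to show the interpreted values. The helper names are hypothetical.

```python
from fractions import Fraction


def to_fixed(x: Fraction, frac_bits: int) -> int:
    """Encode x with frac_bits bits after the binary point (x must be exact)."""
    raw = x * (1 << frac_bits)
    assert raw.denominator == 1, "value not representable exactly"
    return int(raw)


def from_fixed(raw: int, frac_bits: int) -> Fraction:
    """Interpret raw as raw * 2^-frac_bits."""
    return Fraction(raw, 1 << frac_bits)


def fixed_mul(a: int, k: int, b: int, m: int) -> tuple[int, int]:
    """Multiply raw integers; the product has k + m fraction bits."""
    return a * b, k + m


def fixed_add(a: int, k: int, b: int, m: int) -> tuple[int, int]:
    """The adder circuit is identical, but the binary points must line up first."""
    f = max(k, m)
    return (a << (f - k)) + (b << (f - m)), f


a = to_fixed(Fraction(3, 2), 2)     # 1.10 in binary -> raw 6, 2 fraction bits
b = to_fixed(Fraction(-5, 8), 3)    # -0.101 in binary -> raw -5, 3 fraction bits
p_raw, p_frac = fixed_mul(a, 2, b, 3)
s_raw, s_frac = fixed_add(a, 2, b, 3)
print(p_raw, p_frac, from_fixed(p_raw, p_frac))  # -30 5 -15/16
print(s_raw, s_frac, from_fixed(s_raw, s_frac))  # 7 3 7/8
```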
30. Let's build an FP function unit: mult
(Figure: FP multiplier datapath; Ctrl?)
31. What is the range of mantissas?
(Figure labels: Adder (8), Multiplier (24), -127, Ctrl?, Unnorm?, Round)
32. Cascaded Carry Lookahead
- 4-bit adders with internal carry lookahead
- A second-level carry lookahead unit extends lookahead to 16 bits
- One more level extends it to 64 bits
33. Parallel Prefix (generalizing CLA)
(Figure: prefix network over operands A and B)
- Compute all the prefixes $F_i \;\mathrm{op}\; F_{i-1} \;\mathrm{op}\; \cdots \;\mathrm{op}\; F_0$
- Assume op is associative and commutative
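Carry lookahead is a prefix computation. Each bit produces a (generate, propagate) pair, and the carry into bit i is the prefix of those pairs under $(g, p) \circ (g', p') = (g + p\,g',\; p\,p')$. This carry operator is associative but not commutative, so the sketch keeps operand order fixed and relies only on associativity. It uses a Kogge-Stone style doubling scan with $\lceil \log_2 (n+1) \rceil$ levels, which is one of several prefix-network shapes. It is a behavioral model, not a gate netlist.

```python
def combine(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """Carry operator: (g, p) of a higher block combined with a lower block.

    (g_hi, p_hi) o (g_lo, p_lo) = (g_hi + p_hi * g_lo, p_hi * p_lo).
    Associative, but NOT commutative, so operand order matters.
    """
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def prefix_add(a: int, b: int, n: int = 8, cin: int = 0) -> tuple[int, int]:
    """n-bit add whose carries come from a Kogge-Stone style prefix scan."""
    # Element 0 models the carry-in as a block that generates cin and never propagates.
    gp = [(cin, 0)] + [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1)
                       for i in range(n)]
    dist = 1
    while dist < len(gp):
        # One network level: element i absorbs element i - dist (old values only).
        gp = [combine(gp[i], gp[i - dist]) if i >= dist else gp[i]
              for i in range(len(gp))]
        dist *= 2
    carries = [g for g, _ in gp]  # carries[i] = carry into bit i; carries[n] = carry out
    s = 0
    for i in range(n):
        s |= ((((a >> i) ^ (b >> i)) & 1) ^ carries[i]) << i
    return s, carries[n]


assert all(prefix_add(x, y) == ((x + y) & 0xFF, (x + y) >> 8)
           for x in range(256) for y in range(0, 256, 5))
print(prefix_add(200, 100))  # (44, 1)
```

Every level runs in parallel in hardware, so the carry delay grows with $\log n$ rather than n. The trade-off is extra wiring, which is the same effect the earlier slide notes for carry look-ahead on chip.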
34. Basic Memory Subsystem Block Diagram
RAM/ROM naming convention:
- 32 x 8, "32 by 8" => 32 8-bit words
- 1M x 1, "1 meg by 1" => 1M 1-bit words
35. Typical SRAM Timing
- OE determines direction: Hi = Write, Lo = Read
- Writes are dangerous! Be careful!
- Double signaling: OE Hi, WE Lo
(Figure: write and read timing for A, D, OE_L, and WE_L. Write timing: write address on A, data in on D. Read timing: read address on A; D goes from high Z through junk to data out.)
36. DRAM WRITE Timing
- Every DRAM access begins at the assertion of RAS_L
- 2 ways to write: early or late vs. CAS
- Early write cycle: WE_L asserted before CAS_L
- Late write cycle: WE_L asserted after CAS_L
(Figure: 256K x 8 DRAM with 9 address pins (A) and 8 data pins (D), controlled by RAS_L, CAS_L, WE_L, and OE_L. Timing: row address, then column address, on A; data in on D; DRAM WR cycle time and WR access time marked for each cycle.)
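The 256K x 8 part needs only 9 address pins because $256\text{K} = 2^{18}$ words. The 18-bit address is sent in two halves over the same pins: the row address first, latched by RAS_L, then the column address, latched by CAS_L. The sketch below shows that split. It assumes the upper 9 bits form the row; real parts and memory controllers may map address bits differently.

```python
def split_dram_address(addr: int, row_bits: int = 9, col_bits: int = 9) -> tuple[int, int]:
    """Split a word address into (row, column) halves for a multiplexed DRAM bus.

    Assumes row = upper bits, column = lower bits (an illustrative mapping).
    """
    if not 0 <= addr < 1 << (row_bits + col_bits):
        raise ValueError("address out of range")
    row = addr >> col_bits
    col = addr & ((1 << col_bits) - 1)
    return row, col


words = 256 * 1024
pins = (words.bit_length() - 1) // 2   # 18 address bits sent in 2 phases -> 9 pins
row, col = split_dram_address(0x2ABCD)
print(pins, row, col)                  # 9 341 461
```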
37. DRAM with Column Buffer
(Figure: 11 address lines A0...A10 drive a row decoder for a 2,048 x 2,048 storage array of cells on word lines; sense amps feed column latches and a MUX)
- Pull column into fast buffer storage
- Access sequence of bits from there
38. Hamming Error Correcting Code
- Use more parity bits to pinpoint bit(s) in error, so they can be corrected.
- Example: single error correction (SEC) on 4-bit data
- Use 3 parity bits; with 4 data bits, this results in a 7-bit code word
- 3 parity bits are sufficient to identify any one of the 7 code word bits
- Overlap the assignment of parity bits so that a single error in the 7-bit word can be corrected
- Procedure: group parity bits so they correspond to subsets of the 7 bits
- p1 protects bits 1, 3, 5, 7 (bit 1 of the position number is on)
- p2 protects bits 2, 3, 6, 7 (bit 2 is on)
- p3 protects bits 4, 5, 6, 7 (bit 3 is on)

Bit position: 1  2  3  4  5  6  7
Bit:          p1 p2 d1 p3 d2 d3 d4

Position numbers checked by each parity bit:
- p1: 001 = 1, 011 = 3, 101 = 5, 111 = 7
- p2: 010 = 2, 011 = 3, 110 = 6, 111 = 7
- p3: 100 = 4, 101 = 5, 110 = 6, 111 = 7
Note: number bits from left to right.
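The 7-bit layout above can be encoded and corrected in a few lines. Re-checking the three parity groups gives a 3-bit syndrome that, read as a binary number, is the position of a single flipped bit. The sketch assumes even parity, which the slide does not specify, and numbers positions 1 through 7 from left to right as the slide does.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode [d1, d2, d3, d4] as code word positions 1..7: p1 p2 d1 p3 d2 d3 d4.

    Even parity is assumed: each parity bit makes its group XOR to 0.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # group 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # group 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # group 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(word: list[int]) -> tuple[list[int], int]:
    """Return (data bits after correction, syndrome).

    Re-checking each group gives syndrome bits c3 c2 c1; read as a binary
    number they are the position of a single flipped bit (0 = no error).
    """
    w = list(word)  # w[0] holds position 1, numbered left to right
    c1 = w[0] ^ w[2] ^ w[4] ^ w[6]  # positions 1, 3, 5, 7
    c2 = w[1] ^ w[2] ^ w[5] ^ w[6]  # positions 2, 3, 6, 7
    c3 = w[3] ^ w[4] ^ w[5] ^ w[6]  # positions 4, 5, 6, 7
    syndrome = (c3 << 2) | (c2 << 1) | c1
    if syndrome:
        w[syndrome - 1] ^= 1
    return [w[2], w[4], w[5], w[6]], syndrome


code = hamming74_encode([1, 0, 1, 1])
print(code)                      # [0, 1, 1, 0, 0, 1, 1]
bad = code.copy()
bad[5 - 1] ^= 1                  # flip position 5 (d2)
print(hamming74_correct(bad))    # ([1, 0, 1, 1], 5)
```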
39. Example: 8-bit SEC
(Figure: 12-bit code word, positions 1 through 12)
- Takes four parity bits
- In power-of-2 positions
- Rest are the data bits
- Bits with i in their address feed into the parity calculation for pi
- What to do with bit 0?
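The parity-bit count follows from a counting argument. With m data bits and r parity bits, the syndrome must name any of the m + r positions or report "no error", so we need the smallest r with $2^r \ge m + r + 1$. That gives 3 bits for 4 data bits and 4 bits for 8 data bits. The coverage rule "bits with i in their address" is likewise a bitwise AND. The sketch below computes both; the function names are illustrative.

```python
def parity_bits_needed(m: int) -> int:
    """Smallest r with 2^r >= m + r + 1 (name any of m + r positions, or no error)."""
    r = 0
    while (1 << r) < m + r + 1:
        r += 1
    return r


def coverage(total_bits: int) -> dict[int, list[int]]:
    """Map each parity position (a power of 2) to the positions it checks:
    every position whose binary address has that bit set."""
    return {p: [pos for pos in range(1, total_bits + 1) if pos & p]
            for p in (1 << k for k in range(total_bits.bit_length()))
            if p <= total_bits}


r = parity_bits_needed(8)
cov = coverage(8 + r)            # 12-bit code word
for p, positions in cov.items():
    print(f"parity at {p}: {positions}")
```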
40. Example: Ethernet CRC-32
(Figure: protocol layer stack)
- 7: Application (HTTP, FTP, DNS)
- 4: Transport (TCP, UDP)
- 3: Network (IP)
- 2: Data Link (Ethernet, 802.11b)
- 1: Physical
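The slide places Ethernet's CRC-32 at the data link layer. For reference, the IEEE 802.3 CRC-32 can be computed bit-serially, the way a shift-register circuit processes one bit per clock. It uses the reflected polynomial 0xEDB88320, an initial value of 0xFFFFFFFF, and a final inversion. Python's `zlib.crc32` computes the same CRC, so the sketch below uses it as a cross-check.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32 (IEEE 802.3), processing each byte LSB first."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320   # reflected generator polynomial
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


check = crc32_bitwise(b"123456789")
print(hex(check))  # 0xcbf43926
assert check == zlib.crc32(b"123456789")
```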
41. General Model of Synchronous Circuit
- In general, for correct operation, $T \ge \tau_{clk \to Q} + \tau_{CL} + \tau_{setup}$ for all paths.
- How do we enumerate all paths?
- Any circuit input or register output to any register input or circuit output.
- Setup time for circuit outputs depends on what it connects to.
- Clk-Q time for circuit inputs depends on where it comes from.
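Because the constraint must hold on every path, the clock period is set by the worst path. The sketch below applies the inequality to a hypothetical list of paths. The delays are made-up numbers in integer picoseconds, not measured data.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    clk_to_q_ps: int  # launching register clk->Q (or input arrival time)
    comb_ps: int      # combinational logic delay along the path
    setup_ps: int     # capturing register setup (or output requirement)

    def min_period_ps(self) -> int:
        """T >= t_clk->Q + t_CL + t_setup for this path."""
        return self.clk_to_q_ps + self.comb_ps + self.setup_ps


# Hypothetical paths with illustrative delays, in picoseconds.
paths = [
    Path("regA -> adder -> regC", 500, 3200, 300),
    Path("regB -> mux -> regC", 500, 1100, 300),
    Path("input -> decoder -> regD", 1000, 2000, 300),
]
critical = max(paths, key=Path.min_period_ps)
t_min_ps = critical.min_period_ps()
f_max_mhz = 1_000_000 / t_min_ps  # 1 / T, with T in ps, expressed in MHz
print(critical.name, t_min_ps, f_max_mhz)  # regA -> adder -> regC 4000 250.0
```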
42. More Boolean Expressions to Logic Gates
- NAND
- NOR
- XOR: $X \oplus Y$
- XNOR: $\overline{X \oplus Y}$
X xor Y = X Y' + X' Y: X or Y but not both ("inequality", "difference")
X xnor Y = X Y + X' Y': X and Y are the same ("equality", "coincidence")
(Figure: gate symbol for each function, with inputs X and Y and output Z)
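The sum-of-products forms for XOR and XNOR can be checked exhaustively over all four input combinations. The small sketch below uses 0/1 integers for logic values and `1 - x` for complement.

```python
from itertools import product


def xor_sop(x: int, y: int) -> int:
    return (x & (1 - y)) | ((1 - x) & y)      # X Y' + X' Y


def xnor_sop(x: int, y: int) -> int:
    return (x & y) | ((1 - x) & (1 - y))      # X Y + X' Y'


for x, y in product((0, 1), repeat=2):
    print(x, y, xor_sop(x, y), xnor_sop(x, y))
    assert xor_sop(x, y) == x ^ y             # matches built-in XOR
    assert xnor_sop(x, y) == 1 - (x ^ y)      # XNOR is the complement of XOR
```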
43. Gate Switching Behavior
When does it start? How quickly does it switch?
44. Xilinx Virtex-E Floorplan
- Configurable Logic Blocks
- 4-input function generators
- buffers
- flipflop
- Input/Output Blocks
- combinational, latch, and flipflop output
- sampled inputs
- Block RAM
- 4096 bits each
- every 12 CLB columns
45. Limitations on Clock Rate
- Logic gate delay
- What are typical delay values?
- Delays in flip-flops
- Both times contribute to limiting the clock period.
- What must happen in one clock cycle for correct operation?
- Assuming perfect clock distribution (all flip-flops see the clock at the same time)
- All signals must be ready and set up before the rising edge of the clock.
46. Timing Methodologies
- Rules for interconnecting components and clocks
- Guarantee proper operation of system when strictly followed
- Approach depends on building blocks used for memory elements
- Focus on systems with edge-triggered flip-flops
- Found in programmable logic devices
- Many custom integrated circuits focus on level-sensitive latches
- Basic rules for correct timing:
- (1) Correct inputs, with respect to time, are provided to the flip-flops
- (2) No flip-flop changes state more than once per clocking event
47. Master-Slave Structure
- Construct D flip-flop from two D latches
(Figure: two D latches in series, clocked by clk and its complement)
48. Master-Slave Structure
- Break flow by alternating clocks (like an air-lock)
- Use positive clock to latch inputs into one R-S latch
- Use negative clock to change outputs with another R-S latch
- View pair as one basic unit
- Master-slave flip-flop
- Twice as much logic
- Output changes a few gate delays after the falling edge of clock but does not affect any cascaded flip-flops
49. (Neg) Edge-Triggered Flip-Flops
- More efficient solution: only 6 gates
- Sensitive to inputs only near edge of clock signal (not while high)
(Figure: negative edge-triggered D flip-flop (D-FF); one internal latch holds D' and the other holds D when clock goes low)
- 4-5 gate delays
- Must respect setup and hold time constraints to successfully capture input
- Characteristic equation: Q(t+1) = D
50. Two-phase non-overlapping clocks
- Sequential elements partition into two classes
- Phase0 elements feed phase1
- Phase1 elements feed phase0
- Approximate single phase: each register replaced by a pair of latches on two phases
- Can push logic across (retiming)
- Can always slow down the clocks to meet all timing constraints
(Figure: latches a and b with combinational logic (c/l) between them, clocked by clk0 and clk1)
51. Tackling complex digital designs
- Step 1: Decompose the system into a collection of subsystems
- Each has top-down requirements and bottom-up constraints
- Interconnected through interfaces
- Often with particular protocols
- Potentially different clock domains
- Rate matching, buffering, timing
- For Each Subsystem
- Step 2: Design the Datapath
- Step 3: Design the Controller
53. Review: Two Kinds of FSMs
- Moore Machine vs. Mealy Machine
- Mealy: Output(t) = G(state(t), Input)
- Moore: Output(t) = G(state(t))
- Next state: state(t+1) = F(state(t), input(t))
(Figure: input and state feed combinational logic that computes the next state and the outputs; Mealy transitions labeled Input / Out, Moore states labeled State / out)
54. Review: Finite State Machine Representations
- States determined by possible values in sequential storage elements
- Transitions: change of state
- Clock controls when state can change by controlling storage elements
- Sequential logic
- Sequences through a series of states
- Based on sequence of values on input signals
- Clock period defines elements of sequence
55. Review: Formal Design Process
Logic equations from table: OUT = PS; NS = PS xor IN
- Review of Design Steps
- 1. Circuit functional specification
- 2. State Transition Diagram
- 3. Symbolic State Transition Table
- 4. Encoded State Transition Table
- 5. Derive Logic Equations
- 6. Circuit Diagram
- FFs for state
- CL for NS and OUT
- Circuit Diagram
- XOR gate for NS calculation
- DFF to hold present state
- No logic needed for output
Take this seriously!
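The equations above (NS = PS xor IN, OUT = PS) describe an odd parity checker, and they can be simulated cycle by cycle. The sketch below assumes reset puts the flip-flop in the "even" state (PS = 0), which the slide does not state. Because this is a Moore machine, each cycle's output reflects the inputs seen before that clock edge.

```python
def odd_parity_checker(bits: list[int], reset_state: int = 0) -> list[int]:
    """Moore machine from the design steps: NS = PS xor IN, OUT = PS.

    Returns the output in each cycle (1 = odd number of 1s seen so far,
    not counting the current input), plus the output after the last edge.
    """
    ps = reset_state          # the DFF holding the present state
    outputs = []
    for inp in bits:
        outputs.append(ps)    # OUT = PS: no output logic needed
        ps = ps ^ inp         # NS = PS xor IN, captured on the clock edge
    outputs.append(ps)        # output after the final clock edge
    return outputs


print(odd_parity_checker([1, 0, 1, 1, 0]))  # [0, 1, 1, 0, 1, 1]
```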
56. Moore Verilog FSM: combinational part
always @(In or CurrentState) begin
    NextState = CurrentState;
    Out = 1'b0;
    case (CurrentState)
        STATE_Zero: begin // last input was a zero
            if (In) NextState = STATE_One1;
        end
        STATE_One1: begin // we've seen one 1
            if (In) NextState = STATE_Two1s;
            else NextState = STATE_Zero;
        end
        STATE_Two1s: begin // we've seen at least 2 ones
            Out = 1;
            if (~In) NextState = STATE_Zero;
        end
        default: begin // in case we reach a bad state
            Out = 1'bx;
            NextState = STATE_X;
        end
    endcase
end
57. Moore Verilog FSM: state part
// Implement the state register
always @(posedge Clock) begin
    if (Reset) CurrentState <= STATE_Zero;
    else CurrentState <= NextState;
end
endmodule
Note: posedge Clock requires NONBLOCKING ASSIGNMENT. Blocking assignment <-> combinational logic; nonblocking assignment <-> sequential logic (registers).
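A software mirror of the same machine can help check the Verilog's intended behavior before synthesis. The Python sketch below assumes the FSM detects two or more consecutive 1s, as the comments in the Verilog suggest, and that Reset has already put it in STATE_Zero. It is written by hand from the slides, not generated from the Verilog.

```python
# Transition table and Moore outputs mirroring the Verilog case statement.
NEXT = {
    ("STATE_Zero", 0): "STATE_Zero",
    ("STATE_Zero", 1): "STATE_One1",
    ("STATE_One1", 0): "STATE_Zero",
    ("STATE_One1", 1): "STATE_Two1s",
    ("STATE_Two1s", 0): "STATE_Zero",
    ("STATE_Two1s", 1): "STATE_Two1s",
}
OUT = {"STATE_Zero": 0, "STATE_One1": 0, "STATE_Two1s": 1}


def simulate(inputs: list[int]) -> list[int]:
    """Cycle-by-cycle model: Out depends only on CurrentState (Moore),
    then CurrentState <= NextState at each posedge Clock."""
    current = "STATE_Zero"                 # state after Reset
    outs = []
    for bit in inputs:
        outs.append(OUT[current])          # combinational output logic
        current = NEXT[(current, bit)]     # register update on the clock edge
    return outs


print(simulate([1, 1, 1, 0, 1, 1, 0]))  # [0, 0, 1, 1, 0, 0, 1]
```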
58. FSM Optimization
- State Reduction
- Motivation
- Lower cost
- Fewer flip-flops in one-hot implementations
- Possibly fewer flip-flops in encoded implementations
- More don't cares in NS logic
- Fewer gates in NS logic
- Simpler to design with extra states, then reduce later.
- Example: odd parity checker.
- Two machines, identical behavior.
59. Algorithmic Approach to State Minimization
- Goal: identify and combine states that have equivalent behavior
- Equivalent states:
- Same output
- For all input combinations, states transition to same or equivalent states
- Algorithm sketch:
- 1. Place all states in one set
- 2. Initially partition set based on output behavior
- 3. Successively partition resulting subsets based on next-state transitions
- 4. Repeat (3) until no further partitioning is required
- States left in the same set are equivalent
- Polynomial time procedure
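The algorithm sketch above translates almost line for line into a partition-refinement loop. The Python version below covers steps 1 through 4 only; it is not the implication-chart method. The example machine is hypothetical, and states B and C in it turn out to be equivalent.

```python
def minimize(states, inputs, next_state, output):
    """Partition refinement for FSM state minimization.

    next_state maps (state, input) -> state; output maps state -> output.
    Returns a list of blocks; states in the same block are equivalent.
    """
    # Steps 1-2: start from one set, split by output behavior.
    by_output = {}
    for s in states:
        by_output.setdefault(output[s], []).append(s)
    partition = list(by_output.values())
    while True:
        block_of = {s: i for i, block in enumerate(partition) for s in block}
        refined = []
        for block in partition:
            # Step 3: split a block by which blocks its states move to on each input.
            by_successors = {}
            for s in block:
                key = tuple(block_of[next_state[s, x]] for x in inputs)
                by_successors.setdefault(key, []).append(s)
            refined.extend(by_successors.values())
        if len(refined) == len(partition):  # Step 4: nothing split, so done
            return refined
        partition = refined


# Hypothetical Moore machine: output 1 only in D.
states = ["A", "B", "C", "D"]
output = {"A": 0, "B": 0, "C": 0, "D": 1}
next_state = {
    ("A", 0): "A", ("A", 1): "B",
    ("B", 0): "A", ("B", 1): "D",
    ("C", 0): "A", ("C", 1): "D",
    ("D", 0): "A", ("D", 1): "D",
}
print(minimize(states, [0, 1], next_state, output))  # [['A'], ['B', 'C'], ['D']]
```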
60. Minimized FSM
- Implication chart method
- Table of all pairs of states
- 1st: eliminate incompatible states based on outputs
- Fill each entry with implied equivalents based on next state
- Cross out cells where indexed chart entries are crossed out
61. State Assignment Strategies
- Possible strategies
- Sequential: just number states as they appear in the state table
- Random: pick random codes
- One-hot: use as many state bits as there are states (bit = 1 => state)
- Output: use outputs to help encode states
- Heuristic: rules of thumb that seem to work in most cases
- No guarantee of optimality: another intractable problem
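The flip-flop cost of two of these strategies is easy to compute. Sequential encoding needs $\lceil \log_2 n \rceil$ state bits, while one-hot needs n bits but often simpler next-state logic. The sketch below generates both encodings for a hypothetical five-state machine.

```python
from math import ceil, log2


def flip_flops(num_states: int, strategy: str) -> int:
    """State bits needed: encoded strategies use ceil(log2 n); one-hot uses n."""
    if strategy == "one-hot":
        return num_states
    return max(1, ceil(log2(num_states)))


def sequential_codes(states: list[str]) -> dict[str, str]:
    """Number states in order of appearance, using the minimum width."""
    width = flip_flops(len(states), "sequential")
    return {s: format(i, f"0{width}b") for i, s in enumerate(states)}


def one_hot_codes(states: list[str]) -> dict[str, str]:
    """One bit per state; exactly one bit is 1 in each code."""
    n = len(states)
    return {s: format(1 << (n - 1 - i), f"0{n}b") for i, s in enumerate(states)}


states = ["S0", "S1", "S2", "S3", "S4"]
print(sequential_codes(states))  # {'S0': '000', 'S1': '001', 'S2': '010', 'S3': '011', 'S4': '100'}
print(one_hot_codes(states))     # {'S0': '10000', 'S1': '01000', 'S2': '00100', 'S3': '00010', 'S4': '00001'}
```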
62. Tackling complex digital designs
- Step 1: Decompose the system into a collection of subsystems
- Each has top-down requirements and bottom-up constraints
- Interconnected through interfaces
- Often with particular protocols
- Potentially different clock domains
- Rate matching, buffering, timing
- For Each Subsystem
- Step 2: Design the Datapath
- Step 3: Design the Controller
- Step 4: Compose them back together
63. Design Process
(Figure: design flow: Specification → Manual Design and Coding → HDL → RTL Synthesis → Netlist → Logic Optimization → Netlist → Physical Design → Layout → Implementation → Final Product)
- Start with some specification
- This class
- Lab write-ups
- Industry
- Contract restrictions
- High- and low-level specifications from architects and designers
- Convert the design to HDL
- This class
- You design the microarchitecture
- Write Verilog using components provided by the TAs or the standard library, and also from scratch
- Industry
- Verilog or VHDL using standard components or previous designs
64. Design Process
(Figure: the same design flow as on the previous slide)
- Convert HDL into RTL and optimize design
- This class
- Synplify Pro
- Industry
- Other synthesis tools
- 2-level and multi-level logic optimization
- Convert the netlist into a layout
- This class
- Xilinx Map, PAR
- Industry
- Place and route tools
- Technology mapping
- Convert layout to final product
- This class
- Download to board, configure FPGA
- Industry
- Send layout to fab
65. Testing
- How do I know that what I designed is really what I got back?
- Specification to HDL
- Verification
- Formal verification
- Simulation, such as ModelSim
- HDL to layout
- Equivalence testing
- Tool verification
66. Fault Model
(Figure: fault model used to derive a test set)
67. Really putting it together
- Fault models are used to generate interesting input vectors and their corresponding output vectors.
- A subset of these vectors is selected to make a sufficiently short sequence of tests with a reasonable amount of coverage.
- Vectors are combined together to create scan patterns that test for faults by using shift register tests or using the BIST engine.
- At the fab, the sequence of test patterns is run on every wafer using a tester to sort the good chips from the bad chips.
- After the chip is packaged, another (similar) set of tests is run on the packaged chip.
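The first two bullets can be illustrated with the classic single stuck-at fault model on a tiny hypothetical circuit, z = (a AND b) OR c. The sketch simulates every fault, records which input vectors expose it, and then greedily picks a short vector set that covers all faults. It is a toy exhaustive sketch; real ATPG tools use far more scalable algorithms.

```python
from itertools import product
from typing import Optional, Tuple

NETS = ["a", "b", "c", "n1", "z"]  # n1 is the internal net a AND b
Fault = Tuple[str, int]            # (net, stuck-at value)


def simulate(vec: Tuple[int, int, int], fault: Optional[Fault] = None) -> int:
    """Evaluate z = (a AND b) OR c, optionally with one net stuck at 0 or 1."""
    def net(name: str, value: int) -> int:
        return fault[1] if fault is not None and fault[0] == name else value

    a, b, c = (net(name, v) for name, v in zip("abc", vec))
    n1 = net("n1", a & b)
    return net("z", n1 | c)


faults = [(n, v) for n in NETS for v in (0, 1)]
vectors = list(product((0, 1), repeat=3))
# A vector detects a fault if the faulty circuit's output differs from the good one.
detects = {f: {v for v in vectors if simulate(v, f) != simulate(v)} for f in faults}

# Greedily pick the vector that detects the most not-yet-covered faults.
remaining = {f for f in faults if detects[f]}
tests = []
while remaining:
    best = max(vectors, key=lambda v: sum(v in detects[f] for f in remaining))
    tests.append(best)
    remaining -= {f for f in remaining if best in detects[f]}
print(len(faults), "faults,", len(tests), "test vectors:", tests)
# 10 faults, 4 test vectors: [(0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 0, 0)]
```

Here four vectors out of eight cover all ten faults. On a real chip the same idea, reducing the vector count while keeping coverage, is what makes tester time affordable.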
68. A 55 W-hour battery stores the energy of 1/2 a stick of dynamite.
If the battery short-circuits, catastrophe is possible...
69. Controlling Energy Consumption: What Control Do You Have as a Designer?
- Largest contributing component to CMOS power consumption is switching power
- Factors influencing power consumption
- n: total number of nodes in circuit
- α: activity factor (probability of each node switching)
- f: clock frequency (does this affect energy consumption?)
- Vdd: power supply voltage
- What control do you have over each factor?
- How does each affect the total energy?
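A common first-order model is switching power $P = \alpha \, n \, C \, V_{dd}^2 \, f$, where C is an assumed average capacitance switched per node; some texts include an extra factor of 1/2. This model is an assumption here, since the slide lists the factors but not a formula. For a fixed task of N cycles the energy is $E = P \cdot N/f = \alpha\, n\, C\, V_{dd}^2\, N$, so f cancels out, while $V_{dd}$ enters quadratically. The sketch below uses made-up parameter values.

```python
def switching_energy(alpha: float, n: int, c: float, vdd: float, cycles: int) -> float:
    """Energy (J) to run `cycles` clock cycles: alpha * n * C * Vdd^2 per cycle.

    Frequency does not appear: power scales with f, but run time scales with 1/f.
    """
    return alpha * n * c * vdd**2 * cycles


def switching_power(alpha: float, n: int, c: float, vdd: float, f: float) -> float:
    """Average switching power (W) = energy per cycle * cycles per second."""
    return alpha * n * c * vdd**2 * f


# Illustrative values only: 1M nodes, 10% activity, 2 fF per node, 1e9 cycles.
base = switching_energy(0.1, 1_000_000, 2e-15, 1.2, 1_000_000_000)
half_vdd = switching_energy(0.1, 1_000_000, 2e-15, 0.6, 1_000_000_000)
p_fast = switching_power(0.1, 1_000_000, 2e-15, 1.2, 2e9)  # 2 GHz clock
p_slow = switching_power(0.1, 1_000_000, 2e-15, 1.2, 1e9)  # 1 GHz clock
print(f"{base:.3f} J at 1.2 V, {half_vdd:.3f} J at 0.6 V")
print(f"{p_fast:.3f} W at 2 GHz vs {p_slow:.3f} W at 1 GHz, same energy per task")
```

Under this model, slowing the clock alone lowers power but not the energy per task. Lowering $V_{dd}$ (which usually forces a slower clock) is the knob that saves energy.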
70. Day 1: CS 150 Concepts/Skills/Abilities
- Basics of logic design (concepts)
- Sound design methodologies (concepts)
- Modern specification methods (concepts)
- Familiarity with full set of CAD tools (skills)
- Appreciation for differences and similarities (abilities) in hardware and software design
- Hands-on experience with non-trivial design
New ability: perform logic design with computer-aided design tools, validating that design via simulation, and mapping its implementation into programmable logic devices
Appreciating the advantages/disadvantages of hw vs. sw implementation
71. Broad Technology Trends
Moore's Law: transistors on a cost-effective chip double every 18 months
Bell's Law: a new computer class emerges every 10 years
- Today 1 million transistors per
http://www.youtube.com/watch?v=AlPqL7IUT6M
72. Go Forth and Design!
Go Bears