EECS 150 - Components and Design Techniques for Digital Systems Lec 26 - WrapUp


1
EECS 150 - Components and Design Techniques for
Digital Systems Lec 26 - WrapUp

David Culler
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~culler
http://inst.eecs.berkeley.edu/~cs150

http://www.youtube.com/watch?v=Tb2Q1GGEYA4
2
Announcements

Final Exam
TUESDAY, DECEMBER 18, 2007, 5-8 PM
Location: 106 STANLEY
Course Control Number: 26455
Final Exam Group: 15
TA office hours Tues AM
Review: Sunday 12/16, 5-7 @ 125 Cory
Project Partner forms into HW box
Project Presentations Friday as per SignUp
No lecture Thurs, no labs, no discussion
Office Hours
HW 10 in box Wed

3
Recall Day 1
4
Congratulations

You have accomplished a phenomenal task.

5
Day 1: What is EECS150 about?
6
Day 1: We Will Learn in EECS 150

Language of logic design
Logic optimization, state, timing, CAD tools
Concept of state in digital systems
Analogous to variables and program counters in software systems
Hardware system building
Datapath + control = digital systems
Hardware system design methodology
Hardware description languages: Verilog
Tools to simulate design behavior: output = function(inputs)
Logic compilers synthesize hardware blocks of our designs
Mapping onto programmable hardware (code generation)
Contrast with software design
Both map specifications to physical devices
Both must be flawless

7
Day 26: Ready to tackle ANY digital design
8
Tackling complex digital designs

Step 1: Decompose the system into a collection of subsystems
Each has top-down requirements and bottom-up constraints
Interconnected through interfaces
Often with particular protocols
Potentially different clock domains
Rate matching, buffering, timing
For example:

9
For Example

Encodings
Protocols
Synchronization
Commands
Formats
Specifications
Datasheets

Display
Camera (optional)
Video encoder

Audio
Hand input (limited)
10
Traversing Digital Design
CS61C
EE 40
11
In Each: Datapath and Control
(Figure labels: Datapath, Controller, Control Points)

Datapath: storage, FUs, and interconnect sufficient to perform the desired functions
Inputs are control points
Outputs are signals
Controller: state machine to orchestrate operation on the datapath
Based on desired function and signals

12
Tackling complex digital designs

Step 1: Decompose the system into a collection of subsystems
Each has top-down requirements and bottom-up constraints
Interconnected through interfaces
Often with particular protocols
Potentially different clock domains
Rate matching, buffering, timing
For Each Subsystem
Step 2: Design the Datapath

13
What makes Digital Systems tick?
Combinational Logic
clk
time
14
Register Transfer Level Descriptions

RTL comprises a set of register transfers with optional operators as part of the transfer.
Example:
regA ← regB
regC ← regA + regB
if (start == 1) regA ← regC
Personal style:
Use ; to separate transfers that occur on separate cycles.
Use , to separate transfers that occur on the same cycle.
Example (2 cycles):
regA ← regB, regB ← 0;
regC ← regA

A standard high-level representation for describing systems.
It follows from the fact that all synchronous digital systems can be described as a set of state elements connected by combinational logic (CL) blocks.
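
The difference between transfers in the same cycle and transfers in separate cycles can be sketched with a small Python model. This is an illustration, not part of the original slides: the helper `step` and the initial register values are assumptions. Every transfer in a cycle reads the register values from before the clock edge, and all destinations update together.

```python
def step(regs, transfers):
    """Apply one clock cycle of register transfers.

    `transfers` maps a destination register name to a function of the
    OLD register values, so every transfer in the cycle sees the state
    from before the clock edge and all of them commit together.
    """
    old = dict(regs)            # snapshot taken before the clock edge
    new = dict(regs)
    for dest, source in transfers.items():
        new[dest] = source(old)
    return new


# Hypothetical starting values, chosen only for illustration.
regs = {"regA": 1, "regB": 7, "regC": 0}

# Cycle 1: regA <- regB, regB <- 0 (same cycle)
regs = step(regs, {"regA": lambda r: r["regB"], "regB": lambda r: 0})
# Cycle 2: regC <- regA
regs = step(regs, {"regC": lambda r: r["regA"]})
print(regs)
```

Because the snapshot is taken first, a same-cycle swap of two registers also works in this model, which is the behavior edge-triggered registers give in hardware.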

15
A Register Transfer
C ← A: Sel ← 0, Ld ← 1
C ← B: Sel ← 1, Ld ← 1
(Figure: A and B feed a multiplexer selected by Sel onto the Bus; C loads from the Bus under Ld. Timing diagram signals: Clk, Sel, Ld. Events: A on Bus, B on Bus, Ld C from Bus.)
One of potentially many source regs goes on the bus to one or more destination regs.
Register transfer occurs on the clock.
16
Register Transfers - interconnect

Point-to-point connection
Dedicated wires
Muxes on inputs of each register
Common input from multiplexer
Load enables for each register
Control signals for multiplexer
Common bus with output enables
Output enables and load enables for each register

17
Data Path (Bit-slice)

Bit-slice concept: iterate to build n-bit wide datapaths
Data bit busses run through the slice

2 bits wide
1 bit wide
18
Approaching an ISA

Instruction Set Architecture
Defines set of operations, instruction format, hardware-supported data types, named storage, addressing modes, sequencing
Meaning of each instruction is described by RTL on architected registers and memory
Given technology constraints, assemble adequate datapath
Architected storage mapped to actual storage
Function units to do all the required operations
Possible additional storage (e.g., MAR, MBR, ...)
Interconnect to move information among regs and FUs
Map each instruction to sequence of RTLs
Collate sequences into symbolic controller STD
Lower symbolic STD to control points
Implement controller

19
Instruction Types

Data Manipulation
Add, subtract
Increment, decrement
Multiply
Shift, rotate
Immediate operands
Data Staging
Load/store data to/from memory
Register-to-register move
Control
Conditional/unconditional branches in program
flow
Subroutine call and return

20
Hardware Necessary To Implement Instructions

Standard FSM Elements
State register
Next-state logic
Output logic (datapath/control signaling)
Moore or synchronous Mealy machine to avoid loops unbroken by FF
Plus additional "control" registers (in DP)
Instruction register (IR)
Program counter (PC)
Inputs/Outputs
Outputs control elements of data path
Inputs from data path used to alter flow of
program (test if zero)

21
FSM Controller for CPU

Putting it all together and closing the loop: the famous instruction fetch/decode/execute cycle

22
Representing Numbers

What can be represented in N bits?
$2^N$ distinct symbols => values
Unsigned: 0 to $2^N - 1$
2s Complement: $-2^{N-1}$ to $2^{N-1} - 1$
ASCII: $-10^{(N/8-2)} - 1$ to $10^{(N/8-1)} - 1$
But, what about?
Very large numbers? (seconds/century) $3{,}155{,}760{,}000_{ten}$ ($3.15576_{ten} \times 10^{9}$)
Very small numbers? (secs/nanosecond) $0.000000001_{ten}$ ($1.0_{ten} \times 10^{-9}$)
Bohr radius ≈ 0.000000000052917710 m ($5.2917710 \times 10^{-11}$)
Rationals: 2/3 (0.666666666...)
Irrationals: $2^{1/2}$ (1.414213562373...)
Transcendentals: e (2.718...), $\pi$ (3.141...)

23
2s Complement Overflow
How can you tell an overflow occurred?
Add two positive numbers to get a negative number, or two negative numbers to get a positive number.
(Figure: two 4-bit 2s complement number wheels, running from 0000 = 0 through 0111 = +7 and from 1000 = -8 through 1111 = -1, one for each example.)
-7 - 2 = +7!
5 + 3 = -8!
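
The sign rule above can be checked with a short Python model of a 4-bit adder. This is a sketch, not from the slides; the function name `add_signed` and the default width are illustrative assumptions.

```python
def add_signed(a, b, bits=4):
    """Add two signed values the way a `bits`-wide 2s complement adder would.

    Returns (wrapped_result, overflow_flag).
    """
    mask = (1 << bits) - 1
    raw = (a + b) & mask                      # keep only the low `bits` bits
    sign_bit = 1 << (bits - 1)
    result = raw - (1 << bits) if raw & sign_bit else raw
    # Overflow: operands share a sign, but the result has the other sign.
    overflow = ((a >= 0) == (b >= 0)) and ((result >= 0) != (a >= 0))
    return result, overflow


print(add_signed(5, 3))     # wraps around to a negative value
print(add_signed(-7, -2))   # wraps around to a positive value
print(add_signed(3, -5))    # mixed signs never overflow
```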
24
Computer Arithmetic

Circuit design for unsigned addition
Full adder per bit slice
Delay limited by Carry Propagation
Ripple is algorithmically slow, but wires are
short
Carry select
Simple, resource-intensive
Excellent layout
Carry look-ahead
Excellent asymptotic behavior
Great at the board level, but wire length effects
are significant on chip
Digital number systems
How to represent negative numbers
Simple operations
Clean algorithmic properties
2s complement is most widely used
Circuit for unsigned arithmetic
Subtract by complement and carry in
Overflow when cin xor cout of sign-bit is 1

25
2s Complement Adder/Subtractor
$A - B = A + (-B) = A + \overline{B} + 1$
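
A bit-level sketch of this adder/subtractor in Python, assuming a ripple chain of full adders where a `subtract` control both inverts B (via XOR) and supplies the carry-in of 1. The names `full_adder` and `add_sub` are illustrative, not from the slides. Overflow is detected as the XOR of the carry into and out of the sign bit.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def add_sub(a, b, subtract, bits=4):
    """Ripple adder/subtractor on unsigned bit patterns.

    `subtract` is 0 for A + B and 1 for A - B.
    Returns (result_pattern, overflow).
    """
    carry = subtract                 # carry-in of 1 provides the "+ 1"
    result = 0
    carry_into_msb = 0
    for i in range(bits):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ subtract   # XOR inverts B when subtracting
        if i == bits - 1:
            carry_into_msb = carry
        s, carry = full_adder(a_i, b_i, carry)
        result |= s << i
    overflow = carry_into_msb ^ carry     # cin xor cout of the sign bit
    return result, overflow


print(add_sub(0b0101, 0b0011, 1))   # 5 - 3
print(add_sub(0b0101, 0b0011, 0))   # 5 + 3 overflows in 4 bits
```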
26
Combinational Multiplier: accumulation of partial products
                        A3    A2    A1    A0
                        B3    B2    B1    B0
                        A3B0  A2B0  A1B0  A0B0
                  A3B1  A2B1  A1B1  A0B1
            A3B2  A2B2  A1B2  A0B2
      A3B3  A2B3  A1B3  A0B3
S7    S6    S5    S4    S3    S2    S1    S0
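
The accumulation of partial products can be modeled in a few lines of Python. This is a behavioral sketch, not a gate-level description; `array_multiply` is a hypothetical name. Each partial-product bit is the AND of one bit of A and one bit of B, weighted by the sum of their positions.

```python
def array_multiply(a, b, bits=4):
    """Multiply two unsigned `bits`-wide values by summing partial products."""
    product = 0
    for j in range(bits):                 # one row per bit of B
        for i in range(bits):             # one column per bit of A
            partial = ((a >> i) & 1) & ((b >> j) & 1)   # AND gate: A_i B_j
            product += partial << (i + j)               # weight 2^(i+j)
    return product


print(array_multiply(13, 11))
```

In hardware the additions are carried out by the array of full adders; the Python `+=` stands in for that array.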
27
Another Representation
Building block: full adder + AND
4 x 4 array of building blocks
28
Digital Number Systems

Positional notation
$D_{n-1} D_{n-2} \ldots D_0$ represents $D_{n-1}B^{n-1} + D_{n-2}B^{n-2} + \cdots + D_0 B^0$, where $D_i \in \{0, \ldots, B-1\}$
2s Complement
$D_{n-1} D_{n-2} \ldots D_0$ represents $-D_{n-1}2^{n-1} + D_{n-2}2^{n-2} + \cdots + D_0 2^0$
MSB has negative weight
Binary point is effectively at the far right of the word

(Figure: 4-bit 2s complement number wheel, from 0000 = 0 through 0111 = +7 and from 1000 = -8 through 1111 = -1)
29
Circuits for Fixed-Point Arithmetic

Adders
Identical circuit
Position of the binary point is entirely in the interpretation
Be sure the interpretations match, i.e., binary points line up
Subtractors
Multipliers
Position of the binary point just as you learned by hand
Multiplying two n-bit numbers yields a 2n-bit result with binary point determined by the binary points of the inputs
$2^{-k} \times 2^{-m} = 2^{-k-m}$
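
A small worked example in Python of how the binary point moves through a fixed-point multiply. The helper names and the chosen values (1.5 with 2 fractional bits, 2.25 with 3) are illustrative assumptions. The integer multiplier is unchanged; only the interpretation of the result shifts to k + m fractional bits.

```python
def to_fixed(x, frac_bits):
    """Encode a real value as an integer with `frac_bits` fractional bits."""
    return round(x * (1 << frac_bits))


def fixed_mul(a, k, b, m):
    """Multiply fixed-point integers; the result has k + m fractional bits."""
    return a * b, k + m


def to_float(value, frac_bits):
    """Interpret an integer as fixed-point with `frac_bits` fractional bits."""
    return value / (1 << frac_bits)


a = to_fixed(1.5, 2)      # binary 1.10
b = to_fixed(2.25, 3)     # binary 10.010
p, frac = fixed_mul(a, 2, b, 3)
print(a, b, p, frac, to_float(p, frac))
```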

30
Let's build an FP function unit: mult
Ctrl?

31
What is the range of mantissas?
Adder(8)
Ctrl?
Multiplier(24)
-127
Unnorm?
Round
32
Cascaded Carry Lookahead
4-bit adders with internal carry lookahead.
A second-level carry lookahead unit extends lookahead to 16 bits.
One more level extends it to 64 bits.
33
Parallel Prefix (generalizing CLA)
70
B
A

Compute all the prefixes $F_i = F_{i-1} \text{ op } F_{i-2} \text{ op } \cdots \text{ op } F_0$
Assume associative and commutative
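
One way to picture parallel prefix as a generalization of carry lookahead is a logarithmic-depth scan applied to (generate, propagate) pairs. The Python below is a sketch under stated assumptions: it uses a Kogge-Stone style combining pattern, which the slides do not name, and the functions `prefix_scan`, `carry_op`, and `prefix_add` are hypothetical. The scan itself relies only on the operator being associative.

```python
def prefix_scan(items, op):
    """Inclusive prefix scan in ceil(log2(n)) combining levels.

    After the scan, out[i] = x_i op x_(i-1) op ... op x_0,
    where op(hi, lo) combines a higher-index group with a lower one.
    """
    out = list(items)
    dist = 1
    while dist < len(out):
        prev = out
        out = [prev[i] if i < dist else op(prev[i], prev[i - dist])
               for i in range(len(prev))]
        dist *= 2
    return out


def carry_op(hi, lo):
    """Combine (generate, propagate) of adjacent bit groups."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def prefix_add(a, b, bits=8):
    """Add two unsigned values, computing all carries with a prefix scan."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
          for i in range(bits)]
    prefixes = prefix_scan(gp, carry_op)
    total = 0
    carry = 0                              # carry into bit 0
    for i in range(bits):
        total |= (gp[i][1] ^ carry) << i   # sum bit = propagate xor carry-in
        carry = prefixes[i][0]             # group generate = carry out of bit i
    return total | (carry << bits)


print(prefix_add(200, 100))
```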

34
Basic Memory Subsystem Block Diagram
RAM/ROM naming convention:
32 X 8, "32 by 8" => 32 8-bit words
1M X 1, "1 meg by 1" => 1M 1-bit words
35
Typical SRAM Timing
OE determines direction: Hi = Write, Lo = Read
Writes are dangerous! Be careful!
Double signaling: OE Hi, WE Lo
(Timing diagram: Write Timing and Read Timing. Signals: D (Data In, High Z, Data Out, Junk), A (Write Address, Read Address), OE_L, WE_L.)
36
DRAM WRITE Timing
Every DRAM access begins at the assertion of RAS_L
2 ways to write: early or late v. CAS
(Figure: 256K x 8 DRAM with 9-bit address A, 8-bit data D, and control inputs RAS_L, CAS_L, WE_L, OE_L.)
(Timing diagram: DRAM WR Cycle Time. A carries Row Address, Col Address, and Junk; D carries Data In between Junk; WR Access Time is marked for each of the two cycles.)
Early Wr Cycle: WE_L asserted before CAS_L
Late Wr Cycle: WE_L asserted after CAS_L
37
DRAM with Column buffer
(Figure: 11 address bits A0-A10 feed the Row Decoder of a 2,048 x 2,048 storage array; a Word Line selects a row of Cells, which passes through the Sense Amps into the Column Latches and a MUX.)
Pull column into fast buffer storage
Access sequence of bits from there
38
Hamming Error Correcting Code

Use more parity bits to pinpoint bit(s) in error, so they can be corrected.
Example: Single error correction (SEC) on 4-bit data
Use 3 parity bits; with 4 data bits this results in a 7-bit code word
3 parity bits are sufficient to identify any one of 7 code word bits
Overlap the assignment of parity bits so that a single error in the 7-bit word can be corrected
Procedure: group parity bits so they correspond to subsets of the 7 bits
p1 protects bits 1,3,5,7 (bit 1 is on)
p2 protects bits 2,3,6,7 (bit 2 is on)
p3 protects bits 4,5,6,7 (bit 3 is on)

Bit position number:  1  2  3  4  5  6  7
Code word bit:        p1 p2 d1 p3 d2 d3 d4

p1: 001 = 1, 011 = 3, 101 = 5, 111 = 7 (base 10)
p2: 010 = 2, 011 = 3, 110 = 6, 111 = 7 (base 10)
p3: 100 = 4, 101 = 5, 110 = 6, 111 = 7 (base 10)

Note: number bits from left to right.
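
The parity grouping above can be exercised with a short Python sketch of the 7-bit code. Even parity is an assumption here (the slide does not state the parity sense), and the function names are illustrative. The three recomputed parity checks, read as a binary number, give the position of a single flipped bit, or 0 when no error is detected.

```python
def hamming74_encode(data):
    """data = [d1, d2, d3, d4] -> [p1, p2, d1, p3, d2, d3, d4] (even parity)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4      # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4      # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(word):
    """Return (corrected_word, syndrome); syndrome is the 1-based error position."""
    w = list(word)
    c1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    c2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    c3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = c1 + 2 * c2 + 4 * c3
    if syndrome:
        w[syndrome - 1] ^= 1       # flip the bit the checks point at
    return w, syndrome


code = hamming74_encode([1, 0, 1, 1])
damaged = list(code)
damaged[4] ^= 1                    # corrupt bit position 5 (d2)
print(code, damaged, hamming74_correct(damaged))
```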
39
Example: 8-bit SEC
(Figure: code word bit positions 1 through 12)

Takes four parity bits
In power-of-2 positions
Rest are the data bits
Bits with i in their address feed into parity calculation for $p_i$
What to do with bit 0?

40
Example: Ethernet CRC-32
Layer 7: Application (HTTP, FTP, DNS)
Layer 4: Transport (TCP, UDP)
Layer 3: Network (IP)
Layer 2: Data Link (Ethernet, 802.11b)
Layer 1: Physical
41
General Model of Synchronous Circuit

In general, for correct operation:
$T \ge \text{time}(clk \to Q) + \text{time}(CL) + \text{time}(setup)$, i.e. $T \ge \tau_{clk \to Q} + \tau_{CL} + \tau_{setup}$
for all paths.
How do we enumerate all paths?
Any circuit input or register output to any register input or circuit output.
"Setup time" for circuit outputs depends on what it connects to.
"clk-Q time" for circuit inputs depends on where it comes from.
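
Since the constraint must hold for all paths, the minimum clock period is set by the slowest one. A minimal Python sketch, with made-up delay values in picoseconds and a hypothetical path list (none of these numbers come from the slides):

```python
def min_clock_period(paths):
    """Smallest period T satisfying T >= clk_to_q + logic + setup on every path."""
    return max(p["clk_to_q"] + p["logic"] + p["setup"] for p in paths)


# Illustrative delays in picoseconds; not measurements of any real device.
paths = [
    {"name": "regA -> adder -> regC", "clk_to_q": 500, "logic": 3200, "setup": 300},
    {"name": "regB -> mux -> regA", "clk_to_q": 500, "logic": 1800, "setup": 300},
]

period_ps = min_clock_period(paths)
fmax_mhz = 1e6 / period_ps          # 1e12 ps per second / 1e6 Hz per MHz
print(period_ps, fmax_mhz)
```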
42
more Boolean Expressions to Logic Gates
NAND
NOR
XOR: $X \oplus Y$
XNOR: $X = Y$
(Figures: gate symbols with inputs X, Y and output Z)
X xor Y = X Y' + X' Y: X or Y but not both ("inequality", "difference")
X xnor Y = X Y + X' Y': X and Y are the same ("equality", "coincidence")
43
Gate Switching Behavior

Inverter

NAND gate

When does it start? How quickly does it switch?
44
Xilinx Virtex-E Floorplan

Configurable Logic Blocks
4-input function gens
buffers
flipflop

Input/Output Blocks
combinational, latch, and flipflop output
sampled inputs

Block RAM
4096 bits each
every 12 CLB columns

45
Limitations on Clock Rate

Logic Gate Delay
What are typical delay values?

Delays in flip-flops
Both times contribute to limiting the clock
period.

What must happen in one clock cycle for correct
operation?
Assuming perfect clock distribution (all
flip-flops see the clock at the same time)
All signals must be ready and setup before
rising edge of clock.

46
Timing Methodologies

Rules for interconnecting components and clocks
Guarantee proper operation of system when
strictly followed
Approach depends on building blocks used for
memory elements
Focus on systems with edge-triggered flip-flops
Found in programmable logic devices
Many custom integrated circuits focus on
level-sensitive latches
Basic rules for correct timing
(1) Correct inputs, with respect to time, are
provided to the flip-flops
(2) No flip-flop changes state more than once per
clocking event

47
Master-Slave Structure

Construct D flipflop from two D latches

48
Master-Slave Structure

Break flow by alternating clocks (like an
air-lock)
Use positive clock to latch inputs into one R-S
latch
Use negative clock to change outputs with another
R-S latch
View pair as one basic unit
master-slave flip-flop
twice as much logic
output changes a few gate delays after the
falling edge of clock but does not affect any
cascaded flip-flops

49
(neg) Edge-Triggered Flip-Flops

More efficient solution: only 6 gates
Sensitive to inputs only near edge of clock signal (not while high)

(Figure: negative edge-triggered D flip-flop (D-FF); one internal latch holds D' when clock goes low, the other holds D when clock goes low)
4-5 gate delays
Must respect setup and hold time constraints to successfully capture input
Characteristic equation: Q(t+1) = D
50
Two-phase non-overlapping clocks

Sequential elements partition into two classes
phase0 elements feed phase1
phase1 elements feed phase0
Approximate single phase: each register replaced by a pair of latches on two phases
Can push logic across (retiming)
Can always slow down the clocks to meet all timing constraints

(Figure labels: a, b, c/l, in, clk0, clk1)
51
Tackling complex digital designs

Step 1: Decompose the system into a collection of subsystems
Each has top-down requirements and bottom-up constraints
Interconnected through interfaces
Often with particular protocols
Potentially different clock domains
Rate matching, buffering, timing
For Each Subsystem
Step 2: Design the Datapath
Step 3: Design the Controller

52
In Each: Datapath and Control
(Figure labels: Datapath, Controller, Control Points)

Datapath: storage, FUs, and interconnect sufficient to perform the desired functions
Inputs are control points
Outputs are signals
Controller: state machine to orchestrate operation on the datapath
Based on desired function and signals

53
Review Two Kinds of FSMs

Moore Machine vs Mealy Machine

Moore: Output(t) = G(state(t))
Mealy: Output(t) = G(state(t), Input)
Next state (both): state(t+1) = F(state(t), input(t))
(Figures: in each machine, Input and state feed Combinational Logic. Moore state diagrams label nodes "State / out"; Mealy state diagrams label arcs "Input / Out".)
54
Review Finite State Machine Representations

States: determined by possible values in sequential storage elements
Transitions: change of state
Clock: controls when state can change by controlling storage elements
Sequential Logic
Sequences through a series of states
Based on sequence of values on input signals
Clock period defines elements of sequence

55
Review Formal Design Process
Logic equations from table: OUT = PS, NS = PS xor IN

Review of Design Steps
1. Circuit functional specification
2. State Transition Diagram
3. Symbolic State Transition Table
4. Encoded State Transition Table
5. Derive Logic Equations
6. Circuit Diagram
FFs for state
CL for NS and OUT

Circuit Diagram
XOR gate for ns calculation
DFF to hold present state
no logic needed for output

Take this seriously!
56
Moore Verilog FSM combinational part
always @(In or CurrentState) begin
  NextState = CurrentState;
  Out = 1'b0;
  case (CurrentState)
    STATE_Zero: begin // last input was a zero
      if (In) NextState = STATE_One1;
    end
    STATE_One1: begin // we've seen one 1
      if (In) NextState = STATE_Two1s;
      else NextState = STATE_Zero;
    end
    STATE_Two1s: begin // we've seen at least 2 ones
      Out = 1;
      if (~In) NextState = STATE_Zero;
    end
    default: begin // in case we reach a bad state
      Out = 1'bx;
      NextState = STATE_X;
    end
  endcase
end
57
Moore Verilog FSM state part
// Implement the state register
always @(posedge Clock) begin
  if (Reset) CurrentState <= STATE_Zero;
  else CurrentState <= NextState;
end
endmodule

Note: posedge Clock requires NONBLOCKING ASSIGNMENT.
Blocking Assignment <-> Combinational Logic
Nonblocking Assignment <-> Sequential Logic (Registers)
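
The behavior of this Moore machine can be traced cycle by cycle with a Python model. This is a sketch, not the course code: it assumes the machine starts in STATE_Zero and that STATE_Two1s returns to STATE_Zero on a 0 input, as its "at least 2 ones" comment suggests. The split into a combinational function and a state update mirrors the two always blocks.

```python
STATE_ZERO, STATE_ONE1, STATE_TWO1S = range(3)


def next_state_and_out(state, inp):
    """Combinational part: (NextState, Out) from CurrentState and In."""
    nxt, out = state, 0                  # defaults, as in the Verilog block
    if state == STATE_ZERO:
        if inp:
            nxt = STATE_ONE1
    elif state == STATE_ONE1:
        nxt = STATE_TWO1S if inp else STATE_ZERO
    elif state == STATE_TWO1S:
        out = 1                          # Moore output: depends on state only
        if not inp:
            nxt = STATE_ZERO
    return nxt, out


def run(inputs):
    """State part: one register update per clock edge, starting from reset."""
    state = STATE_ZERO
    outputs = []
    for inp in inputs:
        nxt, out = next_state_and_out(state, inp)
        outputs.append(out)              # output observed during this cycle
        state = nxt                      # register loads NextState at the edge
    return outputs


print(run([1, 1, 1, 0, 1, 1, 0]))
```

The output rises one cycle after the second consecutive 1 arrives, because a Moore output changes only after the state register has been updated.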
58
FSM Optimization

State Reduction
Motivation
lower cost
fewer flip-flops in one-hot implementations
possibly fewer flip-flops in encoded
implementations
more don't cares in NS logic
fewer gates in NS logic
Simpler to design with extra states, then reduce later.

Example: Odd parity checker.
Two machines - identical behavior.

59
Algorithmic Approach to State Minimization

Goal: identify and combine states that have equivalent behavior
Equivalent States:
Same output
For all input combinations, states transition to same or equivalent states
Algorithm Sketch:
1. Place all states in one set
2. Initially partition set based on output behavior
3. Successively partition resulting subsets based on next state transitions
4. Repeat (3) until no further partitioning is required
States left in the same set are equivalent
Polynomial time procedure
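
The algorithm sketch can be written as a short partition-refinement routine in Python. This is a minimal sketch for Moore machines, with hypothetical names (`minimize`, `_number`) and a made-up three-state odd parity checker containing one redundant state; it is not the course's reference implementation.

```python
def _number(labels):
    """Replace arbitrary block labels with small integers (first seen = 0)."""
    ids = {}
    return {s: ids.setdefault(label, len(ids)) for s, label in labels.items()}


def minimize(states, inputs, next_state, output):
    """Group equivalent states of a Moore machine by partition refinement."""
    # Steps 1-2: start from one set, split by output behavior.
    block_of = _number({s: output[s] for s in states})
    while True:
        # Step 3: split blocks whose members go to different blocks.
        signature = {
            s: (block_of[s],) + tuple(block_of[next_state[s][i]] for i in inputs)
            for s in states
        }
        refined = _number(signature)
        # Step 4: stop when a pass creates no new blocks.
        if len(set(refined.values())) == len(set(block_of.values())):
            break
        block_of = refined
    groups = {}
    for s in states:
        groups.setdefault(block_of[s], []).append(s)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical odd parity checker with a redundant "even" state S2.
states = ["S0", "S1", "S2"]
next_state = {
    "S0": {0: "S0", 1: "S1"},
    "S1": {0: "S1", 1: "S2"},
    "S2": {0: "S2", 1: "S1"},
}
output = {"S0": 0, "S1": 1, "S2": 0}
print(minimize(states, [0, 1], next_state, output))
```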

60
Minimized FSM

Implication Chart Method
Table of all pairs of states
1st: eliminate incompatible states based on outputs
Fill entry with implied equivalents based on next
state
Cross out cells where indexed chart entries are
crossed out

61
State Assignment Strategies

Possible Strategies
Sequential: just number states as they appear in the state table
Random: pick random codes
One-hot: use as many state bits as there are states (bit = 1 => state)
Output: use outputs to help encode states
Heuristic: rules of thumb that seem to work in most cases
No guarantee of optimality: another intractable problem

62
Tackling complex digital designs

Step 1: Decompose the system into a collection of subsystems
Each has top-down requirements and bottom-up constraints
Interconnected through interfaces
Often with particular protocols
Potentially different clock domains
Rate matching, buffering, timing
For Each Subsystem
Step 2: Design the Datapath
Step 3: Design the Controller
Step 4: Compose them back together

63
Design Process
(Flow: Specification -> Manual Design and Coding -> HDL -> RTL Synthesis -> Netlist -> Logic Optimization -> Netlist -> Physical Design -> Layout -> Implementation -> Final Product)

Start with Some Specification
This Class
Lab Write Ups
Industry
Contract Restrictions
High and Low-Level Specifications from Architects and Designers
Convert the Design to HDL
This Class
You design Microarchitecture
Write Verilog using components provided by the TAs or the Standard Library and also from scratch
Industry
Verilog or VHDL using standard components or previous designs
64
Design Process
(Flow: Specification -> Manual Design and Coding -> HDL -> RTL Synthesis -> Netlist -> Logic Optimization -> Netlist -> Physical Design -> Layout -> Implementation -> Final Product)

Convert HDL into RTL and Optimize Design
This Class
Synplify Pro
Industry
Other Synthesis tools
2- and Multi-Level Logic Optimization
Convert the Netlist into a Layout
This Class
Xilinx Map & PAR
Industry
Place and Route Tools
Technology Mapping
Convert Layout to Final Product
This Class
Download to Board... Configure FPGA
Industry
Send Layout to Fab
65
Testing

How do I know that what I designed is really what I got back?
Specification to HDL
Verification
Formal Verification
Simulation, such as ModelSim
HDL to Layout
Equivalence testing
Tool Verification

66
Fault Model

Simple example

Test Set
67
Really putting it together

Fault models are used to generate interesting input vectors and their corresponding output vectors.
A subset of these vectors is selected to make a sufficiently short sequence of tests with a reasonable amount of coverage.
Vectors are combined to create scan patterns that test for faults by using shift register tests or using the BIST engine.
At the fab, the sequence of test patterns is run on every wafer using a tester to sort the good chips from the bad chips.
After packaging the chip, another (similar) set of tests is run on the packaged chip.

68
55 W-hour battery stores the energy of 1/2 a
stick of dynamite.
If battery short-circuits, catastrophe is
possible ...
69
Controlling Energy Consumption: What Control Do You Have as a Designer?

Largest contributing component to CMOS power consumption is switching power

Factors influencing power consumption
n: total number of nodes in circuit
α: activity factor (probability of each node switching)
f: clock frequency (does this affect energy consumption?)
Vdd: power supply voltage
What control do you have over each factor?
How does each affect the total energy?
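
To explore those questions numerically, here is a hedged Python sketch using one common first-order model of switching power, P = α · n · C · Vdd² · f. The per-node capacitance C is an added assumption (the slide does not list it), some formulations include a factor of 1/2, and the numbers are made up for illustration.

```python
def switching_power(n, alpha, f, vdd, c_node):
    """First-order switching power in watts (assumed model, see lead-in)."""
    return alpha * n * c_node * vdd ** 2 * f


def energy_per_cycle(n, alpha, vdd, c_node):
    """Energy per clock cycle in joules: power divided by frequency."""
    return alpha * n * c_node * vdd ** 2


# Illustrative values only: 1M nodes, 10% activity, 1 fF per node.
base = switching_power(n=1e6, alpha=0.1, f=1e8, vdd=1.0, c_node=1e-15)
half_vdd = switching_power(n=1e6, alpha=0.1, f=1e8, vdd=0.5, c_node=1e-15)
double_f = switching_power(n=1e6, alpha=0.1, f=2e8, vdd=1.0, c_node=1e-15)
print(base, half_vdd / base, double_f / base)
```

Under this model, halving Vdd cuts power to a quarter, while doubling f doubles power but leaves the energy per cycle unchanged, since the same work simply finishes sooner.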

70
Day 1: CS 150 Concepts/Skills/Abilities

Basics of logic design (concepts)
Sound design methodologies (concepts)
Modern specification methods (concepts)
Familiarity with full set of CAD tools (skills)
Appreciation for differences and similarities
(abilities) in hardware and software design
Hands-on experience with non-trivial design

New ability: perform logic design with computer-aided design tools, validating that design via simulation, and mapping its implementation into programmable logic devices
Appreciating the advantages/disadvantages of hw vs. sw implementation
71
Broad Technology Trends
Moore's Law: the number of transistors on a cost-effective chip doubles every 18 months
Bell's Law: a new computer class emerges every 10 years