EECS 150 - Components and Design Techniques for Digital Systems Lec 26 - WrapUp - PowerPoint PPT Presentation

Slides: 72
Provided by: Cul5

Transcript and Presenter's Notes



1
EECS 150 - Components and Design Techniques for
Digital Systems Lec 26 - WrapUp
  • David Culler
  • Electrical Engineering and Computer Sciences
  • University of California, Berkeley
  • http://www.eecs.berkeley.edu/culler
  • http://inst.eecs.berkeley.edu/cs150

http://www.youtube.com/watch?v=Tb2Q1GGEYA4
2
Announcements
  • Final Exam
  • TUESDAY, DECEMBER 18, 2007   5-8P
  • Location 106 STANLEY
  • Course Control Number 26455
  • Final Exam Group 15
  • TA office hours tues AM
  • Review Sunday 12/16 5-7 @ 125 Cory
  • Project Partner forms into HW box
  • Project Presentations Friday as per SignUp
  • No lecture thurs, no labs, no discussion
  • Office Hours
  • HW 10 in box wed

3
Recall Day 1
4
Congratulations
  • You have accomplished a phenomenal task.

5
Day 1: What is EECS150 about?
6
Day 1: We Will Learn in EECS 150
  • Language of logic design
  • Logic optimization, state, timing, CAD tools
  • Concept of state in digital systems
  • Analogous to variables and program counters in
    software systems
  • Hardware system building
  • Datapath + control = digital systems
  • Hardware system design methodology
  • Hardware description languages: Verilog
  • Tools to simulate design behavior: output =
    function(inputs)
  • Logic compilers synthesize hardware blocks of our
    designs
  • Mapping onto programmable hardware (code
    generation)
  • Contrast with software design
  • Both map specifications to physical devices
  • Both must be flawless

7
Day 26: Ready to tackle ANY digital design
8
Tackling complex digital designs
  • Step 1: Decompose the system into a collection of
    subsystems
  • Each has top-down requirements and bottom-up
    constraints
  • Interconnected through interfaces
  • Often with particular protocols
  • Potentially different clock domains
  • Rate matching, buffering, timing
  • For example

9
For Example
  • Encodings
  • Protocols
  • Synchronization
  • Commands
  • Formats
  • Specifications
  • Datasheets

(Figure: example system built from Display, Camera (optional), Video encoder, Audio, and Hand input (limited) subsystems)
10
Traversing Digital Design
CS61C
EE 40
11
In Each: Datapath and Control
(Figure: the Controller drives the Datapath through Control Points; the Datapath returns signals to the Controller)
  • Datapath: storage, FUs, and interconnect sufficient to
    perform the desired functions
  • Inputs are Control Points
  • Outputs are signals
  • Controller: state machine to orchestrate
    operation on the datapath
  • Based on desired function and signals

12
Tackling complex digital designs
  • Step 1: Decompose the system into a collection of
    subsystems
  • Each has top-down requirements and bottom-up
    constraints
  • Interconnected through interfaces
  • Often with particular protocols
  • Potentially different clock domains
  • Rate matching, buffering, timing
  • For Each Subsystem:
  • Step 2: Design the Datapath

13
What makes Digital Systems tick?
(Figure: combinational logic and a clk waveform over time)
14
Register Transfer Level Descriptions
  • RTL comprises a set of register transfers with
    optional operators as part of the transfer.
  • Example:
  • regA ← regB
  • regC ← regA + regB
  • if (start == 1) regA ← regC
  • Personal style:
  • Use ; to separate transfers that occur on
    separate cycles.
  • Use , to separate transfers that occur on the
    same cycle.
  • Example (2 cycles):
  • regA ← regB, regB ← 0;
  • regC ← regA;
  • A standard high-level representation for
    describing systems.
  • It follows from the fact that all synchronous
    digital systems can be described as a set of state
    elements connected by combinational logic (CL)
    blocks.
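The comma/semicolon convention above implies a specific semantics. Within one cycle, every right-hand side reads the register values from before the clock edge, and all destinations then load together. The following is a minimal Python sketch of that semantics. It assumes registers are held in a dictionary and each cycle is a list of (destination, function) pairs. `run_cycles` is a hypothetical helper, not part of any course tool.

```python
from typing import Callable, Dict, List, Tuple

# One register transfer: destination name plus a function of the pre-edge values.
Transfer = Tuple[str, Callable[[Dict[str, int]], int]]


def run_cycles(regs: Dict[str, int], cycles: List[List[Transfer]]) -> Dict[str, int]:
    """Apply each cycle's transfers as if they all happen on one clock edge."""
    regs = dict(regs)
    for transfers in cycles:
        old = dict(regs)                                    # values visible this cycle
        updates = {dst: fn(old) for dst, fn in transfers}   # evaluate every RHS first
        regs.update(updates)                                # then all destinations load
    return regs


# regA <- regB, regB <- 0;   cycle 1: both transfers read the old regB
# regC <- regA;              cycle 2: reads the regA loaded in cycle 1
program = [
    [("regA", lambda r: r["regB"]), ("regB", lambda r: 0)],
    [("regC", lambda r: r["regA"])],
]
final = run_cycles({"regA": 1, "regB": 7, "regC": 0}, program)
print(final)  # {'regA': 7, 'regB': 0, 'regC': 7}
```

Because each right-hand side reads `old`, writing regA ← regB, regB ← regA in a single cycle swaps the two registers. This matches what nonblocking assignments on a clock edge do in Verilog.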

15
A Register Transfer
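C ← A: Sel ← 0, Ld ← 1
C ← B: Sel ← 1, Ld ← 1
(Figure: registers A and B drive a shared Bus, selected by Sel through a decoder (Sel0, Sel1); register C loads from the Bus when Ld is asserted at the Clk edge. Timing: A on Bus, B on Bus, Ld C from Bus.)
One of potentially many source registers goes onto the bus, to one or more destination registers. The register transfer happens on the clock.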
16
Register Transfers - interconnect
  • Point-to-point connection
  • Dedicated wires
  • Muxes on inputs of each register
  • Common input from multiplexer
  • Load enables for each register
  • Control signals for multiplexer
  • Common bus with output enables
  • Output enables and load enables for each register

17
Data Path (Bit-slice)
  • Bit-slice concept: iterate to build n-bit wide
    datapaths
  • Data bit busses run through the slice

(Figure: a 1-bit-wide slice, and a 2-bit-wide datapath built from two slices)
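The following is a small Python model of the bit-slice idea. One full-adder slice is written once and iterated n times, with the carry rippling from slice to slice. The function names are illustrative, and bit lists are assumed to be LSB-first.

```python
from typing import List, Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One 1-bit slice: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_add(a_bits: List[int], b_bits: List[int], cin: int = 0) -> Tuple[List[int], int]:
    """Iterate the slice over n bits (index 0 = LSB); the carry ripples upward."""
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(x: int, n: int) -> List[int]:
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits: List[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


s, cout = ripple_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), cout)  # 1 1   (11 + 6 = 17 = 16 + 1)
```

Widening the datapath only changes how many times the slice is repeated. The same repetition is why ripple delay grows linearly with n.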
18
Approaching an ISA
  • Instruction Set Architecture
  • Defines set of operations, instruction format,
    hardware supported data types, named storage,
    addressing modes, sequencing
  • Meaning of each instruction is described by RTL
    on architected registers and memory
  • Given technology constraints, assemble adequate
    datapath
  • Architected storage mapped to actual storage
  • Function units to do all the required operations
  • Possible additional storage (e.g., MAR, MBR, ...)
  • Interconnect to move information among regs and
    FUs
  • Map each instruction to a sequence of RTLs
  • Collate sequences into a symbolic controller state
    transition diagram (STD)
  • Lower symbolic STD to control points
  • Implement controller

19
Instruction Types
  • Data Manipulation
  • Add, subtract
  • Increment, decrement
  • Multiply
  • Shift, rotate
  • Immediate operands
  • Data Staging
  • Load/store data to/from memory
  • Register-to-register move
  • Control
  • Conditional/unconditional branches in program
    flow
  • Subroutine call and return

20
Hardware Necessary To Implement Instructions
  • Standard FSM Elements
  • State register
  • Next-state logic
  • Output logic (datapath/control signaling)
  • Moore or synchronous Mealy machine to avoid loops
    unbroken by FF
  • Plus additional "control" registers (in DP)
  • Instruction register (IR)
  • Program counter (PC)
  • Inputs/Outputs
  • Outputs control elements of data path
  • Inputs from data path used to alter flow of
    program (test if zero)

21
FSM Controller for CPU
  • Putting it all together and closing the loop
  • The famous instruction fetch-decode-execute cycle

22
Representing Numbers
  • What can be represented in N bits?
  • $2^N$ distinct symbols => values
  • Unsigned: $0$ to $2^N - 1$
  • 2s Complement: $-2^{N-1}$ to $2^{N-1} - 1$
  • ASCII: $-10^{N/8-2} - 1$ to $10^{N/8-1} - 1$
  • But, what about?
  • Very large numbers? (seconds/century)
    3,155,760,000 ($3.15576 \times 10^{9}$)
  • Very small numbers? (secs/nanosecond)
    0.000000001 ($1.0 \times 10^{-9}$)
  • Bohr radius ≈ 0.000000000052917710 m
    ($5.2917710 \times 10^{-11}$ m)
  • Rationals: 2/3 (0.666666666...)
  • Irrationals: $2^{1/2}$ (1.414213562373...)
  • Transcendentals: e (2.718...), π (3.141...)

23
2s Complement Overflow
How can you tell an overflow occurred?
Add two positive numbers to get a negative
number or two negative numbers to get a positive
number
(Figure: two 4-bit 2s complement number wheels: 0000 to 0111 represent 0 to 7, and 1000 to 1111 represent -8 to -1)
-7 - 2 = +7!
5 + 3 = -8!
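The overflow rule on this slide can be checked mechanically. Here is a minimal Python sketch for 4-bit 2s complement addition, with operands assumed to lie in -8..7. The function name `add4` is hypothetical.

```python
from typing import Tuple


def add4(x: int, y: int) -> Tuple[int, bool]:
    """Add two 4-bit 2s complement values; return (result, overflow).

    Rule from the slide: adding two numbers of the same sign and getting
    a result of the opposite sign means overflow occurred.
    """
    raw = (x + y) & 0xF                        # keep only 4 bits, like the hardware
    result = raw - 16 if raw & 0x8 else raw    # bit 3 carries weight -8
    overflow = (x >= 0) == (y >= 0) and (result >= 0) != (x >= 0)
    return result, overflow


print(add4(-7, -2))  # (7, True)
print(add4(5, 3))    # (-8, True)
print(add4(5, -3))   # (2, False)
```

Operands of opposite sign can never overflow, which is why the check only looks at same-sign pairs.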
24
Computer Arithmetic
  • Circuit design for unsigned addition
  • Full adder per bit slice
  • Delay limited by Carry Propagation
  • Ripple is algorithmically slow, but wires are
    short
  • Carry select
  • Simple, resource-intensive
  • Excellent layout
  • Carry look-ahead
  • Excellent asymptotic behavior
  • Great at the board level, but wire length effects
    are significant on chip
  • Digital number systems
  • How to represent negative numbers
  • Simple operations
  • Clean algorithmic properties
  • 2s complement is most widely used
  • Circuit for unsigned arithmetic
  • Subtract by complement and carry in
  • Overflow when cin xor cout of sign-bit is 1

25
2s Complement Adder/Subtractor
A - B = A + (-B) = A + B' + 1
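Below is a behavioral sketch of the adder/subtractor. XOR gates conditionally complement B, and the same `sub` control drives the carry-in, so the circuit forms A + B' + 1. Overflow is taken as cin xor cout of the sign-bit slice, as on the previous slide. Python is used only as a modeling language here; `add_sub` and `signed` are illustrative names.

```python
from typing import Tuple


def add_sub(a: int, b: int, sub: int, n: int = 4) -> Tuple[int, int, int]:
    """n-bit ripple adder/subtractor on bit patterns; returns (result, cout, overflow)."""
    carry = sub                       # carry-in is 1 when subtracting
    carry_into_msb = 0
    result = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ sub     # XOR gate: pass B or B'
        if i == n - 1:
            carry_into_msb = carry    # cin of the sign-bit slice
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    overflow = carry_into_msb ^ carry  # cin xor cout of the sign bit
    return result, carry, overflow


def signed(bits: int, n: int = 4) -> int:
    """Interpret an n-bit pattern as 2s complement."""
    return bits - (1 << n) if bits >> (n - 1) else bits


r, c, v = add_sub(0b0101, 0b0011, sub=1)   # 5 - 3
print(signed(r), v)                        # 2 0
```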
26
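Combinational Multiplier: accumulation of
partial products
Each partial product Ai Bj is an AND of one bit of A and one bit of B, with weight $2^{i+j}$. Output bit Sk sums column k (all Ai Bj with i + j = k) plus the carries from column k - 1:
S0: A0 B0
S1: A1 B0 + A0 B1
S2: A2 B0 + A1 B1 + A0 B2
S3: A3 B0 + A2 B1 + A1 B2 + A0 B3
S4: A3 B1 + A2 B2 + A1 B3
S5: A3 B2 + A2 B3
S6: A3 B3
S7: carry out of column 6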
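The column sums can be modeled directly. Each partial product is an AND of one A bit and one B bit, and each output bit comes from its column plus the carries from the column below. Here is a minimal Python sketch for unsigned operands; `array_multiply` is a hypothetical name.

```python
def array_multiply(a: int, b: int, n: int = 4) -> int:
    """Unsigned n x n multiply by summing partial products column by column."""
    columns = [0] * (2 * n)
    for i in range(n):
        for j in range(n):
            # AND gate: partial product Ai*Bj lands in column i + j
            columns[i + j] += ((a >> i) & 1) & ((b >> j) & 1)
    product, carry = 0, 0
    for k in range(2 * n):
        total = columns[k] + carry
        product |= (total & 1) << k   # output bit S_k
        carry = total >> 1            # everything else moves to column k + 1
    return product


print(array_multiply(13, 11))  # 143
```

In hardware, each column sum is built from the full adders of the 4 x 4 array rather than from integer addition. The bit weights are the same.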
27
Another Representation
Building block full adder and
4 x 4 array of building blocks
28
Digital Number Systems
  • Positional notation
  • $D_{n-1} D_{n-2} \ldots D_0$ represents
    $D_{n-1} B^{n-1} + D_{n-2} B^{n-2} + \cdots + D_0 B^0$, where $D_i \in \{0, \ldots, B-1\}$
  • 2s Complement
  • $D_{n-1} D_{n-2} \ldots D_0$ represents
    $-D_{n-1} 2^{n-1} + D_{n-2} 2^{n-2} + \cdots + D_0 2^0$
  • MSB has negative weight
  • Binary point is effectively at the far right
    of the word

(Figure: 4-bit 2s complement number wheel: 0000 to 0111 represent 0 to 7, and 1000 to 1111 represent -8 to -1)
29
Circuits for Fixed-Point Arithmetic
  • Adders
  • identical circuit
  • Position of the binary point is entirely in the
    interpretation
  • Be sure the interpretations match
  • i.e. binary points line up
  • Subtractors
  • Multipliers
  • Position of the binary point just as you learned
    by hand
  • Multiplying two n-bit numbers yields a 2n-bit result, with the
    binary point determined by the binary points of the
    inputs
  • $2^{-k} \times 2^{-m} = 2^{-(k+m)}$
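Here is a short Python sketch of the two rules above. It assumes unsigned fixed-point values are stored as raw integers together with a count of fraction bits. `fractions.Fraction` is used only to print exact values, and the helper names are illustrative.

```python
from fractions import Fraction
from typing import Tuple


def value(raw: int, frac_bits: int) -> Fraction:
    """Exact value of an unsigned fixed-point number with frac_bits bits after the point."""
    return Fraction(raw, 2 ** frac_bits)


def fixed_add(a: int, ka: int, b: int, kb: int) -> Tuple[int, int]:
    """Line up the binary points (shift the operand with fewer fraction bits), then add."""
    k = max(ka, kb)
    return (a << (k - ka)) + (b << (k - kb)), k


def fixed_mul(a: int, ka: int, b: int, kb: int) -> Tuple[int, int]:
    """Multiply the raw integers; the product has ka + kb fraction bits."""
    return a * b, ka + kb


# 10.1b = 2.5 (raw 5, 1 fraction bit) and 1.01b = 1.25 (raw 5, 2 fraction bits)
print(value(*fixed_add(5, 1, 5, 2)))  # 15/4, i.e. 3.75
print(value(*fixed_mul(5, 1, 5, 2)))  # 25/8, i.e. 3.125
```

The adder circuit itself is unchanged; only the alignment shift and the interpretation of the result depend on where the binary points sit.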



30
Let's build an FP function unit: mult
(Figure: FP multiplier datapath and its control (Ctrl?))

31
What is the range of mantissas?
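(Figure: FP multiplier datapath: an 8-bit exponent adder with a -127 bias correction, a 24-bit mantissa multiplier, normalization of an unnormalized product (Unnorm?), and rounding; control (Ctrl?))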
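One way to make the FP multiplier datapath concrete is a bit-level model of IEEE-754 single precision, with an 8-bit exponent (bias 127) and a 24-bit mantissa that includes a hidden 1. This is a sketch under stated assumptions:

- Operands and results are normal numbers only (no zeros, denormals, infinities, NaNs, or exponent overflow).
- The rounding mode is round-to-nearest-even.

The function names are hypothetical. `struct` is used only to move between Python floats and bit patterns.

```python
import struct


def f32_bits(x: float) -> int:
    """IEEE-754 single-precision bit pattern of x (after rounding x to float32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]


def fp_mul(a: int, b: int) -> int:
    """Multiply two float32 bit patterns (normal inputs and results only)."""
    sign = (a ^ b) & 0x80000000
    ea, eb = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    ma = (a & 0x7FFFFF) | 0x800000      # restore the hidden 1: 24-bit mantissa
    mb = (b & 0x7FFFFF) | 0x800000
    exp = ea + eb - 127                 # exponent adder, remove one copy of the bias
    prod = ma * mb                      # 48-bit product in [2**46, 2**48)
    if prod >> 47:                      # mantissa product >= 2.0: normalize by one
        shift = 24
        exp += 1
    else:
        shift = 23
    mant = prod >> shift                # keep 24 significant bits
    rem = prod & ((1 << shift) - 1)     # bits shifted out
    half = 1 << (shift - 1)
    if rem > half or (rem == half and mant & 1):   # round to nearest, ties to even
        mant += 1
        if mant >> 24:                  # rounding carried out of the mantissa
            mant >>= 1
            exp += 1
    return sign | (exp << 23) | (mant & 0x7FFFFF)


r = fp_mul(f32_bits(1.5), f32_bits(-2.75))
print(bits_f32(r))  # -4.125
```

The product of two 24-bit mantissas lies in [1, 4) once scaled, so normalization needs at most a one-bit right shift. A second adjustment is needed only when rounding carries out of the mantissa.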
32
Cascaded Carry Lookahead
4-bit adders with internal carry lookahead; a second-level
carry lookahead unit extends lookahead to 16 bits; one more
level extends it to 64 bits
33
Parallel Prefix (generalizing CLA)
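  • Compute all the prefixes $F_i = x_i \;\mathrm{op}\; x_{i-1} \;\mathrm{op}\; \cdots \;\mathrm{op}\; x_0$, for every i
  • Assume op is associative (commutativity is not
    required; the carry operator is not commutative)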
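Carry computation is the standard use of parallel prefix. The Python sketch below assumes a (generate, propagate) pair per bit and a Kogge-Stone style schedule. That schedule is one of several prefix networks, and the slide does not say which one it uses. Each level combines spans that are twice as far apart, so n bits need about log2(n) levels instead of n ripple steps. The carry-in is folded in as an extra pseudo-bit. Function names are illustrative.

```python
from typing import List, Tuple

GP = Tuple[int, int]  # (generate, propagate) for a contiguous span of bits


def combine(hi: GP, lo: GP) -> GP:
    """Carry operator; `hi` is the more significant span. Associative, not commutative."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def prefix_add(a: int, b: int, n: int, cin: int = 0) -> Tuple[int, int]:
    """n-bit adder whose carries come from a Kogge-Stone style parallel prefix."""
    # Element 0 is a pseudo-bit for the carry-in (g = cin, p = 0);
    # element i + 1 holds (g, p) for bit i.
    gp: List[GP] = [(cin, 0)]
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        gp.append((ai & bi, ai ^ bi))
    props = [p for _, p in gp[1:]]      # per-bit propagate, needed for the sum bits

    dist = 1
    while dist < len(gp):               # about log2(n + 1) levels
        gp = [combine(gp[i], gp[i - dist]) if i >= dist else gp[i]
              for i in range(len(gp))]
        dist *= 2
    # gp[i] now spans bit i-1 down to the carry-in, so its g is the carry into bit i
    carries = [g for g, _ in gp]        # carries[n] is the carry out

    s = 0
    for i in range(n):
        s |= (props[i] ^ carries[i]) << i
    return s, carries[n]


print(prefix_add(0b1011, 0b0110, 4))  # (1, 1): 11 + 6 = 17
```

Swapping the arguments of `combine` gives a different answer, which is why the network must preserve operand order even though it may regroup freely.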

34
Basic Memory Subsystem Block Diagram
RAM/ROM naming convention: 32 X 8, "32 by 8" => 32 8-bit words;
1M X 1, "1 meg by 1" => 1M 1-bit words
35
Typical SRAM Timing
OE determines direction: Hi = Write, Lo = Read. Writes are dangerous! Be careful!
Double signaling: OE Hi, WE Lo
(Figure: SRAM write timing and read timing for A, OE_L, WE_L, and D: a write address with Data In, read addresses with Data Out, and High Z and junk intervals on D)
36
DRAM WRITE Timing
  • Every DRAM access begins at the assertion of RAS_L
  • 2 ways to write: early or late vs. CAS

(Figure: 256K x 8 DRAM with 9 address pins (A) and 8 data pins (D); waveforms for RAS_L, CAS_L, A (row address, then column address), OE_L, WE_L, and D (Data In), marking DRAM WR Cycle Time and WR Access Time)
Early Wr Cycle: WE_L asserted before CAS_L
Late Wr Cycle: WE_L asserted after CAS_L
37
DRAM with Column buffer
(Figure: 11 address bits A0-A10 feed a row decoder into a 2,048 x 2,048 storage array; each word line selects a row of cells, sense amps feed column latches, and a MUX selects the output bits)
Pull column into fast buffer storage; access a sequence of bits from there.
38
Hamming Error Correcting Code
  • Use more parity bits to pinpoint bit(s) in error,
    so they can be corrected.
  • Example: single error correction (SEC) on 4-bit
    data
  • Use 3 parity bits with 4 data bits, resulting in a
    7-bit code word
  • 3 parity bits are sufficient to identify any one of 7
    code word bits
  • Overlap the assignment of parity bits so that a
    single error in the 7-bit word can be corrected
  • Procedure: group parity bits so they correspond
    to subsets of the 7 bits
  • p1 protects bits 1,3,5,7 (bit 1 of the position number is on)
  • p2 protects bits 2,3,6,7 (bit 2 is on)
  • p3 protects bits 4,5,6,7 (bit 3 is on)
  • Bit position:  1  2  3  4  5  6  7
  • Assignment:    p1 p2 d1 p3 d2 d3 d4
  • Position numbers in binary:
  • p1: 001 (1), 011 (3), 101 (5), 111 (7)
  • p2: 010 (2), 011 (3), 110 (6), 111 (7)
  • p3: 100 (4), 101 (5), 110 (6), 111 (7)

Note: number bits from left to right.
39
Example 8 bit SEC
(Figure: 12-bit code word, positions 1 through 12)
  • Takes four parity bits
  • In power-of-2 positions
  • Rest are the data bits
  • Bits whose position (address) has bit i set feed
    into the parity calculation for pi
  • What to do with bit 0?
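The construction from these two slides generalizes to any data width. Parity bits sit in the power-of-two positions, and the parity bit at position 2^k covers every position whose number has bit k set.

The minimal Python sketch below makes these assumptions:

- It uses even parity; the slides do not say even or odd.
- It numbers positions 1, 2, 3, ... from the left.
- It does not model the extra overall-parity bit that the "bit 0" question hints at.

Function names are hypothetical.

```python
from typing import List


def hamming_encode(data: List[int]) -> List[int]:
    """Single-error-correcting Hamming code word for any number of data bits."""
    m = len(data)
    r = 0
    while (1 << r) < m + r + 1:   # r checks must name m + r positions plus "no error"
        r += 1
    n = m + r
    word = [0] * (n + 1)          # word[pos] is the bit at position pos; word[0] unused
    data_bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):       # not a power of two, so it is a data position
            word[pos] = next(data_bits)
    for k in range(r):
        p = 1 << k
        # even parity over every position whose number has bit k set
        word[p] = sum(word[pos] for pos in range(1, n + 1) if pos & p) % 2
    return word[1:]


def hamming_correct(code: List[int]) -> List[int]:
    """The syndrome (XOR of the positions holding a 1) is 0 for a valid word
    and equals the position of a single flipped bit otherwise."""
    word = [0] + list(code)
    syndrome = 0
    for pos in range(1, len(word)):
        if word[pos]:
            syndrome ^= pos
    if syndrome:
        word[syndrome] ^= 1
    return word[1:]


code = hamming_encode([1, 0, 1, 1])  # d1 d2 d3 d4
print(code)                          # [0, 1, 1, 0, 0, 1, 1] = p1 p2 d1 p3 d2 d3 d4
bad = list(code)
bad[4] ^= 1                          # flip position 5 (d2)
print(hamming_correct(bad) == code)  # True
print(len(hamming_encode([0] * 8)))  # 12: four parity bits for 8 data bits
```

Corrupting position 5 makes the checks for p1 and p3 fail. The syndrome is therefore 101 (5), and that bit is flipped back.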

40
Example Ethernet CRC-32
(Figure: protocol layers: 7 Application (HTTP, FTP, DNS); 4 Transport (TCP, UDP); 3 Network (IP); 2 Data Link (Ethernet, 802.11b); 1 Physical)
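The slide only names Ethernet's CRC-32, so here is a bit-serial Python sketch of that checksum in its common reflected form. It uses polynomial 0x04C11DB7, processed LSB-first as 0xEDB88320, with an initial value and final XOR of 0xFFFFFFFF. Each inner-loop step is one shift of a linear-feedback shift register, which is how the check is built in hardware. The result is compared with Python's `zlib.crc32`, which computes the same CRC-32. The on-the-wire bit ordering of the Ethernet frame check sequence is not modeled.

```python
import zlib


def crc32(data: bytes) -> int:
    """Bit-serial CRC-32 (reflected form of polynomial 0x04C11DB7)."""
    crc = 0xFFFFFFFF                      # register preset to all 1s
    for byte in data:
        crc ^= byte
        for _ in range(8):                # one LFSR shift per bit
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # final complement


print(hex(crc32(b"123456789")))                 # 0xcbf43926
print(crc32(b"hello") == zlib.crc32(b"hello"))  # True
```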
41
General Model of Synchronous Circuit
  • In general, for correct operation
    T ≥ time(clk→Q) + time(CL) + time(setup), i.e. $T \ge \tau_{clk \to Q} + \tau_{CL} + \tau_{setup}$,
  • for all paths.
  • How do we enumerate all paths?
  • Any circuit input or register output to any
    register input or circuit output.
  • Setup time for circuit outputs depends on what
    it connects to.
  • clk-Q time for circuit inputs depends on where
    it comes from.
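Checking the inequality "for all paths" amounts to a longest-path computation over the combinational logic. The Python sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The circuit, node names, and delays (in ns) are hypothetical. To keep the sketch short, each capture point is keyed by the node that drives the register input.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def min_clock_period(start: Dict[str, float], gate_delay: Dict[str, float],
                     fanin: Dict[str, List[str]], setup: Dict[str, float]) -> float:
    """Smallest T with T >= clk->Q + CL + setup over every path.

    start:      launch nodes (register outputs, circuit inputs) -> clk->Q delay
    gate_delay: combinational node -> propagation delay
    fanin:      combinational node -> nodes driving it
    setup:      node driving a register input (or circuit output) -> setup time
    """
    arrival = dict(start)  # time after the clock edge at which each node is valid
    for node in TopologicalSorter(fanin).static_order():
        if node in gate_delay:  # launch nodes already have their arrival time
            arrival[node] = max(arrival[src] for src in fanin[node]) + gate_delay[node]
    return max(arrival[node] + t_setup for node, t_setup in setup.items())


# Hypothetical circuit: g1 = f(R1, R2), g2 = f(g1, R2); g1 and g2 feed registers.
T = min_clock_period(
    start={"R1": 1.0, "R2": 1.0},
    gate_delay={"g1": 2.0, "g2": 1.5},
    fanin={"g1": ["R1", "R2"], "g2": ["g1", "R2"]},
    setup={"g1": 0.5, "g2": 0.5},
)
print(T)  # 5.0 (path R1 -> g1 -> g2: 1.0 + 2.0 + 1.5 + 0.5)
```

Visiting nodes in topological order computes each worst-case arrival time once. This avoids listing every path explicitly, which can be exponential in the circuit size.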
42
More Boolean Expressions to Logic Gates
  • NAND
  • NOR
  • XOR: $X \oplus Y$
  • XNOR: $\overline{X \oplus Y}$

X xor Y = X Y' + X' Y: X or Y but not both
("inequality", "difference")
X xnor Y = X Y + X' Y': X and Y are the same
("equality", "coincidence")
(Figure: gate symbols for each function, with inputs X and Y and output Z)
43
Gate Switching Behavior
  • Inverter
  • NAND gate

When does it start? How quickly does it switch?
44
Xilinx Virtex-E Floorplan
  • Configurable Logic Blocks
  • 4-input function gens
  • buffers
  • flipflop
  • Input/Output Blocks
  • combinational, latch, and flipflop output
  • sampled inputs
  • Block RAM
  • 4096 bits each
  • every 12 CLB columns

45
Limitations on Clock Rate
  • Logic Gate Delay
  • What are typical delay values?
  • Delays in flip-flops
  • Both times contribute to limiting the clock
    period.
  • What must happen in one clock cycle for correct
    operation?
  • Assuming perfect clock distribution (all
    flip-flops see the clock at the same time)
  • All signals must be ready and setup before
    rising edge of clock.

46
Timing Methodologies
  • Rules for interconnecting components and clocks
  • Guarantee proper operation of system when
    strictly followed
  • Approach depends on building blocks used for
    memory elements
  • Focus on systems with edge-triggered flip-flops
  • Found in programmable logic devices
  • Many custom integrated circuits focus on
    level-sensitive latches
  • Basic rules for correct timing
  • (1) Correct inputs, with respect to time, are
    provided to the flip-flops
  • (2) No flip-flop changes state more than once per
    clocking event

47
Master-Slave Structure
  • Construct D flip-flop from two D latches

(Figure: master and slave D latches in series, enabled on opposite phases of clk)
48
Master-Slave Structure
  • Break flow by alternating clocks (like an
    air-lock)
  • Use positive clock to latch inputs into one R-S
    latch
  • Use negative clock to change outputs with another
    R-S latch
  • View pair as one basic unit
  • master-slave flip-flop
  • twice as much logic
  • output changes a few gate delays after the
    falling edge of clock but does not affect any
    cascaded flip-flops

49
(neg) Edge-Triggered Flip-Flops
  • More efficient solution: only 6 gates
  • Sensitive to inputs only near edge of clock
    signal (not while high)

(Figure: negative edge-triggered D flip-flop (D-FF); one internal node holds D' when the clock goes low, another holds D when the clock goes low)
4-5 gate delays; must respect setup and hold time
constraints to successfully capture input
Characteristic equation: Q(t+1) = D
50
Two-phase non-overlapping clocks
  • Sequential elements partition into two classes
  • phase0 elements feed phase1
  • phase1 elements feed phase0
  • Approximate single phase: each register replaced
    by a pair of latches on two phases
  • Can push logic across (retiming)
  • Can always slow down the clocks to meet all
    timing constraints

(Figure: latches a and b clocked by clk0 and clk1, with combinational logic (c/l) between them, driven by input in)
51
Tackling complex digital designs
  • Step 1: Decompose the system into a collection of
    subsystems
  • Each has top-down requirements and bottom-up
    constraints
  • Interconnected through interfaces
  • Often with particular protocols
  • Potentially different clock domains
  • Rate matching, buffering, timing
  • For Each Subsystem:
  • Step 2: Design the Datapath
  • Step 3: Design the Controller

52
In Each: Datapath and Control
(Figure: the Controller drives the Datapath through Control Points; the Datapath returns signals to the Controller)
  • Datapath: storage, FUs, and interconnect sufficient to
    perform the desired functions
  • Inputs are Control Points
  • Outputs are signals
  • Controller: state machine to orchestrate
    operation on the datapath
  • Based on desired function and signals

53
Review Two Kinds of FSMs
  • Moore Machine vs Mealy
    Machine

Mealy: output(t) = G(state(t), input(t)); transitions labeled Input / Out
Moore: output(t) = G(state(t)); states labeled State / out
Both: state(t+1) = F(state(t), input(t)), computed by combinational logic feeding the state register
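To see the difference between G(state(t)) and G(state(t), input(t)), here is a Python sketch of a detector for two or more 1s in a row, written both ways. It has the same behavior as the Moore Verilog FSM in this lecture. The state names for the Mealy version are hypothetical. Both functions record the output produced during each cycle.

```python
from typing import Dict, List, Tuple

# Moore version: the output is a function of the state only.
MOORE_NEXT: Dict[Tuple[str, int], str] = {
    ("Zero", 0): "Zero", ("Zero", 1): "One1",
    ("One1", 0): "Zero", ("One1", 1): "Two1s",
    ("Two1s", 0): "Zero", ("Two1s", 1): "Two1s",
}


def moore(inputs: List[int]) -> List[int]:
    state, outs = "Zero", []
    for x in inputs:
        outs.append(1 if state == "Two1s" else 0)   # output(t) = G(state(t))
        state = MOORE_NEXT[(state, x)]              # state(t+1) = F(state(t), input(t))
    return outs


def mealy(inputs: List[int]) -> List[int]:
    # Mealy version: two states suffice because the output also sees the input.
    state, outs = "Zero", []
    for x in inputs:
        outs.append(1 if state == "SawOne" and x == 1 else 0)  # G(state(t), input(t))
        state = "SawOne" if x == 1 else "Zero"                  # F(state(t), input(t))
    return outs


seq = [0, 1, 1, 1, 0, 1, 1, 0]
print(moore(seq))  # [0, 0, 0, 1, 1, 0, 0, 1]
print(mealy(seq))  # [0, 0, 1, 1, 0, 0, 1, 0]
```

The Mealy version needs one fewer state and asserts its output in the same cycle as the second 1. The Moore output appears one cycle later, so `moore(seq)[1:] == mealy(seq)[:-1]` for this input.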
54
Review Finite State Machine Representations
  • States determined by possible values in
    sequential storage elements
  • Transitions change of state
  • Clock controls when state can change by
    controlling storage elements
  • Sequential Logic
  • Sequences through a series of states
  • Based on sequence of values on input signals
  • Clock period defines elements of sequence

55
Review Formal Design Process
Logic equations from table: OUT = PS, NS = PS xor IN
  • Review of Design Steps
  • 1. Circuit functional specification
  • 2. State Transition Diagram
  • 3. Symbolic State Transition Table
  • 4. Encoded State Transition Table
  • 5. Derive Logic Equations
  • 6. Circuit Diagram
  • FFs for state
  • CL for NS and OUT
  • Circuit Diagram
  • XOR gate for ns calculation
  • DFF to hold present state
  • no logic needed for output

Take this seriously!
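The equations OUT = PS and NS = PS xor IN translate directly into a cycle-by-cycle model: one D flip-flop for PS, one XOR gate for NS, and no output logic. Here is a Python sketch that assumes the flip-flop resets to 0; the function name is hypothetical.

```python
from typing import List


def odd_parity_checker(bits: List[int]) -> List[int]:
    """OUT = PS, NS = PS xor IN; returns OUT before each edge and after the last one."""
    ps = 0                   # D flip-flop after reset
    outs = []
    for x in bits:
        outs.append(ps)      # no output logic: OUT is the flip-flop's Q
        ps = ps ^ x          # XOR gate computes the next state on the D input
    outs.append(ps)          # output after the final clock edge
    return outs


print(odd_parity_checker([1, 0, 1, 1]))  # [0, 1, 1, 0, 1]
```

An output of 1 means an odd number of 1s has been seen so far.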
56
Moore Verilog FSM: combinational part
    always @(In or CurrentState) begin
      NextState = CurrentState;
      Out = 1'b0;
      case (CurrentState)
        STATE_Zero: begin     // last input was a zero
          if (In) NextState = STATE_One1;
        end
        STATE_One1: begin     // we've seen one 1
          if (In) NextState = STATE_Two1s;
          else NextState = STATE_Zero;
        end
        STATE_Two1s: begin    // we've seen at least 2 ones
          Out = 1;
          if (~In) NextState = STATE_Zero;
        end
        default: begin        // in case we reach a bad state
          Out = 1'bx;
          NextState = STATE_X;
        end
      endcase
    end
57
Moore Verilog FSM: state part
    // Implement the state register
    always @(posedge Clock) begin
      if (Reset) CurrentState <= STATE_Zero;
      else CurrentState <= NextState;
    end
  endmodule
Note: posedge Clock requires NONBLOCKING ASSIGNMENT.
Blocking assignment <-> combinational logic; nonblocking assignment <-> sequential logic (registers)
58
FSM Optimization
  • State Reduction
  • Motivation
  • lower cost
  • fewer flip-flops in one-hot implementations
  • possibly fewer flip-flops in encoded
    implementations
  • more don't cares in NS logic
  • fewer gates in NS logic
  • Simpler to design with extra states, then reduce
    later.
  • Example: odd parity checker.
  • Two machines, identical behavior.

59
Algorithmic Approach to State Minimization
  • Goal: identify and combine states that have
    equivalent behavior
  • Equivalent States
  • Same output
  • For all input combinations, states transition to
    same or equivalent states
  • Algorithm Sketch
  • 1. Place all states in one set
  • 2. Initially partition set based on output
    behavior
  • 3. Successively partition resulting subsets based
    on next state transitions
  • 4. Repeat (3) until no further partitioning is
    required
  • states left in the same set are equivalent
  • Polynomial time procedure
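The partition-refinement sketch on this slide can be written out in a few lines of Python. This version assumes a Moore machine given as dictionaries. The four-state odd parity checker below (states S0-S3) is a hypothetical redundant machine, chosen so that it reduces to the two-state version.

```python
from typing import Dict, List, Set, Tuple


def minimize(states: List[str], output: Dict[str, int],
             next_state: Dict[Tuple[str, int], str], inputs: List[int]) -> List[Set[str]]:
    """Partition refinement for a Moore machine; returns blocks of equivalent states."""
    # Steps 1-2: start from one set and split it by output behavior.
    by_output: Dict[int, Set[str]] = {}
    for s in states:
        by_output.setdefault(output[s], set()).add(s)
    partition = list(by_output.values())
    while True:
        block_of = {s: i for i, block in enumerate(partition) for s in block}
        refined: List[Set[str]] = []
        for block in partition:
            # Step 3: split states whose next states land in different blocks.
            groups: Dict[Tuple[int, ...], Set[str]] = {}
            for s in block:
                key = tuple(block_of[next_state[(s, x)]] for x in inputs)
                groups.setdefault(key, set()).add(s)
            refined.extend(groups.values())
        if len(refined) == len(partition):  # Step 4: nothing split, so done
            return refined
        partition = refined


states = ["S0", "S1", "S2", "S3"]
output = {"S0": 0, "S1": 1, "S2": 0, "S3": 1}   # 1 = odd number of 1s seen
next_state = {("S0", 0): "S0", ("S0", 1): "S1",
              ("S1", 0): "S3", ("S1", 1): "S2",
              ("S2", 0): "S2", ("S2", 1): "S1",
              ("S3", 0): "S1", ("S3", 1): "S0"}
blocks = minimize(states, output, next_state, [0, 1])
print(sorted(sorted(b) for b in blocks))  # [['S0', 'S2'], ['S1', 'S3']]
```

Each pass of the loop only splits blocks, so it runs at most once per state. This bound is the source of the polynomial running time.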

60
Minimized FSM
  • Implication Chart Method
  • Table of all pairs of states
  • 1st: eliminate incompatible states based on
    outputs
  • Fill entry with implied equivalents based on next
    state
  • Cross out cells where indexed chart entries are
    crossed out

61
State Assignment Strategies
  • Possible Strategies
  • Sequential: just number states as they appear in
    the state table
  • Random: pick random codes
  • One-hot: use as many state bits as there are
    states (bit = 1 => state)
  • Output: use outputs to help encode states
  • Heuristic: rules of thumb that seem to work in
    most cases
  • No guarantee of optimality: another intractable
    problem

62
Tackling complex digital designs
  • Step 1: Decompose the system into a collection of
    subsystems
  • Each has top-down requirements and bottom-up
    constraints
  • Interconnected through interfaces
  • Often with particular protocols
  • Potentially different clock domains
  • Rate matching, buffering, timing
  • For Each Subsystem:
  • Step 2: Design the Datapath
  • Step 3: Design the Controller
  • Step 4: Compose them back together

63
Design Process
(Figure: design flow: Specification -> Manual Design and Coding -> HDL -> RTL Synthesis -> Netlist -> Logic Optimization -> Netlist -> Physical Design -> Layout -> Implementation -> Final Product)
  • Start with Some Specification
  • This Class
  • Lab Write Ups
  • Industry
  • Contract Restrictions
  • High and Low-Level Specifications from Architects
    and Designers
  • Convert the Design to HDL
  • This Class
  • You design Microarchitecture
  • Write Verilog using components provided by the
    TAs or the Standard Library and also from
    scratch
  • Industry
  • Verilog or VHDL using standard components or
    previous designs
64
Design Process
(Figure: design flow: Specification -> Manual Design and Coding -> HDL -> RTL Synthesis -> Netlist -> Logic Optimization -> Netlist -> Physical Design -> Layout -> Implementation -> Final Product)
  • Convert HDL into RTL and Optimize Design
  • This Class
  • Synplify Pro
  • Industry
  • Other Synthesis tools
  • 2 Multi-Level Logic Optimization
  • Convert the Netlist into a Layout
  • This Class
  • Xilinx Map, PAR
  • Industry
  • Place and Route Tools
  • Technology Mapping
  • Convert Layout to Final Product
  • This Class
  • Download to board, configure FPGA
  • Industry
  • Send Layout to Fab
65
Testing
  • How do I know that what I designed is really
    what I got back???
  • Specification to HDL
  • Verification
  • Formal Verification
  • Simulation, such as ModelSim
  • HDL to Layout
  • Equivalence testing
  • Tool Verification

66
Fault Model
  • Simple example

(Figure: simple example circuit and its Test Set)
67
Really putting it together
  • Fault Models are used to generate interesting
    input vectors and their corresponding output
    vectors
  • A subset of these vectors are selected to make a
    sufficiently short sequence of tests with a
    reasonable amount of coverage
  • Vectors are combined together to create scan
    patterns that test for faults using shift
    register tests or the BIST engine.
  • At the fab, the sequence of test patterns is run
    on every wafer, using a tester to sort the good
    chips from the bad chips.
  • After packaging, another (similar) set of
    tests is run on the packaged chip.
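Here is a toy version of the flow above, in Python. It uses a single stuck-at fault model on a hypothetical three-input circuit, out = (a AND b) OR c. It fault-simulates every input vector and then greedily chooses a short test set that still detects every fault. Real flows use ATPG tools, scan chains, and BIST; this only illustrates the idea, and the circuit and net names are assumptions.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

Fault = Tuple[str, int]            # (net name, stuck-at value)
Vector = Tuple[int, int, int]      # (a, b, c)
NETS = ["a", "b", "c", "n1", "out"]


def simulate(vec: Vector, fault: Optional[Fault] = None) -> int:
    """Hypothetical circuit out = (a AND b) OR c, with an optional stuck-at fault."""
    def net(name: str, value: int) -> int:
        return fault[1] if fault and fault[0] == name else value
    a, b, c = (net(name, v) for name, v in zip("abc", vec))
    n1 = net("n1", a & b)
    return net("out", n1 | c)


faults: List[Fault] = [(name, v) for name in NETS for v in (0, 1)]
vectors: List[Vector] = list(product((0, 1), repeat=3))

# Fault simulation: which faults does each input vector expose at the output?
detects: Dict[Vector, Set[Fault]] = {
    v: {f for f in faults if simulate(v, f) != simulate(v)} for v in vectors
}

# Greedy selection of a short test set covering every detectable fault
remaining = set().union(*detects.values())
tests: List[Vector] = []
while remaining:
    best = max(vectors, key=lambda v: len(detects[v] & remaining))
    tests.append(best)
    remaining -= detects[best]

print(len(faults), len(tests))  # 10 4
```

For this circuit, four vectors detect all ten single stuck-at faults, compared with eight for exhaustive testing. That is the "sufficiently short sequence with reasonable coverage" trade-off in miniature.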

68
55 W-hour battery stores the energy of 1/2 a
stick of dynamite.
If battery short-circuits, catastrophe is
possible ...
69
Controlling Energy Consumption: What Control Do
You Have as a Designer?
  • Largest contributing component to CMOS power
    consumption is switching power
  • Factors influencing power consumption
  • n: total number of nodes in circuit
  • α: activity factor (probability of each node
    switching)
  • f: clock frequency (does this affect energy
    consumption?)
  • Vdd: power supply voltage
  • What control do you have over each factor?
  • How does each affect the total energy?
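The factors on this slide combine in the standard first-order model of CMOS switching power, P = n · α · C · Vdd² · f. Here C, the average capacitance switched per node, is an added assumption not listed on the slide. Energy for a fixed task of N cycles is P · N / f = N · n · α · C · Vdd². To first order, f therefore changes power but not energy, while Vdd enters quadratically. Here is a Python sketch with hypothetical numbers:

```python
import math


def switching_power(n: int, alpha: float, c_node: float, vdd: float, f: float) -> float:
    """First-order CMOS switching power in watts: P = n * alpha * C * Vdd**2 * f.

    c_node is the average capacitance switched per node, in farads (assumed factor).
    """
    return n * alpha * c_node * vdd ** 2 * f


def task_energy(cycles: int, n: int, alpha: float, c_node: float, vdd: float, f: float) -> float:
    """Energy in joules for a task of `cycles` clock cycles: power * (cycles / f)."""
    return switching_power(n, alpha, c_node, vdd, f) * cycles / f


# Hypothetical design: 1e6 nodes, 10% activity, 10 fF per node
args = dict(n=10**6, alpha=0.1, c_node=10e-15)
fast = task_energy(10**6, vdd=1.2, f=100e6, **args)
slow = task_energy(10**6, vdd=1.2, f=50e6, **args)
low_v = task_energy(10**6, vdd=0.6, f=50e6, **args)
print(math.isclose(fast, slow))       # True: halving f halves power, not energy
print(math.isclose(low_v, fast / 4))  # True: halving Vdd quarters the energy
```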

70
Day 1: CS 150 Concepts/Skills/Abilities
  • Basics of logic design (concepts)
  • Sound design methodologies (concepts)
  • Modern specification methods (concepts)
  • Familiarity with full set of CAD tools (skills)
  • Appreciation for differences and similarities
    (abilities) in hardware and software design
  • Hands-on experience with non-trivial design

New ability: perform logic design with
computer-aided design tools, validating that
design via simulation, and mapping its
implementation into programmable logic devices
Appreciating the advantages/disadvantages of hw vs.
sw implementation
71
Broad Technology Trends
Moore's Law: transistors on a cost-effective chip
double every 18 months
Bell's Law: a new computer class emerges every 10
years
  • Today 1 million transistors per

http://www.youtube.com/watch?v=AlPqL7IUT6M
72
Go Forth and Design!
Go Bears