332:578 Deep Submicron VLSI Design, Lecture 17: Functional Units, Multiplication, and Division

1
332:578 Deep Submicron VLSI Design
Lecture 17: Functional Units, Multiplication, and Division
  • David Harris and Mike Bushnell
  • Harvey Mudd College and Rutgers University
  • Spring 2005

2
Outline
  • Unsigned vs. Signed Numbers
  • Boolean Operations
  • Error Correcting Codes
  • Multi-input Adders
  • Multipliers
  • Priority Encoders
  • Dividers
  • Summary

Material from CMOS VLSI Design, by Weste and
Harris, Addison-Wesley, 2005
3
Signed vs. Unsigned
  • For signed numbers, comparison is harder
  • C: carry out
  • Z: zero (all bits of A-B are 0)
  • N: negative (MSB of result)
  • V: overflow (inputs had different signs, output
    sign ≠ B)
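
The four flags can be sketched in Python for small word sizes. This is a minimal illustration, assuming the comparator computes B - A as B + ~A + 1 (an assumption, since the circuit figure is not preserved in this transcript; it is the reading under which the V rule above compares against the sign of B). The names `compare_flags`, `unsigned_le`, and `signed_le` are hypothetical.

```python
def compare_flags(a, b, n=4):
    """Return (C, Z, N, V) for B - A on n-bit patterns a and b."""
    mask = (1 << n) - 1
    total = b + (~a & mask) + 1      # B - A computed as B + ~A + 1 (assumed)
    result = total & mask
    c = total >> n                   # C: carry out of the MSB
    z = int(result == 0)             # Z: all result bits are 0
    neg = result >> (n - 1)          # N: MSB of the result
    sign_a, sign_b = a >> (n - 1), b >> (n - 1)
    v = int(sign_a != sign_b and neg != sign_b)  # V: signed overflow
    return c, z, neg, v


def to_signed(u, n=4):
    """Interpret an n-bit pattern as a two's complement integer."""
    return u - (1 << n) if u >> (n - 1) else u


def unsigned_le(a, b, n=4):
    c, _, _, _ = compare_flags(a, b, n)
    return c == 1                    # carry out (no borrow) means A <= B


def signed_le(a, b, n=4):
    _, _, neg, v = compare_flags(a, b, n)
    return neg == v                  # N xor V == 0 means A <= B
```

Unsigned comparison needs only C (and Z for strict inequality); signed comparison needs N and V together, which is why it is harder.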

4
Signed vs. Unsigned
5
Boolean Logical Operations
  • Use a MUX circuit

6
Circuit Operation
  • Assign different P values to get various Boolean
    operations
  • MUX between adder and Boolean unit or merge
    Boolean unit into adder as in TTL 181 ALU
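
A MUX-based Boolean unit can be modeled as a 4-entry lookup: the data inputs A and B drive the select lines and the control word P supplies the four MUX data inputs. The sketch below assumes P is packed as a 4-bit integer indexed by (A, B); that bit ordering and the names `boolean_unit`, `P_AND`, etc. are illustrative choices, not taken from the slide.

```python
def boolean_unit(a, b, p):
    """4:1 mux: bits a and b select one bit of the 4-bit control word p."""
    select = (a << 1) | b
    return (p >> select) & 1


# Control words (bit i of P is the output when (a, b) encodes i)
P_AND = 0b1000
P_OR = 0b1110
P_XOR = 0b0110
P_NAND = 0b0111
```

Changing P alone reprograms the function, so one small circuit covers all 16 two-input Boolean operations.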

7
Coding
  • Correct SRAM/DRAM soft errors
  • Due to α particles or cosmic rays
  • Reduce bit error rates of communication links
  • Parity tree example

8
Hamming Error Correcting Codes (ECCs)
  • Hamming distance Hd between 2 numbers: number of
    bit positions in which they differ
  • Add check bits to data words for ECC
  • Increase the Hd between legal code words
  • If an illegal code word is detected, the legal code
    word closest to it is the corrected word
  • Parity has Hd of 2: detects but cannot correct
    single-bit errors
  • Make Hd = 3: Hamming code of length $2^c - 1$ with c
    check bits and $N = 2^c - c - 1$ data bits

9
Code Generation Procedure
  • Number bits from 1 to $2^c - 1$
  • Each bit in a position that is a power of 2 is a
    check bit
  • Choose each check bit value to get even parity over
    all bits whose position number has a 1 in the same
    bit position as the check bit's position number
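
The procedure can be sketched directly in Python. This is a minimal illustration, not a production ECC: the function names are hypothetical, and the syndrome check (XOR of the positions of all 1 bits) is the standard way to locate a single flipped bit in a Hamming code, added here for completeness.

```python
def hamming_encode(data_bits, c):
    """Build a Hamming code word of length 2^c - 1 from 2^c - c - 1 data bits."""
    n = (1 << c) - 1
    assert len(data_bits) == n - c
    code = [0] * (n + 1)              # index 0 unused; positions run 1..n
    data = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):           # not a power of 2, so a data position
            code[pos] = next(data)
    for k in range(c):
        check = 1 << k                # check bit lives at position 2^k
        parity = 0
        for pos in range(1, n + 1):
            if pos & check:           # position number has a 1 in bit k
                parity ^= code[pos]
        code[check] = parity          # makes the covered group even parity
    return code[1:]


def hamming_syndrome(code):
    """XOR of the positions holding a 1; 0 means no single-bit error."""
    syndrome = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= pos
    return syndrome
```

With c = 3 and data bits 1, 0, 1, 1, the code word is 0 1 1 0 0 1 1 (positions 1 to 7). Flipping any one bit makes the syndrome equal to that bit's position, so the error can be corrected.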

10
Gray Codes
  • Binary-reflected code
  • Start with all 0 and keep flipping the right-most
    bit that gives a new string
  • Use to save power in finite state machines:
    successive states follow Gray code
  • Use also to synchronize counters across clock
    domains
  • Receiver gets either the current or the previous
    value because only 1 bit changes per clock
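
The flipping rule above can be checked against the usual closed form for the binary-reflected code, g = i XOR (i >> 1). The sketch below is illustrative; the function names are hypothetical.

```python
def gray_by_flipping(n):
    """Start at all 0s; keep flipping the right-most bit that gives a new string."""
    code = 0
    seen = {0}
    sequence = [0]
    while True:
        for bit in range(n):
            candidate = code ^ (1 << bit)
            if candidate not in seen:
                break
        else:
            return sequence           # no single flip gives a new string
        code = candidate
        seen.add(code)
        sequence.append(code)


def gray_closed_form(n):
    """Binary-reflected Gray code via g = i ^ (i >> 1)."""
    return [i ^ (i >> 1) for i in range(1 << n)]
```

For n = 3 both give 000, 001, 011, 010, 110, 111, 101, 100. Consecutive entries differ in exactly one bit, which is the property the FSM and clock-domain uses rely on.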

11
Gray Code
12
Static XOR/XNOR Circuits
13
Static XOR/XNOR Circuit
  • Does not swing rail-to-rail

14
STATIC CMOS XOR
15
CPL XOR/XNOR Circuit
16
CVSL XOR/XNOR
17
Dynamic XOR/XNOR
  • Both true and complementary inputs needed
  • Violates monotonicity rule
  • Solutions:
  • Push XOR/XNOR to end of chain of Domino logic and
    build it as static logic
  • Use dual-rail Domino logic

18
Multi-input Adders
  • Suppose we want to add k N-bit words
  • Ex: 0001 + 0111 + 1101 + 0010 = _____

19
Multi-input Adders
  • Suppose we want to add k N-bit words
  • Ex: 0001 + 0111 + 1101 + 0010 = 10111

20
Multi-input Adders
  • Suppose we want to add k N-bit words
  • Ex: 0001 + 0111 + 1101 + 0010 = 10111
  • Straightforward solution: k-1 N-bit CPAs
  • Large and slow

21
Carry Save Addition
  • A full adder sums 3 inputs and produces 2 outputs
  • Carry output has twice the weight of the sum output
  • N full adders in parallel are called a carry-save
    adder (CSA)
  • Produce N sums and N carry outs

22
CSA Application
  • Use k-2 stages of CSAs
  • Keep result in carry-save redundant form
  • Final CPA computes actual result
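
A word-level sketch of carry-save addition follows. It models each CSA as bitwise XOR (sum) and majority (carry, shifted left one place for its doubled weight); Python's `+` stands in for the final CPA. Function names are hypothetical.

```python
def csa(x, y, z):
    """Carry-save adder: N full adders in parallel, no carry propagation."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1   # carries have twice the weight
    return sum_word, carry_word


def add_words(words):
    """Add k >= 3 words using k-2 CSA stages, then one final CPA."""
    s, c = csa(words[0], words[1], words[2])
    stages = 1
    for w in words[3:]:
        s, c = csa(s, c, w)           # result stays in carry-save redundant form
        stages += 1
    return s + c, stages              # '+' plays the role of the final CPA
```

For the running example, `add_words([0b0001, 0b0111, 0b1101, 0b0010])` uses 2 CSA stages and returns 0b10111. Only the last step propagates carries across the word.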

25
Multiplication
  • Example

31
Multiplication
  • Example
  • M × N-bit multiplication
  • Produce N M-bit partial products
  • Sum these to produce (M+N)-bit product

32
General Form
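  • Multiplicand: $Y = (y_{M-1}, y_{M-2}, \ldots, y_1, y_0)$
  • Multiplier: $X = (x_{N-1}, x_{N-2}, \ldots, x_1, x_0)$
  • Product: $P = \left(\sum_{j=0}^{M-1} y_j 2^j\right)\left(\sum_{i=0}^{N-1} x_i 2^i\right) = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} x_i y_j 2^{i+j}$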
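
As a small unsigned illustration of the general form, each multiplier bit x_i gates a copy of Y shifted left by i, and the product is the sum of those rows. The function names are hypothetical.

```python
def partial_products(y, x, n):
    """Return the n partial products of multiplicand y and n-bit multiplier x."""
    return [(y << i) if (x >> i) & 1 else 0 for i in range(n)]


def multiply(y, x, n):
    """Unsigned product as the sum of the partial products."""
    return sum(partial_products(y, x, n))
```

For example, `partial_products(0b1100, 0b0101, 4)` gives the rows 1100, 0, 110000, 0, which sum to 60 = 12 × 5.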

33
16X16 Mult. Dot Diagram
  • Each dot represents a bit

34
Array Multiplier
35
Rectangular Array
  • Squash array to fit rectangular floorplan

36
Optimizations
  • 1st row adds 1st partial product to pair of 0s
  • Change first CSA row to add 1st 3 partial
    products together
  • Reduces row count by 2 and reduces adder
    propagation delay
  • Can also use 1st row of CSAs to add one or two
    other inputs with no extra delay
  • Most common DSP operation: Y = A × B + C
  • Speed up by replacing bottommost row with CPA or
    lookahead or tree adder
  • Asymmetric circuit: some inputs have more logical
    effort than others

37
2's Complement Multiplication
  • 2 partial products have negative weight
  • Must be subtracted
  • Baugh-Wooley algorithm takes 2's comp. of terms
    to be subtracted
  • In example, AND gates replaced by NAND gates in
    hatched cells
  • Extra ones added in unused inputs to take correct
    2's complement
  • Use XORs to conditionally invert some of the
    terms to select between signed and unsigned
    multiplication
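
The NAND-plus-extra-ones idea can be sketched for an n × n multiplier. This follows the modified Baugh-Wooley formulation as it is commonly presented (an assumption, since the slide's figure is not preserved): partial-product bits that involve exactly one sign bit are inverted, and constant 1s are added in columns n and 2n-1. The function name is hypothetical.

```python
def baugh_wooley(x, y, n=4):
    """Multiply two n-bit two's complement patterns; return the 2n-bit product pattern."""
    x_bits = [(x >> i) & 1 for i in range(n)]
    y_bits = [(y >> j) & 1 for j in range(n)]
    total = 0
    for i in range(n):
        for j in range(n):
            bit = x_bits[i] & y_bits[j]
            # Terms with exactly one sign bit have negative weight: NAND instead of AND
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            total += bit << (i + j)
    # Extra 1s that complete the two's complements of the subtracted terms
    total += (1 << n) + (1 << (2 * n - 1))
    return total & ((1 << (2 * n)) - 1)
```

Every term is now added, so the same unsigned adder array sums the partial products; for example, `baugh_wooley(0b1111, 0b1111)` (that is, -1 × -1) returns 1.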

38
2's Comp. Multiplier
39
Simplified Partial Products
40
Modified Baugh-Wooley
41
Fewer Partial Products
  • Array multiplier requires N partial products
  • If we looked at groups of r bits, we could form
    N/r partial products.
  • Faster and smaller?
  • Called radix-$2^r$ encoding
  • Ex: r = 2: look at pairs of bits
  • Form partial products of 0, Y, 2Y, 3Y
  • First three are easy, but 3Y requires an adder

42
Booth Encoding
  • Instead of 3Y, try -Y, then increment next
    partial product to add 4Y
  • Similarly, for 2Y, try -2Y + 4Y in next partial
    product
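
Radix-4 Booth recoding can be sketched as follows. Each pair of multiplier bits, together with the top bit of the pair below it, becomes a digit in {-2, -1, 0, 1, 2}, so only 0, ±Y, and ±2Y are ever needed. The sketch assumes an n-bit two's complement multiplier with n even; the function names are hypothetical.

```python
def booth_radix4_digits(x, n):
    """Recode an n-bit two's complement multiplier (n even) into n/2 signed digits."""
    assert n % 2 == 0
    padded = (x & ((1 << n) - 1)) << 1    # append x_{-1} = 0 below the LSB
    digits = []
    for i in range(0, n, 2):
        triple = (padded >> i) & 0b111    # bits x_{i+1}, x_i, x_{i-1}
        x_hi = (triple >> 2) & 1
        x_mid = (triple >> 1) & 1
        x_lo = triple & 1
        digits.append(-2 * x_hi + x_mid + x_lo)
    return digits


def booth_multiply(y, x, n):
    """Sum the n/2 Booth partial products d_k * Y * 4^k."""
    return sum((d * y) << (2 * k) for k, d in enumerate(booth_radix4_digits(x, n)))
```

For the multiplier 0011 (3), the digits are [-1, 1]: add -Y in this partial product and +4Y via the next one, exactly the substitution described above.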

50
Booth Hardware
  • Booth encoder generates control lines for each PP
  • Booth selectors choose PP bits

X_i means add in Y; 2X_i means add in 2Y; M means
negate partial prod.
51
Sign Extension
  • Partial products can be negative
  • Require sign extension, which is cumbersome
  • High fanout on most significant bit

52
Simplified Sign Ext.
  • Sign bits are either all 0s or all 1s
  • Note that all 0s is all 1s + 1 in the proper column
  • Use this to reduce loading on MSB

53
Even Simpler Sign Ext.
  • No need to add all the 1s in hardware
  • Precompute the answer!
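
One common way to realize this trick is sketched below, under stated assumptions: each partial product is an m-bit two's complement field, the sum is kept to w bits, and the sign-extended value equals the field with its sign bit inverted plus a string of 1s from the sign column upward. Because those strings of 1s do not depend on the data, their total can be precomputed as a single constant. The function names and parameters are hypothetical.

```python
def sign_ext_constant(shifts, m, w):
    """Precomputed sum (mod 2^w) of every partial product's string of 1s."""
    return sum(-(1 << (m - 1 + sh)) for sh in shifts) % (1 << w)


def sum_partial_products(pps, shifts, m, w):
    """Sum signed m-bit partial products without sign-extending any of them."""
    total = sign_ext_constant(shifts, m, w)
    for value, sh in zip(pps, shifts):
        field = value & ((1 << m) - 1)               # m-bit two's complement field
        total += (field ^ (1 << (m - 1))) << sh      # invert sign bit, no extension
    return total % (1 << w)
```

The sign bit of each partial product then drives a single inverter instead of fanning out across all the upper columns.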

54
Advanced Multiplication
  • Signed vs. unsigned inputs
  • Higher radix Booth encoding
  • Array vs. tree CSA networks

55
Wallace Tree Multiplication
  • CSA is effectively a ones counter
  • Called a (3,2) counter: converts 3 inputs into
    count encoded as 2 outputs

56
Dot Diagram of Array Mult.
57
Wallace Tree
  • Sum partial products in parallel
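
A word-level sketch of tree reduction follows: at each level, the remaining words are grouped in threes and each group passes through one CSA, until only two words are left for the final CPA. Real Wallace trees are built column by column on the dot diagram, so this is a simplified model; the function names are hypothetical.

```python
def csa(x, y, z):
    """(3,2) counter applied bitwise: returns (sum word, carry word)."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def wallace_reduce(words):
    """Reduce a list of words to two, counting CSA levels."""
    levels = 0
    while len(words) > 2:
        next_words = []
        full = len(words) - len(words) % 3
        for i in range(0, full, 3):
            next_words.extend(csa(*words[i:i + 3]))   # CSAs in one level run in parallel
        next_words.extend(words[full:])               # leftover words pass through
        words = next_words
        levels += 1
    return words, levels
```

Sixteen partial products shrink as 16, 11, 8, 6, 4, 3, 2, so the tree needs 6 CSA levels where a linear array would chain 14.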

58
Example Wallace Tree
59
Original Wallace Tree
60
Use 4,2 Compressor
  • Also called (5,3) counter

61
Better Wallace Tree
62
Hybrid Multiplication
  • Tradeoff
  • Arrays offer regular layout
  • Wallace Trees have fewer levels of CSAs but less
    regular layout
  • Hybrids give tradeoffs
  • Odd/even arrays
  • Arrays of arrays
  • Balanced delay trees
  • Overturned-staircase trees
  • Have as few levels of logic as Wallace trees but
    with more regular wiring

63
Fused Multiply-Add
  • DSP frequently requires computation of P = X × Y
    + Z
  • Can do with multiplier and adder
  • Better to do it with fused multiply-add unit
  • Ordinary multiplier modified to have another
    partial product Z

64
Serial Multiplication
  • Serial: multiplies M-bit multiplicand and N-bit
    multiplier in N × M clocks; used in wrist
    watches
  • Half-parallel: use this
  • Multiplies M-bit multiplicand and N-bit
    multiplier in N clocks
  • Widely used in DSP units
  • Needs an M-bit adder and an (M+N)-bit shift
    register
  • Obtains final product after N steps
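
The half-parallel scheme can be sketched as a shift-and-add loop. This is a minimal model assuming the usual arrangement, which the transcript's figures do not preserve: the shift register starts with the multiplier in its low N bits, and on each clock the multiplicand is added into the upper part if the LSB is 1, then the register shifts right. The function name is hypothetical.

```python
def half_parallel_multiply(y, x, m, n):
    """Multiply M-bit multiplicand y by N-bit multiplier x in N steps."""
    assert 0 <= y < (1 << m) and 0 <= x < (1 << n)
    reg = x                        # low N bits hold the multiplier
    for _ in range(n):
        if reg & 1:
            reg += y << n          # M-bit add into the upper part of the register
        reg >>= 1                  # shift right; the adder's carry shifts in at the top
    return reg                     # (M+N)-bit product after N steps
```

For y = 3 and x = 3 with M = N = 2, the register goes 0011, 0111, 1001, ending at 9.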

65
Half-Parallel Multiplier
66
Multiplication Steps
67
Priority Encoders
  • Also a prefix computation
  • Arbitrate among N units requesting a shared
    resource
  • A_i: unit i requests service
  • Logic

68
Prefix Equations
Bitwise precomputation, group logic, output logic
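
The three stages can be sketched as follows, under the assumption that the lowest-numbered requester has the highest priority (the slide's equations are not preserved in this transcript). The prefix here is computed serially with `itertools.accumulate`; a hardware implementation would use a tree. The function name is hypothetical.

```python
from itertools import accumulate
from operator import and_


def priority_encode(requests):
    """Grant the lowest-indexed active request; return a one-hot (or all-0) list."""
    not_req = [1 - a for a in requests]              # bitwise precomputation
    none_up_to = list(accumulate(not_req, and_))     # group logic: AND-prefix
    grants = [requests[0]]
    for i in range(1, len(requests)):
        grants.append(requests[i] & none_up_to[i - 1])   # output logic
    return grants
```

For example, `priority_encode([0, 0, 1, 0, 1])` returns `[0, 0, 1, 0, 0]`: unit 2 wins and the later request from unit 4 is masked.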
69
Priority Encoder Trees
70
Priority Encoder Trees
71
Other Prefix Computations
  • Incrementers
  • Decrementers
  • 2's complement circuits
  • Modified priority encoders

72
Wheeler's Division Algorithm
  • Invert the divisor with hardware that converges
    in 3 iterations
  • Multiply dividend by inverted divisor using
    Wallace tree
  • Better than ordinary division algorithms, which
    are usually serial subtraction
  • x is a positive normalized fraction
  • p is an approximation to 1/x
  • Set $a_1 = px$ and $b_1 = p$ and iterate

73
Wheeler Division
  • Converges quadratically: $a_n \to 1$ and $b_n \to 1/x$
  • Inspect 1st 6 digits of x and then generate p
  • Use part of Wallace tree to compute this
    approximation
  • Get 40-bit reciprocal in only 3 steps
  • Sometimes necessary to extend pseudo-adders in
    Wallace tree to guarantee accuracy
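
The iteration itself is not preserved in this transcript. The sketch below assumes the standard quadratically convergent update, multiplying both a_n and b_n by (2 - a_n), which keeps b_n / a_n = 1/x fixed while squaring the error 1 - a_n at each step. It also assumes "first 6 digits" means the first 6 fraction bits of x and that 0.5 <= x < 1. The function names are hypothetical, and exact fractions are used only to make the error easy to check.

```python
from fractions import Fraction
from math import floor


def initial_guess(x):
    """Approximate 1/x from the first 6 fraction bits of x (0.5 <= x < 1)."""
    bucket_mid = Fraction(floor(x * 64), 64) + Fraction(1, 128)
    return 1 / bucket_mid


def reciprocal(x, steps=3):
    """Return (a_n, b_n) after the given number of iterations."""
    p = initial_guess(x)
    a, b = p * x, p                  # a1 = p*x, b1 = p
    for _ in range(steps):
        factor = 2 - a               # assumed update rule (not shown in the transcript)
        a, b = a * factor, b * factor
    return a, b
```

Starting from an error below 2^-6, three steps give errors below 2^-12, 2^-24, and 2^-48, which is consistent with the 40-bit figure quoted above.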

74
Summary
  • Unsigned vs. Signed numbers
  • Boolean operations
  • Error Correcting Codes
  • Multi-input Adders
  • Carry Save Addition
  • Multipliers
  • Booth Encoding
  • Array vs. Tree CSA Networks
  • Priority Encoders
  • Dividers