Projects - PowerPoint PPT Presentation

About This Presentation
Title:

Projects

Description:

Projects & Design Multipliers – PowerPoint PPT presentation

Slides: 69
Provided by: Smit1194
Tags: projects | radix | tree


Transcript and Presenter's Notes

Title: Projects


1
Projects Design
  • Multipliers

2
A decimal multiply-add primitive
Project 1
3
The multiplier-cell
proj1.mo
load (modl (lib/decimal/mul_cell, lib/array/mul_add))
mul_cell.mo
declare (domain ((dec/, 0, 9), (hect/, 0, 99)),
         domain_fn (dec/, hect/))
MODEL mul_cell (X, Y, N, W, E, S),
  Z X dec/ Y hect/ N E,
  S remainder (Z, 10),
  W quotient (Z, 10),
ENDMOD
(Figure labels: Local, Inputs, Outputs)
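The cell can be sketched in Python as a rough illustration. The arithmetic operators were lost from the transcript, so this sketch assumes Z = X * Y + N + E, which is the only combination that fits the declared domains (digits 0..9, local value 0..99). The function signature is hypothetical.

```python
def mul_cell(x, y, n, e):
    """Decimal multiplier cell: returns (w, s) for digit inputs 0..9."""
    for digit in (x, y, n, e):
        assert 0 <= digit <= 9, "inputs are in the dec/ domain 0..9"
    z = x * y + n + e   # assumed local value, at most 9*9 + 9 + 9 = 99
    s = z % 10          # remainder (Z, 10): digit passed south
    w = z // 10         # quotient (Z, 10): carry digit passed west
    return w, s

print(mul_cell(7, 8, 5, 3))   # z = 64, so this prints (6, 4)
```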
4
The mul-add primitive
load (modl (lib/decimal/mul_cell, lib/array/mul_add))
MODEL array/mul_add (N3, Y3, N2, Y2, N1, Y1,
                     E1, X1, P1,
                     E2, X2, P2,
                     E3, X3, P3,
                     P4, P5, P6),
  <model-body>,
ENDMOD
5
The mul-add primitive
MODEL array/mul_add (N3, Y3, N2, Y2, N1, Y1,
                     E1, X1, P1,
                     E2, X2, P2,
                     E3, X3, P3,
                     P4, P5, P6),
            X   Y   N    W    E    S
  mul_cell (X1, Y1, N1,  W11, E1,  P1),
  mul_cell (X1, Y2, N2,  W12, W11, S12),
  mul_cell (X1, Y3, N3,  W13, W12, S13),
  mul_cell (X2, Y1, S12, W21, E2,  P2),
  mul_cell (X2, Y2, S13, W22, W21, S22),
  mul_cell (X2, Y3, W13, W23, W22, S23),
  mul_cell (X3, Y1, S22, W31, E3,  P3),
  mul_cell (X3, Y2, S23, W32, W31, P4),
  mul_cell (X3, Y3, W23, P6,  W32, P5),
ENDMOD
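To see that this wiring multiplies two three-digit decimal numbers, the netlist can be transcribed into Python. This is a sketch only: the cell arithmetic (Z = X * Y + N + E) is an assumption, and the tuple-based interface and the helper `to_int` are hypothetical.

```python
def mul_cell(x, y, n, e):
    z = x * y + n + e            # assumed cell arithmetic
    return z // 10, z % 10       # (W, S)

def mul_add(x, y, n=(0, 0, 0), e=(0, 0, 0)):
    """x, y, n, e hold three decimal digits each, least significant first.
    Returns the digits (P1, ..., P6), wired as in the MODEL above."""
    x1, x2, x3 = x
    y1, y2, y3 = y
    n1, n2, n3 = n
    e1, e2, e3 = e
    w11, p1 = mul_cell(x1, y1, n1, e1)
    w12, s12 = mul_cell(x1, y2, n2, w11)
    w13, s13 = mul_cell(x1, y3, n3, w12)
    w21, p2 = mul_cell(x2, y1, s12, e2)
    w22, s22 = mul_cell(x2, y2, s13, w21)
    w23, s23 = mul_cell(x2, y3, w13, w22)
    w31, p3 = mul_cell(x3, y1, s22, e3)
    w32, p4 = mul_cell(x3, y2, s23, w31)
    p6, p5 = mul_cell(x3, y3, w23, w32)
    return p1, p2, p3, p4, p5, p6

def to_int(digits):
    return sum(d * 10 ** i for i, d in enumerate(digits))

print(to_int(mul_add((3, 2, 1), (6, 5, 4))))   # 123 x 456 = 56088
```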
6
The top-model
load (modl (lib/decimal/mul_cell, lib/array/mul_add))
MODEL top (),
  input (X3, X2, X1, '"\n", Y3, Y2, Y1),
  array/mul_add (0, Y3, 0, Y2, 0, Y1,
                 0, X1, P1,
                 0, X2, P2,
                 0, X3, P3,
                 P4, P5, P6),
  print ('"\n", X3, X2, X1, '" x ", Y3, Y2, Y1, '" ",
         P6, P5, P4, P3, P2, P1, '"\n\n"),
ENDMOD
FUNCTION proj1 (),
  sim (top ()),
ENDFUN
proj1.mo
proj1 is undefined, load (modl (proj1))
proj1 ()
simulation ()
7
Structured models
MODEL m (N1, N2,
         W1, W2, W3,
         E1, E2, E3,
         S1, S2),
  <model-body>,
ENDMOD
Note the grouping order of the arguments
8
Feed-through wires
MODEL M (N1, N2,
         feed, W2, W3,
         feed, E2, feed,
         S1, S2),
  <model-body>,
ENDMOD
Model arguments are local variables
9
Horizontal-abutment
abut (<new-model>, m m)
Internal nodes
10
The new MODEL created
MODEL M12 (N1, N2, N1, N2,
           W1, W2, W3,
           E1, E2, E3,
           S1, S2, S1, S2),
  <model-body m1>,
  <model-body m2>,
ENDMOD
S1 ? S1
11
Vertical abutment
abut (<new-model>, m
m)
12
Reorientation of a MODEL
arrange (n, m)
13
3x3 u-binary multiplier-adder
Project 2
load (modl ('lib/array/abut/mul_cell, 'lib/array/abut/mul_add))
MODEL top (),
  input (X3, X2, X1, '"\n", Y3, Y2, Y1, '"\n",
         N3, N2, N1, '"\n", E3, E2, E1, '"\n"),
  array/abut/mul_add (N3, Y3, N2, Y2, N1, Y1,
                      E1, X1, P1,
                      E2, X2, P2,
                      E3, X3, P3,
                      P6, P5, P4),
  output (P6, P5, P4, P3, P2, P1, '"\n"),
ENDMOD
FUNCTION proj2 (),
  sim (top ()),
ENDFUN
14
The multiplier-cell
MODEL full_add (a, b, cin, sum, cout),
  sum a /bit/xor b xor cin,
  cout a /bit/and cin /bit/or b /bit/and cin or a /bit/and b,
ENDMOD
full_add.mo
load (modl ('lib/full_add))
MODEL array/abut/mul_cell (north, Y,
                           west, X, feed,
                           east, X, south,
                           feed, Y),
  prod X /bit/and Y,
  full_add (north, prod, east, south, west),
ENDMOD
mul_cell.mo
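The binary cell is an AND gate feeding a full adder. A minimal Python sketch of the two models follows; the return-value convention is hypothetical, and the assignment operators missing from the transcript are assumed to be plain assignments.

```python
def full_add(a, b, cin):
    """One-bit full adder: returns (sum, cout)."""
    s = a ^ b ^ cin
    cout = (a & cin) | (b & cin) | (a & b)
    return s, cout

def mul_cell(x, y, north, east):
    """Binary multiplier cell: returns (south, west)."""
    prod = x & y                         # partial product bit
    south, west = full_add(north, prod, east)
    return south, west

# The cell adds three one-bit values: south + 2 * west == x*y + north + east.
print(mul_cell(1, 1, 1, 0))   # (0, 1)
```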
15
multiplier-adder abutment
MODEL west (, , feed, X, feed, ), false, ENDMOD
MODEL east (, gnd, X, P, X, P, ), gnd 0, ENDMOD
MODEL south (P, Y, , , P), false, ENDMOD
abut (array/abut/mul_add,
      ((3 (west (3 array/abut/mul_cell)))
       (corner (3 south))))
16
The complete mul_add primitive
abut (array/abut/mul_add, ((3
(west (3 array/abut/mul_cell)))
(corner (3 south))))
17
Timing analysis array-multiplier
Project 3
The model mul_add works for any radix !
load (modl ('lib/timing/mul_cell, 'lib/array/mul_add))
FUNCTION proj3 (),
  timing (array/mul ('X3, 'X2, 'X1,
                     'Y3, 'Y2, 'Y1,
                     P6, P5, P4, P3, P2, P1)),
ENDFUN
proj3.mo
The file mul_add.mo contains a mul_add primitive
as well as a multiplier !
Inputs are symbolic constants !
18
mul_cell and full-adder
load (modl ('lib/timing/full_add))
delay ('/bit/and) 0n5
MODEL mul_cell (X, Y, north, west, east, south),
  prod X /bit/and Y,
            a,     b,    cin,  sum,   cout
  full_add (north, prod, east, south, west),
ENDMOD
mul_cell.mo
19
Timing model for the full-adder
delay ('Fsum) 1n7
delay ('Fcarry) 1n07
MODEL full_add (a, b, cin, sum, cout),
  sum Fsum (a, b, cin),
  cout Fcarry (a, b, cin),
ENDMOD
full_add.mo
Dedicated delay model
20
Worst-case signal propagation
[Figure: worst-case signal propagation through the array; legend: tand, tsum, tcarry; cells labelled 1 to 9]
delay = tand + tcarry + n * tsum + m * tcarry
21
Critical expressions generated
                                                     time valid
mul/mul_cell/prod  'X1 bit/and 'Y1,                  500p
mul/W11            Fcarry (, mul/mul_cell/prod),     1n57
mul/W12            Fcarry (, mul/W11),               2n64
mul/S13            Fsum (, mul/W12),                 4n34
mul/S12            Fsum (, mul/W11),                 3n27
mul/W21            Fcarry (, mul/S12),               4n34
mul/W22            Fcarry (, mul/S13, mul/W21),      5n41
mul/S23            Fsum (, mul/W22),                 7n11
mul/S22            Fsum (, mul/S13, mul/W21),        6n04
mul/W31            Fcarry (, mul/S22),               7n11
mul/W32            Fcarry (, mul/S23, mul/W31),      8n18
P5                 Fsum (, mul/W32),                 9n88
22
The critical path
delay = tand + tcarry + n * tsum + m * tcarry
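The 9n88 figure can be reproduced with a small arrival-time propagation over the array netlist. This is an illustrative sketch, not the timing tool used in the slides: the tuple encoding of the cells is hypothetical, primary inputs and the constant N/E inputs are assumed valid at t = 0, and times are kept in integer picoseconds to avoid rounding.

```python
T_AND, T_SUM, T_CARRY = 500, 1700, 1070   # picoseconds: 0n5, 1n7, 1n07

# (N, W, E, S) wiring of the nine cells; None marks an input valid at t = 0.
ARRAY = [
    (None,  "W11", None,  "P1"),
    (None,  "W12", "W11", "S12"),
    (None,  "W13", "W12", "S13"),
    ("S12", "W21", None,  "P2"),
    ("S13", "W22", "W21", "S22"),
    ("W13", "W23", "W22", "S23"),
    ("S22", "W31", None,  "P3"),
    ("S23", "W32", "W31", "P4"),
    ("W23", "P6",  "W32", "P5"),
]

def arrival_times(cells):
    t = {}
    for north, west, east, south in cells:   # cells are listed in dependency order
        ready = max(T_AND, t.get(north, 0), t.get(east, 0))
        t[south] = ready + T_SUM             # sum output of the full adder
        t[west] = ready + T_CARRY            # carry output of the full adder
    return t

times = arrival_times(ARRAY)
worst = max(times, key=times.get)
print(worst, times[worst])                   # P5 9880

n = m = 3
assert times[worst] == T_AND + T_CARRY + n * T_SUM + m * T_CARRY
```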
23
Timing modified array-multiplier
Project 4
East inputs are moved to the north-side
Carry-signals are running diagonally
24
Timing modified array-multiplier
load (modl ('lib/modified/mul_add, 'lib/timing/mul_cell))
FUNCTION proj4 (),
  timing (modified/mul ('X3, 'X2, 'X1,
                        'Y3, 'Y2, 'Y1,
                        P6, P5, P4, P3, P2, P1)),
ENDFUN
25
The modified array-multiplier
MODEL modified/mul (X3, X2, X1,
                    Y3, Y2, Y1,
                    P6, P5, P4, P3, P2, P1),
            X   Y   N    W    E    S
  mul_cell (X1, Y1, 0,   W11, 0,   P1),
  mul_cell (X1, Y2, 0,   W12, 0,   S12),
  mul_cell (X1, Y3, 0,   W13, 0,   S13),
  mul_cell (X2, Y1, S12, W21, W11, P2),
  mul_cell (X2, Y2, S13, W22, W12, S22),
  mul_cell (X2, Y3, W13, W23, W22, S23),
  mul_cell (X3, Y1, S22, W31, W21, P3),
  mul_cell (X3, Y2, S23, W32, W31, P4),
  mul_cell (X3, Y3, W23, P6,  W32, P5),
ENDMOD
Modified entries
26
Critical expressions generated
                                                       time valid
mul/mul_cell_2/prod  'X1 bit/and 'Y3,                  500p
mul/S13              Fsum (, mul/mul_cell_2/prod),     2n2
mul/W22              Fcarry (, mul/S13),               3n27
mul/S23              Fsum (, mul/W22),                 4n97
mul/W32              Fcarry (, mul/S23),               6n04
P5                   Fsum (, mul/W32),                 7n74
27
The critical path
delay = tand + n * tsum + (m - 1) * tcarry
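As a cross-check, the same kind of arrival-time propagation applied to the modified netlist gives 7n74. The sketch below makes the same assumptions as any such toy model (inputs and constants valid at t = 0, a hypothetical tuple encoding of the cells, integer picoseconds).

```python
T_AND, T_SUM, T_CARRY = 500, 1700, 1070   # picoseconds: 0n5, 1n7, 1n07

# (N, W, E, S) per cell of modified/mul; None stands for a constant 0 input.
MODIFIED = [
    (None,  "W11", None,  "P1"),
    (None,  "W12", None,  "S12"),
    (None,  "W13", None,  "S13"),
    ("S12", "W21", "W11", "P2"),
    ("S13", "W22", "W12", "S22"),
    ("W13", "W23", "W22", "S23"),
    ("S22", "W31", "W21", "P3"),
    ("S23", "W32", "W31", "P4"),
    ("W23", "P6",  "W32", "P5"),
]

def critical_output(cells):
    t = {}
    for north, west, east, south in cells:
        ready = max(T_AND, t.get(north, 0), t.get(east, 0))
        t[south], t[west] = ready + T_SUM, ready + T_CARRY
    name = max(t, key=t.get)
    return name, t[name]

n = m = 3
print(critical_output(MODIFIED))                      # ('P5', 7740)
print(T_AND + n * T_SUM + (m - 1) * T_CARRY)          # 7740
```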
28
Timing carry-save mul_add
Project 5
Extra north inputs on the west-side
East inputs are moved to the north-side
All carry-signals are running diagonally
Extra east input
Final adder
29
The carry_save-multiplier
MODEL carry_save/mul_add (N6, N5, N4, N3, Y3, N2, Y2, N1, Y1,
                          E1, X1, P1,
                          E2, X2, P2,
                          E3, X3, P3,
                          E4,
                          P7, P6, P5, P4),
            X   Y   N    W    E    S
  mul_cell (X1, Y1, N1,  W11, E1,  P1),
  mul_cell (X1, Y2, N2,  W12, E2,  S12),
  mul_cell (X1, Y3, N3,  W13, E3,  S13),
  mul_cell (X2, Y1, S12, W21, W11, P2),
  mul_cell (X2, Y2, S13, W22, W12, S22),
  mul_cell (X2, Y3, N4,  W23, W13, S23),
  mul_cell (X3, Y1, S22, W31, W21, P3),
  mul_cell (X3, Y2, S23, W32, W22, S32),
  mul_cell (X3, Y3, N5,  W33, W23, S33),
            a    b    cin  sum cout
  full_add (S32, E4,  W31, P4, W41),
  full_add (S33, W41, W32, P5, W42),
  full_add (N6,  W42, W33, P6, P7),
ENDMOD
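A quick way to convince oneself that the diagonal carries and the final adder still produce the product is to transcribe the netlist into Python and test it exhaustively. This is a sketch under stated assumptions: the binary cell is taken to be an AND gate plus full adder, all N and E inputs are tied to 0, and the function names are hypothetical.

```python
def full_add(a, b, cin):
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)   # (sum, cout)

def mul_cell(x, y, n, e):
    s, w = full_add(n, x & y, e)
    return w, s                                            # (W, S)

def carry_save_mul(x, y):
    """3x3-bit multiply wired like carry_save/mul_add with N = E = 0."""
    x1, x2, x3 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    y1, y2, y3 = y & 1, (y >> 1) & 1, (y >> 2) & 1
    w11, p1 = mul_cell(x1, y1, 0, 0)
    w12, s12 = mul_cell(x1, y2, 0, 0)
    w13, s13 = mul_cell(x1, y3, 0, 0)
    w21, p2 = mul_cell(x2, y1, s12, w11)
    w22, s22 = mul_cell(x2, y2, s13, w12)
    w23, s23 = mul_cell(x2, y3, 0, w13)
    w31, p3 = mul_cell(x3, y1, s22, w21)
    w32, s32 = mul_cell(x3, y2, s23, w22)
    w33, s33 = mul_cell(x3, y3, 0, w23)
    p4, w41 = full_add(s32, 0, w31)        # final ripple adder
    p5, w42 = full_add(s33, w41, w32)
    p6, p7 = full_add(0, w42, w33)
    bits = (p1, p2, p3, p4, p5, p6, p7)
    return sum(b << i for i, b in enumerate(bits))

print(all(carry_save_mul(x, y) == x * y for x in range(8) for y in range(8)))
```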
30
Critical expressions generated
mul/mul_cell_2/prod  'X1 bit/and 'Y3,                  500p
mul/W13              Fcarry (, mul/mul_cell_2/prod),   1n57
mul/S23              Fsum (, mul/W13),                 3n27
mul/S13              Fsum (, mul/mul_cell_2/prod),     2n2
mul/W22              Fcarry (, mul/S13),               3n27
mul/S32              Fsum (, mul/S23, mul/W22),        4n97
mul/W41              Fcarry (, mul/S32),               6n04
mul/W42              Fcarry (, mul/W41),               7n11
P6                   Fsum (, mul/W42),                 8n81
31
The critical path
delay tand (n 1) tsum tcarry (m -
1) tcarry
32
The carry-save multiplier
The top-row cells can be replaced by and circuits
delay tand n tsum tcarry (m - 1)
tcarry
33
Prove algebraic equivalence
Project 6
and
full_adder
half_adder
34
Load an algebraic package
ld (modl ('bdg)), algebraic (),
input (X3, X2, X1, '"\n", Y3, Y2, Y1, '"\n"),
Read algebraic input-values
Print algebraic results
Works on the /bit/ operators and, nand, or, nor, xor, etc.
output (P6, P5, P4, P3, P2, P1, '"\n",
        P6 /bit/ Q6, P5 /bit/ Q5, P4 /bit/ Q4,
        P3 /bit/ Q3, P2 /bit/ Q2, P1 /bit/ Q1),
symbolic (),
35
load (modl ('lib/array/abut/mul_cell,
            'lib/array/abut/mul_add,
            'lib/binary/mul_cell,
            'lib/carry_save/mul_add))
MODEL top (),
  ld (modl ('bdg)), algebraic (),
  input (X3, X2, X1, '"\n", Y3, Y2, Y1, '"\n"),
  array/abut/mul (Y3, Y2, Y1,
                  X1, P1,
                  X2, P2,
                  X3, P3,
                  P6, P5, P4),
  carry_save/mul (X3, X2, X1,
                  Y3, Y2, Y1,
                  Q6, Q5, Q4, Q3, Q2, Q1),
  output (P6, P5, P4, P3, P2, P1, '"\n",
          P6 /bit/ Q6, P5 /bit/ Q5, P4 /bit/ Q4,
          P3 /bit/ Q3, P2 /bit/ Q2, P1 /bit/ Q1),
  symbolic (),
ENDMOD
FUNCTION proj5_v (),
  sim (top ()),
ENDFUN
36
Review carry-save multiplier
  • The carry-save multiplier can be partitioned into
    an and array, an n:2 reductor and a final adder

37
Projects Design
  • The carry-save multiplier

38
Critical path carry-save mul
mul/mul_cell_4/prod  y6 bit/and 'x1,                   500p
mul/s15              Fsum (, mul/mul_cell_4/prod),     2n2
mul/s24              Fsum (, mul/s15),                 3n9
mul/s33              Fsum (, mul/s24),                 5n6
mul/s42              Fsum (, mul/s33),                 7n3
mul/w51              Fcarry (, mul/s42),               8n37
mul/w52              Fcarry (, mul/w51),               9n44
mul/w53              Fcarry (, mul/w52),               10n51
mul/w54              Fcarry (, mul/w53),               11n58
mul/w55              Fcarry (, mul/w54),               12n65
mul/w56              Fcarry (, mul/w55),               13n72
mul/w57              Fcarry (, mul/w56),               14n79
mul/w58              Fcarry (, mul/w57),               15n86
P13                  Fsum (, mul/w58),                 17n56
39
Critical path carry-save mul
  • One and, n n - 1 sum stages, one sum stage,
    and m - 1 carry stages in the final adder

40
A fast carry-path
  • The ATLAS full-adder

41
A fast carry path ATLAS adder
  • The ATLAS full-adder uses a switch to either
    propagate the cin signal to the cout output or to
    generate a 0 or a 1 value

First use: relay-based computers
The carry signal is propagated at the speed of
light
ATLAS full_adder
42
A fast carry path ATLAS adder
Propagate Cin
Generate 1
Generate 0
tcy = 0n2 @ 1 µm, 5 V
p = a /bit/xor b
sum = p /bit/xor cin
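The three switch positions (propagate cin, generate 1, generate 0) can be expressed as a small Python sketch. The function name is hypothetical; the logic follows from p = a xor b: when p is 0 both operands are equal, so the generated carry equals a.

```python
def atlas_full_add(a, b, cin):
    """Full adder with a switched carry path: returns (sum, cout)."""
    p = a ^ b                 # propagate signal
    s = p ^ cin               # sum output
    cout = cin if p else a    # p = 1: propagate cin; p = 0: generate a (0 or 1)
    return s, cout

for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = atlas_full_add(a, b, cin)
            assert s + 2 * cout == a + b + cin
```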
43
The /bit/xor /bit/nxor functions
44
Timing of the carry-path
  • Delay per section

Approximate formulas
45
Insertion of inverters
  • Use inverters to avoid excessive delay
  • Speed-up of the carry-path by a factor of two

46
The inverted full-adder
  • Application of negative logic to all I/O signals
    of a full-adder gives a full-adder
  • The xor circuits which calculate p and the sum
    should be replaced by nxor circuits
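The first point, that a full-adder fed with inverted signals behaves as a full-adder on those inverted signals, is the self-duality of the sum and carry functions. It can be checked exhaustively with a short sketch (function names are hypothetical; the note about nxor circuits is not modelled here):

```python
from itertools import product

def full_add(a, b, cin):
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)   # (sum, cout)

def inverted_full_add(na, nb, ncin):
    """Full adder seen through negative logic on all inputs and outputs."""
    s, cout = full_add(1 - na, 1 - nb, 1 - ncin)   # invert the inputs
    return 1 - s, 1 - cout                         # invert the outputs

same = all(inverted_full_add(*v) == full_add(*v)
           for v in product((0, 1), repeat=3))
print(same)   # True
```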

47
The carry-select adder
  • sum CASE cin 0 a /bits_5/
    b, cin 1 a /bits_5/ b 1,
    ENDCASE,
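The CASE expression above selects between two precomputed section sums. A minimal Python sketch of a carry-select adder built from 5-bit sections follows; the operators in the CASE expression were lost in the transcript, so the two candidates are assumed to be a + b and a + b + 1, and the function name and parameters are hypothetical.

```python
def carry_select_add(a, b, cin=0, width=20, section=5):
    """Add two width-bit numbers section by section; returns (sum, cout)."""
    mask = (1 << section) - 1
    result, carry = 0, cin
    for low in range(0, width, section):
        sa, sb = (a >> low) & mask, (b >> low) & mask
        sum0 = sa + sb            # candidate computed for cin = 0
        sum1 = sa + sb + 1        # candidate computed for cin = 1
        chosen = sum1 if carry else sum0   # the incoming carry only selects
        result |= (chosen & mask) << low
        carry = chosen >> section
    return result, carry

print(carry_select_add(123456, 654321))   # (777777, 0)
```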

cin
cout
48
Improved physical design
49
High level model
  • Bit-level word-level description

50
Optimal partitioning
  • An m-bit wide adder should be partitioned in
    sections of approximate length √m
  • The length of the sections should be rebalanced
    such that all delays are equal

When tcin-cout ? 0n3
51
Example final-adder partitioning
  • m = 32 bits @ 1 µm, 5 V; section length ≈ 6
      partitioning          total   delay                   x 1/8
      (6,6,5,5,4,4)         30      6 x 0n6 + 0n3 = 3n9     487p
      (6,6,6,5,5,4)         32      7 x 0n6 = 4n2           525p
      (7,6,6,5,5,4)         33      7 x 0n6 + 0n3 = 4n5     562p
  • m = 64 bits; section length ≈ 8
      partitioning          total   delay                   x 1/8
      (9,9,8,8,7,7,6,6)     60      9 x 0n6 + 0n3 = 5n4     675p
      (10,9,9,8,8,7,7,6)    64      10 x 0n6 = 6n0          750p
  • @ 0.25 µm, 2.5 V: x 1/8

52
Composition of larger multipliers
53
Composition of larger multipliers
  • Mixed Architecture Wallace-Tree

54
Composition of larger multipliers
55
The Wallace-Tree multiplier
  • The full-adder can be used as a 3:2 reductor
  • A 9:2 reductor can be built using 5 full-adders

56
Building a Wallace tree
  • An mxn multiplier uses 3:2 reductors in the
    carry-save tree
  • The sum-signals are kept at the current bit
    position
  • The carry-signals are propagated to the next bit
    position
  • The tree is completed one level at a time
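The level-by-level construction can be sketched on whole rows of partial products, using Python integers as bit vectors. This is an illustration under assumptions, not the layout from the slides: rows are grouped in threes, each group is reduced by a row of 3:2 reductors, leftover rows pass through, and the function name is hypothetical.

```python
def wallace_multiply(x, y, n=16):
    """Multiply via row-wise 3:2 reduction; returns (product, tree levels)."""
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]   # and array
    levels = 0
    while len(rows) > 2:
        nxt = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):
            a, b, c = rows[i:i + 3]
            nxt.append(a ^ b ^ c)                            # sums stay in place
            nxt.append(((a & b) | (a & c) | (b & c)) << 1)   # carries move one bit up
        nxt.extend(rows[full:])                              # leftover rows pass through
        rows = nxt
        levels += 1
    return sum(rows), levels                                 # final adder

print(wallace_multiply(12345, 54321))   # (670592745, 6)
```

For 16 rows the row count shrinks as 16, 11, 8, 6, 4, 3, 2, which gives six reductor levels.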

57
A 16x16 Wallace-tree
Delay = 6 tsum = 10n2 @ 1 µm, 5 V; 1n125 @ 0.25 µm,
2.5 V
level7
58
Review
  • It is almost impossible to make a layout of a
    Wallace tree multiplier using full-custom layout
    tools
  • The wiring within the Wallace tree is long when
    compared to the wire-length in an array- or a
    carry-save multiplier
  • There are marginally fewer ripples in a Wallace
    tree, provided that it is well balanced

59
Cascadable multipliers
  • It has been shown that an unsigned multiplier can
    be constructed from smaller ones, provided that
    these are unsigned
  • Each embedded-multiplier which should be able to
    operate as a stand-alone signed multiplier should
    be equipped with a circuit for sign-extension
    using either sign-magnitude arithmetic or
    two's-complement arithmetic
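The first point can be illustrated by composing a 32x32 unsigned multiply from four 16x16 unsigned multiplies. The sketch below is a behavioural model only; `mul16` and `mul32` are hypothetical names, and sign-extension for signed operation is not modelled.

```python
def mul16(a, b):
    """Stand-in for one embedded 16x16 unsigned multiplier."""
    assert 0 <= a < 1 << 16 and 0 <= b < 1 << 16
    return a * b

def mul32(x, y):
    """32x32 unsigned multiply built from four 16x16 partial products."""
    xl, xh = x & 0xFFFF, x >> 16
    yl, yh = y & 0xFFFF, y >> 16
    return (mul16(xl, yl)
            + ((mul16(xl, yh) + mul16(xh, yl)) << 16)
            + (mul16(xh, yh) << 32))

print(mul32(0xDEADBEEF, 0x12345678) == 0xDEADBEEF * 0x12345678)   # True
```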

60
Cascadable multipliers
  • MMX VIS

61
The splitting interface
62
MMX (INTEL) VIS (SUN)
  • SUN was the first to introduce a splittable
    multiplier, consisting of 8 16x16 multipliers which
    can be used to perform 4 16x16 multiply-add
    operations or one 64x64 floating point multiply
  • INTEL uses a splittable multiplier consisting of 4
    16x16 multipliers which can be used to perform 4
    16x16 multiply-add operations or a 64x32 (64x64)
    floating point multiply in one (two) clock
    cycle(s)

63
Load Store operations
  • Operands are normally de-normalized such that the
    mantissa is scaled in multiples of 2^16 instead
    of 2^1
  • A FP-(MMX)-load performs (discards) the
    de-normalization
  • The FP-(MMX)-store performs (discards) the
    FP-normalization

64
Sequential Circuits
65
The cut-theorem
Provided that the initial conditions are
compatible
66
The fully pipelined multiplier
Multiple D-FF in X,N
Multiple D-FF in P
Consider all iso-chronous wires
67
Folding
  • Map data back from one line of iso-chronity to an
    earlier one, use multiplexers for control

A multiplier can be realized with just n
full_adders without speed penalty
68
Folding
  • Can be used to derive equivalent circuits
  • E.g. parallel multiplier -> serial-parallel
    multiplier, based on the shift-add algorithm

The serial-parallel multiplier can be realized
with n full_adders. Its delay comes from the
carry-path of an n-bit wide adder.
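The shift-add algorithm behind the serial-parallel multiplier can be sketched as follows. This is a behavioural model under assumptions: one multiplier bit is consumed per clock cycle, the n-bit adder is represented by Python's integer addition, and the function name is hypothetical.

```python
def shift_add_multiply(x, y, n=8):
    """Multiply two n-bit unsigned numbers, one bit of y per cycle."""
    acc = 0
    for cycle in range(n):
        if (y >> cycle) & 1:      # serial operand: current bit of y
            acc += x << cycle     # parallel operand added at the current weight
    return acc

print(shift_add_multiply(200, 123))   # 24600
```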