Projects - PowerPoint PPT Presentation

About This Presentation

Title:

Projects

Description:

Projects & Design Multipliers – PowerPoint PPT presentation

Slides: 69

Provided by: Smit1194

Transcript and Presenter's Notes

Title: Projects

1
Projects & Design

Multipliers

2
A decimal multiply-add primitive
Project 1
3
The multiplier-cell
proj1.mo
load (modl (lib/decimal/mul_cell, lib/array/mul_add))
mul_cell.mo
declare (domain ((dec/, 0, 9), (hect/, 0, 99)),
         domain_fn (dec/, hect/))
MODEL mul_cell (X, Y, N, W, E, S),
  Z X dec/ Y hect/ N E,
  S remainder (Z, 10),
  W quotient (Z, 10),
ENDMOD
Local
Inputs
Outputs
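The operator symbols in the cell body did not survive transcription. A minimal Python sketch of one plausible reading, assuming the cell forms Z = X * Y + N + E and splits it into a sum digit S and a carry digit W. The declared domains (0..9 and 0..99) are consistent with this reading, since 9 * 9 + 9 + 9 = 99. The Python function name and return order are illustrative only.

```python
def mul_cell(x, y, n, e):
    """Decimal multiply-add cell (assumed reading): returns (w, s)."""
    for digit in (x, y, n, e):
        assert 0 <= digit <= 9, "all inputs are decimal digits"
    z = x * y + n + e      # at most 9*9 + 9 + 9 = 99
    s = z % 10             # remainder (Z, 10): digit passed south
    w = z // 10            # quotient (Z, 10): carry passed west
    return w, s

print(mul_cell(9, 9, 9, 9))   # (9, 9)
print(mul_cell(7, 6, 0, 3))   # (4, 5)
```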
4
The mul-add primitive
load (modl (lib/decimal/mul_cell, lib/array/mul_add))
MODEL array/mul_add (N3, Y3, N2, Y2, N1, Y1,
                     E1, X1, P1, E2, X2, P2, E3, X3, P3,
                     P4, P5, P6),
  <model-body>,
ENDMOD
5
The mul-add primitive
MODEL array/mul_add (N3, Y3, N2, Y2, N1, Y1,
                     E1, X1, P1, E2, X2, P2, E3, X3, P3,
                     P4, P5, P6),
            X   Y   N    W    E    S
  mul_cell (X1, Y1, N1,  W11, E1,  P1),
  mul_cell (X1, Y2, N2,  W12, W11, S12),
  mul_cell (X1, Y3, N3,  W13, W12, S13),
  mul_cell (X2, Y1, S12, W21, E2,  P2),
  mul_cell (X2, Y2, S13, W22, W21, S22),
  mul_cell (X2, Y3, W13, W23, W22, S23),
  mul_cell (X3, Y1, S22, W31, E3,  P3),
  mul_cell (X3, Y2, S23, W32, W31, P4),
  mul_cell (X3, Y3, W23, P6,  W32, P5),
ENDMOD
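The wiring of the nine cells can be checked in software. A Python sketch that copies the cell instances and wire names from the slide, assuming each cell computes Z = X*Y + N + E with S = Z mod 10 and W = Z div 10 (the operators are not legible in the transcript) and assuming X1, Y1, N1 and E1 are the least significant digits. The helper names `digits` and `multiply_add` are hypothetical.

```python
def mul_cell(x, y, n, e):
    z = x * y + n + e
    return z // 10, z % 10            # (W, S)

def mul_add(xs, ys, ns=(0, 0, 0), es=(0, 0, 0)):
    """3x3 decimal array; every tuple is least significant digit first."""
    x1, x2, x3 = xs
    y1, y2, y3 = ys
    n1, n2, n3 = ns
    e1, e2, e3 = es
    w11, p1 = mul_cell(x1, y1, n1, e1)
    w12, s12 = mul_cell(x1, y2, n2, w11)
    w13, s13 = mul_cell(x1, y3, n3, w12)
    w21, p2 = mul_cell(x2, y1, s12, e2)
    w22, s22 = mul_cell(x2, y2, s13, w21)
    w23, s23 = mul_cell(x2, y3, w13, w22)
    w31, p3 = mul_cell(x3, y1, s22, e3)
    w32, p4 = mul_cell(x3, y2, s23, w31)
    p6, p5 = mul_cell(x3, y3, w23, w32)
    return p6, p5, p4, p3, p2, p1

def digits(v):
    """Three decimal digits of v, least significant first."""
    return v % 10, v // 10 % 10, v // 100

def multiply_add(x, y, n=0, e=0):
    result = 0
    for d in mul_add(digits(x), digits(y), digits(n), digits(e)):
        result = result * 10 + d
    return result

print(multiply_add(123, 456))            # 56088
print(multiply_add(999, 999, 999, 999))  # 999999
```

Under these assumptions the array returns X * Y + N + E, which is why the top model can tie all N and E inputs to 0 to obtain a plain multiplier.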
6
The top-model
load (modl (lib/decimal/mul_cell, lib/array/mul_add))
MODEL top (),
  input (X3, X2, X1, '"\n", Y3, Y2, Y1),
  array/mul_add (0, Y3, 0, Y2, 0, Y1,
                 0, X1, P1, 0, X2, P2, 0, X3, P3,
                 P4, P5, P6),
  print ('"\n", X3, X2, X1, '" x ", Y3, Y2, Y1, '" ",
         P6, P5, P4, P3, P2, P1, '"\n\n"),
ENDMOD
FUNCTION proj1 (),
  sim (top ()),
ENDFUN
proj1.mo
proj1 is undefined, load (modl (proj1))
proj1 ()
simulation ()
7
Structured models
MODEL m (N1, N2, W1, W2, W3, E1, E2, E3, S1, S2),
  <model-body>,
ENDMOD
Note the grouping order of the arguments
8
Feed-through wires
MODEL M (N1, N2, feed, W2, W3, feed, E2, feed, S1, S2),
  <model-body>,
ENDMOD
Model arguments are local variables
9
Horizontal-abutment
abut (<new-model>, m m)
Internal nodes
10
The new MODEL created
MODEL M12 (N1, N2, N1, N2,
           W1, W2, W3, E1, E2, E3,
           S1, S2, S1, S2),
  <model-body m1>,
  <model-body m2>,
ENDMOD
S1 ? S1
11
Vertical abutment
abut (<new-model>, m
m)
12
Reorientation of a MODEL
arrange (n, m)
13
3x3 u-binary multiplier-adder
Project 2
load (modl ('lib/array/abut/mul_cell, 'lib/array/abut/mul_add))
MODEL top (),
  input (X3, X2, X1, '"\n", Y3, Y2, Y1, '"\n",
         N3, N2, N1, '"\n", E3, E2, E1, '"\n"),
  array/abut/mul_add (N3, Y3, N2, Y2, N1, Y1,
                      E1, X1, P1, E2, X2, P2, E3, X3, P3,
                      P6, P5, P4),
  output (P6, P5, P4, P3, P2, P1, '"\n"),
ENDMOD
FUNCTION proj2 (),
  sim (top ()),
ENDFUN
14
The multiplier-cell
MODEL full_add (a, b, cin, sum, cout),
  sum a /bit/xor b xor cin,
  cout a /bit/and cin /bit/or b /bit/and cin or a /bit/and b,
ENDMOD
full_add.mo
load (modl ('lib/full_add))
MODEL array/abut/mul_cell (north, Y,
                           west, X, feed,
                           east, X, south,
                           feed, Y),
  prod X /bit/and Y,
  full_add (north, prod, east, south, west),
ENDMOD
mul_cell.mo
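The two models above translate directly into bit operations. A small Python sketch, assuming the usual full-adder equations (sum is the xor of the three inputs, cout is their majority) and the port mapping shown in the full_add call (a = north, b = prod, cin = east, sum = south, cout = west):

```python
from itertools import product

def full_add(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & cin) | (b & cin) | (a & b)
    return s, cout

def mul_cell(x, y, north, east):
    prod = x & y                          # one-bit partial product
    south, west = full_add(north, prod, east)
    return west, south

# Exhaustive check: the cell adds the partial product to its two inputs.
for x, y, north, east in product((0, 1), repeat=4):
    west, south = mul_cell(x, y, north, east)
    assert 2 * west + south == (x & y) + north + east
print("binary mul_cell verified for all 16 input combinations")
```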
15
multiplier-adder abutment
MODEL west (, , feed, X, feed, ), false, ENDMOD
MODEL east (, gnd, X, P, X, P, ), gnd 0, ENDMOD
MODEL south (P, Y, , , P), false, ENDMOD
abut (array/abut/mul_add,
      ((3 (west (3 array/abut/mul_cell)))
       (corner (3 south))))
16
The complete mul_add primitive
abut (array/abut/mul_add, ((3
(west (3 array/abut/mul_cell)))
(corner (3 south))))
17
Timing analysis array-multiplier
Project 3
The model mul_add works for any radix !
load (modl ('lib/timing/mul_cell, 'lib/array/mul_add))
FUNCTION proj3 (),
  timing (array/mul ('X3, 'X2, 'X1, 'Y3, 'Y2, 'Y1,
                     P6, P5, P4, P3, P2, P1)),
ENDFUN
proj3.mo
The file mul_add.mo contains a mul_add primitive
as well as a multiplier !
Inputs are symbolic constants !
18
mul_cell and full-adder
load (modl ('lib/timing/full_add))
delay ('/bit/and) 0n5
MODEL mul_cell (X, Y, north, west, east, south),
  prod X /bit/and Y,
            a      b     cin   sum    cout
  full_add (north, prod, east, south, west),
ENDMOD
mul_cell.mo
19
Timing model for the full-adder
delay ('Fsum) 1n7
delay ('Fcarry) 1n07
MODEL full_add (a, b, cin, sum, cout),
  sum Fsum (a, b, cin),
  cout Fcarry (a, b, cin),
ENDMOD
full_add.mo
Dedicated delay model
20
Worst-case signal propagation
[Figure: cell array annotated with the delays tand, tcarry and tsum and with path steps numbered 1 to 9]
delay = tand + tcarry + n tsum + m tcarry
21
Critical expressions generated
signal              expression                      time valid
mul/mul_cell/prod   'X1 bit/and 'Y1                 500p
mul/W11             Fcarry (, mul/mul_cell/prod)    1n57
mul/W12             Fcarry (, mul/W11)              2n64
mul/S13             Fsum (, mul/W12)                4n34
mul/S12             Fsum (, mul/W11)                3n27
mul/W21             Fcarry (, mul/S12)              4n34
mul/W22             Fcarry (, mul/S13, mul/W21)     5n41
mul/S23             Fsum (, mul/W22)                7n11
mul/S22             Fsum (, mul/S13, mul/W21)       6n04
mul/W31             Fcarry (, mul/S22)              7n11
mul/W32             Fcarry (, mul/S23, mul/W31)     8n18
P5                  Fsum (, mul/W32)                9n88
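These times can be reproduced by propagating arrival times through the netlist. A Python sketch, not the timing tool itself: it assumes X and Y are valid at time 0, that array/mul ties every north and east array input to a constant 0, and that each cell output becomes valid one Fsum or Fcarry delay after its latest input. Delays are taken from the delay statements (0n5, 1n7, 1n07) and expressed in picoseconds.

```python
T_AND, T_SUM, T_CARRY = 500, 1700, 1070   # picoseconds

# (X, Y, N, W, E, S) per cell, copied from the array/mul_add wiring;
# "0" stands for the constant inputs of a plain multiplier (assumption).
CELLS = [
    ("X1", "Y1", "0",   "W11", "0",   "P1"),
    ("X1", "Y2", "0",   "W12", "W11", "S12"),
    ("X1", "Y3", "0",   "W13", "W12", "S13"),
    ("X2", "Y1", "S12", "W21", "0",   "P2"),
    ("X2", "Y2", "S13", "W22", "W21", "S22"),
    ("X2", "Y3", "W13", "W23", "W22", "S23"),
    ("X3", "Y1", "S22", "W31", "0",   "P3"),
    ("X3", "Y2", "S23", "W32", "W31", "P4"),
    ("X3", "Y3", "W23", "P6",  "W32", "P5"),
]

def arrival_times(cells):
    """Time at which each signal is valid; cells must be in dependency order."""
    t = {"0": 0}
    for x, y, n, w, e, s in cells:
        prod = max(t.setdefault(x, 0), t.setdefault(y, 0)) + T_AND
        ready = max(t[n], prod, t[e])     # latest full-adder input
        t[s] = ready + T_SUM
        t[w] = ready + T_CARRY
    return t

times = arrival_times(CELLS)
worst = max(times, key=times.get)
print(worst, times[worst])                # P5 9880
```

The result, 9880 ps, agrees with the 9n88 listed for P5, and the intermediate values (for example W11 at 1570 ps and S13 at 4340 ps) match the listing as well.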
22
The critical path
delay = tand + tcarry + n tsum + m tcarry
23
Timing modified array-multiplier
Project 4
East inputs are moved to the north-side
Carry-signals are running diagonally
24
Timing modified array-multiplier
load (modl ('lib/modified/mul_add, 'lib/timing/mul_cell))
FUNCTION proj4 (),
  timing (modified/mul ('X3, 'X2, 'X1, 'Y3, 'Y2, 'Y1,
                        P6, P5, P4, P3, P2, P1)),
ENDFUN
25
The modified array-multiplier
MODEL modified/mul (X3, X2, X1, Y3, Y2, Y1,
                    P6, P5, P4, P3, P2, P1),
            X   Y   N    W    E    S
  mul_cell (X1, Y1, 0,   W11, 0,   P1),
  mul_cell (X1, Y2, 0,   W12, 0,   S12),
  mul_cell (X1, Y3, 0,   W13, 0,   S13),
  mul_cell (X2, Y1, S12, W21, W11, P2),
  mul_cell (X2, Y2, S13, W22, W12, S22),
  mul_cell (X2, Y3, W13, W23, W22, S23),
  mul_cell (X3, Y1, S22, W31, W21, P3),
  mul_cell (X3, Y2, S23, W32, W31, P4),
  mul_cell (X3, Y3, W23, P6,  W32, P5),
ENDMOD
Modified entries
26
Critical expressions generated
signal                expression                        time valid
mul/mul_cell_2/prod   'X1 bit/and 'Y3                   500p
mul/S13               Fsum (, mul/mul_cell_2/prod)      2n2
mul/W22               Fcarry (, mul/S13)                3n27
mul/S23               Fsum (, mul/W22)                  4n97
mul/W32               Fcarry (, mul/S23)                6n04
P5                    Fsum (, mul/W32)                  7n74
27
The critical path
delay = tand + n tsum + (m - 1) tcarry
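The two critical-path expressions can be checked against the reported times. The sketch below assumes the terms are summed (the operators are missing in the transcript), uses the delays from the delay statements in picoseconds, and takes n = m = 3 for the 3x3 arrays. The function names are illustrative.

```python
T_AND, T_SUM, T_CARRY = 500, 1700, 1070   # picoseconds

def array_delay(n, m):
    # tand + tcarry + n tsum + m tcarry (plain array multiplier)
    return T_AND + T_CARRY + n * T_SUM + m * T_CARRY

def modified_delay(n, m):
    # tand + n tsum + (m - 1) tcarry (carries running diagonally)
    return T_AND + n * T_SUM + (m - 1) * T_CARRY

print(array_delay(3, 3))      # 9880
print(modified_delay(3, 3))   # 7740
```

Both values agree with the last entries of the two critical-expression listings (9n88 and 7n74), which supports reading the expressions as sums.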
28
Timing carry-save mul_add
Project 5
Extra north inputs on the west-side
East inputs are moved to the north-side
All carry-signals are running diagonally
Extra east input
Final adder
29
The carry_save-multiplier
MODEL carry_save/mul_add (N6, N5, N4, N3, Y3, N2, Y2, N1, Y1,
                          E1, X1, P1, E2, X2, P2, E3, X3, P3, E4,
                          P7, P6, P5, P4),
            X   Y   N    W    E    S
  mul_cell (X1, Y1, N1,  W11, E1,  P1),
  mul_cell (X1, Y2, N2,  W12, E2,  S12),
  mul_cell (X1, Y3, N3,  W13, E3,  S13),
  mul_cell (X2, Y1, S12, W21, W11, P2),
  mul_cell (X2, Y2, S13, W22, W12, S22),
  mul_cell (X2, Y3, N4,  W23, W13, S23),
  mul_cell (X3, Y1, S22, W31, W21, P3),
  mul_cell (X3, Y2, S23, W32, W22, S32),
  mul_cell (X3, Y3, N5,  W33, W23, S33),
            a    b    cin  sum cout
  full_add (S32, E4,  W31, P4, W41),
  full_add (S33, W41, W32, P5, W42),
  full_add (N6,  W42, W33, P6, P7),
ENDMOD
30
Critical expressions generated
signal                expression                        time valid
mul/mul_cell_2/prod   'X1 bit/and 'Y3                   500p
mul/W13               Fcarry (, mul/mul_cell_2/prod)    1n57
mul/S23               Fsum (, mul/W13)                  3n27
mul/S13               Fsum (, mul/mul_cell_2/prod)      2n2
mul/W22               Fcarry (, mul/S13)                3n27
mul/S32               Fsum (, mul/S23, mul/W22)         4n97
mul/W41               Fcarry (, mul/S32)                6n04
mul/W42               Fcarry (, mul/W41)                7n11
P6                    Fsum (, mul/W42)                  8n81
31
The critical path
delay tand (n 1) tsum tcarry (m -
1) tcarry
32
The carry-save multiplier
The top-row cells can be replaced by and circuits
delay = tand + n tsum + tcarry + (m - 1) tcarry
33
Prove algebraic equivalence
Project 6
and
full_adder
half_adder
34
Load an algebraic package
ld (modl ('bdg)), algebraic (), input (X3, X2,
X1, '"\n", Y3, Y2, Y1, '"\n"),
Read algebraic input-values
Print algebraic results
Works on the /bit/operators and, nand, or, nor,
xor, etc.
output (P6, P5, P4, P3, P2, P1, '"\n", P6
/bit/ Q6, P5 /bit/ Q5, P4 /bit/ Q4,
P3 /bit/ Q3, P2 /bit/ Q2, P1 /bit/
Q1), symbolic (),
35
load (modl ('lib/array/abut/mul_cell, 'lib/array/abut/mul_add,
            'lib/binary/mul_cell, 'lib/carry_save/mul_add))
MODEL top (),
  ld (modl ('bdg)),
  algebraic (),
  input (X3, X2, X1, '"\n", Y3, Y2, Y1, '"\n"),
  array/abut/mul (Y3, Y2, Y1,
                  X1, P1, X2, P2, X3, P3,
                  P6, P5, P4),
  carry_save/mul (X3, X2, X1, Y3, Y2, Y1,
                  Q6, Q5, Q4, Q3, Q2, Q1),
  output (P6, P5, P4, P3, P2, P1, '"\n",
          P6 /bit/ Q6, P5 /bit/ Q5, P4 /bit/ Q4,
          P3 /bit/ Q3, P2 /bit/ Q2, P1 /bit/ Q1),
  symbolic (),
ENDMOD
FUNCTION proj5_v (),
  sim (top ()),
ENDFUN
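The project proves the equivalence algebraically. For a 3x3 multiplier the same claim can also be checked by brute force, which is a different technique from the algebraic package used above. The Python sketch below wires up the array and carry-save netlists from the earlier slides with a binary cell, assumes all north and east array inputs are 0, and compares the outputs for all 64 input combinations.

```python
from itertools import product

def full_add(a, b, cin):
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)   # (sum, cout)

def cell(x, y, n, e):
    s, w = full_add(n, x & y, e)
    return w, s

def array_mul(x3, x2, x1, y3, y2, y1):
    w11, p1 = cell(x1, y1, 0, 0)
    w12, s12 = cell(x1, y2, 0, w11)
    w13, s13 = cell(x1, y3, 0, w12)
    w21, p2 = cell(x2, y1, s12, 0)
    w22, s22 = cell(x2, y2, s13, w21)
    w23, s23 = cell(x2, y3, w13, w22)
    w31, p3 = cell(x3, y1, s22, 0)
    w32, p4 = cell(x3, y2, s23, w31)
    p6, p5 = cell(x3, y3, w23, w32)
    return p6, p5, p4, p3, p2, p1

def carry_save_mul(x3, x2, x1, y3, y2, y1):
    w11, q1 = cell(x1, y1, 0, 0)
    w12, s12 = cell(x1, y2, 0, 0)
    w13, s13 = cell(x1, y3, 0, 0)
    w21, q2 = cell(x2, y1, s12, w11)
    w22, s22 = cell(x2, y2, s13, w12)
    w23, s23 = cell(x2, y3, 0, w13)
    w31, q3 = cell(x3, y1, s22, w21)
    w32, s32 = cell(x3, y2, s23, w22)
    w33, s33 = cell(x3, y3, 0, w23)
    q4, w41 = full_add(s32, 0, w31)       # final adder
    q5, w42 = full_add(s33, w41, w32)
    q6, q7 = full_add(0, w42, w33)
    return (q6, q5, q4, q3, q2, q1), q7

mismatches = 0
for bits in product((0, 1), repeat=6):
    p = array_mul(*bits)
    q, q7 = carry_save_mul(*bits)
    if q7 or any(pi ^ qi for pi, qi in zip(p, q)):
        mismatches += 1
print(mismatches)                         # 0
```

Exhaustive enumeration only scales to small operand widths; the algebraic approach in the model above does not depend on enumerating inputs.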
36
Review carry-save multiplier

The carry-save multiplier can be partitioned into
an AND array, an n:2 reductor and a final adder

37
Projects & Design

The carry-save multiplier

38
Critical path carry-save mul
signal                expression                        time valid
mul/mul_cell_4/prod   y6 bit/and 'x1                    500p
mul/s15               Fsum (, mul/mul_cell_4/prod)      2n2
mul/s24               Fsum (, mul/s15)                  3n9
mul/s33               Fsum (, mul/s24)                  5n6
mul/s42               Fsum (, mul/s33)                  7n3
mul/w51               Fcarry (, mul/s42)                8n37
mul/w52               Fcarry (, mul/w51)                9n44
mul/w53               Fcarry (, mul/w52)                10n51
mul/w54               Fcarry (, mul/w53)                11n58
mul/w55               Fcarry (, mul/w54)                12n65
mul/w56               Fcarry (, mul/w55)                13n72
mul/w57               Fcarry (, mul/w56)                14n79
mul/w58               Fcarry (, mul/w57)                15n86
P13                   Fsum (, mul/w58)                  17n56
39
Critical path carry-save mul

One and, n n - 1 sum stages, one sum stage,
and m - 1 carry stages in the final adder

40
A fast carry-path

The ATLAS full-adder

41
A fast carry path ATLAS adder

The ATLAS full-adder uses a switch either to
propagate the cin signal to the cout output or to
generate a 0 or a 1 value

First used in relay-based computers
Carry signal is propagated with the speed of
light
ATLAS full_adder
42
A fast carry path ATLAS adder
Propagate Cin
Generate 1
Generate 0
tcy 0n2 @ 1 µm, 5 V
p a /bit/xor b
sum p /bit/xor cin
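The propagate/generate behaviour can be written out in a few lines. A Python sketch, assuming the switch passes cin to cout when p = a xor b is 1 and otherwise drives cout with the common value of a and b (both 1: generate 1, both 0: generate 0). The function name is illustrative.

```python
from itertools import product

def atlas_full_add(a, b, cin):
    p = a ^ b                  # propagate condition
    s = p ^ cin                # sum as on the slide
    cout = cin if p else a     # p = 0 means a == b: generate that value
    return s, cout

# The switch-based carry agrees with ordinary binary addition.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = atlas_full_add(a, b, cin)
    assert 2 * cout + s == a + b + cin
print("ATLAS carry rule matches a + b + cin for all 8 cases")
```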
43
The /bit/xor /bit/nxor functions
44
Timing of the carry-path

Delay per section

Approximate formulas
45
Insertion of inverters

Use inverters to avoid excessive delay

Speed-up of the carry-path by a factor of two

46
The inverted full-adder

Application of negative logic to all I/O signals
of a full-adder again gives a full-adder

The xor circuits which calculate p and the sum
should be replaced by nxor circuits
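Both statements can be verified exhaustively. A Python sketch, assuming a switch-style carry (pass cin when p = 1, otherwise pass a) and treating every signal of the inverted adder, including the internal p, as negative logic. The names `nxor` and `inverted_full_add` are illustrative.

```python
from itertools import product

def full_add(a, b, cin):
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)   # (sum, cout)

def nxor(u, v):
    return 1 - (u ^ v)

def inverted_full_add(na, nb, ncin):
    """Full-adder on inverted signals, with the xors replaced by nxors."""
    np_ = nxor(na, nb)                   # inverted propagate signal
    nsum = nxor(np_, ncin)               # inverted sum
    ncout = ncin if np_ == 0 else na     # same switch, inverted signals
    return nsum, ncout

for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_add(a, b, cin)
    # Inverting all inputs of a full-adder inverts all of its outputs.
    assert full_add(1 - a, 1 - b, 1 - cin) == (1 - s, 1 - cout)
    # The nxor-based version produces the same inverted outputs.
    assert inverted_full_add(1 - a, 1 - b, 1 - cin) == (1 - s, 1 - cout)
print("negative-logic full-adder verified")
```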

47
The carry-select adder

sum CASE cin 0 a /bits_5/
b, cin 1 a /bits_5/ b 1,
ENDCASE,

cin
cout
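The CASE expression selects between two precomputed sums. A Python sketch of that idea, assuming the two alternatives are a + b (for cin = 0) and a + b + 1 (for cin = 1) and that sections are chained least significant first. The section widths and function names are illustrative, not taken from the slides.

```python
def carry_select_section(a, b, cin, width=5):
    """One section: both sums are formed, cin only selects between them."""
    mask = (1 << width) - 1
    sum0 = a + b               # result if cin = 0
    sum1 = a + b + 1           # result if cin = 1
    total = sum1 if cin else sum0
    return total & mask, total >> width     # (sum, cout)

def carry_select_add(a, b, widths, cin=0):
    """Chain sections, least significant first; widths are in bits."""
    result, shift, carry = 0, 0, cin
    for width in widths:
        mask = (1 << width) - 1
        s, carry = carry_select_section((a >> shift) & mask,
                                        (b >> shift) & mask, carry, width)
        result |= s << shift
        shift += width
    return result, carry

print(carry_select_add(1000, 2345, (4, 4, 5, 5, 6, 6)))   # (3345, 0)
```

In hardware both sums of a section are available before cin arrives, so only the selection sits on the carry path.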
48
Improved physical design
49
High level model

Bit-level word-level description

50
Optimal partitioning

An m-bit wide adder should be partitioned in
sections of approximate length √m
The length of the sections should be rebalanced
such that all delays are equal

When tcin-cout ? 0n3
51
Example final-adder partitioning

m = 32 bits @ 1 µm, 5 V; √m ≈ 6

partitioning            total bits   delay terms       delay   x 1/8
(6,6,5,5,4,4)           30           6 x 0n6 + 0n3     3n9     487p
(6,6,6,5,5,4)           32           7 x 0n6           4n2     525p
(7,6,6,5,5,4)           33           7 x 0n6 + 0n3     4n5     562p

m = 64 bits; √m ≈ 8

partitioning            total bits   delay terms       delay   x 1/8
(9,9,8,8,7,7,6,6)       60           9 x 0n6 + 0n3     5n4     675p
(10,9,9,8,8,7,7,6)      64           10 x 0n6          6n0     750p

x 1/8: @ 0.25 µm, 2.5 V

52
Composition of larger multipliers
53
Composition of larger multipliers

Mixed Architecture Wallace-Tree

54
Composition of larger multipliers
55
The Wallace-Tree multiplier

The full-adder can be used as a 3:2 reductor
A 9:2 reductor can be built using 5 full-adders

56
Building a Wallace tree

An m x n multiplier uses 3:2 reductors in the
carry-save tree
The sum-signals are kept at the current bit
position
The carry-signals are propagated to the next bit
position
The tree is completed one level at a time
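The number of levels follows from repeatedly grouping rows in threes. A Python sketch of that row count, assuming every level replaces each complete group of three rows by two rows (one sum row, one carry row) and passes leftover rows through unchanged. The function name is illustrative.

```python
def wallace_levels(rows):
    """Number of 3:2 reductor levels needed to get down to two rows."""
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3   # each group of 3 rows becomes 2
        levels += 1
    return levels

print(wallace_levels(16))   # 6  (16 -> 11 -> 8 -> 6 -> 4 -> 3 -> 2)
```

Six levels for 16 partial-product rows is consistent with the 6 tsum delay quoted for the 16x16 tree.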

57
A 16x16 Wallace-tree
Delay = 6 tsum = 10n2 @ 1 µm, 5 V; 1n125 @ 0.25 µm,
2.5 V
level7
58
Review

It is almost impossible to make a layout of a
Wallace tree multiplier using full-custom layout
tools
The wiring within the Wallace tree is long when
compared to the wire-length in an array- or a
carry-save multiplier
There are marginally fewer ripples in a Wallace
tree, provided that it is well balanced

59
Cascadable multipliers

It has been shown that an unsigned multiplier can
be constructed from smaller ones, provided that
these are unsigned
Each embedded multiplier that should be able to
operate as a stand-alone signed multiplier should
be equipped with a circuit for sign extension,
using either sign-magnitude arithmetic or
two's-complement arithmetic
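The unsigned case can be illustrated with four half-width products. A Python sketch that builds a 32x32 unsigned multiply from a 16x16 unsigned multiplier; the names `mul16` and `mul32` are illustrative, and the sign-extension circuitry mentioned above is not modelled.

```python
MASK16 = 0xFFFF

def mul16(a, b):
    """Stand-in for one unsigned 16x16 hardware multiplier."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b

def mul32(a, b):
    """Unsigned 32x32 multiply composed of four 16x16 partial products."""
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16
    return (mul16(a_lo, b_lo)
            + ((mul16(a_lo, b_hi) + mul16(a_hi, b_lo)) << 16)
            + (mul16(a_hi, b_hi) << 32))

print(hex(mul32(0xFFFFFFFF, 0xFFFFFFFF)))   # 0xfffffffe00000001
```

The decomposition relies on every partial product being non-negative, which is why the smaller multipliers have to operate on unsigned operands.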

60
Cascadable multipliers

MMX VIS

61
The splitting interface
62
MMX (INTEL) VIS (SUN)

SUN was the first to introduce a splittable
multiplier, consisting of 8 16x16 multipliers which
can be used to perform 4 16x16 multiply-add
operations or one 64x64 floating point multiply
INTEL uses a splittable multiplier consisting of 4
16x16 multipliers which can be used to perform 4
16x16 multiply-add operations or a 64x32 (64x64)
floating point multiply in one (two) clock
cycle(s)

63
Load Store operations

Operands are normally de-normalized such that the
mantissa is scaled in multiples of 2^16 instead
of 2^1
A FP-(MMX)-load performs (discards) the
de-normalization
The FP-(MMX)-store performs (discards) the
FP-normalization

64
Sequential Circuits
65
The cut-theorem
Provided that the initial conditions are
compatible
66
The fully pipelined multiplier
Multiple D-FF in X,N
Multiple D-FF in P
Consider all iso-chronous wires
67
Folding