VLSI Arithmetic Adders

1
VLSI Arithmetic: Adders & Multipliers
  • Prof. Vojin G. Oklobdzija
  • University of California
  • http://www.ece.ucdavis.edu/acsel

2
Introduction
  • Digital computer arithmetic belongs to computer
    architecture; however, it is also an aspect of
    logic design.
  • The objective of computer arithmetic is to
    develop appropriate algorithms that utilize the
    available hardware in the most efficient way.
  • Ultimately, speed, power and chip area are the
    most commonly used measures, which creates a strong
    link between the algorithms and the implementation
    technology.

3
Basic Operations
  • Addition
  • Multiplication
  • Multiply-Add
  • Division
  • Evaluation of Functions
  • Multi-Media

4
Addition of Binary Numbers
5
Addition of Binary Numbers
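Full Adder. The full adder is the fundamental
building block of most arithmetic circuits.
The sum and carry outputs are described as
$s_i = a_i \oplus b_i \oplus c_i$
$c_{i+1} = a_i b_i + c_i (a_i \oplus b_i)$
[Figure: full adder with inputs ai, bi, Cin (= $c_i$) and outputs si, Cout (= $c_{i+1}$)]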
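The full adder's behavior can be checked exhaustively over all eight input combinations. A minimal Python sketch (the name `full_adder` is ours, for illustration only):

```python
def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out) for input bits a, b, c_in."""
    s = a ^ b ^ c_in                     # sum = odd parity of the three inputs
    c_out = (a & b) | (c_in & (a ^ b))   # generate, or propagate the incoming carry
    return s, c_out


# Exhaustive check: (c_out, s) is the 2-bit binary value of a + b + c_in.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, c_out = full_adder(a, b, c)
            assert 2 * c_out + s == a + b + c
```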
6
Addition of Binary Numbers
[Figure: addition equations with the propagate and generate terms labeled]
7
Full-Adder Implementation
  • The full-adder operation is defined by the equations
    $s_i = p_i \oplus c_i$, $c_{i+1} = g_i + p_i c_i$

Carry-propagate $p_i = a_i \oplus b_i$ and carry-generate $g_i = a_i b_i$
A one-bit adder could be implemented as shown
8
High-Speed Addition
The one-bit adder can be implemented more
efficiently with a multiplexer (MUX), because the MUX is faster
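The slide does not spell out the multiplexer's inputs. A common formulation, which we assume here, uses the propagate signal p = a XOR b as the select line, so the carry passes only through the MUX data input. A Python sketch (function names are ours):

```python
def mux_carry(a: int, b: int, c_in: int) -> int:
    """Carry-out of one bit position produced by a 2:1 multiplexer.

    The select input is the propagate signal p = a XOR b. When p = 1 the
    incoming carry is passed through; when p = 0 the operand bits are equal,
    so a itself is the carry-out (1 = generate, 0 = kill).
    """
    p = a ^ b
    return c_in if p else a


def majority(a: int, b: int, c: int) -> int:
    """Reference carry function: 1 when at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)


for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert mux_carry(a, b, c) == majority(a, b, c)
```

Because p depends only on the operand bits, all select signals can settle before the carry arrives; only the data path of each MUX lies on the carry chain.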
9
The Ripple-Carry Adder
10
The Ripple-Carry Adder
From Rabaey
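A bit-level Python sketch of the ripple-carry adder (the function name is ours). It models the logic only; the comment on delay describes why the hardware is slow, not anything the code measures:

```python
def ripple_carry_add(a: int, b: int, n: int, c_in: int = 0) -> tuple[int, int]:
    """Add two n-bit unsigned integers one bit position at a time, like an RCA.

    Returns (sum mod 2**n, carry_out). The carry computed at bit i is an input
    to bit i + 1, so in hardware the worst-case delay grows linearly with n.
    """
    carry = c_in
    total = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i              # sum bit i
        carry = (ai & bi) | (carry & (ai ^ bi))      # carry into bit i + 1
    return total, carry


# Worst case: the carry generated at bit 0 ripples through every position.
print(ripple_carry_add(0xFF, 0x01, 8))  # (0, 1)
```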
11
Inversion Property
From Rabaey
12
Minimize Critical Path by Reducing Inverting
Stages
From Rabaey
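The inversion property (complementing all three inputs of a full adder complements both outputs) is what lets inverters be removed from the carry path: if every carry cell is inverting, every other bit position can be fed complemented operands. The sketch below checks the property exhaustively and then models such an alternating-polarity ripple chain. It assumes an idealized inverting carry cell that outputs NOT majority(a, b, c); the names are ours and timing is not modeled.

```python
def majority(a: int, b: int, c: int) -> int:
    return (a & b) | (a & c) | (b & c)


def inverting_carry(a: int, b: int, c: int) -> int:
    """Idealized single-stage (inverting) carry cell: NOT majority(a, b, c)."""
    return 1 - majority(a, b, c)


# Inversion property: complementing all inputs complements both outputs.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert majority(1 - a, 1 - b, 1 - c) == 1 - majority(a, b, c)
            assert (1 - a) ^ (1 - b) ^ (1 - c) == 1 - (a ^ b ^ c)


def alternating_rca(a: int, b: int, n: int, c_in: int = 0) -> tuple[int, int]:
    """RCA whose carry cells all invert, with no inverter in the carry path.

    Even positions receive the true carry and produce its complement; odd
    positions receive the complemented carry and complemented operand bits,
    so by the inversion property their inverting cell restores true polarity.
    """
    total = 0
    carry = c_in  # true polarity at even positions, complemented at odd ones
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if i % 2 == 0:
            total |= (ai ^ bi ^ carry) << i
            carry = inverting_carry(ai, bi, carry)          # NOT c_{i+1}
        else:
            total |= (ai ^ bi ^ (1 - carry)) << i            # sum logic re-inverts, off the carry path
            carry = inverting_carry(1 - ai, 1 - bi, carry)  # c_{i+1}
    if n % 2 == 1:
        carry = 1 - carry  # an odd number of cells leaves the final carry complemented
    return total, carry


print(alternating_rca(0xFF, 0x01, 8))  # (0, 1)
```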
13
Ripple Carry Adder
  • Carry chain of an RCA implemented using
    multiplexers from the standard-cell library

[Figure: critical path through the multiplexer carry chain]
Oklobdzija, ISCAS88
14
Manchester Carry-Chain Realization of the Carry
Path
  • A simple and very popular scheme for implementing
    the carry signal path

15
Original Design
T. Kilburn, D. B. G. Edwards, D. Aspinall,
"Parallel Addition in Digital Computers: A New
Fast 'Carry' Circuit," Proceedings of IEE, Vol.
106, pt. B, p. 464, September 1959.
16
Manchester Carry Chain (CMOS)
  • Implement P with pass-transistors
  • Implement G with pull-up, kill (delete) with
    pull-down
  • Use dynamic logic to reduce complexity and
    increase speed

Kilburn, et al, IEE Proc, 1959.
17
Pass-Transistor Realization in DPL
18
Carry-Skip Adder
MacSorley, Proc. IRE, 1/61; Lehman, Burla, IRE Trans.
on Comp., 12/61
19
Carry-Skip Adder
[Figure: carry-skip block with bypass path]
From Rabaey
20
Carry-Skip Adder: N bits, k bits/group, r = N/k
groups
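A back-of-the-envelope delay model for the carry-skip adder, written as a Python sketch. The unit costs are our assumption, not the slide's model: rippling through one bit costs 1 D and skipping one whole group costs 1 D. The worst case is a carry generated at the start of the first group that ripples through k - 1 bits, skips the r - 2 middle groups, and ripples through k - 1 bits of the last group.

```python
def cska_delay(n: int, k: int) -> int:
    """Worst-case carry delay, in units of D, of an n-bit carry-skip adder
    with k-bit groups (r = n / k groups; assumes k divides n).

    Assumed unit costs: 1 D per bit of ripple, 1 D per skipped group.
    """
    r = n // k
    if r == 1:
        return k - 1  # a single group is an ordinary ripple-carry chain
    return 2 * (k - 1) + (r - 2)


n = 32
group_sizes = [k for k in range(1, n + 1) if n % k == 0]
best_k = min(group_sizes, key=lambda k: cska_delay(n, k))
print(best_k, cska_delay(n, best_k))  # 4 12
```

Under this model the optimum group size sits near sqrt(N/2), which is 4 for N = 32, and the resulting 12 D agrees with the CSKA figure quoted for the 32-bit comparison with the Variable Block Adder.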
21
Carry-Skip Adder
22
Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
23
Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
24
Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
[Figure: 32-bit carry chain partitioned into blocks of 1, 1, 3, 3, 4, 4, 5, 5 and 6 bits; worst-case delay D = 9]
The any-point-to-any-point delay is 9D, compared
with 12D for the CSKA.
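The 9 D vs. 12 D comparison can be reproduced with a deliberately simple unit-delay model (our assumption, not the delay model of the original slides): 1 D per bit of ripple and 1 D per skipped block. We also assume the VBA blocks are ordered symmetrically, 1, 3, 4, 5, 6, 5, 4, 3, 1 (small at the ends, largest in the middle); only the block sizes, not their order, can be read from the figure labels.

```python
from itertools import combinations


def worst_case_delay(blocks: list[int]) -> int:
    """Any-point-to-any-point carry delay (in D) of a one-level skip chain
    with the given block sizes: ripple to the end of the source block,
    skip each block in between, ripple into the destination block."""
    worst = max(size - 1 for size in blocks)  # carry that stays within one block
    for i, j in combinations(range(len(blocks)), 2):
        worst = max(worst, (blocks[i] - 1) + (j - i - 1) + (blocks[j] - 1))
    return worst


uniform = [4] * 8                          # CSKA: eight 4-bit groups
variable = [1, 3, 4, 5, 6, 5, 4, 3, 1]     # assumed ordering of the VBA block sizes
assert sum(uniform) == sum(variable) == 32
print(worst_case_delay(uniform), worst_case_delay(variable))  # 12 9
```

Small blocks at the ends shorten the ripple segments where a long skip path starts or ends, while larger blocks in the middle are only paired with short skip distances.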
25
Carry-chain block size determination for a 32-bit
Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
26
Delay Calculation for Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
Delay model
27
Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
Variable Group Length
Oklobdzija, Barnes, Arith85
28
Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
Variable Block Lengths
  • There is no closed-form solution for the delay
  • Choosing the block lengths is a dynamic
    programming problem

29
Delay Comparison Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
30
Delay Comparison Variable Block Adder
[Plot: delay comparison of the VBA, the CLA and the multi-level VBA]
31
VLSI Arithmetic: Lecture 4
  • Prof. Vojin G. Oklobdzija
  • University of California
  • http://www.ece.ucdavis.edu/acsel

32
Review
  • Lecture 3

41
Delay Comparison Variable Block Adder
[Plot: delay vs. adder size; the VBA shows a square-root dependency, the CLA a logarithmic dependency; the multi-level VBA is also shown]
42
Circuit Issues
  • Adder speed cannot be estimated based on:
  • the number of logic gates in the critical path
  • the number of transistors in the path
  • the number of logic levels in the path
  • Estimating adder speed is much more complex, and
    many of the "fast" schemes may be misleading.

43
Fan-Out Dependency
44
Fan-In Dependency
This looks like Logical Effort (1985)
45
Delay Comparison Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
47
Carry-Lookahead Adder (Weinberger and Smith, 1958)
ARITH-13: presenting the Achievement Award to Arnold
Weinberger of IBM (who invented the CLA adder in 1958)
Ref: A. Weinberger and J. L. Smith, "A Logic for
High-Speed Addition," National Bureau of
Standards, Circ. 591, pp. 3-12, 1958.
48
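CLA Definitions: One-bit Adder

49
CLA Definitions: 4-bit Adder
50
Carry-Lookahead Adder: 4 Bits
Group generate $G_j = g_{4j+3} + p_{4j+3} g_{4j+2} + p_{4j+3} p_{4j+2} g_{4j+1} + p_{4j+3} p_{4j+2} p_{4j+1} g_{4j}$
Group propagate $P_j = p_{4j+3} p_{4j+2} p_{4j+1} p_{4j}$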
51
Carry-Lookahead Adder
One gate delay (D) to calculate p and g
One D to calculate P and two for G
Three gate delays to calculate $C_{4(j+1)}$
Compare that with 8D in an RCA!
52
Carry-Lookahead Adder (Weinberger and Smith)
Two additional gate delays
$C_{16}$ takes a total of 5D vs. 32D for an RCA!
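A bit-level Python sketch of the 16-bit two-level scheme: group generate/propagate for each 4-bit block, then a second-level lookahead that produces c4, c8, c12 and c16 from the group signals. Function names are ours; every carry is written in two-level sum-of-products form as in the CLA equations, but gate delays are not modeled.

```python
def bit_gp(a: int, b: int, n: int) -> tuple[list[int], list[int]]:
    """Bit generate g_i = a_i b_i and propagate p_i = a_i XOR b_i."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    return g, p


def lookahead(g: list[int], p: list[int], c0: int) -> list[int]:
    """Carries [c_0, c_1, ..., c_m] for m (g, p) pairs, each in expanded form:
    c_{j+1} = g_j + p_j g_{j-1} + ... + p_j ... p_0 c_0 (no term waits on a carry)."""
    carries = [c0]
    for j in range(len(g)):
        c = 0
        for k in range(-1, j + 1):            # k = -1 stands for the carry-in term
            term = c0 if k < 0 else g[k]
            for m in range(k + 1, j + 1):
                term &= p[m]
            c |= term
        carries.append(c)
    return carries


def cla16_add(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """16-bit two-level CLA: four 4-bit groups plus one second-level unit."""
    g, p = bit_gp(a, b, 16)
    G, P = [], []
    for j in range(4):
        gs, ps = g[4 * j:4 * j + 4], p[4 * j:4 * j + 4]
        G.append(lookahead(gs, ps, 0)[-1])       # G_j = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0
        P.append(ps[0] & ps[1] & ps[2] & ps[3])  # P_j = p3 p2 p1 p0
    block_c = lookahead(G, P, c0)                # c_0, c_4, c_8, c_12, c_16
    total = 0
    for j in range(4):
        cs = lookahead(g[4 * j:4 * j + 4], p[4 * j:4 * j + 4], block_c[j])
        for i in range(4):
            total |= (p[4 * j + i] ^ cs[i]) << (4 * j + i)   # s_i = p_i XOR c_i
    return total, block_c[4]


print(cla16_add(0xFFFF, 0x0001))  # (0, 1)
```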
53
32-bit Carry Lookahead Adder
54
Carry-Lookahead Adder (Weinberger and Smith
original derivation, 1958)
55
Carry-Lookahead Adder (Weinberger and Smith
original derivation)
56
Carry-Lookahead Adder (Weinberger and
Smith): please notice the similarity with
Parallel-Prefix Adders!
57
Carry-Lookahead Adder (Weinberger and
Smith): please notice the similarity with
Parallel-Prefix Adders!
58
Motorola CLA Implementation Example
  • A. Naini, D. Bearden and W. Anderson, "A 4.5nS
    96b CMOS Adder Design,"
  • Proceedings of the IEEE Custom Integrated
    Circuits Conference, May 3-6, 1992.

59
Critical path in Motorola's 64-bit CLA
[Figure: critical path annotated with delays of 1.05, 1.7, 2.0, 2.35, 2.7, 3.75 and 4.8 ns]
60
Motorola's 64-bit CLA: Conventional PG Block
No better situation here! The carry ripples locally,
with 5 transistors in the path.
Basically, this is Manchester carry chain (MCC)
performance with carry-skip. One should not expect
any better results than from a VBA.
61
Motorola's 64-bit CLA: Modified PG Block
Intermediate propagate signals $P_{i0}$ are
generated to speed up C3;
the critical path still resembles the MCC.
62
Motorola's 64-bit CLA
64
Delay Optimized CLA
  • B. Lee, V. G. Oklobdzija
  • Journal of VLSI Signal Processing, Vol.3, No.4,
    October 1991

65
Delay Optimized CLA Lee-Oklobdzija 91
(a) fixed groups and levels; (b) variable-sized
groups, fixed levels; (c) variable-sized groups
and fixed levels; (d) variable-sized groups and
levels
66
Two Levels of Logic Implementation of the Carry
Block
67
Two Levels of Logic Implementation of the
Carry-Lookahead Block
68
Three Levels of Logic Implementation of the Carry
Block (restricted fan-in)
69
Three Levels of Logic Implementation of the Carry
Lookahead (restricted fan-in)
70
Delay Optimized CLA Lee-Oklobdzija 91
[Plot: delay of the three-level and the two-level BCLA]
71
Delay Optimized CLA Lee-Oklobdzija 91
(a) 2-level BCLA: D = 8.5 ns; (b) 3-level
BCLA: D = 8.9 ns
72
Ling's Adder
  • Huey Ling, "High-Speed Binary Adder,"
  • IBM Journal of Research and Development, Vol. 25,
    No. 3, 1981.
  • Used in IBM 3033, IBM 168, Amdahl V6, HP, etc.

73
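Ling's Derivations
Define the Ling pseudo-carry $H_{i+1}$.
$g_i$ implies $C_{i+1}$, which implies $H_{i+1}$; thus $g_i = g_i H_{i+1}$.
a_i  b_i  p_i  g_i  t_i
 0    0    0    0    0
 0    1    1    0    1
 1    0    1    0    1
 1    1    0    1    1
($p_i = a_i \oplus b_i$, $g_i = a_i b_i$, $t_i = a_i + b_i$)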
74
Ling's Derivations
From
and
because
fundamental expansion
Now we need to derive the sum equation.
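One way to write the derivation, assuming the pseudo-carry is defined as $H_{i+1} = C_{i+1} + C_i$ (a standard form of Ling's definition, consistent with $g_i \Rightarrow C_{i+1} \Rightarrow H_{i+1}$), using $g_i$, $p_i$ and $t_i$ from the truth table:

```latex
\begin{aligned}
C_{i+1} &= g_i + t_i C_i && \text{(carry recurrence)}\\
H_{i+1} &= C_{i+1} + C_i = g_i + C_i && \text{(pseudo-carry; absorption)}\\
t_i H_{i+1} &= t_i g_i + t_i C_i = g_i + t_i C_i = C_{i+1} && (g_i t_i = g_i)\\
H_{i+1} &= g_i + t_{i-1} H_i && (C_i = t_{i-1} H_i)\\
s_i &= p_i \oplus C_i = p_i \oplus t_{i-1} H_i && \text{(sum)}
\end{aligned}
```

For a 4-bit group without carry-in this gives $H_4 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$, whereas the conventional group generate is $G = g_3 + t_3 g_2 + t_3 t_2 g_1 + t_3 t_2 t_1 g_0$: the largest product term of $H_4$ has one input fewer.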
75
Ling Adder
Ling's equations: a variation of the CLA
Ling, IBM J. Res. Dev, 5/81
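A Python sketch that adds two numbers through Ling's pseudo-carry, to check that the recurrences reproduce ordinary addition. It assumes the form $H_{i+1} = g_i + t_{i-1} H_i$, $C_{i+1} = t_i H_{i+1}$ with $H_1 = g_0 + c_{in}$; the function name is ours, and it models the equations, not Ling's circuit.

```python
def ling_add(a: int, b: int, n: int, c_in: int = 0) -> tuple[int, int]:
    """n-bit addition computed through Ling's pseudo-carry H (assumes n >= 1).

    Uses g_i = a_i b_i, t_i = a_i + b_i, p_i = a_i XOR b_i and
        H_{i+1} = g_i + t_{i-1} H_i,   C_{i+1} = t_i H_{i+1},   s_i = p_i XOR C_i,
    with H_1 = g_0 + c_in.
    """
    def bit(x: int, i: int) -> int:
        return (x >> i) & 1

    g = [bit(a, i) & bit(b, i) for i in range(n)]
    t = [bit(a, i) | bit(b, i) for i in range(n)]
    p = [bit(a, i) ^ bit(b, i) for i in range(n)]

    total = 0
    carry = c_in            # C_0
    h = g[0] | c_in         # H_1
    for i in range(n):
        if i > 0:
            h = g[i] | (t[i - 1] & h)    # H_{i+1} from H_i
        total |= (p[i] ^ carry) << i     # s_i = p_i XOR C_i
        carry = t[i] & h                 # C_{i+1} = t_i H_{i+1}
    return total, carry


print(ling_add(0b1011, 0b0110, 4))  # (1, 1)
```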
76
Ling Adder
Ling's equation: a variation of the CLA
Ling uses a different transfer function. Four such
functions have the desired properties (Ling's
is one of them);
see Doran, IEEE Trans. on Computers, Vol. 37, No. 9,
Sept. 1988.
77
Ling Adder
Conventional: fan-in of 5
Ling: fan-in of 4
78
Advantages of Ling's Adder
  • Uniform loading in fan-in and fan-out
  • H16 contains 8 terms, compared with 15 terms
    for G16.
  • H16 can be implemented with one level of logic
    (in ECL), while G16 cannot.
  • Ling's adder takes full advantage of wired-OR,
    which is of special importance when ECL technology
    is used.

79
VLSI Arithmetic: Lecture 5
  • Prof. Vojin G. Oklobdzija
  • University of California
  • http://www.ece.ucdavis.edu/acsel

80
Review
  • Lecture 4

81
Ling's Adder
  • Huey Ling, "High-Speed Binary Adder,"
  • IBM Journal of Research and Development, Vol. 25,
    No. 3, 1981.
  • Used in IBM 3033, IBM S370/168, Amdahl V6, HP,
    etc.

87
Advantages of Ling's Adder
  • Uniform loading in fan-in and fan-out
  • H16 contains 8 terms, compared with 15 terms
    for G16.
  • H16 can be implemented with one level of logic
    (in ECL, with an 8-way wired-OR), while G16 cannot.
  • Ling's adder takes full advantage of wired-OR,
    which is of special importance when ECL technology
    is used; his IBM limitation was a fan-in of 4 and
    a wired-OR of 8.

88
Ling Weinberger Notes
89
Ling Weinberger Notes
90
Ling Weinberger Notes
91
Advantages of Ling's Adder
  • 32-bit adder used in IBM 3033, IBM S370/
    Model 168, Amdahl V6.
  • Implements 32-bit addition in 3 levels of logic
  • Implements 32-bit AGEN (B + Index + Disp) in 4
    levels of logic (rather than 6)
  • 5 levels of logic for the 64-bit adder used in an
    HP processor

92
Implementation of Ling's Adder in CMOS (S.
Naffziger, "A Subnanosecond 64-b Adder," ISSCC
96)
93
S. Naffziger, ISSCC96
94
S. Naffziger, ISSCC96
95
S. Naffziger, ISSCC96
96
S. Naffziger, ISSCC96
97
S. Naffziger, ISSCC96
98
S. Naffziger, ISSCC96
99
S. Naffziger, ISSCC96
100
S. Naffziger, ISSCC96
101
S. Naffziger, ISSCC96
102
S. Naffziger, ISSCC96
103
S. Naffziger, ISSCC96
104
Ling Adder Critical Path
105
Ling Adder Circuits
106
LCS4 Critical G Path
107
LCS4 Logical Effort Delay
108
Results
  • 0.5 µm technology
  • Speed: 0.930 ns
  • Nominal process, 80 °C, V = 3.3 V

See S. Naffziger, "A Subnanosecond 64-b Adder,"
ISSCC 96
109
Prefix Adders and Parallel Prefix Adders
110
from Ercegovac-Lang
111
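Prefix Adders
The following recurrence operation is defined:
$(g, p) \circ (g', p') = (g + p g', \; p p')$
such that
$(G_i, P_i) = (g_0, p_0)$ for $i = 0$
$(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$ for $1 \le i \le n$
$c_{i+1} = G_i$ for $i = 0, 1, \ldots, n$
With a carry-in, $(g_{-1}, p_{-1}) = (c_{in}, c_{in})$, which gives $c_1 = g_0 + p_0 c_{in}$.
This operation is associative, but not commutative.
It can also span a range of bits (overlapping and
adjacent).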
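The prefix operator and the serial recurrence can be sketched in a few lines of Python (the names `prefix_op` and `serial_prefix_carries` are ours). The asserts check the stated properties on all (g, p) pairs: associativity, lack of commutativity, and idempotency, the property that allows overlapping ranges.

```python
GP = tuple[int, int]  # a (generate, propagate) pair


def prefix_op(x: GP, y: GP) -> GP:
    """(g, p) o (g', p') = (g + p g', p p'); x covers the more significant bits."""
    g, p = x
    g2, p2 = y
    return g | (p & g2), p & p2


def serial_prefix_carries(a: int, b: int, n: int, c_in: int = 0) -> list[int]:
    """Carries c_1 ... c_n from (G_i, P_i) = (g_i, p_i) o (G_{i-1}, P_{i-1}),
    seeded with (g_{-1}, p_{-1}) = (c_in, c_in); c_{i+1} = G_i."""
    acc = (c_in, c_in)
    out = []
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        acc = prefix_op((ai & bi, ai ^ bi), acc)
        out.append(acc[0])
    return out


pairs = [(g, p) for g in (0, 1) for p in (0, 1)]
# Associative: any grouping of adjacent ranges gives the same result.
assert all(prefix_op(prefix_op(x, y), z) == prefix_op(x, prefix_op(y, z))
           for x in pairs for y in pairs for z in pairs)
# Not commutative: a kill (0, 0) above a generate (1, 0) differs from the reverse order.
assert prefix_op((0, 0), (1, 0)) != prefix_op((1, 0), (0, 0))
# Idempotent: combining a range with itself changes nothing.
assert all(prefix_op(x, x) == x for x in pairs)
```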
112
from Ercegovac-Lang
113
Parallel Prefix Adders: a variety of possibilities
from Ercegovac-Lang
114
Pyramid Adder: M. Lehman, "A Comparative Study of
Propagation Speed-up Circuits in Binary
Arithmetic Units," IFIP Congress, Munich,
Germany, 1962.
115
Parallel Prefix Adders: a variety of possibilities
from Ercegovac-Lang
116
Parallel Prefix Adders: a variety of possibilities
from Ercegovac-Lang
117
Hybrid BK-KS Adder
118
Parallel Prefix Adders: S. Knowles 1999
The $\circ$ operation is associative
The $\circ$ operation is idempotent
The group generate produces the carry (with $c_{in} = 0$)
119
Parallel Prefix Adders: Ladner-Fisher
Exploits associativity, but not idempotency.
Produces minimal logical depth
120
Parallel Prefix Adders: Ladner-Fisher (16,8,4,2,1)
Two wires at each level. Uniform fan-in of
two. Large fan-out (16 = n/2). Large capacitive
loading combined with long wires (in the last
stages)
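A Python sketch of the minimum-depth divide-and-conquer prefix network that Knowles labels Ladner-Fisher (also commonly called a Sklansky network). It counts prefix operators and the largest lateral fan-out; the names are ours and only the logic and connectivity are modeled, not wire lengths or loading.

```python
def prefix_op(x, y):
    """(g, p) o (g', p') = (g + p g', p p'); x covers the more significant bits."""
    return x[0] | (x[1] & y[0]), x[1] & y[1]


def ladner_fisher(gp: list[tuple[int, int]]) -> tuple[list[tuple[int, int]], int, int]:
    """Minimum-depth divide-and-conquer prefix network (assumes len(gp) is a
    power of two, at least 2). Returns (prefixes, operators, max lateral fan-out)."""
    n = len(gp)
    x = list(gp)
    ops = 0
    max_fanout = 0
    level = 0
    while (1 << level) < n:
        fanout: dict[int, int] = {}
        for i in range(n):
            if (i >> level) & 1:                  # upper half of a 2^(level+1)-bit block
                j = ((i >> level) << level) - 1   # last bit of the lower half
                x[i] = prefix_op(x[i], x[j])
                ops += 1
                fanout[j] = fanout.get(j, 0) + 1
        max_fanout = max(max_fanout, max(fanout.values()))
        level += 1
    return x, ops, max_fanout


_, ops, fanout = ladner_fisher([(0, 1)] * 32)
print(ops, fanout)  # 80 16
```

Only (n/2) log2(n) operators are needed, but at the last level a single node drives n/2 = 16 others, which is the loading problem noted on the slide.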
121
Parallel Prefix Adders: Kogge-Stone
Exploits idempotency to limit the fan-out to 1.
Dramatic increase in wires. The wire span
remains the same as in Ladner-Fisher. Buffers are
needed in both cases (K-S, L-F).
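A Python sketch of the Kogge-Stone network in the same style (names are ours): at distance d = 1, 2, 4, ... every node i >= d combines with node i - d, so each node drives at most one lateral wire per level. The count shows where the extra wiring comes from.

```python
def prefix_op(x, y):
    """(g, p) o (g', p') = (g + p g', p p'); x covers the more significant bits."""
    return x[0] | (x[1] & y[0]), x[1] & y[1]


def kogge_stone(gp: list[tuple[int, int]]) -> tuple[list[tuple[int, int]], int, int]:
    """Kogge-Stone prefix network. Returns (prefixes, levels, operators)."""
    n = len(gp)
    x = list(gp)
    levels = ops = 0
    d = 1
    while d < n:
        # Build the next level from the current one (all nodes work in parallel).
        x = [prefix_op(x[i], x[i - d]) if i >= d else x[i] for i in range(n)]
        ops += n - d
        levels += 1
        d *= 2
    return x, levels, ops


_, levels, ops = kogge_stone([(0, 1)] * 32)
print(levels, ops)  # 5 129
```

The depth stays at log2(n), but the operator count (and the number of lateral wires) grows as n log2(n) - n + 1.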
122
Kogge-Stone Adder
123
Parallel Prefix Adders: Brent-Kung
  • Sets the fan-out to one
  • Avoids the explosion of wires (as in K-S)
  • Makes no sense in CMOS:
  • the fan-out-of-1 limit is arbitrary and extreme
  • much of the capacitive load is due to wires
    (anyway)
  • It is more efficient to insert buffers in L-F
    than to use the B-K scheme
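A Python sketch of the Brent-Kung network (names are ours): an up-sweep builds prefixes of aligned blocks of size 2, 4, ..., n, and a down-sweep fills in the remaining positions. Each node drives at most one lateral wire per level, and the price is extra depth.

```python
def prefix_op(x, y):
    """(g, p) o (g', p') = (g + p g', p p'); x covers the more significant bits."""
    return x[0] | (x[1] & y[0]), x[1] & y[1]


def brent_kung(gp: list[tuple[int, int]]) -> tuple[list[tuple[int, int]], int, int]:
    """Brent-Kung prefix network (assumes len(gp) is a power of two).
    Returns (prefixes, levels, operators)."""
    n = len(gp)
    x = list(gp)
    levels = ops = 0
    d = 1
    while d < n:                                   # up-sweep
        for i in range(2 * d - 1, n, 2 * d):
            x[i] = prefix_op(x[i], x[i - d])
            ops += 1
        levels += 1
        d *= 2
    d = n // 4
    while d >= 1:                                  # down-sweep
        for i in range(3 * d - 1, n, 2 * d):
            x[i] = prefix_op(x[i], x[i - d])
            ops += 1
        levels += 1
        d //= 2
    return x, levels, ops


_, levels, ops = brent_kung([(0, 1)] * 32)
print(levels, ops)  # 9 57
```

For 32 bits this uses 2 log2(n) - 1 = 9 levels instead of 5, which is the depth penalty the slide weighs against simply buffering the L-F network.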

124
Brent-Kung Adder
125
Parallel Prefix Adders: Han-Carlson
  • Is a hybrid synthesis of L-F and K-S
  • Trades an increase in logic depth for a reduction
    in fan-out
  • Effectively a higher-radix variant of K-S
  • Others do it similarly, by serializing the prefix
    computation at the higher fan-out nodes.
  • Others similarly trade logical depth for a
    reduction of fan-out and wires.

126
Parallel Prefix Adders: a variety of possibilities
from Knowles
bounded by L-F and K-S at the ends
127
Parallel Prefix Adders: a variety of
possibilities (Knowles 1999)
  • The following rules are used:
  • Lateral wires at the jth level span 2^j bits
  • Lateral fan-out at the jth level is a power of 2
    up to 2^j
  • Lateral fan-out at the jth level cannot exceed
    that at the (j+1)th level.

128
Parallel Prefix Adders: a variety of
possibilities (Knowles 1999)
  • The number of minimal-depth graphs of this type
    is given in
  • At 4 bits there are only K-S and L-F; beyond that
    there are several new possibilities.
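The fan-out rules can be turned into a small enumerator. This sketch (our own) assumes an n-bit minimum-depth adder with log2(n) levels, numbered 0 (first) to log2(n) - 1 (last), and lists fan-outs from the last level down, matching the slide's 16,8,4,2,1 and 4,4,2,2,1 notation. It only enumerates the fan-out vectors allowed by the second and third rules; it does not build the networks.

```python
def knowles_vectors(n_bits: int) -> list[tuple[int, ...]]:
    """Lateral fan-out vectors allowed for a minimum-depth adder of
    n_bits = 2**m bits, written from the last level down to level 0."""
    m = n_bits.bit_length() - 1     # number of levels; assumes a power of two

    results = []

    def extend(prefix: list[int], j: int) -> None:
        # prefix holds the fan-outs already chosen for levels 0 .. j-1
        if j == m:
            results.append(tuple(reversed(prefix)))
            return
        f = prefix[-1] if prefix else 1   # not below the fan-out of level j-1
        while f <= (1 << j):              # a power of 2, at most 2**j
            extend(prefix + [f], j + 1)
            f *= 2

    extend([], 0)
    return results


print(knowles_vectors(4))  # [(1, 1), (2, 1)]
```

For 4 bits only (1, 1), i.e. K-S, and (2, 1), i.e. L-F, survive; for 32 bits the list contains (1, 1, 1, 1, 1), (16, 8, 4, 2, 1) and the new (4, 4, 2, 2, 1) among others.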

129
Parallel Prefix Adders: a variety of possibilities
(Knowles 1999)
  • Example of a new 32-bit adder: 4,4,2,2,1

130
Parallel Prefix Adders: a variety of possibilities
(Knowles 1999)
  • Example of a new 32-bit adder: 4,4,2,2,1

131
Parallel Prefix Adders: a variety of
possibilities (Knowles 1999)
  • Delay is given in terms of worst-case FO4
    inverter delay
  • (the nominal case is 40-50% faster)
  • K-S is the fastest
  • K-S adders are wire-limited (requiring 80% more
    area)
  • The difference between the examined schemes is
    less than 15%

132
Parallel Prefix Adders: a variety of
possibilities (Knowles 1999)
  • Conclusion
  • Irregular, hybrid schemes are possible
  • The speed-up of 15% is achieved at the cost of
    large wiring, hence area and power
  • Circuits close in speed to K-S are available at
    significantly lower wiring cost

133
VLSI Arithmetic: Lecture 6
  • Prof. Vojin G. Oklobdzija
  • University of California
  • http://www.ece.ucdavis.edu/acsel

134
Review
  • Lecture 5

151
Two Parallel Prefix Adder Structures
Kogge-Stone:
  • log(bits) carry stages
  • Extra wiring
Han-Carlson:
  • log(bits) + 1 carry stages
  • Reduced wiring and gates
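A Python sketch of the Han-Carlson structure, assuming its common form: odd positions first absorb their even neighbor, then run a Kogge-Stone network among themselves, and one final level fills in the even positions. The names are ours; the counts illustrate the log(bits) + 1 stages and the reduced number of operators.

```python
def prefix_op(x, y):
    """(g, p) o (g', p') = (g + p g', p p'); x covers the more significant bits."""
    return x[0] | (x[1] & y[0]), x[1] & y[1]


def han_carlson(gp: list[tuple[int, int]]) -> tuple[list[tuple[int, int]], int, int]:
    """Han-Carlson-style prefix network (assumes len(gp) is a power of two >= 2).
    Returns (prefixes, levels, operators)."""
    n = len(gp)
    x = list(gp)
    for i in range(1, n, 2):                   # first level: odd i absorbs i - 1
        x[i] = prefix_op(x[i], x[i - 1])
    levels, ops = 1, n // 2
    d = 2
    while d < n:                               # Kogge-Stone on odd positions only
        x = [prefix_op(x[i], x[i - d]) if i % 2 == 1 and i >= d else x[i]
             for i in range(n)]
        ops += (n - d) // 2
        levels += 1
        d *= 2
    for i in range(2, n, 2):                   # last level: even positions
        x[i] = prefix_op(x[i], x[i - 1])
    return x, levels + 1, ops + n // 2 - 1


_, levels, ops = han_carlson([(0, 1)] * 32)
print(levels, ops)  # 6 80
```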

160
Possibilities for Further Research
  • The logical depth is important (Knowles was
    right)
  • Fan-out is less important than fan-in
    (Knowles was wrong)
  • It is possible to examine a variety of topologies
    with restricted and varied fan-in.
  • Driving strength and Logical Effort rules were
    overlooked, or at least neglected
  • It is possible to create a number of topologies
    taking LE rules into account.
  • It is further possible to combine the rules with
    a compound domino implementation, taking advantage
    of the two different rules governing dynamic and
    static logic.
  • It is still possible to produce a better adder!

161
Other Types of Adders
162
Conditional Sum Adder
  • J. Sklansky, "Conditional-Sum Addition Logic,"
    IRE Transactions on Electronic Computers, EC-9,
    pp. 226-231, 1960.

163
Conditional Sum Adder
from Ercegovac-Lang
164
Conditional Sum Adder
165
Conditional Sum Adder
from Ercegovac-Lang
166
Conditional Sum Adder
from Ercegovac-Lang
167
Conditional Sum Adder
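A Python sketch of conditional-sum addition (our own formulation): every block keeps two results, one for each possible carry-in, and adjacent blocks are merged level by level, with the low block's carry-out selecting between the high block's two results, as a multiplexer would.

```python
def conditional_sum_add(a: int, b: int, n: int, c_in: int = 0) -> tuple[int, int]:
    """Conditional-sum addition (assumes n is a power of two).

    Each block stores result[c] = (sum bits of the block, carry-out) for
    carry-in c = 0 and c = 1; block width doubles at every merge level.
    """
    blocks = []
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        blocks.append({0: (ai ^ bi, ai & bi),         # assuming carry-in 0
                       1: (1 - (ai ^ bi), ai | bi)})  # assuming carry-in 1
    width = 1
    while len(blocks) > 1:
        merged = []
        for low, high in zip(blocks[0::2], blocks[1::2]):
            result = {}
            for c in (0, 1):
                s_low, c_low = low[c]
                s_high, c_high = high[c_low]          # the low carry-out selects
                result[c] = (s_low | (s_high << width), c_high)
            merged.append(result)
        blocks = merged
        width *= 2
    return blocks[0][c_in]


print(conditional_sum_add(0b1011, 0b0110, 4))  # (1, 1)
```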
168
Carry-Select Adder
  • O. J. Bedrij, "Carry-Select Adder," IRE
    Transactions on Electronic Computers, June
    1962, p. 340-34

169
Carry-Select Sum Adder
from Ercegovac-Lang
170
Carry-Select Adder
  • Addition is performed under both assumptions,
    Cin = 0 and Cin = 1.
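A Python sketch of the carry-select idea (block size and names are our own choices): each k-bit block is added twice, once assuming carry-in 0 and once assuming 1, and the actual incoming carry then selects one of the two precomputed results.

```python
def carry_select_add(a: int, b: int, n: int, k: int, c_in: int = 0) -> tuple[int, int]:
    """n-bit carry-select adder with k-bit blocks (assumes k divides n)."""
    mask = (1 << k) - 1
    carry = c_in
    total = 0
    for start in range(0, n, k):
        x = (a >> start) & mask
        y = (b >> start) & mask
        sum0 = x + y          # precomputed assuming carry-in 0 (k bits + carry-out)
        sum1 = x + y + 1      # precomputed assuming carry-in 1
        chosen = sum1 if carry else sum0       # the select (MUX) step
        total |= (chosen & mask) << start
        carry = chosen >> k
    return total, carry


print(carry_select_add(0x0F, 0x01, 8, 4))  # (16, 0)
```

In hardware both block sums are computed in parallel, so only the chain of selects depends on the incoming carry.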

171
Carry-Select Adder: combining two 32-b VBAs in
select mode
Delay = $D_{VBA32} + D_{MUX}$
172
Carry-Select Adder
O.J. Bedrij, IBM Poughkeepsie, 1962