Title: CSE 8351 Computer Arithmetic Fall 2005 Instructor: Peter-Michael Seidel
1 CSE 8351 Computer Arithmetic, Fall 2005; Instructor: Peter-Michael Seidel
2 II. Simple Algorithms for Arithmetic Unit Design in Hardware
- Addition / Multiplication / SRT Division / Square Root / Reciprocal Approximation
3 Arithmetic Unit Design in Hardware
- Input interface: n-bit operands A, B
- Output interface: n-bit result C
- Arithmetic function f: B^n x B^n -> B^n
- What is different from other (non-arithmetic) hardware units?
4 Arithmetic Unit Design in Hardware
- Specification
  - Truth tables are not feasible (2^(2n) x n entries)
  - for n = 64: more than a googol
- Complexity
  - there are 2^(2^(2n) x n) different functions
  - for n = 64: more than a googolplex
- Only a handful are interesting/used out of a googolplex!
- Other forms of specification are possible
5 Arithmetic functions supported
- Functions are interesting because of specific properties that arise at the operand level
- -> use mathematical formalism to specify functionality, e.g. for defined values <A>, <B>, <C> in N:
  <C> = f(<A>, <B>) = <A> + <B>
- -> this does not directly help in implementing or testing
- -> use tools from the (well established) science of mathematics to transform the equation and extract local properties
- -> these help exploit limited global influence, local computation, reuse, recurrence
6 Notations, Representations, Values
- Bit strings: sequences of bits (concatenation also written (.., .., ..)); a = 0011010 = (001, 10, 10)
- For a bit x and a natural number n, x^n denotes the string consisting of n copies of x
- Bits of strings are indexed from right (0) to left (n-1)
7 Binary representation
- Natural number with binary representation a = a_{n-1} ... a_0: <a> = Σ_{i=0}^{n-1} a_i * 2^i
- Range of numbers which have a binary representation of length n: {0, ..., 2^n - 1}
- n-bit binary representation of a natural number x: the string a with <a> = x
8 Two's complement representation
- Number with two's complement representation a = a_{n-1} ... a_0: [a] = -a_{n-1} * 2^{n-1} + <a_{n-2} ... a_0>
- Range of numbers with a two's complement representation of length n: {-2^{n-1}, ..., 2^{n-1} - 1}
- n-bit two's complement representation of a number x: the string a with [a] = x
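The two value mappings on this slide can be sketched directly in Python (the helper names `bin_val` and `twoc_val` are illustrative, not from the slides):

```python
def bin_val(a: str) -> int:
    """<a>: value of the bit string a as an unsigned binary number."""
    return sum(int(bit) << i for i, bit in enumerate(reversed(a)))

def twoc_val(a: str) -> int:
    """[a]: value of the bit string a as an n-bit two's complement number."""
    n = len(a)
    # [a] = <a> - a_{n-1} * 2^n, which equals -a_{n-1} * 2^{n-1} + <rest>
    return bin_val(a) - (int(a[0]) << n)

print(bin_val("0011010"))   # 26
print(twoc_val("1010"))     # -6  (range of 4 bits: -8 .. 7)
print(twoc_val("0110"))     # 6
```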
9 Binary Addition
- Binary addition (specification): <c_out s_{n-1} ... s_0> = <a> + <b> + c_in
- Coping with complexity
- Simple for n = 1
10 Addition (n = 1)
- Half adder: adds two bits a, b; the sum is represented by carry c and sum bit s with 2c + s = a + b
  - obvious equations: s = a XOR b, c = a AND b
- Full adder: adds three bits a, b, c_in with 2c + s = a + b + c_in
  - obvious equations: s = a XOR b XOR c_in, c = (a AND b) OR (a AND c_in) OR (b AND c_in)
11 Addition (n = 1)
- Half adder / full adder implementations
12 Binary Addition
- Greedy approach (right to left) -> Ripple Carry Adder
- Development/verification based on equivalence transformations of the specification
13 Ripple Carry Adder
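A minimal bit-level sketch of the ripple-carry construction, built from the full adder of slide 10 (function names are mine):

```python
def full_adder(a, b, cin):
    s = a ^ b ^ cin                          # sum bit
    cout = (a & b) | (a & cin) | (b & cin)   # majority = carry out
    return s, cout

def ripple_carry_add(a, b, cin=0):
    """Add two equal-length bit lists (index 0 = least significant)."""
    assert len(a) == len(b)
    s, c = [], cin
    for ai, bi in zip(a, b):                 # the carry ripples right to left
        si, c = full_adder(ai, bi, c)
        s.append(si)
    return s, c                              # n sum bits plus carry-out

# 5 + 3 with n = 4, bits given LSB-first
s, cout = ripple_carry_add([1, 0, 1, 0], [1, 1, 0, 0])
print(s, cout)   # [0, 0, 0, 1] 0  ->  5 + 3 = 8
```

The delay is linear in n because each full adder must wait for the carry of its right neighbor, which motivates the faster adders later in the deck.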
14 Basic properties (1)
- Leading zeros do not change the value of a binary representation: <0^k a> = <a>
- Binary representations can be split at each position k: <a> = <a[n-1 : k]> * 2^k + <a[k-1 : 0]>
- Two's complement representations have a sign bit a_{n-1}: [a] < 0 iff a_{n-1} = 1
- Construct a two's complement representation from a binary representation: [0 a] = <a>; note that the two's complement representation is longer by one bit
15 Basic properties (2)
- Sign extension does not change the value: [a_{n-1} a] = [a]
- Negation of a number in two's complement representation: -[a] = [NOT a] + 1; basis for the subtraction algorithm!
- Congruences modulo 2^n: [a] ≡ <a> mod 2^n
16 Basic properties (3)
- Two's complement addition based on binary addition
- For a, b of length n: the result s of the n-bit binary addition satisfies <s> = (<a> + <b>) mod 2^n
- and [s] ≡ [a] + [b] mod 2^n, so the same result is useful for n-bit two's complement addition
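The congruence above means one binary adder serves both number systems; a small sketch (helper names are mine):

```python
def add_mod(a, b, n):
    """n-bit binary adder: sum mod 2^n, carry-out dropped."""
    return (a + b) & ((1 << n) - 1)

def to_twoc(x, n):
    """Encode a signed value as an n-bit two's complement pattern."""
    return x & ((1 << n) - 1)

def from_twoc(x, n):
    """Decode an n-bit pattern as a signed two's complement value."""
    return x - (1 << n) if x >> (n - 1) else x

n = 8
s = add_mod(to_twoc(-3, n), to_twoc(5, n), n)
print(from_twoc(s, n))   # 2 -- signed result out of the unsigned adder
```

This works whenever the exact signed result is representable in n bits; otherwise the mod-2^n wraparound shows up as overflow.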
17-24 Ripple Carry Adder (stepwise construction; figures)
25 Binary Addition
- Complexity: delay, cost, power
- Lower bounds? For what computational model?
- What assumptions?
26 Faster Addition
- Challenge (KP95):
  - numbers are given as stacks of digits
  - it takes 1 second to add two digits and put the result digit on the result stack
  - one person can add two 5000-digit numbers in 5000 seconds?! How?
  - Can two people add two 5000-digit numbers in less than an hour?
- Observation (notion of carries):
  - limited carry propagation
  - -> pre-computing the upper sums for both cases c_k = 1 and c_k = 0
- Divide and conquer; but the ripple-carry approach is also divide and conquer
27 Conditional Sum Adder
- Main observation: limited carry propagation
- -> pre-computing the upper sums for both cases c_k = 1 and c_k = 0
- Assume n is a power of 2
28-33 Conditional Sum Adder (stepwise construction of the same principle; figures)
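The conditional-sum principle can be sketched recursively in Python: the upper half is computed for both possible incoming carries, and the lower half's carry selects between them (all names are mine):

```python
def csa_add(a, b, cin=0):
    """Conditional-sum addition of LSB-first bit lists; len(a) a power of 2."""
    n = len(a)
    if n == 1:                            # base case CSA(1) = full adder
        s = a[0] ^ b[0] ^ cin
        c = (a[0] & b[0]) | (a[0] & cin) | (b[0] & cin)
        return [s], c
    k = n // 2
    lo, c_lo = csa_add(a[:k], b[:k], cin)
    hi0, c0 = csa_add(a[k:], b[k:], 0)    # pre-computed upper sum for c_k = 0
    hi1, c1 = csa_add(a[k:], b[k:], 1)    # pre-computed upper sum for c_k = 1
    return lo + (hi1 if c_lo else hi0), (c1 if c_lo else c0)

bits = lambda x, n: [(x >> i) & 1 for i in range(n)]   # int -> LSB-first bits
val = lambda v: sum(b << i for i, b in enumerate(v))   # bits -> int

s, c = csa_add(bits(200, 8), bits(99, 8))
print(val(s) + (c << 8))   # 299
```

In hardware both recursive halves evaluate in parallel, so the depth is logarithmic in n, at the price of duplicating the upper half.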
34 Conditional Sum Adder (figure)
35-45 Conditional Sum Adder (stepwise figures)
- A full adder implements the adder for n = 1: CSA(1) = FA!
46 Binary Multiplication
- Binary multiplication (specification): <c> = <a> * <b>
- Remember binary addition (specification)
- The product of two n-bit numbers is representable with 2n bits!
47 Implementation to cope with complexity
- Strategies that worked for binary addition:
  - consideration of small n
  - property extraction from the specification
  - greedy approach
  - divide & conquer
- Strategies for binary multiplication:
  - consideration of small n
  - reduction approach
  - divide & conquer
  - reduction to binary addition
  - rewriting of the specification
  - considering logarithms (European Logarithmic Processor)
48 Consideration of small n
- Binary multiplication is even simpler than addition for n = 1: a * b = a AND b
- This also works for n x 1-bit multiplication: <a_{n-1} ... a_0> * b = <(a_{n-1} AND b) ... (a_0 AND b)>
- Consider <b_{n-1} ... b_0> < 2^n
49 Reduction Approach
- An n-bit multiplication reduces to an (n-1)-bit multiplication plus a 1-bit AND and an addition (with carry-in)
- Unrolled: n 1-bit ANDs and (n-1) additions
- Implementation, complexity?
50 Multiplication: Reduction to Sums
- Definition: partial products <a> * b_i * 2^i
- (simple to compute in binary)
- (not affected by the remaining sum)
51 Binary Multiplication
- Implementations similar to the grade-school algorithm:

      0010  (multiplicand)
   x  1011  (multiplier)
      0010
     0010
    0000
   0010
  00010110

- Negative numbers: convert and multiply
- Better technique: using Booth recoding
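The grade-school scheme is the shift-and-add multiplier: stage i adds the partial product A * 2^i whenever multiplier bit B_i is set. A minimal sketch (function name is mine):

```python
def shift_add_mul(a: int, b: int, n: int) -> int:
    """Grade-school multiplication of two n-bit unsigned numbers."""
    prod = 0
    for i in range(n):
        if (b >> i) & 1:       # partial product a * b_i * 2^i
            prod += a << i
    return prod                 # result fits in 2n bits

print(shift_add_mul(0b0010, 0b1011, 4))   # 22 == 0b00010110, as on the slide
```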
52 Multiplier Implementation
- Stage i accumulates A * 2^i if B_i = 1
- What are the boxes? How much hardware for an n-bit multiplier?
53 Multiplication Complexity
- So far:
  - Delay(n) = D_AND + n * D_FA = O(n)
  - Cost(n) = n^2 * (C_AND + C_FA) = O(n^2)
- Inherent problem:
  - adding n partial products (n-bit numbers)
  - addition itself can be done in delay O(log(n))
54 Parallel Multiplication (partial product adder tree)
- Partial product generation and additions can be done in parallel
- (figure: operands A and B feed a row of PPG blocks whose outputs are summed by a tree of binary adders yielding product C)
- Fanout? Precisions? Delay? Cost?
55 Redundant Adder Tree
- Redundant addition (carry-save adders): compression of 3 binary operands to 2 by the use of a line of full adders
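The 3-to-2 compression is easy to sketch on whole words: a line of full adders produces a sum word and a carry word per bit position, with no carry chain (function name is mine):

```python
def csa_3to2(x: int, y: int, z: int):
    """Carry-save compression: three operands in, sum and carry words out."""
    s = x ^ y ^ z                            # per-bit sum, no propagation
    c = ((x & y) | (x & z) | (y & z)) << 1   # per-bit carries, weighted 2
    return s, c

s, c = csa_3to2(13, 9, 6)
print(s + c, 13 + 9 + 6)   # both 28: only one carry-propagating add remains
```

Because each output bit depends only on three input bits, the delay of one compression level is a single full adder, independent of the word length; this is what makes the adder trees on the following slides fast.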
56 Redundant Adder Tree
- Redundant addition of 3 partial products to 2
57 Redundant Adder Tree
- Redundant addition of 4 partial products to 2
58 Redundant Adder Tree
- Redundant addition of 4 internal partial products to 2
59 Redundant Adder Tree
- Tree structure of redundant compressors
- Cost? Delay? See Wallace tree designs
60 (Modified) Booth Recoding
- Operand recoding to
  - allow for signed multiplication
  - reduce the number of partial products
- Popular recoding choice: radix-4 digits in {-2, -1, 0, 1, 2} based on overlapping bit triples (b_{2i+1}, b_{2i}, b_{2i-1})
61-62 (Modified) Booth Recoding (recoding table; figures)
63 (Modified) Booth Recoding
- Implementation of the recoding
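A small sketch of radix-4 (modified) Booth recoding, assuming the common digit rule d = -2*b_{i+1} + b_i + b_{i-1} over overlapping triples (all function names are mine). The bit string is interpreted in two's complement, which is what makes signed multiplication come for free:

```python
def booth_digits(b: int, n: int):
    """Radix-4 Booth digits of an n-bit two's complement pattern (n even)."""
    digits = []
    for i in range(0, n, 2):
        # triple holds bits (b_{i+1}, b_i, b_{i-1}), with b_{-1} = 0
        triple = (b >> (i - 1)) & 0b111 if i else (b & 0b11) << 1
        d = (triple & 1) + ((triple >> 1) & 1) - 2 * ((triple >> 2) & 1)
        digits.append(d)                 # one digit in {-2,...,2} per 2 bits
    return digits

def booth_value(digits):                 # radix-4 digits, least significant first
    return sum(d * 4**i for i, d in enumerate(digits))

print(booth_digits(0b101101, 6))                 # [1, -1, -1]
print(booth_value(booth_digits(0b101101, 6)))    # -19, the two's complement value
```

With n/2 digits instead of n bits, only half as many partial products need to be summed, each of which is just A, 2A, 0 or their negations.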
64 Recursive Multiplication
- Split the operands in halves: A * B = A_H * B_H * 2^n + (A_H * B_L + A_L * B_H) * 2^{n/2} + A_L * B_L
- What does the implementation require?
- Is it better than previous designs?
- Do improvements by Karatsuba (1962) in asymptotic complexity help?
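Karatsuba's trick computes the middle term from one extra product of the half-sums, so three half-size multiplications replace four; a minimal sketch (names are mine):

```python
def karatsuba(a: int, b: int, n: int) -> int:
    """Multiply two n-bit numbers with 3 half-size recursive products."""
    if n <= 8:                        # base case: small operands directly
        return a * b
    k = n // 2
    a_hi, a_lo = a >> k, a & ((1 << k) - 1)
    b_hi, b_lo = b >> k, b & ((1 << k) - 1)
    hi = karatsuba(a_hi, b_hi, n - k)
    lo = karatsuba(a_lo, b_lo, k)
    # (a_hi + a_lo)(b_hi + b_lo) - hi - lo = a_hi*b_lo + a_lo*b_hi
    mid = karatsuba(a_hi + a_lo, b_hi + b_lo, k + 1) - hi - lo
    return (hi << (2 * k)) + (mid << k) + lo

print(karatsuba(12345, 6789, 16))   # 83810205
```

This gives O(n^log2(3)) ≈ O(n^1.585) bit operations asymptotically; whether the constant factors pay off at hardware-relevant word sizes is exactly the question the slide raises.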
65 Division
- Multiplication specification: C = A * B
- For division: inputs A, B; output Q
- Not always an (exact) solution!
- Consider Q and R so that A = Q * B + R
- with 0 <= R < B (R the remainder)
66 Division
- Two simple approaches: reduce to simpler operations
  - subtractions
  - multiplications
67 Subtractive Division
- Dividing using subtractions!
- Starting from the left or from the right?
- Considering ranges
68 SRT Division
- Consider normalized operands, i.e. b_{n-1} = 1
- Choose the largest k with B * 2^k <= A
- Recurrence step i (radix-2): R_{i+1} = 2 * R_i - q_{i+1} * B
- with quotient digits q_i in {-1, 0, 1}
- this implies A/B = Σ_i q_i * 2^{-i} plus a remainder term
69 SRT Division
- Recurrence
- Implementation
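A radix-2 SRT-style sketch of the recurrence (the digit selection below is a simplified threshold rule on the partial remainder, not the exact table from the slides):

```python
def srt_divide(a: float, b: float, steps: int) -> float:
    """Approximate a/b for 0 <= a <= b, 1/2 <= b < 1; one digit per step."""
    r, q = a, 0.0
    for i in range(1, steps + 1):
        r *= 2                      # shift: R_{i+1} = 2 R_i - q_{i+1} B
        if r >= 0.5:
            qd = 1
        elif r <= -0.5:
            qd = -1
        else:
            qd = 0                  # redundancy: digit 0 needs no subtraction
        r -= qd * b
        q += qd * 2.0 ** -i         # accumulate signed quotient digits
    return q

print(abs(srt_divide(0.30, 0.75, 30) - 0.4) < 1e-6)   # True
```

The key point of the redundant digit set {-1, 0, 1} is that the digit can be chosen from a few leading bits of the partial remainder, so no full-precision comparison is needed per step.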
70 Multiplicative Division
- Approximation of A/B
- Contemporary microprocessors implement multiplicative division with
  - Newton-Raphson's algorithm (e.g. Intel IA-64)
  - Goldschmidt's algorithm (e.g. AMD K7)
- Steps in multiplicative division:
  1. rough approximation of 1/B
  2. iterative improvement of the approximation accuracy of A/B or 1/B by multiplications, complementations and shifts
  3. (multiplication with A if 1/B was approximated in step 2)
71 Newton's Algorithm
- Newton-Raphson approximation of 1/B
- Initialization: x_0 ≈ 1/B with relative error d_0 = 1 - B * x_0
- k iterations: x_{i+1} = x_i * (2 - B * x_i)
- Scaling with A: Q = A * x_k
- Quadratic convergence of the one-sided relative approximation error: d_{i+1} = d_i^2
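The Newton-Raphson reciprocal iteration in a few lines (names and the crude seed x_0 = 1 are mine; real implementations use a table lookup for the seed):

```python
def newton_recip(b: float, x0: float, k: int) -> float:
    """Approximate 1/b via x_{i+1} = x_i * (2 - b * x_i)."""
    x = x0
    for _ in range(k):
        x = x * (2.0 - b * x)   # two *dependent* multiplications per step
    return x

b = 0.75                        # assume 1/2 <= b < 1 (normalized)
x = newton_recip(b, 1.0, 5)
print(x)                        # converges to 1.3333... = 1/b
a = 0.30
print(a * x)                    # final scaling with A yields A/B
```

The relative error squares each iteration (0.25 -> 0.0625 -> ... here), so the number of correct bits doubles per step; note that the two multiplications in each step are dependent, which matters for the scheduling comparison on slide 73.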
72 Goldschmidt's Algorithm
- Goldschmidt's approximation of A/B
- Initialization: N_0 = A * x_0, D_0 = B * x_0 with x_0 ≈ 1/B
- Iteration i: F_i = 2 - D_i, N_{i+1} = N_i * F_i, D_{i+1} = D_i * F_i
- Approximation of A/B by N_k after k iterations
- Computation of the D_i is like a Newton iteration with B = 1
  -> D_i converges quadratically to 1
- From the initialization N_i / D_i = A/B
  -> N_i converges quadratically to A/B
- 2 independent multiplications per iteration
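Goldschmidt's iteration in the same style (names and the crude seed are mine); the two multiplications per step share the factor F_i but are independent of each other, so they can be issued back to back on a pipelined multiplier:

```python
def goldschmidt(a: float, b: float, x0: float, k: int) -> float:
    """Approximate a/b: drive the denominator to 1, numerator to a/b."""
    n, d = a * x0, b * x0          # initialization: N_0, D_0
    for _ in range(k):
        f = 2.0 - d                # F_i = 2 - D_i (a complementation)
        n, d = n * f, d * f        # two independent multiplications
    return n                       # D_i -> 1, hence N_i -> a/b

print(goldschmidt(0.30, 0.75, 1.0, 5))   # ~0.4
```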
73 Multiplication Scheduling
- Newton-Raphson vs. Goldschmidt-Powers
- For both: 2k+1 multiplications in total
- but Newton: 2k+1 multiplications on the critical path
- Goldschmidt: k+1 multiplications on the critical path
- (figure: multiplication schedules for operands A and B)
74 Quadratic Convergence
- for exact computation:
- Newton-Raphson: d_{i+1} = d_i^2
- Goldschmidt-Powers: 1 - D_{i+1} = (1 - D_i)^2
75 Precision Problems for exact computations
- Example with initial bit-width a = 8, p = 64
- Iteration i: Goldschmidt-Powers (2 mults with)
  - 0: a x p = 8 x 64 bits
  - 1: (a+p) x (a+p) = 72 x 72 bits
  - 2: 2(a+p) x 2(a+p) = 144 x 144 bits
  - 3: 4(a+p) x 4(a+p) = 288 x 288 bits
- -> rounding of intermediate values is required
76 Problems and State of the Art
- Newton-Raphson is self-correcting,
  - i.e. it converges to 1/B even with rounded intermediate results
  - the correction factor moves any rounded intermediate approximation in the direction of 1/B
  - rounding can even be chosen to maintain quadratic convergence, e.g. Cook
- Goldschmidt-Powers is not self-correcting,
  - i.e. convergence to A/B is not guaranteed with rounded intermediate results, because the invariant N_i / D_i = A/B no longer holds
  - quadratic convergence cannot be achieved with rounded intermediate results
  - error analysis is more complicated than for Newton-Raphson
77 Problems and State of the Art
- R. Goldschmidt (1964)
  - presentation of the algorithm for exact computations
  - implementation (IBM) with rough error analysis (absolute errors)
- E. Krishnamurthy (1970)
  - Goldschmidt's algorithm is NOT self-correcting
- O. Spaniol, in his book Computer Arithmetic (1982)
  - claims that Goldschmidt's algorithm is self-correcting
- R. Golliver, Intel IA-64 (1999)
  - Intel implements Newton-Raphson for simpler error analysis (and for a smaller multiplier)
- S. Oberman, AMD K7 (1999)
  - AMD uses a 76x76 multiplier for Goldschmidt division (68 bit), because a mechanically checked correctness proof exists
  - consideration of absolute errors
78 Problems and State of the Art
- Most work only considers variations of Newton-Raphson, for
  - simpler error analysis
  - quadratic convergence
  - no interest in constant factors
- Practical implementations:
  - constant-factor acceleration through Goldschmidt's algorithm is interesting
  - previous error analyses are rough and limited to special cases
  - a general, precise error analysis is important for cost, power and delay optimizations in practical implementations