Title: CSE 8351 Computer Arithmetic Fall 2005 Instructor: Peter-Michael Seidel
1 CSE 8351 Computer Arithmetic, Fall 2005; Instructor: Peter-Michael Seidel
2 II. Simple Algorithms for Arithmetic Unit Design in Hardware
- Addition / Multiplication / SRT Division / Square Root / Reciprocal Approximation
3 Arithmetic Unit Design in Hardware
- Input interface: n-bit operands A, B
- Output interface: n-bit result C
- Arithmetic function f: B^n x B^n -> B^n
- What is different from other (non-arithmetic) hardware units?
4 Arithmetic Unit Design in Hardware
- Specification
  - Truth tables are not feasible (2^(2n) x n entries)
  - for n = 64: more than a googol
- Complexity
  - there are 2^(2^(2n) x n) different functions
  - for n = 64: more than a googolplex
- Only a handful are interesting/used out of a googolplex!
- Other forms of specification are possible
5 Arithmetic functions supported
- Functions are interesting because of specific properties that arise at the operand level
- -> use mathematical formalism to specify functionality, e.g. for defined values <A>, <B>, <C> in N:
  <C> = f(<A>, <B>) = <A> + <B>
- -> this does not directly help in implementing or testing
- -> use tools from the (well established) science of mathematics to transform the equation and extract local properties
- -> these help exploit limited global influence, local computation, reuse, recurrence
6 Notations, Representations, Values
- Bit strings: sequences of bits (concatenation also written (.., .., ..)); a = 0011010 = (001, 10, 10)
- For a bit x and a natural number n, x^n denotes the string consisting of n copies of x
- Bits of strings are indexed from right (0) to left (n-1)
7 Binary representation
- Natural number with binary representation a = a_{n-1} ... a_0: <a> = Σ_{i=0}^{n-1} a_i * 2^i
- Range of numbers which have a binary representation of length n: {0, ..., 2^n - 1}
- n-bit binary representation of a natural number x: the string a with <a> = x
8 Two's complement representation
- Number with two's complement representation a = a_{n-1} ... a_0: [a] = -a_{n-1} * 2^{n-1} + <a_{n-2} ... a_0>
- Range of numbers with a two's complement representation of length n: {-2^{n-1}, ..., 2^{n-1} - 1}
- n-bit two's complement representation of a number x: the string a with [a] = x
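The two value mappings on this slide can be sketched directly in Python (the helper names `bin_val` and `twoc_val` are illustrative, not from the slides):

```python
def bin_val(a: str) -> int:
    """<a>: value of the bit string a as an unsigned binary number."""
    return sum(int(bit) << i for i, bit in enumerate(reversed(a)))

def twoc_val(a: str) -> int:
    """[a]: value of the bit string a as an n-bit two's complement number."""
    n = len(a)
    # [a] = <a> - a_{n-1} * 2^n, which equals -a_{n-1} * 2^{n-1} + <rest>
    return bin_val(a) - (int(a[0]) << n)

print(bin_val("0011010"))   # 26
print(twoc_val("1010"))     # -6  (range of 4 bits: -8 .. 7)
print(twoc_val("0110"))     # 6
```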
9 Binary Addition
- Binary addition (specification): <c_out s_{n-1} ... s_0> = <a> + <b> + c_in
- Coping with complexity
- Simple for n = 1
10 Addition (n = 1)
- Half adder: adds two bits a, b; the sum is represented by carry c and sum bit s with 2c + s = a + b
  - obvious equations: s = a XOR b, c = a AND b
- Full adder: adds three bits a, b, c_in with 2c + s = a + b + c_in
  - obvious equations: s = a XOR b XOR c_in, c = (a AND b) OR (a AND c_in) OR (b AND c_in)
11 Addition (n = 1)
- Half adder / full adder implementations
12 Binary Addition
- Greedy approach (right to left) -> Ripple Carry Adder
- Development/verification based on equivalence transformations of the specification
13 Ripple Carry Adder
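A minimal bit-level sketch of the ripple-carry construction, built from the full adder of slide 10 (function names are mine):

```python
def full_adder(a, b, cin):
    s = a ^ b ^ cin                          # sum bit
    cout = (a & b) | (a & cin) | (b & cin)   # majority = carry out
    return s, cout

def ripple_carry_add(a, b, cin=0):
    """Add two equal-length bit lists (index 0 = least significant)."""
    assert len(a) == len(b)
    s, c = [], cin
    for ai, bi in zip(a, b):                 # the carry ripples right to left
        si, c = full_adder(ai, bi, c)
        s.append(si)
    return s, c                              # n sum bits plus carry-out

# 5 + 3 with n = 4, bits given LSB-first
s, cout = ripple_carry_add([1, 0, 1, 0], [1, 1, 0, 0])
print(s, cout)   # [0, 0, 0, 1] 0  ->  5 + 3 = 8
```

The delay is linear in n because each full adder must wait for the carry of its right neighbor, which motivates the faster adders later in the deck.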
14 Basic properties (1)
- Leading zeros do not change the value of a binary representation: <0^k a> = <a>
- Binary representations can be split at each position k: <a> = <a[n-1 : k]> * 2^k + <a[k-1 : 0]>
- Two's complement representations have a sign bit a_{n-1}: [a] < 0 iff a_{n-1} = 1
- Construct a two's complement representation from a binary representation: [0 a] = <a>; note that the two's complement representation is longer by one bit
15 Basic properties (2)
- Sign extension does not change the value: [a_{n-1} a] = [a]
- Negation of a number in two's complement representation: -[a] = [NOT a] + 1; basis for the subtraction algorithm!
- Congruences modulo 2^n: [a] ≡ <a> mod 2^n
16 Basic properties (3)
- Two's complement addition based on binary addition
- For a, b of length n: the result s of the n-bit binary addition satisfies <s> = (<a> + <b>) mod 2^n
- and [s] ≡ [a] + [b] mod 2^n, so the same result is useful for n-bit two's complement addition
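The congruence above means one binary adder serves both number systems; a small sketch (helper names are mine):

```python
def add_mod(a, b, n):
    """n-bit binary adder: sum mod 2^n, carry-out dropped."""
    return (a + b) & ((1 << n) - 1)

def to_twoc(x, n):
    """Encode a signed value as an n-bit two's complement pattern."""
    return x & ((1 << n) - 1)

def from_twoc(x, n):
    """Decode an n-bit pattern as a signed two's complement value."""
    return x - (1 << n) if x >> (n - 1) else x

n = 8
s = add_mod(to_twoc(-3, n), to_twoc(5, n), n)
print(from_twoc(s, n))   # 2 -- signed result out of the unsigned adder
```

This works whenever the exact signed result is representable in n bits; otherwise the mod-2^n wraparound shows up as overflow.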
17-24 Ripple Carry Adder (stepwise construction; figures)
25 Binary Addition
- Complexity: delay, cost, power
- Lower bounds? For what computational model?
- What assumptions?
26 Faster Addition
- Challenge (KP95):
  - numbers are given as stacks of digits
  - it takes 1 second to add two digits and put the result digit on the result stack
  - one person can add two 5000-digit numbers in 5000 seconds?! How?
  - Can two people add two 5000-digit numbers in less than an hour?
- Observation (notion of carries):
  - limited carry propagation
  - -> pre-computing the upper sums for both cases c_k = 1 and c_k = 0
- Divide and conquer; but the ripple-carry approach is also divide and conquer
27 Conditional Sum Adder
- Main observation: limited carry propagation
- -> pre-computing the upper sums for both cases c_k = 1 and c_k = 0
- Assume n is a power of 2
28-33 Conditional Sum Adder (stepwise construction of the same principle; figures)
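The conditional-sum principle can be sketched recursively in Python: the upper half is computed for both possible incoming carries, and the lower half's carry selects between them (all names are mine):

```python
def csa_add(a, b, cin=0):
    """Conditional-sum addition of LSB-first bit lists; len(a) a power of 2."""
    n = len(a)
    if n == 1:                            # base case CSA(1) = full adder
        s = a[0] ^ b[0] ^ cin
        c = (a[0] & b[0]) | (a[0] & cin) | (b[0] & cin)
        return [s], c
    k = n // 2
    lo, c_lo = csa_add(a[:k], b[:k], cin)
    hi0, c0 = csa_add(a[k:], b[k:], 0)    # pre-computed upper sum for c_k = 0
    hi1, c1 = csa_add(a[k:], b[k:], 1)    # pre-computed upper sum for c_k = 1
    return lo + (hi1 if c_lo else hi0), (c1 if c_lo else c0)

bits = lambda x, n: [(x >> i) & 1 for i in range(n)]   # int -> LSB-first bits
val = lambda v: sum(b << i for i, b in enumerate(v))   # bits -> int

s, c = csa_add(bits(200, 8), bits(99, 8))
print(val(s) + (c << 8))   # 299
```

In hardware both recursive halves evaluate in parallel, so the depth is logarithmic in n, at the price of duplicating the upper half.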
34 Conditional Sum Adder (figure)
35-45 Conditional Sum Adder (stepwise figures)
- A full adder implements the adder for n = 1: CSA(1) = FA!
46 Binary Multiplication
- Binary multiplication (specification): <c> = <a> * <b>
- Remember binary addition (specification)
- The product of two n-bit numbers is representable with 2n bits!
47 Implementation to cope with complexity
- Strategies that worked for binary addition:
  - consideration of small n
  - property extraction from the specification
  - greedy approach
  - divide & conquer
- Strategies for binary multiplication:
  - consideration of small n
  - reduction approach
  - divide & conquer
  - reduction to binary addition
  - rewriting of the specification
  - considering logarithms (European Logarithmic Processor)
48 Consideration of small n
- Binary multiplication is even simpler than addition for n = 1: a * b = a AND b
- This also works for n x 1-bit multiplication: <a_{n-1} ... a_0> * b = <(a_{n-1} AND b) ... (a_0 AND b)>
- Consider <b_{n-1} ... b_0> < 2^n
49 Reduction Approach
- An n-bit multiplication reduces to an (n-1)-bit multiplication plus a 1-bit AND and an addition (with carry-in)
- Unrolled: n 1-bit ANDs and (n-1) additions
- Implementation, complexity?
50 Multiplication: Reduction to Sums
- Definition: partial products <a> * b_i * 2^i
- (simple to compute in binary)
- (not affected by the remaining sum)
51 Binary Multiplication
- Implementations similar to the grade-school algorithm:

      0010  (multiplicand)
   x  1011  (multiplier)
      0010
     0010
    0000
   0010
  00010110

- Negative numbers: convert and multiply
- Better technique: using Booth recoding
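The grade-school scheme is the shift-and-add multiplier: stage i adds the partial product A * 2^i whenever multiplier bit B_i is set. A minimal sketch (function name is mine):

```python
def shift_add_mul(a: int, b: int, n: int) -> int:
    """Grade-school multiplication of two n-bit unsigned numbers."""
    prod = 0
    for i in range(n):
        if (b >> i) & 1:       # partial product a * b_i * 2^i
            prod += a << i
    return prod                 # result fits in 2n bits

print(shift_add_mul(0b0010, 0b1011, 4))   # 22 == 0b00010110, as on the slide
```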
52 Multiplier Implementation
- Stage i accumulates A * 2^i if B_i = 1
- What are the boxes? How much hardware for an n-bit multiplier?
53 Multiplication Complexity
- So far:
  - Delay(n) = D_AND + n * D_FA = O(n)
  - Cost(n) = n^2 * (C_AND + C_FA) = O(n^2)
- Inherent problem:
  - adding n partial products (n-bit numbers)
  - addition itself can be done in delay O(log(n))
54 Parallel Multiplication (partial product adder tree)
- Partial product generation and additions can be done in parallel
- (figure: operands A and B feed a row of PPG blocks whose outputs are summed by a tree of binary adders yielding product C)
- Fanout? Precisions? Delay? Cost?
55 Redundant Adder Tree
- Redundant addition (carry-save adders): compression of 3 binary operands to 2 by the use of a line of full adders
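The 3-to-2 compression is easy to sketch on whole words: a line of full adders produces a sum word and a carry word per bit position, with no carry chain (function name is mine):

```python
def csa_3to2(x: int, y: int, z: int):
    """Carry-save compression: three operands in, sum and carry words out."""
    s = x ^ y ^ z                            # per-bit sum, no propagation
    c = ((x & y) | (x & z) | (y & z)) << 1   # per-bit carries, weighted 2
    return s, c

s, c = csa_3to2(13, 9, 6)
print(s + c, 13 + 9 + 6)   # both 28: only one carry-propagating add remains
```

Because each output bit depends only on three input bits, the delay of one compression level is a single full adder, independent of the word length; this is what makes the adder trees on the following slides fast.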
56 Redundant Adder Tree
- Redundant addition of 3 partial products to 2
57 Redundant Adder Tree
- Redundant addition of 4 partial products to 2
58 Redundant Adder Tree
- Redundant addition of 4 internal partial products to 2
59 Redundant Adder Tree
- Tree structure of redundant compressors
- Cost? Delay? See Wallace tree designs
60 (Modified) Booth Recoding
- Operand recoding to
  - allow for signed multiplication
  - reduce the number of partial products
- Popular recoding choice: radix-4 digits in {-2, -1, 0, 1, 2} based on overlapping bit triples (b_{2i+1}, b_{2i}, b_{2i-1})
61-62 (Modified) Booth Recoding (recoding table; figures)
63 (Modified) Booth Recoding
- Implementation of the recoding
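A small sketch of radix-4 (modified) Booth recoding, assuming the common digit rule d = -2*b_{i+1} + b_i + b_{i-1} over overlapping triples (all function names are mine). The bit string is interpreted in two's complement, which is what makes signed multiplication come for free:

```python
def booth_digits(b: int, n: int):
    """Radix-4 Booth digits of an n-bit two's complement pattern (n even)."""
    digits = []
    for i in range(0, n, 2):
        # triple holds bits (b_{i+1}, b_i, b_{i-1}), with b_{-1} = 0
        triple = (b >> (i - 1)) & 0b111 if i else (b & 0b11) << 1
        d = (triple & 1) + ((triple >> 1) & 1) - 2 * ((triple >> 2) & 1)
        digits.append(d)                 # one digit in {-2,...,2} per 2 bits
    return digits

def booth_value(digits):                 # radix-4 digits, least significant first
    return sum(d * 4**i for i, d in enumerate(digits))

print(booth_digits(0b101101, 6))                 # [1, -1, -1]
print(booth_value(booth_digits(0b101101, 6)))    # -19, the two's complement value
```

With n/2 digits instead of n bits, only half as many partial products need to be summed, each of which is just A, 2A, 0 or their negations.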
64 Recursive Multiplication
- Split the operands in halves: A * B = A_H * B_H * 2^n + (A_H * B_L + A_L * B_H) * 2^{n/2} + A_L * B_L
- What does the implementation require?
- Is it better than previous designs?
- Do improvements by Karatsuba (1962) in asymptotic complexity help?
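Karatsuba's trick computes the middle term from one extra product of the half-sums, so three half-size multiplications replace four; a minimal sketch (names are mine):

```python
def karatsuba(a: int, b: int, n: int) -> int:
    """Multiply two n-bit numbers with 3 half-size recursive products."""
    if n <= 8:                        # base case: small operands directly
        return a * b
    k = n // 2
    a_hi, a_lo = a >> k, a & ((1 << k) - 1)
    b_hi, b_lo = b >> k, b & ((1 << k) - 1)
    hi = karatsuba(a_hi, b_hi, n - k)
    lo = karatsuba(a_lo, b_lo, k)
    # (a_hi + a_lo)(b_hi + b_lo) - hi - lo = a_hi*b_lo + a_lo*b_hi
    mid = karatsuba(a_hi + a_lo, b_hi + b_lo, k + 1) - hi - lo
    return (hi << (2 * k)) + (mid << k) + lo

print(karatsuba(12345, 6789, 16))   # 83810205
```

This gives O(n^log2(3)) ≈ O(n^1.585) bit operations asymptotically; whether the constant factors pay off at hardware-relevant word sizes is exactly the question the slide raises.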
65 Division
- Multiplication specification: C = A * B
- For division: inputs A, B; output Q
- Not always an (exact) solution!
- Consider Q and R so that A = Q * B + R
- with 0 <= R < B (R the remainder)
66 Division
- Two simple approaches: reduce to simpler operations
  - subtractions
  - multiplications
67 Subtractive Division
- Dividing using subtractions!
- Starting from the left or from the right?
- Considering ranges
68 SRT Division
- Consider normalized operands, i.e. b_{n-1} = 1
- Choose the largest k with B * 2^k <= A
- Recurrence step i (radix-2): R_{i+1} = 2 * R_i - q_{i+1} * B
- with quotient digits q_i in {-1, 0, 1}
- this implies A/B = Σ_i q_i * 2^{-i} plus a remainder term
69 SRT Division
- Recurrence
- Implementation
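A radix-2 SRT-style sketch of the recurrence (the digit selection below is a simplified threshold rule on the partial remainder, not the exact table from the slides):

```python
def srt_divide(a: float, b: float, steps: int) -> float:
    """Approximate a/b for 0 <= a <= b, 1/2 <= b < 1; one digit per step."""
    r, q = a, 0.0
    for i in range(1, steps + 1):
        r *= 2                      # shift: R_{i+1} = 2 R_i - q_{i+1} B
        if r >= 0.5:
            qd = 1
        elif r <= -0.5:
            qd = -1
        else:
            qd = 0                  # redundancy: digit 0 needs no subtraction
        r -= qd * b
        q += qd * 2.0 ** -i         # accumulate signed quotient digits
    return q

print(abs(srt_divide(0.30, 0.75, 30) - 0.4) < 1e-6)   # True
```

The key point of the redundant digit set {-1, 0, 1} is that the digit can be chosen from a few leading bits of the partial remainder, so no full-precision comparison is needed per step.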
70 Multiplicative Division
- Approximation of A/B
- Contemporary microprocessors implement multiplicative division with
  - Newton-Raphson's algorithm (e.g. Intel IA-64)
  - Goldschmidt's algorithm (e.g. AMD K7)
- Steps in multiplicative division:
  1. rough approximation of 1/B
  2. iterative improvement of the approximation accuracy of A/B or 1/B by multiplications, complementations and shifts
  3. (multiplication with A if 1/B was approximated in step 2)
71 Newton's Algorithm
- Newton-Raphson approximation of 1/B
- Initialization: x_0 ≈ 1/B with relative error d_0 = 1 - B * x_0
- k iterations: x_{i+1} = x_i * (2 - B * x_i)
- Scaling with A: Q = A * x_k
- Quadratic convergence of the one-sided relative approximation error: d_{i+1} = d_i^2
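The Newton-Raphson reciprocal iteration in a few lines (names and the crude seed x_0 = 1 are mine; real implementations use a table lookup for the seed):

```python
def newton_recip(b: float, x0: float, k: int) -> float:
    """Approximate 1/b via x_{i+1} = x_i * (2 - b * x_i)."""
    x = x0
    for _ in range(k):
        x = x * (2.0 - b * x)   # two *dependent* multiplications per step
    return x

b = 0.75                        # assume 1/2 <= b < 1 (normalized)
x = newton_recip(b, 1.0, 5)
print(x)                        # converges to 1.3333... = 1/b
a = 0.30
print(a * x)                    # final scaling with A yields A/B
```

The relative error squares each iteration (0.25 -> 0.0625 -> ... here), so the number of correct bits doubles per step; note that the two multiplications in each step are dependent, which matters for the scheduling comparison on slide 73.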
72 Goldschmidt's Algorithm
- Goldschmidt's approximation of A/B
- Initialization: N_0 = A * x_0, D_0 = B * x_0 with x_0 ≈ 1/B
- Iteration i: F_i = 2 - D_i, N_{i+1} = N_i * F_i, D_{i+1} = D_i * F_i
- Approximation of A/B by N_k after k iterations
- Computation of the D_i is like a Newton iteration with B = 1
  -> D_i converges quadratically to 1
- From the initialization N_i / D_i = A/B
  -> N_i converges quadratically to A/B
- 2 independent multiplications per iteration
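Goldschmidt's iteration in the same style (names and the crude seed are mine); the two multiplications per step share the factor F_i but are independent of each other, so they can be issued back to back on a pipelined multiplier:

```python
def goldschmidt(a: float, b: float, x0: float, k: int) -> float:
    """Approximate a/b: drive the denominator to 1, numerator to a/b."""
    n, d = a * x0, b * x0          # initialization: N_0, D_0
    for _ in range(k):
        f = 2.0 - d                # F_i = 2 - D_i (a complementation)
        n, d = n * f, d * f        # two independent multiplications
    return n                       # D_i -> 1, hence N_i -> a/b

print(goldschmidt(0.30, 0.75, 1.0, 5))   # ~0.4
```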
73 Multiplication Scheduling
- Newton-Raphson vs. Goldschmidt-Powers
- For both: 2k+1 multiplications in total
- but Newton: 2k+1 multiplications on the critical path
- Goldschmidt: k+1 multiplications on the critical path
- (figure: multiplication schedules for operands A and B)
74 Quadratic Convergence
- for exact computation:
- Newton-Raphson: d_{i+1} = d_i^2
- Goldschmidt-Powers: 1 - D_{i+1} = (1 - D_i)^2
75 Precision Problems for exact computations
- Example with initial bit-width a = 8, p = 64
- Iteration i: Goldschmidt-Powers (2 mults with)
  - 0: a x p = 8 x 64 bits
  - 1: (a+p) x (a+p) = 72 x 72 bits
  - 2: 2(a+p) x 2(a+p) = 144 x 144 bits
  - 3: 4(a+p) x 4(a+p) = 288 x 288 bits
- -> rounding of intermediate values is required
76 Problems and State of the Art
- Newton-Raphson is self-correcting,
  - i.e. it converges to 1/B even with rounded intermediate results
  - the correction factor moves any rounded intermediate approximation in the direction of 1/B
  - rounding can even be chosen to maintain quadratic convergence, e.g. Cook
- Goldschmidt-Powers is not self-correcting,
  - i.e. convergence to A/B is not guaranteed with rounded intermediate results, because the invariant N_i / D_i = A/B no longer holds
  - quadratic convergence cannot be achieved with rounded intermediate results
  - error analysis is more complicated than for Newton-Raphson
77 Problems and State of the Art
- R. Goldschmidt (1964)
  - presentation of the algorithm for exact computations
  - implementation (IBM) with rough error analysis (absolute errors)
- E. Krishnamurthy (1970)
  - Goldschmidt's algorithm is NOT self-correcting
- O. Spaniol, in his book Computer Arithmetic (1982)
  - claims that Goldschmidt's algorithm is self-correcting
- R. Golliver, Intel IA-64 (1999)
  - Intel implements Newton-Raphson for simpler error analysis (and for a smaller multiplier)
- S. Oberman, AMD K7 (1999)
  - AMD uses a 76x76 multiplier for Goldschmidt division (68 bit), because a mechanically checked correctness proof exists
  - consideration of absolute errors
78 Problems and State of the Art
- Most work only considers variations of Newton-Raphson, for
  - simpler error analysis
  - quadratic convergence
  - no interest in constant factors
- Practical implementations:
  - constant-factor acceleration through Goldschmidt's algorithm is interesting
  - previous error analyses are rough and limited to special cases
  - a general, precise error analysis is important for cost, power and delay optimizations in practical implementations