CSE 8351 Computer Arithmetic, Fall 2005. Instructor: Peter-Michael Seidel - PowerPoint PPT Presentation

1
CSE 8351 Computer Arithmetic
Fall 2005, Instructor: Peter-Michael Seidel

2
II. Simple Algorithms for Arithmetic Unit Design
in Hardware
  • Addition / Multiplication / SRT Division /
    Square Root / Reciprocal Approximation

3
Arithmetic Unit Design in Hardware
  • Input Interface: n-bit operands A, B
  • Output Interface: n-bit result C
  • Arithmetic Function f : B^n x B^n → B^n
  • What is different from other (non-arithmetic)
    hardware units?

4
Arithmetic Unit Design in Hardware
  • Specification
  • Truth tables not feasible (2^(2n) x n entries)
  • for n = 64 more than a googol
  • Complexity
  • there are 2^(2^(2n) x n) different functions
  • for n = 64 more than a googolplex
  • Only a handful interesting/used out of a
    googolplex !!!
  • Other forms of specification possible

5
Arithmetic functions supported
  • Functions are interesting because of specific
    properties that arise at operand level
  • → use mathematical formalism to specify
    functionality, e.g.
  • For defined values <A>, <B>, <C> in N:
  • <C> = f(<A>, <B>) = <A> + <B>
  • → does not directly help in implementing or
    testing
  • → use tools from the (well-established) science
    of mathematics to transform equations and extract
    local properties
  • → helps exploit limited global influence, local
    computation, reuse, recurrence

6
Notations, Representations, Values
  • Bit strings: sequences of bits
    (concatenation also written (..,..,..))
  • a = 0011010 = (001,10,10)
  • For a bit x and a natural number n, x^n denotes
    the string consisting of n copies of x
  • Bits of strings are indexed from
    right (0) to left (n-1)

7
Binary representation
  • Natural number with binary representation:
    <a_{n-1:0}> = sum_{i=0..n-1} a_i 2^i
  • Range of numbers which have a binary
    representation of length n: {0, ..., 2^n - 1}
  • n-bit binary representation bin_n(x) of a natural
    number x in this range, with <bin_n(x)> = x

8
Two's complement representation
  • Integer with two's complement representation:
    [a_{n-1:0}] = -a_{n-1} 2^(n-1) + <a_{n-2:0}>
  • Range of numbers with two's complement
    representation of length n:
    {-2^(n-1), ..., 2^(n-1) - 1}
  • n-bit two's complement representation twoc_n(x)
    of a number x in this range, with [twoc_n(x)] = x

9
Binary Addition
  • Binary Addition (Specification)
  • Coping with Complexity
  • Simple for n = 1

10
Addition (n1)
  • Half adder: adds two bits; sum represented by
    two bits (c, s)
  • obvious equations: s = a XOR b, c = a AND b
  • Full adder: adds three bits
  • obvious equations: s = a XOR b XOR c_in,
    c_out = (a AND b) OR (a AND c_in) OR (b AND c_in)
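The "obvious equations" can be written out as a small behavioral sketch (Python stands in for the gate-level description; the function names are illustrative):

```python
# Behavioral sketch of the two n = 1 building blocks.
def half_adder(a, b):
    """Add two bits; return (sum, carry)."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Add three bits using two half adders; return (sum, carry_out)."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2       # at most one of c1, c2 can be 1
```

For every input combination, sum + 2·carry equals the number of 1-inputs.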

11
Addition (n1)
  • Half adder / full adder implementations

12
Binary Addition
  • Greedy approach (right to left) → Ripple
    Carry Adder
  • Development/verification based on equivalence
    transforms of the specification
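A minimal behavioral sketch of the ripple-carry scheme, assuming LSB-first bit lists (names are illustrative):

```python
def ripple_carry_add(a_bits, b_bits, cin=0):
    """n-bit ripple-carry adder: the carry ripples from position 0 to n-1.
    Bit lists are LSB-first; returns (sum_bits, carry_out)."""
    s_bits, c = [], cin
    for a, b in zip(a_bits, b_bits):
        s_bits.append(a ^ b ^ c)              # full-adder sum
        c = (a & b) | (a & c) | (b & c)       # full-adder carry (majority)
    return s_bits, c
```

The loop-carried variable `c` is exactly the carry chain that makes the delay linear in n.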

13
Ripple Carry Adder

14
Basic properties (1)
  • For
  • leading zeros do not change the value of a binary
    representation
  • binary representations can be split: for each k,
    <a_{n-1:0}> = <a_{n-1:k}> 2^k + <a_{k-1:0}>
  • two's complement representations have a sign bit
    a_{n-1}
  • construct a two's complement representation from a
    binary representation by prepending a 0 sign bit;
    note that the two's complement representation is
    longer by one bit

15
Basic properties (2)
  • For
  • sign extension does not change the value:
    [a_{n-1} a_{n-1:0}] = [a_{n-1:0}]
  • negation of a number in two's complement
    representation: -[a] = [NOT(a)] + 1, the basis
    for the subtraction algorithm!
  • congruences modulo 2^n:
    <a_{n-1:0}> ≡ [a_{n-1:0}] mod 2^n
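The negation rule (invert all bits, add 1) can be checked with a short sketch (Python model; the helper name is an assumption of this sketch):

```python
def twos_complement_negate(x, n):
    """Negate an n-bit two's complement pattern: invert all bits and add 1
    (mod 2^n) -- the basis for implementing subtraction as A + (-B)."""
    mask = (1 << n) - 1
    return (~x + 1) & mask    # Python's ~ works because ints are unbounded
```

Applying the rule twice returns the original pattern, as expected of negation.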

16
Basic properties (3)
  • Two's complement addition based on binary
    addition
  • For
  • the result of the n-bit binary addition
  • is useful for n-bit two's complement addition
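The mod-2^n congruence is what lets one adder serve both number systems; a behavioral sketch, assuming Python's unbounded integers hold the bit patterns:

```python
def twos_complement_add(a, b, n):
    """n-bit two's complement addition via the n-bit binary adder:
    add the bit patterns, drop the carry-out, reinterpret the sign bit."""
    mask = (1 << n) - 1
    s = ((a & mask) + (b & mask)) & mask          # binary addition mod 2^n
    return s - (1 << n) if s >> (n - 1) else s    # decode sign bit a_{n-1}
```

The result is correct whenever the true sum lies in the n-bit two's complement range.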

17
Ripple Carry Adder

25
Binary Addition
  • Complexity: Delay, Cost, Power
  • Lower Bounds? What computational model?
  • What assumptions?

26
Faster Addition
  • Challenge (KP95):
  • numbers given as stacks of digits
  • it takes 1 second to add two digits and put the
    result digit on the result stack
  • one person can add two 5000-digit numbers in 5000
    seconds ?! How?
  • Can two people add two 5000-digit numbers in
    less than an hour?
  • Observation (notion of carries):
  • limited carry propagation
  • → pre-computing upper sums for both cases
    c_k = 1 and c_k = 0
  • Divide and conquer; but the ripple-carry
    approach is also divide and conquer

27
Conditional Sum Adder
  • Main observation:
  • limited carry propagation
  • → pre-computing upper sums for both cases
    c_k = 1 and c_k = 0
  • Assume n is a power of 2
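The pre-computation principle can be sketched recursively (a behavioral model with LSB-first bit lists; names are illustrative, not the slides' hardware notation):

```python
def cond_sum_add(a, b, cin):
    """Conditional-sum adder sketch (bit lists LSB-first, n a power of 2).
    The upper half is pre-computed for both carry cases; the lower half's
    carry-out selects the right one.  Returns (sum_bits, carry_out)."""
    n = len(a)
    if n == 1:                                   # base case: CSA(1) = FA
        s = a[0] ^ b[0] ^ cin
        c = (a[0] & b[0]) | (a[0] & cin) | (b[0] & cin)
        return [s], c
    k = n // 2
    lo_s, lo_c = cond_sum_add(a[:k], b[:k], cin)
    hi0, c0 = cond_sum_add(a[k:], b[k:], 0)      # upper sum assuming carry 0
    hi1, c1 = cond_sum_add(a[k:], b[k:], 1)      # upper sum assuming carry 1
    return lo_s + (hi1 if lo_c else hi0), (c1 if lo_c else c0)
```

In hardware the two upper recursions run in parallel and the selection is a multiplexer, giving logarithmic depth.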

28
Conditional Sum Adder
  • Main principle: pre-computing upper sums for the
    cases c_k = 1 and c_k = 0
  • Assume n is a power of 2

35
Conditional Sum Adder
  • full adder implements adder for n = 1:
    CSA(1) = FA !!!!

46
Binary Multiplication
  • Binary Multiplication (Specification)
  • Remember Binary Addition (Specification)

representable with 2n bits !!
47
Implementation to cope with Complexity
  • Strategies that worked for binary addition:
  • - consideration of small n
  • - property extraction from specification
  • - greedy approach
  • - divide & conquer
  • Strategies for binary multiplication:
  • - consideration of small n
  • - reduction approach
  • - divide & conquer
  • - reduction to binary addition
  • - rewriting of specification
  • - considering logarithms (European
    logarithmic processor)

48
Consideration of small n
  • Binary multiplication is even simpler than
    addition for n = 1: c = a AND b
  • This also works for n x 1-bit multiplication

Consider <b_{n-1:0}> · 2^n ?
49
Reduction Approach
  • Reduction n → n-1

(n-1)-bit multiplication
1-bit AND addition (carry-in)
(n-1)-bit AND additions
Implementation, Complexity ?
50
Multiplication: Reduction in Sums

Definition Partial Products
(simple to compute in binary)
(not affected by remaining sum)
51
Binary Multiplication
  • Implementation similar to the grade-school
    algorithm:

        0010   (multiplicand)
      x 1011   (multiplier)
        ----
        0010
       0010
      0000
     0010
    --------
    00010110

  • Negative numbers: convert and multiply
  • better technique using Booth Recoding
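The grade-school scheme amounts to accumulating shifted copies of the multiplicand; a minimal sketch (unsigned operands, illustrative names):

```python
def shift_add_multiply(a, b, n):
    """Grade-school binary multiplication: accumulate A * 2**i whenever
    bit i of B is set.  The product of two n-bit numbers fits in 2n bits."""
    product = 0
    for i in range(n):
        if (b >> i) & 1:
            product += a << i      # partial product A * 2**i
    return product
```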

52
Multiplier Implementation
  • Stage i accumulates A · 2^i if B_i = 1
  • What are the boxes? How much hardware for an
    n-bit multiplier?

53
Multiplication Complexity
  • So far:
  • Delay(n) = D_AND + n · D_FA = O(n)
  • Cost(n) = n^2 · (C_AND + C_FA) = O(n^2)
  • Inherent problem:
  • adding n partial products (n-bit numbers)
  • Addition alone can be done in delay O(log(n))

54
Parallel Multiplication (PP adder tree)
  • Partial Product Generations and Additions can be
    done in parallel

[Figure: operands A and B feed a row of PPG (partial product
generation) blocks, whose outputs are summed by a tree of binary
adders to produce product C. Fanout? Precisions? Delay? Cost?]
55
Redundant Adder Tree
  • Redundant Addition (Carry-Save Adders):
    compression of 3 binary operands to 2
    by the use of a line of full adders
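The 3-to-2 compression performed by a line of full adders can be sketched bitwise-parallel over whole words:

```python
def carry_save_compress(x, y, z):
    """3:2 compression: a row of full adders reduces three operands to a
    sum word and a carry word -- no carry propagates between positions,
    so the delay is one full-adder delay regardless of width."""
    s = x ^ y ^ z                              # per-position sum bits
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-position carries, shifted
    return s, c
```

The invariant is x + y + z = s + c; a final (non-redundant) binary adder resolves s + c once, at the root of the tree.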
56
Redundant Adder Tree
  • Redundant Addition of 3 partial products to 2

57
Redundant Adder Tree
  • Redundant Addition of 4 partial products to 2

58
Redundant Adder Tree
  • Redundant Addition of 4 internal partial products
    to 2

59
Redundant Adder Tree
  • Tree structure of redundant compressors

Cost? Delay? See Wallace tree designs
60
(Modified) Booth Recoding
  • Operand recoding to:
  • allow for signed multiplication
  • reduce the number of partial products
  • Popular recoding choice based on
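A sketch of radix-4 (modified) Booth digit extraction; the digit set {-2, -1, 0, 1, 2} and the overlapping 3-bit groups are the standard choice, while the function name is an assumption of this sketch:

```python
def booth_digits(b, n):
    """Radix-4 modified Booth recoding of an n-bit (n even) two's complement
    operand: digits d_j in {-2,-1,0,1,2} with value(b) = sum d_j * 4**j.
    Halves the number of partial products and handles signed operands."""
    digits, prev = [], 0                        # b_{-1} = 0
    for i in range(0, n, 2):
        b_i = (b >> i) & 1
        b_i1 = (b >> (i + 1)) & 1
        digits.append(-2 * b_i1 + b_i + prev)   # -2*b_{i+1} + b_i + b_{i-1}
        prev = b_i1
    return digits
```

Each digit selects a trivially-generated partial product (0, ±B, ±2B), so an n-bit multiplier needs only n/2 of them.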

61
(Modified) Booth Recoding

62
(Modified) Booth Recoding

63
(Modified) Booth Recoding
  • Implementation of Recoding

64
Recursive Multiplication
  • What does the implementation require?
  • Is it better than previous designs?
  • Do improvements by Karatsuba (1962) in asymptotic
    complexity help?
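Karatsuba's idea, three half-size multiplications instead of four, can be sketched recursively (a software model on unbounded integers, not a hardware design):

```python
def karatsuba(a, b, n):
    """Karatsuba recursive multiplication: 3 half-size multiplications
    instead of 4, giving O(n**log2(3)) bit operations; n a power of 2,
    operands nonnegative and below 2**n."""
    if n <= 1:
        return a * b                              # base case
    h = n // 2
    a_hi, a_lo = a >> h, a & ((1 << h) - 1)       # split a = a_hi*2^h + a_lo
    b_hi, b_lo = b >> h, b & ((1 << h) - 1)
    p_hi = karatsuba(a_hi, b_hi, h)
    p_lo = karatsuba(a_lo, b_lo, h)
    # (a_hi+a_lo)(b_hi+b_lo) - p_hi - p_lo = a_hi*b_lo + a_lo*b_hi
    p_mid = karatsuba(a_hi + a_lo, b_hi + b_lo, h) - p_hi - p_lo
    return (p_hi << n) + (p_mid << h) + p_lo
```

The recombination uses only shifts and additions, so the multiplication count drops from 4 to 3 per level.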

65
Division
  • Multiplication specification
  • For division: inputs, output
  • Not always a solution!
  • Consider
  • so that

remainder
66
Division
  • Two simple approaches
  • Reduce to simpler operations
  • Subtractions
  • Multiplications

67
Subtractive Division
  • Dividing using subtractions!
  • Starting from the left or from the right?
  • Considering ranges

68
SRT division
  • Consider
  • Choose largest k with
  • bn-1k1 ?
  • Recurrence i (radix-2)
  • with q_i in {-1, 0, 1}
  • implies
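A behavioral sketch of the radix-2 recurrence with quotient digits in {-1, 0, 1}; floating point stands in for the fixed-point datapath, and the operand ranges (0.5 <= B < 1, |A| < B) are assumptions of this sketch:

```python
def srt_divide(a, b, steps):
    """Radix-2 SRT-style division sketch: quotient digit q_i in {-1, 0, 1}
    is chosen from a rough look at the shifted partial remainder, which is
    why the digit 0 (no subtraction) is allowed at all."""
    q, r = 0.0, a
    for i in range(1, steps + 1):
        r *= 2                    # shift partial remainder
        if r >= 0.5:
            d = 1
        elif r <= -0.5:
            d = -1
        else:
            d = 0                 # selection needs only a low-precision test
        r -= d * b                # recurrence r_{i+1} = 2 r_i - q_{i+1} B
        q += d * 2.0 ** -i
    return q
```

The selection thresholds ±1/2 keep |r| <= B at every step, so q converges to A/B at one bit per iteration.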

69
SRT Division
  • Recurrence
  • Implementation

70
Multiplicative Division
  • Approximation of A/B
  • Contemporary microprocessors implement
    multiplicative division with
  • Newton-Raphson's Algorithm (e.g. INTEL IA-64)
  • Goldschmidt's Algorithm (e.g. AMD K7)
  • Steps in multiplicative division:
  • 1. Rough approximation of 1/B
  • 2. Iterative improvement of the approximation
    accuracy of A/B or 1/B by
  • multiplications,
  • complementations,
  • shifts
  • ( 3. Multiplication with A if 1/B was approximated
    in Step (2.) )

71
Newton's Algorithm
  • Newton-Raphson approximation of 1/B
  • Initialization x_0
  • with relative error
  • k iterations: x_{i+1} = x_i (2 - B x_i)
  • Scaling with A
  • quadratic convergence of the one-sided relative
    approximation error
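The Newton-Raphson reciprocal iteration x_{i+1} = x_i (2 - B x_i) can be sketched as follows (floats stand in for the datapath; in hardware x0 would come from a lookup table):

```python
def newton_reciprocal(b, x0, k):
    """Newton-Raphson iteration for 1/B: x_{i+1} = x_i * (2 - B * x_i).
    The relative error squares each step (quadratic convergence), so the
    number of correct bits doubles per iteration."""
    x = x0
    for _ in range(k):
        x = x * (2 - b * x)      # two dependent multiplications per step
    return x
```

A/B is then obtained by one final scaling multiplication, A * newton_reciprocal(B, x0, k).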

72
Goldschmidt's Algorithm
  • Goldschmidt's approximation of A/B
  • Initialization
  • Iteration i for
  • Approximation of A/B after k iterations
  • Computation of the denominator sequence is like a
    Newton iteration with B = 1
  • → converges quadratically to 1
  • From the initialization
  • → converges quadratically to A/B

2 independent multiplications per iteration
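A behavioral sketch of Goldschmidt's iteration (floats stand in for the datapath; the scaling of B into [0.5, 1) is assumed done beforehand):

```python
def goldschmidt_divide(a, b, k):
    """Goldschmidt's iteration for A/B: multiply numerator and denominator
    by r_i = 2 - d_i each step, driving the denominator toward 1.
    The two multiplications per step are independent (can run in parallel)."""
    n, d = a, b                # assumes b already scaled to be near 1
    for _ in range(k):
        r = 2 - d              # complementation (cheap in two's complement)
        n, d = n * r, d * r    # two independent multiplications
    return n                   # d -> 1 quadratically, so n -> A/B
```

The independence of the two products per step is exactly why only k+1 multiplications sit on the critical path, versus 2k+1 for Newton-Raphson.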
73
Multiplication Scheduling
  • Newton-Raphson vs. Goldschmidt-Powers
  • For both: 2k+1 multiplications in total
  • but Newton: 2k+1 multiplications
    on the critical path
  • Goldschmidt: k+1 multiplications
    on the critical path

74
Quadratic Convergence
  • for exact computation
  • Newton-Raphson
  • Goldschmidt-Powers

75
Precision problems for exact computations
  • Example with a = bit-width( ) = 8, p = 64
  • Iteration i: Goldschmidt-Powers (2 mults with)
  • 0: a x p = 8 x 64 bits
  • 1: (a+p) x (a+p) = 72 x 72 bits
  • 2: 2(a+p) x 2(a+p) = 144 x 144 bits
  • 3: 4(a+p) x 4(a+p) = 288 x 288 bits
  • → rounding of intermediate values required

76
Problems and State of the Art
  • Newton-Raphson is self-correcting,
  • i.e. it converges to 1/B even with rounded
    intermediate results
  • The correction factor moves any rounded
    intermediate approximation in the direction of
    1/B.
  • rounding can even be chosen to maintain quadratic
    convergence, e.g. Cook
  • Goldschmidt-Powers is not self-correcting,
  • i.e. convergence to A/B is not guaranteed with
    rounded intermediate results, because the
    following does not hold anymore
  • quadratic convergence cannot be achieved with
    rounded intermediate results
  • Error analysis is more complicated than for
    Newton-Raphson

77
Problems and State of the Art
  • R. Goldschmidt (1964)
  • Presentation of the algorithm for exact
    computations
  • Implementation (IBM) with rough error analysis
    (absolute errors)
  • E. Krishnamurthy (1970)
  • Goldschmidt's Algorithm is NOT self-correcting
  • O. Spaniol in his book Computer Arithmetic
    (1982)
  • claims that Goldschmidt's Algorithm is
    self-correcting
  • R. Golliver, INTEL IA-64 (1999)
  • INTEL implements Newton-Raphson for simpler error
    analysis (and for a smaller multiplier)
  • S. Oberman, AMD K7 (1999)
  • AMD uses a 76x76 multiplier for Goldschmidt
    division (68 bit), because a mechanically checked
    correctness proof exists
  • Consideration of absolute errors

78
Problems and State of the Art
  • Most work only considers variations of
    Newton-Raphson, for:
  • simpler error analysis
  • quadratic convergence
  • no interest in constant factors
  • Practical implementations:
  • constant-factor acceleration through Goldschmidt's
    Algorithm is interesting
  • Previous error analyses are rough and limited to
    special cases
  • A general, precise error analysis is important for
    cost, power and delay optimizations in
    practical implementations