More ALUs and floating point numbers


Provided by: tar115

Learn more at: https://cseweb.ucsd.edu

1
More ALUs and floating point numbers

Today: the rest of Chapter 4
Multiplication, division, and floating point numbers

2
The Story so far

Instruction Set Architectures
Performance issues
2's complement, addition, subtraction

Basically ISA and some ALU stuff
3
CPU: The big picture
[Figure: the instruction cycle, with stages labeled Fetch, Decode, Fetch, Execute, Store, Next]
Executing an entire instruction means going through all of these steps.
Design hardware for each of these steps!
4
CPU Clocking
[Figure: clock waveform (Clk) with Setup and Hold windows around each clock edge; the signal is "don't care" elsewhere]

All storage elements are clocked by the same
clock edge

5
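CPU Big Picture: Control and Data Path
[Figure: Instruction<31:0> from the instruction memory is split into the fields Op, Fun, Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15> and Adr. Control decodes Op and Fun and drives the DATA PATH with the signals nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr and MemtoReg; the data path returns the condition Equal.]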
6
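CPU: The abstract version
[Figure: Control sends Control Signals to the Datapath and receives Conditions from it. An Ideal Instruction Memory takes the Instruction Address and returns the Instruction, whose Rd, Rs and Rt fields (5 bits each) select ports Rw, Ra and Rb of a file of 32 32-bit registers with 32-bit outputs A and B. An Ideal Data Memory has Data Address, Data In and Data Out ports (32 bits each). Next Address logic and all storage elements are clocked by Clk.]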

Logical vs. Physical Structure

7
Computer Performance
Multiplication and Division
8
The 32-bit ALU: limited edition

Bit-slice plus extra logic on the two ends
Overflow means the number is too large for the representation
Carry-lookahead and other adder tricks

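[Figure: 32-bit ALU built from bit slices ALU0 through ALU31. Inputs A and B (32 bits each, bits a0/b0 through a31/b31) and a 4-bit mode input M; combinational logic (C/L) produces select, comp and c-in; each slice has cin and co; outputs are S (32 bits, s0 through s31) and Ovflw, where overflow = signed-arith and (cin xor co) of the top slice.]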
9
The Design Process

Divide and Conquer (e.g., ALU)
Formulate a solution in terms of simpler components.
Design each of the components (subproblems).
Generate and Test (e.g., ALU)
Given a collection of building blocks, look for ways of putting them together that meet the requirements.
Successive Refinement (e.g., multiplier, divider)
Solve "most" of the problem (i.e., ignore some constraints or special cases), then examine and correct shortcomings.
Formulate High-Level Alternatives (e.g., shifter)
Articulate many strategies to "keep in mind" while pursuing any one approach.
Work on the Things you Know How to Do
The unknown will become obvious as you make progress.

Optimization Criteria
Delay: logic levels, fan in/out
Area: gate count, package count, pin out
Cost, power, design time

10
The 32-bit ALU: limited edition

Supported operations: 000 = and, 001 = or, 010 = add, 110 = subtract, 111 = slt

Tuned performance by using carry-lookahead adders.

What about other instructions?

multiply: mult $2,$3 gives Hi, Lo = $2 x $3 (64-bit signed product)
multiply unsigned: multu $2,$3 gives Hi, Lo = $2 x $3 (64-bit unsigned product)
divide: div $2,$3 gives Lo = $2 / $3 (quotient), Hi = $2 mod $3 (remainder)
divide unsigned: divu $2,$3 gives Lo = $2 / $3, Hi = $2 mod $3 (unsigned quotient and remainder)

11
Grade school

Paper and pencil example:
  Multiplicand      1000
  Multiplier      x 1001
                    1000
                   0000
                  0000
                 1000
  Product        1001000
m bits x n bits = m+n bit product
Binary makes it easy:
0 => place 0 (0 x multiplicand)
1 => place multiplicand (1 x multiplicand)
We'll look at a couple of versions of multiplication hardware.
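
The paper-and-pencil method can be sketched in a few lines of Python. This is only an illustration of the idea on the slide; the function name `partial_products` is made up for this example.

```python
def partial_products(multiplicand, multiplier):
    """Return the shifted partial products, one per multiplier bit (LSB first)."""
    rows = []
    shift = 0
    while multiplier:
        bit = multiplier & 1
        # A 1 bit places the multiplicand, a 0 bit places 0, shifted into position.
        rows.append((multiplicand if bit else 0) << shift)
        multiplier >>= 1
        shift += 1
    return rows

rows = partial_products(0b1000, 0b1001)
print(format(sum(rows), "b"))  # 1001000
```

Summing the rows gives the product, just as adding the columns does on paper.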

12
Unsigned basic multiplier

Stage $i$ accumulates $A \cdot 2^i$ if $B_i = 1$

13
Unsigned basic multiplier
[Figure: array multiplier with multiplier bits B0 through B3 selecting rows of adders (unused inputs tied to 0) and product bits P0 through P7]

at each stage shift A left ( x 2)
use next bit of B to determine whether to add in
shifted multiplicand
accumulate 2n bit partial product at each stage

14
Unsigned basic multiplier
The algorithm
for (i = 0; i < 32; i++) {
    if (multiplier[0] == 1)       // we could test multiplier[i] and skip the shift
        product += multiplicand;  // product is a 64-bit register; adder is 64 bit
    multiplicand <<= 1;           // shift multiplicand to prepare for the next add;
                                  // multiplicand is in a 64-bit register
    multiplier >>= 1;             // position the ith bit on the lsb for the test
}
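
A small Python simulation of this first version may help when checking traces by hand. It is a sketch under the slide's assumptions (2n-bit multiplicand and product registers, unsigned operands); the name `multiply_v1` and the `n` parameter are illustrative only.

```python
def multiply_v1(multiplicand, multiplier, n=32):
    """Unsigned shift-and-add multiply, first hardware version."""
    low_mask = (1 << n) - 1
    wide_mask = (1 << (2 * n)) - 1       # product and multiplicand are 2n bits wide
    mcand = multiplicand & low_mask
    mplier = multiplier & low_mask
    product = 0
    for _ in range(n):
        if mplier & 1:                   # test the least significant multiplier bit
            product = (product + mcand) & wide_mask
        mcand = (mcand << 1) & wide_mask # shift multiplicand left
        mplier >>= 1                     # shift multiplier right
    return product

print(multiply_v1(2, 3, n=4))  # 6
```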
15
Unsigned basic multiplier

64-bit Multiplicand reg, 64-bit ALU, 64-bit
Product reg, 32-bit multiplier reg

Product    Multiplier  Multiplicand
0000 0000  0011        0000 0010
0000 0010  0001        0000 0100
0000 0110  0000        0000 1000
0000 0110

Multiplier datapath control
16
Some observations

Speed?
Power/efficiency of the adder?
Pattern of result in the product register?

1 clock per cycle => about 100 clocks per multiply
Ratio of multiply to add: 5:1 to 100:1
1/2 the bits in the multiplicand are always 0 => the 64-bit adder is wasted
0s are inserted on the left of the multiplicand as it is shifted => the least significant bits of the product never change once formed
Instead of shifting the multiplicand to the left, shift the product to the right?

17
Multiplier 2.0

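32-bit Multiplicand reg, 32-bit ALU, 64-bit Product reg, 32-bit Multiplier reg

[Figure: Multiplier 2.0 datapath. The 32-bit Multiplicand register feeds a 32-bit ALU whose result is written into the left half of the 64-bit Product register (Shift Right, Write); the 32-bit Multiplier register shifts right; Control drives the shifts and the write.]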
18
Multiplier 2.0
for (i = 0; i < 32; i++) {
    if (multiplier[0] == 1)
        product[63:32] += multiplicand;  // product is a 64-bit register; adder is 32 bit
    product >>= 1;     // shift product right, saving product[i:0] for the final result
    multiplier >>= 1;  // position the ith bit on the lsb for the test
}
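
As a hedged sketch, the second version can be simulated in Python as follows. The function name `multiply_v2` is hypothetical; Python's unbounded integers stand in for the adder's carry-out, which re-enters the product on the right shift.

```python
def multiply_v2(multiplicand, multiplier, n=32):
    """Unsigned multiply, second version: n-bit adder, product shifts right."""
    low_mask = (1 << n) - 1
    mcand = multiplicand & low_mask
    mplier = multiplier & low_mask
    product = 0                      # 2n-bit product register
    for _ in range(n):
        if mplier & 1:
            product += mcand << n    # add into the left half (carry may reach bit 2n)
        product >>= 1                # shift product right; the carry shifts back in
        mplier >>= 1                 # expose the next multiplier bit
    return product

print(multiply_v2(2, 3, n=4))  # 6
```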
19
Multiplier 2.0

Product    Multiplier  Multiplicand  Next Product
0000 0000  0011        0010          0000 + 0010 => 0010 0000
0001 0000  0001        0010          0001 + 0010 => 0011 0000
0001 1000  0000        0010          0001 + 0000 => 0001 1000
0000 1100  0000        0010          0000 + 0000 => 0000 1100
0000 0110

20
Multiplier 3.0

The Product register wastes space that exactly matches the size of the multiplier => combine the Multiplier register and the Product register

21
Multiplier 3.0
for (i = 0; i < 32; i++) {
    if (product[0] == 1)                 // multiplier lives in the right half of product
        product[63:32] += multiplicand;  // product is a 64-bit register; adder is 32 bit
    product >>= 1;  // shift product right, saving product[i:0] for the final result
}
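
A Python sketch of the third version, with the multiplier initially stored in the right half of the product register. The name `multiply_v3` and the returned `(hi, lo)` pair are assumptions made for this illustration; the pair corresponds to the left and right halves of the product register.

```python
def multiply_v3(multiplicand, multiplier, n=32):
    """Unsigned multiply, third version: multiplier shares the product register."""
    low_mask = (1 << n) - 1
    mcand = multiplicand & low_mask
    product = multiplier & low_mask   # multiplier starts in the right half
    for _ in range(n):
        if product & 1:               # lsb of product is the current multiplier bit
            product += mcand << n     # add multiplicand into the left half
        product >>= 1                 # one shift moves both product and multiplier
    hi, lo = product >> n, product & low_mask
    return hi, lo

hi, lo = multiply_v3(2, 3, n=4)
print(hi, lo)  # 0 6
```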
22
More observations ?

2 steps per bit because Multiplier and Product are combined
MIPS registers Hi and Lo are the left and right halves of Product
Gives us the MIPS instruction multu
How can you make it faster?
What about signed multiplication?
The easiest solution is to make both operands positive and remember whether to complement the product when done (leave out the sign bit, run for 31 steps)
Or apply the definition of 2's complement: need to sign-extend partial products and subtract at the end
Booth's Algorithm is an elegant way to multiply signed numbers using the same hardware as before, and it can save cycles
It can handle multiple bits at a time

23
Booth's algorithm

Example: 2 x 6 = 0010 x 0110
        0010
   x    0110
   +    0000   shift (0 in multiplier)
   +   0010    add (1 in multiplier)
   +  0010     add (1 in multiplier)
   + 0000      shift (0 in multiplier)
    00001100
An ALU with add or subtract gets the same result in more than one way:
6 = -2 + 8
0110 = -00010 + 01000 = 11110 + 01000
For example:
        0010
   x    0110
        0000   shift (0 in multiplier)
   -   0010    sub (first 1 in multiplier)
      0000     shift (middle of string of 1s)
   + 0010      add (prior step had last 1)
    00001100

24
Booth's algorithm

Current Bit  Bit to the Right  Explanation          Example     Op
1            0                 Begins run of 1s     0001111000  sub
1            1                 Middle of run of 1s  0001111000  none
0            1                 End of run of 1s     0001111000  add
0            0                 Middle of run of 0s  0001111000  none
Originally for speed (when shift was faster than add)
Replace a string of 1s in the multiplier with an initial subtract when we first see a one, and then later an add for the bit after the last one

-1 + 10000 = 01111
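
The table can be read as a recoding of the multiplier into digits -1, 0 and +1: each digit is (bit to the right) minus (current bit). The sketch below is one way to express that; `booth_recode` and `recoded_value` are illustrative names, not part of the slides.

```python
def booth_recode(bits):
    """Recode a binary string (MSB first) into Booth digits in {-1, 0, +1}."""
    padded = bits + "0"   # implicit 0 to the right of the least significant bit
    # digit = (bit to the right) - (current bit): 10 -> -1 (sub), 01 -> +1 (add)
    return [int(padded[i + 1]) - int(padded[i]) for i in range(len(bits))]

def recoded_value(digits):
    """Evaluate Booth digits (MSB first) as a signed integer."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value

print(booth_recode("0110"))                 # [1, 0, -1, 0], i.e. 8 - 2
print(recoded_value(booth_recode("0110")))  # 6
```

Note that a leading 1 is treated as the start of a run that never ends, so the recoded value is the two's-complement interpretation of the bit string.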
25
Booth's algorithm
Booth's Example (2 x 7)
Operation          Multiplicand  Product        next?
0. initial value   0010          0000 0111 0    10 -> sub
1a. P = P - m      1110          + 1110
                                 1110 0111 0    shift P (sign ext)
1b.                0010          1111 0011 1    11 -> nop, shift
2.                 0010          1111 1001 1    11 -> nop, shift
3.                 0010          1111 1100 1    01 -> add
4a.                0010          + 0010
                                 0001 1100 1    shift
4b.                0010          0000 1110 0    done
26
Booth's algorithm
Booth's Example (2 x -3)
Operation          Multiplicand  Product        next?
0. initial value   0010          0000 1101 0    10 -> sub
1a. P = P - m      1110          + 1110
                                 1110 1101 0    shift P (sign ext)
1b.                0010          1111 0110 1    01 -> add
                                 + 0010
2a.                              0001 0110 1    shift P
2b.                0010          0000 1011 0    10 -> sub
                                 + 1110
3a.                0010          1110 1011 0    shift
3b.                0010          1111 0101 1    11 -> nop
4a.                              1111 0101 1    shift
4b.                0010          1111 1010 1    done
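
The two traces can be reproduced with a short Python sketch of Booth's algorithm. It is an illustration only: the names are made up, the product register is modeled as three pieces (`upper`, `lower`, and the extra bit `prev`), and it assumes the multiplicand is not the most negative n-bit value, whose negation does not fit in n bits.

```python
def booth_multiply(m, r, n=4):
    """Multiply n-bit two's-complement m (multiplicand) by r (multiplier)."""
    mask = (1 << n) - 1
    upper = 0            # left half of the product register
    lower = r & mask     # right half starts out holding the multiplier
    prev = 0             # extra bit to the right of the product register
    for _ in range(n):
        pair = (lower & 1, prev)
        if pair == (1, 0):                 # beginning of a run of 1s: subtract
            upper = (upper - m) & mask
        elif pair == (0, 1):               # end of a run of 1s: add
            upper = (upper + m) & mask
        # Arithmetic shift right of upper:lower:prev by one bit.
        prev = lower & 1
        lower = (lower >> 1) | ((upper & 1) << (n - 1))
        sign = upper >> (n - 1)
        upper = (upper >> 1) | (sign << (n - 1))
    product = (upper << n) | lower
    if product >> (2 * n - 1):             # reinterpret as a signed 2n-bit value
        product -= 1 << (2 * n)
    return product

print(booth_multiply(2, 7))   # 14
print(booth_multiply(2, -3))  # -6
```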
27
Division
                  1001     Quotient
Divisor 1000 | 1001010     Dividend
              -1000
                  10
                  101
                  1010
                 -1000
                    10     Remainder (or modulo result)
See how big a number can be subtracted, creating a quotient bit on each step
Binary => 1 x divisor or 0 x divisor
Dividend = Quotient x Divisor + Remainder
=> sizeof(Dividend) = sizeof(Quotient) + sizeof(Divisor)
3 versions of divide, successive refinement
28
Division 1.0

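64-bit Divisor reg, 64-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg

[Figure: Division 1.0 datapath. The 64-bit Divisor register (Shift Right) and the 64-bit Remainder register (Write) feed a 64-bit ALU; the 32-bit Quotient register shifts left; Control drives the shifts and the write.]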
29
Division 1.0

Takes n+1 steps for n-bit Quotient and Remainder
Quotient  Divisor    Remainder
0000      0010 0000  0000 0111
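
A minimal Python sketch of this first divider, assuming the usual restoring scheme: subtract the divisor, keep the result if it is non-negative, otherwise restore and record a 0 quotient bit. The function name `divide_v1` is hypothetical.

```python
def divide_v1(dividend, divisor, n=32):
    """Unsigned restoring division, first version (divisor shifts right)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    remainder = dividend          # 2n-bit remainder register starts as the dividend
    div = divisor << n            # divisor starts in the left half of its register
    quotient = 0
    for _ in range(n + 1):        # n + 1 steps
        remainder -= div
        if remainder >= 0:
            quotient = (quotient << 1) | 1   # subtraction succeeded: quotient bit 1
        else:
            remainder += div                 # restore, quotient bit 0
            quotient <<= 1
        div >>= 1                 # shift divisor right
    return quotient, remainder

print(divide_v1(7, 2, n=4))  # (3, 1)
```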

30
Division 2.0

1/2 the bits in the divisor are always 0 => 1/2 of the 64-bit adder is wasted => 1/2 of the divisor register is wasted
Instead of shifting the divisor to the right, shift the remainder to the left?
The 1st step cannot produce a 1 in the quotient bit (otherwise the quotient would be too big) => switch the order to shift first and then subtract, which can save 1 iteration

32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg

31
Division 2.0
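[Flowchart: Division 2.0 loop. Test Remainder and branch on Remainder >= 0 vs. Remainder < 0; loop back while fewer than n repetitions have been done ("No: < n repetitions") and stop after n repetitions ("Yes: n repetitions").]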
32
Division 3.0

Eliminate Quotient register by combining with
Remainder as shifted left
Start by shifting the Remainder left as before.
Thereafter loop contains only two steps because
the shifting of the Remainder register shifts
both the remainder in the left half and the
quotient in the right half
The consequence of combining the two registers and of the new order of operations in the loop is that the remainder will be shifted left one time too many.
Thus the final correction step must shift back only the remainder in the left half of the register.

32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, (0-bit Quotient reg)
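
The third divider can be sketched in Python as below. This is an illustration under the assumptions stated on the slide (initial left shift, quotient bits entering on the right, final right shift of the left half only); `divide_v3` is a made-up name.

```python
def divide_v3(dividend, divisor, n=32):
    """Unsigned restoring division, third version (shared remainder/quotient register)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    low_mask = (1 << n) - 1
    rem = (dividend & low_mask) << 1      # start by shifting the register left once
    for _ in range(n):
        hi = (rem >> n) - divisor         # subtract divisor from the left half
        if hi >= 0:
            # Keep the difference, shift left, and shift in a quotient bit of 1.
            rem = (((hi << n) | (rem & low_mask)) << 1) | 1
        else:
            rem <<= 1                     # restore (keep old value), shift in a 0
    quotient = rem & low_mask
    remainder = rem >> (n + 1)            # left half was shifted one time too many
    return quotient, remainder

print(divide_v3(7, 2, n=4))  # (3, 1)
```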

33
Division 3.0
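Remainder  Divisor
0000 0111  0010
[Flowchart: Division 3.0 loop. Test Remainder and branch on Remainder >= 0 vs. Remainder < 0; loop back while fewer than n repetitions have been done and stop after n repetitions (n = 4 here).]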
34
Division some signed details

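Sign of the remainder?
7/4 = (Q = 1, R = 3)
7/4 = (Q = 2, R = -1)
Which do you prefer?
Convention:
a/b = (Q, R)
Sign(R) = Sign(a)
Thus
7/4 = (Q = 1, R = 3)
-7/4 = (Q = -1, R = -3)

$a = Qb + R$
[Figure: number line through 0 showing Qb and R adding up to a on one side and to -a on the other]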
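
A short Python sketch of this sign convention. Python's own `//` and `%` round toward negative infinity, so the sketch works on magnitudes and fixes the signs afterwards; `signed_divide` is an illustrative name.

```python
def signed_divide(a, b):
    """Return (Q, R) with a == Q*b + R and the remainder taking the sign of a."""
    q = abs(a) // abs(b)
    r = abs(a) % abs(b)
    if (a < 0) != (b < 0):   # quotient is negative when the signs differ
        q = -q
    if a < 0:                # remainder follows the dividend
        r = -r
    return q, r

print(signed_divide(7, 4))   # (1, 3)
print(signed_divide(-7, 4))  # (-1, -3)
```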
35
Floating Point

What can be represented in N bits?
Unsigned: $0$ to $2^N - 1$
2's complement: $-2^{N-1}$ to $2^{N-1} - 1$
1's complement: $-2^{N-1} + 1$ to $2^{N-1} - 1$
But what about:
very large numbers? 9,349,398,989,787,762,244,859,087,678
very small numbers? 0.0000000000000000000000045691
rationals: 2/3
irrationals: $\sqrt{2}$
transcendentals: $e$
36
Floating Point
[Figure: scientific notation examples $6.02 \times 10^{23}$ and $1.673 \times 10^{-24}$, labeled with mantissa, decimal point, radix (base) and exponent]
IEEE F.P.: $\pm 1.M \times 2^{e - 127}$
Issues:
Arithmetic (+, -, x, /)
Representation, normal form
Range and precision
Rounding
Exceptions (e.g., divide by zero, overflow, underflow)
Errors
Properties (negation, inversion, if $A \neq B$ then $A - B \neq 0$)
37
Floating Point
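Binary Fractions
$1011_2 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$
so...
$101.011_2 = 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 + 0 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3}$
e.g., $.75 = 3/4 = 3/2^2 = 1/2 + 1/4 = .11_2$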
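
The positional rule for binary fractions can be checked with a few lines of Python. The helper name `binary_to_float` is an assumption for this sketch.

```python
def binary_to_float(text):
    """Evaluate a binary string such as '101.011' as a Python float."""
    whole, _, frac = text.partition(".")
    value = float(int(whole, 2)) if whole else 0.0
    for i, bit in enumerate(frac, start=1):
        value += int(bit) * 2.0 ** -i   # bit i after the point weighs 2^-i
    return value

print(binary_to_float("101.011"))  # 5.375
print(binary_to_float(".11"))      # 0.75
```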
38
Floating Point
Representation of floating point numbers in IEEE 754 standard: single precision
[Figure: 32-bit word split into S (1 bit, sign), E (8 bits, exponent) and M (23 bits, mantissa)]
mantissa: sign + magnitude, normalized binary significand with hidden integer bit: 1.M
exponent: excess-127 binary integer; the actual exponent is $e = E - 127$, with $0 < E < 255$
$N = (-1)^S \times 2^{E-127} \times (1.M)$
0 = 0 00000000 0...0
-1.5 = 1 01111111 10...0
The magnitude of numbers that can be represented is in the range $2^{-126} \times (1.0)$ to $2^{127} \times (2 - 2^{-23})$, which is approximately $1.18 \times 10^{-38}$ to $3.40 \times 10^{38}$.
Integer comparison is valid on IEEE floating point numbers of the same sign!
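
The field layout can be explored with Python's standard `struct` module, which packs a float into IEEE 754 single precision. The helper names below are illustrative, and `value_from_fields` covers only normalized numbers (0 < E < 255).

```python
import struct

def decode_single(x):
    """Split x, rounded to single precision, into its (S, E, M) fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def value_from_fields(s, e, m):
    """N = (-1)^S * 2^(E-127) * (1.M), for normalized numbers only."""
    return (-1) ** s * (1 + m / 2 ** 23) * 2.0 ** (e - 127)

print(decode_single(-1.5))                   # (1, 127, 4194304)
print(value_from_fields(0, 1, 0))            # smallest normalized magnitude, about 1.18e-38
print(value_from_fields(0, 254, 0x7FFFFF))   # largest magnitude, about 3.40e38
```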
39
Floating Point

The leading 1 bit of the significand is implicit
The exponent is biased to make sorting easier
all 0s is the smallest exponent, all 1s is the largest
bias of 127 for single precision and 1023 for double precision
summary: $(-1)^{sign} \times (1 + significand) \times 2^{exponent - bias}$
Example:
decimal: $-.75 = -3/4 = -3/2^2$
binary: $-.11 = -1.1 \times 2^{-1}$
floating point: exponent = 126 = 01111110
IEEE single precision: 1 01111110 10000000000000000000000 (Sign, Exponent, Significand)
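
The -0.75 example can be checked with Python's `struct` module. This is a small sketch; `single_bit_string` is a hypothetical helper name.

```python
import struct

def single_bit_string(x):
    """Return the single-precision encoding of x as 'sign exponent significand'."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    text = format(bits, "032b")
    return f"{text[0]} {text[1:9]} {text[9:]}"

print(single_bit_string(-0.75))
```

The printed fields are the sign bit, the 8-bit biased exponent (126 for an actual exponent of -1) and the 23-bit significand with the leading 1 left implicit.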
40
Floating Point
Floating Point Addition

How do you add in scientific notation?
$9.962 \times 10^4 + 5.231 \times 10^2$
Basic Algorithm
1. Align
2. Add
3. Normalize
4. Round

Approximate algorithm:
while (Exp(A) > Exp(B)) {
    shift Mantissa(B) right
    Exp(B)++
}
Mantissa(Result) = Mantissa(A) + Mantissa(B)
Exp(Result) = Exp(A)    // or Exp(B), they are equal after alignment
while (Mantissa(Result).msb != 1) {
    shift Mantissa(Result) left
    Exp(Result)--
}
Round(Mantissa)
Round(Exponent)
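
The four steps can be sketched for the decimal example on this slide. Everything here is an assumption made for illustration: positive operands only, a 4-digit significand stored as an integer (9962 stands for 9.962), three extra guard digits during alignment, and round-half-up in the last step.

```python
DIGITS = 4   # significand digits, as in 9.962
GUARD = 3    # extra digits kept while aligning (an assumption of this sketch)

def fp_add(a, b):
    """Add two positive decimal floats given as (significand, exponent) pairs."""
    (ma, ea), (mb, eb) = a, b
    if ea < eb:                         # make a the operand with the larger exponent
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    ma *= 10 ** GUARD
    mb *= 10 ** GUARD
    while eb < ea:                      # 1. align: shift the smaller operand right
        mb //= 10
        eb += 1
    m, e = ma + mb, ea                  # 2. add
    while m >= 10 ** (DIGITS + GUARD):  # 3. normalize after a carry out
        m //= 10
        e += 1
    m = (m + 10 ** GUARD // 2) // 10 ** GUARD   # 4. round away the guard digits
    if m >= 10 ** DIGITS:               # rounding itself may carry out
        m //= 10
        e += 1
    return m, e

print(fp_add((9962, 4), (5231, 2)))  # (1001, 5), i.e. 1.001 x 10^5
```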

41
Floating Point
42
Floating Point Addition
43
Floating Point
Floating Point Multiplication

How do you multiply in scientific notation?
$(9.9 \times 10^4)(5.2 \times 10^2) = 5.148 \times 10^7$
Basic Algorithm
1. Add exponents
1a. Correct for the bias in the exponent representation (Exp - 127)
2. Multiply
3. Normalize
4. Round
5. Set Sign
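
The steps can be followed on single-precision bit fields in Python. This is a simplified sketch, not a complete IEEE 754 multiplier: it assumes normalized inputs, ignores overflow and underflow, and rounds by truncation. All helper names are made up for the example.

```python
import struct

BIAS = 127

def to_fields(x):
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def from_fields(s, e, m):
    return struct.unpack(">f", struct.pack(">I", (s << 31) | (e << 23) | m))[0]

def fp_multiply(x, y):
    sx, ex, mx = to_fields(x)
    sy, ey, my = to_fields(y)
    e = ex + ey - BIAS                          # 1 and 1a: add exponents, remove one bias
    sig = ((1 << 23) | mx) * ((1 << 23) | my)   # 2: multiply 24-bit significands (1.M)
    if sig >> 47:                               # 3: product in [2, 4) => shift right once
        sig >>= 1
        e += 1
    m = (sig >> 23) & 0x7FFFFF                  # 4: round (here: truncate), drop hidden 1
    s = sx ^ sy                                 # 5: set sign
    return from_fields(s, e, m)

print(fp_multiply(1.5, -2.5))  # -3.75
print(fp_multiply(3.0, 3.0))   # 9.0
```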

44
Floating Point Accuracy Issues
FP Accuracy

Extremely important in scientific calculations
Very tiny errors can accumulate over time
IEEE 754 FP standard has four rounding modes
always round up
always round down
truncate
round to nearest
=> in case of a tie, round to the nearest even
Requires extra bits in intermediate
representations
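
The four modes can be compared side by side with Python's standard `decimal` module. Decimal arithmetic is used here only as a stand-in for the binary case, and the mapping of mode names to `decimal` constants is this sketch's own choice.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_DOWN, ROUND_HALF_EVEN

MODES = {
    "round up": ROUND_CEILING,          # toward +infinity
    "round down": ROUND_FLOOR,          # toward -infinity
    "truncate": ROUND_DOWN,             # toward zero
    "round to nearest": ROUND_HALF_EVEN # ties go to the even neighbor
}

def round_all(text):
    """Round a decimal string to an integer under each of the four modes."""
    value = Decimal(text)
    return {name: int(value.quantize(Decimal("1"), rounding=mode))
            for name, mode in MODES.items()}

print(round_all("2.5"))
print(round_all("-2.5"))
```

With a tie such as 2.5, round to nearest picks 2 (the even neighbor), while 3.5 would go to 4.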

45
Floating Point Accuracy Issues
How many extra bits? IEEE spec: the result must be as if it were computed exactly and then rounded.

Guard bits: bits to the right of the least significant bit of the significand, computed for use in normalization (they could become significant at that point) and in rounding.
IEEE 754 has three extra bits and calls them guard, round, and sticky.

46
Floating Point Overflows
Infinity and NaNs
Infinity: the result of an operation overflows, i.e., is larger than the largest number that can be represented
Overflow is not the same as divide by zero (which raises a different exception)
S 1...1 0...0 = +/- infinity
It may make sense to do further computations with infinity, e.g., X/0 > Y may be a valid comparison
NaN: not a number, but not infinity (e.g., sqrt(-4))
Invalid operation exception (unless the operation is = or $\neq$)
S 1...1 non-zero = NaN (HW decides what goes in the non-zero field)
NaNs propagate: f(NaN) = NaN
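
These encodings and properties can be observed from Python, whose floats follow IEEE 754 on common platforms. This is a hedged sketch: Python itself raises exceptions for `1.0 / 0.0` and `math.sqrt(-4)`, so infinity comes from `math.inf` and a NaN is produced as infinity minus infinity. The helper name is illustrative.

```python
import math
import struct

def single_fields(x):
    """Return the (S, E, M) fields of x encoded in single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

nan = math.inf - math.inf         # an invalid operation yields NaN

print(single_fields(math.inf))    # (0, 255, 0): exponent all 1s, mantissa 0
print(single_fields(-math.inf))   # (1, 255, 0)
print(math.inf > 1e300)           # True: comparisons with infinity are meaningful
print(math.isnan(nan + 1.0))      # True: NaNs propagate
print(nan == nan)                 # False: NaN compares unequal even to itself
```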
47
Summary

Multiplication and division take much longer than
addition, requiring multiple addition steps.
Floating Point extends the range of numbers that
can be represented, at the expense of precision
(accuracy).
FP operations are very similar to integer, but
with pre- and post-processing.
Rounding implementation is critical to accuracy
over time.
