# COMPUTER ARITHMETIC - PowerPoint PPT Presentation

Title:

## COMPUTER ARITHMETIC

Description:

### Title: COSC3330/6308 Computer Architecture Author: Jehan-François Pâris Last modified by: Jehan-François Pâris Created Date: 8/29/2001 4:04:21 AM – PowerPoint PPT presentation

Slides: 134
Provided by: Jehan85
Transcript and Presenter's Notes

Title: COMPUTER ARITHMETIC

1
COMPUTER ARITHMETIC
• Jehan-François Pâris
• jparis_at_uh.edu

2
Chapter Organization
• Representing negative numbers
• Integer addition and subtraction
• Integer multiplication and division
• Floating point operations
• Examples of implementation
• IBM 360, RISC, x86

3
A warning
• Binary addition, subtraction, multiplication and
division are very easy

4
5
General concept
• Decimal: 19 + 7 = 26, with a carry out of the units column
• Binary: 10011 + 111 = 11010, with carries out of the three rightmost columns
• 11010 is 16 + 8 + 2 = 26

6
Realization
• Simplest solution is a battery of full adders

(Figure: a row of four full adders. Inputs x3 y3, x2 y2, x1 y1, x0 y0; sum outputs s3, s2, s1, s0; overflow output o.)
7
Observations
• Output o indicates if there is an overflow
• A result that cannot be represented using 4 bits
• Happens when x + y > 15
• Operation is slowed down by carry propagation
• Faster solutions (not discussed here)
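
The battery of full adders can be sketched in a few lines of Python. This is only an illustration of the idea, not the circuit itself; the function names `full_adder` and `ripple_add` are made up for this example.

```python
def full_adder(x, y, carry_in):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    total = x + y + carry_in
    return total & 1, total >> 1


def ripple_add(x, y, n_bits=4):
    """Add two unsigned n-bit integers one bit at a time.

    Returns (sum modulo 2**n_bits, overflow bit o).
    """
    carry = 0
    result = 0
    for i in range(n_bits):
        # Each stage must wait for the carry produced by the previous stage.
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


print(ripple_add(0b1011, 0b0011))  # 11 + 3 fits in 4 bits
print(ripple_add(0b1100, 0b0101))  # 12 + 5 exceeds 15, so o is set
```

The loop makes the carry propagation delay visible: stage i cannot finish before stage i - 1 has produced its carry.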

8
Signed and unsigned additions
• Unsigned addition in 4-bit arithmetic
• (carry) 11_
• 1011 + 0011 = 1110
• 11 + 3 = 14 (8 + 4 + 2)
• Signed addition in 4-bit arithmetic
• (carry) 11_
• 1011 + 0011 = 1110
• -5 + 3 = -2

9
Signed and unsigned additions
• Same rules apply even though bit strings
represent different values
• Sole difference is overflow handling

10
Overflow handling (I)
• No overflow in signed arithmetic
• (carry) 111_
• 1110 + 0011 = 0001
• -2 + 3 = 1 (correct)
• Overflow in signed 4-bit arithmetic
• (carry) 1__
• 0110 + 0011 = 1001
• 6 + 3 = -7 (false)

11
Overflow handling (II)
• In signed arithmetic an overflow happens when
• The sum of two positive numbers exceeds the maximum positive value that can be represented using n bits: $2^{n-1} - 1$
• The sum of two negative numbers falls below the minimum negative value that can be represented using n bits: $-2^{n-1}$

12
Example
• Four-bit arithmetic
• Sixteen possible values
• Positive overflow happens when result > 7
• Negative overflow happens when result < -8
• Eight-bit arithmetic
• 256 possible values
• Positive overflow happens when result > 127
• Negative overflow happens when result < -128
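
The overflow rule above can be checked with a small sketch. The helper names `to_signed` and `signed_add` are assumptions made for this example; the code wraps the sum to n bits and reports whether the true sum left the representable range.

```python
def to_signed(bits, n_bits):
    """Interpret an n-bit pattern as a two's complement integer."""
    if bits & (1 << (n_bits - 1)):
        return bits - (1 << n_bits)
    return bits


def signed_add(x, y, n_bits=4):
    """Add two signed values in n-bit arithmetic.

    Returns (wrapped result, overflow flag).
    """
    mask = (1 << n_bits) - 1
    lowest = -(1 << (n_bits - 1))       # -2**(n-1)
    highest = (1 << (n_bits - 1)) - 1   # 2**(n-1) - 1
    wrapped = to_signed((x + y) & mask, n_bits)
    overflow = not (lowest <= x + y <= highest)
    return wrapped, overflow


print(signed_add(-2, 3))   # fits in 4 bits
print(signed_add(6, 3))    # 9 > 7: positive overflow, result wraps to -7
```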

13
Overflow handling (III)
• MIPS architecture handles signed and unsigned overflows in very different fashions
• Ignores unsigned overflows
• Implements modulo $2^n$ arithmetic
• Generates an interrupt whenever it detects a signed overflow
• Lets the OS handle the condition

14
Why?
• To keep the CPU as simple and regular as possible

15
An interesting consequence
• Most C compilers ignore overflows
• C compilers must use unsigned arithmetic for
their integer operations
• Fortran compilers expect overflow conditions to
be detected
• Fortran compilers must use signed arithmetic for
their integer operations

16
Subtraction
• Can be implemented by
• Specific hardware
• Negating the subtrahend

17
Negating a number
• Toggle all bits then add one

18
In 4-bit arithmetic (I)
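| Bit pattern | Value | All bits toggled | After adding one | Value |
|---|---|---|---|---|
| 0000 | 0 | 1111 | 0000 | 0 |
| 0001 | 1 | 1110 | 1111 | -1 |
| 0010 | 2 | 1101 | 1110 | -2 |
| 0011 | 3 | 1100 | 1101 | -3 |
| 0100 | 4 | 1011 | 1100 | -4 |
| 0101 | 5 | 1010 | 1011 | -5 |
| 0110 | 6 | 1001 | 1010 | -6 |
| 0111 | 7 | 1000 | 1001 | -7 |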
19
In 4-bit arithmetic (II)
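| Bit pattern | Value | All bits toggled | After adding one | Value |
|---|---|---|---|---|
| 1000 | -8 | 0111 | 1000 | ? |
| 1001 | -7 | 0110 | 0111 | 7 |
| 1010 | -6 | 0101 | 0110 | 6 |
| 1011 | -5 | 0100 | 0101 | 5 |
| 1100 | -4 | 0011 | 0100 | 4 |
| 1101 | -3 | 0010 | 0011 | 3 |
| 1110 | -2 | 0001 | 0010 | 2 |
| 1111 | -1 | 0000 | 0001 | 1 |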
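
The "toggle all bits then add one" rule is short enough to try out directly. A minimal sketch, assuming a hypothetical helper named `negate` that works on n-bit patterns:

```python
def negate(bits, n_bits=4):
    """Two's complement negation: toggle all bits, then add one."""
    mask = (1 << n_bits) - 1
    toggled = bits ^ mask          # flip every bit
    return (toggled + 1) & mask    # add one, keep only n bits


for pattern in (0b0000, 0b0011, 0b0111, 0b1000):
    print(format(pattern, "04b"), "->", format(negate(pattern), "04b"))
```

Note that 1000 maps to itself: +8 has no 4-bit representation, which is why that row of the table shows a question mark.
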
20
MULTIPLICATION
21
Decimal multiplication
• What are the rules?
• Successively multiply the multiplicand by each digit of the multiplier, starting at the right and shifting each partial result one extra position to the left each time but the first
• Sum all partial results
• 37 × 12
• Partial results: 74 and 370
• 74 + 370 = 444

22
Binary multiplication
• What are the rules?
• Successively multiply the multiplicand by each digit of the multiplier, starting at the right and shifting each partial result one extra position to the left each time but the first
• Sum all partial results
• Binary multiplication is easy!
• 1101 × 101
• Partial results: 1101, 0 (shifted left once) and 1101 (shifted left twice)
• 1101 + 110100 = 1000001

23
Binary multiplication table
| × | 0 | 1 |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 1 |
24
Algorithm
• Clear contents of 64-bit product register
• for (i = 0; i < 32; i++) {
• if (LSB of multiplier register == 1)
• Add contents of multiplicand register to product register
• Shift multiplier register right one position
• Shift multiplicand register left one position
• } // end of for loop
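
The algorithm above translates almost line for line into Python. This is a sketch for unsigned operands; the name `multiply_v1` and the `n_bits` parameter are assumptions made for this example, and Python integers stand in for the hardware registers.

```python
def multiply_v1(multiplicand, multiplier, n_bits=32):
    """Shift-and-add multiplication of two unsigned n-bit integers."""
    product = 0                      # clear the product register
    for _ in range(n_bits):
        if multiplier & 1:           # LSB of multiplier register is 1
            product += multiplicand
        multiplier >>= 1             # next multiplier bit becomes the LSB
        multiplicand <<= 1           # multiplicand moves one position left
    return product


print(multiply_v1(0b0011, 0b0011, n_bits=4))  # 3 x 3
print(multiply_v1(0b1101, 0b101))             # 13 x 5
```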

25
Multiplier First version
(Figure: first multiplier design, with a multiplier register, a 64-bit multiplicand register, a 64-bit ALU, a 64-bit product register, and control logic.)
26
Multiplier First version
(Figure: the same design with two annotations, "As we learned in grade school" and "To get next bit (LSB to MSB)": multiplier register, 64-bit multiplicand register, 64-bit ALU, 64-bit product register, and control logic.)
27
Explanations
• Multiplicand register must be 64-bit wide because
32-bit multiplicand will be shifted 32 times to
the left
• Requires a 64-bit ALU
• Product register must be 64-bit wide to
accommodate the result
• Contents of multiplier register are shifted 32 times to the right so that each bit successively becomes its least significant bit (LSB)

28
Example (I)
• Multiply 0011 by 0011
• Start: Multiplicand 0011, Multiplier 0011, Product 0000
• First addition: Multiplicand 0011, Multiplier 0011, Product 0011

29
Example (II)
• Shift right and left: Multiplicand 0110, Multiplier 0001, Product 0011
• Second addition: Multiplicand 0110, Multiplier 0001, Product 1001
• 0110 + 0011 = 1001

30
Example (III)
• Shift right and left: Multiplicand 1100, Multiplier 0000, Product 1001
• Multiplier is all zeroes: we are done

31
First Optimization
• Must have a 64-bit ALU
• More complex than a 32-bit ALU
• Solution is not to shift the multiplicand
• After each cycle, the LSB being added remains unchanged
• Will save that bit elsewhere and shift the product register one position to the right after each iteration

32
Binary multiplication
• 1101 × 101
• Partial results: 1101, 0 (shifted left once) and 1101 (shifted left twice)
• 1101 + 110100 = 1000001
• Observe that the least significant bit added during each cycle remains unchanged

33
Algorithm
• Clear contents of 64-bit product register
• for (i = 0; i < 32; i++) {
• if (LSB of multiplier register == 1)
• Add contents of multiplicand register to product register
• Save LSB of product register
• Shift both multiplier register and product register right one position
• } // end of for loop

34
Multiplier Second version
(Figure: second multiplier design, with a multiplier register, a multiplicand register, a 32-bit ALU, control test logic, and a 64-bit product register with "Shift Right and Save".)
35
Decimal Example (I)
• Multiply 27 by 12
• Start: Multiplicand 27, Multiplier 12, Product --, Result --
• First digit: Multiplicand 27, Multiplier 12, Product 54, Result --

36
Decimal Example (II)
• Shift right multiplier and product: Multiplicand 27, Multiplier 1, Product 5, Result 4
• Second digit: Multiplicand 27, Multiplier 1, Product 32, Result 4

37
Decimal Example (III)
• Shift right multiplier and product: Multiplicand 27, Multiplier 0, Product 3, Result 24
• Multiplier equals zero: result is obtained by concatenating contents of product and result registers
• 324

38
How did it work?
• We learned
• 27 × 12 = 27 × 10 + 27 × 2 = 27 × 10 + 54 = 270 + 54
• Algorithm uses another decomposition
• 27 × 12 = 27 × 10 + 27 × 2 = 27 × 10 + 50 + 4 = (27 × 10 + 50) + 4 = 320 + 4

39
Example (I)
• Multiply 0011 by 0011
• Start: Multiplicand 0011, Multiplier 0011, Product --, Result --
• First bit: Multiplicand 0011, Multiplier 0011, Product 0011, Result --

40
Example (II)
• Shift right multiplier and product: Multiplicand 0011, Multiplier 0001, Product 0001, Result 1-
• Second bit: Multiplicand 0011, Multiplier 0001, Product 0100, Result 1-
• Product register contains 0011 + 001 = 0100

41
Example (III)
• Shift right multiplier and product: Multiplicand 0011, Multiplier 0000, Product 010, Result 01-
• Multiplier equals zero: result is obtained by concatenating contents of product and result registers
• 1001 = 9

42
Second Optimization
• Both multiplier and product must be shifted one position to the right after each iteration
• Both are now 32-bit quantities
• Can store both quantities in the product register
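
Sharing one register between the multiplier and the product can be sketched as follows. This is an illustrative model for unsigned operands, not the actual datapath; `multiply_v3` and `n_bits` are names chosen for this example. The lower half of the register starts out holding the multiplier and the upper half accumulates the product.

```python
def multiply_v3(multiplicand, multiplier, n_bits=32):
    """Multiply two unsigned n-bit integers using one shared 2n-bit register."""
    register = multiplier                      # upper half is all zeroes
    for _ in range(n_bits):
        if register & 1:                       # LSB is the current multiplier bit
            # Add the multiplicand to the upper half only. The sum can need
            # one extra bit, which a Python integer keeps automatically.
            register += multiplicand << n_bits
        register >>= 1                         # shifts product and multiplier together
    return register


print(multiply_v3(0b0011, 0b0011, n_bits=4))   # 3 x 3
print(multiply_v3(15, 15, n_bits=4))           # largest 4-bit operands
```

Each right shift discards a multiplier bit that has already been used and frees one more position for the product, which is why the two quantities never collide.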

43
Multiplier Third version
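(Figure: third multiplier design, with a multiplicand register, a 32-bit ALU, control test logic, and a single register shared by the multiplier and the product with "Shift Right and Save".)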
44
Third Optimization
• Multiplication requires 32 additions and 32 shift
operations
• Can have two or more partial multiplications
• One using bits 0-15 of multiplier
• A second using bits 16-31
• then add together the partial results

45
Multiplying negative numbers
• Can use the same algorithm as before but we must
extend the sign bit of the product

46
Related MIPS instructions (I)
• Integer multiplication uses a separate pair of
registers (hi and lo)
• mult s0, s1
• multiply contents of register s0 by contents of
register s1 and store results in register pair
hi-lo
• multu s0, s1
• same but unsigned

47
Related MIPS instructions (II)
• mflo s0
• Move contents of register lo to register s0
• mfhi s0
• Move contents of register hi to register s0

48
DIVISION
49
Division
• Implemented by successive subtractions
• Result must satisfy the equality
• Dividend = Divisor × Quotient + Remainder

50
Decimal division (long division)
• What are the rules?
• Repeatedly try to subtract smaller multiple of
divisor from dividend
• Record multiple (or zero)
• At each step, repeat with a lower power of ten
• Stop when remainder is smaller than divisor

• 2126 ÷ 7: quotient is 303
• 2126 - 2100 = 26 (subtracting 7 × 300)
• 26 - 21 = 5 (subtracting 7 × 3)
• Remainder is 5

51
Binary division
• 1011 ÷ 11: quotient is 011
• Cannot subtract 1100 from 1011: mark 0
• 1011 - 110 = 101: mark 1
• 101 - 11 = 10: mark 1
• Remainder is 10
• What are the rules?
• Repeatedly try to subtract powers of two of
divisor from dividend
• Mark 1 for success, 0 for failure
• At each step, shift divisor one position to the
right
• Stop when remainder is smaller than divisor

52
Same division in decimal
• 11 ÷ 3: quotient is 2 + 1 = 3
• Cannot subtract 12 from 11: mark 0
• 11 - 6 = 5: mark 1
• 5 - 3 = 2: mark 1
• Remainder is 2
• What are the rules?
• Repeatedly try to subtract powers of two of
divisor from dividend
• Mark 1 for success, 0 for failure
• At each step, shift divisor one position to the
right
• Stop when remainder is smaller than divisor

53
Observations
• Binary division is actually simpler
• We start with a left-shifted version of divisor
• We try to subtract it from dividend
• No need to find out which multiple to subtract
• We mark 1 for success, 0 for failure
• We shift divisor one position right after every attempt

54
How to start the division
• One 64-bit register for successive remainders
• Initialized with dividend
• One 64-bit register for divisor
• Start with divisor in upper half
• One 32-bit register for the quotient
• Initialized to all zeroes
55
How we proceed (I)
• After each step we shift the divisor one position to the right

56
How we proceed (II)
• After each step we shift the contents of the
quotient register one position to the left
• To make space for the new 0 or 1 being inserted

57
Division Algorithm
• For i in range(0, 33): (i goes from 0 to 32)
• Subtract contents of divisor register from remainder register
• If remainder ≥ 0
• Shift quotient register to the left
• Set new rightmost bit to 1
• Else
• Undo subtraction
• Shift quotient register to the left
• Set new rightmost bit to 0
• Shift contents of divisor register right one position
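
The division algorithm above can be sketched in Python for unsigned operands. The function name `divide` and the `n_bits` parameter are assumptions made for this example; Python integers stand in for the 64-bit remainder and divisor registers.

```python
def divide(dividend, divisor, n_bits=32):
    """Restoring division of two unsigned n-bit integers.

    Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    remainder = dividend                  # remainder register starts as the dividend
    divisor_register = divisor << n_bits  # divisor starts in the upper half
    quotient = 0
    for _ in range(n_bits + 1):           # i goes from 0 to n_bits
        remainder -= divisor_register
        if remainder >= 0:
            quotient = (quotient << 1) | 1     # success: shift in a 1
        else:
            remainder += divisor_register      # undo the subtraction
            quotient = quotient << 1           # failure: shift in a 0
        divisor_register >>= 1
    return quotient, remainder


print(divide(0b1011, 0b11, n_bits=4))  # 11 / 3
print(divide(2126, 7))
```

The first of the n + 1 attempts always fails for n-bit operands, because the divisor shifted into the upper half is larger than any n-bit dividend.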

58
A simple divider
(Figure: a simple divider, with a quotient register, a 64-bit divisor register, a 64-bit ALU, control test logic, and a 64-bit remainder register.)
59
Signed division
• Easiest solution is to remember the sign of the
operands and adjust the sign of the quotient and
remainder accordingly
• A little problem
• 5 ÷ 2 = 2 and the remainder is 1
• -5 ÷ 2 = -2 and the remainder is -1
• The sign of the remainder must match the sign of the dividend
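
A sketch of the "remember the signs" approach follows. It assumes the convention used by C and by the MIPS div instruction, in which the quotient is truncated toward zero and the remainder takes the sign of the dividend; both examples above agree with it. The name `signed_divide` is made up for this example.

```python
def signed_divide(dividend, divisor):
    """Divide the magnitudes, then fix up the signs.

    Returns (quotient, remainder) with dividend == divisor * quotient + remainder.
    """
    quotient = abs(dividend) // abs(divisor)
    remainder = abs(dividend) % abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient      # operands of opposite signs: negative quotient
    if dividend < 0:
        remainder = -remainder    # remainder follows the dividend
    return quotient, remainder


for a, b in ((5, 2), (-5, 2), (5, -2), (-5, -2)):
    q, r = signed_divide(a, b)
    print(a, "/", b, "->", q, "remainder", r)
```

Python's own `//` and `%` operators round toward negative infinity instead, so `-5 // 2` is -3 with remainder 1; that is why the sketch works on absolute values.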

60
Related MIPS instructions
• Integer division uses the same pair of registers
(hi and lo) as integer multiplication
• div s0, s1
• divide contents of register s0 by contents of register s1, leave the quotient in register lo and the remainder in register hi
• divu s0, s1
• same but unsigned

61
TRANSITION SLIDE
• Here end the materials that were on the first
fall 2012 midterm
• Here start the materials that will be on the
fall 2012 midterm

To be moved to the right place
62
FLOATING POINT OPERATIONS
63
Floating point numbers
• Used to represent real numbers
• Very similar to scientific notation
• $3.5 \times 10^6$, $0.82 \times 10^5$, $75 \times 10^6$, …
• Both decimal numbers in scientific notation and floating point numbers can be normalized
• $3.5 \times 10^6$, $8.2 \times 10^4$, $7.5 \times 10^7$, …

64
Fractional binary numbers
• 0.1 is ½ or $0.5_{ten}$
• 0.01 is ¼ or $0.25_{ten}$
• 0.11 is ½ + ¼ = ¾ or $0.75_{ten}$
• 1.1 is 1½ or $1.5_{ten}$
• 10.01 is 2¼ or $2.25_{ten}$
• 11.11 is ______ or _____

65
Normalizing binary numbers
• 0.1 becomes $1.0 \times 2^{-1}$
• 0.01 becomes $1.0 \times 2^{-2}$
• 0.11 becomes $1.1 \times 2^{-1}$
• 1.1 is already normalized and equal to $1.1 \times 2^{0}$
• 10.01 becomes $1.001 \times 2^{1}$
• 11.11 becomes 1.______ × 2^_____

66
Representation
• Sign + exponent + coefficient
• IEEE Standard 754
• 1 + 8 + 23 = 32 bits
• 1 + 11 + 52 = 64 bits (double precision)

67
The sign bit
• 0 indicates a positive number
• 1 a negative number

68
The exponent (I)
• 8 bits for single precision
• 11 bits for double precision
• With 8 bits, we can represent exponents between
-126 and 127
• All-zeroes value is reserved for the zeroes and
denormalized numbers
• All-ones value is reserved for the infinities and NaNs (Not a Number)

69
The exponent (II)
• Exponents are represented using a biased notation
• Stored value = actual exponent + bias
• For 8-bit exponents, bias is 127
• Stored value of 1 corresponds to -126
• Stored value of 254 corresponds to +127

0 and 255 are reserved for special values
70
The exponent (III)
• Biased notation simplifies comparisons
• If two normalized floating point numbers have
different exponents, the one with the bigger
exponent is the bigger of the two

71
Special values (I)
• Signed zeroes
• IEEE 754 distinguishes between +0 and -0
• Represented by
• Sign bit 0 or 1
• Biased exponent all zeroes
• Coefficient all zeroes

72
Special values (II)
• Denormalized numbers
• Numbers whose coefficient cannot be normalized
• Smaller than $2^{-126}$
• Will have a coefficient with leading zeroes and
exponent field equal to zero
• Reduces the number of significant digits
• Lowers accuracy

73
Special values (III)
• Infinities
• +∞ and -∞
• Represented by
• Sign bit 0 or 1
• Biased exponent all ones
• Coefficient all zeroes

74
Special values (IV)
• NaN
• For Not a Number
• Often result from illegal divisions: 0/0, ∞/∞, ∞/-∞, -∞/∞, and -∞/-∞
• Represented by
• Sign bit 0 or 1
• Biased exponent all ones
• Coefficient non zero

75
The coefficient
• Also known as fraction or significand
• Most significant bit is always one
• Implicit and not represented
• Biased exponent is 127ten
• True coefficient is implicit one followed by all
zeroes

76
Decoding a floating point number
• Subtract 127 from biased exponent to obtain power of two: $\langle be \rangle - 127$
• Use coefficient to construct a normalized binary value with a binary point: $1.\langle coefficient \rangle$
• Number being represented is $1.\langle coefficient \rangle \times 2^{\langle be \rangle - 127}$
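
The decoding steps above can be sketched in Python for single-precision values. The function name `decode_single` is an assumption made for this example, and the sketch deliberately refuses the reserved exponents 0 and 255 rather than handling zeroes, denormalized numbers, infinities and NaNs.

```python
def decode_single(word):
    """Decode the 32-bit pattern of a normalized single-precision number."""
    sign = word >> 31
    biased_exponent = (word >> 23) & 0xFF
    coefficient = word & 0x7FFFFF              # the 23 stored fraction bits
    if biased_exponent in (0, 255):
        raise ValueError("special values are not handled in this sketch")
    significand = 1 + coefficient / 2**23      # restore the implicit leading 1
    value = significand * 2.0 ** (biased_exponent - 127)
    return -value if sign else value


print(decode_single(0x3F800000))  # sign 0, biased exponent 127, coefficient 0
print(decode_single(0x40400000))  # sign 0, biased exponent 128, coefficient 100...0
print(decode_single(0xBF600000))  # sign 1, biased exponent 126, coefficient 110...0
```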

77
First example
• Sign bit is zero: number is positive
• Biased exponent is 127: power of two is zero
• Normalized binary value is 1.0000000
• Number is $1 \times 2^0 = 1$

78
Second example
• Sign bit is zero: number is positive
• Biased exponent is 128: power of two is 1
• Normalized binary value is 1.1000000
• Number is $1.1 \times 2^1 = 11 = 3_{ten}$

79
Third example
• Sign bit is 1: number is negative
• Biased exponent is 126: power of two is -1
• Normalized binary value is 1.1100000
• Number is $-1.11 \times 2^{-1} = -0.111 = -7/8_{ten}$

80
Can we do it now?
• Sign bit is 0 Number is ___________
• Biased exponent is 129 Power of two is _______
• Normalized binary value is 1.__________
• Number is _________________________

81
Encoding a floating point number
• Use sign to pick sign bit
• Normalize the number: convert it to form $1.\langle more\ bits \rangle \times 2^{\langle exp \rangle}$
• Add 127 to exponent $\langle exp \rangle$ to obtain biased exponent $\langle be \rangle$
• Coefficient $\langle coeff \rangle$ is equal to fractional part $\langle more\ bits \rangle$ of number
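
The encoding steps can be sketched the same way. The name `encode_single` is an assumption made for this example. The sketch relies on `math.frexp` to normalize the number, truncates any bits beyond the 23 that fit (real hardware rounds instead), and rejects values that are not normalized single-precision numbers.

```python
import math


def encode_single(x):
    """Return (sign, biased_exponent, coefficient) for a single-precision value."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("special values are not handled in this sketch")
    sign = 1 if x < 0 else 0
    fraction, exponent = math.frexp(abs(x))   # abs(x) == fraction * 2**exponent
    significand = fraction * 2                # now 1 <= significand < 2
    exponent -= 1
    biased_exponent = exponent + 127
    if not 1 <= biased_exponent <= 254:
        raise ValueError("outside the normalized single-precision range")
    coefficient = int((significand - 1) * 2**23)  # drop the implicit 1, truncate
    return sign, biased_exponent, coefficient


for value in (7.0, 0.5, -2.0, 2.25):
    s, be, c = encode_single(value)
    print(value, s, format(be, "08b"), format(c, "023b"))
```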

82
First example
• Represent 7
• Convert to binary: 111
• Normalize: $1.11 \times 2^2$
• Sign bit is 0
• Biased exponent is 127 + 2 = $10000001_{two}$
• Coefficient is 1100…0

83
Second example
• Represent 1/2
• Convert to binary: 0.1
• Normalize: $1.0 \times 2^{-1}$
• Sign bit is 0
• Biased exponent is 127 - 1 = $01111110_{two}$
• Coefficient is 00…0

84
Third example
• Represent -2
• Convert to binary: -10
• Normalize: $-1.0 \times 2^1$
• Sign bit is 1
• Biased exponent is 127 + 1 = $10000000_{two}$
• Coefficient is 00…0

85
Fourth example
• Represent 9/4
• Convert to binary: $1001 \times 2^{-2}$
• Normalize: $1.001 \times 2^1$
• Sign bit is 0
• Biased exponent is 127 + 1 = $10000000_{two}$
• Coefficient is 00100…0

86
Can we do it now?
• Represent 6.25
• Convert to binary ________
• Normalize: 1.______ × 2^_______
• Sign bit is _____
• Biased exponent is 127 + ___ = ______ten
• Coefficient is_________

87
Range
• Can represent numbers between $1.000\ldots \times 2^{-126}$ and $1.111\ldots \times 2^{127}$
• Say between $2^{-126}$ and $2^{128}$
• Observing that $2^{10} \approx 10^3$, we divide the exponents by 10 and multiply them by 3 to obtain the interval expressed in powers of 10
• Approximate range is $10^{-38}$ to $10^{38}$

88
Accuracy
• We have 24 significant bits
• Theoretical precision of $1/2^{24}$, that is, roughly $1/10^7$
• Cannot add correctly billions or trillions
• Actual situation is worse if we do too many
computations
• 1,000,000 999,999.4875 ???
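
The limit of 24 significant bits is easy to observe. Python floats are double precision, so this sketch uses the standard `struct` module to round values to single precision; the helper name `to_single` is an assumption made for this example.

```python
import struct


def to_single(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


big = 2.0 ** 24                      # 16,777,216 needs all 24 significant bits
print(to_single(big + 1.0) == big)   # the added 1 does not fit

billion = 1_000_000_000.0
print(to_single(billion + 1.0) - billion)  # the added 1 is lost here as well
```

Near one billion, consecutive single-precision values are 64 apart, so adding 1 changes nothing.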

89
Guard bits
• Do all arithmetic operations with two additional
bits to reduce rounding errors

90
Double precision arithmetic (I)
• Use 64-bit double words
• Allows us to have
• One bit for sign
• Eleven bits for exponent
• 2,048 possible values
• Fifty-two bits for coefficient
• Plus the implicit leading bit

91
Double precision arithmetic (II)
• Exponents are still represented using a biased
notation
• Stored value = actual exponent + bias
• For 11-bit exponents, bias is 1023
• Stored value of 1 corresponds to -1,022
• Stored value of 2,046 corresponds to +1,023
• Stored values of 0 and 2,047 are reserved for
special cases

92
Double precision arithmetic (III)
• Can now represent numbers between $1.000\ldots \times 2^{-1022}$ and $1.111\ldots \times 2^{1023}$
• Say between $2^{-1022}$ and $2^{1024}$
• Approximate range is $10^{-307}$ to $10^{307}$
• In reality, more like $10^{-308}$ to $10^{308}$

93
Double precision arithmetic (IV)
• We now have 53 significant bits
• Theoretical precision of $1/2^{53}$, that is, roughly $1/10^{16}$
• Can now add correctly billions or trillions

94
If that is not enough,
• Can use 128-bit quad words
• Allows us to have
• One bit for sign
• Fifteen bits for exponent
• From -16382 to 16383
• One hundred twelve bits for coefficient
• Plus the implicit leading bit

95
Decimal floating point addition (I)
• $5.25 \times 10^3 + 1.22 \times 10^2 = ?$
• Denormalize number with smaller exponent: $5.25 \times 10^3 + 0.122 \times 10^3$
• Add the numbers: $5.25 \times 10^3 + 0.122 \times 10^3 = 5.372 \times 10^3$
• Result is normalized

96
Decimal floating point addition (II)
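• $9.25 \times 10^3 + 8.22 \times 10^2 = ?$
• Denormalize number with smaller exponent: $9.25 \times 10^3 + 0.822 \times 10^3$
• Add the numbers: $9.25 \times 10^3 + 0.822 \times 10^3 = 10.072 \times 10^3$
• Normalize the result: $10.072 \times 10^3 = 1.0072 \times 10^4$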

97
Binary floating point addition (I)
• Say 1001 + 10 or $1.001 \times 2^3 + 1.0 \times 2^1$
• Denormalize number with smaller exponent: $1.001 \times 2^3 + 0.01 \times 2^3$
• Add the numbers: $1.001 \times 2^3 + 0.01 \times 2^3 = 1.011 \times 2^3$
• Result is normalized

98
Binary floating point addition (II)
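• Say 101 + 11 or $1.01 \times 2^2 + 1.1 \times 2^1$
• Denormalize number with smaller exponent: $1.01 \times 2^2 + 0.11 \times 2^2$
• Add the numbers: $1.01 \times 2^2 + 0.11 \times 2^2 = 10.00 \times 2^2$
• Normalize the result: $10.00 \times 2^2 = 1.000 \times 2^3$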

99
Binary floating point subtraction
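• Say 101 - 11 or $1.01 \times 2^2 - 1.1 \times 2^1$
• Denormalize number with smaller exponent: $1.01 \times 2^2 - 0.11 \times 2^2$
• Perform the subtraction: $1.01 \times 2^2 - 0.11 \times 2^2 = 0.10 \times 2^2$
• Normalize the result: $0.10 \times 2^2 = 1.0 \times 2^1$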
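
The denormalize, operate, normalize sequence used in the addition and subtraction examples can be sketched on a toy format. Everything here is an assumption made for illustration: a number is a pair (significand, exponent) whose significand is an integer scaled by `2**FRAC_BITS`, only non-negative values are handled, and the first operand must have the larger exponent.

```python
FRAC_BITS = 8             # hypothetical number of fraction bits kept
ONE = 1 << FRAC_BITS      # scaled significand representing 1.0


def normalize(sig, exp):
    """Shift the significand until it lies in [1.0, 2.0)."""
    if sig == 0:
        return 0, 0
    while sig >= 2 * ONE:     # too big: shift right, raise exponent
        sig >>= 1
        exp += 1
    while sig < ONE:          # too small: shift left, lower exponent
        sig <<= 1
        exp -= 1
    return sig, exp


def fp_add(a, b, subtract=False):
    """Add (or subtract) two toy floating point numbers."""
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    if exp_a < exp_b:
        raise ValueError("sketch expects the first exponent to be the larger")
    sig_b >>= exp_a - exp_b   # denormalize the number with the smaller exponent
    sig = sig_a - sig_b if subtract else sig_a + sig_b
    if sig < 0:
        raise ValueError("negative results are not handled in this sketch")
    return normalize(sig, exp_a)


def to_float(number):
    sig, exp = number
    return sig / ONE * 2.0 ** exp


nine = (0b1001 << 5, 3)   # 1.001 x 2^3
two = (ONE, 1)            # 1.0 x 2^1
five = (0b101 << 6, 2)    # 1.01 x 2^2
three = (0b11 << 7, 1)    # 1.1 x 2^1
print(to_float(fp_add(nine, two)))
print(to_float(fp_add(five, three)))
print(to_float(fp_add(five, three, subtract=True)))
```

The right shift that aligns the smaller operand silently drops its low-order bits, which is where rounding errors come from.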

100
Decimal floating point multiplication
• Exponent of product is the sum of the exponents
of multiplicand and multiplier
• Coefficient of product is the product of the
coefficients of multiplicand and multiplier
• Compute sign using usual rules of arithmetic
• May have to renormalize the product

101
Decimal floating point multiplication
• $6 \times 10^3 \times 2.5 \times 10^2 = ?$
• Exponent of product is 3 + 2 = 5
• Multiply the coefficients: 6 × 2.5 = 15
• Result will be positive
• Normalize the result: $15 \times 10^5 = 1.5 \times 10^6$

102
Binary floating point multiplication
• Exponent of product is the sum of the exponents
of multiplicand and multiplier
• Coefficient of product is the product of the
coefficients of multiplicand and multiplier
• Compute sign using usual rules of arithmetic
• May have to renormalize the product

103
Binary floating point multiplication
• Say 110 × 11 or $1.1 \times 2^2 \times 1.1 \times 2^1$
• Exponent of product is 2 + 1 = 3
• Multiply the coefficients: 1.1 × 1.1 = 10.01
• Result will be positive
• Normalize the result: $10.01 \times 2^3 = 1.001 \times 2^4$
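
The multiplication rules can be sketched on a toy format as well. The representation is an assumption made for illustration: a number is a triple (sign, significand, exponent), the significand is an integer scaled by `2**FRAC_BITS`, and both inputs are expected to be normalized.

```python
FRAC_BITS = 8             # hypothetical number of fraction bits kept
ONE = 1 << FRAC_BITS      # scaled significand representing 1.0


def fp_mul(a, b):
    """Multiply two normalized toy floating point numbers."""
    (sign_a, sig_a, exp_a), (sign_b, sig_b, exp_b) = a, b
    sign = sign_a ^ sign_b                   # usual rule of signs
    exp = exp_a + exp_b                      # exponents add
    # The raw product carries 2 * FRAC_BITS fraction bits; drop the extra ones.
    sig = (sig_a * sig_b) >> FRAC_BITS
    if sig >= 2 * ONE:                       # product in [2, 4): renormalize
        sig >>= 1
        exp += 1
    return sign, sig, exp


def to_float(number):
    sign, sig, exp = number
    value = sig / ONE * 2.0 ** exp
    return -value if sign else value


six = (0, 0b11 << 7, 2)     # 1.1 x 2^2
three = (0, 0b11 << 7, 1)   # 1.1 x 2^1
print(fp_mul(six, three), to_float(fp_mul(six, three)))
```

Since both significands lie in [1, 2), their product lies in [1, 4), so at most one renormalization shift is ever needed.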

104
FP division
• Very tricky
• One good solution is to multiply the dividend by
the inverse of the divisor

105
A trap
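• Addition is not necessarily associative
• $9 \times 10^{37} - 9 \times 10^{37} + 4 \times 10^{-37}$
• Observe that
• $(9 \times 10^{37} - 9 \times 10^{37}) + 4 \times 10^{-37} = 4 \times 10^{-37}$
• while
• $9 \times 10^{37} - (9 \times 10^{37} - 4 \times 10^{-37}) = 0$
• due to the limited accuracy of FP numbers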
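
The trap is easy to reproduce. Python floats are double precision rather than single precision, but the gap between the two magnitudes is so large that the same effect appears; the variable names are chosen for this example.

```python
big = 9e37
tiny = 4e-37

left = (big - big) + tiny    # the big values cancel first, tiny survives
right = big - (big - tiny)   # tiny is absorbed by big before the subtraction

print(left)
print(right)
```

The order in which a long sum is evaluated can therefore change its result, which matters when a compiler or a parallel program reorders floating point operations.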

106
IMPLEMENTATIONS
107
The floating-point unit (I)
• Floating-point instructions were an optional
feature
• User had to buy a separate floating-point unit
aka floating point coprocessor
• Before the Intel 80486, all Intel x86 architectures offered the option to install a separate floating-point chip (8087, 80287, 80387)

108
The floating-point unit (II)
• Default solution was to simulate the missing
floating-point instructions through assembly
routines
• As a result, many processor architectures use
separate banks of registers for integer
arithmetic and floating point arithmetic

109
The floating-point unit (III)
• Some older architectures implemented
• Single-precision operations in hardware through
the FPU
• Double-precision operations by software
• Made double-precision operations much costlier than single-precision operations.

110
IBM 360 FP INSTRUCTIONS
111
Overview
• FPU offers a very familiar user interface
• Eight general purpose FP registers
• Distinct from the integer registers
• Two-operand instructions in both RR and RX
formats
• Includes single-precision and double-precision versions of addition, subtraction, multiplication and division

112
Examples of RR instructions
• AFR f1, f2: add contents of floating-point register f2 into f1
• ADR f1, f2: add contents of double-precision register f2 into f1
• LFR f1, f2: load contents of floating-point register f2 into f1
complement instructions for floating-point and
double-precision operands

113
Examples of RX instructions
• AF r1, d(r2): add contents of word at address d + contents(r2) into register r1

114
MIPS FP INSTRUCTIONS
115
Overview
• Thirty-two specialized single-precision registers: f0, f1, …, f31
• Each pair of single-precision registers forms a
double-precision register
• .s instructions apply to single precision format
• .d instructions apply to double precision
format
• Most instructions are in the R format

116
R-format instructions (I)
• add.s f1, f2, f3: f1 = f2 + f3 (single precision)
• add.d f2, f4, f6: (f2, f3) = (f4, f5) + (f6, f7) (double precision applies to register pairs)
• sub.s f1, f2, f3: f1 = f2 - f3 (single precision)
• sub.d f2, f4, f6 (double precision)
• mul.s f1, f2, f3: f1 = f2 × f3 (single precision)
• mul.d f2, f4, f6 (double precision)

117
R-format instructions (II)
• div.s f1, f2, f3: f1 = f2 / f3 (single precision)
• div.d f2, f4, f6 (double precision)
• c.x.s f1, f2: FP condition = (f1 x f2) ? 1 : 0, where x can be equal, not equal, less than, less than or equal, greater than, greater than or equal
• c.x.d f2, f4 (double precision)

118
I-format instructions (I)
4a to the current value of the PC if the FP
condition is true
4a to the current value of the PC if the FP
condition is false

119
I-format instructions (II)
• lwc1 f1, a(r1): load floating-point word at address a + contents(r1) into f1
• ldc1 f2, a(r1) (double precision)
• swc1 f1, a(r1): store floating-point value in f1 into word at address a + contents(r1)
• sdc1 f2, a(r1) (double precision)

The "c" in the opcodes stands for coprocessor!
120
x86 FP INSTRUCTIONS
121
Overview
• Original x86 FP coprocessor had a stack
architecture
• Stack registers were 80-bit wide as well as all
internal registers
• Better accuracy
• Provided single and double precision operations

122
Stack operations (I)
• Three types of operations
• Loads store an operand on the top of the stack
• Arithmetic and comparison operations find two operands on the top of the stack and replace them by the result of the operation
• Stores move the top of stack register into memory

123
Example
• a = b + c
• Load b on top of stack
• Load c on top of stack
• Add c to b
• Store result into a

(Figure: successive contents of the stack during the example.)
124
Stack operations (II)
• Instruction set also allowed
• Operations on top of stack register and the ith
register below
• Immediate operands
• Operations on top of stack register and a memory
location
• Poor performance of FP unit architecture
motivated an extension to the x86 instruction set

125
Intel SSE2 FP Architecture (I)
• SSE2 Extension (2001) provided 8 floating point
registers
• Could hold either single precision or double
precision values
• Number extended to 16 by AMD, followed by Intel

126
Intel SSE2 FP Architecture (II)
• Registers are now 128-bit wide
• Can hold
• One quad precision value
• Two double precision values
• Four single precision values
• Can perform same operation in parallel on all
single/double precision values stored in the
same register

Wow!
127
REVIEW QUESTIONS
128
Review questions
• How would you represent 0.5 in double precision?
• How would you convert this double-precision value
into a single precision format?
• When doing accounting, we could do all the
computations in cents using integer arithmetic.
What would we win? What would we lose?

129
Solutions
• How would you represent 0.5 in double precision?
• Normalized representation: $1.0 \times 2^{-1}$
• Sign: 0
• Biased exponent: 1023 - 1 = 1022
• Coefficient: All zeroes
• Because the 1 is implicit

130
Solutions
• How would you convert this double-precision value
into a single precision format?
• Same normalized representation: $1.0 \times 2^{-1}$
• Same sign: 0
• New biased exponent: 127 - 1 = 126
• Same coefficient: All zeroes
• Because the 1 is implicit

131
Solutions
• When doing accounting, we could do all the
computations in cents using integer arithmetic.
What would we win? What would we lose?
• Big plus
• The results would be exact
• Big minus
• Could not handle numbers bigger than 20,000,000
in 32-bit signed arithmetic

132
Why 20,000,000?
• 32-bit unsigned arithmetic can represent numbers from 0 to $2^{32} - 1$
• 32-bit signed arithmetic can represent numbers from $-2^{31}$ to $2^{31} - 1$
• Roughly from -2,000,000,000 to 2,000,000,000
• Must divide by 100 as we were using cents!
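
The bound can be computed exactly. A small sketch, with names chosen for this example:

```python
INT32_MAX = 2**31 - 1            # largest 32-bit signed integer
INT32_MIN = -(2**31)             # smallest 32-bit signed integer

max_whole_dollars = INT32_MAX // 100   # amounts are stored in cents
print(INT32_MAX)
print(max_whole_dollars)


def fits_in_cents(dollars):
    """True if a dollar amount, converted to cents, fits in 32-bit signed arithmetic."""
    return INT32_MIN <= dollars * 100 <= INT32_MAX


print(fits_in_cents(20_000_000), fits_in_cents(25_000_000))
```

So the exact limit is a little over 21 million, which the slide rounds down to 20,000,000.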

133
TRANSITION SLIDE
• Here end the materials that were on the first
fall 2012 midterm