Title: COMP201 Computer Systems Floating Point Numbers
1. COMP201 Computer Systems: Floating Point Numbers
2. Floating Point Numbers
- The representations considered so far have a limited range, dependent on the number of bits:
  - 16 bits: 0 to 65535 unsigned; -32768 to 32767 for 2's complement
  - 32 bits: 0 to 4294967295 unsigned; -2147483648 to 2147483647 for 2's complement
- How do we represent the following numbers?
  - Mass of an electron: 0.000000000000000000000000000000910956 kg
  - Mass of the earth: 5975000000000000000000000 kg
- Both these numbers have very few significant digits but are well beyond the range above.
3. Floating Point Numbers
- We normally use scientific notation:
  - Mass of electron: 9.10956 x 10^-31 kg
  - Mass of earth: 5.975 x 10^24 kg
- These numbers can be split into two components:
  - Mantissa
  - Exponent
- e.g. 9.10956 x 10^-31 kg: mantissa 9.10956, exponent -31
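The mantissa/exponent split above can be sketched in Python, using the language's own scientific-notation formatting (`split_sci` is a hypothetical helper, not part of the course material):

```python
# Split a number into its mantissa and exponent via scientific notation.
def split_sci(x: float) -> tuple[float, int]:
    mantissa, exponent = f"{x:e}".split("e")  # e.g. "9.109560e-31"
    return float(mantissa), int(exponent)

split_sci(9.10956e-31)  # (9.10956, -31)
split_sci(5.975e24)     # (5.975, 24)
```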
4. Floating Point Numbers
- We also need to be able to represent negative numbers.
- The same approach can be taken with binary numbers, i.e. a x r^e, where a is the mantissa, r is the base (radix) and e is the exponent.
- Notice that we are trading off precision, and possibly making computation times longer, in exchange for being able to represent a larger range of numbers.
5. Defining a floating point number
- Several things must be defined in order to define a floating point number. For example, with 9.10956 x 10^-31:
  - Size of mantissa: 9.10956
  - Sign of mantissa: positive
  - Size of exponent: 31
  - Sign of exponent: negative
  - Number base in use: 10
6. Floating Point Numbers
- Consider the following format, using eight digits and radix 10:
  - the high-order digit is the sign digit (0 for +, 5 for -)
  - followed by the exponent (2 digits)
  - then the mantissa (5 digits), from most significant digit (MSD) to least significant digit (LSD)
7. Floating Point Numbers
- But this has serious limitations!
  - The range of numbers which can be represented is only 10^0 to 10^99, since only two digits are devoted to the exponent.
  - Precision is only 5 digits.
  - There is no provision for negative exponents.
8. Floating Point Numbers (Continued)
- One solution for the limitations of range and negative exponents is to decrease the positive range to 49 by applying an offset of 50 (excess-50 notation). The range of numbers which can be expressed then becomes 0.00001 x 10^-50 to 0.99999 x 10^49.
- Implementations often require that the most significant digit not be zero, which restricts the most negative value to 0.10000 x 10^-50; the all-zeros pattern represents 0.
9. From the text
[Figure from the textbook: excess-50 notation and the range of represented numbers]
10. Examples (from textbook)
- Problem: Convert 246.8035 to the standard format.
  - Write as 0.2468035 x 10^3
  - Truncate at 5 digits: 0.24680 x 10^3
  - Final number: 05324680
- Problem: Convert -0.00000075 to FP format.
  - Write as -0.75 x 10^-6
  - Zero-fill to 5 digits: -0.75000 x 10^-6
  - Final number: 54475000
- NOTE: Plus is represented by 0, minus by 5, in this format.
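The two textbook conversions can be reproduced with a short sketch (`encode_excess50` is a hypothetical helper, not from the textbook, assuming the layout described earlier: sign digit, 2-digit excess-50 exponent, 5-digit mantissa):

```python
from decimal import Decimal, ROUND_DOWN

def encode_excess50(num: str) -> str:
    """Encode a decimal number in the 8-digit format from the slides:
    sign digit (0 = plus, 5 = minus), 2-digit excess-50 exponent,
    then the 5 truncated fraction digits of the mantissa 0.ddddd."""
    d = Decimal(num)
    if d == 0:
        return "00000000"  # the all-zeros pattern represents 0
    sign = "0" if d > 0 else "5"
    d = abs(d)
    exponent = 0
    while d >= 1:                 # normalize into the range 0.1 <= d < 1
        d /= 10
        exponent += 1
    while d < Decimal("0.1"):
        d *= 10
        exponent -= 1
    fraction = d.quantize(Decimal("0.00001"), rounding=ROUND_DOWN)
    return sign + f"{exponent + 50:02d}" + str(fraction)[2:]

encode_excess50("246.8035")     # '05324680'
encode_excess50("-0.00000075")  # '54475000'
```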
11. Floating point in the computer
- Within the computer we work in base 2, using a similar format: a sign bit, an exponent field, and a mantissa field.
- NOTE: Bit order depends upon the particular computer.
12. Numbers represented
- Use 32 bits to represent a number, with 1 sign bit, 8 bits for the exponent, and the remaining 23 bits for the mantissa.
- In order to represent negative exponents, use excess-128 notation.
- The range of numbers represented is approximately 10^-38 to 10^38 (in decimal terms).
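The excess-128 exponent encoding can be sketched as follows (hypothetical helpers, assuming the 8-bit exponent field above):

```python
# Excess-128: store the actual exponent plus an offset of 128, so that
# exponents from -128 to +127 fit in an unsigned 8-bit field.
def to_excess128(exponent: int) -> int:
    assert -128 <= exponent <= 127
    return exponent + 128

def from_excess128(stored: int) -> int:
    assert 0 <= stored <= 255
    return stored - 128

to_excess128(-38)   # stored as 90
from_excess128(90)  # -38
```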
13. But this is not the end!
- The precision of the mantissa can be improved from 23 bits to 24 bits by noticing that the MSB of the mantissa in a normalized binary number is always 1.
- And so it can be implied, rather than expressed directly. The added complications (see below) are felt to be a good tradeoff for the added precision.
- The complication is that certain numbers are too small to be normalized, and the number 0.0 cannot be represented at all!
- The solution is to use certain codes for these special cases.
14. Floating point numbers: the IEEE standard
- IEEE Standard 754
- Most (but not all) computer manufacturers use the IEEE 754 format.
- Number represented: (-1)^S x (1.M) x 2^(E - Bias)
- Two main formats: single and double.
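The formula can be evaluated directly for the normalized single-precision case, where the bias is 127 (`ieee754_value` is a hypothetical name for this sketch):

```python
# (-1)^S * 1.M * 2^(E - 127) for a normalized single-precision number,
# where the 23 mantissa bits are given as a bit string.
def ieee754_value(sign: int, exponent: int, mantissa_bits: str) -> float:
    mantissa = 1 + int(mantissa_bits, 2) / 2 ** len(mantissa_bits)  # 1.M
    return (-1) ** sign * mantissa * 2.0 ** (exponent - 127)

ieee754_value(0, 127, "0" * 23)  # 1.0
ieee754_value(1, 128, "0" * 23)  # -2.0
```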
15. And there are special cases to be considered
  Exponent   Mantissa   Represents
  0          +/- 0      zero
  0          not 0      denormalized: +/- 2^-126 x 0.M
  1-254      any        normalized: +/- 2^(E-127) x 1.M
  255        +/- 0      +/- infinity
  255        not 0      special condition
16. Special cases (continued)
- The thing to remember about the special cases is that they are SPECIAL, that is, not expected to occur.
- They cover numbers outside the expected range by using some unlikely-to-be-used codes:
  - a special code for 0.0
  - a special code for numbers too small to be normalized
  - a special code for infinity
- What you are expected to remember is that these special codes exist, and that they limit the range of numbers that can be represented by the standard.
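The special bit patterns can be inspected with Python's struct module (a sketch; `bits_of` is a hypothetical helper that splits a single-precision pattern into its three fields):

```python
import struct

# Show the sign / exponent / mantissa fields of a single-precision float.
def bits_of(x: float) -> str:
    b = f"{int.from_bytes(struct.pack('>f', x), 'big'):032b}"
    return f"{b[0]} {b[1:9]} {b[9:]}"

bits_of(0.0)           # exponent and mantissa all zeros
bits_of(float("inf"))  # exponent all ones, mantissa zero
bits_of(1e-45)         # denormalized: exponent zero, mantissa nonzero
```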
17. Final ranges represented
- Single precision: 2^-126 to 2^127, approximately 10^-38 to 10^38.
- Similarly, double precision uses sixty-four bits and represents a range of approximately 10^-300 to 10^300.
18. Floating Point Numbers
- IEEE Standard 754
- Example: What decimal number does the following IEEE floating point number represent?
  0 10000001 10110000000000000000000
  - Sign: 0 (positive)
  - Exponent: 2 (0x81 = 129; 129 - 127 = 2)
  - Mantissa: 1.10110000000000000000000
  - Final answer: 110.11 in binary, or 6.75 in decimal
19. More
- What is the IEEE f.p. number 01000001111111100000000000000000 converted to decimal?
- Begin by dividing up the number into the fields of sign, exponent and mantissa (23 bits):
  0 10000011 11111100000000000000000
- Then convert the exponent (10000011 = 131) and subtract the bias (127) to get the shift (4).
- Add in the implied one to the mantissa: 1.11111100000000000000000
- And shift 4 places to get the final number, 11111.11 in binary, or 31.75 in decimal.
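The decoding steps above can be checked against the machine's own single-precision interpretation, using Python's struct module (`decode_ieee32` is a hypothetical name for this sketch):

```python
import struct

# Interpret a 32-bit pattern, given as a bit string, as an IEEE single.
def decode_ieee32(bits: str) -> float:
    assert len(bits) == 32
    raw = int(bits, 2).to_bytes(4, byteorder="big")
    return struct.unpack(">f", raw)[0]

decode_ieee32("01000001111111100000000000000000")  # 31.75
```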
20. And more
- What is the decimal number 32.125 converted to IEEE fp format?
- First, convert the number to binary: 100000.001
- Then normalize to 1.xxxx format: 1.00000001, shift 5
- Add the bias (127) to the shift and convert the total (132) to binary: 10000100 (0x84)
- Assemble the final number: 0 10000100 00000001000000000000000
21. Floating Point Application
- Floating point arithmetic is very important in certain types of application, primarily scientific computation.
- Many applications use no floating point arithmetic at all:
  - communications
  - operating systems
  - most graphics
22. Floating point implementation
- Floating point arithmetic may be implemented in software or hardware.
- It is much more complex than integer arithmetic.
- Hardware implementations require more gates than the equivalent integer instructions, and are often slower.
- Software implementations are generally very slow.
- CPU floating point performance can be very different from integer performance.