Title: Number Representation: Fixed and Floating Point
1 Number Representation: Fixed and Floating Point
- No Method is Capable of Representing ALL Real Numbers Using Finite Register Lengths
- Must Use Approximations to Represent Values
- Concentrate on Two Forms
- Fixed Point
- Floating Point
- Others Include
- Rational Number Systems: Use Ratios of Integers
- Logarithmic Number Systems: Use Signs and Logarithms of Values
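As a small illustration of the approximation point above, the following Python sketch compares the double nearest to 0.1 with the exact ratio 1/10. It assumes Python floats are binary doubles (true on common platforms) and uses the standard-library `fractions.Fraction` as a stand-in for a rational number system; the variable names are illustrative only.

```python
from fractions import Fraction

# 0.1 has no finite binary expansion, so the stored double is only close to 1/10.
stored = Fraction(0.1)       # exact rational value of the double nearest to 0.1
exact = Fraction(1, 10)      # a ratio of integers represents 1/10 exactly

approximation_error = stored - exact   # nonzero: the float is an approximation

print(stored == exact)
print(float(approximation_error))
```

The ratio form is exact here, but its numerator and denominator can grow without bound under repeated arithmetic, which is one reason fixed register lengths favor fixed and floating point.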
2 Fixed Versus Floating Point
- Fixed Point Values Represent Values where Any Two Consecutive Values Differ by 1 Unit in the Last Place (ulp)
- Equal Spacing Between Numbers
- Floating Point Values Use Two Multi-Bit Words
- Mantissa
- Exponent
- Both Forms Must be Capable of Representing Signed Quantities
- Fixed Point Values CAN be Used to Represent Fractional Quantities
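A minimal sketch of a signed fixed point value with a fractional part, assuming a hypothetical format with 4 fractional bits (so 1 ulp = 1/16). The names `FRAC_BITS`, `to_fixed`, and `from_fixed` are invented for this illustration.

```python
FRAC_BITS = 4                 # assumed format: 4 bits to the right of the binary point
ULP = 2 ** -FRAC_BITS         # spacing between neighbouring values: 0.0625

def to_fixed(x: float) -> int:
    """Encode x as a signed integer count of ulps (round to nearest)."""
    return round(x / ULP)

def from_fixed(q: int) -> float:
    """Decode a stored integer back to the value it represents."""
    return q * ULP

q = to_fixed(-2.7)            # stored integer for the nearest representable value
value = from_fixed(q)         # the approximation actually represented

# Consecutive stored integers are always exactly one ulp apart.
neighbours = [from_fixed(q + i) for i in range(3)]
gaps = [b - a for a, b in zip(neighbours, neighbours[1:])]
print(q, value, gaps)
```

The gap list is constant regardless of the magnitude of `q`, which is the equal-spacing property of fixed point.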
3 Floating Point Characteristics
- Total Number of Representations is Bounded by the Total Number of Bit Strings
- For an n-bit Register we have 2^n
- Range of Values is Larger than Fixed Point (for the Same Number of Bits)
- Precision of Values is Smaller
- Distance Between Two Consecutive Values Increases with Magnitude
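The growing distance between consecutive values can be observed directly. This sketch assumes Python 3.9 or later (for `math.ulp`) and that Python floats are 64-bit binary doubles.

```python
import math

# math.ulp(x) returns the gap between x and the next larger representable double.
gap_at_1 = math.ulp(1.0)
gap_at_1024 = math.ulp(1024.0)
gap_at_2_52 = math.ulp(2.0 ** 52)

for x in (1.0, 1024.0, 2.0 ** 52):
    print(x, math.ulp(x))
```

Each doubling of the magnitude doubles the gap, so the relative spacing stays roughly constant while the absolute spacing grows.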
4 Floating Point
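Field Layout (MSb to LSb): s | e | m
- s: Sign Bit (signed magnitude)
- e: Exponent (2's complement value stored with a bias added)
- m: Mantissa (significand)
- m is in [0, 1), so m_MAX = 1 - ulp
- Hidden Bit: Leading 1 of a Normalized Significand is Implied, Not Stored
- float: BIAS = 127 (32 bits: 1 for s, 8 for e, 23 for m)
- double: BIAS = 1023 (64 bits: 1 for s, 11 for e, 52 for m)
- With a Bias of 2^(k-1) for a k-bit Exponent, the Sign of the Exponent is the Complement of the Stored MSb
- Thus, Adding/Subtracting Such a Bias is Just Complementation of the MSb
- The float and double Biases (127, 1023) are 2^(k-1) - 1, so the Stored Value is One Less than the MSb-Complemented Exponent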
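The field layout and bias can be checked on a real 32-bit float. This is a sketch using the standard-library `struct` module; the helper name `fields32` is invented for the illustration.

```python
import struct

def fields32(x: float):
    """Split the 32-bit float encoding of x into (s, stored e, m bits)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31                 # 1 sign bit
    e_stored = (bits >> 23) & 0xFF # 8 exponent bits, bias included
    m_bits = bits & 0x7FFFFF       # 23 mantissa bits, hidden bit not stored
    return s, e_stored, m_bits

# 6.0 = 1.5 x 2^2, so the true exponent is 2 and the fraction is 0.5
s, e_stored, m_bits = fields32(6.0)
true_exponent = e_stored - 127     # subtract the float bias
fraction = m_bits / 2 ** 23        # m as a value in [0, 1)
print(s, e_stored, true_exponent, fraction)
```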
5 Floating Point Example
double: 00000000 bfe80000 (Little Endian Word Order: MSW has Higher Address)
s | e | m
1 | 011 1111 1110 | 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
s = 1, e = 1022, m = 0.5
Value = (-1)^1 x 1.5 x 2^(1022-1023)
Value = -(1.5)(0.5) = -0.75
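The decoding in this example can be reproduced step by step. The sketch below assumes the two words are combined with `bfe80000` as the most significant word, and cross-checks the hand computation against the standard-library `struct` module.

```python
import struct

msw, lsw = 0xBFE80000, 0x00000000
bits = (msw << 32) | lsw                    # full 64-bit pattern

s = bits >> 63                              # sign bit
e = (bits >> 52) & 0x7FF                    # 11-bit stored exponent
m = (bits & ((1 << 52) - 1)) / 2 ** 52      # 52-bit mantissa as a fraction

# Hidden bit contributes the leading 1; the bias for double is 1023.
value = (-1) ** s * (1 + m) * 2.0 ** (e - 1023)

# Cross-check by letting the platform interpret the same 8 bytes.
check = struct.unpack(">d", bits.to_bytes(8, "big"))[0]
print(s, e, m, value, check)
```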
6 Floating Point Normalization
- Redundant Representations are Possible!
- Hidden Bit Helps
- Out of All Possible Representations, Choose the One With the Fewest Leading Zeros in the Significand
- This is Normalization
- After Performing Arithmetic, the Result Must be Normalized
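A sketch of normalization on (significand, exponent) pairs, assuming the convention that a normalized significand satisfies 1 <= |sig| < 2 (the form implied by a hidden bit). The function `normalize` is a hypothetical helper written for this illustration, not a library routine.

```python
def normalize(sig: float, exp: int):
    """Return (sig, exp) with 1 <= |sig| < 2 representing the same value."""
    if sig == 0:
        return 0.0, 0              # zero has no normalized form; use a fixed pair
    while abs(sig) >= 2:           # significand too large: shift right
        sig /= 2
        exp += 1
    while abs(sig) < 1:            # leading zeros: shift left
        sig *= 2
        exp -= 1
    return sig, exp

# Four redundant representations of the same value, 3.0
redundant = [(0.375, 3), (0.75, 2), (1.5, 1), (6.0, -1)]
normalized = {normalize(s, e) for s, e in redundant}

# Adding two normalized operands can leave an unnormalized significand.
raw_sum = (1.5 + 1.5, 1)           # 3.0 x 2^1, significand out of range
fixed_sum = normalize(*raw_sum)
print(normalized, fixed_sum)
```

All four inputs collapse to a single pair, which is why normalization gives each value a unique encoding.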
7 Floating Point Special Numbers
8 Denormalized Numbers
- Allow for Gradual Degradation of Precision on Underflow (Gradual Underflow)
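Gradual underflow can be observed by repeatedly halving the smallest normalized double. This sketch assumes Python floats are 64-bit binary doubles on a platform that supports denormalized values (the common case).

```python
import math
import struct
import sys

smallest_normal = sys.float_info.min          # 2^-1022, smallest normalized double
smallest_denormal = math.ldexp(1.0, -1074)    # 2^-1074, smallest denormalized double

# Halve until the next halving would reach zero, counting the steps taken.
x = smallest_normal
steps = 0
while x / 2 > 0:
    x /= 2
    steps += 1

# A denormalized value has a stored exponent field of all zeros (no hidden bit).
bits = struct.unpack(">Q", struct.pack(">d", smallest_denormal))[0]
stored_exponent = (bits >> 52) & 0x7FF
print(steps, x, stored_exponent)
```

Each halving below the normalized range gives up one bit of significand precision instead of dropping straight to zero.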