Number Representation: Fixed and Floating Point

- No Method Capable of Representing ALL Real Numbers Using Finite Register Lengths
- Must Use Approximations to Represent Values
- Concentrate on Two Forms
  - Fixed Point
  - Floating Point
- Others Include
  - Rational Number Systems: Use Ratios of Integers
  - Logarithmic Number Systems: Use Signs and Logarithms of Values

Fixed Versus Floating Point

- Fixed Point Values: Any Two Consecutive Values Differ by 1 Unit in the Last Place (ulp)
- Equal Spacing Between Numbers
- Floating Point Values Use Two Multi-Bit Words
  - Mantissa
  - Exponent
- Both Forms Must be Capable of Representing Signed Quantities
- Fixed Point Values CAN be Used to Represent Fractional Quantities
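
The equal spacing of fixed point values, and their use for fractional quantities, can be sketched as follows. This is a minimal illustration assuming a hypothetical signed format with 4 fraction bits; the names `FRAC_BITS`, `to_fixed`, and `from_fixed` are illustrative and not from the slides.

```python
FRAC_BITS = 4                 # assumed number of fraction bits (illustrative)
ULP = 2 ** -FRAC_BITS         # spacing between adjacent fixed point values

def to_fixed(x: float) -> int:
    """Quantize x to the nearest multiple of ULP, stored as a signed integer."""
    return round(x / ULP)

def from_fixed(q: int) -> float:
    """Recover the real value represented by the stored integer q."""
    return q * ULP

q = to_fixed(-2.75)                        # -2.75 / 0.0625 = -44
print(q, from_fixed(q))                    # -44 -2.75
print(from_fixed(q + 1) - from_fixed(q))   # 0.0625, the same spacing everywhere
```

The stored word is an ordinary signed integer; only the implied position of the binary point turns it into a fraction.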

Floating Point Characteristics

- Total Number of Representations = Total Number of Bit Strings
- For an n-bit Register we have $2^n$
- Range of Values is Larger than Fixed Point
- Precision of Value is Smaller
- Distance Between Two Consecutive Values Increases with Magnitude
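
The growth in spacing can be observed directly. The sketch below assumes Python 3.9 or later (for `math.ulp`) and that Python floats are 64-bit doubles, which holds on common platforms.

```python
import math

# An n-bit register can hold at most 2**n distinct bit strings.
n = 64
print(2 ** n)                      # 18446744073709551616

# Distance from x to the next larger double grows with the magnitude of x.
for x in (1.0, 1024.0, 2.0 ** 52):
    print(x, math.ulp(x))

# Above 2**53 the spacing exceeds 1, so adding 1 can be lost entirely.
big = 2.0 ** 53
print(big + 1.0 == big)            # True
```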

Floating Point

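Field Layout (most significant bit first): s | e | m

- s: Sign Bit (signed magnitude)
- e: Exponent (a signed, 2's complement value; stored with BIAS added)
- m: Mantissa (significand), $m \in [0, 1)$, $m_{MAX} = 1 - \text{ulp}$
- Hidden Bit: the leading 1 of the significand $1.m$ is implied, not stored
- Value $= (-1)^s \times 1.m \times 2^{e - \text{BIAS}}$, where e is the stored (biased) field
- float: BIAS = 127 (32 bits: 23 for m and 8 for e)
- double: BIAS = 1023 (64 bits: 52 for m and 11 for e)
- Sign of Exponent is Complement of its MSb
- Thus, Adding/Subtracting the Bias is Just Complementation of the MSb (exact for a bias of $2^{k-1}$ with a k-bit exponent; with BIAS = $2^{k-1} - 1$, as for 127 and 1023, the result is additionally offset by 1)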
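
The field widths and biases listed for float and double can be checked with a short sketch. The names `FORMATS`, `fields`, and `value` are hypothetical helpers, not part of the slides, and `value` handles only normalized numbers (stored exponent neither all zeros nor all ones).

```python
import struct

# kind -> (float pack code, integer pack code, exponent bits, mantissa bits, bias)
FORMATS = {
    "float":  (">f", ">I", 8, 23, 127),
    "double": (">d", ">Q", 11, 52, 1023),
}

def fields(x, kind="double"):
    """Return (s, e, m): sign bit, stored exponent, and raw mantissa bits of x."""
    fpack, ipack, e_bits, m_bits, _ = FORMATS[kind]
    bits = struct.unpack(ipack, struct.pack(fpack, x))[0]
    s = bits >> (e_bits + m_bits)
    e = (bits >> m_bits) & ((1 << e_bits) - 1)
    m = bits & ((1 << m_bits) - 1)
    return s, e, m

def value(s, e, m, kind="double"):
    """Rebuild a normalized value as (-1)^s * 1.m * 2^(e - BIAS)."""
    _, _, _, m_bits, bias = FORMATS[kind]
    return (-1) ** s * (1 + m / 2 ** m_bits) * 2.0 ** (e - bias)

print(fields(1.0, "float"))     # (0, 127, 0)
print(fields(1.0, "double"))    # (0, 1023, 0)
print(value(*fields(6.5)))      # 6.5
```

A stored exponent equal to the bias corresponds to a true exponent of 0, which is why 1.0 shows 127 and 1023 in the two formats.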

Floating Point Example

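double: 00000000 bfe80000 (little-endian word order: the MSW, bfe80000, has the higher address)

Bits of the 64-bit value, MSW first:

- s (1 bit): 1
- e (11 bits): 011 1111 1110
- m (52 bits): 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

- s = 1
- e = 1022
- m = 0.5
- Value $= (-1)^1 \times 1.5 \times 2^{(1022 - 1023)} = -(1.5)(0.5) = -0.75$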
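
The worked example can be reproduced mechanically. This sketch assumes the first word listed sits at the lower address, so that the most significant word bfe80000 is at the higher address; Python's `struct` module denotes that layout with the `<` prefix.

```python
import struct

# The two 32-bit words in address order (lowest address first).
words = (0x00000000, 0xBFE80000)

# Lay the words out in memory, then reinterpret the same 8 bytes as one double.
raw = struct.pack("<II", *words)
x = struct.unpack("<d", raw)[0]
print(x)                                  # -0.75

# Extract the fields by hand from the 64-bit pattern.
bits = (words[1] << 32) | words[0]
s = bits >> 63                            # 1
e = (bits >> 52) & 0x7FF                  # 1022
m = (bits & ((1 << 52) - 1)) / 2 ** 52    # 0.5
print(s, e, m)                            # 1 1022 0.5
print((-1) ** s * (1 + m) * 2.0 ** (e - 1023))   # -0.75
```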

Floating Point Normalization

- Redundant Representations are Possible!
- Hidden Bit Helps
- Out of All Possible Representations, Choose the One With the Fewest Leading Zeros in the Significand
- This is Normalization
- After Performing Arithmetic, Normalization Must be Accomplished
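
A minimal sketch of normalization, assuming a significand held as a Python float and a target range of 1 <= |sig| < 2 (the hidden-bit form 1.m). The function name `normalize` is illustrative; real hardware shifts the significand bits and adjusts the exponent instead of multiplying and dividing.

```python
def normalize(sig: float, exp: int):
    """Return an equivalent (sig, exp) pair with 1 <= |sig| < 2."""
    if sig == 0:
        raise ValueError("zero has no normalized form")
    while abs(sig) >= 2:      # too large: shift right, raise exponent
        sig /= 2
        exp += 1
    while abs(sig) < 1:       # leading zeros: shift left, lower exponent
        sig *= 2
        exp -= 1
    return sig, exp

# Three redundant representations of the value 3.0 collapse to one.
print(normalize(0.375, 3))    # (1.5, 1)
print(normalize(0.75, 2))     # (1.5, 1)
print(normalize(3.0, 0))      # (1.5, 1)

# After arithmetic: 1.5 * 2^0 + 1.5 * 2^0 gives significand 3.0, which is out of range.
print(normalize(1.5 + 1.5, 0))  # (1.5, 1)
```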

Floating Point Special Numbers

Denormalized Numbers

- Allow for Gradual Degradation on Underflow (gradual underflow)
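
Gradual underflow can be observed by repeatedly halving the smallest normalized double. The sketch assumes Python floats are 64-bit doubles with denormalized numbers enabled, which is the usual default.

```python
import math
import sys

smallest_normal = sys.float_info.min          # 2**-1022
smallest_denormal = math.ldexp(1.0, -1074)    # 2**-1074

# Halve until the next halving would produce zero.
x = smallest_normal
steps = 0
while x / 2 > 0.0:
    x /= 2
    steps += 1

print(steps, x)                    # 52 5e-324
print(x == smallest_denormal)      # True
```

Each halving below the smallest normalized value gives up one bit of precision instead of dropping straight to zero, which is the gradual degradation the slide refers to.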

