Combinational Logic IV FloatingPoint Arithmetic - PowerPoint PPT Presentation

1 / 23
About This Presentation
Title:

Combinational Logic IV FloatingPoint Arithmetic

Description:

Save the leading bit '1' for mantissa with special format for 0.0. IEEE754 double precision: ... Shift smaller mantissa right to match exponents. Add the Two ... – PowerPoint PPT presentation

Number of Views:111
Avg rating:3.0/5.0
Slides: 24
Provided by: KC96
Category:

less

Transcript and Presenter's Notes

Title: Combinational Logic IV FloatingPoint Arithmetic


1
Combinational Logic IVFloating-Point Arithmetic
  • Instructor Koling Chang
  • email kchang_at_cs.ucdavis.edu

2
Outline
  • Fractions
  • Floating-Point Representation
  • Addition
  • Multiplication

3
Fractions
  • Also called Real Number
  • Representing fractions in the decimal system
  • 0.7913
  • in decimal
  • 7x10-1 9x10-2 1x10-3 3x10-4
  • Decimal point is used to identify the fraction
    part.
  • This can be generalized.

4
Binary Fraction
  • 0.11001B
  • 1.2-1 1.2-2 0.2-3 0.2-4 1.2-5 0.78125 D
  • How do we convert between Binary Fractions to
    Decimal Fractions ?

5
Converting Fraction from Binary to Decimal
  • decimal_value 0.0
  • for ( i n downto 1 )
  • decimal_value (decimal_valuedi)/2

6
Converting Fraction from Decimal to Binary
  • For ( i 0 to n )
  • decimal_value 2
  • if ( decimal_value gt1 )
  • bi 1
  • decimal_value - 1
  • else
  • bi 0

7
Example
8
Representing Floating-Point Numbers
  • Naive way- I bits for integer, F bits for
    fraction
  • ?????.?????
  • I F
  • This is the Fixed-Point we talked about.
  • Limited binary combinations result in a fixed
    range of real numbers that can be represented.
  • Solution Make the point position floating

9
Tradeoff Between I and F
  • Textbook says Range and Precision Tradeoff
  • Not quite correct What if there is no integer
    part.
  • Precision is determined by the number of bits.
  • Range is determined by the position of the point.

10
Using Scientific Notation
  • Writing Decimal number using scientific notation
    with significand and exponent
  • 1.2345 x 1045
  • 9.876543 x 10-37
  • Same form can be applied to binary point numbers

11
Binary Notation with Significand and Exponent
  • 1101.101 x 211001 13.625 x 225
  • 4.57179 x 108
  • The Normal Form
  • 1.X1X2X3 XM X 2
  • 1.101101E11100

Yn-1Yn-2Yn-3 Y0
12
Floating Point Representation
  • Using two signed numbers to represent exponent
    and mantissa

1 bit
1 bit
N bits
M bits
Se
Exponent
Sm
Mantissa
13
IEEE Standard
  • Using excess-M for Exponent
  • Save the leading bit 1 for mantissa with
    special format for 0.0
  • IEEE754 double precision

1 bit
11 bit
52 bits
Sm
Exponent
Mantissa
excess-M (1023)
14
IEEE Standard
  • Single Presicion

1 bit
8 bit
23 bits
Sm
Exponent
Mantissa
excess-M (127)
15
Special Values
16
Limits
  • Denormals
  • Smallest normalized 1x2-126
  • Largest denormal 0.1111111x2-127
  • Smallest denormal 0.00000001x2-127
  • 10-45
  • NaNs -1 , 0/0

17
Floating-Point Addition
  • Match Exponents
  • Shift smaller mantissa right to match exponents
  • Add the Two Mantissas
  • Normalize the Result
  • Test for Overflow/Underflow

18
Example
  • 13.25 4.75
  • 13.25 1.10101 x 23
  • 4.75 1.0011 x 22
  • (1.10101 0.10011) x 23
  • 10.01 x 23 1.001 x 24

19
Floating-Point Multiplication
  • Add the two exponents
  • Multiply the two mantissas
  • Compute the result sign bit using XOR
  • Normalize the final product
  • Check for overflow/underflow

20
Example
  • 1.101x23 x 1.01x22
  • 1.101 x 1.01 10.00001
  • 23 x 22 25
  • Answer 10.00001x25 1.000001x26

21
Example
  • Floating Point Adder (IBM 360)

Input bus
E1
E2
M1
M2
Adder 1
Shifter 1
E1-E2
Adder 2
R
zero digit checker
Adder 3
Shifter 2
leading zeros
M3
E3
output bus
22
Co-Processor
  • In old days, floating point were often handled by
    a co-processor.

23
Combinational Logic Summary
  • Common used combinational circuits
  • mux, demux, encoder, decoder, comparator
  • Arithmetic circuits
  • Fixed-point representation
  • Adder,
  • Multiplier, Divider
  • Floating-point representation, adder, multiplier
Write a Comment
User Comments (0)
About PowerShow.com