Systematic IEEE Rounding Method for High-Speed Floating-Point Multipliers, written by N. T. Quach, N. Takagi and M. J. Flynn

1
Systematic IEEE Rounding Method for High-Speed
Floating-Point Multipliers, written by N. T.
Quach, N. Takagi and M. J. Flynn
  • Hyuk Park
  • Anusha Ravi
  • Gahngsoo Moon

2
Overview: Problem
  • Rounding in a floating-point multiplier
  • Makes multiplication slow
  • Why?
  • Needs many additions because many inputs arrive at the LSB
  • Must wait until the overflow bit comes out

3
Overview: Previous Work
  • Integrated Rounding Method (IRM) (Santoro et al.
    '89) for high-speed FP multipliers
  • How?
  • Pre-compute multiple significand values (SVs).
  • Select the appropriate SV for final normalization
    and rounding
  • Advantage
  • Faster than the conventional rounding method
  • Disadvantage
  • Complicated rounding logic
  • Why?
  • Must account for both the pre-round and
    post-round normalization, since the IRM occurs
    before the significand is normalized

4
Overview: Proposed Approach
  • Systematic Integrated Rounding Method
  • 1. Constructing a rounding table
  • Lists all possible significand values (SVs)
  • 2. Developing a prediction scheme
  • Reduces the number of these SVs to a smaller set
  • 3. Performing rounding digits selection (RDS)
  • Reduces the complexity of the rounding logic

5
Overview: Conventional IEEE Rounding vs.
Systematic IRM
  • IEEE rounding using the conventional algorithm
  • 1. Computation of C and S
  • 2. Addition of C and S
  • 3. Pre-round normalization
  • e.g., 10.1010 × 2^5 → 1.01010 × 2^6
  • 4. Computation of rounding bits
  • e.g., guard bit and sticky bit
  • 5. Rounding
  • 6. Post-round normalization
  • e.g., 1.11111 × 2^5, adding a round bit of 1 at the LSB,
    → 10.00000 × 2^5 → 1.00000 × 2^6
  • Systematic IRM
  • 1. Computation of C and S
  • 2. Computation of SVs and rounding bits
  • 3. Final select and align
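
The six conventional steps can be sketched in software. This is a minimal illustration, not the paper's hardware: the function name, the integer encoding of the significand, and the stand-in values for C and S are all assumptions. Step 1 (the multiplier array that produces C and S) is taken as given.

```python
def conventional_round_rne(c, s, exp, p):
    """Round a carry-save product (c, s) to a p-bit significand (p >= 2).

    Assumes c + s is the exact product of two p-bit significands, i.e. an
    integer in [2**(2p-2), 2**(2p)). Returns (significand, exponent).
    """
    r = c + s                           # step 2: carry-propagate addition
    discard = p - 1                     # bits dropped when the product is in [1, 2)
    if r >> (2 * p - 1):                # step 3: product in [2, 4), shift right
        discard += 1
        exp += 1
    kept = r >> discard                 # step 4: rounding bits
    guard = (r >> (discard - 1)) & 1
    sticky = int(r & ((1 << (discard - 1)) - 1) != 0)
    if guard and (sticky or kept & 1):  # step 5: round to nearest even
        kept += 1
    if kept >> p:                       # step 6: rounding overflowed to 10.000...
        kept >>= 1
        exp += 1
    return kept, exp

# The two slide examples with p = 6; the bits below the LSB are made up.
print(conventional_round_rne(0, 0b101010000000, 5, 6))  # (42, 6), i.e. 1.01010 x 2^6
print(conventional_round_rne(0, 0b11111110001, 5, 6))   # (32, 6), i.e. 1.00000 x 2^6
```

Every step waits on the one before it, which is the serial dependency the systematic IRM collapses into three stages.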

6
Overview: Contribution of This Paper
  • Propose a systematic IRM for all the IEEE
    rounding modes
  • Demonstrate that the prediction scheme is unique
    for the Round to Nearest Even mode in the simple
    hardware implementation
  • Propose an efficient improved hardware
    implementation that provides solutions to all the
    IEEE rounding modes.

7
Overview: Notation
  • R = S + C
  • RI = SI + CI
  • RF = SF + CF
  • g: pre-normalized guard bit
  • s: pre-normalized sticky bit

8
RN Even Mode: Constructing a Rounding Table
  • Need to construct an RI candidate table to be used
    in the Final Select and Align stage.
  • Equations
  • NS (no shift)
  • lr = l ⊕ c, gr = m, sr = s
  • RS (right shift)
  • lr = n ⊕ (l · c), gr = l ⊕ c, sr = m ∨ s
  • In the NS case no bit is discarded, so there is only
    a single candidate for each case, depending on RI,
    RF, l, s. For example,
  • c = 1, m = 1, (l, s) = 11
  • After adding the carry-out from RF to the l bit, (l, m, s) =
    011
  • An increment is required and the result is
    equivalent to RI + 2

9
RN Even Mode: Constructing a Rounding Table (Cont'd)
  • But taking advantage of the discarded bit allows more
    than one candidate. For example,
  • c = 1, m = 0, (n, l, s) = 001
  • After adding the carry-out from RF and shifting right, (l, m,
    s) = 011
  • An increment is required and the result is equal to RI +
    2
  • But since l = 0 initially and the l bit will be
    discarded after the right shift, RI + 2 is equivalent
    to RI + 3 in this case.
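
The two worked examples can be checked with a short sketch. It assumes the carry c is added at the l position, so the post-normalization bits are lr = l XOR c, gr = m, sr = s for NS and lr = n XOR (l AND c), gr = l XOR c, sr = m OR s for RS. The function names are hypothetical.

```python
def rounding_bits(n, l, m, s, c, right_shift):
    """Return (lr, gr, sr): LSB, guard and sticky bits after normalization."""
    if not right_shift:                   # NS: nothing is discarded
        return l ^ c, m, s
    return n ^ (l & c), l ^ c, m | s      # RS: l becomes the guard, m joins the sticky

def needs_increment(lr, gr, sr):
    """Round-to-nearest-even decision: round up on guard and (LSB or sticky)."""
    return gr & (lr | sr)

# NS example: c = 1, m = 1, (l, s) = 11 (n is irrelevant here, set to 0).
print(rounding_bits(0, 1, 1, 1, 1, right_shift=False))  # (0, 1, 1)
# RS example: c = 1, m = 0, (n, l, s) = 001.
print(rounding_bits(0, 0, 0, 1, 1, right_shift=True))   # (0, 1, 1)
```

Both cases give (l, m, s) = 011, and `needs_increment(0, 1, 1)` returns 1, matching the "increment required" entries on the slides.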

10
RN Even Mode: Constructing a Rounding Table (Cont'd)
  • Optimization
  • Roughly three SVs, RI + (0, 1, 2), are to be
    calculated. The naive approach is to have
    multiple adders in parallel, which is not
    attractive.
  • Duplicate only the carry-lookahead networks instead
    of having multiple adders.
  • Is it necessary to compute the whole rounding table?
  • Prediction allows precomputing only the SVs
    in a group.
  • This prediction signal (p) is just a decoded
    control signal (SF.m and CF.m were used in this
    paper)
  • Now the compound adder (CA) computes RI + p + (0, 1)
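
A compound adder produces a sum and the sum plus one from a single set of propagate and generate signals, so only the carry network is duplicated. The sketch below models that idea bit-serially for readability. A real design would use carry-lookahead, and the function name and width handling are assumptions.

```python
def compound_add(a, b, width):
    """Return (a + b, a + b + 1), both modulo 2**width."""
    p = a ^ b                  # propagate bits, shared by both results
    g = a & b                  # generate bits, shared by both results
    sum0 = sum1 = 0
    c0, c1 = 0, 1              # two carry chains: carry-in 0 and carry-in 1
    for i in range(width):
        pi = (p >> i) & 1
        gi = (g >> i) & 1
        sum0 |= (pi ^ c0) << i
        sum1 |= (pi ^ c1) << i
        c0 = gi | (pi & c0)
        c1 = gi | (pi & c1)
    return sum0, sum1

print(compound_add(0b0111, 0b0001, 4))  # (8, 9)
```

Feeding the prediction bit p into the adder's inputs then yields the pair RI + p and RI + p + 1 at roughly the cost of one adder.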

11
RN Even Mode: Constructing a Rounding Table (Cont'd)

12
RN Even Mode: Constructing a Rounding Table (Cont'd)

13
RN Even Mode: Developing a Valid Prediction Scheme
  • Since the prediction signal is based on SF.m and CF.m,
    there are 16 possible schemes, but
  • SVs in all groups must be computable by the CA.
  • Minimize the number of SVs that need to be
    precomputed.
  • When a prediction scheme is used, the CA computes CI +
    SI + p + (0, 1), so the rounding table has to be
    modified.
  • NS, c = 0, m = 1, (lp, s, p) = 0-1
  • After adding the p bit to lp, (l, m, s) = 11-
  • An increment is required; the result is equal to RI + p
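
The count of 16 follows if the prediction bit p is a Boolean function of two input bits, here taken to be the MSBs of SF and CF: there are 2^(2^2) = 16 such functions. The sketch below only enumerates them. Which ones are valid depends on the rounding table and is not modelled.

```python
from itertools import product

# All (SF.m, CF.m) input combinations.
inputs = list(product((0, 1), repeat=2))

# Each scheme assigns one output bit p to every input combination.
schemes = [dict(zip(inputs, outs)) for outs in product((0, 1), repeat=4)]
print(len(schemes))  # 16

# Two familiar members of the set, named here only for illustration.
or_scheme = {k: k[0] | k[1] for k in inputs}
and_scheme = {k: k[0] & k[1] for k in inputs}
```

The two constraints on the slide (computable by the CA, minimal number of precomputed SVs) act as filters over this list.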

14
RN Even Mode: Developing a Valid Prediction Scheme
(Cont'd)

15
RN Even Mode: Performing Rounding Digits Selection
  • Row selected by the c, m bits
  • Optimizable by using a prediction scheme
  • Column selected by c, m0, m1
  • Not optimizable, since RS is always needed in a
    multiplier.
  • Selection between entries by n, lp, s, p
  • Optimizable by using a prediction scheme and
    rounding digit selection
  • Since the CA can compute only RI + p + (0, 1), the RI + p +
    2 term should be eliminated, so 26 ways for
    performing RDS.

16
RN Even Mode: An Improved Implementation for FCR
  • Relaxing the constraint on the CA:
  • The N-bit CA calculates RI[N:1], and the LSB of RI is set
    depending on the value of the l bit.
  • Instead of RI + 1: if l = 1, use RI + 2 and set l = 0;
    if l = 0, use RI + 0 and set l = 1
  • The major advantage is the ability to compute more SVs in
    a group, giving a greater degree of freedom in
    selecting the rounding digits.
  • More prediction schemes are now allowed.
  • Eight schemes, four of which do not require fix-up.
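
The LSB substitution can be illustrated with integers. The sketch assumes RI is split into its upper bits and its LSB l, and that the adder supplies only the upper bits and the upper bits plus one (RI + 0 and RI + 2 in units of the LSB). The function name is hypothetical.

```python
def ri_plus_one(upper, l):
    """Return (upper, l) of RI + 1, where RI = 2 * upper + l.

    Only upper + 0 and upper + 1 are used; the LSB is patched separately.
    """
    if l == 1:
        return upper + 1, 0    # take RI + 2 and clear l
    return upper, 1            # take RI + 0 and set l

for ri in range(8):
    upper, l = ri_plus_one(ri >> 1, ri & 1)
    print(ri, "->", 2 * upper + l)
```

Because the odd increment is recovered from even ones plus an LSB fix, the same adder can serve more entries of the rounding table.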

17
RN Even Mode: An Improved Implementation for FCR
(Cont'd)

18
Round to Infinity Mode: Overview
  • RPI can be implemented as RI for +ve numbers and
    RZ for -ve numbers
  • RMI can be implemented as RZ for +ve numbers and
    RI for -ve numbers
  • Hence we have
  • RI(x) = ⌈x⌉, RZ(x) = ⌊x⌋
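
The decomposition is easy to check on sign-magnitude values. In this sketch RI and RZ act on the magnitude only, as ceiling and floor, and the sign selects between them. Rounding to an integer stands in for rounding to the target precision, and the function names are assumptions.

```python
import math
from fractions import Fraction

def ri(mag):
    """Round a non-negative magnitude away from zero."""
    return math.ceil(mag)

def rz(mag):
    """Round a non-negative magnitude toward zero."""
    return math.floor(mag)

def rpi(x):
    """Round toward plus infinity: RI for positive, RZ for negative."""
    return ri(x) if x >= 0 else -rz(-x)

def rmi(x):
    """Round toward minus infinity: RZ for positive, RI for negative."""
    return rz(x) if x >= 0 else -ri(-x)

x = Fraction(5, 2)
print(rpi(x), rpi(-x), rmi(x), rmi(-x))  # 3 -2 2 -3
```

Only the two magnitude modes therefore need rounding tables; the sign bit picks which one applies.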

19
Round to Infinity Mode: Rounding Table
  • l = 0: add carry bit (c) + rounding bit (r).
  • r is added at the n bit position of RI, hence RI + 3
  • The discarded LSB leads to RI + 2
  • l = 1: add c and r
  • r is added at the n bit position of RI, hence RI + 3
  • The discarded LSB leads to RI + 4
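
The two equivalences can be confirmed numerically. The sketch assumes the right-shift case, where the LSB l of RI is discarded, c contributes +1 at the l position, and r contributes +2 at the n position. The helper name is hypothetical.

```python
def equivalent_increment(l):
    """Even increment that matches RI + 3 once the LSB is shifted out."""
    return 4 if l else 2

for ri in range(16):
    l = ri & 1
    # After dropping the LSB, RI + 3 and the even increment agree.
    assert (ri + 3) >> 1 == (ri + equivalent_increment(l)) >> 1
```

Replacing the odd increment with an even one means only the bits above l need to be incremented.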

20
Round to Infinity Mode: RI Prediction Table
  • SF.m, CF.m
  • Both false → RF < 1: rows 1 or 2
  • 1 bit true → RF ∈ (0, 2): rows 2, 3 or 4
  • 2 bits true → RF ∈ [1, 2): rows 3 or 4

21
Round to Infinity Mode: Rounding Table with
Prediction

22
Round to Zero Mode: Overview
  • Simplest among IEEE modes
  • Truncation implies only 2 rows
  • Both implementations possible
  • No valid Prediction scheme

23
The ES Rounding Algorithm
  • Based on Injection Rounding
  • Reduces RN and RI modes to RZ mode
  • Adds an injection that depends on the Rounding
    mode
  • Assumes that Sum and Carry include the Injection
    beforehand
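
Injection rounding can be sketched on integers. The example assumes the value x carries k bits below the rounding position, uses an injection of 0 for RZ, half an ulp for RN and all ones below the ulp for RI, and handles the round-to-nearest tie by clearing the LSB afterwards. The mode names and the tie fix-up are assumptions for illustration, not taken from the slides.

```python
def injection(mode, k):
    """Injection constant for k discarded bits."""
    if mode == "RZ":
        return 0
    if mode == "RN":
        return 1 << (k - 1)        # half an ulp
    if mode == "RI":
        return (1 << k) - 1        # just under one ulp
    raise ValueError(mode)

def round_by_injection(x, k, mode):
    """Round non-negative x to a multiple of 2**k; return the quotient."""
    y = x + injection(mode, k)     # in hardware, folded into sum and carry
    q = y >> k                     # plain truncation, i.e. RZ
    if mode == "RN" and y & ((1 << k) - 1) == 0:
        q &= ~1                    # exact tie: force the even neighbour
    return q

print([round_by_injection(x, 2, "RN") for x in (5, 6, 7, 10)])  # [1, 2, 2, 2]
print([round_by_injection(x, 2, "RI") for x in (4, 5, 8, 9)])   # [1, 2, 2, 3]
```

With the injection already included, the rounding logic only ever has to truncate, which is how RN and RI reduce to RZ.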

G. Even and P.-M. Seidel, "A comparison of three
rounding algorithms for IEEE floating-point
multiplication," IEEE Trans. Computers, vol. 49,
pp. 638-650, July 2000.

24
ES Implementation and Improved Implementation

25
ES Algorithm Implementation vs. Improved
Implementation
  • ES Algorithm
  • Implementation
  • Costly in terms of Hardware
  • Extra row of Half Adders
  • Extra full-length (N bits) shifter
  • Gate Delay ?
  • Datapath
  • 2 Half Adders
  • 1 Compound Adder
  • 2-1 rounding MUX
  • Post-round normalization right shift
  • Improved Implementation
  • Less hardware
  • Gate Delay ?
  • Datapath
  • 1 Full Adder
  • 1 Compound Adder
  • 4-1 Rounding MUX
  • Hence a net delay saving of one full-length shifter

26
Simple Implementation vs. Improved Implementation
  • Equal in terms of Hardware and Critical Timing
    paths
  • Differ in the LSB logic
  • Improved calculates more SVs in a group →
    greater degree of freedom in selecting the prediction
    scheme and rounding digits
  • Improved provides solution for RI mode while
    Simple does not

27
Discussion and Summary
  • IRM has a speed advantage. It combines rounding with
    the CPA stage, three stages fewer than the conventional
    rounding algorithm
  • Trade-off → more hardware
  • Rounding before significand normalization →
    complicates the rounding logic
  • Remedy → reduce the number of SVs and the logic complexity
  • Judicious selection of the prediction scheme and
    rounding digits

28
Discussion and Summary
  • Hardware consists of adders to compute the SVs and
    rounding logic to select among them
  • High-speed adders are area and power hungry
  • To reduce adders, duplicate only the CLA network
    instead of the entire adder for parallel computation
    of SVs. Compound adders thus reduce the logic
    required for the rounding table
  • The improved implementation is better than the simple
    implementation