Title: Systematic IEEE Rounding Method for High-Speed Floating-Point Multipliers
1 Systematic IEEE Rounding Method for High-Speed Floating-Point Multipliers
written by N. T. Quach, N. Takagi and M. J. Flynn
- Hyuk Park
- Anusha Ravi
- Gahngsoo Moon
2 Overview: Problem
- Rounding in a floating-point multiplier
- Makes multiplication slow
- Why?
- Needs many additions, because many inputs arrive at the LSB
- Must wait until the overflow bit comes out
3 Overview: Previous Work
- Integrated Rounding Method (IRM) (Santoro et al. 89) for high-speed FP multipliers
- How?
- Pre-compute multiple significand values (SVs).
- Select the appropriate SV for final normalization and rounding
- Advantage
- Faster than the conventional rounding method
- Disadvantage
- Complicated rounding logic
- Why?
- Must account for both pre-round and post-round normalization, since the IRM rounds before the significand is normalized
4 Overview: Proposed Approach
- Systematic Integrated Rounding Method
- 1. constructing a rounding table
- Lists all possible significand values (SVs)
- 2. developing prediction scheme
- Reduces the number of these SVs to a smaller set
- 3. performing round digits selection (RDS)
- Reduces the complexity of the rounding logic
5 Overview: Conventional IEEE Rounding vs. Systematic IRM
- IEEE rounding using the conventional algorithm
- 1. Computation of C and S
- 2. Addition of C and S
- 3. Pre-rounding (pre-round normalization)
- e.g. 10.1010 × 2^5 → 1.01010 × 2^6
- 4. Computation of rounding bits
- e.g. guard bit and sticky bit
- 5. Rounding
- 6. Post-rounding (post-round normalization)
- e.g. 1.11111 × 2^5, adding a round bit of 1 to the last 1 → 10.00000 × 2^5 → 1.00000 × 2^6
- Systematic IRM
- 1. Computation of C and S
- 2. Computation of SVs and rounding bits
- 3. Final select and align
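The six conventional steps can be sketched as a small integer model in Python. This is an illustration only, not the paper's hardware: the function name `conventional_round`, the significand width `p`, and the use of Round to Nearest Even in step 5 are assumptions made for the example.

```python
def conventional_round(c, s, exp, p):
    """Round the product of two p-bit significands (p >= 2), given as carry/sum vectors.

    c + s is assumed to equal the exact 2p-bit product, which lies in
    [2**(2p-2), 2**(2p)), i.e. 01.xx...x or 1x.xx...x in binary.
    Returns (p-bit significand, exponent) using Round to Nearest Even.
    """
    r = c + s                              # step 2: carry-propagate addition
    if r >> (2 * p - 1):                   # step 3: product is 1x.xxx, shift right
        discard = p
        exp += 1
    else:                                  # product is 01.xxx, no shift needed
        discard = p - 1
    kept = r >> discard                    # the p bits that survive
    guard = (r >> (discard - 1)) & 1       # step 4: first discarded bit
    sticky = int(r & ((1 << (discard - 1)) - 1) != 0)  # OR of the remaining bits
    if guard and (sticky or (kept & 1)):   # step 5: round to nearest, ties to even
        kept += 1
    if kept >> p:                          # step 6: rounding overflowed to 10.000
        kept >>= 1
        exp += 1
    return kept, exp

# The post-rounding example above: 1.11111 x 2^5 rounds up to 1.00000 x 2^6.
print(conventional_round(0, 0b11111110001, 5, 6))  # prints (32, 6)
```

Steps 3, 5 and 6 depend on each other in sequence, which is the serial chain the systematic IRM tries to collapse into one select-and-align stage.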
6 Overview: Contribution of This Paper
- Propose a systematic IRM for all the IEEE rounding modes
- Demonstrate that the prediction scheme is unique for the Round to Nearest Even mode in the simple hardware implementation
- Propose an efficient improved hardware implementation that provides solutions to all the IEEE rounding modes
7 Overview: Notation
- R = S + C
- RI = SI + CI
- RF = SF + CF
- g: pre-normalized guard bit
- s: pre-normalized sticky bit
8 RN Even Mode: Constructing a Rounding Table
- Need to construct an RI candidate table to be used in the Final Select and Align stage.
- Equations
- NS (no shift)
- l_r = l ⊕ c, g_r = m, s_r = s
- RS (right shift)
- l_r = n ⊕ (l · c), g_r = l ⊕ c, s_r = m ∨ s
- In the NS case no bit is discarded, so there is only a single candidate for each case, depending on RI, RF, l and s. For example:
- c = 1, m = 1, (l, s) = 11
- After adding the carry-out from RF to the l bit, (l, m, s) = 011
- An increment is required, and the result is equivalent to RI + 2
9 RN Even Mode: Constructing a Rounding Table (Contd)
- But in the RS case, taking advantage of the discarded bit allows more than one candidate. For example:
- c = 1, m = 0, (n, l, s) = 001
- After adding the carry-out from RF and shifting right, (l, m, s) = 011
- An increment is required, and the result is equal to RI + 2
- But since l = 0 initially and the l bit is discarded after the right shift, RI + 2 is equivalent to RI + 3 in this case.
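The two worked examples can be checked with a short Python sketch. It reads the NS and RS equations as l_r = l XOR c, g_r = m, s_r = s (no shift) and l_r = n XOR (l AND c), g_r = l XOR c, s_r = m OR s (right shift). That reading, and the helper names, are assumptions made for illustration.

```python
def ns_bits(l, c, m, s):
    """No-shift case: post-normalization (l_r, g_r, s_r)."""
    return l ^ c, m, s

def rs_bits(n, l, c, m, s):
    """Right-shift case: l moves into the guard position, m joins the sticky."""
    return n ^ (l & c), l ^ c, m | s

def rn_even_increment(lr, gr, sr):
    """Round to Nearest Even: round up when guard is set and (sticky or odd LSB)."""
    return gr & (sr | lr)

# NS example: c = 1, m = 1, (l, s) = 11
ns = ns_bits(l=1, c=1, m=1, s=1)
ns_total = 1 + rn_even_increment(*ns)        # carry c plus the rounding increment

# RS example: c = 1, m = 0, (n, l, s) = 001
rs = rs_bits(n=0, l=0, c=1, m=0, s=1)
rs_total = 1 + 2 * rn_even_increment(*rs)    # increment lands on the n position (weight 2)

print(ns, ns_total, rs, rs_total)            # prints (0, 1, 1) 2 (0, 1, 1) 3
```

In the RS example the total is RI + 3, but because l = 0 and the LSB is dropped by the shift, `(RI + 2) >> 1` and `(RI + 3) >> 1` are the same value, which is why the table can list either candidate.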
10 RN Even Mode: Constructing a Rounding Table (Contd)
- Optimization
- Roughly three SVs, RI + (0, 1, 2), have to be calculated. The naive approach is to run multiple adders in parallel, which is not attractive.
- Instead of using multiple adders, duplicate only the carry-lookahead networks.
- Does the whole rounding table have to be computed?
- Prediction allows precomputing only the SVs in one group.
- The prediction signal (p) is just a decoded control signal (SF.m and CF.m are used in this paper)
- Now the compound adder (CA) computes RI + p + (0, 1)
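A compound adder can be modelled in Python as one shared set of propagate/generate signals feeding two carry chains. The sketch below is a behavioural model under stated assumptions: the names `compound_add` and `inject_prediction` are hypothetical, the ripple loops stand in for the two carry-lookahead networks, and the half-adder row used to make room for p is one common way to do it, not something the slide spells out.

```python
def compound_add(a, b, width):
    """Return (a + b, a + b + 1), each including the carry-out at bit `width`.

    Propagate and generate are computed once; only the carry chain is
    evaluated twice (carry-in 0 and carry-in 1).
    """
    prop = a ^ b
    gen = a & b
    results = []
    for carry_in in (0, 1):
        carry = carry_in
        total = 0
        for i in range(width):
            p_i = (prop >> i) & 1
            g_i = (gen >> i) & 1
            total |= (p_i ^ carry) << i
            carry = g_i | (p_i & carry)
        results.append(total | (carry << width))
    return tuple(results)

def inject_prediction(ci, si, p):
    """Half-adder row: frees the carry vector's LSB so that p can be placed there."""
    half_sum = ci ^ si
    half_carry = ((ci & si) << 1) | p
    return half_carry, half_sum

# CI + SI + p + (0, 1) for 6-bit CI and SI with p = 1
print(compound_add(*inject_prediction(0b101101, 0b011011, 1), 7))  # prints (73, 74)
```

Here 0b101101 + 0b011011 = 45 + 27 = 72, so with p = 1 the two outputs are 73 and 74.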
11 RN Even Mode: Constructing a Rounding Table (Contd)
12 RN Even Mode: Constructing a Rounding Table (Contd)
13 RN Even Mode: Developing a Valid Prediction Scheme
- Since the prediction signal is a function of two bits (SF.m and CF.m), there are 2^4 = 16 possible schemes, but:
- SVs in all groups must be computable by the CA.
- The number of SVs that need to be precomputed should be minimized.
- When a prediction scheme is used, the CA computes CI + SI + p + (0, 1), so the rounding table has to be modified.
- NS, c = 0, m = 1, (lp, s, p) = 0-1
- After adding the p bit to lp, (l, m, s) = 11-
- An increment is required, and the result is equal to RI + p
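The count of 16 schemes and the NS example can be reproduced with a small Python sketch. Two things here are assumptions rather than statements from the slides: that a scheme is any Boolean function of the two bits SF.m and CF.m, and that lp is the LSB of the CA output RI + p, so the original l is recovered as lp XOR p. The function names are hypothetical, and the sketch does not test which schemes are valid.

```python
from itertools import product

MSB_PAIRS = list(product((0, 1), repeat=2))     # all (SF.m, CF.m) combinations

def all_prediction_schemes():
    """Every Boolean function (SF.m, CF.m) -> p, one dict per scheme."""
    return [dict(zip(MSB_PAIRS, outputs)) for outputs in product((0, 1), repeat=4)]

def ns_offset(lp, p, c, m, s):
    """Offset k such that the NS result is RI + k under Round to Nearest Even.

    Assumption: lp is the LSB of RI + p, so l = lp XOR p.
    """
    l = lp ^ p
    lr = l ^ c                       # LSB after the fraction carry is added
    return c + (m & (s | lr))        # carry plus the rounding increment

print(len(all_prediction_schemes()))                              # prints 16
print([ns_offset(lp=0, p=1, c=0, m=1, s=s) for s in (0, 1)])      # prints [1, 1]
```

For the example row the offset is 1 for either value of s, which equals p, so under this reading the selected SV is RI + p.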
14 RN Even Mode: Developing a Valid Prediction Scheme (Contd)
15 RN Even Mode: Performing Rounding Digits Selection
- Rows are selected by the c and m bits
- Optimizable by using the prediction scheme
- Columns are selected by c, m0, m1
- Not optimizable, since RS is always needed in a multiplier.
- Entries are selected by n, lp, s, p
- Optimizable by using the prediction scheme and rounding digit selection
- Since the CA can compute only RI + p + (0, 1), the RI + p + 2 term has to be eliminated, so there are 26 ways of performing RDS.
16 RN Even Mode: An Improved Implementation for FCR
- Relaxing the constraint on the CA:
- The N-bit CA calculates RI[N:1], and the LSB of RI is set depending on the value of the l bit.
- Instead of computing RI + 1: if l = 1, use RI + 2 and set l = 0; if l = 0, use RI + 0 and set l = 1
- The major advantage is the ability to compute more SVs in a group, which gives a greater degree of freedom in selecting the rounding digits.
- More prediction schemes are now allowed.
- Eight schemes; four of them do not require fix-up.
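The LSB fix-up can be illustrated in Python. The sketch assumes the compound adder works on the bits of RI above the LSB and offers "upper" and "upper + 1", while the LSB l is patched separately. The name `sv_with_lsb_fixup` and the offset parameter `k` are hypothetical.

```python
def sv_with_lsb_fixup(ri, k):
    """Form RI + k from a compound adder on the upper bits plus LSB logic.

    Only works when the small sum l + k carries at most 1 into the upper
    bits, since the compound adder offers just upper + (0, 1).
    """
    upper, l = ri >> 1, ri & 1
    ca_out = (upper, upper + 1)          # compound adder outputs
    low = l + k                          # tiny addition on the LSB only
    carry, new_l = low >> 1, low & 1
    if carry > 1:
        raise ValueError("RI + k needs upper + 2, which the CA does not provide")
    return (ca_out[carry] << 1) | new_l

# RI + 1 with l = 1: take the incremented upper bits and clear l.
print(sv_with_lsb_fixup(0b1011, 1))  # prints 12
# RI + 1 with l = 0: keep the upper bits and set l.
print(sv_with_lsb_fixup(0b1010, 1))  # prints 11
```

With this arrangement RI + 0, RI + 1 and RI + 2 are always reachable, and RI + 3 is reachable when l = 0, which matches the claim that more SVs fit in a group.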
17 RN Even Mode: An Improved Implementation for FCR (Contd)
18 Round to Infinity Mode: Overview
- RPI can be implemented as RI for +ve numbers and RZ for -ve numbers
- RMI can be implemented as RZ for +ve numbers and RI for -ve numbers
- Hence we have
- RI(x) = ⌈x⌉, RZ(x) = ⌊x⌋
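The mapping from the signed modes to magnitude rounding can be written out in Python. The sketch rounds to integers for simplicity and treats RI and RZ as ceiling and floor of the magnitude. The function name `round_directed` is hypothetical.

```python
import math

def round_directed(x, mode):
    """Round x toward +infinity ("RPI") or -infinity ("RMI") via its magnitude."""
    negative = x < 0
    magnitude = abs(x)
    if mode == "RPI":
        use_ri = not negative        # RI for +ve numbers, RZ for -ve numbers
    elif mode == "RMI":
        use_ri = negative            # RZ for +ve numbers, RI for -ve numbers
    else:
        raise ValueError("mode must be 'RPI' or 'RMI'")
    rounded = math.ceil(magnitude) if use_ri else math.floor(magnitude)
    return -rounded if negative else rounded

print(round_directed(2.25, "RPI"), round_directed(-2.25, "RPI"))  # prints 3 -2
print(round_directed(2.25, "RMI"), round_directed(-2.25, "RMI"))  # prints 2 -3
```

Because the sign is handled outside, the significand datapath only needs the two magnitude modes RI and RZ.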
19 Round to Infinity Mode: Rounding Table
- l = 0: add carry bit (c) + rounding bit (r).
- r is added at the n bit position of RI. Hence RI + 3
- Discarding the LSB leads to RI + 2
- l = 1: add c and r
- r is added at the n bit position of RI. Hence RI + 3
- Discarding the LSB leads to RI + 4
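The two cases reduce to a simple identity once the LSB is dropped by the right shift. The Python sketch below states it. It reads the bullets as RI + 3 being replaced by RI + 2 when l = 0 and by RI + 4 when l = 1, and the helper name is hypothetical.

```python
def rs_equivalent_sv(l):
    """Even offset k with (RI + 3) >> 1 == (RI + k) >> 1, given RI's LSB l."""
    return 4 if l else 2

# c at the l position (weight 1) plus r at the n position (weight 2) gives RI + 3.
for ri in (0b0110, 0b0111):
    k = rs_equivalent_sv(ri & 1)
    print(ri, k, (ri + 3) >> 1, (ri + k) >> 1)
```

Only even offsets are needed after the substitution, so the selected SV can come from an adder that never touches the l bit.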
20 Round to Infinity Mode: RI Prediction Table
- SF.m, CF.m
- Both false → RF < 1: rows 1 or 2
- 1 bit true → RF ∈ (0, 2): rows 2 or 3 or 4
- 2 bits true → RF ∈ [1, 2): rows 3 or 4
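The ranges in the prediction table follow from the weight of the two MSBs, which a brute-force Python check makes concrete. The sketch assumes SF and CF are fractions in [0, 1) with SF.m and CF.m as their most significant bits (weight 1/2). The 4-bit width and the function name are arbitrary choices for the example.

```python
from fractions import Fraction

def rf_ranges_by_msb_count(frac_bits=4):
    """Map the number of set MSBs (0, 1, 2) to the (min, max) of RF = SF + CF."""
    scale = 1 << frac_bits
    ranges = {}
    for sf in range(scale):
        for cf in range(scale):
            ones = (sf >> (frac_bits - 1)) + (cf >> (frac_bits - 1))
            rf = Fraction(sf + cf, scale)
            lo, hi = ranges.get(ones, (rf, rf))
            ranges[ones] = (min(lo, rf), max(hi, rf))
    return ranges

for ones, (lo, hi) in sorted(rf_ranges_by_msb_count().items()):
    print(ones, float(lo), float(hi))
```

With both MSBs clear RF stays below 1 (no carry into RI), with both set RF is at least 1 (a carry is certain), and with exactly one set either outcome is possible.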
21 Round to Infinity Mode: Rounding Table with Prediction
22 Round to Zero Mode: Overview
- Simplest among IEEE modes
- Truncation implies only 2 rows
- Both implementations possible
- No valid Prediction scheme
23 The ES Rounding Algorithm
- Based on injection rounding
- Reduces RN and RI modes to RZ mode
- Adds an injection that depends on the rounding mode
- Assumes that Sum and Carry include the injection beforehand
G. Even and P.-M. Seidel, A comparison of three rounding algorithms for IEEE floating-point multiplication, IEEE Trans. Computers, vol. 49, pp. 638-650, July 2000
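The idea of reducing RN and RI to RZ by an injection can be sketched on integers in Python. This is a simplified model, not the ES datapath: the injection constants (0 for RZ, half an ulp for RN, all ones below the ulp for RI) and the tie fix-up for Round to Nearest Even are written from the general description of injection rounding, and the function name is hypothetical.

```python
def inject_round(x, f, mode):
    """Round the magnitude x, which has f >= 1 discarded low bits, by injection then truncation."""
    half = 1 << (f - 1)
    low_mask = (1 << f) - 1
    injection = {"RZ": 0, "RN": half, "RI": low_mask}[mode]
    q = (x + injection) >> f                 # after the injection, every mode truncates
    if mode == "RN" and (x & low_mask) == half:
        q &= ~1                              # exact tie: force an even LSB
    return q

# Values with f = 3 discarded bits: 20/8 = 2.5 (tie), 28/8 = 3.5 (tie), 17/8 = 2.125
print(inject_round(20, 3, "RN"), inject_round(28, 3, "RN"))  # prints 2 4
print(inject_round(17, 3, "RI"), inject_round(17, 3, "RZ"))  # prints 3 2
```

In the multiplier the injection would be folded into the Sum and Carry vectors ahead of the final addition, so no separate adder is spent on it.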
24 ES Implementation and Improved Implementation
25 ES Algorithm Implementation vs. Improved Implementation
- ES Algorithm
- Implementation
- Costly in terms of Hardware
- Extra row of Half Adders
- Extra full-length (N bits) shifter
- Gate Delay ?
- Datapath
- 2 Half Adders
- 1 Compound Adder
- 2-1 rounding MUX
- Post-round normalization right shift
- Improved Implementation
- Less hardware
- Gate Delay ?
- Datapath
- 1 Full Adder
- 1 Compound Adder
- 4-1 Rounding MUX
- Hence a net delay saving of one full-length shifter
26 Simple Implementation vs. Improved Implementation
- Equal in terms of hardware and critical timing paths
- Differ in the LSB logic
- Improved calculates more SVs in a group → greater degree of freedom in selecting the prediction scheme and rounding digits
- Improved provides a solution for the RI mode while Simple does not
27 Discussion and Summary
- IRM has a speed advantage: it combines rounding with the CPA stage, 3 stages fewer than the conventional rounding algorithm
- Tradeoff → more hardware
- Rounding before significand normalization → complicates the rounding logic
- Remedy → reduce the number of SVs and the logic complexity
- Judicious selection of the prediction scheme and rounding digits
28 Discussion and Summary
- Hardware consists of adders to compute the SVs and rounding logic to select among them
- High-speed adders are area- and power-hungry
- To reduce the number of adders, duplicate only the CLA network instead of the entire adder for parallel computation of SVs. Compound adders thus reduce the logic required for the rounding table
- The Improved Implementation is better than the Simple Implementation