Systematic IEEE Rounding Method for High-Speed Floating-Point Multipliers, written by N. T. Quach, N. Takagi and M. J. Flynn

1
Systematic IEEE Rounding Method for High-Speed
Floating-Point Multipliers
written by N. T. Quach, N. Takagi and M. J. Flynn

Hyuk Park
Anusha Ravi
Gahngsoo Moon

2
Overview: Problem

Rounding in a floating-point multiplier
Makes multiplication slow
Why?
Needs many additions because of the many inputs at the LSB
Must wait until the overflow bit is available

3
Overview: Previous Work

Integrated Rounding Method (IRM), Santoro et al. '89,
for high-speed FP multipliers
How?
Pre-compute multiple significand values (SVs).
Select the appropriate SV during final normalization
and rounding.
Advantage
Faster than the conventional rounding method
Disadvantage
Complicated rounding logic
Why?
Must account for both pre-round and post-round
normalization, since the IRM rounds before the
significand is normalized

4
Overview: Proposed Approach

Systematic Integrated Rounding Method
1. Constructing a rounding table
Lists all possible significand values (SVs)
2. Developing a prediction scheme
Reduces these SVs to a smaller set
3. Performing rounding digits selection (RDS)
Reduces the complexity of the rounding logic

5
Overview: Conventional IEEE Rounding vs.
Systematic IRM

IEEE rounding using the conventional algorithm
1. Computation of C and S
2. Addition of C and S
3. Pre-rounding normalization
e.g., 10.1010 × 2^5 → 1.01010 × 2^6
4. Computation of rounding bits
e.g., guard bit and sticky bit
5. Rounding
6. Post-rounding normalization
e.g., 1.11111 × 2^5, adding round bit 1 at the LSB
→ 10.00000 × 2^5 → 1.00000 × 2^6

Systematic IRM
1. Computation of C and S
2. Computation of SVs and rounding bits
3. Final select and align
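
The six conventional steps above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware from the paper: the function name, the integer encoding of the significand (an integer with `frac_bits` fraction bits), and the choice of round-to-nearest-even are assumptions made for the example.

```python
def conventional_round(s, c, frac_bits, exp, p):
    """Round (s + c) / 2**frac_bits * 2**exp to p fraction bits (RN even).

    Hypothetical helper: s and c are the carry-save pair (step 1 is assumed
    done), and the significand value is assumed to lie in [1, 4).
    """
    total = s + c                              # step 2: add C and S
    if total >= (2 << frac_bits):              # step 3: value in [2, 4)
        frac_bits += 1                         # reinterpret as 1.xxx, no bit lost
        exp += 1
    drop = frac_bits - p                       # number of discarded low bits
    assert drop >= 1, "sketch assumes at least one discarded bit"
    kept = total >> drop
    guard = (total >> (drop - 1)) & 1          # step 4: rounding bits
    sticky = int((total & ((1 << (drop - 1)) - 1)) != 0)
    if guard and (sticky or (kept & 1)):       # step 5: round to nearest even
        kept += 1
    if kept >= (2 << p):                       # step 6: post-round normalization
        kept >>= 1
        exp += 1
    return kept, exp


# 1.111111 x 2^5 rounded to 5 fraction bits overflows to 1.00000 x 2^6.
print(conventional_round(100, 27, 6, 5, 5))
```

Every step depends on the previous one, which is the serial chain that the systematic IRM collapses into three stages.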

6
Overview: Contribution of This Paper

Proposes a systematic IRM for all the IEEE
rounding modes
Demonstrates that the prediction scheme is unique
for the Round to Nearest Even mode in the simple
hardware implementation
Proposes an efficient improved hardware
implementation that provides solutions for all the
IEEE rounding modes

7
Overview: Notation

R = S + C
RI = SI + CI
RF = SF + CF
g: pre-normalized guard bit
s: pre-normalized sticky bit

8
RN Even Mode: Constructing a Rounding Table

Need to construct an RI candidate table to be used
in the Final Select and Align stage.
Equations
NS (no shift)
l_r = l XOR c, g_r = m, s_r = s
RS (right shift)
l_r = n XOR (l AND c), g_r = l XOR c, s_r = m OR s
In the NS case no bit is discarded, so there is only
a single candidate for each case, depending on RI,
RF, l and s. For example,
c = 1, m = 1, (l, s) = 11
After adding the carry-out from RF to the l bit, (l, m, s) =
011
An increment is required, and the result is
equivalent to RI + 2
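
The rounding-bit equations can be checked with a short Python sketch. The operators are an assumption: the transcript lost the original symbols, so XOR, AND and OR are inferred here from the worked examples on the slides. The function names are hypothetical.

```python
def rounding_bits(n, l, c, m, s, shift):
    """Return (l_r, g_r, s_r) for the no-shift (NS) or right-shift (RS) case.

    n, l: two lowest bits of RI; c: carry-out of RF; m: MSB of RF;
    s: sticky bit. Operators are inferred, not quoted from the paper.
    """
    if shift == "NS":
        return (l ^ c, m, s)
    if shift == "RS":
        return (n ^ (l & c), l ^ c, m | s)
    raise ValueError("shift must be 'NS' or 'RS'")


def needs_increment(l_r, g_r, s_r):
    """Round-to-nearest-even decision from the three rounding bits."""
    return bool(g_r and (s_r or l_r))


# NS example from the slide: c = 1, m = 1, (l, s) = 11.
bits = rounding_bits(n=0, l=1, c=1, m=1, s=1, shift="NS")
print(bits, needs_increment(*bits))
```

With the slide's NS inputs the sketch yields (l, m, s) = 011 and reports that an increment is required, matching the text.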

9
RN Even Mode: Constructing a Rounding Table (Cont'd)

But taking advantage of the discarded bit allows more
than one candidate. For example,
c = 1, m = 0, (n, l, s) = 001
After adding the carry-out from RF and shifting right, (l, m,
s) = 011
An increment is required, and the result is equal to RI +
2
But since l = 0 initially and the l bit is
discarded after the right shift, RI + 2 is equivalent
to RI + 3 in this case.
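
The equivalence of the two candidates in this right-shift example can be confirmed numerically. The sketch below is an assumption-laden model: RI is treated as a plain integer whose lowest bit is l, and the function name is hypothetical.

```python
def right_shift_candidates(ri, c=1):
    """Compare the exact rounded result with the RI + 2 and RI + 3 candidates.

    Assumes the right-shift case with an increment required: the carry c is
    added at the l position, l is discarded, then 1 is added at the n position.
    """
    exact = ((ri + c) >> 1) + 1
    via_plus2 = (ri + 2) >> 1
    via_plus3 = (ri + 3) >> 1
    return exact, via_plus2, via_plus3


# RI ending in (n, l) = 00, as in the slide's example.
print(right_shift_candidates(0b10100))
```

When l = 0 all three values agree, so the rounding table may list either candidate. When l = 1 the two candidates differ, which is why the table entry depends on l.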

10
RN Even Mode: Constructing a Rounding Table (Cont'd)

Optimization
Roughly three SVs, RI + (0, 1, 2), are to be
calculated. The naive approach is to have
multiple adders in parallel, which is not
attractive.
Instead of having multiple adders, duplicate only
the carry-lookahead networks.
Is it necessary to compute the whole rounding table?
Prediction allows precomputing only the SVs
in one group.
The prediction signal (p) is just a decoded
control signal (SF.m and CF.m are used in this
paper).
Now the compound adder (CA) computes RI + p + (0, 1).
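
A behavioural sketch of what the compound adder delivers is given below. It models only the outputs (a sum and the sum plus one), not the shared carry-lookahead structure. The function name and the way p is folded in as an extra addend are assumptions for illustration.

```python
def compound_add(si, ci, p, width):
    """Return the pair (RI + p, RI + p + 1), truncated to `width` bits.

    Behavioural model only: real hardware would share one adder and
    duplicate the carry network rather than add twice.
    """
    mask = (1 << width) - 1
    base = si + ci + p
    return base & mask, (base + 1) & mask


print(compound_add(5, 6, 1, 8))
```

The final select stage then picks one of the two outputs, so only SVs of the form RI + p + 0 or RI + p + 1 are available without extra hardware.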

11
RN Even Mode: Constructing a Rounding Table (Cont'd)

12
RN Even Mode: Constructing a Rounding Table (Cont'd)

13
RN Even Mode: Developing a Valid Prediction Scheme

Since the prediction signal is based on SF.m and CF.m,
there are 16 possible schemes, but
SVs in all groups must be computable by the CA.
Minimize the number of SVs that need to be
precomputed.
When a prediction scheme is used, the CA computes CI +
SI + p + (0, 1), so the rounding table has to be
modified.
NS, c = 0, m = 1, (lp, s, p) = 0-1
After adding the p bit to lp, (l, m, s) = 11-
An increment is required; the result is equal to RI + p
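
The count of 16 follows from treating p as a Boolean function of two input bits: there are four input combinations, and each can map to 0 or 1, giving 2^4 = 16 truth tables. The sketch below only enumerates them, assuming the two inputs are the bits SF.m and CF.m. It does not test which schemes are valid, and the function name is hypothetical.

```python
from itertools import product


def enumerate_prediction_schemes():
    """List every truth table mapping (SF.m, CF.m) to a prediction bit p."""
    inputs = list(product((0, 1), repeat=2))
    return [dict(zip(inputs, outputs))
            for outputs in product((0, 1), repeat=len(inputs))]


schemes = enumerate_prediction_schemes()
print(len(schemes))
```

Each dictionary is one candidate scheme. The constraints on the slide (every group computable by the CA, as few precomputed SVs as possible) are what narrow this list down.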

14
RN Even Mode: Developing a Valid Prediction Scheme
(Cont'd)

15
RN Even Mode: Performing Rounding Digits Selection

Row selected by the c, m bits
Optimizable by using a prediction scheme
Column selected by c, m0, m1
Not optimizable, since RS is always needed in a
multiplier.
Selection between entries by n, lp, s, p
Optimizable by using a prediction scheme and
rounding digits selection
Since the CA can compute only RI + p + (0, 1), the RI + p +
2 term should be eliminated, so 26 ways for
performing RDS.

16
RN Even Mode: An Improved Implementation for FCR

Relaxing the constraint on the CA:
The N-bit CA calculates RI[N:1], and the LSB of RI is set
depending on the value of the l bit.
Instead of RI + 1:
if l = 1, use RI + 2 and set l = 0
if l = 0, use RI + 0 and set l = 1
The major advantage is the ability to compute more SVs in
a group, giving a greater degree of freedom in
selecting the rounding digits.
More prediction schemes are now allowed:
eight schemes, four of which do not require fix-up.
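
One reading of the LSB rule on this slide is that RI + 1 never has to be computed directly: if l = 1, take RI + 2 and force the LSB to 0; if l = 0, take RI + 0 and force the LSB to 1. The Python sketch below checks that reading. The function name is hypothetical, and RI is modelled as a plain integer.

```python
def plus_one_via_lsb(ri):
    """Form RI + 1 from RI + 2 or RI + 0 by overwriting only the LSB."""
    l = ri & 1
    if l == 1:
        return (ri + 2) & ~1   # RI + 2 with the LSB forced to 0
    return (ri + 0) | 1        # RI + 0 with the LSB forced to 1


print(all(plus_one_via_lsb(r) == r + 1 for r in range(64)))
```

Because the odd offset is obtained by rewriting one bit, the adder itself only has to supply even offsets, which is what frees up more SVs per group.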

17
RN Even Mode: An Improved Implementation for FCR
(Cont'd)

18
Round to Infinity Mode: Overview

RPI can be implemented as RI for +ve numbers and
RZ for -ve numbers
RMI can be implemented as RZ for +ve numbers and
RI for -ve numbers
Hence we have
RI(x) = ⌈x⌉, RZ(x) = ⌊x⌋
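
The decomposition of the two directed modes into RI and RZ on the magnitude can be illustrated as follows. Rounding to an integer stands in for rounding to a representable significand, which is a simplification made for this sketch. All function names are hypothetical.

```python
import math
from fractions import Fraction


def ri(mag):
    """Round a non-negative magnitude away from zero (ceiling)."""
    return math.ceil(mag)


def rz(mag):
    """Round a non-negative magnitude toward zero (floor)."""
    return math.floor(mag)


def rpi(value):
    """Round toward plus infinity using only RI and RZ on the magnitude."""
    return ri(abs(value)) if value >= 0 else -rz(abs(value))


def rmi(value):
    """Round toward minus infinity using only RI and RZ on the magnitude."""
    return rz(abs(value)) if value >= 0 else -ri(abs(value))


x = Fraction(5, 2)
print(rpi(x), rpi(-x), rmi(x), rmi(-x))
```

Since a multiplier handles the sign separately from the significand, this is why only the RI and RZ cases need rounding tables.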

19
Round to Infinity Mode: Rounding Table

l = 0: add carry bit (c) and rounding bit (r).
r is added at the n bit position of RI, hence RI + 3.
The discarded LSB leads to RI + 2.
l = 1: add c and r.
r is added at the n bit position of RI, hence RI + 3.
The discarded LSB leads to RI + 4.
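
The two substitutions can be verified with integers. The sketch assumes the right-shift case, where c contributes 1 at the l position and r contributes 2 (one unit at the n position), and where the l bit is dropped afterwards. The function name is hypothetical.

```python
def ri_mode_substitute(ri):
    """Return (target, substitute) after the l bit has been discarded.

    target uses RI + 3; substitute uses RI + 2 when l = 0 and RI + 4
    when l = 1, as stated on the slide.
    """
    l = ri & 1
    target = (ri + 3) >> 1
    substitute = (ri + 2 if l == 0 else ri + 4) >> 1
    return target, substitute


print(all(t == s for t, s in map(ri_mode_substitute, range(64))))
```

Replacing the odd offset 3 by an even one matters because even offsets are the ones a compound adder working above the LSB can supply.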

20
Round to Infinity Mode: RI Prediction Table

SF.m, CF.m
Both false → RF < 1: rows 1 or 2
1 bit true → RF in (0, 2): rows 2, 3 or 4
2 bits true → RF in [1, 2): rows 3 or 4
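
The three ranges can be reproduced by brute force, assuming that SF.m and CF.m are the most significant fraction bits of SF and CF (weight 1/2 each) and that RF = SF + CF. The fraction width and the function name are arbitrary choices for this sketch.

```python
from fractions import Fraction


def rf_range_by_prediction(frac_bits=4):
    """Map the number of set m bits (0, 1, 2) to the (min, max) of RF."""
    scale = 1 << frac_bits
    ranges = {}
    for sf in range(scale):
        for cf in range(scale):
            # Count how many of the two MSBs (SF.m, CF.m) are set.
            key = (sf >> (frac_bits - 1)) + (cf >> (frac_bits - 1))
            rf = Fraction(sf + cf, scale)
            lo, hi = ranges.get(key, (rf, rf))
            ranges[key] = (min(lo, rf), max(hi, rf))
    return ranges


for count, (lo, hi) in sorted(rf_range_by_prediction().items()):
    print(count, lo, hi)
```

With both bits clear RF stays below 1, with both set it lies in [1, 2), and with exactly one set it falls strictly between 0 and 2, so only that middle case leaves the carry-out of RF undetermined.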

21
Round to Infinity Mode: Rounding Table with
Prediction

22
Round to Zero Mode: Overview

Simplest among the IEEE modes
Truncation implies only 2 rows
Both implementations are possible
No valid prediction scheme

23
The ES Rounding Algorithm

Based on injection rounding
Reduces the RN and RI modes to the RZ mode
Adds an injection that depends on the rounding
mode
Assumes that Sum and Carry include the injection
beforehand

G. Even and P.-M. Seidel, "A comparison of three
rounding algorithms for IEEE floating-point
multiplication," IEEE Trans. Computers, vol. 49,
pp. 638-650, July 2000
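
The idea of reducing every mode to truncation can be sketched on integers. This is a common formulation of injection-based rounding, not code from the cited paper: the injection constants, the tie fix-up for round-to-nearest-even, and all names are assumptions for illustration. Here the injection is added explicitly, whereas the slide assumes it is already folded into Sum and Carry.

```python
def injection(mode, k):
    """Mode-dependent constant for a value with k discarded low bits."""
    if mode == "RZ":
        return 0
    if mode == "RN":
        return 1 << (k - 1)        # half a unit in the last kept place
    if mode == "RI":
        return (1 << k) - 1        # just under one unit in the last place
    raise ValueError("mode must be 'RZ', 'RN' or 'RI'")


def injection_round(x, k, mode):
    """Round magnitude x by adding the injection and then truncating."""
    result = (x + injection(mode, k)) >> k
    is_tie = (x & ((1 << k) - 1)) == (1 << (k - 1))
    if mode == "RN" and is_tie:
        result &= ~1               # exact tie: force the LSB even
    return result


x = 0b1010  # 2.5 with k = 2 discarded bits
print([injection_round(x, 2, mode) for mode in ("RZ", "RN", "RI")])
```

After the injection, all three modes share the same truncating datapath, and only the tie case in RN needs a one-bit correction.
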
24
ES Implementation / Improved Implementation

25
ES Algorithm Implementation vs. Improved
Implementation

ES Algorithm
Implementation
Costly in terms of hardware
Extra row of half adders
Extra full-length (N-bit) shifter
Gate delay along the datapath:
2 half adders
1 compound adder
2-1 rounding MUX
Post-round normalization right shift

Improved Implementation
Less hardware
Gate delay along the datapath:
1 full adder
1 compound adder
4-1 rounding MUX
Hence a net delay saving of 1 full-length shifter

26
Simple Implementation vs. Improved Implementation

Equal in terms of hardware and critical timing
paths
Differ in the LSB logic
Improved calculates more SVs in a group →
greater degree of freedom in selecting the prediction
scheme and rounding digits
Improved provides a solution for the RI mode, while
Simple does not

27
Discussion & Summary

IRM has a speed advantage: it combines rounding with
the CPA stage, three stages fewer than the conventional
rounding algorithm
Tradeoff → more hardware
Rounding before significand normalization →
complicates the rounding logic
Remedy → reduce the number of SVs and the logic complexity
Judicious selection of the prediction scheme and
rounding digits

28
Discussion & Summary

Hardware consists of adders to compute the SVs and
rounding logic to select among them
High-speed adders are area and power hungry
To reduce adders, duplicate only the CLA network
instead of the entire adder for parallel computation
of SVs. Compound adders thus reduce the logic
required for the rounding table
Improved Implementation is better than Simple
Implementation
