Ajay K. Verma and Paolo Ienne - PowerPoint PPT Presentation

1
Towards the Automatic Exploration of
Arithmetic-Circuit Architectures
  • Ajay K. Verma and Paolo Ienne
  • Processor Architecture Laboratory (LAP)
  • Centre for Advanced Digital Systems (CSDA)
  • Ecole Polytechnique Fédérale de Lausanne (EPFL)

2
Example: Plenty of Different Adders
[Figure: three adder implementations with delay / area of 0.49 ns / 691 µm², 0.41 ns / 385 µm², and 0.34 ns / 534 µm²]
Problem: How can we automatically obtain the best-fitting
implementation in this space?
3
Typical Synthesis Methods
  • Write all the expressions in sum of product form.
  • Find all the Kernels and Cokernels of the
    expressions.
  • Formulate the problem as a Rectangle Cover
    Problem.
  • Use heuristics to solve the Rectangle Cover
    Problem.

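x = af + abc + def + bcde = (a + de) (bc + f)

Rectangle-cover matrix (entries index the product terms of x:
1 = af, 2 = abc, 3 = def, 4 = bcde; 0 = no term):

        a    de   bc   f
  a     0    0    2    1
  de    0    0    4    3
  bc    2    4    0    0
  f     1    3    0    0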
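
The matrix on this slide can be rebuilt mechanically from the four product terms. The sketch below is only an illustration under simplifying assumptions: variables are single letters, a cube is a set of letters, and the hypothetical helper `cover_matrix` fills an entry with the index of the term obtained by multiplying the row cube with the column cube (0 if that product is not a term of x).

```python
def cover_matrix(terms, labels):
    """Build the term-index matrix for the given row/column cubes.

    terms  -- product terms of the expression, numbered from 1 in list order
    labels -- cubes used as both row and column labels (as on the slide)
    """
    index = {frozenset(t): i for i, t in enumerate(terms, start=1)}
    matrix = []
    for row in labels:
        line = []
        for col in labels:
            product_cube = frozenset(row) | frozenset(col)
            # A cube multiplied by itself is not a rectangle entry.
            line.append(0 if row == col else index.get(product_cube, 0))
        matrix.append(line)
    return matrix


TERMS = ["af", "abc", "def", "bcde"]   # x = af + abc + def + bcde
LABELS = ["a", "de", "bc", "f"]

for label, line in zip(LABELS, cover_matrix(TERMS, LABELS)):
    print(f"{label:>3}", line)
```

The 2x2 block of non-zero entries in rows a, de and columns bc, f is the rectangle that corresponds to the factorization (a + de) (bc + f).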
4
Limitations of Typical Methods
  • All expressions should be in sum of product form.
  • Arithmetic expressions are XOR-intensive.
  • Kernel extraction is based on algebraic
    factoring.
  • An expression and its complement are considered
    independent (⇒ unable to explore all common
    subexpressions).
  • The Rectangle Cover Problem is solved with heuristics,
    which therefore cannot guarantee optimal results.

5
Related Work
  • Multi-level optimization and Boolean division
      • Classic problem [Brayton82, Brayton90,
        DeMicheli94, ...]
      • Boolean division improvements [Chang99, ...]
  • Optimization of specific arithmetic circuits
      • Final adders for multipliers [Lee91]
      • Column compressors [Stelling98]
      • Carry-save addition [Verma04]
  • Symbolic algebra
      • Various applications to EDA [Peymandoust01]

6
Outline
  • Problem formulation.
  • Introduction of a different division.
  • Core problem: finding CSEs.
  • Enumeration of all possible CSEs.
  • Pruning the search space.
  • Results and analysis.

7
Problem Statement
  • Pareto-optimal implementation: An implementation
    which is better than any other implementation in
    area or in critical-path delay (i.e., no other
    implementation is at least as good in both).

Given a set of Boolean expressions, generate all
their Pareto-optimal implementations.
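
As a minimal sketch of the definition above, the following filter keeps only the non-dominated (delay, area) points. The function name `pareto_front` and the tuple layout are assumptions made for illustration; the sample points are the three adders quoted on slide 2.

```python
def pareto_front(points):
    """Return the non-dominated (delay, area) points, sorted by delay."""
    front = []
    best_area = float("inf")
    # Sorting by (delay, area) means a point survives only if its area
    # is strictly smaller than that of every faster point seen so far.
    for delay, area in sorted(set(points)):
        if area < best_area:
            front.append((delay, area))
            best_area = area
    return front


adders = [(0.49, 691), (0.41, 385), (0.34, 534)]  # (ns, µm²)
print(pareto_front(adders))
```

Here the 0.49 ns / 691 µm² adder is dropped because the 0.41 ns / 385 µm² adder is both faster and smaller.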
8
Gröbner Bases and Division
  • A well-known method for multinomial division
    using the remainder theorem.

[Figure labels: reduce(f, g); algebraic factoring]
9
Gröbner Bases for Boolean Algebra
  • Boolean algebra does not form a ring under the
    operations AND and OR.
  • Neither of the two operations is invertible.
  • But Boolean algebra forms a ring over the field
    GF(2) under the operations AND and XOR.
  • Operation XOR is self-invertible.
  • Reed-Muller form has no NOT operation.
  • Reed-Muller form of an expression is unique.
  • Expected size of an expression in Reed-Muller
    form is smaller than the expected size in sum of
    product form.
  • Issue: the AND operator is idempotent.
  • Reduce the final expression with respect to (x² -
    x) for the underlying variables x.
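
The ring structure described on this slide can be made concrete with a small sketch. It assumes a representation that is not taken from the slides: a polynomial over GF(2) is a set of monomials, and each monomial is a set of variable names, so the reduction with respect to x² - x happens automatically when two monomials are multiplied (set union). The names `var`, `xor_`, `and_`, and `ONE` are illustrative.

```python
from itertools import product

ONE = frozenset({frozenset()})  # the constant 1 (the empty monomial)


def var(name):
    """Polynomial consisting of the single variable `name`."""
    return frozenset({frozenset({name})})


def xor_(p, q):
    """Addition in GF(2): equal monomials cancel."""
    return p ^ q


def and_(p, q):
    """Multiplication; the union of variable sets applies x*x = x."""
    result = set()
    for m1, m2 in product(p, q):
        result ^= {m1 | m2}  # toggle, since coefficients live in GF(2)
    return frozenset(result)


x, y = var("x"), var("y")
s = xor_(x, y)
print(and_(s, s) == s)                          # (x XOR y)^2 reduces to x XOR y
print(and_(x, xor_(ONE, x)) == frozenset())     # x AND (NOT x) is 0
```

NOT and OR need no extra operators in this form: NOT a is 1 XOR a, and a OR b is a XOR b XOR (a AND b).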

10
Two Theorems and Their Consequences
  • Theorem 1: In any Pareto-optimal implementation
    of E1 and E2 in which they use S as a Common
    Sub-Expression, the implementation of S must be
    Pareto-optimal.

The problem has a dynamic programming structure.
11
Two Theorems and Their Consequences
  • Theorem 2: If there are m Pareto-optimal
    implementations of E1 and n Pareto-optimal
    implementations of E2 which use Sk as the
    implementation of their CSE, then by considering
    only (m + n) combinations of these
    implementations we can find all Pareto-optimal
    implementations using Sk.

[Figure: area versus delay plot of the implementations of E1 and E2; point labels: 1 (20), 1 (30), 2 (16), 8 (38), 3 (25), 4 (15), 8 (22), 5 (14), 10 (20), 7 (13), 8 (35), 9 (12)]
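
One way to see why far fewer than m × n combinations are needed is sketched below. It rests on assumptions that the slide does not spell out: each implementation is a (delay, area) pair, the combined delay is the maximum of the two delays, and the combined area is the sum of the two areas minus the area of the shared sub-expression (`shared_area`, a hypothetical parameter). For every delay limit taken from either front, the cheapest implementation of each expression that meets the limit is chosen, so at most m + n combinations are formed.

```python
def pareto_front(points):
    """Return the non-dominated (delay, area) points, sorted by delay."""
    front, best_area = [], float("inf")
    for delay, area in sorted(set(points)):
        if area < best_area:
            front.append((delay, area))
            best_area = area
    return front


def combine_fronts(front1, front2, shared_area=0):
    """Combine two Pareto fronts using at most len(front1) + len(front2) pairs."""
    candidates = []
    limits = sorted({d for d, _ in front1} | {d for d, _ in front2})
    for limit in limits:
        fits1 = [p for p in front1 if p[0] <= limit]
        fits2 = [p for p in front2 if p[0] <= limit]
        if not fits1 or not fits2:
            continue  # one expression has no implementation this fast
        d1, a1 = min(fits1, key=lambda p: p[1])
        d2, a2 = min(fits2, key=lambda p: p[1])
        candidates.append((max(d1, d2), a1 + a2 - shared_area))
    return pareto_front(candidates)


front_e1 = [(1, 30), (3, 25), (8, 22)]   # illustrative values only
front_e2 = [(2, 16), (4, 15), (5, 14)]
print(combine_fronts(front_e1, front_e2))
```

On this illustrative input the result matches the front obtained by trying all nine pairs.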
12
Hence, Two Independent Problems
  • Problem 1: Given two Boolean expressions E1 and
    E2, find all possible Common Sub-Expressions
    between them.
  • Problem 2: Find all Pareto-optimal
    implementations of a single Boolean expression E.

13
Problem 1: Enumerating CSEs
The nodes of the DAG correspond to all partial
implementations of the two expressions with some
sharing between them.
14
Replacing Partial Occurrences Can Be Useful
Partial occurrences can also be replaced by a new
variable (e.g., s = x ⊕ y ⇒ x = (x ⊕ y) ⊕ y
= s ⊕ y). Kernel extraction algorithms cannot be
used.
Replacing partial occurrences too: 3 XOR gates, 2 AND gates
Without replacing partial occurrences: 3 XOR gates, 3 AND gates
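
The rewrite behind a partial occurrence relies on XOR being self-inverse: if s = x XOR y, then x alone can be expressed as s XOR y. A minimal exhaustive check, with the hypothetical helper name `partial_occurrence_holds`, could look like this:

```python
from itertools import product


def partial_occurrence_holds():
    """Check x == (x XOR y) XOR y for every assignment of x and y."""
    for x, y in product((0, 1), repeat=2):
        s = x ^ y          # the new variable introduced for x XOR y
        if x != s ^ y:     # x rewritten in terms of s and y
            return False
    return True


print(partial_occurrence_holds())
```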
15
t-Reductions Are Necessary
t = bd
t-reductions preserve the minimum delay in at least
one path.
16
Pruning the Enumeration DAG
  • The size of the DAG can be as large as O ((n m)
    2m), where n is the number of variables and m is
    the size of the Boolean expressions.
  • Enumerating the whole DAG is computationally
    infeasible.
  • Pruning criteria:
      • Recognizing node equivalence (width reduction).
      • Merging some reductions into a single one (height
        reduction).
      • Delaying certain reductions (branch reduction).

17
Pruning Based on Node Equivalence (Width
Reduction)

s5 ? abcd, s6 ? abcd, s7 ? s1cd s5 s6 ?
s7
18
Neutral t-Reductions Should Be Applied
Immediately (Height Reduction)
  • Neutral t-reductions
      • A t-reduction which does not kill any
        s-reduction.
  • Recognition of neutral t-reductions
      • Find all reductions which are killed by this
        reduction.
      • Check if any of them is an s-reduction.

[Figure labels: s-reduction; t-reduction; normalization]
19
Nonneutral t-Reductions Should Be Delayed (Branch
Reduction)
  • The only purpose of t-reductions is to preserve
    the minimum delay in at least one path.
  • If there exists at least one s-reduction which
    preserves the minimum delay, any t-reduction at
    the current node can be avoided.
  • Computing the minimum delay corresponding to a
    Boolean expression is NP-hard.
  • Not every Boolean expression is a hard instance,
    however.
  • E.g., the minimum delay of a Boolean expression
    A1 + A2 + ... + An can be computed using a
    two-greedy approach, where the Ai are product terms
    with disjoint sets of variables.
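
The slides do not define the two-greedy approach. One plausible reading, sketched here purely as an assumption, is the Huffman-like rule of always combining the two earliest-arriving signals with a two-input gate. The sketch further assumes unit gate delay, inputs arriving at time 0, and one AND tree per product term followed by a tree that combines the terms; `two_greedy_delay` and `min_delay_of_terms` are illustrative names.

```python
import heapq


def two_greedy_delay(arrivals, gate_delay=1):
    """Delay of a tree of 2-input gates built by always merging the two
    earliest signals. `arrivals` must be a non-empty list of times."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        heapq.heappush(heap, max(first, second) + gate_delay)
    return heap[0]


def min_delay_of_terms(term_sizes):
    """Delay of A1 + ... + An where term i is a product of term_sizes[i]
    distinct variables and no variable is shared between terms."""
    term_delays = [two_greedy_delay([0] * size) for size in term_sizes]
    return two_greedy_delay(term_delays)


print(min_delay_of_terms([1, 2, 4]))
```

For terms of 1, 2, and 4 variables the product terms are ready at times 0, 1, and 2; merging the two earliest first gives a total delay of 3, whereas pairing the earliest with the latest term first would give 4.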

20
Two Independent Problems
  • Problem 1: Given two Boolean expressions E1 and
    E2, find all possible Common Sub-Expressions
    between them.
  • Problem 2: Find all Pareto-optimal
    implementations of a single Boolean expression E.

21
Problem 2: Special Case of the General Problem
  • All the Pareto-optimal implementations of a
    single expression can be evaluated using DAG
    enumeration.
  • s- and t-reductions can be defined in a similar
    way.
  • If the corresponding expression occurs more than
    once, then it is an s-reduction; otherwise it is a
    t-reduction.
  • If there are no s-reductions in the DAG, then all
    implementations will have the same area.
  • The Pareto-optimal implementation will correspond
    to the one with minimum delay and can be computed
    using a two-greedy strategy.

22
Experimental Setup
E1 = f (x1, x2, ...), E2 = g (x1, x2, ...), E3 = h (x1, x2, ...)
Conversion into Reed-Muller form
Logic synthesis
E1 = f1 (x1, x2, ...), E2 = g1 (x1, x2, ...), E3 = h1 (x1, x2, ...)
CSE enumeration
(E11, E21, E31), (E12, E22, E32), ...
Logic synthesis
Artisan Standard Cells, UMC CMOS Technology, 0.13 µm
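
The slides do not say how the conversion into Reed-Muller form is carried out. A standard way to obtain the Reed-Muller (XOR-of-products) coefficients of a function from its truth table is the binary Möbius transform, sketched below as an assumption rather than as the authors' flow. In this sketch, bit k of an index stands for variable xk, and `reed_muller_coefficients` is an illustrative name.

```python
def reed_muller_coefficients(truth_table):
    """Return c such that c[mask] == 1 iff the monomial made of the
    variables in `mask` appears in the Reed-Muller form.

    truth_table[j] is the function value for the assignment whose
    bit k gives the value of variable xk; its length must be 2**n.
    """
    coeffs = list(truth_table)
    n = len(coeffs).bit_length() - 1
    if len(coeffs) != 1 << n:
        raise ValueError("truth table length must be a power of two")
    for k in range(n):
        step = 1 << k
        for j in range(len(coeffs)):
            if j & step:
                coeffs[j] ^= coeffs[j ^ step]
    return coeffs


# Full-adder outputs over (x0, x1, x2)
sum_bit = [0, 1, 1, 0, 1, 0, 0, 1]
carry_bit = [0, 0, 0, 1, 0, 1, 1, 1]
print(reed_muller_coefficients(sum_bit))
print(reed_muller_coefficients(carry_bit))
```

The sum bit yields the monomials x0, x1, and x2 (masks 1, 2, 4), and the carry bit yields x0x1, x0x2, and x1x2 (masks 3, 5, 6).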
23
Results
6-bit Adder
5-bit Adder
Multi-input Addition
4 X 3-bit Multiplier
24
There Is Scope for Further Pruning
Area and Delay for all 6-bit adders generated by
our algorithm
Without any pruning it is impossible to handle
expressions with more than five variables.
25
but the Enumeration Algorithm Finds Interesting
Non-trivial Relations!
4x4-bit multiplier better than our best
manually-designed multiplier?!
Idea: Exploit complex dependencies among the
partial product bits of a multiplier
26
Conclusions
  • We have exploited a new form of division which is
    better than algebraic division and still less
    complex than Boolean division.
  • Key to a better exploitation of Common
    Sub-Expressions (CSEs).
  • We have introduced a CSE enumeration algorithm
    which discovers all architectures. Unfortunately,
    it is still very slow.
  • More effective pruning strategies are required,
    especially based on the inferiority of some
    implementations still explored.
  • Despite the runtime limitations, this exploration
    algorithm has already made it possible to study
    innovative architectures.
  • Exploit dependency among input bits in the
    compressors of multipliers.

Many opportunities still lie untapped in the
synthesis of arithmetic components.