Efficient DAG mapping using decomposition selection and area-delay curves using a mapping graph

1
Efficient DAG mapping using decomposition
selection and area-delay curves using a mapping
graph
  • Dirk-Jan Jongeneel
  • dirkjjn_at_cas.et.tudelft.nl

2
Acknowledgements
  • R.H.J.M. Otten
  • R. Brayton
  • Y. Watanabe
  • Y. Kukimoto
  • P. Sawkar
  • S. Burns

3
Agenda
  • Why a new mapping approach?
  • The Lehman-Watanabe algorithm
  • Potential problems for practical use
  • Area-delay optimization
  • Massoud's extension
  • Implementation using fanout-load and area guesses
  • Repowering possibilities
  • Heuristic for backward traversal for optimal
    cover selection to get multiple design points
  • Results
  • Encountered problems/potential solutions

4
Technology mapping
  • Input
  • Technology-independent optimized logic network.
  • Description of the gates in a library with their
    costs.
  • Output
  • Netlist of gates (from the library) that
    minimizes the total cost.
  • General approach
  • Construct a subject DAG for the network.
  • Represent each gate in target library by pattern
    DAGs.
  • Find an optimal-cost covering of the subject DAG
    using the collection of pattern DAGs.

5
Current Mapping strategies
  • Complexity of DAG covering
  • NP-Hard
  • Remains NP-hard even when the nodes have degree ≤
    2.
  • Tree-mapping was proposed for an optimal min-area
    cover and later also used for min delay (Keutzer).
  • If the subject DAG and the pattern DAGs are trees,
    an efficient algorithm to find the best cover
    exists.
  • It is based on a dynamic programming algorithm.
  • DAG-mapping is possible for optimal min delay
    (Kukimoto).
  • The subject DAG is not broken into trees and the
    matching part of the algorithm is slightly
    modified.

6
  • Normal approach
  • Phase 1: technology-independent optimization
  • commit to a particular Boolean network.
  • algebraic decomposition is used.
  • Phase 2: AND2/INV decomposition
  • commit to a particular decomposition of a general
    Boolean network using 2-input ANDs and inverters.
  • Phase 3: technology mapping
  • a two-step dynamic programming algorithm is
    used:
  • From PI to PO, for all nodes find all the matches
    at a node with their costs using tree-matching,
    and select the one with the lowest cost.
  • From PO to PI, select the best match at a node to
    cover a part of the subject DAG and continue
    recursively at the inputs of the selected
    match.
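The two-step dynamic program above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the mapper from the slides: the subject tree is a nested tuple of AND2 nodes, and the three-cell library (AND2, AND3, AND4 with areas 3, 4 and 5) and its patterns are hypothetical. The forward pass keeps the cheapest match per node; the backward pass reads the cover off starting at the primary output.

```python
# Minimal two-step tree mapper (illustrative sketch, not the slides' tool).
# Subject tree: nested 2-tuples are AND2 nodes, strings are primary inputs.
X = None  # placeholder for a pattern input

# Hypothetical library: name -> (pattern over AND2 nodes, area)
LIBRARY = {
    "AND2": ((X, X), 3),
    "AND3": (((X, X), X), 4),
    "AND4": (((X, X), (X, X)), 5),
}


def matches(pattern, node):
    """All ways to bind the pattern inputs to subject nodes (AND2 is commutative)."""
    if pattern is X:
        return [[node]]
    if not isinstance(node, tuple):  # a primary input cannot match an AND2
        return []
    pl, pr = pattern
    left, right = node
    bindings = []
    for a, b in ((left, right), (right, left)):
        for ma in matches(pl, a):
            for mb in matches(pr, b):
                bindings.append(ma + mb)
    return bindings


def postorder(node):
    if isinstance(node, tuple):
        for child in node:
            yield from postorder(child)
    yield node


def map_tree(root):
    best = {}  # node -> (cost, cell name, input nodes)
    # Step 1, PI -> PO: cost every match at every node, keep the cheapest.
    for node in postorder(root):
        if not isinstance(node, tuple):
            best[node] = (0, None, [])
            continue
        candidates = [
            (area + sum(best[i][0] for i in inputs), name, inputs)
            for name, (pattern, area) in LIBRARY.items()
            for inputs in matches(pattern, node)
        ]
        best[node] = min(candidates, key=lambda c: c[0])
    # Step 2, PO -> PI: take the best match and recurse at its inputs.
    cover = []

    def select(node):
        _, name, inputs = best[node]
        if name is None:
            return
        cover.append((name, inputs))
        for i in inputs:
            select(i)

    select(root)
    return best[root][0], cover
```

For `(("a", "b"), ("c", "d"))` the AND4 cell (area 5) beats two AND2s feeding a third (area 9), and for `(("a", "b"), "c")` the AND3 (area 4) beats two AND2s (area 6).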

7
Current drawback and solution using
Lehman-Watanabe Method
  • Drawback: the procedures in each phase are
    disconnected, resulting in optimal sub-results but
    a possibly sub-optimal overall result.
  • Phase 1 and 2 make critical decisions about
    algebraic and AND2/INV decompositions without
    knowing much about constraints and library.
  • Phase 3 knows about the constraints and library
    but the solution space has already been limited
    by the decisions made earlier.
  • Lehman-Watanabe Method.
  • Efficiently encode a set of AND2/INV
    decompositions into a single structure called a
    mapping graph.
  • Apply a modified tree-based or a new partial
    technology mapper while dynamically performing
    algebraic logic decomposition on the mapping
    graph.
  • DAG-mapping is naturally introduced

8
Mapping graph AND2/INV decompositions
  • f = abc can be represented in various ways.
  • We can combine them with a choice node.

9
Mapping graph AND2/INV decompositions
  • This can be represented compactly as shown.
  • This representation also encodes the following new
    decomposition.

10
Mapping graph AND2/INV decompositions
  • The complete decomposition in Ugates is

11
Mapping graph AND2/INV decompositions
  • The mapping graph is a modified Boolean network
  • Choice node: makes choices possible between
    different decompositions.
  • Cyclic: functions are written in terms of each
    other, e.g. an inverter chain of arbitrary length.
  • Reduced: no two choice nodes have the same
    function; no two AND2s have the same fanins.
  • Ugates: efficient implementation because of their
    regularity.
  • For the cht benchmark (MCNC91), there are 2.2 × 10^93
    AND2/INV decompositions. All are encoded with only
    400 ugates containing 599 AND2s in total.
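The "reduced" property can be illustrated with structural hashing (hash-consing): requesting an AND2 whose fanins already exist returns the existing node, so no two AND2s share the same fanins. This is an assumption-laden sketch with hypothetical class and method names. It only detects structurally identical nodes (the slides use BDDs to find functionally equal choice nodes), and it omits ugates and the cycles of the real mapping graph. The example encodes the three decompositions of f = abc under one choice node.

```python
class MappingGraph:
    """Toy mapping graph with PI, AND2, INV and CHOICE nodes (hypothetical)."""

    def __init__(self):
        self.nodes = []  # node id -> (kind, fanins)
        self.ids = {}    # (kind, fanins) -> node id, keeps the graph reduced

    def _node(self, kind, fanins):
        key = (kind, fanins)
        if key not in self.ids:  # only create a node if it does not exist yet
            self.ids[key] = len(self.nodes)
            self.nodes.append(key)
        return self.ids[key]

    def pi(self, name):
        return self._node("PI", (name,))

    def and2(self, a, b):
        return self._node("AND2", tuple(sorted((a, b))))  # AND2 is commutative

    def inv(self, a):
        return self._node("INV", (a,))

    def choice(self, *alternatives):
        return self._node("CHOICE", tuple(sorted(set(alternatives))))

    def count(self, kind):
        return sum(1 for k, _ in self.nodes if k == kind)


g = MappingGraph()
a, b, c = g.pi("a"), g.pi("b"), g.pi("c")
# f = abc as (ab)c, a(bc) and (ac)b, merged under one choice node.
f = g.choice(g.and2(g.and2(a, b), c),
             g.and2(a, g.and2(b, c)),
             g.and2(g.and2(a, c), b))
```

Asking for `g.and2(b, a)` again returns the existing `ab` node, so the graph still holds exactly six AND2 nodes.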

12
Tree-mapping on a Mapping Graph
  • Every time a choice node is reached, an input is
    selected and tree matching continues as usual.
  • For every choice node, all inputs have to be
    tried.
  • Cycles may occur: iterate until the costs are stable.
  • These are the inverter cycles and the cycles
    introduced by reduction and multiple encoding,
    e.g. f = g·h1 and g = f·h2.
  • DAG-mapping is automatically introduced.
  • Because the number of fanouts is unknown during
    mapping, splitting into trees is not possible, so
    matches can pass multi-fanout points, resulting
    in DAG-mapping.
  • Select the cover as usual.
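The "iterate until stable" rule for cycles can be illustrated on arrival times. The sketch below is hypothetical: it assumes unit delays for AND2 and INV, evaluates a choice node as the minimum over its alternatives, and propagates plain arrival times rather than full match costs. The example graph contains the inverter cycle: the positive phase of ab can be taken directly or through two inverters from itself.

```python
import math

# Hypothetical unit delays; CHOICE nodes add no delay.
DELAY = {"AND2": 1.0, "INV": 1.0}


def arrival_times(graph, pi_arrival):
    """Relax arrival times on a (possibly cyclic) graph until nothing changes."""
    t = {node: math.inf for node in graph}
    t.update(pi_arrival)
    changed = True
    while changed:
        changed = False
        for node, (kind, fanins) in graph.items():
            if kind == "PI":
                continue
            if kind == "CHOICE":
                new = min(t[f] for f in fanins)  # best alternative so far
            else:
                new = max(t[f] for f in fanins) + DELAY[kind]
            # With positive delays on every cycle, values only decrease and
            # settle after a finite number of passes.
            if new < t[node]:
                t[node] = new
                changed = True
    return t


# Inverter cycle: "pos" is ab directly, or ab inverted twice via "neg".
graph = {
    "a": ("PI", ()),
    "b": ("PI", ()),
    "ab": ("AND2", ("a", "b")),
    "pos": ("CHOICE", ("ab", "inv2")),
    "inv1": ("INV", ("pos",)),
    "neg": ("CHOICE", ("inv1",)),
    "inv2": ("INV", ("neg",)),
}
times = arrival_times(graph, {"a": 0.0, "b": 0.0})
```

The cycle never wins here: "pos" stays at 1.0 (the direct AND2), while going around the cycle would give 3.0.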

13
Example Tree-mapping
  • Best choice if c is later than a and b.
  • (Figure: subject graph and library pattern graph)
  • i3 is faster than i1 and i2.

14
Graph-Mapping Theory
  • Graph-mapping(G) = min_{D ∈ G} tree-mapping(D)
  • G: the mapping graph
  • D: an AND2/INV decomposition encoded in G
  • Graph-mapping finds an optimal tree
    implementation for each primary output over all
    the AND2/INV decompositions encoded in G.
  • Graph-mapping is as powerful as applying
    tree-matching exhaustively, but is typically
    exponentially faster.
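The equality above can be checked numerically in a deliberately tiny setting. The sketch assumes a library containing only an AND2 cell with unit delay, so tree-mapping one decomposition reduces to evaluating its delay under the input arrival times. In place of the real mapping graph it uses a hypothetical encoding with one choice node per subset of inputs (not the ugate encoding). The exhaustive version enumerates every decomposition; the dynamic program visits each subset once.

```python
import math
from functools import lru_cache
from itertools import combinations

AND2_DELAY = 1  # hypothetical unit delay of the only library cell


def all_trees(leaves):
    """Every AND2 decomposition (unordered binary tree) of an AND over leaves."""
    leaves = tuple(leaves)
    if len(leaves) == 1:
        return [leaves[0]]
    first, rest = leaves[0], leaves[1:]
    trees = []
    for k in range(len(rest)):  # the part holding `first` gets k extra leaves
        for extra in combinations(rest, k):
            right = tuple(x for x in rest if x not in extra)
            for left_tree in all_trees((first,) + extra):
                for right_tree in all_trees(right):
                    trees.append((left_tree, right_tree))
    return trees


def tree_delay(tree, arrival):
    if isinstance(tree, str):
        return arrival[tree]
    return max(tree_delay(t, arrival) for t in tree) + AND2_DELAY


def exhaustive(leaves, arrival):
    """min over all decompositions of the (delay-only) tree mapping."""
    return min(tree_delay(t, arrival) for t in all_trees(leaves))


def graph_mapping(leaves, arrival):
    """Same optimum by dynamic programming, one choice node per subset."""
    @lru_cache(maxsize=None)
    def best(subset):
        if len(subset) == 1:
            return arrival[next(iter(subset))]
        first, *rest = sorted(subset)
        result = math.inf
        for k in range(len(rest)):  # every unordered split into two parts
            for extra in combinations(rest, k):
                left = frozenset((first, *extra))
                result = min(result,
                             max(best(left), best(subset - left)) + AND2_DELAY)
        return result

    return best(frozenset(leaves))
```

For five inputs the exhaustive search evaluates 105 trees, while the dynamic program works over 26 multi-input subsets; both return the same optimum.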

15
Lambda and Delta Mapping
  • Lambda mapping
  • 1. Encode all the AND2 decompositions of the
    product terms and then of the sum terms for all
    the nodes.
  • 2. Apply graph-mapping.
  • Combines phases 2 and 3 (AND2/INV
    decomposition and mapping).
  • Delta mapping
  • 1. Encode all the AND2 decompositions of the
    product terms and then of the sum terms for all
    the nodes.
  • 2. Iteratively apply graph-mapping and logic
    decomposition until nothing changes any more.
  • Combines phases 1, 2 and 3 (algebraic
    optimization, AND2/INV decomposition and
    mapping).

16
Dynamic Logic Decomposition
  • During mapping, find D-patterns and add the
    corresponding F-patterns dynamically.
  • D-pattern: ab + ac; F-pattern: a(b + c).
  • If a is critical, the F-pattern is usually better,
    because a then passes through only one gate.
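The delay argument can be made concrete with a small calculation. Assuming (hypothetically) that every 2-input gate, AND2 or OR2, has the same unit delay d, the D-pattern puts a behind an AND2 and an OR2, while the F-pattern puts it behind a single AND2:

```python
def d_pattern_arrival(ta, tb, tc, d=1):
    """ab + ac: two AND2s in parallel, then an OR2 (unit gate delay d assumed)."""
    return max(max(ta, tb) + d, max(ta, tc) + d) + d


def f_pattern_arrival(ta, tb, tc, d=1):
    """a(b + c): an OR2 of b and c, then a single AND2 with a."""
    return max(ta, max(tb, tc) + d) + d
```

With a arriving at 5 and b, c at 0, the D-pattern output arrives at 7 and the F-pattern output at 6; when a is early and b, c arrive at 5, both arrive at 7. The F-pattern also uses two gates instead of three.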

17
Dynamic Logic Decomposition
  • D-pattern search and F-pattern addition in a graph.
  • Note: adding an F-pattern may introduce a new
    D-pattern.

18
Example choosing the right decomposition
  • Tree-matching on graph and AND2/INV
    decompositions.
  • AND8 node with arrival time a = 3·delay(AND2).
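The slide's figure is not part of the transcript, so the numbers here are an illustrative assumption: delay(AND2) = 1, input a arrives at 3 and the other seven inputs at 0. A balanced AND2 tree, as a fixed decomposition would give, sends a through three levels. Pairing the two earliest signals first (a greedy rule used here only for illustration) lets a enter at the last AND2.

```python
import heapq


def balanced_arrival(arrivals, d=1):
    """Arrival at the root of a balanced AND2 tree over the inputs, in order."""
    level = list(arrivals)
    while len(level) > 1:
        nxt = [max(level[i], level[i + 1]) + d
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd input passes to the next level unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


def greedy_arrival(arrivals, d=1):
    """Repeatedly combine the two earliest signals with an AND2."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        x = heapq.heappop(heap)
        y = heapq.heappop(heap)
        heapq.heappush(heap, max(x, y) + d)
    return heap[0]


arrivals = [3, 0, 0, 0, 0, 0, 0, 0]  # input a first, then seven early inputs
```

The balanced tree gives 6, the arrival-aware decomposition gives 4. In this instance 4 cannot be improved, because a must pass through at least one AND2 (3 + 1).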

19
Possible problems for practical use
  • The size of the graph grows with the initial
    node size N, requiring more memory.
  • The size of the choice nodes also grows with N,
    slowing down tree-matching.
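The growth with N can be quantified. The number of AND2 decompositions of an N-input AND is (2N - 3)!!, the number of unordered binary trees with N labelled leaves. The other two functions assume a hypothetical encoding with one choice node per subset of at least two inputs (not necessarily how ugates encode it). They show that even a compact graph grows exponentially, and that the top choice node has 2^(N-1) - 1 alternatives to try.

```python
def num_and2_decompositions(n):
    """AND2 decompositions of an n-input AND: (2n - 3)!!."""
    count = 1
    for k in range(3, 2 * n - 2, 2):  # odd factors 3, 5, ..., 2n - 3
        count *= k
    return count


def num_subset_choice_nodes(n):
    """Choice nodes in a toy encoding with one node per subset of >= 2 inputs."""
    return 2 ** n - n - 1


def top_choice_fanin(n):
    """Alternatives at the top choice node: unordered splits of n inputs in two."""
    return 2 ** (n - 1) - 1
```

For N = 3, 6 and 10 this gives 3, 945 and 34,459,425 decompositions; 4, 57 and 1013 choice nodes; and 3, 31 and 511 alternatives at the top choice node.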

20
  • These two problems can be partially solved by
    choosing a maximum value for N.
  • Large value: more memory and longer run time, but
    many more possibilities to find matches,
    resulting in a better cover.
  • Small value: less memory and shorter run time, but
    a worse cover.
  • Another possibility is the use of partial
    matching.
  • Depending on the library model and delay model,
    the same matches can be found much faster because
    of pruning, except for leaf DAGs.
  • Worst case: AND10 decomposition and an AND4 cell
  • tree-matching: 61 sec
  • partial-matching: 0.74 sec
  • Disadvantage: as soon as we use more complex
    modeling, it becomes an approximation and we might
    prune away potentially better matches.
  • We store all distinct partial matches at each
    node, which increases memory use.

21
Partial matching
  • The library
  • A library cell is composed of the root cell of
    its AND2/INV decomposition and the partials that
    represent its inputs.

22
Example Partial matching
  • At each node try to find all the partial matches
    by combining all the partials at the inputs of
    the root-cell. Then evaluate the partials that
    are also complete matches and save the best as
    -.
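For a library of AND-type cells, the pruning can be sketched as follows. This is a hypothetical illustration that assumes ANDk cells with k = 2, 3, 4, the same delay from every input pin (the condition stated on slide 23), and a subject tree of AND2 nodes. A partial of order k rooted at a node covers k cell inputs. Because all pins are equivalent, only the earliest partial of each order needs to be kept, and that is where the savings over full tree matching come from.

```python
import math

# Hypothetical ANDk cells with the same delay from every input pin.
LIB_DELAY = {2: 1.0, 3: 1.25, 4: 1.5}


def partials(node, arrival):
    """Return (best, parts) for a subject node.

    best:  earliest output arrival of a complete match rooted at the node
    parts: order k -> earliest input arrival of a partial covering k inputs
    """
    if isinstance(node, str):  # primary input: no partials, only its arrival
        return arrival[node], {}
    best_l, parts_l = partials(node[0], arrival)
    best_r, parts_r = partials(node[1], arrival)
    # Each root input is either a cell input (order 1, using the complete
    # match there) or lies inside the same cell (a partial of order k).
    options_l = {1: best_l, **parts_l}
    options_r = {1: best_r, **parts_r}
    parts = {}
    for i, ti in options_l.items():
        for j, tj in options_r.items():
            k = i + j
            if k in LIB_DELAY:
                # Pruning: keep only the earliest partial of each order.
                parts[k] = min(parts.get(k, math.inf), max(ti, tj))
    best = min(parts[k] + LIB_DELAY[k] for k in parts)
    return best, parts


subject = (("a", "b"), ("c", "d"))
```

With all inputs at 0 the AND4 wins (1.5). If a arrives at 3, the best choice is an AND3 whose inputs are a, b and an AND2 of c and d (4.25).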

23
Condition for equality
  • Partial matching and tree matching give equal
    results if we assume that:
  • The delays from all inputs to the output of the
    library cell are equal under all conditions.
  • partial1: i1 = 1, i2 = 4 (better or worse?)
    partial2: i1 = 2, i2 = 3
  • Different input-output delays: and2 a2, b2
    -> 2; and4 a1, b4, c4, d4 -> 1
  • Change due to load dependency: and2 a2, b2
    -> 2; loaded and2 a3, b6 -> 1
  • The area of a match is the sum of the areas of
    its inputs and of the cell itself.
  • Leaf DAGs are not possible because no relation
    is known between the partials of two inputs.
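The (1, 4) versus (2, 3) comparison shows why the equal-delay condition matters. Assuming hypothetical pin-to-output delays for a 2-input cell, which partial is better depends on those delays. Keeping only one partial per order is therefore safe only when the delays are all equal:

```python
def output_arrival(input_arrivals, pin_delays):
    """Output arrival of a cell whose input pins have individual delays."""
    return max(t + d for t, d in zip(input_arrivals, pin_delays))


partial1, partial2 = (1, 4), (2, 3)
equal = (output_arrival(partial1, (1, 1)), output_arrival(partial2, (1, 1)))
skewed = (output_arrival(partial1, (3, 0)), output_arrival(partial2, (3, 0)))
```

With equal pin delays of 1, partial2 is better (4 versus 5). With pin delays of 3 and 0 the order flips, and partial1 wins (4 versus 5).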

24
Advantages of Graph-mapping
  • Optimal decomposition is chosen with respect to
    constraints.
  • Dynamic decomposition can do a better job than
    technology independent optimization.
  • Encoding more initial circuits possible.
  • Sharing is maximized resulting in lower area.
  • Has potential for interesting repowering
    decisions during matching.
  • Tradeoffs possible between runtime, quality and
    memory

25
Area-Delay estimation
  • Massoud's proposal for an area-delay tradeoff
  • Maintain an area-delay curve at each node
    composed of the non-inferior results of matching.
  • A solution (t1, a1) is non-inferior if there is no
    solution (t2, a2) such that t2 ≤ t1 and a2 < a1,
    OR t2 < t1 and a2 ≤ a1.
  • Use a delta-t to cut down the number of points in
    the curve, because combining curves could give an
    exponential increase at each stage.
  • During selection of matches to cover the graph,
    select the match that meets the required time, and
    recurse as usual.
  • To improve results, loads can be used as soon as
    they are known.
  • Outside the critical path we can now select
    smaller covers.
  • Optimal only for trees under no-load conditions.
  • A DAG approximation is possible using an area
    guess of area/n for an n-fanout point, to
    encourage sharing.
  • Load can be approximated by n × a default value,
    and could be corrected for the real load during
    matching and covering.
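The curve operations can be sketched in a few lines. Points are (delay, area) pairs. `combine` builds the curve at the output of a 2-input cell from its two input curves and keeps the non-inferior points, following the definition above. The delta-t rule in `thin` is one hypothetical reading of the slide (drop any point less than delta_t slower than the last kept one); the slides do not specify the exact rule.

```python
import math


def non_inferior(points):
    """Keep the (delay, area) points that no other point dominates."""
    kept = []
    best_area = math.inf
    # Sorted by delay, then area: a point survives only if it is strictly
    # smaller in area than every faster or equally fast point kept so far.
    for t, a in sorted(set(points)):
        if a < best_area:
            kept.append((t, a))
            best_area = a
    return kept


def combine(curve_a, curve_b, cell_delay, cell_area):
    """Curve at the output of a 2-input cell fed by two input curves."""
    points = [(max(ta, tb) + cell_delay, aa + ab + cell_area)
              for ta, aa in curve_a for tb, ab in curve_b]
    return non_inferior(points)


def thin(curve, delta_t):
    """Hypothetical delta-t rule on a curve sorted by increasing delay."""
    kept = []
    for t, a in curve:
        if not kept or t - kept[-1][0] >= delta_t:
            kept.append((t, a))
    return kept
```

For input curves `[(1, 4), (2, 2)]` and `[(1, 3), (3, 1)]` and a cell with delay 1 and area 1, the combined curve is `[(2, 8), (3, 6), (4, 4)]`; thinning with `delta_t = 2` leaves `[(2, 8), (4, 4)]`.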

26
Example using area-delay curve
  • Combine the points of the two inputs to create a
    new curve with non inferior points.
  • At cover selection we note a non-critical input
    (A) and select a match with a lower area.
  • Result: area = 7.5 (normally it would be 9).

27
Implementation using area and fanout load guesses
  • We use area/n as the area guess to encourage
    sharing, but n is special.
  • n is the number of fanouts of the node in the
    original network. The number of real fanouts in
    the graph can't be used, because it is unknown how
    many of them will actually be used.
  • Nodes inside a decomposition will most likely
    only be used to get the best decomposition, so
    n = 1.
  • When part of a match crosses a multi-fanout
    point, the areas of those inputs of the match are
    also multiplied by 1/n, where n is the value at
    the multi-fanout point.
  • The assumption is that after reconversion the
    divided areas add up to the original value again.

28
Implementation using area and fanout load guesses
  • To account for load, so that we can keep the
    non-inferior points of the curve and put them in
    the right place, we use a load guess of n × l_avr.
  • n is the same as in the area guess.
  • l_avr should be about equal for all library gates.
  • PO loads are used directly to get as exact a
    result as possible. They can be considerably higher.
  • If a match crosses a multi-fanout point, the
    delays of the inputs that cross are increased.
    This accounts for the fact that the load at nodes
    inside the decomposition may no longer be
    1 × l_avr, since they could be used more than
    once.
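A load-dependent delay can be sketched with a linear model of the kind used in genlib libraries: a block delay plus a fanout delay times the load. The constant `L_AVR` and the numbers below are hypothetical. Internal nodes use the n × l_avr guess, while primary outputs use their known load.

```python
L_AVR = 1.0  # hypothetical average input load of a library gate


def load_guess(n, po_load=None):
    """Estimated load: the known PO load if given, else n * l_avr."""
    if po_load is not None:
        return po_load
    return n * L_AVR


def gate_delay(block_delay, fanout_delay, load):
    """Linear delay model: block delay plus a load-dependent part."""
    return block_delay + fanout_delay * load
```

A node with three fanouts gets an estimated load of 3.0 and, with a block delay of 1 and a fanout delay of 0.25, a delay of 1.75; a primary output with load 10 gets 3.5.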

29
Reduction of curve/runtime
  • Area-spacing-based reduction to a certain
    limit, say 10, such that for an n-input cell at
    most 10^n matches have to be
    evaluated.
  • If the area difference is < 2% of the area ->
    delete the slowest point.
  • Increase the percentage until the number of points
    is < limit.
  • More points give exponentially growing runtime.
  • A tradeoff is possible between runtime and the
    quality of the result.
  • If we do this based on time, we get a curve with a
    few fast points and a lot of slow points with almost
    equal area.
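The area-spacing reduction can be sketched as a loop. Assumptions: the curve is a non-inferior list of (delay, area) points with positive areas, the threshold starts at 2% and grows by 2% per round (the step size is our choice), and the loop stops once at most `limit` points remain. Because points are compared by area, the fast end of the curve is preserved, which is the reason for spacing by area rather than by time.

```python
def reduce_curve(curve, limit=10, start_pct=0.02, step=0.02):
    """Area-spacing reduction of a non-inferior (delay, area) curve.

    Neighbouring points whose areas differ by less than pct of the faster
    point's area are merged by deleting the slower one; pct grows each round
    until at most `limit` points remain. Areas are assumed positive.
    """
    curve = sorted(curve)  # fastest first, so areas decrease along the list
    pct = start_pct
    while len(curve) > limit:
        kept = [curve[0]]  # the fastest point is always kept
        for t, a in curve[1:]:
            prev_area = kept[-1][1]
            if prev_area - a < pct * prev_area:
                continue  # almost the same area but slower: delete it
            kept.append((t, a))
        curve = kept
        pct += step
    return curve
```

A curve that already has `limit` points or fewer is returned unchanged (apart from sorting).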

30
Repowering opportunities
  • Several different repowering possibilities will
    be in the curve and the best one under the
    current conditions will be chosen.
  • (Figures: capacitance splitting; serial repowering)

31
Repowering opportunities
  • Complex repowering

32
Heuristics for cover selection.
  • Select a match that meets timing and calculate
    the required times for its fanins.
  • Do parallel repowering on the critical path.
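The backward selection can be sketched on a small tree. Assumptions: a leaf is given directly as a list of alternative (delay, area) points, an internal node is a hypothetical tuple (cell delay, cell area, left, right), and the fanins' required time is the node's required time minus the cell delay. Because the cell's arrival is max(left, right) + delay, each fanin can independently take its smallest point that meets that required time.

```python
def curve(node):
    """Non-inferior (delay, area) points of a node; leaves carry their own."""
    if isinstance(node, list):
        return node
    d, area, left, right = node
    points = sorted((max(tl, tr) + d, al + ar + area)
                    for tl, al in curve(left) for tr, ar in curve(right))
    kept, best_area = [], float("inf")
    for t, a in points:
        if a < best_area:  # drop points dominated by a faster one
            kept.append((t, a))
            best_area = a
    return kept


def select(node, required):
    """Backward pass: cheapest point meeting `required`, then recurse."""
    feasible = [(a, t) for t, a in curve(node) if t <= required]
    if not feasible:
        raise ValueError("required time cannot be met")
    area, arrival = min(feasible)
    choice = {"arrival": arrival, "area": area}
    if not isinstance(node, list):
        d, _, left, right = node
        # Required time for the fanins: what is left after this cell's delay.
        choice["fanins"] = [select(left, required - d),
                            select(right, required - d)]
    return choice


A = [(1, 4), (3, 2)]  # fast and large, or slow and small
B = [(1, 4), (2, 3)]
node = (1, 1, A, B)  # 2-input cell with delay 1 and area 1
```

With a required time of 3 the selection pays area 8 to meet timing. With a relaxed required time of 10 it drops to the smallest implementation, area 6, as it would on a non-critical path.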

33
Results
  • MCNC test cases
  • using a modified lib2.genlib (input-output delays
    are equal)
  • using partial matching (equivalent to tree
    matching because of the library)
  • Nmax = 10
  • using load and area guesses

34
Results
  • Compare the results of DAG mapping, DAG mapping
    with area recovery, and graph mapping with area
    recovery.

35
(No Transcript)
36
(No Transcript)
37
(No Transcript)
38
(No Transcript)
39
  • Difference between partial and tree matching when
    crossing multi-fanout points:
  • partial matching -> hardly any crossover
  • tree matching -> too much crossover on
    non-critical paths
  • This is a result of the model used at crossover.
  • Partial matching: crossover guesses are applied
    only at complete matches -> matches that cross are
    pruned out early. This can be improved by also
    using crossing information for partials.
  • Tree matching: crossover matches compete with
    matches at the fanout point -> they are somewhat
    faster but have about the same area guess.
    Crossover matches should only be used on critical
    paths, so if we make their area somewhat larger,
    they will not compete with the other matches on
    non-critical paths.
  • Important: keep comparisons fair, not only within
    a cone but also between cones.

40
Encountered problems/Potential solutions
  • Runtime is a problem with tree matching, and
    quality with partial matching.
  • C1355: tree -> > 12 hours; partial -> 10 min.
  • The tree-matching implementation is very simple.
  • Explore faster techniques using properties of the
    graph.
  • Tree matching is currently needed for leaf-DAGs
    (but MUX4 is already a disaster).
  • Partial matching
  • Approximate library cells by a single
    input-to-output delay.
  • Use tree matching if one input is much
    slower.
  • Use a partial ordering for area/delay and for the
    (1,4) <-> (2,3) problem.
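A partial ordering can keep both candidates in the (1, 4) versus (2, 3) case. In this hypothetical sketch, a partial is a pair (input arrival times, area). One partial is discarded only when another is no worse in every input arrival and in area, and strictly better somewhere.

```python
def dominates(p, q):
    """p, q = (input arrival times, area); True if p is never worse and once better."""
    (tp, ap), (tq, aq) = p, q
    no_worse = all(x <= y for x, y in zip(tp, tq)) and ap <= aq
    better = any(x < y for x, y in zip(tp, tq)) or ap < aq
    return no_worse and better


def keep_non_dominated(partials):
    """Keep every partial that no other partial dominates."""
    return [p for p in partials if not any(dominates(q, p) for q in partials)]
```

Given `((1, 4), 5)`, `((2, 3), 5)` and `((2, 4), 6)`, the first two are incomparable and both survive, while the third is dominated and removed.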

41
  • BDDs are used to reduce the graph, which causes
    a problem when they blow up.
  • This often occurs when there are many
    XOR/selector-type gates.
  • Not always using BDDs, or doing something about
    the variable ordering, could be a solution.
  • It is possible not to use BDDs, but then there is
    less sharing, and multiple encoding of networks is
    not possible.

42
  • Cycles or loops make matching and covering
    difficult.
  • Large loops have to be matched by iterating until
    nothing changes, which causes problems for the area
    and load guesses.
  • The inverter cycle always exists and causes a
    typical problem for the load guesses, because
    inverters follow each other after two matches are
    connected.
  • Extended data structures are needed to store more
    information.
  • During iteration we have to keep track of which
    information came from which fanout
    assumption.
  • For the inverter cycle we need an extended data
    structure that records where a match
    comes from and where it connects to.

43
  • Inverter problems for the guesses.
  • Multi-fanout point f: a NAND match crossing f ->
    increase the delay.
  • But if we add an inverter, we end at the
    multi-fanout point f and add delay again.
  • Do not count it in the case of a NAND. But if it is
    connected to a NOR, the delay should increase to
    favor sharing.

44
Conclusion
  • Graph mapping has the potential of finding
    smaller and/or faster results.
  • Offers different design points to choose from.
  • Runtime, quality and memory use are adjustable
    and can be traded off against each other.
  • Could be extended to make decisions about other
    aspects as well, such as power.