Layout Driven Data Communication Optimization for High Level Synthesis

Provided by: RyanKa
Learn more at: https://cseweb.ucsd.edu

1
Layout Driven Data Communication Optimization for High Level Synthesis
Adam Kaplan, Philip Brisk and Majid Sarrafzadeh
Computer Science Department, University of California, Los Angeles
Ryan Kastner, Wenrui Gong, Xin Hao, Forrest Brewer
Dept. of Electrical and Computer Engineering, University of California, Santa Barbara

2
High Level Synthesis
  • Input: application description written in a C-like language (C,
    SystemC, Handel-C, SpecC)

[Figure: internal filter of an image convolver]
Maximize performance (area, latency, power, ...)
subject to input constraints
3
Target Architectures
  • Spatial architectures
  • Local control between data path, global data flow
    between control nodes
  • Lots of distributed computational units, memory
  • Coarse/fine grained reconfigurable architectures
  • Techniques could be used for other architectures, but may not
    make sense there: our design flow has little resource sharing

[Figures: coarse grain programmable platform; fine grain configurable platform]
4
Obligatory Design Flow Slide
5
Design Example
    int FAST(real *b, int n)
    {
        real fn;
        int i, in, nn, n2pow, n4pow, nthpo;

        n2pow = fastlog2(n);
        if (n2pow <= 0) return 0;
        nthpo = n;
        fn = nthpo;
        n4pow = n2pow / 2;

        /* radix 2 iteration required; do it now */
        if (n2pow % 2) {
            nn = 2;
            in = n / nn;
            FR2TR(in, b, b + in);
        }
        else nn = 1;

        /* perform radix 4 iterations */
        for (i = 1; i <= n4pow; i++) {
            nn *= 4;
            in = n / nn;
            FR4TR(in, nn, b, b + in, b + 2 * in, b + 3 * in);
        }

        /* perform inplace reordering */
        FORD1(n2pow, b);
        FORD2(n2pow, b);

        /* take conjugates */
        for (i = 3; i < n; i += 2) b[i] = -b[i];

        return 1;
    }

  • FAST function from MediaBench
  • Some nodes missing: simple computation, merged
    into others
  • Lines below show data communication

[Figure: control flow graph of FAST, Node 1 through Node 10]
6
Characterizing Data Communication
  • Examples of data communication schemes

[Figure: two schemes connecting Control Nodes 2, 3 and 4.
Distributed: data communication = wire.
Centralized: data communication = memory access, over a bus to memory (register bank, RAM).]
7
Identifying Data Communication
  • Determine relationship between place(s) where
    data is defined and where data is used

  • Naïve method: all use-points of a variable
    depend on all definitions of that variable
  • Not every use point uses every definition of a variable

[Figure: control flow graph with definitions a ←, b ←, a ←, b ←, a ←, c ← and uses ← b, ← c, ← a]

Need analysis to minimize the amount of data
communication
8
Use of SSA in Compilation
  • Must determine relationship between where data is
    generated and where data is used
  • Problem formulations
  • DAC03: minimize the total number of bits
    communicated between all pairs of control nodes
  • Today: minimize overall wirelength
  • SSA (Static Single Assignment)
  • Changes each variable to have a unique definition
    point
  • Must add Φ-nodes to merge definitions
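
The "unique definition point" idea can be illustrated on straight-line code. The sketch below is a minimal illustration, not the authors' implementation: the statement format `(dest, [operands])` and the function name `rename_block` are assumptions made for this example. It only handles a single basic block; merging definitions at control-flow joins is exactly where Φ-nodes would be needed, and that is not covered here.

```python
def rename_block(stmts):
    """Rename a straight-line block into SSA form.

    stmts: list of (dest, [operands]) tuples (hypothetical format).
    Each new definition of a variable gets a fresh version number,
    and each operand refers to the latest version defined so far.
    """
    version = {}   # variable name -> latest version number
    renamed = []
    for dest, operands in stmts:
        # Operands with no earlier definition in the block are left untouched.
        new_ops = [f"{v}{version[v]}" if v in version else v for v in operands]
        version[dest] = version.get(dest, 0) + 1
        renamed.append((f"{dest}{version[dest]}", new_ops))
    return renamed


if __name__ == "__main__":
    block = [("a", ["x"]), ("b", ["a"]), ("a", ["a", "b"])]
    for dest, ops in rename_block(block):
        print(dest, "<-", ops)
```

After renaming, every name on the left-hand side is defined exactly once, so each use points at one definition.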

9
SSA Fundamentals
  • SSA algorithms
  • Find location of Φ-nodes
  • Rename variables
  • Three main SSA algorithms
  • Minimal, Pruned [Cytron et al.]
  • Semi-pruned [Briggs et al.]
  • Differ in number and location of Φ-nodes
  • Minimal: insert Φ-nodes at the iterated dominance frontier (IDF)
  • Semi-pruned: insert a Φ-node at the IDF if the variable is live
    outside some basic block
  • Pruned: insert a Φ-node at the IDF if the variable is live at
    that point
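
All three variants start from the iterated dominance frontier. The following is a small, unoptimized sketch of how an IDF could be computed. It uses the textbook set-based definitions rather than the faster algorithms used in production compilers, and it assumes a CFG given as a dict from node to successor list, with every node reachable from the entry. The function names are chosen for this example only.

```python
def dominators(cfg, entry):
    """Iterative dominator sets: dom[n] = {n} | intersection of dom[p] over preds p."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom, preds


def dominance_frontiers(cfg, entry):
    """DF(x) = nodes y where x dominates a predecessor of y but not strictly y."""
    dom, preds = dominators(cfg, entry)
    df = {n: set() for n in cfg}
    for y in cfg:
        for p in preds[y]:
            for x in dom[p]:                    # x dominates predecessor p
                if x == y or x not in dom[y]:   # x does not strictly dominate y
                    df[x].add(y)
    return df


def iterated_dominance_frontier(df, def_nodes):
    """Close the dominance frontier over a set of defining nodes (worklist)."""
    idf, work = set(), list(def_nodes)
    while work:
        n = work.pop()
        for y in df[n]:
            if y not in idf:
                idf.add(y)
                work.append(y)
    return idf


if __name__ == "__main__":
    diamond = {"entry": ["A", "B"], "A": ["join"], "B": ["join"], "join": []}
    df = dominance_frontiers(diamond, "entry")
    print(iterated_dominance_frontier(df, {"A", "B"}))
```

For a variable defined in both arms of the diamond, the only candidate Φ-node location is the join node. The semi-pruned and pruned variants would then filter such candidates using liveness information, which this sketch does not model.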

10
Results SSA for Data Comm. Minimization
  • Edge weight w(i,j) = number of bits communicated
    from node i to node j
  • Total Edge Weight (TEW): sum of all edge weights; corresponds
    to the amount of data communication
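
As a rough sketch of these two definitions, the code below derives w(i,j) from SSA definitions and uses and then sums the weights into the TEW. The input format (one defining node per variable, a set of using nodes per variable, and a bit width per variable) is an assumption made for illustration, as is the choice to ignore uses inside the defining node.

```python
def edge_weights(defs, uses, bits):
    """w(i, j) = total bit width of variables defined in node i and used in node j.

    defs: variable -> defining node (SSA: exactly one definition)
    uses: variable -> set of nodes that use it
    bits: variable -> bit width
    """
    w = {}
    for var, i in defs.items():
        for j in uses.get(var, ()):
            if i != j:  # a use inside the defining node needs no inter-node edge
                w[(i, j)] = w.get((i, j), 0) + bits[var]
    return w


def total_edge_weight(w):
    """TEW: sum of all edge weights."""
    return sum(w.values())


if __name__ == "__main__":
    defs = {"a1": 1, "b1": 1, "c1": 2}
    uses = {"a1": {2, 3}, "b1": {3}, "c1": {2, 3}}
    bits = {"a1": 32, "b1": 32, "c1": 8}
    w = edge_weights(defs, uses, bits)
    print(w, total_edge_weight(w))
```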

[Chart: results on MediaBench benchmarks]
11
Further Minimizing Data Communication
  • Current SSA algorithms place Φ-nodes temporally
  • In software compilation, live ranges should be
    short
  • Appropriate in hardware?

[Figure: CFG with definitions a1 ←, b1 ←, a2 ←, b2 ←, a3 ←, c1 ← and uses ← b1, ← c1, ← a4, comparing temporal Φ-node distribution with spatial Φ-node distribution for a4 ← Φ(a2,a3); annotated TEW = 3]
12
Spatial ?-nodes Distribution Algorithm
  • d = number of uses of the Φ-node destination
  • s = number of Φ-node source values
  • Number of temporal links
  • Number of spatial links

[Figure: a3 ← Φ(a0,a1,a2) with s = 3 source values and d = 2 uses (← a3, ← a3)]

Optimal assuming ideal n-dimensional floorplan
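
The two link counts can be sketched under a simple model that is an assumption of this example and is not spelled out in the text above: a temporal placement keeps one Φ-node, fed by one link per source and feeding one link per use (s + d links), while a spatial distribution copies the Φ-node to every use, so each use is fed by every source (s · d links).

```python
def temporal_links(s, d):
    """Assumed model: one shared phi-node, s incoming links plus d outgoing links."""
    return s + d


def spatial_links(s, d):
    """Assumed model: the phi-node is duplicated at each of the d uses,
    and every copy receives all s source values directly."""
    return s * d


def prefer_spatial(s, d):
    """Under this link-count model, spatial distribution wins only if it needs fewer links."""
    return spatial_links(s, d) < temporal_links(s, d)


if __name__ == "__main__":
    for s, d in [(2, 1), (3, 2)]:
        print(s, d, temporal_links(s, d), spatial_links(s, d), prefer_spatial(s, d))
```

Under this model a Φ-node with two sources and a single use needs 3 links temporally and 2 spatially, while the s = 3, d = 2 case needs 5 and 6. Link count alone ignores where the blocks end up physically, which is the motivation for considering the floorplan.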
13
Physically Aware Compiler Transforms
  • Consider layout information during compilation
  • Modify transforms to consider physical info
  • Ideal: full physical synthesis; extremely
    accurate, but far too time consuming
  • Approximate using floorplanning
  • Much faster
  • Gives a good enough high-level physical picture
  • Our previous data comm. work
  • No physical information
  • Can lead to negative results

[Figure: design flow from application through hardware compilation to physical synthesis]
14
Physically Aware Data Communication
  • Modify placement of Φ-functions to consider
    wirelength

Φ-Placement Algorithm
  1. Given a CFG Gcfg(Vcfg, Ecfg)
  2. perform_ssa(Gcfg)
  3. calculate_def_use_chains(Gcfg)
  4. remove_back_edges(Gcfg)
  5. topological_sort(Gcfg)
  6. foreach vertex v ∈ Vcfg
  7.   foreach Φ-node φ ∈ v
  8.     s ← φ.sources
  9.     d ← def_use_chain(φ.dest)
  10.    IDF ← iterated_dominance_frontier(s)
  11.    PossiblePlacements ← findPlacementOptions(IDF)
  12.    place(φ) ← selectBest(PossiblePlacements)
  13.    distribute/duplicate φ to place(φ)
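
The selection step can be sketched as follows. This is a hypothetical illustration only: the cost model (Manhattan distance between block centers taken from a floorplan), the data layout, and the function names `phi_wirelength` and `select_best` are assumptions, and the duplication of a Φ-node across several blocks is not modeled.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) block centers."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def phi_wirelength(place, sources, uses, centers):
    """Estimated wirelength if the phi-node sits in block `place`:
    every source block feeds it, and it feeds every use block."""
    c = centers[place]
    incoming = sum(manhattan(centers[s], c) for s in sources)
    outgoing = sum(manhattan(c, centers[u]) for u in uses)
    return incoming + outgoing


def select_best(candidates, sources, uses, centers):
    """Pick the candidate block with the smallest estimated wirelength."""
    return min(candidates, key=lambda b: phi_wirelength(b, sources, uses, centers))


if __name__ == "__main__":
    # Hypothetical floorplan: block name -> center coordinates.
    centers = {"n3": (0, 0), "n4": (4, 0), "n5": (2, 1), "n6": (2, 5)}
    best = select_best(["n5", "n6"], sources=["n3", "n4"], uses=["n6"], centers=centers)
    print(best)
```

In this made-up floorplan, placing the Φ-node in the block that sits between the two sources is cheaper than placing it at the use, which shows why the best choice depends on the floorplan rather than on link counts alone.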

15
Algorithm in Action
  • Evaluate all options for Φ-nodes
  • Replicate Φ when necessary
  • Limit amount of replication: most often leads to
    more wirelength
  • Can play tricks to limit redundant placements

[Figure: CFG with definitions a1 ←, b1 ←, a2 ←, b2 ←, a3 ←, c1 ← and uses ← b1, ← c1, ← a4. Candidate locations for a4 ← Φ(a2,a3) range from the traditional (temporal) placement to the spatial placement of DAC03.]

Any of these options could yield the best
wirelength; highly dependent on the floorplan
16
Algorithm in Action
  • FAST function from MediaBench test suite

17
Algorithm in Action
18
Full Floorplanning Results
  • Simple iterative approach

  1. Initial optimization minimizes data communication
  2. Full SA (simulated annealing) based floorplanning
  3. Reoptimization based on the floorplan
  4. Full SA based floorplanning

Spectacularly negative results

19
Incremental Floorplanning
  • Incremental placement [Coudert et al.]
  • Given an optimized placement and a set of changes
    to the netlist (e.g., due to technology
    remapping) modify the placement to improve it.
  • Equally applicable to floorplanning

[Figure: initial floorplan, perturbations, modified floorplan]
20
Our Incremental Floorplanner
[Figure: initial floorplan, perturbations, modified floorplan, then the incremental floorplanner producing the incremental floorplan. Blocks are numbered 1 to 4; slicing tree nodes are annotated 32/36, 27/30.4, 5/5.6, 16/18, 11/12.4, 2/2.3 and 9/10.1.]
21
Our Incremental Floorplanner
  • Calculate area and room of each node: bottom-up
    slicing tree traversal
  • Area redistribution: top-down traversal
  • Increase area if necessary
  • Not enough space at root
  • Aspect ratios become too distorted
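
The two traversals can be sketched on a small slicing tree. This is a simplified illustration under stated assumptions, not the authors' floorplanner: room is redistributed in proportion to required area, the root is grown only when the available space is too small, and aspect ratios are ignored. The `Node` class and function names are hypothetical. The leaf areas (16, 2, 9, 5) and the root room of 36 are chosen to echo the annotations surviving from the figure; reading those annotations as "required area / assigned room" is an assumption.

```python
class Node:
    """Slicing tree node: leaves carry a required area, internal nodes have children."""

    def __init__(self, name, area=0.0, children=()):
        self.name = name
        self.children = list(children)
        self.area = float(area)  # required area (filled in bottom-up for internal nodes)
        self.room = 0.0          # space assigned by the top-down pass


def compute_area(node):
    """Bottom-up traversal: an internal node requires the sum of its children's areas."""
    if node.children:
        node.area = sum(compute_area(c) for c in node.children)
    return node.area


def redistribute(node, room):
    """Top-down traversal: split a node's room among children in proportion to area."""
    node.room = room
    for child in node.children:
        share = child.area / node.area if node.area else 0.0
        redistribute(child, room * share)


def incremental_floorplan(root, available):
    """Grow the root only if the available space cannot hold the required area."""
    required = compute_area(root)
    redistribute(root, max(available, required))
    return root


if __name__ == "__main__":
    n11 = Node("n11", children=[Node("b2", 2), Node("b9", 9)])
    n27 = Node("n27", children=[Node("b16", 16), n11])
    root = incremental_floorplan(Node("root", children=[n27, Node("b5", 5)]), 36)
    print(root.area, root.room, n27.room, n11.room)
```

With a required area of 32 and 36 units of room at the root, every node receives room scaled by the same factor of 1.125.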

Simple, yet effective. Other, more complicated
algorithms might work better.

[Figure: modified floorplan and incremental floorplan; slicing tree nodes annotated 32/36, 27/30.4, 5/5.6, 16/18, 11/12.4, 2/2.3 and 9/10.1, blocks numbered 1 to 4]
22
MediaBench Functions
#   Benchmark         Blocks  Φ-nodes  Links  Weight  Initial WL
1   adpcm coder           33       31     54    2688       35568
2   adpcm decoder         26       23     44    1952       21588
3   internal filter       10      143     60   17088      411637
4   internal expand      101       94    257   14336      317031
5   compress output       34       17     60    2368       29114
6   mpeg2dec block        62       13     66    2272       34510
7   mpeg2dec vector       16        4     26    1024        4366
8   FAST                  14        4     15     704        3714
9   FR4TR                 77       87    155     704      340697
10  det                   12        5     13    7936        3772
23
Incremental Floorplanning Results
Optimal approach: 12% overall wirelength reduction, 25% Φ-node wirelength reduction
Our approach: 6% overall wirelength reduction, 8% Φ-node wirelength reduction

[Chart: normalized wirelength per benchmark, with average]
24
Related Work
  • Hardware compilation projects using SSA
  • PDG+SSA form [UCSB]
  • CASH [CMU]
  • SA-C [UCR]
  • Sea Cucumber [BYU]
  • Physically aware behavioral synthesis techniques
  • SA for scheduling, binding and floorplanning
    [Prabhakaran97]
  • SA for binding and floorplanning [Yung-Ming94]
  • Scheduling, allocation and binding [Dougherty00]
  • Fasolt bus topology [Knapp92]
  • High level synthesis [Tarafdar00]
  • Incremental CAD
  • Problem overview/challenges [Coudert00]
  • Floorplanning [Crenshaw99]

25
Conclusions
  • It's been a long strange trip
  • SSA: a nice IR for hardware compilation
  • Explicitly shows data flow
  • Useful for exploiting parallelism
  • Compiler techniques applied to hardware design
    can reduce wirelength
  • They must be aware of physical information
  • They must use incremental floorplanning

26
Questions?
  • (and cue for applause)