Layout Driven Data Communication Optimization for High Level Synthesis

Slides: 27

Provided by: RyanKa

Learn more at: https://cseweb.ucsd.edu

Transcript and Presenter's Notes

1
Layout Driven Data Communication Optimization for High Level Synthesis

Adam Kaplan, Philip Brisk and Majid Sarrafzadeh
Computer Science Department, University of California, Los Angeles

Ryan Kastner, Wenrui Gong, Xin Hao, Forrest Brewer
Dept. of Electrical and Computer Engineering, University of California, Santa Barbara

2
High Level Synthesis

Input: Application description written in C (C, SystemC, HandelC, SpecC)

Internal filter of an image convolver
Maximize performance (area, latency, power, ...) subject to input constraints
3
Target Architectures

Spatial architectures
Local control between data path, global data flow
between control nodes
Lots of distributed computational units, memory
Coarse/fine grained reconfigurable architectures
Techniques could be used for other architectures
May not make sense
Our design flow has little resource sharing

Coarse grain programmable platform
Fine grain configurable platform
4
Obligatory Design Flow Slide
5
Design Example
FAST function from MediaBench
Some nodes missing - simple computation, merged into others
Lines below show data communication

    int FAST(real *b, int n)
    {
        real fn;
        int i, in, nn, n2pow, n4pow, nthpo;

        n2pow = fastlog2(n);
        if (n2pow <= 0) return 0;
        nthpo = n;
        fn = nthpo;
        n4pow = n2pow / 2;

        /* radix 2 iteration required; do it now */
        if (n2pow % 2) {
            nn = 2;
            in = n / nn;
            FR2TR(in, b, b + in);
        } else {
            nn = 1;
        }

        /* perform radix 4 iterations */
        for (i = 1; i <= n4pow; i++) {
            nn *= 4;
            in = n / nn;
            FR4TR(in, nn, b, b + in, b + 2 * in, b + 3 * in);
        }

        /* perform inplace reordering */
        FORD1(n2pow, b);
        FORD2(n2pow, b);

        /* take conjugates */
        for (i = 3; i < n; i += 2)
            b[i] = -b[i];

        return 1;
    }

[Figure: control flow graph of FAST, Node 1 through Node 10, with lines showing data communication]
6
Characterizing Data Communication

Examples of data communication schemes

Distributed: data communication = wire
Centralized: data communication = memory access, over a bus to memory (register bank, RAM)

[Figure: Control Nodes 2, 3 and 4 under each scheme]
7
Identifying Data Communication

Determine relationship between place(s) where data is defined and where data is used

Naïve method: all use-points of a variable depend on all definitions of that variable
Not all use points use a variable

[Figure: control flow graph with definitions a ←, b ←, a ←, c ← and uses ← b, ← c, ← a]

Need analysis to minimize the amount of data communication
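The gap between the naïve method and a proper analysis can be sketched in a few lines. This is only an illustration, not the analysis used in the talk: the CFG encoding (`succ`, `defs`, `uses`) and the function names are hypothetical, and each node is assumed to read its values on entry, before its own definitions take effect. A standard reaching-definitions pass stands in for the "analysis" the slide asks for.

```python
from collections import defaultdict

def naive_links(defs, uses):
    """Naive method: every definition of a variable feeds every use of it."""
    links = set()
    for d_node, d_vars in defs.items():
        for u_node, u_vars in uses.items():
            if d_node != u_node:
                for var in d_vars & u_vars:
                    links.add((d_node, u_node, var))
    return links

def reaching_links(succ, defs, uses):
    """Keep only links whose definition actually reaches the use."""
    preds = defaultdict(set)
    for node, targets in succ.items():
        for t in targets:
            preds[t].add(node)

    def reach_in(node, out):
        # Definitions arriving at a node: union over its predecessors.
        return set().union(*(out[p] for p in preds[node]))

    out = {node: set() for node in succ}  # (def_node, var) pairs leaving a node
    changed = True
    while changed:
        changed = False
        for node in succ:
            killed = defs.get(node, set())
            kept = {(d, v) for d, v in reach_in(node, out) if v not in killed}
            new_out = kept | {(node, v) for v in killed}
            if new_out != out[node]:
                out[node], changed = new_out, True
    return {(d, node, v) for node in succ
            for d, v in reach_in(node, out) if v in uses.get(node, set())}

# Hypothetical straight-line example: node 2 redefines a, so node 1's a is dead.
succ = {1: [2], 2: [3], 3: []}
defs = {1: {"a", "b"}, 2: {"a"}}
uses = {3: {"a", "b"}}
naive = naive_links(defs, uses)
precise = reaching_links(succ, defs, uses)
```

Here the naïve method reports three links into node 3, while the reaching-definitions pass drops the link for the overwritten definition of `a`.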
8
Use of SSA in Compilation

Must determine relationship between where data is
generated and where data is used
Problem formulations
DAC03: Minimize the total number of bits communicated between all pairs of control nodes
Today: Minimize overall wirelength
SSA (Static Single Assignment)
Changes each variable to have a unique definition point
Must add φ-nodes to merge definitions

9
SSA Fundamentals

SSA algorithms
Find location of φ-nodes
Rename variables
Three main SSA algorithms
Minimal, Pruned (Cytron et al.)
Semi-pruned (Briggs et al.)
Differ in number and location of φ-nodes
Minimal: insert φ-nodes at iterated dominance frontier (IDF)
Semi-pruned: insert φ-node at IDF if variable live outside some basic block
Pruned: insert φ-node at IDF if variable live at that time
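As a rough illustration of where minimal SSA places φ-nodes, the sketch below computes dominators, dominance frontiers and the iterated dominance frontier for a small CFG. The graph, the node names and the function names are hypothetical; the dominator computation is the simple iterative set-based one, chosen for brevity rather than speed, and it assumes every non-entry node is reachable.

```python
def dominators(succ, entry):
    """Iterative dominator sets; assumes every non-entry node has a predecessor."""
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            preds[t].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom, preds

def dominance_frontier(succ, entry):
    """DF(x) = nodes y where x dominates a predecessor of y but not y strictly."""
    dom, preds = dominators(succ, entry)
    df = {n: set() for n in succ}
    for y in succ:
        for p in preds[y]:
            for x in dom[p]:
                if x == y or x not in dom[y]:
                    df[x].add(y)
    return df

def iterated_dominance_frontier(df, def_nodes):
    """Worklist closure of DF over the nodes that define the variable."""
    idf, work = set(), list(def_nodes)
    while work:
        n = work.pop()
        for y in df[n]:
            if y not in idf:
                idf.add(y)
                work.append(y)
    return idf

# Hypothetical CFG: a diamond (A -> B | C -> D) inside a loop (D -> A).
succ = {"entry": ["A"], "A": ["B", "C"], "B": ["D"], "C": ["D"],
        "D": ["A", "exit"], "exit": []}
df = dominance_frontier(succ, "entry")
idf = iterated_dominance_frontier(df, {"B"})  # variable defined only in B
```

For a variable defined in `B`, the frontier is the join `D`; iterating adds the loop header `A`. Semi-pruned and pruned SSA would then filter these candidates by liveness, which this sketch does not model.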

10
Results: SSA for Data Comm. Minimization

Edge Weight w(i,j) = number of bits communicated from node i to j
Total Edge Weight (TEW) - corresponds to amount of data communication
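A small sketch of the two metrics, assuming data communication is given as (source node, destination node, variable) links and that each variable has a known bit width. The link list, the widths and the function names are made up for illustration.

```python
from collections import defaultdict

def edge_weights(links, bits):
    """w(i, j): total bits of all variables communicated from node i to node j."""
    w = defaultdict(int)
    for src, dst, var in links:
        w[(src, dst)] += bits[var]
    return dict(w)

def total_edge_weight(w):
    """TEW: sum of w(i, j) over all edges."""
    return sum(w.values())

# Hypothetical links and bit widths.
links = [(1, 3, "a"), (1, 3, "b"), (2, 3, "a")]
bits = {"a": 32, "b": 16}
w = edge_weights(links, bits)
tew = total_edge_weight(w)
```

With these inputs the edge from node 1 to node 3 carries 48 bits, the edge from node 2 to node 3 carries 32, and the TEW is 80.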

[Chart: results on the MediaBench benchmarks]
11
Further Minimizing Data Communication

Current SSA algorithms place φ-nodes temporally
In software compilation, live ranges should be short
Appropriate in hardware?

Spatial φ-node distribution
Temporal φ-node distribution

[Figure: definitions a1 ←, b1 ←, a2 ←, b2 ←, a3 ←, c1 ←; uses ← b1, ← c1; φ-node a4 ← φ(a2,a3); use ← a4; TEW = 3]
12
Spatial ?-nodes Distribution Algorithm

d = number of uses of φ-node destination
s = number of φ-node source values
Number of temporal links
Number of spatial links

[Figure: a3 ← φ(a0,a1,a2) with s = 3; two uses ← a3, so d = 2]

Optimal assuming ideal n-dimensional floorplan
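The link-count formulas did not survive in this transcript, so the sketch below rests on an assumption: a temporal φ-node receives one link from each of its s sources and sends one link to each of its d uses (s + d links), while a spatial distribution copies the φ-node to every use and wires each source to each copy (s × d links). All function names are hypothetical.

```python
def temporal_links(s, d):
    """Assumed count: s links into the single phi-node plus d links out of it."""
    return s + d

def spatial_links(s, d):
    """Assumed count: every source wired directly to every use."""
    return s * d

def choose_distribution(s, d):
    """Pick the distribution with fewer links; ties stay temporal."""
    return "spatial" if spatial_links(s, d) < temporal_links(s, d) else "temporal"

one_use = choose_distribution(2, 1)    # two sources, one use
two_uses = choose_distribution(3, 2)   # three sources, two uses
```

Under this assumption a φ-node with two sources and a single use is better distributed spatially (2 links instead of 3), while one with s = 3 and d = 2 stays temporal (5 links instead of 6).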
13
Physically Aware Compiler Transforms

Consider layout information during compilation
Modify transforms to consider physical info
Ideal: full physical synthesis is extremely accurate, but way too time consuming

Approximate using floorplanning
Much faster
Gives good enough high level physical picture

Our previous data comm. work
No physical information
Can lead to negative results

[Figure: flow from application through Hardware Compilation to Physical Synthesis]
14
Physically Aware Data Communication

Modify placement of φ-functions to consider wirelength

φ-Placement Algorithm

Given a CFG Gcfg(Vcfg, Ecfg)
perform_ssa(Gcfg)
calculate_def_use_chains(Gcfg)
remove_back_edges(Gcfg)
topological_sort(Gcfg)
foreach vertex v ∈ Vcfg
    foreach φ-node φ ∈ v
        s ← φ.sources
        d ← def_use_chain(φ.dest)
        IDF ← iterated_dominance_frontier(s)
        PossiblePlacements ← findPlacementOptions(IDF)
        place(φ) ← selectBest(PossiblePlacements)
        distribute/duplicate φ to place(φ)
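One way to read the selectBest step is as a wirelength comparison over candidate control nodes. The sketch below is an assumption-laden illustration, not the authors' cost function: the floorplan is reduced to one (x, y) centre per control node, wirelength is Manhattan distance with unit-width wires, replication is ignored, and every name (`floorplan`, `placement_cost`, `select_best`, the node labels) is hypothetical.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def placement_cost(candidate, sources, uses, floorplan):
    """Wirelength if the phi-node sits at `candidate`:
    wires from every source to the phi-node, then from the phi-node to every use."""
    here = floorplan[candidate]
    into_phi = sum(manhattan(floorplan[s], here) for s in sources)
    out_of_phi = sum(manhattan(here, floorplan[u]) for u in uses)
    return into_phi + out_of_phi

def select_best(candidates, sources, uses, floorplan):
    """Return the candidate node with the smallest estimated wirelength."""
    return min(candidates,
               key=lambda c: placement_cost(c, sources, uses, floorplan))

# Hypothetical floorplan: centre coordinates of five control nodes.
floorplan = {"n2": (0, 0), "n3": (4, 0), "n5": (2, 1), "n6": (2, 3), "n7": (2, 5)}
sources, uses = ["n2", "n3"], ["n7"]
candidates = ["n5", "n6", "n7"]
best = select_best(candidates, sources, uses, floorplan)
best_cost = placement_cost(best, sources, uses, floorplan)
```

In this toy floorplan the join node `n5` wins with a cost of 10, against 12 for `n6` and 14 for placing the φ-node at the use `n7`; a different floorplan can reverse that order, which is the point of making the choice layout driven.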

15
Algorithm in Action

Evaluate all options for φ-nodes
Replicate φ when necessary
Limit amount of replication - most often leads to more wirelength
Can play tricks to limit redundant placements
Any of these options could yield the best wirelength. Highly dependent on the floorplan

[Figure: definitions a1 ←, b1 ←, a2 ←, b2 ←, a3 ←, c1 ←; uses ← b1, ← c1, ← a4; candidate locations for a4 ← φ(a2,a3), from Traditional (temporal) to Spatial (DAC03)]
16
Algorithm in Action

FAST function from MediaBench testsuite

17
Algorithm in Action
18
Full Floorplanning Results

Simple iterative approach

Spectacularly negative results

Initial optimization minimizes data communication
Full SA based floorplanning
Reoptimization to minimize wirelength, based on the floorplan
Full SA based floorplanning

19
Incremental Floorplanning

Incremental Placement (Coudert et al.)
Given an optimized placement and a set of changes to the netlist (e.g., due to technology remapping), modify the placement to improve it.
Equally applicable to floorplanning

[Figure: Initial Floorplan, Perturbations, Modified Floorplan]
20
Our Incremental Floorplanner
[Figure: Initial Floorplan, Perturbations, Modified Floorplan, Incremental Floorplanner, Incremental Floorplan; slicing tree with blocks 1 to 4 and node labels 32/36, 27/30.4, 5/5.6, 16/18, 11/12.4, 2/2.3, 9/10.1]
21
Our Incremental Floorplanner

Calculate area room of each node: bottom up slicing tree traversal
Area redistribution
Top down traversal
Increase area if necessary
Not enough space at root
Aspect ratios become too distorted

Simple, yet effective. Other more complicated algorithms might work better
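The two traversals can be sketched on a slicing tree as follows. This is a guess at the mechanics, not the authors' floorplanner: the `Node` class and function names are hypothetical, room is assumed to be split among children in proportion to their required area, the aspect-ratio check is not modeled, and the tree shape and block areas are sample inputs borrowed from the labels in the slide's figure.

```python
class Node:
    """Slicing tree node; leaves carry a block area, internal nodes derive theirs."""
    def __init__(self, name, area=0.0, children=()):
        self.name = name
        self.area = area            # required area
        self.children = list(children)
        self.room = 0.0             # area allotted by the redistribution

def required_area(node):
    """Bottom-up traversal: an internal node needs the sum of its children."""
    if node.children:
        node.area = sum(required_area(child) for child in node.children)
    return node.area

def redistribute(node, room):
    """Top-down traversal: split the room in proportion to required area
    (assumes every node has positive area)."""
    node.room = room
    for child in node.children:
        redistribute(child, room * child.area / node.area)

def incremental_update(root, available):
    """Grow the floorplan only when the root does not have enough space."""
    needed = required_area(root)
    if needed > available:
        available = needed
    redistribute(root, available)
    return available

# Hypothetical tree: blocks of area 16, 2, 9 and 5 in a floorplan of area 36.
b1, b2, b3, b4 = Node("1", 16), Node("2", 2), Node("3", 9), Node("4", 5)
inner = Node("-", children=[b2, b3])
left = Node("-", children=[b1, inner])
root = Node("-", children=[left, b4])
total = incremental_update(root, 36.0)
```

With these inputs the root needs 32 of the 36 available units, so nothing has to grow, and the left subtree receives 30.375 units for a requirement of 27.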

[Figure: Modified Floorplan and Incremental Floorplan; slicing tree with blocks 1 to 4 and node labels 32/36, 27/30.4, 5/5.6, 16/18, 11/12.4, 2/2.3, 9/10.1]
22
MediaBench Functions
#   Benchmark        Blocks  φ-nodes  Links  Weight  Initial WL
1   adpcm coder          33       31     54    2688       35568
2   adpcm decoder        26       23     44    1952       21588
3   internal filter      10      143     60   17088      411637
4   Internal expand     101       94    257   14336      317031
5   compress output      34       17     60    2368       29114
6   mpeg2dec block       62       13     66    2272       34510
7   mpeg2dec vector      16        4     26    1024        4366
8   FAST                 14        4     15     704        3714
9   FR4TR                77       87    155     704      340697
10  det                  12        5     13    7936        3772
23
Incremental Floorplanning Results
Optimal Approach: 12% Overall Wirelength Reduction, 25% Phi-node Wirelength Reduction
Our Approach: 6% Overall Wirelength Reduction, 8% Phi-node Wirelength Reduction

[Chart: normalized wirelength per benchmark, with average]
24
Related Work

Hardware compilation projects using SSA
PDGSSA form (UCSB)
CASH (CMU)
SA-C (UCR)
Sea Cucumber (BYU)
Physically aware behavioral synthesis techniques
SA for scheduling, binding and floorplanning [Prabhakaran97]
SA for binding and floorplanning [Yung-Ming94]
Scheduling, allocation and binding [Dougherty00]
Fasolt bus topology [Knapp92]
High level synthesis [Tarafdar00]
Incremental CAD
Problem overview/challenges [Coudert00]
Floorplanning [Crenshaw99]

25
Conclusions

It's been a long, strange trip
SSA is a nice IR for hardware compilation
Explicitly shows data flow
Useful for exploiting parallelism
Compiler techniques applied to hardware design can reduce wirelength
They must be aware of physical information
They must use incremental floorplanning
