Title: Compiling Finite Synchronous Kahn Networks to Efficient Reconfigurable Hardware
1 Compiling Finite Synchronous Kahn Networks to Efficient Reconfigurable Hardware
Jean Baptiste Note, Jean Vuillemin
Ecole Normale Supérieure - Paris
2 Plan
- Goals
  - Compile FPGA design from software executable source code
  - Hardware bitwise compatible with software by construction
  - Performance no worse than twice that of hand-crafted design
- Means
  - Exact range analysis of the source code
  - Automatic area/time trade-off to meet available IO bandwidth
  - Tight FPGA technology mapping
- Results
  - Achieved on a dozen video data-flow algorithms
  - Exposed synthesis through source guidance
  - Half-toning used here as a show-case study
3 Half Toning
4 Random Diffusion
5 RD source co-routine
Input p
e = p + z(d)
i = e >> 8
m = e >> 1
a = m & 127
c = z629(a)
d = (e & 1) + a + c
diffuseError
Output i
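The co-routine above can be exercised in software. What follows is a minimal Python sketch, assuming that z is a one-sample delay, that z629 is a 629-sample delay, and that both start at zero. The function name `random_diffusion`, the `line_delay` parameter and the returned pending-error value are illustrative and do not come from the slides.

```python
from collections import deque


def random_diffusion(pixels, line_delay=629):
    """Run the RD equations over a stream of 8-bit pixels.

    Returns (ink, pending): the list of 1-bit outputs i, and the error
    still held in the delays after the last sample.
    """
    d = 0                                # z(d): d from the previous sample
    line = deque([0] * line_delay)       # z629: FIFO of past values of a
    ink = []
    for p in pixels:
        e = p + d                        # total error
        i = e >> 8                       # drop ink when e reaches 256
        m = e >> 1
        a = m & 127                      # half of the remaining error
        c = line.popleft()               # a from line_delay samples ago
        line.append(a)
        d = (e & 1) + a + c              # error diffused to the next sample
        ink.append(i)
    return ink, d + sum(line)
```

Each step hands on exactly `e & 255`, split as `a` to the next sample, `a` to the delay line, plus the odd bit `e & 1`. In this sketch the sum of the inputs therefore equals 256 times the number of ink drops plus the pending error.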
6 RD Range Analysis
Input p: [0,255]
Statement               Pass 1      Pass 2
e = p + z(d)            [0,255]     [0,510]
i = e >> 8              [0,0]       [0,1]
m = e >> 1              [0,127]     [0,255]
a = m & 127             [0,127]     [0,127]
c = z629(a)             [0,127]     [0,127]
d = (e & 1) + a + c     [0,255]     [0,255]
Output i: [0,1]
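The two columns of ranges can be reproduced by iterating interval arithmetic over the RD equations until nothing changes. The sketch below is one possible way to do that, not the authors' analyzer. It assumes the delays initially hold 0, and the helper names (`add`, `shr`, `mask`, `join`, `rd_ranges`) are hypothetical.

```python
def add(x, y):
    """Interval sum."""
    return (x[0] + y[0], x[1] + y[1])


def shr(x, k):
    """Right shift; monotone, so it applies to both bounds."""
    return (x[0] >> k, x[1] >> k)


def mask(x, m):
    """x & m for non-negative x and m of the form 2**k - 1 (sound, not tight)."""
    return (0, min(x[1], m))


def join(x, y):
    """Smallest interval containing both x and y."""
    return (min(x[0], y[0]), max(x[1], y[1]))


def rd_ranges(p=(0, 255)):
    """Iterate the RD equations on intervals until a fixpoint is reached."""
    d = c = (0, 0)                 # delays are assumed to start at 0
    passes = []
    while True:
        e = add(p, d)
        i = shr(e, 8)
        m = shr(e, 1)
        a = mask(m, 127)
        c = join(c, a)             # z629(a): initial 0 joined with range of a
        d = join(d, add(add(mask(e, 1), a), c))
        env = dict(e=e, i=i, m=m, a=a, c=c, d=d)
        if passes and passes[-1] == env:
            return passes          # no change: fixpoint
        passes.append(env)
```

With the default input range, `rd_ranges()` returns two passes whose values match the two columns on this slide.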
7 RD Bit Sizing
Statement               Pass 1      Pass 2      Size
e = p + Z(d)            [0,255]     [0,510]     u9
i = e >> 8              [0,0]       [0,1]       u1
m = e >> 1              [0,127]     [0,255]     u8
a = m & 127             [0,127]     [0,127]     u7
c = Z629(a)             [0,127]     [0,127]     u7
d = (e & 1) + a + c     [0,255]     [0,255]     u8
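The unsigned sizes follow directly from the upper bound of each range. A small Python sketch of that step, assuming non-negative ranges only; the function name `unsigned_width` is illustrative.

```python
def unsigned_width(lo, hi):
    """Number of bits needed to hold every value in [lo, hi], with lo >= 0."""
    if lo < 0 or hi < lo:
        raise ValueError("expected a non-negative, non-empty range")
    return max(1, hi.bit_length())   # at least one wire, even for [0,0]


# Final ranges from the range analysis, keyed by variable name.
ranges = {"e": (0, 510), "i": (0, 1), "m": (0, 255),
          "a": (0, 127), "c": (0, 127), "d": (0, 255)}
sizes = {name: "u%d" % unsigned_width(lo, hi) for name, (lo, hi) in ranges.items()}
```

Here `sizes` maps e to u9, i to u1, m and d to u8, and a and c to u7, as on the slide.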
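8 RD Bit Level
Word level, with bit sizes:
e = p + z(d)            u9
i = e >> 8              u1
m = e >> 1              u8
a = m & 127             u7
c = z629(a)             u7
d = (e & 1) + a + c     u8
Bit level:
e0..8 = p0..7 + z(d0..7)
i0 = e8
m0..7 = e1..8
a0..6 = m0..6 = e1..7
c0..6 = z629(a0..6)
d0..7 = e0 + a0..6 + c0..6
After substituting m, a and c:
e0..8 = p0..7 + z(d0..7)
d0..7 = e0 + e1..7 + z629(e1..7)
i0 = e8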
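At the bit level the shifts and masks cost no logic: they only select wires of e. The following Python sketch checks that claim exhaustively over the 9-bit range of e. The helper `bits` and the function `check_wiring` are hypothetical names introduced for the illustration.

```python
def bits(x, lo, hi):
    """Bits lo..hi (inclusive, bit 0 is the LSB) of a non-negative integer."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def check_wiring():
    """Compare word-level shifts and masks with plain bit selection."""
    for e in range(511):               # every value in the range [0,510]
        m = e >> 1
        a = m & 127
        assert bits(e, 8, 8) == e >> 8     # i0 = e8
        assert bits(e, 1, 8) == m          # m0..7 = e1..8
        assert bits(m, 0, 6) == a          # a0..6 = m0..6
        assert bits(e, 1, 7) == a          # a0..6 = e1..7
        assert bits(e, 0, 0) == e & 1      # e0
    return True
```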
9 RD Synchronous Circuit
e = p + Z(r + Z629(r) + (e & 1))    // total error
i = e >> 8                          // drop ink
r = (e >> 1) & 127                  // remaining error
Bit level:
e0..8 = p0..7 + z(d0..7)
d0..7 = e0 + e1..7 + z629(e1..7)
i0 = e8
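The three-equation circuit is the co-routine with m, a, c and d substituted away. As a sanity check, the Python sketch below runs both forms side by side, assuming Z is a one-sample register and Z629 a 629-sample delay line, both reset to 0. The function names and the `n` parameter are illustrative.

```python
from collections import deque


def coroutine_form(pixels, n=629):
    """Six-equation form with intermediate values m, a, c and d."""
    d, line, out = 0, deque([0] * n), []
    for p in pixels:
        e = p + d
        m = e >> 1
        a = m & 127
        c = line.popleft()             # z629(a)
        line.append(a)
        d = (e & 1) + a + c
        out.append(e >> 8)
    return out


def circuit_form(pixels, n=629):
    """Three-equation form: e, i and r with two delay elements."""
    z_reg = 0                          # register Z
    line = deque([0] * n)              # delay line Z629, holding past r
    out = []
    for p in pixels:
        e = p + z_reg                  # total error
        i = e >> 8                     # drop ink
        r = (e >> 1) & 127             # remaining error
        delayed_r = line.popleft()
        line.append(r)
        z_reg = r + delayed_r + (e & 1)
        out.append(i)
    return out
```

Under these assumptions the two functions return identical ink streams for any 8-bit input sequence.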
10 RD Trading Space/Time
- Bit-Serial
  - Bit-serial pixel input p, one pixel every 8 cycles
  - 2 full-adders + mux + enable + nand
  - 630x8+1 unit-delay registers
- Bit-Parallel
  - 8b pixel input p
  - 14 full-adders, 8 registers
  - 630 x 7b line-delay in RAM
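The bit-serial option spends cycles to save adders: one full adder and one carry register process a word one bit per cycle, LSB first. The Python sketch below models generic bit-serial addition to illustrate that trade-off. It is not the slides' exact datapath, and `bit_serial_add` is a hypothetical name.

```python
def bit_serial_add(x, y, width):
    """Add two unsigned integers one bit per cycle, LSB first.

    Models a single full adder plus a carry flip-flop used for `width`
    cycles. Returns (sum modulo 2**width, final carry).
    """
    carry = 0                                  # carry flip-flop, reset to 0
    result = 0
    for k in range(width):                     # one clock cycle per bit
        a = (x >> k) & 1
        b = (y >> k) & 1
        s = a ^ b ^ carry                      # full-adder sum output
        carry = (a & b) | (carry & (a ^ b))    # full-adder carry output
        result |= s << k
    return result, carry
```

A bit-parallel adder produces the same result in one cycle but needs one full adder per bit.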
11 Floyd Steinberg Error Diffusion
12 Floyd Steinberg
Random Diffusion
White Noise
Blue Noise
13 FS Source Code
Threshold Table
Drop Ink
Diffuse Error
14 FS SSA flow graph
15 FS Range Analysis
Naive interval analysis fails!
Manual annotation: e1 in [-7,15]
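Interval analysis can fail on an error-feedback loop because it forgets that the output level depends on the total error. The Python sketch below shows the effect on a hypothetical one-dimensional loop with a threshold of 128 and output levels 0 and 255. This loop is an assumption made for illustration and is not the FS source code from the slides. All names are illustrative.

```python
def naive_error_bounds(passes, p=(0, 255), levels=(0, 255)):
    """Interval bounds on the fed-back error after each analysis pass."""
    err = (0, 0)
    history = []
    for _ in range(passes):
        total = (p[0] + err[0], p[1] + err[1])     # total = p + z(err)
        # The output level is treated as independent of total, so the
        # subtraction widens the interval on every pass.
        err = (total[0] - levels[1], total[1] - levels[0])
        history.append(err)
    return history


def simulated_error_bounds(pixels, threshold=128):
    """Smallest and largest error actually seen when running the loop."""
    err = lo = hi = 0
    for p in pixels:
        total = p + err
        out = 255 if total >= threshold else 0     # quantize
        err = total - out                          # error carried forward
        lo, hi = min(lo, err), max(hi, err)
    return lo, hi
```

In this stand-in loop the naive bounds grow by 255 on each side at every pass and never reach a fixpoint, while the simulated error stays within [-127,127]. A manual range annotation gives the compiler a bound of this kind that the analysis cannot derive by itself.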
16 FS Bit Level
17 Conclusions
- Compiling circuits from C level spec is doable
- Key components
  - Exact Range Analysis
  - Tight FPGA technology mapping
  - Automatically trade area/bandwidth
- Source Guidance leads to efficient hardware
- Successful over >12 leading edge video algorithms
  - Over-sampling from PAL to HDTV
  - Tracking pixel movements
  - Video compression
  - Digital Half Toning
  - more
18 Demo