Compiling Finite Synchronous Kahn Networks to Efficient Reconfigurable Hardware

1
Compiling Finite Synchronous Kahn Networks to
Efficient Reconfigurable Hardware
Jean-Baptiste Note, Jean Vuillemin
École Normale Supérieure, Paris
2
Plan
  • Goals
      • Compile FPGA designs from software-executable
        source code
      • Hardware bitwise compatible with software by
        construction
      • Performance no worse than twice that of a
        hand-crafted design
  • Means
      • Exact range analysis of the source code
      • Automatic area/time trade-off to meet available
        IO bandwidth
      • Tight FPGA technology mapping
  • Results
      • Achieved on a dozen video data-flow algorithms
      • Exposed synthesis through source guidance
      • Half-toning used here as a showcase study

3
Half Toning
4
Random Diffusion
5
RD source co-routine
Input p
e = p + z(d)
i = e >> 8

m = e >> 1
a = m & 127
c = z^629(a)
d = (e & 1) + a + c
diffuseError
Output i
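A minimal Python sketch of this co-routine, assuming the statements read e = p + z(d), i = e >> 8, m = e >> 1, a = m & 127, c = z^629(a), d = (e & 1) + a + c, with z a unit delay and z^629 a 629-cycle delay. The function name, the `line_delay` parameter and the zero reset value of the delays are assumptions made for illustration; none of them comes from the slides.

```python
from collections import deque

LINE_DELAY = 629  # z^629 on the slide


def random_diffusion(pixels, line_delay=LINE_DELAY):
    """Yield one ink bit i per 8-bit input pixel p."""
    d_prev = 0  # z(d): value of d from the previous cycle (assumed 0 at reset)
    # z^629(a): FIFO holding the last `line_delay` values of a (assumed 0 at reset)
    a_line = deque([0] * line_delay, maxlen=line_delay)
    for p in pixels:
        e = p + d_prev        # total error
        i = e >> 8            # ink bit: set when e reaches 256
        m = e >> 1
        a = m & 127
        c = a_line[0]         # a from `line_delay` cycles ago
        a_line.append(a)      # maxlen drops the oldest entry
        d_prev = (e & 1) + a + c   # d, fed back through z
        yield i
```

For example, `list(random_diffusion([255] * 4, line_delay=2))` gives `[0, 1, 1, 1]`. Under this reading no error is lost: each pixel's e splits exactly into 256*i, two copies of a, and the low bit e & 1.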
6
RD Range Analysis
Input p [0..255]
e = p + z(d)          [0,255]  [0,510]
i = e >> 8            [0,0]    [0,1]
m = e >> 1            [0,127]  [0,255]
a = m & 127           [0,127]  [0,127]
c = z^629(a)          [0,127]  [0,127]
d = (e & 1) + a + c   [0,255]  [0,255]
Output i [0,1]
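The two range columns on this slide can be reproduced with a plain interval iteration, sketched below in Python. This is an assumption-laden illustration, not the authors' exact range analysis: it assumes the statements read e = p + z(d), i = e >> 8, m = e >> 1, a = m & 127, c = z^629(a), d = (e & 1) + a + c, that every delay holds 0 at reset, and that all values are non-negative. The helper names are hypothetical.

```python
ZERO = (0, 0)  # assumed reset value of every delay


def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def shr(x, k):
    # Right shift is monotone, so shifting both bounds is sound.
    return (x[0] >> k, x[1] >> k)


def mask(x, bits):
    # Sound for non-negative x only: x & bits lies in [0, min(hi, bits)].
    return (0, min(x[1], bits))


def join(x, y):
    return (min(x[0], y[0]), max(x[1], y[1]))


def analyse(p=(0, 255), max_passes=16):
    """Return one range table per pass, stopping at the fixed point."""
    d = ZERO
    passes = []
    for _ in range(max_passes):
        e = add(p, join(ZERO, d))        # e = p + z(d)
        i = shr(e, 8)
        m = shr(e, 1)
        a = mask(m, 127)
        c = join(ZERO, a)                # c = z^629(a)
        d = add(add(mask(e, 1), a), c)   # d = (e & 1) + a + c
        table = dict(e=e, i=i, m=m, a=a, c=c, d=d)
        if passes and passes[-1] == table:
            break                        # nothing changed: fixed point
        passes.append(table)
    return passes
```

With the default input range the iteration stabilises after two distinct passes: e goes from (0, 255) to (0, 510) and i from (0, 0) to (0, 1), while a, c and d keep their first-pass ranges.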
7
RD Bit Sizing
e = p + Z(d)          [0,255]  [0,510]  u9
i = e >> 8            [0,0]    [0,1]    u1
m = e >> 1            [0,127]  [0,255]  u8
a = m & 127           [0,127]  [0,127]  u7
c = Z^629(a)          [0,127]  [0,127]  u7
d = (e & 1) + a + c   [0,255]  [0,255]  u8
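The step from a final range to an unsigned type can be sketched in a few lines of Python. The function name is hypothetical, and reading "uN" as "unsigned, N bits wide" is an assumption based on the values shown on this slide.

```python
def unsigned_type(lo, hi):
    """Smallest unsigned type 'uN' that holds every value in [lo, hi]."""
    if lo < 0:
        raise ValueError("a negative lower bound needs a signed type")
    # int.bit_length() is the number of bits needed for hi; use at least 1.
    return "u%d" % max(1, hi.bit_length())


# Final ranges as listed on the slide.
FINAL_RANGES = {
    "e": (0, 510),
    "i": (0, 1),
    "m": (0, 255),
    "a": (0, 127),
    "c": (0, 127),
    "d": (0, 255),
}

SIZES = {name: unsigned_type(lo, hi) for name, (lo, hi) in FINAL_RANGES.items()}
```

`SIZES` then maps e to u9, i to u1, m and d to u8, and a and c to u7, matching the slide.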
8
RD Bit Level
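e = p + z(d)          u9
i = e >> 8            u1
m = e >> 1            u8
a = m & 127           u7
c = z^629(a)          u7
d = (e & 1) + a + c   u8

e[0..8] = p[0..7] + z(d[0..7])
i[0] = e[8]
m[0..7] = e[1..8]
a[0..6] = m[0..6] = e[1..7]
c[0..6] = z^629(a[0..6])
d[0..7] = e[0] + a[0..6] + c[0..6]

e[0..8] = p[0..7] + z(d[0..7])
d[0..7] = e[0] + e[1..7] + z^629(e[1..7])
i[0] = e[8]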
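At the bit level the shifts and the mask become pure wiring: i, m and a are just slices of e. The Python sketch below checks this exhaustively over the 9-bit range of e, assuming the statements read i = e >> 8, m = e >> 1, a = m & 127 and d = (e & 1) + a + c. The helper names are hypothetical.

```python
def bits(x, lo, hi):
    """Value of bits lo..hi of x (inclusive, bit 0 is the LSB)."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def check_wiring():
    """Verify the bit-slice identities for every e in [0, 510]."""
    for e in range(511):
        i = e >> 8
        m = e >> 1
        a = m & 127
        assert i == bits(e, 8, 8)                    # i[0] = e[8]
        assert m == bits(e, 1, 8)                    # m[0..7] = e[1..8]
        assert a == bits(m, 0, 6) == bits(e, 1, 7)   # a[0..6] = e[1..7]
        for c in range(128):
            d = (e & 1) + a + c
            # d[0..7] = e[0] + e[1..7] + c[0..6], and d fits in 8 bits
            assert d == bits(e, 0, 0) + bits(e, 1, 7) + c
            assert d < 256
    return True
```

Only the two additions need logic; everything else is a renaming of wires.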
9
RD Synchronous Circuit
e = p + Z(r + Z^629(r) + (e & 1))   // total error
i = e >> 8                          // drop ink
r = (e >> 1) & 127                  // remaining error

e[0..8] = p[0..7] + z(d[0..7])
d[0..7] = e[0] + e[1..7] + z^629(e[1..7])
i[0] = e[8]
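The three-line form on this slide should produce the same ink bits as the six-statement co-routine it was derived from. The Python sketch below simulates both and compares them, assuming the circuit reads e = p + Z(r + Z^629(r) + (e & 1)), i = e >> 8, r = (e >> 1) & 127, and that all delays hold 0 at reset. The function names and the `line_delay` parameter are hypothetical.

```python
from collections import deque


def rd_coroutine(pixels, line_delay):
    """Six-statement form: e, i, m, a, c, d."""
    d_prev = 0
    a_line = deque([0] * line_delay, maxlen=line_delay)
    out = []
    for p in pixels:
        e = p + d_prev
        out.append(e >> 8)
        m = e >> 1
        a = m & 127
        c = a_line[0]
        a_line.append(a)
        d_prev = (e & 1) + a + c
    return out


def rd_circuit(pixels, line_delay):
    """Fused form: one feedback register plus one line delay on r."""
    feedback = 0  # contents of the outer Z register
    r_line = deque([0] * line_delay, maxlen=line_delay)  # Z^629(r)
    out = []
    for p in pixels:
        e = p + feedback              # total error
        out.append(e >> 8)            # drop ink
        r = (e >> 1) & 127            # remaining error
        feedback = r + r_line[0] + (e & 1)
        r_line.append(r)
    return out
```

A quick check such as `rd_circuit(px, 629) == rd_coroutine(px, 629)` on random 8-bit pixels is a cheap way to confirm that the substitution preserved behaviour.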
10
RD Trading Space/Time
  • Bit-Serial
      • Bit-serial pixel input p, each 8 cycles
      • 2 full-adders, mux, enable, nand
      • 630x81 unit-delay registers
  • Bit-Parallel
      • 8b pixel input p
      • 14 full-adders, 8 registers
      • 630 x 7b line-delay in RAM
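To illustrate why the bit-serial variant needs so few adders, here is a generic Python sketch of bit-serial addition: one full adder plus a carry register processes one bit per cycle, least significant bit first. It shows the general technique only; the helper names are hypothetical and it does not model the mux, enable and nand logic listed on the slide.

```python
def bit_serial_add(x_bits, y_bits):
    """Add two LSB-first bit streams using one full adder and a carry register."""
    carry = 0  # the single register, cleared before each word
    for x, y in zip(x_bits, y_bits):
        total = x ^ y ^ carry                       # full-adder sum output
        carry = (x & y) | (carry & (x ^ y))         # full-adder carry output
        yield total


def to_bits(value, width):
    """LSB-first list of `width` bits."""
    return [(value >> k) & 1 for k in range(width)]


def from_bits(stream):
    """Rebuild an integer from an LSB-first bit stream."""
    return sum(b << k for k, b in enumerate(stream))
```

For example, `from_bits(bit_serial_add(to_bits(255, 9), to_bits(255, 9)))` gives 510 after nine cycles, where a bit-parallel adder would need one full adder per bit but only a single cycle.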

11
Floyd Steinberg Error Diffusion
12
Floyd Steinberg
Random Diffusion
White Noise
Blue Noise
13
FS Source Code
Threshold Table

Drop Ink
Diffuse Error
14
FS SSA flow graph
15
FS Range Analysis
Naive interval analysis fails!
Manual annotation: e1 [-7,15]
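Why naive interval analysis can fail on an error-feedback loop is easiest to see on a toy case. The Python sketch below uses a hypothetical one-dimensional error diffusion, e = p + z(r), i = (e >= 128), r = e - 255*i. This is an assumed stand-in, not the Floyd Steinberg source from the slides. Intervals forget that i is 1 exactly when e is large, so the bound on r grows on every pass, whereas enumerating the reachable states shows r is in fact bounded.

```python
P = (0, 255)  # input pixel range


def naive_r_ranges(passes):
    """Interval bounds on r after each pass, treating e and i as independent."""
    r = (0, 0)
    out = []
    for _ in range(passes):
        e = (P[0] + r[0], P[1] + r[1])               # e = p + z(r)
        i = (0, 1)                                   # i = (e >= 128)
        r = (e[0] - 255 * i[1], e[1] - 255 * i[0])   # r = e - 255*i
        out.append(r)
    return out


def exact_r_range():
    """Enumerate every reachable value of r to get the true bounds."""
    seen = {0}
    frontier = [0]
    while frontier:
        r = frontier.pop()
        for p in range(P[0], P[1] + 1):
            e = p + r
            nxt = e - 255 if e >= 128 else e
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return (min(seen), max(seen))
```

Here `naive_r_ranges(3)` widens by 255 on each side per pass and never reaches a fixed point, while `exact_r_range()` returns `(-127, 127)`. Supplying such a bound by hand is one way to let the analysis terminate.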
16
FS Bit Level
17
Conclusions
  • Compiling circuits from a C-level specification is doable
  • Key components
      • Exact range analysis
      • Tight FPGA technology mapping
      • Automatic area/bandwidth trade-off
  • Source guidance leads to efficient hardware
  • Successful on more than 12 leading-edge video
    algorithms
      • Over-sampling from PAL to HDTV
      • Tracking pixel movements
      • Video compression
      • Digital half-toning
      • and more

18
Demo