High-Level Synthesis: Creating Custom Circuits from High-Level Code - PowerPoint PPT Presentation

About This Presentation
Title:

High-Level Synthesis: Creating Custom Circuits from High-Level Code

Description:

High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida Existing FPGA Tool Flow Register-transfer (RT ... – PowerPoint PPT presentation

Number of Views:210
Avg rating:3.0/5.0
Slides: 53
Provided by: Ram173
Category:

less

Transcript and Presenter's Notes

Title: High-Level Synthesis: Creating Custom Circuits from High-Level Code


1
High-Level Synthesis Creating Custom
Circuits from High-Level Code
  • Greg Stitt
  • ECE Department
  • University of Florida

2
Existing FPGA Tool Flow
  • Register-transfer (RT) synthesis
  • Specify RT structure (muxes, registers, etc)
  • Allows precise specification
  • - But, time consuming, difficult, error prone

HDL
RT Synthesis
Technology Mapping
Netlist
Placement
Physical Design
Bitfile
Routing
3
Future FPGA Tool Flow?
C/C, Java, etc.
High-level Synthesis
HDL
RT Synthesis
Technology Mapping
Netlist
Placement
Physical Design
Bitfile
Routing
4
High-level Synthesis
  • Wouldnt it be nice to write high-level code?
  • Ratio of C to VHDL developers (100001 ?)
  • Easier to specify
  • Separates function from architecture
  • More portable
  • - Hardware potentially slower
  • Similar to assembly code era
  • Programmers could always beat compiler
  • But, no longer the case
  • Hopefully, high-level synthesis will catch up to
    manual effort

5
High-level Synthesis
  • More challenging than compilation
  • Compilation maps behavior into assembly
    instructions
  • Architecture is known to compiler
  • High-level synthesis creates a custom
    architecture to execute behavior
  • Huge hardware exploration space
  • Best solution may include microprocessors
  • Should handle any high-level code
  • Not all code appropriate for hardware

6
High-level Synthesis
  • First, consider how to manually convert
    high-level code into circuit
  • Steps
  • 1) Build FSM for controller
  • 2) Build datapath based on FSM

acc 0 for (i0 i lt 128 i) acc ai
7
Manual Example
  • Build a FSM (controller)
  • Decompose code into states

acc 0 for (i0 i lt 128 i) acc ai
acc0, i 0
if (i lt 128)
Done
load ai
acc ai
i
8
Manual Example
  • Build a datapath
  • Allocate resources for each state

acc0, i 0
if (i lt 128)
Done
ai
acc
addr
i
load ai
1
128
1
acc ai


lt

i
acc 0 for (i0 i lt 128 i) acc ai
9
Manual Example
  • Build a datapath
  • Determine register inputs

In from memory
acc0, i 0
a
0
0
if (i lt 128)
2x1
2x1
2x1
Done
ai
acc
addr
i
load ai
1
128
1
acc ai


lt

i
acc 0 for (i0 i lt 128 i) acc ai
10
Manual Example
  • Build a datapath
  • Add outputs

In from memory
acc0, i 0
a
0
0
if (i lt 128)
2x1
2x1
2x1
Done
ai
acc
addr
i
load ai
1
128
1
acc ai


lt

i
acc 0 for (i0 i lt 128 i) acc ai
acc
Memory address
11
Manual Example
  • Build a datapath
  • Add control signals

In from memory
acc0, i 0
a
0
0
if (i lt 128)
2x1
2x1
2x1
Done
ai
acc
addr
i
load ai
1
128
1
acc ai


lt

i
acc 0 for (i0 i lt 128 i) acc ai
acc
Memory address
12
Manual Example
  • Combine controllerdatapath

In from memory
Controller
a
0
0
2x1
2x1
2x1
ai
acc
addr
i
1
128
1


lt

acc 0 for (i0 i lt 128 i) acc ai
Done
Memory Read
acc
Memory address
13
Manual Example
  • Alternatives
  • Use one adder (plus muxes)

In from memory
a
0
0
2x1
2x1
2x1
ai
acc
addr
i
1
128
lt
MUX
MUX

acc
Memory address
14
Manual Example
  • Comparison with high-level synthesis
  • Determining when to perform each operation
  • gt Scheduling
  • Allocating resource for each operation
  • gt Resource allocation
  • Mapping operations onto resources
  • gt Binding

15
Another Example
  • Your turn

x0 for (i0 i lt 100 i) if (ai gt 0)
x else x -- ai
x //output x
  • Steps
  • 1) Build FSM (do not perform if conversion)
  • 2) Build datapath based on FSM

16
High-Level Synthesis
Could be C, C, Java, Perl, Python, SystemC,
ImpulseC, etc.
High-level Code
High-Level Synthesis
Custom Circuit
Usually a RT VHDL description, but could as low
level as a bit file
17
High-Level Synthesis
acc 0 for (i0 i lt 128 i) acc ai
High-Level Synthesis
18
Main Steps
High-level Code
Converts code to intermediate representation -
allows all following steps to use common format.
Front-end
Syntactic Analysis
Intermediate Representation
Optimization
Determines when each operation will execute
Scheduling
Back-end
Determines physical resources to be used
Resource Allocation
Maps operations onto physical resources
Binding
Controller Datapath
19
Syntactic Analysis
  • Definition Analysis of code to verify syntactic
    correctness
  • Converts code into intermediate representation
  • 2 steps
  • 1) Lexical analysis (Lexing)
  • 2) Parsing

High-level Code
Lexical Analysis
Syntactic Analysis
Parsing
Intermediate Representation
20
Lexical Analysis
  • Lexical analysis (lexing) breaks code into a
    series of defined tokens
  • Token defined language constructs

x 0 if (y lt z) x 1
Lexical Analysis
ID(x), ASSIGN, INT(0), SEMICOLON, IF, LPAREN,
ID(y), LT, ID(z), RPAREN, ID(x), ASSIGN, INT(1),
SEMICOLON
21
Lexing Tools
  • Define tokens using regular expressions - outputs
    C code that lexes input
  • Common tool is lex

/ braces and parentheses / "" YYPRINT
return LBRACE "" YYPRINT return RBRACE
"," YYPRINT return COMMA "" YYPRINT
return SEMICOLON "!" YYPRINT return
EXCLAMATION "" YYPRINT return LBRACKET
"" YYPRINT return RBRACKET "-"
YYPRINT return MINUS / integers 0-9
yylval.intVal atoi( yytext ) return INT
22
Parsing
  • Analysis of token sequence to determine correct
    grammatical structure
  • Languages defined by context-free grammar

Correct Programs
Grammar
x 0 y 1
x 0
Program Exp
if (a lt b) x 10
Exp Stmt SEMICOLON IF LPAREN Cond
RPAREN Exp Exp Exp
if (var1 ! var2) x 10
Cond ID Comp ID
x 0 if (y lt z) x 1
x 0 if (y lt z) x 1 y 5 t 1
Stmt ID ASSIGN INT
Comp LT NE
23
Parsing
Incorrect Programs
Grammar
x 3 5

Program Exp
Exp S SEMICOLON IF LPAREN Cond RPAREN
Exp Exp Exp
x 5
x 5
if (x5 gt y) x 2
Cond ID Comp ID
x y
S ID ASSIGN INT
Comp LT NE
24
Parsing Tools
  • Define grammar in special language
  • Automatically creates parser based on grammar
  • Popular tool is yacc - yet-another-compiler-comp
    iler

program functions 1 functions
function 1 functions function
1 function HEXNUMBER LABEL COLON
code 2
25
Intermediate Representation
  • Parser converts tokens to intermediate
    representation
  • Usually, an abstract syntax tree

Assign
x 0 if (y lt z) x 1 d 6
x
if
0
assign
cond
assign
y
z
x
lt
1
d
6
26
Intermediate Representation
  • Why use intermediate representation?
  • Easier to analyze/optimize than source code
  • Theoretically can be used for all languages
  • Makes synthesis back end language independent

C Code
Java
Perl
Syntactic Analysis
Syntactic Analysis
Syntactic Analysis
Intermediate Representation
Scheduling, resource allocation, binding,
independent of source language - sometimes
optimizations too
Back End
27
Intermediate Representation
  • Different Types
  • Abstract Syntax Tree
  • Control/Data Flow Graph (CDFG)
  • Sequencing Graph
  • Etc.
  • We will focus on CDFG
  • Combines control flow graph (CFG) and data flow
    graph (DFG)

28
Control flow graphs
  • CFG
  • Represents control flow dependencies of basic
    blocks
  • Basic block is section of code that always
    executes from beginning to end
  • I.e. no jumps into or out of block

acc0, i 0
acc 0 for (i0 i lt 128 i) acc ai
if (i lt 128)
Done
acc ai i
29
Control flow graphs
  • Your turn
  • Create a CFG for this code

i 0 while (j lt 10) if (x lt 5) y
2 else if (z lt 10) y 6
30
Data Flow Graphs
  • DFG
  • Represents data dependencies between operations

c
b
a
d


x ab y cd z x - y
-
y
z
x
31
Control/Data Flow Graph
  • Combines CFG and DFG
  • Maintains DFG for each node of CFG

acc 0 for (i0 i lt 128 i) acc ai
0
0
acc
i
acc0 i0
if (i lt 128)
acc
ai
i
1
Done
acc ai i


i
acc
32
High-Level Synthesis Optimization
33
Synthesis Optimizations
  • After creating CDFG, high-level synthesis
    optimizes graph
  • Goals
  • Reduce area
  • Improve latency
  • Increase parallelism
  • Reduce power/energy
  • 2 types
  • Data flow optimizations
  • Control flow optimizations

34
Data Flow Optimizations
  • Tree-height reduction
  • Generally made possible from commutativity,
    associativity, and distributivity

a
b
c
d
c
d
a
b






c
d
b
a
b
a
c
d






35
Data Flow Optimizations
  • Operator Strength Reduction
  • Replacing an expensive (strong) operation with
    a faster one
  • Common example replacing multiply/divide with
    shift

0 multiplications
1 multiplication
bi ai ltlt 3
bi ai 8
c b ltlt 2 a b c
a b 5
a b 13
c b ltlt 2 d b ltlt 3 a c d b
36
Data Flow Optimizations
  • Constant propagation
  • Statically evaluate expressions with constants

x 0 y x 15 z y 10
x 0 y 0 z 10
37
Data Flow Optimizations
  • Function Specialization
  • Create specialized code for common inputs
  • Treat common inputs as constants
  • If inputs not known statically, must include if
    statement for each call to specialized function

int f (int x) y x 15 return y
10
int f (int x) y x 15 return y
10
int f_opt () return 10
Treat frequent input as a constant
for (I0 I lt 1000 I) f(0)
for (I0 I lt 1000 I) f_opt(0)
38
Data Flow Optimizations
  • Common sub-expression elimination
  • If expression appears more than once, repetitions
    can be replaced

a x y . . . . . . . . . . . . b
c 25 x y
a x y . . . . . . . . . . . . b
c 25 a
x y already determined
39
Data Flow Optimizations
  • Dead code elimination
  • Remove code that is never executed
  • May seem like stupid code, but often comes from
    constant propagation or function specialization

int f (int x) if (x gt 0 ) a b 15
else a b / 4 return a
int f_opt () a b 15 return a
Specialized version for x gt 0 does not need else
branch - dead code
40
Data Flow Optimizations
  • Code motion (hoisting/sinking)
  • Avoid repeated computation

for (I0 I lt 100 I) z x y bi
ai z
z x y for (I0 I lt 100 I) bi
ai z
41
Control Flow Optimizations
  • Loop Unrolling
  • Replicate body of loop
  • May increase parallelism

for (i0 i lt 128 i) aI bI cI
for (i0 i lt 128 i2) aI bI
cI aI1 bI1 cI1
42
Control Flow Optimizations
  • Function Inlining
  • Replace function call with body of function
  • Common for both SW and HW
  • SW - Eliminates instructions/branches
  • HW - Eliminates unnecessary control states

for (i0 i lt 128 i) aI f( bI, cI
) . . . . int f (int a, int b) return a b
15
for (i0 i lt 128 i) aI bI cI
15
43
Control Flow Optimizations
  • Conditional Expansion
  • Replace if with logic expression
  • Execute if/else bodies in parallel

y ab If (a) x bd Else x bd
y ab x a(bd) abd
DeMicheli
Can be further optimized to
y ab x y d(ab)
44
Example
  • Optimize this

x 0 y a b if (x lt 15) z a b -
c else z x 12 output z 12
45
High-Level SynthesisScheduling
46
Scheduling
  • Scheduling assigns a start time to each operation
  • Start times must not violate dependencies in DFG
  • Start times must meet performance constraints
  • Alternatively, resource constraints

47
Examples
a
b
c
d
c
d
a
b

Cycle1
Cycle1
Cycle2



Cycle2

Cycle3

Cycle3
c
d
a
b
Cycle1



Cycle2
48
Scheduling Problems
  • Several types of scheduling problems
  • Usually some combination of performance and
    resource constraints
  • Problems
  • Unconstrained
  • Not very useful, every schedule is valid
  • Minimum latency
  • Latency constrained
  • Mininum-latency, resource constrained
  • i.e. Find the scheduling with the shortest
    latency, that uses less than a specified of
    resources
  • NP-Complete
  • Mininum-resource, latency constrained
  • i.e. find the schedule that meets the latency
    constraint (which may be anything), and uses the
    minimum of resources
  • NP-Complete

49
Minimum Latency Scheduling
  • ASAP (as soon as possible) algorithm
  • Find a node whose predecessors are scheduled
  • Schedule node one cycle later than max cycle of
    predecessor
  • Repeat until all nodes scheduled

c
d
e
a
b
f
g
h
-
lt
Cycle1


Cycle2

Cycle3
Minimum possible latency - 4 cycles

Cycle4

50
Minimum Latency Scheduling
  • ALAP (as late as possible) algorithm
  • Run ASAP, get minimum latency L
  • Find a node whose successors are scheduled
  • Schedule node one cycle before than min cycle of
    predecessor
  • Nodes with no successors scheduled to cycle L
  • Repeat until all nodes scheduled

c
d
e
a
b
f
g
h
-
lt
Cycle1
Cycle4


Cycle3
Cycle2

Cycle3



Cycle4
51
Minimum Latency Scheduling
  • ALAP
  • Has to run ASAP first, seems pointless
  • But, many heuristics need the mobility/slack of
    each operation
  • ASAP gives the earliest possible time for an
    operation
  • ALAP gives the latest possible time for an
    operation
  • Slack difference between earliest and latest
    possible schedule
  • Slack 0 implies operation has to be done in the
    current scheduled cycle
  • The larger the slack, the more options a
    heuristic has to schedule the operation

52
Latency-Constrained Scheduling
  • Instead of finding the minimum latency, find
    latency less than L
  • Solutions
  • Use ASAP, verify that minimum latency less than L
  • Use ALAP starting with cycle L instead of minimum
    latency (dont need ASAP)
Write a Comment
User Comments (0)
About PowerShow.com