High-Level Synthesis: Creating Custom Circuits from High-Level Code
1
High-Level Synthesis: Creating Custom Circuits from High-Level Code
  • Greg Stitt
  • ECE Department
  • University of Florida

2
Existing FPGA Tool Flow
  • Register-transfer (RT) synthesis
  • Specify RT structure (muxes, registers, etc.)
  • + Allows precise specification
  • - But, time consuming, difficult, error prone

[Tool-flow diagram: HDL -> RT Synthesis -> Netlist -> Physical Design (Technology Mapping, Placement, Routing) -> Bitfile]
3
Future FPGA Tool Flow?
[Tool-flow diagram: C/C++, Java, etc. -> High-Level Synthesis -> HDL -> RT Synthesis -> Netlist -> Physical Design (Technology Mapping, Placement, Routing) -> Bitfile]
4
High-level Synthesis
  • Wouldn't it be nice to write high-level code?
  • Ratio of C to VHDL developers (10,000:1?)
  • Easier to specify
  • Separates function from architecture
  • More portable
  • - Hardware potentially slower
  • Similar to assembly code era
  • Programmers could always beat compiler
  • But, no longer the case
  • Hopefully, high-level synthesis will catch up to
    manual effort

5
High-level Synthesis
  • More challenging than compilation
  • Compilation maps behavior into assembly
    instructions
  • Architecture is known to compiler
  • High-level synthesis creates a custom
    architecture to execute behavior
  • Huge hardware exploration space
  • Best solution may include microprocessors
  • Should handle any high-level code
  • Not all code appropriate for hardware

6
High-level Synthesis
  • First, consider how to manually convert
    high-level code into circuit
  • Steps
  • 1) Build FSM for controller
  • 2) Build datapath based on FSM

acc = 0; for (i=0; i < 128; i++) acc += a[i];
7
Manual Example
  • Build a FSM (controller)
  • Decompose code into states

acc = 0; for (i=0; i < 128; i++) acc += a[i];

[FSM diagram: state "acc=0, i=0" -> state "if (i < 128)" -> if false: Done; if true: "load a[i]" -> "acc += a[i]" -> "i++" -> back to the comparison]
8
Manual Example
  • Build a datapath
  • Allocate resources for each state

[Datapath diagram: resources allocated for each FSM state - registers acc, i, addr, and a[i]; adders for acc + a[i] and i + 1; comparator for i < 128; constants 1 and 128]

acc = 0; for (i=0; i < 128; i++) acc += a[i];
9
Manual Example
  • Build a datapath
  • Determine register inputs

[Datapath diagram: register inputs determined - 2x1 muxes select between the initial constant 0 and the computed value for acc and i; the a[i] register is loaded from memory ("In from memory")]

acc = 0; for (i=0; i < 128; i++) acc += a[i];
10
Manual Example
  • Build a datapath
  • Add outputs

[Datapath diagram: outputs added - the acc value and the memory address (addr) are driven out of the datapath]

acc = 0; for (i=0; i < 128; i++) acc += a[i];
11
Manual Example
  • Build a datapath
  • Add control signals

[Datapath diagram: control signals added - mux selects and register enables are exposed so a controller can sequence the datapath]

acc = 0; for (i=0; i < 128; i++) acc += a[i];
12
Manual Example
  • Combine controller and datapath

[Block diagram: the controller FSM drives the datapath's mux selects and register enables; the datapath returns the i < 128 comparison result (Done); interface signals: memory read, memory address, and the acc output]

acc = 0; for (i=0; i < 128; i++) acc += a[i];
13
Manual Example
  • Alternatives
  • Use one adder (plus muxes)

[Datapath diagram: alternative datapath that shares a single adder - muxes select between (acc, a[i]) and (i, 1) as the adder inputs; comparator for i < 128; outputs acc and memory address]
14
Manual Example
  • Comparison with high-level synthesis
  • Determining when to perform each operation
  •   => Scheduling
  • Allocating a resource for each operation
  •   => Resource allocation
  • Mapping operations onto resources
  •   => Binding

15
Another Example
  • Your turn

x = 0;
for (i=0; i < 100; i++) {
    if (a[i] > 0) x++;
    else x--;
    a[i] = x;
}
// output x
  • Steps
  • 1) Build FSM (do not perform if conversion)
  • 2) Build datapath based on FSM

16
High-Level Synthesis
Could be C, C++, Java, Perl, Python, SystemC,
ImpulseC, etc.
High-level Code
High-Level Synthesis
Custom Circuit
Usually an RT-level VHDL description, but could be as
low-level as a bitfile
17
High-Level Synthesis
acc = 0; for (i=0; i < 128; i++) acc += a[i];
High-Level Synthesis
18
Main Steps
[Flow: High-level Code -> Front-end: Syntactic Analysis (converts code to an intermediate representation, so all following steps use a language-independent format) -> Intermediate Representation -> Optimization -> Back-end: Scheduling/Resource Allocation (determines when each operation will execute, and the resources used) -> Binding/Resource Sharing (maps operations onto physical resources) -> Controller + Datapath]
19
Syntactic Analysis
  • Definition: Analysis of code to verify syntactic
    correctness
  • Converts code into intermediate representation
  • 2 steps
  • 1) Lexical analysis (Lexing)
  • 2) Parsing

[Flow: High-level Code -> Lexical Analysis -> Parsing -> Intermediate Representation (the two steps together form syntactic analysis)]
20
Lexical Analysis
  • Lexical analysis (lexing) breaks code into a
    series of defined tokens
  • Token: defined language constructs

x = 0; if (y < z) x = 1;
Lexical Analysis
ID(x), ASSIGN, INT(0), SEMICOLON, IF, LPAREN,
ID(y), LT, ID(z), RPAREN, ID(x), ASSIGN, INT(1),
SEMICOLON
21
Lexing Tools
  • Define tokens using regular expressions - outputs
    C code that lexes input
  • Common tool is lex

/* braces and parentheses */
"{"    { YYPRINT; return LBRACE; }
"}"    { YYPRINT; return RBRACE; }
","    { YYPRINT; return COMMA; }
";"    { YYPRINT; return SEMICOLON; }
"!"    { YYPRINT; return EXCLAMATION; }
"["    { YYPRINT; return LBRACKET; }
"]"    { YYPRINT; return RBRACKET; }
"-"    { YYPRINT; return MINUS; }

/* integers */
[0-9]+ { yylval.intVal = atoi( yytext ); return INT; }
22
Parsing
  • Analysis of token sequence to determine correct
    grammatical structure
  • Languages defined by context-free grammar

Correct Programs:
x = 0; y = 1;
x = 0;
if (a < b) x = 10;
if (var1 != var2) x = 10;
x = 0; if (y < z) x = 1;
x = 0; if (y < z) x = 1; y = 5; t = 1;

Grammar:
Program -> Exp
Exp -> Stmt SEMICOLON | IF LPAREN Cond RPAREN Exp | Exp Exp
Cond -> ID Comp ID
Stmt -> ID ASSIGN INT
Comp -> LT | NE
23
Parsing
Incorrect Programs:
x = 3 + 5;
x = 5
x == 5;
if (x + 5 > y) x = 2;
x = y;

Grammar:
Program -> Exp
Exp -> S SEMICOLON | IF LPAREN Cond RPAREN Exp | Exp Exp
Cond -> ID Comp ID
S -> ID ASSIGN INT
Comp -> LT | NE
24
Parsing Tools
  • Define grammar in special language
  • Automatically creates parser based on grammar
  • Popular tool is yacc - yet-another-compiler-compiler

program:   functions                    { $1 }
functions: function                     { $1 }
         | functions function           { $1 }
function:  HEXNUMBER LABEL COLON code   { $2 }
25
Intermediate Representation
  • Parser converts tokens to intermediate
    representation
  • Usually, an abstract syntax tree

x = 0; if (y < z) x = 1; d = 6;

[Abstract syntax tree: a root with three children - assign(x, 0); if(cond: y < z, then: assign(x, 1)); assign(d, 6)]
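For concreteness, below is a minimal C sketch of what an AST node for this tiny language might look like; the node kinds and field names are hypothetical, not taken from any particular HLS tool.

#include <stdlib.h>

/* Hypothetical node kinds for the statement list shown above */
typedef enum { AST_ASSIGN, AST_IF, AST_COND, AST_ID, AST_INT } AstKind;

typedef struct AstNode {
    AstKind kind;
    int ival;                  /* constant value when kind == AST_INT */
    const char *name;          /* identifier name when kind == AST_ID */
    struct AstNode *child[3];  /* e.g. AST_IF: child[0] = cond, child[1] = then-branch */
} AstNode;

/* Allocate a node with up to three children (unused children stay NULL). */
static AstNode *ast_new(AstKind kind, AstNode *c0, AstNode *c1, AstNode *c2) {
    AstNode *n = calloc(1, sizeof *n);
    n->kind = kind;
    n->child[0] = c0; n->child[1] = c1; n->child[2] = c2;
    return n;
}

An assignment such as x = 0 would then become an AST_ASSIGN node whose two children are an AST_ID node for x and an AST_INT node holding 0.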
26
Intermediate Representation
  • Why use intermediate representation?
  • Easier to analyze/optimize than source code
  • Theoretically can be used for all languages
  • Makes synthesis back end language independent

[Diagram: C code, Java, and Perl each pass through syntactic analysis into a common intermediate representation; the back end (scheduling, resource allocation, binding - sometimes optimizations too) is independent of the source language]
27
Intermediate Representation
  • Different Types
  • Abstract Syntax Tree
  • Control/Data Flow Graph (CDFG)
  • Sequencing Graph
  • Etc.
  • We will focus on CDFG
  • Combines control flow graph (CFG) and data flow
    graph (DFG)

28
Control flow graphs
  • CFG
  • Represents control flow dependencies of basic
    blocks
  • Basic block is section of code that always
    executes from beginning to end
  • I.e. no jumps into or out of block

acc = 0; for (i=0; i < 128; i++) acc += a[i];

[CFG: basic block "acc=0, i=0" -> "if (i < 128)" -> if false: Done; if true: basic block "acc += a[i]; i++" -> back to the comparison]
29
Control flow graphs
  • Your turn
  • Create a CFG for this code

i = 0;
while (j < 10)
    if (x < 5) y = 2;
    else if (z < 10) y = 6;
30
Data Flow Graphs
  • DFG
  • Represents data dependencies between operations

x = a + b;
y = c + d;
z = x - y;

[DFG: a and b feed an adder producing x; c and d feed an adder producing y; x and y feed a subtractor producing z]
31
Control/Data Flow Graph
  • Combines CFG and DFG
  • Maintains DFG for each node of CFG

acc = 0; for (i=0; i < 128; i++) acc += a[i];

[CDFG: the CFG node "acc=0, i=0" holds a DFG assigning the constant 0 to acc and i; the node "if (i < 128)" compares i with 128; the node "acc += a[i]; i++" holds a DFG adding a[i] to acc and 1 to i; the exit node is Done]
32
High-Level Synthesis Optimization
33
Synthesis Optimizations
  • After creating CDFG, high-level synthesis
    optimizes graph
  • Goals
  • Reduce area
  • Improve latency
  • Increase parallelism
  • Reduce power/energy
  • 2 types
  • Data flow optimizations
  • Control flow optimizations

34
Data Flow Optimizations
  • Tree-height reduction
  • Generally made possible from commutativity,
    associativity, and distributivity
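Written out as code, the rebalancing looks like the sketch below (an illustrative example, assuming simple integer adds); the diagram that follows shows the same idea on the DFG.

/* Before: a chain of dependent additions - tree height 3 */
int sum_chain(int a, int b, int c, int d) {
    return ((a + b) + c) + d;          /* each add must wait for the previous one */
}

/* After tree-height reduction: height 2 - the two inner adds are independent,
 * so a synthesized datapath can perform them in the same cycle. */
int sum_balanced(int a, int b, int c, int d) {
    int t1 = a + b;
    int t2 = c + d;
    return t1 + t2;
}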

[DFG diagrams: an unbalanced chain of operations (e.g., (((a + b) + c) + d)) is rebalanced into a tree (e.g., (a + b) + (c + d)) so that independent operations can execute in parallel and the tree height shrinks]
35
Data Flow Optimizations
  • Operator Strength Reduction
  • Replacing an expensive (strong) operation with
    a faster one
  • Common example: replacing multiply/divide with
    shift

1 multiplication                     0 multiplications

b[i] = a[i] * 8;                     b[i] = a[i] << 3;

a = b * 5;                           c = b << 2; a = b + c;

a = b * 13;                          c = b << 2; d = b << 3; a = c + d + b;
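A small C sketch of the test a tool might apply before rewriting a multiply by a constant as a shift; the function name is made up for illustration.

#include <stdio.h>

/* Return k if c == 2^k (c > 0), or -1 if c is not a power of two. */
static int power_of_two_shift(unsigned c) {
    if (c == 0 || (c & (c - 1)) != 0) return -1;
    int k = 0;
    while ((1u << k) != c) k++;
    return k;
}

int main(void) {
    printf("* 8  -> << %d\n", power_of_two_shift(8));   /* a[i] * 8 becomes a[i] << 3 */
    printf("* 13 -> %d\n", power_of_two_shift(13));     /* -1: needs shift-and-add instead */
    return 0;
}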
36
Data Flow Optimizations
  • Constant propagation
  • Statically evaluate expressions with constants

x = 0; y = x * 15; z = y + 10;

x = 0; y = 0; z = 10;
37
Data Flow Optimizations
  • Function Specialization
  • Create specialized code for common inputs
  • Treat common inputs as constants
  • If inputs not known statically, must include if
    statement for each call to specialized function

int f(int x) { y = x * 15; return y + 10; }

for (i=0; i < 1000; i++) f(0);

Treat frequent input as a constant:

int f_opt() { return 10; }

for (i=0; i < 1000; i++) f_opt(0);
38
Data Flow Optimizations
  • Common sub-expression elimination
  • If expression appears more than once, repetitions
    can be replaced

a = x + y;
. . . . . . . . . . . .
b = c * 25 + x + y;

a = x + y;
. . . . . . . . . . . .
b = c * 25 + a;      (x + y already determined)
39
Data Flow Optimizations
  • Dead code elimination
  • Remove code that is never executed
  • May seem like stupid code, but often comes from
    constant propagation or function specialization

int f(int x) {
    if (x > 0) a = b * 15;
    else a = b / 4;
    return a;
}

int f_opt() { a = b * 15; return a; }

Specialized version for x > 0 does not need the else
branch - dead code
40
Data Flow Optimizations
  • Code motion (hoisting/sinking)
  • Avoid repeated computation

for (i=0; i < 100; i++) { z = x + y; b[i] = a[i] + z; }

z = x + y;
for (i=0; i < 100; i++) { b[i] = a[i] + z; }
41
Control Flow Optimizations
  • Loop Unrolling
  • Replicate body of loop
  • May increase parallelism

for (i=0; i < 128; i++) a[i] = b[i] + c[i];

for (i=0; i < 128; i+=2) { a[i] = b[i] + c[i]; a[i+1] = b[i+1] + c[i+1]; }
42
Control Flow Optimizations
  • Function Inlining
  • Replace function call with body of function
  • Common for both SW and HW
  • SW - Eliminates function call instructions
  • HW - Eliminates unnecessary control states

for (i=0; i < 128; i++) a[i] = f(b[i], c[i]);
. . . .
int f(int a, int b) { return a + b * 15; }

for (i=0; i < 128; i++) a[i] = b[i] + c[i] * 15;
43
Control Flow Optimizations
  • Conditional Expansion
  • Replace if with logic expression
  • Execute if/else bodies in parallel

y = ab; if (a) x = b + d; else x = bd;

y = ab; x = a(b + d) + a'bd      [DeMicheli]

Can be further optimized to:
y = ab; x = y + d(a + b)
44
Example
  • Optimize this

x = 0; y = a + b;
if (x < 15) z = a + b - c;
else z = x * 12;
output z * 12;
45
High-Level Synthesis: Scheduling / Resource Allocation
46
Scheduling
  • Scheduling assigns a start time to each operation
    in DFG
  • Start times must not violate dependencies in DFG
  • Start times must meet performance constraints
  • Alternatively, resource constraints
  • Performed on the DFG of each CFG node
  •   => Can't execute multiple CFG nodes in parallel

47
Examples
[Scheduling diagrams: the same small DFG over inputs a-d scheduled in different ways - one schedule spreads the additions over cycles 1-3, while a more parallel schedule finishes in 2 cycles]
48
Scheduling Problems
  • Several types of scheduling problems
  • Usually some combination of performance and
    resource constraints
  • Problems
  • Unconstrained
  • Not very useful, every schedule is valid
  • Minimum latency
  • Latency constrained
  • Minimum-latency, resource-constrained
  • i.e. find the schedule with the shortest latency
    that uses less than a specified number of resources
  • NP-Complete
  • Minimum-resource, latency-constrained
  • i.e. find the schedule that meets the latency
    constraint (which may be anything) and uses the
    minimum number of resources
  • NP-Complete

49
Minimum Latency Scheduling
  • ASAP (as soon as possible) algorithm
  • Find a candidate node
  • Candidate is a node whose predecessors have been
    scheduled and completed (or has no predecessors)
  • Schedule node one cycle later than max cycle of
    predecessor
  • Repeat until all nodes scheduled

[ASAP schedule diagram: a DFG over inputs a-h with multiply, add, subtract, and compare operations; every operation is placed in the earliest cycle its predecessors allow, filling cycles 1-4]

Minimum possible latency: 4 cycles
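A minimal C sketch of the ASAP procedure above, assuming single-cycle operations and a DFG described by a predecessor matrix; the small graph in main() is a made-up example, not the one in the figure.

#include <stdio.h>

#define N 6                      /* number of DFG nodes in the hypothetical example */

int pred[N][N];                  /* pred[i][j] = 1 if node j must complete before node i */
int cycle[N];                    /* assigned start cycle; 0 means not yet scheduled */

void asap(void) {
    int scheduled = 0;
    while (scheduled < N) {
        for (int i = 0; i < N; i++) {
            if (cycle[i]) continue;                  /* already scheduled */
            int ready = 1, latest = 0;
            for (int j = 0; j < N; j++) {
                if (!pred[i][j]) continue;
                if (!cycle[j]) { ready = 0; break; } /* a predecessor is unscheduled */
                if (cycle[j] > latest) latest = cycle[j];
            }
            if (ready) {                             /* candidate: one cycle after latest predecessor */
                cycle[i] = latest + 1;
                scheduled++;
            }
        }
    }
}

int main(void) {
    /* nodes 0, 1, 2 have no predecessors; 3 depends on 0 and 1; 4 on 2; 5 on 3 and 4 */
    pred[3][0] = pred[3][1] = 1;
    pred[4][2] = 1;
    pred[5][3] = pred[5][4] = 1;
    asap();
    for (int i = 0; i < N; i++)
        printf("op %d -> cycle %d\n", i, cycle[i]);
    return 0;
}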
50
Minimum Latency Scheduling
  • ALAP (as late as possible) algorithm
  • Run ASAP, get minimum latency L
  • Find a candidate
  • Candidate is a node whose successors are scheduled
    (or has none)
  • Schedule node one cycle before the min cycle of its
    successors
  • Nodes with no successors scheduled to cycle L
  • Repeat until all nodes scheduled

[ALAP schedule diagram: the same DFG; operations with no successors are placed in cycle L = 4, and every other operation is pushed to one cycle before the earliest cycle of its successors]

L = 4 cycles
51
Minimum Latency Scheduling
  • ALAP (as late as possible) algorithm
  • (Same steps as the previous slide)

[ALAP schedule diagram (completed): all operations placed in cycles 1-4]

L = 4 cycles
52
Minimum Latency Scheduling
  • ALAP
  • Has to run ASAP first, seems pointless
  • But, many heuristics need the mobility/slack of
    each operation
  • ASAP gives the earliest possible time for an
    operation
  • ALAP gives the latest possible time for an
    operation
  • Slack = difference between earliest and latest
    possible schedule
  • Slack = 0 implies operation has to be done in the
    currently scheduled cycle
  • The larger the slack, the more options a
    heuristic has to schedule the operation
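Given the two schedules, the slack used by the heuristics that follow is simply their difference; a one-function C sketch, assuming the ASAP and ALAP start cycles are already stored in arrays:

/* slack[i] = alap[i] - asap[i]; slack 0 means op i has no scheduling freedom. */
void compute_slack(const int asap[], const int alap[], int slack[], int n) {
    for (int i = 0; i < n; i++)
        slack[i] = alap[i] - asap[i];
}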

53
Latency-Constrained Scheduling
  • Instead of finding the minimum latency, find
    latency less than L
  • Solutions
  • Use ASAP, verify that the minimum latency is less than L
  • Use ALAP starting with cycle L instead of the minimum
    latency (don't need ASAP)

54
Scheduling with Resource Constraints
  • Schedule must use less than specified number of
    resources

Constraints: 1 ALU (+/-), 1 Multiplier

[Schedule diagram: with only one ALU and one multiplier, the DFG over inputs a-g stretches across 5 cycles, since at most one ALU operation and one multiply can execute per cycle]
55
Scheduling with Resource Constraints
  • Schedule must use less than specified number of
    resources

Constraints: 2 ALUs (+/-), 1 Multiplier

[Schedule diagram: with a second ALU, two ALU operations can share a cycle and the same DFG finishes in 4 cycles]
56
Minimum-Latency, Resource-Constrained Scheduling
  • Definition: Given resource constraints, find the
    schedule that has the minimum latency
  • Example

Constraints: 1 ALU (+/-), 1 Multiplier

[Schedule diagram: one possible schedule under these constraints takes 6 cycles]
57
Minimum-Latency, Resource-Constrained Scheduling
  • Definition: Given resource constraints, find the
    schedule that has the minimum latency
  • Example

Constraints: 1 ALU (+/-), 1 Multiplier

[Schedule diagram: a better schedule under the same constraints takes only 5 cycles]

Different schedules may use the same resources but
have different latencies
58
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Assumes one type of resource
  • Basic Idea
  • Input: graph, number of resources r
  • 1) Label each node by max distance from output
  • i.e. use path length as priority
  • 2) Determine C, the set of scheduling candidates
  • Candidate if either no predecessors, or
    predecessors scheduled
  • 3) From C, schedule up to r nodes to the current
    cycle, using label as priority
  • 4) Increment current cycle, repeat from 2) until
    all nodes scheduled

59
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Example

[Example DFG: nodes a-k feeding add, subtract, and multiply operations; resource constraint r = 3]
60
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Step 1 - Label each node by max distance from
    output
  • i.e. use path length as priority

[Example DFG with labels: nodes on the longest paths get label 4, then 3, 2, and 1 toward the outputs; resource constraint r = 3]
61
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Step 2 - Determine C, the set of scheduling
    candidates

[Example: for cycle 1, the candidate set C contains the nodes with no predecessors; r = 3]
62
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Step 3 - From C, schedule up to r nodes to the
    current cycle, using label as priority

[Example: the three highest-priority candidates are scheduled to cycle 1; one candidate is not scheduled due to its lower priority; r = 3]
63
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Step 2 (repeated)

[Example: the candidate set C is recomputed for cycle 2 from the nodes whose predecessors are now scheduled; r = 3]
64
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Step 3 (repeated)

[Example: up to r = 3 of the cycle-2 candidates are scheduled to cycle 2]
65
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Skipping to the finish

[Example: the remaining nodes fill cycles 3 and 4, completing the schedule; r = 3]
66
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's solves a simplified problem
  • Common Extensions
  • Multiple resource types
  • Multi-cycle operations

[Example DFG: add, subtract, and a divide; the divide is a multi-cycle operation spanning cycles 1 and 2]
67
Minimum-Latency, Resource-Constrained Scheduling
  • List Scheduling (minimum-latency,
    resource-constrained version)
  • Extension for multiple resource types
  • Basic Idea - Hu's algorithm for each resource
    type
  • Input: graph, set of constraints R for each
    resource type
  • 1) Label nodes based on max distance to output
  • 2) For each resource type t
  • 3) Determine candidate nodes, C (those w/ no
    predecessors or w/ scheduled predecessors)
  • 4) Schedule up to Rt operations from C based on
    priority, to the current cycle
  • Rt is the constraint on resource type t
  • 5) Increment cycle, repeat from 2) until all
    nodes scheduled
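A condensed C sketch of this list-scheduling loop, assuming single-cycle operations, one priority label per node, and a per-type resource limit; the graph, labels, and limits in main() are invented for illustration.

#include <stdio.h>

#define N      8      /* number of DFG operations (hypothetical) */
#define TYPES  2      /* resource types: 0 = ALU, 1 = multiplier */

int pred[N][N];       /* pred[i][j] = 1 if op j must finish before op i */
int type[N];          /* resource type of each operation */
int label[N];         /* priority: max distance to output */
int limit[TYPES];     /* resource constraint per type */
int cycle[N];         /* assigned cycle, 0 = unscheduled */

void list_schedule(void) {
    int done = 0, t = 1;
    while (done < N) {
        for (int rt = 0; rt < TYPES; rt++) {
            int used = 0;
            while (used < limit[rt]) {
                int best = -1;
                for (int i = 0; i < N; i++) {       /* highest-priority ready op of type rt */
                    if (cycle[i] || type[i] != rt) continue;
                    int ready = 1;
                    for (int j = 0; j < N; j++)
                        if (pred[i][j] && (!cycle[j] || cycle[j] >= t)) ready = 0;
                    if (ready && (best < 0 || label[i] > label[best])) best = i;
                }
                if (best < 0) break;                /* no more candidates this cycle */
                cycle[best] = t;
                used++;
                done++;
            }
        }
        t++;                                        /* advance to the next cycle */
    }
}

int main(void) {
    /* ops 0-3 are multiplies, ops 4-7 are ALU ops; 4 depends on 0,1; 5 on 2,3; 6 on 4,5; 7 on 6 */
    for (int i = 0; i < 4; i++) type[i] = 1;
    pred[4][0] = pred[4][1] = 1;
    pred[5][2] = pred[5][3] = 1;
    pred[6][4] = pred[6][5] = 1;
    pred[7][6] = 1;
    limit[0] = 2; limit[1] = 2;                     /* 2 ALUs, 2 multipliers */
    label[0] = label[1] = label[2] = label[3] = 3;  /* distance-to-output priorities */
    label[4] = label[5] = 2; label[6] = 1; label[7] = 0;
    list_schedule();
    for (int i = 0; i < N; i++)
        printf("op %d (type %d) -> cycle %d\n", i, type[i], cycle[i]);
    return 0;
}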

68
Minimum-Latency, Resource-Constrained Scheduling
  • List scheduling - minimum latency
  • Step 1 - Label nodes based on max distance to
    output (labels not shown, so you can see the operations)
  • Nodes given IDs 1-8 for illustration purposes

2 ALUs (+/-), 2 Multipliers

[Example DFG: operations 1-8 (multiplies and ALU add/subtract operations) to be scheduled]
69
Minimum-Latency, Resource-Constrained Scheduling
  • List scheduling - minimum latency
  • For each resource type t
  • 3) Determine candidate nodes, C (those w/ no
    predecessors or w/ scheduled predecessors)
  • 4) Schedule up to Rt operations from C based on
    priority, to the current cycle
  • Rt is the constraint on resource type t

2 ALUs (+/-), 2 Multipliers

[Candidate table, cycle 1: the operations with no predecessors are candidates for their resource types]
70
Minimum-Latency, Resource-Constrained Scheduling
  • List scheduling - minimum latency
  • (Steps 3 and 4 repeated for cycle 1)

2 ALUs (+/-), 2 Multipliers

[Cycle 1: the highest-priority candidates are scheduled; one ALU candidate remains unscheduled due to its lower priority]
71
Minimum-Latency, Resource-Constrained Scheduling
  • List scheduling - minimum latency
  • (Steps 3 and 4 repeated for cycle 2)

2 ALUs (+/-), 2 Multipliers

[Cycle 2: the candidate table is updated with the operations whose predecessors completed in cycle 1]
72
Minimum-Latency, Resource-Constrained Scheduling
  • List scheduling - minimum latency
  • (Steps 3 and 4 repeated)

2 ALUs (+/-), 2 Multipliers

[Cycle 2: the cycle-2 candidates are scheduled]
73
Minimum-Latency, Resource-Constrained Scheduling
  • List scheduling - minimum latency
  • (Steps 3 and 4 repeated)

2 ALUs (+/-), 2 Multipliers

[Cycle 3: the candidate table is updated (op 7 becomes a candidate)]
74
Minimum-Latency, Resource-Constrained Scheduling
  • List scheduling (minimum latency)
  • Final schedule
  • Note - ASAP would require more resources
  • ALAP wouldn't here, but in general it would

2 ALUs (+/-), 2 Multipliers

[Final schedule: cycle 1 - ops 1, 2, 3; cycle 2 - ops 4, 5, 6; cycle 3 - op 7; cycle 4 - op 8]
75
Minimum-Latency, Resource-Constrained Scheduling
  • Extension for multicycle operations
  • Same idea (differences shown in red)
  • Input graph, set of constraints R for each
    resource type
  • 1) Label nodes based on max cycle latency to
    output
  • 2) For each resource type t
  • 3) Determine candidate nodes, C (those w/ no
    predecessors or w/ scheduled and completed
    predecessors)
  • 4) Schedule up to (Rt - nt) operations from C
    based on priority, one cycle after predecessor
  • Rt is the constraint on resource type t
  • nt is the number of resource t in use from
    previous cycles
  • Repeat from 2) until all nodes scheduled

76
Minimum-Latency, Resource-Constrained Scheduling
  • Example

2 ALUs (+/-), 2 Multipliers

[Example schedule with multi-cycle operations: the same DFG now takes 7 cycles because each multiply occupies more than one cycle]
77
List Scheduling (Min Latency)
  • Your turn (2 ALUs, 1 Mult)
  • Steps (will be on test)
  • 1) Label nodes with priority
  • 2) Update candidate list for each cycle
  • 3) Redraw graph to show schedule

[Exercise DFG: operations 1-11 with add, subtract, and multiply nodes]
78
List Scheduling (Min Latency)
  • Your turn (2 ALUs, 1 Mult, Mults take 2 cycles)

[Exercise DFG: inputs a-g feeding adds and multiplies, operations numbered 1-7]
79
Minimum-Resource, Latency-Constrained
  • Note that if no resource constraints given,
    schedule determines number of required resources
  • Max of each resource type used in a single cycle

[Example: a 4-cycle schedule whose busiest cycle uses 3 ALUs and 2 multipliers, so it requires 3 ALUs and 2 multipliers]
80
Minimum-Resource, Latency-Constrained
  • Minimum-Resource Latency-Constrained Scheduling
  • For all schedules that have latency less than the
    constraint, find the one that uses the fewest
    resources

Latency Constraint: 4 cycles

[Two schedules of the same DFG, both meeting the 4-cycle latency constraint: one needs 3 ALUs and 2 multipliers, the other only 2 ALUs and 1 multiplier]
81
Minimum-Resource, Latency-Constrained
  • List scheduling (minimum-resource version)
  • Basic Idea
  • 1) Compute latest start times for each op using
    ALAP with the specified latency constraint
  • Latest start times must account for multicycle
    operations
  • 2) For each resource type
  • 3) Determine candidate nodes
  • 4) Compute slack for each candidate
  • Slack = latest possible cycle - current cycle
  • 5) Schedule ops with 0 slack
  • Update required number of resources (assume 1 of
    each to start with)
  • 6) Schedule ops that require no extra resources
  • 7) Repeat from 2) until all nodes scheduled
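A compact C sketch of the per-cycle decision in this minimum-resource variant (steps 5 and 6), assuming the candidate list and the ALAP-derived latest cycles are computed elsewhere; the driver loop over cycles and resource types is omitted.

/* For one resource type in one cycle:
 * schedule every candidate whose slack is zero (it cannot wait),
 * growing the resource count if needed, then fill any resources
 * already allocated with the remaining candidates. */
void schedule_cycle(const int cand[], int ncand, const int alap[], int t,
                    int cycle[], int *num_resources) {
    int used = 0;
    for (int k = 0; k < ncand; k++) {           /* pass 1: zero-slack operations */
        int op = cand[k];
        if (alap[op] - t == 0) {
            cycle[op] = t;
            used++;
        }
    }
    if (used > *num_resources)
        *num_resources = used;                  /* forced to allocate more resources */
    for (int k = 0; k < ncand && used < *num_resources; k++) {
        int op = cand[k];                       /* pass 2: free rides on existing resources */
        if (!cycle[op]) {
            cycle[op] = t;
            used++;
        }
    }
}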

82
Minimum-Resource, Latency-Constrained
  • 1) Find ALAP schedule

c
d
e
a
b
f
g
j
k

4
-
2
3
Last Possible Cycle


1

6

5
LPC
Node
-
7
  • 1
  • 1
  • 1
  • 3
  • 2
  • 2
  • 3

Latency Constraint 3 cycles
c
d
e
a
b
f
g
j
k
Cycle1
2

3


1
Cycle2

6

5
4
Cycle3
-
-
7
Defines last possible cycle for each operation
83
Minimum-Resource, Latency-Constrained
  • 2) For each resource type
  • 3) Determine candidate nodes C
  • 4) Compute slack for each candidate
  • Slack = latest possible cycle - current cycle

Initial Resources: 1 Mult, 1 ALU

[Cycle 1: candidates are ops 1, 2, 3, 4; ops 1, 2, 3 have slack 0, op 4 has slack 2]
84
Minimum-Resource, Latency-Constrained
  • 5) Schedule ops with 0 slack
  • Update required number of resources
  • 6) Schedule ops that require no extra resources

Resources: 1 Mult, 2 ALU

[Cycle 1: the zero-slack ops 1, 2, 3 are scheduled, forcing a second ALU; op 4 would require one more ALU, so it is not scheduled]
85
Minimum-Resource, Latency-Constrained
  • 2) For each resource type
  • 3) Determine candidate nodes C
  • 4) Compute slack for each candidate
  • Slack = latest possible cycle - current cycle

Resources: 1 Mult, 2 ALU

[Cycle 2: candidates are ops 4, 5, 6; ops 5 and 6 have slack 0, op 4 has slack 1]
86
Minimum-Resource, Latency-Constrained
  • 5) Schedule ops with 0 slack
  • Update required number of resources
  • 6) Schedule ops that require no extra resources

Resources: 2 Mult, 2 ALU

[Cycle 2: the zero-slack ops 5 and 6 are scheduled, forcing a second multiplier; an ALU is already free this cycle, so op 4 is scheduled without extra resources]
87
Minimum-Resource, Latency-Constrained
  • 2) For each resource type
  • 3) Determine candidate nodes C
  • 4) Compute slack for each candidate
  • Slack = latest possible cycle - current cycle

Resources: 2 Mult, 2 ALU

[Cycle 3: the only candidate is op 7, with slack 0]
88
Minimum-Resource, Latency-Constrained
  • Final Schedule

Required Resources: 2 Mult, 2 ALU

[Final schedule: cycle 1 - ops 1, 2, 3; cycle 2 - ops 4, 5, 6; cycle 3 - op 7; the 3-cycle latency constraint is met]
89
Other extensions
  • Chaining
  • Multiple operations in a single cycle
  • Pipelining
  • Input DFG, data delivery rate
  • For fully pipelined circuit, must have one
    resource per operation (remember systolic arrays)

[Chaining example: a small DFG with adds, a divide, and a subtract; multiple chained adds may be faster than one divide, so several adds can be performed in a single cycle]
90
Summary
  • Scheduling assigns each operation in a DFG a
    start time
  • Done for each DFG in the CDFG
  • Different Types
  • Minimum Latency
  • ASAP, ALAP
  • Latency-constrained
  • ASAP, ALAP
  • Minimum-latency, resource-constrained
  • Hu's Algorithm
  • List Scheduling
  • Minimum-resource, latency-constrained
  • List Scheduling

91
High-Level Synthesis: Binding / Resource Sharing
92
Binding
  • During scheduling, we determined
  • When ops will execute
  • How many resources are needed
  • We still need to decide which ops execute on
    which resources
  •   => Binding
  • If multiple ops use the same resource
  •   => Resource Sharing

93
Binding
  • Basic Idea - Map operations onto resources such
    that operations in the same cycle don't use the same
    resource

2 ALUs (+/-), 2 Multipliers

[Binding diagram: the scheduled DFG (ops 1-8 over cycles 1-4) with each operation assigned to one of ALU1, ALU2, Mult1, Mult2]
94
Binding
  • Many possibilities
  • Bad binding may increase resources, require huge
    steering logic, reduce clock, etc.

2 ALUs (+/-), 2 Multipliers

[Binding diagram: an alternative assignment of the same scheduled operations to ALU1, ALU2, Mult1, Mult2]
95
Binding
  • Can't do this
  • 1 resource can't perform multiple ops
    simultaneously!

2 ALUs (+/-), 2 Multipliers

[Binding diagram: an invalid binding that assigns two operations from the same cycle to the same resource]
96
Binding
  • How to automate?
  • More graph theory
  • Compatibility Graph
  • Each node is an operation
  • Edges represent compatible operations
  • Compatible - if two ops can share a resource
  • I.e. Ops that use same type of resource (ALU,
    etc.) and are scheduled to different cycles

97
Compatibility Graph

[Scheduled DFG (ops 1-8 over cycles 1-4) and its compatibility graph: ALU ops 2, 3, 4, 7, 8 and multiplier ops 1, 5, 6; ops 2 and 3 are not compatible (same cycle), and ops 5 and 6 are not compatible (same cycle)]
98
Compatibility Graph

[Compatibility graph with a fully connected subgraph highlighted - fully connected subgraphs can share a resource (all involved nodes are compatible)]
99
Compatibility Graph

[Compatibility graph with another fully connected subgraph highlighted - fully connected subgraphs can share a resource (all involved nodes are compatible)]
100
Compatibility Graph

[Compatibility graph with a third fully connected subgraph highlighted - fully connected subgraphs can share a resource (all involved nodes are compatible)]
101
Compatibility Graph
  • Binding: Find the minimum number of fully connected
    subgraphs that cover the entire graph
  • Well-known problem: Clique partitioning
    (NP-complete)
  • Cliques: {2,8,7,4}, {3}, {1,5}, {6}
  • ALU1 executes 2, 8, 7, 4
  • ALU2 executes 3
  • MULT1 executes 1, 5
  • MULT2 executes 6

[Compatibility graph partitioned into these four cliques]
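Exact clique partitioning is NP-complete, so practical binders use heuristics; the C sketch below greedily walks the operations and reuses the first resource whose already-bound operations are all compatible with the new one. The type and cycle arrays are hypothetical inputs from scheduling, not the example above.

#define N 8

int type[N];     /* resource type of each operation (e.g. 0 = ALU, 1 = multiplier) */
int cycle[N];    /* scheduled cycle of each operation */
int bind[N];     /* resource instance assigned to each operation */

/* Two ops are compatible if they need the same resource type in different cycles. */
static int compatible(int a, int b) {
    return type[a] == type[b] && cycle[a] != cycle[b];
}

int greedy_bind(void) {
    int resources = 0;
    for (int i = 0; i < N; i++) {
        int r;
        for (r = 0; r < resources; r++) {        /* try to reuse an existing resource */
            int ok = 1;
            for (int j = 0; j < i; j++)
                if (bind[j] == r && !compatible(i, j)) ok = 0;
            if (ok) break;
        }
        if (r == resources) resources++;         /* otherwise open a new resource instance */
        bind[i] = r;
    }
    return resources;
}

Resources of different types never end up sharing an instance, because compatible() already requires matching types.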
102
Compatibility Graph
  • Final Binding

[Scheduled DFG (cycles 1-4) annotated with the clique-based binding: ALU ops 2, 8, 7, 4 share one ALU and op 3 uses the other; multiplier ops 1 and 5 share one multiplier and op 6 uses the other]
103
Compatibility Graph
  • Alternative Final Binding

[Scheduled DFG annotated with a different, equally valid clique partitioning of the compatibility graph]
104
Translation to Datapath
[Scheduled DFG (ops 1-8 over cycles 1-4) with inputs a-i]

  • Add resources and registers
  • Add a mux for each input
  • Add an input to the left mux for each left input in the DFG
  • Do the same for the right mux
  • If only 1 input, remove the mux

[Resulting datapath: Mult(1,5), Mult(6), ALU(2,7,8,4), and ALU(3), each with left/right input muxes selecting among the DFG operands (a-i and register outputs) and an output register]
105
Left Edge Algorithm
  • Alternative to clique partitioning
  • Take scheduled DFG, rotate it 90 degrees

2 ALUs (+/-), 2 Multipliers
106
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers

  • 1) Initialize right_edge to 0
  • 2) Find a node N whose left edge is > right_edge
  • 3) Bind N to a particular resource
  • 4) Update right_edge to the right edge of N
  • 5) Repeat from 2) for nodes using the same resource
    type until right_edge passes all nodes
  • 6) Repeat from 1) until all nodes bound

[Rotated schedule diagram: each operation (1-8) drawn as an interval spanning the cycles it occupies (cycles 1-7 left to right), with right_edge starting at 0]
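A small C sketch of the left-edge sweep described above, treating each operation of one resource type as an interval [left, right] of occupied cycles (so multi-cycle operations are handled too); the interval data below is hypothetical.

#include <stdio.h>

#define N 5

int left[N]  = {1, 1, 2, 3, 4};   /* first cycle each op occupies (made-up example) */
int right[N] = {1, 2, 2, 3, 4};   /* last cycle each op occupies */
int bind[N];                      /* resource instance assigned, 0 = unbound */

void left_edge(void) {
    int resource = 0, bound = 0;
    while (bound < N) {
        int right_edge = 0;
        resource++;                              /* open a new resource instance */
        for (;;) {
            int best = -1;
            for (int i = 0; i < N; i++)          /* unbound op with smallest left edge past right_edge */
                if (!bind[i] && left[i] > right_edge &&
                    (best < 0 || left[i] < left[best])) best = i;
            if (best < 0) break;                 /* this resource cannot take any more ops */
            bind[best] = resource;
            right_edge = right[best];
            bound++;
        }
    }
}

int main(void) {
    left_edge();
    for (int i = 0; i < N; i++)
        printf("op %d [%d..%d] -> resource %d\n", i, left[i], right[i], bind[i]);
    return 0;
}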
107
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers
  • (Steps 1-6 as listed above)

[Left-edge diagram: the sweep binds the next interval whose left edge is past right_edge, then updates right_edge; a new resource is opened when no further interval fits]
108
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers
  • (Steps 1-6 as listed above)

[Left-edge diagram: the sweep binds the next interval whose left edge is past right_edge, then updates right_edge]
109
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers
  • (Steps 1-6 as listed above)

[Left-edge diagram: the sweep continues on the current resource]
110
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers
  • (Steps 1-6 as listed above)

[Left-edge diagram: the current resource is full, so a new resource instance is opened and right_edge resets to 0]
111
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers
  • (Steps 1-6 as listed above)

[Left-edge diagram: the sweep binds the next interval whose left edge is past right_edge, then updates right_edge]
112
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers
  • (Steps 1-6 as listed above)

[Left-edge diagram: the sweep continues on the current resource]
113
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers
  • (Steps 1-6 as listed above)

[Left-edge diagram: the sweep binds the next interval whose left edge is past right_edge, then updates right_edge]
114
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers
  • (Steps 1-6 as listed above)

[Left-edge diagram: the sweep continues until all operations of each resource type are bound]