Title: The Design of Application Specific Integrated Circuits with High Level Synthesis Approaches
1The Design of Application Specific Integrated Circuits with High Level Synthesis Approaches
Shiann-Rong Kuang, Assistant Professor
Dept. of Computer Science and Engineering
National Sun Yat-Sen University
2Outlines
- Introduction
- Novel High Level Synthesis Approaches
- Integrated Data Path Synthesis Approach
- Pipelined Control Path Synthesis Approach
- Dynamic Pipelining Approach
- ASICs design
- Binary Arithmetic Coder
- Low-Error Fixed-Width Multipliers
- Fuzzy Color Corrector
- Future Work
3Introduction
- High level synthesis
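- Behavioral description → register transfer level description
- Data path synthesis and control path synthesis
[Figure: a behavioral description (t1 = a - b; t2 = c ? t1; t3 = e - f; x = d - t2; y = t1 ? t3, where ? marks an operator missing from the transcript) synthesized into a data path with inputs a-f, intermediate values t1-t3, and outputs x and y, controlled by an FSM]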
4Integrated Data Path Synthesis Approach
- Data Path Synthesis
- module selection, scheduling, and allocation are highly interdependent
- solving them separately → the best designs may not be explored
- Proposed Data Path Synthesis Approach
- combine module selection, scheduling, and allocation
- general module selection model
- module types with different attributes (delay, area, ...)
- a mixed-vertex compatibility graph model
- solve it globally using partial clique partitioning
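As a rough illustration of what clique partitioning does on a compatibility graph, the sketch below groups mutually compatible operations so that each group can share one resource. It is a generic greedy heuristic on a plain compatibility graph, not the mixed-vertex model or the partial clique partitioning of the proposed approach; the function name `greedy_clique_partition` and the example operations are hypothetical.

```python
from typing import Dict, List, Set


def greedy_clique_partition(compat: Dict[str, Set[str]]) -> List[Set[str]]:
    """Greedily group mutually compatible vertices into cliques.

    compat maps each vertex to the set of vertices it may share a
    resource with (a symmetric relation). Generic heuristic only; it is
    not the partial clique partitioning named in the slides.
    """
    unassigned = sorted(compat)  # fixed order keeps the result deterministic
    cliques: List[Set[str]] = []
    while unassigned:
        seed = unassigned.pop(0)
        clique = {seed}
        for v in list(unassigned):
            # v joins only if it is compatible with every current member
            if all(v in compat[u] for u in clique):
                clique.add(v)
                unassigned.remove(v)
        cliques.append(clique)
    return cliques


# Hypothetical compatibility relation between five operations:
# two operations are compatible if they could share one functional unit.
compat = {
    "sub1": {"sub3"},
    "sub3": {"sub1"},
    "sub4": set(),
    "op2": {"op5"},
    "op5": {"op2"},
}
cliques = greedy_clique_partition(compat)
```

Each resulting clique stands for one shared functional unit, so fewer cliques means fewer units in the data path.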
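5[Figure: data flow graph of the example, with inputs a-f, operations 1-5 (operations 1, 3, and 4 are subtractions), intermediate values t1-t3, and outputs x and y]
Clock cycle = 100 ns, latency = 5, performance constraint = 500 ns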
Cost            Circuit 1   Circuit 2
Module cost     340         380
MUX cost        200         80
Wire cost       1200        1100
Register cost   900         900
Total cost      2640        2460
6- Find all feasible assignments
[Figure: initial MCG (mixed-vertex compatibility graph); its vertices are the feasible assignments, labeled Axyz (for example A130, A211, A332, A440, A511)]
7MCG after iteration 1
MCG after iteration 2
8Final MCG
MCG after iteration 3
9Integrated Data Path Synthesis Approach
10Integrated Data Path Synthesis Approach
11Integrated Data Path Synthesis Approach
12Pipelined Control Path Synthesis Approach
- Main Idea of Pipelining Control Path
13Pipelined Control Path Synthesis Approach
- Proposed Control Path Synthesis Approach
- Problem: pipelining the control path may violate control dependencies
- Modify the original BSTG by inserting no-operation (NOOP) states
- Theorem
- A BSTG satisfies all control dependencies if the distance Dij between the states of each produce-consume state pair <Si, Sj>c satisfies one of the following conditions:
- Condition 1: if Sj is not a branch state, then Dij ≥ k.
- Condition 2: if Sj is a branch state, then Dij ≥ 2k-1.
- Nij: the minimal number of NOOPs that must be inserted between <Si, Sj>c
- Nij = 2k - Dij - 1, if Sj is a branch state
- Nij = k - Dij, otherwise.
- Minimize the number of NOOPs using an ILP formulation
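The NOOP count from the theorem can be checked with a few lines of Python. This is a minimal sketch of the per-pair formula only; the ILP that minimizes the total over the whole BSTG is not reproduced. The function name `noops_needed` is hypothetical, and clamping the result at zero for pairs that already satisfy their condition is an assumption not stated in the slides.

```python
def noops_needed(distance: int, k: int, consumer_is_branch: bool) -> int:
    """Minimal NOOPs to insert between a produce-consume pair <Si, Sj>c.

    distance is Dij, k is the parameter from the theorem.
    Assumption: a pair that already meets its condition needs 0 NOOPs.
    """
    # Condition 2 (branch consumer) needs Dij >= 2k-1, Condition 1 needs Dij >= k.
    required = 2 * k - 1 if consumer_is_branch else k
    return max(0, required - distance)


# Example with k = 3 and adjacent states (Dij = 1).
plain = noops_needed(1, 3, consumer_is_branch=False)   # k - Dij
branch = noops_needed(1, 3, consumer_is_branch=True)   # 2k - Dij - 1
```

With k = 3 and Dij = 1, a non-branch consumer needs 2 NOOPs and a branch consumer needs 4.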
14SCDFG
15(No Transcript)
16(No Transcript)
17Dynamic Pipelining Approach
- Pipelining
- In most existing pipelining techniques, the latency is fixed or takes one of a few fixed values
- In some loops of ASICs, variable loop execution length and time-relative data dependencies between different iterations make pipelining inefficient or impossible
- Dynamic pipelining
- A new loop scheduling approach that pipelines the loop using variable latencies
- The controller consists of two interacting finite state machines
[Figure: nested loops of the form while (c1) { ... while (c2) { ... } ... }]
18Dynamic Pipelining Approach
19An Example of Dynamic Pipelining
j = 1;
while (N > j) {            /* N is the number of data items to be sorted */
    i = j - 1;
    temp = a[j];
    while (i >= 0 && temp < a[i]) {
        a[i+1] = a[i];
        i = i - 1;
    }
    a[i+1] = temp;
    j++;
}
20
S1:           j = 1;                                       o1
S2:  O_loop:  if (N <= j) goto End_O;                      o2
              i = j - 1;                                   o3
              r_add = j;                                   o4
S3:           j++;                                         o5
              temp = a[r_add];                             o6
S4:  I_loop:  r_add = i;                                   o7
S5:           data = a[r_add];                             o8
S6:           w_add = i + 1;                               o9
              if (!(temp < data && i >= 0)) goto End_I;    o10
S7:           a[w_add] = data;                             o11
              i = i - 1; goto I_loop;                      o12
S8:  End_I:   a[w_add] = temp; goto O_loop;                o13
     End_O:
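To see how the state-level description (S1-S8) behaves, the following Python sketch steps through the states sequentially and records which states are visited. It models only the unpipelined state sequence, not the dynamically pipelined controller; the function name `run_sorter` is hypothetical, and treating a read at address -1 as a don't-care value is an assumption.

```python
from typing import List, Tuple


def run_sorter(values: List[int]) -> Tuple[List[int], List[str]]:
    """Step through states S1-S8; return (sorted list, visited states)."""
    a = list(values)
    n = len(a)
    trace: List[str] = ["S1"]
    j = 1                                    # S1
    while True:
        trace.append("S2")                   # S2 (O_loop)
        if n <= j:
            break                            # goto End_O
        i = j - 1
        r_add = j
        trace.append("S3")                   # S3
        j += 1
        temp = a[r_add]
        while True:
            trace.append("S4")               # S4 (I_loop)
            r_add = i
            trace.append("S5")               # S5
            # Assumption: a read at address -1 returns a don't-care value.
            data = a[r_add] if r_add >= 0 else 0
            trace.append("S6")               # S6
            w_add = i + 1
            if not (i >= 0 and temp < data):
                break                        # goto End_I
            trace.append("S7")               # S7
            a[w_add] = data
            i -= 1
        trace.append("S8")                   # S8 (End_I)
        a[w_add] = temp
    return a, trace


result, trace = run_sorter([3, 1, 2])
inner_visits = trace.count("S4")  # number of inner-loop passes, data dependent
```

The number of passes through S4-S7 depends on the input data, which is the variable loop execution length that a fixed-latency pipeline handles poorly.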
21BSTGo
new PBSTGi
original PBSTGi
22new BSTGo
L3
unwind the loop body four times
23final PBSTGo
final PBSTGi
24[Figure: controller block diagram. Labeled elements: inner controller and outer controller (each with state registers and combinational logic), a Mux, the signals start, done, run, ci, and co, control signals from and to the datapath, and Eqs. (3.3)-(3.5)]
25An execution example
[Figure: cycle-by-cycle execution trace of the dynamically pipelined example. Rows: iteration, i, the pipeline stages of the inner controller (PS1, PS2, Nop) and of the outer controller (PS1-PS3, Nop), state(i), state(o), and the done, start, and run signals. The latencies shown are 7, 3, 5, and 3.]
26Experimental Results
- Comparing results of insertion sorter
- Other examples
27Binary Arithmetic Coder
- Adaptive Binary Arithmetic Coder
- Q-coder: compresses mainly bilevel image data
- Goal: a compression chip universal enough to quickly compress any type of data while still achieving a good compression ratio
- proposed modified, hardware-oriented algorithm
- a new probability estimation modeler using a table-look-up approach
- a technique that solves the carry-over and source termination problems
- fixed-width parallel multiplier
- VLSI chip
28Encoding Algorithm
Encoding() {
  C = 0x00; A = 0xff; R = 0x0000; S = 0000000000;
  for (each input binary symbol) {
    phase 1: Generate P('0'|S) by Eq. (4.5);
    phase 2: AP = A × P('0'|S);
             if (input symbol == '0') A = AP;
             else { A = A - AP; C = C + AP; if (carry occurs) R++; }
             Update the adaptive modeler by Eq. (4.6);
             Shift the input symbol into S;
    phase 3: while (MSB of A == 0) normalization_of_encoding();
  }
  Encode LPS and then output 17 consecutive '1's;
}
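The phase 2 interval update can be sketched in Python as follows. This is only an illustration under stated assumptions: A and C are treated as 8-bit registers (suggested by their initial values 0xff and 0x00), P('0'|S) is represented as an 8-bit fraction `p0`/256, the product keeps only its upper 8 bits, and C is updated only when the input symbol is '1'. The probability modeler (Eqs. (4.5) and (4.6)), the register R, and the normalization step are not modeled, and the name `encode_step` is hypothetical.

```python
from typing import Tuple


def encode_step(A: int, C: int, p0: int, bit: int) -> Tuple[int, int, bool]:
    """One phase-2 update; returns (new A, new C, carry flag).

    Assumptions: 8-bit A and C, p0 = P('0'|S) scaled by 256.
    """
    AP = (A * p0) >> 8            # fixed-width product: keep the upper 8 bits
    carry = False
    if bit == 0:
        A = AP                    # symbol '0' keeps the sub-interval of width AP
    else:
        A = A - AP                # symbol '1' keeps the remaining width
        total = C + AP            # and moves the code register up by AP
        carry = total > 0xFF      # overflow of the 8-bit code register
        C = total & 0xFF
    return A, C, carry


# Initial registers from the algorithm, with an assumed P('0'|S) of 0.5.
after_zero = encode_step(0xFF, 0x00, 0x80, 0)
after_one = encode_step(0xFF, 0x00, 0x80, 1)
```

A carry out of C is the carry-over case that the encoder has to propagate into bits already produced.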
29System Architecture
30(No Transcript)
31(No Transcript)
32Dynamic Pipelining Design
33Low-Error Fixed-Width Multipliers
- Fixed-Width Multiplier
- multiplication operations used in many ASICs have the special fixed-width property
- directly omit about half the adder cells of the conventional parallel multiplier
- → a significant error would be introduced in the product
- Low-Error Fixed-Width Multiplier
- low-error fixed-width sign-magnitude multipliers
- low-error fixed-width two's complement multipliers
- reduced width multiplier (n < m < 2n)
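The error caused by directly omitting the low-order adder cells can be made concrete with a small Python sketch. It compares the n most significant bits of the exact 2n-bit product with the result obtained when every partial-product bit in the n low-order columns is dropped. Unsigned operands and n = 6 are assumptions chosen for illustration, the function names are hypothetical, and the low-error compensation proposed in the slides is not modeled.

```python
def exact_fixed_width(x: int, y: int, n: int) -> int:
    """Upper n bits of the exact 2n-bit unsigned product."""
    return (x * y) >> n


def direct_truncated(x: int, y: int, n: int) -> int:
    """Product with all partial-product bits of weight below 2**n omitted."""
    total = 0
    for i in range(n):
        for j in range(n):
            # keep only the partial-product bits in the upper n columns
            if i + j >= n and (x >> i) & 1 and (y >> j) & 1:
                total += 1 << (i + j)
    return total >> n


n = 6
exact = exact_fixed_width(63, 63, n)
truncated = direct_truncated(63, 63, n)
# Largest shortfall over all pairs of n-bit operands.
worst = max(
    exact_fixed_width(x, y, n) - direct_truncated(x, y, n)
    for x in range(1 << n)
    for y in range(1 << n)
)
```

For x = y = 63 the exact upper half is 62 while the directly truncated result is 57, so the omitted columns cost several units in the last place.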
34Low-Error Fixed-Width Multipliers
- Fixed-width sign-magnitude multipliers
(Equations and theorem statement: No Transcript)
35X = x5 x4 x3 x2 x1 x0, Y = y5 y4 y3 y2 y1 y0
Sign-magnitude multiplier
36Two's complement multiplier
37Reduced width multiplier
38Low-Error Fixed-Width Multipliers
39Application
(a) original
(b) M1
(c) MF
(d) MR1
(e) MR2
(f) MS
40(a) original
(b) M1
41(c) MF
(d) MR1
42(e) MR2
(f) MS
43Fuzzy Color Corrector
- Fuzzy Color Correction
- in previous literature, the color correction process was modeled as a three-level fuzzy tree inference process
- that algorithm is inefficient, so its hardware implementation is costly and slow
- a new efficient fuzzy tree inference algorithm suitable for the center of gravity defuzzification method is proposed
44- modified fuzzy color correction algorithm
Init:      L = 1
S1:        while (input pattern Xi ≠ NULL)
S1:          Calculate the address of rule memory (ROM)
S2, S3:      s1 = ROM[address]; D = s1
S4:          k = 0; PathL = 0; d = ROM[address]
S5:          while (k < 8 and D > 0)
S6:            Dd PathLk k
S5:          if (1 ≤ k ≤ 7 and D ? d/2) PathLk
S7-S13:      Calculate Xo using Eq. (6.6)
S7:          if (L4) L1
45Fuzzy Color Corrector
- Proposed Sequential Architecture
46Dynamic pipelined Design
47Future Work
System-on-a-Chip (SoC) Platform
NNI: NoC Network Interface (ISO-OSI 7-Layer RM)
48References
- [1] Jer-Min Jou, Shiann-Rong Kuang, Yeu-Horng Shiau, and Ren-Der Chen, "Design of a Dynamic Pipelined Architecture for Fuzzy Color Correction," to be published in IEEE Transactions on VLSI Systems, 2002.
- [2] Jer-Min Jou, Yeu-Horng Shiau, Pei-Yin Chen, and Shiann-Rong Kuang, "A Low Cost Gray Prediction Search Chip for Motion Estimation," Vol. 49, No. 7, pp. 928-938, July 2002.
- [3] Shiann-Rong Kuang, Jer-Min Jou, Ren-Der Chen, and Yeu-Horng Shiau, "Dynamic Pipeline Design of an Adaptive Binary Arithmetic Coder," IEEE Transactions on Circuits and Systems, Part II, Vol. 48, No. 9, pp. 813-825, September 2001.
- [4] Jer-Min Jou, Shiann-Rong Kuang, and Ren-Der Chen, "Design of Low-Error Fixed-Width Multipliers for DSP Applications," IEEE Transactions on Circuits and Systems, Part II, Vol. 46, No. 6, pp. 836-842, June 1999.
49References
- [5] Jer-Min Jou, Shiann-Rong Kuang, and Ren-Der Chen, "A New Efficient Fuzzy Algorithm for Color Correction," IEEE Transactions on Circuits and Systems, Part I, Vol. 46, No. 6, pp. 773-775, June 1999.
- [6] Shiann-Rong Kuang, Jer-Min Jou, and Yuh-Lin Chen, "The Design of an Adaptive On-Line Binary Arithmetic Coding Chip," IEEE Transactions on Circuits and Systems, Part I, Vol. 45, No. 7, pp. 693-706, July 1998.
- [7] Jer-Min Jou and Shiann-Rong Kuang, "Design of a Low-Error Fixed-Width Multiplier for DSP Applications," Electronics Letters, Vol. 33, No. 19, pp. 1597-1598, 1997.
- [8] Jer-Min Jou and Shiann-Rong Kuang, "A Library-Adaptively Integrated High Level Synthesis System," Proceedings of NSC, Part A: Physical Science and Engineering, Vol. 19, No. 3, pp. 220-234, May 1995.