The Design of Application Specific Integrated Circuits with High Level Synthesis Approaches

1
Shiann-Rong Kuang, Assistant Professor
Dept. of Computer Science and Engineering, National Sun Yat-Sen University
2
Outline

Introduction
Novel High Level Synthesis Approaches
  Integrated Data Path Synthesis Approach
  Pipelined Control Path Synthesis Approach
  Dynamic Pipelining Approach
ASIC Design
  Binary Arithmetic Coder
  Low-Error Fixed-Width Multipliers
  Fuzzy Color Corrector
Future Work

3
Introduction

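High level synthesis
behavioral description → register transfer level (RTL) description
data path synthesis and control path synthesis

[Figure: FSM and data-flow graph of an example behavior over inputs a, b, c, d, e, f: t1 = a - b, t3 = e - f, and x = d - t2, where t2 is computed from c and t1, and y is computed from t1 and t3]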
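Scheduling assigns each operation of the behavior to a control step. As a minimal sketch, the as-soon-as-possible (ASAP) schedule of the five operations t1, t2, t3, x, and y can be computed from their operand dependencies alone, assuming every operation takes exactly one control step (an assumption, not stated on the slide). ASAP is shown only because it is the simplest scheduler; it is not the integrated approach of this work, and `asap_schedule` is a hypothetical helper.

```python
def asap_schedule(deps):
    """Earliest control step (1-based) of each operation, assuming unit delay."""
    step = {}

    def visit(op):
        if op not in step:
            # Only predecessors that are themselves operations constrain op;
            # primary inputs a..f are available before step 1.
            preds = [p for p in deps[op] if p in deps]
            step[op] = 1 + max((visit(p) for p in preds), default=0)
        return step[op]

    for op in deps:
        visit(op)
    return step


# Operand dependencies of the introductory example
deps = {
    "t1": {"a", "b"},
    "t2": {"c", "t1"},
    "t3": {"e", "f"},
    "x": {"d", "t2"},
    "y": {"t1", "t3"},
}
print(asap_schedule(deps))  # {'t1': 1, 't2': 2, 't3': 1, 'x': 3, 'y': 2}
```

The chain t1 → t2 → x sets the minimum length of three control steps; t3 and y have slack that a scheduler can use to balance resource usage.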

4
Integrated Data Path Synthesis Approach

Data Path Synthesis
module selection, scheduling, and allocation are highly interdependent
solving them separately → the best designs may not be explored
Proposed Data Path Synthesis Approach
combines module selection, scheduling, and allocation
a general module selection model: module types with different attributes (delay, area, ...)
a mixed-vertex compatibility graph (MCG) model
solves the combined problem globally using partial clique partitioning
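Allocation by clique partitioning groups mutually compatible operations so that each group (clique) shares one hardware unit. The sketch below uses a simple greedy first-fit partitioning on a plain compatibility graph; it is a simplification chosen for illustration and is not the partial clique partitioning on the mixed-vertex graph described above. The input (the three subtractions t1, t3, and x with ASAP steps 1, 1, and 3) and the compatibility rule are hypothetical.

```python
def greedy_clique_partition(nodes, compatible):
    """First-fit partition of `nodes` into cliques of a compatibility graph."""
    cliques = []
    for v in nodes:
        for clique in cliques:
            # v may join a clique only if it is compatible with every member
            if all(compatible(v, u) for u in clique):
                clique.append(v)
                break
        else:
            cliques.append([v])  # no existing clique fits: open a new one
    return cliques


# Hypothetical input: operation -> (operation type, control step)
ops = {"t1": ("sub", 1), "t3": ("sub", 1), "x": ("sub", 3)}


def can_share_unit(a, b):
    # Two operations can share one functional unit if they have the same
    # type and are scheduled in different control steps.
    (type_a, step_a), (type_b, step_b) = ops[a], ops[b]
    return type_a == type_b and step_a != step_b


units = greedy_clique_partition(list(ops), can_share_unit)
print(units)  # [['t1', 'x'], ['t3']]  -> two subtractors
```

Greedy first-fit depends on the visiting order and can miss the minimum partition, which is one reason a global formulation is attractive.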

5
[Figure: example data-flow graph with inputs a, b, c, d, e, f, intermediate values t1, t2, t3, and outputs x and y]
Clock cycle = 100 ns, latency = 5, and performance constraint = 500 ns

                 Circuit 1   Circuit 2
Module cost         340         380
MUX cost            200          80
Wire cost          1200        1100
Register cost       900         900
Total cost         2640        2460
6

Find all feasible assignments

MCG transformations

[Figure: initial MCG, with vertices such as A130, A131, ..., A523, V130, and V20]
7
MCG after iteration 1
MCG after iteration 2
8
Final MCG
MCG after iteration 3
9
Integrated Data Path Synthesis Approach

Experiments and Results

10
Integrated Data Path Synthesis Approach
11
Integrated Data Path Synthesis Approach
12
Pipelined Control Path Synthesis Approach

Main Idea of Pipelining Control Path

13
Pipelined Control Path Synthesis Approach

Proposed Control Path Synthesis Approach
problem: pipelining the control path may violate control dependencies
solution: modify the original BSTG by inserting no-operation (NOOP) states
Theorem
A BSTG satisfies all control dependencies if the distance Dij between the states of each produce-consume state pair <Si, Sj>c satisfies one of the following conditions:
Condition 1: if Sj is not a branch state, then Dij ≥ k.
Condition 2: if Sj is a branch state, then Dij ≥ 2k - 1.
Nij: the minimal number of NOOPs that must be inserted between <Si, Sj>c
Nij = 2k - Dij - 1, if Sj is a branch state
Nij = k - Dij, otherwise
the total number of NOOPs is minimized with an ILP formulation
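The NOOP count for a single produce-consume pair follows directly from the two conditions. A minimal sketch, assuming the count is clamped at zero when the distance already satisfies the theorem (the slide gives only the positive case); `min_noops` and the example pairs are hypothetical. It evaluates each pair in isolation: one inserted NOOP can lengthen several overlapping pairs at once, which is why the global minimum needs the ILP formulation, not reproduced here.

```python
def min_noops(d_ij: int, k: int, sj_is_branch: bool) -> int:
    """Minimal NOOP states to insert between a produce-consume pair <Si, Sj>c.

    Nij = 2k - Dij - 1 if Sj is a branch state, else Nij = k - Dij,
    clamped at 0 when Dij already satisfies the corresponding condition.
    """
    required = 2 * k - 1 if sj_is_branch else k  # Condition 2 / Condition 1
    return max(0, required - d_ij)


# Hypothetical pairs (Dij, Sj is a branch state) for k = 3
pairs = [(1, False), (1, True), (4, True), (6, True)]
print([min_noops(d, 3, b) for d, b in pairs])  # [2, 4, 1, 0]
```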

14
SCDFG
15
16
17
Dynamic Pipelining Approach

Pipelining
In most existing pipelining techniques, the latency is fixed or limited to a few fixed values
In some ASIC loops, variable loop execution lengths and time-dependent data dependencies between different iterations make pipelining inefficient or impossible
Dynamic pipelining
A new loop scheduling approach that pipelines a loop with variable latencies
The controller consists of two interacting finite state machines

while (c1) { ... while (c2) { ... } ... }
18
Dynamic Pipelining Approach
19
An Example of Dynamic Pipelining
j = 1;
while (N > j) {        /* N is the number of data items to be sorted */
    i = j - 1;
    temp = a[j];
    while (i >= 0 && temp < a[i]) {
        a[i+1] = a[i];
        i = i - 1;
    }
    a[i+1] = temp;
    j++;
}
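The loop nest above can be mirrored in Python to show why its latency varies: the number of inner-loop passes in each outer iteration depends on the data. This is a behavioral sketch; the function name and the trip-count bookkeeping are illustrative additions, not part of the original example.

```python
def insertion_sort_with_trips(values):
    """Insertion sort following the slide's loop nest.

    Returns the sorted list and the inner-loop trip count of each outer iteration.
    """
    a = list(values)
    trips = []
    j = 1
    while len(a) > j:                    # outer loop: while (N > j)
        i = j - 1
        temp = a[j]
        t = 0
        while i >= 0 and temp < a[i]:    # inner loop; bound checked first
            a[i + 1] = a[i]
            i -= 1
            t += 1
        a[i + 1] = temp
        trips.append(t)
        j += 1
    return a, trips


print(insertion_sort_with_trips([5, 2, 4, 1, 3]))  # ([1, 2, 3, 4, 5], [1, 1, 3, 2])
```

Because the trip counts (here 1, 1, 3, 2) differ from one outer iteration to the next, a schedule with one fixed latency must budget for the worst case on every iteration.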
20
S1:          j = 1;                                     o1
S2: O_loop:  if (N <= j) goto End_O;                    o2
             i = j - 1;                                 o3
             r_add = j;                                 o4
S3:          j++;                                       o5
             temp = a[r_add];                           o6
S4: I_loop:  r_add = i;                                 o7
S5:          data = a[r_add];                           o8
S6:          w_add = i + 1;                             o9
             if (!(temp < data && i >= 0)) goto End_I;  o10
S7:          a[w_add] = data;                           o11
             i = i - 1; goto I_loop;                    o12
S8: End_I:   a[w_add] = temp; goto O_loop;              o13
End_O:
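The state-annotated listing can be stepped through as an explicit finite state machine, one state per pass of the dispatch loop, to watch the number of states per outer iteration change with the data. This is a behavioral sketch under two assumptions: each state counts as one step, and the read of a[r_add] when r_add = -1 (left to the hardware in the listing) is replaced by a guard. State and variable names follow the listing; `run_bstg` is hypothetical.

```python
def run_bstg(values):
    """Step through states S1-S8 of the listing; return (sorted list, state trace)."""
    a = list(values)
    n = len(a)
    trace = []
    state = "S1"
    while state != "End_O":
        trace.append(state)
        if state == "S1":
            j = 1                                   # o1
            state = "S2"
        elif state == "S2":                         # O_loop
            if n <= j:                              # o2
                state = "End_O"
            else:
                i, r_add = j - 1, j                 # o3, o4
                state = "S3"
        elif state == "S3":
            j += 1                                  # o5
            temp = a[r_add]                         # o6
            state = "S4"
        elif state == "S4":                         # I_loop
            r_add = i                               # o7
            state = "S5"
        elif state == "S5":
            # o8; guard for r_add == -1 is an assumption of this sketch
            data = a[r_add] if r_add >= 0 else None
            state = "S6"
        elif state == "S6":
            w_add = i + 1                           # o9
            # o10: i >= 0 is tested first, so data is never None when compared
            state = "S7" if (i >= 0 and temp < data) else "S8"
        elif state == "S7":
            a[w_add] = data                         # o11
            i -= 1                                  # o12: goto I_loop
            state = "S4"
        elif state == "S8":                         # End_I
            a[w_add] = temp                         # o13: goto O_loop
            state = "S2"
    return a, trace


result, trace = run_bstg([3, 1, 2])
print(result, len(trace))  # [1, 2, 3] 22
```

Each pass through the inner loop costs the four states S4 to S7, so the distance between successive visits to S2 grows with the number of shifts; this is the variable latency the dynamic pipelining approach targets.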
21

BSTG Partitioning
[Figure: BSTGo]

Inner Loop Pipelining
[Figure: original PBSTGi and new PBSTGi]
22

Outer Loop Pipelining
[Figure: new BSTGo (labeled L3); the loop body is unwound four times]
23
[Figure: final PBSTGo and final PBSTGi]
24

Datapath Allocation

Controller Architecture

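[Figure: controller architecture. An inner controller and an outer controller, each consisting of state registers and combinational logic (Eq. (3.4) and Eq. (3.3), respectively), exchange the start and done signals; other labeled signals are ci, co, and run (Eq. (3.5)); a Mux, the control signals to the datapath, and the inputs from the datapath are also shown]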
25
An execution example
[Figure: execution trace listing, for each iteration i, the inner and outer controller pipeline stages (PS1, PS2, PS3, Nop), the states state(i) and state(o), and the done, start, and run( ) signals; the iterations shown have latencies of 7, 3, 5, and 3]
26
Experimental Results

Comparison results for the insertion sorter
Other examples

27
Binary Arithmetic Coder

Adaptive Binary Arithmetic Coder
the Q-coder compresses mainly bilevel image data
goal: a compression chip universal enough to quickly compress any type of data while still achieving a good compression ratio
proposed modified hardware algorithm
a new probability estimation modeler using a table-look-up approach
a technique that solves the carry-over and source termination problems
a fixed-width parallel multiplier
VLSI chip

28
Encoding Algorithm
Encoding()
{
    C = 0x00; A = 0xff; R = 0x0000;
    S = 0000000000;
    for (each input binary symbol) {
        phase 1: Generate P('0'|S) by Eq. (4.5)
        phase 2: AP = A × P('0'|S)
                 if (input symbol == '0') A = AP;
                 else { A = A - AP; C = C + AP; if (carry occurs) R++; }
                 Update the adaptive modeler by Eq. (4.6)
                 Shift the input symbol into S
        phase 3: while (MSB of A == 0) normalization_of_encoding();
    }
    Encode LPS and then output 17 consecutive '1's
}
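The core of phase 2 is the interval split: a '0' keeps the lower part of width AP = A × P('0'), and a '1' moves the base C up by AP and keeps the remaining A - AP. A minimal sketch of just that step, assuming exact rational arithmetic instead of the 8-bit A register, so no normalization, carry handling, or termination is needed. The add-one (Laplace) estimator `p_zero` is a hypothetical stand-in for Eq. (4.5) and Eq. (4.6), which the slide does not spell out; the decoder replays the same splits to check the round trip.

```python
from fractions import Fraction


def p_zero(n0, n1):
    # Hypothetical adaptive model standing in for Eq. (4.5)/(4.6):
    # add-one estimate of P('0') from the symbols seen so far.
    return Fraction(n0 + 1, n0 + n1 + 2)


def encode(bits):
    """Return (C, A): the final code interval [C, C + A)."""
    C, A = Fraction(0), Fraction(1)
    n0 = n1 = 0
    for b in bits:
        AP = A * p_zero(n0, n1)  # width of the '0' subinterval
        if b == 0:
            A = AP               # '0': keep the lower subinterval
            n0 += 1
        else:
            C += AP              # '1': move to the upper subinterval
            A -= AP
            n1 += 1
    return C, A


def decode(C, length):
    """Recover `length` bits from code value C by replaying the interval splits."""
    low, A = Fraction(0), Fraction(1)
    n0 = n1 = 0
    bits = []
    for _ in range(length):
        AP = A * p_zero(n0, n1)
        if C < low + AP:         # C falls in the '0' subinterval
            bits.append(0)
            A = AP
            n0 += 1
        else:                    # C falls in the '1' subinterval
            bits.append(1)
            low += AP
            A -= AP
            n1 += 1
    return bits


bits = [0, 0, 1, 0, 0, 0, 1, 0]
C, A = encode(bits)
print(decode(C, len(bits)) == bits)  # True
```

The final width A shrinks with every symbol, and about -log2(A) bits identify the interval; a hardware coder keeps A in a short register and renormalizes instead (phase 3), which is where carry-over and termination handling become necessary.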
29
System Architecture
30
31
32
Dynamic Pipelining Design
33
Low-Error Fixed-Width Multipliers

Fixed-Width Multiplier
the multiplication operations used in many ASICs have a special fixed-width property
directly omitting about half of the adder cells of a conventional parallel multiplier → a significant error is introduced in the product
Low-Error Fixed-Width Multiplier
low-error fixed-width sign-magnitude multipliers
low-error fixed-width two's complement multipliers
reduced-width multipliers (n < m < 2n)
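To see why directly omitting the lower half of the partial-product array hurts accuracy, the sketch below enumerates all pairs of 6-bit unsigned operands (matching the 6-bit X and Y of the example slides) and measures the average error of direct truncation, and of the simplest fix, adding a constant correction of one output LSB. The constant-correction baseline is a well-known technique chosen for illustration; it is not the low-error compensation derived in this work, and signed operands are not handled. The function names are hypothetical.

```python
N = 6  # operand width, as in X = x5 ... x0 and Y = y5 ... y0


def fixed_width_product(x, y, n=N, correction=0):
    """Upper n bits of x*y built only from partial products in columns >= n.

    Dropping columns 0..n-1 is what removes about half of the adder cells.
    `correction` is a constant, in output LSBs, added to offset the dropped part.
    """
    kept = 0
    for i in range(n):
        for j in range(n):
            if i + j >= n:
                kept += (((x >> i) & 1) & ((y >> j) & 1)) << (i + j)
    return (kept >> n) + correction


def mean_error(correction, n=N):
    """Average of (approximate - exact) product over all n-bit unsigned operands."""
    total = 0
    for x in range(1 << n):
        for y in range(1 << n):
            total += (fixed_width_product(x, y, n, correction) << n) - x * y
    return total / (1 << (2 * n))


print(mean_error(0))  # -80.25: direct truncation always underestimates
print(mean_error(1))  # -16.25: a +1 LSB constant removes most of the bias
```

In units of the output LSB (64), direct truncation is off by about 1.25 LSB on average; a fixed constant cannot track the data-dependent carries from the dropped columns, which motivates error-compensation schemes such as the low-error multipliers above.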

34
Low-Error Fixed-Width Multipliers

Fixed-width sign-magnitude multipliers

?
where
Theorem Given a ?, we have that
and
35
X = x5 x4 x3 x2 x1 x0, Y = y5 y4 y3 y2 y1 y0
Sign-magnitude multiplier
36
Twos complement multiplier
37
Reduced width multiplier
38
Low-Error Fixed-Width Multipliers

Error comparison

39
Application
(a) original, (b) M1, (c) MF, (d) MR1, (e) MR2, (f) MS
40
(b) M1
(a) original
41
(c) MF
(d) MR1
42
(e) MR2
(f) MS
43
Fuzzy Color Corrector

Fuzzy Color Correction
in previous literature, the color correction process was modeled as a three-level fuzzy tree inference process
the algorithm used there is inefficient, so its hardware implementation is costly and slow
a new, efficient fuzzy tree inference algorithm suitable for the center of gravity defuzzification method is proposed
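Center of gravity (COG) defuzzification turns the inferred fuzzy output into a crisp value: the membership-weighted mean of the output universe. A minimal discrete sketch follows; the sample points and membership grades are hypothetical, and the fuzzy tree inference that would produce the grades is not modeled.

```python
def center_of_gravity(xs, mu):
    """Discrete COG defuzzification: sum(mu_i * x_i) / sum(mu_i)."""
    den = sum(mu)
    if den == 0:
        raise ValueError("all membership grades are zero")
    return sum(m * x for x, m in zip(xs, mu)) / den


# Hypothetical output universe (e.g., an 8-bit color channel) and grades
xs = [0, 64, 128, 192, 255]
mu = [0.0, 0.5, 1.0, 0.5, 0.0]
print(center_of_gravity(xs, mu))  # 128.0
```

The division by the sum of the grades is the costly step in hardware, which is part of why the choice of inference algorithm matters for a COG-based corrector.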

modified fuzzy color correction algorithm

Init L1 S1 while (input pattern Xi ?
NULL) S1 Calculate the address of rule
memory (ROM) S2, S3 s1ROMaddress
Ds1 S4 k0 PathL0
dROMaddress S5 while (klt8 Dgt0)
S6 Dd PathLk k S5 if
(1? k ? 7 D ? d/2) PathLk
S7S13 Calculate Xo using Eq.
(6.6) S7 if (L4) L1
45
2.5. Fuzzy Color Corrector

Proposed Sequential Architecture

46
Dynamic pipelined Design
47
Future Work
System-on-a-Chip (SoC) Platform
NNI NoC Network Interface (ISO-OSI 7-Layer RM)
48
References

[1] Jer-Min Jou, Shiann-Rong Kuang, Yeu-Horng Shiau, and Ren-Der Chen, "Design of a Dynamic Pipelined Architecture for Fuzzy Color Correction," to be published in IEEE Transactions on VLSI Systems, 2002.
[2] Jer-Min Jou, Yeu-Horng Shiau, Pei-Yin Chen, and Shiann-Rong Kuang, "A Low Cost Gray Prediction Search Chip for Motion Estimation," Vol. 49, No. 7, pp. 928-938, July 2002.
[3] Shiann-Rong Kuang, Jer-Min Jou, Ren-Der Chen, and Yeu-Horng Shiau, "Dynamic Pipeline Design of an Adaptive Binary Arithmetic Coder," IEEE Transactions on Circuits and Systems Part II, Vol. 48, No. 9, pp. 813-825, September 2001.
[4] Jer-Min Jou, Shiann-Rong Kuang, and Ren-Der Chen, "Design of Low-Error Fixed-Width Multipliers for DSP Applications," IEEE Transactions on Circuits and Systems Part II, Vol. 46, No. 6, pp. 836-842, June 1999.

49
References

[5] Jer-Min Jou, Shiann-Rong Kuang, and Ren-Der Chen, "A New Efficient Fuzzy Algorithm for Color Correction," IEEE Transactions on Circuits and Systems Part I, Vol. 46, No. 6, pp. 773-775, June 1999.
[6] Shiann-Rong Kuang, Jer-Min Jou, and Yuh-Lin Chen, "The Design of an Adaptive On-Line Binary Arithmetic Coding Chip," IEEE Transactions on Circuits and Systems Part I, Vol. 45, No. 7, pp. 693-706, July 1998.
[7] Jer-Min Jou and Shiann-Rong Kuang, "Design of a Low-Error Fixed-Width Multiplier for DSP Applications," Electronics Letters, Vol. 33, No. 19, pp. 1597-1598, 1997.
[8] Jer-Min Jou and Shiann-Rong Kuang, "A Library-Adaptively Integrated High Level Synthesis System," Proceedings of NSC, Part A: Physical Science and Engineering, Vol. 19, No. 3, pp. 220-234, May 1995.