The Design of Application Specific Integrated Circuits with High Level Synthesis Approaches - PowerPoint PPT Presentation

1
The Design of Application Specific Integrated
Circuits with High Level Synthesis Approaches
Shiann-Rong Kuang, Assistant Professor
Dept. of Computer Science and Engineering
National Sun Yat-Sen University
2
Outline
  • Introduction
  • Novel High Level Synthesis Approaches
  • Integrated Data Path Synthesis Approach
  • Pipelined Control Path Synthesis Approach
  • Dynamic Pipelining Approach
  • ASICs design
  • Binary Arithmetic Coder
  • Low-Error Fixed-Width Multipliers
  • Fuzzy Color Corrector
  • Future Work

3
Introduction
  • High level synthesis
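  • Behavioral description → register transfer level
    description
  • Data path synthesis and control path synthesis

[Figure: a behavioral description computing t1, t2, t3, x, and y from inputs a to f (for example t1 = a - b, t3 = e - f, x = d - t2), synthesized into a data path and an FSM controller]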

4
Integrated Data Path Synthesis Approach
  • Data Path Synthesis
  • module selection, scheduling, and allocation are
    highly interdependent
  • solving them separately → the best designs may not
    be explored
  • Proposed Data Path Synthesis Approach
  • combines module selection, scheduling, and
    allocation
  • general module selection model
  • module types with different attributes (delay,
    area, ...)
  • a mixed-vertex compatibility graph (MCG) model
  • solved globally using partial clique
    partitioning
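
The idea of sharing hardware by partitioning a compatibility graph into cliques can be sketched as follows. This is a plain greedy clique partitioning, not the partial clique partitioning on a mixed-vertex compatibility graph proposed on this slide. The operation names, control steps, and the compatibility rule are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def greedy_clique_partition(vertices, compatible):
    """Greedily group vertices into cliques of a compatibility graph.

    vertices:   list of hashable operation names.
    compatible: set of frozenset({u, v}) pairs that may share one module.
    Returns a list of cliques; each clique stands for one shared module.
    The result is a valid partition but not necessarily a minimum one.
    """
    cliques = []
    for v in vertices:
        for clique in cliques:
            # v may join a clique only if it is compatible with every member.
            if all(frozenset((v, u)) in compatible for u in clique):
                clique.append(v)
                break
        else:
            # No existing clique accepts v, so it starts a new module.
            cliques.append([v])
    return cliques


# Hypothetical example: two operations are compatible (can share a module)
# when they are scheduled in different control steps.
ops = ["sub1", "sub2", "sub3", "sub4"]
step = {"sub1": 1, "sub2": 1, "sub3": 2, "sub4": 3}
compatible = {
    frozenset((u, v)) for u, v in combinations(ops, 2) if step[u] != step[v]
}

modules = greedy_clique_partition(ops, compatible)
print(modules)
```

Here the four operations fit on two modules, because only sub1 and sub2 collide in the same control step. A greedy pass like this fixes the schedule first, which is exactly the separation the integrated approach tries to avoid.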

5
[Figure: example graph over inputs a to f, temporaries t1 to t3, and outputs x and y]
Clock cycle = 100 ns, latency = 5, and performance
constraint = 500 ns

Circuit            1      2
Module cost      340    380
MUX cost         200     80
Wire cost       1200   1100
Register cost    900    900
Total cost      2640   2460
6
  • Find all feasible Assignments
  • MCG transformations

[Figure: initial MCG, with vertices labeled A130 through A523]
7
[Figures: MCG after iteration 1; MCG after iteration 2]
8
[Figures: MCG after iteration 3; final MCG]
9
Integrated Data Path Synthesis Approach
  • Experiments and Results

12
Pipelined Control Path Synthesis Approach
  • Main Idea of Pipelining Control Path

13
Pipelined Control Path Synthesis Approach
  • Proposed Control Path Synthesis Approach
  • Problem: pipelining the control path may violate
    control dependencies
  • Solution: modify the original BSTG by inserting
    no-operation (NOOP) states
  • Theorem
  • A BSTG satisfies all control dependencies if
    the distance $D_{ij}$ between the states of each
    produce-consume state pair $\langle S_i, S_j \rangle_c$
    satisfies one of the following conditions
  • Condition 1: if $S_j$ is not a branch state, then
    $D_{ij} \ge k$.
  • Condition 2: if $S_j$ is a branch state, then
    $D_{ij} \ge 2k - 1$.
  • $N_{ij}$: the minimal number of NOOPs that must be
    inserted between $\langle S_i, S_j \rangle_c$
  • $N_{ij} = 2k - D_{ij} - 1$, if $S_j$ is a branch state
  • $N_{ij} = k - D_{ij}$, otherwise.
  • Minimize the number of NOOPs using an ILP formulation
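
The per-pair NOOP count from the theorem can be written as a small function. This is a sketch under two assumptions that are not stated on the slide: k is passed in as a plain integer parameter, and the count is clamped at zero for pairs whose distance already satisfies the condition. The function name and arguments are hypothetical.

```python
def noops_needed(distance, k, consumer_is_branch):
    """Minimal number of NOOP states for one produce-consume state pair.

    distance:           current distance D_ij between states S_i and S_j.
    k:                  the parameter k from the theorem.
    consumer_is_branch: True if the consuming state S_j is a branch state.
    """
    # Condition 2 (branch state) requires D_ij >= 2k - 1,
    # Condition 1 (otherwise) requires D_ij >= k.
    required = 2 * k - 1 if consumer_is_branch else k
    # Assumption: a pair that already meets its condition needs no NOOPs.
    return max(0, required - distance)


print(noops_needed(distance=1, k=3, consumer_is_branch=False))
print(noops_needed(distance=1, k=3, consumer_is_branch=True))
```

The sketch evaluates each pair in isolation. The ILP-based minimization over the whole BSTG is not shown.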

14
SCDFG
17
Dynamic Pipelining Approach
  • Pipelining
  • In most existing pipelining techniques, the
    latency is fixed or takes one of a few fixed values
  • In some ASIC loops, variable loop execution
    lengths and time-relative data dependencies
    between iterations make pipelining inefficient
    or impossible
  • Dynamic pipelining
  • A new loop scheduling approach that pipelines
    the loop using variable latencies
  • The controller consists of two interacting
    finite state machines

[Figure: nested loops of the form while (c1) { ... while (c2) { ... } ... }]
19
An Example of Dynamic Pipelining
j = 1;
while (N > j) {    /* N is the number of data items to be sorted */
    i = j - 1;
    temp = a[j];
    while (temp < a[i] && i >= 0) {
        a[i+1] = a[i];
        i = i - 1;
    }
    a[i+1] = temp;
    j++;
}
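
To see why this loop nest resists fixed-latency pipelining, the sketch below runs the same insertion sort and records how many times the inner loop executes for each outer iteration. It is an illustration only: the function name and the input data are hypothetical, and the bounds check is placed first in the inner condition so that Python never reads a negative index.

```python
def insertion_sort_with_trip_counts(data):
    """Sort a copy of data and record the inner-loop trip count per outer iteration."""
    a = list(data)
    trips = []
    j = 1
    while len(a) > j:
        i = j - 1
        temp = a[j]
        count = 0
        # Inner loop: shift larger elements one position to the right.
        while i >= 0 and temp < a[i]:
            a[i + 1] = a[i]
            i -= 1
            count += 1
        a[i + 1] = temp
        trips.append(count)
        j += 1
    return a, trips


sorted_data, trips = insertion_sort_with_trip_counts([4, 2, 5, 1, 3])
print(sorted_data, trips)
```

The inner loop length depends on the data, so each outer iteration takes a different number of cycles. That is the situation in which a single fixed latency does not fit.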
20
S1:           j = 1;                                     o1
S2:  O_loop:  if (N <= j) goto End_O;                    o2
              i = j - 1;                                 o3
              r_add = j;                                 o4
S3:           j++;                                       o5
              temp = a[r_add];                           o6
S4:  I_loop:  r_add = i;                                 o7
S5:           data = a[r_add];                           o8
S6:           w_add = i + 1;                             o9
              if (!(temp < data && i >= 0)) goto End_I;  o10
S7:           a[w_add] = data;                           o11
              i = i - 1; goto I_loop;                    o12
S8:  End_I:   a[w_add] = temp; goto O_loop;              o13
     End_O:
21
  • BSTG Partitioning

[Figure: BSTGo]
  • Inner Loop Pipelining

[Figure: original PBSTGi and new PBSTGi]
22
  • Outer Loop Pipelining

[Figure: new BSTGo (label L3), obtained by unwinding the loop body four times]
23
[Figure: final PBSTGo and final PBSTGi]
24
  • Datapath Allocation
  • Controller Architecture

[Figure: controller architecture with an inner controller and an outer controller, each consisting of state registers and combinational logic. Labels: start, done, run, ci, co, Mux, control signals to and from the datapath, Eqs. (3.3) to (3.5)]
25
An execution example
[Figure: cycle-by-cycle execution trace of the two controllers. Rows: iteration, inner and outer pipeline stages (PS1 to PS3, with Nop cycles), state(i), state(o), and the done, start, and run signals. The annotated latencies are 7, 3, 5, and 3]
26
Experimental Results
  • Comparison of results for the insertion sorter
  • Other examples

27
Binary Arithmetic Coder
  • Adaptive Binary Arithmetic Coder
  • The Q-coder mainly compresses bilevel image data
  • Goal: a compression chip universal enough to
    quickly compress any type of data while still
    achieving a good compression ratio
  • Proposed modified, hardware-oriented algorithm
  • a new probability estimation modeler using a
    table-look-up approach
  • a technique that solves the carry-over and
    source-termination problems
  • fixed-width parallel multiplier
  • VLSI chip

28
Encoding Algorithm
Encoding() {
    C = 0x00; A = 0xff; R = 0x0000;
    S = 0000000000;
    for (each input binary symbol) {
        phase1: Generate P('0'|S) by Eq. (4.5);
        phase2: AP = A × P('0'|S);
                if (input symbol == '0') A = AP;
                else { A = A - AP; C = C + AP;
                       if (carry occurs) R++; }
                Update the adaptive modeler by Eq. (4.6);
                Shift the input symbol into S;
        phase3: while (MSB of A == 0)
                    normalization_of_encoding();
    }
    Encode LPS and then output 17 consecutive '1's;
}
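
The interval update in phase 2 can be illustrated with a minimal sketch. It is not the chip's algorithm: it uses exact rational arithmetic instead of fixed-width registers, a fixed probability for symbol 0 instead of the adaptive modeler of Eqs. (4.5) and (4.6), and it omits carry handling, normalization, and termination. The function names and the sample bit string are hypothetical.

```python
from fractions import Fraction


def encode(bits, p0):
    """Return (low, width) of the final interval for a bit sequence.

    low plays the role of the code register C, width the role of the
    interval register A, and p0 is a fixed probability of symbol 0.
    """
    low, width = Fraction(0), Fraction(1)
    for b in bits:
        split = width * p0         # AP = A x P('0')
        if b == 0:
            width = split          # A = AP
        else:
            width = width - split  # A = A - AP
            low = low + split      # C = C + AP
    return low, width


def decode(value, count, p0):
    """Recover count bits from any value inside the final interval."""
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(count):
        split = width * p0
        if value < low + split:    # value lies in the lower subinterval
            out.append(0)
            width = split
        else:                      # value lies in the upper subinterval
            out.append(1)
            width = width - split
            low = low + split
    return out


bits = [0, 1, 0, 0, 1, 0, 0, 0]
p0 = Fraction(3, 4)
low, width = encode(bits, p0)
decoded = decode(low + width / 2, len(bits), p0)
print(low, width, decoded)
```

Each symbol narrows the interval by its probability, so likely symbols cost little. The multiplication A × P is the step for which the hardware version uses a fixed-width parallel multiplier.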
29
System Architecture
32
Dynamic Pipelining Design
33
Low-Error Fixed-Width Multipliers
  • Fixed-Width Multiplier
  • multiplication operations used in many ASICs have
    the special fixed-width property
  • directly omitting about half the adder cells of the
    conventional parallel multiplier
  • → introduces a significant error in
    the product
  • Low-Error Fixed-Width Multiplier
  • low-error fixed-width sign-magnitude multipliers
  • low-error fixed-width two's complement
    multipliers
  • reduced-width multiplier ($n < m < 2n$)
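
The error caused by direct truncation can be measured with a short experiment. The sketch below assumes unsigned n-bit operands and models truncation as dropping every partial-product bit whose weight is below 2^n, with no compensation. It does not implement the low-error designs listed on this slide, and the function names are hypothetical.

```python
def exact_upper_half(x, y, n):
    """Upper n bits of the full 2n-bit product of two n-bit operands."""
    return (x * y) >> n


def truncated_product(x, y, n):
    """Sum only the partial-product bits x_i * y_j with i + j >= n."""
    total = 0
    for i in range(n):
        for j in range(n):
            if i + j >= n and (x >> i) & 1 and (y >> j) & 1:
                total += 1 << (i + j - n)
    return total


n = 6
errors = [
    exact_upper_half(x, y, n) - truncated_product(x, y, n)
    for x in range(1 << n)
    for y in range(1 << n)
]
max_error = max(errors)
print(max_error)
```

For n = 6 the largest error is 5 units in the last kept bit, reached when both operands are all ones. In this model the worst case is n - 1, so the error grows with the operand width.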

34
Low-Error Fixed-Width Multipliers
  • Fixed-width sign-magnitude multipliers


[Equations and theorem statement: not transcribed]
35
X = x5 x4 x3 x2 x1 x0, Y = y5 y4 y3 y2 y1 y0
[Figure: sign-magnitude multiplier]
36
[Figure: two's complement multiplier]
37
[Figure: reduced-width multiplier]
38
Low-Error Fixed-Width Multipliers
  • Error comparison

39
Application
[Figure panels: (a) original, (b) M1, (c) MF, (d) MR1, (e) MR2, (f) MS]
43
Fuzzy Color Corrector
  • Fuzzy Color Correction
  • in previous literature, the color correction
    process was modeled as a three-level fuzzy tree
    inference process
  • that algorithm is inefficient, so its
    hardware implementation is costly and slow
  • a new, efficient fuzzy tree inference algorithm
    suitable for the center of gravity
    defuzzification method is proposed
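
For readers unfamiliar with center of gravity defuzzification, here is a minimal sketch of its common discrete form: the crisp output is the membership-weighted average of the candidate output values. It is a generic textbook version, not the fuzzy tree inference algorithm of this talk, and the function name and sample values are hypothetical.

```python
def center_of_gravity(points):
    """Discrete center-of-gravity defuzzification.

    points: iterable of (membership, value) pairs, where membership is the
            degree in [0, 1] to which the output value is supported.
    """
    points = list(points)
    numerator = sum(m * v for m, v in points)
    denominator = sum(m for m, _ in points)
    if denominator == 0:
        raise ValueError("all membership degrees are zero")
    return numerator / denominator


# Hypothetical rule outputs: value 10 supported to degree 0.25,
# value 20 supported to degree 0.75.
crisp = center_of_gravity([(0.25, 10.0), (0.75, 20.0)])
print(crisp)
```

The division at the end is the costly part in hardware, which is one reason an inference algorithm tailored to this defuzzifier can pay off.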

44
  • modified fuzzy color correction algorithm

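Init:        L = 1
S1:          while (input pattern Xi ≠ NULL)
S1:              Calculate the address of rule memory (ROM)
S2, S3:          s1 = ROM[address]; D = s1
S4:              k = 0; Path[L] = 0; d = ROM[address]
S5:              while (k < 8 && D > 0)
S6:                  D [op] d; Path[L] = k; k++
S5:              if (1 ≤ k ≤ 7 && D [op] d/2) Path[L] = k
S7 to S13:       Calculate Xo using Eq. (6.6)
S7:              if (L == 4) L = 1
([op] marks an operator that is missing from the transcript.)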
45
Fuzzy Color Corrector
  • Proposed Sequential Architecture

46
Dynamic Pipelined Design
47
Future Work
System-on-a-Chip (SoC) Platform
NNI: NoC Network Interface (ISO-OSI 7-layer reference model)