Title: Data-Flow Analysis in the Memory Management of Real-Time Multimedia Processing Systems
1. Data-Flow Analysis in the Memory Management of
Real-Time Multimedia Processing Systems
- Florin Balasa
- University of Illinois at Chicago
2. Introduction
Real-time multimedia processing systems
(video and image processing, real-time 3D
rendering, audio and speech coding, medical
imaging, etc.)
- A large part of power dissipation is due to
data transfer and data storage
Fetching operands from an off-chip memory for an
addition consumes 33 times more power than
the computation itself [Catthoor 98]
- Area cost is often largely dominated by memories
3. Introduction
In the early years of high-level synthesis,
memory management tasks were tackled at the scalar level
Algebraic techniques, similar to those used
in modern compilers, make it possible to handle memory
management at the non-scalar level
Requirement: addressing the entire class of
affine specifications
- multidimensional signals with (complex) affine
indexes
- loop nests whose bounds are affine functions of the
iterators
- conditions: relational and/or logical
operators of affine functions
4. Outline
- Memory size computation using data
dependence analysis
- Hierarchical memory allocation
based on data reuse analysis
- Data-flow driven data partitioning
for on/off-chip memories
- Conclusions
5. Computation of array reference size
    for (i = 0; i <= 511; i++)
      for (j = 0; j <= 511; j++) {
        ... A[2i+3j+1][5i+j+2][4i+6j+3] ...
        for (k = 0; k <= 511; k++)
          ... B[i+k][j+k] ...
      }
How many memory locations are necessary to
store the array references A[2i+3j+1][5i+j+2][4i+6j+3]
and B[i+k][j+k]?
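6. Computation of array reference size
    for (i = 0; i <= 511; i++)
      for (j = 0; j <= 511; j++)
        for (k = 0; k <= 511; k++)
          ... B[i+k][j+k] ...
Number of iterator triplets (i, j, k), that is, $512^3$?
No! Different triplets can address the same element:
(i, j, k) = (0, 1, 1) and (i, j, k) = (1, 2, 0)
both access B[1][2]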
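7. Computation of array reference size
    for (i = 0; i <= 511; i++)
      for (j = 0; j <= 511; j++)
        for (k = 0; k <= 511; k++)
          ... B[i+k][j+k] ...
Number of index values (i+k, j+k), that is, $1023^2$?
(since $0 \le i+k,\ j+k \le 1022$)
No! Not every index pair in that range is reached:
no triplet (i, j, k) accesses B[0][512]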
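Both naive counts can be checked by brute force on a scaled-down version of the loop nest. The sketch below is only an illustration: the helper name `count_b_references` and the reduced loop bound are assumptions introduced here, not part of the slides.

```python
def count_b_references(bound):
    """Compare three counts for B[i+k][j+k] with 0 <= i, j, k <= bound.

    Returns (iterator triplets, index pairs in the bounding box,
    distinct array elements actually accessed).
    """
    values = range(bound + 1)
    triplets = len(values) ** 3
    bounding_box = (2 * bound + 1) ** 2
    # Brute force: collect every index pair the loop nest touches.
    accessed = {(i + k, j + k) for i in values for j in values for k in values}
    return triplets, bounding_box, len(accessed)


if __name__ == "__main__":
    # Scaled down from 511 to 3 so that the enumeration stays tiny.
    print(count_b_references(3))  # (64, 49, 37)
```

With the bound reduced to 3, the true size (37) is smaller than both the number of triplets (64) and the size of the index bounding box (49), which is the point the two slides make for the bound 511.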
8. Computation of array reference size
    for (i = 0; i <= 511; i++)
      for (j = 0; j <= 511; j++)
        ... A[2i+3j+1][5i+j+2][4i+6j+3] ...
x = 2i+3j+1, y = 5i+j+2, z = 4i+6j+3
[Figure: the iterator space (i, j) mapped to the index space of A[x][y][z]]
9. Computation of array reference size
A[2i+3j+1][5i+j+2][4i+6j+3]
[Figure: iterator space (i, j) and index space of A[x][y][z]]
10. Computation of array reference size
Remark
The iterator space may have holes too
    for (i = 4; i <= 8; i++)
      for (j = i-2; j <= i+2; j += 2)
        ... C[i][j] ...
After normalization:
    for (i = 4; i <= 8; i++)
      for (j = 0; j <= 2; j++)
        ... C[i][i+2j-2] ...
[Figure: iterator space (i, j) before and after normalization]
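The effect of normalization can be checked by listing the elements of C that each loop nest touches. This is a sketch under an assumption: the loops are read as `j` running from `i-2` to `i+2` with stride 2 before normalization, and from 0 to 2 with stride 1 afterwards, with the index rewritten as `i + 2j - 2`. The function names are hypothetical.

```python
def original_accesses():
    """Elements C[i][j] touched by the loop with stride 2 on j."""
    return [(i, j) for i in range(4, 9) for j in range(i - 2, i + 3, 2)]


def normalized_accesses():
    """Same elements after rewriting j to run from 0 to 2 with stride 1."""
    return [(i, i + 2 * j - 2) for i in range(4, 9) for j in range(0, 3)]


if __name__ == "__main__":
    # The stride moves from the iterator space into the index expression.
    print(original_accesses() == normalized_accesses())
```

After normalization the iterator space is a dense rectangle, and the holes reappear only in the index space through the coefficient 2 of `j`.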
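11. Computation of array reference size
    for (i = 0; i <= 511; i++)
      for (j = 0; j <= 511; j++)
        ... A[2i+3j+1][5i+j+2][4i+6j+3] ...
\[ \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 5 & 1 \\ 4 & 6 \end{bmatrix} \begin{bmatrix} i \\ j \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \qquad 0 \le i, j \le 511 \]
Iterator space --(affine mapping)--> Index space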
12. Computation of array reference size
    for (i = 0; i <= 511; i++)
      for (j = 0; j <= 511; j++)
        for (k = 0; k <= 511; k++)
          ... B[i+k][j+k] ...
x = i+k, y = j+k
[Figure: the iterator space (i, j, k) mapped to the index space of B[x][y]]
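13. Computation of array reference size
    for (i = 0; i <= 511; i++)
      for (j = 0; j <= 511; j++)
        for (k = 0; k <= 511; k++)
          ... B[i+k][j+k] ...
\[ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} i \\ j \\ k \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad 0 \le i, j, k \le 511 \]
Iterator space --(affine mapping)--> Index space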
14. Computation of array reference size
Any array reference can be modeled as a
linearly bounded lattice (LBL)
\[ \mathrm{LBL} = \{\, x = T \cdot i + u \mid A \cdot i \ge b \,\} \]
Affine mapping: $x = T \cdot i + u$
Iterator space: $A \cdot i \ge b$
- scope of nested loops, and
- iterator-dependent conditions
Polytope --(affine mapping)--> LBL
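The LBL model can be made concrete with a small enumeration. The sketch below is an illustration under assumptions: the function names, the explicit search box, and the reduced loop bounds are introduced here and do not come from the slides. It only scans candidate iterator vectors and keeps those satisfying A·i >= b.

```python
from itertools import product


def dot(row, vec):
    return sum(r * v for r, v in zip(row, vec))


def box_constraints(n_iterators, bound):
    """Rows of A and entries of b encoding 0 <= i_k <= bound as A*i >= b."""
    A, b = [], []
    for k in range(n_iterators):
        unit = [1 if c == k else 0 for c in range(n_iterators)]
        A.append(unit)                 #  i_k >= 0
        b.append(0)
        A.append([-v for v in unit])   # -i_k >= -bound
        b.append(-bound)
    return A, b


def lbl_points(T, u, A, b, search_box):
    """Enumerate { x = T*i + u | A*i >= b } by scanning search_box."""
    points = set()
    for it in product(*(range(lo, hi + 1) for lo, hi in search_box)):
        if all(dot(row, it) >= rhs for row, rhs in zip(A, b)):
            points.add(tuple(dot(row, it) + off for row, off in zip(T, u)))
    return points


if __name__ == "__main__":
    # Reference A[2i+3j+1][5i+j+2][4i+6j+3], scaled down to 0 <= i, j <= 7.
    A_mat, b_vec = box_constraints(2, 7)
    ref_a = lbl_points([[2, 3], [5, 1], [4, 6]], [1, 2, 3], A_mat, b_vec, [(0, 7)] * 2)
    print(len(ref_a))
```

Exhaustive enumeration is only practical for small bounds; it serves here as a reference against which an analytical size computation can be compared.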
15. Computation of array reference size
The size of the array reference is the size of
its index space, an LBL
\[ \mathrm{LBL} = \{\, x = T \cdot i + u \mid A \cdot i \ge b \,\} \]
$f : \mathbb{Z}^n \to \mathbb{Z}^m$, $f(i) = T \cdot i + u$
Is function f a one-to-one mapping?
If YES: Size(index space) = Size(iterator space)
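Whether f is one-to-one on the whole of Z^n depends only on T: the map is injective exactly when T has full column rank n. A minimal sketch of that test follows, using plain Gaussian elimination over rationals rather than the Hermite-normal-form route taken on the following slides; the function names are assumptions made for this illustration.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of an integer matrix via Gaussian elimination over rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((k for k in range(r, len(rows)) if rows[k][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(r + 1, len(rows)):
            factor = rows[k][c] / rows[r][c]
            rows[k] = [a - factor * p for a, p in zip(rows[k], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_one_to_one(T):
    """f(i) = T*i + u is injective on Z^n iff rank(T) equals n (columns)."""
    return rank(T) == len(T[0])


if __name__ == "__main__":
    print(is_one_to_one([[2, 3], [5, 1], [4, 6]]))   # reference A
    print(is_one_to_one([[1, 0, 1], [0, 1, 1]]))     # reference B
```

For the reference A the test succeeds, so its size equals the size of the iterator space; for B it fails, and the size has to be computed on the index space itself.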
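16. Computation of array reference size
$f : \mathbb{Z}^n \to \mathbb{Z}^m$, $f(i) = T \cdot i + u$
\[ P \cdot T \cdot S = \begin{bmatrix} H & 0 \\ G & 0 \end{bmatrix} \] [Minoux 86]
H - nonsingular lower-triangular matrix
S - unimodular matrix
P - row permutation
When rank(H) = m <= n, H is the Hermite Normal
Form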
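17. Computation of array reference size
Case 1: rank(H) = n
function f is a one-to-one mapping
    for (i = 0; i <= 511; i++)
      for (j = 0; j <= 511; j++)
        ... A[2i+3j+1][5i+j+2][4i+6j+3] ...
\[ \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 5 & 1 \\ 4 & 6 \end{bmatrix} \begin{bmatrix} i \\ j \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \]
\[ P \cdot T \cdot S = I_3 \begin{bmatrix} 2 & 3 \\ 5 & 1 \\ 4 & 6 \end{bmatrix} \begin{bmatrix} -1 & 3 \\ 1 & -2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -4 & 13 \\ 2 & 0 \end{bmatrix} = \begin{bmatrix} H \\ G \end{bmatrix} \]
Number of locations of A = size{ $0 \le i, j \le 511$ } = 512 x 512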
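The decomposition for the reference A can be verified by direct multiplication. The sketch below takes P = I3, T and S as read from the slide and checks that T·S has the block form with H on top of G and that S is unimodular; the helper functions `matmul` and `det` are written here for the illustration only.

```python
def matmul(X, Y):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def det(M):
    """Determinant by cofactor expansion (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** c * M[0][c] * det([row[:c] + row[c + 1:] for row in M[1:]])
               for c in range(len(M)))


T = [[2, 3], [5, 1], [4, 6]]
S = [[-1, 3], [1, -2]]
PTS = matmul(T, S)          # P is the identity, so P*T*S = T*S
H, G = PTS[:2], PTS[2:]     # top 2x2 block and remaining row

if __name__ == "__main__":
    print(PTS)
    print(det(S), det(H))
```

H comes out lower-triangular with a nonzero determinant, so rank(H) = 2 = n and the mapping is one-to-one, consistent with the 512 x 512 result.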
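18. Computation of array reference size
Case 2: rank(H) < n
    for (i = 0; i <= 511; i++)
      for (j = 0; j <= 511; j++)
        for (k = 0; k <= 511; k++)
          ... B[i+k][j+k] ...
\[ P \cdot T \cdot S = I_2 \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} H & 0 \end{bmatrix} \]
$0 \le i, j, k \le 511$ becomes $0 \le I-K,\ J-K,\ K \le 511$
Size of B[i+k][j+k] = size{ $0 \le I, J \le 1022$, $I-511 \le J \le I+511$ } = 784,897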
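The change of variables can be cross-checked numerically. The sketch below (function names are assumptions) compares, for a small bound, the set of index pairs produced by the loop nest with the set described by the constraints on (I, J), and then counts the constrained pairs for the bound 511 used on the slide.

```python
def image_pairs(bound):
    """Index pairs (i+k, j+k) produced by the loop nest, by brute force."""
    values = range(bound + 1)
    return {(i + k, j + k) for i in values for j in values for k in values}


def constrained_pairs(bound):
    """Pairs with 0 <= I, J <= 2*bound and I-bound <= J <= I+bound."""
    top = 2 * bound
    return {(I, J) for I in range(top + 1) for J in range(top + 1)
            if I - bound <= J <= I + bound}


def constrained_count(bound):
    """Same constraint set as above, counted without storing the pairs."""
    top = 2 * bound
    return sum(1 for I in range(top + 1) for J in range(top + 1)
               if I - bound <= J <= I + bound)


if __name__ == "__main__":
    print(image_pairs(4) == constrained_pairs(4))
    print(constrained_count(511))  # 784897
```

The brute-force image is only computed for a small bound; for 511 the count is taken over the roughly one million candidate pairs of the index bounding box.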
19. Computation of array reference size
Array reference B[i+k][j+k]
20. Computation of array reference size
Computation of the size of an integer polytope
The Fourier-Motzkin elimination
n-dim polytope: $\sum_k a_{ik} x_k \ge b_i$, rewritten as
1. $x_n \ge D_i(x_1, \ldots, x_{n-1})$
2. $x_n \le E_j(x_1, \ldots, x_{n-1})$
3. $0 \le F_k(x_1, \ldots, x_{n-1})$
(n-1)-dim polytope:
$D_i(x_1, \ldots, x_{n-1}) \le E_j(x_1, \ldots, x_{n-1})$
$0 \le F_k(x_1, \ldots, x_{n-1})$
...
1-dim polytope: range of $x_1$
for each value of $x_1$: add the size of the (n-1)-dim polytope
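The counting scheme can be sketched as follows, assuming integer coefficients and a bounded polytope. Constraints are stored as pairs `(a, b)` meaning sum(a[k]·x[k]) >= b; this representation and the function names are choices made for the illustration, not taken from the slides. Repeated elimination yields the range of x1, and each value of x1 contributes the size of the remaining slice.

```python
from fractions import Fraction
from math import ceil, floor


def eliminate_last(cons):
    """One Fourier-Motzkin step: project out the last variable."""
    lower = [c for c in cons if c[0][-1] > 0]    # lower bounds on x_n
    upper = [c for c in cons if c[0][-1] < 0]    # upper bounds on x_n
    out = [(a[:-1], b) for a, b in cons if a[-1] == 0]
    for al, bl in lower:
        for au, bu in upper:
            sl, su = -au[-1], al[-1]             # positive multipliers
            a = [sl * p + su * q for p, q in zip(al[:-1], au[:-1])]
            out.append((a, sl * bl + su * bu))   # x_n cancels out
    return out


def integer_range(cons):
    """Integer interval allowed by one-variable constraints, or None."""
    lo = hi = None
    for (a,), b in cons:
        if a > 0:
            bound = ceil(Fraction(b, a))
            lo = bound if lo is None else max(lo, bound)
        elif a < 0:
            bound = floor(Fraction(b, a))
            hi = bound if hi is None else min(hi, bound)
        elif b > 0:
            return None                          # 0 >= b is violated
    if lo is None or hi is None:
        raise ValueError("polytope is unbounded")
    return (lo, hi) if lo <= hi else None


def count_points(cons, n):
    """Number of integer points x in Z^n satisfying all constraints."""
    shadow = cons
    for _ in range(n - 1):
        shadow = eliminate_last(shadow)          # keep only x_1
    rng = integer_range(shadow)
    if rng is None:
        return 0
    lo, hi = rng
    if n == 1:
        return hi - lo + 1
    total = 0
    for v in range(lo, hi + 1):                  # for each value of x_1 ...
        sliced = [(a[1:], b - a[0] * v) for a, b in cons]
        total += count_points(sliced, n - 1)     # ... add the slice size
    return total


if __name__ == "__main__":
    # Index space of B[i+k][j+k]: 0 <= I, J <= 1022, I-511 <= J <= I+511.
    b_space = [([1, 0], 0), ([-1, 0], -1022), ([0, 1], 0), ([0, -1], -1022),
               ([-1, 1], -511), ([1, -1], -511)]
    print(count_points(b_space, 2))  # 784897
```

The projection obtained by elimination may be larger than the set of x1 values that have integer solutions, but the count stays exact because empty slices contribute zero. The number of generated inequalities can grow quickly, so this sketch is only meant for small systems.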
21. Memory size computation
    #define n 6
    for (j = 0; j < n; j++) {
      A[j][0] = in[0];
      for (i = 0; i < n; i++)
        A[j][i+1] = A[j][i] + 1;
    }
    for (i = 0; i < n; i++) {
      alpha[i] = A[i][n+i];
      for (j = 0; j < n; j++)
        A[j][n+i+1] = (j < i) ? A[j][n+i] : alpha[i] + A[j][n+i];
    }
    for (j = 0; j < n; j++)
      B[j] = A[j][2*n];
22. Memory size computation
Decompose the LBLs of the array references into
non-overlapping pieces
23. Memory size computation
Keeping minimal the set of inequalities in the
LBL intersection
    for (i = 0; i < n; i++)
      for (j = 0; j < n; j++)
        A[j][n+i+1] = (j < i) ? A[j][n+i] : alpha[i] + A[j][n+i];
Iterator space:
$0 \le i, j \le n-1$, $j+1 \le i$ (5 ineq.)
$0 \le j$, $i \le n-1$, $j+1 \le i$ (3 ineq.)
[Figure: iterator space in the (i, j) plane, with axis ticks 1 and n-1 on i and n-1 on j]
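That the two inequalities dropped from the five-inequality description are redundant can be checked by enumeration. In the sketch below (names are hypothetical) both systems are evaluated over a window deliberately larger than the loop bounds, so that a missing bound would show up as extra points.

```python
# Five inequalities: 0 <= i <= n-1, 0 <= j <= n-1, j+1 <= i.
FIVE = [
    lambda i, j, n: i >= 0,
    lambda i, j, n: i <= n - 1,
    lambda i, j, n: j >= 0,
    lambda i, j, n: j <= n - 1,
    lambda i, j, n: j + 1 <= i,
]

# Minimal set: 0 <= j, i <= n-1, j+1 <= i.
THREE = [
    lambda i, j, n: j >= 0,
    lambda i, j, n: i <= n - 1,
    lambda i, j, n: j + 1 <= i,
]


def iterator_points(inequalities, n, window):
    """Integer points of the window that satisfy every inequality."""
    return {(i, j) for i in window for j in window
            if all(test(i, j, n) for test in inequalities)}


if __name__ == "__main__":
    window = range(-10, 20)
    print(iterator_points(FIVE, 6, window) == iterator_points(THREE, 6, window))
```

The bounds `i >= 0` and `j <= n-1` follow from the remaining three, since `i >= j+1 >= 1` and `j <= i-1 <= n-2`.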
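24. Memory size computation
Keeping minimal the set of inequalities in the
LBL intersection
The decomposition theorem of polyhedra [Motzkin 1953]
\[ \mathrm{Polyhedron} = \{\, x \mid C x = d,\ A x \ge b \,\} \]
\[ \mathrm{Polyhedron} = \{\, x \mid x = V \alpha + L \beta + R \gamma \,\}, \qquad \alpha, \gamma \ge 0,\ \sum_i \alpha_i = 1 \]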
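As a small worked instance of the theorem, consider the triangular iterator space with $0 \le j$, $i \le n-1$, $j+1 \le i$ (for $n \ge 2$). Treating it as a real polytope in the $(i, j)$ plane, its vertices are the pairwise intersections of the three bounding lines. The polytope is bounded, so the lineality part $L$ and the ray part $R$ are empty and only the vertex matrix $V$ remains. This example is added for illustration and is not taken from the slides.

```latex
\[
\{\, (i,j) \mid j \ge 0,\ i \le n-1,\ j+1 \le i \,\}
= \left\{\,
\alpha_1 \begin{bmatrix} 1 \\ 0 \end{bmatrix}
+ \alpha_2 \begin{bmatrix} n-1 \\ 0 \end{bmatrix}
+ \alpha_3 \begin{bmatrix} n-1 \\ n-2 \end{bmatrix}
\;\middle|\; \alpha_1, \alpha_2, \alpha_3 \ge 0,\ \alpha_1 + \alpha_2 + \alpha_3 = 1
\,\right\}
\]
```

An inequality of the original system that is not tight at any of these vertices cannot bound the polytope, which is one way to see why the vertex representation helps in discarding redundant inequalities.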
25. Memory size computation
LBLs of signal A (illustrative example)
26. Granularity level 0
Granularity level 1
Polyhedral data-dependence graphs
27. Granularity level 2
Scalar-level data-dependence graph
28. Polyhedral data-dependence graph
Motion detection algorithm [Chan 93]
[Figure: polyhedral data-dependence graph, with legend entries "scalars" and "dependencies"]
29. Memory size computation
Memory size variation during the motion detection
algorithm
30. Memory size computation
To handle high-throughput applications:
- Extract the (largely hidden) parallelism from
the initially specified code
- Find the lowest degree of parallelism that meets
the throughput/hardware requirements
- Perform memory size computation for code with
explicit parallelism instructions
31. Hierarchical memory allocation
A large part of power dissipation in
data-dominated applications is due to
data transfers and data storage
Power cost reduction: a memory hierarchy
exploiting temporal locality in the data accesses
Power dissipation = f(memory size, access frequency)
32. Hierarchical memory allocation
Power dissipation = f(memory size, access frequency)
- Layer of small memories: heavily used data
- Layer of large memories
33. Hierarchical memory allocation
Hierarchical distribution vs. non-hierarchical distribution
Trade-offs:
- Lower power consumption by accessing from smaller
memories
- Higher power consumption due to additional
transfers
- Larger area: additional area overhead (addressing logic)
34. Hierarchical memory allocation
Synthesis of a multilevel memory architecture optimized
for area and/or power, subject to
performance constraints
1. Data reuse exploration
Which intermediate copies of data are necessary
for accessing data in a power- and area-efficient way
2. Memory allocation and assignment
Distributed (hierarchical) memory architecture
(memory layers, memory size/ports/address logic,
signal-to-memory and signal-to-port
assignment)
35. Hierarchical memory allocation
Synthesis of a multilevel memory architecture optimized
for area and/or power, subject to
performance constraints
1. Data reuse exploration
Array partitions to be considered as copy
candidates:
the LBLs from the recursive intersection of
array references
2. Memory allocation and assignment
\[ \mathrm{Cost} = \alpha \sum P_{\mathrm{read/write}}(N_{\mathrm{bits}}, N_{\mathrm{words}}, f_{\mathrm{read/write}}) + \beta \sum \mathrm{Area}(N_{\mathrm{bits}}, N_{\mathrm{words}}, N_{\mathrm{ports}}, \mathrm{technology}) \]
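The weighted cost can be written as a small function over a list of memories. This is a sketch under stated assumptions: the slides do not give the power or area models, so both are passed in as callables, and the dictionary keys, the function name and the default weights are hypothetical. The technology parameter is assumed to be captured inside the area model.

```python
def architecture_cost(memories, power_model, area_model, alpha=1.0, beta=1.0):
    """Weighted sum of access power and area over all memories.

    memories    : list of dicts with keys "bits", "words", "ports", "freq"
                  (hypothetical field names)
    power_model : callable (bits, words, freq) -> power, supplied by the user
    area_model  : callable (bits, words, ports) -> area, supplied by the user
    """
    power = sum(power_model(m["bits"], m["words"], m["freq"]) for m in memories)
    area = sum(area_model(m["bits"], m["words"], m["ports"]) for m in memories)
    return alpha * power + beta * area


if __name__ == "__main__":
    # Placeholder models, for illustration only; real models are technology data.
    toy_power = lambda bits, words, freq: freq * words
    toy_area = lambda bits, words, ports: bits * words * ports
    layer = [{"bits": 8, "words": 16, "ports": 1, "freq": 2.0}]
    print(architecture_cost(layer, toy_power, toy_area))
```

Keeping the models as parameters lets the same function compare candidate hierarchies once technology-specific power and area figures are available.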
36. Partitioning for on/off-chip memories
[Figure: CPU with a cache (1 cycle), on-chip SRAM (1 cycle) and off-chip DRAM (10-20 cycles) within the memory address space]
Optimal data mapping to the SRAM / DRAM to
maximize the performance of the application
37. Partitioning for on/off-chip memories
Total number of array accesses exposed to cache
conflicts
Total conflict factor
The importance of mapping to the on-chip SRAM
Using the polyhedral data-dependence graph
Precise info about the relative lifetimes of the
different parts of arrays
38. Conclusions
- Algebraic techniques are powerful non-scalar
instruments in the memory management of multimedia
signal processing
- Data-dependence analysis at polyhedral level is
useful in many memory management tasks:
- memory size computation for behavioral
specifications
- hierarchical memory allocation
- data partitioning between on- and off-chip
memories
The End