Data-Flow Analysis in the Memory Management of Real-Time Multimedia Processing Systems

1
Data-Flow Analysis in the Memory Management of
Real-Time Multimedia Processing Systems

Florin Balasa
University of Illinois at Chicago

2
Introduction
Real-time multimedia processing systems
(video and image processing, real-time 3D
rendering, audio and speech coding, medical
imaging, etc.)

A large part of the power dissipation is due to data transfer and data storage.
Fetching the operands of an addition from an off-chip memory consumes 33 times more power than the computation itself [Catthoor 98].

The area cost is often largely dominated by the memories.

3
Introduction
In the early years of high-level synthesis, memory management tasks were tackled at the scalar level.
Algebraic techniques, similar to those used in modern compilers, make it possible to handle memory management at the non-scalar level.
Requirement: addressing the entire class of affine specifications:

- multidimensional signals with (complex) affine indexes
- loop nests having affine iterator functions as boundaries
- conditions: relational and/or logical operators of affine functions

4
Outline

Memory size computation using data-dependence analysis
Hierarchical memory allocation based on data reuse analysis
Data-flow driven data partitioning for on/off-chip memories
Conclusions

5
Computation of array reference size
for (i=0; i<=511; i++)
  for (j=0; j<=511; j++) {
    ... A[2i+3j+1][5i+j+2][4i+6j+3] ...
    for (k=0; k<=511; k++)
      ... B[i+k][j+k] ...
  }
How many memory locations are necessary to store the array references A[2i+3j+1][5i+j+2][4i+6j+3] and B[i+k][j+k]?
6
Computation of array reference size
for (i=0; i<=511; i++)
  for (j=0; j<=511; j++)
    for (k=0; k<=511; k++)
      ... B[i+k][j+k] ...
Number of iterator triplets (i,j,k), that is $512^3$ ??
No !!
(i,j,k) = (0,1,1) and (i,j,k) = (1,2,0) both address B[1][2]
7
Computation of array reference size
for (i=0; i<=511; i++)
  for (j=0; j<=511; j++)
    for (k=0; k<=511; k++)
      ... B[i+k][j+k] ...
Number of index values (i+k, j+k), that is $1023^2$ ??
(since $0 \le i+k,\ j+k \le 1022$)
No !!
B[0][512] is not addressed for any (i,j,k)
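Both naive guesses can be checked by plain enumeration. The following Python sketch is not part of the original slides: the helper name `distinct_B_locations` and the reduced bound `m = 7` are assumptions chosen to keep the run short (the slide uses an upper bound of 511).

```python
def distinct_B_locations(m):
    """Number of distinct index pairs (i+k, j+k) for 0 <= i, j, k <= m."""
    return len({(i + k, j + k)
                for i in range(m + 1)
                for j in range(m + 1)
                for k in range(m + 1)})

m = 7                                  # reduced bound (the slide uses 511)
locations = distinct_B_locations(m)
triplets = (m + 1) ** 3                # first guess: one location per (i, j, k)
index_box = (2 * m + 1) ** 2           # second guess: every pair in [0, 2m] x [0, 2m]
print(locations, triplets, index_box)
```

The enumerated count is smaller than both guesses: several triplets share a location, and some pairs inside the bounding box are never addressed.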
8
Computation of array reference size
for (i=0; i<=511; i++)
  for (j=0; j<=511; j++)
    ... A[2i+3j+1][5i+j+2][4i+6j+3] ...
[Figure: the iterator space (axes i, j) and the index space of A[x][y][z] (axes x = 2i+3j+1, y = 5i+j+2, z = 4i+6j+3)]
9
Computation of array reference size
A[2i+3j+1][5i+j+2][4i+6j+3]
[Figure: the iterator space (axes i, j) and the index space of A[x][y][z]]
10
Computation of array reference size
Remark
The iterator space may have holes too

for (i=4; i<=8; i++)
  for (j=i-2; j<=i+2; j+=2)
    ... C[i+j] ...

After normalization:

for (i=4; i<=8; i++)
  for (j=0; j<=2; j++)
    ... C[2i+2j-2] ...

[Figure: the iterator space in the (i, j) plane before and after normalization]
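The effect of the normalization can be checked by listing the indices addressed before and after. This is a small Python sketch, assuming the slide's loops read as `C[i+j]` with `j` running from `i-2` to `i+2` in steps of 2, and `C[2i+2j-2]` with `j` running from 0 to 2 after normalization; the variable names are illustrative.

```python
# Indices addressed by the loop nest with stride 2 in j (iterator space with holes)
original = [i + j for i in range(4, 9) for j in range(i - 2, i + 3, 2)]

# Indices addressed after normalization (unit stride, j = 0..2)
normalized = [2 * i + 2 * j - 2 for i in range(4, 9) for j in range(3)]

same_sequence = original == normalized
print(same_sequence, original[:3])
```

The substitution behind it is j = i - 2 + 2j', which moves the stride from the iterator space into the index expression.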
11
Computation of array reference size
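for (i=0; i<=511; i++)
  for (j=0; j<=511; j++)
    ... A[2i+3j+1][5i+j+2][4i+6j+3] ...
Affine mapping from the iterator space to the index space:
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 5 & 1 \\ 4 & 6 \end{bmatrix} \begin{bmatrix} i \\ j \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \qquad 0 \le i, j \le 511$$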
12
Computation of array reference size
for (i=0; i<=511; i++)
  for (j=0; j<=511; j++)
    for (k=0; k<=511; k++)
      ... B[i+k][j+k] ...
[Figure: the iterator space (axes i, j, k) and the index space of B[x][y] (axes x = i+k, y = j+k)]
13
Computation of array reference size
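for (i=0; i<=511; i++)
  for (j=0; j<=511; j++)
    for (k=0; k<=511; k++)
      ... B[i+k][j+k] ...
Affine mapping from the iterator space to the index space:
$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} i \\ j \\ k \end{bmatrix}, \qquad 0 \le i, j, k \le 511$$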
14
Computation of array reference size
Any array reference can be modeled as a linearly bounded lattice (LBL):
$$\mathrm{LBL} = \{\, x = T i + u \mid A i \ge b \,\}$$
$x = T i + u$: affine mapping
$A i \ge b$: iterator space (a polytope)
- scope of nested loops, and
- iterator-dependent conditions

Polytope -> (affine mapping) -> LBL
15
Computation of array reference size
The size of the array reference is the size of its index space, an LBL !!
$$\mathrm{LBL} = \{\, x = T i + u \mid A i \ge b \,\}$$
$f : \mathbb{Z}^n \to \mathbb{Z}^m$, $f(i) = T i + u$
Is function f a one-to-one mapping ??
If YES: Size(index space) = Size(iterator space)
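For an affine map over the integers, f is one-to-one exactly when T has full column rank, that is rank(T) = n. The Python sketch below tests this for the two index matrices used in this talk; the helper `rank` (exact Gaussian elimination over fractions) and the names `T_A`, `T_B` are illustrative assumptions, not part of the slides.

```python
from fractions import Fraction

def rank(M):
    """Rank of an integer matrix, by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in r] for r in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((k for k in range(r, len(rows)) if rows[k][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(r + 1, len(rows)):
            f = rows[k][c] / rows[r][c]
            rows[k] = [a - f * b for a, b in zip(rows[k], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

T_A = [[2, 3], [5, 1], [4, 6]]            # A[2i+3j+1][5i+j+2][4i+6j+3], n = 2
T_B = [[1, 0, 1], [0, 1, 1]]              # B[i+k][j+k], n = 3

one_to_one_A = rank(T_A) == len(T_A[0])
one_to_one_B = rank(T_B) == len(T_B[0])
print(one_to_one_A, one_to_one_B)
```

The offset u plays no role here, since it shifts every index by the same amount.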
16
Computation of array reference size
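$f : \mathbb{Z}^n \to \mathbb{Z}^m$, $f(i) = T i + u$
$$P \cdot T \cdot S = \begin{bmatrix} H & 0 \\ G & 0 \end{bmatrix}$$
[Minoux 86]
H - nonsingular lower-triangular matrix
S - unimodular matrix
P - row permutation
When $\mathrm{rank}(H) = m \le n$, H is the Hermite Normal Form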
17
Computation of array reference size
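Case 1: rank(H) = n, so function f is a one-to-one mapping
for (i=0; i<=511; i++)
  for (j=0; j<=511; j++)
    ... A[2i+3j+1][5i+j+2][4i+6j+3] ...
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 5 & 1 \\ 4 & 6 \end{bmatrix} \begin{bmatrix} i \\ j \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$
$$P \cdot T \cdot S = I_3 \cdot \begin{bmatrix} 2 & 3 \\ 5 & 1 \\ 4 & 6 \end{bmatrix} \cdot \begin{bmatrix} -1 & 3 \\ 1 & -2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -4 & 13 \\ 2 & 0 \end{bmatrix} = \begin{bmatrix} H \\ G \end{bmatrix}, \qquad H = \begin{bmatrix} 1 & 0 \\ -4 & 13 \end{bmatrix}, \quad G = \begin{bmatrix} 2 & 0 \end{bmatrix}$$
Nr. locations A = size( $0 \le i, j \le 511$ ) = 512 x 512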
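The decomposition for Case 1 can be verified numerically. This Python sketch multiplies the index matrix T of A by the matrix S read from the slide, checks that S is unimodular, and confirms on a reduced iterator space that no two iterations address the same element. The helper `matmul` and the reduced bound `m = 15` are assumptions of the sketch.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

T = [[2, 3], [5, 1], [4, 6]]              # index matrix of A
S = [[-1, 3], [1, -2]]                    # unimodular matrix read from the slide
HG = matmul(T, S)                         # P = I3, so P*T*S = T*S = [H; G]
det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]

m = 15                                    # reduced bound (the slide uses 511)
addressed = {(2*i + 3*j + 1, 5*i + j + 2, 4*i + 6*j + 3)
             for i in range(m + 1) for j in range(m + 1)}
print(HG, det_S, len(addressed))
```

Since every iteration addresses its own element, the number of locations equals the number of iterator pairs, which is 512 x 512 for the bounds on the slide.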
18
Computation of array reference size
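Case 2: rank(H) < n
for (i=0; i<=511; i++)
  for (j=0; j<=511; j++)
    for (k=0; k<=511; k++)
      ... B[i+k][j+k] ...
$$P \cdot T \cdot S = I_2 \cdot \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} H & 0 \end{bmatrix}$$
With $(i, j, k)^T = S \, (I, J, K)^T$, the constraints $0 \le i, j, k \le 511$ become $0 \le I-K,\ J-K,\ K \le 511$
size( B[i+k][j+k] ) = size( $0 \le I, J \le 1022$, $I-511 \le J \le I+511$ ) = 784,897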
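The figure 784,897 can be reproduced by summing, for each value of I, the number of admissible values of J. The Python sketch below also re-checks the product T*S for the index matrix of B; the helper `matmul` and the variable names are illustrative assumptions.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

T = [[1, 0, 1], [0, 1, 1]]                       # index matrix of B[i+k][j+k]
S = [[1, 0, -1], [0, 1, -1], [0, 0, 1]]          # unimodular matrix read from the slide
TS = matmul(T, S)                                # expected [H 0] with H = I2

# Lattice points of 0 <= I, J <= 1022 with I-511 <= J <= I+511
size_B = sum(min(1022, I + 511) - max(0, I - 511) + 1 for I in range(1023))
print(TS, size_B)
```

The same count also follows from the closed form 1023^2 - 511*512, obtained by removing the two corner triangles from the bounding square.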
19
Computation of array reference size
Array reference B[i+k][j+k]
20
Computation of array reference size
Computation of the size of an integer polytope: the Fourier-Motzkin elimination

n-dim polytope: $\sum_k a_{ik} x_k \ge b_i$, rewritten as
1. $x_n \ge D_i(x_1, \dots, x_{n-1})$
2. $x_n \le E_j(x_1, \dots, x_{n-1})$
3. $0 \le F_k(x_1, \dots, x_{n-1})$

(n-1)-dim polytope:
$D_i(x_1, \dots, x_{n-1}) \le E_j(x_1, \dots, x_{n-1})$
$0 \le F_k(x_1, \dots, x_{n-1})$

...

1-dim polytope: range of $x_1$
For each value of $x_1$: add the size of the (n-1)-dim polytope
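The scheme above can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the author's tool: each constraint is encoded as a pair `(coeffs, c)` meaning `sum(coeffs[k] * x[k]) + c >= 0`, the polytope must be bounded, and the names `eliminate_last` and `count_points` are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor

def eliminate_last(cons):
    """One Fourier-Motzkin step: project out the last variable."""
    lower, upper, rest = [], [], []
    for a, c in cons:
        if a[-1] > 0:
            lower.append((a, c))          # lower bound on the last variable
        elif a[-1] < 0:
            upper.append((a, c))          # upper bound on the last variable
        else:
            rest.append((a[:-1], c))
    for al, cl in lower:
        for au, cu in upper:
            p, q = -au[-1], al[-1]        # positive multipliers cancelling x_n
            rest.append(([p * x + q * y for x, y in zip(al[:-1], au[:-1])],
                         p * cl + q * cu))
    return rest

def count_points(cons, n):
    """Number of integer points of a bounded polytope in n variables."""
    proj = cons
    for _ in range(n - 1):                # project down to the range of x1
        proj = eliminate_last(proj)
    lo = hi = None
    for (a,), c in proj:
        if a == 0:
            if c < 0:
                return 0                  # contradictory constraint: empty set
        elif a > 0:
            bound = ceil(Fraction(-c, a))
            lo = bound if lo is None else max(lo, bound)
        else:
            bound = floor(Fraction(c, -a))
            hi = bound if hi is None else min(hi, bound)
    if lo is None or hi is None:
        raise ValueError("polytope is unbounded in x1")
    if n == 1:
        return max(0, hi - lo + 1)
    # For each value of x1, add the size of the (n-1)-dim polytope
    return sum(count_points([(a[1:], c + a[0] * v) for a, c in cons], n - 1)
               for v in range(lo, hi + 1))

# Index space of B[i+k][j+k]: 0 <= I, J <= 1022, I-511 <= J <= I+511
index_space_B = [([1, 0], 0), ([-1, 0], 1022), ([0, 1], 0), ([0, -1], 1022),
                 ([-1, 1], 511), ([1, -1], 511)]
size_B = count_points(index_space_B, 2)
print(size_B)
```

The rational projection may contain values of x1 for which the remaining polytope has no integer point; those values simply contribute zero to the sum.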
21
Memory size computation
#define n 6
for (j=0; j<n; j++) {
  A[j][0] = in0;
  for (i=0; i<n; i++)
    A[j][i+1] = A[j][i] + 1;
}
for (i=0; i<n; i++) {
  alpha[i] = A[i][n+i];
  for (j=0; j<n; j++)
    A[j][n+i+1] = (j < i) ? A[j][n+i] : alpha[i] + A[j][n+i];
}
for (j=0; j<n; j++)
  B[j] = A[j][2*n];
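What "memory size" means for this example can be illustrated with a scalar-level lifetime simulation, a brute-force stand-in for the polyhedral analysis of the talk. The Python sketch below assumes the following reading of the slide's code: A[j][0] is initialized, A[j][i+1] is computed from A[j][i], alpha[i] reads A[i][n+i], A[j][n+i+1] reads A[j][n+i] (plus alpha[i] when j >= i), and B[j] reads A[j][2n]. It also assumes that an operand read for the last time frees its location before the result is stored. The function names are hypothetical.

```python
def accesses(n):
    """Yield (elements read, element written) for each executed statement."""
    for j in range(n):
        yield [], ("A", j, 0)                        # A[j][0] initialized
        for i in range(n):
            yield [("A", j, i)], ("A", j, i + 1)     # A[j][i+1] from A[j][i]
    for i in range(n):
        yield [("A", i, n + i)], ("alpha", i)        # alpha[i] = A[i][n+i]
        for j in range(n):
            reads = [("A", j, n + i)]
            if j >= i:
                reads.append(("alpha", i))
            yield reads, ("A", j, n + i + 1)
    for j in range(n):
        yield [("A", j, 2 * n)], ("B", j)            # B[j] = A[j][2n]

def peak_live_A(n):
    """Maximum number of simultaneously alive elements of A."""
    stmts = list(accesses(n))
    last_read = {}
    for t, (reads, _) in enumerate(stmts):
        for elem in reads:
            last_read[elem] = t
    live, peak = set(), 0
    for t, (reads, written) in enumerate(stmts):
        for elem in reads:
            if last_read[elem] == t:
                live.discard(elem)                   # operand dies here
        if written[0] == "A":
            live.add(written)
        peak = max(peak, len(live))
    return peak

n = 6
produced = sum(1 for _, w in accesses(n) if w[0] == "A")
peak = peak_live_A(n)
print(produced, peak)
```

Under these assumptions the number of locations needed for A is far smaller than the number of elements of A that the code produces.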
22
Memory size computation
Decompose the LBLs of the array refs. into
non-overlapping pieces !!
23
Memory size computation
Keeping minimal the set of inequalities in the
LBL intersection
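for (i=0; i<n; i++)
  for (j=0; j<n; j++)
    A[j][n+i+1] = (j < i) ? A[j][n+i] : alpha[i] + A[j][n+i];
Iterator space:
$0 \le i, j \le n-1$, $j+1 \le i$ (5 ineq.)
$0 \le j$, $i \le n-1$, $j+1 \le i$ (3 ineq.)
[Figure: the iterator space in the (i, j) plane, with axis marks at 1 and n-1]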
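That two of the five inequalities are redundant can be confirmed by enumeration. A small Python sketch, with n = 6 as in the example code and a search window slightly larger than the loop bounds (both choices are assumptions of the sketch):

```python
n = 6
window = range(-2, n + 2)      # slightly larger than the loop bounds

# Full description: 0 <= i <= n-1, 0 <= j <= n-1, j+1 <= i (5 inequalities)
five = {(i, j) for i in window for j in window
        if 0 <= i <= n - 1 and 0 <= j <= n - 1 and j + 1 <= i}

# Minimal description: 0 <= j, i <= n-1, j+1 <= i (3 inequalities)
three = {(i, j) for i in window for j in window
         if 0 <= j and i <= n - 1 and j + 1 <= i}

print(five == three, len(three))
```

The dropped inequalities follow from the kept ones: i >= j+1 >= 1 implies i >= 0, and j <= i-1 <= n-2 implies j <= n-1.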
24
Memory size computation
Keeping minimal the set of inequalities in the
LBL intersection
The decomposition theorem of polyhedra
Polyhedron = $\{\, x \mid Cx = d,\ Ax \ge b \,\}$
[Motzkin 1953]
Polyhedron = $\{\, x \mid x = V\alpha + L\beta + R\gamma,\ \ \alpha, \gamma \ge 0,\ \ \sum_i \alpha_i = 1 \,\}$
25
Memory size computation
LBLs of signal A (illustrative example)
26
Granularity level 0
Granularity level 1
Polyhedral data-dependence graphs
27
Granularity level 2
Scalar-level data-dependence graph
28
Polyhedral data-dependence graph of the motion detection algorithm [Chan 93]
[Figure: the graph, annotated with scalars and dependencies]
Memory size computation
Memory size variation during the motion detection algorithm
30
Memory size computation
To handle high throughput applications
Extract the (largely hidden) parallelism from
the initially specified code
Find the lowest degree of parallelism to meet
the throughput/hardware requirements
Perform memory size computation for code with
explicit parallelism instructions
31
Hierarchical memory allocation
A large part of the power dissipation in data-dominated applications is due to data transfers and data storage.
Power cost reduction: a memory hierarchy exploiting the temporal locality in the data accesses
Power dissipation = f ( memory size , access frequency )
32
Hierarchical memory allocation
Power dissipation = f ( memory size , access freq. )
[Figure: heavily used data kept in a layer of small memories, backed by a layer of large memories]
33
Hierarchical memory allocation
Hierarchical distribution vs. non-hierarchical distribution: trade-offs
- Lower power consumption by accessing from smaller memories
- Higher power consumption due to additional transfers to store copies of data
- Larger area: additional area overhead (addressing logic)
34
Hierarchical memory allocation
Synthesis of multilevel memory architecture optimized for area and/or power subject to performance constraints
1. Data reuse exploration
Which intermediate copies of data are necessary for accessing data in a power- and area-efficient way
2. Memory allocation and assignment
Distributed (hierarchical) memory architecture ( memory layers, memory size/ports/address-logic, signal-to-memory and signal-to-port assignment )
35
Hierarchical memory allocation
Synthesis of multilevel memory architecture optimized for area and/or power subject to performance constraints
1. Data reuse exploration
Array partitions to be considered as copy candidates: the LBLs from the recursive intersection of array refs.
2. Memory allocation and assignment
$$\mathrm{Cost} = \alpha \sum P_{read/write}\,( N_{bits},\ N_{words},\ f_{read/write} ) + \beta \sum \mathrm{Area}\,( N_{bits},\ N_{words},\ N_{ports},\ \mathrm{technology} )$$
36
Partitioning for on/off-chip memories
[Figure: CPU, cache, on-chip SRAM and off-chip DRAM within the memory address space; access latencies: 1 cycle (on-chip SRAM), 1 cycle (cache), 10-20 cycles (off-chip DRAM)]
Optimal data mapping to the SRAM / DRAM to maximize the performance of the application
37
Partitioning for on/off-chip memories
Total conflict factor: total number of array accesses exposed to cache conflicts
(it reflects the importance of mapping to the on-chip SRAM)
Using the polyhedral data-dependence graph: precise info about the relative lifetimes of the different parts of arrays
38
Conclusions