Design and Synthesis of Image Processing Systems using Reconfigurable Dataflow Graphs

- Mainak Sen and Shuvra S. Bhattacharyya
- Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies, University of Maryland at College Park
- Maryland DSPCAD Research Group, http://www.ece.umd.edu/DSPCAD/home/dspcad.htm

November 22, 2005, Leiden University, The Netherlands

Outline

- Dataflow-based model of computation for modeling the behavior of DSP applications
- Decidable dataflow models
- Example use of decidable dataflow as a model of computation for modeling the mapping of (decidable) dataflow behaviors onto embedded multiprocessors
- Structured reconfiguration of dataflow graphs
- Examples of meta-modeling techniques that can be classified as structured, reconfigurable dataflow
- Parameterized dataflow and its application to SDF
- Homogeneous-parameterized dataflow and its application to SDF and CSDF
- Experiments on a gesture recognition application
- Summary

Dataflow-based design for DSP (Example from Agilent ADS tool)

DSP-oriented Dataflow Models of Computation

- Used widely in design tools for DSP
- Application is modeled as a directed graph
- Nodes (actors) represent functions
- Edges represent communication channels between functions
- Nodes produce and consume data from edges
- Edges buffer data in FIFO (first-in first-out) fashion
- Data-driven execution model
- A node can execute whenever it has sufficient data on its input edges
- The order in which nodes execute is not part of the specification
- The order is typically determined by the compiler, the hardware, or both
- Iterative execution
- Body of loop to be iterated a large or infinite number of times
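
The execution model above can be sketched in a few lines of Python. This is a minimal illustration only, not taken from any design tool: the `Actor` class, the `run` helper, and the two-actor example graph are hypothetical names and values chosen for this sketch. Edges are FIFO queues, and an actor may fire whenever every input edge holds enough tokens.

```python
from collections import deque

class Actor:
    """A dataflow node (illustrative; not the API of any real tool)."""
    def __init__(self, name, func, inputs, outputs):
        self.name = name
        self.func = func        # maps consumed token lists to produced token lists
        self.inputs = inputs    # list of (edge, tokens consumed per firing)
        self.outputs = outputs  # list of output edges

    def can_fire(self):
        # Data-driven rule: enough tokens on every input edge.
        return all(len(edge) >= n for edge, n in self.inputs)

    def fire(self):
        consumed = [[edge.popleft() for _ in range(n)] for edge, n in self.inputs]
        produced = self.func(consumed)
        for edge, tokens in zip(self.outputs, produced):
            edge.extend(tokens)

def run(actors, max_firings):
    """Fire any enabled actor until none is enabled or the budget is spent."""
    order = []
    fired = True
    while fired and len(order) < max_firings:
        fired = False
        for actor in actors:
            if len(order) < max_firings and actor.can_fire():
                actor.fire()
                order.append(actor.name)
                fired = True
    return order

# Hypothetical example: e1 already holds six input samples.
e1, e2, e3 = deque([1, 2, 3, 4, 5, 6]), deque(), deque()
sum2 = Actor("sum2", lambda ins: [[ins[0][0] + ins[0][1]]], [(e1, 2)], [e2])
scale = Actor("scale", lambda ins: [[10 * ins[0][0]]], [(e2, 1)], [e3])

# The list order is deliberately "wrong": the firing order is still
# dictated by data availability, not by how the graph was written down.
order = run([scale, sum2], max_firings=100)
print(order, list(e3))
```

Listing the actors in a different order changes only the interleaving of firings, not the tokens that end up on `e3`, which is the sense in which execution order is not part of the specification.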

Dataflow Features and Advantages

- Exposes coarse-grain parallelism.
- Exposes high-level structure that facilitates analysis, verification, and optimization.
- Captures multi-rate behavior.
- Complementary to ongoing advances in DSP compiler technology for procedural languages, such as C and MATLAB.
- Encourages desirable software engineering practices: modularity and code reuse.
- Amenable also to aspect-oriented design.
- Intuitive to DSP algorithm designers: signal flow graphs.

Evolution of Dataflow Models for DSP

- Synchronous dataflow: static multirate behavior
- Agilent ADS, Cadence SPW, etc.
- Well-behaved dataflow: schemas for bounded dynamics
- Boolean/integer dataflow: Turing complete models
- Multidimensional synchronous dataflow: image and video
- Scalable synchronous dataflow: block processing
- Synopsys COSSAP
- Cyclo-static dataflow: phased behavior
- Synopsys El Greco, Eonic Systems Virtuoso Synchro, System Canvas
- Bounded dynamic dataflow: bounded dynamics
- The processing graph method: reconfigurable dynamic DF
- US Naval Research Laboratory, MCCI Autocoding Toolset
- Parameterized dataflow: dynamically-reconfigurable static DF
- Blocked dataflow: image and video in terms of reconfigurable dataflow

Modeling Design Space

(Third dimension: simplicity and intuitive appeal)

Decidable Dataflow Models

- Modeling flow for representing static flowgraph behavior
- Cyclo-static dataflow (CSDF), multiphase modeling
- Synchronous dataflow (SDF), multirate modeling
- Homogeneous synchronous dataflow (HSDF)
- Acyclic homogeneous synchronous dataflow (task graphs)
- These are in decreasing order of generality
- Designs represented in the more general models can be converted to equivalent representations in the less general ones
- e.g., CSDF → SDF → HSDF → task graph
- HSDF: each actor (graph node) produces/consumes exactly one data value to/from each incident output/input edge
- Suitable for exposing parallelism
- Not the best model for minimizing memory requirements
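
As a rough illustration of the SDF-to-HSDF step, the sketch below computes the repetitions vector of a small SDF graph by solving the balance equations q[src] x produced = q[snk] x consumed on every edge. In the usual conversion, an actor with q firings per iteration is unfolded into q HSDF nodes. The function name `repetitions_vector`, the edge tuple layout, and the three-actor example are assumptions made for this sketch, and the graph is assumed to be connected.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def repetitions_vector(actors, edges):
    """Smallest positive integer firing counts that balance every edge.

    edges: list of (src, snk, tokens_produced, tokens_consumed).
    """
    rate = {actors[0]: Fraction(1)}   # relative firing rates
    pending = list(edges)
    while pending:
        progressed = False
        for edge in list(pending):
            src, snk, prod, cons = edge
            if src in rate and snk in rate:
                if rate[src] * prod != rate[snk] * cons:
                    raise ValueError("inconsistent sample rates")
            elif src in rate:
                rate[snk] = rate[src] * prod / cons
            elif snk in rate:
                rate[src] = rate[snk] * cons / prod
            else:
                continue              # neither end reached yet
            pending.remove(edge)
            progressed = True
        if not progressed:
            raise ValueError("graph is not connected")
    # Scale the rational rates to integers.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    return {a: int(r * scale) for a, r in rate.items()}

# Hypothetical chain: A -(2,3)-> B -(1,2)-> C
q = repetitions_vector(["A", "B", "C"],
                       [("A", "B", 2, 3), ("B", "C", 1, 2)])
print(q, "HSDF nodes:", sum(q.values()))
```

For this example the equivalent HSDF graph has six nodes for three SDF actors, which hints at why HSDF exposes parallelism well but is not the most memory-efficient representation.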

Synthesis Techniques for Decidable Models

- Static scheduling: low overhead, predictability
- Performance analysis through synchronization graphs
- Loop scheduling
- Implicit repetition in the dataflow graph (through changes in sample rate) needs to be translated into explicit repetition in the form of loops on the execution target.
- Complex design space exists for such translation
- Complementary to procedural language techniques for nested loop compilation
- Loop scheduling techniques
- Simulation speedup (minimization of scheduling complexity)
- Code/data minimization
- Hierarchical parallel scheduling
- Block processing
- Task scheduling for latency/throughput optimization
- Probabilistic design: exploiting tolerances to deadline misses
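
To make the loop-scheduling trade-off concrete, the sketch below compares two valid schedules for a single SDF edge on which A produces 2 tokens and B consumes 3 per firing, so one iteration needs 3 firings of A and 2 of B. The helper `max_buffer` and the example rates are illustrative assumptions. The compact looped form (3A)(2B) needs only one code block per actor, while the interleaved order needs less buffer memory.

```python
def max_buffer(schedule, prod, cons):
    """Peak token count on the edge A -> B for a given firing sequence."""
    tokens = peak = 0
    for actor in schedule:
        if actor == "A":
            tokens += prod
        else:
            if tokens < cons:
                raise ValueError("B fired without enough input tokens")
            tokens -= cons
        peak = max(peak, tokens)
    if tokens != 0:
        raise ValueError("schedule is not a complete iteration")
    return peak

looped = "AAABB"        # (3A)(2B): two loops, one appearance of each actor
interleaved = "AABAB"   # fires B as soon as it is enabled

print(max_buffer(looped, 2, 3), max_buffer(interleaved, 2, 3))
```

Both schedules perform the same firings; they differ only in code size and buffer requirements, which is one axis of the design space mentioned above.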

Example: Intermediate representations for synthesis from decidable dataflow models

- Consider a decidable dataflow behavior that is to be implemented on a self-timed, embedded multiprocessor
- Natural way to implement DSP multiprocessors from decidable dataflow
- Actor assignment and ordering are performed statically
- Invocation (dispatch) of actors is performed dynamically, through synchronization
- Candidate mappings of the behavior onto the architecture can be represented through an intermediate representation that also has decidable dataflow semantics
- This representation is useful for understanding the performance, communication overhead, and synchronization structure associated with the candidate mapping
- Facilitates the separation of communication and synchronization functionality
- This is a useful modeling methodology for design space exploration

Interprocessor Communication Graph (Gipc)

Self-timed schedule and its IPC graph

The synchronization graph Gs

- Derived from the interprocessor communication graph
- Synchronization edges are distinguished from interprocessor communication (IPC) edges
- Synchronization edges represent precedence constraints that are enforced by synchronization protocols
- IPC edges represent data transfers
- Interprocessor connections
- Coincident synchronization and IPC edges → communication together with synchronization protocol (conventional approach)
- IPC edge only → communication without synchronization protocol
- Synchronization edge only → synchronization protocol only

Applications of Synchronization Graphs

- Simulation
- Throughput estimation through cycle mean analysis
- Removal of redundant synchronizations
- Resynchronization
- Conversion to more efficient synchronization protocols (strongly connected synchronization graphs)
- Statically determining and minimizing the sizes of interprocessor communication buffers

- All are post-processing methods that can be applied to improve a wide range of existing task graph scheduling techniques on a wide range of multiprocessor architectures.
- These techniques benefit from good execution time estimates, but do not depend on exact execution time values to deliver useful results.
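
Cycle mean analysis can be illustrated as follows. For a graph whose actors have execution time estimates and whose edges carry delays (initial tokens), the maximum over all cycles of (sum of execution times on the cycle) / (sum of delays on the cycle) bounds the iteration period, so its reciprocal gives a throughput estimate. The brute-force cycle enumeration below is a sketch for small graphs only, not the algorithm used in any particular tool; the function name, the integer execution times, and the example graph are assumptions.

```python
from fractions import Fraction

def max_cycle_mean(exec_time, edges):
    """Maximum cycle mean by enumerating simple cycles (small graphs only).

    exec_time: {actor: integer execution time estimate}
    edges:     list of (src, snk, delay)
    """
    succ = {}
    for src, snk, delay in edges:
        succ.setdefault(src, []).append((snk, delay))
    best = None

    def dfs(start, node, time_sum, delay_sum, visited):
        nonlocal best
        for nxt, delay in succ.get(node, []):
            if nxt == start:
                total_delay = delay_sum + delay
                if total_delay == 0:
                    raise ValueError("delay-free cycle: the graph deadlocks")
                mean = Fraction(time_sum, total_delay)
                if best is None or mean > best:
                    best = mean
            elif nxt not in visited and nxt > start:
                # Only visit actors ordered after `start`, so each simple
                # cycle is found once, from its smallest actor name.
                dfs(start, nxt, time_sum + exec_time[nxt],
                    delay_sum + delay, visited | {nxt})

    for start in sorted(exec_time):
        dfs(start, start, exec_time[start], 0, {start})
    return best

# Hypothetical synchronization graph with two cycles: A-B and B-C.
times = {"A": 2, "B": 3, "C": 1}
graph = [("A", "B", 0), ("B", "A", 1), ("B", "C", 0), ("C", "B", 2)]
mcm = max_cycle_mean(times, graph)
print("max cycle mean:", mcm, "estimated throughput:", 1 / mcm)
```

Here the A-B cycle (time 5, delay 1) dominates the B-C cycle (time 4, delay 2), so the estimate is one graph iteration every 5 time units. Because the result depends on sums of estimates, moderate errors in individual execution times shift the estimate without invalidating it.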

Beyond Decidable Models

- Limited expressive power: DSP applications increasingly employ high-level dynamics in their behavior
- User interface functionality
- Mode changes
- Adaptive algorithms
- Reconfiguration of processing resources/parameters

- However, key subsystems still exhibit large amounts of quasi-static structure, that is, structure that stays fixed across significant windows of time.
- Various dynamic dataflow models have been proposed that address the limitation above by abandoning most or all restrictions related to decidable dataflow
- However, these methods are correspondingly limited in their ability to exploit the quasi-static structure described above

Parameterized Dataflow: Structured Control of Dynamic Parameters

- The key discipline that is imposed on reconfiguration is that each subsystem must have a consistent view of each of its actors (hierarchical or primitive) throughout any given iteration of that subsystem.

Parameterized Dataflow

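- Hierarchical modeling
- A parameterized DF subsystem is composed of 3 parameterized DF graphs: init, subinit, body
- Subsystem parameters are configured in init/subinit and used in body
- Dynamically reconfigurable

[Figure: a parent graph containing a parameterized subsystem with parameters n, ...; the subsystem consists of the init, subinit, and body graphs. The annotation "writes n" appears next to the init/subinit graphs and "reads n" next to body.]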

Meta-modeling with parameterized dataflow

- Parameterized dataflow can be applied to any dataflow model of computation (base model) to augment that model with dynamic reconfiguration capabilities in a structured way
- Provides for efficient quasi-static scheduling
- Enables execution to be viewed in terms of a sequence of dataflow graphs in the base model
- Parameterized dataflow + XYZ → parameterized XYZ
- Examples of parameterized dataflow models of computation that we are developing and experimenting with
- parameterized synchronous dataflow (PSDF)
- parameterized cyclo-static dataflow (PCSDF)

Parameterized Synchronous Dataflow (PSDF)

- Local synchrony conditions can be formulated and checked in a quasi-static fashion to ensure that bounded token production and consumption along with bounded delays lead to bounded memory requirements overall.
- This is not true of unstructured dynamic dataflow models, such as general dynamic dataflow, Boolean dataflow, and bounded dynamic dataflow
- Techniques for construction of streamlined looped schedules for synchronous dataflow graphs have natural and efficient extensions to the construction of parameterized looped schedules for PSDF graphs.

PSDF Example: CD to DAT Conversion

initChild

    repeat 5 times {
        fire setFac    /* sets i1, d1, i2, d2, i3, d3, i4, d4 */
        int _g1 = gcd(i1, d2)
        int _g2 = gcd((i2 x i1)/_g1, d3)
        int _g3 = gcd((i3 x i2 x i1)/(_g2 x _g1), d4)
        repeat (d4/_g3) times {
            repeat (d3/_g2) times {
                repeat (d2/_g1) times {
                    repeat (d1) times { fire CD }
                    fire PF1
                }
                repeat (i1/_g1) times { fire PF2 }
            }
            repeat ((i2 x i1)/(_g2 x _g1)) times { fire PF3 }
        }
        repeat ((i3 x i2 x i1)/(_g3 x _g2 x _g1)) times {
            fire PF4
            repeat (i4) times { fire DAT }
        }
    }

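[Figure: PSDF specification of the CD to DAT converter. Subsystem params: i1, d1, ..., i4, d4. The init graph contains the actor setFac, which sets i1, ..., d4; a preamble is also shown. The body graph contains the actors CD, PF1, PF2, PF3, PF4, and DAT, with edge rates labeled 1, d1, i1, d2, i2, d3, i3, d4, i4, 1.]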
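
The parameterized looped schedule for the CD to DAT example can be exercised with a short simulation. The sketch below is an illustration under stated assumptions, not the authors' tool output: the loop nesting is inferred as the one that balances every edge, each PFk is assumed to consume dk and produce ik tokens per firing, and the factor values (2,1), (4,3), (4,7), (5,7) are a commonly used factorization of the 160/147 rate ratio between 48 kHz and 44.1 kHz that does not appear on the slide. The function names are hypothetical.

```python
from collections import Counter
from math import gcd

def cd_dat_schedule(i1, d1, i2, d2, i3, d3, i4, d4):
    """One iteration of the parameterized looped schedule, as a firing list."""
    g1 = gcd(i1, d2)
    g2 = gcd((i2 * i1) // g1, d3)
    g3 = gcd((i3 * i2 * i1) // (g2 * g1), d4)
    firings = []
    for _ in range(d4 // g3):
        for _ in range(d3 // g2):
            for _ in range(d2 // g1):
                firings += ["CD"] * d1 + ["PF1"]
            firings += ["PF2"] * (i1 // g1)
        firings += ["PF3"] * ((i2 * i1) // (g2 * g1))
    for _ in range((i3 * i2 * i1) // (g3 * g2 * g1)):
        firings += ["PF4"] + ["DAT"] * i4
    return firings

def simulate(firings, i1, d1, i2, d2, i3, d3, i4, d4):
    """Replay the firings on the chain and return the final buffer levels."""
    chain = ["CD", "PF1", "PF2", "PF3", "PF4", "DAT"]
    consume = dict(zip(chain, [0, d1, d2, d3, d4, 1]))
    produce = dict(zip(chain, [1, i1, i2, i3, i4, 0]))
    buf = [0] * 5                      # buf[k] sits between chain[k] and chain[k+1]
    for actor in firings:
        k = chain.index(actor)
        if k > 0:
            buf[k - 1] -= consume[actor]
            if buf[k - 1] < 0:
                raise ValueError(actor + " fired without enough tokens")
        if k < 5:
            buf[k] += produce[actor]
    return buf

params = (2, 1, 4, 3, 4, 7, 5, 7)      # assumed factor values, see lead-in
firings = cd_dat_schedule(*params)
counts = Counter(firings)
print(counts["CD"], "CD firings ->", counts["DAT"], "DAT firings")
print("final buffers:", simulate(firings, *params))
```

Because the loop bounds are computed from the parameters at run time, the same schedule text remains valid when setFac selects different factors, which is the point of a quasi-static schedule.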

PSDF Example: Speech Compression

PCSDF Version of Speech Compression

Homogeneous Parameterized Dataflow (HPDF)

- Parameterized dataflow model that can encapsulate the dynamicity of an application.
- Meta-modeling technique: hierarchical actors can have any other underlying dataflow model (SDF, CSDF, PSDF, etc.).
- Data production and consumption rates, though dynamic, are equal across an edge for a large number of applications; thus the name homogeneous.
- Reconfiguration can be performed without introducing hierarchy when it is more natural to do so (advantage over parameterized dataflow).
- Parameterized dataflow is a more powerful technique and thus can be used to represent a wider set of applications.

Applications

- Applications with dynamic run-time data and aggregated final-stage processes benefit especially from HPDF semantics over SDF semantics.
- Many applications in image and speech processing seem well suited for our model.
- We applied the model to two applications:
- A real-time video processing algorithm for a smart camera, developed at Princeton
- A face detection algorithm developed at CFAR labs at UMD

Application characteristics

- This structure seems to be abundant in many audio/video applications.
- Our HPDF model is a natural fit for applications with the above structure.

Gesture recognition algorithm

- Real-time video processing for gesture recognition.
- Does low-level (red oval) and high-level processing.
- Low-level processing recognizes body parts and identifies movements.
- High-level processing recognizes actions.
- We concentrate on low-level processing.

Ref: W. Wolf, B. Ozer, and T. Lv. Smart cameras as embedded systems. IEEE Computer Magazine, 35(9):48-53, September 2002.

HPDF model of gesture recognition algorithm

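[Figure: HPDF model of the gesture recognition algorithm. Two edges are annotated "Dynamic data" with rates n, n and p, p; the last actor is annotated "Aggregating final-stage".]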

Ptolemy II implementation

Modeling with HPDF/CSDF

[Figure: HPDF/CSDF model with the actors VIDEO INPUT, REGION EXTRACTION, CONTOUR FOLLOWING, ELLIPSE FITTING, and MATCH. Legend: s is the number of phases (pixels). Edge annotations as extracted: (s 1), (Xi, Yi), (I 0, I ki), (n 1), p (pi 1, qi 0). One annotation reads: p phases with 1 token and (n-p) phases with 0 token production.]

Integrating HPDF and CSDF

- Number of phases in a fundamental period can vary dynamically.
- Number of tokens produced or consumed in a given phase can also vary dynamically.
- HPDF constraint: the total number of tokens produced by a source actor of a given edge in a given invocation (a fundamental period) must equal the total number of tokens consumed by the sink in its corresponding invocation.
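
The HPDF constraint on a CSDF edge can be stated as a one-line check: per-phase token counts may differ and may change between invocations, but their totals over one fundamental period must agree. The function name and the phase lists below are hypothetical values chosen for illustration.

```python
def satisfies_hpdf(production_phases, consumption_phases):
    """True if one source period produces exactly what one sink period consumes."""
    return sum(production_phases) == sum(consumption_phases)

# Hypothetical invocation: the source finds two contours, emitting 0 tokens
# while scanning and then 3 and 2 tokens; the sink consumes one token in
# each of its 5 phases.
source = [0, 3, 0, 2]
sink = [1, 1, 1, 1, 1]
print(satisfies_hpdf(source, sink))

# A later invocation may use different phase counts, as long as totals match.
print(satisfies_hpdf([0, 4], [1, 1, 1, 1]))
```

The number of phases on each side is free to differ; only the totals are tied together.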

Finer granularity and Input modeling

- Each frame has 384x240 pixels, so we model the input as a CSDF actor with s = 92160 phases.
- The model captures the pixel-level parallelism present in Region.
- It also captures the frame-level parallelism through the number of phases in Input (s).

Modeling dynamicity - Contour

- 2 phases for Contour
- The first one scans until it finds a contour.
- Output: 0 tokens
- The second one follows this contour and all the overlapping ones.
- Output: ki tokens; each token is a list of pixels from a contour
- Homogeneous condition remains
- s

Scheduling

- VRCEM
- (s V)(s R)(2I C)(n E)M
- (s VR)(2I C)(n E)M
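
The two looped schedules above can be compared by expanding them into firing sequences. The sketch below assumes that V, R, C, E, and M stand for the Video input, Region, Contour, Ellipse, and Match actors, that "2I C" means 2 x I firings of C, and that each V firing produces one token that one R firing consumes; the values of s, I, and n are small hypothetical numbers, and the helper names are illustrative.

```python
def expand(schedule):
    """Flatten a looped schedule given as actor names or (count, body) pairs."""
    out = []
    for item in schedule:
        if isinstance(item, tuple):
            count, body = item
            out.extend(expand(body) * count)
        else:
            out.append(item)
    return out

def peak_tokens(firings, src, snk):
    """Peak occupancy of an edge with one token per firing on both ends."""
    tokens = peak = 0
    for actor in firings:
        if actor == src:
            tokens += 1
        elif actor == snk:
            tokens -= 1
        peak = max(peak, tokens)
    return peak

s, I, n = 4, 1, 3   # hypothetical values; s is 92160 in the application
split = [(s, ["V"]), (s, ["R"]), (2 * I, ["C"]), (n, ["E"]), "M"]
fused = [(s, ["V", "R"]), (2 * I, ["C"]), (n, ["E"]), "M"]

print("".join(expand(split)), peak_tokens(expand(split), "V", "R"))
print("".join(expand(fused)), peak_tokens(expand(fused), "V", "R"))
```

Both schedules fire every actor the same number of times; fusing the V and R loops reduces the V-to-R buffer requirement from s tokens to one under the stated assumptions.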

Results

- We also successfully applied HPDF to model a face detection algorithm.
- We developed a TI DSP implementation of the HPDF model of the gesture recognition algorithm.
- The application was run on a TMS320C64xx fixed-point processor.
- When implemented with our HPDF model, the runtime was 21405671 cycles.
- With a 40 ns cycle period, execution time for the application was 0.86 sec.

Results (contd.)

- Scheduling overhead was minimal because a highly streamlined quasi-static schedule was obtained.
- Worst-case buffer size was 642 Kb when the input images were 384x240 pixels. HPDF modeling suggested buffer reuse between the edges.
- The original C code had a runtime of 27741882 cycles; execution time was 1.11 sec with the same clock period of 40 ns.
- HPDF improved runtime by 23%.
- Efficient hardware code generation is being investigated using the hardware synthesis framework developed in our research group.
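
The reported execution times and the runtime improvement follow directly from the cycle counts and the 40 ns cycle period. The short check below uses only the figures quoted on the two Results slides; the variable and function names are illustrative.

```python
CYCLE_NS = 40                 # cycle period from the slides
hpdf_cycles = 21405671        # HPDF-based implementation
c_cycles = 27741882           # original C code

def seconds(cycles):
    """Execution time in seconds at the given cycle period."""
    return cycles * CYCLE_NS * 1e-9

improvement = (c_cycles - hpdf_cycles) / c_cycles

print(round(seconds(hpdf_cycles), 2), "s with HPDF")
print(round(seconds(c_cycles), 2), "s for the original C code")
print(round(100 * improvement), "% fewer cycles")
```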

Summary

- Dataflow-based models of computation are attractive for modeling the behavior of DSP applications
- Decidable dataflow models are useful for exposing and exploiting static structure in synthesis tools for DSP
- Decidable dataflow models in conjunction with structured reconfigurable techniques allow for efficient handling of application dynamics
- Examples of structured, reconfigurable dataflow techniques that we discussed
- Parameterized dataflow and its application to SDF
- Homogeneous-parameterized dataflow and its application to SDF and CSDF
- Experiments on a gesture recognition application
- Other examples include dynamic configuration of graph topologies, and blocked dataflow modeling.

References

- B. Bhattacharya and S. S. Bhattacharyya. Parameterized dataflow modeling for DSP systems. IEEE Transactions on Signal Processing, 49(10):2408-2421, October 2001.
- S. S. Bhattacharyya, R. Leupers, and P. Marwedel. Software synthesis and code generation for DSP. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 47(9):849-875, September 2000.
- G. Bilsen, M. Engels, R. Lauwereins, and J. A. Peperstraete. Cyclo-static dataflow. IEEE Transactions on Signal Processing, 44(2):397-408, February 1996.
- D. Ko and S. S. Bhattacharyya. Dynamic configuration of dataflow graph topology for DSP system design. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pages V-69-V-72, Philadelphia, Pennsylvania, March 2005.
- E. A. Lee and D. G. Messerschmitt. Static scheduling of synchronous dataflow programs for digital signal processing. IEEE Transactions on Computers, February 1987.
- S. Neuendorffer and E. Lee. Hierarchical reconfiguration of dataflow models. In Proceedings of the International Conference on Formal Methods and Models for Codesign, June 2004.
- M. Sen, S. S. Bhattacharyya, T. Lv, and W. Wolf. Modeling image processing systems with homogeneous parameterized dataflow graphs. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pages V-133-V-136, Philadelphia, Pennsylvania, March 2005.