Title: Design and Synthesis of Image Processing Systems using Reconfigurable Dataflow Graphs
1 Design and Synthesis of Image Processing Systems using Reconfigurable Dataflow Graphs
- Mainak Sen and Shuvra S. Bhattacharyya
- Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies, University of Maryland at College Park
- Maryland DSPCAD Research Group, http://www.ece.umd.edu/DSPCAD/home/dspcad.htm
- November 22, 2005, Leiden University, The Netherlands
2 Outline
- Dataflow-based model of computation for modeling the behavior of DSP applications
  - Decidable dataflow models
- Example use of decidable dataflow as a model of computation for modeling the mapping of (decidable) dataflow behaviors onto embedded multiprocessors
- Structured reconfiguration of dataflow graphs
- Examples of meta-modeling techniques that can be classified as structured, reconfigurable dataflow
  - Parameterized dataflow and its application to SDF
  - Homogeneous-parameterized dataflow and its application to SDF and CSDF
- Experiments on a gesture recognition application
- Summary
3 Dataflow-based design for DSP (Example from Agilent ADS tool)
4 DSP-oriented Dataflow Models of Computation
- Used widely in design tools for DSP
- Application is modeled as a directed graph
  - Nodes (actors) represent functions
  - Edges represent communication channels between functions
  - Nodes produce and consume data from edges
  - Edges buffer data in FIFO (first-in first-out) fashion
- Data-driven execution model
  - A node can execute whenever it has sufficient data on its input edges
  - The order in which nodes execute is not part of the specification
  - The order is typically determined by the compiler, the hardware, or both
- Iterative execution
  - Body of loop to be iterated a large or infinite number of times
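The data-driven firing rule above can be illustrated with a minimal sketch. The `Actor` class, the `run` helper, and the two example actors are hypothetical names introduced only for illustration; they are not part of any tool mentioned in this talk.

```python
from collections import deque


class Actor:
    """A dataflow node that consumes tokens from input FIFOs and produces tokens."""

    def __init__(self, name, fn, inputs=(), outputs=()):
        self.name = name
        self.fn = fn                  # maps input token lists to a list of output token lists
        self.inputs = list(inputs)    # list of (fifo, tokens_needed) pairs
        self.outputs = list(outputs)  # list of output fifos

    def can_fire(self):
        # Firing rule: enough tokens must be present on every input edge.
        return all(len(fifo) >= need for fifo, need in self.inputs)

    def fire(self):
        args = [[fifo.popleft() for _ in range(need)] for fifo, need in self.inputs]
        for fifo, tokens in zip(self.outputs, self.fn(*args)):
            fifo.extend(tokens)


def run(actors, max_firings):
    """Fire any enabled actor until none is enabled or the budget is spent."""
    trace = []
    while len(trace) < max_firings:
        ready = [a for a in actors if a.can_fire()]
        if not ready:
            break
        ready[0].fire()
        trace.append(ready[0].name)
    return trace


# Two edges (FIFOs); the first is preloaded with four input samples.
e1, e2 = deque([1, 2, 3, 4]), deque()
out = []
pair_sum = Actor("pair_sum", lambda xs: [[sum(xs)]], [(e1, 2)], [e2])
sink = Actor("sink", lambda xs: out.extend(xs) or [], [(e2, 1)])

# The listing order of actors does not fix the execution order; data availability does.
trace = run([sink, pair_sum], max_firings=10)
```

Here `pair_sum` consumes two tokens per firing and `sink` consumes one, so the firing order emerges from token availability rather than from the order in which the actors are listed.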
5 Dataflow Features and Advantages
- Exposes coarse-grain parallelism.
- Exposes high-level structure that facilitates analysis, verification, and optimization.
- Captures multi-rate behavior.
- Complementary to ongoing advances in DSP compiler technology for procedural languages, such as C and MATLAB.
- Encourages desirable software engineering practices: modularity and code reuse
- Amenable also to aspect-oriented design.
- Intuitive to DSP algorithm designers: signal flow graphs.
6 Evolution of Dataflow Models for DSP
- Synchronous dataflow: static multirate behavior
  - Agilent ADS, Cadence SPW, etc.
- Well-behaved dataflow: schemas for bounded dynamics
- Boolean/integer dataflow: Turing complete models
- Multidimensional synchronous dataflow: image and video
- Scalable synchronous dataflow: block processing
  - Synopsys COSSAP
- Cyclo-static dataflow: phased behavior
  - Synopsys El Greco, Eonic Systems Virtuoso Synchro, System Canvas
- Bounded dynamic dataflow: bounded dynamics
- The processing graph method: reconfigurable dynamic DF
  - US Naval Research Laboratory, MCCI Autocoding Toolset
- Parameterized dataflow: dynamically-reconfigurable static DF
- Blocked dataflow: image and video in terms of reconfigurable dataflow
7 Modeling Design Space
(Third dimension: simplicity and intuitive appeal)
8 Decidable Dataflow Models
- Modeling flow for representing static flowgraph behavior
  - Cyclo-static dataflow (CSDF), multiphase modeling ->
  - Synchronous dataflow (SDF), multirate modeling ->
  - Homogeneous synchronous dataflow (HSDF) ->
  - Acyclic homogeneous synchronous dataflow (task graphs)
- These are in decreasing order of generality
- Designs represented in the more general models can be converted to equivalent representations in the less general ones
  - e.g., CSDF -> SDF -> HSDF -> task graph
- HSDF: each actor (graph node) produces/consumes exactly one data value to/from each incident output/input edge
  - Suitable for exposing parallelism
  - Not the best model for minimizing memory requirements
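As a small illustration of why SDF is decidable, the sketch below solves the balance equations of a connected SDF graph to obtain its repetitions vector, that is, the minimal number of firings of each actor per iteration. The function name and the three-actor example graph are assumptions made for illustration. In the equivalent HSDF graph, each actor appears once per entry in its repetition count, which is one reason HSDF is not the best model for minimizing memory.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetitions_vector(actors, edges):
    """Solve q[src] * produced == q[dst] * consumed for every edge.

    edges: list of (src, produced, dst, consumed) tuples.
    Assumes a connected graph; raises ValueError otherwise or if rates are inconsistent.
    """
    q = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, produced, dst, consumed in edges:
            if src in q and dst not in q:
                q[dst] = q[src] * produced / consumed
                changed = True
            elif dst in q and src not in q:
                q[src] = q[dst] * consumed / produced
                changed = True
            elif src in q and dst in q and q[src] * produced != q[dst] * consumed:
                raise ValueError("inconsistent sample rates: no valid schedule")
    if len(q) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational solution to the smallest positive integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (f.denominator for f in q.values()), 1)
    counts = {a: int(f * scale) for a, f in q.items()}
    common = reduce(gcd, counts.values())
    return {a: n // common for a, n in counts.items()}


# Hypothetical chain: A produces 2 / B consumes 3, B produces 1 / C consumes 2.
q = repetitions_vector(["A", "B", "C"], [("A", 2, "B", 3), ("B", 1, "C", 2)])
hsdf_nodes = sum(q.values())  # number of actor invocations in the HSDF expansion
```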
9 Synthesis Techniques for Decidable Models
- Static scheduling: low overhead, predictability
- Performance analysis through synchronization graphs
- Loop scheduling
  - Implicit repetition in the dataflow graph (through changes in sample rate) needs to be translated into explicit repetition in the form of loops on the execution target.
  - Complex design space exists for such translation
  - Complementary to procedural language techniques for nested loop compilation
- Loop scheduling techniques
  - Simulation speedup (minimization of scheduling complexity)
  - Code/data minimization
  - Hierarchical parallel scheduling
- Block processing
- Task scheduling for latency/throughput optimization
- Probabilistic design: exploiting tolerances to deadline misses
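The code/data trade-off in loop scheduling can be sketched as follows. Given an assumed repetitions vector for a hypothetical three-actor chain, the sketch prints a flat looped schedule (each actor appears once, which keeps code size small) and compares its peak buffer occupancy with that of an interleaved firing order. All names and rates are illustrative assumptions.

```python
def flat_looped_schedule(order, q):
    """Single-appearance schedule in looped notation, e.g. (3 A)(2 B)(1 C)."""
    return "".join(f"({q[a]} {a})" for a in order)


def peak_buffers(firings, edges):
    """Simulate a firing sequence and return the peak token count on each edge.

    edges: list of (src, produced, dst, consumed); no initial tokens are assumed.
    """
    tokens = {(s, d): 0 for s, _, d, _ in edges}
    peak = dict(tokens)
    for actor in firings:
        for s, _, d, consumed in edges:
            if d == actor:
                if tokens[(s, d)] < consumed:
                    raise ValueError(f"{actor} fired without sufficient input data")
                tokens[(s, d)] -= consumed
        for s, produced, d, _ in edges:
            if s == actor:
                tokens[(s, d)] += produced
                peak[(s, d)] = max(peak[(s, d)], tokens[(s, d)])
    return peak


# Hypothetical chain: A -(2,3)-> B -(1,2)-> C, with repetitions vector (3, 2, 1).
edges = [("A", 2, "B", 3), ("B", 1, "C", 2)]
q = {"A": 3, "B": 2, "C": 1}

schedule = flat_looped_schedule(["A", "B", "C"], q)
flat_peak = peak_buffers(list("AAABBC"), edges)         # expansion of the flat schedule
interleaved_peak = peak_buffers(list("AABABC"), edges)  # same firing counts, interleaved
```

The interleaved order needs less buffer space on the first edge but cannot be written with a single appearance of each actor, which hints at the design space referred to above.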
10 Example: Intermediate representations for synthesis from decidable dataflow models
- Consider a decidable dataflow behavior that is to be implemented on a self-timed, embedded multiprocessor
  - Natural way to implement DSP multiprocessors from decidable dataflow
  - Actor assignment and ordering are performed statically
  - Invocation (dispatch) of actors is performed dynamically, through synchronization
- Candidate mappings of the behavior onto the architecture can be represented through an intermediate representation that also has decidable dataflow semantics
- This representation is useful for understanding the performance, communication overhead, and synchronization structure associated with the candidate mapping
- Facilitates the separation of communication and synchronization functionality
- This is a useful modeling methodology for design space exploration
11 Interprocessor Communication Graph (Gipc)
Self-timed schedule and its IPC graph
12 The synchronization graph Gs
- Derived from the interprocessor communication graph
- Synchronization edges are distinguished from interprocessor communication (IPC) edges
  - Synchronization edges represent precedence constraints that are enforced by synchronization protocols
  - IPC edges represent data transfers
- Interprocessor connections
  - Coincident synchronization and IPC edges -> communication together with synchronization protocol (conventional approach)
  - IPC edge only -> communication without synch. protocol
  - Synchronization edge only -> synchronization protocol only
13 Applications of Synchronization Graphs
- Simulation
- Throughput estimation through cycle mean analysis
- Removal of redundant synchronizations
- Resynchronization
- Conversion to more efficient synchronization protocols (strongly connected synchronization graphs)
- Statically determining and minimizing the sizes of interprocessor communication buffers
- All are post-processing methods that can be applied to improve a wide range of existing task graph scheduling techniques on a wide range of multiprocessor architectures.
- These techniques benefit from good execution time estimates, but do not depend on exact execution time values to deliver useful results.
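Throughput estimation through cycle mean analysis can be sketched as below. For a strongly connected HSDF graph, the estimated iteration period is the maximum cycle mean: the largest ratio, over all cycles, of the total actor execution time on the cycle to the total number of delays (initial tokens) on it. The brute-force cycle enumeration, the function name, and the example graph are assumptions chosen for clarity; this approach is practical only for small graphs, and integer execution-time estimates are assumed.

```python
from fractions import Fraction


def max_cycle_mean(exec_time, edges):
    """Return max over simple cycles of sum(execution times) / sum(delays).

    exec_time: {actor: integer time estimate}; edges: list of (src, dst, delay).
    """
    nodes = sorted(exec_time)
    index = {n: i for i, n in enumerate(nodes)}
    succ = {n: [] for n in nodes}
    for src, dst, delay in edges:
        succ[src].append((dst, delay))
    best = None

    def dfs(start, node, time, delay, on_path):
        nonlocal best
        for nxt, d in succ[node]:
            if nxt == start:
                if delay + d == 0:
                    raise ValueError("delay-free cycle: the graph deadlocks")
                mean = Fraction(time, delay + d)
                if best is None or mean > best:
                    best = mean
            elif index[nxt] > index[start] and nxt not in on_path:
                # Only visit higher-indexed nodes so each cycle is found once.
                dfs(start, nxt, time + exec_time[nxt], delay + d, on_path | {nxt})

    for start in nodes:
        dfs(start, start, exec_time[start], 0, {start})
    return best


# Hypothetical synchronization graph with three actors.
times = {"A": 2, "B": 3, "C": 4}
graph = [("A", "B", 0), ("B", "A", 1), ("B", "C", 0), ("C", "A", 2), ("C", "C", 1)]
mcm = max_cycle_mean(times, graph)
throughput = 1 / mcm  # estimated graph iterations per time unit
```

In this example the cycle A, B dominates with a mean of 5 time units per iteration, so the estimated throughput is one iteration every 5 time units.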
14 Beyond Decidable Models
- Limited expressive power: DSP applications increasingly employ high-level dynamics in their behavior
  - User interface functionality
  - Mode changes
  - Adaptive algorithms
  - Reconfiguration of processing resources/parameters
- However, key subsystems still exhibit large amounts of quasi-static structure: structure that stays fixed across significant windows of time.
- Various dynamic dataflow models have been proposed that address the limitation above by abandoning most or all restrictions related to decidable dataflow
- However, these methods are correspondingly limited in their ability to exploit the quasi-static structure described above
15 Parameterized Dataflow: Structured Control of Dynamic Parameters
- The key discipline that is imposed on reconfiguration is that each subsystem must have a consistent view of each of its actors (hierarchical or primitive) throughout any given iteration of that subsystem.
16 Parameterized Dataflow
- Parameterized DF subsystem is composed of 3 parameterized DF graphs: init, subinit, body
- Subsystem parameters
  - configured in init/subinit, used in body
- Dynamically reconfigurable
[Figure: parent graph containing a subsystem with parameters n, ...; the init and subinit graphs write n, and the body graph reads n.]
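The division of roles between configuration and body can be sketched as follows. This is a simplified illustration that collapses init and subinit into a single configuration step per subsystem invocation; `run_subsystem`, `subinit`, and `body` are hypothetical names, and the block-averaging behavior is an assumed example rather than anything from the talk.

```python
from types import MappingProxyType


def run_subsystem(subinit, body, control_tokens, data):
    """Invoke a parameterized subsystem once per control token.

    subinit(ctrl) returns the parameter settings for one invocation;
    body(params, block) sees a read-only view, so n stays fixed while it runs.
    """
    outputs, pos = [], 0
    for ctrl in control_tokens:
        params = MappingProxyType(subinit(ctrl))  # configured here, only read in body
        n = params["n"]
        block, pos = data[pos:pos + n], pos + n   # body consumes n tokens this invocation
        outputs.append(body(params, block))
    return outputs


def subinit(ctrl):
    # Writes n: the block length for the coming invocation.
    return {"n": ctrl}


def body(params, block):
    # Reads n: averages a block of n samples.
    return sum(block) / params["n"]


results = run_subsystem(subinit, body, control_tokens=[2, 3], data=[1, 3, 2, 4, 6])
```

Because the body receives a read-only mapping, any attempt to change `n` during an invocation fails, which mirrors the requirement that parameters stay consistent throughout an iteration of the subsystem.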
17 Meta-modeling with parameterized dataflow
- Parameterized dataflow can be applied to any dataflow model of computation (base model) to augment that model with dynamic reconfiguration capabilities in a structured way
  - Provides for efficient quasi-static scheduling
  - Enables execution to be viewed in terms of a sequence of dataflow graphs in the base model
- Parameterized dataflow + XYZ -> parameterized XYZ
- Examples of parameterized dataflow models of computation that we are developing and experimenting with
  - parameterized synchronous dataflow (PSDF)
  - parameterized cyclo-static dataflow (PCSDF)
18 Parameterized Synchronous Dataflow (PSDF)
- Local synchrony conditions can be formulated and checked in a quasi-static fashion to ensure that bounded token production and consumption along with bounded delays lead to bounded memory requirements overall.
- This is not true of unstructured dynamic dataflow models, such as general dynamic dataflow, boolean dataflow, and bounded dynamic dataflow
- Techniques for construction of streamlined looped schedules for synchronous dataflow graphs have natural and efficient extensions to the construction of parameterized looped schedules for PSDF graphs.
19 PSDF Example: CD to DAT Conversion
initChild
repeat 5 times {
  fire setFac  /* sets i1, d1, i2, d2, i3, d3, i4, d4 */
  int _g1 = gcd(i1, d2)
  int _g2 = gcd((i2 x i1)/_g1, d3)
  int _g3 = gcd((i3 x i2 x i1)/(_g2 x _g1), d4)
  repeat (d4/_g3) times {
    repeat (d3/_g2) times {
      repeat (d2/_g1) times {
        repeat (d1) times { fire CD }
        fire PF1
      }
      repeat (i1/_g1) times { fire PF2 }
    }
    repeat ((i2 x i1)/(_g2 x _g1)) times { fire PF3 }
  }
  repeat ((i3 x i2 x i1)/(_g3 x _g2 x _g1)) times {
    fire PF4
    repeat (i4) times { fire DAT }
  }
}
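[Figure: PSDF specification of the CD to DAT conversion, with init, preamble, and body parts. Subsystem params: i1, d1, ..., i4, d4. init: setFac (sets i1, ..., d4). body: actors CD, PF1, PF2, PF3, PF4, DAT, with edge rate labels 1, d1, i1, d2, i2, d3, i3, d4, i4, 1.]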
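The loop bounds of the parameterized schedule on this slide can be checked numerically. The sketch below assumes one reading of the flattened slide text: each PFk consumes dk tokens and produces ik tokens per firing, CD produces one token, and DAT consumes one token. The function name is hypothetical, and the numeric parameter values are the commonly used fixed-rate CD to DAT factors, supplied here as an assumption rather than taken from the slide.

```python
from math import gcd


def cd_dat_firings(i1, d1, i2, d2, i3, d3, i4, d4):
    """Total firings per schedule iteration implied by the nested repeat counts."""
    g1 = gcd(i1, d2)
    g2 = gcd((i2 * i1) // g1, d3)
    g3 = gcd((i3 * i2 * i1) // (g2 * g1), d4)
    outer3, outer2, outer1 = d4 // g3, d3 // g2, d2 // g1
    pf1 = outer3 * outer2 * outer1
    pf4 = (i3 * i2 * i1) // (g3 * g2 * g1)
    return {
        "CD": pf1 * d1,
        "PF1": pf1,
        "PF2": outer3 * outer2 * (i1 // g1),
        "PF3": outer3 * ((i2 * i1) // (g2 * g1)),
        "PF4": pf4,
        "DAT": pf4 * i4,
    }


def is_balanced(f, i1, d1, i2, d2, i3, d3, i4, d4):
    """Check that tokens produced equal tokens consumed on every edge of the chain."""
    return (f["CD"] == f["PF1"] * d1
            and f["PF1"] * i1 == f["PF2"] * d2
            and f["PF2"] * i2 == f["PF3"] * d3
            and f["PF3"] * i3 == f["PF4"] * d4
            and f["PF4"] * i4 == f["DAT"])


# Assumed parameter values: interpolation/decimation factors 2/1, 2/3, 8/7, 5/7.
params = dict(i1=2, d1=1, i2=2, d2=3, i3=8, d3=7, i4=5, d4=7)
firings = cd_dat_firings(**params)
balanced = is_balanced(firings, **params)
```

With these values the firing counts reduce to the familiar fixed-rate case, while other settings of the parameters yield different but still balanced counts without changing the schedule's structure.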
20 PSDF Example: Speech Compression
21 PCSDF Version of Speech Compression
23 Homogeneous Parameterized Dataflow (HPDF)
- Parameterized dataflow model that can encapsulate the dynamicity of an application.
- Meta-modeling technique: hierarchical actors can have any other underlying dataflow model (SDF, CSDF, PSDF, etc.)
- Data production and consumption rates, though dynamic, are equal across an edge for a large number of applications; thus the name homogeneous.
- Reconfiguration can be performed without introducing hierarchy when more natural to do so (advantage over parameterized dataflow).
- Parameterized dataflow is a more powerful technique and thus can be used to represent a wider set of applications.
24 Applications
- Applications with dynamic run-time data and aggregated final-stage processes perform especially well for HPDF over SDF semantics.
- Many applications in image and speech processing seem well suited for our model.
- We applied the model to two applications
  - A real-time video processing algorithm for a smart camera developed at Princeton
  - A face detection algorithm developed at CFAR labs in UMD.
25 Application characteristics
- This structure seems to be abundant in many audio/video applications.
- Our HPDF model is a natural fit for applications with the above structure.
26 Gesture recognition algorithm
- Real-time video processing for gesture recognition.
- Does low-level (red oval) and high-level processing.
- Low-level processing recognizes body parts and identifies movements.
- High-level processing recognizes actions.
- We concentrate on low-level processing.
Ref: W. Wolf, B. Ozer, T. Lv. Smart cameras as embedded systems. IEEE Computer Magazine, 35(9):48-53, September 2002.
27 HPDF model of gesture recognition algorithm
[Figure: Ptolemy II implementation. Annotations: dynamic data (on two edges, with rate labels n, n and p, p) and aggregating final stage.]
28 Modeling with HPDF/CSDF
[Figure: HPDF/CSDF model with actors VIDEO INPUT, REGION EXTRACTION, CONTOUR FOLLOWING, ELLIPSE FITTING, and MATCH. Annotations: phases = pixels = s; rate labels (s 1), (Xi, Yi), (I 0, I ki), (n 1), and p (pi 1, qi 0); p phases with 1 token and (n-p) phases with 0 token production.]
29 Integrating HPDF and CSDF
- Number of phases in a fundamental period can vary dynamically.
- Number of tokens produced or consumed in a given phase can also vary dynamically.
- HPDF constraint: the total number of tokens produced by a source actor of a given edge in a given invocation (a fundamental period) must equal the total number of tokens consumed by the sink in its corresponding invocation.
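The HPDF constraint on a CSDF edge can be written as a one-line check. In this sketch each list holds the tokens produced or consumed in the successive phases of one fundamental period; the function name and the phase patterns are hypothetical values chosen for illustration.

```python
def satisfies_hpdf_constraint(produced_per_phase, consumed_per_phase):
    """True if total production over the source's fundamental period equals
    total consumption over the sink's corresponding period."""
    return sum(produced_per_phase) == sum(consumed_per_phase)


# Source with n = 5 phases, of which p = 2 produce one token and n - p = 3 produce none.
source_phases = [1, 0, 0, 1, 0]

# Sink that consumes all p tokens in a single phase: homogeneous across the edge.
ok = satisfies_hpdf_constraint(source_phases, [2])

# A sink expecting 3 tokens per invocation would violate the constraint.
violated = not satisfies_hpdf_constraint(source_phases, [3])
```

The number of phases and the per-phase counts may change from one invocation to the next; only the totals per invocation have to agree.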
30 Finer granularity and Input modeling
- Each frame has 384x240 pixels, so we model the input as a CSDF actor with s = 92160 phases.
- Model captures pixel-level parallelism present in Region.
- It also captures the frame-level parallelism through the number of phases in Input (s).
31 Modeling dynamicity: Contour
- 2 phases for Contour
  - First one scans until it finds a contour.
    - Output: 0 tokens
  - Second one follows this contour and all the overlapping ones.
    - Output: ki tokens; each token is a list of pixels from a contour
- Homogeneous condition remains
- s
32 Scheduling
- VRCEM
- (s V)(s R)(2I C)(n E)M
- (s VR)(2I C)(n E)M
33 Results
- We also applied HPDF successfully to model a face detection algorithm.
- We developed a TI DSP implementation of the HPDF model of the gesture recognition algorithm.
- The application was run on a TMS320C64xx fixed-point processor.
- When implemented with our HPDF model, the runtime was 21405671 cycles.
- With a 40 ns cycle period, execution time for the application was 0.86 sec.
34 Results (contd.)
- Scheduling overhead was minimal, as a highly streamlined quasi-static schedule was obtained.
- Worst-case buffer size: 642 Kb when the input images were 384x240 pixels. HPDF modeling suggested buffer reuse between the edges.
- Original C code had a runtime of 27741882 cycles; execution time was 1.11 sec with the same clock period of 40 ns.
- HPDF improved runtime by 23%.
- Efficient hardware code generation is being investigated using a hardware synthesis framework developed in our research group.
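The reported execution times and the relative improvement follow directly from the cycle counts and the 40 ns cycle period. The short calculation below reproduces them; the variable names are illustrative.

```python
CYCLE_PERIOD_NS = 40

hpdf_cycles = 21405671      # HPDF-based implementation
original_cycles = 27741882  # original C code

# Execution time in seconds = cycles * cycle period.
hpdf_seconds = hpdf_cycles * CYCLE_PERIOD_NS * 1e-9
original_seconds = original_cycles * CYCLE_PERIOD_NS * 1e-9

# Relative runtime reduction with respect to the original C code.
improvement_percent = 100 * (original_cycles - hpdf_cycles) / original_cycles

print(f"HPDF: {hpdf_seconds:.2f} s, original: {original_seconds:.2f} s, "
      f"improvement: {improvement_percent:.0f}%")
```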
35 Summary
- Dataflow-based model of computation is attractive for modeling the behavior of DSP applications
- Decidable dataflow models are useful for exposing and exploiting static structure in synthesis tools for DSP
- Decidable dataflow models in conjunction with structured reconfigurable techniques allow for efficient handling of application dynamics
- Examples of structured, reconfigurable dataflow techniques that we discussed
  - Parameterized dataflow and its application to SDF
  - Homogeneous-parameterized dataflow and its application to SDF and CSDF
  - Experiments on a gesture recognition application
- Other examples include dynamic configuration of graph topologies, and blocked dataflow modeling.
36 References
- B. Bhattacharya and S. S. Bhattacharyya. Parameterized dataflow modeling for DSP systems. IEEE Transactions on Signal Processing, 49(10):2408-2421, October 2001.
- S. S. Bhattacharyya, R. Leupers, and P. Marwedel. Software synthesis and code generation for DSP. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 47(9):849-875, September 2000.
- G. Bilsen, M. Engels, R. Lauwereins, and J. A. Peperstraete. Cyclo-static dataflow. IEEE Transactions on Signal Processing, 44(2):397-408, February 1996.
- D. Ko and S. S. Bhattacharyya. Dynamic configuration of dataflow graph topology for DSP system design. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pages V-69-V-72, Philadelphia, Pennsylvania, March 2005.
- E. A. Lee and D. G. Messerschmitt. Static scheduling of synchronous dataflow programs for digital signal processing. IEEE Transactions on Computers, February 1987.
- S. Neuendorffer and E. Lee. Hierarchical reconfiguration of dataflow models. In Proceedings of the International Conference on Formal Methods and Models for Codesign, June 2004.
- M. Sen, S. S. Bhattacharyya, T. Lv, and W. Wolf. Modeling image processing systems with homogeneous parameterized dataflow graphs. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pages V-133-V-136, Philadelphia, Pennsylvania, March 2005.