Title: A systematic approach to exploring embedded system architectures at multiple abstraction levels
A.D. Pimentel, C. Erbas, S. Polstra
Informatics Institute, University of Amsterdam, Netherlands
IEEE Transactions on Computers, Vol. 55, No. 2, pp. 99-112, Feb. 2006
Presenter: Fu-Ching Yang
2 Outline
- Abstract
- Introduction
- Related work
- System-level design space exploration
- Application modeling
- Architecture modeling
- Issues of mapping
- Architecture model refinement through trace transformation
- Case study: Motion-JPEG
- Conclusion
3 Abstract
- The sheer complexity of today's embedded systems
forces designers to start with modeling and
simulating system components and their
interactions in the very early design stages. It
is therefore imperative to have good tools for
exploring a wide range of design choices,
especially during the early design stages, where
the design space is at its largest. This paper
presents an overview of the Sesame framework,
which provides high-level modeling and simulation
methods and tools for system-level performance
evaluation and exploration of heterogeneous
embedded systems. More specifically, we describe
Sesame's modeling methodology and trajectory. It
takes a designer systematically along the path
from selecting candidate architectures, using
analytical modeling and multiobjective
optimization, to simulating these candidate
architectures with our system-level simulation
environment. This simulation environment
subsequently allows for architectural exploration
at different levels of abstraction while
maintaining high-level and architecture-independent
application specifications. We illustrate all
these aspects using a case study in which we
traverse Sesame's exploration trajectory for a
motion-JPEG encoder application.
4 Introduction
- Background
- SoC-based embedded systems often have a heterogeneous system architecture
- Programmable processors and dedicated hardware blocks
- What's the problem?
- Traditional design methods fall short for the design of these systems
- They cannot deal with the systems' complexity and flexibility
- Solution
- System-level design
- Modeling and simulating system components and their interactions in the early design stages
- Minimize the modeling effort and optimize simulation speed
5 Overview of the Sesame system
- Basic principles
- Platform architectures
- Separation of concerns
- High-level modeling and simulation
- Work to do
- Selecting candidate architectures
- Analytical modeling
- Multiobjective optimization
6 Related work (cont.)
- There are several architecture exploration environments, such as (Metro)Polis, Mescal, MESH, and Milan
- They map a behavioral application specification onto an architecture specification
- What does Sesame improve?
- It separates the modeling of application and architecture more strictly, using
- architecture-independent application models
- application-independent architecture models
- a mapping step that relates these models for trace-driven cosimulation
7 Related work
- In the domain of hardware/software codesign of embedded systems, multiobjective optimization studies have been performed extensively for system-level synthesis (e.g., [20], [21], [22]) and platform configuration
- In this paper, this is not the authors' primary consideration
8 System-level design space exploration
- Y-chart design methodology
- Separate application models and architecture (performance) models
- An explicit mapping step maps application tasks onto architecture resources
- Application model
- Describes the functional behavior of an application in a timing- and architecture-independent manner
- Architecture model
- Defines architecture resources and captures their performance constraints
- Essential in this methodology is that the application model is independent of the architecture model
- Application models can therefore be reused in the exploration cycle
- Architecture performance models can be refined gradually
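A minimal sketch of the Y-chart separation, assuming a toy workload unit per task and a single speed parameter per processor (both hypothetical, not Sesame's actual model parameters). The application model never mentions processors, the architecture model never mentions tasks, and only the mapping relates them, so the same application model is reused for every candidate mapping.

```python
from dataclasses import dataclass

# Hypothetical Y-chart sketch: application model, architecture model, and
# mapping are three separate inputs; only the mapping relates the other two.

@dataclass(frozen=True)
class Task:
    name: str
    work: int   # abstract workload units (assumed), independent of any processor

@dataclass(frozen=True)
class Processor:
    name: str
    speed: int  # workload units per cycle (assumed performance parameter)

def evaluate(tasks, processors, mapping):
    """Estimate busy cycles per processor for one mapping."""
    by_name = {p.name: p for p in processors}
    busy = {p.name: 0 for p in processors}
    for t in tasks:
        proc = by_name[mapping[t.name]]
        busy[proc.name] += -(-t.work // proc.speed)  # ceiling division
    return busy

app = [Task("A", 120), Task("B", 40), Task("C", 20)]
arch = [Processor("P0", 1), Processor("P1", 4)]

# The unchanged application model is reused for two candidate mappings.
m1 = {"A": "P0", "B": "P1", "C": "P1"}
m2 = {"A": "P1", "B": "P0", "C": "P0"}
print(evaluate(app, arch, m1))  # {'P0': 120, 'P1': 15}
print(evaluate(app, arch, m2))  # {'P0': 60, 'P1': 30}
```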
9 Application modeling (cont.)
- Uses the Kahn Process Network (KPN) model
- This model is natural for describing the streams of data samples in a signal processing system
- Producers are never blocked, because the channel queues are of infinite length
- Consumers are blocked when they attempt to read data from an empty input channel
- This model is determinate
- The results of the computation do not depend on the firing order of the processes
- Nodes: the processes of the application
- Directed edges: the channels between processes
[Figure: example KPN with processes A, B, and C]
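The KPN semantics above can be sketched with Python threads, using an unbounded `queue.Queue` per channel so that writes never block while `get()` blocks on an empty channel. The A→B, A→C, B→C topology and the arithmetic inside each process are assumptions made for illustration. Because every process only performs blocking reads, the result is the same for any thread interleaving, which is the determinacy property.

```python
import queue
import threading

# Minimal KPN sketch: unbounded FIFO channels, blocking reads, non-blocking writes.

def process_a(out_ab, out_ac, n):
    for i in range(n):
        out_ab.put(i)          # write never blocks (unbounded queue)
        out_ac.put(i * 10)

def process_b(in_ab, out_bc, n):
    for _ in range(n):
        out_bc.put(in_ab.get() + 1)   # blocking read from channel A->B

def process_c(in_ac, in_bc, n, result):
    for _ in range(n):
        result.append(in_ac.get() + in_bc.get())

def run(n=5):
    ab, ac, bc = queue.Queue(), queue.Queue(), queue.Queue()
    result = []
    threads = [
        threading.Thread(target=process_a, args=(ab, ac, n)),
        threading.Thread(target=process_b, args=(ab, bc, n)),
        threading.Thread(target=process_c, args=(ac, bc, n, result)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

print(run())  # [1, 12, 23, 34, 45] on every run, whatever the scheduling
```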
10 Application modeling
- Application models are described in YML (Y-chart Modeling Language)
- For rapid creation and modification
- Each node has two characteristics
- Computational requirement
- The workload imposed by the node onto a particular component in the architecture model
- This is done by instrumenting the code with annotations that describe the application's computational and communication actions
- The annotations generate traces of application events for the architecture model
- Coarse-grained events: read, write, and execute
- Allele set
- The processors that the node can be mapped onto
[Figure: example KPN with processes A, B, and C]
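As a sketch of the instrumentation idea, the toy Kahn process below performs its real computation and, through annotations, appends coarse-grained R(ead), E(xecute), and W(rite) events to a trace. The `Event` record and the `scale_process` name are hypothetical and do not reflect Sesame's actual annotation API.

```python
from collections import namedtuple

# Hypothetical trace-event record: kind is R(ead), E(xecute), or W(rite);
# arg names the channel or operation. Illustrative only, not Sesame's API.
Event = namedtuple("Event", ["kind", "arg"])

def scale_process(blocks, trace):
    """Toy Kahn process whose code is annotated to emit trace events."""
    out = []
    for blk in blocks:
        trace.append(Event("R", "in"))      # annotation: read from channel "in"
        result = [x * 2 for x in blk]       # the actual functional computation
        trace.append(Event("E", "scale"))   # annotation: execute operation "scale"
        out.append(result)                  # stands in for writing to channel "out"
        trace.append(Event("W", "out"))     # annotation: write to channel "out"
    return out

trace = []
blocks_out = scale_process([[1, 2], [3, 4]], trace)
print(blocks_out)                # [[2, 4], [6, 8]]
print([e.kind for e in trace])   # ['R', 'E', 'W', 'R', 'E', 'W']
```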
11 Architecture modeling
- Implemented using either Pearl or SystemC
- Each processor in an architecture model is characterized by its
- processing capacity
- power consumption
- fixed cost
- Models operate at the transaction level
- They simulate the performance consequences of the computation and communication events generated by an application model
- They only account for architectural performance constraints
- They do not model functional behavior
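A minimal sketch of a trace-driven processor model, assuming a fixed, hypothetical cycle latency per event kind. It consumes the events an application model generates and only advances a simulated clock; no data is ever processed, mirroring the point that architecture models capture performance, not functional behavior.

```python
# Minimal trace-driven processor model: it accounts only for timing and never
# touches data. The per-event latencies are hypothetical, not from the paper.
class ProcessorModel:
    def __init__(self, name, latency):
        self.name = name
        self.latency = latency  # cycles per event kind, e.g. {"R": 10, ...}
        self.clock = 0          # simulated cycles consumed so far

    def handle(self, kind):
        # The event's performance consequence is just a clock advance.
        self.clock += self.latency[kind]
        return self.clock

pe = ProcessorModel("PE-0", {"R": 10, "E": 200, "W": 10})
for kind in ["R", "E", "W", "R", "E", "W"]:
    pe.handle(kind)
print(pe.clock)  # 440
```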
12 Issues of mapping (cont.)
- Kahn processes map to virtual processors in the mapping layer
- Kahn channels map to FIFO buffers
- The mapping layer is generated automatically
- A virtual processor in the mapping layer reads in an application trace from a Kahn process via a trace event queue and dispatches the events to a processing component in the architecture model
- The mapping of a virtual processor onto a processing component in the architecture model is freely adjustable
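The mapping layer can be sketched as one virtual processor per Kahn process that drains its trace event queue into whichever component it is currently mapped onto. All names and cycle costs below are hypothetical, and dispatching is sequential rather than concurrent for brevity. Remapping process B from the CPU to the ASIC changes only the mapping dictionary; the traces are reused as-is.

```python
from collections import deque

# Sketch of a mapping layer (names hypothetical): one virtual processor per
# Kahn process forwards its trace events to the component it is mapped onto.

class Component:
    def __init__(self, name, cycles_per_event):
        self.name = name
        self.cycles_per_event = cycles_per_event
        self.busy = 0

    def simulate(self, event):
        self.busy += self.cycles_per_event[event]

class VirtualProcessor:
    def __init__(self, process_name, trace):
        self.process_name = process_name
        self.queue = deque(trace)   # trace event queue fed by the Kahn process

    def dispatch_all(self, component):
        while self.queue:
            component.simulate(self.queue.popleft())

def run(traces, mapping, costs):
    comps = {name: Component(name, c) for name, c in costs.items()}
    for proc, trace in traces.items():
        VirtualProcessor(proc, trace).dispatch_all(comps[mapping[proc]])
    return {name: c.busy for name, c in comps.items()}

traces = {"A": ["R", "E", "W"], "B": ["R", "E", "W", "R", "E", "W"]}
costs = {"CPU": {"R": 5, "E": 50, "W": 5}, "ASIC": {"R": 5, "E": 10, "W": 5}}
print(run(traces, {"A": "CPU", "B": "CPU"}, costs))   # {'CPU': 180, 'ASIC': 0}
print(run(traces, {"A": "CPU", "B": "ASIC"}, costs))  # {'CPU': 60, 'ASIC': 40}
```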
13 Issues of mapping (cont.)
- The number of possible mappings grows exponentially, so enumerating all of them is impractical
- The problem is formulated as the MMPN problem
- The Multiprocessor Mappings of Process Networks problem
- g_i(x) are the constraints
- Each Kahn node has to be mapped onto a single processor
- Each channel in the application model has to be mapped onto a processor or a memory
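The exponential growth can be made concrete with a brute-force sketch. Encoding each Kahn node as exactly one processor choice, and each channel as one processor-or-memory choice, builds the two constraints listed above directly into the representation, giving |P|^|V| · (|P|+|M|)^|E| candidates. The platform sizes are hypothetical, and any further constraints from the paper's full MMPN formulation are not modeled here.

```python
from itertools import product

# Brute-force MMPN sketch: each node gets one processor and each channel gets
# one processor or memory, so the two slide constraints hold by construction.

def search_space_size(n_nodes, n_channels, n_procs, n_mems):
    return n_procs ** n_nodes * (n_procs + n_mems) ** n_channels

def enumerate_mappings(nodes, channels, procs, mems):
    for node_choice in product(procs, repeat=len(nodes)):
        for chan_choice in product(procs + mems, repeat=len(channels)):
            yield dict(zip(nodes, node_choice)), dict(zip(channels, chan_choice))

nodes, channels = ["A", "B", "C"], ["ab", "ac", "bc"]
procs, mems = ["P0", "P1"], ["M0"]
count = sum(1 for _ in enumerate_mappings(nodes, channels, procs, mems))
print(count)  # 2**3 * 3**3 = 216

# 26 processes and 75 channels (as in the larger case later in the talk) on a
# hypothetical platform of 4 processors and 2 memories: far beyond enumeration.
print(search_space_size(26, 75, 4, 2))
```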
14 Issues of mapping
- The authors use the Strength Pareto Evolutionary Algorithm (SPEA2) to find a set of approximated Pareto-optimal mapping solutions
- E. Zitzler, M. Laumanns, and L. Thiele, "SPEA2: Improving the Strength Pareto Evolutionary Algorithm for Multiobjective Optimization," Evolutionary Methods for Design, Optimisation, and Control, pp. 95-100, Barcelona: CIMNE, 2002
- Each mapping solution is represented by an individual encoding
- A chromosome in which the genes encode the values of the parameters
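A sketch of the encoding and of Pareto dominance, the two ingredients any such search relies on. The gene layout (gene i = processor for Kahn node i, drawn from that node's allele set) is an assumption for illustration, and the filter below is a plain nondominated selection, not SPEA2's strength-based fitness or archive truncation.

```python
import random

# Hypothetical chromosome layout: gene i holds the processor for Kahn node i,
# drawn from that node's allele set (the processors it may be mapped onto).
ALLELES = {"A": ["P0", "P1"], "B": ["P0", "P1", "P2"], "C": ["P2"]}
NODES = list(ALLELES)

def random_chromosome(rng):
    return [rng.choice(ALLELES[n]) for n in NODES]

def decode(chrom):
    return dict(zip(NODES, chrom))

def dominates(f, g):
    """f Pareto-dominates g when all objectives are minimized."""
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))

def nondominated(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

rng = random.Random(0)
print(decode(random_chromosome(rng)))
# Objective vectors (e.g. time, power, cost) of four hypothetical mappings:
print(nondominated([(10, 5, 3), (8, 6, 3), (12, 6, 4), (9, 7, 2)]))
# [(10, 5, 3), (8, 6, 3), (9, 7, 2)]
```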
15 Architecture model refinement through trace transformation
- Refinement of application events is denoted using trace transformations
- Left-hand side: the coarse-grained application events that need to be refined
- Right-hand side: the resulting architecture-level events
Legend: cd = check-data, ld = load-data, sr = signal-room, cr = check-room, st = store-data, sd = signal-data
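An event-by-event trace transformation can be sketched as a rewrite table from left-hand-side application events to right-hand-side architecture-level events. The specific rules here (read becomes check-data, load-data, signal-room; write becomes check-room, store-data, signal-data; execute is unchanged) are an assumption based on the legend, not rules copied from the paper's figure.

```python
# Sketch of event-level trace transformations. The rule contents are an
# assumption read off the legend; execute events pass through unchanged.
RULES = {
    "R": ["cd", "ld", "sr"],   # read  -> check-data, load-data, signal-room
    "W": ["cr", "st", "sd"],   # write -> check-room, store-data, signal-data
    "E": ["E"],
}

def refine(trace, rules=RULES):
    """Replace each coarse-grained event (left-hand side) with its
    architecture-level events (right-hand side)."""
    refined = []
    for event in trace:
        refined.extend(rules[event])
    return refined

print(refine(["R", "E", "W"]))
# ['cd', 'ld', 'sr', 'E', 'cr', 'st', 'sd']
```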
16 Architecture model refinement through trace transformation
- For example, an application process that
- Reads a block of data from an input buffer
- Performs some computation on it
- Writes the results to an output buffer
- R -> E -> W
- Assume the processing component has no local memory
- It must check whether there is room in the output buffer
- The input buffer must remain available until the processing component has finished operating on it
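Without local memory, refining R, E, and W one event at a time no longer works, because the events must be reordered across the whole R -> E -> W sequence. The right-hand side below is an assumption: it is one ordering that satisfies the two notes above (room in the output buffer is checked before computing, and the input buffer is released only after computing). It is not necessarily the exact rule used in the paper.

```python
# Sketch of a sequence-level transformation for R -> E -> W on a component
# without local memory. The right-hand side is an assumed ordering.
def refine_rew_no_local_memory():
    return ["cd", "cr", "ld", "E", "st", "sr", "sd"]

def satisfies_slide_constraints(seq):
    i = seq.index
    room_checked_first = i("cr") < i("E")  # output room secured before computing
    input_held = i("sr") > i("E")          # input released only after computing
    return room_checked_first and input_held

seq = refine_rew_no_local_memory()
print(seq, satisfies_slide_constraints(seq))  # ... True

# Event-by-event refinement violates both constraints here:
naive = ["cd", "ld", "sr", "E", "cr", "st", "sd"]
print(satisfies_slide_constraints(naive))  # False
```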
17 Event refinement using dataflow graphs
- SDF (Synchronous Data Flow)
- Performs the actual event refinement
- IDF (Integer-controlled Data Flow)
- Models repetitions and branching conditions
- Each Kahn process in the application model corresponds to an IDF graph at the mapping layer
- The IDF graph is embedded in the corresponding virtual processor
- The IDF graphs are executable because the actors have an execution mechanism
- Firing rules specify when an actor can fire
- When an actor fires, it
- Consumes the required tokens from its input token channels
- Produces a specified number of tokens on its output channels
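The firing rule and the consume/produce behavior can be shown with a minimal SDF actor whose token rates per channel are fixed, which is the defining property of SDF. The actor names and rates are illustrative only.

```python
from collections import deque

# Minimal SDF actor: fixed token rates per channel, the defining property of SDF.
class Actor:
    def __init__(self, name, consumes, produces):
        self.name = name
        self.consumes = consumes  # {input channel: tokens consumed per firing}
        self.produces = produces  # {output channel: tokens produced per firing}

    def can_fire(self, channels):
        # Firing rule: every input channel holds at least the required tokens.
        return all(len(channels[c]) >= n for c, n in self.consumes.items())

    def fire(self, channels):
        if not self.can_fire(channels):
            raise RuntimeError(f"{self.name} is not enabled")
        for c, n in self.consumes.items():
            for _ in range(n):
                channels[c].popleft()
        for c, n in self.produces.items():
            channels[c].extend([self.name] * n)

channels = {"x": deque(), "y": deque()}
src = Actor("src", {}, {"x": 1})
pair = Actor("pair", {"x": 2}, {"y": 1})
src.fire(channels)
print(pair.can_fire(channels))                 # False: 1 of 2 tokens present
src.fire(channels)
pair.fire(channels)
print(len(channels["x"]), len(channels["y"]))  # 0 1
```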
18 Example of SDF
- Two Kahn application processes act as a producer-consumer pair communicating pixel blocks
- The cr actor fires when it receives a W(rite) application event
- SDF actors can be coupled (i.e., mapped) to architecture model components
- A firing SDF actor may send a token to the architecture model to initiate the simulation of an event
- The SDF actor in question is then blocked until it receives an acknowledgment token from the architecture model indicating that the performance consequences of the event have been simulated
[Figure: SDF graphs of the producer and the consumer; the delay on the channel models a FIFO buffer of b elements]
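The "delay models a FIFO buffer of b elements" idea matches a common dataflow idiom: a forward data channel plus a backward room channel that starts with b initial tokens. The step-based timing and the function name are simplifications made for illustration. The producer's cr needs a room token and the consumer's sr returns one, so the producer can never get more than b writes ahead.

```python
from collections import deque

# Sketch: a b-element FIFO between producer and consumer, modeled as a forward
# "data" channel plus a backward "room" channel holding b initial tokens.
def simulate(b, n_items, consumer_delay):
    if b < 1:
        raise ValueError("buffer must have at least one slot")
    data, room = deque(), deque([None] * b)
    produced = consumed = max_fill = step = 0
    while consumed < n_items:
        if produced < n_items and room:      # producer: cr needs a room token
            room.popleft()
            data.append(produced)            # st, then sd emits a data token
            produced += 1
            max_fill = max(max_fill, len(data))
        if step >= consumer_delay and data:  # consumer: cd needs a data token
            data.popleft()                   # ld
            room.append(None)                # sr returns a room token
            consumed += 1
        step += 1
    return step, max_fill

print(simulate(2, 5, 3))  # (8, 2): the producer stalls once both slots are full
print(simulate(1, 5, 0))  # (5, 1)
```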
19 Example of IDF
- In the IDF graphs, scheduling information of
actors is not incorporated into the graph
definition, but is explicitly supplied by a
scheduler.
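To illustrate scheduling that is supplied from outside the graph, the hypothetical scheduler below reads an annotated application trace and decides which actor sequence to fire and how many times. The integer-valued control (the branch selector and the repetition count) stands in for IDF's integer-controlled branching and repetition. The data structure is an assumption, not Sesame's implementation.

```python
# Sketch of an IDF-style scheduler (hypothetical structure). The graph only
# defines which actor sequences exist; the firing order comes from a scheduler
# reading the application trace, with integer control for branch and repetition.
BRANCHES = {"R": ["cd", "ld", "sr"], "E": ["ex"], "W": ["cr", "st", "sd"]}

def schedule(trace):
    firings = []
    for kind, repeat in trace:     # (branch selector, integer repetition count)
        for _ in range(repeat):
            firings.extend(BRANCHES[kind])
    return firings

print(schedule([("R", 1), ("E", 2), ("W", 1)]))
# ['cd', 'ld', 'sr', 'ex', 'ex', 'cr', 'st', 'sd']
```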
20 Event refinement using dataflow graphs
- Communication refinement is accomplished by simply replacing SDF actors with refined ones
- This allows evaluating the performance of different communication behaviors at the architecture level
- The application model remains unaffected
[Figure: communication refinement by replacing SDF actors]
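Refinement by actor replacement can be sketched as swapping the table of actors that an event expands into, while the application trace stays identical. The cycle latencies are hypothetical and only show that the two communication behaviors can be compared without touching the application model.

```python
# Sketch: communication refinement as actor replacement. The application trace
# is fixed; only the actors that W events expand into are swapped.
LATENCY = {"R": 12, "E": 100, "W": 12, "cr": 2, "st": 10, "sd": 2}  # hypothetical

COARSE = {"R": ["R"], "E": ["E"], "W": ["W"]}
REFINED = {**COARSE, "W": ["cr", "st", "sd"]}  # only the write actor is replaced

def cycles(trace, actors):
    return sum(LATENCY[a] for ev in trace for a in actors[ev])

app_trace = ["R", "E", "W"] * 4  # identical input for both evaluations
print(cycles(app_trace, COARSE), cycles(app_trace, REFINED))  # 496 504
```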
21 Case study: M-JPEG
- Objective: to find promising instances of this platform that allow a good mapping of the M-JPEG application
[Figure: application model of the Motion-JPEG encoder]
22 Case study: M-JPEG
[Figure: platform architecture model]
[Table: processor and memory characteristics]
[Table: processor characteristics]
23 Mapping result
- The nondominated front is shown by plotting the 17 nondominated solutions that SPEA2 found in a single run
- This takes about 5 seconds on a 2.8 GHz Pentium 4 machine
- For a larger system with 26 processes and 75 channels
- This takes about 25 seconds on a 2.8 GHz Pentium 4 machine
Parameters of SPEA2:
- Population size: 100
- Number of generations: 1000
- Mutation probability: 0.5
- Bit mutation probability: 0.01
- Crossover probability: 0.8
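A sketch of how the listed variation parameters might be applied to mapping chromosomes. The interpretation is assumed: crossover is applied to a parent pair with probability 0.8, an offspring is selected for mutation with probability 0.5, and each gene of a selected offspring is then redrawn from its allele set with probability 0.01, which generalizes "bit mutation" to non-binary genes. Uniform crossover is an illustrative choice and not necessarily the operator used in the paper. SPEA2's fitness assignment and archive update are not shown.

```python
import random

# Hypothetical variation step using the listed SPEA2 parameters (assumed meaning).
P_CROSSOVER, P_MUTATION, P_BIT = 0.8, 0.5, 0.01

def uniform_crossover(a, b, rng):
    if rng.random() >= P_CROSSOVER:
        return a[:], b[:]
    c1, c2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x          # swap this gene between the two children
        c1.append(x)
        c2.append(y)
    return c1, c2

def mutate(chrom, alleles, rng):
    if rng.random() >= P_MUTATION:
        return chrom[:]
    return [rng.choice(alleles[i]) if rng.random() < P_BIT else gene
            for i, gene in enumerate(chrom)]

rng = random.Random(1)
alleles = [["P0", "P1"], ["P0", "P1", "P2"], ["P2"]]
parent_a, parent_b = ["P0", "P1", "P2"], ["P1", "P2", "P2"]
child_a, child_b = uniform_crossover(parent_a, parent_b, rng)
print(child_a, child_b, mutate(child_a, alleles, rng))
```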
24 Further investigation
- Two nondominated solutions are selected
- Their estimated cycle counts
- The authors also mapped various application models onto models of existing architecture implementations
- To compare the timing of the actual implementations with their estimates
- The estimation errors were below 5%
Solution 1: PE-1, PE-2, PE-3; Solution 2: PE-0, PE-1
25 Further investigation
- To model more implementation details of the DCT at the architecture level
- Refine the PE onto which the DCT task is mapped
- Luminance blocks need to be preshifted before a DCT is performed
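For concreteness, the two operations the refined PE models can be sketched as the standard JPEG steps: a level shift that subtracts 128 from 8-bit samples, followed by the 8x8 forward DCT-II with JPEG's 1/4 · C(u) · C(v) scaling. The naive four-level loop is for clarity only. For a uniform block of value 130, only the DC coefficient is nonzero: 8 · (130 - 128) = 16.

```python
import math

# Sketch of the DCT stage: preshift (level shift by 128 for 8-bit samples),
# then the JPEG 8x8 forward DCT-II. Naive O(N^4) loops for clarity.
N = 8

def preshift(block):
    return [[v - 128 for v in row] for row in block]

def dct2d(block):
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

flat = [[130] * N for _ in range(N)]   # uniform block
coeffs = dct2d(preshift(flat))
print(round(coeffs[0][0], 6))          # 16.0
```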
26 The result of refinement
- Solution 1 yields a balanced system
- The performance of solution 2 is limited by the less powerful PE-0
Solution 1: PE-1, PE-2, PE-3; Solution 2: PE-0, PE-1
27 Trying to improve PE-0 performance
- Allow the PE to execute the preshift and 2D-DCT in parallel
- Implementation-5 simply reduces the execution latency
- Implementation-6 refines the preshift and 2D-DCT operations
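A back-of-the-envelope model shows why overlapping the two stages helps. This sketch reads "parallel execution" as a two-stage pipeline (preshift of block i+1 overlaps the DCT of block i), which is an assumption, and the cycle counts are hypothetical. With 100 blocks, 64 cycles of preshift, and 400 cycles of DCT, the sequential PE needs 100 · 464 = 46,400 cycles, and the pipelined PE needs 464 + 99 · 400 = 40,064 cycles.

```python
# Back-of-the-envelope latency model (hypothetical cycle counts): sequential
# preshift + DCT versus a two-stage pipeline where the stages overlap.
def sequential(n_blocks, t_shift, t_dct):
    return n_blocks * (t_shift + t_dct)

def pipelined(n_blocks, t_shift, t_dct):
    if n_blocks == 0:
        return 0
    # The first block fills the pipeline; afterwards the slower stage dominates.
    return t_shift + t_dct + (n_blocks - 1) * max(t_shift, t_dct)

print(sequential(100, 64, 400), pipelined(100, 64, 400))  # 46400 40064
```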
28 Conclusion
- Modeling and simulation methods and tools for system-level performance evaluation and exploration of heterogeneous SoC-based embedded media systems
- Uses analytical modeling and multiobjective optimization to select candidate architectures
- Bridges the abstraction gap between application and architecture models
- Enables architectural exploration at different levels of abstraction
29 Genetic Algorithm (2)