A systematic approach to exploring embedded system architectures at multiple abstraction levels



Slides: 30

Provided by: fcy


Transcript and Presenter's Notes


1
A systematic approach to exploring embedded
system architectures at multiple abstraction
levels
A.D. Pimentel, C. Erbas, S. Polstra
Informatics Institute, University of Amsterdam, Netherlands
IEEE Transactions on Computers, Feb. 2006, Volume 55, Issue 2, Pages 99-112
Presenter: Fu-Ching Yang
2
Outline

Abstract
Introduction
Related work
System-level design space exploration
Application modeling
Architecture modeling
Issues of mapping
Architecture model refinement through trace
transformation
Case study: Motion-JPEG
Conclusion

3
Abstract

The sheer complexity of today's embedded systems
forces designers to start with modeling and
simulating system components and their
interactions in the very early design stages. It
is therefore imperative to have good tools for
exploring a wide range of design choices,
especially during the early design stages, where
the design space is at its largest. This paper
presents an overview of the Sesame framework,
which provides high-level modeling and simulation
methods and tools for system-level performance
evaluation and exploration of heterogeneous
embedded systems. More specifically, we describe
Sesame's modeling methodology and trajectory. It
takes a designer systematically along the path
from selecting candidate architectures, using
analytical modeling and multiobjective
optimization, to simulating these candidate
architectures with our system-level simulation
environment. This simulation environment
subsequently allows for architectural exploration
at different levels of abstraction while
maintaining high-level and architecture-independent
application specifications. We illustrate all
these aspects using a case study in which we
traverse Sesame's exploration trajectory for a
motion-JPEG encoder application.

4
Introduction

Background
SoC-based embedded systems often have a
heterogeneous system architecture
Programmable processors and dedicated hardware blocks
What's the problem?
Traditional design methods fall short for the
design of these systems
They cannot deal with the systems' complexity and
flexibility
Solution
System level design
Modeling and simulating system components and
their interactions in the early design stage.
Minimize the modeling effort and optimize
simulation speed

5
Overview of the Sesame system

Basic principle
Platform architectures
Separation of concerns
High level modeling and simulation
Work to do
Selecting candidate architectures
Analytical modeling
Multi-objective optimization

6
Related work (cont.)

There are several architecture exploration
environments, such as (Metro)Polis, Mescal, MESH,
and Milan.
They map a behavioral application specification to
an architecture specification
What does Sesame improve?
It separates the modeling of application and
architecture further, using
architecture-independent application models
application-independent architecture models
a mapping step that relates these models for
trace-driven cosimulation.

7
Related work

In the domain of hardware/software codesign of
embedded systems, multiobjective optimization
studies have been performed extensively for
system-level synthesis (e.g., [20], [21], [22])
and platform configuration.
In this paper, however, this is not the primary
consideration

8
System-level design space exploration

Y-chart design methodology
Separate application models and architecture
(performance) models
An explicit mapping step to map application tasks
onto architecture resources.
Application model
Describes the functional behavior of an application
in a timing- and architecture-independent manner
Architecture model
Defines architecture resources and captures their
performance constraints
Essential in this methodology is that the
application model is independent of the
architecture model
Application models can be reused in the
exploration cycle
Gradual refinement of architecture performance
models

9
Application modeling (cont.)

Use the Kahn Process Network (KPN) model
This model is natural for describing the streams
of data samples in a signal processing system.
Producers are never blocked because queues are of
infinite length
Consumers are blocked when they attempt to get
data from an empty input channel.
This model is determinate
The results of the computation do not depend on
the firing order of the processes
Node: a process in the application
Directed edge: a channel between processes
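
The KPN semantics above (non-blocking writes to unbounded queues, blocking reads) can be sketched in a few lines of Python. This is only an illustration, not Sesame code: the process names, the `None` end-of-stream marker, and the use of threads are assumptions of the sketch.

```python
import queue
import threading

def producer(out_ch, n):
    """Kahn process: writes n samples. Writes never block (unbounded channel)."""
    for i in range(n):
        out_ch.put(i)
    out_ch.put(None)  # end-of-stream marker (an assumption of this sketch)

def scaler(in_ch, out_ch, factor):
    """Kahn process: blocking read, compute, write."""
    while True:
        sample = in_ch.get()  # blocks while the input channel is empty
        if sample is None:
            out_ch.put(None)
            return
        out_ch.put(sample * factor)

def run_network(n=5, factor=10):
    a_to_b = queue.Queue()  # maxsize=0 gives an unbounded FIFO channel
    b_to_c = queue.Queue()
    threads = [
        threading.Thread(target=producer, args=(a_to_b, n)),
        threading.Thread(target=scaler, args=(a_to_b, b_to_c, factor)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:  # this loop plays the role of the final consumer process
        item = b_to_c.get()
        if item is None:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Because every read blocks until data is present, the returned list is the same however the threads are scheduled, which is the determinacy property noted above.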

[Figure: Kahn process network with Process A, Process B, and Process C]
10
Application modeling

Describe the application model in YML
For rapid creation and modification
Each node has two characteristics
Computation requirement
The workload imposed by the node onto a particular
component in the architecture model
This is obtained by instrumenting the code with
annotations that describe the application's
computational and communication actions
The annotations generate traces of application
events for the architecture model
Coarse-grained events: read, write, and execute
Allele set
The processors that the node can be mapped onto
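
As a rough illustration of how annotations can turn a process into a source of read, write, and execute events, consider the sketch below. The `TraceRecorder` class, the channel names, and the `dct_process` function are hypothetical and are not taken from Sesame.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TraceEvent:
    kind: str    # "read", "write", or "execute"
    target: str  # channel name for read/write, operation name for execute

class TraceRecorder:
    """Collects the coarse-grained events emitted by an annotated process."""
    def __init__(self):
        self.events = []

    def read(self, channel):
        self.events.append(TraceEvent("read", channel))

    def write(self, channel):
        self.events.append(TraceEvent("write", channel))

    def execute(self, operation):
        self.events.append(TraceEvent("execute", operation))

def dct_process(trace, blocks):
    """Hypothetical application process; only the annotations are shown."""
    for _ in range(blocks):
        trace.read("in")      # annotation: one block read from channel "in"
        trace.execute("dct")  # annotation: one DCT computation
        trace.write("out")    # annotation: one block written to channel "out"

recorder = TraceRecorder()
dct_process(recorder, blocks=2)
```

The recorded list contains no timing information, so the same trace can be replayed against any architecture model.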

[Figure: Kahn process network with Process A, Process B, and Process C]
11
Architecture modeling

Implemented using either Pearl or SystemC
Each processor in an architecture model has a
processing capacity
power consumption
fixed cost.
Operates at the transaction level
Simulates the performance consequences of the
computation and communication events generated by
an application model
Only accounts for architectural performance
constraints
Does not model functional behavior
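
A minimal sketch of such a performance-only component follows. It charges a latency for each incoming event and computes nothing else. The class name and the latency numbers are assumptions chosen for illustration, not values from the paper.

```python
class ProcessingComponent:
    """Accounts for time only; it never touches the application's data."""
    def __init__(self, name, latency):
        self.name = name
        self.latency = latency  # cycles per event type (hypothetical values)
        self.cycles = 0

    def simulate(self, kind, detail):
        # Execute events are priced per operation, read/write per event kind.
        key = detail if kind == "execute" else kind
        self.cycles += self.latency[key]
        return self.cycles

pe = ProcessingComponent("PE-0", {"read": 4, "write": 4, "dct": 100})
for event in [("read", "in"), ("execute", "dct"), ("write", "out")]:
    pe.simulate(*event)
```

Swapping the latency table is enough to model a faster or slower component, and the application trace stays the same.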

12
Issues of mapping (cont.)

Kahn processes map to virtual processors in the
mapping layer
Kahn channels map to FIFO buffers
The mapping layer is automatically generated

A virtual processor in the mapping layer reads in
an application trace from a Kahn process via a
trace event queue and dispatches the events to a
processing component in the architecture model.
The mapping of a virtual processor onto a
processing component in the architecture model is
freely adjustable.
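
The dispatch role of a virtual processor can be sketched as follows. The mapping is a plain dictionary here so that it can be changed without touching the application trace. All class names and the dictionary-based mapping are assumptions of this sketch.

```python
from collections import deque

class Component:
    """Stand-in for a processing component in the architecture model."""
    def __init__(self, name):
        self.name = name
        self.handled = []

    def handle(self, event):
        self.handled.append(event)

class VirtualProcessor:
    """Reads one Kahn process's trace event queue and forwards the events."""
    def __init__(self, process_name, trace_queue):
        self.process_name = process_name
        self.trace_queue = trace_queue

    def dispatch_all(self, mapping):
        target = mapping[self.process_name]  # the adjustable mapping decision
        while self.trace_queue:
            target.handle(self.trace_queue.popleft())

pe0, pe1 = Component("PE-0"), Component("PE-1")
vp = VirtualProcessor("DCT", deque(["read", "execute", "write"]))
vp.dispatch_all({"DCT": pe1})  # remapping only changes this dictionary
```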

13
Issues of mapping (cont.)

The enumeration of all possible mappings grows
exponentially
It is an MMPN problem
The Multiprocessor Mappings of Process Networks
problem

The g_i(x) are the constraints of the optimization
problem
Each Kahn node has to be mapped onto a single
processor
Each channel in the application model has to be
mapped onto a processor or a memory
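
The formula that the constraints g_i(x) belong to did not survive in this transcript. For orientation only, a multiobjective minimization problem is commonly written in the generic form below. The number of objectives k, the number of constraints m, and the inequality form of the constraints are assumptions of this sketch, not the paper's exact formulation.

```latex
\begin{aligned}
\min_{x \in X}\quad & f(x) = \bigl(f_1(x), f_2(x), \dots, f_k(x)\bigr) \\
\text{subject to}\quad & g_i(x) \le 0, \qquad i = 1, \dots, m
\end{aligned}
```

Here x is one candidate mapping and X is the set of all mappings.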

14
Issues of mapping

They use the Strength Pareto Evolutionary Algorithm 2
(SPEA2) to find a set of approximate
Pareto-optimal mapping solutions
E. Zitzler, M. Laumanns, and L. Thiele, "SPEA2:
Improving the Strength Pareto Evolutionary
Algorithm for Multiobjective Optimization,"
Evolutionary Methods for Design, Optimisation,
and Control, pp. 95-100, Barcelona: CIMNE, 2002.
Each mapping solution is represented by an
individual, encoded as
a chromosome in which the genes encode the values
of parameters.
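
To make the chromosome and Pareto notions concrete, the sketch below encodes a mapping as one gene per Kahn process (the index of the processor it runs on) and filters the nondominated mappings by exhaustive enumeration. It is not SPEA2: the evolutionary search is replaced by brute force, and the workloads, speeds, costs, and the two objectives are invented for illustration.

```python
from itertools import product

WORKLOAD = [4, 2, 6]  # hypothetical work units per Kahn process
SPEED = [1, 2]        # hypothetical work units per cycle, per processor
COST = [1, 3]         # hypothetical fixed cost per processor

def evaluate(chromosome):
    """Return (busiest processor's time, total cost of the processors used)."""
    busy = [0.0] * len(SPEED)
    for process, processor in enumerate(chromosome):
        busy[processor] += WORKLOAD[process] / SPEED[processor]
    cost = sum(COST[p] for p in set(chromosome))
    return (max(busy), cost)

def dominates(a, b):
    """a dominates b: no worse in every objective, better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(chromosomes):
    scored = [(c, evaluate(c)) for c in chromosomes]
    return [c for c, fc in scored
            if not any(dominates(fo, fc) for _, fo in scored)]

# Enumerate every mapping of 3 processes onto 2 processors (2**3 = 8 cases).
all_mappings = list(product(range(len(SPEED)), repeat=len(WORKLOAD)))
front = nondominated(all_mappings)
```

Enumeration is only feasible for toy sizes such as this one, which is why a heuristic like SPEA2 is used for realistic problem instances.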

15
Architecture model refinement through trace
transformation

Refinement of application events is denoted using
trace transformations
Left-hand side: the coarse-grained application
events that need to be refined
Right-hand side: the resulting
architecture-level events

cd: check-data    ld: load-data    sr: signal-room
cr: check-room    st: store-data   sd: signal-data
16
Architecture model refinement through trace
transformation

For example, an application process that
Reads a block of data from an input buffer
Performs some computation on it
Writes the results to an output buffer
R -> E -> W
Assume the hardware has no local memory

Check whether there is room in the output buffer
The input buffer must remain available until the
processing component has finished operating on it
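
A trace transformation can be sketched as a rewrite of the event list. The rule bodies below are assumptions: the per-event orderings are inferred from the abbreviation names, and the no-local-memory ordering is one sequence that is consistent with the two notes above (room is checked before loading, and signal-room comes after the execute). The paper's actual rules may differ.

```python
# Assumed per-event refinement rules (orderings inferred from the names).
PER_EVENT_RULES = {
    "R": ["cd", "ld", "sr"],  # check-data, load-data, signal-room
    "E": ["E"],               # execute is left unrefined
    "W": ["cr", "st", "sd"],  # check-room, store-data, signal-data
}

# Assumed ordering for a processing component without local memory.
NO_LOCAL_MEMORY_RULE = ["cd", "cr", "ld", "E", "st", "sr", "sd"]

def refine(trace, local_memory=True):
    """Rewrite coarse-grained events into architecture-level events."""
    trace = list(trace)
    refined = []
    i = 0
    while i < len(trace):
        if not local_memory and trace[i:i + 3] == ["R", "E", "W"]:
            refined.extend(NO_LOCAL_MEMORY_RULE)  # rewrite the whole triple
            i += 3
        else:
            refined.extend(PER_EVENT_RULES[trace[i]])
            i += 1
    return refined
```

For example, `refine(["R", "E", "W"], local_memory=False)` releases the input buffer (sr) only after the execute event, whereas the default per-event rules release it right after the load.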
17
Event refinement using dataflow graphs

SDF: Synchronous Data Flow
It performs the actual event refinement
IDF: Integer-controlled Data Flow
It models repetitions and branching conditions
Each Kahn process in the application model has an
IDF graph at the mapping layer
The IDF graph is embedded in the corresponding
virtual processor
The IDF graphs are executable because the actors
have an execution mechanism
Firing rules specify when an actor can fire.
When an actor fires, it
Consumes the required tokens from its input token
channels
Produces a specified number of tokens on its
output channels.
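
The firing rule described above can be sketched with a small actor class. Channels are plain deques, and each port carries a fixed token rate as in SDF. The class and the token values are assumptions of this sketch and do not model IDF's integer-controlled switching.

```python
from collections import deque

class Actor:
    """SDF-style actor: fixed consumption and production rates per firing."""
    def __init__(self, name, inputs, outputs):
        self.name = name
        self.inputs = inputs    # list of (channel, tokens consumed per firing)
        self.outputs = outputs  # list of (channel, tokens produced per firing)

    def can_fire(self):
        # Firing rule: every input channel holds at least `rate` tokens.
        return all(len(ch) >= rate for ch, rate in self.inputs)

    def fire(self):
        if not self.can_fire():
            raise RuntimeError(f"actor {self.name} is not enabled")
        for ch, rate in self.inputs:
            for _ in range(rate):
                ch.popleft()
        for ch, rate in self.outputs:
            ch.extend([self.name] * rate)

a = deque(["t", "t", "t"])
b = deque()
ld = Actor("ld", inputs=[(a, 2)], outputs=[(b, 1)])
ld.fire()  # consumes two tokens from a, produces one token on b
```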

18
Example of SDF

Two Kahn application processes act as a
producer-consumer pair communicating pixel blocks
The cr actor fires when it receives a W(rite)
application event
SDF actors can be coupled (i.e., mapped) to
architecture model components.
A firing SDF actor may send a token to the
architecture model to initiate the simulation of
an event
The SDF actor in question is then blocked until
it receives an acknowledgment token from the
architecture model indicating that the
performance consequences of the event have been
simulated.

[Figure: SDF graph of the producer-consumer pair. The delay on the
channel models a FIFO buffer of b elements.]
19
Example of IDF

In the IDF graphs, scheduling information of
actors is not incorporated into the graph
definition, but is explicitly supplied by a
scheduler.

20
Event refinement using dataflow graphs

Communication refinement is accomplished by
simply replacing SDF actors with refined ones
This allows the performance of
different communication behaviors to be evaluated
at the architecture level
The application model remains unaffected.

21
Case study: M-JPEG

Objective: to find promising instances of the
platform that allow a good mapping of the M-JPEG
application
[Figure: Application model of the Motion-JPEG encoder]

22
Case study: M-JPEG
Platform architecture model
Processor and Memory Characteristics
Processor Characteristics
23
Mapping result

The nondominated front is obtained by
plotting the 17 nondominated solutions that were
found by SPEA2 in a single run.
Takes about 5 seconds on a 2.8GHz Pentium-4
machine
For a large system with 26 processes and 75 channels
Takes about 25 seconds on a 2.8GHz Pentium-4
machine

Parameters of SPEA2
Population size: 100
Number of generations: 1000
Mutation probability: 0.5
Bit mutation probability: 0.01
Crossover probability: 0.8
24
Further investigation

Select two nondominated solutions
The estimated cycle count
They also mapped various application models onto
models of existing architecture implementations
To compare the timing of the actual implementations
with their estimates
The errors of their estimates are < 5%

Solution 1: PE-1, PE-2, PE-3
Solution 2: PE-0, PE-1
25
Further investigation

To model more implementation details of the DCT at
the architecture level
Refine the PE onto which the DCT task is mapped
Luminance blocks need to be preshifted before a
DCT is performed.

26
The result of refinement

Solution 1 has a balanced system
Performance of solution 2 is limited by the less
powerful PE-0

Solution 1: PE-1, PE-2, PE-3
Solution 2: PE-0, PE-1
27
Try to improve PE-0 performance

Allow the PE to execute preshift and 2D-DCT
in parallel
Implementation-5 simply reduces the execution
latency
Implementation-6 refines the preshift and 2D-DCT
operations

28
Conclusion

Modeling and simulation methods and tools for
system-level performance evaluation and
exploration of heterogeneous SoC-based embedded
media systems.
Uses analytical modeling and multiobjective
optimization to select candidate architectures
Bridges the abstraction gap between application
and architecture models
Supports architectural exploration at different
levels of abstraction

29
Genetic Algorithm (2)
