Sesame: Opening new doors to Multi-level Design Space Exploration of Embedded Systems Architectures



1
Sesame: Opening new doors to Multi-level Design
Space Exploration of Embedded Systems
Architectures
Andy D. Pimentel
Computer Systems Architecture group
University of Amsterdam
Informatics Institute
2
Thank you. Questions?
3
Outline
  • Background and problem statement
  • General overview of modeling methodology
  • Sesame environment
  • Application modeling layer
  • Architecture modeling layer
  • Mapping layer
  • Gradual refinement of architecture models
  • Event refinement using dataflow graphs
  • Both computational and communication refinement
  • Current status and future work

4
Sketching the context
  • Let's play a little quiz
  • What is the most popular microprocessor around?
  • You may have answered something like Intel
    Pentium
  • If so, thanks for playing!
  • Intel Pentium has almost zero market share. Zip.
    Zilch.
  • Pentium is a statistically insignificant chip
    with tiny sales!
  • The answer should (of course?) be embedded
    processors (no particular brand)

5
Sketching the context (contd)
Relating microprocessors to life on earth: are
Pentiums the viruses of the microprocessor
market? ;-)
6
Sketching the context (contd)
7
Sketching the context (contd)
  • Estimate: 5 times as much embedded software
    as normal software
  • Embedded systems are everywhere
  • On average, a human touches about 50 to 100
    embedded processors per day
  • An average car has 15 processors, a luxury one
    60!
  • The domain of embedded multimedia and signal
    processing applications plays an important role
  • Camcorders, PDAs, set-top boxes, (Digital) TVs,
    cell phones, etc.

8
Embedded media systems
  • Modern embedded systems for media and signal
    processing must
  • support multiple applications and various
    standards
  • often provide real-time performance
  • These systems increasingly have heterogeneous
    system architectures, integrating
  • Dedicated hardware: high performance and low power/cost
  • Embedded processor cores: high flexibility
  • Reconfigurable components (e.g., FPGAs): good
    performance/power/flexibility

9
Trends in system design (contd)
  • Silicon budgets are increasing (Moore's Law)
  • Integration of functions: Systems-on-Chip
  • (Massively) Parallel Systems on a single chip!
  • Life cycle of systems decreasing (e.g., look at
    cellphones)
  • Short time to market

10
Design crisis
[Figure: design crisis plotted on a log scale against technology node (micron), from 0.35µ down to 0.1µ]
11
The system design problem
  • Design better products faster
  • Design productivity
  • Design technology: architectures, methods, tools,
    libraries
  • Design quality
  • Low cost, low power, flexible, no bugs
  • Multi-dimensional design space with many
    tradeoffs
  • Cost (silicon area, design time)
  • Performance
  • Power consumption
  • Flexibility
  • Time-to-market
  • etc.

12
Design tradeoffs: computational efficiency
13
From Applications to Silicon
[Figure: application(s) implemented as software on a HW/SW architecture assembled from architecture components (labels: TM, CP1, MIPS), realized in silicon]
14
Rethinking system design
  • Design complexity forces us to reconsider current
    design practice
  • Classical design methods
  • often start from a single application
    specification, which is gradually synthesized into
    a HW/SW implementation
  • lack generalizability to cope with highly
    programmable architectures targeting multiple
    applications
  • also hamper extensibility to efficiently support
    future applications

15
Rethinking system design (contd)
  • Traditionally, designers rely only on detailed
    simulators for design space exploration
  • HW/SW co-simulation
  • This approach becomes infeasible for the early
    design stages
  • The effort to build these simulators is too high as
    systems become ever more complex
  • The low speed of these simulators seriously
    hampers architectural exploration
  • HW/SW co-simulation requires the HW/SW partitioning
    to be fixed up front
  • A new system model is needed to assess
    each HW/SW partitioning

16
Jumping down the design pyramid
[Figure: design pyramid with axes abstraction (high to low), effort (low to high) and alternative realizations]
17
Design by stepwise refinement
[Figure: design pyramid with axes abstraction (high to low), effort (low to high) and alternative realizations]
18
Sesame: Simulation of Embedded Systems
Architectures for Multi-level Exploration
  • Part of the Artemisia project
  • Design methods for NoC-based embedded systems
  • Co-operation between
  • Leiden Embedded Research Center, Leiden
    University (prof. E.F. Deprettere)
  • Computer Engineering group, Delft University of
    Technology (prof. S. Vassiliadis)
  • Computer Systems Architecture group, University
    of Amsterdam (prof. C. Jesshope)
  • Philips Research Labs in Eindhoven

19
Sesame: Simulation of Embedded Systems
Architectures for Multi-level Exploration
  • Provides methods and tools to efficiently
    evaluate the performance of heterogeneous
    embedded systems and explore their design space
  • Different architectures, applications, and
    mappings
  • Different HW/SW partitionings
  • Smooth transition between abstraction levels
  • Mixed-level simulations
  • Promotes reuse of models (re-use of IP)
  • Targets the multimedia application domain
  • Techniques and tools also applicable to other
    application domains

20
Y-chart Design Methodology [Kienhuis]
[Figure: Y-chart diagram]
21
Modeling and simulation using the Y-Chart
methodology
  • Application model
  • Description of functional behavior of an
    application
  • Independent from architecture, HW/SW partitioning
    and timing characteristics
  • Generates application events representing the
    workload imposed on the architecture
  • Architecture model
  • Parameterized timing behavior of architecture
    components
  • Models timing consequences of application events
  • Explicit mapping of application and architecture
    models
  • Trace-driven co-simulation [Lieverse]
  • Easy reuse of both application and architecture
    models!

22
Application modeling
  • Using Kahn Process Networks (KPNs)
  • Parallel (C/C++) processes communicating with
    each other via unbounded FIFO channels
  • Expresses parallelism in an application and makes
    communication explicit
  • Blocking reads, non-blocking writes
  • Generation of application events
  • Code is instrumented with annotations describing
    computational actions
  • Reading from/writing to Kahn channels represents
    communication behavior
  • Application events can be very coarse-grained, e.g.,
    compute a DCT or read/write a pixel block
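To make the Kahn semantics concrete, here is a minimal Python sketch. It is not Sesame's C/C++ API. Two processes are connected by an unbounded FIFO (a `queue.Queue` without `maxsize`), so writes never block and `get()` blocks until a token arrives. Each process records its own Read/Write/Execute events. The process names, event tuples and "dct"/"quant" labels are illustrative assumptions.

```python
import queue
import threading

def producer(out_ch, trace, n_blocks):
    """Kahn process: 'compute' blocks and write them to an unbounded FIFO."""
    for i in range(n_blocks):
        trace.append(("Execute", "producer", f"dct({i})"))
        out_ch.put(i)                      # unbounded queue: never blocks
        trace.append(("Write", "producer", i))

def consumer(in_ch, trace, n_blocks, results):
    """Kahn process: blocking read of each block, then 'process' it."""
    for _ in range(n_blocks):
        block = in_ch.get()                # blocks until a token is available
        trace.append(("Read", "consumer", block))
        trace.append(("Execute", "consumer", f"quant({block})"))
        results.append(block)

def run(n_blocks=3):
    ch = queue.Queue()                     # maxsize=0 means unbounded
    prod_trace, cons_trace, results = [], [], []
    threads = [
        threading.Thread(target=producer, args=(ch, prod_trace, n_blocks)),
        threading.Thread(target=consumer, args=(ch, cons_trace, n_blocks, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return prod_trace, cons_trace, results

if __name__ == "__main__":
    p, c, r = run()
    print(c)
```

Each process keeps a separate trace. Because a KPN is deterministic, each per-process trace is the same however the threads are interleaved, even though the global interleaving differs from run to run.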

23
Application modeling (contd)
  • Why Kahn process networks (KPNs)?
  • Fit the multimedia application domain very well
  • KPNs are deterministic
  • Automatically guarantees the validity of event traces
    when application and architecture simulators are
    executed independently
  • Application model can also be analyzed in
    isolation from any architecture model
  • Investigation of upper performance bounds and
    early recognition of bottlenecks within
    application

24
Architecture modeling
  • Architecture models react to application trace
    events to simulate the timing behavior
  • Accounting for functional behavior is not
    necessary!
  • Architecture modeling at varying abstraction
    levels
  • Starting at black box level
  • Processing cores can model timing behavior of SW,
    HW or reconfigurable execution
  • Parameterizable latencies for the application
    events
  • SW execution: high latency; HW execution: low
    latency
  • Allows for rapid evaluation of different HW/SW
    partitionings!
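A rough illustration of why parameterizable latencies speed up HW/SW exploration: the same application trace can be replayed against different latency tables, and only the numbers change. The event names and cycle counts below are invented for illustration. They are not Sesame parameters.

```python
# Hypothetical latency tables (cycles per application event) for one
# processing core modeled as software (SW) or as dedicated hardware (HW).
SW_LATENCY = {"Read": 10, "Write": 10, "Execute:DCT": 800, "Execute:VLE": 300}
HW_LATENCY = {"Read": 10, "Write": 10, "Execute:DCT": 60, "Execute:VLE": 300}

def simulate(trace, latency):
    """Return total cycles for a trace. Only event types are inspected,
    never data: the model captures timing, not functional behavior."""
    return sum(latency[event] for event in trace)

trace = ["Read", "Execute:DCT", "Write"] * 4 + ["Execute:VLE"]
sw_cycles = simulate(trace, SW_LATENCY)
hw_cycles = simulate(trace, HW_LATENCY)
print(sw_cycles, hw_cycles)   # 3580 620
```

Moving the DCT from SW to HW means swapping one table entry. The application model and its trace stay untouched.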

25
Architecture modeling (contd)
  • Models implemented in Pearl
  • Object-based discrete event simulation language
  • Keeps track of virtual time
  • Provides simulation primitives
  • Inter-object communication via message-passing
  • Keeps track of simulation statistics
  • RISC-like language: keep it simple and make the
    common case fast
  • Lacks features not needed for architectural
    modeling (e.g., no dynamic data structures,
    dynamic object creation, etc.)
  • Result: high-performance modeling and simulation
  • High simulation speed and low modeling effort

26
Pearl: an example
[Figure: a processor object exchanging messages]
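Pearl syntax is not reproduced here. The Python sketch below is only a rough analogue of the Pearl concepts listed above: a kernel that keeps virtual time, objects that communicate by timestamped messages, and per-object statistics. The class and method names are assumptions, not Pearl primitives.

```python
import heapq
import itertools

class Simulator:
    """Tiny discrete-event kernel: keeps virtual time and delivers messages."""
    def __init__(self):
        self.now = 0
        self._queue = []
        self._seq = itertools.count()      # tie-breaker for equal timestamps

    def send(self, target, method, *args, delay=0):
        heapq.heappush(self._queue,
                       (self.now + delay, next(self._seq), target, method, args))

    def run(self):
        while self._queue:
            self.now, _, target, method, args = heapq.heappop(self._queue)
            getattr(target, method)(*args)

class Processor:
    """Processor object: reacts to 'execute' messages, keeps busy-time statistics."""
    def __init__(self, sim, name, cycles_per_op):
        self.sim, self.name, self.cycles_per_op = sim, name, cycles_per_op
        self.busy_until = 0
        self.busy_time = 0

    def execute(self, ops, reply_to):
        start = max(self.sim.now, self.busy_until)     # wait behind earlier work
        duration = ops * self.cycles_per_op
        self.busy_until = start + duration
        self.busy_time += duration
        self.sim.send(reply_to, "done", self.name,
                      delay=self.busy_until - self.sim.now)

class Host:
    def __init__(self, sim):
        self.sim = sim
        self.log = []

    def done(self, who):
        self.log.append((self.sim.now, who))

sim = Simulator()
host = Host(sim)
cpu = Processor(sim, "cpu", cycles_per_op=4)
sim.send(cpu, "execute", 10, host)             # arrives at t=0
sim.send(cpu, "execute", 5, host, delay=20)    # arrives at t=20; cpu busy until 40
sim.run()
print(host.log)   # [(40, 'cpu'), (60, 'cpu')]
```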
27
Architecture modeling (contd)
  • Models implemented in SystemC
  • We added a layer on top of SystemC 2.0, called
    SCPEx (SystemC Pearl Extension)
  • Provides SystemC with Pearl's message-passing
    semantics
  • Raises abstraction level of SystemC (e.g., no
    ports, transparent incorporation of
    synchronization)
  • Improves transaction-level modeling
  • SCPEx enables reuse of Pearl models in SystemC
    context
  • Makes Pearl → SystemC translation trivial
  • Provides link towards possible implementation
  • Facilitates importing SystemC IP models in Sesame

28
Sesame in layers
[Figure: application model → event traces → mapping layer → architecture model]
29
Sesame's mapping layer
  • Maps application tasks (event traces) to
    architecture model components
  • Guarantees deadlock-free scheduling of
    application events

30
Scheduling of communication events
Because Read events are blocking (Kahn), some
schedules may yield deadlock
[Figure: application model with processes A, B and C; events Write(A), Read(C), Read(B) and Write(C) scheduled on two processor cores of the architecture model, connected by a bus]
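The deadlock can be reproduced in a few lines. In this hedged Python sketch, two processes P and Q (hypothetical names) share one core. A fixed schedule that runs P's trace first stalls on P's blocking Read. A scheduler that switches processes when a Read would block completes. This illustrates the problem the mapping layer addresses. It is not Sesame's scheduler.

```python
def schedule(traces, switch_on_block):
    """Execute per-process event traces that share one processing core.

    traces: {process: [("R" | "W", channel), ...]}
    switch_on_block=False: run each trace to completion in a fixed order.
    switch_on_block=True: when a Read would block, switch to another process.
    Returns "ok" or "deadlock".
    """
    tokens = {}                                  # tokens available per channel
    pos = {p: 0 for p in traces}                 # next event index per process
    while any(pos[p] < len(traces[p]) for p in traces):
        progressed = False
        for p in traces:
            while pos[p] < len(traces[p]):
                op, ch = traces[p][pos[p]]
                if op == "R" and tokens.get(ch, 0) == 0:
                    break                        # Kahn read blocks on empty channel
                tokens[ch] = tokens.get(ch, 0) + (1 if op == "W" else -1)
                pos[p] += 1
                progressed = True
            if not switch_on_block and pos[p] < len(traces[p]):
                return "deadlock"                # core stalls inside process p
        if not progressed:
            return "deadlock"                    # every process is blocked
    return "ok"

# Processes P and Q mapped onto the same core.
traces = {"P": [("R", "c1"), ("W", "c2")],
          "Q": [("W", "c1"), ("R", "c2")]}
print(schedule(traces, switch_on_block=False))   # deadlock
print(schedule(traces, switch_on_block=True))    # ok
```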
31
Sesame's mapping layer
  • Accounts for synchronization behavior
  • Mapping layer executes in the same time domain as
    the architecture model
  • Transforms application-level events into
    primitives (events) for the architecture model
  • More on this later on...
  • Tool for auto-generation of mapping layer
  • Maps application tasks (event traces) to
    architecture model components
  • Guarantees deadlock-free scheduling of
    application events

32
Sesame from a software perspective
[Figure: Sesame software structure, including SCPEx]
33
Y-chart Modeling Language (YML)
  • Flexible and persistent description (XML) of
  • The structure of application and architecture
    models (connecting library components)
  • SCPEx also supports YML!
  • The mapping of appl. models onto arch.
    models (i.e., the mapping layer)
  • YML combines a scripting language with XML
  • Simplifies descriptions of complicated structures
  • Increases expressive power of components
  • E.g., a parameterized complex interconnect
    component modeling a network of arbitrary size
  • Increases reusability
  • Re-use of components and structures
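The benefit of scripting inside a structural description is easiest to see with a parameterized interconnect. In this sketch a Python loop plays the role of YML's embedded script and builds an XML description of a ring of arbitrary size. The element and attribute names (`network`, `node`, `link`, `class`, `innode`, `outnode`) are a simplified, hypothetical schema, not the exact YML grammar.

```python
import xml.etree.ElementTree as ET

def ring_network(name, size):
    """Build a YML-like description of a ring of `size` routers.

    Element/attribute names are a simplified assumption, not real YML.
    """
    net = ET.Element("network", name=name, size=str(size))
    for i in range(size):
        ET.SubElement(net, "node", name=f"router{i}", **{"class": "Router"})
    for i in range(size):
        ET.SubElement(net, "link",
                      innode=f"router{i}", outnode=f"router{(i + 1) % size}")
    return net

xml_text = ET.tostring(ring_network("noc", 4), encoding="unicode")
print(xml_text)
```

A single parameter (`size`) now describes a whole family of interconnects, which is the kind of reuse the slide refers to.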

34
An illustrative case study: M-JPEG
  • Lossy Motion-JPEG encoder
  • Accepts both RGB and YUV formats
  • Includes dynamic quality control by on-the-fly
    adaptation of quantization and Huffman tables

35
The platform architecture
  • Bus-based shared memory multiprocessor
    architecture

36
M-JPEG case study (contd)
[Figure: exploration of M-JPEG mappings]
37
M-JPEG case study (contd)
  • Kahn Process Network: functional behavior
  • Library approach: timing behavior

38
Screenshot: model editor
39
M-JPEG design space exploration
  • Experimented with different
  • HW/SW partitionings
  • Application-architecture mappings
  • Processor speeds
  • Interconnect structures (bus, crossbar and Omega
    networks)
  • This took about 1 person-month (all modeling
    included)
  • Simulation performance: for 128×128 frames, a 270
    MHz Sun Ultra 5 SPARCstation simulated 2.3
    frames/second (≈0.43 s/frame)

40
M-JPEG design space exploration
41
M-JPEG design space exploration
42
Mapping problem: implementation gap
[Figure: application behavioral model (what?) and architecture model (how?), each expressed in primitive operations, with the implementation bridging the gap between them]
43
Mapping problem
  • Application events: Read, Write and Execute
  • Typical mismatches between application events and
    architecture primitives, for example:
  • Architecture primitives operating on different
    data granularities
  • Architecture primitives more refined than
    application events
  • Trace events from the application layer need to
    be refined
  • How?
  • Refine the application model
  • A transformation mechanism between the
    application and architecture models

44
Communication refinement
  • Let's take the mismatch of communication
    primitives as an example
  • Assume the following architecture communication
    primitives
  • Check-Data (CD)
  • Load-Data (Ld)
  • Signal-Room (SR)
  • Check-Room (CR)
  • Store-Data (St)
  • Signal-Data (SD)

45
Communication refinement (contd)
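  • Transformation rules for refining
    application-level communication events [Lieverse]
  • $R \Rightarrow \mathrm{CD} \rightarrow \mathrm{Ld} \rightarrow \mathrm{SR}$ (1)
  • $W \Rightarrow \mathrm{CR} \rightarrow \mathrm{St} \rightarrow \mathrm{SD}$ (2)
  • $E \Rightarrow E$ (3)
  • How to transform traces of application events
    using (1), (2) and (3)?

Generates $R \rightarrow E \rightarrow W$ event sequences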
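Applying rules (1) to (3) independently to each event is a simple table lookup. This Python sketch shows that naive, context-free refinement. It reproduces the sequence listed under Assumption 1 on the next slide (processor with local memory). The dictionary layout is an illustrative choice.

```python
# Transformation rules (1)-(3): application event -> architecture primitives.
RULES = {
    "R": ["CD", "Ld", "SR"],   # (1) check data, load data, signal room
    "W": ["CR", "St", "SD"],   # (2) check room, store data, signal data
    "E": ["E"],                # (3) execute is passed on unchanged
}

def refine(trace):
    """Apply the rules event by event and concatenate the results."""
    refined = []
    for event in trace:
        refined.extend(RULES[event])
    return refined

print(refine(["R", "E", "W"]))
# ['CD', 'Ld', 'SR', 'E', 'CR', 'St', 'SD']
```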
46
Communication refinement (contd)
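[Figure: processors 1, 2 and 3 and a memory (Mem) connected by a bus]
  • Assumption 1: processor 2 has local (block)
    memory
  • Transforming $R \rightarrow E \rightarrow W$ event sequences from process B
  • $R \rightarrow E \rightarrow W \Rightarrow \mathrm{CD} \rightarrow \mathrm{Ld} \rightarrow \mathrm{SR} \rightarrow E \rightarrow \mathrm{CR} \rightarrow \mathrm{St} \rightarrow \mathrm{SD}$
  • Assumption 2: processor 2 has NO local (block)
    memory
  • Transforming $R \rightarrow E \rightarrow W$ event sequences from process B
  • $R \rightarrow E \rightarrow W \Rightarrow \mathrm{CD} \rightarrow \mathrm{CR} \rightarrow \mathrm{Ld} \rightarrow E \rightarrow \mathrm{St} \rightarrow \mathrm{SR} \rightarrow \mathrm{SD}$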
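Assumption 2 shows that the refinement of one event can depend on its neighbors: the primitives of R and W become interleaved. This hedged Python sketch treats the R, E, W triple as a unit when there is no local memory. The comment gives one plausible reading of the resulting order. A pattern rewrite like this is only a stand-in for the IDF-based mechanism introduced on the next slides.

```python
RULES = {"R": ["CD", "Ld", "SR"], "W": ["CR", "St", "SD"], "E": ["E"]}

def refine(trace, local_memory=True):
    """Refine a trace; without local memory, R-E-W triples are reordered."""
    refined, i = [], 0
    while i < len(trace):
        if not local_memory and trace[i:i + 3] == ["R", "E", "W"]:
            # No local buffer: data is loaded only once output room is
            # guaranteed (CR before Ld), and the input slot is released
            # (SR) only after the result has been stored (St).
            refined += ["CD", "CR", "Ld", "E", "St", "SR", "SD"]
            i += 3
        else:
            refined += RULES[trace[i]]
            i += 1
    return refined

print(refine(["R", "E", "W"], local_memory=False))
# ['CD', 'CR', 'Ld', 'E', 'St', 'SR', 'SD']
```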

47
IDF-based trace transformation
  • Virtual processors in the mapping layer are refined
    to accomplish trace refinement
  • An Integer-controlled DataFlow (IDF) model describes
    the internal behavior of a virtual processor
  • Application events specify
  • what a virtual processor executes
  • with whom it communicates
  • Internal IDF model specifies
  • how the computations and communications take
    place at the architecture layer

48
A short Dataflow intermezzo
  • Synchronous DataFlow (SDF) [Lee, Messerschmitt]
  • Static model of computation allowing compile-time
    scheduling
  • Basic idea: each actor consumes and produces a
    fixed number of tokens each time it fires
  • Integer-controlled DataFlow (IDF) [Buck]
  • Extends SDF with dynamic integer-controlled
    switch and select actors to allow data-dependent
    execution
  • This generalization makes it more powerful (Turing
    complete), but it generally needs dynamic scheduling
  • Hard to analyze statically
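SDF's fixed token rates are what make compile-time scheduling possible. Solving the balance equations $q_{src} \cdot p = q_{dst} \cdot c$ for every edge gives the repetition vector, i.e. how often each actor fires per schedule period. A minimal Python sketch for a connected graph follows. The three-actor graph is a made-up example.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def repetition_vector(actors, edges):
    """Solve SDF balance equations for a connected graph.

    edges: list of (src, dst, tokens_produced_per_firing, tokens_consumed_per_firing).
    Raises ValueError if the rates are inconsistent (no bounded static schedule).
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, p, c in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * p / c
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * c / p
                changed = True
            elif src in rate and dst in rate and rate[src] * p != rate[dst] * c:
                raise ValueError("inconsistent rates: no bounded static schedule")
    scale = reduce(lcm, (r.denominator for r in rate.values()))
    ints = {a: int(r * scale) for a, r in rate.items()}
    g = reduce(gcd, ints.values())
    return {a: n // g for a, n in ints.items()}

# A produces 2 tokens that B consumes 3 at a time; B produces 1, C consumes 2.
print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
# {'A': 3, 'B': 2, 'C': 1}
```

IDF's switch and select actors make these rates data-dependent, so no such static solution exists in general. That is why IDF usually needs dynamic scheduling.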

49
[Figure: three layers. Application model (process network): processes A, B and C. Mapping layer (dataflow): virtual processors X, Y and Z. Architecture model (discrete event): processors connected by a bus]
50
IDF-based trace transformation (contd)
  • IDF models transform application events into
    architecture events at run-time
  • IDF models execute in the same simulation
    time-domain as the architecture model
  • timed IDF models
  • We distinguish three IDF token-channel types:
  • Intra-event dependency channels: specify
    dependencies within the refinement of an
    application event
  • Inter-event dependency channels: specify
    dependencies between refinements of different
    application events
  • Token-exchange channels: connected to the
    architecture model (accomplish timed execution)

51
Communication refinement revisited
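[Figure: processes A, B and C; processors 1, 2 and 3 and a memory (Mem) connected by a bus]
  • Assumption: processor 2 has NO local (block)
    memory
  • Transforming $R \rightarrow E \rightarrow W$ event sequences from process B
  • $R \rightarrow E \rightarrow W \Rightarrow \mathrm{CD} \rightarrow \mathrm{CR} \rightarrow \mathrm{Ld} \rightarrow E \rightarrow \mathrm{St} \rightarrow \mathrm{SR} \rightarrow \mathrm{SD}$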

52
Communication refinement revisited (2)
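[Figure: the event trace of process B (R, E, W) enters virtual processor Y, whose IDF model (using a switch actor) refines it into CD, CR, Ld, E, St, SR and SD; these exchange tokens with virtual processors X and Z and with processor 2 and the bus in the architecture model]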
53
Communication refinement revisited (3)
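[Figure: processes A, B and C; virtual processors X, Y and Z; processors 1, 2 and 3 and a memory (Mem) connected by a bus]
$R \rightarrow E \rightarrow W \rightarrow R \rightarrow E \rightarrow W \Rightarrow \mathrm{CD} \rightarrow \mathrm{CR} \rightarrow \mathrm{Ld(line)} \rightarrow \mathrm{E(line)} \rightarrow \mathrm{St(line)} \rightarrow \mathrm{Ld(line)} \rightarrow \mathrm{E(line)} \rightarrow \mathrm{St(line)} \rightarrow \mathrm{Ld(line)} \rightarrow \mathrm{E(line)} \rightarrow \mathrm{St(line)} \rightarrow \mathrm{SR} \rightarrow \mathrm{SD}$
  • Now assume that
  • processor 2 operates on lines (3 lines = 2
    blocks)
  • processor 2 has a single-entry local line buffer
  • processors 1 and 3 still operate at block
    granularity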

54
Communication refinement revisited (4)
[Figure: IDF model refining the R, E, W event trace from process B. Switch and select actors, driven by alternating control sequences (…,1,0,1,0 and 0,1,0,1,…), steer tokens between the actors CD, CR, Ld, E(line), Ld(line), St(line), SR and SD, which exchange tokens with virtual processors X and Z and with processor 2 in the architecture model]
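The switch and select actors in this IDF model are what allow data-dependent routing of tokens. Below is a hedged, untimed Python sketch of their semantics. A switch sends each data token to the output port named by its integer control token, and a select reads the next token from the input port named by its control token. Real IDF actors fire one token at a time inside a dynamic scheduler. Processing whole sequences at once is a simplification made here.

```python
def switch(tokens, controls):
    """IDF-style switch: route each data token to the port named by its control token."""
    outputs = {}
    for tok, ctrl in zip(tokens, controls):
        outputs.setdefault(ctrl, []).append(tok)
    return outputs

def select(inputs, controls):
    """IDF-style select: for each control token, take the next token from that input port."""
    cursors = {port: iter(toks) for port, toks in inputs.items()}
    return [next(cursors[ctrl]) for ctrl in controls]

# An alternating control sequence splits a stream and merges it back in order.
branches = switch(["a", "b", "c", "d"], [0, 1, 0, 1])
print(branches)                          # {0: ['a', 'c'], 1: ['b', 'd']}
print(select(branches, [0, 1, 0, 1]))    # ['a', 'b', 'c', 'd']
```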
55
A case of computational refinement
  • The application models a synthetic 2D-IDCT by
    computing two consecutive IDCT operations at
    block level
  • At this high level, execute(block) performs a 1D-IDCT
    on a data block

while(1) { read(block); execute(block); write(block); }
while(1) { read(block); execute(block); write(block); }
while(1) { write(block); execute(block); }
while(1) { read(block); execute(block); write(block); }
while(1) { read(block); execute(block); }
56
Computational refinement (contd)
  • Two target architectures are explored
[Figure: two target architectures with processors Proc A, Proc B, Proc C and Proc D and a memory (Mem)]
  • And two scenarios...
  • Scenario 1: All processing elements (PEs) are
    modeled at block level
  • Scenario 2: The PE models onto which the IDCT
    tasks are mapped operate at line level and are
    pipelined

57
Computational refinement (contd)
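  • Trace transformation rules
  • $R(\mathrm{block}) \Rightarrow R(\mathrm{line}) \rightarrow \cdots \rightarrow R(\mathrm{line})$ (1)
  • $W(\mathrm{block}) \Rightarrow W(\mathrm{line}) \rightarrow \cdots \rightarrow W(\mathrm{line})$ (2)
  • $E(\mathrm{block}) \Rightarrow E(\mathrm{line}) \rightarrow \cdots \rightarrow E(\mathrm{line})$ (3)
  • $E(\mathrm{line}) \Rightarrow e_1 \rightarrow \cdots \rightarrow e_n$ (4)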
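A minimal Python sketch of rules (1) to (4). `lines_per_block` and `ops_per_line` are hypothetical parameters: the slide leaves both unspecified ("…" and $n$). Passing `ops_per_line` additionally applies rule (4), which splits each E(line) into the finer operations $e_1, \dots, e_n$, as a pipelined PE model would need.

```python
def refine_to_lines(trace, lines_per_block, ops_per_line=None):
    """Apply rules (1)-(3) to block-level events "R", "W", "E";
    if ops_per_line is given, also apply rule (4) to each E(line)."""
    refined = []
    for event in trace:
        for _ in range(lines_per_block):                 # rules (1)-(3)
            if event == "E" and ops_per_line:
                # rule (4): E(line) -> e1, ..., en
                refined.extend(f"e{k}" for k in range(1, ops_per_line + 1))
            else:
                refined.append(f"{event}(line)")
    return refined

print(refine_to_lines(["R", "E", "W"], lines_per_block=2))
# ['R(line)', 'R(line)', 'E(line)', 'E(line)', 'W(line)', 'W(line)']
```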

58
Computational refinement
[Figure: processes A, B and C mapped via virtual processors (X and Z shown) onto a bus-based architecture model]
59
Computational refinement (contd)
60
(No Transcript)
61
(No Transcript)
62
Putting Sesame to use: an example design flow
[Figure: design flow linking applications (Motion-JPEG encoder), system-level architecture exploration in the architecture simulation environment, Compaan/Laura (Leiden University) producing code suitable for FPGA execution (the DCT), and the Molen reconfigurable architecture framework (Delft University) for experimentation]
63
A real implementation using Compaan/Laura/Molen
Mapping M-JPEG on the Molen platform architecture
The DCT kernel
for k = 1:1:4,
  for j = 1:1:64,
    Pixel(k,j) = In(inBlock);
  end
end
for k = 1:1:4,
  if k < 2,
    for j = 1:1:64,
      Pixel(k,j) = PreShift(Pixel(k,j));
    end
  end
  Block = 2D_dct(Pixel);
end
for k = 1:1:4,
  for j = 1:1:64,
    outBlock = Out(Pixel(k,j));
  end
end
[Figure: tool flow via the C compiler and Laura]
64
System-level simulation experiment
  • Modeling Molen with DCT mapped onto CCU
  • Validation against real implementation
  • Information from Compaan/Laura/Molen used for
    calibration of architecture model
  • Apply architecture model refinement
  • Keep M-JPEG application model untouched
  • DCT component in architecture model is refined
  • Operates at pixel level
  • Abstract pipeline model, deeply pipelined
  • Other architecture components operate at
    (pixel-)block level
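An abstract pipeline model can capture timing with very little detail. Assuming a deeply pipelined unit with no stalls, $n$ items take about $D + (n-1) \cdot II$ cycles, where $D$ is the pipeline depth and $II$ the initiation interval. The values below are hypothetical. The slides give no depth or interval for the CCU.

```python
def pipeline_cycles(n_items, depth, initiation_interval=1):
    """Cycles for n_items through a `depth`-stage pipeline that accepts a new
    item every `initiation_interval` cycles (stall-free assumption)."""
    if n_items == 0:
        return 0
    return depth + (n_items - 1) * initiation_interval

def unpipelined_cycles(n_items, depth):
    """Same unit without pipelining: each item occupies it for `depth` cycles."""
    return n_items * depth

# Hypothetical: 64 pixels per block, 20-stage pipeline.
print(pipeline_cycles(64, 20), unpipelined_cycles(64, 20))   # 83 1280
```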

65
Sesame's IDF-based model refinement
[Figure: M-JPEG application model (processes A, B and C); mapping layer with virtual processors (X and Z shown), where the DCT is mapped onto the CCU and refined; Molen architecture model with a bus]
66
DCT virtual processor
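[Figure: a scheduler in the DCT virtual processor reads the event trace and a control trace and drives the actors Type in, Block in, P1, P2, 2d-dct and Block out, which exchange tokens to/from the architecture model]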
67
Simulation results
  • Full software implementation
  • Simulation: 85024000 cycles
  • Real Molen: 84581250 cycles
  • Error: 0.5%
  • DCT mapped onto CCU
  • Simulation: 40107869 cycles
  • Real Molen: 39369970 cycles
  • Error: 1.9%
  • No tuning was done!
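The error figures are consistent with the relative deviation of the simulated cycle count from the real Molen cycle count. Taking the real measurement as the reference is an assumption, but it reproduces both numbers:

```python
def relative_error_percent(simulated, real):
    """Deviation of the simulated cycle count, relative to the real measurement."""
    return abs(simulated - real) / real * 100

results = {
    "Full software implementation": (85024000, 84581250),
    "DCT mapped onto CCU": (40107869, 39369970),
}
for name, (simulated, real) in results.items():
    print(f"{name}: {relative_error_percent(simulated, real):.1f}%")
# Full software implementation: 0.5%
# DCT mapped onto CCU: 1.9%
```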

68
Where are we going?
  • Some ongoing and future work

69
NoC modeling
  • So far, we mainly modeled bus-based systems
  • Networks-on-Chip (NoC) will be our (near) future
  • Standardized interfaces
  • Scalable (point-to-point) networks
  • Much more complex protocols (protocol stack?)
  • QoS aspects
  • Modeling NoCs
  • Topologies, switching and routing methods,
    flow-control, protocols, QoS, etc.
  • Communication mapping
  • Modeling at multiple abstraction levels
  • Gradual refinement
  • Role of IDF models

70
Communication mapping
With more complex Networks-on-Chip, routing
information is needed
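As an example of the routing information a communication mapping might need, the sketch below computes dimension-ordered (XY) routes on a 2D mesh: first along x, then along y. XY routing is chosen only as a familiar deterministic scheme. The slides do not say which topology or routing algorithm Sesame's NoC models use.

```python
def xy_route(src, dst):
    """Dimension-ordered (XY) route on a 2D mesh: first along x, then along y.
    Returns the router coordinates visited, including both endpoints."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

# A channel mapped from the PE at (0, 0) to the PE at (2, 1) uses these routers:
print(xy_route((0, 0), (2, 1)))   # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

A mapping layer could attach such a hop list to each application channel, so the architecture model knows which links and routers a message occupies.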
71
Architecture model calibration
Initial derivation of latency parameters
  • documentation
  • educated guess
  • performance budgeting (what is the required
    parameter range?)

Next step: calibration with lower-level, external
simulation models or prototypes, e.g.
  • Instruction set simulators (ISSs)
  • Compaan/Laura framework

72
Calibration using an ISS
[Figure: Kahn process C executes on an ISS (e.g., SimpleScalar); its communication with processes 1 and 2, read(1,...) and write(2,...), passes through API_read(C,...) and API_write(C,...) calls around the annotated computation]
The ISS measures cycle times of annotated code
fragments
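A minimal sketch of the calibration step, assuming the ISS reports cycle counts per annotated fragment. Each initial latency guess is replaced by the mean of the measured samples. Operations without measurements keep their initial value. All operation names and numbers are hypothetical.

```python
from statistics import mean

def calibrate(latencies, measurements):
    """Replace latency parameters with the mean cycle count measured by an ISS.

    latencies: {operation: initial estimate in cycles}
    measurements: {operation: [measured cycles per annotated fragment, ...]}
    """
    calibrated = dict(latencies)
    for op, samples in measurements.items():
        if samples:
            calibrated[op] = round(mean(samples))
    return calibrated

initial = {"DCT": 1000, "Quant": 400, "VLE": 600}      # educated guesses
iss_samples = {"DCT": [812, 790, 805], "Quant": [350, 362]}
print(calibrate(initial, iss_samples))
# {'DCT': 802, 'Quant': 356, 'VLE': 600}
```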
73
Mixed-level system simulation
  • Zoom in on interesting system components in
    architecture model
  • Simulate these components at a lower level
  • Retain high abstraction level for other
    components
  • Saves modeling effort
  • May save simulation overhead
  • Integration of external simulation models
  • ISSs, SystemC models, etc.
  • Also allows calibration of higher-level models
  • BUT
  • Mixed-level simulation can be complex!
  • multiple time domains and time grain sizes
    (synchronization)
  • differences in protocol and data granularity of
    components

74
Mixed-level system simulation (contd)
Embedding external models
IDF-based refinement
75
Does mixed-level need to be hard? NO!
[Figure: process C runs on an ISS (e.g., SimpleScalar) and connects through an API and buffers to virtual processors, which receive Read, E(N cycles) and Write events]
Trace calibration!
76
Towards real design space exploration
  • Sesame supplies basic methods and tools for
    evaluating application, architecture, and mapping
    combinations
  • Simulating the entire design space is not an option
  • More is needed to explore large design spaces
  • What will be the initial design(s) to evaluate?
  • How to react when the evaluated architecture does
    not suffice?
  • We need steering before and during simulation
  • Design decisions using analytical modeling
  • Finding Pareto-optimal candidates using
    multi-objective optimization
  • Design evaluation using simulation
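Finding Pareto-optimal candidates means keeping only the designs that no other design beats on every objective. Below is a minimal Python filter, assuming all objectives are minimized. The design points are invented for illustration. A multi-objective optimizer such as an evolutionary algorithm would generate the candidates, and a filter like this would keep the front.

```python
def pareto_front(designs):
    """Return the designs not dominated by any other design.

    designs: {name: (cost, latency, power)}; every objective is minimized.
    """
    def dominates(a, b):
        # a dominates b: no worse in every objective, strictly better in one
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    return {name: obj for name, obj in designs.items()
            if not any(dominates(other, obj) for other in designs.values())}

designs = {"A": (10, 5, 3), "B": (8, 6, 3), "C": (12, 6, 4), "D": (9, 7, 2)}
print(sorted(pareto_front(designs)))   # ['A', 'B', 'D']  (C is dominated by A)
```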

77
Real design space exploration (contd)
Heuristic methods like evolutionary algorithms
78
Credits
This work would not have been possible without
the (ground-laying work of the) following people
  • Cagkan Erbas
  • Simon Polstra
  • Berry van Halderen
  • Joseph Coffland
  • Frank Terpstra
  • Mark Thompson
  • Paul Lieverse
  • Bart Kienhuis
  • Ed Deprettere
  • Pieter van der Wolf
  • Kees Vissers
  • Vladimir Zivkovic
  • Todor Stefanov

79
For more information
  • URL www.science.uva.nl/andy/publications.html
  • or
  • email andy_at_science.uva.nl

Sesame software can be found at sesamesim.sourceforge.net