Predictable Design of Embedded Systems using Networked Architectures - PowerPoint PPT Presentation

About This Presentation
Title:

Predictable Design of Embedded Systems using Networked Architectures

Description:

... Informal system specification Design practice Design complexity problem Hitting the ... OS task switching interrupts cache strategy ... Basics of Product ... – PowerPoint PPT presentation

Slides: 74
Provided by: esEleTu
Learn more at: http://www.es.ele.tue.nl
Transcript and Presenter's Notes

Title: Predictable Design of Embedded Systems using Networked Architectures


1
Predictable Design of Embedded Systems using
Networked Architectures
  • Henk Corporaal
  • www.ics.ele.tue.nl/heco
  • ASCI Winterschool on Embedded Systems
  • Rockanje, March 2006

2
Outline
  • Trends and design problems
  • Unpredictability
  • Platforms
  • Predictable design
  • Proposed design flow
  • Open issues
  • Note this lecture is not about a solved problem

3
Outline
  • Trends and design problems
  • Embedded systems everywhere
  • Design practice
  • Design complexity
  • Memory wall
  • Unpredictability
  • Platforms
  • Predictable design
  • Design flow
  • Open issues

4
Embedded systems everywhere
  • Convergence of 3 Cs
  • computers, communications and consumer
    electronics
  • The computer enters the 3rd phase
  • computing power - networking - intelligent
    processing
  • The world is 1 network
  • wherever, whenever, all information and
    communication available

We get a smart environment
5
Design practice: Informal system specification
6
Design practice
System
Structure description
Behavioral specification
Algorithm
R/T
Logic
circuit
  • Y-Chart (Gajski-Kuhn)
  • Design Flow is path in Y chart
  • Till RT-level largely manual flow

Physical realization
7
Design complexity problem
8
Hitting the memory wall
Figure: processor versus DRAM performance over time (log scale 1 to 1000, 1980 to 2005)
µProc (CPU): 55%/year (Moore's Law)
DRAM: 7%/year
Source: Patterson
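
A rough worked example of the widening gap, assuming the chart's two figures are annual improvement rates of 55% for processors and 7% for DRAM (the percent signs do not survive in this transcript). The function name and the 1980 starting point, where both curves equal 1, are illustrative choices and not taken from the slide.

```python
def performance_gap(years, cpu_rate=0.55, dram_rate=0.07):
    """Ratio of processor to DRAM performance after `years`, both starting at 1."""
    return ((1 + cpu_rate) / (1 + dram_rate)) ** years


# Under these assumed rates the gap itself grows by roughly 45% per year.
for year in (1985, 1990, 2000, 2005):
    print(year, round(performance_gap(year - 1980), 1))
```

Under these assumptions the ratio passes three orders of magnitude within the 25 years shown on the time axis.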
9
Outline
  • Trends and design problems
  • Unpredictability
  • Platforms
  • Predictable design
  • Proposed design flow
  • Open issues

10
Unpredictability at all levels
applications
architectures
DSM VLSI design
  • Uncertainty increases at all levels

11
Application: Two forms of unpredictability
  • Applications can be data dependent
  • Applications may have different scenarios

12
In addition: dynamically changing set of applications
  • Multi-standard modem operation
  • Several applications have to be activated
    simultaneously
  • Too many combinations for an analysis at design
    time (non deterministic events)

Philips EVP
13
Architecture unpredictability
ext. mem
  • Local schedulers
  • OS
  • task switching
  • interrupts
  • cache strategy
  • cache pollution
  • interconnect
  • busses, bridges
  • networks
  • memory controllers
  • external memory
  • e.g. RR, TDMA, FCFS, LRU, EDLF, FIFO, priority,

Figure labels: mem arb., interconnect
What is the global behavior (end-to-end), composed
of interacting local solutions?
14
DSM VLSI Unpredictability
  • Global wiring delay becomes dominant over gate
    delay (timing closure)

15
DSM VLSI Unpredictability
Length of Isosynchronous zone as function of
frequency
  • Other DSM problems
  • Clock distribution, skew
  • VDD and VSS voltage drop
  • Signal integrity, cross-talk
  • Variance in process parameters increases

16
Unpredictability: Design Closure problems
  • Design closure
  • a realization meets all requirements, including
    functionality, speed, power, area, yield, etc.,
    without design iterations

application
mapping scheduling
architecture
placement routing
Closure problem at all levels
FPGA realization
VLSI realization
17
Unpredictability: Design Closure problems
Computational Requirements ?
Orders of Magnitude
Time ?
Mapping with performance guarantees looks
impossible !!
18
Solution ingredients
  • Higher abstraction levels
  • SW and HW IP reuse / PnP principle
  • Standards
  • Avoid large design iterations
  • Design correct by synthesis
  • Avoid worst case resource requirements
  • How do we achieve all of this?

19
Outline
  • Trends and design problems
  • Unpredictability
  • Platforms
  • Predictable design
  • Design flow
  • Open issues

20
What is a platform?
  • Definition
  • A platform is a generic, but domain specific
  • information processing (sub-)system
  • Generic means that it is flexible, containing
    programmable component(s).
  • Platforms are meant to quickly realize your
    next system
  • (in a certain domain).
  • Single chip?

21
Platforms, why?
  • Reuse
  • Short Time-to-Market
  • High Quality
  • Flexible and Programmable
  • Large software component
  • Standardization
  • Optimized for specific domain
  • and you do not have to solve this design closure
    problem !!

22
Platforms separate the design communities !
23
Platform examples: Digital camera
Sanyo Okada99
24
TI OMAP
Up to 192Mbyte off-chip memory
25
SpaceCake (Philips research)
  • Homogeneous set of equal tiles
  • Per tile e.g.
  • n MIPS
  • m TriMedia
  • Accelerators
  • k L2 Cache bank
  • Shared memory
  • Cache coherency
  • Big interconnect switch
  • Inter Tile
  • Router
  • Message passing
  • Working on inter tile cache coherence

switch
L2 cache memory banks
Single tile
26
IMAGINE Stream Processor (Stanford)
  • IMAGINE: SIMD of VLIWs
  • It is controlled by a host processor, which sends
    it stream instructions (Load, store, receive,
    send, VLIW op, load microcode)

27
Hybrid FPGAs Xilinx Virtex 4-Pro
Memory blocks Multipliers
PowerPCs
ReConfig. logic
Reconfigurable logic blocks
Courtesy of Xilinx (Virtex II Pro)
28
Fundamental platform design decisions
  • Homogeneous versus Heterogeneous ?
  • Bus versus Network ?
  • Shared memory versus Message passing ?
  • QoS support, Guarantees built-in ?
  • Generic versus Application specific ?
  • What types of parallelism to support ?
  • ILP, DLP, TLP
  • Focus on Performance, Power or Cost ?
  • Memory organisation ?
  • HW or SW reconfigurable ?
  • And further
  • OS support, Middleware ?
  • Mapping support?

29
Homogeneous or Heterogeneous
  • Homogeneous
  • replication effect
  • memory dominated anyway
  • solve realization issues once and for all
  • less flexible

30
Homogeneous or Heterogeneous
  • Heterogeneous
  • more flexible
  • better fit to application domain
  • smaller increments
  • no tile reuse

31
Homogeneous or Heterogeneous
  • Middle of the road approach
  • Flexible tiles
  • Fixed tile structure at top level

tile
router
32
HW or SW reconfigurable?
reset
Reconfiguration time
loopbuffer
context
Subword parallelism
1 cycle
fine
coarse
Data path granularity
33
Outline
  • Trends and design problems
  • Unpredictability
  • Platforms
  • Predictable design
  • Current practice
  • Predictability
  • Architecture consequences
  • Design consequences
  • Design flow
  • Open issues

34
How should we design ?
  • Trajectory, from Idea to Realization
  • Decisions based on models
  • Abstract from implementation details (not all
    known yet)
  • Relatively cheap to create, validate and simulate

Idea
Design Time
Concepts Requirements
Design Problem
  • Generate Ideas
  • Construct Models
  • Evaluate Properties
  • Make Design Decisions

Steers
Realization
35
Current practice: Mapping, easy, but...
Idea
  • Given
  • reference C code for application, e.g. MPEG-4
    Motion Estimation
  • platform SUPERDUPER-LX50
  • Task
  • map application on architecture
  • But wait a moment
  • me@work> CC o2 mpeg4_me mpeg4_me.c
    Thank you for running SUPERDUPER-LX50 compiler.
    Your program uses 257321886 bytes memory, 78 Watt,
    428798765291 clock cycles

ab5d for (...) ..
36
Current design process
application
mapping
constraints
OK ?
no
yes
  • Post analysis check constraints after mapping
  • Simulation based
  • Does it still work for other data ?
  • Does it still work when other applications are
    active ?
  • Too many iterations
  • Easy to program, hard to tune
  • Can this be improved ?
  • e.g. Constraints input

37
Predictable design
  • What is it?
  • Being able to reason at a high level about a
    design (in terms of functional and non-functional
    properties) and
  • Being able to realize this design without time
    consuming iterations in the design flow (design
    closure)
  • How
  • Predictable architecture
  • Making resources predictable
  • Proper modeling of less predictable elements
  • Predictable design flow
  • Compositionality
  • Composability
  • Design time analysis ? Run time analysis

38
Making architectures predictable
  • Getting rid of all unpredictable elements
  • Caches ?
  • No problem, but WCET estimation may be big and
    unacceptable !
  • Software controlled
  • locked cache lines
  • non-cachable memory
  • controlled replacement
  • Shared memory
  • Communication

39
Making architectures predictable NoC Philips
AETHEREAL
Router provides both guaranteed throughput (GT)
and best effort (BE) services to communicate with
IPs. Combination of GT and BE leads to
efficient use of bandwidth and simple programming
model.
Router Network
Network Interface
IP
Network Interface
Network Interface
IP
IP
40
Making the NoC predictable how to support GT
traffic?
  • Time wheel concept
  • control injection traffic at network interface

time
1
8
2
7
3
6
5
4
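
A minimal sketch of the time wheel idea, assuming a wheel of 8 slots as in the figure labels above. The class and method names are hypothetical, and handing an unreserved or unused slot to best-effort traffic is an illustrative policy choice; this is not the actual AETHEREAL network interface implementation.

```python
from collections import deque


class TimeWheel:
    """TDMA slot table at a network interface (hypothetical, simplified)."""

    def __init__(self, num_slots=8):
        self.slots = [None] * num_slots  # slot index -> owning GT connection
        self.gt_queues = {}              # GT connection -> pending flits
        self.be_queue = deque()          # pending best-effort flits

    def reserve(self, connection, slot_indices):
        """Give `connection` exclusive ownership of the listed slots."""
        for s in slot_indices:
            if self.slots[s] is not None:
                raise ValueError(f"slot {s} already reserved")
        for s in slot_indices:
            self.slots[s] = connection
        self.gt_queues.setdefault(connection, deque())

    def guaranteed_bandwidth(self, connection):
        """Fraction of the link bandwidth reserved for `connection`."""
        return self.slots.count(connection) / len(self.slots)

    def inject(self, cycle):
        """Return the flit injected in this cycle, or None if nothing is sent."""
        owner = self.slots[cycle % len(self.slots)]
        if owner is not None and self.gt_queues[owner]:
            return self.gt_queues[owner].popleft()
        if self.be_queue:  # assumed policy: idle slots carry best-effort traffic
            return self.be_queue.popleft()
        return None
```

Because a connection's slots recur every wheel revolution, its throughput is bounded from below by its share of the slots, independent of what other connections do.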
41
Making the design flow predictable
Compositionality
42
Making the design flow predictable
  • Design time
  • Determine upper bounds on time and resources
  • Pareto curves
  • Scenario discovery
  • separate your application into parts for which
    upper bounds are not too far from worst case

43
What do we want ? Design time analysis
  • Single application
  • Reasoning about end-to-end timing constraints
    (for given resources and quality)
    predictability
  • Which local arbitration mechanisms are needed ?
  • How to translate this to the global level ?

44
Scenarios MP3
45
What do we want ? Composability
  • Multiple applications
  • If app. 1 and app. 2 fit each individually, what
    can be said about the combination ?
  • Concept of virtual platform

46
Predictability, Composability: Can we add Pareto
points?
application 1
application 2
Q
Q
(q1,c1)
(q2,c2)
Cost (resources)
Cost (resources)

(q1+q2, c1+c2) ?
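
A minimal sketch of what "adding" Pareto points could look like, under the strong assumptions that the two applications do not interfere and that quality and cost are both additive; whether that holds is exactly the question the slide raises. Points are (quality, cost) pairs, higher quality and lower cost are better, and all names and numbers are hypothetical.

```python
from itertools import product


def dominates(p, q):
    """p dominates q if it is at least as good in quality and cost, and differs."""
    return p[0] >= q[0] and p[1] <= q[1] and p != q


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    unique = set(points)
    return sorted(p for p in unique if not any(dominates(o, p) for o in unique))


def combine(front1, front2):
    """Sum every pair of points, then prune the dominated combinations."""
    sums = [(q1 + q2, c1 + c2) for (q1, c1), (q2, c2) in product(front1, front2)]
    return pareto_front(sums)


app1 = [(1, 10), (2, 30)]  # illustrative (quality, cost) points
app2 = [(1, 20), (3, 40)]
print(combine(app1, app2))
```

Here the combination (3, 50) is dropped because (4, 50) offers more quality at the same cost, so the combined curve is not simply every pairwise sum.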
47
Problem Predictable Resource utilization?
50
50
50
50
A
B
50
50
48
Problem Predictable Resource utilization?
Add ordering dependences (edges)
Only 50% processor utilization!
49
Where is the problem?
  • Different throughput obtained for different order
    of actors
  • The number of possible orderings in the overall graph
    increases exponentially with the number of actors and
    individual graphs
  • Very difficult to do a complete analysis to
    obtain an optimal order
  • Hard to model and analyze different arbitration
    strategies realistically
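
To give a feel for the growth, the sketch below counts the ways to interleave actors on one shared processor. It assumes, purely for illustration, that each application is a sequential chain of actors fired once per iteration; real dataflow graphs are less constrained, so this is not the analysis behind the slide.

```python
from math import factorial


def interleavings(chain_lengths):
    """Number of ways to merge independent sequential chains into one order.

    This is the multinomial coefficient (n1 + n2 + ...)! / (n1! * n2! * ...).
    """
    total = factorial(sum(chain_lengths))
    for n in chain_lengths:
        total //= factorial(n)
    return total


for apps in ([3, 3], [3, 3, 3], [5, 5, 5, 5]):
    print(apps, interleavings(apps))
```

Two chains of three actors already allow 20 orders, three such chains allow 1680, and four chains of five actors allow more than ten billion.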

50
Problem Too many possibilities!
A
B
C
51
So, what is Composability?
  • The degree to which we can analyze the
    applications in isolation
  • Throughput, Latency, Resource utilization,
    Deadlock, Switching / reconfiguration overhead,
    etc.
  • Design time analysis for complete system is too
    expensive and often infeasible
  • Each job should be executed as if it had access
    to its own dedicated resources: virtualization
  • Consider applications separately and then reason
    about the behavior of overall system

52
Providing a Bound for Resources
  • Arbitration strategy plays an important role in
    determining resource requirement
  • A naive strategy leads to over-estimation of
    resources
  • Worst-case estimate is not always possible
  • Need predictable arbitration mechanism
  • More realistic worst case bounds
  • Handle dynamism in the system
  • An overall quality versus resources Pareto curve
    needed

53
Making the design flow predictable Run-time
aspects
  • Scalable applications
  • QoS management

Application n / Scenario m
Local manager
QoS protocol
Global manager
Platform
54
Match quality with resources
55
Outline
  • Trends and design problems
  • Unpredictability
  • Platforms
  • Predictable design
  • Design flow
  • Open issues

56
Design flow
Idea
Requirements spec
Models
POOSL/SystemC
Spec
Reactive Process Network
Kahn Process Network (YAPI)
BDF
SDF
correct by synthesis
Platform
57
RPN (Reactive Process Networks) events and
streaming
  • Processing of events
  • Finite State Machine
  • Controlling host-CPU (e.g. ARM)
  • RTOS hard real-time
  • classical SW complexity

Event_in
Event_out
status
mode
  • Soft Real-time
  • Compute intensive
  • Special hardware

Stream_in
Stream_out
58
POOSL Modeling Language
  • Mathematically defined semantics
  • Allows formal analysis of model properties
  • Can formally describe
  • concurrency
  • synchronous communication
  • timing (delay statements)
  • functionality

P1
P2
delay 1
59
POOSL Phases of Model Execution
State space
State space
State space
Synchronous time passage
Asynchronous actions execution
model time
60
From Model to Realization
Possible execution (timed) traces:
(S1, t1), (S2, t1), (S3, t1+d1), (S5, t1+d1)
(S1, t1), (S2, t1), (S4, t1+d2), (S6, t1+d2)
(S1, t1), (S2, t1+wcet(a)), (S3, t1+d1), (S5, t1+d1+wcet(b))
(S1, t1), (S2, t1+wcet(a)), (S4, t1+wcet(a)+wcet(c)), (S6, t1+d2)
a()() sel delay d1 b()() or c()() delay d2 les
61
ε-Hypothesis: property preservation
  • If the time-deviation between two timed execution
    traces is less than ε, then, if one trace
    satisfies a real-time property, that property,
    weakened up to ε, is preserved in the second one
    as well

ε1, ε2 < ε
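
A small sketch of how the hypothesis can be read for a deadline property. The transcript does not give the exact definition of time-deviation, so the code assumes it is the largest timestamp difference between two traces that visit the same states; the trace values, the bound and the function names are all illustrative.

```python
def time_deviation(trace_a, trace_b):
    """Largest timestamp difference between two traces with equal state sequences."""
    if [s for s, _ in trace_a] != [s for s, _ in trace_b]:
        raise ValueError("traces visit different state sequences")
    return max(abs(ta - tb) for (_, ta), (_, tb) in zip(trace_a, trace_b))


def meets_deadline(trace, state, deadline):
    """True if `state` is reached no later than `deadline`."""
    return any(s == state and t <= deadline for s, t in trace)


# Hypothetical model trace and realization trace, times in integer ticks.
model = [("S1", 0), ("S2", 0), ("S3", 50), ("S5", 50)]
real = [("S1", 0), ("S2", 4), ("S3", 50), ("S5", 53)]
bound = 5  # the assumed deviation bound

if time_deviation(model, real) < bound and meets_deadline(model, "S5", 50):
    # The weakened property: the deadline is relaxed by the bound.
    print(meets_deadline(real, "S5", 50 + bound))
```

The realization misses the original deadline of 50 ticks but satisfies the deadline weakened by the bound, which is the kind of preservation the hypothesis describes.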
62
Extending SDF
  • SADF Scenario Aware Data Flow
  • Can deal with dynamism
  • Still possible to reason about
  • deadlock,
  • resource utilization,
  • latency and throughput
  • Currently implemented in POOSL

63
SADF example MPEG-2 Decoder
  • Pipelined MPEG-2 decoder for I and P frames
  • VLD and IDCT fire per macro-block
  • MC and RC fire per frame
  • FD (frame detector) models control part of
    VLD that determines frame type
  • Image size 176x144
  • I-frame
  • 99 macro-blocks
  • No motion vectors
  • Px-frame
  • x macro-blocks
  • Motion vectors from VLD to MC
  • Previous frame from RC to MC
  • P0-frame (still video)
  • Copy previous frame
  • FD model based on occurrence probability of frame
    types
  • Execution time distributions of kernels
    determined with profiling tool

Rate   I    P0   Px
a      0    0    1
b      0    0    x
c      99   1    x
d      1    0    1
e      99   0    x
x: 30, 40, 50, 60, 70, 80, 99
64
Results for MPEG-2 Decoder
Process   Throughput   Rel. error
VLD       0.063        0.036
IDCT      0.063        0.036
MC        0.00106      0.190
RC        0.00106      0.191
  • Time unit: 1 kCycle

Accuracy results based on confidence levels of
0.95

Latency between successive firings (rel. error in parentheses)
Process   Max.   Average         Variance
VLD       710    15.99 (0.031)   75.38 (0.18)
IDCT      698    15.99 (0.031)   56.45 (4.99)
MC        3305   940.3 (0.017)   2.4105 (3.46)
RC        2216   940.3 (0.017)   1.5105 (4.99)

Channel memory between processes (rel. error in parentheses)
Channel        Maximum occupancy   Time-average occupancy   Time-variance in occupancy
VLD and IDCT   9                   1.910 (0.064)            0.528 (1.99)
IDCT and RC    154                 60.19 (0.178)            671.8 (4.55)
VLD and MC     133                 34.73 (0.517)            698.4 (4.39)
MC and RC      1                   0.577 (0.561)            0.244 (3.27)
65
Design flow
  • Run-time
  • Combine pareto points
  • exploit pareto algebra
  • QoS management / scalable application

66
Mapping multiple jobs
T1
T2
T0
  • Multiple jobs can be active simultaneously.
  • When can a second job start ?
  • Are the requested resources available ?
  • If not, can the quality level be lowered ?
  • If not, can other jobs go for a lower quality ?
  • If yes, independent from other jobs ?
  • How to give guarantees?
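
The questions above can be read as an admission procedure. The sketch below is one hypothetical greedy version: each job lists its resource demand per quality level, from best to lowest quality, and the manager first lowers the new job and only then the running jobs. The function names, the data layout and the greedy order are assumptions for illustration; the result is not guaranteed to be optimal.

```python
def used(jobs, choice):
    """Total resource demand of the chosen quality level of every job."""
    return sum(jobs[name][choice[name]] for name in choice)


def admit(capacity, jobs, choice, new_name, new_levels):
    """Try to start a new job; return the level choice per job, or None if rejected.

    jobs:   {name: [demand at best quality, ..., demand at lowest quality]}
    choice: {name: index into jobs[name]} for the jobs already running
    """
    jobs = {**jobs, new_name: new_levels}
    trial = {**choice, new_name: 0}
    # Are the resources available? If not, lower the new job's quality first.
    while used(jobs, trial) > capacity and trial[new_name] < len(new_levels) - 1:
        trial[new_name] += 1
    # If it still does not fit, lower the running jobs one level at a time.
    for name in choice:
        while used(jobs, trial) > capacity and trial[name] < len(jobs[name]) - 1:
            trial[name] += 1
    return trial if used(jobs, trial) <= capacity else None


running = {"T0": [60, 40], "T1": [30, 20]}  # illustrative demands
levels = {"T0": 0, "T1": 0}
print(admit(100, running, levels, "T2", [30, 20]))
```

In this example T2 only fits after both it and T0 drop one quality level, which shows why admitting a job is generally not independent of the other jobs.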

67
Combining Pareto points
Application 1
Application 2
Cost
Cost
80
100
Cycle Budget
Cycle Budget
  • A new thread frame coming
  • 20 cycle budgets available

Application 3
Cost
Cycle Budget
68
Combining Pareto points
Application 1
Application 2
Cost
Cost
80
100
Cycle Budget
Cycle Budget
Application 3
Cost
feasible, but optimal?
20
Cycle Budget
69
Combining Pareto points
Application 1
Application 2
Cost
Cost
cost increase
?1
80
80
100
Cycle Budget
Cycle Budget
Application 3
Cost
a better solution
cost decrease and
?2 > ?1
40
20
Cycle Budget
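
A brute-force sketch of the trade-off in these three slides: each application offers several (cycle budget, cost) Pareto points, and the run-time manager picks one point per application so that the budgets fit within the total while the summed cost is minimal. The total of 100 and the budgets 80, 20 and 40 echo the figure labels; the cost values, application names and function name are made up for illustration.

```python
from itertools import product


def best_allocation(fronts, total_budget):
    """Pick one (cycle_budget, cost) point per application with minimal total cost.

    fronts: {app: [(cycle_budget, cost), ...]}
    Returns (total_cost, {app: point}), or None if no combination fits.
    """
    names = list(fronts)
    best = None
    for combo in product(*(fronts[n] for n in names)):
        if sum(budget for budget, _ in combo) > total_budget:
            continue  # this combination exceeds the available cycles
        cost = sum(c for _, c in combo)
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, combo)))
    return best


fronts = {
    "app1": [(80, 10), (60, 14)],  # hypothetical costs
    "app3": [(20, 30), (40, 18)],
}
print(best_allocation(fronts, 100))
```

Keeping app1 at 80 cycles and giving app3 the remaining 20 is feasible at a total cost of 40, but moving app1 to 60 cycles raises its cost by 4 and lets app3 lower its cost by 12, for a total of 32. Exhaustive search is only workable for a few applications with few points each.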
70
Outline
  • Trends and design problems
  • Unpredictability
  • Platforms
  • Predictable design
  • Design flow
  • Open issues

71
Open issues
  • Gap between specification and architecture
    modeling
  • High level modeling
  • use of modeling pattern library
  • Incorporate multiple pareto solutions into DSE
  • Pareto Algebra
  • Get synthesis correct for
  • control applications including compute intensive
    tasks
  • mapping to multi-processor
  • Managing QoS
  • Scenario detection, merging, prediction and
    exploitation
  • Runtime resource manager optimizing overall
    quality
  • Measuring overall quality

72
Open issues (cont'd)
  • Architecture modeling
  • how to deal with local memory (scratch pad /
    cache)
  • Modeling scheduling and arbitration
  • make things composable !
  • Definition NAL (run-time services)
  • Automatic partitioning
  • e.g., SPRINT tool of IMEC is a good start (C to
    SystemC)
  • VLSI tiling
  • ... and many more, e.g. see Ogras et al., "Key
    research problems in NoC design: a holistic
    perspective", CODES+ISSS 2005

73
Thanks