McKinsey presentation - PowerPoint PPT Presentation

Title: McKinsey presentation

Description: Study requirements of embedded IT systems. Identify and solve ... Ibis CMOS sensor. TCP/IP layers. FPGA. Internet client. Network. Reconfiguration. ATMEL ...

Slides: 81
Provided by: waut

Transcript and Presenter's Notes

1
General Presentation on IMEC's Thematic Design Activities
Ivo Bolsens, Hugo De Man
125 researchers
bolsens_at_imec.be
2
IMEC organization
  • CEO: Gilbert Declerck
  • Divisions
  • DESICS: design technology (Ivo Bolsens)
  • SPT: process technology (Luc van den Hove)
  • STDI: silicon techn. device integration (Herman Maes)
  • MCP: microsystems packaging (Robert Mertens)
  • INVOMEC: training (Etienne Bourdeaudhui)

3
Mission
  • Design of
  • Architectures, Methods and Tools
  • for the Implementation of
  • Multimedia
  • Internet Terminals

4
How
  • Study requirements of embedded IT systems
  • Identify and solve RELEVANT design challenges
  • build application demonstrators
  • Work out systematic design methods and
    supporting tools
  • build tools for real-life design support
  • Develop re-usable, parameterized, white-box IP
  • Train and educate
  • industry
  • university

5
Measures of success
  • scientific impact
  • cooperation with universities in complementary
    fields
  • international network of cooperation with most of
    the important industrial performers in our field
  • portfolio of protected intellectual property
  • transfer of technologies to existing companies
  • creation of new spin-off companies
  • attracting foreign investments in the field of
    microelectronics and ICT
  • turnover of well-trained researchers to industry

6
Bridge the gap between systems and silicon

Systems Heaven
JAVA, CORBA, JINI
5 million lines of VHDL
0.1 µm (1/300 of a hair)
200 M transistors
Power/cm2 ~ V3/l3, T_intercon ~ r l2/l2
Physics Hell
7
Intelligent Home
WWW
MPEG-4: >100 Gop/s, 5 Gtr/s, 5 Watt
8
DESICS organization
  • DIMA: design of integrated multimedia applications
  • MICS: multimedia image compression systems
  • EMSYS: embedded systems design
  • SEMP: system exploration for memory and power
  • DISTA: design of integrated systems for telecom applications
  • MIRA: mixed signal and RF applications
  • WISE: wireless systems
  • DBATE: digital broadband terminals

9
Configurable Home Terminal
[Figure labels: Head End; Home gateway (router, storage, base station); IPv6; HFC; IP home network; Service server; Internet home appliance; modem; CSL; User premises]
10
Embedded connectivity
Distributed Application
11
Challenges
[Figure labels: Reconfigurable Software; Agent (x3); VM; TCP/IP; RTOS; user data; Digital; CTL; MON; software agents; Front End; Tx/Rx DSP; CFIL/CB; To I/O; parameters; synchro; Hardware; Analog]
12
Challenges in Dynamic Reconfiguration
  • Run-time FPGA management
  • dynamic creation and deletion of HW processes
  • dynamic creation of the related HW/SW
    interfaces
  • dynamic extension of the instruction set
  • downloading of FPGA configuration for additional
    instructions
  • Fast HW compilation
  • Novel FPGA architectures optimized for partial
    runtime configuration
  • Performance Estimation (dynamic, configuration
    time)
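
The performance-estimation item above can be illustrated with a break-even calculation: moving a function onto the FPGA only pays off once the execution time saved exceeds the time spent downloading the configuration. A minimal sketch, assuming a single one-off configuration cost and constant per-call times; the function names and the example figures are hypothetical and do not come from the presentation.

```python
def worth_offloading(sw_time, hw_time, config_time, calls):
    """True if one FPGA configuration plus `calls` hardware runs is faster
    than `calls` software runs. All times share one unit (here microseconds)."""
    return config_time + calls * hw_time < calls * sw_time


def break_even_calls(sw_time, hw_time, config_time):
    """Smallest call count for which offloading wins, or None if it never does.
    Integer inputs are assumed so that floor division is exact."""
    if hw_time >= sw_time:
        return None
    return config_time // (sw_time - hw_time) + 1


# Hypothetical figures: 5000 us per call in software, 1000 us in hardware,
# 20000 us to download the FPGA configuration.
calls_needed = break_even_calls(5000, 1000, 20000)
print(calls_needed, worth_offloading(5000, 1000, 20000, calls_needed))
```

A run-time manager could apply the same test before deciding whether to create a HW process or keep the task in software.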

13
Networked re-configurable computing
[Figure labels: Application Layer (Java applet + FPGA bitstream); FPGA; Middleware Layer; FPGA API; FPGA Controller; Real-Time Operating System; Java Native Interface; Virtual Bus; Hardware Platform; Native Device Driver; Local Bus; Software; Hardware]
14
The first demonstrator: FPGA-based NetCam
[Figure labels: Internet client; Ibis CMOS sensor; ATMEL EEPROM; Netscape; HTTP; GIF Engine; Reconfiguration; FPGA; request; IP layers; TCP/IP layers; image; 10 Base T; network]
  • first FPGA-based thin-server Internet
    Appliance (vs. Dedicated, Linux or uC based)
  • low power (FPGA 0.7 W)
  • throughput scales up to 80 Mb/s

15
World's first 80 MB/sec WLAN technology
Base station: 155 Mb/s, multi-user, rx antenna diversity
Wired backbone
Multi-path fading
  • Orthogonal Frequency Division Multiplexing
    (OFDM)
  • Turbo-coding
  • Spatial Division Multiple Access (SDMA)
  • Hiperlan-2/ IEEE 802.11 compatible

16
Single-package transceiver
CMOS: IF and digital circuitry
BiCMOS: RF circuitry
MEMS: switches, varactor, resonators
MCM: interconnect, inductors, capacitors, resistors, filters, baluns
Antenna
17
Multimedia: MPEG-4 (member SCtee)
- Diversity: 3D, Facial and Body Animation, Video
- Scalability: time, space, SNR
- Interactivity: behaviour = f(input bits, user)
18
Focus
  • Graceful degradation, QoS
  • Encode once / decode everywhere
  • Reduces the terminal cost (soft conformance
    with pathological cases)
  • Man-Machine Interface Facial Animation
  • Real-time SOFTWARE video-coding of CIF images
  • Application Specific Processor for Wavelet coding

Demo
19
Challenges
Multimedia: MPEG-4, JPEG2000
Several orders of magnitude in performance and power dissipation need to be gained
Huge requirements: > 2 GOP/s, > 6 GB/s, > 10 MB storage
Drastic reduction of design complexity required
20
World's first MPEG-4 compliant silicon
Max 30 fps CIF (352x288), scalable architecture
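
To put the 30 fps CIF figure in perspective, the raw pixel and macroblock rates can be worked out directly from the numbers on the slide. A minimal sketch; the only outside assumption is the 16x16 macroblock size used by MPEG-4 video, and the variable names are illustrative.

```python
WIDTH, HEIGHT, FPS = 352, 288, 30   # CIF resolution and frame rate from the slide
MACROBLOCK = 16                     # MPEG-4 video uses 16x16 luminance macroblocks

pixels_per_frame = WIDTH * HEIGHT
pixels_per_second = pixels_per_frame * FPS
macroblocks_per_frame = (WIDTH // MACROBLOCK) * (HEIGHT // MACROBLOCK)
macroblocks_per_second = macroblocks_per_frame * FPS

print(pixels_per_frame, pixels_per_second)
print(macroblocks_per_frame, macroblocks_per_second)
```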
21
C/C++ system refinement and exploration
Data mngnt
Concurrency mngnt
Platform constraint
Platform integration
22
Deeply embedded system
  • µP core
  • Dedicated logic
  • accelerator synthesis
  • multi-DSP core
  • retargetable ASIP compiler
  • Memory/MMU
  • Interfaces
  • system integration
  • Analog

[Figure labels: Interfaces; Dedicated logic; phone book; keypad intfc; RAM; ROM; DMA; protocol; control; S/P; Frontier; Coware; Target; Demod and sync; Viterbi Equal.; voice recognition; speech quality enhancement; de-intl decoder; A; D; RPE-LTP speech decoder; digital down conv; Multi-DSP core]
All of this fits in one, cheap, package
23
Deeply embedded system
  • µP core
  • system layer compiler
  • Dedicated logic
  • multi-DSP core
  • memory/MMU
  • dynamic and static mem mngnt, addr expr.
  • Interfaces
  • Analog
  • A/D, RF

[Figure labels: µP core; Memory/MMU; System protocol; Data; phone book; keypad intfc; RAM; ROM; DMA; protocol; control; S/P; Demod and sync; Viterbi Equal.; voice recognition; speech quality enhancement; Mixed Signal; de-intl decoder; A; D; RPE-LTP speech decoder; digital down conv; Analog]
All of this fits in one, cheap, package
24
Current challenges and solutions
  • System Specification and System-level Refinement
    with Exploration Support (algorithm design level,
    concurrent task level, system timing simulation)
  • Data Transfer and Storage Exploration for
    Massive Real Time Data Manipulation (dynamic
    memory mngnt, static transfer and storage, address
    generation)
  • Co-Design for Heterogeneous Implementation
    Paradigms (refinement from unified HW/SW
    model, RTOS modeling, complete system
    simulation)
  • RF front-end exploration (fast mixed-signal
    co-simulation, chip-package co-design, noise
    coupling)


25
SoC or --- (S.O.S.)
  • Design productivity gap grows!
  • Complexity increase: 40% per year
  • Design productivity increase: 15% per year
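
Because both rates compound, the gap between complexity and productivity widens geometrically rather than linearly. A minimal sketch, assuming the slide's figures mean 40% and 15% growth per year and that both quantities start from the same normalized baseline of 1.0; the function name is illustrative.

```python
def productivity_gap(years, complexity_growth=0.40, productivity_growth=0.15):
    """Ratio of design complexity to design productivity after `years`,
    with both normalized to 1.0 in year 0 (an assumption)."""
    complexity = (1.0 + complexity_growth) ** years
    productivity = (1.0 + productivity_growth) ** years
    return complexity / productivity


for years in (1, 5, 10):
    print(years, round(productivity_gap(years), 2))
```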

26
System-level design
  • Solution
  • Paradigm shift
  • Higher abstraction level
  • Executable specs
  • Object-oriented design
  • Multi-paradigm modeling
  • Behavioral IP re-use
  • Incremental refinement to RT-HDL (HW) and C/C++
    (SW)

27
System design issues in IT-Application domain
Embedded system
28
Global concurrency management design flow for
dynamic concurrent tasks with data-dominated
behaviour
Dynamic memory mgmt
Physical memory mgmt
Address optimization
29
TCM steps aim at removing the bottlenecks for
better performance
Optimized system specification
Task1
Task2
Inter-task DTSE
Task concurrency mngnt
Task3
Task-level system architecture
30
The gray box approach focuses on the most
relevant TCM issues
High Level Specification
Black-box TCG 1
Improved Gray-box <10
task concurrency extraction improvement
Initial gray-box TCG 10
Reduce complexity Create freedom
Initial TCG 50
Simplify the model
White-box TCG 100
C Specification
31
Task Level DTSE and TCM
32
Results on IM1 player
[Plot: cost versus time budget (MA cycle budget)]
33
The 2-processor approach (scheduling assignment)
[Figure labels: Task1; Task2; Taskn; Vdd = 1 V; Vdd = 3.3 V]
34
Comparison of scheduling the original and
transformed graphs
original
Transformed
35
Combination of static and dynamic scheduler
[Figure: two static schedules and one dynamic schedule over nodes 1, 2, 3 and A, B]
Static scheduling is done at compile time, exploring all optimization possibilities.
Dynamic scheduling is done at run time, providing flexibility and dynamic control at low cost.
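
The split between the two schedulers can be sketched as follows. This is an illustration only, not the TCM tooling: it assumes each thread is a small dependency graph whose node order is fixed ahead of time, while the run-time part merely decides which thread to dispatch next. The thread and node names are hypothetical.

```python
from graphlib import TopologicalSorter


def static_schedule(deps):
    """Compile-time step: fix the node order inside one thread.
    `deps` maps each node to the set of nodes that must run before it."""
    return list(TopologicalSorter(deps).static_order())


def dynamic_schedule(ready_threads, static_orders):
    """Run-time step: threads are dispatched in the order they become ready;
    inside each thread the precomputed static order is replayed unchanged."""
    trace = []
    for thread in ready_threads:
        trace.extend((thread, node) for node in static_orders[thread])
    return trace


# Hypothetical threads: "A" is a chain of three nodes, "B" a chain of two.
static_orders = {
    "A": static_schedule({"1": set(), "2": {"1"}, "3": {"2"}}),
    "B": static_schedule({"1": set(), "2": {"1"}}),
}
trace = dynamic_schedule(["B", "A"], static_orders)
print(trace)
```

The expensive ordering work happens once per thread, so the run-time scheduler stays cheap: it only iterates over precomputed lists.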
36
Dynamic Scheduling result
[Chart: total energy versus node number in timer threads; values shown: 20, 24, 32, 32, 39; series: Two Proc. (Vlow = 1 V, Vhigh = 5 V) and One Proc. (V = 5 V)]
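
The reason a second, low-voltage processor saves energy can be sketched with the usual CMOS relation that dynamic energy per operation scales with the square of the supply voltage. A minimal sketch under stated assumptions: switched capacitance is normalized to 1, the slower clock at low voltage is ignored, and the task sizes and assignment are hypothetical rather than the data behind the chart.

```python
def energy(cycles, vdd):
    """Normalized dynamic energy: cycles * Vdd^2 (switched capacitance set to 1)."""
    return cycles * vdd ** 2


def total_energy(assignment, vdd):
    """Sum the energy of (cycles, processor) pairs given per-processor voltages."""
    return sum(energy(cycles, vdd[proc]) for cycles, proc in assignment)


vdd = {"low": 1.0, "high": 5.0}  # voltages as in the chart legend

# Hypothetical workload: the same three tasks, first all on the 5 V processor,
# then with the two non-critical tasks moved to the 1 V processor.
one_proc = [(100, "high"), (200, "high"), (50, "high")]
two_proc = [(100, "high"), (200, "low"), (50, "low")]

print(total_energy(one_proc, vdd), total_energy(two_proc, vdd))
```

In a real flow the scheduler would also have to check that tasks moved to the slow processor still meet their deadlines.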
37
SoC refinement and exploration
  • Implementation
  • Final hardware
  • Appl. software
  • OS services optimized for application
  • System requirements
  • Abstract functionality
  • Real-time constraints
  • Target platform constr.

REQUIREMENT
SoC appl. timing
Application implementation (HW/SW)
Process mgmt constr.
Memory mgmt constr.
REAL
Process mgmt impl. (HW/SW)
Memory mgmt impl. (HW/SW)
Final platform (Silicon)
Target platform
38
Refinement and exploration
  • Memory mgmt
  • Dynamic memory
  • alloc / free (C)
  • new / delete (C++)
  • abstract data type refinement
  • virtual memory mgmt
  • Static memory
  • platform-independent code transformations
  • real-time cost-optimal physical memory
    organisation
  • Address optimisation
  • Process mgmt
  • Task level concurrency mgmt (platform indep.)
  • transformations
  • static/dynamic scheduling
  • resource allocation
  • Instruction-level concurrency mgmt
  • refinement from unified HW/SW model
  • RTOS modeling/simulation including timing
  • traditional HW/SW co-design and compilers

39
Refinement - OCAPI / MATISSE
  • Implementation
  • Target hardware
  • OS services optimized for application
  • Virtual prototype
  • Soft implementation using host OS and host
    hardware

SoC appl. arch.
Application implementation (HW/SW)
VIRTUAL
Process mgmt
Memory mgmt
REAL
Process mgmt impl. (HW/SW)
Memory mgmt impl. (HW/SW)
OSAPI
Target HW (Silicon)
Host HW (HP/PC)
40
Unified Modeling and Refinement of HW and SW
OCAPI-xl C++ Class Lib
Flexible Primitives express
High Level System Model
  • Concurrency
  • Communication
  • Interface design/reuse

unified HW/SW model
Built-in Code Generators create
Refined Model
  • VHDL/Verilog/C
  • Testbenches

41
SoC design flow
C System Model
C
HW
SW
OSAPI
FSMD
42
System Model
C
HW
SW
OSAPI
FSMD
43
Global data management design flow for dynamic
concurrent tasks with data-dominated behaviour
Dynamic memory mgmt
Physical memory mgmt
Address optimization
44
Data Management Flow
Abstract Data Type (ADT) Refinement
ADT
Concrete data types
Dynamic Memory Mngnt.
Virtual memory mgmt (VMM) Refinement
Virtual memory segments
Physical memory mgmt (PMM) Refinement
Physical memories
Physical Memory Mngnt.
45
Matisse ADT refinement
ATM_cell *Data_In;
Association_Table *Routing_Table;
Routing_Table = new Association_Table();
Data_In = new ATM_cell();
if (Routing_Table->Lookup(Data_In)) ...
Impl. alternatives
46
ADT refinement results
  • Select best DT impl. for each ADT

[Chart: power cost function for different data types; labels: LL(A), LL(B), PA(B), LL(A), PA(B), BT(A), PA(A), AR(B)]
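
Selecting the best implementation for each ADT amounts to minimizing a cost function over the candidate data types. A minimal sketch, assuming each ADT can be chosen independently and that a power estimate per candidate is already available; the cost values are made up for illustration, and the labels LL, PA, BT and AR are simply reused from the chart.

```python
def select_implementations(cost_table):
    """For each ADT, pick the candidate data type with the lowest estimated power."""
    return {adt: min(candidates, key=candidates.get)
            for adt, candidates in cost_table.items()}


# Hypothetical power estimates (arbitrary units) per ADT and candidate data type.
cost_table = {
    "Routing_Table": {"LL": 9.0, "PA": 4.5, "BT": 6.0},
    "Association_Table": {"LL": 7.0, "AR": 3.0},
}

best = select_implementations(cost_table)
print(best)
```

If the choices interact, for example through shared memories, the independent minimum is only a starting point and the combinations have to be evaluated together.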
47
VM size for ATM MUX in network 1
[Figure: virtual memory segment assignments; labels PA(9), PA(5) and AR(4), with sizes 32 and 256]
1 VMS: size 133 mm2, power 110 mW
3 VMS: size 137 mm2, power 37 mW
2 VMS: size 137 mm2, power 49 mW
2 VMS: size 137 mm2, power 68 mW
48
Memory CPU Performance Bottleneck
[Chart: performance (log scale, 1 to 1000) versus time (1980 to 2000), with Moore's Law. Source: Patterson]
49
Data-transfer and data-storage bottlenecks: SDRAM access
50
Data-transfer and data-storage bottlenecks: cache misses
51
Data-transfer and data-storage bottlenecks: system bus load
[Figure labels: Disk access bus; Main system bus; L2 bus; Main memory; L2 cache; Data paths; L1 cache; Hard disk; System chip; Other system resources]
52
Memory Power Bottleneck
53
Multi-processor System Design
54
Platform design requires change
Application engineer
55
Data Transfer Storage Principles
3 Exploit memory hierarchy
6 Exploit limited life-time and data layout freedom
5 Meet real-time constraints
[Figure labels: Processor Data Paths; Local Latch 1 Bank 1; Local Latch N Bank N; L1 Cache; L2 Cache; Cache Bank Recombine; Chip; Off-chip SDRAM]
56
Pareto curves allow task trade-off decision: DAB illustration
[Charts: energy versus execution time for TASK-1, TASK-2 and TASK-3. Mapped on two processors. Source: Digital Audio Broadcast]
57
Pareto curves allow task trade-off decision
[Charts: energy versus execution time for TASK-1, TASK-2 and TASK-3. Single proc.: large mem. overhead. Source: Digital Audio Broadcast]
58
Pareto curves allow task trade-off decision
[Charts: energy versus execution time for TASK-1, TASK-2 and TASK-3. Source: Digital Audio Broadcast]
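
A Pareto curve of this kind keeps only the design points for which no other point is both faster and cheaper in energy. A minimal sketch of extracting such a front from (execution time, energy) pairs; the sample points are hypothetical and are not read from the charts.

```python
def pareto_front(points):
    """Return the (time, energy) points not dominated by any other point.
    After sorting by time, a point is kept only if it lowers the best
    energy seen so far, so both coordinates cannot be improved at once."""
    front = []
    best_energy = float("inf")
    for time, energy in sorted(set(points)):
        if energy < best_energy:
            front.append((time, energy))
            best_energy = energy
    return front


# Hypothetical design points for one task: (execution time, energy).
candidates = [(10, 9), (20, 5), (15, 9), (30, 5), (40, 2)]
print(pareto_front(candidates))
```

With one such front per task, a system-level scheduler can pick one point per task so that the time budget is met at the lowest total energy.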
59
Cavity Detection Algorithm on Intel Pentium-MMX
(execution time)
60
Resource limited software
TRIMEDIA processor
[Chart: percentage (%, 0 to 100) of exec time, power and bus load; series: Initial Algorithm and DTSE Transformed]
61
Voice coder (SW cache) full power summary
Relative power
  • Gain in power of additional factor 6 compared to
    optimized (platform independent code)

62
MPEG - 4 Motion Estimation
[Chart: relative power, 0.0 to 1.0]
Resulting power reduction: 8
63
Consistent Speed Up on Different Platforms for MPEG4 video decoder
[Chart: performance of PI MPEG-4 video decoder on different platforms; frame rate (frames/second, 0 to 120) on Pentium II 350 MHz, HP PA RISC 180 MHz and TriMedia 100 MHz for the sequences M D CIF 120 kbps 30 fps, Foreman CIF 450 kbps 25 fps and Cal Mob CIF 2 Mbps 30 fps]
64
Power Reduced with Factor 21 to 48
[Chart: assessment of memory power reduction (proprietary architecture); remaining power (%, 0 to 5) for the sequences M D CIF 120 kbps 30 fps, Foreman CIF 450 kbps 25 fps and Cal Mob CIF 2 Mbps 30 fps]
65
Turbo coding principle
[Figure labels: Encoder; Decoder; U; C; C 1; C 2; Y; Û]
66
Results
Original: bit-rate 0.07 Mbit/s, power 1.07 µJ/bit, latency 5900 µs, area 3.5 mm2
67
3D Texture Mapping using Mesa GL on TriMedia
TM1000
  • Reduction in total cycles by 44%
  • Reduction in Data cache accesses by factor 2
  • Reduction in Instruction Cache accesses by 40%

68
Crisis in current (RT) design flow
EFFORT
Ok?
69
Objectives
Drastically shorten design time (months to weeks!) -> raise the abstraction level
Meet timing constraints as soon as possible -> expose timing bottlenecks at higher level
Low implementation cost -> systematic methodology to control cost
70
(Re)Using High-Level Synthesis
Conventional HLS
ADOPT HLS
Fewer muxes/registers using ACUs (NOT conventional High-Level Synthesis)
71
Disabling the time-bomb for logic synthesis
[Chart: synthesis time (minutes, 0 to 160), split into scheduling and logic synthesis]
72
Exploration at high level avoids complexity explosion
[Chart: (V)HDL lines (x 10^3, 0 to 8) at behavior, RT and gate level]
73
Efficient use of high-level synthesis (I): reduced cost
[Chart: gates (x 10^3, 0 to 7) after logic synthesis, split into muxes and registers, for HLS and ADOPT HLS]
74
Efficient use of high-level synthesis (II)
improved delay
Critical path
After logic synthesis !
75
Results for programmable processors: cavity detection
[Chart: performance (seconds, 0 to 16) for Adopt, Initial, DTSE, DTSE Adopt (Glb.Trf.) and DTSE Adopt (Loc.Trf.). Image: 1280x1000 pixels; platform: HP 9000/777, 256 MB RAM]
76
Analog-Digital Co-Design: FAST
Demonstrator: 5 GHz WLAN terminal
Mixed-signal front-end architecture exploration -> tools
Analog/digital partitioning
MCM vs. on-chip passives
Digital channel filtering
Chip partitioning
Noise coupling in mixed-signal ICs -> tools, methods
LO
Chip-package co-design -> architectures
77
Interaction with ROW
UNIVERSITIES: KULeuven, RUGent, VUBrussel, EC univ
INDUSTRY
Problems; Tools, IP, ...
System Specialists
3 acad. staff, 15 Ph.D., 125 researchers, 10 residents
Algorithm specialists
Residents
Circuit specialists
78
The Desics pipeline
Alcatel, National, Philips, Ericsson, Intel, ESA
D6 RESEARCH PROGRAMME
D6/ INDUSTRY TRANSFER PROJECTS
Industry product development
79
Strategic Research Cooperation
  • Wireless Local Area Network
  • MPEG-4
  • System-on-Chip Design Technology

80
IMEC is part of a closed loop
  • Closed Loop approach
  • You cannot keep an economic engine running
    without a closed belt.
  • Only the right combination of ALL elements can
    foster a successful industrial development,
    based upon an increasingly knowledge-based
    society.

Knowledge creation
State-of-the-art science parks
DSP Valley
Venture capital
Entrepreneurship
Permanent training initiatives
81
Conclusions
  • - requirements for future embedded system
    applications learned from IIAPs by
  • - building demonstrators
  • - systematic design flows and methods
  • - white box IP re-use
  • - design automation
  • - transfer through education and training
  • http://www.imec.be/ocapi
  • http://www.imec.be/3/3.6.html

82
Ivo Bolsens, Vice President
Hugo De Man, Senior Fellow
Paul Six, Associate VP
Jean Roggen, Manager Strategic Programmes
Niek Van Dierdonck, DST: DSP Technology Support
Annemie Stas, Administration
Marc Engels, Department Director
Stephane Donnay, MIRA: Mixed Signal and RF Applications
Bert Gyselinckx, WISE: Wireless Systems
Serge Vernalde, DBATE: Digital Broadband Terminals
Francky Catthoor, SEMP: System Exploration for Memory and Power
Didi Verkest, EMSYS: Embedded Systems
Jan Bormans, MICS: Multi-media Image Compression Systems