A Comprehensive Codesign Framework for Embedded Systems Rabi Mahapatra Department of Computer Scienc - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
A Comprehensive Codesign Framework for Embedded Systems
Rabi Mahapatra
Department of Computer Science & Engg.
Texas A&M University
April 2001
2
Topics to be covered
  • Introduction
  • Existing Codesign Framework on ES
  • What is lacking in this framework
  • Two hot-spots investigation
  • Available results
  • Other Research topics by Codesign team at Texas
    A&M
  • Conclusion

4
Introduction
  • Hardware-Software Codesign for embedded system
    research program by DARPA and NSF
  • Codesign definition: meeting system-level
    objectives by exploiting the synergism of hardware
    and software through their concurrent design

5
Concurrent design
  • Traditional design flow: HW and SW designed by
    independent groups of experts
  • Concurrent (codesign) flow: HW and SW designed by
    the same group of experts with cooperation
6
Why codesign?
  • Reduce time to market
  • Achieve better design
  • Explore alternative designs
  • Good design can be found by balancing the HW/SW
  • To meet strict design constraints
  • power, size, timing, and performance trade-off
  • safety and reliability
  • system on chip

7
Current Design Framework
8
Framework Embedded System Codesign
9
Contemporary Co-design framework
[Flow diagram; blocks in order of appearance]
  • System Specification
  • Front-end Compiler
  • Validation
  • Behavior Description of Modules
  • Partitioning
  • Performance Estimation
  • Synthesis (S/W, Communication, H/W)
  • Integration
  • Co-Simulation
  • Constraint Verification
  • Implementation (CPU, ASIC, Memory)
10
Two hot-spots in Codesign
  • Partitioning
  • Co-simulation

11
Current Partitioning
  • Binary partitioning prevails
  • Manual even in most recent VCC tool
  • Power is not considered as a parameter while
    partitioning
  • Problem is NP-complete; imagine extended
    partitioning, multiple-space partitioning!

12
Partitioning
  • Related Work
  • Tiwari, Malik & Wolfe, Power Analysis of
    Embedded Software: A First Step Towards Software
    Power Minimization, IEEE Trans. VLSI, 1994.
  • Givargis, Henkel & Vahid, Interface and Cache
    Power Exploration for Core-Based Embedded
    System, ICCAD 1999.
  • Stitt, Vahid et al., A First Step Towards an
    Architecture Tuning Methodology for Low Power,
    CASES 2000.
  • Lu, Benini & De Micheli, Low-Power Task Scheduling
    for Multiple Devices, CODES 2000.
  • Our approach - consider Area, Time and Power for
    Design optimization

13
Partitioning with Power as a parameter
  • System Specification in C
  • Software path: Simulation and Analysis of Software
    Implementation; Power Analysis, Sp (Power
    Profiler); Calculate Sa, St
  • Hardware path: Convert C code to VHDL (ART
    Builder); Hardware Synthesis (Design Compiler);
    Power Analysis, Hp (Power Compiler); Calculate
    Ha, Ht
  • Optimization Engine
  • System Partitioned into H/W and S/W
14
Partitioning - Software Analysis
  • Software power: estimated for a specific
    microprocessor using the Power Profiler
  • Software time: derived from assembly-level code
    and an instruction time profile
  • Software area: memory size used, in bytes
  • SPARClite processor has been used as the target
    for this experiment.

15
Partitioning - Power Profiler
  • Input: file in assembly
  • Total Current = 0
  • Read the file line by line
  • Is EOF? If YES, output Total Current × 3.3
  • If NO, identify instruction INST from the current
    line
  • Compare INST with the hash table entries
  • Does INST match a hash table entry? If NO, skip
    the current line
  • If YES, get the corresponding value of the Current
  • Total Current = Total Current + Current
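
The profiler flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CURRENT_MA` table, its values, and the function name `profile` are hypothetical (they are not measured SPARClite data), and the 3.3 factor is assumed to be a supply voltage applied to the summed current.

```python
# Hypothetical per-instruction current table (mA); placeholder values only.
CURRENT_MA = {"add": 200.0, "ld": 215.0, "st": 230.0, "nop": 190.0}
SUPPLY_VOLTS = 3.3  # assumed meaning of the 3.3 factor in the flowchart


def profile(assembly_lines, table=CURRENT_MA):
    """Sum table currents over recognised instructions.

    Returns (total_current, total_current * SUPPLY_VOLTS).
    """
    total_current = 0.0
    for line in assembly_lines:
        code = line.split("!", 1)[0].strip()  # '!' starts a SPARC comment
        if not code or code.endswith(":") or code.startswith("."):
            continue  # blank line, label, or assembler directive
        inst = code.split()[0].lower()
        if inst in table:  # hash-table lookup
            total_current += table[inst]
        # Unmatched instructions are skipped, as in the flowchart.
    return total_current, total_current * SUPPLY_VOLTS


sample = [
    "main:",
    "  add %o0, %o1, %o2   ! sum",
    "  ld  [%o0], %o1",
    "  foo %g0",            # not in the table, so it is skipped
]
total, scaled = profile(sample)
```

A dictionary plays the role of the hash table, so each instruction lookup is a constant-time operation and unknown mnemonics fall through without special handling.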
16
Partitioning - Hardware synthesis
  • Hardware power using Synopsys Power Compiler
  • Hardware Area and time using Synopsys Design
    Analyzer
  • Technology in use

17
Partitioning - EGLT optimization
  • EGLT: Enumeration-Gray code with Lookup Table
  • Complete coverage; most efficient for medium-sized
    binary partitioning
  • Case study: QAM Modem, which has 21 modules for
    partitioning into hardware and software modules
  • The optimization engine optimizes the allocation
    against Area, Power and Timing requirements

18
Partitioning - Case Study
A Ptolemy Snapshot of Modem design
19
Software Analysis - Modem
Block Name    Area (bytes)  Time (ns)  Power (mW)
AddCx         6956          640        873.5
AddInt        6804          782        885.2
BitsToInt     7024          1440       840.0
C2R           6864          626        869.5
Dist          6868          513        863.3
FtoI          6956          74340      832.1
GainInt       6892          9476       886.9
Gaussian      7304          76469      811.5
IID Uniform   7108          64282      832.1
LMSCx         8508          13697      806.1
ModuloInt     6900          8715       879.7
Quant         7140          4173       816.6
R2C           6864          367        878.3
SubCx         6972          547        873.5
TableCx       7736          902        845.8
TableInt      7172          853        848.6

20
Hardware Synthesis - Modem
Block Name    Area (# of gates)  Time (ns)  Power (mW unless noted)
AddCx         594                9.96       40.56
AddInt        299                9.96       19.6825
BitsToInt     114                1.07       9.1423
C2R           0                  0          3.1689
Dist          0                  0          3.1689
FtoI          42                 0.43       1.13
GainInt       0                  0          1.5350
Gaussian      13555              2.634      2 W
IID           13555              2.634      2 W
LMSCx         80215              30.68      8.0758 W
ModuloInt     0                  0          99.0293 uW
Quant         59                 1.18       4.4644
R2C           0                  0          3.1689
SubCx         687                9.96       49.2913
TableCx       332                3.18       21.6056
TableInt      178                2.54       11.0962

21
Functions
22
EGLT optimization
Decimal  Binary  Gray
0        0000    0000
1        0001    0001
2        0010    0011
3        0011    0010
4        0100    0110
5        0101    0111
6        0110    0101
7        0111    0100
8        1000    1100
etc.
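
One plausible reading of the Gray-code enumeration, sketched in Python: because consecutive Gray codes differ in exactly one bit, moving from one HW/SW assignment to the next flips a single module, so the running totals can be updated from per-module lookup tables instead of being recomputed. The function names, the additive cost model (times and powers simply summed), and the constraint values are assumptions for illustration; the three sample rows are the AddCx, AddInt and BitsToInt entries from the modem tables.

```python
def gray(i):
    """Return the reflected binary Gray code of i."""
    return i ^ (i >> 1)


def best_partition(sw, hw, max_hw_area, max_time):
    """Enumerate all HW/SW assignments in Gray-code order.

    sw, hw: lists of (area, time, power) per module, used as lookup tables.
    Bit k of a code set to 1 means module k is placed in hardware.
    Returns (power, code) of the lowest-power feasible assignment, or None.
    """
    n = len(sw)
    # Code 0: every module in software.
    hw_area = 0.0
    time = sum(m[1] for m in sw)
    power = sum(m[2] for m in sw)
    best = None
    for i in range(1 << n):
        code = gray(i)
        if i > 0:
            # Exactly one bit differs from the previous code.
            k = (code ^ gray(i - 1)).bit_length() - 1
            sign = 1 if (code >> k) & 1 else -1  # +1: SW to HW, -1: HW to SW
            hw_area += sign * hw[k][0]
            time += sign * (hw[k][1] - sw[k][1])
            power += sign * (hw[k][2] - sw[k][2])
        if hw_area <= max_hw_area and time <= max_time:
            if best is None or power < best[0]:
                best = (power, code)
    return best


# (area, time, power) rows for AddCx, AddInt, BitsToInt.
SW = [(6956, 640, 873.5), (6804, 782, 885.2), (7024, 1440, 840.0)]
HW = [(594, 9.96, 40.56), (299, 9.96, 19.6825), (114, 1.07, 9.1423)]

# Hypothetical constraints: at most 500 gates, no effective time limit.
result = best_partition(SW, HW, max_hw_area=500, max_time=1e9)
```

With the 500-gate limit, AddCx (594 gates) cannot go to hardware, so the search settles on AddInt and BitsToInt in hardware. The enumeration is exhaustive, which matches the "complete coverage" claim but limits it to moderate module counts.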

23
Results
  • Best partition
  • HW modules: AddInt, AddCx, TableInt, TableCx,
    GainInt, IID Uniform, Modulo, Dist, Bits to Int,
    Quant, CtoR, RtoC, SubCx
  • SW modules: IID Gaussian, LMSCx, Float to Int
  • Lowest power obtained for the constraint set:
    4619 mW
  • Hardware area: 16238
  • Software area: 23508
  • Timing: 166682 ns

24
EGLT Partitioning Results
25
Optimization operations
27
Partitioning overheads
  • Growing design space due to complex architecture,
    technology, and solutions
  • Number of implementations for mapping a system
    specification made of n tasks on an architecture
    made of q nonempty modules: S(n, q)
  • With p different kinds of technology to implement
    each module, the number of possible implementations
    grows to NbArchitecture(n, p)
  • Example: n = 4, p = 2, q = 3 gives Nb = 309
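
If S(n, q) is read as the Stirling number of the second kind (an assumption: the transcript does not preserve the formula), it counts the ways to group n labelled tasks into q nonempty, unlabelled modules. A small Python sketch of that count follows; the function name `stirling2` is illustrative. The NbArchitecture formula is not reproduced in the transcript, so the sketch does not attempt to recompute the Nb = 309 example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, q):
    """Ways to split n labelled tasks into q nonempty, unlabelled groups."""
    if n == q:
        return 1  # includes S(0, 0) = 1
    if q == 0 or q > n:
        return 0
    # Task n either forms its own group or joins one of the q existing groups.
    return stirling2(n - 1, q - 1) + q * stirling2(n - 1, q)


# Groupings of n = 4 tasks for each possible number of modules q.
counts = {q: stirling2(4, q) for q in range(1, 5)}
```

Even before technology choices are multiplied in, these counts grow quickly with n, which is the overhead the slide points to.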

28
DSE Related work
  • PMOSS (1996), LYCOS (1997), COSYMA (1998):
    mono-processor
  • POLIS (1996): one µP and several co-processors
  • SpecSyn (1998): multiprocessor, manual
    allocation before partitioning.

Users have no basis for fixing the number of
components before partitioning.
29
Objective
  • Efficient Design Space Exploration (DSE) to
    reduce partitioning overhead.
  • Determine Partition Boundaries at System-Level
  • Insert pre-allocation stage before Allocation to
    reduce Design Space.
  • Evaluation of associated cost

30
Approach
  • Specify-Explore-Refine paradigm at the system level
  • Specification: specify desired functionality with
    no implementation detail
  • Exploration: explore design alternatives
    satisfying the design constraints. Partition the
    functional specification among components.
    Estimate alternative design approaches.
  • Refinement: refine the initial specification to
    reflect decisions made in the Exploration stage.
    Verify the initial specification.

31
Design Space Exploration
System Behavior
Pre-Allocation Allocation
Performance Estimation
Partitioning
32
Exploration
  • Allocation: adding components to the design.
  • Each component is characterized by its constraints
    and technology file.
  • Partitioning: assigning functional modules to
    components (behaviors to standard processors,
    channels to buses, etc.)
  • Estimation: use of SpecSyn
  • Cost Function = k1·F(c1.size, c1.size_constr)
    + k2·F(c2.size, c2.size_constr)
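
A minimal Python sketch of a weighted cost function of this shape. The slide does not define F, so the sketch assumes F returns the relative amount by which a component's size exceeds its constraint (zero when the constraint is met); the names `violation` and `cost` and the sample numbers are hypothetical.

```python
def violation(value, constraint):
    """Assumed F: 0 when the constraint is met, else the relative excess."""
    return max(0.0, (value - constraint) / constraint)


def cost(components, weights):
    """Weighted sum k1*F(c1.size, c1.size_constr) + k2*F(c2.size, ...) + ..."""
    return sum(k * violation(c["size"], c["size_constr"])
               for k, c in zip(weights, components))


# Two hypothetical components: the first exceeds its size constraint by 20%.
components = [
    {"size": 120, "size_constr": 100},
    {"size": 80, "size_constr": 100},
]
total_cost = cost(components, weights=[1.0, 2.0])
```

Under this assumption a cost of zero means every size constraint is satisfied, and the weights k1, k2 set how strongly each component's violation counts.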

33
DSE(contd)
  • Allocation mostly fixed in traditional
    methodology
  • Architecture: Processor (Controller), ASIC,
    Memories
  • Includes only HW/SW partitioning (binary)
  • Performance Estimation used only for Partitioning

34
Problem Statement
  • New Methodology
  • Embedded systems now have heterogeneous
    multiprocessors with ASICs, ASIPs, DSPs,
    processors, memories, etc.
  • Design Space Exploration is thus an NP-complete
    problem

35
Pre-Allocation
  • Main goal: reduce the design space to reduce the
    design time
  • Use of Performance Estimation for Allocation
  • Decision can be based on Heuristics at the system
    level
  • Exact performance is determined after
    Co-simulation

36
Hardware Platform
  • A platform with arrays of processors and ASICs
  • The number of ASICs and processors to be used is
    based on performance estimation
  • It gives the designer a starting point for Design
    Space Exploration
  • Job of the designer: process allocation and
    interconnection among the various components

37
Proposed Methodology
  • Build Process graph from specifications
  • Identify leaf and root nodes
  • Map leaf nodes to processors and root nodes to ASIC
  • Find performance parameters for each node when
    implemented on ASIC, processor, DSP
  • Find the critical path from the constraints
  • Processor Merging

38
Proposed Methodology(Contd)
  • No two concurrent leaves on the same processor
  • If the constraint is not satisfied, the leaf is
    implemented on an ASIC (multiple copies)
  • Optimistic number of processors = number of
    concurrent leaves
  • Processors form library of functions
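
The first steps of the methodology (find root and leaf nodes, then place leaves so that no two concurrent leaves share a processor) can be sketched as follows. This is an illustrative reading only: the graph encoding, the explicit `concurrent` pair set, the greedy placement, and the example node names are all assumptions, and treating processor merging as reuse of a processor by non-concurrent leaves is likewise an interpretation.

```python
def roots_and_leaves(graph):
    """graph maps each node to the list of its child nodes."""
    children = {c for kids in graph.values() for c in kids}
    nodes = set(graph) | children
    roots = sorted(n for n in nodes if n not in children)
    leaves = sorted(n for n in nodes if not graph.get(n))
    return roots, leaves


def assign_processors(leaves, concurrent):
    """Greedy placement: no two concurrent leaves share a processor.

    concurrent: set of frozenset({a, b}) pairs that may run at the same time.
    Returns {leaf: processor_index}.
    """
    placement = {}
    for leaf in leaves:
        used = {placement[other] for other in placement
                if frozenset((leaf, other)) in concurrent}
        p = 0
        while p in used:
            p += 1
        placement[leaf] = p  # lowest index not taken by a concurrent leaf
    return placement


# Hypothetical process graph and concurrency relation.
graph = {"modem": ["tx", "rx"], "tx": ["encode", "filter"], "rx": ["decode"]}
roots, leaves = roots_and_leaves(graph)
placement = assign_processors(leaves, {frozenset(("encode", "decode"))})
processors_needed = max(placement.values()) + 1
```

Here `encode` and `decode` are concurrent and land on different processors, while `filter` reuses processor 0. Leaves that miss their constraints would then move to an ASIC, which the sketch leaves out.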

39
Design Space Exploration
40
Pros and Cons...
  • Advantages: can implement many behavioral modules
    on the same chip
  • Disadvantages: different behavioral modules should
    have common leaf nodes; otherwise it is very
    expensive

41
Experiment and Results
42
Cosimulation
  • Definition: the process of simulating the HW and
    SW components of a mixed HW/SW system within a
    unified environment.

43
Cosimulation Related work
  • Coumeri & Thomas (ICCD 95)
  • Tabbara et al. (DATE 99)
  • Durbhakula, Pai, Adve (HPCA 99)
  • Pirvu, Bhuyan, Mahapatra (ICCD 2000)

44
Co-simulation - Need
  • Architectural simulators overlook hardware
    complexity and lack accuracy
  • Integration of HDL models with Architecture level
    simulator is pretty slow
  • Best solution is to implement the Subsystem under
    Test in FPGA and integrate this with the
    architecture level simulator

45
Co-simulation - How it fits
Execution driven simulation
HDL Description
Synthesize
Resimulate
HW-SW Cosimulation
Execution driven simulation
46
Cosimulation - Case Study
FPGA
47
Cosimulation - Case study(contd)
  • A Multiprocessor system with different switch
    models, arbitration rules and buffering
    techniques of the interconnect
  • RSIM - Architecture level Multiprocessor
    simulation environment
  • FPGA implementation of switching network
  • Serial port interface between the two

48
Cosimulation - Case study(contd)
  • Reducing Pin to Pin latency
  • Optimization of Spider-like switch to minimize
    Pin to Pin/Fall through latency
  • Flit size - 64 bits
  • Phit size - 16 bits
  • Assembling phits - 17.5ns
  • Arbitration - 10ns
  • Crossbar transfer - 10ns
  • Serialization - 2.5ns
  • Total time - 40ns
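
The stage figures above can be checked with a few lines of Python: the four stage latencies add up to the quoted 40 ns, and a 64-bit flit carried in 16-bit phits takes four phits. The constant and variable names are illustrative.

```python
FLIT_BITS = 64
PHIT_BITS = 16

# Stage latencies in nanoseconds, as listed on the slide.
STAGES_NS = {
    "assembling phits": 17.5,
    "arbitration": 10.0,
    "crossbar transfer": 10.0,
    "serialization": 2.5,
}

phits_per_flit = FLIT_BITS // PHIT_BITS     # phits needed to carry one flit
fall_through_ns = sum(STAGES_NS.values())   # total pin-to-pin latency
```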

49
Cosimulation - Case study(contd)
  • Pipeline this structure
  • Start processing immediately after 1st phit
    arrives
  • Reduce data path size by a factor of 4 and
    increase core frequency by a factor of 4
  • Performance evaluation using Cosimulation shows
    super pipelining can halve the fall through
    latency

50
Cosimulation - Case study(contd)
Results
  • Spider-like design: first phit arrives; synchronize
    17.5ns; arbitration 10ns; crossbar tx 10ns; wire
    tx (2.5ns link); fall through latency 40ns
  • Superpipelined design: first phit arrives; synch,
    arbitration, crossbar and link tx (pipelined);
    fall through latency 17.5ns
51
Cosimulation - Performance of Simulation Techniques
[Table. Cases: RSIM; RSIM + Verilog; RSIM + FPGA.
Columns: Processor/Memory Simulation; Interconnect
Simulation; Communication; Synchronization. Values as
extracted, cell positions not recoverable: - 1.26 -
0.08 4.60 1.26 1.16 4.70 0.35 1.26 NA NA]
52
Modified Design Framework
53
Modified Design Framework What it provides -
54
Other related topics by Codesign team at Texas
A&M University
  • Integration of Power simulation capabilities with
    SimOS
  • Optimized VMX Architecture modeling for future
    generation processors
  • Optimized HDLC Core

55
Codesign Intelligent Agent
  • Comments on this

56
Meeting Custom Design needs
  • Comments on this

57
Summary
58
Comments on this Framework
59
What this framework lacks
60
Co-simulation
61
Co-simulation (contd)
62
Co-simulation (contd)
63
Modified Design Framework
64
Modified Design Framework (contd)
65
Other related topics by Codesign team at Texas
A&M University
  • Integration of Power simulation capabilities with
    SimOS
  • Optimized VMX Architecture modeling for future
    generation processors
  • Optimized HDLC Core ...

66
Future Research Plans
  • Codesign Intelligent agent based on ANN hardware
  • DSP reconfigurable array processors to be used as
    HW-SW interface to meet custom designs

67
Codesign Intelligent Agent
  • Comments on this ...

68
Meeting Custom Design needs
69
Conclusion