Title: Srijan: A Methodology for Synthesis of ASIP Based Multiprocessor SoCs Project Progress Presentation
1Srijan: A Methodology for Synthesis of ASIP Based Multiprocessor SoCs
Project Progress Presentation
2Outline
- Introduction
- Participants
- Design Space
- Methodology
- Key Achievements
- Work Done
- Work in Progress
3Srijan
- Objective
- To develop an integrated framework for synthesis of embedded systems built around ASIP (RISC and VLIW) based multiprocessors, in which both system-level and subsystem-level design spaces can be explored efficiently.
- A 3-year project started in November 2002
- Funded by the Naval Research Board, Govt. of India
4Participants
- Faculty
- Prof. Anshul Kumar (chief investigator)
- Prof. M.Balakrishnan
- Dr. Preeti Ranjan Panda
- Prof. Subhashis Banerjee
- Dr. Prem Kalra
- Project Staff
- Satya Kiran M.N.V.
- Nitin Bhardwaj
- Research Scholars (PhD)
- Anup Gangwar
- Basant Kumar Dwivedi
- Students
- M.Tech students: 7
- B.Tech students: 12
5Why Application Specific Multiprocessors?
- Higher performance
- Smaller area
- Lower power
[Figure: a compute intensive application (with a control part) mapped onto a General Purpose Multiprocessor (no customization, avg. performance) versus an Application Specific Multiprocessor (customization, higher performance)]
6Customization Opportunities -> System Level
7Customization Opportunities -> System Level
- Compute units
- Number and types: ASICs, VLIW or RISC ASIPs, DSPs
- Interconnection Network
- Shared bus, MINs or crossbar switches
- Custom interconnection based on the communication pattern of the application
- Memory architecture
- Types: synchronous/asynchronous, pipelined/non-pipelined, etc.
- Various transfer modes
- Custom memories: FIFOs, frame buffers, etc.
8Customization Opportunities -> Processor Level
[Figure: processor-level customization. Register files (RF1, RF2): number and type of regfile customization. Interconnect customization. Functional units (FU1, FU2, FU3, FU4, AFU1, AFU2): number and type of FU customization.]
9Customization Opportunities -> Processor Level
- Functional Units
- MISO, MIMO, MIMO with LD/ST
- Rigid or flexible I/O timeshapes
- Register File Clustering
- Each FU can read from and write to only a subset of registers
- Area grows as N^3, delay as N^(3/2), and power as N^3, where N is the number of functional units connected to the register file
- Powerful application analysis required to minimize data copying
- Interconnects
- Between different clusters and between clusters and memory
- Analysis of data access patterns required for evaluating cost-performance tradeoffs
- Current ASIP vendors do not offer customizable interconnects
- Instruction encoding and decoding
- Reduce or remove explicit NOPs in code
- Affects code size, object code compatibility, branch misprediction penalty, hardware cost, address specification in code size
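The register-file growth rates quoted on this slide can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes those growth rates hold with unit constants, splits the N functional units evenly across the clusters, and ignores inter-cluster interconnect and data-copy overhead. The function name `clustered_rf_cost` is hypothetical.

```python
def clustered_rf_cost(n_fus: int, n_clusters: int) -> dict:
    """Relative register-file cost of equal clusters versus one monolithic file.

    Assumption: area ~ N^3, delay ~ N^(3/2), power ~ N^3 with unit constants,
    where N is the number of functional units attached to a register file.
    Inter-cluster interconnect and copy overheads are NOT modeled.
    """
    if n_clusters <= 0 or n_fus <= 0 or n_fus % n_clusters != 0:
        raise ValueError("n_fus must be a positive multiple of n_clusters")
    per_cluster = n_fus // n_clusters
    mono_area = n_fus ** 3        # also stands in for monolithic power
    mono_delay = n_fus ** 1.5
    # Area and power add up over all clusters; delay is set by one cluster.
    return {
        "area": n_clusters * per_cluster ** 3 / mono_area,
        "delay": per_cluster ** 1.5 / mono_delay,
        "power": n_clusters * per_cluster ** 3 / mono_area,
    }
```

Under these assumptions, `clustered_rf_cost(8, 2)` reports a quarter of the monolithic area and power. The price is the data copying between clusters noted on the slide, which this toy model leaves out.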
10Overall Methodology
[Flow diagram: Parallel Application Model; Constraints; Manual parallel model refinement; Refined performance numbers; Verification using simulation; FPGA Prototype; RTOS Specialization]
11System Level Exploration
[Flow diagram: Parallel Application Model; Constraints; Component Library; Estimations (communication time, context switch overheads, etc.); Annotated Task Graph; Estimator measurements (performance, resource utilizations, power); Processor Selection; Partitioning; Interconnection Arch. Evaluation; Memory Arch. Evaluation; decision "Constraints met?" (Y/N); Reconfiguration; System Arch. Description]
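The boxes on this slide suggest an iterate-until-feasible loop: evaluate a candidate architecture, check the constraints, and reconfigure if they are not met. The sketch below is one possible reading of that flow, not the project's implementation. All names (`Estimate`, `Constraints`, `explore`, `toy_estimate`) and the cost numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Estimate:
    """Hypothetical estimator output; the fields are assumptions."""
    exec_time: float   # estimated execution time (arbitrary units)
    power: float       # estimated power (arbitrary units)


@dataclass(frozen=True)
class Constraints:
    max_exec_time: float
    max_power: float

    def met(self, e: Estimate) -> bool:
        return e.exec_time <= self.max_exec_time and e.power <= self.max_power


def explore(candidates: Iterable[dict],
            estimate: Callable[[dict], Estimate],
            constraints: Constraints) -> Optional[dict]:
    """Return the first candidate whose estimate meets the constraints.

    Each candidate stands for one pass through processor selection,
    partitioning, and interconnect/memory evaluation. Moving on to the
    next candidate plays the role of the "Reconfiguration" step.
    """
    for arch in candidates:
        if constraints.met(estimate(arch)):
            return arch      # "Constraints met? Y": emit the description
    return None              # candidates exhausted without a feasible point


def toy_estimate(arch: dict) -> Estimate:
    # Made-up model: more processors means shorter time but higher power.
    n = arch["processors"]
    return Estimate(exec_time=100.0 / n, power=2.0 * n)
```

For example, calling `explore` with candidates of 1, 2, 4, and 8 processors and `Constraints(30.0, 10.0)` stops at the 4-processor candidate, the first one that satisfies both limits under the toy model.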
12VLIW ASIP Synthesis Methodology
[Flow diagram: Task Set and Constraints; Architecture Description; Application Parameter Extraction; Architecture Design Space Exploration; Retargetable Compiler; Instruction Encoding Specialization; Validation (simulation with encoded instructions); Architecture Description (output to synthesizer)]
13Validation Framework
[Flow diagram: Task-set; Architecture Description; Retargetable Compiler; Retargetable Assembler; Retargetable Simulator; Power Consumption Information Gen.; Performance and Power Numbers; Output to Other Tools]
14Frameworks in Place
- System Level Activities
- Synthesis framework for application specific multiprocessors
- Heterogeneous multiprocessor simulation infrastructure
- Prototype and validation platform for LEON based multiprocessor SoC
- Real time kernel for multiprocessor LEON
- A random process network generator
- Subsystem Level Activities
- A framework to evaluate clustered VLIW processors
- Synthesizable RTL for single cluster VLIW processor
- High level synthesis framework with optimizations
15Work Done: System Level
- System Level Design Space Exploration
- Addresses the synthesis/mapping of process networks onto heterogeneous multiprocessors
- Existing work
- Deals mostly with acyclic process networks
- Data independent process behavior
- Models are developed to estimate the additional delays due to communication conflicts
- Statistical process behavior is exploited to provide cheaper solutions
- Quality-of-service and energy efficiency are also being considered
- One publication in VLSI Design, Jan. 2004, Mumbai, India
16Work Done: System Level (cont.)
- Validation
- Estimation models are validated against performance statistics generated by system simulation
- Requires a simulator framework that is flexible enough to assemble a heterogeneous multiprocessor with highly customized cores
- Should be fast enough to simulate realistic applications
- Developed SrijanSim, a cycle-accurate simulator framework centered around transaction level modeling
- Generic and highly modular simulator
- Uses both state-of-the-art and novel techniques to expedite simulation
- Supports a rich variety of system components and communication architectures
- Reduces model development time and system composition effort
- Implemented simulation models of a retargetable VLIW ASIP core, SRAM, FIFO memory, point-to-point links, and shared buses within this simulation framework
- Planning to release the simulator for public access in the first quarter of 2004
17Work Done: Sub-system Level
- Subsystem Level Design Space Exploration
- Exploring the design choices in VLIW ASIP
- Analyzed real applications to judge the suitability of high ILP architectures
- Identified the design space for inter-cluster communication
- Built a framework to analyze various inter-cluster communication mechanisms
- Systematic evaluation of the impact of various FU-FU, FU-RF and RF-RF interconnection architectures on achievable ILP
- Demonstrated that the most commonly used type of interconnection, RF-to-RF, is not a good candidate
- Proposed a new interconnection network
- Accepted for publication in the proceedings of Workshop on Application Specific Processors (WASP)
18Work Done: Sub-system Level (cont.)
- Power and Energy Estimation
- Recently power has become the driving design constraint
- System and sub-system level synthesis offers more power optimization opportunities, but requires reliable estimates
- Set up the tool flow for power characterization
- Power library for VLIW ASIP components
- In this project we built power models for various VLIW components and developed a methodology to use these in the design space exploration phase
- Memory Synthesis
- Customization of Cache Memory for Embedded Systems
- Goal of this project was to generate an on-chip cache configuration by performing application analysis and estimations
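To illustrate what choosing a cache configuration from application analysis can look like, here is a minimal trace-driven sketch. It is not the project's method: it assumes a direct-mapped cache, a fixed 16-byte line, and a brute-force sweep over candidate sizes. The function names `miss_rate` and `smallest_cache` are hypothetical.

```python
def miss_rate(trace, cache_bytes, line_bytes=16):
    """Miss rate of a direct-mapped cache on a list of byte addresses."""
    n_lines = cache_bytes // line_bytes
    if n_lines <= 0:
        raise ValueError("cache must hold at least one line")
    tags = [None] * n_lines          # one stored tag per cache line
    misses = 0
    for addr in trace:
        block = addr // line_bytes   # memory block number
        index = block % n_lines      # cache line the block maps to
        tag = block // n_lines       # identifies which block occupies the line
        if tags[index] != tag:
            misses += 1
            tags[index] = tag        # fill the line on a miss
    return misses / len(trace) if trace else 0.0


def smallest_cache(trace, sizes, target):
    """Smallest size (bytes) in `sizes` with miss rate <= target, else None."""
    for size in sorted(sizes):
        if miss_rate(trace, size) <= target:
            return size
    return None
```

A trace that sweeps a 64-byte array four times, for instance, thrashes a 32-byte cache but fits a 64-byte one after the first pass, so `smallest_cache` with a 10% miss-rate target picks 64 bytes. A practical flow would also vary line size and associativity and weigh area and energy.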
19Work Done: Sub-system Level (cont.)
- Implementation and prototyping
- Requires synthesizable descriptions of various components of the target architecture
- We are using the synthesizable LEON as the RISC processor, but no such core is available for VLIW
- Designed and implemented a synthesizable VLIW core
- Studied various micro-architectural choices available
- Designed a parameterized VLIW processor
- Synthesized the core using ASIC synthesis tools and the VTVT standard-cell library from Virginia Tech
- A 4 issue slot configuration works at a clock speed of 200 MHz in 0.25 um technology; higher clock speeds (up to 400 MHz, we hope) are possible with sophisticated libraries
- This core is useful not only for prototyping but also for generating realistic power and performance estimates for high level exploration tools
20Work Done: Behavioral Synthesis
- Source level optimizations
- Goal was to enhance the C-to-VHDL translator with optimizations such as loop unrolling and bit-width analysis
- FSM derivation from SystemC
- SystemC is becoming a de facto standard for system specification
- Goal was to develop a framework for hardware synthesis from SystemC specification by leveraging available high-level synthesis tool support
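As an illustration of what bit-width analysis computes, the sketch below propagates value ranges through additions and multiplications and derives the minimum width for the result. Interval-based range propagation is one common approach and is used here as an assumption; the slides do not say which technique the translator uses. All function names are hypothetical.

```python
def bits_for_range(lo: int, hi: int) -> int:
    """Minimum bit width that can represent every integer in [lo, hi].

    Uses an unsigned encoding when lo >= 0, two's complement otherwise.
    """
    if lo > hi:
        raise ValueError("empty range")
    if lo >= 0:
        return max(1, hi.bit_length())
    # Two's complement width w needs -2^(w-1) <= lo and hi <= 2^(w-1) - 1.
    return max((-lo - 1).bit_length(), hi.bit_length()) + 1


def add_range(a, b):
    """Interval of x + y for x in a and y in b; a and b are (lo, hi) tuples."""
    return (a[0] + b[0], a[1] + b[1])


def mul_range(a, b):
    """Interval of x * y for x in a and y in b (extremes occur at endpoints)."""
    products = [x * y for x in a for y in b]
    return (min(products), max(products))
```

For an 8-bit unsigned `x` and a 4-bit unsigned `y`, the expression `x * y + x` has the range `add_range(mul_range((0, 255), (0, 15)), (0, 255))`, and `bits_for_range` reports that 12 bits suffice for it rather than a default 32-bit datapath.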
21Work Done: Software Synthesis
- Compiler optimization to exploit pipeline registers and forwarding circuitry
- Idea is to maximize the utilization of available architectural resources through code optimizations
- This optimization aims to reduce pressure on the register file, a potential bottleneck of multiple issue processors
- Goal was to design and modify the scheduling and register allocation passes in the IMPACT compiler to incorporate this optimization
- Extensions to RtKer
- RtKer is an in-house real time OS that has been ported to x86, ARM, and Trimedia
- First goal was to map RtKer onto a LEON multiprocessor
- Second goal was to develop a framework to customize scheduler(s)
- Binary utilities for multiprocessor code generation
- Objective was to develop assembler and linker tools to generate memory footprints for various multiprocessor architectures
22Work Done: Case Studies and Prototyping
- Application modeling and case studies
- LipSync
- Converts text into an audiovisual speech stream incorporating lip movements
- Goal was to map the LipSync application onto an embedded ARM platform
- Ray Tracer
- A very computationally intensive graphics rendering technique
- Objective was to develop an FPGA hardware accelerator for the computation intensive part and interface it with the host through PCI
- Prototyping
- Extended LEON-MP, a shared bus and shared memory based multiprocessor built around LEON, to incorporate local memories, which reduce the contention for global resources
23Current Status
24Work in Progress
- Energy aware synthesis of application specific multiprocessors
- Impact of inter-cluster connectivity on clock period in clustered VLIW processors
- Case studies on MPEG4 and text-to-speech applications
- Loop unrolling optimizations in high level synthesis
- Cache design space exploration
- Extensions on RtKer-MP
25Expenditure
- Amount of grant sanctioned: Rs. 54,80,000
- Amount of funds released: Rs. 31,60,000
- Expenditure in FY 2002-03: Rs. 71,385
- Expenditure in FY 2003-04: Rs. 11,70,486
- Expenditure in FY 2004-05 till date: Rs. 7,40,121
- Total expenditure till date: Rs. 19,81,992
- Balance of released funds: Rs. 11,78,008
- Balance of grant sanctioned: Rs. 34,98,008
26Thanks