Srijan: A Methodology for Synthesis of ASIP Based Multiprocessor SoCs: Project Progress Presentation

1
Srijan: A Methodology for Synthesis of ASIP Based
Multiprocessor SoCs
Project Progress Presentation
  • Anshul Kumar

2
Outline
  • Introduction
  • Participants
  • Design Space
  • Methodology
  • Key Achievements
  • Work Done
  • Work in Progress

3
Srijan
  • Objective
  • To develop an integrated framework for
    synthesis of embedded systems built around
    ASIP (RISC and VLIW) based multiprocessors,
    in which both system-level and
    subsystem-level design spaces can be
    explored efficiently
  • A 3-year project started in November 2002
  • Funded by the Naval Research Board, Govt. of
    India

4
Participants
  • Faculty
  • Prof. Anshul Kumar (chief investigator)
  • Prof. M. Balakrishnan
  • Dr. Preeti Ranjan Panda
  • Prof. Subhashis Banerjee
  • Dr. Prem Kalra
  • Project Staff
  • Satya Kiran M.N.V.
  • Nitin Bhardwaj
  • Research Scholars (PhD)
  • Anup Gangwar
  • Basant Kumar Dwivedi
  • Students
  • M.Techs: 7
  • B.Techs: 12

5
Why Application Specific Multiprocessors?
  • Higher performance
  • Smaller area
  • Lower power
  • Lower cost
[Figure: a compute-intensive application with a control part, mapped onto a general purpose multiprocessor (no customization, average performance) versus an application specific multiprocessor (customization, higher performance)]
6
Customization Opportunities -> System Level
7
Customization Opportunities -> System Level
  • Compute units
  • Number and type: ASICs, VLIW or RISC ASIPs, DSPs
  • Interconnection network
  • Shared bus, multistage interconnection networks
    (MINs) or crossbar switches
  • Custom interconnection based on the communication
    pattern of the application
  • Memory architecture
  • Types: synchronous/asynchronous,
    pipelined/non-pipelined, etc.
  • Various transfer modes
  • Custom memories: FIFOs, frame buffers, etc.

8
Customization Opportunities -> Processor Level
[Figure: processor datapath with register files RF1 and RF2, functional units FU1 to FU4 and application-specific units AFU1 and AFU2, annotated with three customization points: number and type of register files, number and type of FUs, and interconnect]
9
Customization Opportunities -> Processor Level
  • Functional Units
  • MISO, MIMO, MIMO with LD/ST
  • Rigid or flexible I/O timeshapes
  • Register File Clustering
  • Each FU can read from and write to only a subset
    of registers
  • Area grows as N^3, delay as N^(3/2) and power
    as N^3, where N is the number of functional
    units connected to the register file
  • Thorough application analysis is required to
    minimize data copying between clusters
  • Interconnects
  • Between clusters, and between clusters
    and memory
  • Analysis of data access patterns is required for
    evaluating cost-performance trade-offs
  • Current ASIP vendors do not offer customizable
    interconnects
  • Instruction encoding and decoding
  • Reduce or remove explicit NOPs in code
  • Affects code size, object code compatibility,
    branch misprediction penalty, hardware cost,
    and address specification in code
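As a rough illustration of the register-file scaling rule on this slide, the sketch below compares one register file shared by N functional units with C clusters of N/C units each. It assumes the proportionalities hold with unit constants and ignores the inter-cluster interconnect and copy operations, so only relative numbers are meaningful; `rf_cost` and `clustered_cost` are hypothetical helper names.

```python
def rf_cost(n_fus: int) -> dict:
    """Relative cost of one register file shared by n_fus functional units.

    Assumes area ~ N^3, delay ~ N^(3/2) and power ~ N^3 with unit constants.
    """
    return {"area": n_fus ** 3, "delay": n_fus ** 1.5, "power": n_fus ** 3}


def clustered_cost(n_fus: int, clusters: int) -> dict:
    """Relative cost when n_fus FUs are split evenly across `clusters` register files.

    Area and power add up over the clusters; delay is that of one smaller file.
    Inter-cluster interconnect and copy operations are ignored.
    """
    if n_fus % clusters:
        raise ValueError("n_fus must be divisible by clusters")
    per_file = rf_cost(n_fus // clusters)
    return {
        "area": clusters * per_file["area"],
        "delay": per_file["delay"],
        "power": clusters * per_file["power"],
    }


if __name__ == "__main__":
    mono, split = rf_cost(8), clustered_cost(8, 4)
    for metric in ("area", "delay", "power"):
        print(f"{metric:>5}: one file {mono[metric]:7.1f}   4 clusters {split[metric]:7.1f}")
```

For N = 8 and four clusters, area and power fall by a factor of C^2 = 16 in this model (512 versus 32), while the per-file delay drops from 8^(3/2) ≈ 22.6 to 2^(3/2) ≈ 2.8. The data copies that clustering forces are what the application analysis on this slide has to keep small.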

10
Overall Methodology
[Figure: overall methodology flow with blocks Parallel Application Model, Constraints, Manual parallel model refinement, Refined performance numbers, Verification using simulation, FPGA Prototype and RTOS Specialization]
11
System Level Exploration
[Figure: system-level exploration flow. Inputs: Parallel Application Model, Constraints, Component Library. Blocks: Estimator (estimations: communication time, context switch overheads, etc.; measurements: performance, resource utilizations, power), Annotated Task Graph, Processor Selection, Partitioning, Interconnection Arch. Evaluation, Memory Arch. Evaluation, a "Constraints met?" decision (Y/N), Reconfiguration, and the output System Arch. Description]
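The loop on this slide (processor selection, partitioning, evaluation, constraint check, reconfiguration) can be sketched as below. Everything concrete here is an assumption for illustration: the two-entry component library, the per-type execution times, the fixed communication and context-switch costs, the greedy partitioner, and the cost-ordered enumeration standing in for "reconfiguration". The estimator treats the process network as concurrently running processes and reports the load of the busiest processor as the iteration period.

```python
from itertools import combinations_with_replacement

# Hypothetical component library: processor type -> relative cost.
LIBRARY = {"risc": 1.0, "vliw": 2.5}

# Hypothetical annotated process network: execution time per iteration on each type.
TASKS = {
    "src": {"risc": 2.0, "vliw": 1.5},
    "dct": {"risc": 9.0, "vliw": 3.0},
    "quant": {"risc": 4.0, "vliw": 2.0},
    "sink": {"risc": 1.0, "vliw": 1.0},
}
# (producer, consumer, cost charged to the producer if they sit on different processors)
CHANNELS = [("src", "dct", 0.5), ("dct", "quant", 0.5), ("quant", "sink", 0.5)]
CONTEXT_SWITCH = 0.2  # assumed cost per process per iteration on a shared processor


def partition(procs):
    """Greedy partitioning: largest task first, onto the processor where it finishes earliest."""
    load = [0.0] * len(procs)
    placement = {}
    for task in sorted(TASKS, key=lambda t: -min(TASKS[t].values())):
        best = min(range(len(procs)), key=lambda i: load[i] + TASKS[task][procs[i]])
        placement[task] = best
        load[best] += TASKS[task][procs[best]]
    return placement


def estimate(procs, placement):
    """Estimated iteration period = load of the busiest processor, including overheads."""
    load = [0.0] * len(procs)
    count = [0] * len(procs)
    for task, i in placement.items():
        load[i] += TASKS[task][procs[i]]
        count[i] += 1
    for i, n in enumerate(count):
        if n > 1:  # processes sharing a processor pay context-switch overhead
            load[i] += CONTEXT_SWITCH * n
    for src, dst, cost in CHANNELS:
        if placement[src] != placement[dst]:  # only cross-processor traffic costs time
            load[placement[src]] += cost
    return max(load)


def explore(period_bound, max_procs=3):
    """Try allocations from cheapest to costliest; return the first that meets the bound."""
    allocations = [
        combo
        for k in range(1, max_procs + 1)
        for combo in combinations_with_replacement(sorted(LIBRARY), k)
    ]
    allocations.sort(key=lambda combo: sum(LIBRARY[p] for p in combo))
    for procs in allocations:               # processor selection
        placement = partition(procs)         # partitioning
        period = estimate(procs, placement)  # estimation
        if period <= period_bound:           # constraints met?
            return procs, placement, period
        # otherwise "reconfigure": move on to the next, costlier allocation
    return None
```

With these made-up numbers, `explore(9.0)` settles on a single VLIW processor with an estimated period of about 8.3, while a looser bound of 9.5 is already met by the cheaper pair of RISC cores.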
12
VLIW ASIP Synthesis Methodology
[Figure: VLIW ASIP synthesis flow with blocks Task Set and Constraints, Architecture Description, Application Parameter Extraction, Architecture Design Space Exploration, Retargetable Compiler, Instruction Encoding Specialization, Validation (simulation with encoded instructions), and Architecture Description (output to synthesizer)]
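One simple form of instruction encoding specialization is to drop the explicit NOPs from each VLIW bundle and record which issue slots are occupied in a per-bundle mask. The sketch below assumes that scheme, a fixed 32-bit operation width and hypothetical names (`encode`, `decode`, `code_size_bits`); it is not necessarily the encoding used in this flow.

```python
from typing import List, Optional, Tuple

Bundle = List[Optional[str]]  # one entry per issue slot; None is an explicit NOP
OP_BITS = 32  # assumed width of one encoded operation


def encode(bundle: Bundle) -> Tuple[int, List[str]]:
    """Drop NOPs: return a slot-occupancy mask and the remaining operations."""
    mask, ops = 0, []
    for slot, op in enumerate(bundle):
        if op is not None:
            mask |= 1 << slot
            ops.append(op)
    return mask, ops


def decode(mask: int, ops: List[str], width: int) -> Bundle:
    """Re-expand a compressed bundle, putting NOPs back where mask bits are clear."""
    remaining = iter(ops)
    return [next(remaining) if (mask >> slot) & 1 else None for slot in range(width)]


def code_size_bits(program: List[Bundle], width: int) -> Tuple[int, int]:
    """Return (uncompressed, compressed) size in bits; each mask costs `width` bits."""
    fixed = len(program) * width * OP_BITS
    compressed = sum(width + len(encode(b)[1]) * OP_BITS for b in program)
    return fixed, compressed
```

For the 4-slot program `[["add", None, "ld", None], [None, None, None, "br"], ["mul", "sub", None, None]]` the size drops from 384 to 172 bits. The price is a decoder that routes operations back to their slots, and variable-length bundles, which is why hardware cost and address specification appear among the factors affected by instruction encoding.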
13
Validation Framework
[Figure: validation framework with blocks Task-set, Architecture Description, Retargetable Compiler, Retargetable Assembler, Retargetable Simulator and Power Consumption Information Gen.; outputs: Performance and Power Numbers, Output to Other Tools]
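The role of the retargetable simulator in this framework can be illustrated with a sketch in which the machine is described by data rather than code. The architecture description fields, the per-operation energy values and the assumption of one bundle per cycle with no stalls are all hypothetical; a real simulator would also model latencies, the memory hierarchy and pipeline hazards.

```python
from collections import Counter

# Hypothetical architecture description: retargeting the simulator means
# editing this data, not the simulator code.
ARCH = {
    "issue_width": 4,
    "fus": {"alu": 2, "mul": 1, "mem": 1},
    "op_class": {"add": "alu", "sub": "alu", "mul": "mul", "ld": "mem", "st": "mem"},
    "energy_pj": {"alu": 1.0, "mul": 4.0, "mem": 6.0},  # assumed energy per operation
    "idle_pj_per_cycle": 0.5,  # assumed clock and leakage energy per cycle
}


def simulate(bundles, arch):
    """Check each bundle against the description; return (cycles, energy in pJ).

    Assumes one bundle issues per cycle with no stalls.
    """
    energy = 0.0
    for n, bundle in enumerate(bundles):
        if len(bundle) > arch["issue_width"]:
            raise ValueError(f"bundle {n} exceeds the issue width")
        used = Counter(arch["op_class"][op] for op in bundle)
        for fu, count in used.items():
            if count > arch["fus"][fu]:
                raise ValueError(f"bundle {n} needs {count} {fu} units")
            energy += count * arch["energy_pj"][fu]
        energy += arch["idle_pj_per_cycle"]
    return len(bundles), energy
```

Because `simulate` reads the issue width, unit counts and energy table from `ARCH`, retargeting amounts to passing a different description. For the bundles `[["add", "ld"], ["mul", "add", "sub"], ["st"]]` it reports 3 cycles and 20.5 pJ.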
14
Frameworks in Place
  • System Level Activities
  • Synthesis framework for application specific
    multiprocessors
  • Heterogeneous multiprocessor simulation
    infrastructure
  • Prototype and validation platform for LEON based
    multiprocessor SoC
  • Real time kernel for multiprocessor LEON
  • A random process network generator
  • Subsystem Level Activities
  • A framework to evaluate clustered VLIW processors
  • Synthesizable RTL for single cluster VLIW
    processor
  • High level synthesis framework with optimizations

15
Work Done: System Level
  • System Level Design Space Exploration
  • Addresses the synthesis/mapping problem of process
    networks onto heterogeneous multiprocessors
  • Existing work
  • Deals mainly with acyclic process networks
  • Assumes data-independent process behavior
  • Models are developed to estimate the additional
    delays due to communication conflicts
  • Statistical process behavior is exploited to
    provide cheaper solutions
  • Quality-of-service and energy efficiency are also
    being considered
  • One publication at VLSI Design, Jan. 2004, Mumbai,
    India
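The slides do not give the form of the communication-conflict models. As a stand-in, the textbook M/D/1 queue estimates the mean wait on a shared bus when transactions arrive as a Poisson process and each occupies the bus for a fixed time S: W = ρS / (2(1 − ρ)) with utilisation ρ = λS. Both assumptions and the function name are illustrative only.

```python
def bus_wait_time(arrival_rate: float, service_time: float) -> float:
    """Mean queueing delay on a shared bus modelled as an M/D/1 queue.

    arrival_rate: total transactions per cycle from all masters (Poisson, assumed)
    service_time: fixed bus occupancy per transaction, in cycles
    """
    rho = arrival_rate * service_time  # bus utilisation
    if rho >= 1.0:
        raise ValueError("bus saturated: utilisation must be below 1")
    return rho * service_time / (2.0 * (1.0 - rho))
```

For example, 4 processors each issuing 0.05 transactions per cycle with 4-cycle transfers give ρ = 0.8 and a mean wait of about 8 cycles, twice the transfer time itself: the kind of extra delay an estimator must add to a conflict-free schedule.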

16
Work Done: System Level (cont.)
  • Validation
  • Estimation models are validated against
    performance statistics generated by system
    simulation
  • Requires a simulator framework that is flexible
    enough to assemble a heterogeneous multiprocessor
    with highly customized cores
  • Should be fast enough to be able to simulate
    realistic applications
  • Developed SrijanSim, a cycle-accurate simulator
    framework centered around transaction level
    modeling
  • Generic and highly modular simulator
  • Uses both state-of-the-art and novel techniques
    to speed up simulation
  • Supports a rich variety of system components and
    communication architectures
  • Reduces model development time and system
    composition effort
  • Implemented simulation models of a retargetable
    VLIW ASIP core, SRAM, FIFO memory, point-to-point
    links, shared buses within this simulation
    framework
  • Planning to release the simulator for public
    access in the first quarter of 2004
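SrijanSim's internals are not described in these slides. The sketch below only illustrates the transaction-level idea it is built around: a shared bus is modelled as one method call per burst that returns a completion time, with no signal-level detail. First-come-first-served arbitration, the one-cycle arbitration overhead and the names `SharedBus` and `run` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SharedBus:
    """Transaction-level shared bus: one call per burst, no signal-level detail."""
    cycles_per_word: int = 1
    arbitration_cycles: int = 1
    free_at: int = 0       # first cycle at which the bus is idle again
    busy_cycles: int = 0   # total occupancy, for utilisation statistics

    def transfer(self, request_time: int, words: int) -> int:
        """Serve one burst of `words` words and return its completion cycle."""
        start = max(request_time, self.free_at)  # wait while another master holds the bus
        duration = self.arbitration_cycles + words * self.cycles_per_word
        self.free_at = start + duration
        self.busy_cycles += duration
        return self.free_at


def run(requests):
    """Replay (time, master, words) requests in time order (FCFS arbitration assumed)."""
    bus = SharedBus()
    done = {}
    for time, master, words in sorted(requests, key=lambda r: r[0]):
        done.setdefault(master, []).append(bus.transfer(time, words))
    return done, bus
```

Replaying the requests `(0, "cpu0", 4)`, `(1, "cpu1", 2)` and `(10, "cpu0", 1)` shows the conflict: cpu1 asks at cycle 1 but waits until cycle 5, so its transfer completes at cycle 8.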

17
Work Done: Sub-system Level
  • Subsystem Level Design Space Exploration
  • Exploring the design choices in VLIW ASIPs
  • Analyzed real applications to judge the
    suitability of high ILP architectures
  • Identified the design space for inter-cluster
    communication
  • Built a framework to analyze various
    inter-cluster communication mechanisms
  • Systematic evaluation of the impact of various
    FU-FU, FU-RF and RF-RF interconnection
    architectures on achievable ILP
  • Demonstrated that the most commonly used type of
    interconnection, RF-to-RF, is not a good
    candidate
  • Proposed a new interconnection network
  • Accepted for publication in the proceedings of
    Workshop on Application Specific Processors (WASP)
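The slides do not say which ILP measure was used to judge the applications. One common measure, assumed here, is the number of operations divided by the length of the critical path when resources are unlimited. The sketch computes it for an acyclic dependence graph with per-operation latencies; the graph format and latencies are hypothetical.

```python
from typing import Dict, List


def available_ilp(deps: Dict[str, List[str]], latency: Dict[str, int]) -> float:
    """Operations per cycle with unlimited resources: len(ops) / critical-path length.

    deps maps each operation to the operations whose results it reads (acyclic).
    """
    finish: Dict[str, int] = {}

    def finish_time(op: str) -> int:
        # ASAP: an operation starts when its last predecessor finishes.
        if op not in finish:
            start = max((finish_time(p) for p in deps[op]), default=0)
            finish[op] = start + latency[op]
        return finish[op]

    critical_path = max(finish_time(op) for op in deps)
    return len(deps) / critical_path


if __name__ == "__main__":
    # (a*b) + (c*d) - (e+f): two multiplies and an add feed an add and a subtract.
    deps = {"m1": [], "m2": [], "a1": [], "s1": ["m1", "m2"], "r": ["s1", "a1"]}
    latency = {"m1": 2, "m2": 2, "a1": 1, "s1": 1, "r": 1}
    print(available_ilp(deps, latency))
```

For `(a*b) + (c*d) - (e+f)` with 2-cycle multiplies and 1-cycle adds, five operations fit in a 4-cycle critical path, so the available ILP is 1.25; with unit latencies it rises to 5/3.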

18
Work Done: Sub-system Level (cont.)
  • Power and Energy Estimation
  • Power has recently become the driving design
    constraint
  • System- and subsystem-level synthesis offer more
    power optimization opportunities, but exploiting
    them requires reliable estimates
  • Set up the tool flow for power characterization
  • Power library for VLIW ASIP components
  • In this project we built power models for various
    VLIW components and developed a methodology to use
    them in the design space exploration phase
  • Memory Synthesis
  • Customization of Cache Memory for Embedded
    Systems
  • The goal of this project was to generate an on-chip
    cache configuration through application
    analysis and estimation
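The slide says the cache configuration was derived from application analysis and estimation, without giving the method. As an assumed illustration only, the sketch estimates miss rates by simulating a set-associative LRU cache on an address trace and picks the smallest power-of-two size that meets a target; the search range, line size and function names are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable, List, Optional


def miss_rate(trace: Iterable[int], size_bytes: int, line_bytes: int, ways: int) -> float:
    """Simulate a set-associative LRU cache on a byte-address trace."""
    sets = size_bytes // (line_bytes * ways)
    cache = [OrderedDict() for _ in range(sets)]  # per set: tags in LRU-to-MRU order
    accesses = misses = 0
    for addr in trace:
        line = addr // line_bytes
        index, tag = line % sets, line // sets
        lru = cache[index]
        accesses += 1
        if tag in lru:
            lru.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) == ways:
                lru.popitem(last=False)  # evict the least recently used line
            lru[tag] = None
    return misses / accesses if accesses else 0.0


def smallest_cache(trace: List[int], target: float,
                   line_bytes: int = 16, ways: int = 1) -> Optional[int]:
    """Return the smallest power-of-two size (up to 64 KB) meeting the miss-rate target."""
    size = line_bytes * ways
    while size <= 64 * 1024:
        if miss_rate(trace, size, line_bytes, ways) <= target:
            return size
        size *= 2
    return None
```

For a 256-byte array read word by word ten times, a 128-byte direct-mapped cache with 16-byte lines misses on every line (miss rate 0.25), while 256 bytes leaves only the 16 compulsory misses (0.025), so `smallest_cache(trace, 0.05)` returns 256.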

19
Work Done: Sub-system Level (cont.)
  • Implementation and prototyping
  • Requires synthesizable descriptions of various
    components of the target architecture
  • We use the synthesizable LEON core as the RISC
    processor, but no such core is available for VLIW
  • Designed and implemented a synthesizable VLIW
    core
  • Studied various micro-architectural choices
    available
  • Designed a parameterized VLIW processor
  • Synthesized the core using ASIC synthesis tools
    and the VTVT standard-cell library from Virginia
    Tech
  • A 4-issue-slot configuration runs at a clock
    speed of 200 MHz in 0.25 µm technology; higher
    clock speeds (up to 400 MHz, we hope) should be
    possible with more sophisticated libraries
  • This core is useful not only for prototyping but
    also in generating realistic power and
    performance estimates for high level exploration
    tools

20
Work Done: Behavioral Synthesis
  • Source level optimizations
  • The goal was to enhance the C-to-VHDL translator
    with optimizations such as loop unrolling and
    bit-width analysis
  • FSM derivation from SystemC
  • SystemC is becoming a de facto standard for system
    specification
  • The goal was to develop a framework for hardware
    synthesis from SystemC specifications by
    leveraging available high-level synthesis tool
    support
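Bit-width analysis can be done by propagating value ranges through the data flow. The sketch assumes simple interval arithmetic over integer ranges, which is one common approach but not necessarily what the translator implements; the `Range` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Closed integer interval [lo, hi] that a signal may take."""
    lo: int
    hi: int

    def __add__(self, other: "Range") -> "Range":
        return Range(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Range") -> "Range":
        # The extremes of a product of intervals are among the corner products.
        corners = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Range(min(corners), max(corners))

    def bits(self) -> int:
        """Smallest width holding the range: unsigned if lo >= 0, else two's complement."""
        if self.lo >= 0:
            return max(self.hi.bit_length(), 1)
        magnitude = max((-self.lo - 1).bit_length(), self.hi.bit_length())
        return magnitude + 1  # one extra bit for the sign
```

If `x` and `y` are 8-bit unsigned pixels and a coefficient lies in [-3, 5], then `x*c + y*c` lies in [-1530, 2550] and needs 13 bits, so a 13-bit datapath can replace the 32-bit `int` a C description would typically imply.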

21
Work Done: Software Synthesis
  • Compiler optimization to exploit pipeline
    registers and forwarding circuitry
  • Idea is to maximize the utilization of available
    architectural resources through code
    optimizations
  • This optimization aims to reduce pressure on the
    register file, a potential bottleneck in
    multiple-issue processors
  • The goal was to design and modify the scheduling and
    register allocation passes of the IMPACT compiler to
    incorporate this optimization
  • Extensions to RtKer
  • RtKer is an in-house real-time OS that has been
    ported to x86, ARM and TriMedia
  • The first goal was to map RtKer onto the LEON
    multiprocessor
  • The second goal was to develop a framework to
    customize the scheduler(s)
  • Binary utilities for multiprocessor code
    generation
  • The objective was to develop assembler and linker
    tools to generate memory footprints for various
    multiprocessor architectures
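The forwarding optimization on this slide can be illustrated with a simple test: a value whose every use issues within the forwarding window of its producer, and which is not live out of the region, can be read from pipeline registers and needs no architectural register. The sketch assumes a fixed forwarding window in cycles and a precomputed schedule; it is not the IMPACT implementation, and the names are hypothetical.

```python
from typing import Dict, List, Set


def forwardable_values(
    issue_cycle: Dict[str, int],
    uses: Dict[str, List[str]],
    window: int,
    live_out: Set[str],
) -> List[str]:
    """Values that never need an architectural register under a given schedule.

    A value qualifies if it is not live out of the region and every consumer
    issues 1..window cycles after its producer, i.e. while the result is still
    available on the pipeline registers / forwarding paths (assumed model).
    """
    result = []
    for value, consumers in uses.items():
        if value in live_out or not consumers:
            continue
        produced = issue_cycle[value]
        if all(0 < issue_cycle[c] - produced <= window for c in consumers):
            result.append(value)
    return result
```

With `t1` at cycle 0 feeding `t2` at cycle 2, `t2` also used at cycle 7, and a 2-cycle window, only `t1` escapes the register file. Moving a use closer to its producer can bring more values inside the window, which is why the scheduling and register allocation passes have to change together.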

22
Work Done: Case Studies and Prototyping
  • Application modeling and case studies
  • LipSync
  • Converts text into an audiovisual speech
    stream incorporating lip movements
  • The goal was to map the LipSync application onto an
    embedded ARM platform
  • Ray Tracer
  • A very computationally intensive graphics
    rendering technique
  • The objective was to develop an FPGA hardware
    accelerator for the computation-intensive part and
    interface it with the host through PCI
  • Prototyping
  • Extended LEON-MP, a shared-bus, shared-memory
    multiprocessor built around LEON, to
    incorporate local memories, which reduce
    contention for global resources
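The slides do not say which part of the ray tracer was moved to the FPGA. Ray-object intersection tests usually dominate ray-tracing run time, so the sketch shows a ray-sphere test as an assumed example of the kind of kernel such an accelerator would evaluate in bulk. It solves |o + t·d − c|^2 = r^2, i.e. a·t^2 + 2b·t + cc = 0 with a = d·d, b = (o − c)·d and cc = |o − c|^2 − r^2; the function names are hypothetical.

```python
import math
from typing import Optional, Tuple

Vec = Tuple[float, float, float]


def dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def ray_sphere(origin: Vec, direction: Vec, center: Vec, radius: float) -> Optional[float]:
    """Distance t along the ray to the nearest hit in front of the origin, or None."""
    oc = (origin[0] - center[0], origin[1] - center[1], origin[2] - center[2])
    a = dot(direction, direction)
    b = dot(oc, direction)
    cc = dot(oc, oc) - radius * radius
    disc = b * b - a * cc
    if disc < 0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    for t in ((-b - root) / a, (-b + root) / a):
        if t > 1e-9:  # nearest intersection strictly in front of the origin
            return t
    return None
```

A ray from the origin along +z hits a unit sphere centred at (0, 0, 5) at t = 4. The fixed sequence of multiply-adds and a single square root per test is what makes this kind of kernel a good fit for a pipelined hardware unit.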

23
Current Status
24
Work in Progress
  • Energy aware synthesis of application specific
    multiprocessors
  • Impact of inter-cluster connectivity on clock
    period in clustered VLIW processors
  • Case studies on MPEG4 and text-to-speech
    applications
  • Loop unrolling optimizations in high level
    synthesis
  • Cache design space exploration
  • Extensions to RtKer-MP

25
Expenditure
  • Amount of grant sanctioned Rs. 54,80,000
  • Amount of funds released Rs. 31,60,000
  • Expenditure in FY 2002-03 Rs. 71,385
  • Expenditure in FY 2003-04 Rs. 11,70,486
  • Expenditure in FY 2004-05 till date Rs. 7,40,121
  • Total expenditure till date Rs. 19,81,992
  • Balance of released funds Rs. 11,78,008
  • Balance of grant sanctioned Rs. 34,98,008

26
Thanks