Retargetable Cycle Accurate Simulator for Clustered VLIW Architecture in SystemC

Description:

Studied Texas Instruments C6400 & C6200 VLIW DSP processors. ... Texas Instruments C6x DSP. MTP Presentation, Sem I. 2 December 2002. Slide 27 ...

Provided by: vip94

1
Retargetable Cycle Accurate Simulator for
Clustered VLIW Architecture in SystemC
  • Under Guidance of
  • Prof. Anshul Kumar
  • Prof. M. Balakrishnan
  • Dr. Preeti Ranjan Panda
  • Vipul Jain
  • 98431

2
Presentation Outline
  • Objective
  • Motivation
  • Design Space
  • Simulator Description
  • Work Done/to be Done
  • Acknowledgements and References

3
Objective
  • To define a clustered VLIW architecture and
    implement a retargetable cycle accurate simulator
    in SystemC for it.
  • The simulator will be parameterized and flexible
    to a large extent, allowing users to vary common
    parameters and plug in new components as needed.

6
Motivation
  • VLIW ASIPs increasingly include large numbers of
    functional units (FUs) to meet the high
    throughput requirements of applications
    exhibiting high ILP.
  • A register file shared by a large number
    of FUs quickly becomes a dominant
    cost/performance factor.
  • Clustering a smaller number of FUs around local
    register files may be beneficial even if data
    transfers are required among clusters.

7
Clustered VLIW Architecture
9
Design Space
  • VLIW processors offer customization opportunities
    because of their simplified hardware. The key
    customization domains are as follows:
  • Number and types of functional units; operation
    latencies and pipeline stages are also
    customizable
  • Register file structure
  • Interconnection network between FUs and register
    files
  • Instruction encoding for NOP compression
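The customization domains listed above could be captured as plain configuration data. The following is a minimal sketch in plain C++ (not SystemC); every type and field name in it is hypothetical and chosen for illustration, not taken from the simulator itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// All type and field names below are hypothetical, chosen for illustration.
struct FuSpec {
    std::string type;   // kind of functional unit, e.g. "ADDER" or "LS"
    int latency;        // operation latency in cycles
};

struct ClusterSpec {
    int numRegisters;          // size of the cluster's local register file
    std::vector<FuSpec> fus;   // FUs grouped around that register file
};

struct MachineSpec {
    std::vector<ClusterSpec> clusters;
    int interClusterBuses;   // interconnect between clusters
    bool nopCompression;     // whether the instruction encoding compresses NOPs
};

// Upper bound on operations issued per cycle: one per FU in the machine.
int issueWidth(const MachineSpec& m) {
    int width = 0;
    for (const ClusterSpec& c : m.clusters) {
        width += static_cast<int>(c.fus.size());
    }
    return width;
}

// A two-cluster instance with two FUs per cluster (values are made up).
MachineSpec makeTwoClusterExample() {
    ClusterSpec cluster;
    cluster.numRegisters = 16;
    cluster.fus.push_back(FuSpec{"ADDER", 1});
    cluster.fus.push_back(FuSpec{"LS", 2});

    MachineSpec m;
    m.clusters.push_back(cluster);
    m.clusters.push_back(cluster);
    m.interClusterBuses = 1;
    m.nopCompression = true;
    assert(issueWidth(m) == 4);
    return m;
}
```

Keeping the design point in one value like this makes it easy to vary a single parameter between runs while leaving the rest fixed.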

10
Design Space
[Figure: register files (RF1, RF2) connected through a customizable interconnect to functional units (FU1, FU2, FU3, FU4, AFU1, AFU2). Customizable: number and type of register files, interconnect, number and type of FUs.]
12
Simulator Description
  • The simulator is being written in SystemC, which
    enables, promotes and accelerates system-level
    co-design and IP exchange.
  • The implementation is object oriented, so
    extending and modifying it will be easier.
  • The simulator runs in two phases.
  • In the setup phase, all the functional units,
    register files, memories and the interconnection
    network are initialized.
  • In the execution phase, the simulator executes the
    given program, and performance statistics can be
    extracted, such as the number of cycles taken, memory
    bandwidth utilized and number of cache misses.
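The two-phase structure described above can be sketched in plain C++ as follows. This is an illustration under assumptions, not the simulator's code: the names Component, ToyMemoryClient, Simulator and Stats are hypothetical, and the real modules would be SystemC modules driven by the SystemC kernel rather than by a hand-written loop.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Statistics gathered during the execution phase (hypothetical fields).
struct Stats {
    long cycles = 0;
    long memoryAccesses = 0;
};

// Hypothetical common interface for every simulated component.
class Component {
public:
    virtual ~Component() = default;
    virtual void reset() = 0;              // called once in the setup phase
    virtual void tick(Stats& stats) = 0;   // called once per simulated cycle
};

// Toy component: performs one memory access every `period` cycles.
class ToyMemoryClient : public Component {
public:
    explicit ToyMemoryClient(int period) : period_(period) { assert(period > 0); }
    void reset() override { counter_ = 0; }
    void tick(Stats& stats) override {
        if (++counter_ == period_) {
            counter_ = 0;
            ++stats.memoryAccesses;
        }
    }
private:
    int period_;
    int counter_ = 0;
};

class Simulator {
public:
    void add(std::unique_ptr<Component> c) { components_.push_back(std::move(c)); }

    // Setup phase: initialize every registered component.
    void setup() {
        for (std::size_t i = 0; i < components_.size(); ++i) components_[i]->reset();
        ready_ = true;
    }

    // Execution phase: advance all components cycle by cycle, collecting stats.
    Stats run(long maxCycles) {
        assert(ready_ && "setup() must run before run()");
        Stats stats;
        for (long cycle = 0; cycle < maxCycles; ++cycle) {
            for (std::size_t i = 0; i < components_.size(); ++i) components_[i]->tick(stats);
            ++stats.cycles;
        }
        return stats;
    }
private:
    std::vector<std::unique_ptr<Component>> components_;
    bool ready_ = false;
};
```

Separating setup from execution means the same execution loop can be reused unchanged for any architecture instance built during setup.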

13
Main Modules
14
Setup Phase of Simulator
15
A simple instance of simulated architecture
16
Simulator Description
  • Simulator
  • Uses HMDES to describe the architecture
  • Uses Dinero IV cache simulator for simulating
    cache hierarchy
  • Processor performance and cache simulation
  • Three stage pipeline
  • Fetch
  • Decode
  • Execute
  • Generates statistics and execution trace
  • HMDES
  • Developed by UIUC's IMPACT group
  • Specifies resource usage and latency information
    for an architecture
  • Input is translated to a low level representation
  • Has efficient mechanisms for querying the
    database
  • Does not specify instruction format information
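The three-stage Fetch, Decode, Execute pipeline listed above can be illustrated with a small cycle-counting loop. This sketch is plain C++ and assumes an ideal pipeline with no stalls or branches; the function and type names are hypothetical, and the "instructions" are just integers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PipelineResult {
    long cycles;                // total cycles until the pipeline drains
    std::vector<int> retired;   // instructions in the order they executed
};

// Hypothetical ideal three-stage pipeline: no stalls, no branches.
PipelineResult runThreeStagePipeline(const std::vector<int>& program) {
    const int kEmpty = -1;
    int fetchLatch = kEmpty;    // index of the instruction waiting for Decode
    int decodeLatch = kEmpty;   // index of the instruction waiting for Execute
    std::size_t pc = 0;
    PipelineResult result;
    result.cycles = 0;

    while (pc < program.size() || fetchLatch != kEmpty || decodeLatch != kEmpty) {
        // Stages are evaluated back to front so that each latch is read
        // before the earlier stage overwrites it in the same cycle.

        // Execute: retire the instruction decoded in the previous cycle.
        if (decodeLatch != kEmpty) {
            result.retired.push_back(program[static_cast<std::size_t>(decodeLatch)]);
        }
        // Decode: take over the instruction fetched in the previous cycle.
        decodeLatch = fetchLatch;
        // Fetch: bring in the next instruction, if any remain.
        if (pc < program.size()) {
            fetchLatch = static_cast<int>(pc);
            ++pc;
        } else {
            fetchLatch = kEmpty;
        }
        ++result.cycles;
    }
    return result;
}
```

With no stalls, N instructions take N + 2 cycles in this model: the first instruction needs three cycles to pass through all stages, and each later one completes one cycle after its predecessor.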

17
Simulator Description (cont)
  • Dinero IV
  • Developed by Mark D. Hill, Univ. of Wisconsin
    Computer Sciences
  • Provides a subroutine interface to a flexible
    simulator of multilevel cache memories.
  • Not a timing or functional simulator, so these
    details will be handled by the memory module.
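One way to picture the split described above is a memory module that asks a cache model only "hit or miss?" and adds the timing itself. The sketch below is plain C++ under stated assumptions: ToyDirectMappedCache is a stand-in written for this example and is not Dinero IV's interface, and MemoryModule, its latency parameters and its method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Stand-in hit/miss model (NOT Dinero IV): a tiny direct-mapped cache.
class ToyDirectMappedCache {
public:
    ToyDirectMappedCache(std::size_t numLines, std::size_t lineBytes)
        : lineBytes_(lineBytes), tags_(numLines, 0), valid_(numLines, false) {
        assert(numLines > 0 && lineBytes > 0);
    }
    // Returns true on a hit; on a miss the line is filled.
    bool access(std::uint64_t address) {
        const std::uint64_t block = address / lineBytes_;
        const std::size_t index = static_cast<std::size_t>(block % tags_.size());
        const bool hit = valid_[index] && tags_[index] == block;
        valid_[index] = true;
        tags_[index] = block;
        return hit;
    }
private:
    std::size_t lineBytes_;
    std::vector<std::uint64_t> tags_;
    std::vector<bool> valid_;
};

// Hypothetical memory module: owns the timing, delegates hit/miss decisions.
class MemoryModule {
public:
    typedef std::function<bool(std::uint64_t)> HitOracle;

    MemoryModule(HitOracle isHit, int hitLatency, int missPenalty)
        : isHit_(std::move(isHit)), hitLatency_(hitLatency), missPenalty_(missPenalty) {}

    // Returns the access latency in cycles and updates the statistics.
    int access(std::uint64_t address) {
        ++accesses_;
        if (isHit_(address)) return hitLatency_;
        ++misses_;
        return hitLatency_ + missPenalty_;
    }
    long accesses() const { return accesses_; }
    long misses() const { return misses_; }
private:
    HitOracle isHit_;
    int hitLatency_;
    int missPenalty_;
    long accesses_ = 0;
    long misses_ = 0;
};
```

Because the oracle is passed in as a callable, the toy cache could in principle be swapped for a wrapper around a real cache simulator without changing the timing code.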

18
HMDES File for Functional Units
  • include "port.hmdes2"
  • CREATE SECTION FU
  • REQUIRED name(STRING)
  • REQUIRED src(LINK(Port))
  • REQUIRED dest(LINK(Port))
  • REQUIRED latency(INT)
  • REQUIRED init_interval(INT)
  • REQUIRED inputType(INT)
  • REQUIRED outputType(INT)

def !integer_32 1
def !integer_64 2
def !float_32 3
def !float_64 4
19
Sample FU description in HMDES
  • SECTION FU
  • Alu1(name("ADDER") src(inport1 inport2)
    dest(outport1) latency(1) init_interval(1)
    inputType(1) outputType(1))
  • Alu2(name("LS") src(inport3 inport4)
    dest(outport2 outport5) latency(2)
    init_interval(2) inputType(1) outputType(1))
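The latency and init_interval fields in the sample above can be read as simple timing rules. The sketch below, in plain C++, assumes that latency is the number of cycles from issue until the result is available and that init_interval is the minimum number of cycles between two issues to the same unit; both interpretations and all names are assumptions made for this example.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Mirrors three of the fields in the FU section (names are hypothetical).
struct FuDescription {
    std::string name;
    int latency;        // cycles from issue to result (assumed meaning)
    int initInterval;   // minimum cycles between issues (assumed meaning)
};

class FunctionalUnit {
public:
    explicit FunctionalUnit(FuDescription d) : desc_(std::move(d)), nextFreeCycle_(0) {}

    // The unit accepts a new operation once the initiation interval has elapsed.
    bool canIssue(long cycle) const { return cycle >= nextFreeCycle_; }

    // Issues an operation and returns the cycle in which its result is ready.
    long issue(long cycle) {
        assert(canIssue(cycle));
        nextFreeCycle_ = cycle + desc_.initInterval;
        return cycle + desc_.latency;
    }
private:
    FuDescription desc_;
    long nextFreeCycle_;
};
```

Under these assumptions the "LS" unit from the sample (latency 2, init_interval 2) is not pipelined: a second operation cannot start until the first one has finished.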

20
Retargetability
  • Parameterized parts
  • Memory hierarchy
  • Number and types of register files
  • Interconnects between register files and FUs
  • Organization of FUs in various clusters
  • Parts that require code changes
  • Adding custom FUs
  • Changing the number of pipeline stages
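Adding a custom FU means writing new code, but a factory can keep that change local to the new class. The following plain C++ sketch shows one possible shape of such a factory; FunctionalUnitBase, Adder, Multiplier, FuFactory and makeDefaultFactory are hypothetical names for this example and are not taken from the simulator.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class for all functional units.
class FunctionalUnitBase {
public:
    virtual ~FunctionalUnitBase() = default;
    virtual int execute(int a, int b) const = 0;
};

class Adder : public FunctionalUnitBase {
public:
    int execute(int a, int b) const override { return a + b; }
};

class Multiplier : public FunctionalUnitBase {
public:
    int execute(int a, int b) const override { return a * b; }
};

// Maps a type name (as it might appear in an architecture description)
// to a function that creates a unit of that type.
class FuFactory {
public:
    typedef std::function<std::unique_ptr<FunctionalUnitBase>()> Creator;

    // Returns false if the name was already registered.
    bool registerType(const std::string& name, Creator creator) {
        return creators_.emplace(name, std::move(creator)).second;
    }

    // Returns a null pointer for unknown type names.
    std::unique_ptr<FunctionalUnitBase> create(const std::string& name) const {
        std::map<std::string, Creator>::const_iterator it = creators_.find(name);
        if (it == creators_.end()) return nullptr;
        return it->second();
    }
private:
    std::map<std::string, Creator> creators_;
};

FuFactory makeDefaultFactory() {
    FuFactory factory;
    factory.registerType("ADDER", [] {
        return std::unique_ptr<FunctionalUnitBase>(new Adder());
    });
    factory.registerType("MULTIPLIER", [] {
        return std::unique_ptr<FunctionalUnitBase>(new Multiplier());
    });
    return factory;
}
```

With this arrangement, a custom FU needs only a new class and one registerType call; the code that instantiates units by name stays unchanged.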

21
Example of use of templates
  • template <class T, int abits, int nRegisters, int
    nReadRegisterPorts, int nWriteRegisterPorts, int
    nReadConditionPorts>
  • class RegisterFile : public sc_module {
  • public:
  • sc_in<bool> inClock;
  • sc_in<sc_int<abits> > inReadRegisterNumber[nReadRegisterPorts];
  • sc_out<T> outReadRegisterData[nReadRegisterPorts];
  • sc_in<sc_int<abits> > inWriteRegisterNumber[nWriteRegisterPorts];
  • sc_in<T> inWriteRegisterData[nWriteRegisterPorts];
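The same template idea can be tried out without SystemC. The sketch below is a plain C++ analogue in which the ports become function arguments and return values; PlainRegisterFile, WriteRequest, clock and peek are hypothetical names, the abits and nReadConditionPorts parameters are left out, and the read-before-write ordering within a cycle is an assumption of this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Plain C++ analogue of a templated register file (no SystemC ports).
template <class T, int nRegisters, int nReadPorts, int nWritePorts>
class PlainRegisterFile {
public:
    struct WriteRequest {
        bool enable;   // whether this write port is active in this cycle
        int reg;       // destination register number
        T data;        // value to write
    };

    // Models one clock edge. Reads return the values held before this
    // cycle's writes are applied (an assumption of this sketch).
    std::array<T, nReadPorts> clock(
            const std::array<int, nReadPorts>& readRegs,
            const std::array<WriteRequest, nWritePorts>& writes) {
        std::array<T, nReadPorts> out{};
        for (int i = 0; i < nReadPorts; ++i) {
            assert(readRegs[i] >= 0 && readRegs[i] < nRegisters);
            out[i] = regs_[static_cast<std::size_t>(readRegs[i])];
        }
        for (int i = 0; i < nWritePorts; ++i) {
            if (!writes[i].enable) continue;
            assert(writes[i].reg >= 0 && writes[i].reg < nRegisters);
            regs_[static_cast<std::size_t>(writes[i].reg)] = writes[i].data;
        }
        return out;
    }

    // Debug access to a register without going through a port.
    T peek(int reg) const {
        assert(reg >= 0 && reg < nRegisters);
        return regs_[static_cast<std::size_t>(reg)];
    }
private:
    std::array<T, nRegisters> regs_{};   // value-initialized (zero for integers)
};
```

Because the port counts are template parameters, a register file with a different number of ports is a different type, and port-count mismatches show up at compile time rather than during simulation.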

22
Instruction Format
24
Work Done
  • Up to Mid Semester Presentation
  • Literature survey of VLIW architecture
  • Studied Philips TM-1000 VLIW DSP processor
  • Studied Texas Instruments C6400 and C6200 VLIW DSP
    processors
  • Decided the design space, laying down the parameters
    for retargetability
  • Revised SystemC to prepare for coding phase
  • Experimented with dynamic library loading in
    SystemC
  • Studied design patterns (Factory Design
    Pattern) and HMDES

25
Work Done (cont)
  • After Mid Semester presentation
  • Experimented with template classes in SystemC
  • Created HMDES description for the architecture
  • Writing top level code for the simulator
    (interface between various modules)
  • Learned to use subroutine interface of Dinero IV
    to model memory hierarchy
  • Decided instruction format for the architecture
  • Writing code
  • Register files, Functional units, Memory and
    Fetch Unit

26
Work to be done
  • Complete the processor by implementing the control
    unit and decode unit
  • Simulate a simple 2 cluster architecture
  • Validate by modeling
  • Philips Trimedia TM-1000 DSP
  • Texas Instruments C6x DSP

30
Acknowledgements
  • Prof. Anshul Kumar
  • Prof. M. Balakrishnan
  • Dr. P.R. Panda
  • Anup Gangwar
  • Basant K. Dwivedi

31
REFERENCES
  • Architectural Design and Analysis of a VLIW
    Processor
  • TriMedia Technologies
  • Introduction to VLIW by Philips
  • Anup's Research Plan
  • TI - C62x data book
  • TI - C64x data book
  • Program Library HOWTO
  • Factory Design Pattern

32
Thanks