Title: Retargetable Cycle Accurate Simulator for Clustered VLIW Architecture in SystemC
1Retargetable Cycle Accurate Simulator for
Clustered VLIW Architecture in SystemC
- Under the guidance of
- Prof. Anshul Kumar
- Prof. M. Balakrishnan
- Dr. Preeti Ranjan Panda
-
- Vipul Jain
- 98431
2Presentation Outline
- Objective
- Motivation
- Design Space
- Simulator Description
- Work Done/to be Done
- Acknowledgements and References
3Objective
- To define a clustered VLIW architecture and implement a retargetable, cycle-accurate simulator for it in SystemC.
- The simulator will be parameterized and flexible to a large extent, so that common parameters can be varied and new components plugged in as needed.
6Motivation
- VLIW ASIPs increasingly include large numbers of functional units (FUs) to meet the high throughput requirements of applications that exhibit high ILP.
- A register file shared by a large number of FUs quickly becomes a dominant cost/performance factor.
- Clustering a smaller number of FUs around local register files may be beneficial, even if data must then be transferred among clusters.
7Clustered VLIW Architecture
9Design Space
- VLIW processors offer customization opportunities because their hardware is simplified. The key customization domains are:
  - Number and types of functional units; operation latencies and pipeline stages are also customizable
  - Register file structure
  - Interconnection network between FUs and register files
  - Instruction encoding for NOP compression
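The customization domains listed above could be gathered into a plain configuration record that a setup step reads. The following is only a sketch under stated assumptions: the names `FuConfig`, `ClusterConfig`, `MachineConfig`, `totalFus` and `makeTwoClusterExample`, and all the numeric values, are hypothetical and do not come from the simulator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one functional unit.
struct FuConfig {
    std::string name;    // e.g. "ADDER"
    int latency;         // cycles from issue to result
    int pipelineStages;  // number of internal pipeline stages
};

// Hypothetical description of one cluster: FUs grouped around a local register file.
struct ClusterConfig {
    int numRegisters;
    int readPorts;
    int writePorts;
    std::vector<FuConfig> fus;
};

// Hypothetical description of the whole machine.
struct MachineConfig {
    std::vector<ClusterConfig> clusters;
    bool nopCompression;  // whether the instruction encoding compresses NOPs
};

// Total number of FUs across all clusters.
inline std::size_t totalFus(const MachineConfig& m) {
    std::size_t n = 0;
    for (const ClusterConfig& c : m.clusters) {
        assert(!c.fus.empty());  // this sketch assumes every cluster holds at least one FU
        n += c.fus.size();
    }
    return n;
}

// Builds an illustrative machine with two identical clusters of two FUs each.
inline MachineConfig makeTwoClusterExample() {
    ClusterConfig c;
    c.numRegisters = 32;
    c.readPorts = 4;
    c.writePorts = 2;
    c.fus = { {"ADDER", 1, 1}, {"LS", 2, 2} };

    MachineConfig m;
    m.clusters = { c, c };
    m.nopCompression = true;
    return m;
}
```

Keeping the parameters in one record like this would let different machine variants be compared by changing data rather than code.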
10Design Space
- Figure: register files (RF1, RF2) connected through an interconnect to functional units (FU1, FU2, FU3, FU4, AFU1, AFU2).
- Customization points shown: number and type of register files, interconnect, number and type of FUs.
12Simulator Description
- The simulator is being written in SystemC, which enables, promotes and accelerates system-level co-design and IP exchange.
- The implementation is object oriented, so extending and modifying it will be easier.
- The simulator runs in two phases:
  - In the setup phase, all the functional units, register files, memories and the interconnection network are initialized.
  - In the execution phase, the simulator executes the given program, and performance statistics such as the number of cycles taken, the memory bandwidth utilized and the number of cache misses can be extracted.
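The two-phase structure can be illustrated without SystemC. This is a minimal sketch, not the simulator's code: `TwoPhaseSimulator`, `LongInstruction` and `Stats` are hypothetical names, and each long instruction is assumed to issue in exactly one cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Statistics gathered during the execution phase (illustrative subset).
struct Stats {
    unsigned long cycles = 0;
    unsigned long memoryAccesses = 0;
};

// One long instruction word, reduced to the single property this sketch needs.
struct LongInstruction {
    bool accessesMemory;
};

class TwoPhaseSimulator {
public:
    // Setup phase: allocate and initialize the machine state.
    void setup(std::size_t numRegisters, std::size_t memoryWords) {
        registers_.assign(numRegisters, 0);
        memory_.assign(memoryWords, 0);
        ready_ = true;
    }

    // Execution phase: step through the program and collect statistics.
    Stats execute(const std::vector<LongInstruction>& program) {
        assert(ready_);  // setup() must have run first
        Stats s;
        for (const LongInstruction& li : program) {
            ++s.cycles;  // assumption: one long instruction issues per cycle
            if (li.accessesMemory) ++s.memoryAccesses;
        }
        return s;
    }

private:
    std::vector<int> registers_;
    std::vector<int> memory_;
    bool ready_ = false;
};
```

Separating the phases means that all structural decisions are made before the first cycle is simulated, so the execution loop only has to advance state and count events.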
13 Main Modules
14Setup Phase of Simulator
15A simple instance of simulated architecture
16Simulator Description
- Simulator
  - Uses HMDES to describe the architecture
  - Uses the Dinero IV cache simulator to simulate the cache hierarchy
  - Processor performance and cache simulation
  - Three-stage pipeline
    - Fetch
    - Decode
    - Execute
  - Generates statistics and an execution trace
- HMDES
  - Developed by UIUC's IMPACT group
  - Specifies resource usage and latency information for an architecture
  - Input is translated to a low-level representation
  - Has efficient mechanisms for querying the database
  - Does not specify instruction format information
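A three-stage fetch/decode/execute pipeline can be modelled as two latches that shift once per cycle. The sketch below is an illustration rather than the simulator's implementation: `runThreeStagePipeline` is a hypothetical name, instructions are stood in for by non-negative integer ids, and no stalls are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts the cycles needed to push `program` through a three-stage
// fetch/decode/execute pipeline, assuming one instruction per cycle and no stalls.
inline unsigned long runThreeStagePipeline(const std::vector<int>& program) {
    const int kBubble = -1;      // marks an empty pipeline latch
    int decodeLatch = kBubble;   // instruction fetched in the previous cycle
    int executeLatch = kBubble;  // instruction decoded in the previous cycle
    std::size_t pc = 0;
    std::size_t retired = 0;
    unsigned long cycles = 0;

    while (retired < program.size()) {
        ++cycles;
        // Stages are evaluated back to front so each latch is read before it is overwritten.
        // Execute: retire whatever was decoded in the previous cycle.
        if (executeLatch != kBubble) ++retired;
        // Decode: hand the previously fetched instruction to the execute latch.
        executeLatch = decodeLatch;
        // Fetch: bring in the next instruction, or a bubble once the program is exhausted.
        if (pc < program.size()) {
            assert(program[pc] >= 0);  // ids must not collide with the bubble marker
            decodeLatch = program[pc++];
        } else {
            decodeLatch = kBubble;
        }
    }
    return cycles;
}
```

With this model a program of N instructions takes N + 2 cycles: two cycles to fill the pipeline, then one instruction retires per cycle.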
17Simulator Description (cont)
- Dinero IV
  - Developed by Mark D. Hill, Univ. of Wisconsin Computer Sciences
  - Provides a subroutine interface for a flexible simulator of multilevel cache memories
  - Is not a timing or functional simulator, so these details will be handled by the memory module
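The division of labour described above, where a cache model only reports hits and misses and the memory module adds timing, might look like the following. This sketch deliberately does not use the Dinero IV interface: `HitMissCache` is a hypothetical direct-mapped stand-in, and `MemoryModule` and its latency values are assumptions for illustration. Functional behaviour (storing data) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the cache simulator: it only answers "hit or miss".
// Hypothetical direct-mapped model, NOT the Dinero IV API.
class HitMissCache {
public:
    HitMissCache(std::size_t numLines, std::size_t lineBytes)
        : tags_(numLines, 0), valid_(numLines, false), lineBytes_(lineBytes) {
        assert(numLines > 0 && lineBytes > 0);
    }

    bool access(std::uint64_t address) {
        const std::uint64_t block = address / lineBytes_;
        const std::size_t index = static_cast<std::size_t>(block % tags_.size());
        const bool hit = valid_[index] && tags_[index] == block;
        tags_[index] = block;  // fill the line on a miss (no change on a hit)
        valid_[index] = true;
        return hit;
    }

private:
    std::vector<std::uint64_t> tags_;
    std::vector<bool> valid_;
    std::size_t lineBytes_;
};

// The memory module layers timing on top of the hit/miss answer.
class MemoryModule {
public:
    MemoryModule(const HitMissCache& cache, unsigned hitCycles, unsigned missCycles)
        : cache_(cache), hitCycles_(hitCycles), missCycles_(missCycles), misses_(0) {}

    // Returns the number of cycles this access costs.
    unsigned access(std::uint64_t address) {
        if (cache_.access(address)) return hitCycles_;
        ++misses_;
        return missCycles_;
    }

    unsigned long misses() const { return misses_; }

private:
    HitMissCache cache_;
    unsigned hitCycles_;
    unsigned missCycles_;
    unsigned long misses_;
};
```

Because the cache model is hidden behind a single `access` call, it could in principle be replaced by a wrapper around a real cache simulator without changing the timing code.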
18HMDES File for Functional Units
- $include "port.hmdes2"
- CREATE SECTION FU
- REQUIRED name(STRING);
- REQUIRED src(LINK(Port));
- REQUIRED dest(LINK(Port));
- REQUIRED latency(INT);
- REQUIRED init_interval(INT);
- REQUIRED inputType(INT);
- REQUIRED outputType(INT);
- { }
- $def !integer_32 1
- $def !integer_64 2
- $def !float_32 3
- $def !float_64 4
19Sample FU description in HMDES
- SECTION FU
- {
- Alu1(name("ADDER") src(inport1 inport2) dest(outport1) latency(1) init_interval(1) inputType(1) outputType(1));
- Alu2(name("LS") src(inport3 inport4) dest(outport2 outport5) latency(2) init_interval(2) inputType(1) outputType(1));
- }
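On the simulator side, each FU entry could be mirrored by a small record plus a timing helper. This is a sketch under assumptions: `FuDescription`, `FuTiming`, `makeAlu1` and `makeAlu2` are hypothetical names, the entries are transcribed by hand rather than read from the HMDES database, and `init_interval` is interpreted in the usual sense of the minimum number of cycles between two successive issues to the same unit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mirrors the fields of one FU entry.
struct FuDescription {
    std::string name;
    std::vector<std::string> src;   // names of input ports
    std::vector<std::string> dest;  // names of output ports
    int latency;                    // cycles from issue to result
    int initInterval;               // assumed: minimum cycles between successive issues
    int inputType;
    int outputType;
};

// Tracks when a unit can next accept an operation.
class FuTiming {
public:
    explicit FuTiming(const FuDescription& d) : desc_(d), nextFreeCycle_(0) {
        assert(d.latency > 0 && d.initInterval > 0);
    }

    bool canIssue(long cycle) const { return cycle >= nextFreeCycle_; }

    // Issues an operation at `cycle` and returns the cycle in which its result is available.
    long issue(long cycle) {
        assert(canIssue(cycle));
        nextFreeCycle_ = cycle + desc_.initInterval;
        return cycle + desc_.latency;
    }

private:
    FuDescription desc_;
    long nextFreeCycle_;
};

// The two sample units, transcribed by hand for illustration.
inline FuDescription makeAlu1() {
    return {"ADDER", {"inport1", "inport2"}, {"outport1"}, 1, 1, 1, 1};
}
inline FuDescription makeAlu2() {
    return {"LS", {"inport3", "inport4"}, {"outport2", "outport5"}, 2, 2, 1, 1};
}
```

Under this interpretation the `LS` unit, with latency 2 and initiation interval 2, is not pipelined: a second operation cannot start until the first has finished.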
20Retargetability
- Parameterized parts
  - Memory hierarchy
  - Number and types of register files
  - Interconnects between register files and FUs
  - Organization of FUs in various clusters
- Parts that require code changes
  - Adding custom FUs
  - Changing the number of pipeline stages
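One way to keep the code change for a custom FU small is the Factory design pattern, which this presentation lists among the topics studied. The sketch below shows the general idea only and is not taken from the simulator: `FunctionalUnit`, `Adder`, `Multiplier`, `FuFactory` and `makeDefaultFactory` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Common interface that every functional unit implements.
class FunctionalUnit {
public:
    virtual ~FunctionalUnit() {}
    virtual int execute(int a, int b) const = 0;
};

class Adder : public FunctionalUnit {
public:
    int execute(int a, int b) const override { return a + b; }
};

// A "custom" unit: adding it needs only this class and one registration call.
class Multiplier : public FunctionalUnit {
public:
    int execute(int a, int b) const override { return a * b; }
};

// Maps a type name (for example, one read from the architecture description)
// to a function that constructs the matching unit.
class FuFactory {
public:
    typedef std::function<std::unique_ptr<FunctionalUnit>()> Creator;

    void registerType(const std::string& name, Creator creator) {
        assert(static_cast<bool>(creator));  // the creator must be callable
        creators_[name] = creator;
    }

    // Returns a new unit, or a null pointer if the name is unknown.
    std::unique_ptr<FunctionalUnit> create(const std::string& name) const {
        std::map<std::string, Creator>::const_iterator it = creators_.find(name);
        if (it == creators_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};

inline FuFactory makeDefaultFactory() {
    FuFactory f;
    f.registerType("ADDER", [] { return std::unique_ptr<FunctionalUnit>(new Adder()); });
    f.registerType("MUL", [] { return std::unique_ptr<FunctionalUnit>(new Multiplier()); });
    return f;
}
```

With such a factory, the setup code that instantiates units by name stays unchanged when a new unit type is added.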
21Example of use of templates
- template <class T, int abits, int nRegisters, int nReadRegisterPorts, int nWriteRegisterPorts, int nReadConditionPorts>
- class RegisterFile : public sc_module
- {
- public:
- sc_in<bool> inClock;
- sc_in<sc_int<abits> > inReadRegisterNumber[nReadRegisterPorts];
- sc_out<T> outReadRegisterData[nReadRegisterPorts];
- sc_in<sc_int<abits> > inWriteRegisterNumber[nWriteRegisterPorts];
- sc_in<T> inWriteRegisterData[nWriteRegisterPorts];
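The same idea of fixing the register count and port counts at compile time can be tried out without SystemC. The following is a plain C++ sketch, not the simulator's `RegisterFile` module: `PlainRegisterFile` and its member functions are hypothetical, and ports are reduced to index checks instead of `sc_in`/`sc_out` signals.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Register file whose size and port counts are template parameters,
// so each instantiation is a distinct, fixed-size type.
template <class T, int nRegisters, int nReadPorts, int nWritePorts>
class PlainRegisterFile {
public:
    PlainRegisterFile() : regs_() {}  // value-initializes every register

    // Reads register `reg` through read port `port`.
    T read(int port, int reg) const {
        assert(port >= 0 && port < nReadPorts);
        assert(reg >= 0 && reg < nRegisters);
        return regs_[static_cast<std::size_t>(reg)];
    }

    // Writes `value` to register `reg` through write port `port`.
    void write(int port, int reg, const T& value) {
        assert(port >= 0 && port < nWritePorts);
        assert(reg >= 0 && reg < nRegisters);
        regs_[static_cast<std::size_t>(reg)] = value;
    }

    static constexpr int readPorts() { return nReadPorts; }
    static constexpr int writePorts() { return nWritePorts; }

private:
    std::array<T, nRegisters> regs_;
};
```

A cluster with 32 integer registers, four read ports and two write ports would then be declared as `PlainRegisterFile<int, 32, 4, 2>`, and a differently shaped register file is just another instantiation.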
22Instruction Format
24Work Done
- Up to the mid-semester presentation
  - Literature survey of VLIW architectures
  - Studied the Philips TM-1000 VLIW DSP processor
  - Studied the Texas Instruments C6400 and C6200 VLIW DSP processors
  - Decided the design space, laying down the parameters for retargetability
  - Reviewed SystemC to prepare for the coding phase
  - Experimented with dynamic library loading in SystemC
  - Studied design patterns (Factory design pattern) and HMDES
25Work Done (cont)
- After the mid-semester presentation
  - Experimented with template classes in SystemC
  - Created the HMDES description for the architecture
  - Writing top-level code for the simulator (the interface between the various modules)
  - Learned to use the subroutine interface of Dinero IV to model the memory hierarchy
  - Decided the instruction format for the architecture
  - Writing code for the register files, functional units, memory and fetch unit
26Work to be done
- Complete the processor by finishing the control unit and the decode unit
- Simulate a simple 2-cluster architecture
- Validate by modeling
  - Philips TriMedia TM-1000 DSP
  - Texas Instruments C6x DSP
30Acknowledgements
- Prof. Anshul Kumar
- Prof. M. Balakrishnan
- Dr. P.R. Panda
- Anup Gangwar
- Basant K. Dwivedi
31REFERENCES
- Architectural Design and Analysis of a VLIW
Processor - TriMedia Technologies
- Introduction to VLIW by Philips
- Anup's Research Plan
- TI - C62x data book
- TI - C64x data book
- Program Library HOWTO
- Factory Design Pattern
32Thanks