Cycle Accurate Parameterized Simulator for Clustered VLIW ASIP in SystemC

1
Cycle Accurate Parameterized Simulator for
Clustered VLIW ASIP in SystemC

Vipul Jain

March 31, 2003
2
Motivation

VLIW ASIPs include an increasing number of
functional units to meet the high-throughput
requirements of applications exhibiting high ILP.
The degree of parallelism is limited by the register
file's ability to supply operands to the functional
units.
Clustered VLIW overcomes this problem by
clustering the functional units such that each
cluster can read/write only a subset of the
registers.

3
Need for a retargetable simulator

Role of Architecture Customization
Higher performance
Lower area
Lower power
Need for a tool to verify the performance metrics
of a given application on a given architecture.

4
Objective

To define a retargetable clustered VLIW
architecture and implement a cycle-accurate
simulator for it in SystemC.
Validate the simulator by modeling
Texas Instruments TI-C6400 DSP
TriMedia TM-1000 DSP

5
Clustered VLIW Architecture
6
Customization Opportunities

System Level
Processor Level

7
Customization Opportunities > System Level

Compute units
Number and types: ASICs, VLIW or RISC ASIPs, etc.
Interconnection network
Custom interconnection based on the communication
pattern of the application
Memory architecture
Types: synchronous/asynchronous,
pipelined/non-pipelined, etc.
Various transfer modes
Custom memories: FIFOs, LIFOs, frame buffers, etc.

8
Customization Opportunities > System Level
9
Customization Opportunities > Processor Level

Functional Units
MISO, MIMO, MIMO with LD/ST
Rigid or flexible I/O timeshapes
Register File Clustering
If many FUs are connected to the same register
file, the delay and cost of the register file
become the bottleneck.
Each FU can read from and write to only a subset
of the registers.
Interconnects
Between different clusters, and between clusters
and memory
Instruction Encoding and Decoding
Reduce or remove explicit NOPs in code
Affects code size, object-code compatibility,
branch misprediction penalty, hardware cost, and
address specification in code

10
Customization Opportunities > Processor Level
Figure: register files RF1 and RF2 and functional units FU1 to FU4, AFU1 and AFU2, annotated with "No. and Type of Regfile Customization", "Interconnect Customization" and "No. and Type of FU Customization".
11
Simulator Description

The simulator is being written in SystemC.
The simulator runs in two phases.
In the setup phase, all functional units,
register files, memories and the interconnection
network are initialized by reading the HMDES
description of the architecture.
In the execution phase, the simulator executes the
given program, and performance statistics such as
the number of cycles taken, memory bandwidth
utilized and number of cache misses can be extracted.
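The two phases can be sketched in plain C++. This is a minimal illustration under stated assumptions, not the simulator's code: SystemC modules and clocks are omitted so the sketch stays self-contained, `ArchDescription` is a hypothetical stand-in for a parsed HMDES description, `Simulator`, `setup` and `run` are illustrative names, and each instruction bundle is reduced to a count of operations.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a parsed HMDES architecture description.
struct ArchDescription {
    int numClusters = 2;
    int fusPerCluster = 4;
    int registersPerFile = 32;
};

// Statistics gathered during the execution phase.
struct Stats {
    std::uint64_t cycles = 0;
    std::uint64_t instructions = 0;
    std::uint64_t cacheMisses = 0;  // left at zero: no memory model in this sketch
};

class Simulator {
public:
    // Setup phase: size every component from the architecture description.
    void setup(const ArchDescription& arch) {
        registerFiles_.assign(arch.numClusters,
                              std::vector<std::int32_t>(arch.registersPerFile, 0));
        fuCount_ = arch.numClusters * arch.fusPerCluster;
    }

    // Execution phase: one bundle per cycle in this simplified model.
    // Each bundle is the number of operations it issues.
    Stats run(const std::vector<int>& bundles) {
        Stats s;
        for (int ops : bundles) {
            assert(ops <= fuCount_ && "bundle wider than the machine");
            s.instructions += static_cast<std::uint64_t>(ops);
            ++s.cycles;
        }
        return s;
    }

    int fuCount() const { return fuCount_; }

private:
    std::vector<std::vector<std::int32_t>> registerFiles_;
    int fuCount_ = 0;
};
```

Keeping setup separate from execution means the same `run` loop can serve any architecture the description can express, which is what makes the simulator retargetable.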

12
Simulator Description (cont)

Predicated instruction execution
Compiler-visible interconnection network
Multiple FUs may be connected by the same set of
ports to the decode unit and register files
The pipeline stalls on a cache miss or a cross-path
register access.
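A minimal sketch of how predicated execution and the two stall sources listed above might combine within one bundle. The `Op` fields, `BundleResult` and `issueBundle` are hypothetical names. Two simplifications are assumptions: the whole bundle stalls together, and a squashed operation (predicate false) neither reads operands nor accesses memory. The penalty values are passed in as parameters.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-operation record produced by the decode stage.
struct Op {
    bool predicate = true;       // value of the guarding predicate
    bool crossPathRead = false;  // reads a register from another cluster
    bool cacheMiss = false;      // memory operation whose access missed in L1
};

struct BundleResult {
    int stallCycles = 0;  // cycles the bundle is held back
    int committed = 0;    // operations whose predicate was true
};

BundleResult issueBundle(const std::vector<Op>& bundle,
                         int crossPathPenalty, int missPenalty) {
    BundleResult r;
    bool anyCross = false;
    bool anyMiss = false;
    for (const Op& op : bundle) {
        if (!op.predicate) continue;  // squashed: behaves like a NOP (assumed)
        ++r.committed;
        anyCross = anyCross || op.crossPathRead;
        anyMiss = anyMiss || op.cacheMiss;
    }
    // In this sketch the bundle stalls as a unit, so each penalty is paid once.
    if (anyCross) r.stallCycles += crossPathPenalty;
    if (anyMiss) r.stallCycles += missPenalty;
    return r;
}
```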

13
Simulator Description (cont)

Simulator
Uses HMDES to describe the architecture
Uses the Dinero IV cache simulator to simulate
the cache hierarchy
Actually runs the input program
Generates statistics and an execution trace
Dinero IV
The cache hierarchy is modelled using the Dinero IV
cache simulator
Developed by Mark D. Hill, Univ. of Wisconsin
Computer Sciences
Provides a subroutine interface for a flexible
simulator of multilevel cache memories.
It is not a timing or functional simulator, so
these details are handled by the memory module.
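Because Dinero IV reports only hits and misses, the memory module has to attach latencies itself. In the sketch below, `CacheOracle` is a hypothetical stand-in for the Dinero IV subroutine interface, whose real API is not reproduced here. A toy direct-mapped cache fills that role, and `MemoryModule` turns each hit or miss verdict into a cycle count. All names and the latency values are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical interface: reports hit or miss for an address and nothing else,
// mirroring the non-timing role Dinero IV plays in the simulator.
class CacheOracle {
public:
    virtual ~CacheOracle() = default;
    virtual bool access(std::uint32_t addr) = 0;  // true on hit
};

// Toy direct-mapped cache used in place of Dinero IV for this sketch.
class DirectMapped : public CacheOracle {
public:
    DirectMapped(std::uint32_t lines, std::uint32_t lineBytes)
        : tags_(lines, -1), lineBytes_(lineBytes) {}
    bool access(std::uint32_t addr) override {
        std::uint32_t block = addr / lineBytes_;
        std::uint32_t idx = block % static_cast<std::uint32_t>(tags_.size());
        if (tags_[idx] == static_cast<std::int64_t>(block)) return true;
        tags_[idx] = block;  // allocate on miss
        return false;
    }
private:
    std::vector<std::int64_t> tags_;  // -1 marks an empty line
    std::uint32_t lineBytes_;
};

// Memory module: supplies the timing that the cache oracle does not.
class MemoryModule {
public:
    MemoryModule(CacheOracle& cache, int hitLatency, int missPenalty)
        : cache_(cache), hit_(hitLatency), miss_(missPenalty) {}
    int load(std::uint32_t addr) {  // returns cycles taken
        return cache_.access(addr) ? hit_ : hit_ + miss_;
    }
private:
    CacheOracle& cache_;
    int hit_, miss_;
};
```

Swapping `DirectMapped` for a wrapper around the real cache simulator would leave `MemoryModule` untouched, which is the separation of concerns the slide describes.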

14
Retargetability

Parameterized parts
Memory hierarchy
Number and types of register files
Interconnects between register files and FUs
Organization of FUs into clusters
Requires code changes
Adding custom FUs
Changing the number of pipeline stages

15
The Typical VLIW Pipeline
Instruction Fetch
Instruction Decode: Align, Decompress, Decode
DF/AG
Execute
Store Results
16
Pipeline stages in simulator
Instruction Fetch
Instruction Decode: Align, Decompress, Decode
DF/AG
Execute
Store Results
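Cycle accuracy comes from advancing every stage exactly once per clock. As a rough sketch in plain C++ rather than SystemC processes, the class below keeps one slot per stage in the order Fetch, Align, Decompress, Decode, DF/AG, Execute, Store Results, and shifts bundles forward each cycle. The names are hypothetical. The rule that a stall freezes the whole pipeline is a simplifying assumption.

```cpp
#include <array>
#include <cassert>

constexpr int kStages = 7;   // Fetch, Align, Decompress, Decode, DF/AG, Execute, Store
constexpr int kBubble = -1;  // empty pipeline slot

class Pipeline {
public:
    Pipeline() { slots_.fill(kBubble); }

    // Advance one clock. `next` is the bundle entering Fetch (or kBubble).
    // Returns the bundle leaving Store Results this cycle, or kBubble.
    // On a stalled cycle nothing moves and `next` is not accepted, so the
    // caller must present it again.
    int tick(int next, bool stall) {
        ++cycle_;
        if (stall) return kBubble;  // assumption: a stall freezes every stage
        int retired = slots_[kStages - 1];
        for (int s = kStages - 1; s > 0; --s) slots_[s] = slots_[s - 1];
        slots_[0] = next;
        return retired;
    }

    long cycles() const { return cycle_; }

private:
    std::array<int, kStages> slots_{};
    long cycle_ = 0;
};
```

With no stalls, a bundle enters Fetch on cycle 1, reaches Store Results on cycle 7 and is reported as retired on cycle 8. Each stall cycle pushes that back by one.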
17
Class Hierarchy
18
Overview of Simulator
19
Register Files

Retargetable parameters
Type of registers in the register file
Latency (in cycles)
Number of read ports for
Normal registers
Predicate registers
Number of write ports
Predicates can come from either
Separate predicate registers, or
The least significant bit of a normal register
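A minimal sketch of a register file built from the parameters above. The struct and method names are hypothetical. Port limits are enforced per cycle, and the predicate comes either from a dedicated predicate register or from bit 0 of a normal register, depending on `separatePredicates`. Reporting port over-subscription as an exception is an assumption made for the sketch. A real model might stall instead, or rely on the compiler never generating such code.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical parameter set mirroring the slide's list.
struct RegFileParams {
    int numRegs = 16;
    int latency = 1;                  // cycles (not modelled further here)
    int readPorts = 2;
    int predReadPorts = 1;
    int writePorts = 1;
    bool separatePredicates = false;  // false: use the LSB of a normal register
    int numPredRegs = 0;
};

class RegisterFile {
public:
    explicit RegisterFile(const RegFileParams& p)
        : p_(p), regs_(p.numRegs, 0), preds_(p.numPredRegs, false) {}

    // Reset the per-cycle port usage counters.
    void newCycle() { reads_ = predReads_ = writes_ = 0; }

    std::int32_t read(int r) {
        if (++reads_ > p_.readPorts) throw std::runtime_error("read ports exhausted");
        return regs_.at(r);
    }

    void write(int r, std::int32_t v) {
        if (++writes_ > p_.writePorts) throw std::runtime_error("write ports exhausted");
        regs_.at(r) = v;
    }

    // Predicate read: a dedicated predicate register, or bit 0 of a normal one.
    bool readPredicate(int r) {
        if (++predReads_ > p_.predReadPorts)
            throw std::runtime_error("predicate read ports exhausted");
        if (p_.separatePredicates) return static_cast<bool>(preds_.at(r));
        return (regs_.at(r) & 1) != 0;
    }

private:
    RegFileParams p_;
    std::vector<std::int32_t> regs_;
    std::vector<bool> preds_;
    int reads_ = 0, predReads_ = 0, writes_ = 0;
};
```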

20
L1 Cache (Data and Instruction)

Uses the Dinero IV cache simulator
Retargetable parameters
Size
Total size
Line size
Associativity
Read/write policies
Latency
Bus width to memory/functional units
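The three size parameters fix the cache geometry. The number of sets is total size / (line size × associativity). The low log2(line size) address bits select a byte within a line, the next log2(sets) bits select the set, and the remaining bits form the tag. Dinero IV does this bookkeeping internally. The helper below only makes the arithmetic explicit, assuming power-of-two sizes and 32-bit addresses. Its names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct CacheGeometry {
    std::uint32_t sets;
    unsigned offsetBits;
    unsigned indexBits;
    unsigned tagBits;
};

// Integer log2, valid for powers of two.
constexpr unsigned log2u(std::uint32_t x) {
    unsigned n = 0;
    while (x > 1) { x >>= 1; ++n; }
    return n;
}

CacheGeometry geometry(std::uint32_t totalBytes, std::uint32_t lineBytes,
                       std::uint32_t ways, unsigned addrBits = 32) {
    CacheGeometry g{};
    g.sets = totalBytes / (lineBytes * ways);
    g.offsetBits = log2u(lineBytes);
    g.indexBits = log2u(g.sets);
    g.tagBits = addrBits - g.indexBits - g.offsetBits;
    return g;
}

// Set that an address maps to.
std::uint32_t setIndex(std::uint32_t addr, const CacheGeometry& g) {
    return (addr >> g.offsetBits) & (g.sets - 1);
}
```

For a 16 KB, 2-way cache with 64-byte lines this gives 128 sets, 6 offset bits, 7 index bits and a 19-bit tag, and address 0x1234 falls in set 72.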

21
Main Memory

Retargetable parameters
Latency
Delay
Burst size for memory accesses
Start address of data (so that multiple memory
modules may be added).
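Giving each module a start address lets a request be routed by address range, and the burst size determines how many transfers a block access needs. This is a sketch with hypothetical names. Its timing formula is only an assumed interpretation of the slide's Latency and Delay parameters: latency until the first burst, plus delay for each further burst.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-module parameters.
struct MemModule {
    std::uint32_t start;       // first address served by this module
    std::uint32_t size;        // bytes
    int latency;               // cycles until the first burst arrives (assumed)
    int delay;                 // cycles between consecutive bursts (assumed)
    std::uint32_t burstBytes;  // bytes transferred per burst
};

// Index of the module whose [start, start + size) range holds addr, or -1.
int route(const std::vector<MemModule>& mods, std::uint32_t addr) {
    for (int i = 0; i < static_cast<int>(mods.size()); ++i) {
        const MemModule& m = mods[i];
        if (addr >= m.start && addr - m.start < m.size) return i;
    }
    return -1;
}

// Cycles to transfer `bytes` from module m under the assumed timing model.
int accessCycles(const MemModule& m, std::uint32_t bytes) {
    std::uint32_t bursts = (bytes + m.burstBytes - 1) / m.burstBytes;  // ceiling
    if (bursts == 0) return 0;
    return m.latency + static_cast<int>(bursts - 1) * m.delay;
}
```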

22
Bus between Cache/Memory

Retargetable parameters
Bus width
Pluggable arbiter
Static priorities are used for now.
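"Pluggable" suggests that the bus depends only on an arbitration interface, so policies can be swapped without touching the bus. A sketch of that idea with the static-priority policy the slide mentions. `Arbiter`, `StaticPriorityArbiter`, `Bus` and `beats` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Arbitration policy: given which masters request the bus this cycle,
// return the one granted, or -1 if nobody requests.
class Arbiter {
public:
    virtual ~Arbiter() = default;
    virtual int grant(const std::vector<bool>& requests) = 0;
};

// Static priorities: masters listed earlier in `order` win.
class StaticPriorityArbiter : public Arbiter {
public:
    explicit StaticPriorityArbiter(std::vector<int> order) : order_(std::move(order)) {}
    int grant(const std::vector<bool>& requests) override {
        for (int master : order_)
            if (requests.at(master)) return master;
        return -1;
    }
private:
    std::vector<int> order_;
};

class Bus {
public:
    Bus(int widthBytes, std::unique_ptr<Arbiter> arb)
        : width_(widthBytes), arb_(std::move(arb)) {}
    int arbitrate(const std::vector<bool>& requests) { return arb_->grant(requests); }
    // Bus transfers needed to move `bytes` at this width.
    int beats(int bytes) const { return (bytes + width_ - 1) / width_; }
private:
    int width_;
    std::unique_ptr<Arbiter> arb_;
};
```

A round-robin or TDMA policy would just be another `Arbiter` subclass handed to the same `Bus` constructor.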

23
Functional Units

Retargetable parameters
Input/output data types
Instructions executed
Latency and initiation interval
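Latency is the number of cycles before a result is available. The initiation interval (II) is the minimum number of cycles between two issues to the same unit, so a fully pipelined unit has II = 1. A sketch of a unit timed by these two parameters. The class and method names are hypothetical, and the sketch assumes `complete()` is called every cycle.

```cpp
#include <cassert>
#include <deque>
#include <utility>

class FunctionalUnit {
public:
    FunctionalUnit(int latency, int initiationInterval)
        : latency_(latency), ii_(initiationInterval) {}

    // Try to start an operation at `cycle`; fails if the II has not elapsed.
    bool issue(long cycle, int resultTag) {
        if (cycle < nextIssue_) return false;
        nextIssue_ = cycle + ii_;
        inFlight_.push_back({cycle + latency_, resultTag});
        return true;
    }

    // Tag of the result completing at `cycle`, or -1. With a fixed latency,
    // results complete in issue order, so only the front needs checking.
    int complete(long cycle) {
        if (!inFlight_.empty() && inFlight_.front().first == cycle) {
            int tag = inFlight_.front().second;
            inFlight_.pop_front();
            return tag;
        }
        return -1;
    }

private:
    int latency_, ii_;
    long nextIssue_ = 0;
    std::deque<std::pair<long, int>> inFlight_;  // (ready cycle, tag)
};
```

For example, a multiplier with latency 3 and II 2 accepts operations on cycles 0 and 2 but not 1, and produces their results on cycles 3 and 5.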

24
Fetch Unit

Retargetable parameters
Number of instructions fetched per cycle
Latency
Supports instruction prefetch. Prefetch requests
are not queued.
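"Prefetch requests are not queued" could mean that the unit holds at most one outstanding prefetch and drops any request that arrives while it is busy. The sketch below takes that reading, which is an assumption, along with a fixed 4-byte instruction size. `FetchUnit` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

class FetchUnit {
public:
    FetchUnit(int width, int latency) : width_(width), latency_(latency) {}

    // Fetch `width` instructions; returns the first address fetched.
    std::uint32_t fetch() {
        std::uint32_t at = pc_;
        pc_ += static_cast<std::uint32_t>(width_) * kInstrBytes;
        return at;
    }

    // Start a prefetch unless one is already pending; requests are dropped,
    // not queued (assumed policy).
    bool requestPrefetch(std::uint32_t addr, long cycle) {
        if (pending_) return false;
        pending_ = true;
        prefetchAddr_ = addr;
        readyCycle_ = cycle + latency_;
        return true;
    }

    // True once the pending prefetch has completed by `cycle`.
    bool prefetchDone(long cycle) {
        if (pending_ && cycle >= readyCycle_) { pending_ = false; return true; }
        return false;
    }

    std::uint32_t prefetchAddress() const { return prefetchAddr_; }

    static constexpr std::uint32_t kInstrBytes = 4;  // assumed 32-bit operations

private:
    int width_, latency_;
    std::uint32_t pc_ = 0;
    bool pending_ = false;
    std::uint32_t prefetchAddr_ = 0;
    long readyCycle_ = 0;
};
```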

25
Decode Unit

Retargetable parameters
Cross-path delay
Latency
Uses a pluggable function for instruction decoding.
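One way to make the decode function pluggable is to store any callable that maps a raw instruction word to a decoded record. The sketch below does this with `std::function` and charges the cross-path delay when an operand lives in a different cluster from the executing FU. `DecodedOp`, `DecodeUnit` and the toy bit layout in `toyDecode` are invented for illustration and do not describe any real encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical decoded form of one operation.
struct DecodedOp {
    int opcode;
    int srcReg;
    int srcCluster;  // cluster whose register file holds the operand
    int fuCluster;   // cluster of the FU that executes the operation
};

using DecodeFn = std::function<DecodedOp(std::uint32_t)>;

class DecodeUnit {
public:
    DecodeUnit(DecodeFn fn, int latency, int crossPathDelay)
        : decode_(std::move(fn)), latency_(latency), xpath_(crossPathDelay) {}

    // Decodes `word` into `out`; returns cycles spent, including the
    // cross-path delay when the operand is in another cluster.
    int run(std::uint32_t word, DecodedOp& out) const {
        out = decode_(word);
        return latency_ + (out.srcCluster != out.fuCluster ? xpath_ : 0);
    }

private:
    DecodeFn decode_;
    int latency_, xpath_;
};

// Toy plug-in: bits [31:24] opcode, [7:4] source register,
// bit 1 source cluster, bit 0 FU cluster.
DecodedOp toyDecode(std::uint32_t w) {
    return DecodedOp{static_cast<int>(w >> 24), static_cast<int>((w >> 4) & 0xF),
                     static_cast<int>((w >> 1) & 1), static_cast<int>(w & 1)};
}
```

Retargeting to a different instruction format would then mean supplying a different decode function rather than changing `DecodeUnit`.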

26
Interconnects

Two types
One-to-one: connect directly using a signal.
Many-to-one: instantiate a multiplexer.
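The two connection types can be illustrated with a minimal signal and multiplexer in plain C++. `Signal` here is a hypothetical stand-in for a SystemC signal and does not model its delta-cycle update semantics. `Mux` forwards whichever input is currently selected.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// One-to-one: a plain signal written by one unit and read by another.
template <typename T>
struct Signal {
    T value{};
    void write(const T& v) { value = v; }
    T read() const { return value; }
};

// Many-to-one: a multiplexer forwarding exactly one of its inputs.
template <typename T>
class Mux {
public:
    explicit Mux(std::vector<const Signal<T>*> inputs) : in_(std::move(inputs)) {}
    void select(std::size_t s) {
        if (s >= in_.size()) throw std::out_of_range("mux select");
        sel_ = s;
    }
    T read() const { return in_[sel_]->read(); }
private:
    std::vector<const Signal<T>*> in_;
    std::size_t sel_ = 0;
};
```

In the simulator the select input would presumably be driven by the decoded instruction, since the interconnection network is compiler-visible.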

27
Example of using HMDES architecture description

// beh is 0 read, 1 write, 2 read/write
CREATE SECTION Port
REQUIRED beh(INT)
REQUIRED type(LINK(DataType))
CREATE SECTION DirectConnect
REQUIRED input(LINK(Port))
REQUIRED output(LINK(Port))
REQUIRED type(LINK(DataType))

28
Example of using HMDES architecture description (cont)
SECTION Port
PortA( beh(write_port) type(Type1) )
PortB( beh(read_port) type(Type1) )
SECTION DirectConnect
IC_1( input(PortA) output(PortB) type(Type1) )
Figure: Unit 1 (Port A) connected through IC_1 to Unit 2 (Port B).
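The HMDES records only describe connections, so the setup phase still has to turn them into checked links. Below is a sketch of one plausible check, assuming the Port and DirectConnect records have already been parsed. The parser is not shown, and the struct names and `validConnect` are hypothetical. The check requires that a connection runs from a port that is written to a port that is read, and that all three data types agree. The behaviour codes follow the comment in the HMDES example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Port behaviour codes, following the HMDES example's comment.
enum class Beh { Read = 0, Write = 1, ReadWrite = 2 };

struct PortDesc { Beh beh; std::string type; };
struct DirectConnectDesc { std::string input, output, type; };

bool validConnect(const std::map<std::string, PortDesc>& ports,
                  const DirectConnectDesc& c) {
    auto in = ports.find(c.input);
    auto out = ports.find(c.output);
    if (in == ports.end() || out == ports.end()) return false;  // unknown port
    bool drives = in->second.beh != Beh::Read;      // input end must be written
    bool receives = out->second.beh != Beh::Write;  // output end must be read
    return drives && receives && in->second.type == c.type &&
           out->second.type == c.type;
}
```

Applied to the example, `IC_1` passes, while the same connection with its ends swapped would be rejected.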
29
A simple instance of simulated architecture
32
REFERENCES

Philips, Introduction to VLIW
Arthur Abnous and Nader Bagherzadeh, Architectural
Design and Analysis of a VLIW Processor
TriMedia Technologies
Anup's Research Plan
Dinero IV Cache Simulator manual
TI C62x data book
TI C64x data book
TI C6x instruction set
HMDES 2.0 specification

33
Thanks
