Exploring VLIW ASIP Design Space using Trimaran Framework presentation

1
Exploring VLIW ASIP Design Space using Trimaran
Framework

Under the guidance of
Prof. Anshul Kumar
V Srinivasa Reddy
(2004MCS2453)

2
Introduction

Application-Specific Instruction-set Processors (ASIPs)
Give better performance than General-Purpose Processors (GPPs)
More flexible than ASICs
Customize the processor for a set of applications through
Instruction set extension
Processor specialisation

3
Coarse-Grained AFUs

Chaining of operations reduces the computation time.
Wiring logic reduces register pressure by bypassing values from one FU to another FU within the AFU.
This reduces spill code for VLIW processors.
If the operands of an operation have limited resolution, the FU can be made faster.
Parallelism can be optimized within the AFU.
Comparison operations can be performed simultaneously inside the AFU, which eliminates the branching delay.

4
VLIW Processors

VLIW processors
Better ILP for numeric programs
Static Scheduling
ILP is limited by the branch statements

5
Handling branch statements in VLIW processors

Pipeline strategies
branch always taken
branch always not taken
Predicated execution
Needs processor support
Control dependency is converted into data
dependency

6
Pipeline Strategy

Consider a 5 stage pipeline processor
Processor with branch always not taken strategy

Condition true: 1 cycle wasted
Condition false: 3 cycles wasted

7
Predicated Execution

Present-day VLIW architectures support predicated
instructions
HPL-PD architecture
Each operation has an extra one-bit operand called
the predicate register
e.g., r2 = ADD.W r1, r3 if p
If the predicate register contains 0, the operation is
not performed
The values of predicate registers are typically
set by compare-to-predicate operations
e.g., p = CMPP.< r4, r5
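
The semantics described on this slide can be sketched in C. This is only a software model under stated assumptions, not HPL-PD's actual definition: the function names `cmpp_lt` and `add_w_pred` are hypothetical, and a nullified operation is modelled as leaving the destination unchanged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a compare-to-predicate operation: p = CMPP.< a, b */
bool cmpp_lt(int32_t a, int32_t b) {
    return a < b;
}

/* Hypothetical model of a predicated add: dest = ADD.W a, b if p.
   When p is false the operation is nullified, so the old value of
   dest is returned unchanged. */
int32_t add_w_pred(int32_t dest, int32_t a, int32_t b, bool p) {
    return p ? a + b : dest;
}
```

Calling `add_w_pred(r2, r1, r3, cmpp_lt(r4, r5))` mirrors the two example operations: the add takes effect only when `r4 < r5`.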

8
Example

Advantage
If-conversion
Control dependency height reduction
Disadvantage
Issue slots are wasted by predicated instructions whose predicate is false

9
ASIP methodology
if (a < b) c = a - b; else c = a + b;

Advantage
Comparison can be done simultaneously
Stores the intermediate results in wires
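
A minimal C sketch of what such an AFU computes, assuming the slide's example reads `if (a < b) c = a - b; else c = a + b;` (the operator in the else arm is partly lost in the transcript, so the addition is an assumption). The function name `afu_sub_or_add` is hypothetical. Both arms and the comparison are evaluated side by side, and the comparison only steers the final selection, which is how the hardware avoids a branch.

```c
#include <stdint.h>

/* Software model of a hypothetical AFU for:
   if (a < b) c = a - b; else c = a + b;  (else arm assumed) */
int32_t afu_sub_or_add(int32_t a, int32_t b) {
    int32_t diff = a - b;    /* "then" arm */
    int32_t sum = a + b;     /* "else" arm */
    int lt = a < b;          /* comparison, concurrent in hardware */
    return lt ? diff : sum;  /* multiplexer fed by internal wires */
}
```

In hardware `diff`, `sum` and `lt` would live on wires inside the AFU rather than in architectural registers, which is the register-pressure benefit the slide mentions.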

10
How to handle variable latency AFUs in VLIW ?

Schedule the instructions assuming the minimum-latency
path of the AFU.
If the AFU takes the longest path at run time, stall the
pipeline until the result is available.
This is similar to the pipeline branch-handling strategy.
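
The cost of this policy can be estimated with a small model. The sketch below assumes each long-path execution stalls the pipeline for exactly the difference between the maximum and minimum latency; the function name `afu_cycles` and its parameters are illustrative, not taken from the presentation.

```c
/* Cycles spent in AFU operations when the compiler schedules for
   min_lat and the pipeline stalls (max_lat - min_lat) cycles on every
   execution that takes the long path. */
long afu_cycles(long short_runs, long long_runs, int min_lat, int max_lat) {
    long scheduled = (short_runs + long_runs) * min_lat; /* static schedule */
    long stalls = long_runs * (long)(max_lat - min_lat); /* run-time penalty */
    return scheduled + stalls;
}
```

For example, with latencies 2 and 3 and 100 executions on each path, the model charges 400 scheduled cycles plus 100 stall cycles.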

11
Pipeline flow
[Figure: pipeline flow for an AFU with minimum latency = 2 and maximum latency = 3]
Maximum-latency path taken: issue the next instruction in the next cycle, stall in the 3rd cycle
Minimum-latency path taken: issue the next instruction in the next cycle
12
How to handle Inputs and Outputs Of AFUs

All inputs are read at the beginning and all outputs
are written at the end
Need to find the deterministic AFUs
Compare instruction can be performed
simultaneously
Needs data-flow and control-flow analysis
Time shaping of inputs and outputs
Compare instruction cannot be performed
simultaneously

13
Comparison

Simple If then else

m, n are the schedule lengths of the two branch paths
Predicated execution: $\max(n, m) + 1 \le \text{schedule length} \le n + m + 1$
ASIP methodology: $\min(n, m) \le \text{schedule length} \le \max(n, m)$
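
The two ranges can be computed directly. The sketch below reads the slide as max(n, m) + 1 <= length <= n + m + 1 for predicated execution and min(n, m) <= length <= max(n, m) for the ASIP methodology; treating the bounds as inclusive is an assumption, since the comparison symbols are garbled in the transcript. The `Bounds` type and the function names are hypothetical.

```c
typedef struct {
    int lower;
    int upper;
} Bounds;

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Predicated execution: one extra cycle for the compare-to-predicate,
   then between fully overlapped and fully serialized paths. */
Bounds predicated_bounds(int n, int m) {
    Bounds b = { max_int(n, m) + 1, n + m + 1 };
    return b;
}

/* ASIP methodology: the comparison is folded into the AFU,
   so no extra cycle is charged. */
Bounds asip_bounds(int n, int m) {
    Bounds b = { min_int(n, m), max_int(n, m) };
    return b;
}
```

With n = 3 and m = 5 this gives 6 to 9 cycles for predicated execution against 3 to 5 for the AFU version.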
14
Comparison (cont ...)
Super block
Hyper block

Super block
Schedule length $= (1 - p)(n + 3) + p(m + 1)$
Let $n = m$ and $p = 1 - p = 0.5$ (i.e., equal probability)
Schedule length $= 0.5(n + 3) + 0.5(n + 1) = n + 2$
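
The expected superblock length is easy to evaluate for other probabilities. The sketch below assumes the formula is (1 - p)(n + 3) + p(m + 1), using the 3-cycle and 1-cycle penalties from the pipeline strategy slide; the function name is illustrative.

```c
/* Expected schedule length of the superblock version:
   with probability (1 - p) the n-cycle path runs with a 3-cycle penalty,
   with probability p the m-cycle path runs with a 1-cycle penalty. */
double superblock_expected_length(double p, int n, int m) {
    return (1.0 - p) * (n + 3) + p * (m + 1);
}
```

For n = m = 4 and p = 0.5 the function returns 6.0, which matches n + 2.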

15
Comparison(cont ... )

If then else chain

16
Identification of Special AFUs
Identification Algorithm
Selection Criteria
17
Earlier work

MachSuif was used for
Identification
Evaluation
Selection
Trimaran was used for
Finding statistics

18
Limitations of earlier work

Identification is based on a different architecture
Evaluation and selection do not take VLIW
features into consideration
If-conversion modified the program in MachSuif,
which decreases performance

19
Trimaran Framework
Source: www.trimaran.org
20
Estimation Approach

Identification
Finding deterministic computational blocks (CBs)
Use def-use and use-def chains to find CBs
Renaming the registers
Evaluation of CBs
Fully predicate the sub-region
Critical path reduction (CPR)
Control CPR
Data CPR
Schedule the sub-region assuming infinite resources
Estimate the performance improvement, area, and
inputs and outputs of the CB

21
Estimation (cont...)

Selection
Multi objective problem
Select the CBs that satisfy the multiple
objectives
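
The presentation does not say how the multi-objective selection is solved. One common way to narrow the candidates is to keep only those not dominated on every objective (a Pareto filter); the sketch below shows that technique purely as an illustration. The `Candidate` fields follow the three estimates named on the previous slide (performance improvement, area, inputs and outputs), but the names and the filter itself are assumptions, not the author's method.

```c
#include <stddef.h>

/* Hypothetical candidate record; field names are illustrative only. */
typedef struct {
    int cycles_saved; /* estimated performance improvement (higher is better) */
    int area;         /* estimated area cost (lower is better) */
    int io_ports;     /* inputs plus outputs (lower is better) */
} Candidate;

/* x dominates y if it is no worse on every objective and better on one. */
int dominates(const Candidate *x, const Candidate *y) {
    int no_worse = x->cycles_saved >= y->cycles_saved &&
                   x->area <= y->area && x->io_ports <= y->io_ports;
    int better = x->cycles_saved > y->cycles_saved ||
                 x->area < y->area || x->io_ports < y->io_ports;
    return no_worse && better;
}

/* Sets keep[i] = 1 for each candidate that no other candidate dominates.
   Returns the number of candidates kept. */
size_t pareto_filter(const Candidate *cbs, size_t n, int *keep) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        keep[i] = 1;
        for (size_t j = 0; j < n; j++) {
            if (j != i && dominates(&cbs[j], &cbs[i])) {
                keep[i] = 0;
                break;
            }
        }
        kept += (size_t)keep[i];
    }
    return kept;
}
```

A final choice among the surviving CBs would still need a policy such as an area budget or a weighted score, which the slides leave open.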

22
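If-then benchmark (source code)

#include <stdio.h>
#include <stdlib.h>

int main() {
    int i, a, b, c, d, e, f, g;
    a = b = c = d = e = f = g = 0;
    for (i = 0; i < 200; i++) {
        int x = i % 2; int y = i % 4;
        a++;  /* counts every iteration */
        if (x == 0) { b++; if (y == 0) c++; else d++; }
        else e++;
        if (x == 0) f++; else g++;
    }
    printf("a=%d b=%d c=%d d=%d e=%d f=%d g=%d\n",
           a, b, c, d, e, f, g);
    exit(0);
}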
23
CDFG
block    schedule length    weight
bb 1            7               1
bb 2            3               1
bb 3            7             200
bb 4            2             100
bb 5            2              50
bb 9            2             200
bb 10           2             100
bb 12           3             200
bb 13           9               1
bb 14           9               1
bb 8            2             100
bb 11           2             100
bb 6            2              50
24
op 1 (SHRA_W br<27:i gpr 3> br<1:i gpr 2> i<31> p<t> s_time(0)
      s_opcode(SHRA_W.0) flags(sched))
op 2 (ADD_W br<2:i gpr 5> br<2:i gpr 5> i<1> p<t> s_time(0)
      s_opcode(ADD_W.2) flags(sched))
op 3 (AND_W br<24:i gpr 6> br<27:i gpr 3> i<1> p<t> s_time(1)
      s_opcode(AND_W.0) flags(sched))
op 4 (AND_W br<28:i gpr 7> br<27:i gpr 3> i<3> p<t> s_time(1)
      s_opcode(AND_W.1) flags(sched))
op 5 (ADD_W br<25:i gpr 8> br<1:i gpr 2> br<24:i gpr 6> p<t> s_time(2)
      s_opcode(ADD_W.0) flags(sched))
op 6 (ADD_W br<29:i gpr 9> br<1:i gpr 2> br<28:i gpr 7> p<t> s_time(2)
      s_opcode(ADD_W.1) flags(sched))
op 7 (AND_W br<26:i gpr 10> br<25:i gpr 8> i<-2> p<t> s_time(3)
      s_opcode(AND_W.0) flags(sched))
op 8 (AND_W br<30:i gpr 11> br<29:i gpr 9> i<-4> p<t> s_time(3)
      s_opcode(AND_W.1) flags(sched))
op 9 (SUB_W br<9:i gpr 4> br<1:i gpr 2> br<26:i gpr 10> p<t> s_time(4)
      s_opcode(SUB_W.0) flags(sched))
op 10 (SUB_W br<10:i gpr 12> br<1:i gpr 2> br<30:i gpr 11> p<t> s_time(4)
      s_opcode(SUB_W.1) flags(sched))
op 11 (CMPP_W_NEQ_UN_UC br<1:p pr 1> br<2:p pr 2> br<9:i gpr 4> i<0> p<t>
      s_time(5) s_opcode(CMPP_W_NEQ_UN_UN.0) flags(sched))
//bb8
op 12 (ADD_W br<6:i gpr 14> br<6:i gpr 14> i<1> p<1> s_time(0)
      s_opcode(ADD_W.1) flags(sched))
//bb4
op 13 (ADD_W br<3:i gpr 13> br<3:i gpr 13> i<1> p<2> s_time(0)
      s_opcode(ADD_W.2) flags(sched))
op 14 (CMPP_W_NEQ_UN_UC br<3:p pr 3> br<4:p pr 4> br<10:i gpr 12> i<0> p<2>
      s_time(0) s_opcode(CMPP_W_NEQ_UN_UN.1) flags(sched))
//bb6
op 15 (ADD_W br<5:i gpr 18> br<5:i gpr 18> i<1> p<3> s_time(0)
      s_opcode(ADD_W.1) flags(sched))
//bb5
op 16 (ADD_W br<4:i gpr 17> br<4:i gpr 17> i<1> p<4> s_time(0)
      s_opcode(ADD_W.0) flags(sched))
op 17 (CMPP_W_NEQ_UN_UN br<5:p pr 5> br<6:p pr 6> br<9:i gpr 4> i<0> p<t>
      s_time(0) s_opcode(CMPP_W_NEQ_UN_UN.1) flags(sched))
//bb10
op 18 (ADD_W br<7:i gpr 15> br<7:i gpr 15> i<1> p<6> s_time(0)
      s_opcode(ADD_W.0) flags(sched))
//bb11
op 19 (ADD_W br<8:i gpr 16> br<8:i gpr 16> i<1> p<5> s_time(0)
      s_opcode(ADD_W.1) flags(sched))
//bb12
op 20 (ADD_W br<1:i gpr 2> br<1:i gpr 2> i<1> p<t> s_time(0)
      s_opcode(ADD_W.0) flags(sched))
op 21 (PBRR br<38:b btr 2> b<3> i<1> p<t> s_time(0)
      s_opcode(PBRR.1) attr(lc 183) flags(sched))
op 22 (CMPP_W_LT_UN_UN br<7:p pr 7> u<> br<1:i gpr 2> i<200> p<t> s_time(1)
      s_opcode(CMPP_W_LT_UN_UN.0) flags(sched))
op 23 (BRCT br<38:b btr 2> br<39:p pr 7> p<t> s_time(2)
      s_opcode(BRCT.0) attr(lc 185)
      out(op-93(199) op-101(1)) flags(sched))
25
DDG
[Figure: data dependence graph over operations 1 to 23]

cycle    operations (slot1 to slot4)
0        1, 2, 20, 21
1        3, 4, 22
2        5, 6
3        7, 8
4        9, 10
5        17, 11
6        18, 19, 12, 14
7        13, 15, 16, 23

Schedule length = 8
Total cycles $= 200 \times 8 + 7 + 3 + 9 + 9 = 1628$ cycles
26
AFU
cycle    operations (slot1 to slot4)
0        1, 2, 20, 21
1        3, 4, 22
2        5, 6
3        7, 8
4        9, 10
5        17, 11, 18, 19, 12
6        14, 15, 16, 23, 13

Schedule length = 7
Total cycles $= 200 \times 7 + 7 + 3 + 9 + 9 = 1428$ cycles
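
Both totals follow the same rule: sum schedule length times execution weight over the blocks. The sketch below reproduces that arithmetic, assuming the loop region runs 200 times and the four remaining blocks (lengths 7, 3, 9, 9) run once each, as the totals on these slides suggest. The `Block` type and `total_cycles` are illustrative names.

```c
#include <stddef.h>

typedef struct {
    int sched_len; /* schedule length of the block in cycles */
    long weight;   /* number of times the block executes */
} Block;

/* Total cycles = sum over blocks of schedule length * weight. */
long total_cycles(const Block *blocks, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (long)blocks[i].sched_len * blocks[i].weight;
    }
    return total;
}
```

Passing `{8, 200}` for the predicated loop region gives 1628 cycles, and `{7, 200}` for the AFU version gives 1428, a saving of 200 cycles over the 200 iterations.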
27
Comparison
28
Work Done

Related work
Diviya Jain's work
Bhuvan Middha's work
Various identification algorithms
Elcor backend
Understood all the passes of the Elcor backend
Read related material to understand every pass of
Elcor
Wrote a program to identify simple if-else
computation blocks
Developed an evaluation methodology

29
Future work

Finding an identification algorithm that takes the VLIW
architecture into consideration
Implementing the evaluation and selection
strategy in Elcor
The availability of different cache levels and direct
accessibility gives the opportunity to implement
loops in AFUs
