1
Software estimation for application specific
processors
  • Under the Supervision of
  • Prof. M. Balakrishnan
  • By
  • Satish Parvataneni
  • (2001mcs017)

6th Nov 2002
2
Presentation Outline
  • Introduction
  • Objectives
  • Progress
  • References

3
Introduction
  • Why multiprocessor SoCs ?
  • Why application specific multiprocessors ?
  • Application specific multiprocessor design flow

4
SRIJAN design methodology
5

Presentation Outline
  • Introduction
  • Objectives
  • Progress
  • References

6
Objectives
  • Objectives
  • Defining multiprocessor architecture description
  • Developing a tool to generate a task graph
    annotated with
  • Execution time estimates
  • Memory traffic estimates
  • Input
  • Application IR
  • Profiled data
  • Architecture description.
  • Output
  • Annotated task graph (sketched below)
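As an illustration only (the field names below are hypothetical, not the tool's actual data structures), each node of such an annotated task graph could carry roughly this information:

/* Hypothetical task-graph node carrying the two annotations the tool
 * is meant to produce, plus the graph edges. */
typedef struct task_node {
    int id;                       /* task identifier                    */
    double exec_cycles;           /* execution time estimate (cycles)   */
    double mem_traffic_bytes;     /* memory traffic estimate            */
    struct task_node **succ;      /* successor tasks                    */
    int num_succ;
} task_node;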

7
Presentation Outline
  • Introduction
  • Objectives
  • Progress
  • References

8
Progress
  • Described a simple multiprocessor architecture
    and extracted the full information
  • Generating a task graph
  • Allowing dynamic thread creation
  • Generating execution time estimates with the
    help of the Machine SUIF library

9
Architecture Description
  • Describing the architecture using HmDes and
    extracting information using mQS.
  • There are five sections (see the sketch after
    this list)
  • Memory section
  • Processor section
  • Bus section
  • BCU section
  • Main section
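Purely as an illustration of the kind of information extracted (these field names are assumptions, not the HmDes grammar or the mQS interface), the five sections might boil down to parameters such as:

/* Illustrative summary of what the five sections could provide. */
typedef struct {
    /* memory section    */ int cache_line_words, line_fetch_cycles;
    /* processor section */ int num_int_units, num_fp_units, num_ls_units;
    /* bus section       */ int bus_width_bytes, bus_cycles_per_transfer;
    /* BCU section       */ int taken_delay, not_taken_delay;
    /* main section      */ int num_processors;
} arch_info;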

10
Generating a task graph (dynamic)
  • The application model is pthreads
  • A task is defined as a piece of sequential code
    (see the example below)
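A minimal pthreads fragment showing where task boundaries fall under this model (the worker function and its argument are only illustrative):

#include <pthread.h>
#include <stdio.h>

/* The body of the created thread is one task in the task graph. */
static void *worker(void *arg)
{
    printf("child task %ld\n", (long)arg);
    return NULL;
}

int main(void)
{
    pthread_t t;
    /* pthread_create adds a new task node and an edge from the
     * creating task to the created one. */
    pthread_create(&t, NULL, worker, (void *)1L);
    /* The sequential code between create and join is itself a task,
     * running in parallel with worker(). */
    pthread_join(t, NULL);
    return 0;
}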

11
Dynamic thread creation
  • Problems Encountered
  • Thread creation in loops
  • Thread creation in if-else statements
  • Solutions
  • Extracting average profiling information using
    gcov
  • Unrolling loops (see the example below)
  • Pruning the less frequently executed part
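For instance, a pthread_create inside a loop hides the number of tasks; with the average trip count reported by gcov (assumed to be 2 here), the loop can be unrolled so each creation point becomes its own task-graph node. worker and args are placeholders:

#include <pthread.h>

#define TRIPS 2            /* assumed average trip count from gcov */

extern void *worker(void *arg);
static pthread_t tid[TRIPS];
static long args[TRIPS];

/* Before: the number of created tasks is hidden inside the loop. */
static void create_rolled(void)
{
    for (int i = 0; i < TRIPS; i++)
        pthread_create(&tid[i], NULL, worker, &args[i]);
}

/* After unrolling to the profiled trip count: each call site is a
 * distinct node in the task graph. */
static void create_unrolled(void)
{
    pthread_create(&tid[0], NULL, worker, &args[0]);
    pthread_create(&tid[1], NULL, worker, &args[1]);
}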

12
Estimating execution time of a task
  • Extract target specific CFG of each task
  • Perform register allocation
  • Extract DDG at basic block level
  • Supply the resource model to the scheduler
  • Generate the estimates using the scheduler

13
Extracting target specific CFG
  • Convert the application in C to MACHSUIF IR
  • Convert MACHSUIF IR to C
  • Run the application and collect the profiling
    info
  • Convert the modified application from C to SUIF2
    IR

14
Extracting target specific CFG contd.
  • Annotate the profiling information using gcov
  • Convert the SUIF2 IR to MACHSUIF IR
  • Convert the MACHSUIF IR to SUIF VM IR
  • Generate the code for target architecture
  • Convert the obtained instruction list to a CFG
    (sketched below)
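The last step, converting the instruction list to a CFG, essentially starts a new basic block at every label and after every branch. A simplified sketch of that idea (not the Machine SUIF CFG library) is shown below:

/* Simplified basic-block detection over a linear instruction list:
 * a block leader is the first instruction, any label, or any
 * instruction following a branch. */
enum insn_kind { PLAIN, LABEL, BRANCH };

struct insn { enum insn_kind kind; /* opcode, operands, ... */ };

static int find_block_leaders(const struct insn *code, int n, int *leaders)
{
    int nblocks = 0;
    for (int i = 0; i < n; i++) {
        if (i == 0 || code[i].kind == LABEL || code[i - 1].kind == BRANCH)
            leaders[nblocks++] = i;   /* record start index of a block */
    }
    return nblocks;                   /* number of basic blocks found  */
}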

15
Resource model
  • Resources
  • id, fd, mem
  • Resource usage vectors
  • i: id
  • f: fd
  • ls: id, mem (cycle 0); mem (cycle 1)

16
Collision matrices for instruction classes
Instr class    Pipeline cycle 0    Pipeline cycle 1
I              id                  -
F              fd                  -
LS             id, mem             mem

Collision matrix for instruction class I
       cycle 0   cycle 1
  I       x         0
  F       0         0
  LS      x         0

Collision matrix for instruction class F
       cycle 0   cycle 1
  I       0         0
  F       x         0
  LS      0         0

Collision matrix for instruction class LS
       cycle 0   cycle 1
  I       x         0
  F       0         0
  LS      x         x
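One way to use the matrices above in a scheduler (a sketch of a common encoding, not necessarily the one used here) is a bit mask per pair of classes, where bit c says whether the second instruction collides if issued c cycles after the first:

/* Collision matrices from the table above.  collide[x][y] has bit c
 * set when an instruction of class y, issued c cycles after one of
 * class x, would collide on a pipeline resource. */
enum { CLS_I, CLS_F, CLS_LS, NCLS };

static const unsigned collide[NCLS][NCLS] = {
    /* earlier I  */ { 0x1 /* I: x0 */, 0x0 /* F: 00 */, 0x1 /* LS: x0 */ },
    /* earlier F  */ { 0x0,             0x1,             0x0              },
    /* earlier LS */ { 0x1,             0x0,             0x3 /* LS: xx */ },
};

/* Can an instruction of class y issue d cycles after one of class x? */
static int can_issue(int x, int y, int d)
{
    return d > 1 || !((collide[x][y] >> d) & 1);
}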
17
Generated Automata
[Figure: automaton generated from the collision matrices, with states F0 to F5 holding resource usage vectors and transitions on the instruction classes i, f and ls; F0 and F4 are cycle advancing states.]
18
Scheduling
  • Extract the DDG from the target specific CFG
  • List scheduling on basic blocks (sketched below)
  • Branch delays have been incorporated
  • Trying to incorporate memory models
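A bare-bones cycle-by-cycle list scheduler over one basic block's DDG, as a sketch of the technique (the real tool also consults the resource automaton; all names here are illustrative):

/* Greedy list scheduling of one basic block.
 * npred[i]    : number of unscheduled DDG predecessors of op i
 * ready_at[i] : earliest cycle op i may issue (data dependences)
 * latency[i]  : result latency of op i
 * succ/nsucc  : DDG successor lists
 * sched_cycle : output, must be initialised to -1 by the caller
 * Returns the estimated cycle count of the block. */
#define MAXOPS 64

int list_schedule(int nops, int npred[], int ready_at[],
                  const int latency[], int succ[][MAXOPS],
                  const int nsucc[], int sched_cycle[])
{
    int done = 0, cycle = 0;
    while (done < nops) {
        for (int i = 0; i < nops; i++) {
            if (sched_cycle[i] >= 0 || npred[i] > 0 || ready_at[i] > cycle)
                continue;               /* not ready in this cycle       */
            /* a structural-hazard check against the resource automaton
             * would go here before issuing */
            sched_cycle[i] = cycle;
            done++;
            for (int s = 0; s < nsucc[i]; s++) {
                int j = succ[i][s];
                npred[j]--;             /* dependence satisfied          */
                if (ready_at[j] < cycle + latency[i])
                    ready_at[j] = cycle + latency[i];
            }
        }
        cycle++;                        /* advance to the next cycle     */
    }
    return cycle;
}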

19
Branch Delays
  • Unconditional branches
  • Delay = Taken_Delay × Execution count of the
    current block
  • Conditional branches
  • Taken_Delay × Execution count of the target
    block
  • Not_Taken_Delay × Execution count of the target
    block
  • Delay = sum of the above two (see the sketch
    below)
  • Delay information is extracted from the processor
    pipeline
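Reading the bullets above with the multiplications restored, and taking the second count for a conditional branch to refer to the fall-through block (an assumption on my part), the per-branch contribution could be computed as:

/* Delay contributed by one branch, weighted by profiled execution
 * counts; the delay values come from the pipeline description. */
long branch_delay(int is_conditional,
                  long taken_delay, long not_taken_delay,
                  long count_current,       /* executions of this block     */
                  long count_taken,         /* executions of the taken path */
                  long count_fall_through)  /* executions of the other path */
{
    if (!is_conditional)
        return taken_delay * count_current;
    return taken_delay * count_taken
         + not_taken_delay * count_fall_through;
}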

20
Memory References
  • Classifying loads and stores
  • Loads and stores involving scalar variables
  • Loads and stores involving array references
  • Scalar References
  • All the scalar variables are stored in
    consecutive memory locations
  • There is only one cache miss for every cache
    line containing scalars

21
Scalar References
  • N: number of scalar variables involved in the
    memory accesses
  • M: number of memory accesses to the N scalar
    variables
  • K: number of processor cycles to fetch one line
    into the cache
  • L: cache line size
  • Processor cycles = K × ceil(N/L) + M - ceil(N/L)
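Under the reading used above (each of the ceil(N/L) line fetches costs K cycles and the remaining M - ceil(N/L) accesses hit in one cycle; this cost split is an assumption), the estimate can be computed as:

#include <math.h>

/* Cycles for M accesses to N scalars packed into lines of L words,
 * where fetching a line costs K cycles and a hit costs one cycle
 * (an assumed reading of the slide's formula). */
double scalar_ref_cycles(double N, double M, double K, double L)
{
    double lines = ceil(N / L);          /* cache lines holding scalars */
    return K * lines + (M - lines);      /* misses + remaining hits     */
}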

22
Array References
  • Classifying the array references
  • Self-spatial reuse
  • Self-temporal reuse
  • Group-spatial reuse
  • Group-temporal reuse

23
Array References contd
  • Self-spatial reuse
  • A reference accesses the same cache line in
    different iterations
  • Self-temporal reuse
  • A reference accesses the same data location in
    different iterations
  • Group-spatial reuse
  • Different references access the same cache line
    in different iterations
  • Group-temporal reuse
  • Different references access the same data
    location in different iterations

24
Array References contd
  • Self-temporal reuse references are moved outside
    the loop
  • Group the remaining references into equivalence
    classes
  • Each class exhibits self-spatial and
    group-spatial reuse

25
Example
for (I = 0; I < M; I++)
  for (j = 0; j < M; j++)
    a[I][j] = a[I][j] + a[I-1][j] + a[I+1][j] +
              a[I][j-1] + a[I][j+1] + b[I] + c[j][I];
  • b[I] has self-temporal reuse
  • a[I][j+1] has self-spatial reuse
  • a[I][j] and a[I][j+1] have group-temporal reuse
  • a[I][j] and a[I][j-1] have group-spatial reuse

26
Example contd
  • a[i][j], a[i][j-1], a[i][j+1]
  • a[i-1][j]
  • a[i+1][j]
  • 3 memory accesses in each iteration for a,
    i.e. 3/L off-chip accesses per j iteration
  • 1 memory access in each iteration for c,
    i.e. 1 off-chip access per j iteration
  • For b, 1/L off-chip accesses per i iteration
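A quick worked version of the counts above, assuming a cache line of L = 8 elements and a j-loop trip count of M = 100 (both values are assumptions for illustration):

#include <stdio.h>

int main(void)
{
    const double L = 8;    /* assumed cache line size in array elements */
    const double M = 100;  /* assumed trip count of the j loop          */

    double a_per_i = 3.0 / L * M;  /* three classes of a, spatial reuse  */
    double c_per_i = 1.0 * M;      /* c[j][I]: one off-chip access per j */
    double b_per_i = 1.0 / L;      /* b[I]: hoisted, 1/L per i iteration */

    printf("off-chip accesses per i iteration: a=%.1f c=%.1f b=%.3f\n",
           a_per_i, c_per_i, b_per_i);
    return 0;
}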

27
Presentation Outline
  • Introduction
  • Objectives
  • Progress
  • References

28
References
  • Trimaran mQS functions in md.h (in the
    trimaran/impact directory)
  • SUIF2 documentation
  • MACHSUIF documentation
  • Instruction scheduling library for SUIF, by Gang
    Chen and Cliff Young, Harvard University
  • Efficient instruction scheduling using finite
    state automata, by Vasanth Bala and Norman Rubin
  • Local memory exploration and optimization in
    embedded systems, by P. R. Panda, Nikil D. Dutt
    and Alexandru Nicolau

29
Thank You