1
Software estimation for application specific
processors
  • Under the Supervision of
  • Prof. M. Balakrishnan
  • By
  • Satish Parvataneni
  • (2001mcs017)

6th Nov 2002
2
Presentation Outline
  • Introduction
  • Objectives
  • Progress
  • References

3
Introduction
  • Why multiprocessor SoCs ?
  • Why application specific multiprocessors ?
  • Application specific multiprocessor design flow

4
SRIJAN design methodology
5

Presentation Outline
  • Introduction
  • Objectives
  • Progress
  • References

6
Objectives
  • Objectives
  • Defining multiprocessor architecture description
  • Developing a tool to generate a task graph
    annotated with
  • Execution time estimates
  • Memory traffic estimates
  • Input
  • Application IR
  • Profiled data
  • Architecture description.
  • Output
  • Annotated task graph (sketched below)
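As an illustration only (the field names below are hypothetical, not the tool's actual data structures), each node of such an annotated task graph could carry roughly this information:

/* Hypothetical task-graph node carrying the two annotations the tool
 * is meant to produce, plus the graph edges. */
typedef struct task_node {
    int id;                       /* task identifier                    */
    double exec_cycles;           /* execution time estimate (cycles)   */
    double mem_traffic_bytes;     /* memory traffic estimate            */
    struct task_node **succ;      /* successor tasks                    */
    int num_succ;
} task_node;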

7
Presentation Outline
  • Introduction
  • Objectives
  • Progress
  • References

8
Progress
  • Described a simple multiprocessor architecture
    and extracted the full information
  • Generating a task graph
  • Allowing dynamic thread creation
  • Generating execution time estimates with the
    help of the Machine SUIF library

9
Architecture Description
  • Describing the architecture using HmDes and
    extracting information using mQS.
  • There are five sections (see the sketch after
    this list)
  • Memory section
  • Processor section
  • Bus section
  • BCU section
  • Main section
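Purely as an illustration of the kind of information extracted (these field names are assumptions, not the HmDes grammar or the mQS interface), the five sections might boil down to parameters such as:

/* Illustrative summary of what the five sections could provide. */
typedef struct {
    /* memory section    */ int cache_line_words, line_fetch_cycles;
    /* processor section */ int num_int_units, num_fp_units, num_ls_units;
    /* bus section       */ int bus_width_bytes, bus_cycles_per_transfer;
    /* BCU section       */ int taken_delay, not_taken_delay;
    /* main section      */ int num_processors;
} arch_info;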

10
Generating a task graph (dynamic)
  • The application model is pthreads
  • A task is defined as a piece of sequential code
    (see the example below)
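A minimal pthreads fragment showing where task boundaries fall under this model (the worker function and its argument are only illustrative):

#include <pthread.h>
#include <stdio.h>

/* The body of the created thread is one task in the task graph. */
static void *worker(void *arg)
{
    printf("child task %ld\n", (long)arg);
    return NULL;
}

int main(void)
{
    pthread_t t;
    /* pthread_create adds a new task node and an edge from the
     * creating task to the created one. */
    pthread_create(&t, NULL, worker, (void *)1L);
    /* The sequential code between create and join is itself a task,
     * running in parallel with worker(). */
    pthread_join(t, NULL);
    return 0;
}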

11
Dynamic thread creation
  • Problems Encountered
  • Thread creation in loops
  • Thread creation in if-else statements
  • Solutions
  • Extracting average profiling information using
    gcov
  • Unrolling loops (see the example below)
  • Pruning the less frequently executed part
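For instance, a pthread_create inside a loop hides the number of tasks; with the average trip count reported by gcov (assumed to be 2 here), the loop can be unrolled so each creation point becomes its own task-graph node. worker and args are placeholders:

#include <pthread.h>

#define TRIPS 2            /* assumed average trip count from gcov */

extern void *worker(void *arg);
static pthread_t tid[TRIPS];
static long args[TRIPS];

/* Before: the number of created tasks is hidden inside the loop. */
static void create_rolled(void)
{
    for (int i = 0; i < TRIPS; i++)
        pthread_create(&tid[i], NULL, worker, &args[i]);
}

/* After unrolling to the profiled trip count: each call site is a
 * distinct node in the task graph. */
static void create_unrolled(void)
{
    pthread_create(&tid[0], NULL, worker, &args[0]);
    pthread_create(&tid[1], NULL, worker, &args[1]);
}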

12
Estimating execution time of a task
  • Extract target specific CFG of each task
  • Perform register allocation
  • Extract DDG at basic block level
  • Supply the resource model to the scheduler
  • Generate the estimates using the scheduler

13
Extracting target specific CFG
  • Convert the application in C to MACHSUIF IR
  • Convert MACHSUIF IR to C
  • Run the application and collect the profiling
    info
  • Convert the modified application from C to SUIF2
    IR

14
Extracting target specific CFG contd.
  • Annotate the profiling information using gcov
  • Convert the SUIF2 IR to MACHSUIF IR
  • Convert the MACHSUIF IR to SUIF VM IR
  • Generate the code for target architecture
  • Convert the obtained instruction list to a CFG
    (sketched below)
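The last step, converting the instruction list to a CFG, essentially starts a new basic block at every label and after every branch. A simplified sketch of that idea (not the Machine SUIF CFG library) is shown below:

/* Simplified basic-block detection over a linear instruction list:
 * a block leader is the first instruction, any label, or any
 * instruction following a branch. */
enum insn_kind { PLAIN, LABEL, BRANCH };

struct insn { enum insn_kind kind; /* opcode, operands, ... */ };

static int find_block_leaders(const struct insn *code, int n, int *leaders)
{
    int nblocks = 0;
    for (int i = 0; i < n; i++) {
        if (i == 0 || code[i].kind == LABEL || code[i - 1].kind == BRANCH)
            leaders[nblocks++] = i;   /* record start index of a block */
    }
    return nblocks;                   /* number of basic blocks found  */
}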

15
Resource model
  • Resources
  • id, fd, mem
  • Resource usage vectors
  • i: id
  • f: fd
  • ls: id, mem (cycle 0); mem (cycle 1)

16
Collision matrices for instruction classes
Instr class    Pipeline cycle 0    Pipeline cycle 1
I              id                  -
F              fd                  -
LS             id, mem             mem

Collision matrix for instruction class I
       cycle 0   cycle 1
  I       x         0
  F       0         0
  LS      x         0

Collision matrix for instruction class F
       cycle 0   cycle 1
  I       0         0
  F       x         0
  LS      0         0

Collision matrix for instruction class LS
       cycle 0   cycle 1
  I       x         0
  F       0         0
  LS      x         x
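One way to use the matrices above in a scheduler (a sketch of a common encoding, not necessarily the one used here) is a bit mask per pair of classes, where bit c says whether the second instruction collides if issued c cycles after the first:

/* Collision matrices from the table above.  collide[x][y] has bit c
 * set when an instruction of class y, issued c cycles after one of
 * class x, would collide on a pipeline resource. */
enum { CLS_I, CLS_F, CLS_LS, NCLS };

static const unsigned collide[NCLS][NCLS] = {
    /* earlier I  */ { 0x1 /* I: x0 */, 0x0 /* F: 00 */, 0x1 /* LS: x0 */ },
    /* earlier F  */ { 0x0,             0x1,             0x0              },
    /* earlier LS */ { 0x1,             0x0,             0x3 /* LS: xx */ },
};

/* Can an instruction of class y issue d cycles after one of class x? */
static int can_issue(int x, int y, int d)
{
    return d > 1 || !((collide[x][y] >> d) & 1);
}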
17
Generated Automata
[Figure: automaton generated from the collision matrices, with states F0 to F5 holding resource usage vectors and transitions on the instruction classes i, f and ls; F0 and F4 are cycle advancing states.]
18
Scheduling
  • Extract the DDG from the target specific CFG
  • List scheduling on basic blocks (sketched below)
  • Branch delays have been incorporated
  • Trying to incorporate memory models
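A bare-bones cycle-by-cycle list scheduler over one basic block's DDG, as a sketch of the technique (the real tool also consults the resource automaton; all names here are illustrative):

/* Greedy list scheduling of one basic block.
 * npred[i]    : number of unscheduled DDG predecessors of op i
 * ready_at[i] : earliest cycle op i may issue (data dependences)
 * latency[i]  : result latency of op i
 * succ/nsucc  : DDG successor lists
 * sched_cycle : output, must be initialised to -1 by the caller
 * Returns the estimated cycle count of the block. */
#define MAXOPS 64

int list_schedule(int nops, int npred[], int ready_at[],
                  const int latency[], int succ[][MAXOPS],
                  const int nsucc[], int sched_cycle[])
{
    int done = 0, cycle = 0;
    while (done < nops) {
        for (int i = 0; i < nops; i++) {
            if (sched_cycle[i] >= 0 || npred[i] > 0 || ready_at[i] > cycle)
                continue;               /* not ready in this cycle       */
            /* a structural-hazard check against the resource automaton
             * would go here before issuing */
            sched_cycle[i] = cycle;
            done++;
            for (int s = 0; s < nsucc[i]; s++) {
                int j = succ[i][s];
                npred[j]--;             /* dependence satisfied          */
                if (ready_at[j] < cycle + latency[i])
                    ready_at[j] = cycle + latency[i];
            }
        }
        cycle++;                        /* advance to the next cycle     */
    }
    return cycle;
}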

19
Branch Delays
  • Unconditional branches
  • Delay = Taken_Delay × Execution count of the
    current block
  • Conditional branches
  • Taken_Delay × Execution count of the target
    block
  • Not_Taken_Delay × Execution count of the target
    block
  • Delay = sum of the above two (see the sketch
    below)
  • Delay information is extracted from the processor
    pipeline
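Reading the bullets above with the multiplications restored, and taking the second count for a conditional branch to refer to the fall-through block (an assumption on my part), the per-branch contribution could be computed as:

/* Delay contributed by one branch, weighted by profiled execution
 * counts; the delay values come from the pipeline description. */
long branch_delay(int is_conditional,
                  long taken_delay, long not_taken_delay,
                  long count_current,       /* executions of this block     */
                  long count_taken,         /* executions of the taken path */
                  long count_fall_through)  /* executions of the other path */
{
    if (!is_conditional)
        return taken_delay * count_current;
    return taken_delay * count_taken
         + not_taken_delay * count_fall_through;
}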

20
Memory References
  • Classifying loads and stores
  • Loads and stores involving scalar variables
  • Loads and stores involving array references
  • Scalar References
  • All the scalar variables are stored in
    consecutive memory locations
  • There is only one cache miss for every cache
    line containing scalars

21
Scalar References
  • N: number of scalar variables involved in the
    memory accesses
  • M: number of memory accesses to the N scalar
    variables
  • K: number of processor cycles to fetch one line
    into the cache
  • L: cache line size
  • Processor cycles = K × ceil(N/L) + M - ceil(N/L)
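Under the reading used above (each of the ceil(N/L) line fetches costs K cycles and the remaining M - ceil(N/L) accesses hit in one cycle; this cost split is an assumption), the estimate can be computed as:

#include <math.h>

/* Cycles for M accesses to N scalars packed into lines of L words,
 * where fetching a line costs K cycles and a hit costs one cycle
 * (an assumed reading of the slide's formula). */
double scalar_ref_cycles(double N, double M, double K, double L)
{
    double lines = ceil(N / L);          /* cache lines holding scalars */
    return K * lines + (M - lines);      /* misses + remaining hits     */
}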

22
Array References
  • Classifying the array references
  • Self-spatial reuse
  • Self-temporal reuse
  • Group-spatial reuse
  • Group-temporal reuse

23
Array References contd
  • Self-spatial reuse
  • A reference accesses the same cache line in
    different iterations
  • Self-temporal reuse
  • A reference accesses the same data location in
    different iterations
  • Group-spatial reuse
  • Different references access the same cache line
    in different iterations
  • Group-temporal reuse
  • Different references access the same data
    location in different iterations

24
Array References contd
  • Self-temporal reuse references are moved outside
    the loop
  • Group the remaining references into equivalence
    classes
  • Each class exhibits self-spatial and
    group-spatial reuse

25
Example
for (I = 0; I < M; I++)
  for (j = 0; j < M; j++)
    a[I][j] = a[I][j] + a[I-1][j] + a[I+1][j] +
              a[I][j-1] + a[I][j+1] + b[I] + c[j][I];
  • b[I] has self-temporal reuse
  • a[I][j+1] has self-spatial reuse
  • a[I][j] and a[I][j+1] have group-temporal reuse
  • a[I][j] and a[I][j-1] have group-spatial reuse

26
Example contd
  • a[i][j], a[i][j-1], a[i][j+1]
  • a[i-1][j]
  • a[i+1][j]
  • 3 memory accesses in each iteration for a,
    i.e. 3/L off-chip accesses per j iteration
  • 1 memory access in each iteration for c,
    i.e. 1 off-chip access per j iteration
  • For b, 1/L off-chip accesses per i iteration
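A quick worked version of the counts above, assuming a cache line of L = 8 elements and a j-loop trip count of M = 100 (both values are assumptions for illustration):

#include <stdio.h>

int main(void)
{
    const double L = 8;    /* assumed cache line size in array elements */
    const double M = 100;  /* assumed trip count of the j loop          */

    double a_per_i = 3.0 / L * M;  /* three classes of a, spatial reuse  */
    double c_per_i = 1.0 * M;      /* c[j][I]: one off-chip access per j */
    double b_per_i = 1.0 / L;      /* b[I]: hoisted, 1/L per i iteration */

    printf("off-chip accesses per i iteration: a=%.1f c=%.1f b=%.3f\n",
           a_per_i, c_per_i, b_per_i);
    return 0;
}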

27
Presentation Outline
  • Introduction
  • Objectives
  • Progress
  • References

28
References
  • Trimaran mQS functions in md.h (in the
    trimaran/impact directory)
  • SUIF2 documentation
  • MACHSUIF documentation
  • Instruction scheduling library for SUIF, by Gang
    Chen and Cliff Young, Harvard University
  • Efficient instruction scheduling using finite
    state automata, by Vasanth Bala and Norman Rubin
  • Local memory exploration and optimization in
    embedded systems, by P. R. Panda, Nikil D. Dutt
    and Alexandru Nicolau

29
Thank You