Software Estimation for Application Specific Multiprocessor SoCs

Satish Parvataneni (2001MCS017), Department of Computer Science & Engineering

1
Software Estimation for Application Specific
Multiprocessor SoCs
  • Under the supervision of Prof. M. Balakrishnan

2
Presentation Outline
  • Introduction and Motivation
  • Objectives
  • Implementation
  • Results
  • Conclusions and Future Work
  • References

3
Introduction and Motivation
  • Application specific processors
  • Multiprocessor SoCs
  • SRIJAN flow

4
Why Application Specific Multiprocessors
  • Compute-intensive application
  • Control part
  • General purpose multiprocessor: no customization, average performance
  • Application specific multiprocessor: customization, higher performance

5
Role of Processor Customization
  • Allows effective utilization of resources
  • Makes solution cheaper

6
SRIJAN System Level Design Methodology
7
Presentation Outline
  • Introduction and motivation
  • Objectives
  • Implementation
  • Results
  • Conclusions and Future Work
  • References

8
Objectives
  • Defining a multiprocessor architecture description
  • Developing a tool to generate a task graph and
    annotate it with computation estimates and
    communication overheads
  • Input: application IR, profiled data,
    architecture description
  • Output: annotated task graph

9
Presentation Outline
  • Introduction and motivation
  • Objectives
  • Implementation
  • Results
  • Conclusions and Future Work
  • References

10
Implementation
  • Defined sections to describe multiprocessor
    architecture
  • Task graph generation
  • Modified MACHSUIF library for estimating
    execution times

11
Architecture Description
  • The architecture is described using HMDES and
    the information is extracted using mQs
  • There are three sections
  • Memory section
  • Processor section
  • Bus section

12
Architecture Description (contd.)
  • Memory section: memory type, number of ports,
    memory size, bus name
  • Processor section: register file, cache,
    instruction set, and pipeline information

13
Architecture Description (contd.)
  • Bus section: protocol, connectivity, bit width,
    and BCU information
  • Main section: integrates the three sections above
  • Details are extracted with mQs
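
The three sections and the main section that integrates them can be pictured as a small data model. The Python sketch below is only illustrative: the class names, field names, and sample values are hypothetical, and it reproduces neither HMDES syntax nor the mQs query functions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemorySection:
    name: str
    memory_type: str      # e.g. "SRAM" (hypothetical value)
    num_ports: int
    size_bytes: int
    bus_name: str         # bus this memory is attached to


@dataclass
class ProcessorSection:
    name: str
    num_registers: int           # register file information
    cache_line_words: int        # cache information
    instruction_classes: List[str]
    pipeline_stages: int


@dataclass
class BusSection:
    name: str
    protocol: str
    connected_to: List[str]      # connectivity information
    bit_width: int
    has_bcu: bool                # whether a bus control unit is present


@dataclass
class Architecture:
    """Plays the role of the main section: ties the three sections together."""
    memories: List[MemorySection] = field(default_factory=list)
    processors: List[ProcessorSection] = field(default_factory=list)
    buses: List[BusSection] = field(default_factory=list)

    def components_on_bus(self, bus_name: str) -> List[str]:
        for bus in self.buses:
            if bus.name == bus_name:
                return list(bus.connected_to)
        raise KeyError(bus_name)

    def check(self) -> None:
        """Reject memories that name a bus the description does not define."""
        known = {bus.name for bus in self.buses}
        for mem in self.memories:
            if mem.bus_name not in known:
                raise ValueError(f"{mem.name}: unknown bus {mem.bus_name!r}")


# Made-up two-processor system used only to exercise the model.
arch = Architecture(
    memories=[MemorySection("sram0", "SRAM", 1, 64 * 1024, "ahb")],
    processors=[
        ProcessorSection("cpu0", 32, 8, ["alu", "load", "store", "branch"], 5),
        ProcessorSection("cpu1", 32, 8, ["alu", "load", "store", "branch"], 5),
    ],
    buses=[BusSection("ahb", "AMBA AHB", ["cpu0", "cpu1", "sram0"], 32, True)],
)
arch.check()
```

A query such as `arch.components_on_bus("ahb")` stands in for the kind of lookup an estimation tool would need when computing communication overheads.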

14
Task Graph
  • Application model is pthreads
  • Task is defined as a piece of sequential code

15
Task Graph (contd.)
  • Problems encountered: thread creation in loops
    and in if-else statements
  • Solutions: unrolling loops, and pruning the less
    frequently executed part with the help of
    profiling information
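
The two workarounds can be sketched on a toy task graph. This is a minimal Python illustration under stated assumptions: `TaskGraph`, `unroll_thread_creation`, and `prune_branch` are hypothetical names, the loop trip count is assumed to be known, and the presentation's actual tool works on the application IR rather than on a structure like this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskGraph:
    # task name -> estimated computation cycles (the node annotation)
    computation: Dict[str, int] = field(default_factory=dict)
    # (source, destination) -> communication overhead (the edge annotation)
    communication: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def add_task(self, name: str, cycles: int = 0) -> None:
        self.computation[name] = cycles

    def add_edge(self, src: str, dst: str, overhead: int = 0) -> None:
        self.communication[(src, dst)] = overhead


def unroll_thread_creation(graph: TaskGraph, parent: str,
                           body: str, trip_count: int) -> List[str]:
    """Thread creation inside a loop: create one task per iteration."""
    created = []
    for k in range(trip_count):
        name = f"{body}_{k}"
        graph.add_task(name)
        graph.add_edge(parent, name)
        created.append(name)
    return created


def prune_branch(then_count: int, else_count: int) -> str:
    """Thread creation inside if-else: keep the arm that profiling
    shows to be executed more often and drop the other one."""
    return "then" if then_count >= else_count else "else"


graph = TaskGraph()
graph.add_task("main")
workers = unroll_thread_creation(graph, "main", "worker", trip_count=3)
kept_arm = prune_branch(then_count=90, else_count=10)
```

After these calls the graph holds `main` plus three worker tasks, each connected to `main` by an edge whose overhead is still to be annotated.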

16
Execution time Estimation
  • Machine SUIF library
  • Extract the data dependence graph (DDG) at basic block level
  • Supply the resource model to the scheduler
  • Generate the estimates using the scheduler
17
MACHSUIF Flow
  • Application in C
  • Lower level SUIF
  • SUIF virtual machine
  • Target instructions
  • Target machine description and resource model
  • Target dependent
  • Control flow graph
  • Profiling
  • Register allocation
  • Scheduler
  • Estimates

18
Resource Model
Resources: a, b, c
[Figure: resource vectors for instructions i1, i2, i3]

19
Collision matrices for instruction classes
20
Generated Automata
[Figure: automaton with states F0 to F5 and transitions labelled a, b, c]
F0 and F4 are cycle-advancing states

21
Modified Flow
22
Branch Delays
  • Unconditional branches
  • delay = uncond_delay * cur_block_profile_info
  • Conditional branches
  • taken_delay = frequency of branch taken *
    taken_delay
  • not_taken_delay = frequency of branch not taken *
    not_taken_delay
  • delay = taken_delay + not_taken_delay
  • Delay information is extracted from the processor
    pipeline
  • Branch frequency information is obtained from
    the gcov profiler
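
The branch-delay bookkeeping can be written out as a short Python sketch. It assumes that each delay is a per-execution penalty in cycles multiplied by a profiled execution count, and that the taken and not-taken contributions are summed; the function names and the sample numbers are hypothetical.

```python
def unconditional_branch_delay(uncond_delay: int, block_count: int) -> int:
    """Penalty per execution times the profiled count of the enclosing block."""
    return uncond_delay * block_count


def conditional_branch_delay(taken_delay: int, not_taken_delay: int,
                             taken_count: int, not_taken_count: int) -> int:
    """Weight each outcome's penalty by how often profiling saw that outcome."""
    return taken_delay * taken_count + not_taken_delay * not_taken_count


# Made-up numbers: a block executed 100 times ends in a conditional branch
# that is taken 70 times; the pipeline loses 3 cycles when the branch is
# taken and 1 cycle when it is not.
total_delay = conditional_branch_delay(taken_delay=3, not_taken_delay=1,
                                       taken_count=70, not_taken_count=30)
```

In this sketch the per-execution penalties would come from the pipeline description and the counts from profiling output.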

23
Memory References
  • Classifying loads and stores
  • Loads and stores involving scalar variables
  • Loads and stores involving array references
  • Scalar References
  • All the scalar variables are stored in
    consecutive memory locations
  • There is only one cache miss for every cache
    line containing scalars

24
Scalar References
  • N: number of scalar variables involved in the
    memory accesses
  • M: number of memory accesses to the N scalar
    variables
  • K: number of processor cycles to fetch one line
    into the cache
  • L: cache line size
  • Processor cycles = (K + L) * ceil(N/L) + M -
    ceil(N/L)
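
A small Python sketch of the scalar estimate follows. The transcript does not preserve the operators of the formula, so the code assumes the reading (K + L) * ceil(N/L) + M - ceil(N/L): each of the ceil(N/L) lines holding scalars misses once at a cost of K + L cycles, and each of the remaining M - ceil(N/L) accesses costs one cycle. The function name and sample values are hypothetical.

```python
def scalar_reference_cycles(n_scalars: int, n_accesses: int,
                            fetch_cycles: int, line_size: int) -> int:
    """Cycles spent on scalar loads and stores under the assumed formula.

    n_scalars    : N, scalar variables involved in the memory accesses
    n_accesses   : M, memory accesses to those N scalars
    fetch_cycles : K, cycles to fetch one line into the cache
    line_size    : L, cache line size (in scalars per line)
    """
    # Integer ceiling of N / L: number of cache lines holding the scalars.
    lines = (n_scalars + line_size - 1) // line_size
    misses = lines                    # one miss per line containing scalars
    hits = n_accesses - misses        # every other access hits in the cache
    return (fetch_cycles + line_size) * misses + hits


# Made-up numbers: 10 scalars, 50 accesses, K = 20 cycles, L = 4.
cycles = scalar_reference_cycles(n_scalars=10, n_accesses=50,
                                 fetch_cycles=20, line_size=4)
```

With these numbers the scalars occupy 3 lines, so 3 accesses pay the miss cost and the other 47 are counted as single-cycle hits.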

25
Array References
  • Self-spatial reuse
  • A reference accesses the same cache line in
    different iterations
  • Self-temporal reuse
  • A reference accesses the same data location in
    different iterations
  • Group-spatial reuse
  • Different references access the same cache line
    in different iterations
  • Group-temporal reuse
  • Different references access the same data
    location in different iterations

26
Array References (contd.)
  • Self-temporal reuse references are moved outside
    the loop
  • Group the remaining references into equivalence
    classes
  • Each class exhibits self-spatial and
    group-spatial reuse
  • Calculate effective accesses per iteration

27
Example
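for (i = 0; i < M; i++)
  for (j = 0; j < M; j++)
    a[i][j] = a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1]
              + a[i][j+1] + b[i] + c[j][i];
  • b[i] has self-temporal reuse
  • a[i][j+1] has self-spatial reuse
  • a[i][j] and a[i][j+1] have group-temporal reuse
  • a[i][j] and a[i][j-1] have group-spatial reuse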

28
Example (contd.)
  • Class 1: a[i][j], a[i][j-1], a[i][j+1]
  • Class 2: a[i-1][j]
  • Class 3: a[i+1][j]
  • For a: 3 equivalence classes, i.e. 3/L memory
    accesses per iteration of j
  • For c: 1 memory access per iteration of j
  • For b: 1/L off-chip accesses per iteration of i
29
Presentation Outline
  • Introduction and motivation
  • Objectives
  • Implementation
  • Results
  • Conclusions and Future Work
  • References

30
Tsim vs Our Estimates
Percentage of error: 16, 12, 18

31
Presentation Outline
  • Introduction and motivation
  • Objectives
  • Implementation
  • Results
  • Conclusions and Future Work
  • References

32
Conclusions and Contributions
  • Facilitated system level architecture description
    for SRIJAN
  • Task graph formulation
  • Execution time estimates: list scheduler, branch
    delays, memory
  • Leon target library
  • Architectural exploration: instruction latencies,
    number of FUs, memory organizations, register
    file organizations

33
Future Work
  • Task Graph Formulation
  • Synchronization overheads
  • Improving Leon Library
  • Extracting latency information from HMDES

34
Presentation Outline
  • Introduction and motivation
  • Objectives
  • Implementation
  • Results
  • Conclusions and Future Work
  • References

35
References
  • SRIJAN
  • Trimaran mQs functions in md.h (in
    trimaran/impact dir)
  • SUIF2 documentation
  • MACHSUIF documentation
  • Gang Chen and Cliff Young, "Instruction
    scheduling library for SUIF", Harvard University
  • Vasanth Bala and Norman Rubin, "Efficient
    instruction scheduling using finite state
    automata"
  • P. R. Panda, Nikil D. Dutt, and Alexandru
    Nicolau, "Local memory exploration and
    optimization in embedded systems"
  • M. J. Flynn, "Computer Architecture: Pipelined
    and Parallel Processor Design", Narosa Publishing
    House, 1996.

36
Acknowledgements
  • Main ideas, motivation and support
  • Prof. M. Balakrishnan and Prof. Anshul Kumar
  • Helpful discussions, Debugging
  • Basant Kr Diwedi
  • Manoj Kr Jain, Anup Gangwar
  • Other members of ESG
  • MACHSUIF Libraries
  • Glenn Holloway

37
Thank You