Synthesis of Heterogeneous Pipelined Multiprocessor Systems Using ILP: JPEG Case Study



1
Synthesis of Heterogeneous Pipelined
Multiprocessor Systems Using ILP: JPEG Case Study

Haris Javaid UNSW, Australia
Sri Parameswaran UNSW, Australia

2
Overview

Introduction
Motivation
Aims
Case Study
Experimental Setup
Results
Conclusion and Future Work

3
Introduction

Transition in embedded devices: from single-processor
systems to heterogeneous multiprocessor systems
Achieve high performance gains
Minimize area and power consumption

[Figure: single processor, homogeneous multiprocessor, and heterogeneous multiprocessor systems]
4
Introduction - ASIPs

Application Specific Instruction Set Processors
Instruction set and underlying architecture
configured for a specific application
An extensible processor consists of a base processor
and optional instructions

[Figure: Heterogeneous Multiprocessor System using ASIPs]
Example processor configurations shown in the figure:
Base instructions, 1KB instruction cache, 1KB data cache, no additional instructions
Base instructions, 2KB instruction cache, 4KB data cache, 10 additional instructions
Base instructions, 16KB instruction cache, 32KB data cache, 25 additional instructions
Memory size labels in the figure: 32 KB, 8 KB, 4 KB, 8 KB
5
Motivation

Single processors cannot attain high performance by relying on
Higher clock speeds
Instruction level parallelism
Billions of transistors available
Task level parallelism can be exploited
Multiprocessor System on Chip (MPSoC) with ASIPs
as building blocks
Task level parallelism (coarse-grained)
Instruction level parallelism (fine-grained)

6
Related Work

S. L. Shee et al., "Design Methodology for
Pipelined Heterogeneous Multiprocessor System," in
DAC 2007
Pipelined Multiprocessor System using ASIPs
Heuristic to rapidly search the design space
Minimized the runtime × area product of the system
F. Sun et al., "Synthesis of application-specific
heterogeneous multiprocessor architectures using
extensible processors," in VLSID 2005
Heuristic to simultaneously explore application
partitioning and custom instructions for ASIPs
Minimize runtime within an area budget

7
Why Pipelined Multiprocessor Systems?

S. L. Shee et al., "Heterogeneous Multiprocessor
Implementations for JPEG: A Case Study," in
CODES+ISSS 2006
Pipelined Configuration is better for streaming
applications

8
Aims

To implement an application as a heterogeneous
pipelined multiprocessor system
Application: JPEG compression algorithm
Six-processor pipelined system
Minimize system area while runtime constraint is
satisfied
Explore design space consisting of different
configurations for each processor in the system
Additional instructions
Differing Instruction and Data cache sizes
Speed up the exploration process to target large
design spaces

9
Case Study: JPEG Encoder

Tasks 1-8
JPEG encoder kernel
Processes macro blocks one by one
Tasks 9-11
Initialise Quantization Tables
Finalization functions

10
Case Study: JPEG Encoder
Six-processor pipeline implementation of the JPEG
encoder
11
Runtime Calculation
[Figure: pipeline timing diagram showing macro blocks 1, 2, 3, ... advancing through the processor stages over successive time steps]
Example stage latencies (cycles): 1000, 2000, 1300, 900, 1000
Raw image of 256x128 pixels is divided into 512 macro blocks
12
Runtime Calculation
First Macro block processing time
Average Latency of critical processor
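
The two terms named on this slide match the usual timing model for a linear pipeline. The sketch below is only an assumption about how they combine, since the slide's formula did not survive in the transcript: total runtime is taken as the time for the first macro block to pass through every stage, plus one critical-stage latency for each remaining macro block. The function name `pipeline_runtime` is hypothetical, and a single latency per stage stands in for the average latency. The inputs are the example latencies and the 512 macro blocks from the preceding slide.

```python
def pipeline_runtime(stage_latencies, num_blocks):
    """Estimate total cycles for a linear pipeline.

    Assumed model (not confirmed by the slides):
    runtime = first block processing time
              + (num_blocks - 1) * latency of the critical (slowest) stage
    """
    if not stage_latencies or num_blocks < 1:
        raise ValueError("need at least one stage and one block")
    first_block_time = sum(stage_latencies)   # first block visits every stage
    critical_latency = max(stage_latencies)   # slowest stage sets the throughput
    return first_block_time + (num_blocks - 1) * critical_latency


# Example stage latencies and macro block count from the timing diagram slide.
example_latencies = [1000, 2000, 1300, 900, 1000]
print(pipeline_runtime(example_latencies, 512))  # 6200 + 511 * 2000 = 1028200
```

Under this model the 2000-cycle stage dominates: reducing any other stage's latency changes only the first-block term.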
13
Processor Configurations
[Figure: processor configurations 1 to 5 (base processor plus extended instructions) derived from the program executable using Tensilica's XPRES technology; other figure labels: Overhead, Granularity]
14
Pipelined Multiprocessor System
Runtime Constraint Satisfied
Minimum Area
15
Design Space Exploration

Formulated the problem of mapping processes of an
application onto processor configurations as a
0-1 ILP problem
Objective
Minimize area of overall system
Constraints
Only one configuration for each processor can be
selected
Amongst the selected configurations, one
processor configuration is considered critical in
the runtime calculation
System runtime < runtime constraint set by the designer
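
The selection problem above can be illustrated on a toy instance. The sketch below is not the authors' ILP model and does not use an ILP solver: it swaps in plain exhaustive enumeration, which is only practical for tiny design spaces. Choosing one index per processor plays the role of the 0-1 variables with the "only one configuration per processor" constraint. The runtime model (first block time plus the critical latency for each remaining block), the function names, and all area and latency numbers are assumptions made for illustration.

```python
from itertools import product


def runtime(latencies, num_blocks):
    # Assumed pipeline model: the slowest stage is the critical processor.
    return sum(latencies) + (num_blocks - 1) * max(latencies)


def min_area_design(configs, num_blocks, runtime_limit):
    """Pick one (area, latency) option per processor with minimum total area.

    configs[p] lists the candidate configurations of processor p.
    Returns (total_area, chosen_indices), or None if nothing meets the limit.
    """
    best = None
    # Each tuple in the product selects exactly one configuration per processor.
    for choice in product(*(range(len(options)) for options in configs)):
        latencies = [configs[p][i][1] for p, i in enumerate(choice)]
        if runtime(latencies, num_blocks) >= runtime_limit:
            continue  # violates: system runtime < runtime constraint
        area = sum(configs[p][i][0] for p, i in enumerate(choice))
        if best is None or area < best[0]:
            best = (area, choice)
    return best


# Hypothetical three-processor instance: (area, latency in cycles).
configs = [
    [(10, 2000), (18, 1200), (30, 900)],
    [(12, 1500), (25, 1000)],
    [(8, 1300), (15, 1100)],
]
print(min_area_design(configs, num_blocks=512, runtime_limit=700_000))
```

Enumeration visits every combination, so its cost grows as the product of the per-processor option counts; that growth is what motivates an ILP solver and pruning for large design spaces.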

16
Design Space Pruning

Runtime constraint imposed by the designer
Some configurations of a processor cannot be part
of the optimal design
Only removes the inferior processor
configurations
Three different times are defined

17
Design Space Pruning
[Figure: design space pruning example. Labels: Processor 0 selected, Critical Processor, Max(min latencies), Pruned Design Space]
Times shown: Min. Latency, Min. Processing Time, Min. Critical Processing Time
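
The slides name the quantities used for pruning but not the exact rule, so the following is one plausible reading rather than the authors' algorithm. For each candidate configuration, it computes a lower bound on system runtime by giving every other processor its minimum latency; a configuration is dropped only if even this best case misses the constraint, so no feasible design is lost. The runtime model (first block time plus the critical latency for each remaining block), the function name `prune`, and the area and latency numbers are assumptions.

```python
def prune(configs, num_blocks, runtime_limit):
    """Remove configurations that cannot appear in any feasible design.

    configs[p] is a list of (area, latency) options for processor p.
    """
    # Minimum latency achievable by each processor.
    min_lat = [min(lat for _, lat in options) for options in configs]
    pruned = []
    for p, options in enumerate(configs):
        others = min_lat[:p] + min_lat[p + 1:]
        kept = []
        for area, lat in options:
            first_block = lat + sum(others)   # minimum processing time
            critical = max([lat] + others)    # max over the minimum latencies
            lower_bound = first_block + (num_blocks - 1) * critical
            if lower_bound < runtime_limit:
                kept.append((area, lat))
        pruned.append(kept)
    return pruned


# Hypothetical three-processor instance: (area, latency in cycles).
configs = [
    [(10, 2000), (18, 1200), (30, 900)],
    [(12, 1500), (25, 1000)],
    [(8, 1300), (15, 1100)],
]
for p, kept in enumerate(prune(configs, num_blocks=512, runtime_limit=700_000)):
    print(p, kept)
```

In this toy instance the 2000-cycle and 1500-cycle options are removed, shrinking the space from 12 to 4 combinations before any solver runs.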
18
Experimental Setup

Tensilica C/C++ compilation tools
ISS and XTMP
Instruction Set Simulator
Multiprocessor environment
Queues are used to connect processors
XPRES and TIE Compiler used to create tailored
processors
lp_solve is used as the 0-1 ILP solver

19
Results: JPEG Encoder

Configurations include additional instructions
and differing instruction and data cache sizes
Design space: 4.2 x 10^13 design points

20
Results: JPEG Encoder

Time Comparison of ILP Solver

21
Results: JPEG Encoder

Pseudo Pareto optimal points of the design space

22
Conclusion

Formulated mapping of an application onto ASIP
configurations in a pipelined multiprocessor
system as a 0-1 ILP problem
Presented a novel design space pruning algorithm
to reduce the complexity of the ILP problem
Targeted a design space of 4.2 x 10^13 points,
obtaining each of the pseudo Pareto optimal
designs in less than 100 seconds
Future Work: design heuristics to search the design
space faster, and compare them with the ILP solutions

23
THANK YOU