Synthesis of Heterogeneous Pipelined Multiprocessor Systems Using ILP: JPEG Case Study

1
Synthesis of Heterogeneous Pipelined
Multiprocessor Systems Using ILP: JPEG Case Study
  • Haris Javaid, UNSW, Australia
  • Sri Parameswaran, UNSW, Australia

2
Overview
  • Introduction
  • Motivation
  • Aims
  • Case Study
  • Experimental Setup
  • Results
  • Conclusion and Future Work

3
Introduction
  • Transition in embedded devices: from single-processor
    systems to heterogeneous multiprocessor systems
  • Achieve high performance gains
  • Minimize area and power consumption

[Figure: single processor, homogeneous multiprocessor, heterogeneous multiprocessor]
4
Introduction - ASIPs
  • Application Specific Instruction Set Processors
  • Instruction set and underlying architecture
    configured for a specific application
  • An extensible processor consists of a base processor
    and optional instructions

[Figure: Heterogeneous multiprocessor system using ASIPs. Three example processors:]
  • Base instructions, 1 KB instruction cache, 1 KB data cache,
    no additional instructions
  • Base instructions, 2 KB instruction cache, 4 KB data cache,
    10 additional instructions
  • Base instructions, 16 KB instruction cache, 32 KB data cache,
    25 additional instructions
[Other size labels in the figure: 32 KB, 8 KB, 4 KB, 8 KB]
5
Motivation
  • Single processors cannot attain high performance by relying on
    • Higher clock speeds
    • Instruction-level parallelism
  • Billions of transistors are available
    • Task-level parallelism can be exploited
  • Multiprocessor System on Chip (MPSoC) with ASIPs
    as building blocks
    • Task-level parallelism (coarse-grained)
    • Instruction-level parallelism (fine-grained)

6
Related Work
  • S. L. Shee et al., "Design Methodology for
    Pipelined Heterogeneous Multiprocessor System",
    DAC 2007
    • Pipelined multiprocessor system using ASIPs
    • Heuristic to rapidly search the design space
    • Minimized runtime x area of the system
  • F. Sun et al., "Synthesis of application-specific
    heterogeneous multiprocessor architectures using
    extensible processors", VLSID 2005
    • Heuristic to simultaneously explore application
      partitioning and custom instructions for ASIPs
    • Minimized runtime within an area budget

7
Why Pipelined Multiprocessor Systems?
  • S. L. Shee et al., "Heterogeneous Multiprocessor
    Implementations for JPEG: A Case Study",
    CODES+ISSS 2006
    • A pipelined configuration is better for streaming
      applications

8
Aims
  • To implement an application as a heterogeneous
    pipelined multiprocessor system
  • Application: JPEG compression algorithm
  • Six-processor pipelined system
  • Minimize system area while the runtime constraint is
    satisfied
  • Explore a design space consisting of different
    configurations for each processor in the system
    • Additional instructions
    • Differing instruction and data cache sizes
  • Speed up the exploration process to target large
    design spaces

9
Case Study: JPEG Encoder
  • Tasks 1-8
    • JPEG encoder kernel
    • Processes macro blocks one by one
  • Tasks 9-11
    • Initialise quantization tables
    • Finalization functions

10
Case Study: JPEG Encoder
[Figure: six-processor pipeline implementation of the JPEG encoder]
11
Runtime Calculation
[Figure: pipeline timing diagram. A 256x128 raw image is split into 512 macro blocks (Macro Block 1, Macro Block 2, Macro Block 3, ...) that move through the pipeline one after another. Latency (cycles) values shown in the figure: 1000, 2000, 1300, 900, 1000.]
12
Runtime Calculation
[Equation: system runtime, annotated with two terms: the first macro block processing time and the average latency of the critical processor]
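
The equation on this slide did not survive in the transcript; only its two labelled terms did. A minimal sketch of how those two terms could combine is given below, assuming the usual estimate for a linear pipeline: the first macro block has to traverse every stage, and each later macro block completes one critical-stage latency after the previous one. The function name pipeline_runtime and this exact formula are assumptions for illustration, not the authors' stated equation. The latency values are the ones visible on the preceding slide.

```python
def pipeline_runtime(stage_latencies, num_blocks):
    """Estimate total cycles for a linear pipeline (assumed model).

    stage_latencies: average latency in cycles of each processor (stage).
    num_blocks: number of macro blocks streamed through the pipeline.
    """
    if not stage_latencies or num_blocks < 1:
        raise ValueError("need at least one stage and one block")
    # Time for the first macro block to pass through every stage.
    first_block_time = sum(stage_latencies)
    # The slowest stage (the critical processor) sets the steady-state rate.
    critical_latency = max(stage_latencies)
    return first_block_time + (num_blocks - 1) * critical_latency


# Latencies shown on the timing-diagram slide, 512 macro blocks.
example_latencies = [1000, 2000, 1300, 900, 1000]
example_runtime = pipeline_runtime(example_latencies, 512)
print(example_runtime)  # 6200 + 511 * 2000 = 1028200
```

Under this model the critical processor dominates: shaving cycles off any non-critical stage only changes the small first-block term.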
13
Processor Configurations
[Figure: generation of processor configurations. Labels: Program Executable; Tensilica's XPRES technology; Overhead; Granularity; Base Processor; Configuration 1 to Configuration 5, each with its own Extended Instructions]
14
Pipelined Multiprocessor System
[Figure: pipelined multiprocessor system. Labels: Runtime Constraint Satisfied; Minimum Area]
15
Design Space Exploration
  • Formulated the problem of mapping processes of an
    application onto processor configurations as a
    0-1 ILP problem
  • Objective
    • Minimize area of the overall system
  • Constraints
    • Only one configuration can be selected for each
      processor
    • Amongst the selected configurations, one
      processor configuration is considered critical in
      the runtime calculation
    • System runtime < runtime constraint set by the designer
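
The objective and constraints listed on this slide can be illustrated on a toy instance. The sketch below is not an ILP solver and is not the authors' formulation: it enumerates every combination of one configuration per processor, which is only practical for tiny design spaces, but it checks exactly the conditions named above. The (area, latency) numbers, the function name select_min_area, and the runtime model (sum of latencies for the first block plus critical latency for each later block) are assumptions made for illustration.

```python
from itertools import product


def select_min_area(configs, num_blocks, runtime_limit):
    """Pick one configuration per processor, minimizing total area.

    configs[p] is a list of (area, latency) options for processor p.
    Returns (area, choice, runtime), or None if no design is feasible.
    """
    best = None
    # Each tuple in the product selects exactly one option per processor,
    # mirroring the "only one configuration per processor" constraint.
    for choice in product(*(range(len(options)) for options in configs)):
        picked = [configs[p][c] for p, c in enumerate(choice)]
        area = sum(a for a, _ in picked)
        latencies = [lat for _, lat in picked]
        # Assumed runtime model: the slowest selected configuration is
        # the critical one in the runtime calculation.
        runtime = sum(latencies) + (num_blocks - 1) * max(latencies)
        if runtime < runtime_limit and (best is None or area < best[0]):
            best = (area, choice, runtime)
    return best


# Hypothetical three-processor system, two options per processor.
toy_configs = [
    [(10, 2000), (25, 1200)],
    [(8, 1500), (20, 900)],
    [(12, 1000), (30, 600)],
]
print(select_min_area(toy_configs, 512, 800_000))
# (45, (1, 0, 0), 770200)
```

Exhaustive enumeration grows as the product of the option counts, which is why a dedicated 0-1 ILP solver is used for the real design space.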

16
Design Space Pruning
  • Runtime constraint imposed by the designer
  • Some configurations of a processor cannot be part
    of the optimal design
  • Only removes the inferior processor
    configurations
  • Three different times are defined

17
Design Space Pruning
[Figure: design space pruning example with Processor 0 selected, showing the pruned design space. Labels: Min. Latency; Max(min latencies); Critical Processor; Min. Processing Time; Min. Critical Processing Time]
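
The slides do not spell out the pruning algorithm, so the following is only one plausible reading of the labels above, not the authors' method. For each candidate configuration of a processor, it computes the most optimistic runtime any complete design containing it could reach, by giving every other processor its minimum latency. If even that lower bound misses the runtime constraint, the configuration cannot appear in any feasible design and is dropped. The function name prune_configs, the runtime model, and the sample numbers are assumptions.

```python
def prune_configs(configs, num_blocks, runtime_limit):
    """Drop configurations that cannot meet the runtime constraint.

    configs[p] is a list of (area, latency) options for processor p.
    Returns the same structure with infeasible options removed.
    """
    # Minimum latency achievable by each processor on its own.
    min_lat = [min(lat for _, lat in options) for options in configs]
    pruned = []
    for p, options in enumerate(configs):
        others = [m for q, m in enumerate(min_lat) if q != p]
        kept = []
        for area, lat in options:
            # Best-case first-block time with this option selected.
            best_first = lat + sum(others)
            # Best-case critical latency: max of this option and the
            # other processors' minimum latencies.
            best_critical = max([lat] + others)
            lower_bound = best_first + (num_blocks - 1) * best_critical
            if lower_bound < runtime_limit:
                kept.append((area, lat))
        pruned.append(kept)
    return pruned


# Hypothetical three-processor system, two options per processor.
sample_configs = [
    [(10, 2000), (25, 1200)],
    [(8, 1500), (20, 900)],
    [(12, 1000), (30, 600)],
]
pruned_configs = prune_configs(sample_configs, 512, 800_000)
print(pruned_configs)
# [[(25, 1200)], [(8, 1500), (20, 900)], [(12, 1000), (30, 600)]]
```

In this toy case the design space shrinks from 8 points to 4 before any solver runs, and no feasible design is lost because only provably infeasible options are removed.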
18
Experimental Setup
  • Tensilica C/C++ compilation tools
  • ISS and XTMP
    • Instruction Set Simulator
    • Multiprocessor environment
  • Queues are used to connect processors
  • XPRES and the TIE Compiler are used to create tailored
    processors
  • lp_solve is used as the 0-1 ILP solver

19
Results: JPEG Encoder
  • Configurations include additional instructions
    and differing instruction and data cache sizes
  • Design space: 4.2 x 10^13 design points

20
Results: JPEG Encoder
  • Time Comparison of ILP Solver

21
Results: JPEG Encoder
  • Pseudo Pareto optimal points of the design space
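
The plot for this slide is not in the transcript. As a reminder of what Pareto optimality means for (area, runtime) pairs, here is a minimal sketch of a Pareto filter: a design is kept only if no other design is at least as good in both area and runtime. How the authors obtained their "pseudo" Pareto points is not stated here, so the function name pareto_front and the sample points are purely illustrative assumptions.

```python
def pareto_front(points):
    """Return the (area, runtime) points not dominated by any other point."""
    front = []
    # Sorting by area (then runtime) means a point is non-dominated
    # exactly when its runtime beats every smaller-area point kept so far.
    for area, runtime in sorted(set(points)):
        if not front or runtime < front[-1][1]:
            front.append((area, runtime))
    return front


# Hypothetical (area, runtime) design points.
sample_points = [
    (45, 770200),
    (57, 616300),
    (63, 769800),
    (75, 615900),
    (30, 1026500),
]
print(pareto_front(sample_points))
# [(30, 1026500), (45, 770200), (57, 616300), (75, 615900)]
```

The point (63, 769800) is dropped because (57, 616300) is both smaller and faster.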

22
Conclusion
  • Formulated the mapping of an application onto ASIP
    configurations in a pipelined multiprocessor
    system as a 0-1 ILP problem
  • Presented a novel design space pruning algorithm
    to reduce the complexity of the ILP problem
  • Targeted a design space of 4.2 x 10^13 points,
    obtaining each of the pseudo Pareto optimal
    designs in less than 100 seconds
  • Future work: design heuristics that search the design
    space faster, and compare them with the ILP solutions

23
THANK YOU
  • QUESTIONS ??