1
Skeletons and Asynchronous RPC for Embedded Data- and Task Parallel Image Processing
IAPR Conference on Machine Vision Applications
Wouter Caarls, Pieter Jonker, Henk Corporaal
Quantitative Imaging Group, Department of Imaging Science & Technology
2
Overview
  • Introduction and motivation
  • Approach
      • Algorithmic skeletons
      • Asynchronous RPC
  • Implementation
      • Run-time system
      • Prototype architecture
  • Results
  • Conclusions and future work

3
Introduction
  • SmartCam: integrating efficient, user-programmable
    image processing hardware within the camera or
    sensor itself
  • Efficient hardware implies parallelism and
    heterogeneity
  • Efficient programmability implies custom,
    application-dependent hardware
  • Finding the best hardware configuration for an
    application requires hardware-independent
    software
  • For wide acceptance, programming should be easy

[Figure: Philips CFT IV Inca, 320 x 10-bit SIMD and 5-issue VLIW]
4
Our approach
5
Algorithmic Skeletons
  • Separating structure from computation

6
Algorithmic Skeletons
  • Implicit parallel programming
  • The choice of skeleton implies a set of constraints
    (dependencies)
  • The system is free in its choices as long as the
    constraints are not violated
      • Distribution
      • Scanning order
  • A consistent library interface facilitates
    between-skeleton dependency analysis
      • No side effects
      • Well-defined inputs and outputs
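A minimal Python sketch of separating structure from computation. The names `pixel_op`, `pixel_op_reversed` and `binarize` are hypothetical and are not the library's actual interface. The skeleton owns the iteration; the programmer supplies only a side-effect-free per-pixel kernel, so the system may pick any scanning order.

```python
from typing import Callable, List

Image = List[List[int]]

def pixel_op(kernel: Callable[[int], int], src: Image) -> Image:
    """Hypothetical pixel skeleton: applies `kernel` to every pixel independently."""
    return [[kernel(p) for p in row] for row in src]

def pixel_op_reversed(kernel: Callable[[int], int], src: Image) -> Image:
    """Same skeleton, different scanning order (bottom-up, right-to-left)."""
    out = [[0] * len(row) for row in src]
    for r in reversed(range(len(src))):
        for c in reversed(range(len(src[r]))):
            out[r][c] = kernel(src[r][c])
    return out

def binarize(p: int, threshold: int = 50) -> int:
    """User-supplied computation: no side effects, one input, one output."""
    return 1 if p > threshold else 0

image = [[10, 60, 200], [49, 50, 51]]
result = pixel_op(binarize, image)
```

Because the kernel touches only its own pixel, both scanning orders give the same image, which is the freedom the constraints leave to the system.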


7
Algorithmic Skeletons
Disadvantages
  • Inability to parallelize algorithms that cannot
    be expressed using one of the skeletons in the
    library
  • Inability to specify certain algorithmic
    optimizations
  • Inability to specify architecture-dependent
    optimizations
  • Solution: allow the programmer to add their own
    (application-specific or architecture-specific)
    skeletons to the library
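One way such an extensible library could look, sketched in Python. The registry `SKELETONS`, the `skeleton` decorator and both skeleton names are assumptions made for illustration; the slides do not show how skeletons are actually defined.

```python
from typing import Callable, Dict, List

Image = List[List[int]]

# Hypothetical skeleton library: maps a skeleton name to its implementation.
SKELETONS: Dict[str, Callable] = {}

def skeleton(name: str):
    """Decorator that registers a function as a skeleton under `name`."""
    def register(fn):
        SKELETONS[name] = fn
        return fn
    return register

@skeleton("PixelOp")
def pixel_op(kernel, src: Image) -> Image:
    return [[kernel(p) for p in row] for row in src]

# Application-specific skeleton added by the programmer: a per-column
# reduction, which the stock "PixelOp" skeleton cannot express.
@skeleton("ColumnReductionOp")
def column_reduction_op(combine, src: Image, init: int) -> List[int]:
    acc = [init] * len(src[0])
    for row in src:
        acc = [combine(a, p) for a, p in zip(acc, row)]
    return acc

column_sums = SKELETONS["ColumnReductionOp"](lambda a, p: a + p, [[1, 2], [3, 4]], 0)
```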

8
Remote procedure call
  • Just like a function call
  • Computes the function on a different processor
  • All data goes through the calling processor
  • Synchronous: the stub returns when the remote
    function is done; data is available immediately
  • Asynchronous: the stub returns immediately; data
    is available later
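The difference between the two kinds of stub can be sketched with Python's standard `concurrent.futures` module. A single worker thread stands in for the remote processor, and `remote_square`, `square_sync` and `square_async` are hypothetical names used only for this illustration.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-in for the remote processor: a single worker thread.
coprocessor = ThreadPoolExecutor(max_workers=1)

def remote_square(x: int) -> int:
    time.sleep(0.05)  # pretend the remote computation takes a while
    return x * x

def square_sync(x: int) -> int:
    """Synchronous stub: returns only when the remote function is done."""
    return coprocessor.submit(remote_square, x).result()

def square_async(x: int) -> Future:
    """Asynchronous stub: returns immediately with a handle to the result."""
    return coprocessor.submit(remote_square, x)

value = square_sync(3)       # data is available right after the call
pending = square_async(4)    # call returns at once; data arrives later
later = pending.result()     # blocks until the remote function finishes
coprocessor.shutdown()
```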

9
Futures
  • Function returns reference to future result
  • Reference can be used in other RPC calls
  • Using the reference outside an RPC call requires
    an (implicit) block.

[Figure: timeline with a "Control processor" lane issuing the calls Function1(a) and Function2(a), and a "Coprocessor" lane executing Function1 and Function2]
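A small Python sketch of futures used as references. The helpers `rpc`, `block` and `_resolve` are assumptions for illustration; in particular, the slide describes an implicit block, whereas this sketch makes the block explicit to keep the mechanism visible.

```python
from concurrent.futures import Future, ThreadPoolExecutor

coprocessor = ThreadPoolExecutor(max_workers=1)  # stand-in for the coprocessor

def _resolve(arg):
    # On the coprocessor side, a future argument is replaced by its value.
    return arg.result() if isinstance(arg, Future) else arg

def rpc(fn, *args) -> Future:
    """Asynchronous stub: returns a reference to the future result."""
    return coprocessor.submit(lambda: fn(*[_resolve(a) for a in args]))

def block(ref: Future):
    """Wait on the control processor until the referenced value exists."""
    return ref.result()

def function1(a):
    return a + 1

def function2(a):
    return a * 10

a = rpc(function1, 4)   # returns immediately; `a` is only a reference
b = rpc(function2, a)   # the reference is passed on without blocking
answer = block(b)       # using the value outside an RPC call blocks
coprocessor.shutdown()
```

The control processor never waits between the two calls; it waits only when it needs the final value.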

10
Parallelism through RPC
  • RPC is not intrinsically parallel
  • Synchronous RPC calling a parallel function:
    data parallelism
  • Asynchronous RPC calling a (parallel) function:
    task parallelism
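The two cases can be contrasted in a short Python sketch, with a thread pool standing in for the remote processors. The functions `invert_rows` and `count_bright` are hypothetical examples, not operations from the slides.

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)  # stand-in for the remote processors

def invert_rows(image):
    """A parallel remote function: rows are processed concurrently."""
    return list(pool.map(lambda row: [255 - p for p in row], image))

def count_bright(image):
    """A sequential remote function: counts pixels above 128."""
    return sum(p > 128 for row in image for p in row)

image = [[0, 100], [200, 255]]

# Synchronous call to a parallel function: data parallelism only.
inverted = invert_rows(image)

# Asynchronous calls: two independent tasks are in flight at once.
f1 = pool.submit(count_bright, image)
f2 = pool.submit(count_bright, inverted)
bright_before, bright_after = f1.result(), f2.result()
pool.shutdown()
```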

11
Optimizing communications
  • Real-time image processing requires vast amounts
    of bandwidth
  • Scatter-gather creates a bottleneck at the
    control processor.
  • Allow peer-to-peer communications between remote
    functions
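A rough back-of-the-envelope model of the bottleneck, written as Python. It assumes a linear chain of remote functions that each consume and produce one full image, and an 8-bit 320 x 240 frame; both assumptions are illustrative and do not come from the slides.

```python
def control_traffic_scatter_gather(image_bytes: int, stages: int) -> int:
    """Every stage receives its input from, and returns its output to,
    the control processor."""
    return 2 * image_bytes * stages

def control_traffic_peer_to_peer(image_bytes: int, stages: int) -> int:
    """Only the first input and the last output touch the control
    processor; intermediate images move directly between stages."""
    return 2 * image_bytes if stages > 0 else 0

frame = 320 * 240  # assumed 8-bit frame size, for illustration only
sg = control_traffic_scatter_gather(frame, stages=3)
p2p = control_traffic_peer_to_peer(frame, stages=3)
```

Under this model, traffic through the control processor grows with the number of stages for scatter-gather but stays constant for peer-to-peer communication.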

12
Optimizing memory usage
  • In embedded applications, memory is scarce
  • Normal task parallelism requires a frame store
    per concurrent operation
  • Pipelining lets operations pass data along as it is
    produced, so no frame stores are needed
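Pipelining without frame stores can be sketched with Python generators, where each stage handles one row at a time. The stage names and the row-based granularity are assumptions for illustration; the slides do not specify how data is streamed.

```python
from typing import Iterable, Iterator, List

Row = List[int]

def get_image(rows: List[Row]) -> Iterator[Row]:
    """Source: yields the image one row at a time."""
    for row in rows:
        yield row

def binarize(rows: Iterable[Row], threshold: int) -> Iterator[Row]:
    """Stage 1: consumes a row, emits a row; no full frame is buffered."""
    for row in rows:
        yield [1 if p > threshold else 0 for p in row]

def count_set(rows: Iterable[Row]) -> int:
    """Stage 2: reduces rows as they arrive."""
    return sum(sum(row) for row in rows)

sensor = [[10, 60], [70, 20], [90, 95]]
n = count_set(binarize(get_image(sensor), threshold=50))
```

A window operation would need to hold a few rows rather than one, which is still far less than a full frame per operation.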

13
Example
Object following
/* Object following */
while (1) {
  GetImage(in);
  IsoWindowOp(WDW(5), gauss_5x5, in, filtered);
  IsoPixelOp(binarize, filtered, segmented, 50);
  x, y, n = 0, 0, 0;
  AnisoPixelReductionOp(gravity, add, segmented, x, y, n,
                        3*sizeof(int));
  block(x, y, n);
  x = x/n; y = y/n;
  SetMotorSpeed(WIDTH/2-x, HEIGHT/2-y);
}

[Figure: dependency graph of the loop body, with nodes GetImage; x, y, n = 0, 0, 0; gauss_5x5; binarize; gravity; x = x/n, y = y/n; SetMotorSpeed]
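For readers who want to trace the computation, here is a sequential Python sketch of the segmentation and centre-of-gravity steps of the loop body. The Gaussian filter, camera and motor calls are left out, the function names are hypothetical, and the guard against an empty segmentation is an addition that the slide's code does not have.

```python
from typing import List, Tuple

Image = List[List[int]]

def binarize(image: Image, threshold: int) -> Image:
    return [[1 if p > threshold else 0 for p in row] for row in image]

def gravity(segmented: Image) -> Tuple[int, int, int]:
    """Reduction: sums x and y coordinates of set pixels and counts them."""
    x = y = n = 0
    for row_idx, row in enumerate(segmented):
        for col_idx, p in enumerate(row):
            if p:
                x, y, n = x + col_idx, y + row_idx, n + 1
    return x, y, n

def motor_speed(image: Image, threshold: int = 50) -> Tuple[int, int]:
    """Offset of the object's centre from the image centre."""
    height, width = len(image), len(image[0])
    x, y, n = gravity(binarize(image, threshold))
    if n == 0:  # added guard: no object found, so do not move
        return 0, 0
    return width // 2 - x // n, height // 2 - y // n

frame = [
    [0, 0, 0, 90],
    [0, 0, 0, 90],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
speed = motor_speed(frame)
```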
14
Run-time system implementation
[Figure: run-time system handling the function calls Read(a) and Process(a, b)]
15
Prototype architecture
16
Results
Double thresholding edge detection

Operation      Single   Split (TM only)   Split (TM+XTL)
Time (ms)      115      124               67
Relative (%)   100      108               58
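The relative figures follow directly from the measured times, taking the single-processor run as the baseline. The short Python check below uses the three times from the table; the dictionary keys are labels chosen here for the three configurations.

```python
# Measured times in milliseconds, copied from the results table.
times_ms = {"single": 115, "split_tm": 124, "split_tm_xtl": 67}

baseline = times_ms["single"]
# Relative time as a percentage of the single-processor run.
relative = {name: round(100 * t / baseline) for name, t in times_ms.items()}
# Speedup of the fastest configuration over the baseline.
speedup = baseline / times_ms["split_tm_xtl"]
```

Splitting on one processor alone costs about 8% extra time, while the split across both processors runs roughly 1.7 times faster than the baseline.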
17
Conclusion
  • The proposed programming model
      • Is easy to use
          • Skeletons hide data-parallel bookkeeping
          • RPC hides task-parallel implementation
      • Is architecture independent
          • A skeleton can be implemented for different
            architectures
          • RPC can map to a heterogeneous system
      • Is optimized for embedded usage
          • Peer-to-peer communication: no scatter/gather
            bottleneck
          • Pipelined: no frame stores

18
Future work
  • Skeletons
      • Skeleton Definition Language
      • Skeleton merging
  • Mapping
      • Memory
      • Scalar dependencies
  • Evaluation
      • New prototype architecture
      • Dynamic, complex application