Tile Reduction: the first step towards tile aware parallelization in OpenMP

Transcript and Presenter's Notes

1
Tile Reduction: the first step towards tile aware
parallelization in OpenMP
  • Ge Gan
  • Department of Electrical and Computer Engineering
  • Univ. of Delaware

2
Overview
  • Background
  • Motivation
  • A new idea: Tile Reduction
  • Experimental Results
  • Conclusion
  • Related Work
  • Future Work

3
Tile/Tiling
  • Natural representation of data objects that are
    heavily used in scientific algorithms
  • Tiling improves data locality
  • Tiling can increase parallelism and reduce
    synchronization in parallel programs
  • An effective compiler optimization technique
  • Essentially a program design paradigm
  • Supported in many parallel programming
    languages: ZPL, CAF, HTA, etc.

4
OpenMP
  • OpenMP is the de facto standard for shared-memory
    parallel programming
  • Provides a simple and flexible interface for
    developing portable and scalable parallel
    applications
  • Supports incremental parallelization
  • Maintains sequential consistency
  • Tile oblivious: no directive or clause can be
    used to annotate data tiles and carry that
    information to the compiler

5
A Motivating Example

6
Parallelizing the traditional way (1)

7
Parallelizing the traditional way (2)
  • Can only leverage the traditional scalar
    reduction in OpenMP
  • Parallelism is trivial
  • Data locality is not bad
  • Not natural and intuitive
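
The loop nest shown on this slide is not reproduced in the transcript. The sketch below is only a guess at what the traditional approach might look like: the function `sum_tiles_scalar`, the array shapes, and the tile edge `T` are assumptions made for illustration. With a scalar-only `reduction` clause, each element of the 2x2 tile is reduced in its own parallel loop.

```c
#define T 2  /* assumed tile edge, matching the 2x2 tiles on the slides */

/* Hypothetical example: sum n tiles of size T x T into out[T][T].
 * Only a scalar appears in the reduction clause, so the tile loops
 * (k, l) stay sequential and every tile element gets a separate
 * parallel reduction over i. */
void sum_tiles_scalar(int n, double a[][T][T], double out[T][T])
{
    for (int k = 0; k < T; k++) {
        for (int l = 0; l < T; l++) {
            double s = 0.0;
            #pragma omp parallel for reduction(+:s)
            for (int i = 0; i < n; i++)
                s += a[i][k][l];
            out[k][l] = s;
        }
    }
}
```

In this form the tile is never treated as a single object: the compiler sees T x T unrelated scalar reductions.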

8
The Expected Parallelization
  • View the innermost two loops as a macro
    operation on the 2x2 data tiles
  • Aggregate the data tiles in parallel
  • More parallelism
  • Better data locality
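
For comparison, OpenMP 4.5 and later accept array sections in the `reduction` clause for C and C++, which gives a rough feel for the parallelization described on this slide. This is not the tile reduction interface proposed in the talk; the function name `sum_tiles_section` and the data layout are assumptions made for illustration.

```c
#define T 2  /* assumed tile edge, matching the 2x2 tiles on the slides */

/* Hypothetical example: sum n tiles of size T x T into out[T][T],
 * treating the whole tile as the reduction variable. Requires a
 * compiler with OpenMP 4.5 array-section reductions; without OpenMP
 * the pragma is ignored and the loop runs sequentially. */
void sum_tiles_section(int n, double a[][T][T], double out[T][T])
{
    double tile[T][T] = {{0.0}};

    /* Each thread works on a private, zero-initialized copy of tile;
     * the copies are combined element by element at the end. */
    #pragma omp parallel for reduction(+: tile[0:T][0:T])
    for (int i = 0; i < n; i++)
        for (int k = 0; k < T; k++)
            for (int l = 0; l < T; l++)
                tile[k][l] += a[i][k][l];

    for (int k = 0; k < T; k++)
        for (int l = 0; l < T; l++)
            out[k][l] = tile[k][l];
}
```

Here the outer loop over tiles is the parallel loop, and the two inner loops act as one operation on a whole tile.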

9
Tile Reduction Interface

10
Terms
  • Reduction tile: the data tile under reduction
  • Tile descriptor: the multi-dimensional array in
    the list construct
  • Reduction kernel loops: the loops involved in
    performing one recursive calculation
  • Tile name
  • Dimension descriptor: the tuples following the
    tile name

11
A Use Case
Tiled Matrix Multiplication
Tile Reduction Applied on the Tiled Matrix Multiplication Code
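
The code on this slide is not included in the transcript. As a rough illustration only, a tiled matrix multiplication could be written as below; `N`, `T`, and `matmul_tiled` are hypothetical names, and the sketch is sequential because the proposed pragma syntax is not available here. Each T x T tile of C is accumulated over the `kk` loop, which makes that tile a natural candidate for the reduction tile.

```c
#define N 4  /* assumed matrix dimension, a multiple of T */
#define T 2  /* assumed tile edge */

/* Hypothetical example: C = A * B computed tile by tile. */
void matmul_tiled(double A[N][N], double B[N][N], double C[N][N])
{
    for (int ii = 0; ii < N; ii += T) {
        for (int jj = 0; jj < N; jj += T) {
            /* Accumulator for one T x T tile of C. */
            double tile[T][T] = {{0.0}};

            /* Loop over tile pairs (A tile, B tile) contributing to
             * this C tile; the three inner loops multiply one pair. */
            for (int kk = 0; kk < N; kk += T)
                for (int i = 0; i < T; i++)
                    for (int j = 0; j < T; j++)
                        for (int k = 0; k < T; k++)
                            tile[i][j] += A[ii + i][kk + k] * B[kk + k][jj + j];

            /* Write the finished tile back into C. */
            for (int i = 0; i < T; i++)
                for (int j = 0; j < T; j++)
                    C[ii + i][jj + j] = tile[i][j];
        }
    }
}
```
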
12
Code Generation (1)
  • Distribute the iterations of the parallelized
    loop among the threads
  • Allocate memory for the private copy of the tile
    used in the local recursive calculation
  • Perform the local recursive calculation which is
    specified by the reduction kernel loops
  • Update the global copy of the reduction tile
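
The four steps above can be sketched by hand with standard OpenMP constructs. This is an illustration of the general pattern, not the compiler output shown in the talk; `sum_tiles_manual`, the array shapes, and the tile edge `T` are assumptions.

```c
#define T 2  /* assumed tile edge, matching the 2x2 tiles on the slides */

/* Hypothetical example: add n tiles of size T x T into global_tile.
 * The caller is expected to initialize global_tile beforehand. */
void sum_tiles_manual(int n, double a[][T][T], double global_tile[T][T])
{
    #pragma omp parallel
    {
        /* Step 2: private copy of the tile, set to the identity of +. */
        double local_tile[T][T] = {{0.0}};

        /* Step 1: distribute the iterations of the parallelized loop.
         * Step 3: run the reduction kernel loops on the private copy. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            for (int k = 0; k < T; k++)
                for (int l = 0; l < T; l++)
                    local_tile[k][l] += a[i][k][l];

        /* Step 4: merge the private copy into the global tile, one
         * thread at a time. */
        #pragma omp critical
        {
            for (int k = 0; k < T; k++)
                for (int l = 0; l < T; l++)
                    global_tile[k][l] += local_tile[k][l];
        }
    }
}
```

The critical section is entered once per thread rather than once per iteration, so the synchronization cost does not grow with the number of tiles.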

13
Code Generation (2)

14
Experimental Results (1)
2D Histogram Reduction

15
Experimental Results (2)
Matrix-Matrix Multiplication

16
Experimental Results (3)
Matrix-Vector Multiplication

17
Conclusions
  • As one of the building blocks of the tile aware
    parallelization theory, tile reduction brings
    more opportunities to parallelize dense matrix
    applications
  • For some benchmarks, tile reduction is a more
    natural and intuitive way to reason about the
    best parallelization decision
  • For some benchmarks, tile reduction can not only
    improve data locality but also expose more
    parallelism
  • Friendly to programmers
  • Code generation is as simple as for scalar
    reduction in current OpenMP
  • Runtime overhead is negligible

18
Related Work
  • Parallel reduction is supported in:
  • C**: Viswanathan, G., Larus, J.R. User-defined
    reductions for efficient communication in
    data-parallel languages. Technical Report 1293,
    University of Wisconsin-Madison (Jan 1996)
  • SAC: Scholz, S.B. On defining
    application-specific high-level array operations
    by means of shape invariant programming
    facilities. In APL '98: Proceedings of the APL98
    Conference on Array Processing Language, New
    York, NY, USA, ACM (1998) 32-38
  • ZPL: Deitz, S.J., Chamberlain, B.L., Snyder, L.
    High-level language support for user-defined
    reductions. J. Supercomput. 23(1) (2002) 23-37
  • UPC Consortium: UPC Collective Operations
    Specifications V1.0. A publication of the UPC
    Consortium (2003)
  • Message Passing Interface Forum: MPI: A
    message-passing interface standard (version
    1.0). Technical report (May 1994). URL:
    http://www.mcs.anl.gov/mpi/mpi-report.ps
  • Kambadur, P., Gregor, D., Lumsdaine, A. OpenMP
    extensions for generic libraries. In Lecture
    Notes in Computer Science: OpenMP in a New Era
    of Parallelism, IWOMP'08, International Workshop
    on OpenMP. Volume 5004/2008, Springer Berlin /
    Heidelberg (2008) 123-133

19
Future Work
  • Design and develop OpenMP pragma directives that
    help the compiler generate efficient data
    movement code for parallel applications running
    on many-core platforms with highly non-uniform
    memory systems, such as the Cyclops-64 processor