Effective Automatic Parallelization of Stencil Computations*

Provided by: IBMU402

Learn more at: https://web.cse.ohio-state.edu

Transcript and Presenter's Notes

1
Effective Automatic Parallelization of Stencil Computations*

Sriram Krishnamoorthy (1)
Muthu Baskaran (1), Uday Bondhugula (1), Atanas Rountev (1),
J. Ramanujam (2), P. Sadayappan (1)
(1) The Ohio State University
(2) Louisiana State University

* Work supported by NSF
2
Introduction

Stencil computations
Sweep through large data set
Multiple time iterations
Simple load-balanced schedule
Tiling essential to improve data locality
Dependences between tiles
Pipelined execution
Skewed iteration spaces cause load imbalance
Solution: Adjust tiling to re-enable concurrent execution
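
The pipelined start-up described above can be illustrated with a small Python sketch. It assumes, purely for illustration, a two-dimensional grid of tiles in which tile (p, q) depends on tiles (p-1, q) and (p, q-1); the function name `wavefront_schedule` and this dependence pattern are assumptions, not taken from the slides.

```python
def wavefront_schedule(time_tiles, space_tiles):
    """Group tiles by the earliest step at which each one can run.

    Assumption: tile (p, q) depends on tiles (p-1, q) and (p, q-1),
    so it can start no earlier than step p + q.
    """
    steps = {}
    for p in range(time_tiles):
        for q in range(space_tiles):
            steps.setdefault(p + q, []).append((p, q))
    # One list of runnable tiles per wavefront step, in execution order.
    return [steps[k] for k in sorted(steps)]


if __name__ == "__main__":
    for step, tiles in enumerate(wavefront_schedule(3, 4)):
        print(step, len(tiles), tiles)
```

Under these assumptions only one tile is runnable at the first step and the number of runnable tiles grows gradually, which is the pipeline fill that concurrent start aims to avoid.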

3
Motivation
FOR t = 0 TO T-1
  FOR i = 1 TO N-1
    A[t,i] = (A[t,i-1] + A[t,i] + A[t,i+1]) / 3
[Figure: iteration space with axes t and i]
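
A minimal Python sketch of a three-point stencil of this kind. It assumes a Jacobi-style update in which each time step reads only the values of the previous step, and it keeps the two boundary elements fixed; the function name `stencil_sweep` and the boundary treatment are illustrative assumptions, not taken from the slides.

```python
def stencil_sweep(a, steps):
    """Apply a three-point averaging stencil `steps` times.

    Assumptions: Jacobi-style update (a time step reads only the
    previous one) and fixed boundary elements a[0] and a[-1].
    """
    cur = list(a)
    for _ in range(steps):
        nxt = cur[:]  # boundary elements are carried over unchanged
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3
        cur = nxt
    return cur


if __name__ == "__main__":
    print(stencil_sweep([0, 3, 0], 1))
```

Every interior point at one time step depends on three neighbouring points of the previous step, and these are the dependences that tiling has to respect.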
4
Notation

Iteration space B: n-dim polyhedron
Dependences D: n-dim vectors
Hyperplanes H: n-dim normal vectors
Tile: bounded by pairs of hyperplanes
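
One way to read "tile bounded by pairs of hyperplanes" is sketched below in Python. The sketch assumes that, for each hyperplane family with normal h and tile size s, tile k covers the points x with k*s <= h.x < (k+1)*s; the function name `tile_coords` and the example normals are hypothetical.

```python
def tile_coords(point, normals, sizes):
    """Return the tile that contains an iteration point.

    Assumption: for each hyperplane normal h with tile size s,
    tile k covers the points x with k*s <= h.x < (k+1)*s.
    """
    coords = []
    for h, size in zip(normals, sizes):
        offset = sum(hc * xc for hc, xc in zip(h, point))  # dot product h.x
        coords.append(offset // size)
    return tuple(coords)


if __name__ == "__main__":
    # Hypothetical 2-D example: one family normal to the time axis,
    # one skewed family with normal (1, 1).
    print(tile_coords((5, 6), [(1, 0), (1, 1)], (4, 8)))
```

Each pair of consecutive hyperplanes from one family supplies two opposite faces of a tile, so an n-dimensional tile needs n such families.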

5
Approach

Concurrent start in non-tiled iteration space
Identify hyperplanes inhibiting concurrent start
in tiled space
Replace one face for each inhibiting pair
Overlapped Tiling: Replace back-face
Split Tiling: Replace front-face

6
Concurrent Start Before Tiling
Condition: A boundary that does not carry any dependence
7
Inter-tile Dependences

Shift vectors
Tile traversal order
Normal to all other hyperplanes
Hyperplane carries dependence
A dependence pokes through
Inter-tile dependence vector
Shift vector
Corresponding hyperplane carries dependence

8
Concurrent Start Inhibition

Concurrent start in original iteration space
along a boundary
But that boundary carries an inter-tile dependence

A boundary has concurrent start
S_j is an inter-tile dependence
That boundary carries an inter-tile dependence
9
Companion Hyperplane

Hyperplane that destroys the inter-tile
dependence
Swivel a hyperplane backward
Dependences carried by original hyperplane are
neutralized
Incoming dependences become non-incoming
Outgoing dependences become non-outgoing

10
Overlapped Tiling

Replace back face with companion hyperplane
Additional region is shared with preceding tile
Region of preceding tile that caused the
dependence
Each new tile independent of preceding tile
(do-all parallelism)
Increased computation cost and communication volume
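
The idea of overlapped tiling can be sketched in one space dimension as follows. This is a simplified Python illustration, not the hyperplane-based construction from the slides: it assumes a Jacobi-style three-point stencil with fixed boundary elements, and the names `reference_sweep`, `overlapped_tiled_sweep`, `tile` and `tau` are hypothetical.

```python
def reference_sweep(a, steps):
    """Untiled three-point stencil, used only to check the tiled version."""
    cur = list(a)
    for _ in range(steps):
        nxt = cur[:]
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3
        cur = nxt
    return cur


def overlapped_tiled_sweep(a, steps, tile, tau):
    """Time-tiled sweep in which every tile recomputes a halo of its neighbours.

    Assumptions: fixed boundary elements; `tile` is the space tile width
    and `tau` the time tile height.
    """
    n = len(a)
    cur = list(a)
    for t0 in range(0, steps, tau):
        h = min(tau, steps - t0)
        nxt = cur[:]
        # The tiles of one time block read only `cur`, so this loop is do-all.
        for lo in range(0, n, tile):
            hi = min(lo + tile, n)
            left, right = max(lo - h, 0), min(hi + h, n)
            local = cur[left:right]  # tile plus the overlapped region
            for _ in range(h):
                new = local[:]
                for j in range(1, len(local) - 1):
                    new[j] = (local[j - 1] + local[j] + local[j + 1]) / 3
                local = new
            # Stale halo values cannot reach [lo, hi) within h steps.
            nxt[lo:hi] = local[lo - left:hi - left]
        cur = nxt
    return cur


if __name__ == "__main__":
    data = [float((7 * i) % 11) for i in range(37)]
    print(overlapped_tiled_sweep(data, 10, 8, 3) == reference_sweep(data, 10))
```

The halo of width `h` on each side is computed by both neighbouring tiles, which is where the extra computation comes from; in exchange no tile waits for another within a time block.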

11
Split Tiling

Replace front face with companion hyperplane
Tile split into independent and dependent regions
Execute independent region followed by dependent
region
Increased communications
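
A comparable one-dimensional Python sketch of split tiling. It is a simplified illustration under stated assumptions rather than the construction from the slides: a Jacobi-style three-point stencil with fixed boundary elements, a tile width of at least twice the time tile height, and hypothetical names `reference_sweep`, `split_tiled_sweep`, `tile` and `tau`.

```python
def reference_sweep(a, steps):
    """Untiled three-point stencil, used only to check the tiled version."""
    cur = list(a)
    for _ in range(steps):
        nxt = cur[:]
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3
        cur = nxt
    return cur


def _avg(row, i):
    return (row[i - 1] + row[i] + row[i + 1]) / 3


def split_tiled_sweep(a, steps, tile, tau):
    """Time-tiled sweep that splits each tile into two phases.

    Assumptions: fixed boundary elements and tile >= 2 * tau.
    """
    n = len(a)
    assert tile >= 2 * tau, "this sketch needs tile >= 2 * tau"
    if n < 3:
        return list(a)  # no interior points to update
    cur = list(a)
    bounds = list(range(0, n, tile)) + [n]
    for t0 in range(0, steps, tau):
        h = min(tau, steps - t0)
        # levels[s][i] holds cell i after s steps of this time block.
        levels = [cur] + [[None] * n for _ in range(h)]
        for s in range(1, h + 1):
            levels[s][0], levels[s][n - 1] = cur[0], cur[n - 1]
        # Phase 1: independent regions shrink by one cell per step (do-all).
        for lo, hi in zip(bounds, bounds[1:]):
            for s in range(1, h + 1):
                start = lo + s if lo > 0 else 1
                stop = hi - s if hi < n else n - 1
                for i in range(start, stop):
                    levels[s][i] = _avg(levels[s - 1], i)
        # Phase 2: dependent regions around each interior tile boundary,
        # which read phase-1 results from both neighbouring tiles.
        for b in bounds[1:-1]:
            for s in range(1, h + 1):
                for i in range(b - s, min(b + s, n - 1)):
                    levels[s][i] = _avg(levels[s - 1], i)
        cur = levels[h]
    return cur


if __name__ == "__main__":
    data = [float((7 * i) % 11) for i in range(37)]
    print(split_tiled_sweep(data, 10, 8, 3) == reference_sweep(data, 10))
```

Nothing is computed twice in this variant, but the second phase needs values produced by two neighbouring tiles in the first phase, which corresponds to the additional communication noted above.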

12
Experimental Evaluation

Cluster
2.8 GHz dual-processor Opteron 254
1MB L2 cache, 4GB RAM
Linux 2.6.9, Intel compiler (icc) -O3
Comparison
Two pipelined schedules along space and time
1000 time steps
1 to 32 processors

13
Pipelined Execution Parameters
64000 elements, 32 processors
Space tile size: 1000, Time tile size: 16
14
Performance with Problem Size
15
Weak Scaling

Problem size = procs * 20000
Horizontal line: Linear Scaling

16
Conclusion

Time tiling of stencils is crucial for data locality
Might inhibit concurrent execution
Presented: Two approaches to enabling concurrent execution
Ongoing work: Modeling relative benefits of the two approaches

17
Thank You!
