The R-LRPD Test: Speculative Parallelization of Partially Parallel Loops

About This Presentation

Description:

parasol.tamu.edu – PowerPoint PPT presentation

Slides: 28

Provided by: Francis159

Transcript and Presenter's Notes

1
The R-LRPD Test: Speculative Parallelization of
Partially Parallel Loops

Francis Dang, Hao Yu, and Lawrence Rauchwerger
Department of Computer Science
Texas A&M University

2
Motivation

To maximize performance, extract the maximum
available parallelism from loops.
Static compiler methods may be insufficient.
Access patterns may be too complex.
Required information is only available at
runtime.
Run-time methods are needed to extract loop
parallelism:
Inspector/Executor
Speculative Parallelization

3
Speculative Parallelization: LRPD Test

Main Idea
Execute a loop as a DOALL.
Record memory references during execution.
Check for data dependences.
If there was a dependence, re-execute the loop
sequentially.
Disadvantages
One data dependence can invalidate speculative
parallelization.
Slowdown is proportional to speculative parallel
execution time.
Partial parallelism is not exploited.
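
The four steps under Main Idea can be sketched as a small simulation. This is a simplified, processor-wise illustration and not the authors' implementation: it assumes block scheduling, a loop body of the form z = A(K(i)); A(L(i)) = z + C(i) (the form used in the example slides), and Python sets standing in for the real reference-recording structures. The names `block_bounds` and `lrpd_doall` are hypothetical, and element indices are 1-based with `A[0]` unused.

```python
def block_bounds(lo, hi, nprocs):
    """Split iterations lo..hi-1 into at most nprocs contiguous blocks."""
    if hi <= lo:
        return []
    size = -(-(hi - lo) // nprocs)  # ceiling division
    return [(s, min(s + size, hi)) for s in range(lo, hi, size)]


def lrpd_doall(A, K, L, C, nprocs):
    """Speculatively run the loop as a DOALL; fall back to sequential on failure.

    Returns True if the speculation passed, False if the loop had to be
    re-executed sequentially. A is updated in place in both cases.
    """
    written_by_lower = set()  # elements written by lower-numbered processors
    private = []              # privately written values, one dict per processor
    passed = True
    for start, stop in block_bounds(0, len(K), nprocs):
        priv, exposed_reads = {}, set()
        for i in range(start, stop):
            k = K[i]
            if k in priv:
                z = priv[k]           # value produced earlier by this processor
            else:
                exposed_reads.add(k)  # record the read; copy-in from shared A
                z = A[k]
            priv[L[i]] = z + C[i]     # record the write in the private copy
        # A read of data written by a lower processor is a flow dependence.
        if exposed_reads & written_by_lower:
            passed = False
        written_by_lower.update(priv)
        private.append(priv)
    if passed:
        for priv in private:          # copy-out in processor order, last value wins
            for elem, value in priv.items():
                A[elem] = value
    else:
        for i in range(len(K)):       # speculation failed: sequential re-execution
            A[L[i]] = A[K[i]] + C[i]
    return passed
```

Because the speculative writes go to private copies, the shared array is untouched when the check fails, so the sequential re-execution starts from the original data. A single cross-processor flow dependence is enough to discard all of the speculative work, which is the disadvantage listed above.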

4
Partially Parallel Loop Example
do i = 1, 8
  z = A(K(i))
  A(L(i)) = z + C(i)
end do
K(1:8) = (1,2,3,1,4,2,1,1)
L(1:8) = (4,5,5,4,3,5,3,3)

iter    1  2  3  4  5  6  7  8
A(1)    R        R        R  R
A(2)       R           R
A(3)          R     W     W  W
A(4)    W        W  R
A(5)       W  W        W
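
The read/write table for this example follows mechanically from K and L: iteration i reads A(K(i)) and writes A(L(i)). A minimal sketch that regenerates the marks is given here; `access_table` is a hypothetical helper name, and the iteration positions in each row are 0-based list indices.

```python
def access_table(K, L):
    """Map each array element to its per-iteration marks ('R', 'W', 'RW' or '')."""
    n = len(K)
    table = {elem: [""] * n for elem in sorted(set(K) | set(L))}
    for i in range(n):
        table[K[i]][i] += "R"  # z = A(K(i))
        table[L[i]][i] += "W"  # A(L(i)) = z + C(i)
    return table


K = [1, 2, 3, 1, 4, 2, 1, 1]
L = [4, 5, 5, 4, 3, 5, 3, 3]
for elem, marks in access_table(K, L).items():
    print(f"A({elem})", " ".join(m or "." for m in marks))
```

A row that shows a W in an earlier column than an R (A(4) here, written in iterations 1 and 4 and read in iteration 5) marks a flow dependence, which is what makes the loop only partially parallel.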
5
The Recursive LRPD

Main Idea
Transform a partially parallel loop into a
sequence of fully parallel, block-scheduled
loops.
Iterations before the first data dependence are
correct and committed.
Re-apply the LRPD test on the remaining
iterations.
Worst case
Sequential time plus testing overhead
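
The recursion can be illustrated with a short simulation. This is a sketch under stated assumptions, not the authors' code: processors are simulated one after another, the loop body is the z = A(K(i)); A(L(i)) = z + C(i) form from the example slides, failed blocks are re-executed without redistribution, and the names `block_bounds`, `run_block`, and `r_lrpd` are hypothetical. Element indices are 1-based with `A[0]` unused.

```python
def block_bounds(lo, hi, nprocs):
    """Split iterations lo..hi-1 into at most nprocs contiguous blocks."""
    if hi <= lo:
        return []
    size = -(-(hi - lo) // nprocs)  # ceiling division
    return [(s, min(s + size, hi)) for s in range(lo, hi, size)]


def run_block(A, K, L, C, start, stop):
    """Run iterations start..stop-1 against a private copy of A.

    Returns (priv, exposed): the privately written values and the set of
    elements read before this block wrote them (copied in from shared A).
    """
    priv, exposed = {}, set()
    for i in range(start, stop):
        k = K[i]
        if k in priv:
            z = priv[k]
        else:
            exposed.add(k)
            z = A[k]
        priv[L[i]] = z + C[i]
    return priv, exposed


def r_lrpd(A, K, L, C, nprocs):
    """Recursive LRPD sketch; returns the iteration range committed by each stage."""
    blocks = block_bounds(0, len(K), nprocs)
    stages = []
    while blocks:
        results = [run_block(A, K, L, C, s, e) for s, e in blocks]
        # Find the earliest block that reads data written by a lower block.
        written, first_bad = set(), len(blocks)
        for p, (priv, exposed) in enumerate(results):
            if exposed & written:
                first_bad = p
                break
            written.update(priv)
        # Blocks before the first dependence sink are correct: commit them.
        for priv, _ in results[:first_bad]:
            for elem, value in priv.items():
                A[elem] = value
        stages.append((blocks[0][0], blocks[first_bad - 1][1]))
        blocks = blocks[first_bad:]  # re-apply the test to the remaining blocks
    return stages
```

The first block of every stage has no lower-numbered writer, so at least one block commits per stage and the loop always terminates. With a dependence chain across every block boundary, one block commits per stage, which corresponds to the worst case of sequential time plus testing overhead.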

6
Algorithm
7
Implementation

Implemented as a run-time pass in Polaris plus
additional hand-inserted code.
Privatization with copy-in/copy-out for arrays
under test.
Replicated buffers for reductions.
Backup arrays for checkpointing.

8
Recursive LRPD Example
do i = 1, 8
  z = A(K(i))
  A(L(i)) = z + C(i)
end do
K(1:8) = (1,2,3,1,4,2,1,1)
L(1:8) = (4,5,5,4,2,5,3,3)
9
Heuristics

Work Redistribution
Sliding Window Approach
Data Dependence Graph Extraction

10
Work Redistribution

Redistribute remaining iterations across
processors.
Execution time for each stage will decrease.
Disadvantages
May uncover new dependences across processors.
May incur remote cache misses from data
redistribution.
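
The trade-off can be seen on the access pattern alone. The sketch below tracks only which blocks pass the test, not the data values, and compares re-executing the failed blocks in place with redistributing the remaining iterations over all processors. It assumes block scheduling and the read-K, write-L loop body from the example slides; `first_sink` and `stage_schedule` are hypothetical names.

```python
def block_bounds(lo, hi, nprocs):
    """Split iterations lo..hi-1 into at most nprocs contiguous blocks."""
    if hi <= lo:
        return []
    size = -(-(hi - lo) // nprocs)  # ceiling division
    return [(s, min(s + size, hi)) for s in range(lo, hi, size)]


def first_sink(K, L, blocks):
    """Index of the earliest block that reads an element written by a lower
    block before writing it itself; len(blocks) if there is none."""
    written = set()
    for p, (start, stop) in enumerate(blocks):
        mine = set()
        for i in range(start, stop):
            if K[i] not in mine and K[i] in written:
                return p
            mine.add(L[i])
        written |= mine
    return len(blocks)


def stage_schedule(K, L, nprocs, redistribute):
    """Return the list of blocks executed in each stage."""
    n = len(K)
    blocks = block_bounds(0, n, nprocs)
    stages = []
    while blocks:
        stages.append(list(blocks))
        bad = first_sink(K, L, blocks)
        if bad == len(blocks):
            break  # every remaining block passed
        if redistribute:
            # Spread the uncommitted iterations over all processors again.
            blocks = block_bounds(blocks[bad][0], n, nprocs)
        else:
            blocks = blocks[bad:]  # failed processors redo their own blocks
    return stages


K = [1, 2, 3, 1, 4, 2, 1, 1]
L = [4, 5, 5, 4, 2, 5, 3, 3]
for mode in (False, True):
    print("redistribute" if mode else "no redistribution",
          stage_schedule(K, L, 4, mode))
```

In this simulation with four processors, the run without redistribution finishes in two stages of two-iteration blocks, while the run with redistribution uses one-iteration blocks after the first stage but needs a third stage, because iterations 5 and 6 land on different processors and the dependence between them becomes a cross-processor one.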

11
Work Redistribution Example
do i = 1, 8
  z = A(K(i))
  A(L(i)) = z + C(i)
end do
K(1:8) = (1,2,3,1,4,2,1,1)
L(1:8) = (4,5,5,4,2,5,3,3)
12
Redistribution Model

Redistribution may not always be beneficial.
Stop redistribution if
The cost of data redistribution outweighs the
benefit from work redistribution.
Synthetic loop to model this adaptive method.

13
Redistribution Model
14
Sliding Window R-LRPD

R-LRPD can generate a sequential schedule for
long dependence distributions.
Strip-mine the speculative execution.
Apply the R-LRPD on a contiguous block of
iterations.
Only dependences within the window cause
failures.
Adds more global synchronizations and test
overhead.
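
A minimal simulation of the sliding-window idea is sketched here, assuming the z = A(K(i)); A(L(i)) = z + C(i) loop body from the example slides and a window measured in iterations. The helper names and the `window` parameter are hypothetical, processors are simulated one after another, and element indices are 1-based with `A[0]` unused.

```python
def block_bounds(lo, hi, nprocs):
    """Split iterations lo..hi-1 into at most nprocs contiguous blocks."""
    if hi <= lo:
        return []
    size = -(-(hi - lo) // nprocs)  # ceiling division
    return [(s, min(s + size, hi)) for s in range(lo, hi, size)]


def run_block(A, K, L, C, start, stop):
    """Run a block on a private copy; return (written values, exposed reads)."""
    priv, exposed = {}, set()
    for i in range(start, stop):
        k = K[i]
        if k in priv:
            z = priv[k]
        else:
            exposed.add(k)
            z = A[k]
        priv[L[i]] = z + C[i]
    return priv, exposed


def sliding_window_r_lrpd(A, K, L, C, nprocs, window):
    """Test one window of iterations at a time; return the committed ranges."""
    n, lo, commits = len(K), 0, []
    while lo < n:
        hi = min(lo + window, n)  # strip-mine: only this window is speculated
        written, pending, end = set(), [], lo
        for start, stop in block_bounds(lo, hi, nprocs):
            priv, exposed = run_block(A, K, L, C, start, stop)
            if exposed & written:
                break             # sink of a dependence inside the window
            written.update(priv)
            pending.append(priv)
            end = stop
        for priv in pending:      # commit the blocks that passed, in order
            for elem, value in priv.items():
                A[elem] = value
        commits.append((lo, end))
        lo = end                  # slide the window to the first uncommitted iteration
    return commits
```

Data committed by an earlier window is already in the shared array when the next window starts, so a dependence that crosses a window boundary is satisfied by the copy-in and does not cause a failure. Each window ends with a test and a commit, which is where the extra synchronization and test overhead come from.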

15
DDG Extraction

R-LRPD can generate sequential schedules for
complex dependence distributions.
Use the SW R-LRPD scheme to extract the data
dependence graph (DDG).
Generate an optimized schedule from the DDG.
Obtains the DDG for loops from which a proper
inspector cannot be extracted.
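
The step from a DDG to an optimized schedule can be illustrated as follows. This sketch does not reproduce the SW R-LRPD-based extraction described above: it assumes the per-iteration references (the K and L index arrays of the example loop) have already been recorded, builds flow, anti, and output dependence edges from them, and derives a wavefront schedule as one possible "optimized schedule". The function names are hypothetical and iterations are 0-based.

```python
def extract_ddg(K, L):
    """Return dependence edges (src, dst), src < dst, for a loop whose
    iteration i reads element K[i] and then writes element L[i]."""
    last_write = {}  # element -> iteration of its most recent write
    readers = {}     # element -> iterations that read it since that write
    edges = set()
    for i, (k, l) in enumerate(zip(K, L)):
        if k in last_write:
            edges.add((last_write[k], i))      # flow: write, then read
        readers.setdefault(k, []).append(i)
        if l in last_write:
            edges.add((last_write[l], i))      # output: write, then write
        for r in readers.get(l, []):
            if r != i:
                edges.add((r, i))              # anti: read, then write
        last_write[l] = i
        readers[l] = []
    return edges


def wavefronts(n, edges):
    """Group iterations into levels; iterations in one level are independent."""
    level = [0] * n
    # Edges point forward, so sorting by destination finalizes each source first.
    for src, dst in sorted(edges, key=lambda e: e[1]):
        level[dst] = max(level[dst], level[src] + 1)
    fronts = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        fronts[lv].append(i)
    return fronts


K = [1, 2, 3, 1, 4, 2, 1, 1]
L = [4, 5, 5, 4, 2, 5, 3, 3]
print(wavefronts(len(K), extract_ddg(K, L)))
```

Only the most recent writer and the readers since that write are linked, since older dependences are implied transitively. For the index arrays of the example slides this gives four wavefronts of two iterations each, and each wavefront could run as a fully parallel loop.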

16
Performance Issues

Performance issues
Blocked scheduling is a potential cause of load
imbalance.
Checkpointing can be expensive.
Feedback-guided blocked scheduling
Use the timing information from the previous
instantiation (Bull, Euro-Par '98).
Estimate the processor chunk sizes for minimal
load imbalance.
On-Demand Checkpointing
Checkpoint only data modified during execution.

17
Experiments

Setup
16-processor HP V-Class
4 GB memory
HP-UX 11.0

18
Experimental Results - Input Profiles
19
Experimental Results - TRACK
20
Experimental Results - TRACK
21
Experimental Results - TRACK
22
Experimental Results - TRACK
23
Experimental Results - Sliding Window
24
Experimental Results - Sliding Window
25
Experimental Results - FMA3D
26
Experimental Results - SPICE 2G6
27
Conclusion

Contribution
Can speculatively parallelize any loop.
The concern is now optimizing the parallelization
rather than deciding when to parallelize.
Future work
Use dependence distribution information for
adaptive redistribution and scheduling.
