Hybrid Analysis - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Hybrid Analysis


1
Hybrid Analysis
for Loop-level Parallelization
Kiran Kumar vkirankr_at_iitk.ac.in
2
Outline
  • Core of Hybrid Analysis.
  • Terminology Revision.
  • Motivating Example.
  • Working of Hybrid Analysis.
  • Execution of Hybrid Analyzed Program.
  • Experimental Results.

3
Hybrid Analysis (relative definition)
1. Core of Hybrid Analysis.
[Figure labels: Conservative Compiler Analysis, Hybrid Analysis; Compile-time overhead, Run-time overhead.]
4
Aggressiveness of Hybrid Analysis
1. Core of Hybrid Analysis.
Dependence distance is not considered.
  • Conservative Compiler Analysis: Any cross-iteration dependency?
    No: Generate Parallel Code. Yes or Maybe: Generate Sequential Code.
  • Hybrid Analysis: on Maybe, ask again at run time: Any cross-iteration dependency?
    No: Generate Parallel Code. Yes: Generate Sequential Code.
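The difference between the two decision procedures on this slide can be sketched in a few lines. This is only an illustration: the names `Answer`, `conservative_choice`, `hybrid_choice` and the `runtime_has_dependency` callable are hypothetical and do not come from the presentation.

```python
from enum import Enum


class Answer(Enum):
    """Possible compile-time answers to 'any cross-iteration dependency?'."""
    NO = "no"
    YES = "yes"
    MAYBE = "maybe"


def conservative_choice(static_answer):
    """Compile-time only: anything but a proven NO falls back to sequential code."""
    return "parallel" if static_answer is Answer.NO else "sequential"


def hybrid_choice(static_answer, runtime_has_dependency):
    """Defer the MAYBE case to a run-time check (a hypothetical callable)."""
    if static_answer is Answer.MAYBE:
        return "sequential" if runtime_has_dependency() else "parallel"
    return conservative_choice(static_answer)
```

With a MAYBE answer, `conservative_choice` always returns "sequential", while `hybrid_choice` returns "parallel" whenever the run-time check reports no dependency.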
5
Loop Parallelization
2. Terminology Revision.
  • Loops are major source for parallelization.
  • One loop iteration can be one thread.
  • Thread granularity - Number of iterations in a
    thread.

[Figure: Sequential Loop, Thread Extraction, Thread Dependence, Thread Execution. The loop for (i = 1; i <= 3; i++) .. is split into threads T1, T2 and T3.]
6
Types of Data Dependence
2. Terminology Revision.
  • Flow: X = .. ; .. = X
  • Anti: .. = X ; X = ..
  • Output: X = .. ; X = ..
Privatization
Cross-iteration Dependency
DO j = 1, 50
  a(j) = a(j+40)
ENDDO
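A brute-force check makes the cross-iteration dependency in this loop concrete. The sketch below assumes the loop body is a(j) = a(j+40) for j = 1..50 and only looks for read/write conflicts between different iterations; the helper name `cross_iteration_conflicts` is made up for illustration.

```python
def cross_iteration_conflicts(lo, hi, write_index, read_index):
    """Return (address, reading iteration, writing iteration) triples
    where an address is read in one iteration and written in another."""
    writers = {}
    for j in range(lo, hi + 1):
        writers.setdefault(write_index(j), set()).add(j)
    conflicts = []
    for j in range(lo, hi + 1):
        address = read_index(j)
        for writer in writers.get(address, ()):
            if writer != j:
                conflicts.append((address, j, writer))
    return sorted(conflicts)


# a(j) = a(j+40) for j = 1..50: writes a(1..50), reads a(41..90).
conflicts = cross_iteration_conflicts(1, 50, lambda j: j, lambda j: j + 40)
```

Elements a(41) to a(50) are read by iterations 1 to 10 and written by iterations 41 to 50, so the loop carries a dependence. Running the same check with an upper bound of 40 finds no conflict.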
7
Compile-time Analysis
3. Motivating Example.
Source: http://parasol.tamu.edu/
8
Weakness of Compile-time Analysis
3. Motivating Example.
Source: http://parasol.tamu.edu/
9
Run-time Analysis: LRPD Test
3. Motivating Example.
Speculative Parallelism
Time Complexity: O(n)
Source: http://parasol.tamu.edu/
10
Hybrid Analysis
3. Motivating Example.
Time Complexity: O(1)
Source: http://parasol.tamu.edu/
11
Hybrid Analysis
3. Motivating Example.
Source: http://parasol.tamu.edu/
12
4-step process for Hybrid Analysis
4. Working Procedure of Hybrid Analysis.
  1. Collect references as Expression tree.
  2. Aggregate references symbolically.
  3. Formulate independence test.
  4. Extract lowest-cost runtime test.

13
Sample tracing of Hybrid Analysis
4. Working Procedure of Hybrid Analysis.
Source: http://parasol.tamu.edu/
14
Expression Tree Grammar
4. Working Procedure of Hybrid Analysis.
1. Collect references as Expression tree.
USR = RT_LMAD
Sample Trace of Expr Tree
15
Array symbolic Aggregation
4. Working Procedure of Hybrid Analysis.
2. Aggregate references symbolically.
[Figure: the reference j+40, expanded over j = 1 to n, aggregates to 41 : 40+n.]
  1. Triplet Notation.
  2. LMAD Notation.

X(lb1 : ub1 : incr1, lb2 : ub2 : incr2, ..)
for (i = 1; i <= n; i++) for (j = 1; j <= m; j++)
  .. = A[i][j]
16
Independence Test
4. Working Procedure of Hybrid Analysis.
3. Formulate independence test.
  • DS = Dependence Set (all loop-carried
    dependences)
  • DS = DS - dependences eliminated by applying
    transformations (privatization, reduction, etc.)
  • Case (DS = Ø) of
  •   True: Generate Parallel Loop
  •   False: Generate Sequential Loop
  •   Maybe (not sure at compile time):
  •     Extract condition P for (DS = Ø)
  •     Generate Parallel Loop guarded by P
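The "Maybe" case can be illustrated with the running loop a(j) = a(j+40), j = 1..n, for which the condition n <= 40 appears later in the slides. The sketch below is an assumption-laden stand-in: reverse iteration order plays the role of a DOALL schedule (any order must give the sequential result), and all function names are hypothetical.

```python
def fresh_array(n):
    """1-based array a(1 .. n+40) with a(k) = k; index 0 is unused."""
    return list(range(n + 41))


def run_sequential(a, n):
    for j in range(1, n + 1):
        a[j] = a[j + 40]


def run_in_reverse(a, n):
    """Stand-in for a parallel schedule: iterations run in a different order."""
    for j in range(n, 0, -1):
        a[j] = a[j + 40]


def run_guarded(a, n):
    """Parallel loop guarded by the predicate P: n <= 40."""
    if n <= 40:
        run_in_reverse(a, n)
        return "parallel"
    run_sequential(a, n)
    return "sequential"
```

When the guard holds, reordering the iterations cannot change the result; when it fails (for example n = 50), the reordered loop produces different values, which is why the sequential version is kept as the fallback.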

17
Independence Test
4. Working Procedure of Hybrid Analysis.
3. Formulate independence test.
  • R = set of all addresses that are read in an
    iteration.
  • W = set of all addresses that are written in an
    iteration.

$DS = \left(\bigcup_{i=1}^{n} W_i\right) \cap \left(\bigcup_{i=1}^{n} R_i\right)$
$DS = DS \cup \bigcup_{i=1}^{n} \left(W_i \cap \bigcup_{j=1}^{i-1} W_j\right)$
  • RO, WF, RW sets based dependence:
    Pre-elimination
  • R, W sets based dependence: Post-elimination

18
RO, WF, RW sets based Test
4. Working Procedure of Hybrid Analysis.
3. Formulate independence test.
  • RO = set of all memory locations that are only read (not
    written).
  • WF = set of all memory locations that are written
    first and then possibly read and written.
  • RW = set of all memory locations that are read first
    and written later.

$DS = \bigcup_{i=1}^{n} \left(WF_i \cap \bigcup_{j=1}^{i-1} RO_j\right)$
$DS = DS \cup \bigcup_{i=1}^{n} \left(WF_i \cap \bigcup_{j=1}^{i-1} RW_j\right)$
$DS = DS \cup \bigcup_{i=1}^{n} \left(RW_i \cap \bigcup_{j=1}^{i-1} RW_j\right)$
Memory Set Aggregation
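The three set definitions can be made concrete by classifying the access trace of a single iteration. This is a minimal sketch, assuming a trace is a list of ("r" or "w", location) pairs in program order; the function name `classify` and the trace format are illustrative only.

```python
def classify(trace):
    """Split the locations touched by one iteration into (RO, WF, RW)."""
    first_access = {}
    written = set()
    for kind, location in trace:
        first_access.setdefault(location, kind)  # remember the first access kind
        if kind == "w":
            written.add(location)
    ro = {loc for loc, kind in first_access.items()
          if kind == "r" and loc not in written}   # only read
    wf = {loc for loc, kind in first_access.items()
          if kind == "w"}                          # written first
    rw = {loc for loc, kind in first_access.items()
          if kind == "r" and loc in written}       # read first, written later
    return ro, wf, rw


trace = [("r", "x"), ("w", "y"), ("r", "y"), ("r", "z"), ("w", "z")]
ro, wf, rw = classify(trace)
```

In this trace, x is only read, y is written before it is read, and z is read before it is written, so the three sets are disjoint by construction.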
19
Proof System
4. Working Procedure of Hybrid Analysis.
4. Extract lowest-cost runtime test.
Source: http://parasol.tamu.edu/
20
PDAG extraction from RT_LMAD
4. Working Procedure of Hybrid Analysis.
4. Extract lowest-cost runtime test.
Sample Trace of PDAG extraction
21
Pattern based Analysis
4. Working Procedure of Hybrid Analysis.
4. Extract lowest-cost runtime test.
[Figure: Empty?([1 : n] ∩ [41 : 40+n]) reduces to the condition n <= 40.]
  • Programmer sorts the tests in ascending order of
    complexity.
  • Programmer defines a set of code-patterns for
    each test.
  • Compiler checks for patterns in the given program and
    generates the minimum-cost condition.
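The pattern on this slide, an emptiness test on the intersection of two contiguous ranges, reduces to a comparison of bounds. The sketch below checks that reduction by brute force for the ranges 1 : n and 41 : 40+n; the helper names are hypothetical and the interval test assumes both ranges are non-empty.

```python
def intervals_disjoint(lo1, hi1, lo2, hi2):
    """O(1) emptiness test for [lo1, hi1] ∩ [lo2, hi2] (both non-empty)."""
    return hi1 < lo2 or hi2 < lo1


def brute_force_disjoint(n):
    """O(n) reference: build both address sets and intersect them."""
    written = set(range(1, n + 1))       # 1 : n
    read = set(range(41, 41 + n))        # 41 : 40+n
    return not (written & read)


def scalar_predicate(n):
    """The scalar condition extracted for this pattern."""
    return n <= 40
```

For every n >= 1 the three functions agree, which is the sense in which the O(n) set comparison can be replaced by an O(1) scalar test.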

22
Lowest-cost runtime test
5. Execution of Hybrid Analyzed Program.
[Flowchart: run-time tests are tried in increasing order of cost.
  1. Scalar value condition (min. cost runtime test)
  2. Vector Inspection (greater cost runtime test)
  3. LRPD test (max cost runtime test)
Each test has Satisfied and Unsatisfied outcomes; the remaining node labels are Speculative Parallelism and Execute Parallel version.]
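One way to read this cascade is as a loop over tests sorted by cost that stops at the first satisfied one. The sketch below is an assumption, not the presenter's implementation: the `(name, cost, predicate)` tuples are hypothetical, and falling back to sequential execution when every test fails is a choice made here because the exact wiring of the last stage is not recoverable from the slide.

```python
def choose_version(tests):
    """tests: iterable of (name, cost, predicate) with zero-argument predicates.

    Tries the tests in ascending order of cost and returns as soon as one
    is satisfied, so more expensive tests are never evaluated needlessly.
    """
    for name, _cost, predicate in sorted(tests, key=lambda t: t[1]):
        if predicate():
            return ("parallel", name)
    # Assumption: nothing could be proven, so keep the sequential version.
    return ("sequential", None)
```

A call such as `choose_version([("scalar", 1, lambda: n <= 40), ...])` would return after the scalar check whenever it holds, without touching the costlier tests.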
23
Experimental Results
  • Code Coverage.
  • Speedup for ADM.
  • Speedup for DYFESM.
  • Speedup for MDG.
  • Speedup for TRACK.
  • Speedup w.r.t Multi-cores.

24
References
  • Paper - Hybrid Analysis: Static & Dynamic Memory
    Reference Analysis.
  • Paper - Hybrid Dependence Analysis for Automatic
    Parallelization.
  • PhD Thesis - Inter-procedural Parallelization
    Using Memory Classification Analysis.
  • Lectures - http://web.cse.iitk.ac.in/cs738/
  • Textbook - Advanced Compiler Design and
    Implementation.
  • Textbook - Compilers: Principles, Techniques, and
    Tools.

25
Thank you
Questions
26
Privatization Technique
  • This technique removes output and anti
    dependencies.

Before: A = expr1; .. = A; A = expr2; .. = A
After privatization: t1 = expr1; .. = t1; t2 = expr2; .. = t2; A = t2
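The effect of privatization on a loop can be sketched as follows. The example assumes a loop in which a shared scalar `t` is written and then read in every iteration; the function names and the loop body (`x * 2`, `+ 1`) are invented for illustration. Giving each iteration its own copy removes the anti and output dependences on `t`, and the last iteration copies its value out, mirroring the final `A = t2` on the slide.

```python
def shared_temp(xs):
    """Original loop: every iteration writes and then reads the same scalar t."""
    out = [0] * len(xs)
    t = None
    for i, x in enumerate(xs):
        t = x * 2
        out[i] = t + 1
    return out, t


def iteration_private(x):
    """One iteration using a private copy of the temporary."""
    t_private = x * 2
    return t_private + 1, t_private


def privatized(xs, order):
    """Run the iterations in an arbitrary order; only the last one copies t out."""
    out = [0] * len(xs)
    last_value = None
    for i in order:
        out[i], t_i = iteration_private(xs[i])
        if i == len(xs) - 1:
            last_value = t_i
    return out, last_value
```

Because no iteration touches another iteration's temporary, `privatized` returns the same result as `shared_temp` for any permutation passed as `order`.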
27
Speculative Parallelism
  • All threads read the current value or speculate the
    required value.
  • All threads execute with the values available to
    them.
  • Threads commit in sequential order.
  • Before committing its values, each thread checks
    the value in memory against its speculated value.
  • If the speculation is correct, the thread commits;
    otherwise it rolls back.
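The commit protocol in these bullets can be simulated sequentially. The sketch below is a toy model under stated assumptions: there is a single shared value, each thread body is a pure function of that value, and a rollback is modelled as re-executing the body with the value actually found in memory. The name `speculative_run` and its parameters are hypothetical.

```python
def speculative_run(initial, bodies, speculated):
    """Return (final memory value, number of rollbacks)."""
    # Phase 1: every thread executes with the value it speculated.
    results = [body(guess) for body, guess in zip(bodies, speculated)]
    # Phase 2: threads commit in sequential order.
    memory = initial
    rollbacks = 0
    for k, body in enumerate(bodies):
        if memory != speculated[k]:
            rollbacks += 1             # wrong guess: discard and re-execute
            results[k] = body(memory)
        memory = results[k]            # commit
    return memory, rollbacks


bodies = [lambda a: a + 1, lambda a: a * 2, lambda a: a + 3]
final_value, rollback_count = speculative_run(1, bodies, [1, 2, 6])
```

Here the third thread guessed 6 while memory holds 4 at its commit point, so it rolls back once and the final value matches sequential execution (1 → 2 → 4 → 7).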

[Figure: the loop for (i = 1; i <= 3; i++) runs as threads T1, T2 and T3 on Core 1, Core 2 and Core 3; each thread holds a speculated value of A, which is compared with the value of A in Main Memory.]
28
  • The Dependence Set = Ø condition is necessary because
    we consider in this paper only DOALL
    parallelization (no synchronizations).
  • Ref: Page 7 of Report 2
29
  • We have changed its name from RT_LMAD (run-time
    LMAD) to USR because it is used mainly as an
    intermediate representation subject to our
    predicate extraction analysis.
  • Ref: Page 5 of Report 2
30
Sample Tracing of Expr Tree
  • a = expr
  • Loop (j = 1 to 10)
  •   If (a > 5)
  •     B(j+5) = ..
  •   End If
  • End Loop

[Expression tree node labels: a > 5, j = 1 to 10, j+5.]
31
Triplet Notation
Precise when dim. indexing is independent
  • for (i = 0; i <= 2; i++)
  •   for (j = 0; j <= 3; j++)
  •     a[i][j] = ..

A(0:2:1, 0:3:1)
Imprecise when dim. indexing is dependent
for (i = 0; i <= 2; i++) for (j = 0; j <= 3; j++)
  a[i][i+j] = ..
A(0 2 1, 3 1)
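The loss of precision can be measured by comparing the elements a loop actually touches with the elements covered by a per-dimension bounding triplet. The sketch below assumes unit increments and inclusive bounds; `triplet_elements` and `bounding_triplets` are illustrative helpers, and the bounding triplet they compute is an assumption of this sketch rather than the descriptor shown on the slide.

```python
from itertools import product


def triplet_elements(triplets):
    """Expand (lb, ub, incr) triplets, one per dimension, into index tuples."""
    ranges = [range(lb, ub + 1, incr) for lb, ub, incr in triplets]
    return set(product(*ranges))


def bounding_triplets(points):
    """Smallest unit-stride triplet per dimension that covers all points."""
    return [(min(dim), max(dim), 1) for dim in zip(*points)]


independent = {(i, j) for i in range(3) for j in range(4)}      # a[i][j]
dependent = {(i, i + j) for i in range(3) for j in range(4)}    # a[i][i+j]

exact_cover = triplet_elements(bounding_triplets(independent))
loose_cover = triplet_elements(bounding_triplets(dependent))
```

For a[i][j] the triplet covers exactly the 12 accessed elements. For a[i][i+j] the bounding box has 18 elements, 6 of which are never accessed, because the column range depends on the row index.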
32
LMAD Notation
for (i = 0; i <= 2; i++) for (j = 0; j <= 3; j++)
  a[i][i+j] = ..
Let size of A be A[3][10].
Address computation: i*10 + (j+i) = 11i + j
Eliminate j variable:
  Stride for j = (11i + (j+1)) - (11i + j) = 1
  Span for j = (11i + 3) - (11i + 0) + 1 = 4
  Offset for j = 11i + 0 = 11i
Eliminate i variable:
  Stride for i = 11(i+1) - 11i = 11
  Span for i = 11*2 - 11*0 + 1 = 23
  Offset for i = 11*0 = 0
LMAD: A(1, 4, 11, 23)
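The descriptor A(1, 4, 11, 23) can be checked against the loop it summarizes. The sketch below assumes the slide's convention that a span is last address minus first address plus one, so each (stride, span) pair contributes the offsets 0, stride, 2*stride, ... below the span. The function names are hypothetical.

```python
def loop_addresses(rows=3, cols=4, row_size=10):
    """Linearized addresses of a[i][i+j] for i = 0..2, j = 0..3, rows of 10."""
    return sorted(i * row_size + (i + j)
                  for i in range(rows) for j in range(cols))


def lmad_addresses(dims, offset=0):
    """Expand (stride, span) pairs into the set of addresses they describe."""
    addresses = {offset}
    for stride, span in dims:
        addresses = {a + k for a in addresses for k in range(0, span, stride)}
    return sorted(addresses)


from_loop = loop_addresses()
from_lmad = lmad_addresses([(1, 4), (11, 23)])
```

Both expansions give the addresses 0..3, 11..14 and 22..25, so unlike a bounding triplet the LMAD describes this access pattern exactly.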
33
Memory set aggregation
RO, WF, RW
Section 1
RO1, WF1, RW1
Section 1
RO2, WF2, RW2
34
Sample Trace of PDAG extraction
[Figure: PDAG extracted from Empty? tests over the ranges 1 : n, 41 : 40+n and 21 : 20+n, together with the condition x > 0. Extracted predicates: n <= 40, n <= 20 and x > 0.]
35
Vector inspection example
  • Main()
  •   Read C(1:N), L, lim
  •   Do i = 1 to L
  •     If C(I) < lim then
  •       C(I) = ..
  •     End If
  •     .. = C(I)
  •   End Do
36
LRPD Test Example
  • Main()
  •   Read C(1:N), A(1:N), L
  •   Do i = 1 to L
  •     C(A(I)) = ..
  •     .. = C(I)
  •   End Do
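For this loop the write target C(A(I)) is known only at run time, so independence has to be checked against the actual contents of A. The sketch below is a simplified shadow-array check in the spirit of such run-time tests; it is not the full LRPD test, which runs speculatively and also handles privatization and reductions. The function name and the 1-based list layout are assumptions.

```python
def shadow_array_check(A, L):
    """Return True if no element of C is written in one iteration and
    written or read in a different one. A is 1-based (A[0] is unused)."""
    size = max(max(A[1:L + 1]), L)
    writer = [0] * (size + 1)          # shadow array: iteration that wrote C(k)
    for i in range(1, L + 1):          # mark the writes C(A(i)) = ..
        if writer[A[i]] not in (0, i):
            return False               # two iterations write the same element
        writer[A[i]] = i
    for i in range(1, L + 1):          # check the reads .. = C(i)
        if writer[i] not in (0, i):
            return False               # read here, written in another iteration
    return True
```

Both passes are linear in L, which is consistent with the O(n) cost quoted for run-time analysis earlier in the slides.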
37
Speculative Execution example
  • Main()
  •   Read C(1:N), A(1:N), L
  •   Do i = 1 to L
  •     A(i-1)
  •     C(A(I)) = ..
  •     .. = C(I)
  •   End Do
38
Code Coverage
Source: http://parasol.tamu.edu/
39
Speedup for ADM Benchmark
Source: http://parasol.tamu.edu/
40
Speedup for DYFESM Benchmark
Source: http://parasol.tamu.edu/
41
Speedup for MDG
Source: http://parasol.tamu.edu/
42
Speedup for TRACK
Source: http://parasol.tamu.edu/
43
Advanced Compiler Architecture
  • Common Subexpr elim.
  • Copy propagation
  • Dead code elimination
  • Code motion
  • Strength reduction
  • Constant folding
  • GCD test
  • Range test
