Title: Commutativity Analysis for Software Parallelization: Letting Program Transformations See the Big Picture
1 Commutativity Analysis for Software Parallelization: Letting Program Transformations See the Big Picture
- Farhana Aleen, Nate Clark
- Georgia Institute of Technology
- Modified by Michelle Goodstein
- LBA Reading Group 6/4/09
2 Motivation
Extracting performance from multi-core processors is hard
Programmers need to write parallel programs
Automatic compiler-based parallelization helps
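3 Source of Parallelism: Commutativity
[Figure: calling sum(5) then sum(10) yields 15, and calling sum(10) then sum(5) also yields 15. Likewise, an application that calls foo(a) then foo(b), or foo(b) then foo(a), produces the same output.]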
4 Existing Approach to Detecting Commutativity
- Execute the function in two different orders
- Check equivalence of memory
[Figure: sum(x) then sum(y) produces x + y; sum(y) then sum(x) produces y + x. The two resulting memory states are compared.]
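The check described on this slide can be sketched as follows. This is a minimal illustration using concrete execution on one pair of inputs, not the tool's implementation. The `Accumulator` class and the `same_memory_in_both_orders` helper are hypothetical names, and "memory" is approximated by the object's attribute dictionary.

```python
import copy


class Accumulator:
    """Hypothetical stand-in for the slide's sum(): adds each argument to a running total."""

    def __init__(self):
        self.total = 0

    def sum(self, value):
        self.total += value


def same_memory_in_both_orders(initial, method_name, arg1, arg2):
    """Run method(arg1); method(arg2) and the reverse order on copies of `initial`."""
    first = copy.deepcopy(initial)
    second = copy.deepcopy(initial)
    getattr(first, method_name)(arg1)
    getattr(first, method_name)(arg2)
    getattr(second, method_name)(arg2)
    getattr(second, method_name)(arg1)
    # Assumption: comparing attribute dictionaries approximates comparing memory.
    return vars(first) == vars(second)


result = same_memory_in_both_orders(Accumulator(), "sum", 5, 10)
print(result)  # True
```

Both orders leave `total` at 15, so this check reports the two calls as commutative.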
5 Opportunities Missed by Existing Approach
Insertion of elements into a hash set (vector<linked-list>)
[Figure: insert(2) then insert(6), and insert(6) then insert(2), leave the elements 2 and 6 in a different order in memory.]
6 The Idea
[Figure: after insert(2), insert(6), remove(6), the call is_member(2) answers "Yes!". After insert(6), insert(2), remove(6), the call is_member(2) also answers "Yes!".]
class hash_set { vector<linked_list> set; insert(); remove(); is_member(); }
- Identical memory does not matter
- Final output matters
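The point that memory can differ while every observable answer stays the same can be illustrated with a small chained hash set. This is a sketch under assumptions: the `HashSet` class, its bucket count, and its head-insertion policy are hypothetical, and Python lists stand in for the linked lists.

```python
class HashSet:
    """Hypothetical chained hash set: a list of buckets, each a list acting as a chain."""

    def __init__(self, num_buckets=4):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]

    def insert(self, key):
        bucket = self._bucket(key)
        if key not in bucket:
            bucket.insert(0, key)  # assumption: new elements go to the head of the chain

    def remove(self, key):
        bucket = self._bucket(key)
        if key in bucket:
            bucket.remove(key)

    def is_member(self, key):
        return key in self._bucket(key)


a, b = HashSet(), HashSet()
a.insert(2)
a.insert(6)
b.insert(6)
b.insert(2)

# 2 and 6 hash to the same bucket, so the chains end up in opposite orders.
memory_differs = a.buckets != b.buckets
same_answers = all(a.is_member(k) == b.is_member(k) for k in range(8))

a.remove(6)
b.remove(6)
same_after_remove = a.buckets == b.buckets and a.is_member(2) and b.is_member(2)

print(memory_differs, same_answers, same_after_remove)  # True True True
```

A memory-equality check alone would reject `insert` here, even though no caller of `is_member` or `remove` can tell the two orders apart.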
7 Our Approach: Step 1
- Symbolically execute in two different orders
- Check for the identical memory layout
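[Figure: starting from the same memory M, insert(I1) followed by insert(I2) produces M1,2, while insert(I2) followed by insert(I1) produces M2,1. Is M1,2 identical to M2,1?]
If the two layouts are not identical, check reader functions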
8 Step 2: Checking Reader Functions
[Figure: the readers is_member() and remove() are each run with the same input I on M1,2 and on M2,1, and their results are compared.]
- insert(): candidate function
- is_member(), remove(): readers of the candidate function's output
- Readers of the readers' output
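The two steps can be sketched together as below. This is a simplified illustration, not the authors' algorithm: it executes concretely on fixed arguments instead of symbolically, it compares only the readers' return values, and it does not follow readers of the readers' output. All names (`commutes`, `first`, the list-based state) are hypothetical.

```python
import copy


def commutes(make_state, candidate, arg1, arg2, readers):
    """Step 1: compare memory after both call orders. Step 2: compare reader outputs."""
    s12, s21 = make_state(), make_state()
    candidate(s12, arg1)
    candidate(s12, arg2)
    candidate(s21, arg2)
    candidate(s21, arg1)
    if s12 == s21:
        return True  # Step 1 succeeded: identical memory
    for reader, reader_arg in readers:
        # Each reader runs on a private copy so readers do not disturb each other.
        out12 = reader(copy.deepcopy(s12), reader_arg)
        out21 = reader(copy.deepcopy(s21), reader_arg)
        if out12 != out21:
            return False  # a reader can observe the call order
    return True


def insert(state, key):
    if key not in state:
        state.append(key)


def is_member(state, key):
    return key in state


def remove(state, key):
    if key in state:
        state.remove(key)
        return True
    return False


def first(state, _unused):
    return state[0]  # exposes insertion order


set_readers = [(is_member, 2), (is_member, 6), (remove, 6)]
with_set_readers = commutes(list, insert, 2, 6, set_readers)
with_order_reader = commutes(list, insert, 2, 6, set_readers + [(first, None)])
print(with_set_readers, with_order_reader)  # True False
```

With only set-like readers, `insert` is reported commutative despite the differing memory. Adding a reader that exposes insertion order makes the same candidate non-commutative.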
9 Pros/Cons of Our Approach
- Pros:
- Identifies more commutativity
- Finds more parallelism
- Cons:
- More equivalence checking
10 Equivalence Checking Options
[Figure: Random Testing, Random Interpretation, and Symbolic Execution compared on speed and accuracy.]
11 Random Interpretation Example
Program:
  Input(x, y)
  a = x + y
  if (x != y)
    taken: b = 2x
    fall-through: b = a
  assert(b = 2x)
- Choose random values for input variables: x = 2, y = 3, giving a = 5
- Execute taken branch of the condition: b = 4
- Execute fall-through branch
- Replicate initial memory state
- Adjust values: x = 3, y = 3, a = 6, giving b = 6
- Affine join of v1 and v2 w.r.t. weight w
- $\phi_w(v_1, v_2) = w \cdot v_1 + (1 - w) \cdot v_2$
- With w = 3, the joined state is x = 5, y = 3, a = 8, b = 10, so assert(b = 2x) holds
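The join step of this example can be reproduced with a few lines of code. This is a sketch under assumptions: the values x = 2, y = 3, the adjusted state x = 3, y = 3, and the weight w = 3 come from the slide, while the helper names and the shortcut of re-running the straight-line prefix on adjusted inputs (instead of a general adjust operation) are simplifications introduced here.

```python
def affine_join(w, v1, v2):
    """Affine join of two program states: w * v1 + (1 - w) * v2 for every variable."""
    return {name: w * v1[name] + (1 - w) * v2[name] for name in v1}


def prefix(x, y):
    """Straight-line code before the branch: a = x + y."""
    return {"x": x, "y": y, "a": x + y}


# Taken branch (x != y) with the random inputs x = 2, y = 3.
taken = prefix(2, 3)
taken["b"] = 2 * taken["x"]  # b = 4

# Fall-through branch (x == y). Assumption: the adjusted state is obtained by
# re-running the prefix on x = 3, y = 3, the adjusted values shown on the slide.
fall_through = prefix(3, 3)
fall_through["b"] = fall_through["a"]  # b = 6

joined = affine_join(3, fall_through, taken)
print(joined)  # {'x': 5, 'y': 3, 'a': 8, 'b': 10}

# The linear relationship b = 2x holds on both paths, so it survives the join.
assert joined["b"] == 2 * joined["x"]
```

A single joined state stands in for both execution paths, which is why the analysis does not need to enumerate paths.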
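12 Random Interpretation in Equivalence Checking
[Figure: starting from the same initial memory, foo(x) followed by foo(y) and foo(y) followed by foo(x) each produce a modified memory, and the two modified memories are compared.]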
13 Why Random Interpretation Works
- Avoids scalability problem
- Affine join superposes all execution paths
- Linear relationships same before and after the join
- The error probability is very low, at most [...]
- Decreases the error probability exponentially
14 (Added Slide) Probability Details
- Low error probability
- In general, at most 1 bad random value / join in program
- $\Pr(\text{error}) \le (\#\,\text{joins}) / 2^{64}$
- Empirically (prior work), # of joins increases linearly in # of program statements
- Coefficient of 0.5 to 5.2
- Assume 1000 statement function, commutative
- $\Pr(\text{error}) \le (5.2 \times 1000) / 2^{64} \approx 2.8 \times 10^{-16}$
- To decrease error, increase # of runs
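The arithmetic on this slide can be checked with a short calculation. The 5.2 joins-per-statement coefficient, the 1000-statement function, and the 2^64 denominator come from the slide. The `error_bound` helper and the `runs` parameter are hypothetical, and raising the bound to the power of `runs` assumes that runs are independent and that an error requires every run to be wrong.

```python
SAMPLE_SPACE = 2 ** 64  # denominator used on the slide


def error_bound(statements, joins_per_statement=5.2, runs=1):
    """Upper bound on the error probability: (# joins / 2^64) ** runs."""
    joins = joins_per_statement * statements
    single_run = joins / SAMPLE_SPACE
    # Assumption: independent runs, so the bounds multiply.
    return single_run ** runs


bound = error_bound(1000)
print(f"{bound:.1e}")  # 2.8e-16
```

Under the independence assumption, `error_bound(1000, runs=2)` squares the single-run bound, which is the sense in which more runs reduce the error exponentially.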
15 Experimental Methodology
- Trimaran compiler
- Scheduled them
- Infinite issue machine
- Perfect memory system
- Pointer Analysis
- Stack and heap sensitive
- Tested on
- SPECint2000
- MediaBench
16 (Added) Experimental Methodology
- In some ways, an upper bound on commutativity
- Can issue as many instructions as are commutative
- Memory is perfect
- Not a true upper bound, though
- Random interpretation will sometimes fail or give up
17 (Added) Suggested Parallelism
- Suppose a sorting algorithm will print to stderr if debug flag is set
- Cannot be parallelized, because of dependences between writes
- Human can differentiate
- Compiler identifies things that are almost parallel
- Human states that the semantic changes (e.g., printf orders) do not matter → parallel
- Otherwise, ignore
18 Analysis Time: Commutativity Analysis
19 Functions Commutative
20 Parallelism Uncovered
21 Summary
- Commutativity a significant source of parallelism
- Identical memory does not matter for identifying commutative functions
- Our technique
- 13% more commutative functions detected
- 28% more parallelism uncovered
22
23 Functions Commutative
24 Parallelism Uncovered
25 Analysis Time: Commutativity Analysis