Title: Recursion Unrolling for Divide and Conquer Programs
1 Recursion Unrolling for Divide and Conquer Programs
- Radu Rugina and Martin Rinard
 - Presented by Cristian Petrescu-Prahova
 
2 Divide and Conquer
- Idea
 - Divide the problem into smaller subproblems and solve each in turn
 - Use recursion as the primary control structure
 - The base case computation terminates the recursion when a small enough size is reached
 - Combine the results to generate the solution of the original problem
- Interesting properties
 - Lots of inherent parallelism: naturally, recursively generated concurrency
 - Good cache performance: a natural fit for cache hierarchies
- In practice
 - Potentially too much time is spent in the divide/combine phases
 - Increasing the size of the base case alleviates the problem
 - But the simplest and least error-prone coding style reduces the problem to a minimum size (typically one)
- Solution: recursion unrolling
 
3 Example: Divide and Conquer Array Increment
    void dcInc(int *p, int n) {
        if (n == 1) {
            *p += 1;                  /* base case */
        } else {
            dcInc(p, n/2);            /* divide */
            dcInc(p + n/2, n/2);
        }
    }
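To make the divide overhead concrete, the following sketch instruments dcInc with a call counter. The counter dc_calls and the functions dcIncCounted and countCalls are hypothetical additions for illustration and do not appear in the original slides. The sketch also assumes that n is a power of two, which the slide code appears to rely on.

```c
#include <assert.h>

/* Hypothetical instrumentation counter, not part of the original slides. */
static long dc_calls = 0;

/* dcInc from the slide, with a call counter added for illustration. */
void dcIncCounted(int *p, int n) {
    dc_calls++;
    if (n == 1) {
        *p += 1;                          /* base case: one element */
    } else {
        dcIncCounted(p, n / 2);           /* divide: left half */
        dcIncCounted(p + n / 2, n / 2);   /* divide: right half */
    }
}

/* Increments all n elements and returns the number of calls made.
   Assumption: n is a power of two. */
long countCalls(int *p, int n) {
    assert(n >= 1 && (n & (n - 1)) == 0);
    dc_calls = 0;
    dcIncCounted(p, n);
    return dc_calls;
}
```

For n = 8 this performs 8 increments but makes 2n - 1 = 15 calls, so 7 of the 15 calls only divide the problem.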
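4 Inlining Recursive Calls
    void dcIncI(int *p, int n) {
        if (n == 1) {
            *p += 1;                              /* base case */
        } else {
            if (n/2 == 1) {
                *p += 1;                          /* inlined base case */
            } else {
                dcIncI(p, n/2/2);                 /* divide */
                dcIncI(p + n/2/2, n/2/2);
            }
            if (n/2 == 1) {
                *(p + n/2) += 1;                  /* inlined base case */
            } else {
                dcIncI(p + n/2, n/2/2);           /* divide */
                dcIncI(p + n/2 + n/2/2, n/2/2);
            }
        }
    }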
5 Conditional Fusion
Before fusion, dcIncI (previous slide) tests n/2 == 1 twice in a row. Fusing the two conditionals into one gives dcIncF:

    void dcIncF(int *p, int n) {
        if (n == 1) {
            *p += 1;                              /* base case */
        } else {
            if (n/2 == 1) {
                *p += 1;                          /* fused base case */
                *(p + n/2) += 1;
            } else {
                dcIncI(p, n/2/2);                 /* divide */
                dcIncI(p + n/2/2, n/2/2);
                dcIncI(p + n/2, n/2/2);
                dcIncI(p + n/2 + n/2/2, n/2/2);
            }
        }
    }
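The fusion step can be isolated in a small sketch. The function names incPairUnfused and incPairFused are hypothetical and only illustrate the pattern; they are not taken from the slides. The fusion is valid here because the test n/2 == 1 has no side effects and the first branch writes only through p, never to n, so the second test is guaranteed to give the same answer as the first.

```c
/* Before fusion: the same side-effect-free test is evaluated twice. */
void incPairUnfused(int *p, int n) {
    if (n / 2 == 1) {
        *p += 1;
    }
    if (n / 2 == 1) {
        *(p + n / 2) += 1;
    }
}

/* After fusion: one test guards both statements.
   Valid because nothing between the two tests modifies n. */
void incPairFused(int *p, int n) {
    if (n / 2 == 1) {
        *p += 1;
        *(p + n / 2) += 1;
    }
}
```

Both versions leave the array in the same state for any n; the fused one simply evaluates the condition once.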
6 Second Unrolling Iteration and Rerolling
Second unrolling iteration (dcInc2):

    void dcInc2(int *p, int n) {
        if (n == 1) {
            *p += 1;
        } else {
            if (n/2 == 1) {
                *p += 1;
                *(p + n/2) += 1;
            } else {
                if (n/2/2 == 1) {
                    *p += 1;
                    *(p + n/2/2) += 1;
                    *(p + n/2) += 1;
                    *(p + n/2 + n/2/2) += 1;
                } else {
                    dcIncI(p, n/2/2/2);
                    dcIncI(p + n/2/2/2, n/2/2/2);
                    dcIncI(p + n/2/2, n/2/2/2);
                    dcIncI(p + n/2/2 + n/2/2/2, n/2/2/2);
                    dcIncI(p + n/2, n/2/2/2);
                    dcIncI(p + n/2 + n/2/2/2, n/2/2/2);
                    dcIncI(p + n/2 + n/2/2, n/2/2/2);
                    dcIncI(p + n/2 + n/2/2 + n/2/2/2, n/2/2/2);
                }
            }
        }
    }
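After rerolling (dcIncR):

    void dcIncR(int *p, int n) {
        if (n == 1) {
            *p += 1;
        } else {
            if (n/2 == 1) {
                *p += 1;
                *(p + n/2) += 1;
            } else {
                if (n/2/2 == 1) {
                    *p += 1;
                    *(p + n/2/2) += 1;
                    *(p + n/2) += 1;
                    *(p + n/2 + n/2/2) += 1;
                } else {
                    dcIncR(p, n/2);
                    dcIncR(p + n/2, n/2);
                }
            }
        }
    }
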
We need rerolling to ensure that the largest 
unrolled base case is always executed. 
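The effect of rerolling can be checked with a small experiment. This is a sketch under stated assumptions: the hits counters and the function names unrolled2, rerolled and largestBaseCase are hypothetical instrumentation, the nested conditionals of the slides are written as an equivalent else-if chain, the unrolled version is made self-recursive so the example is self-contained, and n is assumed to be a power of two.

```c
#include <assert.h>

/* Hypothetical counters: hits[k] counts executions of the size-k base case. */
static int hits[5];

/* Twice-unrolled version without rerolling: n shrinks by a factor of 8. */
void unrolled2(int *p, int n) {
    if (n == 1) {
        hits[1]++;
        *p += 1;
    } else if (n/2 == 1) {
        hits[2]++;
        *p += 1;
        *(p + n/2) += 1;
    } else if (n/2/2 == 1) {
        hits[4]++;
        *p += 1;
        *(p + n/2/2) += 1;
        *(p + n/2) += 1;
        *(p + n/2 + n/2/2) += 1;
    } else {
        unrolled2(p, n/2/2/2);
        unrolled2(p + n/2/2/2, n/2/2/2);
        unrolled2(p + n/2/2, n/2/2/2);
        unrolled2(p + n/2/2 + n/2/2/2, n/2/2/2);
        unrolled2(p + n/2, n/2/2/2);
        unrolled2(p + n/2 + n/2/2/2, n/2/2/2);
        unrolled2(p + n/2 + n/2/2, n/2/2/2);
        unrolled2(p + n/2 + n/2/2 + n/2/2/2, n/2/2/2);
    }
}

/* Rerolled version: same base cases, but n shrinks by a factor of 2. */
void rerolled(int *p, int n) {
    if (n == 1) {
        hits[1]++;
        *p += 1;
    } else if (n/2 == 1) {
        hits[2]++;
        *p += 1;
        *(p + n/2) += 1;
    } else if (n/2/2 == 1) {
        hits[4]++;
        *p += 1;
        *(p + n/2/2) += 1;
        *(p + n/2) += 1;
        *(p + n/2 + n/2/2) += 1;
    } else {
        rerolled(p, n/2);
        rerolled(p + n/2, n/2);
    }
}

/* Runs f on p[0..n-1] and returns the size of the largest base case hit. */
int largestBaseCase(void (*f)(int *, int), int *p, int n) {
    assert(n >= 1 && (n & (n - 1)) == 0);
    hits[1] = hits[2] = hits[4] = 0;
    f(p, n);
    if (hits[4] > 0) return 4;
    if (hits[2] > 0) return 2;
    return 1;
}
```

With n = 16, unrolled2 jumps from 16 directly to 2 and therefore only ever runs the size-2 base case, while rerolled passes through 8 and 4 and runs the size-4 base case.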
7 Algorithm
    Algorithm RecursionUnrolling (Proc f, Int m)
        f_unroll[0] = clone(f)
        for (i = 1; i <= m; i++)
            f_unroll[i] = RecursionInline(f_unroll[i-1], f)
            f_unroll[i] = ConditionalFusion(f_unroll[i])
        f_reroll[m] = RecursionRerolling(f_unroll[m], f)
        return f_reroll[m]
8 Implementation details
- Recursion unrolling
 - Standard procedure inlining
 - Increases the code size exponentially; must be used with care
- Conditional fusion
 - Bottom-up traversal of the HTG with conditional matching
- Recursion rerolling
 - Replaces the recursion block of the unrolled procedure with the recursion block of the rolled procedure if the conditional sequence of the unrolled procedure implies the conditional sequence of the rolled procedure
- Simple transformations!
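The exponential growth can be quantified with a short calculation. The sketch below assumes that every recursive call site is inlined at each iteration and counts call sites before rerolling; the function name callSitesAfterUnrolling is hypothetical. Each iteration replaces every call site with a body that itself contains k call sites, so a procedure with k recursive calls has k^(m+1) call sites after m iterations.

```c
#include <assert.h>

/* Recursive call sites after m inlining iterations for a procedure with
   k recursive calls, before rerolling. Assumes every site is inlined. */
long callSitesAfterUnrolling(int k, int m) {
    assert(k >= 1 && m >= 0);
    long sites = k;                  /* iteration 0: a plain clone of f */
    for (int i = 1; i <= m; i++) {
        sites *= k;                  /* each site expands into k sites */
    }
    return sites;
}
```

For the array increment (k = 2) this gives 2, 4 and 8 call sites for m = 0, 1 and 2, matching dcInc, dcIncI and dcInc2. For a procedure with 8 recursive calls, two iterations already give 512 call sites.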
 
9 Experiments
- Programs
 - Mul: divide and conquer matrix multiplication
  - 1 recursive procedure with 8 recursive calls
  - Base case size: 1 element
 - LU: divide and conquer LU decomposition
  - 4 mutually recursive procedures; the main procedure has 8 recursive calls
  - Base case size: 1 element
- Implementation
 - C-to-C transformations in SUIF
- Comparison
 - Handcoded divide and conquer programs from the Cilk benchmark set (designed for thread parallelization)
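For readers unfamiliar with the shape of such a program, here is a minimal sketch of a divide and conquer matrix multiplication with one recursive procedure, 8 recursive calls and a base case of one element. It is not the authors' Mul benchmark: the name dcMul, the row-major layout with row stride ld, and the accumulate-into-C convention are all assumptions made for illustration, and n is assumed to be a power of two.

```c
#include <assert.h>

/* Computes C += A * B for n x n blocks stored row-major with row
   stride ld. Hypothetical sketch; assumes n is a power of two. */
void dcMul(const double *A, const double *B, double *C, int n, int ld) {
    assert(n >= 1 && (n & (n - 1)) == 0);
    if (n == 1) {
        *C += (*A) * (*B);           /* base case: one element */
        return;
    }
    int h = n / 2;
    /* Divide: the four quadrants of each matrix. */
    const double *A11 = A,          *A12 = A + h;
    const double *A21 = A + h * ld, *A22 = A + h * ld + h;
    const double *B11 = B,          *B12 = B + h;
    const double *B21 = B + h * ld, *B22 = B + h * ld + h;
    double *C11 = C,          *C12 = C + h;
    double *C21 = C + h * ld, *C22 = C + h * ld + h;
    /* Eight recursive calls, two per quadrant of C. */
    dcMul(A11, B11, C11, h, ld);
    dcMul(A12, B21, C11, h, ld);
    dcMul(A11, B12, C12, h, ld);
    dcMul(A12, B22, C12, h, ld);
    dcMul(A21, B11, C21, h, ld);
    dcMul(A22, B21, C21, h, ld);
    dcMul(A21, B12, C22, h, ld);
    dcMul(A22, B22, C22, h, ld);
}
```

With a one-element base case, every scalar multiply-add costs a procedure call, which is the overhead recursion unrolling is meant to reduce.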
10 Results
11 Conclusion
- Recursion unrolling is similar to loop unrolling
 - Basic recursion unrolling reduces the overhead of procedure calls
- Extra optimizations
 - Conditional fusion simplifies the control flow
 - Recursion rerolling ensures that the biggest unrolled base case is always executed
- The performance of the optimized programs is close to that of the handcoded programs