A Language for the Compact Representation of Multiple Program Versions

1
A Language for the Compact Representation of
Multiple Program Versions
  • Sébastien Donadio (1,2), James Brodman (3), Thomas Roeder (4),
  • Kamen Yotov (4), Denis Barthou (2), Albert Cohen (5),
  • María Jesús Garzarán (3), David Padua (3), and Keshav Pingali (4)

(1) BULL S.A.
(2) University of Versailles
(3) University of Illinois at Urbana-Champaign
(4) Cornell University
(5) INRIA Futurs

International Workshop LCPC 2005
2
Outline
  • Context in optimization for high performance
  • Goals of this language
  • Features of this language
  • Examples (DAXPY and DGEMM)
  • Conclusion

3
Context
  • Complex architecture and fragile optimizations
  • Unpredictable performance
  • Architecture- and domain-specific optimizations
  • Resort to empirical search
  • Complement general-purpose optimizations with
    user-driven ones

4
Example: FFT performance
  • Best available implementations (FFTW, Intel IPP, Spiral)
  • Reasonable implementations (Numerical Recipes, GNU Scientific Library)
5
Goals of X-Language
  • Tool to help programmers generate and evaluate
    multiple versions of their programs
  • Applying control and data structure
    transformations
  • Trying multiple transformation sequences and
    parameters
  • Evaluating performance of each version and taking
    decisions about which transformation variants to
    try

6
Goals of X-Language (cont.)
  • The code must be portable across ISO C
    compilers
  • Use pragma annotations for the above tasks
  • Observable program semantics not altered by the
    interpretation of these pragmas (assuming
    transformation legality)

7
Comparison with related work
8
Features of the language
  • Elementary transformations (fission, strip mining,
    interchange, unrolling, ...)
  • Composition of transformations
  • Conditional transformations (versioning)
  • Procedural abstraction of transformations
  • A mechanism to define new transformations
  • No validity check is performed on the
    transformations

9
General schema of X-Language
  • Inputs: code with pragmas and transformation descriptions
  • Search produces different versions
  • Compile each version
  • Execute and measure performance
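
The "execute and measure" step of this schema can be illustrated with a small timing harness. This is only a sketch under stated assumptions: the three kernel versions, the names `time_version`, `pick_fastest`, and `search_demo`, and the repetition count are all hypothetical and are not part of X-Language. In the real flow the versions would be generated from the pragmas and compiled separately.

```c
#include <stddef.h>
#include <time.h>

#define N 4096

static double x[N], y[N];

/* Three hand-written stand-ins for generated versions of y += 2*x. */
static void version_plain(void) {
    for (int i = 0; i < N; i++) y[i] += 2.0 * x[i];
}

static void version_unroll2(void) {
    for (int i = 0; i < N; i += 2) {
        y[i]     += 2.0 * x[i];
        y[i + 1] += 2.0 * x[i + 1];
    }
}

static void version_unroll4(void) {
    for (int i = 0; i < N; i += 4) {
        y[i]     += 2.0 * x[i];
        y[i + 1] += 2.0 * x[i + 1];
        y[i + 2] += 2.0 * x[i + 2];
        y[i + 3] += 2.0 * x[i + 3];
    }
}

typedef void (*kernel_fn)(void);

/* Run one version `reps` times and return the elapsed CPU time in seconds. */
static double time_version(kernel_fn f, int reps) {
    clock_t start = clock();
    for (int r = 0; r < reps; r++) f();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Return the index of the version with the smallest measured time. */
size_t pick_fastest(const kernel_fn *versions, size_t count, int reps) {
    size_t best = 0;
    double best_time = time_version(versions[0], reps);
    for (size_t v = 1; v < count; v++) {
        double t = time_version(versions[v], reps);
        if (t < best_time) {
            best_time = t;
            best = v;
        }
    }
    return best;
}

/* Measure the three stand-in versions and report the winner's index. */
size_t search_demo(void) {
    const kernel_fn versions[] = { version_plain, version_unroll2, version_unroll4 };
    return pick_fastest(versions, 3, 200);
}
```

Which index wins depends on the machine and compiler, which is the reason the slides resort to empirical search.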
10
X-Language
  • Naming loops or scopes
  • #pragma xlang name loop1
  • for (i = 0; i < 10; i++) a[i] = 4;
  • Format of a transformation
  • #pragma xlang stripmine loop1 4 ii

Fields of the transformation pragma, in order:
  • #pragma xlang: prefix
  • stripmine: transformation name
  • loop1: loop name
  • 4: parameters
  • ii: name of the additional loops generated by the transformation
11
Elementary transformations implemented in
X-language
  • Full unrolling
  • Partial unrolling
  • Scalar promotion
  • Interchange
  • Loop fission
  • Loop fusion
  • Strip mining
  • Lifting
  • Software pipelining

12
Applying a transformation
  • #pragma xlang loop1
  • for (i = min; i < 4 * max; i++)
  • a[i] = b[i];
  • #pragma xlang stripmine loop1 4 ii
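
As an illustration, the effect of strip mining loop1 by 4 with a new loop named ii could look like the sketch below. The function names, the array length, and the body `a[i] = b[i]` are assumptions made for this example, and the trip count is assumed to be a multiple of 4 so that no remainder loop is needed. The code actually emitted by the tool may differ.

```c
#include <string.h>

#define MAX_LEN 64

/* Original loop (the one named loop1); the body a[i] = b[i] is assumed. */
void copy_original(double *a, const double *b, int min, int max) {
    for (int i = min; i < max; i++)
        a[i] = b[i];
}

/* Strip-mined by 4: ii walks over strips, i walks inside one strip.
   Assumes (max - min) is a multiple of 4. */
void copy_stripmined(double *a, const double *b, int min, int max) {
    for (int ii = min; ii < max; ii += 4)
        for (int i = ii; i < ii + 4; i++)
            a[i] = b[i];
}

/* Returns 1 when both versions write exactly the same values. */
int stripmine_matches(void) {
    double b[MAX_LEN], a1[MAX_LEN] = {0}, a2[MAX_LEN] = {0};
    for (int i = 0; i < MAX_LEN; i++) b[i] = (double)i;
    copy_original(a1, b, 8, 40);
    copy_stripmined(a2, b, 8, 40);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

Strip mining alone does not change the iteration order; it only exposes an inner loop of fixed length that later transformations (unrolling, interchange) can target by name.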

13
How to search for parameter values?
  • Using multistage evaluation
  • External script
  • for (k = 1; k < 16; k = 2 * k)
  • #pragma xlang loop1
  • for (i = min; i < max; i++)
  • a[i] = b[i];
  • #pragma xlang stripmine loop1 d(k) ii
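
One way to picture the outer generation loop is a small generator that emits one pragma per candidate factor. This is a hypothetical sketch, not the X-Language implementation: the function `emit_stripmine_pragmas`, the 64-byte line buffer, and the choice to substitute the value of k directly into the pragma text are assumptions for illustration.

```c
#include <stdio.h>

/* Write one strip-mining pragma per candidate factor k = 1, 2, 4, 8
   into out[], and return how many lines were produced. */
int emit_stripmine_pragmas(char out[][64], int capacity) {
    int count = 0;
    for (int k = 1; k < 16 && count < capacity; k = 2 * k) {
        snprintf(out[count], 64, "#pragma xlang stripmine loop1 %d ii", k);
        count++;
    }
    return count;
}
```

Each emitted line corresponds to one program version that would then be compiled and measured.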

14
Composing transformations
  • #pragma xlang loop1
  • for (i = 0; i < 4; i++)
  • #pragma xlang loop2
  • for (j = min2; j < max2; j++)
  • a[i] = b[j];
  • #pragma xlang interchange loop1 loop2
  • #pragma xlang fullunroll loop1
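
The composition above (interchange, then full unrolling of loop1) can be sketched by hand as follows. The accumulating body `a[i] += b[j]` is an assumption chosen so that both loops contribute to the result, and the function names are hypothetical.

```c
#include <string.h>

#define LEN 16

/* Original nest: loop1 over i is outer, loop2 over j is inner.
   The accumulating body is an assumption for this sketch. */
void nest_original(double a[4], const double *b, int min2, int max2) {
    for (int i = 0; i < 4; i++)
        for (int j = min2; j < max2; j++)
            a[i] += b[j];
}

/* After "interchange loop1 loop2" the j loop is outermost; after
   "fullunroll loop1" the four i iterations are written out explicitly. */
void nest_transformed(double a[4], const double *b, int min2, int max2) {
    for (int j = min2; j < max2; j++) {
        a[0] += b[j];
        a[1] += b[j];
        a[2] += b[j];
        a[3] += b[j];
    }
}

/* Returns 1 when both nests produce identical results. */
int compose_matches(void) {
    double b[LEN], a1[4] = {0}, a2[4] = {0};
    for (int j = 0; j < LEN; j++) b[j] = (double)(j + 1);
    nest_original(a1, b, 2, 12);
    nest_transformed(a2, b, 2, 12);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

The interchange is legal here because each a[i] still receives its additions in the same j order. X-Language itself performs no such validity check, so that reasoning is left to the programmer.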

15
Analyses and Transformations
  • Static analyses should also enable the design of
    smarter (higher level) transformation primitives
  • External tool to find information

16
Example with analysis
for (i = 2; i < 2 * N; i += 2) {
  u[i] = u[i-1] + u[i-2];
  u[i+1] = u[i] + u[i-1];
}
17
Extending the X-Language
Rewriting rule

Pattern before:
#pragma xlang name iloop
for (i = 0; i < N; i++) { <body> }

Pattern after transformation:
#pragma xlang name iiloop1
for (ii = 0; ii < (N/4)*4; ii += 4)
  #pragma xlang name iloop1
  for (i = ii; i < ii+4; i++) { <body> }
#pragma xlang name iloop2
for (i = (N/4)*4; i < N; i++) { <body> }
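
The rewriting rule can be checked on a concrete body. In this sketch the body is assumed to be a simple accumulation `s += v[i]`, and the function names are hypothetical; the loop structure follows the pattern: a strip-mined part covering (N/4)*4 iterations, then a remainder loop for the rest.

```c
/* Pattern before: a single loop over 0..n-1. */
long sum_original(const int *v, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Pattern after: strips of 4 (iiloop1 / iloop1), then the remainder (iloop2). */
long sum_rewritten(const int *v, int n) {
    long s = 0;
    int ii, i;
    for (ii = 0; ii < (n / 4) * 4; ii += 4)
        for (i = ii; i < ii + 4; i++)
            s += v[i];
    for (i = (n / 4) * 4; i < n; i++)
        s += v[i];
    return s;
}

/* Returns 1 when both patterns agree for every length from 0 to 32. */
int rewrite_matches(void) {
    int v[32];
    for (int i = 0; i < 32; i++) v[i] = 3 * i + 1;
    for (int n = 0; n <= 32; n++)
        if (sum_original(v, n) != sum_rewritten(v, n))
            return 0;
    return 1;
}
```

The remainder loop is what makes the rule valid for any N, not only for multiples of 4.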
18
Daxpy Example
  • #pragma xlang name loop1
  • for (k = 0; k < 2000; k++)
  • Y[k] = alpha * X[k] + Y[k];
  • We can modify the value of N
  • /* A few values tested for the unrolling factor:
    different generated versions */
  • #pragma xlang transform stripmine loop1 k N
  • #pragma xlang transform scalarize-in X in loop1
  • #pragma xlang transform lift l1.loads before
    loop1
  • #pragma xlang transform scalarize-out Y in loop1
  • #pragma xlang transform lift loop1.loads before
    loop1
  • #pragma xlang transform lift loop1.stores after
    loop1
  • #pragma xlang transform fullunroll loop1.loads
  • #pragma xlang transform fullunroll loop1.stores
  • #pragma xlang transform fullunroll loop1

19
Daxpy Example: Different generated versions

Unrolling factor 2:
for (k = 0; k < 2000; k = k + 2) {
  double x_0 = X[k+0];
  double x_1 = X[k+1];
  double y_0 = Y[k+0];
  double y_1 = Y[k+1];
  y_0 = alpha * x_0 + y_0;
  y_1 = alpha * x_1 + y_1;
  Y[k+0] = y_0;
  Y[k+1] = y_1;
}

Unrolling factor 4:
for (k = 0; k < 2000; k = k + 4) {
  double x_0 = X[k+0]; double x_1 = X[k+1];
  double x_2 = X[k+2]; double x_3 = X[k+3];
  double y_0 = Y[k+0]; double y_1 = Y[k+1];
  double y_2 = Y[k+2]; double y_3 = Y[k+3];
  y_0 = alpha * x_0 + y_0; y_1 = alpha * x_1 + y_1;
  y_2 = alpha * x_2 + y_2; y_3 = alpha * x_3 + y_3;
  Y[k+0] = y_0; Y[k+1] = y_1; Y[k+2] = y_2;
  ...
}

Unrolling factor 8:
for (k = 0; k < 2000; k = k + 16) {
  double x_0 = X[k+0]; double x_1 = X[k+1];
  double x_2 = X[k+2];
  ...
  y_0 = alpha * x_0 + y_0;
  y_1 = alpha * x_1 + y_1; y_2 = alpha * x_2 + y_2;
  y_3 = alpha * x_3 + y_3;
  ...
  Y[k+0] = y_0; Y[k+1] = y_1;
  Y[k+2] = y_2; Y[k+3] = y_3;
  ...
}
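
To make the generated versions concrete, the sketch below writes out the unroll-by-4 variant in full and compares it with the plain loop. It is a hand-written illustration, not tool output; the function names and the test data (small integer values, so the arithmetic is exact) are assumptions.

```c
#include <string.h>

#define LEN 2000

/* Reference DAXPY: Y = alpha * X + Y. */
void daxpy_reference(double alpha, const double *X, double *Y) {
    for (int k = 0; k < LEN; k++)
        Y[k] = alpha * X[k] + Y[k];
}

/* Unrolled by 4 with loads lifted before and stores lifted after the
   arithmetic, mirroring the scalarize/lift/fullunroll sequence. */
void daxpy_unroll4(double alpha, const double *X, double *Y) {
    for (int k = 0; k < LEN; k = k + 4) {
        double x_0 = X[k + 0], x_1 = X[k + 1];
        double x_2 = X[k + 2], x_3 = X[k + 3];
        double y_0 = Y[k + 0], y_1 = Y[k + 1];
        double y_2 = Y[k + 2], y_3 = Y[k + 3];
        y_0 = alpha * x_0 + y_0;
        y_1 = alpha * x_1 + y_1;
        y_2 = alpha * x_2 + y_2;
        y_3 = alpha * x_3 + y_3;
        Y[k + 0] = y_0;
        Y[k + 1] = y_1;
        Y[k + 2] = y_2;
        Y[k + 3] = y_3;
    }
}

/* Returns 1 when the unrolled version matches the reference. */
int daxpy_matches(void) {
    static double X[LEN], Y1[LEN], Y2[LEN];
    for (int k = 0; k < LEN; k++) {
        X[k] = (double)k;
        Y1[k] = 1.0;
        Y2[k] = 1.0;
    }
    daxpy_reference(2.0, X, Y1);
    daxpy_unroll4(2.0, X, Y2);
    return memcmp(Y1, Y2, sizeof Y1) == 0;
}
```

No remainder loop is needed here only because 2000 is a multiple of 4.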
20
Matrix Multiply (Loop Declaration)
  • The DGEMM example
  • Matrix multiplication
  • Problems
  • Data locality
  • Scheduling
  • #pragma xlang name iloop
  • for (i = 0; i < NB; i++)
  • #pragma xlang name jloop
  • for (j = 0; j < NB; j++)
  • #pragma xlang name kloop
  • for (k = 0; k < NB; k++)
  • c[i][j] = c[i][j] + a[i][k] * b[k][j];

21
Matrix Multiply (Transformation Declaration)
Sequence of transformations for Itanium:
  • #pragma xlang transform stripmine iloop NU NUloop
  • #pragma xlang transform stripmine jloop MU MUloop
  • #pragma xlang transform interchange kloop MUloop
  • #pragma xlang transform interchange jloop NUloop
  • #pragma xlang transform interchange kloop NUloop
  • #pragma xlang transform fullunroll NUloop
  • #pragma xlang transform fullunroll MUloop
  • #pragma xlang transform scalarize_in b in kloop
  • #pragma xlang transform scalarize_in a in kloop
  • #pragma xlang transform scalarize_inout c in
    kloop
  • #pragma xlang transform lift kloop.loads before
    kloop
  • #pragma xlang transform lift kloop.stores after
    kloop

22
Matrix Multiply (Transformation Sequence)
#pragma xlang name iloop
for (i = 0; i < NB; i++)
  #pragma xlang name jloop
  for (j = 0; j < NB; j += 4) {
    #pragma xlang name kloop.loads
    c_0_0 = c[i+0][j+0]; c_0_1 = c[i+0][j+1];
    c_0_2 = c[i+0][j+2]; c_0_3 = c[i+0][j+3];
    #pragma xlang name kloop
    for (k = 0; k < NB; k++) {
      a_0 = a[i+0][k]; a_1 = a[i+0][k];
      a_2 = a[i+0][k]; a_3 = a[i+0][k];
      b_0 = b[k][j+0];
      b_1 = b[k][j+1];
      b_2 = b[k][j+2];
      b_3 = b[k][j+3];
      c_0_0 = c_0_0 + a_0 * b_0;
      c_0_1 = c_0_1 + a_1 * b_1;
      c_0_2 = c_0_2 + a_2 * b_2;
      c_0_3 = c_0_3 + a_3 * b_3;
      ...
    }
    #pragma xlang name kloop.stores
    c[i+0][j+0] = c_0_0;
    c[i+0][j+1] = c_0_1;
    c[i+0][j+2] = c_0_2;
    c[i+0][j+3] = c_0_3;
  }
... // Remainder code
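
A compilable sketch of this transformed kernel is given below, for the case where the j loop is unrolled by 4 and the i loop is left as is. The block size NB = 8, the function names, and the test data are assumptions; unlike the listing above, the sketch loads `a[i][k]` once into a single scalar, which is a simplification introduced here.

```c
#include <string.h>

#define NB 8

/* Reference triple loop: c += a * b on one NB x NB block. */
void mm_reference(double c[NB][NB], double a[NB][NB], double b[NB][NB]) {
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++)
            for (int k = 0; k < NB; k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
}

/* j unrolled by 4, c kept in scalars across the whole k loop. */
void mm_unrolled(double c[NB][NB], double a[NB][NB], double b[NB][NB]) {
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j += 4) {
            /* kloop.loads: read the four c values once */
            double c_0_0 = c[i][j + 0], c_0_1 = c[i][j + 1];
            double c_0_2 = c[i][j + 2], c_0_3 = c[i][j + 3];
            for (int k = 0; k < NB; k++) {
                double a_0 = a[i][k];
                double b_0 = b[k][j + 0], b_1 = b[k][j + 1];
                double b_2 = b[k][j + 2], b_3 = b[k][j + 3];
                c_0_0 = c_0_0 + a_0 * b_0;
                c_0_1 = c_0_1 + a_0 * b_1;
                c_0_2 = c_0_2 + a_0 * b_2;
                c_0_3 = c_0_3 + a_0 * b_3;
            }
            /* kloop.stores: write the four c values back once */
            c[i][j + 0] = c_0_0;
            c[i][j + 1] = c_0_1;
            c[i][j + 2] = c_0_2;
            c[i][j + 3] = c_0_3;
        }
}

/* Returns 1 when both kernels produce the same block. */
int dgemm_matches(void) {
    static double a[NB][NB], b[NB][NB], c1[NB][NB], c2[NB][NB];
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++) {
            a[i][j] = (double)(i + j);
            b[i][j] = (double)(i - j);
            c1[i][j] = 1.0;
            c2[i][j] = 1.0;
        }
    mm_reference(c1, a, b);
    mm_unrolled(c2, a, b);
    return memcmp(c1, c2, sizeof c1) == 0;
}
```

Keeping the c values in scalars removes the loads and stores of c from the innermost loop, which is the purpose of the scalarize and lift steps.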

23
Block copies
  • Blocked matrix multiplication performs better
    if the blocks are contiguous in memory (TLB)
  • A copy written in C performs poorly
  • Resort to a tool that generates specific asm code
  • The tool finds good code through search (XLG
    searches over asm code)
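
For reference, the operation being optimized here is a copy of one block of a larger matrix into a contiguous buffer. The plain C version below is only a sketch of what such a copy computes; the function names, the row-major layout, and the leading-dimension parameter `ld` are assumptions, and the slides indicate that the tuned version is generated asm rather than C like this.

```c
#include <stddef.h>

/* Copy the nb x nb block whose top-left corner is (row0, col0) of a
   row-major matrix with leading dimension ld into a contiguous buffer. */
void copy_block(const double *src, size_t ld, size_t row0, size_t col0,
                size_t nb, double *dst) {
    for (size_t i = 0; i < nb; i++)
        for (size_t j = 0; j < nb; j++)
            dst[i * nb + j] = src[(row0 + i) * ld + (col0 + j)];
}

/* Copies a 2 x 2 block out of a 6 x 6 matrix and checks the result. */
int block_copy_demo(void) {
    double m[36], blk[4];
    for (int i = 0; i < 36; i++) m[i] = (double)i;
    copy_block(m, 6, 2, 3, 2, blk);
    return blk[0] == 15.0 && blk[1] == 16.0 &&
           blk[2] == 21.0 && blk[3] == 22.0;
}
```

After the copy, the multiplication kernel touches one compact region per block instead of nb separate rows spread across the full matrix.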

24
Matrix Multiply (Results)
25
Conclusion
  • Describe transformations with reuse, procedures,
    conditionals
  • X-Language
  • language designed to generate multiversion
    programs
  • Multistage language with a flexible
    pattern-matching and rewriting language
  • Experts can describe application-specific
    transformations and optimizations

26
Future work
  • Dependence analysis
  • Going further: searching over asm-level code transformations
  • More transformations: vectorization, alignment, ...