Space-Time Trade-Off Optimization for a Class of Electronic Structure Calculations

Slides: 19
Provided by: coc127
Learn more at: http://www.csc.lsu.edu

1
Space-Time Trade-Off Optimization for a Class of
Electronic Structure Calculations
  • Daniel Cociorva, Ohio State
  • Gerald Baumgartner, Ohio State
  • Chi-Chung Lam, Ohio State
  • P. Sadayappan, Ohio State
  • J. Ramanujam, Louisiana State
  • Marcel Nooijen, Princeton
  • David E. Bernholdt, ORNL
  • Robert Harrison, PNNL

2
Realistic Quantum Chemistry Example
  • hbar[a,b,i,j] =
      sum[f[b,c]*t[i,j,a,c],{c}]
    - sum[f[k,c]*t[k,b]*t[i,j,a,c],{k,c}]
    + sum[f[a,c]*t[i,j,c,b],{c}]
    - sum[f[k,c]*t[k,a]*t[i,j,c,b],{k,c}]
    - sum[f[k,j]*t[i,k,a,b],{k}]
    - sum[f[k,c]*t[j,c]*t[i,k,a,b],{k,c}]
    - sum[f[k,i]*t[j,k,b,a],{k}]
    - sum[f[k,c]*t[i,c]*t[j,k,b,a],{k,c}]
    + sum[t[i,c]*t[j,d]*v[a,b,c,d],{c,d}]
    + sum[t[i,j,c,d]*v[a,b,c,d],{c,d}]
    + sum[t[j,c]*v[a,b,i,c],{c}]
    - sum[t[k,b]*v[a,k,i,j],{k}]
    + sum[t[i,c]*v[b,a,j,c],{c}]
    - sum[t[k,a]*v[b,k,j,i],{k}]
    - sum[t[k,d]*t[i,j,c,b]*v[k,a,c,d],{k,c,d}]
    - sum[t[i,c]*t[j,k,b,d]*v[k,a,c,d],{k,c,d}]
    - sum[t[j,c]*t[k,b]*v[k,a,c,i],{k,c}]
    + 2*sum[t[j,k,b,c]*v[k,a,c,i],{k,c}]
    - sum[t[j,k,c,b]*v[k,a,c,i],{k,c}]
    - sum[t[i,c]*t[j,d]*t[k,b]*v[k,a,d,c],{k,c,d}]
    + 2*sum[t[k,d]*t[i,j,c,b]*v[k,a,d,c],{k,c,d}]
    - sum[t[k,b]*t[i,j,c,d]*v[k,a,d,c],{k,c,d}]
    - sum[t[j,d]*t[i,k,c,b]*v[k,a,d,c],{k,c,d}]
    + 2*sum[t[i,c]*t[j,k,b,d]*v[k,a,d,c],{k,c,d}]
    - sum[t[i,c]*t[j,k,d,b]*v[k,a,d,c],{k,c,d}]
    - sum[t[j,k,b,c]*v[k,a,i,c],{k,c}]
    - sum[t[i,c]*t[k,b]*v[k,a,j,c],{k,c}]
    - sum[t[i,k,c,b]*v[k,a,j,c],{k,c}]
    - sum[t[i,c]*t[j,d]*t[k,a]*v[k,b,c,d],{k,c,d}]
    - sum[t[k,d]*t[i,j,a,c]*v[k,b,c,d],{k,c,d}]
    - sum[t[k,a]*t[i,j,c,d]*v[k,b,c,d],{k,c,d}]
    + 2*sum[t[j,d]*t[i,k,a,c]*v[k,b,c,d],{k,c,d}]
    - sum[t[j,d]*t[i,k,c,a]*v[k,b,c,d],{k,c,d}]
    - sum[t[i,c]*t[j,k,d,a]*v[k,b,c,d],{k,c,d}]
    - sum[t[i,c]*t[k,a]*v[k,b,c,j],{k,c}]
    + 2*sum[t[i,k,a,c]*v[k,b,c,j],{k,c}]
    - sum[t[i,k,c,a]*v[k,b,c,j],{k,c}]
    + 2*sum[t[k,d]*t[i,j,a,c]*v[k,b,d,c],{k,c,d}]
    - sum[t[j,d]*t[i,k,a,c]*v[k,b,d,c],{k,c,d}]
    - sum[t[j,c]*t[k,a]*v[k,b,i,c],{k,c}]
    - sum[t[j,k,c,a]*v[k,b,i,c],{k,c}]
    - sum[t[i,k,a,c]*v[k,b,j,c],{k,c}]
    + sum[t[i,c]*t[j,d]*t[k,a]*t[l,b]*v[k,l,c,d],{k,l,c,d}]
    - 2*sum[t[k,b]*t[l,d]*t[i,j,a,c]*v[k,l,c,d],{k,l,c,d}]
    - 2*sum[t[k,a]*t[l,d]*t[i,j,c,b]*v[k,l,c,d],{k,l,c,d}]
    + sum[t[k,a]*t[l,b]*t[i,j,c,d]*v[k,l,c,d],{k,l,c,d}]
    - 2*sum[t[j,c]*t[l,d]*t[i,k,a,b]*v[k,l,c,d],{k,l,c,d}]
    - 2*sum[t[j,d]*t[l,b]*t[i,k,a,c]*v[k,l,c,d],{k,l,c,d}]
    + sum[t[j,d]*t[l,b]*t[i,k,c,a]*v[k,l,c,d],{k,l,c,d}]
    - 2*sum[t[i,c]*t[l,d]*t[j,k,b,a]*v[k,l,c,d],{k,l,c,d}]
    + sum[t[i,c]*t[l,a]*t[j,k,b,d]*v[k,l,c,d],{k,l,c,d}]
    + sum[t[i,c]*t[l,b]*t[j,k,d,a]*v[k,l,c,d],{k,l,c,d}]
    + sum[t[i,k,c,d]*t[j,l,b,a]*v[k,l,c,d],{k,l,c,d}]
    + 4*sum[t[i,k,a,c]*t[j,l,b,d]*v[k,l,c,d],{k,l,c,d}]
    - 2*sum[t[i,k,c,a]*t[j,l,b,d]*v[k,l,c,d],{k,l,c,d}]
    - 2*sum[t[i,k,a,b]*t[j,l,c,d]*v[k,l,c,d],{k,l,c,d}]
    - 2*sum[t[i,k,a,c]*t[j,l,d,b]*v[k,l,c,d],{k,l,c,d}]
    + sum[t[i,k,c,a]*t[j,l,d,b]*v[k,l,c,d],{k,l,c,d}]
    + sum[t[i,c]*t[j,d]*t[k,l,a,b]*v[k,l,c,d],{k,l,c,d}]
    + sum[t[i,j,c,d]*t[k,l,a,b]*v[k,l,c,d],{k,l,c,d}]
    - 2*sum[t[i,j,c,b]*t[k,l,a,d]*v[k,l,c,d],{k,l,c,d}]
    - 2*sum[t[i,j,a,c]*t[k,l,b,d]*v[k,l,c,d],{k,l,c,d}]
    + sum[t[j,c]*t[k,b]*t[l,a]*v[k,l,c,i],{k,l,c}]
    + sum[t[l,c]*t[j,k,b,a]*v[k,l,c,i],{k,l,c}]
    - 2*sum[t[l,a]*t[j,k,b,c]*v[k,l,c,i],{k,l,c}]
    + sum[t[l,a]*t[j,k,c,b]*v[k,l,c,i],{k,l,c}]
    - 2*sum[t[k,c]*t[j,l,b,a]*v[k,l,c,i],{k,l,c}]
    + sum[t[k,a]*t[j,l,b,c]*v[k,l,c,i],{k,l,c}]
    + sum[t[k,b]*t[j,l,c,a]*v[k,l,c,i],{k,l,c}]
    + sum[t[j,c]*t[l,k,a,b]*v[k,l,c,i],{k,l,c}]
    + sum[t[i,c]*t[k,a]*t[l,b]*v[k,l,c,j],{k,l,c}]
    + sum[t[l,c]*t[i,k,a,b]*v[k,l,c,j],{k,l,c}]
    - 2*sum[t[l,b]*t[i,k,a,c]*v[k,l,c,j],{k,l,c}]
    + sum[t[l,b]*t[i,k,c,a]*v[k,l,c,j],{k,l,c}]
    + sum[t[i,c]*t[k,l,a,b]*v[k,l,c,j],{k,l,c}]
    + sum[t[j,c]*t[l,d]*t[i,k,a,b]*v[k,l,d,c],{k,l,c,d}]
    + sum[t[j,d]*t[l,b]*t[i,k,a,c]*v[k,l,d,c],{k,l,c,d}]
    + sum[t[j,d]*t[l,a]*t[i,k,c,b]*v[k,l,d,c],{k,l,c,d}]
    - 2*sum[t[i,k,c,d]*t[j,l,b,a]*v[k,l,d,c],{k,l,c,d}]
    - 2*sum[t[i,k,a,c]*t[j,l,b,d]*v[k,l,d,c],{k,l,c,d}]
    + sum[t[i,k,c,a]*t[j,l,b,d]*v[k,l,d,c],{k,l,c,d}]
    + sum[t[i,k,a,b]*t[j,l,c,d]*v[k,l,d,c],{k,l,c,d}]
    + sum[t[i,k,c,b]*t[j,l,d,a]*v[k,l,d,c],{k,l,c,d}]
    + sum[t[i,k,a,c]*t[j,l,d,b]*v[k,l,d,c],{k,l,c,d}]
    + sum[t[k,a]*t[l,b]*v[k,l,i,j],{k,l}]
    + sum[t[k,l,a,b]*v[k,l,i,j],{k,l}]
    + sum[t[k,b]*t[l,d]*t[i,j,a,c]*v[l,k,c,d],{k,l,c,d}]
    + sum[t[k,a]*t[l,d]*t[i,j,c,b]*v[l,k,c,d],{k,l,c,d}]
    + sum[t[i,c]*t[l,d]*t[j,k,b,a]*v[l,k,c,d],{k,l,c,d}]
    - 2*sum[t[i,c]*t[l,a]*t[j,k,b,d]*v[l,k,c,d],{k,l,c,d}]
    + sum[t[i,c]*t[l,a]*t[j,k,d,b]*v[l,k,c,d],{k,l,c,d}]
    + sum[t[i,j,c,b]*t[k,l,a,d]*v[l,k,c,d],{k,l,c,d}]
    + sum[t[i,j,a,c]*t[k,l,b,d]*v[l,k,c,d],{k,l,c,d}]
    - 2*sum[t[l,c]*t[i,k,a,b]*v[l,k,c,j],{k,l,c}]
    + sum[t[l,b]*t[i,k,a,c]*v[l,k,c,j],{k,l,c}]
    + sum[t[l,a]*t[i,k,c,b]*v[l,k,c,j],{k,l,c}]
    + v[a,b,i,j]

3
Problem: Tensor Contractions
  • Formulas of the form
  • Tens of arrays and array indices, hundreds of terms
  • Index ranges between 10 and 3000
  • And this is still a simple model

4
Application Domain
  • Quantum chemistry, condensed matter physics
  • Example: study chemical properties
  • Typical program structure
      • quantum chemistry code
      • while (! converged)
      • tensor contractions
      • quantum chemistry code
  • Bulk of computation in tensor contractions

5
Operation Minimization
  • Requires 4 × N^10 operations if indices a .. l have
    range N
  • Rewriting with the associative, commutative, and
    distributive laws is acceptable
  • Optimal formula sequence requires only 6 × N^6
    operations
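
The two operation counts can be checked on a toy problem. The sketch below assumes the contraction is the one used on the loop-fusion slide (S[a,b,i,j] summed over c, d, e, f, k, l of A[a,c,i,k] B[b,e,f,l] C[d,f,j,k] D[c,d,e,l]); the array contents, the tiny range N = 2, and the helper `rand4` are illustrative assumptions, not part of the slides.

```python
from itertools import product
import random

N = 2  # tiny index range so the 10-deep loop nest stays cheap
R = range(N)
random.seed(0)

def rand4():
    """Hypothetical 4-index input array, stored as a dict keyed by index tuples."""
    return {idx: random.randint(-3, 3) for idx in product(R, repeat=4)}

A, B, C, D = rand4(), rand4(), rand4(), rand4()

# Direct evaluation: one 10-index loop nest, 3 multiplies + 1 add per iteration.
S_direct = {}
ops_direct = 0
for a, b, i, j in product(R, repeat=4):
    total = 0
    for c, d, e, f, k, l in product(R, repeat=6):
        total += A[a, c, i, k] * B[b, e, f, l] * C[d, f, j, k] * D[c, d, e, l]
        ops_direct += 4
    S_direct[a, b, i, j] = total

# Factored evaluation: three 6-index loop nests, 1 multiply + 1 add each.
ops_factored = 0
T1 = {}
for b, c, d, f in product(R, repeat=4):
    total = 0
    for e, l in product(R, repeat=2):
        total += B[b, e, f, l] * D[c, d, e, l]
        ops_factored += 2
    T1[b, c, d, f] = total

T2 = {}
for b, c, j, k in product(R, repeat=4):
    total = 0
    for d, f in product(R, repeat=2):
        total += T1[b, c, d, f] * C[d, f, j, k]
        ops_factored += 2
    T2[b, c, j, k] = total

S_factored = {}
for a, b, i, j in product(R, repeat=4):
    total = 0
    for c, k in product(R, repeat=2):
        total += T2[b, c, j, k] * A[a, c, i, k]
        ops_factored += 2
    S_factored[a, b, i, j] = total

print(ops_direct, ops_factored, S_direct == S_factored)
```

Both evaluations produce the same S, while the counters follow 4 × N^10 and 6 × N^6 respectively.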

6
Loop Fusion for Memory Reduction
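Formula sequence
    T1bcdf = sum over e, l of Bbefl Dcdel
    T2bcjk = sum over d, f of T1bcdf Cdfjk
    Sabij  = sum over c, k of T2bcjk Aacik
Unfused code
    T1 = 0; T2 = 0; S = 0
    for b, c, d, e, f, l
      T1bcdf += Bbefl Dcdel
    for b, c, d, f, j, k
      T2bcjk += T1bcdf Cdfjk
    for a, b, c, i, j, k
      Sabij += T2bcjk Aacik
Fused code
    S = 0
    for b, c
      T2f = 0
      for d, f
        T1f = 0
        for e, l
          T1f += Bbefl Dcdel
        for j, k
          T2fjk += T1f Cdfjk
      for a, i, j, k
        Sabij += T2fjk Aacik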
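
A small runnable sketch of the same idea follows. It assumes random integer inputs and a tiny range N = 3 (both illustrative), stores arrays as dicts, and resets the scalar T1f for every (d, f) pair so that the fused version reproduces the unfused result. The second return value of each function is the number of temporary elements it holds.

```python
from itertools import product
import random

N = 3
R = range(N)
random.seed(1)

def rand4():
    """Hypothetical 4-index input array keyed by index tuples."""
    return {idx: random.randint(-2, 2) for idx in product(R, repeat=4)}

A, B, C, D = rand4(), rand4(), rand4(), rand4()

def unfused():
    T1 = {idx: 0 for idx in product(R, repeat=4)}  # N**4 temporaries
    T2 = {idx: 0 for idx in product(R, repeat=4)}  # N**4 temporaries
    S = {idx: 0 for idx in product(R, repeat=4)}
    for b, c, d, e, f, l in product(R, repeat=6):
        T1[b, c, d, f] += B[b, e, f, l] * D[c, d, e, l]
    for b, c, d, f, j, k in product(R, repeat=6):
        T2[b, c, j, k] += T1[b, c, d, f] * C[d, f, j, k]
    for a, b, c, i, j, k in product(R, repeat=6):
        S[a, b, i, j] += T2[b, c, j, k] * A[a, c, i, k]
    return S, len(T1) + len(T2)

def fused():
    S = {idx: 0 for idx in product(R, repeat=4)}
    for b, c in product(R, repeat=2):
        T2f = {jk: 0 for jk in product(R, repeat=2)}  # N**2 temporaries
        for d, f in product(R, repeat=2):
            T1f = 0  # T1 shrinks to a single scalar
            for e, l in product(R, repeat=2):
                T1f += B[b, e, f, l] * D[c, d, e, l]
            for j, k in product(R, repeat=2):
                T2f[j, k] += T1f * C[d, f, j, k]
        for a, i, j, k in product(R, repeat=4):
            S[a, b, i, j] += T2f[j, k] * A[a, c, i, k]
    return S, 1 + N * N

S_u, mem_u = unfused()
S_f, mem_f = fused()
print(S_u == S_f, mem_u, mem_f)
```

Fusion leaves the operation count unchanged; only the storage for T1 and T2 drops, from 2 × N^4 elements to N^2 + 1.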
7
Tensor Contraction Engine
  • Tensor contraction expressions (input)
  • Operation Minimization, producing a sequence of loop nests (expression tree)
  • Memory Minimization, then Space-Time Trade-Offs if storage requirements
    exceed limits (bypassed if storage requirements are within limits),
    producing imperfectly nested loops
  • Communication Minimization, producing imperfectly nested parallel loops
  • Data Locality Optimization, producing partitioned, tiled, parallel Fortran loops

8
Example to Illustrate Space-Time Trade-Offs
    for a, e, c, f
      for i, j
        Xaecf += Tijae Tijcf
    for c, e, b, k
      T1cebk = f1(c, e, b, k)
    for a, f, b, k
      T2afbk = f2(a, f, b, k)
    for c, e, a, f
      for b, k
        Yceaf += T1cebk T2afbk
    for c, e, a, f
      E += Xaecf Yceaf

    array   space    time
    X       V^4      V^4 O^2
    T1      V^3 O    Cf1 V^3 O
    T2      V^3 O    Cf2 V^3 O
    Y       V^4      V^5 O
    E       1        V^4

    a .. f: range V = 1000 .. 3000
    i .. k: range O = 30 .. 100
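
The loop structure can be run as-is on toy sizes. In the sketch below, `f1`, `f2`, and the contents of `T` are hypothetical stand-ins (the slides leave them unspecified), and V = 3, O = 2 replace the realistic ranges so the loops finish instantly. The `space` dict records how many elements each array actually holds.

```python
from itertools import product

V, O = 3, 2  # toy stand-ins for the slide's ranges
RV, RO = range(V), range(O)

def f1(c, e, b, k):
    """Hypothetical integral function; any deterministic function works here."""
    return (c + 2 * e + 3 * b + k) % 5 - 2

def f2(a, f, b, k):
    """Hypothetical integral function."""
    return (2 * a + f + b + 3 * k) % 7 - 3

# Hypothetical input array T[i,j,a,e].
T = {(i, j, a, e): (i + 2 * j + 3 * a + 5 * e) % 4 - 1
     for i, j in product(RO, repeat=2) for a, e in product(RV, repeat=2)}

X = {idx: 0 for idx in product(RV, repeat=4)}
for a, e, c, f in product(RV, repeat=4):
    for i, j in product(RO, repeat=2):
        X[a, e, c, f] += T[i, j, a, e] * T[i, j, c, f]

T1 = {(c, e, b, k): f1(c, e, b, k) for c, e, b in product(RV, repeat=3) for k in RO}
T2 = {(a, f, b, k): f2(a, f, b, k) for a, f, b in product(RV, repeat=3) for k in RO}

Y = {idx: 0 for idx in product(RV, repeat=4)}
for c, e, a, f in product(RV, repeat=4):
    for b, k in product(RV, RO):
        Y[c, e, a, f] += T1[c, e, b, k] * T2[a, f, b, k]

E = 0
for c, e, a, f in product(RV, repeat=4):
    E += X[a, e, c, f] * Y[c, e, a, f]

space = {"X": len(X), "T1": len(T1), "T2": len(T2), "Y": len(Y), "E": 1}
print(E, space)
```

The element counts match the space column of the table: V^4 for X and Y, V^3 O for T1 and T2.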
9
Memory-Minimal Form
    for a, f, b, k
      T2afbk = f2(a, f, b, k)
    for c, e
      for b, k
        T1bk = f1(c, e, b, k)
      for a, f
        for i, j
          X += Tijae Tijcf
        for b, k
          Y += T1bk T2afbk
        E += X Y

    array   space    time
    X       1        V^4 O^2
    T1      V O      Cf1 V^3 O
    T2      V^3 O    Cf2 V^3 O
    Y       1        V^5 O
    E       1        V^4

    a .. f: range V = 3000
    i .. k: range O = 100

10
Redundant Computation to Allow Full Fusion
    for a, e, c, f
      for i, j
        X += Tijae Tijcf
      for b, k
        T1 = f1(c, e, b, k)
        T2 = f2(a, f, b, k)
        Y += T1 T2
      E += X Y

    array   space    time
    X       1        V^4 O^2
    T1      1        Cf1 V^5 O
    T2      1        Cf2 V^5 O
    Y       1        V^5 O
    E       1        V^4
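
To get a feel for the magnitudes, the cost tables of the unfused, memory-minimal, and fully fused forms can be evaluated numerically. The sketch below simply plugs V = 3000 and O = 100 (the ranges quoted on the memory-minimal slide) into the table entries; the dictionary layout and names are illustrative assumptions.

```python
V, O = 3000, 100  # ranges quoted on the "Memory-Minimal Form" slide

# Space column of each table, in array elements, in the order X, T1, T2, Y, E.
space = {
    "unfused":        [V**4, V**3 * O, V**3 * O, V**4, 1],
    "memory-minimal": [1, V * O, V**3 * O, 1, 1],
    "fully fused":    [1, 1, 1, 1, 1],
}

# Number of f1 evaluations (the factor multiplying Cf1 in the time column).
f1_calls = {
    "unfused": V**3 * O,
    "memory-minimal": V**3 * O,
    "fully fused": V**5 * O,
}

for name in space:
    print(f"{name:15s} total space = {sum(space[name]):.3e} elements, "
          f"f1 evaluations = {f1_calls[name]:.3e}")

# How many times more often f1 is evaluated once every loop is fused.
recompute_factor = f1_calls["fully fused"] // f1_calls["unfused"]
print(recompute_factor)
```

Full fusion removes essentially all temporary storage but multiplies the number of f1 and f2 evaluations by V^2, which is the trade-off the tiling slide addresses.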
11
Tiling for Reducing Recomputation
    for at, et, ct, ft
      for a, e, c, f
        for i, j
          Xaecf += Tijae Tijcf
      for b, k
        for c, e
          T1ce = f1(c, e, b, k)
        for a, f
          T2af = f2(a, f, b, k)
        for c, e, a, f
          Yceaf += T1ce T2af
      for c, e, a, f
        E += Xaecf Yceaf

    array   space    time
    X       B^4      V^4 O^2
    T1      B^2      Cf1 (V/B)^2 V^3 O
    T2      B^2      Cf2 (V/B)^2 V^3 O
    Y       B^4      V^5 O
    E       1        V^4
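
The tiled loop nest can be exercised on toy sizes to confirm the (V/B)^2 V^3 O count for f1. In this sketch `f1`, `f2`, and `T` are hypothetical stand-ins, V = 4 and O = 2 are toy ranges, the tile size B is assumed to divide V, and Y is keyed in the same (a, e, c, f) order as X purely for brevity.

```python
from itertools import product

V, O = 4, 2  # toy ranges
RV, RO = range(V), range(O)
calls = {"f1": 0, "f2": 0}

def f1(c, e, b, k):
    """Hypothetical integral function that also counts its evaluations."""
    calls["f1"] += 1
    return (c + 2 * e + 3 * b + k) % 5 - 2

def f2(a, f, b, k):
    """Hypothetical integral function that also counts its evaluations."""
    calls["f2"] += 1
    return (2 * a + f + b + 3 * k) % 7 - 3

def T(i, j, a, e):
    """Hypothetical input array, written as a function for brevity."""
    return (i + 2 * j + 3 * a + 5 * e) % 4 - 1

def tiled_energy(B):
    """Evaluate E with tile size B; B must divide V."""
    assert V % B == 0

    def blk(t):
        return range(t, t + B)  # intra-tile index range

    E = 0
    for at, et, ct, ft in product(range(0, V, B), repeat=4):
        X = {}  # B**4 elements per tile
        for a, e, c, f in product(blk(at), blk(et), blk(ct), blk(ft)):
            X[a, e, c, f] = sum(T(i, j, a, e) * T(i, j, c, f)
                                for i, j in product(RO, repeat=2))
        Y = {key: 0 for key in X}  # B**4 elements per tile
        for b, k in product(RV, RO):
            T1 = {(c, e): f1(c, e, b, k) for c, e in product(blk(ct), blk(et))}
            T2 = {(a, f): f2(a, f, b, k) for a, f in product(blk(at), blk(ft))}
            for a, e, c, f in X:
                Y[a, e, c, f] += T1[c, e] * T2[a, f]
        E += sum(X[key] * Y[key] for key in X)
    return E

results = {}
for B in (1, 2, 4):
    calls["f1"] = calls["f2"] = 0
    results[B] = (tiled_energy(B), calls["f1"])
print(results)
```

With B = 1 the count equals the fully fused V^5 O, and with B = V it falls back to V^3 O; intermediate tile sizes trade B^4 elements of storage for a factor (V/B)^2 of recomputation.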
12
The Fusion Graph
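[Figure: fusion graph. A parent node with index vertex j (its operator symbol did not survive extraction) has two children: A(i,j) with index vertices i and j, and B(j,k) with index vertices j and k.]
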
13
Making Fused Loops Explicit
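Ranges: i = 10, j = 10, k = 100
[Figure: fusion graph with the fused loops drawn explicitly. Parent node with index vertex j; children A(i,j) with index vertices i and j, and B(j,k) with index vertices j and k.]
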
14
Adding Recomputation Loops
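Ranges: i = 10, j = 10, k = 100
Solution set at the parent node: (<i j,k>, 3, 9000), . . .
[Figure: fusion graph with a recomputation loop added. Parent node with index vertex j; children A(i,j) with index vertices i and j, and B(j,k) with index vertices i, j, and k.]
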
15
Tiling Recomputation Loops
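Ranges: i = 10, j = 10, k = 100
[Figure: fusion graph with the recomputation loop tiled. Parent node with index vertex j; children A(it,i,j) with index vertices i, j, and it, and B(j,k) with index vertices j, k, and it.]
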
16
Space-Time Trade-Off Algorithm
  • For an expression combining e1(i,j,k) and e2(i,j), consider
    recomputing e2 inside the k loop
  • For each expression tree node, in a bottom-up traversal
      • Construct the set of solutions (fusion, mem cost, rec cost), . . .
      • Prune inferior solutions
  • For all solutions at the root of the expression tree
      • Split recomputation loops into tiling and intra-tile loops
      • Move intra-tile loops inside tiling loops where needed
      • Search for tile sizes that minimize recomputation cost
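
Two of these steps lend themselves to a short sketch: pruning inferior solutions and searching for a tile size. The code below is an illustration under stated assumptions, not the authors' implementation: "inferior" is taken to mean dominated in both memory cost and recomputation cost, the `Solution` type and the candidate list are hypothetical (only the triple (<i j,k>, 3, 9000) comes from the slides), and the tile search uses the space column of the tiling slide (B^4 for X and Y, B^2 for T1 and T2, 1 for E) and considers only tile sizes that divide V.

```python
from typing import NamedTuple

class Solution(NamedTuple):
    fusion: str    # label for the fusion configuration (hypothetical encoding)
    mem_cost: int  # memory cost of the solution
    rec_cost: int  # recomputation cost of the solution

def prune(solutions):
    """Keep only solutions that no other solution beats on both costs."""
    kept = []
    for s in solutions:
        dominated = any(
            o.mem_cost <= s.mem_cost and o.rec_cost <= s.rec_cost
            and (o.mem_cost < s.mem_cost or o.rec_cost < s.rec_cost)
            for o in solutions
        )
        if not dominated:
            kept.append(s)
    return kept

def best_tile_size(V, mem_limit):
    """Largest B dividing V whose tiled temporaries fit within mem_limit.

    Recomputation cost scales as (V/B)**2, so the largest feasible B wins.
    Returns None if even B = 1 does not fit.
    """
    best = None
    for B in range(1, V + 1):
        if V % B == 0 and 2 * B**4 + 2 * B**2 + 1 <= mem_limit:
            best = B
    return best

candidates = [
    Solution("<i j,k>", 3, 9000),        # triple shown on the recomputation slide
    Solution("no recomputation", 1003, 0),  # hypothetical
    Solution("worse memory", 2000, 9000),   # hypothetical, dominated
    Solution("worse time", 3, 9500),        # hypothetical, dominated
]
pareto = prune(candidates)
tile = best_tile_size(3000, 10**9)
print(pareto, tile)
```

Pruning keeps a solution set small at every tree node, because only the memory/recomputation trade-off frontier needs to be propagated upward.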

17
Demo
    mlimit = 144000010;
    range V = 300;
    range O = 70;
    range U = 40;
    index a, b, c, d : V;
    index e, f : O;
    index i, j, k, l : U;
    procedure P (in T[V,V,U,U], in S[V,U,O,U], in N[V,V,O,U], in D[V,O,O,U],
                 out X[V,V,O,O])
    begin
      X[a,b,i,j] = sum[T[a,c,i,k] * S[d,j,f,k] * N[c,d,e,l] * D[b,e,f,l], {c,d,e,f,k,l}];
    end
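
For orientation, the declared array shapes can be turned into element counts and set beside mlimit. This sketch assumes mlimit is expressed in array elements, which the slides do not state, and it makes no claim about which arrays or intermediates the limit governs; the shape strings are just a compact encoding of the procedure signature.

```python
from math import prod

ranges = {"V": 300, "O": 70, "U": 40}
mlimit = 144000010  # value from the demo input; unit assumed to be elements

# One letter per dimension, taken from the procedure signature.
arrays = {"T": "VVUU", "S": "VUOU", "N": "VVOU", "D": "VOOU", "X": "VVOO"}

sizes = {name: prod(ranges[r] for r in dims) for name, dims in arrays.items()}
for name, n in sizes.items():
    print(f"{name}: {n:>11,d} elements, {n / mlimit:.2f} x mlimit")
```

Under that assumption, T[V,V,U,U] holds 144,000,000 elements, just under the stated limit.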

18
Conclusions and Status
  • Computational domain with exploitable structure
  • Search algorithms for optimizing computation
  • Developing compiler in Java
  • Future extensions
  • Sparse arrays, symmetry, common subexpressions
  • Domain-specific optimizations