Transcript and Presenter's Notes

Title: A Practical Stride Prefetching Implementation in Global Optimizer


1
A Practical Stride Prefetching
Implementation in Global Optimizer
  • Hucheng Zhou, Xing Zhou
  • Tsinghua University

2
Outline
  • Introduction
  • Motivation
  • Algorithm
  • Phase Ordering
  • Prefetching Scheduling
  • Experiments
  • Future Work

3
Introduction
  • What is data prefetching?
  • Bringing data into the cache ahead of its use
  • Compiler-controlled prefetching (sketched below)
  • Prefetching candidate identification
  • Prefetching timing determination
  • Unnecessary prefetch elimination
  • Other prefetch tuning
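  A minimal sketch (ours, not from the slides) of what compiler-controlled
  prefetching amounts to at the source level, using the GCC-style
  __builtin_prefetch intrinsic; the function name and the prefetch distance
  of 16 elements are illustrative assumptions:

    #include <cstddef>

    // Illustrative only: a prefetch inserted some distance ahead of the use.
    double sum_with_prefetch(const double* a, std::size_t n) {
      const std::size_t PF_DIST = 16;            // assumed tuning parameter
      double sum = 0.0;
      for (std::size_t i = 0; i < n; ++i) {
        if (i + PF_DIST < n)
          __builtin_prefetch(&a[i + PF_DIST]);   // bring data into cache ahead of its use
        sum += a[i];                             // the actual use
      }
      return sum;
    }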

4
Introduction
  • Stride data prefetching
  • Massive consecutive memory references
  • Cause too many cache misses, and thus poor
    performance
  • Our focus
  • Compiler-based stride data prefetching

5
Motivation
  • Dominant stride prefetching algorithm
  • Loop Nest Optimizer (LNO) based
  • LNO-based algorithm
  • Locality analysis
  • (reuse analysis → localized iteration
    space → prefetching predicates)
  • Loop splitting (loop peeling and unrolling)
  • Scheduling prefetches (iterations ahead of use)
  • Limitations of the LNO-based approach
  • Observations

6
LNO-based algorithm
  • Example (schematic sketch below)
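  The example slide itself is an image; the following is our schematic sketch
  of the classic LNO-style transformation on an affine array reference, not
  the authors' actual example. It assumes 8 doubles per cache line and a
  prefetch distance of two cache lines:

    // After locality analysis, loop splitting (unrolling by the cache-line
    // size plus a peeled remainder), and prefetch scheduling, only one
    // prefetch is issued per cache line instead of one per element.
    void lno_style_prefetch(double* a, int n) {
      int i;
      for (i = 0; i + 8 <= n; i += 8) {
        __builtin_prefetch(&a[i + 16]);   // two cache lines (16 elements) ahead
        for (int j = 0; j < 8; ++j)
          a[i + j] += 1.0;                // original body, unrolled
      }
      for (; i < n; ++i)                  // peeled remainder iterations
        a[i] += 1.0;
    }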

7
Limitations
  • Only effective for affine array references
  • Only handles DO loop nests
  • Due to the vector space model
  • Focuses only on numerical applications that
    operate on dense matrices
  • However, not all strided references are affine
    array references; examples include C++ STL vector
    traversal and other wrap-around data structures

8
Necessity
  • Four common ways of traversing an STL vector
    (sketched below)
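  The slide showing the four traversal styles is an image; the code below is
  an assumed reconstruction (index, iterator, at(), and raw pointer), not the
  authors' exact code. All four produce strided references that are not
  affine array references in LNO's vector space model:

    #include <cstddef>
    #include <vector>

    // Assumes v is non-empty; illustrative reconstruction only.
    double sum_four_ways(const std::vector<double>& v) {
      double s = 0.0;
      for (std::size_t i = 0; i < v.size(); ++i)                  // 1. operator[]
        s += v[i];
      for (std::vector<double>::const_iterator it = v.begin();    // 2. iterator
           it != v.end(); ++it)
        s += *it;
      for (std::size_t i = 0; i < v.size(); ++i)                  // 3. bounds-checked at()
        s += v.at(i);
      for (const double* p = &v[0]; p != &v[0] + v.size(); ++p)   // 4. raw pointer
        s += *p;
      return s;
    }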

9
The Component flow of Open64
10
IR after PRE-OPT
  • For ACCESS1 and ACCESS2

11
Compare with array references
12
Comparison
  • The LNO-based approach exploits its tight coupling
    with locality analysis and the vector space model
    to identify the prefetching candidates that suffer
    cache misses
  • However, this coupling restricts it to affine
    array references; it cannot handle STL-style
    stride references
  • From another angle, we identify stride prefetching
    candidates through induction variable recognition,
    then exploit phase ordering to avoid unnecessary
    prefetches

13
Definitions and Observations
  • A linear induction variable (expression) is an
    expression whose value is incremented by a
    nonzero integer loop invariant on every
    iteration
  • Lemma 1: a linear inductive expression can be
    defined recursively (worked examples after this
    list)
  • If v is a linear induction variable with stride
    s, then v is a linear inductive expression with
    the same stride s
  • If expr is a linear inductive expression with
    stride s, then -expr is a linear inductive
    expression with stride -s
  • If expr is a linear inductive expression with
    stride s and invar is a loop invariant, then expr
    + invar and invar + expr are both inductive
    expressions with stride s
  • If expr1 and expr2 are linear inductive
    expressions with strides s1 and s2 respectively,
    then expr1 + expr2 is a linear inductive
    expression with stride s1 + s2
  • If expr is a linear inductive expression with
    stride s and invar is a loop invariant, then expr
    * invar and invar * expr are both inductive
    expressions with stride invar * s
  • If expr is a linear inductive expression with
    stride s and invar is a loop invariant, then expr
    / invar is a linear inductive expression with
    stride s / invar
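  As a concrete illustration of Lemma 1 (ours, not from the slides), the loop
  below has induction variables i and j with strides 2 and 3, and the
  annotated expressions follow directly from the rules above:

    // Illustrative only.
    void induction_examples() {
      for (int i = 0, j = 0; i < 100; i += 2, j += 3) {
        int e1 = i + 5;    // stride 2   (expr + invar)
        int e2 = 4 * i;    // stride 8   (invar * expr, invar * s)
        int e3 = i + j;    // stride 5   (expr1 + expr2, s1 + s2)
        int e4 = -i;       // stride -2  (-expr)
        (void)e1; (void)e2; (void)e3; (void)e4;
      }
    }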

14
Definitions and Observations
  • So, mathematically, a linear inductive expression
    is a linear combination of linear induction
    variables and loop invariants, of the form
  • E = c1*i1 + c2*i2 + ... + cn*in + invar, where the
    stride value is s = c1*s1 + c2*s2 + ... + cn*sn
    (si being the stride of ii)
  • A stride reference is a reference in a loop whose
    accessed memory address is incremented by an
    integer loop invariant on every iteration
  • Lemma 2: if the accessed memory address of a
    reference in a loop can be represented as an
    inductive expression, then it is a stride
    reference (worked example below)
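  As a worked example (ours, not from the slides): if i and j are induction
  variables with strides 1 and 4, then E = 3*i + 2*j + base is a linear
  inductive expression with stride 3*1 + 2*4 = 11; similarly, the address of
  a[2*i] for 8-byte elements advances by 2*8 = 16 bytes every iteration, so
  by Lemma 2 a[2*i] is a stride reference with stride 16.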

15
Speculative Induction Variable Recognition for
Stride Prefetching
  • Thus stride reference identification reduces to
    induction expression recognition
  • We present an algorithm for demand-driven,
    speculative recognition of induction expressions

16
Speculative Induction Variable Recognition for
Stride Prefetching
  • Induction variables in SSA form must satisfy the
    following conditions (schematic sketch below)
  • There must be a live phi in the corresponding
    loop header BB
  • Among the two operands of the phi, the loop
    invariant operand must point to the
    initialization of the induction variable outside
    the loop, while the other operand must be defined
    within the loop body. We call them init and
    increment, respectively
  • After the increment operand of the phi is expanded
    by copy propagation, the expanded result must
    contain the result of that phi, with a loop
    invariant expression as the stride of the
    induction variable
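  A schematic sketch of the pattern described above (ours; the SSA names p0,
  p1, p2 and the comment syntax are illustrative, not actual Open64 WHIRL
  output), for a simple pointer-walking loop:

    // SSA view of the loop below:
    //   loop header:  p1 = phi(p0, p2)   // live phi in the loop header BB
    //                 p0: init, defined outside the loop (loop-invariant operand)
    //                 p2: increment, defined within the loop body
    //   loop body:    ... *p1 ...         // strided reference through p1
    //                 p2 = p1 + 4         // copy-propagated expansion contains the
    //                                     // phi result p1; the loop invariant 4
    //                                     // (sizeof(int) here) is the stride
    void touch_all(const int* begin, const int* end) {
      for (const int* p = begin; p != end; ++p) {
        volatile int x = *p;   // the strided memory reference
        (void)x;
      }
    }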

17
Our algorithm
18
(No Transcript)
19
Comparison
  • Traditional induction variable recognition
  • Based on strongly connected components
  • Handles variables only
  • Conservative in the presence of aliasing
  • Limited by copy propagation
  • Our algorithm
  • Demand driven
  • Symbolic interpretation
  • Speculative determination
  • Only small modifications to the expansion process
    of the current implementation

20
Phase Ordering
  • Implementing our algorithm after SSAPRE benefits
    from the strength reduction and PRE optimizations

21
Prefetching Scheduling
  • Leading reference determination
  • Prefetching information collection
  • Stride value, data/loop shape, target cache model
  • Prefetching determination for the candidates
  • Based on heuristics such as data and loop size,
    as well as the number of prefetches in the loop
  • Computation of the prefetching distance
  • Memory latency divided by the estimated time per
    iteration (worked example after this list)
  • Loop transformations based on locality
    information to further reduce the number of
    prefetches
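  For example (illustrative numbers, not measurements from the paper): with an
  assumed memory latency of 200 cycles and an estimated 10 cycles per loop
  iteration, the prefetching distance is 200 / 10 = 20 iterations, so each
  prefetch targets the address that the loop will reference 20 iterations
  later.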

22
Experiments
  • We conducted experiments on the SPEC CPU2006
    benchmarks on IA-64
  • Itanium 2 Madison, 1.6 GHz, with 6 MB L3 cache and
    8 GB of memory
  • Quad-processor server running Red Hat Linux
    Advanced Server 4.0
  • The compiler is Open64 4.1

23
Normalized results of SPEC2006 FP
24
Normalized results of SPEC2006 INT
25
Conclusion and Future Work
  • We propose an alternative inductive data
    prefetching algorithm, implemented in the global
    optimizer at the O2 level, which can in theory
    prefetch almost all of the stride references that
    are statically determinable at compile time
  • Extend it to prefetch periodic, polynomial,
    geometric, monotonic, and wrap-around variables
  • Fully integrate the stride prefetching algorithm
    with the strength reduction optimization in SSAPRE
  • Coordinate data prefetching with data layout
    optimization
  • Further investigate the interaction between
    software and hardware prefetching, guided by
    static compiler analysis and feedback
    information, on the x86 platform

26
Thanks
  • Thank you very much
  • Any questions?