EECS 583 Lecture 23: Group 1 Control flow analysis optimization, Group 2 Dataflow analysis optimization

1
EECS 583 Lecture 23
Group 1: Control flow analysis optimization
Group 2: Dataflow analysis optimization
  • University of Michigan
  • April 8, 2002

2
Today
  • Control flow analysis and optimization
  • Identifying branch correlations with path
    profiles
  • Extending trimaran to perform path profiling,
    optimization using path profile information
  • Nael, Pariwat, Nuwee
  • Compiler switch spacewalking
  • Identifying the best settings for compiler
    switches
  • Ibrahim, Pete
  • Dataflow analysis and optimization
  • BDD-based predicate analysis (record number of
    slides)
  • More intelligent predicate relation analysis
    using binary decision diagrams
  • Beth, Laura, Bill

3
Next time
  • Scheduling groups
  • TI C6x Dave, Jeff
  • While loop software pipelining Arnar, Tomas,
    Misha
  • Power-sensitive scheduling Hai, Amit
  • Last class Wednesday, April 17
  • Start at 4:00 pm
  • To be held on central campus
  • We will head over to Ashley's afterwards
    (attendance optional)
  • First round is on me
  • Don't have to drink if you do not want to

4
EECS 583 Group 1 Advanced control flow
analysis and optimization Path Profiling
  • University of Michigan
  • April 8, 2002

5
Profiling?
  • A profile counts occurrences of events during a
    program's execution
  • Point/Basic Block Profiling
  • Edge Profiling
  • Path Profiling

[Figure: example CFG annotated with execution counts,
contrasting point profiling (per-block counts) with edge
profiling (per-edge counts)]
6
Path Profiling: What is it? How to?
  • Path = execution trace
  • Path profiles: how often does a control-flow
    path execute?
  • How to profile paths?
  • Approximate with block or edge profiles
  • Inaccurate!!
  • Trace program
  • High cost!!
  • Use?
  • Compiler optimization
  • Performance Tuning
  • Program Testing

7
Profiling? Edge Profiling Is Not Enough!
Example of how edge profiles misidentify the most
frequently executed paths.
[Figure: CFG with edge counts whose hottest individual
edges do not compose into the hottest path]
8
Dumb way to collect path profiles
  • Set k = history depth
  • Execute the program; on each new block, add it to a
    FIFO queue of length k
  • If the k-block pattern is found in the hash table,
    increment its counter
  • Else add the new pattern to the hash table

k = 4
String of blocks: A B C D E F A C D E ...
Hash table (path / counter):
A B C D: 10   B C D E: 5   C D E F: 1   E F A C: 1
F A C D: 1   A C D E: 2   ...
FIFO holds the most recent path: A C D E
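The FIFO-and-hash-table scheme above can be sketched in a few lines of Python. This is a toy stand-in, not the course's Trimaran instrumentation; the function name and the sample trace are illustrative:

```python
from collections import Counter, deque

def profile_paths(block_trace, k=4):
    """Count every window of k consecutively executed blocks.

    block_trace: basic-block labels in execution order.
    Returns a Counter mapping k-block paths to occurrence counts.
    """
    counts = Counter()
    fifo = deque(maxlen=k)          # FIFO queue of length k
    for block in block_trace:
        fifo.append(block)          # new block -> add to FIFO
        if len(fifo) == k:          # found pattern: bump its counter
            counts[tuple(fifo)] += 1
    return counts

trace = list("ABCDEFACDE")
counts = profile_paths(trace, k=4)
# counts[('A','B','C','D')] == 1, counts[('A','C','D','E')] == 1
```

Note the cost the slide complains about: one hash-table lookup per executed block, which is why Ball-Larus numbering (next slides) is preferred.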
9
Efficient path profiling
  • By Thomas Ball and James R. Larus, 1996
  • From Bell Labs and U. of Wisconsin - Madison

10
Efficient path profiling Algorithm Overview
  • Convert CFG to DAG
  • Assign integer values to edges such that no two
    paths compute the same path sum
  • Select edges to instrument and compute
    appropriate increment for each edge
  • Regenerate path

11
Convert CFG to DAG
  • Paths start at the procedure entry or a loop head
  • Yes! It is intraprocedural path profiling
  • Acyclic paths only
  • Remove backedges
  • Add source vertex (ENTRY)
  • Add sink vertex (EXIT)

12
Assign Edge Values
  • Assign each edge a value Val(e) such that
  • the sum along each DAG path is a unique,
    non-negative integer
  • sums lie in the range 0 .. NumPaths-1
  • NumPaths(v) = number of paths from v to EXIT
13
Assign Edge Values - Example
[Table: NumPaths(v) for each vertex v in A .. F]
Val(v→w_k) = Σ_{i=1..k-1} NumPaths(w_i)
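The two formulas above can be computed in one reverse-order pass over the DAG. A minimal sketch, using a small hypothetical DAG rather than the slide's example graph:

```python
def assign_edge_values(succ):
    """Ball-Larus edge numbering on a DAG.

    succ maps vertex -> ordered successor list; sinks have [].
    Returns (num_paths, val): num_paths[v] = number of paths
    from v to EXIT, val[(v, w)] = edge value, so that path sums
    are unique in 0 .. num_paths[entry] - 1.
    """
    num_paths, val = {}, {}

    def visit(v):
        if v in num_paths:
            return num_paths[v]
        outs = succ.get(v, [])
        if not outs:                   # EXIT (or any sink)
            num_paths[v] = 1
            return 1
        total = 0
        for w in outs:
            val[(v, w)] = total        # Val(v->w_k) = sum_{i<k} NumPaths(w_i)
            total += visit(w)
        num_paths[v] = total
        return total

    for v in succ:
        visit(v)
    return num_paths, val

# Hypothetical DAG: A branches to B/C, both reach D, D branches to E/F.
succ = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'],
        'D': ['E', 'F'], 'E': [], 'F': []}
num_paths, val = assign_edge_values(succ)
# num_paths['A'] == 4: paths ABDE, ABDF, ACDE, ACDF get sums 0..3
```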
14
Assign Edge Values - Example
  • DAG with values computed and Path Encoding

15
Many ways to compute sum
16
Minimal Increments
  • Given edge values, find the minimal set of
    increments that computes the sum
  • Efficient event counting [Ball]
  • Weighs edges by frequency
  • Build a maximum spanning tree so the least-traveled
    edges become the instrumented chords

17
Determine Minimal Increments
  • Inc(B→D) = Val(A→B) + Val(B→D) + Val(D→F)
  •          = 2 + 2 + 0 = 4

18
Instrumentation
  • Basic
  • Initialize r = 0 at ENTRY
  • Increment r += Inc(c) along each chord c
  • Record count[r]++ at EXIT
  • Optimized
  • Initialize + Increment: r = Inc(c)
  • Increment + Record: count[r + Inc(c)]++

19
Path Regeneration
  • Given a sum P, which path produced it?
  • We know the path encodings are 0 .. NumPaths-1
  • Go from the ENTRY node and traverse the graph
  • Do until you reach EXIT
  • At each block, take the edge with the largest Val(e)
    that does not exceed the remaining sum
  • Subtract Val(e) from the remaining sum
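The greedy walk above can be sketched directly. The graph and edge values below are a small hypothetical example (the same shape as a Ball-Larus-numbered DAG, not the slide's figure):

```python
def regenerate_path(entry, succ, val, path_sum):
    """Recover the path that produced a given Ball-Larus sum.

    At each vertex, take the successor edge with the largest
    Val(e) not exceeding the remaining sum, then subtract it.
    """
    path, v, remaining = [entry], entry, path_sum
    while succ.get(v):
        w = max((u for u in succ[v] if val[(v, u)] <= remaining),
                key=lambda u: val[(v, u)])
        remaining -= val[(v, w)]
        path.append(w)
        v = w
    return path

# Hypothetical DAG with Ball-Larus edge values (4 paths, sums 0..3).
succ = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'],
        'D': ['E', 'F'], 'E': [], 'F': []}
val = {('A', 'B'): 0, ('A', 'C'): 2, ('B', 'D'): 0,
       ('C', 'D'): 0, ('D', 'E'): 0, ('D', 'F'): 1}
# regenerate_path('A', succ, val, 3) walks A -> C -> D -> F
```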

20
Performance Measurement
  • Low average run-time overhead: 31%
    (edge profiling: 16%)

21
Performance Measurement
  • Longer path than edge-profiling

22
Related works and Extension
  • Interprocedural Path Profiling by David Melski and
    Thomas Reps
  • Paths may cross procedure boundaries
  • Uses a numbering scheme as in Ball and Larus
  • Complicated by cyclic paths from entering at
    different call sites
  • Whole Program Paths by James R. Larus
  • A complete, compact record of a program's entire
    control flow
  • Supports interprocedural paths
  • Practically used in finding hot sub-paths
  • Requires more storage and post-processing

23
EECS 583 Project: Static Correlated Branch
Prediction
  • Nuwee Wiwatwattana, Pariwat Luangsuwimol,
    Nael Botros
  • April 8, 2002

24
Introduction
  • Importance of path profiles.
  • They expose patterns followed by the program.
  • They allow optimizations to make use of path
    frequencies rather than just point frequencies.
  • In some situations, while it is possible to
    optimize a statement with respect to some paths
    along which it lies, the same optimization
    opportunity does not exist along other paths
    through the statement. We refer to such
    optimizations as path sensitive optimizations.
  • Rajiv Gupta, University of Arizona

25
Path profile driven optimizations
  • Examples of path profile driven optimizations
  • Static correlated branch prediction.
  • Path profile guided partial dead code
    elimination.
  • Partial redundancy elimination.
  • Load redundancy removal.
  • Elimination of array bound check.

26
Increased importance of branch prediction accuracy
  • Increased importance of branch prediction
  • Deeper pipelines.
  • Delayed branch resolution.
  • Dynamic versus static branch prediction
  • Use of the actual pattern followed by the branch.
  • Costs of hardware dynamic branch prediction
  • Cycle time cost (T_cpu = N_inst × Cycles/Inst ×
    Seconds/Cycle)
  • Hardware cost.

27
(No Transcript)
28
Static Correlated Branch Prediction (SCBP)
  • Static (compiler) simulation of a hardware
    per-branch-history branch prediction and branch
    target predictor

29
SCBP
  • Limitations on static branch prediction
  • Limited communication with the processor (only a
    bit stating whether the branch is likely Taken or
    Not Taken is passed from compiler to processor)
  • Branch prediction is static over time, unlike
    hardware dynamic branch predictors.
  • How SCBP gets around those limitations
  • Code duplication

30
SCBP
  • Trade-off between code expansion and branch
    predictability.
  • Improves performance of multiple-issue, deeply
    pipelined microprocessors.

SCBP utilizes general path profiles, not limiting
the technique to forward path profiles. "We appear
to have been the first researchers to have
collected path profile information" - Young and
Smith
31
SCBP
  • What SCBP does
  • An algorithm to use general path profiles to improve
    static branch prediction accuracy.
  • Trades off code expansion for improved accuracy.
  • Provides an algorithm to automatically and
    effectively tune the SCBP space-time trade-off.

32
SCBP
  • Steps for Static Correlated Branch Prediction
  • Profiling.
  • Local minimization.
  • Global reconciliation.
  • Layout.
  • Example

33
SCBP
  • First we generate the path profile for the given
    code

34
SCBP
  • Local minimization
  • Here the concept of a history tree is
    introduced.
  • History tree
  • Nodes of history tree are edges of CFG.
  • For each block that ends with a conditional
    branch there is a history tree.
  • The root node is called the predicted branch; the
    block containing it is called the predicted block.
  • Any path whose last edge starts at the predicted
    block is called a predictive path.
  • For each predictive path, the last edge is called the
    counted edge, and the rest of the path is called
    the observed path.
  • Different nodes in the history tree may map to the
    same CFG block (different paths, or history covering
    multiple iterations of a loop)
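The history-tree idea above can be illustrated with a toy table: for each observed path reaching a predicted branch, pick the statically favoured outcome. The profile format and function name below are illustrative assumptions, not the paper's data structures:

```python
from collections import defaultdict

def history_predictions(path_profile, branch):
    """For each observed path ending at `branch`, pick the
    static prediction (taken 'T' / not-taken 'N') that the
    path profile favours.

    path_profile: {(blocks..., branch, outcome): count} -- a toy
    stand-in for a path profile.  If every history maps to the
    same prediction, the history tree collapses to its root.
    """
    by_hist = defaultdict(lambda: {'T': 0, 'N': 0})
    for path, count in path_profile.items():
        *observed, b, outcome = path
        if b == branch:
            by_hist[tuple(observed)][outcome] += count
    return {h: max(c, key=c.get) for h, c in by_hist.items()}

# Branch C correlates with its predecessor: mostly taken after A,
# mostly not-taken after B -- so keeping one block of history pays off.
profile = {('A', 'C', 'T'): 90, ('A', 'C', 'N'): 10,
           ('B', 'C', 'T'): 5,  ('B', 'C', 'N'): 95}
preds = history_predictions(profile, 'C')
```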

35
SCBP
36
SCBP
  • Now we need to find the minimum amount of history
    necessary to exploit correlation, if it exists.
    The less history we need to preserve, the less
    code expansion will result in the final program.
  • Pruning history trees

37
SCBP
Example of pruning a branch with no correlation
to its ancestors:
the tree collapses to its root node.
38
SCBP
  • Global Reconciliation
  • Determining the minimum number of copies needed
    for each basic block in order to preserve
    correlation history.

39
SCBP
  • Global reconciliation steps
  • First step: find all potential splitters. (Each
    non-leaf node of a minimized (pruned) tree is a
    splitter of the source block of the edge to which
    it maps.)
  • Second step: determine the number of pieces in
    each partition of paths leading to each splitter.
  • Third step: take the intersection of all pieces of
    all partitions of a certain basic block.

40
SCBP
  • Final CFG of our example

41
SCBP
  • Layout issues
  • The CFG produced by reconciliation typically is
    not an executable program, as new join points are
    created.
  • If SCBP is performed on an intermediate
    representation, the new joins are no problem.
  • If it is performed on an intermediate representation
    that resembles machine code, we may need to add
    some new branch instructions in order to ensure
    correct program semantics.

42
SCBP
  • Trading Off Space and Time
  • So far SCBP attempted to capture maximum
    improvement in branch prediction accuracy with no
    attempt to limit size expansion.
  • Net effect will depend on both improvement in
    branch prediction and the penalties due to worse
    cache miss rate.
  • Rather than making maximum number of copies to
    achieve the maximum improvement in branch
    prediction, we will choose only profitable
    branches and blocks for duplication.
  • Solution
  • Overpruning: sacrificing some branch prediction
    accuracy in favor of code size.

43
SCBP
  • Experimental Results
  • Benchmarks used

44
SCBP
  • Experimental Results
  • Training sets used for benchmarks

45
SCBP
  • Experimental Results
  • Additional information about benchmarks

46
SCBP
  • Experimental Results
  • Number of paths profiled as a function of history
    depth

47
SCBP
  • Experimental Results
  • Times for profiling

48
SCBP
  • Experimental Results
  • Sizes of original and transformed code

49
SCBP
  • Experimental Results
  • Mis-predict rate and cache miss rate

50
Reference Papers
  • Static Correlated Branch Prediction
  • Cliff Young
  • Bell Laboratories
  • and
  • Michael D. Smith
  • Harvard University

51
How to Implement an Optimization Based on Path
Profiling in Trimaran?
52
General Idea
[Diagram: C source → Impact → Elcor → simulator, with a
path-profile text file feeding into Elcor]
53
How to Create a Path Profile?
Insert code inside the C program
[Diagram: C code (if() / bb1) and its Rebel form (cmpp,
add, sub) flow through Codegen to a.out, which emits a
PATH_FILE such as "4 1 2 3 4 3 2 4 5 6 8"]
54
Go Back To Elcor
  • 2 steps
  • Create a small function to read input from the text
    file and generate an appropriate data structure to
    store it.
  • Create the optimization code.
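The first step could look like the sketch below. The line format is an assumption (path length, then the block ids, then the count); whatever the instrumented program actually emits in PATH_FILE would dictate the real parser:

```python
def read_path_profile(lines):
    """Parse path-profile text lines into {path_tuple: count}.

    Assumed (hypothetical) line format: first field is the
    number of basic-block ids in the path, followed by the ids,
    then the execution count.
    """
    profile = {}
    for line in lines:
        fields = [int(f) for f in line.split()]
        if not fields:
            continue                      # skip blank lines
        n, rest = fields[0], fields[1:]
        path, count = tuple(rest[:n]), rest[n]
        profile[path] = profile.get(path, 0) + count
    return profile

profile = read_path_profile(["3 1 2 4 120", "2 1 3 30"])
# profile == {(1, 2, 4): 120, (1, 3): 30}
```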

55
Compiler Switch Optimization
  • Peter Schwartz
  • Ibrahim Bashir
  • April 8, 2002

56
Importance of Switches
  • The paper
  • A case study on the importance of compiler and
    other optimizations for improving super-scalar
    processor performance
  • by Duvall, Andersen, Leggoe, Graham, Cooke, and
    Antonio
  • 1999
  • What they did
  • Started with a FORTRAN program
  • Optimized by changing compiler switches
  • Tested results on 2 machines

57
Program and Optimizations
  • Program
  • Written in FORTRAN
  • Modeled spherical particle transport phenomena
  • 2500 lines of code
  • 25 subroutines
  • Many nested loops
  • Optimizations

58
Experiment and Results
  • Test machines
  • IBM SP
  • 160MHz POWER2 CPU
  • 4 nodes
  • DEC Alpha
  • 667MHz 21164 CPU
  • 1 node
  • Results

59
Conclusions
  • Other switch settings gave similar results
  • Authors only reported good results
  • Good switch settings gave a speedup of 10%
  • Used experts to set switches
  • We want to automate the process

60
Genetic Algorithm Parallelisation System (GAPS)
  • The paper
  • GAPS Iterative Feedback Directed Parallelisation
    Using Genetic Algorithms
  • by Andy Nisbet
  • 1998
  • What he did
  • Started with FORTRAN program
  • Used genetic algorithms to find a good sequence
    of compiler optimizations
  • Tested with different numbers of processors

61
The GAPS Approach
62
Population and Evaluation
  • Population initialization
  • Uses domain information
  • Population size = 20
  • Individuals represent sequences of
    transformations
  • Fitness evaluation
  • Transformations encoded in an individual are applied
    to the original code
  • Transformed code is compiled and run on the benchmark
  • Faster execution → higher fitness score
  • Illegal code gets the lowest fitness

63
Selection and Reproduction
  • Selection
  • Probability: linear normalization of fitness
  • Illegal sequences can still reproduce
  • Elitism: only members of the weaker half are
    selected for deletion
  • Reproduction
  • All transformations changed loop structures
  • Crossover and mutation were specific to this
    domain
  • Details are irrelevant here
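Since the class project applies the same idea to on/off compiler switches rather than transformation sequences, a generic sketch is easier to show. Everything here is illustrative: the real fitness function would compile and simulate the benchmark, and GAPS's domain-specific crossover/mutation are replaced by plain one-point crossover and point mutation:

```python
import random

def ga_search(num_switches, fitness, pop_size=20, generations=50, seed=0):
    """Toy genetic search over boolean compiler-switch vectors.

    fitness: callable scoring a switch tuple (higher is better).
    Elitism keeps the strongest half each generation, so the best
    score never degrades.
    """
    rng = random.Random(seed)
    pop = [tuple(rng.random() < 0.5 for _ in range(num_switches))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]           # weaker half deleted
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, num_switches)  # one-point crossover
            child = list(a[:cut] + b[cut:])
            i = rng.randrange(num_switches)       # point mutation
            child[i] = not child[i]
            children.append(tuple(child))
        pop = survivors + children
    return max(pop, key=fitness)

# Toy fitness: reward matching a hypothetical ideal switch setting.
ideal = (True, False, True, True, False, False, True, False)
best = ga_search(8, lambda s: sum(x == y for x, y in zip(s, ideal)))
```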

64
Experiment
  • FORTRAN code
  • Machine
  • SGI Origin 2000
  • 1, 2, 4, 8, and 16 processors
  • Compilers
  • GAPS
  • PFA (Native SGI)
  • Petit

65
Results
  • GAPS performed best
  • 44% faster than PFA
  • 37% faster than Petit
  • Took about 24 hours to produce 20,000
    individuals
  • Only 5.5% of GAPS individuals were legal

66
Conclusions
  • Genetic algorithms can be applied to compiler
    optimizations
  • Can require many reproductions
  • Can take a long time
  • Our search space should be much smaller

67
Adaptive Program Optimization
68
Motivation
  • Trying to find best combination of optimizations
    to apply to a given program
  • Might not be possible to determine the
    applicability of certain transformations at
    compile-time
  • Optimization decisions are limited by lack of
    information about the input data set
  • Drawbacks of profiling
  • Profiling makes use of information collected
    during previous program runs
  • Data collected through profiling is based on
    program input that may not be representative of
    current input
  • Profiles may also become inaccurate if the
    machine configuration changes
  • Adaptive optimization techniques attempt to gain
    more accurate info by making decisions at
    run-time based on the current input and machine
    configuration

69
Adaptive Optimization
  • Rather than applying a particular transformation
    at compile-time, generate adaptive code which can
    behave like transformed code (if needed) at
    run-time
  • Adaptive programs can be thought of as having
    multiple execution paths
  • Selection of a particular execution path is based
    on run-time information/values
  • More practical than multiversioning
  • Cost of run-time analysis is small compared to
    benefits if transformation can be applied

70
Dynamic Optimization
  • Technique similar to adaptive optimization:
    perform optimizations dynamically as information
    becomes available to apply and evaluate them
  • Multiversioning
  • Compiler generates multiple versions of a code
    section, and the most appropriate variant is
    selected at run-time based on current input data
    and/or machine environment
  • Major limitation is that variants are generated
    at compile-time and therefore no run-time info
    can be exploited during code generation
  • Creating enough code variants to cover all
    possible scenarios can lead to significant code
    growth, so typically only a few versions are
    created for each code section

71
Dynamic Optimization (2)
  • Dynamic Feedback
  • Technique that selects from compile-time code
    variants, but uses run-time sampling to choose
  • Same problems as multiversioning: no run-time
    info used in code generation, and code explosion
  • Sampling phase: measure execution time for each
    optimization generated at compile-time
  • User-defined duration; doesn't monitor changes in
    the environment
  • Production phase: use the variant with the
    smallest execution time
  • More problems
  • No guarantee that input data/environment are
    constant during the sampling phase, so it may be
    unreasonable to compare performance of variants
  • Behavior during the sampling phase may not model
    behavior during the production phase

72
Dynamic Optimization (3)
  • Dynamic Compilation
  • Generates new code variants during program
    execution
  • Makes use of run-time information
  • Overhead exceeds multiversioning and dynamic
    feedback
  • Program execution is paused as new code variants
    are generated
  • Generating code at run-time is expensive
  • Due to high overhead, only applied to code
    sections that will benefit
  • Difficult to automate selection of such sections

73
Why use adaptive optimization?
  • Applicability
  • Conservative compile-time analysis may exclude
    some optimizations
  • Run-time test can determine whether or not a
    transformation applies
  • Usefulness
  • Whether or not an optimization will be useful may
    depend on program characteristics not known at
    compile-time
  • Selection
  • When there is more than one applicable
    optimization, selecting the most suitable one may
    require run-time info
  • Adaptive code can behave either as untransformed
    code or code that has been transformed using one
    or more optimizations

74
Adaptive optimization vs. Multiversioning
  • Drawback of multiversion programs is the
    resulting code growth
  • Experience has shown that in many situations
    multiple transformations must be applied to gain
    the desired optimization benefit
  • Number of versions grows exponentially
  • Same problem as compiler switch selection
  • Adaptive code avoids exponential code growth by
    requiring execution of some additional
    instructions at run-time
  • Despite run-time overhead due to additional
    instructions, adaptive programs can realize a
    large fraction of the speedup achievable by
    corresponding multiversion programs
  • Results: the adaptive version had 40-80% of the
    speedup achieved by the multiversion program

75
How adaptive optimization works
  • Achieves effects of transformations by
  • Adapting flow of control
  • Modifying bounds of loop variables
  • Adapting usage of loop variables in array
    subscript expressions
  • Choosing between serial and parallel execution of
    loops
  • Example: loop fusion
  • Execution of loop iterations is interleaved
  • Adaptive code contains back to back loops whose
    execution can either be interleaved or carried
    out in sequence
  • Set Boolean flags based on run-time info, and use
    value of flags to decide between
    original/transformed execution
  • Adaptive transformation reduces code growth
    associated with multiversioning by introducing
    additional predicates and branches
  • Some problems
  • No mention of how run-time info is used to select
    a transformation
  • Only applied to a few loop transformations
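The loop-fusion bullet above can be sketched as one adaptive body whose behaviour a run-time flag selects. This is a toy stand-in with a made-up loop body; a real implementation would set `interleave` from a run-time legality/profitability test:

```python
def adaptive_fused_loops(a, b, interleave):
    """Adaptive stand-in for loop fusion: one routine that behaves
    either as two back-to-back loops or as a single interleaved
    (fused) loop, chosen by a flag set from run-time information.
    """
    if interleave:                      # fused execution
        for i in range(len(a)):
            a[i] += 1                   # body of loop 1
            b[i] += a[i]                # body of loop 2 reads a[i]
    else:                               # original, unfused execution
        for i in range(len(a)):
            a[i] += 1
        for i in range(len(b)):
            b[i] += a[i]

a1, b1 = [0, 0], [0, 0]
adaptive_fused_loops(a1, b1, interleave=True)
a2, b2 = [0, 0], [0, 0]
adaptive_fused_loops(a2, b2, interleave=False)
# Fusion is legal here, so both paths compute the same values.
```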

76
Our project
  • Start out with a large number of Elcor/Impact
    optimization switches
  • Phase 1: narrow down switches to some reasonable
    number
  • What's a reasonable number?
  • Might be decided based on performance analysis
  • How exactly do we narrow down the switches?
  • A couple of different techniques; use empirical
    analysis to decide on one
  • Phase 2: feed the reasonable number of switches into
    a genetic algorithm
  • Let the genetic algorithm run for a really long time
  • Refine this manageable number of switches into an
    optimal combination (near-optimal?)

77
Our project (2)
  • Phase 3: analysis
  • What kind of performance improvements do we get?
  • Looking for a reduction in cycle count
  • How close are we to the optimal combination of
    switches?
  • Probably won't know the exact optimal combination
    due to time constraints
  • Is it any better than what Trimaran does by
    default?
  • Evaluate how good the Trimaran defaults are
  • Is it any better than what a programmer with some
    knowledge of the program would pick?
  • Any unexpected optimization inter-dependencies?

78
Global Predicate Analysis and its Application to
Register Allocation.
  • 1996
  • William Faris

79
Abstract
  • VLIW machines can exploit ILP
  • Predicated execution is a useful tool
  • Unfortunately, predicated code can confuse the
    traditional optimization techniques that the
    compiler uses
  • Solution is to modify traditional compiler
    optimizations to make them predicate aware

80
Global and local
  • Impact already does this type of predicate analysis,
    but on a hyperblock basis
  • Authors claim there are benefits to doing
    predicate analysis with a global scope (they do
    procedure scope, sort of)
  • An example of a global view with partially
    if-converted code

81
  • p, q, s = false
  • if (...) then
  •   p,q = cmpp.un.uc (...) if true
  •   x = ... if p
  • else
  •   r,s = cmpp.un.uc (...) if true
  •   x = ... if r
  •   y = ... if s
  • ... = x if p
  • ... = y if s
  • A register allocator that was only aware of
    per-block information would detect an
    interference between x and y. We need to know that
    p and s can never be true at the same time;
    otherwise this results in a spurious interference
    edge in the coloring algorithm.

82
How do you analyze predicated code?
  • Build up a partition graph.
  • Query this partition graph as you perform
    optimization techniques.
  • Some old terms

83
Building up the partition graph
  • Execution trace - all instructions executed
    from beginning to end in straight-line
    code.
  • Domain - all predicates have a domain; a trace
    belongs to the domain of p if all instructions of
    the trace are executed when p is true.
  • Partition - divides a predicate's domain into
    multiple disjoint subsets. The union of these
    subsets equals the domain.

84
Predicate Partition Graph
  • Predicate Partition Graph G = (V,E)
  • V has a node for each predicate
  • E contains directed edges p → q if p has a
    partition and q is a subset of that partition.
  • If partition p = q ∪ r, then E contains p→q
    and p→r
  • Things can get messy

85
Example
[Figure: CFG with blocks S1..S6; S1 branches (T/F, giving
p2/p3) to S2 and S3, S2 branches (T/F, giving p5/p4) to S5
and S4, all joining at S6]
86
  • I1. p2,p3 = cmpp.un.uc(s1 cond) if true
  • I2. p4_1 = cmpp.uc(s1 cond) if true
  • I3. S2 if p2
  • I4. p5 = cmpp.uc(s2 cond) if p2
  • I5. px = cmpp.on(s2 cond) if p2
  • I6. p4_2 = p4_1 ∨ px if true
  • I7. S3 if p3
  • I8. S5 if p5
  • I9. S4 if p4
  • I10. S6 if true

87
Building the partition graph
  • p0 is true
  • p0 is split into p3 and p2 by I1.
  • p4_1 has the same condition as p3.

[Graph: p0 partitioned into p3 and p2; p4_1 alongside p3]
88
Building the partition graph (continued)
  • I4 and I5 split partition p2 into px and p5
  • I6 makes p4_2 a new domain which is the
    complement of p5, so we add a new split to p0

[Graph: p0 partitioned into p3/p2 and into p4_2/p5;
p2 partitioned into px and p5]
89
Still making the partition graph...
  • p4_2 is the union of p4_1 and px, so we add that
    partition.
  • It's a little messy.

[Graph: full partition graph over p0, p2, p3, p4_1, px,
p5, and p4_2]
90
What is the partition graph good for, anyway?
  • You can ask it queries that are useful when doing
    optimizations.
  • A predicate-aware register allocator uses
  • isDisjoint
  • LeastUpperBoundSum
  • LeastUpperBoundDiff
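A minimal sketch of the isDisjoint query, assuming for simplicity that each predicate has a single defining partition (a tree-shaped graph, which real partition graphs need not be). The partition list mirrors the p0 / p_then / p_else example from the later slides:

```python
def ancestors_via(partitions, p):
    """All (partition_parent, child) steps from p up to the root,
    assuming each predicate is the child of one partition."""
    parent_of = {}
    for parent, children in partitions:
        for c in children:
            parent_of[c] = (parent, c)
    chain = []
    while p in parent_of:
        parent, child = parent_of[p]
        chain.append((parent, child))
        p = parent
    return chain

def is_disjoint(partitions, p, q):
    """p and q can never be true together if some partition places
    their ancestor chains under different (mutually exclusive)
    children of the same parent."""
    for parent_p, child_p in ancestors_via(partitions, p):
        for parent_q, child_q in ancestors_via(partitions, q):
            if parent_p == parent_q and child_p != child_q:
                return True
    return False

# p0 splits into p_then/p_else; p,q under then; r,s under else.
partitions = [('p0', ['p_then', 'p_else']),
              ('p_then', ['p', 'q']), ('p_else', ['r', 's'])]
# is_disjoint(partitions, 'p', 's') -> True: no spurious x/y edge.
```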

91
Extending to global
  • The paper does predicate analysis on a whole
    procedure to get better results. Input is a CFG.
  • Modify our terms
  • Trace - all executed instructions on an acyclic path
    from the start node of the CFG to the end node.
  • A trace belongs to the domain of a predicate p if
    all instructions in the trace are executed when p
    is true.
  • Domain of a basic block - all traces where the
    basic block is executed. If BB4 can be reached
    from BB2 then the domains of BB4 and BB2 are not
    disjoint.

92
How to deal with the domain of a basic block
  • Assign a predicate to each basic block (not a real
    predicate; just used to build the partition graph).
    Call this predicate a control predicate.
  • Real predicates that occur in the code are called
    materialized predicates.
  • Build a single partition graph that includes both
    control predicates and materialized predicates.

93
Building up the global partition graph, control
predicates first
  • More terms...
  • Critical edge - an edge whose source has more
    than one successor and whose destination has more
    than one predecessor (S2→S4 in the example).
  • Since S4 does not post-dominate S2, S4's control
    predicate cannot be a child of S2's control
    predicate in the partition graph.
  • To get around this, make a virtual node with a
    virtual control predicate on each critical edge.
    Having this virtual predicate makes it easier to
    build the partition graph.

94
Building up the global partition graph, control
predicates first
  • Back edges in a CFG are a problem
  • In a loop, p1 and p2 may be disjoint in the
    context of one iteration but not over all
    iterations.
  • The paper does some waffling here: "ignore back
    edges", "be conservative", "little impact",
    "approximation", "less precise"
  • They make more virtual nodes that act as hooks to
    return conservative answers when the partition
    graph is queried.

95
Algorithm to build partition graph for control
predicates
ConstructPartitionGraphForControlPredicates(CFG)

  Create a virtual node on each critical edge or
  back edge
  Find control-equivalent nodes
  Assign a predicate to each set of control-equivalent
  nodes
  for every node in the CFG
      v = current node
      p = control predicate assigned to v
      if (number of successors > 1)
          Create a partition with p as the parent
          predicate and the predicates assigned to the
          successors as the child predicates
      if (number of predecessors > 1)
          Create a partition with p as the parent
          predicate and the predicates assigned to the
          predecessors as the child predicates, if this
          partition has not been generated yet
  if a non-start node, u, has no parent, create a
  partition with the immediate dominator of u as the
  parent predicate, and u and an implicit predicate as
  the child predicates
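A condensed, runnable version of the branch/merge core of this algorithm (virtual nodes for critical/back edges, control-equivalence classes, and the dominator fallback are omitted for brevity; the `cp_` naming is illustrative):

```python
def control_predicate_partitions(succ):
    """Assign a control predicate cp_<v> to every CFG node and
    emit one partition per branch (>1 successor) and per merge
    (>1 predecessor), as in the slide's pseudocode core.
    """
    preds = {}
    for v, ws in succ.items():
        for w in ws:
            preds.setdefault(w, []).append(v)
    cp = lambda v: 'cp_' + v
    partitions = []
    for v, ws in succ.items():
        if len(ws) > 1:                 # branch: children = successors
            partitions.append((cp(v), [cp(w) for w in ws]))
    for v, us in preds.items():
        if len(us) > 1:                 # merge: children = predecessors
            part = (cp(v), [cp(u) for u in us])
            if part not in partitions:  # skip already-generated partitions
                partitions.append(part)
    return partitions

# Diamond CFG: S1 branches to S2/S3, which merge at S4.
succ = {'S1': ['S2', 'S3'], 'S2': ['S4'], 'S3': ['S4'], 'S4': []}
parts = control_predicate_partitions(succ)
# -> [('cp_S1', ['cp_S2', 'cp_S3']), ('cp_S4', ['cp_S2', 'cp_S3'])]
```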

96
Handling the Materialized predicates
  • Straightforward: scan each basic block for cmpps.
    Depending on the type of cmpp and whether it is
    guarded or not, take some action to modify the
    partition graph.
  • There is an algorithm for this as well

97
Materialized predicate algorithm..
ConstructPartitionGraphForMaterializedPredicates
(instruction stream, CFG)

  for every compare instruction in the CFG
      cinst = current compare instruction
      qp = qualifying predicate of cinst
      bp = control predicate assigned to the basic
           block containing cinst
      p1 = the first destination predicate
      p1_old = the old definition of p1, if p1 is an
               update
      p2 = the second destination predicate, if it
           exists
      pp = parent predicate
      if (qp == p0) pp = bp else pp = qp
      switch (compare type)
          case .un.uc:
              Create a partition with pp as the parent
              predicate and p1 and p2 as child predicates
          case .cn.cc:
              if (qp == p0) process in the same way as
              the .un.uc case
98
Example input for these algorithms
  • p, q, r, s = false
  • if (...) then
  •   p,q = cmpp.un.uc (...) if true
  •   x = ... if p
  • else
  •   r,s = cmpp.un.uc (...) if true
  •   x = ... if r
  •   y = ... if s
  • ... = x if p
  • ... = y if s

99
The algorithms create the following partition graph:
[Graph: p0 partitioned into p_then and p_else; p_then
into p and q; p_else into r and s]
100
Using this global (procedure-level) partition
graph, they modified a register allocator.
  • The register allocator used the coloring algorithm
    we saw in class, operating on procedures
  • They compiled several SPECint-92 benchmarks two
    times:
  • once with their global predicate analysis and
    once with local predicate analysis. They measured
    the number of colors required by the register
    allocator.

101
Results
  • Of 1009 procedures compiled, 248 procedures showed
    improvement in the number of colors.
  • Of those 248 procedures that showed improvement,
    the average improvement was a 20% decrease in the
    number of required colors.

102
Why did most procedures not show any improvement?
  • Blame the conservative if-converter.
  • A more aggressive if-converter would generate more
    cmpps, and the algorithm would have more
    opportunities to improve.
  • Even if code was if-converted, it may be so simple
    that local predicate analysis is sufficient.