Adapting Convergent Scheduling Using Machine Learning

1
Adapting Convergent Scheduling Using Machine
Learning
  • Diego Puppin, Mark Stephenson, Una-May
    O'Reilly, Martin Martin, and Saman Amarasinghe

Institute for Information Science and
Technologies, Italy
Massachusetts Institute of Technology, USA
2
Outline
  • This talk shows how one can apply machine
    learning techniques to find good phase orderings
    for an instruction scheduler
  • First, I'll introduce the scheduler that we are
    interested in improving
  • Then, I'll discuss genetic programming
  • Then, I'll present experimental results

3
Clustered Architectures
  • Memory and registers separated into clusters
  • RAW
  • Clustered VLIWs
  • When scheduling, we try to co-locate data with
    computation

4
Convergent Scheduling
  • Convergent scheduling passes are symmetric
  • Each pass takes as input a preference map and
    outputs a preference map
  • Passes are modular and can be applied in any
    order

5
Convergent Scheduling: Preference Maps
  • Each entry is a weight
  • The weights correspond to the confidence of a
    space-time assignment for a given instruction
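As a rough illustration of the idea, a preference map can be pictured as one grid of weights per instruction, indexed by cluster (space) and time slot. The sketch below is an assumption for illustration only: the class name `PreferenceMap`, the methods `boost` and `preferred`, and the choice to normalize each instruction's weights to sum to 1 are hypothetical and not taken from the talk.

```python
from typing import Dict, List, Tuple


class PreferenceMap:
    """Hypothetical preference map: one weight per (instruction, cluster, time slot)."""

    def __init__(self, instructions: List[str], n_clusters: int, n_slots: int) -> None:
        uniform = 1.0 / (n_clusters * n_slots)
        # Start with no preference: every space-time slot is equally likely.
        self.weights: Dict[str, List[List[float]]] = {
            ins: [[uniform] * n_slots for _ in range(n_clusters)]
            for ins in instructions
        }

    def boost(self, ins: str, cluster: int, slot: int, factor: float) -> None:
        """Scale one entry, then renormalize so the weights of `ins` sum to 1."""
        grid = self.weights[ins]
        grid[cluster][slot] *= factor
        total = sum(sum(row) for row in grid)
        self.weights[ins] = [[w / total for w in row] for row in grid]

    def preferred(self, ins: str) -> Tuple[int, int]:
        """Return the (cluster, slot) with the highest weight for `ins`."""
        grid = self.weights[ins]
        return max(
            ((c, t) for c in range(len(grid)) for t in range(len(grid[c]))),
            key=lambda ct: grid[ct[0]][ct[1]],
        )


# Four clusters, three time slots, two made-up instruction names.
pm = PreferenceMap(["add1", "mul1"], n_clusters=4, n_slots=3)
pm.boost("add1", cluster=2, slot=1, factor=5.0)
```

After the `boost` call, `pm.preferred("add1")` points at cluster 2, slot 1, while the other entries for `add1` keep a small non-zero weight, so a later pass could still change the outcome.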

6
Example Dependence Graph
  • Four clusters
  • High confidence
  • Low confidence

7
Placement Propagation
8
Critical Path Strengthening
9
Path Propagation
10
Parallelism Distribute
11
Path Propagation
12
Communication Reduction
13
Path Propagation
14
Final Schedule
15
Convergent Scheduling
  • Classical scheduling passes make absolute
    decisions that can't be undone
  • Convergent scheduling passes make soft decisions
    in the form of preferences
  • Mistakes made early on can be undone
  • Passes don't impose order!
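The contrast between absolute and soft decisions can be sketched in a few lines. This is a simplified model under stated assumptions, not the scheduler from the talk: each instruction has only one weight per cluster (time is ignored), and the names `prefer_cluster`, `apply_passes`, and `final_cluster` are hypothetical. Every pass takes a preference map and returns a new one, so a later pass can outweigh an earlier guess.

```python
from typing import Callable, Dict, List

# Hypothetical simplification: for each instruction, one weight per cluster.
Prefs = Dict[str, List[float]]
Pass = Callable[[Prefs], Prefs]


def normalize(prefs: Prefs) -> Prefs:
    """Rescale each instruction's weights so they sum to 1."""
    return {ins: [w / sum(ws) for w in ws] for ins, ws in prefs.items()}


def prefer_cluster(ins: str, cluster: int, factor: float) -> Pass:
    """Build a toy pass that multiplies one weight by `factor` (a soft decision)."""
    def run(prefs: Prefs) -> Prefs:
        out = {k: list(v) for k, v in prefs.items()}
        out[ins][cluster] *= factor
        return normalize(out)
    return run


def apply_passes(prefs: Prefs, passes: List[Pass]) -> Prefs:
    """Apply passes in the given order; each maps a preference map to a preference map."""
    for p in passes:
        prefs = p(prefs)
    return prefs


def final_cluster(prefs: Prefs, ins: str) -> int:
    """Pick the cluster with the largest weight once all passes have run."""
    ws = prefs[ins]
    return ws.index(max(ws))


start: Prefs = {"load1": [0.25, 0.25, 0.25, 0.25]}
early_guess = prefer_cluster("load1", cluster=0, factor=2.0)
later_evidence = prefer_cluster("load1", cluster=3, factor=4.0)
result = apply_passes(start, [early_guess, later_evidence])
```

Running only `early_guess` leaves `load1` on cluster 0; adding `later_evidence` moves it to cluster 3, in either order, because neither toy pass commits to a final placement.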

16
Double-Edged Sword
  • The good news: convergent scheduling does not
    constrain phase order
  • Nice interface makes writing and integrating
    passes easy
  • The bad news: convergent scheduling does not
    constrain phase order
  • Limitless number of phase orders to consider,
    some of which are much better than others

17
Our Proposal
  • Use genetic programming to automatically search
    for a phase ordering that's tailored to a given
  • Architecture
  • Compiler
  • Our inspiration comes from Cooper's work (Cooper
    et al., LCTES 1999)

18
Genetic Programming
  • Searching algorithm analogous to Darwinian
    evolution
  • Maintain a population of expressions

(sequence INITTIME (sequence PLACE (if imbalanced LOAD COMM)))
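One way to read such an expression is as a small tree that is flattened into an ordered list of passes. The sketch below assumes, without confirmation from the talk, that `sequence a b` means "run a, then b" and that `if cond a b` picks one branch based on a boolean condition. The helper names `parse` and `passes_to_run` are hypothetical, and conditions are supplied up front as a dictionary rather than being evaluated on the evolving preference map.

```python
from typing import Dict, List, Union

Expr = Union[str, list]


def parse(text: str) -> Expr:
    """Parse a parenthesized expression into nested lists of tokens."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos: int):
        if tokens[pos] == "(":
            items = []
            pos += 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return items, pos + 1
        return tokens[pos], pos + 1

    expr, _ = read(0)
    return expr


def passes_to_run(expr: Expr, conditions: Dict[str, bool]) -> List[str]:
    """Flatten an expression into the ordered list of pass names it would run."""
    if isinstance(expr, str):
        return [expr]
    op = expr[0]
    if op == "sequence":
        return passes_to_run(expr[1], conditions) + passes_to_run(expr[2], conditions)
    if op == "if":
        # Assumed semantics: expr[1] names a condition, expr[2]/expr[3] are branches.
        branch = expr[2] if conditions[expr[1]] else expr[3]
        return passes_to_run(branch, conditions)
    raise ValueError(f"unknown operator: {op}")


tree = parse("(sequence INITTIME (sequence PLACE (if imbalanced LOAD COMM)))")
order_if_imbalanced = passes_to_run(tree, {"imbalanced": True})
order_if_balanced = passes_to_run(tree, {"imbalanced": False})
```

Under these assumptions the expression runs `INITTIME`, then `PLACE`, then either `LOAD` or `COMM` depending on the condition.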
19
Genetic Programming
  • Searching algorithm analogous to Darwinian
    evolution
  • Maintain a population of expressions
  • Selection
  • The fittest expressions in the population are
    more likely to reproduce
  • Reproduction
  • Crossing over subexpressions of two expressions
  • Mutation
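Crossover and mutation on expression trees can be sketched as follows. The talk does not say how subexpressions are chosen or what mutation does, so everything here is an assumption: expressions are nested lists, crossover swaps a uniformly random subexpression of one parent for one from the other, and mutation replaces a random subexpression with a single pass. All function names are hypothetical.

```python
import random
from typing import List, Tuple, Union

Expr = Union[str, list]
Path = Tuple[int, ...]


def subtree_paths(expr: Expr, prefix: Path = ()) -> List[Path]:
    """List the paths of all subexpressions (operator and condition names are skipped)."""
    paths = [prefix]
    if isinstance(expr, list):
        first_child = 2 if expr[0] == "if" else 1
        for i in range(first_child, len(expr)):
            paths += subtree_paths(expr[i], prefix + (i,))
    return paths


def get_at(expr: Expr, path: Path) -> Expr:
    """Follow `path` down the tree and return the subexpression found there."""
    for i in path:
        expr = expr[i]
    return expr


def replace_at(expr: Expr, path: Path, new: Expr) -> Expr:
    """Return a copy of `expr` with the subexpression at `path` replaced by `new`."""
    if not path:
        return new
    copy = list(expr)
    copy[path[0]] = replace_at(expr[path[0]], path[1:], new)
    return copy


def crossover(a: Expr, b: Expr, rng: random.Random) -> Expr:
    """Replace a random subexpression of `a` with a random subexpression of `b`."""
    target = rng.choice(subtree_paths(a))
    donor = get_at(b, rng.choice(subtree_paths(b)))
    return replace_at(a, target, donor)


def mutate(expr: Expr, pass_names: List[str], rng: random.Random) -> Expr:
    """Replace a random subexpression with a randomly chosen single pass."""
    target = rng.choice(subtree_paths(expr))
    return replace_at(expr, target, rng.choice(pass_names))


parent_a = ["sequence", "INITTIME", ["sequence", "PLACE", "COMM"]]
parent_b = ["sequence", "LOAD", ["if", "imbalanced", "LOAD", "COMM"]]
child = crossover(parent_a, parent_b, random.Random(0))
mutant = mutate(parent_a, ["LOAD", "COMM", "PLACE"], random.Random(0))
```

Both operators return new trees and leave the parents untouched, which keeps the rest of the population valid while variants are being created.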

20
General Flow
  • Randomly generated initial population

[Flowchart: Create initial population (initial solutions) → Evaluation → done? → Selection → Create Variants]
21
General Flow
  • Compiler is modified to use the given expression
    as the phase ordering
  • Each expression is evaluated by compiling and
    running the benchmark(s)
  • Fitness is the relative speedup over our original
    phase ordering on the benchmark(s)

22
General Flow
  • Just as with Natural Selection, the fittest
    individuals are more likely to survive
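The talk does not name the selection scheme, so the sketch below swaps in one common choice as an assumption: fitness-proportional sampling, where an expression's chance of surviving is proportional to its fitness. The function name `select`, the expression labels, and the fitness values are all made up for illustration.

```python
import random
from typing import List, Sequence


def select(population: Sequence[str], fitnesses: Sequence[float],
           n_survivors: int, rng: random.Random) -> List[str]:
    """Sample survivors with probability proportional to fitness (with replacement)."""
    return rng.choices(population, weights=fitnesses, k=n_survivors)


rng = random.Random(42)
population = ["expr_a", "expr_b", "expr_c"]
fitnesses = [1.30, 1.00, 0.70]   # made-up speedups, for illustration only
survivors = select(population, fitnesses, n_survivors=1000, rng=rng)
counts = {e: survivors.count(e) for e in population}
```

With these weights, `expr_a` is drawn roughly 43% of the time and `expr_c` roughly 23%, so weaker expressions are disadvantaged but not eliminated outright.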

23
General Flow
  • Use crossover and mutation to generate new
    expressions
  • And thus, generate new and hopefully improved
    phase orderings
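Putting the steps of the flow together, a generational loop might look like the sketch below. It is deliberately simplified and rests on several assumptions: individuals are flat lists of pass names rather than expression trees, selection keeps the fitter half, "done?" is a fixed generation budget, and `toy_fitness` is a stand-in for compiling and running benchmarks. The pass names and all function names are illustrative.

```python
import random
from typing import Callable, List

Individual = List[str]
PASSES = ["INITTIME", "PLACE", "LOAD", "COMM", "DEP", "FUNC"]  # illustrative names


def random_individual(rng: random.Random, length: int = 5) -> Individual:
    """Build a random pass sequence of fixed length."""
    return [rng.choice(PASSES) for _ in range(length)]


def make_variant(a: Individual, b: Individual, rng: random.Random) -> Individual:
    """One-point crossover of two parents, followed by an occasional point mutation."""
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]
    if rng.random() < 0.2:
        child[rng.randrange(len(child))] = rng.choice(PASSES)
    return child


def evolve(fitness: Callable[[Individual], float], rng: random.Random,
           pop_size: int = 30, generations: int = 40) -> Individual:
    # Create initial population (random initial solutions).
    population = [random_individual(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):          # "done?" is a fixed generation budget here
        # Evaluation.
        scored = sorted(population, key=fitness, reverse=True)
        if fitness(scored[0]) > fitness(best):
            best = scored[0]
        # Selection: keep the fitter half as parents.
        parents = scored[: pop_size // 2]
        # Create variants from randomly paired parents.
        population = [make_variant(rng.choice(parents), rng.choice(parents), rng)
                      for _ in range(pop_size)]
    return max(population + [best], key=fitness)


def toy_fitness(ind: Individual) -> float:
    """Stand-in for compile-and-run: rewards matching an arbitrary target order."""
    target = ["INITTIME", "DEP", "FUNC", "COMM", "PLACE"]
    return sum(1.0 for got, want in zip(ind, target) if got == want)


rng = random.Random(1)
first_guess = random_individual(random.Random(1))  # same as the first initial individual
winner = evolve(toy_fitness, rng)
```

In a real setup the fitness call dominates the cost, since each evaluation means a compile and a simulation run, so caching fitness values per individual would be a natural refinement.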

24
Experimental Setup
  • We use an in-house VLIW compiler (SUIF,
    MachSUIF) and simulator
  • Compiler and simulator are parameterized so we
    can easily change VLIW configurations
  • Experiments presented here are for clustered
    architectures
  • Details of the architectures are in the paper

25
Convergent Scheduling Heuristics
  • Noise Introduction
  • Initial Time Assignment
  • Preplacement
  • Critical Path Strengthening
  • Communication Minimization
  • Parallelism Distribution
  • Load Balance
  • Dependence Enforcement
  • Assignment Strengthening
  • Functional Unit Distribution
  • Push to first cluster
  • Critical Path Distance
  • Cluster Creation
  • Register Pressure Reduction in Time
  • Register Pressure Reduction in Space

26
Hand-Tuned Results: 4-cluster VLIW, Rich
Interconnect
27
Results: 4-cluster VLIW, Limited Interconnect
28
Training an Improved Sequence
  • Goal: find a sequence that works well for all the
    benchmarks in the last graph (vmul, rbsorf, yuv,
    etc.)
  • Train a sequence using these benchmarks:
  • For each expression in the population, compile and
    run all the benchmarks, and take the average speedup
    as fitness
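The fitness computation described here can be sketched as below. Two details are assumptions rather than statements from the talk: speedup is computed as baseline cycles divided by candidate cycles, and "average" is taken to be the arithmetic mean. The cycle counts are made up and are not results from the talk.

```python
from statistics import mean
from typing import Dict


def speedup(baseline_cycles: int, candidate_cycles: int) -> float:
    """Relative speedup of a candidate ordering over the baseline ordering."""
    return baseline_cycles / candidate_cycles


def fitness(baseline: Dict[str, int], candidate: Dict[str, int]) -> float:
    """Average speedup over all training benchmarks."""
    return mean(speedup(baseline[name], candidate[name]) for name in baseline)


# Made-up cycle counts, for illustration only; not results from the talk.
baseline_cycles = {"vmul": 1000, "yuv": 2000}
candidate_cycles = {"vmul": 800, "yuv": 2000}
score = fitness(baseline_cycles, candidate_cycles)
```

A score above 1.0 means the candidate ordering is faster on average than the original one; here one benchmark improves by 25% and the other is unchanged.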

29
The Schedule
  • Evolved sequence is much more conservative in
    communication
  • inittime → func → dep → func → load → func → dep → func
    → comm → dep → func → comm → place
  • func: reduces the weights of instructions on
    overloaded clusters
  • dep: increases the probability that dependent
    instructions are scheduled nearby
  • comm: tries to keep neighboring instructions in
    the same cluster

30
Results: 4-cluster VLIW, Limited Interconnect
31
Results: Leave-One-Out Cross Validation
32
Summary of Results
  • When we changed the architecture, the hand-tuned
    sequence failed
  • UAS and PCC outperform convergent scheduling
  • Our GP system found a sequence that usually
    outperforms UAS and PCC
  • Cross validation suggests that it is possible to
    find a general-purpose sequence
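The leave-one-out procedure behind the cross-validation result can be sketched generically: for each benchmark, evolve a sequence on all the others and measure it on the one held out. The `train` and `evaluate` callables below are stubs so the sketch runs; they are placeholders and do not reflect how the talk's system trains or measures anything.

```python
from typing import Callable, Dict, List, Sequence


def leave_one_out(
    benchmarks: Sequence[str],
    train: Callable[[List[str]], str],
    evaluate: Callable[[str, str], float],
) -> Dict[str, float]:
    """For each benchmark, train on all the others and test on the held-out one."""
    results = {}
    for held_out in benchmarks:
        training_set = [b for b in benchmarks if b != held_out]
        sequence = train(training_set)
        results[held_out] = evaluate(sequence, held_out)
    return results


# Stub trainer and evaluator so the sketch runs; both are placeholders.
seen_training_sets = []


def stub_train(training_set: List[str]) -> str:
    seen_training_sets.append(tuple(training_set))
    return "sequence trained on " + ",".join(training_set)


def stub_evaluate(sequence: str, benchmark: str) -> float:
    return 1.0  # placeholder speedup, not a measured result


scores = leave_one_out(["vmul", "rbsorf", "yuv"], stub_train, stub_evaluate)
```

Because the held-out benchmark never influences training, a good score on it is evidence that the evolved sequence generalizes beyond its training set.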

33
Running Time
  • Using about 20 machines in a small cluster of
    workstations, it takes about 2 days to evolve a
    sequence
  • This is a one-time process!
  • Performed by the compiler vendor

34
Disappointing Result
  • Unfortunately, sequences with conditionals are
    weeded out of the GP selection process
  • Our system rewards parsimony
  • Convergent scheduling passes make soft decisions,
    so running an extra pass may not be detrimental
  • We'd like to get to the bottom of this unexpected
    result

35
Conclusions
  • Using GP, we're able to find architecture-specific,
    application-independent sequences
  • We can quickly retune the compiler when:
  • The architecture changes
  • The compiler itself changes

37
Implemented Tests