1
Automatic Algorithm Configuration based on Local
Search
  • EARG presentation, December 13, 2006
  • Frank Hutter

2
Motivation
  • Want to design best algorithm A to solve a
    problem
  • Many design choices need to be made
  • Some choices deferred to later: free parameters
    of the algorithm
  • Set parameters to maximise empirical performance
  • Finding best parameter configuration is
    non-trivial
  • Many parameter configurations
  • Many test instances
  • Many runs to get realistic estimates for
    randomised algorithms
  • Tuning is still often done manually, up to 50% of
    development time
  • Let's automate tuning!

3
Parameters in different research areas
  • NP-hard problems (tree search):
  • Variable/value heuristics, learning, restarts, ...
  • NP-hard problems (local search):
  • Percentage of random steps, tabu length, strength
    of escape moves, ...
  • Nonlinear optimisation (interior point methods):
  • Slack, barrier init, barrier decrease rate, bound
    multiplier init, ...
  • Computer vision (object detection):
  • Locality, smoothing, slack, ...
  • Compiler optimisation, robotics, ...
  • Supervised machine learning:
  • NOT model parameters, but e.g.
  • L1/L2 loss, penalizer, kernel, preprocessing,
    num. optimizer, ...

4
Related work
  • Best fixed parameter setting
  • Search approaches [Minton '93, '96], [Hutter '04],
    [Cavazos & O'Boyle '05], [Adenso-Diaz & Laguna '06],
    [Audet & Orban '06]
  • Racing algorithms/Bandit solvers [Birattari et
    al. '02], [Smith et al. '04, '06]
  • Stochastic Optimisation [Kiefer & Wolfowitz '52],
    [Geman & Geman '84], [Spall '87]
  • Per instance
  • Algorithm selection [Knuth '75], [Rice '76],
    [Lobjois & Lemaître '98], [Leyton-Brown et al. '02],
    [Gebruers et al. '05]
  • Instance-specific parameter setting [Patterson &
    Kautz '02]
  • During algorithm run
  • Portfolios [Kautz et al. '02], [Carchrae & Beck '05],
    [Gagliolo & Schmidhuber '05, '06]
  • Reactive search [Lagoudakis & Littman '01, '02],
    [Battiti et al. '05], [Hoos '02]

5
Static Algorithm Configuration (SAC)
  • SAC problem instance: 3-tuple (D, A, Q), where
  • D is a distribution of problem instances,
  • A is a parameterised algorithm, and
  • Q is the parameter configuration space of A.

Candidate solution: configuration q ∈ Q, with expected
cost C(q) = E_{I ∼ D}[Cost(A, q, I)]
  • Stochastic Optimisation Problem

6
Static Algorithm Configuration (SAC)
  • SAC problem instance: 3-tuple (D, A, Q), where
  • D is a distribution of problem instances,
  • A is a parameterised algorithm, and
  • Q is the parameter configuration space of A.

CD(A, q, D): cost distribution of algorithm A
with parameter configuration q across instances
from D. Variation is due to randomisation of A
and to variation in instances.
Candidate solution: configuration q ∈ Q, with cost
C(q) = statistic of CD(A, q, D) (estimated by
sampling, as sketched below)
  • Stochastic Optimisation Problem
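A minimal sketch of how C(q) could be estimated empirically (Python; run_algorithm and sample_instance are hypothetical callables standing in for one run of A and for drawing an instance from D, and are not code from the talk):

import random
from statistics import mean

def estimate_cost(run_algorithm, sample_instance, q, n_samples=100, seed=0):
    """Monte-Carlo estimate of C(q) = E_{I~D}[Cost(A, q, I)].

    run_algorithm(q, instance, seed) -> cost of a single run of A with
    configuration q on one instance (hypothetical interface).
    sample_instance() -> one instance drawn from the distribution D.
    """
    rng = random.Random(seed)
    costs = [run_algorithm(q, sample_instance(), rng.randrange(2**31))
             for _ in range(n_samples)]
    return mean(costs)  # substitute median/quantile for other statistics of CD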

7
Parameter tuning in practice
  • Manual approaches are often fairly ad hoc
  • Full factorial design
  • Expensive (exponential in number of parameters)
  • Tweak one parameter at a time
  • Only optimal if parameters independent
  • Tweak one parameter at a time until no more
    improvement is possible (see the sketch below)
  • Local search → gets stuck in a local minimum
  • Manual approaches are suboptimal
  • Only find poor parameter configurations
  • Very long tuning time
  • Want to automate
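As a rough illustration of the "tweak one parameter at a time until no more improvement" strategy (a Python sketch; evaluate, config and values_per_param are hypothetical names, and a deterministic evaluate is assumed):

def one_param_at_a_time(evaluate, config, values_per_param):
    """Sweep each parameter in isolation, keep the best value, and repeat
    until a full pass yields no improvement, i.e. a local minimum.
    evaluate(config) -> cost; assumed deterministic for this sketch."""
    best, best_cost = dict(config), evaluate(config)
    improved = True
    while improved:
        improved = False
        for name, values in values_per_param.items():
            for v in values:
                candidate = {**best, name: v}
                cost = evaluate(candidate)
                if cost < best_cost:
                    best, best_cost, improved = candidate, cost, True
    return best, best_cost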

8
Simple Local Search in Configuration Space
9
Iterated Local Search (ILS)
10
ILS in Configuration Space
11
What is the objective function?
  • User-defined objective function
  • E.g. expected runtime across a number of
    instances
  • Or expected speedup over a competitor
  • Or average approximation error
  • Or anything else
  • BUT we must be able to approximate the objective
    based on a finite (small) number of samples (see
    the helper sketched below)
  • Statistic is expectation → sample mean (weak law
    of large numbers)
  • Statistic is median → sample median (converges
    ??)
  • Statistic is 90% quantile → sample 90% quantile
    (underestimated with small samples!)
  • Statistic is maximum (supremum) → cannot
    generally be approximated from a finite sample?
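For concreteness, a small helper (hypothetical, not from the slides) that computes the sample-based approximations named above from a list of observed run costs:

from statistics import mean, median

def empirical_statistic(costs, statistic="mean"):
    """Approximate the objective from a finite sample of per-run costs."""
    if statistic == "mean":
        return mean(costs)                 # justified by the weak law of large numbers
    if statistic == "median":
        return median(costs)
    if statistic == "q90":
        s = sorted(costs)
        return s[int(0.9 * (len(s) - 1))]  # crude plug-in; biased low for small samples
    raise ValueError("unsupported statistic: %s" % statistic)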

12
Parameter tuning as pure optimisation problem
  • Approximate the objective function (a statistic of
    a distribution) based on a fixed number N of
    instances
  • Beam search [Minton '93, '96]
  • Genetic algorithms [Cavazos & O'Boyle '05]
  • Exp. design + local search [Adenso-Diaz & Laguna '06]
  • Mesh adaptive direct search [Audet & Orban '06]
  • But how large should N be?
  • Too large: takes too long to evaluate a
    configuration
  • Too small: very noisy approximation → over-tuning

13
Minimum is a biased estimator
  • Let x1, …, xn be realisations of random variables
    X1, …, Xn
  • Each xi is a priori an unbiased estimator of E[Xi]
  • Let xj = min(x1, …, xn). This is an unbiased
    estimator of E[min(X1, …, Xn)], but NOT of E[Xj]
    (because we're conditioning on it being the
    minimum!)
  • Example (simulated in the sketch below):
  • Let Xi ~ Exp(λ), i.e. F(x; λ) = 1 − exp(−λx)
  • E[min(X1, …, Xn)] = (1/n) · E[Xi]
  • I.e. if we just take the minimum and report its
    runtime, we underestimate the cost by a factor of
    n (over-confidence)
  • Similar issues arise for cross-validation etc.
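A small simulation of the exponential example above (a sketch; the printed numbers are illustrative only):

import random
from statistics import mean

def bias_of_minimum(n=10, lam=1.0, repetitions=10000, seed=0):
    """Show empirically that reporting the minimum of n runs underestimates
    the true expected cost by roughly a factor of n when costs are Exp(lam)."""
    rng = random.Random(seed)
    true_mean = 1.0 / lam                                   # E[X_i]
    minima = [min(rng.expovariate(lam) for _ in range(n))
              for _ in range(repetitions)]
    return true_mean, mean(minima)                          # second value ~ true_mean / n

print(bias_of_minimum())   # roughly (1.0, 0.1) for n = 10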

14
Primer: over-confidence for N = 1 sample
y-axis: runlength of the best q found so far.
Training: approximation based on N = 1 sample
(1 run, 1 instance). Test: 100 independent runs
(on the same instance) for each q. Median and
quantiles over 100 repetitions of the procedure.
15
Over-tuning
  • More training → worse performance
  • Training cost monotonically decreases
  • Test cost can increase
  • A big error in the cost approximation leads to
    over-tuning (in expectation); see the toy
    simulation below
  • Let q* ∈ Q be the optimal parameter configuration
  • Let q ∈ Q be a suboptimal parameter configuration
    with better training performance than q*
  • If the search finds q* before q (and q* is the
    best one so far), and the search then finds q,
    the training cost decreases but the test cost
    increases
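A toy simulation of this effect (hypothetical cost model, not data from the talk): each configuration is scored on a single noisy training run, and the one with the best training estimate is typically suboptimal, with a test cost far above its training cost:

import random
from statistics import mean

def overtuning_demo(n_configs=100, train_runs=1, test_runs=100, seed=0):
    """Configuration i has true expected cost 1 + i/n_configs; observed costs
    are exponential around that mean.  Selecting by the training estimate
    yields an over-confident (too low) training cost and often a suboptimal
    configuration."""
    rng = random.Random(seed)
    true_means = [1.0 + i / n_configs for i in range(n_configs)]

    def observe(mu, k):
        return mean(rng.expovariate(1.0 / mu) for _ in range(k))

    train = [observe(mu, train_runs) for mu in true_means]
    best = min(range(n_configs), key=lambda i: train[i])
    return train[best], observe(true_means[best], test_runs), true_means[0]

print(overtuning_demo())   # training estimate << test cost; the optimum has cost 1.0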

16
1, 10, 100 training runs (qwh)
17
Over-tuning on uf400 instance
18
Other approaches without over-tuning
  • Another approach
  • Small N for poor configurations q, large N for
    good ones
  • Racing algorithms [Birattari et al. '02, '05]
  • Bandit solvers [Smith et al. '04, '06]
  • But these treat all parameter configurations as
    independent
  • May work well for small configuration spaces
  • But e.g. SAT4J has over a million possible
    configurations! Even a single run for each is
    infeasible
  • My work: a combination of approaches (see the
    sketch below)
  • Local search in parameter configuration space
  • Start with N = 1 for each q, increase it whenever
    q is re-visited
  • Does not have to visit all configurations; good
    ones are visited often
  • Does not suffer from over-tuning
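A minimal bookkeeping sketch of this combination (hypothetical interface, not the actual implementation from the talk): each configuration q keeps its own list of observed costs, and one more run is added every time the search visits q, so good configurations accumulate a larger N:

import random
from statistics import mean

class IncrementalEvaluator:
    """Cost estimates whose sample size N(q) grows each time q is visited."""

    def __init__(self, run_algorithm, sample_instance, seed=0):
        self.run = run_algorithm                 # run_algorithm(q, instance, seed) -> cost
        self.sample_instance = sample_instance   # draws one instance from D
        self.rng = random.Random(seed)
        self.costs = {}                          # q (hashable, e.g. a tuple) -> list of costs

    def visit(self, q):
        """Add one more run for q and return its current cost estimate."""
        runs = self.costs.setdefault(q, [])
        runs.append(self.run(q, self.sample_instance(),
                             self.rng.randrange(2**31)))
        return mean(runs)                        # based on N(q) = len(runs) runs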

19
Unbiased ILS in parameter space
  • Over-confidence for each q vanishes as N → ∞
  • Increase N for good q
  • Start with N = 0 for all q
  • Increment N for q whenever q is visited
  • Can prove convergence
  • Simple property; even applies to round-robin
  • Experiments: SAPS on qwh

ParamsILS with N = 1
Focused ParamsILS
20
No over-tuning for my approach (qwh)
21
Runlength on simple instance (qwh)
22
SAC problem instances
  • SAPS [Hutter, Tompkins & Hoos '02]
  • 4 continuous parameters ⟨α, ρ, wp, Psmooth⟩
  • Each discretised to 7 values → 7^4 = 2,401
    configurations (enumerated in the sketch below)
  • Iterated Local Search for MPE [Hutter '04]
  • 4 discrete + 4 continuous (2 of them conditional)
  • In total 2,560 configurations
  • Instance distributions
  • Two single instances, one easy (qwh), one harder
    (uf)
  • Heterogeneous distribution with 98 instances from
    SATLIB for which the SAPS median is 50K-1M steps
  • Heterogeneous distribution with 50 mixed MPE
    instances
  • Cost: average runlength, q90 runtime,
    approximation error
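For illustration, the discretised SAPS configuration space can be enumerated as a full cross-product; the seven values per parameter below are placeholders, not the ones used in the experiments:

from itertools import product

grid = {  # hypothetical discretisation of <alpha, rho, wp, Psmooth>
    "alpha":   [1.05, 1.1, 1.15, 1.2, 1.25, 1.3, 1.4],
    "rho":     [0.0, 0.17, 0.33, 0.5, 0.67, 0.83, 1.0],
    "wp":      [0.0, 0.01, 0.02, 0.04, 0.06, 0.08, 0.10],
    "Psmooth": [0.0, 0.017, 0.033, 0.05, 0.067, 0.083, 0.10],
}

configurations = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configurations))   # 7**4 = 2,401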

23
Comparison against CALIBRA
24
Comparison against CALIBRA
25
Comparison against CALIBRA
26
Local Search for SAC: Summary
  • Direct approach for SAC
  • Positive
  • Incremental homing-in to good configurations
  • But using the structure of parameter space
    (unlike bandits)
  • No distributional assumptions
  • Natural treatment of conditional parameters
  • Limitations
  • Does not learn
  • Which parameters are important?
  • An unnecessary binary parameter doubles the
    search space
  • Requires discretisation
  • Could be relaxed (hybrid scheme, etc.)