Automated Heuristic Refinement Applied to Sokoban
  • Doug Demyen (Grad)
  • Andrew McDonald (Undergrad)
  • Sami Wagiaalla (Undergrad)
  • Stephen Walsh (Undergrad)

  • (Re)introduction
  • Automated Heuristic Refinement
  • Sokoban
  • Challenges
  • Other Approaches
  • Our Goal
  • Enhancements
  • Features Used
  • Our Approaches
  • Regression
  • Gradient Descent
  • Offline Learning
  • Online Learning
  • Results
  • Conclusion

(Re)introduction: Automated Heuristic Refinement
  • For a given problem, one might have a number of
    features of any one state
  • Want to combine them to get an estimate of the
    distance to the goal
  • One way to do this is by weighting each such
    feature to get a linear combination
  • Want to use machine learning to find these weights
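
As a minimal sketch of the linear combination described above: the heuristic is a weighted sum of feature values. The `Feature` type, the function name `linear_heuristic`, and the toy features are illustrative assumptions, not the presenters' code.

```python
from typing import Callable, Sequence

# A feature maps a puzzle state to a number; the state type is left abstract.
Feature = Callable[[object], float]

def linear_heuristic(state: object,
                     features: Sequence[Feature],
                     weights: Sequence[float]) -> float:
    """Estimate the distance to the goal as a weighted sum of feature values."""
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature")
    return sum(w * f(state) for w, f in zip(weights, features))

# Toy usage: the "state" is a tuple and both features are made up.
toy_features = [lambda s: float(s[0]), lambda s: 1.0]  # a value and an intercept
estimate = linear_heuristic((3,), toy_features, [2.0, 0.5])  # 2.0*3 + 0.5*1
```

Learning then amounts to choosing the `weights` list; the feature functions themselves stay fixed.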

(Re)introduction: Sokoban
  • Puzzle where a man must move a number of rocks
    onto goal positions
  • The man cannot move through either rocks or walls
  • He can only push rocks (never pull them), and only one at a time

Challenges
  • Sokoban is very difficult
  • PSPACE-complete
  • Irreversible states
  • Deadlocks: states the puzzle can't be solved from
  • Huge branching factor (up to 4 × the number of rocks)
  • Long solutions (can be hundreds of pushes)
  • Heuristics hard to determine and misleading
  • Often cannot generalize between puzzles

Other Approaches
  • Most solvers have had limited success, solving only
    a few problems in the standard set
  • Rolling Stone is the most successful
  • Solves over 50 of the standard puzzle set
  • Many domain-specific enhancements
  • Took a PhD student, a professor, and a summer
    student over 2 years to build (with several months
    spent solving the first puzzle)

Our Goal
  • Create features of a state of a Sokoban puzzle,
    and combine them to get a heuristic for running the search
  • Find a heuristic which would lead to a more
    efficient search to the goal (in number of nodes)
  • Determine heuristics on smaller puzzles and
    extend them to larger ones

Our Enhancements
  • Used rock-pushes instead of man-moves as the search steps
  • Increases branching factor but decreases solution
    depth much more
  • Extensive deadlock detection
  • Detects any configuration involving adjacent
    rocks and walls
  • Also rocks on a wall that can't be taken off
    (when the goal isn't along this wall)
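
A rough sketch of two simple checks in the spirit of "adjacent rocks and walls": a rock wedged in a corner, and a 2×2 square made up entirely of rocks and walls. The set-of-cells representation and all function names are assumptions for illustration; the deck does not show its actual detector, which also covers the rock-stuck-on-a-wall case omitted here.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]  # (row, column)

def corner_deadlock(rock: Cell, walls: Set[Cell], goals: Set[Cell]) -> bool:
    """An off-goal rock with a wall on one vertical side and one horizontal
    side can never be pushed again."""
    if rock in goals:
        return False
    r, c = rock
    blocked_vertically = (r - 1, c) in walls or (r + 1, c) in walls
    blocked_horizontally = (r, c - 1) in walls or (r, c + 1) in walls
    return blocked_vertically and blocked_horizontally

def block_deadlock(rocks: Set[Cell], walls: Set[Cell], goals: Set[Cell]) -> bool:
    """A 2x2 square filled with rocks and walls freezes every rock in it."""
    solid = rocks | walls
    for r, c in rocks:
        # Try each of the four 2x2 squares that contain this rock.
        for dr in (-1, 0):
            for dc in (-1, 0):
                square = {(r + dr + i, c + dc + j) for i in (0, 1) for j in (0, 1)}
                if square <= solid and any(
                        cell in rocks and cell not in goals for cell in square):
                    return True
    return False

def is_deadlocked(rocks: Set[Cell], walls: Set[Cell], goals: Set[Cell]) -> bool:
    """True if either simple pattern appears anywhere in the state."""
    return (any(corner_deadlock(rock, walls, goals) for rock in rocks)
            or block_deadlock(rocks, walls, goals))
```

States flagged this way can be pruned before they are expanded, which is where the savings in searched nodes would come from.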

Features Used
  • Average Manhattan Distance
  • For each rock, calculate the summed horizontal
    and vertical distance to each goal
  • Take the average over the goals for each rock, then
    sum these averages over all rocks
  • Degrees of Freedom
  • Count how many possible moves the man can make
  • Subtract this from the maximum possible (the
    number of rocks × 4)
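
The two features above might be computed along these lines. This is a sketch under assumptions: positions are (row, column) tuples, and the count of currently legal pushes is taken as an input rather than derived from the board.

```python
from typing import Iterable, Sequence, Tuple

Cell = Tuple[int, int]  # (row, column)

def average_manhattan(rocks: Iterable[Cell], goals: Sequence[Cell]) -> float:
    """For each rock, average its Manhattan distance over all goals,
    then sum those averages over the rocks."""
    total = 0.0
    for rr, rc in rocks:
        dists = [abs(rr - gr) + abs(rc - gc) for gr, gc in goals]
        total += sum(dists) / len(dists)
    return total

def degrees_of_freedom(num_legal_pushes: int, num_rocks: int) -> int:
    """Maximum conceivable pushes (4 per rock) minus those actually available,
    so a more constrained position scores higher."""
    return 4 * num_rocks - num_legal_pushes

amd = average_manhattan([(0, 0), (2, 2)], [(0, 2), (2, 0)])  # each rock averages 2
dof = degrees_of_freedom(num_legal_pushes=3, num_rocks=2)
```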

Features (cont'd)
  • Individual Rock Distances
  • Find the number of pushes to get each rock on a
    goal tile, ignoring other rocks
  • Sum these numbers over all rocks
  • Single-Rock Subproblems
  • Convert all but one rock into wall tiles and solve
    this sub-problem
  • Sum the solution lengths for all rocks (adding a
    large number for no solution)
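
One way to sketch the Individual Rock Distances feature is a breadth-first search over positions of a single rock. Several things here are assumptions made for brevity: the board is a list of equal-length strings with `#` for walls, the man is assumed able to reach any free pushing square, and the `penalty` for an unreachable goal borrows the "large number" idea from the single-rock subproblem feature.

```python
from collections import deque
from typing import List, Optional, Sequence, Set, Tuple

Cell = Tuple[int, int]  # (row, column)

def push_distance(grid: List[str], rock: Cell, goals: Set[Cell]) -> Optional[int]:
    """Fewest pushes to bring one rock onto any goal, ignoring other rocks."""
    rows, cols = len(grid), len(grid[0])

    def free(r: int, c: int) -> bool:
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] != '#'

    seen = {rock}
    queue = deque([(rock, 0)])
    while queue:
        (r, c), pushes = queue.popleft()
        if (r, c) in goals:
            return pushes
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            dest = (r + dr, c + dc)  # where the rock would end up
            man = (r - dr, c - dc)   # where the man must stand to push
            if free(*dest) and free(*man) and dest not in seen:
                seen.add(dest)
                queue.append((dest, pushes + 1))
    return None  # no goal reachable for this rock

def individual_rock_distances(grid: List[str], rocks: Sequence[Cell],
                              goals: Set[Cell], penalty: int = 1000) -> int:
    """Sum the per-rock push distances, charging a penalty when none exists."""
    total = 0
    for rock in rocks:
        d = push_distance(grid, rock, goals)
        total += penalty if d is None else d
    return total

GRID = ["#####",
        "#   #",
        "#   #",
        "#   #",
        "#####"]
```

With this grid, a rock at (2, 2) needs two pushes to reach a goal at (3, 3), while a rock in the corner (1, 1) can never move.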

Features (cont'd)
  • Turnaround points
  • For each rock on a wall, determine the distance
    to where it can be taken off
  • Take the average if there is more than one, and
    sum over each of these rocks
  • Clumping
  • For each rock, sum the vertical and horizontal
    distances to each other rock
  • Sum these values together
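
The clumping feature, read literally from the bullets above, can be sketched as follows. Note that summing "for each rock" counts every pair of rocks twice; whether the original implementation did the same is not stated, so this is an assumption.

```python
from typing import Sequence, Tuple

Cell = Tuple[int, int]  # (row, column)

def clumping(rocks: Sequence[Cell]) -> int:
    """For each rock, add up its Manhattan distance to every other rock,
    then sum over all rocks (each pair is counted twice)."""
    total = 0
    for i, (r1, c1) in enumerate(rocks):
        for j, (r2, c2) in enumerate(rocks):
            if i != j:
                total += abs(r1 - r2) + abs(c1 - c2)
    return total

spread = clumping([(0, 0), (0, 3), (4, 0)])  # pair distances 3, 4 and 7, doubled
```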

Features (cont'd)
  • Random feature
  • Simply returned a random number
  • Included for insight into the problem
  • Ended up with some interesting effects (more on
    this later)
  • Intercept
  • Always returned 1
  • Used for an intercept in regression

Our Approaches
  • For any puzzle state, we have a path of states
    from the start to there
  • Then for each of these states, we can extract the
    values of each feature
  • For offline learning we use the fact that the
    heuristic at the goal should be 0
  • And add one for each move further back along the solution path
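
The labelling rule above (0 at the goal, plus one per move going back along the solution) could be sketched like this. The function name and the string stand-ins for states are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

def label_solution_path(path: Sequence[object],
                        features: Sequence[Callable[[object], float]]
                        ) -> List[Tuple[List[float], int]]:
    """Pair each state's feature vector with its true distance to the goal.
    path[0] is the start state and path[-1] is the goal state."""
    last = len(path) - 1
    return [([f(state) for f in features], last - i)
            for i, state in enumerate(path)]

# Toy usage: strings stand in for puzzle states, with a made-up feature
# and an intercept feature that always returns 1.
toy_path = ["abc", "ab", "a"]
examples = label_solution_path(toy_path, [len, lambda s: 1.0])
targets = [distance for _, distance in examples]
```

Each `(feature_vector, distance)` pair is one training example for the weight-fitting step.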

Offline Learning
  • For offline learning, we use brute force to
    find solutions for simple puzzles
  • Given this data, try to find weights for each
    feature such that their combination results in
    the correct distance to the goal
  • This was done using two methods
  • Gradient Descent
  • Regression
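
For the regression option, a minimal ordinary-least-squares sketch in plain Python is shown below. It solves the normal equations by Gauss-Jordan elimination; the deck does not say which regression routine was actually used, and the toy data is invented for illustration.

```python
from typing import List, Sequence

def fit_least_squares(X: Sequence[Sequence[float]],
                      y: Sequence[float]) -> List[float]:
    """Find weights w minimising the squared error of X w against y
    by solving the normal equations (X^T X) w = X^T y."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent on this data")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Toy data: one made-up feature plus an intercept column of ones,
# with distances that follow 2 * feature + 1 exactly.
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
y = [1.0, 3.0, 5.0, 7.0]
weights = fit_least_squares(X, y)
```

The intercept column plays the role of the "Intercept" feature that always returns 1.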

An Early (Discouraging) Result
  • Using the above technique with a few features and
    data sets from several simple problems, ran
    stepwise regression
  • This was to find which of these features, their
    squares, and their pairwise combinations were
    relevant, along with their coefficients
  • The only feature found to be relevant for these
    datasets was random (and its square)
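
The candidate terms fed to stepwise regression (features, their squares, and pairwise combinations) might be generated as below. This sketches only the term construction, not the stepwise selection itself, and the function name is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence

def expand_features(values: Sequence[float]) -> List[float]:
    """Return the raw feature values, followed by their squares,
    followed by the product of every pair of distinct features."""
    squares = [v * v for v in values]
    products = [a * b for a, b in combinations(values, 2)]
    return list(values) + squares + products

expanded = expand_features([2.0, 3.0])  # raw, squares, then the one product
```

With n raw features this yields n + n + n(n-1)/2 candidate terms, which grows quickly relative to the handful of data points a short solution path provides.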

Can't generalize: now what?
  • So one set of weights was not going to make all
    puzzles crumble at our feet
  • Still combinations of features can be useful in
    guiding search
  • Trained weights on only one puzzle at a time, then
    applied them to other puzzles
  • Only linear combinations, not squares or
    interaction, to avoid overfitting

Our More Modest Approach
  • Start with all feature weights set to zero
  • Solve a simple problem by brute force
  • Obtain the distances to goal and feature values
    along the solution path
  • Use regression or gradient descent to find
    weights based on that data set
  • Continue applying to harder puzzles
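
A batch gradient descent version of the weight-fitting step, starting from all-zero weights as in the list above, could look like this. The learning rate, epoch count, and toy data are assumptions chosen so the example converges; none of them come from the slides.

```python
from typing import List, Sequence

def gradient_descent(X: Sequence[Sequence[float]], y: Sequence[float],
                     rate: float = 0.05, epochs: int = 2000) -> List[float]:
    """Minimise mean squared error of the linear heuristic by repeatedly
    stepping the weights against the gradient."""
    weights = [0.0] * len(X[0])  # start with every feature weighted to zero
    m = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for row, target in zip(X, y):
            error = sum(w * x for w, x in zip(weights, row)) - target
            for j, x in enumerate(row):
                grad[j] += error * x / m
        weights = [w - rate * g for w, g in zip(weights, grad)]
    return weights

# Toy data: one made-up feature plus an intercept, distances = 2 * feature + 1.
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
y = [1.0, 3.0, 5.0, 7.0]
learned = gradient_descent(X, y)
```

To continue on a harder puzzle, the returned weights can seed the next round in place of the zero vector.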

Mixed Results
  • Including weighted features improved the search
    by providing some guidance
  • Improvements ranged from very little to searching
    hundreds of times fewer nodes
  • Results vary by how well our features could
    describe the puzzle
  • Value of training for a puzzle varies with
    similarities to the training puzzle

Online Learning
  • Also attempted to use online learning to improve
    the heuristic during the search
  • Whenever search reaches the IDA* depth bound,
    assume the nodes that are cut off are still c
    steps from the goal
  • Use the feature values of the states from the
    start to this state to improve the weights for
    the next iteration
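
One possible reading of the online update, sketched under stated assumptions: the cut-off node is treated as c steps from the goal, each earlier state on the path gets that value plus its number of steps to the cut-off node, and the weights take one stochastic gradient step per state. The target construction, the update rule, and the `rate` parameter are all guesses at the details; the slides give only the outline.

```python
from typing import List, Sequence

def online_update(weights: Sequence[float],
                  path_features: Sequence[Sequence[float]],
                  c: float, rate: float = 0.01) -> List[float]:
    """Adjust weights using the path from the start to a cut-off node.
    path_features[i] is the feature vector of the i-th state on that path;
    the final entry belongs to the node cut off at the depth bound."""
    last = len(path_features) - 1
    new_weights = list(weights)
    for i, feats in enumerate(path_features):
        target = (last - i) + c  # steps to the cut-off node, plus assumed c
        estimate = sum(w * x for w, x in zip(new_weights, feats))
        error = estimate - target
        new_weights = [w - rate * error * x
                       for w, x in zip(new_weights, feats)]
    return new_weights

# Toy usage: a one-state path with two made-up feature values.
updated = online_update([0.0, 0.0], [[1.0, 1.0]], c=2, rate=0.1)
```

The updated weights would then be used for the next iteration of the search, with its larger depth bound.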

More Results
  • Online learning was quite successful
  • This allowed us to tune weights to the current
    puzzle without having to finish the puzzle, which
    could take a long time
  • A final result worth mentioning
  • Online learning of weights allowed us to solve
    the first puzzle of the standard set!

Conclusion
  • Sokoban puzzles are designed by humans
    specifically to be challenging
  • Each puzzle has its own tricks; what works on
    one seldom works on another
  • Using features of a puzzle state as heuristics
    helps guide search, but their relative importance
    varies by the puzzle
