# Automated Heuristic Refinement Applied to Sokoban

1
Automated Heuristic Refinement Applied to Sokoban

2
Outline
• (Re)introduction
• Automated Heuristic Refinement
• Sokoban
• Challenges
• Other Approaches
• Our Goal
• Enhancements
• Features Used
• Our Approaches
• Regression
• Offline Learning
• Online Learning
• Results
• Conclusion

3
(Re)introduction: Automated Heuristic Refinement
• For a given problem, one might have a number of
features of any one state
• Want to combine them to get an estimate of the
distance to the goal
• One way to do this is by weighting each such
feature to get a linear combination
• Want to use machine learning to find these weights
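The weighted linear combination can be sketched in a few lines (the feature functions, weights, and state representation below are made up for illustration):

```python
# A heuristic as a weighted linear combination of state features.
def linear_heuristic(state, features, weights):
    """Estimate distance to goal as a weighted sum of feature values."""
    return sum(w * f(state) for f, w in zip(features, weights))

# Toy usage: two made-up features over a state represented as a dict;
# the constant-1 feature plays the role of an intercept.
features = [lambda s: s["rocks_off_goal"], lambda s: 1.0]
weights = [2.0, 0.5]
state = {"rocks_off_goal": 3}
print(linear_heuristic(state, features, weights))  # 6.5
```

Machine learning then reduces to choosing the `weights` vector so that this estimate tracks the true distance to the goal.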

4
(Re)introduction: Sokoban
• Puzzle where a man must move a number of rocks
onto goal positions
• The man cannot move through either rocks or walls
• He can also only push rocks, and one at a time

5
Challenges
• Sokoban is very difficult
• PSPACE complete
• Irreversible states
• Deadlocks: states from which the puzzle can't be solved
• Huge branching factor (up to 4 × the number of rocks)
• Long solutions (can be hundreds of pushes)
• Heuristics are hard to determine and often misleading
• Often cannot generalize between puzzles

6
Other Approaches
• Most solvers have had limited success, solving only
a few problems in the standard set
• Rolling Stone is the most successful
• Solves over 50 of the 90 puzzles in the standard set
• Many domain-specific enhancements
• Took a PhD student, a professor, and a summer
student over 2 years to build (the first version
alone taking several months)

7
Our Goal
• Create features of a state of a Sokoban puzzle,
and combine them to get a heuristic for running
IDA*
• Find a heuristic which would lead to a more
efficient search to the goal (in number of nodes)
• Determine heuristics on smaller puzzles and
extend them to larger ones
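A generic IDA* skeleton of the kind the heuristic would plug into (a textbook formulation, not the project's actual solver):

```python
import math

def ida_star(start, is_goal, successors, h):
    """Generic IDA*: depth-first searches with an iteratively raised bound on f = g + h."""
    bound = h(start)
    path = [start]

    def dfs(g):
        node = path[-1]
        f = g + h(node)
        if f > bound:
            return f              # cut off; report the f that exceeded the bound
        if is_goal(node):
            return True
        minimum = math.inf
        for nxt, cost in successors(node):
            if nxt in path:       # avoid cycles on the current path
                continue
            path.append(nxt)
            result = dfs(g + cost)
            if result is True:
                return True
            minimum = min(minimum, result)
            path.pop()
        return minimum

    while True:
        result = dfs(0)
        if result is True:
            return path
        if result is math.inf:
            return None           # exhausted the space without reaching a goal
        bound = result            # next iteration uses the smallest exceeded f

# Toy usage: a 4-state line graph where h is the exact remaining distance.
solution = ida_star(0, lambda n: n == 3,
                    lambda n: [(n + 1, 1)] if n < 3 else [],
                    lambda n: 3 - n)
print(solution)  # [0, 1, 2, 3]
```

The heuristic only enters through `h`, so swapping in a learned weighted-feature estimate changes nothing else in the search.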

8
Our Enhancements
• Used rock-pushes instead of man-moves for
actions
• Increases branching factor but decreases solution
depth much more
• Detects any configuration involving adjacent
rocks and walls
• Also rocks on a wall that cant be taken off
(when the goal isnt along this wall)

9
Features
• Average Manhattan Distance
• For each rock, calculate the summed horizontal
and vertical moves to each goal
• Take the average for each and sum them
• Degrees of Freedom
• Count how many possible moves the man can make
• Subtract this from the maximum possible (the
number of rocks × 4)
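The two features above can be sketched directly (the coordinates and counts below are toy values):

```python
def avg_manhattan_distance(rocks, goals):
    """Sum, over rocks, of the rock's average Manhattan distance to the goals."""
    total = 0.0
    for rx, ry in rocks:
        dists = [abs(rx - gx) + abs(ry - gy) for gx, gy in goals]
        total += sum(dists) / len(dists)
    return total

def degrees_of_freedom(num_rocks, possible_man_moves):
    """Maximum possible moves (4 per rock) minus the moves available now."""
    return 4 * num_rocks - possible_man_moves

# Two rocks, two goals; three rocks with five legal man-moves.
print(avg_manhattan_distance([(0, 0), (2, 2)], [(0, 1), (2, 1)]))  # 4.0
print(degrees_of_freedom(3, 5))                                    # 7
```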

10
Features (cont'd)
• Individual Rock Distances
• Find the number of pushes to get each rock on a
goal tile, ignoring other rocks
• Sum these numbers over all rocks
• Single-Rock Subproblems
• Convert all but one rock into wall tiles and solve
this sub-problem
• Sum the solution lengths for all rocks (adding a
large number for no solution)
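The per-rock push distance can be computed by breadth-first search over rock positions. The sketch below simplifies by ignoring whether the man can actually reach the pushing square, which the real feature would have to check:

```python
from collections import deque

def push_distance(walls, start, goals):
    """Minimum pushes to move one rock from `start` onto any goal tile.

    `walls` is a set of (x, y) wall tiles. A push in direction d is taken
    to be legal when both the target square (rock + d) and the square the
    man pushes from (rock - d) are floor.
    """
    if start in goals:
        return 0
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        (x, y), dist = frontier.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            target = (x + dx, y + dy)       # where the rock would go
            behind = (x - dx, y - dy)       # where the man must stand
            if target in walls or behind in walls or target in seen:
                continue
            if target in goals:
                return dist + 1
            seen.add(target)
            frontier.append((target, dist + 1))
    return None  # rock can never reach a goal: a deadlock indicator

# A 4-tile corridor at y = 0, walled in on all sides.
walls = {(x, y) for x in range(-1, 5) for y in (-1, 1)} | {(-1, 0), (4, 0)}
print(push_distance(walls, (1, 0), {(3, 0)}))  # 2
print(push_distance(walls, (0, 0), {(3, 0)}))  # None: no room to stand behind it
```

The `None` case is exactly the "adding a large number for no solution" situation on the slide.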

11
Features (cont'd)
• Turnaround points
• For each rock on a wall, determine the distance
to where it can be taken off
• Take the average if there is more than one, and
sum over each of these rocks
• Clumping
• For each rock, sum the vertical and horizontal
distances to each other rock
• Sum these values together
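Clumping, as described, counts every ordered pair of rocks:

```python
def clumping(rocks):
    """From each rock, sum Manhattan distances to every other rock, then total."""
    total = 0
    for rx, ry in rocks:
        for ox, oy in rocks:
            total += abs(rx - ox) + abs(ry - oy)
    return total

print(clumping([(0, 0), (2, 1)]))  # 6: each unordered pair is counted twice
```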

12
Features (cont'd)
• Random feature
• Simply returned a random number
• Included for insight into the problem
• Ended up with some interesting effects (more on
this later)
• Intercept
• Always returned 1
• Used for an intercept in regression

13
Our Approaches
• For any puzzle state, we have a path of states
from the start to there
• Then for each of these states, we can extract the
values of each feature
• For offline learning we use the fact that the
heuristic at the goal should be 0
• And add one for each move earlier on the solution
path
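Extracting training data from a solution path might look like this (toy integer states and toy features stand in for real puzzle states):

```python
def training_data(solution_path, features):
    """Return (X, y): one feature row per state, with remaining-distance targets."""
    n = len(solution_path)
    X = [[f(state) for f in features] for state in solution_path]
    y = [n - 1 - i for i in range(n)]   # goal state gets 0, each step back adds 1
    return X, y

# Toy states with two toy features: the state value and a constant-1 intercept.
X, y = training_data([3, 2, 1, 0], [lambda s: s, lambda s: 1])
print(X)  # [[3, 1], [2, 1], [1, 1], [0, 1]]
print(y)  # [3, 2, 1, 0]
```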

14
Offline Learning
• For offline learning, we use brute force to
find solutions for simple puzzles
• Given this data, try to find weights for each
feature such that their combination results in
the correct distance to the goal
• This was done using two methods
• Regression
• Gradient descent

15
An Early (Discouraging) Result
• Using the above technique with a few features and
data sets from several simple problems, we ran
stepwise regression
• This was to find the relevant subset of these
features, their squares, and their combinations,
along with their coefficients
• The only feature found to be relevant for these
datasets was random (and random²)

16
Can't generalize? Now what?
• So one set of weights was not going to make all
puzzles crumble at our feet
• Still combinations of features can be useful in
guiding search
• Considered only one puzzle at a time to train
weights for others
• Only linear combinations, not squares or
interactions, to avoid overfitting

17
Our More Modest Approach
• Start with all feature weights at zero
• Solve a simple problem by brute force
• Obtain the distances to goal and feature values
along the solution path
• Use regression or gradient descent to find
weights based on that data set
• Continue applying to harder puzzles
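The talk doesn't show the fitting step itself; a minimal batch gradient-descent sketch over the extracted data could look like:

```python
def fit_weights(X, y, lr=0.01, epochs=2000):
    """Fit w to minimize sum_i (w · x_i - y_i)^2 by batch gradient descent."""
    w = [0.0] * len(X[0])                 # start with zero weights, as on the slide
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += 2 * err * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Feature rows (value + intercept) and distance-to-goal targets for a toy path.
w = fit_weights([[3, 1], [2, 1], [1, 1], [0, 1]], [3, 2, 1, 0])
print(w)  # approximately [1.0, 0.0]
```

An off-the-shelf regression routine would do the same job; the point is only that each pass nudges the weights toward reproducing the recorded distances.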

18
Mixed Results
• Including weighted features improved the search
by providing some guidance
• Improvements ranged from very little to searching
hundreds of times fewer nodes
• Results vary by how well our features could
describe the puzzle
• Value of training for a puzzle varies with
similarities to the training puzzle

19
Online Learning
• Also attempted to use online learning to improve
the heuristic during the search
• Whenever the search reaches the IDA* depth bound,
assume the nodes that are cut off are still c
steps from the goal
• Use the feature values of the states from the
start to this state to improve the weights for
the next iteration
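A sketch of the online update (the learning rate and the exact way `c` enters the targets are assumptions; the talk only states the idea):

```python
def online_update(weights, path_features, c, lr=0.01):
    """One gradient step per state on a cut-off path.

    path_features[i] is the feature vector of the i-th state on the path.
    The last state is assumed to be c steps from the goal, the one before
    it c + 1, and so on back to the start.
    """
    n = len(path_features)
    for i, x in enumerate(path_features):
        target = c + (n - 1 - i)            # assumed remaining distance
        err = sum(w * xj for w, xj in zip(weights, x)) - target
        weights = [w - lr * err * xj for w, xj in zip(weights, x)]
    return weights

# Repeated updates pull the predictions toward the assumed targets.
w = [0.0, 0.0]
for _ in range(1000):
    w = online_update(w, [[2, 1], [1, 1]], 1)
```

Because the update never needs the true solution, the weights can be tuned to the current puzzle mid-search, which is what made this usable on puzzles too hard to solve outright during training.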

20
More Results
• Online learning was quite successful
• This allowed us to tune weights to the current
puzzle without having to finish the puzzle, which
could take a long time
• A final result worth mentioning
• Online learning of weights allowed us to solve
the first puzzle of the standard set!

21
Conclusion
• Sokoban puzzles are designed by humans
specifically to be challenging
• Each puzzle has its own tricks: what works on
one seldom works on another
• Using features of a puzzle state as heuristics
helps guide search, but their relative importance
varies by the puzzle

22
References
• Gordon S. Novak Jr. (2004). Artificial
Intelligence Lecture Notes. http://www.cs.utexas.edu/users/novak/cs381k110.html
• R. C. Holte and Istvan Hernadvolgyi (2000).
Experiments with Automatically Created
Memory-based Heuristics. In Proceedings of the
Symposium on Abstraction, Reformulation and
Approximation (SARA-2000), Lecture Notes in AI,
volume 1864, pp. 281-290, Springer-Verlag.
• F. Schmiedle, D. Grosse, R. Drechsler, B. Becker
(2001). Too Much Knowledge Hurts: Acceleration of
Genetic Programs for Learning Heuristics.
Computational Intelligence: Theory and Applications.
• Andreas Junghanns, Jonathan Schaeffer (1997).
Sokoban: A Challenging Single-Agent Search Problem.
Workshop on Using Games as an Experimental Testbed
for AI Research, Proceedings IJCAI-97, Nagoya,
Japan, August 1997.

23
Questions?