Single-Person Game - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Single-Person Game


1
Single-Person Game
  • conventional search problem
  • identify a sequence of moves that leads to a
    winning state
  • examples: Solitaire, Dungeons and Dragons,
    Rubik's cube
  • little attention in AI
  • some games can be quite challenging
  • some versions of solitaire
  • a heuristic for Rubik's cube was found by the
    Absolver program

2
Two-Person Game
  • games with two opposing players
  • often called MAX and MIN
  • usually MAX moves first, then MIN
  • in game terminology, a move comprises one step,
    or ply, by each player
  • typically you are MAX
  • MAX wants a strategy to find a winning state
  • no matter what MIN does
  • Max must assume MIN does the same
  • or at least tries to prevent MAX from winning

3
Perfect Decisions
  • Optimal strategy for MAX
  • traverse all relevant parts of the search tree
  • this must include possible moves by MIN
  • identify a path that leads MAX to a winning state
  • So MAX must do some work to estimate all the
    possible moves (to a certain depth) from the
    current position and plan the best way forward
    so that it will win
  • often impractical
  • time and space limitations

4
Nodes are discovered using DFS. Once leaf nodes
are discovered they are scored. Here Max is
building a tree of possibilities. Which way should
he play when the tree is finished?
(Tree diagram: alternating levels of Max's possible
moves and Min's possible moves, with example leaf
values 4, 7, 9.)
5
Max-Min Example
(Game tree diagram: terminal nodes with utility
values 4 7 9 6 9 8 8 5 6 7 5 2 3 2 5 4 9 3.)
  • terminal nodes' values are calculated from the
    utility function

6
MiniMax Example
(Tree diagram: each green Min node is labelled with
the minimum of its children's leaf values; leaf
values as on the previous slide.)
  • other nodes' values are calculated via the
    minimax algorithm
  • here the green nodes pick the minimum value from
    the nodes underneath

7
MiniMax Example
(Tree diagram: the red Max-level nodes take the
maximum of the Min values beneath them, giving
7 6 5 5 6 4; lower levels as before.)
  • other nodes' values are calculated via the
    minimax algorithm
  • here the red nodes pick the maximum value from
    the nodes underneath

8
MiniMax Example
(Tree diagram: the next Min level is computed from
the Max values below it, giving Min-level values
5, 3, 4; lower levels as before.)
  • other nodes' values are calculated via the
    minimax algorithm

9
MiniMax Example
(Tree diagram: the root Max node takes the maximum
of the Min values 5, 3, 4, giving the root value 5.)
10
MiniMax Example
(Tree diagram: the complete tree again, root value 5,
showing the chosen moves by Max and the countermoves
by Min.)
  • moves by Max and countermoves by Min
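
The backed-up values in this walkthrough can be computed with a short
recursive routine. Below is a minimal minimax sketch (not the slides'
own code); the nested-list tree representation and the small example
tree are assumptions made purely for illustration.

# Minimal minimax sketch. A tree is assumed to be nested lists whose
# leaves are integers already scored by the utility function.
def minimax(node, maximizing):
    """Return the backed-up minimax value of `node`."""
    if isinstance(node, int):          # leaf: utility value
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

# Hypothetical Max root over three Min nodes, in the style of the slides.
tree = [[7, 6, 5], [3, 2, 4], [6, 5, 9]]
print(minimax(tree, maximizing=True))   # -> 5

Applied to the full 18-leaf tree of the slides, the same recursion backs
up the root value 5 shown above.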

11
MiniMax Observations
  • the values of some of the leaf nodes are
    irrelevant for decisions at the next level
  • this also holds for decisions at higher levels
  • as a consequence, under certain circumstances,
    some parts of the tree can be disregarded
  • it is possible to still make an optimal decision
    without considering those parts

12
What is pruning?
  • you don't have to look at every node in the tree
  • pruning discards parts of the search tree
  • that are guaranteed not to contain good moves
  • results in substantial time and space savings
  • as a consequence, longer sequences of moves can
    be explored
  • the leftover part of the task may still be
    exponential, however

13
Alpha-Beta Pruning
  • extension of the minimax approach
  • results in the same move as minimax, but with
    less overhead
  • prunes uninteresting parts of the search tree
  • certain moves are not considered, because they
    won't result in a better evaluation value than a
    move further up in the tree
  • they would lead to a less desirable outcome
  • applies to moves by both players
  • α indicates the best choice for MAX so far; it
    never decreases
  • β indicates the best choice for MIN so far; it
    never increases
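
A minimal sketch of how the α and β values described above can be
carried along a depth-first search, assuming the same nested-list tree
representation as the earlier minimax sketch; this is an illustration,
not the slides' own pseudocode.

import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Minimax value of `node`, skipping branches that cannot matter."""
    if isinstance(node, int):                  # leaf utility value
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)   # best choice for Max so far; never decreases
            if alpha >= beta:           # Min above will never allow this branch
                break                   # beta cut-off
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)     # best choice for Min so far; never increases
            if alpha >= beta:           # Max above already has something better
                break                   # alpha cut-off
        return value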

14
Note
  • For the following example remember
  • Nodes are found with DFS
  • As a terminal or leaf node (or a temporary
    terminal node at a certain depth) is found, a
    utility function gives it a score
  • So do we need to evaluate F and G once we find E?
    Can we prune the tree?

(Diagram: a node whose first child E has been scored
5; its siblings F and G are not yet evaluated.)
15
Alpha-Beta Example 1
(Diagram: Max root and its first Min child, each
with local value range (-∞, +∞).)
Global values: α (best choice for Max) = ?;
β (best choice for Min) = ?
  • Step 1 --- we expand the tree a little
  • we assume a depth-first, left-to-right search as
    the basic strategy
  • the range (-∞, +∞) of the possible values for
    each node is indicated
  • initially the local values (-∞, +∞) reflect the
    values of the sub-trees in that node from Max's
    or Min's perspective
  • since we haven't expanded anything yet, they are
    infinite
  • the global values α and β are the best overall
    choices so far for Max or Min

16
Alpha-Beta Example 2
(Diagram: Max root with range (-∞, +∞); the first
Min node now has range (-∞, 7) after seeing the
leaf value 7.)
α (best choice for Max) = ?; β (best choice for Min) = 7
At a Min node: if the value of the node < the value
of the parent, then abandon. NO, because there is
no α yet.
  • We evaluate a node
  • Min obtains the first value from a successor node

17
Alpha-Beta Example 3
(Diagram: the first Min node now has range (-∞, 6)
after seeing the leaves 7 and 6.)
α (best choice for Max) = ?; β (best choice for Min) = 6
At a Min node: if the value of the node < the value
of the parent, then abandon. NO.
  • Min obtains the second value from a successor node

18
Alpha-Beta Example 4
(Diagram: the first Min node's value is exactly 5
after its leaves 7, 6, 5; the Max root's range is
now (5, +∞).)
α (best choice for Max) = 5; β (best choice for Min) = 5
At a Min node: if the value of the node < the value
of the parent, then abandon. No more nodes.
  • Min obtains the third value from a successor node
  • this is the last value from this sub-tree, and
    the exact value is known
  • Min is finished on this branch
  • Max now has a value for its first successor node,
    but hopes that something better might still come

19
Alpha-Beta Example 5
(Diagram: Max root range (5, +∞); the second Min
node has range (-∞, 3) after its first leaf, 3.)
α (best choice for Max) = 5; β (best choice for Min) = 3
At a Min node: if the value of the node < the value
of the parent, then abandon. YES!
  • Min continues with the next sub-tree, and gets a
    better value
  • Max has a better choice from its perspective (the
    5) and should not consider a move into the
    sub-tree currently being explored by Min

20
Alpha-Beta Example 6
(Diagram: same position; the remaining leaves of the
second Min sub-tree are pruned.)
α (best choice for Max) = 5; β (best choice for Min) = 3
At a Min node: if the value of the node < the value
of the parent, then abandon. YES!
  • Max won't consider a move into this sub-tree; it
    would choose the 5 first => abandon expansion
  • this is a case of pruning, indicated in the
    diagram
  • pruning means that we won't do a DFS into these
    nodes
  • no need to use the utility function to calculate
    the nodes' values
  • no need to explore that part of the tree further

21
Alpha-Beta Example 7
(Diagram: the third Min node has range (-∞, 6)
after its first leaf, 6.)
α (best choice for Max) = 5; β (best choice for Min) = 3
At a Min node: if the value of the node < the value
of the parent, then abandon. NO.
  • Min explores the next sub-tree, and finds a value
    that is worse than the other nodes at this level
  • Min knows this 6 is good for Max and will then
    evaluate the leaf nodes looking for a lower value
    (if it exists)
  • if Min is not able to find something lower, then
    Max will choose this branch

22
Alpha-Beta Example 8
(Diagram: the third Min node has range (-∞, 5)
after its second leaf, 5.)
α (best choice for Max) = 5; β (best choice for Min) = 3
At a Min node: if the value of the node < the value
of the parent, then abandon. YES!
  • Min is lucky, and finds a value that is the same
    as the current worst value at this level
  • Max can choose this branch, or the other branch
    with the same value

23
Alpha-Beta Example 9
(Diagram: the Max root's value is now 5.)
α (best choice for Max) = 5; β (best choice for Min) = 3
  • Min could continue searching this sub-tree to see
    if there is a value that is less than the current
    worst alternative in order to give Max as few
    choices as possible
  • this depends on the specific implementation
  • Max knows the best value for its sub-tree

24
Alpha-Beta Example Overview
(Diagram: root Max value 5; the Min sub-trees
evaluate to 5, < 5, and < 3.)
α (best choice for Max) = 5; β (best choice for Min) = 3
  • some branches can be pruned because they would
    never be considered
  • after looking at one branch, Max already knows
    that they will not be of interest since Min would
    choose a value that is less than what Max already
    has at its disposal
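
Feeding a tree shaped like this example into the alpha-beta sketch given
earlier reproduces the pruning. The leaves that were pruned in the
example were never shown, so the 9s below are placeholders, not values
from the slides.

# Leaves (7, 6, 5), (3, ?, ?) and (6, 5, ?) as in the example above;
# 9 stands in for the never-evaluated leaves.
tree = [[7, 6, 5], [3, 9, 9], [6, 5, 9]]
print(alphabeta(tree, maximizing=True))   # -> 5; the placeholder leaves are never visited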

25
Properties of Alpha-Beta Pruning
  • in the ideal case, the best successor node is
    examined first
  • alpha-beta can look ahead twice as far as minimax
  • assumes an idealized tree model
  • uniform branching factor, path length
  • random distribution of leaf evaluation values
  • requires additional information for good players
  • game-specific background knowledge
  • empirical data

26
Imperfect Decisions
  • Does this mean I have to expand the tree all the
    way?
  • complete search is impractical for most games
  • alternative: search the tree only to a certain
    depth
  • requires a cutoff-test to determine where to stop

27
Evaluation Function
  • If I stop part of the way down, how do I score
    these nodes (which are not terminal nodes)?
  • use an evaluation function to score these nodes
  • must be consistent with the utility function
  • values for terminal nodes (or at least their
    order) must be the same
  • tradeoff between accuracy and time cost
  • without time limits, minimax could be used
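
One way to combine the cutoff test with an evaluation function is
sketched below; `evaluate`, `successors`, and `is_terminal` are assumed
game-specific hooks, not anything defined in the slides.

def depth_limited_minimax(state, depth, maximizing,
                          evaluate, successors, is_terminal):
    """Minimax that stops at `depth` and scores the frontier with
    `evaluate`; all three hooks are game-specific and assumed here."""
    if is_terminal(state) or depth == 0:   # cutoff test
        return evaluate(state)             # evaluation function stands in for the utility
    values = [depth_limited_minimax(s, depth - 1, not maximizing,
                                    evaluate, successors, is_terminal)
              for s in successors(state)]
    return max(values) if maximizing else min(values)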

28
Example Tic-Tac-Toe
  • simple evaluation function
  • E(s) = (rx + cx + dx) - (ro + co + do)
  • where r, c, d are the number of rows, columns,
    and diagonal lines still available for a win for
    X (or O) in that state
  • x and o are the pieces of the two players
  • 1-ply lookahead
  • start at the top of the tree
  • evaluate all 9 choices for player 1
  • pick the maximum E-value
  • 2-ply lookahead
  • also look at the opponent's possible moves
  • assuming that the opponent picks the minimum
    E-value
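
The E(s) above can be computed by counting open lines. A minimal sketch,
assuming the board is a 9-character string in row-major order (that
representation is this sketch's choice, not the slides'):

# E(s) = (rx + cx + dx) - (ro + co + do), counting lines not yet blocked.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def open_lines(board, player):
    """Number of rows/columns/diagonals still winnable for `player`."""
    opponent = 'O' if player == 'X' else 'X'
    return sum(1 for line in LINES
               if all(board[i] != opponent for i in line))

def E(board):
    return open_lines(board, 'X') - open_lines(board, 'O')

print(E('X        '))   # corner X: 8 - 5 = 3
print(E('    X    '))   # centre X: 8 - 4 = 4

The two printed values match E(s11) = 3 and E(s15) = 4 on the following
slides.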

29
Tic-Tac-Toe 1-Ply
E(s0) = Max[E(s11), ..., E(s1n)] = Max[2, 3, 4] = 4
E(s11) = 8 - 5 = 3   E(s12) = 8 - 6 = 2   E(s13) = 8 - 5 = 3
E(s14) = 8 - 6 = 2   E(s15) = 8 - 4 = 4   E(s16) = 8 - 6 = 2
E(s17) = 8 - 5 = 3   E(s18) = 8 - 6 = 2   E(s19) = 8 - 5 = 3
(Diagram: the nine boards, one for each possible first move by X.)
E(s11) = E of the state at depth-level 1, number 1
Simple evaluation function: E(s11) = (rx + cx + dx) - (ro + co + do)
E(s11): X has (3 rows + 3 cols + 2 diags) = 8; O has (2r + 2c + 1d) = 5
E(s11) = 8 - 5 = 3
30
Tic-Tac-Toe 2-Ply
E(s0) = Max[E(s11), ..., E(s1n)] = Max[2, 3, 4] = 4
Level 1 (X's nine possible first moves):
E(s11) = 8 - 5 = 3   E(s12) = 8 - 6 = 2   E(s13) = 8 - 5 = 3
E(s14) = 8 - 6 = 2   E(s15) = 8 - 4 = 4   E(s16) = 8 - 6 = 2
E(s17) = 8 - 5 = 3   E(s18) = 8 - 6 = 2   E(s19) = 8 - 5 = 3
Level 2 (O's replies in the expanded branches):
E(s21) = 6 - 5 = 1   E(s22) = 5 - 5 = 0   E(s23) = 6 - 5 = 1
E(s24) = 4 - 5 = -1  E(s25) = 6 - 5 = 1   E(s26) = 5 - 5 = 0
E(s27) = 6 - 5 = 1   E(s28) = 5 - 5 = 0
E(s29) = E(s210) = ... = E(s216) = 5 - 6 = -1
E(s241) = 5 - 4 = 1  E(s242) = 6 - 4 = 2  E(s243) = 5 - 4 = 1
E(s244) = 6 - 4 = 2  E(s245) = 6 - 4 = 2  E(s246) = 5 - 4 = 1
E(s247) = 6 - 4 = 2  E(s248) = 5 - 4 = 1
(Diagram: the corresponding tic-tac-toe boards.)
Note: for 2-ply we must expand all nodes at the first
level, but as scores of 2 and 3 repeat a few times we
only expand one branch for each.
31
Tic-Tac-Toe 2-Ply
(The same 2-ply tree and E values as on the previous
slide.)
It seems that the centre X (i.e. a move to E(s15)) is
the best, because the minimum value among its
successor nodes is 1 (as opposed to 0 or, worse, -1).
32
Checkers Case Study
  • initial board configuration
  • Black: single on 20, single on 21, king on 31
  • Red: single on 23, king on 22
  • evaluation function: E(s) = (5x1 + x2) - (5r1 + r2)
  • where
  • x1 = black king advantage,
  • x2 = black single advantage,
  • r1 = red king advantage,
  • r2 = red single advantage
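
A direct transcription of this evaluation function; reading x1, x2, r1,
r2 as the black and red king and single counts (which reproduces the
worked value of 6 on the following slides) is an assumption of this
sketch.

# E(s) = (5*x1 + x2) - (5*r1 + r2)
def checkers_eval(black_kings, black_singles, red_kings, red_singles):
    return (5 * black_kings + black_singles) - (5 * red_kings + red_singles)

# Position reached after MAX captures the red king:
# Black has 1 king and 2 singles, Red has 1 single left.
print(checkers_eval(1, 2, 0, 1))   # (5 + 2) - (0 + 1) = 6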

(Diagram: checkers board with the playable squares
numbered 1 to 32.)
33
Part 1: MiniMax using DFS
(Tree diagram: MAX's candidate moves 31 -> 27,
20 -> 16, 21 -> 17, 31 -> 26; MIN replies 22 -> 17;
MAX then replies 21 -> 14, which takes the red king.)
Typically you expand DFS-style to a certain depth
and then evaluate the node:
E(s) = (5x1 + x2) - (5r1 + r2) = (5 + 2) - (0 + 1) = 6
34
We fast-forward and see the full tree expanded to a
certain depth.
(Tree diagram: alternating MAX / MIN / MAX levels
with all candidate move sequences, e.g. 31 -> 27,
20 -> 16, 21 -> 17, 31 -> 26 for MAX and 22 -> 17,
22 -> 18, 23 -> 26, 23 -> 27, ... for MIN.)
35
Then we score the leaf nodes.
(Tree diagram: the same tree with evaluation-function
scores attached to the leaf nodes.)
36
Then we propagate the Min-Max scores upwards.
(Tree diagram: the same tree with backed-up MiniMax
values at the interior MAX and MIN nodes.)
37
Could we make this a little easier?
  • this tree will really expand some considerable
    distance
  • what if we introduce a little pruning?
  • OK then, but first we have to go back to the
    start

38
Checkers MiniMax with Alpha-Beta Pruning
Temporarily propagate values up.
(Tree diagram: MAX's candidate moves 31 -> 27,
20 -> 16, 21 -> 17, 31 -> 26; after one MAX move,
MIN's reply 22 -> 17 followed by MAX's 21 -> 14
gives the value 6, which is temporarily propagated
up to the MIN node.)
39
Checkers Alpha-Beta Example
(Tree diagram: MAX's moves 31 -> 27, 20 -> 16,
21 -> 17, 31 -> 26; MIN's replies 22 -> 17 and
22 -> 18; the next node is expanded, with MAX
replies such as 21 -> 14, 31 -> 27, 16 -> 11.)
At a Max node: if the value of the node > the value
of the parent, then abandon. No! No! No! ... No!
Alas, no pruning below this Max node!
40
Checkers Alpha-Beta Example
(Diagram: the numbered board again, and the tree
with MIN's replies 22 -> 17, 22 -> 18, 22 -> 25 and
MAX replies 16 -> 11, 31 -> 27, 21 -> 14. Notice the
new value propagated up!)
At a Max node: if the value of the node > the value
of the parent, then abandon. No! So it's in the
interest of the Max node to see if it can do better.
No! etc. No! etc. No! etc. No!
41
Checkers Alpha-Beta Example
(Diagram: the numbered board and the same tree;
MIN's replies 22 -> 17, 22 -> 18, 22 -> 25 with MAX
replies 16 -> 11, 31 -> 27, 21 -> 14.)
It's not in MAX's interest to bother expanding these
nodes, as they have the same value as 16 -> 11, so
we prune them.
42
Checkers Alpha-Beta Example
(Diagram: the numbered board and the tree; MIN's
replies now include 22 -> 26, 22 -> 17, 22 -> 18,
22 -> 25, with MAX replies 16 -> 11, 31 -> 22,
31 -> 27, 21 -> 14; a backed-up value of 1 is shown
at the MIN level.)
43
Checkers Alpha-Beta Example
(Tree diagram: MAX's moves 31 -> 27, 20 -> 16,
21 -> 17, 31 -> 26; MIN's replies 22 -> 26,
22 -> 17, 22 -> 18, 22 -> 25, 23 -> 26; MAX replies
16 -> 11, 31 -> 22, 31 -> 27, 21 -> 14; backed-up
value 1 at the MIN level.)
44
Search Limits
  • search must be cut off because of time or space
    limitations
  • strategies like depth-limited or iterative
    deepening search can be used
  • don't take advantage of knowledge about the
    problem
  • more refined strategies apply background
    knowledge
  • quiescent search
  • cut off only parts of the search space that don't
    exhibit big changes in the evaluation function
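
A rough sketch of the quiescent-search idea (not from the slides): keep
searching past the depth cutoff while the position is still "noisy";
`is_quiet` and the other hooks are assumed, and a real program would
also restrict the extended search to forcing moves such as captures.

def quiescent_minimax(state, depth, maximizing,
                      evaluate, successors, is_terminal, is_quiet):
    """Depth-limited minimax that cuts off only in quiet positions,
    i.e. where the evaluation function is not about to change sharply.
    All hooks are game-specific and assumed for this sketch."""
    if is_terminal(state):
        return evaluate(state)
    if depth <= 0 and is_quiet(state):   # cut off only where the evaluation is stable
        return evaluate(state)
    # In practice, once depth <= 0 only forcing moves (e.g. captures)
    # would be generated here, to keep the extension finite.
    values = [quiescent_minimax(s, depth - 1, not maximizing,
                                evaluate, successors, is_terminal, is_quiet)
              for s in successors(state)]
    return max(values) if maximizing else min(values)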

45
Games and Computers
  • state of the art for some game programs
  • Chess
  • Checkers
  • Othello
  • Backgammon
  • Go

46
Chess
  • Deep Blue, a special-purpose parallel computer,
    defeated the world champion Garry Kasparov in 1997
  • the human player didn't show his best game
  • there are some claims that the circumstances were
    questionable
  • Deep Blue used a massive database of games from
    the literature
  • Fritz, a program running on an ordinary PC,
    played the world champion Vladimir Kramnik to an
    eight-game draw in 2002
  • top programs and top human players are roughly
    equal

47
Checkers
  • Arthur Samuel developed a checkers program in the
    1950s that learned its own evaluation function
  • it reached an expert level in the 1960s
  • Chinook became world champion in 1994
  • its human opponent, Dr. Marion Tinsley, withdrew
    for health reasons
  • Tinsley had been the world champion for 40 years
  • Chinook used off-the-shelf hardware, alpha-beta
    search, and an endgame database for six-piece
    positions

48
Othello
  • Logistello defeated the human world champion in
    1997
  • many programs play far better than humans
  • smaller search space than chess
  • little evaluation expertise available

49
Backgammon
  • TD-Gammon, a neural-network-based program, ranks
    among the best players in the world
  • it improves its own evaluation function through
    learning techniques
  • purely search-based methods are practically
    hopeless
  • chance elements, large branching factor