Adaptive Agents for Modern Strategy Games: an Approach Based on Reinforcement Learning

Transcript and Presenter's Notes
1
Adaptive Agents for Modern Strategy Games: an Approach Based on Reinforcement Learning
Doctoral thesis, Université Paris 6
  • Charles A. G. MADEIRA
  • Supervisor: Vincent CORRUBLE,
  • Under the direction of Jean-Gabriel GANASCIA

2
Modern strategy games
Act of War (Eugen Systems / Atari)
Age of Empires (Ensemble Studios / Microsoft)
Imperial Glory (Pyro Studios / Eidos Interactive)
Battleground (John Tiller Games / Talonsoft)
3
Background
  • Artificial Intelligence allows players to play when
  • Human opponents are not available → artificial opponents
  • Players would like to control only some units → assistants
  • AI must propose realistic behaviors in order to entertain players [Nareyek 2002, 2004]
  • Adapting behavior to new situations
  • Suggesting new experiences to players
  • However, industrial development is dominated by rule-based systems [Rabin 2002, 2003, 2006]
  • Fixed reasoning and hard-coded behavior
  • Experienced players can easily detect the adopted strategy

4
Alternative solutions for modern strategy games
  • Machine learning
  • It can automatically learn believable strategies
  • This is a worthy alternative for problems where
  • believable strategies are unknown or hard to code
  • the environment is stochastic
  • Online learning is well suited to games
  • There is no supervisor
  • It follows the principle of trial and error
  • It is based on expected rewards

5
Reinforcement learning (RL) [Samuel 1959; Sutton and Barto 1998]
  • The agent learns to converge to an optimal strategy by interacting with its environment
  • Sequential decision-making
  • Stochastic and unknown environment
  • Great practical results have been obtained on complex problems
  • TD-Gammon became the best Backgammon player in the world [Tesauro 2002]
  • But we are interested in much more complex problems
  • Our case study is a commercial wargame named Battleground
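To make the trial-and-error principle concrete, here is a minimal sketch of tabular Q-learning, the family of algorithms this work builds on; the env interface (reset/step) and all hyperparameter values are illustrative assumptions, not details taken from the thesis.

# Minimal sketch of tabular Q-learning (trial-and-error learning from rewards).
# The `env` interface (reset/step) and all hyperparameters are illustrative assumptions.
import random
from collections import defaultdict

def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)                      # Q[(state, action)] -> expected return
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy exploration: occasionally try a random action.
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s_next, r, done = env.step(a)       # interact with the (stochastic) environment
            # Temporal-difference update toward the reward plus discounted best future value.
            best_next = max(Q[(s_next, x)] for x in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q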

6
Case study: Battleground
[Annotated screenshot of a rendered Battleground scene: supply wagon, French cavalry, French infantry, French artillery, Russian skirmisher, Russian infantry, embankment, forest, and the objectives to conquer or defend.]
7
Difficulties in applying RL to modern games
[Illustration: a Backgammon board next to a Battleground map (John Tiller Games / Talonsoft).]
  • How can an environment state be represented when the complete game situation cannot be used directly?
  • How can consistent actions be chosen for a set of agents?

8
Research direction
  • Multiagent distributed systems [Weiss 2000]
  • Capacities of perception and decision-making are distributed
  • However, classical RL methods are not well suited to this approach
  • Each agent observes a non-Markovian environment
  • Convergence of RL algorithms is not guaranteed
  • A collective effort of all agents is required in order to design coherent and optimal solutions
  • Multiagent coordination problem [Malone and Crowston 1994; Boutilier 1996]

9
Reinforcement learning of coordination
  • Approaches organized into three main groups
  • Multiagent Markov Decision Processes [Littman 1994, 2001; Uther and Veloso 1997; Hu and Wellman 1998, 2003; Claus and Boutilier 1998]
  • Emergent coordination [Crites and Barto 1998; Sen and Weiss 2000; Wolpert and Tumer 1999; Riedmiller and Merke 2002]
  • Knowledge-based coordination [Dietterich 2000; Barto and Mahadevan 2003; Boutilier et al. 2000; Guestrin et al. 2003]
10
STRADA approach [Madeira et al. 2004, 2005, 2006]
  • Decomposing the decision-making process
  • Modern strategy games are organized into groups
  • Hierarchical structure of decision-making
  • Representing state and action spaces adequately
  • Modern strategy games use a geographical map
  • Adapting granularity with an automatic terrain-analysis algorithm
  • Terrain analysis [Rabin 2003; Grindle et al. 2004]
  • Qualitative spatial reasoning [Cohn and Hazarika 2001]
  • Generalizing behavior strategies
  • Function approximators
  • Defining interesting learning scenarios
  • Learning level by level of the hierarchy
  • Playing initially against an opponent other than oneself

11
STRADA applied to Battleground
  • Decomposing the decision-making process
  • Modern strategy games are organized into groups
  • Hierarchical structure of decision-making
  • Representing state and action spaces adequately
  • Modern strategy games use a geographical map
  • Adapting granularity with an automatic terrain-analysis algorithm
  • Generalizing behavior strategies
  • Function approximators
  • Defining interesting learning scenarios
  • Learning level by level of the hierarchy
  • Playing initially against an opponent other than oneself

12
Hierarchical structure of command and control
[Diagram of the chain of command: the Leader of Army issues long-term objectives (strategy) as orders to the Leaders of Corps, who pass refined orders down to the Leaders of Division and Leaders of Brigade, while situation reports flow back up the hierarchy. At the bottom, the front-line units (battalions of infantry, regiments of cavalry, batteries of artillery) execute specific actions (tactics) and perceive the environment.]
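The command hierarchy can be viewed as a tree of decision-making agents, each refining the order it receives into orders for its subordinates while situation reports flow upward. The following minimal sketch illustrates that structure only; the class name Commander and the policy callable are assumptions made for illustration, not the thesis implementation.

# Minimal sketch of a hierarchical command structure: orders flow down, situations flow up.
# Class names and the `policy` callable are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Commander:
    name: str
    policy: Callable[[dict, str], str]          # maps (situation, order from above) -> order
    subordinates: List["Commander"] = field(default_factory=list)

    def command(self, situation: dict, order_from_above: str) -> None:
        # Each level refines the order it received into orders for its subordinates.
        my_order = self.policy(situation, order_from_above)
        for sub in self.subordinates:
            sub.command(situation, my_order)

# Example wiring: Army -> Corps -> Division -> Brigade.
default_policy = lambda situation, order: order   # placeholder policy: pass the order through
brigade = Commander("Leader of Brigade", default_policy)
division = Commander("Leader of Division", default_policy, [brigade])
corps = Commander("Leader of Corps", default_policy, [division])
army = Commander("Leader of Army", default_policy, [corps])
army.command({"turn": 1}, "attack objective 1")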
13
STRADA applied to Battleground
  • Decomposing the decision-making process
  • Modern strategy games are organized into groups
  • Hierarchical structure of decision-making
  • Representing state and action spaces adequately
  • Modern strategy games use a geographical map
  • Adapting granularity with an automatic terrain-analysis algorithm
  • Generalizing behavior strategies
  • Function approximators
  • Defining interesting learning scenarios
  • Learning level by level of the hierarchy
  • Playing initially against an opponent other than oneself

14
Abstraction of action space
  • Definition of high-level orders
  • Extreme attack, attack, wait, defend, extreme
    defend
  • Identification of key locations on the map
  • Strategic action space A
  • A = high-level orders × key locations (see the sketch below)

[Illustration: the raw action space, on the order of 10^180 joint actions, is abstracted to 33 strategic actions.]
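A minimal sketch of how a strategic action space of this kind could be enumerated as the Cartesian product of high-level orders and key locations; the key-location list is a made-up example rather than the actual output of the terrain-analysis algorithm.

# Sketch: strategic action space A = high-level orders x key locations.
# The key-location list is a made-up example, not actual terrain-analysis output.
from itertools import product

HIGH_LEVEL_ORDERS = ["extreme_attack", "attack", "wait", "defend", "extreme_defend"]
key_locations = ["objective_1", "objective_2", "hill_3"]   # would come from terrain analysis

strategic_actions = [(order, loc) for order, loc in product(HIGH_LEVEL_ORDERS, key_locations)]
print(len(strategic_actions))   # 5 orders x 3 locations = 15 abstract actions in this toy example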
15
Abstraction of state space
  • Situation of a group of units for the 1st level
    of the hierarchy
  • Center of mass, strength, fatigue, quality,
    movement allowance
  • Situation of units placed on zones for the 1st
    level of the hierarchy
  • Strength and fatigue by side
  • Identification of strategic zones on the map
  • Environment state space S
  • S = center of mass × strength × fatigue × quality × movement allowance × strength by zone and side × fatigue by zone and side (see the sketch below)

[Map figure: six strategic zones, numbered 1 to 6; the raw state space, on the order of 10^2000 states, is abstracted to roughly 10^82 states.]
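A sketch of how such an abstract state vector might be assembled for one group of units at the 1st level of the hierarchy; the field names, the per-zone summary and the side labels are illustrative assumptions.

# Sketch: building the abstract state for the 1st level of the hierarchy.
# Field names, the zone summary and the side labels are illustrative assumptions.
def abstract_state(group, zones):
    state = [
        group["center_of_mass_zone"],      # which strategic zone the group occupies
        group["strength"],                 # aggregated combat strength
        group["fatigue"],
        group["quality"],
        group["movement_allowance"],
    ]
    # Per-zone, per-side summaries replace the exact position of every single unit.
    for zone in zones:
        for side in ("french", "russian"):
            state.append(zone["strength"][side])
            state.append(zone["fatigue"][side])
    return state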
16
Abstraction of state space
  • Situation of a group of units for the 2nd level
    of the hierarchy
  • Order given, center of mass, strength, fatigue,
    quality, movement allowance
  • Situation of units placed on zones for the 2nd
    level of the hierarchy
  • Strength and fatigue by side
  • Identification of strategic zones on the map
  • Environment state space S
  • S = order given × center of mass × strength × fatigue × quality × movement allowance × strength by zone and side × fatigue by zone and side

[Map figure: the six 1st-level strategic zones (1-6) are refined into finer zones labelled A to M for the 2nd level of the hierarchy.]
17
STRADA applied to Battleground
  • Decomposing the decision-making process
  • Modern strategy games are organized into groups
  • Hierarchical structure of decision-making
  • Representing state and action spaces adequately
  • Modern strategy games use a geographical map
  • Adapting granularity with an automatic terrain-analysis algorithm
  • Generalizing behavior strategies
  • Function approximators
  • Defining interesting learning scenarios
  • Learning level by level of the hierarchy
  • Playing initially against an opponent other than oneself

18
Function approximators
  • (1) A single neural network with one output Q(s,a) per action (sketched below)
  • (2) One neural network per action, each computing a single Q(s,a)
  • (3) A CMAC [Albus 1975]

[Diagram: in architecture (1), the state s feeds one network that outputs Q(s,a1), Q(s,a2), ..., Q(s,an); in architecture (2), s feeds n separate networks, each outputting a single Q(s,ai); in architecture (3), a CMAC maps s to the Q-values.]
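As a concrete illustration of option (1), the sketch below uses a single linear approximator (standing in for the neural network, for brevity) that maps a state feature vector to one Q-value per action and is adjusted with a temporal-difference update; the class name and hyperparameters are assumptions.

# Sketch of architecture (1): one approximator, one output Q(s, a_i) per action.
# A linear approximator stands in for the neural network; hyperparameters are illustrative.
import numpy as np

class LinearQ:
    def __init__(self, n_features, n_actions, alpha=0.01, gamma=0.95):
        self.W = np.zeros((n_actions, n_features))    # one row of weights per action
        self.alpha, self.gamma = alpha, gamma

    def q_values(self, s):
        return self.W @ s                              # vector of Q(s, a_1), ..., Q(s, a_n)

    def update(self, s, a, r, s_next, done):
        # TD target: reward plus the discounted value of the best next action.
        target = r if done else r + self.gamma * np.max(self.q_values(s_next))
        td_error = target - self.q_values(s)[a]
        self.W[a] += self.alpha * td_error * s         # adjust only the chosen action's weights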
19
STRADA applied to Battleground
  • Decomposing the decision-making process
  • Modern strategy games are organized into groups
  • Hierarchical structure of decision-making
  • Representing state and action spaces adequately
  • Modern strategy games use a geographical map
  • Adapting granularity with an automatic terrain-analysis algorithm
  • Generalizing behavior strategies
  • Function approximators
  • Defining interesting learning scenarios
  • Learning level by level of the hierarchy
  • Playing initially against an opponent other than oneself

20
Bootstrap mechanism
[Diagram: an army controlled partly by the learning AI and partly by the commercial AI faces an army controlled entirely by the commercial AI. Both armies share the same command hierarchy: Leader of Army (LA), Leaders of Corps (LC), Leaders of Division (LD), Leaders of Brigade (LB) and front-line units (FLU).]
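A minimal sketch of the level-by-level bootstrapping idea: the hierarchy level currently being trained is driven by learning agents, while every other level, and the opposing army, is delegated to the built-in commercial AI. The game interface and function names are assumptions made for illustration.

# Sketch of level-by-level bootstrapping: train one hierarchy level at a time,
# letting the commercial AI drive every other level. All interfaces are illustrative.
LEVELS = ["army", "corps", "division", "brigade"]

def train_by_levels(game, learners, commercial_ai, episodes_per_level=2000):
    for trained_level in LEVELS:
        for _ in range(episodes_per_level):
            game.reset()
            while not game.finished():
                for level in LEVELS:
                    if level == trained_level:
                        learners[level].act_and_learn(game)    # RL agents at this level
                    else:
                        commercial_ai.act(game, level)         # bootstrap AI everywhere else
                game.opponent_turn(commercial_ai)              # opposing army: commercial AI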
21
Experiments with Battleground
  • Evaluating the STRADA approach with our case
    study
  • Comparing the performance of STRADA agents with that of other agent models

22
Experiments with Battleground
  • 1st phase
  • 1st level of the hierarchy
  • Global reward
  • Without communication between agents
  • 2nd phase
  • 2nd level of the hierarchy
  • Global, local and combined rewards
  • Without communication between agents
  • 3rd phase
  • 1st level of the hierarchy
  • Global and local reward
  • With simple communication between agents

[Figure: the two scenario maps (52 x 42 and 35 x 20) and the command hierarchy (LA, LC, LD, LB, FLU).]
23
Experiments with Battleground
  • 1st phase
  • 1st level of the hierarchy
  • Global reward
  • Without communication between agents
  • 2nd phase
  • 2nd level of the hierarchy
  • Global, local and combined rewards
  • Without communication between agents
  • 3rd phase
  • 1st level of the hierarchy
  • Global and local reward
  • With simple communication between agents

[Figure: the two scenario maps (52 x 42 and 35 x 20) and the command hierarchy (LA, LC, LD, LB, FLU).]
24
1st phase of experiments
  • Decision-making scheme
  • Instant global reward
  • Cumulative reward

[Diagram: four agent models (STRADA, Random, Commercial and Human) are compared in the role of Emperor Napoleon. Napoleon exchanges orders and situation reports with Marshal Davout (1st Corps), Marshal Ney (3rd Corps) and Marshal Mortier (Imperial Guard), who are controlled by the bootstrap AI, as are their subordinate units.]
25
Scenario 1 (map of 35 x 20)
[Map figure: six strategic zones (1-6); objectives marked 300, 300 and 600.]
  • State representation s (32 variables)
  • Action space A for each subordinate agent (33
    actions)

[Diagram: the function approximator maps the state s to Q(s,a1), Q(s,a2), ..., Q(s,a33).]
26
Experimental results (scenario 1)
27
Scenario 2 (map of 52 x 42)
[Map figure: eleven strategic zones (1-11); objectives marked 100, 1000, 200 and 200.]
  • State representation s (64 variables)
  • Action space A for each subordinate agent (49
    actions)

[Diagram: the function approximator maps the state s to Q(s,a1), Q(s,a2), ..., Q(s,a49).]
28
Experimental results (scenario 2)
29
Experiments with Battleground
  • 1st phase
  • 1st level of the hierarchy
  • Global reward
  • Without communication between agents
  • 2nd phase
  • 2nd level of the hierarchy
  • Global, local and combined rewards
  • Without communication between agents
  • 3rd phase
  • 1st level of the hierarchy
  • Global and local reward
  • With simple communication between agents

[Figure: the 35 x 20 map and the command hierarchy diagram.]
30
2nd phase of experiments
  • Decision-making scheme
  • Instant reward
  • Global score (1)
  • Local score
  • Objectives conquered (2)
  • Order accomplished (3)
  • Combined score (1 + 2 + 3); a sketch of these reward schemes follows at the end of this slide

[Diagram: the four agent models (STRADA, Random, Commercial and Human) are compared. Emperor Napoleon reuses the strategy learned in the 1st phase and exchanges orders and situation reports with Marshal Davout (1st Corps), Marshal Ney (3rd Corps) and Marshal Mortier (Imperial Guard); the marshals in turn exchange orders and situation reports with the division leaders of their corps (1st to Nth Division of the 1st Corps, the 3rd Corps and the Imperial Guard), and the remaining levels and subordinate units are controlled by the bootstrap AI.]
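A minimal sketch of the reward schemes compared in the 2nd phase; the score helpers on the game object and the unweighted sum used for the combined scheme are illustrative assumptions.

# Sketch: reward schemes of the 2nd phase. The score helpers on `game` and the
# unweighted sum used for the combined scheme are illustrative assumptions.
def reward(scheme, game, agent):
    r_global = game.global_score()                      # (1) score of the whole army
    r_objectives = game.objectives_conquered(agent)     # (2) local: objectives conquered
    r_order = game.order_accomplished(agent)            # (3) local: order accomplished
    if scheme == "global":
        return r_global
    if scheme == "local":
        return r_objectives + r_order
    if scheme == "combined":
        return r_global + r_objectives + r_order        # combined score (1 + 2 + 3)
    raise ValueError(f"unknown reward scheme: {scheme}")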
31
Scenario 1 (map of 35 x 20)
[Map figure: strategic zones labelled A to M; objectives marked 300, 300 and 600.]
  • State representation s (35 variables)
  • Action space A for each subordinate agent (37
    actions)

[Diagram: the function approximator maps the state s to Q(s,a1), Q(s,a2), ..., Q(s,a37).]
32
Experimental results (Scenario 1 - on attack)
33
Experiments with Battleground
  • 1st phase
  • 1st level of the hierarchy
  • Global reward
  • Without communication between agents
  • 2nd phase
  • 2nd level of the hierarchy
  • Global, local and combined rewards
  • Without communication between agents
  • 3rd phase
  • 1st level of the hierarchy
  • Global and local reward
  • With simple communication between agents

[Figure: the 35 x 20 map and the command hierarchy (LA, LC, LD, LB, FLU).]
34
3rd phase of experiments
  • Decision-making scheme
  • Immediate reward
  • Global score
  • Local score (objectives conquered)
  • Communicating the actions executed by the partners in the previous turn (see the sketch at the end of this slide)

[Diagram: as in the 1st phase, the four agent models (STRADA, Random, Commercial and Human) are compared in the role of Emperor Napoleon, who exchanges orders and situation reports with Marshal Davout (1st Corps), Marshal Ney (3rd Corps) and Marshal Mortier (Imperial Guard); the marshals and their subordinate units are controlled by the bootstrap AI.]
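A sketch of the simple communication scheme of the 3rd phase, in which each agent's state is extended with the actions its partners executed on the previous turn; the integer encoding of actions and the partner names are illustrative assumptions.

# Sketch of the 3rd-phase communication: each agent's state vector is extended with
# the actions its partners executed on the previous turn.
# The integer action encoding and the partner names are illustrative assumptions.
def state_with_communication(own_state, partner_last_actions):
    """own_state: list of state variables; partner_last_actions: dict partner -> action id."""
    return own_state + [partner_last_actions[p] for p in sorted(partner_last_actions)]

# Usage: a base state is extended with the last actions of two partner agents.
s = state_with_communication([0.4] * 32, {"davout": 12, "ney": 7})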
35
Scenario 1 (map of 35 x 20)
[Map figure: six strategic zones (1-6); objectives marked 300, 300 and 600.]
  • State representation s (38 variables)
  • Action space A for each subordinate agent (33
    actions)

[Diagram: the function approximator maps the state s to Q(s,a1), Q(s,a2), ..., Q(s,a33).]
36
Experimental results (Scenario 1 - on attack)
37
Experimental results (Scenario 1 - on defense)
38
Conclusions of the experiments
  • Very good results have been obtained
  • STRADA outperformed the commercial system while taking only partial control of the decision-making
  • A few thousand learning episodes were enough to achieve these results
  • Global reward is the key for the 1st level of the hierarchy
  • Combined reward is required from the 2nd level of the hierarchy onwards
  • Communication allows more stable learning
  • STRADA can be developed further in order to achieve better results
  • An adequate combination of the different types of reward is required
  • An efficient strategy for coordinating agents is required

39
Conclusions
  • We proposed STRADA for the automatic design of adaptive strategies for modern strategy games
  • Hierarchical decomposition of decision-making
  • Appropriate representation of state and action spaces
  • Generalization of strategies
  • Bootstrapping of the learning process
  • The effectiveness of STRADA and the relevance of the designed representations were evaluated on Battleground
  • Several versions of the learning agents were tested
  • The generality of the approach was tested on two game scenarios
  • Three agent models were used for performance comparison
  • We obtained quite encouraging results

40
Future perspectives (applied to games)
  • Find the right combination of the different types of rewards
  • Improve the learned strategies by playing against
  • Opponent STRADA agents
  • Experienced human players
  • Adaptation of STRADA to real-time strategy games
  • Do players have more fun when playing against STRADA agents? [Demasi and Cruz 2002; Andrade et al. 2005, 2006]

41
Future perspectives (theory-oriented)
  • Full automation of the abstraction procedure
  • Representation of the action space
  • High-level orders [Corruble, Madeira and Ramalho 2002]
  • Representation of the state space
  • Variables summarizing a group situation [Blum and Langley 1997; Saitta and Zucker 2001; Li, Walsh and Littman 2006]
  • Improve coordination between agents [Guestrin, Lagoudakis and Parr 2002; Chalkiadakis and Boutilier 2003; Sigaud 2004]
  • Generalization of strategies across different game scenarios [Guestrin et al. 2003]