Adaptive Agents for Modern Strategy Games: an Approach Based on Reinforcement Learning

Transcript and Presenter's Notes
1
Adaptive Agents for Modern Strategy Games: an Approach Based on Reinforcement Learning
Doctoral thesis, Université Paris 6
  • Charles A. G. MADEIRA
  • Supervisor: Vincent CORRUBLE,
  • Under the direction of Jean-Gabriel GANASCIA

2
Modern strategy games
Act of War (Eugen Systems / Atari)
Age of Empires (Ensemble Studios / Microsoft)
Imperial Glory (Pyro Studios / Eidos Interactive)
Battleground (John Tiller Games / Talonsoft)
3
Background
  • Artificial Intelligence allows players to play when
  • Human opponents are not available → artificial opponents
  • Players would like to control only some units → assistants
  • AI must propose realistic behaviors in order to entertain players [Nareyek 2002, 2004]
  • Adapting behavior to new situations
  • Suggesting new experiences to players
  • However, industrial development is dominated by rule-based systems [Rabin 2002, 2003, 2006]
  • Fixed reasoning and hard-coded behavior
  • Experienced players can easily detect the adopted strategy

4
Alternative solutions for modern strategy games
  • Machine learning
  • It can automatically learn believable strategies
  • This is a worthy alternative for problems where
  • believable strategies are unknown or hard to code
  • the environment is stochastic
  • Online learning is well suited to games
  • There is no supervisor
  • It follows the principle of trial and error
  • It is based on expected rewards

5
Reinforcement learning (RL) [Samuel 1959; Sutton and Barto 1998]
  • The agent learns to converge to an optimal strategy by interacting with its environment
  • Sequential decision-making
  • Stochastic and unknown environment
  • Great practical results have been obtained on complex problems
  • TD-Gammon became the best Backgammon player in the world [Tesauro 2002]
  • But we are interested in much more complex problems
  • Our case study is a commercial wargame named Battleground
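To make the trial-and-error principle concrete, here is a minimal sketch of tabular Q-learning, the family of algorithms this work builds on; the env interface (reset/step) and all hyperparameter values are illustrative assumptions, not details taken from the thesis.

# Minimal sketch of tabular Q-learning (trial-and-error learning from rewards).
# The `env` interface (reset/step) and all hyperparameters are illustrative assumptions.
import random
from collections import defaultdict

def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)                      # Q[(state, action)] -> expected return
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy exploration: occasionally try a random action.
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s_next, r, done = env.step(a)       # interact with the (stochastic) environment
            # Temporal-difference update toward the reward plus discounted best future value.
            best_next = max(Q[(s_next, x)] for x in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q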

6
Case study: Battleground
[Annotated screenshot of a rendered Battleground scene: supply wagon, French cavalry, French infantry, French artillery, Russian skirmisher, Russian infantry, embankment, forest, and the objectives to conquer or defend.]
7
Difficulties in applying RL to modern games
[Illustration: a Backgammon board next to a Battleground map (John Tiller Games / Talonsoft).]
  • How can an environment state be represented when the complete game situation cannot be used directly?
  • How can consistent actions be chosen for a set of agents?

8
Research direction
  • Multiagent distributed systems [Weiss 2000]
  • Capacities of perception and decision-making are distributed
  • However, classical RL methods are not well suited to this approach
  • Each agent observes a non-Markovian environment
  • Convergence of RL algorithms is not guaranteed
  • A collective effort of all agents is required in order to design coherent and optimal solutions
  • Multiagent coordination problem [Malone and Crowston 1994; Boutilier 1996]

9
Reinforcement learning of coordination
  • Approaches organized into three main groups
  • Multiagent Markov Decision Processes [Littman 1994, 2001; Uther and Veloso 1997; Hu and Wellman 1998, 2003; Claus and Boutilier 1998]
  • Emergent coordination [Crites and Barto 1998; Sen and Weiss 2000; Wolpert and Tumer 1999; Riedmiller and Merke 2002]
  • Knowledge-based coordination [Dietterich 2000; Barto and Mahadevan 2003; Boutilier et al. 2000; Guestrin et al. 2003]
10
STRADA approach [Madeira et al. 2004, 2005, 2006]
  • Decomposing the decision-making process
  • Modern strategy games are organized into groups
  • Hierarchical structure of decision-making
  • Representing state and action spaces adequately
  • Modern strategy games use a geographical map
  • Adapting granularity with an automatic terrain-analysis algorithm
  • Terrain analysis [Rabin 2003; Grindle et al. 2004]
  • Qualitative spatial reasoning [Cohn and Hazarika 2001]
  • Generalizing behavior strategies
  • Function approximators
  • Defining interesting learning scenarios
  • Learning level by level of the hierarchy
  • Playing initially against an opponent other than oneself

11
STRADA applied to Battleground
  • Decomposing the decision-making process
  • Modern strategy games are organized into groups
  • Hierarchical structure of decision-making
  • Representing state and action spaces adequately
  • Modern strategy games use a geographical map
  • Adapting granularity with an automatic terrain-analysis algorithm
  • Generalizing behavior strategies
  • Function approximators
  • Defining interesting learning scenarios
  • Learning level by level of the hierarchy
  • Playing initially against an opponent other than oneself

12
Hierarchical structure of command and control
[Diagram of the chain of command: the Leader of Army issues long-term objectives (strategy) as orders to the Leaders of Corps, who pass refined orders down to the Leaders of Division and Leaders of Brigade, while situation reports flow back up the hierarchy. At the bottom, the front-line units (battalions of infantry, regiments of cavalry, batteries of artillery) execute specific actions (tactics) and perceive the environment.]
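The command hierarchy can be viewed as a tree of decision-making agents, each refining the order it receives into orders for its subordinates while situation reports flow upward. The following minimal sketch illustrates that structure only; the class name Commander and the policy callable are assumptions made for illustration, not the thesis implementation.

# Minimal sketch of a hierarchical command structure: orders flow down, situations flow up.
# Class names and the `policy` callable are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Commander:
    name: str
    policy: Callable[[dict, str], str]          # maps (situation, order from above) -> order
    subordinates: List["Commander"] = field(default_factory=list)

    def command(self, situation: dict, order_from_above: str) -> None:
        # Each level refines the order it received into orders for its subordinates.
        my_order = self.policy(situation, order_from_above)
        for sub in self.subordinates:
            sub.command(situation, my_order)

# Example wiring: Army -> Corps -> Division -> Brigade.
default_policy = lambda situation, order: order   # placeholder policy: pass the order through
brigade = Commander("Leader of Brigade", default_policy)
division = Commander("Leader of Division", default_policy, [brigade])
corps = Commander("Leader of Corps", default_policy, [division])
army = Commander("Leader of Army", default_policy, [corps])
army.command({"turn": 1}, "attack objective 1")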
13
STRADA applied to Battleground
  • Decomposing the decision-making process
  • Modern strategy games are organized into groups
  • Hierarchical structure of decision-making
  • Representing state and action spaces adequately
  • Modern strategy games use a geographical map
  • Adapting granularity with an automatic terrain-analysis algorithm
  • Generalizing behavior strategies
  • Function approximators
  • Defining interesting learning scenarios
  • Learning level by level of the hierarchy
  • Playing initially against an opponent other than oneself

14
Abstraction of action space
  • Definition of high-level orders
  • Extreme attack, attack, wait, defend, extreme
    defend
  • Identification of key locations on the map
  • Strategic action space A
  • A = high-level orders × key locations (see the sketch below)

[Illustration: the raw action space, on the order of 10^180 joint actions, is abstracted to 33 strategic actions.]
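A minimal sketch of how a strategic action space of this kind could be enumerated as the Cartesian product of high-level orders and key locations; the key-location list is a made-up example rather than the actual output of the terrain-analysis algorithm.

# Sketch: strategic action space A = high-level orders x key locations.
# The key-location list is a made-up example, not actual terrain-analysis output.
from itertools import product

HIGH_LEVEL_ORDERS = ["extreme_attack", "attack", "wait", "defend", "extreme_defend"]
key_locations = ["objective_1", "objective_2", "hill_3"]   # would come from terrain analysis

strategic_actions = [(order, loc) for order, loc in product(HIGH_LEVEL_ORDERS, key_locations)]
print(len(strategic_actions))   # 5 orders x 3 locations = 15 abstract actions in this toy example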
15
Abstraction of state space
  • Situation of a group of units for the 1st level
    of the hierarchy
  • Center of mass, strength, fatigue, quality,
    movement allowance
  • Situation of units placed on zones for the 1st
    level of the hierarchy
  • Strength and fatigue by side
  • Identification of strategic zones on the map
  • Environment state space S
  • S = center of mass × strength × fatigue × quality × movement allowance × strength by zone and side × fatigue by zone and side (see the sketch below)

[Map figure: six strategic zones, numbered 1 to 6; the raw state space, on the order of 10^2000 states, is abstracted to roughly 10^82 states.]
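A sketch of how such an abstract state vector might be assembled for one group of units at the 1st level of the hierarchy; the field names, the per-zone summary and the side labels are illustrative assumptions.

# Sketch: building the abstract state for the 1st level of the hierarchy.
# Field names, the zone summary and the side labels are illustrative assumptions.
def abstract_state(group, zones):
    state = [
        group["center_of_mass_zone"],      # which strategic zone the group occupies
        group["strength"],                 # aggregated combat strength
        group["fatigue"],
        group["quality"],
        group["movement_allowance"],
    ]
    # Per-zone, per-side summaries replace the exact position of every single unit.
    for zone in zones:
        for side in ("french", "russian"):
            state.append(zone["strength"][side])
            state.append(zone["fatigue"][side])
    return state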
16
Abstraction of state space
  • Situation of a group of units for the 2nd level
    of the hierarchy
  • Order given, center of mass, strength, fatigue,
    quality, movement allowance
  • Situation of units placed on zones for the 2nd
    level of the hierarchy
  • Strength and fatigue by side
  • Identification of strategic zones on the map
  • Environment state space S
  • S = order given × center of mass × strength × fatigue × quality × movement allowance × strength by zone and side × fatigue by zone and side

[Map figure: the six 1st-level strategic zones (1-6) are refined into finer zones labelled A to M for the 2nd level of the hierarchy.]
17
STRADA applied to Battleground
  • Decomposing the decision-making process
  • Modern strategy games are organized into groups
  • Hierarchical structure of decision-making
  • Representing state and action spaces adequately
  • Modern strategy games use a geographical map
  • Adapting granularity with an automatic terrain-analysis algorithm
  • Generalizing behavior strategies
  • Function approximators
  • Defining interesting learning scenarios
  • Learning level by level of the hierarchy
  • Playing initially against an opponent other than oneself

18
Function approximators
  • (1) A single neural network with one output Q(s,a) per action (sketched below)
  • (2) One neural network per action, each computing a single Q(s,a)
  • (3) A CMAC [Albus 1975]

[Diagram: in architecture (1), the state s feeds one network that outputs Q(s,a1), Q(s,a2), ..., Q(s,an); in architecture (2), s feeds n separate networks, each outputting a single Q(s,ai); in architecture (3), a CMAC maps s to the Q-values.]
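As a concrete illustration of option (1), the sketch below uses a single linear approximator (standing in for the neural network, for brevity) that maps a state feature vector to one Q-value per action and is adjusted with a temporal-difference update; the class name and hyperparameters are assumptions.

# Sketch of architecture (1): one approximator, one output Q(s, a_i) per action.
# A linear approximator stands in for the neural network; hyperparameters are illustrative.
import numpy as np

class LinearQ:
    def __init__(self, n_features, n_actions, alpha=0.01, gamma=0.95):
        self.W = np.zeros((n_actions, n_features))    # one row of weights per action
        self.alpha, self.gamma = alpha, gamma

    def q_values(self, s):
        return self.W @ s                              # vector of Q(s, a_1), ..., Q(s, a_n)

    def update(self, s, a, r, s_next, done):
        # TD target: reward plus the discounted value of the best next action.
        target = r if done else r + self.gamma * np.max(self.q_values(s_next))
        td_error = target - self.q_values(s)[a]
        self.W[a] += self.alpha * td_error * s         # adjust only the chosen action's weights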
19
STRADA applied to Battleground
  • Decomposing the decision-making process
  • Modern strategy games are organized into groups
  • Hierarchical structure of decision-making
  • Representing state and action spaces adequately
  • Modern strategy games use a geographical map
  • Adapting granularity with an automatic terrain-analysis algorithm
  • Generalizing behavior strategies
  • Function approximators
  • Defining interesting learning scenarios
  • Learning level by level of the hierarchy
  • Playing initially against an opponent other than oneself

20
Bootstrap mechanism
[Diagram: an army controlled partly by the learning AI and partly by the commercial AI faces an army controlled entirely by the commercial AI. Both armies share the same command hierarchy: Leader of Army (LA), Leaders of Corps (LC), Leaders of Division (LD), Leaders of Brigade (LB) and front-line units (FLU).]
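A minimal sketch of the level-by-level bootstrapping idea: the hierarchy level currently being trained is driven by learning agents, while every other level, and the opposing army, is delegated to the built-in commercial AI. The game interface and function names are assumptions made for illustration.

# Sketch of level-by-level bootstrapping: train one hierarchy level at a time,
# letting the commercial AI drive every other level. All interfaces are illustrative.
LEVELS = ["army", "corps", "division", "brigade"]

def train_by_levels(game, learners, commercial_ai, episodes_per_level=2000):
    for trained_level in LEVELS:
        for _ in range(episodes_per_level):
            game.reset()
            while not game.finished():
                for level in LEVELS:
                    if level == trained_level:
                        learners[level].act_and_learn(game)    # RL agents at this level
                    else:
                        commercial_ai.act(game, level)         # bootstrap AI everywhere else
                game.opponent_turn(commercial_ai)              # opposing army: commercial AI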
21
Experiments with Battleground
  • Evaluating the STRADA approach with our case
    study
  • Comparing the performance of STRADA agents with that of other agent models

22
Experiments with Battleground
  • 1st phase
  • 1st level of the hierarchy
  • Global reward
  • Without communication between agents
  • 2nd phase
  • 2nd level of the hierarchy
  • Global, local and combined rewards
  • Without communication between agents
  • 3rd phase
  • 1st level of the hierarchy
  • Global and local reward
  • With simple communication between agents

[Figure: the two scenario maps (52 x 42 and 35 x 20) and the command hierarchy (LA, LC, LD, LB, FLU).]
23
Experiments with Battleground
  • 1st phase
  • 1st level of the hierarchy
  • Global reward
  • Without communication between agents
  • 2nd phase
  • 2nd level of the hierarchy
  • Global, local and combined rewards
  • Without communication between agents
  • 3rd phase
  • 1st level of the hierarchy
  • Global and local reward
  • With simple communication between agents

[Figure: the two scenario maps (52 x 42 and 35 x 20) and the command hierarchy (LA, LC, LD, LB, FLU).]
24
1st phase of experiments
  • Decision-making scheme
  • Instant global reward
  • Cumulative reward

[Diagram: four agent models (STRADA, Random, Commercial and Human) are compared in the role of Emperor Napoleon. Napoleon exchanges orders and situation reports with Marshal Davout (1st Corps), Marshal Ney (3rd Corps) and Marshal Mortier (Imperial Guard), who are controlled by the bootstrap AI, as are their subordinate units.]
25
Scenario 1 (map of 35 x 20)
[Map figure: six strategic zones (1-6); objectives marked 300, 300 and 600.]
  • State representation s (32 variables)
  • Action space A for each subordinate agent (33
    actions)

[Diagram: the function approximator maps the state s to Q(s,a1), Q(s,a2), ..., Q(s,a33).]
26
Experimental results (scenario 1)
27
Scenario 2 (map of 52 x 42)
[Map figure: eleven strategic zones (1-11); objectives marked 100, 1000, 200 and 200.]
  • State representation s (64 variables)
  • Action space A for each subordinate agent (49
    actions)

[Diagram: the function approximator maps the state s to Q(s,a1), Q(s,a2), ..., Q(s,a49).]
28
Experimental results (scenario 2)
29
Experiments with Battleground
  • 1st phase
  • 1st level of the hierarchy
  • Global reward
  • Without communication between agents
  • 2nd phase
  • 2nd level of the hierarchy
  • Global, local and combined rewards
  • Without communication between agents
  • 3rd phase
  • 1st level of the hierarchy
  • Global and local reward
  • With simple communication between agents

[Figure: the 35 x 20 map and the command hierarchy diagram.]
30
2nd phase of experiments
  • Decision-making scheme
  • Instant reward
  • Global score (1)
  • Local score
  • Objectives conquered (2)
  • Order accomplished (3)
  • Combined score (1 + 2 + 3); a sketch of these reward schemes follows at the end of this slide

[Diagram: the four agent models (STRADA, Random, Commercial and Human) are compared. Emperor Napoleon reuses the strategy learned in the 1st phase and exchanges orders and situation reports with Marshal Davout (1st Corps), Marshal Ney (3rd Corps) and Marshal Mortier (Imperial Guard); the marshals in turn exchange orders and situation reports with the division leaders of their corps (1st to Nth Division of the 1st Corps, the 3rd Corps and the Imperial Guard), and the remaining levels and subordinate units are controlled by the bootstrap AI.]
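A minimal sketch of the reward schemes compared in the 2nd phase; the score helpers on the game object and the unweighted sum used for the combined scheme are illustrative assumptions.

# Sketch: reward schemes of the 2nd phase. The score helpers on `game` and the
# unweighted sum used for the combined scheme are illustrative assumptions.
def reward(scheme, game, agent):
    r_global = game.global_score()                      # (1) score of the whole army
    r_objectives = game.objectives_conquered(agent)     # (2) local: objectives conquered
    r_order = game.order_accomplished(agent)            # (3) local: order accomplished
    if scheme == "global":
        return r_global
    if scheme == "local":
        return r_objectives + r_order
    if scheme == "combined":
        return r_global + r_objectives + r_order        # combined score (1 + 2 + 3)
    raise ValueError(f"unknown reward scheme: {scheme}")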
31
Scenario 1 (map of 35 x 20)
[Map figure: strategic zones labelled A to M; objectives marked 300, 300 and 600.]
  • State representation s (35 variables)
  • Action space A for each subordinate agent (37
    actions)

[Diagram: the function approximator maps the state s to Q(s,a1), Q(s,a2), ..., Q(s,a37).]
32
Experimental results (Scenario 1 - on attack)
33
Experiments with Battleground
  • 1st phase
  • 1st level of the hierarchy
  • Global reward
  • Without communication between agents
  • 2nd phase
  • 2nd level of the hierarchy
  • Global, local and combined rewards
  • Without communication between agents
  • 3rd phase
  • 1st level of the hierarchy
  • Global and local reward
  • With simple communication between agents

[Figure: the 35 x 20 map and the command hierarchy (LA, LC, LD, LB, FLU).]
34
3rd phase of experiments
  • Decision-making scheme
  • Immediate reward
  • Global score
  • Local score (objectives conquered)
  • Communicating the actions executed by the partners in the previous turn (see the sketch at the end of this slide)

[Diagram: as in the 1st phase, the four agent models (STRADA, Random, Commercial and Human) are compared in the role of Emperor Napoleon, who exchanges orders and situation reports with Marshal Davout (1st Corps), Marshal Ney (3rd Corps) and Marshal Mortier (Imperial Guard); the marshals and their subordinate units are controlled by the bootstrap AI.]
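A sketch of the simple communication scheme of the 3rd phase, in which each agent's state is extended with the actions its partners executed on the previous turn; the integer encoding of actions and the partner names are illustrative assumptions.

# Sketch of the 3rd-phase communication: each agent's state vector is extended with
# the actions its partners executed on the previous turn.
# The integer action encoding and the partner names are illustrative assumptions.
def state_with_communication(own_state, partner_last_actions):
    """own_state: list of state variables; partner_last_actions: dict partner -> action id."""
    return own_state + [partner_last_actions[p] for p in sorted(partner_last_actions)]

# Usage: a base state is extended with the last actions of two partner agents.
s = state_with_communication([0.4] * 32, {"davout": 12, "ney": 7})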
35
Scenario 1 (map of 35 x 20)
[Map figure: six strategic zones (1-6); objectives marked 300, 300 and 600.]
  • State representation s (38 variables)
  • Action space A for each subordinate agent (33
    actions)

[Diagram: the function approximator maps the state s to Q(s,a1), Q(s,a2), ..., Q(s,a33).]
36
Experimental results (Scenario 1 - on attack)
37
Experimental results (Scenario 1 - on defense)
38
Conclusions of the experiments
  • Very good results have been obtained
  • STRADA outperformed the commercial system while taking only partial control of the decision-making
  • A few thousand learning episodes were enough to achieve these results
  • Global reward is the key for the 1st level of the hierarchy
  • Combined reward is required from the 2nd level of the hierarchy onwards
  • Communication allows more stable learning
  • STRADA can be developed further in order to achieve better results
  • An adequate combination of the different types of reward is required
  • An efficient strategy for coordinating agents is required

39
Conclusions
  • We proposed STRADA for the automatic design of adaptive strategies for modern strategy games
  • Hierarchical decomposition of decision-making
  • Appropriate representation of state and action spaces
  • Generalization of strategies
  • Bootstrapping of the learning process
  • The effectiveness of STRADA and the relevance of the designed representations were evaluated on Battleground
  • Several versions of the learning agents were tested
  • The generality of the approach was tested on two game scenarios
  • Three agent models were used for performance comparison
  • We obtained quite encouraging results

40
Future perspectives (applied to games)
  • Find the right combination of the different types of rewards
  • Improve the learned strategies by playing against
  • Opponent STRADA agents
  • Experienced human players
  • Adaptation of STRADA to real-time strategy games
  • Do players have more fun when playing against STRADA agents? [Demasi and Cruz 2002; Andrade et al. 2005, 2006]

41
Future perspectives (theory-oriented)
  • Full automation of the abstraction procedure
  • Representation of the action space
  • High-level orders [Corruble, Madeira and Ramalho 2002]
  • Representation of the state space
  • Variables summarizing a group situation [Blum and Langley 1997; Saitta and Zucker 2001; Li, Walsh and Littman 2006]
  • Improve coordination between agents [Guestrin, Lagoudakis and Parr 2002; Chalkiadakis and Boutilier 2003; Sigaud 2004]
  • Generalization of strategies across different game scenarios [Guestrin et al. 2003]