Title: Evolving Multimodal Behavior With Modular Neural Networks in Ms. Pac-Man
1. Evolving Multimodal Behavior With Modular Neural Networks in Ms. Pac-Man
- By Jacob Schrum and Risto Miikkulainen
2. Introduction
- Challenge: Discover behavior automatically
- Simulations
- Robotics
- Video games (focus)
- Why challenging?
- Complex domains
- Multiple agents
- Multiple objectives
- Multimodal behavior required (focus)
3. Multimodal Behavior
- Animals can perform many different tasks
- Imagine learning a monolithic policy as complex as a cardinal's behavior. How?
- Problem more tractable if broken into component behaviors:
  - Flying
  - Nesting
  - Foraging
4. Multimodal Behavior in Games
- How are complex software agents designed?
- Finite state machines
- Behavior trees/lists
- Modular design leads to multimodal behavior
FSM for NPC in Quake
Behavior list for our winning BotPrize bot
5. Modular Policy
- One policy consisting of several policies/modules
- Number preset, or learned
- Means of arbitration also needed
- Human-specified, or learned via preference neurons
- Separate behaviors easily represented
- Sub-policies/modules can share components
[Figure: Multitask (Caruana 1997) and Preference Neuron architectures, each mapping shared inputs to multiple output modules]
6. Constructive Neuroevolution
- Genetic algorithms + neural networks
- Good at generating control policies
- Three basic mutations: Perturb Weight, Add Connection, Add Node
- Crossover also used
- Other structural mutations possible
- (NEAT by Stanley 2004)
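The three basic mutations listed above can be sketched on a simple dictionary-based genome. This is a minimal illustration, not the MM-NEAT or NEAT implementation; the genome layout (`nodes` as a list of ids, `connections` as dicts) is a hypothetical simplification.

```python
import random

def perturb_weight(genome, sigma=0.5):
    """Perturb Weight: add Gaussian noise to one random connection."""
    conn = random.choice(genome["connections"])
    conn["weight"] += random.gauss(0, sigma)

def add_connection(genome):
    """Add Connection: link two nodes not yet connected, random weight."""
    source = random.choice(genome["nodes"])
    target = random.choice(genome["nodes"])
    existing = {(c["source"], c["target"]) for c in genome["connections"]}
    if (source, target) not in existing:
        genome["connections"].append(
            {"source": source, "target": target,
             "weight": random.uniform(-1, 1)})

def add_node(genome):
    """Add Node: split a random connection, inserting a new node."""
    conn = random.choice(genome["connections"])
    new_node = max(genome["nodes"]) + 1
    genome["nodes"].append(new_node)
    # Incoming link gets weight 1, outgoing keeps the old weight, so the
    # new structure initially computes (nearly) the same function.
    genome["connections"].append(
        {"source": conn["source"], "target": new_node, "weight": 1.0})
    genome["connections"].append(
        {"source": new_node, "target": conn["target"],
         "weight": conn["weight"]})
    genome["connections"].remove(conn)
```

Because each mutation preserves (or barely changes) existing behavior, networks can grow structure incrementally, starting from minimal topologies.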
7. Module Mutation
- A mutation that adds a module
- Can be done in many different ways
- Can happen more than once, for multiple modules
[Figure: MM(Random) and MM(Duplicate) network diagrams]
(cf. Calabretta et al. 2000)
(Schrum and Miikkulainen 2012)
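The two module mutations can be sketched as follows. The genome layout (a `modules` list whose entries hold `(source, weight)` input pairs) is a hypothetical simplification for illustration, not the MM-NEAT representation.

```python
import random

def module_mutation(genome, duplicate=True):
    """Add a new output module to the network.

    MM(Duplicate): the new module copies an existing module's input
    sources and weights, so it initially behaves identically to it.
    MM(Random): same sources, but fresh random weights."""
    old = random.choice(genome["modules"])  # module to base the new one on
    new_id = max(m["id"] for m in genome["modules"]) + 1
    new = {"id": new_id, "inputs": []}
    for (src, w) in old["inputs"]:
        new["inputs"].append((src, w if duplicate else random.uniform(-1, 1)))
    genome["modules"].append(new)
    return new_id
```

MM(Duplicate) is the safer variant: since the copy starts out behaviorally identical, later mutations can specialize it without an immediate fitness penalty, whereas an MM(Random) module starts out behaving randomly.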
8. Ms. Pac-Man
- Domain needs multimodal behavior to succeed
- Predator/prey variant
- Pac-Man takes on both roles
- Goal: Maximize score by
  - Eating all pills in each level
  - Avoiding threatening ghosts
  - Eating ghosts (after power pill)
- Non-deterministic
- Very noisy evaluations
- Four mazes
- Behavior must generalize
[Video: Human Play]
9. Task Overlap
- Distinct behavioral modes
- Eating edible ghosts
- Clearing levels of pills
- More?
- Are ghosts currently edible?
- Possible some are and some are not
- Task division is blended
10. Previous Work in Pac-Man
- Custom Simulators
  - Genetic Programming: Koza 1992
  - Neuroevolution: Gallagher & Ledwich 2007, Burrow & Lucas 2009, Tan et al. 2011
  - Reinforcement Learning: Burrow & Lucas 2009, Subramanian et al. 2011, Bom 2013
  - Alpha-Beta Tree Search: Robles & Lucas 2009
- Screen Capture Competition: Requires Image Processing
  - Evolution & Fuzzy Logic: Handa & Isozaki 2008
  - Influence Map: Wirth & Gallagher 2008
  - Ant Colony Optimization: Emilio et al. 2010
  - Monte-Carlo Tree Search: Ikehata & Ito 2011
  - Decision Trees: Foderaro et al. 2012
- Pac-Man vs. Ghosts Competition (Pac-Man)
  - Genetic Programming: Alhejali & Lucas 2010, 2011, 2013, Brandstetter & Ahmadi 2012
  - Monte-Carlo Tree Search: Samothrakis et al. 2010, Alhejali & Lucas 2013
  - Influence Map: Svensson & Johansson 2012
  - Ant Colony Optimization: Recio et al. 2012
- Pac-Man vs. Ghosts Competition (Ghosts)
  - Neuroevolution: Wittkamp et al. 2008
  - Evolved Rule Set: Gagne & Congdon 2012
11. Evolved Direction Evaluator
- Inspired by Brandstetter and Ahmadi (CIG 2012)
- Net with single output and direction-relative sensors
- Each time step, run net for each available direction
- Pick direction with highest net output
[Figure: Left and Right preferences fed into an argmax, which selects Left]
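The per-direction evaluation loop above can be sketched as follows. The `sense_relative_to` helper and `net` callable are hypothetical stand-ins for the direction-relative sensors and evolved network described on this slide.

```python
def choose_direction(net, state, available_directions):
    """Run the same network once per available direction, with sensors
    computed relative to that direction, and return the argmax."""
    preferences = {}
    for d in available_directions:
        sensors = state.sense_relative_to(d)  # hypothetical sensor helper
        preferences[d] = net(sensors)         # single scalar preference
    return max(preferences, key=preferences.get)
```

Because the same network weights score every direction, the evaluator is symmetric by construction: it never has to learn separate left/right/up/down behaviors.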
12. Module Setups
- Manually divide domain with Multitask
  - Two Modules: Threat/Any Edible
  - Three Modules: All Threat/All Edible/Mixed
- Discover new divisions with preference neurons
  - Two Modules, Three Modules, MM(R), MM(D)
[Figure: Two-Module Multitask, Two Modules, and MM(D) network diagrams]
13. Average Champion Scores Across 20 Runs
14. Most Used Champion Modules
Patterns of module usage correspond to different score ranges
15. Most Used Champion Modules
Low: One Module
Most low-scoring networks use only one module
16. Most Used Champion Modules
Low: One Module
Medium-scoring networks use their primary module 80% of the time
Edible/Threat Division
17. Most Used Champion Modules
Luring/Surrounded Module
Low: One Module
Surprisingly, the best networks use one module 95% of the time
Edible/Threat Division
18. Multimodal Behavior
- Different colors are for different modules
Learned Edible/Threat Division
Learned Luring/Surrounded Module
Three-Module Multitask
19. Comparison with Other Work

Authors                      | Method            | Eval Type   | AVG    | MAX
Alhejali and Lucas 2010      | GP                | Four Maze   | 16,014 | 44,560
Alhejali and Lucas 2011      | GP+Camps          | Four Maze   | 11,413 | 31,850
Best Multimodal Result       | Two Modules/MM(D) | Four Maze   | 32,959 | 44,520
Recio et al. 2012            | ACO               | Competition | 36,031 | 43,467
Brandstetter and Ahmadi 2012 | GP Direction      | Competition | 19,198 | 33,420
Alhejali and Lucas 2013      | MCTS              | Competition | 28,117 | 62,630
Alhejali and Lucas 2013      | MCTS+GP           | Competition | 32,641 | 69,010
Best Multimodal Result       | MM(D)             | Competition | 65,447 | 100,070

Based on 100 evaluations with 3 lives. Four Maze: visit each maze once. Competition: visit each maze four times, advancing after 3,000 time steps.
20. Discussion
- Obvious division is between edible and threat
  - But these tasks are blended
  - Strict Multitask divisions do not perform well
  - Preference neurons can learn when best to switch
- Better division: one module when surrounded
  - Very asymmetrical, which is surprising
  - Highest-scoring runs use one module only rarely
  - Module activates when Pac-Man is almost surrounded
  - Often leads to eating a power pill (luring)
  - Helps Pac-Man escape in other risky situations
21. Future Work
- Go beyond two modules
- Issue with domain or evolution?
- Multimodal behavior of teams
- Ghost team in Pac-Man
- Physical simulation
- Unreal Tournament, robotics
22. Conclusion
- Intelligent module divisions produce the best results
- Modular networks make learning separate modes easier
- Results are better than previous work
- Module division unexpected
  - Half of neural resources for a seldom-used module (< 5% usage)
  - Rare situations can be very important
  - Some modules handle multiple modes
  - Pills, threats, edible ghosts
23. Questions?
- E-mail: schrum2@cs.utexas.edu
- Movies: http://nn.cs.utexas.edu/?ol-pm
- Code: http://nn.cs.utexas.edu/?mm-neat
24. What is Multimodal Behavior?
- From Observing Agent Behavior
  - Agent performs distinct tasks
  - Behavior very different in different tasks
  - Single policy would have trouble generalizing
- Reinforcement Learning Perspective
  - Instance of Hierarchical Reinforcement Learning
  - A mode of behavior is like an option
    - A temporally extended action
    - A control policy that is only used in certain states
  - Policy for each mode must be learned as well
- Idea From Supervised Learning
  - Multitask Learning trains on multiple known tasks
25. Behavioral Modes vs. Network Modules
- Different behavioral modes
  - Determined via observation of behavior; subjective
  - Any net can exhibit multiple behavioral modes
- Different network modules
  - Determined by connectivity of network
  - Groups of policy outputs designated as modules (sub-policies)
  - Modules distinct even if behavior is same/unused
- Network modules should help build behavioral modes
[Figure: network with shared Sensors feeding Module 1 and Module 2]
26. Preference Neuron Arbitration
- How can network decide which module to use?
- Find preference neuron (grey) with maximum output
- Corresponding policy neurons (white) define behavior
Example: outputs are (0.6, 0.1, 0.5, 0.7). Since 0.7 > 0.1, use Module 2; its policy neuron output of 0.5 defines the agent's behavior.
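The arbitration rule above can be sketched as follows. It assumes outputs are grouped per module with each module's preference neuron last, matching the (policy, preference, policy, preference) reading of the (0.6, 0.1, 0.5, 0.7) example; the exact output ordering is an assumption of this sketch.

```python
def arbitrate(outputs, num_modules):
    """Pick the module whose preference neuron fires highest and
    return (module index, that module's policy outputs)."""
    size = len(outputs) // num_modules           # neurons per module
    best, best_pref = 0, float("-inf")
    for m in range(num_modules):
        pref = outputs[m * size + size - 1]      # last neuron = preference
        if pref > best_pref:
            best, best_pref = m, pref
    policy = outputs[best * size : best * size + size - 1]
    return best, policy
```

With the slide's numbers, (0.6, 0.1, 0.5, 0.7) selects module index 1 and policy output [0.5].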
27. Pareto-Based Multiobjective Optimization (Pareto 1890)
[Figure: tradeoff between objectives; one point has high health but dealt little damage, another dealt a lot of damage but lost lots of health (Deb et al. 2000)]
28. Non-Dominated Sorting Genetic Algorithm II (Deb et al. 2000)
- Population P with size N; evaluate P
- Use mutation (+ crossover) to get P' of size N; evaluate P'
- Calculate non-dominated fronts of P ∪ P' (size 2N)
- New population of size N from highest fronts of P ∪ P'
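The non-dominated sorting step above can be sketched in Python. This shows only Pareto dominance and front extraction over fitness vectors (maximizing all objectives); NSGA-II's crowding-distance tie-breaking and selection are omitted.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective (maximizing)
    and strictly better in at least one."""
    return (all(x >= y for x, y in zip(a, b)) and
            any(x > y for x, y in zip(a, b)))

def non_dominated_fronts(population):
    """Sort fitness vectors into successive Pareto fronts: front 0 is
    non-dominated, front 1 is non-dominated once front 0 is removed, etc."""
    remaining = list(population)
    fronts = []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining if q != p)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts
```

The new population is then filled front by front from the combined parent and child populations, which is what makes NSGA-II elitist.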
29. Direction Evaluator Modules
- Network is evaluated in each direction
- For each direction, a module is chosen
  - Human-specified (Multitask) or Preference Neurons
- Chosen module's policy neuron sets direction preference
Left outputs: (0.5, 0.1, 0.7, 0.6) → 0.6 > 0.1, so Left preference is 0.7
Right outputs: (0.3, 0.8, 0.9, 0.1) → 0.8 > 0.1, so Right preference is 0.3
Since 0.7 > 0.3, Ms. Pac-Man chooses to go left, based on Module 2
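Combining the per-direction evaluation with preference-neuron arbitration gives the following sketch, using this slide's two-module, (policy1, pref1, policy2, pref2) output layout; that layout is an assumption of the sketch.

```python
def direction_preference(outputs):
    """For one direction: the module with the higher preference neuron
    supplies that direction's policy (preference) value."""
    policy1, pref1, policy2, pref2 = outputs
    return policy2 if pref2 > pref1 else policy1

def pick_direction(outputs_by_direction):
    """Choose the direction whose selected-module policy value is highest."""
    return max(outputs_by_direction,
               key=lambda d: direction_preference(outputs_by_direction[d]))
```

Note that arbitration happens independently per direction, so different directions can be scored by different modules on the same time step, as in the slide's example.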
30. Discussion (2)
- Good divisions are harder to discover
- Some modular champions use only one module
- Particularly MM(R): new modules too random
- Are evaluations too harsh/noisy?
- Easy to lose one life
- Hard to eat all pills to progress
- Discourages exploration
- Hard to discover useful modules
- Make search more forgiving