Evolving Multimodal Behavior With Modular Neural Networks in Ms. Pac-Man (Presentation Transcript)

1
Evolving Multimodal Behavior With Modular Neural
Networks in Ms. Pac-Man
  • By Jacob Schrum and Risto Miikkulainen

2
Introduction
  • Challenge: Discover behavior automatically
  • Simulations
  • Robotics
  • Video games (focus)
  • Why challenging?
  • Complex domains
  • Multiple agents
  • Multiple objectives
  • Multimodal behavior required (focus)

3
Multimodal Behavior
  • Animals can perform many different tasks
  • Imagine learning a monolithic policy as complex
    as a cardinal's behavior. How?
  • Problem more tractable if broken into component
    behaviors

(Images: a cardinal flying, nesting, and foraging)
4
Multimodal Behavior in Games
  • How are complex software agents designed?
  • Finite state machines
  • Behavior trees/lists
  • Modular design leads to multimodal behavior

FSM for NPC in Quake
Behavior list for our winning BotPrize bot
5
Modular Policy
  • One policy consisting of several policies/modules
  • Number preset, or learned
  • Means of arbitration also needed
  • Human specified, or learned via preference
    neurons
  • Separate behaviors easily represented
  • Sub-policies/modules can share components

(Diagrams: Multitask network (Caruana 1997) and preference-neuron network, each mapping shared Inputs to per-module Outputs)
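As a concrete illustration of this output layout, here is a minimal Python sketch; the flat-vector encoding and the split_modules helper are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (assumed encoding): a modular policy is one network whose
# output vector is grouped into modules. With Multitask, a human-specified
# rule picks the module; with preference neurons, each module carries an
# extra output and the module whose preference is highest wins.
import numpy as np

def split_modules(outputs, num_modules, preference_neurons):
    """Group a flat output vector into per-module (policy, preference) pairs."""
    per_module = len(outputs) // num_modules
    groups = [outputs[i * per_module:(i + 1) * per_module]
              for i in range(num_modules)]
    if preference_neurons:
        # The last value in each group is that module's preference neuron.
        return [(g[:-1], g[-1]) for g in groups]
    return [(g, None) for g in groups]  # arbitration left to a human rule

outputs = np.array([0.6, 0.1, 0.5, 0.7])  # two modules, one policy output each
print(split_modules(outputs, 2, preference_neurons=True))
# [(array([0.6]), 0.1), (array([0.5]), 0.7)] -> Module 2 would be preferred
```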
6
Constructive Neuroevolution
  • Genetic Algorithms + Neural Networks
  • Good at generating control policies
  • Three basic mutations + Crossover (sketched below)
  • Other structural mutations possible
  • (NEAT by Stanley 2004)

(Diagrams: Perturb Weight, Add Connection, Add Node)
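The following sketch shows what these three mutations might look like on a NEAT-style genome of connection genes; the genome representation is a simplified assumption, not the actual NEAT/MM-NEAT code.

```python
# Simplified genome: a list of node ids plus (source, target, weight) genes.
import random

def perturb_weight(genome, sigma=0.5):
    # Nudge one existing connection weight with Gaussian noise.
    i = random.randrange(len(genome["conns"]))
    src, dst, w = genome["conns"][i]
    genome["conns"][i] = (src, dst, w + random.gauss(0.0, sigma))

def add_connection(genome):
    # Connect two nodes with a new random-weight connection.
    src, dst = random.choice(genome["nodes"]), random.choice(genome["nodes"])
    genome["conns"].append((src, dst, random.gauss(0.0, 1.0)))

def add_node(genome):
    # Split an existing connection: src -> new -> dst, roughly preserving
    # behavior by keeping the old weight on the outgoing half.
    i = random.randrange(len(genome["conns"]))
    src, dst, w = genome["conns"].pop(i)
    new = max(genome["nodes"]) + 1
    genome["nodes"].append(new)
    genome["conns"] += [(src, new, 1.0), (new, dst, w)]

genome = {"nodes": [0, 1, 2], "conns": [(0, 2, 0.3), (1, 2, -0.8)]}
for mutate in (perturb_weight, add_connection, add_node):
    mutate(genome)
print(genome)
```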
7
Module Mutation
  • A mutation that adds a module
  • Can be done in many different ways
  • Can happen more than once for multiple modules
    (sketched below)

(Diagrams: MM(Random) and MM(Duplicate) applied to a network mapping inputs In to outputs Out; cf. Calabretta et al. 2000, Schrum and Miikkulainen 2012)
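Continuing the toy genome from the previous sketch, here is a hedged sketch of the two module mutations: MM(Random) wires a fresh module with random connections, while MM(Duplicate) copies an existing module's incoming connections so the new module starts out behaviorally identical. Names and structure are illustrative assumptions.

```python
import random

def mm_random(genome, inputs):
    # MM(Random): add one new output node per policy output, randomly wired.
    new_module = []
    for _ in range(len(genome["modules"][0])):
        node = max(genome["nodes"]) + 1
        genome["nodes"].append(node)
        genome["conns"].append((random.choice(inputs), node,
                                random.gauss(0.0, 1.0)))
        new_module.append(node)
    genome["modules"].append(new_module)

def mm_duplicate(genome):
    # MM(Duplicate): copy every connection feeding an existing module's
    # outputs, same weights, so the duplicate behaves identically at first.
    old = random.choice(genome["modules"])
    new_module = []
    for out in old:
        node = max(genome["nodes"]) + 1
        genome["nodes"].append(node)
        genome["conns"] += [(s, node, w)
                            for (s, d, w) in genome["conns"] if d == out]
        new_module.append(node)
    genome["modules"].append(new_module)

genome = {"nodes": [0, 1, 2], "conns": [(0, 2, 0.3), (1, 2, -0.8)],
          "modules": [[2]]}
mm_duplicate(genome)   # new module mirrors the old one until weights diverge
print(genome["modules"])   # -> [[2], [3]]
```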
8
Ms. Pac-Man
  • Domain needs multimodal behavior to succeed
  • Predator/prey variant
  • Pac-Man takes on both roles
  • Goal: Maximize score by
  • Eating all pills in each level
  • Avoiding threatening ghosts
  • Eating ghosts (after power pill)
  • Non-deterministic
  • Very noisy evaluations
  • Four mazes
  • Behavior must generalize

(Video: Human Play)
9
Task Overlap
  • Distinct behavioral modes
  • Eating edible ghosts
  • Clearing levels of pills
  • More?
  • Are ghosts currently edible?
  • Possibly some are and some are not
  • Task division is blended

10
Previous Work in Pac-Man
  • Custom Simulators
  • Genetic Programming: Koza 1992
  • Neuroevolution: Gallagher & Ledwich 2007, Burrow
    & Lucas 2009, Tan et al. 2011
  • Reinforcement Learning: Burrow & Lucas 2009,
    Subramanian et al. 2011, Bom 2013
  • Alpha-Beta Tree Search: Robles & Lucas 2009
  • Screen Capture Competition: Requires Image
    Processing
  • Evolution & Fuzzy Logic: Handa & Isozaki 2008
  • Influence Map: Wirth & Gallagher 2008
  • Ant Colony Optimization: Emilio et al. 2010
  • Monte-Carlo Tree Search: Ikehata & Ito 2011
  • Decision Trees: Foderaro et al. 2012
  • Pac-Man vs. Ghosts Competition: Pac-Man
  • Genetic Programming: Alhejali & Lucas 2010, 2011,
    2013, Brandstetter & Ahmadi 2012
  • Monte-Carlo Tree Search: Samothrakis et al. 2010,
    Alhejali & Lucas 2013
  • Influence Map: Svensson & Johansson 2012
  • Ant Colony Optimization: Recio et al. 2012
  • Pac-Man vs. Ghosts Competition: Ghosts
  • Neuroevolution: Wittkamp et al. 2008
  • Evolved Rule Set: Gagné & Congdon 2012

11
Evolved Direction Evaluator
  • Inspired by Brandstetter and Ahmadi (CIG 2012)
  • Net with single output and direction-relative
    sensors
  • Each time step, run net for each available
    direction
  • Pick direction with highest net output (sketched
    below)

(Diagram: the same network computes a Left Preference and a Right Preference; argmax selects Left)
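A minimal sketch of this evaluation loop; sense() and the stand-in network are illustrative assumptions rather than the paper's actual sensor set.

```python
# Run one single-output network once per available direction on
# direction-relative sensors, then move in the argmax direction.
import numpy as np

def sense(state, direction):
    # Placeholder for direction-relative sensors (e.g. distance to the
    # nearest pill or ghost when looking along `direction`).
    return np.asarray(state[direction], dtype=float)

def choose_direction(net, state, directions):
    scores = {d: net(sense(state, d)) for d in directions}
    return max(scores, key=scores.get)

net = lambda x: float(x.sum())          # stand-in for the evolved network
state = {"left": [0.2, 0.9], "right": [0.4, 0.1]}
print(choose_direction(net, state, ["left", "right"]))   # -> left
```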
12
Module Setups
  • Manually divide domain with Multitask (rules
    sketched below)
  • Two Modules: Threat/Any Edible
  • Three Modules: All Threat/All Edible/Mixed
  • Discover new divisions with preference neurons
  • Two Modules, Three Modules, MM(R), MM(D)

(Diagrams: Two-Module Multitask, Two Modules with preference neurons, and MM(D), each mapping inputs In to outputs Out)
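For the Multitask variants, the human-specified division might look like the following predicates; this is an assumed formulation consistent with the bullets above, not the paper's exact code.

```python
# Map the current ghost state to a module index for each Multitask setup.
def two_module_task(ghosts):
    # Module 1 whenever any ghost is edible, else the threat module (0).
    return 1 if any(g["edible"] for g in ghosts) else 0

def three_module_task(ghosts):
    edible = sum(g["edible"] for g in ghosts)
    if edible == 0:
        return 0                    # all ghosts are threats
    if edible == len(ghosts):
        return 1                    # all ghosts are edible
    return 2                        # mixed: the task division is blended

ghosts = [{"edible": True}, {"edible": False}]
print(two_module_task(ghosts), three_module_task(ghosts))   # -> 1 2
```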
13
Average Champion Scores Across 20 Runs
14
Most Used Champion Modules
Patterns of module usage correspond to different
score ranges
15
Most Used Champion Modules
Most low-scoring networks use only one module
16
Most Used Champion Modules
Medium-scoring networks use their primary module
80% of the time, showing an edible/threat division
17
Most Used Champion Modules
Surprisingly, the best networks use one module
95% of the time; the second, rarely used module
handles luring and surrounded situations
18
Multimodal Behavior
  • Different colors are for different modules

(Videos: Learned Edible/Threat Division, Learned Luring/Surrounded Module, Three-Module Multitask)
19
Comparison with Other Work
Authors                         Method              Eval Type    AVG     MAX
Alhejali and Lucas 2010         GP                  Four Maze    16,014  44,560
Alhejali and Lucas 2011         GP + Camps          Four Maze    11,413  31,850
Best Multimodal Result          Two Modules/MM(D)   Four Maze    32,959  44,520
Recio et al. 2012               ACO                 Competition  36,031  43,467
Brandstetter and Ahmadi 2012    GP (Direction)      Competition  19,198  33,420
Alhejali and Lucas 2013         MCTS                Competition  28,117  62,630
Alhejali and Lucas 2013         MCTS + GP           Competition  32,641  69,010
Best Multimodal Result          MM(D)               Competition  65,447  100,070
Based on 100 evaluations with 3 lives. Four Maze:
visit each maze once. Competition: visit each maze
four times, advancing after 3,000 time steps.
20
Discussion
  • Obvious division is between edible and threat
  • But these tasks are blended
  • Strict Multitask divisions do not perform well
  • Preference neurons can learn when best to switch
  • Better division: one module used when surrounded
  • Very asymmetrical, which is surprising
  • Highest scoring runs use one module rarely
  • Module activates when Pac-Man almost surrounded
  • Often leads to eating a power pill (luring)
  • Helps Pac-Man escape in other risky situations

21
Future Work
  • Go beyond two modules
  • Issue with domain or evolution?
  • Multimodal behavior of teams
  • Ghost team in Pac-Man
  • Physical simulation
  • Unreal Tournament, robotics

22
Conclusion
  • Intelligent module divisions produce the best
    results
  • Modular networks make learning separate modes
    easier
  • Results are better than previous work
  • Module division unexpected
  • Half of the neural resources go to a seldom-used
    module (used < 5% of the time)
  • Rare situations can be very important
  • Some modules handle multiple modes
  • Pills, threats, edible ghosts

23
Questions?
  • E-mail: schrum2@cs.utexas.edu
  • Movies: http://nn.cs.utexas.edu/?ol-pm
  • Code: http://nn.cs.utexas.edu/?mm-neat

24
What is Multimodal Behavior?
  • From Observing Agent Behavior
  • Agent performs distinct tasks
  • Behavior very different in different tasks
  • Single policy would have trouble generalizing
  • Reinforcement Learning Perspective
  • Instance of Hierarchical Reinforcement Learning
  • A mode of behavior is like an option
  • A temporally extended action
  • A control policy that is only used in certain
    states
  • Policy for each mode must be learned as well
  • Idea From Supervised Learning
  • Multitask Learning trains on multiple known tasks

25
Behavioral Modes vs. Network Modules
  • Different behavioral modes
  • Determined via observation of behavior,
    subjective
  • Any net can exhibit multiple behavioral modes
  • Different network modules
  • Determined by connectivity of network
  • Groups of policy outputs
    designated as modules (sub-policies)
  • Modules distinct even if behavior
    is same/unused
  • Network modules should help
    build behavioral modes

(Diagram: Sensors feeding a network whose outputs are grouped into Module 1 and Module 2)
26
Preference Neuron Arbitration
  • How can network decide which module to use?
  • Find preference neuron (grey) with maximum output
  • Corresponding policy neurons (white) define
    behavior

Example: the outputs are [0.6, 0.1, 0.5, 0.7]. Since
preference 0.7 > 0.1, Module 2 is used; its policy
neuron outputs 0.5, and that value defines the
agent's behavior.
(Diagram: network mapping Inputs to Outputs with policy and preference neurons)
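In code, the arbitration step might look like this minimal sketch, assuming the [policy1, pref1, policy2, pref2] output layout used in the example above; arbitrate() is an illustrative helper, not the authors' implementation.

```python
import numpy as np

def arbitrate(outputs, num_modules):
    # Reshape the flat output vector into one row per module; the last
    # column holds the (grey) preference neurons.
    out = np.asarray(outputs).reshape(num_modules, -1)
    policies, prefs = out[:, :-1], out[:, -1]
    module = int(np.argmax(prefs))       # pick the most confident module
    return module, policies[module]      # its (white) policy neurons act

module, policy = arbitrate([0.6, 0.1, 0.5, 0.7], num_modules=2)
print(module + 1, policy)                # -> 2 [0.5]: Module 2, output 0.5
```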
27
Pareto-based Multiobjective Optimization (Pareto
1890)
(Chart: tradeoff between objectives; one point has high health but dealt little damage, another dealt a lot of damage but lost most of its health)
(Deb et al. 2000)
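The underlying relation is standard Pareto dominance for maximization: one solution dominates another iff it is at least as good in every objective and strictly better in at least one. A short sketch (generic definition, not project-specific code):

```python
def dominates(v, u):
    """True iff objective vector v Pareto-dominates u (maximization)."""
    return (all(a >= b for a, b in zip(v, u))
            and any(a > b for a, b in zip(v, u)))

# (damage_dealt, health_remaining) as in the chart above:
print(dominates((10, 5), (8, 5)))   # True: more damage, equal health
print(dominates((10, 3), (8, 5)))   # False: a tradeoff, neither dominates
```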
28
Non-dominated Sorting Genetic Algorithm II (Deb
et al. 2000)
  • Population P of size N; evaluate P
  • Use mutation (and crossover) to get P′ of size N;
    evaluate P′
  • Calculate non-dominated fronts of P ∪ P′ (size 2N)
  • New population of size N comes from the highest
    fronts of P ∪ P′
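A condensed sketch of one such generation, under stated assumptions: fronts are computed naively in O(n²), crowding-distance tie-breaking inside the last front is omitted for brevity, and evaluate/mutate are stand-ins.

```python
def dominates(v, u):
    return (all(a >= b for a, b in zip(v, u))
            and any(a > b for a, b in zip(v, u)))

def non_dominated_fronts(scores):
    # scores: list of objective tuples. Peel off fronts best-first.
    fronts, remaining = [], dict(enumerate(scores))
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(remaining[j], remaining[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        for i in front:
            del remaining[i]
    return fronts

def nsga2_generation(parents, evaluate, mutate, n):
    children = [mutate(p) for p in parents]        # P' from P
    union = parents + children                     # P ∪ P' has size 2N
    fronts = non_dominated_fronts([evaluate(g) for g in union])
    survivors = []
    for front in fronts:                           # fill from highest fronts
        survivors += front
        if len(survivors) >= n:
            break
    return [union[i] for i in survivors[:n]]       # next population, size N

# Toy usage: genomes are floats, two conflicting objectives (g, 1 - g).
pop = nsga2_generation([0.1, 0.7],
                       evaluate=lambda g: (g, 1.0 - g),
                       mutate=lambda g: min(1.0, g + 0.1),
                       n=2)
print(pop)
```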

29
Direction Evaluator Modules
  • Network is evaluated in each direction
  • For each direction, a module is chosen
  • Human-specified (Multitask) or Preference Neurons
  • Chosen module's policy neuron sets direction
    preference

Left outputs: [0.5, 0.1, 0.7, 0.6]; 0.6 > 0.1, so
Left Preference is 0.7. Right outputs: [0.3, 0.8,
0.9, 0.1]; 0.8 > 0.1, so Right Preference is 0.3.
Since 0.7 > 0.3, Ms. Pac-Man chooses to go left,
based on Module 2.
(Diagram: the same network applied to Left Inputs and Right Inputs)
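Combining the two mechanisms: for each direction, the modular net is run on direction-relative inputs, preference neurons pick a module, and that module's policy neuron gives the direction's preference. This sketch reproduces the numbers above, with the same assumed output layout as the earlier arbitration example.

```python
import numpy as np

def direction_preference(outputs, num_modules=2):
    # Arbitrate with preference neurons, then read the chosen module's
    # single policy output as this direction's preference value.
    out = np.asarray(outputs).reshape(num_modules, -1)
    module = int(np.argmax(out[:, -1]))
    return module, float(out[module, 0])

left = direction_preference([0.5, 0.1, 0.7, 0.6])     # Module 2, pref 0.7
right = direction_preference([0.3, 0.8, 0.9, 0.1])    # Module 1, pref 0.3
print("go left" if left[1] > right[1] else "go right")   # -> go left
```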
30
Discussion (2)
  • Good divisions are harder to discover
  • Some modular champions use only one module
  • Particularly MM(R): new modules are too random
  • Are evaluations too harsh/noisy?
  • Easy to lose one life
  • Hard to eat all pills to progress
  • Discourages exploration
  • Hard to discover useful modules
  • Make search more forgiving