Testbed for Integrating and Evaluating Learning Techniques - PowerPoint PPT Presentation

1
Testbed for Integrating and Evaluating Learning
Techniques
TIELT
David W. Aha¹, Matthew Molineaux²
¹Intelligent Decision Aids Group, Navy Center for Applied Research in AI, Naval Research Laboratory, Washington, DC
²ITT Industries, AES Division, Alexandria, VA
first.surname_at_nrl.navy.mil
17 November 2004
2
Outline
  • Motivation: Learning in cognitive systems
  • Objectives
  • Encourage machine learning research on complex
    tasks that require knowledge-intensive approaches
  • Provide industry & military with access to the
    results
  • Design: TIELT functionality & components
  • Example: Knowledge base content
  • Status
  • Implementation & documentation
  • Collaborations & events
  • Task list
  • Summary

3
DARPA
  • Defense Advanced Research Projects Agency
    ($2.3B/yr)

4
Cognitive Systems
Systems that know what they're doing
  • A cognitive system is one that:
  • can reason, using substantial amounts of
    appropriately represented knowledge
  • can learn from its experience so that it performs
    better tomorrow than it did today
  • can explain itself and be told what to do
  • can be aware of its own capabilities and reflect
    on its own behavior
  • can respond robustly to surprise

5
Anatomy of a Cognitive Agent
[Diagram (Brachman, 2003): anatomy of a cognitive agent, with reflective, deliberative, and reactive processes; long-term and short-term memory (LTM/STM) holding concepts and sentences; deliberation spanning learning, prediction/planning, communication (language, gesture, image), and other reasoning; the agent perceives the external environment through sensors (with attention) and acts on it through effectors.]
6
Learning in Cognitive Systems (Langley & Laird, 2002)
Many opportunities exist for learning in
cognitive systems
7
Status of Learning in Cognitive Systems
Problem
  • Few deployed cognitive systems integrate
    techniques that exhibit rapid & enduring learning
    behavior on complex tasks
  • It's costly to integrate & evaluate embedded
    learning techniques

8
TIELT Motivation
  • We want Cognitive Agents that Learn
  • Rapidly,
  • in context, and
  • over the long-term.
  • We have few (if any) of them

9
TIELT Objective
  • Encourage research on learning in cognitive
    systems, with subsequent transition goals

[Diagram: ML researchers contribute learning modules; combined with cognitive agents, these yield cognitive agents that learn, for transition to the military and industry.]
10
Current ML Research Focus
  • Benchmark studies of multiple algorithms on
    simple (e.g., supervised) learning tasks from
    many static datasets

[Diagram: an ML researcher runs n ML systems against m databases, producing m results per system for benchmark analysis.]

This was encouraged (in part) by the availability
of datasets in a standard (interface) format
11
Previous API for ML Investigations
Inspiration
  • UC Irvine Repository of Machine Learning (ML)
    Databases
  • An interface for empirical benchmarking studies
    on supervised learning
  • 1525 citations (and many publications use it w/o
    citing) since 1986

[Diagram: Database_i feeds, through the standard-format interface, a supervised-learning ML System_j embedded in Decision System_k.]
12
Accomplishing TIELT's Objective
  • One approach: Shift ML research focus from static
    datasets to dynamic simulators of rich
    environments

13
Refining TIELT's Objective
Objective
  • Develop a tool for evaluating decision systems in
    simulators
  • Specific support for evaluating learning
    techniques
  • Demonstrate research utility prior to approaching
    industry/military

Benefits
  1. Reduces system-simulator integration costs from
    m×n to m+n (see next)
  2. Permits benchmark studies on selected simulator
    tasks
  3. Encourages study of ML for knowledge-intensive
    problems
  4. Provides support for DARPA Challenge Problems on
    Cognitive Learning

14
Reducing Integration Costs
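The cost reduction is simple combinatorics: with m decision systems and n simulators, direct pairwise integration needs one adapter per (system, simulator) pair, while a shared mediator like TIELT needs only one adapter per component. A minimal sketch (the function names are ours, not TIELT's):

```python
def pairwise_adapters(m: int, n: int) -> int:
    """One custom adapter per (decision system, simulator) pair."""
    return m * n

def mediated_adapters(m: int, n: int) -> int:
    """One adapter per component when all traffic goes through a mediator."""
    return m + n

# With 10 decision systems and 8 simulators:
print(pairwise_adapters(10, 8))  # 80
print(mediated_adapters(10, 8))  # 18
```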
15
What Domain?
Desiderata
  1. Available implementations (cheap to acquire &
    run)
  2. Challenging problems for CogSys/ML research
  3. Significant interest (academia, military,
    industry, funding, public)

Simulation Games?
16
Gaming Genres of Interest (modified from Laird & van Lent, 2001)

Genre | Example | Description | Sub-Genres | AI Roles
Action | Quake, Unreal | Control a character | 1st vs. 3rd person, solo vs. team play | Control enemies
Role-Playing | Temple of Elemental Evil | Be a character (includes puzzle solving, etc.) | Solo vs. (massively) multi-player | Control enemies, partners, and supporting characters
Strategy (real-time, discrete) | Empire Earth 2, AoE, Civilization | Controlling at multiple levels (e.g., strategic, tactical warfare) | God, first-person perspectives | Control all units and strategic enemies
Team Sports | Madden NFL Football | Act as coach and a key player | | Control units and strategic enemy (i.e., other coach), commentator
Individual Sports | Many (e.g., driving games) | Individual competition | 1st vs. 3rd person | Control enemy
17
Some Game Environment Challenges
  • Significant background knowledge available
    (e.g., processes, tasks, objects, actions)
  • Use: provides opportunities for rapid learning
  • Adversarial
  • Collaborative
  • Multiple reasoning levels (e.g., strategic,
    tactical)
  • Real-time
  • Uncertainty ("Fog of War")
  • Noise (e.g., imprecision)
  • Relational (e.g., social networks)
  • Temporal
  • Spatial

18
Academia: Learning in Simulation Games
Focus: Broad interests
  • Game engines (e.g., GameBots, ORTS, RoboCup
    Soccer Server)
  • Use of (other) open source engines (e.g., FreeCiv,
    Stratagus)
  • Representation (e.g., Forbus et al., 2001; Houk,
    2004; Munoz-Avila & Fisher, 2004)
  • Learning opponent & unit models (e.g., Laird, 2001;
    Hill et al., 2002)
  • (see table)

Evidence of commitment
  • "Interactive Computer Games: Human-Level AI's
    Killer Application" (Laird & van Lent, AAAI'00
    Invited Talk)
  • Meetings
  • AAAI symposia (several in recent years)
  • International Conference on Computers and Games
  • AAAI'04 Workshop on Challenges in Game AI
  • AI in Interactive Digital Entertainment
    Conference (2005-)
  • New journals focusing on (e.g., real-time)
    simulation games
  • J. of Game Development
  • Int. J. of Intelligent Games and Simulation

19
Survey: Selected Previous Work on Learning & Gaming Simulators

Name (Reference) | Method | Learning Task | Performance Task | Test Plan & Metrics (independent variables to vary and dependents to measure)
(Goodman, AAAI'93) | Projective visualization | 1 TDIDT per feature cluster | Predict amount of inflicted damage | Vary training amount & projection length; measure predicted summed pain
MAYOR (Fasciano, 1996 M.S. thesis) | Case-based planning | Plan execution conditions | Maximize SimCity game score (online) | Vary whether learning was used; measure successful plan executions
(Fogel et al., CCGFBR'96) | Genetic algorithm | Rules | 1x1 tank battles | Vary locations/space of routes; measure damage
KnoMic (van Lent & Laird, ICML'98) | Production rules | Rule conditions & goals | Racetrack mission for TacAir-SOAR | Measure speed with which KnoMic learned correct control rules
(Agogino et al., 1999 NPL) | Neuro-evolution | Weights (genetic learning) | 30 gold-collecting peons vs. 1 human | Vary learning methodology; measure survival rate of peons
(Laird, ICAA'01) | SOAR chunking | Rules | Predict enemy behavior | None; would focus on speedup
(Geisler, 2002 M.S. thesis) | NB, TDIDT, BP, ensembles | Depends on the method | 4 simple classification tasks | Vary training set size & ensembles; measure classification accuracy
(Bryant & Miikkulainen, CEC'03) | Neuroevolution | NN weights, etc. | Discrete Legions vs. Barbarians (offline) | Vary training set size; measure a game-specific function
(Chia & Williams, BRIMS'03) | Naïve Bayes | Adding/deleting rules | 1x1 tank battles | Vary adversarial aggressiveness & whether learning occurs; measure wins
(Fagan & Cunningham, ICCBR'03) | Case-based prediction | Selecting plans to save | Predict a player's action | Vary the stored plans and the user; measure accurate-prediction frequency
(Guestrin et al., IJCAI'03) | Relational MDPs | Partition objects | Beat enemy in 3x3 Freecraft games | Simplistic; one run
(Sweetser & Dennis, 2003 Ent. Computing Tech. & Applications) | Advice giving | Regression weights | Just-in-time hints to human player | Vary with vs. without providing hints; measure hints that were useful
(Spronck et al., 2004 IJIGS) | Dynamic scripting | Rule weights | Beat NWN AI in simple scenarios (offline) | Measure average turning point, speed, effectiveness, robustness, efficiency
(Ponsen, 2004 M.S. thesis) | Dynamic scripting & GA for rule learning | Rule weights and new rules | Defeat Wargus opponent (offline) | Vary map size, learning algorithm, and opponent control algorithm; measure wins
(Ulam et al., AAAI'04 workshop) | Self-adaptation | Task edits | Defend city (FreeCiv, offline) | Vary trace size; measure successes
20
Industry: Learning in Simulation Games
Focus: Increase sales via enhanced gaming
experience
  • USA: $7B in sales in 2003 (ESA, 2004)
  • Strategy games: $0.3B
  • Simulators: Many! (e.g., SimCity, Quake, SoF, UT)
  • Target: Control of avatars, unit behaviors

Evidence of commitment
  • "Developers [are] keenly interested in building AIs
    that might learn, both from the player &
    environment around them." (GDC'03 Roundtable
    Report)
  • Middleware products that support learning (e.g.,
    MASA, SHAI, LearningMachine)
  • Long-term investments in learning (e.g., iKuni,
    Inc.)
  • Conferences
  • Game Developers Conference
  • Computer Game Technology Conference

21
Industry: Learning in Simulation Games
Status
  • Few deployed systems have used learning (Kirby,
    2004), e.g.,
  • Black & White: on-line, explicit (player
    immediately reinforces behavior)
  • C&C Renegade: on-line, implicit (agent updates
    set of legal paths)
  • Re-Volt: off-line, implicit (GA tunes racecar
    behaviors prior to shipping)
  • Problems: Performance, constraints (preventing
    learning something "dumb"), trust in learning
    system

Some Promising Techniques (Rabin, 2004)
  • Belief networks for probabilistic inference
  • Decision tree learning
  • Genetic algorithms (e.g., for offline parameter
    tuning)
  • Statistical prediction (e.g., using N-grams to
    predict future events)
  • Neural networks (e.g., for offline applications)
  • Player modeling (e.g., to regulate game
    difficulty, model reputation)
  • Reinforcement learning
  • Weakness modification learning (e.g., don't
    repeat failed strategies)
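As one illustration of the techniques above, N-gram prediction of future events can be sketched in a few lines: count which event follows each length-(n-1) context in the observed history, then predict the most frequent successor. This is a generic sketch, not code from any shipped game; the event names are invented:

```python
from collections import Counter, defaultdict

def train_ngram(events, n=3):
    """Count which event follows each (n-1)-length context."""
    model = defaultdict(Counter)
    for i in range(len(events) - n + 1):
        context = tuple(events[i : i + n - 1])
        model[context][events[i + n - 1]] += 1
    return model

def predict(model, context):
    """Return the most frequent successor of the context, or None if unseen."""
    counts = model.get(tuple(context))
    return counts.most_common(1)[0][0] if counts else None

# Toy history of player moves: two jabs are always followed by a kick.
history = ["jab", "jab", "kick", "jab", "jab", "kick", "jab", "jab"]
model = train_ngram(history, n=3)
print(predict(model, ["jab", "jab"]))  # kick
```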

22
Military: Learning in Simulation Games
Focus: Training, analysis, experimentation
  • Learning: Acquisition of new knowledge or
    behaviors
  • Simulators: JWARS, OneSAF, Full Spectrum Command,
    etc.
  • Target: Control of strategic opponent or own units

Evidence of commitment
  • "Learning is an essential ability of intelligent
    systems" (NRC, 1998)
  • "To realize the full benefit of a human behavior
    model within an intelligent simulator, the model
    should incorporate learning" (Hunter et al.,
    CCGBR'00)
  • "Successful employment of human behavior
    models ... requires that they possess the ability
    to integrate learning" (Banks & Stytz, CCGBR'00)
  • Conferences: BRIMS, I/ITSEC

Status: No CGF simulator has been deployed with
learning (D. Reece, 2003)
  • Some problems (Petty, CGFBR'01)
  • Cost of training phase
  • Loss of training control
  • Learning non-doctrinal behaviors
  • Learning unpredictable behaviors

23
Analysis: Conclusions
State-of-the-art
  • Research on learning in complex gaming simulators
    is in its infancy
  • Knowledge-poor approaches are limited to simple
    performance tasks
  • Knowledge-intensive approaches require huge
    knowledge bases, which to date have been manually
    encoded
  • Existing approaches have many simplifying
    assumptions
  • Scenario limitations (e.g., on number and/or
    capabilities of adversaries)
  • Learning is (usually) performed only off-line
  • Learned knowledge is not transferred (e.g., to
    playing other games)

Significant advances would include
  • Fast acquisition approaches for a large amount of
    domain knowledge
  • This would enable rapid learning without
    requiring manual encoding
  • Demonstrations of on-line learning (i.e., within
    a single simulation run)
  • Increasing knowledge transfer among tasks &
    simulators over time
  • e.g., knowledge of processes, strategies, tasks,
    roles, objects, actions

24
TIELT Specification
  • Simplifies integration & evaluation!
  • Learning-embedded decision systems & gaming
    simulators
  • Supports communications, game model, perf. task,
    evaluation
  • Freely available
  • Learning foci
  • Task (e.g., learn how to execute, or advise on, a
    task)
  • Player (e.g., accept advice, predict a player's
    strategies)
  • Game (e.g., learn/refine its objects, their
    relations, behaviors)
  • Learning methods
  • Supervised/unsupervised, immediate/delayed
    feedback, analytic, active/passive,
    online/offline, direct/indirect,
    automated/interactive
  • Learning results should be available for
    inspection
  • Gaming simulators: Those with challenging
    learning tasks
  • Reuse
  • Communications are separated from the game model
    & perf. task
  • Provide access to libraries of simulators &
    decision systems

25
Distinguishing TIELT

System | Focus | Game Engine(s) | Prominent Feature | Reasoning Activity
DirectIA (MASA) | AI SDK | FPS, RTS, etc. | Behavior authoring | Sense-act
SimBionic (SHAI) | AI SDK | FPS, etc. | Behavior authoring | Sense-act
FEAR | AI SDK | Quake 2, etc. | Behavior authoring | Sense-act
RoboCup | Research testbed | RoboCup | Soccer game play | Sense-act, coaching, etc.
GameBots | Research testbed | UT (FPS) | UT game play | Sense-act
ORTS | Research testbed | RTS games | Hack-free MM RTS | Sense-act, strategy
TIELT | Research testbed | Several genres | Experimentation for evaluating learning & learned behaviors | Sense-act, advice processing, prediction, model updating, etc.

  1. Provides an interface for message-passing
    interfaces
  2. Supports composable system-level interfaces

26
TIELT Integration Architecture

[Diagram: TIELT's user interface (evaluation, prediction, coordination, and advice interfaces) and internal communication modules mediate between a selected game engine (drawn from a game engine library, played by game player(s)) and a selected decision system. TIELT's KB editors let the user select/develop knowledge bases: Game Model (GM), Agent Description (AD), Game Interface Model (GIM), Decision System Interface Model (DSIM), and Experiment Methodology (EM), drawn from knowledge base libraries. Learned knowledge is inspectable.]
27
TIELT's Knowledge Bases
Game Interface Model
Defines communication processes with the game
engine
Decision System Interface Model
Defines communication processes with the decision
system
Game Model
  • Defines interpretation of the game
  • e.g., initial state, classes, operators,
    behaviors (rules)
  • Behaviors could be used to provide constraints on
    learning

Agent Description
Defines what decision tasks (if any) TIELT must
support
Experiment Methodology
Defines selected performance tasks (taken from
Game Model Description) and the experiment to
conduct
28
TIELT-Supported Performance Tasks
Performance vs. learning tasks
  • Performance: Application of the learned knowledge
    (e.g., classification)
  • Learning: Activity of the learning system (e.g.,
    updating weights in a neural net)
TIELT users will define complex,
user-configurable performance tasks
29
An Example Complex Learning Task
Task description: Win a real-time strategy game.
This involves several challenging learning tasks.
Subtasks and supporting operations
  1. Diagnosis: Identify (computer and/or human)
    opponent strategies & goals
  • Classification: Opponent recognition
  • Recording: Actions of opponents and their effects
    (this repeatedly involves classification)
  • Diagnosis: Identify goal(s) being solved by these
    effects
  • Classification: Identify goal(s) that, if solved,
    prevent the opponent's goals
  2. Planning: Select/adapt or create a plan to achieve
    goals and win the game
  • Classification: Select top-level actions to
    achieve goals
  • Iteratively identify necessary sub-goals and,
    finally, primitive actions
  • Design (parametric): Identify a good initial layout
    of controllable assets
  3. Execute the plan
  • Recording: Collect measures of effectiveness, to
    provide feedback
  • Planning: If needed, re-plan, based on feedback,
    at Step 2
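The subtask decomposition above can be sketched as a control loop. Everything below is a hypothetical stand-in (a toy game class and stub functions), not TIELT code; it only shows how diagnosis, classification, planning, execution, and recording chain together:

```python
class ToyGame:
    """Toy stand-in for an RTS: the opponent 'rushes'; countering twice wins."""
    def __init__(self):
        self.opponent_strategy = "rush"
        self.progress = 0
        self.won = False

def diagnose(game):
    # Diagnosis: identify the opponent's strategy & goals from observations.
    return game.opponent_strategy

def counter_goals(strategy):
    # Classification: pick goals that, if solved, prevent the opponent's goals.
    return ["build_defenses"] if strategy == "rush" else ["expand"]

def make_plan(game, goals):
    # Planning: select top-level actions that achieve the chosen goals.
    return [("achieve", g) for g in goals]

def execute(game, plan):
    # Execution + Recording: run the plan, collect feedback for re-planning.
    game.progress += len(plan)
    if game.progress >= 2:
        game.won = True
    return {"progress": game.progress}

def win_game(game, max_cycles=10):
    for _ in range(max_cycles):
        strategy = diagnose(game)
        goals = counter_goals(strategy)
        feedback = execute(game, make_plan(game, goals))
        if game.won:
            break  # otherwise: re-plan on the next cycle using the feedback
    return game.won

print(win_game(ToyGame()))  # True
```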

30
Use: Controlling a Game Character

[Diagram: the integration architecture of slide 26, with the selected decision system controlling a game character through TIELT's internal communication modules.]
31
UT Example: Game Model
State Description
  • Players: array of Player; Self: Player; Score: Integer
Classes
  • Player: Team: String; Number: Integer; Position: Location
  • Location: x: Integer; y: Integer; z: Integer
Operators
  • Shoot(Player). Preconditions: Player.isVisible.
    Effects: Player.Health -= rand(10)
  • MoveTo(Location). Preconditions: Location.isReachable().
    Effects: Self.position = Location
Rules
  • GetShotBy(Player). Preconditions: Player.hasLineOfSight(Self).
    Effects: Self.Health -= rand(10)
  • EnemyMovements(Enemy, Location1, Location2). Preconditions:
    Location2.isReachableFrom(Location1), Enemy.position = Location1.
    Effects: Enemy.position = Location2
32
UT Example: Game Interface Model
Communication
  • Medium: TCP/IP, port 3000
  • Message format: <name> <attr1> <value1> <attr2> <value2>
  • Example interface messages from the GameBots API:
  • http://www.planetunreal.com/gamebots/docapi.html
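A parser for the whitespace-delimited attribute-value message format described above might look like the sketch below. The example message is invented for illustration, not an actual GameBots message:

```python
def parse_message(raw: str):
    """Split '<name> <attr1> <value1> <attr2> <value2> ...' into (name, dict)."""
    tokens = raw.split()
    name, rest = tokens[0], tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError("attributes must come in name/value pairs")
    attrs = {rest[i]: rest[i + 1] for i in range(0, len(rest), 2)}
    return name, attrs

# Hypothetical sensor message in the format above.
name, attrs = parse_message("SEE Id Player2 Health 85")
print(name, attrs)
```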

33
UT Example: Decision System Interface Model
34
UT Example: Agent Description

[Diagram: think-act cycle. The agent chooses among "Shoot Something" (call the Shoot operator), "Pick up a Healthpack" (call the Pickup operator), or "Go Somewhere Else"; in each case it asks the decision system "Where do I go?"]
35
UT Example: Experiment Methodology
Initialization
  • Game Model: Unreal Tournament.xml
  • Game Interface: GameBots.xml
  • Decision System: MyUTBot.xml
  • Runs: 100
  • Call: slowdown(0.5)
36
Use: Predicting Opponent Actions

[Diagram: the integration architecture of slide 26; the game engine's raw state is processed by TIELT's internal communication modules, and predictions are surfaced through the prediction interface.]
37
Use: Updating a Game Model

[Diagram: the integration architecture of slide 26, with learned knowledge feeding updates back into the Game Model knowledge base.]
38
TIELT: A Researcher Use Case
  1. Define/store decision system interface model
  2. Select game simulator/interface
  3. Select game model
  4. Select/define performance task(s)
  5. Define/select expt. methodology
  6. Run experiments
  7. Analyze displayed results
39
TIELT: A Game Developer Use Case
  1. Define/store game interface model
  2. Define/store game model
  3. Select decision system/interface
  4. Define performance task(s)
  5. Define/select expt. methodology
  6. Run experiments
  7. Analyze displayed results
40
TIELT's Internal Communication Modules

[Diagram: the internal modules: a controller, an evaluator, and a database engine managing stored and current state; a model updater that applies game-engine percepts to the current state; a learning translator (mapper) that passes a translated model subset and learning tasks to the selected decision system; and an action/control translator (mapper) that turns learning outputs into game actions. The five KB editors (game interface model, decision system interface model, game model, agent description, experiment methodology) configure these modules for the user.]
41
Sensing the Game State (city placement example,
inspired by Alpha Centauri, etc.)
  1. In the Game Engine, the game begins: a colony pod is
    created and placed.
  2. The Game Engine sends a "See" sensor message
    identifying the pod's location.
  3. The Model Updater receives the sensor message and
    finds the corresponding message template in the
    Game Interface Model.
  4. This message template provides updates
    (instructions) to the Current State, telling it
    that there is a pod at the location "See" describes.
  5. The Model Updater notifies the Controller that
    the "See" event has occurred.
42
Fetching Decisions from the Decision System (city
placement example)
  1. The Controller notifies the Learning Translator
    that it has received a "See" message.
  2. The Learning Translator finds a "city location"
    task, which is triggered by the "See" message. It
    queries the Controller for the learning mode,
    then creates a TestInput message to send to the
    reasoning system with information on the pod's
    location and the map from the Current State.
  3. The Learning Translator transmits the TestInput
    message to the Decision System.
  4. The Decision System transmits output to the
    Action Translator.
43
Acting in the Game World (city placement example)
  1. The Action Translator receives a TestOutput
    message from the Decision System.
  2. The Action Translator finds the TestOutput
    message template, determines it is associated
    with the "city location" task, and builds a MovePod
    operator (defined by the Current State) with the
    parameters of TestOutput.
  3. The Action Translator determines that the Move
    action from the Game Interface Model is triggered
    by the MovePod operator and binds Move using
    information from MovePod.
  4.a. The Game Engine receives Move and updates the
    game to move the pod toward its destination, or
  4.b, c. The Advice Interface receives Move and displays
    advice to a human player on what to do next, or
    makes a Prediction.
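The three slides above describe one sense-decide-act pass. A toy sketch of that pipeline follows; the component names mirror the slides, but all message contents, data structures, and the stand-in decision system are invented:

```python
current_state = {}

def model_updater(sensor_msg):
    # Slide 41: match the sensor message to a template, update Current State,
    # and report the event to the Controller.
    if sensor_msg["name"] == "See":
        current_state["pod_location"] = sensor_msg["location"]
        return "See"
    return None

def learning_translator(event):
    # Slide 42: the "city location" task is triggered by See; build a
    # TestInput message from the Current State.
    if event == "See":
        return {"name": "TestInput",
                "pod_location": current_state["pod_location"]}

def decision_system(test_input):
    # Stand-in decision system: move the pod one tile east.
    x, y = test_input["pod_location"]
    return {"name": "TestOutput", "destination": (x + 1, y)}

def action_translator(test_output):
    # Slide 43: build the MovePod operator, then bind the Move game action.
    return {"name": "Move", "to": test_output["destination"]}

event = model_updater({"name": "See", "location": (3, 4)})
action = action_translator(decision_system(learning_translator(event)))
print(action)  # {'name': 'Move', 'to': (4, 4)}
```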
44
TIELT Status (November 2004)
Implementation
  • TIELT (v0.5) available
  • Features
  • Message protocols
  • Current: Console I/O, TCP/IP, UDP
  • Future: Library calls, HLA interface, RMI
    (possibly)
  • Message content: Configurable
  • Instantiated templates tell it how to communicate
    with other modules
  • Initialization messages: Start, Stop, Load
    Scenario, Set Speed
  • Game Model representations (w/ Lehigh University)
  • Simple programs
  • TMK process models
  • PDDL (language used in planning competitions)

45
TIELT Status (November 2004)
Documentation
  • TIELT User's Manual (82 pages)
  • TIELT Overview
  • The TIELT User Interface
  • Scripting in TIELT
  • Theory of the Game Model
  • Communications
  • TMK Models
  • Experiments
  • TIELT Tutorial (45 pages)
  • The Game Model
  • The Game Interface Model
  • Decision System Interface Model
  • Agent Description
  • Experiment Methodology

46
TIELT Status (November 2004)
Access
  • TIELT www site (new)
  • Selected components
  • Documents: Documentation, publications, XML spec
  • Status
  • Forum: A full-featured web forum/bulletin board
  • Bug Tracker: TIELT bug/feature tracking facility
  • FAQ-o-Matic: Questions and problem solutions
    (user-driven)
  • Download
  • Download

47
TIELT Issues (November 2004)
1. Communication
  • TIELT is a "multilingual" application; this
    provides interfacing with many different games
    (e.g., TCP/IP, library calls via SWIG).
2. Resources for learning to use TIELT
  • TIELT Scripting syntax highlighting
  • Map of TIELT component interactions
    (thanks, Megan!)
  • Typed script interface

48
TIELT Issues (November 2004)
3. Formatting the Game Model
  • To no one's surprise, everyone agrees that
    TIELT's Game Model representation is inadequate.
  • Requests have been made for:
  • 3D maps (Quake)
  • A different programming language
  • A relational operator representation
  • Standardized events

49
TIELT Collaborations (2004-05)

[Diagram: collaborators mapped onto the integration architecture. Interface work by U.Minn-Duluth, USC/ICT, and U.Mich.; decision systems include Soar (U.Mich.), ICARUS (ISLE), DCA (UT Arlington), and neuroevolution (UT Austin); games include EE2 (Mad Doc), Temple of Elemental Evil (Troika), FreeCiv (NWU), and many others; knowledge bases contributed by Lehigh U., USC, U.Mich., ISLE, and others.]
50
TIELT Collaboration Projects (2004-05)
Organization | Game Interface and Model | Decision System | Tasks and Evaluation Methodology
Mad Doc Software | Empire Earth 2 (RTS) | |
Troika Games | Temple of Elemental Evil (RPG) | |
ISLE | SimCity (RTS) | ICARUS | ICARUS w/ FreeCiv, design
Lehigh U. | Stratagus/Wargus (RTS), and HTN/TMK designs | Case-based planner (CBP) | Wargus/CBP
NWU | FreeCiv (discrete strategy), and qualitative game representations | |
U. Michigan | | SOAR | SOAR w/ 2 games (e.g., FSW, ToEE), design
U. Minnesota-Duluth | RoboCup (team sports) | Advice-taking components | Advice processing
USC/ICT | Full Spectrum Command (RTS) | | SOAR with FSC
UT Arlington | Urban Terror (FPS) | DCA (lite version) |
UT Austin | | Neuroevolution | e.g., Neuroevolution/EE2
51
Games Being Integrated with TIELT
Category | Gaming Simulator | Genre | Foci | Perspective
Commercial | Empire Earth II (Mad Doc S/W) | RTS | Civilization | God
Commercial | Temple of Elemental Evil (Troika) | Role-playing | Solve quests | 1st person
Commercial | SimCity (ISLE) | RTS | City manager | God
Freeware | FreeCiv (NWU) (Civilization) | Discrete strategy | Civilization | God
Freeware | Wargus (Lehigh U.) (Warcraft II) | RTS | Civilization | God
Freeware | Urban Terror (UT Arlington) | FPS | Shooter | 1st person
Freeware | RoboCup Soccer (UW) | Team sports | Team of agents | Behavior designer
Military | Full Spectrum Command (USC/Inst. Creative Technologies) | RTS | Leading an Army Light Infantry Company | 1st person
52
Promising Learning Strategies
Learning Strategy | Description | When to Use | Justification
Advice Giving | Expert explains how to perform in a given state (the only interactive strategy listed here) | Speedup needed; expert is available | Permits quick acquisition of specific and general domain knowledge
Backpropagation | Trains a 3-layer neural network (NN) of sigmoidal hidden units | Target is a non-linear function; offline training is OK | Many learning tasks are non-linear, and some can be performed off-line
Case-Based Reasoning | Use/adapt solutions from experiences to solve similar problems | Cases complement an incomplete domain model; problem-solving speed is crucial | Quicker to adapt cases than reason from scratch, but requires domain-specific adaptation knowledge
Chunking | Compile a sequence of steps into a macro | For tasks requiring speedup | Transforms a complex reasoning task into a fast retrieval task
Dynamic Scripting | RL for tasks with large state spaces that, with domain knowledge, can be collapsed into a smaller set | A small set of states exists, with a set of rules for each | Greatly speeds up the RL approach, but requires analysis of task states
Evolutionary Computation | Evolutionary (genetic) selection on a population of genomes, where the application dictates their representation | Search space is huge, and training can be done offline | Genome representations can be task specific, so this powerful search method can be tuned for the task
Meta Reasoning | After a failure, identifies its type & the task that failed, retrieves a task-specific strategy to avoid this failure, and updates its model | To support self-adaptation | Although knowledge intensive, this is an excellent method for changing problem-solving strategies
Neuroevolution | Uses a separate genetic-algorithm population for learning each hidden unit's weights in a NN | To support cooperating heterogeneous agents | A good offline agent-based learning approach for multi-agent gaming
Reinforcement Learning (RL) | Reinforce a sequence of decisions after problem solving is completed | Reward is known only after the sequence ends, and blame can be ascribed | Well-understood paradigm for learning action policies (i.e., what action to perform in a given state)
Relational MDPs | Learn a Markov decision process over objects & their relations using probabilistic relational models | Seeking knowledge transfer (KT) to similar environments | KT is crucial for learning quickly, and feasibly, for some tasks
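Dynamic scripting, listed above, is concrete enough to sketch: keep a weight per rule, build each script by weight-proportional selection without replacement, then reward or penalize the rules used according to the episode outcome. This is a generic illustration with toy rules and a toy evaluation, not the Spronck et al. implementation:

```python
import random

def select_script(weights, k, rng):
    """Pick k rules, with probability proportional to weight (no replacement)."""
    pool = dict(weights)
    script = []
    for _ in range(k):
        total = sum(pool.values())
        r = rng.uniform(0, total)
        for rule, w in pool.items():
            r -= w
            if r <= 0:
                script.append(rule)
                del pool[rule]
                break
    return script

def update_weights(weights, script, reward, step=0.2, w_min=0.1):
    """Reward the rules used in a winning script; penalize them after a loss."""
    delta = step if reward > 0 else -step
    for rule in script:
        weights[rule] = max(w_min, weights[rule] + delta)

rng = random.Random(0)
weights = {"rush": 1.0, "turtle": 1.0, "expand": 1.0, "harass": 1.0}
for _ in range(20):
    script = select_script(weights, k=2, rng=rng)
    reward = 1 if "rush" in script else -1  # toy evaluation: rushing wins
    update_weights(weights, script, reward)
print(weights["rush"] == max(weights.values()))  # True
```

Because the "rush" rule is only ever rewarded, its weight ends up maximal, so scripts increasingly favor it; that is the intended effect of the technique.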
53
TIELT-General Game Player Integration (with
Stanford University's Michael Genesereth)
TIELT
  • Experiment design/control capabilities
  • Common game engine interface
  • Support for several learning approaches

GGP
  • Logical game formalisms
  • Access to remote players
  • WWW access

GGP-TIELT
  • Play an entire class of general games as well as
    TIELT-integrated gaming simulators.
  • Compete remotely against reference players and
    other GGP systems.
  • Define evaluation methodologies for learning
    experimentation.
  • Participate in AAAI'05 GGP Competition.

54
Upcoming Events
  • National Conference on AI (AAAI'05, 24-28 July,
    Pittsburgh)
  • General Game Playing Competition ($10K prize)
  • Int. Joint Conference on AI (IJCAI'05, 30 July-5
    August, Edinburgh)
  • Workshop: Reasoning, Representation, and Learning
    in Gaming & Simulation Tasks (tentative title)
  • Int. Conference on ML (ICML'05, 7-11 August,
    Bonn)
  • Workshop submission in progress
  • Int. Conference on CBR (ICCBR'05, 23-26 August,
    Chicago)
  • Workshop & Competition: CBR in Games

55
Summary
  • TIELT: Mediates between a (gaming) simulator and
    a learning-embedded decision system
  • Goals
  • Simplify running learning experiments with cognitive
    systems
  • Support DARPA challenge problems in learning
  • Designed to work with many types of simulators &
    decision systems
  • Status
  • TIELT (v0.5 Alpha) completed in 10/04
  • User's Manual, Tutorial, www site exist
  • 10 collaborating organizations (1-year contracts)
  • Enhances probability that TIELT will achieve its
    goals
  • We're planning several TIELT-related events

56
Backup Slides
57
Metrics
Research perspective
  1. Time required to develop reasoning interface & KB
  2. Ability to design/facilitate selected evaluation
    methodology
  3. Expressiveness of KB representation
  4. Breadth of learning techniques supported
  5. Breadth of learning and performance tasks
    supported
  6. Availability of integrated gaming simulators &
    challenges

Industry perspective
  • Ability to develop learned/learning behaviors of
    interest
  • Time required to
  • develop game interface model & KBs, and
  • develop these behaviors
  • Availability of learning-embedded reasoning
    systems
  • Support for both off-line and on-line learning

58
Some Expected User Metrics
Performance tasks
  • Some standards
  • e.g., classification accuracy, ROC analyses,
    precision & recall
  • Decision making speed and accuracy
  • Plan execution quality (e.g., time to execute,
    mission-specific Measures of Effectiveness)
  • Number of constraint violations
  • Ability to transfer learned knowledge

59
TIELT Potential Learning Challenge Problems
  • Learn to win a game (i.e., accomplish an
    objective)
  • e.g., solve a challenging diplomacy task, provide
    a realistic military training course facing
    intelligent adversaries, or help users to develop
    real-time cognitive reasoning skills for a
    defined role in support of a multi-echelon mission
  • Learn an adversary's strategy
  • e.g., predict a terrorist group's plan and/or
    tactics, suggest appropriate responses to prevent
    adversarial goals, help users identify
    characteristics of adversarial strategies
  • Learn crucial processes of an environment
  • e.g., learn to improve an incorrect/incomplete
    game model so that it more accurately/reliably
    defines objects/agents in the game, their
    behaviors, their capabilities, and their
    limitations
  • Intelligent situation assessment
  • e.g., learn which factors in the simulation
    require attention to accomplish different types
    of tasks

60
Example Game: FreeCiv (Discrete-time strategy)
Civilization II® (MicroProse)
  • Civilization II® (1996-): 850K copies sold
  • PC Gamer Game of the Year Award winner
  • Many other awards
  • Civilization® series (1991-): Introduced the
    civilization-based game genre

FreeCiv (Civ II clone)
  • Open source freeware
  • Discrete strategy game
  • Goal: Defeat opponents, or build a spaceship
  • Resource management
  • Economy, diplomacy, science, cities, buildings,
    world wonders
  • Units (e.g., for combat)
  • Up to 7 opponent civs
  • Partial observability

http://www.freeciv.org
61
Previous FreeCiv/Learning Research
(Ulam et al., AAAI'04 Workshop on Challenges in
Game AI)
  • Title: Reflection in Action: Model-Based
    Self-Adaptation in Game Playing Agents
  • Scenarios
  • City defense: Defend a city for 3000 years

62
FreeCiv CP Scenario
General description
  • Game initialization: Your only unit, a settler,
    is placed randomly on a random world (see Game
    Options below). Players cyclically alternate play.
  • Objective: Obtain the highest score, conquer all
    opponents, or build the first spaceship
  • Scoring: The basic goal is to obtain 1000 points.
    Game options affect the score.
  • Citizens: 2 pts per happy citizen, 1 per content
    citizen
  • Advances: 20 pts per World Wonder, 5 per
    futuristic advance
  • Peace: 3 pts per turn of world peace (no wars or
    combat)
  • Pollution: -10 pts per square currently polluted
  • Top-level tasks (to achieve a high score)
  • Develop an economy
  • Increase population
  • Pursue research advances
  • Opponent interactions: Diplomacy and
    defense/combat

Game Option                 | Y1            | Y2           | Y3
World size                  | Small         | Normal       | Large
Difficulty level            | Warlord (2/6) | Prince (3/6) | King (4/6)
Opponent civilizations      | 5             | 5            | 7
Level of barbarian activity | Low           | Medium       | High
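The scoring rules above can be expressed as a small function. This is an illustrative sketch only; the function name and example values are mine, not taken from TIELT or FreeCiv:

```python
# Illustrative sketch of the FreeCiv scoring rules listed above.
# Function name and example values are hypothetical.
def score(happy, content, wonders, future_advances,
          peace_turns, polluted_squares):
    """Game score computed from the slide's point values."""
    return (2 * happy                  # 2 pts per happy citizen
            + 1 * content              # 1 pt per content citizen
            + 20 * wonders             # 20 pts per World Wonder
            + 5 * future_advances      # 5 pts per futuristic advance
            + 3 * peace_turns          # 3 pts per turn of world peace
            - 10 * polluted_squares)   # -10 pts per polluted square

print(score(happy=40, content=30, wonders=5, future_advances=10,
            peace_turns=100, polluted_squares=2))  # 540
```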
63
FreeCiv CP Information Sources
Concepts in an Initial Knowledge Base
  • Resources: Collection and use
  • Food, production, trade (money)
  • Terrain
  • Resources gained per turn
  • Movement requirements
  • Units
  • Type (Military, trade, diplomatic, settlers,
    explorers)
  • Health
  • Combat: Offense & defense
  • Movement constraints (e.g., land, sea, air)
  • Government: Types (e.g., anarchy, despotism,
    monarchy, democracy)
  • Research network: Identifies constraints on what
    can be studied at any time
  • Buildings (e.g., cost, capabilities)
  • Cities
  • Population growth
  • Happiness
  • Pollution
  • Civilizations (e.g., military strength,
    aggressiveness, finances, cities, units)
  • Diplomatic states & negotiations

64
FreeCiv CP Decisions
Civilization decisions
  • Choice of government type (e.g., democracy)
  • Distribution of income devoted to research,
    entertainment, and wealth goals
  • Strategic decisions affecting other decisions
    (e.g., coordinated unit movement for trade)

City decisions
  • Production choice (i.e., what to create,
    including city buildings and units)
  • Citizen roles (e.g., laborers, entertainers, or
    specialists), and laborer placement
  • Note: Locations vary in their terrain, which
    generates different amounts of food, income, and
    production capability

Unit decisions
  • Task (e.g., where to build a city, whether/where
    to engage in combat, espionage)
  • Movement

Diplomacy decisions
  • Whether to sign a proffered peace treaty with
    another civilization
  • Whether to offer a gift

65
FreeCiv CP Decision Space
Variables
  • Civilization-wide variables
  • N: Number of civilizations encountered
  • D: Number of diplomatic states (that you can have
    with an opponent)
  • G: Number of government types available to you
  • R: Number of research advances that can be
    pursued
  • I: Number of partitions of income into
    entertainment, money, research
  • U: Units
  • L: Number of locations a unit can move to in a
    turn
  • C: Cities
  • Z: Number of citizens per city
  • S: Citizen status (i.e., laborer, entertainer,
    doctor)
  • B: Number of choices for city production

Decision complexity per turn (for a typical game
state)
  • O(D^N · G·R·I · L^U · (S^Z · B)^C); this ignores both
    other variables and domain knowledge
  • This becomes large with the number of units and
    cities
  • Example: N=3, D=5, G=3, R=4, I=10, U=25, L=4,
    C=8, Z=10, S=3, B=10
  • Size of decision space (i.e., possible next
    states): 2.5×10^65 (in one turn!)
  • Comparison: The decision space of chess per turn is
    well below 140 (e.g., 20 at the first move)
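The estimate above can be checked numerically. A minimal sketch (the function name `decision_space` is illustrative) plugging in the slide's example values:

```python
# Numeric check of the per-turn decision-space estimate
# O(D^N · G·R·I · L^U · (S^Z · B)^C), using the slide's example values.
def decision_space(D, N, G, R, I, L, U, S, Z, B, C):
    return D**N * G * R * I * L**U * (S**Z * B)**C

size = decision_space(D=5, N=3, G=3, R=4, I=10, L=4, U=25,
                      S=3, Z=10, B=10, C=8)
print(f"{size:.1e}")  # 2.5e+65
```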

66
FreeCiv CP A Simple Example Learning Task
Situation
  • We're England (e.g., London)
  • Barbarians are north (in red)
  • Two other civs exist
  • Our military is weak

What should we do?
  • Ally with Wales? If so, how?
  • Build a military unit? Which?
  • Improve defenses?
  • Increase the city's production rate?
  • Build a new city to the south? Where?
  • Research Gunpowder? Or something else?
  • Move our diplomat back to London?
  • A combination of these?

What information could help with this decision?
  • Previous similar experiences
  • Generalizations of those experiences
  • Similarity knowledge
  • Adaptation knowledge
  • Opponent model
  • Statistics on barbarian strength, etc.

67
Analysis of the Example Learning Task
Complexity function
  • O(D^N · G·R·I · L^U · (S^Z · B)^C)

Situation
  • D = 3 (war, neutral, peace)
  • N = 1 (only one other civilization contacted,
    i.e., Wales)
  • G = 2 government types known
  • R = 4 research advances available
  • I = 5 partitions of income available
  • L = 14 per unit
  • U = 3 units (1 external, 2 in city)
  • C = 1 city
  • S = 3 (entertainer, laborer, doctor)
  • Z = 6 citizens
  • B = 5 units/buildings it can produce

Decision Space Size
  • 1.2×10^9
  • This reduces to 32 sensible choices after
    applying some domain knowledge
  • e.g., don't change diplomatic status now, keep
    units in city for defense, don't change
    government now (because it'll slow production),
    keep the external unit away from danger
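Plugging this situation's values into the same complexity formula, O(D^N · G·R·I · L^U · (S^Z · B)^C), reproduces the figure above; a self-contained numeric check:

```python
# D=3, N=1, G=2, R=4, I=5, L=14, U=3, C=1, S=3, Z=6, B=5
# (values from this slide, substituted into the complexity formula)
size = 3**1 * 2 * 4 * 5 * 14**3 * (3**6 * 5)**1
print(f"{size:.1e}")  # 1.2e+09
```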

68
FreeCiv CP Learning Opportunities
Learn to keep citizens happy
  • Citizens in a city who are unhappy will revolt;
    this temporarily eliminates city production
  • Several factors influence happiness (e.g.,
    entertainment, military presence, govt type)

Learn to obtain diplomatic advantages
  • Countries at war tend to have decreased trade,
    lose units and cities, etc.
  • Diplomats can sometimes obtain peace treaties or
    otherwise end wars
  • Unit movement decisions can also impact
    opponents diplomatic decisions

Learn how to wage war successfully
  • Good military decisions can yield new
    cities/citizens/trade, but losses can be huge
  • Unit decisions can benefit from learning tactical
    coordinated behaviors
  • The selection of a military unit(s) for a task
    depends on the opponents capabilities

Learn how to increase territory size
  • Initially, unexplored areas are unknown; their
    resources (e.g., gold) cannot be harvested
  • Exploration needs to be balanced with security
  • City placement decisions influence territory
    expansion

69
FreeCiv CP Example Learned Knowledge
Learn what playing strategy to use in each
adversarial situation
  • Situations are defined by relative military
    strength, diplomatic status, whether the opponent
    has strong alliances, locations of forces, etc.
  • Selecting a good playing strategy depends on many
    of these variables

70
What Techniques Could Learn the Task of Selecting
a Playing Strategy?
Meta-reasoning (e.g., Ulam et al., AAAI'04 Wkshp
on Challenges in Game AI)
  • Requires knowledge of
  • Tasks being performed
  • Types of failures that can occur when performing
    these tasks
  • T2: Overestimate own strength, underestimate
    enemy strength, ...
  • T3: Incorrect assessment of the enemy's diplomatic
    status, ...
  • Strategies for adapting these tasks
  • S1: Increase military strength
  • S2: Assess distribution of enemy forces
  • S3: Consider the enemy's diplomatic history
  • Mapping of failure types in (2) to adaptation
    strategies in (3)
  • Example: We decided to Attack, but underestimated
    enemy strength. This was indexed by strategy S2,
    which we'll do from now on in T2.

[Diagram: T1 (Determine Playing Strategy) decomposes
into T2 (Assess Military Advantage), T3 (Assess
Diplomatic Status), and T4 (Select Strategy: Attack,
Retreat!, Fortify, Trade, Seek Peace, Bribe)]
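The failure-type-to-strategy mapping described above can be sketched as a simple lookup table; all names here are hypothetical illustrations, not from Ulam et al.'s implementation:

```python
# Hypothetical sketch: failure types (diagnosed during meta-reasoning)
# indexed to adaptation strategies, per the mapping described above.
FAILURE_TO_STRATEGY = {
    "overestimated_own_strength":    "S1",  # increase military strength
    "underestimated_enemy_strength": "S2",  # assess enemy force distribution
    "wrong_diplomatic_assessment":   "S3",  # consider enemy's diplomatic history
}

def adapt(failure_type):
    """Return the adaptation strategy indexed by a diagnosed failure type."""
    return FAILURE_TO_STRATEGY.get(failure_type)

# The slide's example: an Attack failed because enemy strength was
# underestimated, so strategy S2 is indexed for future use.
print(adapt("underestimated_enemy_strength"))  # S2
```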
71
Challenges for Using Learning via Meta-Reasoning
How can its background knowledge be learned
(efficiently)?
  • i.e., tasks, failure types, failure adaptation
    strategies, mappings
  • Also, the agent needs to understand how to
    diagnose an error (i.e., identify which task
    failed and its failure type)

What if only incomplete background knowledge
exists?
  • Could complementary learning techniques apply it?
  • e.g., Relational MDPs (which handle uncertainty)
  • Could learning techniques be used to
    extend/correct it?
  • e.g., Learning from advice, case-based reasoning

Can we scale it to more challenging learning
problems?
  • Currently, it has only been applied to simpler
    tasks
  • Defend a City (in FreeCiv)
  • More difficult would be Play Entire Game

72
Full Spectrum Command & Warrior
(http://www.ict.usc.edu/disp.php?bd=proj_games)
Organization: USC's Institute for Creative
Technologies
  • POC: Michael van Lent (Editor-in-Chief, Journal
    of Game Development)
  • Goal: Develop immersive, interactive, real-time
    training simulations to help the Army create
    decision-making & leadership-development tools

Focus: US Army training tools (deployed @ Ft.
Benning & Afghanistan)
  • Full Spectrum Command (PC-based simulator)
  • Role: Commander of a U.S. Army light infantry
    Company (120 soldiers)
  • Tasks: Interpret the assigned mission, organize
    the force, plan strategically, coordinate the
    actions of the Company
  • Full Spectrum Warrior (MS Xbox-based simulator)
  • Role: Light infantry squad leader
  • Tasks: Complete assigned missions safely

73
METAGAME (Pell, 1992)
Focus: Learn strategies to win any game in a
pre-defined category
  • Initial category: Chess-like games
  • Games are produced by a game generator
  • Input: Rules on how to play the game
  • Move grammar is used to communicate actions
  • Output (desired): A winning playing strategy

[Diagram: Game Manager architecture, connecting games
and spectator graphics to players via percepts,
actions, and clocks, with temporary state data and
records]
74
Collaborator Mad Doc Software
Summary
  • PI: Ron Rosenberg (Producer)
  • Experience
  • Mad Doc is a leader in real-time strategy games;
    Empire Earth II is expected to sell millions of
    copies
  • CEO Ian Davis (CMU PhD in Robotics) is a
    well-known collaborator with the AI research
    community, and gave an invited presentation at
    AAAI'04. He will work with Ron on this contract.
  • Deliverables: Mad Doc (RTS) game simulator API
  • This will be used by multiple other collaborators

75
Collaborator Troika Games
Summary
  • PI: Tim Cain (Joint CEO)
  • Experience
  • Troika has outstanding experience developing
    state-of-the-art role-playing games, including
    Temple of Elemental Evil (ToEE)
  • A game developer since 1982, Tim obtained an M.S.
    with a focus on machine learning at UC Irvine in
    the late 1980s.
  • Deliverables: ToEE (RPG) game simulator API
  • This will be used by some other collaborators
    (e.g., U. Michigan)

76
Collaborator ISLE
Summary
  • PIs: Dr. Seth Rogers, Dr. Pat Langley
  • Experience
  • ISLE (Institute for the Study of Learning and
    Expertise) is known for its ICARUS cognitive
    architecture, which is distinguished in part by
    its commitment to grounding every symbol in a
    physical-world object
  • Pat Langley, founder of the journal Machine
    Learning, is known for his expertise in cognitive
    architectures and evaluation methodologies for
    learning systems.
  • Deliverables
  • ICARUS reasoning system API
  • FreeCiv agent (with assistance from NWU) and
    SimCity agent
  • This will also be used by USC/ICT
  • SimCity (RTS) game simulator API

77
Collaborator Lehigh U.
Summary
  • PI: Prof. Héctor Muñoz-Avila
  • Experience
  • Héctor is an expert on hierarchical planning
    technology, and in particular has expertise in
    case-based planning
  • Collaborating with NRL on TIELT during CY04 on
    (1) Game Model description representations, (2)
    Stratagus/Wargus game simulator API, and (3)
    feedback on TIELT usage
  • Deliverables
  • Software for translating among Game Model
    representations
  • Stratagus/Wargus (RTS) game simulator API
  • This may be used by UT Austin
  • Case-based planning reasoning system API

78
Collaborator NWU
Summary
  • PIs: Prof. Ken Forbus, Prof. Tom Hinrichs
  • Experience
  • Ken is a leading AI/games researcher. He is also
    the leading worldwide researcher in computational
    approaches to reasoning by analogy.
  • Kens group has extensive experience with
    qualitative reasoning approaches and with using
    the FreeCiv gaming simulator.
  • Deliverables
  • FreeCiv (Discrete Strategy) game simulator API
  • This will be used by ISLE
  • Qualitative spatial reasoning system for FreeCiv
    API

79
Collaborator U. Michigan
Summary
  • PI: Prof. John Laird
  • Experience
  • John is the best-known AI/games researcher, and
    has extensive experience with integrating many
    commercial, freeware, and military game
    simulators with the Soar cognitive architecture.
  • Deliverables
  • Soar reasoning system API
  • This will be used by USC/ICT
  • Applications of Soar to two game simulators
    (e.g., ToEE, Wargus)

80
Collaborator USC/ICT
Summary
  • PI: Dr. Michael van Lent
  • Experience
  • Extensive implementation experience with AI/game
    research; PhD advisor was John Laird.
  • Led ICT's development of Full Spectrum Warrior
    and Full Spectrum Command (FSC) in collaboration
    with Quicksilver Software and the Army's PEO
    STRI. FSC is deployed at Ft. Benning and in
    Afghanistan.
  • Editor-in-Chief, Journal of Game Development
  • Deliverables
  • FSC (RTS) game simulator API
  • Applications of FSC with U. Michigans Soar and
    ISLEs ICARUS

81
Collaborator UT Arlington
Summary
  • PIs: Prof. Larry Holder, G. Michael Youngblood
  • Experience
  • Larry has extensive experience with developing
    unsupervised machine learning systems that use
    relational representations, and has led efforts
    on developing the D'Artagnan cognitive
    architecture.
  • Deliverables
  • Urban Terror (FPS) game simulator API
  • D'Artagnan reasoning system API (partial)

82
Collaborator UT Austin
Summary
  • PI: Prof. Risto Miikkulainen
  • Experience
  • Risto has significant experience with integrating
    neuro-evolution and similar approaches with game
    simulators.
  • Collaborating on UT Austin's Digital Media
    Laboratory's development of the NERO (FPS) game
    simulator
  • Deliverables
  • Knowledge-intensive neuro-evolution reasoning
    system API
  • Application of this API using other simulators
    (e.g., FSC, Wargus) and U. Wisconsin's advice
    processing module

83
Collaborator U. Wisconsin
Summary
  • PIs: Prof. Jude Shavlik (UW), Prof. Richard
    Maclin (U. Minn-Duluth)
  • Experience
  • Jude advised the first significant M.S. thesis on
    applying machine learning to FPS game simulators
    (Geisler, 2002)
  • Maclin, who will be on sabbatical at U. Wisconsin
    during this project, has performed extensive work
    applying AI techniques (e.g., advice
    processing) to the RoboCup game simulator
  • Deliverables
  • RoboCup (team sports) game simulator API
  • Advice processing module
  • WWW-based repository for TIELT software
    components (e.g., APIs)