Multi-Agent Exploration

Transcript and Presenter's Notes

Title: Multi-Agent Exploration


1
Multi-Agent Exploration
Matthew E. Taylor
http://teamcore.usc.edu/taylorm/
2
DCOPs: Distributed Constraint Optimization Problems
  • Multiple domains
  • Multi-agent plan coordination
  • Sensor networks
  • Meeting scheduling
  • Traffic light coordination
  • RoboCup soccer
  • Distributed
  • Robust to failure
  • Scalable
  • (In)Complete
  • Quality bounds

3
DCOP Framework
[Figure: constraint graph over agents a1, a2, a3; the a1–a2 and a2–a3 links each carry a reward table whose four joint assignments pay 10, 0, 0, and 6]
Different levels of coordination possible
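As an illustration of the framework, here is a minimal Python sketch (not from the slides) of how a team reward is scored in a DCOP: each link between neighboring agents has a reward table over their joint assignment, and the team reward is the sum over all links. The 10/0/0/6 entries mirror the tables above; the 0/1 variable values and all names are assumptions.

    # Illustrative DCOP scoring (hypothetical variable values 0/1).
    # Each constraint links two agents and maps their joint assignment to a reward.
    constraints = {
        ("a1", "a2"): {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6},
        ("a2", "a3"): {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6},
    }

    def team_reward(assignment):
        """Sum the reward of every constraint under one joint assignment."""
        return sum(table[(assignment[i], assignment[j])]
                   for (i, j), table in constraints.items())

    print(team_reward({"a1": 0, "a2": 0, "a3": 0}))  # 20: both links pay 10
    print(team_reward({"a1": 0, "a2": 1, "a3": 1}))  # 6: a1-a2 pays 0, a2-a3 pays 6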
4
Motivation: DCOP Extension
  • Unrealistic: often the environment is not fully known!
  • Agents need to learn
  • Maximize total reward
  • Real-world applications
  • Mobile ad-hoc networks
  • Sensor networks

5
Problem Statement
  • DCEE: Distributed Coordination of Exploration and Exploitation
  • Addresses challenges (a toy environment sketch follows this list):
  • Local communication
  • Network of (known) interactions
  • Cooperative
  • Unknown rewards
  • Maximize on-line reward
  • Limited time-horizon
  • (Effectively) infinite reward matrix
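To make the setting concrete, here is a hedged sketch of a DCEE-style environment: the interaction graph is known, but a link's reward is only revealed once an agent tries a new location, and the objective is the reward accumulated at every time step of a limited horizon. The uniform [1, 200] rewards, class name, and method signatures are illustrative assumptions, not the actual experimental setup.

    import random

    # Sketch of a DCEE-style environment (all names and the uniform [1, 200]
    # reward model are assumptions for illustration).
    class DCEEEnvironment:
        def __init__(self, edges, horizon, max_reward=200, seed=0):
            self.edges = edges              # known interaction graph: list of (ai, aj)
            self.horizon = horizon          # limited time horizon
            self.max_reward = max_reward
            self.rng = random.Random(seed)
            self.edge_reward = {}           # rewards are unknown until experienced
            self.total_reward = 0           # on-line objective: summed at every step

        def step(self, moved_agents):
            # Any link touching an agent that moved gets a fresh, previously
            # unknown reward; untouched links keep their current reward.
            for (i, j) in self.edges:
                if i in moved_agents or j in moved_agents or (i, j) not in self.edge_reward:
                    self.edge_reward[(i, j)] = self.rng.randint(1, self.max_reward)
            self.total_reward += sum(self.edge_reward.values())

    # Usage: env = DCEEEnvironment(edges=[("a1", "a2"), ("a2", "a3")], horizon=100)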

6
Mobile Ad-Hoc Network
  • Rewards: signal strength between agents, on [1, 200]
  • Goal: Maximize signal strength over time
  • Assumes:
  • Small-scale fading dominates
  • Topology is fixed

[Figure: four agents a1–a4 in an ad-hoc network; the links currently measured have signal strengths 75, 95, 100, and 50]
7
MGM (Maximum Gain Message)
  • Review
  • Ideas?
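As a review aid, here is a simplified one-round sketch of MGM on a DCOP, under the assumption that MGM here is the usual gain-message scheme: each agent computes the best gain it could obtain by changing its own value given its neighbors' current values, exchanges gains with its neighbors, and moves only if its gain strictly beats every neighbor's. All names and the binary domain are illustrative.

    # Simplified, synchronous one-round MGM step (sketch; data structures are
    # the caller's: `assignment` maps agent -> value, `constraints` maps
    # (agent_i, agent_j) -> reward table, `neighbors` maps agent -> neighbor list).

    def local_reward(agent, value, assignment, constraints):
        """Reward on all constraints touching `agent` if it took `value`."""
        total = 0
        for (i, j), table in constraints.items():
            if i == agent:
                total += table[(value, assignment[j])]
            elif j == agent:
                total += table[(assignment[i], value)]
        return total

    def mgm_round(assignment, constraints, neighbors, domain=(0, 1)):
        gains, best_values = {}, {}
        for agent in assignment:
            current = local_reward(agent, assignment[agent], assignment, constraints)
            best_val, best = max(((v, local_reward(agent, v, assignment, constraints))
                                  for v in domain), key=lambda pair: pair[1])
            gains[agent] = best - current
            best_values[agent] = best_val
        # Only an agent whose gain strictly beats all of its neighbors' gains moves,
        # so no two neighboring agents ever move in the same round.
        new_assignment = dict(assignment)
        for agent in assignment:
            if gains[agent] > 0 and all(gains[agent] > gains[n] for n in neighbors[agent]):
                new_assignment[agent] = best_values[agent]
        return new_assignment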

8
Static Estimation: SE-Optimistic
Rewards on [1, 200]
If I move, I'd get R = 200
[Figure: agents a1–a4 with current link rewards 100, 50, and 75]
9
Static Estimation: SE-Optimistic
Rewards on [1, 200]
If I move, I'd gain 275
If I move, I'd gain 250
If I move, I'd gain 100
If I move, I'd gain 125
[Figure: agents a1–a4 with current link rewards 100, 50, and 75; each agent's optimistic gain assumes its links would pay the maximum reward of 200 after a move]
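A minimal sketch of the SE-Optimistic estimate: an agent assumes that if it moves, every one of its links will pay the maximum possible reward (200 on this scale), so its estimated gain is the sum over its links of (maximum reward minus current reward). The link values below are chosen to reproduce the gains quoted above; the exact topology on the slide is an assumption.

    # SE-Optimistic gain estimate (sketch).
    MAX_REWARD = 200

    def se_optimistic_gain(agent, link_rewards):
        """link_rewards[agent]: current rewards on each link incident to the agent."""
        return sum(MAX_REWARD - r for r in link_rewards[agent])

    # Hypothetical neighborhood consistent with the quoted gains (100, 250, 275, 125).
    link_rewards = {"a1": [100], "a2": [100, 50], "a3": [50, 75], "a4": [75]}
    for agent in link_rewards:
        print(agent, se_optimistic_gain(agent, link_rewards))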
10
Results (Simulation): Maximize total reward (area under the curve)
[Graph: cumulative reward curves for SE-Optimistic and No Movement]
11
Balanced Exploration Techniques
  • BE-Backtrack
  • Decision-theoretic calculation of exploration value
  • Track the previous best location and its reward Rb
  • Bid to explore for some number of steps (te)

Expected utility of exploring for te steps =
  (reward while exploring)
  + P(improve reward) × (reward while exploiting the new best)
  + P(NOT improve reward) × (reward while exploiting the previous best Rb)
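A hedged sketch of the decision-theoretic bid above. An agent weighing an exploration bid of te steps out of T remaining sums the reward earned while exploring plus the exploitation reward afterwards, weighted by the probability that exploration beats the current best Rb (falling back to Rb otherwise). The uniform [1, 200] reward model and all helper names are illustrative, not the paper's exact formulas.

    # BE-Backtrack style bid (sketch; uniform [1, 200] rewards are an assumption).
    MAX_REWARD = 200.0

    def prob_improve(r_b, t_e):
        """P(at least one of t_e uniform draws on [1, MAX_REWARD] beats r_b)."""
        p_single = max(0.0, (MAX_REWARD - r_b) / MAX_REWARD)
        return 1.0 - (1.0 - p_single) ** t_e

    def expected_new_best(r_b):
        """Crude stand-in for E[new best | it beats r_b]: midpoint of (r_b, MAX_REWARD]."""
        return (r_b + MAX_REWARD) / 2.0

    def backtrack_utility(r_b, r_explore, t_e, t_total):
        """Expected total reward: explore for t_e steps, then exploit the rest."""
        t_exploit = t_total - t_e
        p = prob_improve(r_b, t_e)
        return (r_explore * t_e                              # reward while exploring
                + p * expected_new_best(r_b) * t_exploit     # improved: exploit new best
                + (1.0 - p) * r_b * t_exploit)               # not improved: backtrack to Rb

    # The agent bids the t_e (0 = stay put) with the highest expected utility.
    best_te = max(range(0, 11), key=lambda t_e: backtrack_utility(
        r_b=120, r_explore=100, t_e=t_e, t_total=50))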
12
Results (Simulation): Maximize total reward (area under the curve)
[Graph: cumulative reward curves for BE-Backtrack, SE-Optimistic, and No Movement]
13
Omniscient Algorithm
  • (Artificially) convert DCEE to DCOP
  • Run the MGM algorithm [Pearce & Tambe, 2007]
  • Quickly find local optimum
  • Establish upper bound
  • Only works in simulation

14
Results (Simulation): Maximize total reward (area under the curve)
[Graph: cumulative reward curves for Omniscient, BE-Backtrack, SE-Optimistic, and No Movement]
15
Balanced Exploration Techniques
  • BE-Rebid
  • Allows agents to backtrack
  • Re-evaluate every time step [Montemerlo04]
  • Allows for on-the-fly reasoning
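The main difference from BE-Backtrack is when the calculation happens: instead of committing to one bid, the agent reruns the same expected-utility comparison on every time step against the remaining horizon, so it can cut exploration short or extend it on the fly. A minimal sketch, reusing the hypothetical backtrack_utility helper from the BE-Backtrack sketch above.

    # BE-Rebid style decision (sketch): re-evaluate every time step rather than
    # committing once; t_e = 0 means stay at / backtrack to the best known location.
    def rebid_step(r_b, r_explore, steps_left, max_te=10):
        options = range(0, min(max_te, steps_left) + 1)
        return max(options, key=lambda t_e: backtrack_utility(r_b, r_explore,
                                                              t_e, steps_left))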

16
Balanced Exploration Techniques
  • BE-Stay
  • Agents unable to backtrack
  • True for some types of robots
  • Dynamic programming approach (sketched below)
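When backtracking is impossible, the choice at every step is: keep the current reward for all remaining steps, or spend a step exploring and then face the same choice from whatever reward turns up. A hedged dynamic-programming sketch over (current reward, steps left), again assuming integer rewards uniform on [1, 200].

    from functools import lru_cache

    MAX_REWARD = 200   # assumed reward scale, as in the sketches above

    # BE-Stay sketch: no backtracking, so the value of a state is the better of
    # "stay on the current reward to the end" and "explore one step, then decide
    # again from the (uniformly drawn) new reward".
    @lru_cache(maxsize=None)
    def stay_value(reward, steps_left):
        if steps_left == 0:
            return 0.0
        stay = reward * steps_left
        explore = sum(r + stay_value(r, steps_left - 1)
                      for r in range(1, MAX_REWARD + 1)) / MAX_REWARD
        return max(stay, explore)

    # e.g. stay_value(120, 10): value of a link currently paying 120 with 10 steps left.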

17
Results (simulation)
(10 agents, random graphs with 15-20 links)
18
Results (simulation)
(chain topology, 100 rounds)
19
Results (simulation)
(20 agents, 100 rounds)
20
Also Tested on Physical Robots
Used iRobot Creates (unfortunately, they don't vacuum)
21
Sample Robot Results
22
k-Optimality
  • Increased coordination
  • Find pairs of agents to change variables
    (location)
  • Higher communication overhead
  • SE-Optimistic → SE-Optimistic-2, SE-Optimistic-3
  • SE-Mean → SE-Mean-2
  • BE-Rebid → BE-Rebid-2
  • BE-Stay → BE-Stay-2
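In the -2 variants, pairs of neighboring agents evaluate a joint move and bid their combined gain, at the cost of extra messages. A hedged sketch in the SE-Optimistic-2 spirit: the pair assumes every link touching either agent pays the maximum after they both move, taking care not to double count the links they share. Names and the 200 reward cap are assumptions.

    # Optimistic joint gain for a coordinating pair (sketch, SE-Optimistic-2 spirit).
    MAX_REWARD = 200

    def pair_optimistic_gain(a, b, link_rewards, shared_links):
        """link_rewards[x]: current rewards on x's links; shared_links: rewards on a-b links."""
        gain_a = sum(MAX_REWARD - r for r in link_rewards[a])
        gain_b = sum(MAX_REWARD - r for r in link_rewards[b])
        shared = sum(MAX_REWARD - r for r in shared_links)   # counted once, not twice
        return gain_a + gain_b - shared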

23
Confirm Previous DCOP Results
If (artificially) provided the rewards, k=2 outperforms k=1
24
Sample coordination results
[Graphs: full graph and chain graph topologies]
25
Surprising Result: Increased Coordination Can Hurt
26
Surprising Result: Increased Coordination Can Hurt
27
Regular Graphs