1
Between Collaboration and Competition: An Initial Formalization using Distributed POMDPs
  • Praveen Paruchuri, Milind Tambe
  • University of Southern California
  • Spiros Kapetanakis
  • University of York, UK
  • Sarit Kraus
  • Bar-Ilan University, Israel
  • University of Maryland, College Park
  • July 2003

2
Motivation
  • Many domains exist where agents act as a team but also need to maintain some self-interest.
  • Electric Elves: agents make decisions for their users but act as a team, e.g., when arranging a meeting.
  • SDR (Software for Distributed Robotics): 100 robots must locate and protect objects.
  • Robots must also ensure their own survival, e.g., by refilling their batteries.

3
The Problem
  • A framework for teams of agents that maintain private goals in stochastic, complex and dynamic environments.
  • Agents need to maximize joint objectives and yet honor private preferences.
  • Private and team interests might conflict.
  • Build a framework based on distributed POMDPs for policy generation.
  • Analyze the complexity of policy generation.

4
Previous work
  • Distributed POMDPs, e.g., COM-MTDP:
  • Have a single joint reward.
  • The optimal policy maximizes the joint value (Ex1).
  • The solution is not stable.
  • Stochastic games:
  • Have individual rewards.
  • The policy finds an equilibrium solution; stability is the key concept (Ex2).
  • The solution may not be favorable both individually and as a team (see the sketch below).
  • Ex1:
    4 (6,-2)    2 (1,1)
    0 (0,0)     3 (-2,5)
  • Ex2:
    5,5 (10)    -1,6 (5)
    6,-1 (5)     0,0 (0)
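To make the tension concrete, here is a minimal Python sketch over the Ex2 matrix above. It assumes one particular reading of the flattened slide (rows are agent 1's actions, columns agent 2's, the pair is the individual payoffs and the parenthesised number is the joint value): the only pure-strategy equilibrium gives each agent 0, while the joint-value maximizer gives each agent 5.

  from itertools import product

  # Ex2 payoff matrix from the slide, read as:
  # payoffs[row][col] = (reward to agent 1, reward to agent 2); joint value = sum of the pair.
  payoffs = [
      [(5, 5), (-1, 6)],
      [(6, -1), (0, 0)],
  ]

  def is_pure_nash(r: int, c: int) -> bool:
      """True if neither agent can gain by unilaterally deviating from (r, c)."""
      best_for_1 = max(payoffs[rr][c][0] for rr in range(2))
      best_for_2 = max(payoffs[r][cc][1] for cc in range(2))
      return payoffs[r][c][0] == best_for_1 and payoffs[r][c][1] == best_for_2

  equilibria = [(r, c) for r, c in product(range(2), repeat=2) if is_pure_nash(r, c)]
  team_opt = max(product(range(2), repeat=2), key=lambda rc: sum(payoffs[rc[0]][rc[1]]))

  print("pure equilibria:", [(rc, payoffs[rc[0]][rc[1]]) for rc in equilibria])  # [((1, 1), (0, 0))]
  print("joint-value maximizer:", team_opt, payoffs[team_opt[0]][team_opt[1]])   # (0, 0) (5, 5)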
5
Motivation: Simple examples
  • One-shot games without stochastic elements.
  • Ex1: Two people need to meet; one prefers 4pm, the other 5pm.
  • When should they meet?
  • They need to compromise to some extent, but not totally.
  • No meeting is bad for both; they should agree on a mutually acceptable solution.
  • Ex2: A team of robots works on a task.
  • Limited battery.
  • Each robot must keep its last n units of battery for refuelling itself; otherwise it dies.
  • The robots need to achieve the team goal while making sure they don't die.

6
MTDP: A Distributed POMDP Model
  • An MTDP is a tuple <S, A(α), P, Ω(α), O(α), B(α), R> (see the sketch after this list) where:
  • S is a set of world states.
  • A(α) is the set of allowed joint team actions, A(α) = ∏_i A(i), where A(i) is the set of domain-level actions for each agent i.
  • P is the probability distribution that governs the effect of domain-level actions: P(s, a, s1) = Pr(s1 | s, a).
  • Ω(α) is the joint set of observations, and O(α) is the corresponding observation function.
  • B(α) is the combination of all the agents' sets of possible belief states.
  • R is the common reward for the team: R : S × A(α) → ℝ.
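As a reading aid only, here is a minimal Python sketch of the tuple as a data structure; the field names and type choices are assumptions, not notation from the MTDP/COM-MTDP papers.

  from dataclasses import dataclass
  from typing import Callable, Dict, List, Tuple

  State = str                    # a world state in S
  JointAction = Tuple[str, ...]  # one domain-level action per agent, an element of A(alpha)
  JointObs = Tuple[str, ...]     # one observation per agent, an element of Omega(alpha)

  @dataclass
  class MTDP:
      """Container mirroring the tuple <S, A(alpha), P, Omega(alpha), O(alpha), B(alpha), R>."""
      states: List[State]                                               # S
      joint_actions: List[JointAction]                                  # A(alpha), product of the A(i)
      transition: Callable[[State, JointAction, State], float]          # P(s, a, s1) = Pr(s1 | s, a)
      joint_observations: List[JointObs]                                # Omega(alpha)
      observation_fn: Callable[[State, JointAction, JointObs], float]   # O(alpha)
      belief_states: List[Dict[State, float]]                           # B(alpha), combined belief states
      team_reward: Callable[[State, JointAction], float]                # R : S x A(alpha) -> reals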

7
E-MTDP: Formally Defined
  • An E-MTDP is a tuple <S, A(α), P, Ω(α), O(α), B(α), R> where S, A(α), P, Ω(α), O(α), B(α) are as defined in the MTDP.
  • R = <R1, R2, ..., Rn, Rα> where:
  • R1, R2, ..., Rn are the individual rewards of agents 1, 2, ..., n.
  • Rα is the joint reward for the n agents, distinct from the individual rewards R1, ..., Rn (a sketch extending the MTDP structure follows below).
  • Both individual and joint rewards can be expressed.
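Continuing the sketch from the previous slide (names still assumed, not taken from the paper), the only change is that the single team reward R is replaced by the vector <R1, ..., Rn, Rα>:

  from dataclasses import dataclass, field
  from typing import Callable, List

  @dataclass
  class EMTDP(MTDP):
      """Same components as MTDP, except R becomes <R1, ..., Rn, R_alpha>."""
      # R1 ... Rn: one private reward function per agent (assumed to share R's signature).
      individual_rewards: List[Callable[[State, JointAction], float]] = field(default_factory=list)
      # The inherited team_reward field now plays the role of the joint reward R_alpha.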

8
E-MTDP Policy
  • A policy maps belief states to actions: π_i : B_i → A_i.
  • Centralized policy generator.
  • The chosen policy π* is such that (see the sketch after this list):
  • V1(π*) > T1 and V2(π*) > T2
  • For every policy π ≠ π* with V1(π) > T1 and V2(π) > T2, V(π*) > V(π)
  • where T1 and T2 are the thresholds for agents 1 and 2,
  • V1 is the value of the policy to agent 1 and V2 to agent 2,
  • and V is the overall value of the policy, without splitting it between the agents.
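A minimal sketch of this selection rule; the candidate policies, their values, and the thresholds below are made-up illustration, not results from the paper.

  from typing import Dict, Optional, Tuple

  def select_policy(candidates: Dict[str, Tuple[float, float, float]],
                    t1: float, t2: float) -> Optional[str]:
      """Pick the policy maximizing overall value V, subject to V1 > T1 and V2 > T2."""
      feasible = {name: (v1, v2, v) for name, (v1, v2, v) in candidates.items()
                  if v1 > t1 and v2 > t2}
      if not feasible:
          return None  # no policy clears both agents' thresholds
      return max(feasible, key=lambda name: feasible[name][2])

  # Hypothetical candidates: name -> (V1 value to agent 1, V2 value to agent 2, V overall value)
  candidates = {
      "always_ask":   (4.0, 6.0, 9.0),
      "never_ask":    (1.0, 8.0, 11.0),
      "ask_if_close": (5.0, 5.0, 10.0),
  }
  print(select_policy(candidates, t1=3.0, t2=4.0))  # -> ask_if_close

Without the thresholds, the unconstrained maximizer would be never_ask (overall value 11), which leaves agent 1 below its threshold; the constrained rule instead returns ask_if_close.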

9
Novelties of E-MTDP
  • Maintains individual rewards for each agent and a joint reward for the team.
  • The solution concept is novel because the optimal policy both:
  • maximizes the joint reward, and
  • ensures a certain minimum expected value for individual team members.

10
Experimental Validation
  • Goal: show the utility of E-MTDP.
  • A real system called Electric Elves (E-Elves), based on MDPs.
  • Based on maximizing a single joint reward.
  • Re-expressed as an E-MTDP, which helped improve performance.
  • E-Elves: a published real-world multi-agent system.
  • Used at USC/ISI for 6 months.
  • Agents called proxies reschedule meetings, decide whether to present talks on behalf of the user, order meals, track user location, etc.

11
Electric Elves
  • Focus on the task of rescheduling meetings.
  • A single-agent MDP was used to model each agent.
  • Actions include delaying or cancelling the meeting, asking the user, etc.
  • Asking the user for input is critical.
  • Time constraints might prevent the agent from asking the user for input.
  • The policy generator uses the notion of a team reward when deciding actions.
  • There is no notion of individual reward.

12
Perceived Problem and Improvement
  • The original formulation had R(α) and R(user) terms [1].
  • However, R(α) + R(user) is maximized in policy generation.
  • As R(α) increased with R(user) held constant, the agent stopped asking the user.
  • As R(α) increases, the cost of the uncertainty in getting a response from the user exceeds the increase in decision quality due to the user's feedback.
  • Hence, decisions are taken without asking.
  • The user might want a different decision.
  • The user can set the importance of the meeting to him via R(user).
  • If the user is important, the agent needs to make a correct decision regarding the user.
  • The user's opinion becomes important, affecting the number of asks.

13
Original Elves Result
  • x-axis: value of the meeting without the user.
  • y-axis: number of times the agent asks the user.
  • The number of asks decreases as R(α) increases.
  • Agents sometimes cancel an important meeting without asking the user (a very high cost) [1].

14
E-MTDP based E-Elves
  • Solving using E-MTDP:
  • Let there be two agents.
  • Priv1 = R(user), agent 1's private reward.
  • Priv2 = R(α), agent 2's private reward.
  • Set Priv1 > Threshold.
  • The number of asks now depends on the threshold (see the sketch after this list).
  • With the user importance (Priv1) set high, the agent asks the user for input before deciding, unlike earlier.
  • Setting the threshold correctly is important to obtain the required behavior.
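A toy illustration of this slide with made-up numbers (not measured values from E-Elves): once Priv1 = R(user) must clear the user's threshold, the "ask the user" option is chosen even though deciding alone has a slightly higher combined value.

  # Hypothetical expected values per option: name -> (V_user = Priv1, V_agent = Priv2)
  options = {
      "ask_user":     (7.0, 3.0),
      "decide_alone": (2.0, 9.0),
  }

  user_threshold = 5.0  # Priv1 must exceed this because the user marked the meeting as important

  feasible = {name: v for name, v in options.items() if v[0] > user_threshold}
  chosen = max(feasible, key=lambda name: sum(feasible[name])) if feasible else None
  print(chosen)  # -> ask_user (decide_alone totals 11.0 vs 10.0 but fails the user threshold)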

15
E-MTDP result
  • As the graph above shows, giving the user the flexibility to set his own threshold can result in the agent asking him more often.
  • The user's opinion is taken into consideration.
  • Flexibility is the key word: users like having control over their agents.

16
Conclusions
  • A framework for teams of self-interested agents.
  • E-MTDP presented as a solution concept.
  • E-MTDP applied to E-Elves.
  • Improvement in system performance, measured in terms of the number of asks.
  • Fine-tuning of agents according to user needs is now possible.

17
Future Work
  • Fine-tune the existing E-MTDP framework.
  • Analyze the complexity of E-MTDP policies.
  • Analyze the stability of E-MTDP solutions.
  • References
  • [1] Paul Scerri, David V. Pynadath and Milind Tambe. Towards Adjustable Autonomy for the Real World. JAIR, 2002.
  • THANK YOU
  • Any Questions?

18
(No Transcript)
19
Stability of solution
  • Designed a multistage game for E-MTDP policy to
    be stable.