1
Between Collaboration and Competition: An Initial Formalization using Distributed POMDPs
  • Praveen Paruchuri, Milind Tambe
  • University of Southern California
  • Spiros Kapetanakis
  • University of York, UK
  • Sarit Kraus
  • Bar-Ilan University, Israel
  • University of Maryland, College Park
  • July 2003

2
Motivation
  • Many domains exist where agents act as a team but also need to maintain some self-interest.
  • Electric Elves: agents make decisions for their users but act as a team, e.g., when arranging a meeting.
  • SDR (Software for Distributed Robotics): 100 robots must locate and protect objects.
  • Robots must also ensure their own survival, e.g., by refilling their batteries.

3
The Problem
  • A framework for teams of agents that maintain private goals in stochastic, complex and dynamic environments.
  • Agents need to maximize joint objectives and yet honor private preferences.
  • Private and team interests might conflict.
  • Build a framework based on distributed POMDPs for policy generation.
  • Analyze the complexity of policy generation.

4
Previous work
  • Distributed POMDPs, e.g., COM-MTDP:
  • Have a single joint reward.
  • The optimal policy maximizes the joint value (Ex1).
  • The solution is not stable.
  • Stochastic games:
  • Have individual rewards.
  • The policy finds an equilibrium solution; stability is the key concept (Ex2).
  • The solution may not be favorable both individually and as a team (see the sketch below).
  • Ex1:
    4 (6,-2)    2 (1,1)
    0 (0,0)     3 (-2,5)
  • Ex2:
    5,5 (10)    -1,6 (5)
    6,-1 (5)     0,0 (0)
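To make the tension concrete, here is a minimal Python sketch over the Ex2 matrix above. It assumes one particular reading of the flattened slide (rows are agent 1's actions, columns agent 2's, the pair is the individual payoffs and the parenthesised number is the joint value): the only pure-strategy equilibrium gives each agent 0, while the joint-value maximizer gives each agent 5.

  from itertools import product

  # Ex2 payoff matrix from the slide, read as:
  # payoffs[row][col] = (reward to agent 1, reward to agent 2); joint value = sum of the pair.
  payoffs = [
      [(5, 5), (-1, 6)],
      [(6, -1), (0, 0)],
  ]

  def is_pure_nash(r: int, c: int) -> bool:
      """True if neither agent can gain by unilaterally deviating from (r, c)."""
      best_for_1 = max(payoffs[rr][c][0] for rr in range(2))
      best_for_2 = max(payoffs[r][cc][1] for cc in range(2))
      return payoffs[r][c][0] == best_for_1 and payoffs[r][c][1] == best_for_2

  equilibria = [(r, c) for r, c in product(range(2), repeat=2) if is_pure_nash(r, c)]
  team_opt = max(product(range(2), repeat=2), key=lambda rc: sum(payoffs[rc[0]][rc[1]]))

  print("pure equilibria:", [(rc, payoffs[rc[0]][rc[1]]) for rc in equilibria])  # [((1, 1), (0, 0))]
  print("joint-value maximizer:", team_opt, payoffs[team_opt[0]][team_opt[1]])   # (0, 0) (5, 5)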
5
Motivation: Simple examples
  • One-shot games without stochastic elements.
  • Ex1: Two people need to meet; one prefers 4pm, the other 5pm.
  • When should they meet?
  • They need to compromise to some extent, but not totally.
  • No meeting is bad for both; they should agree on a mutually acceptable solution.
  • Ex2: A team of robots works on a task.
  • Limited battery.
  • Each robot must keep its last n units of battery for refuelling itself; otherwise it dies.
  • The robots need to achieve the team goal while making sure they don't die.

6
MTDP: A Distributed POMDP Model
  • An MTDP is a tuple <S, A(α), P, Ω(α), O(α), B(α), R> (see the sketch after this list) where:
  • S is a set of world states.
  • A(α) is the set of allowed joint team actions, A(α) = ∏_i A(i), where A(i) is the set of domain-level actions for each agent i.
  • P is the probability distribution that governs the effect of domain-level actions: P(s, a, s1) = Pr(s1 | s, a).
  • Ω(α) is the joint set of observations, and O(α) is the corresponding observation function.
  • B(α) is the combination of all the agents' sets of possible belief states.
  • R is the common reward for the team: R : S × A(α) → ℝ.
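As a reading aid only, here is a minimal Python sketch of the tuple as a data structure; the field names and type choices are assumptions, not notation from the MTDP/COM-MTDP papers.

  from dataclasses import dataclass
  from typing import Callable, Dict, List, Tuple

  State = str                    # a world state in S
  JointAction = Tuple[str, ...]  # one domain-level action per agent, an element of A(alpha)
  JointObs = Tuple[str, ...]     # one observation per agent, an element of Omega(alpha)

  @dataclass
  class MTDP:
      """Container mirroring the tuple <S, A(alpha), P, Omega(alpha), O(alpha), B(alpha), R>."""
      states: List[State]                                               # S
      joint_actions: List[JointAction]                                  # A(alpha), product of the A(i)
      transition: Callable[[State, JointAction, State], float]          # P(s, a, s1) = Pr(s1 | s, a)
      joint_observations: List[JointObs]                                # Omega(alpha)
      observation_fn: Callable[[State, JointAction, JointObs], float]   # O(alpha)
      belief_states: List[Dict[State, float]]                           # B(alpha), combined belief states
      team_reward: Callable[[State, JointAction], float]                # R : S x A(alpha) -> reals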

7
E-MTDP: Formally Defined
  • An E-MTDP is a tuple <S, A(α), P, Ω(α), O(α), B(α), R> where S, A(α), P, Ω(α), O(α), B(α) are as defined in the MTDP.
  • R = <R1, R2, ..., Rn, Rα> where:
  • R1, R2, ..., Rn are the individual rewards of agents 1, 2, ..., n.
  • Rα is the joint reward for the n agents, distinct from the individual rewards R1, ..., Rn (a sketch extending the MTDP structure follows below).
  • Both individual and joint rewards can be expressed.
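Continuing the sketch from the previous slide (names still assumed, not taken from the paper), the only change is that the single team reward R is replaced by the vector <R1, ..., Rn, Rα>:

  from dataclasses import dataclass, field
  from typing import Callable, List

  @dataclass
  class EMTDP(MTDP):
      """Same components as MTDP, except R becomes <R1, ..., Rn, R_alpha>."""
      # R1 ... Rn: one private reward function per agent (assumed to share R's signature).
      individual_rewards: List[Callable[[State, JointAction], float]] = field(default_factory=list)
      # The inherited team_reward field now plays the role of the joint reward R_alpha.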

8
E-MTDP Policy
  • A policy maps belief states to actions: π_i : B_i → A_i.
  • Centralized policy generator.
  • The chosen policy π* is such that (see the sketch after this list):
  • V1(π*) > T1 and V2(π*) > T2
  • For every policy π ≠ π* with V1(π) > T1 and V2(π) > T2, V(π*) > V(π)
  • where T1 and T2 are the thresholds for agents 1 and 2,
  • V1 is the value of the policy to agent 1 and V2 to agent 2,
  • and V is the overall value of the policy, without splitting it between the agents.
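A minimal sketch of this selection rule; the candidate policies, their values, and the thresholds below are made-up illustration, not results from the paper.

  from typing import Dict, Optional, Tuple

  def select_policy(candidates: Dict[str, Tuple[float, float, float]],
                    t1: float, t2: float) -> Optional[str]:
      """Pick the policy maximizing overall value V, subject to V1 > T1 and V2 > T2."""
      feasible = {name: (v1, v2, v) for name, (v1, v2, v) in candidates.items()
                  if v1 > t1 and v2 > t2}
      if not feasible:
          return None  # no policy clears both agents' thresholds
      return max(feasible, key=lambda name: feasible[name][2])

  # Hypothetical candidates: name -> (V1 value to agent 1, V2 value to agent 2, V overall value)
  candidates = {
      "always_ask":   (4.0, 6.0, 9.0),
      "never_ask":    (1.0, 8.0, 11.0),
      "ask_if_close": (5.0, 5.0, 10.0),
  }
  print(select_policy(candidates, t1=3.0, t2=4.0))  # -> ask_if_close

Without the thresholds, the unconstrained maximizer would be never_ask (overall value 11), which leaves agent 1 below its threshold; the constrained rule instead returns ask_if_close.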

9
Novelties of E-MTDP
  • Maintains individual rewards for each agent and a joint reward for the team.
  • The solution concept is novel because the optimal policy both:
  • maximizes the joint reward, and
  • ensures a certain minimum expected value for individual team members.

10
Experimental Validation
  • Goal: show the utility of E-MTDP.
  • A real system called Electric Elves (E-Elves), based on MDPs.
  • Based on maximizing a single joint reward.
  • Re-expressed as an E-MTDP, which helped improve performance.
  • E-Elves: a published real-world multi-agent system.
  • Used at USC/ISI for 6 months.
  • Agents called proxies reschedule meetings, decide whether to present talks on behalf of the user, order meals, track user location, etc.

11
Electric Elves
  • Focus on the task of rescheduling meetings.
  • A single-agent MDP was used to model each agent.
  • Actions include delaying or cancelling the meeting, asking the user, etc.
  • Asking the user for input is critical.
  • Time constraints might prevent the agent from asking the user for input.
  • The policy generator uses the notion of a team reward when deciding actions.
  • There is no notion of individual reward.

12
Perceived Problem and Improvement
  • The original formulation had R(α) and R(user) terms [1].
  • However, R(α) + R(user) is maximized in policy generation.
  • As R(α) increased with R(user) held constant, the agent stopped asking the user.
  • As R(α) increases, the cost of the uncertainty in getting a response from the user exceeds the increase in decision quality due to the user's feedback.
  • Hence, decisions are taken without asking.
  • The user might want a different decision.
  • The user can set the importance of the meeting to him via R(user).
  • If the user is important, the agent needs to make a correct decision regarding the user.
  • The user's opinion becomes important, affecting the number of asks.

13
Original Elves Result
  • x-axis: value of the meeting without the user.
  • y-axis: number of times the agent asks the user.
  • The number of asks decreases as R(α) increases.
  • Agents sometimes cancel an important meeting without asking the user (a very high cost) [1].

14
E-MTDP based E-Elves
  • Solving using E-MTDP:
  • Let there be two agents.
  • Priv1 = R(user), agent 1's private reward.
  • Priv2 = R(α), agent 2's private reward.
  • Set Priv1 > Threshold.
  • The number of asks now depends on the threshold (see the sketch after this list).
  • With the user importance (Priv1) set high, the agent asks the user for input before deciding, unlike earlier.
  • Setting the threshold correctly is important to obtain the required behavior.
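A toy illustration of this slide with made-up numbers (not measured values from E-Elves): once Priv1 = R(user) must clear the user's threshold, the "ask the user" option is chosen even though deciding alone has a slightly higher combined value.

  # Hypothetical expected values per option: name -> (V_user = Priv1, V_agent = Priv2)
  options = {
      "ask_user":     (7.0, 3.0),
      "decide_alone": (2.0, 9.0),
  }

  user_threshold = 5.0  # Priv1 must exceed this because the user marked the meeting as important

  feasible = {name: v for name, v in options.items() if v[0] > user_threshold}
  chosen = max(feasible, key=lambda name: sum(feasible[name])) if feasible else None
  print(chosen)  # -> ask_user (decide_alone totals 11.0 vs 10.0 but fails the user threshold)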

15
E-MTDP result
  • As the graph above shows, giving the user the flexibility to set his own threshold can result in the agent asking him more often.
  • The user's opinion is taken into consideration.
  • Flexibility is the key word: users like having control over their agents.

16
Conclusions
  • A framework for teams of self-interested agents.
  • E-MTDP presented as a solution concept.
  • E-MTDP applied to E-Elves.
  • Improvement in system performance, measured in terms of the number of asks.
  • Fine-tuning of agents according to user needs is now possible.

17
Future Work
  • Fine-tune the existing E-MTDP framework.
  • Analyze the complexity of E-MTDP policies.
  • Analyze the stability of E-MTDP solutions.
  • References
  • [1] Paul Scerri, David V. Pynadath and Milind Tambe. Towards Adjustable Autonomy for the Real World. JAIR, 2002.
  • THANK YOU
  • Any Questions?

18
(No Transcript)
19
Stability of solution
  • Designed a multistage game for E-MTDP policy to
    be stable.