1
Increasing Security through Communication and
Policy Randomization in Multiagent Systems
  • Praveen Paruchuri, Milind Tambe, Fernando Ordonez
  • University of Southern California
  • Sarit Kraus
  • Bar-Ilan University, Israel
  • University of Maryland, College Park

2
Motivation: The Prediction Game
  • A UAV (Unmanned Aerial Vehicle)
  • Flies between 4 regions
  • Can you predict the UAV's flight pattern?
  • Pattern 1:
  • 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, ...
  • Pattern 2:
  • 1, 4, 3, 1, 1, 4, 2, 4, 2, 3, 4, 3, ... (as
    generated by a 4-sided die)
  • Could you predict pattern 2 even if given 100
    of its numbers?
  • Randomization decreases predictability
  • Increases security
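A tiny sketch of how a pattern like pattern 2 can be generated (the uniform 4-sided-die model is the slide's own analogy; region labels are 1-4):

    import random

    # Each step, pick one of the 4 regions uniformly at random, like rolling
    # a 4-sided die: past draws reveal nothing about future ones.
    pattern = [random.randint(1, 4) for _ in range(12)]
    print(pattern)  # e.g. [1, 4, 3, 1, 1, 4, 2, 4, ...], unpredictable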

3
Problem Definition
  • Problem: Increase security by decreasing
    predictability for an agent team acting in an
    adversarial environment.
  • The policy remains secure even if it is given
    to the adversary
  • Environment is stochastic and observable
    (MDP-based)
  • Communication is limited
  • Efficient algorithms for the
    reward/randomization/communication tradeoff

4
Assumptions
  • Assumptions for the agent team:
  • Adversary is unobservable
  • Adversary's actions, capabilities, and payoffs are
    unknown
  • Communication is encrypted (safe)
  • Assumptions for the adversary:
  • Knows the agents' plan/policy
  • Exploits action predictability
  • Can see the agents' state

5
Solution Technique
  • Technique developed:
  • Intentional policy randomization
  • CMDP-based framework
  • Sequential decision making
  • Limited communication resources
  • CMDP: Constrained Markov Decision Process
  • Increase security → solve a multi-criteria problem
    for the agents:
  • Maximize action unpredictability (policy
    randomization)
  • Maintain reward above a threshold (quality
    constraints)
  • Keep communication usage below a threshold (resource
    constraints)

6
Domains
  • Scheduled activities at airports, like security
    checks, refueling, etc.
  • Can be observed by adversaries
  • Randomization of schedules is helpful
  • UAV team patrolling a humanitarian mission
  • Adversary disrupts the mission: can disrupt food
    supplies, harm refugees, shoot down UAVs, etc.
  • Randomize the UAV patrol policy

7
Our Contributions
  • Randomized policies for multi-agent CMDPs (MCMDPs)
  • Solving miscoordination:
  • Randomized policies in team settings
  • may not be implementable!
  • (The reward constraint gets violated)

Communication resource < threshold
Expected team reward > threshold
Maximize policy randomization
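In symbols, the tradeoff in the box above can be sketched as the following program (notation anticipates the MCMDP tuple defined on slide 10; T_reward is an assumed name for the team-reward threshold, which the slides leave implicit):

    maximize    H_W(x)                              (policy randomization)
    subject to  Σ_{s,a} R(s,a)·x(s,a) ≥ T_reward    (team reward above threshold)
                Σ_{s,a} N(s,a)·x(s,a) ≤ Q           (communication below threshold)
                x ≥ 0, x satisfies the MDP flow-conservation constraints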
8
Miscoordination: Effect of Randomization
  • Meeting tomorrow:
  • 9am with probability 40%, 10am with probability 60%
  • Communicate to coordinate
  • Limited communication

Miscoordinated outcomes should have had probability 0 (the threshold reward is violated)
9
Communication Issue
  • Generate randomized, implementable policies
  • under limited communication
  • The communication problem:
  • M coordination points
  • N units of communication
  • Generate the best communication policy
  • The communication policy can also be randomized
  • Transform the MCMDP into an implementable MCMDP
  • Solution algorithm for the transformed MCMDP

10
MCMDP Formally Defined
  • An MCMDP (for the 2-agent case) is a tuple
  • ⟨S, A, P, R, C1, C2, T1, T2, N, Q⟩ where:
  • S, A, R: joint states, actions, rewards
  • P: transition function
  • Ck (k = 1, 2): cost vector for resource k
  • Tk (k = 1, 2): threshold on expected resource-k
    consumption
  • N: joint communication cost vector
  • Q: threshold on communication costs
  • Basic terms used:
  • x(s,a): expected number of times action a is taken in
    state s
  • Policy as a function of x: π(s,a) = x(s,a) / Σ_{a′} x(s,a′)
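A minimal sketch of recovering the randomized policy from the flow variables (the dict-of-arrays representation is an assumption for illustration; the formula is the one stated above):

    import numpy as np

    def policy_from_flow(x):
        """pi(s, a) = x(s, a) / sum over a' of x(s, a')."""
        return {s: flows / flows.sum() for s, flows in x.items()}

    # toy flows for two states (made-up numbers)
    x = {'s1': np.array([2.0, 2.0]), 's2': np.array([1.0, 3.0])}
    print(policy_from_flow(x))  # s1 -> [0.5 0.5], s2 -> [0.25 0.75]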

11
Entropy: Measure of Randomness
  • Randomness or information content is quantified
    using entropy (Shannon 1948)
  • Entropy for a CMDP:
  • Additive entropy: add the entropies of each state:
    H_A(x) = −Σ_s Σ_a π(s,a) log π(s,a),
    where π(s,a) = x(s,a) / Σ_{a′} x(s,a′)
  • Weighted entropy: weigh each state by its
    contribution to total flow:
    H_W(x) = −Σ_s (Σ_a x(s,a) / Σ_j α_j) Σ_a π(s,a) log π(s,a),
  • where α_j is the initial flow of the system
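A sketch of both measures in code (same hypothetical flow representation as the previous sketch; the 1e-12 guard against log 0 is an implementation detail, not from the slides):

    import numpy as np

    def entropies(x, alpha_total):
        """Additive and weighted entropy of the randomized policy in x."""
        H_add = H_weighted = 0.0
        for flows in x.values():
            pi = flows / flows.sum()              # per-state action distribution
            h = -np.sum(pi * np.log(pi + 1e-12))  # Shannon entropy of this state
            H_add += h                            # additive: plain sum over states
            H_weighted += (flows.sum() / alpha_total) * h  # weighted by flow share
        return H_add, H_weighted

    x = {'s1': np.array([2.0, 2.0]), 's2': np.array([1.0, 3.0])}
    print(entropies(x, alpha_total=4.0))  # 4.0 stands in for Σ_j α_j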

12
Issue 1: Randomized Policy Generation
  • Non-linear program: maximize entropy, keep reward
    above threshold and communication below threshold
  • Obtains the required randomization
  • Appends a communication choice to every action
  • Issue 2: Generate the communication policy
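A minimal single-state sketch of such a non-linear program using scipy (per-action rewards, communication costs, and thresholds are made-up numbers; the full program would also carry the MDP flow-conservation constraints):

    import numpy as np
    from scipy.optimize import minimize

    rewards = np.array([10.0, 6.0, 3.0, 1.0])   # hypothetical per-action rewards
    comm_cost = np.array([2.0, 1.0, 0.0, 0.0])  # hypothetical communication costs
    reward_threshold, comm_budget = 5.5, 1.0

    def neg_entropy(p):
        q = np.clip(p, 1e-12, 1.0)
        return float(np.sum(q * np.log(q)))  # minimizing this maximizes entropy

    constraints = [
        {'type': 'eq',   'fun': lambda p: p.sum() - 1.0},                  # distribution
        {'type': 'ineq', 'fun': lambda p: rewards @ p - reward_threshold}, # reward >= T
        {'type': 'ineq', 'fun': lambda p: comm_budget - comm_cost @ p},    # comm <= Q
    ]
    res = minimize(neg_entropy, np.full(4, 0.25), bounds=[(0.0, 1.0)] * 4,
                   constraints=constraints, method='SLSQP')
    print(res.x, rewards @ res.x, comm_cost @ res.x)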

13
Issue 2: Transformed MCMDP
[Figure: transformation of original state S1. Each of agent A's actions a1, a2 splits into a communication branch (a1C, a2C) and a no-communication branch (a1o, a2o); agent B's responses b1, b2 then lead from these new states to the target states a1b1, a1b2, a2b1, a2b2.]
For each state and each joint action: introduce a C
(communication) and an NC (no-communication) version of
each individual action and add the corresponding new
states; add transitions between the original and new
states, and between the new states and the original
target states.
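A sketch of this construction for one original state (state and action names are placeholders in the style of the figure; 'NC' marks the silent branch):

    def transform_state(s, actions_a, actions_b):
        """Split each of A's actions into C/NC branches via new states."""
        new_states, transitions = [], []
        for a in actions_a:
            for mode in ('C', 'NC'):
                mid = (s, a, mode)                       # new intermediate state
                new_states.append(mid)
                transitions.append((s, (a, mode), mid))  # original -> new state
                for b in actions_b:
                    # new state -> original target reached by joint action (a, b)
                    transitions.append((mid, b, ('target', s, a, b)))
        return new_states, transitions

    print(transform_state('S1', ['a1', 'a2'], ['b1', 'b2']))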
14
Non-linear Constraints
  • Need to introduce non-linear constraints:
  • for each original state,
  • for each new state introduced by a no-communication
    action,
  • the conditional probabilities of corresponding actions
    must be equal
  • Ex: P(b1 | s_C) = P(b1 | s_NC)
  • P(b2 | s_C) = P(b2 | s_NC)
  • s_C: observable, reached by a communication action
  • s_NC: unobservable, reached by a no-communication
    action
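A sketch of these equalities as solver residuals (the (state, action)-keyed dict x is an assumed representation; the ratios are what make the constraints non-linear in x):

    def nc_constraints(x, s_comm, s_nocomm, actions_b):
        """Residuals P(b | s_comm) - P(b | s_nocomm) for each of B's actions;
        a solver drives these to zero."""
        total_c = sum(x[(s_comm, b)] for b in actions_b)
        total_nc = sum(x[(s_nocomm, b)] for b in actions_b)
        return [x[(s_comm, b)] / total_c - x[(s_nocomm, b)] / total_nc
                for b in actions_b]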

15
Non-Linear Constraints: Handling Miscoordination
  • Agent B has no hint of the state after NC
    (no-communication) actions.
  • Its actions must therefore be made independent of
    the source state.
  • The probability of action b1 from one NC state must
    equal the probability of the same action (i.e., b1)
    from any other NC state that B cannot distinguish.
  • Meeting scenario:
  • Irrespective of agent A's plan,
  • if agent B's plan is 20% 9am, 80% 10am,
  • then B is independent of A
  • Miscoordination avoided → actions independent
    of state.
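A quick numeric illustration with the slides' numbers (the 0.56 meet probability is derived arithmetic, not a figure from the slides): because B's plan looks the same from every state it cannot distinguish, the joint outcome distribution is simply the product of the two plans and can be checked against the reward threshold in advance.

    p_a = {'9am': 0.4, '10am': 0.6}  # A's randomized plan (slide 8)
    p_b = {'9am': 0.2, '10am': 0.8}  # B's state-independent plan (this slide)
    p_meet = sum(p_a[t] * p_b[t] for t in p_a)  # 0.4*0.2 + 0.6*0.8
    print(p_meet)  # 0.56, whatever A actually chooses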

16
Experimental Results
[Figure: 3-D plot of experimental results; axis labels not recovered in the transcript]
17
Experimental Conclusions
  • Reward threshold decreases → entropy increases
  • Communication increases → agents coordinate
    better
  • Coordination invisible to adversary
  • Agents coordinate better to fool the adversary
  • Increased communication → higher entropy!

18
Summary
  • Randomized policies in multiagent MDP settings
  • Developed a non-linear program (NLP) to maximize
    weighted entropy under reward and communication
    constraints.
  • Provided a transformation algorithm to explicitly
    reason about communication actions.
  • Showed that communication increases security.

19
  • Thank You
  • Any questions?