Title: Increasing Security through Communication and Policy Randomization in Multiagent Systems
1. Increasing Security through Communication and Policy Randomization in Multiagent Systems
- Praveen Paruchuri, Milind Tambe, Fernando Ordonez
- University of Southern California
- Sarit Kraus
- Bar-Ilan University, Israel
- University of Maryland, College Park
2. Motivation: The Prediction Game
- A UAV (Unmanned Aerial Vehicle)
- Flies between the 4 regions
- Can you predict the UAV's flight pattern?
- Pattern 1
- 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, ...
- Pattern 2
- 1, 4, 3, 1, 1, 4, 2, 4, 2, 3, 4, 3, ... (as generated by a 4-sided die)
- Could you predict Pattern 2 even if given 100 numbers?
- Randomization decreases predictability
- Increases security
3. Problem Definition
- Problem: Increase security by decreasing predictability for an agent team acting in adversarial environments.
- Even if the policy is given to the adversary, it remains secure
- Environment is stochastic and observable (MDP-based)
- Communication is a limited resource
- Efficient algorithms for the reward/randomization/communication tradeoff
4. Assumptions
- Assumptions for the agent team
- Adversary is unobservable
- Adversary's actions, capabilities, and payoffs are unknown
- Communication is encrypted (safe)
- Assumptions for the adversary
- Knows the agents' plan/policy
- Exploits action predictability
- Can see the agents' state
5. Solution Technique
- Technique developed
- Intentional policy randomization
- CMDP-based framework
- Sequential decision making
- Limited communication resources
- CMDP = Constrained Markov Decision Process
- Increase security → solve a multi-criteria problem for agents
- Maximize action unpredictability (policy randomization)
- Maintain reward above threshold (quality constraints)
- Keep communication usage below threshold (resource constraints)
6. Domains
- Scheduled activities at airports, such as security checks, refueling, etc.
- Can be observed by adversaries
- Randomization of schedules is helpful
- UAV team patrolling a humanitarian mission
- Adversary disrupts the mission: can disrupt food supplies, harm refugees, shoot down UAVs, etc.
- Randomize the UAV patrol policy
7. Our Contributions
- Randomized policies for Multiagent CMDPs (MCMDPs)
- Solve miscoordination
- Randomized policies in team settings
- Policy not implementable! (Reward constraint gets violated)
- Objective: maximize policy randomization, subject to
- Expected team reward > threshold
- Communication resource usage < threshold
8. Miscoordination: Effect of Randomization
- Meeting tomorrow
- 9am: 40%, 10am: 60%
- Communicate to coordinate
- Limited communication
- Without communication, the miscoordination probability should have been 0 but is not (violates threshold rewards)
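A quick worked check of this meeting example (a minimal sketch; the 40/60 split is from the slide, while the 0/1 reward model for coordination vs. miscoordination is an assumption for illustration):

```python
# Both agents independently randomize over meeting times with the
# same 40/60 split. Assumed reward model: 1 if they pick the same
# time, 0 if they miscoordinate.
p_a = {"9am": 0.4, "10am": 0.6}  # agent A's randomized plan
p_b = {"9am": 0.4, "10am": 0.6}  # agent B's plan (no communication)

p_coordinate = sum(p_a[t] * p_b[t] for t in p_a)  # 0.4*0.4 + 0.6*0.6 = 0.52
p_miscoordinate = 1.0 - p_coordinate              # 0.48, should have been 0

print(f"P(coordinate)    = {p_coordinate:.2f}")
print(f"P(miscoordinate) = {p_miscoordinate:.2f}")  # expected reward drops accordingly
```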
9. Communication Issue
- Generate randomized, implementable policies
- Limited communication
- The communication problem:
- M coordination points
- N units of communication
- Generate the best communication policy
- The communication policy can also be randomized
- Transform the MCMDP into an implementable MCMDP
- Solution algorithm for the transformed MCMDP
10. MCMDP Formally Defined
- An MCMDP (for a 2-agent case) is the tuple <S, A, P, R, C1, C2, T1, T2, N, Q> where:
- S, A, R: joint states, actions, rewards
- P: transition function
- Ck: cost vector for resource k (here k = 1, 2)
- Tk: threshold on expected consumption of resource k
- N: joint communication cost vector
- Q: threshold on communication costs
- Basic terms used
- x(s,a): expected number of times action a is executed in state s
- Policy as a function of x: π(a|s) = x(s,a) / Σ_a' x(s,a')
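A minimal sketch of reading a randomized policy off the flow variables, using the standard CMDP relation π(a|s) = x(s,a) / Σ_a' x(s,a'); the state and action names are illustrative placeholders, not from the paper:

```python
# Flow variables x(s,a): expected number of times action a is
# executed in state s (toy numbers, illustrative only).
x = {
    ("s1", "patrol_region_1"): 0.3,
    ("s1", "patrol_region_2"): 0.7,
    ("s2", "patrol_region_1"): 0.5,
    ("s2", "patrol_region_2"): 0.5,
}

def policy_from_flows(x):
    """pi(a|s) = x(s,a) / sum over a' of x(s,a')."""
    totals = {}
    for (s, _), v in x.items():
        totals[s] = totals.get(s, 0.0) + v
    return {(s, a): v / totals[s] for (s, a), v in x.items() if totals[s] > 0}

pi = policy_from_flows(x)
print(pi[("s1", "patrol_region_1")])  # 0.3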
11. Entropy: Measure of Randomness
- Randomness or information content is quantified using entropy (Shannon 1948)
- Entropy for a CMDP:
- Additive entropy: add the entropies of each state
- H_A(x) = -Σ_s Σ_a π(a|s) log π(a|s)
- Weighted entropy: weigh each state's entropy by its contribution to the total flow
- H_W(x) = -Σ_s (Σ_a x(s,a) / Σ_j α_j) Σ_a π(a|s) log π(a|s)
- where α_j is the initial flow of the system
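A small sketch of the two entropy measures, computed directly from the flow variables; the numbers and the initial flows α are assumptions for illustration:

```python
import math

# Toy flow variables and initial flows (illustrative only).
x = {("s1", "a1"): 0.3, ("s1", "a2"): 0.7,
     ("s2", "a1"): 0.5, ("s2", "a2"): 0.5}
alpha = {"s1": 0.6, "s2": 0.4}  # assumed initial flow into each state

def state_flows(x):
    totals = {}
    for (s, _), v in x.items():
        totals[s] = totals.get(s, 0.0) + v
    return totals

def additive_entropy(x):
    # H_A(x) = -sum_s sum_a pi(a|s) log pi(a|s)
    totals = state_flows(x)
    return -sum((v / totals[s]) * math.log(v / totals[s])
                for (s, _), v in x.items() if v > 0)

def weighted_entropy(x, alpha):
    # H_W(x): weigh each state's entropy by its share of the total
    # initial flow; the x(s,a)/totals[s] factor is pi(a|s).
    totals = state_flows(x)
    total_alpha = sum(alpha.values())
    return -sum((totals[s] / total_alpha) * (v / totals[s]) * math.log(v / totals[s])
                for (s, _), v in x.items() if v > 0)

print(additive_entropy(x), weighted_entropy(x, alpha))
```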
12. Issue 1: Randomized Policy Generation
- Non-linear program: maximize entropy, keep reward above threshold, keep communication below threshold (a toy sketch follows below)
- Obtains the required randomization
- Appends communication for every action
- Issue 2: Generate the communication policy
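A sketch of the non-linear program on a toy single-agent CMDP, not the paper's full MCMDP formulation: the states, rewards, and threshold below are assumed, all actions lead to a terminal state (so flow conservation reduces to Σ_a x(s,a) = α_s), the communication constraint is omitted, and scipy's SLSQP stands in for whatever solver the authors used:

```python
import numpy as np
from scipy.optimize import minimize

alpha = np.array([0.5, 0.5])   # assumed initial flow into s1, s2
R = np.array([[10.0, 2.0],     # assumed rewards R[s][a]
              [8.0, 1.0]])
E_min = 6.0                    # assumed reward threshold

def neg_weighted_entropy(x_flat):
    # Objective: minimize -H_W(x) = (1 / sum alpha) * sum x(s,a) log pi(a|s)
    x = x_flat.reshape(2, 2)
    pi = x / x.sum(axis=1, keepdims=True)
    return float(np.sum(x * np.log(pi + 1e-12)) / alpha.sum())

constraints = [
    # Flow conservation: sum_a x(s,a) = alpha_s for each state.
    {"type": "eq", "fun": lambda x: x.reshape(2, 2).sum(axis=1) - alpha},
    # Quality constraint: expected reward stays above the threshold.
    {"type": "ineq", "fun": lambda x: float(np.dot(x, R.flatten())) - E_min},
]

res = minimize(neg_weighted_entropy, np.full(4, 0.25),
               bounds=[(1e-9, None)] * 4,
               constraints=constraints, method="SLSQP")
x_opt = res.x.reshape(2, 2)
print("randomized policy:\n", x_opt / x_opt.sum(axis=1, keepdims=True))
print("expected reward:", float(np.dot(res.x, R.flatten())))
```

With these numbers the uniform policy earns only 5.25, so the reward constraint binds and the solver trades some entropy for reward, which is exactly the tradeoff the slide describes.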
13. Issue 2: Transformed MCMDP
[Figure: transformation for state S1 — each joint action (a1b1, a1b2, a2b1, a2b2) is split into a communicate (C) branch and a no-communicate (NC) branch through new intermediate states]
For each state and each joint action:
- Introduce a communicate (C) and a no-communicate (NC) version of each individual action, and add the corresponding new states
- Add transitions between the original states and the new states
- Add transitions between the new states and the original target states
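A structural sketch of this transformation; the state and action names are illustrative, and only the state/transition bookkeeping is shown (not transition probabilities, rewards, or communication costs):

```python
from itertools import product

states = ["s1", "s2"]
actions_a = ["a1", "a2"]   # agent A's individual actions
actions_b = ["b1", "b2"]   # agent B's individual actions

def transform(states, actions_a, actions_b):
    """For each (state, A-action) pair, add a C and an NC
    intermediate state; agent B then acts from the new state."""
    new_states, transitions = [], []
    for s, a in product(states, actions_a):
        for mode in ("C", "NC"):  # communicate / do not communicate
            mid = (s, a, mode)
            new_states.append(mid)
            transitions.append((s, (a, mode), mid))  # original -> new
            for b in actions_b:
                # new -> original target states; in the real MCMDP the
                # target comes from the joint transition function P.
                transitions.append((mid, b, f"target({s},{a},{b})"))
    return new_states, transitions

new_states, transitions = transform(states, actions_a, actions_b)
print(len(new_states), "new states;", len(transitions), "transitions")
```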
14. Non-linear Constraints
- Need to introduce non-linear constraints
- For each original state
- For each new state introduced by a no-communication action
- The conditional probabilities of corresponding actions must be equal
- E.g., P(b1 | NC state after a1) = P(b1 | NC state after a2)
- P(b2 | NC state after a1) = P(b2 | NC state after a2)
- States reached by a communication action are observable to agent B
- States reached by a no-communication action are unobservable to agent B
15. Non-linear Constraints: Handling Miscoordination
- Agent B has no hint of the state after NC (no-communication) actions
- Necessary to make its actions independent of the source state
- The probability of action b1 from the NC state after a1 should equal the probability of the same action (i.e., b1) from the NC state after a2
- Meeting scenario
- Irrespective of agent A's plan
- If agent B's plan is 20% 9am, 80% 10am
- B is independent of A
- Miscoordination avoided → actions independent of state (see the sketch below)
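A small sketch of checking these equality constraints on a candidate policy for agent B; in the actual NLP they are non-linear because B's conditional probabilities are ratios of the flow variables x. All names and numbers here are illustrative:

```python
# pi_b maps (new NC state, B action) -> probability; the 20/80 split
# mirrors the meeting example on this slide.
pi_b = {
    (("s1", "a1", "NC"), "b1"): 0.2, (("s1", "a1", "NC"), "b2"): 0.8,
    (("s1", "a2", "NC"), "b1"): 0.2, (("s1", "a2", "NC"), "b2"): 0.8,
}

def nc_constraints_satisfied(pi_b, s, actions_a, actions_b, tol=1e-9):
    """For every B action b, P(b | s,a,NC) must be equal across all
    of A's actions a, since B cannot tell the NC states apart."""
    for b in actions_b:
        probs = [pi_b[((s, a, "NC"), b)] for a in actions_a]
        if max(probs) - min(probs) > tol:
            return False
    return True

print(nc_constraints_satisfied(pi_b, "s1", ["a1", "a2"], ["b1", "b2"]))  # True
```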
16. Experimental Results
[Figure: 3-D plot of experimental results (axis labels not recoverable)]
17. Experimental Conclusions
- Reward threshold decreases → entropy increases
- Communication increases → agents coordinate better
- Coordination is invisible to the adversary
- Agents coordinate better to fool the adversary
- Increased communication → higher entropy!
18. Summary
- Randomized policies in multiagent MDP settings
- Developed an NLP to maximize weighted entropy under reward and communication constraints
- Provided a transformation algorithm to explicitly reason about communication actions
- Showed that communication increases security
19. Thank You
- Any questions?