Title: Increasing Security through Communication and Policy Randomization in Multiagent Systems
1. Increasing Security through Communication and Policy Randomization in Multiagent Systems
- Praveen Paruchuri, Milind Tambe, Fernando Ordonez
- University of Southern California
- Sarit Kraus
- Bar-Ilan University, Israel
- University of Maryland, College Park
2. Motivation: The Prediction Game
- A UAV (Unmanned Aerial Vehicle)
- Flies between the 4 regions
- Can you predict the UAV's flight pattern?
- Pattern 1
- 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, ...
- Pattern 2
- 1, 4, 3, 1, 1, 4, 2, 4, 2, 3, 4, 3, ... (as generated by a 4-sided die)
- Could you predict Pattern 2 even if given 100 numbers?
- Randomization decreases predictability
- Increases security
3. Problem Definition
- Problem: Increase security by decreasing predictability for an agent team acting in adversarial environments.
- Even if the policy is given to the adversary, it remains secure
- Environment is stochastic and observable (MDP-based)
- Communication is a limited resource
- Efficient algorithms for the reward/randomization/communication tradeoff
4. Assumptions
- Assumptions for the agent team
- Adversary is unobservable
- Adversary's actions, capabilities, and payoffs are unknown
- Communication is encrypted (safe)
- Assumptions for the adversary
- Knows the agents' plan/policy
- Exploits action predictability
- Can see the agents' state
5. Solution Technique
- Technique developed
- Intentional policy randomization
- CMDP-based framework
- Sequential decision making
- Limited communication resources
- CMDP = Constrained Markov Decision Process
- Increase security → solve a multi-criteria problem for agents
- Maximize action unpredictability (policy randomization)
- Maintain reward above threshold (quality constraints)
- Keep communication usage below threshold (resource constraints)
6. Domains
- Scheduled activities at airports, such as security checks, refueling, etc.
- Can be observed by adversaries
- Randomization of schedules is helpful
- UAV team patrolling a humanitarian mission
- Adversary disrupts the mission: can disrupt food supplies, harm refugees, shoot down UAVs, etc.
- Randomize the UAV patrol policy
7. Our Contributions
- Randomized policies for Multiagent CMDPs (MCMDPs)
- Solve miscoordination
- Randomized policies in team settings
- Policy not implementable! (Reward constraint gets violated)
- Objective: maximize policy randomization, subject to
- Expected team reward > threshold
- Communication resource usage < threshold
8. Miscoordination: Effect of Randomization
- Meeting tomorrow
- 9am: 40%, 10am: 60%
- Communicate to coordinate
- Limited communication
- Without communication, the miscoordination probability should have been 0 but is not (violates threshold rewards)
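A quick worked check of this meeting example (a minimal sketch; the 40/60 split is from the slide, while the 0/1 reward model for coordination vs. miscoordination is an assumption for illustration):

```python
# Both agents independently randomize over meeting times with the
# same 40/60 split. Assumed reward model: 1 if they pick the same
# time, 0 if they miscoordinate.
p_a = {"9am": 0.4, "10am": 0.6}  # agent A's randomized plan
p_b = {"9am": 0.4, "10am": 0.6}  # agent B's plan (no communication)

p_coordinate = sum(p_a[t] * p_b[t] for t in p_a)  # 0.4*0.4 + 0.6*0.6 = 0.52
p_miscoordinate = 1.0 - p_coordinate              # 0.48, should have been 0

print(f"P(coordinate)    = {p_coordinate:.2f}")
print(f"P(miscoordinate) = {p_miscoordinate:.2f}")  # expected reward drops accordingly
```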
9. Communication Issue
- Generate randomized, implementable policies
- Limited communication
- The communication problem:
- M coordination points
- N units of communication
- Generate the best communication policy
- The communication policy can also be randomized
- Transform the MCMDP into an implementable MCMDP
- Solution algorithm for the transformed MCMDP
10. MCMDP Formally Defined
- An MCMDP (for a 2-agent case) is the tuple <S, A, P, R, C1, C2, T1, T2, N, Q> where:
- S, A, R: joint states, actions, rewards
- P: transition function
- Ck: cost vector for resource k (here k = 1, 2)
- Tk: threshold on expected consumption of resource k
- N: joint communication cost vector
- Q: threshold on communication costs
- Basic terms used
- x(s,a): expected number of times action a is executed in state s
- Policy as a function of x: π(a|s) = x(s,a) / Σ_a' x(s,a')
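A minimal sketch of reading a randomized policy off the flow variables, using the standard CMDP relation π(a|s) = x(s,a) / Σ_a' x(s,a'); the state and action names are illustrative placeholders, not from the paper:

```python
# Flow variables x(s,a): expected number of times action a is
# executed in state s (toy numbers, illustrative only).
x = {
    ("s1", "patrol_region_1"): 0.3,
    ("s1", "patrol_region_2"): 0.7,
    ("s2", "patrol_region_1"): 0.5,
    ("s2", "patrol_region_2"): 0.5,
}

def policy_from_flows(x):
    """pi(a|s) = x(s,a) / sum over a' of x(s,a')."""
    totals = {}
    for (s, _), v in x.items():
        totals[s] = totals.get(s, 0.0) + v
    return {(s, a): v / totals[s] for (s, a), v in x.items() if totals[s] > 0}

pi = policy_from_flows(x)
print(pi[("s1", "patrol_region_1")])  # 0.3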
11. Entropy: Measure of Randomness
- Randomness or information content is quantified using entropy (Shannon 1948)
- Entropy for a CMDP:
- Additive entropy: add the entropies of each state
- H_A(x) = -Σ_s Σ_a π(a|s) log π(a|s)
- Weighted entropy: weigh each state's entropy by its contribution to the total flow
- H_W(x) = -Σ_s (Σ_a x(s,a) / Σ_j α_j) Σ_a π(a|s) log π(a|s)
- where α_j is the initial flow of the system
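A small sketch of the two entropy measures, computed directly from the flow variables; the numbers and the initial flows α are assumptions for illustration:

```python
import math

# Toy flow variables and initial flows (illustrative only).
x = {("s1", "a1"): 0.3, ("s1", "a2"): 0.7,
     ("s2", "a1"): 0.5, ("s2", "a2"): 0.5}
alpha = {"s1": 0.6, "s2": 0.4}  # assumed initial flow into each state

def state_flows(x):
    totals = {}
    for (s, _), v in x.items():
        totals[s] = totals.get(s, 0.0) + v
    return totals

def additive_entropy(x):
    # H_A(x) = -sum_s sum_a pi(a|s) log pi(a|s)
    totals = state_flows(x)
    return -sum((v / totals[s]) * math.log(v / totals[s])
                for (s, _), v in x.items() if v > 0)

def weighted_entropy(x, alpha):
    # H_W(x): weigh each state's entropy by its share of the total
    # initial flow; the x(s,a)/totals[s] factor is pi(a|s).
    totals = state_flows(x)
    total_alpha = sum(alpha.values())
    return -sum((totals[s] / total_alpha) * (v / totals[s]) * math.log(v / totals[s])
                for (s, _), v in x.items() if v > 0)

print(additive_entropy(x), weighted_entropy(x, alpha))
```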
12. Issue 1: Randomized Policy Generation
- Non-linear program: maximize entropy, keep reward above threshold, keep communication below threshold (a toy sketch follows below)
- Obtains the required randomization
- Appends communication for every action
- Issue 2: Generate the communication policy
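A sketch of the non-linear program on a toy single-agent CMDP, not the paper's full MCMDP formulation: the states, rewards, and threshold below are assumed, all actions lead to a terminal state (so flow conservation reduces to Σ_a x(s,a) = α_s), the communication constraint is omitted, and scipy's SLSQP stands in for whatever solver the authors used:

```python
import numpy as np
from scipy.optimize import minimize

alpha = np.array([0.5, 0.5])   # assumed initial flow into s1, s2
R = np.array([[10.0, 2.0],     # assumed rewards R[s][a]
              [8.0, 1.0]])
E_min = 6.0                    # assumed reward threshold

def neg_weighted_entropy(x_flat):
    # Objective: minimize -H_W(x) = (1 / sum alpha) * sum x(s,a) log pi(a|s)
    x = x_flat.reshape(2, 2)
    pi = x / x.sum(axis=1, keepdims=True)
    return float(np.sum(x * np.log(pi + 1e-12)) / alpha.sum())

constraints = [
    # Flow conservation: sum_a x(s,a) = alpha_s for each state.
    {"type": "eq", "fun": lambda x: x.reshape(2, 2).sum(axis=1) - alpha},
    # Quality constraint: expected reward stays above the threshold.
    {"type": "ineq", "fun": lambda x: float(np.dot(x, R.flatten())) - E_min},
]

res = minimize(neg_weighted_entropy, np.full(4, 0.25),
               bounds=[(1e-9, None)] * 4,
               constraints=constraints, method="SLSQP")
x_opt = res.x.reshape(2, 2)
print("randomized policy:\n", x_opt / x_opt.sum(axis=1, keepdims=True))
print("expected reward:", float(np.dot(res.x, R.flatten())))
```

With these numbers the uniform policy earns only 5.25, so the reward constraint binds and the solver trades some entropy for reward, which is exactly the tradeoff the slide describes.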
13. Issue 2: Transformed MCMDP
[Figure: transformation for state S1 — each joint action (a1b1, a1b2, a2b1, a2b2) is split into a communicate (C) branch and a no-communicate (NC) branch through new intermediate states]
For each state and each joint action:
- Introduce a communicate (C) and a no-communicate (NC) version of each individual action, and add the corresponding new states
- Add transitions between the original states and the new states
- Add transitions between the new states and the original target states
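A structural sketch of this transformation; the state and action names are illustrative, and only the state/transition bookkeeping is shown (not transition probabilities, rewards, or communication costs):

```python
from itertools import product

states = ["s1", "s2"]
actions_a = ["a1", "a2"]   # agent A's individual actions
actions_b = ["b1", "b2"]   # agent B's individual actions

def transform(states, actions_a, actions_b):
    """For each (state, A-action) pair, add a C and an NC
    intermediate state; agent B then acts from the new state."""
    new_states, transitions = [], []
    for s, a in product(states, actions_a):
        for mode in ("C", "NC"):  # communicate / do not communicate
            mid = (s, a, mode)
            new_states.append(mid)
            transitions.append((s, (a, mode), mid))  # original -> new
            for b in actions_b:
                # new -> original target states; in the real MCMDP the
                # target comes from the joint transition function P.
                transitions.append((mid, b, f"target({s},{a},{b})"))
    return new_states, transitions

new_states, transitions = transform(states, actions_a, actions_b)
print(len(new_states), "new states;", len(transitions), "transitions")
```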
14. Non-linear Constraints
- Need to introduce non-linear constraints
- For each original state
- For each new state introduced by a no-communication action
- The conditional probabilities of corresponding actions must be equal
- E.g., P(b1 | NC state after a1) = P(b1 | NC state after a2)
- P(b2 | NC state after a1) = P(b2 | NC state after a2)
- States reached by a communication action are observable to agent B
- States reached by a no-communication action are unobservable to agent B
15. Non-linear Constraints: Handling Miscoordination
- Agent B has no hint of the state after NC (no-communication) actions
- Necessary to make its actions independent of the source state
- The probability of action b1 from the NC state after a1 should equal the probability of the same action (i.e., b1) from the NC state after a2
- Meeting scenario
- Irrespective of agent A's plan
- If agent B's plan is 20% 9am, 80% 10am
- B is independent of A
- Miscoordination avoided → actions independent of state (see the sketch below)
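A small sketch of checking these equality constraints on a candidate policy for agent B; in the actual NLP they are non-linear because B's conditional probabilities are ratios of the flow variables x. All names and numbers here are illustrative:

```python
# pi_b maps (new NC state, B action) -> probability; the 20/80 split
# mirrors the meeting example on this slide.
pi_b = {
    (("s1", "a1", "NC"), "b1"): 0.2, (("s1", "a1", "NC"), "b2"): 0.8,
    (("s1", "a2", "NC"), "b1"): 0.2, (("s1", "a2", "NC"), "b2"): 0.8,
}

def nc_constraints_satisfied(pi_b, s, actions_a, actions_b, tol=1e-9):
    """For every B action b, P(b | s,a,NC) must be equal across all
    of A's actions a, since B cannot tell the NC states apart."""
    for b in actions_b:
        probs = [pi_b[((s, a, "NC"), b)] for a in actions_a]
        if max(probs) - min(probs) > tol:
            return False
    return True

print(nc_constraints_satisfied(pi_b, "s1", ["a1", "a2"], ["b1", "b2"]))  # True
```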
16. Experimental Results
[Figure: 3-D plot of experimental results (axis labels not recoverable)]
17. Experimental Conclusions
- Reward threshold decreases → entropy increases
- Communication increases → agents coordinate better
- Coordination is invisible to the adversary
- Agents coordinate better to fool the adversary
- Increased communication → higher entropy!
18. Summary
- Randomized policies in multiagent MDP settings
- Developed an NLP to maximize weighted entropy under reward and communication constraints
- Provided a transformation algorithm to explicitly reason about communication actions
- Showed that communication increases security
19. Thank You
- Any questions?