1. An empirical investigation on quantifying the competitive advantage of an agent learning models about other agents in a multiagent environment
A dissertation presented by Leonardo Garrido
2. Contents
- Introduction
- Experimental framework
- Probabilistic modeling approach
- Other learning approaches
- Conclusions
3. Introduction
- Motivation
- Modeling other agents
- Research hypothesis
- Our research path
4. Motivation
- Multiagent systems are usually complex and dynamic systems.
- This is especially true when agents are autonomous and heterogeneous, and individual preferences and strategies are kept private.
- Important agent beliefs are precisely the beliefs about the other agents.
5. Modeling other agents
- Observe the other agents' behavior in the multiagent environment.
- Learn, in an iterative and incremental way, internal models about the others.
- Predict their next behavior based on these models.
- Behave in a rational way based on the predictions.
6. Research hypothesis
In a multiagent system, an agent with capabilities for modeling other agents (i.e., learning and using internal models about the others) using a probabilistic approach should be able to gain a significant competitive advantage.
7. Our research path
- The empirical lower-limit has been established by the indifferent agent.
- The empirical upper-limit has been established by the oracle agent.
- For comparison purposes, we have also developed other non-modeling agents, such as the self-interested and collaborative agents.
- We developed a semi-modeler agent who is capable of exploiting probabilistic models about the others.
- Our bayesian-modeler agent uses a combined technique of bayesian and decision-theoretic approaches for exploiting and learning the models about the others.
- We have also developed other modeler agents using reinforcement learning techniques, such as the reinforcement-modeler agent.
- We have even compared, under this experimental framework, the performance of human beings modeling the other agents.
8. Our experimental framework
- The Meeting Scheduling Game
- Non-modeling basic strategies
- Other non-modeling strategies
- Experiments setup
- Experiments: lower- and upper-limits
- Closing remarks
9. The Meeting Scheduling Game: General features
- To allow self-interested and cooperative behavior.
- To show or hide agents' private information.
- To define different agent roles and strategies.
- To evaluate the advantage of modeling other agents.
- To evaluate different modeling mechanisms.
10. The Meeting Scheduling Game: Description
- Each game is composed of a series of rounds.
- At the beginning of the game, each agent is initialized with a role and a strategy from predefined sets of roles and strategies.
- Each agent has a calendar composed of eight slots; it is randomly scrambled after each round with a predefined calendar density (the percentage of occupied slots in the calendar).
- At each round, each agent simultaneously proposes a slot according to his own role/strategy and his calendar availability.
- At the end of the game, the agent with the highest accumulated points wins.
11. The Meeting Scheduling Game: Roles and strategies
- Strategies are rules that tell the agent which action to choose at each decision point. Strategies can take into account only the agent's own preference profile, or they can also use models about the others.
- A role is defined by a preference profile, which is coded as a calendar slot utility function ranking each slot from the most preferable to the least preferable. For example:
- Early-rising: prefers the early hours of the day.
- Night-owl: prefers meetings as late as possible.
- Medium: prefers hours around noon.
- Extreme: prefers meetings early in the morning or late in the afternoon.
12. The Meeting Scheduling Game: Calendar slot utility examples
- Considering calendars of eight slots, the calendar slot utility function U for each role would be (see the sketch below):
- Early-rising: U = (s0:8, s1:7, s2:6, s3:5, s4:4, s5:3, s6:2, s7:1)
- Night-owl: U = (s0:1, s1:2, s2:3, s3:4, s4:5, s5:6, s6:7, s7:8)
- Medium: U = (s0:2, s1:4, s2:6, s3:8, s4:7, s5:5, s6:3, s7:1)
- Extreme: U = (s0:1, s1:3, s2:5, s3:7, s4:8, s5:6, s6:4, s7:2)
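As a minimal sketch (the slides do not prescribe an implementation; the names below are illustrative), these preference profiles can be encoded as plain lookup tables in Python:

```python
# Illustrative encoding of the four preference profiles on slide 12.
# Each role maps slot index 0..7 (s0..s7) to its utility.
ROLE_UTILITIES = {
    "early_rising": [8, 7, 6, 5, 4, 3, 2, 1],
    "night_owl":    [1, 2, 3, 4, 5, 6, 7, 8],
    "medium":       [2, 4, 6, 8, 7, 5, 3, 1],
    "extreme":      [1, 3, 5, 7, 8, 6, 4, 2],
}

def slot_utility(role, slot):
    """Calendar slot utility U_role(slot) for slot indices 0..7."""
    return ROLE_UTILITIES[role][slot]
```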
13. The Meeting Scheduling Game: Description
- After each round:
- Several teams are formed.
- Each team is composed of all those agents who proposed the same slot.
- Each team's joint utility (TJU) is calculated.
- The winning team is selected (the one with the highest TJU).
- Each agent accumulates points according to the scoring procedure.
14. The Meeting Scheduling Game: Joint utilities and winning teams
- After each round, a team joint utility (TJU) is calculated for each team t by summing up all the team members' calendar utilities at the slot s_t chosen by the team:
- TJU(t) = Σ_{m ∈ t} U_m(s_t)
- The winning team is the one with the highest TJU.
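A short sketch of this computation, assuming proposals are collected as an agent-to-slot mapping and reusing slot_utility from the slide 12 sketch (all names illustrative):

```python
from collections import defaultdict

def team_joint_utilities(proposals, roles):
    """TJU(t) = sum of each member's utility at the team's common slot.

    proposals: dict agent -> proposed slot; a team is the set of agents
    that proposed the same slot.  roles: dict agent -> role name.
    """
    teams = defaultdict(list)
    for agent, slot in proposals.items():
        teams[slot].append(agent)
    return {slot: sum(slot_utility(roles[a], slot) for a in members)
            for slot, members in teams.items()}

def winning_team(proposals, roles):
    """Return the winning slot (team) and its TJU."""
    tju = team_joint_utilities(proposals, roles)
    best_slot = max(tju, key=tju.get)  # team with the highest TJU
    return best_slot, tju[best_slot]
```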
15. The Meeting Scheduling Game: Scoring procedures
- All the players outside the winning team accumulate zero points.
- Each agent a in the winning team t with slot s_t accumulates points according to one of the following scoring procedures:
- Individual scoring: accumulate just his individual contribution to the team, U_a(s_t).
- Group scoring: accumulate the team joint utility, TJU(t).
- Mixed scoring: accumulate his own slot utility plus the TJU, that is, TJU(t) + U_a(s_t).
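Continuing the same sketch, the three scoring procedures could look as follows (winning_team and slot_utility come from the earlier illustrative snippets):

```python
def round_scores(proposals, roles, scoring="mixed"):
    """Points per agent for one round. Players outside the winning team
    get zero; winners score by the chosen procedure (slide 15)."""
    best_slot, tju = winning_team(proposals, roles)
    scores = {}
    for agent, slot in proposals.items():
        if slot != best_slot:
            scores[agent] = 0                                 # not a winner
        elif scoring == "individual":
            scores[agent] = slot_utility(roles[agent], slot)  # U_a(s_t)
        elif scoring == "group":
            scores[agent] = tju                               # TJU(t)
        else:                                                 # mixed
            scores[agent] = tju + slot_utility(roles[agent], slot)
    return scores
```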
16. Non-modeling basic strategies: Indifferent and oracle
- The indifferent strategy:
- Proposes a slot using a uniform (equiprobable) distribution.
- The oracle strategy (sketched below):
- Knows the others' calendars, roles, and strategies.
- Sees in advance the others' choices.
- For each free slot s in his calendar, finds the agent m who would earn the maximum gain G_m(s) among the rest of the players, and calculates the utility U(s) = G_o(s) − G_m(s).
- Proposes the slot with the highest utility: arg max_s U(s).
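A minimal sketch of both strategies, assuming a helper gain(agent, slot) that returns the points an agent would earn if the round settled on that slot (the oracle can evaluate it because it sees calendars, roles, strategies, and choices; the helper itself is not shown):

```python
import random

def indifferent_propose(free_slots):
    """Indifferent strategy: uniform random choice among free slots."""
    return random.choice(free_slots)

def oracle_propose(free_slots, others, gain):
    """Oracle strategy sketch: maximize the margin over the most
    dangerous opponent at each candidate slot."""
    def utility(s):
        # U(s) = G_o(s) - G_m(s), with m the strongest other player at s
        return gain("oracle", s) - max(gain(m, s) for m in others)
    return max(free_slots, key=utility)  # arg max_s U(s)
```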
17. Other non-modeling strategies: Self-centered and team-centered
- The self-centered strategy:
- Proposes the slot which maximizes its own calendar utility.
- The team-centered strategy (sketched below):
- Proposes the slot that was proposed by the biggest team (greatest number of members) in the previous round.
- In case of ties, ranks the slots according to its own calendar utility.
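A sketch under the same assumptions, reusing slot_utility from slide 12; prev_proposals is the agent-to-slot mapping observed in the previous round:

```python
from collections import Counter

def self_centered_propose(free_slots, role):
    """Propose the free slot with the highest own utility."""
    return max(free_slots, key=lambda s: slot_utility(role, s))

def team_centered_propose(free_slots, prev_proposals, role):
    """Propose the slot chosen by the biggest team last round; break ties
    (and handle the first round, with no history) by own utility."""
    counts = Counter(s for s in prev_proposals.values() if s in free_slots)
    if not counts:
        return self_centered_propose(free_slots, role)
    biggest = max(counts.values())
    candidates = [s for s, c in counts.items() if c == biggest]
    return max(candidates, key=lambda s: slot_utility(role, s))
```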
18. Experiments setup
- Experimental scenarios: sets of different but related experiments.
- An experiment is a series of plays (500) of the MSG.
- Each game is composed of ten rounds.
- At the beginning of each game, each agent starts with a random role.
- We use the mixed scoring procedure.
- After each round, each agent's calendar is randomly reset to a fixed calendar density of 50%.
19. Experiments: Establishing lower- and upper-limits
- Goal: To compare the performance of the basic non-modeling strategies.
- The indifferent strategy is always the worst one.
- Unexpected result: the self-centered strategy outperformed the team-centered one.
- Hypothesis: the team-centered strategy could be better if we increased the number of players.
20. Experiments: Establishing lower- and upper-limits
- Goal: To investigate the performance of the self-centered and team-centered strategies when increasing the number of players.
- The team-centered strategy indeed starts to outperform the self-centered one when we increase the number of players.
- Team-centered agents tend to team up with each other, outperforming the others.
21. Experiments: Establishing lower- and upper-limits
- Goal: To evaluate the performance of the indifferent and oracle strategies.
- Clearly, the indifferent agent has the worst performance, setting the empirical lower-limit.
- The oracle strategy clearly outperforms the other strategies, setting the empirical upper-limit.
- Although a higher oracle performance could be expected, the result is reasonable: the oracle agent cannot always win, because of the random calendars, and sometimes he ties games due to the collaborative features of the MSG.
22. Closing remarks
- We explored a collection of initial reference points for the characterization of modeler agents' performance.
- The indifferent and oracle strategies provide the extremes of the spectrum, ranging from the least- to the most-informed one.
- The self-centered and team-centered agents were used as another pair of fixed non-modeling strategies for comparing and situating the empirical lower- and upper-limits.
23. Our probabilistic modeling approach
- Probabilistic model representation
- Exploiting models about the others
- Experiments: refining the lower- and upper-limits
- Learning models about the others
- Experiments: situating our modeler agent
- Closing remarks
24. Probabilistic model representation
- Basic models:
- Vectors recording a probability distribution over the actual character of the modeled agent:
- Role model
- Strategy model
- Combined model:
- A two-dimensional matrix where each element is based on the basic models:
- Personality model
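One way to realize this representation (a sketch only; the slides do not fix the construction) is to keep the basic models as probability vectors and, assuming role and strategy are independent, form the combined personality model as their outer product:

```python
import numpy as np

ROLES = ["early_rising", "night_owl", "medium", "extreme"]
STRATEGIES = ["self_centered", "team_centered"]  # illustrative set

# Basic models: probability distributions over roles and strategies.
role_model = np.full(len(ROLES), 1 / len(ROLES))            # equiprobable
strategy_model = np.full(len(STRATEGIES), 1 / len(STRATEGIES))

# Combined personality model rs_a(i, j): under an independence assumption,
# one natural construction is the outer product of the basic models.
personality_model = np.outer(role_model, strategy_model)
assert np.isclose(personality_model.sum(), 1.0)  # still a distribution
```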
25. Exploiting models about the others: The semi-modeler strategy
- Start with predefined and static role and strategy models about the others.
- For each agent a, generate his personality model rs_a and generate a set O with all the possible opponent scenarios that the semi-modeler could face. Each scenario o ∈ O is a combination of possible personalities of the other agents.
- For each possible scenario o ∈ O, incrementally accumulate the expected utility EU(s_o) of proposing slot s_o under that possible scenario.
- Propose the slot with the maximum expected utility: arg max_{s_o} EU(s_o).
26. Exploiting models about the others: The semi-modeler strategy
- For each possible scenario o ∈ O (see the sketch after this list):
- Assuming that this scenario o represents the actual personalities of the other agents, run the oracle strategy in order to get the best slot to propose, s_o, and its utility U(s_o) under this assumption. Let us call r the outcome of the action of choosing slot s_o.
- Calculate the probability P(r | s_o): simply the product of the corresponding probabilities in each agent's personality model involved in this scenario.
- The utility of this outcome, U(r), is precisely the utility U(s_o) obtained in the previous step.
- In order to incrementally get the expected utility of s_o:
- EU(s_o) = Σ_i P(r_i | s_o) · U(r_i)
- Calculate the product P(r_i | s_o) · U(r_i) and accumulate it with the products from previous possible scenarios where the slot s_o had been chosen.
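A compact sketch of this loop, assuming personality_models[a] maps each possible personality of agent a to its probability, and oracle_best(scenario) runs the oracle strategy as if the given personalities were the truth, returning the best slot and its utility:

```python
from itertools import product

def semi_modeler_propose(others, personality_models, oracle_best):
    """Accumulate EU(s_o) = sum_i P(r_i | s_o) * U(r_i) over all
    opponent scenarios and propose the slot with the maximum EU."""
    expected = {}
    options = [list(personality_models[a]) for a in others]
    for scenario in product(*options):        # every combination o in O
        slot, utility = oracle_best(dict(zip(others, scenario)))
        # P(r | s_o): product of each agent's personality probability
        prob = 1.0
        for a, p in zip(others, scenario):
            prob *= personality_models[a][p]
        expected[slot] = expected.get(slot, 0.0) + prob * utility
    return max(expected, key=expected.get)    # arg max_{s_o} EU(s_o)
```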
27. Experiments: Refining the lower- and upper-limits
- Goal: To compare the performance of the semi-modeler strategy with different fixed models about the others.
- The semi-modeler strategy with static correct models clearly outperforms the other strategies.
- A semi-modeler strategy with static equiprobable models performs lower, at around 50%.
- A semi-modeler strategy with opposite models has the worst performance, outperformed even by the self-centered one.
28. Learning models about the others: The bayesian-modeler strategy
- In the first round, start with equiprobable models about the others, run the semi-modeler strategy, and propose the resulting slot.
- At the next round, for each other agent a:
- Observe his previous proposal and update his personality model using a bayesian updating mechanism.
- Decompose the updated personality model in order to build two new separate role and strategy models.
- Using the new updated models, run the semi-modeler strategy and propose the slot s_m with the maximum expected utility.
- If it was the last round, the game is over. Otherwise go to the second step.
29. Learning models about the others: The bayesian-modeler strategy
- At the next round, for each other agent a:
- Observe his previous proposal s_a and update his personality model rs_a using a bayesian updating mechanism, obtaining the posterior probabilities of agent a's personality per_a(i,j) given that agent a proposed slot pro_a(s_a) in the previous round:
- rs_a(i,j) = P(per_a(i,j) | pro_a(s_a))
- Decompose a's updated personality model in order to build two new separate role and strategy models; that is, update each element in r_a and s_a:
- r_a(i) = Σ_j rs_a(i,j) and s_a(j) = Σ_i rs_a(i,j)
30. Learning models about the others: The bayesian-modeler strategy
- The model-updating mechanism is based on the well-known Bayes rule:
- P(H | e) = P(e | H) · P(H) / P(e)
- Considering multi-valued random variables, the basic probability axioms, and some algebra, the evidence term expands as a sum over all the competing hypotheses:
- P(pro_a(s_a)) = Σ_{i,j} P(pro_a(s_a) | per_a(i,j)) · P(per_a(i,j))
- Thus, the equation used to update each personality model is (the rs_a values on the right-hand side are the priors from the previous round):
- rs_a(i,j) = P(pro_a(s_a) | per_a(i,j)) · rs_a(i,j) / Σ_{i',j'} P(pro_a(s_a) | per_a(i',j')) · rs_a(i',j')
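A minimal sketch of the update and decomposition steps, assuming the likelihoods P(pro_a(s_a) | per_a(i,j)) are available (in the MSG they can be derived by simulating what each personality would propose):

```python
def bayes_update(personality_model, likelihood, observed_slot):
    """Posterior personality model for one opponent (slides 28-30).

    personality_model: dict personality -> prior P(per).
    likelihood(slot, personality): assumed helper, P(pro(slot) | per).
    """
    unnormalized = {p: likelihood(observed_slot, p) * prior
                    for p, prior in personality_model.items()}
    evidence = sum(unnormalized.values())  # P(e): sum over hypotheses
    return {p: v / evidence for p, v in unnormalized.items()}

def decompose(personality_model):
    """Marginalize the joint (role, strategy) model into the basic
    models: r(i) = sum_j rs(i,j), s(j) = sum_i rs(i,j)."""
    role, strat = {}, {}
    for (i, j), p in personality_model.items():
        role[i] = role.get(i, 0.0) + p
        strat[j] = strat.get(j, 0.0) + p
    return role, strat
```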
31. Experiments: Situating our modeler agent
- Goal: To evaluate the performance of the bayesian-modeler strategy against the oracle and semi-modeler strategies.
- The bayesian-modeler's performance is better than any semi-modeler's.
- The bayesian-modeler's performance is close to the oracle's.
- Question 1: Why is the bayesian-modeler's performance not as good as the oracle's?
- Question 2: How fast can the bayesian-modeler learn the models about the others?
32. Experiments: Situating our modeler agent
- Goal: To evaluate the performance of the bayesian-modeler strategy varying the number of rounds available for learning.
- Clearly, the bayesian-modeler's performance improves as the number of rounds increases.
- After 13 rounds, its performance does not improve very much, but it is already very close to the oracle's performance.
33. Experiments: Situating our modeler agent
- Goal: To evaluate the performance of the bayesian-modeler strategy when it faces unreliable agents (after 20 rounds).
- Lying agents: impostor agents that broadcast a false personality (the opposite one) at the beginning of the game.
- Fickle agents: unstable agents that start with an unknown personality and randomly change it in the middle of the game.
34. Closing remarks
- The modeling mechanism used by the bayesian-modeler has two main advantages:
- The decision-theoretic approach chooses the rational decision at each round of the game, maximizing his advantage with respect to the most dangerous opponent.
- The bayesian updating mechanism is capable of building models about the others in an iterative and incremental way after each round. Furthermore, it can also correctly rebuild the models about the others if the others' personalities (roles and strategies) dynamically change during the game.
35. Graphical summary: The bayesian approach
36. Other learning approaches
- Machine-based learning approaches
- Reinforcement-learning strategies
- Reinforcement-modeler strategy
- Experiments: situating reinforcement approaches
- Closing remarks
37. Machine-based learning approaches
- Reinforcement learning techniques may be used without taking into account the behavior of the other agents (the myslotvalues-learner strategy).
- These techniques may also be useful to model the behavior of the other agents without considering explicit cognitive models about the others (the theirslotvalues-learner strategy).
- These techniques may be even more useful if they try to learn models about the others (the reinforcement-modeler strategy).
38. Reinforcement learning: The myslotvalues-learner strategy
- Start the first round with a slot-value vector v initialized with zeros for each calendar slot: v = (0, 0, 0, 0, 0, 0, 0, 0). Then choose a random initial slot proposal s.
- At the next round k, observe the points accumulated by proposing the slot s in the previous round. This is the reward r_k to be used in this round k.
- Using reinforcement learning and the reward r_k, update the value of s in the slot-value vector v (see the sketch after this list):
- v_k(s) = v_{k-1}(s) + α [r_k − v_{k-1}(s)]
- Choose a new slot s with a predefined slot-selection mechanism (e.g. ε-greedy) and propose it in the current round.
- If it was the last round, the game is over. Otherwise go to the second step.
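A sketch of this learner, with illustrative parameter values (α = 0.1, ε = 0.1) and a class interface the slides do not prescribe:

```python
import random

class MySlotValuesLearner:
    """Sketch of the myslotvalues-learner (slide 38): one value per own
    slot, updated from own rewards, with epsilon-greedy selection."""
    def __init__(self, n_slots=8, alpha=0.1, epsilon=0.1):
        self.v = [0.0] * n_slots
        self.alpha, self.epsilon = alpha, epsilon
        self.last = random.randrange(n_slots)  # random initial proposal

    def update(self, reward):
        # v_k(s) = v_{k-1}(s) + alpha * (r_k - v_{k-1}(s))
        s = self.last
        self.v[s] += self.alpha * (reward - self.v[s])

    def propose(self, free_slots):
        # epsilon-greedy selection over the currently free slots
        if random.random() < self.epsilon:
            self.last = random.choice(free_slots)       # explore
        else:
            self.last = max(free_slots, key=lambda s: self.v[s])  # exploit
        return self.last
```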
39. Reinforcement learning: The theirslotvalues-learner strategy
- Start the first round with a slot-value vector v_a initialized with zeros for each other agent a: v_a = (0, 0, 0, 0, 0, 0, 0, 0). Then choose a random initial slot proposal s.
- At the next round k, for each other agent a, observe the others' proposals s_a and their accumulated points r_{a,k}, then update all the slot-value vectors v_a (see the sketch after this list):
- v_{a,k}(s_a) = v_{a,k-1}(s_a) + α [r_{a,k} − v_{a,k-1}(s_a)]
- Assume the other agents will choose their slots for this round k using a greedy selection mechanism based on the slot-value vectors v_a. Then, using the oracle strategy, select a free slot with the highest utility, trying to maximize his gain in this round k.
- If it was the last round, the game is over. Otherwise go to the second step.
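A sketch of the model-keeping part, under the same illustrative interface as the previous learner:

```python
class TheirSlotValuesLearner:
    """Sketch of the theirslotvalues-learner (slide 39): one value vector
    per opponent, updated from *their* observed proposals and rewards."""
    def __init__(self, others, n_slots=8, alpha=0.1):
        self.v = {a: [0.0] * n_slots for a in others}
        self.alpha = alpha

    def update(self, proposals, rewards):
        # v_{a,k}(s_a) = v_{a,k-1}(s_a) + alpha * (r_{a,k} - v_{a,k-1}(s_a))
        for a, s_a in proposals.items():
            self.v[a][s_a] += self.alpha * (rewards[a] - self.v[a][s_a])

    def predict(self, a):
        # assume agent a picks greedily w.r.t. its learned value vector
        va = self.v[a]
        return max(range(len(va)), key=va.__getitem__)
```

The predicted choices would then feed an oracle-style step, which proposes the free slot that maximizes the learner's own gain given those predictions.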
40. Reinforcement modeling: Reinforcement models representation
- Basic models:
- Role-value and strategy-value vectors, recording the estimated value of each agent's role and strategy:
- Role model
- Strategy model
- Combined model:
- A two-dimensional matrix where each element is based on the basic models:
- Personality model
41. Reinforcement modeling: The reinforcement-modeler strategy
- Start the first round with role-value and strategy-value vectors (and the combined personality-value vectors) for every other agent initialized with zeros. Then choose an initial slot proposal using the semi-modeler strategy.
- At the next round k, for each other agent a, update their role-value and strategy-value vectors using a reinforcement learning mechanism.
- Using the new updated models about the others and the semi-modeler strategy, propose the slot s_m with the maximum expected utility.
- If it was the last round, the game is over. Otherwise go to the second step.
42. Reinforcement modeling: The reinforcement-modeler strategy
- At the next round k, for each other agent a:
- Observe the others' proposals s_a and the teams formed in the previous round.
- Calculate the rewards r_{a,k,i,j} in this step k for each possible personality rs_a(i,j) of agent a.
- Update each of a's possible personality values in his vector rs_a (see the sketch after this list):
- rs_{a,k}(i,j) = rs_{a,k-1}(i,j) + α [r_{a,k,i,j} − rs_{a,k-1}(i,j)]
- Decompose a's newly updated personality model in order to build two new separate role and strategy models.
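A sketch of the update step only; how the per-personality rewards r_{a,k,i,j} are computed from the observed proposal is the dissertation's own definition and is taken here as given:

```python
def reinforcement_model_update(rs_a, rewards, alpha=0.1):
    """Reinforcement-modeler update (slide 42).

    rs_a: dict (role, strategy) -> current personality value for agent a.
    rewards: dict (role, strategy) -> r_{a,k,i,j}, the reward assigned to
    that personality given a's observed behavior (definition assumed).
    """
    for (i, j), r in rewards.items():
        # rs_{a,k}(i,j) = rs_{a,k-1}(i,j) + alpha * (r - rs_{a,k-1}(i,j))
        rs_a[(i, j)] += alpha * (r - rs_a[(i, j)])
    return rs_a
```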
43. Experiments: Situating reinforcement approaches
- Goal: To evaluate the myslotvalues-learner strategy with different slot-selection mechanisms.
- The myslotvalues-learner's performance is very bad, and increasing the epsilon value makes it worse.
- The classic exploration-exploitation trade-off is clearly present here.
- It shows that the performance is better with a totally greedy slot-selection mechanism.
- The MSG's random calendar density provides an extra exploration phase in this case.
- Question: What if we let the agent learn during more rounds?
44. Experiments: Situating reinforcement approaches
- Goal: To evaluate the myslotvalues-learner strategy when increasing the number of rounds.
- In this case, we fixed the slot-selection mechanism at epsilon = 0.1.
- Unexpectedly, instead of increasing, the myslotvalues-learner's performance decreases when we increase the number of rounds.
- Question: What if the agent tries to learn the slot values of the other agents?
45. Experiments: Situating reinforcement approaches
- Goal: To evaluate the theirslotvalues-learner strategy.
- We confirm our expectation of getting better results with this strategy.
- However, it is still bad, and it is outperformed by the self-centered one.
- Here we can observe the same exploration-exploitation trade-off effect observed before when increasing the epsilon value.
- Question: What if we again increase the number of rounds, now with this strategy?
46. Experiments: Situating reinforcement approaches
- Goal: To evaluate the theirslotvalues-learner strategy when increasing the number of rounds.
- Here we fixed the totally greedy slot-selection mechanism.
- As happened with the previous strategy, the performance decreases (instead of increasing) when we increase the number of rounds.
- Question: What if the learner tries to learn the personality-value vectors of the other agents?
47. Experiments: Situating reinforcement approaches
- Goal: To evaluate the reinforcement-modeler strategy varying the slot-selection mechanism.
- Finally, this is a good performance: this strategy always outperforms the other two.
- We still observe the exploration-exploitation trade-off.
- Question: Would this performance vary when increasing the number of rounds?
48. Experiments: Situating reinforcement approaches
- Goal: To evaluate the reinforcement-modeler strategy varying the number of rounds.
- As in the case of the bayesian-modeler strategy, this one's performance also increases with the number of rounds.
- After 13 rounds, its performance does not improve very much, but it is already very close to the bayesian-modeler's performance.
49. Closing remarks
- The reinforcement-modeler strategy is a hybrid approach:
- The same decision-theoretic approach used by the semi-modeler agent.
- A new learning technique using RL, instead of the bayesian one, without using or computing explicit probabilities.
- Although preliminary, the experiments with humans showed encouraging results for further research on the hybridization of human-agent modelers.
50. Graphical summary: The bayesian and reinforcement approaches
51. Conclusions
- Changes
- Generality
- Applicability
- Main contributions
- Future research work
52. Last changes
- Introduction of the communication concept and its relationship with other concepts in chapter 2.
- Contrast and comparison with the other related work presented in chapter 7.
- Explicit state-space representation in RL agents in chapter 5.
- Change of the name of collaborative agents to team-centered agents throughout the whole document.
- Explicit mention of the generality of the framework in chapter 6.
- Explicit setup of experiments with human beings in chapter 5.
- Explicit explanation of the computation of the personality models, especially in the RL part, in chapters 4 and 5.
- Detail of the book shopper example using the bayesian-modeler agent in chapter 6.
53. General framework
- Our experimental framework is a general frame of reference for the assessment of modeling approaches:
- The empirical upper-limit established by the oracle strategy is clearly the optimal limit when evaluating modeling strategies, because it indeed has the correct models about the others.
- Although the lower-limit established by the indifferent strategy may be broken by other possible strategies, these cannot be seen as reasonable ones.
- The other middle-limits, established by the semi-modeler strategy using different static models, provide different sub-limits ranging from totally wrong (opposite) models to the completely correct (real) models.
54. Applicability of our approach
- Clearly, we would choose the bayesian modeling approach over the reinforcement one, because it converges faster and more accurately to the correct models about the others, computing their explicit probabilities.
- However, whether we are able to do this depends on two basic issues:
- The features of the domain.
- The tractability of the computation.
55. Applicability of the bayesian-modeler: Requirements
- To identify a finite set of possible personalities to be modeled for the other agents (such as the roles and strategies in the MSG).
- To identify a finite set of possible behaviors to be observed in the other agents.
- To have in advance, or be able to compute, the conditional probabilities of the possible observations given the possible personalities.
- To have independent new pieces of evidence to compute the new posterior probabilities.
56. An example: Modeling book shopping
- A book shopper may have different shopping personalities, for instance: the poor shopper, the rich one, the novel reader, the Christmas shopper, etc.
- Their possible observed behaviors are clearly differentiated by the features of the books they buy (e.g. author, subject), the season when they buy, and the amount of money they spend.
- It is possible to compute the conditional probabilities of these behaviors given each possible personality.
- Each new purchase can be considered an independent event.
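To make the mapping concrete, here is a toy sketch reusing bayes_update from the slide 30 sketch; every personality, behavior, and likelihood value below is invented purely for illustration:

```python
# Illustrative priors over shopping personalities.
shopper_prior = {"poor": 0.25, "rich": 0.25,
                 "novel_reader": 0.25, "christmas": 0.25}

# Made-up P(observation | personality) for one observed behavior,
# e.g. buying an expensive hardcover in December.
purchase_likelihood = {"poor": 0.05, "rich": 0.40,
                       "novel_reader": 0.15, "christmas": 0.40}

posterior = bayes_update(shopper_prior,
                         lambda obs, p: purchase_likelihood[p],
                         "expensive_hardcover_december")
# Each independent purchase repeats the update, sharpening the posterior.
```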
57. Main contributions
- An empirical framework for the assessment of modeler agents. We created a set of basic non-modeling strategies to establish the lower-, upper-, and other middle-limits. Then, using this frame of reference and a systematic and detailed empirical research methodology, we have evaluated the performance of our probabilistic strategy and compared it with other different learning approaches. The results have shown that the performance of the bayesian-modeler strategy is very close to the upper-limit, outperforming other agents using different learning and modeling strategies.
58. Main contributions
- The design and development of the bayesian-modeler agent, which uses a totally probabilistic approach, combining concepts of utility, decision, and probability theories, which are certainly well founded and guaranteed to converge to the correct models. The decision-theoretic module always chooses a rational decision at each round of the game, maximizing the agent's expected utility. The bayesian learning module is capable of building and updating models about the others in an iterative and incremental way after each round. Furthermore, it is robust enough to correctly rebuild the models if they change dynamically during the process.
59. Main contributions
- A computational multiagent testbed, the Multiagent Meeting Scheduling Game, for doing research in multiagent systems. This is a non-zero-sum game of incomplete information that is mainly competitive but has collaborative features. Furthermore, it is flexible enough for doing experimental research, because it is possible to easily run different kinds of experiments varying several control variables in a gradual way, such as the number of agents, the kinds of roles and strategies, the randomness of the environment, the size of the calendars, the personalities, and the publicity of the involved information.
60. Future research work
- Migration to other domains and learning other traits. In this work the modeler strategies model the others' roles and strategies, but in other domains it could be interesting to model other traits or personalities, such as goals or plans. As we explained before, it is possible to migrate our modeling approach to other domains, such as the book shopping domain. There are other distributed problems with similar competitive and collaborative features in the multiagent research agenda, such as the robotic soccer domain, where we think it could be possible to empirically explore the use of our approach.
61. Future research work
- Integration of bayesian or belief networks. Although we used the recursive bayesian mechanism for learning the others' models, we did not use bayesian networks in this work. Basically, we use the probability distribution and the conditional probabilities to compute the joint distribution and then update the posterior probabilities using the recursive bayesian mechanism. However, this can be intractable, and the use of bayesian networks may be the solution, since they sidestep the joint distribution, and one of the basic tasks of these probabilistic reasoning systems is precisely the computation of the posterior probability distribution.
62. Future research work
- Collaborative environments and coalition formation. In contrast to our special focus on the competitive advantage of modeling other agents, the modeling-other-agents task can prove to be useful in collaborative problems, possibly enabling more efficient coordination mechanisms, for example by allowing coordination without communication.
- Concurrent multiagent modeling. Instead of having only one modeler agent, the research agenda expands if we allow many modeler agents modeling each other in a concurrent way. This includes addressing the problem of modeling nested models.
63. Future research work
- Hybridization of human and modeler agents. Our experiments have also given some preliminary but encouraging results for further research on how humans can benefit from modeler artificial agents. This approach relates to the human-computer interaction and multiagent-human collaboration research agendas.
- The multiagent testbed that we have designed and developed is a flexible one for further MAS research. However, it is necessary to refine and detail the computational code and documentation in order to have a clear and easy-to-use release.