1. An empirical investigation on quantifying the competitive advantage of an agent learning models about other agents in a multiagent environment
A dissertation presented by Leonardo Garrido
2. Contents
- Introduction
- Experimental framework
- Probabilistic modeling approach
- Other learning approaches
- Conclusions
3. Introduction
- Motivation
- Modeling other agents
- Research hypothesis
- Our research path
4. Motivation
- Multiagent systems are usually complex and dynamic systems.
- This is especially true when agents are autonomous and heterogeneous, and individual preferences and strategies are kept private.
- Important agent beliefs are precisely the beliefs about the other agents.
5. Modeling other agents
- Observe the other agents' behavior in the multiagent environment.
- Learn, in an iterative and incremental way, internal models about the others.
- Predict their next behavior based on these models.
- Behave in a rational way based on the predictions.
6. Research hypothesis
In a multiagent system, an agent with capabilities for modeling other agents (i.e., learning and using internal models about the others) using a probabilistic approach should be able to gain a significant competitive advantage.
7. Our research path
- The empirical lower-limit has been established by the indifferent agent.
- The empirical upper-limit has been established by the oracle agent.
- For comparison purposes, we have also developed other non-modeling agents, such as the self-interested and collaborative agents.
- We developed a semi-modeler agent who is capable of exploiting probabilistic models about the others.
- Our bayesian-modeler agent uses a combined technique of bayesian and decision-theoretic approaches for exploiting and learning the models about the others.
- We have also developed other modeler agents using reinforcement learning techniques, such as the reinforcement-modeler agent.
- We have even compared, under this experimental framework, the performance of human beings modeling the other agents.
8. Our experimental framework
- The Meeting Scheduling Game
- Non-modeling basic strategies
- Other non-modeling strategies
- Experiments setup
- Experiments: lower- and upper-limits
- Closing remarks
9. The Meeting Scheduling Game: General features
- To allow self-interested and cooperative behavior.
- To show or hide agents' private information.
- To define different agent roles and strategies.
- To evaluate the advantage of modeling other agents.
- To evaluate different modeling mechanisms.
10. The Meeting Scheduling Game: Description
- Each game is composed of a series of rounds.
- At the beginning of the game, each agent is initialized with a role and a strategy from predefined sets of roles and strategies.
- Each agent has a calendar composed of eight slots; it is randomly scrambled after each round with a predefined calendar density (the percentage of occupied slots in the calendar).
- At each round, each agent simultaneously proposes a slot according to his own role/strategy and his calendar availability.
- At the end of the game, the agent with the highest accumulated points wins.
11. The Meeting Scheduling Game: Roles and strategies
- Strategies are rules that tell the agent which action to choose at each decision point. Strategies can take into account only the agent's own preference profile, or they can also use models about the others.
- A role is defined by a preference profile, which is coded as a calendar slot utility function ranking each slot from the most preferable to the least preferable. For example:
- Early-rising: prefers the early hours of the day.
- Night-owl: prefers meetings as late as possible.
- Medium: prefers hours around noon.
- Extreme: prefers meetings early in the morning or late in the afternoon.
12. The Meeting Scheduling Game: Calendar slot utility examples
- Considering calendars of eight slots, the calendar slot utility function U for each role would be (see the sketch below):
- Early-rising: U = (s0:8, s1:7, s2:6, s3:5, s4:4, s5:3, s6:2, s7:1)
- Night-owl: U = (s0:1, s1:2, s2:3, s3:4, s4:5, s5:6, s6:7, s7:8)
- Medium: U = (s0:2, s1:4, s2:6, s3:8, s4:7, s5:5, s6:3, s7:1)
- Extreme: U = (s0:1, s1:3, s2:5, s3:7, s4:8, s5:6, s6:4, s7:2)
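As a minimal sketch (the slides do not prescribe an implementation; the names below are illustrative), these preference profiles can be encoded as plain lookup tables in Python:

```python
# Illustrative encoding of the four preference profiles on slide 12.
# Each role maps slot index 0..7 (s0..s7) to its utility.
ROLE_UTILITIES = {
    "early_rising": [8, 7, 6, 5, 4, 3, 2, 1],
    "night_owl":    [1, 2, 3, 4, 5, 6, 7, 8],
    "medium":       [2, 4, 6, 8, 7, 5, 3, 1],
    "extreme":      [1, 3, 5, 7, 8, 6, 4, 2],
}

def slot_utility(role, slot):
    """Calendar slot utility U_role(slot) for slot indices 0..7."""
    return ROLE_UTILITIES[role][slot]
```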
13. The Meeting Scheduling Game: Description
- After each round:
- Several teams are formed.
- Each team is composed of all those agents who proposed the same slot.
- Each team's joint utility (TJU) is calculated.
- The winning team is selected (the one with the highest TJU).
- Each agent accumulates points according to the scoring procedure.
14. The Meeting Scheduling Game: Joint utilities and winning teams
- After each round, a team joint utility (TJU) is calculated for each team t by summing up all the team members' calendar utilities at the slot s_t chosen by the team:
- TJU(t) = Σ_{m ∈ t} U_m(s_t)
- The winning team is the one with the highest TJU.
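A short sketch of this computation, assuming proposals are collected as an agent-to-slot mapping and reusing slot_utility from the slide 12 sketch (all names illustrative):

```python
from collections import defaultdict

def team_joint_utilities(proposals, roles):
    """TJU(t) = sum of each member's utility at the team's common slot.

    proposals: dict agent -> proposed slot; a team is the set of agents
    that proposed the same slot.  roles: dict agent -> role name.
    """
    teams = defaultdict(list)
    for agent, slot in proposals.items():
        teams[slot].append(agent)
    return {slot: sum(slot_utility(roles[a], slot) for a in members)
            for slot, members in teams.items()}

def winning_team(proposals, roles):
    """Return the winning slot (team) and its TJU."""
    tju = team_joint_utilities(proposals, roles)
    best_slot = max(tju, key=tju.get)  # team with the highest TJU
    return best_slot, tju[best_slot]
```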
15. The Meeting Scheduling Game: Scoring procedures
- All the players outside the winning team accumulate zero points.
- Each agent a in the winning team t with slot s_t accumulates points according to one of the following scoring procedures:
- Individual scoring: accumulate just his individual contribution to the team, U_a(s_t).
- Group scoring: accumulate the team joint utility, TJU(t).
- Mixed scoring: accumulate his own slot utility plus the TJU, that is, TJU(t) + U_a(s_t).
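Continuing the same sketch, the three scoring procedures could look as follows (winning_team and slot_utility come from the earlier illustrative snippets):

```python
def round_scores(proposals, roles, scoring="mixed"):
    """Points per agent for one round. Players outside the winning team
    get zero; winners score by the chosen procedure (slide 15)."""
    best_slot, tju = winning_team(proposals, roles)
    scores = {}
    for agent, slot in proposals.items():
        if slot != best_slot:
            scores[agent] = 0                                 # not a winner
        elif scoring == "individual":
            scores[agent] = slot_utility(roles[agent], slot)  # U_a(s_t)
        elif scoring == "group":
            scores[agent] = tju                               # TJU(t)
        else:                                                 # mixed
            scores[agent] = tju + slot_utility(roles[agent], slot)
    return scores
```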
16. Non-modeling basic strategies: Indifferent and oracle
- The indifferent strategy:
- Proposes a slot using a uniform (equiprobable) distribution.
- The oracle strategy (sketched below):
- Knows the others' calendars, roles, and strategies.
- Sees in advance the others' choices.
- For each free slot s in his calendar, finds the agent m who would earn the maximum gain G_m(s) among the rest of the players, and calculates the utility U(s) = G_o(s) − G_m(s).
- Proposes the slot with the highest utility: arg max_s U(s).
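A minimal sketch of both strategies, assuming a helper gain(agent, slot) that returns the points an agent would earn if the round settled on that slot (the oracle can evaluate it because it sees calendars, roles, strategies, and choices; the helper itself is not shown):

```python
import random

def indifferent_propose(free_slots):
    """Indifferent strategy: uniform random choice among free slots."""
    return random.choice(free_slots)

def oracle_propose(free_slots, others, gain):
    """Oracle strategy sketch: maximize the margin over the most
    dangerous opponent at each candidate slot."""
    def utility(s):
        # U(s) = G_o(s) - G_m(s), with m the strongest other player at s
        return gain("oracle", s) - max(gain(m, s) for m in others)
    return max(free_slots, key=utility)  # arg max_s U(s)
```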
17. Other non-modeling strategies: Self-centered and team-centered
- The self-centered strategy:
- Proposes the slot which maximizes its own calendar utility.
- The team-centered strategy (sketched below):
- Proposes the slot that was proposed by the biggest team (greatest number of members) in the previous round.
- In case of ties, ranks the slots according to its own calendar utility.
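A sketch under the same assumptions, reusing slot_utility from slide 12; prev_proposals is the agent-to-slot mapping observed in the previous round:

```python
from collections import Counter

def self_centered_propose(free_slots, role):
    """Propose the free slot with the highest own utility."""
    return max(free_slots, key=lambda s: slot_utility(role, s))

def team_centered_propose(free_slots, prev_proposals, role):
    """Propose the slot chosen by the biggest team last round; break ties
    (and handle the first round, with no history) by own utility."""
    counts = Counter(s for s in prev_proposals.values() if s in free_slots)
    if not counts:
        return self_centered_propose(free_slots, role)
    biggest = max(counts.values())
    candidates = [s for s, c in counts.items() if c == biggest]
    return max(candidates, key=lambda s: slot_utility(role, s))
```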
18. Experiments setup
- Experimental scenarios: sets of different but related experiments.
- An experiment is a series of plays (500) of the MSG.
- Each game is composed of ten rounds.
- At the beginning of each game, each agent starts with a random role.
- We use the mixed scoring procedure.
- After each round, each agent's calendar is randomly reset to a fixed calendar density of 50%.
19. Experiments: Establishing lower- and upper-limits
- Goal: To compare the performance of the basic non-modeling strategies.
- The indifferent strategy is always the worst one.
- Unexpected result: the self-centered strategy outperformed the team-centered one.
- Hypothesis: the team-centered strategy could be better if we increased the number of players.
20. Experiments: Establishing lower- and upper-limits
- Goal: To investigate the performance of the self-centered and team-centered strategies when increasing the number of players.
- The team-centered strategy indeed starts to outperform the self-centered one when we increase the number of players.
- Team-centered agents tend to team up with each other, outperforming the others.
21. Experiments: Establishing lower- and upper-limits
- Goal: To evaluate the performance of the indifferent and oracle strategies.
- Clearly, the indifferent agent has the worst performance, setting the empirical lower-limit.
- The oracle strategy clearly outperforms the other strategies, setting the empirical upper-limit.
- Although a higher oracle performance could be expected, the result is reasonable: the oracle agent cannot always win, because of the random calendars, and sometimes he ties games due to the collaborative features of the MSG.
22. Closing remarks
- We explored a collection of initial reference points for the characterization of modeler agents' performance.
- The indifferent and oracle strategies provide the extremes of the spectrum, ranging from the least- to the most-informed one.
- The self-centered and team-centered agents were used as another pair of fixed non-modeling strategies for comparing and situating the empirical lower- and upper-limits.
23. Our probabilistic modeling approach
- Probabilistic model representation
- Exploiting models about the others
- Experiments: refining the lower- and upper-limits
- Learning models about the others
- Experiments: situating our modeler agent
- Closing remarks
24. Probabilistic model representation
- Basic models:
- Vectors recording a probability distribution over the actual character of the modeled agent:
- Role model
- Strategy model
- Combined model:
- A two-dimensional matrix where each element is based on the basic models:
- Personality model
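One way to realize this representation (a sketch only; the slides do not fix the construction) is to keep the basic models as probability vectors and, assuming role and strategy are independent, form the combined personality model as their outer product:

```python
import numpy as np

ROLES = ["early_rising", "night_owl", "medium", "extreme"]
STRATEGIES = ["self_centered", "team_centered"]  # illustrative set

# Basic models: probability distributions over roles and strategies.
role_model = np.full(len(ROLES), 1 / len(ROLES))            # equiprobable
strategy_model = np.full(len(STRATEGIES), 1 / len(STRATEGIES))

# Combined personality model rs_a(i, j): under an independence assumption,
# one natural construction is the outer product of the basic models.
personality_model = np.outer(role_model, strategy_model)
assert np.isclose(personality_model.sum(), 1.0)  # still a distribution
```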
25. Exploiting models about the others: The semi-modeler strategy
- Start with predefined and static role and strategy models about the others.
- For each agent a, generate his personality model rs_a and generate a set O with all the possible opponent scenarios that the semi-modeler could face. Each scenario o ∈ O is a combination of possible personalities of the other agents.
- For each possible scenario o ∈ O, incrementally accumulate the expected utility EU(s_o) of proposing slot s_o under that possible scenario.
- Propose the slot with the maximum expected utility: arg max_{s_o} EU(s_o).
26. Exploiting models about the others: The semi-modeler strategy
- For each possible scenario o ∈ O (see the sketch after this list):
- Assuming that this scenario o represents the actual personalities of the other agents, run the oracle strategy in order to get the best slot to propose, s_o, and its utility U(s_o) under this assumption. Let us call r the outcome of the action of choosing slot s_o.
- Calculate the probability P(r | s_o): simply the product of the corresponding probabilities in each agent's personality model involved in this scenario.
- The utility of this outcome, U(r), is precisely the utility U(s_o) obtained in the previous step.
- In order to incrementally get the expected utility of s_o:
- EU(s_o) = Σ_i P(r_i | s_o) · U(r_i)
- Calculate the product P(r_i | s_o) · U(r_i) and accumulate it with the products from previous possible scenarios where the slot s_o had been chosen.
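A compact sketch of this loop, assuming personality_models[a] maps each possible personality of agent a to its probability, and oracle_best(scenario) runs the oracle strategy as if the given personalities were the truth, returning the best slot and its utility:

```python
from itertools import product

def semi_modeler_propose(others, personality_models, oracle_best):
    """Accumulate EU(s_o) = sum_i P(r_i | s_o) * U(r_i) over all
    opponent scenarios and propose the slot with the maximum EU."""
    expected = {}
    options = [list(personality_models[a]) for a in others]
    for scenario in product(*options):        # every combination o in O
        slot, utility = oracle_best(dict(zip(others, scenario)))
        # P(r | s_o): product of each agent's personality probability
        prob = 1.0
        for a, p in zip(others, scenario):
            prob *= personality_models[a][p]
        expected[slot] = expected.get(slot, 0.0) + prob * utility
    return max(expected, key=expected.get)    # arg max_{s_o} EU(s_o)
```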
27. Experiments: Refining the lower- and upper-limits
- Goal: To compare the performance of the semi-modeler strategy with different fixed models about the others.
- The semi-modeler strategy with static correct models clearly outperforms the other strategies.
- A semi-modeler strategy with static equiprobable models performs lower, at around 50%.
- A semi-modeler strategy with opposite models has the worst performance, outperformed even by the self-centered one.
28. Learning models about the others: The bayesian-modeler strategy
- In the first round, start with equiprobable models about the others, run the semi-modeler strategy, and propose the resulting slot.
- At the next round, for each other agent a:
- Observe his previous proposal and update his personality model using a bayesian updating mechanism.
- Decompose the updated personality model in order to build two new separate role and strategy models.
- Using the new updated models, run the semi-modeler strategy and propose the slot s_m with the maximum expected utility.
- If it was the last round, the game is over. Otherwise go to the second step.
29. Learning models about the others: The bayesian-modeler strategy
- At the next round, for each other agent a:
- Observe his previous proposal s_a and update his personality model rs_a using a bayesian updating mechanism, obtaining the posterior probabilities of agent a's personality per_a(i,j) given that agent a proposed slot pro_a(s_a) in the previous round:
- rs_a(i,j) = P(per_a(i,j) | pro_a(s_a))
- Decompose a's updated personality model in order to build two new separate role and strategy models; that is, update each element in r_a and s_a:
- r_a(i) = Σ_j rs_a(i,j) and s_a(j) = Σ_i rs_a(i,j)
30. Learning models about the others: The bayesian-modeler strategy
- The model-updating mechanism is based on the well-known Bayes rule:
- P(H | e) = P(e | H) · P(H) / P(e)
- Considering multi-valued random variables, the basic probability axioms, and some algebra, the evidence term expands as a sum over all the competing hypotheses:
- P(pro_a(s_a)) = Σ_{i,j} P(pro_a(s_a) | per_a(i,j)) · P(per_a(i,j))
- Thus, the equation used to update each personality model is (the rs_a values on the right-hand side are the priors from the previous round):
- rs_a(i,j) = P(pro_a(s_a) | per_a(i,j)) · rs_a(i,j) / Σ_{i',j'} P(pro_a(s_a) | per_a(i',j')) · rs_a(i',j')
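A minimal sketch of the update and decomposition steps, assuming the likelihoods P(pro_a(s_a) | per_a(i,j)) are available (in the MSG they can be derived by simulating what each personality would propose):

```python
def bayes_update(personality_model, likelihood, observed_slot):
    """Posterior personality model for one opponent (slides 28-30).

    personality_model: dict personality -> prior P(per).
    likelihood(slot, personality): assumed helper, P(pro(slot) | per).
    """
    unnormalized = {p: likelihood(observed_slot, p) * prior
                    for p, prior in personality_model.items()}
    evidence = sum(unnormalized.values())  # P(e): sum over hypotheses
    return {p: v / evidence for p, v in unnormalized.items()}

def decompose(personality_model):
    """Marginalize the joint (role, strategy) model into the basic
    models: r(i) = sum_j rs(i,j), s(j) = sum_i rs(i,j)."""
    role, strat = {}, {}
    for (i, j), p in personality_model.items():
        role[i] = role.get(i, 0.0) + p
        strat[j] = strat.get(j, 0.0) + p
    return role, strat
```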
31. Experiments: Situating our modeler agent
- Goal: To evaluate the performance of the bayesian-modeler strategy against the oracle and semi-modeler strategies.
- The bayesian-modeler's performance is better than any semi-modeler's.
- The bayesian-modeler's performance is close to the oracle's.
- Question 1: Why is the bayesian-modeler's performance not as good as the oracle's?
- Question 2: How fast can the bayesian-modeler learn the models about the others?
32. Experiments: Situating our modeler agent
- Goal: To evaluate the performance of the bayesian-modeler strategy varying the number of rounds available for learning.
- Clearly, the bayesian-modeler's performance improves as the number of rounds increases.
- After 13 rounds, its performance does not improve very much, but it is already very close to the oracle's performance.
33. Experiments: Situating our modeler agent
- Goal: To evaluate the performance of the bayesian-modeler strategy when it faces unreliable agents (after 20 rounds).
- Lying agents: impostor agents that broadcast a false personality (the opposite one) at the beginning of the game.
- Fickle agents: unstable agents that start with an unknown personality and randomly change it in the middle of the game.
34. Closing remarks
- The modeling mechanism used by the bayesian-modeler has two main advantages:
- The decision-theoretic approach chooses the rational decision at each round of the game, maximizing his advantage with respect to the most dangerous opponent.
- The bayesian updating mechanism is capable of building models about the others in an iterative and incremental way after each round. Furthermore, it can also correctly rebuild the models about the others if the others' personalities (roles and strategies) dynamically change during the game.
35. Graphical summary: The bayesian approach
36. Other learning approaches
- Machine-based learning approaches
- Reinforcement-learning strategies
- Reinforcement-modeler strategy
- Experiments: situating reinforcement approaches
- Closing remarks
37. Machine-based learning approaches
- Reinforcement learning techniques may be used without taking into account the behavior of the other agents (the myslotvalues-learner strategy).
- These techniques may also be useful to model the behavior of the other agents without considering explicit cognitive models about the others (the theirslotvalues-learner strategy).
- These techniques may be even more useful if they try to learn models about the others (the reinforcement-modeler strategy).
38. Reinforcement learning: The myslotvalues-learner strategy
- Start the first round with a slot-value vector v initialized with zeros for each calendar slot: v = (0, 0, 0, 0, 0, 0, 0, 0). Then choose a random initial slot proposal s.
- At the next round k, observe the points accumulated by proposing the slot s in the previous round. This is the reward r_k to be used in this round k.
- Using reinforcement learning and the reward r_k, update the value of s in the slot-value vector v (see the sketch after this list):
- v_k(s) = v_{k-1}(s) + α [r_k − v_{k-1}(s)]
- Choose a new slot s with a predefined slot-selection mechanism (e.g. ε-greedy) and propose it in the current round.
- If it was the last round, the game is over. Otherwise go to the second step.
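A sketch of this learner, with illustrative parameter values (α = 0.1, ε = 0.1) and a class interface the slides do not prescribe:

```python
import random

class MySlotValuesLearner:
    """Sketch of the myslotvalues-learner (slide 38): one value per own
    slot, updated from own rewards, with epsilon-greedy selection."""
    def __init__(self, n_slots=8, alpha=0.1, epsilon=0.1):
        self.v = [0.0] * n_slots
        self.alpha, self.epsilon = alpha, epsilon
        self.last = random.randrange(n_slots)  # random initial proposal

    def update(self, reward):
        # v_k(s) = v_{k-1}(s) + alpha * (r_k - v_{k-1}(s))
        s = self.last
        self.v[s] += self.alpha * (reward - self.v[s])

    def propose(self, free_slots):
        # epsilon-greedy selection over the currently free slots
        if random.random() < self.epsilon:
            self.last = random.choice(free_slots)       # explore
        else:
            self.last = max(free_slots, key=lambda s: self.v[s])  # exploit
        return self.last
```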
39. Reinforcement learning: The theirslotvalues-learner strategy
- Start the first round with a slot-value vector v_a initialized with zeros for each other agent a: v_a = (0, 0, 0, 0, 0, 0, 0, 0). Then choose a random initial slot proposal s.
- At the next round k, for each other agent a, observe the others' proposals s_a and their accumulated points r_{a,k}, then update all the slot-value vectors v_a (see the sketch after this list):
- v_{a,k}(s_a) = v_{a,k-1}(s_a) + α [r_{a,k} − v_{a,k-1}(s_a)]
- Assume the other agents will choose their slots for this round k using a greedy selection mechanism based on the slot-value vectors v_a. Then, using the oracle strategy, select a free slot with the highest utility, trying to maximize his gain in this round k.
- If it was the last round, the game is over. Otherwise go to the second step.
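A sketch of the model-keeping part, under the same illustrative interface as the previous learner:

```python
class TheirSlotValuesLearner:
    """Sketch of the theirslotvalues-learner (slide 39): one value vector
    per opponent, updated from *their* observed proposals and rewards."""
    def __init__(self, others, n_slots=8, alpha=0.1):
        self.v = {a: [0.0] * n_slots for a in others}
        self.alpha = alpha

    def update(self, proposals, rewards):
        # v_{a,k}(s_a) = v_{a,k-1}(s_a) + alpha * (r_{a,k} - v_{a,k-1}(s_a))
        for a, s_a in proposals.items():
            self.v[a][s_a] += self.alpha * (rewards[a] - self.v[a][s_a])

    def predict(self, a):
        # assume agent a picks greedily w.r.t. its learned value vector
        va = self.v[a]
        return max(range(len(va)), key=va.__getitem__)
```

The predicted choices would then feed an oracle-style step, which proposes the free slot that maximizes the learner's own gain given those predictions.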
40. Reinforcement modeling: Reinforcement models representation
- Basic models:
- Role-value and strategy-value vectors, recording the estimated value of each agent's role and strategy:
- Role model
- Strategy model
- Combined model:
- A two-dimensional matrix where each element is based on the basic models:
- Personality model
41. Reinforcement modeling: The reinforcement-modeler strategy
- Start the first round with role-value and strategy-value vectors (and the combined personality-value vectors) for every other agent initialized with zeros. Then choose an initial slot proposal using the semi-modeler strategy.
- At the next round k, for each other agent a, update their role-value and strategy-value vectors using a reinforcement learning mechanism.
- Using the new updated models about the others and the semi-modeler strategy, propose the slot s_m with the maximum expected utility.
- If it was the last round, the game is over. Otherwise go to the second step.
42. Reinforcement modeling: The reinforcement-modeler strategy
- At the next round k, for each other agent a:
- Observe the others' proposals s_a and the teams formed in the previous round.
- Calculate the rewards r_{a,k,i,j} in this step k for each possible personality rs_a(i,j) of agent a.
- Update each of a's possible personality values in his vector rs_a (see the sketch after this list):
- rs_{a,k}(i,j) = rs_{a,k-1}(i,j) + α [r_{a,k,i,j} − rs_{a,k-1}(i,j)]
- Decompose a's newly updated personality model in order to build two new separate role and strategy models.
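A sketch of the update step only; how the per-personality rewards r_{a,k,i,j} are computed from the observed proposal is the dissertation's own definition and is taken here as given:

```python
def reinforcement_model_update(rs_a, rewards, alpha=0.1):
    """Reinforcement-modeler update (slide 42).

    rs_a: dict (role, strategy) -> current personality value for agent a.
    rewards: dict (role, strategy) -> r_{a,k,i,j}, the reward assigned to
    that personality given a's observed behavior (definition assumed).
    """
    for (i, j), r in rewards.items():
        # rs_{a,k}(i,j) = rs_{a,k-1}(i,j) + alpha * (r - rs_{a,k-1}(i,j))
        rs_a[(i, j)] += alpha * (r - rs_a[(i, j)])
    return rs_a
```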
43. Experiments: Situating reinforcement approaches
- Goal: To evaluate the myslotvalues-learner strategy with different slot-selection mechanisms.
- The myslotvalues-learner's performance is very bad, and increasing the epsilon value makes it worse.
- The classic exploration-exploitation trade-off is clearly present here.
- It shows that the performance is better with a totally greedy slot-selection mechanism.
- The MSG's random calendar density provides an extra exploration phase in this case.
- Question: What if we let the agent learn during more rounds?
44. Experiments: Situating reinforcement approaches
- Goal: To evaluate the myslotvalues-learner strategy when increasing the number of rounds.
- In this case, we fixed the slot-selection mechanism at epsilon = 0.1.
- Unexpectedly, instead of increasing, the myslotvalues-learner's performance decreases when we increase the number of rounds.
- Question: What if the agent tries to learn the slot values of the other agents?
45. Experiments: Situating reinforcement approaches
- Goal: To evaluate the theirslotvalues-learner strategy.
- We confirm our expectation of getting better results with this strategy.
- However, it is still bad, and it is outperformed by the self-centered one.
- Here we can observe the same exploration-exploitation trade-off effect observed before when increasing the epsilon value.
- Question: What if we again increase the number of rounds, now with this strategy?
46. Experiments: Situating reinforcement approaches
- Goal: To evaluate the theirslotvalues-learner strategy when increasing the number of rounds.
- Here we fixed the totally greedy slot-selection mechanism.
- As happened with the previous strategy, the performance decreases (instead of increasing) when we increase the number of rounds.
- Question: What if the learner tries to learn the personality-value vectors of the other agents?
47. Experiments: Situating reinforcement approaches
- Goal: To evaluate the reinforcement-modeler strategy varying the slot-selection mechanism.
- Finally, this is a good performance: this strategy always outperforms the other two.
- We still observe the exploration-exploitation trade-off.
- Question: Would this performance vary when increasing the number of rounds?
48. Experiments: Situating reinforcement approaches
- Goal: To evaluate the reinforcement-modeler strategy varying the number of rounds.
- As in the case of the bayesian-modeler strategy, this one's performance also increases with the number of rounds.
- After 13 rounds, its performance does not improve very much, but it is already very close to the bayesian-modeler's performance.
49. Closing remarks
- The reinforcement-modeler strategy is a hybrid approach:
- The same decision-theoretic approach used by the semi-modeler agent.
- A new learning technique using RL, instead of the bayesian one, without using or computing explicit probabilities.
- Although preliminary, the experiments with humans showed encouraging results for further research on the hybridization of human-agent modelers.
50. Graphical summary: The bayesian and reinforcement approaches
51. Conclusions
- Changes
- Generality
- Applicability
- Main contributions
- Future research work
52. Last changes
- Introduction of the communication concept and its relationship with other concepts in chapter 2.
- Contrast and comparison with the other related work presented in chapter 7.
- Explicit state-space representation in RL agents in chapter 5.
- Change of the name of collaborative agents to team-centered agents throughout the whole document.
- Explicit mention of the generality of the framework in chapter 6.
- Explicit setup of experiments with human beings in chapter 5.
- Explicit explanation of the computation of the personality models, especially in the RL part, in chapters 4 and 5.
- Detail of the book shopper example using the bayesian-modeler agent in chapter 6.
53. General framework
- Our experimental framework is a general frame of reference for the assessment of modeling approaches:
- The empirical upper-limit established by the oracle strategy is clearly the optimal limit when evaluating modeling strategies, because it indeed has the correct models about the others.
- Although the lower-limit established by the indifferent strategy may be broken by other possible strategies, these cannot be seen as reasonable ones.
- The other middle-limits, established by the semi-modeler strategy using different static models, provide different sub-limits ranging from totally wrong (opposite) models to the completely correct (real) models.
54. Applicability of our approach
- Clearly, we would choose the bayesian modeling approach over the reinforcement one, because it converges faster and more accurately to the correct models about the others, computing their explicit probabilities.
- However, whether we are able to do this depends on two basic issues:
- The features of the domain.
- The tractability of the computation.
55. Applicability of the bayesian-modeler: Requirements
- To identify a finite set of possible personalities to be modeled for the other agents (such as the roles and strategies in the MSG).
- To identify a finite set of possible behaviors to be observed in the other agents.
- To have in advance, or be able to compute, the conditional probabilities of the possible observations given the possible personalities.
- To have independent new pieces of evidence to compute the new posterior probabilities.
56. An example: Modeling book shopping
- A book shopper may have different shopping personalities, for instance: the poor shopper, the rich one, the novel reader, the Christmas shopper, etc.
- Their possible observed behaviors are clearly differentiated by the features of the books they buy (e.g. author, subject), the season when they buy, and the amount of money they spend.
- It is possible to compute the conditional probabilities of these behaviors given each possible personality.
- Each new purchase can be considered an independent event.
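To make the mapping concrete, here is a toy sketch reusing bayes_update from the slide 30 sketch; every personality, behavior, and likelihood value below is invented purely for illustration:

```python
# Illustrative priors over shopping personalities.
shopper_prior = {"poor": 0.25, "rich": 0.25,
                 "novel_reader": 0.25, "christmas": 0.25}

# Made-up P(observation | personality) for one observed behavior,
# e.g. buying an expensive hardcover in December.
purchase_likelihood = {"poor": 0.05, "rich": 0.40,
                       "novel_reader": 0.15, "christmas": 0.40}

posterior = bayes_update(shopper_prior,
                         lambda obs, p: purchase_likelihood[p],
                         "expensive_hardcover_december")
# Each independent purchase repeats the update, sharpening the posterior.
```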
57. Main contributions
- An empirical framework for the assessment of modeler agents. We created a set of basic non-modeling strategies to establish the lower-, upper-, and other middle-limits. Then, using this frame of reference and a systematic and detailed empirical research methodology, we have evaluated the performance of our probabilistic strategy and compared it with other different learning approaches. The results have shown that the performance of the bayesian-modeler strategy is very close to the upper-limit, outperforming other agents using different learning and modeling strategies.
58. Main contributions
- The design and development of the bayesian-modeler agent, which uses a totally probabilistic approach, combining concepts of utility, decision, and probability theories, which are certainly well founded and guaranteed to converge to the correct models. The decision-theoretic module always chooses a rational decision at each round of the game, maximizing the agent's expected utility. The bayesian learning module is capable of building and updating models about the others in an iterative and incremental way after each round. Furthermore, it is robust enough to correctly rebuild the models if they change dynamically during the process.
59. Main contributions
- A computational multiagent testbed, the Multiagent Meeting Scheduling Game, for doing research in multiagent systems. This is a non-zero-sum game of incomplete information that is mainly competitive but has collaborative features. Furthermore, it is flexible enough for doing experimental research, because it is possible to easily run different kinds of experiments varying several control variables in a gradual way, such as the number of agents, the kinds of roles and strategies, the randomness of the environment, the size of the calendars, the personalities, and the publicity of the involved information.
60. Future research work
- Migration to other domains and learning other traits. In this work the modeler strategies model the others' roles and strategies, but in other domains it could be interesting to model other traits or personalities, such as goals or plans. As we explained before, it is possible to migrate our modeling approach to other domains, such as the book shopping domain. There are other distributed problems with similar competitive and collaborative features in the multiagent research agenda, such as the robotic soccer domain, where we think it could be possible to empirically explore the use of our approach.
61. Future research work
- Integration of bayesian or belief networks. Although we used the recursive bayesian mechanism for learning the others' models, we did not use bayesian networks in this work. Basically, we use the probability distribution and the conditional probabilities to compute the joint distribution and then update the posterior probabilities using the recursive bayesian mechanism. However, this can be intractable, and the use of bayesian networks may be the solution, since they sidestep the joint distribution, and one of the basic tasks of these probabilistic reasoning systems is precisely the computation of the posterior probability distribution.
62. Future research work
- Collaborative environments and coalition formation. In contrast to our special focus on the competitive advantage of modeling other agents, the modeling-other-agents task can prove to be useful in collaborative problems, possibly enabling more efficient coordination mechanisms, for example by allowing coordination without communication.
- Concurrent multiagent modeling. Instead of having only one modeler agent, the research agenda expands if we allow many modeler agents modeling each other in a concurrent way. This includes addressing the problem of modeling nested models.
63. Future research work
- Hybridization of human and modeler agents. Our experiments have also given some preliminary but encouraging results for further research on how humans can benefit from modeler artificial agents. This approach relates to the human-computer interaction and multiagent-human collaboration research agendas.
- The multiagent testbed that we have designed and developed is a flexible one for further MAS research. However, it is necessary to refine and detail the computational code and documentation in order to have a clear and easy-to-use release.