Lecture VI: Adaptive Systems presentation

1
Lecture VI Adaptive Systems

Zhixin Liu
Complex Systems Research Center,
Academy of Mathematics and Systems Sciences, CAS

2
In the last lecture, we talked about

Game Theory
An embodiment of the complex interactions among
individuals
Nash equilibrium
Evolutionarily stable strategy

3
In this lecture, we will talk about

Adaptive Systems

4
Adaptation

To adapt: to change oneself to conform to a new
or changed circumstance.
What do we know from the new circumstance?
Adaptive estimation, learning, identification
How should we respond?
Control/decision making

5
Why Adaptation?

Uncertainties always exist in modeling of
practical systems.
Adaptation can reduce the uncertainties by using
the system information.
Adaptation is an important embodiment of human
intelligence.

6
Framework of Adaptive Systems
Environment

7
Two levels of adaptation

Individual level: learn and adapt
Population level
Death of old individuals
Creation of new individuals
Hierarchy

8
Some Examples

Adaptive control systems:
adaptation in a single agent
Iterated prisoner's dilemma:
adaptation among agents

10
Adaptation In A Single Agent
Environment

11
Information
[Figure: a dynamical system with input $u_t$, noise $w_t$, and output $y_t$]

Information = prior + posterior = $I_0 + I_1$

$I_0$: prior knowledge about the system
$I_1$: posterior knowledge about the system,
$I_1 = \{u_0, u_1, \ldots, u_t,\ y_0, y_1, \ldots, y_t\}$ (observations)
The posterior information can be used to reduce
the uncertainties of the system.
12
Uncertainty

[Figure: a model subject to external uncertainty and internal uncertainty]

External uncertainty: noise/disturbance
Internal uncertainty:
Parameter uncertainty
Signal uncertainty
Functional uncertainty

13
Adaptive Estimation

15
Adaptive Estimation

Adaptive estimation: a parameter or structure
estimator that can be updated based on
on-line observations.

[Figure: block diagram of the system (input $u_t$, output $y_t$) and the adaptive estimator, whose prediction is subtracted from $y_t$ to form the error $e$]

Example: in the parametric case, the parameter
estimator can be obtained by minimizing a certain
prediction error.

16
Adaptive Estimation

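Parameter estimation
Consider the following linear regression model

$y_{t+1} = \theta^{\top} \varphi_t + w_{t+1}$

$\theta$: unknown parameter vector
$\varphi_t$: regression vector
$w_t$: noise sequence

Remark
The linear regression model may be nonlinear: it is linear
in the parameters, but the regression vector may depend
nonlinearly on the data.
A linear system can be translated into a linear
regression model.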

17
Least Square (LS) Algorithm

1795, Gauss, least square algorithm
The number of equations is greater than that of
the unknown parameters.
The data contain noise.
Minimize the following prediction error over $\theta$

$J_t(\theta) = \sum_{i=0}^{t-1} (y_{i+1} - \theta^{\top} \varphi_i)^2$

18
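Recursive Form of LS

$\theta_{t+1} = \theta_t + a_t P_t \varphi_t (y_{t+1} - \varphi_t^{\top} \theta_t)$
$P_{t+1} = P_t - a_t P_t \varphi_t \varphi_t^{\top} P_t, \quad a_t = (1 + \varphi_t^{\top} P_t \varphi_t)^{-1}$

where $P_t$ is the following estimation covariance
matrix

$P_t = \Big( P_0^{-1} + \sum_{i=0}^{t-1} \varphi_i \varphi_i^{\top} \Big)^{-1}$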
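
The recursive LS update can be sketched in plain Python. This is a minimal sketch, assuming the regression model $y = \theta^{\top}\varphi + w$ and the standard recursion with gain $(1 + \varphi^{\top} P \varphi)^{-1}$; the function name `rls_step`, the initial values, and the noise-free test data are illustrative assumptions, not taken from the slides.

```python
def rls_step(theta, P, phi, y):
    """One recursive least-squares update for the model y = theta^T phi + noise."""
    n = len(theta)
    # P is symmetric, so P phi phi^T P equals the outer product of P_phi with itself.
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    gain = 1.0 / (1.0 + sum(phi[i] * P_phi[i] for i in range(n)))
    err = y - sum(theta[i] * phi[i] for i in range(n))      # prediction error
    theta_new = [theta[i] + gain * P_phi[i] * err for i in range(n)]
    P_new = [[P[i][j] - gain * P_phi[i] * P_phi[j] for j in range(n)]
             for i in range(n)]
    return theta_new, P_new


# Hypothetical noise-free data generated by true_theta.
true_theta = [2.0, -1.0]
regressors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
theta = [0.0, 0.0]                       # initial estimate
P = [[1000.0, 0.0], [0.0, 1000.0]]       # large P_0 means a weak prior
for phi in regressors:
    y = sum(t * p for t, p in zip(true_theta, phi))
    theta, P = rls_step(theta, P, phi, y)
```

With a large initial $P_0$ the estimate ends up close to `true_theta`; the small remaining gap comes from the regularizing term $P_0^{-1}$.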

A basic problem: does the estimate $\theta_t$ converge to the true parameter $\theta$?
19
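Recursive Form of LS
Theorem (T.L. Lai, C.Z. Wei)
Under the above assumption, if the following
condition holds

$\lim_{t \to \infty} \dfrac{\log \lambda_{\max}(P_t^{-1})}{\lambda_{\min}(P_t^{-1})} = 0 \quad \text{a.s.},$

then the LS estimate is strongly consistent, i.e., $\theta_t \to \theta$ almost surely.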
20
Weighted Least Square

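Minimize the following prediction error, with weights $\alpha_i > 0$

$J_t(\theta) = \sum_{i=0}^{t-1} \alpha_i (y_{i+1} - \theta^{\top} \varphi_i)^2$

Recursive form of WLS

$\theta_{t+1} = \theta_t + a_t P_t \varphi_t (y_{t+1} - \varphi_t^{\top} \theta_t)$
$P_{t+1} = P_t - a_t P_t \varphi_t \varphi_t^{\top} P_t, \quad a_t = (\alpha_t^{-1} + \varphi_t^{\top} P_t \varphi_t)^{-1}$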

21
Self-Convergence of WLS

Take the weight as follows
[weight formula missing from the transcript]

Theorem: Under Assumption 1, for any initial
value $\theta_0$ and any regression vector sequence
$\{\varphi_t\}$, $\theta_t$ will converge to some vector almost surely.
Lei Guo, 1996, IEEE TAC
22
Adaptive Control

24
Adaptive Control
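Adaptive control: a controller with adjustable
parameters (or structures) together with a
mechanism for adjusting them.
[Figure: feedback loop in which the adaptive estimator uses the plant input $u$ and output $y$ to tune the adaptive controller, which is driven by the reference $r$]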
25
Robust Control
[Figure: the model set as a ball of radius $r$ around the nominal model]
Cannot reduce uncertainty!
26
Adaptive Control

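An example
Consider the following linear regression model

$y_{t+1} = a y_t + b u_t + w_{t+1}$

where $a$ and $b$ are unknown parameters, and $y_t$,
$u_t$, and $w_t$ are the output, input and white
noise sequences.
Objective: design a control law to minimize the
following average tracking error

$J = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} (y_t - y_t^{*})^2$

where $y_t^{*}$ is the reference signal.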
27
Adaptive Control

If (a,b) is known, we can get the optimal
controller

$u_t = \frac{1}{b} (y_{t+1}^{*} - a y_t)$

Certainty Equivalence Principle: replace
the unknown parameters in a non-adaptive
controller by their online estimates.
28
Adaptive control

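If (a,b) is unknown, the adaptive controller can
be taken as

$u_t = \frac{1}{b_t} (y_{t+1}^{*} - a_t y_t)$

where $(a_t, b_t)$ can be obtained by LS with $\theta_t = (a_t, b_t)^{\top}$ and $\varphi_t = (y_t, u_t)^{\top}$.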
29
Adaptive Control

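The closed-loop system

$y_{t+1} = y_{t+1}^{*} + (\theta - \theta_t)^{\top} \varphi_t + w_{t+1}$

with $\theta = (a, b)^{\top}$, $\theta_t = (a_t, b_t)^{\top}$, $\varphi_t = (y_t, u_t)^{\top}$ and reference signal $y_t^{*}$; it follows because the adaptive control law makes $\theta_t^{\top} \varphi_t = y_{t+1}^{*}$.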
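
The adaptive loop can be simulated to see how estimation and control interact. This is a rough sketch under stated assumptions: the plant is $y_{t+1} = a y_t + b u_t + w_{t+1}$ with Gaussian noise, the reference is the constant 1, the estimates come from recursive LS, and a small guard keeps the estimate of $b$ away from zero. The names, numeric values, and the guard are illustrative choices, not part of the lecture.

```python
import random


def simulate(a=0.8, b=1.5, steps=500, noise_std=0.1, seed=0):
    """Certainty-equivalence control of y[t+1] = a*y[t] + b*u[t] + w[t+1]."""
    rng = random.Random(seed)
    theta = [0.0, 1.0]                      # initial guess (a_0, b_0), with b_0 nonzero
    P = [[100.0, 0.0], [0.0, 100.0]]        # initial covariance
    y, sq_err = 0.0, 0.0
    for _ in range(steps):
        target = 1.0                        # constant reference (assumption)
        # Guard (assumption): avoid dividing by an estimate of b that is near zero.
        b_hat = theta[1] if abs(theta[1]) > 1e-3 else 1e-3
        u = (target - theta[0] * y) / b_hat            # certainty-equivalence law
        y_next = a * y + b * u + rng.gauss(0.0, noise_std)
        phi = [y, u]
        # Recursive LS update of (a_t, b_t).
        P_phi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                 P[1][0] * phi[0] + P[1][1] * phi[1]]
        gain = 1.0 / (1.0 + phi[0] * P_phi[0] + phi[1] * P_phi[1])
        err = y_next - (theta[0] * phi[0] + theta[1] * phi[1])
        theta = [theta[i] + gain * P_phi[i] * err for i in range(2)]
        P = [[P[i][j] - gain * P_phi[i] * P_phi[j] for j in range(2)]
             for i in range(2)]
        sq_err += (y_next - target) ** 2
        y = y_next
    return theta, sq_err / steps


theta_hat, avg_sq_err = simulate()
```

With a constant reference the signals are only weakly exciting, so the estimates need not approach the true `(a, b)` even when the tracking error is small; this is one reason the analysis of the closed loop is delicate.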

30
Theoretical Problems

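a) Stability

$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} (y_t^2 + u_t^2) < \infty \quad \text{a.s.}$

b) Optimality

$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} (y_t - y_t^{*})^2 = \sigma^2 \quad \text{a.s.}$, where $\sigma^2$ is the noise variance.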
32
Theoretical Obstacles

1) The closed-loop system is a very complicated
nonlinear stochastic dynamical system.

2) No useful statistical properties, such as
stationarity or independence of the system
signals, are available.

3) No properties of $(a_t, b_t)$ are known a priori.
33
Theorem
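Assumptions:
1) The noise sequence $\{w_t\}$ is a
martingale difference sequence, and there exists
a constant $\beta > 2$ such that
$\sup_t E[|w_{t+1}|^{\beta} \mid \mathcal{F}_t] < \infty$ a.s.
2) The regression vector $\varphi_t$ is an
adapted sequence, i.e., $\varphi_t$ is $\mathcal{F}_t$-measurable.
3) The reference signal $\{y_t^{*}\}$ is a deterministic bounded signal.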

Theorem
Under the above assumptions, the closed-loop
system is stable and optimal.

Lei Guo, Automatica, 1995
35
Prisoners Dilemma

The story of the prisoner's dilemma
Players: two prisoners
Actions: Cooperate (C), Defect (D)
Payoff matrix (payoff to A, payoff to B)

                   Prisoner B
                   C        D
Prisoner A    C    (3,3)    (0,5)
              D    (5,0)    (1,1)
36
Prisoners Dilemma

No matter what the other does, the best choice
is D.
(D,D) is a Nash equilibrium.
But if both choose D, both do worse than
if both choose C.
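
The two claims above (D is always the best reply, and (D,D) is the only Nash equilibrium) can be checked by brute force. This is a small sketch using the payoff values (3,3), (0,5), (5,0), (1,1) from the slide; the names `PAYOFF`, `is_nash`, `equilibria` and `d_dominates` are illustrative.

```python
# (move of A, move of B) -> (payoff to A, payoff to B)
PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
MOVES = ("C", "D")


def is_nash(a, b):
    """True if neither player gains by deviating unilaterally from (a, b)."""
    a_ok = all(PAYOFF[(a, b)][0] >= PAYOFF[(alt, b)][0] for alt in MOVES)
    b_ok = all(PAYOFF[(a, b)][1] >= PAYOFF[(a, alt)][1] for alt in MOVES)
    return a_ok and b_ok


equilibria = [(a, b) for a in MOVES for b in MOVES if is_nash(a, b)]
# D strictly dominates C for player A (the game is symmetric, so also for B).
d_dominates = all(PAYOFF[("D", b)][0] > PAYOFF[("C", b)][0] for b in MOVES)
```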

37
Iterated Prisoners Dilemma

The individuals
Meet many times
Can recognize a previous interactant
Remember the prior outcome
Strategy: specifies the probability of cooperation
and defection based on the history
P(C) = f1(History)
P(D) = f2(History)

38
Strategies

Tit For Tat - Cooperate on the first move, then
repeat opponent's last choice. In the example,
Player A plays Tit For Tat against Player B.
Player A: C D D C C C C C D D D D C
Player B: D D C C C C C D D D D C
Tit For Tat and Random - Repeat opponent's last
choice skewed by random setting.
Tit For Two Tats and Random - Like Tit For Tat
except that opponent must make the same choice
twice in a row before it is reciprocated. Choice
is skewed by random setting.
Tit For Two Tats - Like Tit For Tat except that
opponent must make the same choice twice in a row
before it is reciprocated.
Naive Prober (Tit For Tat with Random Defection)
- Repeat opponent's last choice (ie Tit For Tat),
but sometimes probe by defecting in lieu of
cooperating.
Remorseful Prober (Tit For Tat with Random
Defection) - Repeat opponent's last choice (ie
Tit For Tat), but sometimes probe by defecting in
lieu of cooperating. If the opponent defects in
response to probing, show remorse by cooperating
once.
Naive Peace Maker (Tit For Tat with Random
Co-operation) - Repeat opponent's last choice (ie
Tit For Tat), but sometimes make peace by
co-operating in lieu of defecting.
True Peace Maker (hybrid of Tit For Tat and Tit
For Two Tats with Random Cooperation) - Cooperate
unless opponent defects twice in a row, then
defect once, but sometimes make peace by
cooperating in lieu of defecting.
Random - cooperate or defect with 50% probability each.

40
Strategies

Always Defect
Always Cooperate
Grudger (Co-operate, but only be a sucker once) -
Cooperate until the opponent defects. Then always
defect unforgivingly.
Pavlov (repeat last choice if good outcome) - If
5 or 3 points scored in the last round then
repeat last choice.
Pavlov / Random (repeat last choice if good
outcome and Random) - If 5 or 3 points scored in
the last round then repeat last choice, but
sometimes make random choices.
Adaptive - Starts with c,c,c,c,c,c,d,d,d,d,d and
then takes choices which have given the best
average score re-calculated after every move.
Gradual - Cooperates until the opponent defects,
in such case defects the total number of times
the opponent has defected during the game.
Followed up by two co-operations.
Suspicious Tit For Tat - As for Tit For Tat
except begins by defecting.
Soft Grudger - Cooperates until the opponent
defects, in such case opponent is punished with
d,d,d,d,c,c.
Customised strategy 1 - default setting is T=1,
P=1, R=1, S=0, B=1, always co-operate unless
sucker (i.e., 0 points scored).
Customised strategy 2 - default setting is T=1,
P=1, R=0, S=0, B=0, always play alternating
defect/cooperate.
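
A few of the deterministic strategies listed above can be written as functions of the two move histories. This is a minimal sketch: the function signatures and the helper `respond` are illustrative assumptions, and Tit For Two Tats is read here as "defect only after two consecutive defections". The opponent sequence is Player B's row from the Tit For Tat example.

```python
def tit_for_tat(mine, theirs):
    return theirs[-1] if theirs else "C"


def suspicious_tit_for_tat(mine, theirs):
    return theirs[-1] if theirs else "D"


def tit_for_two_tats(mine, theirs):
    return "D" if theirs[-2:] == ["D", "D"] else "C"


def grudger(mine, theirs):
    return "D" if "D" in theirs else "C"


def respond(strategy, opponent_moves):
    """Moves played by `strategy` against a fixed sequence of opponent moves."""
    mine, theirs = [], []
    for move in opponent_moves:
        mine.append(strategy(mine, theirs))   # decide before seeing this round's move
        theirs.append(move)
    return "".join(mine)


opponent = list("DDCCCCCDDDDC")
tft_moves = respond(tit_for_tat, opponent)
```

Here `tft_moves` reproduces the first twelve moves of Player A in the example, since Tit For Tat simply echoes the opponent with a one-round delay.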

41
Iterated Prisoners Dilemma

Which strategy can thrive? What is a good
strategy?
Robert Axelrod, 1980s
A computer round-robin tournament
The first round
The second round

AXELROD R. 1987. The evolution of strategies in
the iterated Prisoners' Dilemma. In L. Davis,
editor, Genetic Algorithms and Simulated
Annealing. Morgan Kaufmann, Los Altos, CA.
42
Characters of good strategies

Goodness: never defect first
First round: the top eight strategies all have
goodness
Second round: fourteen of the top fifteen
strategies have goodness
Forgiveness: may take revenge, but the memory is
short.
Grudger is not a strategy with forgiveness
Goodness and forgiveness are a kind of
collective behavior.
For a single agent, defect is the best strategy.

43
Evolution of the Strategies

Evolve good strategies by genetic algorithm
(GA)

44
Some Notations in GA

String: an individual; it is used to
represent the chromosome in genetics.
Population: the set of individuals
Population size: the number of individuals
Gene: an element of the string
E.g., S = 1011, where 1, 0, 1, 1 are called
genes.
Fitness: the adaptation of the agent to the
circumstance

From Jing Han's PPT
45
How does GA work?

Represent the solution of the problem by a
chromosome, i.e., a string
Randomly generate some chromosomes as the initial
solutions
According to the principle of "survival of the
fittest", chromosomes with high fitness
reproduce; crossover and mutation then produce the new
generation.
The chromosome with the highest fitness may be
the solution of the problem.

From Jing Han's PPT
46
GA
[Figure: GA cycle of natural selection and creation of a new generation by crossover and mutation]

choose an initial population
determine the fitness of each individual
perform selection
repeat
    perform crossover
    perform mutation
    determine the fitness of each individual
    perform selection
until some stopping criterion applies

From Jing Han's PPT
47
Some Remarks On GA

The GA searches for the optimal solution from a set of
solutions, rather than from a single solution
The search space is large: $\{0,1\}^L$
GA is a randomized algorithm, since selection,
crossover and mutation are all random operations.
Suitable for the following situations:
There is structure in the search space but it is
not well-understood
The inputs are non-stationary (i.e., the
environment is changing)
The goal is not global optimization, but finding
a reasonably good solution quickly
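
The select, crossover, mutate cycle can be sketched on a toy problem. This is an illustrative sketch only: the fitness (number of 1s in the string), fitness-proportional selection, one-point crossover, the parameter values, and keeping the best individual unchanged (elitism) are all assumptions made for the example, not choices stated in the lecture.

```python
import random


def evolve(length=20, pop_size=20, generations=30, mutation_rate=0.01, seed=1):
    """Toy GA that maximizes the number of 1s in a bit string."""
    rng = random.Random(seed)
    fitness = sum                                   # assumed fitness function
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = [max(map(fitness, pop))]              # best fitness per generation
    for _ in range(generations):
        elite = max(pop, key=fitness)               # elitism: the best survives
        weights = [fitness(ind) + 1 for ind in pop]  # +1 avoids all-zero weights
        children = [elite[:]]
        while len(children) < pop_size:
            p1, p2 = rng.choices(pop, weights=weights, k=2)   # selection
            cut = rng.randrange(1, length)                    # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                          # mutation
            children.append(child)
        pop = children
        history.append(max(map(fitness, pop)))
    return max(pop, key=fitness), history


best, history = evolve()
```

Because of elitism the best fitness in `history` never decreases; without it, a good chromosome can be lost to crossover or mutation.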

48
Evolution of Strategies By GA

Each chromosome represents one strategy
The strategy is deterministic and it is
determined by the previous moves.
E.g., if the strategy is determined by one-step
history, then there are four cases of history:
Player I:  C D D C
Player II: D D C C
The number of possible strategies is
2×2×2×2 = 16.
TFT: F(CC)=C, F(CD)=D, F(DC)=C, F(DD)=D
Always cooperate: F(CC)=F(CD)=F(DC)=F(DD)=C
Always defect: F(CC)=F(CD)=F(DC)=F(DD)=D
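
The one-step-memory case is small enough to enumerate. This is a minimal sketch, assuming each history is written as (own last move, opponent's last move) and each strategy is a lookup table from history to move; the names are illustrative.

```python
from itertools import product

HISTORIES = ["CC", "CD", "DC", "DD"]   # (own last move, opponent's last move)

# Every deterministic one-step-memory strategy is a table: history -> move.
ALL_STRATEGIES = [dict(zip(HISTORIES, moves)) for moves in product("CD", repeat=4)]

TFT = {"CC": "C", "CD": "D", "DC": "C", "DD": "D"}
ALWAYS_COOPERATE = dict.fromkeys(HISTORIES, "C")
ALWAYS_DEFECT = dict.fromkeys(HISTORIES, "D")


def chromosome(strategy):
    """Encode a strategy as a 4-gene string, one gene per history."""
    return "".join(strategy[h] for h in HISTORIES)
```

For example, `chromosome(TFT)` is the string `"CDCD"`, which is the kind of representation the GA operates on.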

49
Evolution of the Strategies

Strategies use the outcome of the three previous
moves to determine the current move.
The number of possible histories is
4×4×4 = 64.
Player I:  CCC CCD CDC CDD DCC DCD ... DDD DDD
Player II: CCC CCC CCC CCC CCC CCC ... DDC DDD
[Figure: a chromosome shown as a string of C's and D's, one gene per history]

The initial premise is three hypothetical moves.
The length of the chromosome is 70
(64 genes for the histories plus 6 genes for the premise).
The total number of strategies is $2^{70} \approx 10^{21}$.
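
The counting above can be reproduced directly. This is a small sketch, assuming each round outcome is one of CC, CD, DC, DD (own move, opponent's move) and that a three-round history is read as a base-4 number to locate its gene; the 6 extra genes for the hypothetical initial moves are this sketch's reading of why the length is 70.

```python
OUTCOMES = ["CC", "CD", "DC", "DD"]   # (own move, opponent's move) in one round


def history_index(last_three):
    """Map three round outcomes (oldest first) to a gene position 0..63."""
    index = 0
    for outcome in last_three:
        index = 4 * index + OUTCOMES.index(outcome)
    return index


num_histories = len(OUTCOMES) ** 3         # 4 * 4 * 4 = 64
chromosome_length = num_histories + 6      # plus 6 premise genes = 70
num_strategies = 2 ** chromosome_length    # 2^70, roughly 1.18e21
```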

50
Evolution of good strategy

Five steps of evolving good strategies by GA
1) An initial population is chosen.
2) Each individual is run in the current environment
to determine its effectiveness.
3) The relatively successful individuals are selected
to have more offspring.
4) The successful individuals are randomly paired
off to produce two offspring per mating.
Crossover: a way of constructing the chromosomes of
the two offspring from the chromosomes of the two
parents.
Mutation: randomly changing a very small
proportion of the C's to D's and vice versa.
5) A new population is generated.

51
Evolution of the Strategies

Some parameters
The population size in each generation is 20.
Each game consists of 151 moves.
Each individual meets eight representatives, which
makes about 24,000 moves per generation.
A run consists of 50 generations.
Forty runs were conducted.

52
Results

The median member is as successful as TFT
Most of the strategies resemble TFT
Some of them have patterns similar to TFT:
Do not rock the boat: continue to cooperate after
mutual cooperation
Be provocable: defect when the other player
defects out of the blue
Accept an apology: continue to cooperate after
cooperation has been restored
Forget: cooperate when mutual cooperation has
been restored after an exploitation
Accept a rut: defect after three mutual
defections

53
What is a good strategy?

Is TFT a good strategy?
Tit For Two Tats may be the best strategy in the
first round, but it is not a good strategy in the
second round.
A good strategy depends on the other strategies,
i.e., the environment.

Evolutionarily stable strategy
54
Evolutionarily stable strategy (ESS)

Introduced by John Maynard Smith and George R.
Price in 1973
An ESS is a strategy such that, if all members of the
population adopt it, then no mutant strategy
can invade the population under the influence
of natural selection.
An ESS is robust under evolution: it cannot be
invaded by mutation.

John Maynard Smith, Evolution and the Theory of
Games
55
Definition of ESS

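A strategy x is an ESS if, for all y ≠ x,

$u(x, \varepsilon y + (1-\varepsilon) x) > u(y, \varepsilon y + (1-\varepsilon) x)$

holds for all sufficiently small positive $\varepsilon$, where $u(a, b)$ is the payoff of strategy a against population b.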

56
ESS in IPD

Tit For Tat cannot be invaded by wily
strategies, such as Always Defect.
TFT can be invaded by good strategies, such
as Always Cooperate, Tit For Two Tats and
Suspicious Tit For Tat
Tit For Tat is not a strict ESS.
Always Cooperate can be invaded by Always
Defect.
Always Defect is an ESS.
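
Some of these invasion claims can be checked numerically for a fixed-length game. This is a sketch under stated assumptions: matches last 10 rounds, payoffs are the 5/3/1/0 values of the payoff matrix, and `resists` only tests the strict first-order condition (the resident scores strictly more against residents than a rare mutant does). The names and the round count are illustrative.

```python
# Payoff to the first player for (own move, opponent's move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(mine, theirs):
    return theirs[-1] if theirs else "C"


def always_defect(mine, theirs):
    return "D"


def always_cooperate(mine, theirs):
    return "C"


def score(s1, s2, rounds=10):
    """Total payoff to s1 when it plays s2 for a fixed number of rounds."""
    h1, h2, total = [], [], 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        total += PAYOFF[(m1, m2)]
        h1.append(m1)
        h2.append(m2)
    return total


def resists(resident, mutant):
    """Strict check: residents do better against residents than a rare mutant does."""
    return score(resident, resident) > score(mutant, resident)
```

In this setting Tit For Tat resists Always Defect, but Always Cooperate scores exactly as well as Tit For Tat in a Tit For Tat population, so the strict inequality fails; that tie is the sense in which Tit For Tat is not a strict ESS.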

57
Other Adaptive Systems

Complex adaptive system
John Holland, Hidden Order, 1996
Examples:
stock market, social insects, ant colonies,
biosphere, brain, immune system, cell,
developing embryo, ...
Evolutionary algorithms:
genetic algorithm, neural network, ...

58
References

Lei Guo, Self-convergence of weighted
least-squares with applications to stochastic
adaptive control, IEEE Trans. Automat. Contr.,
1996, 41(1): 79-89.
Lei Guo, Convergence and logarithm laws of
self-tuning regulators, Automatica, 1995, 31(3):
435-450.
Lei Guo, Adaptive systems theory: some basic
concepts, methods and results, Journal of Systems
Science and Complexity, 16(3): 293-306.
Drew Fudenberg, Jean Tirole, Game Theory, The
MIT Press, 1991.
R. Axelrod, The evolution of strategies in
the iterated Prisoner's Dilemma, in L. Davis,
editor, Genetic Algorithms and Simulated
Annealing, Morgan Kaufmann, Los Altos, CA, 1987.
Richard Dawkins, The Selfish Gene, Oxford
University Press.
John Holland, Hidden Order, 1996.

Adaptation in games

Adaptation in a single agent
60
Summary
In these six lectures, we have talked about
Complex Networks
Collective Behavior of MAS
Game Theory
Adaptive Systems
61
Summary
In these six lectures, we have talked about
Complex Networks: Topology
Collective Behavior of MAS
Game Theory
Adaptive Systems
62
Three concepts

Average path length $\langle l \rangle$: the average of $d_{ij}$ over all pairs of nodes,
where $d_{ij}$ is the shortest distance between i
and j.
Clustering coefficient: $C = \langle C(i) \rangle$
Degree distribution:
$P(k)$ = probability that a randomly chosen
node i has exactly k neighbors

Short average path length; large clustering
coefficient; power-law degree distribution
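
The three quantities can be computed for a small graph with the standard library alone. This is a minimal sketch, assuming an undirected connected graph stored as a dict of neighbor sets; the function names, the example graph, and the convention that nodes with fewer than two neighbors have clustering 0 are assumptions of the sketch.

```python
from collections import deque
from itertools import combinations


def average_path_length(adj):
    """Mean shortest-path distance over all node pairs of a connected graph."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                          # breadth-first search from source
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def clustering_coefficient(adj):
    """Average over nodes of the fraction of neighbor pairs that are linked."""
    values = []
    for nbs in adj.values():
        if len(nbs) < 2:
            values.append(0.0)                # convention for degree < 2
            continue
        links = sum(1 for u, v in combinations(nbs, 2) if v in adj[u])
        values.append(2 * links / (len(nbs) * (len(nbs) - 1)))
    return sum(values) / len(values)


def degree_distribution(adj):
    """P(k): fraction of nodes that have exactly k neighbors."""
    counts = {}
    for nbs in adj.values():
        counts[len(nbs)] = counts.get(len(nbs), 0) + 1
    return {k: c / len(adj) for k, c in counts.items()}


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```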
63
Regular Graphs

Regular graphs: graphs where each vertex has
the same number of neighbors.
[Figure: examples of regular graphs]

64
Random Graph

ER random graph model G(N,p)
Given N nodes
Add an edge between each pair of
nodes independently with probability p

Homogeneous nature: each node has roughly the
same number of edges
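
A G(N,p) sample takes only a few lines. This is a minimal sketch, assuming the graph is stored as a dict of neighbor sets and that every pair of nodes is considered once; the function names and the `seed` parameter are illustrative.

```python
import random
from itertools import combinations


def er_random_graph(n, p, seed=None):
    """Sample G(n, p): each of the n*(n-1)/2 node pairs is linked with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def edge_count(adj):
    """Number of undirected edges (each edge is stored at both endpoints)."""
    return sum(len(nbs) for nbs in adj.values()) // 2
```

The expected degree of every node is p(N-1), which is the homogeneity noted above.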

65
Small World Networks

WS model

Introduce pNK/2 long-range edges
A few long-range links are sufficient to
decrease l, but will not significantly change C.
66
Scale Free Networks

Some observations
A breakthrough: Barabási & Albert, 1999, Science
Generating process of the BA model
1) Start with a network with $m_0$ nodes.
2) Growth: at each step, we add a new node
with $m$ ($\le m_0$) edges that link the new node to $m$
different nodes already present in the network.
3) Preferential attachment: when choosing
the nodes to which the new node connects, we
assume that the probability $\Pi$ that a new node
will be connected to node i depends on the degree $k_i$ of
node i, such that

$\Pi(k_i) = \dfrac{k_i}{\sum_j k_j}$
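
The growth and preferential-attachment steps can be sketched as follows. This is an illustrative sketch under stated assumptions: the initial $m_0$ nodes form a complete graph (the slide does not say how they are linked), and the m distinct targets are drawn by repeated degree-proportional sampling with duplicates rejected, which only approximates the stated probability once a target has already been picked. All names are hypothetical.

```python
import random


def ba_graph(n, m, m0, seed=None):
    """Grow a BA-style network to n nodes; each new node brings m edges."""
    assert 1 <= m <= m0 <= n and m0 >= 2
    rng = random.Random(seed)
    # Assumption: the m0 initial nodes are fully connected, so all degrees are positive.
    adj = {i: {j for j in range(m0) if j != i} for i in range(m0)}
    # Node i appears in `stubs` once per unit of degree, so a uniform draw
    # from `stubs` picks node i with probability k_i / sum_j k_j.
    stubs = [i for i in adj for _ in adj[i]]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:               # m different existing nodes
            targets.add(rng.choice(stubs))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            stubs += [t, new]                 # both endpoints gain one degree
    return adj


network = ba_graph(n=50, m=2, m0=3, seed=0)
```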

67
Summary
In these six lectures, we have talked about
Complex Networks: Topology
Collective Behavior of MAS: More is different
Game Theory
Adaptive Systems
68
Multi-Agent System (MAS)

MAS
Many agents
Local interactions between agents
Collective behavior at the population level
"More is different." (Philip Anderson, 1972)
e.g., phase transition, coordination,
synchronization, consensus, clustering,
aggregation, ...
Examples
Physical systems
Biological systems
Social and economic systems
Engineering systems

69
Vicsek Model
Neighbors
70
Theorem 2 (Jadbabaie et al., 2003)
Joint connectivity of the neighbor graphs on each
time interval $[th, (t+1)h]$ with $h > 0$
implies synchronization of the linearized Vicsek model.
Related result: J.N. Tsitsiklis, et al., IEEE
TAC, 1984
71
Theorem 7: High Density Implies Synchronization

For any given system parameters,
when the number of agents n
is large, the Vicsek model will synchronize
almost surely.

This theorem is consistent with the simulation
result.
72
Theorem 8: High density with short distance
interaction
Let [condition missing from the transcript]
and the velocity satisfy [condition missing from the transcript]. Then,
for a large population, the MAS will synchronize
almost surely.
73
Soft Control

Key points
Different from the distributed control approach.
Intervention in the distributed system
Do not change the local rule of the existing
agents
Add one (or a few) special agents, called shills,
which use the system state information to
intervene in the collective behavior
The shill is controlled by us, but is treated
as an ordinary agent by all other agents.
A shill is not a leader; this is not a leader-follower model.
Feedback intervention by shill(s).

This page is very important!
From Jing Han's PPT
74
Leader-Follower Model

Key points
Do not change the local rule of the existing
agents.
Add some (usually not very few) information
agents, called leaders, to control or
intervene in the MAS, but the existing agents
treat them as ordinary agents.
The proportion of the leaders is controlled by us
(if the number of leaders is small, then
connectivity may not be guaranteed).
Open-loop intervention by leaders.

75
Summary
In these six lectures, we have talked about
Complex Networks: Topology
Collective Behavior of MAS: More is different
Game Theory: Interactions
Adaptive Systems
76
Definition of Nash Equilibrium

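Nash Equilibrium (NE): a solution concept of a
game

(N, S, u): a game
$S_i$: strategy set for player i
$S = S_1 \times S_2 \times \cdots \times S_n$: set of
strategy profiles
$u = (u_1, \ldots, u_n)$: payoff function
$s_{-i}$: strategy profile of all players except
player i
A strategy profile $s^{*}$ is called a Nash
equilibrium if

$u_i(s_i^{*}, s_{-i}^{*}) \ge u_i(s_i, s_{-i}^{*})$ for every player i,

where $s_i$ is any pure strategy of player i.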

77
Definition of ESS

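A strategy x is an ESS if, for all y ≠ x,

$u(x, \varepsilon y + (1-\varepsilon) x) > u(y, \varepsilon y + (1-\varepsilon) x)$

holds for all sufficiently small positive $\varepsilon$, where $u(a, b)$ is the payoff of strategy a against population b.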

78
Summary
In these six lectures, we have talked about
Complex Networks: Topology
Collective Behavior of MAS: More is different
Game Theory: Interactions
Adaptive Systems: Adaptation
79
Other Topics

Self-organized criticality
Earthquakes, fire, sand pile model, Bak-Sneppen
model
Nonlinear dynamics
chaos, bifurcation, ...
Artificial life
Tierra model, gene pool, game of life, ...
Evolutionary dynamics
genetic algorithm, neural network, ...

80
Complex systems

Not a mature subject
No unified framework or universal methods

81
THE END
