Transcript and Presenter's Notes

Title: Fuzzy Inference System Learning By Reinforcement


1
Fuzzy Inference System Learning By Reinforcement
  • Presented by
  • Alp Sardag

2
A Comparison of Fuzzy and Classical Controllers
  • Fuzzy Controller: an expert system based on if-then
    rules whose premises and conclusions are
    expressed by means of linguistic terms.
  • Rules are close to natural language
  • Requires a priori knowledge
  • Classical Controller: needs an analytical model of
    the task.

3
Design Problems of FC
  • A priori knowledge extraction is not easy
  • Disagreement between experts
  • A great number of variables may be necessary to
    solve the control task

4
Self-Tuning FIS
  • A direct teacher: based on an input-output set of
    training data.
  • A distal teacher: does not give the correct
    actions, but the desired effect on the process.
  • A performance measure: EA methods.
  • A critic: gives rewards and punishments with
    respect to the state reached by the learner. RL
    methods.
  • No more than two fuzzy sets are activated for any
    input value.

5
Goal
  • To overcome the limitations of classical
    reinforcement learning methods: discrete state
    perception and discrete actions.
  • NOTE: In this paper a MISO FIS is used.

6
A MIMO FIS
The FIS is made of N rules of the following form:

R_i: IF S_1 is L_{i1} and ... and S_n is L_{in}
     THEN Y_1 is O_{i1} and ... and Y_{N_O} is O_{iN_O}

where
R_i: ith rule of the rule base
S_j: input variables
L_{ij}: linguistic term of input variable S_j, with
  membership function μ_{L_{ij}}
Y_1 ... Y_{N_O}: output variables
O_{ij}: linguistic term of output variable Y_j
7
Rule Preconditions
  • Membership functions are triangles and trapezoids
    (although not differentiable)
  • because they are simple
  • Sufficient in a number of applications
  • A strong fuzzy partition is used
  • All values activate at least one fuzzy set; the
    input universe is completely covered.

8
Strong Fuzzy Partition Example
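The partition itself was shown as a figure. As a minimal illustration (Python; the five labels and the universe [-1, 1] are made-up values, not taken from the slides), a strong fuzzy partition built from triangular sets activates at most two sets at any point and its membership degrees sum to 1:

import numpy as np

def strong_triangular_partition(x, centers):
    """Membership degrees of x in a strong fuzzy partition built from
    triangular sets centered at `centers` (shoulders at both ends).
    At any x, at most two sets are active and the degrees sum to 1."""
    centers = np.asarray(centers, dtype=float)
    mu = np.zeros(len(centers))
    x = float(np.clip(x, centers[0], centers[-1]))
    k = int(np.searchsorted(centers, x, side="right")) - 1
    k = min(k, len(centers) - 2)            # stay inside the last interval
    t = (x - centers[k]) / (centers[k + 1] - centers[k])
    mu[k], mu[k + 1] = 1.0 - t, t
    return mu

# Example: five labels over the (made-up) universe [-1, 1]
print(strong_triangular_partition(0.3, [-1.0, -0.5, 0.0, 0.5, 1.0]))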
9
Rule Conclusions
  • Each of the N rules has No corresponding conclusions
  • For each rule, the truth value with respect to the
    input vector S is computed from the memberships of
    its precondition labels,
  • where the T-norm is implemented by a product
  • The FIS outputs are truth-value-weighted
    combinations of the rule conclusions (see the sketch
    below)
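A minimal sketch of this inference step (Python; the function and argument names are illustrative and assume a MISO FIS with crisp rule conclusions, as used later in the paper):

import numpy as np

def rule_truth_values(memberships_per_input):
    """Truth value of each rule: product T-norm over the membership
    degrees of its precondition labels.
    memberships_per_input[j][i] = degree of input j in the label used
    by rule i (0 if that label is not activated)."""
    return np.prod(np.asarray(memberships_per_input, dtype=float), axis=0)

def fis_output(truth_values, conclusions):
    """Crisp output: truth-value-weighted sum of the rule conclusions,
    normalized by the total activation (equal to 1 when a strong fuzzy
    partition covers all label combinations)."""
    truth_values = np.asarray(truth_values, dtype=float)
    return float(truth_values @ np.asarray(conclusions, dtype=float)) / truth_values.sum()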

10
Learning
  • The number and positions of the input fuzzy labels
    are set using a priori knowledge.
  • Structural learning consists in tuning the
    number of rules.
  • FACL and FQL are reinforcement learning
    methods that deal only with the conclusion part.

11
Reinforcement Learning
NOTE: state observability is assumed to be total.
12
Markovian Decision Problem
  • S: a finite discrete state set
  • U: a finite discrete action set
  • R: primary reinforcements, R: S x U → ℝ
  • P: transition probabilities,
    P: S x U x S → [0, 1]
  • State evaluation function (a standard form is given
    below)
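The formula for the evaluation function did not survive the transcript; a standard discounted form (an assumption, not copied from the slide) is

  V^{\pi}(s) = E_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \;\middle|\; s_{0} = s \right],
  \qquad 0 \le \gamma < 1.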

13
The Curse of Dimensionality
  • Some form of generalization must be incorporated
    into the state representation. Various function
    approximators are used:
  • CMAC
  • Neural networks
  • FIS: the state space encoding is based on the vector
    of rule truth values corresponding to the current
    state.

14
Adaptive Heuristic Critic
  • AHC is made of two components:
  • Adaptive Critic Element: a critic developed in an
    adaptive way from primary reinforcements; it
    represents an evaluation function more informative
    than the one given by the environment through
    rewards and punishments (the V(S) values).
  • Associative Search Element: selects actions which
    lead to better critic values.

15
FACL Scheme
16
The Critic
At time step t, the critic value is computed with the
conclusion vector v; the TD error is then given by the
temporal difference between successive evaluations, and
the conclusion vector is adjusted with the TD-learning
update rule (standard forms are reconstructed below).
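The three formulas were images in the original slides; standard FACL-style forms (reconstructed under the assumption that Phi_t denotes the vector of rule truth values alpha_i(S_t), v the conclusion vector, gamma the discount factor and beta the critic learning rate) are

  V_t(S_t) = \sum_i \alpha_i(S_t)\, v_i = v_t^{\top} \Phi_t
  \tilde{\varepsilon}_{t+1} = r_{t+1} + \gamma\, V_t(S_{t+1}) - V_t(S_t)
  v_{t+1} = v_t + \beta\, \tilde{\varepsilon}_{t+1}\, \Phi_t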
17
The Actor
  • When rule Ri is activated, one of its local
    actions is elected to participate in the global
    action, based on its quality; the global action is
    then triggered (see the sketch below),
  • where ε-greedy is a function implementing a
    mixed exploration-exploitation strategy.
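A minimal sketch of this election step (Python; the quality table w[i][a], the shared discrete action values `actions`, and the plain epsilon-greedy scheme are illustrative stand-ins for the paper's exploration-exploitation function):

import numpy as np

def elect_local_actions(truth_values, w, epsilon, rng):
    """For each activated rule, elect one local discrete action:
    exploit the best quality with probability 1 - epsilon,
    explore a random action otherwise."""
    elected = np.empty(len(truth_values), dtype=int)
    for i, alpha in enumerate(truth_values):
        if alpha > 0 and rng.random() < epsilon:
            elected[i] = rng.integers(len(w[i]))        # explore
        else:
            elected[i] = int(np.argmax(w[i]))           # exploit
    return elected

def global_action(truth_values, elected, actions):
    """Global action: truth-value-weighted combination of the local
    actions elected by the activated rules."""
    u = sum(a * actions[j] for a, j in zip(truth_values, elected))
    return u / sum(truth_values)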

18
Tuning vector w
  • The TD error is used as the improvement measure;
    except at the beginning of learning, the critic is a
    good approximator of the optimal evaluation function.
    The actor learning rule is reconstructed below.
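The rule itself was an image; a standard form (a reconstruction, assuming that only the quality of the local action U_t^i elected by each activated rule R_i is reinforced, in proportion to the rule's truth value, with actor learning rate beta') is

  w_{t+1}[i, U_t^{i}] = w_t[i, U_t^{i}] + \beta'\, \tilde{\varepsilon}_{t+1}\, \alpha_i(S_t)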

19
Meta Learning Rule
  • Update strategy for the learning rates:
  • Every parameter should have its own learning rate
    (η1 ... ηn).
  • Every learning rate should be allowed to vary
    over time (in order for the V values to converge).
  • When the derivative of a parameter has the same
    sign for several consecutive time steps, its
    learning rate should be increased.
  • When the sign of a parameter's derivative alternates
    for several consecutive time steps, its learning rate
    should be decreased. This is the Delta-Bar-Delta
    rule, sketched below.
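A short sketch of the Delta-Bar-Delta rule as described by these bullets (Python; the additive increment kappa, multiplicative decay phi and averaging factor theta are the usual free parameters of the rule, not values from the paper):

import numpy as np

def delta_bar_delta(rates, delta_bar, delta, kappa=0.01, phi=0.5, theta=0.7):
    """Per-parameter learning-rate adaptation (Delta-Bar-Delta).
    delta     : current derivative of each parameter
    delta_bar : exponential average of past derivatives
    rates     : one learning rate per parameter"""
    same_sign   = delta * delta_bar > 0                   # consistent direction
    alternating = delta * delta_bar < 0                   # oscillating direction
    rates = np.where(same_sign, rates + kappa, rates)     # increase additively
    rates = np.where(alternating, rates * phi, rates)     # decrease multiplicatively
    delta_bar = (1 - theta) * delta + theta * delta_bar   # update the average
    return rates, delta_bar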

20
Execution Procedure
  • Estimation of the evaluation function corresponding
    to the current state.
  • Computation of the TD error.
  • Tuning of the parameter vectors v and w.
  • Estimation of the new evaluation function for the
    current state with the new conclusion vector v_{t+1}.
  • Learning rate update with the Delta-Bar-Delta rule.
  • For each activated rule, election of the local
    action; computation and triggering of the global
    action U_{t+1}. A sketch of one full step follows
    below.
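Put together, one time step of the procedure can be sketched as follows (Python, operating on plain arrays; names and default values are illustrative, not taken from the paper, and the Delta-Bar-Delta update is omitted for brevity):

import numpy as np

def facl_step(phi_t, phi_t1, v, w, elected_t, reward_t1, actions,
              gamma=0.95, beta_v=0.05, beta_w=0.01, epsilon=0.1, rng=None):
    """One iteration of the execution procedure.
    phi_t, phi_t1 : rule truth values at S_t and S_{t+1}
    v             : critic conclusion vector (one value per rule)
    w             : actor qualities, shape (n_rules, n_actions)
    elected_t     : local action elected by each rule at time t
    actions       : discrete action values, shared by all rules"""
    rng = rng or np.random.default_rng()
    # 1-2. evaluation of the states and TD error
    td_error = reward_t1 + gamma * float(phi_t1 @ v) - float(phi_t @ v)
    # 3. tuning of the critic vector v and of the actor table w
    v = v + beta_v * td_error * phi_t
    w = w.copy()
    w[np.arange(len(phi_t)), elected_t] += beta_w * td_error * phi_t
    # 4. re-estimation with the new conclusion vector
    v_t1 = float(phi_t1 @ v)
    # 5. learning-rate update (Delta-Bar-Delta) would go here
    # 6. election of the local actions and of the global action U_{t+1}
    explore = rng.random(len(phi_t1)) < epsilon
    elected_t1 = np.where(explore,
                          rng.integers(len(actions), size=len(phi_t1)),
                          np.argmax(w, axis=1))
    u_t1 = float(phi_t1 @ np.asarray(actions)[elected_t1]) / float(phi_t1.sum())
    return v, w, v_t1, elected_t1, u_t1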

21
Example
22
Example (Cont.)
  • The number of rules is twenty-five.
  • For the sake of simplicity, the discrete actions
    available are the same for all rules.
  • The discrete action set
  • The reinforcement function

23
Results
  • Performance measure for distance
  • Results