Fuzzy Inference System Learning By Reinforcement




Fuzzy Inference System Learning By Reinforcement
  • Presented by
  • Alp Sardag

A Comparison of Fuzzy and Classical Controllers
  • Fuzzy controller: expert systems based on if-then
    rules where premises and conclusions are
    expressed by means of linguistic terms.
  • Rules are close to natural language.
  • Relies on a priori knowledge.
  • Classical controller: needs an analytical model
    of the task.

Design Problems of Fuzzy Controllers (FC)
  • A priori knowledge extraction is not easy.
  • Experts may disagree.
  • A great number of variables may be necessary to
    solve the control task.

Self-Tuning FIS
  • A direct teacher: based on an input-output set of
    training data.
  • A distal teacher: does not give the correct
    actions, but the desired effect on the process.
  • A performance measure (EA).
  • A critic: gives rewards and punishments with
    respect to the state reached by the learner (RL).
  • There are no more than two fuzzy sets activated
    for an input value.

  • Goal: to overcome the limitations of classical
    reinforcement learning methods, namely discrete
    state perception and discrete actions.
  • NOTE: In this paper a MISO (multiple-input,
    single-output) FIS is used.

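FIS is made of N rules of the following form
$R_i$: if $S_1$ is $L^i_1$ and ... and $S_{N_I}$ is $L^i_{N_I}$
then $Y_1$ is $O^i_1$ and ... and $Y_{N_O}$ is $O^i_{N_O}$
  • $R_i$: ith rule of the rule base
  • $S_j$: input variables ($j = 1, \dots, N_I$)
  • $L^i_j$: linguistic term of input variable $S_j$ in
    rule $R_i$; its membership function is $\mu_{L^i_j}$
  • $Y_k$: output variables ($k = 1, \dots, N_O$)
  • $O^i_k$: linguistic term of output variable $Y_k$ in
    rule $R_i$
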
Rule Preconditions
  • Membership functions are triangles and trapezoids
    (although these are not differentiable),
  • because they are simple
  • and sufficient in a number of applications.
  • A strong fuzzy partition is used.
  • All values activate at least one fuzzy set, so the
    input universe is completely covered.
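
As an illustration of the points above, here is a minimal sketch of a strong fuzzy partition built from triangular membership functions. The function name `strong_partition` and the label centers are hypothetical and not taken from the presentation; inputs outside the outermost centers are clamped, which gives the two end labels a trapezoid-like shoulder.

```python
def strong_partition(centers):
    """Build triangular labels over sorted centers; degrees always sum to 1."""
    def memberships(x):
        # Clamp so the outermost labels act as shoulders (full membership).
        x = min(max(x, centers[0]), centers[-1])
        degrees = [0.0] * len(centers)
        for k in range(len(centers) - 1):
            left, right = centers[k], centers[k + 1]
            if left <= x <= right:
                w = (x - left) / (right - left)
                degrees[k] = 1.0 - w      # descending side of label k
                degrees[k + 1] = w        # ascending side of label k + 1
                break
        return degrees
    return memberships

# Three labels, e.g. "negative", "zero", "positive" (illustrative centers).
mu = strong_partition([-1.0, 0.0, 1.0])
degrees = mu(0.25)
print(degrees)
```

At most two labels are active for any input value, and their degrees sum to one, which is the property the slides rely on.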

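Strong Fuzzy Partition Example
  • [Figure: example of a strong fuzzy partition]

Rule Conclusions
  • Each rule $R_i$ has $N_O$ corresponding conclusions.
  • For each rule, the truth value with respect to
    the input vector S is computed from the
    membership degrees of its premises,
  • where the T-norm is implemented by a product.
  • The FIS outputs are computed from the rule truth
    values and the rule conclusions.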
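
The equations for this slide did not survive in the transcript. The sketch below shows one common reading, stated here as an assumption: the truth value of a rule is the product of the membership degrees of its premises, and a single output is the truth-value-weighted sum of scalar rule conclusions. The names `rule_truth_values` and `fis_output` and all numbers are illustrative.

```python
from itertools import product
from math import prod

def rule_truth_values(degrees_per_input):
    """One rule per combination of input labels; product T-norm over premises."""
    return [prod(combo) for combo in product(*degrees_per_input)]

def fis_output(truth_values, conclusions):
    """Assumed output form: weighted sum of scalar rule conclusions."""
    return sum(a * o for a, o in zip(truth_values, conclusions))

# Two inputs with two active labels each give 2 x 2 = 4 rules.
degrees = [[0.75, 0.25], [0.4, 0.6]]
alpha = rule_truth_values(degrees)
y = fis_output(alpha, conclusions=[0.0, 1.0, 2.0, 3.0])
print(alpha, y)
```

With a strong fuzzy partition on every input and a rule base covering every label combination, the truth values sum to one, so no normalization is needed in the weighted sum.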

  • The number and positions of the input fuzzy labels
    are set using a priori knowledge.
  • Structural learning consists in tuning the
    number of rules.
  • FACL (Fuzzy Actor-Critic Learning) and FQL (Fuzzy
    Q-Learning) are reinforcement learning methods
    that deal only with the conclusion part.

Reinforcement Learning
  • NOTE: state observability is total.

Markovian Decision Problem
  • S: a finite discrete state space
  • U: a finite discrete action set
  • R: primary reinforcements, $R: S \times U \to \mathbb{R}$
  • P: transition probabilities,
    $P: S \times U \times S \to [0, 1]$
  • State evaluation function V(S)
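
The formula for the state evaluation function is missing from the transcript. A standard definition, assumed here, is the expected discounted sum of primary reinforcements under a fixed policy, which satisfies V(s) = R(s, π(s)) + γ Σ P(s, π(s), s') V(s'). The sketch below evaluates it by repeated sweeps on a hypothetical two-state problem; the state names, the policy and the discount factor γ = 0.9 are illustrative.

```python
def evaluate_policy(states, policy, R, P, gamma=0.9, sweeps=500):
    """Iterative policy evaluation for a finite MDP with R: S x U -> reals."""
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        V = {
            s: R[s, policy[s]]
            + gamma * sum(p * V[s2] for s2, p in P[s, policy[s]].items())
            for s in states
        }
    return V

states = ["far", "goal"]
policy = {"far": "move", "goal": "stay"}
R = {("far", "move"): 0.0, ("goal", "stay"): 1.0}
P = {("far", "move"): {"goal": 1.0}, ("goal", "stay"): {"goal": 1.0}}
V = evaluate_policy(states, policy, R, P)
print(V)
```

Here V("goal") converges to 1 / (1 - 0.9) = 10 and V("far") to 0.9 × 10 = 9.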

The Curse of Dimensionality
  • Some form of generalization must be incorporated
    in the state representation. Various function
    approximators are used:
  • CMAC
  • Neural networks
  • FIS: the state space encoding is based on a
    vector corresponding to the current state.

Adaptive Heuristic Critic
  • The AHC is made of two components.
  • Adaptive Critic Element: the critic is developed
    in an adaptive way from primary reinforcements
    and represents an evaluation function more
    informative than the one given by the environment
    through rewards and punishments (V(S) values).
  • Associative Search Element: selects actions which
    lead to better critic values.

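FACL Scheme
  • [Figure: FACL scheme]

The Critic
  • At time step t, the critic value is computed with
    the conclusion vector v.
  • The TD error is then computed from the
    reinforcement and the critic values.
  • The conclusion vector is tuned with the
    TD-learning update rule.
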
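
The three critic equations are missing from the transcript. As a hedged sketch, the code below uses the standard TD(0) forms: the critic value is the dot product of the rule truth values with the conclusion vector, the TD error is r + γ V(S') - V(S), and each conclusion moves in proportion to the error and to its rule's truth value. The discount factor `gamma`, the learning rate `beta` and the function names are assumptions, and any eligibility traces the original method may use are left out.

```python
def critic_value(phi, v):
    """Critic value: rule truth values phi weighted by conclusion vector v."""
    return sum(p * vi for p, vi in zip(phi, v))

def td_error(reward, phi_next, phi, v, gamma=0.9):
    """Standard TD(0) error, both values estimated with the same vector v."""
    return reward + gamma * critic_value(phi_next, v) - critic_value(phi, v)

def update_critic(v, phi, error, beta=0.1):
    """Move each conclusion in proportion to the error and its rule's activation."""
    return [vi + beta * error * p for vi, p in zip(v, phi)]

v = [0.0, 0.0]
phi_t, phi_next = [1.0, 0.0], [0.0, 1.0]   # illustrative truth-value vectors
err = td_error(1.0, phi_next, phi_t, v)
v = update_critic(v, phi_t, err)
print(err, v)
```

Only the conclusion of the rule that was active at time t changes, which is what makes the update local to the visited region of the state space.
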
The Actor
  • When the rule $R_i$ is activated, one of its
    local actions is elected to participate in the
    global action, based on its quality. The global
    action is then triggered,
  • where $\varepsilon$-greedy is a function implementing
    a mixed exploration-exploitation strategy.
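
The election step above can be sketched as follows. This is a minimal illustration that assumes a plain ε-greedy choice over each rule's quality vector and a global action equal to the truth-value-weighted sum of the elected local actions; the exploration function in the original method may be more elaborate. The names `elect_local_action` and `global_action` and the example values are hypothetical.

```python
import random

def elect_local_action(weights, epsilon, rng):
    """Plain epsilon-greedy: random index with probability epsilon, else best."""
    if rng.random() < epsilon:
        return rng.randrange(len(weights))
    return max(range(len(weights)), key=weights.__getitem__)

def global_action(phi, w, actions, epsilon=0.1, rng=random):
    """Elect one local action per activated rule and blend them by truth value."""
    elected = [
        elect_local_action(w_i, epsilon, rng) if a > 0 else None
        for a, w_i in zip(phi, w)
    ]
    u = sum(a * actions[k] for a, k in zip(phi, elected) if k is not None)
    return u, elected

actions = [-1.0, 0.0, 1.0]                 # discrete actions shared by all rules
w = [[0.0, 0.2, 0.1], [0.5, 0.0, 0.0]]     # one quality vector per rule
u, elected = global_action([0.75, 0.25], w, actions, epsilon=0.0)
print(u, elected)
```

Although every rule picks from a discrete set, the blended global action is continuous, which addresses the discrete-action limitation mentioned earlier.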

Tuning vector w
  • The TD error is used as the improvement measure:
    except in the beginning, the critic is a good
    approximator of the optimal evaluation function.
  • The actor learning rule uses this TD error to
    tune w.
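
The actor learning rule itself is not preserved in the transcript. One plausible form, given here purely as an assumption, reinforces the quality of each elected local action in proportion to the TD error and to the truth value of its rule. The function name `update_actor` and the `rate` parameter are hypothetical.

```python
def update_actor(w, phi, elected, error, rate=1.0):
    """Return new quality vectors; only elected local actions are changed."""
    new_w = [list(w_i) for w_i in w]
    for i, (a, k) in enumerate(zip(phi, elected)):
        if k is not None:
            # Positive TD error strengthens the elected action, negative weakens it.
            new_w[i][k] += rate * error * a
    return new_w

w = [[0.0, 0.0], [0.0, 0.0]]
new_w = update_actor(w, phi=[0.75, 0.25], elected=[1, 0], error=2.0)
print(new_w)
```

A positive TD error means the outcome was better than the critic predicted, so the elected actions become more likely to be chosen again in the same region.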

Meta Learning Rule
  • Update strategy for the learning rates:
  • Every parameter should have its own learning rate.
  • Every learning rate should be allowed to vary
    over time (in order for the V values to converge).
  • When the derivative of a parameter has the same
    sign for several consecutive time steps, its
    learning rate should be increased.
  • When the parameter derivative sign alternates for
    several consecutive time steps, its learning rate
    should be decreased (Delta-Bar-Delta rule).
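
The four heuristics above can be sketched with the Delta-Bar-Delta rule as it is commonly stated: each parameter keeps an exponential average of its past derivatives, its learning rate grows additively when the current derivative agrees in sign with that average, and shrinks multiplicatively when it disagrees. The constants `kappa`, `decay` and `theta` are illustrative assumptions, not values from the presentation.

```python
def delta_bar_delta(rates, bar, grads, kappa=0.01, decay=0.5, theta=0.7):
    """One Delta-Bar-Delta step; returns updated rates and averaged derivatives."""
    new_rates, new_bar = [], []
    for rate, b, g in zip(rates, bar, grads):
        if b * g > 0:
            rate += kappa            # consistent sign: additive increase
        elif b * g < 0:
            rate *= 1.0 - decay      # alternating sign: multiplicative decrease
        new_rates.append(rate)
        new_bar.append((1.0 - theta) * g + theta * b)
    return new_rates, new_bar

rates, bar = delta_bar_delta(
    rates=[0.1, 0.1], bar=[1.0, 1.0], grads=[0.5, -0.5]
)
print(rates, bar)
```

The first parameter keeps its sign and gets a larger rate; the second flips sign and has its rate halved.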

Execution Procedure
  • Estimation of the evaluation function
    corresponding to the current state.
  • Computation of the TD error.
  • Tuning of the parameter vectors v and w.
  • Estimation of the new evaluation function for the
    current state with the new conclusion vector
    $v_{t+1}$.
  • Learning rate updating with the Delta-Bar-Delta
    rule.
  • For each activated rule, election of the local
    action; computation and triggering of the global
    action $U_{t+1}$.
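
The six steps can be laid out in code to make their order explicit. This is a rough sketch under several assumptions: standard TD(0) updates, plain ε-greedy election for every rule, a fixed learning rate `beta` in place of the Delta-Bar-Delta step, and a hypothetical `state` dictionary holding the learner's memory. None of these names come from the presentation.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def facl_step(state, reward, phi_next, actions,
              gamma=0.9, beta=0.1, epsilon=0.1, rng=random):
    """One learning step; mutates state and returns the next global action."""
    v, w = state["v"], state["w"]
    phi, elected = state["phi"], state["elected"]
    # 1. Evaluate the new current state with the old conclusion vector.
    value_next = dot(phi_next, v)
    # 2. TD error against the value stored for the previous state.
    error = reward + gamma * value_next - state["value"]
    # 3. Tune v (critic) and w (actor, elected local actions only).
    for i, a in enumerate(phi):
        v[i] += beta * error * a
        if a > 0:
            w[i][elected[i]] += error * a
    # 4. Re-estimate the current state's value with the new conclusion vector.
    state["value"] = dot(phi_next, v)
    # 5. Learning-rate adaptation is omitted in this sketch (beta stays fixed).
    # 6. Elect a local action per rule and compute the global action.
    new_elected = []
    for w_i in w:
        if rng.random() < epsilon:
            new_elected.append(rng.randrange(len(w_i)))
        else:
            new_elected.append(max(range(len(w_i)), key=w_i.__getitem__))
    state["phi"], state["elected"] = phi_next, new_elected
    return dot(phi_next, [actions[k] for k in new_elected])

state = {"v": [0.0, 0.0], "w": [[0.0, 0.0], [0.0, 0.0]],
         "phi": [1.0, 0.0], "elected": [0, 0], "value": 0.0}
u = facl_step(state, reward=1.0, phi_next=[0.0, 1.0],
              actions=[-1.0, 1.0], epsilon=0.0)
print(u, state["v"], state["w"])
```

Step 4 matters because the value stored for the current state becomes the baseline of the next TD error, so it should reflect the freshly tuned conclusion vector.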

Example (cont.)
  • The number of rules is twenty-five.
  • For the sake of simplicity, the discrete actions
    available are the same for all rules.
  • The discrete action set
  • The reinforcement function

  • Performance measure for distance
  • Results