Collaborative Reinforcement Learning of Autonomic Behaviour

1
Collaborative Reinforcement Learning of Autonomic
Behaviour
  • Jim Dowling, Eoin Curran, Raymond Cunningham and
    Vinny Cahill
  • 2nd International Workshop on Self-Adaptive and
    Autonomic Computing Systems, 2004

2
Overview
  • Building autonomic distributed systems with self-*
    properties
  • Self-Organising
  • Self-Healing
  • Self-Optimising
  • Add collaborative learning mechanism to
    self-adaptive component model
  • Improved ad-hoc routing protocol

3
Introduction
  • Autonomous Distributed systems will consist of
    interacting components free from human
    interference
  • Existing top-down management and programming
    solutions require too much global state
  • Bottom-up, decentralised collection of components
    that make their own decisions based on local
    information
  • System-wide self-* behaviour emerges from
    interactions

4
Self-* Behaviour in K-Components
  • Self-Adaptive components that change structure
    and/or behaviour at run-time
  • Adapt to discovered faults
  • Reduced performance
  • Requires active monitoring of component states
    and external dependencies
  • A K-Component contains modularised self-*
    behaviour
  • Defined in CDL (Contract Description Language)
  • Allows the programmer to declare feedback events with
    adaptation actions (event-condition-action)
  • Encapsulated in the reflective agent
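
The event-condition-action idea on this slide can be sketched in Python. This is not CDL, whose syntax the slides do not show; the `Rule` and `ReflectiveAgent` classes, the `latency_sample` event and the `switch_to_backup_connector` action are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Rule:
    """One event-condition-action rule (hypothetical structure, not CDL)."""
    event: str                          # feedback event the rule listens for
    condition: Callable[[State], bool]  # guard evaluated on the monitored state
    action: Callable[[State], str]      # adaptation action, returns a label here


class ReflectiveAgent:
    """Keeps adaptation rules apart from the component's functional code."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []
        self.log: List[str] = []  # adaptation actions that were triggered

    def declare(self, rule: Rule) -> None:
        self.rules.append(rule)

    def on_event(self, event: str, state: State) -> None:
        # Fire every rule whose event matches and whose condition holds.
        for rule in self.rules:
            if rule.event == event and rule.condition(state):
                self.log.append(rule.action(state))


agent = ReflectiveAgent()
agent.declare(Rule(
    event="latency_sample",
    condition=lambda s: s["latency_ms"] > 200.0,  # assumed threshold
    action=lambda s: "switch_to_backup_connector",
))
agent.on_event("latency_sample", {"latency_ms": 350.0})  # condition holds
agent.on_event("latency_sample", {"latency_ms": 50.0})   # condition fails
```

Only the first event triggers the adaptation, so `agent.log` ends up with a single entry.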

5
Self-* Behaviour in K-Components

6
Self-* Distributed Systems using Distributed
(collaborative) Reinforcement Learning
  • For complex systems, programmers cannot be
    expected to describe all conditions
  • Self-adaptive behaviour learnt by components
  • Decentralised co-ordination of components to
    support system-wide properties
  • Distributed Reinforcement Learning (DRL) is an
    extension of RL that uses neighbour interactions
    only

7
Reinforcement Learning
  • Agent associates actions with system states in a
    trial and error manner
  • Outcome of an action -> reinforcement
  • -> update to the agent's action-value policy
  • Goal of reinforcement learning is to maximise the
    total REWARD (reinforcements) an agent
    receives over a timeframe by selecting optimal
    actions
  • Actions may perform poorly in the short term
    in exchange for a higher long-term payoff
  • An action is a decision the agent learns to make
  • Action selection is probabilistic: no guarantees
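
The loop described on this slide (act, receive a reinforcement, update the action-value estimates, select the next action probabilistically) can be illustrated with a minimal sketch. The slides do not name a specific algorithm, so tabular Q-learning with Boltzmann (softmax) action selection is assumed here as a standard stand-in; `alpha`, `gamma` and `temperature` are illustrative parameters.

```python
import math
import random
from collections import defaultdict


class Agent:
    """Tabular Q-learning with softmax action selection (assumed stand-in)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, temperature=1.0, seed=0):
        self.actions = list(actions)
        self.alpha = alpha              # learning rate
        self.gamma = gamma              # discount on future reward
        self.temperature = temperature  # higher means more exploration
        self.q = defaultdict(float)     # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def probabilities(self, state):
        """Boltzmann distribution over the actions available in `state`."""
        prefs = [self.q[(state, a)] / self.temperature for a in self.actions]
        top = max(prefs)  # subtract the maximum for numerical stability
        weights = [math.exp(p - top) for p in prefs]
        total = sum(weights)
        return [w / total for w in weights]

    def select(self, state):
        """Sample an action; better actions are likelier, never guaranteed."""
        return self.rng.choices(self.actions, weights=self.probabilities(state))[0]

    def update(self, state, action, reward, next_state):
        """Move Q(state, action) towards reward + discounted best next value."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


learner = Agent(actions=["a", "b"])
learner.update("s", "a", reward=1.0, next_state="s")
probs = learner.probabilities("s")
choice = learner.select("s")
```

After one rewarded update, action `a` becomes slightly more probable than `b` in state `s`, but `b` can still be chosen, which is the "no guarantees" point on the slide.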

8
DRL
  • Agents learn from the successes of their
    neighbours
  • Solves system-wide optimisation problems by
    specifying how agents solving individual DOPs
    (Discrete Optimisation Problems) with RL share results
  • System-wide problems are specified as a set of
    DOPs to be performed by a set of agents
  • An agent can solve the DOP itself or delegate to
    another agent
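
The solve-or-delegate choice can be sketched as a comparison between an agent's own estimated cost and the values its neighbours advertise. This is a simplified, greedy illustration and not the DRL update rule from the paper; the `DopAgent` class, its fields and the cost figures are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DopAgent:
    """Hypothetical agent that either solves a DOP locally or delegates it."""
    name: str
    local_cost: float  # estimated cost of solving the DOP on this agent
    link_cost: Dict[str, float] = field(default_factory=dict)   # cost to reach a neighbour
    advertised: Dict[str, float] = field(default_factory=dict)  # last value heard per neighbour

    def best_option(self) -> Tuple[str, float]:
        """Return ('local', cost) or (neighbour, cost), whichever is cheapest."""
        best = ("local", self.local_cost)
        for neighbour, value in self.advertised.items():
            total = self.link_cost[neighbour] + value
            if total < best[1]:
                best = (neighbour, total)
        return best

    def advertise(self) -> float:
        """Value shared with neighbours: this agent's best known cost."""
        return self.best_option()[1]

    def hear(self, neighbour: str, value: float) -> None:
        self.advertised[neighbour] = value


# Chain A - B - C, where only C can solve the problem cheaply.
c = DopAgent("C", local_cost=1.0)
b = DopAgent("B", local_cost=10.0, link_cost={"C": 2.0})
a = DopAgent("A", local_cost=10.0, link_cost={"B": 2.0})

b.hear("C", c.advertise())  # B learns from C's success
a.hear("B", b.advertise())  # A learns from B, never talking to C directly
```

Here `a.best_option()` is to delegate to `B`, even though A only ever exchanged information with its immediate neighbour.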

9
DRL Agent Model

10
SAMPLE Ad-hoc Routing using DRL
  • Probabilistic ad-hoc routing protocol based on
    DRL
  • Adaptation of network traffic around areas of
    congestion
  • Exploitation of stable routes
  • Routing agents share link information with local
    nodes
  • Broadcast
  • Routing decisions based on local information and
    information obtained from neighbours
  • Outperforms Ad hoc On-Demand Distance Vector
    (AODV) Routing and Dynamic Source Routing (DSR)
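
A rough sketch of probabilistic next-hop selection from neighbour advertisements follows. It is not the SAMPLE protocol's actual cost model or equations, which the slides do not give; the `RoutingAgent` class, the Boltzmann weighting and the cost numbers are assumptions for illustration.

```python
import math
import random


class RoutingAgent:
    """Hypothetical node that picks a next hop from locally held estimates."""

    def __init__(self, neighbours, temperature=1.0, seed=0):
        # Estimated cost to the destination via each neighbour.
        self.cost = {n: 1.0 for n in neighbours}
        self.temperature = temperature
        self.rng = random.Random(seed)

    def hear_advert(self, neighbour, advertised_cost, link_cost=1.0):
        """Update an estimate from a neighbour's broadcast advertisement."""
        self.cost[neighbour] = link_cost + advertised_cost

    def next_hop_probabilities(self):
        """Cheaper routes get exponentially more weight, none gets zero."""
        lowest = min(self.cost.values())
        weights = {
            n: math.exp(-(c - lowest) / self.temperature)
            for n, c in self.cost.items()
        }
        total = sum(weights.values())
        return {n: w / total for n, w in weights.items()}

    def next_hop(self):
        probs = self.next_hop_probabilities()
        names = list(probs)
        return self.rng.choices(names, weights=[probs[n] for n in names])[0]


node = RoutingAgent(["n1", "n2"])
node.hear_advert("n1", advertised_cost=2.0)  # stable route
node.hear_advert("n2", advertised_cost=6.0)  # congested area
route_probs = node.next_hop_probabilities()
hop = node.next_hop()
```

Most traffic goes through `n1`, which corresponds to exploiting a stable route, while the occasional packet sent via `n2` lets the node notice if congestion there clears.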

11
Routing Decision in SAMPLE
Learn whether a link is failed, unavailable or congested

12
Observations/Questions
  • How general is this approach?
  • Easy to represent problems in DRL?
  • Does DRL work for all problems?
  • Needs many more examples
  • Separation of self-* behaviour from functional
    components?
  • What guarantees of optimisation are there?
  • Does it work for problems requiring system-wide
    guarantees?
  • Learning algorithms can and do reduce performance
    at some stages of learning
  • More suited to off-line discovery?