Collaborative Reinforcement Learning of Autonomic Behaviour

1
Collaborative Reinforcement Learning of Autonomic Behaviour

Jim Dowling, Eoin Curran, Raymond Cunningham and Vinny Cahill
2nd International Workshop on Self-Adaptive and Autonomic Computing Systems, 2004

2
Overview

3
Introduction

Autonomous distributed systems will consist of interacting components free from human interference
Existing top-down management and programming solutions require too much global state
Bottom-up alternative: a decentralised collection of components that make their own decisions based on local information
System-wide self-* behaviour emerges from interactions

4
Self-* Behaviour in K-Components

Self-adaptive components that change structure and/or behaviour at run-time
Adapt to discovered faults
Adapt to reduced performance
Requires active monitoring of component states and external dependencies
A K-Component contains modularised self-* behaviour
Defined in CDL (Contract Description Language)
Allows the programmer to declare feedback events with adaptation actions (event-condition-action)
Encapsulated in the reflective agent
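
The event-condition-action idea on this slide can be illustrated with a minimal Python sketch. This is not CDL syntax, which the slides do not show. The names Rule, ReflectiveAgent, on_event, the "latency_sample" event, the 200 ms threshold and the "rebind_to_backup" action are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Rule:
    """One event-condition-action rule (hypothetical structure, not CDL)."""
    event: str                          # name of the feedback event
    condition: Callable[[State], bool]  # guard evaluated on the monitored state
    action: Callable[[State], str]      # adaptation action to run


class ReflectiveAgent:
    """Holds the rules and reacts to feedback events (illustrative only)."""

    def __init__(self, rules: List[Rule]) -> None:
        self.rules = rules

    def on_event(self, event: str, state: State) -> List[str]:
        # Run every action whose event matches and whose condition holds.
        return [r.action(state) for r in self.rules
                if r.event == event and r.condition(state)]


# Assumed example: rebind to a backup provider when latency is too high.
agent = ReflectiveAgent([
    Rule("latency_sample",
         lambda s: s["latency_ms"] > 200.0,
         lambda s: "rebind_to_backup"),
])
```

Calling agent.on_event("latency_sample", {"latency_ms": 350.0}) runs the adaptation action, while a sample below the threshold triggers nothing. Keeping the rules in a separate agent object mirrors the slide's point that the self-* behaviour is modularised away from the component's functional code.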

5
Self-* Behaviour in K-Components

6
Self-* Distributed Systems using Distributed (Collaborative) Reinforcement Learning

For complex systems, programmers cannot be expected to describe all conditions
Self-adaptive behaviour is learnt by components
Decentralised co-ordination of components to support system-wide properties
Distributed Reinforcement Learning (DRL) is an extension to RL that uses neighbour interactions only

7
Reinforcement Learning

Agent associates actions with system states in a trial-and-error manner
Outcome of an action is a reinforcement, which updates the agent's action-value policy
Goal of reinforcement learning is to maximise the total reward (reinforcements) an agent receives over a timeframe by selecting optimal actions
Actions may perform poorly in the short term in order to give a higher longer-term payoff
An action is a decision the agent learns to make
Action selection is probabilistic: no guarantees
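
The loop described on this slide (act, receive a reinforcement, update the action-value estimate, select the next action probabilistically) can be sketched as follows. The slides do not name a specific update rule, so this sketch assumes tabular Q-learning with Boltzmann (softmax) action selection as one common instance. The constants ALPHA, GAMMA and TEMP and all function names are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value)
GAMMA = 0.9   # discount factor for longer-term payoff (assumed value)
TEMP = 0.5    # Boltzmann temperature (assumed value)

q = defaultdict(float)  # action-value table keyed by (state, action)


def action_probabilities(state, actions):
    """Boltzmann (softmax) distribution over the actions for one state."""
    weights = [math.exp(q[(state, a)] / TEMP) for a in actions]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(state, actions, rng=random):
    """Sample an action: better-valued actions are more likely, never certain."""
    probs = action_probabilities(state, actions)
    return rng.choices(actions, weights=probs)[0]


def update(state, action, reward, next_state, actions):
    """Q-learning step: move Q(s, a) towards reward + discounted best next value."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
```

The discount factor is what lets an action with a poor immediate reward still be preferred when it leads to states with higher estimated value, and sampling from the softmax distribution is why no individual choice is guaranteed to be the best one.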

8
DRL

Agents learn from the successes of their neighbours
Solves system-wide optimisation problems by specifying how agents that solve individual DOPs (Discrete Optimisation Problems) using RL share results
System-wide problems are specified as a set of DOPs to be solved by a set of agents
An agent can solve the DOP itself or delegate to another agent
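
The solve-or-delegate choice can be sketched in Python under some loudly stated assumptions: each agent keeps an estimated cost of solving the DOP locally, caches the cost each neighbour last advertised, and adds a per-link handover cost. The class DOPAgent, its fields and the greedy (lowest-cost) choice are hypothetical simplifications. The slides do not give the actual DRL decision rule, and a learning agent would select probabilistically rather than greedily.

```python
from typing import Dict


class DOPAgent:
    """Agent that solves a DOP locally or delegates it (hypothetical model)."""

    def __init__(self, name: str, local_cost: float) -> None:
        self.name = name
        self.local_cost = local_cost            # estimated cost of solving here
        self.link_cost: Dict[str, float] = {}   # cost of handing over to a neighbour
        self.advertised: Dict[str, float] = {}  # last cost each neighbour reported

    def receive_advertisement(self, neighbour: str, cost: float, link: float) -> None:
        # Cache what a neighbour says it can achieve; only neighbours are consulted.
        self.advertised[neighbour] = cost
        self.link_cost[neighbour] = link

    def options(self) -> Dict[str, float]:
        # Estimated total cost of each choice: "self" or delegation to a neighbour.
        result = {"self": self.local_cost}
        for n, c in self.advertised.items():
            result[n] = self.link_cost[n] + c
        return result

    def best_cost(self) -> float:
        # Value this agent would advertise to its own neighbours.
        return min(self.options().values())

    def decide(self) -> str:
        # Greedy choice for brevity: return "self" or the cheapest neighbour.
        opts = self.options()
        return min(opts, key=opts.get)
```

Because each agent advertises its own best_cost, which may already include delegation further along, a good result found by one agent propagates to others through neighbour-to-neighbour exchanges alone.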

9
DRL Agent Model

10
SAMPLE Ad-hoc Routing using DRL

Probabilistic ad-hoc routing protocol based on DRL
Adaptation of network traffic around areas of congestion
Exploitation of stable routes
Routing agents share link information with local nodes (broadcast)
Routing decisions based on local information and information obtained from neighbours
Outperforms Ad hoc On-Demand Distance Vector (AODV) routing and Dynamic Source Routing (DSR)
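
A probabilistic next-hop choice of the kind listed above might look like the sketch below. It is not the SAMPLE protocol itself: the cost formula (smoothed expected transmissions on the link plus the cost a neighbour broadcast), the Boltzmann weighting, and the names RoutingAgent, hear_broadcast, record_transmission and route_cost are all assumptions made for illustration.

```python
import math
import random
from typing import Dict


class RoutingAgent:
    """Per-node routing state for one destination (illustrative, not SAMPLE itself)."""

    def __init__(self, temperature: float = 1.0) -> None:
        self.temperature = temperature
        self.sent: Dict[str, int] = {}               # transmissions attempted per neighbour
        self.delivered: Dict[str, int] = {}          # transmissions that succeeded
        self.neighbour_cost: Dict[str, float] = {}   # cost to destination heard in broadcasts

    def hear_broadcast(self, neighbour: str, cost_to_dest: float) -> None:
        # Information obtained from a neighbour.
        self.neighbour_cost[neighbour] = cost_to_dest

    def record_transmission(self, neighbour: str, success: bool) -> None:
        # Local information: observed outcome of using this link.
        self.sent[neighbour] = self.sent.get(neighbour, 0) + 1
        self.delivered[neighbour] = self.delivered.get(neighbour, 0) + int(success)

    def route_cost(self, neighbour: str) -> float:
        # Assumed estimate: expected transmissions on the link (+1 smoothing,
        # so an untried link looks reliable) plus the neighbour's advertised cost.
        ratio = (self.delivered.get(neighbour, 0) + 1) / (self.sent.get(neighbour, 0) + 1)
        return 1.0 / ratio + self.neighbour_cost[neighbour]

    def next_hop_probabilities(self) -> Dict[str, float]:
        # Lower-cost routes get higher probability (Boltzmann weighting).
        weights = {n: math.exp(-self.route_cost(n) / self.temperature)
                   for n in self.neighbour_cost}
        total = sum(weights.values())
        return {n: w / total for n, w in weights.items()}

    def choose_next_hop(self, rng=random) -> str:
        probs = self.next_hop_probabilities()
        return rng.choices(list(probs), weights=list(probs.values()))[0]
```

In this sketch a link that keeps dropping packets becomes more expensive and is chosen less often, which shifts traffic away from it, while a link that keeps delivering retains a high probability. The temperature parameter controls how strongly the agent exploits routes that currently look best.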