Investigations on Automatic Behavior-based System Design: A Survey on Hierarchical Reinforcement Learning
1
Investigations on Automatic Behavior-based System Design: A Survey on Hierarchical Reinforcement Learning
  • Amir massoud Farahmand
  • Majid Nili Ahmadabadi, Babak N. Araabi, Caro
    Lucas
  • www.SoloGen.net
  • SoloGen@SoloGen.net

2
A non-uniform Outline
  • Brief History of AI
  • Challenges and Requirements of Robotic
    Applications
  • Behavior-based Approach to AI
  • The Problem of Behavior-based System Design
  • MDP and Standard Reinforcement Learning Framework
  • A Survey on Hierarchical Reinforcement Learning
  • Behavior-based System Design
  • Learning in BBS
  • Structure Learning
  • Behavior Learning
  • Behavior Evolution and Hierarchy Learning in
    Behavior-based Systems

3
Happy birthday to Artificial Intelligence
  • 1941: Konrad Zuse, Germany, general-purpose computer
  • 1943: Britain (Turing and others), Colossus, for decoding
  • 1945: ENIAC, US; John von Neumann a consultant
  • 1956: The Logic Theorist on JOHNNIAC (Newell, Shaw and Simon)
  • 1956: Dartmouth Conference organized by John McCarthy (inventor of LISP)
  • The term Artificial Intelligence coined at Dartmouth, intended as a two-month, ten-man study!

4
Happy birthday to AI (2)
  • "It is not my aim to surprise or shock you, but the simplest way I can summarize is to say that there are now in the world machines that think, that learn and that create. Moreover, their ability to do these things is going to increase rapidly until..."
  • (Herb Simon, 1957)
  • Unfortunately, Simon was too optimistic!

5
What has AI done for us?
  • Rather good OCR (Optical Character Recognition) and speech recognition software
  • Robots make cars in all advanced countries
  • Reasonable machine translation is available for a large range of foreign web pages
  • Systems land 200-ton jumbo jets unaided every few minutes
  • Search systems like Google are not perfect, but are very effective at information retrieval
  • Computer games and auto-generated cartoons are advancing at an astonishing rate and have huge markets
  • Deep Blue beat Kasparov in 1997. The world checkers champion is a computer.
  • Medical expert systems can outperform doctors in many areas of diagnosis (but we aren't allowed to find out easily!)

6
AI: What is it?
  • What is AI?
  • Different definitions:
  • "The use of computer programs and programming techniques to cast light on the principles of intelligence in general and human thought in particular" (Boden)
  • "The study of intelligence independent of its embodiment in humans, animals or machines" (McCarthy)
  • "AI is the study of how to do things which at the moment people do better" (Rich & Knight)
  • "AI is the science of making machines do things that would require intelligence if done by men." (Minsky) (fast arithmetic?)
  • Is it definable?!
  • Turing test, Weak and Strong AI, ...

7
AI: Basic assumptions
  • Symbol System Hypothesis: it is possible to construct a universal symbol system that thinks
  • Strong Symbol System Hypothesis: the only way a system can think is through symbolic processing
  • Happy birthday, Symbolic (Traditional, Good Old-Fashioned) AI

8
Symbolic AI: Methods
  • Knowledge representation (Abstraction)
  • Search
  • Logic and deduction
  • Planning
  • Learning

9
Symbolic AI: Was it efficient?
  • Chess OK!
  • Block-worlds OK!
  • Daily Life Problems
  • Robots OK!
  • Commonsense OK!
  • OK

10
Symbolic AI and Robotics
(diagram: sensors -> world modelling -> motor control -> actuators)
  • Functional decomposition
  • Sequential flow
  • Correct perception is assumed to be solved by vision research in some good-and-happy day to come!
  • Get a logic-based or formal description of
    percepts
  • Apply search operators or logical inference or
    planning operators

11
Challenges and Requirements of Robotic Systems
  • Challenges
  • Sensor and Effector Uncertainty
  • Partial Observability
  • Non-Stationarity
  • Requirements
  • (among many others)
  • Multi-goal
  • Robustness
  • Multiple Sensors
  • Scalability
  • Automatic design
  • Adaptation (Learning/Evolution)

12
Behavior-based approach to AI
  • Behavioral (activity) decomposition as opposed to functional decomposition
  • Behavior: Sensor -> Action (a direct link between perception and action)
  • Situatedness
  • Embodiment
  • Intelligence as emergence

13
Behavioral decomposition
(diagram: sensors feed parallel behavior layers (manipulate the world, build maps, explore, avoid obstacles, locomote) that drive the actuators)
14
Situatedness
  • No world modelling and abstraction
  • No planning
  • No sequence of operations on symbols
  • Direct link between sensors and actions
  • Motto: "The world is its own best model" (Brooks)

15
Embodiment
  • Only an embodied agent is validated as one that can deal with the real world.
  • Only through physical grounding can any internal symbolic system be given meaning.

16
Emergence as a Route to Intelligence
  • Emergence: the interaction of simple systems that results in something more than the sum of those systems
  • Intelligence as the emergent outcome of the dynamical interaction of behaviors with the world

17
Behavior-based design
  • Robust:
  • not sensitive to the failure of a particular part of the system
  • no need for precise perception, as there is no modelling
  • Reactive: fast response, as there is no long route from perception to action
  • No representation

18
A Simple Problem
  • Goal: make a mobile robot controller that collects balls from the field and moves them home
  • What we have:
  • A differentially driven mobile robot
  • 8 sonar sensors
  • A vision system that detects balls and home

19
Basic design
(diagram: behavior layers: avoid obstacles, move toward ball, move toward home, exploration)
20
A Simple Shot
21
?
  • How should we
  • DESIGN
  • a behavior-based system?!

22
Behavior-based System Design Methodologies
  • Hand Design:
  • Common almost everywhere.
  • Complicated, maybe even infeasible, in complex problems
  • Even if it is possible to find a working system, it is probably not optimal.
  • Evolution:
  • Good solutions can be found
  • Biologically plausible
  • Time consuming
  • Not fast at producing new solutions
  • Learning:
  • Biologically plausible
  • Learning is essential for the life-time survival of the agent.

23
The Importance of Adaptation (Learning/Evolution)
  • Unknown environment/body:
  • the exact model of the environment/body is not known
  • Non-stationary environment/body:
  • Changing environments (offices, houses, streets, and almost everywhere)
  • Aging
  • cannot be remedied by evolution very easily
  • The designer may not know how to benefit from every aspect of her agent/environment:
  • Let the agent learn it by itself (learning as optimization)
  • etc.

24
Different Learning Methods
25
Reinforcement Learning
  • The agent senses the state of the environment
  • The agent chooses an action
  • The agent receives a reward from an internal/external critic
  • The agent learns to maximize its received rewards through time.
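
A minimal sketch of this sense-act-reward cycle in Python; `env` and `agent` are hypothetical objects (not from any particular library), shown only to make the loop concrete:

    def run_agent(env, agent, n_steps):
        # Generic RL interaction loop: sense, act, receive reward, learn.
        s = env.reset()
        for _ in range(n_steps):
            a = agent.choose_action(s)      # agent chooses an action in state s
            s_next, r = env.step(a)         # environment transitions, critic pays r
            agent.learn(s, a, r, s_next)    # update toward maximizing return
            s = s_next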

26
Reinforcement Learning
  • Inspired by psychology:
  • Thorndike, Skinner, Hull, Pavlov, ...
  • Very successful applications:
  • Games (Backgammon)
  • Control
  • Robotics
  • Elevator scheduling
  • A well-defined mathematical formulation:
  • Markov Decision Problems

27
Markov Decision Problems
  • Markov processes: formulating a wide range of dynamical systems
  • Finding an optimal solution of an objective function
  • Stochastic Dynamic Programming:
  • Planning: known environment
  • Learning: unknown environment

28
MDP
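
The formal content of this slide was an image that did not survive the transcript; a standard formulation consistent with the talk is the tuple $M = \langle S, A, P, R, \gamma \rangle$ with the Bellman optimality equation

    $$V^*(s) = \max_{a \in A}\Big[R(s,a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^*(s')\Big].$$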
29
Reinforcement Learning Revisited (1)
  • A very important Machine Learning method
  • An approximate, online solution of MDPs:
  • Monte Carlo methods
  • Stochastic approximation
  • Function approximation

30
Reinforcement Learning Revisited (2)
  • Q-Learning and SARSA are among the most important solution methods in RL
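
A minimal tabular Q-learning sketch (SARSA would replace the max with the value of the action actually taken next); the `env` interface here is hypothetical:

    import random
    from collections import defaultdict

    def q_learning_episode(env, Q, alpha=0.1, gamma=0.95, eps=0.1):
        # One episode of tabular Q-learning with epsilon-greedy exploration.
        s, done = env.reset(), False
        while not done:
            acts = env.actions(s)
            if random.random() < eps:
                a = random.choice(acts)                  # explore
            else:
                a = max(acts, key=lambda x: Q[(s, x)])   # exploit
            s2, r, done = env.step(a)
            # Off-policy TD target: greedy value of the next state
            target = r if done else r + gamma * max(Q[(s2, x)] for x in env.actions(s2))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2

    Q = defaultdict(float)   # unseen (state, action) pairs start at zero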

31
Some Simple Samples
1D Grid World (figures: map of the environment, policy, value function)
32
Some Simple Samples
2D Grid World (figures: map, policy, value function, 3D view of value function)
33
Some Simple Samples
2D Grid World (figures: map, value function, 3D view of value function, policy)
34
Curses of DP
  • It is not easy to use DP (and RL) in robotic tasks.
  • Curse of modeling:
  • RL solves this problem
  • Curse of dimensionality (e.g. robotic tasks have a very big state space):
  • Approximating the value function:
  • Neural networks
  • Fuzzy approximation
  • Hierarchical Reinforcement Learning

35
A Sample of Learning in a Robot
Hajime Kimura and Shigenobu Kobayashi, "Reinforcement Learning using Stochastic Gradient Algorithm and its Application to Robots," Transactions of the Institute of Electrical Engineers of Japan, Vol. 119, No. 8 (1999) (in Japanese!)
36
Hierarchical Reinforcement Learning
37
ATTENTION
  • Hierarchical reinforcement learning methods are not specially designed for behavior-based systems.
  • Covering them at this depth in this presentation should not be read as implying a strong connection to behavior-based system design.

38
Hierarchical RL (1)
  • Use some kind of hierarchy in order to:
  • Learn faster
  • Need fewer values to be updated (smaller storage dimension)
  • Incorporate a priori knowledge from the designer
  • Increase reusability
  • Have a more meaningful structure than a mere Q-table

39
Hierarchical RL (2)
  • Is there any unified meaning of hierarchy?
  • NO!
  • Different methods
  • Temporal abstraction
  • State abstraction
  • Behavioral decomposition

40
Hierarchical RL (3)
  • Feudal Q-Learning (Dayan, Hinton)
  • Options (Sutton, Precup, Singh)
  • MaxQ (Dietterich)
  • HAM (Russell, Parr, Andre)
  • ALisp (Andre, Russell)
  • HexQ (Hengst)
  • Weakly-Coupled MDPs (Bernstein, Dean, Lin, ...)
  • Structure Learning in SSA (Farahmand, Nili)
  • Behavior Learning in SSA (Farahmand, Nili)

41
Feudal Q-Learning
  • Divide each task into a few smaller sub-tasks
  • A state abstraction method
  • Different layers of managers
  • Each manager takes orders from its super-manager and gives orders to its sub-managers

42
Feudal Q-Learning
  • Principles of Feudal Q-Learning:
  • Reward Hiding: Managers must reward sub-managers for doing their bidding whether or not this satisfies the commands of the super-managers. Sub-managers should just learn to obey their managers and leave it up to them to determine what it is best to do at the next level up.
  • Information Hiding: Managers only need to know the state of the system at the granularity of their own choices of tasks. Indeed, allowing some decision making to take place at a coarser grain is one of the main goals of the hierarchical decomposition. Information is hidden both downwards (sub-managers do not know the task the super-manager has set the manager) and upwards (a super-manager does not know what choices its manager has made to satisfy its command).
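
A toy sketch of the two hiding principles; the function names and numbers are mine, not from the Dayan-Hinton paper:

    def submanager_reward(subgoal, achieved_state):
        # Reward hiding: pay the sub-manager for doing the manager's bidding,
        # whether or not that satisfies the super-manager's command.
        return 1.0 if achieved_state == subgoal else -0.1

    def manager_view(full_state, kept_indices):
        # Information hiding: a manager sees the state only at the granularity
        # of its own choice of tasks (here: a subset of state variables).
        return tuple(full_state[i] for i in kept_indices)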

43
Feudal Q-Learning
44
Feudal Q-Learning
45
Options: Introduction
  • People make decisions at different time scales:
  • Traveling example
  • People perform actions at different time scales:
  • Kicking a ball
  • Becoming a soccer player
  • It is desirable to have a method that supports these temporally-extended actions over different time scales

46
Options: Concept
  • Macro-actions
  • The temporal abstraction method of Hierarchical RL
  • Options are temporally extended actions, each of which consists of a set of primitive actions
  • Example:
  • Primitive actions: walking N, S, W, E
  • Options: go to door, corner, table; go straight
  • Options can be open-loop or closed-loop
  • Semi-Markov Decision Process theory (Puterman)
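
In the Sutton-Precup-Singh formulation an option is a triple (I, pi, beta); a minimal sketch of that data structure (the field names are mine):

    from dataclasses import dataclass
    from typing import Any, Callable, Set

    @dataclass
    class Option:
        initiation_set: Set[Any]              # I: states where the option may start
        policy: Callable[[Any], Any]          # pi: state -> action (closed-loop)
        termination: Callable[[Any], float]   # beta: state -> prob. of terminating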

47
Options: Formal Definitions
48
Options: Rise of SMDP!
  • Theorem: MDP + Options = SMDP
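
The learning rule this theorem licenses is SMDP Q-learning: after executing option $o$ from state $s$ for $k$ steps, observing cumulative discounted reward $r$ and next state $s'$,

    $$Q(s, o) \leftarrow Q(s, o) + \alpha\Big[r + \gamma^{k}\max_{o'} Q(s', o') - Q(s, o)\Big].$$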

49
Options: Value function
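
The equations on this slide were lost in transcription; the standard definition from Sutton, Precup and Singh (1999) is

    $$Q^{\mu}(s, o) = E\big[r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{k-1} r_{t+k} + \gamma^{k} V^{\mu}(s_{t+k}) \,\big|\, o \text{ initiated in } s \text{ at time } t\big],$$

where $k$ is the (random) duration of option $o$.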
50
Options: Bellman-like optimality condition
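
This slide's equations were also lost; the standard form is

    $$Q^{*}_{\mathcal{O}}(s, o) = E\big[r + \gamma^{k} \max_{o' \in \mathcal{O}_{s'}} Q^{*}_{\mathcal{O}}(s', o') \,\big|\, o \text{ initiated in } s\big],$$

a Bellman optimality equation over options instead of primitive actions.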
51
Options: A simple example
52
Options: A simple example
53
Options: A simple example
54
Interrupting Options
  • An option's policy is followed until the option terminates.
  • This is a somewhat unnecessary restriction:
  • You may change your decision in the middle of executing your previous decision.
  • Interruption Theorem: Yes! Interrupting is better!
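
In the standard statement (a reconstruction, since the slide's own equations were lost): while following option $o$, interrupt it at any state $s$ where $Q^{\mu}(s, o) < V^{\mu}(s)$ and switch to a better option; the resulting policy $\mu'$ satisfies $V^{\mu'}(s) \ge V^{\mu}(s)$ for all $s$.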

55
Interrupting Options: An example
56
Options: Other issues
  • Intra-option model and value learning
  • Learning each option
  • Defining sub-goal reward functions
  • Generating new options
  • Intrinsically Motivated RL

57
MaxQ
  • MaxQ: Value Function Decomposition
  • Somewhat related to Feudal Q-Learning
  • Decomposes the value function over a hierarchical structure

58
MaxQ
59
MaxQ: Value decomposition
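
The decomposition equations on this slide did not survive; Dietterich's standard form is

    $$Q(i, s, a) = V(a, s) + C(i, s, a),$$

where $V(a, s)$ is the expected return of executing subtask $a$ in $s$, and $C(i, s, a)$ is the expected discounted return for completing the parent task $i$ after $a$ terminates.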
60
MaxQ: Existence theorem
  • Recursively optimal policy.
  • There may be many recursively optimal policies with different value functions.
  • A recursively optimal policy is not necessarily an optimal policy.
  • If H is a stationary macro hierarchy for MDP M, then all recursively optimal policies w.r.t. (M, H) have the same value.

61
MaxQ: Learning
  • Theorem: If M is an MDP, H is a stationary macro hierarchy, the exploration policy is GLIE (Greedy in the Limit with Infinite Exploration), and the common convergence conditions hold (bounded V and C; $\sum_t \alpha_t = \infty$, $\sum_t \alpha_t^2 < \infty$), then with probability 1 algorithm MaxQ-0 will converge!

62
MaxQ
  • Faster learning: all-states updating
  • Similar to the all-goals updating of Kaelbling

63
MaxQ
64
MaxQ: State abstraction
  • Advantages:
  • Memory reduction
  • Less exploration is needed
  • Increased reusability, as a subtask does not depend on its higher parents
  • Is it possible?!

65
MaxQ: State abstraction
  • Exact preservation of the value function
  • Approximate preservation

66
MaxQ: State abstraction
  • Does it converge?
  • It has not been formally proved yet.
  • What can we do if we want to use an abstraction that violates Theorem 3?
  • Reward function decomposition:
  • Design a reward function that reinforces the responsible parts of the architecture.

67
MaxQ: Other issues
  • Undesired terminal states
  • Non-hierarchical execution (polling execution):
  • Better performance
  • Computationally intensive

68
Return of BBS (Episode II): Automatic Design
69
Learning in Behavior-based Systems
  • There are a few works on behavior-based learning
  • Mataric, Mahadevan, Maes, and ...
  • but there is no deep investigation of it (especially mathematical formulation)!
  • And most of them use flat architectures.

70
Learning in Behavior-based Systems
  • There are different learning methods with different viewpoints, but we have concentrated on Reinforcement Learning.
  • Agent: Did I perform it correctly?!
  • Tutor: Yes/No! (or 0.3)

71
Learning in Behavior-based Systems
  • We have divided learning in BBS into two parts:
  • Structure Learning:
  • How should we organize behaviors in the architecture, assuming a repertoire of working behaviors?
  • Behavior Learning:
  • How should each behavior behave? (we do not have the necessary toolbox)

72
Structure Learning: Assumptions
  • Structure Learning in the Subsumption Architecture as a good sample for BBS
  • Purely parallel case
  • We know B1, B2, ..., but we do not know how to arrange them in the architecture
  • e.g. we know how to avoid obstacles, pick an object, stop, move forward, and turn, but we don't know which behavior is superior to the others.

73
Structure Learning
(diagram: Behavior Toolbox: build maps, explore, manipulate the world, locomote, avoid obstacles)
The agent wants to learn how to arrange these behaviors in order to get maximum reward from its environment (or tutor).
74
Structure Learning
(diagram: Behavior Toolbox: build maps, explore, manipulate the world, locomote, avoid obstacles)
75
Structure Learning
(diagram: behavior structure and Behavior Toolbox)
1. explore becomes the controlling behavior and suppresses avoid obstacles. 2. The agent hits a wall!
76
Structure Learning
(diagram: behavior structure and Behavior Toolbox)
The tutor (environment) gives explore a punishment for being in that place of the structure.
77
Structure Learning
(diagram: behavior structure and Behavior Toolbox)
explore is not a very good behavior for the highest position of the structure, so it is replaced by avoid obstacles.
78
Structure Learning: Challenging Issues
  • Representation: How should the agent represent knowledge gathered during learning?
  • Sufficient (the concept space should be covered by the hypothesis space)
  • Tractable (small hypothesis space)
  • Well-defined credit assignment
  • Hierarchical Credit Assignment: How should the agent assign credit to different behaviors and layers in its architecture?
  • If the agent receives a reward/punishment, how should we reward/punish the structure of the agent?
  • Learning: How should the agent update its knowledge when it receives the reinforcement signal?

79
Structure Learning: Overcoming Challenging Issues
  • Decomposing the behavior of a multi-agent system into simpler components may sharpen our view of the problem under investigation: decompose the value function of the agent into simpler elements.
  • The structure can provide a lot of clues to us.

80
Structure Learning: Value Function Decomposition
  • Each structure has a value with regard to the reinforcement signal it receives.
  • The objective is finding a structure T with a high value.
  • We have decomposed the value function into simpler components that enable the agent to benefit from previous interaction with the environment.

81
Structure Learning: Value Function Decomposition
  • It is possible to decompose the total system's value into the value of each behavior in each layer.
  • We call this the Zero-Order (ZO) method.

Don't read the following equations!
82
Structure Learning: Value Function Decomposition (Zero-Order Method)
  • It stores the value of a behavior being in a specific layer.

ZO value table in the agent's mind:
Higher layer: avoid obstacles (0.8), explore (0.7), locomote (0.4)
Lower layer: avoid obstacles (0.6), explore (0.9), locomote (0.4)
83
Structure Learning: Credit Assignment (Zero-Order Method)
  • The controlling behavior is the only behavior responsible for the current reinforcement signal.
  • An appropriate ZO value-table updating method is available (a sketch follows).
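
A minimal sketch of one plausible ZO update consistent with these slides (the paper's exact rule is not reproduced in this transcript):

    from collections import defaultdict

    V_zo = defaultdict(float)   # ZO table: (behavior, layer) -> value

    def zo_update(controlling_behavior, layer, reward, alpha=0.1):
        # Only the controlling behavior is credited or blamed for the signal.
        key = (controlling_behavior, layer)
        V_zo[key] += alpha * (reward - V_zo[key])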

84
Structure Learning: Value Function Decomposition and Credit Assignment, Another Method (First Order)
  • It stores the value of the relative order of behaviors:
  • How good/bad is it if B1 is placed higher than B2?!
  • V(avoid obstacles > explore) = 0.8
  • V(explore > avoid obstacles) = -0.3
  • Sorry! Not that easy (and informative) to show graphically!!
  • Credits are assigned to all (controlling, activated) pairs of behaviors (see the sketch below).
  • The agent receives a reward while B1 is controlling and B3 and B5 are activated:
  • (B1 > B3)
  • (B1 > B5)
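
A matching sketch of this first-order (pairwise) credit assignment, again with illustrative names:

    def fo_update(V_fo, controlling, activated, reward, alpha=0.1):
        # Credit every ordered pair (controlling behavior > activated behavior).
        for b in activated:
            key = (controlling, b)
            V_fo[key] += alpha * (reward - V_fo[key])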

85
Structure Learning - Experiment: Multi-Robot Object Lifting
  • A group of three robots wants to lift an object using their own local sensors
  • No central control
  • No communication
  • Local sensors
  • Objectives:
  • Reaching a prescribed height
  • Keeping the tilt angle small

86
Structure Learning - Experiment: Multi-Robot Object Lifting
(diagram: Behavior Toolbox: Push More, Hurry Up, Stop, Slow Down, Don't Go Fast; one structure slot marked "?!")
87
Structure Learning - Experiment: Multi-Robot Object Lifting
88
Structure Learning - Experiment: Multi-Robot Object Lifting
Sample shot of height of each robot after
sufficient learning
89
Structure Learning - Experiment: Multi-Robot Object Lifting
Sample shot of tilt angle of the object after
sufficient learning
90
Behavior Learning
  • The assumption of having a working behavior repertoire may not be practical in every situation:
  • Partial knowledge of the designer about the problem leads to suboptimal solutions
  • Assumptions:
  • The input and output spaces of each behavior are known (S and A).
  • Fixed structure

91
Behavior Learning
92
Behavior Learning
(diagram: explore outputs a1 = B1(s1); avoid obstacles outputs a2 = B2(s2))
How should each behavior behave when the system is in state S?!
93
Behavior Learning: Challenging Issues
  • Hierarchical Behavior Credit Assignment: How should the agent assign credit to different behaviors in its architecture?
  • If the agent receives a reward/punishment, how should we reward/punish the behaviors of the agent?
  • The multi-agent credit assignment problem
  • Cooperation between behaviors: How should we design behaviors so that they can cooperate with each other?
  • Learning: How should the agent update its knowledge when it receives the reinforcement signal?

94
Behavior Learning: Value Function Decomposition
  • The value function of the agent can be decomposed into simpler behavior-level components.

95
Behavior Learning: Hierarchical Behavior Credit Assignment
  • Augmenting the action space of behaviors with No Action (see the sketch below)
  • Cooperation between behaviors:
  • Each behavior knows whether there exists a better behavior among the lower behaviors:
  • Do not suppress them!
  • We developed a multi-agent credit assignment framework for logically expressible teams.
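
An illustrative sketch of the No Action augmentation in a subsumption-style arbiter (the names are mine, not from the paper):

    NO_ACTION = None   # sentinel added to every behavior's action set

    def arbitrate(layers, state):
        # The highest layer that emits a real action becomes the controlling
        # behavior; emitting NO_ACTION is how a behavior cooperates by not
        # suppressing a better lower behavior.
        for behavior in layers:                  # ordered highest to lowest
            a = behavior(state)
            if a is not NO_ACTION:
                return behavior, a
        return None, NO_ACTION                   # every behavior abstained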

96
Behavior Learning: Hierarchical Behavior Credit Assignment
97
Behavior Learning: Optimality Condition and Value Updating
(the equations on this slide were not preserved in the transcript)

98
Concurrent Behavior and Structure Learning
  • We have divided the BBS learning task into two separate processes:
  • Structure Learning
  • Behavior Learning
  • Concurrent behavior and structure learning is also possible

99
Concurrent Behavior and Structure Learning
1. Initialize learning parameters
2. Interact with the environment and receive the reinforcement signal
3. Update the estimates of the structure and behavior value functions
4. Update the architecture according to the new estimates (repeat from step 2)
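
The same loop as Python-flavored pseudocode; `agent` is a hypothetical object bundling the structure and behavior learners:

    def concurrent_learning(env, agent, n_steps):
        agent.init_parameters()                 # 1. initialize learning parameters
        for _ in range(n_steps):
            r = agent.interact(env)             # 2. act and receive reinforcement
            agent.update_structure_values(r)    # 3a. structure value estimates
            agent.update_behavior_values(r)     # 3b. behavior value estimates
            agent.update_architecture()         # 4. rearrange layers accordingly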
100
Behavior and Structure Learning - Experiment: Multi-Robot Object Lifting
101
Behavior and Structure Learning - Experiment: Multi-Robot Object Lifting
102
Austin Villa Robot Soccer Team
N. Kohl and P. Stone, "Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion," IEEE International Conference on Robotics and Automation (ICRA), 2004
103
Austin Villa Robot Soccer Team
Initial Gait
104
Austin Villa Robot Soccer Team
During Training Process
105
Austin Villa Robot Soccer Team
Fastest Final Result
106
Artificial Evolution
  • A computational framework inspired by natural evolution:
  • Natural selection (survival of the fittest)
  • Reproduction
  • Crossover
  • Mutation

107
Artificial Evolution
  • A good (fit) individual survives different hazards and difficulties during its lifetime and can find a mate and reproduce.
  • Its useful genetic information is passed to its offspring.
  • If two fit parents mate with each other, their offspring is probably better than both of them.

108
Artificial Evolution
  • Artificial Evolution is used as a method of optimization that:
  • Does not need explicit knowledge of the objective function
  • Does not need objective-function derivatives
  • Does not get stuck in local minima/maxima
  • In contrast with gradient-based searches

109
Artificial Evolution
110
Artificial Evolution
111
Artificial Evolution: A General Scheme
1. Initialize the population
2. Calculate the fitness of each individual
3. Select the best individuals
4. Mate the best individuals (repeat from step 2)
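
The scheme above as a runnable sketch; `fitness`, `mate`, and `mutate` are problem-specific callables supplied by the user:

    import random

    def evolve(population, fitness, mate, mutate, generations=100, elite=0.2):
        # Generic evolutionary loop: select the fittest, mate, mutate, repeat.
        for _ in range(generations):
            ranked = sorted(population, key=fitness, reverse=True)
            parents = ranked[:max(2, int(elite * len(ranked)))]   # selection
            children = []
            while len(parents) + len(children) < len(population):
                a, b = random.sample(parents, 2)
                children.append(mutate(mate(a, b)))               # crossover + mutation
            population = parents + children                       # next generation
        return max(population, key=fitness)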
112
Artificial Evolution in Robotics
  • Artificial Evolution as an approach to automatically design the controller of a situated agent.
  • Evolving controller: a neural network

113
Artificial Evolution in Robotics
  • The objective function is not very well-defined in robotic tasks.
  • The dynamics of the whole system (agent/environment) are too complex to compute derivatives of the objective function.

114
Artificial Evolution in Robotics
  • Evolution is very time consuming.
  • Actually, in most cases we do not have a population of robots, so we use a single robot instead of a population (which takes much more time).
  • Implementation on a real physical robot may damage the robot before a suitable controller evolves.

115
Artificial Evolution in Robotics: Simulated/Physical Robot
  • Evolve from the first generation on the physical robot:
  • Too expensive
  • Simulate the robots and evolve an appropriate controller in a simulated world; transfer the final solution to the physical robot:
  • Different dynamics of physical and simulated robots
  • After evolving a controller on a simulated robot, continue the evolution on the physical system too.

116
Artificial Evolution in Robotics
117
Artificial Evolution in Robotics
118
Artificial Evolution in Robotics
Best individual of generation 45, born after 35 hours
Floreano, D. and Mondada, F., "Automatic Creation of an Agent: Genetic Evolution of a Neural Network Driven Robot," in D. Cliff, P. Husbands, J.-A. Meyer, and S. Wilson (Eds.), From Animals to Animats III, Cambridge, MA: MIT Press, 1994.
119
Artificial Evolution in Robotics
25 generations (a few days)
D. Floreano, S. Nolfi, and F. Mondada, "Co-Evolution and Ontogenetic Change in Competing Robots," Robotics and Autonomous Systems, to appear, 1999.
120
Artificial Evolution in Robotics
J. Urzelai, D. Floreano, M. Dorigo, and M. Colombetti, "Incremental Robot Shaping," Connection Science, 10, 341-360, 1998.
121
Hybrid Evolution/Learning in Robots
  • Evolution is slow
  • but can find very good solutions
  • Learning is fast (and more flexible during the lifetime)
  • but may get stuck in local maxima of the fitness function.
  • We may use both evolution and learning.

122
Hybrid Evolution/Learning in Robots
  • You may remember that in the structure learning method we assumed that there is a set of working behaviors.
  • To develop behaviors, we used learning.
  • Now, we want to use evolution instead.

123
Behavior Evolution and Hierarchy Learning in BBS
  • Behavior Generation
  • Co-evolution
  • Slow
  • Structure Organization
  • Learning
  • Memetically Biased Initial Structure

124
Behavior Evolution and Hierarchy Learning in BBS
  • Fitness function: how do we calculate the fitness of each behavior?
  • Fitness sharing:
  • Uniform
  • Value-based
  • Genetic operators:
  • Mutation
  • Crossover

125
Behavior Evolution and Hierarchy Learning in BBS - Experiment: Multi-Robot Object Lifting
126
Behavior Evolution and Hierarchy Learning in BBS - Experiment: Multi-Robot Object Lifting
127
Behavior Evolution and Hierarchy Learning in BBS - Experiment: Multi-Robot Object Lifting
128
Behavior Evolution and Hierarchy Learning in BBS - Experiment: Multi-Robot Object Lifting
129
Behavior Evolution and Hierarchy Learning in BBS - Experiment: Multi-Robot Object Lifting
130
Behavior Evolution and Hierarchy Learning in BBS - Experiment: Multi-Robot Object Lifting
131
Conclusions, Ongoing Research, and Future Work
  • A rather complete and mathematical investigation of the automatic design of behavior-based systems:
  • Structure Learning
  • Behavior Learning
  • Concurrent Behavior and Structure Learning
  • Behavior Evolution and Structure Learning
  • Memetic Bias
  • Good results in two different domains:
  • Multi-robot Object Lifting
  • An Abstract Problem

132
Conclusions, Ongoing Research, and Future Work
  • However, many steps remain before fully automated agent design:
  • Extending to a multi-step formulation
  • How should we generate new behaviors without even knowing which sensory information is necessary for the task (feature selection)?
  • Applying structure learning methods to more general architectures, e.g. MaxQ.
  • The problem of reinforcement signal design:
  • Designing a good reinforcement signal is not easy at all.

133
Questions?!