Convergence Analysis of Reinforcement Learning Agents

Provided by: sriniva4
Learn more at: http://web.mit.edu

Transcript and Presenter's Notes

1
Convergence Analysis of Reinforcement Learning Agents
  • Srinivas Turaga
  • 9.912
  • 30th March, 2004

2
The Learning Algorithm
The Assumptions
  • Players use stochastic strategies.
  • Players only observe their reward.
  • Players attempt to estimate the value of choosing
    a particular action.

The Algorithm
  • Play action i with probability Pr(i)
  • Observe reward r
  • Update value function v

3
The Learning Algorithm
The Algorithm
  • Play action i with probability Pr(i)
    • Proportional to the value of action i
  • Observe reward r
    • Depends also on the other player's choice j
  • Update value function v
    • 2 simple schemes: Algorithm 1 and Algorithm 2
    • Each scheme has one update if action i is chosen and one if action i is not chosen
    • The schemes are labeled: forgetting, no forgetting
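
The play, observe, update loop above can be sketched in Python. The update equations are not in this transcript, so the rules below are assumptions made for illustration only: the chosen action's value moves toward the observed reward, and under "forgetting" the unchosen values decay. The names `choose_action`, `update_values`, `play_matching_pennies`, `rate`, and `MIN_VALUE`, and the 0/1 reward coding, are hypothetical.

```python
import random

MIN_VALUE = 1e-6  # assumed floor so probabilities stay well defined

def choose_action(values, rng):
    """Pick an action index with probability proportional to its value."""
    total = sum(values)
    probs = [v / total for v in values]  # Pr(i) = v_i / sum of all values
    return rng.choices(range(len(values)), weights=probs)[0]

def update_values(values, chosen, reward, rate, forgetting):
    """Return updated values after one play (assumed update rules)."""
    new_values = list(values)
    # Assumed rule if action i is chosen: move its value toward the reward.
    new_values[chosen] = values[chosen] + rate * (reward - values[chosen])
    if forgetting:
        # Assumed "forgetting" rule if action i is not chosen: decay its value.
        for k in range(len(values)):
            if k != chosen:
                new_values[k] = (1.0 - rate) * values[k]
    # With forgetting=False, unchosen values are left unchanged.
    return [max(MIN_VALUE, v) for v in new_values]

def play_matching_pennies(steps, rate=0.05, forgetting=True, seed=0):
    """Two learners play matching pennies; each sees only its own reward."""
    rng = random.Random(seed)
    values = [[0.5, 0.5], [0.5, 0.5]]
    for _ in range(steps):
        i = choose_action(values[0], rng)
        j = choose_action(values[1], rng)
        # Player 0 is rewarded on a match, player 1 on a mismatch.
        reward0 = 1.0 if i == j else 0.0
        reward1 = 1.0 - reward0
        values[0] = update_values(values[0], i, reward0, rate, forgetting)
        values[1] = update_values(values[1], j, reward1, rate, forgetting)
    return values
```

Calling `play_matching_pennies(1000, forgetting=False)` runs the assumed no-forgetting variant. Rewards are coded as 0 and 1 here so that values stay positive and the proportional rule remains a valid probability distribution.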
4
Analysis Techniques
  • Analysis of stochastic dynamics is hard!
  • So approximate
  • Consider average case (deterministic)
  • Consider continuous time (differential equation)

Original algorithm: Random! Discrete time!
Average case: Deterministic! Discrete time!
Continuous-time limit: Deterministic! Continuous time!
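
The three levels of approximation can be written out for a generic update. This is a sketch under assumptions, not the slide's own equations, which are not in this transcript: the learning rate \epsilon, the update increment \Delta_i, and the random variable \xi (the actions drawn and the reward observed on one play) are notation introduced here.

```latex
% Assumed generic form of a value update with small learning rate \epsilon.
\begin{align*}
\text{Random, discrete time:} \quad
  & v_i(t+1) = v_i(t) + \epsilon\, \Delta_i\big(v(t), \xi(t)\big) \\
\text{Deterministic, discrete time:} \quad
  & v_i(t+1) = v_i(t) + \epsilon\, \mathbb{E}_{\xi}\big[\Delta_i\big(v(t), \xi\big)\big] \\
\text{Deterministic, continuous time:} \quad
  & \frac{dv_i}{dt} = \mathbb{E}_{\xi}\big[\Delta_i(v, \xi)\big]
\end{align*}
% The last line rescales time by \epsilon and takes \epsilon \to 0.
```

The first step replaces each random increment by its expectation over actions and rewards. The second treats the small step \epsilon as a time increment, which turns the difference equation into a differential equation.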
5
Results - Matching Pennies Game
  • Analysis shows a stable fixed point corresponding
    to matching behavior.
  • Simulations of stochastic algorithm and
    deterministic dynamics converge as expected.
  • Analysis shows a fixed point corresponding to the
    Nash equilibrium. Linear stability analysis shows
    marginal stability.
  • Simulations of stochastic algorithm and
    deterministic dynamics diverge to corners.
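
Linear stability analysis of a fixed point can be illustrated numerically. The presenter's dynamics are not in this transcript, so the sketch below substitutes a different, well-known system: the two-population replicator equations for matching pennies. It estimates the Jacobian at the mixed Nash equilibrium (both players at probability 0.5) and computes its eigenvalues. The function names and the step size `h` are hypothetical.

```python
import cmath

def replicator_field(p, q):
    """Assumed stand-in dynamics: replicator equations for matching pennies.

    p and q are the probabilities that player 1 and player 2 play heads.
    Player 1 wins on a match, player 2 on a mismatch (payoffs +1 / -1).
    """
    dp = 2.0 * p * (1.0 - p) * (2.0 * q - 1.0)
    dq = -2.0 * q * (1.0 - q) * (2.0 * p - 1.0)
    return dp, dq

def jacobian(field, p, q, h=1e-6):
    """Central-difference estimate of the 2x2 Jacobian of `field` at (p, q)."""
    fp_plus, fp_minus = field(p + h, q), field(p - h, q)
    fq_plus, fq_minus = field(p, q + h), field(p, q - h)
    a = (fp_plus[0] - fp_minus[0]) / (2 * h)  # d(dp/dt)/dp
    b = (fq_plus[0] - fq_minus[0]) / (2 * h)  # d(dp/dt)/dq
    c = (fp_plus[1] - fp_minus[1]) / (2 * h)  # d(dq/dt)/dp
    d = (fq_plus[1] - fq_minus[1]) / (2 * h)  # d(dq/dt)/dq
    return a, b, c, d

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from its trace and determinant."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0

# Linearize around the mixed Nash equilibrium p = q = 0.5.
lam1, lam2 = eigenvalues_2x2(*jacobian(replicator_field, 0.5, 0.5))
```

For this stand-in system the exact Jacobian at (0.5, 0.5) is [[0, 1], [-1, 0]], whose eigenvalues are purely imaginary. A zero real part is what marginal stability means: the linearization neither attracts nor repels, so the behavior near the fixed point is decided by higher-order terms or by noise.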

6
Future Directions
  • Validate approximation technique.
  • Analyze properties of more general reinforcement
    learners.
  • Consider situations with asymmetric learning
    rates.
  • Study behavior of algorithms for arbitrary payoff
    matrices.