Hierarchical Apprenticeship Learning with
Application to Quadruped Locomotion
Stanford University
J. Zico Kolter, Pieter Abbeel, Andrew Y. Ng
  • 1. Motivating Application
  • Planning footsteps for a quadruped robot over
    challenging, irregular, previously unseen terrain
  • Good footsteps must properly trade off several
    features: slope, proximity to drop-offs,
    stability of the robot's pose, etc.
  • It is highly non-trivial to hand-specify the reward
    function for a planner, since this requires manually
    determining relative weights for all the features
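
To make the trade-off concrete, the sketch below scores a single
footstep as a weighted sum of the features listed above. This is a
hypothetical illustration only: the function name and the weights are
made up, and choosing such weights by hand for every feature is exactly
the difficulty described above.

    # Hypothetical footstep score: feature names follow the bullets
    # above, but the weights are arbitrary placeholders, not values
    # from the paper. Hand-tuning them is the hard part.
    def footstep_score(slope, dropoff_proximity, pose_stability,
                       weights=(-1.0, -2.0, 3.0)):
        features = (slope, dropoff_proximity, pose_stability)
        return sum(w * f for w, f in zip(weights, features))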
  • 2. Apprenticeship Learning Background
  • Key idea of Apprenticeship Learning: it is often
    easier to demonstrate good behavior than to specify
    a reward function that induces this behavior
  • Two factors make Apprenticeship Learning hard to
    apply to large, complex problems such as
    quadruped planning:
  • It is very difficult, even for a domain expert, to
    specify a good complete path (e.g., a full set of
    footsteps across the terrain)
  • Even given a reward function, planning (e.g.,
    finding a complete set of footsteps) is a hard,
    high-dimensional task
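
As a minimal formal sketch of this setup (notation assumed here, but
consistent with the linear-reward assumption used in the convex
formulation below): the reward is taken to be linear in known state
features, and learning reduces to recovering the weight vector under
which the expert's demonstrations look optimal.

    % Reward linear in known features \phi(s) with unknown weights w:
    R(s) = w^{\top} \phi(s)
    % Find w under which the expert policy \pi^E outperforms alternatives:
    \mathbb{E}\Big[\sum\nolimits_t R(s_t) \,\Big|\, \pi^E\Big]
      \;\ge\; \mathbb{E}\Big[\sum\nolimits_t R(s_t) \,\Big|\, \pi\Big]
      \quad \text{for all policies } \pi .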

3. Hierarchical Apprenticeship Learning: Main Idea
  • Decompose the planning task into multiple levels
    of abstraction
  • Demonstrate good behavior at each level separately
  • Easier to specify a path in the reduced, abstract
    state space than in the full state space
  • Easier to demonstrate greedy actions than
    long-term optimal actions

[Figure: high-level demonstration (the teacher demonstrates a body
path across the terrain, from the initial position to the goal) and
low-level demonstration (the teacher specifies greedy local footsteps
at a few key locations).]

Step 1 (high level): plan a path for the center of the robot body
across the terrain, from the initial position to the goal.
Step 2 (low level): plan footsteps along the body path, starting from
the current foot positions.
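
A minimal code sketch of this two-level scheme follows. The poster does
not specify the planners themselves, so everything here is an
assumption for illustration: level 1 runs plain Dijkstra over a coarse
grid of body positions with a learned per-cell cost, and level 2 makes
a one-step greedy choice; candidates and reward stand in for the
learned footstep model.

    # Sketch of the two-level planner (hypothetical helpers; not the
    # authors' implementation). Level 1: Dijkstra over a coarse grid of
    # body positions. Level 2: greedy footstep choice along that path.
    import heapq

    def neighbors(cell):
        x, y = cell
        return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

    def plan_body_path(cost, start, goal):
        """Level 1: shortest body path on a dict-based cost grid
        (cell -> learned traversal cost); returns a list of cells."""
        dist, prev, pq = {start: 0.0}, {}, [(0.0, start)]
        while pq:
            d, u = heapq.heappop(pq)
            if u == goal:
                break
            if d > dist[u]:
                continue  # stale queue entry
            for v in neighbors(u):
                if v in cost and d + cost[v] < dist.get(v, float("inf")):
                    dist[v], prev[v] = d + cost[v], u
                    heapq.heappush(pq, (dist[v], v))
        path = [goal]
        while path[-1] != start:
            path.append(prev[path[-1]])  # assumes goal is reachable
        return path[::-1]

    def plan_footsteps(body_path, candidates, reward):
        """Level 2: at each body waypoint, greedily take the candidate
        foot placement with the highest learned reward."""
        return [max(candidates(wp), key=reward) for wp in body_path]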

  • 4. Convex Formulation
  • Two assumptions on the reward function:
  • Reward is linear in the state features
  • High-level rewards are averages of low-level
    rewards
  • High-level demonstrations imply constraints on the
    value function
  • Low-level demonstrations imply constraints on the
    reward function
  • The high- and low-level constraints (plus slack
    variables) combine into a single, unified convex
    optimization problem, sketched below
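
Putting the bullets together, the learning problem can be sketched as
the max-margin program below. The margins, slack weighting, and exact
constraint sets are assumptions in the spirit of the formulation, not
copied from the poster.

    % w: reward weights; \xi_i, \lambda_j: slacks for the high- and
    % low-level constraints; C_H, C_L: trade-off constants (assumed).
    \min_{w,\;\xi \ge 0,\;\lambda \ge 0} \;\;
        \|w\|_2^2 \;+\; C_H \sum_i \xi_i \;+\; C_L \sum_j \lambda_j
    % High level: each demonstrated body path p_i^* should score at
    % least as well as any alternative path p (value constraints):
    \text{s.t.} \quad
        w^{\top} \Phi(p_i^{*}) \;\ge\; w^{\top} \Phi(p) + 1 - \xi_i
        \quad \forall i, \; \forall p,
    % Low level: each demonstrated greedy footstep a_j^* in state s_j
    % should beat any alternative footstep a (reward constraints):
        w^{\top} \phi(s_j, a_j^{*}) \;\ge\; w^{\top} \phi(s_j, a) + 1 - \lambda_j
        \quad \forall j, \; \forall a,
    % where \Phi(p) averages \phi over the states on p (the averaging
    % assumption above), tying both levels to one shared w.

Since the objective is convex and every constraint is linear in w, the
whole problem is a single convex program; the averaging assumption is
what ties both constraint types to one shared weight vector.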
5. Experimental Results
Multi-room Grid World
  • 10x10 rooms connected by doors, where each room
    is itself a 10x10 grid world
  • High-level demonstration shows only the room-to-room
    path (generated using the true reward function)
  • Low-level demonstration shows only the local greedy
    action at the grid level
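
For concreteness, the sketch below builds such an environment. The
one-cell-thick walls and middle-of-the-wall door placement are
assumptions; the poster specifies only the room counts.

    # Sketch of the multi-room grid world described above: a 10x10
    # arrangement of rooms, each a 10x10 grid of cells, with one door
    # (assumed) punched in the middle of every shared wall.
    import numpy as np

    ROOMS, CELLS = 10, 10              # 10x10 rooms, each 10x10 cells
    SIZE = ROOMS * (CELLS + 1) - 1     # rooms separated by 1-cell walls

    def build_grid():
        """Occupancy map: True = wall, False = free cell."""
        grid = np.zeros((SIZE, SIZE), dtype=bool)
        walls = [r * (CELLS + 1) + CELLS for r in range(ROOMS - 1)]
        for w in walls:
            grid[w, :] = True          # walls between rows of rooms
            grid[:, w] = True          # walls between columns of rooms
        for w in walls:                # punch one door per wall segment
            for r in range(ROOMS):
                mid = r * (CELLS + 1) + CELLS // 2
                grid[w, mid] = False   # door through a horizontal wall
                grid[mid, w] = False   # door through a vertical wall
        return grid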
Quadruped Robot
  • Trained on easier terrain and evaluated on harder
    terrain held out for testing
  • On the training terrain, the teacher demonstrated a
    single high-level body path and 20 greedy low-level
    foot placements (10 minutes to gather all data)
  • The system achieves state-of-the-art performance on
    this task

[Figure: planned footsteps.]
  • 6. Related Work
  • Apprenticeship Learning: Abbeel and Ng (2004),
    Ratliff et al. (2006, 2007), Neu and Szepesvari
    (2007), Syed and Schapire (2007)
  • Hierarchical Reinforcement Learning: Parr and
    Russell (1998), Sutton et al. (1999), Dietterich
    (2000), Barto and Mahadevan (2003)
  • 7. Conclusion
  • Presented a novel algorithm for applying
    apprenticeship learning to large, complex domains
    via hierarchical decomposition
  • Demonstrated the algorithm on a multi-room grid world
    and a challenging quadruped task, where it achieves
    state-of-the-art performance
  • More generally, the algorithm is applicable whenever
    the reward function can be hierarchically decomposed
    as described above

[Figure: performance on the training and testing terrain, comparing
no planning, high-level (body path) constraints only, low-level
(footstep) constraints only, and full hierarchical apprenticeship
learning.]