RL2

Paper

RL2: Fast Reinforcement Learning via Slow Reinforcement Learning [1]

Framework(s)

TensorFlow

API Reference

garage.tf.algos.RL2

Code

garage/tf/algos/rl2.py

When sampling for RL2, there is more than one environment to sample from. In the original implementation, all episodes sampled within a trial are concatenated into a single episode, which is then fed to the inner algorithm. Returns and advantages are therefore calculated across the whole concatenated episode rather than within each original episode.
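To make the effect of concatenation concrete, the following is a minimal NumPy sketch, not garage's actual implementation. The helper names `discounted_returns` and `concatenate_trial`, the reward values, and the discount factor are all hypothetical and chosen only for illustration. It compares returns computed per episode with returns computed over the concatenated trial.

```python
import numpy as np


def discounted_returns(rewards, discount):
    """Return the discounted return at every step of a reward sequence."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    # Walk backwards so each step adds its reward to the discounted future return.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        returns[t] = running
    return returns


def concatenate_trial(episode_rewards):
    """Join the per-episode reward arrays of one trial into a single sequence."""
    # Hypothetical helper: stands in for the concatenation step described above.
    return np.concatenate([np.asarray(r, dtype=float) for r in episode_rewards])


# Hypothetical trial: two episodes of two steps each, sampled from the same task.
trial = [[1.0, 1.0], [1.0, 1.0]]
discount = 0.5

# Returns computed separately for each episode (no credit crosses episode boundaries).
per_episode = [discounted_returns(r, discount) for r in trial]

# Returns computed across the concatenated trial (rewards from later episodes
# contribute to the returns of earlier steps).
across_trial = discounted_returns(concatenate_trial(trial), discount)

print(per_episode)
print(across_trial)
```

In this toy setting each episode on its own yields returns `[1.5, 1.0]`, while the concatenated trial yields `[1.875, 1.75, 1.5, 1.0]`: steps in the first episode receive credit for rewards earned in the second.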

RL2PPO

Proximal Policy Optimization (PPO) specific to RL2. The examples below run RL2 in different environments.

rl2_ppo_halfcheetah

rl2_ppo_metaworld_ml10

rl2_ppo_halfcheetah_meta_test

RL2TRPO

Trust Region Policy Optimization (TRPO) specific to RL2.

rl2_trpo_halfcheetah

References

[1] Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, and Pieter Abbeel. RL$^2$: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016.


This page was authored by Ruofu Wang (@yeukfu).