RL²

Paper	RL²: Fast Reinforcement Learning via Slow Reinforcement Learning [1]
Framework(s)	TensorFlow
API Reference	garage.tf.algos.RL2
Code	garage/tf/algos/rl2.py

When sampling for RL², episodes are collected from more than one environment. In the original implementation, all episodes sampled within a trial are concatenated into a single episode and fed to the inner algorithm. Returns and advantages are therefore calculated across the whole concatenated episode rather than separately for each original episode.
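
To make the effect of concatenation concrete, here is a minimal sketch in plain Python. It is not garage's implementation: the helper names `concatenate_trial` and `discounted_returns` are hypothetical, the rewards are made-up numbers, and a discount of 1.0 is assumed only to keep the arithmetic easy to follow. It contrasts returns computed over the concatenated trial with returns computed per episode.

```python
from typing import List, Sequence


def concatenate_trial(episodes: Sequence[Sequence[float]]) -> List[float]:
    """Join the per-episode reward lists of one trial into a single list."""
    return [reward for episode in episodes for reward in episode]


def discounted_returns(rewards: Sequence[float], discount: float) -> List[float]:
    """Compute discounted returns by scanning the rewards backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        returns[t] = running
    return returns


# Hypothetical trial: two short episodes sampled from the same task.
trial = [[1.0, 1.0], [0.0, 2.0]]

# RL²-style: treat the whole trial as one long episode.
trial_rewards = concatenate_trial(trial)
trial_returns = discounted_returns(trial_rewards, discount=1.0)

# For comparison: returns computed separately for each episode.
per_episode_returns = [discounted_returns(ep, discount=1.0) for ep in trial]

print(trial_returns)        # [4.0, 3.0, 2.0, 2.0]
print(per_episode_returns)  # [[2.0, 1.0], [2.0, 2.0]]
```

In the concatenated case, the return at the first step of the trial includes rewards earned in the second episode, which is what lets the policy be credited for behaviour that pays off later in the same trial.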

RL²PPO

Proximal Policy Optimization (PPO) adapted for RL². Below are some examples of running RL² in different environments.

rl2_ppo_halfcheetah

rl2_ppo_metaworld_ml10

rl2_ppo_halfcheetah_meta_test

RL²TRPO

Trust Region Policy Optimization (TRPO) adapted for RL².

rl2_trpo_halfcheetah

References

1: Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, and Pieter Abbeel. RL²: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016.

This page was authored by Ruofu Wang (@yeukfu).