RL2
Paper | RL2: Fast Reinforcement Learning via Slow Reinforcement Learning [1]
Framework(s) | TensorFlow
API Reference |
Code |
When sampling for RL2, more than one environment is sampled from. In the original implementation, all episodes sampled within a trial are concatenated into a single episode, which is then fed to the inner algorithm. As a result, returns and advantages are calculated across the whole concatenated episode rather than per original episode.
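As a minimal sketch of what this concatenation implies (not the library's implementation), the pure-Python example below joins the rewards of two episodes from one trial and computes discounted returns over the joined sequence. The helper names `concatenate_trial` and `discounted_returns`, the reward values, and the discount of 0.5 are assumptions chosen purely for illustration.

```python
def concatenate_trial(episodes):
    """Join the per-episode reward lists of one trial into a single list."""
    rewards = []
    for episode_rewards in episodes:
        rewards.extend(episode_rewards)
    return rewards


def discounted_returns(rewards, discount):
    """Compute discounted returns with a backward pass over the rewards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        returns[t] = running
    return returns


# Two hypothetical episodes sampled within the same trial.
trial = [[1.0, 0.0], [0.0, 2.0]]

# Treat the trial as one long episode, as described above.
rewards = concatenate_trial(trial)
returns = discounted_returns(rewards, discount=0.5)

# For comparison: returns computed separately for each original episode.
per_episode = [discounted_returns(ep, discount=0.5) for ep in trial]
```

Here `returns` is `[1.25, 0.5, 1.0, 2.0]`, whereas `per_episode` is `[[1.0, 0.0], [1.0, 2.0]]`: with concatenation, the reward earned in the second episode also contributes to the returns of steps in the first episode.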
RL2PPO
Proximal Policy Optimization (PPO) adapted for RL2. The examples below run RL2 in different environments.
rl2_ppo_halfcheetah
rl2_ppo_metaworld_ml10
rl2_ppo_halfcheetah_meta_test
RL2TRPO
Trust Region Policy Optimization (TRPO) adapted for RL2.