# Probabilistic Embeddings for Actor-Critic Reinforcement Learning (PEARL)
| Paper | Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables [1] |
| :--- | :--- |
| Framework(s) | PyTorch |
| API Reference | |
| Code | |
| Examples | pearl_half_cheetah_vel, pearl_metaworld_ml1_push, pearl_metaworld_ml10, pearl_metaworld_ml45 |
PEARL, which stands for Probabilistic Embeddings for Actor-Critic Reinforcement Learning, is an off-policy meta-RL algorithm. It is built on top of SAC, using two Q-functions and a value function, with the addition of an inference network that estimates the posterior q(z|c) over a latent context variable given collected context c. The policy is conditioned on the latent variable Z so that it can adapt its behavior to specific tasks.
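The two pieces described above can be summarized in a short, purely illustrative PyTorch sketch. This is not garage's implementation; the class names, network shapes, and toy dimensions are assumptions. The encoder maps context transitions (s, a, r) to a Gaussian posterior q(z|c) by combining per-transition Gaussian factors as a product of Gaussians, and the policy takes the observation concatenated with a sample of z.

```py
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextEncoder(nn.Module):
    """Illustrative inference network: maps context transitions c = (s, a, r)
    to the parameters of a Gaussian posterior q(z|c)."""

    def __init__(self, context_dim, latent_dim=5, hidden_size=200):
        super().__init__()
        self.latent_dim = latent_dim
        self.net = nn.Sequential(
            nn.Linear(context_dim, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, hidden_size), nn.ReLU(),
            # per-transition mean and (pre-softplus) variance of a Gaussian factor
            nn.Linear(hidden_size, 2 * latent_dim),
        )

    def forward(self, context):
        # context: (num_transitions, context_dim)
        mu, raw_var = self.net(context).split(self.latent_dim, dim=-1)
        var = F.softplus(raw_var)
        # Combine the per-transition Gaussian factors as a product of Gaussians
        # (precision-weighted averaging of the means).
        precision = 1.0 / var
        post_var = 1.0 / precision.sum(dim=0)
        post_mu = post_var * (precision * mu).sum(dim=0)
        return torch.distributions.Normal(post_mu, post_var.sqrt())


class LatentConditionedPolicy(nn.Module):
    """Illustrative policy whose input is the observation concatenated with
    a sample z ~ q(z|c)."""

    def __init__(self, obs_dim, action_dim, latent_dim=5, hidden_size=300):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, action_dim), nn.Tanh(),
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))


# Toy usage: infer a task embedding from a few context transitions, then act.
obs_dim, action_dim = 8, 2
context_dim = obs_dim + action_dim + 1            # (s, a, r)
encoder = ContextEncoder(context_dim)
policy = LatentConditionedPolicy(obs_dim, action_dim)

context = torch.randn(16, context_dim)            # 16 context transitions
posterior = encoder(context)                      # q(z|c)
z = posterior.rsample()                           # reparameterized sample
action = policy(torch.randn(obs_dim), z)
```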
## Default Parameters
```py
batch_size=256,
embedding_batch_size=100,
embedding_mini_batch_size=100,
encoder_hidden_size=200,
latent_size=5,
max_episode_length=200,
meta_batch_size=16,
net_size=300,
num_epochs=500,
num_train_tasks=100,
num_test_tasks=30,
num_steps_per_epoch=2000,
num_initial_steps=2000,
num_tasks_sample=5,
num_steps_prior=400,
num_extra_rl_steps_posterior=600,
reward_scale=5.
```
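To connect a few of these defaults to the sketch in the overview (again illustrative only, reusing the hypothetical classes above rather than garage's API): `latent_size` is the dimension of z, `encoder_hidden_size` sizes the inference network, and `net_size` sizes the policy and critic networks.

```py
# Illustrative wiring of a few defaults into the sketch classes above
# (hypothetical names; the example launchers below use garage's actual API).
obs_dim, action_dim = 8, 2                        # toy dimensions, not from the defaults
encoder = ContextEncoder(context_dim=obs_dim + action_dim + 1,  # (s, a, r)
                         latent_dim=5,            # latent_size
                         hidden_size=200)         # encoder_hidden_size
policy = LatentConditionedPolicy(obs_dim, action_dim,
                                 latent_dim=5,    # latent_size
                                 hidden_size=300) # net_size
```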
## Examples

### pearl_half_cheetah_vel

### pearl_metaworld_ml1_push

### pearl_metaworld_ml10

### pearl_metaworld_ml45
## References
1. Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn, and Sergey Levine. Efficient off-policy meta-reinforcement learning via probabilistic context variables. arXiv preprint arXiv:1903.08254, 2019.
This page was authored by Iris Liu (@irisliucy).