Probabilistic Embeddings for Actor-Critic Reinforcement Learning (PEARL)

Paper

Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables [1]

Framework(s)

PyTorch

API Reference

garage.torch.algos.PEARL

Code

garage/torch/algos/pearl.py

Examples

pearl_half_cheetah_vel, pearl_metaworld_ml1_push, pearl_metaworld_ml10, pearl_metaworld_ml45

PEARL, which stands for Probabilistic Embeddings for Actor-Critic Reinforcement Learning, is an off-policy meta-RL algorithm. It is built on top of SAC, using two Q-functions and a value function, with the addition of an inference network that estimates the posterior q(z|c) over a latent context variable z given the context c (transitions collected from the current task). The policy is conditioned on z in order to adapt its behavior to specific tasks.
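
As a rough, self-contained sketch (plain PyTorch rather than the garage implementation; the ContextEncoder class, the network sizes, and all variable names here are invented for illustration), the snippet below encodes each context transition into an independent Gaussian factor, combines the factors into the permutation-invariant posterior q(z|c) by a product of Gaussians as in the paper, samples z with the reparameterization trick, and conditions a policy on it.

import torch
from torch import nn

class ContextEncoder(nn.Module):
    """Maps one (obs, action, reward, next_obs) transition to Gaussian
    parameters of the latent context variable z (illustrative sketch)."""

    def __init__(self, transition_dim, latent_dim, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # mean and log-variance
        )
        self.latent_dim = latent_dim

    def posterior(self, context):
        """context: (num_transitions, transition_dim).

        Each transition yields an independent Gaussian factor; the product
        of those factors gives the permutation-invariant posterior q(z|c)."""
        params = self.net(context)
        mu, logvar = params.split(self.latent_dim, dim=-1)
        var = logvar.exp()
        # Product of Gaussian factors (precision-weighted combination).
        precision = 1.0 / var
        post_var = 1.0 / precision.sum(dim=0)
        post_mu = post_var * (precision * mu).sum(dim=0)
        return torch.distributions.Normal(post_mu, post_var.sqrt())


# Hypothetical sizes for the sketch.
obs_dim, act_dim, latent_dim = 20, 6, 5
transition_dim = obs_dim + act_dim + 1 + obs_dim  # (s, a, r, s')

encoder = ContextEncoder(transition_dim, latent_dim)
policy = nn.Sequential(                      # stand-in for a SAC policy head
    nn.Linear(obs_dim + latent_dim, 300), nn.ReLU(),
    nn.Linear(300, act_dim), nn.Tanh(),
)

context = torch.randn(16, transition_dim)    # transitions gathered so far
q_z = encoder.posterior(context)
z = q_z.rsample()                            # reparameterized sample of z
obs = torch.randn(obs_dim)
action = policy(torch.cat([obs, z]))         # policy conditioned on z

During meta-training the posterior is additionally regularized with a KL divergence toward a unit Gaussian prior, and gradients flow into the encoder through the reparameterized sample.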

Default Parameters

batch_size=256,
embedding_batch_size=100,
embedding_mini_batch_size=100,
encoder_hidden_size=200,
latent_size=5,
max_episode_length=200,
meta_batch_size=16,
net_size=300,
num_epochs=500,
num_train_tasks=100,
num_test_tasks=30,
num_steps_per_epoch=2000,
num_initial_steps=2000,
num_tasks_sample=5,
num_steps_prior=400,
num_extra_rl_steps_posterior=600,
reward_scale=5.
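
To make a few of these numbers concrete, the hypothetical sketch below (ordinary PyTorch, not the garage.torch.algos.PEARL constructor; the three-hidden-layer layout follows the paper's described architecture and the environment sizes are invented) shows how latent_size, encoder_hidden_size, and net_size would translate into network widths for the SAC components and the context encoder.

import torch
from torch import nn

latent_size = 5            # dimension of the latent context variable z
encoder_hidden_size = 200  # width of the context-encoder hidden layers
net_size = 300             # width of the Q-function / value-function layers
obs_dim, act_dim = 20, 6   # hypothetical environment sizes

def mlp(in_dim, hidden, out_dim):
    """Three hidden layers, following the paper's description (assumed here)."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

# Two Q-functions and a value function, each also conditioned on z.
qf1 = mlp(obs_dim + act_dim + latent_size, net_size, 1)
qf2 = mlp(obs_dim + act_dim + latent_size, net_size, 1)
vf = mlp(obs_dim + latent_size, net_size, 1)

# Context encoder outputs mean and variance parameters for q(z|c).
transition_dim = obs_dim + act_dim + 1 + obs_dim
encoder = mlp(transition_dim, encoder_hidden_size, 2 * latent_size)

For a full, working configuration with these defaults, see the bundled example scripts listed below.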

Examples

pearl_half_cheetah_vel

pearl_metaworld_ml1_push

pearl_metaworld_ml10

pearl_metaworld_ml45

References

1

Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn, and Sergey Levine. Efficient off-policy meta-reinforcement learning via probabilistic context variables. arXiv preprint arXiv:1903.08254, 2019.


This page was authored by Iris Liu (@irisliucy).