Probabilistic Embeddings for Actor-Critic Reinforcement Learning (PEARL)

Paper

Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables [1]

Framework(s)

PyTorch

API Reference

garage.torch.algos.PEARL

Code

garage/torch/algos/pearl.py

Examples

pearl_half_cheetah_vel, pearl_metaworld_ml1_push, pearl_metaworld_ml10, pearl_metaworld_ml45

PEARL, which stands for Probabilistic Embeddings for Actor-Critic Reinforcement Learning, is an off-policy meta-RL algorithm. It is built on top of SAC, using two Q-functions and a value function, with the addition of an inference network that estimates the posterior q(z|c) over a latent context variable z given the context c (transitions collected from the current task). The policy is conditioned on z in order to adapt its behavior to specific tasks.
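
As a rough, self-contained sketch (plain PyTorch rather than the garage implementation; the ContextEncoder class, the network sizes, and all variable names here are invented for illustration), the snippet below encodes each context transition into an independent Gaussian factor, combines the factors into the permutation-invariant posterior q(z|c) by a product of Gaussians as in the paper, samples z with the reparameterization trick, and conditions a policy on it.

import torch
from torch import nn

class ContextEncoder(nn.Module):
    """Maps one (obs, action, reward, next_obs) transition to Gaussian
    parameters of the latent context variable z (illustrative sketch)."""

    def __init__(self, transition_dim, latent_dim, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # mean and log-variance
        )
        self.latent_dim = latent_dim

    def posterior(self, context):
        """context: (num_transitions, transition_dim).

        Each transition yields an independent Gaussian factor; the product
        of those factors gives the permutation-invariant posterior q(z|c)."""
        params = self.net(context)
        mu, logvar = params.split(self.latent_dim, dim=-1)
        var = logvar.exp()
        # Product of Gaussian factors (precision-weighted combination).
        precision = 1.0 / var
        post_var = 1.0 / precision.sum(dim=0)
        post_mu = post_var * (precision * mu).sum(dim=0)
        return torch.distributions.Normal(post_mu, post_var.sqrt())


# Hypothetical sizes for the sketch.
obs_dim, act_dim, latent_dim = 20, 6, 5
transition_dim = obs_dim + act_dim + 1 + obs_dim  # (s, a, r, s')

encoder = ContextEncoder(transition_dim, latent_dim)
policy = nn.Sequential(                      # stand-in for a SAC policy head
    nn.Linear(obs_dim + latent_dim, 300), nn.ReLU(),
    nn.Linear(300, act_dim), nn.Tanh(),
)

context = torch.randn(16, transition_dim)    # transitions gathered so far
q_z = encoder.posterior(context)
z = q_z.rsample()                            # reparameterized sample of z
obs = torch.randn(obs_dim)
action = policy(torch.cat([obs, z]))         # policy conditioned on z

During meta-training the posterior is additionally regularized with a KL divergence toward a unit Gaussian prior, and gradients flow into the encoder through the reparameterized sample.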

Default Parameters

batch_size=256,
embedding_batch_size=100,
embedding_mini_batch_size=100,
encoder_hidden_size=200,
latent_size=5,
max_episode_length=200,
meta_batch_size=16,
net_size=300,
num_epochs=500,
num_train_tasks=100,
num_test_tasks=30,
num_steps_per_epoch=2000,
num_initial_steps=2000,
num_tasks_sample=5,
num_steps_prior=400,
num_extra_rl_steps_posterior=600,
reward_scale=5.
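
To make a few of these numbers concrete, the hypothetical sketch below (ordinary PyTorch, not the garage.torch.algos.PEARL constructor; the three-hidden-layer layout follows the paper's described architecture and the environment sizes are invented) shows how latent_size, encoder_hidden_size, and net_size would translate into network widths for the SAC components and the context encoder.

import torch
from torch import nn

latent_size = 5            # dimension of the latent context variable z
encoder_hidden_size = 200  # width of the context-encoder hidden layers
net_size = 300             # width of the Q-function / value-function layers
obs_dim, act_dim = 20, 6   # hypothetical environment sizes

def mlp(in_dim, hidden, out_dim):
    """Three hidden layers, following the paper's description (assumed here)."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

# Two Q-functions and a value function, each also conditioned on z.
qf1 = mlp(obs_dim + act_dim + latent_size, net_size, 1)
qf2 = mlp(obs_dim + act_dim + latent_size, net_size, 1)
vf = mlp(obs_dim + latent_size, net_size, 1)

# Context encoder outputs mean and variance parameters for q(z|c).
transition_dim = obs_dim + act_dim + 1 + obs_dim
encoder = mlp(transition_dim, encoder_hidden_size, 2 * latent_size)

For a full, working configuration with these defaults, see the bundled example scripts listed below.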

Examples

pearl_half_cheetah_vel

pearl_metaworld_ml1_push

pearl_metaworld_ml10

pearl_metaworld_ml45

References

1

Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn, and Sergey Levine. Efficient off-policy meta-reinforcement learning via probabilistic context variables. arXiv preprint arXiv:1903.08254, 2019.


This page was authored by Iris Liu (@irisliucy).