Multi-Task Proximal Policy Optimization (MT-PPO)

Paper

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning [1], Proximal Policy Optimization Algorithms [2]

Framework(s)

PyTorch

API Reference

garage.torch.algos.PPO

Code

garage/torch/algos/ppo.py

Examples

mtppo_metaworld_mt1_push, mtppo_metaworld_mt10, mtppo_metaworld_mt50

Multi-Task PPO is a multi-task RL method that applies the PPO algorithm to maximize the average discounted return across multiple tasks. The algorithm is evaluated by its average performance over the training tasks.
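The sketch below illustrates how such an experiment might be assembled with garage's PyTorch PPO on a Meta-World push environment. It is a minimal sketch rather than the exact example script: the class names (GymEnv, normalize, GaussianMLPPolicy, GaussianMLPValueFunction, LocalSampler, Trainer) and the constructor arguments shown are assumptions about the garage API that may differ between versions.

```python
# Minimal sketch of training PPO on a Meta-World push task with garage.
# All garage/metaworld class names and arguments below are assumptions and
# may need to be adapted to the installed versions of the libraries.
import random

import metaworld

from garage import wrap_experiment
from garage.envs import GymEnv, normalize
from garage.experiment.deterministic import set_seed
from garage.sampler import LocalSampler
from garage.torch.algos import PPO
from garage.torch.policies import GaussianMLPPolicy
from garage.torch.value_functions import GaussianMLPValueFunction
from garage.trainer import Trainer


@wrap_experiment
def mtppo_metaworld_push_sketch(ctxt=None, seed=1):
    """Train PPO on a Meta-World MT1 push task (illustrative sketch)."""
    set_seed(seed)

    # Build the MT1 push benchmark and sample one training task variation.
    mt1 = metaworld.MT1('push-v2')
    env = mt1.train_classes['push-v2']()
    env.set_task(random.choice(mt1.train_tasks))
    env = normalize(GymEnv(env, max_episode_length=500))

    policy = GaussianMLPPolicy(env.spec, hidden_sizes=(64, 64))
    value_function = GaussianMLPValueFunction(env_spec=env.spec,
                                              hidden_sizes=(64, 64))
    sampler = LocalSampler(agents=policy,
                           envs=env,
                           max_episode_length=env.spec.max_episode_length)

    algo = PPO(env_spec=env.spec,
               policy=policy,
               value_function=value_function,
               sampler=sampler,
               discount=0.99,
               gae_lambda=0.95,
               lr_clip_range=0.2)

    trainer = Trainer(ctxt)
    trainer.setup(algo, env)
    trainer.train(n_epochs=500, batch_size=1024)
```

The actual example scripts train over many sampled task variations rather than a single one; this sketch uses one sampled push task only for brevity.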

Examples

mtppo_metaworld_mt1_push

This example trains PPO on the Multi-Task 1 (MT1) push environment, in which a single policy is learned to perform push tasks.

mtppo_metaworld_mt10

This example trains PPO on the Multi-Task 10 (MT10) environment, in which a single policy is learned to perform 10 different manipulation tasks.

mtppo_metaworld_mt50

This example trains PPO on the Multi-Task 50 (MT50) environment, in which a single policy is learned to perform 50 different manipulation tasks.
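For the MT10 and MT50 variants, the training environments come from the corresponding Meta-World benchmarks. The snippet below shows how the MT10 environment set could be constructed directly with the metaworld package; it is a sketch based on the upstream benchmark API, and exact names may differ between Meta-World releases.

```python
# Sketch: constructing the 10 MT10 training environments with Meta-World.
# Based on the metaworld benchmark API; names may differ between releases.
import random

import metaworld

mt10 = metaworld.MT10()  # instantiating the benchmark samples the train tasks

training_envs = []
for name, env_cls in mt10.train_classes.items():
    env = env_cls()
    # Each environment class has several sampled tasks (e.g. goal positions);
    # pick one that belongs to this environment and assign it.
    task = random.choice([t for t in mt10.train_tasks if t.env_name == name])
    env.set_task(task)
    training_envs.append(env)

# A multi-task learner such as MT-PPO then collects batches from all of these
# environments and optimizes a single policy on the combined data.
```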

References

1

Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Karol Hausman, Chelsea Finn, and Sergey Levine. Meta-world: a benchmark and evaluation for multi-task and meta reinforcement learning. arXiv preprint arXiv:1910.10897, 2019.

2

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.


This page was authored by Iris Liu (@irisliucy).