Multi-Task Proximal Policy Optimization (MT-PPO)¶
Multi-Task PPO is a multi-task RL method that aims to learn PPO algorithm to maximize the average discounted return across multiple tasks. The algorithm is evaluated on the average performance over training tasks.
This example is to train PPO on Multi-Task 1 (MT1) push environment, in which we learn a policy to perform push tasks.
This example is to train PPO on Multi-Task 10 (MT10) environment, in which we learn a policy to perform 10 different manipulation tasks.
This example is to train PPO on Multi-Task 50 (MT50) environment, in which we learn a policy to perform 50 different manipulation tasks.
Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Karol Hausman, Chelsea Finn, and Sergey Levine. Meta-world: a benchmark and evaluation for multi-task and meta reinforcement learning. arXiv:1910.10897, 2019.
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
This page was authored by Iris Liu (@irisliucy).