Multi-Task Trust Region Policy Optimization (MT-TRPO)¶
| Paper | Trust Region Policy Optimization [2], Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning [1] |
| Framework(s) | |
| API Reference | |
| Code | |
| Examples | mttrpo_metaworld_mt1_push, mttrpo_metaworld_mt10, mttrpo_metaworld_mt50 |
Multi-Task Trust Region Policy Optimization (MT-TRPO) is a multi-task RL method that applies the TRPO algorithm to maximize the average discounted return across multiple tasks. The algorithm is evaluated on its average performance over the training tasks.
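The objective above can be sketched in a few lines: compute each episode's discounted return, average returns within each task, then average across tasks. This is an illustrative helper, not the garage implementation; the function names are hypothetical.

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted return of one episode, computed backwards in time."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def multi_task_objective(task_episodes, gamma=0.99):
    """Average discounted return over tasks.

    task_episodes: list (one entry per task) of lists of reward sequences.
    Each task contributes the mean return of its own episodes, so every
    task is weighted equally regardless of how many episodes it has.
    """
    per_task = [
        sum(discounted_return(ep, gamma) for ep in eps) / len(eps)
        for eps in task_episodes
    ]
    return sum(per_task) / len(per_task)

# Two toy tasks, one two-step episode each (rewards are made up).
episodes = [[[1.0, 1.0]], [[0.0, 2.0]]]
print(multi_task_objective(episodes, gamma=0.9))  # → 1.85
```

Equal per-task weighting is what distinguishes this objective from simply pooling all episodes: a task with many episodes cannot dominate the average.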
Examples¶
mttrpo_metaworld_mt1_push¶
This example trains MT-TRPO on the Multi-Task 1 (MT1) push environment, in which we learn a policy to perform push tasks.
mttrpo_metaworld_mt10¶
This example trains MT-TRPO on the Multi-Task 10 (MT10) environment, in which we learn a policy to perform 10 different manipulation tasks.
mttrpo_metaworld_mt50¶
This example trains MT-TRPO on the Multi-Task 50 (MT50) environment, in which we learn a policy to perform 50 different manipulation tasks.
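In all three examples, each policy update is constrained by TRPO's trust region: the step is accepted only if the KL divergence between the old and new policies stays below a threshold. A minimal backtracking line search over a toy Gaussian policy mean sketches this idea; the names and the fixed-variance KL formula are illustrative assumptions, not the garage API.

```python
import numpy as np

def gaussian_kl(mu_old, mu_new, sigma=1.0):
    # KL divergence between two diagonal Gaussians that share a fixed std:
    # KL = ||mu_new - mu_old||^2 / (2 * sigma^2).
    return float(np.sum((mu_new - mu_old) ** 2) / (2 * sigma ** 2))

def trust_region_step(mu, full_step, delta=0.01, backtracks=10):
    """Backtracking line search: halve the step until KL <= delta."""
    for i in range(backtracks):
        candidate = mu + (0.5 ** i) * full_step
        if gaussian_kl(mu, candidate) <= delta:
            return candidate
    return mu  # no acceptable step found; keep the old policy

mu = np.zeros(2)
new_mu = trust_region_step(mu, full_step=np.array([1.0, 1.0]), delta=0.01)
print(gaussian_kl(mu, new_mu) <= 0.01)  # → True
```

In the multi-task setting, the proposed step direction would come from gradients averaged across tasks (as in the objective above), but the trust-region check itself is unchanged.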
References¶
- [1] Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Karol Hausman, Chelsea Finn, and Sergey Levine. Meta-World: a benchmark and evaluation for multi-task and meta reinforcement learning. arXiv:1910.10897, 2019.
- [2] John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel. Trust region policy optimization. arXiv:1502.05477, 2015.
This page was authored by Ruofu Wang (@yeukfu).