Multi-Task Trust Region Policy Optimization (MT-TRPO)

Paper

Trust Region Policy Optimization [2], Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning [1]

Framework(s)

PyTorch

API Reference

garage.torch.algos.TRPO

Code

garage/torch/algos/trpo.py

Examples

mttrpo_metaworld_mt1_push, mttrpo_metaworld_mt10, mttrpo_metaworld_mt50

Multi-Task Trust Region Policy Optimization (MT-TRPO) is a multi-task RL method that applies the TRPO algorithm [2] to maximize the average discounted return across multiple tasks. The algorithm is evaluated by its average performance over the training tasks.
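
Concretely, with a training task set \(\mathcal{T}\), a single shared policy \(\pi_\theta\) is optimized for the task-averaged objective under TRPO's usual KL trust-region constraint. This is a standard formulation written out for illustration; the notation is ours, not quoted from either paper:

```latex
\max_{\theta} \;
  \frac{1}{|\mathcal{T}|} \sum_{\tau \in \mathcal{T}}
  \mathbb{E}_{\pi_\theta,\,\tau}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \right]
\quad \text{s.t.} \quad
\bar{D}_{\mathrm{KL}}\!\left( \pi_{\theta_\text{old}} \,\|\, \pi_\theta \right) \le \delta
```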

Examples

mttrpo_metaworld_mt1_push

This example trains MT-TRPO on the Multi-Task 1 (MT1) push environment, in which a single policy is learned to perform push tasks across 50 goal variations. A condensed sketch of the launcher follows.
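
The sketch below is modeled on the bundled example, not copied from it: hyperparameters (hidden sizes, epochs, batch size) are illustrative, the Meta-World task name (`'push-v2'`) depends on the installed Meta-World version, and where the sampler is passed differs across garage versions.

```python
#!/usr/bin/env python3
"""Condensed MT-TRPO launcher for the Meta-World MT1 push benchmark."""
import metaworld
import torch

from garage import wrap_experiment
from garage.envs import normalize
from garage.envs.multi_env_wrapper import (MultiEnvWrapper,
                                           round_robin_strategy)
from garage.experiment import MetaWorldTaskSampler
from garage.experiment.deterministic import set_seed
from garage.sampler import LocalSampler
from garage.torch.algos import TRPO
from garage.torch.policies import GaussianMLPPolicy
from garage.torch.value_functions import GaussianMLPValueFunction
from garage.trainer import Trainer


@wrap_experiment
def mttrpo_metaworld_mt1_push(ctxt=None, seed=1):
    set_seed(seed)
    # MT1 'push' contains 50 goal variations of the single push task.
    mt1 = metaworld.MT1('push-v2')
    task_sampler = MetaWorldTaskSampler(mt1, 'train',
                                        lambda env, _: normalize(env))
    envs = [env_up() for env_up in task_sampler.sample(50)]
    # Cycle through the goal variations round-robin while sampling.
    env = MultiEnvWrapper(envs,
                          sample_strategy=round_robin_strategy,
                          mode='vanilla')

    # One policy and one value function are shared across all variations.
    policy = GaussianMLPPolicy(env_spec=env.spec,
                               hidden_sizes=(64, 64),
                               hidden_nonlinearity=torch.tanh,
                               output_nonlinearity=None)
    value_function = GaussianMLPValueFunction(env_spec=env.spec,
                                              hidden_sizes=(32, 32),
                                              hidden_nonlinearity=torch.tanh,
                                              output_nonlinearity=None)

    sampler = LocalSampler(agents=policy,
                           envs=env,
                           max_episode_length=env.spec.max_episode_length)

    # Recent garage versions take the sampler here; older ones configure
    # sampling in trainer.setup() instead.
    algo = TRPO(env_spec=env.spec,
                policy=policy,
                value_function=value_function,
                sampler=sampler,
                discount=0.99,
                gae_lambda=0.95)

    trainer = Trainer(ctxt)
    trainer.setup(algo, env)
    trainer.train(n_epochs=500, batch_size=1024)


mttrpo_metaworld_mt1_push(seed=1)
```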

mttrpo_metaworld_mt10

This example trains MT-TRPO on the Multi-Task 10 (MT10) environment, in which a single policy is learned to perform 10 different manipulation tasks.
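
Relative to the MT1 sketch above, only the benchmark construction changes. A plausible delta (the `add_env_onehot` flag is how garage's multi-task samplers let one policy distinguish tasks; treat the exact arguments as version-dependent):

```python
# Replace the MT1 block in the sketch above with the MT10 benchmark.
# add_env_onehot appends a one-hot task id to each observation so the
# shared policy can tell the 10 tasks apart.
mt10 = metaworld.MT10()
task_sampler = MetaWorldTaskSampler(mt10, 'train',
                                    lambda env, _: normalize(env),
                                    add_env_onehot=True)
envs = [env_up() for env_up in task_sampler.sample(10)]  # one env per task
```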

mttrpo_metaworld_mt50

This example trains MT-TRPO on the Multi-Task 50 (MT50) environment, in which a single policy is learned to perform 50 different manipulation tasks.
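
The same pattern scales to MT50; again a sketch under the same assumptions as above:

```python
# One environment per manipulation task, 50 tasks in total.
mt50 = metaworld.MT50()
task_sampler = MetaWorldTaskSampler(mt50, 'train',
                                    lambda env, _: normalize(env),
                                    add_env_onehot=True)
envs = [env_up() for env_up in task_sampler.sample(50)]
```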

References

1

Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Karol Hausman, Chelsea Finn, and Sergey Levine. Meta-World: a benchmark and evaluation for multi-task and meta reinforcement learning. arXiv:1910.10897, 2019.

2

John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel. Trust region policy optimization. arXiv:1502.05477, 2015.


This page was authored by Ruofu Wang (@yeukfu).