Trust Region Policy Optimization (TRPO)

| **Paper**         | Trust Region Policy Optimization [1] |
|-------------------|--------------------------------------|
| **Framework(s)**  | PyTorch, TensorFlow |
| **API Reference** | `garage.torch.algos.TRPO`, `garage.tf.algos.TRPO` |
| **Code**          | `garage/torch/algos/trpo.py`, `garage/tf/algos/trpo.py` |
| **Examples**      | examples (see below) |

Trust Region Policy Optimization, or TRPO, is a policy gradient algorithm that builds on REINFORCE/VPG to improve performance. It adds a KL-divergence constraint that keeps each policy update within a trust region around the current policy, preventing any single update from changing the policy too drastically. See [1] for the original paper, and Spinning Up's write-up for a detailed description of the inner workings of the algorithm.
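
Concretely, each TRPO update solves the constrained optimization problem from [1]: maximize the surrogate advantage of the new policy $\pi_\theta$ relative to the old policy $\pi_{\theta_{\text{old}}}$, subject to an average KL-divergence bound $\delta$ (the trust region):

$$
\max_{\theta}\ \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}\!\left[\frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\,\hat{A}^{\pi_{\theta_{\text{old}}}}(s,a)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{s}\!\left[D_{\mathrm{KL}}\!\big(\pi_{\theta_{\text{old}}}(\cdot \mid s)\ \|\ \pi_{\theta}(\cdot \mid s)\big)\right] \le \delta
$$

In practice, the paper linearizes the objective and approximates the constraint to second order, then computes the step with the conjugate gradient method followed by a backtracking line search.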

Examples

TF
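
The snippet below is a minimal sketch of launching TRPO with the TensorFlow branch of garage, modeled on the `trpo_cartpole.py` example in the garage repository. Exact constructor arguments (e.g. `max_kl_step`, which sets the trust-region size $\delta$) can differ between garage versions, so treat this as a template rather than the canonical example.

```python
#!/usr/bin/env python3
"""Train TRPO (TensorFlow) on CartPole-v1 with garage."""
from garage import wrap_experiment
from garage.envs import GymEnv
from garage.experiment.deterministic import set_seed
from garage.np.baselines import LinearFeatureBaseline
from garage.sampler import LocalSampler
from garage.tf.algos import TRPO
from garage.tf.policies import CategoricalMLPPolicy
from garage.trainer import TFTrainer


@wrap_experiment
def trpo_cartpole(ctxt=None, seed=1):
    """Set up and train TRPO on CartPole-v1."""
    set_seed(seed)
    with TFTrainer(snapshot_config=ctxt) as trainer:
        env = GymEnv('CartPole-v1')

        # Small MLP policy over CartPole's discrete action space.
        policy = CategoricalMLPPolicy(name='policy',
                                      env_spec=env.spec,
                                      hidden_sizes=(32, 32))

        # Linear baseline to reduce variance of the advantage estimates.
        baseline = LinearFeatureBaseline(env_spec=env.spec)

        sampler = LocalSampler(agents=policy,
                               envs=env,
                               max_episode_length=env.spec.max_episode_length,
                               is_tf_worker=True)

        # max_kl_step is the trust-region size (delta in the paper).
        algo = TRPO(env_spec=env.spec,
                    policy=policy,
                    baseline=baseline,
                    sampler=sampler,
                    discount=0.99,
                    max_kl_step=0.01)

        trainer.setup(algo, env)
        trainer.train(n_epochs=100, batch_size=4000)


trpo_cartpole()
```

Running the script launches a garage experiment that logs progress and snapshots to the experiment directory created by `wrap_experiment`.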

PyTorch
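
Similarly, here is a minimal sketch for the PyTorch branch, modeled on the `trpo_pendulum.py` example in the garage repository; again, exact signatures may vary across garage versions, and InvertedDoublePendulum-v2 requires MuJoCo to be installed.

```python
#!/usr/bin/env python3
"""Train TRPO (PyTorch) on InvertedDoublePendulum-v2 with garage."""
import torch

from garage import wrap_experiment
from garage.envs import GymEnv
from garage.experiment.deterministic import set_seed
from garage.sampler import LocalSampler
from garage.torch.algos import TRPO
from garage.torch.policies import GaussianMLPPolicy
from garage.torch.value_functions import GaussianMLPValueFunction
from garage.trainer import Trainer


@wrap_experiment
def trpo_pendulum(ctxt=None, seed=1):
    """Set up and train TRPO on InvertedDoublePendulum-v2."""
    set_seed(seed)
    env = GymEnv('InvertedDoublePendulum-v2')

    trainer = Trainer(ctxt)

    # Gaussian MLP policy for the continuous action space.
    policy = GaussianMLPPolicy(env.spec,
                               hidden_sizes=[64, 64],
                               hidden_nonlinearity=torch.tanh,
                               output_nonlinearity=None)

    # Learned value function used for advantage estimation.
    value_function = GaussianMLPValueFunction(env_spec=env.spec,
                                              hidden_sizes=(32, 32),
                                              hidden_nonlinearity=torch.tanh,
                                              output_nonlinearity=None)

    sampler = LocalSampler(agents=policy,
                           envs=env,
                           max_episode_length=env.spec.max_episode_length)

    algo = TRPO(env_spec=env.spec,
                policy=policy,
                value_function=value_function,
                sampler=sampler,
                discount=0.99,
                center_adv=False)

    trainer.setup(algo, env)
    trainer.train(n_epochs=100, batch_size=1024)


trpo_pendulum(seed=1)
```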

References

[1] John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel. Trust Region Policy Optimization. arXiv, 2015. arXiv:1502.05477.


This page was authored by Mishari Aliesa (@maliesa96).