Trust Region Policy Optimization (TRPO)

Paper	Trust Region Policy Optimization [1]
Framework(s)	PyTorch	TensorFlow
API Reference	garage.torch.algos.TRPO	garage.tf.algos.TRPO
Code	garage/torch/algos/trpo.py	garage/tf/algos/trpo.py
Examples	examples

Trust Region Policy Optimization (TRPO) is a policy gradient algorithm that builds on REINFORCE/VPG to improve performance. It adds a KL-divergence constraint to each policy update, which keeps the new policy from deviating too far from the current one: every update must remain within a specified trust region. For the full derivation, see the TRPO paper [1]. Spinning Up's write-up also gives a detailed description of the inner workings of the algorithm.
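The trust-region idea can be illustrated with a small, self-contained sketch. This is not garage's implementation and not the full TRPO update (which computes the step direction with conjugate gradient and also checks for surrogate improvement). It only shows the constraint itself: a proposed change to the logits of a categorical policy is scaled down by backtracking until the KL divergence between the old and new policy falls below a threshold. The function names, the `max_kl` value of 0.01, and the update direction are all assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; q must be positive where p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def trust_region_step(old_logits, direction, max_kl=0.01, shrink=0.5,
                      max_backtracks=20):
    """Shrink the step along `direction` until KL(old || new) <= max_kl.

    Returns (new_logits, kl). If no step size within the backtracking
    budget satisfies the constraint, the old logits are returned unchanged.
    """
    old_probs = softmax(old_logits)
    step = 1.0
    for _ in range(max_backtracks):
        new_logits = [l + step * d for l, d in zip(old_logits, direction)]
        kl = kl_divergence(old_probs, softmax(new_logits))
        if kl <= max_kl:
            return new_logits, kl
        step *= shrink  # step too large: back off and retry
    return list(old_logits), 0.0


# Hypothetical example: a uniform policy over three actions and a
# proposed update that strongly favors the first action.
old_logits = [0.0, 0.0, 0.0]
proposed_direction = [2.0, -1.0, -1.0]
new_logits, kl = trust_region_step(old_logits, proposed_direction)
```

Taking the full proposed step here would move the policy far from uniform, so the loop accepts only a scaled-down step whose KL divergence from the old policy is within `max_kl`. The accepted policy still shifts probability toward the first action, just by a bounded amount.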

Examples

TF

PyTorch

References

[1] John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel. Trust region policy optimization. arXiv, 2015. arXiv:1502.05477.

This page was authored by Mishari Aliesa (@maliesa96).