Trust Region Policy Optimization (TRPO)
Paper: Trust Region Policy Optimization [1]
Trust Region Policy Optimization (TRPO) is a policy gradient algorithm that builds on REINFORCE/VPG to improve performance. It adds a KL divergence constraint to each policy update, so that the updated policy cannot deviate excessively from the current policy and must instead remain within a specified trust region. The TRPO paper is available on arXiv [1]. For a detailed description of the inner workings of the algorithm, see Spinning Up's write-up.
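To make the trust-region idea concrete, the following is a minimal NumPy sketch of a KL-constrained update for a small categorical policy. It is an illustration only, not the implementation used by this library: the function names (`mean_kl`, `surrogate`, `trust_region_step`) and parameters (`max_kl`, `initial_step`, `backtrack_ratio`) are hypothetical, the policy is assumed to be a plain table of logits, and the search direction is the ordinary gradient of the surrogate objective rather than the natural-gradient direction that TRPO computes with conjugate gradient. The sketch only shows how a backtracking line search can reject any candidate update whose mean KL divergence from the current policy exceeds the trust-region size.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, subtracting the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def mean_kl(old_probs, new_probs):
    """Mean KL(old || new) over a batch of categorical distributions."""
    per_state = np.sum(old_probs * (np.log(old_probs) - np.log(new_probs)), axis=-1)
    return float(per_state.mean())


def surrogate(old_probs, new_probs, actions, advantages):
    """Importance-weighted advantage: mean of (pi_new / pi_old) * A."""
    rows = np.arange(len(actions))
    ratio = new_probs[rows, actions] / old_probs[rows, actions]
    return float(np.mean(ratio * advantages))


def trust_region_step(logits, actions, advantages, max_kl=0.01,
                      initial_step=10.0, backtrack_ratio=0.8, max_backtracks=25):
    """Shrink the step until the KL constraint holds and the surrogate improves.

    Assumption: the direction is the plain surrogate gradient with respect to
    the logits, not the natural gradient used by TRPO itself.
    """
    old_probs = softmax(logits)
    baseline = surrogate(old_probs, old_probs, actions, advantages)

    # Gradient of the surrogate with respect to the logits at the current policy.
    one_hot = np.eye(logits.shape[1])[actions]
    direction = advantages[:, None] * (one_hot - old_probs) / len(actions)

    for i in range(max_backtracks):
        step = initial_step * backtrack_ratio ** i
        candidate = logits + step * direction
        new_probs = softmax(candidate)
        kl = mean_kl(old_probs, new_probs)
        gain = surrogate(old_probs, new_probs, actions, advantages) - baseline
        if kl <= max_kl and gain > 0.0:
            return candidate, kl, gain

    # No acceptable step was found: keep the current policy unchanged.
    return logits, 0.0, 0.0


# Toy batch: 4 sampled states, 3 discrete actions, uniform initial policy.
logits = np.zeros((4, 3))
actions = np.array([0, 1, 2, 0])
advantages = np.array([1.0, -1.0, 0.5, 2.0])

new_logits, kl, gain = trust_region_step(logits, actions, advantages)
print(f"mean KL = {kl:.4f}, surrogate gain = {gain:.4f}")
```

The initial step is deliberately too large, so the first candidates violate the KL bound and are rejected. The accepted update is the first one that both stays inside the trust region and improves the surrogate objective.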
References
[1] John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel. Trust region policy optimization. arXiv:1502.05477, 2015.
This page was authored by Mishari Aliesa (@maliesa96).