Deep Deterministic Policy Gradient (DDPG)

Paper

Continuous control with deep reinforcement learning [1]

Framework(s)

PyTorch

TensorFlow

API Reference

garage.torch.algos.DDPG

garage.tf.algos.DDPG

Code

garage/torch/algos/ddpg.py

garage/tf/algos/ddpg.py

Examples

torch/ddpg_pendulum

tf/ddpg_pendulum

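DDPG (Deep Deterministic Policy Gradient) is an off-policy actor-critic algorithm for continuous action spaces. The critic learns the action-value function Q(s, a), which is the expected discounted return of taking action a in state s. It learns this by regressing toward a bootstrapped target, as in supervised learning. The actor is a deterministic policy. It is updated with the deterministic policy gradient, which moves its actions in the direction that increases the critic's estimate. Three further components stabilize training. An exploration strategy adds noise to the actor's actions. A replay buffer supplies minibatches of past transitions. Target networks, which are slowly updated copies of the actor and critic, are used to compute the critic's targets.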
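The update described above can be sketched without any deep learning framework. This is a minimal illustration under stated assumptions, not garage's implementation. States and actions are scalars. A hypothetical linear actor mu(s) = k * s and a critic that is linear in hand-built features stand in for the neural networks. The names `ReplayBuffer`, `OUNoise`, `features` and `ddpg_step` are illustrative only and are not part of garage's API. Exploration uses Ornstein-Uhlenbeck noise as in the paper, with theta = 0.15 and sigma = 0.2. The time step `dt`, the learning rates, `tau` and the toy reward are assumptions. The step follows the paper's order: update the critic first, then update the actor using that critic, then soft-update the targets.

```python
import math
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)  # oldest transitions are evicted first

    def add(self, s, a, r, s_next, done):
        self.data.append((s, a, r, s_next, done))

    def sample(self, n):
        return random.sample(list(self.data), n)  # uniform, without replacement


class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated, mean-reverting noise."""

    def __init__(self, theta=0.15, sigma=0.2, dt=1e-2):  # dt is an assumption
        self.theta, self.sigma, self.dt, self.x = theta, sigma, dt, 0.0

    def sample(self):
        # Euler-Maruyama step of dx = -theta * x dt + sigma dW.
        self.x += (-self.theta * self.x * self.dt
                   + self.sigma * math.sqrt(self.dt) * random.gauss(0.0, 1.0))
        return self.x


def features(s, a):
    # Hypothetical hand-built critic features: Q(s, a) = w . phi(s, a).
    return [1.0, s, a, s * a, a * a, s * s]


def q_value(w, s, a):
    return sum(wi * fi for wi, fi in zip(w, features(s, a)))


def dq_da(w, s, a):
    # Gradient of Q with respect to the action, for the features above.
    return w[2] + w[3] * s + 2.0 * w[4] * a


def policy(k, s):
    # Hypothetical deterministic linear actor mu(s) = k * s (stands in for an MLP).
    return k * s


def soft_update(target, source, tau):
    # Polyak averaging: target <- tau * source + (1 - tau) * target.
    return [tau * x + (1.0 - tau) * y for x, y in zip(source, target)]


def ddpg_step(w, k, w_targ, k_targ, batch,
              gamma=0.99, lr_q=0.1, lr_pi=0.01, tau=0.005):
    n = len(batch)
    # Critic: one gradient step on mean 0.5 * (Q(s, a) - y)^2, where the target
    # y = r + gamma * (1 - done) * Q_targ(s', mu_targ(s')) uses the target networks.
    grad_w = [0.0] * len(w)
    for s, a, r, s_next, done in batch:
        a_next = policy(k_targ, s_next)
        y = r + gamma * (1.0 - done) * q_value(w_targ, s_next, a_next)
        err = q_value(w, s, a) - y
        for i, f in enumerate(features(s, a)):
            grad_w[i] += err * f / n
    w = [wi - lr_q * g for wi, g in zip(w, grad_w)]
    # Actor: one gradient-ascent step on mean Q(s, mu(s)); by the chain rule,
    # d/dk Q(s, k * s) = dQ/da * s.
    grad_k = sum(dq_da(w, s, policy(k, s)) * s for s, *_ in batch) / n
    k += lr_pi * grad_k
    # Target networks slowly track the online parameters.
    w_targ = soft_update(w_targ, w, tau)
    k_targ = soft_update([k_targ], [k], tau)[0]
    return w, k, w_targ, k_targ


# Usage on a hypothetical one-step task whose best action is a = 0.5 * s.
random.seed(0)
buffer, noise = ReplayBuffer(capacity=10_000), OUNoise()
w, k = [0.0] * 6, 0.0
w_targ, k_targ = list(w), k
for _ in range(500):
    s = random.uniform(-1.0, 1.0)
    a = policy(k, s) + noise.sample()  # explore around the actor's action
    r = -(a - 0.5 * s) ** 2
    buffer.add(s, a, r, s, 1.0)  # done = 1: each episode is a single step
    if len(buffer.data) >= 64:
        w, k, w_targ, k_targ = ddpg_step(w, k, w_targ, k_targ, buffer.sample(64))
```

In this toy task `done` is always 1, so the bootstrapped term vanishes and the critic simply regresses on rewards. In multi-step tasks, the target networks keep that bootstrapped target from chasing the critic that is being trained. The paper motivates OU noise with physical control problems that have inertia. Here it only illustrates where exploration noise enters.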

Examples

Garage has implementations of DDPG with PyTorch and TensorFlow.

PyTorch

TensorFlow

References

[1] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.


This page was authored by Ruofu Wang (@yeukfu).