Deep Deterministic Policy Gradient (DDPG)

Paper	Continuous control with deep reinforcement learning [1]
Framework(s)	PyTorch	TensorFlow
API Reference	garage.torch.algos.DDPG	garage.tf.algos.DDPG
Code	garage/torch/algos/ddpg.py	garage/tf/algos/ddpg.py
Examples	torch/ddpg_pendulum	tf/ddpg_pendulum

DDPG, also known as Deep Deterministic Policy Gradient, is an actor-critic method that learns a deterministic policy (the actor) together with an action-value estimate (the critic). The critic network is updated with a supervised regression loss against bootstrapped targets, and the actor network is updated with the policy gradient computed through the critic. DDPG also relies on an exploration strategy to collect data, and on a replay buffer and target networks to stabilize the training process.
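The three supporting pieces named above can be sketched in a few lines of plain Python. This is a minimal illustration and not garage's implementation: the names `ReplayBuffer`, `critic_target`, and `soft_update` are hypothetical, parameters are represented as flat lists of floats rather than network weights, and the defaults `discount=0.99` and `tau=0.001` are illustrative values taken from the settings reported in [1].

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size FIFO store of (s, a, r, s_next, done) tuples."""

    def __init__(self, capacity, seed=0):
        # A deque with maxlen silently drops the oldest transition when full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition):
        self._storage.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates consecutive steps.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


def critic_target(reward, done, next_q, discount=0.99):
    """Regression target for the critic: y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    `next_q` is assumed to be the target critic's value at the next state,
    evaluated at the target actor's action.
    """
    return reward + discount * (1.0 - float(done)) * next_q


def soft_update(target_params, online_params, tau=0.001):
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=1000)
    for step in range(10):
        buffer.add((step, 0.0, 1.0, step + 1, step == 9))
    batch = buffer.sample(4)
    targets = [critic_target(r, d, next_q=0.5) for (_, _, r, _, d) in batch]
    print(len(batch), targets)
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

In a full training loop the critic would be regressed toward `critic_target` on each sampled batch, the actor would be updated by ascending the critic's value at the actor's own action, and `soft_update` would then be applied to both target networks so that the regression targets change slowly.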

Examples

Garage has implementations of DDPG with PyTorch and TensorFlow.

PyTorch

TensorFlow

References

1: Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.

This page was authored by Ruofu Wang (@yeukfu).