Deep Deterministic Policy Gradient (DDPG)¶
Paper |
Continuous control with deep reinforcement learning [1] |
|
Framework(s) |
||
API Reference |
||
Code |
||
Examples |
DDPG, also known as Deep Deterministic Policy Gradient, uses actor-critic method to optimize the policy and reward prediction. It uses a supervised method to update the critic network and policy gradient to update the actor network. And there are exploration strategy, replay buffer and target networks involved to stabilize the training process.