Deep Q Networks (DQN)
Playing Atari with Deep Reinforcement Learning. 
Deep Q Networks, or simply DQN, is a staple off-policy method on top of which many more recent algorithms were developed. It uses a learned Q function to estimate the value of each state-action pair (S, A), and selects the greedy action by taking the argmax over the Q values of all actions in the current state. The algorithm is best known for learning policies that perform well on a large suite of Atari games.
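The greedy action selection described above can be sketched in a few lines. This is a minimal illustration in plain Python, not garage's implementation; the `greedy_action` helper and the Q values are hypothetical.

```python
def greedy_action(q_values):
    """Return the index of the action with the largest Q value.

    Ties are broken in favor of the lowest action index.
    """
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical Q values for one state with three discrete actions.
q_values = [0.1, 1.7, -0.4]
action = greedy_action(q_values)
```

In practice the Q values come from a neural network evaluated on the current observation, and training usually mixes this greedy choice with some exploration.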
Supported DQN Variants
As of now, both the TensorFlow and PyTorch implementations support Double DQN and Dueling DQN, as well as the two combined. Double DQN can be enabled by passing a flag to the algorithm:
algo = DQN(policy=policy, double_q=True, ...)
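To make the effect of the flag concrete, here is a sketch of how the bootstrap target differs between standard DQN and Double DQN in the textbook formulation. It assumes a single transition and plain Python lists; the function names are hypothetical, and garage's internal code may be organized differently.

```python
def dqn_target(reward, done, next_q_target, discount=0.99):
    """Standard DQN: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + discount * max(next_q_target)


def double_dqn_target(reward, done, next_q_online, next_q_target,
                      discount=0.99):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + discount * next_q_target[best]


# Hypothetical next-state Q values from the online and target networks.
next_q_online = [1.0, 3.0, 2.0]
next_q_target = [4.0, 2.5, 5.0]

standard = dqn_target(1.0, False, next_q_target, discount=0.9)
double = double_dqn_target(1.0, False, next_q_online, next_q_target,
                           discount=0.9)
```

Decoupling action selection from action evaluation is what reduces the overestimation bias of the plain max operator.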
Dueling DQN can be used by passing a Q function with the dueling architecture to the algorithm. When using the TensorFlow branch, a dueling Q function can be constructed as follows:
qf = DiscreteCNNQFunction(env_spec=env.spec, dueling=True, ...)
See the Pong example below for a full launcher file.
In the PyTorch branch, the dueling architecture is its own class:
qf = DiscreteDuelingCNNQFunction(env_spec=env.spec, ...)
See the Atari launcher below for a full example.
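For readers unfamiliar with the dueling architecture, the sketch below shows the usual way the two output streams are combined into Q values: a scalar state value plus mean-centered advantages. It is an illustration in plain Python with a hypothetical `dueling_q_values` helper; the garage Q function classes may differ in detail.

```python
def dueling_q_values(state_value, advantages):
    """Combine a state value V(s) and advantages A(s, a) into Q(s, a).

    Uses the mean-subtracted form: Q(s, a) = V(s) + A(s, a) - mean(A(s, .)).
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + adv - mean_advantage for adv in advantages]


# Hypothetical outputs of the value stream and the advantage stream.
q = dueling_q_values(2.0, [1.0, 0.0, -1.0])
```

Subtracting the mean advantage keeps the decomposition identifiable, since otherwise a constant could be shifted freely between the two streams.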
Both the TensorFlow and PyTorch branches of garage contain several DQN examples.
The PyTorch branch provides an Atari launcher (see below) that allows you to easily run experiments from the command line like so:
# train on PongNoFrameskip-v4 for 10M steps
python examples/torch/dqn_atari.py Pong --seed 22 --n_steps 10e6
By default, all the hyperparameters in the hyperparams dict at the top of the launcher file, as well as the ones passed in via the command line, are saved to the variant.json file in the experiment’s directory. You can use this feature to quickly compare hyperparameters between multiple runs.
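One way to compare two runs is to load each run's variant.json and list the keys whose values differ. The sketch below assumes the file is a flat JSON object. The helper names and the example keys are hypothetical, and inline dicts stand in for the loaded files so the snippet runs on its own.

```python
import json


def load_variant(path):
    """Load a variant.json file into a dict."""
    with open(path) as f:
        return json.load(f)


def diff_variants(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs.

    Keys missing from one side show up with None on that side. The
    comparison is shallow: a nested dict is reported as a whole.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Stand-ins for load_variant('<run dir>/variant.json'); keys are made up.
run_a = {"seed": 22, "n_steps": 10000000, "lr": 0.0001}
run_b = {"seed": 22, "n_steps": 10000000, "lr": 0.0005, "double_q": True}

changed = diff_variants(run_a, run_b)
```

Here `changed` contains only the entries that differ between the two runs, which makes it easy to see what distinguishes them at a glance.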
The training curves for Dueling DQN on the Atari-10M suite are included below.
Figure: Dueling DQN on Atari 10M