Run Experiments¶
In garage, experiments are described using Python files we call “experiment launchers.” There is nothing unusual about how experiment launchers are evaluated, and we recommend using off-the-shelf Python libraries for common tasks such as command-line argument parsing, experiment configuration, or remote execution.
All experiment launchers eventually call a function wrapped with a decorator called wrap_experiment, which defines the scope of an experiment and handles common tasks such as setting up a log directory for the results of the experiment.
Within the decorated experiment function, the launcher then constructs the important objects involved in running an experiment, such as the following:
- The trainer, which sets up important state (such as a TensorFlow session) for running the algorithm in the experiment.
- The environment object, which is the environment in which reinforcement learning is being done.
- The policy object, which is trained to optimize for maximal reward in the environment.
- The sampler object, which gathers samples for the algorithm during training.
- The algorithm, which trains the policy.
Finally, the launcher calls trainer.setup and trainer.train, which coordinate running the algorithm.
The garage repository contains several example experiment launchers. A fairly simple one, examples/tf/trpo_cartpole.py, is shown below:
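The launcher is approximately as follows. This is a sketch written against the garage TF API as of this document's writing; the exact imports, class names, and hyperparameters in the repository's copy of examples/tf/trpo_cartpole.py may differ slightly, so treat the repository file as authoritative.

```python
#!/usr/bin/env python3
"""Example launcher: train TRPO on CartPole-v1 (illustrative sketch)."""
from garage import wrap_experiment
from garage.envs import GymEnv
from garage.experiment.deterministic import set_seed
from garage.np.baselines import LinearFeatureBaseline
from garage.sampler import LocalSampler
from garage.tf.algos import TRPO
from garage.tf.policies import CategoricalMLPPolicy
from garage.trainer import TFTrainer


@wrap_experiment
def trpo_cartpole(ctxt=None, seed=1):
    """Train TRPO on CartPole-v1.

    Args:
        ctxt: ExperimentContext supplied by wrap_experiment.
        seed: Random seed, for reproducibility.
    """
    set_seed(seed)
    # The trainer sets up important state, including the TensorFlow session.
    with TFTrainer(snapshot_config=ctxt) as trainer:
        env = GymEnv('CartPole-v1')

        policy = CategoricalMLPPolicy(name='policy',
                                      env_spec=env.spec,
                                      hidden_sizes=(32, 32))

        baseline = LinearFeatureBaseline(env_spec=env.spec)

        sampler = LocalSampler(
            agents=policy,
            envs=env,
            max_episode_length=env.spec.max_episode_length)

        # Hyperparameters here are illustrative, not necessarily the
        # repository's exact values.
        algo = TRPO(env_spec=env.spec,
                    policy=policy,
                    baseline=baseline,
                    sampler=sampler,
                    discount=0.99,
                    max_kl_step=0.01)

        trainer.setup(algo, env)
        trainer.train(n_epochs=100, batch_size=4000)


trpo_cartpole(seed=1)
```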
Running the above should produce output like:
...
2020-05-11 14:13:05 | [trpo_cartpole] Logging to /home/kr/garage/data/local/experiment/trpo_cartpole_1
2020-05-11 14:13:05 | [trpo_cartpole] Setting seed to 1
2020-05-11 14:13:06 | [trpo_cartpole] Obtaining samples...
2020-05-11 14:13:06 | [trpo_cartpole] epoch #0 | Obtaining samples for iteration 0...
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00
2020-05-11 14:13:06 | [trpo_cartpole] epoch #0 | Logging diagnostics...
2020-05-11 14:13:06 | [trpo_cartpole] epoch #0 | Optimizing policy...
2020-05-11 14:13:06 | [trpo_cartpole] epoch #0 | Computing loss before
2020-05-11 14:13:06 | [trpo_cartpole] epoch #0 | Computing KL before
2020-05-11 14:13:06 | [trpo_cartpole] epoch #0 | Optimizing
2020-05-11 14:13:06 | [trpo_cartpole] epoch #0 | Start CG optimization: #parameters: 1282, #inputs: 201, #subsample_inputs: 201
2020-05-11 14:13:06 | [trpo_cartpole] epoch #0 | computing loss before
2020-05-11 14:13:06 | [trpo_cartpole] epoch #0 | computing gradient
2020-05-11 14:13:06 | [trpo_cartpole] epoch #0 | gradient computed
2020-05-11 14:13:06 | [trpo_cartpole] epoch #0 | computing descent direction
2020-05-11 14:13:07 | [trpo_cartpole] epoch #0 | descent direction computed
2020-05-11 14:13:07 | [trpo_cartpole] epoch #0 | backtrack iters: 4
2020-05-11 14:13:07 | [trpo_cartpole] epoch #0 | optimization finished
2020-05-11 14:13:07 | [trpo_cartpole] epoch #0 | Computing KL after
2020-05-11 14:13:07 | [trpo_cartpole] epoch #0 | Computing loss after
2020-05-11 14:13:07 | [trpo_cartpole] epoch #0 | Fitting baseline...
2020-05-11 14:13:07 | [trpo_cartpole] epoch #0 | Saving snapshot...
2020-05-11 14:13:07 | [trpo_cartpole] epoch #0 | Saved
2020-05-11 14:13:07 | [trpo_cartpole] epoch #0 | Time 1.25 s
2020-05-11 14:13:07 | [trpo_cartpole] epoch #0 | EpochTime 1.25 s
--------------------------------------- --------------
Entropy 0.690996
EnvExecTime 0.0628054
Evaluation/AverageDiscountedReturn 17.8993
Evaluation/AverageReturn 20.1095
Evaluation/TerminationRate 1
Evaluation/Iteration 0
Evaluation/MaxReturn 61
Evaluation/MinReturn 9
Evaluation/NumEpisodes 201
Evaluation/StdReturn 10.0935
Extras/EpisodeRewardMean 20.43
LinearFeatureBaseline/ExplainedVariance -2.65605e-08
Perplexity 1.9957
PolicyExecTime 0.430455
ProcessExecTime 0.0215859
TotalEnvSteps 4042
policy/Entropy 0.687919
policy/KL 0.0051155
policy/KLBefore 0
policy/LossAfter -0.0077831
policy/LossBefore -3.77624e-07
policy/dLoss 0.00778273
--------------------------------------- --------------
Note that the wrap_experiment-wrapped function still acts like a normal function, but requires all arguments to be passed by keyword. The function will automatically allocate an experiment directory based on the name of the wrapped function, and save various files to assist in reproducing the experiment (such as all of the arguments to the wrapped function).
Several arguments can be passed to wrap_experiment, or passed as a dictionary as the first argument to the wrapped function. For example, to use a specific log directory, the call to trpo_cartpole() above can be replaced with trpo_cartpole({'log_dir': 'my/log/directory', 'use_existing_dir': True}, seed=100).
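The calling convention (an optional options dictionary as the only positional argument, everything else keyword-only) can be illustrated with a toy stand-in. The decorator below is not garage's implementation; wrap_experiment_sketch and its ctxt dictionary are hypothetical simplifications (garage passes a real ExperimentContext), shown only to make the call pattern concrete.

```python
import functools


def wrap_experiment_sketch(function=None, *, log_dir=None, name=None):
    """Toy stand-in for garage.wrap_experiment (illustration only)."""

    def decorate(func):

        @functools.wraps(func)
        def wrapper(options=None, **kwargs):
            # If present, the first positional argument must be a dict of
            # wrap_experiment options; all other arguments are keyword-only.
            opts = {'log_dir': log_dir, 'name': name or func.__name__}
            if options is not None:
                opts.update(options)
            ctxt = opts  # garage passes an ExperimentContext object here
            return func(ctxt, **kwargs)

        return wrapper

    # Support both @wrap_experiment_sketch and
    # @wrap_experiment_sketch(...) invocation styles.
    if function is not None:
        return decorate(function)
    return decorate


@wrap_experiment_sketch
def my_experiment(ctxt, seed, lr=0.5):
    return ctxt['name'], seed, lr


# Options dict first (positional), remaining arguments by keyword.
result = my_experiment({'log_dir': 'my/log/directory'}, seed=100)
# result == ('my_experiment', 100, 0.5)
```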
For additional details on the other objects used in experiment launchers, we recommend browsing the reference documentation, or using Python’s dynamic documentation tools.
For example:
>>> print(garage.wrap_experiment.__doc__)
Decorate a function to turn it into an ExperimentTemplate.
When invoked, the wrapped function will receive an ExperimentContext, which
will contain the log directory into which the experiment should log
information.
This decorator can be invoked in two different ways.
Without arguments, like this:
    @wrap_experiment
    def my_experiment(ctxt, seed, lr=0.5):
        ...
Or with arguments:
    @wrap_experiment(snapshot_mode='all')
    def my_experiment(ctxt, seed, lr=0.5):
        ...
All arguments must be keyword arguments.
Args:
    function (callable or None): The experiment function to wrap.
    log_dir (str or None): The full log directory to log to. Will be
        computed from `name` if omitted.
    name (str or None): The name of this experiment template. Will be
        filled from the wrapped function's name if omitted.
    prefix (str): Directory under data/local in which to place the
        experiment directory.
    snapshot_mode (str): Policy for which snapshots to keep (or make at
        all). Can be either "all" (all iterations will be saved), "last"
        (only the last iteration will be saved), "gap" (every snapshot_gap
        iterations are saved), or "none" (do not save snapshots).
    snapshot_gap (int): Gap between snapshot iterations. Waits this number
        of iterations before taking another snapshot.
    archive_launch_repo (bool): Whether to save an archive of the
        repository containing the launcher script. This is a potentially
        expensive operation which is useful for ensuring reproducibility.
    name_parameters (str or None): Parameters to insert into the experiment
        name. Should be either None (the default), 'all' (all parameters
        will be used), or 'passed' (only passed parameters will be used).
        The used parameters will be inserted in the order they appear in
        the function definition.
    use_existing_dir (bool): If true, (re)use the directory for this
        experiment, even if it already contains data.

Returns:
    callable: The wrapped function.
Running Experiments on GPU / CPU¶
When training on-policy RL algorithms (such as PPO and TRPO) on a low-dimensional (i.e. non-image) environment using a GPU typically results in slower training overall.
However, TensorFlow will default to using a GPU if one is available. This can be changed by setting the CUDA_VISIBLE_DEVICES environment variable.
export CUDA_VISIBLE_DEVICES=-1 # CPU only
python path/to/my/experiment/launcher.py
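If you prefer to keep the setting inside the launcher rather than using `export`, the same variable can be set from Python. The only requirement is that it is set before TensorFlow is imported, since TensorFlow reads it at import time:

```python
import os

# Hide all GPUs from TensorFlow (CPU-only training). This must run
# before `import tensorflow`, because TensorFlow reads the variable
# when it initializes.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
```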
When training off-policy RL algorithms (such as DDPG, TD3, SAC, and PEARL), using a GPU generally allows faster training. However, PyTorch won’t use a GPU by default.
To enable the GPU for PyTorch, add the following code snippet to the experiment launcher.
import torch

from garage.torch import set_gpu_mode

# ...

# Use the GPU if one is available; otherwise fall back to the CPU.
if torch.cuda.is_available():
    set_gpu_mode(True)
else:
    set_gpu_mode(False)
# Move the algorithm's networks to the selected device.
algo.to()
See examples/torch/sac_half_cheetah_batch.py for a more detailed example.
This page was authored by K.R. Zentner (@krzentner), with contributions from Iris Liu (@irisliucy), Zequn Yu (@zequnyu), Angel Ivan Gonzalez (@gonzaiva), @wyjw, Gitanshu Sardana (@gitanshu), Ryan Julian (@ryanjulian), Jonathon Shen (@jonashen), Gunjan Baid (@gunjanbaid), and Rocky Duan (@dementrock).